Calculating Pearson correlation and significance in Python


I am looking for a function that takes two lists as input and returns the Pearson correlation along with the significance of that correlation.

11/22/2014 7:18:38 AM

You can have a look at scipy.stats:

from pydoc import help
from scipy.stats import pearsonr
help(pearsonr)

Help on function pearsonr in module scipy.stats.stats:

pearsonr(x, y)
 Calculates a Pearson correlation coefficient and the p-value for testing
 non-correlation.

 The Pearson correlation coefficient measures the linear relationship
 between two datasets. Strictly speaking, Pearson's correlation requires
 that each dataset be normally distributed. Like other correlation
 coefficients, this one varies between -1 and +1 with 0 implying no
 correlation. Correlations of -1 or +1 imply an exact linear
 relationship. Positive correlations imply that as x increases, so does
 y. Negative correlations imply that as x increases, y decreases.

 The p-value roughly indicates the probability of an uncorrelated system
 producing datasets that have a Pearson correlation at least as extreme
 as the one computed from these datasets. The p-values are not entirely
 reliable but are probably reasonable for datasets larger than 500 or so.

 Parameters
 ----------
 x : 1D array
 y : 1D array the same length as x

 Returns
 -------
 (Pearson's correlation coefficient,
  2-tailed p-value)
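
To make the docstring concrete, here is a minimal sketch of calling pearsonr on two plain lists. The data values are made up for illustration, and the manual cross-check (the textbook formula for r plus a two-sided p-value from the t distribution with n - 2 degrees of freedom) is my own addition rather than part of the original answer. It assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import stats
from scipy.stats import pearsonr

# Illustrative data only (not from the question): two equal-length lists
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 1.9, 4.1, 3.8, 6.5, 5.2, 8.0, 7.1]

# pearsonr returns the coefficient and the two-tailed p-value
r, p_value = pearsonr(x, y)

# Cross-check against the textbook definition of Pearson's r
xa = np.asarray(x, dtype=float)
ya = np.asarray(y, dtype=float)
dx = xa - xa.mean()
dy = ya - ya.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Two-sided p-value from the t statistic with n - 2 degrees of freedom
n = len(xa)
t_stat = r_manual * np.sqrt((n - 2) / (1.0 - r_manual ** 2))
p_manual = 2.0 * stats.t.sf(abs(t_stat), df=n - 2)

print("pearsonr:", r, p_value)
print("manual:  ", r_manual, p_manual)
```

The two lines printed should agree to within floating-point error. With only eight points, the docstring's caveat about p-values on small datasets applies, so treat the significance here as a rough guide.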

4/5/2017 6:44:25 PM

Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow