I'm using Python and Numpy to calculate a best fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (linear, quadratic, etc.).

This much works, but I also want to calculate r (coefficient of correlation) and r-squared(coefficient of determination). I am comparing my results with Excel's best-fit trendline capability, and the r-squared value it calculates. Using this, I know I am calculating r-squared correctly for linear best-fit (degree equals 1). However, my function does not work for polynomials with degree greater than 1.

Excel is able to do this. How do I calculate r-squared for higher-order polynomials using Numpy?

Here's my function:

```
import numpy
# Polynomial Regression
def polyfit(x, y, degree):
results = {}
coeffs = numpy.polyfit(x, y, degree)
# Polynomial Coefficients
results['polynomial'] = coeffs.tolist()
correlation = numpy.corrcoef(x, y)[0,1]
# r
results['correlation'] = correlation
# r-squared
results['determination'] = correlation**2
return results
```

From the numpy.polyfit documentation, it is fitting linear regression. Specifically, numpy.polyfit with degree 'd' fits a linear regression with the mean function

E(y|x) = p_d * x**d + p_{d-1} * x **(d-1) + ... + p_1 * x + p_0

So you just need to calculate the R-squared for that fit. The wikipedia page on linear regression gives full details. You are interested in R^2 which you can calculate in a couple of ways, the easisest probably being

```
SST = Sum(i=1..n) (y_i - y_bar)^2
SSReg = Sum(i=1..n) (y_ihat - y_bar)^2
Rsquared = SSReg/SST
```

Where I use 'y_bar' for the mean of the y's, and 'y_ihat' to be the fit value for each point.

I'm not terribly familiar with numpy (I usually work in R), so there is probably a tidier way to calculate your R-squared, but the following should be correct

```
import numpy
# Polynomial Regression
def polyfit(x, y, degree):
results = {}
coeffs = numpy.polyfit(x, y, degree)
# Polynomial Coefficients
results['polynomial'] = coeffs.tolist()
# r-squared
p = numpy.poly1d(coeffs)
# fit values, and mean
yhat = p(x) # or [p(z) for z in x]
ybar = numpy.sum(y)/len(y) # or sum(y)/len(y)
ssreg = numpy.sum((yhat-ybar)**2) # or sum([ (yihat - ybar)**2 for yihat in yhat])
sstot = numpy.sum((y - ybar)**2) # or sum([ (yi - ybar)**2 for yi in y])
results['determination'] = ssreg / sstot
return results
```

A very late reply, but just in case someone needs a ready function for this:

i.e.

```
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)
```

as in @Adam Marples's answer.

Licensed under: CC-BY-SA with attribution

Not affiliated with: Stack Overflow