W.R. Wilcox, Clarkson University, October 2010
Explanation of results returned by the Regression tool
in Excel’s Data Analysis
The Regression tool in Excel’s Data Analysis determines the coefficients (ai) that yield the smallest residual sum of squares (see below), which is equivalent to the greatest correlation coefficient squared, R2, for Equation (1). This is known as linear regression analysis.
y = a0 + a1x1 + a2x2 + a3x3 + …. (1)
where y is the dependent variable (response), a0 is the intercept, and x1, x2, x3 etc. are the independent variables (factors). It is assumed that you have n observations of y versus different values of xi. Note that the xi can be functions of the actual experimental variables, including products of different variables. Following are the meanings of the results produced by Excel:
· R Square = (Multiple R)2 = R2 = 1 - Residual SS / Total SS = Regress SS / Total SS
(roughly the fraction of the variation in y that is explained by equation 1).
· R = correlation coefficient
· Adjusted R Square = 1 - (Total df / Residual df)(Residual SS / Total SS)
· Standard Error = (Residual MS)0.5
· Note that “Error” does not mean there’s a mistake or an experimental error. It’s just a definition related to residuals. This word is used a lot in statistical analysis, and is misunderstood by those who don’t know statistics. Unfortunately, this might be your boss.
ANOVA = ANalysis Of VAriance
· Regression df = regression degrees of freedom = number of independent variables (factors) in equation 1. See, for example, http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29
· Regression SS = Total SS - Residual SS
· Regression MS = Regression SS / Regression df
· Regression F = Regression MS / Residual MS
· Significance F = FDIST(Regression F, Regression df, Residual df) = Probability that equation (1) does NOT explain the variation in y, i.e. that any fit is purely by chance. This is based on the F probability distribution. If the Significance F is not less than 0.1 (10%) you do not have a meaningful correlation.
· Residual df = residual degrees of freedom = Total df - Regression df = n - 1 - number of independent variables (xi)
· Residual SS = sum of squares of the differences between the values of y predicted by equation 1 and the actual values of y. If the data exactly fit equation 1, then Residual SS would be 0 and R2 would be 1.
· Residual MS = mean square error = Residual SS / Residual df
· Total df = total degrees of freedom = n – 1
· Total SS = the sum of the squares of the differences between values of y and the average y
= (n-1)*(standard deviation of y)2
· Coefficients = values of ai which minimize the Residual SS (maximize R2). The Intercept Coefficient is a0 in equation 1.
· Standard error = (Residual MS using only the Coefficient for that row)0.5
· t Stat = Coefficient for that variable / Standard error for that variable
· P-value = TDIST(|t Stat|, Residual df, 2) = the Student’s t distribution two-tailed probability
If one divides this by 2, it is the probability that the true value of the coefficient has the opposite sign to that found. You want this probability to be small in order to be sure that this variable really influences y, certainly less than 0.1 (10%). If near 50%, try another fit with this variable left out.
· There is a 95% probability that the true value of the coefficient lies between the Lower 95% and Upper 95% values. The probability is 2.5% that it lies below the lower value, and 2.5% that it lies above. The narrower this range the better. If the lower value is negative and the upper value positive, try correlating the data with this variable left out. If the resulting R2 and Significance F are either improved or little changed, then the data do not support inclusion of that variable in your correlating equation.
See any statistics textbook for more detailed information and fundamental understanding. Comments and suggestions on this page would be greatly appreciated: email@example.com
Return to Excel tutorials
Last modified October 27, 2010