W.R. Wilcox, Clarkson University,
November 2006
Explanation of results returned by the
Regression tool
in Excel’s Data Analysis
The Regression tool in Excel’s Data Analysis determines the coefficients (ai) that yield the smallest residual sum of squares (see below), which is equivalent to the greatest correlation coefficient squared, R2, for Equation (1). This is known as linear regression analysis.
y = a0 + a1x1 + a2x2 + a3x3 + …. (1)
where y is the dependent variable (response), a0 is the intercept, and x1, x2, x3 etc. are the independent variables (factors). It is assumed that you have n observations of y versus different values of xi. Note that the xi can be functions of the actual experimental variables, including products of different variables. Following are the meanings of the results produced by Excel:
· R Square = (Multiple R)2 = R2 = 1 - Residual SS / Total SS = Regress SS / Total SS
(roughly the fraction of the variation in y that is explained by equation 1).
· R = correlation coefficient
· Adjusted R Square = 1 - (Total df / Residual df)(Residual SS / Total SS)
· Standard Error = (Residual MS)0.5
ANOVA = ANalysis
Of VAriance
· Regression df = regression degrees of freedom = number of independent variables (factors) in equation 1. See, for example, http://en.wikipedia.org/wiki/Degrees_of_freedom_%28statistics%29
· Regression SS = Total SS - Residual SS
· Regression MS = Regression SS / Regression df
· Regression F = Regression MS / Residual MS
· Significance F = FDIST(Regression F, Regression df, Residual df) = Probability that equation (1) does NOT explain the variation in y. This is based on the F probability distribution. If it’s not less than 0.1 (10%) you do not have a meaningful correlation.
· Residual df = residual degrees of freedom = Total df - Regression df = n - 1 - number of independent variables (xi)
· Residual SS = sum of squares of the differences between the values of y predicted by equation 1 and the actual values of y. If the data exactly fit equation 1, then Residual SS would be 0 and R2 would be 1.
· Residual MS = mean square error = Residual SS / Residual df
· Total df = total degrees of freedom = n – 1
· Total SS = the sum of the squares of the differences between values of y and the average y
= (n-1)*(standard deviation of y)2
· Coefficients = values of ai which minimize the Residual SS (maximize R2). The Intercept Coefficient is a0 in equation 1.
· Standard error = (Residual MS using only the Coefficient for that row)0.5
· t Stat = Coefficient for that variable / Standard error for that variable
· P-value = TDIST(|t Stat|, Residual df, 2) = the Student’s t distribution two-tailed probability
If one divides this by 2, it is the probability that the true value of the coefficient has the opposite sign to that found. You want probability to be small in order to be sure that this variable really influences y, certainly less than 0.1 (10%).
· There is a 95% probability that the true value of the coefficient lies between the Lower 95% and Upper 95% values. The probability is 2.5% that it lies below the lower value, and 2.5% that it lies above. The narrower this range the better.
See any statistics textbook for more detailed information and fundamental understanding. Comments and suggestions on this page would be greatly appreciated: wilcox@clarkson.edu
Return to Excel tutorials
Last modified December 18, 2006