Case Study Materials (ch.4)
F.2 Educational Tool
Educational Information Handout
E. Nystrom, J. L. Sharp, W. Bridges, P. Gerard, and C. Gallagher
The Linear Regression Model
The most basic linear regression model (LM) is the simple linear model (Eq. (1)), which equates the $i$th response ($Y_i$) to a linear function of a single predictor ($X_{1i}$) and a random error term ($\varepsilon_i$):
(1) $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$, for $i = 1, 2, \ldots, n$, where $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma_e^2)$.
A basic two-predictor LM is given by
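$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$, for $i = 1, 2, \ldots, n$, where $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma_e^2)$,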
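and a two-predictor LM with an interaction term ($X_1 X_2$) is given by
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$, for $i = 1, 2, \ldots, n$, where $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma_e^2)$.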
The k-predictor LM (Eq.(2)) is given by
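(2) $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$, for $i = 1, 2, \ldots, n$, where $\varepsilon_1, \ldots, \varepsilon_n \overset{\text{iid}}{\sim} N(0, \sigma_e^2)$,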
and the corresponding estimated linear regression equation is given by
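$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki}$, for $i = 1, 2, \ldots, n$,
where $\hat{Y}_i$ denotes the predicted response, and $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ denote the estimated regression coefficients.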
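As a minimal sketch of how the estimated regression equation might be obtained in practice, the following Python code fits a two-predictor LM by ordinary least squares using NumPy. The helper name `fit_lm`, the simulated data, and the coefficient values are assumptions made for illustration; they do not come from the handout.

```python
import numpy as np

def fit_lm(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (beta_hat, y_hat): estimated coefficients (intercept first)
    and the fitted responses.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                       # allow a single predictor given as a vector
        X = X[:, None]
    design = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta_hat, design @ beta_hat

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

beta_hat, y_hat = fit_lm(X, y)
print(beta_hat)   # estimates of (beta_0, beta_1, beta_2)
```

With a reasonably large sample, the printed estimates should lie close to the values used to simulate the data.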
Residuals
For each LM, the errors are assumed to be independent and normally distributed with constant variance, and the predictors are assumed to be linearly related to the response. Although the errors are unobserved, the LM assumptions can be checked using the estimated errors. A residual ($e$) is defined as the difference between the observed and the predicted response value and is used to estimate the error ($\varepsilon$) in the LM:
$e_i = Y_i - \hat{Y}_i$, for $i = 1, 2, \ldots, n$.
A standardized residual ($e^{(s)}$) is derived by dividing the residual by its standard error:
$e^{(s)}_i = \dfrac{e_i}{SE(e_i)}$, for $i = 1, 2, \ldots, n$.
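The two definitions above can be sketched numerically as follows. The handout does not spell out $SE(e_i)$; this sketch assumes the usual least-squares form $SE(e_i) = \sqrt{s^2 (1 - h_{ii})}$, where $s^2$ is the residual mean square and $h_{ii}$ is the $i$th leverage. The simulated data and variable names are illustrative only.

```python
import numpy as np

# Simulated simple-LM data (assumed values, for illustration only)
rng = np.random.default_rng(1)
n = 100
x1 = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x1 + rng.normal(scale=2.0, size=n)

design = np.column_stack([np.ones(n), x1])            # intercept + X1
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ beta_hat

e = y - y_hat                                         # residuals e_i = Y_i - Yhat_i

# Leverages h_ii: diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.einsum("ij,ji->i", design, np.linalg.pinv(design))

p = design.shape[1]                                   # number of estimated coefficients
s2 = e @ e / (n - p)                                  # estimate of sigma_e^2
e_std = e / np.sqrt(s2 * (1.0 - h))                   # standardized residuals e_i / SE(e_i)
```

When the model assumptions hold, the standardized residuals should have a spread close to 1, which makes unusually large values easier to spot than with raw residuals.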
A partial residual ($e_{(X_m | X_1, \ldots, X_k)}$) is a residual from an estimated regression equation that uses a candidate predictor ($X_m$) as the response regressed on the other predictors ($X_1, \ldots, X_k$) that are included in the LM for $Y$ (i.e., $\hat{X}_{mi} = \hat{\gamma}_0 + \sum_{j=1}^{k} \hat{\gamma}_j X_{ji}$). When the simple LM (Eq. (1)) is estimated using predictor $X_1$, and $X_2$ is a candidate predictor, the partial residuals are given by
$e_{(X_2 | X_1), i} = X_{2i} - \hat{X}_{2i} = X_{2i} - (\hat{\gamma}_0 + \hat{\gamma}_1 X_{1i})$, for $i = 1, 2, \ldots, n$.
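A minimal sketch of the partial residual computation for the simple-LM case is given below: the candidate $X_2$ is regressed on the included predictor $X_1$, and the residuals of that auxiliary regression are kept. The helper `residuals` and the simulated data are assumptions for illustration.

```python
import numpy as np

def residuals(X, y):
    """Residuals from a least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Simulated predictors (assumed values, for illustration only)
rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # candidate, correlated with X1 by construction

# Partial residuals e_(X2|X1),i = X2i - (gamma0_hat + gamma1_hat * X1i)
e_x2_given_x1 = residuals(x1, x2)
```

By construction, these partial residuals average to zero and are uncorrelated with $X_1$ in the sample; they carry only the part of $X_2$ that $X_1$ does not explain linearly.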
Residual Plots
Residual plots are graphs (usually scatterplots) that display a type of residual and are commonly used to check model assumptions. Typically, the residual appears on the vertical axis, and the fitted response ($\hat{Y}$) or a predictor is on the horizontal axis. Note that when a simple LM is fit, the residual versus $\hat{Y}$ scatterplot and the residual versus $X_1$ scatterplot are identical up to a linear transformation of the horizontal axis, because $\hat{Y}$ is a linear function of $X_1$. Candidate predictors (or added variables) that are not yet included in the model may also be used in residual plots. The partial residual plot (or added variable plot) is a special type of residual plot that displays the residuals on the vertical axis and the partial residuals on the horizontal axis (Fig. 4).
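One reason the partial residual plot is informative can be checked numerically. By a standard least-squares identity (the Frisch-Waugh-Lovell theorem, which this handout does not state), the slope of a line fitted through the partial residual plot equals the coefficient the candidate predictor would receive if it were added to the model. The sketch below verifies this on simulated data; the helper `fit`, the seed, and the coefficient values are illustrative assumptions.

```python
import numpy as np

def fit(X, y):
    """Return (coefficients, residuals) from a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, y - design @ coef

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.7 * x2 + rng.normal(scale=0.5, size=n)

_, e = fit(x1, y)                    # residuals from the current model (Y on X1)
_, e_x2 = fit(x1, x2)                # partial residuals e_(X2|X1)

coef_av, _ = fit(e_x2, e)            # line through the partial residual plot
slope = coef_av[1]

full_coef, _ = fit(np.column_stack([x1, x2]), y)   # model with X2 added
print(slope, full_coef[2])           # the two values agree up to rounding error
```

A slope near zero in the partial residual plot therefore corresponds to a candidate predictor that would receive a coefficient near zero in the enlarged model.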
When the LM assumptions are satisfied, the residual plots should have random scatter. Nonrandom scatter (patterns) in residual plots may indicate that an error assumption has been violated or that the linear predictor in the current model (the model upon which the estimated/fitted equation is based) is misspecified.
The residual versus $\hat{Y}$ scatterplot may be used to check the constant variance assumption. A pattern in this plot may indicate nonconstant error variance. When the response needs to be transformed, a funnel pattern occurs, and the vertical spread (density) of the residuals within the funnel stretches across the plot from left to right (Fig. 1, (a)). When an interaction term is missing (left out of the fitted model), a bowtie or partial bowtie pattern occurs, and the spacing of residuals within the pattern is fairly even throughout the plot (Fig. 1, (b)).
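The funnel case can be illustrated with a small simulation. The sketch below assumes a response whose logarithm follows an LM (so the errors are multiplicative on the original scale) and uses the correlation between $|e_i|$ and $\hat{Y}_i$ as a rough numerical stand-in for the visual funnel. That summary statistic, the log transformation, and all simulated values are assumptions chosen for illustration, not a procedure from the handout.

```python
import numpy as np

def fit(X, y):
    """Return (fitted values, residuals) from a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ coef
    return y_hat, y - y_hat

# Simulated data (assumed values): log(Y) is linear in X1 with constant-variance errors
rng = np.random.default_rng(4)
n = 2000
x1 = rng.uniform(0, 3, size=n)
y = np.exp(2.0 + 0.3 * x1 + rng.normal(scale=0.5, size=n))

y_hat_raw, e_raw = fit(x1, y)             # LM fitted to Y
y_hat_log, e_log = fit(x1, np.log(y))     # LM fitted to log(Y)

# Correlation between |residual| and fitted value: a crude measure of a funnel
spread_raw = np.corrcoef(y_hat_raw, np.abs(e_raw))[0, 1]
spread_log = np.corrcoef(y_hat_log, np.abs(e_log))[0, 1]
print(spread_raw, spread_log)
```

In this setup the first value is expected to be clearly positive (spread grows with the fitted response) and the second close to zero, which is the numerical counterpart of the funnel disappearing after the response is transformed.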
The residual versus predictor ($X_j$) scatterplot may be used to check the appropriateness of $X_j$'s functional relationship in the fitted model (the linearity assumption) and to check the independent error terms assumption. For example, a quadratic pattern in this scatterplot suggests that $X_j^2$ be considered as a predictor of $Y$ instead of, or in addition to, $X_j$ (Fig. 2, (b)). A residual versus candidate predictor scatterplot may be used to check whether a candidate predictor should be added to the current model. Random scatter supports leaving out the candidate ($X_m$), whereas a nonrandom pattern supports adding $X_m$ (or a function of $X_m$, such as $X_m^2$) to the current model.
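The two checks described above can be sketched with simulated data in which the true model contains a quadratic term in $X_1$ and an unrelated candidate $X_2$. Sample correlations are used here as a simple numerical proxy for what the scatterplots would show; the helper `residuals` and all simulated values are assumptions for illustration.

```python
import numpy as np

def residuals(X, y):
    """Residuals from a least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

# Simulated data (assumed values): Y depends on X1 and X1^2 but not on X2
rng = np.random.default_rng(5)
n = 300
x1 = rng.uniform(-2, 2, size=n)
x2 = rng.normal(size=n)                          # candidate predictor, unrelated to Y
y = 1.0 + 1.0 * x1 + 0.8 * x1**2 + rng.normal(scale=0.5, size=n)

e = residuals(x1, y)                             # residuals from the simple LM in X1

corr_lin = np.corrcoef(x1, e)[0, 1]              # zero by construction of least squares
corr_quad = np.corrcoef(x1**2, e)[0, 1]          # large: quadratic pattern left in e
corr_cand = np.corrcoef(x2, e)[0, 1]             # near zero: no support for adding X2
print(corr_lin, corr_quad, corr_cand)
```

The residuals are always uncorrelated with an included predictor, so a linear trend cannot appear in the residual versus $X_1$ plot; what remains visible is structure the model missed, here the quadratic term.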
Figure 1. Comparing Patterns in Residual Plots to Check for Nonconstant Variance. (a) A funnel pattern may indicate that the response ($Y$) needs to be transformed. (b) A bowtie or partial bowtie pattern may indicate that an interaction term is missing.
[Figure 1 image not reproduced. Panels (a) and (b): horizontal axis, Fitted Response, $\hat{Y}$; vertical axis, Residual.]
Figure 2. Comparing Patterns in Residual versus Predictor ($X_1$) Scatterplots to Check the Linearity Assumption: (a) Random scatter suggests that $X_1$ is correctly related to $Y$ in the fitted model. (b) A function-like pattern (e.g., quadratic) suggests that the functional relationship of $X_1$ to $Y$ is misspecified in the fitted model (e.g., $X_1^2$ should be considered as a predictor of $Y$ instead of, or in addition to, $X_1$). The bowtie (c) and the curved bowtie (d) patterns here are indicative of a missing interaction term involving $X_1$ and another variable ($X_2$). When $X_1$ and $X_2$ are correlated, the missing interaction ($X_1 X_2$) is related to $X_1^2$, resulting in curvature in (d).
[Figure 2 image not reproduced. Panels (a) through (d): horizontal axis, Predictor, $X_1$; vertical axis, Residual.]
Comparing the residual versus candidate predictor scatterplot to the corresponding partial residual plot may help to distinguish cases where the candidate predictor is correlated with a predictor that is already included in the fitted model. When a simple LM with predictor $X_1$ is estimated, if $X_1$ and $X_2$ are uncorrelated, the partial residual plot is nearly identical to the residual versus $X_2$ scatterplot (Fig. 3, (a)).
On the other hand, when the shapes of the two plots are noticeably different, correlation between $X_1$ and $X_2$ is suspected (Fig. 3, (b)).
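The comparison of the two plots can also be sketched numerically. Both plots share the same vertical axis (the residuals), so they differ only in their horizontal coordinates: $X_2$ in one and the partial residuals of $X_2$ given $X_1$ in the other. The sketch below measures how closely those coordinates agree; the function name `axis_agreement` and the simulated data are assumptions for illustration.

```python
import numpy as np

def residuals(X, y):
    """Residuals from a least-squares fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def axis_agreement(x1, x2):
    """Correlation between X2 and its partial residuals given X1."""
    return np.corrcoef(x2, residuals(x1, x2))[0, 1]

# Simulated predictors (assumed values, for illustration only)
rng = np.random.default_rng(6)
n = 400
x1 = rng.normal(size=n)
x2_uncorr = rng.normal(size=n)                       # generated independently of X1
x2_corr = 0.9 * x1 + rng.normal(scale=0.4, size=n)   # strongly correlated with X1

agree_uncorr = axis_agreement(x1, x2_uncorr)   # close to 1: the plots look alike
agree_corr = axis_agreement(x1, x2_corr)       # well below 1: the plots differ
print(agree_uncorr, agree_corr)
```

In the sample, this agreement equals $\sqrt{1 - r^2}$, where $r$ is the correlation between $X_1$ and $X_2$, so the two plots drift apart exactly as the predictors become more correlated.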
Figure 3. Using (Candidate Residual Plot, Partial Residual Plot) Pairs to Detect Correlation of the Candidate Predictor ($X_2$) with the Included Predictor ($X_1$): (a) When $X_1$ and $X_2$ are uncorrelated, the partial residual plot is nearly identical to the residual plot. (b) When $X_1$ and $X_2$ are correlated, the partial residual plot is noticeably different than the residual plot.
[Figure 3 image not reproduced. Recoverable axis labels: horizontal axis, Candidate, $X_2$; vertical axis, Residual, $e$.]
Figure 4. Partial Residual Plot: The partial residual plot (or added variable plot) displays the residuals ($e_i$) on the vertical axis and the partial residuals ($e_{(X_m | X_1, \ldots, X_k), i}$) on the horizontal axis. The example shown here is the partial residual plot when $X_2$ is the candidate predictor and the fitted model is the simple LM with response $Y$ and predictor $X_1$. The random scatter in this example indicates that $X_2$ does not add much explanatory value to the current model, which already includes $X_1$.
[Figure 4 image not reproduced. Panel title: Partial Residual Plot; horizontal axis, Partial Residual, $e_{(X_2 | X_1)}$; vertical axis, Residual, $e$.]
Additional Resources
(1) Greene, W. (2003). Econometric Analysis, 5th ed. Upper Saddle River: Pearson.
(2) Kutner, M., Nachtsheim, C., Neter, J., and Li, W. (2005). Diagnostics and Remedial Measures. Applied Linear Statistical Models, 5th ed. (pp. 100-153). Boston: McGraw-Hill.
(3) Montgomery, D., Peck, E., and Vining, G. (2013). Model Adequacy Checking. Introduction to Linear Regression Analysis, 5th ed. (pp. 129-170). Chichester: Wiley & Sons.