Interactions in multiple regression


6.1.12 Interactions in multiple regression

The multiple regression model we have been using so far is an additive one, i.e. the effects of the predictor variables on Y are additive. In many biological situations, however, we would anticipate interactions between the predictors (Aiken & West 1991, Jaccard et al. 1990) so that their effects on Y are multiplicative. Let’s just consider the case with two predictors, X₁ and X₂. The additive multiple linear regression model is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i \tag{6.17}$$

This assumes that the partial regression slope of Y on X₁ is independent of X₂ and vice versa. The multiplicative model including an interaction is:

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1} x_{i2} + \varepsilon_i \tag{6.18}$$

The new term ($\beta_3 x_{i1} x_{i2}$) in model 6.18 represents the interactive effect of X₁ and X₂ on Y. It measures the dependence of the partial regression slope of Y against X₁ on the value of X₂ and the dependence of the partial regression slope of Y against X₂ on the value of X₁. The partial slope of the regression of Y against X₁ is no longer independent of X₂ and vice versa. Equivalently, the partial regression slope of Y against X₁ is different for each value of X₂.

Using the data from Paruelo & Lauenroth (1996), model 6.2 indicates that we expect no interaction between latitude and longitude in their effect on the relative abundance of C₃ plants. But what if we allow the relationship between C₃ plants and latitude to vary for different longitudes? Then we are dealing with an interaction between latitude and longitude and our model becomes:

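$$(\text{relative abundance of C}_3\text{ grasses})_i = \beta_0 + \beta_1(\text{latitude})_i + \beta_2(\text{longitude})_i + \beta_3(\text{latitude})_i \times (\text{longitude})_i + \varepsilon_i \tag{6.19}$$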
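One way to make the meaning of the interaction term concrete is to differentiate the expected response under model 6.18 with respect to each predictor. This short derivation is added for illustration; writing the expected response as $E(y_i)$ is a notational assumption made here.

```latex
\frac{\partial E(y_i)}{\partial x_{i1}} = \beta_1 + \beta_3 x_{i2},
\qquad
\frac{\partial E(y_i)}{\partial x_{i2}} = \beta_2 + \beta_3 x_{i1}
```

Each partial slope is a linear function of the other predictor, and both reduce to the constant slopes of the additive model 6.17 when $\beta_3 = 0$. For model 6.19, the slope of C₃ abundance against latitude would be $\beta_1 + \beta_3(\text{longitude})_i$.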

Table 6.2 Expected values of mean squares from analysis of variance for a multiple linear regression model with two predictor variables

Mean square      Expected value
MS_Regression    $\sigma_\varepsilon^2 + \dfrac{\beta_1^2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + \beta_2^2 \sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2 + 2\beta_1\beta_2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{2}$
MS_Residual      $\sigma_\varepsilon^2$

One of the difficulties with including interaction terms in multiple regression models is that lower-order terms will usually be highly correlated with their interactions, e.g. X₁ and X₂ will be highly correlated with their interaction X₁X₂. This results in all the computational problems and inflated variances of estimated coefficients associated with collinearity (Section 6.1.11). One solution to this problem is to rescale the predictor variables by centering, i.e. subtracting their mean from each observation, so the interaction is then the product of the centered values (Aiken & West 1991, Neter et al. 1996; see Box 6.1 and Box 6.2). If X₁ and X₂ are centered then neither will be strongly correlated with their interaction. Predictors can also be standardized (subtract the mean from each observation and divide by the standard deviation), which has an identical effect in reducing collinearity.
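The effect of centering on collinearity can be checked numerically. The following is a minimal sketch on simulated predictors, not on any data set from the text; the variable names, means and standard deviations are assumptions chosen only so that the predictor means lie far from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Simulated predictors whose means are far from zero (illustrative values only)
x1 = rng.normal(50.0, 5.0, n)
x2 = rng.normal(100.0, 10.0, n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Raw predictors: X1 is strongly correlated with the product X1*X2
r_raw = corr(x1, x1 * x2)

# Centered predictors: the interaction is the product of the centered values
c1 = x1 - x1.mean()
c2 = x2 - x2.mean()
r_centered = corr(c1, c1 * c2)

# Standardizing also centers, so the correlation is reduced in the same way
z1 = c1 / x1.std(ddof=1)
z2 = c2 / x2.std(ddof=1)
r_standardized = corr(z1, z1 * z2)

print(f"raw: {r_raw:.3f}, centered: {r_centered:.3f}, "
      f"standardized: {r_standardized:.3f}")
```

Because a correlation is unchanged by rescaling either variable, the centered and standardized versions give the same value here, which is one way to read the statement that standardizing has an identical effect on collinearity.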

When interaction terms are not included in the model, centering the predictor variables does not change the estimates of the regression slopes nor hypothesis tests that individual slopes equal zero. Standardizing the predictor variables does change the value of the regression slopes, but not their hypothesis tests because the standardization affects the coefficients and their standard errors equally. When interaction terms are included, centering does not affect the regression slope for the highest-order interaction term, nor the hypothesis test that the interaction equals zero. Standardization changes the value of the regression slope for the interaction but not the hypothesis test. Centering and standardization change all lower-order regression slopes and hypothesis tests that individual slopes equal zero but make them more interpretable in the presence of an interaction (see below). The method we will describe for further examining interaction terms using simple slopes is also unaffected by centering but is affected by standardizing predictor variables.
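These invariance properties can be verified on simulated data. The sketch below assumes a two-predictor model with an interaction and fits it three times (raw, centered and standardized predictors) by ordinary least squares with NumPy; the helper names `ols` and `design` and all coefficient values are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients and their t statistics for design matrix X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    df = X.shape[0] - X.shape[1]                      # residual degrees of freedom
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return b, b / np.sqrt(np.diag(cov))


def design(x1, x2):
    """Design matrix with columns: intercept, X1, X2 and X1*X2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(5.0, 2.0, n)
x2 = rng.normal(3.0, 1.0, n)
# Simulated response; the coefficients are arbitrary illustrative values
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 1.0, n)

c1, c2 = x1 - x1.mean(), x2 - x2.mean()               # centered predictors
s1, s2 = x1.std(ddof=1), x2.std(ddof=1)               # sample standard deviations
z1, z2 = c1 / s1, c2 / s2                             # standardized predictors

b_raw, t_raw = ols(design(x1, x2), y)
b_cen, t_cen = ols(design(c1, c2), y)
b_std, t_std = ols(design(z1, z2), y)

for name, b, t in [("raw", b_raw, t_raw),
                   ("centered", b_cen, t_cen),
                   ("standardized", b_std, t_std)]:
    print(f"{name:>12}: b1 = {b[1]: .3f} (t = {t[1]: .2f}), "
          f"b3 = {b[3]: .3f} (t = {t[3]: .2f})")
```

In the printed output the interaction slope and its t statistic should agree between the raw and centered fits, the standardized fit should change the interaction slope but not its t statistic, and the slope for X₁ (and its t statistic) should differ between the raw and centered fits.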

We support the recommendation of Aiken & West (1991) and others that multiple regression models with interaction terms should be fitted to data with centered predictor variables. Standardization might also be used if the variables have very different variances, but note that calculation and tests of simple slopes must then be based on analyzing standardized variables but using the unstandardized regression coefficients (Aiken & West 1991).

Probing interactions

Even in the presence of an interaction, we can still interpret the partial regression slopes for other terms in model 6.18. The estimate of β₁ determined by the OLS fit of this regression model is actually the regression slope of Y on X₁ when X₂ is zero. If there is an interaction (β₃ does not equal zero), this slope will obviously change for other values of X₂; if there is not an interaction (β₃ equals zero), then this slope will be constant for all levels of X₂. In the presence of an interaction, the estimated slope for Y on X₁ when X₂ is zero is not very informative because zero is not usually within the range of our observations for any of the predictor variables. If the predictors are centered, however, then the estimate of β₁ is now the regression slope of Y on X₁ for the mean of X₂, a more useful piece of information. This is another reason why variables should be centered before fitting a multiple linear regression model with interaction terms.
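The change in meaning of the lower-order slopes under centering follows from a short substitution. In the sketch below, the starred symbols (centered predictors $x^*_{ij} = x_{ij} - \bar{x}_j$ and the coefficients $\beta^*_j$ of the centered model) are notation introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
y_i &= \beta^*_0 + \beta^*_1 (x_{i1} - \bar{x}_1) + \beta^*_2 (x_{i2} - \bar{x}_2)
       + \beta^*_3 (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \varepsilon_i \\
    &= (\beta^*_0 - \beta^*_1 \bar{x}_1 - \beta^*_2 \bar{x}_2 + \beta^*_3 \bar{x}_1 \bar{x}_2)
       + (\beta^*_1 - \beta^*_3 \bar{x}_2)\, x_{i1}
       + (\beta^*_2 - \beta^*_3 \bar{x}_1)\, x_{i2}
       + \beta^*_3\, x_{i1} x_{i2} + \varepsilon_i
\end{aligned}
```

Matching terms with model 6.18 gives $\beta^*_3 = \beta_3$, $\beta^*_1 = \beta_1 + \beta_3 \bar{x}_2$ and $\beta^*_2 = \beta_2 + \beta_3 \bar{x}_1$. The interaction slope is therefore unchanged by centering, while the centered slope for X₁ is the slope of Y on X₁ evaluated at the mean of X₂.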

However, if the fit of our model indicates that interactions between two or more predictors are important, we usually want to probe these interactions further to see how they are structured. Let’s express our multiple regression model as relating the predicted $\hat{y}_i$ to two predictor variables and their interaction using sample estimates:

$$\hat{y}_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + b_3 x_{i1} x_{i2} \tag{6.20}$$

This can be algebraically rearranged to:

$$\hat{y}_i = (b_1 + b_3 x_{i2}) x_{i1} + (b_2 x_{i2} + b_0) \tag{6.21}$$

We now have ($b_1 + b_3 x_{i2}$), the simple slope of the regression of Y on X₁ for any particular value of X₂ (indicated as $x_{i2}$). We can then choose values of X₂ and calculate the estimated simple slope, for either plotting or significance testing. Cohen & Cohen (1983) and Aiken & West (1991) suggested using three different values of X₂: $\bar{x}_2$, $\bar{x}_2 + s$ and $\bar{x}_2 - s$, where s is the sample standard deviation of X₂. We can calculate simple regression slopes by substituting these values of X₂ into the equation for the simple slope of Y on X₁.
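The substitution into the simple-slope expression can be sketched as follows. This is a minimal example on simulated data, assuming a least-squares fit with NumPy; the function names `simple_slope` and `simple_intercept` and all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 150
x1 = rng.normal(5.0, 2.0, n)
x2 = rng.normal(3.0, 1.0, n)
# Simulated response with an interaction (arbitrary illustrative coefficients)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 1.0, n)

# Fit the interaction model; columns are intercept, X1, X2, X1*X2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]


def simple_slope(x2_value):
    """Slope of Y on X1 at a chosen value of X2: b1 + b3 * x2."""
    return b1 + b3 * x2_value


def simple_intercept(x2_value):
    """Intercept of the simple regression of Y on X1: b2 * x2 + b0."""
    return b2 * x2_value + b0


mean2 = x2.mean()
sd2 = x2.std(ddof=1)          # ddof=1 gives the sample standard deviation

for label, value in [("mean - s", mean2 - sd2),
                     ("mean", mean2),
                     ("mean + s", mean2 + sd2)]:
    print(f"X2 = {label:>8}: slope = {simple_slope(value):.3f}, "
          f"intercept = {simple_intercept(value):.3f}")
```

The three slope and intercept pairs define three simple regression lines of Y on X₁ that could be drawn on one plot to show how the relationship changes with X₂.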

The H₀ that the simple regression slope of Y on X₁ for a particular value of X₂ equals zero can also be tested. The standard error for the simple regression slope is:

$$\sqrt{s_{11}^2 + 2 x_2 s_{13}^2 + x_2^2 s_{33}^2} \tag{6.22}$$

where $s_{11}^2$ and $s_{33}^2$ are the variances of b₁ and b₃ respectively, $s_{13}^2$ is the covariance between b₁ and b₃, and x₂ is the value of X₂ chosen. The variances and covariance are obtained from a covariance matrix of the regression coefficients, usually standard output for regression analyses with most software. Then the usual t test is applied (simple slope divided by standard error of simple slope). Fortunately, simple slope tests can be done easily with most statistical software (Aiken & West 1991, Darlington 1990). For example, we use the following steps to calculate the simple slope of Y on X₁ for a specific value of X₂, such as $\bar{x}_2 + s$.
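The standard error in equation 6.22 can be computed directly from the covariance matrix of the coefficients. The sketch below assumes simulated data and an OLS fit with NumPy, with coefficients ordered as b₀, b₁, b₂, b₃; the function names are hypothetical, and the covariance matrix is estimated in the usual way as the residual mean square times the inverse of X'X.

```python
import numpy as np


def fit_with_covariance(X, y):
    """OLS coefficients, their covariance matrix and the residual df."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    df = X.shape[0] - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return b, cov, df


def simple_slope_test(b, cov, x2_value):
    """Simple slope of Y on X1 at x2_value, its standard error and t."""
    slope = b[1] + b[3] * x2_value
    # Equation 6.22: var(b1) + 2 * x2 * cov(b1, b3) + x2^2 * var(b3)
    se = np.sqrt(cov[1, 1] + 2.0 * x2_value * cov[1, 3]
                 + x2_value ** 2 * cov[3, 3])
    return slope, se, slope / se


rng = np.random.default_rng(7)
n = 150
x1 = rng.normal(5.0, 2.0, n)
x2 = rng.normal(3.0, 1.0, n)
# Simulated response (arbitrary illustrative coefficients)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 1.0, n)

X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, cov, df = fit_with_covariance(X, y)

value = x2.mean() + x2.std(ddof=1)        # one sample SD above the mean of X2
slope, se, t = simple_slope_test(b, cov, value)
print(f"simple slope = {slope:.3f}, SE = {se:.3f}, t = {t:.2f} on {df} df")
```

The t statistic would be compared with a t distribution on the residual degrees of freedom of the fitted model (n minus 4 here).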

1. Create a new variable (called the conditional value of X₂, say CVX₂), which is $x_{i2}$ minus the specific value chosen.

2. Fit a multiple linear regression model for Y on X₁, CVX₂ and X₁ by CVX₂.

3. The partial slope of Y on X₁ from this model is the simple slope of Y on X₁ for the specific value of X₂ chosen.

4. The statistical program then provides a standard error and t test.

This procedure can be followed for any conditional value. Note that we have calculated simple slopes for Y on X₁ at different values of X₂. Conversely, we could have easily calculated simple slopes for Y on X₂ at different values of X₁.
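The four steps above can be sketched in code and cross-checked against the direct calculation from the original fit. This is an illustrative example on simulated data, assuming NumPy least squares in place of a statistical package; `ols`, `design` and `cvx2` are hypothetical names.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, standard errors and coefficient covariance matrix."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    df = X.shape[0] - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(X.T @ X)
    return b, np.sqrt(np.diag(cov)), cov


def design(x1, x2):
    """Design matrix with columns: intercept, X1, X2 and X1*X2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])


rng = np.random.default_rng(7)
n = 150
x1 = rng.normal(5.0, 2.0, n)
x2 = rng.normal(3.0, 1.0, n)
# Simulated response (arbitrary illustrative coefficients)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(0.0, 1.0, n)

value = x2.mean() + x2.std(ddof=1)          # the specific value of X2 chosen

# Step 1: conditional value of X2
cvx2 = x2 - value
# Step 2: fit Y on X1, CVX2 and X1 by CVX2
b_cv, se_cv, _ = ols(design(x1, cvx2), y)
# Steps 3 and 4: the X1 coefficient is the simple slope, with its SE and t
refit_slope, refit_se = b_cv[1], se_cv[1]
refit_t = refit_slope / refit_se

# Cross-check using the original fit and the covariance-matrix formula
b, _, cov = ols(design(x1, x2), y)
direct_slope = b[1] + b[3] * value
direct_se = np.sqrt(cov[1, 1] + 2.0 * value * cov[1, 3] + value ** 2 * cov[3, 3])

print(f"refit:  slope = {refit_slope:.4f}, SE = {refit_se:.4f}, t = {refit_t:.2f}")
print(f"direct: slope = {direct_slope:.4f}, SE = {direct_se:.4f}")
```

Both routes should give the same slope and standard error, because subtracting a constant from X₂ only reparameterizes the same fitted model.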

If we have three predictor variables, we can have three two-way interactions and one three-way interaction:

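$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i1} x_{i2} + \beta_5 x_{i1} x_{i3} + \beta_6 x_{i2} x_{i3} + \beta_7 x_{i1} x_{i2} x_{i3} + \varepsilon_i \tag{6.23}$$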

In this model, β₇ is the regression slope for the three-way interaction between X₁, X₂ and X₃ and measures the dependence of the regression slope of Y on X₁ on the values of different combinations of both X₂ and X₃. Equivalently, the interaction is the dependence of the regression slope of Y on X₂ on values of different combinations of X₁ and X₃ and the dependence of the regression slope of Y on X₃ on values of different combinations of X₁ and X₂. If we focus on the first interpretation, we can determine simple regression equations for Y on X₁ at different combinations of X₂ and X₃ using sample estimates:

$$\hat{y}_i = (b_1 + b_4 x_{i2} + b_5 x_{i3} + b_7 x_{i2} x_{i3}) x_{i1} + (b_2 x_{i2} + b_3 x_{i3} + b_6 x_{i2} x_{i3} + b_0) \tag{6.24}$$

Now we have ($b_1 + b_4 x_{i2} + b_5 x_{i3} + b_7 x_{i2} x_{i3}$) as the simple slope for Y on X₁ for specific values of X₂ and X₃ together. Following the logic we used for models with two predictors, we can substitute values for X₂ and X₃ into this equation for the simple slope. Aiken & West (1991) suggested using $\bar{x}_2$ and $\bar{x}_3$ and the four combinations of $\bar{x}_2 \pm s_{x_2}$ and $\bar{x}_3 \pm s_{x_3}$. Simple slopes for Y on X₂ or X₃ can be calculated by just reordering the predictor variables in the model. Using the linear regression routine in statistical software, simple slopes, their standard errors and t tests for Y on X₁ at specific values of X₂ and X₃ can be calculated.

1. Create two new variables (called the conditional values of X₂ and X₃, say CVX₂ and CVX₃), which are $x_{i2}$ and $x_{i3}$ minus the specific values chosen.

2. For each combination of specific values of X₂ and X₃, fit a multiple linear regression model for Y on X₁, CVX₂, CVX₃, X₁ by CVX₂, X₁ by CVX₃, CVX₂ by CVX₃, and X₁ by CVX₂ by CVX₃.

3. The partial slope of Y on X₁ from this model is the simple slope of Y on X₁ for the chosen specific values of X₂ and X₃.
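The three steps above extend naturally to a loop over the chosen combinations of X₂ and X₃. The sketch below uses simulated data and NumPy least squares, with columns ordered to match the coefficients of the three-predictor model (b₀ to b₇); the helper names and all numerical values are assumptions made for illustration.

```python
import itertools

import numpy as np


def design3(x1, x2, x3):
    """Design matrix for the three-predictor model, columns b0 to b7."""
    return np.column_stack([
        np.ones_like(x1), x1, x2, x3,
        x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3,
    ])


def coefficients(X, y):
    """OLS coefficient estimates."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(11)
n = 200
x1 = rng.normal(5.0, 2.0, n)
x2 = rng.normal(3.0, 1.0, n)
x3 = rng.normal(2.0, 1.0, n)
# Simulated response (arbitrary illustrative coefficients)
y = (1.0 + 0.8 * x1 - 0.5 * x2 + 0.4 * x3 + 0.3 * x1 * x2 - 0.2 * x1 * x3
     + 0.1 * x2 * x3 + 0.05 * x1 * x2 * x3 + rng.normal(0.0, 1.0, n))

b = coefficients(design3(x1, x2, x3), y)
m2, sd2 = x2.mean(), x2.std(ddof=1)
m3, sd3 = x3.mean(), x3.std(ddof=1)

# Both means, plus the four combinations of mean +/- one sample SD
offsets = [(0, 0)] + list(itertools.product([-1, 1], repeat=2))

results = []
for k2, k3 in offsets:
    v2, v3 = m2 + k2 * sd2, m3 + k3 * sd3
    # Steps 1 and 2: conditional values, then refit the full model
    b_cv = coefficients(design3(x1, x2 - v2, x3 - v3), y)
    refit_slope = b_cv[1]                               # step 3
    # Same quantity from the original fit via the simple-slope expression
    direct_slope = b[1] + b[4] * v2 + b[5] * v3 + b[7] * v2 * v3
    results.append((v2, v3, refit_slope, direct_slope))
    print(f"X2 = {v2:.2f}, X3 = {v3:.2f}: "
          f"refit = {refit_slope:.4f}, direct = {direct_slope:.4f}")
```

Standard errors and t tests for each combination would come from the refitted model in the same way as for two predictors.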

With three or more predictor variables, the number of interactions becomes large and they become more complex (three-way interactions and higher). Incorporating all possible interactions in models with numerous predictors becomes unwieldy and we would need a very large sample size because of the number of terms in the model. There are two ways we might decide which interactions to include in a linear regression model, especially if our sample size does not allow us to include them all. First, we can use our biological knowledge to predict likely interactions and only incorporate this subset. For the data from Loyn (1987), we might expect the relationship between bird density and grazing to vary with area (grazing effects more important in small fragments?) and years since isolation (grazing more important in new fragments?), but not with distance to any forest or larger fragments. Second, we can plot the residuals from an additive model against the possible interaction terms (new variables formed by simply multiplying the predictors) to see if any of these interactions are related to variation in the response variable.
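The second approach can be sketched numerically. The example below fits an additive model to simulated data and then, as a stand-in for the residual plots described above, reports the correlation between the residuals and each two-way product of predictors. Using a correlation instead of a plot is a simplification introduced here, and the data and names are hypothetical.

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
n = 1000
predictors = {name: rng.normal(0.0, 1.0, n) for name in ("x1", "x2", "x3")}
x1, x2, x3 = predictors.values()
# Simulated response with a single true interaction, between x1 and x2
y = 1.0 + x1 + x2 + x3 + 0.8 * x1 * x2 + rng.normal(0.0, 1.0, n)

# Additive model: intercept plus the three predictors, no interactions
X_add = np.column_stack([np.ones(n), x1, x2, x3])
b_add = np.linalg.lstsq(X_add, y, rcond=None)[0]
residuals = y - X_add @ b_add

# Relate the residuals to each candidate two-way interaction term
screen = {}
for (name_a, a), (name_b, b) in itertools.combinations(predictors.items(), 2):
    product = a * b                     # new variable: product of two predictors
    screen[f"{name_a}*{name_b}"] = np.corrcoef(residuals, product)[0, 1]

for term, r in screen.items():
    print(f"residuals vs {term}: r = {r:.3f}")

strongest = max(screen, key=lambda term: abs(screen[term]))
print("strongest candidate interaction:", strongest)
```

In practice each product would be plotted against the residuals, since a scatterplot can reveal nonlinear patterns that a single correlation would miss.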

There are two take-home messages from this section. First, we should consider interactions between continuous predictors in multiple linear regression models because such interactions may be common in biological data. Second, these interactions can be further explored and interpreted using relatively straightforward statistical techniques with most linear regression software.

In document Experimental Design and Data Analysis for Biologists - Quinn & Keough - Cambridge 2002 (Page 150-153)