
Box 6.7 Worked example of indicator (dummy) variables

We will consider a subset of the data from Loyn (1987) where abundance of forest birds is the response variable and grazing intensity (1 to 5 from least to greatest) and log10 patch area are the predictor variables. First, we treat grazing as a continuous variable and fit model 6.28.

Coefficient   Estimate   Standard error   t        P
Intercept     21.603     3.092            6.987    <0.001
Grazing       -2.854     0.713            -4.005   <0.001
Log10area     6.890      1.290            5.341    <0.001

Note that both the effects of grazing and log10area are significant and the partial regression slope for grazing is negative, indicating that, holding patch area constant, there are fewer birds in patches with more intense grazing.

Now we will convert grazing into four dummy variables with no grazing (level 1) as the reference category (Table 6.3) and fit model 6.29.

Coefficient   Estimate   Standard error   t        P
Intercept     15.716     2.767            5.679    <0.001
Grazing1      0.383      2.912            0.131    0.896
Grazing2      -0.189     2.549            -0.074   0.941
Grazing3      -1.592     2.976            -0.535   0.595
Grazing4      -11.894    2.931            -4.058   <0.001
Log10area     7.247      1.255            5.774    <0.001

The partial regression slopes for these dummy variables measure the difference in bird abundance between the grazing category represented by the dummy variable and the reference category for any specific level of log10area. Note that only the effect of intense grazing (category: 5; dummy variable: grazing4) is different from the no grazing category.
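The dummy coding used in Box 6.7 can be sketched in a few lines of Python. This is only an illustration: the coefficients are copied from the table for model 6.29, while the function names are hypothetical and the mapping of grazing1 to grazing3 onto grazing levels 2 to 4 is an assumption (the box only states explicitly that level 1 is the reference and that grazing4 represents level 5).

```python
# Coefficients copied from the dummy-variable fit in Box 6.7 (model 6.29).
INTERCEPT = 15.716
LOG10AREA_SLOPE = 7.247
# Assumed mapping: grazing1..grazing4 represent grazing levels 2..5.
GRAZING_DUMMY_SLOPES = {2: 0.383, 3: -0.189, 4: -1.592, 5: -11.894}


def dummy_code(grazing_level):
    """Return the four 0/1 indicators for a grazing level (1 = reference)."""
    if grazing_level not in (1, 2, 3, 4, 5):
        raise ValueError("grazing level must be an integer from 1 to 5")
    return [1 if grazing_level == level else 0 for level in (2, 3, 4, 5)]


def predicted_abundance(grazing_level, log10area):
    """Fitted bird abundance from the Box 6.7 dummy-variable coefficients."""
    dummies = dummy_code(grazing_level)
    slopes = [GRAZING_DUMMY_SLOPES[level] for level in (2, 3, 4, 5)]
    grazing_effect = sum(d * b for d, b in zip(dummies, slopes))
    return INTERCEPT + grazing_effect + LOG10AREA_SLOPE * log10area


if __name__ == "__main__":
    # Compare the reference category with the most heavily grazed category
    # at the same (arbitrary) patch area, log10area = 1.
    for level in (1, 5):
        print(level, dummy_code(level), round(predicted_abundance(level, 1.0), 3))
```

At any fixed log10area, the difference between the fitted values for level 5 and level 1 equals the grazing4 slope (-11.894), which is the sense in which each dummy slope measures a difference from the reference category.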

the reference category (zero grazing) for any specific value of log10 area. Using analysis of covariance terminology (Chapter 12), each regression slope measures the difference in the adjusted mean of Y between that category and the reference category (Box 6.7). Interaction terms between the dummy variables and the continuous variable could also be included. These interactions measure how much the slopes of the regressions between Y and log10 area differ between the levels of grazing. Most statistical software now automates the coding of categorical variables in regression analyses, although you should check what form of coding your software uses. Models that incorporate continuous and categorical predictors will also be considered as part of analysis of covariance in Chapter 12.

6.1.15 Finding the “best” regression model

In many uses of multiple regression, biologists want to find the smallest subset of predictors that provides the “best fit” to the observed data. There are two apparent reasons for this (Mac Nally 2000), related to the two main purposes of regression analysis – explanation and prediction. First, the “best” subset of predictors should include those that are most important in explaining the variation in the response variable. Second, other things being equal, the precision of predictions from our fitted model will be greater with fewer predictor variables in the model. Note that, as we said in the introduction to Chapter 5, biologists, especially ecologists, seem to rarely use their regression models for prediction and we agree with Mac Nally (2000) that biologists are usually searching for the “best” regression model to explain the response variable.

It is important to remember that there will rarely be, for any real data set, a single “best” subset of predictors, particularly if there are many predictors and they are in any way correlated with each other. There will usually be a few models, with different numbers of predictors, which provide similar fits to the observed data. The choice between these competing models will still need to be based on how well the models meet the assumptions, diagnostic considerations of outliers and other influential observations and biological knowledge of the variables retained.

Criteria for “best” model

Irrespective of which method is used for selecting which variables are included in the model (see below), some criterion must be used for deciding which is the “best” model. One characteristic of such a criterion is that it must protect against “overfitting”, where the addition of extra predictor variables may suggest a better fit even when these variables actually add very little to the explanatory power. For example, r² cannot decrease as more predictor variables are added to the model even if those predictors contribute nothing to the ability of the model to predict or explain the response variable (Box 6.8). So r² is not suitable for comparing models with different numbers of predictors.
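The claim that r² cannot decrease when a predictor is added can be checked numerically. The following is a minimal sketch on simulated data, not on the Loyn (1987) data: the helper names, the sample size and the random seed are all assumptions made for illustration, and the least squares fit is done by solving the normal equations directly so that the example needs only the standard library.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(predictors, y):
    """r^2 from a least squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


random.seed(0)  # fixed seed so the sketch is repeatable
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]
junk = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction
y = [2.0 + 1.5 * a + random.gauss(0, 1) for a in x1]

r2_one = r_squared([[a] for a in x1], y)
r2_two = r_squared([[a, b] for a, b in zip(x1, junk)], y)
print(round(r2_one, 4), round(r2_two, 4))
```

Because the one-predictor model is a special case of the two-predictor model (with the extra slope fixed at zero), the residual sum of squares of the larger model can never be greater, so the second printed value is never smaller than the first, whatever the seed.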

We are usually dealing with a range of models, with different numbers of predictors, but all are subsets of the full model with all predictors. We will use P to indicate the total number of possible predictors and p the number of predictors included in a specific model; n is the number of observations and we will assume that an intercept is always fitted. If the models are all additive, i.e. no interactions, the number of parameters is p + 1 (the number of predictors plus the intercept). When interactions are included, then p in the equations below should be the number of parameters (except the intercept) in the model, including both predictors and their interactions. We will describe four criteria for determining the fit of a model to the data (Table 6.4).

The first is the adjusted r², which takes into account the number of predictors in the model and, in contrast to the usual r², basically uses mean squares instead of sum of squares and can increase or decrease as new variables are added to the model. A larger value indicates a better fit. Using the MSResidual from the fit of the model is equivalent, where a lower value indicates a better fit.
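As a small illustration, the adjusted r² can be computed from the ordinary r² with the standard formula 1 - (1 - r²)(n - 1)/(n - p - 1), which is the same as 1 - MSResidual/MSTotal. The function name and the numbers in the example below are hypothetical and chosen only to show the behaviour; they do not come from the Loyn (1987) data.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted r^2 for a model with p predictors plus an intercept.

    r2 is the ordinary r^2 and n is the number of observations.
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 to have residual degrees of freedom")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Hypothetical values: a third predictor lifts r^2 only from 0.50 to 0.51.
    print(adjusted_r_squared(0.50, n=11, p=2))
    print(adjusted_r_squared(0.51, n=11, p=3))
```

In this made-up case the ordinary r² rises slightly with the third predictor, but the adjusted r² falls (from 0.375 to about 0.30) because the small gain in fit does not offset the lost residual degree of freedom.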

The second is Mallows’ Cp, which works by comparing a specific reduced model to the full model with all P predictors included. For the full model with all P predictors, Cp will equal P + 1 (the number of parameters including the intercept). The choice of the best model using Cp has two components: Cp should be as small as possible and as close to p as possible.
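A minimal sketch of the calculation, assuming the standard definition Cp = SSResidual(reduced model)/MSResidual(full model) - [n - 2(p + 1)]. The function name and the sums of squares used in the example are hypothetical, not values from the text.

```python
def mallows_cp(ss_residual_reduced, ms_residual_full, n, p):
    """Mallows' Cp for a reduced model with p predictors plus an intercept.

    ss_residual_reduced: residual sum of squares of the reduced model.
    ms_residual_full: residual mean square of the full model (all P predictors).
    n: number of observations.
    """
    return ss_residual_reduced / ms_residual_full - (n - 2 * (p + 1))


if __name__ == "__main__":
    # Hypothetical full model: n = 25 observations, P = 4 predictors.
    n, P = 25, 4
    ss_residual_full = 120.0
    ms_residual_full = ss_residual_full / (n - P - 1)  # 120 / 20 = 6

    # The full model compared with itself gives Cp = P + 1.
    print(mallows_cp(ss_residual_full, ms_residual_full, n, P))

    # Hypothetical reduced model with p = 2 predictors.
    print(mallows_cp(138.0, ms_residual_full, n, 2))
```

The first call reproduces the property stated above: for the full model, SSResidual/MSResidual equals n - P - 1, so Cp reduces to P + 1.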


Box 6.8 Hierarchical partitioning and model selection.
