
Estimation and Hypothesis Testing in Multilevel Regression

3.2 SIGNIFICANCE TESTING AND CONFIDENCE INTERVALS

3.2.2 Comparing nested models

From the likelihood function we can calculate a statistic called the deviance that indicates how well the model fits the data. The deviance is defined as −2 × ln(Likelihood), where Likelihood is the value of the likelihood function at convergence, and ln is the natural logarithm. In general, models with a lower deviance fit better than models with a higher deviance. If two models are nested, which means that a specific model can be derived from a more general model by removing parameters from the general model, we can compare them statistically using their deviances. The difference of the deviances for two nested models has a chi-square distribution, with degrees of freedom equal to the difference in the number of parameters estimated in the two models. This can be used to perform a formal chi-square test of whether the more general model fits significantly better than the simpler model. The deviance difference test is also referred to as the likelihood ratio test, since the ratio of two likelihoods is compared by looking at the difference of their logarithms.
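The link between the deviance difference and the likelihood ratio can be written out in one line. The symbols D, L₀, L₁, p₀ and p₁ below are notation assumed here for illustration (subscript 0 for the simpler model, subscript 1 for the more general model); they do not appear in the text above.

```latex
\begin{align*}
D &= -2 \ln L \\
D_0 - D_1 &= -2 \ln L_0 + 2 \ln L_1 = -2 \ln \frac{L_0}{L_1} \\
D_0 - D_1 &\sim \chi^2_{\,p_1 - p_0} \quad \text{(asymptotically, under the simpler model)}
\end{align*}
```

Here p₁ − p₀ is the difference in the number of estimated parameters between the two nested models.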

The chi-square test of the deviances can be used to good effect to explore the importance of random effects, by comparing a model that contains these effects with a model that excludes them.

Table 3.1 presents two models for the pupil popularity data used as an example in Chapter 2. The first model contains only an intercept. The second model adds two pupil-level variables and a teacher-level variable, with the pupil-level variable extraversion having random slopes at the second (class) level. To test the second-level variance component σu0² using the deviance difference test, we remove it from model M0. The resulting model (not presented in Table 3.1) produces a deviance of 6970.4, and the deviance difference is 642.9. Since the modified model estimates one parameter less, this is referred to the chi-square distribution with one degree of freedom. The result is obviously significant.
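As a rough check on the arithmetic, the test of the intercept variance can be sketched with the Python standard library alone. The helper name `chi2_sf_1df` is an assumption made for this illustration; it relies on the fact that a chi-square variate with one degree of freedom is a squared standard normal variate. The p-value shown is the standard one, without any adjustment.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(chi2 with 1 df > x).

    Hypothetical helper: uses chi2_1 = Z**2, so P(Z**2 > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

deviance_m0 = 6327.5      # intercept-only model M0 (Table 3.1)
deviance_no_u0 = 6970.4   # M0 with the class-level intercept variance removed

diff = deviance_no_u0 - deviance_m0   # deviance difference reported in the text
p_value = chi2_sf_1df(diff)           # referred to chi-square with 1 df

print(f"deviance difference = {diff:.1f}, p = {p_value:.3g}")
```

The p-value is far below any conventional significance level, which is what the text means by "obviously significant".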

Table 3.1 Intercept-only model and model with explanatory variables

Model                 M0: intercept only    M1: with predictors
Fixed part            Coefficient (s.e.)    Coefficient (s.e.)
Intercept             5.08 (.09)            0.74 (.20)
Pupil gender                                1.25 (.04)
Pupil extraversion                          0.45 (.02)
Teacher experience                          0.09 (.01)
Random part
σ²e                   1.22 (.04)            0.55 (.02)
σu0²                  0.69 (.11)            1.28 (.28)
σu02                                        −0.18 (.05)
σu2²                                        0.03 (.008)
Deviance              6327.5                4812.8

The variance of the regression coefficient for pupil gender is estimated as zero, and therefore it is removed from the model. A formal test is not necessary. In model M1 in Table 3.1 this variable is treated as fixed: no variance component is estimated. To test the significance of the variance of the extraversion slopes, we must remove the variance parameter from the model. This presents us with a problem, since there is also a covariance parameter σu02 associated with the extraversion slopes. If we remove both the variance and the covariance parameter from the model, we are testing a combined hypothesis on two degrees of freedom. It is better to separate these tests. Some software (e.g., MLwiN) actually allows us to remove the variance of the slopes from the model but to retain the covariance parameter. This is a strange model, but for testing purposes it allows us to carry out a separate test on the variance parameter only. Other software (e.g., MLwiN, SPSS, SAS) allows the removal of the covariance parameter, while keeping the variance in the model. If we modify model M1 this way, the deviance increases to 4851.9. The difference is 39.1, which is a chi-square variate with one degree of freedom, and highly significant. If we modify the model further, by removing the slope variance, the deviance increases again to 4862.3. The difference with the previous model is 10.4, again with one degree of freedom, and it is highly significant.
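The two successive one-degree-of-freedom tests can be sketched as a chain of nested models, each compared with the one before it. The names `steps`, `results`, and `chi2_sf_1df` are assumptions made for this illustration, and the p-values are the unadjusted ones.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """P(chi2 with 1 df > x); hypothetical helper based on chi2_1 = Z**2."""
    return math.erfc(math.sqrt(x / 2.0))

# Deviances reported in the text, ordered from the most general model down.
steps = [
    ("M1 (Table 3.1)", 4812.8),
    ("M1 without intercept-slope covariance", 4851.9),
    ("M1 without covariance and without slope variance", 4862.3),
]

results = []
for (_, dev_general), (name, dev_restricted) in zip(steps, steps[1:]):
    diff = dev_restricted - dev_general   # each step removes one parameter
    results.append((name, diff, chi2_sf_1df(diff)))

for name, diff, p in results:
    print(f"{name}: difference = {diff:.1f}, df = 1, p = {p:.3g}")
```

Each step drops exactly one parameter, so each difference is referred to the chi-square distribution with one degree of freedom.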

Asymptotically, the Wald test and the test using the chi-square difference are equivalent. In practice, the Wald test and the chi-square difference test do not always lead to the same conclusion. If a variance component is tested, the chi-square difference test is clearly better, except when models are estimated where the likelihood function is only an approximation, as in the logistic models discussed in Chapter 6.

When the chi-square difference test is used to test a variance component, it should be noted that the standard application leads to a p-value that is too high. The reason is that the null-hypothesis of zero variance is on the boundary of the parameter space (all possible parameter values) since variances cannot be negative. If the null-hypothesis is true, there is a 50% chance of finding a positive variance, and a 50% chance of finding a negative variance. Negative variances are inadmissible, and the usual procedure is to change the negative estimate to zero. Thus, under the null-hypothesis the chi-square statistic has a mixture distribution of 50% zero and 50% chi-square with one degree of freedom. Therefore, the p-value from the chi-square difference test must be divided by two if a variance component is tested (Berkhof & Snijders, 2001). If we test a slope variance, and remove both the slope variance and the covariance from the model, the mixture is more complicated, because we have a mixture of 50% chi-square with one degree of freedom for the unconstrained intercept–slope covariance and 50% chi-square with two degrees of freedom for the covariance and the variance that is constrained to be non-negative (Verbeke & Molenberghs, 2000). The p-value for this mixture is calculated using

p = 0.5 P(χ²₁ > C²) + 0.5 P(χ²₂ > C²)

where C² is the difference in the deviances of the model with and without the slope variance and intercept–slope covariance. Stoel, Galindo, Dolan, and van den Wittenboer (2006) discuss how to carry out such tests in general. If it is possible to remove the intercept–slope covariance from the model, it is possible to test the significance of the slope variance with a one degree of freedom test, and we can simply halve the p-value again.

For the regression coefficients, the chi-square test (only in combination with FML estimation) is in general also superior. The reason is that the Wald test is to some degree sensitive to the parameterization of the model and the specific restrictions to be tested (Davidson & MacKinnon, 1993, Chapter 13.5–13.6). The chi-square test is invariant under reparameterizations of the model. Since the Wald test is much more convenient, it is in practice used the most, especially for the fixed effects. Even so, if there is a discrepancy between the result of a chi-square difference test and the equivalent Wald test, the chi-square difference test is generally the preferred one.
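The boundary corrections for variance components described above (halving the p-value for a single variance, and the 50:50 mixture of chi-square distributions with one and two degrees of freedom for a slope variance plus covariance) can be sketched as follows. The function names are assumptions made for this illustration. The value 49.5 is not reported in the text; it is computed here as 4862.3 − 4812.8, the deviance difference between M1 and the model without slope variance and covariance.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """P(chi2 with df degrees of freedom > x), closed forms for df = 1 and 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))   # chi2_1 is a squared standard normal
    if df == 2:
        return math.exp(-x / 2.0)              # chi2_2 is exponential with mean 2
    raise ValueError("this sketch only supports df = 1 or df = 2")

def halved_p(c2: float) -> float:
    """Single variance tested: mixture of 50% zero and 50% chi2 with 1 df."""
    return 0.5 * chi2_sf(c2, 1)

def mixture_p(c2: float) -> float:
    """Slope variance and covariance removed together: 50% chi2_1 + 50% chi2_2."""
    return 0.5 * chi2_sf(c2, 1) + 0.5 * chi2_sf(c2, 2)

c2_slope_only = 4862.3 - 4851.9   # slope variance only (covariance already removed)
c2_both = 4862.3 - 4812.8         # slope variance and covariance removed from M1

print(f"slope variance only: C2 = {c2_slope_only:.1f}, p = {halved_p(c2_slope_only):.3g}")
print(f"variance + covariance: C2 = {c2_both:.1f}, p = {mixture_p(c2_both):.3g}")
```

The mixture p-value always lies between the one- and two-degree-of-freedom p-values, so it is smaller than the p-value from an uncorrected two-degree-of-freedom test.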

LaHuis and Ferguson (2009) compare, among others, the chi-square deviance test and the chi-square residuals test described above. In their simulation, all tests controlled the type I error well, and the deviance difference test (dividing p by two, as described above) generally performed best in terms of power.

In document 2010 Hox (Page 58-61)