
4.3 Multilevel modelling

4.3.2 Description of the model

To explain the multilevel model (Woltman, 2012; StataCorp, 2013), this section starts from the simplest possible regression model (i.e. a model for the mean of the dependent variable with no explanatory variables) and builds it up step by step into the multilevel model. The simplest regression model is:

$$y_i = \beta_0 + e_i \tag{4.3}$$

where:

$y_i$ = dependent variable;

$\beta_0$ = the mean of $y$;

$e_i$ = the residual, i.e. the difference between an individual's $y$ value and the population mean.

Moving to the simplest two-level random effect model (equation (4.4)), the residuals are split into two components: the group-level residuals or group random effects ($u_j$) and the individual residuals ($e_{ij}$).

$$y_{ij} = \beta_0 + u_j + e_{ij}$$

$$e_{ij} \sim N(0, \sigma_e^2), \qquad u_j \sim N(0, \sigma_u^2) \tag{4.4}$$

where:

$u_j$ = the difference between group $j$'s mean and the overall mean;

$e_{ij}$ = the difference between the $y$ value for the $i$th individual and that individual's group mean.

Residuals at both levels are assumed to follow normal distributions with zero means. The total variance is therefore partitioned into two components: the between-group variance $\sigma_u^2$, based on the deviation of group means from the overall mean, and the within-group between-individual variance $\sigma_e^2$, based on individual differences from the group means.
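The variance partition can be illustrated with a small simulation. The sketch below is not part of the original analysis: it assumes a balanced design, hypothetical parameter values and a simple method-of-moments (one-way ANOVA) estimator rather than the maximum likelihood estimation a multilevel package would use. The function names are illustrative only.

```python
import numpy as np

def simulate_two_level(n_groups, n_per_group, beta0, sigma_u, sigma_e, seed=0):
    """Draw y_ij = beta0 + u_j + e_ij for a balanced two-level design."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, size=n_groups)                  # group effects u_j
    e = rng.normal(0.0, sigma_e, size=(n_groups, n_per_group))   # individual residuals e_ij
    return beta0 + u[:, None] + e                                # shape (n_groups, n_per_group)

def moment_estimates(y):
    """Method-of-moments estimates of sigma_u^2 and sigma_e^2 (balanced data)."""
    n_groups, n = y.shape
    group_means = y.mean(axis=1)
    # Within-group variance: pooled variance of individuals around their group mean.
    sigma_e2 = ((y - group_means[:, None]) ** 2).sum() / (n_groups * (n - 1))
    # The variance of the group means is sigma_u^2 + sigma_e^2 / n, so remove the second term.
    sigma_u2 = group_means.var(ddof=1) - sigma_e2 / n
    return sigma_u2, sigma_e2

# Hypothetical values: sigma_u^2 = 1 and sigma_e^2 = 4.
y = simulate_two_level(n_groups=2000, n_per_group=20, beta0=5.0, sigma_u=1.0, sigma_e=2.0)
sigma_u2_hat, sigma_e2_hat = moment_estimates(y)
print(sigma_u2_hat, sigma_e2_hat)  # estimates should land near 1 and 4
```

With many groups, the two estimates should add up to roughly the overall variance of y, which is the partition described above.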

4.3.2.1 Testing for group effects

It is important to test for group effects, i.e. to test whether a multilevel model describes the data better than a single-level one. The method used for this purpose is the likelihood ratio (LR) test, a general statistical test for comparing the goodness of fit of two nested models (the null model and the alternative one). Applying the LR test to the models described by equations (4.3) and (4.4) tests the null hypothesis that there are no group effects, $H_0: \sigma_u^2 = 0$ (i.e. $H_0$: the single-level model is true vs. $H_A$: the multilevel model is true). The test statistic is twice the difference in the log-likelihoods:

$$LR = 2 \times (\text{log-likelihood of the alternative model} - \text{log-likelihood of the null model})$$

In this case, the alternative model is the multilevel model and the null model is the single-level one. The test statistic LR is compared with a chi-squared distribution with degrees of freedom equal to the number of extra parameters in the more complex model. The multilevel model (equation (4.4)) has one additional parameter, the between-group variance $\sigma_u^2$, so there is 1 degree of freedom. Rejection of the null hypothesis implies that there are 'real' group differences, in which case the multilevel model is preferred over the single-level model. If the null hypothesis cannot be rejected, further exploration is still needed before settling on a single-level model, since between-group differences may be revealed after adding explanatory variables.
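As a minimal sketch of the test described above, the function below computes the LR statistic and its chi-squared p-value for 1 degree of freedom. The log-likelihood values are hypothetical and the function name is illustrative; in practice the two log-likelihoods would come from the fitted single-level and multilevel models.

```python
import math

def lr_test_one_df(loglik_null, loglik_alt):
    """Likelihood ratio test with 1 degree of freedom.

    Returns the LR statistic and the chi-squared(1) upper-tail p-value.
    """
    lr = 2.0 * (loglik_alt - loglik_null)
    lr = max(lr, 0.0)  # guard against tiny negative values caused by numerical error
    # For 1 df, P(chi2 > lr) = P(|Z| > sqrt(lr)) = erfc(sqrt(lr / 2)).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods, chosen only for illustration.
lr, p = lr_test_one_df(loglik_null=-1520.4, loglik_alt=-1512.9)
print(f"LR = {lr:.2f}, p = {p:.5f}")
```

Because the null value of the variance lies on the boundary of the parameter space, this p-value is conservative; some software reports a halved p-value for this reason.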

4.3.2.2 Interpret variance components

There are two coefficients that describe the variance that is due to the hierarchical structure of the data: the Variance Partition Coefficients (VPCs) and the Intraclass Correlation Coefficients (ICCs). The ICC measures the correlation (i.e. similarity or homogeneity) of the observations within a given cluster:

$$ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \tag{4.5}$$

The more characteristics the observations in the same cluster have in common, the larger the ICC. The variance partition coefficient, in turn, reports the proportion of the observed response variation that lies at each level of the model hierarchy; at the group level, it is the share of the variation that is due to differences between groups. It establishes the relative importance of each level to the variation of the observations:

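$$VPC_u = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \quad \text{for level 2}, \qquad VPC_e = \frac{\sigma_e^2}{\sigma_u^2 + \sigma_e^2} \quad \text{for level 1} \tag{4.6}$$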

If the observations do not differ statistically from one group to another, the VPC equals 0. For the two-level model the VPC and the ICC are equivalent, but this changes in more complex models. For example, for level 2 in a three-level model, where $\sigma_v^2$ is the level-3 variance:

$$VPC_s = \frac{\sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}, \qquad ICC_s = \frac{\sigma_v^2 + \sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}$$
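A short numerical sketch may help to see how the two coefficients coincide in a two-level model and diverge in a three-level one. The variance components below are hypothetical and the function names are illustrative only.

```python
def vpc(variances):
    """Variance partition coefficients: each level's share of the total variance.

    `variances` maps a level name to its variance component.
    """
    total = sum(variances.values())
    return {level: value / total for level, value in variances.items()}

def icc_three_level(sigma_v2, sigma_u2, sigma_e2):
    """ICC for two observations in the same level-2 unit of a three-level model."""
    return (sigma_v2 + sigma_u2) / (sigma_v2 + sigma_u2 + sigma_e2)

# Hypothetical variance components, chosen only for illustration.
two_level = vpc({"u": 1.0, "e": 3.0})
three_level = vpc({"v": 1.0, "u": 1.0, "e": 2.0})
print(two_level["u"])                   # 0.25, which is also the two-level ICC
print(three_level["u"])                 # 0.25
print(icc_three_level(1.0, 1.0, 2.0))   # 0.5
```

In the three-level case the level-2 VPC counts only the level-2 variance, while the ICC also counts the level-3 variance shared by observations in the same level-2 unit, so the two values differ.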

4.3.2.3 Random intercept model

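Following the description of the model, the next step is to add an explanatory variable defined at level 1 and denoted by $x_{ij}$. The equations become:

$$y_{ij} = \beta_{0j} + \beta_{10} x_{ij} + e_{ij} \qquad \text{Level 1} \tag{4.7}$$

$$\beta_{0j} = \gamma_{00} + u_j \qquad \text{Level 2} \tag{4.8}$$

By replacing $\beta_{0j}$ in equation (4.7) with equation (4.8), the resulting equation is the following:

$$y_{ij} = \gamma_{00} + \beta_{10} x_{ij} + u_j + e_{ij} \tag{4.9}$$

$$e_{ij} \sim N(0, \sigma_e^2), \qquad u_j \sim N(0, \sigma_u^2) \tag{4.10}$$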

This model is called a random intercept model because the intercept of the group regression lines is allowed to vary randomly across groups. The overall relationship between the dependent variable $y$ and the explanatory variable $x$ is represented by a straight line with intercept $\gamma_{00}$ and slope $\beta_{10}$. A multilevel model can be thought of as consisting of two components: a fixed part, which specifies the relationship between the mean of $y$ and the explanatory variables, and a random part, which contains the level 1 and level 2 residuals. In equation (4.9), the fixed part is $\gamma_{00} + \beta_{10} x_{ij}$ and the random part is $u_j + e_{ij}$. The fixed part is extended by adding more predictors, while the random part is extended by allowing the effect of one or more predictors to vary across groups or by allowing the within-group variance to depend on explanatory variables.

As mentioned above, the intercept may vary from group to group, but the slope $\beta_{10}$ is the same for all groups. The predicted regression lines for the different groups are therefore parallel, as shown in Figure 4.7.

[Plot of $y$ against $x$: the average line $y = \beta_0 + \beta_1 x$ with slope $\beta_1$, and parallel lines for Groups 1 to 4 offset vertically by $u_1$, $u_2$, $u_3$ and $u_4$]

Figure 4.7: Prediction lines from a random intercept model for 4 different groups
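The parallel lines of a random intercept model can also be checked numerically. The sketch below uses hypothetical coefficient values and group effects (none of them come from the data in this work) and an illustrative function name.

```python
import numpy as np

def random_intercept_lines(x, gamma00, beta10, u):
    """Predicted line for each group j: (gamma00 + u_j) + beta10 * x.

    Returns an array of shape (n_groups, len(x)).
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    return (gamma00 + u)[:, None] + beta10 * x[None, :]

x = np.linspace(0.0, 10.0, 5)
u = np.array([1.5, -0.5, 0.8, -1.8])   # hypothetical group effects u_1..u_4
lines = random_intercept_lines(x, gamma00=2.0, beta10=0.7, u=u)

# The gap between any two group lines does not depend on x, so the lines are parallel.
gap = lines[0] - lines[1]
print(np.allclose(gap, u[0] - u[1]))   # True
```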

4.3.2.4 Random Intercepts and Slopes Model (Two-level random effect multilevel model)

Sometimes the effect of the explanatory variable may differ from group to group. A random slope model allows each group line to have a different slope.

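$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2) \qquad \text{Level 1} \tag{4.11}$$

$$\beta_{0j} = \gamma_{00} + u_{0j} \qquad \text{Level 2} \tag{4.12}$$

$$\beta_{1j} = \gamma_{10} + u_{1j} \qquad \text{Level 2} \tag{4.13}$$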

where:

$y_{ij}$ = dependent variable measured for the $i$th level-1 unit nested within the $j$th level-2 unit;

$\beta_{0j}$ = intercept for the $j$th level-2 unit;

$\beta_{1j}$ = regression coefficient associated with $x_{ij}$ for the $j$th level-2 unit;

$e_{ij}$ = random error associated with the $i$th level-1 unit nested within the $j$th level-2 unit;

$\gamma_{00}$ = overall mean intercept;

$\gamma_{10}$ = overall mean slope;

$u_{0j}$ = random effect of the $j$th level-2 unit, adjusted for $x_{ij}$, on the intercept;

$u_{1j}$ = random effect of the $j$th level-2 unit, adjusted for $x_{ij}$, on the slope.

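Now the slope of the average regression line is $\gamma_{10}$ and the slope of the line for group $j$ is $\gamma_{10} + u_{1j}$. Substituting $\beta_{0j}$ and $\beta_{1j}$ from equations (4.12) and (4.13) into equation (4.11) gives:

$$y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2) \tag{4.14}$$

$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N(0, \Omega_u), \qquad \Omega_u = \begin{bmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{bmatrix} \tag{4.15}$$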

Figure 4.8 shows the prediction lines (the average regression line and the prediction lines for four different groups) from a random slope and random intercept model.

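[Plot of $y$ against $x$: the average line $y = \beta_0 + \beta_1 x$ with slope $\beta_1$, and lines for Groups 1 to 4 with intercept offsets $u_{01}$ to $u_{04}$ and slope offsets $u_{11}$ to $u_{14}$]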

Figure 4.8: Prediction lines from a random intercept and random slope model for 4 different groups
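To make the role of the covariance matrix of the random effects concrete, the sketch below draws intercept and slope effects jointly from a bivariate normal distribution and builds the resulting group lines. All numerical values are hypothetical and the function names are illustrative; the sketch only simulates the random intercept and slope structure and does not estimate it.

```python
import numpy as np

def draw_random_effects(n_groups, sigma_u0_sq, sigma_u1_sq, sigma_u01, seed=0):
    """Draw (u_0j, u_1j) pairs from N(0, Omega_u)."""
    omega_u = np.array([[sigma_u0_sq, sigma_u01],
                        [sigma_u01, sigma_u1_sq]])
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(np.zeros(2), omega_u, size=n_groups)

def group_lines(x, gamma00, gamma10, effects):
    """Line for group j: (gamma00 + u_0j) + (gamma10 + u_1j) * x."""
    x = np.asarray(x, dtype=float)
    intercepts = gamma00 + effects[:, 0]
    slopes = gamma10 + effects[:, 1]
    return intercepts[:, None] + slopes[:, None] * x[None, :]

# Hypothetical variances and covariance, chosen only for illustration.
effects = draw_random_effects(n_groups=4, sigma_u0_sq=1.0, sigma_u1_sq=0.04, sigma_u01=-0.1)
lines = group_lines(np.linspace(0.0, 10.0, 5), gamma00=2.0, gamma10=0.7, effects=effects)
print(lines.shape)   # (4, 5)
```

A negative covariance, as assumed here, means that groups with a higher intercept tend to have a flatter slope, so the group lines are no longer parallel.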

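A level 2 explanatory variable ($G_j$) can be included in a multilevel model in the same way as a level 1 variable. The composite equation can be expressed as:

$$Y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} G_j + \gamma_{11} G_j x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij} \tag{4.16}$$

Here $\gamma_{10} x_{ij}$ is the level-1 predictor term, $\gamma_{01} G_j$ is the level-2 predictor term and $\gamma_{11} G_j x_{ij}$ is the cross-level term.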
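One way to see where the composite equation comes from is to let the level-2 variable enter both level-2 equations. The two level-2 equations below are an assumption, since the text gives only the composite form; they follow the usual convention in which $\gamma_{01}$ is the effect of $G_j$ on the intercept and $\gamma_{11}$ is its effect on the slope.

```latex
\begin{align*}
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij} && \text{Level 1} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01} G_j + u_{0j} && \text{Level 2 (assumed)} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} G_j + u_{1j} && \text{Level 2 (assumed)} \\
y_{ij} &= (\gamma_{00} + \gamma_{01} G_j + u_{0j}) + (\gamma_{10} + \gamma_{11} G_j + u_{1j}) x_{ij} + e_{ij} \\
       &= \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} G_j + \gamma_{11} G_j x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij}
\end{align*}
```

The cross-level term appears because $G_j$ is allowed to modify the slope of $x_{ij}$.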

4.3.2.5 Three-level random effect multilevel model

Finally, the equations of a three-level mixed effect model are presented to show how the previous two-level equations extend to more levels. As mentioned above, this model will be used for the two datasets. A three-level random-effects linear regression model for a single explanatory variable ($x$) can be written as (StataCorp, 2013):


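$$Y_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk} + e_{ijk} \qquad \text{Level 1} \tag{4.17}$$

$$\beta_{0jk} = \delta_{00k} + u_{0jk}; \qquad \beta_{1jk} = \delta_{10k} + u_{1jk} \qquad \text{Level 2} \tag{4.18}$$

$$\delta_{00k} = \gamma_{000} + \vartheta_{00k}; \qquad \delta_{10k} = \gamma_{100} + \vartheta_{10k} \qquad \text{Level 3} \tag{4.19}$$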

The composite equation can be expressed as:

$$Y_{ijk} = \gamma_{000} + (\gamma_{100} + u_{1jk} + \vartheta_{10k}) x_{ijk} + \vartheta_{00k} + u_{0jk} + e_{ijk} \tag{4.20}$$

In which $Y_{ijk}$ is the dependent variable for the $i$th level-1 unit nested within the $j$th level-2 unit nested within the $k$th level-3 unit, $\gamma_{000}$ is the final model intercept, $u_{0jk}$ is the random trip-level intercept, $\vartheta_{00k}$ is the driver-level random intercept, $e_{ijk}$ is the event-level residual, the Level-1 (event) variance of $e_{ijk}$ is $\sigma_e^2$, the Level-2 (trip) variance of $u_{0jk}$ is $\sigma_{u_0}^2$ and the Level-3 (driver) variance of $\vartheta_{00k}$ is $\sigma_{\vartheta_{00}}^2$, $\gamma_{100}$ is the fixed slope coefficient for the explanatory variable $x$, $u_{1jk}$ is the random trip-level slope coefficient for $x$, and $\vartheta_{10k}$ is the random driver-level slope coefficient for $x$. All random components are assumed to follow a normal distribution with a mean of zero and a constant standard deviation. Equation (4.20) represents a three-level random-effects linear regression model for a single explanatory variable, but it can be extended in the same way to multiple explanatory variables.
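The substitution that leads from the level-by-level equations to the composite equation can be verified numerically. The sketch below uses hypothetical values for every term and illustrative function names; it only confirms that building the response level by level and evaluating the composite form give the same number.

```python
def level_by_level(x, gamma000, gamma100, v00, v10, u0, u1, e):
    """Build Y_ijk through equations (4.19), (4.18) and (4.17) in turn."""
    delta00 = gamma000 + v00      # level 3: intercept for level-3 unit k
    delta10 = gamma100 + v10      # level 3: slope for level-3 unit k
    beta0 = delta00 + u0          # level 2: intercept for level-2 unit j within k
    beta1 = delta10 + u1          # level 2: slope for level-2 unit j within k
    return beta0 + beta1 * x + e  # level 1

def composite(x, gamma000, gamma100, v00, v10, u0, u1, e):
    """Composite form, equation (4.20)."""
    return gamma000 + (gamma100 + u1 + v10) * x + v00 + u0 + e

# Hypothetical values, chosen only for illustration.
args = dict(x=3.0, gamma000=1.0, gamma100=0.5, v00=0.2, v10=-0.05, u0=-0.3, u1=0.1, e=0.4)
print(abs(level_by_level(**args) - composite(**args)) < 1e-12)   # True
```

Here `v00` and `v10` stand for the level-3 random intercept and slope, and `u0` and `u1` for their level-2 counterparts.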

In this work, both three-level and two-level mixed effect models are used to describe the data and to find the relationship between the deceleration event (more specifically, the deceleration value and the deceleration duration) and its influencing factors. The data are described in more detail in the Data chapter.

In document Modelling drivers’ braking behaviour and comfort under normal driving (Page 115-122)