4.3 Multilevel modelling
4.3.2 Description of the model
To explain the multilevel model (Woltman, 2012; StataCorp, 2013), this section starts from the simplest possible regression model, i.e. a model for the mean of the dependent variable with no explanatory variables, and builds it up step by step into the multilevel model. The equation that represents the simplest regression model is:
$$y_i = \beta_0 + e_i \qquad (4.3)$$
where:
$y_i$ = dependent variable;
$\beta_0$ = the mean of y;
$e_i$ = the residuals, i.e. the difference between an individual's y value and the population mean.
Moving to the simplest two-level random effects model (equation (4.4)), the residuals are split into two components: the group-level residuals or group random effects $u_j$ and the individual residuals $e_{ij}$.
$$y_{ij} = \beta_0 + u_j + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2), \quad u_j \sim N(0, \sigma_u^2) \qquad (4.4)$$
where:
$u_j$ = the difference between group j's mean and the overall mean;
$e_{ij}$ = the difference between the y value for the ith individual and that individual's group mean.
Residuals at both levels are assumed to follow normal distributions with zero means. The total variance is therefore partitioned into two components: the between-group variance $\sigma_u^2$, based on the deviation of group means from the overall mean, and the within-group between-individual variance $\sigma_e^2$, based on individual differences from the group means.
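The split of each observation into an overall mean, a group residual and an individual residual can be illustrated with a small sketch. The data values and group labels below are hypothetical, and the residuals are computed from raw sample means, so they are only sample analogues of $u_j$ and $e_{ij}$, not the estimates a fitted multilevel model would return.

```python
# Toy two-level data: hypothetical y values for three groups (illustration only).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 9.0],
    "C": [1.0, 2.0, 3.0, 2.0],
}

all_values = [y for ys in groups.values() for y in ys]
beta0 = sum(all_values) / len(all_values)      # overall mean, beta_0

decomposition = []
for name, ys in groups.items():
    group_mean = sum(ys) / len(ys)
    u_j = group_mean - beta0                   # group-level residual
    for y in ys:
        e_ij = y - group_mean                  # individual-level residual
        decomposition.append((name, y, u_j, e_ij))

# Each observation is recovered as beta_0 + u_j + e_ij (up to rounding error).
max_error = max(abs(y - (beta0 + u + e)) for _, y, u, e in decomposition)
print(f"beta0 = {beta0:.3f}, max reconstruction error = {max_error:.2e}")
```

The spread of the `u_j` values reflects the between-group variation, and the spread of the `e_ij` values reflects the within-group variation.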
4.3.2.1 Testing for group effects
Testing for group effects establishes whether a multilevel model is more suitable than a single-level model for describing the data. The method used for this purpose is the likelihood ratio (LR) test, a general statistical test for comparing the goodness of fit of two nested models (the null model and the alternative one). Applying the LR test to the models described by equations (4.3) and (4.4) tests the null hypothesis that there are no group effects, $H_0: \sigma_u^2 = 0$ (i.e. $H_0$: the single-level model is true vs. $H_A$: the multilevel model is true). The test statistic is twice the difference in the log-likelihoods:
$$LR = 2 \times (\text{log-likelihood of the alternative model} - \text{log-likelihood of the null model})$$
In this case, the alternative model is the multilevel model and the null model is the single-level one. The test statistic LR is compared with a chi-squared distribution with degrees of freedom equal to the number of extra parameters in the more complex model. The multilevel model (equation (4.4)) has one additional parameter, the between-group variance $\sigma_u^2$, so there is 1 degree of freedom. Rejection of the null hypothesis implies that there are "real" group differences, in which case the multilevel model is preferred over the single-level model. If the null hypothesis cannot be rejected, further exploration is still needed before settling on a single-level model, since between-group differences may be revealed after adding explanatory variables.
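The test described above can be sketched in a few lines. The function name and the two log-likelihood values are hypothetical and would in practice come from fitting the single-level and multilevel models; the p-value uses the fact that, for 1 degree of freedom, the chi-squared upper tail probability equals erfc(sqrt(LR / 2)).

```python
import math

def lr_test_one_df(loglik_null, loglik_alt):
    """Return (LR, p_value) for a likelihood ratio test with 1 degree of freedom."""
    lr = 2.0 * (loglik_alt - loglik_null)
    # Chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2)); negative LR is clipped to 0.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Hypothetical log-likelihoods of the null (single-level) and alternative (multilevel) models.
lr, p = lr_test_one_df(loglik_null=-1050.0, loglik_alt=-1040.0)
print(f"LR = {lr:.2f}, p = {p:.2g}")
```

With these illustrative values the LR statistic is 20, far above the 5% critical value of about 3.84 for 1 degree of freedom, so the null hypothesis of no group effects would be rejected.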
4.3.2.2 Interpreting variance components
There are two coefficients that describe the variance that is due to the hierarchical structure of the data: the Variance Partition Coefficients (VPCs) and the Intraclass Correlation Coefficients (ICCs). The ICC measures the correlation (i.e. similarity or homogeneity) of the observations within a given cluster:
$$ICC = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \qquad (4.5)$$
The more characteristics the observations in the same cluster have in common, the larger the ICC. The variance partition coefficient, in contrast, reports the proportion of the observed response variation that lies at each level of the model hierarchy; at level 2 this is the share of the variation that is due to differences between groups. It thus establishes the relative importance of each level to the variation of the observations:
$$VPC_u = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \ \text{(level 2)}, \qquad VPC_e = \frac{\sigma_e^2}{\sigma_u^2 + \sigma_e^2} \ \text{(level 1)} \qquad (4.6)$$
If the observations do not differ statistically from one group to another, the level-2 VPC equals 0. For the two-level model the level-2 VPC and the ICC are equivalent, but this changes in more complex models. For example, for level 2 in a three-level model with level-3 variance $\sigma_v^2$:
$$VPC_u = \frac{\sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}, \qquad ICC_u = \frac{\sigma_v^2 + \sigma_u^2}{\sigma_v^2 + \sigma_u^2 + \sigma_e^2}$$
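The coefficients above are simple ratios of variance components, as the following sketch shows. The function names and the variance values are hypothetical; in practice the variances would be the estimates reported by the fitted model.

```python
def vpc_two_level(var_u, var_e):
    """VPCs of a two-level model: share of total variance at each level."""
    total = var_u + var_e
    return {"level2": var_u / total, "level1": var_e / total}

def icc_two_level(var_u, var_e):
    """ICC of a two-level model (identical to the level-2 VPC)."""
    return var_u / (var_u + var_e)

def level2_three_level(var_v, var_u, var_e):
    """Level-2 VPC and ICC in a three-level model (var_v is the level-3 variance)."""
    total = var_v + var_u + var_e
    return {"vpc": var_u / total, "icc": (var_v + var_u) / total}

# Hypothetical variance components.
print(vpc_two_level(var_u=2.0, var_e=6.0))
print(icc_two_level(var_u=2.0, var_e=6.0))
print(level2_three_level(var_v=1.0, var_u=2.0, var_e=5.0))
```

In the two-level case the ICC and the level-2 VPC coincide (0.25 for these values), whereas in the three-level case the level-2 ICC (0.375) exceeds the level-2 VPC (0.25) because observations in the same level-2 unit also share the same level-3 unit.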
4.3.2.3 Random intercept model
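Following the description of the model, the next step is to add an explanatory variable defined at level 1 and denoted by $x_{ij}$. The equations become:
$$y_{ij} = \beta_{0j} + \beta_{10} \times x_{ij} + e_{ij} \qquad \text{Level 1} \qquad (4.7)$$
$$\beta_{0j} = \gamma_{00} + u_j \qquad \text{Level 2} \qquad (4.8)$$
By replacing $\beta_{0j}$ in equation (4.7) with equation (4.8), the resulting equation is the following:
$$y_{ij} = \gamma_{00} + \beta_{10} \times x_{ij} + u_j + e_{ij} \qquad (4.9)$$
$$e_{ij} \sim N(0, \sigma_e^2), \quad u_j \sim N(0, \sigma_u^2) \qquad (4.10)$$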
This model is called a random intercept model because the intercept of the group regression lines is allowed to vary randomly across groups. The overall relationship between the dependent variable y and the explanatory variable x is represented by a straight line with intercept $\gamma_{00}$ and slope $\beta_{10}$. A multilevel model can be thought of as consisting of two components: a fixed part, which specifies the relationship between the mean of y and the explanatory variables, and a random part, which contains the level 1 and level 2 residuals. In equation (4.9) the fixed part is $\gamma_{00} + \beta_{10} \times x_{ij}$ and the random part is $u_j + e_{ij}$. The fixed part is extended by adding more predictors, while the random part is extended by allowing the effect of one or more predictors to vary across groups or by allowing the within-group variance to depend on explanatory variables.
As mentioned above, the intercept may vary from group to group, but the slope of the line, $\beta_{10}$, remains the same for all groups. The predicted regression lines for the different groups are therefore parallel, as shown in Figure 4.7.
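[Figure 4.7 placeholder: axes x and y; the average line $y = \beta_0 + \beta_1 x$ with slope $\beta_1$ and parallel lines for Groups 1 to 4, offset from it by $u_1$ to $u_4$]
Figure 4.7: Prediction lines from a random intercept model for 4 different groups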
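The parallel prediction lines of a random intercept model can be reproduced with a short sketch. All numerical values (the overall intercept, the common slope and the four group effects) are hypothetical and chosen only for illustration.

```python
gamma00 = 10.0                               # overall intercept (hypothetical)
beta10 = 1.5                                 # common slope (hypothetical)
u = {1: -2.0, 2: 0.5, 3: 3.0, 4: -1.0}       # hypothetical group effects u_j

def predict(group, x):
    """Predicted y for a group: (gamma00 + u_j) + beta10 * x."""
    return gamma00 + u[group] + beta10 * x

for j in sorted(u):
    intercept = predict(j, 0.0)
    slope = predict(j, 1.0) - predict(j, 0.0)
    print(f"group {j}: intercept = {intercept:.1f}, slope = {slope:.1f}")
```

Every group line has the same slope and differs from the average line only by a vertical shift of $u_j$, which is what makes the lines parallel.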
4.3.2.4 Random Intercepts and Slopes Model (Two-level random effect multilevel model)
Sometimes the effect of the explanatory variable may differ from group to group. A random slope model allows each group line to have a different slope.
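$$y_{ij} = \beta_{0j} + \beta_{1j} \times x_{ij} + e_{ij}, \quad e_{ij} \sim N(0, \sigma_e^2) \qquad \text{Level 1} \qquad (4.11)$$
$$\beta_{0j} = \gamma_{00} + u_{0j} \qquad \text{Level 2} \qquad (4.12)$$
$$\beta_{1j} = \gamma_{10} + u_{1j} \qquad \text{Level 2} \qquad (4.13)$$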
where:
$y_{ij}$ = dependent variable measured for the ith level-1 unit nested within the jth level-2 unit;
$\beta_{0j}$ = intercept for the jth level-2 unit;
$\beta_{1j}$ = regression coefficient associated with $x_{ij}$ for the jth level-2 unit;
$e_{ij}$ = random error associated with the ith level-1 unit nested within the jth level-2 unit;
$\gamma_{00}$ = overall mean intercept;
$\gamma_{10}$ = overall mean slope;
$u_{0j}$ = random effect of the jth level-2 unit, adjusted for $x_{ij}$, on the intercept;
$u_{1j}$ = random effect of the jth level-2 unit, adjusted for $x_{ij}$, on the slope.
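Now the slope of the average regression line is $\gamma_{10}$ and the slope of the line for group j is $\gamma_{10} + u_{1j}$. By replacing $\beta_{0j}$ and $\beta_{1j}$ from equations (4.12) and (4.13), equation (4.11) becomes:
$$y_{ij} = \gamma_{00} + \gamma_{10} \times x_{ij} + u_{0j} + u_{1j} \times x_{ij} + e_{ij}, \quad e_{ij} \sim N(0, \sigma_e^2) \qquad (4.14)$$
$$\begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N(0, \Omega_u), \qquad \Omega_u = \begin{bmatrix} \sigma_{u0}^2 & \\ \sigma_{u01} & \sigma_{u1}^2 \end{bmatrix} \qquad (4.15)$$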
Figure 4.8 shows the prediction lines (the average regression line and the prediction lines for four different groups) from a random slope and random intercept model.
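[Figure 4.8 placeholder: axes x and y; the average line $y = \beta_0 + \beta_1 x$ with slope $\beta_1$ and lines for Groups 1 to 4 whose intercepts differ from it by $u_{01}$ to $u_{04}$ and whose slopes differ from it by $u_{11}$ to $u_{14}$]
Figure 4.8: Prediction lines from a random intercept and random slope model for 4 different groups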
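A short sketch can illustrate two consequences of equations (4.14) and (4.15): each group has its own intercept and slope, and the variance of the random part depends on x. All numerical values are hypothetical, and the variance formula assumes, as is standard, that the level-1 residual is independent of the level-2 random effects.

```python
gamma00, gamma10 = 10.0, 1.5                 # fixed intercept and slope (hypothetical)
# Hypothetical (u0j, u1j) pairs for four groups.
random_effects = {1: (-2.0, 0.4), 2: (0.5, -0.3), 3: (3.0, 0.1), 4: (-1.0, -0.2)}

def group_line(j):
    """Intercept and slope of the prediction line for group j."""
    u0, u1 = random_effects[j]
    return gamma00 + u0, gamma10 + u1

def random_part_variance(x, var_u0, cov_u01, var_u1, var_e):
    """Var(u0j + u1j * x + e_ij) = var_u0 + 2 * cov_u01 * x + var_u1 * x**2 + var_e."""
    return var_u0 + 2.0 * cov_u01 * x + var_u1 * x ** 2 + var_e

for j in sorted(random_effects):
    intercept, slope = group_line(j)
    print(f"group {j}: intercept = {intercept:.1f}, slope = {slope:.1f}")

# Total variance at two values of x for hypothetical elements of the covariance matrix.
for x in (0.0, 2.0):
    print(x, random_part_variance(x, var_u0=4.0, cov_u01=0.5, var_u1=0.25, var_e=9.0))
```

Unlike the random intercept model, the group lines are no longer parallel, and the between-group variation changes along x, which is why the covariance $\sigma_{u01}$ between intercepts and slopes has to be estimated as well.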
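A level 2 explanatory variable ($G_j$) can be included in a multilevel model in the same way as a level 1 variable. The composite equation can be expressed as:
$$y_{ij} = \gamma_{00} + \gamma_{10} \times x_{ij} + \gamma_{01} \times G_j + \gamma_{11} \times G_j \times x_{ij} + u_{1j} \times x_{ij} + u_{0j} + e_{ij} \qquad (4.16)$$
where $\gamma_{10} \times x_{ij}$ is the level-1 predictor term, $\gamma_{01} \times G_j$ is the level-2 predictor term and $\gamma_{11} \times G_j \times x_{ij}$ is the cross-level term.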
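The step from the level equations to the composite form can be made explicit. The following is a sketch that assumes, as is conventional but not stated above, that $G_j$ enters both level-2 equations, with $\gamma_{01}$ as its coefficient in the intercept equation and $\gamma_{11}$ as its coefficient in the slope equation:

```latex
\begin{align*}
\beta_{0j} &= \gamma_{00} + \gamma_{01} G_j + u_{0j}, \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} G_j + u_{1j}, \\
y_{ij} &= \beta_{0j} + \beta_{1j} x_{ij} + e_{ij} \\
       &= \gamma_{00} + \gamma_{10} x_{ij} + \gamma_{01} G_j
          + \gamma_{11} G_j x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}.
\end{align*}
```

Under this assumption the slope of x for group j is $\gamma_{10} + \gamma_{11} G_j + u_{1j}$, so $\gamma_{11}$ measures how the effect of x changes with $G_j$.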
4.3.2.5 Three-level random effect multilevel model
Finally, the equations of a three-level mixed effects model are presented to show how the previous equations for two-level modelling can be expanded to more levels. As mentioned above, this model will be used for the two datasets. A three-level random-effects linear regression model can be developed for a single explanatory variable (x) as (StataCorp, 2013):
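$$Y_{ijk} = \beta_{0jk} + \beta_{1jk} x_{ijk} + e_{ijk} \qquad \text{Level 1} \qquad (4.17)$$
$$\beta_{0jk} = \delta_{00k} + u_{0jk}; \quad \beta_{1jk} = \delta_{10k} + u_{1jk} \qquad \text{Level 2} \qquad (4.18)$$
$$\delta_{00k} = \gamma_{000} + v_{00k}; \quad \delta_{10k} = \gamma_{100} + v_{10k} \qquad \text{Level 3} \qquad (4.19)$$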
The composite equation can be expressed as:
$$Y_{ijk} = \gamma_{000} + (\gamma_{100} + u_{1jk} + v_{10k}) x_{ijk} + v_{00k} + u_{0jk} + e_{ijk} \qquad (4.20)$$
Here $Y_{ijk}$ is the dependent variable for the ith level-1 unit nested within the jth level-2 unit nested within the kth level-3 unit; $\gamma_{000}$ is the final model intercept; $u_{0jk}$ is the random trip-level intercept; $v_{00k}$ is the driver-level random intercept; and $e_{ijk}$ is the event-level residual. The level-1 (event) variance of $e_{ijk}$ is $\sigma_e^2$, the level-2 (trip) variance of $u_{0jk}$ is $\sigma_{u0}^2$ and the level-3 (driver) variance of $v_{00k}$ is $\sigma_{v00}^2$. Further, $\gamma_{100}$ is the fixed slope coefficient for the explanatory variable x, $u_{1jk}$ is the random trip-level slope coefficient for x, and $v_{10k}$ is the random driver-level slope coefficient for x. All random components are assumed to follow a normal distribution with a mean of zero and a constant standard deviation. Equation (4.20) represents a three-level random-effects linear regression model for a single explanatory variable, but it can be extended in the same way to multiple explanatory variables.
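As a check on the algebra, the sketch below builds one observation from the level equations (4.17) to (4.19) and compares it with the composite equation (4.20). All numerical values are hypothetical and serve only to show that the two forms agree.

```python
# Hypothetical values for one event i in trip j of driver k.
gamma000, gamma100 = 2.0, -0.5     # fixed intercept and fixed slope
v00k, v10k = 0.3, 0.05             # level-3 (driver) random intercept and slope
u0jk, u1jk = -0.2, 0.1             # level-2 (trip) random intercept and slope
e_ijk = 0.4                        # level-1 (event) residual
x_ijk = 3.0                        # explanatory variable

# Level 3, equation (4.19)
delta00k = gamma000 + v00k
delta10k = gamma100 + v10k
# Level 2, equation (4.18)
beta0jk = delta00k + u0jk
beta1jk = delta10k + u1jk
# Level 1, equation (4.17)
y_nested = beta0jk + beta1jk * x_ijk + e_ijk

# Composite form, equation (4.20)
y_composite = gamma000 + (gamma100 + u1jk + v10k) * x_ijk + v00k + u0jk + e_ijk

print(f"nested = {y_nested:.4f}, composite = {y_composite:.4f}")
```

Both routes give the same value, which confirms that equation (4.20) is the level equations with the higher-level terms substituted in.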
In this work, both three-level and two-level mixed effects models are used to describe the data and to find the relationship between the deceleration event, more specifically the deceleration value and the deceleration duration, and its influencing factors. The data are described in more detail in the Data chapter.