
4.3 Multilevel modelling

4.3.2 Description of the model

To explain the multilevel model (Woltman, 2012; StataCorp, 2013), the simplest possible regression model (i.e. a model for the mean of the dependent variable only, with no explanatory variables) is described first and then built up step by step until the multilevel model is reached. The simplest regression model is:

$$ y_i = \beta_0 + e_i \qquad (4.3) $$

where:

$y_i$ = dependent variable;

$\beta_0$ = the mean of $y$;

$e_i$ = the residuals, i.e. the difference between an individual's $y$ value and the population mean.
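As a minimal sketch of equation (4.3) in plain Python (the data values are purely illustrative), the least-squares estimate of $\beta_0$ is the sample mean of $y$, and the residuals $e_i$ are the deviations of each observation from it:

```python
from statistics import fmean


def fit_intercept_only(y):
    """Fit y_i = beta0 + e_i by least squares.

    The least-squares estimate of beta0 is the sample mean of y;
    the residuals are each observation's deviation from that mean.
    """
    beta0 = fmean(y)
    residuals = [yi - beta0 for yi in y]
    return beta0, residuals


# Illustrative data only (not from this study).
beta0, e = fit_intercept_only([2.0, 4.0, 6.0, 8.0])
print(beta0, e)  # 5.0 [-3.0, -1.0, 1.0, 3.0]
```

The residuals always sum to zero, which is why $\beta_0$ can be read as the mean of $y$.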

Moving to the simplest two-level random-effects model (equation (4.4)), the residual is split into two components: the group-level residual, or group random effect, $u_j$, and the individual-level residual $e_{ij}$.

$$ y_{ij} = \beta_0 + u_j + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2), \quad u_j \sim N(0, \sigma_u^2) \qquad (4.4) $$

where:


$u_j$ = the difference between group $j$'s mean and the overall mean;

$e_{ij}$ = the difference between the $y$ value of the $i$th individual and that individual's group mean.

Residuals at both levels are assumed to be independent of each other and to follow normal distributions with zero means. The total variance of $y_{ij}$ is therefore partitioned into two components, $\mathrm{Var}(y_{ij}) = \sigma_u^2 + \sigma_e^2$: the between-group variance $\sigma_u^2$, based on the deviation of the group means from the overall mean, and the within-group (between-individual) variance $\sigma_e^2$, based on individual differences from the group means.
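For a balanced design (the same number of individuals in every group), the two variance components of equation (4.4) can be illustrated with the classical one-way ANOVA (method-of-moments) estimators. This is a sketch for intuition only, not the maximum likelihood or REML estimation that multilevel software such as Stata typically uses; the function name and the example data are hypothetical.

```python
from statistics import fmean


def anova_variance_components(groups):
    """One-way ANOVA estimates of (sigma_u^2, sigma_e^2) for y_ij = beta0 + u_j + e_ij.

    `groups` is a list of equal-length lists, one list of y values per group
    (a balanced design). A negative between-group estimate is truncated to 0.
    """
    n_groups = len(groups)
    n_per_group = len(groups[0]) if groups else 0
    if n_groups < 2 or n_per_group < 2 or any(len(g) != n_per_group for g in groups):
        raise ValueError("need >= 2 groups, each with the same size >= 2")

    group_means = [fmean(g) for g in groups]
    grand_mean = fmean(group_means)  # equals the overall mean in a balanced design

    # Between-group mean square: spread of the group means around the grand mean.
    ms_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    # Within-group mean square: spread of individuals around their own group mean.
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    ) / (n_groups * (n_per_group - 1))

    sigma2_e = ms_within
    sigma2_u = max((ms_between - ms_within) / n_per_group, 0.0)
    return sigma2_u, sigma2_e


# Illustrative data: three groups of three individuals.
s2u, s2e = anova_variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(s2u, s2e, s2u + s2e)  # between-group, within-group and total variance
```

With these data the group means are 2, 5 and 8, giving $\hat\sigma_u^2 = 26/3 \approx 8.67$ and $\hat\sigma_e^2 = 1$, so most of the variance lies between groups.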

4.3.2.1 Testing for group effects

It is important to test for group effects, i.e. to test whether a multilevel model describes the data better than a single-level one. The method used for this purpose is the likelihood ratio (LR) test, a general statistical test for comparing the goodness of fit of two nested models (the null model and the alternative one). Applying the LR test to the models described by equations (4.3) and (4.4) tests the null hypothesis that there are no group effects, $H_0: \sigma_u^2 = 0$ (i.e. $H_0$: the single-level model is true vs. $H_A$: the multilevel model is true). The test statistic is twice the difference in the log-likelihoods:

$$ LR = 2 \times (\text{log-likelihood of the alternative model} - \text{log-likelihood of the null model}) $$

In this case, the alternative model is the multilevel model and the null model is the single-level one. The test statistic LR is compared with a chi-squared distribution with degrees of freedom equal to the number of extra parameters in the more complex model. The multilevel model (equation (4.4)) has one additional parameter, the between-group variance $\sigma_u^2$, so there is 1 degree of freedom. Rejection of the null hypothesis implies that there are ‘real’ group differences, in which case the multilevel model is preferred over the single-level model. On the other hand, if the null hypothesis cannot be rejected, further exploration is still needed before settling on a single-level model, since between-group differences may be revealed after adding explanatory variables.
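The LR computation itself is simple arithmetic. The sketch below uses hypothetical log-likelihood values and only the Python standard library: for one degree of freedom, $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, because a $\chi^2_1$ variable is the square of a standard normal one. It also reports a boundary-corrected p-value: since $\sigma_u^2$ cannot be negative, $H_0$ lies on the boundary of the parameter space, where the plain $\chi^2_1$ reference is conservative; a common correction is to halve the p-value, which corresponds to the 50:50 mixture of $\chi^2_0$ and $\chi^2_1$ that Stata reports as chibar2(01).

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared with 1 degree of freedom.

    X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lr_test_group_effect(loglik_multilevel, loglik_single_level):
    """LR test of H0: sigma_u^2 = 0 (single-level) vs HA: sigma_u^2 > 0 (multilevel).

    Returns (LR, naive chi2(1) p-value, boundary-corrected p-value).
    """
    lr = 2.0 * (loglik_multilevel - loglik_single_level)
    if lr <= 0:
        return lr, 1.0, 1.0
    p_naive = chi2_1_sf(lr)
    # 50:50 mixture of chi2(0) and chi2(1): only the chi2(1) half has mass above lr > 0.
    p_boundary = 0.5 * p_naive
    return lr, p_naive, p_boundary


# Hypothetical log-likelihoods for the models in equations (4.4) and (4.3).
lr, p, p_corrected = lr_test_group_effect(-100.0, -103.0)
print(lr, round(p, 4), round(p_corrected, 4))  # 6.0 0.0143 0.0072
```

At the conventional 5% level, the critical value of $\chi^2_1$ is about 3.84, so an LR of 6 rejects $H_0$ with either p-value.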

4.3.2.2 Interpreting variance components

Two coefficients describe the part of the variance that is due to the hierarchical structure of the data: the Variance Partition Coefficient (VPC) and the Intraclass Correlation Coefficient (ICC). The ICC measures the correlation (i.e. similarity or homogeneity) of the observations within a given cluster:

$$ \mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \qquad (4.5) $$

The more characteristics the observations in the same cluster have in common, the larger the ICC. The variance partition coefficient, on the other hand, reports the proportion of the total response variance that lies at each level of the model hierarchy; at level 2, this is the proportion due to differences between groups. It therefore establishes the relative importance of each level for the variation of the observations:

$$ \mathrm{VPC}_u = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \ \text{for level 2}, \qquad \mathrm{VPC}_e = \frac{\sigma_e^2}{\sigma_u^2 + \sigma_e^2} \ \text{for level 1} \qquad (4.6) $$

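If the observations do not differ systematically from one group to another, i.e. $\sigma_u^2 = 0$, the level-2 VPC equals 0. For the two-level model the VPC and the ICC are equivalent, but this is no longer the case in more complex models. For example, for level 2 in a three-level model with level-3 variance $\sigma_v^2$, the VPC is $\sigma_u^2 / (\sigma_v^2 + \sigma_u^2 + \sigma_e^2)$, whereas the ICC is $(\sigma_v^2 + \sigma_u^2) / (\sigma_v^2 + \sigma_u^2 + \sigma_e^2)$, because two level-1 units in the same level-2 unit also share the same level-3 unit.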
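A short sketch of equations (4.5) and (4.6), and of the three-level case, with purely illustrative variance values. It shows that in a two-level model the level-2 VPC and the ICC coincide, whereas at level 2 of a three-level model the two differ.

```python
def two_level_coefficients(sigma2_u, sigma2_e):
    """VPCs (equation 4.6) and ICC (equation 4.5) for a two-level random intercept model."""
    total = sigma2_u + sigma2_e
    return {
        "vpc_level2": sigma2_u / total,
        "vpc_level1": sigma2_e / total,
        "icc": sigma2_u / total,  # identical to vpc_level2 in the two-level case
    }


def three_level_level2_coefficients(sigma2_v, sigma2_u, sigma2_e):
    """Level-2 VPC and ICC in a three-level random intercept model.

    sigma2_v: level-3 variance, sigma2_u: level-2 variance, sigma2_e: level-1 variance.
    """
    total = sigma2_v + sigma2_u + sigma2_e
    vpc_level2 = sigma2_u / total
    # Two level-1 units in the same level-2 unit also share the level-3 unit,
    # so their correlation includes the level-3 variance as well.
    icc_level2 = (sigma2_v + sigma2_u) / total
    return vpc_level2, icc_level2


print(two_level_coefficients(2.0, 8.0))  # {'vpc_level2': 0.2, 'vpc_level1': 0.8, 'icc': 0.2}
print(three_level_level2_coefficients(1.0, 2.0, 7.0))  # (0.2, 0.3)
```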

4.3.2.3 Random intercept model

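Following the description of the model, the next step is to add an explanatory variable defined at level 1, denoted by $x_{ij}$. The model becomes:

$$ y_{ij} = \beta_{0j} + \beta_{10} \times x_{ij} + e_{ij} \qquad \text{Level 1} \qquad (4.7) $$

$$ \beta_{0j} = \gamma_{00} + u_{0j} \qquad \text{Level 2} \qquad (4.8) $$

By replacing $\beta_{0j}$ in equation (4.7) with equation (4.8), the resulting equation is:

$$ y_{ij} = \gamma_{00} + \beta_{10} \times x_{ij} + u_{0j} + e_{ij} \qquad (4.9) $$

$$ e_{ij} \sim N(0, \sigma_e^2), \quad u_{0j} \sim N(0, \sigma_u^2) \qquad (4.10) $$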

This model is called a random intercept model because the intercepts of the group regression lines are allowed to vary randomly across groups. The overall relationship between the dependent variable $y$ and the explanatory variable $x$ is represented by a straight line with intercept $\gamma_{00}$ and slope $\beta_{10}$. A multilevel model can be thought of as consisting of two components: a fixed part, which specifies the relationship between the mean of $y$ and the explanatory variables, and a random part, which contains the level-1 and level-2 residuals. In equation (4.9), the fixed part is $\gamma_{00} + \beta_{10} \times x_{ij}$ and the random part is $u_{0j} + e_{ij}$. The fixed part is extended by adding more predictors, while the random part is extended by allowing the effect of one or more predictors to vary across groups, or by allowing the within-group variance to depend on explanatory variables.
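To make the fixed and random parts of equations (4.9) and (4.10) concrete, the following sketch simulates data from a random intercept model using only the standard library. The parameter values, the uniform range for $x$ and the function name are illustrative assumptions. Each group draws one intercept shift $u_{0j}$, shared by all of its members, while every individual gets its own residual $e_{ij}$.

```python
import random


def simulate_random_intercept(n_groups, n_per_group, gamma00, beta10,
                              sigma_u, sigma_e, seed=0):
    """Simulate (group, x, y) rows from y_ij = gamma00 + beta10*x_ij + u_0j + e_ij.

    u_0j ~ N(0, sigma_u^2) is drawn once per group; e_ij ~ N(0, sigma_e^2) once per row.
    x_ij ~ Uniform(0, 10) is an arbitrary choice for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u0j = rng.gauss(0.0, sigma_u)  # group-level random intercept (random part)
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 10.0)
            e = rng.gauss(0.0, sigma_e)  # individual-level residual (random part)
            fixed = gamma00 + beta10 * x  # fixed part
            rows.append((j, x, fixed + u0j + e))
    return rows


data = simulate_random_intercept(n_groups=4, n_per_group=5, gamma00=2.0,
                                 beta10=0.5, sigma_u=1.0, sigma_e=0.3, seed=42)
print(len(data), data[0])
```

Setting `sigma_e=0` removes the level-1 noise, so every member of a group then lies exactly on that group's line, shifted from the average line by its $u_{0j}$.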

As mentioned above, the intercept may vary from group to group, but the slope $\beta_{10}$ is the same for all groups. The predicted regression lines for the different groups are therefore parallel, as shown in Figure 4.7.


[Figure 4.7: parallel prediction lines for Groups 1 to 4, offset from the average line y = β0 + β1x by u1 to u4; axes x and y]

Figure 4.7: Prediction lines from a random intercept model for 4 different groups
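The parallelism follows directly from equations (4.8) and (4.9): the predicted line for group $j$ has its own intercept but the common slope, so the vertical gap between any two groups $j$ and $k$ does not depend on $x$:

$$ \hat{y}_j(x) = (\gamma_{00} + u_{0j}) + \beta_{10}\, x, \qquad \hat{y}_j(x) - \hat{y}_k(x) = u_{0j} - u_{0k} \quad \text{for all } x $$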

4.3.2.4 Random Intercepts and Slopes Model (Two-level random effect multilevel model)

Sometimes the effect of the explanatory variable may differ from group to group. A random slope model allows each group line to have a different slope.

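$$ y_{ij} = \beta_{0j} + \beta_{1j} \times x_{ij} + e_{ij}, \quad e_{ij} \sim N(0, \sigma_e^2) \qquad \text{Level 1} \qquad (4.11) $$

$$ \beta_{0j} = \gamma_{00} + u_{0j} \qquad \text{Level 2} \qquad (4.12) $$

$$ \beta_{1j} = \gamma_{10} + u_{1j} \qquad \text{Level 2} \qquad (4.13) $$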

where:

$y_{ij}$ = dependent variable measured for the $i$th level-1 unit nested within the $j$th level-2 unit;

$\beta_{0j}$ = intercept for the $j$th level-2 unit;

$\beta_{1j}$ = regression coefficient (slope) associated with $x_{ij}$ for the $j$th level-2 unit;

$e_{ij}$ = random error associated with the $i$th level-1 unit nested within the $j$th level-2 unit;

$\gamma_{00}$ = overall mean intercept;

$\gamma_{10}$ = overall mean slope;

$u_{0j}$ = random effect of the $j$th level-2 unit on the intercept, adjusted for $x_{ij}$;

$u_{1j}$ = random effect of the $j$th level-2 unit on the slope, adjusted for $x_{ij}$.

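Now the slope of the average regression line is $\gamma_{10}$, and the slope of the line for group $j$ is $\gamma_{10} + u_{1j}$. Substituting $\beta_{0j}$ and $\beta_{1j}$ from equations (4.12) and (4.13) into equation (4.11) gives:

$$ y_{ij} = \gamma_{00} + \gamma_{10} \times x_{ij} + u_{0j} + u_{1j} \times x_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2) \qquad (4.14) $$

$$ \begin{bmatrix} u_{0j} \\ u_{1j} \end{bmatrix} \sim N(\mathbf{0}, \Omega_u), \qquad \Omega_u = \begin{bmatrix} \sigma_{u0}^2 & \sigma_{u01} \\ \sigma_{u01} & \sigma_{u1}^2 \end{bmatrix} \qquad (4.15) $$

where $\sigma_{u0}^2$ and $\sigma_{u1}^2$ are the variances of the random intercepts and random slopes, and $\sigma_{u01}$ is their covariance.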
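Two consequences of equations (4.14) and (4.15) can be sketched in a few lines of standard-library Python. First, correlated group effects $(u_{0j}, u_{1j})$ can be drawn from $N(\mathbf{0}, \Omega_u)$ through a 2×2 Cholesky factor. Second, the level-2 variance at a given $x$ is $\mathrm{Var}(u_{0j} + u_{1j}x) = \sigma_{u0}^2 + 2\sigma_{u01}x + \sigma_{u1}^2x^2$, so in a random slope model the level-2 variance, and hence the VPC, depends on $x$. The parameter values and function names below are hypothetical.

```python
import math
import random


def cholesky_2x2(var0, cov01, var1):
    """Return (l00, l10, l11) such that [[l00, 0], [l10, l11]] times its transpose is Omega_u.

    Omega_u = [[var0, cov01], [cov01, var1]] must be positive definite:
    math.sqrt raises ValueError for a negative argument, and var0 == 0 raises ZeroDivisionError.
    """
    l00 = math.sqrt(var0)
    l10 = cov01 / l00
    l11 = math.sqrt(var1 - l10 ** 2)
    return l00, l10, l11


def draw_group_effects(var0, cov01, var1, rng):
    """Draw one (u0j, u1j) pair from N(0, Omega_u) using two independent N(0, 1) draws."""
    l00, l10, l11 = cholesky_2x2(var0, cov01, var1)
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l00 * z0, l10 * z0 + l11 * z1


def level2_variance(x, var0, cov01, var1):
    """Var(u0j + u1j * x) = sigma_u0^2 + 2 * sigma_u01 * x + sigma_u1^2 * x^2."""
    return var0 + 2.0 * cov01 * x + var1 * x ** 2


def vpc_at(x, var0, cov01, var1, sigma2_e):
    """Level-2 share of the total variance at a given x in the random slope model."""
    v2 = level2_variance(x, var0, cov01, var1)
    return v2 / (v2 + sigma2_e)


rng = random.Random(1)
draws = [draw_group_effects(4.0, 2.0, 3.0, rng) for _ in range(20000)]
# The means are zero by construction, so the mean product estimates sigma_u01 = 2.0.
mean_product = sum(u0 * u1 for u0, u1 in draws) / len(draws)
print(round(mean_product, 2))
print(vpc_at(0.0, 4.0, 2.0, 3.0, 1.0), vpc_at(1.0, 4.0, 2.0, 3.0, 1.0))
```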

Figure 4.8 shows the prediction lines (the average regression line and the prediction lines for four different groups) from a random intercept and random slope model.

[Figure 4.8: prediction lines for Groups 1 to 4 with intercept offsets u01 to u04 and slope offsets u11 to u14 around the average line y = β0 + β1x; axes x and y]

Figure 4.8: Prediction lines from a random intercept and random slope model for 4 different groups

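A level-2 explanatory variable $G_j$ can be included in a multilevel model in a similar way to a level-1 variable: it enters the level-2 equations, $\beta_{0j} = \gamma_{00} + \gamma_{01} \times G_j + u_{0j}$ and $\beta_{1j} = \gamma_{10} + \gamma_{11} \times G_j + u_{1j}$. Substituting these into equation (4.11), the composite equation can be expressed as:

$$ y_{ij} = \gamma_{00} + \gamma_{10} \times x_{ij} + \gamma_{01} \times G_j + \gamma_{11} \times G_j \times x_{ij} + u_{1j} \times x_{ij} + u_{0j} + e_{ij} \qquad (4.16) $$

where $\gamma_{10} \times x_{ij}$ is the level-1 predictor term, $\gamma_{01} \times G_j$ the level-2 predictor term and $\gamma_{11} \times G_j \times x_{ij}$ the cross-level interaction term.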

4.3.2.5 Three-level random effect multilevel model

Finally, the equations of a three-level mixed-effects model are presented, to show how the two-level equations above can be extended to more levels. As mentioned above, this model will be used for the two datasets. For a single explanatory variable $x$, a three-level random-effects linear regression model can be written as (StataCorp, 2013):


π‘Œπ‘–π‘—π‘˜ = 𝛽0π‘—π‘˜+ 𝛽1π‘—π‘˜π‘₯π‘–π‘—π‘˜+ π‘’π‘–π‘—π‘˜ 𝐿𝑒𝑣𝑒𝑙 1 (4.17)

𝛽0π‘—π‘˜ = 𝛿00π‘˜ + 𝑒0π‘—π‘˜; 𝛽1π‘—π‘˜ = 𝛿10π‘˜+ 𝑒1π‘—π‘˜ 𝐿𝑒𝑣𝑒𝑙 2 (4.18)

𝛿00π‘˜ = 𝛾000+ πœ—00π‘˜; 𝛿10π‘˜ = 𝛾100+ πœ—10π‘˜ 𝐿𝑒𝑣𝑒𝑙 3 (4.19)

The composite equation can be expressed as:

π‘Œπ‘–π‘—π‘˜ = 𝛾000+ (𝛾100+ 𝑒1π‘—π‘˜+ πœ—10π‘˜)π‘₯π‘–π‘—π‘˜+ πœ—00π‘˜+ 𝑒0π‘—π‘˜+ π‘’π‘–π‘—π‘˜ (4.20)

where $Y_{ijk}$ is the dependent variable for the $i$th level-1 unit nested within the $j$th level-2 unit, which is in turn nested within the $k$th level-3 unit; $\gamma_{000}$ is the overall (fixed) intercept; $\delta_{00k}$ and $\delta_{10k}$ are the mean intercept and mean slope for the $k$th level-3 unit; $u_{0jk}$ is the trip-level random intercept; $\vartheta_{00k}$ is the driver-level random intercept; $e_{ijk}$ is the event-level residual; $\gamma_{100}$ is the fixed slope coefficient for the explanatory variable $x$; $u_{1jk}$ is the trip-level random slope coefficient for $x$; and $\vartheta_{10k}$ is the driver-level random slope coefficient for $x$. The level-1 (event) variance of $e_{ijk}$ is $\sigma_e^2$, the level-2 (trip) variance of $u_{0jk}$ is $\sigma_{u0}^2$, and the level-3 (driver) variance of $\vartheta_{00k}$ is $\sigma_{\vartheta 00}^2$. All random components are assumed to follow normal distributions with a mean of zero and a constant variance. Equation (4.20) represents a three-level random-effects linear regression model for a single explanatory variable, but it can be extended in the same way to multiple explanatory variables.
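As a sanity check on the substitution, the sketch below evaluates the nested equations (4.17) to (4.19) and the composite equation (4.20) for the same arbitrary values and confirms that they agree. It also includes a small simulator for the event-within-trip-within-driver structure; treating all random components as independent normal draws, the chosen standard deviations and the function names are simplifying assumptions for illustration only.

```python
import random


def y_nested(x, gamma000, gamma100, theta00k, theta10k, u0jk, u1jk, e_ijk):
    """Evaluate the model level by level, following (4.19), (4.18) and (4.17)."""
    delta00k = gamma000 + theta00k  # level 3: driver-specific intercept
    delta10k = gamma100 + theta10k  # level 3: driver-specific slope
    beta0jk = delta00k + u0jk       # level 2: trip-specific intercept
    beta1jk = delta10k + u1jk       # level 2: trip-specific slope
    return beta0jk + beta1jk * x + e_ijk  # level 1: event


def y_composite(x, gamma000, gamma100, theta00k, theta10k, u0jk, u1jk, e_ijk):
    """Evaluate the composite equation (4.20) directly."""
    return gamma000 + (gamma100 + u1jk + theta10k) * x + theta00k + u0jk + e_ijk


def simulate_three_level(n_drivers, trips_per_driver, events_per_trip,
                         gamma000, gamma100, sd, seed=0):
    """Simulate (driver k, trip j, event i, x, y) rows.

    `sd` maps 'theta00', 'theta10', 'u0', 'u1', 'e' to standard deviations.
    All random effects are drawn independently (an illustrative assumption).
    """
    rng = random.Random(seed)
    rows = []
    for k in range(n_drivers):
        theta00k, theta10k = rng.gauss(0.0, sd["theta00"]), rng.gauss(0.0, sd["theta10"])
        for j in range(trips_per_driver):
            u0jk, u1jk = rng.gauss(0.0, sd["u0"]), rng.gauss(0.0, sd["u1"])
            for i in range(events_per_trip):
                x = rng.uniform(0.0, 1.0)
                e_ijk = rng.gauss(0.0, sd["e"])
                y = y_nested(x, gamma000, gamma100, theta00k, theta10k, u0jk, u1jk, e_ijk)
                rows.append((k, j, i, x, y))
    return rows


args = (0.7, 1.0, -0.5, 0.3, 0.1, -0.2, 0.05, 0.01)
print(y_nested(*args), y_composite(*args))  # equal up to floating-point rounding
sd = {"theta00": 0.5, "theta10": 0.1, "u0": 0.3, "u1": 0.05, "e": 0.2}
print(len(simulate_three_level(2, 3, 4, 1.0, -0.5, sd)))  # 2 * 3 * 4 = 24 rows
```

Every event in the same trip shares $u_{0jk}$ and $u_{1jk}$, and every trip of the same driver shares $\vartheta_{00k}$ and $\vartheta_{10k}$, which is exactly the nesting that equations (4.17) to (4.19) encode.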

In this work, both three-level and two-level mixed-effects models are used to describe the data and to identify the relationship between deceleration events, more specifically the deceleration value and the deceleration duration, and their influencing factors. The data are described in more detail in the Data chapter.