CHAPTER 4 METHODS
4.2 Multilevel Modeling Approach for Severity Analysis
4.2.2 General Model Formulation
To define the principles of multilevel models, a simplified two-level model can be used. In this formulation, the hierarchy consists of occupants nested within crashes. For example, let the response variable be the injury outcome for occupant i involved in crash j. The response $y_{ij}$ can take only one of two values: 1 in case of injury and 0 in case of no injury. The probability that $y_{ij} = 1$ is denoted by $\pi_{ij} = \Pr(y_{ij} = 1)$, with $y_{ij}$ following a binomial distribution.
The expected average probability for each crash can be defined as a non-linear function of the predictor or combination of predictors in linear form:
$$\pi_{ij} = f(X\beta_{ij}) \tag{4.11}$$
If f(x) represents a link function, the logit link chosen in this case can be represented by the following equation:
$$\text{logit}(\pi_{ij}) = \gamma_0 + \sum_{h=1}^{r} \gamma_h x_{hij} + u_{0j} \tag{4.12}$$
The expected probability for driver i to be injured in crash j is now defined, through the logit link, as a function of the linear combination of an average value holding for the crash population ($\gamma_0$), the effects of driver-level and crash-level predictors ($\sum_{h=1}^{r} \gamma_h x_{hij}$), and a crash-related random effect $u_{0j}$, which is assumed to be normally distributed with mean 0 and variance $\sigma_{u0}^2$.
The model defined in equation 4.12 allows the intercept to vary across crashes, so the expected probability of occupants' injury can be higher for some crashes than for others. This random variation is intended to capture the effect of unobserved characteristics of the crash-level units on the observations made of the individual driver-level units.
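As an illustration of equation 4.12, the following Python sketch converts a linear predictor into an injury probability and shows how a positive crash-level random effect raises that probability. The coefficient values, predictor values, and function names are hypothetical and chosen only for illustration; they are not estimates from this research.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def injury_probability(gamma0, gammas, x, u0j):
    """Injury probability under a random-intercept logit model.

    gamma0: population-average intercept
    gammas: coefficients of the driver-level and crash-level predictors
    x:      predictor values for one occupant
    u0j:    random effect of the crash the occupant belongs to
    """
    eta = gamma0 + sum(g * xh for g, xh in zip(gammas, x)) + u0j
    return inv_logit(eta)


# Hypothetical values, for illustration only.
gamma0 = -1.0
gammas = [0.5, -0.3]
x = [1.0, 2.0]

# Same occupant characteristics, two different crashes.
p_average_crash = injury_probability(gamma0, gammas, x, u0j=0.0)
p_higher_risk_crash = injury_probability(gamma0, gammas, x, u0j=1.0)

print(round(p_average_crash, 3), round(p_higher_risk_crash, 3))
```

The two occupants share the same predictors, so the difference between the two probabilities comes entirely from the crash-level random intercept.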
Intra-class correlation (ICC) coefficients can be employed to calculate the proportion of variance in the outcome probability that is associated with each of the two levels (crash and driver) considered in this model formulation. This coefficient establishes the ratio of crash-level variance ($\sigma_{u0}^2$) to the total variance in the outcome, where $\sigma^2$ is the driver-level residual variance:
$$\rho = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma^2} \tag{4.13}$$
The logistic distribution for the driver-level residual implies a variance of $\pi^2/3 \approx 3.29$. So, for a two-level logistic random intercept model with an intercept variance of $\sigma_{u0}^2$, the ICC for between-crash residuals is
$$\rho = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \pi^2/3} \tag{4.14}$$
$\rho$ is an indicator of the magnitude of the within-crash correlation. A $\rho$ value close to 0 indicates little variation between crashes, meaning that multilevel modeling is not warranted. On the other hand, a relatively large value of $\rho$ favors estimating a multilevel (hierarchical) model for data with a hierarchical structure. The literature (Huang et al., 2008; Dupont et al., 2013) indicates that ICC values of 0.25 and above are considered high in terms of explaining the variance at the higher level (the crash level considered herein).
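The ICC of a two-level logistic random intercept model can be computed directly from the intercept variance, as in the short Python sketch below. The function names and the input variance of 1.0 are hypothetical; only the logistic residual variance of $\pi^2/3$ and the 0.25 reference value come from the text above.

```python
import math

# Variance of the standard logistic distribution (about 3.29).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3.0


def icc_logistic(sigma2_u0):
    """ICC of a two-level logistic random intercept model."""
    if sigma2_u0 < 0.0:
        raise ValueError("a variance cannot be negative")
    return sigma2_u0 / (sigma2_u0 + LOGISTIC_RESIDUAL_VARIANCE)


def intercept_variance_for_icc(rho):
    """Intercept variance that yields a given ICC (inverse of icc_logistic)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    return rho * LOGISTIC_RESIDUAL_VARIANCE / (1.0 - rho)


# Hypothetical intercept variance, for illustration only.
rho = icc_logistic(1.0)

# Crash-level variance needed to reach the 0.25 reference value.
threshold_variance = intercept_variance_for_icc(0.25)

print(round(rho, 3), round(threshold_variance, 3))
```

Under these assumptions, an intercept variance of roughly 1.1 or more corresponds to an ICC at or above 0.25.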
4.2.2.1 Hierarchical Binary Logit Model in Bayesian Framework
A hierarchical binary logit model with a two-level specification was considered for the current research. In equation 4.11, if the response variable $y_{ij}$ takes only one of two values, $y_{ij} = 1$ if occupant i sustains an injury in crash j and $y_{ij} = 0$ otherwise, then equation 4.12 can be re-written, with $y_{ij}$ following a binomial distribution with probability $\pi_{ij}$, as
$$\log\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \gamma_0 + \sum_{h=1}^{r} \gamma_h x_{hij} + u_{0j} \tag{4.15}$$
Full Bayesian inference was employed in this research. A Bayesian framework requires a researcher to consider the prior information available on the parameters being estimated and to formally include that information in the model. If no prior information is available for the parameters of interest, one must specify an uninformative prior. According to Bayesian statistics, the posterior distribution for a parameter $\theta$ given the observed data y follows the rule
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) \tag{4.16}$$
where $p(\theta \mid y)$ is the posterior distribution for $\theta$ given observed y, $p(y \mid \theta)$ is the likelihood of observing y given $\theta$, and $p(\theta)$ is the probability distribution arising from some statement of prior belief for the parameter.
The key hierarchical part of the model in equation 4.15 is the random effect $u_{0j}$, which varies across crashes and whose variance ($\sigma_{u0}^2$) is itself assigned a prior distribution. For the current model specification, the crash-level random effect was assumed to be normally distributed, with its variance following an inverse-gamma distribution with parameters (0.001, 0.001), as shown below:
$$u_{0j} \sim N(0, \sigma_{u0}^2) \quad \text{and} \quad \sigma_{u0}^2 \sim \text{Inverse-Gamma}(0.001, 0.001)$$
Based on the full Bayesian inference, the joint prior distribution for the parameters ($\theta$) and the random effects (represented by $\phi$) is
$$p(\phi, \theta) = p(\phi)\, p(\theta \mid \phi) \tag{4.17}$$
and the joint posterior distribution can be defined as
$$p(\phi, \theta \mid y) \propto p(y \mid \phi, \theta)\, p(\phi, \theta) = p(\phi, \theta)\, p(y \mid \theta) \tag{4.18}$$
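To make the structure of the joint posterior concrete, the following Python sketch evaluates an unnormalized log joint posterior for the random-intercept logit model: the sum of the log-likelihood, the log prior of the random effects given their variance, and the log priors described in this section. Several points are assumptions rather than details confirmed by the text: the normal prior on the regression coefficients is read as mean 0 and variance 1000, the inverse-gamma prior is parameterized by shape and scale, and the data layout and all function names are hypothetical. It is not the computation performed internally by MLwiN.

```python
import math


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_inv_gamma_pdf(x, shape, scale):
    """Log density of an inverse-gamma distribution (shape, scale)."""
    return (shape * math.log(scale) - math.lgamma(shape)
            - (shape + 1.0) * math.log(x) - scale / x)


def log_posterior(gammas, u, sigma2_u0, data):
    """Unnormalized log joint posterior of a random-intercept logit model.

    gammas:    regression coefficients; gammas[0] is the intercept
    u:         crash-level random effects, one per crash
    sigma2_u0: variance of the crash-level random effects
    data:      list of (crash_index, x, y), with x[0] == 1.0 for the intercept
    """
    if sigma2_u0 <= 0.0:
        return -math.inf
    # Priors on the regression coefficients (assumed N(0, variance 1000)).
    lp = sum(log_normal_pdf(g, 0.0, 1000.0) for g in gammas)
    # Prior on the random-effect variance (assumed shape/scale form).
    lp += log_inv_gamma_pdf(sigma2_u0, 0.001, 0.001)
    # Distribution of the random effects given their variance.
    lp += sum(log_normal_pdf(uj, 0.0, sigma2_u0) for uj in u)
    # Bernoulli log-likelihood with a logit link.
    for j, x, y in data:
        eta = sum(g * xh for g, xh in zip(gammas, x)) + u[j]
        lp += y * eta - log1pexp(eta)
    return lp


# Hypothetical data: three occupants in two crashes.
data = [(0, [1.0, 0.0], 1), (0, [1.0, 1.0], 0), (1, [1.0, 1.0], 0)]
value = log_posterior([-0.5, 0.2], [0.3, -0.3], 1.0, data)
print(value)
```

An MCMC sampler only needs this quantity up to an additive constant, which is why the normalizing term of the posterior is never computed.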
In the absence of strong prior information for the model unknowns, uninformative priors, specified as normal distributions N(0, 1000), were assumed for all regression coefficients ($\gamma_0$, $\gamma_h$). Based on the above formulation, the model was estimated with the Metropolis-Hastings sampler, a Markov Chain Monte Carlo (MCMC) technique, as implemented in the MLwiN software package (Rasbash et al., 2000). In MLwiN, the user does not have to choose between Gibbs sampling and Metropolis-Hastings sampling directly; the software selects by default the most appropriate technique for the given model. For normal response models, Gibbs sampling is used for all parameters. For non-normal responses, MLwiN does not allow Gibbs sampling. For logistic regression models, the Metropolis-Hastings algorithm is usually quicker than the Gibbs sampler approach usually used in the WinBUGS software.
For each model, 5,000 iterations were used as the burn-in, that is, the number of initial iterations that were discarded and not used to describe the final parameter distributions. The monitoring chain length was decided from the Raftery-Lewis diagnostic (Raftery & Lewis, 1992), which is based on a particular quantile of the distribution of a parameter and estimates the length of the Markov chain required to estimate that quantile to a given accuracy. In MLwiN, the diagnostic is calculated for two quantiles, the defaults being the 2.5% and 97.5% quantiles. More iterations would be required if the chain lengths estimated by the diagnostic were greater than the number of iterations used for a parameter. The chains were thinned by retaining every tenth sample to reduce autocorrelation; the thinning value is the interval at which successive values in the Markov chain are stored.
Convergence of the models can be checked by monitoring the MCMC trace plots for all the parameters considered in the models. Convergence can be concluded once the values of all parameters lie within a stable zone without strong periodicities. Plots of the autocorrelation function (ACF) for all the parameters were also examined to make sure that the chain for each parameter was adequately close to independently and identically distributed (IID) data.
The Deviance Information Criterion (DIC) is a diagnostic used to measure how well a model fits the data. It is derived from the deviance computed with MCMC sampling. The DIC can be used to compare models because it combines a measure of fit with a measure of the complexity of a particular model.
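The estimation steps described above (sampling, burn-in, thinning, autocorrelation checks, and DIC) can be illustrated with a deliberately small Python sketch. It uses a random-walk Metropolis sampler, a special case of Metropolis-Hastings with a symmetric proposal, on an intercept-only logit model. This is not the sampler implemented in MLwiN. The data (30 injured occupants out of 100), the monitoring chain length of 20,000, the proposal step size, the seed, and all function names are hypothetical; the burn-in of 5,000 and the thinning interval of 10 follow the text, and the N(0, 1000) prior is read as having variance 1000. DIC is computed here as the posterior mean deviance plus the effective number of parameters.

```python
import math
import random


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def log_lik(gamma0, n_injured, n_total):
    """Bernoulli log-likelihood of an intercept-only logit model."""
    return n_injured * gamma0 - n_total * log1pexp(gamma0)


def log_post(gamma0, n_injured, n_total, prior_var=1000.0):
    """Unnormalized log posterior with a N(0, prior_var) prior on gamma0."""
    return log_lik(gamma0, n_injured, n_total) - gamma0 ** 2 / (2.0 * prior_var)


def metropolis(n_injured, n_total, n_iter, burn_in, thin, step=0.5, seed=1):
    """Random-walk Metropolis; returns the full and the thinned monitoring chains."""
    rng = random.Random(seed)
    current = 0.0
    current_lp = log_post(current, n_injured, n_total)
    raw, kept = [], []
    for it in range(burn_in + n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal, n_injured, n_total)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        if it >= burn_in:  # iterations before burn_in are discarded
            raw.append(current)
            if (it - burn_in) % thin == 0:  # store every thin-th value
                kept.append(current)
    return raw, kept


def autocorr(chain, lag=1):
    """Sample autocorrelation of a chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain)
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def dic(chain, n_injured, n_total):
    """DIC = mean deviance + effective number of parameters (p_d)."""
    d_bar = sum(-2.0 * log_lik(g, n_injured, n_total) for g in chain) / len(chain)
    d_hat = -2.0 * log_lik(sum(chain) / len(chain), n_injured, n_total)
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


raw, kept = metropolis(n_injured=30, n_total=100, n_iter=20000,
                       burn_in=5000, thin=10)
posterior_mean = sum(kept) / len(kept)
acf_raw = autocorr(raw)
acf_thinned = autocorr(kept)
dic_value, p_d = dic(kept, 30, 100)
print(posterior_mean, acf_raw, acf_thinned, dic_value, p_d)
```

Comparing the lag-1 autocorrelation of the full and thinned chains shows the effect of thinning, and with a single parameter the effective number of parameters should be close to 1.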