General Model Formulation - Multilevel Modeling Approach for Severity Analysis

CHAPTER 4 METHODS

4.2 Multilevel Modeling Approach for Severity Analysis

4.2.2 General Model Formulation

To define the principles of multilevel models, a simplified two-level model can be used. Occupants nested within crashes are considered as the hierarchical form in this model

formulation. For example, let the response variable be the probability of injury for occupant i involved in crash j. The response yij can only takes one of the two values: 1 in case of injury

while 0 in case of no injury. The probability of yij = 1 is denoted by πij = Pr (yij = 1) following a

binomial distribution.

The expected average probability for each crash can be defined as a non-linear function of the predictor or combination of predictors in linear form

𝜋_𝑖𝑗 = 𝑓(𝑋𝛽_𝑖𝑗) (4.11) If f(x) represents a link function, the logit link chosen in this case can be represented by the following equation

𝐿𝑜𝑔𝑖𝑡(𝜋𝑖𝑗) = 𝛾0+ ∑𝑟ℎ=1𝛾ℎ𝑥ℎ𝑖𝑗+ 𝑢0𝑗 (4.12)

The expected probability for driver i to be injured in crash j is now defined as being a logit function of the linear combination of an average value holding for the crash population (γ0),

of the effect of driver level and crash level predictors (∑𝑟ℎ=1𝛾ℎ𝑥ℎ𝑖𝑗), and of crash-related random

effect uoj which is assumed to be normally distributed with mean 0 and variance 𝜎𝑢20𝑗.

The model defined in equation 4.12 allows the intercept to vary across the crashes. Thus, this allows the expected probability of occupants’ injury to be higher for some crashes than others. This random variation is intended to capture the unobserved characteristics of the crash- level units on the observations made of the of individual driver-level units.

Intra-class correlation (ICC) coefficients can be employed to calculate the proportion of variance in the outcome probability that is associated with each of the two levels (crash and driver) considered in this model formulation. This coefficient establishes the ratio of crash-level variance to the total variance in the outcome

𝜌 = 𝜎𝑢02

𝜎_𝑢02 +𝜎2 (4.13)

The logistic distribution for the driver-level residual implies a variance of π2/3 = 3.29. So, for a two-level logistic random intercept model with an intercept variance of 𝜎_𝑢2₀_{, the ICC for}

between-crash residuals is 𝜌 = 𝜎𝑢02

𝜎_𝑢02 ₊𝜋2 3

The ρ is an indicator of the magnitude of the within-crash correlation. A ρ value close to 0 indicates a lack of variation among the crash observations indicating that multilevel modeling is not warranted. On the other hand, a relatively large value of ρ implies an inclination for estimating the multilevel or hierarchical model to fit to the data having hierarchical structure. Literature shows (Huang et al., 2008; Dupont et al., 2013) that ICC values ranging from 0.25 and above are considered high in terms of explaining the variance at the higher level (crash level considered herein).

4.2.2.1 Hierarchical Binary Logit Model in Bayesian Framework

A hierarchical binary logit model with two-level specification was considered for the current research. In equation 4.11, if the response variable 𝑦_𝑖𝑗 only takes one of two values: 𝑦_𝑖𝑗 = 1 in case of the occupant i sustaining injury in crash j and 𝑦_𝑖𝑗 = 0 in case of not sustaining injury, then equation 4.13 can be re-written with 𝜋𝑖𝑗 following a binomial distribution

𝐿𝑜𝑔𝑖𝑡 ( 𝜋𝑖𝑗

1− 𝜋𝑖𝑗) = 𝛾0+ ∑ 𝛾ℎ𝑥ℎ𝑖𝑗+ 𝑢0𝑗

𝑟

ℎ=1 (4.15)

Full Bayesian inference was employed in this research. A Bayesian framework requires a researcher to think about prior information available on the parameters being estimated and to formally include that information in the model. If no prior information is available for the parameters of interest, one must specify an uninformative prior. The posterior distribution for a parameter θ given that observed data is y is subjected to the following rule according to Bayesian statistics

where p (θ|y) is the posterior distribution for θ given observed y, p (y|θ) is the likelihood of observing y given θ, and p (θ) is the probability distribution arising from some statement of prior belief for the parameter.

The key hierarchical part of the model mentioned in equation 4.16 is the random effect 𝑢0𝑗 which can be specified at the individual crash level by allowing its variance (τ02) to follow a

certain distribution varying across the crashes. For the current model specification, the crash level random effect was assumed to have a normal distribution with the variance of the normally distributed random effect having an inverse Gamma distribution (0.001, 0.001) as shown below

𝑢_0𝑗 ~ N (0, τ02) and τ02 ~ Inverse-Gamma (0.001, 0.001)

Based on the full Bayesian inference, the joint prior distribution for the parameters (θ) and the random effects (represented by φ) is

𝑝(𝜑, 𝜃) = 𝑝(𝜑)𝑝(𝜃|𝜑) (4.17)

and the joint posterior distribution can be defined as

𝑝(𝜑, 𝜃|𝑦) ∝ 𝑝(𝑦|𝜑, 𝜃)𝑝(𝜑, 𝜃) = 𝑝(𝜑, 𝜃)𝑝(𝑦|𝜃) (4.18)

In the absence of strong prior information for the model unknowns, uninformative priors were assumed for all regression coefficients (𝛾₀,𝛾_ℎ) with normal distributions (0, 1000). Based on the above formulation, the model was computed via Metropolis-Hastings sampler, a Markov Chain Monte Carlo (MCMC) technique, which was implemented by using MLwiN software package (Rasbah et al., 2000). In MLwiN, the user does not have to choose between Gibbs sampling and Metropolis Hastings sampling directly, the software has the capability to choose the default and the most appropriate technique for the given model. In case of normal response models, Gibbs sampling is used for all parameters. For non-normal responses, MLwiN does not

allow Gibbs sampling. For logistic regression models, Metroplis-Hastings algorithm is usually quicker than the Gibbs sampler approach usually used in the software WinBUGS. 5,000 iterations were used as the burn-in step for each model. This is the number of initial iterations that were not allowed to be used to describe the final parameter distributions. The monitoring chain length was decided from the Raftery-Lewis diagnostic. The Raftery-Lewis diagnostic (Raftery & Lewis, 1992) is a diagnostic based on a particular quantile of the distribution of a parameter. This diagnostic is used to estimate the length of the Markov chain required to

estimate a particular quantile to a given accuracy. In MLwiN, the diagnostic is calculated for the two quantiles with the defaults being the 2.5% and 97.5% quantiles. More iterations would be required if the quantile values are greater than the number of iterations used for a parameter. A thinning to retain every tenth sample was used to reduce the autocorrelation. Thinning is the frequency with which successive values in the Markov chain are stored. Convergences of the models can be checked by monitoring the Markov Chain Monte Carlo (MCMC) dynamic plot traces for all the parameters considered in the models. The conclusion on the model convergence can be made once all the values of the parameters lies within a zone without strong periodicities. Plots of autocorrelation known as Auto Correlation Function (ACF) for all the parameters were also observed to make sure that the chain for each parameter was adequately close to

independently and identically distributed (IID) data. A diagnostic known as the Deviance Information Criteria (DIC) is used to measure how well a model fits the data. It is derived by using the deviance with MCMC sampling. The DIC diagnostic can be used to compare models as it consists of the measure of fit and complexity of a particular model.

In document Bayesian analysis of factors affecting crash frequency and severity during winter seasons in Iowa (Page 77-82)