CHAPTER 2 LITERATURE REVIEW
2.3 Multilevel Modeling Techniques for Crash Severity Analysis
account crash hierarchy (crash-vehicle-occupant). The hierarchy can be expanded to include geographical elements such as road segments or sites, regions, and so on.
Jones and Jorgensen (2003) and Lenguerrand et al. (2006) were among the first studies to recognize the need to account for the crash hierarchy (crash-vehicle-occupant) in crash data when modeling crash severity. Jones and Jorgensen (2003) first attempted to consider the natural crash hierarchy in disaggregate crash data using a crash dataset from Norway. A total of 16,332 crashes along Norwegian roads from 1985 to 1996 were obtained. The probability of an occupant sustaining a fatal or serious injury was modeled with a binomial model. The hierarchy specified for this study was occupants nested within crashes and crashes nested within municipalities. The predictor variables included crash characteristics (light condition, road type, etc.) and occupant characteristics (gender, age, etc.). The random variations at the crash level and the municipal level were significant. The Intra-class Correlation Coefficient (ICC) revealed that the largest proportion of variation in the injury severity outcomes was attributable to the lowest level of the crash hierarchy (occupants, 83%), while 16% of the variation was attributable to the crash level and 1% to the municipal level.
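As a rough illustration of how a variance partition of this kind can be computed, the sketch below assumes the latent-variable formulation commonly used for multilevel logistic models, in which the lowest-level residual variance is fixed at π²/3. The source does not state how Jones and Jorgensen (2003) derived their ICC values, so this is an assumption. The variance components are hypothetical numbers chosen only to produce shares of a similar magnitude, and the function name `variance_shares` is introduced here for illustration.

```python
import math


def variance_shares(higher_level_variances):
    """Return each level's share of total variance in a multilevel logistic model.

    Assumes the latent-variable approach: the lowest-level (occupant) residual
    follows a standard logistic distribution with variance pi^2 / 3.
    """
    level1 = math.pi ** 2 / 3
    total = level1 + sum(higher_level_variances.values())
    shares = {"occupant": level1 / total}
    for level, variance in higher_level_variances.items():
        shares[level] = variance / total
    return shares


# Hypothetical random-intercept variances (not taken from the study)
shares = variance_shares({"crash": 0.63, "municipality": 0.04})
for level, share in shares.items():
    print(f"{level}: {share:.1%}")
```

Under this assumption, small random-intercept variances at the crash and municipal levels leave most of the variation at the occupant level, which is the pattern reported above.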
However, this study did not compare single-level and multilevel models in terms of parameter significance and model fit, because no single-level model was estimated. Jones and Jorgensen (2003) ignored the intermediate "vehicle" level because the majority of the vehicles in the crash data analyzed included only a single occupant sustaining a serious or fatal injury, leaving little information to differentiate between vehicles and occupants. This can be attributed to a fundamental characteristic of crash data rather than to a particularity of the data analyzed by Jones and Jorgensen. While the number of crashes can be large, the number of vehicles per crash and of occupants per vehicle is typically very low.
Valnar (2005) illustrated the advantage of multilevel modeling over statistical techniques that ignore hierarchies, based on two empirical traffic safety examples. The study showed two important consequences of ignoring the hierarchical structure in the data. The first consequence is the underestimation of standard errors, which was illustrated with data from an observational study on seatbelt behavior. Two factors (passenger: a dummy variable indicating whether the observed subject was a front seat passenger or a driver, and weekend night: a dummy variable to indicate the time span of a crash) were significant at the five percent level in a single-level model but insignificant in a two-level model. The second consequence, which relates to contextual information and is described by the frog pond theory (Hox, 2002), was illustrated with data from a roadside survey on drink driving. In traffic safety, for example, the effect of an explanatory variable (such as willingness to take risks) on the dependent variable (such as a driver's choice of speed) may depend on context: speed choice may depend on the average speed of other drivers at a particular location. Of particular interest for
number of vehicles driving by the road site during the police check) and the odds of drivers exceeding the legal limit of Blood Alcohol Concentration (BAC). The drivers were nested within road sites and a multilevel model was fitted. A significant relationship between gender and the outcome variable provided a good illustration of the frog pond theory in this case. Although the cross-level interaction was found to be insignificant, a significant cross-level interaction would have meant that the influence of a driver's gender on the odds of drink driving varies with the traffic count at each site.
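The underestimation of standard errors described above can be approximated with the Kish design effect, 1 + (m - 1) × ICC, for clusters of equal size m. The sketch below is an illustration under that simplifying assumption and is not the method used by Valnar (2005); the cluster size, ICC, and naive standard error are hypothetical values, and the function names are introduced here for illustration.

```python
import math


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def corrected_standard_error(naive_se, cluster_size, icc):
    """Inflate a standard error computed under independence by sqrt(design effect)."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical values: 20 drivers observed per road site, ICC of 0.10
naive_se = 0.05
adjusted = corrected_standard_error(naive_se, cluster_size=20, icc=0.10)
print(f"design effect: {design_effect(20, 0.10):.2f}")
print(f"naive SE: {naive_se:.3f}, adjusted SE: {adjusted:.3f}")
```

Even a modest within-site correlation inflates the standard error noticeably once clusters contain many observations, which is why a coefficient that appears significant in a single-level model can lose significance in a two-level model.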
Lenguerrand et al. (2006) proposed a binomial model for the injury severity of vehicle occupants that accounts for the hierarchical structure of crash data with three levels: crash, car/vehicle, and occupant. They used crash data from the French road injury crash census for a four-year period from 1996 to 2000 and tested three modeling techniques: logistic models, generalized estimating equations (GEE), and multilevel logistic models. The multilevel models yielded better results than the other two techniques, reinforcing the importance of accounting for the hierarchical nature of crash data. One important observation from this study was that the variance of the vehicle random effect was falsely estimated to be zero in 36% of the cases. These incorrect estimates were attributed to the small number of observations per vehicle and per crash.
Kim et al. (2007) used a sample of 548 crashes collected from 91 two-lane intersections to model the probability of occurrence of five types of crashes using binomial multilevel models with crashes clustered within intersections. A separate model was developed for each crash type. The random variation of the intercept across intersections was significant for all crash types except one (head-on). This meant that the average probability of these crash types occurring varied significantly from intersection to intersection. The modeling results showed that multilevel models provided results similar to those of the traditional models except for one crash type (sideswipe opposite direction).
Helai et al. (2008) used a binomial multilevel model to predict the severity level of driver injury and vehicle damage in traffic crashes, based on a total of 4,095 crashes occurring at signalized intersections that involved 7,084 driver-vehicle units. A driver-vehicle unit was defined by both the vehicle and the person driving it. A binary dependent variable was defined by combining the driver injury severity and the vehicle damage severity for the driver-vehicle units involved in crashes. The authors compared the results of a traditional binomial regression model with those of the multilevel binomial model and found that the ICC (Intra-class Correlation Coefficient) was 28.9%. This meant that 28.9% of the variation in the probability of a driver-vehicle unit experiencing severe damage was attributable to between-crash variance (equivalently, within-crash correlation). The comparison with the classical model also revealed a better model fit for the multilevel formulation.
Yannis et al. (2010) modeled the probability for each individual occupant in a vehicle to sustain different levels of injuries using injury severity levels as a multinomial response. They used a dataset containing 1,300 crashes that involved 3,500 occupants. The results revealed no random variation at the vehicle or crash level for the probability of injuries sustained by the occupants.
Dupont et al. (2010) modeled the probability of a fatality for each individual occupant using a set of fatal car-car crashes. They considered the country, crash, and vehicle hierarchy for the multilevel modeling. The results indicated no significant random variation at the higher level. Comparison with a single-level model formulation revealed similar values and significance for the coefficients of the variables considered.