Accident severity models - ECONOMETRIC MODELS

CHAPTER 3 ECONOMETRIC METHODS USED IN ACCIDENT MODELLING

3.4 ECONOMETRIC MODELS

3.4.2 Accident severity models

Accident severity is often measured categorically: for instance, the severity level of an accident can be classified as fatal, serious injury, slight injury or no injury (property damage only). As such, econometric models that are suitable for categorical data, such as logistic and probit models, have been used to analyse accident severities. For binary outcomes, such as “fatal” and “non-fatal” accidents, a binary logistic regression model is a natural choice and has been widely used in the road safety literature (e.g., Pitt et al., 1990; Shibata and Fukuda, 1994; Miles-Doan, 1996; Farmer et al., 1997; Toy and Hammitt, 2003). For example, Pitt et al. (1990) employed a logistic model to investigate the effects of various factors, such as age, gender and speed, on the relative risk of serious injuries (i.e. serious vs. non-serious injury).
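
As a minimal sketch of how a binary logistic model turns covariates into a probability of serious injury, the following Python fragment may be helpful. The covariates, coefficient values and function names are hypothetical and chosen only for illustration; they are not estimates from any of the studies cited above.

```python
import math

def logistic(z):
    """Logistic CDF: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_serious(x, beta, intercept):
    """P(serious injury | x) under a binary logit model."""
    z = intercept + sum(b * xi for b, xi in zip(beta, x))
    return logistic(z)

# Hypothetical coefficients for: age (decades), male indicator, speed (10 km/h units)
beta = [0.10, 0.25, 0.40]
intercept = -4.0

# Probability of a serious injury for one hypothetical road user
p = prob_serious([3.0, 1.0, 5.0], beta, intercept)

# exp(beta_k) is the multiplicative change in the odds of a serious injury
# for a one-unit increase in covariate k, holding the others fixed
odds_ratios = [math.exp(b) for b in beta]
```

The exponentiated coefficients are the odds ratios usually reported in studies of this kind.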

Although it is possible to fit sequential binary logistic regression models to accident data with multiple (three or more) severity outcomes (see Shibata and Fukuda, 1994; Miles-Doan, 1996), more sophisticated models have been proposed and used in previous studies. Two types of such models have been proposed: (1) ordered response models (ORM), such as ordered logit and probit; and (2) unordered nominal response models, such as multinomial, nested and mixed logit models. Studies employing such models are discussed below.

Since accident severity is ordered in nature (ranging from no injury to fatality), it seems natural to choose discrete ordered response models for analysing accident severity data. Ordered response models (ORM) refer to ordered logit and probit models and their various extensions, such as the generalised ordered logit model. A body of previous studies has employed ordered response models in accident severity analysis. For example, O'Donnell and Connor (1996) investigated how attributes of road users affect accident severity using ordered logit and probit models. The ordered probit model has been used by several researchers in more recent work to examine accident severity (e.g., Khattak et al., 1998; Duncan et al., 1998; Kockelman and Kweon, 2002; Quddus et al., 2002; Zajac and Ivan, 2003; Abdel-Aty, 2003; Lee and Abdel-Aty, 2005). The ordered logit and probit models differ only in the assumed distribution of the error term (logistic versus normal) and, according to O'Donnell and Connor (1996) and Abdel-Aty (2003), produce very similar results.
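
The similarity between ordered logit and ordered probit results can be illustrated with a short Python sketch. It assumes the usual latent-variable formulation, in which an unobserved severity propensity is cut into categories by increasing thresholds; the threshold and linear-predictor values are hypothetical.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probs(xb, thresholds, cdf):
    """Category probabilities P(y = j) for an ordered response model.

    The latent severity is y* = xb + error; category j is observed when
    y* falls between two consecutive thresholds. `cdf` is the CDF of the error.
    """
    cum = [cdf(t - xb) for t in thresholds] + [1.0]      # P(y <= j)
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

SEVERITY = ["no injury", "slight injury", "serious injury", "fatal"]
thresholds = [0.0, 1.5, 3.0]   # hypothetical, must be increasing
xb = 1.0                       # hypothetical linear predictor

logit_p = ordered_probs(xb, thresholds, logistic_cdf)

# The standard logistic error has standard deviation pi/sqrt(3), so the
# parameters are rescaled before comparing with the unit-variance probit
scale = math.pi / math.sqrt(3.0)
probit_p = ordered_probs(xb / scale, [t / scale for t in thresholds], normal_cdf)

for name, pl, pp in zip(SEVERITY, logit_p, probit_p):
    print(f"{name:15s} logit={pl:.3f} probit={pp:.3f}")
```

Once the difference in error scale is accounted for, the two sets of category probabilities are close, which is consistent with the similar results reported in the studies above.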

The ordered logit and probit models can be extended in many ways to address the restrictions arising from their underlying assumptions. One of the primary assumptions of ordered logit and probit models is that the error variances are the same for all observations. When this assumption is violated (i.e. heteroscedasticity), the parameter estimates are biased and the standard errors are incorrect (Yatchew and Griliches, 1985; Keele and Park, 2006). To correct this, Williams (2008) suggested employing a heterogeneous choice (also known as location-scale or heteroscedastic ordered) model, which relaxes the assumption by explicitly specifying the determinants of heteroscedasticity. Another important assumption associated with ordered logit and probit models is that the relationship between each pair of outcome groups is the same. For example, with three severity levels (slight injury, serious injury and fatality), the ordered model assumes that the slope coefficients for slight injury vs. serious injury plus fatality are equal to the slope coefficients for slight injury plus serious injury vs. fatality. This assumption is known in the literature as the parallel regression assumption or, for the ordered logit model, the proportional odds assumption (Long and Freese, 2006). It is frequently violated, which can lead to inappropriate or misleading model estimation results (Long and Freese, 2006; Fu, 1998). To relax the restrictive parallel regression assumption on slope coefficients, a generalised ordered logit model can be used to model ordinal data (Fu, 1998). This model allows the coefficients to vary across different outcome groups. Similarly, a partial proportional odds model has been proposed (Peterson and Harrell, 1990; Lall et al., 2002; Williams, 2006), in which only a subset of coefficients is constrained to be equal across outcome groups. The partial proportional odds model has recently been used by Wang and Abdel-Aty (2008) to investigate left-turn injury severity at intersections, and by Quddus et al. (2010) to investigate the effect of traffic congestion on accident severity.
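
The assumptions and extensions described above can be summarised in a common notation. The symbols below are introduced here for illustration and are not taken from the cited papers: y_i is the severity category (1, ..., J) of accident i, x_i and z_i are covariate vectors, \kappa_j are thresholds, and F is a symmetric error CDF (logistic or standard normal).

```latex
% Standard ordered response model: a single slope vector \beta for every split
% of the outcome (parallel regression / proportional odds)
\Pr(y_i > j \mid x_i) = F(x_i'\beta - \kappa_j), \qquad j = 1, \dots, J-1

% Heteroscedastic (location-scale) ordered model: the error scale depends on z_i
\Pr(y_i > j \mid x_i, z_i) = F\!\left( \frac{x_i'\beta - \kappa_j}{\exp(z_i'\gamma)} \right)

% Generalised ordered logit: a separate slope vector \beta_j for each split
\Pr(y_i > j \mid x_i) = \frac{\exp(x_i'\beta_j - \kappa_j)}{1 + \exp(x_i'\beta_j - \kappa_j)}

% Partial proportional odds: x_i = (x_{1i}, x_{2i}); only the coefficients
% on x_{2i} are allowed to vary with j
\Pr(y_i > j \mid x_i) = \frac{\exp(x_{1i}'\beta_1 + x_{2i}'\beta_{2j} - \kappa_j)}
                             {1 + \exp(x_{1i}'\beta_1 + x_{2i}'\beta_{2j} - \kappa_j)}
```

Setting \gamma = 0 in the second expression, or \beta_j = \beta for all j in the third, recovers the standard ordered model, so each restriction can in principle be tested against its relaxed counterpart.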

Recent developments in ordered response models include the work of Eluru and Bhat (2007) and Eluru et al. (2008). Eluru and Bhat (2007) employed a random coefficient (i.e. mixed) ordered logit model in order to allow randomness in the effects of explanatory variables due to unobserved factors. Eluru et al. (2008) then extended this work by using a mixed generalised ordered response logit (MGORL) model. In ordered logit/probit models the thresholds are fixed across accidents. The MGORL model relaxes this restriction by allowing the thresholds to vary according to both observed and unobserved factors, so it can accommodate heterogeneity in both the effects of explanatory variables and the threshold values. However, no significant heterogeneity effects were found in their results, suggesting that the MGORL model reduced to a generalised ordered logit model.
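
To illustrate what it means for thresholds to vary across accidents, the sketch below builds thresholds from exponentiated increments, which keeps them in increasing order for any covariate values. This parameterisation, along with the function name and all numerical values, is an assumption made for illustration and should not be read as a transcription of the MGORL specification.

```python
import math

def ordered_thresholds(z, alphas, gammas, first=0.0):
    """Build increasing thresholds that depend on observed covariates z.

    Each gap between consecutive thresholds is exp(alpha_k + gamma_k . z),
    which is always positive, so the ordering holds for any z.
    """
    thresholds = [first]
    for a, g in zip(alphas, gammas):
        gap = math.exp(a + sum(gi * zi for gi, zi in zip(g, z)))
        thresholds.append(thresholds[-1] + gap)
    return thresholds

# Hypothetical parameters: two gaps (three thresholds), one covariate
alphas = [0.2, 0.5]
gammas = [[0.3], [-0.4]]

t_without = ordered_thresholds([0.0], alphas, gammas)  # covariate equal to 0
t_with = ordered_thresholds([1.0], alphas, gammas)     # covariate equal to 1
```

Unobserved heterogeneity could be added by drawing the alpha terms from a distribution rather than fixing them, at the cost of simulation-based estimation.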

Although the use of ordered response models (ORM) in analysing accident severity seems popular and sensible, as accident severity is ordinal in nature, the use of such models can be criticised. As discussed by Kim et al. (2007) and Savolainen and Mannering (2007), two major problems arise with ORMs. First, traditional ORMs assume that a variable either increases or decreases accident severity (i.e. it shifts severity in one direction only). However, it is quite possible that a variable has a U-shaped effect across the categories of accident severity. In other words, a variable may simultaneously increase (or decrease) the probability of both high and low severity levels. Following the example given by Washington et al. (2003) (see also Ulfarsson and Mannering, 2004; Khorashadi et al., 2005), the deployment of airbags may cause slight injuries while reducing the likelihood of fatality, i.e. airbag deployment can simultaneously decrease the probability of “no injury” and of “fatality” while increasing the probability of “slight injury”. In addition, a variable may affect only a subset of severity levels, for example shifting accidents from “slight injury” to “serious injury” without increasing the likelihood of “fatality”. The constraint imposed by traditional ordered response models (a variable either increases or decreases severity) would therefore be inappropriate in such cases. The second issue is that accident data often suffer from under-reporting, especially for lower severity categories such as “no injury” and “slight injury”. The presence of under-reporting means that high-severity accidents such as “fatality” and “serious injury” are over-represented in the data, which can lead to biased and inconsistent results when traditional ordered response models are used (Yamamoto et al., 2008).
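
The first restriction can be seen directly in a small numerical sketch. Under a traditional ordered logit model with fixed thresholds, a variable with a positive coefficient necessarily lowers the probability of the lowest category and raises the probability of the highest, so the airbag pattern described above cannot be reproduced. The thresholds and coefficient used here are hypothetical.

```python
import math

def ordered_logit_probs(xb, thresholds):
    """P(y = j) for an ordered logit model with fixed thresholds."""
    cdf = lambda z: 1.0 / (1.0 + math.exp(-z))
    cum = [cdf(t - xb) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

SEVERITY = ["no injury", "slight injury", "serious injury", "fatal"]
thresholds = [0.0, 1.5, 3.0]   # hypothetical
beta = 0.8                     # hypothetical positive coefficient on a 0/1 variable

base = ordered_logit_probs(0.5, thresholds)           # variable switched off
with_x = ordered_logit_probs(0.5 + beta, thresholds)  # variable switched on
change = [b - a for a, b in zip(base, with_x)]

for name, d in zip(SEVERITY, change):
    print(f"{name:15s} change in probability = {d:+.3f}")
```

Whatever the sign of the coefficient, the changes in the two extreme categories always have opposite signs; only the interior categories can move in either direction.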

While the first problem discussed above may be solved to a great extent by using a generalised ordered response model (GORM), which allows the coefficients to vary across different levels of severity, the second problem is more difficult to correct. Thus alternative methods such as an (unordered) nominal response model have been proposed.

A nominal response model, such as a multinomial logit (MNL) model, does not recognise ordinality in the model structure. An MNL model is, however, more flexible in terms of functional form, as the independent variables are not assumed to be identical across all severity categories in the model. An MNL model therefore allows different severity categories to be associated with different sets of independent variables (Yamamoto et al., 2008). Another advantage of an MNL model is that, when under-reporting occurs, it still provides consistent estimates of all coefficients except the constant terms (Cosslett, 1981; Yamamoto et al., 2008). The MNL model was employed to analyse accident severity in early studies (e.g., Shankar and Mannering, 1996; Carson and Mannering, 2001) and has more recently been preferred to ordered response models (Ulfarsson and Mannering, 2004; Khorashadi et al., 2005; Kim et al., 2007).
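
The intuition behind the under-reporting result can be sketched numerically. The fragment below assumes that the probability of an accident being reported depends only on its severity category and not on the covariates; the utilities and reporting rates are hypothetical. Under that assumption, the severity distribution among reported accidents is again an MNL model with the same slopes, and only the constants are shifted (by the log of the reporting rate).

```python
import math

def mnl_probs(utilities):
    """Multinomial logit probabilities (softmax of the systematic utilities)."""
    m = max(utilities)                       # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical systematic utilities for one accident:
# no injury, slight injury, serious injury, fatal
V = [1.2, 0.8, -0.3, -1.5]
# Hypothetical reporting probabilities by severity (lower severities under-reported)
report_rate = [0.3, 0.6, 0.9, 1.0]

true_p = mnl_probs(V)

# Severity distribution among *reported* accidents, by Bayes' rule
weighted = [r * p for r, p in zip(report_rate, true_p)]
observed_p = [w / sum(weighted) for w in weighted]

# The same distribution written as an MNL with shifted constants
shifted_p = mnl_probs([v + math.log(r) for v, r in zip(V, report_rate)])
```

Because `observed_p` and `shifted_p` coincide for any covariate values, the distortion is absorbed entirely by the outcome-specific constants.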

One potential problem of the MNL model is that it assumes that the unobserved components (effects) associated with each accident severity category are independent, which gives rise to the independence of irrelevant alternatives (IIA) property (Long and Freese, 2006). If the IIA assumption is violated, i.e. different accident severity categories share unobserved effects, the model estimation results would be incorrect. To circumvent this limitation, a more general modelling approach has been used, which assumes a homoscedastic generalised extreme value (GEV) distribution for the unobserved effects (Train, 2003). One popular GEV formulation, the nested logit (NL) model, has thus been used in some previous safety research (Shankar et al., 1996; Chang and Mannering, 1999; Lee and Mannering, 2002; Abdel-Aty and Abdelwahab, 2004; Savolainen and Mannering, 2007). The NL model groups severity outcomes with shared unobserved effects in a nest; for example, “no injury” and “slight injury” could be grouped to form a low-severity accident “nest”.
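
A minimal sketch of two-level nested logit probabilities, using the low-severity nest mentioned above, is given below. The utilities and the dissimilarity (log-sum) parameter are hypothetical, and the function name is introduced here for illustration.

```python
import math

def nested_logit_probs(V, nests, lam):
    """Two-level nested logit probabilities.

    V     : dict outcome -> systematic utility
    nests : dict nest name -> list of outcomes in that nest
    lam   : dict nest name -> dissimilarity (log-sum) parameter in (0, 1]
    """
    # Inclusive value (log-sum) of each nest
    iv = {m: math.log(sum(math.exp(V[j] / lam[m]) for j in members))
          for m, members in nests.items()}
    denom = sum(math.exp(lam[m] * iv[m]) for m in nests)
    probs = {}
    for m, members in nests.items():
        p_nest = math.exp(lam[m] * iv[m]) / denom          # P(nest m)
        within = sum(math.exp(V[j] / lam[m]) for j in members)
        for j in members:
            probs[j] = p_nest * math.exp(V[j] / lam[m]) / within  # P(nest) * P(j | nest)
    return probs

V = {"no injury": 1.2, "slight injury": 0.8, "serious injury": -0.3, "fatal": -1.5}
nests = {
    "low severity": ["no injury", "slight injury"],
    "serious": ["serious injury"],
    "fatal": ["fatal"],
}

# Correlated unobserved effects within the low-severity nest (parameter below 1)
p_nl = nested_logit_probs(V, nests, {"low severity": 0.5, "serious": 1.0, "fatal": 1.0})
# With every parameter equal to 1 the model collapses to the MNL model
p_mnl = nested_logit_probs(V, nests, {"low severity": 1.0, "serious": 1.0, "fatal": 1.0})
```

A dissimilarity parameter below one indicates shared unobserved effects within the nest; a value of one means the nesting adds nothing beyond the MNL model.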

While GEV models can take many different forms (the MNL model is itself a special case of GEV; see Train, 2003), a more flexible approach has been proposed that adds a more general mixing distribution for the error components of the model. This model, referred to as the mixed logit model, is highly flexible. As shown by McFadden and Train (2000), any discrete choice model derived from random utility maximisation can be approximated to any degree of accuracy by a mixed logit model. Thus the mixed logit model can be used to accommodate complex patterns of correlation among accident severity outcomes as well as unobserved heterogeneity. The mixed logit model has recently been employed by Milton et al. (2008) to analyse highway accident severity in Washington State.
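
Mixed logit probabilities have no closed form and are usually approximated by simulation: the MNL probability is averaged over draws of the random coefficients. The sketch below assumes a single covariate whose coefficient for the last outcome is normally distributed; this specification, the function names and all numerical values are hypothetical and serve only to show the mechanics.

```python
import math
import random

def mnl_probs(utilities):
    """Multinomial logit probabilities (softmax of the systematic utilities)."""
    m = max(utilities)
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]

def mixed_logit_probs(x, consts, slopes, sd_last, n_draws=2000, seed=0):
    """Simulated mixed logit probabilities.

    Utility of outcome j is consts[j] + slopes[j] * x. The slope of the last
    outcome is random, drawn from N(slopes[-1], sd_last^2); the others are fixed.
    The result is the MNL probability averaged over the draws.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(consts)
    for _ in range(n_draws):
        draw = list(slopes)
        draw[-1] = rng.gauss(slopes[-1], sd_last)
        p = mnl_probs([c + b * x for c, b in zip(consts, draw)])
        totals = [t + pj for t, pj in zip(totals, p)]
    return [t / n_draws for t in totals]

# Hypothetical parameters: no injury (base), slight injury, serious injury, fatal
consts = [0.0, -0.5, -1.5, -3.0]
slopes = [0.0, 0.2, 0.4, 0.6]
x = 2.0

p_mixed = mixed_logit_probs(x, consts, slopes, sd_last=0.5)
p_fixed = mixed_logit_probs(x, consts, slopes, sd_last=0.0)  # no heterogeneity
```

With a zero standard deviation the simulated probabilities coincide with the MNL probabilities, which is the sense in which the MNL model is nested within the mixed logit model.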

In this thesis, a series of ordered and nominal response models, such as a generalised ordered logit model and a mixed logit model, will be developed to analyse the effect of traffic congestion on road accident severity. Accident severity models will also be used to estimate the expected proportions of accidents at different severity levels on road segments.

In document The relationship between traffic congestion and road accidents : an econometric approach using GIS (Page 64-68)