The desire to understand the association between collisions and factors such as road geometry or traffic profiles led to the development of explanatory statistical models.
Chapter 3: Review of statistical techniques for road safety improvement 45
Miaou et al. (1993) and Shankar et al. (1995) gave three motivations for this work:
to take proactive remedial action at high collision-risk locations across a network;
to identify variables critical to collision risk and develop relevant countermeasures;
and to predict the impact of countermeasures or to improve before-after comparisons of countermeasures. These motivations are rooted in engineering, but human factors should also be incorporated into this research.
Lord et al. (2010) published a review of the strengths and weaknesses of the methodological approaches pertinent to collision count modelling and anticipated a substantial improvement in the understanding of associated factors from combining new methodologies with detailed collision data.
3.5.1 Linear regression
Collision count models were originally built using linear regression (Joshua et al., 1990; Jovanis et al., 1986; Zegeer et al., 1990). These models are well understood and give intuitive results, especially for large numbers of events, but they are not always reliable; for example, where there are few collisions, negative values can be predicted (Joshua et al., 1990; Jovanis et al., 1986). One clear issue is that linear regression requires the dependent variable to be measured on an interval scale, whereas collision data consist of discrete non-negative integers, often with a preponderance of zeros. Skewed collision distributions were often log-transformed, with a small offset added so that linear models could handle zero counts (Zegeer et al., 1990). Problems remained with linear regression, including heteroskedasticity (the error variance often increases with traffic flow) and misspecification of the coefficients (Jovanis et al., 1986). These problems imply that the assumptions of ordinary least squares (OLS) used in linear regression were violated and that new methods were needed.
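To make the negative-prediction problem concrete, the following sketch fits OLS to a small set of hypothetical counts (the flow and collision values are invented for illustration, and NumPy is assumed to be available). It then applies a log transform with a small offset of the kind used by Zegeer et al. (1990); the offset of 0.5 is an assumed value.

```python
import numpy as np

# Hypothetical toy data: traffic flow (arbitrary units) and collision counts
# at eight sites. The values are illustrative only, not taken from any study.
flow = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
collisions = np.array([0, 0, 0, 1, 0, 2, 3, 6], dtype=float)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(flow), flow])

# Ordinary least squares on the raw counts.
beta_ols, *_ = np.linalg.lstsq(X, collisions, rcond=None)
pred_ols = X @ beta_ols
print("OLS predictions:", np.round(pred_ols, 2))  # the two lowest-flow sites are negative

# Log transform with a small offset so that zero counts can be used.
offset = 0.5  # assumed value; the choice of offset is arbitrary
log_y = np.log(collisions + offset)
beta_log, *_ = np.linalg.lstsq(X, log_y, rcond=None)

# Back-transformed predictions are bounded below by -offset rather than by 0.
pred_log = np.exp(X @ beta_log) - offset
print("Log-offset predictions:", np.round(pred_log, 2))
```

With these data the OLS intercept is negative, so the lowest-flow sites receive negative predicted collision counts. The log-offset fit only partly avoids this: back-transformed predictions are bounded below by minus the offset, not by zero.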
Linear regression modelling is useful in a range of situations where a dependent variable measured on an interval scale depends on other interval, ordinal and nominal independent variables. Linear regression models have been used
successfully in work-related road safety to model behaviour and perceptions related to drink driving (Guppy et al., 1995) and to create models of intention to speed (Newnam et al., 2004). Although linear regression models are very versatile, they
are restricted to modelling interval-scale outcomes and rely on assumptions of constant variance and independence of residuals, which may not always be met (McCullagh et al., 1989).
3.5.2 Generalised linear models
The generalised linear model (GLM) has a flexible form that allows error
distributions other than the normal, while including linear regression as a special case.
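In standard GLM notation (the symbols here are introduced for illustration: $g$ is the link function, $\mu_i$ the expected response for observation $i$, and $x_{ij}$ the covariates), the general model and the two special cases most relevant to this section can be written as:

```latex
\begin{aligned}
&\text{GLM: } y_i \sim \text{exponential family}, \quad \mathrm{E}[y_i] = \mu_i, \quad
  g(\mu_i) = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} \\
&\text{Linear regression: } y_i \sim \mathrm{N}(\mu_i, \sigma^2), \quad
  g(\mu_i) = \mu_i \ \text{(identity link)}, \quad \mathrm{Var}(y_i) = \sigma^2 \\
&\text{Poisson regression: } y_i \sim \mathrm{Poisson}(\mu_i), \quad
  g(\mu_i) = \log \mu_i \ \text{(log link)}, \quad \mathrm{Var}(y_i) = \mu_i
\end{aligned}
```

The log link keeps every fitted mean positive, and the Poisson variance function $\mathrm{Var}(y_i) = \mu_i$ is the mean-equals-variance restriction that over-dispersed collision data often violate.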
Joshua et al. (1990) and Miaou et al. (1992) indicated that GLMs, in particular Poisson-based models, are more suitable than linear regression for modelling collision counts. Barger et al. (2005) and Lynn et al. (1998) have reported using Poisson regression for work-related road safety with self-reported data. Poisson models have superior statistical properties for count data but require the mean to equal the variance. Over-dispersion, in which the variance is greater than the mean, is frequently observed in collision data (Miaou, 1994). A quasi-Poisson approach accommodates over-dispersion by inflating the parameter standard errors by an estimated dispersion factor, giving more credible inference while leaving the coefficient estimates unchanged (Maher et al., 1996). The negative binomial (NB) model (alternatively named the Poisson-gamma model) is widely used in collision modelling and explicitly includes a dispersion parameter (Cameron et al., 1998;
Lord et al., 2005; Maher et al., 1996). There are no reports, however, of this model being applied to work-related road safety except as part of a multilevel model (see 3.7 below).
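A minimal sketch, assuming NumPy is available and using simulated rather than real collision data, of how over-dispersion can be detected and how the quasi-Poisson adjustment works. The counts are drawn from a Poisson-gamma mixture (a Poisson whose mean is multiplied by gamma noise), which is the construction behind the alternative name of the NB model. The helper `fit_poisson_irls` and all parameter values are hypothetical, written here for illustration; in practice a statistical package would normally be used.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Hypothetical helper: fit a Poisson GLM with log link by iteratively
    reweighted least squares; returns coefficients, fitted means and SEs."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # X^T W, with weights W = mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)  # model-based (pure Poisson) covariance
    return beta, mu, np.sqrt(np.diag(cov))

# Simulated over-dispersed counts from a Poisson-gamma mixture (hypothetical data).
rng = np.random.default_rng(42)
n, r = 1000, 1.0                         # r: assumed gamma shape, so Var = mu + mu**2 / r
x = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
mu_true = np.exp(0.5 + 1.0 * x)
lam = mu_true * rng.gamma(shape=r, scale=1.0 / r, size=n)  # gamma noise with mean 1
y = rng.poisson(lam).astype(float)

beta, mu_hat, se_poisson = fit_poisson_irls(X, y)

# Pearson dispersion statistic: close to 1 if the Poisson assumption holds.
phi = np.sum((y - mu_hat) ** 2 / mu_hat) / (n - X.shape[1])

# Quasi-Poisson: same coefficients, standard errors scaled by sqrt(phi).
se_quasi = np.sqrt(phi) * se_poisson

print("coefficients:", np.round(beta, 3))
print("dispersion:", round(phi, 2))
print("Poisson SE:", np.round(se_poisson, 3), "quasi-Poisson SE:", np.round(se_quasi, 3))
```

A dispersion statistic well above 1 signals over-dispersion. The quasi-Poisson adjustment keeps the Poisson coefficients and only widens their standard errors, whereas an NB model would estimate the extra variance term directly through its dispersion parameter.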
Issues of data and methodology remain, such as compensating for a trend in
collision risk; these have been summarised by Maher et al. (1996) and Lord et al. (2010). Where possible, the analysis will attempt to overcome these issues and will discuss any remaining gaps.
Some data contain more zeros than would be expected under the Poisson or
negative binomial distributions. These situations are hypothesised to arise from a dual-state process and have been modelled using the zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) models (Lord et al., 2005; Shankar et al., 1997). Lord et al. (2005) stated that the excess zeros resulted from low exposure rather than from a true dual-state process, while Lord et al. (2010) added that the "safe" state has a long-term mean of zero, which is non-physical in road safety, and advised against the use of these models.
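The dual-state idea behind the ZIP model can be sketched through its probability mass function. In this minimal pure-Python sketch, `pi` is the assumed probability that a site is in the "safe" state (which always produces zero collisions) and `lam` is the Poisson mean in the other state; both values are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: with probability pi the site is in a 'safe' state
    that always yields zero; otherwise counts follow Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base

lam, pi = 2.0, 0.3  # hypothetical parameter values
print("P(0) Poisson:", round(poisson_pmf(0, lam), 3))
print("P(0) ZIP:    ", round(zip_pmf(0, lam, pi), 3))
# The ZIP mean is (1 - pi) * lam; its variance, (1 - pi) * lam * (1 + pi * lam),
# exceeds the mean whenever pi > 0, so zero inflation also produces over-dispersion.
```

With these values the ZIP model assigns a probability of about 0.39 to a zero count, against about 0.14 under a plain Poisson with the same `lam`. The "safe" state contributes a mean of exactly zero, which is the feature Lord et al. (2010) regarded as non-physical.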
The statistical models applied to work-related road safety have included binary logistic, Poisson and a few negative binomial regression models, with the majority of studies restricted to binary logistic regression. Wider use of count models and their derivatives (see below) should be explored.
3.5.3 Mixed models
The models discussed above assume that observations are uncorrelated (for linear regression, that the errors are spherical: uncorrelated with constant variance). This may be untrue, however, especially where observations share locations or time frames (e.g. cross-sectional data gathered over several time periods, referred to as panel data), leading to spatial or temporal correlation (Lord et al., 2010). Random effects or random parameters are used to model the unobserved effects assumed to underlie these correlations, and mixed models combine them with fixed parameters to offer greater flexibility.
Mixed models offer not only the usual fixed effects or parameters but also random effects or parameters, whose values vary randomly across the observations according to a specified distribution, often the normal. Random-effects models allow the intercepts to vary in this way, while random-parameter models allow the slope parameters to vary.
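The difference between random effects and random parameters can be illustrated by simulating both data-generating processes. This NumPy sketch assumes a hypothetical panel of 200 sites observed in two periods, Poisson counts with a log link, and normally distributed site-level deviations; all parameter values are invented for illustration. It also shows how a shared site effect produces the between-period correlation that models assuming uncorrelated observations ignore.

```python
import numpy as np

rng = np.random.default_rng(7)
n_sites, n_periods = 200, 2            # hypothetical panel: 200 sites observed twice
beta0, beta1 = 0.5, 0.8                # assumed fixed effects
x = rng.uniform(0.0, 1.0, size=(n_sites, n_periods))  # e.g. a scaled traffic covariate

# Random-effects model: each site gets its own intercept shift u_i ~ N(0, sigma_u^2),
# shared by all of that site's observations.
sigma_u = 0.7
u = rng.normal(0.0, sigma_u, size=(n_sites, 1))
y_re = rng.poisson(np.exp(beta0 + u + beta1 * x))

# Random-parameter model: each site gets its own slope beta1 + v_i, v_i ~ N(0, sigma_v^2).
sigma_v = 0.5
slopes = beta1 + rng.normal(0.0, sigma_v, size=(n_sites, 1))
y_rp = rng.poisson(np.exp(beta0 + slopes * x))

# The shared site intercept induces correlation between a site's counts in the two
# periods, which a model assuming independent observations would ignore.
corr_re = np.corrcoef(y_re[:, 0], y_re[:, 1])[0, 1]
print("between-period correlation (random effects):", round(corr_re, 2))
print("spread of site-specific slopes (random parameters):", round(slopes.std(), 2))
```

Fitting either structure requires estimating the variance of the site-level deviations (here `sigma_u` or `sigma_v`) alongside the fixed effects, which is what distinguishes a mixed model from the fixed-parameter models above.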
Random-parameter negative binomial models have the potential to explain the factors influencing collision frequencies more fully than fixed-parameter models. They have been applied to cross-sectional data on Indiana State highway segments (Anastasopoulos et al., 2009), Washington State interstates (Venkataraman et al., 2011) and 200 km of undivided two-lane highways in India (Dinu et al., 2011). In the latter case, random-parameter models were able to account for the greater variation within and among vehicles and drivers found in India compared with Western countries.
Random-parameter models are not restricted to the negative binomial distribution; other forms include Poisson, lognormal and full Bayesian implementations
(El-Basyouny et al., 2009). Similarly, random effects may be added to any model, including models of collision severity. Examples of random effects in severity models are examined in Section 3.6 below.
The random-effects negative binomial (RENB) model, in which a negative binomial model is fitted to panel data, was first reported by Hausman et al. (1984) and has been used in road collision modelling. Applications include the evaluation of clustered median crossover collisions in Washington State (Shankar et al., 1998) and the study of collision occurrence at signalised junctions (Chin et al., 2003), both based on panel data. Improved model fit can be expected from allowing parameters to vary across observations rather than being fixed across the entire dataset (Hausman et al., 1984); however, these models may transfer poorly to other datasets (Lord et al., 2010).
No random-effects or random-parameter models have been reported in the context of work-related road safety. This is an obvious opportunity for investigation, given appropriate data.