
Chapter 3 Data Description and Methodology

3.5 Model Estimation

3.5.2 Model Description

There are a number of statistical methods available to predict the number of accidents on roadway segments, including roundabouts and intersections. The summary statistics in Section 3.3.6 show that, for the dependent variables (total accidents, truck accidents, and HBIs), the mean is less than the variance for all roundabout categories (see Tables 3-9a to c). This indicates that the data are over-dispersed and, as discussed in Section 2.1.4, the NB distribution is more suitable for count data with such dispersion; for this reason NB models were used to predict total accidents, truck accidents, and HBIs. In this study the random-parameters model is applied for the first time to predict accidents and HBIs at roundabouts because, as described in Section 2.1.4, using traditional NB models (which constrain parameters to be fixed across observations) to predict accidents at road sections, including intersections and roundabouts, would lead to biased results from which wrong conclusions may be drawn, as stated by Lord and Mannering (2010). The results obtained using the random-parameters NB model are compared with those of the fixed-parameters NB model. As described in Section 2.1.4, random-parameters models allow the parameters of one or more variables to vary across the observations, and this is indicated by the SD of the parameter distribution (if the SD is statistically different from zero the parameter is considered random; if not, it remains fixed across the observations). The model is based on a number of traffic and geometric variables described in Section 3.3.6 for each roundabout category.
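The over-dispersion check described above (variance greater than mean) can be illustrated with a minimal Python sketch. The counts below are hypothetical and are not taken from the study data; the function name `dispersion_ratio` is introduced here only for illustration.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of a list of counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the dispersion ratio is undefined")
    return variance(counts) / m

# Hypothetical accident counts per roundabout (illustrative only, not study data).
example_counts = [0, 1, 1, 2, 3, 5, 8, 13, 21, 2]

ratio = dispersion_ratio(example_counts)
# A ratio above 1 means the variance exceeds the mean, i.e. over-dispersion,
# which is the situation in which an NB model is preferred over a Poisson model.
is_overdispersed = ratio > 1.0
```

A ratio close to 1 would be consistent with the Poisson assumption of equal mean and variance; the further the ratio lies above 1, the stronger the case for the NB form.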

The methodological approach behind the application of random-parameters models to count data is illustrated in detail in Section 2.1.5. As described in that section, in order to let the effects of the variables vary across the observations in random-parameters count-data models, the predicted mean is written as illustrated in Eq. (2-8).

The random-parameters and fixed-parameters NB count data models were applied to the dependent variables using LIMDEP software, which is an econometric and statistical software package that provides a programming language to specify, estimate and analyse random and fixed-parameters NB models. This section describes the command statement used to predict accidents and HBIs, illustrating the procedure behind the random parameter; for more details about the procedure see Appendix E. Halton draws and marginal effects used in the statement are explained, and all the parameters used in the LIMDEP program are described. A more detailed description of the commands and statements illustrated in this section is given by Greene (2007).

Let:

negbin = specification for NB model.

Lhs = specification includes dependent variable; let (y) be the dependent variable (total accident or truck accident or HBI numbers) based on the model to be developed.

Rhs = specification of the independent variables; for whole roundabouts, let x1 to x11 be the geometric and traffic variables, in which x1 = two-lane indicator, x2 = three-arm indicator, x3 = four-arm indicator, x4 = five-arm indicator, x5 = ICD, x6 = circulatory roadway width, x7 = entry width, x8 = signalised indicator, x9 = un-signalised indicator, x10 = percentage of truck traffic, and x11 = ln(AADT). The variable ‘one’ is used for the constant term, as described by Greene (2007), who stated that in order for a model to contain a constant term the variable ‘one’ must be included with the other Rhs variables.

In order to compute the predicted values, the keep = yfit170 command is used; it computes the predicted values for the estimated model and keeps them as a new variable named yfit170 (Greene, 2007). The predicted values are compared with the actual values in order to assess the accuracy and fit of the random-parameters models relative to the fixed-parameters model.

Rpm: description of the random parameters models

Pts = the number of replications (draws) for the estimated simulation; the program default value is 100, but we can change this value.

Halton = specification of Halton sequences or draws for simulation-based estimators (see Section 2.1.5.1, and later in this section).

Fcn = the specification of the random parameters. The basic form is Fcn = parameter label (type), in which the ‘parameter label’ is a variable name that has been used in the Rhs specification, and ‘type’ is one of the distributions described later in this section (Greene, 2007). It should be noted that the random-parameters model in LIMDEP combines fixed and random parameters: the Fcn specification is used only for the parameters that are considered random; for a fixed-parameters model this statement is removed before running the program.

Marginal effect = displays estimated marginal effects (more detail concerning this is given in Section 2.1.5.2, and later in this section).

From the above specifications, the fixed- and random-parameters NB commands used to predict accidents and HBIs can be written as:

Fixed-parameters NB model command:

--> negbin;lhs = Y;rhs = one,x5,x9,x10,x11 ;rpm;pts = 200;halton ;marginal effects$

Random-parameters NB model command:

--> negbin;lhs = Y;rhs = one,x5,x9,x10,x11 ;rpm;pts = 200;halton

In the fcn statement of the random-parameters model, for instance x10 (n), (n) denotes the normal distribution; in this study all the random parameters were found to fit a normal distribution. However, there are other distributions, for instance “lognormal (which restricts the impact of the estimated parameter to be strictly positive or negative), Weibull, uniform and triangular” (Anastasopoulos & Mannering, 2009, p.155). Note that for each random (i.e. normally distributed) parameter, the tool in Stattrek (2016) was used to identify, as a percentage, the probability that the variable is associated with an increase or a decrease in accidents and HBIs. First, the standard score (z) in the calculator was set to zero (i.e. area under the normal distribution); then the mean and SD of the random parameter estimated by the model were entered into the calculator to give the cumulative probability P(Z < z) (footnote 27) for the random parameter. The probability lies between zero and one and quantifies the uncertainty associated with the event (e.g. the probability that a two-lane indicator is associated with more rather than fewer accidents would be 0.33, which means 33% of two-lane roundabouts resulted in more accidents, and 67% resulted in fewer).
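The calculation performed with the online calculator can be reproduced as a minimal Python sketch, assuming the random parameter is normally distributed. The mean and SD used below are hypothetical values chosen only so that the result is close to the 33%/67% split quoted in the example; they are not estimates from this study, and the function names are introduced here for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability P(X <= x) for a normal variable with the given mean and SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def share_positive(mean, sd):
    """Share of observations whose normally distributed parameter lies above zero."""
    return 1.0 - normal_cdf(0.0, mean, sd)

# Hypothetical estimates (not from the study): mean = -0.44, SD = 1.0.
p_more = share_positive(-0.44, 1.0)  # share associated with more accidents
p_fewer = 1.0 - p_more               # share associated with fewer accidents
```

A parameter above zero increases the expected count, so the area of the distribution above zero is read as the share of roundabouts for which the variable is associated with more accidents.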

Estimating the random- and fixed-parameters NB models requires simulation-based maximum likelihood estimation. As discussed in Section 2.1.5.1, Halton draws were used by previous researchers (Anastasopoulos & Mannering, 2009; El-Basyouny & Sayed, 2009; Garnowski & Manner, 2011; Ukkusuri et al., 2011; Venkataraman et al., 2014) to overcome the problem of maximum likelihood estimation for random-parameters models of independent data; this technique, with 200 draws, was therefore applied in the current study. In the models developed, the maximum number of random parameters is three, which means that estimation requires three-dimensional integration to obtain a good approximation. Bhat (2003) found that 150 Halton draws give a good approximation, for dimensions both below and above five, when compared with 500 random draws.
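To make the idea of Halton draws concrete, the following minimal Python sketch generates 200 three-dimensional Halton points and transforms them into standard normal draws. It is an illustration of the general technique only, not of LIMDEP's internal implementation; the choice of the primes 2, 3 and 5 as bases and all function names are assumptions made here.

```python
from statistics import NormalDist

def halton(index, base):
    """Return the index-th element (index >= 1) of the Halton sequence in the given base."""
    if index < 1 or base < 2:
        raise ValueError("index must be >= 1 and base must be >= 2")
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)  # peel off the lowest digit in this base
        result += digit * fraction          # mirror it behind the radix point
        fraction /= base
    return result

def halton_draws(n_draws, primes=(2, 3, 5)):
    """One row per draw, one column per random parameter (one prime base per column)."""
    return [[halton(i, p) for p in primes] for i in range(1, n_draws + 1)]

# 200 draws for up to three random parameters, matching the numbers quoted in the text.
uniform_draws = halton_draws(200)

# Map the uniform draws to standard normal draws via the inverse normal CDF.
std_normal = NormalDist()
normal_draws = [[std_normal.inv_cdf(u) for u in row] for row in uniform_draws]
```

Because each column uses a different prime base, the points fill the unit cube more evenly than pseudo-random numbers, which is why fewer draws are typically needed for a given accuracy.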

27 The cumulative probability is a value referring to the probability that a randomly selected variable will be less than or equal to a specified value.

Marginal effect in the model statement, as described in Section 2.1.5.2, Eq. (2-10), gives the predicted change in the dependent variable with respect to a one-unit change in the independent variable over a time period. For instance, as the accident data cover 11 years, the expected change in accidents for a one-unit change in the independent variable is over 11 years (e.g. if the marginal effect for ICD (metres) was 0.2, a one-metre increase in ICD would be associated with an average increase of 0.2 accidents over an 11-year period). As discussed in Section 2.1.5.2, LIMDEP computes marginal effects at the means of the independent variables (see Eq. (2-11)) instead of computing the effect for each observation and then dividing the sum by the number of observations as in Eq. (2-10).
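The difference between the two ways of computing a marginal effect can be sketched in Python. The sketch assumes the usual log-link form for count models, in which the expected count is exp(β'x) and the marginal effect of variable k is β_k exp(β'x); this functional form is an assumption made here for illustration and should be checked against Eqs. (2-10) and (2-11). The coefficients and data are hypothetical.

```python
import math

def linear_index(beta, x):
    """Compute the linear predictor beta'x for one observation."""
    return sum(b * v for b, v in zip(beta, x))

def marginal_effect_at_means(beta, rows, k):
    """Marginal effect of variable k evaluated at the sample means of the variables."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(beta))]
    return beta[k] * math.exp(linear_index(beta, means))

def average_marginal_effect(beta, rows, k):
    """Marginal effect of variable k computed per observation, then averaged."""
    return sum(beta[k] * math.exp(linear_index(beta, r)) for r in rows) / len(rows)

# Hypothetical model: constant term plus one variable (values are illustrative only).
beta = [0.5, 0.02]
rows = [[1, 30], [1, 50], [1, 70]]

mem = marginal_effect_at_means(beta, rows, k=1)
ame = average_marginal_effect(beta, rows, k=1)
```

Under this assumed form the two measures coincide only when all observations share the same variable values; otherwise they differ, which is why it matters which of the two a software package reports.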

The procedure for getting a random-parameters NB model:

1. Firstly, check all the variables to get a good fixed-parameters NB model: any variables that are insignificant will be removed from the model. A good fixed-parameters NB model is acquired by adding the first variable; if it is found to be significant it will remain in the model, and if not it will be removed and the second variable will be added. This process continues until all the variables have been checked and only the significant ones remain in the model.

2. After building a good fixed-parameters model, all significant parameters in the fixed model are tested as random parameters in order to see whether any independent variables are distributed randomly; the variables that were found to be insignificant in the fixed-parameters model are also tested in the same way. A parameter is random when the SD of the parameter distribution is statistically different from zero; if the estimated SD of the parameter distribution is not statistically different from zero then the parameter is fixed across the roundabouts.

Note that when a given variable’s:

• Mean and SD are both insignificant, it is removed from the model (see orange circles outlined in LIMDEP output below). Note that the significance of a parameter is indicated by the t-statistic (b/St.Er.). Usually, a t-test is used to test the significance of the coefficients; three critical t-values are available for testing the significance of the variables (1.65, 1.96, and 2.58 for the 90%, 95%, and 99% confidence levels, respectively (Washington et al., 2011)).

• Mean is significant and SD is insignificant (not statistically different from zero), the variable is fixed across the observations (see orange circles outlined in LIMDEP output below).

• Mean is insignificant and SD is statistically different from zero (i.e. significant), the variable is considered random across the observations (see orange circles outlined in LIMDEP output below).

• Mean and SD are both significant, the variable is considered random across the observations (see orange circles outlined in LIMDEP output below).

The detailed procedure concerning the LIMDEP output is illustrated in Appendix E. It should be taken into account that once the random-parameters model is estimated, a separate parameter is estimated for each observation; therefore it cannot be written as an equation, since each observation has its own β if the parameter is random.

3. Then the estimated final random-parameter model is run as a fixed model by removing the FCN statement for comparison.

4. The same procedure (steps 1, 2 and 3) was used to estimate models for grade-separated and at-grade roundabouts based on total accidents, truck accidents and HBIs. In addition, the same procedure was applied to total and truck accidents when they are related to HBIs, along with traffic and geometric variables for whole roundabouts, within circulatory lanes, and at roundabout approaches and for different approach categories (number of lanes, signalisation, and road class).
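The decision rule in step 2 for treating a variable as removed, fixed or random can be summarised in a short Python sketch. The function name and the default critical value of 1.96 (the 95% level listed above) are assumptions made for illustration; the t-statistics passed in stand for the b/St.Er. values reported for the mean and the SD of each parameter.

```python
def classify_parameter(mean_t, sd_t, critical=1.96):
    """Classify a variable from the t-statistics of its parameter mean and SD.

    Returns "random" when the SD is significant (whether or not the mean is),
    "fixed" when only the mean is significant, and "remove" when neither is.
    """
    mean_significant = abs(mean_t) >= critical
    sd_significant = abs(sd_t) >= critical
    if sd_significant:
        return "random"
    if mean_significant:
        return "fixed"
    return "remove"

# Hypothetical t-statistics (not taken from the LIMDEP output of this study).
examples = {
    "both insignificant": classify_parameter(0.8, 0.5),
    "mean only": classify_parameter(2.4, 1.1),
    "sd only": classify_parameter(1.2, 2.7),
    "both significant": classify_parameter(3.0, 2.2),
}
```

Passing `critical=1.65` or `critical=2.58` applies the 90% or 99% level instead.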
