Data Collection and Preparation - Research Methodology

4 Research Methodology

4.3 Data Collection and Preparation

The methodology of this research consists of four stages: data collection and preparation, preliminary analysis, data analysis, and duration prediction (Figure 4-5). Data collection is a vital part of the research approach. In general, the data required to develop HBDMs fall into two categories: the dependent variable and the explanatory (independent) variables. The dependent variable is determined by the study aim, whereas the explanatory variables are selected on the basis of previous studies and data availability.

In HBDMs, the dependent variable is the time variable of the incident, in other words the duration variable. As mentioned in section 2.4, the definition of incident duration varies from one study to another according to the aims and objectives of each research project. The definition adopted is therefore highly dependent on the purpose of the study.

In this study the total incident duration is divided into three time intervals:

1. Reporting time: the time in minutes between the incident occurrence and the responder receiving the call to respond to the incident.

2. Response time: the time in minutes between the responder receiving the call and the arrival of the first responder (accident investigator) at the scene.

3. Clearance time: the time in minutes between the arrival of the first responder (accident investigator) at the scene and the last responder (accident investigator) departing from the scene.
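The three definitions above can be illustrated with a minimal Python sketch. It assumes that each accident record carries four timestamps (occurrence, call received, first arrival, last departure). The function name `interval_times` and the example timestamps are hypothetical and are not taken from the study data.

```python
from datetime import datetime


def interval_times(occurred, call_received, first_arrival, last_departure):
    """Return (reporting, response, clearance) times in minutes.

    Each interval ends at its event of interest, and that same
    timestamp is the start of the following interval.
    """
    def minutes(start, end):
        return (end - start).total_seconds() / 60.0

    reporting = minutes(occurred, call_received)
    response = minutes(call_received, first_arrival)
    clearance = minutes(first_arrival, last_departure)
    return reporting, response, clearance


# Made-up timestamps for one accident (illustration only).
fmt = "%Y-%m-%d %H:%M"
record = [
    datetime.strptime(s, fmt)
    for s in (
        "2010-01-01 08:00",  # incident occurrence
        "2010-01-01 08:05",  # responder receives the call
        "2010-01-01 08:20",  # first responder arrives
        "2010-01-01 09:00",  # last responder departs
    )
]

reporting, response, clearance = interval_times(*record)
total = reporting + response + clearance
print(reporting, response, clearance, total)
```

Because consecutive intervals share their boundary timestamps, the three interval times sum to the total incident duration.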

It is important to collect the reporting, response and clearance times accurately to avoid censoring issues. As discussed in section 3.4, this can be achieved by clearly defining the start point and end point of each interval time and by using an appropriate time scale. In this study, the start point, the end point and the time scale are stated explicitly in the definitions of the interval times.

Another issue is that the event of interest, which determines the end of the interval time and the beginning of the following interval time, needs to be clearly defined. As mentioned in the definitions of the interval times, the event of interest for reporting time is receiving the call regarding the incident; the event of interest for response time is the arrival of the first responder; and the event of interest for clearance time is the departure of the last responder.

The explanatory data generally include traffic accident characteristics, such as the time, location, severity, reporting mechanism and investigation mechanism of incidents. Some characteristics are common to every interval of the total incident duration, namely location-specific data, time of day and day of the week. However, to obtain a clear insight into duration dependence, some interval times require specific information. For example, the reporting mechanism may affect the reporting time, and the clearance time could be affected by the investigation method and related mechanisms used for clearing incident sites. Collecting interval-related information is therefore necessary to avoid misinterpreting the model results.

Previous research has shown that numerous variables can be used to estimate or analyse incident duration. Variables common to previous studies include incident type, location, the number of affected lanes, weather conditions, incident time, and the number of vehicles involved. Table 4-1 presents a summary of the explanatory variables used by previous research in the area of incident duration.

Figure 4-5 Methodology framework

[Figure: flowchart of the four stages of the methodology. The text recoverable from the figure is summarised below.]

Data collection and preparation: FTSS records (temporal, geographic, environmental and accident characteristics); ASCIS records (occurrence, reporting, arrival and departure times); data processing (data cleaning, coding and preparation); accident duration database.

Preliminary analysis: select the best method considering the available HBDMs (PH vs AFT); select the suitable interval time given due consideration of the data collected for the study (1, 5, 10, … minutes).

Data analysis: develop a base model using interval time data; investigate the best distribution to represent interval time data (distributions shown: Weibull, Distribution 1, …, Distribution n) using plots comparing observed and predicted duration and the Akaike Information Criterion; investigate the significance level of the explanatory variables using the -2 log L̂ statistic; investigate further significant variables using the stepwise method; analyse covariate effects on interval times using fully parametric HBDMs (estimated coefficient sign, magnitude in interval time); interpret covariate effects on interval time and their links to the current practices of traffic accident management in Abu Dhabi.

Duration prediction: predict duration; visualise predicted vs observed duration; test prediction accuracy (MAPE); develop decision trees.
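The framework in Figure 4-5 names two criteria: the Akaike Information Criterion for choosing between candidate distributions, and MAPE for testing prediction accuracy. The sketch below shows their standard definitions in Python. It is an illustration only: the function names, log-likelihoods, parameter counts and durations are made up and are not results of this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Observed durations must be non-zero, which holds for positive durations.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(o - p) / o for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical fitted models: name -> (log-likelihood, number of parameters).
candidates = {
    "distribution_1": (-120.0, 3),
    "distribution_2": (-118.5, 3),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

# Hypothetical observed and predicted durations in minutes.
error = mape([10.0, 20.0, 40.0], [12.0, 18.0, 40.0])
print(scores, best, error)
```

The AIC penalises each additional parameter by 2, so with equal parameter counts the model with the higher log-likelihood is preferred.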

Table 4-1 Explanatory variables of incident duration

Study | Explanatory variables

(Sullivan, 1997) | Freeway characteristics, incident type, times, location and traffic volumes.

(Garib et al., 1997) | Time of the day, police response time, weather and number of vehicles involved.

(Jones et al., 1991) | Season, time of day, special events, driver and vehicle characteristics, accident severity measures and location.

(Khattak et al., 1995) | Incident type, vehicle type, number of vehicles involved, injuries and fatalities, property damage, response time, number of responders, weather condition, incident location, seasonal factors, flow conditions and motorist information.

(Nam and Mannering, 2000b) | Temporal characteristics, environmental characteristics, geographic information, incident characteristics and lead agency information (clearance time only).

(Sethi et al., 1994) | Roadway type, incident type, incident severity and traffic conditions.

(Lee and Fazio, 2005) | Crash severity, average daily traffic, day of week, number of vehicles involved in crash, light conditions, number of lanes, on- or off-freeway location, posted speed limit, road condition, work zone present, heavy vehicle involvement, urban or rural area and weather conditions.

(Wei and Lee, 2007) | Incident characteristics, geometry characteristics, special relationship and time relationship.

(Smith and Smith, 2001) | Time of day, day of the week, weather condition, number of vehicles involved, vehicle type and response type.

(Kim and Choi, 2001) | Type of incident vehicle, incident service time, type of vehicle and location of incident vehicle.

(Pal et al., 1998) | Type of incident, the position of the incident (lane, ramp or shoulder) and the time of day.

Considering the explanatory variables used in previous studies, this study recognises the importance of using some common variables in the analysis. The appropriate variables were determined after examining the available traffic accident data in the study area for both urban and highway accidents. Upon completion of the data collection process, the data were entered into a database before moving on to the data preparation stage.

4.3.2 Data Preparation for the Analysis

Data preparation aims to organise the data in a way that facilitates survival analysis. It consists of three steps: data coding, data declaration and, finally, data examination. Before these steps are explained in detail, it is worth mentioning that data preparation and data analysis are performed using Stata 10.

For the purpose of entering data, it is necessary to develop a coding system. This system was applied only to the explanatory (independent) variables, because the dependent variable (time or duration) is measured on a continuous scale and therefore requires no coding. It is also worth mentioning that each accident was recorded from the beginning of each interval time until the end of that time. Thus, no censoring problem exists in the dependent variable, which allows the survival data to be recorded in a single variable.

Furthermore, some explanatory variables were separated into sub-groups in order to investigate how they affect accident duration. For example, time of day in Abu Dhabi was divided into three periods: (1) morning, 12:01am-12:00pm; (2) afternoon, 12:01pm-04:00pm; and (3) evening, 04:01pm-12:00am. In addition, three further sub-groups were included in the database to indicate whether the accident occurred in the AM peak (06:00-08:00), in the PM peak (14:00-16:00) or in the off-peak periods.
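The time-of-day grouping described above can be sketched in Python as follows. The function names and category labels are hypothetical. The sketch also assumes minute resolution and inclusive peak boundaries, neither of which is stated in the text.

```python
from datetime import time


def time_of_day(t):
    """Classify a clock time into the three periods used in this study.

    morning:   12:01am-12:00pm
    afternoon: 12:01pm-04:00pm
    evening:   04:01pm-12:00am (so exactly midnight counts as evening)
    """
    m = t.hour * 60 + t.minute  # minutes since midnight, seconds ignored
    if 1 <= m <= 12 * 60:
        return "morning"
    if 12 * 60 < m <= 16 * 60:
        return "afternoon"
    return "evening"


def peak_period(t):
    """Classify a clock time as AM peak, PM peak or off-peak.

    Assumption: both ends of each peak window are inclusive.
    """
    m = t.hour * 60 + t.minute
    if 6 * 60 <= m <= 8 * 60:
        return "am_peak"
    if 14 * 60 <= m <= 16 * 60:
        return "pm_peak"
    return "off_peak"


# Hypothetical accident time.
accident_time = time(15, 30)
print(time_of_day(accident_time), peak_period(accident_time))
```

With these definitions, an accident at 15:30 is coded as an afternoon accident within the PM peak.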

The first step was to develop a coding system using Stata 10. The process starts by assigning a variable name to each independent variable. The values of each independent variable are then labelled in two phases. The first phase saves each text label together with its value. This is known as a label mapping, which in this study consists of two labels with their values: ‘Yes=1’ and ‘No=0’. The second phase allocates the label mapping to each independent variable.
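The two-phase labelling can be illustrated outside Stata with a small Python sketch. It is roughly analogous to defining a value label once and then attaching it to several variables. The variable names `am_peak` and `weekend` and the helper functions are hypothetical and do not reproduce the study's actual coding scheme.

```python
# Phase 1: save the text labels and their values once.
YES_NO = {1: "Yes", 0: "No"}

# Phase 2: allocate the mapping to each independent variable.
value_labels = {}


def attach_label(variable, mapping):
    """Associate a label mapping with a variable name."""
    value_labels[variable] = mapping


def encode(variable, text):
    """Convert a text label (e.g. 'Yes') into its numeric code."""
    inverse = {label: code for code, label in value_labels[variable].items()}
    return inverse[text]


def decode(variable, code):
    """Convert a numeric code back into its text label."""
    return value_labels[variable][code]


# Hypothetical indicator variables sharing the same mapping.
for name in ("am_peak", "weekend"):
    attach_label(name, YES_NO)

print(encode("am_peak", "Yes"), decode("weekend", 0))
```

Defining the mapping once and reusing it keeps the coding consistent across all indicator variables.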

Following data coding, the second step is data declaration, which identifies the dependent variable that represents the survival time in Stata 10. This step is important because it avoids repeating the declaration each time a survival command is issued, which may save a considerable amount of time when analysing data. The last step of data preparation is data examination, which aims to check the suitability of the data for analysis. More details of applying HBDMs in Stata 10 are presented in section 4.7.
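The text does not list the specific checks made during data examination. As one possible illustration, the Python sketch below flags records whose duration is missing or non-positive (parametric duration distributions such as the Weibull are defined for positive times) and indicator variables coded outside 0/1. The function name, field names and records are hypothetical.

```python
def examine(records, duration_key, indicator_keys):
    """Return a list of (row index, field, value) for suspicious entries."""
    problems = []
    for i, rec in enumerate(records):
        d = rec.get(duration_key)
        is_number = isinstance(d, (int, float)) and not isinstance(d, bool)
        # d != d is True only for NaN; durations must also be positive.
        if not is_number or d != d or d <= 0:
            problems.append((i, duration_key, d))
        for key in indicator_keys:
            if rec.get(key) not in (0, 1):
                problems.append((i, key, rec.get(key)))
    return problems


# Made-up records (illustration only).
records = [
    {"duration": 12.0, "am_peak": 1},
    {"duration": 0, "am_peak": 0},      # zero duration
    {"duration": 7.5, "am_peak": 2},    # indicator outside 0/1
    {"duration": None, "am_peak": 1},   # missing duration
]
issues = examine(records, "duration", ["am_peak"])
print(issues)
```

Records flagged in this way would need to be corrected or investigated before any model is fitted.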

In document Modelling traffic accidents using duration analysis techniques: a case study of Abu Dhabi (Page 72-77)