Chapter 4 MODEL AND DATA SPECIFICATION
4.3 Data
4.3.2 RP data methodology and availability
In terms of data requirement, the RP dataset used in this thesis should consist of survey data and level-of-service data. The former reveals travellers’ chosen alternatives, e.g. the chosen route, mode and departure time. Moreover, travellers’ socio-economic and demographic data is also included in the survey data, e.g. age, gender, income level, flexibility, owned car type (important to estimate the toll level and cost in some cases), work starting time, etc. The level-of-service dataset records actual travel time information between origin and destination.
This data can be used to specify attributes that are not revealed in the survey data, such as travel time attributes. In this way it is possible to obtain the true network performance of the alternatives and, more importantly, the travel time extracted from the level-of-service dataset can, under some assumptions, be regarded as the travel time perceived by travellers. The premise of this technique is that travellers are experienced enough to be aware of the true travel time distribution (although non-EUT allows misperception of the distribution, this distortion is still based on the true travel time distribution).
Based on the above methodology, it is possible to collect the required data. Initially, however, existing studies were consulted, since these represent the most efficient way to obtain qualified data. Additionally, even though collecting new RP data is time consuming for a PhD student, a new method of RP data collection for modelling risky choice behaviour is introduced and the availability of this data is discussed. Accordingly, subsection 4.3.2.1 discusses the availability of existing datasets and briefly describes the selected dataset, i.e. the SR91 data, while subsection 4.3.2.2 proposes several possible methods for new RP data collection and outlines the new London Underground dataset used in this thesis.
4.3.2.1 Existing datasets
Travel diary data: Senbil and Kitamura (2003) investigated the applicability of PT in the context of RP data. Their original data was collected in Japan in 2002: 1000 randomly selected resident drivers were contacted by mail and asked to record their departure and arrival times for three days, and to state whether they would change their departure time. A distinguishing feature of this RP exercise was that respondents were asked to supply their preferred arrival time (PAT), and thus PAT is regarded as the reference point in their PT model. It should be noted that this kind of mail survey corresponds to a diary survey, in that both methods are capable of recording respondents’ daily travelling experiences. The travel time reported by respondents is the same as the travel time actually experienced by them. In the UK, a similar dataset is the National Travel Survey (NTS) database. This survey consists of two parts: an interview with respondents in their homes, and a seven-day travel diary. Along with other traffic information, it is possible to apply NTS data to characterize individuals’ travel behaviour. The lack of detailed origin and destination (OD) information is, however, the main drawback of the NTS database.
GPS data: With the development of technology, advanced equipment has been increasingly applied to RP surveys. This enables researchers to obtain detailed travel information by automatically tracking respondents’ actual travel choices. Carrion and Levinson (2010) carried out an RP study to investigate drivers’ willingness to pay for the improvement in reliability offered by a High Occupancy Toll road. They equipped each respondent with a Global Positioning System (GPS) device and traced their choice behaviour. Such devices were found to be relatively mature and feasible, and the data from these loggers are more reliable in reporting accurate location and travel time than travel diary surveys. Whilst a GPS logger is bigger and heavier than alternatives such as a mobile phone, it can be installed in a vehicle so that portability is not a concern for respondents. Furthermore, the passive nature of the data collection reduces the burden on respondents. Hence, vehicle-based GPS loggers are highly recommended for future route choice surveys.
Floating car data: Small et al. (2005b)’s study on the SR91 corridor in the US pays particular attention to travellers’ route behaviour using both SP and RP data. In their RP sample, raw observations (438) for travel time were derived from field measurements on SR91 by students’ repeated driving, i.e. floating car data. Each observation was of the route choice made by travellers between two routes – a tolled and an untolled route. The tolled route was assumed to be uncongested and to have a fixed and known travel time, whereas the untolled route was congested with an uncertain travel time. The magnitude of the variability in travel time depended on the demand, and hence varied according to the time of day.
Travellers were assumed to have a fixed time of travel. Therefore any traveller in the RP
sample was assumed to be choosing between a certain prospect (the tolled route) and an uncertain prospect (the untolled route), where the uncertainty in question is in travel time (see also Chapter 5).
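The two-prospect structure described above can be sketched in code. The following is a minimal illustration only, not the procedure of Small et al. (2005b): the travel times are hypothetical, the function names are invented for this sketch, and treating each floating car run as equally likely is an assumption made purely for illustration.

```python
from collections import Counter
from fractions import Fraction


def certain_prospect(travel_time):
    """Tolled route: a single known travel time with probability 1."""
    return [(travel_time, Fraction(1))]


def empirical_prospect(observed_times):
    """Untolled route: treat each floating car run as equally likely
    (an illustrative assumption) and merge identical travel times."""
    counts = Counter(observed_times)
    n = len(observed_times)
    return sorted((t, Fraction(c, n)) for t, c in counts.items())


def expected_time(prospect):
    """Probability-weighted mean travel time of a prospect."""
    return sum(t * p for t, p in prospect)


# Hypothetical travel times in minutes for one time-of-day slot.
tolled = certain_prospect(10)
untolled = empirical_prospect([12, 12, 15, 21])
```

Representing each alternative as a list of (outcome, probability) pairs keeps the certain and uncertain prospects in the same form, so that any of the risky choice models discussed in this chapter could, in principle, be evaluated on both.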
In closing, Small et al. (2005b)’s SR91 dataset is used in this thesis for three reasons. Firstly, a GPS survey exercise is extremely resource intensive and thus the corresponding sample size is usually quite small. Secondly, this research requires extra information from network data which is not available in the existing travel diary dataset. Finally, the SR91 dataset offers a natural experiment for risky choice research with two parallel competing routes; it is accessible, and its survey data and network data are suitable for our risky choice framework.
4.3.2.2 Methodology for new RP data collection
The feasibility of conducting a new RP exercise was also investigated. Whilst a risky choice framework appears to complicate RP data collection, as discussed above, this kind of data collection would still be feasible as long as tailored techniques and assumptions are adopted.
The methodology used in this thesis for simultaneously collecting survey data and level-of-service data is illustrated in Figure 4.1. The former covers respondents’ actual choices and their socio-economic information, while the latter covers the data associated with the alternatives, in particular travel time related data. Given that risky choice research has the feature of repeatability, the following discussion pays special attention to possible methods for addressing this issue.
Travellers’ choices can be observed by either traditional roadside interviews or other advanced methods. The advantage of interviews is that the researcher can ask respondents for detailed information including basic demographic data and additional information. For instance, ideal journey time and preferred arrival time have been found to be possible reference points in travellers’ reference-dependent choice behaviour, and this information can be obtained from interviews. However, given the relatively large sample size needed in this research, it is difficult to employ this method in research of limited duration. Alternatively, new technological and commercial data sources can be used, e.g. GPS, Automatic Number Plate Recognition (ANPR), and cell phone data with customers’ information. These methods enable drivers’ route choice behaviour to be conveniently observed and greatly enlarge the sample size; however, it is extremely difficult to access drivers’ socio-economic information.16
16 In our feasibility studies, we contacted several councils and commercial companies which manage advanced data collection activities. We found that authorization is required to access drivers’ information recorded in
In the studies presented in this thesis, survey data must be complemented by level-of-service data in order to estimate the travel time attribute. This is another challenge for data collection, since it requires large volumes of performance data and appropriate assumptions. First of all, performance data has to be collected and travel time measured. For road transport, travel time can be estimated using data collected by floating cars, ANPR, GPS or loop detectors. With the development of survey techniques, these methods are already capable of providing relatively reliable estimates of travel time. For rail transport, each train’s performance is monitored by signalling systems, and therefore the travel time extracted from such performance data is much more accurate. Consequently, special attention was paid to rail systems, especially the London Underground.
In this thesis, the new RP data is based on the London Underground (LU) dataset. Survey data is from the Rolling Origin Destination Survey (RODS) dataset, which records annual passenger survey results from a sample of Underground stations. Train performance data is stored in the Network Management Information System (NetMIS), through which historic data for each train can be retrieved (for details refer to Chapter 7). LU data is of special interest to us, in that it is the most efficient way to obtain qualified data for modelling risky choice behaviour. In particular, it avoids expensive and laborious field survey exercises, and the recorded train running time is much more accurate than travel time estimated from network data.17 It should be noted that the proposed method still faces an unavoidable drawback of collinearity among key variables, e.g. travel time, travel time variability and travel cost. This problem can, however, be overcome in two ways. Firstly, the travel time distribution can be estimated across days for a given narrow time-of-day interval and a given day-of-week. Secondly, the interaction between the travel time variable and the socio-economic variables can be specified, which provides additional variation.
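The first remedy, estimating the travel time distribution across days for a narrow time-of-day interval and a given day-of-week, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure applied to the NetMIS data in Chapter 7: the record layout (departure timestamp, running time in minutes), the 15-minute interval width, the function name and the sample values are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def travel_time_distributions(records, interval_minutes=15):
    """Group running times by (weekday, time-of-day interval).

    records: iterable of (departure datetime, running time in minutes).
    Returns {(weekday, interval_start_minute): (mean, std, n)}; the
    spread across days is one possible measure of variability.
    """
    groups = defaultdict(list)
    for departure, minutes in records:
        minute_of_day = departure.hour * 60 + departure.minute
        # Start of the interval containing this departure.
        slot = minute_of_day - minute_of_day % interval_minutes
        groups[(departure.weekday(), slot)].append(minutes)
    return {
        key: (mean(times), pstdev(times), len(times))
        for key, times in groups.items()
    }


# Hypothetical records: three Mondays and one Tuesday, 08:00-08:15 slot.
records = [
    (datetime(2024, 1, 1, 8, 5), 20.0),
    (datetime(2024, 1, 8, 8, 10), 22.0),
    (datetime(2024, 1, 15, 8, 2), 24.0),
    (datetime(2024, 1, 2, 8, 5), 30.0),
]
summary = travel_time_distributions(records)
```

Because each (day-of-week, interval) cell gets its own mean and spread, travel time and travel time variability can differ across cells even when the fare is fixed, which is the source of the additional variation referred to above.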
It is also necessary to acknowledge that train running time and frequency information alone does not necessarily provide a complete representation of passengers’ travel time experience, since it neither represents the time spent in pedestrian access and circulation to and within stations nor accounts for the possibility that extreme train crowding may extend platform waiting times. However, these limitations are believed to be acceptable within the overall context of the study.
ANPR datasets, and companies managing cell phone and GPS data are not willing to share their customers’ personal information.
17 A series of comprehensive feasibility studies was conducted on travel time estimation, with research cases covering the M6 tolled road, the Maidstone area, and the Itchen bridge in the UK. It was found that it is extremely time consuming and expensive to install equipment and to collect and analyse the data. Additionally, most of the loop detectors embedded in our target area are single-loop detectors, which provide only very limited information for estimating travel time. The associated possibility of inaccurate estimates of travel time was another important reason for not collecting data by ourselves. That said, we believe that the method proposed in this thesis will be useful for future study.
Figure 4.1: Methodology for RP data collection in a risky choice context
4.4 Summary
This chapter proposed the model forms and data collection strategy that will be used in the ensuing analysis of travellers’ risky choice behaviour. Two principal methodological contributions have been proposed. The first is the generic risky choice framework incorporating various non-EUT approaches: SEU highlights the importance of a nonlinear weighting function; rank-dependence is embedded in the RDEU model on the basis of SEU; PT is the typical theory capable of modelling reference-dependent behaviour; and CPT synthesizes both reference-dependence and rank-dependence. Finally, all of these proposed non-EUT models can be evaluated using an RUM approach.
The second contribution is the RP data collection method tailored specifically for our risky choice framework. Given that SP studies have predominated in the existing literature, the exploratory work conducted here enables a better understanding of how non-EUT models perform in an RP context, which improves the validity and reliability of this research.
Based on comprehensive feasibility studies, it was decided to use two datasets: the first is an existing dataset, originally collected on the SR91 corridor in the US; the second is a dataset collected from the London Underground (LU) database. In the next chapter, we present the model results based on the SR91 data, while an extensive of their implementations is presented in Chapter 6. Chapter 7 presents a second case study using the LU data collected by us.