Chapter 4 Data and methodology
4.4 Research strategy for Objective 2
4.4.4 Data processing for geo-visualising smart card data
To compute the conditional plots and flow-comaps required for Objective 2, three components of data processing were conceptualised and carried out: extracting bus service patterns, reconstructing travel trajectories and constructing flow matrices. Each of these is explained as follows.
Extracting bus service patterns
First, GTFS data was processed to extract the service patterns, specifically the sequence of bus stops for all bus routes across the Brisbane network. This provided the basis for reconstructing bus passengers' travel trajectories, such that the intermediate locations (i.e. the bus stops that are passed through) and time stamps (i.e. the times at which the passenger passed through each of the stops) between the boarding and alighting stops could be identified and added to each smart card record. This process comprised the following three steps.
(1) Brisbane’s bus service patterns were first extracted from the ‘stop-times’ file embedded within the GTFS data, which details the times at which each bus route arrives at and departs from the individual stops on a daily basis (Google Developers, 2012). However, the resulting service patterns might not be sufficient for reconstructing the travel trajectories of bus trips, because this method does not identify the passed-through stops for all bus routes, particularly express lines that do not serve certain intermediate stops.
(2) To address this issue, two further GTFS files (i.e., the stop file and the shape files of bus routes) were used to spatially join stops with routes using GIS-based techniques. A distance threshold of 20 metres was applied, given that 99.8% of stops were within this distance of their nearest routes. This procedure generated a list of bus stops that were geographically proximate to, and were therefore assumed to be passed by, a given bus route. To avoid the potential error of missing certain stops, this list (the spatial-based routes) was examined against the stop-sequence patterns extracted from the stop-times file of GTFS (the stop-times-based routes).
(3) Finally, it was found that 119 inbound and 108 outbound routes had missing stops. A further examination of the spatial layout of these routes showed that six inbound and four outbound routes deviated considerably from the stop-times-based routes and were therefore excluded from the analysis (Table 4.5). This also resulted in the removal of a rather small proportion (less than two per cent) of the original smart card records for the five days (summarised in Table 4.6), which is considered an acceptable level of inclusion of the original data.
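The comparison between spatial-based and stop-times-based routes in steps (2) and (3) can be illustrated with a minimal sketch. It is not the GIS workflow used in this study: it assumes that stop and route-shape coordinates have already been projected to a planar system in metres, and the function names, stop identifiers and coordinates are all hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance (metres) from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def stops_near_route(stops, shape, threshold_m=20.0):
    """Return the IDs of stops lying within threshold_m of the route polyline."""
    near = set()
    for stop_id, xy in stops.items():
        nearest = min(point_segment_distance(xy, a, b)
                      for a, b in zip(shape, shape[1:]))
        if nearest <= threshold_m:
            near.add(stop_id)
    return near


# Hypothetical stops and route shape (projected coordinates in metres).
stops = {"S1": (0.0, 5.0), "S2": (100.0, -12.0),
         "S3": (200.0, 60.0), "S4": (300.0, 0.0)}
shape = [(0.0, 0.0), (150.0, 0.0), (300.0, 0.0)]
stop_times_based = ["S1", "S4"]  # e.g. an express pattern listed in stop-times

spatial_based = stops_near_route(stops, shape)
# Stops passed by the route but absent from its stop-times-based sequence.
missing_from_stop_times = sorted(spatial_based - set(stop_times_based))
```

In this toy case, stop S2 lies 12 metres from the route but does not appear in the stop-times-based sequence, so it is flagged as a missing (passed-through) stop, whereas S3 lies beyond the 20-metre threshold and is not joined to the route.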
Table 4.5 Summary of problematic routes
Reconstructing travel trajectories
Next, the intermediate stops of each passenger trip recorded by smart card data (Figure 4.5B) were identified and added. This was achieved by matching route, direction, boarding stop and alighting stop between the smart card data and the expanded GTFS data (Figure 4.5A). This step broke each of the original passenger trip records into a number of continuous stop-to-stop legs (or mini-trips), in which each added stop was denoted as the end point of one leg and the start point of the next (Figure 4.5C).
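The splitting of a trip record into mini-trips can be sketched as follows. This is an illustrative example only: it assumes that the matched route and direction have already yielded an ordered stop sequence, and the function name and stop identifiers are hypothetical.

```python
def split_into_mini_trips(stop_sequence, boarding_stop, alighting_stop):
    """Break one trip record into consecutive stop-to-stop legs.

    stop_sequence is the ordered list of stops for the matched route and
    direction. Returns None when the boarding or alighting stop cannot be
    matched in that order, i.e. the trip cannot be reconstructed.
    """
    try:
        start = stop_sequence.index(boarding_stop)
        # The alighting stop must come after the boarding stop.
        end = stop_sequence.index(alighting_stop, start + 1)
    except ValueError:
        return None
    path = stop_sequence[start:end + 1]
    # Each intermediate stop ends one leg and starts the next.
    return list(zip(path, path[1:]))


route_stops = ["A", "B", "C", "D", "E"]  # hypothetical stop sequence
legs = split_into_mini_trips(route_stops, "B", "E")
unmatched = split_into_mini_trips(route_stops, "B", "Z")
```

Here `legs` holds the three legs B to C, C to D and D to E, while `unmatched` is `None` because stop Z does not occur on the route.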
Adding the intermediate stops between boarding and alighting stops also made it feasible to integrate intermediate time stamps across all mini-trips. This was achieved in two simple steps: first, the network distance of each mini-trip was calculated; second, drawing on the time difference between the boarding and alighting times of a single trip, the travel time of each mini-trip was estimated in proportion to the ratio of its network distance to the total trip distance. Based on the estimated travel times of the mini-trips, the time stamps for passing through the intermediate stops were added. Through the addition of both passed-through stops and intermediate time stamps, the travel trajectories of bus passengers were reconstructed at the bus stop level. This process yielded over five million mini-trips. Due to some minor mismatches between the GTFS data and the smart card data, not all bus trips were successfully reconstructed as mini-trips. This issue is examined in detail as part of the results in Chapter 6.
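The proportional allocation of travel time can be illustrated with a short sketch. It assumes that the network distance of every mini-trip is already known; the function name, the date and the distances are hypothetical and serve only as a worked example.

```python
from datetime import datetime


def interpolate_time_stamps(boarding_time, alighting_time, leg_distances_m):
    """Estimate the time stamp at every stop along a trip.

    The trip duration is shared among the mini-trips in proportion to the
    ratio of each leg's network distance to the total trip distance.
    """
    total_distance = sum(leg_distances_m)
    if total_distance <= 0:
        raise ValueError("total trip distance must be positive")
    trip_duration = alighting_time - boarding_time
    stamps = [boarding_time]
    covered = 0.0
    for distance in leg_distances_m:
        covered += distance
        # Cumulative share of distance gives the cumulative share of time.
        stamps.append(boarding_time + trip_duration * (covered / total_distance))
    return stamps


# Hypothetical 20-minute trip made up of legs of 500 m, 1500 m and 2000 m.
stamps = interpolate_time_stamps(
    datetime(2000, 1, 3, 8, 0, 0),
    datetime(2000, 1, 3, 8, 20, 0),
    [500.0, 1500.0, 2000.0],
)
```

In this example the first leg covers one eighth of the total distance, so the first intermediate stop is passed 2.5 minutes after boarding (08:02:30), and the second is passed at the halfway point of the trip (08:10:00).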
Figure 4.5 Data processing for smart card data. Source: Tao et al. (2014a)
Constructing flow matrices
The reconstructed travel trajectories of bus passengers of the five calendar events were classified into two groups, i.e., BRT trips and non-BRT trips. The ‘trips’ here refer to linked trips that consider transfers between smart card records, which can be identified based on the ‘Trip-ID’ entry in the smart card data, i.e., linked records were given consecutive Trip-IDs, e.g., 1, 2, 3. Based on this definition, as long as a part of a bus trip operated on the BRT busway, it was considered a BRT trip; otherwise, it was classified as a non-BRT trip.
This enabled the detection of spatial connections between the BRT busway and the remainder of the bus network in terms of travel demand (e.g., trips that feed into the busway from the rest of the bus network).
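The classification rule can be expressed as a minimal sketch. The record layout is an assumption made for illustration: the key `linked_trip` stands for whatever identifier groups the linked records of one trip, and `uses_busway` is a hypothetical flag indicating whether a record runs on the BRT busway for any part of its length.

```python
from collections import defaultdict


def classify_linked_trips(records):
    """Label each linked trip as 'BRT' or 'non-BRT'.

    A linked trip is a BRT trip as soon as any one of its records uses the
    busway; otherwise it is a non-BRT trip.
    """
    touches_busway = defaultdict(bool)
    for record in records:
        touches_busway[record["linked_trip"]] |= record["uses_busway"]
    return {trip: ("BRT" if flag else "non-BRT")
            for trip, flag in touches_busway.items()}


# Hypothetical records: trip "T1" transfers from a feeder route to the busway.
records = [
    {"linked_trip": "T1", "uses_busway": False},
    {"linked_trip": "T1", "uses_busway": True},
    {"linked_trip": "T2", "uses_busway": False},
]
labels = classify_linked_trips(records)
```

Trip T1 is labelled a BRT trip even though its first record runs entirely off the busway, which is what allows feeder flows into the busway to be captured.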
Following the identification of BRT and non-BRT trips, ten subsets of smart card data were attained (i.e., BRT, non-BRT trips for each of the five calendar events). Flow matrices that depicted the flow volumes between all the bus stops were then constructed for each of the ten subsets. Each of the flow matrices was further disaggregated into two directional series based on the ‘Direction’ entry of smart card data (i.e., inbound series reflecting flows moving towards the CBD, and outbound series reflecting flows moving away from the CBD).
To capture the temporal change of travel patterns, each series was segmented into four sub-matrices covering consecutive time periods of the day (i.e., morning, noon, evening and night). This rendered a total of 80 flow matrices (i.e., 16 matrices for each day: two trip groups by two directions by four time periods).
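The structure of the resulting set of matrices can be sketched as follows. The day labels and record keys are hypothetical, each mini-trip is assumed to carry its time-period label already (the period boundaries are not restated here), and each matrix is stored sparsely as a counter of stop pairs rather than as a full stop-by-stop array.

```python
from collections import Counter
from itertools import product

DAYS = ["day1", "day2", "day3", "day4", "day5"]  # the five calendar events
GROUPS = ["BRT", "non-BRT"]
DIRECTIONS = ["inbound", "outbound"]
PERIODS = ["morning", "noon", "evening", "night"]

# One flow matrix per combination of day, trip group, direction and period.
matrix_keys = list(product(DAYS, GROUPS, DIRECTIONS, PERIODS))


def build_flow_matrices(mini_trips):
    """Count the flow volume between stop pairs within each matrix."""
    flows = {key: Counter() for key in matrix_keys}
    for trip in mini_trips:
        key = (trip["day"], trip["group"], trip["direction"], trip["period"])
        flows[key][(trip["from_stop"], trip["to_stop"])] += 1
    return flows


# Hypothetical mini-trips: two share the same matrix and the same stop pair.
mini_trips = [
    {"day": "day1", "group": "BRT", "direction": "inbound",
     "period": "morning", "from_stop": "A", "to_stop": "B"},
    {"day": "day1", "group": "BRT", "direction": "inbound",
     "period": "morning", "from_stop": "A", "to_stop": "B"},
    {"day": "day1", "group": "non-BRT", "direction": "outbound",
     "period": "night", "from_stop": "C", "to_stop": "D"},
]
flows = build_flow_matrices(mini_trips)
matrices_per_day = sum(1 for key in matrix_keys if key[0] == "day1")
```

Enumerating the combinations gives 5 × 2 × 2 × 4 = 80 matrices in total and 2 × 2 × 4 = 16 for each day, in agreement with the counts reported above.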
Drawing on the constructed flow matrices, comaps of stop-level behaviours and flow-comaps of passenger flow patterns were generated using GIS. The detailed results are presented and discussed in Chapter 6.