Chapter 5: Predicting travel flows with spatially explicit aggregate models
5.3 Model specifications
As mentioned above, the aim of this paper is to assess whether spatially explicit models predict the impact of interventions in a transport network better than their non-spatial counterparts. For this purpose, we employ the estimation methods presented in the previous section to predict the impacts of a change in the design of a public transport network.
We will use data on the public transport supply and demand from the Arnhem-Nijmegen region in the east of the Netherlands.
The dependent variable in the models is the total number of public transport passengers between each OD pair in March 2014, that is, the total number of trips made in that month from an origin i to a destination j. The information on the number of passengers (flows) between any two bus stops is retrieved from smart card registrations.
In this section, we present the model specifications to forecast future travel flows. Our model structure follows the multilevel SIM presented by Kerkman et al. (2017). This model consists of two levels: a lower model and an upper model. The lower model estimates the sum of boardings and alightings in each zone based on spatial and transit supply characteristics. The upper model estimates the interactions (passenger flows) between zones, using the predictions by the lower model for the origin and destination zones, together with the travel impedance and OD specific characteristics.
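The two-level structure described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming a log-linear form and plain OLS at both levels; the helper names (`fit_ols`, `predict`), the variable layout, and all numbers are hypothetical and do not reproduce the specification estimated in this chapter.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients for y = [1, X] @ beta (intercept added here)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(X, beta):
    """Apply fitted coefficients (intercept first) to new regressors."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta

rng = np.random.default_rng(0)
n_zones = 6

# Lower level: log(boardings + alightings) per zone from zone characteristics.
zone_X = rng.normal(size=(n_zones, 2))  # hypothetical demand and supply variables
log_usage = 5.0 + zone_X @ np.array([0.8, 0.4]) + rng.normal(scale=0.1, size=n_zones)
beta_lower = fit_ols(zone_X, log_usage)
log_usage_hat = predict(zone_X, beta_lower)  # predicted zone totals (log)

# Upper level: log(flow) per OD pair from the predicted origin and destination
# totals plus a travel impedance term (assumed here to be log travel time).
origins, dests = np.where(~np.eye(n_zones, dtype=bool))  # all pairs with i != j
travel_time = rng.uniform(5, 60, size=origins.size)      # hypothetical minutes
od_X = np.column_stack(
    [log_usage_hat[origins], log_usage_hat[dests], np.log(travel_time)]
)
log_flow = (
    0.5 * od_X[:, 0] + 0.5 * od_X[:, 1] - 1.0 * od_X[:, 2]
    + rng.normal(scale=0.1, size=origins.size)
)
beta_upper = fit_ols(od_X, log_flow)
log_flow_hat = predict(od_X, beta_upper)
```

The point of the sketch is the data flow: the upper level never sees the observed zone totals, only the lower-level predictions, so a change in a zone's supply characteristics propagates to the predicted OD flows.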
Here, we use one specification of the lower model and three different specifications of the upper model. The reason for employing three different specifications of the upper model is to provide a more solid basis for comparing the performance of the estimation methods. If a single method performs better across all model specifications, we can be more certain that this estimation method is indeed superior to the other.
For the comparison to be systematic, a combination of a lower and upper model will always be based on one single estimation method. So in what follows we will always compare a multi-level model solely based on conventional OLS regression with a multi-level model solely based on the spatially autoregressive method. Since we distinguish between three specifications of the upper model, we will thus compare in total six multi-level models (two estimation methods * three model specifications).
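The comparison design can be written out explicitly. The snippet below is only an enumeration of the combinations named in the text; the string labels are shorthand chosen for this illustration.

```python
from itertools import product

estimation_methods = ["OLS", "spatially autoregressive"]
upper_specifications = ["base", "Euclidean separation", "mode competition"]

# Each multi-level model uses one estimation method for both the lower
# and the upper level, so the design is a plain cross product.
multilevel_models = list(product(estimation_methods, upper_specifications))

for method, specification in multilevel_models:
    print(f"{method}: {specification}")
```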
In what follows, we will first present the specification of the lower model and subsequently the three specifications of the upper model. In these sub-sections, we will also provide detail on the data sources we have used for each of the variables included in the various model specifications.
We end with a sub-section outlining our indicators to assess the performance of the six multi-level models.
Variables of the lower model
We use the lower model to predict the sum of boardings and alightings in each zone based on characteristics of a zone’s potential demand and level of transit supply. The model is based on, and largely similar to, the lower model in the multilevel model by Kerkman et al. (2017). There are, however, a few differences. First, the model is estimated at a different spatial scale. Instead of the neighbourhood level, we now estimate it at the level of postal zones, which are slightly larger than the neighbourhoods. Second, we include more variables on the transit supply.
Because of this, the model enables better predictions of the consequences of changes in transit networks on travel demand, which is the main focus of the current study. An overview of the variables included in the lower model is given in Table 5.1.
Table 5.1: Lower model variables
Variable | Description | Data source
Potential travellers (log) | Sum of number of inhabitants, employees, students, and train travellers in the zone | Statistics Netherlands (2014)
Distance to urban centre | Euclidean distance between the zone centroid and the city centre of Arnhem or Nijmegen | Statistics Netherlands (2014)
Frequency (log) | Total number of buses that serve the zone per hour | OVapi (2015)
Transfer possible [0/1] | At least two different bus lines serve the zone | OVapi (2015)
End of line [0/1] | At least one bus line starts or ends in the zone | OVapi (2015)
Clustering coefficient | Share of zones in its surroundings the zone is directly connected to (Derrible and Kennedy, 2011; Mishra et al., 2012) | OVapi (2015)
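As an illustration of how two of the variables in Table 5.1 could be constructed, consider the sketch below. It assumes that "Arnhem or Nijmegen" means the nearer of the two city centres, and it uses made-up projected coordinates and counts; neither the coordinates nor the counts come from the data sources listed in the table.

```python
import numpy as np

# Hypothetical projected coordinates in metres (not the real locations).
city_centres = np.array([[0.0, 0.0], [3000.0, 4000.0]])  # stand-ins for the two centres
zone_centroids = np.array([[0.0, 1000.0], [3000.0, 1000.0], [6000.0, 8000.0]])

# Euclidean distance from every centroid to every centre, then keep the
# nearest centre per zone (assumption: the nearer centre is the relevant one).
diff = zone_centroids[:, None, :] - city_centres[None, :, :]
dist_to_centres = np.sqrt((diff ** 2).sum(axis=2))
distance_to_urban_centre = dist_to_centres.min(axis=1)

# Potential travellers: inhabitants + employees + students + train travellers,
# summed per zone and then log-transformed. Counts are invented.
counts = np.array(
    [[1200, 300, 150, 0], [5000, 4000, 900, 2500], [800, 100, 50, 0]]
)
log_potential_travellers = np.log(counts.sum(axis=1))
```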
Variables of the upper models
At the upper level of our SIMs, we aim to predict the interactions (passenger flows) between all zones in the study area. We develop three model specifications, which differ in terms of the explanatory variables included in the model. The first model includes a balanced variation of variables on both travel demand and supply, the second model has a stronger focus on geographical separation of locations, and the third model has a focus on competition with other transport modes. They are all based on the upper model in the multilevel SIM specified in Kerkman et al. (2017).
The base model specification uses a selection of variables on potential travel demand (results of the lower model), travel supply (travel impedance), and competition among zones. It is largely similar to the final model by Kerkman et al. (2017). However, there are a few differences. First, we estimate the models at a different spatial scale. Instead of flows between neighbourhoods, we analyse flows between postal zones. These are in general larger in size, but they are also more uniform in size, which likely improves the performance of the model. Second, the competing origins (CO) and competing destinations (CD) variables are now included as their size relative to the average over all OD pairs instead of their actual size. This prevents an overall increase or decrease in the transit level in the region from changing the scale of the CO and CD variables, which would distort the predictions of travel flows.
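The effect of expressing CO and CD relative to the average over all OD pairs can be illustrated with a small sketch. The function name `relative_to_mean` and the values are hypothetical; the sketch only assumes that "relative size" means division by the mean.

```python
import numpy as np

def relative_to_mean(values):
    """Scale a variable by its mean over all OD pairs."""
    values = np.asarray(values, dtype=float)
    return values / values.mean()

# Hypothetical competing destinations (CD) values, one per OD pair.
competing_destinations = np.array([10.0, 20.0, 30.0, 60.0])
cd_relative = relative_to_mean(competing_destinations)

# A uniform 20% increase in the regional transit level rescales every raw
# value, but leaves the relative values unchanged.
cd_relative_after = relative_to_mean(1.2 * competing_destinations)
```

Under this assumption a region-wide change in supply does not shift the CD variable, whereas a change that affects only some zones still does.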
The “Euclidean separation” model has a stronger focus on the importance of the relative geographic locations of origins and destinations. It might be expected that proximity influences the potential travel demand between locations, which is not always directly translated into the travel time because of the transit network specifications. In addition to this, the balance between travel time and the Euclidean distance between locations might be important. This is added to the model by including the interaction term of these specific variables.
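A minimal sketch of how such an interaction term could enter the design matrix is given below. It assumes that both variables are log-transformed before being multiplied; the functional form and the numbers are illustrative assumptions, not the specification used here.

```python
import numpy as np

travel_time = np.array([12.0, 25.0, 40.0])       # hypothetical minutes by public transport
euclidean_distance = np.array([3.0, 4.0, 15.0])  # hypothetical kilometres

log_time = np.log(travel_time)
log_distance = np.log(euclidean_distance)

# Interaction term: product of the two (log) separation measures.
interaction = log_time * log_distance

# Columns: travel time, Euclidean distance, interaction.
X_separation = np.column_stack([log_time, log_distance, interaction])
```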
The “mode competition model” addresses the influence of the availability of alternative travel modes on public transport demand. It includes two variables that reflect public transport competitiveness vis-à-vis the car (“Car competition” variable) and vis-à-vis the bicycle (“Bike competition” variable). The first is defined as the ratio between car and public transport travel time on a zone-to-zone basis. The latter is defined as the ratio between bike and public transport travel time, also on a zone-to-zone basis.
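The two competition variables can be computed as simple element-wise ratios. The sketch assumes the ratio is taken as alternative-mode travel time divided by public transport travel time, which is one reading of the definition above; the travel times are invented.

```python
import numpy as np

# Hypothetical zone-to-zone travel times in minutes, one entry per OD pair.
pt_time = np.array([30.0, 45.0, 20.0])
car_time = np.array([15.0, 45.0, 25.0])
bike_time = np.array([40.0, 90.0, 10.0])

# Values below 1 mean the alternative mode is faster than public transport.
car_competition = car_time / pt_time
bike_competition = bike_time / pt_time
```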
Table 5.2 gives an overview of the variables included in each model specification.
Table 5.2: Variables included in each model specification
Variable | Base model | Euclidean separation | Mode competition
Oi (lower model, log) | X | X | X
Performance indicators
As discussed before, we will compare the performance of conventional and spatially explicit SIMs in terms of their ability to estimate the impacts of network changes on public transport ridership. Note that we do not study the impact of interventions that have actually taken place, but rather the impact of two hypothetical alternative network designs which we have developed ourselves. This implies that we cannot assess the performance of the various models based on the match between model predictions and actual changes in travel flows.
We will assess the performance of both model types on three aspects: (1) the (theoretical) quality of the models, (2) the models’ ability to explain existing travel flows, and (3) the models’ ability to generate (theoretically) reasonable predictions.
We assess the performance of the models at two levels: within each model specification, and between model specifications. At the first level, we assess the performance of each model type based on the three aspects mentioned above. This gives a first impression of the performance of the different estimation methods, as well as of the relative quality of the different model specifications (i.e., the selection of explanatory variables included). At the second level, we examine for each model type whether the different model specifications lead to comparable results. The latter is important, as it is not always possible in practice to employ the theoretically strongest model specification due to data limitations. Pragmatic decisions regarding model specification thus have to be made, which is of limited concern if a model is capable of delivering comparable predictions irrespective of the exact model specification. We therefore assess the consistency of the model results between the different model specifications, for both estimation methods separately. An overview of the performance indicators used to assess the model types is displayed in Table 5.3.
Table 5.3: Model performance indicators
Aspect | Within each model specification | Between model specifications
Quality of the model | Direction and size of parameters in line with expectations | --
Consistency in predicted flows at OD-level
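One way to quantify consistency in predicted flows at OD-level between two model specifications is sketched below. The choice of measures (Pearson correlation and mean absolute difference) is an assumption made for illustration, not necessarily the indicator used in this chapter, and the predicted flows are invented.

```python
import numpy as np

# Hypothetical predicted flows for the same four OD pairs under two
# model specifications, estimated with the same method.
pred_base = np.array([120.0, 45.0, 300.0, 10.0])
pred_euclidean = np.array([110.0, 50.0, 310.0, 12.0])

# Pearson correlation between the two sets of OD-level predictions.
correlation = np.corrcoef(pred_base, pred_euclidean)[0, 1]

# Mean absolute difference in predicted trips per OD pair.
mean_abs_diff = np.mean(np.abs(pred_base - pred_euclidean))
```

Repeating this for each pair of specifications, separately per estimation method, gives a simple picture of how sensitive the predictions are to the choice of explanatory variables.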