
4 A method for estimating benchmark mobility levels

4.5 Comparison of four different model structures

4.5.4 Making predictions across segments

Another consideration is how well the version of the model estimated using data from one segment can predict trip-making in another segment. This indicates whether the segment models differ: that is, whether, despite sharing the same specification, the magnitudes of their estimated coefficients capture a differently weighted set of influences, and whether there are differences across segments that the model does not account for.

A first piece of evidence as to whether the versions of the model for each segment differ comes from examining the coefficients. In general, fewer variables have significant coefficients in the low-access models, which could be due in part to the substantially smaller sample size for this segment, making it harder to detect small effects with statistical significance. Comparing the three segments (a vs. b vs. c), there are no variables with significant, oppositely signed coefficients; where coefficients are significant, they all appear to influence trip-making in the same direction, though their relative magnitudes may differ. A coarse piece of evidence suggesting that there may be significant differences is the likelihood ratio test of whether the segmented models (collectively) outperform the unsegmented model on the entire sample, which is significant for all model types.
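The likelihood ratio test described above can be sketched as follows, assuming its standard form with the pooled model nested within the set of segment models: the statistic is twice the difference between the summed segment log-likelihoods and the pooled log-likelihood, compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters. The function names (`chi2_sf`, `segmented_lr_test`) and the log-likelihood values in the example are hypothetical, not results from this study. The chi-square tail is computed with a standard-library series so the sketch has no dependencies; `scipy.stats.chi2.sf` would serve equally well.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability P(X > stat) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = df / 2 and x = stat / 2. Intended for moderate
    statistics (roughly stat < 1000); very small p-values lose precision.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a  # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    n = 1
    while term > total * 1e-15:
        term *= x / (a + n)
        total += term
        n += 1
    # P(a, x) = x^a e^(-x) / Gamma(a) * total, computed in log space
    log_p = a * math.log(x) - x - math.lgamma(a) + math.log(total)
    return max(0.0, 1.0 - math.exp(log_p))


def segmented_lr_test(pooled_ll: float, pooled_k: int,
                      segment_lls: Sequence[float],
                      segment_ks: Sequence[int]) -> Tuple[float, int, float]:
    """LR test of separately estimated segment models against one pooled model.

    pooled_ll, pooled_k: log-likelihood and parameter count of the pooled model.
    segment_lls, segment_ks: the same quantities for each segment model; the
    counts may differ (e.g. variables dropped where they do not vary).
    Returns (LR statistic, degrees of freedom, p-value).
    """
    stat = 2.0 * (sum(segment_lls) - pooled_ll)
    df = sum(segment_ks) - pooled_k
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for illustration only (not results from this study).
stat, df, p = segmented_lr_test(pooled_ll=-1000.0, pooled_k=10,
                                segment_lls=[-300.0, -330.0, -350.0],
                                segment_ks=[10, 10, 10])
print(f"LR = {stat:.1f}, df = {df}, p = {p:.4f}")
```

Allowing each segment its own parameter count matters here, since the low-access models drop the vehicle-ownership variables that do not vary within that segment.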

Comparing predicted values across segments using the hurdle models, the low-access model (model 4a) underestimates trips for the high-access group, and the high-access model (model 4c) overestimates trips for the low-access group (see the original results in Table 32 and a consolidated summary of comparisons in Table 36). In particular, when the high-access model is used to predict trips in the low-access group (model 4c applied to segment a), actual trips are 0.437 fewer than predicted, on average, with an average absolute difference of 1.318 trips above or below the actual value. When the low-access model is used to predict trips in the high-access group (model 4a applied to segment c), actual trips are 0.332 greater than predicted, on average, with an average absolute difference of 1.561 trips above or below the actual value.
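The two summary statistics reported above, the average signed difference between actual and predicted trips and the average absolute difference, can be computed as in the sketch below. It assumes per-person observed trip counts and predictions from a model estimated on another segment are available as parallel lists; the function name and the data in the example are hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def cross_segment_errors(actual: Sequence[float],
                         predicted: Sequence[float]) -> Tuple[float, float]:
    """Return (mean of actual - predicted, mean of |actual - predicted|).

    A negative first value means the model over-predicts on average; the
    second value is the average absolute miss in trips per person.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    diffs = [a - p for a, p in zip(actual, predicted)]
    return fmean(diffs), fmean([abs(d) for d in diffs])


# Hypothetical trip counts and cross-segment predictions, for illustration only.
observed = [0, 2, 1, 4, 0]
predicted_by_other_model = [1.5, 2.0, 1.5, 2.5, 0.5]
mean_diff, mean_abs_diff = cross_segment_errors(observed, predicted_by_other_model)
```

The signed mean shows the direction of bias, while the absolute mean shows how far individual predictions miss even when over- and under-predictions cancel out, which is why the absolute differences above (1.318 and 1.561 trips) are much larger than the signed ones.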

The direction of this finding is to be expected, given the important variables left out of each of these specifications. The high-access model (model 4c) excludes driver status (as well as the dummy variables for no cars and fewer cars, though it includes the continuous variable for number of vehicles), and the low-access model (model 4a) excludes all of the vehicle-ownership variables, since they do not vary within that segment. It seems likely that these variables would account for at least some of the discrepancy across models. The fact that a discrepancy exists in their absence may be evidence that these factors, and their underlying causes, matter for mobility: that driver status matters for non-owner mobility, and that vehicle-ownership level matters for owner mobility. (See related discussion in Chapter 5.)

Even without these variables, however, the models account for a good share of the average differences in mobility across segments. In particular, the average trip volume (that is, the mean of observed $y$) is 1.388 in the low-access segment versus 2.644 in the high-access segment, a shortfall of 1.255 trips, on average (see Table 36). The model calibrated to the high-access group (model 4c) also predicts a shortfall for the low-access group, with an average predicted trip volume of 1.826. This means that much of the observed difference across segments can be accounted for by the sorts of demographic variables included in the model, which are common to all segments. In the case of the high-access model explaining the behavior of the low-access segment, about 65.2% of the observed gap (2.644 − 1.826 = 0.818 trips) is explained by the model. The remaining 34.8%, or 0.437 trips, is unexplained, perhaps because of excluded variables (so that the model fails to fully capture variations in latent demand for activity), or as evidence of a barrier to mobility unique to the low-access segment (explored further in Chapter 5).
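The split of the observed gap into explained and unexplained parts is simple arithmetic, sketched below with the rounded averages quoted in the text. It assumes, as the zero diagonal of Table 36 suggests, that a model reproduces its own segment's average exactly, so the gap between the reference segment's observed mean and the model's prediction for the target segment is the "explained" part. The function and field names are hypothetical. Because the inputs are rounded to three decimals, the results come out as 1.256 and 0.438 rather than the 1.255 and 0.437 reported.

```python
from typing import NamedTuple


class GapDecomposition(NamedTuple):
    observed_gap: float       # reference mean minus target mean (observed)
    explained: float          # part reproduced by the reference-segment model
    unexplained: float        # part left over (prediction error on target)
    explained_share: float
    unexplained_share: float


def decompose_gap(reference_mean: float, target_mean: float,
                  predicted_target_mean: float) -> GapDecomposition:
    """Split an observed between-segment gap into explained and unexplained parts.

    reference_mean: observed mean in the segment the model was estimated on.
    target_mean: observed mean in the segment being predicted.
    predicted_target_mean: mean prediction for the target segment from the
        reference-segment model.
    """
    observed_gap = reference_mean - target_mean
    explained = reference_mean - predicted_target_mean
    unexplained = predicted_target_mean - target_mean
    return GapDecomposition(observed_gap, explained, unexplained,
                            explained / observed_gap, unexplained / observed_gap)


# Rounded averages from the text: high-access observed 2.644, low-access
# observed 1.388, low-access predicted by the high-access model 1.826.
result = decompose_gap(reference_mean=2.644, target_mean=1.388,
                       predicted_target_mean=1.826)
```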

Finally, it seems notable that cross-segment performance is not symmetric: the low-access model predicting behavior in the high-access segment does not perform as well or as poorly as the high-access model predicting behavior in the low-access segment. In particular, the high-access model overestimates by a bit more than the low-access model underestimates (by 0.437 trips versus 0.332 trips, or 34.8% versus 26.4% of the observed gap). If this difference is large enough to be meaningful, it suggests one of two things: (1) the variables omitted from the high-access model are more important for explaining low-access-segment behavior than the variables omitted from the low-access model are for explaining high-access-segment behavior; or (2) the proclivity to make trips is generally lower in the low-access segment than in the high-access segment, even after accounting for the explanatory variables in the model. As discussed elsewhere, and in detail in the next chapter, this could have to do with either preferences or ability: it could be evidence of underlying systematic differences between the two groups in the desire to get around to different addresses, or of constraints the low-access group faces that limit its ability to fulfill its latent demand for trip-making. Before drawing further conclusions from this preliminary modeling, I first develop a final, best specification of a hurdle model using the high-access sample, to obtain the most accurate estimate of the expected trip volume for a given demographic profile.
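Because the comparisons in this section rest on a hurdle model's expected trip volume for a given profile, a minimal sketch of how such an expectation is formed may help. It assumes a logit hurdle for making any trips and a zero-truncated Poisson for the number of trips among trip-makers; this section does not state the exact count distribution used, so that pairing, the function name, and the coefficient values are illustrative assumptions.

```python
import math
from typing import Sequence


def _dot(x: Sequence[float], coefs: Sequence[float]) -> float:
    if len(x) != len(coefs):
        raise ValueError("covariates and coefficients must align")
    return sum(xi * ci for xi, ci in zip(x, coefs))


def hurdle_poisson_mean(x: Sequence[float], gamma: Sequence[float],
                        beta: Sequence[float]) -> float:
    """Expected count E[y | x] for an assumed logit-hurdle / truncated-Poisson model.

    x: covariate vector for one person (first entry 1.0 for the intercept).
    gamma: hurdle (logit) coefficients for P(y > 0).
    beta: count-part coefficients; lambda = exp(x . beta).
    """
    p_positive = 1.0 / (1.0 + math.exp(-_dot(x, gamma)))
    lam = math.exp(_dot(x, beta))
    # Mean of a Poisson truncated at zero: lambda / (1 - exp(-lambda)).
    truncated_mean = lam / -math.expm1(-lam)
    return p_positive * truncated_mean


# Hypothetical coefficients (intercept, one covariate), for illustration only.
gamma_c = [0.8, 0.3]
beta_c = [0.5, 0.1]
profile = [1.0, 2.0]
expected_trips = hurdle_poisson_mean(profile, gamma_c, beta_c)
```

In this sketch, a cross-segment prediction amounts to evaluating one segment's coefficients (for example, the high-access `gamma` and `beta`) on each person in another segment and averaging the results.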

Table 36. Cross-segment predictions, using the hurdle model (model 4) estimated for each segment: observed and predicted numbers of trips in each vehicle-access segment

Notation: $y$ is observed trips and $\bar{y}$ is average observed trips in the column's segment; $\hat{y}_m$ is trips predicted by the hurdle model estimated on segment $m$, and $\bar{y}_m$ is average observed trips in segment $m$. Each column gives values computed on that segment.

                                            Segment a      Segment b      Segment c
                                            (low access)   (mid access)   (high access)

Average difference, actual versus predicted value: mean of $y - \hat{y}_m$
  from low-access model (m = a)             0              0.309          0.332
  from mid-access model (m = b)             -0.309         0              -0.002
  from high-access model (m = c)            -0.437         -0.069         0

Difference in average observed value across segments: $\bar{y} - \bar{y}_m$
  compared to low-segment average (m = a)   n/a            0.751          1.255
  compared to mid-segment average (m = b)   -0.751         n/a            0.504
  compared to high-segment average (m = c)  -1.255         -0.504         n/a

Gap in prediction as percent of gap in observed values: $(y - \hat{y}_m) / (\bar{y} - \bar{y}_m)$
  based on low-access model (m = a)         n/a            41.2%          26.4%
  based on mid-access model (m = b)         41.2%          n/a            -0.4%
  based on high-access model (m = c)        34.8%          13.8%          n/a
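The third panel of Table 36 follows from the first two: each entry is the average prediction error divided by the corresponding observed gap. The sketch below recomputes it from the rounded table entries, keyed as (model segment, evaluated segment); the dictionary layout and function name are hypothetical. Recomputing from rounded inputs reproduces the reported percentages only to within about 0.1 percentage points, presumably because the table was computed from unrounded values.

```python
# Panel values from Table 36, keyed by (model segment, evaluated segment).
avg_diff = {("a", "b"): 0.309, ("a", "c"): 0.332, ("b", "a"): -0.309,
            ("b", "c"): -0.002, ("c", "a"): -0.437, ("c", "b"): -0.069}
# Observed gap: mean of evaluated segment minus mean of model segment.
obs_gap = {("a", "b"): 0.751, ("a", "c"): 1.255, ("b", "a"): -0.751,
           ("b", "c"): 0.504, ("c", "a"): -1.255, ("c", "b"): -0.504}


def gap_share_pct(avg_diff: dict, obs_gap: dict) -> dict:
    """Prediction error as a percentage of the observed between-segment gap."""
    return {key: 100.0 * avg_diff[key] / obs_gap[key] for key in avg_diff}


shares = gap_share_pct(avg_diff, obs_gap)
```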

In document Mobility Fulfillment Among Low-car Households: Implications for Reducing Auto Dependence in the United States (Page 108-111)