
4 A method for estimating benchmark mobility levels

4.5 Comparison of four different model structures

4.5.1 Overall model fit and predictive ability

Although the models have many variables with significant coefficients, their overall fit and predictive ability are weak. All of the models pass the low bar of performing better than an “intercept-only” model containing no explanatory variables. (A likelihood ratio test comparing the log-likelihood of the full model at convergence with the “restricted” log-likelihood of an intercept-only model is significant at p<0.001 in each case.) By design, almost all the variables in the specification shared by the models have significant coefficients when using the entire sample. Overall fit statistics, such as pseudo-R² statistics, likelihood ratio tests comparing more- versus less-restricted models, and information-criteria-based statistics, are generally accepted as useful for identifying the better of two models estimated on the same data (or for comparing the same model estimated on two different data sets), but unfortunately there is no consensus on threshold values for these statistics that would indicate good or bad fit. That said, “very low [pseudo-R²] values may indicate a lack of fit” (Hilbe, 2011, p. 67), as we have here, with pseudo-R² values of about 0.2 for the BL models (model 2) and values less than 0.1 for all the others.
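The two fit statistics discussed above can be illustrated with a short sketch. It assumes McFadden's pseudo-R², which is only one of several pseudo-R² statistics (the text does not say which variant was used), and the log-likelihood values and parameter count are hypothetical, chosen only to show the arithmetic. The function names are likewise illustrative.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite series times exp(-x/2).
        terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-half) * terms
    # Odd df: complementary error function plus a finite series.
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def lr_test(ll_full, ll_null, df):
    """Likelihood ratio statistic and p-value; df = number of added parameters."""
    stat = 2.0 * (ll_full - ll_null)
    return stat, chi2_sf(stat, df)

def mcfadden_r2(ll_full, ll_null):
    """McFadden's pseudo-R-squared (an assumed choice of pseudo-R-squared)."""
    return 1.0 - ll_full / ll_null

# Hypothetical values, not taken from the models in this chapter.
ll_null, ll_full, n_extra = -1000.0, -920.0, 10
stat, p = lr_test(ll_full, ll_null, n_extra)
r2 = mcfadden_r2(ll_full, ll_null)
print(f"LR = {stat:.1f}, significant at 0.001: {p < 0.001}, pseudo-R2 = {r2:.3f}")
```

With these made-up inputs the test is highly significant even though the pseudo-R² is below 0.1, which mirrors the pattern described above: beating an intercept-only model is a low bar.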

Furthermore, there is a fair amount of dissonance between the values predicted by the models and the actual values in the data. First, the incidence of making no trips (zeros) is difficult to predict, especially in the high-access group, among whom zeros are generally less likely. In particular, using the BL model calibrated to the entire sample (model 2d), or equivalently the H model (model 4d), whose binary portion is identical to the stand-alone BL model, the predicted probability of making zero trips among all the zero-trip-makers, or P(0)|(y=0), is 0.267 on average. That is, the model estimates a 26.7% chance of making zero trips (or, with repeated observation, that zero trips would be made 26.7% of the time), compared with a predicted probability of 0.131 for those making at least some trips (that is, P(0)|(y=1) = 0.131; see Table 33). Of the zero-trip-makers, 14.0% are predicted to make zero trips with a probability over 0.500 using the entire-sample model (model 2d), or just 2.5% using the high-access model (model 2c), with an average predicted probability among them of 0.187. Zero trips are predicted more strongly among the zero-trip-makers in the low-access segment (model 1a, with an average predicted probability of 0.503 and 59.2% of them with a predicted probability over 0.500), but there is also a higher incidence of zero trips in this segment.
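The summaries in this paragraph (average predicted probability of zero trips among observed zeros and among observed trip-makers, and the share of zero-trip-makers whose predicted probability of zero exceeds 0.500) can be computed as in the following sketch. The function name and the six data points are hypothetical and serve only to show the calculation; they are not drawn from the survey data.

```python
def zero_prediction_summary(p_zero, made_trip, cutoff=0.5):
    """Summarize predicted P(zero trips) by observed outcome.

    p_zero    -- predicted probability of making zero trips, per case
    made_trip -- observed outcome per case: 0 = no trips, 1 = at least one trip
    """
    zeros = [p for p, y in zip(p_zero, made_trip) if y == 0]
    ones = [p for p, y in zip(p_zero, made_trip) if y == 1]
    mean_zeros = sum(zeros) / len(zeros)
    mean_ones = sum(ones) / len(ones)
    return {
        "mean_p0_among_zeros": mean_zeros,
        "mean_p0_among_ones": mean_ones,
        # Ratio of the two averages, as in the "observed 0's vs. 1's" row.
        "ratio_zeros_vs_ones": mean_zeros / mean_ones,
        "share_zeros_above_cutoff": sum(p > cutoff for p in zeros) / len(zeros),
    }

# Hypothetical predictions for six cases (three zero-trip-makers, three trip-makers).
p_zero = [0.6, 0.3, 0.2, 0.1, 0.1, 0.2]
made_trip = [0, 0, 0, 1, 1, 1]
summary = zero_prediction_summary(p_zero, made_trip)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A model with strong discrimination would push the first average toward 1 and the second toward 0; averages as close together as 0.267 and 0.131 indicate that the model separates the two groups only weakly.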

As another perspective on the predictive contribution of the BL model (model 2), I can compare the portion of cases for which its predicted probabilities better match the observed outcomes than a naïve assumption that all respondents made a trip (since this is the dominant behavior). Since trip-making is dominant in all segments, the naïve assumption would be correct for most respondents, but more so in the high-access segment and less so in the low- and medium-access segments, which also means there is more room for improvement in the low- and medium-access groups than in the high-access group. The measure lambda (λ) estimates the percent predicted “correctly” (that is, cases in which the predicted probability is greater than 0.500 for the observed outcome), after accounting for the “naïve” dominant outcome (from Veall and Zimmerman, 1996; see formulas in section 4.8.4). According to λ, none of the models provides much improvement among the high-access segment, with more improvement among the low- and medium-access segments. The most improvement is for the low-access model applied to the low-access segment, with a 20.5-percent improvement after accounting for the naïve assumption. The measure sigma (σ) offers another measure based on the same principle, but designed to range between -1 and a positive value σmax (which is at its greatest when the alternative outcomes are equally likely, that is, when pi = pj and σmax = 0.50; σmax decreases and approaches 0 as pi → 1 and pj → 0), with any value of σ above zero indicating some predictive power of the model (Veall and Zimmerman, 1996). By this measure, the high- and mid-access models (1b and 1c) provide no predictive power beyond the naïve assumption, though the low-access model provides some. (See Table 33.)
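As a rough illustration of the idea behind λ, the sketch below uses one common form of the measure: the share of cases predicted correctly, minus the share that the naïve dominant-outcome guess gets right, divided by the share the naïve guess gets wrong. This form is an assumption; the exact formulas used here are those in section 4.8.4, which may differ in detail. The function name and the ten data points are hypothetical.

```python
def lambda_measure(p_one, made_trip, cutoff=0.5):
    """Improvement in the share predicted correctly over the naive guess.

    p_one     -- predicted probability of making at least one trip, per case
    made_trip -- observed outcome per case: 0 = no trips, 1 = at least one trip
    A case counts as a predicted 1 only if p_one exceeds the cutoff.
    """
    n = len(made_trip)
    predicted = [1 if p > cutoff else 0 for p in p_one]
    share_correct = sum(yhat == y for yhat, y in zip(predicted, made_trip)) / n
    share_ones = sum(made_trip) / n
    naive_correct = max(share_ones, 1.0 - share_ones)  # always guess the dominant outcome
    if naive_correct == 1.0:
        raise ValueError("only one outcome observed; lambda is undefined")
    return (share_correct - naive_correct) / (1.0 - naive_correct)

# Hypothetical data: six trip-makers and four zero-trip-makers.
made_trip = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
p_one = [0.9, 0.8, 0.7, 0.6, 0.6, 0.4, 0.3, 0.4, 0.6, 0.7]
lam = lambda_measure(p_one, made_trip)
print(f"lambda = {lam:.2f}")
```

In this made-up example the model classifies 7 of 10 cases correctly while the naïve guess gets 6 of 10, so the model recovers a quarter of the cases the naïve guess misses. When the dominant outcome is very common, as in the high-access segment, the denominator is small and there is little room for any model to improve on the naïve guess.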

Table 33. Accuracy of the binary logit model for each segment in predicting observed values within that segment

                                               Low-access  Medium-access  High-access  Entire sample
Measures based on the % correctly predicted*
  λ                                                 0.205          0.103        0.006          0.043
  σ                                                 0.163          0.013       -0.113         -0.077
  σmax                                              0.472          0.224        0.011          0.070
  σ/σmax                                            0.345          0.060      -10.557         -1.096
Average predicted probability of 0                  0.389          0.233        0.122          0.152
  Among observed 0's                                0.503          0.377        0.187          0.267
  Among observed 1's                                0.316          0.189        0.113          0.131
  Odds of 0's: among observed 0's vs. 1's           1.588          1.991        1.657          2.038
Average predicted probability of 1                  0.611          0.767        0.878          0.848
  Among observed 0's                                0.497          0.623        0.813          0.733
  Among observed 1's                                0.684          0.811        0.887          0.869
  Odds of 1's: among observed 1's vs. 0's           1.375          1.301        1.091          1.186

* From Veall and Zimmerman (1996). See formulas in appendix section 4.8.4.

There is also considerable noise in the predictions of counts of 1 and above. To illustrate, Table 34 shows various means of comparing the actual to predicted values based on each version of the hurdle model (model 4). Only about 20% of cases have predicted values that would round to the actual value (that is, fall within 0.5 of the true number of trips), for all but the low-access segment. In general, lower counts tend to be underestimated, with an excess of cases assigned a predicted value near the observed mean (equal to 2.515 in the overall sample based on model 4d), which is the expected tendency when a model provides little predictive power. In all of the models, the prediction is higher than the actual number of trips made for more than half the cases, though this varies to some degree across segments and models. (Although the model is designed so that the average discrepancy is zero for the segment used for estimation, it over-predicts for more cases than it under-predicts, in all vehicle-access segments. This is because the distribution of trip volumes is right-skewed, with a long, thin tail of relatively rare high values, and a disproportionately large share of 0’s and 1’s that the model has difficulty predicting.) And the average distance between the predicted and actual number of trips is greater than 1 for all of the models and segments: for instance, about 1.455 trips on average, either above or below the observed number, using the entire-sample model (model 1d; see Table 34).
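The comparisons described above (share of predictions that round to the actual count, share of cases over-predicted, average discrepancy, and average absolute distance) can be sketched as follows. The function name and the seven data points are hypothetical; the data are constructed to be right-skewed, with predictions clustered near the mean, so that the average discrepancy is zero while most cases are over-predicted.

```python
def count_prediction_summary(predicted, actual):
    """Compare predicted (expected) trip counts with observed counts."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    return {
        # Prediction falls within 0.5 of the observed count.
        "share_rounding_to_actual": sum(abs(e) < 0.5 for e in errors) / n,
        "share_over_predicted": sum(e > 0 for e in errors) / n,
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
    }

# Hypothetical right-skewed counts: many low values and one rare high value.
actual = [0, 1, 1, 2, 2, 3, 12]
# Hypothetical predictions clustered near the observed mean of 3.
predicted = [2.2, 2.4, 2.6, 2.8, 3.0, 3.4, 4.6]
acc = count_prediction_summary(predicted, actual)
for name, value in acc.items():
    print(f"{name}: {value:.3f}")
```

Here six of the seven cases are over-predicted even though the average discrepancy is zero, because the single large under-prediction in the tail offsets many small over-predictions among the 0's, 1's, and 2's.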

Table 34. Accuracy of the hurdle model for each segment in predicting observed values within that segment

Model 1a    Model 1b    Model 1c    Model 1d

In general, the results should be treated with caution due to lack of fit. However, better fit is difficult to achieve using only the sort of demographic variables available in the dataset, and possibly because of the idiosyncratic nature of trip-making. Next I discuss the contribution of the hurdle model in improving model performance, as well as differences in model performance across the different vehicle-access segments.

In document Mobility Fulfillment Among Low-car Households: Implications for Reducing Auto Dependence in the United States (Page 98-103)