Model Selection Part 1: Comparison of the joint choice model

3.3 Results

3.3.5 Model Selection Part 1: Comparison of the joint choice model

This section addresses two aspects of model selection. The first, and more important, is whether the choices of residential location, car ownership status, and commute mode need to be represented in a single model that endogenizes all three decisions. The second is whether a nested logit is necessary or whether a joint logit model will suffice for this estimation.

For the first of the model selection questions identified above, six multinomial logit choice models were estimated that include all of the possible sub-models of the full joint choice models. These were as follows:

1. Joint choice of residential location, car ownership status, and commute mode

2. Joint choice of residential location and car ownership status

3. Joint choice of residential location and commute mode

4. Joint choice of car ownership status and commute mode

5. Choice of car ownership status

6. Choice of commute mode

The estimation results for these models can be seen in Table 3.3 and in Tables A.1 through A.5. To answer the question of the importance of modeling these choices jointly, the main evaluation method is to compare the signs and statistical significance of the estimated coefficients across the models. The model that is most consistent with theory is the preferred model. A second, statistical way to compare the models is to compare the distribution of each model’s predicted probabilities for the alternatives that were actually chosen. This study employs both of these methods.

The signs of the model coefficients are most consistent with theory in the full joint model of the choices of residential location, car ownership, and commute mode. The main sign changes that appear in the alternate models listed above are in the coefficients on “Riding Time”. For the subset of models that does not include the choice of residential location as part of the dependent variable, the estimated coefficients of “Riding Time” are positive, indicating that longer riding times are more desirable. It is possible that this is true for some range of riding times (Redmond and Mokhtarian, 2001), but it is not likely that, in general, longer riding times make an alternative more desirable. Instead, the explanation I offer is that trip distance and riding time are closely related, and longer trips may be associated with more desirable residential locations. As a result, riding time appears to have a positive effect on utility when residential location choice is not included as part of the choice.

on car insurance for high income households is positive. The second coefficient that changes sign in one of the alternate models is that on the variable “NH Miles From Midtown Manhattan”, again for high income households. In the full model, this coefficient is negative, indicating that high income households prefer to live closer to midtown. In the model of residential location and mode choice (Table A.2), this coefficient is positive.

There are many coefficients that are statistically significant in the full model and statistically insignificant in some of the alternate models. There are a few coefficients that are significant in one of the alternate models and statistically insignificant in the full model. The fact that more of the coefficients are statistically significant in the full model, though, adds to the evidence that the full model provides the best fit to the data.
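The sign and significance comparison described above can be sketched in a few lines. This is only an illustration: the function names, the 1.96 critical value (a two-sided 5% normal test, which the text does not specify), and every number in the example are placeholder assumptions, not estimates from Table 3.3 or Tables A.1 through A.5.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_flips(reference, alternate):
    """Names of coefficients whose sign differs between two models.

    Each model is a dict mapping a coefficient name to a tuple
    (estimate, standard_error). Only coefficients present in both
    models are compared.
    """
    shared = reference.keys() & alternate.keys()
    return sorted(
        name for name in shared
        if sign(reference[name][0]) * sign(alternate[name][0]) < 0
    )


def count_significant(model, critical=1.96):
    """Count coefficients whose |estimate / standard error| exceeds `critical`.

    The default of 1.96 is an assumed threshold, not one taken from the text.
    """
    return sum(
        1 for est, se in model.values()
        if se > 0 and abs(est / se) > critical
    )


# Placeholder numbers for illustration only (NOT results from the study).
full_model = {
    "riding_time": (-0.05, 0.01),
    "cost": (-0.20, 0.05),
    "income": (0.30, 0.20),
}
mode_only_model = {
    "riding_time": (0.02, 0.015),
    "cost": (-0.10, 0.08),
}

flipped = sign_flips(full_model, mode_only_model)
n_sig_full = count_significant(full_model)
n_sig_mode_only = count_significant(mode_only_model)
print(flipped, n_sig_full, n_sig_mode_only)
```

With these made-up inputs, the only coefficient flagged as changing sign is the riding-time coefficient, which mirrors the kind of pattern discussed in the text without reproducing any of its actual estimates.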

This evidence based on coefficient signs and statistical significance indicates strongly that residential location should be included as part of the dependent variable in a model of mode choice in New York City. Additional evidence of this is provided by comparison of each model’s predicted probabilities for the alternatives that were actually chosen. This is done by comparing the average predicted probability for the chosen alternative in the joint choice model with the product of the average predicted probabilities for the chosen sub-alternatives from the single choice models.2

2 This is the correct comparison to make. There is also another method that is tempting to try, but is incorrect. This is to compare the average predicted probability for each chosen sub-alternative in the joint choice model with the average predicted probability of the chosen sub-alternative in each single choice model. This second method will yield the result that the single choice models outperform the joint choice models because the joint choice models are trying to predict something much more complicated, and effective prediction of each sub-choice is compromised to achieve the

This comparison method is relatively simple. First, the joint choice model is estimated. Then, the resulting predicted probabilities for each individual’s chosen compound alternative are averaged. Since the model is estimated using neighborhood weights, the averages here are weighted as well, using this same weighting scheme. For the comparison, it is necessary to also estimate the single choice models for each sub-choice, and calculate the weighted average of these predicted probabilities for each individual’s chosen sub-alternative. The goodness-of-fit comparison is between the weighted average probability for the compound alternative and the product of the weighted average probabilities for the sub-alternatives. The following mathematical expression represents the comparison (without the weighting).

$$
\frac{\sum_n \sum_j y_{nj} P_n(lcm)}{N}
\quad \text{versus} \quad
\frac{\sum_n \sum_l \sum_c \sum_m y_{nl} P_n(l) \cdot y_{nc} P_n(c) \cdot y_{nm} P_n(m)}{N}
$$

where: $l$ signifies the location choice, $c$ signifies the car ownership choice, $m$ signifies the mode choice,

$y_{nj} = 1$ if individual $n$ chooses compound alternative $j$, and $y_{nj} = 0$ otherwise, and

$y_{nl}$, $y_{nc}$, and $y_{nm}$ are defined in an analogous manner.
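As a minimal sketch of the comparison procedure, the following follows the verbal description in the text: a weighted average of the predicted probability of each individual's chosen compound alternative, set against the product of the corresponding weighted averages from the single choice models. The function names, the data layout (one row of predicted probabilities per individual), and all numbers are hypothetical; the toy example uses two locations and two modes only, with compound index j = 2*l + m.

```python
def weighted_mean_chosen_prob(probs, chosen, weights):
    """Weighted average of the predicted probability of the chosen alternative.

    probs   : list of rows, one per individual, of predicted probabilities
    chosen  : index of the alternative each individual actually chose
    weights : one weight per individual (e.g. neighborhood weights)
    """
    weighted_sum = sum(w * row[j] for row, j, w in zip(probs, chosen, weights))
    return weighted_sum / sum(weights)


def compare_joint_vs_separate(joint_probs, joint_chosen, separate, weights):
    """Return (joint score, product of the separate-model scores).

    `separate` is a list of (probs, chosen) pairs, one per single choice model.
    """
    joint_score = weighted_mean_chosen_prob(joint_probs, joint_chosen, weights)
    separate_score = 1.0
    for probs, chosen in separate:
        separate_score *= weighted_mean_chosen_prob(probs, chosen, weights)
    return joint_score, separate_score


# Toy data for three individuals (illustrative values, not study results).
weights = [1.0, 1.0, 2.0]

# Joint model over 4 compound alternatives (location x mode).
joint_probs = [
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
joint_chosen = [0, 1, 3]

# Single choice models for location and for mode.
location_probs = [[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]]
location_chosen = [0, 0, 1]
mode_probs = [[0.5, 0.5], [0.2, 0.8], [0.5, 0.5]]
mode_chosen = [0, 1, 1]

joint_score, separate_score = compare_joint_vs_separate(
    joint_probs,
    joint_chosen,
    [(location_probs, location_chosen), (mode_probs, mode_chosen)],
    weights,
)
print(joint_score, separate_score)
```

Under this measure the joint model is preferred when its score exceeds the product of the separate scores. Note that the sketch multiplies the averages, as the prose describes; averaging each individual's product of probabilities instead would be a different statistic.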

The results of this comparison can be seen in Table 3.2. By this measure, the joint choice models perform better for both the full compound choice case and for the location-mode choice case. For the car-mode and the location-car choice cases, the separate models perform better than the joint choice model. This is consistent with the model selection discussion above that focuses on coefficient estimates in that it re-emphasizes the importance of jointly modeling the choices of residential location and commute mode. Because the current research is focused on car ownership status as well as car use for commuting, I chose to continue to include as endogenous the choice of car ownership status in the present model. This choice was made in spite of the evidence presented here that suggests that its inclusion may actually compromise model goodness-of-fit.

3.3.6 Model Selection Part 2: Joint versus nested logit spec-

In document Cars and the City: An Investigation of Transportation and Residential Location Choices in New York City (Page 100-104)