Chapter 5 Linear Dynamic Model

5.2 Empirical Results from the Dynamic Car Ownership Model

5.2.2 Assuming Linear Economic Relationship at Cohort Level

5.2.2.3 Alternative Model Specification and Estimation

As with the static models, we also test a set of random effect models. Although the Hausman test favours the fixed effect model in only about half of the models estimated, the RESET test rejects the null hypothesis of no misspecification in almost all of them. This is likely to be caused by correlation between the explanatory variables and the cohort effects, which would violate the orthogonality assumption of the random effect model.
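The comparison underlying the Hausman test can be sketched as follows. This is a minimal illustration rather than the code used for the estimates reported here: the function name `hausman_statistic` and the toy inputs are hypothetical, and the coefficient vectors and covariance matrices are assumed to come from fixed effect and random effect fits of the same specification.

```python
import numpy as np


def hausman_statistic(b_fe, b_re, V_fe, V_re):
    """Return the Hausman statistic and its degrees of freedom.

    b_fe, b_re: coefficients common to the fixed and random effect fits.
    V_fe, V_re: their estimated covariance matrices.
    Under the null that the random effect estimator is consistent, the
    statistic is asymptotically chi-square with len(b_fe) degrees of freedom.
    """
    diff = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    # In finite samples this difference need not be positive definite,
    # in which case the statistic is not reliable.
    V_diff = np.asarray(V_fe, dtype=float) - np.asarray(V_re, dtype=float)
    stat = float(diff @ np.linalg.solve(V_diff, diff))
    return stat, diff.size


# Toy numbers, purely illustrative (not taken from the thesis).
stat, df = hausman_statistic(
    [1.0, 2.0], [0.5, 1.0], np.diag([0.5, 0.5]), np.diag([0.25, 0.25])
)
```

A large statistic relative to the chi-square critical value for `df` degrees of freedom indicates that the two sets of coefficients differ systematically, which favours the fixed effect model.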

We also investigate the hypothesis of homogeneity by estimating cohort-specific models. Owing to an insufficient number of observations, the youngest and the two oldest cohorts have to be dropped. To save degrees of freedom, household structure is represented by the average number of children and household size, and compressed location variables are used. The models are then estimated separately for each of the 13 remaining cohorts. The results are unsatisfactory, as the majority of the coefficients are not significant even at the 10% level. It is unlikely that reliable inferences on elasticities, or forecasts, can be made from these models.

Finally, we investigate the robustness of the estimation using the parametric bootstrap technique, which is implemented in four steps. First, the unrestricted fixed effect model in semi-log form (as reported in Table 5.4) is estimated, and the predicted value $\hat{A}_{c,t}$ is saved. Second, a series of random numbers $W$ is generated with zero mean and a standard deviation of 0.65, which are the approximate moments of the residuals from the model estimated in step one. Third, the model is re-estimated with a simulated dependent variable $A^{S}_{c,t} = \hat{A}_{c,t} + w$, where $w$ is a random draw from $W$. It should be noted that, for the first observation in each cohort, the simulated variable is $A^{S}_{c,t_0} = A_{c,t_0} + w$, where $A_{c,t_0}$ is the “observed” value in the pseudo panel dataset rather than the predicted value. Finally, step three is repeated 1000 times and the coefficients estimated from each run are saved for further analysis.
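The four steps can be sketched in Python as below. This is an illustration under stated assumptions, not the procedure's original implementation: the draws are taken to be normal (the text gives only the mean and standard deviation), the fixed effect model is fitted by least squares with cohort dummies, the lag of the simulated series is used as the lagged regressor in each re-estimation, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_fixed_effects(A, X):
    """Least-squares dummy-variable fit of A[c, t] on A[c, t-1], X[c, t]
    and one dummy per cohort. A has shape (C, T); X has shape (C, T, K).
    Returns the coefficients (lag, K slopes, C fixed effects) and the
    fitted values for t = 1..T-1, with shape (C, T-1)."""
    C, T = A.shape
    K = X.shape[2]
    y = A[:, 1:].reshape(-1)
    lag = A[:, :-1].reshape(-1, 1)
    regressors = X[:, 1:, :].reshape(-1, K)
    dummies = np.kron(np.eye(C), np.ones((T - 1, 1)))
    Z = np.hstack([lag, regressors, dummies])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, (Z @ coef).reshape(C, T - 1)


def parametric_bootstrap(A, X, sigma=0.65, n_reps=1000, seed=0):
    """Re-estimate the model n_reps times on simulated dependent variables."""
    rng = np.random.default_rng(seed)
    # Step 1: estimate on the observed data and keep the predicted values.
    coef_obs, fitted = fit_fixed_effects(A, X)
    draws = np.empty((n_reps, coef_obs.size))
    for r in range(n_reps):
        # Step 2: zero-mean noise with the chosen standard deviation
        # (normality is an assumption of this sketch).
        w = rng.normal(0.0, sigma, size=A.shape)
        # Step 3: simulated series; the first observation of each cohort
        # uses the observed value because no prediction exists for it.
        A_sim = np.empty_like(A, dtype=float)
        A_sim[:, 0] = A[:, 0] + w[:, 0]
        A_sim[:, 1:] = fitted + w[:, 1:]
        draws[r], _ = fit_fixed_effects(A_sim, X)
    # Step 4: the stored coefficients from every replication.
    return coef_obs, draws


# Toy panel, purely illustrative (not the pseudo panel used in the thesis).
rng = np.random.default_rng(1)
C, T, K = 4, 12, 2
X_demo = rng.normal(size=(C, T, K))
A_demo = np.zeros((C, T))
for t in range(1, T):
    A_demo[:, t] = (0.5 * A_demo[:, t - 1]
                    + X_demo[:, t] @ np.array([0.2, -0.1])
                    + rng.normal(0.0, 0.65, size=C))

coef_obs, draws = parametric_bootstrap(A_demo, X_demo, n_reps=200)
```

Each column of `draws` is the simulated distribution of one coefficient, which can then be compared with the corresponding entry of `coef_obs`.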

In general, the coefficients estimated using the “observed” dependent variable are consistent with those based on simulation. Figure 5.4 plots the distribution of the simulated coefficients for the log income variable, together with the coefficient estimated from the “real” data. The latter lies close to the centre of the bell-shaped distribution, which suggests that the initial estimate is broadly unbiased.

Figure 5.5 shows the distribution of the simulated coefficients for the log running costs variable, together with the point estimate based on observed data. As above, the latter is very close to the centre of the simulated distribution, which supports the unbiasedness of the point estimate for the log running costs variable.

Figure 5-4 Log Income Variable: distribution of the simulated coefficients and point estimate based on real data

[Histogram. Horizontal axis: Coefficient (approximately 0.076 to 0.319); vertical axis: Frequency (0 to 120); series: simulated density and point estimate.]

Figure 5-5 Log Running Cost Variable: distribution of the simulated coefficients and point estimate based on real data

[Histogram. Horizontal axis: Coefficient (approximately -0.214 to 0.026); vertical axis: Frequency (0 to 120); series: simulated density and point estimate.]

Figure 5.6 compares the “most likely” fixed effects from simulation with the point estimates based on observed data. To obtain the “most likely” simulated coefficient, we divide the 1000 parameter values obtained from simulation into 31 ranges and identify the “most likely” range, namely the one with the most occurrences. The mid-point of this range becomes the representative simulated fixed effect. Figure 5.6 shows that the point estimates based on real data are quite close to the “most likely” simulated values, except for the youngest cohort. As discussed before (and in the literature, e.g. Dargay and Vythoulkas, 1999), the linear trend in the fixed effects becomes less clear for the youngest cohort, and this effect seems to be amplified in the simulated data.
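The “most likely” value described above amounts to the mid-point of the modal bin of a histogram. A minimal sketch follows, assuming equal-width ranges spanning the minimum to the maximum of the simulated values (the text does not state how the 31 ranges were drawn); the function name and toy values are hypothetical.

```python
def most_likely_value(values, n_bins=31):
    """Mid-point of the most populated of n_bins equal-width ranges."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is assigned to the last range.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))  # the first range wins any tie
    return lo + (best + 0.5) * width


# Toy values, purely illustrative (not simulated fixed effects).
example = most_likely_value([0.0, 10.0, 4.2, 4.5, 4.7, 8.1], n_bins=10)
```

Because the result depends on the number and placement of the ranges, it is a coarse summary of the simulated distribution rather than a precise mode.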

Figure 5-6 “Most Likely” fixed effects from simulation and point estimate from real data

[Chart. Horizontal axis: Cohort (1 to 16); vertical axis: Fixed Effect (-2.5 to 0); series: 'most likely' simulated fixed effects and point estimate fixed effect.]

5.3 Conclusion

In this chapter, we first investigate the theoretical aspects of the dynamic pseudo panel. Two types of consistent estimators have been discussed: one based on cohort averages and the other based on individual survey data. Regarding the former, we review the Error Corrected Within-Group Estimator, consistent when T → ∞, and the Error Corrected GMM Estimator, consistent when the number of cohorts is large (C → ∞), both of which were proposed by Collado (1997). We also present a Within-Group Estimator, which is computationally attractive and consistent when the number of sample observations in each cohort unit is large (n_ct → ∞). Certain rank conditions have to be satisfied for identification: the cohort means of the dependent and independent variables must not be perfectly collinear and must vary over time. At least three cross sections are also required for the model to be identified.

Regarding the estimators based on data at the individual level, we review the Two-Stage Least Squares Estimator of Moffitt (1993) and the GMM Estimator of the Quasi-Differences model by Girma (2004). Both approaches involve a noisy approximation of the lagged dependent variable. Although such noise cancels out asymptotically, its impact in practical applications might be severe. In other words, their usefulness in empirical work would be limited.

The second part of this chapter reports the empirical work on the dynamic car ownership model. As the pseudo panel is a synthetic panel constructed from individual survey data, we can assume a linear economic relationship between the car ownership level and the explanatory variables either at the individual household level or at the cohort level. Consequently, a separate specification search has been carried out under each assumption to determine the model with the best fit.

Assuming a linear economic relationship between the dependent variable and the regressors for each household requires the transformation (log, square, etc.) of the individual survey data before the cohort averages are obtained. However, the empirical results from the semi-log models show various problems, including a very low coefficient for the lagged dependent variable, wrongly signed coefficients for the motoring cost variables and the wrong trend in fixed effects across cohorts. These problems reinforce the idea that it is not appropriate to view the relationship between car ownership and the explanatory variables as linear for each individual household, especially when the regressors include a lagged dependent variable. This is because, for an individual household, the relationship between past and current car ownership levels is unlikely to remain constant over time.

Alternatively, we assume that the economic relationship is linear at the cohort level. A number of models have been estimated, with different explanatory variables, functional forms and representations of the cohort effects. A systematic specification search has been carried out, and an unrestricted fixed effect model in semi-log form is found to have the best goodness of fit. The implied long-run income elasticities are 0.57, 0.26 and 0.19 for households with low, middle and high car ownership levels; the implied long-run running cost elasticities are -0.28, -0.13 and -0.10 respectively, all of which appear sensible. To check the robustness of the estimation, the preferred model is re-estimated using the parametric bootstrap technique, and the distributions of the simulated coefficients support the unbiasedness of the point estimates obtained using the pseudo panel data.
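The decline of the elasticities as car ownership rises is what a semi-log dynamic specification would be expected to produce. As a sketch only, assume the preferred model can be written in the form below with $|\alpha| < 1$, where $\alpha$, $\beta$, $Y_{c,t}$ (income) and $z_{c,t}$ (all remaining terms, including the cohort fixed effect) are notation introduced here for illustration rather than taken from the estimated model:

```latex
A_{c,t} = \alpha A_{c,t-1} + \beta \ln Y_{c,t} + z_{c,t}
\quad\Longrightarrow\quad
A^{*} = \frac{\beta \ln Y + z}{1-\alpha},
\qquad
\eta^{LR}_{Y} = \frac{\partial A^{*}}{\partial \ln Y}\cdot\frac{1}{A^{*}}
              = \frac{\beta}{(1-\alpha)\,A^{*}}
```

Under this assumption the long-run elasticity is inversely proportional to the steady-state car ownership level $A^{*}$, so a single pair of coefficients implies a lower elasticity for households with higher car ownership.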

Chapters 4 and 5 have thoroughly investigated the static and dynamic car ownership models of linear (or generalised linear) form. The models with the best fit will be used for forecasting at a later stage. The next two chapters investigate car ownership models of non-linear form.

In document The Use of Pseudo Panel Data for Forecasting Car Ownership (Page 92-98)