Pooled Binary-Multinomial Logit Model

6.3 Model Results

6.3.1 Pooled Binary-Multinomial Logit Model

Based on the utility specifications provided in Section 6.2.1, a pooled parsimonious BL-MNL model was developed in which the attribute coefficients were held constant across the different preference elicitation methods, while a scale parameter was estimated for two of the three elicitation methods and the scale parameter for the third was fixed at unity. This approach allowed investigating whether different preference elicitation methods captured different preference information and had varying levels of variance associated with them, while allowing for a more parsimonious specification. The choice of the elicitation method for the fixed scale parameter depended on the overall model results. While the scale parameter to be fixed can be chosen arbitrarily, it was found that the other scale parameters did not have any estimation problems when the lowest scale parameter was fixed at unity.

Thus, several models were estimated with alternative specifications of the scale parameter in order to find the best specification structure.
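The role of the scale parameter can be illustrated with a minimal sketch. It assumes, purely for illustration, a single choice set shared by all three elicitation methods, and every numeric value (`beta`, `X`, `scales`) is hypothetical rather than taken from the estimated models. The attribute coefficients are common to all methods, and only the scale that multiplies the deterministic utility differs.

```python
import numpy as np

def scaled_logit_probs(V, scale):
    """Logit choice probabilities for utilities V multiplied by a scale parameter."""
    z = scale * np.asarray(V, dtype=float)
    z = z - z.max()                 # subtract the maximum for numerical stability
    expz = np.exp(z)
    return expz / expz.sum()

# Hypothetical common attribute coefficients and attribute levels (3 alternatives).
beta = np.array([-0.8, 0.5])
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [0.0, 0.0]])
V = X @ beta                        # deterministic utilities shared by all methods

# Hypothetical scales: one method is fixed at unity, the other two are "estimated".
scales = {"one_stage_likert": 1.0, "binary": 1.4, "two_stage_likert": 0.7}
probs = {method: scaled_logit_probs(V, mu) for method, mu in scales.items()}

for method, p in probs.items():
    print(method, np.round(p, 3))
```

A larger scale corresponds to a smaller error variance, so the choice probabilities of that method are more sharply concentrated on the alternative with the highest utility.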

In order to account for panel effects, the error component specification was used in BIOGEME 2.0 (Bierlaire, 2003; Bierlaire, 2008; Yanez et al., 2010), where an error component was added to (n-1) alternatives.

A panel data set comprises a series of repeated observations on the same units (individuals, households or firms) over a number of periods. Though the availability of repeated observations on the same unit allows for more complicated and realistic models, the repeated nature of the data implies that it is no longer appropriate to assume that the different observations are independent. In comparison to time-series or cross-sectional data, an important advantage of panel data lies in the possibility of estimating certain parameters without making restrictive assumptions. However, given that the repeated observations are obtained from the same individuals, it is unrealistic to assume that the error terms over different time periods are uncorrelated (Verbeek, 2008).

In the case of a discrete choice model estimated on repeated observations from an individual (panel data), correlation of the disturbances (serial correlation) or heterogeneity due to variation in the unobserved effects across individuals is observed (Abdel-Aty et al., 1997). While panel data offers certain advantages over cross-sectional data, as repeated observations from the same individual allow for more accurate measurements (Yanez et al., 2010), the repeated measurements introduce correlation of the unobserved terms, which needs to be explicitly treated.

For a standard linear regression of the form:

$$y_{it} = \beta_0 + x_{it}'\beta + \varepsilon_{it}$$

where $x_{it}$ is a K-dimensional vector of explanatory variables which excludes the intercept term, i is an index of individuals (i = 1, . . ., N) and t is an index of time periods (t = 1, . . ., T), the model implies that the intercept $\beta_0$ and the coefficient vector $\beta$ are identical for all individuals and time periods, while the error term $\varepsilon_{it}$ varies over individuals and time periods and captures all the factors that affect $y_{it}$. A typical panel data model then assumes:

$$\varepsilon_{it} = \alpha_i + u_{it}$$

where $u_{it}$ is assumed to be homoskedastic and not correlated over time, while the component $\alpha_i$ is time invariant and homoskedastic over individuals. This model is referred to as the error components or random effects model. In a fixed effects model, this problem is addressed by including an individual specific constant term in the model, which is estimated along with the other regressors of the model. In this case, the model is given as:

$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}$$

where $\alpha_i$ (i = 1, . . ., N) are fixed unknown constants estimated in the model and $u_{it}$ is the error term, assumed to be i.i.d. over individuals and time periods. The overall intercept term is omitted and is replaced by the individual specific constants, which are referred to as the fixed (individual) effects. These effects capture all unobservable time-invariant differences across individuals. Most panel data models are estimated using either the fixed or the random effects model.

Factors that affect the dependent variable but which have not been included as explanatory variables can be appropriately summarised by a random error term. In this case, it leads to the assumption that the $\alpha_i$ are random factors, i.i.d. over individuals. The error component thus comprises two parts: an individual specific component which does not vary over time and a remainder component which is assumed to be uncorrelated over time. Thus, all correlation of the error terms over time is attributed to $\alpha_i$ (Verbeek, 2008).
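The implication for the correlation structure can be made explicit with a short derivation. It is a sketch under the standard random effects assumptions that the individual component $\alpha_i$ and the remainder $u_{it}$ of the error term $\varepsilon_{it} = \alpha_i + u_{it}$ are mutually independent; the variance symbols $\sigma_\alpha^2$ and $\sigma_u^2$ are introduced here for illustration only.

```latex
\begin{align*}
\operatorname{Var}(\varepsilon_{it}) &= \sigma_\alpha^2 + \sigma_u^2, \\
\operatorname{Cov}(\varepsilon_{it}, \varepsilon_{is}) &= \sigma_\alpha^2 \quad (t \neq s), \\
\operatorname{Corr}(\varepsilon_{it}, \varepsilon_{is}) &= \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_u^2} \quad (t \neq s).
\end{align*}
```

The correlation between two error terms of the same individual is therefore the same for any pair of periods and depends only on the share of the total error variance due to the individual specific component.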

In applying panel analysis methods to SP data, the fixed effects method can be complicated and unrealistic to implement, as it requires the estimation of n individual-specific constants. This leads to the incidental parameter problem, where the number of estimated parameters increases with the sample size (Verbeek, 2008). In order to avoid this problem, the random effects model is more frequently applied to SP data to account for the serial correlation and heterogeneity observed.

The random effects model can take the form of either the random parameters logit (RPL) model or the error components logit (ECL) model. In the RPL model, the coefficient vector $\beta_n$ is associated with individual n and represents that person’s tastes, which vary across individuals. The density of this distribution is described by $\theta$, the population parameters that describe the distribution of the individual parameters (Revelt and Train, 1998).

Under the panel analysis framework, this implies that the correlation among observations obtained from an individual causes correlation among taste parameters which can be captured using the RPL method and where the variance of the taste parameters reflects inter-respondent heterogeneity caused due to the panel effect.
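The panel form of the RPL probability described by Revelt and Train (1998) can be written compactly as follows. The notation is introduced here for illustration: $L_{nt}(\beta)$ denotes the standard logit probability of the alternative chosen by individual n in choice situation t, T is the number of choice situations faced by the individual, and $f(\beta \mid \theta)$ is the density of the taste coefficients with population parameters $\theta$.

```latex
\begin{equation*}
P_n(\theta) = \int \left[ \prod_{t=1}^{T} L_{nt}(\beta) \right] f(\beta \mid \theta)\, d\beta
\end{equation*}
```

Because the same draw of $\beta$ applies to every choice situation of an individual, the product is taken inside the integral, which is what ties the repeated observations of that individual together.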

Under the ECL framework, an error component is introduced in the model which accounts for the correlation across observations from an individual (Abdel-Aty et al., 1997; Mabit et al., 2008; Yanez et al., 2010). For each of the utility functions, the stochastic utility is taken to be the sum of the deterministic utility and an associated random error $\varepsilon_{it}$, which can be further decomposed into the form

$$\varepsilon_{it} = \alpha_i + u_{it}$$

where $\alpha_i$ captures the correlation over the observations from individual i (i.e., the panel effect) and $u_{it}$ is an i.i.d. Gumbel distributed error term. n alternative specific error terms for the panel effect can then be specified. However, the estimation of n error terms can cause identifiability issues, and hence this procedure requires estimating (n-1) error variances (Yanez et al., 2010). One procedure to identify the reference alternative is to estimate all error components and to take the alternative with the lowest error variance and an associated low t-statistic as the reference alternative (Walker et al., 2007). If a common error variance is used to capture panel effects across (n-1) alternatives, a correlation among these alternatives is induced; hence, alternative specific (n-1) error variances are estimated in this study to capture the panel effects for each of the alternatives.
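The way an alternative specific error component enters the likelihood of one individual can be sketched as below. This is not the BIOGEME implementation: it assumes normally distributed error components, uses plain pseudo-random draws for brevity, and all names and numeric values (`simulated_panel_prob`, `sigma`, the utilities) are hypothetical. The base alternative is given a standard deviation of zero, so that only (n-1) error variances are free.

```python
import numpy as np

def simulated_panel_prob(V, chosen, sigma, draws):
    """Simulated probability of one individual's sequence of choices.

    V      : (T, J) deterministic utilities over T choice tasks and J alternatives
    chosen : (T,)   index of the chosen alternative in each task
    sigma  : (J,)   std. dev. of each error component (0 for the base alternative)
    draws  : (R, J) standard normal draws, one row per simulation draw
    """
    V = np.asarray(V, dtype=float)
    chosen = np.asarray(chosen)
    eta = draws * sigma                       # (R, J): same component in every task
    U = V[None, :, :] + eta[:, None, :]       # (R, T, J)
    U -= U.max(axis=2, keepdims=True)         # numerical stability
    P = np.exp(U)
    P /= P.sum(axis=2, keepdims=True)         # logit probabilities per draw and task
    P_chosen = P[:, np.arange(V.shape[0]), chosen]   # (R, T)
    # Product over tasks within a draw, then average over draws.
    return P_chosen.prod(axis=1).mean()

rng = np.random.default_rng(0)
draws = rng.standard_normal((500, 3))
V = np.array([[0.2, -1.1, 0.0],
              [0.5,  0.1, 0.0]])              # hypothetical utilities, 2 tasks
chosen = np.array([0, 1])
sigma = np.array([0.0, 1.2, 0.8])             # alternative 0 is the base
p = simulated_panel_prob(V, chosen, sigma, draws)
```

Holding each draw fixed across the tasks of an individual and multiplying the task probabilities before averaging is what distinguishes the panel specification from treating every observation as independent.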

While the RPL approach can also be applied to account for panel effects, this approach was not undertaken as it would imply estimating the variances for each of the coefficients, over each of the alternatives, thus substantially increasing the number of estimated parameters in comparison to the ECL method.

As given in Section 4.3.2.3 of Chapter 4, the estimation of the ECL or RPL model requires drawing parameters from a density. Several types of draws are available, as outlined in that section, and the Modified Latin Hypercube Sampling (MLHS) procedure developed by Hess et al. (2006) has been used to generate the draws.
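The idea behind MLHS draws can be sketched as follows. This is a minimal illustration rather than the routine used by BIOGEME, and it assumes the commonly described construction: an equally spaced grid on the unit interval, a single random shift per dimension, and an independent shuffle of each dimension. The function name `mlhs_draws` is hypothetical.

```python
import numpy as np

def mlhs_draws(n_draws, n_dims, rng):
    """Uniform(0, 1) draws: shifted equally spaced grid, shuffled per dimension."""
    grid = np.arange(n_draws) / n_draws               # 0, 1/N, ..., (N-1)/N
    out = np.empty((n_draws, n_dims))
    for d in range(n_dims):
        shift = rng.uniform(0.0, 1.0 / n_draws)       # one shift for the whole grid
        out[:, d] = rng.permutation(grid + shift)     # break correlation across dims
    return out

rng = np.random.default_rng(42)
u = mlhs_draws(500, 2, rng)                           # 500 draws for 2 error components
```

The uniform draws would then be transformed to the required distribution (for example, through the inverse normal distribution function) before being used in simulation.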

Using the error component specification to account for panel effects in BIOGEME (Bierlaire, 2003; Bierlaire, 2008; Yanez et al., 2010), an error component was added to (n-1) alternatives. In order to select the base alternative for the panel specification, a model with all error components and 500¹⁹ MLHS draws was estimated for each of the model specifications. The alternative with the lowest error component variance and a low associated statistical significance was selected as the base alternative. For the location ratings model, ‘Alternative A’ was selected as the base alternative, while for the location dummy model, the base alternative for the panel specification was ‘Alternative B’ of the binary elicitation method. For the linguistic ratings and linguistic dummy models, the base alternative was ‘Abs. Certain A’ of the two stage Likert elicitation method.

19 Both MLHS and Halton draws were tested with 500 and 1000 draws. It was found that MLHS draws were largely efficient, with stable estimation obtained from 500 draws. The use of 1000 draws increased the estimation time without any substantial improvement in the model fit.

A model with different parameter estimates for each of the elicitation methods, but with a common charge estimate across the elicitation methods (while the other parameters are allowed to vary freely), was compared with the common parameters model. Without considering the panel effects, no significant improvement in the model fit was observed relative to the common parameters model. This paved the way for the implementation of a more parsimonious model in which each of the attribute coefficients is held the same across the different elicitation methods, while the scale parameters for the binary and two stage Likert methods are estimated in relation to the one stage Likert method. The following table thus provides the results of the pooled BL-MNL parsimonious model considering panel effects. It is to be noted that while the t-statistics for each of the parameters, ASC estimates and error variances are with respect to zero, those for the scale parameters are with respect to one. The panel error components obtained for each of the alternatives across the different model specifications, when panel effects are considered, are provided in Table 6.5 and discussed thereafter.
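The difference between the two reference values for the t-statistics can be shown with a small sketch. The estimate and standard error used below are hypothetical and are not taken from Table 6.4.

```python
def t_stat(estimate, std_err, null_value=0.0):
    """t-statistic of an estimate against a chosen null value."""
    return (estimate - null_value) / std_err

# Hypothetical scale parameter estimate and standard error.
scale_hat, scale_se = 1.30, 0.10

t_vs_zero = t_stat(scale_hat, scale_se)        # test used for coefficients and ASCs
t_vs_one = t_stat(scale_hat, scale_se, 1.0)    # test used for scale parameters
print(round(t_vs_zero, 2), round(t_vs_one, 2))
```

Testing a scale parameter against one asks whether the error variance of that elicitation method differs from that of the reference method, which is the relevant question for a scale parameter, whereas a test against zero would not be informative.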

The following results are obtained from the pooled BL-MNL panel models:

Table 6.4 Binary-Likert pooled MNL model for Location and Linguistic data with specific availability conditions and panel effects, with each variable level common across the elicitation methods

In document Preference elicitation and preference uncertainty: an application to noise valuation (Page 194-200)