Econometric Models for Choice Experiment - Research Design and Methodology

Chapter 3 Research Design and Methodology

3.5 Econometric Models for Choice Experiment

3.5.1 The Classic Conditional Logit Model

Choice experiments are based on two theories: the characteristics theory of consumption, which supposes that utility is derived from the characteristics (attributes) of goods and services (Lancaster 1966), and random utility theory, which supposes that people make decisions to maximize the utility they derive from goods and services (McFadden 1974). The Conditional Logit Model (CLM), developed by McFadden (1974), who was awarded the Nobel Prize in Economics in 2000 for his contribution to choice modelling, is the classic and most widely used model for choice experiment studies. Suppose that an individual is given a choice set (card) C and asked to choose one of its alternatives. The utility of alternative i is assumed to consist of a deterministic, observable component Vi and a random, unobservable error component εi:

𝑈𝑖 = 𝑉𝑖+ 𝜀𝑖 (3.51)

When the individual compares alternative i with every other alternative j in the choice set C, he/she chooses alternative i if and only if it yields greater utility than each of them. The probability of choosing alternative i is therefore

Pr (𝑖 | 𝐶) = Prob (𝑉𝑖 + 𝜀𝑖 > 𝑉𝑗 + 𝜀𝑗; 𝑖 ≠ 𝑗; ∀ 𝑗 ∈ 𝐶) (3.52)

When the random error terms εi and εj are independently and identically distributed following the Gumbel distribution, the probability of choosing i is (McFadden 1974; Hanley, Wright and Adamowicz 1998; Louviere, Hensher and Swait 2000):

$$\Pr(i) = \frac{\exp(\mu V_i)}{\sum_{j \in C} \exp(\mu V_j)}$$ (3.53)

where μ is a scale parameter that is normalized to 1, implying constant error variance. The deterministic component Vi is usually specified as a linear function of the attribute vector Xi and the coefficient vector β’:

𝑉𝑖 = 𝛽′𝑋𝑖 (3.54)

$$\Pr(i) = \frac{\exp(\beta' X_i)}{\sum_{j \in C} \exp(\beta' X_j)}$$ (3.55)

Under the assumption of Independence of Irrelevant Alternatives (IIA), which implies that the ratio of the choice probabilities of any two alternatives is not affected by the introduction or removal of other alternatives, the coefficient vector β’ in Equation 3.55 can be estimated by the conditional logit procedures available in statistical software packages. In this study, the “mlogit” package for the statistical program R was used to estimate the conditional logit model (Croissant 2013).
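Equation 3.55 and the IIA property can be illustrated numerically. The following is a minimal Python sketch, not the estimation procedure used in this study (which relied on mlogit in R); the coefficient values, the attribute levels and the function name choice_probabilities are hypothetical and chosen only for illustration.

```python
import math

def choice_probabilities(beta, alternatives):
    """Conditional logit probabilities (Equation 3.55) for one choice set."""
    # Deterministic utility V_j = beta'X_j for each alternative.
    utilities = [sum(b * x for b, x in zip(beta, attrs)) for attrs in alternatives]
    # Subtract the largest utility before exponentiating for numerical
    # stability; the shift cancels between numerator and denominator.
    v_max = max(utilities)
    weights = [math.exp(v - v_max) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients and attribute levels (not taken from the study).
beta = [0.8, -0.05]        # one non-monetary attribute, one monetary attribute
choice_set = [
    [1.0, 10.0],           # alternative A
    [2.0, 30.0],           # alternative B
    [0.0, 0.0],            # status quo
]

p_full = choice_probabilities(beta, choice_set)
p_reduced = choice_probabilities(beta, choice_set[:2])   # status quo removed

# IIA: the odds of A against B are identical in both choice sets.
ratio_full = p_full[0] / p_full[1]
ratio_reduced = p_reduced[0] / p_reduced[1]
print(p_full, ratio_full, ratio_reduced)
```

In this sketch the ratio of the probabilities of A and B depends only on the difference between their own utilities, which is why removing the third alternative leaves it unchanged.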

The estimated coefficients of the CLM cannot be interpreted directly as the choice probability of any specific alternative, since the probability of choosing an alternative is conditional on the other alternatives in the choice set (represented by the denominator in Equation 3.55). Instead, the coefficients represent the change in respondents’ utility caused by a unit change in the corresponding attribute. A positive coefficient means that respondents prefer higher levels of the attribute, and a negative coefficient means that they prefer lower levels. Another useful interpretation of the coefficients is the marginal value, i.e. how much money respondents are willing to forgo for a unit increase in a non-monetary attribute.

Denoting the coefficient of the monetary attribute as βm and the coefficient of a non-monetary attribute as βnm, the marginal value of the non-monetary attribute can be calculated as (Hanley, Wright and Alvarez-Farizo 2006):

$$MV = \frac{\beta_{nm}}{\beta_m}$$ (3.56)
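As a worked illustration of Equation 3.56, the sketch below divides each non-monetary coefficient by the monetary coefficient. The attribute names and coefficient values are hypothetical and are not estimates from this study. The ratio is computed exactly as written in Equation 3.56; applications in which the monetary attribute is a cost to the respondent often place a minus sign in front of it, which is not assumed here.

```python
def marginal_values(coefficients, monetary_name):
    """Marginal value of each non-monetary attribute (Equation 3.56)."""
    beta_m = coefficients[monetary_name]
    if beta_m == 0:
        raise ValueError("the monetary coefficient must be non-zero")
    return {
        name: beta_nm / beta_m
        for name, beta_nm in coefficients.items()
        if name != monetary_name
    }

# Hypothetical CLM coefficients (illustrative names and values only).
coefs = {"payment": 0.004, "contract_length": -0.12, "tree_cover": 0.30}
mv = marginal_values(coefs, "payment")
print(mv)
```

The result is expressed in units of the monetary attribute per unit of the non-monetary attribute, so the sign of each marginal value follows the sign of the corresponding coefficient when the monetary coefficient is positive.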

3.5.2 The Random Parameters Logit Model for Revealing the Heterogeneity in Respondents’ Preferences

Despite its usefulness and wide application, the classic CLM has limitations. Firstly, the CLM is restricted by the IIA assumption, which does not always hold in real life. Secondly, the CLM assumes that the coefficients representing respondents’ preferences for the attributes are the same for all people, so it cannot account for heterogeneity in respondents’ preferences in choice experiment studies (Train 1998; Hanley, Wright and Alvarez-Farizo 2006; Ruto and Garrod 2009). The Random Parameters Logit Model (RPL) is an advanced model that overcomes these limitations by allowing the coefficients of the attributes to vary randomly over respondents according to a specified statistical distribution (Train 1998; McFadden and Train 2000; Greene and Hensher 2003; Hanley, Wright and Alvarez-Farizo 2006; Ruto and Garrod 2009; Hoyos 2010). The most commonly adopted distribution is the normal distribution, which is described by its mean and standard deviation. Therefore, instead of estimating one fixed coefficient for each attribute as in the CLM, the RPL estimates two coefficients for each attribute, i.e. the mean coefficient and the standard deviation coefficient, which together describe the distribution of respondents’ heterogeneous preferences for this attribute.
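One way to read the two RPL coefficients together is to ask what share of respondents would hold a negative coefficient for an attribute. Under the normal distribution assumed above, this share is the normal cumulative distribution function evaluated at zero. The sketch below is illustrative only; the function name share_below_zero and the example values are hypothetical.

```python
import math

def share_below_zero(mean, sd):
    """Share of respondents with a negative coefficient when the
    coefficient is normally distributed with the given mean and sd."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    z = (0.0 - mean) / sd
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical mean and standard deviation coefficients of one attribute.
share = share_below_zero(mean=1.0, sd=1.0)
print(share)
```

A mean coefficient that is large relative to the standard deviation implies that nearly all respondents share the same direction of preference, while a standard deviation comparable to the mean implies that a sizeable minority prefers the opposite direction.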

Following Train (1998), Equation 3.55 gives the probability of choosing alternative i when the coefficient vector β’ is assumed to be homogeneous across respondents. Adding the subscripts n and t to denote the nth respondent and the tth choice set respectively, Equation 3.55 can be rewritten as:

$$P_{nit} = \frac{\exp(\beta' X_{nit})}{\sum_{j \in C} \exp(\beta' X_{njt})}$$ (3.57)

The probability of the nth respondent’s sequence of choices across all the choice sets is the product of the individual choice probabilities:

$$S_n = \prod_t P_{nit}(\beta')$$ (3.58)

In the RPL, β’ is not fixed but follows a normal distribution whose parameters θ* are the mean and standard deviation. Denoting the probability density of the coefficients as f(β’|θ*), the probability of the choice sequence that accounts for respondents’ heterogeneous preferences is the integral of Equation 3.58 over all possible values of β’, weighted by the probability density:

$$P_n(\theta^*) = \int S_n(\beta') \, f(\beta' \mid \theta^*) \, d\beta'$$ (3.59)

The integral in Equation 3.59 has no closed form and cannot be evaluated analytically. Simulated maximum likelihood estimation can therefore be used to determine the parameters θ* of the coefficient distribution (Train 1998; Ruto and Garrod 2009). Specifically, a number of values of β’ are randomly drawn from the distribution with given parameters θ, and the probability of the nth respondent’s choice sequence is approximated by averaging the sequence probabilities (Equation 3.58) evaluated at these draws:

$$P'_n(\theta) = \frac{1}{R} \sum_{r=1}^{R} S_n(\beta^{r|\theta})$$ (3.60)

where R is the number of repetitions (draws) and βr|θ is the rth draw of β’ from the distribution with the given parameters θ. The simulated log-likelihood of the choice sequences of all respondents under the parameters θ is then:

$$SLL(\theta) = \sum_n \ln\left[P'_n(\theta)\right]$$ (3.61)

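Substituting Equations 3.57 and 3.60 into Equation 3.61 gives:

$$SLL(\theta) = \sum_n \ln\left[\frac{1}{R} \sum_{r=1}^{R} \prod_t \frac{\exp\left((\beta^{r|\theta})' X_{nit}\right)}{\sum_{j \in C} \exp\left((\beta^{r|\theta})' X_{njt}\right)}\right]$$ (3.62)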
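The simulated log-likelihood in Equation 3.62 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the mlogit implementation used in this study: the coefficients are assumed to be independent normal variables, the draws are plain pseudo-random draws with a fixed seed, and the data layout, function names and numerical values are all hypothetical.

```python
import math
import random

def logit_probability(beta, alternatives, chosen):
    """Probability of the chosen alternative in one choice set (Equation 3.57)."""
    utilities = [sum(b * x for b, x in zip(beta, attrs)) for attrs in alternatives]
    v_max = max(utilities)                      # shift for numerical stability
    weights = [math.exp(v - v_max) for v in utilities]
    return weights[chosen] / sum(weights)

def sequence_probability(beta, situations):
    """Probability of one respondent's whole choice sequence (Equation 3.58)."""
    prob = 1.0
    for alternatives, chosen in situations:
        prob *= logit_probability(beta, alternatives, chosen)
    return prob

def simulated_log_likelihood(means, sds, data, n_draws=200, seed=1):
    """Simulated log-likelihood (Equations 3.60 to 3.62).

    Assumes independent normal coefficients; the fixed seed keeps the
    underlying standard normal draws identical across evaluations.
    """
    rng = random.Random(seed)
    sll = 0.0
    for situations in data:                     # loop over respondents n
        total = 0.0
        for _ in range(n_draws):                # loop over draws r
            beta_r = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(means, sds)]
            total += sequence_probability(beta_r, situations)
        sll += math.log(total / n_draws)        # ln of the simulated probability
    return sll

# Hypothetical data: two respondents, two choice sets each.
# Each choice set is (attribute vectors of the alternatives, index chosen).
data = [
    [([[1.0, 10.0], [2.0, 30.0], [0.0, 0.0]], 0),
     ([[2.0, 20.0], [1.0, 5.0], [0.0, 0.0]], 1)],
    [([[1.0, 10.0], [2.0, 30.0], [0.0, 0.0]], 2),
     ([[2.0, 20.0], [1.0, 5.0], [0.0, 0.0]], 0)],
]
means = [0.8, -0.05]
sll_fixed = simulated_log_likelihood(means, [0.0, 0.0], data)
sll_random = simulated_log_likelihood(means, [0.5, 0.02], data)
print(sll_fixed, sll_random)
```

When every standard deviation is set to zero, all draws coincide with the mean and the simulated log-likelihood reduces to the ordinary conditional logit log-likelihood. In an actual estimation, an optimizer would search over the means and standard deviations for the values that maximize this function.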

Maximum likelihood estimation is applied to find the mean and standard deviation of the coefficient distribution, θ*, that maximize the simulated log-likelihood of respondents’ choice sequences. If the estimated standard deviation coefficient of an attribute is statistically significant, there is significant heterogeneity in respondents’ preferences for that attribute. In this study, the RPL was also estimated with the “mlogit” package for the statistical program R (Croissant 2013).

3.6 Summary

This chapter explains the research design, methods and models used in this study. Non-market valuation was used to develop PES schemes for the water protection of the middle route of the South-to-North Water Transfer Project from both the supply and the demand perspectives. On the demand (consumer) side, a Contingent Valuation survey was conducted in four cities along the middle route project to investigate urban residents’ willingness to pay higher water prices for water protection. The non-parametric model, the Single Bound Dichotomous Choice Model and the Double Bound Dichotomous Choice Model were applied to estimate respondents’ mean WTP. An integrated procedure combining automatic stepwise regression, best subset regression and manual adjustment was developed for model construction and refinement in the Contingent Valuation analysis.

On the supply (provider) side of PES, a Choice Experiments survey was conducted in seven villages in the water supply area (around the Danjiangkou Reservoir) to reveal farmer households’ preferences for the design of two water protection programs, namely the existing Sloping Land Conversion Program (SLCP) for reforestation and a hypothetical program for fertilizer reduction. In addition to the classic Conditional Logit Model, the advanced Random Parameters Logit Model was applied to further reveal the heterogeneity in farmer households’ preferences. Furthermore, auxiliary questions were asked in the choice experiments survey to investigate the effect of the SLCP on the livelihoods of the participant households. The results from the surveys and model estimation are discussed in detail in the following chapters.

In document Payments for ecosystem services of the middle route of the South-to-North Water Transfer Project in China (Page 92-97)