Chapter 3 Research Design and Methodology
3.5 Econometric Models for Choice Experiment
3.5.1 The Classic Conditional Logit Model
Choice experiments rest on two theories. The first is the characteristics theory of consumption, which holds that utility is derived from the characteristics (attributes) of goods and services (Lancaster 1966). The second is random utility theory, which holds that people choose so as to maximize the utility they derive from goods and services (McFadden 1974). The Conditional Logit Model (CLM) is the classic and most widely used model in choice experiment studies. It was developed by McFadden (1974), who was awarded the Nobel Memorial Prize in Economic Sciences in 2000 for his development of theory and methods for analyzing discrete choice. Suppose an individual is given a choice set (card) $C$ and asked to choose one of its alternatives. The utility of alternative $i$ is assumed to consist of a deterministic, observable component $V_i$ and a random, unobservable error component $\varepsilon_i$:
$$U_i = V_i + \varepsilon_i \tag{3.51}$$
When the individual compares alternative $i$ with any other alternative $j$ in the choice set $C$, he or she chooses alternative $i$ if and only if it yields greater utility. The probability of choosing alternative $i$ is therefore:
$$\Pr(i \mid C) = \Pr\left(V_i + \varepsilon_i > V_j + \varepsilon_j;\ j \neq i;\ \forall j \in C\right) \tag{3.52}$$
When the random error terms $\varepsilon_i$ and $\varepsilon_j$ are independently and identically distributed (IID) following the Gumbel (type I extreme value) distribution, the probability of choosing alternative $i$ is (McFadden 1974; Hanley, Wright and Adamowicz 1998; Louviere, Hensher and Swait 2000):
$$\Pr(i) = \frac{\exp(\mu V_i)}{\sum_{j \in C} \exp(\mu V_j)} \tag{3.53}$$
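where $\mu$ is a scale parameter. It is normalized to 1 because $\mu$ and the utility coefficients cannot be identified separately, and this normalization implies a constant error variance. The deterministic component $V_i$ is usually specified as a linear function of the attribute vector $X_i$ and a coefficient vector $\beta$ (the prime denotes transposition):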
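$$V_i = \beta' X_i \tag{3.54}$$
Substituting Equation 3.54 into Equation 3.53 with $\mu = 1$ gives:
$$\Pr(i) = \frac{\exp(\beta' X_i)}{\sum_{j \in C} \exp(\beta' X_j)} \tag{3.55}$$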
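As a numerical check on Equations 3.52 to 3.55, the Python sketch below computes the closed-form logit probabilities for one hypothetical choice card and compares them with choice shares obtained by simulating random utilities with IID Gumbel errors. The attribute values, coefficients and function names are invented for illustration; the `mu` argument stands for the scale parameter $\mu$ of Equation 3.53.

```python
import numpy as np

def logit_probabilities(beta, X, mu=1.0):
    """Closed-form choice probabilities (Eq. 3.53, or Eq. 3.55 when mu = 1)."""
    v = mu * (X @ beta)        # scaled deterministic utilities mu * beta'X_j
    v = v - v.max()            # shift for numerical stability; probabilities unchanged
    expv = np.exp(v)
    return expv / expv.sum()

def simulated_shares(beta, X, n_draws=200_000, seed=0):
    """Monte Carlo version of Eq. 3.52: add IID Gumbel errors, pick the largest utility."""
    rng = np.random.default_rng(seed)
    v = X @ beta
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_draws, len(v)))
    choices = np.argmax(v + eps, axis=1)     # index of the alternative with the largest U_i
    return np.bincount(choices, minlength=len(v)) / n_draws

# Hypothetical card: 3 alternatives (rows) x 2 attributes (columns); row 3 is an all-zero status quo
X = np.array([[1.0, 0.5],
              [0.5, 1.0],
              [0.0, 0.0]])
beta = np.array([0.8, -0.4])   # hypothetical coefficients, not estimates from this study

print(logit_probabilities(beta, X))   # closed form
print(simulated_shares(beta, X))      # simulated shares, close to the closed form
```

Because `logit_probabilities(beta, X, mu=2.0)` equals `logit_probabilities(2 * beta, X)`, only the product of $\mu$ and $\beta$ enters the probabilities, which is why the scale parameter has to be normalized.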
The CLM relies on the assumption of Independence of Irrelevant Alternatives (IIA): the ratio of the choice probabilities of any two alternatives is not affected by the introduction or removal of other alternatives. This follows from Equation 3.55, since $\Pr(i)/\Pr(k) = \exp(\beta'(X_i - X_k))$ involves only alternatives $i$ and $k$. Under this assumption, the coefficient vector $\beta$ in Equation 3.55 can be estimated with the conditional logit procedures available in statistical software packages. In this study, the conditional logit regression was carried out with the R package "mlogit" (Croissant 2013).
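The estimation step can be illustrated with a small maximum likelihood sketch. The study itself used the R package mlogit; the Python code below is not that package but a hypothetical stand-alone illustration. It simulates choices from known coefficients and recovers them by Newton-Raphson on the conditional logit log-likelihood, which is globally concave in $\beta$. The true coefficients, the sample size and the function names are assumptions for illustration.

```python
import numpy as np

def simulate_choices(beta, n_obs=5000, n_alt=3, seed=1):
    """Hypothetical data: attributes X with shape (n_obs, n_alt, n_attr) and chosen index y."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_obs, n_alt, len(beta)))
    eps = rng.gumbel(size=(n_obs, n_alt))          # IID Gumbel errors with scale mu = 1
    y = np.argmax(X @ beta + eps, axis=1)
    return X, y

def fit_conditional_logit(X, y, max_iter=50, tol=1e-10):
    """Maximize the CLM log-likelihood (Eq. 3.55) by Newton-Raphson; return beta and SEs."""
    n_obs, _, n_attr = X.shape
    beta = np.zeros(n_attr)
    for _ in range(max_iter):
        v = X @ beta
        v = v - v.max(axis=1, keepdims=True)
        p = np.exp(v)
        p = p / p.sum(axis=1, keepdims=True)                 # choice probabilities per card
        x_bar = np.einsum("na,nak->nk", p, X)                # probability-weighted attributes
        grad = (X[np.arange(n_obs), y] - x_bar).sum(axis=0)  # score vector
        dev = X - x_bar[:, None, :]
        hess = -np.einsum("na,nak,nal->kl", p, dev, dev)     # negative definite Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta - step                                   # Newton update
        if np.max(np.abs(step)) < tol:
            break
    std_err = np.sqrt(np.diag(np.linalg.inv(-hess)))         # from the last Hessian evaluated
    return beta, std_err

true_beta = np.array([0.8, -0.4])
X, y = simulate_choices(true_beta)
beta_hat, se = fit_conditional_logit(X, y)
print(beta_hat, se)
```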
The estimated CLM coefficients cannot be read directly as effects on the probability of choosing a particular alternative, because that probability also depends on the other alternatives in the choice set (the denominator in Equation 3.55). Instead, each coefficient represents the change in respondents' utility caused by a one-unit change in the corresponding attribute. A positive coefficient means that respondents prefer higher levels of that attribute, and a negative coefficient means the opposite. The coefficients can also be interpreted as marginal values, that is, how much respondents are willing to forgo for a one-unit increase in a non-monetary attribute.
Denoting the coefficient of the monetary attribute by $\beta_m$ and the coefficient of a non-monetary attribute by $\beta_{nm}$, the marginal value of that non-monetary attribute can be calculated as (Hanley, Wright and Alvarez-Farizo 2006):
$$MV = \frac{\beta_{nm}}{\beta_m} \tag{3.56}$$
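Equation 3.56 is a simple ratio of coefficients. The sketch below applies it to hypothetical coefficient values (not results of this study) and assumes that the monetary attribute is a payment received by the respondent, so that $\beta_m > 0$. Under that assumption, a negative marginal value means respondents would need to be paid, rather than be willing to forgo money, to accept a one-unit increase in the attribute. The attribute names are placeholders.

```python
def marginal_values(coefs, monetary):
    """Eq. 3.56: MV = beta_nm / beta_m for every non-monetary attribute in `coefs`."""
    beta_m = coefs[monetary]
    if beta_m == 0:
        raise ValueError("monetary coefficient must be non-zero")
    return {name: b / beta_m for name, b in coefs.items() if name != monetary}

# Hypothetical CLM coefficients; "payment" is assumed to be the monetary attribute (beta_m > 0)
coefs = {"payment": 0.02, "program_years": -0.15, "fertilizer_cut": 0.10}

for name, mv in marginal_values(coefs, "payment").items():
    print(f"{name}: {mv:.2f}")
# program_years: -7.50
# fertilizer_cut: 5.00
```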
3.5.2 The Random Parameters Logit Model for Revealing the Heterogeneity in Respondents' Preferences
Despite its usefulness and wide application, the classic CLM has limitations. First, it is restricted by the IIA assumption, which does not always hold in practice. Second, it assumes that the coefficients representing respondents' preferences for the attributes are the same for everyone, so it cannot account for heterogeneity in respondents' preferences (Train 1998; Hanley, Wright and Alvarez-Farizo 2006; Ruto and Garrod 2009). The Random Parameters Logit Model (RPL) overcomes these limitations by allowing the coefficients of the attributes to vary randomly across respondents according to specified statistical distributions (Train 1998; McFadden and Train 2000; Greene and Hensher 2003; Hanley, Wright and Alvarez-Farizo 2006; Ruto and Garrod 2009; Hoyos 2010). The most commonly adopted distribution is the normal distribution, which is described by its mean and standard deviation. Therefore, instead of estimating one fixed coefficient for each attribute as the CLM does, the RPL estimates two parameters for each attribute specified as random: a mean and a standard deviation, which together describe the distribution of respondents' heterogeneous preferences for that attribute.
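The claim that the RPL relaxes IIA can be checked numerically. In the sketch below, which uses a single hypothetical attribute and invented parameter values, a third alternative identical to alternative 1 is added to the choice set. The ratio $\Pr(1)/\Pr(2)$ stays the same under the CLM but changes under an RPL whose coefficient is normally distributed. The RPL probabilities are averaged over the normal density by Gauss-Hermite quadrature rather than by simulation, purely to keep the example deterministic.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Nodes/weights for weight exp(-z^2/2); normalized so that weights @ g(nodes) ~ E[g(z)], z ~ N(0, 1)
nodes, weights = hermegauss(60)
weights = weights / weights.sum()

def clm_probs(beta, x):
    """CLM probabilities with one fixed coefficient and one attribute value per alternative."""
    e = np.exp(beta * x)
    return e / e.sum()

def rpl_probs(mean, sd, x):
    """RPL probabilities: logit probabilities averaged over beta ~ N(mean, sd^2)."""
    betas = mean + sd * nodes                  # quadrature points for beta
    e = np.exp(np.outer(betas, x))             # (n_nodes, n_alt)
    p = e / e.sum(axis=1, keepdims=True)
    return weights @ p

def ratio(p):
    return p[0] / p[1]

x_two = np.array([1.0, 0.0])          # alternatives 1 and 2
x_three = np.array([1.0, 0.0, 1.0])   # alternative 3 added, identical to alternative 1

print("CLM:", ratio(clm_probs(1.0, x_two)), ratio(clm_probs(1.0, x_three)))
print("RPL:", ratio(rpl_probs(1.0, 2.0, x_two)), ratio(rpl_probs(1.0, 2.0, x_three)))
```

Under the RPL the ratio falls when the duplicate is added: the new alternative takes proportionally more share from its twin (alternative 1) than from alternative 2, a substitution pattern the CLM cannot represent.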
Following Train (1998), note that Equation 3.55 gives the probability of choosing alternative $i$ when the coefficient vector $\beta$ is assumed to be homogeneous across respondents. Adding the subscripts $n$ and $t$ to denote the $n$th respondent and the $t$th choice set respectively, Equation 3.55 can be rewritten as:
$$P_{nit} = \frac{\exp(\beta' X_{nit})}{\sum_{j \in C} \exp(\beta' X_{njt})} \tag{3.57}$$
The probability of the $n$th respondent's sequence of choices across all choice sets is the product of the individual choice probabilities, where $i$ denotes the alternative chosen in choice set $t$:
$$P_n(\beta) = \prod_{t} P_{nit}(\beta) \tag{3.58}$$
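Equations 3.57 and 3.58 translate directly into code. The sketch below uses hypothetical array shapes (T choice sets, J alternatives, K attributes) and made-up data to compute one respondent's probability of the observed choice sequence for a given coefficient vector. It accumulates on the log scale, because a product of many probabilities below 1 can become very small.

```python
import numpy as np

def log_choice_probs(beta, X_t):
    """Log of Eq. 3.57 for one choice set; X_t has shape (J, K)."""
    v = X_t @ beta
    v_max = v.max()
    return v - (v_max + np.log(np.exp(v - v_max).sum()))   # log-sum-exp denominator

def sequence_probability(beta, X_n, chosen):
    """Eq. 3.58: product over sets t of P_nit(beta), with i = chosen[t]."""
    log_p = sum(log_choice_probs(beta, X_t)[i] for X_t, i in zip(X_n, chosen))
    return np.exp(log_p)

# Hypothetical respondent: T = 4 cards, J = 3 alternatives, K = 2 attributes
rng = np.random.default_rng(3)
X_n = rng.normal(size=(4, 3, 2))
chosen = np.array([0, 2, 1, 0])     # index of the alternative chosen on each card
print(sequence_probability(np.array([0.8, -0.4]), X_n, chosen))
```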
In the RPL, $\beta$ is not fixed but follows a normal distribution whose parameters, the mean and standard deviation, are denoted $\theta^*$. Denoting the probability density of the coefficients by $f(\beta \mid \theta^*)$, the probability of the choice sequence that accounts for respondents' heterogeneous preferences is the integral of Equation 3.58 over all possible values of $\beta$, weighted by this density:
$$P_n(\theta^*) = \int P_n(\beta)\, f(\beta \mid \theta^*)\, d\beta \tag{3.59}$$
The integral in Equation 3.59 has no closed form and therefore cannot be calculated analytically. Instead, simulated maximum likelihood can be used to estimate the distribution parameters $\theta^*$ (Train 1998; Ruto and Garrod 2009). Specifically, a number of values of $\beta$ are drawn at random from the distribution with given parameters $\theta$, and the probability of the $n$th respondent's choice sequence is approximated by the average of the resulting simulated probabilities:
$$P'_n(\theta) = \frac{1}{R} \sum_{r=1}^{R} P_n(\beta_r \mid \theta) \tag{3.60}$$
where $R$ is the number of repetitions (draws) and $\beta_r \mid \theta$ denotes the $r$th draw of $\beta$ from the distribution with parameters $\theta$. The simulated log-likelihood (SLL) of the choice sequences of all respondents under $\theta$ is then:
$$SLL(\theta) = \sum_{n} \ln\left[P'_n(\theta)\right] \tag{3.61}$$
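Substituting Equations 3.57, 3.58 and 3.60 into Equation 3.61 gives:
$$SLL(\theta) = \sum_{n} \ln\left[\frac{1}{R}\sum_{r=1}^{R}\prod_{t}\frac{\exp(\beta_r' X_{nit})}{\sum_{j \in C}\exp(\beta_r' X_{njt})}\right] \tag{3.62}$$
where $\beta_r$ is the $r$th draw from the distribution with parameters $\theta$.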
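Equations 3.59 to 3.62 can be checked against each other numerically. The sketch below assumes a single attribute with a normally distributed coefficient and invented data (choices drawn at random rather than from a model). It evaluates the simulated log-likelihood of Equation 3.62 with R draws per respondent and compares it with the same log-likelihood computed by Gauss-Hermite quadrature, which is feasible here only because the integral in Equation 3.59 is one-dimensional. Function names and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def sequence_probs(betas, X_n, chosen):
    """P_n(beta_r) (Eqs. 3.57-3.58) for many draws: betas (R, K), X_n (T, J, K) -> (R,)."""
    v = np.einsum("tjk,rk->rtj", X_n, betas)            # utilities per draw, set, alternative
    v = v - v.max(axis=2, keepdims=True)
    p = np.exp(v) / np.exp(v).sum(axis=2, keepdims=True)
    p_chosen = p[:, np.arange(len(chosen)), chosen]     # (R, T): probabilities of the chosen alternatives
    return p_chosen.prod(axis=1)

def simulated_log_likelihood(mean, sd, data, n_draws=5000, seed=0):
    """Eq. 3.62 with independent normal coefficients beta_k ~ N(mean_k, sd_k^2)."""
    rng = np.random.default_rng(seed)
    sll = 0.0
    for X_n, chosen in data:
        betas = mean + sd * rng.standard_normal((n_draws, len(mean)))   # draws beta_r | theta
        sll += np.log(sequence_probs(betas, X_n, chosen).mean())        # ln P'_n(theta), Eq. 3.60
    return sll

def quadrature_log_likelihood(mean, sd, data, n_nodes=40):
    """Same quantity with the integral of Eq. 3.59 done by Gauss-Hermite (one attribute only)."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()                   # expectation over N(0, 1)
    betas = (mean + sd * nodes)[:, None]                # (n_nodes, 1)
    return sum(np.log(weights @ sequence_probs(betas, X_n, chosen)) for X_n, chosen in data)

# Hypothetical data: 50 respondents, 4 cards each, 3 alternatives, 1 attribute
rng = np.random.default_rng(42)
data = [(rng.normal(size=(4, 3, 1)), rng.integers(0, 3, size=4)) for _ in range(50)]

mean, sd = np.array([0.5]), np.array([0.5])
print(simulated_log_likelihood(mean, sd, data))
print(quadrature_log_likelihood(mean, sd, data))
```

Because the logarithm is concave, the log of an unbiased simulated average is biased downward for finite $R$; the bias shrinks as the number of draws grows.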
Maximum likelihood estimation is then applied to find the means and standard deviations of the coefficient distribution, $\theta^*$, that maximize the simulated log-likelihood of respondents' choice sequences. If the estimated standard deviation for an attribute is statistically significant, respondents' preferences for that attribute are significantly heterogeneous. In this study, the RPL was also estimated with the R package "mlogit" (Croissant 2013).
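The full simulated maximum likelihood procedure can be sketched end to end. The Python code below is a hypothetical illustration, not the mlogit implementation used in this study. It simulates a panel in which each respondent keeps one normally distributed coefficient across all choice sets, holds the standard-normal draws fixed during optimization (so the simulated log-likelihood is a smooth function of $\theta$), and maximizes Equation 3.62 with SciPy's general-purpose `minimize`. Optimizing the standard deviation on the log scale to keep it positive is a choice of this sketch; the sample sizes, $R$ and the true parameter values are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def simulate_panel(mean, sd, n_resp=500, n_sets=8, n_alt=3, seed=7):
    """Hypothetical panel with one attribute: each respondent keeps beta_n ~ N(mean, sd^2)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_resp, n_sets, n_alt))
    beta_n = mean + sd * rng.standard_normal(n_resp)
    eps = rng.gumbel(size=(n_resp, n_sets, n_alt))
    y = np.argmax(beta_n[:, None, None] * X + eps, axis=2)   # chosen alternative per set
    return X, y

def negative_sll(params, X, y, z):
    """Minus Eq. 3.62; params = (mean, log sd); z holds fixed N(0, 1) draws, shape (n_resp, R)."""
    mean, sd = params[0], np.exp(params[1])
    betas = mean + sd * z                                     # beta_r | theta, (n_resp, R)
    v = betas[:, :, None, None] * X[:, None, :, :]            # (n_resp, R, T, J)
    x_chosen = np.take_along_axis(X, y[:, :, None], axis=2)[:, :, 0]          # (n_resp, T)
    log_p = betas[:, :, None] * x_chosen[:, None, :] - logsumexp(v, axis=3)   # ln P_nit(beta_r)
    log_seq = log_p.sum(axis=2)                               # ln P_n(beta_r), (n_resp, R)
    log_avg = logsumexp(log_seq, axis=1) - np.log(z.shape[1]) # ln P'_n(theta), Eq. 3.60
    return -log_avg.sum()

true_mean, true_sd = 1.0, 1.0
X, y = simulate_panel(true_mean, true_sd)
z = np.random.default_rng(123).standard_normal((X.shape[0], 100))   # R = 100 fixed draws
res = minimize(negative_sll, x0=np.array([0.0, np.log(0.5)]), args=(X, y, z), method="BFGS")
mean_hat, sd_hat = res.x[0], np.exp(res.x[1])
print(f"mean = {mean_hat:.3f}, sd = {sd_hat:.3f}")

# Rough standard error of sd from BFGS's inverse-Hessian approximation (delta method);
# a numerically differentiated Hessian would be more reliable for reporting.
se_sd = sd_hat * np.sqrt(res.hess_inv[1, 1])
print(f"t(sd) = {sd_hat / se_sd:.1f}")
```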
3.6 Summary
This chapter has described the research design, methods and models used in this study. Non-market valuation was used to develop PES schemes for water protection in the middle route of the South-to-North Water Transfer Project from both the supply and the demand perspectives. On the demand (consumer) side, a Contingent Valuation survey was conducted in four cities along the middle route to investigate urban residents' willingness to pay higher water prices for water protection. A non-parametric model, the Single Bound Dichotomous Choice Model and the Double Bound Dichotomous Choice Model were applied to estimate respondents' mean WTP. For model construction and refinement in the Contingent Valuation analysis, an integrated procedure was designed that combines automatic stepwise regression, best subset regression and manual adjustment.
On the supply (provider) side of PES, a Choice Experiment survey was conducted in seven villages in the water supply area around the Danjiangkou Reservoir to reveal farmer households' preferences for the design of two water protection programs: the existing Sloping Land Conversion Program (SLCP) for reforestation and a hypothetical program for fertilizer reduction. In addition to the classic Conditional Logit Model, the more advanced Random Parameters Logit Model was applied to reveal heterogeneity in farmer households' preferences. Auxiliary questions in the Choice Experiment survey also investigated the effect of the SLCP on the livelihoods of participating households. The results of the surveys and model estimation are discussed in detail in the following chapters.