Stated preference method: discrete choice experiment

2.5 Valuing benefits

2.5.2 Stated preference method: discrete choice experiment

SP methods involve analysing expected behaviour, rather than observed behaviour

(Lancsar and Louviere 2008). These valuations are elicited through surveys or

experiments where respondents are asked to give values directly, or infer them

indirectly. There are a number of SP techniques that can be used to elicit values, such

as time trade-off, person trade-off, standard gamble, the CVM, and DCE. Unlike the

RP approach, which relies on the presence of a market demand curve, the SP technique

is based on an income-compensated demand curve (McIntosh 2010b; Sugden and

Williams 1978). This affords true measures of welfare change through the CV and EV

of welfare economics; the RP approach relies on CS to measure welfare change

(McIntosh 2010b; Sugden and Williams 1978).

However, the validity of SP data is sometimes questioned. It relies on the assumption

that the choices stated by respondents in a survey actually reflect the choices that

they would make in the real world. The behaviour of individuals may vary

considerably from a hypothetical scenario to a real scenario where they are forced to

choose between competing alternatives (Cookson 2003). SP methodology is further

questioned on its relevance to economic analysis as its theoretical basis is

psychological. For example, the Austrian school of economic thought maintains that

the only source of information relevant to economic analysis is provided by the

market.

As noted, there are a number of SP techniques that can be used to elicit values. A

The DCE, or some variant of the choice experiment, has been widely used across

various academic disciplines since its inception in the 1960s. However, due to varying

techniques, there is considerable ambiguity around the name, purpose, and theoretical

foundations of the experiment. In the 1960s choice experiments were introduced by

psychologists as a means of understanding ordered relations on sets, resulting in the

theory of Conjoint Measurement (Luce and Suppes 1965). Psychologists believed that

people could reliably and validly rank a set of multi-attribute scenarios, also known as

profiles, where additive, multiplicative or dual-distributive models could algebraically represent what psychologists term ‘decision rules’, and what economists term ‘indirect utility functions’ (Louviere and Lancsar 2009). However, using specific algebraic forms to examine ordered relations on sets carries significant implications for the precision

and understanding of SP data. Principally, it fails to capture whether humans differ in

monotonic transformations, and whether these differences are generalisable (Louviere

and Lancsar 2009).

The choice experiment was adopted by the marketing literature in the 1970s and

became known as conjoint analysis (CA) (Green and Rao 1971). In CA, respondents

are presented with a series of competing scenarios and asked to rank or rate

alternatives. To aid respondents’ decision-making, the competing alternatives are described in terms of attributes where each attribute is made up of different levels

(Louviere and Lancsar 2009; Ryan 1996). In the 1970s and 1980s the number of

published studies using CA increased considerably with an estimated 400 marketing

studies published each year in the early 1980s (Wittink and Cattin 1989). CA was

commonly associated with two behavioural paradigms during this period: Social

1981). However, the choice experiment was originally devised on the axiomatic theory

of Conjoint Measurement, which concerns the behaviour of numbers rather than

humans or, more precisely, orders on sets rather than human preferences (Louviere and

Lancsar 2009). It maintained that the rankings provided by humans represented

utilities, elicited through additive, multiplicative, and dual-distributive algebraic

models (Louviere and Lancsar 2009). Implied preferences are represented “as if”

humans simply added or multiplied, for example, attribute levels to assign a ranking

to each profile, or combination of attribute levels (Louviere et al 2010). As a result,

the origins of CA stem from statistical theory, rather than behavioural theory, relying

on mathematical techniques to represent ordered relations on sets in response to

systematic, factorial manipulations of factor levels (attribute levels), known as

“factorial designs” (Louviere et al 2010).

Because CA has little to do with human preferences, it is argued that it is unsuitable

for use as a stated preference technique in applied economics (Louviere et al 2010).

For instance, it is largely inconsistent with demand theory in that it ignores the

traditional constraints which underlie the economic concept, such as budget

constraints (Louviere et al 2010). It can ask respondents in a survey to rank or rate

alternatives that they simply could not afford, rendering the results meaningless. In

addition, CA has various logical limitations which distance it from practical

application in economic evaluation. The axioms of Conjoint Measurement are only

loosely related to utility theory, as illustrated above – CA reveals nothing about human

preferences. Further, there is no error theory, statistical or otherwise, associated with

2010). Therefore, it is unable to test differences in statistical models; the error

component is merely an afterthought.

The development of alternative choice techniques emerged in the economic literature

in the 1980s. The discrete choice experiment, as it is known in its current construct,

was pioneered by Louviere and Woodworth (1983) using experimental design theory.

For the first time, the authors constructed profiles, or choice sets, that respondents

could choose between, rather than rank or rate as in CA. They found that the

experimental design elicited choices that were consistent with conditional logit models

(Louviere and Woodworth 1983). However, the study was interpreted and labelled as

another form of CA, termed ‘choice-based conjoint analysis’. It is this description that has caused much confusion around the name, purpose, and theoretical foundations of

the DCE. The DCE differs from CA for one major reason: the DCE is founded in

economic theory, while CA is rooted in statistical theory.

2.5.2.1 Theoretical foundations of the DCE

There are various theoretical underpinnings of the DCE. First and foremost, it is

strongly rooted in the standard economic theory of consumer behaviour, which

assumes that individuals are rational decision makers who consistently seek to

maximise a set of stable preferences (Amaya-Amaya et al 2008; Lancsar and Louviere

2008). The DCE assumes that respondents, when faced with a choice of competing

alternatives, or bundles, assign preferences to each alternative and choose the bundle

that satisfies a set of innate preferences. However, the neoclassical consumer evaluates

the optimisation problem, or discrete choice, according to the maximum utility that

constraints. As a result, the traditional theory of consumer behaviour has three key

extensions to the DCE (Amaya-Amaya et al 2008).

In the first instance, discrete choice theory assumes that utility is derived from the

various attributes of a good, rather than of the good in its entirety; traditional consumer

theory assumes that utility is a function of quantities (Amaya-Amaya et al 2008). This

idea of utility, or demand, is drawn from Lancaster’s (1966) economic theory of value. Lancaster (1966) argued that consumers are not influenced by the good itself, but by

the properties or attributes that embody the good. Any change in the attributes of the

good may cause individuals to switch from one bundle of goods to another,

representing a shift towards the bundle of goods with the most beneficial combination

of attributes. Secondly, the optimisation problem in a discrete choice experiment

subjects respondents to a set of finite and mutually exclusive alternatives (Amaya-

Amaya et al 2008). Participants in a DCE are subsequently restricted by an additional

constraint, whereas with traditional consumer theory, individuals are faced with

infinite choices, bound only by time and budget constraints. Lastly, discrete choice

theory assumes that the behaviour of individuals is probabilistic, rather than

deterministic as assumed by traditional theory (Amaya-Amaya et al 2008; Louviere et

al 2010). Probabilistic discrete choice modelling derives from Thurstone’s (1927)

theory of paired comparisons, and is called random utility theory (RUT). RUT is a

comprehensive behavioural theory which provides an explanation of the behaviour of

2.5.2.2 Random utility theory

Random utility theory (RUT) was proposed by Thurstone (1927), introduced into

economics by Marschak (1960) and later refined by McFadden (1974) and Manski (1977). RUT proposes that individuals have a latent construct of (indirect) “utilities” in their heads that are unobservable to researchers (Amaya-Amaya et al 2008;

Louviere et al 2010). Because these utilities cannot be “seen” by researchers, RUT

assumes that they can be decomposed into two additively separable components: a

systematic (explainable) component and a random (unexplainable) component. The

systematic component represents the attributes that explain differences in choice

alternatives and covariates that explain differences in individuals’ choices (Amaya-Amaya et al 2008; Louviere et al 2010). The random component captures the variation

in preferences that are unidentified. This can include differences in choices arising

from individual differences in utility, rather than choice options per se, measurement

errors, or specification errors, along with unobserved or unobservable attributes

(Amaya-Amaya et al 2008; Louviere et al 2010; Viney et al 2002). The latent utility

can, more formally, be described as:

𝑈_𝑖𝑛= 𝑉_𝑖𝑛+ 𝜀_𝑖𝑛 (2.4)

where 𝑈_𝑖𝑛 represents the latent, unobservable utility associated with choice alternative 𝑖 for individual 𝑛, 𝑉_𝑖𝑛 is the systematic component of the utility of choice 𝑖 for individual 𝑛, and 𝜀_𝑖𝑛 is the random component associated with choice 𝑖 and individual 𝑛.

Because RUT is probabilistic, it assumes that individual 𝑛 will choose alternative 𝑖 if, and only if, its utility is higher than any other option amongst all 𝐽 alternatives in the choice set 𝐶_𝑛:

$$y_{in} = f(U_{in}) = \begin{cases} 1 & \text{if } U_{in} = \max_{j \in C_n} \{U_{jn}\} \\ 0 & \text{otherwise} \end{cases} \tag{2.5}$$

where 𝑦_𝑖𝑛 represents the choice indicator equal to 1 if individual 𝑛 chooses alternative 𝑖, or 0 otherwise. According to Eqn. 2.4, alternative 𝑖 is chosen by individual 𝑛 if, and only if:

$$(V_{in} + \varepsilon_{in}) > (V_{jn} + \varepsilon_{jn}) \quad \forall j \neq i \in C_n \tag{2.6}$$

This can be rearranged to place the unobservable and observable components together:

$$(V_{in} - V_{jn}) > (\varepsilon_{jn} - \varepsilon_{in}) \tag{2.7}$$

Because (𝜀_𝑗𝑛 − 𝜀_𝑖𝑛) is unobservable, it is not possible to determine exactly whether (𝑉_𝑖𝑛 − 𝑉_𝑗𝑛) > (𝜀_𝑗𝑛 − 𝜀_𝑖𝑛). As a result, it is only possible to estimate choice outcomes up to a probability of occurrence (McFadden 1974). For instance, the probability that

individual 𝑛 will choose 𝑖 is the same as the probability that the difference between the error components is less than the difference in the observable component between the chosen alternative 𝑖 and any other alternative 𝑗, amongst all 𝐽 alternatives in the subset 𝐶_𝑛:

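$$\begin{aligned} P_{in} &= \Pr(y_{in} = 1 \mid X_{in}, \beta) \\ &= \Pr(U_{in} > U_{jn}) \quad \forall j \neq i \in C_n \\ &= \Pr(\varepsilon_{jn} - \varepsilon_{in} < V_{in} - V_{jn}) \quad \forall j \neq i \in C_n \end{aligned} \tag{2.8}$$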
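
The probabilistic choice rule in Eqns. 2.4 to 2.8 can be illustrated with a small simulation. The following is a minimal sketch only: the systematic utilities, the function name `simulate_choice_shares`, and the use of independent standard normal draws for the random component are all assumptions made for illustration, not part of the analysis in this thesis.

```python
import random


def simulate_choice_shares(v, n_draws=20000, seed=0):
    """Monte Carlo estimate of choice probabilities under U = V + epsilon.

    v: list of systematic utilities, one per alternative in the choice set.
    The error distribution (independent standard normal) is an illustrative
    assumption; other distributions give other choice models.
    """
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        # Latent utility of each alternative: systematic part plus a random draw.
        utilities = [v_i + rng.gauss(0.0, 1.0) for v_i in v]
        # The alternative with the highest latent utility is chosen.
        chosen = max(range(len(v)), key=utilities.__getitem__)
        counts[chosen] += 1
    return [count / n_draws for count in counts]


# Hypothetical systematic utilities for three alternatives.
systematic = [1.0, 0.5, 0.0]
shares = simulate_choice_shares(systematic)
print(shares)
```

Although the first alternative has the highest systematic utility, it is not chosen in every draw: the random component means that choices can only be predicted up to a probability.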

Again, it is not possible to observe (𝜀_𝑗𝑛 − 𝜀_𝑖𝑛) across the population. RUT therefore assumes a probability distribution for the random component, so that the choice

probability follows from the selected distribution or density function

(McFadden 1974). There are various probability distributions that

can be applied for 𝜀_𝑖𝑛, resulting in families of probabilistic discrete choice models, such as binary or multiple discrete choice models (Amaya-Amaya et al 2008;

Louviere et al 2010). The exact probabilistic discrete choice model to apply in a DCE

depends on the assumptions about the probability distributions for 𝜀_𝑖𝑛. For instance, Thurstone (1927) assumed that the random components were distributed as non-

independent and non-identical normal random variates. But this assumption restricts

RUT in that it can only be applied in dichotomous discrete choice models, or binary

choice models, rather than multiple discrete choice models. McFadden (1974) later

developed RUT and assumed that the random components were distributed

independently and identically with a Gumbel distribution (extreme value type 1). This

assumption gave rise to the standard conditional logit (CL) model, or multinomial logit

(MNL) model.
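
Under the Gumbel assumption, with the scale parameter normalised to one, the choice probability in Eqn. 2.8 takes the well-known closed form

$$P_{in} = \frac{\exp(V_{in})}{\sum_{j \in C_n} \exp(V_{jn})}$$

A minimal sketch of this calculation is given below. The function name and the example utilities are hypothetical and serve only to illustrate the formula.

```python
import math


def conditional_logit_probs(v):
    """Closed-form conditional logit choice probabilities.

    v: list of systematic utilities, one per alternative in the choice set.
    Assumes i.i.d. Gumbel (extreme value type 1) errors with unit scale.
    """
    # Subtracting the maximum leaves the probabilities unchanged
    # but avoids numerical overflow in exp().
    v_max = max(v)
    exp_v = [math.exp(v_i - v_max) for v_i in v]
    denom = sum(exp_v)
    return [e / denom for e in exp_v]


# Hypothetical systematic utilities for three alternatives.
probs = conditional_logit_probs([1.0, 0.5, 0.0])
print(probs)
```

Only differences in systematic utility matter: adding the same constant to every alternative leaves the probabilities unchanged.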

The CL model is the easiest and most widely used probabilistic discrete choice model.

However, it has received considerable criticism because it relies heavily on restrictive

assumptions, such as random taste variation: the CL model can account for systematic

(observed) heterogeneity across observed characteristics, such as income or education,

remain random, regardless of observed characteristics (Amaya-Amaya et al 2008). As

a result, various new probabilistic discrete choice models have been proposed to better

represent human behaviour in choice models, including heteroscedastic models and

random parameters or mixed logit models (Amaya-Amaya et al 2008). The

appropriate model to use depends on a range of issues; most notably, the design of the

study. This is discussed in the following section.

2.5.2.3 From theory to practice: how to conduct a DCE

There are various stages involved in designing, analysing, and interpreting a DCE.

There are also various best-practice guidelines published on how to design and

construct a DCE. Ryan (1996) outlined five succinct stages: identification of

attributes; identification of levels; experimental design; data collection; and data

analysis. Adamowicz et al (1998) further refined the key steps to include questionnaire

development, sample sizing, and computerised support. The

design model proposed by Ryan (1996) is typically used in the health economics

literature, although this thesis draws on the most recent published guidelines on

conducting DCEs, proposed by Lancsar and Louviere (2008), as these guidelines are

expansive and particularly useful to novice practitioners. The design model proposed

by Ryan (1996) is included in these updated guidelines and expanded upon. Lancsar

and Louviere (2008) argue that there are three main components that need to be

considered when undertaking a DCE: experimental design, discrete choice analysis,

and welfare measures and other policy analyses. These are described separately below.

2.5.2.4 Experimental design: choice survey and data

A DCE requires specific consideration of the choice format, framing of the choice set,

and relevance of the choice set to respondents to ensure proper design and

implementation (Lancsar and Louviere 2008). Carson et al (2000) note that true

preferences are most accurately revealed when the discrete choice is incentive

compatible. There are different choice formats that can be applied

in a DCE, such as binary choice (yes/no), dichotomous choice (two alternatives), or

multiple choice (three or more alternatives). The appropriate format may depend on

the relevance of the goods to the respondent. Often in health care applications, the

choice between two comparators is hypothetical. In such cases, dichotomous choice

questions are inappropriate as participants are forced to reveal a preference for a

consumption bundle that they may never choose in practice. As a result, it is often

fitting to include an opt-out option, ‘choose neither’ option, or status-quo option.

It is also important to consider the impact that labelled (e.g. paracetamol, Nurofen)

choice sets might have on revealed preferences. A generic description (e.g. drug A,

drug B) might be better suited where respondents have already experienced one of the

consumption bundles as this reduces the potential bias against any comparator. This is

known as status-quo bias (Ryan and Ubach 2003), or the endowment effect (Thaler

1980).6 A related decision, which is particularly important in health care applications,

is how much information should be provided to ensure that respondents are well

informed and are less influenced by previous experience/knowledge or assumptions

acquired from elsewhere.7

6 The endowment effect is a behavioural economic concept that refutes the standard economic theory of the consumer (Thaler 1980). It asserts that initial endowments of wealth alter individuals’ preferences. It is typically used to explain the observed disparity between willingness to pay and willingness to accept, although it has recently been extended to explain how initial endowments of experience can influence preferences in the same way (Ryan and Ubach 2003; Salkeld et al 2000).

Identifying appropriate attributes and levels

The selection of appropriate attributes and attribute levels is crucial in a DCE to

capture the systematic component of the utility function. There are various methods

that can be used to obtain attributes, such as literature reviews, focus groups,

interviews with relevant personnel (including patients, policy makers, and experts),

and patient surveys (Coast et al 2011; Lancsar and Louviere 2008; Ryan and Hughes

1997).8 There are two important issues that need to be considered when formulating

attributes and levels (Ryan 1996). First, attributes must be meaningful and important

to respondents, while also addressing issues relevant to policy makers. Moreover, they must be ‘plausible’, and respondents should be able to trade across different attribute levels (Ryan 1996). Second, attributes must be comprehensive and measurable,

whether they are quantitative (e.g. distance to hospital) or qualitative (e.g. type of

provider). In order to be comprehensive, attributes must be meaningful to respondents.

Further, attribute levels must clearly depict the attractiveness of the associated

attributes (Ryan 1996). To be measurable, it must be reasonable to obtain a probability

distribution for an attribute that covers the range of selected attribute levels. It must also be reasonable to identify respondents’ preferences using a utility function (Ryan 1996).

7 For information on the development of the choice process and format within this analysis, see Chapter 4, section 4.2.1.1.

8 This analysis uses focus groups to develop attributes. The focus groups are presented in Chapter 3.

The number of attributes to use in a DCE is context specific. The general consensus is

that it should not exceed eight; however, some studies have included as many as 12

attributes (Hall et al 2006; Lancsar and Louviere 2008). By including a large number

of attributes in a DCE, the complexity of the experimental design is amplified (Lancsar

and Louviere 2008). It is necessary that specific consideration is given to relevant and

irrelevant attributes. A relevant attribute is one whose exclusion would affect the

conclusions of the DCE, while an irrelevant attribute is one whose exclusion

would not affect the results of the DCE. It is not always easy to distinguish relevant

from irrelevant attributes, especially if attributes that are currently demand-irrelevant

become relevant due to an increase in consumer knowledge, for example (Bennett and

Blamey 2001). Exclusion of relevant attributes may result in biased estimates, but this

needs to be weighed against an increase in task complexity arising from the inclusion

of more attributes (Lancsar and Louviere 2008). Lancsar and Louviere (2008) note

that in order to get the balance right, iterative piloting may be necessary.
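
To see how quickly design complexity grows with the number of attributes and levels, the sketch below enumerates every possible profile (a full factorial) for a small set of attributes. The attributes and levels shown are hypothetical and are not those used in this analysis.

```python
from itertools import product

# Hypothetical attributes and levels, for illustration only.
attributes = {
    "distance_km": [5, 20, 50],
    "provider": ["midwife", "obstetrician"],
    "cost": [0, 100, 250, 500],
}

# Every combination of attribute levels defines one profile.
profiles = [
    dict(zip(attributes, combo)) for combo in product(*attributes.values())
]

n_profiles = len(profiles)                     # 3 * 2 * 4 levels
n_pairs = n_profiles * (n_profiles - 1) // 2   # distinct two-alternative choice sets

print(n_profiles, n_pairs)
```

Even three attributes produce 24 profiles and 276 possible pairwise choice sets. Each additional attribute multiplies the number of profiles by its number of levels, which is why the number of attributes has to be weighed against task complexity.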

There is no limit on the number of levels that an attribute should contain. Ryan (1999)

argues that an attribute level should be plausible, actionable, and tradable. The distance

between levels is crucially important to encourage respondents to trade across levels

(Lancsar and Louviere 2008). The interval between levels should be neither

too narrow nor too wide, such that respondents can recognise reasonable

differences between levels, and are encouraged to trade. This is particularly important

for qualitative attributes where the expressed difference between levels can be easily

misinterpreted by respondents (Amaya-Amaya et al 2008). Interval distances that are

too narrow or too wide may be viewed as too insignificant or too significant,

that are insignificant or extreme. For quantitative attributes, such as cost attributes, the

distance between intervals can be easily interpreted; however, particular care is needed

where the aim is to implicitly determine the cost of other attributes using MRS

(Amaya-Amaya et al 2008; Lancsar and Louviere 2008). Another important

consideration concerning attribute levels arises from attribute-effects. An attribute-

effect arises when an increase in the number of attribute levels causes an attribute to

become more significant relative to attributes with fewer levels. For example, an

attribute with five levels will have a more profound impact on the DCE than an

attribute with three levels. One way to reduce this bias is to set the number of attribute

levels equal to each other for every attribute. However, this is often impractical as

most studies include cost attributes, and cost attributes are usually more informative

when wide ranges and more attribute levels are included.
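
The use of a cost attribute to value other attributes implicitly through the MRS can be sketched as follows. The example assumes a linear-in-parameters systematic utility, in which case the MRS between an attribute and cost is the ratio of their coefficients. The function name and the coefficient values are hypothetical and are not estimates from this analysis.

```python
def implicit_price(beta_attribute, beta_cost):
    """Marginal willingness to pay for a one-unit increase in an attribute.

    Assumes a linear-in-parameters utility function, so the marginal rate of
    substitution between the attribute and cost is -beta_attribute / beta_cost.
    """
    if beta_cost == 0:
        raise ValueError("The cost coefficient must be non-zero.")
    return -beta_attribute / beta_cost


# Hypothetical coefficients: utility falls with distance and with cost.
beta_distance = -0.04   # per kilometre
beta_cost = -0.002      # per currency unit

wtp_per_km = implicit_price(beta_distance, beta_cost)
print(wtp_per_km)
```

The negative result indicates that an additional kilometre is a disadvantage: with these illustrative values, a respondent would need to be compensated by about 20 currency units per extra kilometre. Because the cost coefficient sits in the denominator, the range and spacing of the cost levels directly affect the precision of such estimates.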

The experimental design

The next step in the design of a DCE involves selecting an experimental design and

constructing choice sets. Once all the attributes and associated levels are identified,

alternatives or consumption bundles, also called scenarios, profiles, or options, are

created using different combinations of attribute levels (Amaya-Amaya et al 2008).

Depending on the choice format, respondents are presented with competing

alternatives and asked to select their preferred option. The process of selecting

consumption bundles and placing them into choice sets is known as experimental

design (Amaya-Amaya et al 2008; Lancsar and Louviere 2008). Experimental design produces an estimation matrix which examines respondents’ choices (dependent

In document Valuing maternity care: a comparison of stated preference methods with an application to cost-benefit analysis (Page 64-87)