2.5 Valuing benefits
2.5.2 Stated preference method: discrete choice experiment
SP methods involve analysing expected behaviour, rather than observed behaviour
(Lancsar and Louviere 2008). Valuations are elicited through surveys or
experiments in which respondents either state values directly or make choices from
which values are inferred indirectly. A number of SP techniques can be used to elicit values, such
as time trade-off, person trade-off, standard gamble, the CVM, and DCE. Unlike the
RP approach, which relies on the presence of a market demand curve, the SP technique
is based on an income-compensated demand curve (McIntosh 2010b; Sugden and
Williams 1978). This affords true measures of welfare change through the CV and EV
of welfare economics; the RP approach relies on CS to measure welfare change
(McIntosh 2010b; Sugden and Williams 1978).
However, the validity of SP data is sometimes questioned. SP analysis relies on the assumption
that the choices stated by respondents in a survey reflect the choices that
they would make in the real world. The behaviour of individuals may vary
considerably from a hypothetical scenario to a real scenario where they are forced to
choose between competing alternatives (Cookson 2003). SP methodology is further
questioned on its relevance to economic analysis as its theoretical basis is
psychological. For example, the Austrian school of economic thought maintains that
the only source of information relevant to economic analysis is provided by the
market.
As noted, there are a number of SP techniques that can be used to elicit values. A
The DCE, or some variant of the choice experiment, has been widely used across
various academic disciplines since its inception in the 1960s. However, due to varying
techniques, there is considerable ambiguity around the name, purpose, and theoretical
foundations of the experiment. In the 1960s choice experiments were introduced by
psychologists as a means of understanding ordered relations on sets, resulting in the
theory of Conjoint Measurement (Luce and Suppes 1965). Psychologists believed that
people could reliably and validly rank a set of multi-attribute scenarios, also known as
profiles, where additive, multiplicative or dual-distributive models could algebraically represent what psychologists term ‘decision rules’, and what economists term ‘indirect utility functions’ (Louviere and Lancsar 2009). However, using specific algebraic forms to examine ordered relations on sets carries significant implications for the precision
and understanding of SP data. Principally, it fails to capture whether humans differ in
monotonic transformations, and whether these differences are generalisable (Louviere
and Lancsar 2009).
The choice experiment was adopted by the marketing literature in the 1970s and
became known as conjoint analysis (CA) (Green and Rao 1971). In CA, respondents
are presented with a series of competing scenarios and asked to rank or rate
alternatives. To aid respondents’ decision-making, the competing alternatives are described in terms of attributes where each attribute is made up of different levels
(Louviere and Lancsar 2009; Ryan 1996). In the 1970s and 1980s the number of
published studies using CA increased considerably with an estimated 400 marketing
studies published each year in the early 1980s (Wittink and Cattin 1989). CA was
commonly associated with two behavioural paradigms during this period: Social
1981). However, the choice experiment was originally founded on the axiomatic theory
of Conjoint Measurement, which concerns the behaviour of numbers rather than
humans or, more precisely, orders on sets rather than human preferences (Louviere and
Lancsar 2009). It maintained that the rankings provided by humans represented
utilities, elicited through additive, multiplicative, and dual-distributive algebraic
models (Louviere and Lancsar 2009). Implied preferences are represented “as if”
humans simply added or multiplied, for example, attribute levels to assign a ranking
to each profile, or combination of attribute levels (Louviere et al 2010). As a result,
the origins of CA stem from statistical theory, rather than behavioural theory, relying
on mathematical techniques to represent ordered relations on sets in response to
systematic, factorial manipulations of factor levels (attribute levels), known as
“factorial designs” (Louviere et al 2010).
Because CA has little to do with human preferences, it is argued that it is unsuitable
for use as a stated preference technique in applied economics (Louviere et al 2010).
For instance, it is largely inconsistent with demand theory in that it ignores the
traditional constraints which underlie the economic concept, such as budget
constraints (Louviere et al 2010). It can ask respondents in a survey to rank or rate
alternatives that they simply could not afford, rendering the results meaningless. In
addition, CA has various logical limitations which distance it from practical
application in economic evaluation. The axioms of Conjoint Measurement are only
loosely related to utility theory, as illustrated above – CA reveals nothing about human
preferences. Further, there is no error theory, statistical or otherwise, associated with
2010). Therefore, it is unable to test differences in statistical models; the error
component is merely an afterthought.
The development of alternative choice techniques emerged in the economic literature
in the 1980s. The discrete choice experiment, as it is known in its current construct,
was pioneered by Louviere and Woodworth (1983) using experimental design theory.
For the first time, the authors constructed profiles, or choice sets, that respondents
could choose between, rather than rank or rate as in CA. They found that the
experimental design elicited choices that were consistent with conditional logit models
(Louviere and Woodworth 1983). However, the study was interpreted and labelled as
another form of CA, termed ‘choice-based conjoint analysis’. It is this description that has caused much confusion around the name, purpose, and theoretical foundations of
the DCE. The DCE differs from CA for one major reason: the DCE is founded in
economic theory, while CA is rooted in statistical theory.
2.5.2.1 Theoretical foundations of the DCE
There are various theoretical underpinnings of the DCE. First and foremost, it is
strongly rooted in the standard economic theory of consumer behaviour, which
assumes that individuals are rational decision makers who consistently seek to
maximise a set of stable preferences (Amaya-Amaya et al 2008; Lancsar and Louviere
2008). The DCE assumes that respondents, when faced with a choice of competing
alternatives, or bundles, assign preferences to each alternative and choose the bundle
that satisfies a set of innate preferences. However, the neoclassical consumer evaluates
the optimisation problem, or discrete choice, according to the maximum utility that
constraints. As a result, the DCE extends the traditional theory of consumer behaviour in three key
ways (Amaya-Amaya et al 2008).
In the first instance, discrete choice theory assumes that utility is derived from the
various attributes of a good, rather than of the good in its entirety; traditional consumer
theory assumes that utility is a function of quantities (Amaya-Amaya et al 2008). This
idea of utility, or demand, is drawn from Lancaster’s (1966) economic theory of value. Lancaster (1966) argued that consumers are not influenced by the good itself, but by
the properties or attributes that embody the good. Any change in the attributes of the
good may cause individuals to switch from one bundle of goods to another,
representing a shift towards the bundle of goods with the most beneficial combination
of attributes. Secondly, the optimisation problem in a discrete choice experiment
presents respondents with a finite set of mutually exclusive alternatives (Amaya-Amaya
et al 2008). Participants in a DCE are therefore restricted by an additional
constraint, whereas in traditional consumer theory individuals face
infinite choices, bound only by time and budget constraints. Lastly, discrete choice
theory assumes that the behaviour of individuals is probabilistic, rather than
deterministic as assumed by traditional theory (Amaya-Amaya et al 2008; Louviere et
al 2010). Probabilistic discrete choice modelling derives from Thurstone’s (1927)
theory of paired comparisons, and is called random utility theory (RUT). RUT is a
comprehensive behavioural theory which provides an explanation of the behaviour of
2.5.2.2 Random utility theory
Random utility theory (RUT) was proposed by Thurstone (1927), introduced into
economics by Marschak (1960) and later refined by McFadden (1974) and Manski (1977). RUT proposes that individuals have a latent construct of (indirect) “utilities” in their heads that are unobservable to researchers (Amaya-Amaya et al 2008;
Louviere et al 2010). Because these utilities cannot be “seen” by researchers, RUT
assumes that they can be decomposed into two additively separable components: a
systematic (explainable) component and a random (unexplainable) component. The
systematic component represents the attributes that explain differences in choice
alternatives and covariates that explain differences in individuals’ choices (Amaya-Amaya et al 2008; Louviere et al 2010). The random component captures the variation
in preferences that is unidentified. This can include differences in choices arising
from individual differences in utility, rather than choice options per se, measurement
errors, or specification errors, along with unobserved or unobservable attributes
(Amaya-Amaya et al 2008; Louviere et al 2010; Viney et al 2002). The latent utility
can, more formally, be described as:
$$U_{in} = V_{in} + \varepsilon_{in} \qquad (2.4)$$
where 𝑈𝑖𝑛 represents the latent, unobservable utility associated with choice alternative 𝑖 for individual 𝑛, 𝑉𝑖𝑛 is the systematic component of the utility of choice 𝑖 for individual 𝑛, and 𝜀𝑖𝑛 is the random component associated with choice 𝑖 and individual 𝑛.
Because RUT is probabilistic, it assumes that individual 𝑛 will choose alternative 𝑖 if, and only if, its utility is higher than any other option amongst all 𝐽 alternatives in the choice set 𝐶𝑛:
$$y_{in} = f(U_{in}) = \begin{cases} 1 & \text{if } U_{in} = \max_{j \in C_n} \{U_{jn}\} \\ 0 & \text{otherwise} \end{cases} \qquad (2.5)$$
where 𝑦𝑖𝑛 represents the choice indicator equal to 1 if individual 𝑛 chooses alternative 𝑖, or 0 otherwise. According to Eqn. 2.4, alternative 𝑖 is chosen by individual 𝑛 if, and only if:
$$(V_{in} + \varepsilon_{in}) > (V_{jn} + \varepsilon_{jn}) \quad \forall j \neq i \in C_n \qquad (2.6)$$
This can be rearranged to group the observable components on one side and the unobservable components on the other:
$$(V_{in} - V_{jn}) > (\varepsilon_{jn} - \varepsilon_{in}) \qquad (2.7)$$
Because (𝜀𝑗𝑛 − 𝜀𝑖𝑛) is unobservable, it is not possible to determine exactly whether (𝑉𝑖𝑛 − 𝑉𝑗𝑛) > (𝜀𝑗𝑛 − 𝜀𝑖𝑛). As a result, it is only possible to estimate choice outcomes up to a probability of occurrence (McFadden 1974). For instance, the probability that
individual 𝑛 will choose 𝑖 is the same as the probability that the difference between the error components is less than the difference in the observable component between the chosen alternative 𝑖 and any other alternative 𝑗, amongst all 𝐽 alternatives in the choice set 𝐶𝑛:
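$$\begin{aligned}
P_{in} &= \Pr(y_{in} = 1 \mid X_{in}, \beta) \\
&= \Pr(U_{in} > U_{jn}) \quad \forall j \neq i \in C_n \\
&= \Pr(\varepsilon_{jn} - \varepsilon_{in} < V_{in} - V_{jn}) \quad \forall j \neq i \in C_n
\end{aligned} \qquad (2.8)$$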
Again, it is not possible to observe (𝜀𝑗𝑛 − 𝜀𝑖𝑛) across the population. RUT therefore assumes a probability distribution for the
random component, and the resulting choice probability depends on the distribution, or density function,
selected (McFadden 1974). Various probability distributions
can be applied to 𝜀𝑖𝑛, resulting in families of probabilistic discrete choice models, such as binary or multiple discrete choice models (Amaya-Amaya et al 2008;
Louviere et al 2010). The exact probabilistic discrete choice model to apply in a DCE
depends on the assumptions made about the probability distribution of 𝜀𝑖𝑛. For instance, Thurstone (1927) assumed that the random components were distributed as
non-independent and non-identical normal random variates. However, this assumption restricts
the application of RUT to dichotomous, or binary, discrete choice models
rather than multiple discrete choice models. McFadden (1974) later
developed RUT and assumed that the random components were distributed
independently and identically with a Gumbel distribution (extreme value type 1). This
assumption gave rise to the standard conditional logit (CL) model, or multinomial logit
(MNL) model.
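Under the assumption of independently and identically distributed Gumbel errors, the choice probability in Eqn. 2.8 takes the well-known closed form P_in = exp(V_in) / Σ_j exp(V_jn), summing over all alternatives j in the choice set. The following Python sketch is illustrative only: the systematic utilities are hypothetical values, the error scale is assumed to equal one, and the function names are not drawn from the cited literature. It compares the closed-form CL probabilities with the choice shares obtained by simulating utility-maximising choices under Eqn. 2.4.

```python
import math
import random


def logit_probabilities(v):
    """Closed-form conditional logit probabilities for systematic utilities v."""
    m = max(v)  # subtract the maximum for numerical stability
    exp_v = [math.exp(x - m) for x in v]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def gumbel_draw(rng):
    """Draw from a standard Gumbel (extreme value type 1) distribution."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_shares(v, draws=100_000, seed=0):
    """Share of simulated individuals choosing each alternative.

    Each simulated individual receives U = V + epsilon (Eqn. 2.4) and
    chooses the alternative with the highest utility (Eqn. 2.5).
    """
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(draws):
        utilities = [vi + gumbel_draw(rng) for vi in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]


# Hypothetical systematic utilities for three alternatives (assumed values).
v = [1.0, 0.5, 0.0]
print("closed form:", [round(p, 3) for p in logit_probabilities(v)])
print("simulated:  ", [round(s, 3) for s in simulated_shares(v)])
```

With a large number of draws the simulated shares should approach the closed-form probabilities, which illustrates why the Gumbel assumption is convenient: no simulation is needed to evaluate the CL choice probabilities.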
The CL model is the simplest and most widely used probabilistic discrete choice model.
However, it has received considerable criticism because it relies heavily on restrictive
assumptions, for example about taste variation: the CL model can account for systematic
(observed) heterogeneity across observed characteristics, such as income or education,
remain random, regardless of observed characteristics (Amaya-Amaya et al 2008). As
a result, various new probabilistic discrete choice models have been proposed to better
represent human behaviour in choice models, including heteroscedastic models and
random parameters or mixed logit models (Amaya-Amaya et al 2008). The
appropriate model to use depends on a range of issues; most notably, the design of the
study. This is discussed in the following section.
2.5.2.3 From theory to practice: how to conduct a DCE
There are various stages involved in designing, analysing, and interpreting a DCE.
There are also various best-practice guidelines published on how to design and
construct a DCE. Ryan (1996) outlined five succinct stages: identification of
attributes; identification of levels; experimental design; data collection; and data
analysis. Adamowicz et al (1998) further refined the key steps to include questionnaire
development, sample sizing, and computerised support. The
design model proposed by Ryan (1996) is typically used in the health economics
literature, although this thesis draws on the most recent published guidelines on
conducting DCEs, proposed by Lancsar and Louviere (2008), as these guidelines are
expansive and particularly useful to novice practitioners. The design model proposed
by Ryan (1996) is included in these updated guidelines and expanded upon. Lancsar
and Louviere (2008) argue that there are three main components that need to be
considered when undertaking a DCE: experimental design, discrete choice analysis,
and welfare measures and other policy analyses. These are described separately below.
2.5.2.4 Experimental design: choice survey and data
A DCE requires specific consideration of the choice format, framing of the choice set,
and relevance of the choice set to respondents to ensure proper design and
implementation (Lancsar and Louviere 2008). Carson et al (2000) note that true
preferences are most accurately revealed when the discrete choice is incentive
compatible. There are different choice formats that can be applied
in a DCE, such as binary choice (yes/no), dichotomous choice (two alternatives), or
multiple choice (three or more alternatives). The appropriate format may depend on
the relevance of the goods to the respondent. Often in health care applications, the
choice between two comparators is hypothetical. In such cases, dichotomous choice
questions are inappropriate as participants are forced to reveal a preference for a
consumption bundle that they may never choose in practice. As a result, it is often
fitting to include an opt-out option, ‘choose neither’ option, or status-quo option.
It is also important to consider the impact that labelled (e.g. paracetamol, Nurofen)
choice sets might have on the preferences that respondents express. A generic description (e.g. drug A,
drug B) might be better suited where respondents have already experienced one of the
consumption bundles, as this reduces the potential bias against any comparator. This bias is
known as status-quo bias (Ryan and Ubach 2003), or the endowment effect (Thaler
1980).6 A related decision, which is particularly important in health care applications,
is how much information should be provided to ensure that respondents are well
informed and are less influenced by previous experience/knowledge or assumptions
acquired from elsewhere.7
6 The endowment effect is a behavioural economic concept that refutes the standard economic theory
of the consumer (Thaler 1980). It asserts that initial endowments of wealth alter individuals’ preferences. It is typically used to explain the observed disparity between willingness to pay and willingness to accept, although it has recently been extended to explain how initial endowments of experience can influence preferences in the same way (Ryan and Ubach 2003; Salkeld et al 2000).
Identifying appropriate attributes and levels
The selection of appropriate attributes and attribute levels is crucial in a DCE to
capture the systematic component of the utility function. There are various methods
that can be used to obtain attributes, such as literature reviews, focus groups,
interviews with relevant personnel (including patients, policy makers, and experts),
and patient surveys (Coast et al 2011; Lancsar and Louviere 2008; Ryan and Hughes
1997).8 There are two important issues that need to be considered when formulating
attributes and levels (Ryan 1996). First, attributes must be meaningful and important
to respondents, while also addressing issues relevant to policy makers. Moreover, they must be ‘plausible’, and respondents should be able to trade across different attribute levels (Ryan 1996). Second, attributes must be comprehensive and measurable,
whether they are quantitative (e.g. distance to hospital) or qualitative (e.g. type of
provider). In order to be comprehensive, attributes must be meaningful to respondents.
Further, attribute levels must clearly depict the attractiveness of the associated
attributes (Ryan 1996). To be measurable, it must be reasonable to obtain a probability
distribution for an attribute that covers the range of selected attribute levels. It must also be reasonable to identify respondents’ preferences using a utility function (Ryan 1996).
7 For information on the development of the choice process and format within this analysis, see Chapter
4, section 4.2.1.1.
8 This analysis uses focus groups to develop attributes. The focus groups are presented in Chapter 3.
The number of attributes to use in a DCE is context specific. The general consensus is
that it should not exceed eight; however, some studies have included as many as 12
attributes (Hall et al 2006; Lancsar and Louviere 2008). By including a large number
of attributes in a DCE, the complexity of the experimental design is amplified (Lancsar
and Louviere 2008). Specific consideration must be given to relevant and
irrelevant attributes. A relevant attribute is one whose exclusion would affect the
conclusions of the DCE, while an irrelevant attribute is one whose exclusion
would not affect the results of the DCE. It is not always easy to distinguish relevant
from irrelevant attributes, especially if attributes that are currently demand-irrelevant
become relevant due to an increase in consumer knowledge, for example (Bennett and
Blamey 2001). Exclusion of relevant attributes may result in biased estimates, but this
needs to be weighed against an increase in task complexity arising from the inclusion
of more attributes (Lancsar and Louviere 2008). Lancsar and Louviere (2008) note
that in order to get the balance right, iterative piloting may be necessary.
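The way in which additional attributes amplify design complexity can be illustrated by counting the profiles in a full factorial design, which contains every possible combination of attribute levels. The Python sketch below is a minimal illustration under stated assumptions: the attribute names and levels are hypothetical and are not those used in this analysis, and the function name is introduced here for illustration only.

```python
from itertools import product
from math import prod

# Hypothetical attributes and levels, for illustration only.
attributes = {
    "cost": [10, 20, 40, 80],
    "waiting_time_weeks": [1, 4, 12],
    "provider": ["GP", "nurse", "specialist"],
}


def full_factorial(attrs):
    """Return every profile (one level per attribute) as a dictionary."""
    names = list(attrs)
    return [dict(zip(names, combo)) for combo in product(*attrs.values())]


profiles = full_factorial(attributes)

# The number of profiles is the product of the number of levels per attribute.
n_profiles = prod(len(levels) for levels in attributes.values())
print(len(profiles), n_profiles)  # 36 36

# Eight attributes with three levels each already imply 3**8 profiles.
print(3 ** 8)  # 6561
```

Because the number of profiles grows multiplicatively, presenting every combination to respondents quickly becomes infeasible as attributes or levels are added, which is one reason the balance between completeness and task complexity matters.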
There is no limit on the number of levels that an attribute can contain. Ryan (1999)
argues that an attribute level should be plausible, actionable, and tradable. The distance
between levels is crucially important to encourage respondents to trade across levels
(Lancsar and Louviere 2008). The interval should be neither too narrow nor too wide,
so that respondents can recognise reasonable
differences between levels and are encouraged to trade. This is particularly important
for qualitative attributes where the expressed difference between levels can be easily
misinterpreted by respondents (Amaya-Amaya et al 2008). Interval distances that are
too narrow or too wide may be viewed as too insignificant or too significant,
that are insignificant or extreme. For quantitative attributes, such as cost attributes, the
distance between levels can be easily interpreted; however, particular care is needed
where the aim is to implicitly determine the cost of other attributes using MRS
(Amaya-Amaya et al 2008; Lancsar and Louviere 2008). Another important
consideration concerning attribute levels arises from attribute-effects. An attribute-
effect arises when an increase in the number of attribute levels causes an attribute to
become more significant relative to attributes with fewer levels. For example, an
attribute with five levels will have a more profound impact on the DCE than an
attribute with three levels. One way to reduce this bias is to use the same number of
levels for every attribute. However, this is often impractical as
most studies include cost attributes, and cost attributes are usually more informative
when wide ranges and more attribute levels are included.
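The role of the cost attribute in valuing other attributes through the MRS can be sketched as follows. The sketch assumes a linear, additive systematic utility function, so that the MRS between an attribute and cost is the ratio of their coefficients and marginal willingness to pay is the negative of that ratio. The coefficient values and attribute names are hypothetical, chosen only for illustration, and are not estimates from any study.

```python
def marginal_wtp(coefficients, cost_name="cost"):
    """Marginal willingness to pay for each non-cost attribute.

    Assumes a linear, additive utility function, so that
    WTP_k = -beta_k / beta_cost.
    """
    beta_cost = coefficients[cost_name]
    if beta_cost == 0:
        raise ValueError("the cost coefficient must be non-zero")
    return {
        name: -beta / beta_cost
        for name, beta in coefficients.items()
        if name != cost_name
    }


# Hypothetical coefficient estimates (assumed values, not study results).
betas = {"cost": -0.05, "waiting_time_weeks": -0.20, "specialist": 0.60}

wtp = marginal_wtp(betas)
for name, value in wtp.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical example the negative value for waiting time indicates that respondents would need to be compensated for each additional week of waiting, while the positive value for a specialist provider indicates the amount they would be willing to pay for that level. Because every valuation is divided by the cost coefficient, a poorly chosen range of cost levels affects all of the implied values, which is why particular care is needed with this attribute.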
The experimental design
The next step in the design of a DCE involves selecting an experimental design and
constructing choice sets. Once all the attributes and associated levels are identified,
alternatives or consumption bundles, also called scenarios, profiles, or options, are
created using different combinations of attribute levels (Amaya-Amaya et al 2008).
Depending on the choice format, respondents are presented with competing
alternatives and asked to select their preferred option. The process of selecting
consumption bundles and placing them into choice sets is known as experimental
design (Amaya-Amaya et al 2008; Lancsar and Louviere 2008). Experimental design produces an estimation matrix which examines respondents’ choices (dependent