
**Estimating health state utility values from discrete choice experiments – a QALY space model approach**

Yuanyuan Gu, Richard Norman, Rosalie Viney

Centre for Health Economics Research and Evaluation, University of Technology, Sydney, Australia

**Corresponding author: **

Yuanyuan Gu,

Centre for Health Economics Research and Evaluation, University of Technology, Sydney,

PO BOX 123, Broadway, NSW, 2007, Australia. E-mail: yuanyuan.gu@gmail.com

Phone: +61 2 9514 9886 Fax: +61 2 9514 4730

**Keywords: **


**Abstract **

Using discrete choice experiments (DCEs) to estimate health state utility values has become an important alternative to the standard methods such as the Time Trade-Off (TTO). Studies using DCEs have typically used the conditional logit to estimate the underlying utility function. We show that this approach will lead to the valuation of each health state from “an average person” in the population. By contrast, the standard approach that has been developed for the TTO method is based on estimating the “average valuation” for a health state within the population. These are fundamentally conceptually different approaches and have different interpretations in policy evaluation. In this paper we point out that it is also possible to estimate the “average valuation” for a health state when using DCEs. The estimation approach is based on the mixed logit (MIXL). In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay literature. These methods are applied to a data set collected using the EQ-5D. The results demonstrate that the preferred QALY space model provides lower estimates of the utility values than the conditional logit, with the divergence increasing with worsening health states.


**1 Introduction **

For the evaluation of new health technologies, it is conventional to model their effect using the quality-adjusted life year (QALY). QALYs combine quality of life and life expectancy into a summary measure that reflects preferences for these two dimensions of health gain (Pliskin, et al., 1980). The use of cost-utility analysis, with outcomes measured in terms of QALYs, is now recommended by most health technology agencies internationally. A number of standard generic quality of life instruments have been developed for the purpose of measuring and valuing quality of life to facilitate estimation of QALYs directly from patient reported outcomes (Brazier, 2007). These instruments, known as multi-attribute utility instruments, describe the health state space in terms of several dimensions of quality of life, and include a preference based scoring algorithm that can be interpreted on a cardinal scale. Typically, standard preference based valuation techniques such as the Standard Gamble (SG) and Time Trade-off (TTO) have been used to derive the scoring algorithms that assign scores (known as utility values or QALY weights) to the universe of health states described by the instrument.

In the past decade, several authors have considered the use of discrete choice experiments (DCEs) to estimate health state utility values, as an alternative to TTO and SG based techniques (Bansback, et al., 2012; Coast, et al., 2008; Flynn, 2010; Hakim and Pathak, 1999; Lancsar, et al., 2011; Ratcliffe, et al., 2009; Ryan, et al., 2006; Viney, et al., 2013). In the approach developed by Bansback, et al. (2012), and used by others, the health state utility values are estimated based on the conditional logit model. Broadly, in this approach, the conditional logit is used to estimate coefficients of the attributes that describe a health profile. Utility decrements associated with any move away from full health can be estimated for each dimension and level by computing the ratios between the estimated coefficients of the non-time attributes and that of the time attribute.1 Utility values assigned to specific health states are then calculated by summing the relevant utility decrements and subtracting them from one. This approach has important conceptual differences from the approach that has been developed for the TTO and the SG. The standard approach that has been used in the QALY literature and in economic evaluation is based on finding the “average valuation” of a health state for the relevant population. Effectively this involves estimating the health state utility values for each individual in the population and then averaging these individual utility values over the whole population. In contrast, the approach using conditional logit is to find the valuation of a health state from “an average person” in the population. These are conceptually different approaches and therefore have potentially different interpretations in policy evaluation.

In this paper we demonstrate that it is possible to estimate the “average valuation” for a health state when using DCEs. The estimation approach is based on the mixed logit (MIXL) which allows us to derive the population distributions of utility decrements and then the means of these distributions. In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay (WTP) literature. The QALY space model has several advantages over the preference space model and the most significant one is that it allows us to directly estimate and compare different distribution assumptions for the utility decrements. A specific contribution is made to the estimation of a QALY space model with utility decrements assumed to follow a multivariate Johnson’s SB distribution. In the choice modelling literature this type of model has been very difficult to estimate due to an identification problem (Rigby and Burton, 2006; Train and Sonnier, 2005). In this paper we show that using informative priors on the bounds may improve identification, and that estimating the bounds simultaneously with the other parameters is possible.

In this study, we develop methods to estimate utility values for EQ-5D health states although these methods could be applied to other instruments that are based on a linear additive model, such as the SF-6D. These methods are applied to a data set which has been previously used to estimate health state utility values. The utility values estimated from the selected MIXL model and the conditional logit are compared.

**2 Valuing EQ-5D health states using DCEs **

The EQ-5D, developed by the EuroQol Group, is the most widely used multi-attribute utility instrument (Richardson, et al., 2011; Szende, et al., 2007). It has five dimensions, intended to represent the major areas in which health changes can manifest: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. For the most commonly used version of the EQ-5D, each dimension contains three levels, loosely classified as ‘No Problems’, ‘Some Problems’, and ‘Extreme Problems’. Details are shown in Table 1. There are 3^5 = 243 potential states in the descriptive system.

[Insert Table 1 around here]

The traditional approach to value these 243 states has been to administer a TTO preference based task for a sample of health states in a population based sample, and then use regression based modelling to impute the values of the remaining health states (Dolan, 1997; Szende, et al., 2007; Viney, et al., 2011). There is an extensive literature on this broad approach, including a series of examinations of its limitations which may have led to the current trend of investigating alternative methods (Bosch, et al., 1998; Craig, et al., 2009; Norman, et al., 2010). For example, there have been explorations of alternative specifications of the TTO, including Lead-Time and Lag-Time TTOs (Devlin, et al., 2011). A review of the development of using DCEs to value health states can be found in Bansback, et al. (2012).

**2.1 The DCE data **

Viney, et al. (2013) have developed a DCE based algorithm for the Australian population, and the data from that study are used in the current analysis. This section briefly describes the experiment. The DCE was developed and administered to a sample of the Australian general population. Respondents were asked to choose between health profiles described in terms of EQ-5D profiles and survival attributes. Each choice set included three options: two health profile options and an immediate death option. Each health profile option in a choice set was defined by five attributes covering the dimensions of the EQ-5D and a survival duration attribute. Five survival durations (1, 2, 4, 8 and 16 years) were included in the experiment.

The third option of immediate death was included to allow for a complete ranking of health profiles over the “worse than death” to “full health” utility space. The task for the respondent was to identify which of the three options was considered the best, and which the worst, thus providing a complete ranking within each choice set. An example of a choice set is provided in Figure 1.

Details of the experimental design can be found in Viney, et al. (2013). Although each choice set included an immediate death option, only the choice between the two non-death profiles was considered.2 Therefore the analysis was based on a constructed choice set with only the rankings of these two profiles.

A total of 1,120 individuals consented to participate in the survey and were eligible to participate. Of these, 1,031 completed the survey, giving a response rate of 92.1%. Viney, et al. (2013) showed that overall the characteristics of those who completed the task are broadly comparable to the characteristics of the general Australian population. Each respondent faced 15 choice sets, which translates into 15,465 observations.

**3 Using conditional logit **

As Viney, et al. (2013) and Bansback, et al. (2012) both noted, an additive utility function with life expectancy and the levels of the EQ-5D would be inconsistent with the theoretical framework that underpins QALYs, because the QALY model requires that all health states have the same utility at death, i.e., as survival approaches zero, the systematic component of the utility function should similarly tend to zero. This satisfies the zero condition implicit in the QALY model (Bleichrodt and Johannesson, 1997; Bleichrodt, et al., 1997). Therefore, the utility of option $j$ in choice set $s$ for survey respondent $i$ is assumed to be

$$U_{isj} = \alpha\,TIME_{isj} + \beta' X_{isj}\,TIME_{isj} + \varepsilon_{isj}, \tag{1}$$

where $X_{isj}$ represents a set of dummy variables relating to the levels of the EQ-5D health state, $TIME_{isj}$ represents survival, and the error terms $\varepsilon_{isj}$ are i.i.d. Gumbel distributed.

2

Flynn, et al. (2008) argue that including the immediate death option in the choice modelling violates random utility theory, as some respondents may always choose survival over death no matter what health profiles are provided to them.

It is conventional to use the best level of each dimension as the reference category. In this case excludes the dummies representing the best levels with other 10 elements remaining: MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, and AD3. For example, a health state denoted as 12221 should translate into a vector (0, 0, 1, 0, 1, 0, 1, 0, 0, 0).
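The dummy coding described above can be sketched in a few lines of Python. This is an illustrative helper only: the function name `encode_state` and the constants are our own labels and are not part of the original analysis.

```python
from itertools import product

# Dimension order and dummy names follow the list given in the text.
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]
DUMMIES = [f"{dim}{level}" for dim in DIMENSIONS for level in (2, 3)]


def encode_state(state: str) -> list:
    """Map an EQ-5D state such as '12221' to its 10 dummy variables.

    Level 1 (the best level) is the reference category, so each
    dimension contributes two dummies: one for level 2, one for level 3.
    """
    if len(state) != 5 or any(ch not in "123" for ch in state):
        raise ValueError(f"not a valid three-level EQ-5D state: {state!r}")
    vector = []
    for ch in state:
        level = int(ch)
        vector.extend([int(level == 2), int(level == 3)])
    return vector


# All 3^5 = 243 states of the descriptive system.
ALL_STATES = ["".join(levels) for levels in product("123", repeat=5)]

print(dict(zip(DUMMIES, encode_state("12221"))))
print(encode_state("12221"))  # [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
print(len(ALL_STATES))        # 243
```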

In the current literature, the α and β terms have been assumed to be constant across individuals and based on this assumption equation (1) leads to the conditional logit model.3 It is our baseline model and denoted as *M1*.
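To make the link between equation (1) and the conditional logit concrete, the following Python sketch computes the choice probabilities implied by i.i.d. Gumbel errors for a pair of health profiles. The coefficient values `ALPHA` and `BETA` are hypothetical numbers chosen for illustration; they are not estimates from this study.

```python
import math

# Hypothetical coefficients (NOT estimates from the paper).
ALPHA = 0.25
BETA = [-0.05, -0.15, -0.04, -0.10, -0.03, -0.08, -0.06, -0.16, -0.05, -0.12]


def systematic_utility(alpha, beta, x, time):
    """Non-random part of equation (1): alpha*TIME + (beta'x)*TIME."""
    return alpha * time + sum(b * xk for b, xk in zip(beta, x)) * time


def choice_probabilities(utilities):
    """Logit probabilities implied by i.i.d. Gumbel error terms."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Profile A: state 12221 for 8 years; profile B: full health for 4 years.
profile_a = ([0, 0, 1, 0, 1, 0, 1, 0, 0, 0], 8)
profile_b = ([0] * 10, 4)

utilities = [systematic_utility(ALPHA, BETA, x, t) for x, t in (profile_a, profile_b)]
probs = choice_probabilities(utilities)
print(utilities, probs)
```

Because survival multiplies every term, the systematic utility is zero whenever survival is zero, which is the zero condition discussed above.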

The estimation of α and β does not directly lead to the valuation of health states. An approach is needed to anchor the latent utility scale to the health state utility scale. There are several ways to derive this algorithm (Bansback, et al., 2012; Ratcliffe, et al., 2009; Viney, et al., 2013). The main idea is that the utility value of a health state is its marginal utility of *TIME* on the latent scale, i.e.,

$$\frac{\partial U}{\partial TIME} = \alpha + \beta' X.$$

In the case of full health, its marginal utility of *TIME* on the latent scale is

$$\frac{\partial U}{\partial TIME} = \alpha,$$

which needs to be normalised to be 1 under the QALY model. Hence the normalising constant is α and the utility score for a health state is

$$1 + \frac{\beta'}{\alpha} X.$$

The utility decrements are therefore β/α.
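As a worked illustration of this anchoring step, the sketch below converts a set of latent-scale coefficients into a utility value for one health state. The coefficients are hypothetical and only serve to show the arithmetic; the function name `utility_value` is our own.

```python
# Hypothetical latent-scale coefficients (NOT estimates from the paper).
ALPHA = 0.25
BETA = [-0.05, -0.15, -0.04, -0.10, -0.03, -0.08, -0.06, -0.16, -0.05, -0.12]


def utility_value(alpha, beta, x):
    """Utility score 1 + (beta'/alpha) x for a dummy-coded health state x."""
    return 1.0 + sum(b * xk for b, xk in zip(beta, x)) / alpha


# Utility decrements beta/alpha, one per dummy variable.
decrements = [b / ALPHA for b in BETA]

# State 12221 switches on SC2, UA2 and PD2.
state_12221 = [0, 0, 1, 0, 1, 0, 1, 0, 0, 0]
print(utility_value(ALPHA, BETA, state_12221))  # 1 + (-0.04 - 0.03 - 0.06) / 0.25

# Full health has no active dummies, so its value is exactly 1.
print(utility_value(ALPHA, BETA, [0] * 10))
```

Note that a state with large enough decrements receives a negative value, which corresponds to a state valued as worse than death.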

**3.1 Average valuation versus an average person’s valuation **

As noted by Bansback, et al. (2012), the objective was to derive the population mean utility scores for all possible health states, which requires estimation of population mean utility decrements. The conditional logit parameter estimates $\hat{\alpha}$ and $\hat{\beta}$ represent population mean preferences for attributes that describe a health profile. In effect, the estimate $\hat{\beta}/\hat{\alpha}$ represents an “average person” in the population whose preference parameters are exactly $\hat{\alpha}$ and $\hat{\beta}$. In this case, $\hat{\beta}/\hat{\alpha}$ is actually the estimate of this average person’s utility decrements. This is conceptually different from the population mean utility decrement, which would be estimated by deriving $\alpha_i$ and $\beta_i$ for each person $i$ in the target population and using these to calculate that individual's utility decrements $\beta_i/\alpha_i$. The population mean utility decrements are then computed as the average of all the individual decrements. Mathematically, with $N$ individuals in the population, this procedure can be described as

$$\frac{1}{N}\sum_{i=1}^{N}\frac{\beta_i}{\alpha_i},$$

which may or may not be close to $\hat{\beta}/\hat{\alpha}$, i.e., the ratio of means $\left(\frac{1}{N}\sum_{i=1}^{N}\beta_i\right) \Big/ \left(\frac{1}{N}\sum_{i=1}^{N}\alpha_i\right)$.
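The gap between the mean of ratios and the ratio of means is easy to see in a small simulation. The sketch below uses a hypothetical population in which each person's time coefficient is log-normal and each person's attribute coefficient is normal, drawn independently; these distributions and parameter values are assumptions made for illustration only.

```python
import random
import statistics

random.seed(0)
N = 100_000

# Hypothetical population: alpha_i is log-normal (always positive),
# beta_i is normal around a negative mean, drawn independently.
alphas = [random.lognormvariate(0.0, 0.5) for _ in range(N)]
betas = [random.gauss(-0.5, 0.1) for _ in range(N)]

# "Average valuation": average the individual decrements beta_i / alpha_i.
mean_of_ratios = statistics.fmean(b / a for a, b in zip(alphas, betas))

# "Average person": divide the mean beta by the mean alpha.
ratio_of_means = statistics.fmean(betas) / statistics.fmean(alphas)

# In theory E[1/alpha] = exp(0.125) and E[alpha] = exp(0.125) here, so the
# two quantities are about -0.57 and -0.44 respectively.
print(f"mean of ratios: {mean_of_ratios:.3f}")
print(f"ratio of means: {ratio_of_means:.3f}")
```

In this setting the mean of ratios is larger in size than the ratio of means, because the reciprocal of a positive random variable has a mean above the reciprocal of its mean (Jensen's inequality).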

It is worth noting that when the TTO approach is used, this issue does not arise. When using TTO, a sample of health states are selected and respondents’ utility scores for these health states are elicited. These scores are then used as the dependent variable in a model which is regressed on $X$. In this case the regression coefficients, representing population mean utility decrements, are directly estimated using least squares (Dolan, 1997; Viney, et al., 2011).

**4 Using MIXL: preference space versus QALY space **

One possible way to estimate the population mean utility decrements is to use a framework based on random parameters. Equation (1) can be rewritten as

$$U_{isj} = \alpha_i\,TIME_{isj} + \beta_i' X_{isj}\,TIME_{isj} + \varepsilon_{isj}, \tag{2}$$

where $\alpha_i$ and $\beta_i$ are both random. The induced model is called the MIXL. Under this framework, we first estimate the distributions of $\beta_i/\alpha_i$ (i.e., the distributions of utility decrements) and then derive the means of these distributions.

To find the distribution of the ratio of two random variables is a longstanding problem. It has been particularly investigated in the WTP literature where $\alpha_i$ represents the coefficient of price and $\beta_i$ represent the coefficients of non-price attributes in a DCE. Hensher and Greene (2003) and Daly, et al. (2012) discussed the major challenges in this area of research. The first challenge is that $\beta_i/\alpha_i$ may not have finite moments unless $\alpha_i$ is assumed to have some specific distributions such as log-normal. In our case, to assume $\alpha_i$ to be a log-normal random variable is reasonable because $\alpha_i$ represents a person’s preference for the duration of life at perfect health condition and should be always positive.

The second challenge concerns the extreme values that arise from the reciprocal of a random variable. As long as $\alpha_i$ can take very small values, $1/\alpha_i$ will produce quite large numbers. This problem is increasingly acute when the distribution of $\alpha_i$ has thick tails (e.g., Student's t and log-normal).

We therefore estimated two MIXL models:

*M2.1*: $\log \alpha_i$ and $\beta_i$ follow a multivariate normal distribution with mean μ and variance $\Sigma$;

*M2.2*: $\log \alpha_i$ and $\log \beta_i$ follow a multivariate normal distribution with mean μ and variance $\Sigma$.4

The second model (*M2.2*) has the advantage of assuring the decrements’ distributions are strictly negative and the disadvantage of inducing a lot of extreme values. In contrast, the first model (*M2.1*) may suffer less from the extreme values but it cannot guarantee each individual’s utility decrements are strictly negative.
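The trade-off between the two preference space models can be illustrated by simulating the induced ratio of an attribute coefficient to the time coefficient. The sketch below is a simplification: it uses a single decrement, hypothetical parameter values, and independent draws, whereas the models in this paper allow all random coefficients to be correlated.

```python
import random

random.seed(1)
N = 50_000


def share_positive(values):
    """Fraction of simulated decrements with the wrong (positive) sign."""
    return sum(v > 0 for v in values) / len(values)


# Time coefficient: log-normal in both models (hypothetical parameters).
alphas = [random.lognormvariate(-1.5, 0.6) for _ in range(N)]

# M2.1-style: the attribute coefficient is normal, so it can be positive.
m21 = [random.gauss(-0.05, 0.05) / a for a in alphas]

# M2.2-style: the sign-flipped coefficient is log-normal, so the
# coefficient itself is always negative.
m22 = [-random.lognormvariate(-3.2, 0.8) / a for a in alphas]

print(f"M2.1-style share of positive decrements: {share_positive(m21):.3f}")
print(f"M2.2-style share of positive decrements: {share_positive(m22):.3f}")
print(f"M2.2-style largest decrement size: {max(abs(v) for v in m22):.2f}")
```

With a normal coefficient whose mean is one standard deviation below zero, roughly 16% of the simulated decrements have the wrong sign, while the log-normal version has none but produces a long tail of large values.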

Another challenge that has not been addressed in the literature is that the distribution of $\beta_i/\alpha_i$ is induced from our assumptions on the distributions of $\alpha_i$ and $\beta_i$, so it is not possible to directly compare and test the distributions of $\beta_i/\alpha_i$. In the WTP literature, alternative methods have been developed to meet these challenges (Daly, et al., 2012). Among them the most promising effort has been the invention of the WTP space model (Train and Weeks, 2005). The name “WTP space” was proposed as a contrast to the “preference space” on which the framework described above is based. The WTP space model is essentially a re-parameterisation of equation (2) so that the distribution of $\beta_i/\alpha_i$ can be directly assumed and estimated. We adapted this idea to our context and named the approach “QALY space model”. We now re-parameterise equation (2) as

$$U_{isj} = \alpha_i\,TIME_{isj} + \alpha_i \gamma_i' X_{isj}\,TIME_{isj} + \varepsilon_{isj}, \tag{3}$$

where $\gamma_i = \beta_i/\alpha_i$. Under this new framework we may estimate and compare models that assume different distributions on the utility decrements $\gamma_i$. For the EQ-5D DCE data, we estimated three models:

*M3.1*: $\log(\alpha_i)$ and $\gamma_i$ follow a multivariate normal distribution with mean μ and variance $\Sigma$;

*M3.2*: $\log(\alpha_i)$ and $\log(\gamma_i)$ follow a multivariate normal distribution with mean μ and variance $\Sigma$;

*M3.3*: $\log(\alpha_i)$ and $\log(\gamma_{ik}/(b_k - \gamma_{ik}))$, $k = 1, \ldots, K$, follow a multivariate normal distribution with mean μ and variance $\Sigma$, where $K$ represents the size of $\gamma_i$ (i.e., the number of utility decrements) and $b_k$ represents a positive unknown scalar parameter.

4

For estimating *M2.2* we need to change the signs of the data corresponding to $\beta_i$ to their opposite. This applies to other models when log-normal distribution is assumed for negative coefficients.
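The re-parameterisation leaves the systematic utility unchanged; it only changes which quantities are given distributions. A minimal Python check of this equivalence, using hypothetical coefficient values for one individual and our own function names, is sketched below.

```python
def utility_preference_space(alpha_i, beta_i, x, time):
    """Systematic part of equation (2): alpha_i*TIME + (beta_i'x)*TIME."""
    return alpha_i * time + sum(b * xk for b, xk in zip(beta_i, x)) * time


def utility_qaly_space(alpha_i, gamma_i, x, time):
    """Systematic part of equation (3): alpha_i*TIME*(1 + gamma_i'x)."""
    return alpha_i * time * (1.0 + sum(g * xk for g, xk in zip(gamma_i, x)))


# Hypothetical individual-level coefficients (NOT estimates from the paper).
alpha_i = 0.3
beta_i = [-0.06, -0.18, -0.03, -0.12, -0.02, -0.09, -0.05, -0.20, -0.04, -0.15]
gamma_i = [b / alpha_i for b in beta_i]  # utility decrements in QALY space

x = [0, 1, 0, 0, 1, 0, 0, 1, 1, 0]  # state 31232
u_pref = utility_preference_space(alpha_i, beta_i, x, time=4)
u_qaly = utility_qaly_space(alpha_i, gamma_i, x, time=4)
print(u_pref, u_qaly)
```

In QALY space the term in parentheses is the health state utility value itself, so the time coefficient acts as a scale on QALYs (value multiplied by duration).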

Models *M3.1* and *M3.2* assume normal and log-normal distributions for the utility decrements respectively. Both have merits and flaws: the normal distribution has thin tails but cannot ensure everyone has negative decrements, while the log-normal distribution is the opposite; it can ensure everyone has negative decrements but has a thick right tail that may lead to very large mean estimates.

Model *M3.3* assumes Johnson’s SB distribution for the utility decrements, i.e.,

$$\gamma_{ik} = b_k \frac{\exp(c_{ik})}{1 + \exp(c_{ik})}, \tag{4}$$

where $c_{ik}$ is normally distributed. It is a special case of Johnson’s SB distribution with the lower bound set as 0 and the upper bound $b_k$ to be estimated.5 This distribution has the merits of both the normal and the log-normal: a thin tail and only taking positive numbers. The literature also shows that a wide variety of distributions such as normal, log-normal, Weibull, and modified beta can be satisfactorily fitted by the Johnson’s SB distribution (Yu and Standish, 1990). Moreover, it has been shown that Johnson’s SB distribution can accommodate data with two modes spiked at the lower and upper bounds (Rigby and Burton, 2006). Based on this evidence we expected *M3.3* to be the best modelling strategy, especially given that we have limited prior knowledge on the shape of the distributions of utility decrements.

5

As in the log-normal case, we changed the signs of the data corresponding to $\gamma_i$ to their opposite. Therefore, a decrement’s distribution should have a lower bound $-b_k$ and an upper bound 0.
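The bounded behaviour of the Johnson’s SB transformation in equation (4) can be seen by simulating from it and comparing with a log-normal. In the sketch below the bound and the normal parameters are hypothetical values chosen for illustration, and `johnson_sb_draw` is our own helper name.

```python
import math
import random

random.seed(2)
N = 20_000
UPPER = 2.0  # hypothetical upper bound of the decrement size


def johnson_sb_draw(mu, sigma, upper):
    """One draw of upper * exp(c) / (1 + exp(c)) with c ~ N(mu, sigma^2)."""
    c = random.gauss(mu, sigma)
    # Algebraically identical form of the logistic transform.
    return upper / (1.0 + math.exp(-c))


sb_draws = [johnson_sb_draw(-1.0, 1.0, UPPER) for _ in range(N)]
lognormal_draws = [random.lognormvariate(-1.0, 1.0) for _ in range(N)]

print(f"Johnson SB: min {min(sb_draws):.3f}, max {max(sb_draws):.3f}")
print(f"log-normal: min {min(lognormal_draws):.3f}, max {max(lognormal_draws):.3f}")
```

Every Johnson’s SB draw lies strictly between 0 and the bound, whereas the log-normal draws are positive but have no upper limit.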

**5 Estimation and model comparison **

The most popular methods for estimating MIXL are simulated maximum likelihood (SML) and Bayesian. Each has relative merits (Regier, et al., 2009; Train, 2003). The SML method is widely used, as most econometric and statistical software packages provide standard routines to estimate MIXL based on this method.6 However, the Bayesian approach has several clear advantages that suit our case. First, we assume all the random coefficients are correlated, which leads to the estimation of a large covariance matrix. The SML method can be very time consuming in this case, and even with a large number of simulation draws, convergence is not always guaranteed. In contrast, the Bayesian approach estimates correlated MIXL and uncorrelated MIXL at almost the same speed (Train, 2003). Second, the SML method cannot estimate *M3.3* without fixing the bounds, while the Bayesian approach may estimate the bounds and other parameters simultaneously by using informative priors (we will show this in a moment). Therefore, in this study we chose to use the Bayesian method to estimate all the models, including the conditional logit, which is a special case of MIXL with its $\Sigma$ set as an empty matrix. The sampling scheme for estimating the MIXL models in preference space was given in Train (2003). The Matlab code written by Kenneth Train was used.7

6

For example, in Stata the –mixlogit– routine (Hole, 2007) can be used to estimate the MIXL models in preference space while the –gmnl– routine (Gu, et al., 2013) can be modified to estimate MIXL models in QALY space or WTP space (Fiebig, et al., 2010; Greene and Hensher, 2010; Hole and Kolstad, 2012).

It is also straightforward to estimate the MIXL models in QALY space including *M3.1* and *M3.2*; only a slight modification of the likelihood function is needed. The challenge comes from *M3.3*. As Train and Sonnier (2005) pointed out, in equation (4) the bound parameter $b_k$ is closely related to the variance of $c_{ik}$ and thus the model is under-identified. In the choice modelling literature, this under-identification is usually solved by fixing the $b_k$’s at a series of constants and then selecting the model with the best log-likelihood estimate. This approach is called “grid search”. The grid search method works well in the univariate case but for the multivariate situation it can be extremely laborious (Rigby and Burton, 2006). In our case, we have a 10-dimensional multivariate Johnson’s SB distribution, and identifying the optimal point in the 10-dimensional space is computationally infeasible. It is therefore necessary to seek an alternative solution.
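The scale of the problem is simple to quantify: a full grid over several bounds requires one model estimation per combination of candidate values. The sketch below counts the estimations needed, assuming (hypothetically) five candidate values per bound.

```python
def grid_points(n_bounds: int, values_per_bound: int) -> int:
    """Number of model estimations needed for a full grid search."""
    return values_per_bound ** n_bounds


# Hypothetical grid resolution: five candidate values for each bound.
print(grid_points(1, 5))   # univariate case: 5 estimations
print(grid_points(10, 5))  # ten bounds: 9765625 estimations
```

Since each estimation is itself a full MIXL run, even a coarse grid over ten bounds is out of reach.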

Our approach was based on using informative prior distributions on the bounds so that Bayesian identifiability of the model can be obtained.8 The priors were log-normal distributions, constructed based on the estimates from *M3.2*. More specifically, the chosen priors cover the largest 99th percentile of the 10 log-normal distributions estimated from *M3.2*, a reasonable assumption of the upper bound of the bound parameter $b_k$. The bound parameters were sampled as a vector using the random walk Metropolis-Hastings algorithm.9 In order to confidently use the post burn-in iterates for inference, it is necessary to check that the sampling scheme has converged. We judged convergence visually by running the sampling scheme from three different initial positions and plotting various functionals of the iterates on the same graph. Successful convergence was indicated by the overlap of the functionals from the three chains. Following Train (2003), we adopted a frequentist interpretation of the Bayesian estimates, i.e., the posterior means and standard deviations were used as the point estimates and standard errors. The decrements’ distributions were simulated using 100,000 random draws. The log-likelihood was calculated at the point estimates using 100,000 random draws. We also used AIC as the criterion for model comparisons.10 We did not use BIC as it penalises sample size heavily and thus, for very large sample sizes such as in this case, it is less informative in distinguishing between models that involve additional parameters.

7

Available from http://elsa.berkeley.edu/~train/software.html

8

The mechanism is explained in detail in Scheines, et al. (1999).
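For reference, the information criteria used for model comparison can be computed as in the sketch below. The log-likelihood value is hypothetical, and both the parameter count (a correlated MIXL with 11 random coefficients and no additional parameters) and the choice of the 15,465 observations as the sample size are assumptions made for illustration.

```python
def n_parameters(n_random_coefficients: int) -> int:
    """Means plus the free elements of a full covariance matrix."""
    k = n_random_coefficients
    return k + k * (k + 1) // 2


def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction 2k(k+1)/(n-k-1)."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


k = n_parameters(11)      # 11 means + 66 covariance terms = 77
log_lik = -8000.0         # hypothetical value, not a result from the paper
n_obs = 15_465            # number of observations reported in Section 2.1

print(k, aic(log_lik, k), aicc(log_lik, k, n_obs))
```

With a sample of this size the correction term is below one unit, which is why AIC and AICc lead to the same model ranking here.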

**6 Results **

**6.1 Estimation of Conditional logit (M1) **

The parameter estimates of the conditional logit model were given in Table 2. Utility decrements based on $\hat{\beta}/\hat{\alpha}$ were reported in the last column of Table 2. The interpretation of these numbers is that they represent an average person’s utility decrements.

[Insert Table 2 around here]

10

We also used AICc, which adds a correction for small sample sizes, but due to the large sample size, AICc is almost identical to AIC.

**6.2 Estimation of MIXL using preference space (M2.1 and M2.2) **

The parameter estimates of the two MIXL models using preference space were given in Table 3 (*M2.1*) and Table 4 (*M2.2*). Based on log-likelihood and AIC, both models were substantially better than *M1*. *M2.2* also completely dominated *M2.1* in terms of model fit, indicating that the log-normal distribution assumption on $\beta_i$ accommodated the data much better than the normal distribution assumption.

[Insert Table 3 around here]

[Insert Table 4 around here]

Based on these parameter estimates, the distributions of $\beta_i/\alpha_i$ were simulated. The means of these distributions were reported in the tables as the population mean estimates of utility decrements. By comparing these two sets of estimates with the estimates from *M1*, we found that for the size of level 2 decrements (e.g., MO2), overall *M2.2* > *M1* > *M2.1*. For the size of level 3 decrements (e.g., MO3), overall *M2.2* > *M2.1* > *M1*. The differences for the level 3 decrements were particularly large. To understand these differences, we plotted these simulated distributions of $\beta_i/\alpha_i$ in Figure 2 (for *M2.1*) and Figure 3 (for level 3 decrements from *M2.2*).

[Insert Figure 3 around here]

From Figure 2 we can see that all the distributions from *M2.1* have a significant proportion of the distribution greater than zero. This was particularly the case for the level 2 decrements. Given the EQ-5D is designed to be monotonic (level 2 is necessarily worse than level 1 in each dimension), this is a concern. This also explains why the mean estimates for level 2 decrements from *M2.1* were clearly smaller than the estimates from the other two models. Another finding is that extreme values existed on both tails. If these extreme values were spread evenly on both sides the mean estimates would not be affected, but unfortunately this is not the case.

As shown in Figure 3, the problem of outliers is more severe in *M2.2*. All the distributions have very thick right tails, indicating the population mean estimates are in fact determined by a group of extreme individuals. These extreme people may or may not exist in the real world, and it is questionable whether, in the policy making context, the resulting valuations of health states should be driven by their valuations. To correct for this concern, a reasonable approach is to drop the 1% or 2% most extreme values from the simulated data (Daly, et al., 2012; Hensher and Greene, 2003). In Figure 3, we plotted the decrements’ distributions again after discarding the 2% most extreme values. They appeared to have much thinner tails. We also re-calculated the means which were reported in the last column of Table 4. The level 3 decrements’ mean estimates are now very close to those from *M2.1* but still significantly larger than those from *M1*.
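The effect of discarding extreme values on a thick-tailed simulated distribution can be sketched as follows. The parameters are hypothetical, the draws are independent, and dropping the largest 2% of decrement sizes is our reading of the trimming rule; none of these are taken from the estimated models.

```python
import random
import statistics

random.seed(3)
N = 100_000

# Hypothetical decrement sizes: ratio of two independent log-normals,
# which is itself log-normal with a thick right tail.
sizes = sorted(
    random.lognormvariate(-3.0, 0.8) / random.lognormvariate(-1.5, 0.6)
    for _ in range(N)
)

# Discard the 2% largest simulated values and recompute the mean.
keep = int(0.98 * N)
trimmed = sizes[:keep]

full_mean = statistics.fmean(sizes)
trimmed_mean = statistics.fmean(trimmed)
median = statistics.median(sizes)

print(f"mean (all draws):    {full_mean:.3f}")
print(f"mean (2% trimmed):   {trimmed_mean:.3f}")
print(f"median (all draws):  {median:.3f}")
```

The trimmed mean is noticeably below the untrimmed mean, which shows how strongly the mean depends on the cut-off that is chosen.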


**6.3 Estimation of MIXL using QALY space (M3.1, M3.2, and M3.3) **

The parameter estimates of the first two MIXL models using QALY space were given in Table 5 (*M3.1*) and Table 6 (*M3.2*). Based on log-likelihood and AIC, *M3.2* was superior to *M3.1* in terms of model fit, indicating that the log-normal distribution assumption on the utility decrements was superior to the normal distribution assumption. Indeed, under *M3.1*, some decrements’ estimated distributions had substantial proportions greater than zero, which potentially led to the underestimation of these mean decrements. In the case of UA2, the sign clearly violates the monotonic condition.

[Insert Table 5 around here]

[Insert Table 6 around here]

Another interesting comparison is *M2.2* versus *M3.2*. The two models had very similar model fit with the latter slightly better. They also produced very similar utility decrements’ distributions, indicating that whilst the distribution of $\beta_i/\alpha_i$ from *M2.2* is not in closed form it is in fact very close to a log-normal distribution.

The parameter estimates of the final model *M3.3* were given in Table 7. When estimating the model we used informative prior distributions on all the 10 bounds: $b_k \sim LN(0, \sigma^2)$ where σ was chosen as 0.6. $LN(0, 0.36)$ covers a range from 0.25 to 4 (the 1st and 99th percentiles). The 99th percentiles of the 10 log-normal distributions estimated from *M3.2* (the smallest 0.81 and the largest 3.64) all lie well within this range.
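The percentile range quoted for the prior can be checked directly from the standard normal quantile function, as in this short sketch (the variable names are our own).

```python
import math
from statistics import NormalDist

SIGMA = 0.6  # standard deviation of the log of the bound under the prior

# 99th percentile of the standard normal distribution.
z99 = NormalDist().inv_cdf(0.99)

# 1st and 99th percentiles of a log-normal with log-mean 0 and log-sd SIGMA.
lower = math.exp(-SIGMA * z99)
upper = math.exp(SIGMA * z99)
print(f"1st percentile: {lower:.2f}, 99th percentile: {upper:.2f}")

# The smallest and largest 99th percentiles reported from M3.2.
for value in (0.81, 3.64):
    print(value, lower < value < upper)
```

Both reported values fall inside the interval, consistent with the statement that the prior comfortably covers the plausible range of the bounds.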

Based on log-likelihood and AIC, *M3.3* dominated *M3.2*, confirming that Johnson’s SB is indeed a better distribution than log-normal for describing the utility decrements’ distributions. We plotted the estimated distributions from both models in Figure 4, which clearly demonstrates the advantage of Johnson’s SB over log-normal: its shape is very close to log-normal but it has a very thin tail. Unsurprisingly, the mean decrement estimates from this model were close to those from *M2.2* and *M3.2* where extreme values were discarded.

[Insert Table 7 around here]

[Insert Figure 4 around here]

**7 Discussions and conclusions **

This study explored different estimation methods to provide estimates of the health state utility values that take better account of the individual heterogeneity in EQ-5D data that have been obtained using DCEs. This is important not only because previous methods do not exploit any of the individual heterogeneity in the raw data, but also because the methods for estimating health state utility values from DCE data need to model explicitly the variance as well as the means of the model parameters to provide population mean estimates of the health state utility values.

In this paper we have argued that the previous methods that did not model variance, such as conditional logit, essentially derive “an average person’s valuation” which is conceptually different from the “average valuation” from the population, the standard approach used in TTO studies. The paper has developed methods to derive an “average valuation” from the population using DCE data. This average valuation is then more comparable with the TTO approach.

Our methods were based on the MIXL framework and two types of models were proposed in
this paper. The first is preference space modelling, which derives the distribution of utility
decrements by taking the ratio of random variables. A significant problem associated with
this approach is that the distributions are induced from our assumptions on these random
parameters and so it is difficult to directly compare these induced distributions. For example,
in our empirical analysis, we showed that *M2.2* had a better model fit than *M2.1*. However,
this did not mean that the mean decrement estimates from the former model were
more reasonable than those from the latter. In fact, the estimates from *M2.2* were severely
affected by extreme values as the induced distributions had very thick right tails. Dropping
these extreme values would make the mean estimates more robust but the choice of the
appropriate point of truncation is arbitrary.

The second approach is based on an adaptation of methods developed in the WTP literature to deal with the drawbacks of preference space models. We have adapted the WTP space model to develop the second type of model in our analysis that is the QALY space model. It is essentially a re-parameterization of the preference space model so that the decrements’ distributions can be estimated and compared directly. In the empirical analysis we tried three different distribution assumptions for the 10 utility decrements: normal, log-normal, and Johnson’s SB. The last of these provided the best model fit.

Our analysis showcased the advantages of Johnson’s SB distribution over the normal and log-normal distributions, the most commonly used ones in choice modelling practice. Johnson’s SB distribution has not been widely used since it was first introduced to the choice modelling literature by Train and Sonnier (2005). The major reason may be the difficulty of its estimation, which often needs an extensive search of the bounds. In this paper, we showed that it is also possible to estimate the bounds by using informative priors on them. In the empirical analysis, we identified plausible priors from a model using log-normal assumptions whose estimation showed that the bounds are likely to be smaller than 3.64. Based on this, the prior distribution was constructed as $LN(0, \sigma^2)$ where σ was set as 0.6. We also did sensitivity analysis by changing σ and found that other values between 0.5 and 1 would lead to similar results but the convergence of the model became harder as σ increased.

By comparing the mean decrement estimates from *M3.3* with the estimates from the conditional logit model, we found that the latter were smaller in magnitude. The largest differences occurred for the level 3 decrements, in particular MO3 and AD3. It is worth mentioning that when we estimated the conditional logit we did not impose any constraints, while for *M3.3* we imposed a monotonic constraint on each dimension of the EQ-5D. To explore the impact of doing so, we re-estimated the conditional logit with its β constrained to be negative (i.e. to impose monotonicity), and doing so did not change the parameter estimates at all.

In Figure 5 we plotted the predicted values for all 243 health states described by the EQ-5D
using estimates from *M1* and *M3.3*. The ranking of the 243 health states from left to right is
based on the predictions from the conditional logit approach. From the graph we can see that
the conditional logit provides higher estimates of the utility values for almost all health states,
with the divergence increasing with worsening health states.
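The comparison can be reproduced approximately from the published mean decrements. The sketch below assumes an additive scoring rule in which full health is valued at 1 and each dimension at level 2 or 3 subtracts its mean decrement; the decrements are copied from Tables 2 and 7, and the names `predict`, `M1` and `M33` are introduced here for illustration only.

```python
from itertools import product

DIMS = ("MO", "SC", "UA", "PD", "AD")

# Mean utility decrements for levels 2 and 3, copied from Tables 2 and 7.
M1 = {"MO": (-0.12, -0.52), "SC": (-0.12, -0.29), "UA": (-0.10, -0.19),
      "PD": (-0.11, -0.50), "AD": (-0.14, -0.37)}
M33 = {"MO": (-0.14, -0.63), "SC": (-0.16, -0.36), "UA": (-0.09, -0.23),
       "PD": (-0.11, -0.59), "AD": (-0.14, -0.47)}


def predict(state, decrements):
    """Assumed additive rule: 1 plus the decrement for each level above 1."""
    return 1.0 + sum(decrements[d][lvl - 2]
                     for d, lvl in zip(DIMS, state) if lvl > 1)


states = list(product((1, 2, 3), repeat=5))  # all 243 EQ-5D states
# Rank states from best to worst by the conditional logit prediction.
ranked = sorted(states, key=lambda s: predict(s, M1), reverse=True)
gaps = [predict(s, M1) - predict(s, M33) for s in ranked]

worst = (3, 3, 3, 3, 3)
print(len(states))
print(round(predict(worst, M1), 2), round(predict(worst, M33), 2))
```

Under this rule the worst state, 33333, is valued at -0.87 using the *M1* decrements and -1.28 using the *M3.3* decrements, and the entries of `gaps` tend to grow towards the poor-health end of the ranking.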

DCEs offer a valuable alternative approach to the estimation of utility values, and this is an area with an increasing international profile. In particular, it can be argued that the task is less onerous for respondents. However, the methods for analysing the data, and then for translating the results into an algorithm for use in economic evaluation, remain contentious. We believe that the QALY space model approach outlined in this work represents a sensible way of using these data for this purpose, and should be explored using other generic quality of life instruments.

**References**

Bansback N, Brazier J, Tsuchiya A, Anis A. 2012. Using a discrete choice experiment to estimate health state utility values. *Journal of Health Economics* **31**: 306-318.

Bleichrodt H, Johannesson M. 1997. The validity of qalys: An experimental test of constant proportional tradeoff and utility independence. *Medical Decision Making* **17**: 21-32.

Bleichrodt N, Wakker P, Johannesson M. 1997. Characterizing qalys by risk neutrality. *Journal of Risk and Uncertainty* **15**: 107-114.

Bosch JL, Hammitt JK, Weinstein MC, Hunink MG. 1998. Estimating general-population utilities using one binary-gamble question per respondent. *Medical Decision Making* **18**: 381-390.

Brazier J. 2007. *Measuring and valuing health benefits for economic evaluation*. Oxford University Press: Oxford; New York.

Coast J, Flynn TN, Natarajan L, Sproston K, Lewis J, Louviere JJ, et al. 2008. Valuing the icecap capability index for older people. *Social Science & Medicine* **67**: 874-882.

Craig BM, Busschbach JJ, Salomon JA. 2009. Keep it simple: Ranking health states yields values similar to cardinal measurement approaches. *J Clin Epidemiol* **62**: 296-305.

Daly A, Hess S, Train K. 2012. Assuring finite moments for willingness to pay in random coefficient models. *Transportation* **39**: 19-31.

Devlin NJ, Tsuchiya A, Buckingham K, Tilling C. 2011. A uniform time trade off method for states better and worse than dead: Feasibility study of the 'lead time' approach. *Health Economics* **20**: 348-361.

Dolan P. 1997. Modeling valuations for euroqol health states. *Medical Care* **35**: 1095-1108.

Fiebig DG, Keane MP, Louviere J, Wasi N. 2010. The generalized multinomial logit model: Accounting for scale and coefficient heterogeneity. *Marketing Science* **29**: 393-421.

Flynn TN. 2010. Using conjoint analysis and choice experiments to estimate qaly values: Issues to consider. *Pharmacoeconomics* **28**: 711-722.

Flynn TN, Louviere JJ, Marley AA, Coast J, Peters TJ. 2008. Rescaling quality of life values from discrete choice experiments for use as qalys: A cautionary tale. *Population Health Metrics* **6**.

Greene WH, Hensher DA. 2010. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. *Transportation* **37**: 413–428.

Gu Y, Hole AR, Knox S. 2013. Estimating the generalized multinomial logit model in stata. *The Stata Journal* **in press**.

Hakim Z, Pathak DS. 1999. Modelling the euroqol data: A comparison of discrete choice conjoint and conditional preference modelling. *Health Economics* **8**: 103-116.

Hensher DA, Greene WH. 2003. The mixed logit model: The state of practice. *Transportation* **30**: 133-176.

Hole AR. 2007. Fitting mixed logit models by using maximum simulated likelihood. *The Stata Journal* **7**: 388-401.

Hole AR, Kolstad JR. 2012. Mixed logit estimation of willingness to pay distributions: A comparison of models in preference and wtp space using data from a health-related choice experiment. *Empirical Economics* **42**: 445-469.

Lancsar E, Wildman J, Donaldson C, Ryan M, Baker R. 2011. Deriving distributional weights for qalys through discrete choice experiments. *Journal of Health Economics* **30**: 466-478.

Norman R, King MT, Clarke D, Viney R, Cronin P, Street D. 2010. Does mode of administration matter? Comparison of online and face-to-face administration of a time trade-off task. *Qual Life Res* **19**: 499-508.

Pliskin JS, Shepard DS, Weinstein MC. 1980. Utility-functions for life years and health-status. *Operations Research* **28**: 206-224.

Ratcliffe J, Brazier J, Tsuchiya A, Symonds T, Brown M. 2009. Using dce and ranking data to estimate cardinal values for health states for deriving a preference-based single index from the sexual quality of life questionnaire. *Health Economics* **18**: 1261-1276.

Regier DA, Ryan M, Phimister E, Marra CA. 2009. Bayesian and classical estimation of mixed logit: An application to genetic testing. *Journal of Health Economics* **28**: 598-610.

Richardson J, McKie J, Bariola E. Review and critique of health related multi attribute utility instruments. Centre for Health Economics, Monash University, 2011.

Rigby D, Burton M. 2006. Modeling disinterest and dislike: A bounded bayesian mixed logit model of the uk market for gm food. *Environmental and Resource Economics* **33**: 485-509.

Ryan M, Netten A, Skatun D, Smith P. 2006. Using discrete choice experiments to estimate a preference-based measure of outcome--an application to social care for older people. *Journal of Health Economics* **25**: 927-944.

Scheines R, Hoijtink H, Boomsma A. 1999. Bayesian estimation and testing of structural equation models. *PSYCHOMETRIKA* **64**: 37-52.

Szende A, Oppe M, Devlin N, editors. *Eq-5d value sets: Inventory, comparative review and user guide*. Dordrecht, The Netherlands: Springer, 2007.

Train K. 2003. *Discrete choice methods with simulation*. Cambridge University Press: New York.

Train K, Sonnier G. Mixed logit with bounded distributions of correlated partworths. In: Scarpa R, Alberini A, editors. *Applications of simulation methods in environmental and resource economics*. Dordrecht, The Netherlands: Springer Publisher, 2005:117-134.

Train K, Weeks M. Discrete choice models in preference space and willingness-to-pay space. In: Scarpa R, Alberini A, editors. *Applications of simulation methods in environmental and resource economics*. Dordrecht, The Netherlands: Springer Publisher, 2005:1-16.

Viney R, Norman R, Brazier J, Cronin P, King M, Ratcliffe J, et al. 2013. An australian discrete choice experiment to value eq-5d health states. *Health Economics* **in press**.

Viney R, Norman R, King MT, Cronin P, Street DJ, Knox S, et al. 2011. Time trade-off derived eq-5d weights for australia. *Value Health* **14**: 928-936.

Yu AB, Standish N. 1990. A study of particle size distribution. *Powder Technology* **62**: 101-118.


**Table 1. The EQ-5D instrument**

| Dimension | Level | Description |
|---|---|---|
| Mobility (MO) | 1 | I have no problem in walking about |
| | 2 | I have some problems in walking about |
| | 3 | I am confined to bed |
| Self-Care (SC) | 1 | I have no problems with self-care |
| | 2 | I have some problems washing and dressing myself |
| | 3 | I am unable to wash and dress myself |
| Usual Activities (UA) | 1 | I have no problems with performing my usual activities |
| | 2 | I have some problems with performing my usual activities |
| | 3 | I am unable to perform my usual activities |
| Pain / Discomfort (PD) | 1 | I have no pain or discomfort |
| | 2 | I have moderate pain or discomfort |
| | 3 | I have extreme pain or discomfort |
| Anxiety / Depression (AD) | 1 | I am not anxious or depressed |
| | 2 | I am moderately anxious or depressed |
| | 3 | I am extremely anxious or depressed |


**Table 2. Conditional logit (M1)**

| Attribute | Parameter estimate (S.E.) | Level | Utility decrement |
|---|---|---|---|
| Time | 0.27 (0.007) | | |
| MO2*Time | -0.03 (0.004) | MO2 | -0.12 |
| MO3*Time | -0.14 (0.004) | MO3 | -0.52 |
| SC2*Time | -0.03 (0.005) | SC2 | -0.12 |
| SC3*Time | -0.08 (0.005) | SC3 | -0.29 |
| UA2*Time | -0.03 (0.005) | UA2 | -0.10 |
| UA3*Time | -0.05 (0.005) | UA3 | -0.19 |
| PD2*Time | -0.03 (0.004) | PD2 | -0.11 |
| PD3*Time | -0.13 (0.004) | PD3 | -0.50 |
| AD2*Time | -0.04 (0.004) | AD2 | -0.14 |
| AD3*Time | -0.10 (0.004) | AD3 | -0.37 |
| Log-likelihood | -8920 | | |
| No. of parameters | 11 | | |
| AIC | 17862 | | |


**Table 3. MIXL using preference space: ~Log-normal and ~Normal (M2.1)**

| Attribute | Parameter mean (S.E.) | Parameter S.D. (S.E.) | Level | Decrement mean | Decrement S.D. |
|---|---|---|---|---|---|
| Time | 0.53 (0.04) | 0.79 (0.04) | | | |
| MO2*Time | -0.19 (0.03) | 0.48 (0.03) | MO2 | -0.10 | 0.53 |
| MO3*Time | -1.10 (0.05) | 0.79 (0.04) | MO3 | -0.68 | 0.85 |
| SC2*Time | -0.21 (0.04) | 0.59 (0.04) | SC2 | -0.09 | 0.64 |
| SC3*Time | -0.60 (0.04) | 0.71 (0.04) | SC3 | -0.35 | 0.78 |
| UA2*Time | -0.15 (0.03) | 0.53 (0.03) | UA2 | -0.04 | 0.58 |
| UA3*Time | -0.40 (0.04) | 0.59 (0.04) | UA3 | -0.20 | 0.63 |
| PD2*Time | -0.16 (0.03) | 0.47 (0.03) | PD2 | -0.06 | 0.51 |
| PD3*Time | -1.03 (0.05) | 0.78 (0.04) | PD3 | -0.65 | 0.84 |
| AD2*Time | -0.25 (0.03) | 0.51 (0.03) | AD2 | -0.11 | 0.55 |
| AD3*Time | -0.86 (0.04) | 0.79 (0.04) | AD3 | -0.49 | 0.82 |
| Log-likelihood | -7816 | | | | |
| No. of parameters | 77 | | | | |
| AIC | 15786 | | | | |


**Table 4. MIXL using preference space: ~Log-normal and ~Log-normal (M2.2)**

| Attribute | Parameter mean (S.E.) | Parameter S.D. (S.E.) | Level | Decrement, original (S.D.) | Decrement, truncated (S.D.) |
|---|---|---|---|---|---|
| Time | -0.02 (0.08) | 1.63 (0.10) | | | |
| MO2*Time | -2.78 (0.24) | 1.72 (0.23) | MO2 | -0.13 (0.23) | -0.11 (0.12) |
| MO3*Time | -0.64 (0.08) | 1.53 (0.09) | MO3 | -0.77 (0.79) | -0.70 (0.56) |
| SC2*Time | -2.69 (0.25) | 1.60 (0.21) | SC2 | -0.15 (0.30) | -0.12 (0.15) |
| SC3*Time | -1.43 (0.11) | 1.56 (0.12) | SC3 | -0.42 (0.60) | -0.36 (0.36) |
| UA2*Time | -3.23 (0.35) | 1.92 (0.30) | UA2 | -0.10 (0.22) | -0.08 (0.10) |
| UA3*Time | -1.87 (0.14) | 1.71 (0.15) | UA3 | -0.25 (0.30) | -0.22 (0.20) |
| PD2*Time | -3.03 (0.28) | 1.88 (0.22) | PD2 | -0.11 (0.23) | -0.09 (0.11) |
| PD3*Time | -0.74 (0.08) | 1.64 (0.10) | PD3 | -0.73 (0.82) | -0.65 (0.56) |
| AD2*Time | -2.55 (0.20) | 1.99 (0.19) | AD2 | -0.14 (0.22) | -0.12 (0.13) |
| AD3*Time | -1.04 (0.10) | 1.82 (0.11) | AD3 | -0.55 (0.64) | -0.49 (0.43) |
| Log-likelihood | -7548.1 | | | | |
| No. of parameters | 77 | | | | |
| AIC | 15250 | | | | |


**Table 5. MIXL using QALY space: ~Log-normal and ~Normal (M3.1)**

| Attribute | Parameter mean (S.E.) | Parameter S.D. (S.E.) | Level | Decrement mean | Decrement S.D. |
|---|---|---|---|---|---|
| Time | 0.32 (0.12) | 1.84 (0.14) | | | |
| MO2*Time | -0.07 (0.02) | 0.37 (0.02) | MO2 | -0.07 | 0.37 |
| MO3*Time | -0.77 (0.03) | 0.61 (0.03) | MO3 | -0.77 | 0.61 |
| SC2*Time | -0.03 (0.03) | 0.42 (0.02) | SC2 | -0.03 | 0.42 |
| SC3*Time | -0.33 (0.03) | 0.54 (0.03) | SC3 | -0.33 | 0.54 |
| UA2*Time | 0.03 (0.03) | 0.39 (0.02) | UA2 | 0.03 | 0.39 |
| UA3*Time | -0.16 (0.03) | 0.43 (0.02) | UA3 | -0.16 | 0.43 |
| PD2*Time | -0.04 (0.02) | 0.37 (0.02) | PD2 | -0.04 | 0.37 |
| PD3*Time | -0.71 (0.03) | 0.59 (0.03) | PD3 | -0.71 | 0.59 |
| AD2*Time | -0.08 (0.02) | 0.37 (0.02) | AD2 | -0.08 | 0.37 |
| AD3*Time | -0.53 (0.03) | 0.56 (0.03) | AD3 | -0.53 | 0.56 |
| Log-likelihood | -7716 | | | | |
| No. of parameters | 77 | | | | |
| AIC | 15586 | | | | |


**Table 6. MIXL using QALY space: ~Log-normal and ~Log-normal (M3.2)**

| Attribute | Parameter mean (S.E.) | Parameter S.D. (S.E.) | Level | Decrement mean | Decrement S.D. |
|---|---|---|---|---|---|
| Time | 0.02 (0.08) | 1.68 (0.10) | | | |
| MO2*Time | -2.74 (0.23) | 1.15 (0.14) | MO2 | -0.13 | 0.21 |
| MO3*Time | -0.63 (0.04) | 0.83 (0.05) | MO3 | -0.76 | 0.75 |
| SC2*Time | -2.62 (0.21) | 1.20 (0.14) | SC2 | -0.15 | 0.27 |
| SC3*Time | -1.41 (0.08) | 1.03 (0.07) | SC3 | -0.42 | 0.57 |
| UA2*Time | -3.15 (0.33) | 1.27 (0.17) | UA2 | -0.10 | 0.19 |
| UA3*Time | -1.83 (0.11) | 0.90 (0.08) | UA3 | -0.24 | 0.27 |
| PD2*Time | -2.98 (0.25) | 1.26 (0.14) | PD2 | -0.11 | 0.22 |
| PD3*Time | -0.74 (0.05) | 0.90 (0.05) | PD3 | -0.72 | 0.80 |
| AD2*Time | -2.47 (0.17) | 1.01 (0.10) | AD2 | -0.14 | 0.19 |
| AD3*Time | -1.02 (0.06) | 0.90 (0.06) | AD3 | -0.54 | 0.61 |
| Log-likelihood | -7545 | | | | |
| No. of parameters | 77 | | | | |
| AIC | 15244 | | | | |


**Table 7. MIXL using QALY space: ~Log-normal and ~Johnson’s SB (M3.3)**

| Attribute | Parameter mean (S.E.) | Parameter S.D. (S.E.) | Bound (S.E.) | Level | Decrement mean | Decrement S.D. |
|---|---|---|---|---|---|---|
| Time | 0.13 (0.08) | 1.66 (0.09) | | | | |
| MO2*Time | -2.73 (0.63) | 2.28 (0.71) | -0.84 (0.39) | MO2 | -0.14 | 0.19 |
| MO3*Time | -0.21 (0.32) | 1.54 (0.21) | -1.36 (0.22) | MO3 | -0.63 | 0.37 |
| SC2*Time | -3.19 (0.72) | 2.95 (0.97) | -0.88 (0.27) | SC2 | -0.16 | 0.23 |
| SC3*Time | -1.02 (0.33) | 2.48 (0.62) | -0.97 (0.18) | SC3 | -0.36 | 0.32 |
| UA2*Time | -4.43 (0.92) | 2.95 (0.82) | -0.90 (0.21) | UA2 | -0.09 | 0.18 |
| UA3*Time | -1.45 (0.43) | 2.12 (0.70) | -0.79 (0.20) | UA3 | -0.23 | 0.23 |
| PD2*Time | -2.87 (0.85) | 2.79 (1.06) | -0.57 (0.23) | PD2 | -0.11 | 0.15 |
| PD3*Time | -0.59 (0.35) | 1.42 (0.20) | -1.53 (0.32) | PD3 | -0.59 | 0.38 |
| AD2*Time | -2.68 (0.62) | 2.94 (0.89) | -0.62 (0.11) | AD2 | -0.14 | 0.18 |
| AD3*Time | -1.42 (0.35) | 1.32 (0.17) | -1.91 (0.58) | AD3 | -0.47 | 0.38 |
| Log-likelihood | -7498 | | | | | |
| No. of parameters | 87 | | | | | |
| AIC | 15170 | | | | | |


**Figure 2. Kernel densities of utility decrements estimated from the preference space **
**model using normal distribution assumption (M2.1) **

The left panel displays the kernel densities of level 2 decrements and the right panel displays the kernel densities of level 3 decrements. All densities were estimated using 100,000 random draws.

[Figure 2 image not reproduced. Panels: MO2, SC2, UA2, PD2, AD2 (left); MO3, SC3, UA3, PD3, AD3 (right).]


**Figure 3. Kernel densities of utility decrements estimated from the preference space **
**model using log-normal distribution assumption (M2.2) **

The left panel displays the kernel densities of level 3 decrements estimated using 100,000 random draws and the right panel displays the kernel densities of level 3 decrements estimated using these random draws with the smallest 2% discarded.

[Figure 3 image not reproduced. Panels: MO3, SC3, UA3, PD3, AD3, each shown with no truncation (left) and with 2% truncation (right).]


**Figure 4. Utility decrements’ distributions estimated from two QALY space models **

The solid lines represent the distributions estimated from the QALY space model using log-normal distribution
assumption (*M3.2*) and the dotted lines represent the distributions estimated from the QALY space model using
Johnson’s SB distribution assumption (*M3.3*). The estimated log-normal distributions were all projected to the
negative real line.

[Figure 4 image not reproduced. Panels: MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, AD3.]


**Figure 5. Predicted EQ-5D health state utility values **

The solid line represents the predictions from the conditional logit (*M1*) and the dotted line represents the
predictions from the preferred QALY space model (*M3.3*). The ranking of the 243 health states from left to right
is based on the predictions from the conditional logit.

[Figure 5 image not reproduced. Horizontal axis: Health State; vertical axis: Utility Values.]