Estimating health state utility values from discrete choice experiments – a
QALY space model approach
Yuanyuan Gu, Richard Norman, Rosalie Viney
Centre for Health Economics Research and Evaluation, University of Technology, Sydney, Australia
Centre for Health Economics Research and Evaluation, University of Technology, Sydney,
PO BOX 123, Broadway, NSW, 2007, Australia. E-mail: email@example.com
Phone: +61 2 9514 9886 Fax: +61 2 9514 4730
Using discrete choice experiments (DCEs) to estimate health state utility values has become an important alternative to the standard methods such as the Time Trade-Off (TTO). Studies using DCEs have typically used the conditional logit to estimate the underlying utility function. We show that this approach will lead to the valuation of each health state from “an average person” in the population. By contrast, the standard approach that has been
developed for the TTO method is based on estimating the "average valuation" for a health state within the population. These are fundamentally different approaches conceptually and have different interpretations in policy evaluation. In this paper we point out that it is also possible to estimate the "average valuation" for a health state when using DCEs. The
estimation approach is based on the mixed logit (MIXL). In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay literature. These methods are applied to a data set collected using the EQ-5D. The results demonstrate that the preferred QALY space model provides lower estimates of the utility values than the conditional logit, with the divergence increasing with worsening health states.
For the evaluation of new health technologies, it is conventional to model their effects using the quality-adjusted life year (QALY). QALYs combine quality of life and life expectancy into a summary measure that reflects preferences for these two dimensions of health gain (Pliskin, et al., 1980). The use of cost-utility analysis, with outcomes measured in terms of QALYs, is now recommended by most health technology agencies internationally. A number of standard generic quality of life instruments have been developed for the purpose of measuring and valuing quality of life, to facilitate estimation of QALYs directly from patient reported outcomes (Brazier, 2007). These instruments, known as multi-attribute utility instruments, describe the health state space in terms of several dimensions of quality of life, and include a preference based scoring algorithm that can be interpreted on a cardinal scale. Typically, standard preference based valuation techniques such as the Standard Gamble (SG) and Time Trade-off (TTO) have been used to derive the scoring algorithms that assign the scores (known as utility values or QALY weights) to the universe of health states described by the instrument.
In the past decade, several authors have considered the use of discrete choice experiments (DCEs) to estimate health state utility values, as an alternative to TTO and SG based
techniques (Bansback, et al., 2012; Coast, et al., 2008; Flynn, 2010; Hakim and Pathak, 1999; Lancsar, et al., 2011; Ratcliffe, et al., 2009; Ryan, et al., 2006; Viney, et al., 2013). In the approach developed by Bansback, et al. (2012), and used by others, the health state utility values are estimated based on the conditional logit model. Broadly, in this approach, the conditional logit is used to estimate coefficients of the attributes that describe a health profile. Utility decrements associated with any move away from full health can be estimated for each dimension and level by computing the ratios between the estimated coefficients of the
non-time attributes and that of the time attribute. Utility values assigned to specific health states are then calculated by summing the relevant utility decrements and subtracting them from one. This approach has important conceptual differences from the approach that has been developed for the TTO and the SG. The standard approach that has been used in the QALY literature and in economic evaluation is based on finding the "average valuation" of a health state for the relevant population. Effectively this involves estimating the health state utility values for each individual in the population and then averaging these individual utility values over the whole population. In contrast, the approach using conditional logit is to find the valuation of a health state from "an average person" in the population. These are
conceptually different approaches and therefore have potentially different interpretations in policy evaluation.
In this paper we demonstrate that it is possible to estimate the “average valuation” for a health state when using DCEs. The estimation approach is based on the mixed logit (MIXL) which allows us to derive the population distributions of utility decrements and then the means of these distributions. In particular, we propose two types of models, one using preference space and the other using QALY space, a concept adapted from the willingness-to-pay (WTP) literature. The QALY space model has several advantages over the preference space model and the most significant one is that it allows us to directly estimate and compare different distribution assumptions for the utility decrements. A specific contribution is made to the estimation of a QALY space model with utility decrements assumed to follow a multivariate Johnson’s SB distribution. In the choice modelling literature this type of model has been very difficult to estimate due to an identification problem (Rigby and Burton, 2006; Train and Sonnier, 2005). In this paper we show that using informative priors on the bounds
may improve identification, and that estimating the bounds simultaneously with the other parameters is possible.
In this study, we develop methods to estimate utility values for EQ-5D health states although these methods could be applied to other instruments that are based on a linear additive model, such as the SF-6D. These methods are applied to a data set which has been previously used to estimate health state utility values. The utility values estimated from the selected MIXL model and the conditional logit are compared.
2 Valuing EQ-5D health states using DCEs
The EQ-5D, developed by the EuroQol Group, is the most widely used multi-attribute utility instrument (Richardson, et al., 2011; Szende, et al., 2007). It has five dimensions, intended to represent the major areas in which health changes can manifest: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. For the most commonly used version of the EQ-5D, each dimension contains three levels, loosely classified as ‘No Problems’, ‘Some Problems’, and ‘Extreme Problems’. Details are shown in Table 1. There are 3^5 = 243 potential states in the descriptive system.
[Insert Table 1 around here]
The traditional approach to value these 243 states has been to administer a TTO preference based task for a sample of health states in a population based sample, and then use regression based modelling to impute the values of the remaining health states (Dolan, 1997; Szende, et
al., 2007; Viney, et al., 2011). There is an extensive literature on this broad approach, including a series of examinations of its limitations, which may have contributed to the current trend of investigating alternative methods (Bosch, et al., 1998; Craig, et al., 2009; Norman, et al., 2010). For example, there have been explorations of alternative specifications of the TTO, including Lead-Time and Lag-Time TTOs (Devlin, et al., 2011). A review of the
development of using DCEs to value health states can be found in Bansback, et al. (2012).
2.1 The DCE data
Viney, et al. (2013) have developed a DCE based algorithm for the Australian population, and the data from that study are used in the current analysis. This section briefly describes the experiment. The DCE was developed and administered to a sample of the Australian general population. Respondents were asked to choose between health profiles described in terms of EQ-5D profiles and survival attributes. Each choice set included three options: two health profile options and an immediate death option. Each health profile option in a choice set was defined by five attributes covering the dimensions of the EQ-5D and a survival duration attribute. Five survival durations (1, 2, 4, 8 and 16 years) were included in the experiment.
The third option of immediate death was included to allow for a complete ranking of health profiles over the “worse than death” to “full health” utility space. The task for the respondent was to identify which of the three options was considered the best, and which the worst, thus providing a complete ranking within each choice set. An example of a choice set is provided in Figure 1.
Details of the experimental design can be found in Viney, et al. (2013). Although each choice set included an immediate death option, only the choice between the two non-death profiles was considered. Therefore the analysis was based on a constructed choice set with only the rankings of these two profiles.
A total of 1,120 individuals consented to participate in the survey and were eligible to
participate. Of these, 1,031 completed the survey, giving a response rate of 92.1%. Viney, et al. (2013) showed that overall the characteristics of those who completed the task are broadly comparable to the characteristics of the general Australian population. Each respondent faced 15 choice sets, which translates into 15,465 observations.
3 Using conditional logit
As Viney, et al. (2013) and Bansback, et al. (2012) both noted, an additive utility function with life expectancy and the levels of the EQ-5D would be inconsistent with the theoretical framework that underpins QALYs, because the QALY model requires that all health states have the same utility at death, i.e., as survival approaches zero, the systematic component of the utility function should similarly tend to zero. This satisfies the zero condition implicit in the QALY model (Bleichrodt and Johannesson, 1997; Bleichrodt, et al., 1997). Therefore, the utility of option j in choice set s for survey respondent i is assumed to be

U_isj = α TIME_isj + (β′X_isj)TIME_isj + ε_isj,  (1)

where X_isj represents a set of dummy variables relating to the levels of the EQ-5D health state, TIME_isj represents survival, and the error terms ε_isj are i.i.d. Gumbel distributed. (Flynn, et al. (2008) argue that including the immediate death option in the choice modelling would violate random utility theory, as some respondents may always choose survival over death no matter what health profiles are offered.)
It is conventional to use the best level of each dimension as the reference category. In this case X_isj excludes the dummies representing the best levels, leaving 10 elements: MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, and AD3. For example, the health state denoted 12221 translates into the vector (0, 0, 1, 0, 1, 0, 1, 0, 0, 0).
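As a concrete illustration, this dummy coding can be sketched as a small helper (an illustrative function, not code from the study):

```python
import numpy as np

# Map an EQ-5D state label (e.g. "12221") to the 10-element dummy vector
# (MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, AD3), with level 1
# (the best level of each dimension) as the reference category.
def eq5d_dummies(state: str) -> np.ndarray:
    assert len(state) == 5 and set(state) <= {"1", "2", "3"}
    x = np.zeros(10, dtype=int)
    for d, level in enumerate(state):
        if level == "2":
            x[2 * d] = 1      # level-2 dummy for dimension d
        elif level == "3":
            x[2 * d + 1] = 1  # level-3 dummy for dimension d
    return x

print(eq5d_dummies("12221"))  # [0 0 1 0 1 0 1 0 0 0]
print(eq5d_dummies("11111"))  # [0 0 0 0 0 0 0 0 0 0]  (full health)
```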
In the current literature, the α and β terms have been assumed to be constant across individuals; under this assumption, equation (1) leads to the conditional logit model. This is our baseline model, denoted M1.
The estimation of α and β does not directly yield the valuation of health states. An approach is needed to anchor the latent utility scale to the health state utility scale. There are several ways to derive this algorithm (Bansback, et al., 2012; Ratcliffe, et al., 2009; Viney, et al., 2013). The main idea is that the utility value of a health state is its marginal utility of TIME on the latent scale, i.e.,

∂U_isj/∂TIME_isj = α + β′X_isj.

In the case of full health, the marginal utility of TIME on the latent scale is

∂U_isj/∂TIME_isj = α,

which needs to be normalised to 1 under the QALY model. Hence the normalising constant is α, and the utility score for a health state is

1 + β′X_isj/α.

The utility decrements are therefore β/α.
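The anchoring step can be sketched numerically. The α and β values below are illustrative placeholders, not estimates from this study:

```python
import numpy as np

# Anchoring the latent scale: the marginal utility of TIME for a state with
# dummy vector x is alpha + beta'x; dividing by alpha (the full-health value)
# normalises full health to 1, so the utility score is 1 + beta'x / alpha.
# alpha and beta are illustrative values, not the paper's estimates.
alpha = 0.25
beta = np.array([-0.02, -0.12, -0.015, -0.09, -0.01,
                 -0.06, -0.02, -0.10, -0.015, -0.08])

def utility_score(x, alpha, beta):
    return 1.0 + beta @ x / alpha

# Dummy vector for state 12221: (MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, AD3)
x_12221 = np.array([0, 0, 1, 0, 1, 0, 1, 0, 0, 0])
print(round(utility_score(x_12221, alpha, beta), 3))  # 0.82
```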
3.1 Average valuation versus an average person’s valuation
As noted by Bansback, et al. (2012), the objective is to derive the population mean utility scores for all possible health states, which requires estimation of population mean utility decrements. The conditional logit parameter estimates α̂ and β̂ represent population mean preferences for the attributes that describe a health profile. In effect, the estimate β̂/α̂ represents an "average person" in the population whose preference parameters are exactly α̂ and β̂; that is, β̂/α̂ is actually the estimate of this average person's utility decrements. This is conceptually different from the population mean utility decrements, which would be estimated by deriving, for each person i in the target population, that person's α_i and β_i, and using these to calculate the individual's utility decrements β_i/α_i. The population mean utility decrements are then computed as the average of all the individual decrements. Mathematically, this procedure gives the mean of the ratios,

(1/N) Σ_i β_i/α_i,

which may or may not be close to β̂/α̂, i.e., the ratio of the means, (Σ_i β_i)/(Σ_i α_i).
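The gap between the two quantities is easy to demonstrate by simulation; the parameter distributions below are purely illustrative:

```python
import numpy as np

# Simulated individual-level parameters (purely illustrative): alpha_i
# log-normal, beta_i normal. The population mean decrement is the mean of
# the individual ratios beta_i / alpha_i, which generally differs from the
# ratio of the two means (what conditional logit effectively delivers).
rng = np.random.default_rng(0)
n = 100_000
alpha_i = rng.lognormal(mean=-1.0, sigma=0.5, size=n)
beta_i = rng.normal(loc=-0.05, scale=0.02, size=n)

mean_of_ratios = np.mean(beta_i / alpha_i)       # "average valuation"
ratio_of_means = beta_i.mean() / alpha_i.mean()  # "an average person"
print(mean_of_ratios, ratio_of_means)
```

With these distributions the two summaries diverge noticeably, because E[β/α] = E[β]·E[1/α] under independence, and E[1/α] exceeds 1/E[α] for any non-degenerate α.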
It is worth noting that this issue does not arise with the TTO approach. When using TTO, a sample of health states is selected and respondents' utility scores for these health states are elicited. These scores are then used as the dependent variable in a model regressed on X. In this case the regression coefficients, representing population mean utility decrements, are directly estimated using least squares (Dolan, 1997; Viney, et al., 2011).
4 Using MIXL: preference space versus QALY space
One possible way to estimate the population mean utility decrements is to use a framework based on random parameters. Equation (1) can be rewritten as

U_isj = α_i TIME_isj + (β_i′X_isj)TIME_isj + ε_isj,  (2)

where α_i and β_i are both random. The induced model is called the MIXL. Under this framework, we first estimate the distributions of β_i/α_i (i.e., the distributions of utility decrements) and then derive the means of these distributions.
Finding the distribution of the ratio of two random variables is a longstanding problem. It has been investigated in particular in the WTP literature, where α represents the coefficient of price and β represents the coefficients of the non-price attributes in a DCE. Hensher and Greene (2003) and Daly, et al. (2012) discussed the major challenges in this area of research. The first challenge is that β/α may not have finite moments unless α is assumed to follow certain specific distributions such as the log-normal. In our case, assuming α to be a log-normal random variable is reasonable because α represents a person's preference for the duration of life in full health and should always be positive.

The second challenge concerns the extreme values that arise from the reciprocal of a random variable. As long as α can take values very close to zero, 1/α will produce very large numbers. This problem is increasingly acute when α's distribution has thick tails (e.g., Student's t and log-normal).
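A quick simulation illustrates the reciprocal problem; the log-normal parameters below are arbitrary:

```python
import numpy as np

# Illustration of the reciprocal problem: when alpha_i can take values near
# zero, 1/alpha_i produces extreme draws, and the simulated mean of
# beta/alpha can be dominated by a handful of tiny alpha draws.
rng = np.random.default_rng(1)
draws = rng.lognormal(mean=-1.0, sigma=1.2, size=1_000_000)
recip = 1.0 / draws

print(np.median(recip))          # stable summary
print(np.percentile(recip, 99))  # already large
print(recip.max())               # driven by a few near-zero alpha draws
```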
We therefore estimated two MIXL models:
M2.1: log(α_i) and β_i follow a multivariate normal distribution with mean μ and variance Σ;
M2.2: log(α_i) and log(−β_i) follow a multivariate normal distribution with mean μ and variance Σ.
The second model (M2.2) has the advantage of ensuring that the decrements' distributions are strictly negative, and the disadvantage of inducing many extreme values. In contrast, the first model (M2.1) suffers less from extreme values but cannot guarantee that each individual's utility decrements are strictly negative.
Another challenge, which has not been addressed in the literature, is that the distribution of β_i/α_i is induced from our assumptions on the distributions of α_i and β_i, so it is not possible to directly compare and test distributional assumptions for the decrements themselves. In the WTP literature, alternative methods have been developed to meet these challenges (Daly, et al., 2012). Among them the most promising development has been the WTP space model (Train and Weeks, 2005). The name "WTP space" was proposed as a contrast to the "preference space" on which the framework described above is based. The WTP space model is essentially a re-parameterisation of equation (2) so that the distribution of the decrements can be directly assumed and estimated. We adapted this idea to our context and named the approach the "QALY space model". We now re-parameterise equation (2) as

U_isj = α_i TIME_isj + α_i(γ_i′X_isj)TIME_isj + ε_isj,  (3)

where γ_i = β_i/α_i. Under this new framework we may estimate and compare models that assume different distributions for the utility decrements γ_i. (For estimating M2.2, and for any other model in which a log-normal distribution is assumed for negative coefficients, we change the signs of the corresponding data to their opposites.) For the EQ-5D DCE data, we estimated three models:
M3.1: log(α_i) and γ_i follow a multivariate normal distribution with mean μ and variance Σ;
M3.2: log(α_i) and log(−γ_i) follow a multivariate normal distribution with mean μ and variance Σ;
M3.3: log(α_i) and log(−γ_ik/(κ_k + γ_ik)), for k = 1, …, K, follow a multivariate normal distribution with mean μ and variance Σ, where K represents the size of γ_i (i.e., the number of utility decrements) and each κ_k represents a positive unknown scalar parameter.
Models M3.1 and M3.2 assume normal and log-normal distributions for the utility decrements respectively. Both have merits and flaws: the normal distribution has thin tails but cannot ensure that everyone has negative decrements, while the log-normal distribution is the opposite; it ensures that everyone has negative decrements but has a thick right tail that may lead to very large mean estimates.
Model M3.3 assumes a Johnson's SB distribution for the utility decrements, i.e.,

−γ_ik = κ_k exp(v_ik)/(1 + exp(v_ik)),  (4)

where v_ik is normally distributed. This is a special case of Johnson's SB distribution with the lower bound set to 0 and the upper bound κ_k to be estimated. (As in the log-normal case, we changed the signs of the corresponding data to their opposites; a decrement's distribution therefore has lower bound −κ_k and upper bound 0.) This distribution combines the merits of the normal and the log-normal: thin tails and only positive values. The literature also shows that a wide variety of distributions, such as the normal, log-normal, Weibull, and modified beta, can be satisfactorily fitted by Johnson's SB distribution (Yu and Standish, 1990). Moreover, Johnson's SB distribution can accommodate data with two modes spiked at the lower and upper bounds (Rigby and Burton, 2006). Based on this evidence we expected M3.3 to be the best modelling strategy, especially given our limited prior knowledge of the shape of the distributions of utility decrements.
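For intuition, the bounded Johnson's SB form can be simulated directly; the κ and the moments of v below are illustrative, not estimates:

```python
import numpy as np

# Sampling from the (one-dimensional) Johnson's SB form used here: with v
# normal and bound kappa > 0, gamma = -kappa * exp(v) / (1 + exp(v)) lies
# strictly in (-kappa, 0), so every simulated decrement is negative and the
# distribution has thin tails. kappa and the moments of v are illustrative.
rng = np.random.default_rng(2)
kappa = 1.5
v = rng.normal(loc=-0.5, scale=0.8, size=100_000)
gamma = -kappa * np.exp(v) / (1.0 + np.exp(v))

print(gamma.min() > -kappa, gamma.max() < 0.0)  # True True
print(round(gamma.mean(), 3))                   # simulated mean decrement
```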
5 Estimation and model comparison
The most popular methods for estimating MIXL are simulated maximum likelihood (SML) and Bayesian estimation. Each has relative merits (Regier, et al., 2009; Train, 2003). The SML method is widely used, as most econometric and statistics packages provide standard routines for estimating MIXL by this method. (For example, in STATA the –mixlogit– routine (Hole, 2007) can be used to estimate the MIXL models in preference space, while the –gmnl– routine (Gu, et al., 2013) can be modified to estimate MIXL models in QALY space or WTP space (Fiebig, et al., 2010; Greene and Hensher, 2010; Hole and Kolstad, 2012).) However, the Bayesian approach has several clear advantages in our case. First, we assume all the random coefficients are correlated, which leads to the estimation of a large covariance matrix. The SML method can be very time consuming in this case, and even with a large number of simulation draws, convergence is not always guaranteed. In contrast, the Bayesian approach estimates correlated MIXL and uncorrelated MIXL at almost the same speed (Train, 2003). Second, the SML method cannot estimate M3.3 without fixing the bounds, whereas the Bayesian approach can estimate the bounds and the other parameters simultaneously by using informative priors, as we show below. Therefore, in this study we chose the Bayesian method to estimate all the models, including the conditional logit, which is a special case of MIXL with its covariance matrix set as empty. The sampling scheme for estimating the MIXL models in preference space is given in Train (2003). The Matlab code written by Kenneth Train (available from http://elsa.berkeley.edu/~train/software.html) was used.

It is also straightforward to estimate the MIXL models M3.1 and M3.2 in QALY space; only a slight modification of the likelihood function is needed. The challenge comes from M3.3. As Train and Sonnier (2005) pointed out, in equation (4) the bound parameter κ_k is closely related to the variance of v_ik, and thus the model is under-identified. In the choice modelling literature, this under-identification is usually resolved by fixing the κ_k's at a series of constants and then selecting the model with the best log-likelihood estimate, an approach called "grid search". The grid search method works well in the univariate case, but for the multivariate situation it can be extremely laborious (Rigby and Burton, 2006). In our case, we have a 10-dimensional multivariate Johnson's SB distribution, and identifying the optimal point in the 10-dimensional space is computationally infeasible. It is therefore necessary to seek an alternative.

Our approach was based on using informative prior distributions on the bounds so that Bayesian identifiability of the model can be obtained; the mechanism is explained in detail in Scheines, et al. (1999). The priors were log-normal distributions, constructed based on the estimates from M3.2. More specifically, the chosen priors cover the largest 99th percentile of the 10 log-normal distributions estimated from M3.2, a reasonable assumption for the upper bound of the bound parameters κ. The bound parameters were sampled as a vector using the random walk Metropolis-Hastings algorithm. In order to confidently use the post burn-in iterates for inference, it is necessary to check that the sampling scheme has converged. We judged convergence visually by running the sampling scheme from three different initial positions and plotting various functionals of the iterates on the same graph. Successful convergence was indicated by the overlap of the functionals from the three chains. Following Train (2003), we adopted a frequentist interpretation of the Bayesian estimates, i.e., the posterior means and standard deviations were used as the point estimates and standard errors. The decrements' distributions were simulated using 100,000 random draws. The log-likelihood was calculated at the point estimates using 100,000 random draws. We also used AIC as the criterion for model comparisons. We did not use BIC, as it penalises sample size heavily and thus, for very large sample sizes such as ours, it is less informative in distinguishing between models that involve additional parameters.
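As a sketch of the sampling step for the bounds, a random walk Metropolis-Hastings update on log(κ) with the log-normal prior might look as follows. The likelihood below is a stand-in placeholder; in the actual model it would come from the MIXL choice probabilities:

```python
import numpy as np

# A minimal random-walk Metropolis-Hastings sketch for a vector of positive
# bound parameters kappa, sampled on the log scale with the log-normal prior
# log(kappa_k) ~ N(0, 0.6^2). The log-likelihood is a placeholder peaked
# around kappa ~= 1.5 in every dimension; in the full model it would be the
# MIXL likelihood evaluated at the current parameters.
rng = np.random.default_rng(3)

def log_prior(log_kappa):
    return -0.5 * np.sum((log_kappa / 0.6) ** 2)

def log_lik(log_kappa):
    return -0.5 * np.sum((np.exp(log_kappa) - 1.5) ** 2) / 0.1

def rw_mh(n_iter=5000, dim=10, step=0.05):
    cur = np.zeros(dim)                      # start at kappa = 1 in every dim
    cur_lp = log_prior(cur) + log_lik(cur)
    chain = np.empty((n_iter, dim))
    for t in range(n_iter):
        prop = cur + step * rng.normal(size=dim)
        prop_lp = log_prior(prop) + log_lik(prop)
        if np.log(rng.uniform()) < prop_lp - cur_lp:   # accept/reject
            cur, cur_lp = prop, prop_lp
        chain[t] = cur
    return np.exp(chain)                     # back on the kappa scale

kappas = rw_mh()
print(kappas[2500:].mean(axis=0))  # post burn-in means of the 10 bounds
```

In practice convergence would be checked as in the paper, by running the sampler from several initial positions and overlaying functionals of the chains.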
6.1 Estimation of Conditional logit (M1)
The parameter estimates of the conditional logit model are given in Table 2. Utility decrements based on the estimated ratios β/α are reported in the last column of Table 2. These numbers represent an average person's utility decrements.
[Insert Table 2 around here]
(We also computed AICc, which adjusts AIC for sample size; due to the large sample size here, AICc is almost identical to AIC.)
6.2 Estimation of MIXL using preference space (M2.1 and M2.2)
The parameter estimates of the two MIXL models using preference space are given in Table 3 (M2.1) and Table 4 (M2.2). Based on log-likelihood and AIC, both models were substantially better than M1. M2.2 also completely dominated M2.1 in terms of model fit, indicating that the log-normal distribution assumption on β_i accommodated the data much better than the normal distribution assumption.
[Insert Table 3 around here]
[Insert Table 4 around here]
Based on these parameter estimates, the distributions of β_i/α_i were simulated. The means of these distributions are reported in the tables as the population mean estimates of the utility decrements. Comparing these two sets of estimates with the estimates from M1, we found that for the size of the level 2 decrements (e.g., MO2), overall M2.2 > M1 > M2.1. For the size of the level 3 decrements (e.g., MO3), overall M2.2 > M2.1 > M1. The differences for the level 3 decrements were particularly significant. To understand these differences, we plotted the simulated distributions of β_i/α_i in Figure 2 (for M2.1) and Figure 3 (for the level 3 decrements from M2.2).
[Insert Figure 2 around here]
[Insert Figure 3 around here]
From Figure 2 we can see that all the distributions from M2.1 have a significant proportion of their mass above zero. This was particularly the case for the level 2 decrements. Given that the EQ-5D is designed to be monotonic (level 2 is necessarily worse than level 1 in each dimension), this is a concern. It also explains why the mean estimates for the level 2 decrements from M2.1 were clearly smaller than those from the other two models. Another finding is that extreme values existed in both tails. If these extreme values were spread evenly on both sides the mean estimates would not be affected, but unfortunately that is not the case.
As shown in Figure 3, the problem of outliers is more severe in M2.2. All the distributions have very thick right tails, indicating that the population mean estimates are in fact determined by a group of extreme individuals. These extreme people may or may not exist in the real world, and it is questionable whether, in the policy making context, the resulting valuations of health states should be driven by their valuations. To address this concern, a reasonable approach is to drop the 1% or 2% most extreme values from the simulated data (Daly, et al., 2012; Hensher and Greene, 2003). In Figure 3, we plotted the decrements' distributions again after discarding the 2% most extreme values. They appeared to have much thinner tails. We also re-calculated the means, which are reported in the last column of Table 4. The level 3 decrements' mean estimates are now very close to those from M2.1 but still significantly larger than those from M1.
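The trimming step can be sketched as follows; the simulated decrement draws are illustrative:

```python
import numpy as np

# Trimming simulated ratio draws before averaging, along the lines suggested
# by Hensher and Greene (2003) and Daly, et al. (2012): drop the 2% most
# extreme draws (here, the 2% largest in absolute value) and recompute the
# mean. The thick-tailed draws below (negated log-normal) are illustrative.
rng = np.random.default_rng(4)
draws = -rng.lognormal(mean=-2.0, sigma=1.5, size=100_000)

cutoff = np.percentile(np.abs(draws), 98)
trimmed = draws[np.abs(draws) <= cutoff]

print(round(draws.mean(), 4), round(trimmed.mean(), 4))
```

The trimmed mean is noticeably less negative, illustrating how much a thick right tail (here, a thick left tail after the sign change) can pull the untrimmed mean; the choice of truncation point, however, remains arbitrary.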
6.3 Estimation of MIXL using QALY space (M3.1, M3.2, and M3.3)
The parameter estimates of the first two MIXL models using QALY space are given in Table 5 (M3.1) and Table 6 (M3.2). Based on log-likelihood and AIC, M3.2 was superior to M3.1 in terms of model fit, indicating that the log-normal distribution assumption for the utility decrements was superior to the normal distribution assumption. Indeed, under M3.1, some decrements' estimated distributions had substantial proportions above zero, which potentially led to the underestimation of these mean decrements. In the case of UA2, the sign clearly violates the monotonicity condition.
[Insert Table 5 around here]
[Insert Table 6 around here]
Another interesting comparison is M2.2 versus M3.2. The two models had very similar model fit, with the latter slightly better. They also produced very similar utility decrement distributions, indicating that although the distribution of β_i/α_i from M2.2 has no closed form, it is in fact very close to a log-normal distribution.
The parameter estimates of the final model M3.3 are given in Table 7. When estimating the model we used informative prior distributions on all 10 bounds: log(κ_k) ~ N(0, σ²), where σ was chosen as 0.6. The implied log-normal distribution LN(0, 0.36) covers a range from 0.25 to 4 (the 1st and 99th percentiles). The 99th percentiles of the 10 log-normal distributions estimated from M3.2 (the smallest 0.81 and the largest 3.64) all lie well within this range.
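The quoted percentile range of this prior can be verified directly with the standard library:

```python
from statistics import NormalDist
from math import exp

# If log(kappa) ~ N(0, 0.6^2), the implied log-normal prior on kappa has 1st
# and 99th percentiles of roughly 0.25 and 4, covering the largest 99th
# percentile (3.64) of the decrement distributions estimated from M3.2.
nd = NormalDist(mu=0.0, sigma=0.6)
p01 = exp(nd.inv_cdf(0.01))
p99 = exp(nd.inv_cdf(0.99))
print(round(p01, 2), round(p99, 2))  # 0.25 4.04
```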
Based on log-likelihood and AIC, M3.3 dominated M3.2, confirming that Johnson's SB is indeed a better distribution than the log-normal for describing the utility decrements' distributions. We plotted the estimated distributions from both models in Figure 4, which clearly demonstrates Johnson's SB's advantage over the log-normal: its shape is very close to log-normal but it has a very thin tail. Unsurprisingly, the mean decrement estimates from this model were close to those from M2.2 and M3.2 after extreme values were discarded.
[Insert Table 7 around here]
[Insert Figure 4 around here]
7 Discussion and conclusions
This study explored different estimation methods to provide estimates of the health state utility values that take better account of the individual heterogeneity in EQ-5D data that have been obtained using DCEs. This is important not only because previous methods do not exploit any of the individual heterogeneity in the raw data, but also because the methods for estimating health state utility values from DCE data need to model explicitly the variance as well as the means of the model parameters to provide population mean estimates of the health state utility values.
In this paper we have argued that previous methods that do not model variance, such as the conditional logit, essentially derive "an average person's valuation", which is conceptually different from the "average valuation" of the population, the standard approach used in TTO studies. The paper has developed methods to derive an "average valuation" from the population using DCE data. This average valuation is then more comparable with the TTO approach.
Our methods were based on the MIXL framework, and two types of models were proposed in this paper. The first is preference space modelling, which derives the distribution of utility decrements by taking the ratio of random variables. A significant problem with this approach is that the distributions are induced from the assumptions on the random parameters, so it is difficult to directly compare the induced distributions. For example, in our empirical analysis, we showed that M2.2 had better model fit than M2.1. However, this did not mean that the mean decrement estimates from the former model were more reasonable than those from the latter. In fact, the estimates from M2.2 were severely affected by extreme values, as the induced distributions had very thick right tails. Dropping these extreme values makes the mean estimates more robust, but the choice of the truncation point is arbitrary.
The second approach is based on an adaptation of methods developed in the WTP literature to deal with the drawbacks of preference space models. We adapted the WTP space model to develop the second type of model in our analysis: the QALY space model. It is essentially a re-parameterisation of the preference space model so that the decrements' distributions can be estimated and compared directly. In the empirical analysis we tried three different distributional assumptions for the 10 utility decrements: normal, log-normal, and Johnson's SB. The last of these provided the best model fit.
Our analysis showcased the advantages of Johnson's SB distribution over the normal and log-normal distributions, the most commonly used ones in choice modelling practice. Johnson's SB distribution has not been widely used since it was first introduced to the choice modelling literature by Train and Sonnier (2005). The major reason may be the difficulty of its estimation, which often requires an extensive search over the bounds. In this paper, we showed that it is also possible to estimate the bounds by placing informative priors on them. In the empirical analysis, we identified plausible priors from a model using log-normal assumptions, whose estimation showed that the bounds are likely to be smaller than 3.64. Based on this, the prior distribution was constructed as log(κ_k) ~ N(0, σ²), where σ was set as 0.6. We also performed sensitivity analysis by changing σ and found that other values between 0.5 and 1 led to similar results, although convergence of the model became harder as σ increased.
Comparing the mean decrement estimates from M3.3 with the estimates from the conditional logit model, we found that the latter were smaller in size. The largest differences occurred for the level 3 decrements, in particular MO3 and AD3. It is worth mentioning that when we estimated the conditional logit we did not impose any constraints, while for M3.3 we imposed a monotonicity constraint on each dimension of the EQ-5D. To explore the impact of this, we re-estimated the conditional logit with its β constrained to be negative (i.e. imposing monotonicity); doing so did not change the parameter estimates at all.
In Figure 5 we plotted the predicted values for all 243 health states described by the EQ-5D using estimates from M1 and M3.3. The ranking of the 243 health states from left to right is based on the predictions from the conditional logit approach. From the graph we can see that the conditional logit provides higher estimates of the utility values for almost all health states, with the divergence increasing with worsening health states.
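The predictions in Figure 5 follow from the additive structure of the utility function: a health state's value is 1 (full health) plus the estimated decrement for every dimension not at level 1. A sketch using the conditional logit decrements from Table 2, assuming this simple additive scoring rule:

```python
# Predicted EQ-5D utility values from the conditional logit decrements
# in Table 2, assuming the additive scoring rule described above.
DECREMENTS = {
    "MO": {2: -0.12, 3: -0.52},
    "SC": {2: -0.12, 3: -0.29},
    "UA": {2: -0.10, 3: -0.19},
    "PD": {2: -0.11, 3: -0.50},
    "AD": {2: -0.14, 3: -0.37},
}
DIMENSIONS = ["MO", "SC", "UA", "PD", "AD"]

def predict(state):
    """state: five digits giving the levels of MO, SC, UA, PD, AD."""
    value = 1.0
    for dim, level in zip(DIMENSIONS, state):
        if int(level) > 1:
            value += DECREMENTS[dim][int(level)]
    return value

assert predict("11111") == 1.0                 # full health
assert abs(predict("33333") - (-0.87)) < 1e-9  # worst state
```

Repeating this for all 3^5 = 243 states and sorting by the conditional logit prediction reproduces the ordering used on the horizontal axis of Figure 5.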
DCEs offer a valuable alternative approach to the estimation of utility values, and this is an area with an increasing international profile. In particular, it can be argued that the DCE task is less onerous for respondents. However, the methods for analysing the data, and then for translating the results into an algorithm for use in economic evaluation, remain contentious. We believe that the QALY space model approach outlined in this work represents a sensible way of using these data for this purpose, and it should be explored using other generic quality of life instruments.
Bansback N, Brazier J, Tsuchiya A, Anis A. 2012. Using a discrete choice experiment to estimate health state utility values. Journal of Health Economics 31: 306-318.
Bleichrodt H, Johannesson M. 1997. The validity of QALYs: An experimental test of constant proportional tradeoff and utility independence. Medical Decision Making 17: 21-32.
Bleichrodt H, Wakker P, Johannesson M. 1997. Characterizing QALYs by risk neutrality. Journal of Risk and Uncertainty 15: 107-114.
Bosch JL, Hammitt JK, Weinstein MC, Hunink MG. 1998. Estimating general-population utilities using one binary-gamble question per respondent. Medical Decision Making 18: 381-390.
Brazier J. 2007. Measuring and valuing health benefits for economic evaluation. Oxford University Press: Oxford; New York.
Coast J, Flynn TN, Natarajan L, Sproston K, Lewis J, Louviere JJ, et al. 2008. Valuing the ICECAP capability index for older people. Social Science & Medicine 67: 874-882.
Craig BM, Busschbach JJ, Salomon JA. 2009. Keep it simple: Ranking health states yields values similar to cardinal measurement approaches. Journal of Clinical Epidemiology 62: 296-305.
Daly A, Hess S, Train K. 2012. Assuring finite moments for willingness to pay in random coefficient models. Transportation 39: 19-31.
Devlin NJ, Tsuchiya A, Buckingham K, Tilling C. 2011. A uniform time trade off method for states better and worse than dead: Feasibility study of the 'lead time' approach. Health Economics.
Dolan P. 1997. Modeling valuations for EuroQol health states. Medical Care 35: 1095-1108.
Fiebig DG, Keane MP, Louviere J, Wasi N. 2010. The generalized multinomial logit model: Accounting for scale and coefficient heterogeneity. Marketing Science 29: 393-421.
Flynn TN. 2010. Using conjoint analysis and choice experiments to estimate QALY values: Issues to consider. Pharmacoeconomics 28: 711-722.
Flynn TN, Louviere JJ, Marley AA, Coast J, Peters TJ. 2008. Rescaling quality of life values from discrete choice experiments for use as QALYs: A cautionary tale. Population Health Metrics.
Greene WH, Hensher DA. 2010. Does scale heterogeneity across individuals matter? An empirical assessment of alternative logit models. Transportation 37: 413-428.
Gu Y, Hole AR, Knox S. 2013. Estimating the generalized multinomial logit model in Stata. The Stata Journal, in press.
Hakim Z, Pathak DS. 1999. Modelling the EuroQol data: A comparison of discrete choice conjoint and conditional preference modelling. Health Economics 8: 103-116.
Hensher DA, Greene WH. 2003. The mixed logit model: The state of practice. Transportation 30: 133-176.
Hole AR. 2007. Fitting mixed logit models by using maximum simulated likelihood. The Stata Journal 7: 388-401.
Hole AR, Kolstad JR. 2012. Mixed logit estimation of willingness to pay distributions: A comparison of models in preference and WTP space using data from a health-related choice experiment. Empirical Economics 42: 445-469.
Lancsar E, Wildman J, Donaldson C, Ryan M, Baker R. 2011. Deriving distributional weights for QALYs through discrete choice experiments. Journal of Health Economics 30: 466-478.
Norman R, King MT, Clarke D, Viney R, Cronin P, Street D. 2010. Does mode of administration matter? Comparison of online and face-to-face administration of a time trade-off task. Quality of Life Research 19: 499-508.
Pliskin JS, Shepard DS, Weinstein MC. 1980. Utility-functions for life years and health-status. Operations Research 28: 206-224.
Ratcliffe J, Brazier J, Tsuchiya A, Symonds T, Brown M. 2009. Using DCE and ranking data to estimate cardinal values for health states for deriving a preference-based single index from the sexual quality of life questionnaire. Health Economics 18: 1261-1276.
Regier DA, Ryan M, Phimister E, Marra CA. 2009. Bayesian and classical estimation of mixed logit: An application to genetic testing. Journal of Health Economics 28: 598-610.
Richardson J, McKie J, Bariola E. 2011. Review and critique of health related multi attribute utility instruments. Centre for Health Economics, Monash University.
Rigby D, Burton M. 2006. Modeling disinterest and dislike: A bounded Bayesian mixed logit model of the UK market for GM food. Environmental and Resource Economics 33: 485-509.
Ryan M, Netten A, Skatun D, Smith P. 2006. Using discrete choice experiments to estimate a preference-based measure of outcome -- an application to social care for older people. Journal of Health Economics 25: 927-944.
Scheines R, Hoijtink H, Boomsma A. 1999. Bayesian estimation and testing of structural equation models. Psychometrika 64: 37-52.
Szende A, Oppe M, Devlin N, editors. 2007. EQ-5D value sets: Inventory, comparative review and user guide. Springer: Dordrecht, The Netherlands.
Train K. 2003. Discrete choice methods with simulation. Cambridge University Press: New York.
Train K, Sonnier G. 2005. Mixed logit with bounded distributions of correlated partworths. In: Scarpa R, Alberini A, editors. Applications of simulation methods in environmental and resource economics. Springer: Dordrecht, The Netherlands; 117-134.
Train K, Weeks M. 2005. Discrete choice models in preference space and willingness-to-pay space. In: Scarpa R, Alberini A, editors. Applications of simulation methods in environmental and resource economics. Springer: Dordrecht, The Netherlands; 1-16.
Viney R, Norman R, Brazier J, Cronin P, King M, Ratcliffe J, et al. 2013. An Australian discrete choice experiment to value EQ-5D health states. Health Economics, in press.
Viney R, Norman R, King MT, Cronin P, Street DJ, Knox S, et al. 2011. Time trade-off derived EQ-5D weights for Australia. Value in Health 14: 928-936.
Yu AB, Standish N. 1990. A study of particle size distribution. Powder Technology 62: 101-118.
Table 1. The EQ-5D instrument
Dimension                Level  Description
Mobility (MO)            1      I have no problems in walking about
                         2      I have some problems in walking about
                         3      I am confined to bed
Self-Care (SC)           1      I have no problems with self-care
                         2      I have some problems washing and dressing myself
                         3      I am unable to wash and dress myself
Usual Activities (UA)    1      I have no problems with performing my usual activities
                         2      I have some problems with performing my usual activities
                         3      I am unable to perform my usual activities
Pain/Discomfort (PD)     1      I have no pain or discomfort
                         2      I have moderate pain or discomfort
                         3      I have extreme pain or discomfort
Anxiety/Depression (AD)  1      I am not anxious or depressed
                         2      I am moderately anxious or depressed
                         3      I am extremely anxious or depressed
Table 2. Conditional logit (M1)
Attribute          Estimate (S.E.)   Level   Utility decrement
Time                0.27 (0.007)
MO2*Time           -0.03 (0.004)     MO2     -0.12
MO3*Time           -0.14 (0.004)     MO3     -0.52
SC2*Time           -0.03 (0.005)     SC2     -0.12
SC3*Time           -0.08 (0.005)     SC3     -0.29
UA2*Time           -0.03 (0.005)     UA2     -0.10
UA3*Time           -0.05 (0.005)     UA3     -0.19
PD2*Time           -0.03 (0.004)     PD2     -0.11
PD3*Time           -0.13 (0.004)     PD3     -0.50
AD2*Time           -0.04 (0.004)     AD2     -0.14
AD3*Time           -0.10 (0.004)     AD3     -0.37
Log-likelihood     -8920
No. of parameters  11
AIC                17862
Table 3. MIXL using preference space: ~Log-normal and ~Normal (M2.1)
Attribute     Mean (S.E.)    S.D. (S.E.)   Level   Decrement mean   Decrement S.D.
Time           0.53 (0.04)   0.79 (0.04)
MO2*Time      -0.19 (0.03)   0.48 (0.03)   MO2     -0.10            0.53
MO3*Time      -1.10 (0.05)   0.79 (0.04)   MO3     -0.68            0.85
SC2*Time      -0.21 (0.04)   0.59 (0.04)   SC2     -0.09            0.64
SC3*Time      -0.60 (0.04)   0.71 (0.04)   SC3     -0.35            0.78
UA2*Time      -0.15 (0.03)   0.53 (0.03)   UA2     -0.04            0.58
UA3*Time      -0.40 (0.04)   0.59 (0.04)   UA3     -0.20            0.63
PD2*Time      -0.16 (0.03)   0.47 (0.03)   PD2     -0.06            0.51
PD3*Time      -1.03 (0.05)   0.78 (0.04)   PD3     -0.65            0.84
AD2*Time      -0.25 (0.03)   0.51 (0.03)   AD2     -0.11            0.55
AD3*Time      -0.86 (0.04)   0.79 (0.04)   AD3     -0.49            0.82
Log-likelihood -7816
No. of parameters 77
AIC 15786
Table 4. MIXL using preference space: ~Log-normal and ~Log-normal (M2.2)
Attribute     Mean (S.E.)    S.D. (S.E.)   Level   Original (S.D.)   Truncated (S.D.)
Time          -0.02 (0.08)   1.63 (0.10)
MO2*Time      -2.78 (0.24)   1.72 (0.23)   MO2     -0.13 (0.23)      -0.11 (0.12)
MO3*Time      -0.64 (0.08)   1.53 (0.09)   MO3     -0.77 (0.79)      -0.70 (0.56)
SC2*Time      -2.69 (0.25)   1.60 (0.21)   SC2     -0.15 (0.30)      -0.12 (0.15)
SC3*Time      -1.43 (0.11)   1.56 (0.12)   SC3     -0.42 (0.60)      -0.36 (0.36)
UA2*Time      -3.23 (0.35)   1.92 (0.30)   UA2     -0.10 (0.22)      -0.08 (0.10)
UA3*Time      -1.87 (0.14)   1.71 (0.15)   UA3     -0.25 (0.30)      -0.22 (0.20)
PD2*Time      -3.03 (0.28)   1.88 (0.22)   PD2     -0.11 (0.23)      -0.09 (0.11)
PD3*Time      -0.74 (0.08)   1.64 (0.10)   PD3     -0.73 (0.82)      -0.65 (0.56)
AD2*Time      -2.55 (0.20)   1.99 (0.19)   AD2     -0.14 (0.22)      -0.12 (0.13)
AD3*Time      -1.04 (0.10)   1.82 (0.11)   AD3     -0.55 (0.64)      -0.49 (0.43)
Log-likelihood -7548.1
No. of parameters 77
AIC 15250
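The "Truncated" column in Table 4 reports each decrement's mean and standard deviation after discarding the smallest 2% of simulated draws, which tames the heavy left tail that the log-normal ratio can produce. A minimal sketch of this truncation step (the distribution parameters below are illustrative, not our estimates):

```python
import math
import random

rng = random.Random(0)
# Simulate a negative log-normal decrement with a heavy left tail
# (illustrative location and scale, not the estimated parameters).
draws = sorted(-math.exp(rng.gauss(-0.64, 1.53)) for _ in range(100_000))

cut = int(0.02 * len(draws))   # the 2% most extreme (most negative) draws
truncated = draws[cut:]        # keep the remaining 98%

mean_all = sum(draws) / len(draws)
mean_trunc = sum(truncated) / len(truncated)
assert mean_trunc > mean_all   # truncation pulls the mean toward zero
```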
Table 5. MIXL using QALY space: ~Log-normal and ~Normal (M3.1)
Attribute     Mean (S.E.)    S.D. (S.E.)   Level   Decrement mean   Decrement S.D.
Time           0.32 (0.12)   1.84 (0.14)
MO2*Time      -0.07 (0.02)   0.37 (0.02)   MO2     -0.07            0.37
MO3*Time      -0.77 (0.03)   0.61 (0.03)   MO3     -0.77            0.61
SC2*Time      -0.03 (0.03)   0.42 (0.02)   SC2     -0.03            0.42
SC3*Time      -0.33 (0.03)   0.54 (0.03)   SC3     -0.33            0.54
UA2*Time       0.03 (0.03)   0.39 (0.02)   UA2      0.03            0.39
UA3*Time      -0.16 (0.03)   0.43 (0.02)   UA3     -0.16            0.43
PD2*Time      -0.04 (0.02)   0.37 (0.02)   PD2     -0.04            0.37
PD3*Time      -0.71 (0.03)   0.59 (0.03)   PD3     -0.71            0.59
AD2*Time      -0.08 (0.02)   0.37 (0.02)   AD2     -0.08            0.37
AD3*Time      -0.53 (0.03)   0.56 (0.03)   AD3     -0.53            0.56
Log-likelihood -7716
No. of parameters 77
AIC 15586
Table 6. MIXL using QALY space: ~Log-normal and ~Log-normal (M3.2)
Attribute     Mean (S.E.)    S.D. (S.E.)   Level   Decrement mean   Decrement S.D.
Time           0.02 (0.08)   1.68 (0.10)
MO2*Time      -2.74 (0.23)   1.15 (0.14)   MO2     -0.13            0.21
MO3*Time      -0.63 (0.04)   0.83 (0.05)   MO3     -0.76            0.75
SC2*Time      -2.62 (0.21)   1.20 (0.14)   SC2     -0.15            0.27
SC3*Time      -1.41 (0.08)   1.03 (0.07)   SC3     -0.42            0.57
UA2*Time      -3.15 (0.33)   1.27 (0.17)   UA2     -0.10            0.19
UA3*Time      -1.83 (0.11)   0.90 (0.08)   UA3     -0.24            0.27
PD2*Time      -2.98 (0.25)   1.26 (0.14)   PD2     -0.11            0.22
PD3*Time      -0.74 (0.05)   0.90 (0.05)   PD3     -0.72            0.80
AD2*Time      -2.47 (0.17)   1.01 (0.10)   AD2     -0.14            0.19
AD3*Time      -1.02 (0.06)   0.90 (0.06)   AD3     -0.54            0.61
Log-likelihood -7545
No. of parameters 77
AIC 15244
Table 7. MIXL using QALY space: ~Log-normal and ~Johnson’s SB (M3.3)
Attribute     Mean (S.E.)    S.D. (S.E.)   Bound (S.E.)   Level   Decrement mean   Decrement S.D.
Time           0.13 (0.08)   1.66 (0.09)
MO2*Time      -2.73 (0.63)   2.28 (0.71)   -0.84 (0.39)   MO2     -0.14            0.19
MO3*Time      -0.21 (0.32)   1.54 (0.21)   -1.36 (0.22)   MO3     -0.63            0.37
SC2*Time      -3.19 (0.72)   2.95 (0.97)   -0.88 (0.27)   SC2     -0.16            0.23
SC3*Time      -1.02 (0.33)   2.48 (0.62)   -0.97 (0.18)   SC3     -0.36            0.32
UA2*Time      -4.43 (0.92)   2.95 (0.82)   -0.90 (0.21)   UA2     -0.09            0.18
UA3*Time      -1.45 (0.43)   2.12 (0.70)   -0.79 (0.20)   UA3     -0.23            0.23
PD2*Time      -2.87 (0.85)   2.79 (1.06)   -0.57 (0.23)   PD2     -0.11            0.15
PD3*Time      -0.59 (0.35)   1.42 (0.20)   -1.53 (0.32)   PD3     -0.59            0.38
AD2*Time      -2.68 (0.62)   2.94 (0.89)   -0.62 (0.11)   AD2     -0.14            0.18
AD3*Time      -1.42 (0.35)   1.32 (0.17)   -1.91 (0.58)   AD3     -0.47            0.38
Log-likelihood -7498
No. of parameters 87
AIC 15170
Figure 2. Kernel densities of utility decrements estimated from the preference space model using normal distribution assumption (M2.1)
The left panel displays the kernel densities of level 2 decrements and the right panel displays the kernel densities of level 3 decrements. All densities were estimated using 100,000 random draws.
Figure 3. Kernel densities of utility decrements estimated from the preference space model using log-normal distribution assumption (M2.2)
The left panel displays the kernel densities of level 3 decrements estimated using 100,000 random draws and the right panel displays the kernel densities of the same decrements estimated from these random draws with the smallest 2% discarded.
Figure 4. Utility decrements’ distributions estimated from two QALY space models
The solid lines represent the distributions estimated from the QALY space model using log-normal distribution assumption (M3.2) and the dotted lines represent the distributions estimated from the QALY space model using Johnson’s SB distribution assumption (M3.3). The estimated log-normal distributions were all projected to the negative real line.
Figure 5. Predicted EQ-5D health state utility values
The solid line represents the predictions from the conditional logit (M1) and the dotted line represents the predictions from the preferred QALY space model (M3.3). The ranking of the 243 health states from left to right is based on the predictions from the conditional logit.