In 2009–2012, one study explored how changing the number of responses elicited from respondents might affect estimates of WTP [ 204 ]; another looked at parents’ preferences for management of attention-deficit hyperactivity disorder [ 206 ]; one study looked at general public preferences for long-term care [ 137 ]; two further studies looked at preferences for human papillomavirus vaccine, one examining societal preferences [ 207 ] and the other [ 63 ] mothers’ preferences; another study looked at the valuation of diagnostic testing for idiopathic developmental disability by the general population [ 208 ]; another looked at various stakeholder groups’ preferences for coagulation factor concentrates to treat hemophilia [ 145 ]; one study looked at general public preferences for tele-endoscopy services [ 158 ]; another compared Dutch and German preferences for health insurance [ 214 ]; one paper looked at public and decision-maker preferences for pharmaceutical subsidy decisions [ 215 ]; one study explored how individuals perceive various coronary heart disease factors [ 203 ], whilst another described the relative importance of major adverse cardiac and cerebrovascular events for use in analyzing trials [ 212 ]. Two other DCEs were performed in the area of quality improvement: one investigated how best to disseminate evidence-based practices to addiction service providers and administrators [ 205 ], while the other investigated which indicators had the greatest impact on the decisions of health service inspectors assessing the quality of mental health care [ 211 ]. Other applications included a study on the preferences of health workers in Burkina Faso for health insurance payment mechanisms [ 209 ]; a study on how respondents valued reductions in mortality risk attributable to climate change [ 210 ]; and a study on preferences for remediating contaminated sites to reduce cancer risk [ 213 ].
Primarily, the chosen attributes, and thus the preferences elicited by the DCE, depend on the specific research question. Even if the study objective is the same, the precise issue might differ. Pedersen et al. and Turner et al., for example, both aim to assess primary care consultations in general, but while the former assess preferences regarding different organizational characteristics, the latter estimate the relative importance of continuity of care compared with other aspects of primary care. Unsurprisingly, therefore, these two studies obtain different results regarding preferences for primary health care: Pedersen et al. find the attribute “Waiting time” to be the most important one, whereas Turner and colleagues find the process attribute “Information and explanation” to be the most significant. Their different research questions may lead to a different selection of attributes and consequently different results, even though the study objectives are the same. In this context, a replication study using the same research question and the same attributes and levels as an existing DCE, but comparing different regions and/or populations, would be a useful addition to the literature.
Methods: We searched the published literature for papers that conducted DCEs to assess user preferences for HIV testing.
Findings: We identified 237 publications; 14 studies conducted in 10 countries met inclusion criteria. Overall, test cost was one of the strongest drivers of preference, with participants preferring free or very low-cost testing. Confidentiality was a salient concern, particularly among key populations and persons who never tested. Participants in resource-limited settings preferred short travel distance and integration of HIV testing with other services. There was substantial heterogeneity across participant characteristics. For example, while women preferred home testing, high-risk groups (e.g. male porters, female bar workers) and men who had not tested in the last year preferred traveling a short distance for testing. HIV self-testing (HIVST) had high acceptability, particularly among those who had never HIV tested, although most users preferred blood-based sample collection over oral swabs. Participants highly valued post-test counselling availability after HIVST.
Discrete choice experiments (DCEs) and conjoint analysis (CA) are increasingly used to address health policy issues. This is because the DCE and CA approaches have theoretical foundations in the characteristics theory of demand, which assumes that goods, services, or healthcare provision can be valued in terms of their characteristics (or attributes). As a result, such analysis is grounded in economic theory, lending theoretical validity to the approach.
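To make the characteristics idea concrete, the sketch below values two service profiles through a linear additive utility, a common starting specification in DCE analysis. The attribute names and part-worth weights are hypothetical, invented for illustration, and do not come from any study cited here.

```python
# Hypothetical attributes and part-worth weights, for illustration only.
PART_WORTHS = {"waiting_time_weeks": -0.15, "cost": -0.02, "continuity_of_care": 0.6}


def profile_utility(profile, part_worths=PART_WORTHS):
    """Linear additive utility: sum of each attribute level times its weight."""
    return sum(part_worths[name] * level for name, level in profile.items())


# Two hypothetical profiles described only by their attribute levels.
a = {"waiting_time_weeks": 2, "cost": 20, "continuity_of_care": 1}
b = {"waiting_time_weeks": 1, "cost": 40, "continuity_of_care": 0}
print(profile_utility(a), profile_utility(b))
```

Because each profile is valued only through its attribute levels, observing which profile respondents choose lets a DCE recover the relative weights of the attributes themselves.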
Discrete choice experiments are increasingly used in health services research, but primarily to assess patient-stated preferences and willingness to pay for different models of health care service delivery [47-50]. Only a small number of studies have used this methodology to analyse the job preferences of health care providers. The aim of this article is to review the existing literature on the use of discrete choice experiments to study HR issues in both developed and developing countries. The intention is to draw lessons on the value of this relatively new methodology for informing HR policy development in developing countries. This paper first introduces the basic principles of DCE methods and then describes the methodology of our literature review. The main part of the paper describes the DCE studies we identified and summarizes their findings. The final discussion focuses on some cross-cutting lessons as well as the advantages and limitations of DCE methods for HR research.
Methods
However, our results should be considered in light of several limitations. First, gray literature such as reports, policy documents, and dissertations was not included in the review, nor were protocol papers. Although such reports may be relevant to the topic of interest, gray literature is not peer-reviewed and therefore may not meet the high standards of quality associated with peer-reviewed publications. Inclusion of gray literature would also have biased the results, given that papers related to work known by the authors and their network would have been more likely to be identified than other works. Second, there are limitations to using volume of research output as a measure of research effort. Due to publication bias, studies with unfavorable results may not be published, leading to under-representation of the actual volume of work carried out in the field. Finally, because these studies lacked follow-up data, it is unclear whether DCE use effectively influenced implementation strategies and subsequent outcomes. Whether stakeholder DCEs direct implementation activities to the best approach or outcome remains to be demonstrated in future studies.
4.4. Limitations of the systematic review approach
Although we took care to avoid missing relevant articles (e.g. by using a test library), the search strings may be insufficiently sensitive to capture all available studies on the reliability and validity of DCEs in the non-market environmental valuation literature. Adding more search terms might have permitted a more sensitive search, but at the cost of specificity (Pullin and Stewart, 2006). The diverse ways in which reliability and validity are conceptualised and reported in the literature prevent a more comprehensive search without much greater resources. The use of consistent terminology in validity and reliability testing would assist future systematic reviews. Nevertheless, we believe that the results are representative of studies testing reliability and validity and provide a good assessment of the extent to which the peer-reviewed literature has reported empirical evidence of the reliability and validity of DCEs. Given the diversity and relative paucity of studies, especially the very small sub-samples for specific types of validity tests, we did not attempt a meta-analysis. Moreover, the very different contexts, treatments and DCE designs prevent us from identifying factors that determine whether a specific method made DCEs more likely to be reliable and/or valid. Unless determining these factors was specifically the focus of a controlled test (within a study), such an analysis would need a large number of studies to control for confounding variables.
The NGT could also be useful in selecting the initial set of attributes. Participants could first be asked individually to generate a list of important medication attributes, followed by a discussion refining the list by adding, merging, or removing attributes, and finally by individual ranking of the most important attributes. This was not done in our study because many potential attributes had already been identified by the literature review and we also aimed to assess the impact of the NGT session on rank order. However, our patients had the opportunity to add attributes to the list. Our study could
The attributes and levels describing the different consultation scenarios were identified through a review of the existing literature and semi-structured interviews with primary care managers and District managers of Tuscan Local Health Authorities, and they were validated in a focus group. To avoid placing a significant cognitive burden on respondents that could alter the trade-off between the attributes, the number of attributes was limited to the three most important factors that emerged. Also considering the results of previous DCEs [4, 10-12, 20], plausible levels were assigned to each attribute (Table 1) [18, 19]. A full factorial design was adopted, yielding 3³ (27) combinations. The 27
MNL was for decades the workhorse of choice modeling, and we recommend it as a natural first model to estimate. Where to go next after MNL is not always clear and depends on the research objectives, but a basic first step would be the estimation of a mixed logit model to account for the panel structure of the data, providing more reliable standard errors and moving away from proportional substitution (by relaxing IIA). It also allows for unobserved preference heterogeneity by allowing coefficients to vary randomly across individuals. Whether one takes a Frequentist or Bayesian approach to the estimation of mixed logit comes down in part to the preferences of the researcher, but with the use of simulation methods the distinction between the two approaches is becoming less pronounced, and recent evidence suggests little difference in estimates. The focus on mixed logit in the health economics literature has often been motivated by interest in unobserved heterogeneity. To our minds, the other two reasons for exploring mixed logit are at least as important. Having said that, exploration of heterogeneity can be important and has received much attention. If one views the distribution of preference heterogeneity as discrete rather than continuous, a latent class model would be appropriate. Because it allows preference parameters to differ between classes, an advantage of latent class modeling is that heterogeneity can be interpreted in terms of class type and class membership. Another form of heterogeneity gaining attention is scale heterogeneity. A modification to the MNL leads to the heteroscedastic logit, which allows between-person differences in scale to be modeled as a function of covariates. Alternatively, interest in unobserved scale
The observed increase in the total number of DCEs in health economics was similar to the trend reported in prior reviews [ 6 , 7 , 11 ], but less consistent from year to year (Fig. 3). This less consistent increase might be explained by the presence of many competing stated preference methods [ 4 , 5 , 347 ]. We hypothesise that other methods may be increasing in popularity or becoming more useful in health settings [ 348 ]. Examples of such methods may include BWS case 1 and case 2 [ 349 – 351 ], which were not included in this review. Additionally, in this review, we excluded a significant number of studies (n = 31) that addressed methodological aspects of DCEs rather than conducting empirical research. The presence of such studies may indicate that knowledge about DCEs in health has increased and that there is more focus on studies to develop the method. Examples include simulation studies about experimental design, studies comparing the outcomes of a DCE with those of other stated preference methods, and studies examining different model specifications [ 352 – 354 ]. This might be another explanation for the less consistent increase in DCE application studies.
11 studies (39.3%) after 2010 (Table 3). Mixed logit relaxes the restrictive assumptions of the commonly used multinomial logit model by allowing preferences for attributes to vary between participants, a heterogeneity that is likely to be high in the fairly diverse health worker populations covered by many of these studies. It does this by introducing an individual-level utility estimate for each attribute, calculated from the mean utility estimate for that attribute and an individual-specific deviation from the mean [29,70]. Although flexible, the mixed logit model poses a number of challenges, such as the choice of which parameters to define as random. Moreover, the size of these individual-specific variances is likely to vary within and between participants, reducing the precision of utility estimates rather than increasing it. The latent class model has the same advantage over the multinomial logit as mixed logit, but assumes that there are two or more classes (or groups) of participants underlying the data, each with more homogeneous tastes. Which class each participant belongs to is not known to the researcher, but class membership is assumed to be related to observed variables such as attitudes and/or socio-demographic characteristics. Latent class models have been used only rarely in health DCEs, with none in this review and just one in de Bekker-Grob et al., however
Households might follow the CBHI enrollment decisions of other households. To assess and test for herding bias, respondents with CBHI awareness were asked to respond to the following statement: “My neighbors and friends were important sources of information when I decide to join/Not to join community-based health insurance scheme.” Respondents without CBHI awareness were instead asked the statement in the following form: “In the future when I decide to join or not to join community-based health insurance scheme my neighbors and friends will be an important source of information.” This variable was measured on a four-point Likert scale (1 = strongly agree; 4 = strongly disagree). Afterward, the responses were regrouped into two categories, “agree” and “disagree”, for numerical significance and to simplify the analyses and data interpretation.
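A minimal sketch of the recoding step, assuming responses are stored as integer codes 1 to 4. The text does not say where the cut-off between the two categories falls, so grouping codes 1-2 as “agree” and 3-4 as “disagree” is an assumption, as are the helper names `recode_herding` and `herding_dummy`.

```python
# Assumed grouping: the text says only that the 4-point item (1 = strongly
# agree ... 4 = strongly disagree) was regrouped into "agree" and "disagree";
# mapping 1-2 to agree and 3-4 to disagree is an assumption.
LIKERT_TO_BINARY = {1: "agree", 2: "agree", 3: "disagree", 4: "disagree"}


def recode_herding(responses):
    """Collapse 4-point Likert codes into two categories.

    Raises ValueError for any code outside 1-4 so that miscoded data are not
    silently grouped.
    """
    recoded = []
    for r in responses:
        if r not in LIKERT_TO_BINARY:
            raise ValueError(f"unexpected Likert code: {r!r}")
        recoded.append(LIKERT_TO_BINARY[r])
    return recoded


def herding_dummy(responses):
    """Hypothetical 0/1 herding indicator: 1 if the respondent agrees."""
    return [1 if c == "agree" else 0 for c in recode_herding(responses)]


print(recode_herding([1, 4, 2, 3]))  # ['agree', 'disagree', 'agree', 'disagree']
```

Keeping the mapping in one dictionary makes the assumed cut-off explicit and easy to change if the original grouping differs.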
The DCE is estimated in two ways: first using a linear utility function including main effects only, and second using a non-linear utility function allowing for interactions and higher-order terms. The procedure of Hosmer and Lemeshow (2000) is used to specify the non-linearities, following the statistical model-specification literature. In a first step, the utility functions are tested for misfit using a variety of goodness-of-fit measures proposed by Basu et al. (2004) and Basu et al. (2006). The linear utility function is found to perform better with regard to over-fitting, but to have serious misfit problems, whereas the non-linear utility function is found to represent the data well. In a second step, the utility functions are compared with regard to estimated WTP, stated in terms of health insurance contributions. The results are found to differ significantly, in terms of both statistical significance and magnitude, for three out of five attributes. (1) With the linear utility function, respondents are willing to pay CHF 24 per month for the reimbursement of additional alternative treatment methods; with the non-linear specification, estimated WTP decreases to CHF 12 per month. (2) The linear specification suggests that respondents must be compensated with a decrease in contributions of CHF 12 per month in order to accept reimbursement of only the cheapest pharmaceuticals (generics); with the non-linear utility function, this willingness-to-accept more than doubles to CHF 29 per month. (3) An increase in the copayment rate from 10 to 20 percent, with a simultaneous increase in the maximum copayment from CHF 600 to CHF 1,200 per year, must be compensated with a decrease in contributions of CHF 20 per month under the linear specification, rising to CHF 32 per month under the non-linear specification.
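The sketch below illustrates why the two specifications can yield different WTP figures. With utility linear in cost, WTP is simply the ratio -β_attribute / β_cost; once the cost term is non-linear, WTP has to be found as the contribution change that keeps the respondent indifferent, and it then depends on the starting contribution level. All coefficients, the CHF 300 baseline contribution, and the quadratic cost term are hypothetical; this is not the study's Hosmer-Lemeshow specification.

```python
def wtp_linear(beta_attr, beta_cost):
    """WTP when utility is linear in cost: -beta_attr / beta_cost."""
    return -beta_attr / beta_cost


def wtp_by_indifference(utility, x_new, x_old, cost_old, lo, hi, tol=1e-9):
    """Cost change d that leaves utility unchanged when moving x_old -> x_new.

    Solves utility(x_new, cost_old + d) == utility(x_old, cost_old) by
    bisection. Assumes utility is strictly decreasing in cost over
    [cost_old + lo, cost_old + hi] and that the root lies in [lo, hi].
    """
    target = utility(x_old, cost_old)

    def gap(d):
        return utility(x_new, cost_old + d) - target

    if gap(lo) * gap(hi) > 0:
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        d_mid = 0.5 * (lo + hi)
        if gap(lo) * gap(d_mid) <= 0:
            hi = d_mid  # sign change in the lower half
        else:
            lo = d_mid  # sign change in the upper half
    return 0.5 * (lo + hi)


# Hypothetical coefficients (not the study's estimates); cost in CHF/month.
def linear_u(x, cost):
    return 0.75 * x - 0.0625 * cost


def nonlinear_u(x, cost):  # adds a hypothetical quadratic cost term
    return 0.75 * x - 0.04 * cost - 0.00005 * cost ** 2


print(wtp_linear(0.75, -0.0625))                               # 12.0
print(wtp_by_indifference(linear_u, 1, 0, 300, -200, 200))     # about 12.0
print(wtp_by_indifference(nonlinear_u, 1, 0, 300, -200, 200))  # about 10.6
```

Bisection is used here only because it is simple and robust for a one-dimensional, monotone problem; any root finder would do.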
this model offers much to health workforce DCEs. As described earlier, quite heterogeneous populations are typically included in health DCEs, which latent class models may be able to separate into subgroups with more similar (and more accurately estimated) preferences depending on characteristics such as years of work experience or growing up in a rural area. Four studies (14.8%) used an extension of mixed logit, the generalised multinomial logit model, with three of these finding a better fit to the data than comparator mixed logit or logit models [51,54,58,62]. Generalised multinomial logit models are able to account for scale heterogeneity as well as taste heterogeneity; that is, utility estimates might vary between individuals not only because of differences in preferences, but also because of differences in variance. Some individuals may be much more certain of their choices than others or use decision heuristics that reduce variance, whilst other participants may not understand the task well or may make mistakes that increase variance. Fiebig et al. assert that this model can better account for responses from these “extreme” participants, providing an improved fit to the data. This is undoubtedly an attractive feature for DCEs examining labour market decisions (where participants may be more uncertain) in populations of workers that are typically time-poor and highly pressurised (and thus perhaps more likely to employ decision heuristics or make mistakes). This may explain its popularity here, with four studies employing it compared with none in de Bekker-Grob et al.
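As a rough illustration of how the generalised multinomial logit combines scale and taste heterogeneity, the sketch below builds one individual's coefficients following the formulation usually attributed to Fiebig et al.: β_n = σ_n β + [γ + σ_n(1 − γ)] η_n, with a log-normal scale σ_n. The normalization σ̄ = −τ²/2 (one common choice, so that the mean scale is 1), the normal taste deviations, and all numerical values are assumptions made for the example.

```python
import math
import random


def gmnl_coefficients(beta, eta, gamma, tau, eps0):
    """Individual coefficients under a G-MNL specification.

    beta_n = sigma_n * beta + [gamma + sigma_n * (1 - gamma)] * eta_n
    sigma_n = exp(sigma_bar + tau * eps0), with sigma_bar = -tau**2 / 2 so
    that E[sigma_n] = 1 when eps0 ~ N(0, 1).
    beta: mean tastes; eta: this individual's taste deviations;
    tau: amount of scale heterogeneity; gamma in [0, 1]: how the taste
    deviations interact with scale.
    """
    sigma_bar = -tau ** 2 / 2.0
    sigma_n = math.exp(sigma_bar + tau * eps0)
    weight = gamma + sigma_n * (1.0 - gamma)
    return [sigma_n * b + weight * e for b, e in zip(beta, eta)]


def draw_gmnl(beta, eta_sd, gamma, tau, rng):
    """One random draw of individual coefficients (normal taste deviations)."""
    eta = [s * rng.gauss(0.0, 1.0) for s in eta_sd]
    return gmnl_coefficients(beta, eta, gamma, tau, rng.gauss(0.0, 1.0))


rng = random.Random(1)
# Hypothetical mean tastes for two job attributes and heterogeneity settings.
print(draw_gmnl([0.8, -0.5], [0.3, 0.2], gamma=0.5, tau=0.7, rng=rng))
```

Two special cases show the nesting: with τ = 0 every σ_n equals 1 and the coefficients reduce to mixed logit (β + η_n); with no taste deviations (η_n = 0) only the scale varies across individuals. In estimation, such draws would feed a simulated likelihood, which is not shown here.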
The behaviors and decisions considered in behavioral experiments in health therefore usually take place, or are framed, in a health, healthcare, or medical setting or context.
The term “behavioral” in “behavioral experiments in health” requires a first clarification. As is common in experimental economics and behavioral science, the outcomes of behavioral experiments in health are “behavioral” in that they consist of directly observable and measurable behavioral responses or directly revealed preferences, rather than self-reported statements. For example, subjects in behavioral experiments in health are typically observed in real health or healthcare field situations or, if not, face real consequences for their choices or behaviors through aligned monetary and non-monetary incentives. The behaviors and decisions of participants in a behavioral experiment in health are thus typically “natural” (that is, they take place in naturalistic situations) or “incentive-compatible” in the usual experimental economics sense that participants bear some real behavioral consequences for their choices in the experiment (e.g., Smith, 1976, 1982; Friedman and Sunder, 1994; Cassar and Friedman, 2004). This defining feature distinguishes behavioral experiments in health from “stated preference experiments”, such as contingent valuation studies or “discrete choice experiments” (DCEs), which have long been used in health economics and which do not typically involve real behavior or incentive-compatible choice situations (e.g., Ryan, McIntosh, and Shackley, 1998; Ryan and Farrar, 2000; de Bekker-Grob, Ryan, and Gerard, 2012).
Discrete choice experiments (DCEs) and their modeling describe consumers’ behaviors. In these experiments, consumers are given a questionnaire or survey consisting of a series of choice sets. Their task is to choose, from each set of alternatives, the one that benefits them the most. The alternatives, or products, being modeled may be goods, services, policies, and/or scenarios, and their attributes play the role of the explanatory variables in the regression model. DCEs have applications in a multitude of fields, including but not limited to health systems research, public policy, transportation research, and economics. These experiments provide valuable information to businesses on the impact that the features of a product have on the likelihood of the product being chosen over competing alternatives. Best-worst scaling experiments are modified DCEs that elicit further information about the best and worst product, or the best and worst attributes and attribute levels of a product. Our research in the area of DCEs focuses on attribute-level best-worst DCEs and their models. We extend the traditionally defined utility function and look at the impact of time on expected utility using Markov decision processes (MDPs).
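One commonly used model for best-worst data is the maxdiff model, in which the probability of picking item i as best and item j as worst is proportional to exp(V_i − V_j). The sketch applies it to the attribute levels of a single profile, as in attribute-level (case 2) best-worst scaling. The utilities are hypothetical, and the authors' own extensions, including the MDP-based treatment of time, are not represented.

```python
import math
from itertools import permutations


def maxdiff_probabilities(utilities):
    """Best-worst (maxdiff) probabilities for one best-worst task.

    P(best = i, worst = j) = exp(V_i - V_j) / sum_{k != l} exp(V_k - V_l),
    taken over all ordered pairs of distinct items.
    """
    pairs = list(permutations(range(len(utilities)), 2))
    weights = {(i, j): math.exp(utilities[i] - utilities[j]) for i, j in pairs}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical utilities of the three attribute levels shown in one profile.
probs = maxdiff_probabilities([0.9, 0.2, -0.4])
best_worst = max(probs, key=probs.get)
print(best_worst)  # (0, 2): highest-utility level best, lowest-utility level worst
```

Each best-worst response thus carries information about the utility difference between two items, which is why best-worst tasks can yield more information per choice set than a single best choice.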
A US analysis suggested that most patients would consider or accept a transplant at increased risk of blood-borne viral infection. Another US analysis demonstrated that the frequency and timing of dialysis were pertinent and that a majority of patients would switch from 3 to 6 weekly sessions only for substantive health benefits. A different analysis demonstrated a clear preference for localised dialysis provision within Greenland, at the cost of increased annual taxation. An Australian dialysis analysis found that whilst some determinants of dialysis versus conservative management could be established for patient and caregiver respondents, many of the hypothesized
Results: We identified 14 eligible studies from Europe, Australasia, North America, and Asia, reporting preferences for treatment or screening, patient experiences, quality of life, health outcomes and priority-setting frameworks. Specific contexts included medical interventions in kidney transplantation and renal cell carcinoma, health policies for organ donation and allocation, dialysis modalities and end-of-life care, and the studies used a variety of statistical models. The characteristics of ‘time’ (i.e. transplant waiting time, dialysis hours, transport time) and ‘quality of life’ (pre- and post-transplant, or pre- and post-dialysis) consistently influenced patient and clinician preferences across the choice studies.
Most DCEs in health economics are rooted in Random Utility Theory (RUT) [3,5–7]. This theory assumes that respondents choose rationally and will select the scenario that generates the highest personal utility; that is, respondents will only select the opt-out option if none of the presented scenarios in that specific choice task is more attractive than the opt-out option [5,8]. Additional research shows that, from this perspective, forcing respondents to make a choice induces bias, as they would not always make that same choice in real life [3,9,10]. In such a forced-choice situation, people who would rather opt out tend either to select one of the scenarios at random or to select the safest/least extreme scenario [9–12]. As a consequence, the standard errors of the attribute estimates increase while external validity decreases [9,10]. In summary, based on RUT, an opt-out option can always be included if this is in accordance with the respondent’s real-life decision context. However, in practice, motives other than achieving the highest personal utility may be more important when people make their decisions [8–22]. This has led to the hypothesis that only very few respondents act solely according to the assumptions of RUT when choosing the opt-out option. Some individuals are more prone to choose the opt-out option even before they actually evaluate the different scenarios in a choice task. Baron and Ritov (1992) argued that individuals choose the opt-out alternative to protect themselves from poor choices, as negative outcomes resulting from taking action (choosing) are perceived as worse than negative outcomes resulting from inactivity (not choosing).
This finding was confirmed by many others [13,17,18], including Luce and colleagues, who suggest that if people decide to make a choice, the tendency to opt out increases as the trade-off becomes more difficult and the decision at hand more emotion-laden [12,16]. This indicates that people opt out to avoid making difficult trade-offs [12,16]. Research by Dhar and colleagues (1997 and 2003) showed that choice task complexity (i.e., a large number of choice situations per choice task, or choice situations that are comparable in attractiveness) results in more opting out [9,11]. In summary, it seems plausible that respondents choose the opt-out option more often if they have to decide about a complex, emotion-laden topic, if choice tasks are difficult, if scenarios are complex, and if none of the scenarios is clearly superior. In this way, respondents minimize their effort and reduce the internal conflict induced by (negative) decision making.
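Under RUT, the opt-out is simply one more alternative in the choice set, usually with its systematic utility normalized to a constant. The sketch contrasts a forced choice between two unattractive hypothetical scenarios with the same task plus an opt-out whose utility is set, by assumption, to 0, using multinomial logit probabilities.

```python
import math


def choice_probabilities(scenario_utilities, opt_out_utility=None):
    """MNL probabilities for one choice task, optionally with an opt-out.

    Under RUT the opt-out is just another alternative; its systematic utility
    is normalized to a constant (the value used below is hypothetical).
    If present, the opt-out probability is the last element.
    """
    utilities = list(scenario_utilities)
    if opt_out_utility is not None:
        utilities.append(opt_out_utility)
    m = max(utilities)  # subtract the max for numerical stability
    expv = [math.exp(v - m) for v in utilities]
    total = sum(expv)
    return [e / total for e in expv]


# Two unattractive hypothetical scenarios.
forced = choice_probabilities([-1.0, -1.2])
with_opt_out = choice_probabilities([-1.0, -1.2], opt_out_utility=0.0)
print(forced)        # all probability is spread over the two scenarios
print(with_opt_out)  # most probability goes to the opt-out (last element)
```

This captures only RUT-consistent opting out; the avoidance and complexity effects described above would require additional model terms and are not represented.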