Survival Time Outcomes in Randomized, Controlled Trials and Meta-Analyses: The Parallel Universes of Efficacy and Cost-Effectiveness

(1)

Survival Time Outcomes in Randomized, Controlled Trials and

Meta-Analyses: The Parallel Universes of Efficacy and Cost-Effectiveness

Patricia Guyot, MSc1,2,_{*, Nicky J. Welton, PhD}1_{, Mario J.N.M. Ouwens, PhD}2_{, A.E. Ades, PhD}1

1_{School of Social and Community Medicine, University of Bristol, UK;}2_{Mapi Values, Houten, The Netherlands}

A B S T R A C T

Objectives:Many regulatory agencies require that manufacturers

es-tablish both efficacy and cost-effectiveness. The statistical analysis of the randomized, controlled trial (RCT) outcomes should be the same for both purposes. The question addressed by this article is the following: for survival outcomes, what is the relationship between the statistical analyses used to support inference and the statistical model used to support decision making based on cost-effectiveness analysis (CEA)?

Methods:We performed a review of CEAs alongside trials and CEAs

based on a synthesis of RCT results, which were submitted to the Na-tional Institute for Health and Clinical Excellence (NICE) Technology Appraisal program and included survival outcomes. We recorded the summary statistics and the statistical models used in both efficacy and cost-effectiveness analyses as well as procedures for model diagnosis and selection.Results:In no case was the statistical model for efficacy and CEA the same. For efficacy, relative risks or Cox regression was used. For CEA, the common practice was to fit a parametric model to

the control arm, then to apply the hazard ratio from the efficacy anal-ysis to predict the treatment arm. The proportional hazards assump-tion was seldom checked; the choice of model was seldom based on formal criteria, and uncertainty in model choice was seldom addressed and never propagated through the model.Conclusions:Both infer-ence and decisions based on CEAs should be based on the same statis-tical model. This article shows that for survival outcomes, this is not the case. In the interests of transparency, trial protocols should specify a common procedure for model choice for both purposes. Further, the sufficient statistics and the life tables for each arm should be reported to improve transparency and to facilitate secondary analyses of results of RCTs.

Keywords:cost-effectiveness analysis, cox proportional model,

Introduction

Survival analysis is used to describe the analysis of data in the form of times from a well-defined time origin until the occurrence of some particular event or end point[1]. Survival times are fre-quently censored. This arises if patients fail to reach the end point, for example, death or disease progression, before the end of the study period, or if they become lost to observation. If any observa-tions are censored, the sample mean is a biased estimate of the true mean, whether censored observations are included or not.

Survival analysis solves the problem of censoring by calculat-ing, at each time point, the probability that a patient reaches the end point, given that he or she has survived that far. If patients are censored, they are simply dropped from the at-risk denominator. The analysis is based on the hazard at each time point and leads to the nonparametric Kaplan-Meier curves illustrated inFigure 1. Note that in this figure, no patient is followed beyond 32 months; at this point, an estimated 29% of patients for the treatment arm and 23% for the control arm have still not reached the end point. To es-tablish the clinical effectiveness of a new treatment, the semipara-metric Cox regression is usually used. This method allows, if the

proportional hazards assumption holds, estimation of a treatment effect in the form of a hazard ratio without assuming any particular form of probability distribution for the hazard function.

To establish the cost-effectiveness of a new treatment, the health benefits are most frequently measured by using quality adjusted life-years (QALYs)[2]. QALYs are calculated by summing the time spent in a health state weighted by the utility value as-sociated with the health state, thus incorporating both health-state utility and length of survival. Therefore, although the statis-tician can generate a relative effect measure in the form of a hazard ratio to compare two samples, the economist must adopt a strategy that permits the estimation of the expected time in each health state, despite censored data. Usually a parametric ap-proach is taken where a specific distributional form is assumed for the time-to-end point data. A wide range of such distributions can be fitted to the data. Once a distributional form has been chosen, extrapolation of the survival curve can be undertaken, as illus-trated inFigure 1, and mean survival— effectively the area under the curve— can be estimated.

There is, then, a possibility that two distinct statistical analyses may be carried out on the same data set: one for the purposes of

Conflicts of interest: All authors declare that they have no competing interests. A.E. Ades is a member of the NICE Appraisals Committees, but was not involved in the appraisal of sorafenib.

*Address correspondence to:Patricia Guyot, School of Social and Community Medicine, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol BS8 2PS, UK.

E-mail:[email protected]@mapivalues.com.

doi:10.1016/j.jval.2011.01.008

a v a i l a b l e a t w w w . s c i e n c e d i r e c t . c o m

(2)

inference about treatment efficacy and the other for the purpose of cost-effectiveness analysis (CEA). The purpose of this article is therefore to review recent literature to see how investigators have approached this problem and whether they were aware of the potential contradictions that could arise. We reviewed two kinds of literature—CEAs alongside randomized, controlled trials (RCTs) and CEAs with evidence synthesis—to see how investigators ex-press relative treatment effects in clinical effectiveness reports and how—and indeed whether—their efficacy analysis informs the estimates of mean time in states or whether efficacy and CEAs are based on different statistical models. We also assessed the extent to which key assumptions (proportional hazards, paramet-ric form) are checked and whether the uncertainties in these as-sumptions are appropriately propagated through the decision model.

In the Discussion section, we consider the impact of a mis-match between efficacy and CEAs on the transparency and integ-rity of decision making based on CEAs and on methods for synthe-sis of survival time data. We also examine possible remedies.

Methods

Sampling frame

To provide a manageable sampling frame, we use the most recent CEAs commissioned and issued by the National Institute for Health and Clinical Excellence (NICE) Technology Appraisal pro-gram including survival outcome(s), working backward from June 2008 to September 2004. The assessment reports provided by the NICE during this period were reviewed and were included in this review if they contained at least one survival outcome in the anal-ysis. This 4-year period should generate a representative view of current practice. The review included 13 CEAs alongside RCTs or systematic review but no meta-analysis, of which 7 were single technology appraisals (STAs), in which the manufacturer submis-sions are reviewed by an academic evidence review group and 6 multiple technology appraisals (MTAs), in which the main report is prepared by an academic assessment group. We also examined 11 CEAs based on meta-analysis or indirect comparison (3 STAs and 8 MTAs). We chose to base our study on CEAs commissioned and issued by the NICE for several reasons. Submissions to the NICE are relatively uniform because they adhere closely to a pre-specified methodology[3,4]that is widely regarded as represent-ing a high standard[5]. Furthermore, the NICE is seen as a role model for the implementation of CEAs and is being closely watched by health care policymakers throughout Europe and the

United States[6]. In addition, the NICE makes the complete assess-ment reports available in English on their Web site. For the STAs, our analyses were based, where possible, on the initial submis-sions. We relied on the evidence review group’s description of manufacturer submissions. Several alternate analyses may be presented as part of the appraisal process; these were not exam-ined systematically, but we refer to these further in the Discussion section. For the MTAs, our analyses were based on the assessment groups’ reviews.

Data extracted

Each publication was scrutinized in detail. Numerous features of the studies were extracted and recorded separately for 1) for CEAs alongside an RCT and 2) for CEAs based on evidence synthesis. A large number of study methods and results were recorded. We report here on the absolute and relative effect measure for the efficacy and CEA, the statistical models used for absolute and rel-ative effects, and procedures adopted for checking model assump-tions, such as proportional hazards and the parametric model, and methods used for model selection. We also recorded whether the CEA was deterministic or probabilistic. For CEAs based on meta-analysis or indirect comparison, we also recorded the number of studies included in the synthesis and, where possible, the relative effect and the absolute effect for efficacy in each study and in the synthesis. Based on these two large spreadsheets, the information extracted was tabulated as presented in the Results section. We focused especially on whether the statistical model underlying the CEA is the same as the statistical model underlying the efficacy analysis. When the study included two survival outcomes with different features, both features are shown in the summary tables. For some studies, the information required was not available in the material examined or the explanation about the models was imprecise and/or incomplete. We tried to reflect appropriately the information provided, and we take responsibility for any errors made in tabulating the results.

Results

CEAs alongside one RCT

InTable 1([7–19) 14 of the 16 CEAs were based on parametric models (first 3 columns). On the other hand, none of the efficacy analyses specified parametric models (first two lines). In no case was the same model used for efficacy and cost-effectiveness.

In 12 of the 16 efficacy analyses, a hazard ratio measure was generated usually based on a Cox model (although the method was not specified in four cases). In CEAs, in two cases, separate models were fitted to each arm (first column ofTable 1). In these two cases, the treatment effect is fundamentally different from what was assumed in the efficacy analysis; the assumption of a hazard ratio that is constant over time is effectively abandoned. In four cases, a new parametric model was estimated for the control arm, and then a prediction for the treatment arm was generated by assuming proportional hazards and by applying to the control arm the hazard ratio estimated by Cox regression in the efficacy analysis (second column ofTable 1).

We found that no study reported that the proportional hazards assumption had been formally tested, and none provided any ra-tionale or justification. No study stated that the hazard ratios was estimated, or re-estimated, from a parametric model. Regarding the parametric form of the survival distribution, the exponential, Weibull, Gompertz, and log-logistic and log-normal distributions were used and graphically assessed (Table 2). The Weibull and the exponential distributions are the most frequently encountered. Six studies did not state whether different models had been com-pared. For the five other studies, the number of parametric distri-Fig. 1 – Overall survival for two treatment arms for a typical

trial selected from the look-back review. The stepped lines are the Kaplan-Meier estimates; the smooth curves show a proportional hazards Weibull model, from which the mean survival time under each treatment can be estimated.

(3)

butions compared varied between two and four. Four studies re-ported that the choice of model was based on best graphical fit against the Kaplan-Meier curves. Only one study selected the dis-tribution using a statistical criterion, which was the Akaike Infor-mation Criterion. Four of the five studies using an exponential distribution did not report having compared it with other para-metric distributions, and the fifth one only compared the expo-nential distribution to one other distribution. Only one study[19]

checked the sensitivity of the results to model assumptions by using the Weibull parameter estimates, the exponential esti-mates, and the lower and upper values around the fitted expo-nential in sensitivity analyses. Otherwise, there was no attempt to take into account the uncertainty about which model was appropriate.

Results for CEAs based on evidence synthesis

In the 11 CEAs based on evidence syntheses, whether on pairwise meta-analysis or on indirect or mixed treatment comparisons[20], manufacturers presented either hazard ratios or risk ratios when describing results from their own trials. The summary statistics in the synthesis were pooled (log) hazard ratios (7) or (log) relative risks (3) (Table 3) ([21–31). To carry out CEAs, these hazard or risk ratios were again applied to “baseline” parametric models esti-mated mostly from the baseline in the manufacturer’s trial. The manufacturers tend to have access to individual patient data (IPD) from their own trial, but not to the IPD of the other trials included in the evidence synthesis. This obliges them to use the summary statistics reported in the publications of the other trials, either hazard ratios or medians.

Seven studies did not report having compared different models (Table 4). One study reported testing an exponential and a Weibull distribution: the Weibull distribution was chosen based on the fit against the Kaplan-Meier curves. One study explicitly states that no comparison of different models could be possible due to the lack of IDP. The proportional hazards assumption was tested in

Table 1 – CEA alongside trial: efficacy and cost-effectiveness models used in 16 analyses from 13 studies (3 studies included 2 different survival analyses ). Cost-effectiveness Health benefit statistic: mean survival time (to calculate QALYs) Health benefit statistic: none Method Method Parametric models separated for each arm Parametric model on control arm, HR from Cox on new treatment arm Parametric model on control arm, HR from unspecified method on new treatment arm Semiparametric Cox Commercial in confidence Cost minimization Total Efficacy Method Health benefit statistic Parametric model Mean survival time 0 Hazard ratio 0 Semiparametric Cox Hazard ratio 1 [12] 2 [16, 15] 3 [7, 8, 18] 1 [17] 1 [10] 8 Not specified Hazard ratio 1 [14] 2 [9, 13] 1 [13] 4 Dichotomous Relative risk 2 [8, 16] 2 No model reported No statistic reported 2 [11, 19] 2 Total 2 4 8 0 1 1 16 HR, hazard ratio; QALYs, quality-adjusted life-years.

Table 2 – CEAs alongside trial: parametric and proportional hazards assumptions.

References Total

Parametric models in the 14 analyses (from the 11 studies): model

diagnostics and selection Distributions tested Exponential [7,11,13,14,18,19] 6 Weibull [7,8,12,14,16, 19] 6 Gompertz [16] 1 Log-logistic [7, 14] 2 Log-normal [7] 1 Unspecified [8,9,15] 3 Choice rationale

Best graphical fit against KM curves [8, 14, 16, 19] 4 Akaike Information Criterion [7] 1 Not stated [9, 11–14, 18] 5

Was the proportional hazards assumption tested in the 12 analyses (from the 9 studies) using this assumption in the CEA?

Yes 0

Not stated [7–9, 11, 13, 15, 16, 18, 19] 9 References in bold indicate the distribution chosen.

(4)

two of six studies. The method used is not reported for one study, whereas for the other, a plot of ln(⫺ln(S(t)) against ln(t) was exam-ined. (Parallel lines for each treatment suggest proportional haz-ards.) No attempt was made to incorporate the uncertainty in the choice of the model directly into the CEA.

Discussion

Interpretation of findings

The statistical model underlying the estimate of treatment effi-cacy should ideally be the same as the statistical model underlying the CEA, and yet we found that in 24 unselected, recent technology appraisals involving survival analysis submitted to the NICE, this was never the case. What makes this so surprising is that there is no technical reason why both efficacy and cost-effectiveness can-not be served by the same statistical model and indeed by the same relative summary measure. Instead of a single coherent analysis, we found that a parametric model was fitted to the con-trol arm or the standard treatment, and a hazard ratio or relative risk estimated in a separate model was then applied to this base-line to generate the treatment arm for the CEA. In most of the cases, this hazard ratio was from the efficacy analysis and thus from a Cox proportional model. This practice has many implica-tions.

First, just from a statistical point of view, the Cox hazard ratio will not have the same numerical value as a hazard ratio that would be estimated by fitting the parametric model to both arms. Yet, if we believe that the parametric model correctly represents the standard treatment effect and we accept proportional hazards, then there is no reason to not use the parametric model to esti-mate the relative treatment effect. Second, overlaying the hazard ratio from one analysis onto a baseline arm from a different anal-ysis will overstate the uncertainty in the analanal-ysis because the co-variation between baseline and treatment effect that would be expressed in a single coherent analysis is lost, resulting eventually in an incorrect characterization of the uncertainty in incremental net benefit. Because cost-effectiveness models based on survival data are highly nonlinear functions of the survival parameters, the probabilistic analyses required for decision making under param-eter uncertainty will then be biased. Finally, as we have to assume a distribution for the baseline treatment, we remove the advan-tage of the Cox model; therefore, the question of keeping the pro-portional hazards assumption should be properly tested. If the proportional hazards assumption does not hold, then the hazard ratio obtained from a Cox model is meaningless, and applying it to a parametric form for the baseline will give incorrect inference. Note also that some of the parametric distributions, such as the log-logistic distributions, cannot be parameterized as a propor-tional hazards model.

The analyses of the studies alongside trials suggest that most investigators were aware of the awkward position in which they found themselves: a Cox regression analysis had been prespeci-fied in the original protocol and yet a parametric model was needed for cost-effectiveness. They preferred to apply the Cox hazard ratio from the nonparametric efficacy analysis to their parametric model, even though this was not entirely “correct,” rather than produce a completely new parametric analysis giving a different estimate of the hazard ratio. The “hybrid” estimates produced in this way are not, of course, the best fit to the data, given the model.

For CEAs based on evidence syntheses, investigators face greater difficulties. They have fewer options because the individ-ual patient data needed to select, test, and fit parametric models will only be available from the trials sponsored by the manufac-turer making the submission. Therefore, alternative evidence

syn-Table 3 – CEA based on evidence synthesis: efficacy and cost-effectiveness models used in 13 analyses from 11 studies (2 studies included 2 different surviv al analyses). Cost-effectiveness Health benefit statistic: mean survival time (to calculate QALYs) Health benefit statistic: none Total Method Method Parametric models on control arm, pooled HR from evidence synthesis Exponential models based on estimates of medians Crude aggregation of data from the different arms of the trials included in the review Cost-minimization Efficacy Method Health benefit statistic Hazard ratio EV Pooled hazard ratio 5 [22, 24, 25, 28, 29] 1 [31] 1 [21] 7 Dichotomous EV Pooled relative risk 3 [24, 25, 27] 3 No model reported No pooled statistic reported 2 [26, 30] 1 [23 ] 3 Total 10 1 1 1 13 EV, evidence synthesis; HR, hazard ratio; QALYs, quality-adjusted life-years.

(5)

theses to obtain expected survival times or expected times to pro-gression are impossible to implement.

A further, less-than-satisfactory aspect of these analyses is the inadequate checking of assumptions. The proportional hazards assumption was seldom checked, and the use of formal criteria for choice of parametric models was highly infrequent. Uncertainty in model choice was seldom addressed at all and never formally in-corporated or propagated through the decision model. Interest-ingly, in this look-back review, there was no use, or even mention, of methods capable of reflecting model uncertainty, neither the flexible multiparameter generalizedFdistribution that contains most of the standard shapes[32]nor the new methods for flexible spline curves[33]. A series of recent publications on the use of flexible distributions and model averaging[34 –36]indicate a grow-ing interest in the problem.

Although we have focused on the assessment group submis-sion and on the review group’s description of the initial manufac-turer submissions, several alternative analyses may be performed in the course of the appraisal process, and the final decision is shaped by a wide range of other information and comments from stakeholders. A further exercise might be to track the different analyses through each appraisal and examine alternative analy-ses and how they influenced the decision. Manufacturer submis-sions could be examined directly as well as peer-review publica-tions of key trials before and after the submission. This could improve the accuracy of the review we have conducted and would certainly offer rich insights into how the analyses were motivated and used, but it would be unlikely to change the conclusions of this study. In cases in which de novo survival analyses were under-taken by the evidence review group[8,11,15,26], one looked at fur-ther parametric curves, two examined the area under the KM curve, and one used external data. It is therefore unlikely that further examination of our sample of studies would change our conclusions.

Impact of the various models on medical decision making Setting aside the incoherence of current practice from the view-point of statistical method, the lack of protocol or guidelines ad-dressing the CEAs, and particularly the choice of parametric mod-el(s), leaves those responsible for choosing a model a very substantial level of freedom and creates a danger of arbitrary choice of model or choice of model to favor a particular product.

It was not possible in this study to document whether the use of an alternative model that was equally compatible with the data would have led to different conclusions on efficacy or to a different decision on cost-effectiveness. This would only be possible with access to the individual data and the full model used for the CEAs. It is unlikely that relative efficacy measures such as the hazard ratio would be materially different, and it is also likely that the

directionof the differences between treatments in expected sur-vival or expected time to progression would be unaltered, unless there were severe departures from proportional hazards.

It is worth remembering, however, that a substantial majority of submissions to reimbursement agencies such as the NICE may be on the very margin of cost-effectiveness because manufactur-ers naturally tend to set prices as high as they can without going over the cost per QALY-gained threshold. Under these circum-stances, only minor changes in the model are needed to change a “yes” decision to a “no” or vice versa. At the NICE, for example, the issue of whether the parametric model assumed by the manufac-turer is justified or whether different results might be obtained with different models equally well supported by the evidence has been raised during the appraisal process on a number of occasions and has once been subject to an appeal[37].

Possible remedies

Despite the increasing use of CEA alongside RCTs, and the use of systematic reviews and CEAs in reimbursement decisions in sev-eral countries, such as the NICE in England and the Scottish Med-icines Consortium in Scotland, the design and analysis of RCTs with survival outcomes are oriented toward ratio-based measures without regard to expected survival time. At the trial design stage, CEA is still at present regarded as an add-on, to be conducted by other people, after the efficacy analysis. Many of the trials are designed to obtain licenses, and one of the contributing causes may be the separation of regulatory and reimbursement processes and their very different requirements for data. Possibly efforts to link the two would encourage the development of joint analysis protocols, although the different requirements of different agen-cies may make this challenging.

The Cochrane Handbook for systematic reviews of interven-tions[38]advises that the intervention effect should be expressed as a hazard ratio. The CONSORT statement[39]notes that “for survival time data, the measure could be the hazard ratio or dif-ference in median survival time.” These guidelines neither recom-mend reporting an estimate of the mean survival nor do they spec-ify how the survival analysis results of RCTs should be included in economic evaluations. As we have seen, this raises the possibility of having different statistical models for the same data set, used for inference on treatment efficacy and for economic decision making. The two models cannot both be correct; one of them is not even the best fit to the data given its own assumptions, and we have seen that in some cases they are fundamentally different. In the absence of any protocol, the choice of analysis for CEA is vir-tually arbitrary because many models might be equally compati-ble with the evidence.

One remedy would be for trial protocols to include prespecifi-cation of a single analysis to be used in both efficacy and

cost-Table 4 – CEAs based on evidence synthesis: parametric and proportional hazards assumptions.

References Total Parametric assumptions in the 10

analyses (from the 8 studies): model diagnostics and selection Distribution tested

Exponential [24,27, 30] 3

Piecewise exponential [26] 1

Weibull [22, 24, 25, 28, 29] 5

No further investigation because of lack of IPD

[30] 1

Choice rationale

Best graphical fit against KM curves [24] 1 Weibull graphical plot of

ln(⫺ln(S(t)) against ln(t)

[22, 28] 2

Not stated [25–27, 29, 30] 3

Were the proportional hazards assumption tested in the 10 analyses (from the 8 studies) using this assumption in the CEA?

Yes: graphical plot of ln(-ln(S(t)) against ln(t)

[22] 1

Yes: assumption tested but commercial/ academic in confidence information removed

[29] 1

Not stated [24–28, 30] 6

References in bold indicate the distribution chosen.

CEAs, cost-effectiveness analyses; IPD, individual patient data; KM, Kaplan-Meier.

(6)

effectiveness research. Further research will be needed before joint protocols can be designed, and this will involve careful ex-amination of a wide range of survival data, experiments with dif-ferent curve fitting methods, and perhaps simulation studies to assess the effects of model choice. A joint prespecified protocol would not, of course, dictate a particular choice of parametric model, or even a “menu” of standard models. Instead it would specify a single, common process for model selection. It would also be an opportunity to prespecify methods for analyzing mor-tality data in trials in which patients may switch to other treat-ments (crossover) after tumor progression[40]and may even en-courage trial designs that would be easier to analyze in this respect. The need for prespecified protocol is something that could perhaps be championed by bodies like the Cochrane Collaboration or the International Committee on Harmonisation.

A second remedy that would immediately transform the prob-lem, although it would not solve the problem of the protocol, re-lates to the reporting and availability of results. The CONSORT statement[36]requires that in trials with continuous outcomes, the mean and SD are reported for each arm, whereas numerators and denominators are required for trials with dichotomous out-comes. These constitute the sufficient statistics for each arm. CONSORT’s advocacy of the hazard ratio or difference in median survival time for survival data is surprising for three reasons: first, these are not sufficient statistics; second, they specify a measure of the relative effect rather than an effect for each arm (generally required for CEA); and third, the hazard ratio requires a propor-tional hazards assumption and is therefore strongly model depen-dent. A more consistent approach would require the sufficient statistics for each arm. These are the life tables: the numbers of events and numbers of patients at risk each time that there is an event. Although the provision of the life table for each arm would not allow covariate analysis and adjustment, they would other-wise contain the same information as the IPD. The provision of the life-table data would allow analysts to test assumptions and to evaluate different models and therefore would improve the trans-parency in economic evaluations and would greatly facilitate evi-dence synthesis.

Source of financial support: Patricia Guyot and Mario J.N.M. Ouwens are supported by Mapi Values, Nicky J. Welton by an MRC Methodology Research Fellowship, and A.E. Ades from funds transferred from the MRC to University of Bristol.

R E F E R E N C E S

[1] Collett D. Modelling Survival Data in Medical Research (2nd ed.). New York: Chapman & Hall/CRC, 2003.

[2] Drummond M, Sculpher M, Torrance G, et al. Methods for the Economic Evaluation of Health Care Programmes (3rd ed.). Oxford: Oxford University Press, 2005.

[3] National Institute for Clinical Excellence (NICE). Guide to the Methods of Technology Appraisal, Issued April 2004. Available from:http:// www.nice.org.uk/niceMedia/pdf/TAP_Methods.pdf. [Accessed January 21, 2010].

[4] National Institute for Clinical Excellence (NICE). Guide to the Methods of Technology Appraisal, Issued: June 2008. Available from:http://www .nice.org.uk/media/B52/A7/TAMethodsGuideUpdatedJune2008.pdf. [Accessed January 21, 2010].

[5] World Health Organization. The Clinical Guideline Programme of the National Institute for Health and Clinical Excellence (NICE). A review by the World Health Organization, May 2006. Available from:http:// www.euro.who.int/__data/assets/pdf_file/0003/96447/E89740.pdf. [Accessed January 21, 2010].

[6] Schlander M. Health Technology Assessments by the National Institute for Health and Clinical Excellence: A Qualitative Study. New York: Springer, 2007.

[7] Griffin S, Walker S, Sculpher M, et al. Cetuximab plus radiotherapy for the treatment of locally advanced squamous cell carcinoma of the head and neck. Issued June 2008. Available from:http://www.nice. org.uk/nicemedia/pdf/HeadandNeckCancer_cetuximab_ERG.pdf. [Accessed January 21, 2010].

[8] Bagust A, Hockenhull J, Boland A, et al. Rituximab for the treatment of relapsed or refractory stage III or IV follicular non-Hodgkin’s lymphoma. Issued February 2008. Available from:http://www.nice. org.uk/nicemedia/pdf/RituximabERGReport.pdf. [Accessed January 21, 2010].

[9] Green C, Bryant J, Takeda A, et al. Bortezomib for the treatment of multiple myeloma patients. Issued October 2007. Available from: http://www.nice.org.uk/nicemedia/pdf/MultipleMyeloma_ Evidencereviewgroupreport.pdf. [Accessed January 21, 2010]. [10] Bagust A, Macbeth F, Boland A, et al. Pemetrexed for the treatment of

relapsed non-small cell lung cancer. Issued August 2007. Available from:http://www.nice.org.uk/nicemedia/pdf/Lungcancer_ pemetrexed_ERG.pdf. [Accessed January 21, 2010].

[11] Walker A, Palmer A, Erhorn S, et al. Fludarabine phosphate for the first-line treatment of chronic lymphocytic leukaemia. Issued February 2007. Available from:http://www.nice.org.uk/nicemedia/pdf/ Leukaemia_fludarabine_ergreport.pdf. [Accessed January 21, 2010]. [12] Tappenden P, Jones R, Paisley S, Carroll C. The use of bevacizumab and cetuximab for the treatment of metastatic colorectal cancer. Issued January 2007. Available from:http://www.nice.org.uk/ nicemedia/pdf/colorectal_beva_assessment.pdf. [Accessed January 21, 2010].

[13] Hind D, Ward S, De Nigris E, et al. Hormonal therapies for early breast cancer: systematic review and economic evaluation. Issued November 2006. Available from:http://www.nice.org.uk/nicemedia/pdf/ breastcancerearly_hormonal_assessmentreport.pdf. [Accessed January 21, 2010].

[14] Chilcott J, Lloyd Jones M, Wilkinson A. Docetaxel for the adjuvant treatment of early node-positive breast cancer: a single technology appraisal. Issued September 2006. Available from:http://www.nice. org.uk/nicemedia/pdf/Breast_cancer_docetaxel_STA_ERG_Report. pdf. [Accessed January 21, 2010].

[15] Ward S, Pilgrim H, Hind D. Trastuzumab for the treatment of primary breast cancer in HER2 positive women: a single technology appraisal. Issued August 2006. Available from:http://www.nice.org.uk/ nicemedia/pdf/Breastcancer_trastuzumab_erg.pdf. [Accessed January 21, 2010].

[16] Pandor A, Eggington S, Paisley S, et al. The use of oxaliplatin and capecitabine for the adjuvant treatment of colon cancer. Issued April 2006. Available from:http://www.nice.org.uk/nicemedia/pdf/ Assessment_Report_%28CiC_removed%29.pdf. [Accessed January 21, 2010].

[17] Bryant J, Brodin H, Loveman E, et al. The clinical effectiveness and cost effectiveness of implantable cardioverter defibrillators: arrhythmias. Issued January 2006. Available from:http://www.nice. org.uk/nicemedia/pdf/AtrFib_assessment_report.pdf.Accessed January 21, 2010].

[18] Castelnuovo E, Stein K, Pitt M, et al. The effectiveness and cost-effectiveness of dual chamber pacemakers compared to single chamber pacemakers for bradycardia due to atrioventricular block or sik sinus syndrome: systematic review and economic evaluation. Issued February 2005. Available from:http://www.nice.org.uk/ nicemedia/pdf/DCP_Assessment_Report_for_public_ consultation.pdf. [Accessed January 21, 2010].

[19] Wilson J, Connock M, Song F, et al. Imatinib for the treatment of patients with unresectable and/or metastatic gastro-intestinal stromal tumours—a systematic review and economic evaluation. Issued October 2004. Available from:http://www.nice.org.uk/nicemedia/pdf/ ImatinibGIST_HTA.pdf. [Accessed January 21, 2010].

[20] Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005;331:897–900.

[21] Wilson J, Yao G, Raftery J, et al. A systematic review and economic evaluation of epoetin alfa, epeotin beta and darbepoetin alfa in anaemia associated with cancer, especially that attributable to cancer treatment. Issued May 2008. Available from:http://www.nice.org.uk/ nicemedia/pdf/Anaemia_Assessment_Report.pdf. [Accessed January 21, 2010].

[22] Fox M, Mealing S, Anderson R, et al. The effectiveness and cost-effectiveness of cardiac resynchronisation (biventricular pacing) for heart failure: a systematic review and economic model. Issued May 2007. Available from:http://www.nice.org.uk/nicemedia/pdf/ Assesment%20Report.pdf. [Accessed January 21, 2010].

[23] Jones J, Takeda A, Tan SC, et al. Gemcitabine for metastatic breast cancer. Issued January 2007. Available from:http://www.nice.org.uk/ nicemedia/pdf/Breastcancer_gemcitabine_ergreport.pdf. [Accessed January 21, 2010].

[24] Connock M, Juarez-Garcia A, Jowett S, et al. Methadone and buprenorphine for the management of opioid dependence: a systematic review and economic evaluation. Issued January 2007. Available from: http://www.nice.org.uk/nicemedia/pdf/DM-meth-bupren-AR.pdf. [Accessed January 21, 2010].

[25] Adi Y, Juarez-Garcia A, Wang D, et al. Oral naltrexone as a treatment for relapse prevention in formerly opioid dependent drug users—a

(7)

systematic review and economic evaluation. Issued January 2007. Available from:

http://www.nice.org.uk/nicemedia/pdf/DM-nalt-AR.pdf. [Accessed January 21, 2010].

[26] Griffin S, Dunn G, Palmer S, et al. The use of paclitaxel in the management of early stage breast. Issued: September 2006. Available from:http://www.nice.org.uk/nicemedia/pdf/STA_report.pdf. [Accessed January 21, 2010].

[27] Murray A, Lourenco T, de Verteuil R, et al. Systematic review of the clinical effectiveness and cost-effectiveness of laparoscopic surgery for colorectal cancer. Issued: August 2006. Available from:http:// www.nice.org.uk/nicemedia/pdf/LS_Assessment_report.pdf. [Accessed January 21, 2010].

[28] Collins R, Fenwick E, Trowman R, et al. A systematic review and economic model of the effectiveness and cost-effectiveness of docetaxel in combination with prednisone or prednisolone for the treatment of hormone refractory metastatic prostate cancer. Issued June 2006. Available from:http://www.nice.org.uk/nicemedia/pdf/ Prostate_cancer_docetaxel_assessmentreport.pdf. [Accessed January 21, 2010].

[29] Hind D, Tappenden P, Tumur I, et al. The use of irinotecan, oxaliplatin and raltitrexed for the treatment of advanced colorectal cancer: systematic review and economic evaluation. Issued August 2005. Available from:http://www.nice.org.uk/nicemedia/pdf/

Advancedcolorectalcancer_ass_rep_full.pdf. [Accessed January 21, 2010].

[30] Main C, Ginnelly L, Griffin S, et al. Topotecan, pegylated liposomal doxorubicin hydrochloride and paclitaxel for second-line or subsequent treatment of advanced ovarian cancer. Issued May 2005. Available from:http://www.nice.org.uk/nicemedia/pdf/

Assessmentreport_Ovariancancer.pdf. [Accessed January 21, 2010].

[31] McCormack K, Wake B, Perez J, et al. Systematic review of the clinical effectiveness and cost-effectiveness of laparoscopic surgery for inguinal hernia repair. Issued September 2004. Available from:http:// www.nice.org.uk/nicemedia/pdf/LapHernia_Final_report_11_12_03.pdf. [Accessed January 21, 2010].

[32] Cox C. The generalized F distribution: an umbrella for parametric survival analysis. Stat Med 2008;27:4301–12.

[33] Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modeling and estimation of treatment effects. Stat Med 2002;21:2175–97.

[34] Jackson C, Sharples L, Thompson S. Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. Int J Biostat 2010;6:34.

[35] Jackson C, Sharples L, Thompson S. Accounting for uncertainty in health economic decision models by using model averaging. J R Stat Soc Ser A 2009:383– 404.

[36] Jackson C, Sharples L, Thompson S. Structural and parameter uncertainty in Bayesian cost-effectiveness models. J R Stat Soc Ser C Appl Stat 2010;59:233–53.

[37] Hepato-cellular carcinoma (advanced and meta-static)—sorafenib (first line). Available from:http://guidance.nice.org.uk/TA/Wave17/ 8#keydocs. [Accessed January 21, 2010].

[38] Higgins JPT, Green S. Cochrane Handbook for Systematic Reviews of Interventions. Chichester, UK: John Wiley & Sons, Ltd., 2008. [39] Moher D, Hopewell S, Schulz KF, et al. CONSORT 2010 explanation and

elaboration: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c869.

[40] Morden J, Lambert P, Latimer N. Assessing methods for dealing with treatment switching in randomised clinical trials. Available from: http://www.nicedsu.org.uk/Crossover%20and%20survival%20-%20 final%20DSU%20report.pdf. [Accessed December 27, 2010].