FROM PAIRWISE TO NETWORK META-ANALYSES


Chapter 2


Sonya J. Snedecor¹, Ph.D., Dipen A. Patel², Ph.D., and Joseph C. Cappelleri³, Ph.D.

¹ Director, Health Economics, Pharmerit International, Bethesda, MD, US
² Associate Director, Health Economics and Outcomes Research, Pharmerit International, Bethesda, MD, US
³ Senior Director, Statistics, Pfizer Inc., Groton, CT, US

ABSTRACT

Meta-analyses have been used for several years to pool data from clinical trials and generate estimates of treatment effect associated with a therapeutic intervention in relation to a comparator. In recent years, there has been an added emphasis on comparative effectiveness research since decision makers are often faced with more than one available treatment and want to understand whether a new product is more effective than the existing options. For disease indications where there are several treatment options available, it is difficult or simply not feasible to have clinical trials comparing each pairwise combination of the available treatments. In such circumstances, a network meta-analysis is a valuable tool to compare treatments of interest that have not been assessed in head-to-head clinical trials. This chapter provides an introduction to traditional pairwise meta-analyses and the related network meta-analyses. Commonly used methods in meta-analysis involve combining effect size estimates from individual studies using a fixed-effect model or random-effects model (or both). Differences between the two models along with presentation of Bayesian meta-analysis methodology are also discussed. Evidence networks used in network meta-analysis and key assumptions of similarity, homogeneity and consistency are also described. Finally, challenges and opportunities with meta-analyses are presented, including meta-regression and integration of individual patient data.



Corresponding author: Sonya J. Snedecor, PhD; Pharmerit International, 4350 East West Hwy, Ste 430, Bethesda, MD 20814, USA. Email: [email protected].



Keywords: Direct evidence, evidence networks, indirect evidence, indirect treatment comparisons, meta-analysis applications, history, network meta-analysis, mixed treatment comparison

INTRODUCTION

Evidence synthesis involves the development of techniques to combine multiple sources of quantitative evidence. It can be considered an extension of research synthesis, which is the integration of empirical research for the purpose of creating generalizations from data gathered from multiple sources. [1] Goals of research synthesis include critical analysis of the research covered, identification of central issues for future research, and attempts to resolve conflicts in the literature. The term meta-analysis is often used as a synonym for research synthesis, but a more precise definition was first given by Glass in 1976 [2] as "the statistical analysis of a large collection of individual studies for the purposes of integrating the findings." The definition has evolved to also include an examination of heterogeneity, the variation of treatment effects among the studies analyzed.

There are four steps to conducting a meta-analysis: 1) identification of studies with relevant data; 2) assessing eligibility of studies; 3) abstracting data for analysis; and 4) execution of statistical analysis, including exploration of differences among studies or study effects. Although all steps are equally important for the quality and validity of a meta-analysis, this chapter will concentrate on execution and applications (step 4).

One of the first meta-analyses as an evidence synthesis methodology was conducted by British statistician Karl Pearson, who analyzed and investigated differences among clinical trial results of the association of typhoid fever inoculation with mortality. [3, 4] Although much of the subsequent analysis work originated in the social sciences, where meta-analyses of observational studies are common, [5, 6] meta-analyses applied to the medical sciences typically include only data from randomized controlled trials (RCTs). Two early and influential medical meta-analyses were conducted by Elwood, Cochrane and colleagues, [7, 8] who studied the effects of aspirin treatment after heart attack to show reduced risk of recurrence, and by Chalmers et al., [9] who examined the use of anticoagulants after acute myocardial infarction. These analyses showed how methods could be used to synthesize the results of separate but similar studies to provide more scientifically robust estimates of the direction and size of treatment effects.

In 1985, Peto and colleagues published a meta-analytic overview of RCTs of beta blockade to encourage clinicians to review randomized trials systematically and to combine estimates of the effects of treatments considered to be the same based on informed clinical judgment. [3, 4, 10] Subsequently, citations of meta-analyses in the health-related literature surged from 272 PubMed citations in 1990 to 6,354 citations in 2012, paralleling the increase in the number of randomized trials being conducted (7,170 to 21,444 during the same time frame).

Well-executed systematic reviews of the literature and meta-analyses of RCTs are widely considered to be at the top of the evidence hierarchy. These constitute the highest level of evidence because they attempt to collect, combine, and report the best available evidence using systematic, transparent, and reproducible methodology. [11] Performance of good quality meta-analyses requires recognition that the analysis is itself a study necessitating careful planning and execution. [6] A formal protocol, including clear specification of the method of study identification, rules for study selection, and analytic methodology, is encouraged prior to initiating a meta-analytic investigation. [12] Complete and consistent reporting of meta-analyses is also paramount to conveying the quality and validity of an analysis. [13, 14] For more detailed information on systematic reviews and meta-analyses, the reader is referred to highly regarded texts. [15-17]

PAIRWISE META-ANALYSIS METHODOLOGY

One of the simplest methods for the combination of results of several studies is the "vote counting" method, where the overall assessment is made by comparing the number of studies demonstrating positive results with the number of studies with negative results. Vote counting is limited to answering the question "is there any evidence of an effect?" [16] A similarly simple method involves the combination of p values from each study. Here, a test statistic, which is a function of the one-sided p values from each study, is calculated to reject or not reject the null hypothesis that there is no effect in every study in the collection. However, with this method the null hypothesis may be rejected on the basis of a non-zero effect in just one study.

Unsurprisingly, these methods are not recommended because they fail to take into account the sizes of effects observed in individual studies, differential methods or statistical rules used to obtain analysis decisions and p values, or differential weights given to each study. These methods should be avoided whenever possible but might be considered a last resort in situations such as when the individual studies provide only non-parametric data analyses or there is no consistent outcome measure and little information other than a p value being available. [6, 16]
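The weakness of combining p values can be illustrated with a minimal sketch. The text does not name a specific test statistic; Fisher's method is assumed here purely for illustration, and the function name and p values are hypothetical.

```python
import math

def fisher_combined_p(p_values):
    """Combine one-sided p values with Fisher's method (an assumed choice).

    The statistic -2 * sum(ln p_i) follows a chi-square distribution with
    2k degrees of freedom under the null of no effect in any study.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # Closed-form chi-square survival function, valid for even df = 2k.
    combined = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return statistic, combined

# Hypothetical p values: one very small value, three unremarkable ones.
statistic, combined = fisher_combined_p([0.0001, 0.5, 0.6, 0.7])
```

With these hypothetical inputs the combined p value falls below 0.05 even though only one study shows any signal, which is the limitation described above.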

The most commonly employed meta-analytic methods combine estimates of effect sizes from individual studies. These methods fall into two broad categories: fixed-effect and random-effects.

The fixed-effect model answers the question "What is the overall treatment effect in this collection of studies?" Implicit in a fixed-effect model is that the true effect of interest is constant across all of the studies considered. In contrast, a random-effects model acknowledges the occurrence of variation of true effects among studies. This model assumes that the study-level effects are drawn from a common distribution and answers the question "What is the overall treatment effect in the universe of studies from which this collection was sampled?"

The difference between fixed and random-effects models is shown graphically in Figure 1. The fixed-effect model (left) estimates the mean effect only from the studies identified. Each study is a sample estimate of treatment effect that is assumed to measure the same (common) treatment effect; these estimates are assumed to differ only because of natural random sampling variations. The random-effects model (right) estimates the mean effect which is centrally positioned around the individual effects of the different studies, each of which has its own individual distribution of effect.


Mathematically, the difference between fixed-effect and random-effects models can be represented by a single parameter:

Fixed-effect model: $y_i = \mu + e_i$

Random-effects model: $y_i = \mu + s_i + e_i$

where $y_i$ is the treatment effect of the $i$th study, $\mu$ is the overall mean, $e_i$ is the error of the $i$th study, and $s_i$ is the deviation of the study-specific effect from the overall mean (the random effect). Thus, when $s_i = 0$, the random-effects model reduces to the fixed-effect model.

Figure 1. Graphical description of fixed-effect (left) and random-effects (right) models.

Analysis Methods

Common fixed-effect methods used to combine data are the Peto [10] and the Mantel-Haenszel methods [18] for effects measured on a ratio scale and general variance-based methods [19] for estimates of a difference measure. The DerSimonian and Laird method [20] is a random-effects model that may be used for dichotomous or continuous effects. These methods and others are explained and reviewed in detail elsewhere. [21, 22]
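As a rough illustration of how such pooling works, the sketch below implements generic inverse-variance fixed-effect pooling and the DerSimonian and Laird moment estimator of the between-study variance. It is a simplified sketch, not the implementation from the cited references; the function names and the example effect sizes (log odds ratios with their variances) are hypothetical.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    weights = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    # Adding tau^2 to every study variance flattens the weights.
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2

# Hypothetical log odds ratios and within-study variances.
log_ors = [-0.4, -0.1, -0.7, 0.1]
variances = [0.04, 0.02, 0.09, 0.05]
fe_mean, fe_se = fixed_effect(log_ors, variances)
re_mean, re_se, tau2 = dersimonian_laird(log_ors, variances)
```

When the estimated between-study variance is zero, the random-effects result coincides with the fixed-effect result, mirroring the case $s_i = 0$ in the model definitions.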

These methods are classified as "frequentist" because the parameters of interest (i.e., the treatment effect) are assumed to be unknown but have a fixed distribution and the data are considered to be random variables. Meta-analyses may also be conducted using Bayesian methodology. Bayesian statistical inference assumes the opposite of frequentist inference: that the input data are fixed and the parameters of interest are random variables that follow no pre-specified distribution. [23-25] In hypothesis testing, frequentist methods generate $\Pr(\text{data} \mid \theta)$, where $\theta$ denotes the parameters of interest, reported as mean estimates and 95% confidence intervals ("CIs"); Bayesian methods generate $\Pr(\theta \mid \text{data})$ and credible intervals ("CrIs") surrounding the mean estimate. [26] Confidence intervals indicate the probability with which repeated study samples will generate an interval range containing the true value of the parameter of interest. Credible intervals represent the probability that the value of the parameter of interest lies within the interval range, given the observed data, which is the definition often incorrectly applied to confidence intervals. Thus, the Bayesian approach offers a more natural means of interpreting results. [27]

The Bayesian model framework also requires "prior information" in the form of a distribution around the parameters of interest. The prior distribution represents some prior belief and uncertainty about the value of the parameters that is integrated with the observed data to generate the model results (called the "posterior distribution"). This requirement introduces a level of subjectivity to which some researchers object, since different prior beliefs can lead to different model conclusions. To minimize the subjective nature of priors, many advocate the use of so-called "non-informative" priors in meta-analyses so that the model estimates are based solely on the RCT data collected. [24]
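The influence of the prior can be seen in a minimal conjugate sketch. It assumes a normal prior on the treatment effect and a normally distributed effect estimate with known variance, which is a simplification chosen for illustration rather than the model used in the cited references; all names and numbers are hypothetical.

```python
import math

def normal_posterior(prior_mean, prior_var, estimate, estimate_var):
    """Posterior mean and variance for a normal prior and normal likelihood."""
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / estimate_var
    post_var = 1.0 / (prior_precision + data_precision)
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var

def credible_interval(mean, var, z=1.959964):
    """Central 95% credible interval for a normal posterior."""
    half_width = z * math.sqrt(var)
    return mean - half_width, mean + half_width

# Hypothetical pooled log odds ratio of -0.3 with variance 0.04.
vague = normal_posterior(0.0, 1e6, -0.3, 0.04)         # "non-informative" prior
informative = normal_posterior(0.0, 0.04, -0.3, 0.04)  # sceptical prior centred on no effect
```

Under the vague prior the posterior is essentially the data estimate, whereas the informative prior pulls the posterior mean halfway toward zero, showing how different priors can lead to different conclusions.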

Choice of Meta-Analysis Model

In general, random-effects models will result in similar overall mean estimates but larger uncertainty intervals than fixed-effect models. Because of this, it is frequently considered best to always use a random-effects model as it will generate the most conservative estimate of statistical significance. However, a number of factors should be considered when choosing a model type. The first should be whether heterogeneity or variation in treatment effects among the studies is expected. If studies use very similar protocols, data collection methods, outcome definitions, patient populations and so on, a fixed-effect model would likely be appropriate. If heterogeneity of treatment effects among the studies is expected or identified, then use of a random-effects model will quantify the degree of heterogeneity, but should be accompanied by an attempt to investigate or explain possible sources of the variation such as subgroup analyses or meta-regression.

Compared with fixed-effect models, random-effects models give relatively more weight to small studies and less weight to large studies when generating estimates of effect, which may not be desirable in the presence of heterogeneity. [28] Sufficient data are also necessary to properly estimate the between-study variance of treatment effects. When there are very few studies available, or few events in the outcome of interest, the estimate of between-study variability of treatment effect can be unreliable, and hence, it might be reasonable to use fixed-effect models. The size of the random-effect parameter can indicate the appropriateness of the model selection. A random-effect parameter close to zero suggests that a fixed-effect model would be appropriate, since a very small value indicates little heterogeneity in treatment effects. In Bayesian statistics, one can also test the appropriateness of each model using selection criteria such as the deviance information criterion (DIC). [29]

META-ANALYSIS APPLICATIONS

The most common reasons for performing a meta-analysis are to provide an estimate of a treatment effect associated with a therapeutic intervention and to quantify and explain heterogeneity of treatment effects across studies, especially when data from a single study are insufficient and the conduct of a new, large study would be impractical. [30] Meta-analyses of clinical trials are also increasingly used to identify and evaluate potential drug safety issues. [31] Unless an RCT is prospectively designed and statistically powered with a particular safety outcome as its primary endpoint, the trial may not have a large enough sample size to reliably evaluate whether there is an increased risk of such an event. When more than one study is available, meta-analyses can improve the ability to detect and characterize risks of adverse events (AEs) that occur at low rates. [30]

Cumulative meta-analysis can illustrate important gaps between accumulated clinical trial evidence and treatment recommendations or routine clinical practice, [32] identify developing trends in therapeutic efficacy, and guide the planning of future trials. Cumulative meta-analysis is repeated meta-analysis of a set of studies updating the set as new studies are published. [27] One sequential analysis of RCTs published from 1959 to 1988 included two large studies published in 1986 and 1988, both of which demonstrated favorable effects of the use of intravenous streptokinase for the treatment of acute myocardial infarction. [33] By sequentially including studies, the cumulative meta-analysis first showed a consistent, statistically significant reduction in mortality in 1973 after only eight trials had been completed. The addition of the two large RCTs into the analysis in 1986 and 1988 had little-to-no effect on the treatment effect, but narrowed the 95% confidence interval. The United States Food and Drug Administration did not approve this drug until 1988, after the first large-scale trial had been undertaken and 15 years after a cumulative meta-analysis indicated that the treatment could have been effective. [34]
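The mechanics of a cumulative meta-analysis can be sketched as repeated pooling in order of publication year. The sketch below assumes inverse-variance fixed-effect pooling and a normal-approximation 95% interval; the function name and study data are hypothetical and are not the streptokinase trial results.

```python
import math

def cumulative_meta_analysis(studies):
    """Re-pool the evidence each time a study is added, in order of year.

    studies: iterable of (year, effect, variance) tuples.
    Returns a list of (year, pooled_effect, lower_95, upper_95).
    """
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, effect, variance in sorted(studies):
        weight = 1.0 / variance
        sum_w += weight
        sum_wy += weight * effect
        pooled = sum_wy / sum_w
        se = math.sqrt(1.0 / sum_w)
        results.append((year, pooled, pooled - 1.96 * se, pooled + 1.96 * se))
    return results

# Hypothetical log odds ratios: two small early trials, one large late trial.
trials = [(1975, -0.30, 0.09), (1970, -0.10, 0.16), (1986, -0.25, 0.01)]
timeline = cumulative_meta_analysis(trials)
```

Each row of `timeline` shows the pooled estimate as it stood after that year's study, so the point at which the interval first excludes no effect can be read off directly.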

Before a new treatment can be marketed in a country, it must be approved by the respective drug regulatory agency. Relevant evidence to support the safety, efficacy, and value of the new product is spread across reports on a number of individual studies, and synthesis of all of these sources is necessary to support such claims. [6] Pooled estimates from meta-analyses can identify optimal dosing or bioequivalence for new therapies [34] or assist in drug planning and development by serving as inputs into decision analyses or cost-effectiveness analyses designed to support claims of safety, efficacy, and value. [35, 36]

Meta-analyses are frequently used in comparative effectiveness research which compares the relative benefits and harms among a range of available treatments or interventions for a given condition. [37] Comparative effectiveness is a growing field of research as decision makers are often faced with more than one viable treatment option and are increasingly asking whether a new medicine is more effective than the existing options rather than whether it is effective at all. However, answering this new question is difficult within clinical trials as there may be a large number of treatment alternatives and clinical trials comparing each and every combination are unlikely. Network meta-analysis methods are frequently used in comparative effectiveness research in such cases where there is little or no evidence from direct head-to-head clinical trial comparisons.

NETWORK META-ANALYSIS

Comparisons between treatments included in RCTs, often referred to as "direct evidence," are considered a highly reliable source of evidence for healthcare decision making. Ideally, an RCT would compare all relevant comparators in a disease area in order to understand the relative effects of all treatments and to simplify decision making. However, this is generally impractical and new treatments are most often compared to a standard of care or placebo, resulting in a lack of direct evidence between new or newer treatments.


A traditional pairwise meta-analysis typically involves a comparison of effects between two treatments. In the absence of direct head-to-head evidence between two treatments of interest, a network meta-analysis (NMA) can be conducted if both treatments have been compared to a common comparator. [25, 38-42] The estimate of treatment effect obtained from such an analysis is referred to as "indirect evidence." That is, an indirect estimate of the effect of treatment A over B can be obtained by comparing trials of A vs. C and B vs. C. Extending this concept, NMAs can also allow simultaneous comparison of more than two treatments. Formally, NMA can be defined as a statistical combination of all available evidence for an outcome from several studies across multiple treatments [25, 39] to generate estimates of pairwise comparisons of each intervention to every other intervention within a network.

Evidence Networks

Network meta-analyses are so named because all of the treatments analyzed are connected to every other treatment via a network of RCT comparisons, sometimes referred to as the "evidence network." In the evidence network, each treatment is depicted as a node and the RCTs containing the treatments are represented as lines connecting the nodes. Figure 2 shows examples of networks of varying complexity.

To convey more information on the available clinical trials, the size of each node can be made proportional to the number of patients receiving that treatment and thickness of the lines between treatments can be proportional to the number of available RCTs informing the comparison.

Figures 2a-2c demonstrate networks that generate indirect evidence because all treatment nodes are connected through a common comparator (treatment A), allowing indirect comparisons between treatments not having direct RCT data available (such as B vs. C, C vs. F, B vs. F). [25] Analyses of these networks are commonly referred to as indirect treatment comparisons (ITC).
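The node-and-line description of an evidence network maps naturally onto a graph data structure. The following sketch uses hypothetical helper names and a hypothetical star-shaped set of trials to show how direct evidence and common comparators could be identified programmatically.

```python
from collections import defaultdict

def build_network(trials):
    """Build an undirected evidence network from (treatment, treatment) pairs."""
    network = defaultdict(set)
    for first, second in trials:
        network[first].add(second)
        network[second].add(first)
    return network

def has_direct_evidence(network, x, y):
    """True if at least one trial compares x and y head to head."""
    return y in network[x]

def common_comparators(network, x, y):
    """Treatments that both x and y have been compared against."""
    return network[x] & network[y]

# Star-shaped network: every treatment has been compared with A only.
star = build_network([("A", "B"), ("A", "C"), ("A", "D")])
via = common_comparators(star, "B", "C")  # comparators enabling B vs. C indirectly
```

In the star network, B and C share comparator A but have no direct trial, so only an indirect comparison is possible; adding a B vs. C trial would close the loop and supply both direct and indirect evidence.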


Figure 3. Classifications of meta-analyses.

Figures 2d and 2e include the same treatments and RCT comparisons as Figures 2a and 2b but include additional trials, BC and CD, which form "closed loops" where direct and indirect evidence are analyzed (i.e., ABC and ACD). Analyses combining both types of data are often referred to as mixed treatment comparisons (MTCs), and inclusion of both types of evidence can help strengthen the precision of treatment effects between a pair of treatments in the network. [25, 43]

The terms NMA, ITC, and MTC are often used interchangeably. Technically, network meta-analysis is a broader concept and can be used whenever the evidence base consists of two or more trials connecting three or more treatments. [25] ITCs and MTCs can be considered as sub-classifications within NMAs (Figure 3).

METHODOLOGY AND ANALYSIS

Indirect Comparison Model of Two Treatments

Lack of direct treatment comparison data has been a persistent problem in the medical literature. To overcome this lack of data, some researchers have pooled findings from only the active treatment arms of the original controlled trials. [38, 44, 45] However, using the data in this way "breaks randomization" and fails to separate the efficacy of the drugs from possible placebo effects. [25] Also, such pooled responses fail to account for the differences in underlying baseline characteristics, resulting in biased estimates. [25, 38, 46]

Bucher and colleagues proposed a model for making indirect comparisons of two treatments while preserving randomization of the originally assigned patient groups. [38] Under the assumption that the magnitude of the treatment effect is constant regardless of any differences in the populations' baseline characteristics, it was possible to generate an unbiased estimate of treatment effect.

Consider a situation where separate studies have compared treatments A vs. B and treatments A vs. C. For a binary outcome measure, suppose that the probability of an event for patients on treatment A, B, and C is $P_A$, $P_B$, and $P_C$, respectively. The treatment effect for each of the trials can be assessed through the odds ratios,

$$OR_{BA} = \frac{P_B/(1-P_B)}{P_A/(1-P_A)} \quad \text{and} \quad OR_{CA} = \frac{P_C/(1-P_C)}{P_A/(1-P_A)}.$$

A summary odds ratio of indirect comparison B vs. C can be computed by taking the ratio of the odds ratios from studies comparing A vs. B and A vs. C:

$$OR_{BC} = \frac{OR_{BA}}{OR_{CA}}.$$

To allow for parametric hypothesis testing ($H_0\colon OR_{BC} = 1$), a natural log transformation (ln) of the above yields

$$\ln(OR_{BC}) = \ln(OR_{BA}) - \ln(OR_{CA}),$$

from which the variance can be estimated because $OR_{CA}$ and $OR_{BA}$ are estimated from different studies and are statistically independent:

$$\mathrm{Var}(\ln OR_{BC}) = \mathrm{Var}(\ln OR_{CA}) + \mathrm{Var}(\ln OR_{BA}).$$

This method is advantageous because of its simplicity and wide applicability in the case of two treatments compared against a common comparator. The model also protects against some biases, but may still lead to some inaccuracies. The authors tested their proposed model in a meta-analysis of RCTs that compared two experimental regimens and one standard regimen for primary and secondary prevention of Pneumocystis carinii pneumonia in HIV infection. Results for indirect estimates were in the same direction as the direct observed data, but a difference remained in the magnitude of treatment effect between direct and indirect data, suggesting that this model may indeed protect against some sources of bias, yet remains at risk for some inaccuracies. Potential differences between study populations and definition or measurement of outcomes were thought to be key sources of bias. [38]
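The adjusted indirect comparison can be written in a few lines. In the sketch below, the per-trial log odds ratio variance uses the standard large-sample formula (the sum of reciprocal cell counts), which is an assumption not stated in the text; the function names and event counts are hypothetical.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Log odds ratio (treatment vs. control) and its large-sample variance."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    log_or = math.log((a * d) / (b * c))
    variance = 1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d  # assumed variance formula
    return log_or, variance

def bucher_indirect(log_or_ba, var_ba, log_or_ca, var_ca):
    """Indirect B vs. C comparison through the common comparator A."""
    log_or_bc = log_or_ba - log_or_ca  # ln(OR_BC) = ln(OR_BA) - ln(OR_CA)
    var_bc = var_ba + var_ca           # independent trials: variances add
    se = math.sqrt(var_bc)
    ci = (math.exp(log_or_bc - 1.96 * se), math.exp(log_or_bc + 1.96 * se))
    return math.exp(log_or_bc), ci, var_bc

# Hypothetical trials: B vs. A (30/100 vs. 40/100) and C vs. A (20/100 vs. 40/100).
ba = log_odds_ratio(30, 100, 40, 100)
ca = log_odds_ratio(20, 100, 40, 100)
or_bc, ci_bc, var_bc = bucher_indirect(ba[0], ba[1], ca[0], ca[1])
```

Because the two variances add, the indirect estimate is always less precise than either of the direct comparisons that feed it.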

Indirect Comparison Methods for Two or More Treatments

When direct data of treatments are sparse, some studies have included pairwise meta-analyses of several treatments compared to placebo and qualitatively compared the results of those. [47, 48] Although accurate to some degree, such a naïve "indirect" comparison can be misleading because it can neither generate a relative effect measure with associated uncertainty (without also performing an additional indirect comparison such as the Bucher method) nor incorporate any available direct data. In these cases an NMA is the most appropriate and informative approach.

For a network including loops of RCTs such as the ABC network in Figure 2d, the statistical model must reflect the mathematical relationships between the treatment effects within RCTs containing direct or indirect evidence. [25, 42] The basic premise of the calculations behind MTC analyses is similar to that of the Bucher method described above. One treatment within the network is designated the reference or base treatment, b, against which all others are estimated. Standard convention is to select b to be placebo, some other standard of care, or the most-studied treatment within the evidence network. [24]

The effects for some outcome $d_{Xb}$ (e.g., log odds ratio, log hazard ratio, risk difference, mean difference) are then calculated for each trial, where X is any treatment compared to b. RCTs not containing treatment b (e.g., treatment C vs. treatment D) are specified in the model as $d_{DC}$. Because both treatments C and D must somehow connect to b within the evidence network, $d_{DC}$ can be expressed as $d_{Db} - d_{Cb}$. If trials comparing C vs. b and/or D vs. b are also within the network, then all CD, Cb, and Db trials jointly estimate the $d_{Db}$ and $d_{Cb}$ parameters. If no Cb or Db trials are included, the parameters are estimated via other trials connecting C and D to b. A simple example of this is shown in Figure 4. In summary, NMA models generate the effect estimates of each intervention relative to the reference treatment b, which are informed by trials comparing those treatments to b in addition to trials comparing other treatments.
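The joint estimation described above can be sketched for the smallest case of a reference treatment b and two other treatments C and D. The sketch assumes a fixed-effect model, two-arm trials only, and weighted least squares with inverse-variance weights; it is an illustration of the consistency relation, not the model from the cited references, and all names and numbers are hypothetical.

```python
def nma_fixed_effect(trials):
    """Estimate d_Cb and d_Db from Cb, Db, and DC trials by weighted least squares.

    trials: iterable of (comparison, estimate, variance), where comparison is
    "Cb" (C vs. b), "Db" (D vs. b), or "DC" (D vs. C, modelled as d_Db - d_Cb).
    """
    # Each comparison is a linear combination of the basic parameters (d_Cb, d_Db).
    design = {"Cb": (1.0, 0.0), "Db": (0.0, 1.0), "DC": (-1.0, 1.0)}
    a11 = a12 = a22 = b1 = b2 = 0.0
    for comparison, estimate, variance in trials:
        x1, x2 = design[comparison]
        w = 1.0 / variance
        a11 += w * x1 * x1
        a12 += w * x1 * x2
        a22 += w * x2 * x2
        b1 += w * x1 * estimate
        b2 += w * x2 * estimate
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("C and D are not both connected to the reference b")
    # Solve the 2x2 normal equations explicitly.
    d_cb = (a22 * b1 - a12 * b2) / det
    d_db = (a11 * b2 - a12 * b1) / det
    return d_cb, d_db

# No Db trial is available: d_Db is informed only through the Cb and DC trials.
d_cb, d_db = nma_fixed_effect([("Cb", 0.5, 0.04), ("DC", 0.3, 0.05)])
```

In this hypothetical input, D has never been compared with b directly, yet its effect relative to b is still estimable because the DC trial links it to the reference through C.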

Network meta-analyses can also be performed with either fixed-effect or random-effects models, using a frequentist or Bayesian approach, analogous to pairwise analyses. In practice, frequentist methodologies are often used for traditional pairwise meta-analyses and Bayesian methods are preferred for complex evidence networks. Bayesian NMAs provide joint posterior distributions of the effects of all treatments, which are particularly useful for sensitivity analysis within decision models or cost-effectiveness models that use NMA data. They also provide a probability calculation that allows rank-ordering of the interventions, i.e., the probability that a particular drug is best, second best, third best, and so on. Hence, Bayesian analyses provide information that is directly relevant to decision makers. [25]
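Rank probabilities are typically computed by ranking the treatments within each posterior draw and counting how often each treatment occupies each rank. The sketch below assumes posterior samples are already available (for example, from MCMC output); the function name is hypothetical, and the simulated independent normal draws are only a stand-in for a real joint posterior.

```python
import random

def rank_probabilities(draws, lower_is_better=True):
    """Probability that each treatment holds each rank (best first).

    draws: dict mapping treatment name to a list of posterior samples,
    all lists having the same length and aligned by iteration.
    """
    treatments = list(draws)
    n_draws = len(draws[treatments[0]])
    counts = {t: [0] * len(treatments) for t in treatments}
    for i in range(n_draws):
        # Rank treatments within this single posterior draw.
        ordered = sorted(treatments, key=lambda t: draws[t][i],
                         reverse=not lower_is_better)
        for rank, treatment in enumerate(ordered):
            counts[treatment][rank] += 1
    return {t: [c / n_draws for c in counts[t]] for t in treatments}

# Simulated stand-in for posterior samples of log odds ratios vs. placebo.
random.seed(0)
posterior = {
    "Placebo": [0.0] * 2000,
    "Drug B": [random.gauss(-0.3, 0.15) for _ in range(2000)],
    "Drug C": [random.gauss(-0.5, 0.25) for _ in range(2000)],
}
probabilities = rank_probabilities(posterior)
```

Ranking within each draw, rather than ranking the posterior means, is what lets the uncertainty and correlation in the joint posterior carry through to the rank probabilities.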

Figure 4. Indirect and direct effect estimations within a network meta-analysis (NMA). Reprinted from Value in Health, 17(2), Jansen J et al. "Indirect Treatment Comparison/Network Meta-Analysis Study Questionnaire to Assess Relevance and Credibility to Inform Health Care Decision Making: An ISPOR-AMCP-NPC Good Practice Task Force Report", 157-73, 2014, with permission from Elsevier.


Extension of the Network

An NMA will typically include only the treatments of interest for the research objective. Under circumstances when it is not possible to form a connected network of these treatments based on the available randomized data, it may be required to introduce additional treatments so that a connected network can be formed. [49] For example, if there are three treatments of interest (A, B, and C) and treatments A and B have been compared in an RCT but treatment C has not been compared to either A or B, C will not be included in the network. In such a case, an additional treatment X should be included if there are AX and CX trials to connect C with treatments A and B. [49, 50] If no trials exist to connect the network, an informed assumption between the disconnected trial(s) and the rest of the network may be made, with the caveat that the quality of comparisons made between treatments "connected" by the assumption will depend on the validity of the data imputed into the model. Comparisons within the fully-connected network, however, will be unaffected.
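Whether the treatments of interest form a connected network can be checked with a simple graph traversal. The sketch below uses a breadth-first search; the function name is hypothetical and the trial lists mirror the A, B, C, and X example in the text.

```python
from collections import deque

def is_connected(treatments_of_interest, trials):
    """True if every treatment of interest is reachable from the first one.

    trials: iterable of (treatment, treatment) pairs, one per RCT comparison;
    treatments outside the list of interest may act as bridges.
    """
    adjacency = {t: set() for t in treatments_of_interest}
    for first, second in trials:
        adjacency.setdefault(first, set()).add(second)
        adjacency.setdefault(second, set()).add(first)
    start = treatments_of_interest[0]
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in adjacency[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return all(t in seen for t in treatments_of_interest)

of_interest = ["A", "B", "C"]
without_bridge = is_connected(of_interest, [("A", "B")])
with_bridge = is_connected(of_interest, [("A", "B"), ("A", "X"), ("C", "X")])
```

With only the AB trial, C is unreachable; once the AX and CX trials are added, treatment X bridges C to the rest of the network.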

A consideration of extending a network does not have to be restricted to cases of disconnected networks. Even if the treatments of interest are connected, including additional treatments does have potential advantages such as providing additional evidence and strengthening inference, producing a more robust analysis which is less sensitive to specific sources of data, and the ability to check consistency more thoroughly. [49, 51] However, there are potential disadvantages as well. There is an increased danger of introducing effect modifiers as the "remotely" connected treatments are more likely to have been trialed on relatively different patient populations. There may be a particular danger in extending the network to include treatments that were trialed earlier, particularly if date of publication is associated with different patient characteristics or severity of condition. [49]

Assumptions in Network Meta-Analysis

Network meta-analyses combine data of multiple interventions across several RCTs to synthesize estimates of relative treatment effects to generate pairwise comparisons. The validity and accuracy of estimates from NMAs depend on the requirement that trials in the network are sufficiently comparable and similar to yield meaningful unbiased estimates. [25, 52] Three assumptions underlie NMA methodology and should always be tested. [53]

Homogeneity assumes that there is no significant variation (or if present, it is due to random chance) in treatment effects among studies of the same comparison. In other words, are all AB trials (and, separately, AC trials) "comparable" and estimating the same treatment effect? This assumption is as applicable to network analyses as it is in pairwise meta-analyses. Homogeneity can be assessed separately for each collection of identical comparisons within the network using standard statistical measures, such as the Q-statistic or $I^2$ (or both). [52, 54, 55] If heterogeneity exists, then the possible sources should be explored and implementation of random-effects modeling, sensitivity analyses, subgroup analyses, or meta-regression should be considered if sufficient data are available.
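Both measures can be computed from the study effect estimates and their variances. The sketch below uses Cochran's Q and the common definition $I^2 = \max(0, (Q - df)/Q) \times 100\%$; the function name and data are hypothetical, and in a network the calculation would be repeated for each set of trials making the same comparison.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: inverse-variance weighted squared deviations from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributable to heterogeneity.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical log odds ratios from four AB trials.
q, df, i_squared = heterogeneity([-0.4, -0.1, -0.7, 0.1], [0.04, 0.02, 0.09, 0.05])
```

Under homogeneity, Q is compared against a chi-square distribution with `df` degrees of freedom, while $I^2$ gives a scale-free description of how much of the observed variation exceeds what chance alone would produce.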

Trial evidence may be homogeneous within certain pairwise comparisons, but significant variation in trial characteristics across different comparisons within a network can still lead to biased estimates. This leads to the assumption of similarity that requires all trials included within a network to be "comparable" in terms of key factors that can be potential treatment effect modifiers (such as patient baseline characteristics, trial design, outcome definition and/or measurement, and follow-up time). Similarity, which cannot be formally tested and verified, can be gauged (though not proven) through quantitative techniques (sensitivity analysis, meta-regression, subgroup analysis) and assessed qualitatively using summary tables documenting relevant baseline characteristics of patients and description of studies. The assumption of similarity is not violated if differences in baseline or study characteristics between trials do not modify or influence treatment effect. It is only when such characteristics are treatment effect modifiers that the estimated treatment effect becomes biased.

When direct and indirect evidence are combined for a particular comparison, it is assumed that the direct and indirect comparisons agree. [25, 56-59] This assumption is termed consistency, and it should be assessed in every NMA. Figure 2d shows a simple closed-loop network, in which both direct and indirect evidence is possible for all pairwise comparisons. For example, an estimate of effect for B vs. C can be obtained directly from the BC trial and can also be estimated indirectly from the AC and AB trials. For this loop to be consistent, the direct estimate should be equivalent to the indirect estimate (i.e., $d_{BC} = d_{BA} - d_{CA}$). Of note, consistency is a property of closed loops of evidence, not of individual comparisons. [25, 50, 52] It is possible to say that the AB, BC and AC comparisons are consistent, whereas stating that the AB comparison is consistent with the AC comparison has no meaning.
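One simple way to examine a single closed loop, in the spirit of the Bucher indirect comparison, is to form the indirect estimate $d_{BA} - d_{CA}$, subtract it from the direct estimate of $d_{BC}$, and compare the difference with its standard error. The sketch below is only illustrative: the function name `loop_inconsistency` and all numbers are hypothetical, effects are assumed to be on an additive scale (for example, log odds ratios), and the three sources of evidence are assumed to be independent.

```python
import math


def loop_inconsistency(d_bc_direct, var_bc_direct, d_ba, var_ba, d_ca, var_ca):
    """Compare direct and indirect estimates of d_BC in an ABC loop.

    d_XY denotes the effect of X relative to Y on an additive scale.
    Returns (indirect estimate, direct minus indirect, z statistic, two-sided p).
    """
    d_bc_indirect = d_ba - d_ca          # indirect estimate via common comparator A
    var_indirect = var_ba + var_ca       # variances add for independent trials
    diff = d_bc_direct - d_bc_indirect   # inconsistency in the loop
    se_diff = math.sqrt(var_bc_direct + var_indirect)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return d_bc_indirect, diff, z, p


# Hypothetical pooled estimates (log odds ratios) and their variances
indirect, diff, z, p = loop_inconsistency(
    d_bc_direct=-0.1, var_bc_direct=0.04,
    d_ba=-0.5, var_ba=0.02,
    d_ca=-0.2, var_ca=0.03,
)
print(f"indirect={indirect:.2f}  difference={diff:.2f}  z={z:.2f}  p={p:.2f}")
```

A non-significant result here does not establish consistency, since such tests often have low power; it only fails to detect disagreement within the loop.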

Inconsistencies can be caused by differences in treatment effect modifiers among the studies within a loop. Although three independent studies forming a closed loop of evidence are unlikely to generate exact equality within a consistency evaluation, there are several published methods for evaluating acceptable ranges of consistency. [38, 39, 50, 59]

META-ANALYSIS: CHALLENGES AND OPPORTUNITIES

Study Comparability

One of the most common criticisms of meta-analyses is the "apples and oranges" phenomenon: the RCTs on which they are based all differ and, as such, inherently cannot be combined as if they were alike. Although it is true that studies available for pooling may vary with respect to design, quality, outcome measures or populations, well-designed meta-analyses can minimize the effects of these variations by preparing a protocol with well-defined objectives and criteria for including studies determined to be sufficiently similar for comparison. [34] Ironically, this criticism reflects one of the strengths of a meta-analysis. By combining data from different studies of different populations (provided it is clinically reasonable to do so), the results of a meta-analysis can be considered generalizable to a broader population of patients than those of any individual RCT.

Of course, not every study may be similar, for a number of reasons including different patient populations, outcome definitions, study-level covariates, patient-level covariates, and so on. Another critique is that, without individual patient data (IPD) to adjust for differences among the patients, the results of a meta-analysis cannot be valid. Obtaining the IPD from each study and pooling them into a single analysis is the ideal meta-analytic combination, and some have argued that meta-analyses must include IPD in order to be valid. The rationale behind this position is that individual studies are not identical and differing patient characteristics can influence study outcomes.

Meta-regression techniques may mitigate these differences by relating the size of the study effect to one or more characteristics of the trial. [60, 61] Meta-regression is an approach commonly used to address heterogeneity of effect between studies; it is a hypothesis-generating technique and should not be regarded as proof of causality. [62] One of the largest barriers to effective meta-regression is lack of power to detect an association between covariates and effects. This is particularly true if a meta-analysis contains a small number of studies or is an NMA including a number of treatments. In the latter case, the analysis must include a sufficient number of studies per treatment to estimate the individual treatment effects as well as the covariate effect, which is often difficult. The other substantial barrier to meta-regression is that the trial-level association between effect estimate and covariate may not accurately reflect the within-trial relationship because of ecological (aggregation) bias: effects can be expected to differ between groups and individuals when other, unaccounted-for effect modifiers confound the comparison. [31, 62]
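To make the idea concrete, the following is a minimal sketch of a fixed-effect meta-regression with a single trial-level covariate, fitted by inverse-variance weighted least squares. The function name `meta_regression`, the covariate (mean age) and all numbers are hypothetical; a real analysis would usually also allow for residual between-study heterogeneity.

```python
import math


def meta_regression(effects, variances, covariate):
    """Weighted least squares of study effect on one trial-level covariate.

    Returns (intercept, slope, standard error of the slope), treating the
    within-study variances as known and using inverse-variance weights.
    """
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / sw
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    if sxx == 0.0:
        raise ValueError("covariate does not vary across studies")
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # variance of the slope is 1 / weighted Sxx
    return intercept, slope, se_slope


# Hypothetical data: four trials, effect (log odds ratio) vs. mean age in years
mean_age = [50.0, 55.0, 60.0, 65.0]
effects = [0.1, 0.2, 0.3, 0.4]
variances = [0.04, 0.04, 0.04, 0.04]
intercept, slope, se_slope = meta_regression(effects, variances, mean_age)
print(f"slope={slope:.3f} per year (SE {se_slope:.3f}), z={slope / se_slope:.2f}")
```

Even though these invented effects rise perfectly linearly with mean age, the slope of 0.02 has a standard error of about 0.018 (z of roughly 1.1), which illustrates how little power four trials provide for detecting a covariate effect.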

Individual Patient Analysis

For these reasons, clinicians generally prefer treatment-covariate interaction estimates to be based on within-trial data, as these can relate patients' clinical characteristics directly to their treatment responses. [63] Introducing IPD for at least one of the studies can reduce the risk of aggregation bias and of underestimating the covariate effect. In meta-analyses where IPD are available for all included studies, the relationship between covariate(s) and effects can be well estimated. However, IPD for all trials are generally not attainable, and IPD for only one or a few trials is more likely. In these cases, the relationship between covariate and treatment effect can be estimated from the trials with available IPD and applied to the other trials, providing an overall meta-analytic estimate with reduced risk of aggregation bias. [63, 64] Of note, when trials do not differ in treatment effect-modifying characteristics (and IPD are therefore not required), including IPD improves precision but gives the same answer as the corresponding aggregate-data analysis in which no effect-modifying covariates are considered. [31]

Although labor-intensive with respect to acquiring and analyzing the data, including IPD within meta-analyses has several additional advantages beyond more precise estimation of treatment effects as a function of patient-level covariates. First, access to IPD for trials allows for re-adjustment of variables to create common variables, thereby expanding the network of studies suitable for inclusion in the analysis. Re-adjustment can be useful when different clinical definitions are used to define adverse events or when outcomes are based on a combination of variables that may collectively define a specific event. [31] The ability to re-analyze data is also important for outcomes that are dependent on exposure time or length of follow-up. Inclusion of IPD also permits greater flexibility in the analysis of subgroup data, defined by patient-level characteristics. Subgroup data are often not reported or not reported with standardization sufficient for combination within the meta-analysis.

When two treatments of interest are compared to a common treatment in two different trials, the Bucher method described above is a straightforward way to generate an indirect comparison. However, in order to generate an unbiased estimate, all patient characteristics known to influence treatment effect must be equal or assumed to be equal across the trials. If the distribution of important characteristics is unequal between the trials and IPD are available for one of them, Signorovitch et al. [65] have proposed adjusting the IPD so that the two trial populations become suitably comparable. In this method, the IPD from one trial are re-weighted to match the average baseline characteristics reported for the other trial.
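The re-weighting idea can be sketched for the simplest case of a single matching characteristic. The code below is not the implementation of Signorovitch et al.; it is a simplified illustration that assumes exponential weights $w_i = \exp(a\,(x_i - \bar{x}_{target}))$ and solves for $a$ by Newton's method so that the weighted mean of the IPD covariate equals the mean reported for the other trial. The function names, the covariate (age) and the data are all hypothetical.

```python
import math


def matching_weights(x, target_mean, tol=1e-10, max_iter=100):
    """Weights w_i = exp(a * (x_i - target_mean)) that balance one covariate.

    `a` minimizes sum(exp(a * (x_i - target_mean))), a convex objective whose
    first-order condition is that the weighted mean of x equals target_mean.
    """
    centered = [xi - target_mean for xi in x]
    if not (min(centered) < 0.0 < max(centered)):
        raise ValueError("target mean must lie strictly inside the IPD range")
    a = 0.0
    for _ in range(max_iter):
        e = [math.exp(a * c) for c in centered]
        grad = sum(c * ei for c, ei in zip(centered, e))
        hess = sum(c * c * ei for c, ei in zip(centered, e))
        step = grad / hess  # Newton update for a one-dimensional problem
        a -= step
        if abs(step) < tol:
            break
    return [math.exp(a * c) for c in centered]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def effective_sample_size(weights):
    # Approximate number of independent patients the weighted sample represents
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical IPD ages from one trial; the other trial reports a mean age of 60
ages = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0, 70.0]
weights = matching_weights(ages, target_mean=60.0)
ess = effective_sample_size(weights)
print(f"weighted mean age={weighted_mean(ages, weights):.2f}  ESS={ess:.1f}")
```

The same weights would then be applied to the outcomes in the IPD trial before forming the indirect comparison. The effective sample size falls below the number of patients, which is the price paid in precision for matching the populations.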

Study Biases

An implicit assumption regarding the statistical aggregation of a collection of study effects is that the identified collection is an unbiased random sample of the total universe of data. For several reasons, this assumption may not be true. The published scientific literature documents only a proportion of the results of all research carried out. Some studies may consist of proprietary data owned by the manufacturer and unavailable to the general public. Others may not be published because their authors chose not to write up and submit studies with uninteresting findings, or because journals chose not to accept them. [34] For example, there is evidence suggesting that studies with significant outcomes are more likely to be published, and are published more quickly, than those with non-significant outcomes. [66] Completely overcoming the possibility of publication bias is difficult, but carrying out as comprehensive a search as possible when obtaining studies for synthesis will help minimize its influence.

A final limitation of meta-analyses is that no algorithm for combining study-level effects can correct for qualitative flaws in the studies themselves, which can lead to bias in the meta-analysis outcomes. Analysts must be careful to identify any possible quality defects within the studies and understand their potential influence on the model outcomes (via sensitivity analysis or subgroup analysis, for example). Several scales have been developed for assessing the quality of randomized trials, [67] although little evidence supports an association between the magnitude of the treatment effect and the quality of the study. [68] Nevertheless, by focusing attention on the quality of RCTs, meta-analysis has probably provided a powerful stimulus for improving their conduct and reporting. [34, 69]

Network Meta-Analysis Challenges

NMAs are necessary when more than two treatments of interest are compared in a number of trials. However, adoption of NMAs has been hindered by their relative complexity compared to standard pairwise analyses. The statistical concepts behind calculation of the treatment effects among the comparators are not immediately intuitive, particularly if the network of clinical trials is complex. Further, NMAs are frequently executed using Bayesian methodology, which may be unfamiliar to many clinicians or other decision makers. In a recent survey, selected samples of authors of Cochrane reviews were asked about their knowledge and view of indirect treatment comparison methods. [70] Their responses highlighted the usefulness of the methods but also expressed caution about their validity and the need to know more about the methods. It was evident that clinicians were more reserved in considering the use of indirect evidence in decision making, compared to academic researchers.

Conducting a network meta-analysis is often perceived to be more resource-consuming than a traditional pairwise meta-analysis. Additionally, the underlying assumptions in a network meta-analysis, especially similarity and consistency, are often a source of skepticism towards these methods. Authors often do not present explicit evidence supporting these assumptions, which adds to readers' unease, especially given potential differences in treatment effect modifiers among trials. [53] Also, there have been recent findings that significant inconsistency between direct and indirect estimates may be more prevalent than previously observed, [71] which adds to the concern. In order to overcome these concerns, it is essential for all studies to adopt standardized methods to assess the assumptions and to report the assessment methods applied. [70]

Baseline risk is a proxy for unmeasured but important patient-level characteristics that may be a potential source of heterogeneity and inconsistency in a meta-analysis. Baseline risk reflects the underlying burden of disease in the population and is the average risk of the outcome of interest had the patients been left untreated. [16] Trials with varying levels of baseline risk can result in different treatment outcomes, and correcting for these differences may increase the validity of model results. Achana et al. describe methods to adjust for baseline risk in pairwise meta-analyses and extend these methods to NMAs. [72]

Multivariate Meta-Analyses

Meta-analyses that consider multiple outcomes usually do so in separate analyses. However, because multiple endpoints are usually correlated, a simultaneous analysis that takes their correlation into account should add efficiency and accuracy [73, 74]. Multivariate meta-analysis can describe the associations between different estimates of effect and provide estimates with smaller uncertainty intervals. Further, results of these analyses include joint confidence regions of the outcomes, which are particularly useful in making predictions of each outcome and for use within cost-effectiveness analyses [35, 75, 76]. The difficulty with multivariate meta-analysis models is that they are more complex to execute and may require from each study the within-study correlation between the outcomes of interest, which is often not reported.

CONCLUSION

Meta-analyses offer several benefits, including the ability to address uncertainty and heterogeneity when results of multiple RCTs disagree, increased statistical power for outcomes and subgroups, and the generation of new knowledge and new research questions [34]. As the number of clinical trials has proliferated in recent years, the value of meta-analyses to synthesize and organize clinical data will likely increase. Therefore, researchers must learn to conduct high-quality studies, and clinicians and decision makers must be able to critically assess the value and reliability of meta-analyses in order to have confidence in the results. Several guides, scales, and checklists are also available to assess the quality of reviews, [77-79] meta-analyses, [52, 80, 81] and the recommendations informed by meta-analyses. [82] When executed and used correctly, meta-analysis can be a powerful tool and will likely have an increasingly important role in medical research, drug regulation, public policy, and clinical practice.

REFERENCES

[1] Cooper H, Hedges LV. Research synthesis as a scientific process. In: Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York: Russell Sage Foundation; 2009.

[2] Glass GV. Primary, Secondary, and Meta-Analysis of Research. Educational Researcher 1976; 5: 3-8.

[3] O'Rourke K. An historical perspective on meta-analysis: dealing quantitatively with varying study results. J. R. Soc. Med. 2007; 100: 579-82.

[4] Pearson K. Report on Certain Enteric Fever Inoculation Statistics. BMJ 1904; 2: 1243-6.

[5] Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, Rennie D, Moher D, Becker BJ, Sipe TA, Thacker SB. Meta-analysis of observational studies in epidemiology: a proposal for reporting. Meta-analysis Of Observational Studies in Epidemiology (MOOSE) group. JAMA 2000; 283: 2008-12.

[6] Jones DR. Meta-analysis: weighing the evidence. Stat. Med. 1995; 14: 137-49.

[7] Elwood PC, Cochrane AL, Burr ML, Sweetnam PM, Williams G, Welsby E, Hughes SJ, Renton R. A randomized controlled trial of acetyl salicylic acid in the secondary prevention of mortality from myocardial infarction. BMJ 1974; 1: 436-40.

[8] Elwood P. The first randomized trial of aspirin for heart attack and the advent of systematic overviews of trials. J. R. Soc. Med. 2006; 99: 586-8.

[9] Chalmers TC, Matta RJ, Smith H, Jr., Kunzler AM. Evidence favoring the use of anticoagulants in the hospital phase of acute myocardial infarction. New Engl. J. Med. 1977; 297: 1091-6.

[10] Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog. Cardiovasc. Dis. 1985; 27: 335-71.

[11] Pandis N. The evidence pyramid and introduction to randomized controlled trials. Am. J. Orthod. Dentofacial. Orthop. 2011; 140: 446-7.

[12] Berman NG, Parker RA. Meta-analysis: neither quick nor easy. BMC Med. Res. Methodol 2002; 2: 10.

[13] Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gotzsche PC, Ioannidis JP, Clarke M, Deveraux PJ, Kleijnen J, Moher D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. Ann. Intern. Med. 2009; 151: W65-94.

[14] Sacks HS, Berrier J, Reitman D, Ancona-Berk VA, Chalmers TC. Meta-analyses of randomized controlled trials. New Engl. J. Med. 1987; 316: 450-5.

[15] Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-analysis. 2nd edition. New York: Russell Sage Foundation; 2009.

[16] Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]: The Cochrane Collaboration; 2011.

[17] Borenstein M. Introduction to Meta-Analysis. Chichester: John Wiley & Sons; 2009.

[18] Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J. Nat. Cancer Inst. 1959; 22: 719-48.

[19] Wolf FM. Meta-analysis: quantitative methods for research synthesis: New York: Sage Publications; 1986.

[20] DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin. Trials. 1986; 7: 177-88.

[21] Petitti DB. Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine. 2nd ed. New York: Oxford University Press; 2000.

[22] Shadish WR, Haddock CK. Combining estimates of effect size. In: Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009.

[23] Sutton AJ, Abrams KR, Jones DR, Sheldon TA, Song F. Methods for Meta-Analysis in Medical Research. London: John Wiley & Sons; 2000.

[24] Dias S, Sutton AJ, Ades AE, Welton NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med. Decis Making 2013; 33: 607-17.

[25] Jansen JP, Fleurence R, Devine B, Itzler R, Barrett A, Hawkins N, Lee K, Boersma C, Annemans L, Cappelleri JC. Interpreting indirect treatment comparisons and network meta-analysis for health-care decision making: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 1. Value Health 2011; 14: 417-28.

[26] Parker M. Foundations of Statistics – Frequentist and Bayesian 2004 2/2/2014. Available from: http://www.austincc.edu/mparker/stat/nov04/talk_nov04.pdf (last accessed on February 9, 2014).

[27] Sutton AJ, Higgins JP. Recent developments in meta-analysis. Stat. Med. 2008; 27: 625-50.

[28] Lau J, Ioannidis JP, Schmid CH. Summing up evidence: one answer is not always enough. Lancet 1998; 351: 123-7.

[29] Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2002;64:583-639.

[30] US Food and Drug Administration CDER2013139 White Paper2013 1/21/2014. Available from: www.fda.gov/downloads/Drugs/NewsEvents/UCM372069.pdf (last accessed on February 9, 2014).

[31] Berlin JA, Crowe BJ, Whalen E, Xia HA, Koro CE, Kuebler J. Meta-analysis of clinical trial safety data in a drug development program: answers to frequently asked questions. Clin Trials 2013; 10: 20-31.

[32] Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC. A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts: Treatments for myocardial infarction. JAMA 1992; 268: 240-8.

[33] Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC. Cumulative Meta-Analysis of Therapeutic Trials for Myocardial Infarction. New Engl. J. Med. 1992; 327: 248-54.

[34] Cappelleri JC, Ioannidis JPA, Lau J. Meta-Analysis of Therapeutic Trials. In: Chow S-C, editor. Encyclopedia of Biopharmaceutical Statistics. New York: Informa Healthcare; 2010.

[35] Dias S, Sutton AJ, Welton NJ, Ades AE. Evidence synthesis for decision making 6: embedding evidence synthesis in probabilistic cost-effectiveness analysis. Med. Decis Making 2013; 33: 671-8.

[36] Ades AE, Sculpher M, Sutton A, Abrams K, Cooper N, Welton N, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics 2006; 24: 1-19.

[37] Cappelleri JC. Network meta-analysis for comparative effectiveness research. Presented at 19th Annual Biopharmaceutical Applied Statistics Symposium, Savannah, GA, 2012.

[38] Bucher HC, Guyatt GH, Griffith LE, Walter SD. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J. Clin. Epidemiol. 1997; 50: 683-91.

[39] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat. Med. 2002; 21: 2313-24.

[40] Song F, Altman DG, Glenny AM, Deeks JJ. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003; 326: 472.

[41] Salanti G, Higgins JP, Ades AE, Ioannidis JP. Evaluation of networks of randomized trials. Stat. Methods Med. Res. 2008; 17: 279-301.

[42] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat. Med. 2004; 23: 3105-24.

[43] Caldwell DM, Ades AE, Higgins JP. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ 2005; 331: 897-900.

[44] Felson DT, Anderson JJ, Meenan RF. The comparative efficacy and toxicity of second-line drugs in rheumatoid arthritis. Results of two meta-analyses. Arthritis. Rheum. 1990; 33: 1449-61.

[45] O'Brien BJ, Anderson DR, Goeree R. Cost-effectiveness of enoxaparin versus warfarin prophylaxis against deep-vein thrombosis after total hip replacement. CMAJ 1994; 150: 1083-90.

[46] Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ 2009; 338: b1147.

[47] Chou R, Carson S, Chan BK. Gabapentin versus tricyclic antidepressants for diabetic neuropathy and post-herpetic neuralgia: discrepancies between direct and indirect meta-analyses of randomized controlled trials. J. Gen. Intern. Med. 2009; 24: 178-88.

[48] Chapple CR, Khullar V, Gabriel Z, Muston D, Bitoun CE, Weinstein D. The effects of antimuscarinic treatments in overactive bladder: an update of a systematic review and meta-analysis. Eur. Urol. 2008; 54: 543-62.

[49] Dias S, Welton NJ, Sutton AJ, Ades AE. Evidence synthesis for decision making 1: introduction. Med. Decis Making 2013; 33: 597-606.

[50] Dias S, Welton NJ, Sutton AJ, Caldwell DM, Lu G, Ades AE. Evidence synthesis for decision making 4: inconsistency in networks of evidence based on randomized controlled trials. Med. Decis. Making 2013; 33: 641-56.

[51] Cooper NJ, Peters J, Lai MC, Juni P, Wandel S, Palmer S, Paulden M, Conti S, Welton NJ, Abrams KR, Bujkiewicz S, Spiegelhalter D, Sutton AJ. How valuable are multiple treatment comparison methods in evidence-based health-care evaluation? Value Health 2011; 14: 371-80.

[52] Hoaglin DC, Hawkins N, Jansen JP, Scott DA, Itzler R, Cappelleri JC, Boersma C, Thompson D, Larholt KM, Diaz M, Barrett A. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR Task Force on Indirect Treatment Comparisons Good Research Practices: part 2. Value Health 2011; 14: 429-37.

[53] Donegan S, Williamson P, Gamble C, Tudur-Smith C. Indirect comparisons: a review of reporting and methodological quality. PloS one 2010; 5: e11054.

[54] Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat. Med 2002; 21: 1539-58.

[55] Cochran WG. The combination of estimates from different experiments. Biometrics 1954; 10: 101-29.

[56] Cooper NJ, Sutton AJ, Morris D, Ades AE, Welton NJ. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: Application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat. Med. 2009; 28: 1861-81.

[57] Salanti G, Kavvoura FK, Ioannidis JP. Exploring the geometry of treatment networks. Ann. Intern. Med. 2008; 148: 544-53.

[58] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat. Med. 2010; 29: 932-44.

[59] Lu G, Ades AE. Assessing Evidence Inconsistency in Mixed Treatment Comparisons. J. Am. Stat. Assoc. 2006; 101: 447.

[60] Thompson SG, Sharp SJ. Explaining heterogeneity in meta-analysis: a comparison of methods. Stat. Med. 1999; 18: 2693-708.

[61] Christensen R. Beyond RefMan: Meta-regression analysis in context: The Cochrane Collaboration; 2014. Available from: http://musculoskeletal.cochrane.org/sites/musculoskeletal.test.cochrane.org/files/uploads/RC_Lecture.pdf (last accessed on February 9, 2014).

[62] Baker WL, White CM, Cappelleri JC, Kluger J, Coleman CI; Health Outcomes, Policy, and Economics (HOPE) Collaborative Group. Understanding heterogeneity in meta-analysis: the role of meta-regression. Int. J. Clin. Pract. 2009; 63: 1426-34.

[63] Riley RD, Lambert PC, Staessen JA, Wang J, Gueyffier F, Thijs L, Boutitie F. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat. Med. 2008; 27: 1870-93.

[64] Jansen JP, Capkun-Niggli G, Cope S. Incorporating patient level data in a network meta-analysis. Presented at ISPOR 18th Annual International Meeting, New Orleans, LA, 2013.

[65] Signorovitch JE, Wu EQ, Yu AP, Gerrits CM, Kantor E, Bao Y, Gupta SR, Mulani PM. Comparative effectiveness without head-to-head trials: a method for matching-adjusted indirect comparisons applied to psoriasis treatment with adalimumab or etanercept. Pharmacoeconomics 2010; 28: 935-45.

[66] Sutton AJ. Publication bias. In: Cooper H, Hedges LV, Valentine JC, editors. The Handbook of Research Synthesis and Meta-Analysis. 2nd ed. New York, NY: Russell Sage Foundation; 2009.

[67] Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin. Trials 1995; 16: 62-73.

[68] Balk EM, Bonis PA, Moskowitz H, Schmid CH, Ioannidis JP, Wang C, Lau J. Correlation of quality measures with estimates of treatment effect in meta-analyses of randomized controlled trials. JAMA 2002; 287: 2973-82.

[69] Halpern SD, editor. Evidence-based Obstetric Anesthesia. Oxford: Blackwell Publishing; 2007.

[70] Abdelhamid AS, Loke YK, Parekh-Bhurke S, Chen Y-F, Sutton A, Eastwood A. Use of indirect comparison methods in systematic reviews: a survey of Cochrane review authors. Research Synthesis Methods 2012; 3: 71-9.

[71] Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland R, Chen YF, Glenny AM, Deeks JJ, Altman DG. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011; 343: d4909.

[72] Achana FA, Cooper NJ, Dias S, Lu G, Rice SJ, Kendrick D, Sutton AJ. Extending methods for investigating the relationship between treatment effect and baseline risk from pairwise meta-analysis to network meta-analysis. Stat. Med. 2013; 32: 752-71.

[73] Jackson D, Riley R, White IR. Multivariate meta-analysis: Potential and promise. Stat. Med. 2011; 30: 2481-98.

[74] Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. Journal of the Royal Statistical Society: Series A (Statistics in Society) 2009; 172: 789-811.

[75] Riley RD, Abrams KR, Lambert PC, Sutton AJ, Thompson JR. An evaluation of bivariate random-effects meta-analysis for the joint synthesis of two correlated outcomes. Stat. Med. 2007; 26: 78-97.

[76] Nam IS, Mengersen K, Garthwaite P. Multivariate meta-analysis. Stat. Med. 2003; 22: 2309-33.

[77] Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J. Clin. Epidemiol. 1991; 44: 1271-8.

[78] Oxman AD, Guyatt GH, Singer J, Goldsmith CH, Hutchison BG, Milner RA, Streiner DL. Agreement among reviewers of review articles. J. Clin. Epidemiol. 1991; 44: 91-8.

[79] Council NR. Finding What Works in Health Care: Standards for Systematic Reviews. In: Eden J, Levit L, Berg A, Morton S, editors. Washington: The National Academies Press; 2011.

[80] Ades AE, Caldwell DM, Reken S, Welton NJ, Sutton AJ, Dias S. Evidence synthesis for decision making 7: a reviewer's checklist. Med. Decis. Making 2013; 33: 679-91.

[81] Jansen JP, Trikalinos T, Cappelleri JC, Daw J, Andes S, Eldessouki R, Salanti G. Indirect treatment comparison/network meta-analysis study questionnaire to assess relevance and credibility to inform health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. Value Health 2014; 17: 157-73.

[82] Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, Norris S, Falck-Ytter Y, Glasziou P, DeBeer H, Jaeschke R, Rind D, Meerpohl J, Dahm P, Schunemann HJ. GRADE guidelines: 1. Introduction-GRADE evidence profiles and summary of findings tables. J. Clin. Epidemiol. 2011; 64: 383-94.
