Chapter 1: Literature review

1.3 Meta-analysis – theory and practice

1.3.5 Interpretation of the cumulative evidence

1.3.5.1 Quantifying and explaining heterogeneity

So far, meta-analysis has been used to estimate the average effect across a number of studies addressing a similar question. When the effects of the included studies are similar in magnitude, drawing conclusions from the pooled estimate is relatively simple. When results differ, however, conclusions are less clear (Higgins and Thompson, 2003). One of the primary goals of meta-analysis is to understand the extent to which the results from primary studies are consistent (or inconsistent) with each other. In this way, we are able to assess the “combinability” of the included studies.

The assessment of heterogeneity is one of the most important parts of a meta-analysis (Higgins et al. 2003; Nakagawa and Santos, 2012). This is because the presence of true heterogeneity affects the choice of statistical model (Huedo-Medina et al. 2006) and therefore directly affects the reliability of the estimated meta-analytic mean (Nakagawa and Santos, 2012). In general terms, a test of heterogeneity assesses whether there are true differences underlying the results of the included studies, irrespective of sampling error. Traditionally, heterogeneity has been assessed using either the Q-test proposed by Cochran (1954), which indicates whether there is statistically significant heterogeneity, or the $I^2$ index defined by Higgins and Thompson (2002), which indicates the extent of heterogeneity (which can be categorised as small, medium or large; Higgins et al. 2003).

The Q statistic is calculated as the weighted sum of squared deviations of each study’s estimate from the overall effect estimate. The statistical significance of Q is tested against a chi-square distribution with k − 1 degrees of freedom, k being the number of studies (for a detailed description of this test see Cochran, 1954 and Higgins et al. 2003). Hardy and Thompson (1998) demonstrated that this test has low power to detect heterogeneity when the number of included studies is small (i.e. n = 10), and it has been suggested that a value of 0.10 be used as the cut-off for significance (Higgins et al. 2003). Higgins et al. (2003) also demonstrated that the Q-test has excessive power when the number of studies is large. Using the Q-test to assess heterogeneity in a meta-analysis of 135 trials with over 15 000 participants, they obtained a p-value of 0.005, suggesting significant heterogeneity in the data. They argued that this p-value does not describe the extent to which heterogeneity affects the results of the meta-analysis. Using the $I^2$ index on the same dataset, they showed that whilst heterogeneity was present ($I^2$ = 26%), it was unlikely to have a major impact on the estimated effects.
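
The calculation of Q described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the cited papers: it assumes the weights are the inverse sampling variances and that the overall effect is the inverse-variance weighted mean. The function names (`chi2_sf`, `cochran_q`) and the example data are hypothetical, and the chi-square tail probability is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting point: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        a, sf = 1.0, math.exp(-z)
    else:
        a, sf = 0.5, math.erfc(math.sqrt(z))
    # Step up with the recurrence for the regularised upper incomplete gamma
    # function: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        sf += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return sf


def cochran_q(effects, variances):
    """Return (Q, degrees of freedom, p-value) for k >= 2 effect sizes."""
    if len(effects) < 2:
        raise ValueError("at least two studies are needed")
    # Assumption: weights are inverse sampling variances.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted sum of squared deviations from the overall estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return q, df, chi2_sf(q, df)


# Hypothetical data: three effect sizes with equal sampling variances.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
q, df, p = cochran_q(effects, variances)
print(f"Q = {q:.3f}, df = {df}, p = {p:.4f}")  # Q = 8.000, df = 2, p = 0.0183
```

With only three studies, the p-value should be read against the 0.10 cut-off mentioned above rather than the conventional 0.05.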

The $I^2$ index is an improvement over the Q-test as a measure of heterogeneity in meta-analysis (Nakagawa and Santos, 2012). Rather than focusing on statistical significance, the $I^2$ index quantifies the degree of between-study variance that is not due to chance alone (Higgins et al. 2003). Using the notation of Nakagawa and Santos (2012), the $I^2$ index is defined by:

$$I^2 = \frac{\tau^2}{\tau^2 + \sigma^2_m}$$

where $\tau^2$ is the between-study variance and $\sigma^2_m$ is the typical sampling error variance (see Higgins and Thompson, 2002 and Nakagawa and Santos, 2012 for how to estimate $\sigma^2_m$; see also below). $I^2$ values of 25%, 50% and 75% are considered to indicate low, moderate and high heterogeneity, respectively, for meta-analysis purposes (Higgins et al. 2003). Nakagawa and Santos (2012; cf. Cheung, 2014) extended the definition of $I^2$ proposed by Higgins and Thompson (2002) to suit multilevel meta-analytic models. In this extended version of $I^2$, the sum of all variance components is used instead of just the between-study variance, so that:

$$I^2 = \frac{\sigma^2_t - \sigma^2_m}{\sigma^2_t}$$

where $\sigma^2_t$ is the sum of all variance components and $\sigma^2_m$ is the typical sampling error variance, calculated as:

$$\sigma^2_m = \frac{\sum w_i \,(k - 1)}{\left(\sum w_i\right)^2 - \sum w_i^2}$$

where $w_i$ is the inverse of the $i$th measurement error variance, associated with the $i$th effect-size estimate ($i$ = 1, …, k).
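
As a rough illustration of these definitions, the sketch below computes the typical sampling error variance from a set of sampling variances and then derives both versions of the index. The function names and the example values are hypothetical. The variance components (between-study and any others) are assumed to have been estimated already by the fitted model. The total variance in the multilevel version is assumed to be the sum of those components plus the typical sampling error variance.

```python
def typical_sampling_variance(variances):
    """Typical sampling error variance from k sampling variances (k >= 2)."""
    k = len(variances)
    if k < 2:
        raise ValueError("at least two effect sizes are needed")
    # w_i is the inverse of the i-th sampling (measurement error) variance.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    sum_w_sq = sum(wi ** 2 for wi in w)
    return sum_w * (k - 1) / (sum_w ** 2 - sum_w_sq)


def i_squared(tau2, sigma2_m):
    """I^2 for a random-effects model with between-study variance tau2."""
    return tau2 / (tau2 + sigma2_m)


def i_squared_total(variance_components, sigma2_m):
    """I^2 for a multilevel model.

    variance_components: the non-sampling components (for example
    between-study and within-study variances). Assumption: the total
    variance is their sum plus the typical sampling error variance.
    """
    sigma2_t = sum(variance_components) + sigma2_m
    return (sigma2_t - sigma2_m) / sigma2_t


# Hypothetical sampling variances for three effect sizes.
sigma2_m = typical_sampling_variance([0.01, 0.02, 0.04])
print(f"typical sampling variance = {sigma2_m:.3f}")
print(f"I2 (tau2 = 0.06) = {i_squared(0.06, sigma2_m):.2f}")
print(f"I2 total (components 0.04, 0.02) = {i_squared_total([0.04, 0.02], sigma2_m):.2f}")
```

When the model has a single between-study variance component, the two functions return the same value, which is a quick consistency check on the multilevel definition.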

After heterogeneity has been detected (and quantified), the next step is to find explanatory variables that can account for some of the variation in study results. In meta-analysis, these explanatory variables are called moderators (i.e. covariates or categorical predictors) and are defined a priori during the first two stages of the meta-analytic process. Moderators are used to build a mixed-effects model (that is, a model that assumes that heterogeneity stems from both fixed and random effects), which is usually referred to as meta-regression. Variance that remains unaccounted for in the meta-analysis can therefore be explored with meta-regression models (Harrison, 2011). However, more complex models require a larger sample of studies (Pigott, 2012), and whilst meta-regression is advised for exploring the conditional relationship between the selected moderators and effect-size magnitude, it comes at the cost of a loss of power (Lipsey and Wilson, 2001; Nakagawa and Santos, 2012). Nakagawa and Santos (2012) recommended meta-regression models as the primary meta-analytic models presented in biological meta-analyses, because heterogeneity is almost always present in biological datasets. They also suggested, however, that any meta-analytic model used for analysis should strike a compromise between complexity and the nature of the data, and urged analysts to run several alternative models to confirm the robustness of the estimated results.
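
To make the idea of a mixed-effects meta-regression concrete, the sketch below fits a model with a single continuous moderator by weighted least squares. It is a deliberately simplified illustration: the between-study variance `tau2` is treated as known, whereas in practice it is estimated from the data by dedicated software, and only one moderator is handled. The function name `meta_regression` and the data are hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2):
    """Fit effect = intercept + slope * moderator by weighted least squares.

    Each study is weighted by 1 / (sampling variance + tau2), so both the
    sampling error and the residual between-study variance are accounted for.
    Returns (intercept, slope, standard error of the slope).
    """
    w = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(w)
    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(wi * x for wi, x in zip(w, moderator)) / sum_w
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sum_w
    # Weighted sums of squares and cross-products.
    s_xx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, moderator))
    s_xy = sum(wi * (x - x_bar) * (y - y_bar)
               for wi, x, y in zip(w, moderator, effects))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # With variances treated as known, Var(slope) = 1 / s_xx.
    se_slope = math.sqrt(1.0 / s_xx)
    return intercept, slope, se_slope


# Hypothetical data: four effect sizes and one continuous moderator.
effects = [0.2, 0.3, 0.4, 0.5]
variances = [0.01, 0.02, 0.01, 0.02]
moderator = [0.0, 1.0, 2.0, 3.0]
intercept, slope, se_slope = meta_regression(effects, variances, moderator, tau2=0.01)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, SE(slope) = {se_slope:.3f}")
```

Refitting with different values of `tau2` or with alternative moderators is one simple way to check how sensitive the estimated slope is to the model specification.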

In document Optimum nutrition of the pregnant ewe : a meta analytic approach : a thesis presented in partial fulfillment of the requirements for the degree of Doctorate of Philosophy in Animal Science at Massey University, Manawatū, New Zealand (Page 51-54)