Chapter 1: Literature review
1.3 Meta-analysis – theory and practice
1.3.5 Interpretation of the cumulative evidence
1.3.5.1 Quantifying and explaining heterogeneity
So far, meta-analysis has been used to estimate the average effect across a number of studies answering a similar question. When the effects of the included studies are similar in magnitude, drawing conclusions from the pooled estimate is relatively simple. When the results differ, conclusions are less clear (Higgins and Thompson, 2003). One of the primary goals of meta-analysis is to understand to what extent the results from primary studies are consistent (or inconsistent) with each other. In this way, we are able to assess the “combinability” of the included studies.
The assessment of heterogeneity is one of the most important parts of a meta-analysis (Higgins et al. 2003; Nakagawa and Santos, 2012). This is because the presence of true heterogeneity affects the choice of statistical model (Huedo-Medina et al. 2006) and therefore directly affects the reliability of the estimated meta-analytic mean (Nakagawa and Santos, 2012). In general terms, a test of heterogeneity assesses whether there are true differences underlying the results of the included studies, irrespective of sampling error. Traditionally, heterogeneity has been assessed using either the Q-test proposed by Cochran (1954), which indicates whether there is statistically significant heterogeneity, or the I² index defined by Higgins and Thompson (2002), which indicates the extent of heterogeneity (which can be categorised as small, medium or large; Higgins et al. 2003).
The Q statistic is calculated as the sum of weighted squared deviations of each study’s estimate from the overall effect estimate. Statistical significance of Q is tested against a chi-square distribution with k − 1 degrees of freedom, k being the number of studies (for a detailed description of this test see Cochran, 1954 and Higgins et al. 2003). Hardy and Thompson (1998) demonstrated that this test has low power to detect heterogeneity when the number of included studies is small (i.e. n = 10), and it has been suggested that a value of 0.10 be used as the cut-off for significance (Higgins et al. 2003). Higgins et al. (2003) also demonstrated that the Q-test has too much power when there is a large number of studies. Applying the Q-test to a meta-analysis of 135 trials with over 15 000 participants, they obtained a p-value of 0.005, suggesting significant heterogeneity in the data. They argued that this p-value does not describe the extent to which heterogeneity affects the results of the meta-analysis. Using the I² index on the same dataset, they showed that whilst heterogeneity was present (I² = 26%), it was unlikely to have a major impact on the estimated effects.
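The calculation of Q described above can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes inverse-variance weights and a weighted mean as the overall effect estimate, and the function name `cochran_q` and the example data are hypothetical rather than taken from any of the cited studies.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom, pooled estimate).

    effects   : per-study effect-size estimates
    variances : their sampling variances (all must be > 0)
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need two or more studies with matching inputs")
    # Assumption: each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / v for v in variances]
    # Overall effect estimate: the weighted mean of the study estimates.
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: sum of weighted squared deviations from the overall estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1, pooled


# Hypothetical effect sizes and sampling variances for four studies.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.04, 0.02, 0.05, 0.01]
q, df, pooled = cochran_q(effects, variances)
print(f"Q = {q:.2f} on {df} df; pooled estimate = {pooled:.3f}")
```

The returned Q would then be compared against a chi-square distribution with k − 1 degrees of freedom, for instance with `scipy.stats.chi2.sf(q, df)` where SciPy is available.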
The I² index is an improvement over the Q-test as a measure of heterogeneity in meta-analysis (Nakagawa and Santos, 2012). Rather than focusing on statistical significance, the I² index quantifies the degree of between-study variance that is not due to chance alone (Higgins et al. 2003). Using the notation of Nakagawa and Santos (2012), the I² index is defined by:
$$I^2 = \frac{\tau^2}{\tau^2 + \sigma^2_m}$$
where $\tau^2$ is the between-study variance and $\sigma^2_m$ is the typical sampling error variance (see Higgins and Thompson, 2002 and Nakagawa and Santos, 2012 for how to estimate $\sigma^2_m$; see also below). I² values of 25%, 50% and 75% are considered low, moderate or high heterogeneity for meta-analysis purposes (Higgins et al. 2003). Nakagawa and Santos
(2012; cf. Cheung, 2014) extended the definition of I² proposed by Higgins and Thompson (2002) to suit multilevel meta-analytic models. In this extended version of I², the sum of all variance components is used instead of just the between-study variance, so that:
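$$I^2 = \frac{\sigma^2_t - \sigma^2_m}{\sigma^2_t}$$
where $\sigma^2_t$ is the sum of all variance components and $\sigma^2_m$ is the typical sampling error variance calculated as:
$$\sigma^2_m = \frac{\sum w_j (k - 1)}{\left(\sum w_j\right)^2 - \sum w_j^2}$$
where $w_j$ is the inverse of the $j$th measurement error variance associated with the $j$th effect-size estimate ($j$ = 1, …, $k$).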
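As a rough illustration of these definitions, the typical sampling error variance and the resulting I² can be computed as follows. The sketch assumes that the total variance is the sum of the non-sampling variance components (the between-study variance alone in a random-effects model, or several components in a multilevel model) plus the typical sampling error variance. The function names, the variance components and the data are hypothetical; in practice the variance components would be estimated by the fitted meta-analytic model.

```python
def typical_sampling_variance(variances):
    """Typical sampling error variance (Higgins and Thompson, 2002)."""
    k = len(variances)
    if k < 2:
        raise ValueError("need at least two effect sizes")
    # w_j is the inverse of the j-th sampling (measurement error) variance.
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return sum_w * (k - 1) / (sum_w ** 2 - sum_w2)


def i_squared(other_components, sigma2_m):
    """Share of total variance that is not sampling error.

    other_components : variance components other than sampling error,
        e.g. [tau2] for a random-effects model, or several components
        for a multilevel model.
    Assumption: total variance = sum(other_components) + sigma2_m.
    """
    non_sampling = sum(other_components)
    return non_sampling / (non_sampling + sigma2_m)


# Hypothetical sampling variances and variance components.
variances = [0.04, 0.02, 0.05, 0.01]
sigma2_m = typical_sampling_variance(variances)
print(f"typical sampling variance = {sigma2_m:.4f}")
print(f"random-effects I2 = {100 * i_squared([0.03], sigma2_m):.1f}%")
print(f"multilevel I2     = {100 * i_squared([0.03, 0.01], sigma2_m):.1f}%")
```

When all sampling variances are equal, the typical sampling error variance reduces to that common value, which provides a simple check on the calculation.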
After heterogeneity has been detected (and quantified), the next step is to find possible explanatory variables that can account for some of the variation in study results. In meta-analysis, these explanatory variables are called moderators (i.e. covariates or categorical predictors) and are defined a priori during the first two stages of the meta-analytic process. Moderators are used to build a mixed-effects model (that is, a model that assumes that heterogeneity stems from both fixed and random effects), which is usually referred to as meta-regression. Unexplained variance in the meta-analysis can therefore be explored with meta-regression models (Harrison, 2011). However, more complex models require a larger sample of studies (Pigott, 2012), and whilst meta-regression is advised for exploring the conditional relationship between the selected moderators and effect-size magnitude, it comes at the cost of a loss of power (Lipsey and Wilson, 2001; Nakagawa and Santos, 2012). Nakagawa and Santos (2012) recommended meta-regression models as the primary meta-analytic models presented in biological meta-analyses, because heterogeneity is almost always present in biological datasets. However, they also suggested that any meta-analytic model used for analysis should strike a compromise between complexity and the nature of the data, and urged researchers to run several alternative models to confirm the robustness of the estimated results.
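To make the idea of a meta-regression concrete, a minimal sketch with a single continuous moderator is given below. It is a simplification rather than a full mixed-effects implementation: the between-study variance `tau2` is assumed to be supplied by the user instead of being estimated, each study is weighted by the inverse of its sampling variance plus `tau2`, and the coefficients are obtained by weighted least squares. The function name `meta_regression` and the data are hypothetical.

```python
import math


def meta_regression(effects, variances, moderator, tau2=0.0):
    """Weighted least-squares fit of effect size on one moderator.

    Returns (intercept, slope, standard error of the slope).
    Assumption: tau2 (residual between-study variance) is known;
    with tau2 = 0 this reduces to a fixed-effect meta-regression.
    """
    if not (len(effects) == len(variances) == len(moderator)):
        raise ValueError("inputs must have the same length")
    weights = [1.0 / (v + tau2) for v in variances]
    sum_w = sum(weights)
    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(w * x for w, x in zip(weights, moderator)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Weighted sum of squares of the moderator around its weighted mean.
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, moderator))
    if sxx == 0:
        raise ValueError("moderator must vary across studies")
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, moderator, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)
    return intercept, slope, se_slope


# Hypothetical data: effect sizes, sampling variances and a moderator.
effects = [0.10, 0.30, 0.35, 0.65]
variances = [0.04, 0.02, 0.05, 0.01]
moderator = [1.0, 2.0, 3.0, 4.0]
b0, b1, se_b1 = meta_regression(effects, variances, moderator, tau2=0.02)
print(f"intercept = {b0:.3f}, slope = {b1:.3f} (SE {se_b1:.3f})")
```

A dedicated package would normally be used for real analyses, since it estimates the between-study variance and handles several moderators at once; the sketch only shows how moderators enter the weighted model.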