CHAPTER 4: METHODS FOR SYSTEMATIC REVIEWS OF ASSOCIATION
4.6 Analysis and interpretation of data
In this thesis, the term ‘prognostic’ refers to the strength of the association between a test at birth and the odds of an adverse outcome, as measured by an odds ratio. The term ‘predictive’ refers to the ability of a test to discriminate between babies who will and babies who will not experience an adverse outcome, as measured by sensitivity, specificity, and positive and negative likelihood ratios. A test may have strong prognostic ability without necessarily having good predictive ability, so it is important to consider both.93
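A hypothetical 2 x 2 table (illustrative counts only, not data from any included study) shows how the two properties can diverge. Suppose TP = 30, FN = 70, FP = 5 and TN = 95:

```latex
\begin{aligned}
\mathrm{OR} &= \frac{TP \times TN}{FP \times FN} = \frac{30 \times 95}{5 \times 70} = \frac{2850}{350} \approx 8.1 \\
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{30}{100} = 0.30 \\
\text{Specificity} &= \frac{TN}{FP + TN} = \frac{95}{100} = 0.95 \\
\mathrm{LR}^{+} &= \frac{0.30}{1 - 0.95} = 6.0, \qquad \mathrm{LR}^{-} = \frac{1 - 0.30}{0.95} \approx 0.74
\end{aligned}
```

The association is strong (OR of about 8.1), yet the test misses 70% of the babies who go on to have the adverse outcome.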
4.6.1 Data synthesis for prognostic association
The 2 x 2 tables were used to compute odds ratios (OR) and 95% confidence intervals (CI) for each index test-outcome pair. The OR was calculated as (TP × TN)/(FP × FN), where TP, FP, FN and TN are the numbers of true positives, false positives, false negatives and true negatives. The 95% CI was calculated on the log scale as ln(OR) ± 1.96 × SE(ln OR) and then back-transformed by exponentiation. The OR was selected as the summary statistic because it represents the effect of the exposure on the odds of having the condition in an unbiased fashion and allows the results of both case-control and cohort studies to be included.94 It is frequently used to demonstrate an epidemiological association,94 and here it provides a measure of a test’s prognostic ability. Where possible, the results for each index test and outcome measure were pooled using meta-analysis.
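A minimal Python sketch of this calculation is shown below. It assumes all four cells are non-zero and uses the Woolf (inverse-variance) standard error, SE(ln OR) = √(1/TP + 1/FP + 1/FN + 1/TN), which is the usual choice but is not stated explicitly in the text; the function name `odds_ratio_ci` and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(tp: int, fp: int, fn: int, tn: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval from a 2 x 2 table.

    Assumes every cell is non-zero (see the zero-cell rule in Section 4.6.4).
    """
    log_or = math.log((tp * tn) / (fp * fn))
    # Woolf standard error of ln(OR): square root of the summed reciprocal cell counts
    se_log_or = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    # Build the interval on the log scale, then exponentiate back to the OR scale
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return math.exp(log_or), lower, upper


or_value, ci_low, ci_high = odds_ratio_ci(tp=18, fp=12, fn=6, tn=64)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself once back-transformed.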
4.6.2 Data synthesis for predictive ability
Where there was a strong and statistically significant prognostic association between a test and an outcome measure (defined as an OR > 5 with a 95% CI that did not cross 1), sensitivity, specificity and likelihood ratios (LR) were calculated, again using data from the 2 x 2 tables, as follows:
1. Sensitivity = TP/(TP + FN)
2. Specificity = TN/(FP + TN)
3. Positive likelihood ratio (LR+) = sensitivity/(1 − specificity)
4. Negative likelihood ratio (LR−) = (1 − sensitivity)/specificity
This allowed the predictive ability of the test to be determined;95 that is, whether the test can accurately discriminate between those who do and those who do not have a poor outcome (as measured by sensitivity and specificity), and how much a positive or negative test result modifies the odds of a poor outcome (as measured by the positive and negative likelihood ratios).
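The four measures can be sketched in Python as follows (the function name `accuracy_measures` and the counts are hypothetical). The final line illustrates a standard identity linking the two kinds of synthesis: LR+ divided by LR− equals (TP × TN)/(FP × FN), the same odds ratio used for prognostic association.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity and likelihood ratios from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # positives among babies with a poor outcome
    specificity = tn / (fp + tn)  # negatives among babies without a poor outcome
    # LR+ is undefined when specificity is 1 (no false positives)
    lr_pos = sensitivity / (1 - specificity)
    # LR- is undefined when specificity is 0
    lr_neg = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


m = accuracy_measures(tp=18, fp=12, fn=6, tn=64)
print(m)
# The ratio LR+/LR- reproduces the odds ratio (TP x TN)/(FP x FN)
print(m["lr_pos"] / m["lr_neg"])
```

With these counts the OR is 16, while sensitivity is 0.75 and LR+ is 4.75, so a test that clears the OR > 5 threshold can still be only moderately useful for prediction.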
4.6.3 Assessment of heterogeneity
OR data were plotted in forest plots, and between-study heterogeneity in the prognostic association for each test was assessed visually and by estimating I² (the percentage of the total variability in prognostic effects that is due to between-study heterogeneity rather than chance)96 and tau-squared (τ², an estimate of the between-study variance).97
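Both statistics can be derived from Cochran’s Q, as in the sketch below. The text does not state which τ² estimator was used; the DerSimonian-Laird moment estimator here is an assumption (it is a common default), and the function name `heterogeneity` is hypothetical. Inputs are per-study log ORs and their variances.

```python
def heterogeneity(log_ors, variances):
    """Cochran's Q, I^2 (as a proportion) and DerSimonian-Laird tau^2."""
    if len(log_ors) < 2:
        raise ValueError("heterogeneity needs at least two studies")
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    pooled = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variability beyond what chance (df) would explain
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of tau^2, truncated at zero (assumed estimator)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2


q, i2, tau2 = heterogeneity([0.0, 2.0, 4.0], [0.25, 0.25, 0.25])
print(f"Q = {q:.2f}, I2 = {100 * i2:.1f}%, tau2 = {tau2:.2f}")
```

With equal within-study variances, this τ² reduces to the sample variance of the log ORs minus that common variance, which makes the example easy to check by hand.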
Where significant heterogeneity (defined as I² > 50%) was present, its sources were explored using meta-regression where the meta-analysis included at least 10 studies, as planned a priori in keeping with published recommendations.96 Where meta-regression was significant, or where meta-regression was not possible, subgroup analyses were performed to explore the effect on results and heterogeneity. Pre-defined categories included:
1. Study quality (high quality versus low or medium quality)
2. Population characteristics (including gestational age, year of birth of study population, risk factors for adverse outcomes)
3. Study setting (including country of origin, grouping countries where standards of care were thought to be similar: USA, Europe, and Australia and New Zealand versus others)
Factors thought to be relevant to particular index tests and their relationship with the outcome measures in question are reported in the relevant chapter.
4.6.4 Meta-analysis
Meta-analysis was performed where two or more studies used the same index test and outcome measure. When a study’s 2 x 2 table contained a cell with a value of 0, 0.5 was added to all four cells to allow calculation of the log OR and its variance for meta-analysis.98 The primary outcomes were neonatal mortality and composite measures of neonatal, childhood and adult morbidity. A composite morbidity outcome was used to maximise the number of events included in the analysis and to avoid having to select a single morbidity as the primary outcome measure. However, composite outcomes carry the risk of assuming that the significance of the overall result applies to every component.99 To address this, the component outcomes were analysed as subgroups where possible. When the composite outcome was used, care was taken to ensure that each individual was counted only once in each analysis, particularly where studies reported multiple outcomes for a single population. Where multiple outcomes were reported, attempts were made to select the outcome most consistent with the other studies in the meta-analysis.
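The zero-cell rule can be sketched as follows. The function name `log_or_with_correction` is hypothetical, and the variance shown is the Woolf variance of ln(OR), an assumption consistent with the usual practice. As described above, the 0.5 correction is applied to all four cells, and only when at least one cell is zero.

```python
import math


def log_or_with_correction(tp, fp, fn, tn, correction=0.5):
    """ln(OR) and its variance, adding `correction` to all cells if any cell is 0."""
    cells = [tp, fp, fn, tn]
    if any(count == 0 for count in cells):
        # A zero cell makes ln(OR) or its variance infinite, so correct every cell
        cells = [count + correction for count in cells]
    a, b, c, d = cells  # TP, FP, FN, TN
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance of ln(OR)
    return log_or, variance, cells


print(log_or_with_correction(10, 0, 5, 20))   # zero FP cell: all cells corrected
print(log_or_with_correction(18, 12, 6, 64))  # no zero cells: left unchanged
```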
Because clinical and statistical heterogeneity between studies was expected, a random-effects model was used throughout. This model pools the log OR estimates for each test, weighting each study by the inverse of the sum of its within-study variance and the between-study variance. When used to pool odds ratios, it provides a summary estimate of the average prognostic effect of a test.100 Because a test’s prognostic ability may vary from this average between settings, a prediction interval (EPI) was calculated after each random-effects meta-analysis in which I² was greater than 0%, to show the range of prognostic association that might be expected if the test were applied in a single setting similar to one of those included in the analysis.101 This was calculated only where three or more studies were included in the meta-analysis.
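A sketch of the random-effects pooling and prediction interval follows. Two details are assumptions not stated in the text: τ² is estimated with the DerSimonian-Laird method, and the prediction interval uses the common formulation μ̂ ± t × √(τ² + SE(μ̂)²), with t taken from a t distribution on k − 2 degrees of freedom for k studies (which is why at least three studies are needed). To keep the example dependency-free, the caller passes in the 97.5th percentile of that t distribution (about 12.706 for k = 3). The function name `random_effects_pi` is hypothetical.

```python
import math


def random_effects_pi(log_ors, variances, t_crit, z=1.96):
    """Random-effects pooled OR with 95% CI and prediction interval."""
    k = len(log_ors)
    if k < 3:
        raise ValueError("a prediction interval needs at least three studies")
    # DerSimonian-Laird estimate of the between-study variance tau^2 (assumed estimator)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights: inverse of (within-study variance + tau^2)
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se_mu = math.sqrt(1 / sum(w_re))
    # Prediction interval: add tau^2 to the variance of the mean and use a t quantile
    pi_half = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return {
        "tau2": tau2,
        "or": math.exp(mu),
        "ci": (math.exp(mu - z * se_mu), math.exp(mu + z * se_mu)),
        "pi": (math.exp(mu - pi_half), math.exp(mu + pi_half)),
    }


res = random_effects_pi([0.5, 1.2, 2.0], [0.2, 0.3, 0.25], t_crit=12.706)
print(res)
```

With only three studies the t quantile on one degree of freedom is very large, so the prediction interval is much wider than the confidence interval; this reflects genuine uncertainty about the effect in a new setting.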
Pooled sensitivity, specificity and likelihood ratios were also calculated using a bivariate random-effects meta-analysis model. Bivariate meta-analysis accounts for the possible negative correlation between the sensitivity and specificity of a test, which is thought to arise because varying the test threshold moves these two parameters in opposite directions, and because different studies may differ, explicitly or implicitly, in the threshold used to define a positive test. Explicit differences arise when studies use different thresholds to define a positive test, whereas implicit threshold variation may result from differences in observers or equipment.102 Bivariate meta-analysis assumes that the logit-transformed sensitivities from the studies in a meta-analysis are normally distributed around a mean value with some between-study variance, and makes the same assumption for the logit-transformed specificities. Combining these two normally distributed outcomes, while allowing for the correlation between them, gives a bivariate normal distribution.102
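For reference, the bivariate model is often written as below. This is a sketch of the standard formulation; the symbols are introduced here and are not taken from the thesis. For study i, n₁ᵢ is the number of babies with the outcome and n₀ᵢ the number without.

```latex
\begin{aligned}
TP_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad TN_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i) \\
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right)
\end{aligned}
```

The summary sensitivity and specificity are logit⁻¹(μ_A) and logit⁻¹(μ_B). The correlation ρ = σ_AB/(σ_A σ_B) captures the threshold effect and is typically negative. Pooled likelihood ratios can then be derived from the summary sensitivity and specificity using the formulas in Section 4.6.2.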
All bivariate meta-analyses were performed in Stata version 10.0 (StataCorp, Texas,
USA) using the metan and metandi commands.103;104 Plots were generated using
StatsDirect (StatsDirect, Cheshire, UK). Meta-Disc was used for calculations and non-bivariate meta-analyses.105
4.6.5 Publication bias
Publication bias arises when the studies included in a review differ systematically from those that are missed. Funnel plots assess sample size effects by plotting the log odds ratio (ln OR) against sample size or precision (estimated as the reciprocal of the standard error, 1/SE). Where no sample size effect exists, the points form a symmetrical funnel.106 Sources of asymmetry include publication bias and location bias (e.g. language bias), poor methodological quality leading to spuriously inflated effects in smaller studies,107 and true heterogeneity.108 Certain tests of funnel plot asymmetry have been found to have inflated type I error rates in meta-analyses of diagnostic accuracy, owing to the correlation between ln OR and its SE, and these tests were therefore avoided.106 To explore the presence of funnel plot asymmetry (small-study effects), and thus potential publication bias, the Peters test was performed in each meta-analysis containing at least 10 studies.108 This test uses a weighted linear regression and has been shown to be more accurate.109 The analysis was performed in Stata 10 using the metabias command.110
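As an illustration, the sketch below computes funnel plot coordinates (ln OR against 1/SE) and fits a weighted linear regression of ln OR on 1/n, the inverse of total study size. This is the form of regression on which the Peters test is based, and asymmetry is usually judged from the slope. The weights the Peters test uses are specified in the original publication and implemented in metabias, so they are left as an input here rather than guessed. The function names `funnel_coordinates` and `weighted_regression` and the example tables are hypothetical, all cells are assumed non-zero, and no p-value is computed.

```python
import math


def funnel_coordinates(tables):
    """(ln OR, 1/SE) for each (TP, FP, FN, TN) table; all cells assumed non-zero."""
    coords = []
    for tp, fp, fn, tn in tables:
        log_or = math.log((tp * tn) / (fp * fn))
        se = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
        coords.append((log_or, 1 / se))  # precision is plotted on the vertical axis
    return coords


def weighted_regression(y, x, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, t statistic for b)."""
    k = len(y)
    if k < 3:
        raise ValueError("need at least three points to estimate a residual variance")
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance on k - 2 degrees of freedom, as in ordinary weighted least squares
    s2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / (k - 2)
    t_b = b / math.sqrt(s2 / sxx)
    return a, b, t_b


tables = [(18, 12, 6, 64), (40, 20, 10, 130), (9, 3, 4, 20), (25, 30, 15, 230)]
coords = funnel_coordinates(tables)
log_ors = [lo for lo, _ in coords]
inv_n = [1 / sum(t) for t in tables]  # 1/n, the inverse of total study size
# Illustrative weights only (inverse variance of ln OR); substitute the weights
# specified for the Peters test before using this for inference
weights = [prec ** 2 for _, prec in coords]
print(weighted_regression(log_ors, inv_n, weights))
```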