SYNTHESIS 2.6.1 Data extraction
2.6.2 Quality assessment
Study quality varies significantly between designs, with prognostic,
epidemiological, or observational designs usually demonstrating the most variance and often the poorest standards of methodological rigour (Altman, 2001).
Assessment of quality is important to enable objective assessment of the individual contribution of each study to the field as a whole. As such, quality assessment data can be used in numerous ways. First, data can be synthesised qualitatively to give a methodological critique of a specific field of research; second, as a grouping variable for sensitivity analysis; and, third, many reviews will often use low quality scores as a basis for study exclusion as their inclusion can often obscure the true nature of variable relationships (Altman & Lyman, 1998). Rosenthal (1995) warns against use of quality scores as an exclusion variable as scores are prone to the bias of the individual(s) who conducted the quality assessment. Rather,
Rosenthal promotes using the data to investigate the moderating role of study
quality in the mean effect size (i.e. in sensitivity analysis) or as an alternative study weighting variable in meta-analysis.
Despite substantial criticism of doing so, many reviews continue to discount non-RCT designs and, as such, numerous quality assessment tools currently exist for RCTs, for example the Delphi List (Verhagen, de Vet, de Bie, Kessels, Boers, Bouter et al., 1998) and the GRADE approach (Atkins, Briss, Eccles, Flottorp, Guyatt, Harbour et al., 2005). As an alternative to quality assessment scales, the Cochrane Collaboration encourages the use of risk of bias lists, which promote evaluation based on sub-domains of quality rather than calculation of total quality scores (Higgins & Altman, 2008). Like many of the other tools, however, this method is biased towards assessment of the RCT, and not all aspects are relevant to other methodological designs. The increasing number of such tools makes it difficult to distinguish between them and select the most suitable (Altman, 2001).
Fewer tools exist for assessing the quality of qualitative studies. The protocol stated that, had qualitative studies been included, their quality would have been assessed using the well-validated Mays and Pope (1996) tool. These criteria are not detailed here as no qualitative studies were ultimately included; however, they are readily available (see Mays and Pope, 1996; Kahn et al., 2001).
In contrast to the number of quality assessment tools for experimental designs, there are no widely accepted quality criteria for observational or non-experimental research. Applying experimental quality criteria would be invalid, as many design-specific quality indicators are simply not suitable for these designs.
Therefore, many review teams create their own, usually citing NHS CRD guidance, the Cochrane Handbook and methodological papers (see, for example, Downs & Black, 1998; Papworth & Milne, 2001) for their development. This has given rise to a number of largely unvalidated measures in the literature which are rarely used on more than one or two occasions (Altman, 2001).
An alternative approach is proposed by Edwards, Russell and Stott (1998): the signal to noise ratio. These authors argue that excluding designs lower in the evidence hierarchy is inappropriate, as such designs are more feasible and appropriate for many questions. Edwards et al. (1998) propose that in addition to quality assessment (the 'noise' component), an assessment should also be made of the potential value added by the study to the literature (the 'signal' component). Resultant scores are then presented as a ratio representing study value rather than quality per se.
For this review it was necessary to select a method which was relevant to the highest number of possible designs, to allow for direct comparisons to be made.
Whilst the signal to noise method could have been selected, the more
straightforward and arguably more objective checklist approach was favoured.
As such, the Kmet, Lee and Cook (2004) tool was selected.
Some modification to this quality assessment tool was necessary to meet the specific requirements of this review. Based on the assumption that longitudinal designs are more informative for explorations of variable association and
prediction, three items of quality assessment were added relating to such designs.
Specifically, these items assessed: (1) the suitability of timing between baseline and follow-up data collection (i.e. long enough for outcomes to have emerged); (2) the sufficiency of explanation of sample attrition between baseline and follow-up data collection; and (3) whether statistical adjustments were made in the analyses based on different lengths of follow-up.
The modified tool used the same scoring instructions recommended by Kmet et al. (2004). The standardised form is applied to each study individually and assesses the quality, clarity and suitability of the stated aims, hypotheses, design, sample, methodology, analysis, reporting of results, and validity of the conclusions drawn from the data. Each paper is awarded one of four scores (2=yes/good; 1=partial; 0=no/poor; X=not applicable) for each of the 19 quality criteria. The total (out of 38) is then converted into a percentage indicator of quality. Quality assessment was conducted independently by reviewer one and reviewer two, and an overall mean score was calculated per study. A copy of the quality assessment form (including scoring guidelines) is included in appendix 2.3.
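The scoring arithmetic described above can be illustrated with a minimal sketch. The function names (quality_percentage, study_quality) are hypothetical and introduced only for illustration. The sketch assumes that items marked X (not applicable) are removed from the maximum possible total before the percentage is calculated; the appendix scoring guidelines should be treated as authoritative on this point.

```python
from statistics import mean

MAX_ITEM_SCORE = 2      # 2 = yes/good, 1 = partial, 0 = no/poor
NOT_APPLICABLE = "X"    # item does not apply to this study design


def quality_percentage(item_scores):
    """Convert one reviewer's item scores into a percentage quality score.

    Assumption (not stated in the text): items scored "X" are dropped from
    both the achieved total and the maximum possible total.
    """
    applicable = [s for s in item_scores if s != NOT_APPLICABLE]
    if not applicable:
        raise ValueError("no applicable items to score")
    for score in applicable:
        if score not in (0, 1, 2):
            raise ValueError(f"invalid item score: {score!r}")
    max_total = MAX_ITEM_SCORE * len(applicable)  # 38 when all 19 items apply
    return 100.0 * sum(applicable) / max_total


def study_quality(reviewer_one, reviewer_two):
    """Mean of the two reviewers' independent percentage scores."""
    return mean([quality_percentage(reviewer_one),
                 quality_percentage(reviewer_two)])


if __name__ == "__main__":
    # Hypothetical scores for one study across the 19 criteria.
    r1 = [2] * 10 + [1] * 5 + [0] * 2 + ["X"] * 2
    r2 = [2] * 12 + [1] * 4 + [0] * 3
    print(round(study_quality(r1, r2), 1))
```

Averaging the two percentages, rather than the raw totals, keeps the per-study score comparable when the reviewers mark different items as not applicable.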
Whilst potentially introducing some level of confound into data synthesis, studies with low quality assessment scores were not excluded (see section 2.2.2).
This decision was based on two observations: first, subjectivity is inherent in quality assessment checklists; and second, scoping searches had already highlighted that poor quality was an ongoing issue in this field. One of the review objectives was to systematically critique methodology in addition to synthesising findings. To exclude some studies based on low methodological scores would have rendered this objective redundant. Whilst total mean scores were calculated for each study, domain-specific evaluation (as suggested by the Cochrane Collaboration) was also conducted, and in-depth descriptive discussion of various aspects of quality is presented in addition to summary scores.
In addition to quality scoring of each individual study, Russell, Di Blasi, Lambert and Russell (1998) propose a scoring system for systematic reviews themselves.
No systematic reviews are currently available on this topic and, as such, this scoring system was not relevant to data extraction; however, the tool was used in the review evaluation (section 2.8.3).