TABLE 2.1: ASSESSMENTS OF STUDY QUALITY
2.5 SELECTING DATA FOR ANALYSIS
Many of the studies included in the review used several estimation methods within the same study, principally matching and regression (covariance) adjustment. When there are large differences in the covariate distributions between the groups, standard model-based adjustments are known to rely heavily on extrapolation and assumptions. In response, matching has become a widely used non-experimental method of evaluation over the past three decades (D’Agostino, 1998; Rosenbaum & Rubin, 1983). Matching is done with the aim of creating treated and control groups with similar observed covariate distributions, thereby increasing robustness in observational studies by reducing reliance on modelling assumptions. Since the work of LaLonde (1986), many have investigated whether non-experimental methods can yield results similar to those from randomised experiments. The work of Dehejia and Wahba (1999), in particular, generated great interest regarding the ability of (propensity score) matching methods to potentially produce unbiased estimates of a programme’s impact.
A number of authors have specifically evaluated matching methods (Glazerman, Levy, & Myers, 2003; Heckman, Ichimura, & Todd, 1997; Heckman, Ichimura, Smith, & Todd, 1998; Heckman, Ichimura, & Todd, 1998; Michalopoulos, Bloom, & Hill, 2004), with many supporting the use of these methods as a means of limiting reliance on inherently untestable modelling assumptions and the consequential sensitivity to those assumptions (for a discussion, see Stuart & Rubin, 2007). Others who have compared estimates from propensity score matching with different regression (covariance) adjustment analyses have found that no method is consistently better than the others (e.g., Michalopoulos et al., 2004). This presents major challenges for reviewers faced with assessing the potential of a wide range of matching and covariance adjustment methods for reducing bias in observational studies.
Drawing on some of the available practical guidance on this topic (e.g., Stuart & Rubin, 2007), the following outlines our approach for choosing between different methodologies when extracting outcome data.
Combining methods (i.e., matching and regression-based model adjustment) was judged to be more efficient in reducing bias in the estimate of the treatment effect than using those methods individually (Cochran & Rubin, 1973; Glazerman et al., 2003; Ho, Imai, King, & Stuart, 2007; Rubin, 1973a, 1973b, 1979; Rubin & Thomas, 2000). Combination could take the form of either:
a two-step procedure in which matching is followed by regression analysis (linear regression, logistic regression, hierarchical modelling, and so on) to remove any remaining differences between groups (here, results should be less sensitive to the modelling assumptions, and in particular to the model specification, than the same analysis on the original unmatched samples); or
a model incorporating a polynomial of the propensity scores (i.e., regression adjustment on the matched sample).
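The two-step procedure can be illustrated with a minimal sketch. It assumes one-to-one nearest neighbour matching on the propensity score (with replacement) followed by a simple regression adjustment on the matched-pair differences of a single covariate. The dictionary keys `ps`, `x` and `y` and all function names are hypothetical, and this is only one of several ways the two steps could be combined; it is not the procedure used in any particular included study.

```python
from statistics import mean


def nearest_neighbour_pairs(treated, controls):
    """Step 1: pair each treated unit with the control closest on the
    propensity score (matching with replacement).

    Units are dicts with hypothetical keys "ps" (propensity score),
    "x" (a covariate) and "y" (the outcome).
    """
    return [
        (t, min(controls, key=lambda c: abs(c["ps"] - t["ps"])))
        for t in treated
    ]


def slope(xs, ys):
    """Least-squares slope of ys on xs (0.0 if xs do not vary)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def matched_regression_estimate(treated, controls):
    """Step 2: adjust the matched-pair outcome differences for the
    covariate differences that remain after matching."""
    pairs = nearest_neighbour_pairs(treated, controls)
    dy = [t["y"] - c["y"] for t, c in pairs]  # outcome differences
    dx = [t["x"] - c["x"] for t, c in pairs]  # residual covariate imbalance
    return mean(dy) - slope(dx, dy) * mean(dx)
```

If matching already balanced the covariate exactly, the mean of `dx` would be zero and the adjustment would leave the simple matched difference in means unchanged.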
Where matching and covariate adjustment were both used in a single study and the findings from each method of estimation were then compared (i.e., the methods were not used in combination), matching was usually judged to be the more efficient estimator (especially in cases where the difference-in-differences version had been implemented). However, there was potential for model-based adjustment methods to be considered more efficient if:
there was substantial bias between the groups in the matched samples (e.g., imbalance in the propensity score of more than 0.5 standard deviations), and the model-based approach used high-quality data with a rich set of covariates (Glazerman et al., 2003);
matching was undertaken using a small set of covariates and the model-based approach involved the use of a rich set of covariates; or
the matching procedure resulted in very small sample sizes (much better balance is achieved when many controls are available for matching; Rubin, 1976), and the model-based approach involved the use of a rich set of covariates.
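The 0.5 standard deviation threshold in the first condition can be checked with a short sketch. The text does not say which standard deviation is meant, so the pooled standard deviation of the two matched groups is an assumption here, as are the function names.

```python
from math import sqrt
from statistics import mean, variance


def standardised_difference(treated_scores, control_scores):
    """Difference in mean propensity score between the matched groups,
    expressed in pooled standard deviation units (pooling is an assumption)."""
    pooled_sd = sqrt((variance(treated_scores) + variance(control_scores)) / 2)
    if pooled_sd == 0:
        raise ValueError("propensity scores do not vary in either group")
    return (mean(treated_scores) - mean(control_scores)) / pooled_sd


def substantial_imbalance(treated_scores, control_scores, threshold=0.5):
    """True if the groups differ by more than `threshold` standard deviations."""
    return abs(standardised_difference(treated_scores, control_scores)) > threshold
```

A reviewer could apply `substantial_imbalance` to the matched-sample propensity scores, where a study reports them, before deciding whether the model-based estimate should be preferred.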
In short, particularly in cases of cross-sectional versions of matching, if a model was correctly specified then it tended to be judged as more efficient than matching.8
In deciding which outcome data to select, making a choice between different matching techniques was sometimes required. Matching techniques differ in both the way they define similarity and the way weights are computed.9 Where different techniques for constructing a matched sample (using the propensity score) were used in a single study included in the review, our approach to the selection of data was as follows:10
If the authors reported which technique led to the most closely related/matched samples (i.e., best balance between the covariates in the treated and control groups) the outcome data based on this technique were extracted.
Where no such information was presented by the authors, the following hierarchy applied:
o local linear (most efficient)11
o kernel12
o stratified
o nearest neighbour13 (also called pair-wise matching) (least efficient)
8 For matching, the use of the same data source for the participants and non-participants was also regarded as important, as this would help ensure similar covariate meaning and measurement.
9 Traditional matching estimators pair each participant with a single matched non-participant (Rosenbaum & Rubin, 1983), whereas more recently developed estimators pair participants with multiple non-participants and use weighted averaging to construct the matched outcome.
10 To some extent, the best method depends on the individual data set and where relevant this was also
In situations where different numbers of nearest neighbours were used, the general principle was that we extracted outcome data relating to the technique using the greatest number of neighbours (unless the authors reported better balance between the covariates in the treated and control groups using a different number of neighbours, or reviewers determined this).
For kernel regression matching conducted using more than one bandwidth (0.1, 0.2 and so on), our approach was to extract the outcome data relating to the highest bandwidth.14
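The selection rules above can be summarised in a short sketch. The record layout (keys `technique`, `neighbours`, `bandwidth`, `best_balance`) and the function name are hypothetical. Applying the bandwidth tie-break to any technique that reports a bandwidth is an assumption, since the text states it for kernel regression matching only.

```python
# Hierarchy from most to least efficient, as listed in the text.
HIERARCHY = ["local linear", "kernel", "stratified", "nearest neighbour"]


def select_estimate(results):
    """Choose which reported matching estimate to extract from one study.

    `results` is a list of dicts, one per technique reported in the study.
    """
    # Rule 1: prefer the technique the authors (or reviewers) identified
    # as giving the best covariate balance.
    flagged = [r for r in results if r.get("best_balance")]
    if flagged:
        return flagged[0]

    # Rule 2: otherwise follow the hierarchy, breaking ties by the largest
    # bandwidth and then by the greatest number of neighbours.
    # An unrecognised technique name raises ValueError.
    def sort_key(r):
        return (
            HIERARCHY.index(r["technique"]),
            -(r.get("bandwidth") or 0.0),
            -(r.get("neighbours") or 0),
        )

    return min(results, key=sort_key)
```

The sketch only encodes the default order; as the text notes, reported or reviewer-assessed balance overrides it.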
A large body of techniques for carrying out regression analysis has been developed. In cases where the authors reported several models with different combinations of control variables in the same paper, our approach was to focus on the effect estimates that were derived from the most similar models across studies. In so doing, the aim was to minimise (although not eliminate) the differences in what was adjusted across studies.
For studies using cross-sectional and difference-in-differences estimation strategies, we extracted the outcome data for both. For studies reporting different estimation parameters (e.g., average treatment effects, marginal treatment effects, and so on) we extracted the outcome data relating to each of these.
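The distinction between the two estimation strategies can be shown with a minimal sketch on matched pairs. It assumes each unit has one outcome measured before and one after the programme; the keys `pre` and `post` and the function names are hypothetical.

```python
from statistics import mean


def cross_sectional_estimate(pairs):
    """Mean post-programme outcome difference between matched
    participants and non-participants."""
    return mean([t["post"] - c["post"] for t, c in pairs])


def difference_in_differences_estimate(pairs):
    """Mean difference in before-after changes, which removes any
    time-invariant outcome gap that remains within matched pairs."""
    return mean(
        [(t["post"] - t["pre"]) - (c["post"] - c["pre"]) for t, c in pairs]
    )
```

When matched participants already had better outcomes than their matches before the programme, the two estimates diverge, which is one reason for extracting both.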
Many of the included studies reported results separately for different cohorts and/or different sub-groups of participants. When it could be established that the different cohorts or sub-groups contained no overlapping subjects, we treated them as independent samples. Where different sized samples were used (and these samples were overlapping), the general principle was that the impact effects for the largest sample would be used. In practice, however, decisions about which to include in the meta-analysis were made on a case-by-case basis (taking into consideration relevant issues relating to the selection of the sample, such as whether there was likely to be more overlap between control and treatment individuals in terms of observable and unobservable characteristics). Where no such issues were noted by the authors or identified by the reviewers, the approach was to select the largest sample.
11 Local linear regression is a non-parametric regression technique that improves on the more traditional kernel regression estimator (Fan, 1992, 1993). It differs from kernel regression in terms of weights.
12 In kernel matching, all the individuals in the sample are used. In the estimation of treatment effects, more weight is assigned to those matches that are more similar.
13 In nearest neighbour matching, each individual in the treatment group is matched with the most similar individual or individuals in the control group. However, this process does not guarantee that the matched individuals are sufficiently comparable in terms of propensity scores if the samples do not overlap. Nearest neighbour matching can be improved by the use of a caliper, although this strategy may lead to the loss of observations from the treatment group. If a sufficiently small caliper is used, nearest neighbour approaches are preferred to stratification approaches.
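The difference between nearest neighbour matching with a caliper and kernel matching, as described in the footnotes on these techniques, can be sketched as follows. The Gaussian kernel is an assumption chosen so that every control receives some weight; the function names and arguments are hypothetical.

```python
from math import exp


def nearest_neighbour_with_caliper(treated_ps, control_ps, caliper):
    """Match each treated unit to its closest control on the propensity
    score, keeping the match only if it lies within the caliper.

    Returns {treated index: control index}; treated units with no control
    inside the caliper are dropped, which is the loss of observations
    the text refers to.
    """
    matches = {}
    for i, p in enumerate(treated_ps):
        j = min(range(len(control_ps)), key=lambda k: abs(control_ps[k] - p))
        if abs(control_ps[j] - p) <= caliper:
            matches[i] = j
    return matches


def kernel_weights(p, control_ps, bandwidth):
    """Gaussian kernel weights for one treated unit with propensity score p:
    every control is used, and closer controls receive more weight."""
    raw = [exp(-0.5 * ((q - p) / bandwidth) ** 2) for q in control_ps]
    total = sum(raw)
    return [w / total for w in raw]


def kernel_matched_outcome(p, control_ps, control_ys, bandwidth):
    """Weighted average of control outcomes used as the matched outcome."""
    weights = kernel_weights(p, control_ps, bandwidth)
    return sum(w * y for w, y in zip(weights, control_ys))
```

A larger bandwidth spreads the weights more evenly across controls, while a tighter caliper discards more treated units in exchange for closer matches.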