Part II Statistical Methodology
8.3 Statistical methods for IPD subgroup meta-analyses
When performing IPD meta-analyses, it is important to recognize the hierarchical structure (clustering) of the data and to incorporate it into the statistical modelling. It is well recognized that any study with an underlying hierarchical or clustered structure should use a statistical method that accounts for the clustering (154, 155). For example, each individual in an IPD meta-analysis of LBP studies is associated with a particular study in the pooled dataset. Thus, in a multi-level model, the individuals form the first level in the hierarchy (level 1) and the studies form the second level (level 2). This hierarchical structure implies that individuals selected randomly from a single study will be more similar than individuals selected randomly from several studies, which introduces between-study variation. Simply ignoring the clustering of patients within trials during analysis is inappropriate. A standard linear model ignores the clustering and assumes that the individual observations are independent, so that the error terms of the observations are also unrelated. Because of clustering at the study level, however, the outcomes within each study will have some degree of correlation. If the outcomes are correlated, the error terms will also be correlated, violating the independence assumption of the linear regression model. Therefore, a linear regression model applied to pooled IPD will not provide reliable estimates of the coefficient standard errors. The standard errors will tend to be underestimated, which could lead to false inferences claiming that real effects exist when in fact they do not. It is therefore very important, as demonstrated in a recent study, that the clustering of patients within studies is accounted for when performing IPD meta-analyses (156). For this reason, IPD meta-analyses typically use either a two-stage approach or a one-stage approach to account for the clustering (146). A two-stage approach conducts the analysis using conventional linear regression for each study separately and then synthesizes the
results using well established meta-analysis techniques. A one-stage approach, on the other hand, uses a mixed-effect model, also referred to as a multilevel model or hierarchical model, to fit a single model to the pooled IPD. It is called a mixed-effect model because it consists of fixed effects and random effects, where the random effects capture the variation at different levels. A fixed-effects model that includes indicator variables for each study, so that all components of the model are fixed, can also be used for the one-stage approach.
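The two-stage procedure described above can be sketched in a few lines. This is a minimal illustration only: it assumes a single binary treatment covariate, ordinary least squares within each study, and fixed-effect inverse-variance weighting as the synthesis step (one of several established techniques; the text does not say which is used). The function names and the toy data are hypothetical.

```python
import math

def ols_slope(x, y):
    """Stage 1: fit y = a + b*x by least squares; return (b, variance of b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    var_b = rss / (n - 2) / sxx  # residual variance divided by Sxx
    return b, var_b

def two_stage(studies):
    """Stage 2: inverse-variance weighted average of the per-study slopes."""
    estimates = [ols_slope(x, y) for x, y in studies]
    weights = [1.0 / var_b for _, var_b in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical toy data: (treatment indicator, outcome) for two studies.
studies = [
    ([0, 0, 0, 1, 1, 1], [5.0, 6.0, 7.0, 7.5, 8.0, 9.5]),
    ([0, 0, 0, 1, 1, 1], [2.0, 3.0, 2.5, 4.0, 5.5, 4.5]),
]
pooled, pooled_se = two_stage(studies)
```

Each study is analysed on its own, so the clustering is respected by construction; only the per-study estimates and their variances are carried into the second stage.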
Several papers detail the application and advantages of the two-stage and one-stage approaches (157-159). However, when considering the extension of tree based methods to an IPD setting, it would be very difficult and computationally intensive to use a two-stage approach. There are two ways in which the two-stage approach could be implemented. One would be to naively grow a tree for each trial separately; however, each trial would probably grow a different tree, making it impossible to synthesize the results. Another would be to evaluate every split for each covariate using the splitting function for each trial separately and then synthesize the score across the trials using some weighted average (as done in aggregate data meta-analyses). A danger with this is that if one of the trials does not contain the value of the split being considered, then no score can be computed for that trial and the information from that trial is lost. For example, if a tree method were considering a split on gender (males vs. females) in each trial separately and one trial contained only females, then no score would be computed for that trial and it would contribute nothing to the estimation of the effect, i.e. a loss of information. Moreover, such a procedure would be computationally intensive. The one-stage approach, on the other hand, does not suffer from these difficulties. Hence, a one-stage approach is better suited to tree based methods and their application. The main advantage of the two-stage method
is simplicity, but as it is lost here, there is no reason to pursue it. Therefore, only the one-stage approach will be considered going forward.
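The loss of information described in the gender example can be made concrete with a small sketch. The splitting function below (a difference in mean outcome between the two sides of the split) and the size-weighted average are hypothetical stand-ins chosen for brevity, since the text does not fix either; the field names and toy data are also assumptions.

```python
def split_score(rows, covariate, level):
    """Hypothetical splitting function: difference in mean outcome between
    rows with covariate == level and all other rows.
    Returns None when one side of the split is empty in this trial."""
    left = [r["y"] for r in rows if r[covariate] == level]
    right = [r["y"] for r in rows if r[covariate] != level]
    if not left or not right:
        return None
    return sum(left) / len(left) - sum(right) / len(right)

def pooled_split_score(trials, covariate, level):
    """Size-weighted average of per-trial scores.
    Trials that cannot produce a score are silently dropped."""
    scored = []
    for rows in trials:
        score = split_score(rows, covariate, level)
        if score is not None:
            scored.append((len(rows), score))
    dropped = len(trials) - len(scored)
    if not scored:
        return None, dropped
    total = sum(n for n, _ in scored)
    return sum(n * s for n, s in scored) / total, dropped

# Hypothetical toy data: trial_b contains only females.
trial_a = [{"sex": "M", "y": 4.0}, {"sex": "M", "y": 6.0},
           {"sex": "F", "y": 1.0}, {"sex": "F", "y": 3.0}]
trial_b = [{"sex": "F", "y": 2.0}, {"sex": "F", "y": 5.0}]

pooled, dropped = pooled_split_score([trial_a, trial_b], "sex", "M")
```

Here the pooled score is determined by `trial_a` alone: `trial_b` yields no score for the male/female split and contributes nothing, which is the behaviour a one-stage analysis of the pooled data avoids.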
In a one-stage approach, the covariates in the mixed-effect model can be set up to have fixed effects or random effects to account for the clustering. One approach is to use a standard linear regression model and add indicator variables for each study (fixed effects) as follows:
$$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \beta_2 z_{ij} + \beta_3 x_{ij} \ast z_{ij} + e_{ij} \qquad (8.1)$$
where $\beta_{0j}$ is the study-specific intercept, fitted by including a vector of indicator variables for each study, the $ij$ subscript denotes the i-th observation in the j-th study and $e_{ij}$ is the normally distributed error term. The model therefore allows each study to have a different intercept and is referred to as the fixed-effects model. This is essentially a general linear model that adjusts for the trial effects by including them as indicators in the model. Instead of adding fixed effects for trials as shown in equation (8.1), another option for a one-stage approach is to treat the study-level intercept as a random effect. The equation has the same form, but the $\beta_{0j}$ term is assumed to be normally distributed with mean $\beta_0$ and variance $\sigma^2_{\beta_0}$. Thus the fully specified model can be written:
$$y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \beta_2 z_{ij} + \beta_3 x_{ij} \ast z_{ij} + e_{ij} \qquad (8.2)$$
$$\beta_{0j} \sim N(\beta_0, \sigma^2_{\beta_0})$$
The models specified in equations (8.1) and (8.2) are referred to as the fixed-effects model and the random-effects model respectively. In both models the intercepts differ between studies, but the slopes are the same for all studies.
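The within-study correlation implied by the random-intercept model (8.2) can be written out explicitly. The short derivation below is a sketch under two assumptions that the text does not state: the random intercepts are independent of the error terms, and the errors share a common variance, given here the hypothetical symbol $\sigma^2_e$. The outcomes $y_{ij}$ and $y_{i'j}$ belong to two different individuals in the same study $j$, and all quantities are conditional on the covariates.

```latex
\begin{align*}
\operatorname{Var}(y_{ij}) &= \sigma^2_{\beta_0} + \sigma^2_e, \\
\operatorname{Cov}(y_{ij},\, y_{i'j}) &= \sigma^2_{\beta_0} \qquad (i \neq i'), \\
\operatorname{Corr}(y_{ij},\, y_{i'j}) &=
  \frac{\sigma^2_{\beta_0}}{\sigma^2_{\beta_0} + \sigma^2_e}.
\end{align*}
```

Under these assumptions, two outcomes from the same study are correlated whenever $\sigma^2_{\beta_0} > 0$, whereas outcomes from different studies are uncorrelated. This is the correlation that a standard linear model ignores.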
It is also possible that covariate effects differ across studies, and this also needs to be accounted for when using either fixed-effects or random-effects models. For example, if the treatment effect differs across studies, this can be accounted for in a fixed-effects model by including a treatment-by-study interaction term. In a random-effects model, random effects can be placed on the treatment variable to give the following model:
$$y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \beta_2 z_{ij} + \beta_3 x_{ij} \ast z_{ij} + e_{ij} \qquad (8.3)$$
$$\beta_{0j} \sim N(\beta_0, \sigma^2_{\beta_0}), \qquad \beta_{1j} \sim N(\beta_1, \sigma^2_{\beta_1})$$
By doing so, the model will have a random intercept and a random treatment effect. In this manner, as illustrated by the example models specified thus far, mixed effects models can be fitted to best incorporate the correlations inherent within the hierarchical data structure to obtain reliable parameter estimates.
Parameter estimation
This section gives a very brief overview of how the commonly used REML approach is applied to parameter estimation in mixed-effects modelling. For a more detailed description, see Pinheiro et al. (160).
The one-stage mixed-effect models (equations (8.2) and (8.3)) make use of maximum likelihood (ML) or restricted (or residual) maximum likelihood (REML) to obtain parameter estimates. Of the two, the REML approach to estimation is preferred as it provides unbiased estimates of the variance parameters and performs well when the data is unbalanced (160, 161). The REML approach works by maximizing the likelihood
for the two components of the mixed model i.e. the fixed effects component and the random effects component. We can write the two components of a mixed model in a general matrix form as follows
$$Y = XB + ZU + e$$
where Y is an $N \times 1$ vector of the reported outcomes, X and Z are the covariate matrices for the fixed component and the random component respectively, B and U are column vectors containing the fixed-effect coefficients and the random effects respectively, and e is a vector of residuals. Typically, both U and e have a multivariate normal (MVN) distribution of the form
$$U \sim MVN(0, H), \qquad e \sim MVN(0, R)$$
The covariance components H and R are first estimated using REML, giving $\hat{H}$ and $\hat{R}$, so that the parameters B and U can be estimated thereafter. As a whole, the model has an MVN distribution of the form $Y \sim MVN(XB, ZHZ' + R)$, where B is estimated by
$$\hat{B} = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}Y, \qquad \hat{V} = Z\hat{H}Z' + \hat{R},$$
and the U component is estimated using a shrinkage estimate of the form
$$\hat{U} = \hat{H}Z'\hat{V}^{-1}(Y - X\hat{B})$$
(157, 160). The estimate of the covariance H for the random component is referred to as the REML estimate; it quantifies the variance attributable to between-study heterogeneity.
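The last two estimation steps can be sketched numerically. The example below is illustrative only: it assumes the covariance estimates (called `H_hat` and `R_hat` here) are already available and simply supplies them, so it does not perform the REML step itself. It then computes the generalized least squares estimate of B and the shrinkage estimate of U for a random-intercept model. The function name, the toy data and the variance values are all hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def gls_and_shrinkage(Y, X, Z, H_hat, R_hat):
    """Return (B_hat, U_hat) for Y = XB + ZU + e, given covariance estimates.

    B_hat = (X' V^-1 X)^-1 X' V^-1 Y  with  V = Z H Z' + R
    U_hat = H Z' V^-1 (Y - X B_hat)
    """
    V_hat = Z @ H_hat @ Z.T + R_hat
    # Solve linear systems rather than forming V^-1 explicitly.
    Vinv_X = np.linalg.solve(V_hat, X)
    Vinv_Y = np.linalg.solve(V_hat, Y)
    B_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)
    U_hat = H_hat @ Z.T @ np.linalg.solve(V_hat, Y - X @ B_hat)
    return B_hat, U_hat

# Hypothetical toy data: 2 studies with 3 individuals each.
x = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])      # treatment indicator
X = np.column_stack([np.ones(6), x])              # fixed part: intercept, treatment
Z = np.kron(np.eye(2), np.ones((3, 1)))           # random part: study indicators (6 x 2)
Y = np.array([5.0, 6.0, 8.0, 2.0, 4.5, 4.0])

# Assumed (not estimated) variance components.
H_hat = 0.5 * np.eye(2)   # between-study variance of the random intercept
R_hat = 1.0 * np.eye(6)   # residual variance

B_hat, U_hat = gls_and_shrinkage(Y, X, Z, H_hat, R_hat)
```

In this balanced random-intercept setting, each element of `U_hat` equals the mean residual of its study multiplied by 0.5 / (0.5 + 1/3) = 0.6, which shows why the estimate is described as a shrinkage estimate: the raw study deviations are pulled towards zero.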