
Part II Statistical Methodology

8.3 Statistical methods for IPD subgroup meta-analyses

When performing IPD meta-analyses, it is important to recognize the hierarchical structure, or clustering, of the data and to incorporate it into the statistical modelling. It is well recognized that any study with an underlying hierarchical or clustered structure should use a statistical method that accounts for the clustering (154, 155). For example, each individual in an IPD meta-analysis of LBP studies is associated with a particular study in the pooled dataset. Thus, in a multilevel model, the individuals form the first level in the hierarchy (level 1) and the studies form the second level (level 2). This hierarchical structure implies that individuals selected randomly from a single study will be more similar to one another than individuals selected randomly from several studies, which introduces between-study variation.

Simply ignoring the clustering of patients within trials during analysis is inappropriate. A standard linear model ignores the clustering and assumes that the individual observations are independent, so that the error terms of the observations are also unrelated. Because of clustering at the study level, however, the outcomes within each study will have some degree of correlation. If the outcomes are correlated, the error terms will also be correlated, which violates the independence assumption of the linear regression model. A linear regression model applied to pooled IPD will therefore not provide reliable estimates of the coefficient standard errors. The standard errors will tend to be underestimated, which could lead to false inferences, that is, claims that real effects exist when in fact they do not. It is therefore important, as demonstrated in a recent study, that the clustering of patients within studies is accounted for when performing IPD meta-analyses (156).

To account for the clustering, IPD meta-analyses typically use either a two-stage approach or a one-stage approach (146). A two-stage approach first analyses each study separately using conventional linear regression and then synthesizes the results using well-established meta-analysis techniques. A one-stage approach, on the other hand, fits a single mixed-effects model, also referred to as a multilevel or hierarchical model, to the pooled IPD. It is called a mixed-effects model because it consists of fixed effects and random effects, where the random effects capture the variation at the different levels. Alternatively, a fixed-effects model that includes an indicator variable for each study, and in which all components of the model are fixed, can be used for the one-stage approach.
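The two-stage approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the cited papers: the outcome is regressed on a binary treatment indicator only, the second stage uses a common-effect inverse-variance weighted average, and the function names and simulated data are hypothetical.

```python
import numpy as np


def fit_study(y, t):
    """Stage 1: ordinary least squares of outcome on treatment within one study.

    Returns the treatment-effect estimate and its estimated variance.
    Assumes the study contains both treated and control patients.
    """
    X = np.column_stack([np.ones(len(y)), t])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], cov[1, 1]


def pool_inverse_variance(estimates, variances):
    """Stage 2: inverse-variance weighted average of the study estimates."""
    w = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    return pooled, 1.0 / np.sum(w)


def two_stage(studies):
    """Run stage 1 on every (y, t) pair, then pool the treatment effects."""
    estimates, variances = zip(*(fit_study(y, t) for y, t in studies))
    return pool_inverse_variance(estimates, variances)


# Simulated IPD (illustrative values): three studies with different
# intercepts and a common treatment effect of 2.0.
rng = np.random.default_rng(0)
studies = []
for intercept in (10.0, 12.0, 15.0):
    t = rng.integers(0, 2, size=200)
    y = intercept + 2.0 * t + rng.normal(0.0, 1.0, size=200)
    studies.append((y, t))

pooled_effect, pooled_variance = two_stage(studies)
print(pooled_effect, np.sqrt(pooled_variance))
```

Because each study is analysed on its own, differences in study intercepts never enter the pooled estimate, which is how the two-stage approach respects the clustering.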

Several papers detail the application and advantages of the two-stage and one-stage approaches (157-159). However, when considering the extension of tree-based methods to an IPD setting, a two-stage approach would be very difficult and computationally intensive to use. There are two ways in which a two-stage approach could be implemented. One would be to naively grow a tree for each trial separately; however, each trial would probably grow a different tree, making it impossible to synthesize the results. Another would be to evaluate every candidate split on each covariate using the splitting function within each trial separately, and then to synthesize the scores across the trials using some weighted average (as is done in aggregate data meta-analyses). A danger with this is that if one of the trials does not contain the values needed for the split being considered, no score can be computed for that trial and its information is lost. For example, if a tree method were considering a split on gender (males vs. females) in each trial separately and one trial contained only females, no score would be computed for that trial and it would contribute nothing to the estimation of the effect, i.e. a loss of information. Moreover, such a procedure would be computationally intensive.

The one-stage approach does not suffer from these difficulties and is therefore better suited to tree-based methods. The main advantage of the two-stage method is its simplicity, but as this is lost here, there is no reason to pursue it. Therefore, only the one-stage approach is considered from here on.
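The loss of information described above for the second two-stage option can be illustrated with a small sketch. The splitting function used here (the squared difference in mean outcome between the two child nodes) and the weighting by trial size are assumptions chosen only for illustration; they are not the splitting criterion of any particular tree method. Gender is coded 0 for males and 1 for females.

```python
import numpy as np


def split_score(y, x, threshold):
    """Hypothetical splitting function: squared difference in mean outcome
    between the two child nodes. Returns None when one child is empty."""
    left, right = y[x <= threshold], y[x > threshold]
    if len(left) == 0 or len(right) == 0:
        return None
    return float((left.mean() - right.mean()) ** 2)


def pooled_split_score(trials, threshold):
    """Score the split in each trial separately, then take a weighted
    average (weights = trial size). Trials with no score are dropped."""
    scores, weights, dropped = [], [], []
    for name, (y, x) in trials.items():
        score = split_score(y, x, threshold)
        if score is None:
            dropped.append(name)
        else:
            scores.append(score)
            weights.append(len(y))
    if not scores:
        return None, dropped
    return float(np.average(scores, weights=weights)), dropped


trials = {
    "trial_A": (np.array([1.0, 2.0, 3.0, 4.0]), np.array([0, 0, 1, 1])),
    "trial_B": (np.array([2.0, 2.0, 5.0, 5.0]), np.array([0, 0, 1, 1])),
    "trial_C": (np.array([3.0, 4.0, 5.0]), np.array([1, 1, 1])),  # females only
}

score, dropped = pooled_split_score(trials, threshold=0)
print(score, dropped)
```

In this example trial_C yields no score for the gender split, so its three patients contribute nothing to the pooled score.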

In a one-stage approach, the clustering can be accounted for through either fixed effects or random effects. One option is to use a standard linear regression model and add an indicator variable for each study (fixed effects) as follows:

$$Y_{ij} = \beta_{0i} + \beta_1 T_{ij} + \beta_2 X_{ij} + \beta_3 T_{ij} \cdot X_{ij} + \varepsilon_{ij} \qquad (8.1)$$

where $\beta_{0i}$ is the intercept for the $i$-th study, obtained by including an indicator variable for each study, the subscript $ij$ denotes the $j$-th observation in the $i$-th study and $\varepsilon_{ij}$ is the normally distributed error term. The model therefore allows each study to have a different intercept and is referred to as the fixed-effects model. It is essentially a general linear model that adjusts for trial effects by including them as indicators. Instead of adding fixed effects for trials as in equation (8.1), another option for a one-stage approach is to treat the study effect as random. The equation then has the same form, but the $\beta_{0i}$ term is assumed to be normally distributed with mean $\beta_0$ and variance $\sigma^2_{\beta_0}$. The fully specified model can thus be written:

$$Y_{ij} = \beta_{0i} + \beta_1 T_{ij} + \beta_2 X_{ij} + \beta_3 T_{ij} \cdot X_{ij} + \varepsilon_{ij} \qquad (8.2)$$

$$\beta_{0i} \sim N(\beta_0, \sigma^2_{\beta_0})$$

The models specified in equations (8.1) and (8.2) are referred to as the fixed-effects model and the random-effects model respectively. In both models the intercepts differ between studies, but the slopes are the same for every study.
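As an illustration of the fixed-effects model in equation (8.1), the sketch below builds a design matrix with one indicator column per study and fits it by ordinary least squares. It assumes that T is a binary treatment indicator and X a single patient-level covariate; the function name, the simulated data and the coefficient values are hypothetical.

```python
import numpy as np


def fixed_effects_design(study, t, x):
    """Design matrix for model (8.1): one indicator column per study
    (the study-specific intercepts), followed by T, X and T*X."""
    labels = np.unique(study)
    indicators = (study[:, None] == labels[None, :]).astype(float)
    return np.column_stack([indicators, t, x, t * x]), labels


# Simulated pooled IPD (illustrative values only).
rng = np.random.default_rng(1)
study = np.repeat(np.array([0, 1, 2]), 100)
t = rng.integers(0, 2, size=study.size).astype(float)
x = rng.normal(size=study.size)
true_intercepts = np.array([10.0, 12.0, 15.0])
y = (true_intercepts[study] + 2.0 * t + 0.5 * x + 1.0 * t * x
     + rng.normal(0.0, 1.0, size=study.size))

D, labels = fixed_effects_design(study, t, x)
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
intercepts = coef[:len(labels)]   # one intercept per study
b1, b2, b3 = coef[len(labels):]   # common slopes shared by all studies
print(intercepts, b1, b2, b3)
```

No separate overall intercept column is included, because the study indicators already sum to one in every row. Fitting the random-effects model in equation (8.2) would instead require mixed-model software, which is not shown here.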


It is also possible that the effects of covariates differ across studies, and this needs to be accounted for when using either fixed-effects or random-effects models. For example, if the treatment effect differs across studies, this can be accounted for in a fixed-effects model by including a treatment-by-study interaction term. In a random-effects model, a random effect can be placed on the treatment variable to give the following model:

$$Y_{ij} = \beta_{0i} + \beta_{1i} T_{ij} + \beta_2 X_{ij} + \beta_3 T_{ij} \cdot X_{ij} + \varepsilon_{ij} \qquad (8.3)$$

$$\beta_{0i} \sim N(\beta_0, \sigma^2_{\beta_0}), \qquad \beta_{1i} \sim N(\beta_1, \sigma^2_{\beta_1})$$

By doing so, the model will have a random intercept and a random treatment effect. In this manner, as illustrated by the example models specified thus far, mixed effects models can be fitted to best incorporate the correlations inherent within the hierarchical data structure to obtain reliable parameter estimates.
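To make the structure of model (8.3) concrete, the following sketch simulates pooled IPD from it: each study draws its own intercept and its own treatment effect from normal distributions, and patients within a study share those draws. The function name, the parameter values and the assumption that T is binary and X is standard normal are illustrative choices, not part of the model specification above.

```python
import numpy as np


def simulate_random_effects_ipd(n_studies, n_per_study, beta,
                                sd_b0, sd_b1, sd_eps, rng):
    """Draw pooled IPD from model (8.3).

    beta = (beta0, beta1, beta2, beta3) are the mean intercept, the mean
    treatment effect, the covariate effect and the interaction effect.
    """
    b0, b1, b2, b3 = beta
    study = np.repeat(np.arange(n_studies), n_per_study)
    b0_i = rng.normal(b0, sd_b0, n_studies)  # random intercepts
    b1_i = rng.normal(b1, sd_b1, n_studies)  # random treatment effects
    t = rng.integers(0, 2, size=study.size).astype(float)
    x = rng.normal(size=study.size)
    eps = rng.normal(0.0, sd_eps, size=study.size)
    y = b0_i[study] + b1_i[study] * t + b2 * x + b3 * t * x + eps
    return study, t, x, y, b0_i, b1_i


rng = np.random.default_rng(2)
study, t, x, y, b0_i, b1_i = simulate_random_effects_ipd(
    n_studies=6, n_per_study=50, beta=(10.0, 2.0, 0.5, 1.0),
    sd_b0=1.5, sd_b1=0.8, sd_eps=1.0, rng=rng)

# Study-specific intercepts and treatment effects drawn for this dataset.
print(np.round(b0_i, 2))
print(np.round(b1_i, 2))
```

Setting `sd_b1` to zero recovers the random-intercept model (8.2), in which the treatment effect is common to all studies.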

Parameter estimation

This section provides a very brief overview as to how the commonly used REML approach is used for parameter estimation in mixed-effects modelling. For a more detailed description, one can refer to Pinheiro et al (160).

The one-stage mixed-effects models (equations (8.2) and (8.3)) use maximum likelihood (ML) or restricted (or residual) maximum likelihood (REML) to obtain parameter estimates. Of the two, REML is preferred as it provides unbiased estimates of the variance parameters and performs well when the data are unbalanced (160, 161). The REML approach works by maximizing the likelihood for the two components of the mixed model, i.e. the fixed-effects component and the random-effects component. The two components of a mixed model can be written in general matrix form as follows:

$$Y = XB + ZU + e$$

where $Y$ is an $N \times 1$ vector of the reported outcomes, $X$ and $Z$ are the covariate matrices for the fixed component and the random component respectively, $B$ and $U$ are vectors containing the fixed-effect coefficients and the random effects respectively (with lengths equal to the number of columns of $X$ and $Z$), and $e$ is an $N \times 1$ vector of residuals. Typically, both $U$ and $e$ have a multivariate normal (MVN) distribution of the form

$$U \sim MVN(0, H), \qquad e \sim MVN(0, R)$$

First, the covariance components $H$ and $R$ are estimated using REML, giving $\hat{H}$ and $\hat{R}$, so that the parameters $B$ and $U$ can be estimated thereafter. As a whole, the model has an MVN distribution of the form $Y \sim MVN(XB, ZHZ' + R)$, where $B$ is estimated by

$$\hat{B} = (X'\hat{V}^{-1}X)^{-1}X'\hat{V}^{-1}Y, \qquad \hat{V} = Z\hat{H}Z' + \hat{R},$$

and the $U$ component is estimated using a shrinkage estimate of the form

$$\hat{U} = \hat{H}Z'\hat{V}^{-1}(Y - X\hat{B})$$

(157, 160). The estimates of the covariance $H$ of the random component are referred to as REML estimates, and they quantify the portion of the variance that is explained by between-study heterogeneity.
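The two estimators above can be checked numerically with a short sketch. It assumes that the covariance components are already available, so the REML step itself is not shown; the values of H and R below are supplied by hand for illustration, and the function name is hypothetical. The example is a random-intercept model with two studies of two patients each and an intercept-only fixed part.

```python
import numpy as np


def gls_and_shrinkage(y, X, Z, H, R):
    """Given covariance components H and R (taken as known here), return
    B_hat = (X' V^-1 X)^-1 X' V^-1 Y and U_hat = H Z' V^-1 (Y - X B_hat),
    where V = Z H Z' + R."""
    V = Z @ H @ Z.T + R
    V_inv_X = np.linalg.solve(V, X)
    V_inv_y = np.linalg.solve(V, y)
    B_hat = np.linalg.solve(X.T @ V_inv_X, X.T @ V_inv_y)
    U_hat = H @ Z.T @ np.linalg.solve(V, y - X @ B_hat)
    return B_hat, U_hat


y = np.array([1.0, 3.0, 5.0, 7.0])      # outcomes: study 1, then study 2
X = np.ones((4, 1))                     # fixed part: overall intercept
Z = np.array([[1.0, 0.0], [1.0, 0.0],   # random part: study indicators
              [0.0, 1.0], [0.0, 1.0]])
H = 1.0 * np.eye(2)                     # between-study variance (assumed)
R = 1.0 * np.eye(4)                     # residual variance (assumed)

B_hat, U_hat = gls_and_shrinkage(y, X, Z, H, R)
print(B_hat, U_hat)
```

The study means are 2 and 6 and the overall mean is 4, so the raw study deviations are -2 and 2. With these variance values the shrinkage factor is 1 / (1 + 1/2) = 2/3, which pulls the estimated random intercepts towards zero, to -4/3 and 4/3.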