Towards a systematic approach - Missing data in randomised controlled trials: a practical guide

We propose that a systematic approach begins with considering the reason, or mechanism, which caused the data to be missing. As this plays a central role in our discussion, we refer to it more succinctly as the missingness mechanism. We may think of the missingness mechanism as a second stage of sampling. It samples from the data we intended to collect, leaving us with the data we actually observe. Now, if we do not know how individuals came to be included in a study, or selected for intervention, we cannot draw definite conclusions from the study. Similarly, as discussed above, unless we know the ‘missingness mechanism’, we generally cannot draw definitive conclusions.

However, in discussion with investigators and/or regulators, and by examining the observed data, we can often come up with one or more likely missingness mechanisms. In an asthma study, for example, it may be that those with additional complications at baseline are more likely to withdraw.

Then, it turns out that there are two broad approaches for incorporating into the analysis the necessary extra assumptions that must be made when data are missing. We outline these below, and illustrate them by considering a trial with no missing data up to and including the penultimate follow-up visit, but some missing data at the final follow-up visit. We suppose interest focuses on the estimated intervention effect at the final visit.

The first approach focuses on the details of the missingness mechanism. Specifically, after taking account of all the information about the missingness mechanism in the observed data, it considers how the missingness mechanism depends on the unseen data. This then informs the probability distribution of the missing data, and thus the analysis. The focus on the mechanism by which the data become missing (or alternatively are selected for observation) leads to the term selection modelling for this approach. Taking the example from the previous paragraph, we would first consider how the reason for missing the final visit depended on previous visits and baseline data. Then we would consider how, in addition to this, the missingness mechanism might depend on the unseen measurement. This then affects the probability distribution of the missing data, and thus the estimated intervention effect at the final visit.
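To make the idea of a missingness mechanism acting as a second stage of sampling concrete, the following is a minimal simulation sketch. It is not a fitted selection model, and everything in it is an assumption made for illustration: the function name `simulate_final_visit`, the logistic form of the dropout probability, the coefficients `beta_prev` and `beta_unseen`, and the simulated outcome values.

```python
import math
import random
import statistics


def simulate_final_visit(n, beta_prev, beta_unseen, seed=0):
    """Simulate one trial arm with a logistic dropout mechanism at the final visit.

    Returns (full, observed): every final-visit value we intended to collect,
    and the subset left after the missingness mechanism has sampled from them.
    All distributions and coefficients are hypothetical.
    """
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        prev = rng.gauss(0.0, 1.0)                # penultimate visit, always seen
        final = 0.5 * prev + rng.gauss(0.0, 1.0)  # final visit, may be missing
        # Log-odds of missing the final visit: the beta_prev term depends on
        # observed data, the beta_unseen term on the value we may never see.
        logit = -1.0 + beta_prev * prev + beta_unseen * final
        p_missing = 1.0 / (1.0 + math.exp(-logit))
        full.append(final)
        if rng.random() >= p_missing:
            observed.append(final)
    return full, observed


# Compare the intended and the observed data with and without dependence
# of the mechanism on the unseen final measurement.
for beta_unseen in (0.0, 1.0):
    full, observed = simulate_final_visit(20000, beta_prev=0.0, beta_unseen=beta_unseen)
    print(
        f"beta_unseen={beta_unseen}: "
        f"mean intended={statistics.mean(full):.3f}, "
        f"mean observed={statistics.mean(observed):.3f}, "
        f"n observed={len(observed)}"
    )
```

In a real trial `beta_unseen` cannot be estimated from the observed data alone, which is why it has to be supplied as an assumption.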

The second approach focuses on the possible distribution of the missing data given the observed data. In the example, this means focusing on whether the distribution of patients’ unseen observations at the final visit, given their observations at previous visits and baseline, is different from that seen among the patients who have no missing data. In other words, the focus is on whether the ‘pattern’ of the data is the same in patients who do, and do not, have missing data. To estimate the intervention effect at the end of the trial, we have to make an assumption about how the patterns differ in the two groups of patients. This leads to an estimated intervention effect among those who do, and those who do not, have missing data, which has to be averaged, or mixed, to arrive at the overall estimate of the effect of intervention. Hence the name for this is a pattern mixture approach. Example 1.4 illustrates this in a simple setting.
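The mixing step can be sketched in a few lines. This is a deliberately simplified illustration, not the method of Example 1.4: it ignores baseline and earlier visits, and it assumes that the unseen final-visit values in an arm have a mean that differs from the observed mean by a fixed shift. The function name `pattern_mixture_mean`, the parameter `delta`, and the data values are all hypothetical.

```python
import statistics


def pattern_mixture_mean(observed, n_missing, delta):
    """Mix the observed-pattern mean with an assumed missing-pattern mean.

    `delta` is the assumed difference (missing minus observed) in mean
    response; it is an assumption, not something the data can tell us.
    """
    share_missing = n_missing / (len(observed) + n_missing)
    mean_observed = statistics.mean(observed)
    mean_missing = mean_observed + delta
    return (1 - share_missing) * mean_observed + share_missing * mean_missing


# Hypothetical arm: three patients seen at the final visit, one not seen.
observed_scores = [1.0, 2.0, 3.0]
print(pattern_mixture_mean(observed_scores, n_missing=1, delta=0.0))
print(pattern_mixture_mean(observed_scores, n_missing=1, delta=-4.0))
```

Applying the same calculation to each arm, with a `delta` chosen for each, and taking the difference gives an overall intervention effect under that assumption.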

Although the two approaches appear different, we can actually go from one to the other, though this is usually not straightforward (Molenberghs et al., 2003). Whichever approach we adopt, we need to make assumptions about either (i) the missingness mechanism, or (ii) how the distribution, or pattern, of missing data differs between patients we actually observed and those we intended to observe, but did not. Note that (i) implies things about (ii) and vice versa. We term these assumptions the missing data model.
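One way to see why (i) and (ii) imply things about each other is that they are two factorisations of the same joint distribution. The notation below is an assumption introduced here for illustration and does not appear in the text above: the observed data, the missing data, and an indicator of which values are missing.

```latex
% Assumed notation: y_obs = observed data, y_mis = missing data,
% r = indicator of which values are missing.
\begin{align*}
f(y_{\text{obs}}, y_{\text{mis}}, r)
  &= f(y_{\text{obs}}, y_{\text{mis}})\, f(r \mid y_{\text{obs}}, y_{\text{mis}})
  && \text{(selection model)} \\
  &= f(r)\, f(y_{\text{obs}}, y_{\text{mis}} \mid r)
  && \text{(pattern mixture model)}
\end{align*}
```

The first line models the data and then the missingness mechanism given the data; the second models the missingness pattern and then the data within each pattern.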

If we adopt a missing data model, we can then determine a sensible analysis and draw conclusions. These conclusions will be correct if our adopted missing data model is correct. However, if it is not correct, the conclusions will generally be wrong. We can then adopt another missing data model, and re-analyse the data. In fact, we can repeat this process as often as we wish. A more systematic, and informative, approach is as follows. Either before the trial is conducted, or during a blind review of the data, the trialists meet and discuss various missing data models that may be appropriate. Ideally, there will be agreement on the relative plausibility of these missing data models. Then, under each missing data model, the statistician can plan a sensible analysis. After the blinding is broken, these analyses are performed. The results reflect the range of conclusions that are consistent with the observed data and the assumed models. Taking these conclusions and the relative plausibility of the missing data models together, the trial can be interpreted as follows:

1. Under the most plausible missing data model, a, we conclude A.

2. Under a range of similar missing data models, b, c, d, we conclude B, C, D.

3. Under slightly different missing data models, e, f, g, which cannot be ruled out, we conclude E, F, G.

In line with ICH E9, a valid interpretation of the trial, which explores the sensitivity of the conclusions to the missing data models, presents all these analyses. Hopefully, and quite often in our experience, the conclusions will not be too sensitive to the more plausible missing data models. A valid interpretation of the trial would then be to act on the basis of the common themes running through conclusions A–D, possibly in a way that minimises the risk to patients if E–G turn out to be correct.

[Figure 1.1 (flowchart): Investigators discuss possible missingness mechanisms, informed by the information available in the data, and rank their plausibility. Under the most plausible missingness model (say a), under similar missingness models (say b, c, d), and under the least plausible missingness models (say e, f, g): perform a sensible statistical analysis and draw conclusions. Investigators then discuss the conclusions and arrive at a valid interpretation of the trial.]

Figure 1.1: A systematic approach for analysing a trial with missing data

Figure 1.1 shows this approach. As discussed above, although the observed data usually cannot definitively identify the missing data model, they can often provide useful guidance about what is and is not plausible, in the given trial context. Thus, careful analysis of the data should play an important role in formulating missing data models. At the design stage, data from similar previous studies may be used. Data from blind reviews may also provide useful information. We consider design issues and interactions with the regulatory authorities further at the end of this chapter.

As the missing data model is only ever a working proposition under which the analysis is performed, we regard considering the effect of several missing data models on the conclusions of the analysis as an essential part of the analysis process. Following ICH E9, we call this sensitivity analysis.
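As a rough sketch of what such a sensitivity analysis might look like in code, the example below estimates the intervention effect under several pre-specified missing data models and reports the range of estimates. It rests on strong simplifying assumptions that are ours, not the text's: each model is reduced to a pair of assumed mean shifts for the unseen values in each arm, covariates are ignored, and the labels, shifts, and data are hypothetical.

```python
import statistics


def mixed_mean(observed, n_missing, delta):
    """Arm mean when unseen values are assumed to average `delta` above the observed mean."""
    share_missing = n_missing / (len(observed) + n_missing)
    mean_observed = statistics.mean(observed)
    return (1 - share_missing) * mean_observed + share_missing * (mean_observed + delta)


def sensitivity_analysis(active, n_missing_active, control, n_missing_control, models):
    """Estimate the intervention effect under each pre-specified missing data model.

    `models` maps a label to (delta_active, delta_control). The dict order is
    taken to be the agreed order of plausibility, most plausible first.
    """
    return {
        label: mixed_mean(active, n_missing_active, delta_active)
        - mixed_mean(control, n_missing_control, delta_control)
        for label, (delta_active, delta_control) in models.items()
    }


# Hypothetical final-visit scores and hypothetical missing data models.
active = [1.0, 2.0, 3.0]
control = [0.0, 1.0, 2.0]
models = {"a": (0.0, 0.0), "b": (-1.0, 0.0), "e": (-4.0, 0.0)}

results = sensitivity_analysis(active, 1, control, 1, models)
for label, effect in results.items():
    print(f"model {label}: estimated effect = {effect:.2f}")
print("range of estimates:", min(results.values()), "to", max(results.values()))
```

Fixing the dictionary of models before unblinding mirrors the pre-specification described above; the spread of the estimates indicates how sensitive the conclusion is to the assumed models.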

This approach is fundamentally different from common practice where the analyst regards missing data as a ‘problem’ and casts around for a ‘solution’, usually a computationally simple procedure. Once the data have been analysed using this procedure the problem is regarded as having been ‘solved’. Such an approach is contrary to ICH E9, and may well lead to misleading conclusions.

Statisticians and programmers will notice we have deliberately avoided discussing what is computationally feasible. This is because we believe that the principles of the analysis should be

