Group                   No. of patients   Mean FEV1 (litres)   SD
Active treatment (FP)   168               1.33                 0.46
Placebo                 141               1.30                 0.49
t = 0.48, 307 degrees of freedom, p = 0.63
95% CI for difference (−0.08, 0.13) litres
Table 2.3: Isolde trial, complete case analysis: t-test of treatment effect 3 years after randomisation
would be preferable, as it will include more patients and so give more precise estimates of treatment effects. However, if the missingness mechanism is not MCAR, then neither method is sensible. Only if data were MCAR in the early part of the trial and not MCAR later would observed case analysis be sensible but complete case analysis not sensible. This is very unlikely in practice.
2.3 Last observation carried forward
Suppose a trial has longitudinal follow up, and that patients withdraw over the course of the follow up. After a patient withdraws, their subsequent responses are missing. Suppose that, for each patient who withdraws, we set their missing responses equal to their last observed response. This is called Last Observation Carried Forward (LOCF). If some patients withdraw before the first follow-up visit, then their baseline observation can be carried forward. Using LOCF gives a data set with no missing values, to which the analysis method intended for the fully observed data can be directly applied. We say the missing values have been imputed using LOCF. We refer to the assumption that a missing patient’s responses are equal to their last observed response as the LOCF assumption.
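As a minimal sketch of the imputation rule just described, the following Python function carries the last observed response forward. The function name `locf` and the use of `None` for a missing response are our own illustrative choices; the example values are those of patient 4 in Table 2.4.

```python
from typing import List, Optional


def locf(responses: List[Optional[float]],
         baseline: Optional[float] = None) -> List[Optional[float]]:
    """Replace each missing response (None) by the most recent observed value.

    `baseline` is carried forward when the patient withdraws before the
    first follow-up visit. If no baseline is supplied, responses missing
    before the first observation stay missing.
    """
    filled = []
    last = baseline
    for value in responses:
        if value is not None:
            last = value  # a genuine observation becomes the value to carry
        filled.append(last)
    return filled


# Patient 4 of Table 2.4: observed at 6 months, 1 year and 1.5 years, then withdrawn.
patient_4 = [0.9, 1.0, 1.2, None, None, None]
print(locf(patient_4))  # [0.9, 1.0, 1.2, 1.2, 1.2, 1.2]
```

Note that the completed record carries no trace of which values were imputed, so any subsequent analysis treats them exactly as if they had been observed.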
EXAMPLE 2.1 Isolde (ctd)
Table 2.4 shows follow-up data from 4 patients. The first completed the trial. The subsequent 3 have had their missing data imputed using LOCF (values shown in italics).
To illustrate the use of LOCF, we impute the missing responses for every patient following their withdrawal, apart from the 134 who withdrew before the first follow-up visit. Figure 2.1 shows the mean FEV1 at each follow-up visit, by treatment group, using (i) all available data at each follow-up visit and (ii) LOCF to impute the missing data. The LOCF imputed means are similar for the FP arm, but markedly lower for the placebo arm. The exception is the last visit, where LOCF gives a higher mean for the FP arm. Table 2.5 shows a t-test for treatment effect using the LOCF imputed data. In contrast to Table 2.3, the estimated treatment effect is now significant at the 5% level.
Patient   Years of     FEV1 (litres) at follow-up visit:
          follow-up    6 months   1 year   1.5 years   2 years   2.5 years   3 years
1         3            1.3        1.2      1.0         1.0       1.0         1.1
2         0.5          0.7        0.7      0.7         0.7       0.7         0.7
3         1            1.7        1.5      1.5         1.5       1.5         1.5
4         1.5          0.9        1.0      1.2         1.2       1.2         1.2
Table 2.4: Isolde trial: After withdrawal, patients have had their missing data imputed using LOCF (imputed values shown in italics)
sible, and are these plausible here? We need to be confident about the answers to these before
concluding a treatment effect actually exists. ¤
Group                   No. of patients   Mean FEV1 (litres)   SD
Active treatment (FP)   316               1.35                 0.47
Placebo                 301               1.26                 0.48
t = 2.28, 615 degrees of freedom, p = 0.02
95% CI for difference (0.01, 0.16) litres
Table 2.5: Isolde study, LOCF imputed data: t-test of treatment effect 3 years after randomisation
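For readers who wish to see how such a table is obtained, here is a rough sketch of a two-sample t-test computed from summary statistics. We assume the equal-variance (pooled) form of the test, which is consistent with the 615 = 316 + 301 − 2 degrees of freedom reported; the function name `pooled_t` is our own. Because the inputs are the rounded means and SDs of Table 2.5, the result only approximates the published t = 2.28.

```python
import math


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from group summaries.

    Returns (t, degrees of freedom, standard error of the difference).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df, se


# Rounded summaries from Table 2.5 (LOCF imputed data).
t, df, se = pooled_t(316, 1.35, 0.47, 301, 1.26, 0.48)
print(df)            # 615
print(round(t, 2))   # close to, but not exactly, the published 2.28
print(round(se, 3))  # standard error of the difference in means, litres
```

The discrepancy from the published statistic is of the size one would expect from rounding the group means to two decimal places.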
LOCF is a popular method for handling missing data. The above example illustrates its simplicity, and it can be argued that much of its popularity is due to this. We now consider whether it is a sensible method.
Two principles emerged in Chapter 1. First, when a patient withdraws, we can rarely hope to recover their missing values. Second, suppose we assume the withdrawn patient’s missing data are MAR. Then, suppose we can find a group of patients whose members, prior to the patient withdrawing, share similar responses to the patient who withdrew. Then, at least under the per-protocol hypothesis (§1.8.2), the subsequent responses of this group give an estimate of the likely distribution of the withdrawn patient’s missing responses (e.g. Figure 2.2, left panel).
LOCF generally goes against both these principles. It imputes a single value for each missing response. The subsequent analysis gives these imputed responses the same status as actual observed responses. This is unsatisfactory, as a single value is being used as an estimate of a distribution. This can only be generally correct in the extremely implausible event that the distribution is degenerate¹. Such a degenerate distribution will never be implied by the multivariate normal distribution, or any standard distributions.
[Figure: x-axis, years since randomisation (0.5 to 3.0); y-axis, FEV1 (litres) (1.20 to 1.50); lines show the mean of observed data and the mean based on LOCF, for the Active and Placebo arms]
Figure 2.1: Isolde trial: mean FEV1 (litres) at each follow-up visit, by treatment arm. Solid line, means calculated using all available data at each visit. Broken line, means calculated after imputing missing data using LOCF. Note that 134 patients with no readings after baseline are omitted
At best, estimating a distribution by a single value potentially underestimates its variance². This explains why LOCF analyses for the per-protocol hypothesis may underestimate the information lost due to missing data, resulting in standard errors that are too small and confidence intervals that are too narrow.³ Further, under the per-protocol hypothesis, suppose the withdrawn patient’s data are approximately MAR. Then the group of patients who complete, but who share similar characteristics and responses to this patient prior to withdrawal, will usually give a better estimate of the distribution of the missing values than the last response before the patient withdrew. Yet LOCF ignores this information. Thus LOCF is likely to give biased imputations for the missing data, leading in turn to biased estimates of treatment effect.
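To make the contrast concrete, the following sketch compares the single value that LOCF would impute with the final-visit responses of completers who were similar to the withdrawn patient at the time of withdrawal. All data here are hypothetical, as are the function name `similar_completers_final` and the matching tolerance `tol`. This is a crude nearest-neighbour illustration of the idea under MAR, not one of the likelihood based methods discussed later.

```python
import statistics

# Hypothetical completers: (response at visit 1, response at final visit), litres.
completers = [
    (1.0, 0.8), (1.1, 1.0), (1.2, 0.9), (1.2, 1.1),
    (1.3, 1.0), (1.3, 1.2), (1.4, 1.1), (1.9, 1.7),
]


def similar_completers_final(last_obs, completers, tol=0.15):
    """Final-visit responses of completers whose visit-1 response
    lies within `tol` litres of the withdrawn patient's last observation."""
    return [final for first, final in completers if abs(first - last_obs) <= tol]


last_obs = 1.2  # hypothetical withdrawn patient, last seen at visit 1
donors = similar_completers_final(last_obs, completers)

print(donors)                               # [1.0, 0.9, 1.1, 1.0, 1.2]
print(round(statistics.mean(donors), 2))    # 1.04, below the LOCF value of 1.2
print(round(statistics.stdev(donors), 3))   # positive spread; LOCF implies none
```

In these made-up data the similar completers decline on average, so the carried-forward value of 1.2 litres is both too high and, being a single number, conveys none of the spread in the plausible final-visit responses.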
On the other hand, if we focus on the ITT analysis, and believe the distribution around the marginal (i.e. treatment group) mean stays the same for patients who withdraw, we should perform a ‘principled LOCF’ and ‘carry forward’ this distribution, not the last observation. The one exception is if we are prepared to accept that for each patient who drops out, before their last observation, their condition had stabilised, so that the distribution of their responses does not change at all for the remainder of the study⁴. Under this strong assumption, the patient’s last observation is a genuine observation from their stable response distribution, and could have equally been seen just before withdrawal as at the end of the study. We can therefore use this last observation as the patient’s response in the cross-sectional analysis of treatment effect at the end of the trial follow-up. However, this corresponds to a very counter-intuitive missingness mechanism. Indeed it is hard to think of why patients would withdraw after they had stabilised unless either the protocol were very demanding or they had no expectation that their condition would change whether they were in or out of the trial. We reiterate that in most settings this is very implausible. Patients often change intervention when they withdraw from a trial (the desire to do this may well trigger withdrawal) in the hope of getting a better response.
¹ A probability distribution which says a single particular value is certain to occur is termed degenerate. With missing data, all we can estimate is the distribution of the missing data given the observed data, under certain assumptions. Imputing a single, worst/best value, usually therefore implicitly assumes a very implausible degenerate distribution for the missing data given the observed data.
² As response variability usually increases over time.
³ Some have described hypotheses where the LOCF analysis has the correct size (Shao and Zhong, 2003), but
It is sometimes suggested that, where the focus is on estimated treatment differences at the end of the follow-up, because ITT analyses ‘need’ a response from each randomised patient, LOCF is appropriate. We disagree. As discussed in §1.8.3, when patients withdraw, and almost certainly change their intervention regime, an ITT analysis needs to estimate the distribution of their unseen response at the end of the trial under this new regime. It is highly implausible that this distribution is adequately represented by their last observation.
Defenders of LOCF sometimes argue that it leads to conservative estimates of treatment effects. However, it is easy to show that this cannot be true in general (Molenberghs et al., 2004). Rather, the direction of the bias depends on the (unknown) true treatment effect and the missing value mechanism. In general, LOCF is biased even when a complete case analysis is sensible (Molenberghs et al., 2004). If investigators or regulators have strong prior beliefs about the relationship between missing and observed responses, the correct way to allow for these is through a sensitivity analysis, examples of which we discuss in Chapter 6.
EXAMPLE 2.2 LOCF is not sensible when data are MCAR
Consider a hypothetical study where we have a placebo and an active treatment group, both with 100 patients. At the first post-randomisation visit, both groups have a mean FEV1 of 1.2 litres. At the second, and final, visit, the true mean in the active treatment group is 1.5 litres, but that in the placebo arm remains 1.2 litres. However, suppose that 50 of the patients in the active arm withdrew, completely at random.
A complete case analysis is sensible. The mean for the active group, estimated from the 50 patients who complete, is around 1.5 litres; that in the placebo group around 1.2 litres. The estimated treatment effect is 0.3 litres.
Now consider the LOCF analysis. In the active group, we observe 50 patients, with a mean FEV1 of around 1.5 litres. However, LOCF carries forward the first visit responses of the 50 who withdrew. These are around 1.2 litres. So the LOCF average response at the final visit is around (50 × 1.2 + 50 × 1.5)/100 = 1.35 litres. As no patients drop out of the placebo arm, the mean response at the final visit is the same under LOCF, around 1.2 litres. So the estimated treatment effect is 0.15 litres.
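The arithmetic of this example can be checked with a few lines of Python. The helper name `locf_mean` is our own; all the numbers are those given in the example.

```python
def locf_mean(n_complete, mean_final, n_withdrawn, mean_last_obs):
    """Arm mean at the final visit after LOCF: a weighted average of the
    completers' final-visit mean and the withdrawers' carried-forward mean."""
    total = n_complete * mean_final + n_withdrawn * mean_last_obs
    return total / (n_complete + n_withdrawn)


placebo_mean = 1.2                           # no withdrawals in the placebo arm
active_cc_mean = 1.5                         # the 50 completers in the active arm
active_locf_mean = locf_mean(50, 1.5, 50, 1.2)

effect_cc = active_cc_mean - placebo_mean
effect_locf = active_locf_mean - placebo_mean

print(round(active_locf_mean, 2))  # 1.35
print(round(effect_cc, 2))         # 0.3
print(round(effect_locf, 2))       # 0.15
```

With half of the active arm withdrawing, LOCF halves the estimated treatment effect; more generally, in this setting the attenuation is proportional to the fraction of the arm whose first-visit value is carried forward.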
Thus LOCF is not sensible, even when data are MCAR. Further, it is hard to see how the LOCF analysis is a meaningful sensitivity analysis to the complete case analysis. ¤
⁴ Such an assumption of exchangeability is rarely appropriate for longitudinal data, irrespective of any missing
In fact, as pointed out by Heyting et al. (1992), it is possible to see from the observed data whether LOCF is plausible. We first make the assumption that responses are not missing due to a MNAR missingness mechanism which is totally unrelated to prior responses. In other words, we assume that the missingness mechanism is not too far from MAR. Now consider a group of patients with similar measurements. At each observation time, a proportion of them withdraw. As the missingness mechanism is approximately MAR, the unseen responses of these patients are distributed roughly according to the observed responses of patients in the group who have not yet withdrawn. Figure 2.2 illustrates this graphically. Only in the right panel, where individual patients’ responses are virtually constant, is the LOCF assumption plausible. However, in both panels of Figure 2.2 a MAR analysis is sensible. That one may use LOCF as a “poor man’s” MAR analysis in situations like the right panel is not sufficient to justify it, although it is the probable source of occasional anecdotes that LOCF tends to agree with MAR analysis. In this case, though, it would be addressing the per-protocol, not the ITT, hypothesis!
Note too that it does not follow that if the mean profile is approximately constant, individual patient profiles are approximately constant. In real life, approximately constant individual patient profiles are rarely seen.
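A small numerical illustration of this last point, using made-up profiles for three patients who remain under observation: the mean profile is flat, yet two of the three individual profiles change steadily. The summary `mean_abs_change` is our own informal choice of measure, not a formal test.

```python
# Hypothetical responses at three successive visits for patients still observed.
profiles = [
    [1.0, 1.2, 1.4],  # steadily improving
    [1.4, 1.2, 1.0],  # steadily declining
    [1.2, 1.2, 1.2],  # constant
]

# Mean response at each visit (the mean profile).
mean_profile = [sum(visit) / len(visit) for visit in zip(*profiles)]

# Absolute within-patient change between each pair of consecutive visits.
changes = [abs(later - earlier)
           for profile in profiles
           for earlier, later in zip(profile, profile[1:])]
mean_abs_change = sum(changes) / len(changes)

print([round(m, 2) for m in mean_profile])  # [1.2, 1.2, 1.2]
print(round(mean_abs_change, 3))            # 0.133
```

A flat mean profile is therefore no evidence for the LOCF assumption; it is the within-patient changes among those who remain in the study that bear on its plausibility.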
[Figure: two panels, each with x-axis Time and y-axis Response]
Figure 2.2: Panels show a group of patients with similar responses (dashed lines), one of whom (solid line) drops out. In the left panel, the group responses suggest the LOCF assumption is false. In the right panel, the group responses suggest it is less implausible
Another point sometimes made in favour of LOCF is that if there is no treatment effect it preserves the ‘Type I error’ (i.e. the chance of finding a statistically significant treatment effect when none in fact exists) at 5%. Although in a limited sense this is true (if both groups have identical distributions of response and withdrawal), the problem lies rather under the alternative hypothesis. There are many possible patterns of treatment effect and withdrawal for which the power of the LOCF test is the same as the test size. Further, merely maintaining the test size is not sufficient to justify a test procedure: if it were we could use the throw of a 20-sided die to calculate a test statistic with perfect nominal 5% type I error! Clearly we also need to consider the behaviour of the statistic under the range of alternative hypotheses. In this the LOCF test falls down badly, as it is unable to detect a wide range of actual treatment effects (Carpenter et al., 2004). Further, in situations where the treatment effect can be detected by the LOCF procedure, the likelihood based analyses described in Chapter 3 will usually be more powerful.
In summary, if we really wish to ‘carry forward’ information after withdrawal, then the appropriate distribution should be carried forward, not the observation. This is not difficult to do, and will give valid inference much more generally than carrying forward the last observation. While one could attempt to delineate specific circumstances where LOCF may perform reasonably, as we are writing generally (as this seems the best way of being relevant to most analyses) we do not attempt to do this.
As LOCF is neither valid under general assumptions nor based on statistical principles, it is not a sensible method, and should not be used. It is therefore unfortunate that Wood et al. (2004) found that LOCF is commonly used as a sensitivity analysis when the principal analysis is complete cases. In effect, LOCF is actually just an analysis of each patient’s last observed value (so-called Last Observation Analysis, LOA). If LOA is really of interest then by definition the last observed measurement needs to be analysed, but in this setting it is equally obvious that the time to this event must also be relevant, yet this is almost never considered in such analyses. When estimating treatment effects at the end of a trial, though, LOA is not useful, as it may well reflect misleading transient effects. Although there has been some confusion on this point (Shao and Zhong, 2003), seeing LOCF in this light helps expose its lack of credibility (Carpenter et al., 2004). It is definitely not a sensitivity analysis in the sense described in Chapters 1 and 6. Lastly, Lavori (1992) comprehensively refutes LOCF in the context of psychiatry, and Pocock (1996) reinforces Heyting et al. (1992), noting ‘it is doubtful whether this [LOCF] actually answers a scientifically relevant question’.