In general, missing data within a trial can be categorised as Missing Completely At Random (MCAR), Missing At Random (MAR) or Missing Not At Random (MNAR) [61].
2.2.1 Missing Completely At Random
Data are Missing Completely At Random (MCAR) if the missingness indicator is unrelated to any inference that can be drawn from the dataset [59], [62]. In this case, the probability of a value being missing is unrelated to both the observed values and the values that the subject would have recorded had the data been available. Examples include measurements being lost by a clinician, or an equipment failure leading to a patient outcome not being recorded [63]. In both instances, the patient's status is irrelevant when considering the probability of missingness. Formally, let $r$ be the missingness indicator and let $y_{\mathrm{obs}}$ and $y_{\mathrm{mis}}$ be the observed and missing values within a dataset respectively. In this thesis, $y_{\mathrm{obs}}$ is defined as the longitudinal values for each patient pre-dropout, and $y_{\mathrm{mis}}$ as the potential longitudinal readings post-dropout. MCAR data are defined by [64]
\[ \Pr(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}) = \Pr(r). \]
Under MCAR, the average intervention effect estimated from the observed values alone is the same as the effect that would be estimated if no data were missing. However, restricting the analysis to the observed values may result in a significant loss of information and wider confidence intervals if the percentage of missing data is high.
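The behaviour described above can be illustrated with a small simulation. The following is a minimal sketch, not an analysis from this thesis: the two-arm design, the sample size, the outcome distribution, the value of `true_effect` and the 40% missingness rate are all hypothetical choices made for illustration.

```python
import random
import statistics


def effect_and_se(treated, control):
    """Difference in arm means and its standard error (unequal variances)."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return diff, se


rng = random.Random(2024)   # fixed seed so the sketch is reproducible
n_per_arm = 5000            # hypothetical trial size
true_effect = 2.0           # hypothetical intervention effect
p_missing = 0.4             # hypothetical proportion of missing outcomes

control = [rng.gauss(10.0, 3.0) for _ in range(n_per_arm)]
treated = [rng.gauss(10.0 + true_effect, 3.0) for _ in range(n_per_arm)]

# MCAR: every outcome is dropped with the same probability,
# regardless of its value or of the arm it belongs to.
control_obs = [y for y in control if rng.random() >= p_missing]
treated_obs = [y for y in treated if rng.random() >= p_missing]

full_effect, full_se = effect_and_se(treated, control)
obs_effect, obs_se = effect_and_se(treated_obs, control_obs)

print(f"full data:      effect = {full_effect:.3f}, SE = {full_se:.3f}")
print(f"observed only:  effect = {obs_effect:.3f}, SE = {obs_se:.3f}")
```

In this sketch the two effect estimates should agree up to sampling noise, while the standard error from the observed values alone is larger because fewer patients contribute to it.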
In practice it is uncommon for data to be MCAR [65], and it is difficult to determine whether data are MCAR after a patient has dropped out. Simple t-tests can be used to test for a relationship between the values of observed covariates and subject missingness. However, if no relationship is found between any covariate value and the completion of the measurement schedule by a subject, this only confirms that the data have missingness properties consistent with MCAR, which alone is not sufficient to assume MCAR data [4]. This difficulty in establishing the MCAR mechanism highlights the importance of encouraging clinicians to record the reasons behind each patient's missing data.
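The t-test check described above can be sketched as follows. This is an illustrative example only: it uses Welch's unequal-variance form of the t-test, which is one possible choice of "simple t-test", and the helper `welch_t` and the baseline ages are hypothetical rather than taken from any trial.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = statistics.variance(a) / len(a)   # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)   # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical baseline ages, split by whether the patient completed
# the measurement schedule or dropped out.
age_completers = [54, 61, 58, 49, 66, 57, 63, 52, 60, 55]
age_dropouts = [59, 64, 71, 56, 68, 62]

t_stat, df = welch_t(age_completers, age_dropouts)
print(f"t = {t_stat:.2f} on approximately {df:.1f} degrees of freedom")
```

A large absolute t statistic would suggest that missingness is related to the covariate, which is evidence against MCAR. A small one is only consistent with MCAR and does not establish it. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns a p-value.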
2.2.2 Missing At Random
When the missingness mechanism is not MCAR, data may in some cases be classified as Missing At Random (MAR). In MAR data, the missingness depends on observed values within the dataset [59, 61]. These can be baseline variables, covariate values, or previous longitudinal measurements taken in the trial. For example, in some clinical trials patients with a worse prognostic factor at baseline are less likely to complete the measurement schedule, for reasons unrelated to the outcome of interest. This is an example of MAR data. In terms of a mathematical formula, this is written as [66]
\[ \Pr(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}) = \Pr(r \mid y_{\mathrm{obs}}). \]
In the case of MAR, the missingness does not depend on the unobserved outcome values themselves once the observed values are taken into account. However, vital information is lost by omitting the patients who had missing values [59], as the patients included in the analysis may have different prognostic outcomes from those who are omitted. In some cases, this can lead to patients with a specific set of prognostic profiles being omitted from the analysis.
2.2.3 Missing Not At Random
If missing data are not classifiable as MCAR or MAR, the data are Missing Not At Random (MNAR), also known as non-ignorable. In the MNAR mechanism, the missingness depends not only on the observed covariate values, but also on the unobserved values themselves [59, 61]. An example of a trial with MNAR data could be one which is designed to analyse the percentage of patients that have given up smoking. At the end of the trial, those patients who have stopped smoking are more likely to disclose the result of the trial than those who have failed to quit smoking. In a longitudinal framework, when data are MNAR there is a link between the values post-dropout and the patient dropping out. In terms of a mathematical formula, the missingness probability cannot be simplified and remains dependent on the missing values [67]:
\[ \Pr(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}) = \Pr(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}). \]
For a trial with MNAR data, some standard methods of handling missing data may be inefficient, and more sophisticated methods of analysis need to be employed, as a simple analysis may fail to compensate for important prognostic details [4].
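The practical difference between the MAR and MNAR mechanisms can be illustrated with a short simulation. This is a minimal sketch under stated assumptions, not a method used in this thesis: the binary baseline prognostic factor `x`, the outcome model, the cut-off of 8.0 and the dropout probabilities are all hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical trial: x is an observed baseline prognostic factor
# (1 = poor prognosis) and y is the outcome, which may go unrecorded.
patients = []
for _ in range(20000):
    x = 1 if rng.random() < 0.5 else 0
    patients.append((x, rng.gauss(10.0 - 3.0 * x, 2.0)))


def p_missing(mechanism, x, y):
    """Probability that y is missing under the named mechanism."""
    if mechanism == "MAR":    # depends only on the observed factor x
        return 0.6 if x == 1 else 0.1
    if mechanism == "MNAR":   # depends on the outcome value itself
        return 0.6 if y < 8.0 else 0.1
    raise ValueError(mechanism)


def mean_y(rows, stratum=None):
    """Mean outcome, optionally restricted to patients with x == stratum."""
    return statistics.fmean(
        [y for x, y in rows if stratum is None or x == stratum])


full_overall = mean_y(patients)
full_poor = mean_y(patients, stratum=1)

results = {}
for mechanism in ("MAR", "MNAR"):
    observed = [(x, y) for x, y in patients
                if rng.random() >= p_missing(mechanism, x, y)]
    results[mechanism] = {"overall": mean_y(observed),
                          "poor": mean_y(observed, stratum=1)}

print(f"full data: overall = {full_overall:.2f}, poor prognosis = {full_poor:.2f}")
for mechanism, means in results.items():
    print(f"{mechanism}: overall = {means['overall']:.2f}, "
          f"poor prognosis = {means['poor']:.2f}")
```

In this sketch the overall mean of the observed outcomes is distorted under both mechanisms. Under MAR, the mean within the poor-prognosis stratum should still agree with the full data, because missingness depends only on the observed factor. Under MNAR, the distortion remains even within the stratum, which is why conditioning on observed values alone does not correct it.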
This diversity of potential missingness mechanisms further emphasises the importance of good trial design. While many techniques for handling missing data have been established in recent years, one of the initial aims within a trial should be to ensure that the amount of missing data is minimised [60]. Furthermore, in many cases it is difficult to classify the missingness within a trial, although this can be estimated with higher levels of clinical information [23]. To avoid mishandling missing data, the techniques to be used should be discussed when the trial protocol is being constructed, as this will lead to a more transparent statistical analysis.