The report “The Prevention and Treatment of Missing Data in Clinical Trials”, released by the National Research Council (NRC) of the National Academy of Sciences and commissioned by the US Food and Drug Administration (FDA) (NRC, 2010), had an immediate effect on the way in which statisticians and clinical researchers in both industry and regulatory agencies think about the missing data problem (Mallinckrodt, Roger, Chuang-Stein, Molenberghs, O’Kelly, Ratitch, Janssens and Bunouf, 2013; Carpenter et al., 2013; Ratitch et al., 2013; Permutt, 2015, 2016). The report outlined recommendations for the design, conduct and analysis of clinical trials that have great potential to improve study quality and the interpretability of results. This can be achieved by reducing the amount of missing data through changes in trial design and conduct, and by planning and conducting analyses that better account for the missing information. When data are missing, the validity of any method of analysis depends on the scientific question addressed in a given clinical trial, because trials differ in both their settings and the questions they address. In this thesis, we describe some of the recommendations in the report and discuss how they are addressed using the CD4 count data from the IMPI trial (Mayosi et al., 2012, 2014).
Randomized clinical trials are the recommended tool for evaluating the effect of new medical interventions. Randomization provides a balanced comparison between treatment and control groups by balancing, on average, the distributions of known and unknown factors among trial participants (NRC, 2010). However, a substantial percentage of the measurements of the outcome or outcomes of interest is often missing. This missingness reduces the benefit provided by randomization and introduces potential biases in the comparison of the treatment groups (NRC, 2010).
Missing data can arise for a variety of reasons, including the inability or unwillingness of participants to meet appointments for evaluation. Other reasons include adverse effects of treatment or a move to partial compliance with treatment (NRC, 2010; Carpenter et al., 2013). The NRC panel noted that the existing guidelines for the design and conduct of clinical trials, and for the analysis of the resulting data, provide only limited advice on how to handle missing data. Hence, approaches to the analysis of data with a substantial number of missing values tend to be ad hoc and variable. Consequently, the panel concluded that a more principled approach to design and analysis in the presence of missing data is both needed and possible. The panel noted that this approach needs to focus on two critical elements. The first is to carefully design and conduct trials to limit the amount and impact of missing data. This requires a trial design that clearly defines the target population and the outcomes that will form the basis for decisions about efficacy and safety. The treatment of missing data depends on how these outcomes are defined, and a lack of clarity in their definition translates into a lack of clarity about how to deal with missing data issues.
Given the difficulties of adequately addressing missing data at the analysis stage, the design process needs to pay more attention to the potential hazards arising from substantial numbers of missing values. Recommendation 2 of the NRC panel states that “investigators, sponsors, and regulators should design clinical trials consistent with the goal of maximizing the number of participants who are maintained on the protocol-specified intervention until the outcome data are collected”. The second element is to conduct analyses that make full use of the information on all randomized participants and that pay careful attention to the assumptions about the nature of the missing data underlying the estimates of treatment effects.
Three of the NRC panel’s recommendations are key: (1) setting precise and clear objectives for the trial, (2) minimizing the amount of missing data, and (3) using a plausible primary analysis, together with sensitivity analyses, that supports the research hypotheses to be addressed and allows the sensitivity of the primary results to the missing data assumptions to be assessed. The NRC (2010) recommendations have received much attention. In particular, clinical practitioners, academics and regulators now require drug development groups to be guided by these recommendations when proposing and implementing plans to deal with missing data. Although the NRC panel’s work focused primarily on Phase III confirmatory clinical trials that are the basis for the approval of drugs and devices, many of the panel’s recommendations can be applied to all randomized trials (NRC, 2010, p. 1).
We discuss the three key recommendations of the NRC panel and then point out how these recommendations are applied in the IMPI trial and in trials with similar settings.
2.3.1 Establish clear and precise objectives of the trial
The NRC (2010) panel recommends setting out clear and precise objectives for the trial. This is a measure to avoid ambiguities in conclusions caused by missing data. The data may be missing intermittently or missing because of dropout. Depending on the trial’s objectives, patients who drop out may or may not be given standard of care afterwards. Likewise, the assessment of dropout from the initially randomized treatment, or of the introduction of a standard of care, may or may not be considered in the primary analysis. The primary analysis (as specified in the statistical analysis plan) addresses the main objective of the study. It is important to be aware that dropout analysis occurs only when patients deviate from the initially randomized treatment (and either discontinue treatment or switch to standard of care) and observations are made but not used in the analysis.
Post-deviation data are data obtained for subjects after dropout, and their use largely depends on the estimand. Within a single study, some analyses may use follow-up data while others may not. The debate on appropriate estimands centres on whether the focus is on efficacy or effectiveness (Carpenter et al., 2013; Mallinckrodt, Lin and Molenberghs, 2013; Ayele et al., 2014). For instance, if one is interested in the difference in outcome improvement at the planned end period for all randomized patients, post-deviation data can be used in the primary analysis. In this scenario, the hypothesis to address is the effectiveness estimand hypothesis. This is because the effectiveness estimand compares treatment groups irrespective of what treatment patients received, and thus inference is on the effectiveness of the treatment regimen and not of the originally randomized treatment. On the other hand, if one is interested in the difference in outcome improvement assuming that all patients adhered to treatment, then post-deviation data cannot be used in the primary analysis. In this scenario, the efficacy estimand is of interest, since it compares the causal effects of the initially randomized treatment if taken as directed.
Furthermore, if one is interested in the difference in outcome improvement in all randomized patients at the planned end period of the trial attributable to the initially randomized treatment, post-deviation or imputed data can be used. In this case the hypothesis to address is the effectiveness hypothesis. Whenever post-deviation data are needed, control imputation analyses may be used as a means to obtain the follow-up data needed to estimate effectiveness (Carpenter et al., 2013; Mallinckrodt, Lin and Molenberghs, 2013; Ayele et al., 2014).
As the data description in Section 2.6 shows, the hypothesis we address is the effectiveness hypothesis, because the trial aims at outcome improvement at the planned end period for all randomized patients.
2.3.2 Minimizing the amount of missing data
The best approach to missing data is to prevent them from occurring. However, missing data are often unavoidable, since missingness is often not under the control of the researcher, especially in studies involving human subjects. The development of new analytic methods and software tools for analyzing incomplete data has been an active area of research, and considerable progress has been made in that regard. However, all analyses are still challenged by the difficult problem of analyzing incomplete data (NRC, 2010). For instance, all analyses require assumptions about the missing data; these assumptions cannot be verified from the data, and so the appropriateness of the analyses and inferences cannot be ensured.
When considering design options that minimize missing data, the influence that these options may have on other aspects of the trial must be considered. According to NRC (2010), some of the trial design options that one could consider include enrolling a target subpopulation for whom the risk-benefit of the treatment is more favorable, or identifying such subgroups during the course of the trial via enrichment or run-in designs. However, examples where this has been done successfully in the context of lowering rates of dropout are rare. Other design options in the NRC guidelines include the use of add-on designs (Chow and Liu, 2008) and flexible dosing (Mallinckrodt, Roger, Chuang-Stein, Molenberghs, O’Kelly, Ratitch, Janssens and Bunouf, 2013). Other measures include minimizing patient burden through efficient data capture procedures, providing education on the importance of complete data, and monitoring and providing incentives for complete data.
When missing data cannot be avoided, the missing data methods discussed in Section 2.4 account for them in the statistical analysis. The validity of these methods depends on the objective of the trial, as stated under recommendation 2.3.1. In line with that recommendation, and in the context of the IMPI clinical trial, the absence of post-deviation data can be compensated for by using methods that account for, or estimate the distribution of, the missing values before data analysis.
2.3.3 Appropriate primary and sensitivity analyses
Among the key NRC recommendations is the use of an appropriate primary analysis, together with sensitivity analyses that assess the sensitivity of the primary analysis results to key assumptions about the missing data. Both primary and sensitivity analyses are necessary because, despite all efforts to minimize missing data, anticipating complete data is not realistic. In order to decide on an appropriate primary analysis, the process generating the missing data must be taken into account in the statistical analysis. In other words, one must consider the missing data mechanisms (MCAR, MAR and NMAR) when choosing the primary analysis (Carpenter et al., 2013).
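The three mechanisms can be written compactly. The notation below follows a common convention and is an assumption of this sketch rather than notation taken from the text above: $Y = (Y^{o}, Y^{m})$ splits the outcomes into their observed and missing parts, $R$ is the vector of missingness indicators, $X$ denotes fully observed covariates, and $\psi$ parameterizes the missingness process.

```latex
\begin{align*}
\text{MCAR:}\quad & f(r \mid y^{o}, y^{m}, x, \psi) = f(r \mid x, \psi), \\
\text{MAR:}\quad  & f(r \mid y^{o}, y^{m}, x, \psi) = f(r \mid y^{o}, x, \psi), \\
\text{NMAR:}\quad & f(r \mid y^{o}, y^{m}, x, \psi) \ \text{depends on } y^{m}
                    \text{ even after conditioning on } y^{o} \text{ and } x.
\end{align*}
```

Under MAR, and provided the parameters of the outcome model and of the missingness model are distinct, likelihood-based inference about the outcome model can ignore the missingness model; under NMAR it cannot.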
Statistical analysis methods such as maximum likelihood (Dempster et al., 1977; Harville, 1977), multiple imputation (Lavori et al., 1995; Rubin, 1996; Carpenter and Goldstein, 2004; Ayele et al., 2014), Bayesian methods (Harville, 1974; Daniels and Hogan, 2008), and methods based on weighted generalized estimating equations (Robins et al., 1995; Robins and Rotnitzky, 1995; Scharfstein et al., 1999; Seaman and White, 2013) can reduce the potential bias arising from missing data by making principled use of auxiliary information available for nonrespondents. These methods assume that the data are MAR. The NRC panel encourages increased use of these methods. However, they rely on untestable assumptions concerning the factors leading to the missing values and how these factors relate to the study outcomes. Therefore, the assumptions underlying these methods need to be clearly communicated to medical experts so that they can assess their validity (Carpenter et al., 2013).
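To illustrate why MAR-based methods can reduce bias, the following is a minimal simulation sketch, not an analysis of the IMPI data. It assumes a single fully observed covariate `x`, an outcome `y` whose missingness depends only on `x`, and a simplified bootstrap-based multiple imputation pooled with Rubin's rules. The function names `simulate_mar` and `mi_mean`, the linear imputation model and all numerical settings are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_mar(n=5000, seed=0):
    """Simulate an outcome whose missingness depends only on an observed covariate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)                    # fully observed baseline covariate
    y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)    # outcome; true mean is 2.0
    p_miss = 1.0 / (1.0 + np.exp(-(x - 0.5)))      # missingness driven by x only (MAR)
    observed = rng.random(n) > p_miss
    y_obs = np.where(observed, y, np.nan)          # hide the missing outcomes
    return x, y_obs, observed


def mi_mean(x, y_obs, observed, m=20, seed=1):
    """Estimate the mean of y by bootstrap-based multiple imputation."""
    rng = np.random.default_rng(seed)
    xo, yo = x[observed], y_obs[observed]
    x_mis = x[~observed]
    n = x.size
    estimates, variances = [], []
    for _ in range(m):
        # Refit the imputation model on a bootstrap sample of the observed
        # cases so that each imputation reflects parameter uncertainty.
        idx = rng.integers(0, xo.size, xo.size)
        slope, intercept = np.polyfit(xo[idx], yo[idx], 1)
        resid = yo[idx] - (intercept + slope * xo[idx])
        sigma = resid.std(ddof=2)
        # Impute each missing outcome as prediction plus residual noise.
        y_imp = y_obs.copy()
        y_imp[~observed] = intercept + slope * x_mis + rng.normal(0.0, sigma, x_mis.size)
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / n)
    estimates = np.asarray(estimates)
    within = np.mean(variances)                    # average within-imputation variance
    between = estimates.var(ddof=1)                # between-imputation variance
    total_var = within + (1.0 + 1.0 / m) * between # Rubin's rules
    return estimates.mean(), np.sqrt(total_var)


if __name__ == "__main__":
    x, y_obs, observed = simulate_mar()
    print("complete-case mean:", y_obs[observed].mean())
    print("MI mean and SE:", mi_mean(x, y_obs, observed))
```

In this setup the complete-case mean is pulled away from the true value of 2.0 because subjects with larger `x` are more likely to be missing, whereas the imputation-based estimate uses `x` for the nonrespondents and should land close to 2.0. The result holds only because the simulated mechanism really is MAR given `x`.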
Sensitivity analyses are therefore important to assess the degree to which the estimated treatment effects rely on the assumptions made. The primary analysis approach must be chosen carefully, because the sensitivity analyses are formulated around it: they assess how the results of the primary analysis change under alternative assumptions about the missing data (Diggle and Kenward, 1994; Thijs et al., 2000; Verbeke et al., 2001; Thijs et al., 2002; Shen et al., 2006; Creemers et al., 2011; Mallinckrodt, Lin and Molenberghs, 2013; Carpenter et al., 2013).
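One simple way to picture a sensitivity analysis is a delta adjustment: impute under a MAR-type assumption, then shift the imputed values in the treatment arm by an amount `delta` and see how the estimated effect moves. The sketch below is an illustrative assumption, not the sensitivity analysis used in this thesis; it uses plain mean imputation and returns point estimates only, and the names `delta_adjusted_effect` and `tipping_point` are hypothetical.

```python
import numpy as np


def delta_adjusted_effect(y_treat, y_control, delta):
    """Treatment-minus-control mean difference after delta-adjusted mean imputation.

    Missing outcomes are coded as NaN. Control-arm gaps are filled with the
    observed control mean (a MAR-type assumption); treatment-arm gaps are
    filled with the observed treatment mean shifted by delta (a departure
    from MAR).
    """
    y_treat = np.asarray(y_treat, dtype=float)
    y_control = np.asarray(y_control, dtype=float)
    treat_filled = np.where(np.isnan(y_treat), np.nanmean(y_treat) + delta, y_treat)
    control_filled = np.where(np.isnan(y_control), np.nanmean(y_control), y_control)
    return treat_filled.mean() - control_filled.mean()


def tipping_point(y_treat, y_control, deltas):
    """Return the first delta in `deltas` at which the effect is <= 0, else None."""
    for delta in deltas:
        if delta_adjusted_effect(y_treat, y_control, delta) <= 0.0:
            return delta
    return None


if __name__ == "__main__":
    y_treat = [1.0, 2.0, 3.0, np.nan]
    y_control = [0.0, 1.0, np.nan, np.nan]
    for delta in (0.0, -2.0, -4.0, -8.0):
        print(delta, delta_adjusted_effect(y_treat, y_control, delta))
```

With mean imputation the adjusted effect equals the observed-data difference plus `delta` times the proportion of missing outcomes in the treatment arm, so the conclusion is fragile when a small, clinically plausible `delta` is enough to remove the effect. A full analysis would combine the shift with multiple imputation so that standard errors are also available.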
In this thesis, the primary analysis model assumes that the data are missing at random, and the sensitivity analyses are based on the selection model (Diggle and Kenward, 1994; Verbeke et al., 2001; Mallinckrodt, Roger, Chuang-Stein, Molenberghs, O’Kelly, Ratitch, Janssens and Bunouf, 2013), the pattern-mixture model (Wu and Bailey, 1989; Carpenter et al., 2013; Mallinckrodt, Lin and Molenberghs, 2013), and the shared-parameter model (Gao, 2004; Tsonaka et al., 2009; Creemers et al., 2010, 2011).
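These three model families differ in how they factorize the joint distribution of the outcomes $Y$ and the missingness indicators $R$. The display below is a sketch in generic notation assumed here for illustration: $X$ denotes covariates, $b$ shared random effects, and $\theta$ and $\psi$ parameter vectors.

```latex
\begin{align*}
\text{Selection:}\quad
  & f(y, r \mid x, \theta, \psi) = f(y \mid x, \theta)\, f(r \mid y, x, \psi), \\
\text{Pattern-mixture:}\quad
  & f(y, r \mid x, \theta, \psi) = f(y \mid r, x, \theta)\, f(r \mid x, \psi), \\
\text{Shared-parameter:}\quad
  & f(y, r \mid x, \theta, \psi) = \int f(y \mid b, x, \theta)\, f(r \mid b, x, \psi)\, f(b)\, \mathrm{d}b.
\end{align*}
```

The shared-parameter form shown assumes that $Y$ and $R$ are conditionally independent given $b$, which is the usual simplification.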