
3.8 Bias Correction Using the Bootstrap

3.8.2 Application in the One-Sample Multivariate Case

To investigate how the bootstrap would work in practice, we demonstrate it using the MUSE trial data. In this scenario N = 182 and, because of the computational cost, we choose n_boot = 1000. The procedure is as follows:

1. Sample with replacement N = 182 patients from the MUSE trial

2. Compute the treatment effect using the latent variable, augmented binary and standard binary methods

3. Repeat steps 1 and 2 n_boot = 1000 times

4. Obtain an estimate of the bias using the difference between the treatment effect in the MUSE trial and the mean of the bootstrap treatment effects

Table 3.28: Log-odds treatment effect estimates and 95% confidence intervals from the latent variable method, augmented binary method and standard binary method in the phase IIb MUSE trial and the bootstrap sample when N = 182 and n_boot = 1000

Method              MUSE trial estimate     Bootstrap estimate
Latent variable     0.641 (0.217, 1.072)    0.682 (0.275, 1.137)
Augmented binary    0.580 (0.139, 1.021)    0.608 (0.096, 1.111)
Binary              0.763 (0.078, 1.449)    0.809 (0.112, 1.561)

A 95% bootstrap confidence interval for the treatment effect can also be obtained by ordering the 1000 bootstrap estimates of the treatment effect and taking the 25th and 975th values (the percentile method). The point estimates and 95% confidence intervals from the MUSE trial and from the re-sampling are shown in Table 3.28.
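The resampling procedure (steps 1 to 4 plus the percentile interval) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the MUSE analysis: `log_odds_ratio` is a hypothetical stand-in estimator (an unadjusted log odds ratio on binary responder data) for the three model-based methods, the toy data are simulated rather than taken from MUSE, and the bias is taken as the mean of the replicates minus the original estimate, which is the usual bootstrap sign convention.

```python
import math
import random
import statistics


def log_odds_ratio(patients):
    """Unadjusted log odds ratio of response, treated vs control.

    `patients` is a list of (treated, responded) pairs coded 0/1. Hypothetical
    stand-in for the latent variable, augmented binary and binary estimators.
    """
    cells = {(t, r): 0 for t in (0, 1) for r in (0, 1)}
    for treated, responded in patients:
        cells[(treated, responded)] += 1
    # Assumes no empty cell (practically certain for N = 182 and moderate
    # response rates); a real analysis would need a zero-cell guard.
    return math.log(cells[(1, 1)] * cells[(0, 0)]) - math.log(
        cells[(1, 0)] * cells[(0, 1)]
    )


def bootstrap_bias_and_ci(patients, estimator, n_boot=1000, seed=1):
    """Bootstrap bias estimate and 95% percentile confidence interval."""
    rng = random.Random(seed)
    theta_hat = estimator(patients)
    replicates = []
    for _ in range(n_boot):  # step 3: repeat steps 1 and 2 n_boot times
        # Step 1: resample N patients with replacement.
        resample = rng.choices(patients, k=len(patients))
        # Step 2: re-estimate the treatment effect on the resample.
        replicates.append(estimator(resample))
    # Step 4: bias = mean of bootstrap estimates minus the original estimate.
    bias = statistics.fmean(replicates) - theta_hat
    # Percentile CI: the 25th and 975th ordered estimates when n_boot = 1000.
    ordered = sorted(replicates)
    lower = ordered[round(0.025 * n_boot) - 1]
    upper = ordered[round(0.975 * n_boot) - 1]
    return theta_hat, bias, (lower, upper)


# Simulated toy data only (not the MUSE trial): 91 patients per arm.
data_rng = random.Random(2024)
toy = [(1, int(data_rng.random() < 0.5)) for _ in range(91)] + [
    (0, int(data_rng.random() < 0.3)) for _ in range(91)
]
theta_hat, bias, (lower, upper) = bootstrap_bias_and_ci(toy, log_odds_ratio)
print(f"estimate={theta_hat:.3f} bias={bias:.3f} CI=({lower:.3f}, {upper:.3f})")
```

Because each replicate is independent of the others, the loop can in principle be split across many cores, which is what makes an HPC run over 1000 replicates feasible when each fit is expensive.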

The log-odds point estimate from the latent variable method has shifted away from the null by approximately 0.04 (from 0.641 to 0.682), which is the magnitude of bias that the simulation results suggested for a treatment effect of this size. The width of the confidence interval has remained approximately the same in the bootstrap sample, indicating that the variance is well estimated in the trial dataset. Ideally, we would investigate this further across a larger number of datasets; however, this is too computationally intensive. Performing the procedure on a single replicate with n_boot = 1000 currently takes 7 hours using 200 cores on the HPC.
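As a quick arithmetic check of these statements, the shift in each point estimate and the interval widths can be computed directly from Table 3.28. The per-fit cost line is a rough figure that assumes the 200 cores are fully occupied by the 1000 bootstrap fits for the whole 7 hours.

```python
# (estimate, lower, upper) transcribed from Table 3.28.
TABLE_3_28 = {
    "Latent variable": {"trial": (0.641, 0.217, 1.072), "bootstrap": (0.682, 0.275, 1.137)},
    "Augmented binary": {"trial": (0.580, 0.139, 1.021), "bootstrap": (0.608, 0.096, 1.111)},
    "Binary": {"trial": (0.763, 0.078, 1.449), "bootstrap": (0.809, 0.112, 1.561)},
}


def shift_and_widths(row):
    """Bootstrap minus trial estimate, and the trial and bootstrap CI widths."""
    est_t, lo_t, hi_t = row["trial"]
    est_b, lo_b, hi_b = row["bootstrap"]
    return est_b - est_t, hi_t - lo_t, hi_b - lo_b


summary = {method: shift_and_widths(row) for method, row in TABLE_3_28.items()}
for method, (shift, w_trial, w_boot) in summary.items():
    print(f"{method:<17} shift {shift:+.3f}  width {w_trial:.3f} -> {w_boot:.3f}")

# Rough cost per bootstrap fit in core-hours (assumes all cores busy throughout).
core_hours_per_fit = 7 * 200 / 1000
```

For the latent variable method this gives a shift of +0.041 and widths of 0.855 (trial) and 0.862 (bootstrap).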

Exploring this further through bootstrapping or employing alternative multivariate distributions is an area for future research.

3.9 Discussion

The work in this chapter aimed to address the large loss of information in modelling complex composite endpoints. One challenge in this work was determining an appropriate joint model for the components when these are measured on different scales. By partitioning latent variable outcome spaces, we were able to model the observed structure of the composite endpoint, which resulted in large gains in efficiency. These gains in efficiency were offset by the introduction of a small bias when the treatment effect is large. Sensitivity analyses showed that this bias is exacerbated when the assumption of joint normality is not satisfied; however, similar reductions in variance were observed. Application to the MUSE trial data reinforced the simulation findings: the treatment effect reported from the latent variable method was 2.5 times as precise as that reported from logistic regression and appeared to be biased towards the null.

Bias correction appears to perform well in the real data, where the crucial assumptions cannot be tested. The point estimate is shifted by the magnitude that the simulation results would lead us to expect, and the estimate of the variance is similar to that obtained in the single trial dataset. Furthermore, the latent variable bootstrap confidence interval for the treatment effect is contained within that for the binary method, which offers further reassurance for application. However, these results are not definitive, and more work could be done investigating different structures and scenarios to ensure that the bias correction always behaves as the simulation results suggest.

The potential precision gains offered by the latent variable method justify its additional complexity; however, the magnitude of these gains is highly dependent on which components drive response. The baseline case in the simulations was chosen to reflect the setting in which a composite endpoint is recommended for use, i.e. when all four components drive response. In this scenario the latent variable method reported the treatment effect 2.5 to 17.5 times more precisely than the standard binary method. In practice, however, this has not been found to be the case in SLE trials. A review of two phase III trials (N = 2262) using the SRI-5 index found that the SRI-5 response rate at week 52 for all patients was 32.8% [114]. Non-response due to a lack of SLEDAI improvement, concomitant medication non-compliance or dropout was 31%, 16.5% and 19.1%, respectively, whereas non-response due to deterioration in BILAG or PGA after SLEDAI improvement, concomitant medication compliance and trial completion was only 0.5%. This agrees with our findings from the MUSE trial data and suggests that the precision gains in the baseline case are optimistic. The simulation results show that when one continuous and one binary component drive response, the latent variable method may be anywhere between 1 and 12 times as precise; in a very small number of cases (<2%) there are no precision gains from its increased complexity. The potential gains available in the remaining 98% of cases make implementing the latent variable method a worthwhile endeavour for all stakeholders in a clinical trial.

Another useful metric in considering whether the method should be employed in practice is the mean squared error (MSE), as this combines bias and variance in a single measure. The simulation results show that the MSE of the reported treatment effect from the latent variable method (0.01 to 0.04) is always smaller than that of the standard binary method (0.06). Another important consideration is ethical. Having the skills to interpret these performance measures means that statisticians also have an ethical obligation when recommending methods for use. If the confidence interval has close to nominal coverage, it is important to consider whether an unbiased point estimate is crucial, especially when the required sample size may be reduced by more than 60%. Such a reduction would mean that fewer patients are given placebo and that effective drugs may reach the market sooner; it could also allow the randomisation ratio to be moved away from 1:1 without affecting power. We therefore recommend the latent variable method for use in practice in SLE trials. Should the method not be employed as the primary analysis method, it should at least be fitted as a secondary analysis to enhance understanding of the trial data.
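The link between a precision ratio and a sample size reduction of more than 60% can be made explicit with a short calculation. This sketch rests on assumptions not stated in the text: "precision" is read as inverse variance, the MUSE trial intervals in Table 3.28 are treated as symmetric Wald intervals (they are very nearly symmetric), and each estimator's variance is assumed to scale as 1/N, so a method R times as precise needs only a fraction 1/R of the sample size for the same precision. It is an illustration only, not the sample size method developed in Chapter 4.

```python
from statistics import NormalDist


def se_from_wald_ci(lower, upper, level=0.95):
    """Standard error implied by a symmetric Wald confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)


def precision_ratio(ci_reference, ci_new):
    """Variance of the reference estimate divided by that of the new one."""
    return (se_from_wald_ci(*ci_reference) / se_from_wald_ci(*ci_new)) ** 2


def sample_size_reduction(ratio):
    """Fraction of N saved at equal precision, assuming variance ~ 1/N."""
    return 1 - 1 / ratio


# MUSE trial 95% intervals from Table 3.28: binary vs latent variable method.
ratio = precision_ratio((0.078, 1.449), (0.217, 1.072))
print(f"precision ratio about {ratio:.2f}; "
      f"sample size reduction about {sample_size_reduction(ratio):.0%}")
```

With the MUSE intervals this gives a precision ratio of about 2.57 and an implied reduction of about 61%, in line with the "2.5 times as precise" and "60%+" figures quoted in the text.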

In addition to SLE, we have identified other disease areas that have a similar complex composite structure, meaning the potential to improve efficiency extends well beyond the SLE paradigm. However, it must be acknowledged that the exact structure of the endpoint may lead to different magnitudes of bias, precision and computational time. In addition, as we have coded the likelihood ourselves, with no generic package available to do this, the likelihood and probability-of-response code will have to be tailored to each endpoint. To promote implementation in the general case of multiple continuous and discrete outcomes, a software package will need to be developed. This is beyond the remit of this thesis but is an important consideration for future work.

Obtaining maximum likelihood estimates from latent variable models has been achieved in different ways throughout the literature. In this work we have used a quasi-Newton algorithm; however, quasi-Newton and Newton-type algorithms are not without their limitations, such as tending to be slow or intractable in higher dimensions [85]. The EM algorithm has been proposed in this setting, as it lends itself well to situations with unknown parameters such as the τ-thresholds; however, conditioning on these parameters as in (3.4) violates regularity conditions. Hence a Parameter-Expanded EM algorithm, which transforms the latent variables and expands the parameter space, may be more appropriate [115]. For an implementation of this estimation method when identifying genetic factors for comorbid conditions, see the work conducted by Zhang [96]. Implementing the method as we have done in this chapter is computationally demanding; however, we would not expect the Parameter-Expanded EM algorithm to rectify this, and it may actually increase computational time. More work is required to compare estimation methods for latent variable models in general.
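To illustrate what quasi-Newton maximum likelihood looks like for a latent variable model of mixed outcomes, the sketch below fits a deliberately small toy model with SciPy's `minimize(..., method="BFGS")`. It is not the likelihood in (3.4): it has one continuous component and one binary component obtained by dichotomising a correlated latent normal at zero, with no treatment effect, and all function and parameter names are hypothetical. Parameters are transformed (log σ1, atanh ρ) so the optimiser works on an unconstrained scale. It requires NumPy and SciPy.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_log_lik(params, y1, y2):
    """Negative log-likelihood of a toy mixed-outcome latent variable model.

    Y1 ~ N(mu1, sigma1^2) is observed; Y2 = 1 if latent Z2 > 0, where
    Z2 ~ N(mu2, 1) has correlation rho with Y1. Unconstrained parameters:
    (mu1, log sigma1, mu2, atanh rho).
    """
    mu1, log_sigma1, mu2, z_rho = params
    sigma1, rho = np.exp(log_sigma1), np.tanh(z_rho)
    std1 = (y1 - mu1) / sigma1
    # Log density of the continuous component.
    ll_cont = norm.logpdf(std1) - log_sigma1
    # Z2 | Y1 = y1 is normal with mean mu2 + rho * std1 and variance 1 - rho^2.
    cond = (mu2 + rho * std1) / np.sqrt(1.0 - rho**2)
    sign = np.where(y2 == 1, 1.0, -1.0)
    ll_bin = norm.logcdf(sign * cond)  # log P(Z2 > 0) or log P(Z2 <= 0)
    return -np.sum(ll_cont + ll_bin)


# Simulated toy data with known parameters (not trial data).
rng = np.random.default_rng(0)
truth = {"mu1": 1.0, "sigma1": 2.0, "mu2": 0.3, "rho": 0.5}
cov = [[truth["sigma1"] ** 2, truth["rho"] * truth["sigma1"]],
       [truth["rho"] * truth["sigma1"], 1.0]]
draws = rng.multivariate_normal([truth["mu1"], truth["mu2"]], cov, size=2000)
y1, y2 = draws[:, 0], (draws[:, 1] > 0).astype(int)

# BFGS is a quasi-Newton method; gradients are approximated numerically here.
fit = minimize(neg_log_lik, x0=np.zeros(4), args=(y1, y2), method="BFGS")
mu1_hat, sigma1_hat = fit.x[0], float(np.exp(fit.x[1]))
mu2_hat, rho_hat = fit.x[2], float(np.tanh(fit.x[3]))
print(f"mu1={mu1_hat:.2f} sigma1={sigma1_hat:.2f} mu2={mu2_hat:.2f} rho={rho_hat:.2f}")
```

BFGS with finite-difference gradients may stop with a "precision loss" message close to the optimum; the estimates are still usable, but in a real model analytic gradients would be faster and more reliable.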

The work in this chapter advocates the use of novel methodology to extract more of the available information from a complex composite endpoint. An obstacle to the uptake and implementation of this method is the lack of an existing method for performing a sample size calculation for a given trial. We explore this in the following chapter.

Chapter 4 Sample Size Estimation using the Latent Variable Model

4.1 Motivation

Sample size estimation plays an integral role in the design of a clinical trial. The objective is to determine the minimum sample size that is large enough to detect, with a specified power, a predetermined clinically meaningful treatment effect. Although it is crucial that investigators enrol enough patients to detect this effect, overestimating the sample size also has ethical and practical implications. In a placebo-controlled trial, more patients than necessary are assigned to the placebo arm, which withholds potentially beneficial drugs from them and delays access for future patients. Furthermore, it results in longer, more expensive trials that use resources which could be allocated elsewhere.

Mixed outcome components may be collapsed into a binary composite endpoint based on response thresholds, as we have seen previously. If a composite is selected as the primary endpoint in a trial, then a sample size calculation is needed, and this is typically based on the overall binary responder endpoint analysed using logistic regression. Sample size calculations performed in this way are valid, but when applying a novel analysis approach that increases power, such as the latent variable model in Chapter 3, it is desirable to have the option of taking this into account in the sample size calculation. If we can develop an approach to calculate the sample size using the latent variable method, then the potential efficiency gains are much more likely to be realised in practice.
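For reference, the conventional calculation described above, sizing a trial for the binary responder endpoint analysed by logistic regression, can be sketched with one common normal-approximation formula for a log odds ratio with equal allocation. The response rates, α and power used below are hypothetical design inputs, and this is not the latent variable calculation developed in this chapter.

```python
import math
from statistics import NormalDist


def n_per_arm_log_odds(p_control, p_treated, alpha=0.05, power=0.8):
    """Per-arm N to detect a log odds ratio with a two-sided Wald test.

    Uses Var(log OR) ~ [1/(p1*q1) + 1/(p0*q0)] / n with equal allocation,
    one common textbook approximation among several.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)  # desired power
    log_or = math.log(
        (p_treated / (1 - p_treated)) / (p_control / (1 - p_control))
    )
    var_unit = 1 / (p_treated * (1 - p_treated)) + 1 / (p_control * (1 - p_control))
    return math.ceil((z_alpha + z_beta) ** 2 * var_unit / log_or**2)


# Hypothetical design: 30% response on placebo vs 50% on treatment.
print(n_per_arm_log_odds(0.3, 0.5))
```

For these inputs the formula gives roughly 96 patients per arm; the aim of this chapter is an analogous calculation that reflects the extra precision of the latent variable analysis.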

In document Statistical Methods to Improve Efficiency in Composite Endpoint Analysis (Page 139-146)