Here a Bayesian design of the ViP trial with the inclusion of informative priors on the baseline hazard function is considered.
The target population for the trial is the same as for the three trials that are illustrated in Figure 6.1, with the GemCap trial in particular being administered in the same trials unit. ViP, in a similar fashion to the previous trials, also includes a control arm which is Gemcitabine alone. Furthermore, the data shown in Figure 6.1 were also used in the trial design, informing the sample size calculation.
To illustrate the effect of the informative baseline hazards, two approaches are taken. Firstly, a fully Bayesian sample size technique is followed, based on the sampling methodology of Wang and Gelfand [178] and De Santis [211]. Secondly, an approach is taken whereby the main efficacy parameter of interest is assumed fixed at pre-determined values. The purpose of this second approach is to obtain quantities similar to the frequentist Type I and Type II error rates for comparison with the initial trial design.
7.6.1 Bayesian sample size for ViP
The Average Length Criterion (ALC) is chosen as the utility function on which to base sample size calculations, and it is defined a priori that a posterior interval length of 0.6 with a coverage of 90% is of interest. Prior point estimates are obtained using the estimates given in Table 7.1 along with the survival estimates obtained from the GemCap trial.
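For reference, the ALC rule described above can be written out as follows. The notation is an assumption made for illustration and is not taken from the source: $x^{(n)}$ denotes a dataset of size $n$, $m(\cdot)$ the marginal distribution of the data under the design priors, and $\ell_{0.90}(x^{(n)})$ the length of the 90% posterior interval for the log hazard ratio.

```latex
% ALC: choose the smallest n whose average 90% posterior interval length
% does not exceed the target length of 0.6.
\[
  n_{\mathrm{ALC}}
  = \min\left\{ n \;:\;
      \int \ell_{0.90}\!\left(x^{(n)}\right)\, m\!\left(x^{(n)}\right)\, dx^{(n)}
      \;\le\; 0.6
    \right\}
\]
```

In practice the integral is approximated by averaging the interval length over datasets simulated from $m(\cdot)$.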
Data are sampled using the marginal distribution of the Bayesian PEM as shown in Section 6.2. Prior variability is defined using the effective number of events approach, with the effective number of prior events set to 10, 20, 30 and 50. Design priors are set from Normal distributions, using the most informative log baseline hazard priors (50 effective prior events) along with a prior distribution of N(log(0.6), 0.5) for the log hazard ratio.
This is chosen to replicate the initial design parameters of the ViP trial. For each sampled dataset, administrative censoring is applied to any survival time greater than 24 months. Patterns of censoring are obtained using the same methods as Section 3.4 and Section 6.4.
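The sampling step can be sketched as below. This is a simplified illustration and not the thesis implementation: it assumes a constant (exponential) hazard in place of the PEM, holds the baseline hazard fixed at a hypothetical value rather than sampling it from its prior, treats the 0.5 in the design prior as a standard deviation, and uses 1:1 allocation. The function name `simulate_trial` and all argument names are invented for this example.

```python
import math
import random


def simulate_trial(n_total, base_hazard, log_hr_mean, log_hr_sd, censor_time, rng):
    """Draw one dataset: sample the log hazard ratio from its design prior,
    then exponential survival times, then apply administrative censoring."""
    log_hr = rng.gauss(log_hr_mean, log_hr_sd)
    records = []
    for i in range(n_total):
        arm = i % 2  # 0 = control, 1 = experimental (assumed 1:1 allocation)
        hazard = base_hazard * math.exp(log_hr * arm)
        t = rng.expovariate(hazard)
        event = t <= censor_time  # False means administratively censored
        records.append((arm, min(t, censor_time), event))
    return log_hr, records


# Hypothetical inputs: median control survival of 6 months, censoring at 24 months.
rng = random.Random(1)
log_hr, records = simulate_trial(120, math.log(2) / 6, math.log(0.6), 0.5, 24.0, rng)
```

Each record holds the arm, the observed time and an event indicator, which is the minimum needed to fit a survival model to the simulated data.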
Data are simulated for total sample sizes of 60 to 150 in increments of ten. The resulting ALCs from each set of simulations, for each model, are shown in Figure 7.8, which gives the ALC estimates obtained at each sample size for uninformative priors and for informative priors based on the four effective event scenarios described above. The behaviour of the ALC depends on the prior distributions set. Including prior information improves the behaviour in all cases, although the improvement is negligible for the less informative of the locally flat priors. Normal priors consistently outperform the locally flat priors, passing the 0.6 threshold at smaller sample sizes.
[Figure 7.8: three panels titled "ALC; Normal Priors", "ALC; Step Priors" and "ALC; Trapezium Priors". x-axis: Total Sample Size (N), 60 to 140; y-axis: Interval length, 0.4 to 0.8.]
Figure 7.8: Performance of the ALC for normal, step and trapezium prior distributions
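The way an ALC curve of this kind is produced can be sketched with a Monte Carlo calculation. The code below swaps in a much simpler model than the Bayesian PEM used in the thesis, so it will not reproduce the reported values: it assumes exponential survival, a conjugate Gamma prior on the control hazard whose shape plays the role of the effective number of prior events, a vague Gamma prior on the experimental hazard, and an equal-tailed rather than highest-density interval. All names and default values are hypothetical.

```python
import math
import random


def interval_length(d0, t0, d1, t1, prior_events, prior_hazard, rng,
                    draws=1000, cover=0.90):
    """Equal-tailed posterior interval length for the log hazard ratio.

    Control hazard prior: Gamma(shape=prior_events, rate=prior_events/prior_hazard),
    i.e. worth `prior_events` events with prior mean `prior_hazard`.
    A small constant (0.01) keeps both posteriors proper when prior_events is 0.
    """
    a0, b0 = 0.01 + prior_events + d0, 0.01 + prior_events / prior_hazard + t0
    a1, b1 = 0.01 + d1, 0.01 + t1
    # random.gammavariate takes (shape, scale), so each rate is inverted.
    samples = sorted(
        math.log(rng.gammavariate(a1, 1.0 / b1))
        - math.log(rng.gammavariate(a0, 1.0 / b0))
        for _ in range(draws)
    )
    lower = samples[int((1 - cover) / 2 * draws)]
    upper = samples[int((1 + cover) / 2 * draws) - 1]
    return upper - lower


def average_length(n_total, prior_events, rng, n_sims=200,
                   base_hazard=math.log(2) / 6, censor_time=24.0):
    """Monte Carlo estimate of the ALC at a given total sample size."""
    total = 0.0
    for _ in range(n_sims):
        log_hr = rng.gauss(math.log(0.6), 0.5)  # draw from the design prior
        events, exposure = [0, 0], [0.0, 0.0]
        for i in range(n_total):
            arm = i % 2  # assumed 1:1 allocation
            x = rng.expovariate(base_hazard * math.exp(log_hr * arm))
            events[arm] += int(x <= censor_time)
            exposure[arm] += min(x, censor_time)
        total += interval_length(events[0], exposure[0], events[1], exposure[1],
                                 prior_events, base_hazard, rng)
    return total / n_sims
```

Evaluating `average_length(n, prior_events, random.Random(seed))` for `n` in `range(60, 151, 10)` and taking the first `n` at which the result falls to 0.6 or below gives a sample size in the spirit of the ALC, under the simplifying assumptions stated above.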
The results are given in Table 7.4. For the reference model, where the priors remain uninformative, 98 patients are required on average to ensure that an interval of length 0.6 will contain 90% of the posterior distribution. This is smaller than the 120 patients required for the frequentist design. The difference is in part due to the differing aims of the two methodologies: the Bayesian approach attempts only to control one aspect of the posterior distribution, whereas the frequentist approach attempts to control two types of error. Some disparity is also expected from the parameters chosen on which to base the Bayesian design.
Sample size estimates show that as more information enters the model through the priors, smaller numbers of patients are required to control the width of the posterior distribution. In the most extreme case, 74 patients are required to obtain an ALC of 0.6. Again, this effect is accentuated for normal priors compared to the locally flat alternatives. Considering the locally flat priors, the Step prior has a larger effect than the Trapezium prior.
Prior        Effective sample size
             10     20     30     50
Normal       92     86     78     74
Step         94     88     79     74
Trapezium    97     92     84     76
Table 7.4: Sample size estimates for the ViP trial under the ALC for different priors on the baseline hazard function.
Normal priors perform better than the locally flat priors in terms of the ALC. The locally flat priors may still be preferred in practice, as they may be easier to derive and can inform a trial design and analysis without setting a priori a point estimate for the most likely solution.
7.6.2 Bayesian type I and type II error rates
To evaluate Bayesian Type I and Type II error rates, a Successful Trial Criterion (STC) is utilised. In the context of the ViP trial, according to the initial design parameters, the trial is a success only if ≥ 90% of the posterior distribution of the log hazard ratio is less than zero. Two special conditions of the STC are considered, in which the design prior for the log hazard ratio, β, is fixed at 0 and δ respectively. Specifically, setting β = 0 and calculating the STC gives the Type I error rate, and setting β = δ gives the Type II error rate as one minus the STC. Whilst, from a Bayesian perspective, sampling from a distribution where the key parameter of interest is considered fixed is inappropriate, this method allows estimation of quantities analogous to the frequentist Type I and Type II error rates.
To ensure that reliable estimates of Type I and Type II errors are obtained, 2000 datasets are simulated following the same procedure as in Section 5.1 but with a fixed sample size of 120 patients to replicate the initial ViP design. The aim here is to show that the ‘error rates’ can be improved over the proposed design, as opposed to searching for a sample size based on controlling Type I or Type II error rates.
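The calculation of these ‘error rates’ can be sketched as follows. As a loudly stated assumption, the sketch replaces the Bayesian PEM with a conjugate Gamma-exponential model (exponential survival, a Gamma prior on the control hazard worth a given number of prior events, a vague Gamma prior on the experimental hazard), and it takes the alternative value of the log hazard ratio to be log(0.6), which the source does not state explicitly. Function names, the baseline hazard and the other defaults are hypothetical.

```python
import math
import random


def posterior_prob_benefit(d0, t0, d1, t1, prior_events, prior_hazard, rng, draws=500):
    """Monte Carlo estimate of P(log hazard ratio < 0 | data)."""
    a0, b0 = 0.01 + prior_events + d0, 0.01 + prior_events / prior_hazard + t0
    a1, b1 = 0.01 + d1, 0.01 + t1
    # random.gammavariate takes (shape, scale), so each rate is inverted.
    wins = sum(
        rng.gammavariate(a1, 1.0 / b1) < rng.gammavariate(a0, 1.0 / b0)
        for _ in range(draws)
    )
    return wins / draws


def success_rate(log_hr, prior_events, rng, n_total=120, n_sims=2000,
                 base_hazard=math.log(2) / 6, censor_time=24.0, threshold=0.90):
    """Proportion of simulated trials meeting the success criterion
    when the log hazard ratio is held fixed at `log_hr`."""
    successes = 0
    for _ in range(n_sims):
        events, exposure = [0, 0], [0.0, 0.0]
        for i in range(n_total):
            arm = i % 2  # assumed 1:1 allocation
            x = rng.expovariate(base_hazard * math.exp(log_hr * arm))
            events[arm] += int(x <= censor_time)
            exposure[arm] += min(x, censor_time)
        prob = posterior_prob_benefit(events[0], exposure[0], events[1], exposure[1],
                                      prior_events, base_hazard, rng)
        successes += int(prob >= threshold)
    return successes / n_sims


rng = random.Random(11)
type_1 = success_rate(0.0, 0.0, rng, n_sims=300)                   # fixed at no effect
type_2 = 1.0 - success_rate(math.log(0.6), 0.0, rng, n_sims=300)   # fixed at alternative
```

Repeating the two calls with `prior_events` set to 10, 20, 30 and 50 gives a rough analogue of the comparison across priors, although only for this simplified model.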
The results are given in Table 7.5 and show that, for the reference model, design parameters similar to those for the ViP trial are obtained. As with sample size calculations based on the ALC, as more information enters into the design through informative priors, the Type I and Type II error rates improve. Considering prior distributions based on normal distributions, for the most informative priors Type I and Type II error rates of 0.07 and 0.08 respectively are obtained. Again, the effect is lessened for locally flat priors, with only the most informative priors having any noticeable effect on the Type II error rates.
It is also of interest to note that there is a plateau in the effect that increasingly informative priors have. The reason is that, as more information enters the prior distributions through effective events, the further information that is obtained from the events in the control arm during the course of a trial is reduced. Design parameters here become more dependent on what is observed in the experimental arm, as the data from the control arm contribute less towards the estimate of the log hazard ratio.
There is little effect on the Type I error rates, showing that when data are simulated with the efficacy parameter fixed at zero, the probability of falsely concluding that a new therapy is superior does not change.
Priors       Error      Effective prior events
                        10      20      30      50
Reference    Type I     0.12    0.11    0.12    0.11
             Type II    0.10    0.10    0.11    0.12
Normal       Type I     0.11    0.09    0.10    0.09
             Type II    0.08    0.07    0.07    0.07
Step         Type I     0.11    0.10    0.10    0.10
             Type II    0.09    0.07    0.07    0.08
Trapezium    Type I     0.11    0.10    0.10    0.10
             Type II    0.10    0.08    0.08    0.08
Table 7.5: The effect of different priors and effective prior events on Bayesian Type I and Type II error rates
7.7 Discussion
In this chapter, a method has been introduced by which summary information on survival rates can be taken from previous trials or expert information and incorporated into the design and analysis of clinical trial data with a time-to-event endpoint. As an example, the GemCap trial carried out at the Liverpool Cancer Trials Unit is used. It was shown that increased precision in the log hazard ratio can be achieved, though in this instance this also results in some shrinkage towards the ‘null point’ of no difference.
It is argued here that the main interest in any trial with a time-to-event endpoint is the (log) hazard ratio. However, this quantity may only be deemed clinically important if the survival rates in the control arm agree with current medical thinking. Take for example the situation of a positive result showing the experimental arm to be an improvement over a control arm, but where survival probabilities in the improved experimental arm do not show any improvement over previously published data. In situations such as these, it is not immediately clear whether the within-trial comparison should take precedence and an important difference be declared, or whether the results on the new therapy should be compared against other available information on patient performance.
In some way at least, the results of a single trial are always going to be compared against other trials or respective data available to the medical community. The Bayesian methods here allow for that information to be formally incorporated into the design and analysis and can therefore further inform the clinical decision making process.
Further introduced were the local step and trapezium priors, which penalise solutions that do not agree with prior information but have no effect when the data agree with previous evidence. It is argued that these priors may be particularly attractive as they do not require a single most likely solution to be defined a priori; rather, a set of solutions within given bounds, all deemed to be equally likely, is defined. They may be of particular use in situations where data are sparse or expensive to collect, and can therefore be used to encourage likely solutions without being overly influential.
The step and trapezium priors and the normal priors are applied to the design of the ViP trial. The results show that, by incorporating prior information on the baseline hazard function, smaller sample sizes based on the average length of the posterior distribution of the log hazard ratio are obtained. Considering trial design on the basis of Type I and Type II error rates, small improvements are also observed when trial designs incorporate informative prior distributions.
There have been recent methodological advances in the incorporation of historical information into the design of clinical trials, most notably the use of commensurate priors and power priors. Despite this, there are still few examples of these approaches being used in practice. This chapter introduced some of the practical steps that must be considered in the design of a clinical trial with a time-to-event outcome which incorporates historical information. In the next chapter the methodology presented here is extended into trial design to investigate the possibility of deviation away from the standard 1:1 allocation ratio that is common in randomised controlled trials.
Chapter 8
Unequal Allocation Ratios in a Bayesian and Frequentist Framework
8.1 Introduction
This chapter is concerned with the optimal allocation of patients to treatment arms in a two-arm clinical trial using reliable prior information. Initially a review is given on the allocation ratios that are used in practice. Following this, some theoretical results are obtained for binary and continuous outcomes where prior information is available. Analytical results for the optimal allocation ratios are obtained and applied to simple examples.
The use of informative priors for survival outcomes is investigated for a standard exponential model and a PEM. An analytical form for the optimal allocation ratio is derived based on the assumption of all patients having equal follow-up.