Lokhnygina and Tsiatis (2008) presented a fully optimized two-stage design that has minimum expected sample size averaged over a range of alternatives. In this paper, we simplified this design and presented a method to create a pre-specified optimal two-stage design with a limited set of stage two sample size possibilities to lessen the information revealed at the interim analysis.
In this paper, we focus on the stepwise adaptive design with two choices of second-stage sample size for the prior distribution θ ∼ N(δ/2, (δ/2)^2). We set the choice
[Figure 2.4 panels. Left: power (%) versus θ, from 0 to 1.5δ. Right: E_θ(N)/N_fix versus θ, from 0 to 1.5δ. Curves in both panels: stepwise adaptive design; Gao’s adaptive design; stepwise adaptive design matching the power of Gao’s adaptive design at 0.5δ.]
Figure 2.4: Power Curve (left) and expected sample size (right). Grey line shows the power curve for a stepwise adaptive design which matched the power of Gao’s
adaptive design at 0.5δ.
of second-stage sample size to the same value when the first-stage test statistic is close to either the futility bound or the efficacy bound at the first interim analysis, i.e., n_2 = n_4,
and to a different value when the first-stage test statistic falls into an intermediate
region away from the first-stage stopping boundaries, i.e., an intermediate treatment
effect is observed that is not particularly close to the null or alternate hypothesis effect size. This feature of the design improves blinding of the interim treatment effect by lessening the information revealed at the interim analysis. Each second-stage sample size corresponds to one range or two ranges of the first interim analysis test statistic, as shown in Table 2.1. If the study proceeds to the second stage with sample size of
0.68 N_fix, we know only that the standardized first-stage test statistic is between 0.69
and 1.70. If the study proceeds to the second stage with sample size of 0.55 N_fix, we
know only that the standardized first-stage test statistic is either between 0.48 and 0.69 or between 1.70 and 2.01. The fully optimized two-stage adaptive design has unlimited choices of second-stage sample size due to its continuous nature and could
of second-stage sample size. The optimal two-stage group sequential design has only one choice of second-stage sample size and reveals the least information (it gives only one range of the first interim analysis test statistic). The stepwise adaptive design and the optimal two-stage group sequential design therefore reveal less information about the interim treatment effect than the fully optimized adaptive design.
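The mapping from the first-stage statistic to the second-stage sample size can be sketched as a step function. The interval endpoints and sample size fractions below are the ones quoted in the text; the function names, the treatment of values outside 0.48 to 2.01 as stopping at the interim, and the handling of the interval endpoints are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# (lower, upper, second-stage sample size as a fraction of N_fix),
# using the ranges quoted in the text for the standardized first-stage statistic.
STEPS: List[Tuple[float, float, float]] = [
    (0.48, 0.69, 0.55),
    (0.69, 1.70, 0.68),
    (1.70, 2.01, 0.55),
]


def second_stage_fraction(z1: float) -> Optional[float]:
    """Return the second-stage sample size / N_fix, or None if the trial stops.

    Assumption: the trial stops at the interim whenever z1 falls outside
    [0.48, 2.01); which side each endpoint belongs to is arbitrary here.
    """
    for lower, upper, fraction in STEPS:
        if lower <= z1 < upper:
            return fraction
    return None


def revealed_intervals(fraction: float) -> List[Tuple[float, float]]:
    """Ranges of z1 that an observer can infer from the announced sample size."""
    return [(lower, upper) for lower, upper, f in STEPS if f == fraction]


print(second_stage_fraction(1.0))   # falls in the intermediate region
print(revealed_intervals(0.55))     # two disjoint ranges share one sample size
```

Because two disjoint ranges share the sample size 0.55 N_fix, announcing that sample size does not tell an observer whether the interim statistic was near the futility bound or near the efficacy bound.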
We have seen that the efficiency loss from the stepwise adaptive design may be minimal compared to the substantially more complicated fully optimized design (Lokhnygina and Tsiatis, 2008). The stepwise adaptive, fully optimized adaptive, and optimal two-stage group sequential designs have similar expected sample size and
overall power over the range of θ. Advantages of the stepwise adaptive design over
the optimal two-stage group sequential design are that the minimum second-stage sample size is much smaller and that the stepwise adaptive design is less likely to require the maximum sample size. Notice that the shape of the stepwise adaptive design is not symmetric. This is also true for the fully optimized two-stage adaptive design (Lokhnygina and Tsiatis, 2008). This might be caused by the optimization process, which minimizes the expected sample size for a given prior. For comparison, we constructed a symmetric stepwise adaptive design with equal lengths of the continuation regions in which the first-stage test statistic is close to the futility bound or the efficacy bound at the first interim analysis. The expected sample size relative to a fixed
sample size design is 0.77096 for the current stepwise design, compared to 0.77107 for the symmetric stepwise adaptive design.
Levin et al. (2011) recently presented a completely pre-specified optimal adaptive design. This design is similar to our stepwise adaptive design in that both use step functions. Levin et al. (2011) considered only the symmetric design, optimized it by placing half the weight on the null and half the weight on the alternative, and achieved the optimization by adding more steps to the design. Our design focuses on fewer steps and minimizes the expected sample size over a range of alternatives.
Chuang-Stein et al. (2006) pointed out that the interim treatment effect size can be highly variable and potentially too unreliable to be used directly for sample size re-estimation purposes. In general, a sample size re-estimation design based on conditional power is likely not optimized for expected sample size. Jennison and Turnbull (2003) demonstrated that mid-course sample size modification based on the observed treatment effect comes at a cost in efficiency when compared with group sequential designs. The stepwise adaptive design is an extension of the standard group sequential design. Like a group sequential design, it is pre-specified at the design stage, and it also provides the opportunity for sample size adaptation with little loss of efficiency. The stepwise adaptive design provides a solution by combining the prior information and the information within a trial.
two-stage adaptive and with optimal two-stage group sequential designs, but reveals less information about interim treatment effect than the fully optimized adaptive design and has the potential to increase sample size based on interim results.
Chapter 3
Sample Space Ordering and Inference for Group Sequential/Adaptive Designs
3.1 Introduction
Armitage, McPherson, and Rowe (1969) numerically showed that if significance
tests at a fixed level are repeated at interim analyses, the Type I error rate (or α) is
greatly increased over the nominal level. Simple group sequential methods for a predefined number of equally spaced interim analyses were developed by Pocock (1977) and O’Brien and Fleming (1979) to control the Type I error rate by adjusting the
Fleming (1979) designs to a class of group sequential tests, also referred to as boundary families. However, boundary family designs assume that the maximum number of analyses, K, is fixed in advance and require equally spaced interim analyses. Lan and DeMets (1983) suggested an alternative method to construct discrete sequential boundaries
by using α-spending functions. The boundary at a decision time is determined by
α(t), where t is the timing of the interim analysis, which is also called information
time. The information time at analysis i is defined as t_i = I_i/I_max for i = 1, ..., K, where I_i is the
statistical information at analysis i and I_max represents the maximum planned information at the time of design. Kim and DeMets (1987) and Hwang, Shih, and DeCani (1990) each extended the method of Lan and DeMets (1983) to a general one-parameter family of α-spending functions, α(t; γ) = α × h_γ(t), where the parameter
γ specifies the rate of α-spending. The function h_γ(t) is increasing in t ∈ (0, 1) with
h_γ(0) = 0 and h_γ(t) = 1 for t ≥ 1. Pampallona, Tsiatis, and Kim (2001) extended
the Type I error spending method of Lan and DeMets (1983) by incorporating an
analogous Type II error (or β) spending function for interim analyses to test futility.
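As an illustration of the one-parameter form α(t; γ) = α × h_γ(t), the sketch below evaluates two families in their commonly stated forms: the power family α t^ρ usually attributed to Kim and DeMets (1987) and the family α(1 − e^{−γt})/(1 − e^{−γ}) usually attributed to Hwang, Shih, and DeCani (1990). The function names, the example parameter values, and the information times are assumptions chosen for illustration and are not taken from this thesis.

```python
import math


def power_spending(t: float, alpha: float, rho: float) -> float:
    """Power family: alpha * t**rho, with t clipped to [0, 1] (assumes rho > 0)."""
    t = min(max(t, 0.0), 1.0)
    return alpha * t ** rho


def hsd_spending(t: float, alpha: float, gamma: float) -> float:
    """Hwang-Shih-DeCani family; gamma = 0 reduces to linear spending."""
    t = min(max(t, 0.0), 1.0)
    if gamma == 0:
        return alpha * t
    return alpha * (1.0 - math.exp(-gamma * t)) / (1.0 - math.exp(-gamma))


alpha = 0.025                      # illustrative one-sided level
times = [0.25, 0.5, 0.75, 1.0]     # illustrative information times
for t in times:
    # Cumulative alpha spent by information time t under each family.
    print(t, power_spending(t, alpha, rho=3.0), hsd_spending(t, alpha, gamma=-4.0))
```

Both functions spend nothing at t = 0, spend the full α at t = 1, and increase in between, which is the behavior required of h_γ(t) in the text.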
Anderson and Clark (2010) discussed additional one- and two-parameter spending families. Their two- or three-parameter spending function families provide additional flexibility to customize the shape of spending functions to fit more than one desired critical value. The spending function approach has become common because of its flexibility in accommodating unequally-spaced analyses and allowing some leeway in moving, adding or deleting interim analyses as long as this is done without knowledge
of treatment effects. This contrasts with boundary families, which require a fixed total number of analyses, generally performed at equally spaced intervals. The boundaries
constructed by α- and β-spending functions are determined by the past and current
information times but not by future information times or by the total number of analyses. These properties of the spending function approach allow flexibility in resetting the timing of analyses during the course of the trial.
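The dependence on past and current information times only can be seen from the incremental error spent at each analysis. The sketch below uses a power spending function with illustrative parameters (α = 0.025, ρ = 2) and two hypothetical analysis schedules; none of these values come from this thesis. It shows only the error spent, not the boundary computation itself.

```python
from typing import List


def alpha_spent(t: float, alpha: float = 0.025, rho: float = 2.0) -> float:
    """Cumulative alpha spent by information time t (power spending function)."""
    return alpha * min(max(t, 0.0), 1.0) ** rho


def incremental_alpha(times: List[float], alpha: float = 0.025,
                      rho: float = 2.0) -> List[float]:
    """Alpha spent at each analysis: alpha(t_k) - alpha(t_{k-1})."""
    increments, previous = [], 0.0
    for t in times:
        current = alpha_spent(t, alpha, rho)
        increments.append(current - previous)
        previous = current
    return increments


planned = [0.25, 0.5, 0.75, 1.0]        # hypothetical planned schedule
revised = [0.25, 0.5, 0.6, 0.9, 1.0]    # later looks moved and one look added

print(incremental_alpha(planned))
print(incremental_alpha(revised))
```

The increments at the first two analyses are identical under both schedules, and each schedule still spends the full α by t = 1, so rescheduling or adding later analyses leaves the error already spent untouched.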
Group sequential designs with asymmetrical boundaries permit stopping a clinical trial for efficacy when the interim results cross the upper boundaries or for futility when the interim results cross the lower boundaries. The boundaries of a group sequential design by themselves determine acceptance or rejection of the null hypothesis; however, they do not provide additional information about the relative strength of the evidence against the null hypothesis. For i = 1, 2, ..., K, let Z_i be the test statistic against the null hypothesis H_0 in favor
of the alternative hypothesis H_1 at analysis i. Let C_i be the continuation region at
analysis i, with C_K = ∅. Let Ω be the sample space defined by a classical group sequential
design, that is, the set of all pairs (i, z_i) with z_i ∉ C_i, so that the test can terminate at
stage i with (T, Z_T) = (i, z_i). A p-value for testing H_0 can be stated as the probability
under the null hypothesis of obtaining (i, z_i) as extreme or more extreme than the
observed (i*, z_{i*}), where “extreme” refers to the ordering of Ω. A fixed sample design
(with no monitoring) has a unique ordering of the sample space under the normality assumption due to the monotone likelihood ratio property. For a fixed sample design, the p-value converges to
0 as z → ∞ and to 1 as z → −∞. This is not the case for a group sequential trial: since the number of observations varies between stages, there are many ways to order the possible outcomes. We start with a brief review of the basic concepts of group sequential testing and existing sample space orderings for group sequential designs, including stage-wise ordering by Tsiatis, Rosner and Mehta (1984); maximum likelihood estimate (MLE) ordering by Emerson and Fleming (1990); likelihood ratio ordering or z-score ordering by Chang (1989); score test ordering or B-value ordering by Rosner and Tsiatis (1988); and sequential p-value ordering by Liu and Anderson (2008a). We prefer to use sequential p-value ordering because this method uses the totality of the accumulating data and takes into account the entire sample path, while the other orderings only consider the data where the boundary was crossed or the data
at the current analysis. We will show, using the power spending function as an example, that spending functions of the form
α(t) = α × h(t) do not completely order the sample space.
This has the disadvantage that there is often a broad range of the sample space at an interim analysis where the p-value is 1. The exponential
spending function from Anderson and Clark (2010), α_e(t; ν) = α^{t^{−ν}}, has a different form from most commonly used spending functions. We will define what we mean by the complete ordering of a group sequential sample space and show that a Wang-Tsiatis boundary family, an exponential spending function family, or the Lan-DeMets O’Brien-Fleming approximation can completely order the sample space. We also
propose a simple method to transform a spending function so that it completely orders the sample space under the sequential p-value ordering; a power spending function
will be used as an example. This method is also extended to β-spending functions
for p-values to reject the alternative hypothesis. We then give examples to illustrate our approach.
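The difference between the two spending function forms can be sketched at the first interim analysis. The sketch assumes a one-sided efficacy test and takes the sequential p-value to be the smallest overall level α at which the first-analysis bound is crossed; this is a simplified reading of the sequential p-value of Liu and Anderson (2008a), not their full definition. Under α × h(t) with h(t) = t^ρ, that level is (1 − Φ(z_1))/t_1^ρ capped at 1; under α^{t^{−ν}}, it is (1 − Φ(z_1))^{t_1^ν}. The function names and the values t_1 = 0.5, ρ = 3, ν = 0.8 are illustrative assumptions.

```python
from statistics import NormalDist


def nominal_p(z: float) -> float:
    """One-sided nominal p-value 1 - Phi(z)."""
    return 1.0 - NormalDist().cdf(z)


def seq_p_power(z1: float, t1: float, rho: float) -> float:
    """Smallest alpha with nominal_p(z1) <= alpha * t1**rho, capped at 1."""
    return min(1.0, nominal_p(z1) / t1 ** rho)


def seq_p_exponential(z1: float, t1: float, nu: float) -> float:
    """Smallest alpha with nominal_p(z1) <= alpha ** (t1 ** -nu)."""
    return nominal_p(z1) ** (t1 ** nu)


t1 = 0.5  # illustrative information time of the first analysis
for z1 in (0.0, 0.5, 1.0, 1.5, 2.5):
    print(z1, seq_p_power(z1, t1, rho=3.0), seq_p_exponential(z1, t1, nu=0.8))
```

With these values, every z_1 whose nominal p-value exceeds t_1^ρ = 0.125 receives a p-value of 1 under the power form, so those outcomes are tied rather than ordered, while the exponential form assigns each of them a distinct value strictly between 0 and 1.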