Sample Size Determination - Designing a series of clinical trials

Due to the inherent biologic variations in patients presented with the same condition it is necessary to have a group of patients in clinical trials. The results from the sampled patients are consequently used to infer how the treatments may behave in the population. Thus, one of the fundamental issues in the design of a clinical trial is the number of patients that should be sampled. On the one hand, a sample size that is too large may delay the treatment from being made available to the population when it has shown some minimum clinical efficacy. On the other hand, an inadequate sample size may not be able to draw a valid conclusion thus, subjecting patients unnecessarily to “questionable” treatments.

This chapter discusses some of the common methods used to determine sample size for a superiority trial which aims to establish the superiority of the experimental treatment to a control treatment that could either be a placebo or the current standard treatment. In a phase II setting however, it may not be necessary to have a controlled arm. Instead the efficacy of

Sample Size Determination

the experimental treatment is compared against a known value of an historical control. The designs discussed in the later sections thus include both controlled (two-arm) and uncontrolled (one-arm) trials, and the primary endpoint is assumed to be either a continuous variable or a binary variable.

This thesis assumes that the trial’s primary outcome, X, is a random variable that has a known form of a probability distribution function with an unknown parameter θ. The probability density function (or equivalently the probability mass function for a discrete variable) is represented by f(x|θ). Patients’ outcomes are independent of each other and so each random variable is assumed to be independently and identically distributed with the same probability density function, f(x|θ).

The sample size is determined based on the analysis of the primary endpoint that is to be done at the end of the trial (ICH, 1998), that is, an inference on parameter θ is made. Issues such as patient withdrawal and protocol violation may affect the actual number of patients in a trial and thus, methods that deal with these issues are employed alongside the common sample size calculation formulations to determine the minimum sample size that is necessary. However, these methods will not be discussed in this thesis.

Sample size determination methods are broadly classified into two main groups, namely, frequentist and Bayesian. The Bayesian methods are further divided into two general categories; an inferential technique and a decision theoretic technique which treats the inference problem as a decision problem based on utility or loss function. Some designs make use of both frequentist

Sample Size Determination

As discussed briefly in Spiegelhalter et al. (2004, Ch. 6), some authors advocated the decision-theoretic framework for clinical trial designs and some advocated the inferential framework. For the former framework the argument is that a decision is ultimately made whether to stop the trial or not and therefore, assessing utilities. Under the inferential framework the argument is that the estimated efficacy of interest should be reported with sufficient confidence because the population who may be receiving the treatment after the trial is usually more heterogeneous than the population in the trial. Both frameworks have their merits and Whitehead (1993) argues that Bayesian decision-theoretic framework is appropriate in early phase trials whereas a frequentist approach is suitable for phase III trial.

This thesis concentrates on the designs of a series of trials based on the hybrid approach. As such the general frameworks of both frequentist and Bayesian methodology will be discussed. The first section is on the frequentist method for continuous and binary random variables. There is a subsection on sample size determination for phase II trials based on the frequentist approach (Section 4.1.3). Following on, Section 4.2 discusses some general clinical trials design based on the Bayesian methodology. The hybrid method is discussed in Section 4.3 and Section 4.4 presents some published works on designing a series of clinical trials. Finally, Section 4.5 concludes the chapter. Details of the methods discussed in this chapter can be found in these textbooks: Friedman et al. (2010), Joseph and Belisle (1997), Julious (2010), Lachin (1981) and Machin et al. (2009).

Sample Size Determination 4.1 Frequentist method

4.1 Frequentist method

The principle of sample size determination is to choose a sample size so that an analysis is performed on the observed primary outcomes and consequently infer the unknown parameter θ. Therefore, it is necessary to ascertain the primary patient outcome which is chosen based on the main objective of the trial.

In the classical frequentist approach, the objective of the trial is delineated in the hypothesis where the null hypothesis states that the true difference between the experimental and control treatments belongs to a subset of the parameter space. Let Ω be the parameter space and let ω be the subset of Ω where the true difference belongs to. For the purpose of designing a trial, the alternative hypothesis needs to be specified, too, where it simply states that the true treatment difference does not belong to ω but to Ω−ω.

4.1.1 Continuous variable

Controlled trial

Consider a trial examining the efficacy between the experimental treatment and a control treatment, and that the primary outcome is a continuous variable, that is, it takes a range of values from the real line. Suppose that the trial’s objective is to estimate the true difference between the mean of the true outcome of the experimental treatment and the mean of the true outcome of the control treatment. Let X1, X2, . . . , Xn1 ben1 independent continuous

Sample Size Determination 4.1 Frequentist method

X =Pn1

i=1Xi/n1be the mean ofX1, . . . , Xn1 and as the observations from the

patients are assumed to be independent from each other, X ∼N(µ1, σ12/n1). Similarly, letY1, Y2, . . . , Yn2 be then2 independent continuous outcomes from

the experimental treatment group where Yj ∼ N(µ2, σ22), j = 1, . . . , n2, and let Y = Pn2

j=1Yj/n2 be the mean from the experimental group where

Y ∼N(µ2, σ22/n2).

Let the difference of the mean of the two treatment arms denoted by

δ =Y −X. The observations from both arms are assumed to be independent of each other and so δ ∼ N(µ2 −µ1, σ12/n1+σ22/n2), that is, δ also follows the normal distribution with mean µ2−µ1 and varianceσ12/n1+σ22/n2. For ease of notation, letθ =µ2−µ1 andn1 =rn2 wherer is the allocation ratio. The total number of patients to be recruited to the trial is N = n1 +n2 = (1 + r)n2. Therefore, rewriting the notation, δ ∼ N(θ,(σ₁2 +rσ₂2)/(rn2)), which is equivalent to δ−θ p (σ2 1 +rσ22)/(rn2) ∼N(0,1). (4.1)

The objective of the trial is to estimate the true difference between the two means. Following on the review in Chapter 3.3, assuming that there is only one element in ω and that element is denoted by θ0 then the null hypothesis is written as H0 :θ =θ0. Under the alternative hypothesis, also, assumed that there is also only one element in Ω−ω and this is denoted as

θA, so the alternative hypothesis is H1 : θ =θA. In determining the sample size, bothθ0 and θAare specifieda priori. These two values may be obtained by assuming the true mean of the control arm and the smallest difference

Sample Size Determination 4.1 Frequentist method

that is of clinical significance between the two arms.

In the process of inferring the population mean, two types of errors may be incurred and they are type I and II errors. Type I error is the error of rejecting the null hypothesis when it is true and type II error is the error of not rejecting the null hypothesis when it is false. The maximum allowable levels of type I and II error rates need to be specified a priori, too, in the design stage when determining the sample size. Letα andβ be the maximal allowable levels for the type I and II errors, respectively,

Pr(Reject H0|θ=θ0) = Pr(type I error) =α

Pr(Non-rejection of H0|θ =θA) = Pr(type II error) =β. (4.2)

Assuming that the variance ofδ is the same in both null and alternative hypotheses, a general equation for sample size determination can be devel- oped from (4.1) and (4.2). Similar to the steps seen in (3.16), under the null hypothesis and a two-sidedα-level of significance, the critical region to reject

H0 is given by Pr(Reject H0|θ =θ0) =α/2 ⇔ Pr(δ > c|θ =θ0) =α/2 ⇔ Pr Z > _p c−θ0 (σ2 1 +rσ22)/(rn2) =α/2 ⇔ 1−Φ c−θ0 p (σ2 1 +rσ22)/(rn2) =α/2 ⇔ c=z1−α/2 s σ2 1 +rσ22 rn2 +θ0 (4.3)

Sample Size Determination 4.1 Frequentist method

where Φ(·) is the cumulative distribution function of the standard normal distribution and zγ is the lower 100γ percentile of the standard normal distribution.

Under the alternative hypothesis, the random variable δ follows the normal distribution with mean θA and variance (σ12+rσ22)/(rn2), equivalently, this is rewritten as (δ−θA)/(

p (σ2

1 +rσ22)/(rn2))∼N(0,1). As shown earlier, in (3.17) the power function is

Pr(Reject H0|θ =θA) = Power

⇔ Pr(δ > c|θ =θA) = 1−β (4.4)

Substituting (4.3) into (4.4) to solve for n2,

Pr Z > z1−α/2+ θ0−θA p (σ2 1 +rσ22)/(rn2) = 1−β ⇔ z1−α/2− θA−θ0 p (σ2 1+rσ22)/(rn2) =zβ ⇔ √rn2 = (z1−α/2−zβ) p σ2 1+rσ22 θA−θ0 ⇔ n2 = (σ2 1 +rσ22)(z1−α/2−zβ)2 r(θA−θ0)2 . (4.5)

Therefore, the total number of patients needed is N = (1 + r)n2 where

n1 = rn2 patients are randomized to the control arm and n2 patients are randomized to the experimental treatment arm.

Assuming that the variances of both arms are equal, σ2 ₌_σ2

Sample Size Determination 4.1 Frequentist method

the allocation ratio is 1:1 then (4.5) is simplified into

n2 =

2σ2₍_z

1−α/2−zβ)2 (θA−θ0)2

. (4.6)

The total number of patients to be recruited is thus, N = 2n2, a positive number. In practice, N is rounded up to the nearest even integer. For example, consider an asthma clinical trial whose objective is to detect a difference of 10% between treatments in mean percentage change from baseline in the forced expiratory volume in one second (FEV1) observed 12 hours after the treatment dose against a null hypothesis of no difference between treatments, that is,θ0 = 0 andθA= 10. The standard deviations of both treatment arms are assumed to be equal and fixed at σ = 14. Let α = 0.05 and β = 0.10, and the randomization ratio is 1:1 then from equation (4.6), the number of patients needed per arm is n1 = n2 = 41.189. Rounding up to the nearest even integer the total number of patients needed is N = 84.

Uncontrolled trial

Some phase II trials are conducted with only one treatment arm, that is, the experimental treatment is tested against a known historical control value. In this setting, the sample size determination is done by testing the parameter

µ2 of the random variable Y under these hypotheses: H0 : µ2 = θ0 against

H1 : µ2 =θA where θ0 is the known historical value and θA is the minimum value that is of clinical significance that may be attained by the experimental treatment. Given the type I and II error rates and assuming that the variance

Sample Size Determination 4.1 Frequentist method

is known, the desired sample size is

n= (z1−α/2−zβ)

2_σ2 2 (θA−θ0)2

. (4.7)

Consider the example given in the controlled trial setting but instead of randomizing patients to two treatment arms, all npatients are given the experimental treatment. Assume that the mean percentage change from baseline in FEV1 of the historical control isθ0 = 0 and that the mean percentage change of the experimental treatment is θA = 10. The standard deviation is as given in the earlier example, σ2 = 14 and so, for a two-sided test at 0.05 level of significance and power of 0.90 the desired sample size is n = 21 (rounding up from n = 20.59).

4.1.2 Binary variable

This section considers clinical trials with binary data as the primary endpoint, that is, the response from each patient is dichotomized into, for example, success or failure, yes or no. Usually, a positive response such as success is represented numerically by 1 and a negative response, failure, by 0. Con- sider a two-arm randomized trial where patients are randomly allocated to either treatment. Let X1, X2, . . . , Xn1 be n1 independent binary outcomes

from the control group where Xi = 1 if the response is a success andXi = 0, otherwise, for i = 1,2, . . . , n1. Let S1 = X1 +X2 +. . .+Xn1 be the sum

of successes from n1 patients. A common and convenient probability distribution for Xi is the Bernoulli distribution and therefore, as seen earlier in Section 3.1.1, S1 is assumed to follow the binomial distribution with in-

Sample Size Determination 4.1 Frequentist method

Table 4.1: Statistics for a two-arm randomized controlled trial with binary response.

Treatment Number of Number of Observed Anticipated

arm successes observations success rate success rate

Control s1 n1 s1/n1 p1

Experimental s2 n2 s2/n2 p2

Overall s N

dex n1 and an unknown parameter p1, the probability of success. Similarly,

letY1, Y2, . . . , Yn2 ben2 independent binary outcomes from the experimental

treatment arm and let S2 =Y1+Y2+. . .+Yn2 be the sum of successes from

n2 patients. Likewise, S2 is assumed to follow the binomial distribution with index n2 and an unknown probability of success,p2. LetS1 =s1 andS2 =s2 be the observed successes at the end of the trial, then the statistics may be recorded as shown in Table 4.1.

There are a few methods to summarize the binary effects between two treatments at the end of the trial. Some of the common summary measures are absolute risk difference, odds ratio, log odds ratio and relative risk (Ta- ble 4.2). The sample size determination depends on the hypothesis which depends on the summary measure that would be used for the final analysis at the end of the trial. This thesis will concentrate on the log odds ratio but examples will be given for both absolute risk difference and log odds ratio summary measures.

Absolute risk difference: controlled trial

Let r be the allocation ratio such that n1 = rn2, then the total number of patients needed for the trial is N = (r+ 1)n . First, consider the setting of

Sample Size Determination 4.1 Frequentist method

Table 4.2: Summary measures for binary response.

Measure Definition

Absolute risk difference p2−p1

Odds ratio p2(1−p1)

p1(1−p2)

Log odds ratio log

p2(1−p1)

p1(1−p2)

Relative risk p2/p1

testing the absolute risk difference between the two treatment groups. The null and alternative hypotheses are thus written as

H0 :p2 =p1 against H1 :p2 6=p1.

For some values of n1 and n2 that satisfy these conditions n1p1(1−p1) ≥ 10 and n2p2(1−p2) ≥ 10, respectively, the normal distribution is a good approximation to the binomial distribution. As such, the test statistic to test the null hypothesis is defined as

Z = pˆ2−pˆ1

σ ,

where ˆp1 =s1/n1, ˆp2 =s2/n2 and ˆσ2 = ˆp¯(1−pˆ¯)(1/n1+ 1/n2). The average proportions of events is estimated as ˆp¯= (rpˆ1+ ˆp2)/(r+ 1).

The null hypothesis is defined as, H0 : p2 = p1 = ¯p, where ¯p = (rp1+

p2)/(r+ 1) and the variance is σ20 = (1 +r)¯p(1−p¯)/(rn2). Under the null hypothesis and for large sample sizes, the test statistic, Z, is assumed to follow the standard normal distribution, that is, Z ∼N(0,1). Following the steps seen in Section 4.1.1, for a two-sided α-level of significance the critical

Sample Size Determination 4.1 Frequentist method

region to reject the null hypothesis is

Pr(Reject H0|p2 =p1) =α/2 Pr Z > c−(p2−p1) σ0 =α/2 1−Φ c σ0 =α/2 c=z1−α/2σ0. (4.8)

As the variance of the test statistic depends on the anticipated probabilities of success of each treatment arm, it is necessary to specify both p1 and p2 a

priori at the design stage for sample size calculation. Under the alternative hypothesis, H1 : p2−p1 = pA, the variance is σ12 = p1(1−p1)/n1 +p2(1−

p2)/n2, and the power function is

Pr Z > c−(p2−p1) σ1 = 1−β. (4.9)

Substituting (4.8) into (4.9) to solve for n2,

Pr Z > z1−α/2σ0−(p2−p1) σ1 = 1−β ⇔ n2 = z1−α/2 p (1 +r)¯p(1−p¯)−zβ p p1(1−p1) +rp2(1−p2) 2 r(p2−p1)2 . (4.10)

Therefore, the minimum total sample size needed is N = (1 + r)n2 and of this, n1 = rn2 is randomized to the control arm and n2 patients to the experimental treatment arm.

Sample Size Determination 4.1 Frequentist method

As an illustration of the sample size calculation, consider an asthma clinical trial whose objective is to test if the experimental treatment is better in controlling asthma exacerbation than the placebo. The primary endpoint is binary where at least an episode of moderate or severe asthma exacerbation during a 4-week treatment period is considered as a failure and the absence of any asthma exacerbation event is considered as a success. Assume that the probability of success of the placebo isp1 = 0.80 and that the probability of success of the experimental treatment is p2 = 0.90. For a two-sided significance level of α= 0.05 and power, 1−β = 0.90, and an equal allocation,

r = 1, the minimum sample size that is required per arm isn1 =n2 = 265.86. So, rounding up to an even integer, the minimum total number of patients needed is N = 532.

Absolute risk difference: uncontrolled trial

In a single-arm trial, the true probability of success of the experimental treatment is tested against a fixed and known historical control, p0. For sample size calculation, the probability of success of the experimental treatment,pA, is specified a priori. Thus, the null hypothesis is written as, H0 : p2 = p0, which yields σ2₀ =p0(1−p0)/n2 where n2 is the sample size required for the trial. The alternative hypothesis, on the other hand, isH1 :p2 =pA and the variance is σ2

1 =pA(1−pA)/n2. The sample size is

n2 = z1−α/2 p p0(1−p0)−zβ p pA(1−pA) pA−p0 2 . (4.11)

Sample Size Determination 4.1 Frequentist method

For illustration, let p0 = 0.80, pA = 0.90, α= 0.05 and β = 0.10, then from (4.11), n2 = 136.52 and so the minimum sample size required for the trial is 137.

Log odds ratio: controlled trial

The absolute risk difference discussed in the preceding section is a simple measurement bounded by (−1,1). This may however, lead to anomalies especially if the response is very near to the boundary (Julious, 2010, White- head, 1997). As such, the log odds ratio may be a better choice as a summary measure. The odds of success of the control arm is p1/(1−p1) and the odds

In document Designing a series of clinical trials (Page 63-98)