Randomization Based Confidence Intervals For Cross Over and Replicate Designs and for the Analysis of Covariance

Winston Richards

Schering-Plough Research Institute

JSM, August 2002

Abstract

Randomization or permutation tests have been studied extensively under the finite model framework in comparative experiments, to assess whether one treatment differs significantly from another under the null hypothesis of no treatment difference. We explore the use of the randomization principle to construct confidence intervals for treatment differences (considered as shifts) through the conceptual recovery of the null-hypothesis state from the alternative in the sample. In the linear model framework, such intervals may be constructed by generating a 'null' distribution around the estimated shift (under least squares theory, say), using the finite model rerandomization sample space of estimates derived from residuals. These confidence intervals do not require assumptions of normality, which are questionable for small samples; they rest only on the simple assumption of additivity in the finite model setting. Illustrative coverage probabilities are presented for some applications of the methodology.

We consider replicate (crossover) designs and compare confidence intervals obtained from randomization theory with those obtained under the usual normal-theory assumptions and from the mixed model approach with factor-analytic covariance structure given in the FDA guidance on average bioequivalence for pharmacokinetics.

1 Introduction

When a particular distribution is assumed for the underlying population of interest (e.g., a normal distribution), the validity of the procedure used, and of probability statements based on the resulting sampling distribution, depends, perhaps critically, on that assumption. Violations of the assumptions may have very serious consequences for the correctness or justification of the resulting inference or decision. In practice, however, one's experience or prior information about a data situation may be quite inadequate to justify the assumptions. Moreover, in many cases the sample size may be too small for an appeal to general large-sample theory to be justified. In these situations, distribution-free or nonparametric methods may provide suitable procedures for analyzing the data. Below we give a brief description of the finite model randomization methodology.

Fisher, in his 1926 paper (1) and in his now classic book "The Design of Experiments" (1935) (2), put forward an idea, the principle of randomization, that has dominated comparative experiment methodology for the past 75 years. The principle of randomization under a finite model setting was illustrated in testing the null hypothesis of no treatment difference in his Lady Tasting Tea experiment. There Fisher used a restricted randomization or permutation test, conditional on fixed sample sizes for the two treatments. In this article the principle of randomization under a finite model setup is used as the framework for constructing confidence intervals; Richards and Gogate [7] used the same framework when the response variables are ordinal or dichotomous. The principle of randomization was expounded in Fisher (1935) (2) and in greater detail by Kempthorne (1952) (3). Kempthorne and Folks (1971) (4) defined consonance intervals in the context of the inversion of tests of significance. However, only recently, owing to advances in computer technology, has the direct application of this principle in data analysis received widespread attention. Lehmann (1997) (5) discussed the construction of a confidence interval using this concept for a shift in the location parameter of an infinite population, a shift that may be regarded, approximately, as the treatment effect in a comparative trial.

Our motivation is, inter alia, to challenge the rather complex assumptions and complicated analyses now widespread in the mixed model arena for making inferences about the comparison of treatment effects.

We suggest that inference from the finite model approach is sound. Under approximate additivity of the treatment effect, it embraces a wide range of underlying realities, including those described by infinite model distributional assumptions. We extend this concept to construct a confidence interval for the (additive) treatment difference in the simple (repeated measures) crossover design and in the replicate design for the case of two treatments. We investigate the extension of this approach to the analysis of covariance in the linear model framework. We also suggest a way of obtaining confidence intervals for the odds in dichotomous settings where the link function (for example, in logistic regression) is considered linear in the concomitant variables and additive in the treatment effect.

2 Finite Model Framework

An important feature of finite model theory is its usefulness for inference in situations where the probability distributions of the observations are unknown. Incorporating randomization into the experimental design gives a strong basis for statistical inference, particularly if additivity, as defined below, holds. The resulting probability statements and inferences relate directly to what is conceptually observable in the situation.

The reader is referred to the work of Kempthorne for the derivation of models that may be represented in forms similar to the parametric forms of the factorial analysis of variance, say, with main effects and interactions. However, these effects are defined in terms of the conceptual population defined below. They reflect algebraic partitions (combinations) of the basal responses of the experimental units, without appeal to any added distributional assumption about the errors. In our discussion below we will use the usual parametric forms of the models in our finite model estimation procedures.

Suppose there are $N$ experimental units $U_1, U_2, \dots, U_N$. Since each unit can receive any of the $t$ treatments (or treatment sequences), the conceptual population of responses consists of $Nt$ possible (vector) responses. In an actual experiment, however, each unit is assigned to only one treatment sequence, so we observe a restricted sample from the conceptual population, which we must use to make a statistical inference about the difference of treatment means, for example, by a test of the null hypothesis or by confidence interval estimation. We restrict our discussion to the case of two treatments, since that is of primary interest to us; the extension to more treatments or sequences follows logically. Let $\gamma_1$ and $\gamma_2$ denote the true treatment effects (e.g., means or proportions) associated with the two treatments, say A (the new or active treatment) and B (the control or standard therapy). Let $Z_i$ denote the basal vector response of unit $U_i$. Under additivity, the response $C_{ik}$ of unit $U_i$ under treatment sequence $k$ can be expressed as

$$C_{ik} = Z_i + \gamma_k.$$

In the sample, the $i$-th vector response under treatment sequence $k$ is then given by

$$y_{ik} = \sum_{j=1}^{N} C_{jk}\,\delta_{jk}(i),$$

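where

$$\delta_{jk}(i) = \begin{cases} 1 & \text{if unit } j \text{ is the } i\text{-th replicate unit of treatment sequence } k,\\ 0 & \text{otherwise.} \end{cases}$$

Note that, under randomization,

$$P\{\delta_{jk}(i) = 1\} = \frac{1}{N},$$

$$P\{\delta_{jk}(i) = 1,\ \delta_{j'k'}(i') = 1\} = \begin{cases} \dfrac{1}{N(N-1)} & \text{for } j \ne j' \text{ and } (i,k) \ne (i',k'),\\ 0 & \text{for } j = j' \text{ and } (i,k) \ne (i',k'), \text{ or } j \ne j' \text{ and } (i,k) = (i',k'). \end{cases}$$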
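For completeness, the one-line calculation behind the expectation stated next is shown here; it uses only the complete randomization described above, under which each unit is equally likely to occupy any position $(i, k)$:

$$E(y_{ik}) = \sum_{j=1}^{N} C_{jk}\,P\{\delta_{jk}(i) = 1\} = \frac{1}{N}\sum_{j=1}^{N}\big(Z_j + \gamma_k\big) = \bar Z_{.} + \gamma_k,$$

and averaging over the replicate positions $i$ of sequence $k$ gives the same expectation for the sample mean of that sequence.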

Let $\bar y_{.k}$ denote the sample mean of the observations $y_{ik}$ in sequence $k$. From the above representation of $y_{ik}$ it follows that $E(\bar y_{.k}) = \bar Z_{.} + \gamma_k$; hence the contrast $c_1\bar y_{.1} - c_2\bar y_{.2}$, with $c_1 = c_2$ so that the basal mean $\bar Z_{.}$ cancels, is an unbiased estimate of the true difference $\Delta = c_1\gamma_1 - c_2\gamma_2$. Notice also that the average of any subset of the sample observations on sequence $k$ is likewise an unbiased estimate of the population mean $\bar Z_{.} + \gamma_k$, and consequently the differences of such sample averages are unbiased estimates of the true difference $\Delta$. This property suggests a natural way of constructing confidence intervals, as quantiles of the empirical distribution of differences of sample averages, as described in Section 3.
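The unbiasedness claim can be checked numerically by brute force. The sketch below (the function name is hypothetical) assumes scalar responses, two groups of fixed sizes m and n, and exact additivity, so that a control unit shows its basal value and a test unit shows its basal value plus Δ. It averages the difference of group means over all C(N, m) equally likely assignments, using exact fractions so that no rounding obscures the result.

```python
from fractions import Fraction
from itertools import combinations


def randomization_mean_difference(z, m, delta):
    """Average of (test-group mean - control-group mean) over all C(N, m) assignments.

    z     : basal responses Z_1, ..., Z_N (one per unit)
    m     : number of units assigned to the control group (n = N - m to test)
    delta : additive treatment shift applied to every test unit
    """
    n_units = len(z)
    n = n_units - m
    total, count = Fraction(0), 0
    for ctrl in combinations(range(n_units), m):
        ctrl_set = set(ctrl)
        x_sum = sum(Fraction(z[j]) for j in ctrl)                 # controls: Z_j
        y_sum = sum(Fraction(z[j]) + delta                        # tests: Z_j + delta
                    for j in range(n_units) if j not in ctrl_set)
        total += y_sum / n - x_sum / m
        count += 1
    return total / count


print(randomization_mean_difference([3, 1, 4, 1, 5, 9, 2], 3, Fraction(5, 2)))  # 5/2
```

Because every unit is equally likely to land in either group, the basal values average out and only Δ remains, whatever the Z's are.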

3 Construction of Confidence Intervals

Conceptually, we invert tests of hypotheses under the finite model setup. The one-to-one correspondence between the acceptance regions of tests and confidence sets shows that a confidence set is the totality of parameter values $\Delta$ for which the hypothesis $H(\Delta)$ is accepted when a sample is observed (see, for example, Section 3.5 of Lehmann (1997) for details in the case where each unit has a single response). This property may be used to construct confidence intervals from randomization (permutation) tests. A conceptually equivalent approach, however, is to use the empirical distribution of linear contrasts of the mean vector responses per sequence, computed over subsamples of the two samples, to obtain a confidence interval for the true difference of the parameters under consideration. Here the key property established in the previous section, that the difference in sample means is unbiased under the finite model setup, is the basis for the empirical distribution. We derive confidence intervals for the difference in treatment effects. The usual least squares estimates (or generalized least squares estimates under compound symmetry) of the model parameters may be shown to be unbiased for differences in direct treatment effects under randomization in the finite model with additivity. We may therefore use least squares to obtain the point estimate, and randomization theory applied to the residuals to obtain the confidence interval relative to that point estimate.

3.1 Inversion method under the finite model setup

3.1.1 Treatment Unit Responses

We consider scalar responses to illustrate the theory. The generalization to vector responses under the simple additivity of component treatment effects follows.

Suppose $X_1, X_2, \dots, X_m$ and $Y_1, Y_2, \dots, Y_n$ are scalar observations under the control treatment C and the test treatment T, respectively, with true population means $\gamma_1$ and $\gamma_2$. (In a crossover setting, the active treatment and the control appear in different periods in the two sequences.) Without loss of generality, we assume in what follows that the first $m$ units received C and the remaining $n$ units received T. Our interest is in a confidence interval for the true difference $\Delta = \gamma_2 - \gamma_1$. Note that, under the finite model setup with additivity, the X's and Y's are a realization of the original basal responses Z's in which the test responses are shifted by the amount $\Delta$, as laid out in the following table.

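| Unit | $U_1$ | … | $U_m$ | $U_{m+1}$ | $U_{m+2}$ | … | $U_{m+n}$ |
|---|---|---|---|---|---|---|---|
| Treatment C | $X_1$ | … | $X_m$ | | | | |
| Treatment T | | | | $Y_1$ | $Y_2$ | … | $Y_n$ |
| Basal representation | $Z_1$ | … | $Z_m$ | $Z_{m+1}+\Delta$ | $Z_{m+2}+\Delta$ | … | $Z_{m+n}+\Delta$ |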

If $\Delta$ were known, subtracting $\Delta$ from the Y's would put us in the null hypothesis situation of no treatment difference, and the resulting observations would be one of the $\binom{m+n}{m}$ possible (treatment group) realizations of the original Z's, restricted only by the observed sample sizes. Therefore, a natural way of finding a confidence interval for $\Delta$ with confidence coefficient $1 - \alpha$ is to find the totality of values $\Delta_0$ such that, after subtracting $\Delta_0$ from the Y's, the null hypothesis of no treatment difference is not rejected at significance level $\alpha$ by a randomization test based on the adjusted values. The lower and upper bounds of these values $\Delta_0$ then constitute a confidence interval, as noted earlier. The algorithmic steps can be summarized as follows.

Step 1. Choose an initial value $\Delta_0$ and compute the adjusted responses (the Y's minus $\Delta_0$) to create a "null" hypothesis situation.

Step 2. Compute the significance level by comparing the values of the test statistic over the rerandomizations of the adjusted values with the value obtained for the observed data from which $\Delta_0$ has been subtracted.

Step 3. Retain $\Delta_0$ as a confidence limit if the significance level is less than or equal to, and as close as possible to, the value corresponding to the desired confidence coefficient. Otherwise, choose a new $\Delta_0$ and repeat the above steps until the desired significance level is reached, perhaps by the method of bisection or the method of tangents.

The procedure may have to be carried out separately to obtain upper and lower confidence limits.

Note that, because the randomization distribution is necessarily discrete, it may not be possible to attain the exact value of $\alpha$.
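The steps above can be sketched in Python for scalar responses. This is an illustration under our own assumptions, not the author's code: the test statistic is taken to be the difference of group means (the text does not fix it), all C(m + n, m) reassignments are enumerated, each limit of a two-sided interval is placed where the corresponding one-sided p-value crosses α/2, and the limits are located by bisection. The function names are hypothetical.

```python
from itertools import combinations


def _rerandomized_stats(x, y, delta0):
    """Observed and rerandomized difference-of-means statistics for shift delta0.

    delta0 is subtracted from the test responses y to create the "null"
    situation; the adjusted values are pooled and every one of the C(m + n, m)
    reassignments to groups of sizes m and n is enumerated.
    """
    pooled = list(x) + [v - delta0 for v in y]
    m, n, total = len(x), len(y), sum(pooled)

    def stat(ctrl):
        s = sum(pooled[j] for j in ctrl)
        return (total - s) / n - s / m  # test-group mean minus control-group mean

    return stat(range(m)), [stat(c) for c in combinations(range(m + n), m)]


def p_upper(x, y, delta0, eps=1e-12):
    """Share of rerandomizations with statistic >= observed (small if delta0 is too low)."""
    t_obs, stats = _rerandomized_stats(x, y, delta0)
    return sum(t >= t_obs - eps for t in stats) / len(stats)


def p_lower(x, y, delta0, eps=1e-12):
    """Share of rerandomizations with statistic <= observed (small if delta0 is too high)."""
    t_obs, stats = _rerandomized_stats(x, y, delta0)
    return sum(t <= t_obs + eps for t in stats) / len(stats)


def inversion_ci(x, y, alpha=0.05, tol=1e-6):
    """Limits where each one-sided randomization p-value crosses alpha / 2."""
    n_assign = len(_rerandomized_stats(x, y, 0.0)[1])
    if 1.0 / n_assign > alpha / 2:
        raise ValueError("too few rerandomizations to attain alpha / 2")
    est = sum(y) / len(y) - sum(x) / len(x)
    values = list(x) + list(y)
    span = max(values) - min(values) + 1.0

    def limit(pfun, outside):
        inside = est  # assumed accepted (holds for equal group sizes)
        while pfun(x, y, outside) > alpha / 2:  # widen until H(outside) is rejected
            outside = est + 2 * (outside - est)
        while abs(outside - inside) > tol:  # bisection between accepted and rejected
            mid = (inside + outside) / 2
            if pfun(x, y, mid) > alpha / 2:
                inside = mid
            else:
                outside = mid
        return (inside + outside) / 2

    return limit(p_upper, est - span), limit(p_lower, est + span)
```

For the difference of means, the one-sided p-values are monotone in $\Delta_0$, which is what makes bisection valid here. With m = n = 6 each evaluation enumerates 924 reassignments; larger samples call for sampling rather than enumeration.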

Remark 1 The above algorithm is given mainly for pedagogical purposes. A logically equivalent and more efficient way of obtaining confidence intervals is to apply the empirical differences method described in Lehmann (1997) [5] within the finite model setup. Since each observation is an unbiased estimate of its population mean, so is the average of observations taken $r$ at a time, where $r \in \{1, 2, \dots, \min(m, n)\}$. In turn, the difference of such averages is an unbiased estimate of the difference $\Delta$ of the population means. Generate the distribution of all such possible differences; the quantiles $Q_{\alpha/2}$ and $Q_{1-\alpha/2}$ of this empirical distribution constitute a $1 - \alpha$ confidence interval for $\Delta$. Equivalently, generate the 'null' distribution of all such possible differences using the residuals from the least squares fit; the quantiles $Q_{\alpha/2}$ and $Q_{1-\alpha/2}$ of this 'null' empirical distribution, shifted by the point estimate, produce the same $1 - \alpha$ confidence interval for $\Delta$.
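A minimal sketch of Remark 1 for scalar responses follows. It assumes that every pair of equal-size subsets (one subset from each group) enters the distribution with equal weight, and that the least squares fit is the one-way fit, whose residuals are deviations from the group means. Under that fit each residual-based difference is exactly the corresponding raw difference minus the point estimate, so the two functions return the same interval. All function names are illustrative.

```python
import math
from itertools import combinations


def _mean(values):
    return sum(values) / len(values)


def subset_mean_differences(x, y):
    """mean(Y-subset) - mean(X-subset) for every pair of r-subsets, r = 1..min(m, n)."""
    diffs = []
    for r in range(1, min(len(x), len(y)) + 1):
        x_means = [_mean(s) for s in combinations(x, r)]
        y_means = [_mean(s) for s in combinations(y, r)]
        diffs.extend(ym - xm for ym in y_means for xm in x_means)
    return diffs


def empirical_quantile(values, p):
    """Smallest value v such that at least a fraction p of the values are <= v."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(p * len(ordered)) - 1)]


def empirical_difference_ci(x, y, alpha=0.05):
    """Quantiles Q_{alpha/2}, Q_{1-alpha/2} of the differences of subset averages."""
    d = subset_mean_differences(x, y)
    return empirical_quantile(d, alpha / 2), empirical_quantile(d, 1 - alpha / 2)


def residual_null_ci(x, y, alpha=0.05):
    """Same construction on one-way least squares residuals, shifted by the estimate."""
    mx, my = _mean(x), _mean(y)
    null = subset_mean_differences([v - mx for v in x], [v - my for v in y])
    est = my - mx
    return (est + empirical_quantile(null, alpha / 2),
            est + empirical_quantile(null, 1 - alpha / 2))
```

For m = n = 6 there are $\sum_{r=1}^{6}\binom{6}{r}^2 = \binom{12}{6} - 1 = 923$ subset pairs, so full enumeration is cheap at this size.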

Let us consider the replicate design with sequence 1, TRTR, and sequence 2, RTRT. Under additivity, the conceptual vector responses for patient $k$ under sequences 1 and 2 may be written

$$Z_{1k} = (u_{k1}, u_{k2}, u_{k3}, u_{k4}) + \tau(1, 0, 1, 0) \quad\text{and}\quad Z_{2k} = (u_{k1}, u_{k2}, u_{k3}, u_{k4}) + \tau(0, 1, 0, 1),$$

respectively, where $\tau$ is the additive difference between the test treatment T and the control R. Thus, if $\tau$ were known, subtracting $\tau(1, 0, 1, 0)$ from the vector response of unit $k$ under sequence 1, or correspondingly $\tau(0, 1, 0, 1)$ under sequence 2, would recover the basal (null) vector response of unit $k$, $U_k = (u_{k1}, u_{k2}, u_{k3}, u_{k4})$.

Under the assumption of a compound symmetry covariance structure, the best linear unbiased estimate of the direct treatment difference between treatments T and R, adjusting for possible first-order carryover effects, is given by the inner product

$$T_0 = C'(\bar Y_{1.} - \bar Y_{2.}) = \frac{1}{20}\,(6,\ -3,\ 4,\ -7)\,\big[(\bar y_{1.1}, \bar y_{1.2}, \bar y_{1.3}, \bar y_{1.4}) - (\bar y_{2.1}, \bar y_{2.2}, \bar y_{2.3}, \bar y_{2.4})\big]',$$

where $\bar y_{i.p}$ is the (scalar) component for period $p$ of the mean vector response over subjects in sequence $i = 1, 2$.


A $1 - \alpha$ confidence interval for $\tau$ is given by $(t_1, t_2)$ such that modifying the observed vector responses as described above, with $\tau$ equal to $t_1$ and $t_2$, results in observed significance levels for the statistic $T_0$ of $\alpha_1$ and $\alpha_2$, respectively, under the randomization distribution of the modified values, with $\alpha = \alpha_1 + \alpha_2$. As before, this is equivalent to generating the distribution of $T_0$ over all subsets taken $r$ at a time from each sequence, for $r = 1, 2, \dots, l$, where $l = \min(n_1, n_2)$, and selecting the appropriate quantiles. Alternatively, we may use the residuals from the least squares fit instead of the responses and shift the resulting 'null' distribution by the point estimate.
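The following Python sketch of the residual-based route for the TRTR/RTRT design rests on several assumptions of ours: each subject's response is a list of four period values; the contrast vector C is taken exactly as given above (it is not re-derived here); the 'null' vectors are the observed vectors with $T_0$ times the sequence's treatment pattern removed, which is how we read "residuals" in this additive setting; and all $\binom{n_1+n_2}{n_1}$ reassignments of subjects to sequences are enumerated, which is practical only for small samples. The names `t0`, `replicate_ci`, `D1` and `D2` are hypothetical.

```python
import math
from itertools import combinations

C = [6 / 20, -3 / 20, 4 / 20, -7 / 20]  # contrast vector for T0 as given in the text
D1 = [1, 0, 1, 0]  # periods on T in sequence 1 (TRTR)
D2 = [0, 1, 0, 1]  # periods on T in sequence 2 (RTRT)


def t0(seq1, seq2):
    """T0 = C'(mean vector of sequence 1 - mean vector of sequence 2)."""
    diff = [sum(v[p] for v in seq1) / len(seq1) - sum(v[p] for v in seq2) / len(seq2)
            for p in range(4)]
    return sum(c * d for c, d in zip(C, diff))


def replicate_ci(seq1, seq2, alpha=0.05):
    """Point estimate and 1 - alpha limits from the rerandomized 'null' vectors."""
    est = t0(seq1, seq2)
    # Remove the estimated direct effect to recover approximate basal vectors U_k.
    null = [[v - est * d for v, d in zip(vec, D1)] for vec in seq1]
    null += [[v - est * d for v, d in zip(vec, D2)] for vec in seq2]
    n_all, n1 = len(null), len(seq1)
    stats = []
    for s1 in combinations(range(n_all), n1):  # every reassignment of subjects
        chosen = set(s1)
        g1 = [null[j] for j in s1]
        g2 = [null[j] for j in range(n_all) if j not in chosen]
        stats.append(t0(g1, g2))
    stats.sort()
    k = len(stats)
    q_lo = stats[max(0, math.ceil(alpha / 2 * k) - 1)]
    q_hi = stats[max(0, math.ceil((1 - alpha / 2) * k) - 1)]
    return est, est + q_lo, est + q_hi  # quantiles shifted by the point estimate
```

With equal sequence sizes, swapping the two groups negates $T_0$, so the enumerated null distribution is symmetric about zero and the shifted quantiles give an interval symmetric about the point estimate.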

3.1.2 The Analysis of Covariance.

In situations where concomitant variables are considered to influence the response, the analysis of covariance model for the response of the $j$-th replicate unit on treatment $i$ is usually written

$$Y_{ij} = \mu + \alpha_i + X_{ij}\beta + \varepsilon_{ij},$$

with the ANCOVA table of sums of squares and products, and the resulting ANOVA, as follows:

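Sums of squares and products

| Source | df | xx | xy | yy |
|---|---|---|---|---|
| Treatments | $t-1$ | $T_{xx}$ | $T_{xy}$ | $T_{yy}$ |
| Residual | $N-t$ | $R_{xx}$ | $R_{xy}$ | $R_{yy}$ |
| Total | $N-1$ | $G_{xx}$ | $G_{xy}$ | $G_{yy}$ |

ANOVA

| Source | df | Sum of squares |
|---|---|---|
| X given Treatments | $p$ | $R_{xy}'R_{xx}^{-1}R_{xy}$ |
| Treatments given X | $t-1$ | $G_{yy} - R_{yy} - G_{xy}'G_{xx}^{-1}G_{xy} + R_{xy}'R_{xx}^{-1}R_{xy}$ |
| Error | $N-t-p$ | $R_{yy} - R_{xy}'R_{xx}^{-1}R_{xy}$ |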
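To make the table concrete, here is a small Python sketch for a single covariate ($p = 1$), where $R_{xx}^{-1}$ reduces to division by $R_{xx}$. The function name and the input layout, a list of hypothetical (x values, y values) pairs with one pair per treatment, are our own choices; the treatment row is obtained as total minus residual.

```python
def ancova_table(groups):
    """Sums of squares/products and the ANOVA rows for one covariate (p = 1).

    groups: list of (x_values, y_values) pairs, one pair per treatment.
    """
    xs = [a for gx, _ in groups for a in gx]
    ys = [b for _, gy in groups for b in gy]
    n_total, t, p = len(xs), len(groups), 1
    mx, my = sum(xs) / n_total, sum(ys) / n_total
    # Total (G) sums about the grand means
    gxx = sum((a - mx) ** 2 for a in xs)
    gxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    gyy = sum((b - my) ** 2 for b in ys)
    # Residual (R) sums about the treatment means
    rxx = rxy = ryy = 0.0
    for gx, gy in groups:
        ax, ay = sum(gx) / len(gx), sum(gy) / len(gy)
        rxx += sum((a - ax) ** 2 for a in gx)
        rxy += sum((a - ax) * (b - ay) for a, b in zip(gx, gy))
        ryy += sum((b - ay) ** 2 for b in gy)
    return {
        "Txx": gxx - rxx, "Txy": gxy - rxy, "Tyy": gyy - ryy,  # treatments = total - residual
        "Rxx": rxx, "Rxy": rxy, "Ryy": ryy,
        "Gxx": gxx, "Gxy": gxy, "Gyy": gyy,
        "b": rxy / rxx,  # within-treatment slope R_xx^{-1} R_xy
        "X|Treat": (p, rxy ** 2 / rxx),
        "Treat|X": (t - 1, gyy - ryy - gxy ** 2 / gxx + rxy ** 2 / rxx),
        "Error": (n_total - t - p, ryy - rxy ** 2 / rxx),
    }
```

As a check on the algebra, the "Treat|X" and "Error" sums of squares add up to $G_{yy} - G_{xy}^2/G_{xx}$, the residual after fitting X alone, while "X|Treat" and "Error" add up to $R_{yy}$.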

Our interest here is in a confidence interval estimate of the comparison of treatment effects after adjusting for the nuisance parameter $\beta$, estimated under least squares by $b = R_{xx}^{-1}R_{xy}$, which for scalar $b$ equals $\sum_{ij}(y_{ij} - \bar y_{i.})(x_{ij} - \bar x_{i.}) / \sum_{ij}(x_{ij} - \bar x_{i.})^2$. We note that, even under a single 'null' grouping, the true value of $\beta$ cannot in general be determined via the finite model with additivity. However, we could assign its value by convention to be $G_{xx}^{-1}G_{xz}$, where $Z$ is the conceptual population of responses under basal conditions.

The LS estimate of the comparison between two treatments $k$ and $k'$ from the ANCOVA model is then given by

$$a_k - a_{k'} = (\bar y_{k.} - \bar y_{k'.}) - (\bar x_{k.} - \bar x_{k'.})'\,b.$$

Expected Mean Squares (EMS) of the ANOVA under Randomization Theory

For the simple case of equal sample sizes for the two treatments, it may be shown under randomization theory that both estimates below are unbiased for the treatment difference:

$$E(a_k - a_{k'}) = \tau_k - \tau_{k'}, \qquad E(\bar y_{k.}) - E(\bar y_{k'.}) = \tau_k - \tau_{k'}.$$

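However, in the analysis of variance table the expected mean squares for treatments and for error are not equal under null conditions, since

$$E_R\big(R_{xy}'R_{xx}^{-1}R_{xy}\big) \ne G_{xz}'G_{xx}^{-1}G_{xz}\,\frac{N-t-p}{N-1-p} \quad\text{and}\quad E_R\big(R_{xx}^{-1}R_{xy}\big) \ne G_{xx}^{-1}G_{xz},$$

where $E_R$ denotes expectation over the randomization.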

Nevertheless, as the main interest is the comparison of treatment effects, the analysis of covariance model may be written in analysis of variance form:

$$w_{ij} = Y_{ij} - X_{ij}\beta = \mu + \alpha_i + \varepsilon_{ij}.$$

Conditional on the value of the nuisance parameter $\beta$, estimated by $b$, we can modify the observed responses by removing the nuisance effects and proceed, as in the analysis of variance above, to obtain the randomization confidence interval from the adjusted values $\{w_{ij}\} = \{y_{ij} - x_{ij}b\}$.

3.1.3 Confidence Intervals for the treatment effect in the Analysis of Covariance

As in the previous sections, the procedure for the analysis of covariance is equivalent to using the residuals from the least squares fit to generate the 'null' distribution of the treatment difference and shifting the resulting quantiles of that distribution by the point estimate.
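A minimal two-treatment sketch of this procedure with one covariate, under stated assumptions: the slope $b$ is the pooled within-treatment estimate and is held fixed (conditioning on $b$, as described above); the adjusted values are $w = y - bx$; the 'null' distribution comes from rerandomizing the one-way least squares residuals of $w$ over all assignments with the observed group sizes; and its quantiles are shifted by the point estimate. The function name is hypothetical.

```python
import math
from itertools import combinations


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def ancova_randomization_ci(xc, yc, xt, yt, alpha=0.05):
    """Randomization interval for the covariate-adjusted difference (test - control).

    The slope b = Rxy / Rxx (pooled within treatments) is held fixed, the adjusted
    values w = y - b x are formed, and the one-way least squares residuals of w are
    rerandomized to give the 'null' distribution, whose quantiles are shifted by
    the point estimate.
    """
    cx, cy, tx, ty = _centered(xc), _centered(yc), _centered(xt), _centered(yt)
    rxy = sum(u * v for u, v in zip(cx + tx, cy + ty))
    rxx = sum(u * u for u in cx + tx)
    b = rxy / rxx
    wc = [y - b * x for x, y in zip(xc, yc)]
    wt = [y - b * x for x, y in zip(xt, yt)]
    m, n = len(wc), len(wt)
    est = sum(wt) / n - sum(wc) / m  # equals the LS adjusted difference
    resid = _centered(wc) + _centered(wt)
    total = sum(resid)
    stats = []
    for ctrl in combinations(range(m + n), m):
        s = sum(resid[j] for j in ctrl)
        stats.append((total - s) / n - s / m)
    stats.sort()

    def q(p):
        return stats[max(0, math.ceil(p * len(stats)) - 1)]

    return b, est, est + q(alpha / 2), est + q(1 - alpha / 2)
```

Because $b$ is not re-estimated for each rerandomization, the sketch conditions on the nuisance estimate; re-estimating it inside the loop would be a different (unconditional) variant.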

The rerandomization approach may also be used to determine confidence intervals in binary situations where the link function is additive in the treatment effect.

3.1.4 Randomization Confidence Intervals for the treatment effect in binary logistic regression

For the case of two treatments, where the response rate is modeled by binary logistic regression, the model may be written approximately as

$$E(Y_{ij}) = \pi(X_{ij}) = \frac{\exp(\delta_i + X_{ij}\beta)}{1 + \exp(\delta_i + X_{ij}\beta)},$$

where $\beta$ may be considered a vector of nuisance parameters, and $\delta_i = 0$ for one treatment and $\delta_i = \delta$ for the other, so that $\delta$ is the effect of one treatment relative to the other. As in the analysis of covariance, the true value of $\beta$ cannot in general be determined via the finite model with additivity. However, we could estimate $\delta$ and $\beta$ jointly and obtain a confidence interval for $\delta$ that accounts for $X$, or is conditional on $\beta$.

Hirji, Mehta and Patel (1987) (6) and Mehta and Patel (1995) (6a) derived and examined maximum likelihood estimates and exact likelihood ratio estimates based on the sufficient statistics.

A minor modification of their approach enables a randomization confidence interval estimate for $\delta$ to be obtained, assuming the nuisance parameters $\beta$ are known and equal to their estimated values $b$. The limits are given by the quantiles of the distribution of solutions for $\delta$ from the likelihood function or estimating equations applied to the subsets of the data of sizes 1 to $m$, as described above, with $\beta$ set equal to $b$.

Each treatment subgroup needs both types of outcome to yield a solution for the treatment effect. Subgroups that violate this criterion may require an imposed modification of the relevant equations, or may be assigned null or extreme-value solutions for the treatment effect, depending on whether or not the outcome patterns are similar.
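The per-subset step can be illustrated with a small solver. This is a simplification of ours for illustration, not the author's equations: it assumes the covariates include the intercept, so that with $\beta$ fixed at $b$ the offsets $x_j'b$ are known and the treated subset's score equation alone determines $\delta$. The score $\sum_j \{y_j - \pi_j(\delta)\}$ is strictly decreasing in $\delta$, so bisection finds the root; a subset with only one type of outcome has no finite root, which is the situation flagged above. The function name `solve_delta` is hypothetical.

```python
import math


def _expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve_delta(y, offset, tol=1e-10):
    """Root in delta of sum_j (y_j - expit(delta + offset_j)) = 0 for one treated subset.

    offset_j = x_j' b, with the nuisance coefficients fixed at their estimates b.
    Returns None when the subset has only one type of outcome (no finite root).
    """
    s = sum(y)
    if s == 0 or s == len(y):
        return None

    def score(d):  # strictly decreasing in d
        return s - sum(_expit(d + o) for o in offset)

    lo, hi = -1.0, 1.0
    while score(lo) < 0:  # move lo down until the score is non-negative
        lo *= 2
    while score(hi) > 0:  # move hi up until the score is non-positive
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Applying this solver to each subset and taking quantiles of the solutions gives the kind of randomization interval described above; how subsets that return None are handled (null or extreme-value solutions) is a separate modeling choice.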

The coverage of this approach is to be compared with that of the exact confidence interval of Mehta and Patel.

4 Results and Conclusions

Replicate Designs

a)

For the sample sizes investigated, N = 12, 16 and 20, the coverage of our procedure under the finite model (permutations) equals the nominal value within the tolerance of a finite discrete distribution. The coverage of confidence intervals obtained from the usual crossover model with first-order carryover and normal assumptions under compound symmetry, and from the mixed model procedure recommended in the FDA guidance on bioequivalence, exceeds the nominal values.

b)

Sampling from an infinite model framework shows that our procedure produces coverage close to the nominal value. The coverage of confidence intervals obtained under the usual normal assumptions with compound symmetry is slightly below the nominal values, while the coverage from the mixed model procedure of the FDA guidance on bioequivalence slightly exceeds them.

c)

Preliminary work on coverage for the analysis of covariance was performed for small sample sizes (12, 16 and 20) under (i) the finite model (permutation) construction and (ii) the infinite model. The simulations used truncated pseudo-normal variables and various values of the slope. The results indicate that the coverage of our procedure is close to the nominal value for the 99%, 95% and 90% confidence intervals in ANOVA, and a little below the nominal values for the 95% and 90% confidence intervals in ANCOVA. The intervals in ANCOVA are narrower than those in ANOVA.

d)

Sampling from the finite model framework, the coverage of confidence intervals obtained using the usual normal-theory estimates is close to the nominal values. The finite model rerandomization procedure gives coverage equal to the nominal values for ANOVA and slightly less than the nominal values for the 95% and 90% confidence intervals in ANCOVA.


e)

Computation issues

As the sample sizes increase, the panoply of outcomes in the sample space becomes computationally prohibitive to enumerate. In application, given the data, we would estimate the confidence intervals from a weighted random sample of subsets of sizes 1 to m, as defined above. The weights are determined by the relative frequency of the subset sizes in the randomization sample space.
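One way to realize this weighting is sketched below, under our own reading of "relative frequency": the sample space is taken to be all pairs of equal-size subsets (one from each group), so size r occurs $\binom{m}{r}\binom{n}{r}$ times out of $\binom{m+n}{m} - 1$ in total, by Vandermonde's identity. Drawing r with that probability and then one subset uniformly from each group makes every subset pair equally likely, which matches the full enumeration of Remark 1. The function names are hypothetical.

```python
import random
from math import comb


def subset_size_weights(m, n):
    """Relative frequency of subset size r among all pairs of equal-size subsets."""
    counts = {r: comb(m, r) * comb(n, r) for r in range(1, min(m, n) + 1)}
    total = sum(counts.values())  # = comb(m + n, m) - 1 (Vandermonde's identity)
    return {r: c / total for r, c in counts.items()}


def sampled_subset_differences(x, y, draws, seed=0):
    """Monte Carlo version of the full enumeration: pick r by its weight, then
    one random r-subset from each group, and record the difference of averages."""
    rng = random.Random(seed)
    weights = subset_size_weights(len(x), len(y))
    sizes = rng.choices(list(weights), weights=list(weights.values()), k=draws)
    return [sum(rng.sample(y, r)) / r - sum(rng.sample(x, r)) / r for r in sizes]
```

The quantiles of the sampled differences then approximate those of the complete distribution, with accuracy governed by the number of draws.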

Monte Carlo Results for the Replicate Design.

The results given below are for illustrative purposes. The infinite model data were generated to reflect various degrees of inter-subject and intra-subject heterogeneity for the treatments. Results are based on approximately 2000 samples each.

In the finite model table, results were obtained for data that follow an additive finite model. The coverage examples for sample size 12:6-6* are based on the complete restricted set of 924 assignments of the 12 subjects to treatment groups of size 6.

For sample size 16:8-8 and 20:10-10 the results are based on 2000 random selections from the complete restricted set of assignments of the subjects to treatment groups of equal sizes.

Infinite model Results: Sample Size and Coverage (/2000)

| N:n1-n2 | Confidence | finite model | glm w. carry-over | mixed-fda |
|---|---|---|---|---|
| 12:6-6* | 99% | 98.90 | 98.90 | 99.10 |
| 12:6-6* | 95% | 95.20 | 94.10 | 96.70 |
| 12:6-6* | 90% | 90.60 | 89.40 | 92.80 |
| 16:8-8 | 99% | 99.05 | 98.15 | 99.30 |
| 16:8-8 | 95% | 94.74 | 93.30 | 95.35 |
| 16:8-8 | 90% | 89.30 | 87.35 | 90.95 |
| 20:10-10 | 99% | 98.55 | 98.35 | 99.20 |
| 20:10-10 | 95% | 94.10 | 92.95 | 95.35 |
| 20:10-10 | 90% | 89.65 | 87.25 | 90.90 |

Finite model Results: Sample Size and Coverage (/2000)

| N:n1-n2 | Confidence | finite model | glm w. carry-over | mixed-fda |
|---|---|---|---|---|
| 12:6-6 | 99% | 98.92 | 99.89 | 100.0 |
| 12:6-6 | 95% | 94.81 | 99.68 | 99.13 |
| 12:6-6 | 90% | 90.04 | 97.40 | 96.54 |
| 16:8-8 | 99% | 99.30 | 100.0 | 100.0 |
| 16:8-8 | 95% | 95.90 | 97.70 | 97.0 |
| 16:8-8 | 90% | 90.20 | 93.80 | 94.10 |
| 20:10-10 | 99% | 99.70 | 99.70 | 100.0 |
| 20:10-10 | 95% | 95.00 | 99.30 | 99.30 |
| 20:10-10 | 90% | 88.80 | 97.70 | 97.90 |


Monte Carlo Results for the ANOVA and ANCOVA.

Again the results given below are for illustrative purposes. The infinite model data were generated using a pseudonormal random generator truncated at 2 standard deviations from the means.

In the finite model table, results were obtained for data that follow an additive finite model. The data were derived using subsets of the samples above and unique rerandomization assignments of the units to treatment groups. The coverage examples for sample size 12:6-6* are based on the complete restricted set of 924 assignments of the 12 subjects to treatment groups of size 6. For sample sizes 16:8-8 and 20:10-10 the results are based on 2000 random selections from the complete restricted set of assignments of the subjects to treatment groups of equal sizes.

Finite model Table: Sample Size and Coverage (/2000)

| N:n1-n2 | Confidence | finite ANOVA | finite ANCOVA | GLM ANOVA | GLM ANCOVA |
|---|---|---|---|---|---|
| 12:6-6 | 99% | 99.03 | 99.13 | 98.81 | 99.81 |
| 12:6-6 | 95% | 94.81 | 92.64 | 96.10 | 95.24 |
| 12:6-6 | 90% | 90.60 | 89.40 | 89.93 | 89.83 |
| 16:8-8 | 99% | 99.00 | 98.50 | 98.90 | 99.00 |
| 16:8-8 | 95% | 94.04 | 91.70 | 93.75 | 94.65 |
| 16:8-8 | 90% | 89.60 | 86.65 | 88.50 | 89.65 |
| 20:10-10 | 99% | 99.15 | 99.05 | 99.15 | 99.35 |
| 20:10-10 | 95% | 95.15 | 94.70 | 95.15 | 95.70 |
| 20:10-10 | 90% | 90.65 | 88.60 | 90.70 | 90.7 |

Infinite model Table: Sample Size and Coverage (/1000)

| N:n1-n2 | Confidence | finite ANOVA | finite ANCOVA | GLM ANOVA | GLM ANCOVA |
|---|---|---|---|---|---|
| 12:6-6 | 99% | 99.30 | 97.90 | 99.40 | 99.00 |
| 12:6-6 | 95% | 95.40 | 90.50 | 95.30 | 94.40 |
| 12:6-6 | 90% | 90.00 | 85.20 | 89.70 | 88.20 |
| 16:8-8 | 99% | 99.10 | 99.00 | 99.10 | 99.00 |
| 16:8-8 | 95% | 94.90 | 93.60 | 94.90 | 94.80 |
| 16:8-8 | 90% | 89.30 | 85.40 | 89.50 | 88.80 |
| 20:10-10 | 99% | 99.20 | 98.20 | 99.30 | 98.80 |
| 20:10-10 | 95% | 95.90 | 92.80 | 95.90 | 94.80 |
| 20:10-10 | 90% | 91.05 | 88.00 | 91.20 | 90.30 |

References

1 Fisher, R. A. (1926). The Arrangement of Field Experiments. Journal of the Ministry of Agriculture of Great Britain, 33: 503-513.

2 Fisher, R. A. (1935). The Design of Experiments. Edinburgh: Oliver and Boyd.

3 Kempthorne, O. (1952). Design and Analysis of Experiments, 2nd ed. New York: John Wiley and Sons, Inc.

4 Kempthorne, O., and Folks, L. (1971). Probability, Statistics and Data Analysis. Ames, Iowa: Iowa State University Press.

5 Lehmann, E. L. (1997). Testing Statistical Hypotheses, 3rd ed. New York: John Wiley and Sons, Inc.

6 Hirji, K. F., Mehta, C. R., and Patel, N. R. (1987). Computing Distributions for Exact Logistic Regression. Journal of the American Statistical Association, 82.

6a Mehta, C. R., and Patel, N. R. (1995). Exact Logistic Regression: Theory and Examples. Statistics in Medicine.

7 Richards, W., and Gogate, J. (2000).