Need for Large Sample Sizes in Randomized Trials


COMMENTARIES 569


Prevention is an area of relevance and concern to all health professionals, in particular, pediatricians. This concept encompasses primary prevention of disease (such as diphtheria, pertussis, tetanus, and poliomyelitis) among healthy individuals as well as secondary prevention or the reduction in risks of complications, recurrences, or mortality among those already affected. It is unlikely that a new measure will have as dramatic an effect as did the poliomyelitis vaccine, a prevention measure, which reduced the incidence of paralytic disease in the vaccinated group more than 50% compared with children given placebo. Analogously, virtually none of the new therapeutic measures of promise is likely to have as clear-cut an effect as did penicillin, which decreased mortality from pneumococcal pneumonia approximately sixfold (from about 95% to 15%). For the vast majority of interventions, the most plausible effects will be small to moderate in size, on the order of a 10% to 30% difference.1

Despite the relatively small size of the likely effects, for common outcomes or diseases such reductions in risk of development or recurrence would have a major impact upon the health of the general public. The problem is that such small but clinically worthwhile effects are difficult to detect reliably.2 Simple clinical observation is often useful to generate research questions, but because it is usually based on the experience of a series of cases without an appropriate comparison group, it is not possible to test hypotheses with such data. Observational analytic studies can be used to test hypotheses but are limited by the fact that the magnitude of their inherent biases, as well as uncontrolled baseline differences in patient or disease characteristics (confounding), may easily be as large as the postulated effect of any agent or procedure. Thus, an intervention study or clinical trial is the most powerful tool available to detect reliably such small to moderate effects and can provide the strongest and most direct evidence to judge whether an association is one of cause and effect. If, first, the treatments are allocated at random and, second, the sample size is sufficiently large, a clinical trial can provide a degree of assurance about the validity of findings that is simply not possible with any other epidemiologic design option.3

570 PEDIATRICS Vol. 79 No. 4 April 1987

With respect to the first of these, randomization is the preferred method of treatment allocation in any clinical trial because it has the unique advantage that the study groups that are formed will be, on average, comparable with respect to all variables except for the intervention being tested. This is especially important because baseline characteristics that differ between the treatment groups and also affect the risk of developing the outcome would confound the relationship between exposure and outcome. With randomization, not only will all recognized confounding variables be equally distributed but so will all potential confounders that are unknown or unsuspected by the investigator because of limitations of biologic knowledge at the time the trial is initiated. The known confounders could be controlled for in the design or analysis of studies, whether observational or intervention. In contrast, the only possible way to achieve control of the influence of unknown confounders is through randomization. The fact that randomization works “on average” implies that the larger the sample size, the more successful the randomization process will be in distributing confounding factors equally among the groups. When the sample size is sufficiently large, randomization will result in a level of comparability between the study groups that cannot be achieved with an observational study design.
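The "on average" behavior of randomization can be illustrated with a small simulation. The sketch below is not from the commentary: it assumes a hypothetical binary baseline characteristic with a 30% prevalence and simple coin-flip allocation to two groups, and the function name `mean_imbalance` is invented for illustration. It estimates the typical absolute difference in prevalence between the two groups at several trial sizes.

```python
import random

def mean_imbalance(n_participants, prevalence=0.30, n_trials=300, seed=0):
    """Average absolute between-group difference in the prevalence of a
    hypothetical binary baseline characteristic under simple randomization.
    The 30% prevalence is an illustrative assumption."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(n_trials):
        counts = [0, 0]    # participants allocated to each group
        carriers = [0, 0]  # participants with the characteristic in each group
        for _ in range(n_participants):
            group = rng.randrange(2)  # coin-flip allocation
            counts[group] += 1
            if rng.random() < prevalence:
                carriers[group] += 1
        if 0 in counts:
            continue  # skip the rare simulated trial with an empty group
        total += abs(carriers[0] / counts[0] - carriers[1] / counts[1])
        used += 1
    if used == 0:
        raise ValueError("no simulated trial had participants in both groups")
    return total / used

# Imbalance in the baseline characteristic at increasing trial sizes.
for n in (20, 200, 2000):
    print(n, round(mean_imbalance(n), 3))
```

Under these assumptions the average imbalance shrinks as the number of randomized participants grows, which is the sense in which larger trials make the groups more comparable.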

The second requirement, that the trial be of sufficient sample size, is related, in part, to the success of randomization as a means of controlling confounding. More important is the ability of the study to minimize the probability that observed differences are due merely to the play of chance, by ensuring adequate statistical power to detect a difference between treatment groups if one truly exists. Although sample size must be addressed early in the planning stage of any analytic epidemiologic study, it has particular importance in a clinical trial. Occasionally, clinicians with a primary concern for the care of individual patients may erroneously tend to favor the conduct of trials that collect large amounts of detailed data on small numbers of individuals. In fact, the adequate testing of any research question requires the collection of less detailed information on large samples of subjects. Observational analytic study designs can reliably study only large effects, so that sample sizes need not usually be as great. In contrast, a trial must have a sufficient sample size to detect reliably the likely small to moderate differences between treatment groups. The major danger in a trial of insufficient sample size is that it will produce not just a null result but an uninformative null finding. In this case, it will be unclear whether there is truly no difference between the treatment groups or whether the sample size was merely insufficient to detect the small to moderate effect. Such trials may, in fact, have the potential for great scientific harm if their results are misinterpreted as demonstrating that an intervention has no effect when in fact the sample size was not sufficient to provide an informative null result. Even if an investigator feels confident that a new intervention will have a large benefit (ie, a 50% or greater reduction in the primary end point), it is preferable to design a trial to test the more likely small to moderate benefit. In this circumstance, if a large effect emerges, the trial can be stopped early. On the other hand, if the trial is designed to find only a large effect, there will be little power to detect smaller but nonetheless clinically important differences.4,5
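A rough calculation makes the contrast between large and moderate effects concrete. The sketch below uses the standard normal-approximation formula for comparing two proportions, which is a textbook method and not one specified in this commentary. The 10% control-group risk, the two-sided alpha of 0.05, the 90% power, and the function names are all illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p_control, relative_reduction, alpha=0.05, power=0.90):
    """Approximate participants needed per group to compare two proportions
    (normal approximation, equal allocation, two-sided test)."""
    p_treated = p_control * (1 - relative_reduction)
    p_bar = (p_control + p_treated) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    numerator = (z_alpha * null_sd + z_beta * alt_sd) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)

def approx_power(p_control, relative_reduction, n_group, alpha=0.05):
    """Approximate power of a trial with n_group participants per group,
    obtained by solving the same formula for the power term."""
    p_treated = p_control * (1 - relative_reduction)
    p_bar = (p_control + p_treated) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    z = (abs(p_control - p_treated) * sqrt(n_group) - z_alpha * null_sd) / alt_sd
    return NormalDist().cdf(z)

# Hypothetical 10% risk of the end point in the control group.
n_large_effect = n_per_group(0.10, 0.50)     # sized for a 50% reduction
n_moderate_effect = n_per_group(0.10, 0.20)  # sized for a 20% reduction
print(n_large_effect, n_moderate_effect)

# Power of the trial sized for a 50% reduction if the true reduction is 20%.
print(round(approx_power(0.10, 0.20, n_large_effect), 2))
```

Under these assumptions a trial sized for a 50% reduction needs several hundred participants per group, while one sized for a 20% reduction needs several thousand, and the smaller trial has low power if the true effect is only moderate.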

From a practical standpoint, to achieve a sufficient sample size, investigators often spend much time and effort attempting to simply increase the total number of participants enrolled. However, the effective sample size, and, consequently, the power of the trial, is proportional not simply to the number of participants enrolled but more specifically to two critical factors: the total number of end points experienced by the participants and the difference in compliance between the treatment groups.6 With respect to the accumulation of sufficient end points, two major strategies can be considered: first, selecting a high-risk population for study and, second, ensuring an adequate duration of follow-up. To select individuals at increased risk of developing the outcomes of interest, one might consider factors such as age, sex, geographic area, or personal or family history. To ensure adequate duration of any planned follow-up period, one should always consider the possibility that the actual rate of accrual of end points will be less than projected. This is not unusual in clinical trials and may occur for reasons beyond the control of the investigators. Those who volunteer to participate in intervention studies are a self-selected group who also tend to experience generally lower numbers of end points than those who do not take part, regardless of the hypothesis being studied or the randomized treatments allocated. The most effective means to deal with this problem is to extend the duration of follow-up of the trial.
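The two strategies for accumulating end points can be compared with simple arithmetic. The sketch below assumes a constant, independent annual risk for every participant and no losses to follow-up, which is a deliberate simplification. The target of 400 end points, the annual risks, and the function names are hypothetical values chosen for illustration.

```python
from math import ceil

def expected_end_points(n_participants, annual_risk, years):
    """Expected number of participants with at least one end point, assuming
    a constant annual risk and complete follow-up (both simplifications)."""
    cumulative_risk = 1 - (1 - annual_risk) ** years
    return n_participants * cumulative_risk

def participants_needed(target_end_points, annual_risk, years):
    """Participants required to expect a target number of end points."""
    cumulative_risk = 1 - (1 - annual_risk) ** years
    return ceil(target_end_points / cumulative_risk)

target = 400  # hypothetical number of end points judged sufficient

# Strategy 1: select a higher-risk population (3% versus 1% annual risk).
print(participants_needed(target, 0.01, 3), participants_needed(target, 0.03, 3))

# Strategy 2: extend follow-up (6 years versus 3 years at 1% annual risk).
print(participants_needed(target, 0.01, 3), participants_needed(target, 0.01, 6))

# If volunteers turn out to have half the projected risk, the same enrollment
# yields roughly half the end points unless follow-up is extended.
n_enrolled = participants_needed(target, 0.01, 3)
print(round(expected_end_points(n_enrolled, 0.005, 3)))
```

Either a higher baseline risk or a longer follow-up reduces the enrollment needed for the same number of end points under these assumptions.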

The second major consideration concerning sample size is the effect of compliance. High compliance in the study population is crucial to obtaining a valid result. The effect of noncompliance in any participant, regardless of treatment assignment, is to make the intervention and comparison groups more alike, which has the result of decreasing the ability of the study to detect any true differences between the groups. By definition, an intervention study requires the active participation and cooperation of all study subjects. After agreeing to participate, subjects in a trial of medical therapy may deviate from the protocol for a variety of reasons, including developing side effects, forgetting to take their medication, or simply changing their minds regarding participation. Analogously, in a trial of surgical therapy, those who were randomized to one group may choose to obtain the alternative treatment. In addition, there will be instances in which participants will not or cannot comply, such as when consent is withdrawn after randomization or the condition of a randomized patient rapidly worsens to the point where therapy becomes contraindicated. Consequently, the problem of achieving and maintaining high compliance is an issue in the design and conduct of all clinical trials.
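The way noncompliance makes the groups more alike can be sketched numerically. The example below assumes that a participant who abandons the assigned treatment experiences the full risk of the other group, which is a simplification, and it uses a commonly quoted approximation in which the required sample size grows by the inverse square of the retained fraction of the effect. The risks, the noncompliance proportions, and the function names are hypothetical.

```python
def observed_risks(p_control, p_treated, drop_out=0.0, drop_in=0.0):
    """Risks seen when every randomized participant is analyzed in the group
    to which they were allocated. drop_out is the proportion of the treated
    group that stops the intervention; drop_in is the proportion of the
    comparison group that obtains it. Noncompliers are assumed to carry the
    full risk of the other group (an illustrative simplification)."""
    obs_treated = (1 - drop_out) * p_treated + drop_out * p_control
    obs_control = (1 - drop_in) * p_control + drop_in * p_treated
    return obs_control, obs_treated

def inflation_factor(drop_out=0.0, drop_in=0.0):
    """Approximate multiplier on sample size needed to keep the same power."""
    return 1.0 / (1 - drop_out - drop_in) ** 2

# Hypothetical true risks: 10% in the comparison group, 8% with treatment.
obs_control, obs_treated = observed_risks(0.10, 0.08, drop_out=0.20, drop_in=0.10)
print(round(obs_control - obs_treated, 4))  # diluted difference
print(round(inflation_factor(0.20, 0.10), 2))
```

In this sketch the observed difference equals the true difference multiplied by one minus the sum of the two noncompliance proportions, so modest noncompliance in both groups can roughly double the number of participants required.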

The extent of noncompliance in any trial is related to the length of time that participants are expected to adhere to the intervention, as well as to the complexity of the study protocol. There are a number of possible strategies that can be adopted to try to enhance compliance among the participants in a trial. Selection of a population of individuals who are both interested and reliable is crucial to enhancing compliance rates. Other ways of attempting to increase compliance include frequent contact with participants by home or clinic visit, telephone, or mail; the use of calendar packs of study medication, in which each pill is labeled with the day it is to be taken; and the use of incentives such as detailed medical information not ordinarily available from their usual source of health care.

The higher the degree of compliance with the offered program, the greater the extent to which observed differences between those allocated to alternative therapies reflect real differences in the effects of the treatments themselves. Thus, compliance levels must be measured, which in most cases is not easy. All of the measures available to estimate compliance have inherent limitations. The simplest measure is a self-report. In fact, for some interventions, such as exercise programs or dietary modifications, this may be the only practical way to assess compliance. In trials of pharmacologic agents, pill counts have been used, where participants bring unused medication to each clinic visit or return it to the investigators at specified intervals. Although this method may eliminate inaccuracies due to poor memory, it assumes that the subject has ingested all medication that has not been returned to the clinic. A more objective means of assessing compliance, which is also expensive and logistically difficult, is the use of biochemical measurements to validate self-reports. Frequently, the presence of active drugs or metabolites can be detected in either blood or urine by laboratory procedures. In cases in which drugs or metabolites are difficult to measure, or for subjects who are taking a placebo, a safe and inert biochemical marker can be added to the treatment. Laboratory determinations are limited, however, in that they usually reflect only whether medication was taken in the preceding day or two and, thus, cannot be used as a reliable measure for long-term compliance.
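A pill count translates into a compliance estimate with one line of arithmetic. The sketch below is an illustration only: the function name and the example counts are hypothetical, and it carries the same built-in assumption noted above, that every pill not returned was actually ingested.

```python
def pill_count_compliance(dispensed, returned, prescribed):
    """Estimated proportion of prescribed doses taken over an interval.
    Assumes every unreturned pill was ingested, which is the main
    limitation of the method."""
    if prescribed <= 0:
        raise ValueError("prescribed must be positive")
    if not 0 <= returned <= dispensed:
        raise ValueError("returned must be between 0 and dispensed")
    return (dispensed - returned) / prescribed

# Hypothetical visit: 100 pills dispensed, 90 doses prescribed for the
# interval, 28 pills brought back to the clinic.
print(pill_count_compliance(dispensed=100, returned=28, prescribed=90))
```

A participant who discards pills instead of returning them would appear fully compliant by this measure, which is why biochemical validation is sometimes added.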

Inevitably, some proportion of participants in a trial will become noncompliant despite all reasonable efforts. In such instances, maintaining any level of compliance is preferable to complete noncompliance. Moreover, because every randomized subject, compliant or noncompliant, should be included in the analysis of an intervention study, it is essential to obtain as complete follow-up information as possible on those who have discontinued the treatment program. Investigators should pursue follow-up data on outcome for such individuals for the duration of the trial in a manner identical with that for subjects who continue to comply.

One strategy to maximize compliance is the implementation of a run-in or “wash-out” period prior to actual randomization. All participants receive either the active treatment or placebo for a number of weeks or months before formal randomization to a treatment group. This permits those who have difficulty adhering to the intervention program or those perceiving adverse effects to withdraw before randomization without affecting the validity of the study. Such a strategy seems particularly attractive in primary prevention trials or secondary prevention studies in which it is not necessary for a randomized treatment to begin during or immediately following an acute event.

The ultimate goal of all these methodologic considerations is to design a randomized trial that can clearly prove or refute the hypothesis being tested. To be cost-effective, a randomized trial must provide either a definite positive result on which public policy can be based or an informative null result that could then safely permit the rechanneling of resources to other areas of research. Based on all these considerations, well-designed, large-scale randomized trials, which maintain high compliance and can be conducted at low cost, should play an increasing role in determining the efficacy and safety of primary and secondary preventive measures.

CHARLES HENNEKENS, MD

JULIE E. BURING, DSc

Departments of Medicine and Preventive Medicine and Clinical Epidemiology

Harvard Medical School

Brigham and Women’s Hospital

Brookline, MA

REFERENCES

1. Hennekens CH, Buring JE: Epidemiology in Medicine. Boston, Little, Brown and Co, in press, 1987

2. Yusuf S, Collins R, Peto R: Why do we need large, simple randomized trials? Statistics Med 1984;3:409-420

3. Hennekens CH: Issues in the design and conduct of clinical trials. JNCI 1984;73:1473-1476

4. Peto R, Pike MC, Armitage P, et al: Design and analysis of randomized clinical trials requiring prolonged observation of each patient: I. Introduction and design. Br J Cancer 1976;34:585-612

5. Peto R, Pike MC, Armitage P, et al: Design and analysis of randomized clinical trials requiring prolonged observation of each patient: II. Analyses and examples. Br J Cancer 1977;35:1-38

6. Buring JE, Hennekens CH: Sample size and compliance in randomized trials, in Sestili MA, Dell JG (eds): Chemoprevention Clinical Trials. Problems and Solutions, 1984, NIH publication No. 85-2715. 1985, Bethesda, MD, National Institutes of Health, pp 7-11

CHARLES HENNEKENS and JULIE E. BURING

Pediatrics 1987;79;569