Does This Work?

COMMENTARIES

Opinions expressed in these commentaries are those of the authors and not necessarily those of the American Academy of Pediatrics or its Committees.

The published reports of uncontrolled trials have begun to include an emollient statement in the final paragraphs of the discussion, qualifying the conclusions by drawing attention to the need for these promising findings to be confirmed in a large randomized trial. At first sight, this seemed a hopeful sign, a predictor of imminent conversion to the laudable practice of randomizing the first patient, but gradually it became clear to me that the apparent change of attitude was not matched by any comparable change in behavior. New treatments, procedures, operations, diagnostic aids, and other interventions continue to follow the classic opening stages in the career of a medical innovation, from promising report through to standard procedure,¹ before the fundamental question (does this work?) has been answered. The emollient statement is a matter of form, an escape clause.

Why is the question so rarely answered? Here are some of the stumbling blocks.

EXPERIMENTING OR TREATING: THE IMPOSITION OF DOUBLE STANDARDS

In 1982, the physical therapy department of an Australian teaching hospital, in conjunction with the division of obstetrics, began to treat breast engorgement in the puerperium with therapeutic ultrasound. The basis for this treatment was the analogy with other inflammatory conditions that appear to respond well to ultrasound. Satisfaction with the treatment among those providing it and among those receiving it was very high. Ultrasound treatment of breast engorgement soon became a standard procedure (J. Milne, personal communication, 1984). Note that, because this was defined not as research but as treatment, the initial proposal did not need to go to the hospital’s research committee or to the hospital’s ethics committee; nor was written, informed consent by the patients required; no formal treatment protocol was defined; no systematic assessment of the outcome was required; exclusion criteria (the decision, for example, not to treat women with silicone implants) were developed on an ad hoc basis. When, 2 years later, the physical therapists decided that it was necessary to answer the question “does this work?”, all those constraints applied to research projects quite rightly came into operation. They were not applied beforehand because it was deemed to be treatment, not research. Applying an experimental treatment to all eligible women is acceptable; applying it to half of them is problematic.

The pressure to transform an innovation into a standard procedure quickly is reinforced by litigation.² Because defense of a practice rests so strongly on its being normative, rather than on its being effective and acceptably safe, clinicians are best served by rapid and widespread deployment of innovations so that they cease to be experimental as quickly as possible. One unfortunate side effect of the process of peer review is the speeding up of institutional and regional adoption.

WITHHOLDING EFFECTIVE TREATMENT: SOME UNETHICAL DECISIONS

Uncontrolled trials that end up in the medical literature are significantly more likely to have positive findings (“it works!”) than are adequately controlled trials.³ When a randomized trial is being planned in the aftermath of an uncontrolled trial, the question of withholding effective treatment is sure to be an issue. It may even be an issue before any trial has been carried out. For example, a planned randomized trial of periconceptional micronutrient supplementation in women who had previously given birth to an infant with a neural tube defect [...] about the effectiveness and safety of supplementation. The necessary, subsequent, large, randomized trial barely survived the public and professional outcry over withholding effective treatment. As commentators on another contentious issue once wrote in this journal: “Since few people [now] feel totally neutral about [X], a truly randomized trial becomes difficult.”⁵

BEING OUT-DATED: THE TECHNOLOGIC IMPERATIVE

An Australasian friend on study leave in America took her children to the Smithsonian Museum, where she was startled to see on display the identical model CT scanner that her own health region at home had just ordered. The machine was already obsolete before she had a chance to use it! When a new patient care device is introduced, the best interests of the manufacturer are served if the innovation is deployed quickly and widely. Access to state-of-the-art equipment is also deeply satisfying to its users. Once a patient care device has been acquired, it is in the interests of even more people (hospital administrators, for example) for it to be widely used: this justifies the purchase and may, in fee-for-service settings, recoup some or all of the capital costs. Conspiracy theories are unnecessary. The interests of almost everyone are in harmony: they are for new equipment to become standard as quickly as possible, the strategy of fait accompli. In this familiar situation, any well-designed, randomized trial that has gotten past the ethics committee, research committee, and funding barriers is likely to be evaluating a piece of equipment on the verge of replacement. Trial findings will then be treated as of historical interest only. If it did not work, so what? Something newer, which is almost by definition something better, is at hand.

CONSENTING ADULTS: PATIENTS AND STAFF

The organizers of a controlled trial in which the intervention took the form of simplified advice about how to use the contraceptive diaphragm were dismayed to note after 6 months that hardly anyone had been enrolled in the experimental arm of the trial. Enquiries confirmed that the clinic nursing staff were far from enthusiastic about simplified advice: they believed in exhaustive detail for effective prevention of pregnancy. Because this controlled trial did not use random allocation to groups, but used instead the final digit of the clinic record number, staff members could see at a glance which group the client would be in. Clients who would have received the simplified advice were not asked to take part in the trial. Only those whose clinic card number would have entitled them to standard advice were asked to participate and enrolled in the trial (E. Weisberg, personal communication, 1985).
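
The failure described above comes down to an allocation rule that staff could predict. As an illustration only, and not the procedure used in the diaphragm trial, the sketch below contrasts a final-digit rule with a list of randomly permuted, balanced blocks that could be prepared in advance and held away from recruiting staff. The digit-to-arm mapping, the block size, and all function names are assumptions made for the example.

```python
import random


def arm_from_final_digit(record_number):
    """Predictable rule: the arm is readable straight off the clinic card.

    The even/odd mapping is hypothetical; the source does not say which
    digits led to which arm.
    """
    return "simplified advice" if record_number % 2 == 0 else "standard advice"


def permuted_block_list(n_blocks, block_size=4, seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even to balance two arms")
    rng = random.Random(seed)
    allocations = []
    for _ in range(n_blocks):
        block = ["simplified advice", "standard advice"] * (block_size // 2)
        rng.shuffle(block)  # order within each block is unpredictable
        allocations.extend(block)
    return allocations


if __name__ == "__main__":
    # Staff can work out the arm before deciding whether to invite the client.
    print(arm_from_final_digit(10482))
    # A list like this would be held by someone outside the clinic.
    print(permuted_block_list(n_blocks=2, seed=1987))
```

Random generation alone would not prevent selective enrollment; the list also has to stay concealed until the client has been enrolled.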

A recent trial to determine the best method of delivery for very low birth weight infants began only after extensive discussions with the hospital’s ethics committee over 6 months about informed consent. When the trial was prematurely terminated, the main problem had turned out to be, not patient consent, but staff consent. Almost two thirds of the first 25 eligible patients had been withdrawn from the trial by the consultant obstetricians.⁶ Although the obstetric staff had agreed to the necessity for a trial and to the trial design, their decision making during the 5 months the trial was in operation implied that they were already irrevocably convinced that most very low birth weight infants benefit from cesarean delivery. This was shown in particular by the high rate of withdrawal for elective cesarean section: 48% of those eligible for randomization. This way of reconciling perceived conflict between the roles of scientific investigator and personal physician (agreeing to a trial in principle and then failing to enroll patients for it) has been described for other forms of surgery.⁷

In relation to staff consent, the trial organizers failed to realize the difference between this trial and earlier successful ones in the same setting of fetal heart rate monitoring. The latter had worked well, with few withdrawals or losses of eligible subjects, even though the obstetricians had always had the right to withdraw any patient if they believed they should. However, the earlier trials had all dealt with diagnosis, not treatment. This trial dealt with treatment and, moreover, a comparison between a surgical and a nonsurgical treatment. As current trials of the management of breast cancer have shown, it is exceptionally difficult for surgeons to regard “more” and “less” surgery as ethically equivalent. Staff consent is an underestimated and underresearched component of trial design and management.

OVERESTIMATING EFFECTIVENESS: WHY TRIALS ARE TOO SMALL

Statisticians and epidemiologists criticize clinicians for carrying out trials that are too small, and [...] beliefs about effectiveness are usually based on previous reports from uncontrolled trials.

Second is the “dilution factor.” Preliminary studies of a new technology that are carried out on patients meeting certain preset criteria typically recruit people with extreme conditions. For example, a pilot study of uterine activity monitoring to predict preterm labor had an entry criterion of at least one previous midtrimester abortion or stillbirth before 28 weeks’ gestation or preterm labor before 36 weeks’ gestation. In fact, 12 of the 15 subjects (80%) had at least two such losses in their past reproductive history. In a subsequent randomized trial using the same entry criteria, only 28% had two or more previous losses (R. Bell, personal communication, 1984). The probability of preterm delivery in the trial subjects was significantly less than that predicted by the pilot study.

The dilution factor in an uncontrolled evaluation will tend to give a favorable impression of the intervention because this is being applied to an increasingly low-risk group. The only exception occurs when an intervention has major side effects; otherwise, “the more overused is any given ineffective treatment, the higher the percentage of cases where there appears to be a good result.”⁸ In a controlled trial, the dilution factor leads to a better outcome than expected for both groups and contributes to making the size of the trial inadequate.
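
The arithmetic behind the dilution factor can be sketched as a weighted average. The shares of subjects with two or more previous losses (80% in the pilot study, 28% in the randomized trial) come from the example above; the two risk figures are invented purely for illustration and are not results from either study.

```python
def expected_event_rate(share_higher_risk, risk_higher, risk_lower):
    """Overall event rate in a group mixing higher- and lower-risk subjects."""
    return share_higher_risk * risk_higher + (1 - share_higher_risk) * risk_lower


# Hypothetical risks of preterm delivery (assumptions, not study data).
RISK_TWO_OR_MORE_LOSSES = 0.40
RISK_FEWER_LOSSES = 0.15

pilot_rate = expected_event_rate(0.80, RISK_TWO_OR_MORE_LOSSES, RISK_FEWER_LOSSES)
trial_rate = expected_event_rate(0.28, RISK_TWO_OR_MORE_LOSSES, RISK_FEWER_LOSSES)

if __name__ == "__main__":
    print(f"pilot: {pilot_rate:.2f}, trial: {trial_rate:.2f}")
```

With the same entry criteria and the same per-subject risks, the overall event rate falls simply because the mix of subjects has shifted toward lower risk.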

In obstetric perinatal trials, a third factor is the uncritical use of the term “high-risk patients” to describe a heterogeneous group with assorted complications of pregnancy and different prognoses, plus the automatic assumption that such high-risk patients could expect a perinatal mortality severalfold greater than hospital or regional rates.

This assumption is wrong, as demonstrated by two trials of fetal heart rate monitoring (Table). The first of these did not include any preterm infants; therefore, not surprisingly, the combined perinatal mortality of experimental and control subjects, despite their assorted complications and high-risk status, was only about six per 1,000. The second trial, also in high-risk patients, was restricted to those admitted to the hospital antenatally. The gestation at delivery ranged from 26 to 42 weeks, and the trial participants had a worse perinatal outcome than those in the earlier trial, and worse than the outcome for the whole region. However, their outcome was significantly better than that for all women who gave birth at the same hospital within the same 2 years. The probable explanation is the inadvertent but inevitable exclusion of what might be called “ultrahigh-risk” patients from the trial: those who were admitted to the hospital in established preterm labor or with complications necessitating immediate delivery and who thus went straight to the labor ward. Infants born before 28 weeks are particularly likely to have been excluded in this way.

TABLE. Perinatal Death Rates in High-Risk Subjects Compared With Hospital and Regional Rates*

Perinatal Mortality Rate (deaths/1,000 total births)

Subjects                                         Trial A        Trial B
                                                 (1973-1974)    (1977-1979)
All trial participants†                          5.7            24.5
All hospital (Level III) patients, same years    27.9           28.1
State, same years                                22.2           16.5

* Trial A involved intrapartum fetal monitoring and trial B, antenatal cardiotocography.⁹,¹⁰
† Experimental and control groups combined.

The high-risk fallacy, like the dilution factor, tends to overestimate the effectiveness of uncontrolled interventions. If, instead of the trials shown in the Table, there had been implementation of intrapartum (A) and antenatal (B) monitoring as standard procedures for the same defined high-risk groups, the results might have been reported as “monitored, high risk (5.7 per 1,000); nonmonitored, low risk (27.9 per 1,000)” in A and “monitored, high risk (24.5 per 1,000); nonmonitored, low risk (28.1 per 1,000)” in B. Such unsophisticated interpretations are not unknown in the perinatal literature.

When the probable perinatal mortality of a group is overestimated, any randomized trial will be too small to detect a clinically important effect of the intervention. One of the strengths of the trials of fetal monitoring has been to establish reasonable estimates of the possible effectiveness of monitoring, estimates that are different from those of 10 years ago. In retrospect, it is easy to decide how many patients were really needed.¹¹
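
To show how sensitive trial size is to the assumed baseline mortality, here is a minimal sketch using the usual normal-approximation formula for comparing two proportions. The baseline rates are the hospital and trial A figures from the Table; the target of halving mortality, the 5% two-sided significance level, and the 80% power are assumptions chosen for illustration, not design parameters reported for either trial.

```python
import math
from statistics import NormalDist


def subjects_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to compare two proportions.

    Normal-approximation formula for a two-sided test with equal arms:
    n = (z_{alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_treated) ** 2)


# Baseline taken from the hospital-wide rate vs the rate observed in trial A.
n_assumed = subjects_per_group(27.9 / 1000, 27.9 / 2000)
n_observed = subjects_per_group(5.7 / 1000, 5.7 / 2000)

if __name__ == "__main__":
    print(n_assumed, n_observed)
```

Under these assumptions the requirement rises from under 2,000 to more than 8,000 subjects per group, so a trial sized on the hospital-wide rate would be several times too small for the population actually recruited.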

COLLABORATING: OLD AND NEW CONSIDERATIONS

Because perinatal death is a rare event, randomized trials in perinatal medicine, using mortality as the prime outcome of interest, need to be large. One or two institutions are capable of mounting trials of adequate size by themselves, but most trials need to be multicentered. Unfortunately, most physicians have been educated in a style that maximizes competition rather than collaboration. Venerable institutions too, which may be competing for patients, prestige, or research funds, often regard neighboring venerable institutions in a light that makes “collaboration” a particularly unfortunate word.

[...] National Perinatal Epidemiology Unit in Britain), the practical difficulties of mounting a multicenter trial can overwhelm even those who have crossed the psychologic barriers mentioned above. The other essential facilitator is adequate funding for the coordination and monitoring aspects of the trial, something that has yet to be seen as a priority by national research-funding bodies, at least in relation to perinatal research.

The newest threat to collaboration in the United States is the possibility that trial organizers will be sued on the grounds of restraint of free trade.¹² Ophthalmologists have already been through this mill regarding their proposed trial of radial keratotomy. The two elements that make legal action a possibility seem to be collaborative, as opposed to single-investigator, trials and trials in which the experimental intervention is available only to those who agree to take part in the trial. The former is relevant because it may be construed as evidence of a conspiracy, and the latter on the grounds of restricting access by patients and professionals to a specific mode of treatment. No example of such legal action in the perinatal area has been reported, but a multicenter trial of chorionic villous sampling like those in Canada and Denmark, where chorionic villous sampling is available only within the trial, would surely be vulnerable.

STRUGGLING WITH THE CHAGRIN FACTOR AND COPING WITH UNCERTAINTY

Clinicians, it is said, choose their actions not according to the elegant probabilities and utilities of decision analysis but by identifying a result that will lead to major chagrin and avoiding the decisional option that may lead to that result.¹³ The chagrin associated with the death of an infant seems to be ameliorated if the infant has, before death, received all possible treatment. To avoid chagrin and self-reproach, decisions are likely to be those that involve doing something: in the current terminology, aggressive action. This harmonizes with the well-known medical phenomenon of judging sins of omission much more harshly than sins of commission, as attendance at any hospital perinatal mortality meeting or regional review of avoidable factors in perinatal deaths will confirm. Receiving all possible treatment is, of course, incompatible with withholding anything, even unproven therapies.

In addition, perinatal care, especially neonatal intensive care, has changed so rapidly in the past decade that current practitioners feel able to dismiss anything that happened last year as irrelevant to modern methods of treatment. There are two implications of this response. One is that any slowing down of the rate of change, an inevitable consequence of evaluating each innovation before widespread adoption, is likely to be intolerable for the frontierspeople. The other implication is that the past has no lessons for the present: a pity, considering the history of neonatal care.

Lessons from the present are difficult to learn. The high mortality, morbidity, and disability among preterm infants ensure that any specific ill effects of an innovation will be difficult to disentangle, especially if the procedure is used widely or used mostly in the sickest infants. Recent examples are the difficulties of determining whether the association between prenatal ritodrine and learning problems¹⁴ and the association between low-dose heparin for catheter patency and intraventricular hemorrhage¹⁵ are causal. That practices of such long standing can suddenly be linked with extraordinarily disturbing outcomes must alarm any outsider. As for insiders, if it is generally true that coping with uncertainty is a major source of stress for staff in neonatal intensive care units,¹⁶ it goes some way to explain why the explicit recognition of doubt and uncertainty that is expressed in the design and conduct of randomized trials is so hard to accept.

ANSWERING THE WRONG QUESTION: HIDDEN AGENDAS

A randomized trial of weekly antenatal fetal heart rate monitoring in high-risk women (Table, trial B) was unable to detect any reduction in major morbidity when monitoring was added on to the previous usual care.¹⁰ Yet, the trial totally failed to influence the use of this technology in the hospital where it was carried out. In the year after the trial ended, the use of antenatal fetal heart rate monitoring increased 16-fold. The paradoxical response to the failure to detect any benefits of its use in high-risk patients was an extension of its use to women with fewer or no complications, plus its more frequent use in high-risk groups.

The reason for this paradoxical response must remain speculative, but the most likely explanation is that the technology was not being used for the objective that had been assumed and investigated. Its use was not to reduce perinatal mortality and morbidity, or to reduce untimely interventions such as induction of labor or cesarean section, but to provide the obstetrician with reassurance about the continuing well-being of the fetus. The answer was correct, but the question was irrelevant.

SPEAKING PLAINLY

[...] “survivor.” It will be difficult to rename as “research projects” all innovations and established procedures for which the question “does this work?” has not been answered. However, the word “new” should always be preceded by “experimental,” because “new” often implies “improved.” We need a less negative word than “negative” to summarize the findings of “negative” trials and a better way of describing randomization than “random.”

We speak of “subjecting” interventions to a randomized trial as if it were a cruel and unusual punishment. It may be that for the trial organizers (no one ever made any friends by carrying out a controlled trial), but for the innovation it should be, and perhaps one day will be, the standard procedure. Identifying the stumbling blocks is another step on that path.

JUDITH LUMLEY, MA, MBBS, PHD
Department of Paediatrics
Monash University
Queen Victoria Medical Centre
Melbourne

REFERENCES

1. McKinlay JB: From ‘promising report’ to ‘standard procedure’: Seven stages in the career of a medical innovation. Milbank Mem Fund Q 1981;59:374-411

2. Avery ME, Chernik V: On decision making surrounding drug therapy: A continuing dilemma. N Engl J Med 1977;296:102-103

3. Sacks HS, Chalmers TC, Smith H: Sensitivity and specificity of clinical trials: Randomized v historical controls. Arch Intern Med 1983;143:753-755

4. Wald NJ: Neural-tube defects and vitamins: The need for a randomized clinical trial. Br J Obstet Gynaecol 1984;91:516-523

5. Hobbins JC, Freeman R, Queenan JT: The fetal monitoring debate. Pediatrics 1979;63:942-951

6. Lumley J, Lester A, Renou P, et al: A failed RCT to determine the best method of delivery for very low birth weight infants. Controlled Clin Trials 1985;6:120-127

7. Anonymous: Consent: How informed. Lancet 1984;1:1445-1447

8. Hendricks CH: The case for nonintervention in preterm labor, in Elder MG, Hendricks CH (eds): Preterm Labor. London, Butterworths, 1981, pp 93-123

9. Renou P, Chang A, Anderson I, et al: Controlled trial of fetal intensive care. Am J Obstet Gynecol 1976;126:470-476

10. Lumley J, Lester A, Anderson I, et al: A randomized trial of weekly antenatal cardiotocography in high-risk obstetric patients. Br J Obstet Gynaecol 1983;90:1018-1026

11. Detsky AS, Sackett DL: When was a clinical trial big enough? How many patients you needed depends on what you found. Arch Intern Med 1985;145:709-712

12. Rose M, Leibenluft RF: Antitrust implications of medical technology assessment. N Engl J Med 1986;314:1490-1493

13. Feinstein AR: The ‘chagrin factor’ and qualitative decision analysis. Arch Intern Med 1985;145:314-317

14. Hadders-Algra H, Touwen B, Huisjes HJ: Follow-up of children exposed to ritodrine. Br J Obstet Gynaecol 1986;93:156-161

15. Lesko SM, Mitchell AA, Epstein MF, et al: Heparin use as a risk factor for intraventricular hemorrhage in low-birth-weight infants. N Engl J Med 1986;314:1156-1160

16. Astbury J, Yu V: Determinants of stress for staff in neonatal intensive care units. Arch Dis Child 1982;57:108-111

Neonatal Neurosonography

By this time, we should expect that the ultrasonic diagnosis of periventricular hemorrhage should be definitive with demonstration of an echodense patch in an ependymal region that deforms the adjacent ventricular contour in its early stages (Fig 1). The combined experience with transfontanel viewing, which has spanned some four generations of ultrasonic equipment, convincingly demonstrates that the selection and operation of an instrument can have a profound effect on the diagnosis of acute neonatal intracranial pathology.

Visualizing intracranial hemorrhage is primarily a contrast or “gray scale” resolution task, not a matter of spatial or detail resolution. That is, the hemorrhage must stand out as different in some way from the surrounding parenchyma. In the low megahertz range, subependymal and parenchymal hemorrhages are more reflective than “normal” cortex or subcortical gray matter. That contrast gradient is affected by biologic factors including the size of the hemorrhage, local water content, and, probably, tissue perfusion. Several instrument factors are equally important, principally the spectral frequency composition and spatial configuration of the pulse at its interaction depth. The final image is a combination of interdependent tissue and imaging system variables. If the contrast gradient is too low or “noise” too high, an area of hemorrhage will be indistinguishable from the surrounding tissue. Conversely, germinal matrix tissue is itself echodense (Fig 2) and may be misinterpreted as hemorrhage if the gradient is intensified by the gray scale manipulation. Likewise, focal hemorrhages are less often recognized by ultrasound in cerebellum than their incidence¹ because high background reflectivity obscures detection. Organizational changes within germinal matrix or other hemorrhage can be observed (Fig 3). Distinction of bloody and clean CSF is another contrast-dependent task (Fig 4).

Another concern is “side lobe” artifacts² by which a strong reflector in one part of the field is [...]

JUDITH LUMLEY. Does This Work? Pediatrics 1987;79;1040

Updated Information & Services, including high resolution figures, can be found at:
http://pediatrics.aappublications.org/content/79/6/1040

American Academy of Pediatrics. All rights reserved. Print ISSN: 1073-0397.