The Effect of Open- vs. Closed-Book Testing on Performance on a Multiple-Choice Examination in Pediatrics


Charles F. Schumacher, Ph.D., Diane W. Butzin, Laurence Finberg, M.D., and Fredric D. Burg, M.D.

From the Departments of Psychometrics and Graduate and Continuing Medical Evaluation, National Board of Medical Examiners, Philadelphia; the American Board of Pediatrics, Philadelphia; and the Department of Pediatrics, Montefiore Hospital and Medical Center, Albert Einstein College of Medicine of Yeshiva University, New York

ABSTRACT. A study was undertaken to test the effect of open- vs. closed-book testing conditions on performance on a graduate-level, multiple-choice examination in pediatrics. A group of practicing pediatricians and a group of medical students took the examination. For the practice group, no significant difference between mean scores was observed, and the correlation between scores under the two testing conditions was high. In the student group, however, the mean score was significantly higher under open-book conditions and the correlation between scores under the two testing conditions was positive but low. The mean score obtained by practitioners was significantly higher than the mean score obtained by students under both testing conditions. The effects of time limit and level of motivation were not explored in the present study. Pediatrics 61:256-261, 1978, recertification examination, open-book testing.

Most American specialty boards have now made the decision to evaluate their current diplomates periodically for purposes of recertification. A number of alternative methods for accomplishing this evaluation are being considered, including the administration of a written examination. If examination is adopted as one means of evaluation for recertification purposes, a number of important questions arise regarding both the type of examination to be developed and the conditions under which this examination should be administered.

The present study was undertaken at the request of the American Board of Pediatrics (ABP) and the American Academy of Pediatrics (AAP) to investigate one of the latter questions which has been raised on several occasions by members of the ABP. Specifically, it has been suggested that if the ABP were to administer an examination as one method for reevaluating current diplomates, such an examination should not measure simply the examinee’s ability to recall information that is available in pediatric textbooks. Following from this premise, it has further been suggested that examinees be given an opportunity to refer to textbooks to find whatever factual information they might need in order to answer questions or solve problems posed by the examination. In brief, some members of the ABP and the AAP feel that a recertification examination should be administered under “open-book” conditions rather than under the traditional “closed-book” conditions that normally apply for certification examinations.

Received October 28; revision accepted for publication November 15, 1977.

ADDRESS FOR REPRINTS: (D.W.B.) The American Board of Pediatrics, Inc., Children’s Hospital of Philadelphia, 34th Street and Civic Center Boulevard, Philadelphia, PA

One general question that has arisen from the discussion of this suggestion is, does it matter whether an examinee takes the test under open-book conditions or closed-book conditions? More specifically, do examinees taking an examination under open-book conditions obtain higher scores, on the average, than these same examinees taking a similar examination under closed-book conditions? Does open- vs. closed-book testing produce the same effect for practicing physicians as it does for medical students? Regardless of the effect on overall level of performance, do examinees rank-order themselves differently when tested under open-book conditions vs. closed-book conditions?

These questions led to the generation of the following hypotheses:

1. A group of practicing pediatricians given the opportunity to refer to textbooks in general pediatrics while taking a multiple-choice examination in pediatrics (open-book testing) will obtain a higher mean score on this test than they will on a similar examination taken without benefit of any reference materials (closed-book testing).

2. A group of third- and fourth-year medical students, tested under the same conditions described above for practicing pediatricians, will also obtain a higher mean score under open-book testing than under closed-book testing.

3. A group of practicing pediatricians will obtain higher mean scores on a multiple-choice examination in pediatrics than a group of third- and fourth-year medical students when both groups are tested under the same conditions, either open-book or closed-book.

4. For both practicing pediatricians and medical students there will be a high positive correlation between scores obtained on a multiple-choice examination taken under open-book conditions and scores obtained on a similar examination taken under closed-book conditions.

The present study was undertaken to test these hypotheses and to gather information about the opinions of examinees taking the test under open- vs. closed-book conditions.

Review of the Literature

Although occasional editorials in favor of open-book exams have appeared,1,2 the literature contains little experimental evidence to support or refute open-book examinations.

Stalnaker and Stalnaker3 reported that scores on open-book college exams were the same as scores achieved the previous year in a closed-book format. Studies by Kalish,4 Marco,5 and Jehu et al.6 indicate that allowing books or notes in college exams does not improve performance. These three studies were conducted in systems where exams were generally in a closed-book format. In a study of students in a medical school operating under an open-book system, Krarup et al.7 found that performance on a physiology exam was slightly better on recall items but no better on the total exam using an open-book format. Michaels and Kieren8 found that secondary school students in mathematics achieved high scores on open-book exams for knowledge and comprehension items, but not for application items.

Feldhusen9 and Marco5 reported that students believed open-book exams to be less anxiety-provoking. Michaels and Kieren8 found that secondary school students were significantly less anxious in the closed-book setting. The results of Jehu et al.6 indicate that access to notes is accompanied by anxiety reduction during, but not before, the exam. Krarup et al.,7 on the other hand, observed that students in an open-book system felt that the open-book exam was more difficult and more stressful.

A study by Mankin et al.10 was designed to study the effect of allowing orthopaedic residents to take an examination in an unproctored, permissive environment. While the study did not test directly the effect of using texts in an examination, the results do provide some answers to this question, since 56% of the examinees in the unproctored examination did use reference texts or journals while none of the examinees in the proctored setting did so. Performances of the unproctored group were significantly better for recall questions and for interpretation questions, but not significantly different for problem-solving questions. Interestingly, although the majority of the residents indicated they enjoyed the unproctored setting, a significant number indicated they preferred the proctored exam and were in many cases confused by different opinions in their reference sources.

None of these studies involved examinations in pediatrics and none involved examinees in a continuing education mode. These limitations plus the lack of conclusive evidence on the value of open-book exams suggested a need for the study described here.

METHOD

Overview

The hypotheses to be tested in this study required the development of two examinations that were essentially equivalent with respect to content, the identification of groups of practitioners and medical students willing to participate in the study, the administration of the tests to examinees under both open- and closed-book conditions, and the analysis of the resulting data.

Instruments

TABLE I

PERFORMANCE OF PRACTICE GROUP ON FIRST ABP WRITTEN EXAMINATION

                          No. of Participants
Quartile
  1st                     28
  2nd                     28
  3rd                     31
  4th                      4
Year of first exam
  1941-1950                4
  1951-1960               29
  1961-1970               55
  1971-1972                3

To obtain a rough index of the extent to which the two resulting forms were equivalent statistically, a mean difficulty index and a mean discrimination index (biserial r) were calculated for each form, using item statistics that had been generated from the 1975 ABP candidate group (mean difficulty, P: .7517 vs. .7566; mean biserial r: .2886 vs. .2937).
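The two item statistics named above can be illustrated with a minimal sketch. It assumes the conventional definitions (difficulty as the proportion answering correctly, and the biserial r computed from the pass/fail group means and the normal ordinate); the paper does not give its formulas, and the response data and function names below are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(item_correct):
    """Difficulty index P: proportion of examinees answering the item correctly."""
    return mean(item_correct)


def biserial_r(item_correct, total_scores):
    """Biserial correlation between a 0/1 item and the total test score.

    Assumes the textbook formula r_bis = ((M1 - M0) / s) * (p * q / h),
    where h is the standard normal ordinate at the point cutting off area p.
    Requires 0 < p < 1 (at least one pass and one fail).
    """
    p = mean(item_correct)
    q = 1.0 - p
    passed = [s for c, s in zip(item_correct, total_scores) if c == 1]
    failed = [s for c, s in zip(item_correct, total_scores) if c == 0]
    sd = pstdev(total_scores)
    h = NormalDist().pdf(NormalDist().inv_cdf(p))
    return (mean(passed) - mean(failed)) / sd * (p * q / h)


# Hypothetical responses of eight examinees to one item, with their total scores.
item = [1, 1, 1, 0, 1, 0, 1, 1]
totals = [78, 71, 60, 64, 74, 62, 66, 80]

print("P     =", item_difficulty(item))
print("r_bis =", round(biserial_r(item, totals), 3))
```

Averaging these two quantities over all items in a form would give form-level indices of the kind compared above.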

In addition, the items in each form were classified by a physician-member of the study team with respect to the cognitive process that was apparently being measured by the item (recall of information or problem solving: .84 to .16 vs. .82 to .18).

Both of these analyses suggested that the two forms were similar enough to each other to be used for purposes of this study. Exact equivalence was not required since the assignment of test forms to testing conditions was counterbalanced for each examinee group. However, if the two forms were grossly different from each other, an additional source of unwanted variability would have been introduced into the analyses, complicating both the analyses and the interpretation of the results.

In addition to the testing instrument described above, a brief questionnaire was also developed to obtain the examinee’s opinion about his/her own performance under the open- vs. closed-book testing conditions, to determine which reference texts were used, to determine the extent to which the examinee was satisfied with these references, and to ascertain the manner in which the references were used (whether it was to find answers or to confirm answers).

Participants

The practice group for this study was identified and recruited by members of the ABP and ABP official examiners in accordance with a set of general guidelines that were provided by the study team. In September 1975, each Board member and examiner was contacted by mail, given an outline of the protocol for the study, and asked to recruit at least five practicing pediatricians in his/her community who had been certified by the ABP not later than 1972. In addition, each recruiter was asked to select potential participants in such a way as to provide a range of competence in this group, from individuals having average and borderline competence to those identified as outstanding.

A total of 123 practitioners agreed to participate in the study. This group was later reduced to 96 individuals for reasons described below. To obtain a rough indication of the extent to which the participant group included a range of competence, the records on file with the ABP for these 96 practitioners were analyzed to determine the distribution of scores they obtained on their first attempt on the ABP written examination. Records were available for 91 of these participants and the results of this analysis are shown in Table I. These data suggest that the practice group was indeed heterogeneous with respect to competence as measured by the ABP certifying examination, and also heterogeneous with respect to number of years since the first certifying exam was taken.

The third- and fourth-year student groups were recruited from classes at eight medical schools in the eastern and midwestern regions of the country. Most of these students were volunteers who were paid a nominal stipend to participate in the study. A total of 223 students were tested, of whom 100 were included in the analyses. This study group was selected as described below.

Test Administration

All examinations were administered under controlled conditions either by an ABP official examiner (practice group) or a medical school faculty member (student group) at a single testing session in which the two forms of the test were given in consecutive two-hour periods. In a few instances practitioners were permitted to take the examinations on two consecutive evenings. To control for possible undetected differences in the two test forms and to control for the sequence in which the tests were taken, the administration of forms according to testing conditions was counterbalanced for each examinee group. The number of examinees tested according to sequence of testing conditions and test forms that were used are summarized in Table II.

TABLE II

NUMBER OF EXAMINEES IN EACH SUBGROUP

                     Form A Open, Form B Closed       Form B Open, Form A Closed
Sequence             Practice Group  Student Group    Practice Group  Student Group
Open-book first      24              65               40              44
Closed-book first    27              52               32              62

been tested in each of the four subgroups shown in Table II. The original study design called for a total of 25 students in each subgroup. In addition, it was important that the study group should not come predominantly from a single school. Therefore, it was decided that any one school should provide less than half of the students in the study group. Finally, to prevent any differences in performance that might be related to a specific school from influencing results, it was essential to have the same number of students from each school in all four of the student subgroups shown in Table II.

To meet these conditions, three schools were dropped from the sample and some students were eliminated at random from the remaining five schools. This selection process yielded a study group of 100 students.

In selecting the practice group to be included in the analyses, it was also considered essential that each of the practice subgroups shown in Table II be represented by an equal number of examinees. It was necessary to eliminate 12 practitioners because they did not take the exam under the sequence prescribed. After these had been eliminated, subgroup 1 with 24 examinees was the smallest of these groups. Therefore, examinees were excluded at random from each of the other subgroups until those groups also contained 24 subjects each. Thus, the total number of practitioners included in the analyses was reduced from 123 to 96.
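The random exclusion described above amounts to sampling each subgroup down to the size of the smallest one. The following is a minimal sketch of that idea, not the procedure actually used: the subgroup sizes after the 12 eliminations are not reported (only that the smallest held 24), so the sizes, labels, examinee identifiers, and seed below are hypothetical.

```python
import random


def balance_subgroups(subgroups, seed=0):
    """Randomly retain members so every subgroup matches the smallest one.

    subgroups: dict mapping a subgroup label to a list of examinee ids.
    Returns a new dict; the input is left unchanged.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    target = min(len(members) for members in subgroups.values())
    return {
        label: sorted(rng.sample(members, target))
        for label, members in subgroups.items()
    }


# Hypothetical post-elimination sizes summing to 123 - 12 = 111.
sizes = {
    "A open first": 24,
    "B open first": 35,
    "A closed first": 25,
    "B closed first": 27,
}
practice = {
    label: [f"{label}-{i}" for i in range(n)] for label, n in sizes.items()
}

balanced = balance_subgroups(practice)
print({label: len(members) for label, members in balanced.items()})
print("total retained:", sum(len(m) for m in balanced.values()))
```

With four subgroups of 24, the retained total is 96, matching the number of practitioners analyzed.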

Statistical Analyses

In order to test the first three hypotheses posed by this study, a two-way repeated measures analysis of variance was performed. The independent variables were practitioners vs. students and open- vs. closed-book conditions. The dependent variable was raw score on the examination.

To test hypothesis 4, a product-moment correlation coefficient was obtained between scores on the test taken under open-book conditions and scores obtained under closed-book conditions for the practice group and for the student group, separately.

TABLE III

RAW SCORE MEANS AND STANDARD DEVIATIONS

Group                       Mean     SD
Students (N = 100)
  Open-book                 61.18     8.73
  Closed-book               54.29     8.21
Practitioners (N = 96)
  Open-book                 68.32    12.22
  Closed-book               67.27    11.73

TABLE IV

ANALYSIS OF VARIANCE RESULTS

Source                                    df     Mean Square    F
Uncorrelated data
  Groups (students vs. practitioners)       1      9,919.79     52.30*
  Error                                   194        189.67
Correlated data
  Conditions (open vs. closed)              1      1,544.64     63.77*
  Interaction (groups X conditions)         1        834.57     34.45*
  Error                                   194         24.22

* Significant at .01 level.
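As a reading aid for Table IV, each F value is the ratio of an effect's mean square to the error mean square of its own stratum (between-subjects error for groups, within-subjects error for conditions and the interaction). The short sketch below recomputes those ratios from the tabled mean squares; the variable names are illustrative only, and small differences in the second decimal are expected because the published mean squares are themselves rounded.

```python
def f_ratio(ms_effect, ms_error):
    """F statistic as the ratio of effect mean square to error mean square."""
    return ms_effect / ms_error


MS_ERROR_BETWEEN = 189.67  # error term for the uncorrelated (groups) effect
MS_ERROR_WITHIN = 24.22    # error term for the correlated effects

# (mean square, matching error term, F as printed in Table IV)
effects = {
    "groups": (9919.79, MS_ERROR_BETWEEN, 52.30),
    "conditions": (1544.64, MS_ERROR_WITHIN, 63.77),
    "interaction": (834.57, MS_ERROR_WITHIN, 34.45),
}

for name, (ms, ms_error, reported) in effects.items():
    f = f_ratio(ms, ms_error)
    print(f"{name:12s} recomputed F = {f:.2f}, reported F = {reported:.2f}")
```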

RESULTS

Tables III and IV show the outcome of the repeated-measures analysis of variance to test for differences between examinee groups and examination conditions.

The first noteworthy finding from this analysis was a highly significant interaction effect between groups and conditions. This interaction indicates that the effects of the two testing conditions were not the same for the student group as they were for the practice group. Inspection of the mean scores that were obtained by the two groups shows that the student group achieved a noticeably higher mean score when the test was administered under open-book conditions (61.2) than they did when the test was given under closed-book conditions (54.3). A t test for correlated measures indicated that this difference was indeed significant at the .01 level.

The practice group obtained mean scores of 68.3 and 67.3 under the open- and closed-book conditions, respectively. This difference was not significant at the .05 level, again using a t test for correlated measures.
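The correlated-measures t tests can be approximated from the published summary statistics alone. The sketch below assumes the standard identity for the standard deviation of paired differences, s_d = sqrt(s1^2 + s2^2 - 2 r s1 s2), and uses the means and SDs of Table III together with the open/closed correlations reported in the Results (.41 for students, .84 for practitioners). Because those inputs are rounded, the resulting t values are approximations rather than the authors' computed statistics.

```python
from math import sqrt


def paired_t_from_summary(mean1, sd1, mean2, sd2, r, n):
    """Approximate paired t statistic from means, SDs, correlation, and n."""
    sd_diff = sqrt(sd1 ** 2 + sd2 ** 2 - 2.0 * r * sd1 * sd2)
    standard_error = sd_diff / sqrt(n)
    return (mean1 - mean2) / standard_error


# Open-book vs. closed-book, values taken from Table III and the Results.
t_students = paired_t_from_summary(61.18, 8.73, 54.29, 8.21, r=0.41, n=100)
t_practice = paired_t_from_summary(68.32, 12.22, 67.27, 11.73, r=0.84, n=96)

print(f"students      t(99) = {t_students:.2f}")
print(f"practitioners t(95) = {t_practice:.2f}")
```

Under these assumptions the student statistic lies far above the two-tailed .01 critical value (roughly 2.63 for 99 df), while the practitioner statistic falls below the .05 critical value (roughly 1.99 for 95 df), in line with the conclusions stated above.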

TABLE V

SUMMARY OF QUESTIONNAIRE RESPONSES

                                                        % Responding
Question                                       Practitioners    Students
                                               (N=96)           (N=100)
1. I think I did better
   On the closed-book exam                     31               17
   On the open-book exam                       32               51
   No difference                               31               26
   No response                                  6                6
3. Now that I have seen the questions,
   I am satisfied with the texts I selected    74               86
   I wish I had selected a different text      15                6
   No response                                 11                8
4. When I did use the text,
   I used it primarily to find an answer       51               71
   I used it primarily to confirm an answer    42               23
   Other                                        6                3
   No response                                  1                3

an examination of the type that was developed for this study is given to medical students under open-book conditions, the average student performs better than when the test is administered under closed-book conditions. However, when the test is administered to practicing pediatricians, average scores are essentially the same under both testing conditions.

A second finding of note was a highly significant difference between the mean scores obtained by students and those obtained by practitioners under both testing conditions. Because of the significant interaction effect noted earlier, the difference between student and practitioner mean scores was also tested for each condition separately, using a t test for independent samples. The results of these analyses indicated that the mean score obtained by practitioners was significantly higher (.01 level) than the mean score obtained by students for both the open- and closed-book examinations. Thus, the results tend to confirm hypothesis 3, i.e., as might be expected, on an examination that was originally designed for purposes of specialty board certification, the average practicing pediatrician scored significantly higher than the average student.
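The independent-samples comparisons can likewise be approximated from Table III. The sketch below assumes the pooled-variance form of Student's t test; the paper does not state which form was used, so this is an illustration rather than a reproduction of the original analysis.

```python
from math import sqrt


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Approximate pooled-variance t statistic for two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error


# Practitioners (N = 96) vs. students (N = 100), values from Table III.
t_open = pooled_t_from_summary(68.32, 12.22, 96, 61.18, 8.73, 100)
t_closed = pooled_t_from_summary(67.27, 11.73, 96, 54.29, 8.21, 100)

print(f"open-book   t(194) = {t_open:.2f}")
print(f"closed-book t(194) = {t_closed:.2f}")
```

Both statistics exceed the two-tailed .01 critical value for 194 df (roughly 2.60), consistent with the reported result, and the gap between groups is visibly larger under closed-book conditions.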

Product-moment correlation coefficients that were obtained for each group between scores obtained under the two testing conditions were .84 for the practice group and .41 for the student group. Thus, regarding hypothesis 4, it appears that the rank-ordering of practitioners was quite similar for the two different testing conditions, whereas a considerable difference in rank-order occurred for the student group.

Table V contains a summary of the opinions expressed by the practitioners and students in response to the questionnaire that was completed after the examination had been administered.

The opinions expressed by the two groups in response to the first question (Do you think you did better on the open-book or on the closed-book portion of the test, or was there no difference?) were surprisingly consistent with the actual examination performance of the two groups. The practice group was about equally divided in their response to this question, whereas a majority of the students felt that they had done better on the open-book portion of the test.

A large majority of the participants in each group were satisfied with the textbooks that they had selected. However, the few who wished they had selected different texts were more likely to have been practitioners than students.

Responses to the questions regarding the manner in which textbooks were used again differed rather sharply between the practice group and the student group. A large percentage of students (71%) used textbooks primarily to find answers, whereas the practice group were about equally divided between those who used texts primarily to find answers and those who used texts primarily to confirm answers. This difference may well be due to the fact that the examination was drawn from a test that was designed for individuals who had completed residency training in pediatrics, and, therefore, called for information that was reasonably well-known by most of the practitioners but not yet known by many of the students.

DISCUSSION

In attempting to interpret the results that were obtained in this study, and before any firm conclusions are drawn from these results, certain uncontrolled factors that might have influenced the performance of the participants should be considered.


had been given as a “live” recertification exam. These variables are the amount of time allowed to complete the examination and the degree of motivation to do well on the test.

It is quite possible that the time limit for the test was too short for the practice group. If this were the case, the practice group may have been unwilling to spend much time attempting to look up answers in their textbooks, thus negating the possible effect that these reference materials may have had. In a live recertification test, one would presumably allow enough time for those who wished to use textbooks to do so without jeopardizing the examinee’s opportunity to complete the test.

A second variable that might produce different results if the examination were live is the level of motivation in the practice group. Although practitioners participated in the study willingly, it is difficult to believe that they were as highly motivated to perform well on the test as they would have been if the examination had been a live recertification exam. Thus, in a live examination, especially if more time were allowed, one might well find that practitioners would use textbooks to a much greater extent than they did in the current study and, thereby, possibly improve their test performance under open-book conditions.

SUMMARY AND RECOMMENDATIONS

This study compared the performance of 100 medical students and 96 practicing pediatricians taking a graduate-level examination in pediatrics under open- vs. closed-book conditions. For each of these groups, the parameters of test performance that were investigated were overall mean score under each testing condition and correlation between performance under the two testing conditions.

In the student group, the mean score obtained under the open-book condition was significantly higher than the mean score obtained when the test was given under closed-book conditions and the correlation between scores obtained under the two testing conditions was positive but low (.41). For the practice group, however, no significant difference between mean scores was observed and the correlation between scores obtained under the two testing conditions was substantially higher (.84).

These findings suggest that administering a recertification test of cognitive knowledge to practicing pediatricians under open-book conditions might yield essentially the same overall performance as a similar test given under closed-book conditions, and that the rank-ordering of examinees would be similar on both examinations.

Two variables that were not explored in the present study (time limit and level of motivation) might, however, have a substantial impact on the performance of practicing pediatricians taking a test of cognitive knowledge, under open-book conditions, for purposes of recertification. The anticipated effects of lengthening the time limit and increasing the level of examinee motivation would be a higher overall level of performance under open-book conditions than under closed-book conditions. There is no way to anticipate the effect of these variables on the rank ordering of examinees under the two testing conditions.

REFERENCES

1. Tussig L: A consideration of the open-book examinations. Educ Psychol Meas 11:597, 1951.

2. Upson RH: Open-book examination. J Engin Educ 43:429, 1953.

3. Stalnaker JM, Stalnaker RC: Open-book examinations: Results. With the Technicians 5:214, 1935.

4. Kalish BA: An experimental evaluation of the open-book examination. J Educ Psychol 49:200, 1958.

5. Marco GL: Psychological and Psychometric Correlates of Open and Closed Book Achievement Test Modes. Doctoral dissertation, University of Illinois, 1966.

6. Jehu D, Picton CJ, Futcher S: The use of notes in examination. Br J Educ Psychol 40:335, 1970.

7. Krarup N, Naeraa N, Olsen C: Open-book tests in a university course. High Educ 3:157, 1974.

8. Michaels SA, Kieren TR: An investigation of open-book and closed-book examinations in mathematics. Alberta J Educ Res 19:202, 1973.

9. Feldhusen J: An evaluation of college students’ reactions to open book examinations. Educ Psychol Meas 21:637, 1961.

10. Mankin HJ, Carter RM, Krawczyk M: The effect of permissive environment on scoring of the orthopaedic in-training examination. J Bone Joint Surg
