A GRADED PROBLEM ORIENTED RECORD TO EVALUATE CLINICAL PERFORMANCE

(Received November 13, 1972; revision accepted for publication January 16, 1973.)

ADDRESS FOR REPRINTS: (C.Z.M.) Office of Medical Education, Research and Development, College of Human Medicine, Michigan State University, East Lansing, Michigan 48823.

PEDIATRICS, Vol. 51, No. 6, June 1973

Carmi Z. Margolis, M.D., T. Joseph Sheehan, Ph.D., and William T. Stickley, Ph.D.

From the Department of Pediatrics, Yale University School of Medicine and Yale-New Haven Hospital, Department of Research in Health Education, University of Connecticut School of Medicine, Hartford, Connecticut and Division of Research in Medical Education, Case-Western Reserve School of Medicine, Cleveland, Ohio

ABSTRACT. In order to see if the problem oriented record could be used to measure a student’s facility at data collection, data recording, and clinical problem solving, the problem oriented record (POR) was divided into 14 sections, each of which was graded for structure and completeness. Maximum possible score was 162 points. Seven faculty members and a teaching resident at two institutions graded a single work-up, with a coefficient of variation of 2.8%. In two observed work-ups, there was a high correlation between observed and recorded data. Three clerkship groups of 9 to 16 students each, some of whom were taught by a teaching resident, handed in at different times during the clerkship a total of 66 PORs, with a mean score of 101.75 points (range, 10 to 152 points). It was concluded that the graded POR could objectively measure facility at data collection, data recording, and problem solving, and that students were taught these skills by grading a work-up themselves. Pediatrics, 51:980, 1973, MEDICAL EDUCATION GRADUATE, EDUCATIONAL MEASUREMENT, PEDIATRIC EDUCATION, PROBLEM SOLVING, CLINICAL EVALUATION, MEDICAL AUDIT.

Though there are several methods of evaluating clinical performance, these methods either measure mainly cognitive skills, or are unreliable or not objective in measuring combinations of psychomotor, affective and cognitive skills.1 Weed2 has stated that audit of a problem oriented record (POR) can provide an objective and valid evaluation of a student’s data collection, data recording, and problem-solving skills. If one accepts Weed’s premise that these skills are essential components of clinical performance, one ought to be able to evaluate clinical performance by grading the POR. In order to test this hypothesis, we first determined that the pediatric faculty and a prospective teaching resident felt that data collection, data recording, and problem solving were important instructional objectives for a clerkship.

We then attempted to grade students’ PORs in order to measure the objectivity, validity, and reliability of the POR as an instrument that would evaluate achievement of these objectives. At the same time, we attempted to use the POR to measure changes in student performance, and hence effectiveness of clinical teaching, between students on a pediatric clerkship who had a teaching resident and students who had none.

CLERKSHIPS AND STUDENTS

Collection and grading of PORs were performed at two medical schools. At school 1, the pediatric clerkship lasted eight weeks, while at school 2 it lasted six weeks. The scheduled activities for students at both schools were strikingly similar. At both schools students rotate through wards with children of different age groupings, and the daily schedule of rounds and conferences includes work rounds, attending rounds, a radiology conference, a didactic student conference, and a departmental conference. Students at both schools are expected to take an active role in the management of their patients and receive much individual teaching from house staff. At school 1, there

TABLE I
CHECKLIST FOR GRADING THE MODIFIED PROBLEM ORIENTED RECORD

Name of Section       Required Items                      Passing   Actual   Maximum        Actual
                                                          Score     Score    Extra Points   Extra Points

I. Chief Complaint    In patient’s or parent’s words,     3         -        1
                      presenting symptom, duration

II. Present Illness   Age, sex, color, consistent         S         -        6
                      chronological order, outlined
                      pertinent review of systems

VI. Problem List      Problems numbered and titled,       2         -        6
                      problem is correctly defined

chairman; faculty preceptors review student work-ups individually, and each student spends two hours per week in clinic. At school 2, work-ups are reviewed by ward residents.

At medical school 1, the faculty had a written list of goals for students on the clerkship,3 while at medical school 2, instructional objectives were determined by using a questionnaire.4 At both schools, data collection in a pediatric setting, data recording in a pediatric chart, and problem solving of pediatric clinical problems were three objectives that were thought to be important by all faculty.

Mean Medical College Admission Test scores for the students on the clerkship studied at school 1 and the two clerkships studied at school 2 were as follows: verbal scores were 618, 615, and 639, respectively; quantitative scores were 630, 643, and 645; general information scores were 585, 672, and 644; and science scores were 583, 640, and 644.

METHODS

Grading the Problem Oriented Record

A check list (Table I) was constructed from an outline of a POR that had been developed for use with pediatric patients.5 The POR was divided into eight major sections (chief complaint through plan) and section four, medical history, was subdivided into eight sections (growth and development through psychosocial). Each section was graded for completeness and/or structure. For example, if the chief complaint described the presenting symptom in the patient’s or parent’s own words, and described duration, it was graded 3; otherwise it was graded 0. If the problem list contained numbered and titled problems that met Weed’s2 criteria for problems, and if the first problem was always general care, the problem list was graded 2; otherwise it was graded 0. Grading for the present illness is also outlined in Table I. Each major section was also given from one to six extra points, based on the grader’s subjective impression. Maximum score was 162 points. It should be noted that the details of the grading system are not essential, since records may be objectively compared by using any consistent grading system.
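The all-or-nothing rule described above can be sketched in a few lines. This is only an illustration and not the authors' checklist: the `Section` class, the `grade_section` function, and the item strings are hypothetical names. The passing scores and extra-point ceilings are those shown for the chief complaint and problem list in Table I. The sketch also assumes that extra points are awarded independently of the passing score, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    name: str
    required_items: frozenset  # items that must all be present
    passing_score: int         # awarded only if every required item is present
    max_extra: int             # ceiling on the grader's subjective extra points


def grade_section(section, items_present, extra_points=0):
    """Return the passing score (all-or-nothing) plus capped extra points."""
    complete = section.required_items <= set(items_present)
    base = section.passing_score if complete else 0
    # Assumption: extra points are clamped to [0, max_extra] and do not
    # depend on whether the required items were all present.
    extra = max(0, min(extra_points, section.max_extra))
    return base + extra


# Values taken from the Table I rows for these two sections.
CHIEF_COMPLAINT = Section(
    "Chief complaint",
    frozenset({"patient's or parent's words", "presenting symptom", "duration"}),
    passing_score=3,
    max_extra=1,
)
PROBLEM_LIST = Section(
    "Problem list",
    frozenset({"numbered and titled", "correctly defined"}),
    passing_score=2,
    max_extra=6,
)

full = grade_section(CHIEF_COMPLAINT, CHIEF_COMPLAINT.required_items, extra_points=1)
partial = grade_section(CHIEF_COMPLAINT, {"presenting symptom"}, extra_points=0)
print(full, partial)
```

A complete chief complaint with one extra point scores 4, while one that omits the duration scores 0, whatever else it contains.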

Validity and Objectivity of the Graded Problem Oriented Record

Validity of the graded problem oriented record (GPOR) was checked by comparing two observed work-ups with the students’ records. Intergrader agreement was checked by having seven faculty members (one from medicine, six from pediatrics) and the teaching resident grade a single work-up using the GPOR.
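Intergrader agreement of this kind is summarized in the paper by a coefficient of variation, the standard deviation expressed as a percentage of the mean. The sketch below shows the arithmetic under stated assumptions: the function names are hypothetical, and the use of the sample (n - 1) standard deviation for raw scores is an assumption, since the paper does not say which form was used. The only study figures used are the mean of 114 and SD of 3.2 reported in the Results.

```python
import statistics


def coefficient_of_variation(scores):
    """Sample SD as a percentage of the mean, computed from raw scores."""
    return statistics.stdev(scores) / statistics.mean(scores) * 100


def cv_from_summary(mean, sd):
    """The same quantity from an already reported mean and SD."""
    return sd / mean * 100


# Reported summary for the eight graders of a single work-up.
reported_cv = cv_from_summary(mean=114, sd=3.2)
print(f"{reported_cv:.1f}%")  # 2.8%
```

The result agrees with the 2.8% figure quoted in the abstract and the Results.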

Experimental Design

It was planned to collect work-ups at the beginning, middle, and end of each clerkship. For reasons not anticipated, however, PORs were only collected at the beginning and middle of an eight-week clerkship (AB) that was taken early in the third-year clerks’ experience at medical school 1, at the beginning, middle, and end of a late six-week clerkship (CDE) and at the middle of a late six-week clerkship (F) at medical school 2 (Table II). Only clerkship CDE was taught by the teaching resident. All students who submitted PORs for grading by one of us (C.Z.M.) received the same POR outline, and one teaching session on its use from a teaching resident (C.Z.M.). Students on clerkship CDE, which was taught by the teaching resident, also graded a POR together just before they submitted their first POR, and each had his PORs returned with corrections by the teaching resident. Differences in mean scores were analyzed by Student’s or correlated Student’s t-test.

TABLE II
EXPERIMENTAL DESIGN

                                         Time of Clerkship
School   Year    Teaching Resident   Beginning   Middle   End

1        Early   None                A           B
2        Late    Yes                 C           D        E
2        Late    None                            F

RESULTS

Validity

In the two instances when data collected were compared with the data recorded in the POR, no data were recorded in either of the PORs that had not been elicited by history or physical examination. All data elicited were recorded.

Intergrader Agreement

The scores for the POR graded by the eight faculty members had a mean of 114 (range, 106 to 119) and a SD of ±3.2. Coefficient of variation was 2.8%, signifying a high degree of agreement between faculty members.

Group Scores on the GPOR

Students on three clerkships submitted six groups of work-ups, which were evaluated in 66 GPORs with a mean score of 104.9 points (range, 10 to 152 points), and a SD of ±30.1 points (Table III). Of these 66 PORs, a total of 56 were useable statistically for the design shown in Table II. Ten PORs could not be used for correlated t-tests because ten students each handed in only one POR. Group scores and results of significance tests are given in Table III. Correlated t-tests comparing A to B, C to D, and D to E did not yield differences that were statistically significant. Comparison between different student groups at the middle of the clerkships shows that group D GPOR scores are significantly higher than those of groups B or F.

TABLE III
SIGNIFICANCE OF DIFFERENCES BETWEEN MEAN GROUP SCORES
PROBLEM ORIENTED RECORD

Group   No. of Students   Mean GPOR Score   SD              p

A       5 (9)             85.2 (81.0)       25.27 (39.69)
B       5 (8)             78.2 (83.3)       32.56           .10)
C       11 (12)           117.9 (110.7)     37.50 (26.38)
D       11 (11)           128.7
E       10                129.9             20.26
F       14 (16)           89.2 (87.2)       20.26 (20.26)

* Numbers without brackets were used for correlated t-test. Numbers in brackets were used for uncorrelated t-test.

TABLE IV
CORRELATIONS AMONG SECTIONS OF GPOR

                               1     2     3     4     5     6     7     8     9    10    11    12    13    14

Chief complaint           1    -    .42   .49   .50   .14   .21   .23   .27   .15   .13   .22   .37   .42   .26
Present illness           2   .42    -    .05   .39   .38   .21   .46   .47   .47   .09   .35   .19   .41   .05
Family history            3   .49   .03    -    .48  -.01   .22   .10  -.04  -.06  -.03  -.04   .05   .41   .19
Physical examination      4   .50   .39   .48    -    .20   .23   .40   .07   .23   .16   .28   .22   .53   .39
Problem list              5   .14   .38  -.01   .20    -    .33   .05   .33   .20   .08   .40   .20   .28   .04
Plan                      6   .21   .21   .22   .22   .33    -    .06   .31   .24  -.01   .19   .38   .47   .13
Interpretation            7   .23   .46   .10   .40   .05   .06    -    .25   .27  -.06   .13   .15   .35   .21
Growth and development    8   .27   .47  -.04   .07   .33   .31   .25    -    .61   .19   .38   .57   .26   .21
Nutrition                 9   .15   .47  -.06   .23   .20   .24   .27   .61    -    .03   .45   .60   .22   .08
Past history             10   .13   .09  -.03   .16   .08  -.01  -.06   .19  -.03    -    .36   .23   .13   .44
Prophylaxis              11   .22   .35  -.04   .28   .40   .19   .13   .38   .45   .36    -    .57   .38   .24
Habits                   12   .37   .19   .05   .22   .20   .38   .15   .57   .60   .30   .57    -    .34   .31
Environment              13   .42   .41   .41   .53   .28   .47   .35   .26   .22   .13   .38   .34    -    .20
Review of systems        14   .26   .05   .19   .39   .04   .13   .21   .21   .08   .44   .24   .31   .20    -
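The two kinds of comparison used for the group scores, the correlated (paired) t-test for the same students at two times and the uncorrelated (pooled) t-test for different groups, can be sketched as follows. This is a minimal illustration using only the Python standard library. The function names and all scores in the example are hypothetical and are not study data; the study's raw scores are not given in the paper.

```python
import math
import statistics


def correlated_t(first, second):
    """Paired Student's t for the same students measured twice.

    Returns (t, degrees of freedom).
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


def uncorrelated_t(x, y):
    """Pooled-variance Student's t for two independent groups.

    Returns (t, degrees of freedom).
    """
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


# Hypothetical scores for three students at two points in a clerkship.
t_paired, df_paired = correlated_t([100, 110, 120], [110, 115, 135])

# Hypothetical scores for two separate groups of three students.
t_pooled, df_pooled = uncorrelated_t([4, 5, 6], [1, 2, 3])

print(round(t_paired, 3), df_paired)
print(round(t_pooled, 3), df_pooled)
```

The paired form uses only students who handed in work-ups at both times, which is why the ten students who submitted a single POR could be used only in the uncorrelated comparisons.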

DISCUSSION

There are at present six main methods of evaluating a student’s clinical performance: oral examinations, practical examinations at the end of a clerkship, essay examinations, objective examinations, observational reports on a student’s clerkship performance, and simulation exercises. A seventh method, measurement of patient outcome, is not suitable for evaluating students though it may be used for evaluating graduate physicians. The first five methods are deficient either in that they only measure cognitive skills, or in that they are not objective and reliable. Simulation exercises6 measure problem solving and have been standardized, but do not measure psychomotor or affective skills involved in data collection and recording. We first determined that pediatric faculty and a prospective teaching resident felt that data collection and recording in a pediatric setting and solving of pediatric clinical problems are three important objectives of a pediatric clerkship. We then attempted to standardize audit of the POR in order to evaluate achievement of these three objectives. In comparing two observed work-ups with their corresponding PORs, we observed no major discrepancies. We, therefore, strongly suspect that the POR is valid in the sense that the student writes actual data that he has elicited with his history and physical examination, although more comparisons of elicited and reported data must be done to conclude that all the data recorded are elicited. We found little variation among eight different graders, and we, therefore, conclude that the GPOR is adequately objective. We did not attempt to use any of the available techniques for determining a reliability coefficient, because we do not feel that the usual methods for determining reliability of an instrument of evaluation, such as determination of its alpha coefficient, are applicable to the GPOR (see Appendix).

TABLE V
COMPONENT ANALYSIS OF GPOR

                                1      2      3      4      5

Chief complaint           1   -.61
Present illness           2   -.66
Family history            3   -.31
Physical examination      4   -.63
Problem list              5   -.47
Plan                      6   -.51   -.01   -.16    .62    .13
Interpretation            7   -.47
Growth and development    8   -.66
Nutrition                 9   -.62
Past history             10   -.30
Prophylaxis              11   -.65
Habits                   12   -.70
Environment              13   -.70   -.32   -.12    .20   -.11
Review of systems        14   -.43
Percent of variance            32%

The remaining weights for columns 2 to 5 survive in the source only as unordered runs, so their cell positions are not shown. Sections 1 and 2: -.42, .09, -.01, -.38, -.01, -.29, .12, -.32. Sections 3 to 5: -.75, -.54, .2, -.04, .01, -.15, .23, -.15, .36, .20, -.12, -.60. Sections 7 to 9: -.17, .47, .47, -.33, -.05, -.23, -.60, -.05, -.16, -.02, .19, .32. Sections 10 to 12: .06, .34, .33, .78, .26, .22, -.10, .05, .14, -.28, -.22, .43.

We also wish to point to a few limitations of the GPOR as an instrument of evaluation. Though the problem list, plan, and interpretation may be used to measure clinical problem solving, one cannot conclude that the GPOR measures clinical judgment, or the ability to make decisions in a real clinical situation, as defined by Feinstein.7 The POR does not contain certain psychological variables that routinely affect decisions in clinical situations, such as anxiety in an emergency or identification with a dying patient. It must, therefore, still be determined whether the student’s written decisions reflect clinical decisions or represent rational decisions that would be made in the absence of psychological variables inherent in a clinical situation. Also, a disadvantage of the GPOR for general use in measuring data collection and problem-solving abilities is that it takes an experienced grader at least 20, usually 30, minutes to grade a single write-up for the data collection and problem-solving components of clinical performance.

We attempted to standardize the GPOR while using it to determine whether the teaching resident was effective. The differences between mean GPOR scores of students in group D and those in groups B and F at the middle of the three clerkships were due neither to differences in schools nor to amount of experience of the students, since groups D and F were in the same school, and had the same amount of experience. Group D might differ from groups B and F in native intelligence, in that it had an opportunity to grade a GPOR, or in that it had a teaching resident. It would seem unlikely from Medical College Admission Test scores that group D exceeded the other groups in intelligence by an amount sufficient to account for the highly significant statistical difference between D and the other groups. The fact that there was no significant difference between the beginning, middle, and end of CDE supports the conclusion that group D differed chiefly in that students had the unique experience of themselves grading a POR before they submitted their first POR. However, the group of students with correlated scores at the beginning and middle of CDE was small (11 students), and the experience of grading a work-up before handing in work-ups at C would tend to increase the scores of group C. One might therefore guess that the higher score at D than at C might mean that effective teaching by the resident did occur between C and D. We conclude that the teaching resident was effective in teaching data collection and problem-solving skills to students at the beginning of CDE by having them grade a work-up, and that he was possibly also effective later in the clerkship by grading their work-ups.

APPENDIX

Construct validity of the GPOR as an instrument of evaluation was measured by correlating and reducing the fourteen sections of 38 different students’ GPORs by a principal component analysis.8 The remaining GPORs had been submitted by the same 38 students and were, therefore, not used in the analysis.

The intercorrelations among the 14 sections of the GPOR appear in Table IV. If one arbitrarily chooses a correlation of .40 as an indicator of moderate association between any two subsections of the GPOR, one sees that student performance on section 1, chief complaint, is moderately related to performance on sections 2, 3, 4, and 13 (present illness, family history, physical examination, and environment), and is only

sections of the GPOR. If one looks down any column or across any row of Table IV, the same pattern is seen for each section of the GPOR. Student performance as measured by the GPOR therefore begins to emerge as being multidimensional, and not easily predictable from performance on a single section.
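The .40 rule of thumb is easy to apply mechanically to a row of the correlation matrix. The sketch below does so for the chief complaint row as printed in Table IV. The function and variable names are hypothetical, and the threshold is the arbitrary value chosen in the text.

```python
MODERATE = 0.40  # arbitrary cut-off for "moderate association" used in the text

# Row 1 of Table IV: correlation of chief complaint with sections 2 to 14.
CHIEF_COMPLAINT_ROW = {
    2: 0.42, 3: 0.49, 4: 0.50, 5: 0.14, 6: 0.21, 7: 0.23, 8: 0.27,
    9: 0.15, 10: 0.13, 11: 0.22, 12: 0.37, 13: 0.42, 14: 0.26,
}


def moderately_related(row, threshold=MODERATE):
    """Return the section numbers whose correlation meets the threshold."""
    return sorted(section for section, r in row.items() if r >= threshold)


related = moderately_related(CHIEF_COMPLAINT_ROW)
print(related)  # [2, 3, 4, 13]
```

Only four of the thirteen other sections reach the threshold, which illustrates why a single section is a poor predictor of overall performance.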

In order to determine how many independent dimensions comprise student performance, a principal component analysis was done on the correlation matrix in Table IV. The sets of weights that imply independent dimensions of student performance appear in the columns of Table V. For example, the weights in column 1 of Table V indicate how much each of the 14 sections contributes to the first independent dimension. The weights in column 2, similarly, indicate how much each section contributes to the second independent dimension, having already removed the first dimension. The five columns shown in Table V, which represent five independent dimensions of the data displayed in Table IV, alone account for over 72% of the variance in Table IV. (It would take a total of 14 sets of weights or factorings of Table IV to account for 100% of the variance.)
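The percent of variance attributed to a dimension can be checked from its column of weights. The sketch below assumes that the weights in Table V are component loadings on a correlation matrix, so that the sum of their squares is the variance of that component and the total variance equals the number of sections (14). That scaling is an assumption, not something the paper states. The function name is hypothetical, and the weights are the column 1 values of Table V.

```python
# Column 1 of Table V, sections 1 to 14.
FIRST_COMPONENT = [
    -0.61, -0.66, -0.31, -0.63, -0.47, -0.51, -0.47,
    -0.66, -0.62, -0.30, -0.65, -0.70, -0.70, -0.43,
]


def percent_of_variance(loadings):
    """Sum of squared loadings as a share of the total variance.

    For a correlation matrix each section contributes one unit of variance,
    so the total equals the number of sections.
    """
    return sum(w * w for w in loadings) / len(loadings) * 100


pct = percent_of_variance(FIRST_COMPONENT)
print(round(pct))  # 32
```

The squared weights sum to 4.5 out of 14, which reproduces the 32% printed for the first dimension in Table V.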

The various alpha coefficients for measuring reliability of an instrument of evaluation assume subtest units that are homogenous, and demonstrate the highest correlations when they are used to correlate perfectly homogenous subtest units. However, from the component analysis of the GPOR, we conclude that the weightings of the subtests in the first three dimensions suggest that the subtests that contribute heavily to these dimensions, such as chief complaint, present illness, and physical examination, are measuring a nonhomogenous process. The subtests that contribute heavily to the fourth and fifth dimensions, such as problem list and plan, may measure cognitive processes involved in synthesizing data, formulating problems, and creating logical plans. Since, then, student performance as measured by the GPOR is comprised of nonhomogenous subtest units, classical determinations of internal consistency between homogenous subtest units are not appropriate.9
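The dependence of an alpha coefficient on homogeneity can be illustrated with the standard Cronbach's alpha formula, used here as one example of the coefficients the text refers to. The paper does not say which variant it has in mind, so this choice is an assumption. The function name and all scores are hypothetical and are not study data.

```python
import statistics


def cronbach_alpha(subtest_scores):
    """Cronbach's alpha for a list of subtests.

    Each subtest is a list of scores, one per student, in the same
    student order across subtests.
    """
    k = len(subtest_scores)
    totals = [sum(student) for student in zip(*subtest_scores)]
    subtest_variance = sum(statistics.variance(s) for s in subtest_scores)
    return k / (k - 1) * (1 - subtest_variance / statistics.variance(totals))


# Three subtests that rank four hypothetical students identically.
homogeneous = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])

# Three subtests that rank the same students differently.
mixed = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3], [1, 3, 2, 4]])

print(homogeneous > mixed)  # True
```

When every subtest orders the students the same way, alpha reaches its maximum of 1. As the subtests diverge, alpha falls even though each subtest may be measuring something real, which is the objection raised above to applying it to the GPOR.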

REFERENCES

1. Charvat, J., McGuire, C., and Parsons, V.: A Review of the Nature and Uses of Examinations in Medical Education. Geneva: World Health Organization, 1968.

2. Weed, L. L.: Medical Records, Medical Education and Patient Care. Cleveland, Ohio: The Press of Case-Western Reserve University, 1970.

3. Grupe, W., ed.: Objectives and Goals-Phase III Pediatrics. Cleveland, Ohio: Babies and Children’s Hospital, Department of Pediatrics, Case-Western Reserve University School of Medicine, 1971.

4. Margolis, C. Z., Cook, C. D., and Pearson, H. A.: Pediatric house officer training: I. Educational objectives and performance criteria. Unpublished data.

5. Barich, D.: History and Physical Examination Outline. Cleveland, Ohio: Cleveland Metropolitan General Hospital, Department of Pediatrics, 1970.

6. McGuire, C. H., and Solomon, L. M., ed.: Clinical Simulations. New York: Appleton-Century-Crofts Inc., 1971.

7. Feinstein, A. R.: Clinical Judgment. Baltimore: The Williams & Wilkins Co., 1967.

8. Morrison, D. F.: Multivariate Statistical Methods. New York: McGraw-Hill, 1967.

9. Rajaratnam, N., Cronbach, L. J., and Gleser, G. C.: Generalizability of stratified-parallel tests. Psychometrika, 30:39, 1965.

Acknowledgment

We would like to thank Drs. C. D. Cook and H. A. Pearson for helpful editorial criticisms, Miss Caroline Kraynak for invaluable assistance in preparing the GPOR, the faculty and students at both schools for their cooperation and patience, and
