(Received November 13, 1972; revision accepted for publication January 16, 1973.)
ADDRESS FOR REPRINTS: (C.Z.M.) Office of Medical Education, Research and Development, College of
Human Medicine, Michigan State University, East Lansing, Michigan 48823.
PEDIATRICS, Vol. 51, No. 6, June 1973
A GRADED PROBLEM ORIENTED RECORD TO EVALUATE CLINICAL PERFORMANCE
Carmi Z. Margolis, M.D., T. Joseph Sheehan, Ph.D., and William T. Stickley, Ph.D.
From the Department of Pediatrics, Yale University School of Medicine and Yale-New Haven Hospital, Department of Research in Health Education, University of Connecticut School of Medicine, Hartford,
Connecticut and Division of Research in Medical Education, Case-Western Reserve School of Medicine, Cleveland, Ohio
ABSTRACT. In order to see if the problem oriented record could be used to measure a student’s facility at data collection, data recording, and clinical problem solving, the problem oriented record (POR) was divided into 14 sections, each of which was graded for structure and completeness. Maximum possible score was 162 points. Seven faculty members and a teaching resident at two institutions graded a single work-up, with a coefficient of variation of 2.8%. In two observed work-ups, there was a high correlation between observed and recorded data. Three clerkship groups of 9 to 16 students each, some of whom were taught by a teaching resident, handed in at different times during the clerkship a total of 66 PORs, with a mean score of 101.75 points (range, 10 to 152 points). It was concluded that the graded POR could objectively measure facility at data collection, data recording, and problem solving, and that students were taught these skills by grading a work-up themselves. Pediatrics, 51:980, 1973, MEDICAL EDUCATION GRADUATE, EDUCATIONAL MEASUREMENT, PEDIATRIC EDUCATION, PROBLEM SOLVING, CLINICAL EVALUATION, MEDICAL AUDIT.
THOUGH there are several methods of evaluating clinical performance, these methods either measure mainly cognitive skills, or are unreliable or not objective in measuring combinations of psychomotor, affective and cognitive skills.1 Weed2 has stated that audit of a problem oriented record (POR) can provide an objective and valid evaluation of a student’s data collection, data recording, and problem-solving skills. If one accepts Weed’s premise that these skills are essential components of clinical performance, one ought to be able to evaluate clinical performance by grading the POR. In order to test this hypothesis, we determined that the pediatric faculty and a prospective teaching resident felt that data collection, data recording, and problem solving were important instructional objectives for a clerkship.
We then attempted to grade students’ PORs in order to measure the objectivity, validity, and reliability of the POR as an instrument that would evaluate achievement of these objectives. We at the same time attempted to use the POR to measure changes in student performance, and hence effectiveness of clinical teaching, between students on a pediatric clerkship who had a teaching resident and students who had none.
CLERKSHIPS AND STUDENTS
Collection and grading of PORs was performed at two medical schools. At school 1,
the pediatric clerkship lasted eight weeks,
while at school 2 it lasted six weeks. The
scheduled activities for students at both
schools were strikingly similar. At both
schools students rotate through wards with
children of different age groupings, and the
daily schedule of rounds and conferences
includes work rounds, attending rounds, a
radiology conference, a didactic student
conference, and a departmental conference.
Students at both schools are expected to
take an active role in the management of
their patients and receive much individual
teaching from house staff. At school 1, there
ARTICLES
TABLE I
CHECKLIST FOR GRADING THE MODIFIED PROBLEM ORIENTED RECORD
Name of Section | Required Items | Passing Score | Actual Score | Maximum Extra Points | Actual Extra Points
I. Chief Complaint | In patient’s or parent’s words, presenting symptom, duration | 3 | - | 1 |
II. Present Illness | Age, sex, color, consistent chronological order, outlined pertinent review of systems | S | - | 6 |
VI. Problem List | Problems numbered and titled, problem is correctly defined | 2 | - | 6 |
chairman; faculty preceptors review student work-ups individually, and each student spends two hours per week in clinic. At school 2, work-ups are reviewed by ward residents.
At medical school 1, the faculty had a written list of goals for students on the clerkship,3 while at medical school 2, instructional objectives were determined by using a questionnaire.4 At both schools, data collection in a pediatric setting, data recording in a pediatric chart, and problem solving of pediatric clinical problems were three objectives that were thought to be important by all faculty.
Mean Medical College Admission Test scores for the students on the clerkship studied at school 1 and the two clerkships studied at school 2 were as follows: verbal scores were 618, 615, and 639, respectively; quantitative scores were 630, 643, and 645; general information scores were 585, 672, and 644; and science scores were 583, 640, and 644.
METHODS
Grading the Problem Oriented Record
A check list (Table I) was constructed from an outline of a POR that had been developed for use with pediatric patients.5 The POR was divided into eight major sections (chief complaint through plan) and section four, medical history, was subdivided into eight sections (growth and development through psychosocial). Each section was graded for completeness and/or structure. For example: if the chief complaint described the presenting symptom in the patient’s or parent’s own words, and described duration, it was graded 3; otherwise it was graded 0. If the problem list contained numbered and titled problems that met Weed’s2 criteria for problems, and if the first problem was always general care, the problem list was graded 2; otherwise it was graded 0. Grading for the present illness is also outlined in Table I. Each major section was also given from one to six extra points, based on the grader’s subjective impression. Maximum score was 162 points. It should be noted that the details of the grading system are not essential, since records may be objectively compared by using any consistent grading system.
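The grading rule described above (an all-or-nothing passing score per section, plus a bounded number of subjective extra points) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, the passing scores of 3 and 2 come from the text, the extra-point ceilings are as read from Table I, and it is assumed that extra points may be awarded whether or not the section passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """One checklist section (hypothetical structure, for illustration only)."""
    name: str
    passing_score: int  # points given when every required item is present
    max_extra: int      # ceiling on the grader's subjective extra points


def grade_section(section: Section, criteria_met: bool, extra: int = 0) -> int:
    """Return the all-or-nothing base score plus bounded extra points."""
    if not 0 <= extra <= section.max_extra:
        raise ValueError(f"extra points for {section.name} out of range")
    base = section.passing_score if criteria_met else 0
    return base + extra


# Passing scores (3 and 2) are stated in the text; the extra-point ceilings
# (1 and 6) are read from Table I and should be treated as assumptions.
chief_complaint = Section("Chief complaint", passing_score=3, max_extra=1)
problem_list = Section("Problem list", passing_score=2, max_extra=6)

# Example: chief complaint meets its criteria, problem list does not.
total = (grade_section(chief_complaint, criteria_met=True, extra=1)
         + grade_section(problem_list, criteria_met=False, extra=4))
print(total)
```

Summing `grade_section` over all sections of a work-up would give the record's GPOR score; any other consistent rule could be substituted without changing the comparison between records.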
Validity and Objectivity of the Graded Problem Oriented Record
Validity of the graded problem oriented
record (GPOR) was checked by comparing
two observed work-ups with the students’
records. Intergrader agreement was
checked by having seven faculty members
(one from medicine, six from pediatrics)
and the teaching resident grade a single
workup using the GPOR.
Experimental Design
It was planned to collect work-ups at the beginning, middle, and end of each clerkship. For reasons not anticipated, however, PORs were only collected at the beginning and middle of an eight-week clerkship (AB) that was taken early in the third-year clerks’ experience at medical school 1, at the beginning, middle, and end of a late six-week clerkship (CDE), and at the middle of a late six-week clerkship (F) at medical school 2 (Table II). Only clerkship CDE was taught by the teaching resident. All students who submitted PORs for grading by one of us (C.Z.M.) received the same POR outline, and one teaching session on its use from a teaching resident (C.Z.M.). Students on clerkship CDE, which was taught by the teaching resident, also graded a POR together just before they submitted their first POR, and each had his PORs returned with corrections by the teaching resident. Differences in mean scores were analyzed by Student’s or correlated Student’s t-test.
TABLE II
EXPERIMENTAL DESIGN
School | Year | Teaching Resident | Time of Clerkship: Beginning | Middle | End
1 | Early | None | A | B |
2 | Late | Yes | C | D | E
2 | Late | None | | F |
RESULTS
Validity
In the two instances when data collected were compared with the data recorded in the POR, no data were recorded in either of the PORs that had not been elicited by history or physical examination. All data elicited were recorded.
Intergrader Agreement
The scores for the POR graded by the eight faculty members had a mean of 114 (range, 106 to 119) and a SD of ±3.2. Coefficient of variation was 2.8%, signifying a high degree of agreement between faculty members.
TABLE III
SIGNIFICANCE OF DIFFERENCES BETWEEN MEAN GROUP SCORES
PROBLEM ORIENTED RECORD
Group | No. of Students | Mean GPOR Score | SD | p
A | 5 (9) | 85.2 (81.0) | 25.27 (39.69) |
B | 5 (8) | 78.2 (83.3) | 32.56 | .10)
C | 11 (12) | 117.9 (110.7) | 37.50 (26.38) |
D | 11 (11) | 128.7 | |
E | 10 | 129.9 | 20.26 |
F | 14 (16) | 89.2 (87.2) | 20.26 (20.26) |
* Numbers without brackets were used for correlated t-test. Numbers in brackets were used for uncorrelated t-test.
TABLE IV
CORRELATIONS AMONG SECTIONS OF GPOR
1 2 3 4 5 6 7 8 9 10 11 12 13 14
Chief complaint 1 - .42 .49 .50 .14 .21 .23 .27 .15 .13 .22 .37 .42 .26
Present illness 2 .42 - .05 .39 .38 .21 .46 .47 .47 .09 .35 .19 .41 .05
Family history 3 .49 .03 - .48 -.01 .22 .10 -.04 -.06 -.03 -.04 .05 .41 .19
Physical examination 4 .50 .39 .48 - .20 .23 .40 .07 .23 .16 .28 .22 .53 .39
Problem list 5 .14 .38 -.01 .20 - .33 .05 .33 .20 .08 .40 .20 .28 .04
Plan 6 .21 .21 .22 .22 .33 - .06 .31 .24 -.01 .19 .38 .47 .13
Interpretation 7 .23 .46 .10 .40 .05 .06 - .25 .27 -.06 .13 .15 .35 .21
Growth and development 8 .27 .47 -.04 .07 .33 .31 .25 - .61 .19 .38 .57 .26 .21
Nutrition 9 .15 .47 -.06 .23 .20 .24 .27 .61 - .03 .45 .60 .22 .08
Past history 10 .13 .09 -.03 .16 .08 -.01 -.06 .19 -.03 - .36 .23 .13 .44
Prophylaxis 11 .22 .35 -.04 .28 .40 .19 .13 .38 .45 .36 - .57 .38 .24
Habits 12 .37 .19 .05 .22 .20 .38 .15 .57 .60 .30 .57 - .34 .31
Environment 13 .42 .41 .41 .53 .28 .47 .35 .26 .22 .13 .38 .34 - .20
Review of systems 14 .26 .05 .19 .39 .04 .13 .21 .21 .08 .44 .24 .31 .20 -
Group Scores on the GPOR
Students on three clerkships submitted six groups of work-ups, which were evaluated in 66 GPORs with a mean score of 104.9 points (range, 10 to 152 points), and a SD of ±30.1 points (Table III). Of these 66 PORs, a total of 56 were useable statistically for the design shown in Table II. Ten PORs could not be used for correlated t-tests because ten students each handed in only one POR. Group scores and results of significance tests are given in Table III. Correlated t-tests comparing A to B, C to D, and D to E did not yield differences that were statistically significant. Comparison between different student groups at the middle of the clerkships shows that group D GPOR scores are significantly higher than those of groups B or F.
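The two kinds of comparison used here, a correlated (paired) t-test for the same students at two times and an uncorrelated t-test for different student groups, can be sketched as follows. This is a minimal illustration, not the authors' computation: the function names are hypothetical, the scores are invented for the example, the uncorrelated version assumes pooled (equal) variances, and only the t statistic and degrees of freedom are computed, so a p value would still have to be looked up in a t table.

```python
import math
from statistics import mean, stdev, variance


def correlated_t(before, after):
    """Paired t statistic and degrees of freedom for the same students."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def uncorrelated_t(group_1, group_2):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n1, n2 = len(group_1), len(group_2)
    pooled = ((n1 - 1) * variance(group_1)
              + (n2 - 1) * variance(group_2)) / (n1 + n2 - 2)
    t = (mean(group_1) - mean(group_2)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical GPOR scores, invented purely for illustration.
beginning = [100, 110, 120, 130]
middle = [110, 115, 135, 140]
paired_t, paired_df = correlated_t(beginning, middle)

taught = [120, 130, 140]
untaught = [80, 90, 100]
group_t, group_df = uncorrelated_t(taught, untaught)

print(round(paired_t, 2), paired_df)
print(round(group_t, 2), group_df)
```

The paired form uses only students who handed in a POR at both times, which is why fewer records were usable for the correlated tests than for the uncorrelated ones.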
DISCUSSION
There are at present six main methods of evaluating a student’s clinical performance: oral examinations, practical examinations at the end of a clerkship, essay examinations, objective examinations, observational reports on a student’s clerkship performance, and simulation exercises. A seventh method, measurement of patient outcome, is not suitable for evaluating students though it may be used for evaluating graduate physicians. The first five methods are deficient either in that they only measure cognitive skills, or in that they are not objective and reliable. Simulation exercises6 measure problem solving and have been standardized, but do not measure psychomotor or affective skills involved in data collection and recording. We first determined that pediatric faculty and a prospective teaching resident felt that data collection and recording in a pediatric setting and solving of pediatric clinical problems are three important objectives of a pediatric clerkship. We then attempted to standardize audit of the POR in order to evaluate achievement of these three objectives. In comparing two observed work-ups with their corresponding PORs, we observed no major discrepancies. We, therefore, strongly suspect that the POR is valid in the sense that the student writes actual data that he has elicited with his history and physical
TABLE V
COMPONENT ANALYSIS OF GPOR
1 2 3 4 5
Chief complaint 1 -.61
Present illness 2 -.66
-.42
.09
-.01 -.38
-.01 -.29
.12 -.32
Family history 3 -.31
Physical examination 4 -.63
Problem list 5 -.47 -.75
-.54 .2
-.04 .01
-.15 .23
-.15 .36
.20
-.12 -.60
Plan 6 -.51 -.01 -.16 .62 .13
Interpretation 7 -.47
Growth and development 8 -.66
Nutrition 9 -.62
-.17 .47
.47
-.33
-.05
-.23
-.60
-.05
-.16
-.02
.19
.32
Past history 10 -.30
Prophylaxis 11 -.65
Habits 12 -.70
.06 .34 .33 .78 .26 .22 -.10 .05 .14 -.28 -.22 .43
Environment 13 -.70 -.32 -.12 .20 -.11
Review of systems 14 -.43
Percent of variance 32%
examination, although more comparisons of elicited and reported data must be done to conclude that all the data recorded is elicited. We found little variation among eight different graders, and we, therefore, conclude that the GPOR is adequately objective. We did not attempt to use any of the available techniques for determining a reliability coefficient, because we do not feel that the usual methods for determining reliability of an instrument of evaluation, such as determination of its alpha coefficient, are applicable to the GPOR (see Appendix).
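The "little variation among eight different graders" rests on the coefficient of variation reported in the Results, that is, the standard deviation expressed as a percentage of the mean. A minimal sketch of that arithmetic, using the reported mean of 114 and SD of 3.2 (the function name is hypothetical):

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    return 100.0 * sd / mean


# Mean and SD of the eight graders' scores, as reported in the Results.
cv = coefficient_of_variation(sd=3.2, mean=114)
print(f"{cv:.1f}%")  # prints "2.8%"
```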
We also wish to point to a few limitations of the GPOR as an instrument of evaluation. Though the problem list, plan, and interpretation may be used to measure clinical problem solving, one cannot conclude that the GPOR measures clinical judgment, or the ability to make decisions in a real clinical situation, as defined by Feinstein.7 The POR does not contain certain psychological variables that routinely affect decisions in clinical situations, such as anxiety in an emergency or identification with a dying patient. It must, therefore, still be determined whether the student’s written decisions reflect clinical decisions or represent rational decisions that would be made in the absence of psychological variables inherent in a clinical situation. Also, a disadvantage of the GPOR for general use in measuring data collection and problem-solving abilities is that it takes an experienced grader at least 20, usually 30, minutes to grade a single write-up for the data collection and problem-solving components of clinical performance.
We attempted to standardize the GPOR while using it to determine whether the teaching resident was effective. The differences between mean GPOR scores of students in group D and those in groups B and F at the middle of the three clerkships were not due either to differences in schools or amount of experience of the students, since groups D and F were in the same school, and had the same amount of experience. Group D might differ from groups B and F in native intelligence, in that it had an opportunity to grade a GPOR, or in that it had a teaching resident. It would seem unlikely from Medical College Admission Test scores that group D exceeded the other groups in intelligence by an amount sufficient to account for the highly significant statistical difference between D and the other groups. The fact that there was no significant difference between the beginning, middle, and end of CDE supports the conclusion that group D differed chiefly in that students had the unique experience of themselves grading a POR before they submitted their first POR. However, the group of students with correlated scores at the beginning and middle of CDE was small (11 students), and the experience of grading a work-up before handing in work-ups at C would tend to increase the scores of group C. One might therefore guess that the higher score at D than at C might mean that effective teaching by the resident did occur between C and D. We conclude that the teaching resident was effective in teaching data collection and problem-solving skills to students at the beginning of CDE by having them grade a work-up, and that he was possibly also effective later in the clerkship by grading their work-ups.
APPENDIX
Construct validity of the GPOR as an instrument of evaluation was measured by correlating and reducing the fourteen sections of 38 different students’ GPORs by a principal component analysis.8 The remaining GPORs had been submitted by the same 38 students and were, therefore, not used in the analysis.
The intercorrelations among the 14 sections of the GPOR appear in Table IV. If one arbitrarily chooses a correlation of .40 as an indicator of moderate association between any two subsections of the GPOR, one sees that student performance on section 1, chief complaint, is moderately related to performance on sections 2, 3, 4, and 13 (present illness, family history, physical examination, and environment), and is only weakly related to the other sections of the GPOR. If one looks down any column or across any row of Table IV, the same pattern is seen for each section of the GPOR. Student performance as measured by the GPOR therefore begins to emerge as being multidimensional, and not easily predictable from performance on a single section.
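Applying the .40 criterion to one row of Table IV can be sketched as below. The correlations are the chief complaint row as printed in Table IV; the function and variable names are hypothetical, and the use of the absolute value of the correlation is an assumption of this sketch.

```python
SECTION_NAMES = [
    "Chief complaint", "Present illness", "Family history",
    "Physical examination", "Problem list", "Plan", "Interpretation",
    "Growth and development", "Nutrition", "Past history", "Prophylaxis",
    "Habits", "Environment", "Review of systems",
]

# Row 1 of Table IV (chief complaint); None marks the diagonal.
CHIEF_COMPLAINT_ROW = [None, .42, .49, .50, .14, .21, .23,
                       .27, .15, .13, .22, .37, .42, .26]


def moderately_related(row, threshold=0.40):
    """Return 1-based section numbers whose |r| reaches the threshold."""
    return [i + 1 for i, r in enumerate(row)
            if r is not None and abs(r) >= threshold]


related = moderately_related(CHIEF_COMPLAINT_ROW)
print(related)
print([SECTION_NAMES[i - 1] for i in related])
```

Running the same function over every row of the matrix would show whether each section is tied to only a few others, which is the pattern the text describes as multidimensional.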
In order to determine how many independent dimensions comprise student performance, a principal component analysis was done on the correlation matrix in Table IV. The sets of weights that imply independent dimensions of student performance appear in the columns of Table V. For example, the weights in column 1 of Table V indicate how much each of the 14 sections contributes to the first independent dimension. The weights in column 2, similarly, indicate how much each section contributes to the second independent dimension, having already removed the first dimension. The five columns shown in Table V, which represent five independent dimensions of the data displayed in Table IV, alone account for over 72% of the variance in Table IV. (It would take a total of 14 sets of weights or factorings of Table IV to account for 100% of the variance.)
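How a first set of weights and its share of the variance are obtained from a correlation matrix can be illustrated on a small example. The sketch below uses power iteration, which is a simple stand-in and not necessarily the procedure the authors used, on a 3 x 3 submatrix of Table IV (growth and development, nutrition, habits). The function names are hypothetical, and the resulting weights are unit-length eigenvector entries that are not scaled to match Table V.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def first_component(corr, iterations=200):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = mat_vec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(corr, v)))
    return eigenvalue, v


# Correlations among sections 8, 9, and 12 as printed in Table IV.
corr = [
    [1.00, 0.61, 0.57],
    [0.61, 1.00, 0.60],
    [0.57, 0.60, 1.00],
]

eigenvalue, weights = first_component(corr)
# For a correlation matrix the total variance equals the number of variables,
# so the first dimension's share is its eigenvalue divided by that number.
share = eigenvalue / len(corr)
print(round(share, 2), [round(w, 2) for w in weights])
```

Repeating the procedure after removing each extracted dimension yields the later columns; with 14 sections, 14 such dimensions would be needed to account for all of the variance.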
The various alpha coefficients for measuring reliability of an instrument of evaluation assume subtest units that are homogenous, and demonstrate the highest correlations when they are used to correlate perfectly homogenous subtest units. However, from the component analysis of the GPOR, we conclude that the weightings of the subtests in the first three dimensions suggest that the subtests that contribute heavily to these dimensions, such as chief complaint, present illness, and physical examination, are measuring a nonhomogenous process. The subtests that contribute heavily to the fourth and fifth dimensions, such as problem list and plan, may measure cognitive processes involved in synthesizing data, formulating problems, and creating logical plans. Since, then, student performance as measured by the GPOR is comprised of nonhomogenous subtest units, classical determinations of internal consistency between homogenous subtest units are not appropriate.9
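The dependence of an alpha coefficient on homogeneity of the subtests can be seen in a small numerical sketch. The code below assumes the common Cronbach form of coefficient alpha, which the text does not specify, and uses invented scores; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of subtests.

    Each subtest is a list of scores for the same students in the same order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Invented scores: three subtests that rank four students identically...
homogeneous = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
# ...and three subtests that rank the same students quite differently.
mixed = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 1, 3]]

alpha_homogeneous = cronbach_alpha(homogeneous)
alpha_mixed = cronbach_alpha(mixed)
print(alpha_homogeneous > alpha_mixed)
```

Alpha reaches its maximum when the subtests are perfectly homogeneous and falls sharply when they measure different things, which is the reason given above for not reporting it for the GPOR.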
REFERENCES
1. Charvat, J., McGuire, C., and Parsons, V.: A Review of the Nature and Uses of Examinations in Medical Education. Geneva: World Health Organization, 1968.
2. Weed, L. L.: Medical Records, Medical Education and Patient Care. Cleveland, Ohio: The Press of Case-Western Reserve University, 1970.
3. Grupe, W., ed.: Objectives and Goals-Phase III Pediatrics. Cleveland, Ohio: Babies and Children’s Hospital, Department of Pediatrics, Case-Western Reserve University School of Medicine, 1971.
4. Margolis, C. Z., Cook, C. D., and Pearson, H. A.: Pediatric house officer training: I. Educational objectives and performance criteria. Unpublished data.
5. Barich, D.: History and Physical Examination Outline. Cleveland, Ohio: Cleveland Metropolitan General Hospital, Department of Pediatrics, 1970.
6. McGuire, C. H., and Solomon, L. M., ed.: Clinical Simulations. New York: Appleton-Century-Crofts Inc., 1971.
7. Feinstein, A. R.: Clinical Judgment. Baltimore: The Williams & Wilkins Co., 1967.
8. Morrison, D. F.: Multivariate Statistical Methods. New York: McGraw-Hill, 1967.
9. Rajaratnam, N., Cronbach, L. J., and Gleser, G. C.: Generalizability of stratified parallel tests. Psychometrika, 30:39, 1965.
Acknowledgment
We would like to thank Drs. C. D. Cook and H. A. Pearson for helpful editorial criticisms, Miss Caroline Kraynak for invaluable assistance in preparing the GPOR, the faculty and students at both
schools for their cooperation and patience, and