Evaluation of Clinical Competence: The Gap Between Expectation and Performance
Bahman Joorabchi, MD, MEd and Jeffrey M. Devries, MD, MPH
ABSTRACT. Objective. To evaluate a 3-year experience with the Objective Structured Clinical Examinations (OSCEs) and to compare faculty expectations with resident performance.
Design. Descriptive analysis of measures of resident performance.
Setting. Community-based pediatric residency program in Michigan.
Participants. One hundred twenty-six pediatric residents at all levels of training.
Methods. The three examinations consisted of 36 to 42 5-minute stations, testing skills in physical examination, history, counseling, telephone management, and test interpretation. A committee of faculty and chief residents predetermined minimum pass levels for each resident level. Results were compared with other indices of resident performance.
Results. There was evidence for content, construct, and concurrent validity, as well as a high degree of reliability. However, 40% to 96% of residents scored below the minimum pass levels for their levels. In each examination, third-year residents had the highest failure rates, yet they scored well on the American Board of Pediatrics in-training examination and on their monthly clinical evaluations. Furthermore, for residents at all levels, the scores reflecting application of data were significantly lower than those assessing data gathering.
Conclusions. The gaps between expectations and performance, and between data gathering and application, have important implications for institutional educational philosophy, suggesting a shift toward more clinically oriented and learner-directed strategies in the design of instructional and evaluation methods. Pediatrics 1996;97:179-184; evaluation of clinical competence, criterion-referenced evaluation.
ABBREVIATIONS. OSCE, Objective Structured Clinical Examination; MPL, minimum pass level; ITE, in-training examination; RPR, resident performance rating.
From the Departments of Pediatrics, Henry Ford Health System, Detroit, and St. Joseph Mercy Hospital, Pontiac, MI.
Recipient of The First Ray E. Helfer Award for Innovation in Pediatric Education, Ambulatory Pediatric Association, May 3, 1994.
Received for publication Nov 7, 1994; accepted Mar 21, 1995. Reprint requests to (B.J.) 900 Woodward Ave, Pontiac, MI 48341-2985. PEDIATRICS (ISSN 0031 4005). Copyright © 1996 by the American Academy of Pediatrics.
Evaluating clinical competence is a desired, but elusive, goal in medical education. To overcome the problems of inadequate sampling, subjective scoring, interrater variability, and low reliability, Harden et al1 introduced the Objective Structured Clinical Examination (OSCE) 21 years ago. The method uses real or simulated patients in a multistation format that evaluates a variety of clinical skills and attitudes, as well as cognitive objectives. In half of the stations, the examinees, provided with specific instructions, carry out clearly defined tasks, such as patient interview or counseling, focused physical examination, performance of a procedure, telephone management, and interpretation of test results. While the examinees perform these tasks, an observer evaluates them using a detailed checklist that contains a long list of all possible actions that the students should take and some that they should avoid. Additionally, the real or simulated patients complete their own rating scales evaluating communication skills and attitudes.
In the other half of the stations, the students answer open-ended or multiple-choice questions based on the results of the clinical task just completed. They may be asked to generate a list of differential diagnoses, to interpret clinical findings, to propose treatment plans, to write admission orders, etc.
This form of examination is gaining wide acceptance in Europe and in the Commonwealth countries. It is used for instruction as well as for evaluation. Its use in the United States has been limited to a relatively small number of medical schools and university residency programs.2-9 The reasons for the lack of widespread application in this country include the lack of a tradition for clinical evaluation, historical reliance on paper-and-pencil tests as the ultimate in objective evaluation, and the high demand for faculty time, commitment, and expertise.
Since 1990, we have administered three OSCEs to a total of 126 pediatric residents in a community-based program. The report of the experience in the first year remains the only published pediatric example for residents.10 This report updates our experience with the OSCE, assessing its validity and reliability, and compares faculty expectations with resident performance.
We hypothesized that the examinations would continue to be both valid and reliable. Based on our observations in previous years, we further hypothesized that resident performance would fall below faculty expectations.
METHODS
The planning process, patient selection and training, preparation of checklists, rating scales and test questions, orientation of residents and observers, and details of the test administration have been reported previously for the 1990 OSCE.10 The examinations given in 1991 and 1993 were similar in format but not identical in content. A task force composed of six to eight full-time clinical pediatric faculty (both generalists and subspecialists) and two fourth-year level chief residents created a blueprint for each examination. The selection of problems for station development was guided by the written, problem-based program objectives and was performed according to such considerations as adequate sampling, prevalence, priority, availability, and practicality.
at Viet Nam:AAP Sponsored on September 1, 2020
www.aappublications.org/news
A complete list of all stations used in the three examinations is given in the Appendix. A sampling of this list, consisting of 36 to 42 5-minute stations, was used each year. Each examination comprised four physical examination stations, six to eight interviews (including counseling and telephone management), six laboratory stations, and one or two technical procedures. After most stations, the residents answered written questions and outlined their treatment plans. Additionally, six to eight rest stops were provided.
In the physical examination, history, counseling, procedures, and telephone management stations, residents performed clearly defined clinical tasks, while an observer rated their performance on a detailed checklist. The patients (or their parents) evaluated the residents’ communication skills and bedside manner on a uniform, five-item rating scale.
After production of the final version of each station, but before its administration, the planning task force used the Nedelsky11 and modified Angoff12 methods to determine a minimum pass level (MPL) for each OSCE. For each item in the observer checklists and rating scales and for each short-answer or multiple-choice question and management plan, committee members agreed on the correct answers and arrived at a consensus score that a minimally competent resident must achieve to pass that particular item or question. The determination of performance level is essentially a subjective process representing faculty expectations. Every attempt was made to moderate this expectation, keeping it in line with program objectives. MPLs were calculated for each station and subsection of the test, as well as for the entire OSCE. MPLs were derived for the first- and third-year residents separately. An average of first- and third-year values was assigned to the second-year residents. Using these MPLs, the proportion of residents at various levels passing each OSCE was calculated. Members of the standard setting committee also rated residents during the examination but in only one or two stations. In any case, the checklists were designed in such a way as to minimize subjective scoring.
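The MPL derivation described above can be illustrated with a minimal sketch. It assumes an Angoff-style procedure in which each judge estimates, per item, the probability that a minimally competent resident earns the point, and it uses the mean across judges as a stand-in for the committee's consensus score. The function name `angoff_mpl` and all judge estimates are hypothetical; only the rule that the second-year MPL is the average of the first- and third-year values comes from the text.

```python
from statistics import mean


def angoff_mpl(judge_estimates):
    """Return an MPL as a percentage of the maximum score.

    judge_estimates: one list per item, holding each judge's estimate (0-1)
    of the chance that a minimally competent resident passes that item.
    Assumption: every item is worth one point, and the mean across judges
    stands in for the consensus reached by the committee.
    """
    item_means = [mean(item) for item in judge_estimates]
    return 100 * sum(item_means) / len(item_means)


# Hypothetical estimates from three judges for a three-item station.
pl1_estimates = [[0.5, 0.4, 0.6], [0.3, 0.5, 0.4], [0.6, 0.6, 0.6]]
pl3_estimates = [[0.8, 0.7, 0.9], [0.6, 0.7, 0.5], [0.7, 0.8, 0.6]]

mpl_pl1 = angoff_mpl(pl1_estimates)
mpl_pl3 = angoff_mpl(pl3_estimates)
# As in the text, second-year residents receive the average of the two.
mpl_pl2 = (mpl_pl1 + mpl_pl3) / 2

print(round(mpl_pl1, 1), round(mpl_pl2, 1), round(mpl_pl3, 1))
```

Because the standard rests entirely on judges' estimates, the resulting MPL is only as realistic as the expectations that feed it.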
Each examination was conducted on a weekend day during a 4-hour period. A clinic module large enough to accommodate up to 42 stations was used. Breakfast, lunch, and refreshments were provided. As part of their orientation, the residents were told that the purpose of the examination was to provide feedback to the faculty and to the residents on the effectiveness of clinical instruction and to assist the Evaluation Committee in the disposition of borderline cases. After the examination, the residents completed a seven-item questionnaire10 evaluating the experience.
All examinations were corrected by hand. Scores were derived for each station both individually and, where applicable, as couplets, combining the observer scores with the postencounter written results. The score for each station was standardized to a maximum score of 10. For each OSCE, separate scores were derived for data gathering (the sum of all observer checklists in history, counseling, telephone management, and physical examination stations), application (the sum of postencounter written scores), and communication skills (the sum of rating scales completed by patients). Thus, stations with patients yielded three different scores, and those without patients produced two types of scores, all marked independently.
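The scoring scheme just described can be sketched as follows. This is an illustration only: the `StationResult` type, the instrument labels, and the raw scores are hypothetical, and the examinations themselves were corrected by hand. Each instrument is rescaled to a maximum of 10 and then summed into the subscore it feeds.

```python
from dataclasses import dataclass

# Which instrument feeds which subscore, following the text.
SUBSCORE_OF = {
    "observer_checklist": "data_gathering",
    "written": "application",
    "patient_rating": "communication",
}


@dataclass
class StationResult:
    instrument: str   # one of the keys of SUBSCORE_OF
    raw: float        # points earned
    max_raw: float    # points available

    def scaled(self) -> float:
        """Rescale the raw score so that the maximum is 10."""
        return 10.0 * self.raw / self.max_raw


def subscores(results):
    """Sum the rescaled scores into the three subscores."""
    totals = {name: 0.0 for name in SUBSCORE_OF.values()}
    for result in results:
        totals[SUBSCORE_OF[result.instrument]] += result.scaled()
    return totals


# Hypothetical resident: two observed stations, one written, one rating scale.
demo = [
    StationResult("observer_checklist", 12, 20),
    StationResult("written", 3, 6),
    StationResult("patient_rating", 20, 25),
    StationResult("observer_checklist", 9, 10),
]
totals = subscores(demo)
print(totals)
```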
Scores of residents at various levels of training were compared with each other, as well as with the results of the American Board of Pediatrics in-training examinations (ITEs) and monthly resident performance ratings (RPRs), featuring the critical incident technique.13 In this method, members of the evaluating group, which consisted of the rounding physician, the senior resident(s), and the head nurse, each provided accounts of specific behaviors (“critical incidents”) displayed by the resident during the month. Both positive and negative incidents were recorded. Subjective assessments (eg, “interested” and “hard working”) were avoided. To facilitate recall, blank slips for recording and filing behaviors were made available in various locations. Following the listing of critical incidents, a nine-item, Likert-type rating scale was completed. The ratings were based on group consensus, facilitated by the critical incidents just recorded. In addition to the total RPR scores, subscores for data gathering, application, and communication skills were derived. Mean ratings from the 3 months closest to the date of the OSCE were used. Only the 1993 RPRs were used, because subscores were available for only that year.
Using statistical computer programs, analysis of variance, Pearson’s product-moment correlations, paired t tests, and chi-square tests were calculated. Generalizability coefficients14 were determined in various ways. A coefficient was derived for the entire test, considering each station couplet score as an item. Separate coefficients were derived for application, data gathering, and communication skills by selecting only postencounter written scores, observation checklists, and patient rating scales, respectively.
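As an illustration of the reliability index, the sketch below computes a generalizability coefficient for the simplest case: a fully crossed residents-by-stations design with a single facet, where the coefficient coincides with Cronbach's alpha. The design is an assumption, since the text does not specify it, and the function name `g_coefficient` and the score matrix are hypothetical.

```python
def g_coefficient(scores):
    """Generalizability coefficient for a crossed persons x stations design.

    scores[p][s] is the score of resident p on station s. Variance
    components are estimated from the two-way ANOVA mean squares:
    var_person = (MS_person - MS_residual) / n_stations, and the
    coefficient is var_person / (var_person + MS_residual / n_stations).
    """
    n_p, n_s = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_s)
    person_means = [sum(row) / n_s for row in scores]
    station_means = [sum(row[s] for row in scores) / n_p for s in range(n_s)]

    ss_person = n_s * sum((m - grand) ** 2 for m in person_means)
    ss_station = n_p * sum((m - grand) ** 2 for m in station_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_residual = ss_total - ss_person - ss_station

    ms_person = ss_person / (n_p - 1)
    ms_residual = ss_residual / ((n_p - 1) * (n_s - 1))
    var_person = (ms_person - ms_residual) / n_s
    return var_person / (var_person + ms_residual / n_s)


# Hypothetical scores: four residents (rows) on three stations (columns).
demo_scores = [[6, 7, 5], [8, 9, 9], [4, 5, 6], [7, 6, 8]]
g_demo = g_coefficient(demo_scores)
print(round(g_demo, 3))
```

In this formulation, adding stations shrinks the error term `ms_residual / n_s`, so a longer examination tends to yield a higher coefficient for the same underlying consistency.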
RESULTS
Validity
Content validity is defined as the extent to which inspection and analysis of the contents of an examination indicate that the stated or implied objectives are being measured. Content validity was indicated by the following: (1) faculty review of the OSCE blueprints verified a wide sampling of content and process skills that adequately reflected written program objectives; (2) analysis of individual stations by the faculty indicated that the OSCEs measured clinically important, common, and relevant objectives; and (3) on the post-test questionnaire, the majority of residents agreed that the examinations were realistic and appropriate measures of clinical competence.
Construct validity is said to exist when a hypothesis advanced to define an abstract concept such as clinical competence is validated by the results of the test. The following data demonstrate the construct validity of these tests.
TABLE 1. Analysis of Variance of the Scores Grouped According to the Level of Training*
                 Sum of All Scores
Year   n     PL-1    PL-2    PL-3    F†      P‡
1990   29    136.1   151.1   166.2   34.65   .000
1991   32    163.2   184.2   188.9    5.35   .01
1993   65    190.2   220.7   230.5   21.04   .000
* Scores are given as group means. PL-1, PL-2, and PL-3 indicate first-, second-, and third-year postgraduate level, respectively; n indicates the total number of residents in each of the examinations.
† F statistic of analysis of variance.
‡ Significance of the differences among the three levels of training. Student-Newman-Keuls multiple comparisons revealed significant differences among all groups, except for PL-2 and PL-3 groups in 1991 and 1993.
TABLE 2. Analysis of Variance of the Scores Grouped According to the Level of Training
         Communication Skills*
Year   PL-1    PL-2    PL-3    F      P
1990   12.85   14.4    15.8    2.64   .09
1991   52.6    53.4    51.8    0.09   .91
1993   65.1    75.3    73.7    2.25   .11
* Scores represent group means; 1990 scores are lower because only two completed rating scales were available. Abbreviations are as in Table 1.
Table 1 shows the sum of scores for all three OSCEs. As can be seen, residents at an advanced level of training scored higher than those at more junior levels. The differences among resident levels become even more significant when only the data gathering and information processing scores are considered. This relationship would be anticipated, because tests designed to measure clinical competence should discriminate among groups at different stages of training.
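The F statistics reported for the comparison of training levels come from a one-way analysis of variance. A minimal sketch of that computation follows; the function name `one_way_anova_f` and the three groups of scores are hypothetical and are not the study data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score lists."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: spread of residents around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical total scores for first-, second-, and third-year residents.
pl1 = [130, 140, 138, 135]
pl2 = [150, 155, 148, 152]
pl3 = [165, 170, 160, 168]
f_demo = one_way_anova_f([pl1, pl2, pl3])
print(round(f_demo, 2))
```

A large F indicates that the separation between training levels is large relative to the spread within each level, which is the pattern a test with construct validity is expected to show.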
Table 2 lists the mean group scores on communication skills as recorded by the standardized patients (or their parents) in the history, counseling, telephone management, and physical examination stations. As can be seen, there were no differences among the resident groups in their social skills and communication styles, as measured by this uniform questionnaire. This failure to demonstrate improvement might be anticipated, inasmuch as these skills are presumably more ingrained, and their development has not been addressed adequately in our curriculum.
Concurrent validity is presumed when there is
agreement between the results of a given test and
those of others measuring attainment of the same
objectives. Concurrent validity is indicated by the
following data.
Table 3 indicates the results of comparisons between the American Board of Pediatrics ITE and the OSCE. In all three examinations, there were good correlations between the ITE and overall OSCE scores. The correlations were highest between the ITE and postencounter written stations (application), moderate with observer checklists (data gathering), and low with communication scores. These results also could be taken as a measure of construct validity; the OSCE is measuring areas that are not tested by paper-and-pencil methods.
Table 4 reveals the correlations between the monthly RPRs (and their subsections) and corresponding scores of the 1993 OSCE. There were moderate, but still significant, correlations between the various subsections.
Reliability was checked by measuring generalizability coefficients.14 This is an index of the reproducibility of examinee ranking if tested by another similar examination containing a different sample of cases and/or examiners.
The results of the reliability tests are shown in Table 5. The values for the tests in their entirety are comparable to the reliability scores of standardized paper-and-pencil examinations. The coefficients for subscores measuring general skills of data gathering, application, and communication are also within acceptable limits. In light of the recognized phenomenon of poor correlations between performance in one case and that in other cases (content specificity),6,9,15 the high values obtained in these series may be attributable to the relatively large number of stations.
TABLE 3. Concurrent Validity: Correlations Between OSCE Subscores and the American Board of Pediatrics ITE Scores
                          ITE Scores
OSCE Scores      1990    1991    1993    Mean
Entire test      0.71*   0.53*   0.54*   0.59
Application      0.72*   0.59*   0.71*   0.67
Data gathering   0.56*   0.32    0.55*   0.47
Communication    0.45†   0.15    0.15    0.25
* P < .01.
† P < .05.
TABLE 4. Concurrent Validity: Correlations Between OSCE Subscores and the Monthly RPRs*
                                     RPRs
OSCE Scores      Overall   Application   Data Gathering   Communication
Entire test      0.39†     0.57†         0.46†            0.41
Application      0.28      0.58†         0.36†            0.28‡
Data gathering   0.38†     0.54†         0.39†            0.42†
Communication    0.30‡     0.32‡         0.38†            0.32‡
* Complete data available only for 1993 examination; n = 65.
† P < .01.
‡ P < .05.
TABLE 5. Generalizability Coefficients
OSCE Score 1990 1991 1993
Total test 0.80 0.81 0.86
Data gathering 0.77 0.71 0.82
Application 0.63 0.73 0.67
Communication 0.26* 0.63 0.82
* Only two rating scales were available for analysis.
Resident Performance
There was a significant gap between faculty expectations, as reflected in the MPLs, and resident performance in all three OSCEs (Table 6). Even though the MPLs, as percentages of maximum possible scores, were not considered by the planning task force to be excessive, a very high proportion of the residents scored below the pass levels. The proportions of residents scoring below the MPLs were significantly different among the three resident levels, with a greater proportion of the more advanced residents failing the examination. Although the raw scores demonstrated the expected increase with advancing levels of training (Table 1), the faculty expectations rose at an even greater rate.
There was also a large and consistent difference between data gathering and application scores in all 3 OSCE years among all resident groups. This is shown in Table 7. Although there was a high correlation between these two scores (the r column), the data gathering scores were significantly higher than the application scores (the t column).
TABLE 6. Resident Performance Based on MPLs: 3-Year Pooled Data
Year of    No. of      MPL as % of      % Residents
Training   Residents   Maximum Score    Below MPL*
1          64          48               41
2          36          57               55
3          26          68               96
* Chi-square, 23.19; P = .000.
DISCUSSION
Our 3-year experience with pediatric OSCEs indicates that the method was feasible, albeit labor-intensive and costly, and could be implemented with a high degree of reliability and validity. Furthermore, this process generated enthusiastic support and encouragement from faculty, residents, patients, and their families, as well as from the administration.
The focus of this study was to compare faculty expectations with resident performance. The wide gap that was found consistently and uniformly during the entire experience was disconcerting to both faculty and residents. A number of previous studies have found similarly high failure rates in criterion-referenced evaluations of clinical competence.6,16-18 Even the standard-setting physicians fell far short of their own criteria when simulated patients were introduced anonymously into these physicians’ practices.19,20
Possible explanations for the discrepancy found between faculty expectations and resident performance in this study include the following.
Poor Caliber of the Participating Residents
This is not borne out by other indicators, such as the results of ITEs and monthly resident evaluations. Overall success rates of board certification, fellowship procurement, and job placement of these and other recent graduates of this program are equivalent to those of residents in similar training programs. Admittedly, our residents represent a culturally diverse group, for many of whom English is a second language. This, however, is unlikely to be a major factor in explaining the low pass levels for three reasons: (1) a major criterion for admission into this program has always been proficiency in the English language and in the overall communication skills of the applicants; (2) as many as one third of our residents previously have had advanced pediatric or other clinical training and would be expected to perform better than average; and (3) if suboptimal language skills were detrimental to clinical performance in this group of residents, the pass rates might be expected to rise with increasing years of experience. In contrast, based on the MPLs, the pass rates declined with advancing years.
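The decline in pass rates with training level can be checked against the pooled data in Table 6. The sketch below back-calculates the number of residents below the MPL from the reported percentages (an assumption, since the table gives only rounded percentages) and computes the Pearson chi-square statistic for the resulting 3 x 2 table. The helper name `chi_square` is hypothetical. Under this assumption the statistic comes out close to the 23.19 given in the table footnote.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Table 6: (number of residents, % below MPL) for training years 1, 2, and 3.
table6 = [(64, 41), (36, 55), (26, 96)]

# Assumption: counts are recovered by rounding n * percent / 100.
counts = []
for n_residents, pct_below in table6:
    below = round(n_residents * pct_below / 100)
    counts.append([below, n_residents - below])

chi2 = chi_square(counts)
print(counts, round(chi2, 2))
```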
Poor Quality of the OSCEs
The cited validity and reliability data support appropriate levels of test integrity. Resident feedback on post-OSCE surveys indicated that there were no significant distractions. Although 5 minutes per station may have been too short for some residents, the majority thought that the allotted time was “about right.” Review of video sampling of the encounters supported this and demonstrated an orderly progression of events without significant disruptions.
Unrealistic Faculty Expectations
Alternatively, the results of this and other similar studies6,16-20 may indicate unrealistic faculty expectations that stem from the following:
Contrived Nature of the Evaluation Process. Despite all attempts to minimize it, there is always a measure of machination in any examination. The standard-setting process is not exempt; it concentrates on isolated items and encourages overexploration. Experienced physicians, however, gather data in a gestalt context and often use short cuts to solve problems. They pursue data gathering in a logical sequence; responses to questions determine subsequent questions, resulting in an efficient line of investigation. Candidates without skills in examination taking may strive to reach a conclusion using such short cuts and may lose points in the process.
Fragmentary Evaluation of Clinical Competence. In most programs, there is scant direct observation of students and residents during clinical encounters with patients. Surveys have shown that 30-90% of internal medicine residents7 and as many as 60% of fourth-year medical students21 had never been observed by their faculty while performing complete histories and physical examinations. Thus, there is a continued tendency to equate cognitive skills with clinical proficiency, despite evidence to the contrary. This can lead not only to complacent conclusions but also to heightened expectations.
Incomplete Assessment of Data Processing Facility. Even if the process of a physical examination or an interview is observed, the assumption that skills in data gathering are equivalent to those in data application may not be warranted. The remarkable gap that was shown consistently between observer checklist scores and postencounter written scores in all three OSCEs (Table 7) attests to this. An evaluator simply observing a clinical examination may be misled into unwarranted conclusions regarding the effective use of the data gathered. This is likely to result in unrealistic expectations.
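The two statistics behind this comparison, a Pearson correlation and a paired t test on each resident's data gathering and application scores, can be sketched as follows. The function names and the five pairs of scores are hypothetical and serve only to show how two sets of scores can be highly correlated and still differ markedly in level.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def paired_t(x, y):
    """Paired t statistic: mean difference over its standard error."""
    differences = [a - b for a, b in zip(x, y)]
    standard_error = stdev(differences) / sqrt(len(differences))
    return mean(differences) / standard_error


# Hypothetical scores for five residents (not the study data).
data_gathering = [60, 55, 70, 65, 50]
application = [40, 38, 52, 41, 35]

r_demo = pearson_r(data_gathering, application)
t_demo = paired_t(data_gathering, application)
print(round(r_demo, 2), round(t_demo, 2))
```

A high r shows that residents who gather data well also tend to apply it relatively well, while a large t shows that, resident by resident, application scores sit well below data gathering scores.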
Teacher-oriented Educational Philosophy. Finally, the pervasive educational philosophy of traditional systems still equates teaching with learning and holds that information transfer confers problem-solving skills. During the scoring of these tests, there was a constant refrain from the faculty expressing chagrin at these “poor results,” despite all the “teaching” that had taken place.
Implications for Curriculum Development
Despite similar findings by other investigators,6,16-20 the revelation of such a disturbing discrepancy between faculty expectations and resident performance in one’s own training program compels a comprehensive, critical evaluation of existing educational practices. The results of our OSCE made it evident to even the most complacent of our faculty that significant rethinking was imperative. This was probably the greatest benefit to derive from the very labor-intensive process of the OSCE. Subsequent critical reappraisal of our program uncovered the following: (1) despite the availability of detailed, written objectives, they were not used consistently in designing the instructional program in each rotation; (2) the written educational objectives were not sufficiently specific to allow a valid measurement of their attainment; (3) recently hired faculty members occasionally disagreed with the educational objectives that had been written by their predecessors and did not feel compelled to use them; (4) didactic sessions, consisting primarily of one-way communication, did not offer the teachers feedback about the ability of the trainees to integrate the knowledge into clinical situations; (5) there was excessive reliance on resident reports of their clinical activities and findings, without adequate direct observation; and (6) monthly written resident evaluation forms addressed standard areas of performance (eg, history taking, appropriate use of laboratory tests, clinical judgment, and interpersonal relations), without assessing the trainees’ attainment of the specific educational objectives.
TABLE 7. Comparison of Data-gathering and Application Scores*
Year   Data Gathering   Application   r†    t‡   P§
1990   58.7             41.2          .67   13   <.0001
1991   55.8             32.1          .69   21   <.0001
1993   84.2             52.4          .76   33   <.0001
* Scores are presented as mean pooled raw scores for each test.
† Correlations between the two scores.
‡ Paired t test comparisons of the two scores.
§ t test comparisons.
In response to the concerns raised by the OSCE, our program has embarked on the transition to a competency-based curriculum. Within the disciplines of general and subspecialty pediatrics, specific learning outcomes are being identified. The list of competencies for each area is based on the skills required for the practice of general pediatrics, as determined by epidemiologic studies and opinion surveys. Competencies will be expressed in specific, observable behaviors, amenable to testing by various evaluation tools, which may include the OSCE as well as other, more traditional, methods. A separate resident evaluation form will be designed for each discipline, reflecting attainment of competencies specific to that area. In this manner, we hope to correct the deficiencies highlighted by the OSCE, to improve the learning experience of our residents significantly, and to develop a curriculum that will serve as a model for competency-based training that can be applied to both resident and medical student programs in pediatrics and other specialties.
Conclusions
Evaluation of clinical competence is complex and labor intensive but can yield valid and reliable data on resident performance. The significant gap between faculty expectations and resident performance demonstrated in this and other studies emphasizes the need for a shift in institutional philosophy, objectives, instructional designs, and evaluation methods toward more problem-based, clinically oriented, and learner-directed strategies. Data-gathering, interpretation, problem-solving, and communication skills need to be specifically targeted. Studies designed to test the effectiveness of such measures would be of great general interest.
ACKNOWLEDGMENTS
This work was supported by departmental funds.
We thank members of the planning task forces, the nursing staff, and the patients and their families for their enthusiasm, effort, and cooperation, which made these examinations possible. We especially acknowledge the invaluable assistance rendered by
Marjorie Chartier, both in the production of the OSCEs and in the
preparation of this manuscript.
APPENDIX
List of All Stations Developed to Date
The stations are grouped according to the task
required. For identification purposes, each station is
given a name. Approximately 60% of the stations
were used in each of the three examinations.
Physical Examination Stations, Observer Present
Hearing. Otoscopy with pneumotoscopy and a test
of hearing in a healthy adolescent.
Heart Murmur. Cardiac examination in a
5-year-old child with a small ventricular septal defect.
Duplicate patients with identical findings alternated.
Headache. Neurologic examination in a 12-year-old
girl with a history of severe headaches.
Anemia. Focused physical examination in 8-year-old twins with anemia, jaundice, and hepatosplenomegaly.
Facies. Focused physical examination in a 5-year-old girl with fetal alcohol syndrome.
Hardball. Focused physical examination in a 16-year-old with Graves' disease.
History Stations, Observer Present
Big Foot. Twelve-year-old with a history of swollen ankles (simulated patients).
Cholesterol, Part I. Recently discovered hypercholesterolemia in two siblings (real patients).
Wheezer. Two-year-old with frequent respiratory...
Cesarean, Part I. History from a lying-in nurse while an emergency Cesarean section is in progress (real nurse, simulated operation).
ALTE. Two-month-old infant who "stopped breathing" (simulated mother).
Infant Check. Nine-month-old infant with failure to thrive (real mother).
Yellow is Mellow. Newborn with jaundice (simulated mother).
Murialz C. Three-month-old infant with frequent vomiting.
Counseling, Observer Present
Will Baby Learn? Resident informs and counsels mother whose newborn has physical characteristics of trisomy 21 (simulated mother).
Cholesterol, Part II. Treatment advice for family with hypercholesterolemia (real family).
Telephone Management, Observer Present: Simulated Parent Calls From Adjoining Room
Cellular One. Distraught mother of a "colicky" infant.
Heavy Breather. Parent of an 18-month-old with noisy breathing. In a subsequent station, examinees select among several upper airway films the one consistent with this patient's presentation (see below).
Laboratory Medicine
Urinalysis. Slides of urinary sediment, reports on urinalysis and culture, and films from an intravenous pyelogram.
Blood Smear. Slides of hypochromic, microcytic anemia under a microscope.
Tachycardia. Pretreatment and posttreatment electrocardiograms of tachycardia in an infant.
Pain in the Butt. Color slides of introitus of a 3-year-old girl with burning at urination.
Bowlegs. Radiographs and results of blood and urine tests on a patient with possible rickets.
Bellyache. Chest and abdominal radiographs of a patient with lower lobe pneumonia presenting as possible appendicitis.
Hear Ye. Results of tympanometry in an 18-month-old girl.
Growth. Four growth charts to match four case scenarios.
Nose Knows. Results of complete blood counts and slides of a nasal smear in a patient with chronic rhinorrhea.
Technical Procedures, Observer Present
R/O Sepsis. Prepare and perform a lumbar puncture with proper sterile techniques on a human immunodeficiency virus-positive infant (mannequin).
Cesarean, Part II. Set up equipment for resuscitation of a newborn about to be delivered; select or ask for appropriate sizes.
Miscellaneous
Chart Review. Critique a medical record of a patient admitted for abdominal pain.
REFERENCES
Stridor. Three sets of upper airway films, either independently or coupled with "Heavy Breather."
Beat of the Heart. Electrocardiograms with premature ventricular and atrial contractions.
Cough. Chest roentgenograms of a 2-year-old child with lobar atelectasis.
1. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. Br Med J. 1975;1:447-451
2. Stillman P, Swanson D. Ensuring the clinical competence of medical school graduates through standardized patients. Arch Intern Med. 1987;147:1049-1052
3. Petrusa ER, Blackwell TA, Rogers LP, Saydjari C, Parcel S, Guckian JC. An objective measure of clinical performance. Am J Med. 1987;83:34-42
4. Hoole AJ, Kowlowitz V, McGaghie WC, Sloane PD, Colindres RE. Using the objective structured clinical examination at the University of North Carolina Medical School. N C Med J. 1987;48:463-467
5. Harris IB, Miller WJ. Feedback in an objective structured clinical examination by students serving as patients, teachers and examiners. Acad Med. 1990;65:443-434
6. Vu NV, Barrows HS, Marcy ML, Verhulst SJ, Colliver AJ, Travis T. Six years of comprehensive clinical, performance-based assessment using standardized patients at the Southern Illinois University School of Medicine. Acad Med. 1992;67:42-50
7. Stillman PL, Swanson DB, Smee S, et al. Assessing clinical skills of residents with standardized patients. Ann Intern Med. 1986;105:762-771
8. Petrusa ER, Blackwell TA, Ainsworth MA. Reliability and validity of an objective structured clinical examination for assessing the clinical performance of residents. Arch Intern Med. 1990;150:573-577
9. van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: state of the art. Teaching Learning Med. 1990;2:58-76
10. Joorabchi B. Objective structured clinical examination in a pediatric residency program. Am J Dis Child. 1991;145:757-762
11. Nedelsky L. Absolute grading standards for objective tests. Educ Psychol Measurement. 1954;14:3-19
12. Angoff WH. Scales, norms, and equivalent scores. In: Thorndyke RL, ed. Educational Measurement. Washington, DC: American Council on Education; 1971:514-515
13. Flanagan JC. Critical incident technique. Psychol Bull. 1954;51:327-358
14. Brennan R. Elements of Generalizability Theory. Iowa City, IA: American College Testing Program; 1983
15. Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ. 1988;22:325-334
16. Gleeson F. Defects in postgraduate clinical skills as revealed by the objective structured long examination record (OSLER). Ir Med J. 1992;85:11-13
17. Cater JI, Forsyth JS, Frost GJ. The use of objective structured clinical examination as an audit of teaching and student performance. Med Teach. 1991;13:253-257
18. Hoppe RB, Farquhar LJ, Henry R, Stoffelmayr B. Residents' attitudes toward skills in counselling: using undetected standardized patients. J Gen Intern Med. 1990;5:415-420
19. Norman GR, Neufeld VR, Walsh A, Woodward CA, McConvey GA. Measuring physicians' performances by using simulated patients. J Med Educ. 1985;60:925-934
20. Kopelow ML, Schnabl GK, Hassard TH, et al. Assessing practicing physicians in two settings using standardized patients. Acad Med. 1992;67:S19-S21
21. Stillman PL, Regan MB, Swanson DB. A diagnostic fourth-year performance assessment. Arch Intern Med. 1987;147:1981-1985
22. Association of American Medical Colleges. External examinations for evaluation of medical education achievement and for licensure. J Med Educ. 1981;56:933-962
23. Muller S. Physicians for the 21st century: report of the panel on the general professional education of physicians and college preparation for medicine. J Med Educ. 1984;59(pt 2):11-13
1996;97;179
Pediatrics
Bahman Joorabchi and Jeffrey M. Devries
Evaluation of Clinical Competence: The Gap Between Expectation and Performance
http://pediatrics.aappublications.org/content/97/2/179
Pediatrics is the official journal of the American Academy of Pediatrics. A monthly publication, it has
been published continuously since 1948. Pediatrics is owned, published, and trademarked by the
American Academy of Pediatrics, 345 Park Avenue, Itasca, Illinois, 60143. Copyright © 1996 by the
American Academy of Pediatrics. All rights reserved. Print ISSN: 1073-0397.