Evaluation of Clinical Competence: The Gap Between Expectation and Performance


Bahman Joorabchi, MD, MEd and Jeffrey M. Devries, MD, MPH

ABSTRACT. Objective. To evaluate a 3-year experience with the Objective Structured Clinical Examinations (OSCEs) and to compare faculty expectations with resident performance.

Design. Descriptive analysis of measures of resident performance.

Setting. Community-based pediatric residency program in Michigan.

Participants. One hundred twenty-six pediatric residents at all levels of training.

Methods. The three examinations consisted of 36 to 42 stations of 5 minutes each, testing skills in physical examination, history, counseling, telephone management, and test interpretation. A committee of faculty and chief residents set minimum pass levels in advance for each level of training. Results were compared with other indices of resident performance.

Results. There was evidence for content, construct, and concurrent validity, as well as a high degree of reliability.

However, 40% to 96% of residents scored below the minimum pass levels for their level of training. In each examination, third-year residents had the highest failure rates, yet they scored well on the American Board of Pediatrics in-training examination and on their monthly clinical evaluations. Furthermore, for residents at all levels, the scores reflecting application of data were significantly lower than those assessing data gathering.

Conclusions. The gaps between expectations and performance, and between data gathering and application, have important implications for institutional educational philosophy, suggesting a shift toward more clinically oriented and learner-directed strategies in the design of instructional and evaluation methods.

Pediatrics 1996;97:179-184; evaluation of clinical competence, criterion-referenced evaluation.

ABBREVIATIONS. OSCE, Objective Structured Clinical Examination; MPL, minimum pass level; ITE, in-training examination; RPR, resident performance rating.

From the Departments of Pediatrics, Henry Ford Health System, Detroit, and St. Joseph Mercy Hospital, Pontiac, MI.

Recipient of the first Ray E. Helfer Award for Innovation in Pediatric Education, Ambulatory Pediatric Association, May 3, 1994.

Received for publication Nov 7, 1994; accepted Mar 21, 1995. Reprint requests to (B.J.) 900 Woodward Ave, Pontiac, MI 48341-2985. PEDIATRICS (ISSN 0031-4005). Copyright © 1996 by the American Academy of Pediatrics.

Evaluating clinical competence is a desirable but elusive goal in medical education. To overcome the problems of inadequate sampling, subjective scoring, interrater variability, and low reliability, Harden et al¹ introduced the Objective Structured Clinical Examination (OSCE) 21 years ago. The method uses real or simulated patients in a multistation format that evaluates a variety of clinical skills and attitudes, as well as cognitive objectives. In half of the stations, the examinees, provided with specific instructions, carry out clearly defined tasks, such as a patient interview or counseling, a focused physical examination, performance of a procedure, telephone management, and interpretation of test results. While the examinees perform these tasks, an observer evaluates them with a detailed checklist of all possible actions that they should take and some that they should avoid. Additionally, the real or simulated patients complete their own rating scales evaluating communication skills and attitudes.

In the other half of the stations, the examinees answer open-ended or multiple-choice questions based on the results of the clinical task just completed. They may be asked to generate a list of differential diagnoses, interpret clinical findings, propose treatment plans, or write admission orders.

This form of examination is gaining wide acceptance in Europe and in the Commonwealth countries. It is used for instruction as well as for evaluation. In the United States, its use has been limited to a relatively small number of medical schools and university residency programs.²⁻⁹ Reasons for the lack of widespread application in this country include the lack of a tradition of clinical evaluation, the historical reliance on paper-and-pencil tests as the ultimate in objective evaluation, and the high demand for faculty time, commitment, and expertise.

Since 1990, we have administered three OSCEs to a total of 126 pediatric residents in a community-based program. Our report of the first year's experience remains the only published pediatric example for residents.¹⁰ This report updates our experience with the OSCE, assesses its validity and reliability, and compares faculty expectations with resident performance.

We hypothesized that the examinations would continue to be both valid and reliable. On the basis of our observations in previous years, we further hypothesized that resident performance would fall below faculty expectations.

METHODS

The planning process, patient selection and training, preparation of checklists, rating scales, and test questions, orientation of residents and observers, and details of the test administration have been reported previously for the 1990 OSCE.¹⁰ The examinations given in 1991 and 1993 were similar in format but not identical in content. A task force composed of six to eight full-time clinical pediatric faculty (both generalists and subspecialists) and two fourth-year chief residents created a blueprint for each examination. The selection of problems for station development was guided by the written, problem-based program objectives and by such considerations as adequate sampling, prevalence, priority, availability, and practicality.

A complete list of all stations used in the three examinations is given in the Appendix. Each year, a sample of 36 to 42 stations of 5 minutes each was drawn from this list. Each examination comprised four physical examination stations, six to eight interviews (including counseling and telephone management), six laboratory stations, and one or two technical procedures. After most stations, the residents answered written questions and outlined their treatment plans. Additionally, six to eight rest stops were provided.

In the physical examination, history, counseling, procedures, and telephone management stations, residents performed clearly defined clinical tasks while an observer rated their performance on a detailed checklist. The patients (or their parents) evaluated the residents' communication skills and bedside manner on a uniform, five-item rating scale.

After production of the final version of each station, but before its administration, the planning task force used the Nedelsky¹¹ and modified Angoff¹² methods to determine a minimum pass level (MPL) for each OSCE. For each item in the observer checklists and rating scales and for each short-answer or multiple-choice question and management plan, committee members agreed on the correct answers and arrived at a consensus score that a minimally competent resident must achieve to pass that particular item or question. Determining the performance level is essentially a subjective process that represents faculty expectations. Every attempt was made to moderate these expectations and keep them in line with program objectives. MPLs were calculated for each station and subsection of the test, as well as for the entire OSCE. MPLs were derived separately for first- and third-year residents; second-year residents were assigned the average of the first- and third-year values. These MPLs were then used to calculate the proportion of residents at each level who passed each OSCE. Members of the standard-setting committee also rated residents during the examination, but only at one or two stations. In any case, the checklists were designed to minimize subjective scoring.

Each examination was conducted on a weekend day during a 4-hour period. A clinic module large enough to accommodate up to 42 stations was used. Breakfast, lunch, and refreshments were provided. As part of their orientation, the residents were told that the purpose of the examination was to provide feedback to the faculty and to the residents on the effectiveness of clinical instruction and to assist the Evaluation Committee in the disposition of borderline cases. After the examination, the residents completed a seven-item questionnaire¹⁰ evaluating the experience.

All examinations were scored by hand. Scores were derived for each station both individually and, where applicable, as couplets, combining the observer scores with the postencounter written results. The score for each station was standardized to a maximum of 10. For each OSCE, separate scores were derived for data gathering (the sum of all observer checklists in the history, counseling, telephone management, and physical examination stations), application (the sum of postencounter written scores), and communication skills (the sum of rating scales completed by patients). Thus, stations with patients yielded three different scores and those without patients yielded two, all marked independently.
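A sketch of this score bookkeeping follows, under stated assumptions. Each station is represented as a hypothetical record holding the raw points and maximum points for each component it produced (observer checklist, postencounter written answers, patient rating scale). The text does not say exactly how the standardization to 10 interacts with the subscores, so this sketch rescales the pooled couplet to 10 for the station score and rescales each component separately before summing it into its subscore. The labels, the `Station` record, and all point values are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical component labels for the three subscores.
DATA_GATHERING = "observer_checklist"
APPLICATION = "postencounter_written"
COMMUNICATION = "patient_rating"

@dataclass
class Station:
    name: str
    # component label -> (raw points earned, maximum possible points)
    components: dict = field(default_factory=dict)

def standardize(raw, maximum, scale=10.0):
    """Rescale a raw score so that the maximum possible score maps to `scale`."""
    return scale * raw / maximum

def couplet_score(station):
    """Station (couplet) score: pool every component, then rescale to 10."""
    raw = sum(r for r, _ in station.components.values())
    maximum = sum(m for _, m in station.components.values())
    return standardize(raw, maximum)

def subscores(stations):
    """Sum each station's rescaled components into the three OSCE subscores."""
    totals = {DATA_GATHERING: 0.0, APPLICATION: 0.0, COMMUNICATION: 0.0}
    for station in stations:
        for label, (raw, maximum) in station.components.items():
            totals[label] += standardize(raw, maximum)
    return totals

if __name__ == "__main__":
    # Invented points for a station with a patient and one without.
    heart = Station("Heart Murmur", {DATA_GATHERING: (12, 15),
                                     APPLICATION: (3, 5),
                                     COMMUNICATION: (20, 25)})
    smear = Station("Blood Smear", {APPLICATION: (4, 8)})
    print(round(couplet_score(heart), 2))
    print(subscores([heart, smear]))
```

Pooling raw points before rescaling weights each component by its number of points; rescaling components separately, as in `subscores`, gives every component equal weight. Either choice is defensible, and the source does not say which was used.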

Scores of residents at the various levels of training were compared with each other, as well as with the results of the American Board of Pediatrics in-training examinations (ITEs) and with monthly resident performance ratings (RPRs), which were based on the critical incident technique.¹³ In this method, each member of the evaluating group (the rounding physician, the senior resident or residents, and the head nurse) provided accounts of specific behaviors ("critical incidents") displayed by the resident during the month. Both positive and negative incidents were recorded. Subjective assessments (eg, "interested" and "hard working") were avoided. To facilitate recall, blank slips for recording and filing behaviors were made available in various locations. After the critical incidents were listed, a nine-item, Likert-type rating scale was completed. The ratings were based on group consensus, facilitated by the critical incidents just recorded. In addition to the total RPR scores, subscores for data gathering, application, and communication skills were derived. Mean ratings from the 3 months closest to the date of the OSCE were used. Only the 1993 RPRs were used, because subscores were available for only that year.

Analysis of variance, Pearson product-moment correlations, paired t tests, and χ² tests were computed with statistical software. Generalizability coefficients¹⁴ were computed at several levels. A coefficient was derived for the entire test, treating each station couplet score as an item. Separate coefficients were derived for application, data gathering, and communication skills by selecting only the postencounter written scores, the observation checklists, and the patient rating scales, respectively.

RESULTS

Validity

Content validity is defined as the extent to which inspection and analysis of the contents of an examination indicate that the stated or implied objectives are being measured.

Content validity was indicated by the following: (1) faculty review of the OSCE blueprints verified a wide sampling of content and process skills that adequately reflected the written program objectives; (2) analysis of individual stations by the faculty indicated that the OSCEs measured clinically important, common, and relevant objectives; and (3) on the posttest questionnaire, the majority of residents agreed that the examinations were realistic and appropriate measures of clinical competence.

Construct validity is said to exist when a hypothesis advanced to define an abstract concept such as clinical competence is validated by the results of the test. The following data demonstrate the construct validity of these tests.

Table 1 shows the sum of all scores on each of the three OSCEs, by level of training. Residents at more advanced levels of training scored higher than those at more junior levels. The differences among resident levels become even more significant when only the data-gathering and information-processing scores are considered. This relationship would be anticipated, because tests designed to measure clinical competence should discriminate among groups at different stages of training.

TABLE 1. Analysis of Variance of the Scores Grouped According to the Level of Training*

Sum of all scores
Year | n | PL-1 | PL-2 | PL-3 | F† | P‡
1990 | 29 | 136.1 | 151.1 | 166.2 | 34.65 | <.001
1991 | 32 | 163.2 | 184.2 | 188.9 | 5.35 | .01
1993 | 65 | 190.2 | 220.7 | 230.5 | 21.04 | <.001

* Scores are given as group means. PL-1, PL-2, and PL-3 indicate first-, second-, and third-year postgraduate level, respectively; n indicates the total number of residents in each of the examinations.
† F statistic of analysis of variance.
‡ Significance of the differences among the three levels of training. Student-Newman-Keuls multiple comparisons revealed significant differences among all groups, except for the PL-2 and PL-3 groups in 1991 and 1993.

TABLE 2. Analysis of Variance of the Scores Grouped According to the Level of Training

Communication skills*
Year | PL-1 | PL-2 | PL-3 | F | P
1990 | 12.85 | 14.4 | 15.8 | 2.64 | .09
1991 | 52.6 | 53.4 | 51.8 | 0.09 | .91
1993 | 65.1 | 75.3 | 73.7 | 2.25 | .11

* Scores represent group means; 1990 scores are lower because only two completed rating scales were available. Abbreviations are as in Table 1.
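The F statistics in Tables 1 and 2 come from a one-way analysis of variance across the three training levels. The individual resident scores are not published, so the sketch below uses invented groups; it shows only how F and its degrees of freedom are formed and does not include the Student-Newman-Keuls follow-up comparisons.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA over k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

if __name__ == "__main__":
    # Invented total scores for PL-1, PL-2, and PL-3 residents.
    pl1, pl2, pl3 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
    print(one_way_anova_f(pl1, pl2, pl3))
```

The P value requires the upper tail of the F distribution, which the standard library lacks; `scipy.stats.f_oneway` returns both the statistic and its P value.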

Table 2 lists the mean group scores for communication skills as recorded by the standardized patients (or their parents) in the history, counseling, telephone management, and physical examination stations. There were no significant differences among the resident groups in their social skills and communication styles, as measured by this uniform questionnaire. This failure to demonstrate improvement might be anticipated, inasmuch as these skills are presumably more ingrained and their development has not been addressed adequately in our curriculum.

Concurrent validity is presumed when there is agreement between the results of a given test and those of others measuring attainment of the same objectives. Concurrent validity is indicated by the following data.

Table 3 shows the correlations between scores on the American Board of Pediatrics ITE and the OSCE. In all three examinations, ITE scores correlated well with overall OSCE scores. The correlations with the ITE were highest for the postencounter written stations (application), moderate for the observer checklists (data gathering), and low for the communication scores. These results also could be taken as evidence of construct validity: the OSCE measures areas that are not tested by paper-and-pencil methods.

Table 4 shows the correlations between the monthly RPRs (and their subsections) and the corresponding scores on the 1993 OSCE. The correlations between the various subsections were moderate but still significant.

Reliability was assessed with generalizability coefficients.¹⁴ A generalizability coefficient indexes how reproducible the ranking of examinees would be if they were tested with a similar examination containing a different sample of cases, examiners, or both.

The results of the reliability analyses are shown in Table 5. The values for the tests in their entirety are comparable to the reliability coefficients of standardized paper-and-pencil examinations. The coefficients for the subscores measuring general skills of data gathering, application, and communication are also within acceptable limits, except for the 1990 communication score, which was based on only two rating scales. In light of the recognized phenomenon of poor correlation between performance on one case and performance on other cases (content specificity),⁶,⁹,¹⁵ the high values obtained in this series may be attributable to the relatively large number of stations.
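The argument that a large number of stations offsets content specificity can be made concrete with a decision-study projection. Under a single-facet, persons × stations design (an assumption; the source does not state the design), the coefficient for a test of n stations is σ²_p / (σ²_p + σ²_res / n), which rises toward 1 as n grows even when the residual person-by-station variance is much larger than the person variance. The variance components below are invented for illustration and are not estimates from these OSCEs; the function names are hypothetical.

```python
def projected_g(var_person, var_residual, n_stations):
    """Projected relative G coefficient for a test of n_stations stations
    (single-facet persons x stations design)."""
    return var_person / (var_person + var_residual / n_stations)

def stations_needed(var_person, var_residual, target):
    """Smallest number of stations whose projected coefficient reaches target."""
    if var_person <= 0 or not 0 < target < 1:
        raise ValueError("need positive person variance and 0 < target < 1")
    n = 1
    while projected_g(var_person, var_residual, n) < target:
        n += 1
    return n

if __name__ == "__main__":
    # Invented components: residual variance 8 times the person variance,
    # i.e. strong content specificity.
    vp, vr = 1.0, 8.0
    for n in (5, 10, 20, 40):
        print(n, round(projected_g(vp, vr, n), 2))
    print("stations for 0.80:", stations_needed(vp, vr, 0.80))
```

With these invented components, a handful of stations yields a low coefficient, while several dozen stations push it into the range reported in Table 5; this is the same reasoning as the Spearman-Brown prophecy formula applied to stations.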

TABLE 3. Concurrent Validity: Correlations Between OSCE Subscores and the American Board of Pediatrics ITE Scores

OSCE Score | ITE 1990 | ITE 1991 | ITE 1993 | Mean
Entire test | 0.71* | 0.53* | 0.54* | 0.59
Application | 0.72* | 0.59* | 0.71* | 0.67
Data gathering | 0.56* | 0.32 | 0.55* | 0.47
Communication | 0.45† | 0.15 | 0.15 | 0.25

* P < .01.
† P < .05.

TABLE 4. Concurrent Validity: Correlations Between OSCE Subscores and the Monthly RPRs*

OSCE Score | RPR Overall | RPR Application | RPR Data Gathering | RPR Communication
Entire test | 0.39† | 0.57† | 0.46† | 0.41
Application | 0.28 | 0.58† | 0.36† | 0.28‡
Data gathering | 0.38† | 0.54† | 0.39† | 0.42†
Communication | 0.30‡ | 0.32‡ | 0.38† | 0.32‡

* Complete data available only for the 1993 examination; n = 65.
† P < .01.
‡ P < .05.

TABLE 5. Generalizability Coefficients

OSCE Score | 1990 | 1991 | 1993
Total test | 0.80 | 0.81 | 0.86
Data gathering | 0.77 | 0.71 | 0.82
Application | 0.63 | 0.73 | 0.67
Communication | 0.26* | 0.63 | 0.82

* Only two rating scales were available for analysis.

Resident Performance

There was a significant gap between faculty expectations, as reflected in the MPLs, and resident performance in all three OSCEs (Table 6). Although the planning task force did not consider the MPLs excessive (48% to 68% of the maximum possible score, depending on level), a very high proportion of residents (41% to 96%) scored below them. The proportion of residents scoring below the MPL differed significantly among the three levels and was greatest among the most advanced residents. Although the raw scores showed the expected increase with advancing level of training (Table 1), faculty expectations rose even faster.
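The χ² statistic reported with Table 6 can be checked approximately from the table itself. The per-level counts below are an assumption: they are reconstructed by rounding the reported percentages (41% of 64, 55% of 36, and 96% of 26) to whole residents. With those counts, a standard Pearson χ² test of independence on the resulting 3 × 2 table gives a value matching the reported 23.19. For 2 degrees of freedom, the upper-tail probability has the closed form exp(−χ²/2), so no statistics library is needed.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: PL-1, PL-2, PL-3; columns: below MPL, at or above MPL.
# Counts reconstructed from the Table 6 percentages (assumption).
counts = [
    [round(0.41 * 64), 64 - round(0.41 * 64)],
    [round(0.55 * 36), 36 - round(0.55 * 36)],
    [round(0.96 * 26), 26 - round(0.96 * 26)],
]
chi2, df = chi_square_independence(counts)
p = math.exp(-chi2 / 2)  # exact upper-tail P value, valid only for df = 2
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.1e}")
```

For tables with other degrees of freedom, `scipy.stats.chi2_contingency` computes the same statistic together with its P value.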

TABLE 6. Resident Performance Based on MPLs: 3-Year Pooled Data

Year of Training | No. of Residents | MPL as % of Maximum Score | % of Residents Below MPL*
1 | 64 | 48 | 41
2 | 36 | 57 | 55
3 | 26 | 68 | 96

* χ² = 23.19; P < .001.

There was also a large and consistent difference between data-gathering and application scores among all resident groups in all three OSCE years (Table 7). Although the two scores were highly correlated (r = .67 to .76), data-gathering scores were significantly higher than application scores in every year (paired t tests, P < .0001).
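The r and t columns of Table 7 correspond to two standard paired-sample statistics: the Pearson correlation between each resident's data-gathering and application scores, and the paired t statistic on the per-resident differences. The individual scores are not published, so the usage example below uses invented numbers; the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson product-moment correlation of two paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

if __name__ == "__main__":
    # Invented per-resident scores: data gathering vs application.
    gathering = [60, 55, 70, 65, 58]
    application = [40, 38, 50, 47, 36]
    print(round(pearson_r(gathering, application), 2))
    print(paired_t(gathering, application))
```

A high r together with a large t is exactly the pattern in Table 7: residents who gather more data also apply more, yet almost every resident scores lower on application than on data gathering. `scipy.stats.pearsonr` and `scipy.stats.ttest_rel` return the same statistics along with P values.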

DISCUSSION

Our 3-year experience with pediatric OSCEs indicates that the method was feasible, albeit labor-intensive and costly, and could be implemented with a high degree of reliability and validity. Furthermore, this process generated enthusiastic support and encouragement from faculty, residents, patients, and their families, as well as from the administration.

The focus of this study was to compare faculty expectations with resident performance. The wide gap, which was found consistently and uniformly throughout the entire experience, was disconcerting to both faculty and residents. A number of previous studies have found similarly high failure rates in criterion-referenced evaluations of clinical competence.⁶,¹⁶⁻¹⁸ Even the standard-setting physicians fell far short of their own criteria when simulated patients were introduced anonymously into their practices.¹⁹,²⁰

Possible explanations for the discrepancy between faculty expectations and resident performance in this study include the following.

Poor Caliber of the Participating Residents

This explanation is not borne out by other indicators, such as the results of the ITEs and the monthly resident evaluations. The overall rates of board certification, fellowship procurement, and job placement among these and other recent graduates of this program are equivalent to those of residents in similar training programs. Admittedly, our residents are a culturally diverse group, and for many of them English is a second language. This, however, is unlikely to be a major factor in the low pass rates, for three reasons: (1) a major criterion for admission into this program has always been proficiency in the English language and the overall communication skills of the applicants; (2) as many as one third of our residents have had previous advanced pediatric or other clinical training and would be expected to perform better than average; and (3) if suboptimal language skills impaired clinical performance in this group of residents, the pass rates would be expected to rise with increasing years of experience. Instead, on the basis of the MPLs, the pass rates declined with advancing years.

Poor Quality of the OSCEs

The cited validity and reliability data support an appropriate level of test integrity. Resident feedback on post-OSCE surveys indicated that there were no significant distractions. Although 5 minutes per station may have been too short for some residents, the majority thought that the allotted time was “about right.” A review of a video sample of the encounters supported this and showed an orderly progression of events without significant disruptions.

Unrealistic Faculty Expectations

Alternatively, the results of this and other similar studies⁶,¹⁶⁻²⁰ may indicate unrealistic faculty expectations that stem from the following:

Contrived Nature of the Evaluation Process. Despite all attempts to minimize it, there is always a measure of artificiality in any examination. The standard-setting process is not exempt; it concentrates on isolated items and encourages overexploration. Experienced physicians, however, gather data in a gestalt context and often use shortcuts to solve problems. They pursue data gathering in a logical sequence in which the responses to questions determine subsequent questions, resulting in an efficient line of investigation. Candidates without examination-taking skills may try to reach a conclusion through such shortcuts and may lose points in the process.

Fragmentary Evaluation of Clinical Competence. In most programs, there is scant direct observation of students and residents during clinical encounters with patients. Surveys have shown that 30% to 90% of internal medicine residents⁷ and as many as 60% of fourth-year medical students²¹ had never been observed by their faculty while performing complete histories and physical examinations. Thus, there is a continued tendency to equate cognitive skills with clinical proficiency, despite evidence to the contrary. This can lead not only to complacent conclusions but also to inflated expectations.

Incomplete Assessment of Data Processing Facility. Even if the process of a physical examination or an interview is observed, the assumption that skills in data gathering are equivalent to those in data application may not be warranted. The remarkable gap shown consistently between observer checklist scores and postencounter written scores in all three OSCEs (Table 7) attests to this. An evaluator who simply observes a clinical examination may be misled into unwarranted conclusions about how effectively the gathered data are used. This is likely to result in unrealistic expectations.

Teacher-oriented Educational Philosophy. Finally, the pervasive educational philosophy of traditional systems still equates teaching with learning and holds that information transfer confers problem-solving skills. During the scoring of these tests, there was a constant refrain from the faculty expressing chagrin at these “poor results,” despite all the “teaching” that had taken place.

Implications for Curriculum Development

Despite similar findings by other investigators,⁶,¹⁶⁻²⁰ the revelation of such a disturbing discrepancy between faculty expectations and resident performance in one's own training program compels a comprehensive, critical evaluation of existing educational practices. The results of our OSCE made it evident to even the most complacent of our faculty that significant rethinking was imperative. This was probably the greatest benefit derived from the very labor-intensive process of the OSCE. Subsequent critical reappraisal of our program uncovered the following: (1) although detailed, written objectives were available, they were not used consistently in designing the instructional program in each rotation; (2) the written educational objectives were not sufficiently specific to allow valid measurement of their attainment; (3) recently hired faculty members occasionally disagreed with the educational objectives written by their predecessors and did not feel compelled to use them; (4) didactic sessions, consisting primarily of one-way communication, did not give the teachers feedback about the trainees' ability to integrate the knowledge into clinical situations; (5) there was excessive reliance on residents' reports of their clinical activities and findings, without adequate direct observation; and (6) the monthly written resident evaluation forms addressed standard areas of performance (eg, history taking, appropriate use of laboratory tests, clinical judgment, and interpersonal relations) without assessing the trainees' attainment of the specific educational objectives.

TABLE 7. Comparison of Data-gathering and Application Scores*

Year | Data Gathering | Application | r† | t‡ | P§
1990 | 58.7 | 41.2 | .67 | 13 | <.0001
1991 | 55.8 | 32.1 | .69 | 21 | <.0001
1993 | 84.2 | 52.4 | .76 | 33 | <.0001

* Scores are presented as mean pooled raw scores for each test.
† Correlations between the two scores.
‡ Paired t test comparisons of the two scores.
§ t test comparisons.

In response to the concerns raised by the OSCE, our program has embarked on the transition to a competency-based curriculum. Within the disciplines of general and subspecialty pediatrics, specific learning outcomes are being identified. The list of competencies for each area is based on the skills required for the practice of general pediatrics, as determined by epidemiologic studies and opinion surveys. Competencies will be expressed as specific, observable behaviors, amenable to testing by various evaluation tools, which may include the OSCE as well as other, more traditional, methods. A separate resident evaluation form will be designed for each discipline, reflecting attainment of the competencies specific to that area. In this manner, we hope to correct the deficiencies highlighted by the OSCE, to significantly improve the learning experience of our residents, and to develop a curriculum that will serve as a model for competency-based training that can be applied to both resident and medical student programs in pediatrics and other specialties.

Conclusions

Evaluation of clinical competence is complex and labor-intensive but can yield valid and reliable data on resident performance. The significant gap between faculty expectations and resident performance demonstrated in this and other studies emphasizes the need for a shift in institutional philosophy, objectives, instructional designs, and evaluation methods toward more problem-based, clinically oriented, and learner-directed strategies. Data-gathering, interpretation, problem-solving, and communication skills need to be specifically targeted. Studies designed to test the effectiveness of such measures would be of great general interest.

ACKNOWLEDGMENTS

This work was supported by departmental funds.

We thank members of the planning task forces, the nursing staff, and the patients and their families for their enthusiasm, effort, and cooperation, which made these examinations possible. We especially acknowledge the invaluable assistance rendered by Marjorie Chartier, both in the production of the OSCEs and in the preparation of this manuscript.

APPENDIX

List of All Stations Developed to Date

The stations are grouped according to the task required. For identification purposes, each station is given a name. Approximately 60% of the stations were used in each of the three examinations.

Physical Examination Stations, Observer Present

Hearing. Otoscopy with pneumotoscopy and a test of hearing in a healthy adolescent.

Heart Murmur. Cardiac examination in a 5-year-old child with a small ventricular septal defect. Duplicate patients with identical findings alternated.

Headache. Neurologic examination in a 12-year-old girl with a history of severe headaches.

Anemia. Focused physical examination in 8-year-old twins with anemia, jaundice, and hepatosplenomegaly.

Facies. Focused physical examination in a 5-year-old girl with fetal alcohol syndrome.

Hardball. Focused physical examination in a 16-year-old with Graves' disease.

History Stations, Observer Present

Big Foot. Twelve-year-old with a history of swollen ankles (simulated patients).

Cholesterol, Part I. Recently discovered hypercholesterolemia in two siblings (real patients).

Wheezer. Two-year-old with frequent respiratory …

Cesarean, Part I. History from a lying-in nurse while an emergency Cesarean section is in progress (real nurse, simulated operation).

ALTE. Two-month-old infant who "stopped breathing" (simulated mother).

Infant Check. Nine-month-old infant with failure to thrive (real mother).

Yellow Is Mellow. Newborn with jaundice (simulated mother).

Muriah C. Three-month-old infant with frequent vomiting.

Counseling, Observer Present

Will Baby Learn? Resident informs and counsels a mother whose newborn has physical characteristics of trisomy 21 (simulated mother).

Cholesterol, Part II. Treatment advice for a family with hypercholesterolemia (real family).

Telephone Management, Observer Present: Simulated Parent Calls From Adjoining Room

Cellular One. Distraught mother of a "colicky" infant.

Heavy Breather. Parent of an 18-month-old with noisy breathing. In a subsequent station, examinees select among several upper airway films the one consistent with this patient's presentation (see below).

Laboratory Medicine

Urinalysis. Slides of urinary sediment, reports on urinalysis and culture, and films from an intravenous pyelogram.

Blood Smear. Slides of hypochromic, microcytic anemia under a microscope.

Tachycardia. Pretreatment and posttreatment electrocardiograms of tachycardia in an infant.

Pain in the Butt. Color slides of the introitus of a 3-year-old girl with burning on urination.

Bowlegs. Radiographs and results of blood and urine tests on a patient with possible rickets.

Bellyache. Chest and abdominal radiographs of a patient with lower lobe pneumonia presenting as possible appendicitis.

Hear Ye. Results of tympanometry in an 18-month-old girl.

Growth. Four growth charts to match four case scenarios.

Nose Knows. Results of complete blood counts and slides of a nasal smear in a patient with chronic rhinorrhea.

Technical Procedures, Observer Present

R/O Sepsis. Prepare and perform a lumbar puncture with proper sterile technique on a human immunodeficiency virus-positive infant (mannequin).

Cesarean, Part II. Set up equipment for resuscitation of a newborn about to be delivered; select or ask for appropriate sizes.

Miscellaneous

Chart Review. Critique a medical record of a patient admitted for abdominal pain.

REFERENCES

Stridor. Three sets of upper airway films, either independently or coupled with "Heavy Breather."

Beat of the Heart. Electrocardiograms with premature ventricular and atrial contractions.

Cough. Chest roentgenograms of a 2-year-old child with lobar atelectasis.

1. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using objective structured examination. Br Med J. 1975;1:447-451

2. Stillman P, Swanson D. Ensuring the clinical competence of medical school graduates through standardized patients. Arch Intern Med. 1987;147:1049-1052

3. Petrusa ER, Blackwell TA, Rogers LP, Saydjari C, Parcel S, Guckian JC. An objective measure of clinical performance. Am J Med. 1987;83:34-42

4. Hoole AJ, Kowlowitz V, McGaghie WC, Sloane PD, Colindres RE. Using the objective structured clinical examination at the University of North Carolina Medical School. N C Med J. 1987;48:463-467

5. Harris IB, Miller M'J. Feedback in an objective structured clinical examination by students serving as patients, teachers and examiners. Acad Med. 1990;65:443-434

6. Vu NV, Barrows HS, Marcy ML, Verhulst SJ, Colliver AJ, Travis T. Six years of comprehensive clinical, performance-based assessment using standardized patients at the Southern Illinois University School of Medicine. Acad Med. 1992;67:42-50

7. Stillman PL, Swanson DB, Smee S, et al. Assessing clinical skills of residents with standardized patients. Ann Intern Med. 1986;105:762-771

8. Petrusa ER, Blackwell TA, Ainsworth MA. Reliability and validity of an objective structured clinical examination for assessing the clinical performance of residents. Arch Intern Med. 1990;150:573-577

9. van der Vleuten CPM, Swanson DB. Assessment of clinical skills with standardized patients: state of the art. Teach Learn Med. 1990;2:58-76

10. Joorabchi B. Objective structured clinical examination in a pediatric residency program. Am J Dis Child. 1991;145:757-762

11. Nedelsky L. Absolute grading standards for objective tests. Educ Psychol Measurement. 1954;14:3-19

12. Angoff WH. Scales, norms, and equivalent scores. In: Thorndike RL, ed. Educational Measurement. Washington, DC: American Council on Education; 1971:514-515

13. Flanagan JC. Critical incident technique. Psychol Bull. 1954;51:327-358

14. Brennan R. Elements of Generalizability Theory. Iowa City, IA: American College Testing Program; 1983

15. Newble DI, Swanson DB. Psychometric characteristics of the objective structured clinical examination. Med Educ. 1988;22:325-334

16. Gleeson F. Defects in postgraduate clinical skills as revealed by the objective structured long examination record (OSLER). Ir Med J. 1992;85:11-13

17. Cater JI, Forsyth JS, Frost GJ. The use of objective structured clinical examination as an audit of teaching and student performance. Med Teach. 1991;13:253-257

18. Hoppe RB, Farquhar LJ, Henry R, Stoffelmayr B. Residents' attitudes toward skills in counselling: using undetected standardized patients. J Gen Intern Med. 1990;5:415-420

19. Norman GR, Neufeld VR, Walsh A, Woodward CA, McConvey GA. Measuring physicians' performances by using simulated patients. J Med Educ. 1985;60:925-934

20. Kopelow ML, Schnabl GK, Hassard TH, et al. Assessing practicing physicians in two settings using standardized patients. Acad Med. 1992;67:519-521

21. Stillman PL, Regan MB, Swanson DB. A diagnostic fourth-year performance assessment. Arch Intern Med. 1987;147:1981-1985

22. Association of American Medical Colleges. External examinations for the evaluation of medical education achievement and for licensure. J Med Educ. 1981;56:933-962

23. Muller S. Physicians for the 21st century: report of the panel on the general professional education of physicians and college preparation for medicine. J Med Educ. 1984;59(pt 2):11-13


Evaluation of Clinical Competence: The Gap Between Expectation and Performance

Bahman Joorabchi and Jeffrey M. Devries

Pediatrics 1996;97;179

Updated Information & Services, including high resolution figures, can be found at: http://pediatrics.aappublications.org/content/97/2/179


