Structured
ImplicitReview
of
the
Medical
Record:
A
Method
for
Measuring
the
Quality
of
Inhospital
Medical
Care
and
a
Summary
of
Quality
Changes
Following
Implementation
of
the
Medicare
Prospective
Payment System
Lisa
V.Rubenstein, Katherine
L.Kahn,
Ellen R.
Harrison,
Marjorie
J.Sherwood,
The
research described in this reportwas
sponsoredby
theHealth
Care
Financing
Administration, U.S.Department
ofHealth
and
Human
Servicesunder
CooperativeAgreement
No. 18-C-98853/9-03.The
RAND
Publication Series:The
Report is the principal publicationdoc-umenting
and
transmittingRAND's
major research findingsand
final researchresults.
The
RAND
Note
reports other outputs of sponsored research forgeneral distribution. Publications of
RAND
do not necessarily reflect theopin-ions or policies ofthe sponsors of
RAND
research.Published 1991 by
RAND
A
RAND
NOTE
N-3033-HCFA
Structured
Implicit
Review
of
the
Medical
Record:
A
Method
for
Measuring
the
Quality
of
Inhospital
Medical
Care
and
a
Summary
of
Quality
Changes
Following
Implementation
of
the
Medicare
Prospective
Payment System
Lisa
V.
Rubenstein, Katherine
L.Kahn,
Ellen
R.Harrison,
Marjorie
J.Sherwood,
William
H.
Rogers,
Robert
H.
Brook
Prepared
for
the
Health
Care
Financing Administration,
U.S.
Department
of
Health
and
Human
Services
-111-PREFACE
As
a cost controlmeasure, the federalgovernment
established aprospectivepayment
system(PPS) in 1983 to reimburse hospitals forinhospitalcare ofMedicarepatients.
PPS
changed theway
Medicare reimbursed hospitalsfrom
acost or chargebasis to a prospectively determined, fixed-price systemin
which
hospitals are paid accordingto the diagnosis related group(DRG)
intowhich
a patient isclassified.The
fixed
payment
perpatient providesfinancial incentives for hospitals to reduce bothlength ofstay and intensity ofcare, givingrise toconcerns that the quality ofcaremay
havedeclined underPPS.
In October 1985,
The
RAND
Corporationbegan
a study, fundedby
the Health Care FinancingAdministration(HCFA),
to evaluate theimpact oftheDRG-based
PPS
system.
The
projectwas
undertaken in collaborationwith theProfessionalReview
Organizations
(PROs)
in five states.The
objective ofthisfour-year studywas
todetermine whether
PPS
cost-containment efforts have affected the quality of care receivedby
hospitalized Medicare patients.Six diseases or conditions
from
among
themost
common
andmost
seriousadmittingdiagnoses ofelderly Medicare patients
were
selected for study: congestiveheart failure, acute myocardial infarction, hipfracture, pneumonia, cerebrovascular
accident,
and
depression. Well-defined diagnosticcriteria existfor these diseases, andtheir treatment
was
fairly stablefrom
1981 to 1986. Inaddition, forthese diseases, thepatient's medical record contains theinformation necessary to assess patientsickness at the time ofhospital admission and the quality ofhospital care.
Hip
fracture anddepression
were
chosen to provide dataabout care for surgical and psychiatric conditions,to supplement the information providedby
the four medicalconditions.Five states
were
selected for study, eachfrom
a different geographic region ofthenation: California, Florida, Indiana, Pennsylvania, and Texas.
Each
statewas
dividedinto 10to 12 study areas, anddata
were
gatheredfrom
a sample of four to eight areas ineach state, for atotal of
30
areas.
-iv-1986).
Twenty
percent ofthe totalpatient samplewas
collected in 1981,30
percent in1982, and
50
percent in the post-PPS time period.The
sampleis representative ofpatients nationally withrespect to thepercentage ofhospitalsinurban areas, their size, thenumber
ofhouse staffper hospitalbed, and thepercentage ofMedicare admissions. City/countyhospitals and hospitalswith
many
Medicaid patientswere
oversampled to better estimate theeffect ofPPS
on
poorpatients.Of
the305
hospitalsasked to participate,297
actually did so.The
planned studysample
was
17,000 patients.To
collect data on thisnumber
ofpatients, 21,925 medicalrecords
were
selected forreview. During medical record review, 5,167patients did notmeet
clinical criteria forhaving the selected diseases (even though the recordwas
coded
as such), and these
were
excludedfrom
the study. Hospitalswere
unable tolocate870
records.
The
final study sampleconsisted of 16,758 patients.Both
explicitand implicit measureswere
used to assess qualityofcare. Explicitmeasures usepreset criteria to evaluate each patient's care. Implicitmeasures use
physician
judgment
to assign quality ofcare rating.A
10percent subsample ofthe entirestudy sample ofmedical records underwent implicit review.
The
sample ofmedicalrecords receiving implicitreview
was
evaluatedby
physician specialistsfrom
each ofthefive study states.
Disease-specific abstraction forms
were
developed to collect explicit datafrom
the patient's medicalrecord about sickness at admission, process ofcare, inhospitaloutcomes, and patient status at discharge.
These
measures havebeen
reported elsewhere.In developing disease-specific instruments for implicitlymeasuring quality of
care, first the literature
on
qualityofcare for patients hospitalized with each disease andtheliterature
on
how
todo
implicit reviewwere
reviewed. Then, PRO-selectedphysicians
from
the five study stateswere
invitedto participate in six disease-specificpanel meetings todiscuss and test implicitmeasures ofquality ofcare. Physicians
were
asked to detail the problemsthey had previously experienced in implicitly reviewingrecords for purposes ofinhospital assessment (e.g., morbidity and mortality conferences)
or
PRO
review. In addition, advicewas
sought about theway
medical records mightdiffer
by
type ofhospital orgeographic area so that the implicitreviewform
formeasuring quality of care couldbe developed foruse in allkinds of acute care general
-V
Kahn,
Katherine L., LisaV. Rubenstein, Marjorie J.Sherwood,
andRobert H. Brook, StructuredImplicit
Review
forPhysicianMeasurement
ofQualityof Care:Development
oftheForm
and
Guidelines forIts Use,The
RAND
Corporation,N-3016-HCFA,
1989
-vii-SUMMARY
To
measure
the qualityof medical care before and afterimplementation oftheprospective
payment
system (PPS)by
Medicare,we
developedmethods
forbothimplicitand explicitreview ofthe medical record. Explicitreview relies
upon
measurement
ofthedegree to
which
care accords with previously specified criteria.To
perform implicitreview, reviewers use their
own
unspecified criteria tojudge care.Our
method
ofperforming implicitreview
was
structured, requiring the reviewing physicians to analyzespecified parts ofthemedical record and to assess theirquality using arating scale.
Each
implicit
judgment
was
thus recorded as anitemon
the structured implicitreview form.Items couldthenbe
combined
intolargerscales.We
applied ourstructured implicitreview (SIR) toasample of 1,366
Medicare
patients withcongestive heartfailure, acutemyocardial infarction, pneumonia,cerebrovascularaccident, or hipfracture
who
were
hospitalized in
1981-1982
or 1985-1986.The
implicitreview samplewas
arandom
subsetofthe 14,012
Medicare
patients included inthePPS
study.The
demographic
characteristics ofpatients in the implicitreview sample
were
similarto thoseofthe largersample from
which
itwas
drawn.To
determine the performance characteristics ofstructured implicitreview,we
evaluated interitem and interraterreliability. Interitem reliabilities for scaleswere
between
0.8and
0.9.A
randomly
selected sample of about 25 percentofall recordswas
reviewed
by two
reviewers, and a 3 percentsamplewas
reviewedby
fivereviewers.We
found asignificantreviewer effect, with
some
reviewersjudgingcaremore
harshly andsome more
leniently. After adjusting for thereviewereffect, interrater reliabilities for scaleswere between
0.4 and0.7 and thuswere
adequate forcomparing
groups ofpeople.At
thislevelofreliability,however, using theSpearman-Brown
Prophesy Formula,records
would
have tobe reviewedby
fourto seven reviewers to be 95 percent certainthatthe care delivered
was
poororvery poor. In contrast, onlyone
reviewerwould
beneeded
to be sure that carewas
eithergood
orvery good. Despite the limitations inreliability ofthe structured implicitreview method,
we
found thatverypoor
qualityof
-viii-When
we
evaluated the quality ofcare before and afterPPS,
we
found thatthe quality ofmedical careimproved between 1981-1982
and 1985-1986. Twenty-fivepercent ofpatientsreceived pooror very poorcare duringthe earlierperiod, whereas only 12 percent did inthelaterperiod.
More
patients, however,were
judged tohavebeen
discharged too soon and inunstable condition duringthe laterperiod (7 percentcompared
with4
percent). Thus,except for discharge planningprocesses, thequality of
-ix-CONTENTS
PREFACE
iiiSUMMARY
. viiTABLES
xi Section 1.INTRODUCTION
1 Purpose 1Background
2 2.METHODS
5 PatientSample
5Performing Implicit
Review
5Structured Implicit
Review
Form
and Guidelines 9Additional Data Sources 13
Statistical
Methods
13Calculating the
Number
ofReviewersNeeded
163.
RESULTS
PART
I:PERFORMANCE
OF THE
SIR
METHOD
18Patient
Sample
18Implicit
Review
Quality Scales 19Reviewer
Effect 19Overall Quality Scale 19
Reliability
20
Components
of Variance in QualityScores 21Linking Process to
Outcome
22
Comparison
Between
Implicit and ExplicitReviews
25Number
ofReviewersNeeded
to Identify Poor Care 264.
RESULTS
PART
II:CHANGES
IN
QUALITY
OF
CARE
BETWEEN
1981-1982
AND
1985-1986
28
5.
DISCUSSION
31Appendix
A.
STRUCTURED
IMPLICIT
REVIEW
QUESTIONNAIRE
FOR
CONGESTIVE
HEART
FAILURE,
ACUTE MYOCARDIAL
INFARCTION
AND
PNEUMONIA
35B.
STRUCTURED
IMPLICIT
REVIEW
QUESTIONNAIRE
FOR
CEREBROVASCULAR
ACCIDENT
41C.
STRUCTURED
IMPLICIT
REVIEW
QUESTIONNAIRE
FOR
HIP
FRACTURE
45D.
COMPOSITION OF
IMPLICIT
REVIEW
SCALES
ON
THE
SIR
FORM
ITEMS
ON
COMPOSING
SCALES
FOR
EACH
DISEASE
47
E.
RELIABILITY
OF
IMPLICIT
REVIEW
ITEMS
48
-xi-TABLES
1.
The
inpatient hospitalization: aspects evaluatedby
implicitreviewby
SIR
questionnumber
122. Results: patient sample 18
3. Correlations ofindividual implicitreview scales and items with overall
quality ofcare scale 20
4. Reliability ofimplicitreview for five diseases 21
5. Percentage of variance inreviewerjudgments accounted for
by
threescores of variance 22
6. Predicting inhospital death, 30-day death, and 180-day death using quality
ofcare scores
from
implicitreviews forfive diseasescombined
237. Percentage ofpatients
who
died within 30 days ofhospitalizationdepending
upon
the overall quality ofcare received and the level ofsickness at admission 24
8. Percentage ofpatients discharged alive
who
died outside ofthe hospitalwithin 180 days
by
overall quality ofCare Received andthe level ofsickness at admission 24
9.
Comparison between
implicitand explicitprocess scale scores forfivediseases 25
10.
Number
of reviewers needed to identify very poor care orgood
care27
11.
The
quality ofmedical care inthe United States for five diseases asmeasured
by
implicitreview 2812. Percentage ofpatientswith poor orvery poor care
by
study period forfive diseases 29
13.
Changes
in the percentage ofpatientsjudged by reviewers tohaveinappropriate lengthofstay for fivediseases: pre-PPS (1981-82)
1.
INTRODUCTION
The
early entry ofthe medical profession into quality review strongly influencedthe subsequentevolution of
American
medicine(Codman,
n.d.). Qualityreviewcontinues tobe amajor feature ofmedical practice today.
The
two
most
common
qualityassessmentand assurance
methods
incurrent use are themeasurement
ofspecificstructuralcharacteristics ofhealthcare institutions,such asinfection controlprocedures,
and the
measurement
ofthe process ofmedical care using implicit review ofthe medicalrecord. Implicit review depends
upon
a practitioner's expertopinion to develop a qualityrating; explicit review, in contrast, usespreset criteria tojudge care. Despite itsactive
role inmedical care, the
measurement
characteristics ofimplicitreview have notbeen
extensively studied.
In addition to its importance as a
common
method
ofquality review, implicitreview ofthe medical record has the added significance ofbeing the current
community
gold standard for performing peer review. For example, implicit review is used as the basis for any final actions taken basedon quality reviews, such as denial of
payment
orcensure ofa health care provider.
Although
eitherimplicitor explicitreview can beperformed
by
observing physicians and patients, for practical reasons the medicalrecord hasbeen themost
common
data source for quality assessment (Petersen andAndrews,
1956). Thus,
when
ProfessionalReview
Organizations (PROs), hospital qualityassurance committees, and insurance
companies
perform qualityreviews, physicians using implicitpeer review ofthe medical record are always the finaljudges ofcare,althoughmedical records for review are often selected
by
nurse reviewers using explicitscreening criteria.
PURPOSE
The
purpose ofthis studywas
toimprove
implicitreview, to determine thereliability andvalidity ofthe
improved
method, and then to use itto evaluate changesinthe quality ofmedical care for Medicare patients inthe United States
between
1981 and1986.
We
report aswellupon
the relationshipbetween judgments
ofqualitybasedon
training and reviewprotocols that
were
specificenough
to ensure thatphysicianswere
basing theirjudgments on the
same
assumptions and information.We
wanted
disagreements about care to reflect true differences of opinion
among
reviewers ratherthan differences in the
way
the recordwas
reviewed or inhow
differentcomponents
ofquality
were
weightedin arriving ata finaljudgment.We
have called ourmethod
Structured Implicit
Review
(SIR) (Rubenstein et al., 1990).BACKGROUND
Researchers in the late 1960s and early 1970s developed considerable experience
with inpatient implicitmedical record review.
Using
physicians withspecial experiencein the clinical condition under study, earlyresearchers found
90
percent agreementamong
reviewers, with an additional 7to 8 percent ofdisagreements resolved afterintrareviewer consultation
(Morehead
et al., 1962; Rosenfeld, 1957). Subsequent studieswere
less optimistic. After review of 1,258obstetrical recordsby
three independentreviewers, Richardson (1972) found thatone-third ofthe time one reviewer disagreed
with the other
two
about the adequacy ofthe care delivered.To
facilitate agreementamong
more
diversegroups of reviewers, and to enable hospitalsto review and educatetheirphysicians, Fine and
Morehead
(1971), under the auspices ofthe Medical Society ofthe State of
New
York, developed aseries ofimplicit review forms forreviewers tofillin, termed "review patterns." Patterns
were
developed for acute appendicitis, fibroid uterus, prostatic hypertrophy, acute myocardial infarction, diabetes mellitus, andcerebralvascularaccident.
These
review patternsseemed
to facilitate agreementamong
reviewers, althoughformal analysisofagreement
was
not done.Richardson (1972) performed a
more
rigorousmethodological study ofimplicitreview. Richardson's review included
some
frankly explicit criteria, aswell as animplicit Overall Qualityjudgment. In hisstudy, 81 university-based physician specialists
(obstetricians, pediatricians, orsurgeons)
were
required tojudgecare for patientswithtoxemiaofpregnancy, placenta previa, and abruptio placentae; live neonates born to
these patients; gallbladder orbiliary tract operations; and infantile diarrhea.
Twenty-one
obstetrical charts, 10 surgical charts, and 10pediatric charts
were
scoredby
all reviewersin the relevant specialty. Richardson found that
some
reviewerswere
more
likely thanothers tojudge cases asunsatisfactory; that different aspects of care affected the final
-3-subtracting thatpart ofthe agreement that
was
due to chance(Kappa
interraterreliability),
was
0.398 for obstetrical cases, 0.457 for surgical cases, and 0.578 forpediatric cases.
Using
theseresults, he calculated, usingtheSpearman-Brown
predictionequation, that
between
16 and 28 judgeswould
benecessary tojudge an individual casewith areliabilityof0.95, the level considered acceptable to clinicians
whose
caseswere
the subjectof review. Peters (1972) reanalyzed Richardson's data and concluded that
although cases about
which
reviewers disagreed could notbe easilyjudged,unanimously
unsatisfactory cases could besuccessfully identified,without erroneously includinga large
number
ofsatisfactory cases.Brook
(1973) published a detailed analysisofthe qualityof care given to296
outpatientswithurinary tract infection, hypertension, or gastric or duodenal ulcer.
He
reviewed detailed case abstracts.
Ten
internistsreviewed one-third ofthe296
caseabstracts; each abstract
was
reviewedby
three reviewers. Caseswere
supplementedby
information aboutthe patient's medical
outcome
atfive months. Caseswere
judged interms ofadequacy ofthemedical care process, unprovability or unimprovability ofthe
outcome, and acceptability ofthe medical care overall.
He
foundthat all threephysicians agreed thatcare
was
acceptable in 11 percent of cases and unacceptable in 43percent. Physicianpair agreements ranged
from
53 to 95percent for processofcare, 53to
100
percentfor outcome, and37
to94
percent for quality ofcare.Kappas on
physician pairs, calculated subsequently
by
Koran
(1975a), averaged 0.29.The
interrater reliabilities reported in these studies appear lower thanwe
mighthope, but these reliabilities are not dramatically different
from
those obtainedby
comparing
variousfeatures ofthe actual clinicalexam,
or interpretinglaboratory testssuch as cardiograms.
Koran
(1975b)systematically reviewed levels ofagreement aboutthese tests as reported in a
wide
range ofstudies.Agreement
about thepresence orabsence ofthe dorsalis pedispulse
was
39
percent(Meade
et al., 1968; Light, 1971); aboutpresence or absence ofvariceson
esophagoscopy,46
percent(Conn
et al., 1965);and whether an electrocardiogram
(EKG)
was
normal or abnormal,84
percent (Acheson,1960).
Kappas
for these levels ofagreementwould
be lower still.In a
more
recent study,Dubois
et al. (1987a, 1987b) used bothimplicitandexplicit review to studythe quality ofcare delivered to patients
who
died in hospitalwithacute myocardialinfarction, pneumonia, or stroke. Explicit criteria used
were
taken inRubenstein et al., 1988;
Sherwood
et al., 1988). Implicit reviewswere
performedon
dictated abstracts of the medical record prepared
by
one ofthe investigators. Threereviewers reviewed all cases. Interrater reliabilities (by
ANOVA)
forlabeling a death aspreventable
were
0.55, 0.51, and 0.11 forcerebrovascular accident, myocardialinfarction, and pneumonia, respectively.
To
testfor stability in ratings over time asample of45 dictations
was
reviewed twice; agreementwas
69 percent for the45 pairs ofratings. Physicians also reviewed 19 photocopied complete medical records ofpatients
whose
dictatedsummaries
they had alsoreviewed; their ratings for the complete recordswere compared
with theirratings forthe 19 dictated summaries, and the rate ofagreement
was
84 percent (16 of19). Reliabilityforjudging Overall Quality ofcare, asopposed
topreventable deaths,was
notreported.Although
these studies and others (Sheps, 1955; Fletcher, 1965, 1974; Payne,1967;
Morehead
et al., 1971;Dans
et al., 1985; Grimaldi and Micheleteie, 1985;Mellette, 1986) have introduced a strongnote of caution into the implicitreview process,
no
bettermethod
ofperforming routinerecord reviews for inhospital care has beendeveloped. Explicit review iswell suited to large studiesor review efforts focused
on
particular diseases or clinical situations. Like implicit review, however, itrequires
extensive training and monitoringof personnel toprovide reliable data. Explicit reviews
become
increasingly time-consumingas they attemptto probedetails ofclinicalmanagement.
Furthermore, although peer review has intrinsicvalidity as thecommunity
standard, explicit reviewer's validity
may
becalled into question injudging the particular2.
METHODS
PATIENT
SAMPLE
The
patient samplefor the explicitreview portion ofthisstudy includes 14,012 Medicare patients65 years ofage or olderfrom
297
hospitals.The
297 sampled
hospitals represent 97percentofthe
305
hospitalsapproached.The
hospitalswere
randomly
selectedfrom
30
areas in fivestates. States and areasfrom which
hospitalswere
selectedwere
chosento be representative oftheU.S. population in terms ofcitysize and geographic region,but hospitalscaring forpoor people
were
slightlyoversampled.
The
studysampled
patientsfrom
1981 (20 percent ofpatients),1982
(30percent), and 1985/1986(50percent).
Patients hospitalizedwith congestiveheart failure, acute myocardial infarction,
pneumonia, cerebrovascularaccident, and hipfracture
were randomly
selectedfrom
listsofall Medicare patients hospitalized during thestudy years. Listsofpatients
were
basedon
ICD-9-CM
codes, butpatientmedical recordswere
thenscreened to exclude patientswho
didnot havethe studydisease, orwho
did nothave acutesymptoms
(Kahn
et al.,1988b; Draper et al., 1989).
All included medical records underwent explicitreview.
Ten
percent of included records (a total of1,366records)were
selected toundergo implicit review. Stratifiedrandom
samplingwas
used, with oversampling of deaths so that approximately50
percent ofpatients
whose
recordsunderwent implicit reviewwere
patientswho
had diedin the hospital. This samplingplan
maximized
our chance ofidentifying patients forwhom
death occurred as aresult ofpoorquality. In theanalyses reported here, data have been reweighted to achieve representativeness ofourfindings for allMedicare
patients hospitalized withone
ofthe five diseases.To
determine interrater reliability, arandomly
selected sample ofabout 25 percent ofall recordswas
reviewedby two
reviewers, and 3 percentwere
reviewedby
allfivereviewers.
PERFORMING
IMPLICIT
REVIEW
Deldentlfying
Records
Before undergoing implicitreview, the inpatient records ofall study patients
were
duplicated and deidentified.
On
average, it tooktwo
hoursper record to obliterate allIn addition, a
team
of six clerical workers supervisedby
aphysician andpublichealth professional reviewed each medical record tobe sure the record
was
legibleenough
tobe interpretable andcomplete.The
checkfor completeness includedascertaining that thephysicians' notes, order sheets, nurses' notes, vital signs, and
laboratory results
were
present for each hospitalday. Completely illegiblemicrofiche recordswere
excluded.These two
taskstook an average of55 minutesper record.Physician
Time
Requirements
forPerforming
ImplicitReview
Reviews were
budgeted at30
minutes each; reviewerswere
instructed tokeep tothistime limit, although
some
reviews mightrequire less time andsome
more.We
budgeted training at 12 hours per physician.
Reviewer
Sample
Twenty-five physicianreviewers participated in the study.
One
reviewerper diseasewas
selectedby
each ofthefive statePROs
participatingin the study, but each reviewer reviewed recordsfrom
all states.We
randomly
assigned records to reviewers.No
reviewer reviewed patientsfrom
more
than one ofthefive study diseases. All reviewerswere
boardcertified in appropriate specialties. Internists reviewed recordsofpatients hospitalizedwith congestive heart failure; cardiologists reviewed acute
myocardial infarction records; pulmonologistsreviewed
pneumonia
records; neurologistsreviewed cerebrovascular accident records; and orthopedistsreviewed hip fracture records.
Reviewer
Training
Implicit review, as performed in thisstudy, required that reviewers be trained in
theuse ofa structured review
form
withaccompanying
written guidelines. In training,we
specifically avoided trying to change reviewers' opinions aboutwhat
should have beendone
in a given clinical situation, butwe
encouraged reviewers to use auniform setofrating terms as applied topredefined aspects ofcare.
Initial training of physician reviewers
was
carried out duringone
three-hoursmall-group session. Followingthe session, each physicianreviewed
two
trainingrecords and participated in a half-hour telephone callwith studyphysicians. Physicians then reviewed three
more
recordson
theirown
athome;
these recordswere
discussed duringa two-hour conference call involving all five physicians andtwo
studyone-day sessionone and one-half years before training.
At
that time, they had reviewedpotentialexplicit criteria foreach study disease.
During theinitial training session, ourmajorgoal
was
to train reviewers to thinkabout qualityofcare in terms ofthe probabilitythat the care given
maximized
(orworsened) the chance thatthe patient
would
experience agood
outcome, whateverthe actualoutcome
was.The
following training statementswere
frequently repeated:1.
Out
of100 patients in this clinical state,how
many
would
have suffered aworsened
outcome,compared
topatients receiving standardtogood
care, if treated in thisway?
For example, if 100patients with myocardial infarctionwere
hospitalized in an orthopedicward, as patientSmith was,how many
would
have experienced a significantlyworse outcome compared
to similar patients admittedto acoronary care unit?(We
asked reviewers tomake
thisjudgment
whether or notpatient Smithactually experienced apoor outcome.)2.
We
are interested inthe overall care received, notinwho
delivered it. Forexample, ifan internal medicine consultant or anurse gave poorcare to an
orthopedic patient, the patientreceived poor care, whatever the level of excellence ofthesurgeon.
3.
You,
the reviewers, are the experts.We
willnot giveyou
"answers" totheform
orchange your opinions about appropriate medical practice. Rather,we
willencourage allreviewers to use thesame
data and to attach thesame
meaning
to our rating termswhen
answeringtheSIR
questions.Time
To
maximize
the impact ofourteaching,we
used an interactive smallgroup format to ensurethe participation ofall reviewers. During thefirsthalfhourofthesession,
we
began
withopen-ended questions elicitingreviewer opinionsand experiences regarding implicitreview.We
then gave a briefdidactic introduction to qualityofcare,includingthe termsimplicitreview, explicitreview, structure, process, and outcome.
We
reviewed the tasks tobe performed to complete the study. Forthe next 15 minutes,we
asked reviewers to independently reviewa single duplicatedmedical record andto
form
an impressionregarding the quality ofcare given.We
then askedthem
to reviewthemember
inturn togive hisor her rating for each question.We
discussed each questionin turn, elicitingreasons
from
reviewers for their disagreements.Our
writtenguidelinesoftenhelped to resolve differences, because
many
disagreementswere
basedon
differences ininterpretation ofour rulesand definitions rather than
on
true differences ofopinion.
When
true differences ofopinion existed, e.g.,between
aphysicianwho
believed all elderly hypoxicpneumonia
patientsbelonged in an intensive careunit and onewho
believed thatmany
suchpatientscould be cared foron
award,we
did not attempt to "correct" either opinion.We
then broke for 15 minutes,reconvened, andrepeated the process for another medical record over thecourse ofthe next hour and a
half, for atotal of four hours.
We
conducted thetelephone conference calls in asimilar manner.The
firstcallforeach reviewer included
two
study investigators and onephysician reviewer; thesecondincluded all physician reviewers for one ofour diseases and
two
studyinvestigators. (The second call thusinvolved only reviewers for a single disease;
we
conducted a separate call for each ofour five diseases.)
Each
call focusedon
preassigned and prereviewed medical records.
We
reviewed answerstoSIR
questionsone at a time, encouraging dialog and, hence, problem-solving.
We
basedmany
ofourresponses to reviewerson
key principles thatprovided theconceptual
framework
forour guidelines.These
includedthe three broad conceptsdiscussed above, as well as others addressing problems
commonly
encountered duringimplicit review.
Other
Key
Principles
Emphasized
During
Training
Reviewers
were
taught duringtraining to "anchor" theirratings to an agreedupon
definition ofquality categories. For
most
items, reviewerswere
asked to use the lowest response categorywhen
the care givenwould
have been highly likely to contribute to abad
outcome, whether or not abadoutcome
actually occurred in that case. For example,when
rating the quality ofaphysician's history or clinical assessment, raterswere
instructed to use thelowest response category ("very poor") iftherater,
when
asked tosee the patient at midnight,
would
have to startfrom
scratch in the evaluation.A
judgment
of"adequate"meant
thatmost
essential historical and assessment observationswere
included, but additional data might be required for optimal diagnosis andtreatment."Excellent"
meant
that all necessary datawere
present."Good" was somewhat
betterWe
asked reviewers tojudge care according tothe individual needs ofa particularpatient, rather than according to preset standards. For example, one question asks
reviewers to rate the quality of recorded information about the patient's acute disease.
The
amount
of information required inthe patient's recorded historymight vary with theclinical presentation, the patient'smental status, or the degree offamiliarity ofthe
physicianwith thepatient, although for each case,
some
amount
ofrecorded informationwas
necessary to assure appropriatemanagement
andcommunication
with othermembers
ofthemedical team.The
amount
of information necessary for each individualcase
was
left to thereviewer's implicitjudgment.Reviewers
were
asked tojudge urban, rural, teaching, and nonteaching hospitalsusing the
same
standard.They
were
asked not toadjust their ratings according to theirguess about the type ofhospital
from which
a record came. Reviewerswere
asked to rate care as inadequate (i.e., as "poor" or "very poor")when
that care did notmeet
alevel achievable
by most
hospitalsand nottojudgecare as inadequate solely forfailure toperform extraordinary or controversial procedures.Reviewers
were
asked totake account of"Do
Not
Resuscitate"(DNR)
orders.They
were
asked tojudge care forthese patients according to theirown
implicitstandards for the care of
DNR
patients. Reviewerswere
asked not tojudge caremore
leniently ifthey suspected that less aggressive treatment
was
aspired toby
thephysicians caringfor the patient,when
no
explicitstatement aboutless aggressive carewas
presentin the record.
STRUCTURED
IMPLICIT
REVIEW
FORM AND
GUIDELINES
The
SIR form (Appendix
1,Kahn
et al., 1990) included28
items and asked thephysician reviewers torate specific aspects ofthe following: theprocess of physician and
nursing care; the appropriateness of use ofhospital services; patient prognosis;
treatability ofthe patient's condition; preventability of death
when
itoccurred; thequality oftheoutcome; and an overall assessment ofthe quality of care delivered during
the hospitalization. Ratings
were
basedon
Likert scales; a five-point scalefrom
very
-10-Allowable Data
Sources
Within the
Medical
Record
To
perform implicitreviews, reviewerswere
instructed to fullyreview theentiremedical record, including order sheets, laboratory tests, and physiciannotes, withthe
exception ofnurses' notes,
which were
to be used as needed. All nurses' noteswere
duplicated and reviewers
were
asked tocomment
on
the initial nurse assessment performed atthe time ofadmissionto the hospital.An
Overall Qualityjudgment
integrates themultiple
components
ofhospital care intoone
Overall Qualityscore.The
SIR
form
also measures thesecomponents
individually, assuring that all specifiedcomponents
are takeninto account.We
should explain thelow
emphasis ourform
placeson
the specificmeasurement
ofthe quality of nursing care.
Although
we
believe that the quality of nursing care isofvital importance in determining patient outcomes,
we
were
limitedto30
minuteson
average for physician reviews. This timelimitprecluded complete review of nursing
notes. In addition
we
were
not convinced that physicianreviewerswere
the appropriatejudges of nursing care. Physicians had all nurses' notes available forreview, and could and did use
them
as needed tojudge particular incidents duringthe hospitalization.In assigning their ratings, reviewers tookinto account the entire hospitalization,
includingthe actions (or nonactions) of nonphysicians, ofhospital laboratories and
facilities, and of consultants
who
may
have been requestedby
theprimary carephysician. For example, a hospitalization for a hipfracture during
which
the surgeon performed well could be downrated as a result ofthe actionsofapoor-quality internalmedicine consultant or inappropriate nursingcare.
For each quality
judgment
thereviewersmade,
allowable data sourceswere
specified in the review
form
or guidelines.The
purpose ofthiswas
to encouragephysicians to assign ratings based
on
thesame
data sources. For example, information sources allowable for rating the quality of information obtained about the patient's acuteillnessinclude
emergency
department notes, consultant notes, orprimarycare noteswritten within
24
hours ofthe time of admission. Raters coulddowngrade
records ifappropriate information
was
obtained butwas
unacceptably late—
e.g.,when
aneurologyconsultant
was
the only physicianwho
indicated the level ofconsciousness in a patient
-11-Tlmlng
ofHospital
Services
A
hospitalizationcan be conceptualized as aseries ofservices delivered duringthecourse ofthe hospital stay (Table 1). Conceivably, the appropriateness with
which
the service is delivered could begood on
one day but pooron
another.We
designed theSIR
form
withthis inmind.The
implicit reviewform
asked reviewers tojudgesome
aspects ofcare only during theinitial
24
hours or so ofhospitalization—
some
during the initialand subsequent periods separately andsome
by
averaging the appropriateness ofuse ofthe service takinginto account the entirehospital stay. Table 1 illustrates
which
questions on thereview
form measured
the first24
hours ofhospitalization andwhich
measured
the entire stay.We
think theinitial24
hours ofhospitalization are conceptually differentfrom
subsequenthospital days. During thisinitialperiod, key assessment data are gatheredand
key
therapeutic decisionsaremade.
Ifthe initial assessmentis poorly performed,subsequent decisions
may
beflawed.We
thereforeemphasized
thisinitial period indesigning our form.
The
periodjust before discharge is another key time periodwhere
flawed decisionsmay
be especially crucial.We
specifically assessed thispredischarge period with questionson
theappropriateness ofthe length ofstaygiven the patient'sstatus at discharge and thepresence or absence ofinstabilityat discharge.
Services delivered during the hospitalization can be
viewed
as falling intotwo
categories.
One
category consists of assessments and procedures that shouldbedone on
allpatients with the disease being studied.
The
second consists ofassessments andtreatments performed only for specific indications, such as an abnormal finding, an
adverse event, or a risk factor.
We
distinguishbetween
thesetwo
kinds ofservices onlyfor nurse assessment;
we
judged nursing assessment as appropriate for allpatients anddid notattempt to
measure
nursejudgment
or response toindividual needs.Prognosis
Two
questions ask physicians tojudgepatient prognosisat admission and the potential effectiveness of treatment for the patient's disease. Reviewerswere
asked tobase their
judgments
on
initial data gathered during the first24
hours of admission.Reviewers
were
asked to ignore the actual processof care during the hospitalizationbutinstead to
assume
thatgood
but not extraordinary carewould
be given. Reviewers judged the patient'soutcome
relative towhat
mighthavebeen
expected to occur given
-12-Table 1
The
Inpatient Hospitalization: Aspects Evaluated byImplicitReview by
SIR
QuestionNumber
Initial Only (Tirst 24hr) Initialand Subsequent AtorNear Discharge Only
SERVICES:
Nurse (cognitive) AssessmentQle
—
Therapy—
—
Physician (cognitive) Assessment—
Priorand chronicdisease Qlabcd—
—
Acute diseaseQ2abc
—
Therapy
Q2abc
—
Laboratorytests
"Smallticket" Q2abc 5dh
(e.g.,venous bloodtests)
"Middleticket"
Q2abc
5fg (e.g.,EKG)
Invasive or specialized
Q2abc
—
(e.g.,cardiac cath)
Medications
Q2abc
5j02and ventilation
Q2abc
5cSpecialmedicalpersonnel
Physical occupational therapy
Q2abc
5eRespiratory therapy
Q2abc
5bConsultant physicians
Q2abc
5iSpecialnursingunits
CCU/ICU
Q2abc
5a,aiTelemetry Q2abc 5aii
SUMMARY
MEASURES:
What was
the patient'slifeexpectancy?
Q3
—
How
treatablewas
thedisease?Q4
—
Was
thedeath preventable?(patients
who
died)—
Q7
Appropriateness of length ofstay,
andstabilityatdischarge
(patientsdischargedalive)
—
Patient'soutcome compared
withexpected
—
Q8
Overall qualityofcare
—
Q9,10Q6, 6a, 6b
"
—
-13-ADDITIONAL
DATA
SOURCES
In addition toperforming implicit reviews on the medical records, our analysis
useddata
from
thepreviously described explicit reviews ofthese records(Kahn
et al., 1990a).To
evaluate the actualoutcomes
of care ofthe patientswhose
carewas
assessedimplicitly,
we
examined
mortality rates30
days after admission.We
were
able tofindaccurate postadmissionmortality data
from
HCFA'S
Health Insurance Masterfile for92
percentofour study sample
(Kahn
etal., 1990b).STATISTICAL
METHODS
To
facilitate analysis,we
developed implicitprocessscales and an Overall Qualityofcare scale.
We
used clinicaljudgment
to group process itemsinto scales. Processscales
were
createdfrom
the 19 itemslisted in questions 1-5on
the implicitreview form.The
overall qualityofcare scalewas
basedon
answers to questions 9 and 10.We
usedfactoranalysisto see whether our process groupings
made
sense psychometrically; finalscales
were
quite similarto our initialclinically developed scale groupings.We
moved
an item
from
one scale to anotherwhen
a high correlation with the otherfactor indicated strongly that the item belonged in thatgrouping.Reliability
We
hypothesized thatsome
reviewers might judge caremore
harshly, andsome
more
leniently, thanothers.We
tested for the significance ofthis reviewer effect using aone-way
analysisofvariance.We
then determined individual reviewer standard deviations and the extent towhich
thesewere
normally distributed.We
also tested thenormality ofthe scales directly withinand across reviewers.
To
adjust for the reviewereffect,
we
calculated the group ofreviewers'mean
scores for a question or scale acrosspatients,
compared
it tothe individual reviewer'smean
score forthat question, andadjustedthe individual reviewer's scores for all patients
up
ordown
proportionallytomatch
the groupmean.
We
then used the adjusted scores inouranalyses. For example, reviewer Smith,M.D., gavepneumonia
patient JohnDoe
ascore of fouron
the five-point Likert scale forItem #1on
the implicitreview form.The
average score for Dr.Smith
inrating other
pneumonia
patients for Item#1was
five.The
average score for Item #1 for all reviewers in Dr. Smith's review groupwas
3.The
calculated score adjusted for the
-14-Interrater Reliability
To
determine interrater reliability,we
first determined thepercentage agreementbetween
reviewers forscales and itemson the implicitreview form. Percentageagreement, however, can be highly misleading, since it does not take into account the
amount
ofagreementthatwould
be expectedon
thebasisofchance alone, given theunderlying distribution of scores forthe item or scale in question.
A
70 percent level ofagreement could be very
good
ortotallyunacceptable. Ifreviewers agree70
percent ofthe time
on
theyear ofbirth across the500
cases they review, but 99 percent ofthecaseswere
born in 1956,70
percent agreement is poor. Ifthe agreement is70
percentwhen
thetrue answers across the
500
cases rated are well distributed (e.g., spanning a largenumber
of possiblebirth years),70
percentmay
be very good.A
frequently usedmethod
for adjusting the percentage agreement toremove
agreement expected
by
chance alone isto calculate theKappa
statistic. Kappa, however, ismost
appropriatewhen
the ratings are unordered categorical variables, such asdiagnosticcategories oryes/no items.
In this study,
we
have used analysis of variance ratherthanKappa
to assessinterrater reliability because
we
use ordered scales. For example, afiveon
one ofourscales is notvery different
from
a six,but very differentfrom
a nine; in assessinginterrater reliability, analysis of variance takes these relative disagreements into account
appropriately. Before calculating interrater reliability,
we
adjusted scores for reviewereffects as described above.
Interltem Reliability
To
determine whether the process and Overall Quality scaleswe
formed
performed well psychometrically,we
determined interitem reliability (Cronbach'salpha).
Overall
Quality
Scale
The
Overall Quality ofCare scale consistedoftwo
items: "Consideringeverything
you
know
about the patient, please rate overall quality of care" withfiveresponse categories ranging
from
extreme, above standard, to extreme,below
standard;and
"Would
you
send yourmother
to these physicians in thishospital?" withfour response categories rangingfrom
definitely yes to definitelyno.Responses
to these
-15-2 (bothquestions answered withthe highestpossible rating of1) to 9 (worst care). Cases reviewed
by
multiple reviewerswere
assigned themean
adjusted score across reviewers.We
then divided the Overall Quality ofCare scale into fourlevelsofcare: (1) "verypoor care," a score of7.5 to 9; (2) "poor care," a score of6.5 to 7.4; (3)
"good
care," ascore of3.5to 6.4; and (4) "very
good
care," a score of2 to 3.4. Cutoffswere
chosen tobest reflect the
meaning
ofthe response categories.One
question (#9) had threeresponses that according toour training and guidelines
were
in the "acceptable" care range, andtwo
in the "unacceptable" range.One
question (#10), hadtwo
in the"acceptable" and
two
in the"unacceptable" range. For example, poorcare orvery poorcare
means
that responses tobothofthe Overall Quality questionswere
rated in the"below
standard" or "probablywould
not sendmy
mother" range.Components
ofVariance
We
usedcomponents
of variance analysis tounderstand reasons for disagreementamong
multiplereviewers ofthesame
case, and toexamine
the relationshipbetween
thisdisagreement and the conclusions that couldbe
drawn
about the true quality differencesbetween
patients assigned particular quality ratings.The
variance analysis ofthedistribution of cases
between
very good, good, poor, andvery poor care included (1) thereviewer effect discussed above, (2) differences ofopinion
among
reviewers, and (3)actual differences in the medical care delivered to thepatient.
We
are clearlymost
interested in the last
component
ofthe variance.We
used the variance-componentsmodel
to adjust resultsofthe Overall Quality scale for the reviewer effects and for differences of opinionamong
reviewers. Thesecomponents
are mathematically relatedto imperfect reliability.
The
figuresgeneratedby
thismethod
provide a conservative estimate oftheamount
ofcarejudged to beinadequate.In calculating the "true" population quality scores,
by
adjustingfor interraterreliability,
we
accounted for theless than perfect interrater reliability ofeach scale.The
calculation
was
done
intwo
steps.Step One:
We
postulated that the underlying distribution oftrue quality scoreshad anormal distribution.
The
result of any particular reviewwas
grouped intoone ofeight possible scores for this scale (2 to 9).
The
resultwas
regarded as arandom
draw
representing the population
mean
plus arandom
reviewer/reviewer opinion effect.Step
Two:
We
estimated themean
of underlying true quality scores using the
-16-variance-components
model
to adjust for thereviewer effect andthe differences ofopinion
between
reviewers. Thesecomponents
are mathematicallyrelated to imperfectreliability.
These
concepts canbe represented inthe following equation:R
=
alpha+
beta+
gamma
(reviewer effect) ("true" care differences) (opinion)
Alpha, beta, and
gamma
each have an associatedcomponent
ofvariance and addup
to R,the observed variance.
CALCULATING
THE
NUMBER
OF REVIEWERS NEEDED
We
calculated thenumber
of reviewersneeded
to achieve givenlevels ofcertaintyabout theresults ofa review of anindividual case.
We
didthis usingtheSpearman-Brown
Prophesy Formula.The
minimum
value of"n,"thenumber
ofreviewers neededto achieve a specified level ofcertaintythat thescore
was
correct, dependson
thedistributionoftrue qualityscores as well asthe definition of"poor" and thereliability of
the instrument.
Using
thismethod
of estimatingthe true rating ofqualityofcare for thepopulation under study,
we
called the medical care delivered "poor" ifthe true ratingexceeded 6.5
on
our scale. Thus, althoughwe
did not observe this true score,we
estimated the probabilitythat the score
was
trulypooron
thebasisofthe actualreviewswe
performed and the properties ofthenormal approximation.We
used linear regression to evaluate whether poor quality ofcaremeasured
implicitly
by
the Overall Qualityscale predicted death within30
days, after adjustingfor patient sickness. (Logistic regression produced similarresults.)To
adjust for patientsickness at admission,
we
used both implicitly and explicitlymeasured
sickness.The
implicitsickness at admission measures are life expectancy (Question #3) and treatability
ofthe patient's disease (Question #4) (see
Appendix
D).We
describe our explicitmeasure
elsewhere (Keeler et al., 1990).We
also usedlinearregression to determine whether reviewers assigned qualityscores differently
when
patients died in, ascompared
withoutside, the hospital.To
-17-ofdays
between
hospital admission and day ofdeath in our analyses (proportional
-18-3.
RESULTS PART
I:PERFORMANCE
OF THE
SIR
METHOD
PATIENT
SAMPLE
A
totalof275
records foracute myocardial infarction,278
records for congestiveheart failure,
273
records for pneumonia,270
records for cerebrovascularaccident, and270
records for hip fracturewere
reviewed, for a total of 1,366records. Thiswas
93percent ofrecords selected for implicit review.
The
remaining 7 percentwere
not includedbecause oflatereturnfrom
the hospitalorreviewer (2 percent), because themicrofiche
copy
was
unreadable (3.5 percent), and formiscellaneous reasons (1.5 percent).Of
the 1,366 records thatwere
reviewed,993
were
reviewed once,333 were
reviewed twice, 33were
reviewed five times, and seven (hip)recordswere
reviewedfour times, for a total of 1,852 reviews performed.
To
evaluate ourrandomization,we
compared
the implicitreview sampleto the larger explicitreview samplefrom
which
itwas
drawn.The
explicit sample isknown
to closelymatch
national data forpatientdemographic
and hospital characteristics. Studyresultsshow
that the implicitandexplicitsamples
were
similarin compositionin terms ofage, sex, race, and type ofhospital (Table 2).
Table 2
Results: Patient
Sample
Implicit n=l,366 (%) Explicit n=14,012 (%) Demographic CharacteristicsofPatients
Age
a 80 43 Female 54 Non-white 16 41 57 19 Characteristics of Hospitals Rural 20Any
teaching 36 County 13Serves high percent Medicaidpatients 16
21 35
-19-IMPLICIT
REVIEW
QUALITY
SCALES
Our
final scales and theircomponent
items are listed inAppendix
D.REVIEWER EFFECT
We
found a significantreviewer effect formost
diseases (p<
0.01by
ANOVA
for all diseases except cerebrovascular accident, forwhich
therewere no
significantreviewer effects). That is,
some
reviewers gave significantly lowerquality scores acrossall cases theyreviewed than did others.
The
reviewer effect accounted for about 4percent ofthe variance in scores (1 percent forcerebrovascular accident, 8percent for
hipfracture, 4 percent for congestive heart failure, 6 percent for acute myocardial
infarction, and 1 percent forpneumonia). Subsequent analyses reported here have been
adjusted forthereviewer effect.
However,
thisisnot the fullstory. There are significantthough smalldifferencesbetween
thestandard deviations ofthereviewers'mean
quality scores.So
itappears thatsome
reviewerswere
more
extreme in theirapproach thanothers.Although
itis possiblethatthe adjustmentprocess couldberefined to take account ofthis, it
would
not havemuch
impacton
theresults. Therewas
no
pattern, for example, inwhich
reviewers with lowermean
values had lower or higher standard deviations.OVERALL
QUALITY
SCALE
Our
results indicated that the Overall Quality scale accuratelysummarized
theresults ofthe remainder ofthe implicitreviewform.
The
two
items inthe OverallQuality scale
were
highly correlated with each other and withresponses to themore
detailed quality questions in other sections oftheform.
The
Pearson correlationcoefficient
between
thetwo
questionsthat constitute the Overall Quality scalewas
0.86.Correlations
between
ratingson
the Overall Qualityscale, theprocess scales, and thePreventable
Death
itemwere
also high (Table 3). For example, the highnegativecorrelation ofthe Overall Quality scale withPreventable
Death
indicates thatphysiciansstrongly downrated overall quality of care forrecords
when
theybelieved that apreventable death had occurred. Overall, responses tothe
more
detailed itemsand scaleson
thereviewform
accounted for about56
percent ofthe variance(R
2) inthe OverallQuality scale. Thisreflected an
R
2 of83 percent for congestive heart failure, acute
-20-Table3
Correlations of Individual Implicit ReviewScales
and
Items with Overall QualityofCare ScaleIndividual Implicit Correlation Coefficient
Typeof Scale Review Scales
or Item and Items
CHF, AMI,
PNE
HIP
CVA
±11yoiwxaii MT}ivxj./AUiiwiiuiiaifunctional anHaiiu phrnnirwilluiiiv
disease assessment 0.59 0.51 0.66
MD
acute disease assessmentandplan 0.82 0.59 0.82
Laboratory evaluation
performed 0.61 0.63 0.53
Treatmentperformed 0.68 0.72 0.71
Nurse
RN
admission assessment 0.24 0.19Outcomes Qualityoftheoutcome 0.34 0.42 0.40
Length ofstay 0.56 0.43 0.48 Preventable death -0.66 -0.57 -0.64
Sickness atadmission Patientprognosisatadmission 0.02 -0.01
There
were
two
exceptions to thehigh correlationsbetween
the Overall Processscale and other scales.
As
we
had anticipated, given theminimal
review ofnurses' notes included in ourreview protocol, the correlationbetween
the Overall Qualityscale andnurse assessment
was
quite low.The
correlation with patientprognosiswas
also low, aswe
had hoped, indicatingthat physicianswere
not systematically rating quality ofcareon
the basis ofpatientsickness at admission.
RELIABILITY
To
evaluate thepsychometric characteristics of SIR,we
tested interitemand
interrater reliabilities, after adjusting forreviewer effect.
Table 4 liststhe implicitreview scales and their interitemand interrater
reliabilities.
Appendix
E
liststhe reliabilities for implicitreview items.We
have alsoincluded interrater reliabilities for
some
oftheitems not included in the scales.Calculations
were
not applicablewhen
the relevant itemswere
not includedon
theform
for a particular disease, or, forinteritem reliability,
when
therewas
only one item
-21-Table4
Reliability or ImplicitReviewforFive Diseases
Cronbach's
Numberof Items Interitem Interrater
on SIRFonn Reliability Reliability
CHF, AMI, CHF,AMI, CHF,AMI,
Groupsof ItemsonSIR Form
PNE
HIPCVA
PNE
HIPCVA
PNE
HIPCVA
MD
functionandchronicdiseaseassessment A A z ftu.oyso n sou.oy U.oZft90 ftu.oyAO ftU./l71 ftAO
u.oy
MD
acutediseaseassessmentandplan •1J z A ftQ"X ftu.oA7/ ftu.yzQO ft^0 ftu.zj7S ftu.oyAO
Laboratoryevaluation A A 1J ftU.OZ90 ftU./J u.ft70/y ftAO ftu.zz00 ftu.OA
m
Treatments O A •I
J ftU.OOfiA ftu.oyAO U.olA81 ftU.hUAft ftio ftU.OjA^
Respiratorycare 3 3 0.84 0.91 0.44 0.42
Quality of surgery 3 0.95 0.90
CVA
therapy 2 0.76 0.47Specialneurologictesting 3 0.45 0.45
RN
assessment 1 1034
0.50 Appropriateness oflength ofstay 1 0.13 0.58 0.42 Preventable death 1 0.16 035 0.50 Lifeexpectancy 0.64 0.73 Treatability 0.57 0.57Qualityoftheoutcome 1 0.48 0.74 0.69
Functionatdischarge 0.80
Appropriateness of surgery 1
030
Overall qualityofcare 2 2 2 0.93 0.87 0.94
0^2
0.44 0.66NOTE: Dashesindicatethatthenotationisnotapplicable.
Interrater reliabilities for scales
were
mostlybetween
0.4 and 0.7, arange consideredadequate for
comparing
groups of people, butdid not reach the 0.8to 0.9 levelconsidered desirable for rating a single individual.
COMPONENTS
OF
VARIANCE
INQUALITY
SCORES
To
estimatehow much
ofthe difference in qualityofcare ratingswas
due toactual differences in the care the patient received
compared
withmethodologicartifact,we
evaluated thepercentage ofthevariance in Overall Qualityofcare found inoursample as a
whole
thatwas
accounted forby
each ofour three postulated sources ofvariance.
These
were
the reviewer effect, the effect of imperfect interrater reliability,
-22-Overall, the reviewer effect accounted for about0.9 to 7.6percent ofthe variance in
Overall Quality scores
between
records; differences of opinion regarding case-specificmanagement
between
reviewers accounted for 38 to 65 percent, and differences incharacteristics ofthe care the patient received accounted for
36
to 63 percent (Table5).Thus, therange ofqualityof care scores in our sample reflected, to a substantial extent, true differences in the care delivered to patients.
LINKING
PROCESS
TO
OUTCOME
To
further validate our SIR,we
determined whether the process of care for apatient with agiven sickness atadmission could be linkedto the patient's mortality.
We
found (Table 6) that death within
30
daysofhospital admissionwas
strongly correlated, ateach level ofsickness, to the quality ofcare as ratedby
implicitreview.Better Overall Quality ofcare ratings
were
also linkedwith lower mortalityratesat
180
days after admission and for posthospital deaths.However,
inneither oftheseinstances
were
these data significantat the0.05 level.The
lackofsignificance is at least inpart due to the smallnumber
ofposthospital deaths(i.e., halfof our samplewas
selected to represent inhospital deaths), so
we
had
many
fewer posthospital deathstoevaluate.
To
guard against the possibility that our reviewerswere
biased in theirassessments
by
knowledge
that the patienthad died,we
compared
quality ratings for patientswho
died in and outside the hospitalon
each postadmission day, using theproportional hazards
model
analysis strategy described in Sec. 2.We
found that theTable5
PercentageofVarianceinReviewer
Judgments
Accounted forbyThreeScores ofVariance
Differences of
Reviewer Opinion
Among
DifferencesinEffect Reviewers Care Provided