A New Methodology for Evaluating Radiologist Error Rates: Factoring in the Complexity of Study and Potential for Significant Pathology

By Frank E. Seidelmann, D.O. and Douglas C. Ward

The time required for interpretation, the multitude of anatomic structures demonstrated on an imaging study, and the potential for pathology are not equal for all imaging studies. The inherent information content varies significantly between modalities, from extremely basic to extremely complex, depending on the technology involved in producing the images. The training and experience of radiologists also vary significantly, and certain advanced imaging modalities and specific body parts can only be interpreted by subspecialty radiologists.

The 2010 Radisphere error rates were analyzed, evaluated, summarized in detail, and presented to Radisphere professional and medical leadership during the winter and spring of 2011 in a presentation entitled: Radisphere Clinical Error Analysis: 2010 – QA / Peer Review. Errors: How, What, Who, and What To Do? As part of this in-depth evaluation, a new model was developed for comparing the error rates of radiologists that is not based on the volume of cases interpreted or on RVUs. This new model provides a methodology for assigning a Complexity Rating for the interpretation of a study, Integrating the potential for making an error of omission of Significant Pathology (CRISP).

This paper describes the complexities in interpretations, explaining how all errors cannot be rated on a similar basis, the process that the authors undertook in the creation of a new rating system of errors, and a method to rate radiologists so that all radiologists are equally compared, no matter what type or volumes of studies they are individually reading.

Failure of Observation

The Checklist Manifesto by Atul Gawande, M.D.[i] provided insight into how to deal with complex issues. The following review by The Independent clearly demonstrates the benefit of a logical review of complex subjects:

“Avoidable failures continue to plague us in healthcare, government, the law, the financial industry—in almost every realm of organized activity. And the reason is simple: the volume and complexity of knowledge today has exceeded our ability as individuals to properly deliver it to people — consistently, correctly, safely.”

Dr. Trafton Drew, a psychologist at Harvard Medical School, modified the classic 1999 Simons and Chabris gorilla experiment[ii] and adapted it for a CT chest study being interpreted by skilled observers (radiologists).[iii] Dr. Drew superimposed a matchbox-sized gorilla on one slice of five CT chest images. The radiologists were asked to identify pulmonary nodules, lesions which are suspicious for cancer. 80% of the radiologists and 100% of non-radiologists completely missed the gorilla. During this experiment, an eye-tracking device monitored the movement of the radiologists’ eyes, confirming that they had looked at the gorilla but failed to recognize what they had observed. Dr. Drew’s explanation for the radiologist’s

really obvious large things like a gorilla.” Professor Simons, who authored the original study, explained that this is not unique to radiologists: “We’re aware of only a small subset of our visual world at any time. We focus attention on those aspects of the world that we want to see. By focusing attention, we can filter out distractions. But in limiting our attention to just those aspects of our world we are trying to see, we tend not to notice unexpected objects or events.”

As radiologists, we have known about this phenomenon during our entire professional careers. Radiologists have referred to this phenomenon as “tunnel vision.” As residents doing barium enemas (colon examinations requiring rectal introduction of liquid contrast material), we looked closely for colon pathologies (polyps, cancer, etc.) but often missed large destructive cancerous processes in the bones, because of the intense focus to find colon pathology. Similarly, when a referring clinician provides a clinical diagnosis or requests the radiologist to “rule out” a specific pathology, the clinician has inadvertently directed the focus of the radiologist. Seasoned radiologists have learned that once they have identified specific pathology, they should then ignore it and look for other findings. But this alone will not always provide a guarantee that the radiologist will find the problem.

In 2010, Atul Gawande, M.D. gave a special lecture at RSNA entitled “Real Reform: Facing the Complexity of Health Care.” He lectured on two types of errors: “Failure of Ineptitude” and “Failure of Ignorance.” Failures of ineptitude are often failures of observation, secondary to not following a structured format or “checklist.” The Radiological Society of North America has endorsed structured reporting to achieve consistency with similar information content and, more importantly, to avoid errors of ineptitude. Simply stated: structured reporting helps avoid failures of observation.

Radisphere has adopted and developed structured reporting (lexicons) as a standard of practice to avoid errors of ineptitude, that is, the failure to systematically review a set of images. Radisphere’s lexicons have been developed over a 10-year period, with over 10,000 man-hours invested in their evolution. The purpose of the lexicons is both to provide consistent reporting across radiologists and, more importantly, to provide systematic guidance for the radiologist’s examination of a study. The lexicon functions as a “checklist,” verifying that every pertinent anatomic structure is reviewed and scrutinized, so that the “gorilla” is not missed. Radisphere has produced 248 lexicons.

Radisphere’s lexicons are composed of “fields,” containing statements of normal anatomy. Each field is basically an entry in the “checklist.” The radiologist must go through each field as a checklist and confirm that each field, stating normal anatomy, is correct. The lexicon forces the radiologist to systematically evaluate each study in a consistent manner.

The number of fields is a proxy for the complexity of interpreting a study. The more fields in the lexicon, the greater the complexity of the study. Counting the number of fields for each lexicon provided a Complexity Rating. A CT Abdomen examination has 21 fields or a complexity rating of 21, while an X-ray Abdomen examination has 6 fields or a complexity rating of 6.

A modifier for integrating the potential for omission of serious pathology was developed. Certain anatomic regions have a low risk for significant pathology, while other anatomic regions have the potential for serious life threatening pathologies. For example, a hand examination, independent of the modality used to image the hand, has a very low risk of serious pathology. This can be generalized to most extremity imaging; however, the more proximal to the torso (i.e. the closer to the body), the greater the risk for more serious pathology. Very significant life threatening pathology, however, is often present in examinations of the brain, chest, abdomen, and pelvis.

A rating scale for significant pathology was developed. It was the authors’ belief that the increasing risk of significant pathology and the risk of omission are not linear functions; rather, the risk of omission worsens exponentially with increasingly significant pathologies, putting the patient at greater risk of morbidity and mortality. The following scale was agreed upon by the investigators.

Multiplying the Complexity Rating times the Specific Pathology modifier resulted in a CRISP rating for the CPT codes of 248 types of procedures. A sample of the CRISP Ratings is presented in Table 1 within the Appendix.
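The multiplication described above can be illustrated with a minimal Python sketch. The function name `crisp_index` and the validation rules are assumptions made for illustration and do not come from the authors. The modifier values (1, 2, 4, 12, 20) are those of the authors' rating scale, and the example inputs are taken from the sample ratings in the Appendix.

```python
# Minimal sketch of the CRISP index: complexity rating (number of lexicon
# fields) multiplied by the significant-pathology modifier.
# The function name and the validation are illustrative assumptions.

PATHOLOGY_MODIFIERS = {1, 2, 4, 12, 20}  # values of the authors' rating scale


def crisp_index(complexity_rating: int, pathology_modifier: int) -> int:
    """Return complexity rating times pathology modifier."""
    if complexity_rating <= 0:
        raise ValueError("complexity rating must be a positive field count")
    if pathology_modifier not in PATHOLOGY_MODIFIERS:
        raise ValueError("pathology modifier must be a value from the scale")
    return complexity_rating * pathology_modifier


# Inputs taken from the sample ratings in the Appendix.
print(crisp_index(21, 20))  # CT Abdomen With: 420
print(crisp_index(10, 20))  # CT Brain Without: 200
print(crisp_index(12, 2))   # CT Maxillofacial Sinuses: 24
```

Because the modifier grows much faster than the field count, two studies with similar numbers of fields can end up with very different CRISP indices.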

The 2010 Radisphere Clinical Error Analysis was performed with the goal of fully understanding errors and assessing changes that could be made to lower error rates. The specific goals of the 2010 analysis of errors were to determine: how errors were made, what studies have the greatest risk of errors, who was making the errors, and what could be done to lower the error rates.

Radisphere’s rating of errors is a modification of the ACR’s RadPeer rating system.

• 2B - Subjective variance or difficult diagnosis, not ordinarily expected to be made; could possibly be clinically significant.

• 3A - Diagnosis should be made most of the time; unlikely to be clinically significant.

• 3B - Diagnosis should be made most of the time; could possibly be clinically significant.

• 4A - Diagnosis should be made almost every time; a misinterpretation of findings is unlikely to be clinically significant.

• 4B - Diagnosis should be made almost every time; a misinterpretation of findings could possibly be clinically significant.

In 2010, there were 491 errors rated 2B, 3B, or 4B out of 1.1 million total studies performed. A database was developed for analysis. The review of “charts” included reviewing the original interpretations (“the report”), the QA committee letter to the radiologist, and addendum reports if provided. Obtaining complete information was not possible for all errors due to the systems in place at that time. Complete data were available for 336 cases, which represented 68% of the 2B, 3B, and 4B errors.
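To put these counts in proportion, the short Python sketch below works through the arithmetic. The variable names are illustrative, and the 1.1 million figure is the rounded total quoted in the text, so the resulting rate is approximate.

```python
# Arithmetic on the counts quoted in the text (variable names are illustrative).
total_studies = 1_100_000   # "1.1 million total studies" (rounded in the source)
significant_errors = 491    # errors rated 2B, 3B, or 4B
complete_cases = 336        # errors with complete data available

error_rate = significant_errors / total_studies
coverage = complete_cases / significant_errors

print(f"Error rate: {error_rate:.4%}")                           # Error rate: 0.0446%
print(f"Errors per 10,000 studies: {error_rate * 10_000:.1f}")   # 4.5
print(f"Share of errors with complete data: {coverage:.0%}")     # 68%
```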

Rating scale for the potential for significant pathology:

1 - Very Low – No risk of morbidity.

2 - Low – Trauma, infection (treatable by medical treatment or superficial I&D) not anticipated to result in long term disability or morbidity.

4 - Medium – Trauma, infections (non-critical anatomy, which may undergo medical treatment or I.R. drainage or open surgical drainage), degenerative disease, inflammatory disease, benign tumors, not likely to cause morbidity, but may result in minor disability.

12 - High – Vascular events (hemorrhage, infarctions, thrombosis, dissections, high grade stenosis) of organs that may not or are unlikely to result in immediate death (lung, liver, spleen, kidney, mesentery, extremities), life threatening infections of anatomy with the potential for spread to critical organs, which require surgery, malignant tumors, life threatening trauma, with potential for possible long term disability or eventual mortality (within weeks to months).

20 - Very High – Vascular events (hemorrhage, infarctions, thrombosis, dissections, high grade stenosis) which involve critical life threatening organs (brain, heart) which are likely to result in immediate death, infections of critical anatomy requiring immediate surgery (spine, brain), malignant tumors with involvement of vital organs, producing life threatening compromise, life threatening trauma (if undetected would have the potential for short term mortality, within hours to days).

Understanding Where Errors Come From

Atul Gawande, M.D. postulated that 80% of errors are errors of ineptitude (mistakes we make because we don’t make proper use of what we know) and 20% of errors are errors of ignorance (mistakes we make because we don’t know enough). Our study results revealed that 70% of errors were of ineptitude and 30% ignorance. Further, 65% of the errors of ineptitude were due to the radiologist not using the lexicon as a checklist, but rather as a “template” of normal statements. 7.4% of the errors would not have been caught by using the checklist.

The Radisphere analysis demonstrated that the studies with the highest errors included OB ultrasound, CT abdomen/pelvis, CT Chest, and CT Brain. CT errors by body part were: 51% abdomen; 22% brain; 11% spine; and 7.6% chest.

The CRISP rating system was applied to the radiologists to level the “playing field of performance,” so that all of the radiologists’ errors could be evaluated by comparing an equal amount of cases weighted for complexity and potential for significant pathology. Based on the then-existing lexicons, the average CRISP index rating of the studies read per radiologist varied from a low of 82 to a high of 192. The average CRISP index rating per radiologist was multiplied by the total number of studies read for the year to obtain a total CRISP volume for the year. Total yearly CRISP volumes allow for comparison of radiologists reading studies of varying complexity and time requirements. Essentially, it is equivalent to saying “a ton of feathers, versus a ton of bricks, both weigh a ton.”

Radisphere Clinical Error Analysis: 2010 – QA / Peer Review. Errors: How, What, Who, and What To Do? provided an in-depth analysis of errors going far beyond a simple tabulation of the number of errors per radiologist. This analysis underscored what radiologists have always known: more complex studies are harder to interpret, take longer, and carry a greater risk of a mistake.
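The total CRISP volume calculation described above (average CRISP index multiplied by studies read) can be sketched in Python as follows. This is an illustration under stated assumptions: the two readers, their study counts, and their error counts are hypothetical, and normalizing errors per million CRISP units is one plausible way to compare readers, not necessarily the normalization the authors used. Only the average index values 82 and 192 come from the text.

```python
from dataclasses import dataclass


@dataclass
class RadiologistYear:
    """One radiologist's year of reading (all example data are hypothetical)."""
    name: str
    avg_crisp_index: float   # average CRISP index of the studies read
    studies_read: int        # total studies read during the year
    significant_errors: int  # errors rated 2B, 3B, or 4B

    @property
    def crisp_volume(self) -> float:
        # Total CRISP volume = average CRISP index x number of studies read.
        return self.avg_crisp_index * self.studies_read

    @property
    def errors_per_study(self) -> float:
        return self.significant_errors / self.studies_read

    @property
    def errors_per_million_crisp(self) -> float:
        # Assumed normalization: errors per one million CRISP units.
        return self.significant_errors / self.crisp_volume * 1_000_000


# 82 and 192 are the lowest and highest averages quoted in the text;
# the study and error counts below are invented for illustration.
reader_a = RadiologistYear("Reader A", 82, 20_000, 4)
reader_b = RadiologistYear("Reader B", 192, 10_000, 4)

for r in (reader_a, reader_b):
    print(
        f"{r.name}: CRISP volume {r.crisp_volume:,.0f}, "
        f"errors/study {r.errors_per_study:.4%}, "
        f"errors per million CRISP {r.errors_per_million_crisp:.2f}"
    )
```

In this made-up case, Reader B has twice Reader A's raw error rate per study but a lower error rate per CRISP unit, which is the kind of reversal a complexity-weighted comparison is meant to expose.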

During the analysis of our errors, the question was asked: Is it possible to predict quantitatively which studies are of the greatest risk for a radiologist to make a significant error?

CMS has attempted to quantify the time and skill it takes to interpret studies, and to set reimbursement accordingly, with the Relative Value Unit (RVU) methodology. RVUs do not accurately reflect the complexity of reading a study or the potential for omission of serious pathology. A different methodology for assessing the complexity of reading a study and for predicting the potential for making a serious error was needed.

Quantifying the complexity was now made easy by the development of lexicons, or structured reports. The “checklists” could serve as a proxy for complexity. A modifier was necessary to further distinguish interpretations of complex studies with potential for serious pathology. A modifier which exponentially increased with increasing potential for serious pathology was created. This resulted in a semi-quantitative rating for each study.

Our use of the CRISP method for evaluating studies resulted in a much greater variance between studies than RVUs provide. For example, a CT Soft Tissue Neck without contrast has an RVU of 1.3, while a CT Maxillofacial Sinuses also has an RVU of 1.3. By comparison, the CRISP index rating in our study was 564 for the CT Soft Tissue Neck, while for the CT Maxillofacial Sinuses the CRISP index rating was 36.

We believe that the CRISP methodology of semi-quantitative analysis of studies provides two very significant benefits for avoiding errors and assessing quality within a radiology group: predictive risk assessment and uniform quality assessment of radiologists.

Predictive Risk Assessment

Using the CRISP method of evaluating the risk of all studies, a risk assessment can be assigned to all CPT codes. The riskiest top 25 studies can therefore be determined. CRISP risk assessment would have predicted the actual error rates which occurred in the 2010 analysis.
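As an illustration of this ranking step, the Python sketch below scores a handful of studies and lists the riskiest ones first. The function name `riskiest` and the choice of six studies are assumptions made for illustration. The complexity ratings and pathology modifiers are copied from the sample ratings in the Appendix.

```python
# Sketch: rank studies by CRISP index (complexity rating x pathology modifier).
# Tuples are (study, complexity rating, pathology modifier), copied from the
# sample ratings in the Appendix. The function name is an illustrative choice.
SAMPLE_STUDIES = [
    ("CT Brain Without", 10, 20),
    ("CT Abdomen With", 21, 20),
    ("CT Maxillofacial Sinuses", 12, 2),
    ("CTA Brain", 17, 20),
    ("CT Knee With", 12, 4),
    ("CT Abd / Pel With", 20, 20),
]


def riskiest(studies, top_n):
    """Return the top_n (study, CRISP index) pairs, highest index first."""
    scored = [(name, complexity * modifier) for name, complexity, modifier in studies]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


for name, score in riskiest(SAMPLE_STUDIES, 3):
    print(f"{score:>4}  {name}")
# Prints:
#  420  CT Abdomen With
#  400  CT Abd / Pel With
#  340  CTA Brain
```

Applied to the full list of CPT codes with `top_n=25`, the same sort would produce the kind of "riskiest 25" list described above.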

Using this information, Radisphere undertook two initiatives: Risk Alerts; and pro-active Selective Monitoring of cases by a second radiologist.

Risk Alerts: Radisphere developed a proactive program alerting radiologists to the fact that they are reading a risky study. Risk “alerts” with “must not miss” guidance were hard-wired into the radiologist information system (radii™).

Selective Monitoring: Radisphere has piloted a program in which the riskiest cases are viewed in real time by a second radiologist, who excludes or notes only the most significant pathology that should not be missed. This program is entitled Selective Monitoring of Accuracy with Reporting Timeliness (SMART). A concurrent reading radiologist is asked to review a study to exclude predetermined significant pathologies that should never be missed. This was piloted for CT Brain examinations. The concurrence interpreter was instructed to exclude (or identify) the following: No Calvarial Fracture; No Acute CVA; No Intracranial Hemorrhage. A checklist for each exclusion was provided, with a box for a free-form note from the concurrence radiologist if a positive finding was noted. The concurrence radiologist was a “second set of eyes” but did not provide a final report. The SMART review was provided to the interpreting radiologist, who then provided the final report with the added benefit of that second set of eyes.
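The SMART checklist described above could be represented with a small data structure such as the Python sketch below. This is only an illustration: the class name `SmartReview`, its methods, and the example note are hypothetical and do not describe Radisphere's actual system. Only the three CT Brain exclusions come from the text.

```python
from dataclasses import dataclass, field

# The three exclusions piloted for CT Brain examinations, as listed in the text.
CT_BRAIN_EXCLUSIONS = ("Calvarial Fracture", "Acute CVA", "Intracranial Hemorrhage")


@dataclass
class SmartReview:
    """Hypothetical record of a concurrent (second-reader) SMART review."""
    exclusions: tuple = CT_BRAIN_EXCLUSIONS
    excluded: dict = field(default_factory=dict)  # finding -> True if excluded
    notes: dict = field(default_factory=dict)     # finding -> free-form note

    def record(self, finding: str, is_excluded: bool, note: str = "") -> None:
        """Tick one checklist entry, with an optional free-form note."""
        if finding not in self.exclusions:
            raise KeyError(f"{finding!r} is not on this checklist")
        self.excluded[finding] = is_excluded
        if note:
            self.notes[finding] = note

    def is_complete(self) -> bool:
        """True once every checklist entry has been answered."""
        return all(f in self.excluded for f in self.exclusions)

    def positive_findings(self) -> list:
        """Findings the second reader could not exclude."""
        return [f for f in self.exclusions if self.excluded.get(f) is False]


review = SmartReview()
review.record("Calvarial Fracture", True)
review.record("Acute CVA", True)
review.record("Intracranial Hemorrhage", False, note="Example note only.")
print(review.is_complete())        # True
print(review.positive_findings())  # ['Intracranial Hemorrhage']
```

The structure mirrors the workflow in the text: the second reader answers a fixed list of exclusions and may attach a note, while the final report remains with the interpreting radiologist.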

Uniform Quality Assessment of Radiologists

All complex endeavors in which participants engage with different skill levels or perform tasks of varying complexities have a handicap system to level the field of comparison. This is usually seen in competitive sports, such as golf, Olympic diving, or

Based upon the results of evaluating radiologists using the CRISP model, a program was designed to improve the performance of radiologists in the lower tier of performers.

Radiologist Profile Adjustment: Radiologists were individually evaluated for their performance on risky examinations. Individual radiologist reading profiles were modified removing the riskiest studies. Radiologists were informed of the types of studies removed from their reading profile and were instructed to undertake additional study or CME training. Radiologists who did undergo additional training or study had their profiles re-instated to include some of the riskier studies. The radiologists were also placed on an ongoing professional focused review to confirm improvement and reduction in errors. Other radiologists have had a permanent reduction in the complexity of the studies that they are interpreting and have become successful productive radiologists with a lowered CRISP index rating. The lowering of the individual CRISP index rating for the low tier performers decreased their error rates and resulted in improved professional satisfaction for the radiologist, making them valued consultants to referring physicians.

Summary

A method for evaluating radiologists that adjusts for the complexity of the type of studies interpreted is necessary. The CRISP model has demonstrated value in providing important information which can reduce errors, fairly assess the quality of a radiologist’s work, and help radiologists to continually improve. Radisphere has integrated the CRISP model and SMART program into its quality assessment and improvement efforts as “works in progress” and is committed to continual program refinement.

About Radiology Quality Institute

Founded by Radisphere, RQI is a collaborative research organization dedicated to the identification and promotion of radiology quality standards and process improvements. With access to Radisphere’s extensive quality data, analytics, and outcomes, the Institute is focused on developing performance benchmarks and sharing relevant information to deliver measurable improvements in radiology quality for unparalleled levels of patient care. As the leading provider of standards-based radiology delivery solutions for more than 100 clients in 28 states, Radisphere is transforming the practice of radiology at health systems by establishing measurable performance standards and accountability for diagnostic accuracy, appropriate utilization, service level excellence and patient care.

About the Authors

Frank E. Seidelmann, D.O. is a diplomate of the American Board of Radiology, with a C.A.Q. in Neuroradiology and 36 years of academic and private practice experience. He is also Co-founder, Chairman of Radiology, and Chief Medical Officer of Radisphere. Douglas C. Ward is Radisphere’s Director of Professional Innovation and co-investigator of the Radisphere Clinical Error Analysis: 2010

Appendix

CRISP Analysis - CT

Average CRISP – CT: 152.7

Study | Modality | Complexity Rating | Rating for Potential Serious Pathology | CRISP Index
CT Abdomen With | CT | 21 | 20 | 420
CT Abdomen With & Without | CT | 21 | 20 | 420
CT Abdomen Without | CT | 21 | 20 | 420
CT Abd / Pel With | CT | 20 | 20 | 400
CT Abd / Pel With & Without | CT | 20 | 20 | 400
CT Abd / Pel Without | CT | 20 | 20 | 400
CTA Brain | CT | 17 | 20 | 340
CTA Abd Arteries, Lower Ext | CT | 27 | 12 | 324
CTA Chest | CT | 15 | 20 | 300
CTA Neck With | CT | 15 | 20 | 300
CT Soft Tissue Neck With | CT | 21 | 12 | 252
CT Soft Tissue Neck With & Without | CT | 21 | 12 | 252
CT Soft Tissue Neck Without | CT | 20 | 12 | 240
CT Chest With | CT | 11 | 20 | 220
CT Chest With & Without | CT | 11 | 20 | 220
CT Brain With | CT | 10 | 20 | 200
CT Brain With & Without | CT | 10 | 20 | 200
CT Brain Without | CT | 10 | 20 | 200
CT Cervical Myelogram | CT | 12 | 12 | 144
CT Cspine With | CT | 12 | 12 | 144
CT Cspine With & Without | CT | 12 | 12 | 144
CT Cspine Without | CT | 12 | 12 | 144
CT Tspine With | CT | 12 | 12 | 144
CT Tspine With & Without | CT | 12 | 12 | 144
CT Tspine Without | CT | 12 | 12 | 144
CT Chest Without | CT | 11 | 12 | 144
CT Pelvis With | CT | 11 | 12 | 132
CT Pelvis With & Without | CT | 11 | 12 | 132
CT Temporal Bones With | CT | 31 | 4 | 124
CT Temporal Bones With & Without | CT | 31 | 4 | 124
CT Temporal Bones Without | CT | 30 | 4 | 120
CT Pelvis Without | CT | 9 | 12 | 108
CT Temporal Bones With & Without (IAC’s) | CT | 18 | 4 | 72
CT Temporal Bones With (IAC’s) | CT | 18 | 4 | 72
CT Temporal Bones Without (IAC’s) | CT | 16 | 4 | 64
CT Lumbar Myelogram | CT | 14 | 4 | 56
CT Knee With | CT | 12 | 4 | 48
CT Knee With & Without | CT | 12 | 4 | 48
CT Lspine With | CT | 12 | 4 | 48
CT Lspine With & Without | CT | 12 | 4 | 48
CT Knee Without | CT | 11 | 4 | 44
CT Lspine Without | CT | 11 | 4 | 44
CT Foot | CT | 21 | 2 | 42
CT Shoulder | CT | 9 | 4 | 36
CT Ankle | CT | 8 | 4 | 32
CT Orbits With | CT | 15 | 2 | 30
CT Orbits With & Without | CT | 15 | 2 | 30
CT Orbits Without | CT | 14 | 2 | 28
CT Facial Bones With | CT | 6 | 4 | 24
CT Facial Bones With & Without | CT | 6 | 4 | 24
CT Facial Bones Without | CT | 6 | 4 | 24
CT Maxillofacial Sinuses | CT | 12 | 2 | 24

Endnotes

i. Atul Gawande, M.D. is a general and endocrine surgeon at Brigham and Women’s Hospital in Boston, Massachusetts and associate director of its Center for Surgery and Public Health. He is also an associate professor at the Harvard School of Public Health and an associate professor of surgery at Harvard Medical School. He has written extensively on medicine and public health for The New Yorker and Slate and is the author of the books Complications, Better, and The Checklist Manifesto.

ii. http://www.theinvisiblegorilla.com/