After receiving an excessive amount of KCl from the diagnosing physician, the patient described in Box 3-1 was prescribed an additional dose of KCl by a second physician. Although told by the first physician to review the patient’s KCl levels, the second physician was unaware, for several reasons, that the patient was already being administered an excessive amount of KCl. These factors included the following:
1. The previous physician did not explicitly inform the second physician that KCl had already been ordered.
2. The KCl IV drip did not appear in the CPOE system’s medication list because that list does not display IV drips.
3. The CPOE only showed the patient’s lab results before the administration of the KCl drip ordered by the first physician; therefore, the second physician only saw the patient’s previously low levels of KCl (Horsky et al., 2005).
Although the second physician had checked the previous physician’s notes, medication history, and lab reports, there was no indication that the patient was already receiving KCl. These factors, including the poorly designed CPOE interface, may not be identified in a single event chain, yet each independently contributed to the patient’s excessive KCl levels. Looking for a single “root cause” responsible for the patient’s adverse condition would fail to address the other factors that may continue to put future patients at risk.
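The second and third factors can be made concrete with a small sketch of an order-entry check. This is only an illustration under stated assumptions, not a description of the CPOE system studied by Horsky et al. (2005): the `Order` and `LabResult` records, the `route` field, the function `duplicate_therapy_warnings`, and the example timestamps are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Order:
    drug: str
    route: str          # hypothetical values such as "IV drip" or "oral"
    started: datetime
    active: bool = True


@dataclass
class LabResult:
    analyte: str
    value: float
    drawn: datetime


def duplicate_therapy_warnings(new_drug: str, orders: List[Order],
                               labs: List[LabResult], analyte: str) -> List[str]:
    """Return warnings a prescriber might want to see before ordering new_drug."""
    warnings = []

    # Consider every active order for the drug, regardless of route, so that
    # an IV drip is not silently left out of the picture.
    active = [o for o in orders if o.active and o.drug == new_drug]
    for o in active:
        warnings.append(
            f"{new_drug} already active via {o.route} "
            f"since {o.started:%Y-%m-%d %H:%M}"
        )

    # Flag lab values drawn before the current therapy began: they describe
    # the patient's state before treatment, not the current state.
    relevant = [r for r in labs if r.analyte == analyte]
    latest = max(relevant, key=lambda r: r.drawn, default=None)
    if active:
        earliest_start = min(o.started for o in active)
        if latest is None or latest.drawn < earliest_start:
            warnings.append(
                f"latest {analyte} result predates current {new_drug} therapy"
            )
    return warnings


# Hypothetical example: a drip started after the only available lab draw.
orders = [Order("KCl", "IV drip", datetime(2005, 1, 1, 8, 0))]
labs = [LabResult("potassium", 3.0, datetime(2005, 1, 1, 6, 0))]
for line in duplicate_therapy_warnings("KCl", orders, labs, "potassium"):
    print(line)
```

A check of this kind addresses only two of the contributing factors. It does nothing about the missing handoff between physicians, which is why a single technical fix would not amount to addressing the event as a whole.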
The primary lesson from this perspective on safety can be described as the following: “Task analysis focused on action sequences and occasional deviation in terms of human errors should be replaced by a model of behavior-shaping mechanisms in terms of work system constraints, boundaries of acceptable performance, and subjective criteria guiding adaptation to change” (Rasmussen, 1997).
THE NEED FOR AN EXPLICIT EVIDENCE-BASED CASE FOR SAFETY IN SOFTWARE5
Safety has no useful meaning for software until a clear understanding is achieved regarding what the software should and should not do and under what circumstances these things do and do not happen. (In this context, safety refers to claimed properties of software that make it safe enough to use for its intended purpose.)
When safety is at issue, the burden of proof falls on the software developer to make a convincing case that the software is safe enough for use. The audience for the case differs depending on the situation at hand. For example, it may be the software vendor who must make the safety case to a prospective purchaser of its products, or to an entity that might provide a safety certification for a given product. Once the software has been developed and installed, and the relevant processes and procedures have been put into place for proper software use, it may be the health care organization that must make the safety case for the overall system (that is, the software as installed into a larger sociotechnical system) to an external oversight organization responsible for ensuring the safe operation of care providers.
Such a case cannot be made by relying primarily on adherence to particular software development processes, although such adherence may be part of a case for safety. Nor can the safety case be made by relying primarily on a thorough testing regimen. Rigorous development and testing processes are critical elements of software safety, but they are not sufficient to demonstrate it. Developing a comprehensive case for safety that can be independently assessed depends on the generation, availability, communicability, and clarity of evidence. Three elements are necessary to develop a case for safety:
• Explicit claims of safety. No software is safe in all respects and under all conditions. Thus, to claim software is safe, an explicit articulation is needed of the requirements and properties the software is expected to possess and exhibit in use and the assumptions about the environment in which the software operates and usage models upon which such a claim is contingent. Explicit claims of safety further depend on the inclusion of a hazard analysis. Hazard analyses ought to identify and assess potential hazards and the conditions that can lead to them so that they can be eliminated or controlled (Leveson, 1995).
• Evidence. A case for safety must argue that the required behavioral properties of the software are a consequence of the combination of the actual technology involved (that is, as implemented), users, the processes and procedures they use, and other aspects of the larger sociotechnical system within which the technology is embedded. All domains of the sociotechnical system must be taken into account in the development of a case for safety. Evidence acquired from testing the software will be part of this case, but “lab” testing alone is usually insufficient. The case for software safety typically combines evidence from testing with evidence from analysis. Other evidence also contributes to the safety case, including the qualifications of the personnel involved in the system’s development, the safety and quality track record of the organizations in building the system’s components, integration of the components into the overall system, and the process through which the software was developed. Furthermore, the safety case must present evidence that use of the technology in the actual work environment by real clinicians with real patients demonstrates functioning without a level of malfunction greater than that specified in the design requirements.
• Expertise. Expertise in software development and in the relevant clinical domains, among other things, is necessary to build safe software. Furthermore, those with expertise in these different contexts must communicate effectively with each other and be involved at every step of the design, development, and component integration process.
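One way to picture the first two elements is as a record that can be inspected for gaps. The sketch below is a hypothetical illustration rather than an established safety-case notation: the `Claim` and `Evidence` classes, the `kind` labels, and the function `gaps` are assumptions introduced here. The rule that each claim needs both testing and analysis evidence is a simplification of the point that the two are typically combined.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str       # hypothetical labels: "testing", "analysis", "field use"
    summary: str


@dataclass
class Claim:
    statement: str
    assumptions: List[str] = field(default_factory=list)  # environment, usage
    hazards: List[str] = field(default_factory=list)      # from hazard analysis
    evidence: List[Evidence] = field(default_factory=list)


# Assumed rule for this sketch: a claim needs both kinds of evidence.
REQUIRED_KINDS = {"testing", "analysis"}


def gaps(claim: Claim) -> List[str]:
    """List what an independent assessor would find missing from a claim."""
    problems = []
    if not claim.assumptions:
        problems.append("no stated environment or usage assumptions")
    if not claim.hazards:
        problems.append("no hazard analysis")
    kinds = {e.kind for e in claim.evidence}
    for missing in sorted(REQUIRED_KINDS - kinds):
        problems.append(f"no evidence from {missing}")
    return problems


# Hypothetical claim supported by lab testing alone.
claim = Claim(
    statement="Active IV infusions are shown alongside other medications",
    assumptions=["Orders are entered through the ordering system"],
    hazards=["Duplicate therapy ordered by a second prescriber"],
    evidence=[Evidence("testing", "Lab test of the medication display")],
)
print(gaps(claim))
```

The third element, expertise, is deliberately absent from the record: it concerns who builds and reviews the case and how they communicate, which a data structure of this kind cannot capture.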
When software is complex, it can be difficult to determine its safety properties. An analytical argument for safety is easier to make when global safety properties of the software can be inferred from an analysis of the safety properties of its components. Such inferences are more likely to be possible when different parts of the system are designed to operate independently of each other.
Achieving simplicity is not easy or cheap, but simpler software is much easier for independent assessors to evaluate, and the rewards of simplicity far outweigh its costs (NRC, 2007). Pitfalls to avoid include interactive complexity, in which components may interact in unanticipated ways and a single fault cannot be isolated but instead causes other faults that cascade through the software. Avoiding these characteristics both reduces the likelihood of failure and simplifies the safety case to be made.
Most important to developing a plausible case for safety is the stance that developers take toward safety. A developer is better able to make a plausible safety case when it is willing to provide safety-related data from all phases in the components’ or software’s life cycle, to ensure the clarity and integrity of the data provided and the coherence of the safety case made, and to accept responsibility for safety failures. One report goes so far as to assert that “no software should be considered dependable if it is supplied with a disclaimer that withholds the manufacturer’s commitment to provide a warranty or other remedies for software that fails to meet its dependability claims” (NRC, 2007).
With respect to health IT, health care organizations seldom make an explicit case for the safety of health IT in situ, and vendors seldom make an explicit case for the safety of their health IT products.
THE (MIS)MATCH BETWEEN THE ASSUMPTIONS OF SOFTWARE DESIGNERS AND THE ACTUAL WORK ENVIRONMENT
Generally, health IT software is created by professionals in software development, not by clinicians as content experts. Content experts are usually provided with multiple opportunities to offer input into the performance requirements that the software must meet (e.g., users brief software developers on how they perform various tasks and what they need the software to do, and they have opportunities to provide feedback on prototypes before designs are finalized). Traditionally, technology development follows a process where users of the technology articulate their needs (or performance requirements) to developers. Developers create technology that performs in accordance with their understanding of user needs. Users then test the resulting technology and provide feedback to developers. Developers provide a new version that incorporates that feedback and, when users are satisfied with the technology, the developer assumes it is suitable for use in the user’s environment and delivers the technology. However, software developers and clinicians generally come from different backgrounds, making communication of ideas more difficult. As a result, these processes for gaining input rarely capture the full richness and complexity of the actual operational environment in which health professionals work, an environment that varies enormously from setting to setting and practitioner to practitioner.6
Deviations Versus Adherence to Formal Procedures
Indeed, in most organizations, guidance provided by formal procedures is rarely followed exactly by health professionals. Although this lack of user predictability can dramatically increase the difficulty for the software developer to deliver the degree of functional robustness required, deviations between work-as-designed and work-in-practice (work-in-practice is sometimes regarded as a workaround) are not necessarily harmful or negative. Such deviations are necessary under circumstances not anticipated by rules governing work-as-designed. In some cases, deviations are necessary if work is to be performed at all (Kahol et al., 2011).
Deliberate deviations between work-as-designed and work-in-practice are smallest when significant changes are made to the work environment, and the introduction of new technology usually counts as a significant change. Deviations are smallest during this period of introduction because practitioners are unfamiliar with the new technology and are learning about its capabilities for the first time. But, as practitioners become more familiar with the new technology, the limitations imposed by the new technology become more apparent in the local work environment. Practitioners thus develop modified, possibly unsanctioned, practices for using the technology that account for the on-the-ground requirements of doing work; this process is sometimes known as “drift” (Snook, 2002) or workarounds and reflects the phenomenon of local rationality in which practitioners are all doing reasonable things given their limited perspective but the modified practices result in poor outcomes (Woods et al., 2010).
Sometimes modified practices are needed to manage conflicting goals that arise in an operational environment (e.g., pressures for speedy resolution vs. pressures for collecting more data) (Woods et al., 2010). Under some circumstances, adherence to prescribed procedures can indeed result in unsafe outcomes. While modified practices may be required to make the overall system safer, the modified practices themselves can sometimes result in unsafe outcomes. Almost by definition, the situations for which the use of the modified practices is unsafe occur only rarely. Practitioners adopt the modified practices to cope more effectively with frequently occurring situations, but the modified practices have mostly not been developed with the rarely occurring situation in mind.
6 For example, Suchman (1987) argues that user actions depend on a variety of circumstances that are not explicitly related to the task at hand. In practice, people behave differently when they are in the presence of other people (whom they can ask for advice about what actions to take) or when they are unusually pressed for time (in which case they may take possibly risky shortcuts).
Herein lies a critical safety paradox. Practitioners following the prescribed procedures may be unable to complete all of their work, which may motivate them to use nonstandard or unapproved approaches. If a disaster occurs because they did not follow the prescribed procedures in a given instance, they may be blamed for not following procedures. As discussed previously, unsafe outcomes result not from human failures per se but rather from the way the various components of the larger sociotechnical system interact with each other.
Clumsy Automation
A particularly relevant illustration of mismatches between the assumptions of software designers and the actual work environment can be seen in the notion of clumsy automation (Woods et al., 2010). Clumsy automation “creates additional tasks, forces the user to adopt new cognitive strategies, [and] requires more knowledge and more communication at the very times when the practitioners are most in need of true assistance” (see Box 3-4). At such times, practitioners can least afford to spawn new tasks and meet new memory demands to fiddle with the technology, and such results “create opportunities for new kinds of human error and new paths to system breakdown that did not exist in simpler systems” (Woods et al., 2010).
BOX 3-4
Mismanaging KCl Levels