CHAPTER 4: CRITICAL DECISION METHOD: A REVIEW OF STUDIES AND
4.4. Methodological rigour in conducting an inquiry using the Critical Decision
One of the most effective ways of assessing the trustworthiness of the findings of a particular study is to evaluate them in relation to the procedures and methods used to generate them (Graneheim and Lundman, 2004; Sandelowski, 2000a; Lincoln and Guba, 1985; Patton, 2002; Bryman, 2006; Creswell, 2003). In a review of the existing knowledge elicitation methodologies, Cooke (1994) noted that although there is no shortage of methods, the lack of compelling evidence on how they should be evaluated remains a challenge. As a result, the reliability, validity and generalizability of the CDM have received more methodological attention in recent years than in the past.
4.4.1. Reliability of the CDM method
According to Gordon and Gill (1997), some of the questions commonly used to challenge the reliability of the critical decision method are: can participants be expected to report the same details when asked about the same incident at a later time? Can participants be expected to identify the same proceedings in the timeline (decision points, critical cues, actions, etc.)? Furthermore, issues have been raised about the reliability of the procedures used to analyse the CDM data, and decision points in particular (Hoffman, Crandall and Shadbolt, 1998). In this regard, sceptics have asked whether independent data analysts would generate the same results from the coding of raw CDM data. Finally, the reliability of the identified decision points has also been questioned.
Some authors have also questioned the retrospective nature of the critical decision method (e.g. Nisbett and Wilson, 1977; Ericsson and Simon, 1993), arguing that individuals do not always accurately report information that depends on the recollection of past events, owing to the inherent limitations of human memory. In line with this argument, it is believed that the exact circumstances surrounding an incident can never be recreated, and that once interviewed, the interviewee’s memory of the event will alter to some unknown degree (Maqsood, Finegan and Walker, 2004). In addition, sceptics have often emphasized the effect of hindsight bias, i.e. the tendency to view events as more predictable than they really were (Turner, 1976; Messick and Bazerman, 1996; Kahneman, 2011). Hindsight bias has been cited as the main reason why people attempt to cover up their mistakes and report only the aspects of an incident that favour them (Ericsson and Simon, 1980; Kahneman, 2003; Weitzenfeld et al., 1990; Dickson, McLennan and Omodei, 2000).
Despite the questions and concerns regarding the validity and reliability of the critical decision method, most empirical studies, particularly those involving real experts, have consistently shown the method to be effective and reliable in eliciting expert knowledge (Klein et al., 1988; McLennan et al., 2006; Burke and Hendry, 1995; Wong, 2004; Lipshitz et al., 2007). CDM experts do not deny the possible limitations associated with the use of retrospective verbal protocols in knowledge elicitation; they simply suggest that some of the criticisms levelled at the method are somewhat exaggerated (see Flanagan, 1954; Klein et al., 1989; Hoffman et al., 1998 for a review of the CDM protocol). For instance, in their research with firefighters, Klein, Calderwood and Clinton-Cirocco (1988) observed that most of the very challenging incidents in the officers’ careers were vividly remembered and that many of the non-routine events were reported more accurately and completely than the routine ones (Eraut, 2004; Calderwood, Crandall and Baynes, 1990). This holds true even for incidents dating back 10 years or more (Crandall, Klein and Hoffman, 2006).
In their review of the critical decision method, Klein et al. (1989) suggested that the CDM minimizes hindsight bias and other cognitive biases through the same strategies with which it enhances incident recall. These include allowing the same story to be narrated at least twice during the interview (see steps 2, 3 and 4 above). The “rule of thumb” is that the more often participants go over an incident, the less likely there are to be discrepancies or variations in the generated CDM data. Both the incident timeline phase (in which a timeline of the various events that happened throughout the incident is sketched) and the fact that participants are allowed to refer back to their log books and registers (in the event that they cannot remember certain things about the incident) have played important roles in enhancing memory recall. Furthermore, the CDM probe questions, regarded as one of the greatest strengths of the method, have also proved useful in reducing inconsistencies between what was initially narrated and the answers that participants subsequently give to each of the probe questions (O’Hare et al., 1998).
One of the most common ways in which the issue of reliability has been addressed is through the use of inter-coder agreement (i.e. the level of agreement between two or more independent judges regarding the coding of interview data). Inter-coder reliability checks have shown a high level of coding agreement across a range of CDM studies (Hoffman, Crandall and Shadbolt, 1998; Klein et al., 1989; Hoffman et al., 1995). For example, in a study involving wild-land fire ground commanders, Taynor et al. (1987) used two independent judges to code for “decision strategy” across 29 decision points; the corresponding calculation of agreement yielded a rate of 87%. In another study, by Calderwood, Crandall and Klein (1987), two independent judges classified the decision strategies from 18 decision points and found their rate of coding agreement to be about 89%.
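The rate of agreement referred to above can be illustrated with a minimal sketch. It assumes the simplest measure, namely the proportion of decision points to which two judges assign the same code. The studies cited do not specify their exact calculation here, so this is an illustration and not a reconstruction of their procedure. The function name, the category labels and the ten coded decision points are all hypothetical.

```python
from typing import Sequence


def percent_agreement(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Return the proportion of items given the same code by both judges.

    Assumption: simple percent agreement, with no correction for chance.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError("Both judges must code the same number of decision points.")
    if len(coder_a) == 0:
        raise ValueError("At least one coded decision point is required.")
    # Count the decision points on which the two judges agree.
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


# Hypothetical codes for ten decision points (illustrative data only).
judge_1 = ["recognitional", "recognitional", "analytical", "recognitional",
           "analytical", "recognitional", "recognitional", "analytical",
           "recognitional", "recognitional"]
judge_2 = ["recognitional", "recognitional", "analytical", "recognitional",
           "recognitional", "recognitional", "recognitional", "analytical",
           "recognitional", "recognitional"]

agreement = percent_agreement(judge_1, judge_2)
print(f"Inter-coder agreement: {agreement:.0%}")  # 9 of 10 codes match
```

In this hypothetical example the judges disagree on one of ten decision points, giving an agreement rate of 90%. Simple percent agreement does not account for agreement that would occur by chance, which is worth bearing in mind when comparing rates across studies that use different numbers of coding categories.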
4.4.2. Content validity
In addition to the issue of reliability, the internal or content validity of the critical decision method has also been addressed in the cognitive task analysis literature (cf. Hoffman and Militello, 2008). The questions posed under this theme can be framed in terms of the quality of the data generated by a CDM procedure, i.e. how comprehensive, accurate, inclusive and precise such data are. The question can also be framed in terms of the informational content of the data, e.g. does the method yield true information about the concepts, principles, decision-making styles, etc. of the particular domain investigated?
In their assessment of the content validity of some CDM studies, Hoffman et al. (1998) reported on the relevance and value of the products developed from these studies. For instance, in a study conducted in a neonatal intensive care unit (NICU), Crandall and Getchell-Reiter (1993) interviewed 22 experienced nurses (mean length of experience, 13 years) using the CDM protocol. Findings from the study revealed certain diagnostic cues, such as muscle tone, sick eyes, edema and clotting problems, a few of which were found to be the opposite of the established cues (i.e. indicators of infection in adults). Interestingly, more than one-third of the cues discovered in the study appeared to be novel in the medical literature at the time. Following the study, Crandall and Getchell-Reiter (1993) went on to conduct a validity check on the identified diagnostic cues, based on independent assessments made by a group of experts. The experts, who comprised independent NICU nurses, clinical and specialist nurses and research-based nurses, were all found to favour the findings from the study, giving credence to the theoretical and practical relevance of those findings.
In another CDM study, Wong et al. (1996) interviewed ambulance dispatch officers at the Sydney ambulance coordination centre and used the knowledge elicited from the officers as the basis for the design of a more efficient decision aid. The content validity of the CDM output was attributed to the authors’ ability to transform the manual system used for collecting and processing information at the ambulance call centre into a more efficient computer-based system.
The content validity of the findings from the current study was mainly assessed through discussion with the author’s supervisors and through expert scrutiny. As stated earlier, findings from the study have been published in two different peer-reviewed journals (Okoli et al., 2014; Okoli et al., 2015) and in a conference proceeding (Okoli
4.4.3. Generalizability of the CDM outputs
As stated earlier, a common criticism of qualitative inquiry relates to its methodological dependence on small samples, which critics believe renders the conclusions of such studies incapable of generalization (see Myers, 2000 for example). The term 'generalizability' refers to the degree to which the findings from a study sample can be generalized to the wider population (Marshall, 1996). In other words, can the conclusions reached in a single study be successfully applied beyond the scope of the instances investigated?
It should be emphasized at this juncture that qualitative studies, and CDM studies in particular, are not generalizable in the literal sense of the word, nor do they claim to be (Stake, 1980; Myers, 2000; Wong and Blandford, 2002). Rather, they have other redeeming features that make their findings highly valuable for transfer to other domains. Every single incident reported in a CDM study is treated as a unique source of data and analyzed for the purpose of theme development, which makes the issue of generalizability less significant. Studies in the CDM literature have a substantial record of drawing on, and contributing to, a wide range of disciplines such as psychology, education, nursing and aviation (Klein, Calderwood and McGregor, 1989; Crandall and Getchell-Reiter, 1993; Hoffman et al., 1995; Schraagen et al., 2000; Hutton, Miller and Thordsen, 2003; Hutchins, Pirolli and Card, 2004; Clark et al., 2006). Most of the frameworks, theories, models, training needs and conceptual graphs generated from these studies have continued to help bridge the gap between theory and practice, especially in the development of instructional designs (see Hoffman et al., 1995; O’Hare et al., 1998 for a review).