3. Methodology
3.3. Methods to study test taker response processes in listening assessment
3.3.2. Verbal protocols
Another common technique for investigating test takers’ response processes is the use of verbal protocols (sometimes also referred to as verbal reports or think-aloud protocols). This method has “become intrinsically intertwined” with research on response processes as part of collecting validity evidence (Hubley & Zumbo, 2017, p. 3). Gass and Mackey define verbal protocols as “gathering data by asking individuals to vocalise what is going through their minds as they are solving a problem or performing a task” (2000, p. 13). A landmark theoretical framework for verbal protocols was developed by Ericsson and Simon (1987, 1993). Since then, this technique has been used in a wide range of fields, including L2 listening research.
There are different forms of verbal protocols. Ericsson and Simon (1987, 1993) differentiate between “concurrent” and “retrospective” verbal reports. In concurrent reports participants think aloud while they are engaged in the activity, whereas retrospective reports are generated some time after participants finished the activity. It
is generally agreed that concurrent reports tend to be more valid, as they are “less susceptible to influences from unwanted variables than are retrospective reports” (Green, 1998, p. 6). However, as outlined above, concurrent reports cannot be used for researching listening processes due to the nature of the task. In such cases, Ericsson and Simon suggest conducting retrospective reports immediately after the activity is finished (1987, pp. 40–41). The time frame between the activity and subsequent recall is crucial. According to Ericsson and Simon, “[...] due to the limited capacity of STM [short term memory], only the most recently heeded information is accessible directly. However, a portion of the contents of STM are fixated in LTM [long term memory] before being lost from STM, and this portion can, at later points in time, sometimes be retrieved from LTM” (1993, p. 11).
A different distinction between verbal reports is drawn by A. D. Cohen (1987, 1996, 2011), who discusses the method specifically in relation to L2 strategy research. He differentiates between self-report, self-observation and self-revelation (A. D. Cohen, 1996, p. 13). In self-reports, learners describe “what they can do” in the form of “generalized statements about learning behaviour”. Self-observation involves reporting “specific rather than generalized behaviour, either introspectively, i.e., within 20 seconds of the mental event, or retrospectively” (A. D. Cohen, 1996, p. 13). In self-observation, learners thus do not merely report but also analyse their thought processes. In contrast, self-revelation is described by Cohen as a “stream-of-consciousness disclosure of thought processes”, and is thereby closest to Ericsson and Simon’s definition of verbal reporting.
A specific form of verbal protocols, which has been described in detail by Gass and Mackey (2000, 2007; Mackey & Gass, 2005), is the stimulated recall method. In A. D. Cohen’s terms, stimulated recall can be described as a form of retrospective self-observation. In contrast to other retrospective verbal report techniques, stimulated recall is characterised by the use of a stimulus, the purpose of which is “to reactivate or refresh recollection of cognitive processes so that they can be accurately recalled or verbalized” (Gass & Mackey, 2000, p. 53). This stimulus can have different forms, and should be “some tangible (perhaps visual or aural) reminder of an event” (Gass & Mackey, 2000, p. 17). For investigating cognitive processes of listening test takers, the use of individual test items as stimuli might be helpful. This is also observed by Field, who argues that “the circumstance of a listening test support retrospection well in that the participant
has to provide a set of answers, which provide triggers to assist recall of the thought processes that led to them” (Field, 2012, p. 35).
In more recent research, participants’ eye-movements have also been used as a stimulus to initiate retrospection. Although eye-movement metrics alone are of limited usefulness for studying listening test takers’ response processes, as the influence of the listening text on test takers’ eye-movements cannot be untangled from their reading behaviour (Salverda, Brown, & Tanenhaus, 2011; Winke & Lim, 2014), using participants’ eye-movements as a stimulus for verbal recalls has been shown to provide rich and potentially novel insights (Brunfaut & McCray, 2015; Holzknecht et al., 2017; McCray et al., 2012; McCray & Brunfaut, 2016; Winke & Lim, 2014). In these studies, participants watched a video of the eye-movements they had made while solving the items to help them remember their thought processes. The authors report that this procedure was unobtrusive and that the eye-movements served as a powerful stimulus to help participants recall their thought processes.
As with every research method, there are some constraints associated with verbal protocols. Gass and Mackey stress that it is important to bear in mind that “what learners say they do is not always the same as what they actually do” (Gass & Mackey, 2007, p. 45). The danger of participants reporting inaccurately is higher for retrospective than for concurrent reports, as some time passes between the actual event and the verbal report. This is referred to as the veridicality problem (Russo et al., 1989). Such memory effects can be controlled for by conducting retrospective reports as closely as possible to the activity under scrutiny, and by providing participants with a stimulus (Bowles, 2010, p. 14; Sasaki, 2013, p. 6). Another potential threat to validity is the problem of reactivity – the danger that the method itself could alter cognitive processing (Russo et al., 1989). This is especially a concern for concurrent verbal reports, but is generally thought to be less of an issue in retrospective reports, as they are produced some time after the activity is finished and therefore influence cognitive processing and strategic behaviour to a lesser degree. However, reactivity can still arise because participants know that they will have to provide a report after the activity. Reactivity would also play a role if retrospective reports are conducted more than once throughout an activity, as in the studies on listening assessment outlined below. In addition to these potential threats to validity, researchers need to be aware of practical considerations when conducting verbal reports. Collecting and transcribing reports is very labour- and time-intensive. Analysing verbal reports is not straightforward either, as the data needs to
be coded, ideally according to a coding scheme (Kasper, 1998) and by more than one researcher so that inter-coder reliability can be calculated. These practical constraints affect sample size, so that usually only small samples can be studied, which makes it hard to generalise the findings.
Verbal reporting has been employed in a number of investigations on test taker cognition in listening assessment (Badger & Yan, 2012; Buck, 1991; Field, 2012, 2015; Harding, 2011; Holzknecht et al., 2017; Ockey, 2007; Wagner, 2008; Winke & Lim, 2014; Wu, 1998). In all of these studies retrospective verbal reports were used, and whenever the research involved longer listening passages they were broken up into shorter passages, followed by probe questions to initiate retrospection. As outlined above, breaking up the listening passages that way minimises veridicality problems, but at the same time might introduce issues of reactivity. In addition to probe questions, some of these researchers also used the written test items (Badger & Yan, 2012; Field, 2012, 2015; Harding, 2011) or the participants’ eye-traces while they had been solving the items (Holzknecht et al., 2017; Winke & Lim, 2014) as stimuli to initiate retrospection. Such stimulated recalls (Gass & Mackey, 2000), as outlined above, further minimise problems of veridicality. All of the studies allowed students to self-analyse their thoughts, so the generated reports could be described as retrospective self-observations in Cohen’s (1996) terminology. The authors of the studies report that the method yielded rich and insightful data and, in the case of Harding (2011) and Holzknecht et al. (2017), expanded upon the results of quantitative methods used.