Verbal Protocol Analysis for Software Evaluation

Bainbridge (1990) remarked that there are many complex jobs in which the outcome of thinking does not emerge in observable action.

Bainbridge argued that to be able to train and support these types of jobs, we need knowledge of the cognitive processes involved. One way to obtain such information is to ask people to "think aloud" while undertaking such tasks. These reports are known as "verbal protocols" and are essentially reports of the mental processes used during the task.

However, the validity of such reports has been under discussion for some time. The validity problem primarily revolves around the information tapped during such exercises. In particular, does the individual who undertakes such a process have access to the higher order, or "meta-cognitive," thought processes? Some would argue not.

Ericsson and Simon (1980, 1984) have noted that, for some time, there has been a trend within psychology to view verbal reports as suspect data, asserting that,

"behaviourism and allied schools of thought have been schizophrenic about the status of verbalisations as data." (Ericsson and Simon, 1980, p. 216).

Problems with verbal reports seem to stem from the original work of Boring (1953), which discredited the practice of classical introspection as a valid psychological technique. More recently, the influential work of Nisbett and Wilson (1977) has also posed problems for the potential use of such data. Nisbett and Wilson conducted an in-depth review of the research on verbal reports and concluded that subjects have no access to their own "higher mental processes" and therefore cannot reliably, or correctly, report on them.

Nisbett and Wilson's (1977) work has, however, been criticised (Ericsson and Simon, 1980; Hoc and Leplat, 1983; Praetorius and Duncan, 1988). Praetorius and Duncan suggested that Nisbett and Wilson made inaccurate extrapolations from their results, and that it is quite natural under some circumstances for individuals to be unable to report

"how we do what we do, or why we think or do as we do." (Praetorius and Duncan, 1988, p. 310).

Specifically, it is argued that in many cases subjects have not been supplied with appropriate media, or means of expression, adequate for eliciting the information the investigator is seeking.

In an attempt to clarify the validity of verbal reports as data, Ericsson and Simon (1980) provided an in-depth review and theoretical amalgamation of knowledge about the verbal report method. They presented a generalised processing model to aid in the theoretical and empirical examination of verbal reports as data.

Ericsson and Simon (1980) began with two basic assumptions. The first is that a cognitive process can be seen as a sequence of internal states, which are successively transformed by a series of information processes. The second is that information is stored in several memories that have different capacities of storage and accessibility. The broad distinction is between what can be referred to as short-term memory (STM) and long-term memory (LTM). The short-term component is seen as having a limited capacity and an intermediate duration, and long-term memory as having a large capacity and relatively permanent storage, but with slow fixation and access times. Within this model it is assumed that information recently acquired by the central processor is kept in STM and is directly accessible for further processing, whereas information in LTM must first be retrieved before it can be reported.

The key hypothesis advanced by Ericsson and Simon (1980) was that,

"due to the limited capacity of the STM, only the most recently heeded information is accessible directly. However, a portion of the STM is fixated in the LTM before being lost from the STM, and this portion can, at later points in time, sometimes be retrieved." (Ericsson and Simon, 1980, p. 223).

As a derivative of this hypothesis, Ericsson and Simon (1980) made two major distinctions,

"First, the time of verbalisation is important in determining from what type of memory the information is likely to be drawn. Second, we make a distinction between procedures in which the verbalisation is a direct articulation or explication of the stored information and procedures in which the stored information is input to intermediate processes, such as abstraction and inference, and the verbalisation is a product of this intermediate processing." (Ericsson and Simon, 1980, p. 223).

Using this processing model, Ericsson and Simon (1980) were able to predict when verbal reports will and will not be valid. Perhaps more importantly, the data they produced using this model are consistent with the experimental findings reported by Nisbett and Wilson (1977).

Ericsson and Simon (1980) concluded that evidence of inconsistency can be found only under two conditions. The first is when the cues used to access LTM are too general, which can result in the retrieval of information that is related to, but not identical with, the information sought. The second is when subjects use intermediate processes to infer missing information, which is then used to fill out, and generalise, incomplete memories before responding.

These criticisms aside, Nisbett and Wilson (1977) themselves concluded that individuals do have access to specific data,

"The individual knows a host of personal historical facts: he knows the focus of his attention at any given point of time; he knows what his current sensations are and has what almost all psychologists and philosophers would assert to be 'knowledge' at least quantitatively superior to that of observers concerning his emotions, evaluations, and plans." (Nisbett and Wilson, 1977, p. 255).

It would seem that it is this information that software developers would want access to in the evaluation context. For this reason, Ericsson and Simon's (1980) comment that,

"For more than half a century, and as the result of an unjustified extrapolation of a justified challenge to a particular mode of verbal reporting (introspection), the verbal reports of human subjects have been thought suspect as a source of evidence about cognitive processes." (Ericsson and Simon, 1980, p. 247),

has particular merit. One obvious area of contention pertains to access to higher-order, or meta-cognitive, processes. This may, however, not be of primary interest to the software developer. Simply put, the designer wants reliable, valid, and practically obtainable information about the usability of their product. They may not be interested in the exact nature of the mental models elicited by the interface; they want to know how easy it is to use, where problems occur, and how to fix them.

In support of the verbal protocol methodology, Kirakowski and Corbett (1990) observed that human-computer interaction is primarily stepwise, making it well suited to concurrent verbalisation. Robson and Crellin (1989) also stated that protocol analysis has the advantage of being a convenient method for collecting a rich form of data. In this methodology, data and theory are separated, so that no value judgements are imposed by whatever theoretical perspective is adopted, and the data can readily be analysed in a number of ways. The problems with the method include identifying the correct level of analysis, loss of detail through data compression techniques, the time and effort required for analysis, and problems of interpretation during transcription.

Bainbridge (1979, 1990) has also documented problems specific to verbal protocols. Operators may not report what is "obvious" to them, and most people think more quickly than they can talk. Furthermore, some practical problems may arise: a long period of recording may be necessary to obtain a representative sample of activities, the method cannot be used if the task is itself verbal, and analysing the data is both time consuming and difficult. Sweeney and Dillon (1987) referred to the time-consuming nature of verbal protocol analysis, suggesting that analysis will on average take ten times the data capture time. Sweeney and Dillon also commented on the reliability issues surrounding verbal protocol data, suggesting that,

"True protocol analysis requires the use of independent raters to score the data in terms of an agreed upon rating procedure, from which reliability measures of any conclusions drawn from the data can be obtained . . . However, the principle of the technique can be more loosely used to provide a record of interaction from the user's perspective and thus offer an insight into the effect of particular interface features on interaction." (Sweeney and Dillon, 1987, p. 369).
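Sweeney and Dillon do not say which reliability measure should be used. One common choice, assumed here for illustration rather than taken from their paper, is Cohen's kappa, which corrects the raw proportion of agreement between two independent raters for the agreement expected by chance. The minimal Python sketch below assumes each rater has assigned exactly one category to every segment of a transcribed protocol; the function name and the category codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who each coded the same segments."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty list of segments")
    n = len(rater_a)
    # Observed agreement: share of segments given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's own code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical code for every segment; kappa is undefined,
        # so this degenerate case is treated as perfect agreement.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for eight protocol segments:
# NAV = navigation comment, ERR = error noticed, GOAL = goal statement.
rater_a = ["NAV", "NAV", "ERR", "GOAL", "NAV", "ERR", "GOAL", "NAV"]
rater_b = ["NAV", "ERR", "ERR", "GOAL", "NAV", "ERR", "NAV", "NAV"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on six of the eight segments (observed agreement 0.75), but because both use NAV most often, chance alone would yield 0.375, so kappa is 0.6. What counts as acceptable agreement remains a judgement for the evaluator; the statistic is offered only as one way of obtaining the kind of reliability measure the quotation refers to.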

Karat (1988) noted that in practice one rarely finds a subject who gives a quality report with little prompting, concluding that,

"a significant number of subjects will simply not provide very useful verbal reports. It is generally the case that under half of the subjects drawn from a typical undergraduate subject pool will provide good protocols." (Karat, 1988, p. 898).

Lund (1985) reported the use of an aided subsequent approach to verbal protocol analysis, in which users generated a protocol while viewing a recording of themselves undertaking a computer-based task. The advantage of this approach is that generating the protocol does not interfere with the task, but there may be a cost in the reliability of the now retrospective verbal data, caused by bias and after-the-event rationalisation of behaviour.

Hoc and Leplat (1983) have addressed the reliability problem by evaluating different modalities of "thinking aloud" verbalisation in a sorting task. In particular, they examined the efficiency of simultaneous verbalisation and of unaided and aided subsequent verbalisation. They found that simultaneous verbalisation slowed the automation of the activity and produced some disturbances in the execution of the task. They therefore recommended that this procedure should not be used outside problem-solving activities.

Hoc and Leplat (1983) have also recommended that unaided subsequent verbalisation should be avoided because,

"it produces too much distance from the task and there is a risk of obtaining data which are not very valid for the activity being studied." (Hoc and Leplat, 1983, p. 302).

They stated that for a logical task, aided subsequent verbalisation was the most favourable, concluding,

"Although under these conditions a slight slowing down in the stabilisation on a procedure is noted, the data obtained are similar to simultaneous verbalisation without perturbing the execution of the task (and therefore the process being studied)." (Hoc and Leplat, 1983, p. 302).

It would seem, therefore, that aided subsequent verbalisation is the most appropriate verbal protocol technique when examining human-computer interaction. This form of protocol analysis may be accomplished by videotaping the user interacting with the target system, and then playing the tape back to the user in real time with the "think aloud" instruction.

Videotaping human-computer interaction does, however, pose both psychological and technical difficulties. In particular, there is the problem of reactivity caused by the obtrusiveness of the video equipment.

Furthermore, the refresh rate of the screen can create difficulties when filming the computer screen, particularly if the refresh rate is much greater than 50 Hz. This is because flicker appears on the video image, reducing the resolution of the events being observed (Laws and Barber, 1989).

Despite the controversies associated with verbal protocol analysis, Bainbridge (1979) remarked that,

"Preliminary interviews and observation would indicate the problems and areas of interest. Static simulation with careful interviewing would give information about both general and specific knowledge, while verbal protocols and associated observation would show the details of behaviour in real conditions of complexity and time." (Bainbridge, 1979, p. 435).

Source: A comparison of the main methods for evaluating the usability of computer software: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Psychology at Massey University (pp. 84-92).