
7. Evaluation

7.3 Observational Methods

Observation has always been an important, if informal, part of interface evaluation but a growing dissatisfaction with formal and experimental methods, for reasons outlined above, has led to an increasing interest in developing means of assessing usability in context which amount to more than casual observation.

Part of this dissatisfaction also develops from a feeling that the theory or principle being tested is often put at the centre of concerns, rather than the user. This has led to a number of user-centred approaches to evaluation. As a beginning, one can solicit user responses and opinions via interviews and questionnaires. However, the usefulness of this approach depends upon such factors as the amount of bias or ambiguity inherent in the questions themselves and possible bias in the questionnaire respondents. Questionnaire and interview results cannot be counted as objective data but can be useful as a backup to experimental findings.

A more radical approach to user-centred evaluation, akin to the ethno-methodology of certain anthropologists, eschews all artificial constraints and allows free use in work contexts. The philosophy behind this approach is that the user's experience is central and that all interpretation of it by the observer is invalid. Evaluation is conducted by the user with a 'co-evaluator' and the data are the discussions which ensue (Whiteside et al., 1987). Such methods, however, do not normally provide, and are not intended to provide, generalisable results. Contextuality and generalisability are, as usual, in inverse proportion.

Another radical departure from the 'traditional' view in HCI, that designs are arrived at through the application of validated principles, stems mainly from two observations. The first of these is that innovation almost invariably precedes the development of theory. The second is that when an 'artifact' is introduced into a work context, it invariably alters that context and this tends to invalidate the design based on the previous context. Any subsequent alterations suffer the same fate and so the whole business of design is necessarily an iterative process: evaluating artifacts in terms of how they support user tasks in the context of use and modifying them accordingly. Any theory inevitably embodied in an artifact arises out of this process rather than preceding it. This kind of position is espoused by Carroll and others (Carroll et al., 1991). Approaches of this kind, unlike the ethno-methodological account, still carry a minimal amount of psychological 'baggage' in terms, for example, of being structured around Norman's generic task model (Carroll et al., op. cit.).

One of the problems with debriefing interviews with users is that subjects may not accurately remember significant aspects of their experience. A way around this is to get them to provide a verbal protocol while the interaction is proceeding. In this way, the intentions and events which concern the user most and responses to them will naturally enter into their monologue and the researcher will gain some degree of access to the user’s 'live' thought processes. However, it should be added that the advantage of this must be set against the occasional unwillingness of some experienced subjects to share their expertise or inability to explain actions which have been 'internalised' (Diaper, 1989).

Generally, protocol analysis is a method for gaining insight into the psychological processes of an individual engaged in some task or activity. It consists in encouraging the subject to 'think out loud' by way of explanation of the thought processes which underlie and motivate current actions and psychological responses to stimuli received in the course of the activity. Since it is impractical to attempt to analyse these 'monologues' concurrently or recall them for later analysis, it is usual to undertake audio and video recording and sometimes to electronically 'log' user actions and system responses in a computer file.

In addition to the prior request for a verbal protocol, the researcher may prompt the subject from time to time to reveal current mental machinations if the subject should lapse into quiet performance or if a particularly interesting episode should occur. Of course, not all subjects will find it easy to provide such commentaries and the necessity for doing so can easily disrupt task performance. For this reason an alternative method is to have the subject provide a commentary over a recording of their prior performance and to record this commentary. The disadvantage here is that subjects may tend to 'over-rationalise' their previous actions and again not remember their mental machinations accurately enough to be useful.

With either method, another problem can be shyness or reticence on the part of the subject and every effort must be taken to remove any feelings of being under examination (Monk & Wright, 1991). This kind of approach is especially useful for identifying communication breakdown between user and system. It allows the researcher to focus on fairly minute aspects of the interaction and to tease out details of miscues and misconceptions. This kind of analysis is extremely laborious and time-consuming to perform, since hours of video tape and log records can result from a relatively small number of subjects. Again, results will rarely submit to any significant generalisation, although attempts have been made to identify 'interaction scenarios', which describe typical situations of computer use, as basic units of interaction which do generalise across contexts (Carroll et al., op. cit.).

For the purposes of evaluating the research presented here, it was decided to utilise two approaches. A 'traditional' experiment was devised to compare two alternative interface designs, and a protocol analysis was conducted so that fine details of interaction could be studied in order to evaluate the hypothesis proposed concerning the cognitive causes of context error.

The software simulation of part of the UNIX® file system described in chapter six was used for the experiment. Although it might have been possible to sample 'real' users, perhaps over a prolonged period, by monitoring them in a live UNIX® environment, it was decided to use a simulation so that greater control could be exercised over the facilities and functions of the system. The experimental variable was the presence or absence of the QDOS context error module in the simulation. At the time when the experiment was conducted, the version of QDOS developed was capable of reasoning only about state-based, and not about process-based, context errors.

The simulation logs user commands so that this record can be scrutinised for causal antecedent actions in context error situations. In theory, the error analyser would look for prior states implicated by the error, but in practice it is easier to record actions which alter states and to look for these actions. This also has the effect of limiting the set of possible worlds considered to those most like the actual world, as described in chapter four. When context errors occur, the help system checks the log for a likely cause. For example, if a file is not found, the help system checks for deletions, moves or renames of the file and also for its presence in the previous working directory. If such a causal antecedent is found, the user is notified of the action as a definite or likely reason why the current action failed. The simulation also logs all errors, so that an analysis of their relative frequency can be made, and the final state of the file system, so that a measure of task completion can be calculated.
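The file-not-found check described above can be illustrated with a minimal Python sketch. This is not the QDOS implementation: the record layout (`LogEntry`), the function name `find_antecedent`, the `fs` mapping from directory to file names, and the wording of the returned messages are all assumptions made for illustration. The sketch scans the command log backwards for a deletion, move or rename of the missing file in the current directory, and checks whether the file is present in the directory from which the most recent `cd` was issued.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class LogEntry:
    """One logged user command (hypothetical record layout)."""
    command: str      # command name, e.g. "rm", "mv", "cd"
    args: List[str]   # arguments as typed by the user
    cwd: str          # working directory when the command was issued


def find_antecedent(log: List[LogEntry], fs: Dict[str, Set[str]],
                    filename: str, cwd: str) -> Optional[str]:
    """Return a likely reason why `filename` is missing from `cwd`, or None.

    `fs` is an assumed snapshot mapping each directory path to the set of
    file names it currently holds.
    """
    checked_previous_dir = False
    # Walk the log from the most recent command backwards.
    for entry in reversed(log):
        if entry.cwd == cwd:
            if entry.command == "rm" and filename in entry.args:
                return f"'{filename}' was deleted by an earlier rm"
            if entry.command == "mv" and entry.args[:1] == [filename]:
                return f"'{filename}' was moved or renamed to '{entry.args[-1]}'"
        if entry.command == "cd" and not checked_previous_dir:
            # The most recent cd was issued from the previous working directory.
            checked_previous_dir = True
            if filename in fs.get(entry.cwd, set()):
                return f"'{filename}' is in the previous working directory {entry.cwd}"
    return None
```

A caller would invoke `find_antecedent(log, fs, "notes.txt", "/home/u")` when a command fails with a file-not-found error and, if the result is not `None`, present it to the user as the likely cause. Recording state-changing actions rather than full prior states keeps the search confined to the logged history, in line with the practical simplification noted in the text.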