4.5 Evaluation
4.5.3 Usability Evaluation Methods
User evaluation is at the heart of most usability-oriented processes, and it is seen as the most important step in the UE life cycle (Nielsen, 1993). There are many ways to conduct usability evaluation, and the method used has a direct effect on the findings of the study (Olmsted-Hawala, Murphy, Hawala, & Ashenfelter, 2010). The most common forms of usability evaluation are experimental testing, thinking aloud, field observation, data logging and qualitative inquiry such as interviews and focus groups (Faulkner, 2000; Holzinger, 2005; Nielsen, 1993). Hall (2001) outlines that the most appropriate method depends on the type and complexity of the proposed system, the purpose of the evaluation, the stage in the life cycle at which the evaluation will be carried out, and the resources available in terms of equipment, money and time. A number of usability evaluation procedures were investigated in order to select the most appropriate one for this study.
Performance Measurement (Experimental Testing)
Nielsen (1993) outlines a user evaluation method based on evaluating the user's performance in a controlled environment; this method can be carried out in a usability testing laboratory or in the workplace. Many usability test laboratories use software programs for collecting (logging) data on participants' performance (Dumas & Redish, 1999). Shneiderman & Plaisant (2005) describe a typical usability laboratory as two ten-by-ten-foot areas separated by a one-way (half-silvered) mirror, equipped with computers and recording equipment. Nielsen (1993) maintains that specially equipped usability laboratories are a convenience but not an absolute necessity. Wahl (2000) adds that it is possible to convert an existing office for usability testing or to use a university computer lab with no other specialist equipment.
Faulkner (2000) proposes a similar process but describes it as experimental testing. Performance measurement is used to obtain quantitative data on the tasks carried out by the participants, with little or no interaction between the participant and the tester (Dumas & Redish, 1999). Shneiderman & Plaisant (2005) note that each task should be assigned acceptable response times or error rates against which the product can be evaluated. The system can then be benchmarked against these acceptable measures to evaluate whether it is working effectively. Faulkner (2000) describes a number of disadvantages of performance evaluation, noting that emulating the hard sciences by carrying out user evaluation in a controlled environment, such as a computer testing laboratory, does not provide data representative of the social interactions at play in a real work setting.
Performance measurement was not pursued in this research because the intention was not to investigate the usability of a software programme but to determine how post-processing data through 5D BIM could affect the successful application of LCC on construction projects. Performance measurement does not take into account how this process could change existing work practices; thus an evaluation method that could gauge the subjective attitudes of the participants engaging in a 5D BIM-LCC process was necessary.
Thinking Aloud (TA)
A method that accounts for the user/system interaction is the 'Thinking Aloud' (TA) method, which provides a closer approximation of how users actually use the system in practice (Faulkner, 2000; Monk et al., 1993; Nielsen, 1993). Holzinger (2005) asserts that TA may be the single most valuable usability evaluation method, as it not only addresses user performance but also generates qualitative data by probing what the user is thinking while using the system. Nielsen (1993) and Olmsted-Hawala et al. (2010) state that by verbalising their thoughts, participants give an insight into how they use the system, which makes it easier to identify the parts of the process that cause the most issues. Nielsen (1993) believes the strength of TA is that it can find most of the usability problems from the qualitative data collected from a fairly small number of evaluation participants (4 ± 1). Olmsted-Hawala et al. (2010) describe the traditional TA approach as 'concurrent TA', in which the participant is encouraged to think aloud while working on tasks set by the evaluator, with little or no intervention from the evaluator.
Monk et al. (1993) state that a TA variant known as 'cooperative evaluation' overcomes some of the problems with the traditional TA method by encouraging the user to explain their behaviour while using the system. This is similar to the 'coaching method' outlined by Nielsen (1993), where the evaluator may ask questions to prompt the user to explain their behaviour. Dumas & Redish (1999) describe a similar process of 'active intervention', the advantage of which is that the evaluator gains insight into the participant's evolving mental model of the product while they use the system. Rather than relying on silent observation of the participant, this method provides a better understanding of the problems participants encounter while using the system (Monk et al., 1993; Nielsen, 1993). It may also take the form of coaching the user through some aspect of the system they are having problems with (Dumas & Redish, 1999). Nielsen (1993) maintains that this method is more natural than traditional TA because users are not under pressure to verbalise their thoughts and they feel that they are helping rather than being evaluated themselves.
Faulkner (2000) advises that the TA cooperative evaluation method is not conducive to quantitative measurement, but it provides hard evidence of user interaction with the system and can be carried out much more quickly and easily than performance evaluation. The main disadvantage of TA in general, according to Nielsen (1992), is that performance metrics such as the time taken to complete tasks and error rates are distorted by the presence and interference of the evaluator. Another issue outlined by Dumas & Redish (1999) is that participants vary in their ability to say what they are thinking while they work. Some participants may forget to verbalise their thoughts, so it is important that the evaluator reminds them with helpful prompts (Monk et al., 1993). The degree of evaluator participation depends on the level of interaction permitted, which should be made explicit to both parties before the evaluation. Specific questions can be a powerful tool, both to remind the participant to keep talking and to obtain valuable information on why they followed a particular course of action (Dumas & Redish, 1999). Olmsted-Hawala et al. (2010) and Boren & Ramey (2000) agree that TA practice varies widely, as researchers often disregard established TA protocols and apply different or non-standard methodologies in an ad hoc manner. It is therefore important that the procedures are made explicit in a study so that the research is open to scrutiny of its academic rigour and to peer review. TA cooperative evaluation was used in this research; how it was applied is discussed in further detail in Section 4.6.3 and Chapter 6.
Field Studies and Field Observation
Holzinger (2005) claims that field methods are the simplest of all methods. Field methods are carried out in the users' own working environment. In this method the observer should be virtually invisible to ensure participants are operating under normal working conditions (Faulkner, 2000). Field observation may take numerous forms, such as data logging of the system in use and video analysis (Holzinger, 2005; Shneiderman & Plaisant, 2005). Nielsen (1993) maintains that field observation is best suited to evaluating the final product in the workplace. Portable usability laboratories can be used on site to support more thorough field evaluation (Shneiderman & Plaisant, 2005). Developments in field evaluation have led to remote usability testing, which can be carried out off site through online data logging. Shneiderman & Plaisant (2005) note that the downside is that there is less control over user behaviour and less chance to observe the participants' subjective reactions to using the system. Data generated by logging actual use and by video recording usually yield statistical information, which can be supplemented with questionnaires to capture users' subjective satisfaction with the system. Field observation was not carried out in this research because, as with performance evaluation, the artifact being evaluated was not a finished piece of software, and the author did not have the capability or the resources to implement and carry out a field study.
Other Indirect Methods
Nielsen (1993) describes the usability evaluation methods outlined above as 'direct evaluation methods' because they generate data that result directly from the user using the system, for example the time taken to perform a task and the number of errors made while performing it. Traditional research methods such as interviews, focus groups and questionnaires can be used to gather supplementary data on user beliefs (Nielsen, 1993; Shneiderman & Plaisant, 2005). Shneiderman & Plaisant (2005) outline that interviews and focus groups can be time-consuming and costly, so they are usually carried out with only a small subset of the user community. These traditional research strategies can be valuable in UE as they can uncover hidden issues of a more subjective nature. Holzinger (2005) states that these are deemed indirect evaluation methods since they do not study the actual user interface but only user opinions about it.
As will be discussed in further detail in Section 4.6.3 and Chapter 6, elements of traditional research approaches, such as open-ended questions in the TA cooperative evaluation, have been included in this research. However, indirect evaluation methods were not selected as the main evaluation tool because an evaluation procedure was needed that could demonstrate the process through direct contact with the artifact.