Chapter 3 Usability Literature Review

3.2 General Methods of Usability Evaluation

Usability evaluation is any analysis or empirical study of the usability of a prototype or software. The goal is to provide feedback during software development, supporting an iterative development process [85]. Nielsen [84] and Preece and Benyon [86] note that usability evaluation can be carried out at different stages of the software development lifecycle. Evaluation methods include expert evaluation, observation, survey evaluation with questionnaires and interviews, logging of actual use, and asking users for feedback. The different methods imply different types of evaluators, different numbers of users, and different types of data to be collected. A brief review of these methods is provided below.

Expert evaluation, also known as heuristic evaluation, is normally carried out by people experienced in interface design and human factors research, who are asked to describe the potential problems they foresee for less experienced users. These experts often suggest solutions for the problems they identify. This method is efficient and provides prescriptive feedback, especially in the early stages of development. However, the experts should have suitable experience and should not have been involved with previous versions of the prototype under evaluation. The role of the experts needs to be clearly defined to ensure that they adopt the proper perspective when using the prototype. The tasks undertaken and the materials given to the experts should be representative of those intended for the eventual users. Finally, the form of reporting adopted by the experts needs to be specified so that information is obtained about the most important problems [84].

Observational evaluation involves collecting data about what users do when interacting with educational software. Nielsen [84] claimed that observing eventual users working with the system is an extremely important usability method, both for task analysis and for information about true use in the field. Several data collection techniques may be used, for example video recording, which avoids interfering with the user, and taking notes while observing the user. According to Preece and Benyon [86], two broad categories of data may be obtained: first, how users tackled the given tasks, where the major difficulties lie, and what can be done about them; and second, performance measures such as frequency of correct task completion, task timing, and frequency of participant errors. Albert and Tedesco [87] used eye-tracking data to evaluate the reliability of self-reported awareness measures, asking usability participants whether they had noticed a particular element on a website or software application. They reported that self-reported awareness measures are reliable in usability testing. Usability practitioners can feel confident in collecting such measures from participants because, most of the time, when a participant reports seeing an object they actually did.
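The performance measures named above (frequency of correct task completion, task timing, and frequency of participant errors) can be derived from simple observation records. The following Python sketch is illustrative only: the `TaskAttempt` record, its field names, and the `summarise` function are assumptions introduced here and do not come from the cited sources.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """One observed attempt at a task (hypothetical record layout)."""
    participant: str
    task: str
    completed: bool   # did the participant complete the task correctly?
    seconds: float    # time taken for the attempt
    errors: int       # number of errors the observer noted


def summarise(attempts):
    """Return per-task completion rate, mean time and errors per attempt."""
    by_task = {}
    for attempt in attempts:
        by_task.setdefault(attempt.task, []).append(attempt)

    summary = {}
    for task, rows in by_task.items():
        summary[task] = {
            "completion_rate": sum(r.completed for r in rows) / len(rows),
            "mean_seconds": mean(r.seconds for r in rows),
            "errors_per_attempt": sum(r.errors for r in rows) / len(rows),
        }
    return summary


if __name__ == "__main__":
    observed = [
        TaskAttempt("p1", "open example", True, 30.0, 0),
        TaskAttempt("p2", "open example", False, 50.0, 2),
    ]
    print(summarise(observed))
```

Such figures only cover the second category of data; the first category (how users tackled the tasks and where the difficulties lie) still has to come from the observer's notes or recordings.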

Survey evaluation aims to assess users’ opinions or to understand their preferences about an existing or potential product through interviews or questionnaires. This is a useful method for studying how users use the system and what features they particularly like or dislike. From a usability perspective, questionnaires and interviews are indirect methods, since they do not study the user interface itself, but only users’ opinions about the user interface. However, they are direct methods when it comes to measuring user satisfaction [84]. Valid, standardised questionnaires for studying software usability can be used in different circumstances. For example, SUMI (the Software Usability Measurement Inventory, 50 questions) is a rigorously tested and proven method of measuring software quality from the end user's point of view [88]. Others, such as PSSUQ [89] (the Post Study System Usability Questionnaire, 19 questions), QUIS [90] (the Questionnaire for User Interaction Satisfaction, 27 questions), and SUS [91] (the System Usability Scale, 10 questions), are designed to assess user satisfaction after participation in a scenario-based usability study.

Here, a little more attention is given to the SUS questionnaire, as it is mature and widely used. It has become an industry standard, referenced in more than 1200 research publications, and has probably been used in many more evaluations that have not been published [92]. SUS was developed by Brooke in 1986 as what he called a “quick and dirty” scale, to satisfy the need for a questionnaire that is both short and reliable. The goal was to have a questionnaire that could be used immediately following a laboratory test of the usability of new software or hardware. SUS was intended to provide a measure of the user’s subjective view of the usability of a system, not to provide diagnostic information [91]. SUS is not the only questionnaire for measuring usability, but it is one of the best, with proven reliability and validity, as explained below.

SUS is short and very easy to administer to participants: only 10 questions need to be answered, which is enough to assess usability. The questionnaire is shown in Appendix 13. Analysis by Lewis and Sauro [93], confirmed by Borsci et al. [94], showed that SUS can measure two factors: usability and learnability (Questions 4 and 10).

SUS is a reliable and valid questionnaire that can be used with a small sample size. Tullis and Stetson found that at least 12-14 participants were needed to get reasonably reliable results [95]. Bangor et al. reported that, in comparison with other questionnaires, the internal reliability of SUS fell within the range between 0.89 (SUMI) and 0.96 (PSSUQ) [96].
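To make the structure of the 10-item scale concrete, the sketch below computes an overall SUS score and the two factor scores from one participant's answers. It assumes Brooke's standard scoring rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5), which is not spelled out in the text above. The rescaling of the two factors to a 0-100 range (12.5 for the two learnability items, 3.125 for the remaining eight) is likewise an assumption based on the usual treatment of the two-factor analysis, and the function name `sus_scores` is hypothetical.

```python
def sus_scores(responses):
    """Score one SUS questionnaire.

    responses: the 10 answers in questionnaire order, each an integer
    from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs exactly 10 responses, each between 1 and 5")

    # Questions 1, 3, 5, 7, 9 (even indices) are positively worded;
    # questions 2, 4, 6, 8, 10 (odd indices) are negatively worded.
    contributions = [
        (r - 1) if index % 2 == 0 else (5 - r)
        for index, r in enumerate(responses)
    ]

    learnability_items = (3, 9)  # zero-based positions of Questions 4 and 10
    learnability_sum = sum(contributions[i] for i in learnability_items)
    usability_sum = sum(contributions) - learnability_sum

    return {
        "overall": sum(contributions) * 2.5,       # 0-100
        "usability": usability_sum * 3.125,        # assumed rescaling, 0-100
        "learnability": learnability_sum * 12.5,   # assumed rescaling, 0-100
    }


if __name__ == "__main__":
    print(sus_scores([4, 2, 4, 1, 5, 2, 4, 2, 4, 3]))
```

A study would typically report the mean of the overall scores across participants; the single-participant function here is kept deliberately small.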

SUS does not measure efficiency and effectiveness directly and accurately, but it asks about users’ attitudes towards efficiency, effectiveness and satisfaction. This attitude information is extremely valuable: when users can and will say what they think, that is the first step in improving the usability of an application [97]. For evaluating the prototype of a new software application, SUS is therefore a reasonable choice.

Another particularly useful technical method is software logging. Normally, logging is used to collect information about the use of a system in the field after release, but it can also be used as a supplementary method during user testing to collect more detailed data. This technique records the interaction between the user and the software, and automatically collects statistics about the detailed use of the system. The data is collected automatically from a large number of users working under different circumstances. Because the data usually consists of a time-stamped log of user input and software responses, it is possible to reconstruct what the user was doing and the time spent on each feature, and also to analyse the frequency of use of certain features. However, a major problem with logging data is that it shows only what the users did, not why they did it. It is possible to combine logging with other methods, such as interviews, where users are shown data about their own use of the system and asked to elaborate on whatever interesting phenomena may be evident in the data. For example, a user who did not use a certain feature in a system might be asked why they did not use it [84]. The approach suggested by Nielsen is to collect statistics on low-level usage data, from which related questions can be prepared and asked in later interviews. However, measuring low-level interaction is not the target of this project. Instead, the use of logging should focus on understanding the educational implications of IWE, for example, whether the time students spend using the tool matches the teacher’s prediction.
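As an illustration of how a time-stamped log can yield the time spent on each feature and the frequency of feature use, consider the following Python sketch. The log format (an ISO 8601 timestamp followed by a feature name), the feature names, and the function `feature_usage` are all hypothetical; the time attributed to a feature is assumed to be the gap until the next logged event, so the final event contributes to the counts but not to the durations.

```python
from collections import Counter, defaultdict
from datetime import datetime


def feature_usage(log_lines):
    """Summarise a chronologically ordered log.

    Each line is assumed to look like '2024-01-01T10:00:00 feature_name'.
    Returns (use counts per feature, seconds spent per feature).
    """
    events = []
    for line in log_lines:
        stamp, feature = line.split(maxsplit=1)
        events.append((datetime.fromisoformat(stamp), feature.strip()))

    counts = Counter(feature for _, feature in events)

    seconds = defaultdict(float)
    # Pair every event with its successor; the interval between them is
    # attributed to the feature that was active at the start of the interval.
    for (start, feature), (end, _) in zip(events, events[1:]):
        seconds[feature] += (end - start).total_seconds()

    return counts, dict(seconds)


if __name__ == "__main__":
    log = [
        "2024-01-01T10:00:00 step_view",
        "2024-01-01T10:00:30 hint",
        "2024-01-01T10:00:45 step_view",
        "2024-01-01T10:01:45 exit",
    ]
    counts, seconds = feature_usage(log)
    predicted_total = 120.0  # hypothetical teacher estimate, in seconds
    print(counts, seconds, sum(seconds.values()) - predicted_total)
```

The last line of the usage example compares the total logged time with a teacher's predicted time, in the spirit of the educational question raised above; the summary still says nothing about why a student spent that time, which is where follow-up interviews come in.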

User feedback is a major source of usability information, provided that the developer collects the feedback and responds to it. Very often, feedback tends to come from dissatisfied users, who complain about a feature in the system while using it, or from the most vocal users, so it may not always be representative of the majority of users. It is therefore recommended that other users be actively sought out and observed or questioned. Nevertheless, user feedback has several advantages:

1. It is initiated by the users, so it shows their immediate and pressing concerns.

2. It is an ongoing process, so feedback will be received without any special efforts to collect it.

3. It will quickly show any changes in the users’ needs, circumstances, or opinions, since new feedback will be received whenever such changes occur. [84]

Other methods introduced by Rubin [98] can also be applied, such as experimental evaluation, focus groups, walk-throughs, and paper-and-pencil evaluation. Not all of these methods have to be used for a single usability study; normally, a combination of methods is selected according to the needs and constraints of a project [84].

3.3 Extra Work for the Usability Evaluation of Educational

In document An authoring and presentation environment for interactive worked examples (Page 73-77)