Applying Heuristic Evaluation to Human-Robot Interaction Systems

(1)

Applying Heuristic Evaluation to Human-Robot Interaction Systems

Edward Clarkson

[email protected]

I

NTRODUCTION

The attention paid to human-robot interaction (HRI) issues has grown dramatically as robotic systems have become more capable and as human contact with those systems has become more commonplace. Along with the development of robotic interfaces, there has been an increased but lagging interest in the evaluation of these systems by a variety of methods. Lacking from the HRI literature are examinations of two issues that have long been explored in traditional HCI research: formative and discount evaluation techniques (which we define and discuss in more detail later).

One technique that satisfies both of these descriptions is heuristic evaluation [19]; we propose the development a set of heuristics for use in HRI evaluation, and detail the process for its further development and validation (based on the methodology of other researchers who have adapted Nielsen’s standard heuristics to other domains [4], [17]). First, we present a brief survey of relevant HRI literature before providing an overview of heuristic evaluation and a justification for its efficacy in HRI.

H

UMAN

-R

OBOT

I

NTERACTION

S

YSTEMS

The recognition of importance of HRI has spawned an enormous variety of research into novel systems and techniques. Robots have been adapted for use in risky situations, such as urban search-and-rescue (USAR) [7]. Other canonical applications include service to the infirm [15], surgery [30], museum tours [23], and companionship and entertainment (e.g., Sony Aibo and Asimo). Most of the interfaces to these systems are through specialized software interfaces, but some (like the Aibo) are largely through the physical system itself. Some of the robotic applications in the field of affective computing share this characteristic [5], using emotional simulations to attempt more natural interactions betweens humans and robots.

These and many more systems have been cataloged in a number of extensive literature reviews. The Sandia National Laboratory released a survey of a wide range of human-machine interfaces (HMIs), focusing on robotic systems [18]. Agah performed a review of interactive intelligent systems, also characterizing each project in terms of application, research approach, system autonomy, interaction distance, and interaction media [2].

A similar, more recent study limited only to HRI interfaces was performed by [8]. The extent of HRI research is such that even large surveys have been presented on relatively narrow HRI sub-topics: Fong, Nourbakhsh and Dautenhahn conducted a review of socially interactive robots [12], and Haigh and Yanco examined the application of technology to elder care, including the use of robotic systems and human factor considerations for assistive tools.

H

UMAN

-R

OBOT

I

NTERACTION

T

AXONOMIES

Such a variety of research has spawned a need for organizational studies in the form of research taxonomies, both to categorize existing work and to guide the development of future projects. Aside from the work of Agah mentioned previously [2], there have been a number of other efforts targeted at classifying HRI systems specifically. Many efforts on classifying HRI research tend to focus on a particular aspect of the field (e.g., interaction roles).

A number of researchers have examined issues particular to multi-agent robot teams: Dudek et al. exhibit an overall classification of multi-agent systems [8]. As a part of his Ph.D. dissertation, Ali presents a task analysis of multi-agent mobile telerobotic systems [3]. Scholtz puts forward five roles for humans in HRI systems in [27], and also suggests four dimensions in which HRI is ‘fundamentally different from typical human-computer interaction.’ Yanco and Drury [32] draw from work in the Computer Supported Cooperative Work (CSCW) sub-area of HCI as the basis for a taxonomy of HRI systems (both single- and multi-agent, and with any level autonomy).

H

UMAN

-R

OBOT

I

NTERACTION

E

VALUATION

A consistent theme, however, throughout most of the literature on HRI systems, is the relative paucity of work on evaluation of those systems, especially before 2000. Georgia Tech’s Skills Impact Study notes that most researchers have contributed “lip service” to evaluation, but not many actual experimental studies [13], and [31] makes similar statements, noting that scant work has gone into assuring that HRI displays and interactions controls are at all intuitive for their users. There are researchers who have recently come to recognize the need for evaluation in general and for developing evaluation

(2)

guidelines. One of the recommendations put forth as part of a case study on USAR at the World Trade Center site [7] was for additional research in perceptual user interfaces. A recent DARPA/NSF report [6] also proposes research in evaluation methodologies and metrics as one productive direction for future research, citing the need for evaluation methods and metrics that can be used to measure the development of human-robot teams.

Among evaluation research that has been conducted, a common approach has been case studies of various systems in the field [31] [26] [23]. The number of controlled lab studies of HRI interfaces has also increased over recent years, with most studies focused on comparing various interface alternatives for teleoperation [16] [24] or robot plan specification software [10]. Various subjective and objective measures have been used in these studies. Qualitative pre- and post-test subject questionnaires, interviews and experimenter observation are common (much like many in-situ HCI studies). Quantitative, empirical measurements (in both field and lab studies) have included task completion time, error frequency, and other ad hoc gauges particular to the task environment [31]. The NASA Task Load Index (NASA-TLX) measurement scale has also been used as a quantitative measure of operator mental workload during robot teleoperation and partially autonomous operation [26] [16].

F

ORMATIVE AND

D

ISCOUNT

E

VALUATION

T

ECHNIQUES

A common thread (with a few exceptions, e.g., [23]) among HRI evaluation literature is the focus on formal summative studies and techniques. Summative evaluations that are applied on an implemented design product to judge how well it has met design goals; this is in contrast to formative evaluations, which are applied to designs or prototypes with the intention of guiding the design or implementation itself. Note that this distinction concerns the way and when a technique is applied and not the method itself. However, many evaluation techniques are obviously better suited to one application or the other. We have already noted that research is needed into techniques that can be used to judge the progress of HRI systems [6]. We can take progress in the macro sense (the advancement of HRI as a field) or micro (the development of an individual project), but both interpretations are important to HRI. In the current state of affairs, there are relatively few tools to judge HRI systems adequately. We can make a few comments about the current situation: the focus on summative applications seems to come at the expense of formative evaluations in

the development of most HRI systems (along with other principles of user-centered design [1]). Also, a contributing factor to this paucity is a lack of evaluation methods that are both suited to formative studies and have been successfully demonstrated on HRI applications specifically.

Discount evaluation techniques are methods that are designed explicitly for low cost in terms of manpower and time. Because of these properties, discount evaluations are often applied formatively. One such approach whose popularity has grown rapidly in the HCI community is heuristic evaluation (HE)1. HE is a type of usability inspection method, which is a class of techniques involving evaluators examining an interface with the purpose of identifying usability problems. This class of methods has the advantage of being applicable to a wide range of prototypes, from detailed design specifications to fully functioning systems. HE was developed by Nielsen and Molich [19] and has been empirically validated [20]. In accordance with its discount label, it requires only a few (three to five) evaluators who are not necessarily HCI (or HRI) experts (though it is more effective with training).

The principle behind HE is that individual inspectors of a system do a relatively poor job, finding a fairly small percentage of the total number of known usability problems. However, Nielsen has shown that there is a wide variance in the problems found between evaluators, which means the results of a small group of evaluators can be aggregated to uncover a much larger number of bugs [19]. Briefly, the HE process in particular consists of the following steps [21]:

• The group that desires a heuristic evaluation (such as a design team) performs preparatory work, generally in the form of:

1. Creation of problem report templates for use by the evaluators.

2. Customization of heuristics to the specific interface being evaluated. Depending on what kind of information the design team is trying to gain, only certain heuristics may be relevant to that goal. In addition, since canonical heuristics are (intentionally) generalized, heuristic descriptions given to the evaluators can include references and examples taken from the system in question.

• Assemble a small group of evaluators (Nielsen recommends three to five) to perform the HE. These evaluators do not need any domain knowledge of usability or interface design.

1

Extensive notes on heuristic evaluation are available on:

(3)

• Each evaluator independently assesses the system in question and judges its compliance with a set of usability guidelines (the heuristics) provided for them.2

• After the results of each assessment have been recorded, the evaluators can convene to aggregate their results and assign severity ratings to the various usability issues. Alternatively, an observer (the experimenter organizing and conducting the evaluation) can perform the aggregation and rating assignment.

HE has been shown to find 40 – 60% of usability problems with just three to five evaluators (hence Nielsen’s recommendation), and a case study showed a cost to benefit ratio of 1:48 (as cited in [21]). For those reasons, among others, HE has proven to be very popular in both industry and research. Of course, the results of an HE are highly subjective and probably not repeatable. However, it should be emphasized that this is not a goal for HE; its purpose is to provide a proven evaluation framework that is easy to teach, learn and perform. Its value thus comes from its low cost, its applicability early in the design process, and the fact that even those with little usability experience can administer a useful HE study. These features, which make it such a popular HCI method, also make it useful for gauging HRI systems.

H

EURISTIC

E

VALUATION AS

A

PPLIED TO

HRI

The problem with applying heuristic evaluation to HRI, however, is the validity of using existing heuristics for HRI. Scholtz states that HRI is “fundamentally different” from normal HCI in several aspects, and Yanco, et al. acknowledge HE as a useful HCI method, but rejects its applicability to HRI because Nielsen’s heuristics are not appropriate to the domain. There are many issues listed as differentiating factors between HRI and HCI/HMI, including complex control systems, the existence of autonomy and cognition, dynamic operating environments, varied interaction roles, multi-agent and multi-operator schemes, and the embodied nature of HRI systems. Despite these domain variations, the next obvious question is whether it is possible to form a new set of heuristics that are pertinent to HRI systems?

If we rely on the HCI literature, the answer is ‘yes’. Alternative sets of heuristics have already been developed for domains outside the traditional on-the-desktop software sphere. Confronted with similar problems— Nielsen’s heuristics did not address the focus of CSCW applications (teamwork)—[4] adapted heuristics for use

2

See Appendix A for a list of Nielsen’s standard heuristics.

with the evaluation of groupware applications. Similarly, [17] produced a heuristic list for ambient displays.3 In each case, the new heuristic lists were developed and validated according using similar methodologies, each based in part on Nielsen’s original work.4

The development of the CSCW guidelines was a lengthy process, begun in 1999 [14] and continued until 2002 [4]. Initially, the CSCW heuristics were based on the Locales framework of social interaction. Modification to the list continued via case studies and the use of the mechanics of collaboration framework, and was studied empirically in [4]. Their study involved a large number (27) novice and expert inspectors using their heuristics to evaluate two groupware applications. Their results were similar to that of Nielsen’s: an average group of three to five inspectors from their overall evaluators found between 40% and 60% of known usability problems, the consensus benchmark for an effective heuristic set.

One interesting finding noted concerns the issue of subject incentive: their analysis found that the novice inspectors outperformed the experts! However, they note their methodology is likely to blame. Their novice evaluators were undergraduate students who were assigned the usability inspection as a class project, while their experts participated purely on a volunteer basis and in their spare time. A survey subsequent to the main study showed that that, not surprisingly, the experts spent considerably less time and effort than the novices. Though this miscalculation in their process diluted the overall results, it does serve as a valuable reminder of the importance of subject motivation and the role incentives play in it.

Mankoff [17] pursued a slightly different approach, but still in keeping with previous work. The ambient guidelines were developed more rapidly, and based directly on the standard list from Nielsen. The list underwent a short cycle of modification based on informal surveys and pilot studies. Sixteen subjects were then recruited to perform an HE of two peripheral displays. Eight inspectors used the ambient heuristics and eight Nielsen’s in a between-subjects comparison, which showed that the ambient heuristics found more and more severe problems than the standard set. After the study it was noted that Nielsen’s heuristics did find some problems never identified in the ambient set. To remedy that, the authors repeatedly selected the heuristic (chosen from among both sets) that accounted for the largest

3

Ambient displays, as defined in [16]: “…aesthetically pleasing displays of information which sit on the periphery of a user’s attention.”

4

See Appendices B and C for the ambient and CSCW heuristic sets, respectively.

(4)

number of severe problems until none were left. This process formed their final list of ambient heuristics, which included nearly all of the ambient set and half of Nielsen’s.

The problem domains for each of these adaptations are also noteworthy. CSCW applications are entirely concerned with facilitating teamwork, organizing group behavior and knowledge, and providing support for simultaneous interaction—all issues that are similar to those that separate HCI from HRI in multi-agent and multi-operator settings. Likewise, ambient devices strive to convey information without interrupting the users attention, and do so through a variety of software or hardware form factors; evaluations of HRI systems face similar variability in trying to judge how well a system maintains operator awareness of sensor data or determine the effects of a robot’s physical appearance.

HRI

H

EURISTICS

There are a number of bases from which to develop for potential HRI heuristics: Nielsen’s canonical list [21], HRI guidelines suggested by Scholtz [28], and elements of the ambient and CSCW heuristics as well [4] [17]. Scholtz’s issues in particular are almost directly applicable as heuristics, although they do not seem to be proposed for that purpose. They have been used as “high-level evaluation criteria,” however, in a manner that bears some similarity to heuristic evaluation [31]. Sheridan’s challenges for the human-robot communication [29] can also be considered issues to be satisfied by an HRI system.

These lists and the overall body of work in HRI provide the basis for heuristics applicable to both multi-operator and multi-agent systems; however, for this work we have limited our focus to single operator, single agent settings. The reason is in part to narrow the problem focus and in part because the issues faced by systems where the human:robot ratio is not 1:1 are to a large degree orthogonal to the base 1:1 case. In addition, the validation of single human/robot systems is a wise first step since lessons learned during that work can be applied to further development.

Our initial list is based on the distinctive characteristics of HRI, and should ideally be pertinent to systems ranging from normal windowed software applications to less traditional interfaces such as that of the Sony entertainment robots or the iRobot Roomba vacuum cleaner. Our list should also apply equally to purely teleoperated as to monitored autonomous systems, and everything in between. This is a feasible goal, as in someways, what makes a robotic interface (no matter what the form) effective is no different than what makes

anything else usable, be it a door handle or a piece of software [22].

To accomplish these goals, we can identify issues relevant to the user that are common to all of these situations. Norman emphasizes a device’s ability to communicate its state to the user as an important characteristic [22]. Applied to a robot, an interface then should make evident various aspects of the robots status—what is its pose? What is its current task or goal? What does it know about its environment, and what does it not know? Parallel to the issue of what information should be communicated is how it is communicated. The potential complexity of robotic sensor data or behavior parameters is such that careful attention is due to what exactly the user needs out of that data, and designing an interface to communicate that data in the most useful format.

Many of these questions have been considered as a part of the heuristic sets mentioned previously, and we leverage that experience by taking elements in whole and part from those lists to form our own attempt at an HRI heuristic set. The original source or sources of each heuristic before adaptation are also listed:

1. Sufficient information design (Scholtz, Nielsen) The interface should be designed to convey “just enough” information: enough so that the human can determine if intervention is needed, and not so much that it causes overload.

2.

Visibility of system status (Nielsen)

The system should always keep users informed about what is going on, through appropriate feedback within reasonable time. The system should convey its world model to the user so that the user has a full understanding of the world as it appears to the system.

3.

Appropriate information presentation (Scholtz) The interface should present sensor information that is clear, easily understood, and in the form most useful to the user. The system should utilize the principle of recognition over recall.

4.

Match between system and the real world

(Nielsen, Scholtz)

The language of the interaction between the user and the system should be in terms of words, phrases and concepts familiar to the user, rather than system-oriented terms. Follow real-world conventions, making information appear in a natural and logical order.

5.

Synthesis of system and interface (None)

The interface and system should blend together so that the interface is an extension of the system itself. The interface should facilitate efficient and effective communication between system and user and vice versa.

(5)

6.

Help users recognize, diagnose, and recover from errors (Nielsen, Scholtz)

System malfunctions should be expressed in plain language (no codes), precisely indicate the problem, and constructively suggest a solution. The system should present enough information about the task environment so that the user can determine if some aspect of the world has contributed to the problem.

7.

Flexibility of interaction architecture (Scholtz) If the system will be used over a lengthy period of time, the interface should support the evolution of system capabilities, such as sensor and actuator capacity, behavior changes and physical alteration.

8.

Aesthetic and minimalist design (Nielsen, Mankoff) The system should not contain information that is irrelevant or rarely needed. The physical embodiment of the system should be pleasing in its intended setting.

Heuristics 1, 2, and 3 all deal with the handling of information in an HRI interface, representative of the importance of data processing in HRI tasks. Eight signifies the potential importance of emotional responses to robotic systems. Number five is indicative of interfaces ability to immerse the user in the system, making operation easier and more intuitive. Four, five and six all deal with the form communication takes between the user and system and vice versa. Finally, #7 reflects the variable nature that HRI platforms can have. Since the heuristics are intended for HRI systems, they focus only on the characteristics distinct to HRI. Many human-robot interfaces (especially those that are for the most part traditional desktop software applications) can and do have usability problems that are associated with ‘normal’ HCI issues (e.g., widget size or placement), but these problems can be addressed by traditional HCI evaluations.

HRI

H

EURISTIC

D

EVELOPMENT AND

V

ALIDATION

These heuristics are only a small step towards a mature HRI heuristic evaluation framework. Following the methodology similar to [4] [14] and [17] (which was based in turn on Nielsen’s method for creating his initial list), we envision the following steps:

• Create an initial list of HRI heuristics via brainstorming and synthesizing existing lists of potentially applicable heuristics (accomplished above).

• Modify the initial list based on pilot studies, consultation with other domain experts, and other informal techniques.

• Validate the modified list against existing HRI systems, hypothesizing that a small number of evaluators using the heuristics will uncover a large percentage of known usability problems, and/or that evaluators uncover more usability problem using the HRI heuristics than another set of heuristics (such as Nielsen’s). In addition, we might compare the performance of inspectors who are robotics experts but HCI novices against HCI experts who are robotics novices.

With these steps in mind, we should emphasize again that the heuristics listed above will almost certainly change— perhaps substantially—over the course of development by addition, subtraction, or modification (or all three), just as other heuristic sets have over their lifetimes. Part of their importance to the field is simply that heuristics have been proposed at all.

C

ONCLUSIONS AND

F

UTURE

W

ORK

We have summarized current and past work in HRI system development and evaluation, and examined heuristic evaluation (HE), a usability inspection method from HCI. We have found that there is indeed a need for formative and discount evaluation techniques such as HE, but that current heuristics sets are not adequate for application to HRI systems. However, previous work has validated the concept of adapting HE to new problem domains, and those problem domains share some of the differences between traditional HCI and HRI system— indicating that adapting a set of heuristics for HRI is a fruitful endeavor. To that end, we have proposed an initial set of heuristics intended for single operator, single agent human-robot interaction systems. We have ignored other task environments to simplify the development process, since the validation and focus of heuristics for multi-operator or multi-agent systems are significantly different than the base case.

Going forward, we have also outlined a basic procedure for empirically validating the HRI guidelines we have suggested. The experience from that validation can instruct further research, especially the potential development of heuristics for agent and multi-human settings. Looking at the most immediate step in the process, a pilot case study of an HE of an HRI system using the proposed heuristics would be an appropriate step towards further examination and refinement of the set. Other possible future investigations might include the use of different heuristic sets for each of Scholtz’s interaction roles [27]. As a larger goal, the adaptation of

(6)

this HCI evaluation technique will hopefully inform the modification of other HCI methods to the HRI domain.

ACKNOWLEDGEMENTS

Thanks to Yoichiro Endo for his draft review of this report.

R

EFERENCES

[1] Adams, J. Critical Considerations for Human-Robot Interface Development. In Proceedings of the 2002 AAAI Fall Symposium on Human-Robot Interaction, November 2002.

[2] Agah, A. Human interactions with intelligent systems: research taxonomy. Computers and Electrical Engineering, vol. 27, pp. 71 – 107, 2001.

[3] Ali, K. Multiagent Telerobotics: Matching Systems to Tasks. Ph.D. dissertation, Georgia Tech College of Computing, 1999.

[4] Baker, K., Greenberg, S. and Gutwin, C. Empirical development of a heuristic evaluation methodology for shared workspace groupware. In

Proceedings of CSCW 2002, pp. 96–105. ACM Press, 2002.

[5] Breazeal (Ferrell), C. A Motivational System for Regulating Human-Robot Interaction. In Proceedings of AAAI98, Madison, WI, 1998 [6] Burke, J., Murphy, R.R., Rogers, E., Scholtz, J., and Lumelsky, V. Final Report for the DARPA/NSF Interdisciplinary Study on Human-Robot Interaction. Also appears as IEEE Systems, Man and Cybernetics Part A, Vol. 34, No. 2, May 2004.

[7] Casper, J., and Murphy, R. Human-Robot Interactions during the Robot-Assisted Urban Search and Rescue Response at the World Trade Center. IEEE Transactions on Systems, Man and Cybernetics Part B , Vol. 33, No. 3, pp. 367-385, June 2003.

[8] Collet, T. and MacDonald, B. User Oriented Interfaces for Human-Robot Interation.

[9] Dudek, G., Jenkin, M., Milios, E. and Wiles, D. A Taxonomy for Multi-Agent Robotics. Autonomous Robots Vol. 3, 375-397, 1996. [10] Endo, Y., MacKenzie, D.C., and Arkin, R. Usability Evaluation of High-Level User Assistance for Robot Mission Specification. IEEE Transactions on Systems, Man, and Cybernetics - Part C., Vol. 34., No. 2, May 2004 (In Press).

[11] Fong, T., Thorpe, C., and Baur, C. Collaboration, dialogue and human-robot interaction. In Proceedings of the 10th International Symposium on Robotics Research, Lorne, Victoria, Australia. Springer, 2001.

[12] Fong, T., Nourbakhsh, I. and Dautenhahn, K. A survey of socially interactive robots. Special issue on Socially Interactive Robots, Robotics and Autonomous Systems, 42 (3-4), 2002.

[13] Georgia Tech College of Computing and Georgia Tech Research Institute. Real-time Cooperative Behavior for Tactical Mobile Robot Teams; Skills Impact Study for Tactical Mobile Robot Operational Units. DARPA Study, 2000.

[14] Greenberg, S., Fitzpatrick, G., Gutwin, C. & Kaplan, S. (1999). Adapting the Locales framework for heuristic evaluation of groupware. In Proceedings OZCHI’99, 28-30, 1999.

[15] Haigh, K. and Yanco, H. Automation as Caregiver: A Survey of Issues and Technologies. In Proceedings of the AAAI-2002 Workshop on

Automation as Caregiver: The Role of Intelligent Technology in Elder Care, Edmonton, Alberta, 2002.

[16] Johnson, C., Adams, J. and Kawamura, K. Evaluation of an Enhanced Human-Robot Interface. In Proceedings of the 2003 IEEE International Conference on Systems, Man, and Cybernetic, 2003. [17] Mankoff, J., Dey, A.K., Hsieh, G., Kientz, J., Ames, M., Lederer, S. Heuristic evaluation of ambient displays. CHI 2003, ACM Conference on Human Factors in Computing Systems, CHI Letters 5(1): 169-176, 2003.

[18] McDonald, M. Active Research Topics in Human Machine Interfaces. Sandia National Laboratory Report, December 2000. [19] Molich, R., and Nielsen, J. Improving a human-computer dialogue.

Communications of the ACM 33, 3 (March), 338-348, 1990.

[10] Nielsen, J. Enhancing the explanatory power of usability heuristics. In Proceedings of the CHI 94 Conference on Human Factors in Computing Systems, Boston, MA, 1994.

[21] Nielsen, J. How to Conduct a Heuristic Evaluation.

http://www.useit.com/papers/heuristic/heuristic_evaluation.html.

[22] Norman, D. The Design of Everyday Things. Doubleday, New York, 1990.

[23] Nourbakhsh, I., Bobenage, J., Grange, S., Lutz, R., Meyer, R. and Soto, A. An affective mobile robot educator with a full-time job,

Artificial Intelligence 114 (1–2), 95–124, 1999.

[24] Olivares, R., C. Zhou, J. Adams, and B. Bodenheimer. Interface Evaluation for Mobile Robot Teleoperation. In Proceedings of the ACM Southeast Conference (ACMSE03), pp. 112-118, Savannah, GA, March 2003.

[25] Olsen, D. and Goodrich, M. Metrics for Evaluating Human-Robot Interactions. In Proceedings of PERMIS 2003, September 2003. [26] Schipani, S. An Evaluation of Operator Workload, During Partially-Autonomous Vehicle Operation.

[27] Scholtz, J. Theory and Evaluation of Human-Robot Interaction. In

Proceedings of the Hawaii International Conference on System Science

(HICSS). Waikoloa, Hawaii, 2003.

[28] Scholtz, J. Evaluation methods for human-system performance of intelligent systems. In Proceedings of the 2002 Performance Metrics for Intelligent Systems (PerMIS) Workshop, 2002.

[29] Sheridan, T. “Eight ultimate challenges of human-robot communication,” in Proc. IEEE Int. Workshop Robot-Human Interactive Communication (RO-MAN), pp. 9–14, 1997.

[30] Taylor, R., Jensen, P., Whitcomb, L., Barnes, A., Kumar, R., Stoianovici, D., Gupta, P., Wang, Z., deJuan, E. and Kavoussi, L. A Steady-Hand Robotic System for Microsurgical Augmentation.

International Journal of Robotics Research, 18(12):1201-1210 December 1999.

[31] Yanco, H., Drury, J. and Scholtz, J. Beyond Usability Evaluation: Analysis of Human-Robot Interaction at a Major Robotics Competition. Accepted for publication in the Journal of Human-Computer Interaction, to appear 2004.

[32] Yanco, H. and Drury, J. A Taxonomy for Human-Robot Interaction. In Proceedings of the AAAI Fall Symposium on Human-Robot Interaction, AAAI Technical Report FS-02-03, pp. 111-119, 2002.

(7)

A

PPENDIX

A

Nielsen’s Canonical Heuristics [21]

Visibility of system status

The system should always keep users informed about what is going on, through appropriate

feedback within reasonable time.

Match between system and the real world

The system should speak the users' language, with words, phrases and concepts familiar to

the user, rather than system-oriented terms. Follow real-world conventions, making

information appear in a natural and logical order.

User control and freedom

Users often choose system functions by mistake and will need a clearly marked "emergency

exit" to leave the unwanted state without having to go through an extended dialogue. Support

undo and redo.

Consistency and standards

Users should not have to wonder whether different words, situations, or actions mean the

same thing. Follow platform conventions.

Error prevention

Even better than good error messages is a careful design which prevents a problem from

occurring in the first place.

Recognition rather than recall

Make objects, actions, and options visible. The user should not have to remember

information from one part of the dialogue to another. Instructions for use of the system

should be visible or easily retrievable whenever appropriate.

Flexibility and efficiency of use

Accelerators -- unseen by the novice user -- may often speed up the interaction for the expert

user such that the system can cater to both inexperienced and experienced users. Allow users

to tailor frequent actions.

Aesthetic and minimalist design

Dialogues should not contain information which is irrelevant or rarely needed. Every extra

unit of information in a dialogue competes with the relevant units of information and

diminishes their relative visibility.

Help users recognize, diagnose, and recover from errors

Error messages should be expressed in plain language (no codes), precisely indicate the

problem, and constructively suggest a solution.

Help and documentation

Even though it is better if the system can be used without documentation, it may be necessary

to provide help and documentation. Any such information should be easy to search, focused

on the user's task, list concrete steps to be carried out, and not be too large.

(8)

A

PPENDIX

B

Ambient Display Heuristics [17]

Sufficient information design

The display should be designed to convey “just enough” information. Too much information

cramps the display, and too little makes the display less useful.

Consistent and intuitive mapping

Ambient displays should add minimal cognitive load. Cognitive load may be higher when

users must remember what states or changes in the display mean. The display should be

intuitive.

Match between system and real world

The system should speak the users’ language, with words, phrases and concepts familiar to

the user, rather than system oriented terms. Follow real-world conventions, making

information appear in a natural and logical order.

Visibility of state

An ambient display should make the states of the system noticeable. The transition from one

state to another should be easily perceptible. Aesthetic and pleasing design The display

should be pleasing when it is placed in the intended setting.

Useful and relevant information

The information should be useful and relevant to the users in the intended setting.

Visibility of system status

The system should always keep users informed about what is going on, through appropriate

feedback within reasonable time.

User control and freedom

Users often choose system functions by mistake and will need a clearly marked “emergency

exit” to leave the unwanted state without having to go through an extended dialogue. Support

undo and redo.

Easy transition to more in-depth information

If the display offers multi-leveled information, the display should make it easy and quick for

users to find

out more detailed information.

“Peripherality” of display

The display should be unobtrusive and remain so unless it requires the user’s attention. User

should be able to easily monitor the display.

Error prevention

Even better than good error messages is a careful design which prevents a problem from

occurring in the first place.

Flexibility and efficiency of use

Accelerators – unseen by the novice user – may often speed up the interaction for the expert

user such that the system can cater to both inexperienced and experienced users. Allow users

to tailor frequent actions.

(9)

A

PPENDIX

C

CSCW/Groupware Heuristics [4]

Provide the means for intentional and appropriate verbal communication

The prevalent form of communication in most groups is verbal conversations. The mechanism by which we gather information from verbal exchanges has been coined intentional communication and is used to establish a common understanding of the task at hand. Intentional communication usually happens in one of three ways.

1. People talk explicitly about what they are doing and where they are working within a shared workspace. 2. People overhear others’ conversations.

3. People listen to the running commentary that others tend to produce alongside their actions.

Provide the means for intentional and appropriate gestural communication

Explicit gestures are used alongside verbal exchanges to carry out intentional communication as a means to directly support the

conversation and convey task information. Intentional gestural communication can take many forms. Illustration occurs when speech is acted out or emphasized, e.g., signifying distances with a gap between your hands. Emblems occur when actions replace words, such as a nod of the head indicating ‘yes’. Deictic reference or deixis happens when people reference workspace objects with a combination of intentional gestures and voice communication, e.g., pointing to an object and saying “this one”.

Provide consequential communication of an individual’s embodiment

A person’s body interacting with a physical workspace is a continuous and immediate information source with many degrees of freedom. In these settings, bodily actions unintentionally “give off” awareness information about what’s going on, who is in the workspace, where they are, and what they are doing. This visible activity of unintentional body language and actions is fundamental for creating and sustaining teamwork, and includes:

1. Actions coupled with the workspace include: gaze awareness (where someone is looking), seeing someone move towards an object,

and hearing sounds as people go about their activities.

2. Actions coupled to conversation are the subtle cues picked up from our conversational partners that help us continually adjust our

verbal behaviour. Cues may be visual (e.g., facial expressions), or verbal (e.g., intonation and pauses). These provide conversational awareness that helps us maintain a sense of what is happening in a conversation.

Provide consequential communication of shared artifacts (i.e. artifact feedthrough)

Consequential communication also involves information unintentionally given off by physical artifacts as they are manipulated. This information is called feedback when it informs the person manipulating the artifact, and feedthrough when it informs others who are watching. Seeing and hearing an artifact as it is being handled helps to determine what others are doing to it. Identifying the person manipulating the artifact helps to make sense of the action and to mediate interactions.

Provide Protection

Concurrent activity is common in shared workspaces, where people can act in parallel and simultaneously manipulate shared objects. Concurrent access of this nature is beneficial; however, it also introduces the potential for conflict. People should be protected from inadvertently interfering with work that others are doing, or altering or destroying work that others have done. To avoid conflicts, people naturally anticipate each other’s actions and take action based on their predictions of what others will do in the future. Therefore, collaborators must be able to keep an eye on their own work, noticing what effects others’ actions could have and taking actions to prevent certain activities. People also follow social protocols for mediating their interactions and minimizing conflicts. Of course, there are situations where conflict can occur (e.g., accidental interference). People are also capable of repairing the negative effects of conflicts and consider it part of the natural dialog.

Manage the transitions between tightly and loosely-coupled collaboration

Coupling is the degree to which people are working together. It is also the amount of work that one person can do before they require

discussion, instruction, information, or consultation with another. People continually shift back and forth between loosely- and

tightly-coupled collaboration where they move fluidly between individual and group work. To manage these transitions, people need to

maintain awareness of others: where they are working and what they are doing. This allows people to recognize when tighter coupling could be appropriate e.g., when people see an opportunity to collaborate or assist others, when they need to plan their next activity, or when they have reached a stage in their task that requires another’s involvement.

Support people with the coordination of their actions

An integral part of face-to-face collaboration is how group members mediate their interactions by taking turns and negotiating the sharing of the common workspace. People organize their actions to help avoid conflict and efficiently complete the task at hand. Coordinating actions involves making some tasks happen in the right order and at the right time while meeting the task’s constraints. Within a shared workspace, coordination can be accomplished via explicit communication and the way objects are shared. At the fine-grained level, awareness helps coordinate people’s actions as they work with shared objects e.g., awareness helps people work effectively even when collaborating over a very small workspace. On a larger scale, groups regularly reorganize their division of labour based on what the other participants are doing and have done, what they are still going to do, and what is left to do in the task.

Facilitate finding collaborators and establishing contact

Most meetings are informal: unscheduled, spontaneous or one-person initiated. In everyday life, these meetings are facilitated by physical proximity since co-located individuals can maintain awareness of who is around. People frequently come in contact with one another through casual interactions (e.g. bumping into people in hallways) and are able to initiate and conduct conversations

effortlessly. While conversations may not be lengthy, much can occur such as coordinating actions and exchanging information. In electronic communities, the lack of physical proximity means that other mechanisms are necessary to support awareness and informal encounters.