4.2 Guidelines to Support the Listening Worker
4.2.3 General Behavior
In general, the display should adhere to protocols of communication so that interaction with the user is efficient and satisfying. The display should ground the interaction by confirming user actions with audible feedback and by requesting more information when commands are vague or incomplete. For repetitive acknowledgments of user actions, views should prefer short sounds over more time-consuming speech confirmations. Views might also manipulate sound parameters such as consonance to convey additional information, such as whether a user action completed successfully (Hankinson and Edwards, 1999). On the other hand, views should use speech when the user requires a more specific report of the last action performed or when the display must request further information. The detail of the feedback should help a user set his or her grounding criteria for the interaction. For instance, giving a large amount of spoken feedback on each user command can help the user realize that the interaction is not proceeding as smoothly as it could (Table 3.4-16).
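The feedback policy above can be sketched in Python, the language of the Clique prototype. This is a minimal illustration under stated assumptions, not Clique's implementation: the names `Tone`, `Speech`, and `acknowledge` are hypothetical, sounds are modeled as data rather than played, and the choice of a just-tuned major third for success and a minor second for failure is assumed only to show consonance carrying the outcome.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Tone:
    """Short non-speech acknowledgment: two simultaneous pitches (Hz)."""
    frequencies: Tuple[float, float]
    duration_ms: int


@dataclass(frozen=True)
class Speech:
    """Spoken report destined for a speech synthesizer (hypothetical)."""
    text: str


BASE_HZ = 440.0
# Just-intonation ratios; which intervals to use is an illustrative assumption.
CONSONANT = 5 / 4    # major third signals success
DISSONANT = 16 / 15  # minor second signals failure


def acknowledge(action: str, succeeded: bool, detail_requested: bool,
                reason: Optional[str] = None) -> Union[Tone, Speech]:
    """Choose feedback for a completed user action."""
    if detail_requested:
        # The user wants a specific report, so spend the extra time on speech.
        outcome = "succeeded" if succeeded else f"failed: {reason or 'unknown error'}"
        return Speech(f"{action} {outcome}")
    # Routine acknowledgment: a brief dyad whose consonance encodes the outcome.
    ratio = CONSONANT if succeeded else DISSONANT
    return Tone((BASE_HZ, BASE_HZ * ratio), duration_ms=150)
```

For example, `acknowledge("save", True, False)` yields a 150 ms tone pairing 440 Hz with 550 Hz, while `acknowledge("save", False, True, "disk full")` yields the spoken report "save failed: disk full".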
Using ecological sounds should help a display honor the maxims of quality and manner (Table 3.4-20). Everyday listening occurs in environments with rich sound spectra, not pure tones. Mimicking real-world sound sources should improve user acceptance of a display over less natural alternatives. Mixing a variety of musical, environmental, and speech sounds might also hold user interest and reduce frustration, annoyance, and fatigue over extended periods of time (Mereu and Kazman, 1997).
The decision whether to use speech or non-speech sounds relates closely to the maxims of quantity and the appropriate level of detail (Table 3.4-17). When the user requires an exacting description, the display should rely on speech. The price of using speech is that it takes longer to output and is prone to confusion with other concurrent speech streams. When the display must report information quickly or unobtrusively, it should use non-speech sounds. The difficulty with non-speech sound lies in knowing how to map information effectively onto the various available acoustic properties. However, some general guidelines exist for mapping certain classes of information to sound.
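The trade-off between speech and non-speech output might be expressed as a small decision function. The function `choose_modality` and its inputs are hypothetical. In particular, the rule for the case where neither an exact description nor a quick report is required (prefer non-speech sound while another speech stream is active) is an assumption of this sketch, motivated by the risk of confusing concurrent speech, rather than a stated guideline.

```python
from enum import Enum


class Modality(Enum):
    SPEECH = "speech"
    NON_SPEECH = "non-speech"


def choose_modality(needs_exact_description: bool,
                    must_be_quick: bool,
                    concurrent_speech_streams: int) -> Modality:
    """Pick an output modality following the quantity/detail trade-off."""
    if needs_exact_description:
        # Only speech can carry an exacting description.
        return Modality.SPEECH
    if must_be_quick:
        # Non-speech sounds are shorter and less obtrusive.
        return Modality.NON_SPEECH
    # Assumed tie-breaker: avoid adding a speech stream that could be
    # confused with speech already playing.
    if concurrent_speech_streams > 0:
        return Modality.NON_SPEECH
    return Modality.SPEECH
```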
Representing Information in Sound
When expressing boolean information, the display should use auditory icons. For instance, the primary view can play a crashing sound while reading a sentence to indicate a misspelling; the absence of the crashing sound indicates that no misspelling exists. The downside to such natural sounds is that relationships between real-world events and computing concepts do not always exist (Mynatt, 1997; Carroll et al., 1997), so the user must take time to learn arbitrary mappings. However, this requirement is not unique to audio: the meanings of many graphical indicators must also be learned. The red underline indicating that a word is misspelled is just one example.
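A sketch of the misspelling example follows, assuming the view reads a sentence as a sequence of speech and sound events. The event format, the `read_sentence` function, the `crash.wav` file name, and the set-based dictionary lookup are all hypothetical stand-ins for a real speech engine and spell checker.

```python
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str]  # (channel, payload): "speak" or "icon"

CRASH = "crash.wav"  # hypothetical sound file for the misspelling icon


def read_sentence(words: Iterable[str], dictionary: Set[str]) -> List[Event]:
    """Speak each word, adding an auditory icon for misspelled ones."""
    events: List[Event] = []
    for word in words:
        # Normalize case and trailing punctuation before the lookup.
        if word.lower().strip(".,;:!?") not in dictionary:
            # Boolean "misspelled" is true: play the icon with the word.
            events.append(("icon", CRASH))
        # No icon means the word is spelled correctly.
        events.append(("speak", word))
    return events
```

Because the icon is either present or absent, a listener needs no scale or comparison to decode it, which is what makes an auditory icon a natural fit for a single boolean.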
The display should represent nominal data using earcons. Short musical motives related by timbre, melody, rhythm, and octave can name items in a set. For example, earcon rhythm can distinguish task start, resumption, completion, and cancellation, while timbre can identify the type of task in question (Brewster et al., 1993). The drawback to using earcons is that the user must learn how their properties name concepts. In the previous example, the mapping from timbre to task type is arbitrary and requires that the user discover and memorize this association.
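The task earcons described above can be sketched as two independent lookup tables, one for rhythm and one for timbre, composed into a single motive. The rhythm patterns, instrument names, task types, and the `earcon_for` helper are hypothetical; a real design would choose values that listeners can reliably tell apart.

```python
from dataclasses import dataclass
from typing import Tuple

# Rhythm (note durations in ms) names the task event; patterns are illustrative.
RHYTHMS = {
    "start":    (100, 100, 300),
    "resume":   (100, 300),
    "complete": (300, 100, 100),
    "cancel":   (300, 300),
}

# Timbre names the task type; instruments and task types are hypothetical.
TIMBRES = {
    "print": "marimba",
    "download": "flute",
    "spellcheck": "piano",
}


@dataclass(frozen=True)
class Earcon:
    timbre: str
    rhythm: Tuple[int, ...]


def earcon_for(task_type: str, event: str) -> Earcon:
    """Compose an earcon from two independent dimensions."""
    return Earcon(timbre=TIMBRES[task_type], rhythm=RHYTHMS[event])
```

Keeping the dimensions independent means three timbres and four rhythms name twelve distinct task/event pairs, but the user still has to memorize both arbitrary tables.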
The display should convey ordinal relationships using relative differences in basic acoustic properties of sounds. Encoding concepts such as more/less or longer/shorter in the pitch, tempo, or duration of two sounds is possible when the user needs only a rough comparison (Alty and Rigas, 1998). For example, presenting two sounds with the pitch of each mapped to the length of one of two text documents might enable a rapid determination of which document is shorter and which is longer. The display must support configuration of the polarity of the mapping, though, because sighted and visually impaired users tend to expect opposing relationships (Walker and Lane, 2001).
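The document-length comparison might be sketched as follows, with the polarity exposed as a user setting. The function name `lengths_to_pitches`, the 220 to 880 Hz range, and the geometric interpolation (chosen because perceived pitch is roughly logarithmic in frequency) are assumptions of this sketch. Which polarity a given user prefers is left to configuration rather than hard-coded.

```python
from typing import List, Sequence

LOW_HZ, HIGH_HZ = 220.0, 880.0  # illustrative two-octave range


def lengths_to_pitches(lengths: Sequence[int],
                       longer_is_higher: bool = True) -> List[float]:
    """Map document lengths to pitches for a rough ordinal comparison."""
    lo, hi = min(lengths), max(lengths)
    span = hi - lo
    pitches = []
    for n in lengths:
        # Normalized position in [0, 1]; equal lengths sit mid-range.
        t = (n - lo) / span if span else 0.5
        if not longer_is_higher:
            t = 1.0 - t  # user-configured reversed polarity
        # Interpolate geometrically, since pitch perception is roughly logarithmic.
        pitches.append(LOW_HZ * (HIGH_HZ / LOW_HZ) ** t)
    return pitches
```

With the default polarity, documents of 1200 and 300 words map to 880 Hz and 220 Hz; flipping `longer_is_higher` swaps them. Only the direction of the difference is meant to be heard, consistent with the rough comparison the guideline targets.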
Using non-speech sounds to convey interval and absolute data is difficult. Untrained listeners cannot easily judge the distance between two values of a sound parameter or assign an exact value to a single sound parameter heard in isolation (Edwards, 1988; Miller, 1956). For data in these categories, a display should speak a short summary of the information whenever speech synthesis is available.
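Gathering the mappings in this section into one table gives a rough rule for choosing a representation by data class. The `DataKind` names, the `representation_for` and `speech_summary` helpers, the summary wording, and the decision to return `None` when speech is unavailable for interval or absolute data (the guideline offers no non-speech substitute) are assumptions of this sketch.

```python
from enum import Enum
from typing import Optional


class DataKind(Enum):
    BOOLEAN = "boolean"
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    ABSOLUTE = "absolute"


# Representation recommended for each class of information.
REPRESENTATION = {
    DataKind.BOOLEAN: "auditory icon",
    DataKind.NOMINAL: "earcon",
    DataKind.ORDINAL: "relative pitch, tempo, or duration",
    DataKind.INTERVAL: "speech",
    DataKind.ABSOLUTE: "speech",
}


def representation_for(kind: DataKind,
                       speech_available: bool = True) -> Optional[str]:
    """Return the recommended representation, or None if none applies."""
    choice = REPRESENTATION[kind]
    if choice == "speech" and not speech_available:
        # No non-speech substitute is recommended; let the caller decide.
        return None
    return choice


def speech_summary(label: str, value: float, unit: str) -> str:
    """Short spoken summary for interval or absolute data (format assumed)."""
    return f"{label}: {value:,.0f} {unit}"
```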
4.3 Summary
This chapter developed the groundwork for supporting office computing tasks via an auditory display. The guidelines were based both on the abilities of the user as a listener and on the behaviors of the user as a worker. The following chapter describes an implementation of a software auditory display built from these specifications and targeted at office productivity applications.
Chapter 5
The Clique Auditory Display
This chapter describes the synthesis of the Clique software auditory display from the concepts developed in the previous three chapters. The chapter starts by approaching the design of Clique from the perspective of the user. The first few sections describe the auditory scene the user hears, how streams in that scene convey information about the user’s work, and how the user issues commands to manipulate that scene, and thus advance his or her work. A section follows explaining how the user experience satisfies the user requirements stated earlier in this work.
The middle of this chapter explains Clique in terms of the software architecture that enables the desired user experience. The corresponding sections describe the model-view-controller (MVC) paradigm used to separate application business logic from the user interface, the input/output pipeline concept employed to organize output from concurrent tasks, and the scripting framework designed to support adaptation of existing programs to the Clique auditory display as well as the creation of new audio applications. A section closing this discussion summarizes how the architecture resolves problems that arise when attempting to develop usable auditory displays.
The chapter concludes with a brief description of a prototype implementation of Clique in the Python programming language. The last sections explain the function of seven Clique scripts that adapt existing GUI applications for use in Clique and one that implements a simple application from scratch with no GUI counterpart. These applications serve as the basis for user evaluation in the next chapter.