4.3 Concurrent auditory displays
4.3.1 Concurrent earcons
In some ways, earcons and any auditory display could be perceived as music, as they require the organisation of sounds or at least the design of systems to organise sounds. For the sake of this work, earcons are differentiated from music in that they are entirely bound to their function of communication. However, as earcons are closely related to instrumental music, it would seem that they might also be suitable for concurrent presentation, provided they follow similar compositional restrictions. In Blattner et al.’s (1989) original work on the subject, the earcon cue was proposed primarily as a method for serially presenting information, but a suggestion was also made about using multiple earcons concurrently. These initial investigations ran into difficulties with the interactions between concurrent cues, a problem that was found again by McGookin & Brewster (2002). Concurrent earcons replicate ideas from polyphony in music theory, where multiple melodies are presented concurrently.
The difficulties described by McGookin & Brewster (2002) led to experimental work identifying methods for the successful presentation of concurrent earcons (McGookin & Brewster, 2003, 2004b; McGookin, 2004). This work resulted in an additional set of guidelines for the creation of concurrent earcons (McGookin & Brewster, 2004b, a). These stipulated that each concurrently presented earcon should have a different timbre, and that concurrent earcons should have inharmonic relationships, considerable onset asynchronies, and spatial separation (McGookin & Brewster, 2004b, a). Even when adhering to these guidelines, however, the number of concurrent sources was restricted, with large reductions in identification performance as the number of earcons was increased to four (McGookin & Brewster, 2004b).
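The four guidelines can be read as pairwise constraints on a set of concurrent earcons. The following Python sketch illustrates that reading only; it is not the procedure used by McGookin & Brewster. The `Earcon` fields, the numeric thresholds, and the test for a harmonic relationship (one fundamental lying close to an integer multiple of the other) are all assumptions introduced for illustration, since the guidelines as summarised here give no numeric values.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Earcon:
    """Hypothetical description of one earcon in a concurrent set."""
    name: str
    timbre: str
    fundamental_hz: float
    onset_s: float
    azimuth_deg: float


# Illustrative thresholds only; none of these values comes from the literature.
MIN_ONSET_GAP_S = 0.3
MIN_AZIMUTH_GAP_DEG = 30.0
HARMONIC_TOLERANCE = 0.03


def is_near_harmonic(f1, f2, max_multiple=4, tol=HARMONIC_TOLERANCE):
    """Crude proxy: is one fundamental close to an integer multiple of the other?"""
    ratio = max(f1, f2) / min(f1, f2)
    return any(abs(ratio - n) <= tol * n for n in range(1, max_multiple + 1))


def angular_gap(a_deg, b_deg):
    """Smallest angle between two azimuths, allowing for wrap-around at 360."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def guideline_violations(earcons):
    """Return (name, name, reason) for every pair that breaks a guideline."""
    problems = []
    for a, b in combinations(earcons, 2):
        if a.timbre == b.timbre:
            problems.append((a.name, b.name, "same timbre"))
        if is_near_harmonic(a.fundamental_hz, b.fundamental_hz):
            problems.append((a.name, b.name, "harmonic relationship"))
        if abs(a.onset_s - b.onset_s) < MIN_ONSET_GAP_S:
            problems.append((a.name, b.name, "onsets too close"))
        if angular_gap(a.azimuth_deg, b.azimuth_deg) < MIN_AZIMUTH_GAP_DEG:
            problems.append((a.name, b.name, "insufficient spatial separation"))
    return problems
```

A check of this kind covers only the design constraints. It says nothing about the limit on the number of concurrent earcons, which remained even when the guidelines were followed.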
The limited number of concurrent sources appears to be at odds with the comparatively large number of instrumental lines common in musical compositions. However, whilst in music it is often intended that multiple instrumental lines are heard as one, in order for earcons to be reliably recognised they must be perceived individually. Also, in music, information is usually conveyed by a combination of all concurrent items, but for earcons to fulfil their purpose they must succeed in conveying their individual messages. It is, therefore, unacceptable for interactions between concurrent earcons to negatively affect a user’s ability to distinguish the defining features of the constituent earcons.
As part of work into the use of design patterns to develop auditory interfaces, an interface for Microsoft Explorer was proposed which presented spoken objects and earcons within a virtual room (Putz, 2004; Frauenberger et al., 2005b, a). The menu system was displayed on one wall and a tool-bar on another. The menu was displayed using speech combined with instrumental tones, which were configured so as to increase in pitch from the first to the last item, similar to the auditory scrollbar (Yalla & Walker, 2008). It was laid out across the virtual wall so that the structure originated in the bottom left corner and spread to the top right corner. The tool-bar was represented using five earcons positioned in a row along the other wall, and additional background sounds were used to represent changes in state (such as the opening of a pop-up window). Spatialisation was performed using binaural rendering, with a head-tracker to improve localisation.
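The menu mapping described above, with pitch rising from the first to the last item and position running from the bottom left to the top right of the wall, can be sketched as follows. This is a minimal illustration rather than the implementation of Frauenberger et al.: the function name, the use of MIDI note numbers, the step of one semitone per item, and the normalised wall coordinates are all assumptions.

```python
def menu_layout(items, base_midi_note=60, wall_width=1.0, wall_height=1.0):
    """Assign each menu item a pitch and a position on the virtual wall.

    Pitch rises by one semitone per item (an assumed step size) and the
    position moves linearly from the bottom left (0, 0) to the top right
    (wall_width, wall_height) of the wall.
    """
    n = len(items)
    layout = []
    for index, label in enumerate(items):
        # Fraction of the way through the menu; a single item sits at the origin.
        t = index / (n - 1) if n > 1 else 0.0
        layout.append({
            "label": label,
            "midi_note": base_midi_note + index,
            "x": t * wall_width,
            "y": t * wall_height,
        })
    return layout
```

Coupling pitch and position in this way makes the two cues redundant, so that either can indicate where an item sits within the menu.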
While the spoken menu items were triggered individually, the tool-bar earcons were presented concurrently to the user (Frauenberger, 2013). The tool-bar display was initially silent, but as the user moved towards a wall using a joystick, five concurrent earcons were presented. This notably exceeds the four earcons at which McGookin & Brewster (2004b) found large reductions in users’ performance. The interface was evaluated by Frauenberger et al. (2005b) using two groups of participants: sighted users and users with differing degrees of visual impairment. After a short training session users responded favourably toward the system and made very few mistakes in selecting folders or files. It was found that visually-impaired users favoured the tool-bar, while sighted users used the menu and tool-bar elements more equally. It is unclear what would cause this difference between the groups. The preference indicated by participants with visual impairments, however, suggests that this configuration of earcons was relatively easy to use. Unfortunately, although this work suggests that presentations of five earcons are possible, it is difficult to isolate the conditions required to achieve this.
Figure 4.1: Representation of McGookin & Brewster’s multi-modal display showing the focus and priority zones. The shade of the zones represents the increased importance required for a source to be sonified further from the focus. (Adapted from McGookin & Brewster (2001, p. 3))
McGookin and Brewster proposed an interesting solution to the navigation of large amounts of information distributed in two dimensions with their Fishears (McGookin & Brewster, 2001) or Dolphin (McGookin & Brewster, 2002; McGookin, 2004) systems. The systems were designed to represent a map of a theme park with a large number of rides, each represented by an earcon. The system used binaural rendering to spatialise the earcons in appropriate locations. To reduce the perceptual and computational loads, not all the rides were presented concurrently. Instead, the user could move a focus zone across the map, which was represented visually on the screen of a personal digital assistant (PDA). A set of concentric priority zones was defined around the focus zone. Each zone sonified only those rides within it that were deemed above a given importance level, with zones closer to the central focus having lower thresholds (see Figure 4.1). The experiment compared route-finding performance between a scrolling visual map and this multi-modal display. No differences were uncovered between the displays in terms of performance, although subjective evaluations ascribed a higher mental workload and higher levels of frustration and annoyance to the multi-modal interface (McGookin, 2004). McGookin (2004) noted that the earcons used in this experiment are unlikely to have been optimal due to their acoustic similarities, which resulted from the common ‘grammar’ used in their construction. Furthermore, it is notable that there was no exploration of this design as a model for an entirely non-visual display, or a comparison with alternative auditory display methods.
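The selection rule behind the focus and priority zones can be illustrated with a short Python sketch. It is a simplified reading of the description above and not McGookin & Brewster's implementation. The circular zone geometry, the radii, the integer importance scale, the treatment of the focus zone as an innermost zone with a threshold of zero, and the use of "greater than or equal to" for the threshold test are all assumptions.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Ride:
    """Hypothetical map item: a position and an importance rating."""
    name: str
    x: float
    y: float
    importance: int


# (outer radius, minimum importance), ordered from the focus outwards.
# The first entry stands in for the focus zone itself. Values are illustrative.
ZONES = [(1.0, 0), (2.0, 1), (3.0, 2)]


def rides_to_sonify(focus, rides, zones=ZONES):
    """Return the names of rides that should be sonified for a given focus."""
    fx, fy = focus
    selected = []
    for ride in rides:
        distance = hypot(ride.x - fx, ride.y - fy)
        for outer_radius, min_importance in zones:
            if distance <= outer_radius:
                # The ride lies in this zone; apply that zone's threshold only.
                if ride.importance >= min_importance:
                    selected.append(ride.name)
                break
        # Rides beyond the outermost zone are never sonified.
    return selected
```

Moving the focus changes which rides fall below their zone's threshold, so the number of concurrent earcons is bounded by the zone layout rather than by the size of the map.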