
4.1 Human Audition

4.1.3 Conception

After sounds have been segregated into streams and placed in the auditory image store, conscious attention selects a stream for further processing using knowledge stored in long-term memory. The next section details this process, including constraints on stream selection, how selected streams are associated with learned knowledge, and how attention can affect the primitive process.

Schema-based Processing

The second phase of auditory scene analysis is a learned, top-down process in which one primitive stream in the auditory image store is selected for conscious inspection. This process occurs attentively and uses working memory to hold information about a stream of interest. Streams pulled into working memory serve both as retrieval cues for schemas (representations of knowledge stored in long-term memory) and as cues for storing new schemas (Card, 1983). Retrieved schemas influence the listener’s conception of the attended sound: a stream of tones may be recognized as the melody of a known tune, a stream of phonemes may be heard as an intelligible sentence, and so on. Schema-based processing may also influence primitive processing such that streams are formed to more readily match a recently retrieved template (Bey and McAdams, 2002), while attention to a particular stream helps maintain its segregation from other perceptual objects (Cusack et al., 2004). The diagram shown in Figure 4.7 depicts this second phase.

Figure 4.7: Sound streams produced by the primitive segregation and grouping processing are available for attentive processing. One stream is selected at a time for processing and the act of selecting can bias the primitive process. Based on a diagram from Valkenburg and Kubovy (2004).

As mentioned previously, a tenet of the theory of auditory scene analysis is that only one stream formed by the primitive process may be chosen for attentive, schema-based processing at a time. While the primitive system segregates the sound spectrum into multiple perceptual objects, the attentive system may select only one object for further processing. Thus, a listener may hear music on the car radio and his passenger talking at the same time, but may pay attention to and process only one stream at a time. The listener cannot consciously attend to and understand both streams at once. Any apparent ability to do so likely derives from a rapid switching of attention among primitive streams held in the auditory store, not from true parallel processing (Barber et al., 2003). Once selected, the stream becomes the sole figure of auditory attention while all the other streams become background (Valkenburg and Kubovy, 2004). Still, there is evidence that background speech streams may have a priming effect (an increase in the speed or accuracy of information detection) on conscious listening (Rivenez et al., 2004).

The efficiency with which a listener may select a single stream for attention depends on the similarity of that stream to the other streams in the auditory buffer (Bey and McAdams, 2002; Arons, 1992; Cherry, 1953). If primitive segregation produces two streams that share many of the same characteristics listed in Table 4.1, directing conscious attention to and fixating on one of the two streams will be difficult. Moreover, if the two streams share similar semantic content matching the same learned schema, processing of one of the two streams will be difficult in the presence of the other (Arons, 1992). For instance, if the driver tunes the car radio to a news station with an announcer who sounds strikingly similar to his chatty passenger, the driver will have trouble focusing attention solely on the passenger’s voice. The ability of the driver to understand what his passenger is saying will be further inhibited if both the radio announcer and the passenger are talking about the news. What the newscaster says about national politics will become confused with what the passenger might be saying about local politics (Wickens and Hollands, 2000).

Streams segregated by primitive analysis may be regrouped by attention (Bey and McAdams, 2002; Bregman, 1990). Conscious listening can cause separate streams to fuse into one in order to match an active template. Again, the efficiency with which streams may be fused in schema-based analysis depends on the similarity of the streams to one another and the degree to which they match the target template. For example, suppose some notes in a familiar melody are shifted an octave higher. When the melody is presented in this manner, a listener will likely hear two sound streams: one for the lower notes and one for the higher notes. If informed of the name of the melody, however, the listener will be able, with some effort, to fuse the two streams into one fitting the learned schema for that melody. Nevertheless, if the pitch-shifted portions of the melody are made even more dissimilar to the rest of the notes, for instance by transposing them by several octaves or presenting them in a different timbre, integration of the two separate streams by conscious effort will become more difficult.

In the same vein, a part of a fused stream may be “listened out” by the attentive process such that it is pulled into its own stream. This slicing operation is not the equivalent of primitive segregation, though. The listener is aware of almost none of the information left behind after the selection (Bregman, 1990).

A sudden change in a stream not only resets the streaming mechanism, but can also pull conscious attention to it (Ueda et al., 2005). For instance, if the car radio tuner suddenly breaks and starts putting out static instead of music, both the driver and passenger will likely pay immediate attention to the radio stream, regardless of whether or not their attention was previously directed elsewhere. The same behavior holds true for the sudden onset of a new auditory stream. If a police car suddenly turns on its siren nearby, both vehicle occupants will start attending to the new sound and processing its meaning. In both cases, the ability of a stream to involuntarily drive listener attention to it is a function of its similarity to other streams, the magnitude of the change, and the effort the listener has invested in listening to another stream.

The resources allocated for attentive processing of sound streams are independent of those used for primitive analysis. Investing effort in understanding a particular stream does not inhibit the streaming process from segregating the remaining sound content at the same time (Kline and Glinert, 1994; Wickens, 1991). Similarly, the resources used for processing auditory information are separate from those applied to other modalities. A listener is capable of identifying separate auditory and visual targets when they overlap in time, whereas identification of two targets is more difficult when they are presented in the same modality (Duncan et al., 1997). There also appear to be separate resource banks for verbal versus spatial processing as well as for input processing versus response preparation (Wickens, 1991).

An auditory display must be consistent in how it presents information in order to facilitate the formation and retrieval of sound schemas from long-term memory. The presentation of multiple, concurrent streams of information is possible and enables the switching of listener attention among them. Ensuring the distinction of streams in terms of the parameters discussed in the previous section improves the efficiency with which the listener may select a particular stream for conscious processing. Since the listener is limited to attending to one stream at a time within a modality, however, a display must not place critical content in separate, concurrent streams. On the other hand, the display need not account for listener attention to and responses in other modalities, as different modalities, stages of processing, and types of processing draw on different resources.
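The one-stream-at-a-time constraint can be read as a simple check on a display's output schedule. The following is only an illustrative sketch, not part of the design described here: the `Stream` record, its fields, the `critical_conflicts` helper, and the example schedule are all hypothetical names and values chosen for the illustration. It flags pairs of critical streams that overlap in time within the same modality, and it ignores overlaps across modalities because those are assumed to draw on separate resources.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Stream:
    """One stream of display output (hypothetical model for illustration)."""
    name: str
    modality: str   # e.g. "auditory" or "visual"
    start: float    # onset time in seconds
    end: float      # offset time in seconds
    critical: bool  # True if the listener must not miss this content


def overlaps(a: Stream, b: Stream) -> bool:
    """Return True if the two streams are active at the same moment."""
    return a.start < b.end and b.start < a.end


def critical_conflicts(streams: list[Stream]) -> list[tuple[str, str]]:
    """List pairs of critical streams that overlap in time within one modality."""
    return [
        (a.name, b.name)
        for a, b in combinations(streams, 2)
        if a.critical and b.critical
        and a.modality == b.modality
        and overlaps(a, b)
    ]


# Hypothetical schedule: two critical auditory streams overlap,
# while the non-critical tone and the visual text do not count as conflicts.
schedule = [
    Stream("menu speech", "auditory", 0.0, 2.0, critical=True),
    Stream("error alert", "auditory", 1.5, 2.5, critical=True),
    Stream("ambient progress tone", "auditory", 0.0, 5.0, critical=False),
    Stream("dialog text", "visual", 1.0, 3.0, critical=True),
]
conflicts = critical_conflicts(schedule)
print(conflicts)
```

Under these assumptions, a display could respond to a reported conflict by delaying one of the two critical streams until the other has finished, leaving non-critical background streams free to run concurrently.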

In document Clique : perceptually based, task oriented auditory display for GUI applications (Page 152-156)