

3.2 Hearing: key principles

3.2.2 Spatial auditory perception

In natural acoustic environments, sounds arrive at the ears from many different directions. Birds sing in the trees above us, our footsteps come from beneath us, and a twig snaps behind us. Being able to identify the location of a sound source is an important feature of human auditory perception. Unlike vision, audition is sensitive to sounds coming from all directions. By decoding spatial information from sounds, we are able to develop a better understanding of our environment. Furthermore, the spatial information carried by auditory events allows visual attention to be oriented quickly towards salient sounds. It also provides additional benefits where the perception of multiple sounds is concerned, which will be discussed later within this chapter.

A sound signal considered in isolation contains no explicit cue to its location, even though sounds arrive at the ears from many directions. This section describes how spatial information is deduced from the signals arriving at the ears and how headphone presentations can produce different spatial percepts.

Figure 3.4: Illustration of the additional path taken by sound to reach the contralateral ear for sources away from the median plane. Adapted from Rumsey (2001, p. 22).

When talking about locations in relation to a listener, it is useful to define several planes relative to the head. The median plane bisects the head down the centre between the eyes. The frontal plane also runs downwards through the head but perpendicular to the median plane, while the horizontal plane runs laterally through the head (Blauert, 1997) (see Figure 3.3). It is common to refer to positions in terms of their azimuth or elevation (Blauert, 1997). Azimuth refers to their angular displacement on the horizontal plane, with 0° directly in front of the listener on the median plane and 180° directly behind on the median plane. Elevation refers to angular displacement along the vertical, median plane, with 0° at the level of the ears, +90° directly above the head and −90° directly below.
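As an illustration of the azimuth and elevation convention described above, the following minimal sketch converts a pair of angles into a unit direction vector. The function name and the axis layout (x pointing forwards, y to the listener's left, z upwards) are assumptions made for this example; the text does not state which side corresponds to positive azimuth.

```python
import math

def direction_vector(azimuth_deg, elevation_deg):
    """Return a unit vector (x, y, z) for a direction relative to the head.

    Assumed axes (hypothetical, not from the text): x forwards,
    y to the listener's left, z upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.cos(az)  # forward component
    y = math.cos(el) * math.sin(az)  # lateral component
    z = math.sin(el)                 # vertical component
    return (x, y, z)

front = direction_vector(0, 0)     # directly in front, at ear level
behind = direction_vector(180, 0)  # directly behind, at ear level
above = direction_vector(0, 90)    # directly above the head
```

Under these assumptions, 0° azimuth at 0° elevation points along the positive x axis, 180° azimuth points along the negative x axis, and +90° elevation points straight up.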

Spatial cues

As mentioned in Section 3.2.1, the outer ear imparts spectral modifications on signals depending on the angle from which they enter. This spectral modification occurs because sound takes different routes around the pinna before entering the ear canal. As these routes differ in length, phase differences are introduced between the signals that have taken different paths. When these signals are combined, the phase differences attenuate some frequencies while accentuating others (Moore, 2012). Due to the size of the pinna, this only affects higher frequencies, above 4 kilohertz (kHz) (Plack, 2014).

This spectral filtering, along with reflections from the head and shoulders, creates monaural cues that are highly individual, due to anthropometric differences. These cues are particularly important in resolving the elevation of sources (Plack, 2014). Studies in which listeners have attempted to localise sound sources with modified monaural cues have shown an increase in localisation error (Hofman et al., 1998). It is interesting to note, however, that listeners can adapt to these modifications over longer periods (Hofman et al., 1998).
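The interference between paths of different lengths can be illustrated with a simple two-path model: a direct signal summed with an equally strong copy that has travelled slightly further. This is a deliberately simplified sketch and not a model of a real pinna. The speed of sound (343 m/s), the 2 cm path difference and the function names are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def two_path_gain(freq_hz, path_diff_m, c=SPEED_OF_SOUND):
    """Magnitude of a direct signal plus an equal-level delayed copy.

    |1 + exp(-j*2*pi*f*tau)| simplifies to 2*|cos(pi*f*tau)|.
    """
    delay_s = path_diff_m / c
    return abs(2.0 * math.cos(math.pi * freq_hz * delay_s))

def first_notch_hz(path_diff_m, c=SPEED_OF_SOUND):
    """Lowest frequency at which the two paths cancel completely."""
    return c / (2.0 * path_diff_m)

path_diff = 0.02  # hypothetical 2 cm difference between the two routes
notch = first_notch_hz(path_diff)             # lowest cancelled frequency
low_gain = two_path_gain(500.0, path_diff)    # close to 2: almost no effect
notch_gain = two_path_gain(notch, path_diff)  # close to 0: strong attenuation
```

Because the path differences involved are only a few centimetres, the first cancellation in this sketch falls in the kilohertz range, while low frequencies pass almost unchanged. This is consistent with the observation that pinna cues only affect higher frequencies.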

In addition to monaural cues, spatial hearing also exploits small differences between the signals arriving at the two ears. There are two principal binaural cues that vary depending on the location of a sound source. The first of these is the difference in the time at which the signal arrives at each ear, referred to as the interaural time difference (ITD) or interaural phase difference (IPD). When sounds are away from the median plane, the source will be closer to one of the ears (see Figure 3.4). As a result, the sound will reach the closer (ipsilateral/proximal) ear first. The magnitude of the ITD increases with angular separation from the median plane, reaching a maximum of approximately 690 microseconds (Moore, 2012). At low frequencies, this is exhibited as a phase difference between the signals due to the period of the wave (Moore, 2012).
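To give a sense of the magnitudes involved, the sketch below estimates the ITD using Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ). This formula is not taken from the sources cited above; it is a commonly used simplification, and the head radius (8.75 cm), the speed of sound (343 m/s) and the function names are assumptions. The approximation is only applied here for azimuths between 0° and 90°.

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Approximate ITD in seconds for a distant source (spherical head).

    Assumed model: Woodworth's formula, valid here for 0 to 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers azimuths from 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

def interaural_phase_difference(itd_s, freq_hz):
    """Phase difference in radians corresponding to an ITD at one frequency."""
    return 2.0 * math.pi * freq_hz * itd_s

itd_front = woodworth_itd(0.0)  # source on the median plane: no ITD
itd_side = woodworth_itd(90.0)  # source directly to one side: largest ITD
ipd_500 = interaural_phase_difference(itd_side, 500.0)
```

With these assumed values the ITD for a source directly to the side comes out at roughly 0.66 milliseconds, the same order of magnitude as the maximum of approximately 690 microseconds quoted above.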

The second of the binaural cues is the difference between the levels of the signals arriving at each ear, the interaural level difference (ILD). The ILD is largely due to shadowing by the head, which reduces the level at the farther (contralateral/distal) ear (Moore, 2012). The size of the ILD depends on the frequency of the sound, with higher frequencies showing larger differences than lower frequencies (Feddersen et al., 1957). Unlike the ITD, the ILD shows a more complex relationship to azimuthal distance from the median plane, and this relationship itself depends on frequency (Feddersen et al., 1957).

Rayleigh (1907) noted that the two cues are each effective for only certain frequency ranges. At higher frequencies (> 1500 Hz), the wavelength of the signal is shorter than the distance between the ears, making the phase differences unreliable (Moore, 2012). Similarly, at low frequencies the wavelength is long enough that the signal diffracts around the head, making the level difference unreliable (Moore, 2012). It would seem, therefore, that neither cue covers the entire frequency range. At high frequencies, however, time differences in the envelope of the signals can be observed (Middlebrooks & Green, 1990). These have been found to provide localisation cues (e.g., McFadden & Pasanen, 1976; Henning, 1980) and, therefore, may work alongside the ILD at higher frequencies.
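The frequency dependence of the two cues follows from how the wavelength compares with the size of the head. The following sketch computes the wavelength at a few frequencies and expresses an ear spacing as a number of wavelengths. The speed of sound (343 m/s) and the ear spacing of 0.23 m are assumed round figures used only for illustration, and the function names are hypothetical.

```python
def wavelength_m(freq_hz, c=343.0):
    """Wavelength in metres of a tone at freq_hz (c is an assumed value)."""
    return c / freq_hz

def ear_spacing_in_wavelengths(freq_hz, ear_spacing_m, c=343.0):
    """How many wavelengths fit into the given distance between the ears."""
    return ear_spacing_m / wavelength_m(freq_hz, c)

EAR_SPACING = 0.23  # metres, hypothetical value for illustration

for freq in (200.0, 1500.0, 4000.0):
    ratio = ear_spacing_in_wavelengths(freq, EAR_SPACING)
    print(f"{freq:6.0f} Hz: wavelength {wavelength_m(freq):.3f} m, "
          f"ear spacing = {ratio:.2f} wavelengths")
```

Under these assumptions the ear spacing is a small fraction of a wavelength at 200 Hz, so the wave bends around the head with little shadowing. Near 1500 Hz the spacing is about one wavelength, and above that more than one full cycle can fit between the ears, so a given phase difference no longer corresponds to a single time difference.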

Simulating spatial sound with headphones

When presenting a signal directly to each ear using headphones, binaural and monaural cues are not present and sources are perceived as coming from within the listener’s head (Plack, 2014). When the signal is identical at both ears, this is referred to as a diotic presentation. In order to provide users with an accurate spatial impression of sounds presented over headphones, it is necessary to introduce the monaural and binaural cues that would be present normally.

As the acoustic cues that influence auditory spatial perception for a source emanating from a specific location are attributable to the linear time-invariant response of the head, shoulders and pinnae, it is possible to accurately model the system through the use of impulse responses (Blauert, 1997). These impulse responses are referred to as head-related impulse responses (HRIRs) in the time domain or head-related transfer functions (HRTFs) in the frequency domain. By convolving a source audio signal with the HRIR for each ear and then presenting the resulting signals over stereo headphones, very convincing spatialisation can be achieved. Within this thesis, this is referred to as binaural presentation, processing or rendering. The results of binaural rendering are, however, highly variable and depend heavily upon both the methods used to capture the HRTFs and those used to render the final acoustic scene. Factors such as the individual characteristics of the HRTF and whether head-tracking systems are used to recreate cues from small head movements are known to affect the quality of the spatialisation (Rumsey, 2001).
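The convolution step can be sketched in a few lines. The example below is a minimal illustration that uses plain Python lists and a direct-form convolution. The two impulse responses are toy placeholders invented for this example (a unit impulse for the left ear and a delayed, attenuated impulse for the right ear); they are not measured HRIRs and reproduce only a crude time and level difference, with none of the spectral detail a real HRIR carries.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Convolve one mono source with the HRIR of each ear.

    Returns (left, right) headphone signals.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy placeholder responses (hypothetical, not measured data):
# the right ear receives the sound three samples later and at half level.
hrir_left = [1.0, 0.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.0, 0.5]

mono = [1.0, -1.0, 0.5]
left, right = render_binaural(mono, hrir_left, hrir_right)
```

In practice the HRIRs would come from measurements for the desired source direction, and a faster frequency-domain convolution would normally replace the nested loops shown here.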

Binaural processing is not the only method for providing some spatial impression over headphones. By including only differences in intensity or phase between the ears, it is possible to shift the perceived location of the source from the centre of the head. The manipulation of the intensity at each ear causes the signal to remain within the head but shift towards the ear presented with the higher intensity signal (Blauert, 1997). This is referred to as intensity panning, and is commonly used in consumer systems. Stereo mixes rely on differences in intensity (Moore, 2012). While presentation over properly configured loudspeakers leads to phase differences being reconstructed due to cross-talk between the channels (Blumlein, 1933), this is not the case when stereo material is listened to over headphones.
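Intensity panning can be sketched as a pair of gains applied to the same mono signal. The example below uses a constant-power (sine and cosine) pan law. This particular law is an assumption made for illustration, since the text does not specify one, and the function names and the pan range of -1 (fully left) to +1 (fully right) are likewise hypothetical.

```python
import math

def pan_gains(pan):
    """Left and right gains for a pan position in [-1, 1].

    Assumed constant-power law: the squared gains always sum to 1.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie between -1 and 1")
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)

def intensity_pan(mono, pan):
    """Scale one mono signal into left and right channels."""
    gain_left, gain_right = pan_gains(pan)
    return [gain_left * s for s in mono], [gain_right * s for s in mono]

centre_left, centre_right = pan_gains(0.0)  # equal gains: centred image
left, right = intensity_pan([1.0, 0.5, -0.25], 0.5)  # image shifted to the right
```

Both channels carry the same waveform at different levels, so only a level difference is introduced. No time difference or direction-dependent spectral cue is created, which is why the image stays inside the head when heard over headphones.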

Extreme forms of this are monaural presentation, where the signal is presented to only one ear, and dichotic presentation, where completely different signals are presented to each ear. While these are rarely encountered in consumer systems, they have formed the basis for experimental work on spatial auditory attention.
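The difference between diotic, monaural and dichotic presentation can be made concrete by showing how the left and right headphone channels are assembled in each case. This is a minimal sketch using plain Python lists; the function names and the choice of silence (zeros) for the unused ear are assumptions made for illustration.

```python
def diotic(signal):
    """Identical signal at both ears."""
    return list(signal), list(signal)

def monaural(signal, ear="left"):
    """Signal at one ear only; the other ear receives silence."""
    silence = [0.0] * len(signal)
    if ear == "left":
        return list(signal), silence
    if ear == "right":
        return silence, list(signal)
    raise ValueError("ear must be 'left' or 'right'")

def dichotic(signal_left, signal_right):
    """A different signal at each ear (equal lengths assumed)."""
    if len(signal_left) != len(signal_right):
        raise ValueError("both signals must have the same length")
    return list(signal_left), list(signal_right)

speech_a = [0.1, 0.2, -0.1]  # hypothetical sample values
speech_b = [0.0, -0.3, 0.4]  # hypothetical sample values

both_ears = diotic(speech_a)
one_ear = monaural(speech_a, ear="right")
two_messages = dichotic(speech_a, speech_b)
```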