2.5 Binaural Simulation
2.5.2 The Head-related Transfer Function (HRTF)
The head-related transfer function is defined as the free-field transfer function from a sound source to each of a listener’s ears (Xie, 2013) and can be thought of as a linear, time-invariant (LTI) process. These frequency-domain functions, one for each ear, contain most of the important localisation cues needed by a human listener. However, dynamic cues caused by in situ head movements are not included. Møller (1992) states that a transfer function measured at any of the microphone positions shown in Figure 2.11 constitutes an HRTF, because the ear canal can be regarded as a one-dimensional transmission line (Hammershøi and Møller, 1996). The HRTF is a function of sound source distance, azimuth, elevation and frequency but also varies between individuals due to anatomical differences. Figure 2.11 shows a simple diagram of the human outer-ear anatomy and commonly used measurement positions.
[Figure 2.11 panels: (a) Open ear canal, with measurement points labelled P_EEP^open, P_ERP^open and P_DRP; (b) Blocked ear canal (ear-canal plug), with measurement points labelled P_EEP^blocked and P_ERP^blocked.]
Figure 2.11: Human outer-ear anatomy with pinna, ear-canal and ear-drum. ERP = ear reference point, EEP = ear entrance point and DRP = drum reference point. Ear-canal blocking is usually achieved using expanding foam ear-plugs.
HRTF measurements can be made using a far-field loudspeaker with a broad and flat frequency response. To remove the effect of the loudspeaker, microphone and propagation delay efficiently, a reference measurement can be made. This is the transfer function between the loudspeaker input terminals and a microphone positioned at the centre of the head (with the head absent), referred to as P0.
This method yields what can be called the measurement-equalised HRTF; the reference measurement is made as shown in Figure 2.12(b).
[Figure 2.12 panels: (a) PL and PR measurement; (b) P0 measurement. Each panel shows the sound source at angle θ relative to the x and y axes.]
Figure 2.12: Setup for ear and reference measurements needed to derive HRTFs.
The HRTF is a frequency-domain representation of this filtering effect; its inverse Fourier transform gives the head-related impulse response (HRIR).
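As a minimal sketch of the reference equalisation and the HRTF/HRIR relationship described above, the ear spectrum can be divided by the reference spectrum, H_L(f) = P_L(f) / P_0(f), and the result inverse-transformed. The function names, the `eps` guard against division by zero and the toy responses are all assumptions made for illustration; they are not taken from the measurement procedure itself.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a short illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def equalised_hrtf(p_ear, p_ref, eps=1e-12):
    """Divide the ear spectrum by the reference spectrum, bin by bin.

    Bins where the reference magnitude is below `eps` are set to zero
    (an assumed safeguard, not part of the definition).
    """
    return [pe / pr if abs(pr) > eps else 0j
            for pe, pr in zip(dft(p_ear), dft(p_ref))]

# Hypothetical toy responses: the reference P0 is a unit impulse delayed by
# 2 samples (the common propagation delay); the ear response is the same
# impulse delayed by one further sample and attenuated to 0.5.
p_0 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
p_l = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]

hrtf_l = equalised_hrtf(p_l, p_0)                      # frequency domain (HRTF)
hrir_l = [v.real for v in dft(hrtf_l, inverse=True)]   # time domain (HRIR)
```

In this toy case the common 2-sample delay cancels, leaving an HRIR with a single 0.5 tap at sample 1. Real measurements would normally use an FFT and some regularisation of the division at frequencies where the reference has little energy.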
When HRTFs are considered binaurally, differences between the left and right ears highlight the two fundamental localisation cues: interaural time differences (ITD) and interaural level differences (ILD). Early investigations by Strutt (1907) on pure-tone localisation introduced the Duplex Theory. It states that the human auditory system uses interaural time differences to lateralise sounds in the low-frequency spectral region (<500 Hz), where shadowing by the head is negligible, whereas lateralisation of high-frequency sound events is dominated by interaural level differences, because head shadowing becomes more dominant once the wavelength is small relative to the size of the head. More recent experiments still support the Duplex Theory (Macpherson and Middlebrooks, 2002). Representative ITD and ILD values are shown in Figure 2.13.
(a) ITD (b) ILD
Figure 2.13: Broadband interaural cues across azimuth and elevation calculated using a freely available HRTF dataset (Andreopoulou et al., 2015). ITD is calculated using the maximum IACC method.
The HRTF can be represented by three fundamental transfer function components: (1) a minimum-phase component, (2) an all-pass component and (3) a linear-phase component. The minimum-phase component has a magnitude response equal to that of the original HRTF but the smallest possible phase lag. Because the phase response of a minimum-phase transfer function is related to the natural logarithm of its magnitude response through the Hilbert transform, a Hilbert transform can be used to approximate this component easily from the magnitude response alone. The all-pass component is a unity-magnitude filter that carries any excess phase. The linear-phase component is a pure delay. Minnaar et al. (1999) have shown that the omission of the all-pass component in binaural simulation is not perceptible to listeners.
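The minimum-phase component described above can be approximated numerically. The sketch below uses the real-cepstrum (homomorphic) method, which is a common discrete implementation of the Hilbert-transform relationship between log-magnitude and phase; it is an illustrative choice, not necessarily the procedure used in this work. The function names, the FFT length `n_fft` and the magnitude `floor` are assumptions.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; adequate for a short illustration."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def minimum_phase(h, n_fft=64, floor=1e-9):
    """Return an impulse response with the magnitude response of h and minimum phase."""
    padded = list(h) + [0.0] * (n_fft - len(h))
    # Log-magnitude spectrum (floored to avoid log(0) at spectral nulls).
    log_mag = [math.log(max(abs(v), floor)) for v in dft(padded)]
    # The real cepstrum is the inverse transform of the log-magnitude.
    cep = [c.real for c in dft(log_mag, inverse=True)]
    # Fold the cepstrum onto non-negative quefrencies; this is the discrete
    # counterpart of deriving the phase from the log-magnitude.
    folded = [0.0] * n_fft
    folded[0] = cep[0]
    for i in range(1, n_fft // 2):
        folded[i] = 2.0 * cep[i]
    folded[n_fft // 2] = cep[n_fft // 2]
    # Exponentiate the transformed cepstrum to get the minimum-phase spectrum.
    spec = [cmath.exp(v) for v in dft(folded)]
    return [v.real for v in dft(spec, inverse=True)]

# Toy example: [0.5, 1.0] has its zero outside the unit circle (maximum phase).
# Its minimum-phase counterpart with the same magnitude response is [1.0, 0.5].
h_min = minimum_phase([0.5, 1.0])
```

The result is approximate because the cepstrum is truncated to `n_fft` samples; a longer transform reduces the aliasing error for responses with deep spectral notches.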
The pure-delay (linear-phase) parts of the HRTF, which are independent at each ear, represent the ITD. The delay between the ears is simple in concept but difficult to estimate in practice, and many methods exist. Early procedures calculate the difference in time of arrival by finding, for each ear, the time at which the magnitude first exceeds a threshold such as 5% of the maximum (Sandvad and Hammershøi, 1994). Although efficient, this method suffers from bias due to low interaural coherence on the contralateral ear (Nam et al., 2008). Kistler and Wightman (1992) instead took the ITD as the lag at which the interaural cross-correlation function reaches its maximum. A similar method was used by Nam et al. (2008), where each HRIR was cross-correlated with its minimum-phase version, yielding the time of arrival for each ear separately. A linear-phase fitting method in the frequency domain was also proposed by Jot et al. (1995). As shown in Figure 2.13, the ITD values range from 0 µs at a source azimuth of 0° to approximately ±700 µs at a source azimuth of ±90°.
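Two of the estimation strategies above, onset thresholding and the cross-correlation maximum, can be sketched as follows. This is a simplified illustration rather than a reproduction of the cited implementations: the function names, the sign convention (positive ITD when the left ear leads), the `max_lag` search range and the toy HRIRs are all assumptions.

```python
def onset_index(hrir, fraction=0.05):
    """Index of the first sample whose magnitude exceeds `fraction` of the peak."""
    peak = max(abs(v) for v in hrir)
    for i, v in enumerate(hrir):
        if abs(v) > fraction * peak:
            return i
    raise ValueError("HRIR contains no non-zero samples")

def itd_threshold(hrir_left, hrir_right, fs, fraction=0.05):
    """ITD in seconds from per-ear onset times (positive: left ear leads)."""
    return (onset_index(hrir_right, fraction) - onset_index(hrir_left, fraction)) / fs

def itd_xcorr(hrir_left, hrir_right, fs, max_lag):
    """ITD in seconds from the lag that maximises the interaural cross-correlation.

    Assumes both HRIRs have the same length.
    """
    n = len(hrir_left)
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[t] with right[t + lag], ignoring out-of-range samples.
        val = sum(hrir_left[t] * hrir_right[t + lag]
                  for t in range(n) if 0 <= t + lag < n)
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs

# Hypothetical toy HRIRs at 48 kHz: the right-ear response is a scaled copy of
# the left-ear response delayed by 30 samples (625 microseconds).
fs = 48000
shape = [1.0, -0.4, 0.1]
hrir_l = [0.0] * 10 + shape + [0.0] * 51
hrir_r = [0.0] * 40 + [0.6 * v for v in shape] + [0.0] * 21

itd_a = itd_threshold(hrir_l, hrir_r, fs)
itd_b = itd_xcorr(hrir_l, hrir_r, fs, max_lag=48)   # search within +/- 1 ms
```

On these idealised signals both estimators return the same 30-sample delay. With measured HRIRs the two can disagree, particularly for the contralateral ear, which is the bias discussed above.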
Although a number of methods exist, all have been shown to produce valid results and the choice of implementation method may be defined simply by practicality. ITD approximations for the computational model presented in Chapter 7 use the method of Kistler and Wightman (1992) as this allows for ITD to be calculated at the same time as the running interaural coherence function. For ITD metrics presented in Chapter 5, the method of Nam et al. (2008) was used to improve stability of the approximation when interaural coherence is low.
The ILD can range between 0 and ±20 dB and is highly frequency dependent due to the inherent frequency dependence of the HRTF.
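To illustrate this frequency dependence, a per-bin ILD can be computed as 20 log10(|H_L| / |H_R|). The sketch below is a hedged example only: the function names, the sign convention (positive ILD when the left ear is louder), the magnitude `floor` and the toy impulse responses are assumptions, chosen so that the result spans the 0 to 20 dB range quoted above.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for a short illustration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ild_per_bin(hrir_left, hrir_right, floor=1e-12):
    """ILD in dB for bins 0..N/2 (positive: left ear louder).

    Assumes both HRIRs have the same even length.
    """
    h_l, h_r = dft(hrir_left), dft(hrir_right)
    half = len(hrir_left) // 2
    return [20.0 * math.log10(max(abs(h_l[k]), floor) / max(abs(h_r[k]), floor))
            for k in range(half + 1)]

# Hypothetical toy HRIRs: the left ear is an ideal impulse (flat spectrum);
# the right ear is a gentle low-pass, mimicking head shadowing that grows
# with frequency.
hrir_l = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
hrir_r = [0.55, 0.45, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

ild_db = ild_per_bin(hrir_l, hrir_r)
```

Here the ILD is 0 dB at DC and rises to about 20 dB at the Nyquist bin. A broadband ILD, as plotted in Figure 2.13, would instead compare the total energy at each ear.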