7.2 The Existing Computational Model
7.2.2 Binaural and Central Processing
The binaural processing stage uses the selection criteria implemented by Faller and Merimaa (2004). First, the time-domain nerve firing densities in each auditory filter band are split into time windows so that interaural cues can be processed. All processing in this chapter uses the 10 ms time window proposed by Faller and Merimaa (2004). From the running interaural cross-correlation function, IC is estimated as the maximum of the function and ITD as the argument of that maximum. ILD is calculated as the energy difference between the left and right nerve firing densities. All three values are calculated as a function of time window index n and auditory filter index k, and are represented by:
\[ \mathrm{ITD}(k, n), \quad \mathrm{ILD}(k, n), \quad \mathrm{IC}(k, n) \tag{7.1} \]
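The cue estimation for a single time window can be sketched as follows. This is a minimal illustration and not the implementation used in this chapter: it computes a block-wise normalised cross-correlation rather than the running cross-correlation of Faller and Merimaa (2004), the function name `interaural_cues` and the `max_lag` parameter are hypothetical, the ITD is returned as a lag in samples, and the sign conventions (positive lag when the right signal is delayed, positive ILD when the left signal has more energy) are assumptions.

```python
import math

def interaural_cues(left, right, max_lag):
    """Estimate (itd_lag, ild_db, ic) for one time window of one band.

    left, right: equal-length sequences of non-negative firing densities.
    max_lag: largest lag, in samples, searched in either direction.
    """
    n = len(left)
    energy_l = sum(x * x for x in left)
    energy_r = sum(x * x for x in right)
    norm = math.sqrt(energy_l * energy_r)
    if norm == 0.0:
        return 0, 0.0, 0.0  # silent window: no usable cue
    best_lag, best_corr = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag] over the overlapping samples.
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        corr = acc / norm
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # ILD as the energy ratio between the two ears, in decibels.
    ild_db = 10.0 * math.log10(energy_l / energy_r)
    # IC is the maximum of the normalised cross-correlation,
    # ITD is the lag at which that maximum occurs.
    return best_lag, ild_db, best_corr
```

Dividing the returned lag by the sampling rate gives the ITD in seconds. Repeating the call for every window n and band k would fill the three arrays of Equation 7.1.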
ITD and ILD values are then selected as valid according to a frequency-dependent threshold C0. At each time window index n, ITD(k, n) and ILD(k, n) are only used if the corresponding IC(k, n) value is greater than C0. Faller and Merimaa (2004) proposed that the physiological representation of C0 is adaptive, depending on the room the listener is in or other external factors. Sheaffer (2013) used empirical data to define Equation 7.2, the primary feature being an increase in the threshold at high frequencies.
\[ C_0(k) = 1 - e^{-\mu k} \tag{7.2} \]
where k is the auditory filter band index (1, 2, ..., K−1, K) and K = 44 is the total number of auditory filters. µ is the control parameter for changing the slope of C0 with respect to frequency. It is important to note that this selection function is scaled not according to frequency but according to the number of filters in the auditory filterbank; the tuning parameter µ should therefore be selected carefully when a different number of auditory filters is used. For the calculations shown here, µ = 0.15 (Sheaffer, 2013).
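As a worked illustration of Equation 7.2, the threshold can be evaluated for all K = 44 bands and then used to gate the cues. The helper names `c0_threshold` and `select_valid` are hypothetical and are introduced here only for the sketch.

```python
import math

def c0_threshold(k, mu=0.15):
    """Selection threshold C0 for auditory filter band index k (Equation 7.2)."""
    return 1.0 - math.exp(-mu * k)

K = 44  # total number of auditory filters
thresholds = [c0_threshold(k) for k in range(1, K + 1)]

def select_valid(cues, ic_values, threshold):
    """Keep only the cue values whose IC is greater than the threshold."""
    return [c for c, ic in zip(cues, ic_values) if ic > threshold]
```

With µ = 0.15 the threshold is approximately 0.14 for k = 1 and approaches 1 for k = 44, so fewer windows pass the selection in the upper bands.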
[Figure 7.3 plot: interaural cross-correlation selection threshold C0 (0 to 1) against ERB number (5 to 40), with regions labelled "ITD/ILD cues used" and "ITD/ILD cues unused".]
Figure 7.3: C0 function used to select valid ITD and ILD values. µ = 0.15
Probability density functions, PDF_ITD(k, τ) and PDF_ILD(k, α), are created by counting the occurrences of ITD and ILD values across n after applying the C0 selection criterion. τ and α values are constrained to the minimum and maximum of the range across which ITD and ILD are calculated. Figure 7.4 shows a time-domain representation of the calculation process.
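The counting step can be sketched as a simple histogram over the selected cue values of one band. The function name `cue_pdf`, the number of bins and the normalisation to unit sum are assumptions made for illustration; the text does not specify the bin width or the normalisation used.

```python
def cue_pdf(values, lo, hi, n_bins):
    """Histogram the values over [lo, hi] and normalise the counts to sum to 1."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # The upper edge (v == hi) is placed in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins  # no valid cues in this band
    return [c / total for c in counts]
```

For example, `cue_pdf(valid_itd_ms, -1.0, 1.0, 40)` would give one row of PDF_ITD for a single band, where `valid_itd_ms` is a hypothetical list of the ITD values that passed the C0 selection.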
[Figure 7.4 panels, plotted against time (0 to 0.07 s): nerve firing density amplitude (ERB band #20, 1568.9 Hz); ITD (ms, -1 to 1); ILD (dB, -20 to 20); IC (0 to 1). Histogram panels: PDF_ITD and PDF_ILD.]
Figure 7.4: The process for calculating valid ITD and ILD values and probability density functions. Data is shown for a sound source at θ = 315° and ERB band 20 with a centre frequency of 1568.9 Hz. Left-hand side, from the top: (a) input left and right nerve firing densities, (b) calculated ITD value, (c) calculated ILD value (post peripheral processing), (d) IC function with corresponding C0 threshold value.
Histogram plots on the right-hand side show the probability density values for the chosen ERB filter band. The PDFs are created for both ITD and ILD values and for each auditory filter band, represented by PDF_ITD(k, τ) and PDF_ILD(k, α) respectively.
Figure 7.5 shows the PDF data across all frequency bands. PDF data is normalised for each auditory filter k.
[Figure 7.5 panels: (a) PDF_ITD(k, τ), ITD (ms, -1 to 1) against frequency (0 to 5000 Hz); (b) PDF_ILD(k, α), ILD (dB, -20 to 20) against frequency (0 to 5000 Hz); probability scale 0 to 1.]
Figure 7.5: Cue probability functions for a sound source at θ = 45°.
Sheaffer (2013) combines PDF_ITD(k, τ) and PDF_ILD(k, α) into a compact matrix form named the cue probability pattern, CPP(k, θ). For each sound source azimuth θ and auditory filter k, a 2-dimensional matrix can be calculated to define the localisation characteristics, where
\[ \mathrm{CPP}(k, \theta) = \mathrm{PDF}_{\mathrm{ITD}}(k, \tau) \; \mathrm{PDF}_{\mathrm{ILD}}(k, \alpha) \tag{7.3} \]
The assumption of this stage of the computational model is that for each of the possible directions of sound stimuli presented to the model, there is an almost unique CPP matrix. A reference dataset is created by calculating CPP values for anechoic data at known sound source directions. Each CPP is labelled according to the sound source direction. An unknown binaural test stimulus can then be analysed by firstly calculating CPP data. The correlation between the test CPP and the reference CPP data for each angle will provide a prediction of the most likely stimulus directions.
Correlation analysis is performed by taking a 2D cross-correlation of the CPP data per frequency band. For computations of the localisation model used in this chapter, the reference dataset was calculated by taking an anechoic approximation of the SBSBRIR dataset, as described in Chapter 4. This ensured that the same electroacoustic equipment was used for the reference and test stimuli.
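The matching of a test CPP against the labelled reference set can be sketched as below. Note that this sketch substitutes a zero-lag normalised correlation of flattened CPP data for the full 2D cross-correlation described above, purely to keep the example short; the function names and the dictionary layout of the reference set are hypothetical.

```python
import math

def normalised_correlation(a, b):
    """Zero-lag normalised correlation between two equal-length vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0

def cue_correlation(test_cpp, reference_cpps):
    """Return one correlation score per band for each reference azimuth.

    test_cpp: list over bands of flattened CPP vectors.
    reference_cpps: dict mapping azimuth in degrees to the same structure.
    """
    return {
        theta: [normalised_correlation(t, r) for t, r in zip(test_cpp, ref)]
        for theta, ref in reference_cpps.items()
    }
```

A band whose test and reference patterns coincide scores 1, and a band whose patterns do not overlap scores 0, so the azimuth with consistently high scores is the most likely stimulus direction.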
The resulting cue correlation function, CC(k, θ), gives a visual representation of the localisation angle of the test stimulus for each frequency band k. For modelling localisation cues across frequency, Stern et al. (1988) provide a best-fit third-order polynomial model based on original data from Raatgever (1980):
\[ \omega(f) = 10^{-(b_1 f + b_2 f^2 + b_3 f^3)/10} \tag{7.4} \]
where \(b_1 = -9.383 \times 10^{-2}\), \(b_2 = 1.126 \times 10^{-4}\) and \(b_3 = -3.992 \times 10^{-8}\). At frequencies above 1200 Hz, the function is set to the constant value of ω(1200). This integration process provides the summarised localisation prediction, S(θ).
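The weighting of Equation 7.4 and one possible form of the integration across bands can be sketched as follows. The weighted mean used in `summarise` is an assumption, since the exact integration rule is not stated here, and both function names are hypothetical.

```python
B1, B2, B3 = -9.383e-2, 1.126e-4, -3.992e-8  # coefficients of Equation 7.4

def frequency_weight(f_hz):
    """Weighting of Equation 7.4, held constant above 1200 Hz."""
    f = min(f_hz, 1200.0)
    return 10.0 ** (-(B1 * f + B2 * f**2 + B3 * f**3) / 10.0)

def summarise(cc, centre_freqs_hz):
    """Combine per-band scores into one value per azimuth (assumed weighted mean).

    cc: dict mapping azimuth to a list of per-band correlation scores.
    centre_freqs_hz: centre frequency of each band, in the same order.
    """
    weights = [frequency_weight(f) for f in centre_freqs_hz]
    total = sum(weights)
    return {
        theta: sum(w * c for w, c in zip(weights, scores)) / total
        for theta, scores in cc.items()
    }
```

With these coefficients the weight is 1 at 0 Hz, is larger at 600 Hz than at 1200 Hz, and stays at ω(1200) for all higher bands. The azimuth that maximises the returned dictionary would be taken as the predicted direction.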