5.2 Description of model

5.2.3 Central processor

A decision-making device in computational models of human perception is a processor that maps the intermediate output to a final judgement, and is preferably designed to reflect the cerebral structure and mechanism. Since the relevant mechanisms are not yet fully understood, most binaural hearing models employ a decision-making device based mainly on consensually accepted assumptions. In the cross-correlation model, the peak position or the centroid of the cross-correlation function has traditionally been chosen as the indicator of the spatial location of sound sources [8]. Meanwhile, the development of artificial neural networks has provided a more sophisticated non-linear decision device that combines all available information regarding the spatial extent of sound sources, such as ITD, ILD and spectral cues [12, 72–74].

In the current model, given the uniqueness of the EI-patterns in accordance with the known characteristics of auditory signal processing, a pattern-matching process is assumed to take place in the central decision-making stage. First, white Gaussian noise is filtered through one of the KEMAR HRTFs [27], which have been interpolated from 5-degree to 1-degree resolution (see Appendix A for the HRTF interpolation). Taking this synthesised binaural signal as the input to the peripheral and binaural processes of the current model, and repeating the procedure for every direction, yields a collection of 60×360 EI-patterns corresponding to 60 auditory frequency bands and 360 azimuthal directions. This collection forms a memory of sound localisation, or a template in computational terms, since each pattern is close to unique for the corresponding source direction in each auditory frequency band, as discussed above.

Having established the template, a simple pattern matching procedure is employed to find the best match for a new target signal. Based on the cross-correlation between the target EI-pattern and the template, the pattern-matching procedure is represented by

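\[
\chi(\theta, f) = \frac{\sum_{\tau,\alpha} \mathrm{EI}''_{tg}(\tau, \alpha, f) \cdot \mathrm{EI}''_{T}(\tau, \alpha, \theta, f)}{\sqrt{\sum_{\tau,\alpha} \left(\mathrm{EI}''_{tg}\right)^{2} \, \sum_{\tau,\alpha} \left(\mathrm{EI}''_{T}\right)^{2}}} \tag{5.5}
\]

\[
\theta_p(f) = \arg\max_{\theta} \chi(\theta, f) \tag{5.6}
\]

where $\mathrm{EI}''_{tg}$ and $\mathrm{EI}''_{T}$ are the EI-patterns from the target and the template, respectively, and χ indicates the normalised cross-correlation between the patterns.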
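The matching step in Eqs. (5.5) and (5.6) can be illustrated with a minimal NumPy sketch. The function name `match_patterns`, the array layout (angle, band, τ, α) and the random toy data are assumptions made for illustration only; they are not taken from the model implementation, and real EI-patterns would come from the peripheral and binaural stages.

```python
import numpy as np

def match_patterns(target, template):
    """Normalised cross-correlation of a target EI-pattern with a template.

    target:   array of shape (n_bands, n_tau, n_alpha)
    template: array of shape (n_angles, n_bands, n_tau, n_alpha)
    Returns chi with shape (n_angles, n_bands) and theta_p with shape (n_bands,).
    """
    # Numerator of Eq. (5.5): sum over tau and alpha of target * template.
    num = np.einsum("fta,qfta->qf", target, template)
    # Pattern energies used for the normalisation (assumed to be non-zero).
    e_target = np.einsum("fta,fta->f", target, target)
    e_template = np.einsum("qfta,qfta->qf", template, template)
    chi = num / np.sqrt(e_target[np.newaxis, :] * e_template)
    # Eq. (5.6): index of the best-matching template angle in each band.
    theta_p = np.argmax(chi, axis=0)
    return chi, theta_p

# Toy data only: random non-negative arrays stand in for real EI-patterns.
rng = np.random.default_rng(0)
template = rng.random((360, 4, 8, 6))   # 360 angles, 4 bands (hypothetical sizes)
target = 2.0 * template[45]             # scaled copy of the 45-degree entry
chi, theta_p = match_patterns(target, template)
```

Because the normalisation removes any overall scaling, the scaled copy of the 45-degree template entry gives χ = 1 at index 45 in every band, so `theta_p` is 45 throughout. With a 1-degree template the array index coincides with the azimuth in degrees.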

It is expected that this pattern-matching process works in a similar way to finding the nearest neighbour in the characteristic-curve model described in chapter 2, where, presumably, the two conversion factors in Eq. (2.1) are equivalent to the neural resolution determined by the amount of delay and attenuation in each tap of the 2D network shown in Fig. 5.4. Nevertheless, in order to show the equivalence between the two decision-making processes, it is essential to prove that the outcome of Eqs. (5.5) and (5.6) is equal to that of Eqs. (2.2) and (2.3) at a single frequency. This is difficult, since the EI-patterns are computed for individual HRTFs and are not analytically represented. Assuming that the left and the right channels of the binaural input signals are related only by a time delay and an amplitude difference, the analytical form of the EI-patterns has been approximated in Appendix B, and this can be further investigated in future work to clarify the link between the two models.

Figure 5.9 shows an example of the function χ(θ, f) for a source at 45°, where circles indicate θp(f) in each of the 60 frequency bands. It is clear that greater similarity is found between the target EI-patterns and the template when the response angle is in the vicinity of the actual target location. In addition, it is noteworthy that this pattern-matching procedure can give mirror-imaged errors associated with front-back confusion, which are indicated by the local estimates found around 135°. This is true even without the introduction of internal error, if a running, instead of frozen, noise source is used as the input signal.

It is essential to further combine the model predictions in each frequency band in order to produce a final global prediction. Working with the cross-correlation model, Stern et al. [75] and Shackleton et al. [37] previously dealt with this issue by making use of a frequency weighting of binaural stimuli. For instance, the latter showed that a simple weighted addition of the cross-correlation functions across the auditory channels can represent a global cross-correlation function.

Similarly, the current model applies a weighting scheme to collect all the ‘local’ predictions to establish a ‘global’ probability function D(θ), where the weighting function has been obtained from the energy spectral density multiplied by the salience factor of binaural stimuli suggested by Raatgever [37]. The latter reflects the empirical dominance of binaural stimuli at low frequencies, while the former assumes that a signal band of greater energy has more influence on the final decision. Fig. 5.10 shows examples of the weighting functions depending on the spectral characteristics of source signals.
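As a rough illustration of how such a weighting function could be assembled, the sketch below multiplies the energy in each auditory band by a salience value for that band. The function name `weighting_function` and the input layout are assumptions, and the salience values in the example are made up so that the code runs; they are not the salience factor from the cited work.

```python
import numpy as np

def weighting_function(band_signals, salience):
    """W(f) = band energy x salience, one value per auditory band.

    band_signals: array of shape (n_bands, n_samples), band-filtered source
    salience:     array of shape (n_bands,), supplied by the caller
    """
    # Energy in each band: sum of squared samples along the time axis.
    energy = np.sum(np.square(band_signals), axis=1)
    return energy * np.asarray(salience, dtype=float)

# Toy inputs: three bands, four samples each; salience values are made up.
band_signals = np.array([[1.0, 1.0, 1.0, 1.0],
                         [2.0, 2.0, 2.0, 2.0],
                         [1.0, -1.0, 1.0, -1.0]])
salience = np.array([1.0, 0.5, 0.25])
W = weighting_function(band_signals, salience)
```

Here the band energies are 4, 16 and 4, so the weights become 4, 8 and 1: the second band dominates because of its energy, even though its salience is lower than that of the first band.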

Mathematically, this ‘power-weighting’ scheme can be represented as

\[
D(\theta) = \frac{\sum_{f} \delta_{\theta,\theta_p(f)} \times W(f)}{\sum_{f} W(f)} \tag{5.7}
\]

where W(f) is the frequency weighting function, and δ is the Kronecker delta. (It should be recalled that the function D(θ) is defined only for integer numbers between 0 and 359, limited by the resolution of the interpolated HRTF.)
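The power-weighting in Eq. (5.7) amounts to a weighted histogram of the per-band estimates θp(f), normalised by the total weight. A minimal sketch follows, assuming integer estimates in degrees; the function name `global_probability` and the example values are hypothetical.

```python
import numpy as np

def global_probability(theta_p, weights, n_angles=360):
    """Weighted histogram of per-band estimates, normalised to sum to one.

    theta_p: integer angle estimates in degrees, one per frequency band
    weights: W(f) for the same bands
    """
    theta_p = np.asarray(theta_p, dtype=int) % n_angles
    weights = np.asarray(weights, dtype=float)
    # The Kronecker delta selects, for each angle, the bands whose estimate
    # equals that angle; bincount accumulates their weights.
    d = np.bincount(theta_p, weights=weights, minlength=n_angles)
    return d / weights.sum()

# Hypothetical estimates from four bands, with made-up weights.
theta_p = [45, 45, 135, 44]
weights = [4.0, 3.0, 2.0, 1.0]
D = global_probability(theta_p, weights)
```

In this toy case the two bands voting for 45° carry 7 of the 10 units of weight, so D(45) = 0.7, while the single estimate at 135° contributes D(135) = 0.2, which mimics a secondary front-back peak.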

An example of the probability function D(θ) is shown in Fig. 5.9(b), where one peak results from the estimate of the true source position, whilst the other at 135° indicates the possibility of front-back confusion, as already implied in Figs. 5.7 and 5.9(a). Since the pattern-matching procedure first produces estimates for each frequency band, it is possible to have many distinct peaks in the plot of D(θ), arising from multiple sound sources that are separated in the frequency domain or at least have non-overlapping spectral components. Similar to the case discussed in section 2.3 regarding the dual images created by ambiguous interaural phase differences, the listener’s attention is assumed to play an important role when multiple peaks are observed in the probability function D(θ). Here, it is assumed that the model selects the estimate corresponding to