Chapter 3. New Methods for Artificial Spatialisation
3.3 Novel Algorithms: Theoretical Discussion
3.3.5 Implementing a Working Spatialisation Tool using the Non-linear ITD
With a non-linear low-frequency curve derived, the theoretical and practical issues involved in implementing these scaling factors will now be discussed.
3.3.5.1 Applying Phase
The values derived from the Functional Model are then used directly in the resynthesis of the phase spectrum of the required HRTF in the spatialisation application. Magnitude values are interpolated as before, and phase values are derived from the Functional Model. ITD is transformed to IPD by multiplying each spectral bin frequency by 2π and by plus or minus half the ITD, one sign per ear (from the opcode implementation discussed in the next chapter):
phasel = TWOPI_F * freq * -(itd/2);   /* left ear: -pi * freq * itd radians */
phaser = TWOPI_F * freq * (itd/2);    /* right ear: +pi * freq * itd radians */
The formula multiplies the bin frequency by 2π, giving the angular frequency: the number of radians per second through which that frequency component rotates around the unit circle. The ITD value is the delay in seconds, so multiplying by half of it yields a phase angle, in radians, for each frequency bin. Timing information is thus introduced into the signal.
Importantly, the leading ear (the ear nearer the source) is given a positive orientation, its phase being obtained by multiplying by half the ITD value. The lagging ear, further from the source, is given a negative orientation, its phase being obtained by multiplying by minus half the ITD. This apparently unintuitive operation is discussed below.
It is perhaps helpful to view the problem from both an ITD and an IPD point of view. ITD is, in these circumstances, a signed, directional quantity: it is positive on the side nearer to the source and negative on the farther side. From a phase point of view, the leading ear will always have the larger phase, as above: positive phase goes to the nearer ear, as the source is arriving from that side.
For example, for a source on the right, the right ear is given a positive phase and the left ear a negative one. The phase difference then implies the correct ITD, just as it does in the scenario discussed above in which IPD is extracted rather than imposed. Each ear essentially follows its own phase function, as illustrated in Figure 3.6. This breakdown of phase is also utilised by Zotkin [234, 233].
Figure 3.6: IPD orientation
There is a switch along the x axis depending on the direction of source arrival. This switch can alternatively be thought of as right-to-left implying positive to negative, and left-to-right implying positive to the next positive cycle (π to 3π), which is equivalent to wrapping back to negative.
3.3.5.2 Impulse Shifting
Imposing phase values in this way means that, for the nearer ear, the zero-centred (zero-phase) impulse wraps to the end of the buffer. The positive phase implies an earlier onset, i.e. a time advance, which in the circular time domain of the inverse DFT wraps around to the end of the impulse; the negative phase implies a delay, as that ear reaches zero phase later. Moving back into the time domain, the nearer-ear impulse therefore appears to occur after the further-ear impulse, because it has wrapped around to the end of the buffer. Figure 3.7 shows an example.
Figure 3.7: A non-shifted (above) and shifted (below) Functional Phase based stereo HRIR, for a source at 0° elevation and 90° azimuth.
This is clearly an unnatural result. Although the IPD is correct, even a casual inspection of the impulse reveals that the sound reaches the two ears in the wrong order. For this reason, the impulse is shifted in time by half the buffer size. This shift ensures a causal filter, and is also performed in the linear phase model in [111]. Essentially, this process adds the correct phase spectra to the zero-phase, magnitude-only impulse and moves it so that it is centred around the mid point of the filter. The result is a filter that is accurate in both time and phase. Interestingly, adding this time alignment provides much better localisation, which highlights the importance of correct onset time, as well as correct phase spectra, for localisation.
In Figure 3.7, both panels show the HRIR pair for a source to the right of the listener (0° elevation, 90° azimuth). Intuitively, the right ear should receive the signal first. Because the functionally derived phase wraps around the zero time point, this is not the case in the upper, unshifted panel. If the impulse is shifted so that it is centred around the centre tap of the filter, the situation is rectified: interaural phase and onset time are now both correct.
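The half-buffer shift can be sketched as a circular rotation of the impulse by N/2 samples, in the manner of an "fftshift". This is an illustrative helper with a hypothetical name, not the thesis implementation; for even N it is equivalent to multiplying spectral bin k by (-1)^k before the inverse transform. Applied to both ears, it places the nearer ear's onset ahead of the further ear's while preserving their separation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: circularly rotates an n-sample impulse (n even) by n/2
 * samples, moving the zero-time point to the centre tap of the filter.
 * scratch must hold n doubles. */
void shift_half(double *h, double *scratch, size_t n)
{
    size_t half = n / 2;
    for (size_t i = 0; i < n; i++)
        scratch[(i + half) % n] = h[i];   /* sample i moves to i + n/2 (mod n) */
    memcpy(h, scratch, n * sizeof *h);
}
```

For N = 16, a right-ear peak wrapped to index 14 and a left-ear peak at index 2 move to indices 6 and 10 respectively, so the right ear now leads by 4 samples, the same separation as before the shift.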
An STFT process is required for dynamic sources, as the phase is then neither derived to match the magnitude (as in the minimum-phase model) nor static (as in Phase Truncation). In this scenario, spatialisation is more successful without the impulse shift. Because the STFT is used here, the process cannot strictly be described as convolution; a more accurate description is perhaps an STFT-based filtering process. Magnitudes and phases are imposed on the input sound, but the full convolution output is not retained, as the output is the same size as the input/impulse buffer. The magnitude spectrum of the input is nevertheless filtered by the impulse, and the phase spectrum is also processed to mimic the delays inherent in the phase spectra of the derived impulse. Because of these departures from traditional convolution, audible high-frequency noise appears if there are abrupt peaks in an impulse. Impulses usually start at the beginning of the file, so these peaks are windowed in the output. Shifting the impulse is therefore not desirable in an STFT implementation (and indeed introduces noise due to non-windowed, centred impulse peaks). Spatial characteristics are, however, emphasised by repetition.
3.3.5.3 A Step towards Individualisation
The Functional Model uses the Woodworth formula as the basis for the initial ITD calculation. As this formula includes a radius parameter, the user can enter an appropriate radius for their own head, which provides an element of individualisation. However, as the MIT dataset is used, the HRTF data imply listening through KEMAR’s ears, with KEMAR’s head and torso altering auditory events. Also, as discussed in the next chapter, there appears to be an optimal radius for low-frequency accuracy when compared with the empirical data, and this radius may not be a good fit for an arbitrary user’s HRTFs. This radius-based individualisation is also recognised in [32].