Chapter 3. New Methods for Artificial Spatialisation
3.2 Current Computer Music Tools
One of the primary initial goals of this research was to provide a stable, flexible and efficient open source tool for HRTF based binaural spatialisation. The solutions available when this project commenced (hrtfer for Csound and earplug~ for Pure Data (PD)) are discussed below, together with more recent additions to the open source repertoire. The more established approaches are summarised in table 3.1, below.
Approach | Detail | Comments
hrtfer | MIT dataset, truncates to nearest source | Lack of interpolation
earplug~ | Time domain interpolation algorithm | Computationally costly
iem_bin_ambi | Virtual ambisonics | Static setup
Table 3.1: Summary of established Computer Music binaural spatialisation tools
3.2.1 hrtfer for Csound
Csound is an open source software tool used for audio based research and creative activities. The core of the system is based on opcodes: processing units with a specific function (e.g. signal generation/processing; for more see [33, 25]). One such opcode is hrtfer, developed by Eli Breder and David MacIntyre in 1996. The opcode uses the MIT HRTFs [142], and essentially convolves the input with the appropriate HRTFs. Accurate spatialisation is available at static locations which correspond exactly to HRTF measured points. However, at non-measured points, the nearest data will be used, resulting in potential angular imprecision: the dataset is relatively dense at key locations (primarily the horizontal plane), but does not satisfy MAA requirements. This lack of interpolation also causes a problem in the case of moving sources. The source jumps from measured point to measured point, often resulting in discontinuities in the output. The authors suggest a fade out of the old convolution result and a fade in of the new to minimise these ‘clicks’, which does reduce the severity of the noise. This feature is, however, disabled in the latest version of Csound, as it causes dropouts in the audio. When implemented by this author, the crossfades do reduce the discontinuities to a degree, depending on the frequency content/bandwidth of the source (narrow-band sources are still affected by the abrupt FIR filter switching; noisier sources may mask the switch). For all sources, however, jumps in location may perceptually imply a staggered path, when a smooth trajectory is more desirable. This opcode could greatly benefit from an interpolation algorithm, as discussed in the previous chapter. Also, hrtfer uses the compact set of HRTFs, when perhaps the diffuse set is now more appropriate.
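The two behaviours described above, snapping to the nearest measured point and crossfading between the old and new convolution results, can be illustrated with a minimal sketch. It is not the hrtfer source code: the function names are hypothetical, the regular 5 degree azimuth grid is an assumption made for illustration, and the fade is a simple linear ramp.

```python
import math


def nearest_measured_azimuth(azimuth_deg, step_deg=5.0):
    """Snap a requested azimuth to the nearest point of a regular grid.

    The 5 degree grid is an assumption for illustration only.
    """
    index = math.floor(azimuth_deg / step_deg + 0.5)
    return (index * step_deg) % 360.0


def crossfade(old_block, new_block):
    """Linearly fade out old_block while fading in new_block."""
    n = len(old_block)
    if n != len(new_block):
        raise ValueError("blocks must have equal length")
    if n == 1:
        return [float(new_block[0])]
    out = []
    for i in range(n):
        w = i / (n - 1)  # ramps from 0.0 to 1.0 across the block
        out.append((1.0 - w) * old_block[i] + w * new_block[i])
    return out


# A smoothly moving source is rendered as a staggered path.
path = [nearest_measured_azimuth(a) for a in (0.0, 2.0, 4.0, 6.0, 8.0)]
```

Here `path` holds 0, 0, 5, 5 and 10 degrees for a source that actually moves in 2 degree steps, which is the staggered trajectory noted above. The crossfade smooths the switch between filter outputs but does nothing to the positions themselves.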
3.2.2 earplug~ for PD
PD offers a more visual default editor than Csound, allowing users to edit a canvas/patch, using object boxes not unlike Csound opcodes. One such object is
earplug~ [229], which does offer an interpolation algorithm. The four nearest measured HRIRs are interpolated, and an HRIR is thus derived for each processing block (64 samples, as per PD’s default). Previously interpolated HRIRs are stored and a similar interpolation is performed between the current and previous HRIR (over time in this case; essentially fading out the old and fading in the new, a computationally costly process). This interpolation is performed in the time domain (on empirical HRIRs), the issues with which have been discussed in the previous chapter. Also, as discussed in chapter 1, time domain convolution (of 128 point interpolated HRIRs) is considerably more computationally costly than convolution performed in the frequency domain.
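As a rough illustration of time domain HRIR interpolation followed by direct convolution, consider the sketch below. It is not taken from the earplug~ source: the bilinear weighting of four neighbouring HRIRs is an assumption, and all function names are hypothetical.

```python
def bilinear_weights(az_frac, el_frac):
    """Weights for four neighbouring grid points (assumed bilinear scheme).

    az_frac and el_frac are the fractional positions (0.0 to 1.0) of the
    source between the surrounding azimuth and elevation grid lines.
    """
    return [
        (1.0 - az_frac) * (1.0 - el_frac),
        az_frac * (1.0 - el_frac),
        (1.0 - az_frac) * el_frac,
        az_frac * el_frac,
    ]


def interpolate_hrirs(hrirs, weights):
    """Sample-by-sample weighted sum of equal-length HRIRs."""
    length = len(hrirs[0])
    return [sum(w * h[i] for w, h in zip(weights, hrirs)) for i in range(length)]


def convolve_direct(signal, hrir):
    """Direct (time domain) convolution: one multiply-add per sample per tap."""
    out = [0.0] * (len(signal) + len(hrir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(hrir):
            out[n + k] += x * h
    return out
```

With a 64 sample block and a 128 point HRIR, `convolve_direct` performs 64 × 128 = 8192 multiply-adds per ear per block, and a crossfade between the previous and current HRIR requires both filters to be run, which gives a sense of where the cost arises.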
3.2.3 Virtual Loudspeakers: iem_bin_ambi
In [148], a virtual loudspeaker approach is taken, implemented in PD. Essentially, static HRTFs are used to spatialise sources at loudspeaker positions. This paradigm will be further discussed in chapter 6. Briefly, in a static listener scenario, this approach removes the need for HRTF interpolation; source movement can be controlled by the multi-channel signal feeding the virtual loudspeakers. In this case, ambisonics is chosen as the multichannel algorithm. Inherently, any imperfections of the multichannel approach will be reflected in the binaural reproduction. The iem_bin_ambi objects realise this algorithm.
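The virtual loudspeaker idea can be sketched as follows. This is an illustrative first order, horizontal-only example and not the iem_bin_ambi implementation: the W channel scaling, the basic projection decoder for a regular array, and all function names are assumptions.

```python
import math


def encode_first_order(azimuth_deg):
    """Encode a source direction into horizontal first order channels (W, X, Y).

    The 1/sqrt(2) scaling of W is an assumed convention.
    """
    a = math.radians(azimuth_deg)
    return (1.0 / math.sqrt(2.0), math.cos(a), math.sin(a))


def decode_to_speakers(bformat, speaker_azimuths_deg):
    """Basic projection decode to a regular ring of virtual loudspeakers."""
    w, x, y = bformat
    n = len(speaker_azimuths_deg)
    gains = []
    for s in speaker_azimuths_deg:
        a = math.radians(s)
        gain = (math.sqrt(2.0) * w + 2.0 * (x * math.cos(a) + y * math.sin(a))) / n
        gains.append(gain)
    return gains


def binaural_filter(gains, speaker_hrirs):
    """Combine the static per-loudspeaker HRIRs (one ear) using the gains."""
    length = len(speaker_hrirs[0])
    return [sum(g * h[i] for g, h in zip(gains, speaker_hrirs)) for i in range(length)]


# Moving the source only changes the gains; the HRIRs never change.
gains_front = decode_to_speakers(encode_first_order(0.0), [0.0, 90.0, 180.0, 270.0])
```

Only the loudspeaker gains depend on the source position, so no HRTF interpolation is needed. The negative gain at the rear loudspeaker in `gains_front` is a property of the basic first order decode, and is one example of a multichannel imperfection that carries through to the binaural result.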
3.2.4 More Recent Approaches
The work on IIR HRTF filters discussed in the previous chapter is presented as a PD external, mobile~ [172, 173] (under development). In [181], the virtual ambisonics approach (implemented as the Girafe system) is discussed, with a view to implementation in SuperCollider: a dynamically typed, client/server based audio processing language.
Perhaps most interesting is the minimum-phase based CW_binaural~, presented at the PD convention in July 2009 [54]. The research paper presenting this work clearly highlights the issues previously raised with minimum-phase interpolation implementation. The object is designed using an object-oriented paradigm, to allow for updates; its authors underline the continuously developing nature of the research field, echoing the conclusions to the previous chapter. Currently, cross-correlation is used to estimate ITD, but the desirability of supporting various methods is highlighted. The authors also highlight the potential timbral distortion imposed by using simpler approaches to delay lines. More complex methods will reduce this distortion, but are more costly. In fact, in the benchmarking tests performed, peak CPU usage is the same for independent HRTF filtering (an FFT-based 128-point filter) as it is for implementation of an independent 3rd order delay (3.12 peak CPU usage). A significant reduction is reported when using linear interpolation (1.56 peak usage, only slightly larger than with non-fractional delays). In another interesting CPU usage test, 6.24 peak usage is reported for FFT based processing of the complete process and 20.2 for time-domain convolution processing. It is also important to consider real-time performance when implementing delays; higher order delay line interpolation algorithms typically require use of past and future samples, increasing latency. Using previous samples only makes the process causal.
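Two of the operations mentioned above, ITD estimation by cross-correlation and a fractional delay using linear interpolation, can be sketched as below. This is a minimal illustration and not the CW_binaural~ code; the function names and the brute-force lag search are assumptions.

```python
def estimate_itd_samples(left, right, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    A positive result means the right signal lags the left one.
    """
    best_lag = 0
    best_value = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for n in range(len(left)):
            m = n + lag
            if 0 <= m < len(right):
                total += left[n] * right[m]
        if total > best_value:
            best_value = total
            best_lag = lag
    return best_lag


def delay_linear(signal, delay):
    """Delay a signal by a non-negative, possibly fractional, sample count.

    Only the current and earlier input samples are read, so the process
    is causal. Samples before the start of the signal are taken as zero.
    """
    whole = int(delay)
    frac = delay - whole
    out = []
    for n in range(len(signal)):
        i = n - whole
        current = signal[i] if 0 <= i < len(signal) else 0.0
        previous = signal[i - 1] if 0 <= i - 1 < len(signal) else 0.0
        out.append((1.0 - frac) * current + frac * previous)
    return out
```

Delaying a unit impulse by 1.5 samples with `delay_linear` spreads it equally over two output samples. This averaging acts as a gentle low-pass filter, which is one way to see the timbral distortion attributed to simpler delay line approaches.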
It is also worth pointing out that commercial, proprietary solutions are not discussed here, as the source code/algorithms used are not available. However, perhaps the most commonly used commercial tool in Computer Music, spat~ for Max/MSP, is based on Jot’s research, which is abundantly referenced in this work.
In conclusion, several solutions exist in the Computer Music domain (judging by recent trends, it is hoped that activity in the area will continue to flourish). In light of the literature reviewed in the previous chapter, there is, however, a need for an approach that avoids the minimum-phase approximation, minimises any data preparation/processing/compression (primarily to allow for direct creative use by non-experts), allows for smooth movement of sources, is efficient, and is supported in a flexible and dynamic environment.