Analysis of Results and Recommendations - The composition and performance of spatial music

The experiments discussed in the preceding sections illustrate the wide range of interrelated factors which influence the perceived performance of spatialization techniques such as stereophony, Ambisonics and WFS. Virtual sources positioned using multichannel stereophonic techniques have been found to be unstable if the source is not positioned at a loudspeaker. This is true at all listener positions for lateral and rear sources, and also for front sources if the listener is displaced laterally towards one of the loudspeakers. However, although source localization often collapses to the nearest loudspeaker, this will always be to one of the loudspeakers in the pair on either side of the desired source position. Therefore, increasing the number of loudspeakers will also increase the overall localization accuracy, and six loudspeakers seem to be the minimum number of channels required for reasonable accuracy in all directions, at least for a single listener. The difference between virtual sources positioned at, and between, loudspeakers also affects dynamically moving sources, and a number of studies have found that trajectories created with stereophony therefore tend to highlight the positions of the loudspeakers.
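The collapse of a virtual source towards the loudspeaker pair on either side of it can be seen directly in the panning gains. The following Python sketch is a minimal two-dimensional version of pair-wise amplitude panning in the style of Pulkki's VBAP, assuming a regular horizontal ring with loudspeaker i at azimuth 360·i/N degrees; the function name `pairwise_pan` is hypothetical and not taken from any particular tool.

```python
import math


def pairwise_pan(source_deg, n_speakers):
    """Pair-wise (2D VBAP-style) amplitude panning on a regular horizontal ring.

    Assumes loudspeaker i sits at azimuth i * 360 / n_speakers degrees.
    Returns one gain per loudspeaker; at most two are non-zero.
    """
    if n_speakers < 3:
        raise ValueError("need at least three loudspeakers to surround the listener")
    spacing = 360.0 / n_speakers
    az = source_deg % 360.0
    i = int(az // spacing) % n_speakers  # loudspeaker on one side of the source
    j = (i + 1) % n_speakers             # its neighbour on the other side
    a1 = math.radians(i * spacing)
    a2 = math.radians(j * spacing)
    px, py = math.cos(math.radians(az)), math.sin(math.radians(az))
    # Solve p = g1 * l1 + g2 * l2 for the two gains (explicit 2x2 inverse).
    det = math.cos(a1) * math.sin(a2) - math.sin(a1) * math.cos(a2)
    g1 = (px * math.sin(a2) - py * math.cos(a2)) / det
    g2 = (py * math.cos(a1) - px * math.sin(a1)) / det
    norm = math.hypot(g1, g2)            # constant-power normalization
    gains = [0.0] * n_speakers
    gains[i] = g1 / norm
    gains[j] = g2 / norm
    return gains


print(pairwise_pan(30.0, 6))  # source midway between the 0 and 60 degree loudspeakers
print(pairwise_pan(60.0, 6))  # source exactly at a loudspeaker
```

For a hexagon, a source at 30° receives equal gains of about 0.707 from the loudspeakers at 0° and 60°, while a source at 60° is reproduced by that loudspeaker alone. Because no more than two loudspeakers are ever active, a moving source alternates between single-loudspeaker and phantom images, which is one way of seeing why stereophonic trajectories expose the loudspeaker positions.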

It is clear that for Ambisonics, the number and arrangement of loudspeakers in the array is particularly important. A regular hexagonal array was found to produce much better results than either square or rectangular layouts, particularly for lateral sources, and so can be taken as a minimum specification, as with stereophony. Likewise, the precedence effect influences localization accuracy in much the same way as with stereophony. The results of a number of tests clearly indicate that different decoder designs are optimal depending on whether playback is for a single listener or a group of listeners. A full-band max rE decoder (or in-phase, if the audience is very near the array) has been shown to be the optimal solution for larger listening areas, as these decoders reduce (or, in the case of in-phase, eliminate) the anti-phase components which aid localization at the centre point but significantly distort it at other listener positions.

The results of a number of studies also seem to confirm that localization accuracy improves when the order of the Ambisonics system is increased [Pulkki et al, 2005; Jot et al, 1999; Daniel, 2000]. This result was largely expected, as an increase in order represents an increase in the spatial resolution of the spherical harmonic representation of the reconstructed sound field. The directional information carried by the spherical harmonics is therefore represented more accurately and, consequently, localization accuracy improves. A number of studies have also clearly demonstrated that the size of the effective listening area increases with the order of the system [Bates et al, 2007b; Frank et al, 2008]. A number of theoretical studies indicate that Ambisonics can only perfectly reconstruct a sound field in a very small area at the centre of the array, and only up to a certain frequency [Poletti, 1996; Daniel et al, 1998; Bamford, 1995]. Daniel suggests that the reconstructed sound field becomes increasingly and linearly distorted away from the centre point. Consequently, if the system order (and hence the reconstruction frequency limit at the centre point) is increased, the accuracy of the reconstructed sound field at other, off-centre positions will also increase [Daniel et al, 1998].
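The link between order and the size of the accurately reconstructed region is often summarized by the rule of thumb kr ≤ N, where k = 2πf/c is the wavenumber, r the distance from the centre of the array and N the order. The Python sketch below simply evaluates this rule; it is an approximation rather than a perceptual model, the function names are hypothetical, and the speed of sound of 343 m/s is an assumed value.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def accurate_radius(order, freq_hz, c=SPEED_OF_SOUND):
    """Radius (m) around the centre within which the kr <= N rule holds."""
    k = 2.0 * math.pi * freq_hz / c  # wavenumber in rad/m
    return order / k


def accurate_upper_freq(order, radius_m, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) for which kr <= N holds over the given radius."""
    return order * c / (2.0 * math.pi * radius_m)


for n in (1, 2, 3):
    print(f"order {n}: radius {100 * accurate_radius(n, 1000.0):.1f} cm at 1 kHz")
```

Under this rule a first-order system is accurate to a radius of only about 5.5 cm at 1 kHz, and the radius grows linearly with order. Over a head-sized region (an assumed radius of 8.75 cm), first order holds only up to roughly 620 Hz, which is consistent with the small central sweet spot described above.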

Ambisonics is generally preferred to stereophony for dynamically moving sources, as it produces smooth trajectories that do not highlight the positions of the loudspeakers. However, because every virtual source is produced by several loudspeakers at once, Ambisonics will never produce a source using a single loudspeaker and so, unlike stereophony, cannot produce the most tightly focussed virtual image possible. Increasing the order of the system reduces this effect, as the increase in directivity reduces the number of loudspeakers which are active at any one time. There is some evidence that a similar trade-off is apparent with recorded sounds: in one test, Soundfield microphone recordings were found to be more spacious and enveloping than stereophonic recordings, but also less accurate in terms of directional localization.
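The effect of decoder weighting and order on the number of active loudspeakers, and the anti-phase components of a basic decoder mentioned earlier, can be sketched with the horizontal-only (2D) Ambisonic panning function for a regular ring of L loudspeakers, g_i = (1/L)[w_0 + 2 Σ_m w_m cos(m(φ_i − θ))]. The per-order weights below are the commonly quoted 2D forms: all ones for a basic decoder, cos(mπ/(2N+2)) for max rE, and (N!)²/((N+m)!(N−m)!) for in-phase. This is a single-band sketch that ignores shelf filtering and near-field compensation; the function names are hypothetical, and the 10% threshold used to count "active" loudspeakers is an arbitrary choice for illustration.

```python
import math


def order_weights(order, scheme):
    """Per-order weights w_0..w_N for a 2D (horizontal) Ambisonic decoder."""
    if scheme == "basic":
        return [1.0] * (order + 1)
    if scheme == "max_re":
        return [math.cos(m * math.pi / (2 * order + 2)) for m in range(order + 1)]
    if scheme == "in_phase":
        f = math.factorial
        return [f(order) ** 2 / (f(order + m) * f(order - m)) for m in range(order + 1)]
    raise ValueError(f"unknown scheme: {scheme}")


def speaker_gains(source_deg, n_speakers, order, scheme="basic"):
    """Gain of each loudspeaker on a regular ring (loudspeaker i at 360*i/L degrees)."""
    w = order_weights(order, scheme)
    theta = math.radians(source_deg)
    gains = []
    for i in range(n_speakers):
        phi = 2.0 * math.pi * i / n_speakers
        s = w[0] + 2.0 * sum(w[m] * math.cos(m * (phi - theta)) for m in range(1, order + 1))
        gains.append(s / n_speakers)
    return gains


def active_count(gains, rel_threshold=0.1):
    """Number of loudspeakers whose |gain| exceeds rel_threshold times the peak."""
    peak = max(abs(g) for g in gains)
    return sum(1 for g in gains if abs(g) > rel_threshold * peak)


for scheme in ("basic", "max_re", "in_phase"):
    rear = speaker_gains(0.0, 8, 1, scheme)[4]  # loudspeaker at 180 degrees
    print(f"first order {scheme}: rear gain {rear:+.3f}")
```

With eight loudspeakers and a source at 0°, the basic first-order decoder gives the rear loudspeaker a negative (anti-phase) gain of −0.125, max rE reduces it to about −0.05, and in-phase removes it entirely. Raising the max rE decoder from first to third order reduces the number of loudspeakers above the threshold from six to three, which reflects the increase in directivity described above.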

The simulation of distance appears to be largely dependent on the addition of artificial reverberation. Other processes such as the Doppler effect, air absorption, or wavefront curvature (as in WFS) do not seem to be able to produce a reliable perception of distance on their own, but are important as secondary distance cues. While a straightforward ratio between the direct and diffuse reverberant signals can produce a relative sense of distance, the results of a number of tests indicate that early reflections are preferable to purely specular reflections [Martin et al, 2001]. The addition of artificial reverberation clearly supports distance modelling, but its effect on directional localization is unclear. Some have suggested that these additional indirect signals reduce localization accuracy, especially at off-centre listening positions [Begault, 1992], while others have argued the exact opposite, namely that the increased realism of such a sound scene benefits localization [Lund, 2000].
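The straightforward ratio between the direct and diffuse reverberant signals can be sketched as follows, assuming the direct sound obeys the inverse-distance law while the diffuse reverberant level stays constant. The function name, the critical distance (where direct and reverberant levels are equal) and the reference distance are hypothetical parameters chosen for illustration, and the sketch deliberately omits early reflections, air absorption and the Doppler effect.

```python
import math


def distance_cues(distance_m, critical_distance_m=2.0, ref_distance_m=1.0):
    """Return (direct_gain, reverb_gain, drr_db) for a source at distance_m.

    Assumes the direct sound follows the inverse-distance law (about -6 dB
    per doubling of distance) and the diffuse reverberant level is constant,
    equal to the direct level at the assumed critical distance.
    """
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    direct = ref_distance_m / distance_m
    reverb = ref_distance_m / critical_distance_m
    drr_db = 20.0 * math.log10(direct / reverb)  # direct-to-reverberant ratio
    return direct, reverb, drr_db


for d in (1.0, 2.0, 4.0, 8.0):
    direct, reverb, drr = distance_cues(d)
    print(f"{d:4.1f} m: direct {direct:.3f}, reverb {reverb:.3f}, D/R {drr:+.1f} dB")
```

Each doubling of distance lowers the direct-to-reverberant ratio by about 6 dB while the reverberant level is unchanged, which yields only the relative sense of distance described above.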

WFS is in many respects very different from stereophony or Ambisonics and is certainly much more demanding in technical terms. While the results of various listening tests seem to suggest that well-localized virtual sources can be created with WFS, this is very much dependent on the spatial aliasing frequency, and hence on the size of the array. In addition, questions remain as to how focussed these virtual sources are, and as to the contribution of the precedence effect to localization with WFS. While it appears that WFS can be used to increase the effective listening area of other spatialization techniques such as two-channel stereo or 5.1 surround sound, extending this to a full, large-scale system is difficult, if for no other reason than the many, many loudspeakers which would be required to surround a large audience. In addition, the results of listening tests do not seem to support the claim that WFS can position sources behind or in front of the array through the reproduction of the correct wavefront curvature. The one notable exception is when the listener can move through the listening area; otherwise, WFS systems must use artificial reflections and reverberation to simulate different source distances, in much the same way as other spatialization techniques. These results indicate that WFS is perhaps not, at the moment at least, the most suitable system for the presentation of spatial music, as the perceptible benefits do not seem to justify the vastly increased technical requirements.
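The dependence on spatial aliasing, and the number of loudspeakers it implies, can be made concrete with the common worst-case approximation f_al ≈ c/(2Δx), where Δx is the loudspeaker spacing. The Python sketch below applies this approximation together with a simple count of the loudspeakers needed to surround an audience on a circle at that spacing; the worst-case form of the formula, the function names and c = 343 m/s are assumptions for illustration.

```python
import math


def aliasing_frequency(spacing_m, c=343.0):
    """Worst-case spatial aliasing frequency (Hz) for loudspeaker spacing spacing_m."""
    return c / (2.0 * spacing_m)


def loudspeakers_for_circle(radius_m, spacing_m):
    """Loudspeakers needed to cover a circle of radius_m at the given spacing."""
    return math.ceil(2.0 * math.pi * radius_m / spacing_m)


for radius in (2.0, 5.0, 10.0):
    n = loudspeakers_for_circle(radius, 0.17)
    print(f"radius {radius:4.1f} m: {n} loudspeakers for ~{aliasing_frequency(0.17):.0f} Hz")
```

A spacing of 17 cm gives an aliasing frequency of only about 1 kHz, yet a circle of radius 10 m at that spacing already needs 370 loudspeakers, which illustrates the scaling problem for large audiences.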

6.8.1 Discussion

The preceding discussion illustrates the difficulties in presenting spatial audio to multiple listeners. The influence of the precedence effect is particularly noticeable for off-centre listeners, and it appears that a high degree of directional localization accuracy can only really be achieved for every listener if a single loudspeaker is used. Spatialization techniques such as pair-wise amplitude panning, and to a lesser extent higher order Ambisonics, produce the next best results, as the number of contributing loudspeakers is restricted and these loudspeakers are situated in the same approximate direction as the source. As lower order Ambisonics systems generally utilize every loudspeaker to produce the virtual image, they are particularly susceptible to localization distortion due to the precedence effect. WFS, while appropriate for certain applications, does not yet seem to be viable for presentations of spatial music due to the technical and logistical constraints and the limited benefits.
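The precedence effect problem for off-centre listeners can be illustrated by computing when each loudspeaker's signal reaches the listener relative to the earliest arrival. Delays beyond roughly 1 ms, a commonly quoted upper limit for summing localization, tend to pull the image towards the earliest-arriving loudspeaker. The Python sketch below assumes a purely horizontal circular array fed with the same signal (differing only in gain), and the ring radius, listener positions, function names and 1 ms limit are all assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_delays_ms(listener_xy, speaker_az_deg, radius_m, c=SPEED_OF_SOUND):
    """Arrival time (ms) of each loudspeaker's signal relative to the earliest.

    Loudspeakers sit on a circle of radius_m centred on the origin;
    listener_xy is the listener's (x, y) position in metres.
    """
    lx, ly = listener_xy
    dists = []
    for az in speaker_az_deg:
        sx = radius_m * math.cos(math.radians(az))
        sy = radius_m * math.sin(math.radians(az))
        dists.append(math.hypot(sx - lx, sy - ly))
    nearest = min(dists)
    return [1000.0 * (d - nearest) / c for d in dists]


def late_arrivals(delays_ms, limit_ms=1.0):
    """Indices of loudspeakers arriving later than the assumed summing limit."""
    return [i for i, d in enumerate(delays_ms) if d > limit_ms]


ring = [45.0 * i for i in range(8)]
print(arrival_delays_ms((0.0, 0.0), ring, 5.0))                 # centre seat
print(late_arrivals(arrival_delays_ms((1.0, 0.0), ring, 5.0)))  # 1 m off centre
```

For a listener only 1 m off centre in a ring of radius 5 m, the loudspeaker directly opposite arrives almost 6 ms after the nearest one, well outside the summing range, whereas at the centre all arrivals coincide. The more loudspeakers a technique drives for a single source, the more of these late arrivals there are to compete with the intended image.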

It is also clear that Ambisonics is consistently preferred to stereophony for moving sources, as it disguises the positions of the loudspeakers, which results in a smoother trajectory. A number of non-standard amplitude panning techniques have been developed which attempt to overcome this problem by increasing the number of loudspeakers used at any one time. While these techniques certainly appear to improve matters for dynamically moving sources, it is not clear if they provide any advantage over a high order Ambisonics system, other than the fact that they can readily collapse the virtual image to a single loudspeaker. Max rE and in-phase decoding schemes, without shelf filtering, appear to be the optimal decoding schemes for larger listening areas.
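One example of this kind of non-standard amplitude panning is Pulkki's multiple-direction amplitude panning (MDAP), which pans several virtual sources spread around the target direction and sums their gains. The Python sketch below is a simplified, horizontal-only version on a regular ring; the function names, the spread and the number of virtual sources are illustrative assumptions, and it is not necessarily one of the specific techniques evaluated in the studies discussed here.

```python
import math


def pairwise_pan(source_deg, n_speakers):
    """2D pair-wise (VBAP-style) gains on a regular ring; at most two are non-zero."""
    if n_speakers < 3:
        raise ValueError("need at least three loudspeakers")
    spacing = 360.0 / n_speakers
    az = source_deg % 360.0
    i = int(az // spacing) % n_speakers
    j = (i + 1) % n_speakers
    a1, a2 = math.radians(i * spacing), math.radians(j * spacing)
    px, py = math.cos(math.radians(az)), math.sin(math.radians(az))
    det = math.cos(a1) * math.sin(a2) - math.sin(a1) * math.cos(a2)
    g1 = (px * math.sin(a2) - py * math.cos(a2)) / det
    g2 = (py * math.cos(a1) - px * math.sin(a1)) / det
    norm = math.hypot(g1, g2)
    gains = [0.0] * n_speakers
    gains[i], gains[j] = g1 / norm, g2 / norm
    return gains


def mdap_pan(source_deg, n_speakers, spread_deg, n_virtual=9):
    """Sum pair-wise gains for n_virtual directions spread evenly over
    source_deg +/- spread_deg, then renormalize to constant power."""
    total = [0.0] * n_speakers
    for k in range(n_virtual):
        offset = (-spread_deg + 2.0 * spread_deg * k / (n_virtual - 1)) if n_virtual > 1 else 0.0
        for idx, g in enumerate(pairwise_pan(source_deg + offset, n_speakers)):
            total[idx] += g
    norm = math.sqrt(sum(g * g for g in total))
    return [g / norm for g in total]


print([round(g, 3) for g in mdap_pan(0.0, 8, 60.0)])
```

With `spread_deg=0` the result reduces to ordinary pair-wise panning, while a spread of 60° on an eight-loudspeaker ring activates five loudspeakers for a frontal source. Setting the spread back to zero is what allows such techniques to collapse the image onto a single loudspeaker when required.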

The results presented in this chapter suggest that a minimum of six loudspeakers is required to produce optimal results for a single listener with either stereophony or Ambisonics. An eight channel system would therefore seem to represent an acceptable minimum layout for larger numbers of listeners, as it contains eight discrete spatial locations to which sources will be localized with a good degree of accuracy, it is sufficient for third order Ambisonics, and it is reasonably achievable in terms of hardware. It has been the experience of the author that quadraphonic systems produce extremely poorly localized lateral virtual sources when extended for larger numbers of listeners, and that movements from front to back instead switch abruptly between each position (see Section 10.2). While an additional pair of lateral loudspeakers alleviates this issue somewhat, the wide angle between lateral loudspeaker pairs is still problematic. An eight channel system contains a pair of lateral loudspeakers, and this provides a more useful degree of discrimination in lateral positions and movements while still being reasonably efficient and economical. For these reasons, the author has adopted an eight channel loudspeaker array as a standard system for performances of spatial music.
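The arithmetic behind this choice can be checked with two small helpers: the common rule that a horizontal Ambisonic system of order N needs at least 2N + 1 loudspeakers, and the widest angular gap between adjacent loudspeakers in a layout. The function names are hypothetical, and the layouts used in the example (a quad at ±45° and ±135°, and a regular octagon) are assumptions rather than the author's exact arrays.

```python
def max_horizontal_order(n_speakers):
    """Highest 2D (horizontal-only) Ambisonic order a regular ring supports,
    using the common rule that order N needs at least 2N + 1 loudspeakers."""
    if n_speakers < 3:
        raise ValueError("need at least three loudspeakers")
    return (n_speakers - 1) // 2


def largest_gap_deg(azimuths_deg):
    """Widest angular gap (degrees) between adjacent loudspeakers in a ring."""
    a = sorted(x % 360.0 for x in azimuths_deg)
    gaps = [b - c for b, c in zip(a[1:], a[:-1])]
    gaps.append(360.0 - a[-1] + a[0])  # wrap-around gap between last and first
    return max(gaps)


quad = [45.0, 135.0, 225.0, 315.0]
octagon = [45.0 * i for i in range(8)]
for name, layout in (("quad", quad), ("octagon", octagon)):
    print(name, max_horizontal_order(len(layout)), largest_gap_deg(layout))
```

By this rule six loudspeakers support second order and eight support third order, while a square quad leaves 90° gaps, including across each side of the audience, compared with 45° for a regular octagon.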

6.8.2 Implications

The results presented in the preceding section suggest that it is very difficult to produce spatial locations and trajectories which are unambiguously perceived by every listener in the same way. Even in the case of point sources which are clearly localized, each listener will be orientated differently with regard to the loudspeaker array, and so will have a different perspective on the spatial layout. As noted earlier, directional localization accuracy is the main topic under investigation in many of these tests, but this is not necessarily the only way in which space can be utilized in a musical composition. Indeed, the results presented earlier suggest that exploring these other uses of space may in fact be a necessity. However, it is just as important to know whether these other uses of space are clearly perceptible to an audience and, if so, which spatialization technique can achieve this most effectively, if at all. Clearly, Ambisonics is the preferred spatialization technique for dynamically moving sources. However, it is also clear that the precise trajectory perceived by each listener will be strongly influenced by their position within the array.

When a recorded sound is to be used in a spatial music composition and an enveloping sound field is required, the ambisonic Soundfield microphone represents the most flexible recording option. However, if a more directional diffusion is required, then monophonic or stereophonic microphone techniques are perhaps more applicable, as multi-channel microphone techniques, although they can be very effective, are tied to a specific reproduction layout.

While many composers continue to utilize various multi-channel techniques, others have adopted an entirely different approach based upon a single two-channel stereo source and a large, disparate collection of spatially distributed pairs of loudspeakers, i.e. a loudspeaker orchestra. This aesthetic represents a very different approach to the multi-channel techniques discussed in the preceding chapters. However, the art of diffusion is admirably focussed on the perception of the audience and the real technical problems which arise in these kinds of performances, something which is often lacking in multi-channel tape compositions.

The second half of this thesis will focus on spatial music composition through the analysis of a number of different composers and aesthetics, and some original compositions by the author. Different approaches to the use of space as a musical parameter will be assessed in terms of the technical and perceptual research presented in the preceding chapters. Inevitably, greater emphasis will be placed on music from the twentieth century, as many significant aspects of spatial music depend on technical developments from this era; however, spatial music is not solely a twentieth-century phenomenon. The spatial distribution of performers has been used for centuries in European religious choral music, and this antiphonal style is itself derived from the even more ancient call-and-response form. The next chapter will examine this early form of spatial music and investigate the development of acoustic spatial music in the first half of the twentieth century, prior to the development of recording and amplification technology and electronic spatialization techniques.

7 Acoustic Spatial Music

Spatial music is often closely associated with technological developments in the twentieth century, yet the use of space as a musical parameter is much older. Call-and-response patterns can be found throughout history in many different cultures and musical traditions. In this dialogue form, musical material is divided between two groups, which will necessarily be situated at two different spatial locations. Call-and-response patterns therefore represent the most basic form of spatial music, and they are a fundamental aspect of the earliest formalized system of spatial music, antiphonal choral music.
