
3.3 Reproduction Techniques

3.3.1 Ambisonics

3.3.1.1 Ambisonics: Fundamentals

The ambisonic system was the first technique developed to reproduce an entire soundfield. The simplest form is first-order ambisonics. Higher-order ambisonics extends this by splitting the soundfield into a series of spherical harmonic functions, which raises the upper frequency that can be accurately reproduced and enlarges the region in which the soundfield is accurately reproduced. This region is known as the ‘sweet spot’, and its size may be calculated from the system order [63].
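The formula used in [63] is not given in the text. As a rough illustration only, the sketch below assumes the widely quoted rule of thumb that an order-N system reproduces the soundfield accurately where kr ≤ N, with wavenumber k = 2πf/c. The function names and the speed of sound (343 m/s) are assumptions made for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def sweet_spot_radius(order: int, frequency_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Radius r (in metres) at which k*r equals the order, with k = 2*pi*f/c.

    This is the kr <= N rule of thumb, assumed here; it is not taken from [63].
    """
    if order < 0 or frequency_hz <= 0:
        raise ValueError("order must be non-negative and frequency positive")
    return order * c / (2.0 * math.pi * frequency_hz)


def upper_frequency(order: int, radius_m: float, c: float = SPEED_OF_SOUND) -> float:
    """Highest frequency (Hz) satisfying k*r <= order for a region of the given radius."""
    if order < 0 or radius_m <= 0:
        raise ValueError("order must be non-negative and radius positive")
    return order * c / (2.0 * math.pi * radius_m)
```

Under this assumption, the radius grows linearly with order at a fixed frequency, so a third-order system would give a sweet spot three times the radius of a first-order system.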

Zero-order ambisonics accounts only for the pressure component (W). For first-order ambisonic recording, the soundfield is stored in ‘b-format’, which consists of four channels: W (omnidirectional information) and X, Y, Z (directional information from figure-of-eight microphones) [87]. These are then derived as:
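The encoding equations themselves are missing from this copy. The conventional first-order b-format encoding, which matches the symbols defined in the text, is assumed here; in this convention the 1/√2 factor is applied to W only:

```latex
\begin{align}
W &= \frac{1}{k}\sum_{i=1}^{k} s_i \,\frac{1}{\sqrt{2}} \\
X &= \frac{1}{k}\sum_{i=1}^{k} s_i \cos\theta_i \cos\Phi_i \\
Y &= \frac{1}{k}\sum_{i=1}^{k} s_i \sin\theta_i \cos\Phi_i \\
Z &= \frac{1}{k}\sum_{i=1}^{k} s_i \sin\Phi_i
\end{align}
```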

Here i indexes the sources, k is the number of sources, s_i is the mono signal of the i-th source, and θ_i and Φ_i are the azimuth (horizontal direction) and elevation of the source respectively. The 1/√2 weighting (on the W channel) is an engineering solution required, particularly with synthesised soundfields containing many sound sources, to achieve a better balance of signal levels over the four channels.

Every loudspeaker in the array receives all four channels, each channel weighted according to the position of the loudspeaker:

Here L is the number of loudspeakers (which must be at least the number of ambisonic channels, four), and θ_j and Φ_j are the azimuth (horizontal direction) and elevation of the j-th loudspeaker.
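The decoding equation is also missing from this copy. The sketch below illustrates the encode and decode steps described above under stated assumptions: the encoder uses the conventional first-order form with 1/√2 on W, and the decoder is a simple projection in which each loudspeaker feed is the sum of the four channels weighted by the loudspeaker direction and divided by L. The function names, the W weighting in the decoder and the ring starting at 0° are choices made for this example; practical decoders (including those used in this study) may apply different weights.

```python
import math


def encode_b_format(sources):
    """Encode sources into first-order b-format (W, X, Y, Z).

    sources: list of (sample, azimuth_rad, elevation_rad) tuples.
    The 1/k normalisation and 1/sqrt(2) weighting on W are assumed conventions.
    """
    k = len(sources)
    if k == 0:
        raise ValueError("at least one source is required")
    w = sum(s / math.sqrt(2.0) for s, _, _ in sources) / k
    x = sum(s * math.cos(az) * math.cos(el) for s, az, el in sources) / k
    y = sum(s * math.sin(az) * math.cos(el) for s, az, el in sources) / k
    z = sum(s * math.sin(el) for s, _, el in sources) / k
    return w, x, y, z


def decode_b_format(b_format, speakers):
    """Return one feed per loudspeaker from a (W, X, Y, Z) tuple.

    speakers: list of (azimuth_rad, elevation_rad) tuples.
    Every loudspeaker receives all four channels, weighted by its direction.
    """
    w, x, y, z = b_format
    n = len(speakers)
    if n < 4:
        raise ValueError("at least four loudspeakers are required")
    return [
        (w / math.sqrt(2.0)
         + x * math.cos(az) * math.cos(el)
         + y * math.sin(az) * math.cos(el)
         + z * math.sin(el)) / n
        for az, el in speakers
    ]


# Illustrative horizontal ring of eight loudspeakers at 45 degree spacing.
ring = [(math.radians(45.0 * j), 0.0) for j in range(8)]
# A single unit-amplitude source at 90 degrees azimuth, zero elevation.
feeds = decode_b_format(encode_b_format([(1.0, math.radians(90.0), 0.0)]), ring)
```

With these assumed weights, the loudspeaker at the source direction receives the largest feed and the opposite loudspeaker receives a smaller, phase-inverted feed.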

The scalability of ambisonics, with the introduction of higher orders, allows the soundfield approximation to be optimised for given hardware, ergonomic or packaging constraints. It offers accurate reproduction, and hence localisation, without requiring vast numbers of loudspeakers. Ambisonic recordings are straightforward to manipulate, and the encoded material is usable on any ambisonic system.

Higher-order ambisonics in 2D takes the form of an equatorial ring. Since humans are more sensitive to localisation within the horizontal plane, 2D higher-order ambisonics is an effective auralisation technique, especially if some elevation is produced by psychoacoustic effects or by vector-based panning with some additional elevated loudspeakers.

3.3.1.2 Choosing a 2D Ambisonic Configuration

A 2D Ambisonic configuration was chosen for two reasons. Primarily, it is more practical and portable than a 3D configuration since the eight loudspeakers can be mounted on portable speaker stands rather than requiring a large and complex custom rig. Portability was deemed important for the later stages of testing that may have required building the Ambisonic rig in various public spaces. (A later change in method gave rise to the use of binaural reproduction for Phase 3, as justified in section 3.3.2.)

Secondly, a study by C. Guastavino reported that 2D reproduction offered a greater level of perceived ‘naturalness’ for outdoor soundscapes, whereas 3D reproduction offered greater ‘naturalness’ for indoor soundscapes [88]. The study simply asked participants to rate the ‘naturalness’ of various soundfield reproductions directly. Admittedly, this is not enough evidence to propose that only 3D ought to be used for reproducing indoor soundscapes and only 2D for reproducing outdoor soundscapes. However, Guastavino’s comparison of perceived naturalness was deemed sufficient to justify the use of a 2D Ambisonic configuration which, as the primary concern, would offer a practical and portable solution.

3.3.1.3 Physical Setup

An equatorial ring of eight channels was chosen for its ability to render third-order as well as first- and second-order Ambisonics. Genelec 8030A active loudspeakers were arranged at a (sitting) listener’s ear height, starting at 27.5° azimuth and equally spaced at 45° intervals at a radius of 1.5 m. (Measurements were made to the front surface of the Genelec loudspeaker casings, laterally central and an inch below the centre of the tweeter.) The loudspeakers’ LED displays were covered.
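The ring geometry described above can be sketched as follows. The start angle, spacing and radius are taken from the text; the coordinate convention (x forward, y to the left, angles measured anticlockwise) and the function names are assumptions. The second function uses the standard result that horizontal-only ambisonics of order N has 2N + 1 channels, so a ring needs at least that many loudspeakers.

```python
import math


def ring_layout(n_speakers=8, start_deg=27.5, spacing_deg=45.0, radius_m=1.5):
    """Return (azimuth_deg, x_m, y_m) for each loudspeaker in the ring.

    Assumed convention: x points forward, y to the left, azimuth anticlockwise.
    """
    layout = []
    for j in range(n_speakers):
        azimuth = (start_deg + j * spacing_deg) % 360.0
        x = radius_m * math.cos(math.radians(azimuth))
        y = radius_m * math.sin(math.radians(azimuth))
        layout.append((azimuth, x, y))
    return layout


def max_2d_order(n_speakers):
    """Highest horizontal order N satisfying 2N + 1 <= number of loudspeakers."""
    if n_speakers < 1:
        raise ValueError("at least one loudspeaker is required")
    return (n_speakers - 1) // 2
```

For eight loudspeakers, `max_2d_order(8)` gives 3, which is consistent with the choice of an eight-channel ring for third-order rendering.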

Phase One used first-order Ambisonic decoding with MaxMSP. Phase Two used third-order Ambisonic decoding with WigWare. An RME HDSPe MADI soundcard was used, feeding an RME digital-to-analogue converter. The b-format recordings were 44.1 kHz, 16-bit.

3.3.1.4 Calibration

In Phase One, the sound pressure level was measured at MediaCity whilst simultaneously recording. A sound level meter was set to slow integration and A-weighting. The equivalent sound level measured over ten seconds was LAeq,10s = 55.2 dB. In the listening room, all eight loudspeakers were set to the same volume setting before the ten-second b-format recording was reproduced in ambisonics. LAeq,10s was measured with the same sound level meter, and the playback amplitude was adjusted between repeats until the LAeq,10s at the sweet spot reached 55.2 dB.
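The repeat-and-adjust procedure above can be sketched as a simple loop. This is an illustration only: `measure` is a hypothetical callback standing in for playing the recording and reading the sound level meter, and the 0.1 dB tolerance and the limit on repeats are assumptions not stated in the text.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def calibrate(measure, target_db=55.2, tolerance_db=0.1, max_repeats=20):
    """Adjust playback gain until the measured LAeq matches the target.

    measure(gain_db) is a hypothetical callback: it reproduces the recording
    with the given playback gain and returns the LAeq reading in dB.
    Returns the playback gain (dB) at which the reading is within tolerance.
    """
    gain_db = 0.0
    for _ in range(max_repeats):
        reading_db = measure(gain_db)
        error_db = target_db - reading_db
        if abs(error_db) <= tolerance_db:
            return gain_db
        # Raise or lower the playback level by the shortfall, then repeat.
        gain_db += error_db
    raise RuntimeError("calibration did not converge")
```

For example, if the first playback read 61.0 dB, the loop would lower the gain by 5.8 dB and confirm the new reading on the next repeat.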

The process of calibrating was streamlined in Phase Two. A b-format soundfield recording was made of diffuse 80 dB(A) white noise in a reverberation room (using the same equipment settings as per field recording). (This was performed at the University of Salford, courtesy of Anugrah Sabdono Sudarsono.) Prior to the listening room tests, the eight channels were levelled one at a time with 70 dB(A) of white noise and a sound level meter at the sweet spot. After balancing, the b-format white noise was reproduced and the master volume set for a meter reading of 80 dB(A) at the sweet spot.
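The two-step Phase Two procedure (per-channel levelling, then a master adjustment) amounts to the arithmetic sketched below. The function names are hypothetical and any readings passed in are placeholders for illustration, not measurements from the study; the 70 dB(A) and 80 dB(A) targets are those given in the text.

```python
def channel_trims_db(channel_readings_db, target_db=70.0):
    """Trim (dB) to apply to each channel so its white-noise reading hits the target."""
    return [target_db - reading for reading in channel_readings_db]


def master_gain_db(noise_reading_db, target_db=80.0):
    """Master gain change (dB) so the reproduced b-format noise reads the target."""
    return target_db - noise_reading_db
```

A channel reading above the target receives a negative trim and one below receives a positive trim; the master gain is set only after all eight channels have been balanced.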

In document Comparisons on the perceptions of reproduced urban soundfields and urban soundscapes : a mixed model approach (Page 81-86)