6.1.3 Generating Anthropomorphic Auditory Icons
We aimed to find a means of extracting those attributes of the aforementioned affect bursts that are responsible for the recognition of affect, and of applying them to new carrier sounds, which could then be associated with virtual or physical entities such as smart objects. This would enable us to flexibly generate new affect bursts for arbitrary sources by designing a carrier sound that reflects the peculiarities of the object and by modulating it to express a corresponding affective state. In theory, the optimal solution would remove all human-like acoustic attributes from recorded affect bursts while keeping sufficient acoustic information for a listener to perceive the intended affective notion as efficiently as with the original sound material.
6.1. ANTHROPOMORPHIC AUDITORY ICONS 151
In order to systematically synthesise anthropomorphic auditory icons, we started with an analysis of the recordings of the Montreal Affect Bursts. Informal listening tests and an analysis of the spectrograms of the given sound samples indicated that it would be hard to extract the emotional component from several affect bursts, a difficulty we suspected to be grounded in subtle nuances beyond pitch and envelope. We therefore recorded an additional series of affect bursts in a recording studio with a 27-year-old male speaker, aiming to create a wide variety of affective expressions with which we could conduct further experiments to identify the most suitable material. In total, we recorded 97 affect bursts and 43 samples of different kinds of physiological sounds. Figure 6.1 provides an overview of the categories of the recorded sound samples; the numbers in brackets give the number of sound samples in the respective category. These recordings (plus the Montreal Affect Bursts) could be incorporated directly into realisations of TASO systems or, as described in this section, provide a basis for further synthesis efforts.
Recordings (male speaker, 27 years old)
Affective Sounds (97): Admiration (7), Anger (10), Boredom (6), Contempt (7), Disgust (13), Fear (7), Happy (3), Pain (12), Pleasure (5), Relief (4), Startle (10), Suffering (6), Surprise (5), Worry (2)
Physiological Sounds (43): Belch (6), Cough (2), Flatulence (5), Hiccup (3), Nausea (18), Sneeze (2), Snore (2), Swallow (5)
Figure 6.1: Overview of newly recorded and classified sound samples
A literature review in the field of electronic sound engineering led us to the so-called vocoder2, which has recently become more common in electronic music production. A vocoder applies a multiband filter to the carrier sound and the modulating sound, thereby dividing both source samples into a variable number of frequency bands. Furthermore, the vocoder follows the envelope of the modulating sound in each band and applies this envelope to the corresponding band of the carrier sound, which is realised through a Voltage Controlled Amplifier (VCA). Figure 6.2 displays a schematic diagram of this process.

[Schematic: the modulator and the carrier each pass through a multiband filter (Band 1, Band 2, Band 3, ..., Band x); one VCA per band; a mixer sums the bands into the vocoded signal]
Figure 6.2: The vocoding principle1

We utilised this method in order to extract the energy and pitch characteristics of the modulating sound (the original affect burst in our case) and to apply them to a carrier sound (for which we used different artificially generated sounds, e.g. sine waves), which determines the overall timbre of the resulting sound sample. In addition to vocoding, we applied custom optimisations to certain sounds, depending on the acoustic attributes of the particular affect burst. For example, for vowel-deprived sounds (such as a snarl-like sound expressing anger) we added a low frequency oscillator to accentuate the rhythm of the sound in certain frequencies. Another method of amplification was to exaggerate the pitch curve afterwards, either by manually adjusting the pitch over time or by adding a frequency shifter to create frequency vibration (as, e.g., with the sound expressing disgust). Figure 6.3 displays waveforms and spectrograms of the original Montreal Affect Burst recordings of happiness and anger performed by speaker number 59, followed by the vocoded versions, based on a sine wave and a sawtooth carrier sound.
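The vocoding principle described above can be illustrated with a minimal software sketch. It is not the implementation of the plugin used in this work: the band-pass filter (a standard second-order biquad), the envelope follower (rectifier plus one-pole low-pass), the number of bands, the band range, the filter Q and the envelope cutoff are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def bandpass(signal, sample_rate, centre_hz, q=4.0):
    """Second-order band-pass biquad (constant 0 dB peak gain form)."""
    w0 = 2.0 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0            # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Envelope follower: full-wave rectifier followed by a one-pole low-pass."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, level = [], 0.0
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        env.append(level)
    return env

def band_centres(n_bands, low_hz, high_hz):
    """Logarithmically spaced band centre frequencies (an assumed layout)."""
    ratio = (high_hz / low_hz) ** (1.0 / (n_bands - 1))
    return [low_hz * ratio ** i for i in range(n_bands)]

def vocode(modulator, carrier, sample_rate, n_bands=8, low_hz=100.0, high_hz=3200.0):
    """Impose the per-band envelope of the modulator on the carrier."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for centre in band_centres(n_bands, low_hz, high_hz):
        mod_env = envelope(bandpass(modulator[:n], sample_rate, centre), sample_rate)
        car_band = bandpass(carrier[:n], sample_rate, centre)
        for i in range(n):
            # Software stand-in for the VCA: the carrier band is scaled by
            # the modulator envelope of the same band; the sum is the mixer.
            out[i] += mod_env[i] * car_band[i]
    return out

def sawtooth(freq_hz, n_samples, sample_rate):
    """Naive (non-band-limited) sawtooth in the range [-1, 1)."""
    return [2.0 * ((freq_hz * i / sample_rate) % 1.0) - 1.0 for i in range(n_samples)]

# Usage: a modulator that is active for 0.25 s and then silent,
# vocoded onto a continuous sawtooth carrier.
SAMPLE_RATE = 8000
modulator = sawtooth(150.0, 2000, SAMPLE_RATE) + [0.0] * 2000
carrier = sawtooth(110.0, 4000, SAMPLE_RATE)
vocoded = vocode(modulator, carrier, SAMPLE_RATE)
```

In this sketch the vocoded signal carries the timbre of the carrier, while its loudness over time follows the modulator: the output fades out shortly after the modulator falls silent, even though the carrier keeps sounding.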
2 adapted from http://www.dma.ufg.ac.at/app/link/Grundlagen:Audio/module/8086,
Figure 6.3: Exemplary waveforms and spectrograms (0 - 8000 Hz).
The sound representing happiness basically consists of laughter, resulting in a distinct energy pattern, which can be recognised in both types of diagrams and vocoded signals. Thus, the rhythm of the laughter is also recognisable with a sine wave carrier, which actually covers only a limited bandwidth. The opposite case can be seen in the anger example, which is a growl-like sound. The waveform does not have a very sharp profile, but a rather complex energy distribution over different frequencies. The diagrams show that the sine wave carrier is not suitable to reflect this, whereas the application of a sawtooth-based carrier achieves a more similar pattern of energy distribution over the whole frequency range. Since this is a consequence of the vocoding principle, a general rule for the construction of carriers must be to create sounds that cover a wide frequency spectrum.
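The rule that carriers should cover a wide frequency spectrum can be made concrete by counting how many vocoder bands receive any carrier energy at all. The following sketch does this for an ideal sine wave and an ideal sawtooth. The band layout (22 logarithmically spaced bands between 50 Hz and 8000 Hz) and the 110 Hz fundamental are assumptions made for illustration only, and the function names are hypothetical.

```python
def partial_frequencies(waveform, f0_hz, max_hz):
    """Partials of an ideal waveform up to max_hz.

    A sine wave has a single partial at its fundamental; an ideal
    sawtooth has partials at every integer multiple of the fundamental.
    """
    if waveform == "sine":
        return [f0_hz] if f0_hz <= max_hz else []
    if waveform == "sawtooth":
        return [k * f0_hz for k in range(1, int(max_hz // f0_hz) + 1)]
    raise ValueError("unsupported waveform: " + waveform)

def band_edges(n_bands, low_hz, high_hz):
    """Edges of n_bands logarithmically spaced bands (an assumed layout)."""
    ratio = (high_hz / low_hz) ** (1.0 / n_bands)
    return [low_hz * ratio ** i for i in range(n_bands + 1)]

def covered_bands(waveform, f0_hz, n_bands=22, low_hz=50.0, high_hz=8000.0):
    """Indices of the bands that contain at least one carrier partial."""
    edges = band_edges(n_bands, low_hz, high_hz)
    covered = set()
    for freq in partial_frequencies(waveform, f0_hz, high_hz):
        for i in range(n_bands):
            if edges[i] <= freq < edges[i + 1]:
                covered.add(i)
    return sorted(covered)

# Usage: compare how many bands each carrier type can excite.
sine_bands = covered_bands("sine", 110.0)
saw_bands = covered_bands("sawtooth", 110.0)
print(len(sine_bands), "band(s) for the sine carrier")
print(len(saw_bands), "band(s) for the sawtooth carrier")
```

Under these assumptions the sine carrier excites a single band, so the envelopes of all other modulator bands have nothing to act on, whereas the sawtooth places partials in most of the bands.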
As with these two examples, we used Ableton Live 8.1 and the Eiosis ELS 22-band Vocoder plugin to create a set of compound filters based on vocoding for each affect, yielding a general framework to create anthropomorphic auditory icons for the categories anger, disgust, happiness, pleasure, sadness, surprise, fear, and pain. These filters can be applied to new carrier sounds to create anthropomorphic sounds customised to particular objects. In Section 7.1 we present a user study on the perception of affect in such synthesised affect bursts.