GCT535: Sound Technology for Multimedia
Spatial Audio
Outlines
●
Localization using HRTF
Hearing Sound in Space
●
We have different auditory experiences depending on where we listen to
sound
Hearing Sound in Space
●
Sound Localization
○ Hear sound as a point source in 3-D space
○ Possible with two ears and human body structure
○ Panning (stereo), 3-D Sound
●
Room Effect
○ Hear sound and its myriads of reflections
○ Determined by the room type: size, structure and materials
Sound Localization
●
Perception of Directions
○ A sound source arrives in each of two ears with differences in time and level
○ We call them ITD (inter-aural Time Difference) and IID (Inter-aural Intensity
Difference)
○ IID is used as a main cue of direction above about 1.5 kHz
●
Only the level and time differences are enough?
R L
ITD
Head-Related Transfer Function (HRTF)
●
A filter that characterizes how a sound arrives in the outer end of ear
canal from the source
○ Measured from human body or dummy head
○ Determined by the refection on head, pinnae, torso or other body parts
○ Function of azimuth (horizontal angle), elevation (vertical angle) and
frequency 𝐻!(𝜔, ∅, 𝜃) 𝐻"(𝜔, ∅, 𝜃) R L Source: https://developers.google.com/vr/concepts/spatial-audio
HRTF Measurement
Fall 2007 Music 220D Report Kolar 3Subject/Receiver Alignment: subject wearing mic attachment/strain relief and
mirror-alignment headgear positioned on rotating stool (noted during 2nd session that midpoint of rotation not fixed because of seat asymmetry); flashlight suspended above subject to reflect light off top-of-head mirror onto azimuth markings on walls for sighted alignment by subject (fig. 3); line-of-sight verification by subject for speaker elevation alignment
figure 3: receiver mic attachment & alignment headgear, flashlight-mirror-marker positioning system
Equipment/Signal Chain: AuSIM HeadZap speaker (fig. 2 & 3) powered by Samson
Servo 120 amplifier, AuSIM Binaural Probe Microphone Set (AuPMC002, Sennheiser KE4-211-2 omni electret capsules, fig. 4) into Grace 101 preamplifier, RME Fireface 800 converter, Digital Performer 5.12 recording software on MacBook Pro running OS 10.4.11.
figure 3: AuSIM Binaural Probe Microphone specifications and response per product literature Measured Head-Related Impulse Responses Front Back Back Left Right
HRTF Measurement
Front Back Back Left Right Magnitude response of the HRIRsRendering Sound in 3-D Space
●
Binaural Synthesis
○ 3D sound effects
○ Typically run with headphones
○ Applications: Game, VR
●
Methods
○ Convolution with HRTFs: the IRs are typically several hundreds sample long
○ Modeling HRTFs with biquad filters (e.g. Prony’s method)
●
Issues
○ Standardization: dummy head
○ Individualization Input Left output Right output ℎ!(𝑡, ∅, 𝜃) ℎ"(𝑡, ∅, 𝜃)
Head-Related IR Datasets
●
ARI
○ 85 subjects and 1550 source positions
○ https://www.kfs.oeaw.ac.at/index.php?option=com_content&view=article&i
d=608&Itemid=857&lang=en
●
CIPIC
○ 45 subjects and 1250 source positions
○ http://interface.cipic.ucdavis.edu/sound/hrtf.html
●
IRCAM
○ 50 subjects and 187 source positions
HRTF Demos
●
Virtual Barber Shop Hair Cut
○ https://www.youtube.com/watch?v=8IXm6SuUigI
●
OpenAL Example
○ https://www.youtube.com/watch?v=tY9DhuEe1WY
●
Google Chrome Omnitone
○ http://googlechrome.github.io/omnitone
●
My Research Work
Stereo Panning
●
Amplitude Panning in Loudspeakers
○ The gains for each speaker for monophonic source
■ 𝑥! 𝑡 = 𝑔!𝑥 𝑡 , 𝑖 = 1, 2 (left, right)
○ The tangent law (Bennett et. al.)
■ "#$%"#$% ! = &"'&# &"(&# ○ Power preservation ■ 𝑔)* + 𝑔** = 1
BASIC SPATIAL EFFECTS 143
When the source is very near to the listener, there are also some binaural effects which are used in distance perception [DM98]. With close sources the magnitude of ILD is higher and appears at lower frequencies than a far source with the same direction, which creates the perception of a nearby source.
5.3 Basic spatial effects for stereophonic loudspeaker and headphone playback
The most common loudspeaker layout is the two-channel setup, called the standard stereophonic setup. It was widely taken into use after the development of single-groove 45◦/45◦ two-channel
records in the late 1950s. Two loudspeakers are positioned in front of the listener, separated by 60◦ from the listener’s viewpoint, as presented in Figure 5.2. The setup of two loudspeakers is
very common, though quite often the setup is not as shown in the figure. In domestic use, or in car audio typically the listener is not situated in the centre, but the loudspeakers are located in different directions and distances from him than in Figure 5.2. However, even then two-channel reproduction is preferred from monophonic presentation in most cases. On the other hand, in some cases the listener can be assumed to be in the best listening position, as in computer audio.
q0 = 30° q Virtual source Best listening area
Figure 5.2 Standard stereophonic listening configuration.
This section deals with the spatial effects obtainable with such two-channel reproduction and simple processing. Two types of effects are presented, the creation of point-like sources, and the creation of spatially spread sources. More advanced methods for loudspeaker reproduction with HRTF processing are discussed in Section 5.4.5, which provide some more degrees of freedom, but unfortunately also introduce some limitations in listening position and listening room acoustics.
5.3.1 Amplitude panning in loudspeakers
Amplitude panning is the most frequently used virtual-source-positioning technique. In it a sound signal is applied to loudspeakers with different amplitudes, which can be formulated as
Reverberation
●
Acoustic phenomenon when a sound source is played in a room
○ Thousands of echoes are reflected against wall, ceiling and floors
○ The patterns are determined by the volume and geometry of the room and
materials on the surfaces
○ We can recognize the geometry and composition of the room from the
sound
○ They provide different (often better) feelings of the sound
Sound Source
Listener Direct sound
Room Impulse Response
●
Room reverberation is characterized by its
impulse response (IR)
●
The room IR is composed of three parts
○ Direct path
○ Early reflections: convey a sense of the
room geometry and size
○ Late-field reverberation: high echo density
like noise, determined by room size and materials
●
RT60
Signal Processing Techniques for Digital Audio Effects 70
Reverberation and Linear Time-Invariant Systems
0 10 20 30 40 50 60 70 80 90 100 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1
CCRMA Lobby Impulse Response
time - milliseconds
response amplitude
direct path
early reflections
late-field reverberation
Figure 61: CCRMA lobby response to transient signal
• The arrival times and nature of reflected source signals are sen-sitive to the details of the environment geometry and materials. • As a result, reverberation may seem unmanageably complex. • Fortunately, reverberation has two properties which allow its
analysis and synthesis without having to know the details: – linearity, and
– time invariance.
• Here, we explore reverberation by studying impulse responses of enclosed spaces.
Room Impulse Response
Artificial Reverberation
●
Convolution reverb
○ Measure the impulse response of a room
○ Convolve input with the measured IR
●
Mechanical reverb
○ Use metal plate and spring
○ EMT140 Plate Reverb: https://www.youtube.com/watch?v=HEmJpxCvp9M
●
Delay-based reverb
○ Early reflections: feed-forward delayline
○ Late-field reverb: allpass/comb filter, feedback delay networks (FDN)
Delay-based Reverb
●
Schroeder model
○ Cascade of allpass-comb filters
○ Mutually prime number for delay lengths
Delay-based Reverb
●
Feedback Delay Networks
○ Mixing matrix creates “good spreading” of delayed outputs
■ Chosen to be orthonormal (unitary matrix)
○ The lengths of delaylines are chosen to be mutually prime number
○ Should generate a white noise in lossless mode
○ T60 is controlled by the loop gains
Feedback Delay Networks
+ x(n) Z-M1 Z-M2 Z-M3 + a11 a12 a13 a11 a12 a13 a11 a12 a13 y(n)
Measuring Impulse Response
●
Measurement Model
○ Assume the system as linear time-invariant
○ Use a test signal and the output to derive the impulse response
○ Using a sine sweep: based on the convolution theorem
Signal Processing Techniques for Digital Audio Effects 74
Impulse Response Measurement
Measurement Model:
Figure 63: Measurement Configuration
- Room presumed LTI.
- Why not just crank an impulse out the speaker, record the result
and declare victory?
s(t)
LTI system
r(t)
test
sequence measured response
n(t)
h(t)
measurement noise
Figure 64: Measurement Model
- Noise assumed additive, unrelated to the test signal. (How will it
be related to the system?)
𝑟 𝑡 = 𝛿 𝑡 ∗ ℎ 𝑡 + 𝑛 𝑡 𝑟 𝑡 → *ℎ 𝑡
*ℎ 𝑡 = FFT'){ FFT 𝑟 𝑡
Measuring HRTFs
20 𝑠(𝑡) 𝑟!(𝑡) 𝑟"(𝑡) ,ℎ!(𝑡) ,ℎ"(𝑡)Fall 2007 Music 220D Report Kolar 3
Subject/Receiver Alignment: subject wearing mic attachment/strain relief and
mirror-alignment headgear positioned on rotating stool (noted during 2nd session that midpoint of
rotation not fixed because of seat asymmetry); flashlight suspended above subject to reflect light off top-of-head mirror onto azimuth markings on walls for sighted alignment by subject (fig. 3); line-of-sight verification by subject for speaker elevation alignment
figure 3: receiver mic attachment & alignment headgear, flashlight-mirror-marker positioning system
Equipment/Signal Chain: AuSIM HeadZap speaker (fig. 2 & 3) powered by Samson
Servo 120 amplifier, AuSIM Binaural Probe Microphone Set (AuPMC002, Sennheiser KE4-211-2 omni electret capsules, fig. 4) into Grace 101 preamplifier, RME Fireface 800 converter, Digital Performer 5.12 recording software on MacBook Pro running OS 10.4.11.
Measuring Room IRs
Music 318, Winter 2007, Impulse Response Measurement 13
0 500 1000 1500 -0.5 0 0.5 sine sweep, s(t) amplitude frequency - kHz
sine sweep spectrogram
0 200 400 600 800 1000 0 5 10 0 500 1000 1500 2000 -1 -0.5 0 0.5
1 sine sweep response, r(t)
time - milliseconds
amplitude
time - milliseconds
frequency - kHz
sine sweep response spectrogram
0 500 1000 1500 2000 0 5 10 0 100 200 300 400 500 600 700 800 900 1000 -0.04 -0.02 0 0.02 0.04 0.06
0.08 measured impulse response
time - milliseconds
amplitude
CCRMA Lobby Measurment Example
s(t)
r(t)
ˆ
h (t)
Room IR datasets
●
Open AIR
○ http://www.openairlib.net/
●
Aachen Impulse Response Database
○
Convolution by FFT
●
Direct convolution in time domain is computationally expensive
○ It has O(n2) in complexity
○ Especially, when the length of IR is long (e.g. room IR)
●
FFT Convolution
○ Using Convolution Theorem: 𝑥 𝑛 ∗ ℎ 𝑛 ↔ 𝑌 𝑘 = 𝐻 𝑘 𝑋(𝑘)
○ FFT has O(nlog2n) and convolution in frequency O(n) in complexity
●
However, the issue is latency when the IR is long. How can we implement
it in real-time reverb?
○ Solution: partitioned convolution
References
●
Reverberation using Feedback Delay Network
○ https://ccrma.stanford.edu/~jos/pasp/FDN_Reverberation.html