Spatialized Audio Rendering for
Immersive Virtual Environments
Martin Naef, Markus Gross
Computer Graphics Laboratory, ETH Zurich
Oliver Staadt
Context: The blue-c
• Collaborative Immersive Virtual Reality Environment
• Provide remote collaboration features
– Shared, synchronized virtual world
– Render partners using 3D video streams
– Concurrent rendering and acquisition
Audio for VR
• Increase the sense of presence
• Guide the interest of the user
• Provide cues for orientation
• Requires 3D sound rendering
Overview
• Available systems and technology
• System overview
• Audio rendering pipeline
– Sound sources
– Simulation of physical effects
– 3D positioning and mixdown
• API integration
• Experiments
Rendering Options
• Using headphones
– Model head/pinnae using HRTF
– Head-tracking for each user required
– Calibration for individual users required
• Using multiple speakers
– Multi-channel hardware required
– Speaker placement is critical
Available Systems
• High-end systems
– Offline calculation of impulse responses
• E.g. CATT-Acoustics
– Convolution processors
• E.g. Lake
Available Systems
• Low-end systems
– PC sound cards
• DirectSound, EAX, OpenAL
– Speaker placement
Design Goals
• Good sound quality at moderate cost
• Believable results
– Not necessarily physically correct
• Support for networked sound
– Needed for remote collaboration
• Flexible speaker placement
• Efficient implementation on standard hardware
System Overview
• Part of the blue-c software core
[Figure: system diagram: Application, Scene, Graphics System (Graphics Pipeline), Sound System (Source, Localization Pipeline, Mix Bus, Reverb); sync link between the sound and graphics systems]
Audio Rendering Pipeline
[Figure: audio rendering pipeline: Source, Distance Delay, Air Absorption, Distance Gain, 3D Positioning, Projection, Room-EQ, Speakers; LPF, Sub; Reverb; Head Tracking]
Audio Sources
• Recorded audio
– Mono samples or loops for effects
– Multi-channel files for background music
• Live input
– Microphones
– External synthesizer or sampler
• Networked input
– Remote microphones for collaboration
Audio Sources
• Keep all state information
– 3D position
– Reference distance
– Gain
– Temporary rendering data
• Filter coefficients and state
• Delay lines
• Mixdown matrix
Distance Delay
• Simulate propagation speed of sound (approx. 340 m/s)
• Store and delay samples in a memory buffer
• Keep independent write and read pointers
• Read pointer is moved according to distance
– Linear interpolation of time and samples
– Results in a frequency shift (Doppler effect)
[Figure: delay line with the write position and the delayed read position]
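The delay-line bullets above can be sketched as follows. This is a minimal illustration, not the blue-c implementation: the class name `DelayLine`, the helper `delay_in_samples`, and the speed of sound of 340 m/s are assumptions made for the example. The delay is ramped linearly across a block, and samples are read at a fractional position with linear interpolation, so a changing distance resamples the signal and produces the Doppler shift.

```python
SPEED_OF_SOUND = 340.0  # m/s, assumed value for air (not taken from the slides)
SAMPLE_RATE = 44100.0   # Hz, the sampling rate quoted in the benchmarks


def delay_in_samples(distance_m):
    """Propagation delay for a given source distance, in samples."""
    return distance_m / SPEED_OF_SOUND * SAMPLE_RATE


class DelayLine:
    """Circular buffer with an integer write pointer and a fractional read position."""

    def __init__(self, max_delay_samples):
        self.buf = [0.0] * (max_delay_samples + 2)
        self.write = 0

    def process(self, block, delay_start, delay_end):
        """Write `block` and return it delayed.

        The delay (in samples) is interpolated linearly from delay_start
        to delay_end over the block; both must not exceed the maximum delay.
        """
        size = len(self.buf)
        n = len(block)
        out = []
        for i, x in enumerate(block):
            self.buf[self.write] = x
            delay = delay_start + (delay_end - delay_start) * (i + 1) / n
            pos = (self.write - delay) % size  # fractional read position
            whole = int(pos)
            frac = pos - whole
            i0 = whole % size                  # guards against pos rounding up to size
            i1 = (i0 + 1) % size
            out.append((1.0 - frac) * self.buf[i0] + frac * self.buf[i1])
            self.write = (self.write + 1) % size
        return out
```

With a constant delay the line simply shifts the signal; passing a different `delay_end` than `delay_start` moves the read pointer relative to the write pointer, which is what changes the pitch.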
Air Absorption
• Frequency-dependent power loss
– Higher frequencies are attenuated more
– Only perceivable for large distances (approx. -4 dB per 1000 m at 1 kHz)
• Simplified model
– High-shelving filter (bi-quad)
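A high-shelving bi-quad of this kind could look like the sketch below. The coefficient formulas follow the widely used Audio EQ Cookbook high-shelf design, which is a substitute chosen for illustration; the slides do not say which design is used. The mapping from distance to shelf gain (`air_absorption_gain_db`, using the -4 dB per 1000 m figure as the shelf gain with a 1 kHz corner) is likewise an assumption.

```python
import math


def high_shelf_coeffs(gain_db, corner_hz, sample_rate=44100.0, slope=1.0):
    """Normalized high-shelf bi-quad coefficients (b0, b1, b2, a1, a2)."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * corner_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / 2.0 * math.sqrt((a + 1.0 / a) * (1.0 / slope - 1.0) + 2.0)
    k = 2.0 * math.sqrt(a) * alpha
    b0 = a * ((a + 1.0) + (a - 1.0) * cos_w0 + k)
    b1 = -2.0 * a * ((a - 1.0) + (a + 1.0) * cos_w0)
    b2 = a * ((a + 1.0) + (a - 1.0) * cos_w0 - k)
    a0 = (a + 1.0) - (a - 1.0) * cos_w0 + k
    a1 = 2.0 * ((a - 1.0) - (a + 1.0) * cos_w0)
    a2 = (a + 1.0) - (a - 1.0) * cos_w0 - k
    return b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0


def air_absorption_gain_db(distance_m, db_per_km=-4.0):
    """Hypothetical mapping: shelf gain grows linearly with distance."""
    return db_per_km * distance_m / 1000.0


def biquad(samples, coeffs):
    """Direct form I bi-quad; per-source filter state would be kept between blocks."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out
```

The filter leaves low frequencies untouched (unity gain at DC) and attenuates the band above the corner by the shelf gain, which matches the "higher frequencies are attenuated more" behaviour in simplified form.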
Distance Gain
• Power loss according to distance
• Uses reference distance
$L_d = \frac{D_{ref}}{D_s}$
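As a small worked example, assuming the distance gain is the ratio of the reference distance to the source distance: a source at its reference distance plays at unity gain, and each doubling of the distance lowers the level by about 6 dB. The function names and the lower bound on the distance are hypothetical additions for this sketch.

```python
import math


def distance_gain(source_distance, ref_distance, min_distance=1e-3):
    """Linear gain D_ref / D_s; min_distance (an assumption) avoids division by zero."""
    return ref_distance / max(source_distance, min_distance)


def to_db(gain):
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)


# A source with a 1 m reference distance, heard from 1 m, 2 m and 4 m.
levels = [to_db(distance_gain(d, 1.0)) for d in (1.0, 2.0, 4.0)]
```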
3D Positioning
• Simulate source direction using a discrete, small number of speakers
• Distribute mono-stream onto multiple speaker channels
3D Positioning
• Calculate channel gains with dot-product
• Open up "active angle" to avoid differences in perceived spread
• Normalize gain factors
$L_{chn} = \max\left(\frac{\vec{v}_s \cdot \vec{v}_{spk} + 0.1}{1.1},\ 0\right)$
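The three steps above (dot product, widened active angle, normalization) can be sketched as below. Several details are assumptions for illustration: the offset of 0.1 that widens the active angle, the constant-power normalization (sum of squared gains equal to 1), and the function names. The slides only state that the gains are normalized, not how.

```python
import math


def unit(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def channel_gains(source_dir, speaker_dirs, offset=0.1):
    """Per-speaker gains for a source direction, as seen from the listener."""
    s = unit(source_dir)
    raw = []
    for spk in speaker_dirs:
        dot = sum(a * b for a, b in zip(s, unit(spk)))
        # The offset lets speakers slightly more than 90 degrees away contribute.
        raw.append(max(0.0, (dot + offset) / (1.0 + offset)))
    norm = math.sqrt(sum(g * g for g in raw))
    if norm == 0.0:
        return raw  # no speaker faces the source; leave it silent
    return [g / norm for g in raw]


# Four speakers around the listener: front, left, back, right.
speakers = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]
gains = channel_gains((1, 0, 0), speakers)
```

For a source straight ahead, the front speaker dominates, the two side speakers receive a small share because of the offset, and the rear speaker stays silent.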
Loudness Projection
• Correct the individual channels to move the "sweet spot"
• Use head-tracking information
• Allows irregular distances between the listener and the individual speakers
$L_{spk} = \frac{D_{spk}}{D}$
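One way to read the projection step is that each speaker channel is scaled in proportion to the tracked distance between the listener's head and that speaker, so a speaker the listener has moved closer to is turned down. The sketch below works under that assumption; the function name and the `nominal_distance` parameter (the distance at which a speaker plays at unity gain) are hypothetical.

```python
import math


def projection_gains(head_pos, speaker_positions, nominal_distance):
    """Per-speaker correction gain, proportional to the head-to-speaker distance."""
    gains = []
    for spk in speaker_positions:
        d_spk = math.dist(head_pos, spk)  # current listener-to-speaker distance
        gains.append(d_spk / nominal_distance)
    return gains


# Two speakers at irregular distances; the listener then steps toward the first one.
speakers = [(2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
centered = projection_gains((0.0, 0.0, 0.0), speakers, nominal_distance=2.0)
moved = projection_gains((1.0, 0.0, 0.0), speakers, nominal_distance=2.0)
```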
Room Simulation
• Simulate room echo
• Provide a sense of the size and material of the acoustic space
• Two fundamental approaches
– Simulated impulse response (large FIR filters)
– Parameterized reverberation algorithms
Room Simulation
• Separate send channel to studio effect processor (t.c. M-ONE XL)
– Provides smooth, pleasing reverb
– Intuitive parameterization
– Mix reverb output onto the mix bus
• Use effect send gain as additional distance cue
– High direct-sound to room echo ratio for close sounds
$L = D_{ref}^{-1/2}$
Room EQ and LF Management
• Parametric equalizer in the mixing bus allows adjustment to the acoustic environment
– Attenuate resonant frequencies
– Account for non-linear speaker response
• Low-frequency management
– Low-pass filter a sum signal to drive the subwoofer
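The low-frequency management step can be illustrated with a very simple filter. The sketch below sums the speaker channels and runs the sum through a one-pole low-pass; the filter order, the 120 Hz cutoff, and the function name are assumptions chosen for brevity, not details from the slides.

```python
import math


def subwoofer_feed(channels, cutoff_hz=120.0, sample_rate=44100.0):
    """Low-pass the sum of all speaker channels (equal-length sample lists)."""
    coeff = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    state = 0.0
    out = []
    for frame in zip(*channels):
        # One-pole low-pass: move the state a fraction of the way to the input.
        state += coeff * (sum(frame) - state)
        out.append(state)
    return out
```

A steeper filter would normally be preferred in practice; the one-pole version only shows where the sum and the filter sit in the signal path.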
Fused Pipeline
• Mix signal onto main bus using a single mixdown-matrix
– Source gain
– Distance gain
– Position and projection gain for each speaker channel
• Steps are reduced into a single vector-matrix multiplication
[Figure: fused pipeline: Source, Distance Delay, Air Absorption, Mix Matrix, Room-EQ, Speakers; LPF, Sub; Rev.; 3D Positioning]
Fused Pipeline
• Mixdown matrix
– Calculated at audio block boundaries
– Linear interpolation between last and current matrix (every 32 samples)
– Provides smooth transition between different positions
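The fused mixdown can be sketched as one gain matrix per block, with one row per source and one column per speaker channel. In this illustration, `mix_row` folds the source gain, distance gain and per-speaker gains into a row, and `mix_block` steps the matrix from the previous block's values to the current ones every 32 samples. The function names, the data layout, and the exact stepping scheme are assumptions.

```python
STEP = 32  # samples between interpolated matrices, as stated on the slide


def mix_row(source_gain, distance_gain, channel_gains):
    """Fuse all per-source gains into one matrix row (one entry per speaker)."""
    return [source_gain * distance_gain * g for g in channel_gains]


def mix_block(sources, last_matrix, cur_matrix, step=STEP):
    """Mix sources[s][i] onto the bus using matrix[s][c]; returns bus[c][i]."""
    n_src = len(sources)
    n_chn = len(cur_matrix[0])
    n = len(sources[0])
    n_steps = (n + step - 1) // step
    bus = [[0.0] * n for _ in range(n_chn)]
    for k in range(n_steps):
        t = (k + 1) / n_steps  # reaches the current matrix in the last step
        m = [[(1.0 - t) * last_matrix[s][c] + t * cur_matrix[s][c]
              for c in range(n_chn)] for s in range(n_src)]
        for i in range(k * step, min((k + 1) * step, n)):
            for c in range(n_chn):
                bus[c][i] = sum(sources[s][i] * m[s][c] for s in range(n_src))
    return bus


# One source moving from the second speaker to the first over a 64-sample block.
bus = mix_block([[1.0] * 64], last_matrix=[[0.0, 1.0]], cur_matrix=[[1.0, 0.0]])
```

Because every gain stage is a plain multiplication, combining them in the matrix costs nothing extra per sample, and the interpolation avoids audible clicks when a source changes position between blocks.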
API Integration
• Sound service in the blue-c API core
– Control sound sources and system
• Audio nodes in the scene graph
– Sound as object attribute
– Support transformation nodes
– Provide translation between virtual (scene) and real coordinate systems (physical setup)
Benchmarks
• Single MIPS R12000 CPU, 400 MHz
• 44.1 kHz sampling rate, 20 ms latency, 8 channel ADAT input/output
• Delay-line is expensive
• Latency has little influence
Source    Mono         Stereo       Localized
Preload   78 sources   33 sources   37 sources
Live      54 sources   25 sources   30 sources
Stream    65 sources   31 sources   33 sources
Applications
• Used for several applications
– Landscape (ship seeking test)
– Infoticles
– "Fashion show" blue-c feature demo
– Collaborative chess
Conclusions
• High quality sound system
• Based on standard components
• Moderate cost
– ~ US$5000 for audio system
Future Work
• Integration into area management
– Culling of sound sources
– Portal effects
– Assign reverberation parameters to areas
• Linux port
http://blue-c.ethz.ch
Related Work - Acoustics
• [Begault:94] Overview
• [Gardner:92] Virtual Acoustics / Reverb
• [Krockstadt:68] Ray-tracing
• [Funkhouser:99] Beam-tracing
• [Gardner:94] HRTF
• [Pulkki:99] VBAP
Related Work - VR
• [Takala:92] Sound Rendering
• [Tsingos:97] Soundtracks for animation
• [Eckel:99] Cyberstage Sound Server
• [Jot:99] IRCAM Spatialisateur
• [Huopaniemi:99] DIVA
Implementation Notes
• Rendering runs in its own process
– Sound sources can be added and modified at any time
– Parameter updates only at block boundaries
• Runs on
– SGI Onyx 3200 (MIPS R12000, 400 MHz)
– I/O through 8 channel ADAT
– Inexpensive studio hardware and speakers
– ~ US$5000 for audio system
[Figure: class diagram: SoundService, SoundSource, PreloadSource, LiveInputSource, StreamSource, PreloadData, LiveInput, 3DPositioning, ReverbControl]
Speaker Placement
• More speakers means better localization
– 6 speakers provide good results
– 8 speakers give almost "equal power" distribution