• No results found

Spatial Audio Juhan Nam

N/A
N/A
Protected

Academic year: 2021

Share "Spatial Audio Juhan Nam"

Copied!
24
0
0

Loading.... (view fulltext now)

Full text

(1)

GCT535: Sound Technology for Multimedia

Spatial Audio

(2)

Outlines

Localization using HRTF

(3)

Hearing Sound in Space

We have different auditory experiences depending on where we listen to

sound

(4)

Hearing Sound in Space

Sound Localization

○ Hear sound as a point source in 3-D space

○ Possible with two ears and human body structure

○ Panning (stereo), 3-D Sound

Room Effect

○ Hear sound and its myriads of reflections

○ Determined by the room type: size, structure and materials

(5)

Sound Localization

Perception of Directions

○ A sound source arrives in each of two ears with differences in time and level

○ We call them ITD (inter-aural Time Difference) and IID (Inter-aural Intensity

Difference)

○ IID is used as a main cue of direction above about 1.5 kHz

Only the level and time differences are enough?

R L

ITD

(6)

Head-Related Transfer Function (HRTF)

A filter that characterizes how a sound arrives in the outer end of ear

canal from the source

○ Measured from human body or dummy head

○ Determined by the refection on head, pinnae, torso or other body parts

○ Function of azimuth (horizontal angle), elevation (vertical angle) and

frequency 𝐻!(𝜔, ∅, 𝜃) 𝐻"(𝜔, ∅, 𝜃) R L Source: https://developers.google.com/vr/concepts/spatial-audio

(7)

HRTF Measurement

Fall 2007 Music 220D Report Kolar 3

Subject/Receiver Alignment: subject wearing mic attachment/strain relief and

mirror-alignment headgear positioned on rotating stool (noted during 2nd session that midpoint of rotation not fixed because of seat asymmetry); flashlight suspended above subject to reflect light off top-of-head mirror onto azimuth markings on walls for sighted alignment by subject (fig. 3); line-of-sight verification by subject for speaker elevation alignment

figure 3: receiver mic attachment & alignment headgear, flashlight-mirror-marker positioning system

Equipment/Signal Chain: AuSIM HeadZap speaker (fig. 2 & 3) powered by Samson

Servo 120 amplifier, AuSIM Binaural Probe Microphone Set (AuPMC002, Sennheiser KE4-211-2 omni electret capsules, fig. 4) into Grace 101 preamplifier, RME Fireface 800 converter, Digital Performer 5.12 recording software on MacBook Pro running OS 10.4.11.

figure 3: AuSIM Binaural Probe Microphone specifications and response per product literature Measured Head-Related Impulse Responses Front Back Back Left Right

(8)

HRTF Measurement

Front Back Back Left Right Magnitude response of the HRIRs

(9)

Rendering Sound in 3-D Space

Binaural Synthesis

○ 3D sound effects

○ Typically run with headphones

○ Applications: Game, VR

Methods

○ Convolution with HRTFs: the IRs are typically several hundreds sample long

○ Modeling HRTFs with biquad filters (e.g. Prony’s method)

Issues

○ Standardization: dummy head

○ Individualization Input Left output Right output ℎ!(𝑡, ∅, 𝜃) ℎ"(𝑡, ∅, 𝜃)

(10)

Head-Related IR Datasets

ARI

○ 85 subjects and 1550 source positions

○ https://www.kfs.oeaw.ac.at/index.php?option=com_content&view=article&i

d=608&Itemid=857&lang=en

CIPIC

○ 45 subjects and 1250 source positions

○ http://interface.cipic.ucdavis.edu/sound/hrtf.html

IRCAM

○ 50 subjects and 187 source positions

(11)

HRTF Demos

Virtual Barber Shop Hair Cut

○ https://www.youtube.com/watch?v=8IXm6SuUigI

OpenAL Example

○ https://www.youtube.com/watch?v=tY9DhuEe1WY

Google Chrome Omnitone

○ http://googlechrome.github.io/omnitone

My Research Work

(12)

Stereo Panning

Amplitude Panning in Loudspeakers

○ The gains for each speaker for monophonic source

■ 𝑥! 𝑡 = 𝑔!𝑥 𝑡 , 𝑖 = 1, 2 (left, right)

○ The tangent law (Bennett et. al.)

"#$%"#$% ! = &"'&# &"(&# ○ Power preservation ■ 𝑔)* + 𝑔** = 1

BASIC SPATIAL EFFECTS 143

When the source is very near to the listener, there are also some binaural effects which are used in distance perception [DM98]. With close sources the magnitude of ILD is higher and appears at lower frequencies than a far source with the same direction, which creates the perception of a nearby source.

5.3 Basic spatial effects for stereophonic loudspeaker and headphone playback

The most common loudspeaker layout is the two-channel setup, called the standard stereophonic setup. It was widely taken into use after the development of single-groove 45◦/45two-channel

records in the late 1950s. Two loudspeakers are positioned in front of the listener, separated by 60◦ from the listener’s viewpoint, as presented in Figure 5.2. The setup of two loudspeakers is

very common, though quite often the setup is not as shown in the figure. In domestic use, or in car audio typically the listener is not situated in the centre, but the loudspeakers are located in different directions and distances from him than in Figure 5.2. However, even then two-channel reproduction is preferred from monophonic presentation in most cases. On the other hand, in some cases the listener can be assumed to be in the best listening position, as in computer audio.

q0 = 30° q Virtual source Best listening area

Figure 5.2 Standard stereophonic listening configuration.

This section deals with the spatial effects obtainable with such two-channel reproduction and simple processing. Two types of effects are presented, the creation of point-like sources, and the creation of spatially spread sources. More advanced methods for loudspeaker reproduction with HRTF processing are discussed in Section 5.4.5, which provide some more degrees of freedom, but unfortunately also introduce some limitations in listening position and listening room acoustics.

5.3.1 Amplitude panning in loudspeakers

Amplitude panning is the most frequently used virtual-source-positioning technique. In it a sound signal is applied to loudspeakers with different amplitudes, which can be formulated as

(13)

Reverberation

Acoustic phenomenon when a sound source is played in a room

○ Thousands of echoes are reflected against wall, ceiling and floors

○ The patterns are determined by the volume and geometry of the room and

materials on the surfaces

○ We can recognize the geometry and composition of the room from the

sound

○ They provide different (often better) feelings of the sound

Sound Source

Listener Direct sound

(14)

Room Impulse Response

Room reverberation is characterized by its

impulse response (IR)

The room IR is composed of three parts

○ Direct path

○ Early reflections: convey a sense of the

room geometry and size

○ Late-field reverberation: high echo density

like noise, determined by room size and materials

RT60

Signal Processing Techniques for Digital Audio Effects 70

Reverberation and Linear Time-Invariant Systems

0 10 20 30 40 50 60 70 80 90 100 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1

CCRMA Lobby Impulse Response

time - milliseconds

response amplitude

direct path

early reflections

late-field reverberation

Figure 61: CCRMA lobby response to transient signal

• The arrival times and nature of reflected source signals are sen-sitive to the details of the environment geometry and materials. • As a result, reverberation may seem unmanageably complex. • Fortunately, reverberation has two properties which allow its

analysis and synthesis without having to know the details: – linearity, and

– time invariance.

• Here, we explore reverberation by studying impulse responses of enclosed spaces.

(15)

Room Impulse Response

(16)

Artificial Reverberation

Convolution reverb

○ Measure the impulse response of a room

○ Convolve input with the measured IR

Mechanical reverb

○ Use metal plate and spring

○ EMT140 Plate Reverb: https://www.youtube.com/watch?v=HEmJpxCvp9M

Delay-based reverb

○ Early reflections: feed-forward delayline

○ Late-field reverb: allpass/comb filter, feedback delay networks (FDN)

(17)

Delay-based Reverb

Schroeder model

○ Cascade of allpass-comb filters

○ Mutually prime number for delay lengths

(18)

Delay-based Reverb

Feedback Delay Networks

○ Mixing matrix creates “good spreading” of delayed outputs

■ Chosen to be orthonormal (unitary matrix)

○ The lengths of delaylines are chosen to be mutually prime number

○ Should generate a white noise in lossless mode

○ T60 is controlled by the loop gains

Feedback Delay Networks

+ x(n) Z-M1 Z-M2 Z-M3 + a11 a12 a13 a11 a12 a13 a11 a12 a13 y(n)

(19)

Measuring Impulse Response

Measurement Model

○ Assume the system as linear time-invariant

○ Use a test signal and the output to derive the impulse response

○ Using a sine sweep: based on the convolution theorem

Signal Processing Techniques for Digital Audio Effects 74

Impulse Response Measurement

Measurement Model:

Figure 63: Measurement Configuration

- Room presumed LTI.

- Why not just crank an impulse out the speaker, record the result

and declare victory?

s(t)

LTI system

r(t)

test

sequence measured response

n(t)

h(t)

measurement noise

Figure 64: Measurement Model

- Noise assumed additive, unrelated to the test signal. (How will it

be related to the system?)

𝑟 𝑡 = 𝛿 𝑡 ∗ ℎ 𝑡 + 𝑛 𝑡 𝑟 𝑡 → *ℎ 𝑡

*ℎ 𝑡 = FFT'){ FFT 𝑟 𝑡

(20)

Measuring HRTFs

20 𝑠(𝑡) 𝑟!(𝑡) 𝑟"(𝑡) ,ℎ!(𝑡) ,ℎ"(𝑡)

Fall 2007 Music 220D Report Kolar 3

Subject/Receiver Alignment: subject wearing mic attachment/strain relief and

mirror-alignment headgear positioned on rotating stool (noted during 2nd session that midpoint of

rotation not fixed because of seat asymmetry); flashlight suspended above subject to reflect light off top-of-head mirror onto azimuth markings on walls for sighted alignment by subject (fig. 3); line-of-sight verification by subject for speaker elevation alignment

figure 3: receiver mic attachment & alignment headgear, flashlight-mirror-marker positioning system

Equipment/Signal Chain: AuSIM HeadZap speaker (fig. 2 & 3) powered by Samson

Servo 120 amplifier, AuSIM Binaural Probe Microphone Set (AuPMC002, Sennheiser KE4-211-2 omni electret capsules, fig. 4) into Grace 101 preamplifier, RME Fireface 800 converter, Digital Performer 5.12 recording software on MacBook Pro running OS 10.4.11.

(21)

Measuring Room IRs

Music 318, Winter 2007, Impulse Response Measurement 13

0 500 1000 1500 -0.5 0 0.5 sine sweep, s(t) amplitude frequency - kHz

sine sweep spectrogram

0 200 400 600 800 1000 0 5 10 0 500 1000 1500 2000 -1 -0.5 0 0.5

1 sine sweep response, r(t)

time - milliseconds

amplitude

time - milliseconds

frequency - kHz

sine sweep response spectrogram

0 500 1000 1500 2000 0 5 10 0 100 200 300 400 500 600 700 800 900 1000 -0.04 -0.02 0 0.02 0.04 0.06

0.08 measured impulse response

time - milliseconds

amplitude

CCRMA Lobby Measurment Example

s(t)

r(t)

ˆ

h (t)

(22)

Room IR datasets

Open AIR

○ http://www.openairlib.net/

Aachen Impulse Response Database

(23)

Convolution by FFT

Direct convolution in time domain is computationally expensive

It has O(n2) in complexity

○ Especially, when the length of IR is long (e.g. room IR)

FFT Convolution

○ Using Convolution Theorem: 𝑥 𝑛 ∗ ℎ 𝑛 ↔ 𝑌 𝑘 = 𝐻 𝑘 𝑋(𝑘)

FFT has O(nlog2n) and convolution in frequency O(n) in complexity

However, the issue is latency when the IR is long. How can we implement

it in real-time reverb?

○ Solution: partitioned convolution

(24)

References

Reverberation using Feedback Delay Network

○ https://ccrma.stanford.edu/~jos/pasp/FDN_Reverberation.html

Impulse Response Measurement

References

Related documents

AkiDwA, the African and Migrant Women’s Network in Ireland, developed this resource as part of a project funded by the Office of the Minister for Integration, examining the

Therefore, we constructed a DBR, the Questionnaire for Monitoring Behavior in Schools (QMBS) with multiple items to measure the occurrence of specific externalizing and

The new results, which are reviewed in this report, include the comparison with recent data on χ c1,2 production from ATLAS Collaboration and special discussion of heavy

Key policy drivers (IOM Health Professions Education: A Bridge to Quality (2003); Lancet Commission (Frenk et al., 2010), Framework for Action on Interprofessional Education

By the numerical simulation when the moving platform is parallel to the base, the pitch analysis of principal screws validates the kinematic characters of 3-UCR parallel robot leg. A

samtykkeskjemaet til kurset, har fem av deltakerne begynt i praksis. To av dem treffer jeg en dag etter at skolen er slutt på voksenopplæringssenteret. De synes det er uproblematisk å

Figure 4.3 Graphical representation of carbon pools for (a) Temperate lake systems; Rostherne Mere (2011) integrated water column, a eutrophic lake and (b) Arctic lake systems;

Figure 1.8 – Sperm guidance mechanisms within the mammalian female reproductive tract.. Diagrammatic representation of the mammalian female reproductive tract, illustrating