Digital Audio Compression: Why, What, and How

(1)

1

Digital Audio Compression:

Why, What, and How

An Absurdly Short Course An Absurdly Short Course

Jeff Bier

Berkeley Design Technology, Inc.

©

Outline

z Why Compress?

z What is Audio Compression?

z How Does it Work?

(2)

3

Why: Too Much Data!

z “CD quality”

Rate: 2 audio channels ×44,100 samples/sec ×16 bits = 1.4 Mbit/second audio data

Audio CD capacity: 0.8 gigabytes audio data

z “Cinema quality”

~6 audio channels ×

48,000 samples/sec ×16 bits = 4.6 Mbit/second, ~1 Gbyte/hour

z Typical compression today: few hundred

kbit/sec for even 6 channels

4

Crossing Thresholds

M em ory C ap ac ity - M b y tes /c h ip

0 .0 0 8 0 .0 3 2 0 .1 2 5 0 .5 2 8 3 2 1 2 8 5 1 2 0 .0 0 1 0 .0 1 0 .1 1 1 0 1 0 0 1 0 0 0 1 9 7 5 1 9 8 0 1 9 8 5 1 9 9 0 1 9 9 5 2 0 0 0 2 0 0 5 2 0 1 0 Ye ar M b y tes / ch i Audio CD DVD Sound Track MP3 Album MP3 Song

(3)

5

Commercial Applications

z Cinema (digital film sound)

z Consumer devices Mini-disc (Sony) Handheld players Games DVD z Distribution Internet distribution

Audio over cable (set-top box)

Satellite/terrestrial audio broadcast

Digital television

What: Compression Goals

z Reduced bandwidth and/or storage

z Make decoded signal as close as

possible to original signal

z Lowest implementation complexity

z Reasonable arithmetic requirements

z Applicable to as many signal types as

possible

z Robust

(4)

7

Psychoacoustics

z What does it cover?

Relationship between what arrives at the

ear and what we hear

z Why is it important for compression?

Don’t transmit what the ear can’t hear

z How to figure out what ear can’t hear?

Range of human hearing

Masking

8

Range of Human Hearing

Zwicker

(5)

9

Auditory Masking

z One signal can make another inaudible Masker Masker Signal not Signal not masked masked Freq Freq.. Am p Am p . . Masked Masked signals signals Shifted Shifted hearing hearing threshold threshold Hearing Hearing threshold threshold

Auditory Masking (cont’d)

z Temporal Masking After Zwicker/Fastl p. 78, After Zwicker/Fastl p. 78, Buser/Imbert p. 47 Buser/Imbert p. 47 Pre-masking (“Back-ward”) 10-30 msec Shifted Threshold Masker Duration Post-masking (“Forward”) Time Time →→ Sou nd Press u re Leve l Sou nd Pr e s s u re Le v e l →→ 100-200 msec

(6)

11 Remove Inaudible Components

Perceptual Compression

Window (Snapshot) Spectral Analysis Coding Scheme Quantize Pack Data into Frames Calculate Thresholds 12

Spectral Analysis/Synthesis

z What is it?

Break a signal into spectrum

Recover the signal from its spectrum

Forward (analysis) Reverse/inverse

(synthesis)

(7)

13

Spectral Analysis (cont’d)

z How is it done? 1. Filter bank 2. Transform ...

+

... Time domain Frequency domain Time domain

Spectral Analysis (cont’d)

z Does reconstructed output = input?

Yes, if:

Don’t change transform data

(no filtering)

Design window correctly

Window input often enough

Overlap windows in analysis/resynthesis

properly

z If yes: Identity System

(8)

15 z Spectral analysis DCT Wavelet ... Davidson et al., 1994

How: Analysis

16 Davidson et al., 1994

How: Analysis

z Calculate masking thresholds

(9)

17

How: Noise Allocation

Davidson et al., 1994

z Adds noise

Keep below

masking threshold

z Remove inaudible components

z Quantize remaining components

Use minimum # bits

How: More Tricks

z Filter

Bandlimit input signal

LFE bandlimited to <120 Hz

z Differential coding of spectral values

z Coupling

Across time

(10)

19

How: Coding Scheme

z Direct coding

z Entropy (e.g., Huffman) coding

MPEG: “noiseless coding”

PAC: “information-theoretic coding”

z Quantization table

z Run-length

z Vector quantization

20

Artifacts: “Pre

-

echo”

z Quantization noise spread

z Noise components ≥ 1-2 msec before

impulsive signal not masked

z Fix: Shorter window; wavelets (ePAC)

Resynthesized Original

(11)

21

Comparing Specifications

z Bit rate ranges: < 8 kbps - 9.6 Mbps

z Bit widths: 16-24 bits

z Sample rates: 8-192 kHz

z Number of channels: 1-many dozen

z Spectral bins: 128-1024

z Time resolution: 4-12 msec

z Compression ratios: 6-12:1 typical

z Audio quality… transparent to annoying

Which Algorithm?

MPEG-1 1992 MPEG-2 1994 AAC 1997 MPEG-4 1999 ... ATRAC (Sony) 1992 PAC (Lucent) 1992 AC-3 (Dolby) 1995 Coherent Acoustics (DTS) 1996 Qdesign TwinVQ (NTT) 1995 MLP (Meridian) 1997 G2 WMA Cine ma M PEG

(12)

23

MPEG Family

z Moving Pictures Experts Group

z Moving pictures + associated audio

z MPEG-1, MPEG-2 (MP3), MPEG-4

z Ongoing standardization effort

(MPEG-7)

24

MPEG

-

1 Audio

z 1992

z Able to work well with CD, DAT

z One or two channels

Single channel

Two independent channels

Stereo

Stereo with joint coding

z 32, 44.1, 48 kHz

z Specifies bit stream format, decoder

(13)

25

MPEG

-

1 Audio

Layers

z Layer 1: simplest; Philips DCC

z Layer 2: more efficient coding; DAB,

CD-I

z Layer 3: higher frequency and time

resolution; ISDN, Internet

z All 3 layers use same header structure

z Decoder for one layer must also decode

lower-numbered layers

z Higher-numbered layers have more

complex decoder

MPEG

-

2 Audio

z 1994

z MPEG-2 video for digital TV

z Higher bit rates than MPEG-1

z Backward compatible with MPEG-1

Three layers, like MPEG-1

z Add lower sample rates

16, 22.05, 24 kHz

z 5.1 + up to 7 multilingual/commentary

channels

z “MP3” = MPEG-1/2 Layer 3

(14)

27

MPEG

-

2 Advanced Audio

Coding (AAC)

z 1997

z Goals:

“Indistinguishable” at 384 kbit/sec

Higher quality, multi-channel

z Features:

“Non-backward-compatible” (“NBC”)

Up to 48 channels (stereo, 5.1 …)

“Tools” (modules) combined into

“profiles”

28

AAC Profiles

z LC (Low-Complexity)

Most commonly used

TNS (Temporal Noise Shaping)

z SSR (Scalable Sampling Rate)

Features gain control “tool”

z Main

LTP (Long-Term Predictor)

Delivers the best audio quality of the three

(15)

29

Conclusions

Entertainment is going digital

Audio is a key component

Many new market opportunities opening up

Internet audio is hot; audio may be the Internet “killer app”

Audio compression is a key technology

Many algorithms, many applications

Better algorithms →better quality, more compression

Computation requirements are going up

In the Future…

(16)

31

For More Information

http://www.BDTI.com http://www.eg3.com/dsp comp.dsp Microprocessor Report DSP Processor Fundamentals, BDTI

Or, join BDTI...We're Hiring! (See www.BDTI.com)

Collection of BDTI's papers on DSP processors, tools, apps, benchmarking Links to other good DSP sites Usenet group

For info on newer DSPs Textbook on DSP processors