1
Digital Audio Compression:
Digital Audio Compression:
Why, What, and How
Why, What, and How
An Absurdly Short Course An Absurdly Short Course
Jeff Bier
Jeff Bier
Berkeley Design Technology, Inc.
Berkeley Design Technology, Inc.
©
©2000 BDTI2000 BDTI
Outline
Outline
z Why Compress?
z What is Audio Compression?
z How Does it Work?
3
Why: Too Much Data!
Why: Too Much Data!
z “CD quality”
Rate: 2 audio channels ×44,100 samples/sec ×16 bits = 1.4 Mbit/second audio data
Audio CD capacity: 0.8 gigabytes audio data
z “Cinema quality”
~6 audio channels ×
48,000 samples/sec ×16 bits = 4.6 Mbit/second, ~1 Gbyte/hour
z Typical compression today: few hundred
kbit/sec for even 6 channels
4
Crossing Thresholds
Crossing Thresholds
M em ory C ap ac ity - M b y tes /c h ip
0 .0 0 8 0 .0 3 2 0 .1 2 5 0 .5 2 8 3 2 1 2 8 5 1 2 0 .0 0 1 0 .0 1 0 .1 1 1 0 1 0 0 1 0 0 0 1 9 7 5 1 9 8 0 1 9 8 5 1 9 9 0 1 9 9 5 2 0 0 0 2 0 0 5 2 0 1 0 Ye ar M b y tes / ch i Audio CD DVD Sound Track MP3 Album MP3 Song
5
Commercial Applications
Commercial Applications
z Cinema (digital film sound)
z Consumer devices Mini-disc (Sony) Handheld players Games DVD z Distribution Internet distribution
Audio over cable (set-top box)
Satellite/terrestrial audio broadcast
Digital television
What: Compression Goals
What: Compression Goals
z Reduced bandwidth and/or storage
z Make decoded signal as close as
possible to original signal
z Lowest implementation complexity
z Reasonable arithmetic requirements
z Applicable to as many signal types as
possible
z Robust
7
Psychoacoustics
Psychoacoustics
z What does it cover?
Relationship between what arrives at the
ear and what we hear
z Why is it important for compression?
Don’t transmit what the ear can’t hear
z How to figure out what ear can’t hear?
Range of human hearing
Masking
8
Range of Human Hearing
Range of Human Hearing
Zwicker
9
Auditory Masking
Auditory Masking
z One signal can make another inaudible Masker Masker Signal not Signal not masked masked Freq Freq.. Am p Am p . . Masked Masked signals signals Shifted Shifted hearing hearing threshold threshold Hearing Hearing threshold thresholdAuditory Masking (cont’d)
Auditory Masking (cont’d)
z Temporal Masking After Zwicker/Fastl p. 78, After Zwicker/Fastl p. 78, Buser/Imbert p. 47 Buser/Imbert p. 47 Pre-masking (“Back-ward”) 10-30 msec Shifted Threshold Masker Duration Post-masking (“Forward”) Time Time →→ Sou nd Press u re Leve l Sou nd Pr e s s u re Le v e l →→ 100-200 msec
11 Remove Inaudible Components
Perceptual Compression
Perceptual Compression
Window (Snapshot) Spectral Analysis Coding Scheme Quantize Pack Data into Frames Calculate Thresholds 12Spectral Analysis/Synthesis
Spectral Analysis/Synthesis
z What is it? Break a signal into spectrum
Recover the signal from its spectrum
Forward (analysis) Reverse/inverse
(synthesis)
13
Spectral Analysis (cont’d)
Spectral Analysis (cont’d)
z How is it done? 1. Filter bank 2. Transform ...
+
+
... Time domain Frequency domain Time domainSpectral Analysis (cont’d)
Spectral Analysis (cont’d)
z Does reconstructed output = input?
Yes, if:
Don’t change transform data
(no filtering)
Design window correctly
Window input often enough
Overlap windows in analysis/resynthesis
properly
z If yes: Identity System
15 z Spectral analysis DCT Wavelet ... Davidson et al., 1994
How: Analysis
How: Analysis
16 Davidson et al., 1994How: Analysis
How: Analysis
z Calculate masking thresholds
17
How: Noise Allocation
How: Noise Allocation
Davidson et al., 1994
z Adds noise
Keep below
masking threshold
z Remove inaudible components
z Quantize remaining components
Use minimum # bits
How: More Tricks
How: More Tricks
z Filter
Bandlimit input signal
LFE bandlimited to <120 Hz
z Differential coding of spectral values
z Coupling
Across time
19
How: Coding Scheme
How: Coding Scheme
z Direct coding
z Entropy (e.g., Huffman) coding
MPEG: “noiseless coding”
PAC: “information-theoretic coding”
z Quantization table
z Run-length
z Vector quantization
20
Artifacts: “Pre
Artifacts: “Pre
-
-
echo”
echo”
z Quantization noise spread
z Noise components ≥ 1-2 msec before
impulsive signal not masked
z Fix: Shorter window; wavelets (ePAC)
Resynthesized Original
21
Comparing Specifications
Comparing Specifications
z Bit rate ranges: < 8 kbps - 9.6 Mbps
z Bit widths: 16-24 bits
z Sample rates: 8-192 kHz
z Number of channels: 1-many dozen
z Spectral bins: 128-1024
z Time resolution: 4-12 msec
z Compression ratios: 6-12:1 typical
z Audio quality… transparent to annoying
Which Algorithm?
Which Algorithm?
MPEG-1 1992 MPEG-2 1994 AAC 1997 MPEG-4 1999 ... ATRAC (Sony) 1992 PAC (Lucent) 1992 AC-3 (Dolby) 1995 Coherent Acoustics (DTS) 1996 Qdesign TwinVQ (NTT) 1995 MLP (Meridian) 1997 G2 WMA Cine ma M PEG23
MPEG Family
MPEG Family
z Moving Pictures Experts Group
z Moving pictures + associated audio
z MPEG-1, MPEG-2 (MP3), MPEG-4
z Ongoing standardization effort
(MPEG-7)
24
MPEG
MPEG
-
-
1 Audio
1 Audio
z 1992
z Able to work well with CD, DAT
z One or two channels
Single channel
Two independent channels
Stereo
Stereo with joint coding
z 32, 44.1, 48 kHz
z Specifies bit stream format, decoder
25
MPEG
MPEG
-
-
1 Audio
1 Audio
LayersLayers
z Layer 1: simplest; Philips DCC
z Layer 2: more efficient coding; DAB,
CD-I
z Layer 3: higher frequency and time
resolution; ISDN, Internet
z All 3 layers use same header structure
z Decoder for one layer must also decode
lower-numbered layers
z Higher-numbered layers have more
complex decoder
MPEG
MPEG
-
-
2 Audio
2 Audio
z 1994
z MPEG-2 video for digital TV
z Higher bit rates than MPEG-1
z Backward compatible with MPEG-1
Three layers, like MPEG-1
z Add lower sample rates
16, 22.05, 24 kHz
z 5.1 + up to 7 multilingual/commentary
channels
z “MP3” = MPEG-1/2 Layer 3
27
MPEG
MPEG
-
-
2 Advanced Audio
2 Advanced Audio
Coding (AAC)
Coding (AAC)
z 1997
z Goals:
“Indistinguishable” at 384 kbit/sec
Higher quality, multi-channel
z Features:
“Non-backward-compatible” (“NBC”)
Up to 48 channels (stereo, 5.1 …)
“Tools” (modules) combined into
“profiles”
28
AAC Profiles
AAC Profiles
z LC (Low-Complexity)
Most commonly used
TNS (Temporal Noise Shaping)
z SSR (Scalable Sampling Rate)
Features gain control “tool”
z Main
LTP (Long-Term Predictor)
Delivers the best audio quality of the three
29
Conclusions
Conclusions
Entertainment is going digital
Audio is a key component
Many new market opportunities opening up
Internet audio is hot; audio may be the Internet “killer app”
Audio compression is a key technology
Many algorithms, many applications
Better algorithms →better quality, more compression
Computation requirements are going up
In the Future…
In the Future…
31
For More Information
For More Information
http://www.BDTI.com http://www.eg3.com/dsp comp.dsp Microprocessor Report DSP Processor Fundamentals, BDTI
Or, join BDTI...We're Hiring! (See www.BDTI.com)
Collection of BDTI's papers on DSP processors, tools, apps, benchmarking Links to other good DSP sites Usenet group
For info on newer DSPs Textbook on DSP processors