Digital Audio Compression: Why, What, and How


An Absurdly Short Course

Jeff Bier

Berkeley Design Technology, Inc.

©2000 BDTI

Outline

- Why Compress?
- What is Audio Compression?
- How Does it Work?

Why: Too Much Data!

- “CD quality”
  - Rate: 2 audio channels × 44,100 samples/sec × 16 bits = 1.4 Mbit/second audio data
  - Audio CD capacity: 0.8 gigabytes audio data
- “Cinema quality”
  - ~6 audio channels × 48,000 samples/sec × 16 bits = 4.6 Mbit/second, ~2 Gbyte/hour
- Typical compression today: a few hundred kbit/sec for even 6 channels
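The arithmetic behind these figures can be checked with a short sketch. The 74-minute disc length and the 384 kbit/sec compressed rate are assumptions chosen for illustration (the slide only says "a few hundred kbit/sec"); the function names are hypothetical.

```python
def pcm_bit_rate(channels, sample_rate_hz, bits_per_sample):
    """Uncompressed (PCM) audio data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample

def gbytes(bit_rate, seconds):
    """Storage needed for `seconds` of audio at `bit_rate`, in 10^9 bytes."""
    return bit_rate * seconds / 8 / 1e9

cd_rate = pcm_bit_rate(2, 44_100, 16)       # "CD quality"
cinema_rate = pcm_bit_rate(6, 48_000, 16)   # "cinema quality"

cd_capacity_gb = gbytes(cd_rate, 74 * 60)   # assumed 74-minute disc
cinema_gb_per_hour = gbytes(cinema_rate, 3600)

# Assumed compressed rate for 6 channels, for illustration only.
compressed_rate = 384_000
compression_ratio = cinema_rate / compressed_rate

print(f"CD:     {cd_rate / 1e6:.2f} Mbit/s, {cd_capacity_gb:.2f} GB per disc")
print(f"Cinema: {cinema_rate / 1e6:.2f} Mbit/s, {cinema_gb_per_hour:.2f} GB per hour")
print(f"Ratio at 384 kbit/s: {compression_ratio:.0f}:1")
```

Under these assumptions the compressed stream is roughly a twelfth of the raw cinema rate, which is in line with the compression ratios quoted later in the talk.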

Crossing Thresholds

[Figure: memory capacity in Mbytes/chip (log scale, 0.001 to 1000) plotted against year (1975 to 2010), with capacity thresholds marked for Audio CD, DVD Sound Track, MP3 Album, and MP3 Song]

Commercial Applications

- Cinema (digital film sound)
- Consumer devices
  - Mini-disc (Sony)
  - Handheld players
  - Games
  - DVD
- Distribution
  - Internet distribution
  - Audio over cable (set-top box)
  - Satellite/terrestrial audio broadcast
  - Digital television

What: Compression Goals

- Reduced bandwidth and/or storage
- Make decoded signal as close as possible to original signal
- Lowest implementation complexity
- Reasonable arithmetic requirements
- Applicable to as many signal types as possible
- Robust

Psychoacoustics

- What does it cover?
  - Relationship between what arrives at the ear and what we hear
- Why is it important for compression?
  - Don’t transmit what the ear can’t hear
- How to figure out what the ear can’t hear?
  - Range of human hearing
  - Masking

Range of Human Hearing

[Figure: range of human hearing, after Zwicker]
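The lower edge of the hearing range, the threshold in quiet, is often approximated in codec literature with a closed-form curve attributed to Terhardt. The sketch below uses that approximation; it is a stand-in chosen for illustration and is not taken from the slide's figure.

```python
import math

def threshold_in_quiet_db_spl(freq_hz):
    """Approximate absolute hearing threshold in dB SPL (Terhardt's formula).

    This is a commonly quoted approximation, used here as an assumption;
    real listeners vary, and the fit is only meaningful for roughly 20 Hz to 20 kHz.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for freq in (100, 1000, 3300, 10_000, 16_000):
    print(f"{freq:>6} Hz: {threshold_in_quiet_db_spl(freq):6.1f} dB SPL")
```

The curve dips to its minimum near 3 to 4 kHz, where the ear is most sensitive, and rises steeply at both ends. A component whose level lies below this curve does not need to be transmitted at all.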

Auditory Masking

- One signal can make another inaudible

[Figure: amplitude versus frequency, showing a masker, the hearing threshold, the shifted hearing threshold around the masker, masked signals lying below the shifted threshold, and a signal that is not masked]
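A very rough way to model the shifted threshold is a triangular "spreading function" centered on the masker, measured on the Bark (critical-band) scale. In the sketch below the Bark conversion is the Zwicker and Terhardt approximation; the 10 dB offset and the two slopes are illustrative assumptions, not values from the talk or from any particular codec.

```python
import math

def bark(freq_hz):
    """Critical-band rate in Bark (Zwicker and Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def shifted_threshold_db(freq_hz, masker_freq_hz, masker_level_db,
                         offset_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Masking threshold at freq_hz caused by a single masker.

    offset_db and the slopes (dB per Bark) are assumed, illustrative values.
    Masking extends further toward higher frequencies, hence the shallower upper slope.
    """
    distance = bark(freq_hz) - bark(masker_freq_hz)
    slope = upper_slope if distance >= 0 else lower_slope
    return masker_level_db - offset_db - slope * abs(distance)

def is_masked(freq_hz, level_db, masker_freq_hz, masker_level_db):
    """True if a tone at freq_hz/level_db falls below the shifted threshold."""
    return level_db < shifted_threshold_db(freq_hz, masker_freq_hz, masker_level_db)

# An 80 dB masker at 1 kHz hides a 50 dB tone nearby but not one far away.
print(is_masked(1100, 50.0, 1000, 80.0))
print(is_masked(4000, 50.0, 1000, 80.0))
```

A real psychoacoustic model combines many maskers with the threshold in quiet; this sketch only shows the shape of the idea for one masker.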

Auditory Masking (cont’d)

- Temporal Masking

[Figure: sound pressure level versus time, showing the shifted threshold around a masker: pre-masking (“backward”) for 10-30 msec before the masker, the masker duration, and post-masking (“forward”) for 100-200 msec after it. After Zwicker/Fastl p. 78, Buser/Imbert p. 47]

Perceptual Compression

[Block diagram with stages: Window (Snapshot), Spectral Analysis, Calculate Thresholds, Remove Inaudible Components, Quantize, Coding Scheme, Pack Data into Frames]

Spectral Analysis/Synthesis

- What is it?
  - Break a signal into its spectrum
  - Recover the signal from its spectrum

[Figure: forward (analysis) and reverse/inverse (synthesis) transforms]

Spectral Analysis (cont’d)

- How is it done?
  1. Filter bank
  2. Transform

[Figure: time domain → frequency domain → time domain]

Spectral Analysis (cont’d)

- Does reconstructed output = input? Yes, if:
  - Transform data is not changed (no filtering)
  - Window is designed correctly
  - Input is windowed often enough
  - Windows overlap properly in analysis/resynthesis
- If yes: Identity System
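The identity property can be demonstrated with a small overlap-add sketch. It uses a plain DFT and a square-root Hann window with 50% overlap, which is one window choice that satisfies the conditions above; these are assumptions for illustration, and real codecs typically use other transforms and windows.

```python
import cmath
import math

def dft(block):
    n = len(block)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def analysis_synthesis(signal, n=16):
    """Window, transform, inverse-transform, window again, and overlap-add."""
    hop = n // 2  # 50% overlap: "window input often enough"
    # Periodic sqrt-Hann window: squared copies shifted by hop sum to exactly 1.
    window = [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * t / n)) for t in range(n)]
    output = [0.0] * len(signal)
    for start in range(0, len(signal) - n + 1, hop):
        block = [window[t] * signal[start + t] for t in range(n)]
        spectrum = dft(block)        # forward (analysis)
        rebuilt = idft(spectrum)     # inverse (synthesis); spectrum left unchanged
        for t in range(n):
            output[start + t] += window[t] * rebuilt[t]
    return output

signal = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
output = analysis_synthesis(signal)

hop = 8
# The first and last half-blocks are covered by only one window, so compare the interior.
max_error = max(abs(a - b) for a, b in zip(signal[hop:-hop], output[hop:-hop]))
print(f"largest reconstruction error: {max_error:.2e}")
```

The error is at the level of floating-point rounding. As soon as the spectrum is modified between the two transforms (for example by quantizing it), the system stops being an identity, and the design question becomes where the resulting error ends up.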

How: Analysis

- Spectral analysis
  - DCT
  - Wavelet
  - ...

[Figure: Davidson et al., 1994]

How: Analysis (cont’d)

- Calculate masking thresholds

[Figure: Davidson et al., 1994]

How: Noise Allocation

- Quantization adds noise
  - Keep it below the masking threshold
- Remove inaudible components
- Quantize remaining components
  - Use the minimum number of bits

[Figure: Davidson et al., 1994]
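One simplified way to picture "minimum number of bits" is the rule of thumb that each quantizer bit buys about 6 dB of signal-to-noise ratio. The sketch below applies that rule per frequency band; the band levels and thresholds are made-up numbers, and real coders use more elaborate allocation loops and constant offsets that this ignores.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB of SNR per quantizer bit

def allocate_bits(signal_db, threshold_db):
    """Smallest bit count per band that keeps quantization noise under the threshold.

    A band whose level is at or below its masking threshold is inaudible
    and gets zero bits (it is removed).
    """
    bits = []
    for level, threshold in zip(signal_db, threshold_db):
        if level <= threshold:
            bits.append(0)
        else:
            required_snr_db = level - threshold
            bits.append(math.ceil(required_snr_db / DB_PER_BIT))
    return bits

# Hypothetical per-band signal levels and masking thresholds, in dB.
signal_db = [70.0, 62.0, 40.0, 55.0]
threshold_db = [45.0, 50.0, 48.0, 30.0]

bits = allocate_bits(signal_db, threshold_db)
print(bits)  # [5, 2, 0, 5]
```

The third band sits below its threshold and is dropped entirely; the second band needs only 2 bits because a strong masker has raised its threshold close to the signal level.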

How: More Tricks

- Filter
  - Bandlimit input signal
    - LFE bandlimited to <120 Hz
- Differential coding of spectral values
- Coupling
  - Across time
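Differential coding works because neighboring spectral values tend to be similar, so the differences are small and cheap to code. A minimal sketch, with made-up values standing in for a slowly varying spectral envelope:

```python
def delta_encode(values):
    """Replace each value with its difference from the previous one."""
    deltas = []
    previous = 0
    for value in values:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values = []
    total = 0
    for delta in deltas:
        total += delta
        values.append(total)
    return values

# Hypothetical spectral-envelope values for adjacent frequency bands.
envelope = [12, 12, 11, 11, 10, 10, 10, 9]
deltas = delta_encode(envelope)

print(deltas)  # [12, 0, -1, 0, -1, 0, 0, -1]
assert delta_decode(deltas) == envelope
```

After the first entry, every difference is 0 or -1, so a coder can spend far fewer bits on each one than on the raw values.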

How: Coding Scheme

- Direct coding
- Entropy (e.g., Huffman) coding
  - MPEG: “noiseless coding”
  - PAC: “information-theoretic coding”
- Quantization table
- Run-length
- Vector quantization
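To illustrate entropy coding, the sketch below builds a Huffman code for a short run of quantized values and compares its size with direct fixed-length coding. The input values are invented; actual codecs use predefined code tables rather than building a tree per frame.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a {symbol: bit string} prefix code built from symbol frequencies."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical quantized spectral values: small values dominate.
quantized = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1, -1, 2, 3]
code = huffman_code(quantized)

coded_bits = sum(len(code[v]) for v in quantized)
fixed_bits = 3 * len(quantized)  # 5 distinct values need 3 bits each with direct coding

print(f"Huffman: {coded_bits} bits, direct: {fixed_bits} bits")  # 30 vs 48
```

The step is lossless, which is why MPEG calls it "noiseless coding": it removes statistical redundancy without adding any further quantization error.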

Artifacts: “Pre-echo”

- Quantization noise is spread across the whole analysis window
- Noise components 1-2 msec before an impulsive signal are not masked
- Fix: Shorter window; wavelets (ePAC)

[Figure: original and resynthesized waveforms]
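The effect can be reproduced with a toy transform coder. The sketch below quantizes DFT coefficients coarsely and counts how many samples before an onset end up carrying coding noise, once with one long block and once with short blocks. The DFT, the step size, and the test signal are all assumptions for illustration; no windowing or overlap is used.

```python
import cmath
import math

def dft(block):
    n = len(block)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(block))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def code_block(block, step):
    """Transform a block, round every coefficient to a multiple of step, transform back."""
    quantized = [complex(round(c.real / step) * step, round(c.imag / step) * step)
                 for c in dft(block)]
    return idft(quantized)

def pre_echo_samples(signal, onset, block_len, step=0.5):
    """Number of samples between the first noisy sample and the onset."""
    decoded = []
    for start in range(0, len(signal), block_len):
        decoded.extend(code_block(signal[start:start + block_len], step))
    noisy = [i for i in range(onset) if abs(decoded[i] - signal[i]) > 1e-6]
    return onset - noisy[0] if noisy else 0

# Silence followed by a short burst starting at sample 56.
onset = 56
signal = [0.0] * onset + [1.0, -0.8, 0.6, -0.4, 0.3, -0.2, 0.1, -0.05]

long_window = pre_echo_samples(signal, onset, block_len=64)
short_window = pre_echo_samples(signal, onset, block_len=8)
print(f"pre-echo with 64-sample block: {long_window} samples")
print(f"pre-echo with 8-sample blocks: {short_window} samples")
```

With the long block, the quantization error of the burst is smeared back over the silent part of the block. With short blocks, the silent blocks decode to exact silence and the noise stays inside the block that contains the burst, which is the motivation for switching to shorter windows around transients.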

Comparing Specifications

- Bit rate ranges: < 8 kbps to 9.6 Mbps
- Bit widths: 16-24 bits
- Sample rates: 8-192 kHz
- Number of channels: 1 to many dozen
- Spectral bins: 128-1024
- Time resolution: 4-12 msec
- Compression ratios: 6-12:1 typical
- Audio quality: transparent to annoying

Which Algorithm?

[Figure: timeline of audio compression algorithms, with groups labeled “Cinema” and “MPEG”]

- MPEG-1, 1992
- MPEG-2, 1994
- AAC, 1997
- MPEG-4, 1999
- ...
- ATRAC (Sony), 1992
- PAC (Lucent), 1992
- AC-3 (Dolby), 1995
- Coherent Acoustics (DTS), 1996
- QDesign
- TwinVQ (NTT), 1995
- MLP (Meridian), 1997
- G2
- WMA

MPEG Family

- Moving Picture Experts Group
- Moving pictures + associated audio
- MPEG-1, MPEG-2, MPEG-4 (“MP3” is Layer 3 of MPEG-1/2 audio)
- Ongoing standardization effort (MPEG-7)

MPEG-1 Audio

- 1992
- Able to work well with CD, DAT
- One or two channels
  - Single channel
  - Two independent channels
  - Stereo
  - Stereo with joint coding
- 32, 44.1, 48 kHz
- Specifies bit stream format, decoder

MPEG-1 Audio Layers

- Layer 1: simplest; Philips DCC
- Layer 2: more efficient coding; DAB, CD-I
- Layer 3: higher frequency and time resolution; ISDN, Internet
- All 3 layers use same header structure
- Decoder for one layer must also decode lower-numbered layers
- Higher-numbered layers have more complex decoder
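Because all three layers share one header structure, a decoder can read the layer, sample rate, and channel mode the same way for any MPEG-1 audio frame. The sketch below reflects the 32-bit frame header layout as commonly documented for ISO/IEC 11172-3; treat the bit positions as an assumption to verify against the standard. Only three fields are decoded, the function name is hypothetical, and the mode names follow the wording of the MPEG-1 Audio slide.

```python
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}
SAMPLE_RATES_HZ = {0b00: 44_100, 0b01: 48_000, 0b10: 32_000}
MODES = {
    0b00: "stereo",
    0b01: "stereo with joint coding",
    0b10: "two independent channels",
    0b11: "single channel",
}

def parse_mpeg1_audio_header(header):
    """Decode layer, sample rate, and channel mode from a 4-byte frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 20 != 0xFFF:              # 12-bit sync word
        raise ValueError("sync word not found")
    if (word >> 19) & 1 != 1:            # ID bit: 1 means MPEG-1
        raise ValueError("not an MPEG-1 audio header")
    layer_bits = (word >> 17) & 0b11
    rate_bits = (word >> 10) & 0b11
    if layer_bits not in LAYERS or rate_bits not in SAMPLE_RATES_HZ:
        raise ValueError("reserved field value")
    return {
        "layer": LAYERS[layer_bits],
        "sample_rate_hz": SAMPLE_RATES_HZ[rate_bits],
        "mode": MODES[(word >> 6) & 0b11],
    }

# Example header bytes chosen for illustration.
info = parse_mpeg1_audio_header(bytes.fromhex("fffb9064"))
print(info)
```

The bit-rate index, padding, and other fields are skipped here because their interpretation depends on the layer; a full parser would need the per-layer bit-rate tables from the standard.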

MPEG-2 Audio

- 1994
- MPEG-2 video for digital TV
- Higher bit rates than MPEG-1
- Backward compatible with MPEG-1
  - Three layers, like MPEG-1
- Adds lower sample rates
  - 16, 22.05, 24 kHz
- 5.1 + up to 7 multilingual/commentary channels
- “MP3” = MPEG-1/2 Layer 3

MPEG-2 Advanced Audio Coding (AAC)

- 1997
- Goals:
  - “Indistinguishable” at 384 kbit/sec
  - Higher quality, multi-channel
- Features:
  - “Non-backward-compatible” (“NBC”)
  - Up to 48 channels (stereo, 5.1 …)
  - “Tools” (modules) combined into “profiles”

AAC Profiles

- LC (Low-Complexity)
  - Most commonly used
  - TNS (Temporal Noise Shaping)
- SSR (Scalable Sampling Rate)
  - Features gain control “tool”
- Main
  - LTP (Long-Term Predictor)
  - Delivers the best audio quality of the three

Conclusions

- Entertainment is going digital
  - Audio is a key component
  - Many new market opportunities opening up
  - Internet audio is hot; audio may be the Internet “killer app”
- Audio compression is a key technology
  - Many algorithms, many applications
  - Better algorithms: better quality, more compression
  - Computation requirements are going up

In the Future…

For More Information

- http://www.BDTI.com: Collection of BDTI's papers on DSP processors, tools, apps, benchmarking
- http://www.eg3.com/dsp: Links to other good DSP sites
- comp.dsp: Usenet group
- Microprocessor Report: For info on newer DSPs
- DSP Processor Fundamentals, BDTI: Textbook on DSP processors

Or, join BDTI... We're Hiring! (See www.BDTI.com)
