MPEG/Audio Encoding and Decoding

Figure 6 shows block diagrams of the MPEG/audio encoder and decoder. [11,12] In this high-level representation, encoding closely parallels the process described above. The input audio stream passes through a filter bank that divides the input into multiple subbands. The input audio stream simultaneously passes through a psychoacoustic model that determines the signal-to-mask ratio of each subband. The bit or noise allocation block uses the signal-to-mask ratios to decide how to apportion the total number of code bits available for the quantization of the subband signals to minimize the audibility of the quantization noise.
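One way to picture the allocation step is a greedy loop that keeps giving bits to the subband whose quantization noise is currently the most audible. The sketch below is only an illustration under stated assumptions: the function name, the 6.02 dB-per-bit approximation of quantizer signal-to-noise ratio, and the 15-bit ceiling are choices made for this example and are not taken from the standard's allocation tables.

```python
def allocate_bits(smr_db, total_bits, samples_per_band=12, max_bits=15):
    """Greedy sketch: give one more bit per sample to the subband whose
    mask-to-noise ratio (MNR = SNR - SMR) is currently the lowest."""
    DB_PER_BIT = 6.02  # assumed SNR gain per quantizer bit (rule of thumb)
    bits = [0] * len(smr_db)
    remaining = total_bits
    # Each extra bit of resolution costs one bit for every sample in the group.
    while remaining >= samples_per_band:
        candidates = [b for b in range(len(bits)) if bits[b] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=lambda b: DB_PER_BIT * bits[b] - smr_db[b])
        bits[worst] += 1
        remaining -= samples_per_band
    return bits


# Four hypothetical subbands with signal-to-mask ratios in dB.
example = allocate_bits([20.0, 5.0, -10.0, 0.0], total_bits=72)
```

In this toy run the subband with the highest signal-to-mask ratio receives the most bits, while the subband whose signal already lies below the masking threshold receives none.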

Figure 5 Audio Noise Masking (diagram labels: strong tonal signal; axis: frequency)

Finally, the last block takes the representation of the quantized audio samples and formats the data into a decodable bit stream. The decoder simply reverses the formatting, then reconstructs the quantized subband values, and finally transforms the set of subband values into a time-domain audio signal. As specified by the MPEG requirements, ancillary data not necessarily related to the audio stream can be fitted within the coded bit stream.

The MPEG/audio standard has three distinct layers for compression. Layer I forms the most basic algorithm, and Layers II and III are enhancements that use some elements found in Layer I. Each successive layer improves the compression performance but at the cost of greater encoder and decoder complexity.

Layer I

The Layer I algorithm uses the basic filter bank found in all layers. This filter bank divides the audio signal into 32 constant-width frequency bands. The filters are relatively simple and provide good time resolution with reasonable frequency resolution relative to the perceptual properties of the human ear. The design is a compromise with three notable concessions. First, the 32 constant-width bands do not accurately reflect the ear's critical bands. Figure 7 illustrates this discrepancy. The bandwidth is too wide for the lower frequencies, so the number of quantizer bits cannot be specifically tuned for the noise sensitivity within each critical band. Instead, the included critical band with the greatest noise sensitivity dictates the number of quantization bits required for the entire filter band. Second, the filter bank and its inverse are not lossless transformations. Even without quantization, the inverse transformation would not perfectly recover the original input signal. Fortunately, the error introduced by the filter bank is small and inaudible. Finally, adjacent filter bands have a significant frequency overlap. A signal at a single frequency can affect two adjacent filter bank outputs.
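The constant band width follows directly from dividing the range from 0 Hz to half the sampling rate into 32 equal parts. The short sketch below works this out; the 48 kHz sampling rate and the function names are assumptions chosen for the example, and the code treats the bands as ideal, non-overlapping ranges even though the real filters overlap as noted above.

```python
def subband_edges(sample_rate_hz, num_bands=32):
    """Return (low, high) edges in Hz for equal-width subbands up to Nyquist."""
    width = (sample_rate_hz / 2.0) / num_bands
    return [(k * width, (k + 1) * width) for k in range(num_bands)]


def subband_index(freq_hz, sample_rate_hz, num_bands=32):
    """Index of the idealized subband that contains freq_hz."""
    width = (sample_rate_hz / 2.0) / num_bands
    return min(int(freq_hz // width), num_bands - 1)


edges = subband_edges(48000)       # assumed example rate: 48 kHz
lowest_band = edges[0]             # (0.0, 750.0): every band is 750 Hz wide
band_of_1khz = subband_index(1000, 48000)
```

At this assumed rate every band is 750 Hz wide, so the lowest band alone spans a range in which the ear resolves several narrower critical bands, which is the mismatch the first concession describes.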

Figure 6 MPEG/Audio Compression and Decompression
(a) MPEG/Audio Encoder: PCM audio input -> time-to-frequency mapping filter bank -> bit/noise allocation, quantizer, and coding -> bit-stream formatting -> encoded bit stream; a psychoacoustic model also processes the input, and ancillary data (optional) can be added to the bit stream.
(b) MPEG/Audio Decoder: encoded bit stream -> bit-stream unpacking -> frequency sample reconstruction -> frequency-to-time mapping -> decoded PCM audio; ancillary data is recovered if encoded.

Figure 7 MPEG/Audio Filter Bandwidths versus Critical Bandwidths (MPEG/audio filter bank bands 0 through 31 shown against the critical band boundaries)

The filter bank provides 32 frequency samples, one sample per band, for every 32 input audio samples. The Layer I algorithm groups together 12 samples from each of the 32 bands. Each group of 12 samples receives a bit allocation and, if the bit allocation is not zero, a scale factor. Coding for stereo redundancy compression is slightly different and is discussed later in this paper. The bit allocation determines the number of bits used to represent each sample. The scale factor is a multiplier that sizes the samples to maximize the resolution of the quantizer. The Layer I encoder formats the 32 groups of 12 samples (i.e., 384 samples) into a frame. Besides the audio data, each frame contains a header, an optional cyclic redundancy code (CRC) check word, and possibly ancillary data.
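The roles of the bit allocation and the scale factor can be illustrated with a small round trip over one group of 12 subband samples. This is a minimal sketch, not the standard's procedure: it assumes the scale factor is simply the largest magnitude in the group and uses a plain symmetric uniform quantizer, and all names are hypothetical.

```python
SUBBANDS = 32
SAMPLES_PER_GROUP = 12
FRAME_SAMPLES = SUBBANDS * SAMPLES_PER_GROUP  # 384 samples per Layer I frame


def quantize_group(samples, bits):
    """Scale one group into [-1, 1] and quantize it with `bits` bits per sample."""
    if bits == 0:
        return None, []  # nothing is coded, so no scale factor is needed
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits per sample")
    scale = max(abs(s) for s in samples) or 1.0  # assumed scale factor choice
    half = ((1 << bits) - 1) // 2                # largest code magnitude
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def dequantize_group(scale, codes, bits):
    """Invert quantize_group up to the quantization error."""
    half = ((1 << bits) - 1) // 2
    return [c / half * scale for c in codes]


group = [0.5, -0.2, 0.1] + [0.0] * 9
scale, codes = quantize_group(group, bits=4)
restored = dequantize_group(scale, codes, bits=4)
```

Because the samples are normalized before quantization, the full range of codes is used regardless of how loud or quiet the group is, which is the sense in which the scale factor maximizes the resolution of the quantizer.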

Layer II

The Layer II algorithm is a simple enhancement of Layer I. It improves compression performance by coding data in larger groups. The Layer II encoder forms frames of 3 by 12 by 32 = 1,152 samples per audio channel. Whereas Layer I codes data in single groups of 12 samples for each subband, Layer II codes data in 3 groups of 12 samples for each subband. Again discounting stereo redundancy coding, there is one bit allocation and up to three scale factors for each trio of 12-sample groups. The encoder encodes with a unique scale factor for each group of 12 samples only if necessary to avoid audible distortion. The encoder shares scale factor values between two or all three groups in two other cases: (1) when the values of the scale factors are sufficiently close and (2) when the encoder anticipates that temporal noise masking by the ear will hide the consequent distortion. The Layer II algorithm also improves performance over Layer I by representing the bit allocation, the scale factor values, and the quantized samples with a more efficient code.
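The frame arithmetic and the first sharing case (scale factors that are sufficiently close) can be sketched as follows. The 5 percent tolerance, the choice to keep the larger of two shared values, and the function names are assumptions for illustration only; the sketch does not model the temporal-masking case, and it does not reproduce the standard's actual selection rules.

```python
def frame_samples(layer):
    """Samples per channel in one frame for Layer I (1) or Layer II (2)."""
    groups_per_subband = {1: 1, 2: 3}[layer]
    return groups_per_subband * 12 * 32


def share_scale_factors(trio, tolerance=0.05):
    """Return the scale factors to transmit for three consecutive groups.

    Neighbouring values within `tolerance` (relative, assumed) are merged,
    keeping the larger value so that no sample exceeds the quantizer range.
    """
    a, b, c = trio

    def close(x, y):
        return abs(x - y) <= tolerance * max(x, y)

    if close(a, b) and close(b, c) and close(a, c):
        return [max(trio)]        # one value shared by all three groups
    if close(a, b):
        return [max(a, b), c]     # first two groups share
    if close(b, c):
        return [a, max(b, c)]     # last two groups share
    return [a, b, c]              # three distinct values
```

For example, `share_scale_factors((1.0, 1.01, 0.5))` transmits two values instead of three, which is where the saving over three independent Layer I groups comes from.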

Layer III

The Layer III algorithm is a much more refined approach. [13,14] Although based on the same filter bank found in Layers I and II, Layer III compensates for some filter bank deficiencies by processing the filter outputs with a modified discrete cosine transform (MDCT). Figure 8 shows a block diagram of the process.

The MDCTs further subdivide the filter bank outputs in frequency to provide better spectral resolution. Because of the inevitable trade-off between time and frequency resolution, Layer III specifies two different MDCT block lengths: a long block of 36 samples or a short block of 12. The short block length improves the time resolution to cope with transients. Note that the short block length is one-third that of a long block; when used, three short blocks replace a single long block. The switch between long and short blocks is not instantaneous. A long block with a specialized long-to-short or short-to-long data window provides the transition mechanism from a long to a short block. Layer III has three blocking modes: two modes where the outputs of all 32 filter bank subbands pass through MDCTs with the same block length and a mixed block mode where the 2 lower-frequency bands use long blocks and the 30 upper bands use short blocks.
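The relationship between block length and spectral resolution can be seen from a direct evaluation of the MDCT, which maps a block of 2M input samples to M coefficients. The sketch below is a plain O(N^2) transform using the common textbook definition; it omits the data windows and the overlap between successive blocks, so it illustrates the block sizes rather than the complete Layer III processing.

```python
import math


def mdct(block):
    """Direct MDCT: a block of 2M samples yields M frequency coefficients."""
    n_in = len(block)
    m = n_in // 2
    return [
        sum(
            block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
            for n in range(n_in)
        )
        for k in range(m)
    ]


long_coeffs = mdct([0.0] * 36)    # long block: 36 samples -> 18 coefficients
short_coeffs = mdct([0.0] * 12)   # short block: 12 samples -> 6 coefficients
lines_with_long_blocks = 32 * len(long_coeffs)  # 576 lines across 32 subbands
```

A long block therefore splits each subband into 18 finer frequency lines, while three short blocks produce 3 x 6 = 18 values covering the same span with one-third the frequency resolution and three times the time resolution.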

Other major enhancements over the Layer I and Layer II algorithms include:

• Alias reduction - Layer III specifies a method of processing the MDCT values to remove some redundancy caused by the overlapping bands of the Layer I and Layer II filter bank.

• Nonuniform quantization - The Layer III quantizer raises its input to the 3/4 power before quantization to provide a more consistent signal-to-noise ratio over the range of quantizer values. The requantizer in the MPEG/audio decoder relinearizes the values by raising its output to the 4/3 power.

• Entropy coding of data values - Layer III uses Huffman codes to encode the quantized samples for better data compression. [15]

• Use of a bit reservoir - The design of the Layer III bit stream better fits the variable-length nature of the compressed data. As with Layer II, Layer III processes the audio data in frames of 1,152 samples. Unlike Layer II, the coded data representing these samples does not necessarily fit into a fixed-length frame in the code bit stream. The encoder can donate bits to or borrow bits from the reservoir when appropriate.

• Noise allocation instead of bit allocation - The bit allocation process used by Layers I and II only approximates the amount of noise caused by quantization to a given number of bits. The Layer III encoder uses a noise allocation iteration loop. In this loop, the quantizers are varied in an orderly way, and the resulting quantization noise is actually calculated and specifically allocated to each subband.
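The nonuniform quantization item in the list above can be made concrete with a short sketch of the 3/4-power compression and the matching 4/3-power expansion. The `step` parameter and the function names are hypothetical simplifications for this example; the way the real encoder derives its step size is not modeled here.

```python
import math


def quantize(x, step):
    """Compress the magnitude with a 3/4 power law, then round to an integer."""
    magnitude = (abs(x) / step) ** 0.75
    return int(math.copysign(round(magnitude), x))


def requantize(q, step):
    """Decoder side: relinearize by raising the magnitude to the 4/3 power."""
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)


# Spacing between neighbouring reconstruction levels at small and large values.
small_gap = requantize(2, 1.0) - requantize(1, 1.0)
large_gap = requantize(101, 1.0) - requantize(100, 1.0)
```

The reconstruction levels sit closer together for small values and farther apart for large ones, so the quantization error grows with the signal magnitude rather than staying constant, which is what keeps the signal-to-noise ratio more consistent across the quantizer range.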

Figure 8 MPEG/Audio Layer III Filter Bank Processing, Encoder Side (window select: long, long-to-short, short, short-to-long; long or short block control from the psychoacoustic model)

In document dtj v05 02 1993 pdf (Page 36-39)