Dolby AC-3 - Multichannel Audio Coding Systems

2.2 Multichannel Audio Coding Systems

2.2.1 Dolby AC-3

A very popular algorithm for high-quality audio compression is Dolby AC-3, also known as Dolby Digital or Dolby SR·D, which was developed by Dolby Laboratories in the late 1980s. It has both stereophonic and five-channel versions, and its data rates range from 32 to 640 kbps. In the “5.1 channels” case, the minimum achieved data rate for high-quality audio is 384 kbps.

The encoding procedure of the AC-3 coding system is depicted in Fig. 2.2. Overlapping blocks of 512 Pulse Code Modulation (PCM) time samples of an audio signal are multiplied by a Kaiser-Bessel Derived (KBD) analysis window and transformed by the analysis part of a perfect-reconstruction Modified Discrete Cosine Transform (MDCT) filter bank². The resultant frequency coefficients are represented as a binary exponent and a mantissa. The exponent encoder exploits the redundancies of the MDCT coefficients in the time and frequency domains and produces an estimate of the signal spectrum known as the spectral envelope. The mantissa quantizer groups MDCT coefficients in blocks, while the maximum of each block is quantized as an exponent proportional to the number of left shifts required until overflow. In the next step of encoding, the spectral envelope is used in the bit allocation routine. This routine determines the number of bits that will be used to encode each mantissa and the prospective bit rate. Finally, the encoded spectral envelope and the quantized mantissas are combined into AC-3 frames, which are transmitted to the receiver.
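The exponent and mantissa representation described above can be illustrated with a minimal sketch. It assumes coefficients normalised to the open interval (-1, 1) and an exponent capped at 24 left shifts; both the cap and the function name `split_coefficient` are assumptions made for illustration, and the grouping and differential coding of exponents into a spectral envelope are left out.

```python
import math

MAX_EXPONENT = 24  # assumed upper limit on the number of left shifts


def split_coefficient(coeff):
    """Return (exponent, mantissa) such that coeff == mantissa * 2**-exponent.

    The exponent counts the left shifts (doublings) needed to bring
    |coeff| into [0.5, 1), i.e. just short of overflow, capped at
    MAX_EXPONENT.
    """
    if not -1.0 < coeff < 1.0:
        raise ValueError("coefficient must lie in (-1, 1)")
    if coeff == 0.0:
        return MAX_EXPONENT, 0.0
    # frexp returns coeff = frac * 2**e with 0.5 <= |frac| < 1,
    # so -e is the number of left shifts for |coeff| < 1.
    _, e = math.frexp(coeff)
    exponent = min(MAX_EXPONENT, -e)
    mantissa = coeff * 2.0 ** exponent
    return exponent, mantissa
```

For example, `split_coefficient(0.1)` needs three left shifts, because 0.1 × 2³ = 0.8 is the first doubling that lands in [0.5, 1). Small coefficients therefore receive large exponents, and the sequence of exponents traces a coarse picture of the spectrum.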

² More information related to the MDCT filter bank and window functions can be found in subsections 3.8
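As a rough illustration of the windowed MDCT analysis and synthesis mentioned above, the following sketch applies a KBD window to overlapping 512-sample blocks, takes the MDCT, and then inverts it with windowed overlap-add. It uses only the Python standard library and direct summation rather than a fast algorithm. The window parameter `alpha = 5.0`, the 2/M scaling of the inverse transform, and all function names are assumptions chosen for this example, not details taken from the text.

```python
import math


def bessel_i0(x, terms=60):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) / k
        total += term * term
    return total


def kbd_window(n, alpha=5.0):
    """Kaiser-Bessel Derived window of even length n."""
    half = n // 2
    kaiser = [
        bessel_i0(math.pi * alpha
                  * math.sqrt(max(0.0, 1.0 - (2.0 * i / half - 1.0) ** 2)))
        for i in range(half + 1)
    ]
    total = sum(kaiser)
    rising, running = [], 0.0
    for i in range(half):
        running += kaiser[i]
        rising.append(math.sqrt(running / total))
    return rising + rising[::-1]  # symmetric: second half mirrors the first


def analyse_synthesise(signal, n=512, alpha=5.0):
    """MDCT each 50%-overlapping block, then invert and overlap-add."""
    m = n // 2  # hop size and number of coefficients per block
    window = kbd_window(n, alpha)
    basis = [[math.cos(math.pi / m * (i + 0.5 + m / 2.0) * (k + 0.5))
              for i in range(n)] for k in range(m)]
    hops = -(-len(signal) // m)  # ceiling division
    padded = [0.0] * m + list(signal) + [0.0] * (hops * m - len(signal) + m)
    output = [0.0] * len(padded)
    for start in range(0, len(padded) - n + 1, m):
        block = [padded[start + i] * window[i] for i in range(n)]
        # Analysis: n windowed samples -> m MDCT coefficients.
        coeffs = [sum(x * c for x, c in zip(block, row)) for row in basis]
        # Synthesis: inverse MDCT, window again, overlap-add.
        for i in range(n):
            y = (2.0 / m) * sum(coeffs[k] * basis[k][i] for k in range(m))
            output[start + i] += y * window[i]
    return output[m:m + len(signal)]
```

Because the KBD window satisfies w[i]² + w[i + N/2]² = 1, the time-domain aliasing of adjacent blocks cancels in the overlap-add, so `analyse_synthesise` returns the input up to floating-point rounding when the coefficients are left unquantized.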

[Block diagram. Blocks: Analysis MDCT filter bank, Spectral Envelope Encoder, Perceptual Model, Bit Allocation, Mantissa Quantizer, AC-3 frames. Signals: PCM samples, Exponents, Mantissas, Bit Allocation Information, Encoded Spectral Envelope, Quantized Mantissas, Encoded Bit Stream.]

Figure 2.2: The encoding procedure of Dolby AC-3 coding system.

At this point we mention that Dolby AC-3 is enhanced with psychoacoustic analysis, exploiting knowledge of the properties of the human auditory system (in particular, the spectral and temporal masking effects of the inner ear). The principle of audio masking is illustrated in Fig. 2.3. The signal component at 1 kHz raises the masking threshold, which defines the level that other signal components must exceed in order to be audible. If a second audio component is present at the same time and close in frequency to the first, it must be at a higher level than it would need to be on its own in order to be perceived by the ear; otherwise it is masked by the first signal. Essentially, the system codes only the audio signal components that the ear will hear and discards any audio information that the ear will not perceive, according to the psychoacoustic model.
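The masking principle can be made concrete with a small sketch. It is not the AC-3 perceptual model: it combines two commonly cited approximations (a Bark-scale frequency mapping and a formula for the threshold in quiet) with a triangular spreading function whose offset and slopes (14 dB, 25 dB/Bark, 10 dB/Bark) are illustrative assumptions. All function names are hypothetical.

```python
import math


def bark(freq_hz):
    """Approximate critical-band rate (Bark) of a frequency in Hz."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def threshold_in_quiet_db(freq_hz):
    """Approximate absolute hearing threshold in dB SPL."""
    f = freq_hz / 1000.0
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def masked_threshold_db(freq_hz, masker_hz, masker_db,
                        offset_db=14.0, lower_slope=25.0, upper_slope=10.0):
    """Threshold at freq_hz when a single tonal masker is present.

    Assumed model: the masker lowers by offset_db at its own frequency
    and falls off linearly in Bark, faster below the masker than above.
    """
    dz = bark(freq_hz) - bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    spread = masker_db - offset_db - slope * abs(dz)
    return max(threshold_in_quiet_db(freq_hz), spread)


def is_audible(freq_hz, level_db, masker=None):
    """True if a component exceeds the applicable threshold.

    masker is an optional (frequency_hz, level_db) pair.
    """
    if masker is None:
        return level_db > threshold_in_quiet_db(freq_hz)
    return level_db > masked_threshold_db(freq_hz, masker[0], masker[1])
```

Under these assumptions a 40 dB component at 1.1 kHz is audible on its own but falls below the raised threshold when an 80 dB masker sits at 1 kHz, whereas a 40 dB component at 8 kHz is far enough away in frequency to remain audible.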

Specifically in AC-3 (see Fig. 2.2), the coefficients of the spectral envelope are fed into a perceptual model, which estimates the masked threshold of each frame. This model exists only at the encoder and is not inverted at the decoder; it determines the set of perceptual model parameters most suitable for the audio data. After several threshold calculations in a rate control routine, these parameters are fixed and transmitted to the decoder.
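The interplay between the masked threshold and the rate control routine can be sketched as a simple loop. This is a deliberately simplified stand-in for the actual AC-3 procedure: it assumes that each mantissa bit buys about 6.02 dB of signal-to-noise ratio, and it raises the whole threshold by a single offset until the allocation fits the bit budget. The function name, the step size and the per-band bit limit are hypothetical.

```python
import math


def allocate_bits(envelope_db, mask_db, bit_budget, step_db=0.5, max_bits=16):
    """Return (offset_db, bits_per_band) that fits within bit_budget.

    envelope_db: spectral envelope level per band, in dB.
    mask_db:     masked threshold per band, in dB.
    Bands whose level lies below the (offset) threshold get zero bits.
    """
    if bit_budget < 0:
        raise ValueError("bit budget must be non-negative")
    offset_db = 0.0
    while True:
        bits = [
            min(max_bits, max(0, math.ceil((e - m - offset_db) / 6.02)))
            for e, m in zip(envelope_db, mask_db)
        ]
        if sum(bits) <= bit_budget:
            return offset_db, bits
        # Too expensive: tolerate more quantization noise and try again.
        offset_db += step_db
```

In this sketch the final offset plays the role of the parameter that is fixed by the rate control routine and sent to the decoder, which can then repeat the same calculation from the decoded envelope without needing the encoder's search.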

The encoding algorithm of AC-3 has some extra functions as well. A frame header is specified, which carries the bit rate, the number of channels, the sampling frequency and other information that the decoder needs in order to interpret the bit stream. Error detection codes are also inserted, which allow the decoder to verify that the data are error free. The spectral resolution of the analysis MDCT filter bank can vary dynamically, adapting to the features of the audio blocks. Finally, the original channels may be coupled at high frequencies during coding (this technique is also known as intensity coding) [11, 10], in order to accomplish a more efficient coding approach. Channel coupling exploits the properties of spatial hearing; the main idea is to transmit only one spectral envelope (instead of two or more from independent channels) together with some side information, which is used in the decoder for recovering the individual envelopes.

Figure 2.3: The principle of Psychoacoustic Masking [1].
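The idea behind channel coupling can be sketched on plain coefficient lists. In this hypothetical example, the bins above a coupling start bin are averaged into a single coupling channel, and one scale factor per channel and band is kept as side information so that the decoder can restore each channel's band energy. The function names, the averaging rule and the fixed band size are assumptions for illustration only.

```python
import math


def couple(channels, start_bin, band_size):
    """Merge the bins from start_bin upward into one coupling channel.

    Returns (low_parts, coupling_channel, coordinates), where
    coordinates[c][b] scales the coupling channel back to the level of
    channel c in band b.
    """
    n = len(channels[0])
    coupling = [sum(ch[i] for ch in channels) / len(channels)
                for i in range(start_bin, n)]
    coordinates = []
    for ch in channels:
        row = []
        for b in range(0, len(coupling), band_size):
            e_cpl = sum(c * c for c in coupling[b:b + band_size])
            e_ch = sum(x * x
                       for x in ch[start_bin + b:start_bin + b + band_size])
            row.append(math.sqrt(e_ch / e_cpl) if e_cpl > 0.0 else 0.0)
        coordinates.append(row)
    low_parts = [list(ch[:start_bin]) for ch in channels]
    return low_parts, coupling, coordinates


def decouple(low_parts, coupling, coordinates, band_size):
    """Rebuild each channel from its low part and the shared channel."""
    return [
        low + [c * row[i // band_size] for i, c in enumerate(coupling)]
        for low, row in zip(low_parts, coordinates)
    ]
```

Below the start bin every channel is kept as it is. Above it, only the shared channel and the small table of coordinates need to be coded, which preserves each channel's energy per band but not its individual fine structure.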

The decoding portion of the AC-3 system, on the other hand, is mainly the inverse of the encoding, as shown in Fig. 2.4. The encoded bit stream is received, synchronized, checked for transmission errors and decomposed into spectral envelopes and quantized mantissas. The bit allocation routine supplies the information needed for the de-quantization of the mantissas. The spectral envelopes are decoded into exponents, which together with the de-quantized mantissas are fed to the synthesis MDCT filter bank, which reconstructs the PCM time samples. If the transmitted channels were coupled in the encoding process, they must be de-coupled. Also, if the spectral resolution of the analysis filter bank was assigned dynamically, it must be altered in the synthesis filter bank in the same manner. For more information on Dolby AC-3 the interested reader is referred to [11, 10, 12, 13].
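The step in which exponents and de-quantized mantissas are recombined before the synthesis filter bank can be sketched as follows. The quantizer here is a simplified symmetric uniform quantizer with 2^bits − 1 levels, assumed for illustration; the function names are hypothetical and the number of bits per mantissa is taken as already known from the bit allocation.

```python
def quantize_mantissa(mantissa, bits):
    """Encoder side: map a mantissa in (-1, 1) to an integer code."""
    if bits == 0:
        return 0
    levels = 2 ** bits - 1  # odd level count keeps zero representable
    code = round((mantissa * levels + levels - 1) / 2)
    return min(levels - 1, max(0, code))


def dequantize_mantissa(code, bits):
    """Decoder side: map an integer code back to a value in (-1, 1)."""
    if bits == 0:
        return 0.0  # no bits were allocated to this coefficient
    levels = 2 ** bits - 1
    return (2 * code - (levels - 1)) / levels


def rebuild_coefficient(exponent, code, bits):
    """Undo the left shifts recorded in the exponent."""
    return dequantize_mantissa(code, bits) * 2.0 ** -exponent
```

With two bits, the three codes map to −2/3, 0 and 2/3. A mantissa of 0.8 quantized with eight bits and combined with an exponent of 3 comes back as a coefficient close to 0.1; the error is bounded by half a quantizer step scaled by 2⁻³.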

In document Multichannel Audio Modeling and Coding Using a Multiscale Source/Filter Model (Page 32-34)