Aljoscha Smolic Multimedia Communications FOR CLASS USE ONLY
DO NOT DISTRIBUTE
Multimedia Communications
Dr.-Ing. Aljoscha Smolic
MMC Overview
1. Introduction
2. Fundamentals (Signal Processing, Information Theory)
3. Speech Processing & Coding
4. Audio Processing & Coding
5. Still Image Coding (JPEG, etc.)
6. Video Coding (MPEG, etc.)
7. MPEG-4 Multimedia Framework, MPEG-7
8. 3D Video and Free Viewpoint Video
Audio Overview
Properties of audition
Motivation, requirements, parameters, quality of audio coding
PCM parameters of audio coding
MPEG-4 Audio Lossless Coding (TU-Berlin, Liebchen/Noll)
Standard Audio coding: MPEG-1 (MP3), MPEG-2/4 AAC
Multi channel audio (e.g. 5.1 systems), 3D audio (MPEG-4)
Materials
Liebchen/Noll, FG, NUE, TU-Berlin, Lossless Audio Coding:
http://www.nue.tu-berlin.de/menue/forschung/projekte/beendete_projekte/mpeg-4_audio_lossless_coding_als/
K.H. Brandenburg, "MP3 and AAC Explained", AES 17th International Conference on High Quality Audio Coding
Book
M. Bossi, R.E. Goldberg, "Introduction to Digital Audio Coding and Standards", Kluwer Acad. Publishers, 2002.
Examples: MP3 Audio Coding
Audio\MP3\128_Karma_Police.mp3 128 kbit/s
Audio\MP3\64_Karma_Police.mp3 64 kbit/s
Audio\MP3\32_Karma_Police.mp3 32 kbit/s
Audio\MP3\20_Karma_Police.mp3 20 kbit/s
Term: Audio
audire (Latin) = to hear
Audio = everything related to the human aural sense
Also used for technical equipment
Sound = oscillations of air pressure
Ear = Measuring instrument for oscillations of air pressure
Physiology of Audition
[Figure: cross-section of the ear. Labels: ossicles, balance organ, acoustic nerve, cochlea (snail), Eustachian tube, eardrum, auditory canal]
Physiology of Audition
External ear:
Auricle (ear conch), auditory canal
Resonator: amplification, direction
Middle ear:
Eardrum, ossicles: amplification (leverage, area transformation)
Protection: a muscle can block the ossicles, but its reaction time is 60-120 ms
Eustachian tube to the mouth: pressure compensation
Physiology of Audition
Cochlea, 32 mm long, 2 ½ windings
Filled with fluid
• 3 canals separated by membranes
• On the basilar membrane: hair cells (organ of Corti)
Physiology of Audition
Process of hearing:
The eardrum is set into oscillation by sound
Motion of the ossicles
Transmission to the cochlea in the inner ear
A fluid wave emerges in the cochlea
Hair cells are sheared
The position of maximum displacement of the membrane determines the frequency (position-frequency transform)
The stimulus is transmitted via the acoustic nerve to the brain
Sound Intensity Level
• Sound intensity is measured as sound intensity level
• Reference pressure: auditory threshold at 1 kHz
• Logarithmic scale for adaptation to characteristics of audition,
(doubling of pressure = 6 dB increase of sound intensity level)
$$L = 20 \log_{10}\!\left(\frac{p}{p_0}\right)\ \mathrm{dB}, \qquad p_0 = 20\ \mu\mathrm{Pa}$$
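The level definition can be checked numerically. The following is a minimal sketch, assuming RMS pressures in pascal and the reference pressure of 20 µPa; the function name `sound_pressure_level` is chosen here for illustration only.

```python
import math

P0 = 20e-6  # reference pressure in Pa (20 µPa, auditory threshold at 1 kHz)

def sound_pressure_level(p, p0=P0):
    """Level in dB of a sound pressure p (in Pa) relative to p0."""
    return 20.0 * math.log10(p / p0)

# Doubling the pressure adds about 6 dB to the level.
print(sound_pressure_level(2 * P0) - sound_pressure_level(P0))

# A pressure ratio of one million corresponds to 120 dB.
print(sound_pressure_level(1e6 * P0))
```

The logarithm turns pressure ratios into level differences, which is why a fixed factor (doubling) always gives the same increase in dB regardless of the absolute pressure.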
Parameters of Audition
• Frequency range: 20‐20000 Hz
• Dynamic range: 120 dB
• Frequency resolution: 850 tones
• 400,000 sounds
• The ear performs a short-term spectral analysis: perception of frequencies
• Phase differences are mainly important for perceiving the direction of sound
Masking by White Noise
Masking by Narrow Band Limited Noise
Masking by Narrow Band Limited Noise
Relative Perception of Tones
• Tones that are close together in frequency are not perceived separately
• Minimum distance of two tones to be perceived separately is
dependent on absolute frequency:
– The higher the absolute frequency, the larger the minimum
distance
Relative Perception of Tones
• Audition works like a band-pass filter bank (third-octave bands)
• Separation of the audible frequency range into 25 frequency groups (critical bands, each about a third octave wide)
• Usage e.g. for coding (sub-band coding in MP3)
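To illustrate why the minimum distance between separately perceived tones grows with frequency, the sketch below computes the width of an idealized third-octave band around a given centre frequency. The band edges at fc·2^(±1/6) are the usual textbook idealization and only approximate the frequency groups of the ear; the function name is hypothetical.

```python
def third_octave_band(fc):
    """Return (lower edge, upper edge, bandwidth) in Hz of an ideal third-octave band."""
    lower = fc * 2 ** (-1 / 6)
    upper = fc * 2 ** (1 / 6)
    return lower, upper, upper - lower

for fc in (100.0, 1000.0, 10000.0):
    lower, upper, width = third_octave_band(fc)
    print(f"{fc:7.0f} Hz: {lower:8.1f} .. {upper:8.1f} Hz, width {width:7.1f} Hz")
```

The bandwidth is a constant fraction (about 23 %) of the centre frequency, so the bands become wider in Hz as the frequency rises.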
Requirements on Audio Coding
• High quality
– Lossless
– Transparent
– "Acceptable" quality
• Robustness against channel errors
• Low complexity, power consumption
• Low delay
• Graceful Degradation
• Editing capability (for professional applications)
Quality Parameters of Audio Coding
• Distortion, signal‐to‐noise‐ratio (SNR, MOS)
• Frequency range (20‐20000 Hz)
Subjective Quality Measures
• MOS: Mean opinion score
– 5 Imperceptible
– 4 Perceptible, but not annoying
– 3 Slightly annoying
– 2 Annoying
– 1 Very annoying
• Subjective tests: many participants (experts or average listeners?)
• Representative test material (average or challenging material?)
• Test conditions (speakers, room acoustics, distraction)
PCM Coding
Signal              Range [Hz]   Sampling rate [kHz]   Bits/sample   Bit rate [kbit/s]
Telephone           300-3400     8                     8             64
Wideband speech     50-7000      16                    8             128
Narrowband audio    10-10000     24                    16            384
Wideband audio      10-20000     48                    16            768
• Mono signals
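The bit rates in the table follow directly from sampling rate times bits per sample. A minimal sketch that reproduces the four mono rows (the function and dictionary names are illustrative only):

```python
def pcm_bitrate_kbps(sampling_rate_khz, bits_per_sample, channels=1):
    """Raw PCM bit rate in kbit/s."""
    return sampling_rate_khz * bits_per_sample * channels

# (sampling rate in kHz, bits per sample) as listed in the table
pcm_formats = {
    "Telephone": (8, 8),
    "Wideband speech": (16, 8),
    "Narrowband audio": (24, 16),
    "Wideband audio": (48, 16),
}

for name, (rate_khz, bits) in pcm_formats.items():
    print(f"{name:17s} {pcm_bitrate_kbps(rate_khz, bits):4d} kbit/s")
```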
PCM Coding
Stereo, 44.1 kHz
Device                     Rate [Mbit/s]   Redundancy [Mbit/s]   Total rate [Mbit/s]
Compact Disc (CD)          1.41            2.91                  4.32
Digital Audio Tape (DAT)   1.41            1.67                  3.08
• Redundancy for error protection
PCM Coding
• Further parameters, e.g. for professional applications (studios)
• Sampling rates: 96 kHz, 192 kHz
– better filtering
– headroom for effects, mixing, …
• Resolution: 24 bit, 32 bit
– higher dynamic range
– headroom for amplification, effects, mixing, …
• Significant increase of raw PCM data rate
PCM Coding
• Multi-channel audio, e.g. Dolby Surround (cinema, DVD)
• 5.1 = 6 separate audio channels
• Significant increase of raw PCM data rate
Lossless Audio Coding
• Motivation: Storage and archiving for higher quality requirements, e.g. in studios
• Pure redundancy reduction, i.e. the signal can be reconstructed perfectly
• Irrelevancy can turn into relevancy by editing (filtering, effects, mixing, …)
• Example audio studio:
– highest-quality audio: 96 kHz x 24 bit => 2.3 Mbit/s
– 4 min song: ca. 69 MByte
– 48 tracks: ca. 3.3 GByte storage
– Several takes, songs per album, artists, …
• => Extreme storage requirements
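The studio figures can be reproduced with a few lines of arithmetic. This is a sketch assuming decimal units (1 MByte = 10^6 bytes) and one mono PCM stream per track; the helper name is hypothetical.

```python
def pcm_storage_mbyte(sampling_rate_hz, bits_per_sample, seconds, tracks=1):
    """Raw PCM storage in MByte (10**6 bytes)."""
    total_bits = sampling_rate_hz * bits_per_sample * seconds * tracks
    return total_bits / 8 / 1e6

rate_mbit_s = 96_000 * 24 / 1e6                                 # bit rate of one track
song_mbyte = pcm_storage_mbyte(96_000, 24, 4 * 60)              # one 4 min track
session_mbyte = pcm_storage_mbyte(96_000, 24, 4 * 60, tracks=48)

print(f"{rate_mbit_s:.2f} Mbit/s per track")
print(f"{song_mbyte:.1f} MByte per 4 min track")
print(f"{session_mbyte / 1000:.2f} GByte for 48 tracks")
```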
Lossless Audio Coding
• Example film audio, multi-channel 5.1
– highest-quality audio: 96 kHz x 24 bit => 2.3 Mbit/s
– 90 min: ca. 1.56 GByte
– 6 channels: ca. 9.3 GByte storage
– Mixed from different takes, versions, music, speech, sounds, effects, …
• => Extreme storage requirements
MPEG-4 Lossless Coding
• Part of MPEG-4
• Based on a proposal by Tilman Liebchen and Prof. Peter Noll, FG Nachrichtenübertragung, TU-Berlin
• Principle: ADPCM + entropy coding
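The core idea of predictive lossless coding can be sketched with the simplest possible predictor. The example below is not the MPEG-4 ALS predictor; it assumes a fixed first-order predictor (each sample is predicted by its predecessor) purely to show that the integer residual is much smaller than the signal and that the decoder recovers the PCM samples exactly.

```python
import numpy as np

def encode_residual(samples):
    """Predict each sample by its predecessor and keep the prediction error."""
    samples = np.asarray(samples, dtype=np.int64)
    residual = samples.copy()          # first sample is transmitted as is
    residual[1:] = samples[1:] - samples[:-1]
    return residual

def decode_residual(residual):
    """Invert the prediction by accumulating the residual."""
    return np.cumsum(residual)

n = np.arange(2048)
pcm = np.round(10000 * np.sin(2 * np.pi * 440 * n / 48000)).astype(np.int64)

residual = encode_residual(pcm)
restored = decode_residual(residual)

print("lossless:", np.array_equal(restored, pcm))
print("max |signal|  :", np.abs(pcm).max())
print("max |residual|:", np.abs(residual).max())
```

Because the residual has a much smaller range than the signal, the subsequent entropy coder needs fewer bits per sample.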
MPEG-4 Lossless Coding
• Forward ADPCM: the predictor coefficients have to be transmitted
• The more coefficients are used for prediction, the lower the error variance and with it the bit rate of the residual
• But: the more coefficients have to be transmitted, the higher the bit rate for side information
• Optimization problem regarding the overall bit rate
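The trade-off between residual bits and coefficient bits can be made concrete with a toy experiment. The sketch below rests on several assumptions that are not part of the slides: a synthetic second-order autoregressive test signal, a least-squares predictor, a Gaussian entropy estimate for the residual, and a hypothetical cost of 16 bits per transmitted coefficient.

```python
import numpy as np

def residual_variance(x, order):
    """Mean squared error of a least-squares linear predictor of the given order."""
    if order == 0:
        return float(np.mean(x ** 2))
    # Column i holds x[n-1-i] for every predicted sample x[n], n = order .. len(x)-1
    past = np.column_stack([x[order - 1 - i: len(x) - 1 - i] for i in range(order)])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(past, target, rcond=None)
    return float(np.mean((target - past @ coeffs) ** 2))

def estimated_frame_bits(x, order, bits_per_coeff=16):
    """Rough estimate: Gaussian entropy of the residual plus coefficient side information."""
    var = residual_variance(x, order)
    bits_per_sample = max(0.0, 0.5 * np.log2(2 * np.pi * np.e * var))
    return len(x) * bits_per_sample + order * bits_per_coeff

# Synthetic test frame: second-order autoregressive process (assumption for illustration)
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 50.0, 2048)
x = np.zeros(2048)
for n in range(2, 2048):
    x[n] = 1.8 * x[n - 1] - 0.9 * x[n - 2] + noise[n]

costs = {order: estimated_frame_bits(x, order) for order in range(9)}
best_order = min(costs, key=costs.get)
for order, bits in costs.items():
    print(f"order {order}: {bits:9.0f} bits")
print("best order:", best_order)
```

Beyond the order that matches the signal structure, the residual hardly shrinks any more, while every additional coefficient still costs side information, so the total has a minimum.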
MPEG-4 Lossless Coding
• Processing in time windows of 2048 samples at 48 kHz (ca. 43 ms)
• Division into 4 sub-segments in case of non-stationary signals
• Joint stereo coding: exploitation of redundancies between channels by prediction
• Entropy coding (Rice coding)
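Rice coding of prediction residuals can be sketched in a few lines. The bit conventions below (zigzag mapping of signed values, unary quotient written as ones terminated by a zero, then k remainder bits) are one common textbook variant and are an assumption here; the exact bitstream format of MPEG-4 ALS may differ.

```python
def zigzag(value):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * value if value >= 0 else -2 * value - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(values, k):
    """Encode integers as a string of '0'/'1' characters with Rice parameter k."""
    bits = []
    for value in values:
        u = zigzag(value)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")              # unary part
        if k > 0:
            bits.append(format(remainder, f"0{k}b"))   # k-bit binary part
    return "".join(bits)

def rice_decode(bitstring, k, count):
    """Decode `count` integers from a Rice-coded bit string."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bitstring[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                        # skip the terminating '0'
        remainder = int(bitstring[pos:pos + k], 2) if k > 0 else 0
        pos += k
        values.append(unzigzag((quotient << k) | remainder))
    return values

residuals = [0, -1, 3, -4, 2, 7, -9, 1]
code = rice_encode(residuals, k=2)
print(code, len(code), "bits")
print(rice_decode(code, k=2, count=len(residuals)) == residuals)
```

Small residuals get short code words, so the eight example values need 34 bits instead of the 128 bits of eight 16-bit PCM samples. The parameter k would be chosen per block to match the typical residual magnitude.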
MPEG-4 Lossless Coding
• Results: compression related to PCM in % (same quality, lossless!)
MPEG-4 Lossless Coding
• Limited compression, factor 2-2.5
• More is hardly possible by redundancy reduction alone
• Nevertheless: important economic factor in professional applications (storage/archiving in studios) and others
Sub-band Coding
• Band-pass filter bank
• Often separated in analogy to human perception (third octaves)
[Block diagram: x(n) → band-pass filters h1(n), h2(n), …, hM(n) → coders COD1, COD2, …, CODM → multiplexer → bitstream]
Lossy Audio Coding
• Irrelevancy reduction
• Perceptual coding
• Things that cannot be perceived are omitted
• The error is adjusted just below the threshold of perception
Perceptual Coding
• Analysis of the signal with a psychoacoustic model, which models the characteristics of audition
• Calculation of auditory thresholds, masking, temporal masking, etc.
• Quantization of the signal according to the audibility of the error
[Block diagram: filter bank → quantization → coding → MUX, with a psychoacoustic model]
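How a masking threshold can steer the quantizer is sketched below. The example assumes a uniform quantizer, whose noise power is approximately step²/12, and a hypothetical threshold value for one frequency band; a real psychoacoustic model that would supply this threshold is not shown.

```python
import numpy as np

def step_for_threshold(masked_noise_power):
    """Step size of a uniform quantizer whose noise power (step**2 / 12) equals the threshold."""
    return np.sqrt(12.0 * masked_noise_power)

def quantize(coeffs, step):
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    return indices * step

rng = np.random.default_rng(1)
band = rng.normal(0.0, 100.0, 1000)   # hypothetical spectral values of one band
threshold = 25.0                      # hypothetical masked noise power for this band

step = step_for_threshold(threshold)
indices = quantize(band, step)
error = dequantize(indices, step) - band

print(f"step size        : {step:.2f}")
print(f"noise power      : {np.mean(error ** 2):.1f} (threshold {threshold})")
print(f"distinct indices : {len(np.unique(indices))}")
```

A higher masking threshold allows a larger step, fewer distinct quantizer indices and therefore fewer bits, while the quantization noise stays at the level the model considers inaudible.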
Lossy Audio Coding
• Main technology: sub-band coding and transform coding (DCT)
• Coding in the frequency domain
• Adapted to human audition, which perceives frequency characteristics in time windows
• Separation of the signal into frequency components and separate coding
• Sub-band coding: filter bank, relatively low frequency resolution
• Transform coding (DCT): relatively high frequency resolution
• Windows: processing in blocks of samples (e.g. 8 ms)
• Problem: pre-echoes at sudden percussive sounds; the quantization error is distributed over the whole window and may become audible in silent passages
Pre-Echoes
[Figure: time window of samples → transformation → quantization → inverse transformation; the error is distributed over the whole window]
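The pre-echo effect can be reproduced with a small experiment. The sketch below uses the FFT from NumPy as a stand-in for the codec transform (an assumption for simplicity, real codecs use a DCT or MDCT) and a coarse uniform quantizer with a hypothetical step size. The block is silent until a sudden onset, yet after quantization in the frequency domain the reconstructed block contains error before the onset.

```python
import numpy as np

N = 256          # block length in samples
attack = 192     # index of the sudden onset
n = np.arange(N)

x = np.zeros(N)
x[attack:] = np.sin(2 * np.pi * 0.15 * n[attack:])   # silence, then a burst

step = 4.0                                           # coarse quantizer step (hypothetical)
X = np.fft.rfft(x)
Xq = step * (np.round(X.real / step) + 1j * np.round(X.imag / step))
y = np.fft.irfft(Xq, n=N)

error = y - x
pre_echo = np.max(np.abs(error[:attack]))

print("signal before onset is exactly zero:", bool(np.all(x[:attack] == 0)))
print(f"largest error before the onset: {pre_echo:.3f}")
```

Each quantized coefficient belongs to a basis function that extends over the whole block, so its error cannot be confined to the part after the onset. Shorter blocks limit how far the error can spread in time.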
MPEG-1 Audio Layer 1+2
[Block diagram. Encoder: analysis filter bank → scaler → quantizer → MUX; FFT → masking thresholds → dynamic bit allocation & coding. Decoder: DE-MUX → dequantizer → descaler → synthesis filter bank; dynamic bit allocation & decoding]
MPEG-1 Audio Layer 1+2
• Layer 1: simplest form, 32 frequency bands
• Layer 2: some improvements
– finer quantization
• Operation modes
– Mono
– Stereo (separately coded)
– Dual: two separate channels (e.g. for bilingual programs)
– Joint stereo (exploitation of stereo redundancy)
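One common way to exploit stereo redundancy is mid/side coding, sketched below. This is an illustration of the principle only and an assumption with respect to the slides, which do not state which joint stereo technique a given layer uses. The test signal with a right channel equal to 0.9 times the left channel is hypothetical.

```python
import numpy as np

def ms_encode(left, right):
    """Convert left/right into mid (sum) and side (difference) signals."""
    return (left + right) / 2.0, (left - right) / 2.0

def ms_decode(mid, side):
    """Recover left/right from mid/side."""
    return mid + side, mid - side

n = np.arange(1024)
left = np.sin(2 * np.pi * 440 * n / 48000)
right = 0.9 * left                       # strongly correlated channels

mid, side = ms_encode(left, right)
left_rec, right_rec = ms_decode(mid, side)

print("reconstruction exact:", np.allclose(left_rec, left) and np.allclose(right_rec, right))
print(f"energy left : {np.sum(left ** 2):.1f}")
print(f"energy side : {np.sum(side ** 2):.3f}")
```

When both channels are similar, almost all energy ends up in the mid signal and the side signal can be coded with very few bits.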
MPEG-1 Audio Layer 3 / MP3
[Block diagram. Encoder: analysis filter bank → MDCT with dynamic windowing → scaler → quantizer → Huffman coder → MUX; FFT → masking thresholds; side-information coder. Decoder: DE-MUX → Huffman decoder → dequantizer → descaler → inverse MDCT with dynamic windowing → synthesis filter bank; side-information decoder]
Improvements of MP3
• Hybrid coding: sub-band + MDCT for higher frequency resolution
– 32 bands -> up to 576 spectral lines
• Better bit assignment
• Entropy coding
• Pre-echo control
• Dynamic windows, overlapped
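The MDCT with overlapped windows can be sketched as follows. The code is a direct matrix implementation with a sine window, meant only to show that 50 % overlapping blocks of 2N samples yield N coefficients each and still reconstruct the signal exactly after overlap-add. The block size N = 18 is chosen because 32 sub-bands times 18 lines give the 576 spectral lines mentioned above; the function names are hypothetical and the window switching of MP3 is not modelled.

```python
import numpy as np

def mdct_setup(N):
    """Cosine basis (N x 2N) and sine window (length 2N) of an MDCT with N coefficients."""
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))
    window = np.sin(np.pi * (n + 0.5) / (2 * N))
    return basis, window

def mdct_analysis_synthesis(x, N):
    """MDCT every block, inverse transform it again and overlap-add the results."""
    basis, window = mdct_setup(N)
    padded = np.concatenate([np.zeros(N), x, np.zeros(N)])
    out = np.zeros_like(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):          # hop size N: 50 % overlap
        block = padded[start:start + 2 * N]
        coeffs = basis @ (window * block)                        # N coefficients per block
        out[start:start + 2 * N] += (2.0 / N) * window * (basis.T @ coeffs)
    return out[N:-N]

N = 18
x = np.random.default_rng(2).normal(size=10 * N)
y = mdct_analysis_synthesis(x, N)

print("spectral lines for 32 sub-bands:", 32 * N)
print("perfect reconstruction:", np.allclose(x, y))
```

Each block on its own contains time-domain aliasing; the aliasing terms of neighbouring blocks cancel in the overlap-add, which is why the overlap costs no additional coefficients.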
MPEG Standards
• MPEG Standards only specify the syntax of the bitstream and the decoding process
• Each provider can implement the encoder differently, as long as it produces the correct bitstream syntax (same language)
• Allows competition among providers while ensuring interoperability of systems
• A good MP3 Encoder: Fraunhofer IIS, esp. Psycho-acoustic model
MPEG Standards
• Principle of asymmetric coding: encoder is much more complex than decoder (complex signal analysis only in encoder)
• Receivers can be simple and cheap
• High complexity is often acceptable at sender side (broadcast, DVD production)
• Not the case in telecommunication, but that is not the main focus of MPEG
MPEG-1 Audio Compression Gains
Layer   Stereo rate [kbit/s]   Gain    Application
1       384                    4       DCC
2       256...192              6...8   DAB
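The gains in the table are consistent with a stereo PCM reference of 48 kHz and 16 bit, i.e. 1536 kbit/s. That reference is an assumption made here, as the slide does not state it. A minimal check:

```python
PCM_STEREO_KBPS = 48 * 16 * 2   # assumed reference: 48 kHz, 16 bit, 2 channels

def compression_gain(coded_rate_kbps, pcm_rate_kbps=PCM_STEREO_KBPS):
    """Ratio of raw PCM bit rate to coded bit rate."""
    return pcm_rate_kbps / coded_rate_kbps

for layer, rate in (("Layer 1", 384), ("Layer 2", 256), ("Layer 2", 192)):
    print(f"{layer} at {rate} kbit/s: gain {compression_gain(rate):.0f}")
```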
5.1 Surround Setup
MPEG-2 BC
• Backward compatible
• Supports 5.1 and is compatible with MPEG-1
• Supports lower sampling frequencies (16, 22.05, 24 kHz) for special applications at very low bitrates (MPEG-1: 32, 44.1, 48 kHz)
MPEG-2 AAC
• By abandoning backward compatibility, even higher compression ratios can be achieved
• Advanced Audio Coding (AAC)
• Same principle as in MP3: perceptual coding
• Some new algorithms that in sum provide further improvements
• At 96 kbit/s same quality as MP3 at 128 kbit/s
• Suitable for very low bitrates down to 32 kbit/s and below
MPEG-2 AAC
[Block diagram with the blocks: modified DCT (128/1024), psychoacoustic model, temporal noise shaping, stereo, temporal prediction, Huffman coding, coding of side information, multiplex]
MPEG-4 AAC
• Some small improvements over MPEG-2 AAC
• Spectral band replication
MPEG-4 Audio Toolbox
• Not only coding of audio
• Definition of an audio-visual scene (2D/3D), e.g. distribution of AV-objects in a virtual 3D room
• AV-scene consists of audio, video and synthetic objects => the scene is composed
• Described in a specific script language (BInary Format for Scenes, BIFS, superset of VRML)
MPEG-4 Audio Toolbox
• Specification of different codecs, optimized for different types of data, e.g.
– General audio (MPEG-4 AAC)
– Speech codecs (CELP, text-to-speech)
– Wave table codecs for synthetic audio (MIDI)
• Specification of interaction mechanisms with audio objects
– Switch on/off, move, increase/decrease volume, copy, …
• Final mix of the scene at the decoder
Parametric Audio Coding
[Block diagram with the blocks: model-based signal decomposition (harmonic components, sinusoidal components, noise components), psychoacoustic model, selection of relevant components, quantization, parameter coding, coding of side information, multiplex]
• Separation of audio into components (analysis), e.g. harmonic, sine, noise components
• Separate optimized coding
Parametric Audio Coding