Multimedia Communications

Academic year: 2021

Aljoscha Smolic Multimedia Communications FOR CLASS USE ONLY

DO NOT DISTRIBUTE

Multimedia Communications

Dr.‐Ing. Aljoscha Smolic

MMC Overview

1. Introduction

2. Fundamentals (Signal Processing, Information Theory)

3. Speech Processing & Coding

4. Audio Processing & Coding

5. Still Image Coding (JPEG, etc.)

6. Video Coding (MPEG, etc.)

7. MPEG-4 Multimedia Framework, MPEG-7

8. 3D Video and Free Viewpoint Video

Audio Overview

Properties of audition

Motivation, requirements, parameters, quality of audio coding

PCM parameters of audio coding

MPEG-4 Audio Lossless Coding (TU-Berlin, Liebchen/Noll)

Standard Audio coding: MPEG-1 (MP3), MPEG-2/4 AAC

Multi channel audio (e.g. 5.1 systems), 3D audio (MPEG-4)

Materials

Liebchen/Noll, FG, NUE, TU-Berlin, Lossless Audio Coding:

http://www.nue.tu-berlin.de/menue/forschung/projekte/beendete_projekte/mpeg-4_audio_lossless_coding_als/

K.H. Brandenburg, „MP3 and AAC Explained“, AES 17th International Conference on High Quality Audio Coding

Book

M. Bossi, R.E. Goldberg, „Introduction to Digital Audio Coding and Standards“, Kluwer Acad. Publishers, 2002.

Examples MP3 Audio Coding

Audio\MP3\128_Karma_Police.mp3 128 kbit/s

Audio\MP3\64_Karma_Police.mp3 64 kbit/s

Audio\MP3\32_Karma_Police.mp3 32 kbit/s

Audio\MP3\20_Karma_Police.mp3 20 kbit/s

Term Audio

audire = to hear

Audio = everything related to the human aural sense

Also used for technical equipment

Sound = oscillations of air pressure

Ear = Measuring instrument for oscillations of air pressure

Physiology of Audition

[Figure: anatomy of the ear, labelled: ossicles, balance organ, acoustic nerve, cochlea, Eustachian tube, eardrum, auditory canal]
External ear:

Auricle (ear conch), auditory canal

Resonator: amplification, direction

Physiology of Audition

Middle ear: eardrum, ossicles:

Amplification (leverage, area transformation)

Protection: a muscle can block the ossicles, but the reaction time is 60-120 ms

Eustachian tube to mouth, pressure compensation

Cochlea, 32 mm long, 2 ½ turns

Filled with fluid

• 3 canals separated by membranes

• On the basilar membrane: hair cells (organ of Corti)

Physiology of Audition

Physiology of Audition

Process of hearing:

Eardrum is set into oscillation by sound

Motion of the ossicles

Transmission to the inner ear (cochlea)

A fluid wave emerges in the cochlea

Hair cells are sheared

Position of maximum displacement of the membrane determines frequency (position-frequency transform)

Stimulus is transmitted via acoustic nerve to brain

Physiology of Audition

Sound Intensity Level

• Sound intensity is measured as sound intensity level

• Reference pressure: auditory threshold at 1 kHz

• Logarithmic scale for adaptation to characteristics of audition (doubling of pressure = 6 dB increase of sound intensity level)

$$L = 20 \log_{10}\!\left(\frac{p}{p_0}\right)\,\mathrm{dB}, \qquad p_0 = 20\,\mu\mathrm{Pa}$$
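A minimal sketch of the level formula, assuming the usual reference pressure of 20 µPa and an RMS pressure given in pascal. The function name `sound_pressure_level` is illustrative only. It also shows the rule of thumb from the slide: doubling the pressure raises the level by about 6 dB.

```python
import math

P0 = 20e-6  # reference pressure in Pa (20 µPa, auditory threshold at 1 kHz)


def sound_pressure_level(p: float) -> float:
    """Return L = 20 * log10(p / p0) in dB for a pressure p in Pa."""
    return 20.0 * math.log10(p / P0)


# Threshold of hearing, twice that pressure, and a very loud sound.
for pressure in (20e-6, 40e-6, 20.0):
    print(f"{pressure:g} Pa -> {sound_pressure_level(pressure):.2f} dB")
```

The ratio between 20 Pa and 20 µPa is a factor of one million, which corresponds to the 120 dB dynamic range quoted for human hearing.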

Parameters of Audition

• Frequency range: 20‐20000 Hz

• Dynamic range: 120 dB

• Frequency resolution: 850 tones

• 400,000 sounds

• Ear performs a short-term spectral analysis, perception of frequencies

• Phase difference is mainly important for sound directions

Masking by White Noise

Masking by Narrow Band limited Noise

Masking by Narrow Band limited Noise

Relative Perception of Tones

• Tones that are close together in frequency are not perceived separately

• Minimum distance of two tones to be perceived separately is dependent on absolute frequency:

– The higher the absolute frequency, the larger the minimum distance

• Audition works like a band-pass filter bank (third octave)

• Separation of audible frequency range into 25 frequency groups (each a third octave)

• Usage e.g. for coding (sub‐band coding in mp3)

Relative Perception of Tones

Requirements on Audio Coding

• High Quality

– Lossless

– Transparent

– „Acceptable“ quality

• Robustness against channel errors

• Low complexity, power consumption

• Low delay

• Graceful Degradation

• Editing capability (for professional applications)

Quality Parameters of Audio Coding

• Distortion, signal‐to‐noise‐ratio (SNR, MOS)

• Frequency range (20‐20000 Hz)

• MOS: Mean opinion score

– 5 Imperceptible

– 4 Perceptible, but not annoying

– 3 Slightly annoying

– 2 Annoying

– 1 Very annoying

• Subjective tests with many participants (experts or average listeners?)

• Representative test material (average, challenging material?)

• Test conditions (speakers, room acoustics, distraction)

Subjective Quality Measures

PCM Coding

Signal             Range [Hz]   Sampling rate [kHz]   Bits/sample   Bitrate [kbit/s]
Telephone          300-3,400    8                     8             64
Wideband speech    50-7,000     16                    8             128
Narrowband audio   10-10,000    24                    16            384
Wideband audio     10-20,000    48                    16            768

• Mono signals
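The bitrates in the table follow directly from sampling rate times bits per sample. The short sketch below reproduces them; the helper name `pcm_bitrate_kbit` and the dictionary layout are illustrative assumptions, not part of the lecture material.

```python
def pcm_bitrate_kbit(sampling_rate_khz: float, bits_per_sample: int,
                     channels: int = 1) -> float:
    """Raw PCM bitrate in kbit/s: sampling rate [kHz] x bits per sample x channels."""
    return sampling_rate_khz * bits_per_sample * channels


# (sampling rate in kHz, bits per sample) for the four mono signal classes
signals = {
    "Telephone": (8, 8),
    "Wideband speech": (16, 8),
    "Narrowband audio": (24, 16),
    "Wideband audio": (48, 16),
}

for name, (rate_khz, bits) in signals.items():
    print(f"{name}: {pcm_bitrate_kbit(rate_khz, bits)} kbit/s")
```

Passing `channels=2` gives the corresponding stereo rate, e.g. 1536 kbit/s for wideband audio.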
PCM Coding

Stereo 44.1 kHz

Device                     Rate [Mbit/s]   Redundancy [Mbit/s]   Total rate [Mbit/s]
Compact Disc (CD)          1.41            2.91                  4.32
Digital Audio Tape (DAT)   1.41            1.67                  3.08

• Redundancy for error protection

PCM Coding

• Further parameters, e.g. for professional applications (studios)

• Sampling rates: 96 kHz, 192 kHz

– better filtering

– safety margin for effects, mixing, …

• Resolution: 24 bit, 32 bit

– higher dynamic range

– safety margin for amplification, effects, mixing, …

• Significant increase of raw PCM data rate

PCM Coding

• Multi channel audio, e.g. Dolby surround (cinema, DVD)

• 5.1 = 6 separate audio channels

• Significant increase of raw PCM data rate

Lossless Audio Coding

• Motivation: Storage and archiving for higher quality requirements, e.g. in studios

• Pure redundancy reduction, i.e. the signal can be reconstructed perfectly

• Irrelevancy can turn into relevancy by editing (filtering, effects, mixing, …)

• Example audio studio:

• Highest-quality audio: 96 kHz x 24 bit => 2.3 Mbit/s per track

• 4 min song: ca. 69 MByte

• 48 tracks: ca. 3.3 GByte storage

• Several takes, songs per album, artists, …

• => Extreme storage requirements
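The studio figures can be checked with a few lines of arithmetic. This is a sketch assuming decimal units (1 MByte = 10^6 bytes) and uncompressed mono PCM per track; the function names are illustrative.

```python
def pcm_bitrate(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Raw PCM bitrate of one track in bit/s."""
    return sampling_rate_hz * bits_per_sample


def storage_bytes(bitrate_bit_s: int, duration_s: int, tracks: int = 1) -> float:
    """Storage needed for the given duration and number of tracks, in bytes."""
    return bitrate_bit_s * duration_s * tracks / 8


rate = pcm_bitrate(96_000, 24)                      # bit/s for one track
song = storage_bytes(rate, 4 * 60)                  # one 4-minute track
session = storage_bytes(rate, 4 * 60, tracks=48)    # 48-track recording

print(f"bitrate: {rate / 1e6:.2f} Mbit/s")
print(f"one track: {song / 1e6:.1f} MByte")
print(f"48 tracks: {session / 1e9:.2f} GByte")
```

Changing `duration_s` and `tracks` gives the requirement for other productions, such as a multi channel film soundtrack.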

Lossless Audio Coding

• Example film audio, multi channel 5.1

• Highest-quality audio: 96 kHz x 24 bit => 2.3 Mbit/s per channel

• 90 min: ca. 1.56 GByte

• 6 channels: ca. 9.3 GByte storage

• Mixed from different takes, versions, music, speech, sounds, effects, …

• => Extreme storage requirements

MPEG-4 Lossless Coding

• Part of MPEG-4

• Based on a proposal of Tilman Liebchen and Prof. Peter Noll, FG Nachrichtenübertragung (Communication Systems Group), TU-Berlin

• Principle: ADPCM + Entropy coding

MPEG-4 Lossless Coding

• Forward ADPCM: the predictor coefficients have to be transmitted

• The more coefficients are used for prediction, the lower the error variance and with it the bitrate of the residual

• But: the more coefficients have to be transmitted, the higher the bitrate for side information

• Optimization problem regarding overall bitrate
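The trade-off between residual bitrate and side information can be illustrated with a small experiment. The sketch below is not the MPEG-4 ALS procedure; it assumes a synthetic second-order autoregressive test signal, a fixed hypothetical cost of 16 bits per transmitted coefficient (`COEFF_BITS`), and a rough residual estimate of 0.5 * log2(variance) bits per sample, which is only correct up to a constant. The Levinson-Durbin recursion supplies the prediction-error variance for every predictor order.

```python
import math
import random

COEFF_BITS = 16  # assumed cost of transmitting one predictor coefficient


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimate r[0..max_lag]."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n)) / n
            for lag in range(max_lag + 1)]


def error_variances(r, max_order):
    """Levinson-Durbin: prediction-error variance for orders 0..max_order."""
    a = [1.0]
    err = r[0]
    variances = [err]
    for m in range(1, max_order + 1):
        acc = sum(a[k] * r[m - k] for k in range(m))
        refl = -acc / err  # reflection coefficient of order m
        a = [1.0] + [a[k] + refl * a[m - k] for k in range(1, m)] + [refl]
        err *= 1.0 - refl * refl
        variances.append(err)
    return variances


def estimated_bits(variances, n_samples):
    """Residual bits (up to a constant) plus side information, per order."""
    return [n_samples * 0.5 * math.log2(v) + order * COEFF_BITS
            for order, v in enumerate(variances)]


# Synthetic test signal (an assumption): stable second-order AR process.
random.seed(0)
x = [0.0, 0.0]
for _ in range(2048):
    x.append(1.6 * x[-1] - 0.8 * x[-2] + random.gauss(0.0, 1.0))
x = x[2:]

variances = error_variances(autocorrelation(x, 16), 16)
bits = estimated_bits(variances, len(x))
best_order = min(range(len(bits)), key=bits.__getitem__)
print("estimated best predictor order:", best_order)
```

The error variance never increases with the order, but beyond the true model order the gain is too small to pay for the extra coefficients, so the total has a minimum.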

MPEG-4 Lossless Coding

• Processing in time windows of 2048 Samples at 48 kHz (ca. 43 ms)

• Division into 4 sub-segments in case of non-stationary signals

• Joint Stereo Coding: exploitation of redundancies between channels by prediction

• Entropy coding (Rice Coding)
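A minimal sketch of Rice coding for prediction residuals. It assumes a zigzag mapping from signed to unsigned integers and a unary prefix written as ones terminated by a zero; both conventions are choices made for this illustration and do not reproduce the actual MPEG-4 ALS bitstream syntax.

```python
def zigzag(v: int) -> int:
    """Map signed to unsigned: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ..."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)


def rice_encode(values, k: int) -> str:
    """Encode integers with Rice parameter k; returns a string of '0'/'1'."""
    bits = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        bits.append("1" * quotient + "0")           # unary part
        if k:
            bits.append(format(remainder, f"0{k}b"))  # k-bit binary part
    return "".join(bits)


def rice_decode(bits: str, k: int, count: int):
    """Decode `count` integers from a bit string produced by rice_encode."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating '0'
        remainder = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((quotient << k) | remainder))
    return values


residuals = [0, -1, 3, -4, 2, 0, 1, -2]  # small values, as after good prediction
code = rice_encode(residuals, k=1)
print(len(code), "bits instead of", 16 * len(residuals), "bits of 16-bit PCM")
```

Small residuals get short codewords, which is why a good predictor in front of the entropy coder reduces the bitrate.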

MPEG-4 Lossless Coding

• Results: compression related to PCM in % (same quality, lossless!)

MPEG-4 Lossless Coding

• Limited compression, factor 2-2.5

• More is hardly possible by redundancy reduction alone

• Nevertheless: important economic factor in professional applications (storage/archiving in studios) and others

Sub-band Coding

• Band-pass filter bank

• Often separated in analogy to human perception (third octaves)

[Block diagram: input x(n), band-pass filters h1(n), h2(n), …, hM(n), coders COD1, COD2, …, CODM, multiplex, bitstream]

Lossy Audio Coding

• Irrelevancy reduction

• Perceptual Coding

• Things that cannot be perceived are omitted

• The error is adjusted just below the threshold of perception

Perceptual Coding

• Analysis of the signal related to a psycho acoustic model, which models the characteristic of audition

• Calculation of auditory thresholds, masking, temporal masking, etc.

• Quantization of the signal related to the audibility of the error
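As a rough illustration of quantizing according to the audibility of the error, the sketch below assigns bits per frequency band from the signal-to-mask ratio, using the rule of thumb that each bit of a uniform quantizer lowers the noise by about 6 dB. The band levels and the function `bits_for_band` are hypothetical; a real psychoacoustic model and bit allocation loop are considerably more elaborate.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per quantizer bit


def bits_for_band(signal_level_db: float, masking_threshold_db: float) -> int:
    """Bits needed so that quantization noise stays below the masking threshold."""
    smr = signal_level_db - masking_threshold_db  # signal-to-mask ratio in dB
    if smr <= 0:
        return 0  # band is fully masked, nothing needs to be transmitted
    return math.ceil(smr / DB_PER_BIT)


# Hypothetical (signal level, masking threshold) pairs in dB for three bands
bands = [(70.0, 50.0), (60.0, 62.0), (55.0, 30.0)]
allocation = [bits_for_band(level, threshold) for level, threshold in bands]
print("bits per band:", allocation)
```

The second band lies below its masking threshold and therefore receives no bits at all, which is the irrelevancy reduction described above.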

[Block diagram: filter bank, quantization, coding, MUX; psychoacoustic model]

• Main technology: Sub-band coding and transform coding (DCT)

• Coding in the frequency domain

• Adapted to human audition, which perceives frequency characteristics in time windows

• Separation of the signal in frequency components and separate coding

Lossy Audio Coding

• Sub-band coding: filter bank, relatively low frequency resolution

• Transform coding (DCT): relatively high frequency resolution

• Windows: processing in blocks of samples (e.g. 8 ms)

• Problem: pre-echoes in sudden percussive sounds; the error is distributed over the whole window and may become audible in silent passages

Pre-Echoes

[Figure: a time window of samples passes through transformation, quantization and inverse transformation; the error is distributed over the whole window]
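The pre-echo effect can be reproduced with a toy transform coder. This sketch assumes a single 32-sample window, an orthonormal DCT written out directly, a uniform quantizer with step 0.25, and a made-up attack in the last 8 samples; none of these values come from a real codec.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of the orthonormal DCT-II above (DCT-III)."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def quantize(coefficients, step):
    """Uniform quantization of the transform coefficients."""
    return [step * round(v / step) for v in coefficients]


ONSET, STEP = 24, 0.25
attack = [1.0, -0.8, 0.6, -0.4, 0.3, -0.2, 0.1, -0.05]
signal = [0.0] * ONSET + attack  # silence, then a sudden percussive sound

decoded = idct(quantize(dct(signal), STEP))
error = [d - s for d, s in zip(decoded, signal)]
pre_echo = max(abs(e) for e in error[:ONSET])
print(f"largest error in the silent part before the attack: {pre_echo:.3f}")
```

Although the input is exactly zero before the attack, the decoded signal is not: the quantization error made in the frequency domain is spread over the whole window by the inverse transform. Shorter windows around attacks confine this error to a shorter time span.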

MPEG-1 Audio Layer 1+2

[Block diagram. Encoder: analysis filter bank, scaler, quantizer, mux; FFT, masking thresholds, dynamic bit allocation & coding. Decoder: de-mux, dynamic bit allocation & decoding, dequantizer, descaler, synthesis filter bank]

• Layer 1, simplest form, 32 frequency bands

• Layer 2, some improvements

– finer quantization

• Operating modes

– Mono

– Stereo (separately coded)

– Dual separate channels (e.g. for bilingual programs)

– Joint Stereo (exploitation of stereo redundancy)

MPEG-1 Audio Layer 1+2

MPEG-1 Audio Layer 3 / MP3

[Block diagram. Encoder: analysis filter bank, MDCT with dynamic windowing, scaler, quantizer, Huffman coder, mux; FFT, masking thresholds, side-information coder. Decoder: de-mux, Huffman decoder, side-information decoder, dequantizer, descaler, inverse MDCT with dynamic windowing, synthesis filter bank]

Improvements of MP3

• Hybrid coding, sub-band + MDCT for higher frequency resolution

– 32 bands -> up to 576 spectral lines

• Better bit assignment

• Entropy coding

• Pre-echo control

• Dynamic windows, overlapped

MPEG Standards

• MPEG Standards only specify the syntax of the bitstream and the decoding process

• Each provider can implement the encoder differently, as long as it produces the right syntax (same language)

• Allows competition among providers while ensuring interoperability of systems

• A good MP3 Encoder: Fraunhofer IIS, esp. Psycho-acoustic model

MPEG Standards

• Principle of asymmetric coding: encoder is much more complex than decoder (complex signal analysis only in encoder)

• Receivers can be kept simple and cheap

• High complexity is often acceptable at sender side (broadcast, DVD production)

• Not the case in telecommunication, but that is not the main focus of MPEG

MPEG-1 Audio Compression Gains

Layer   Stereo rate [kbit/s]   Gain    Application
1       384                    4       DCC
2       256...192              6...8   DAB
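The gains in the table are the ratio between the raw PCM rate and the coded rate. The slide does not state the reference; the sketch below assumes stereo PCM at 48 kHz and 16 bit (1536 kbit/s), which reproduces the listed values.

```python
# Assumed reference: stereo, 48 kHz, 16 bit per sample -> 1536 kbit/s
REFERENCE_KBIT = 2 * 48 * 16


def compression_gain(coded_rate_kbit: float) -> float:
    """Ratio of the raw PCM rate to the coded stereo rate."""
    return REFERENCE_KBIT / coded_rate_kbit


for layer, rate in ((1, 384), (2, 256), (2, 192)):
    print(f"Layer {layer} at {rate} kbit/s: gain {compression_gain(rate):g}")
```

With CD audio (44.1 kHz, 1411.2 kbit/s) as the reference instead, the same rates give slightly lower gains.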

5.1 Surround Setup

MPEG-2 BC

• Backward compatible

• Supports 5.1 and is compatible with MPEG-1

• Supports lower sampling frequencies (16, 22.05, 24 kHz) for special applications at very low bitrates (MPEG-1: 32, 44.1, 48 kHz)

MPEG-2 AAC

• By abandoning backward compatibility, even higher compression ratios can be achieved

• Advanced Audio Coding (AAC)

• Same principle as in MP3, perceptual coding

• Some new algorithms that in sum provide further improvements

• At 96 kbit/s same quality as MP3 at 128 kbit/s

• Suitable for very low bitrates down to 32 kbit/s and below

MPEG-2 AAC

[Block diagram: modified DCT (128/1024), psychoacoustic model, temporal noise shaping, stereo, prediction, coding, Huffman coding of the side information, multiplex]

MPEG-4 AAC

• Some small improvements over MPEG-2 AAC

• Spectral band replication

MPEG-4 Audio Toolbox

• Not only coding of audio

• Definition of an audio-visual scene (2D/3D), e.g. distribution of AV-objects in a virtual 3D room

• AV-scene consists of audio, video and synthetic objects => the scene is composed

• Described in a specific script language (BInary Format for Scenes, BIFS, superset of VRML)

MPEG-4 Audio Toolbox

• Specification of different codecs, optimized for different types of data, e.g.

– General Audio (MPEG-4 AAC)

– Speech codecs (CELP, text-to-speech)

– Wave table codecs for synthetic audio (MIDI)

• Specification of interaction mechanisms with audio objects

– Switch on/off, move, increase/decrease volume, copy, …

• Final mix of the scene at the decoder

Parametric Audio Coding

[Block diagram: model-based signal decomposition (harmonic, sinusoidal and noise components), psychoacoustic model, selection of relevant components, quantization, parameter coding, coding of the side information, multiplex]

• Separation of audio into components (analysis), e.g. harmonic, sine, noise components

• Separate optimized coding

Parametric Audio Coding
