Aljoscha Smolic Multimedia Communications
FOR CLASS USE ONLY, DO NOT DISTRIBUTE
Multimedia Communications
Dr.-Ing. Aljoscha Smolic
MMC Overview
1. Introduction
2. Fundamentals (Signal Processing, Information Theory)
3. Speech Processing & Coding
4. Audio Processing & Coding
5. Still Image Coding (JPEG, etc.)
6. Video Coding (MPEG, etc.)
7. MPEG-4 Multimedia Framework, MPEG-7
8. 3D Video and Free Viewpoint Video
MPEG-4/7 Overview
MPEG-4 Face Animation
Layered Video Coding
MPEG-4 Multimedia Framework
3D Mesh Compression
Materials
• Introduction to MPEG-7: Multimedia Content Description Interface, B. S. Manjunath (Editor), Philippe Salembier (Editor), Thomas Sikora (Editor), ISBN: 978-0-471-48678-7, Hardcover, 396 pages, April 2002.
• The MPEG-4 book, Fernando C. N. Pereira, Touradj Ebrahimi, ISBN-10: 0130616214, ISBN-13: 9780130616210, Publisher: Prentice Hall, Copyright: 2003, Format: Paper; 896 pp
• J.-R. Ohm, Multimedia Communication Technology, Springer-Verlag
Materials
• http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
• http://www.chiariglione.org/mpeg/tutorials/papers/IEEEMM_mp7overview_withcopyrigth.pdf
• http://www.chiariglione.org/mpeg/standards/mpeg-4/mpeg-4.htm
MPEG-4 Face Animation
Face & Body Animation
Head Model Adaptation
Generic head model
Adapted head model
Adaptation
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
MPEG-4 FDP, FAP
FDP: Face Definition Parameters
FAP: Face Animation Parameters
Facial feature tracking
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
Model-based Video Coding
Model-Based Coding
1 kbit/s !!!
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
Character Animation
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
Text-Driven Animation
Hello Peter, how are you doing today?
A E O Z B
Voice signal
Phoneme
Viseme
[a:], [æ]
Voice controlled lip movement
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
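The voice-controlled lip movement above maps each phoneme of the voice signal to a viseme (a mouth shape). A minimal sketch of such a lookup is given below. The table entries other than [a:] and [æ] mapping to viseme A, the default "rest" label, and the function name are assumptions made for illustration; they are not taken from MPEG-4 or from the system shown on the slide.

```python
# Hypothetical phoneme-to-viseme table. Only "[a:], [ae] -> A" is taken from
# the slide; every other entry is an illustrative assumption.
PHONEME_TO_VISEME = {
    "a:": "A", "æ": "A",
    "e": "E", "i": "E",
    "o": "O", "u": "O",
    "z": "Z", "s": "Z",
    "b": "B", "p": "B", "m": "B",
}


def visemes_for(phonemes, default="rest"):
    """Map a phoneme sequence to viseme labels, merging consecutive repeats."""
    out = []
    for p in phonemes:
        v = PHONEME_TO_VISEME.get(p, default)
        # Several phonemes share one mouth shape, so repeated labels collapse.
        if not out or out[-1] != v:
            out.append(v)
    return out
```

Several phonemes share one viseme, so the viseme sequence is usually shorter than the phoneme sequence; the animation only needs a new mouth shape when the label changes.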
Emotions by high level feature control
SMS to Video
• SMS is sent to the provider
• User chooses a character (real person or cartoon-like)
• MMS (video animation) is generated and sent to the receiver
• There, SMS is read by the chosen character
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
User Selection
• Different characters can be selected
• Additional variations with emoticons: :-) :-(, ...
• Characters created from a single image
Animated Text Messages
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
Text-Driven Animation
The Borg have assimilated many species
with many mythologies to explain such moments of clarity.
I have always dismissed them as trivial.
Perhaps I was wrong
SMS-2-Video Service
Input: Character, Language, Text, Emotion
Language: US English, GB English, German, French; others are possible as well.
Text: “Hello, how are you doing today? You have received 5 emails and 2 phone calls. Would you like me to read your emails for you?”
Text-To-Speech Server, Animation Server
video @ < 100 kbit/s
http://www.hhi.fraunhofer.de/en/departments/image-processing/applications/text2video-conversion/
Courtesy of Prof. Peter Eisert, HU Berlin, Fraunhofer HHI
Layered Video Coding
Layered Coding
Goal: Better image quality compared to standard block-based MC/DCT
Method:
Shoulder region: 2-D sprite coding
Facial region: 3-D wire-grid coding
Background: static or 2-D sprite coding
Model failure region: standard coding (MPEG-4 VOP)
3-D Coding of Facial Region
Sprite Coding of 2-D Regions
Region Mask for Sequence “Foreman”
2nd original frame
Region mask
2nd Reconstructed Frame of Sequence “Foreman”
9th Reconstructed Frame of Sequence “Foreman”
Layered coder
H.263
H.263
Layered coder
Frame Difference of 9th Frame of Sequence “Foreman”
MPEG-4 Multimedia Framework
MPEG-4 Scene Composition
MPEG-4 Concept
• Not only coding of media
• Definition of an audio-visual scene (2D/3D), e.g. distribution of AV-objects in a virtual 3D room
• AV-scene consists of audio, video and synthetic objects => the scene is composed
• Described in a specific scripting language (BInary Format for Scenes, BIFS, superset of VRML)
MPEG-4 Multimedia Standard
Audio-visual Scene, consists of
Audio
Video (arbitrary shape)
Still images
2D/3D computer graphics
Text
Interaction mechanisms
AV-Scenes are composed and rendered
Scene Description:
BIFS (Binary Format for Scenes)
MPEG-4 Scene
BIFS Scene graph
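To make the idea of a composed scene concrete, the sketch below builds a small scene graph and walks it to place each media object in the virtual room. It is only an illustration of hierarchical composition: the `Node` class, its fields and the `compose` function are assumptions invented for this example and do not reproduce BIFS node types or syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical scene-graph node (not a BIFS node type)."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)  # offset relative to the parent node
    media: str = ""                       # e.g. "video", "audio"; empty for pure groups
    children: list = field(default_factory=list)


def compose(node, origin=(0.0, 0.0, 0.0)):
    """Return (media, world position) pairs for all media objects in the scene."""
    pos = tuple(o + t for o, t in zip(origin, node.translation))
    placed = [(node.media, pos)] if node.media else []
    for child in node.children:
        # Children inherit the accumulated translation of their parents.
        placed.extend(compose(child, pos))
    return placed
```

A renderer would traverse the graph in this way on every update, which is why moving a parent node moves all objects grouped below it.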
MPEG-4 BIFS
BIFSConfig {
  nodeIDbits 10
  routeIDbits 10
  protoIDbits 10
  isCommandStream TRUE
  pixelMetric FALSE
  hasSize FALSE
  pixelWidth 0
  pixelHeight 0
}
ObjectDescriptor {
  objectDescriptorID 100
  streamType JPEG
  fileName "images/auckland_0.jpg"
}
ObjectDescriptor {
  objectDescriptorID 101
  streamType JPEG
  fileName "images/auckland_1.jpg"
}
ObjectDescriptor {
  objectDescriptorID 102
  streamType JPEG
  fileName "images/auckland_2.jpg"
}
...
MPEG-4 BIFS
Group {
  children [
    NavigationInfo {
      type [ "ROTATE", "WALK", "EXAMINE", "ANY" ]
      headlight FALSE
    } #NavigationInfo
    Viewpoint {
      fieldOfView 0.33
      position 0 0 0
      orientation 0 1 0 0
    } #Viewpoint
    DEF Cyl Transform {
      translation 0 -5 0
      children [
        DEF Sw0 Switch {
          whichChoice -1
          choice [
            Shape {
              appearance Appearance {
                texture ImageTexture
MPEG-4 BIFS
• Superset of VRML
• Text or binary (compressed)
• Streamable, updatable, timing model
• Includes audio and video
• Face and body animation
• Etc.
Omni-directional Video
Omni-directional camera shown at IBC 2007
Resolution of 5000 x 2000 pixels with 5 HD cameras
Ralf Schäfer
Scene Creation and Video Tiling
• Cylindrical or spherical geometry as approximation of planar tiles
• Tile size and number depend on rendering viewpoint and graphics hardware
Visibility Sensors
visibility sensor
Pre-fetching of neighbouring patches
Unloading of patches not contributing to screen view
current view on screen
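The pre-fetching and unloading logic can be sketched for a cylindrical panorama that is split into equally wide tiles around the viewer. This is a minimal sketch under stated assumptions: the tile layout, the function names and the parameters (`yaw_deg`, `fov_deg`, `n_tiles`) are hypothetical and only the horizontal direction is considered; the actual system uses BIFS visibility sensors rather than this computation.

```python
import math


def visible_tiles(yaw_deg, fov_deg, n_tiles):
    """Indices of cylinder tiles that overlap the horizontal field of view."""
    width = 360.0 / n_tiles  # angular width of one tile in degrees
    first = math.floor((yaw_deg - fov_deg / 2.0) / width)
    last = math.floor((yaw_deg + fov_deg / 2.0) / width)
    # Modulo wraps indices around the cylinder (e.g. -1 becomes n_tiles - 1).
    return {i % n_tiles for i in range(first, last + 1)}


def update_tiles(loaded, yaw_deg, fov_deg, n_tiles):
    """Return (visible, to_load, to_unload) for the current viewing direction."""
    visible = visible_tiles(yaw_deg, fov_deg, n_tiles)
    # Pre-fetch the direct neighbours of the visible tiles.
    neighbours = {(i + d) % n_tiles for i in visible for d in (-1, 1)} - visible
    keep = visible | neighbours
    return visible, keep - loaded, loaded - keep
```

Keeping one ring of neighbours loaded trades some memory for smoother navigation, because a small head rotation then never has to wait for a tile to arrive.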
Immersive Omni-directional Video with HMD
• Usage of a head tracker allows comfortable navigation
• A head-mounted display can also be used for scene visualization
3D Mesh Compression
• Humanoid sequence provided by Vrije Universiteit Brussel (VUB), consisting of 117 keyframes in different resolutions
Dynamic 3D Mesh Compression
[Figure: encoder block diagram. Blocks: Intra/Inter Switch, MPEG-4 3DMC, Octree Clustering, Scaling/Quantization, Arithmetic Coding, Reconstruction/Inverse Scaling, Octree Reconstruction, Memory. Signals: $m(t)$, $d(t)$, $o(t)$, $y(t)$ and the reconstructions $\hat{o}(t)$, $\hat{d}(t)$, $\hat{m}(t)$, $\hat{m}(t-1)$]
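The prediction loop in the diagram can be illustrated with a strongly simplified sketch: each mesh is predicted from the previously reconstructed mesh held in memory, and only the quantized difference is transmitted. The code below is an assumption-laden illustration, not the D3DMC coder: it omits octree clustering and arithmetic coding, uses a plain uniform quantizer with a hypothetical `step` parameter, and codes the first frame against an all-zero memory instead of using MPEG-4 3DMC.

```python
def quantize(diff, step):
    """Uniform quantization of per-vertex difference vectors to integers."""
    return [[round(c / step) for c in v] for v in diff]


def dequantize(symbols, step):
    return [[c * step for c in v] for v in symbols]


def add(mesh, delta):
    return [[m + d for m, d in zip(mv, dv)] for mv, dv in zip(mesh, delta)]


def encode_sequence(meshes, step):
    """Closed-loop prediction: the encoder predicts from its own reconstruction."""
    memory = [[0.0, 0.0, 0.0] for _ in meshes[0]]  # assumption: empty memory at start
    stream = []
    for mesh in meshes:
        diff = [[m - r for m, r in zip(mv, rv)] for mv, rv in zip(mesh, memory)]
        symbols = quantize(diff, step)
        stream.append(symbols)
        # Update the memory exactly as the decoder will, so errors do not drift.
        memory = add(memory, dequantize(symbols, step))
    return stream


def decode_sequence(stream, step):
    memory = [[0.0, 0.0, 0.0] for _ in stream[0]]
    out = []
    for symbols in stream:
        memory = add(memory, dequantize(symbols, step))
        out.append(memory)
    return out
```

Because the encoder subtracts its own reconstruction rather than the original previous mesh, the reconstruction error of every frame stays within half a quantization step per coordinate.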
Evaluation – Humanoid, 1940 vertices
Original
AFX-IC: 60.1 kbit/s
D3DMC: 62.7 kbit/s
Chicken Crossing
915 kbit/s
507 kbit/s
original
400 time-consistent meshes with 3030 vertices
Evaluation – Chicken, 3030 vertices
MPEG-7
A tremendous amount of multimedia content is available, and it keeps growing
Search for content gets more and more difficult
Automatic tools to assist search are necessary (search engines for the Internet)
Metadata are tagged to multimedia data for content description and classification
MPEG-7
Simplest form: descriptive text
Manual generation
Often this is produced during production anyway (playlist, storyboard, cast list, scripts, etc.)
Has to be associated with the data in a standardized way
Metadata Tagging
Martina Schmidt, correspondence chess opponent, [email protected]
English Channel, 23.10.2001
Temperature 13°C
Viewpoint, 162 m, formed from chalk during the Ice Age, ca. 10000 BCE
MPEG-7
Automatic extraction of signal-based features
Visual: color, shape, texture, motion, etc.
Audio: harmony, melody, frequency features, etc.
Features are captured in compact form (few bits) in so-called "Descriptors"
Groups of "Descriptors" can be combined into "Description Schemes"
MPEG-7 System
[Figure labels: MPEG-7 content description; Description Scheme 1, Description Scheme 2; Descriptor 1, Descriptor 2, Descriptor 3; descriptive parameters; feature extraction; similarity analysis; application]
Visual Descriptors
Query-by-Example, Color
Example image
Retrieved images
Database with 5000 images
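Query-by-example with a color feature can be sketched as follows: a compact color histogram is extracted from every image, and the database images are ranked by their distance to the histogram of the example image. This is a minimal sketch under stated assumptions; the coarse RGB histogram, the L1 distance and all function names are illustrative choices and are not the normative MPEG-7 color descriptors.

```python
def color_histogram(pixels, bins=4):
    """Normalized RGB histogram with bins**3 cells; pixels are (r, g, b) in 0..255."""
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1.0
    n = float(len(pixels))
    return [h / n for h in hist]


def l1_distance(h1, h2):
    """Sum of absolute bin differences; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def query_by_example(example, database, top_k=3):
    """Return the names of the top_k database images closest to the example."""
    query = color_histogram(example)
    ranked = sorted(
        database,
        key=lambda name: l1_distance(query, color_histogram(database[name])),
    )
    return ranked[:top_k]
```

In a real system the histograms of the database images would be extracted once and stored as metadata, so that a query only compares short descriptors instead of touching the pixels again.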
Visualization of TV Channels by Color
Query-by-Example, Texture
Region Shape Descriptor
Parametric 2D Motion Models
Name | Parameter set | Transformation
Translation | $a_1, b_1$ | $x_1 = a_1 + x_0$, $y_1 = b_1 + y_0$
4-parameter | $a_1, a_2, a_3, b_1$ | $x_1 = a_1 + a_2 x_0 + a_3 y_0$, $y_1 = b_1 - a_3 x_0 + a_2 y_0$
Affine | $a_1, a_2, a_3, b_1, b_2, b_3$ | $x_1 = a_1 + a_2 x_0 + a_3 y_0$, $y_1 = b_1 + b_2 x_0 + b_3 y_0$
Perspective | $a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2$ | $x_1 = \frac{a_1 + a_2 x_0 + a_3 y_0}{1 + c_1 x_0 + c_2 y_0}$, $y_1 = \frac{b_1 + b_2 x_0 + b_3 y_0}{1 + c_1 x_0 + c_2 y_0}$
Parabolic | $a_1, \ldots, a_6, b_1, \ldots, b_6$ | $x_1 = a_1 + a_2 x_0 + a_3 y_0 + a_4 x_0^2 + a_5 y_0^2 + a_6 x_0 y_0$, $y_1 = b_1 + b_2 x_0 + b_3 y_0 + b_4 x_0^2 + b_5 y_0^2 + b_6 x_0 y_0$
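The models in the table can be written as small functions that map a point $(x_0, y_0)$ to its position $(x_1, y_1)$. The sketch below follows the parameter naming of the table; the function names are hypothetical, and the minus sign in the 4-parameter model (rotation and zoom share $a_2$ and $a_3$) is an assumption based on the usual form of that model.

```python
def translation(p, x0, y0):
    a1, b1 = p
    return a1 + x0, b1 + y0


def four_parameter(p, x0, y0):
    a1, a2, a3, b1 = p
    # Assumption: usual rotation/zoom form, so a3 enters the y equation with a minus sign.
    return a1 + a2 * x0 + a3 * y0, b1 - a3 * x0 + a2 * y0


def affine(p, x0, y0):
    a1, a2, a3, b1, b2, b3 = p
    return a1 + a2 * x0 + a3 * y0, b1 + b2 * x0 + b3 * y0


def perspective(p, x0, y0):
    a1, a2, a3, b1, b2, b3, c1, c2 = p
    w = 1.0 + c1 * x0 + c2 * y0  # common denominator of both coordinates
    return (a1 + a2 * x0 + a3 * y0) / w, (b1 + b2 * x0 + b3 * y0) / w


def parabolic(p, x0, y0):
    a1, a2, a3, a4, a5, a6, b1, b2, b3, b4, b5, b6 = p
    x1 = a1 + a2 * x0 + a3 * y0 + a4 * x0 ** 2 + a5 * y0 ** 2 + a6 * x0 * y0
    y1 = b1 + b2 * x0 + b3 * y0 + b4 * x0 ** 2 + b5 * y0 ** 2 + b6 * x0 * y0
    return x1, y1
```

Each model contains the previous ones as special cases: with $c_1 = c_2 = 0$ the perspective model reduces to the affine model, and with $a_2 = b_3 = 1$, $a_3 = b_2 = 0$ the affine model reduces to a pure translation.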
Influence of Different Parameters
[Figure: grids in the x-y plane, both axes from -150 to 150, showing the influence of $a_1$ (x-translation), $a_2$ (x-scaling) and $b_3$ (y-scaling)]
MPEG-7: Parametric Motion Descriptor
• Search for different types of global motion by estimation and evaluation of motion parameters
• E.g. "translation to the left" ($a_1$), "zoom out" ($a_2$, $b_3$)
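Evaluating the estimated parameters can be as simple as thresholding them. The sketch below turns $a_1$, $a_2$ and $b_3$ into textual labels such as those in the example above. The thresholds, the function name and the sign conventions (negative $a_1$ means image content moves left, scaling below 1 means zoom out) are assumptions made for this illustration; they are not defined by the descriptor.

```python
def describe_global_motion(a1, a2, b3, t_thresh=1.0, s_thresh=0.02):
    """Label global motion from x-translation a1 and the scaling terms a2, b3."""
    labels = []
    # Assumed convention: negative a1 shifts image content to the left.
    if a1 <= -t_thresh:
        labels.append("translation to the left")
    elif a1 >= t_thresh:
        labels.append("translation to the right")
    # Assumed convention: average scaling below 1 shrinks the content (zoom out).
    scale = (a2 + b3) / 2.0
    if scale <= 1.0 - s_thresh:
        labels.append("zoom out")
    elif scale >= 1.0 + s_thresh:
        labels.append("zoom in")
    return labels or ["static"]
```

A query such as "zoom out" then becomes a simple comparison of labels, or of parameter ranges, against the stored descriptors.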
System for Motion-Based Retrieval
• Determination of shot boundaries (SG) and key frames (KF) as visual representatives
• Estimation of average global motion over a shot
• Storage of extracted metadata with the video in a database
[Figure: video signal divided by shot boundaries (SG); each shot has a key frame (KF) and a PMD, which are stored in an MPEG-7 database]
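The three steps above can be sketched as one function that splits the per-frame motion parameters at the shot boundaries, averages them over each shot and stores one record per shot. This is a minimal sketch: shot detection and motion estimation are assumed to have been done already, the record layout and function name are hypothetical, and picking the middle frame of a shot as key frame is an assumption made for illustration.

```python
def average_motion_per_shot(frame_params, shot_boundaries):
    """Build one metadata record per shot.

    frame_params: list of per-frame parameter tuples (e.g. (a1, a2, b3)).
    shot_boundaries: frame indices at which a new shot starts.
    """
    starts = sorted({0, *shot_boundaries})
    ends = starts[1:] + [len(frame_params)]
    database = []
    for start, end in zip(starts, ends):
        shot = frame_params[start:end]
        # Average each parameter over all frames of the shot.
        mean = tuple(sum(col) / len(shot) for col in zip(*shot))
        database.append({
            "first_frame": start,
            "key_frame": (start + end) // 2,  # assumption: middle frame as key frame
            "motion": mean,
        })
    return database
```

Storing only one averaged parameter set per shot keeps the metadata small, at the cost of hiding motion that changes within a shot.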