VTC 2006 Fall
Dallas
Iterative Joint Video and Channel Decoding in a
Trellis-Based Vector Quantized Video Codec and Trellis
Coded Modulation Aided Wireless Videophone
R. G. Maunder, J. Kliewer, S. X. Ng, J. Wang, L-L. Yang, L. Hanzo
School of ECS
Outline
Outline
❏
Introduction
❏
System Overview
❏
Trellis-Based Vector Quantization
❏
Results
Introduction
Shannonian Video and Channel Coding
❏
Video and channel coding may be performed
indepen-dently without penalty.
❏
However, this requires a number of conditions to be met,
including:
❏
infinite complexity and
❏
infinite latency.
❏
Hence, Shannonian video and channel coding is not
prac-tical.
Introduction
Joint Video and Channel Coding
❏
As in the Shannonian approach, the video codec achieves
compression.
❏
However, like the channel codec, the video also has a
For-ward Error Correction (FEC) capability.
❏
In the receiver, video and channel decoding are performed
jointly.
❏
We employ:
❏
a frame differencing and trellis-based Vector
Quantiza-tion (VQ) video codec and
System Overview
Transmitter
Bit
concatenator
f
n
−
+
Reconstruction buffer
e1 eM
M VQ
encoders
ˆf
n
+
e
ˆ e ˆf
n−1
+ Frame differencer
Bit interleaver (M sub-frames)
Frame difference decomposer
Frame difference recomposer
ˆeM ˆe1
u1 uM
u TCM
encoder
System Overview
Bit
concatenator
fn
− +
Reconstruction buffer
e1 eM
M VQ
encoders
ˆf
n
+
e
ˆe ˆf
n−1
+ Frame differencer
Bit interleaver (M sub-frames)
Frame difference decomposer
Frame difference recomposer
ˆ eM ˆ
e1
u1
uM
u TCM
encoder
System Overview
Receiver
❏
The TCM decoder and the VQ decoders iteratively exchange mutually extrinsic
Log Likelihood Ratio (LLR) soft information.
❏
This extrinsic information is exploited as a priori information during decoding.
❏
The a priori information is subtracted from the resultant a posteriori
informa-tion to obtain more reliable extrinsic informainforma-tion for use in the next iterainforma-tion.
(1)
y
L1
a(u) =L
2
e(u)
L1
e(u)
+
−
decoderTCM
L1
p(u) = L
2
e(u) +L
1
e(u)
L2
e(u)
L2
a(u
M)
Reconstruction buffer
˜f
n−1
+ − + ˜f n + L2
a(u) = L
1
e(u)
(2) M VQ decoders LLR interleaver LLR deinterleaver L2
a(u
1)
L2
p(u) =L
1
e(u) +L
2
e(u)
recomposer Frame difference (M sub-frames)
LLR partitioner LLR concatenator ˜ e1 ˜ eM ˜ e L2
p(u
1) L2
p(u
System Overview
(1)
y
L1a(u) = L
2
e(u)
L1e(u)
+ −
decoderTCM
L1p(u) = L
2
e(u) +L
1
e(u)
L2e(u)
L2a(u
M)
Reconstruction buffer
˜f
n−1
+ − + ˜f n +
L2a(u) = L
1
e(u)
(2) M VQ decoders LLR interleaver LLR deinterleaver
L2a(u
1)
L2p(u) = L
1
e(u) + L
2
e(u)
recomposer
Frame difference (M sub-frames)
LLR partitioner
LLR
concatenator
˜ e1
˜eM
˜ e
L2p(u
1) L2
p(u
Trellis-Based Vector Quantization
Frame Difference Decomposition
❏ The Frame Difference (FD) e is decomposed into M number of sub-frames [e1 . . .eM] for the sake of reducing trellis-based VQ complexity.
❏ Each FD sub-frame em, m ∈ [1. . . M], comprises J number of video blocks [em1 . . . emJ ] chosen as one Macro Block (MB) from each MB group.
❏ This ensures that the sub-frames have similar characteristics.
❏ For example:
Macro-block grouping boundaries. Each macro-block group comprises M = 33 macro-blocks. Video block, comprising (8×8) pixels.
Macro-block, comprising JMB = 4 video blocks.
FD sub-frame em, comprising
J = 12 video blocks. FD e, comprising
M ·J = 396 video blocks.
em 1
em 2
em 3
em 4
em5 em 6
em7 em 8 em9
em 10
Trellis-Based Vector Quantization
Macro-block grouping
boundaries. Each
macro-block group comprises
M
= 33 macro-blocks.
Video block, comprising
(8
×
8) pixels.
Macro-block, comprising
J
MB= 4 video blocks.
FD sub-frame
e
m, comprising
J
= 12 video blocks.
FD
e
, comprising
M
·
J
= 396 video blocks.
em 1
em 2
em 3
em 4
em 5
em 6
em 7
em 8 em
9
em 10
em 11
Trellis-Based Vector Quantization
Vector Quantization Codebook
❏ The LBG-algorithm designed VQ codebook comprises K number of VQ tiles [VQ1 . . .VQK] of various dimensions.
❏ This allows the efficient encoding of large areas of low FD activity.
❏ Each VQ tile VQk, k ∈ [1. . . K], is represented by a minimum free-distance 2 Re-versible Variable Length Code (RVLC) RVLCk with an entropy-considered length.
❏ For example:
k
Index Video blocks (Jk
x ×Jyk) =Jk
Bits Ik
(8×8) pixels.
comprising Video block,
VQ1 VQ2 VQ3 VQ4 VQ5
VQ tiles:
1 1
2 3 4 5
(2×2) = 4 0
2 3 4 5
11 101 1001 10001 (1×2) = 2
(1×1) = 1
(1×1) = 1
(1×1) = 1
RVLC code
Trellis-Based Vector Quantization
k
Index
Video blocks
(
J
kx
×
J
yk) =
J
kBits
I
k(8
comprising
×
8) pixels.
Video block,
VQ
1VQ
2VQ
3VQ
4VQ
5VQ tiles:
1
1
2
3
4
5
(2
×
2) = 4
0
2
3
4
5
11
101
1001
10001
(1
×
2) = 2
(1
×
1) = 1
(1
×
1) = 1
(1
×
1) = 1
RVLC code
Trellis-Based Vector Quantization
Vector Quantization Code Constraints
❏ Each J video block FD sub-frame em must be represented by an exact tessellation of VQ tiles from the VQ codebook.
❏ The corresponding transmission sub-frame um is formed by concatenating the corre-sponding RVLC codes and must comprise I number of bits [um1 . . . umI ].
❏ These code constraints must be adhered to during VQ encoding and may be exploited to achieve FEC during VQ decoding.
❏ For example:
um ={1, , , , ,0 0 0 1 1,0 0 1 1 0 1, , , , , ,1, , , ,0 1 0 0}
Ta 5 1 5
1 1 1
IkT JkT kT T Tb Tc Td Te Tf 4 3 3 1 1 4 3 3 1 1 4 4 em1 e2m e3m em4 e5m e6m em7 e8m e9m em10em11em12
ˆ
em =
video blocks. Macro-block, JMB = 4 comprising pixels. FD sub-frame, comprising
J = 12 video blocks. Video block,
comprising (8×8)
Trellis-Based Vector Quantization
u
m=
{
1
, , , , ,
0
0 0 1
1
,
0 0 1 1 0 1
, , , , ,
,
1
, , , ,
0
1 0 0
}
T
a5
1
5
1
1
1
I
kTJ
kTk
TT
T
bT
cT
dT
eT
f4
3
3
1
1
4
3
3
1
1
4
4
e
m1e
2me
3me
m4e
5me
6me
m7e
8me
9me
m10e
m11e
m12ˆ
e
m=
video blocks.
Macro-block,
J
MB= 4
comprising
pixels.
FD sub-frame,
comprising
J
= 12
video blocks.
Video block,
comprising
(8
×
8)
Transmission sub-frame, comprising
I
= 17 bits.
Trellis-Based Vector Quantization
Vector Quantization Trellis Structure
❏ Each transition T in the trellis represents a possible application of the JkT -block VQ tile VQkT and the corresponding IkT -bit RVLC RVLCkT.
❏ The trellis represents every possible J-block FD sub-frame ˆem and I-bit transmission sub-frame um combination.
❏ For example:
F D su b -frame ˆe m , comp risin g J = 12 vid eo b lo cks Ta Tb Tc Td Te Tf ˆ em 1 ˆ em 2 ˆ em 3 ˆ
em4
ˆ em 5 ˆ em 6 ˆ
em7
ˆ em 8 ˆ em 9 ˆ
em10
ˆ em 11 ˆ em 12 um
1 um2 um3 um4 um5 um6 um7 um8 u9m um10 um11 um12 u13m um14 um15 um16 um17
Trellis-Based Vector Quantization
F
D
su
b
-frame
ˆe
m,
comp
risin
g
J
=
12
vid
eo
b
lo
cks
T
aT
bT
cT
dT
eT
fˆ
e
m1ˆ
e
m2ˆ
e
m3ˆ
e
m4ˆ
e
m5ˆ
e
m6ˆ
e
m 7ˆ
e
m 8ˆ
e
m 9ˆ
e
m 10ˆ
e
m 11ˆ
e
m 12Trellis-Based Vector Quantization
Vector Quantization Encoding and Decoding
❏
VQ encoding performs the Viterbi algorithm on the trellis using a mean
squared error distortion metric.
❏
This ensures that the VQ code constraints are adhered to and obtains the
optimal Minimum Mean Squared Error (MMSE) encoding.
❏
The trellis is also employed during VQ decoding to guarantee the recovery of
valid information and to provide a FEC capability by exploiting the VQ code
constraints and the minimum free distance of the RVLCs.
❏
The BCJR algorithm is performed on the basis of the transmission sub-frame
a priori soft information
L
2a(
u
m)
.
❏
A posteriori soft information
L
2p(
u
m)
is obtained by considering the a posteriori
probabilities of transitions on vertical cross sections through the trellis.
❏
Similarly, a MMSE soft FD sub-frame reconstruction
˜
e
mis obtained by
Results
Simulation parameters
❏ 100 frames of ‘Lab’ QCIF (176× 144)-pixel 10 fps video sequence
❏ M = 33, J = 12, I = 45, K = 512, 14.85 kbps
❏ 3/4-rate TCM using 16QAM modulation
❏ Bandwidth efficiency of η = 2.00 bit/s/Hz
❏ Eb/N0 = 3.96 dB at Rayleigh fading channel capacity limit
Results
EXIT chart
❏ Using minimum free-distance 2 RVLCs ensures that VQ decoding can achieve unity extrinsic mutual information and an infinitesimally low decoding error.
❏ Tunnel for Eb/N0 > 5.25 dB, just 1.29 dB from the channel capacity limit.
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
IA1, IE2
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
IE 1 ,
IA
2
High latency Eb/N0= 6 dB
Low latency Eb/N0= 6 dB
EXIT trajectory: Eb/N0= 11 dB
Eb/N0= 10 dB
Eb/N0= 9 dB
Eb/N0= 8 dB
Eb/N0= 7 dB
Eb/N0= 6 dB
Eb/N0= 5.25 dB
Eb/N0= 4 dB
Results
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
I
1, I
20.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
I
E1
,
I
A2
High latency Eb/N0= 6 dB Low latency Eb/N0= 6 dB EXIT trajectory:
Eb/N0= 11 dB Eb/N0= 10 dB Eb/N0= 9 dB Eb/N0= 8 dB Eb/N0= 7 dB Eb/N0= 6 dB Eb/N0= 5.25 dB Eb/N0= 4 dB
Results
PSNR performance
❏ VQ- and MPEG4-based bench-markers employ iterative channel decoding, have a 0.1s latency and have the same computational complexity as our approach.
4 5 6 7 8 9 10 11 12
Eb/N0[dB]
24 26 28 30 32
A
v
er
ag
e
P
S
N
R
[d
B
]
MPEG4-based bench-marker VQ-based bench-marker VQ-TCM low latency VQ-TCM high latency
8 iter 2 iter 1 iter
1.4 dB
Results
4
5
6
7
8
9
10
11
12
24
26
28
30
32
A
v
er
ag
e
P
S
N
R
[d
B
]
MPEG4-based bench-marker
VQ-based bench-marker
VQ-TCM low latency
VQ-TCM high latency
8 iter
2 iter
1 iter
1.4 dB