DISCOVER Monoview Video Codec


Fernando Pereira

Instituto Superior Técnico, Portugal

on behalf of the DISCOVER project

DISCOVER Workshop on

Recent Advances in Distributed Video Coding


Outline

1. DVC Before DISCOVER

1. PRISM Solution (Univ. Berkeley)

2. Feedback-channel based Solution (Univ. Stanford)

2. DISCOVER Threats and Opportunities

1. Distributed Video Coding: The Challenges

2. Promising Applications

3. DISCOVER Monoview Video Codec

1. Architecture

2. Problems to Address

3. Encoder Modules

4. Decoder Modules

4. DISCOVER Performance

5. DISCOVER Future

DVC Before DISCOVER

The DVC World in 2004 …

• PRISM (Power-efficient, Robust, hIgh compression Syndrome based Multimedia coding) solution developed at Univ. Berkeley by Prof. Ramchandran’s team.

• Feedback-channel based solution developed at Univ. Stanford by Prof. Girod’s team.


PRISM: Encoder

• Encoder:

– Divides frame n into blocks.

– Selects skip/intra/WZ coding with different syndrome modes based on Frame Differences.


PRISM: Encoder

• Encoding of a WZ block:

– Compute DCT.

– Syndrome on LSB of low frequency coefficients.

– CRC for these low frequency coefficients.

– Conventional coding for high frequency coefficients.

– Position of low vs high frontier depends on correlation strength.

[Figure: coefficient bit layout, with labels Channel Code, MSB and LSB.]

PRISM: Decoder

• Decoder:

For every WZ block in frame n:

– Motion search at the decoder by trying every Side Information candidate block in frame (n-1).

– Corrects the (lower frequency) DCT coefficients with the syndrome.

– Check if the CRC is correct.

– Keeps the first predictor that provides a correct CRC.

– Joins the high frequency coefficients, which have been conventionally coded.
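The PRISM encode/decode loop can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Berkeley implementation: a scalar modulo coset stands in for the real channel-code syndrome, `zlib.crc32` stands in for the block CRC, and the names `encode_block`, `decode_block` and `nearest_in_coset` are hypothetical.

```python
import zlib


def checksum(coeffs):
    """CRC over the quantized coefficients (zlib.crc32 is a stand-in for the block CRC)."""
    return zlib.crc32(",".join(str(c) for c in coeffs).encode("ascii"))


def encode_block(coeffs, num_lsb):
    """Keep only the num_lsb least significant bits of each coefficient, plus a CRC."""
    modulus = 1 << num_lsb
    syndrome = [c % modulus for c in coeffs]
    return syndrome, checksum(coeffs)


def nearest_in_coset(side_value, syndrome_value, modulus):
    """Integer congruent to syndrome_value (mod modulus) that is closest to side_value."""
    lower = side_value - ((side_value - syndrome_value) % modulus)
    upper = lower + modulus
    return lower if side_value - lower <= upper - side_value else upper


def decode_block(syndrome, crc, candidates, num_lsb):
    """Try each candidate predictor; keep the first one whose corrected block passes the CRC."""
    modulus = 1 << num_lsb
    for candidate in candidates:
        decoded = [nearest_in_coset(s, syn, modulus) for s, syn in zip(candidate, syndrome)]
        if checksum(decoded) == crc:
            return decoded
    return None  # no candidate block in the search range was good enough


# Hypothetical quantized low frequency coefficients and two candidate predictors.
coeffs = [12, -7, 30, 5]
syndrome, crc = encode_block(coeffs, num_lsb=3)
candidates = [[40, 20, -10, 0], [13, -6, 29, 4]]
decoded = decode_block(syndrome, crc, candidates, num_lsb=3)
```

In this toy version a candidate decodes correctly only when each of its coefficients lies within half a coset spacing of the original, which mirrors the idea that the number of syndrome bits must match the correlation strength.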


Feedback-channel Solution: Encoder

• Encoder:

– Creates groups of frames with one key frame and (N-1) WZ coded frames.

– For every WZ frame:

• DCT + bitplanes (T-domain) or bitplanes of pixel values (pixel domain).

• Bitplanes are fed to a turbo encoder and parity bits are generated, to be sent on decoder request.
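As a minimal sketch of the two encoder-side steps described here (frame splitting and bitplane extraction), assuming a fixed GOP size and non-negative quantization indices; the function names are hypothetical and the turbo encoder itself is not shown:

```python
def split_into_gops(num_frames, gop_size):
    """Label each frame as a key frame or a WZ frame: one key frame, then gop_size - 1 WZ frames."""
    return ["key" if i % gop_size == 0 else "wz" for i in range(num_frames)]


def extract_bitplanes(indices, num_bits):
    """Split non-negative integer symbols into bitplanes, most significant bitplane first."""
    return [[(value >> bit) & 1 for value in indices]
            for bit in range(num_bits - 1, -1, -1)]


frame_types = split_into_gops(5, gop_size=2)
bitplanes = extract_bitplanes([5, 2, 7], num_bits=3)
```

Each returned bitplane is the unit that would be handed to the channel encoder, starting with the most significant one.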


Feedback-channel Solution: Decoder

• Decoder:

– Constructs an estimate of the WZ frame by motion compensated interpolation between the previous (n-1) and next (n+1) key frames, which are sent conventionally (if GOP size=2).

– Corrects the bitplanes of the estimation using the received parity bits and a noise correlation model.


Pros and Cons

• PRISM (Univ. Berkeley)

– Block-based approach

– No need for a feedback channel for rate control

– A fixed high number of bits per WZ coded block

– Encoder more complex (mode decision)

– Decoder more complex (motion search)

• Feedback-based codec (Univ. Stanford)

– Finer rate control

– Simpler encoder and decoder (?)

– Frame-based approach

DISCOVER Threats and Opportunities

The Challenges

• The Conceptual Challenge

• The Coding Efficiency Challenge

• The Complexity Challenge

• The Error Robustness Challenge

• The Scalability Challenge


Emerging Challenges

• Applications (from down-link to up-link)

– Wireless digital video cameras

– Multimedia mobile phones and PDAs

– Low-power video sensors and surveillance cameras

– Wireless video teleconferencing systems

• Requirements

– Light and flexible distribution of codec complexity

– Robustness to packet/frame losses

– High compression efficiency

– Low latency

• Target

– Inter coding efficiency

– Intra coding complexity (encoder)

– Intra coding robustness

[Figure labels: heavy encoder, light decoder, light, transcoding.]


DISCOVER Studied Applications

1. Wireless Video Cameras

2. Wireless Low-Power Surveillance

3. Visual Sensor Networks

4. Networked Camcorders

5. Distributed Video Streaming

6. Mobile Document Scanner

7. Video Conferencing with Mobile Devices

8. Mobile Video Mail

9. Disposable Video Cameras

10. Multiview Image Acquisition

11. Wireless Capsule Endoscopy

DISCOVER Monoview Codec

Selecting an Architecture

• PRISM (Univ. Berkeley)

– No need for a feedback channel for rate control

– A fixed high number of bits per WZ coded block

– Encoder more complex (mode decision)

– Decoder more complex (motion search)

• Feedback-based codec (Univ. Stanford)

– Finer rate control

– Frame based approach


Selecting an Architecture

• PRISM (Univ. Berkeley)

– No need for a feedback channel for rate control

– DIFFICULT TO OBTAIN DETAILED SPECIFICATION

– A fixed high number of bits per WZ coded block

– Encoder more complex (mode decision)

– Decoder more complex (“motion” search)

• Feedback-based codec (Univ. Stanford)

– Finer rate control

– A BIT LESS DIFFICULT TO OBTAIN DETAILED SPECIFICATION

– SOFTWARE IMPLEMENTATION AVAILABLE (from IST and VISNET)

– Frame based approach

– Feedback channel, latency


DISCOVER Architecture

[Figure: DISCOVER codec architecture. Modules: (1) WZ and Conventional Video Splitting; (2) Conventional Video Encoder; (3) Wyner-Ziv Encoder with (3a) T, (3b) Q, (3c) Channel Encoder, (3d) Buffer, (3e) Minimum Rate Estimation; (4) Conventional Video Decoder; (5) Side Information Extraction; (6) Virtual Channel Model; (7) with (7a) T and (7b) Soft Input Computation; (8) Wyner-Ziv Decoder with (8a) Channel Decoder, (8b) Decoder Success/Failure, (8c) Q^-1 and Reconstruction, (8d) T^-1.]

• Based on the feedback-channel solution from Univ. Stanford.

• Based on a split between Wyner-Ziv (WZ) and key frames.


Main Problems to Address

• Elimination of architectural limitations

– Coded key frames (not lossless)

– No original frames for decoder request control

– No original frames at decoder for correlation noise modeling

• Efficient exploitation of temporal correlation at encoder by controlling the GOP size.

• Improvement of the accuracy of the side information interpolation/extrapolation.

• Improvement of the accuracy of correlation noise estimation at decoder.

• Elimination or reduction of feedback-channel usage through encoder or hybrid rate control.


Encoder Modules: Adaptive GOP Size

To better exploit the temporal redundancy in the video, the encoder selects the GOP length depending on the motion activity in the sequence:

– High motion ⇒ low correlation ⇒ smaller GOP sizes

– Low motion ⇒ high correlation ⇒ larger GOP sizes


Encoder Modules: Adaptive GOP Size

• To perform GOP size control, it is proposed to:

– Measure at the encoder the amount of motion in a video sequence using adequate (low complexity) metrics.

– Perform hierarchical clustering of motion activity data - group frames which accumulate less motion using four (simple) motion activity
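A rough sketch of motion-driven GOP sizing is given below. It is deliberately simpler than the approach named on the slide: a single metric (mean absolute frame difference) and a greedy accumulation rule replace the hierarchical clustering over four motion activity metrics, and `threshold`, `max_gop` and both function names are assumptions made for illustration.

```python
def frame_difference(frame_a, frame_b):
    """Mean absolute difference between two frames given as 2-D lists of luma samples."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(frame_a, frame_b)
                for a, b in zip(row_a, row_b))
    return total / (len(frame_a) * len(frame_a[0]))


def choose_gop_sizes(frames, threshold=4.0, max_gop=8):
    """Greedily grow each GOP while the accumulated motion stays below the threshold."""
    sizes, start, count = [], 0, len(frames)
    while start < count:
        size, accumulated = 1, 0.0
        while size < max_gop and start + size < count:
            accumulated += frame_difference(frames[start + size - 1], frames[start + size])
            # A GOP needs at least one WZ frame, so never stop below size 2.
            if accumulated > threshold and size >= 2:
                break
            size += 1
        sizes.append(size)
        start += size
    return sizes


static_frames = [[[10, 10], [10, 10]] for _ in range(8)]
moving_frames = [[[100 * (i % 2)] * 2 for _ in range(2)] for i in range(8)]
static_sizes = choose_gop_sizes(static_frames, max_gop=4)
moving_sizes = choose_gop_sizes(moving_frames, max_gop=4)
```

With these toy inputs the static sequence ends up with the longest allowed GOPs and the high-motion sequence with the shortest, matching the rule that high motion calls for smaller GOP sizes.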


Enc. Modules: Transform and Quantization

• Transform: Wyner-Ziv frames are transformed using a 4×4 Discrete Cosine Transform, the one from H.264/AVC, whose coefficients are organized in 16 (4×4) bands.

• Independent Quantization: Each DCT band is quantized separately using a predefined number of levels, depending on the target quality for the WZ frame.

• DC Quantization: A uniform scalar quantizer is used for the DC band, assuming a fixed data range.

• AC Quantization: For AC bands, a dead-zone quantizer with doubled zero interval is applied. The dynamic data range is calculated separately for each band b, b > 1, to be quantized, and transmitted to the decoder in the coded bit stream.

• Bitplane Coding: The quantization indices of each DCT band
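The transform and the two quantizer types can be illustrated with the sketch below. The matrix is the H.264/AVC 4×4 forward core transform; the scaling that H.264/AVC folds into quantization is omitted. The helper `ac_step`, which derives a step size from a band's dynamic range, is an assumption for illustration and not a formula taken from the slides, as are all function names.

```python
# H.264/AVC 4x4 forward core transform matrix (integer approximation of the DCT).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def forward_transform(block):
    """Y = CF * X * CF^T; entry (0, 0) is the DC band, the other 15 are AC bands."""
    cf_transposed = [list(row) for row in zip(*CF)]
    return matmul(matmul(CF, block), cf_transposed)


def uniform_quantize(value, num_levels, data_range):
    """Uniform scalar quantizer over [0, data_range), as used for the DC band."""
    step = data_range / num_levels
    return min(int(value // step), num_levels - 1)


def deadzone_quantize(value, step):
    """Dead-zone quantizer: the zero bin (-step, step) is twice as wide as the others."""
    magnitude = int(abs(value) // step)
    return magnitude if value >= 0 else -magnitude


def ac_step(max_abs_value, num_levels):
    """Assumed step-size rule: spread num_levels - 1 indices over [-max, +max]."""
    return max(1.0, 2.0 * max_abs_value / (num_levels - 1))


flat_block = [[10] * 4 for _ in range(4)]
coefficients = forward_transform(flat_block)
```

A flat block puts all of its energy in the DC band, which is a quick sanity check on the transform.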


Encoder Modules: Channel Coding

Turbo Codes

•

Turbo Encoder

– 2 identical Recursive Systematic Convolutional (RSC) encoders.

– Pseudo-random interleaver.

– Puncturing for lower rates.

•

Turbo Decoder

– Two Soft-Input Soft-Output (SISO) decoders.

– Maximum A Posteriori (MAP) algorithm.

– Laplacian distribution to model the X,Y correlation.

LDPC (Low-Density Parity-Check) Codes

– LDPC Accumulate (LDPCA) codec as developed by D. Varodayan et al. in “Rate-Adaptive Codes for Distributed Source Coding”, EURASIP Signal Processing Journal, Special Issue on Distributed Source Coding.


Encoder Modules: Minimum Rate Estimator

To reduce the number of requests made by the decoder (which have a strong impact on the decoding complexity), the encoder can estimate a minimum number of accumulated syndromes to be sent per bitplane and per band.

• The DISCOVER codec solution is based on the Wyner-Ziv rate-distortion bound for two correlated Gaussian sources, which defines the minimal rate at which one source (X) can be transmitted at a given distortion $D_X$ to be $R(D_X) = \frac{1}{2}\log_2\frac{\sigma^2}{D_X}$, where $\sigma^2$ is the variance of the correlation noise between the two sources, given that the second source (Y, the Side Information) is known perfectly at the decoder.

• A separate rate for each bitplane can be obtained by estimating the reduction of distortion brought by each bitplane with respect to previously decoded bitplanes (for each band).

• $\sigma^2$ is a parameter of the noise correlation channel model, which is
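A small numerical sketch of this bound follows. The first function evaluates the Gaussian Wyner-Ziv rate-distortion bound directly; the second is an assumed way of turning a sequence of per-bitplane distortions into per-bitplane rates, chosen so that the rates add up to the bound for the final distortion. Both function names are hypothetical.

```python
import math


def wz_min_rate(sigma2, distortion):
    """Gaussian Wyner-Ziv bound in bits per sample: max(0, 0.5 * log2(sigma2 / D))."""
    if distortion >= sigma2:
        return 0.0  # the side information alone already meets the target distortion
    return 0.5 * math.log2(sigma2 / distortion)


def bitplane_rates(distortions):
    """Rate per bitplane from the distortion before and after decoding it (assumed rule).

    distortions[0] is the distortion with no bitplane decoded (the noise variance);
    distortions[k] is the distortion once k bitplanes are decoded.
    """
    return [wz_min_rate(previous, current)
            for previous, current in zip(distortions, distortions[1:])]


total_rate = wz_min_rate(16.0, 1.0)
per_bitplane = bitplane_rates([16.0, 4.0, 1.0])
```

The per-bitplane values telescope: their sum equals the rate needed to go from the initial distortion straight to the final one.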


Encoder Modules: Encoder Rate Control

• The DISCOVER codec assumes Decoder Rate Control based on a feedback channel but…

– In some applications, the feedback channel is not available.

– The feedback channel introduces delay in the system.

• So, it may be important to perform efficient Encoder Rate Control (ERC) for transform domain (TD) WZ video coding.


Encoder Modules: Encoder Rate Control

• An estimate of the SI frame is generated at the encoder using a low-complexity estimation technique (adjacent original key frames are used as input).

• The same 4x4 DCT transform is applied over the SI frame estimate and each DCT band is uniformly quantized.

• The conditional entropy is computed for each bitplane.

• The relative error probability p between corresponding DCT band bitplanes of the SI and WZ frames is computed.
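The last two steps can be sketched as follows, assuming each bitplane behaves like a binary symmetric channel between the WZ bitplane and the corresponding bitplane of the encoder-side SI estimate. Under that assumption the binary entropy of p gives a rough bits-per-sample figure for the bitplane; the function names are hypothetical and the actual DISCOVER rate formula is not given on the slide.

```python
import math


def crossover_probability(wz_bits, si_bits):
    """Relative error probability p: fraction of positions where the two bitplanes differ."""
    errors = sum(a != b for a, b in zip(wz_bits, si_bits))
    return errors / len(wz_bits)


def binary_entropy(p):
    """H(p) in bits; an entropy estimate under the binary symmetric channel assumption."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


p = crossover_probability([0, 1, 1, 0], [0, 1, 0, 0])
bits_per_sample = binary_entropy(p)
```

Multiplying `bits_per_sample` by the bitplane length would give an estimate of the number of bits to send without waiting for decoder requests.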


The ‘Clever Guy’ …

But, contrary to conventional video coding, the decoder (no longer the encoder!) is the … KING …


Decoder Modules: Side Information Creation

Since the RD performance is highly dependent on the quality of the side information, it is essential to find efficient encoder and decoder tools to generate the highest quality side information.


Decoder Modules: Side Information Creation

• Trajectory-based Motion Interpolation:


Dec. Modules: Correlation Noise Estimation

Performing efficient decoder (online) correlation noise estimation for WZ video coding

– Is essential for a more realistic/practical PDWZ video coding scenario.

– Implies the dynamic estimation of the correlation noise distribution parameter, assuming a Laplacian distribution.

– Aims to be as efficient as the offline estimation based on the original information.


Dec. Modules: Correlation Noise Estimation

Correlation noise estimation for WZ video coding:

– Made at the decoder, based on the key frames → realistic scenario.

– Exploits temporal correlation by using the motion compensated residual.

– Different spatial granularity levels may be used to achieve better adaptation to the correlation noise statistics:

• Frame level

• Block level

• Pixel level

[Figure: flowchart of correlation noise (CN) parameter estimation from the motion compensated residual frame R. Frame level: compute the variance of R and compute the CN parameter as a function of it. Block level: compute each block variance and compute the CN parameter as a function of it, until the last block of R. Pixel level: compute the CN parameter at pixel level, until the last pixel of R. Then move to the next frame.]
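A minimal sketch of frame-level and block-level estimation is shown below. It relies on the fact that a Laplacian with parameter α has variance 2/α², so α can be taken as sqrt(2/σ²). Using this mapping directly at both levels is the simplest possible choice and is an assumption here; the slide only states that the parameter is a function of the variance. The residual frame is assumed to be already available, and the `floor` guard and function names are hypothetical.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def laplacian_alpha(var, floor=1e-3):
    """Laplacian parameter for a given variance (variance = 2 / alpha^2), guarded near zero."""
    return math.sqrt(2.0 / max(var, floor))


def frame_level_alpha(residual):
    """One CN parameter for the whole motion compensated residual frame."""
    return laplacian_alpha(variance([p for row in residual for p in row]))


def block_level_alphas(residual, block_size=4):
    """One CN parameter per block_size x block_size block of the residual frame."""
    rows, cols = len(residual), len(residual[0])
    alphas = []
    for top in range(0, rows, block_size):
        row_alphas = []
        for left in range(0, cols, block_size):
            pixels = [residual[r][c]
                      for r in range(top, min(top + block_size, rows))
                      for c in range(left, min(left + block_size, cols))]
            row_alphas.append(laplacian_alpha(variance(pixels)))
        alphas.append(row_alphas)
    return alphas


residual = [[1, -1], [1, -1]]
alpha_frame = frame_level_alpha(residual)
alpha_blocks = block_level_alphas(residual, block_size=2)
```

A larger α means a more peaked distribution, i.e. side information that the decoder trusts more in that region.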


Dec. Modules: Request Stopping Criteria

• To establish whether decoding is successful, the decoder convergence is tested by computing the syndrome check error, i.e. the Hamming distance between the received syndrome and the one generated using the decoded bitplane, followed by a cyclic redundancy check (CRC).

– If the Hamming distance is different from zero, the decoder proceeds to the next iteration. If the Hamming distance is still different from zero after a certain number of iterations (≈100), the bitplane is assumed to be erroneously decoded and the LDPCA decoder requests more syndromes via the return channel.

– If the Hamming distance is equal to zero, the success of the decoding operation is verified using an 8-bit CRC sum.

• If the CRC sum computed on the decoded bitplane matches the value received from the encoder, the decoding is declared successful and the decoded bitplane is sent to the reconstruction module.

• Otherwise, the decoder requests more accumulated syndromes, and thus a low final error probability is always guaranteed.
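The two-stage check can be sketched as below. The parity-check matrix is represented as a list of rows, each row listing the bit positions it covers; the CRC-8 polynomial 0x07 and the status strings are assumptions, since the slides do not specify them, and the LDPCA decoding iterations themselves are not shown.

```python
def syndrome(parity_rows, bits):
    """Syndrome of a bitplane: parity (mod 2) of the bits covered by each check row."""
    return [sum(bits[j] for j in row) % 2 for row in parity_rows]


def hamming_distance(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def crc8(bits, poly=0x07):
    """Bit-serial 8-bit CRC with zero initial value (the polynomial is an assumption)."""
    crc = 0
    for bit in bits:
        crc ^= bit << 7
        if crc & 0x80:
            crc = ((crc << 1) ^ poly) & 0xFF
        else:
            crc = (crc << 1) & 0xFF
    return crc


def check_bitplane(decoded_bits, received_syndrome, received_crc, parity_rows):
    """Return 'not_converged', 'request_more' or 'success' for a decoded bitplane."""
    if hamming_distance(syndrome(parity_rows, decoded_bits), received_syndrome) != 0:
        return "not_converged"  # keep iterating, or request syndromes after ~100 iterations
    if crc8(decoded_bits) != received_crc:
        return "request_more"   # syndrome matches but the bitplane is still wrong
    return "success"


parity_rows = [[0, 1], [1, 2], [2, 3]]
original = [1, 0, 1, 1]
sent_syndrome = syndrome(parity_rows, original)
sent_crc = crc8(original)
```

The CRC matters because several bitplanes can share one syndrome; in this toy matrix the bitwise complement of `original` has the same syndrome and is only caught by the CRC.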


Decoder Modules: Reconstruction

• The decoded value is reconstructed in a mean squared error-optimal way as the expectation of x given the decoded quantization index, q, and the side information value, y; this means $\hat{x} = E[x \mid q, y]$.

• The calculation of this expectation value is performed using closed-form expressions derived for a Laplacian correlation model.

• Those frequency bands for which no information was transmitted from the encoder are taken directly from the Side Information.

• After that, the inverse 4×4 DCT transform is applied, and the whole WZ frame is restored in the pixel domain.
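For illustration, the expectation can be written out for a quantization bin [lower, upper) under the assumption that x − y follows a Laplacian with parameter α. The closed forms below were derived here by integrating that density over the bin, so they should be read as a sketch consistent with the slide rather than a copy of the codec's expressions; the function name and argument layout are hypothetical.

```python
import math


def reconstruct(lower, upper, y, alpha):
    """E[x | lower <= x < upper, y] when x - y is Laplacian with parameter alpha."""
    width = upper - lower
    if y < lower or y >= upper:
        # Side information outside the bin: the density decays monotonically across the bin.
        tail = width * math.exp(-alpha * width) / -math.expm1(-alpha * width)
        offset = 1.0 / alpha - tail
        return lower + offset if y < lower else upper - offset
    # Side information inside the bin: two truncated exponentials on either side of y.
    gamma, delta = y - lower, upper - y
    exp_gamma, exp_delta = math.exp(-alpha * gamma), math.exp(-alpha * delta)
    numerator = (gamma + 1.0 / alpha) * exp_gamma - (delta + 1.0 / alpha) * exp_delta
    return y + numerator / (2.0 - exp_gamma - exp_delta)


centered = reconstruct(0.0, 8.0, 4.0, 0.5)
below = reconstruct(0.0, 8.0, -100.0, 0.5)
above = reconstruct(0.0, 8.0, 100.0, 0.5)
```

When y sits in the middle of the bin the estimate is y itself; when y falls outside, the estimate is pulled towards the nearer bin edge without leaving the bin.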

DISCOVER Performance

Test Conditions

• Frames: all frames, i.e. 299 for Foreman, 329 for Hall Monitor, 299 for Coast Guard, and 299 for Soccer.

• Spatial resolution: QCIF.

• Temporal resolution: 15 Hz and 30 Hz, which means 7.5 Hz or 15 Hz for the WZ frames when GOP=2 is used.

• GOP length: 2, 4 and 8

[Figure: four panels, (a) to (d).]

Evaluation Metrics

• Forward Channel Performance Evaluation

– Measuring the Overall Rate-Distortion Performance

– Measuring the Quality Evolution of WZ Decoded Frames

– Measuring the Bitplane Compression Factor

– Measuring the Decoded Quality Versus the Side Information Quality

• Feedback Channel Performance Evaluation

– Measuring the Number of Requests

– Measuring the Feedback Channel Rate

– Measuring the Number of Errors Versus the Number of Requests

– Measuring the Number of Requests Versus Side Information Quality

• Complexity Performance Evaluation

– Encoding Complexity


RD Performance (GOP 2)

[Figure: rate-distortion curves, PSNR [dB] versus Rate [kbps], for Coast Guard, Hall Monitor, Soccer and Foreman, comparing DISCOVER, H.264/AVC (Intra), H.263+ (Intra) and H.264/AVC (No Motion).]

RD Performance (GOP 2,4,8)

[Figure: rate-distortion curves, PSNR [dB] versus Rate [kbps], for Coast Guard, Hall Monitor, Soccer and Foreman at QCIF, 15 Hz, comparing LDPC - GOP 2, LDPC - GOP 4 and LDPC - GOP 8.]

LDPC versus Turbo Codes

[Figure: rate-distortion curves, PSNR [dB] versus Rate [kbps], for Coast Guard, Hall Monitor, Soccer and Foreman, comparing LDPC and turbo codes (TC) for GOP 2, GOP 4 and GOP 8.]

Bitplane Compression Factor (Qi 4)

[Figure: two bar charts titled "Compression Factor (Qi=8)", showing the compression factor (0 to 40) per bitplane number for each DCT band (DC, AC1 to AC14), for Coastguard and Foreman.]

Number of Requests (Qi 8)

[Figure: two bar charts titled "Number of Requests (Qi=8)", showing the number of requests (0 to 25) per bitplane number for each DCT band (DC, AC1 to AC14), for Coastguard and Foreman.]

Encoding Complexity (GOP 2)

[Figure: encoding time (sec) versus Qi (1 to 8) for Coast Guard, Hall Monitor, Soccer and Foreman, comparing DISCOVER (WZ Frames), DISCOVER (Key Frames), H.264/AVC (Intra) and H.264/AVC (No Motion).]

Decoding Complexity (GOP 2)

[Figure: decoding time (sec) versus Qi (1 to 8) for Coast Guard, Hall Monitor, Soccer and Foreman.]

Performance Conclusions

• In terms of RD performance, the DISCOVER codec already outperforms the H.264/AVC Intra codec for most test sequences with GOP=2; for quieter sequences, the DISCOVER codec already outperforms the H.264/AVC No Motion codec.

• For longer GOP sizes, outperforming H.264/AVC Intra is more difficult, which highlights the importance and difficulty of side information creation, notably when key frames are farther apart.

• The total bitrate for the feedback channel is rather low … but the feedback adds delay and requires a real-time setup.

• DISCOVER encoding complexity is always much lower than the H.264/AVC Intra encoding complexity, even for GOP=2 where it performs better in terms of RD performance.

DISCOVER (the) Future

Main Conclusions

• The DISCOVER monoview codec performs better than H.264/AVC Intra for GOP=2 for most sequences, which highlights that Wyner-Ziv is already a credible coding solution when encoding complexity is a very critical requirement (even if at the cost of some additional decoding complexity).

• The results achieved during the lifetime of DISCOVER improved the compression performance of monoview WZ codecs, but it is clear that much research remains to be done to approach the theoretical limits …

• Further research should address side information creation, correlation noise modeling, channel codes, rate control, reconstruction, WZ selective coding, etc …


DISCOVER for the World

The DISCOVER Codec may be downloaded at http://www.discoverdvc.org/ !

• The executable codec, along with sample configuration and test files, can be downloaded for:

– Windows

– Linux/32-bit

– Linux/64-bit

• An overview paper and a detailed performance evaluation with precise test conditions are also available.


Main References

General

– J. Ascenso, C. Brites, F. Pereira, “Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding”, Proc. 5th EURASIP Conf. Speech Image Processing, Multimedia Commun. Services, Smolenice, Slovak Republic, July 2005.

– J. Ascenso, C. Brites, F. Pereira, “Content adaptive Wyner-Ziv video coding driven by motion activity”, IEEE International Conference on Image Processing, Atlanta, USA, October 8-11, 2006.

– X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, M. Ouaret, “The DISCOVER codec: architecture, techniques and evaluation”, Picture Coding Symposium, Lisboa, Portugal, November 2007.

– C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, J. Ostermann, “Distributed monoview and multiview video coding”, IEEE Signal Processing Magazine, vol. 24, no. 5, pp. 67–76, September 2007.

Codec

– J. Ascenso, C. Brites, F. Pereira, “Content adaptive Wyner-Ziv video coding driven by motion activity”, IEEE International Conference on Image Processing, Atlanta, USA, October 2006.

– J. Ascenso, F. Pereira, “Adaptive hash based side information exploitation for efficient Wyner-Ziv video coding”, IEEE International Conference on Image Processing, San Antonio, USA, September 2007.

Encoder

– C. Brites, F. Pereira, “Encoder rate control for transform domain Wyner-Ziv Video coding”, IEEE International Conference on Image Processing, San Antonio, Texas, USA, September 2007.

– D. Kubasov, K. Lajnef, C. Guillemot, “A hybrid encoder/decoder rate control for a Wyner-Ziv video codec with a feedback channel”, IEEE Multimedia Signal Processing Workshop, MMSP, Chania, Crete, Greece, October 2007.

Decoder

– C. Brites, J. Ascenso, F. Pereira, “Modeling correlation noise statistics at decoder for pixel based Wyner-Ziv video coding”, Picture Coding Symposium, Beijing, China, April 2006.

– C. Brites, J. Ascenso, F. Pereira, “Studying temporal correlation noise modeling for pixel based Wyner-Ziv video coding”, IEEE International Conference on Image Processing, Atlanta, USA, October 2006.

– D. Kubasov, J. Nayak, C. Guillemot, “Optimal reconstruction in Wyner-Ziv video coding with multiple side

Thanks for your attention!

More information at http://www.discoverdvc.org/

IST DISCOVER