Optimized two-layer DCT-based video compression algorithm for packet-switched network transmission


ABSTRACT

BLAKE, STEVEN LANGLEY. Optimized Two-Layer DCT-Based Video Compression Algorithm for Packet-Switched Network Transmission. (Under the direction of Dr. Tony L. Mitchell and Dr. Sarah A. Rajala)

In this dissertation a systematic study of the issues involved in packet video transmission is presented. Classical video compression algorithms are reviewed, with special emphasis given to the H.261 videoconferencing standard. A brief review of quality-of-service characteristics for packet-switched networks is provided. Requirements for packet video quality-of-service are examined, and the bit rate and error statistics of five video sequences encoded using the H.261 algorithm are presented.

Asynchronous Transfer Mode networks are discussed as a suitable technology for enabling packet video transmission. The call admission problem is discussed and two conservative admission algorithms are analyzed. Statistical multiplexing gains are computed for various variable bit rate video sources which are policed using a set of standardized traffic descriptors.

Bounds on the instantaneous encoding and transmission rate of a video source are derived. A novel rate-control algorithm is presented, which decouples quantization selection from the instantaneous encoder buffer occupancy. The performance of this rate-control algorithm with various video sources and source traffic descriptors is investigated.

Three techniques (periodic replenishment, conditional macroblock replenishment, and block error thresholding) are proposed as means to improve two-layer encoder efficiency. Their performance is analyzed for two video sequences.

OPTIMIZED TWO-LAYER DCT-BASED VIDEO

COMPRESSION ALGORITHM FOR

PACKET-SWITCHED NETWORK TRANSMISSION

by

STEVEN LANGLEY BLAKE

A dissertation submitted to the Graduate Faculty of

North Carolina State University

in partial fulfillment of the

requirements for the Degree of

Doctor of Philosophy

ELECTRICAL ENGINEERING

Raleigh

1995

APPROVED BY:


DEDICATION

This dissertation is dedicated in loving memory to my mother, Jerry Roddy Blake.

BIOGRAPHY

Steven Langley Blake was born in Winston-Salem, NC in March 1966. After living in both North Carolina and New Jersey, he began his undergraduate program in electrical engineering at North Carolina State University in August 1984. He completed his bachelor's and master's degrees in 1988 and 1989, respectively. During 1993 and 1994 he was employed at MCNC in the High Performance Computing and Communications group, where he conducted much of this research. He is currently employed in the Networking Architecture department at IBM Networking Hardware Division. His research interests include high-performance networking architectures, protocol and scheduling support for multimedia applications, and video coding.

ACKNOWLEDGEMENTS

First and foremost, I would like to thank my mother and father, whose tireless, unwavering love, encouragement, and support throughout my life have made this work possible.

I would like to thank my committee co-chairpersons, Dr. Tony Mitchell and Dr. Sarah Rajala, for their long-standing support and encouragement of my work. Their helpful direction and suggestions contributed greatly to my efforts. Most importantly, they allowed me tremendous freedom in setting the course of my research, and for that, I am eternally grateful. I also wish to thank my other committee members, Dr. Arne Nilsson, Dr. Ethelburt Chukwu, and Mr. George Abbott, for agreeing to serve on my committee and for their helpful comments and suggestions. I would be remiss if I did not express my gratitude to Dr. Ben O'Neal, Dr. Keith Townsend, and Dr. Mark White, for their advice and encouragement over the years.

Completion of this work would not have been possible without the financial and collaborative support of MCNC. I would especially like to thank Dan Stevenson, Fred Heaton, Dr. Fengmin Gong, and Dr. Frank Jou, for their advice, insights, and encouragement.

Special thanks are due to Dr. Amy Reibman at AT&T Bell Laboratories, for providing me with several of the uncompressed video sequences that I used, as well as for her helpful advice and suggestions.

Many of my colleagues in the ECE graduate program have assisted me over the years, and I gladly offer my gratitude to them, with special thanks offered to Tung Ouyang, Bongtae Kim, Michael Izquierdo, Jim Freebersyser, Dr. Olen Stokes, and Dr. Robert Van Dyck for the many fruitful discussions we have had together.

My enthusiasm was bolstered through many trying times by the love and encouragement of my lovely wife Wendy. I am truly the luckiest man in the world for having her stand by my side.

Finally, to all of my friends, and especially to Wendy, Mike, Cliff, Ken, Tung, and Tushar, thanks for bearing with me throughout this adventure. What a long, strange trip it's been.

Contents

List of Tables
List of Figures

1 INTRODUCTION

2 REVIEW OF IMAGE CODING ALGORITHMS
    2.1 Digital Image Representation
    2.2 Classical Intraframe Coding Techniques
        2.2.1 Predictive Coding
        2.2.2 Transform Coding
        2.2.3 Subband/Wavelet Coding
        2.2.4 Vector Quantization
    2.3 Interframe Coding Techniques
        2.3.1 Digital Video Characteristics
        2.3.2 Interframe Predictive Coding
        2.3.3 Motion-Compensated Predictive Coding
        2.3.4 Motion-Compensated Frame Interpolation
        2.3.5 Three-Dimensional Transform/Subband Coding
    2.4 DCT-Based Motion-Compensated Interframe Prediction Algorithms
        2.4.1 H.261 Coding Algorithm
        2.4.2 MPEG Coding Algorithm

3 REVIEW OF PACKET-SWITCHED NETWORK SERVICE CHARACTERISTICS
    3.1 Basic Service Characteristics
        3.1.1 Routing
        3.1.2 Delay
        3.1.3 Congestion and Loss
        3.1.4 Admission Control
    3.2 ATM Network Service Characteristics

4 QUALITY-OF-SERVICE REQUIREMENTS FOR PACKET VIDEO TRANSMISSION
    4.1 Delay
        4.1.1 End-to-End Latency
        4.1.2 Audio-Video Skew
        4.1.3 Packet Delay Variation
    4.2 Error Tolerance
    4.3 Transmission Rate Requirements
        4.3.1 Constraints on the Instantaneous Transmission Rate
        4.3.2 Bit Rates of Unbuffered VBR Video Sequences
        4.3.3 Bit Rates of Buffered VBR Video Sequences
    4.4 Reconstructed Image Quality

5 CONNECTION ADMISSION CONTROL FOR PACKET VIDEO TRANSMISSION
    5.1 Connection Admission Control
    5.2 Statistical Multiplexing Gain As A Function Of Connection Traffic Descriptors
        5.2.1 Cc/PCR = 200
        5.2.2 Cc/PCR = 50
        5.2.3 Cc/PCR = 10
        5.2.4 SMG vs Cc/PCR
        5.2.5 Discussion
    5.3 Statistical Multiplexing Gains of VBR Video Sequences
        5.3.1 Leaky Bucket Parameters for VBR Video Sequences
        5.3.2 Traffic Descriptors and Statistical Multiplexing Gain

6 ENCODER RATE CONTROL FOR VBR TRANSMISSION
    6.1 Instantaneous Constraints on the Encoded and Transmitted Bit Rate
    6.2 Encoder Rate Control
        6.2.1 RM8 CBR Rate Control Algorithm
        6.2.2 VBR Rate Control Algorithm
    6.3 Target Bit Rate Selection
        6.3.1 Complexity Measures and Bit Rate Prediction
        6.3.2 Modified VBR Encoder Rate Control Algorithm
        6.3.3 Transmission Rate Scheduling
    6.4 Encoder Rate Control Performance
        6.4.1 VBR vs CBR Traffic Constraints
        6.4.2 Conclusions

7 EFFICIENT TWO-LAYER ENCODING ALGORITHM
    7.1 Argument for a Two-Layer Encoding Algorithm
    7.2 Architecture Modifications to Support Two-Layers
    7.3 Comparison of Two-Layer Encoding Methods
    7.4 Efficiency Enhancements

8 STATISTICAL MULTIPLEXING GAIN FOR TWO-LAYER VBR VIDEO
    8.1 Quality-of-Service Specification for Two-Layer Video Encoding
        8.1.1 Best-Case SMG Improvement
        8.1.2 CAC and Policing for Two-Layer Video
    8.2 Two-Layer Encoder Rate Control
    8.3 Performance of Two-Layer Rate-Controlled VBR Video
    8.4 Conclusions

9 RELATED RESEARCH

10 CONCLUSIONS
    10.1 Discussion of Results
    10.2 Significant Contributions
    10.3 Suggestions for Future Research

11 Bibliography

List of Tables

2.1 Parameter values for CCITT p × 64 kbps video formats.

4.1 Mean time to cell loss (MTL) for various mean bit rates MBR and CLR thresholds.
4.2 Bandwidth ranges and burstiness ratios for various packet video applications.
4.3 Bit rate statistics (bps) for seven video sequences with Q = 12 (MPEG B-frames are coded with Q = 24).
4.4 Peak rate statistics for the Hockey sequence for L = 0, 3, 5 frame periods of encoder buffering.
4.5 Bit rate statistics (bps) for seven video sequences with Q = 12 (MPEG B-frames are coded with Q = 24) and with decoder delay L = 3 (frame intervals).
4.6 PSNR statistics (dB) for five video sequences with Q = 12.

5.1 Probability distributions for CAC1 and CAC2: tPCR = 100 (slots), tC = 15000 (slots), MBS = 50 (cells), K = 1024 (slots).
5.2 NPRA for each traffic descriptor set.
5.3 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Salesman sequence using CAC1: Cc = 353207 (cells/sec), K = 256 (slots), PCR = 3253 (cells/sec), Ravg = 363450 (bits/sec), NPRA = 103, L = 3 (frames), Q = 12.
5.4 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Claire sequence using CAC1: Cc = 353207 (cells/sec), NPRA = 207, L = 3 (frames), Q = 12.
5.5 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Dave sequence using CAC1: Cc = 353207 (cells/sec), NPRA = 160, L = 3 (frames), Q = 12.
5.6 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Tennis sequence using CAC1: Cc = 353207 (cells/sec), NPRA = 23, L = 3 (frames), Q = 12.
5.7 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Calendar MPEG sequence using CAC1: Cc = 353207 (cells/sec), K = 256 (slots), PCR = 6307 (cells/sec), Ravg = 1088067 (bits/sec), NPRA = 53, L = 3 (frames).
5.8 Traffic descriptors (SCR (cells/sec) and MBS (cells)) and the resulting statistical multiplexing gains and achieved link loads determined for the Hockey MPEG sequence using CAC1: Cc = 353207 (cells/sec), NPRA = 61, L = 3 (frames).

6.1 Traffic descriptor parameters, leaky bucket parameters, nPRA, n_{10^-9}, and RCBR for videoconferencing traffic descriptors LB1, LB2, and LB3.
6.2 Ei and PSNR statistics for the Salesman sequence encoded with VBR traffic constraints LB1, LB2, and LB3, and corresponding CBR traffic constraints CBR1, CBR2, CBR3.
6.3 Ei and PSNR statistics for the Dave sequence encoded with VBR traffic constraints LB1, LB2, and LB3, and corresponding CBR traffic constraints CBR1, CBR2, CBR3.
6.4 Ei and PSNR statistics for the Claire sequence encoded with VBR traffic constraints LB1, LB2, and LB3, and corresponding CBR traffic constraints CBR1, CBR2, CBR3.

7.1 Statistics for the single-layer encoder enhancement for the Claire and Tennis sequences: Q = 12.
7.2 Statistics for the two-layer encoder without efficiency enhancement for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12.
7.3 Statistics for the two-layer encoder with Periodic GOB Replenishment for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12.
7.4 Statistics for the two-layer encoder with Block Error Thresholding for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12, Tbe1 = 34 dB, Tbe2 = 0.85.
7.5 Statistics for the two-layer encoder with Conditional Macroblock Replenishment for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12, Tcmr = 37 dB.
7.6 Statistics for the two-layer encoder with Periodic GOB Replenishment, Block Error Thresholding, and Conditional Macroblock Replenishment for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12, Tbe1 = 34 dB, Tbe2 = 0.85, Tcmr = 37 dB.
7.7 Statistics for the two-layer encoder with Periodic GOB Replenishment, Block Error Thresholding, and Conditional Macroblock Replenishment for the Claire and Tennis sequences: Sb = 32, Qb = 24, Qe = 12, Tbe1 = 34 dB, Tbe2 = 0.85, Tcmr = 37 dB.

8.1 Policing parameters and admission parameters for the Claire, Dave, Salesman, and Tennis sequences; L = 3, Q = 12.
8.2 Policing parameters and admission parameters for the Claire, Dave, Salesman, and Tennis sequences using two-layer rate control; L = 3.
8.3 Statistical multiplexing results for the Claire sequence using two-layer rate control; L = 3, Qbtar = 24, Sbtar = 32, Qetar = 12, nbase = 390.
8.4 Statistics for the Claire sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.5 Statistical multiplexing results for the Dave sequence using two-layer rate control; L = 3, Qbtar = 24, Sbtar = 32, Qetar = 12, nbase = 295.
8.6 Statistics for the Dave sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.7 Statistical multiplexing results for the Salesman sequence using two-layer rate control; L = 3, Qbtar = 24, Sbtar = 32, Qetar = 12, nbase = 194.
8.8 Statistics for the Salesman sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.9 Statistical multiplexing results for the Tennis sequence using two-layer rate control; L = 3, Qbtar = 24, Sbtar = 32, Qetar = 12, nbase = 29.
8.10 Statistics for the Tennis sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.11 Cell loss periods as a function of PCR and SCR at various values of CLR for the Claire, Dave, Salesman, and Tennis sequences using two-layer rate control; L = 3, Qbtar = 24, Sbtar = 32, Qetar = 12.
8.12 Statistical multiplexing gains for the Claire, Dave, Salesman, and Tennis sequences using two-layer encoding.

10.1 Link utilization for video sources using peak rate, single-layer VBR, and two-layer VBR bandwidth allocation.

List of Figures

2.1 Sampling of luminance and chrominance pels.
2.2 H.261 Video Encoder.
2.3 H.261 Video Decoder.

3.1 Relationship between network utilization and queueing delay.
3.2 Relationship between network utilization and throughput.
3.3 Utilization vs loss for ON/OFF sources: C = 100Rmax.

4.1 Bit rates (bits/frame) for 400 frames of the Salesman, Claire and Dave sequences; Q = 12.
4.2 Bit rates (bits/frame) for 300 frames of the Tennis and 150 frames of the Ferriswheel sequence; Q = 12.
4.3 Bit rate (bits/frame) for 896 frames of the Calendar MPEG sequence.
4.4 Bit rate (bits/frame) for 896 frames of the Hockey MPEG sequence.
4.5 Bit rate (bits/sec) vs quantizer stepsize for the videoconferencing sequences.
4.6 Bit rate (bits/sec) vs quantizer stepsize for the 'TV-type' sequences.
4.7 Bit rate (bits/sec) for the Salesman sequence with L = 0, 3, 5 frame periods of encoder buffering.
4.8 Bit rate (bits/sec) for the Tennis sequence with L = 0, 3, 5 frame periods of encoder buffering.
4.9 Bit rate (bits/sec) for the Hockey sequence with L = 0, 3, 5 frame periods of encoder buffering.
4.10 Peak bit rate (bits/sec) vs quantizer stepsize for the Salesman sequence with L = 0, 3, 5 frame periods of encoder buffering.
4.11 Peak bit rate (bits/sec) vs quantizer stepsize for the Tennis sequence with L = 0, 3, 5 frame periods of encoder buffering.
4.12 PSNR response of five H.261 video sequences for Q = 12.
4.13 Mean PSNR vs quantizer stepsize for the H.261 sequences.
4.14 Minimum PSNR vs quantizer stepsize for the H.261 sequences.

5.1 Relationship of MBS, tPCR, tON, tOFF, and tC for a periodic, deterministic ON/OFF source.
5.2 Cell multiplexor and buffer system model.
5.3 CLR vs N response curve for CAC algorithms CAC1, CAC2: Cc/PCR = 100, MBS = 50 (cells), PCR/SCR = 3, K = 1024 (slots).
5.4 Convolved probability distribution of the number of cells observed in a K/2 slot window from a multiplex of N = 250 homogeneous sources for CAC algorithms CAC1, CAC2: Cc/PCR = 100, MBS = 50 (cells), PCR/SCR = 3, K = 1024 (slots).
5.5 Statistical multiplexing gain vs CLR vs CAC for Cc/PCR = 200, MBS = 1000 (cells), and K = 256 (slots).
5.6 Statistical multiplexing gain vs CLR vs CAC for Cc/PCR = 200, MBS = 50 (cells), and K = 1024 (slots).
5.7 Statistical multiplexing gain vs CLR vs K for Cc/PCR = 200 and MBS = 1000 (cells) using CAC1.
5.8 Statistical multiplexing gain vs CLR vs CAC for Cc/PCR = 50, MBS = 1000 (cells), and K = 256 (slots).
5.9 Statistical multiplexing gain vs CLR vs K for Cc/PCR = 50 and MBS = 1000 (cells) using CAC1.
5.10 Statistical multiplexing gain vs CLR vs MBS for Cc/PCR = 50 and K = 1024 (slots) using CAC1.
5.11 Statistical multiplexing gain vs CLR vs CAC for Cc/PCR = 10, MBS = 1000 (cells), and K = 256 (slots).
5.12 Statistical multiplexing gain vs CLR vs K for Cc/PCR = 10 and MBS = 1000 (cells) using CAC1.
5.13 Statistical multiplexing gain vs CLR vs MBS for Cc/PCR = 10 and K = 1024 (slots) using CAC1.
5.14 Statistical multiplexing gain vs CLR vs Cc/PCR for MBS = 1000 (cells) and K = 256 (slots) using CAC1.
5.15 Statistical multiplexing gain vs CLR vs Cc/PCR for MBS = 50 (cells) and K = 1024 (slots) using CAC1.
5.16 Block diagram of system buffers and the leaky bucket channel constraint.
5.17 Ri scheduling response as a function of Ei + Be_{i-1}, Rimin, Rimax, and R.
5.18 Leaky bucket admission curve for the Salesman sequence: R̂ = 1135920 (bits/sec), Q = 12, L = 3 (frames).
5.19 Leaky bucket admission curve for the Claire sequence: R̂ = 563520 (bits/sec), Q = 12, L = 3 (frames).
5.20 Leaky bucket admission curve for the Dave sequence: R̂ = 730260 (bits/sec), Q = 12, L = 3 (frames).
5.21 Leaky bucket admission curve for the Tennis sequence: R̂ = 5186340 (bits/sec), Q = 12, L = 3 (frames).
5.22 Leaky bucket admission curve for the Calendar MPEG sequence: R̂ = 2236680 (bits/sec), L = 3 (frames).
5.23 Leaky bucket admission curve for the Hockey MPEG sequence: R̂ = 1940310 (bits/sec), L = 3 (frames).
5.24 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Salesman sequence using CAC1: Cc = 353207 (cells/sec), K = 256 (slots), PCR = 3253 (cells/sec), NPRA = 103, Q = 12, L = 3 (frames).
5.25 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Claire
5.26 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Dave
5.27 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Tennis
5.28 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Calendar MPEG sequence using CAC1: Cc = 353207 (cells/sec), K = 256 (slots), PCR = 6307 (cells/sec), NPRA = 53, L = 3 (frames).
5.29 Statistical Multiplexing Gains vs SCR/PCR vs CLR for the Hockey MPEG sequence using CAC1: Cc = 353207 (cells/sec), K = 256 (slots), PCR = 5518 (cells/sec), NPRA = 61, L = 3 (frames).

6.1 Relationship between Ei() - Ri() and Q̃RM8() - Q̃_{i-1} for the RM8 rate control algorithm.
6.2 Relationship between Ei() - Eitar() and Q̃VB() - Q̃_{i-1} for the virtual buffer rate control algorithm.
6.3 Ei vs prediction error variance for the Salesman sequence: Q = 12.
6.4 Relative error (Ei - E_{i-1})/Ei for the Salesman sequence: Q = 12.
6.5 Normalized Autocorrelation Functions of Ei vs lag (frames) for the Dave, Salesman and Claire sequences: Q = 12.
6.6 Normalized Autocorrelation Functions of the number of bits per GOB vs lag (frames) for the Dave, Salesman and Claire sequences: Q = 12.
6.7 EiPtar as a function of Q_{i-1}.
6.8 Ri vs traffic constraint for the Salesman sequence: Qtar = 12, L = 3 (frames).
6.9 PSNR vs traffic constraint for the Salesman sequence: Qtar = 12, L = 3 (frames).
6.10 Quantizer stepsize histogram vs traffic constraint for the Salesman sequence: Qtar = 12, L = 3 (frames).
6.11 Ri vs traffic constraint for the Dave sequence: Qtar = 12, L = 3 (frames).
6.12 PSNR vs traffic constraint for the Dave sequence: Qtar = 12, L = 3 (frames).
6.13 Quantizer stepsize histogram vs traffic constraint for the Dave sequence: Qtar = 12, L = 3 (frames).
6.14 Ri vs traffic constraint for the Claire sequence: Qtar = 12, L = 3 (frames).
6.15 PSNR vs traffic constraint for the Claire sequence: Qtar = 12, L = 3 (frames).
6.16 Quantizer stepsize histogram vs traffic constraint for the Claire sequence: Qtar = 12, L = 3 (frames).

7.1 Increase in SMG as CLR is relaxed from CLR = 10^-9 to CLR = 10^-3: Tennis, Dave, Claire, and Salesman sequences.
7.2 Two-Layer Video Encoder.
7.3 Enhancement-layer quantizer characteristic.
7.4 Enhancement-layer quantizer residual error characteristic.
7.5 Two-Layer Video Decoder.
7.6 Ei (bits/frame) for the Tennis sequence coded with single-layer (Q = 12), requantization (Qb = 24, Qe = 12), and spectral separation (Sb = 32, Qb = Qe = 12) encoders.
7.7 PSNR curves for the Tennis sequence coded with single-layer (Q = 12), requantization (Qb = 24, Qe = 12), and spectral separation (Sb = 32, Qb = Qe = 12) encoders.
7.8 Ei (bits/frame) for the Claire sequence coded with single-layer (Q = 12), requantization (Qb = 24, Qe = 12), and spectral separation (Sb = 32, Qb = Qe = 12) encoders.
7.9 PSNR curves for the Claire sequence coded with single-layer (Q = 12), requantization (Qb = 24, Qe = 12), and spectral separation (Sb = 32, Qb = Qe = 12) encoders.
7.10 Two-layer peak and mean overhead and base-layer peak and mean fractions of the hybrid encoding method (Sb = 32, Qe = 12) relative to a single-layer encoding (Q = 12) of the Tennis sequence as a function of increasing Qb.
7.11 Two-layer peak and mean overhead and the base-layer mean fraction of the hybrid encoding method (Sb = 32, Qe = 12) relative to a single-layer encoding (Q = 12) of the Tennis sequence as a function of the base-layer peak ratio.
7.12 Two-layer peak and mean overhead and base-layer peak and mean fractions of the hybrid encoding method (Sb = 32, Qe = 12) relative to a single-layer encoding (Q = 12) of the Claire sequence as a function of increasing Qb.
7.13 Two-layer peak and mean overhead and the base-layer mean fraction of the hybrid encoding method (Sb = 32, Qe = 12) relative to a single-layer encoding (Q = 12) of the Claire sequence as a function of the base-layer peak ratio.
7.14 Two-layer peak and mean overhead and base-layer peak and mean fractions of the hybrid encoding method (Sb = 32, Qe = 12) with Periodic GOB Replenishment, Block Error Thresholding (Tbe1 = 34 dB, Tbe2 = 0.85), and Conditional Macroblock Replenishment (Tcmr = 37 dB), relative to a single-layer encoding (Q = 12) of the Tennis sequence as a function of increasing Qb.
7.15 Two-layer peak and mean overhead and base-layer peak and mean fractions of the hybrid encoding method (Sb = 32, Qe = 12) with Periodic GOB Replenishment, Block Error Thresholding (Tbe1 = 34 dB, Tbe2 = 0.85), and Conditional Macroblock Replenishment (Tcmr = 37 dB), relative to a single-layer encoding (Q = 12) of the Claire sequence as a function of increasing Qb.
7.16 Ei (bits/frame) for the Tennis sequence coded with single-layer (Q = 12), hybrid (Sb = 32, Qb = 24, Qe = 12), and efficient hybrid (Periodic GOB Replenishment, Block Error Thresholding, Conditional Macroblock Replenishment, Sb = 32, Qb = 24, Qe = 12, Tbe1 = 34 dB, Tbe2 = 0.85, Tcmr = 37 dB) encoders.
7.17 PSNR curves for the Tennis sequence coded with single-layer (Q = 12), hybrid (Sb = 32, Qb = 24, Qe = 12), and efficient hybrid
7.18 Ei (bits/frame) for the Claire sequence coded with single-layer (Q = 12), hybrid (Sb = 32, Qb = 24, Qe = 12), and efficient hybrid
7.19 PSNR curves for the Claire sequence coded with single-layer (Q = 12), hybrid (Sb = 32, Qb = 24, Qe = 12), and efficient hybrid (Periodic GOB Replenishment, Block Error Thresholding, Conditional Macroblock Replenishment, Sb = 32, Qb = 24, Qe = 12, Tbe1 = 34 dB,

8.1 Effective bandwidth (bits/sec) vs SCR/PCR for the Claire sequence for CLR = 10^-9, 10^-3, with L = 3 and Q = 12.
8.2 Effective bandwidth (bits/sec) vs SCR/PCR for the Dave sequence for CLR = 10^-9, 10^-3, with L = 3 and Q = 12.
8.3 Effective bandwidth (bits/sec) vs SCR/PCR for the Salesman sequence for CLR = 10^-9, 10^-3, with L = 3 and Q = 12.
8.4 Effective bandwidth (bits/sec) vs SCR/PCR for the Tennis sequence for CLR = 10^-9, 10^-3, with L = 3 and Q = 12.
8.5 Two-layer VBR video UPC function.
8.6 Ri (bits/frame) for the Claire sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.7 PSNR (dB) for the Claire sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.8 Ri (bits/frame) for the Dave sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.9 PSNR (dB) for the Dave sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.10 Ri (bits/frame) for the Salesman sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.11 PSNR (dB) for the Salesman sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.12 Ri (bits/frame) for the Tennis sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.
8.13 PSNR (dB) for the Tennis sequence using a single-layer rate-controlled encoder, a two-layer encoder with base-layer rate control, and a two-layer encoder with base- and enhancement-layer rate control: Qbtar = 24, Sbtar = 32, Qetar = 12.


INTRODUCTION

Recent developments in semiconductor performance, storage capacity, signal processing technology, and network capacity have made it possible to code, store, and transmit visual information in a digital format. Digital video coding will be an important component of future multimedia, teleconferencing, and entertainment applications. As such, video transmission is emerging as an important application for data communications networks. As these networks increase in both capacity and ubiquity, they will become ideal vehicles for the delivery of entertainment programming, videoconference traffic, and multimedia audio-video data streams, because they allow these applications to be integrated within a common network/computing infrastructure.

Current standards for digital video coding, such as the International Telegraph and Telephone Consultative Committee (CCITT) H.261 videoconferencing standard [1] and the International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG) video storage and transmission standards [2, 3, 4], are designed for transport over constant bit rate (CBR) networks, such as the existing digital telephone network. Because of their recursive nature, these algorithms are not robust in the presence of data loss. In addition, because of the interactive nature of teleconferencing applications, the H.261 algorithm has stringent latency requirements (< 150 msec) [5, 6]. Well-engineered circuit-switched networks are capable of providing the quality-of-service (QoS) necessary for digital video transmission.

However, circuit-switched networks are not able to flexibly allocate network capacity to bursty data transmission sources. For this reason, the trend in networking design is towards packet-switched networks, which are better able to statistically multiplex bursty sources and hence achieve higher network utilization. Packet-switched networks, especially those based on the asynchronous transfer mode (ATM), appear to be a better choice for carrying integrated audio/video/data traffic such as is prevalent in multimedia applications.

Packet-switched networks function by segmenting source data into packets, which may have variable size. Packets are queued at a transmitter or intermediate network node until transmission capacity is available on an outgoing link (based on a medium access control protocol). Due to the asynchronous nature of packet-switched networks, video transport faces issues of delay variation, packet loss, and bandwidth allocation not present in circuit-switched networks. Retransmission of lost data is generally not feasible, due to the end-to-end latency constraints of video conferencing and distribution applications. Therefore, standard video coding algorithms need to be restructured to address these packet transmission issues.

Adding packet video support to existing video coding standards is desirable, since current investments in hardware and applications can be preserved. In addition, packet video algorithms should take advantage of the unique features of packet-switched networks. Specifically, packet-switched networks inherently support statistical multiplexing, meaning that applications need not request a fixed data rate. Since the complexity of scene contents and the amount of motion in video sequences vary randomly in time, efficient video coding algorithms generate variable bit rates. CBR encoders control the transmitted bit rate through the use of rate buffers and adaptive control of the compression ratio. Since the quality of a video sequence will be judged based on the minimum quality delivered over time, CBR codecs (coder-decoders) must never exceed the maximum compression ratio that corresponds to this minimum quality. Consequently, CBR codecs must request enough bandwidth from the network to handle peak rate scenes (i.e., with heavy motion content) at the maximum compression ratio. This rate will be equivalent to the peak rate generated by a buffered variable bit rate (VBR) encoder operating at the same compression ratio. This means that VBR video transport is generally more bandwidth efficient than CBR transport at a given minimum quality level, since a VBR encoder will not transmit at its peak rate continuously [7].

The utilization of a packet-switched network is a function of the burstiness of its data sources and the desired packet loss ratio. Bursty sources with stringent loss rate requirements must be transmitted at a lower utilization than less bursty sources with lower loss rate requirements. The potential statistical multiplexing gain (SMG) of VBR packet video is difficult to quantify due to the bursty nature of VBR video encoders. Use of decoder delay and encoder/decoder buffers can smooth the video bit rate for short-term bursts, but cannot smooth long-term rate variations. Service-specific traffic constraints will be required to allow the network to provide a high QoS to each connection while achieving satisfactory utilization. Some form of encoder rate control is required to ensure that the video data rate meets the negotiated traffic constraints.

The MPEG algorithm was designed specifically for entertainment and multimedia video applications requiring video cassette recorder (VCR) quality resolution and detail. These applications are CBR and typically operate at 1.5 Mbps. Higher-resolution video can be supported by the MPEG-2 algorithm, which typically operates at rates of 4-16 Mbps [8]. Entertainment video typically has more motion and scene complexity than videoconferencing applications, which generally display one or more participants on a constant background with minimal panning and zooming. CBR videoconferencing applications generally require from 128 to 768 kbps of bandwidth. Lower bandwidth is required for low-resolution videophone applications, while higher bandwidth is required for higher-resolution videoconferencing applications that display multiple participants. In each case the video applications require significantly more bandwidth than digital telephony (64 kbps).

H.261 and MPEG are both Discrete Cosine Transform (DCT) based interframe coding algorithms, i.e., they use the DCT to code the difference between a current frame and a motion-compensated prediction of that frame based on previous (and, for MPEG, future) frames [1, 3]. Headers, addressing fields, and motion vectors must be delivered with high reliability; otherwise lost data will cause errors to propagate over several frames. The network must be able to provide video sources a guaranteed QoS (i.e., bandwidth, packet loss rate, delay). Because video sources are bursty, a network multiplexing video sources cannot operate at high utilization levels while maintaining low loss rates and low delay [7]. Therefore, high-QoS bandwidth is an expensive network resource.

It is possible to segment video data into two or more priority layers, and various schemes to accomplish this have been developed [9, 10, 11, 12, 13, 14, 15]. The segmentation is determined based on perceptual criteria, and can be performed either in the sequency domain (the DCT analog of the discrete frequency domain), or by separate quantization of the video data in separate layers. Essential video data is transmitted in a high-priority base layer. The encoder negotiates a high QoS for this layer. Less essential video data is transported in one or more enhancement layers. To reduce the sensitivity of the encoder to loss in the enhancement layer it is necessary that the enhancement-layer video signal be coded in intraframe mode. In this way loss of some of the enhancement-layer data will not significantly affect the overall video quality, and therefore the enhancement layer can be transmitted with lower QoS than the base layer. Layered video coding will be of benefit only if 1) the base-layer rate is lower than the corresponding single-layer rate; 2) the total rate of the two layers is not significantly greater than the single-layer rate; and 3) the low-QoS bandwidth used to transport the enhancement layer is a "cheaper" resource than the equivalent high-QoS bandwidth.

The objective of this research is to determine whether a VBR video codec can achieve a significant statistical multiplexing gain over a CBR codec delivering a video signal with equivalent quality. An additional objective is to determine whether a two-layer VBR codec can deliver additional statistical multiplexing gains over an equivalent single-layer codec. An emphasis is placed on videoconferencing-type service. It is believed that, by applying peak-rate control to the base layer and by allocating to the enhancement layer the excess traffic generated during intraframe bursts and sustained image activity, the two-layer encoder can achieve superior statistical multiplexing gains.

The solutions pursued to the problem under consideration must adhere to the following constraints:

1. The video source traffic is policed so as to comply with pre-negotiated deterministic traffic descriptors.

2. The transmission rate of the video codec is controlled so as to comply with the source's policing function.

3. Sources are allocated to the network using a conservative connection admission control algorithm which determines allocations based only on the source's deterministic traffic descriptors.

4. Video delay variation constraints limit the size of buffers internal to network nodes.

The video sources examined are not analyzed to determine a statistical traffic model. Instead, the parameters of a leaky bucket policing function which will admit a video source without loss are computed. The leaky bucket parameters map to a unique set of deterministic traffic descriptors (peak rate, sustained rate, and maximum burst size) which can describe the source to the network. The network is assumed to determine source allocations based only on the traffic descriptors negotiated at the beginning of each video connection setup. The network is assumed to have no special knowledge of video traffic statistics and hence must assume that each video source behaves as a worst-case source. The network admits a source only when the worst-case behavior would not violate the specified packet loss requirements.
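As a rough illustration of how leaky bucket parameters can be fitted to a recorded source, the sketch below computes the smallest bucket depth that admits a frame-size trace without loss for a chosen drain rate. It is a fluid simplification and not the procedure used in later chapters: the function name, the per-frame time step, the arrive-then-drain ordering, and the sample trace are all assumptions made for the example.

```python
def min_bucket_depth(frame_bits, drain_bits_per_frame):
    """Smallest bucket depth (bits) that admits the whole trace without loss.

    Assumption: each frame arrives as a single burst at the start of a frame
    interval, and the bucket drains at a constant rate during the interval.
    """
    backlog = 0.0
    depth = 0.0
    for bits in frame_bits:
        backlog += bits                      # the frame arrives
        depth = max(depth, backlog)          # worst-case occupancy so far
        backlog = max(0.0, backlog - drain_bits_per_frame)  # drain one interval
    return depth


# Hypothetical trace (bits/frame): two large frames surrounded by small ones.
trace = [10_000, 30_000, 30_000, 5_000]
depth_at_half_peak = min_bucket_depth(trace, 15_000)
depth_at_peak = min_bucket_depth(trace, 30_000)
```

In this toy model, lowering the drain rate from the peak frame size toward the mean trades a smaller sustained-rate descriptor for a larger burst-size descriptor.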

This report is organized as follows. Chapter 1 is an introduction to the packet-video problem being examined. Chapter 2 is a basic review of classic intraframe and interframe video coding algorithms, with a detailed examination of the H.261 coding standard. Chapter 3 is a review of the QoS issues faced in packet-switched networks. Basic ATM concepts are described here. Chapter 4 examines the service requirements of packet-video applications, and presents the rate and error statistics of five video sequences encoded by an H.261-based single-layer VBR encoder. Chapter 5 examines two conservative connection admission control algorithms, and presents simulation results using the video sequences to determine the statistical multiplexing gains achieved using standardized traffic descriptors. Chapter 6 presents a derivation of the instantaneous bounds on the encoding and transmission rates which are imposed by 1) the fixed encoder/decoder buffer sizes; 2) the fixed decoder playout latency; and 3) the leaky bucket traffic constraint. A novel VBR rate control algorithm which decouples quantizer stepsize selection from the encoder buffer occupancy is presented. The performance of this rate control algorithm is examined under both CBR and VBR traffic constraints. Chapter 7 describes a proposed two-layer video codec architecture which is a hybrid of the requantization and spectral separation layering techniques. Modifications made to improve the two-layer coding efficiency are described and their impact on performance is measured. Chapter 8 presents a two-layer connection admission control technique and a two-layer VBR rate control algorithm. The additional statistical multiplexing gains achieved via two-layer transmission are calculated. Chapter 9 examines related research efforts and compares their approaches and results to those presented here. Chapter 10 concludes the report with a discussion of the results and with suggestions for future research.


REVIEW OF IMAGE CODING ALGORITHMS

Substantial image data rate compression is necessary to support the economic transmission of full-motion color video. This chapter reviews the classic algorithms used to code digital video for storage and transmission. Issues in digital image representation and error measures are introduced. An overview of differential, transform, subband, and vector quantization coding of images is presented, including comparisons and performance figures. In addition, techniques for removing temporal redundancy, including conditional replenishment and motion-compensated predictive coding, are discussed. A detailed analysis of DCT-based motion-compensated interframe prediction algorithms is included, and references to the CCITT H.261 videoconferencing standard and the ISO MPEG video coding standard are made.

2.1 Digital Image Representation

Transmission of video signals over digital telecommunications networks requires the transformation of a continuous image field into the discrete domain. An analysis of image coding techniques should be preceded by a discussion of the issues involved in the representation of discrete images.


As a consequence of Shannon's sampling theorem, it is known that a continuous image can be preserved if it is sampled at its Nyquist rate. Since continuous images are essentially not band-limited, the chosen image sampling rate will define the resolution and hence the detail of the reproduced image. Typical television images have a resolution of approximately 500 × 500 pels (although the reproduced resolution is often less). Typical consumer VCRs deliver an image resolution of 200 × 300 pels. Band-limiting must be performed by the image-capture device (video camera, scanner) prior to sampling to prevent aliasing.

The sample values obtained from the image-capture device will lie in the continuous domain and must be quantized for digital storage or transmission. The continuous image samples are usually mapped into a finite set of discrete amplitudes which span the intensity range of the image. This quantization process, where each quantized amplitude is represented by a unique digital code word, is known as pulse code modulation (PCM). Monochrome images are usually uniformly quantized at 8 bits/pel (256 levels). If fewer than 6 bits/pel (64 levels) are used for uniform quantization, then contouring effects become visible in the reproduced image [16]. Non-uniform image quantization characteristics, such as those that attempt to match the contrast sensitivity of the human visual system (HVS), are also possible [16, 17].
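The uniform PCM mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: samples are taken to lie in the range 0 to `max_value`, reconstruction uses the midpoint of each quantization interval, and the function names are hypothetical.

```python
def pcm_quantize(sample, bits, max_value=1.0):
    """Map a continuous sample in [0, max_value] to one of 2**bits code words."""
    levels = 1 << bits
    return min(int(sample / max_value * levels), levels - 1)


def pcm_reconstruct(index, bits, max_value=1.0):
    """Return the midpoint amplitude of the interval named by the code word."""
    levels = 1 << bits
    return (index + 0.5) * max_value / levels


def max_quantization_error(bits, max_value=1.0):
    """Half of one quantizer step: the largest error for in-range samples."""
    return max_value / (2 * (1 << bits))
```

Each bit removed doubles the step size, so a 6-bit quantizer has steps four times coarser than an 8-bit quantizer; this coarsening is what appears as contouring in smooth image regions.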

Color images inherently require greater data capacity than their monochromatic equivalents. Due to the trichromatic response of the HVS, (most of) the gamut of visible colors can be reproduced by a linear combination of three orthogonal primary colors. This is the basis of operation for color cathode ray tube (CRT) displays, where each color pel is represented by a red, green, and blue phosphor dot. A common color space for image processing is the National Television System Committee (NTSC) $R_N G_N B_N$ space. Each pel is represented by a 3-vector representing the


close match). Typical "full-color" images are represented with 24 bits/pel (8 bits/pel for each red, green, and blue component). Although use of this color space is intuitive, image processing need not be confined to a three-primary space, but can also be performed in a luminosity/chromaticity space, such as the NTSC YIQ space or the YUV space. The color coordinates for these spaces can be found by a linear transformation of the $R_N G_N B_N$ coordinates [18]. The advantage of processing in a luminosity/chromaticity space is that the chrominance frequency response of the HVS is shifted towards the lower spatial frequencies as compared to the luminosity response [18, 19]. Compression gains can be achieved by subsampling the chrominance components of an image while still maintaining good subjective image quality.
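A minimal sketch of this linear transformation and of chrominance subsampling follows. The matrix entries are the commonly quoted rounded NTSC YIQ coefficients and are an assumption here, since the text only cites [18] for them; the 2:1 subsampling in each direction by averaging, and all function names, are likewise illustrative choices.

```python
def rgb_to_yiq(r, g, b):
    """Linear transform from RGB to luminance (Y) and chrominance (I, Q)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q


def subsample_2x2(plane):
    """Average each 2 x 2 neighborhood (plane dimensions assumed even)."""
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]


def samples_per_pel(subsample_chroma):
    """One luminance sample per pel plus two chrominance planes."""
    chroma = 2.0 / 4.0 if subsample_chroma else 2.0
    return 1.0 + chroma
```

With both chrominance planes subsampled 2:1 in each direction, the sample count drops from 3 to 1.5 per pel before any further compression is applied.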

Image coding is the application of image capture, pre-processing, compression, and possibly post-processing techniques such that a continuous image field can be accurately and efficiently represented in the digital domain. Compression algorithms are generally characterized as either lossless or lossy. In lossless algorithms, the original quantized sample values can be exactly recovered, assuming no bit errors in storage or transmission. The lossless algorithms generally are based on entropy coding; more probable sample values (or blocks of sample values) are assigned shorter code words so that the overall bit rate is reduced. Examples of lossless coding algorithms are the Huffman algorithm, arithmetic coding, and run-length coding [19]. Usually the lossless algorithms achieve a compression ratio of only 2:1. To achieve higher compression ratios, lossy algorithms are used. In the lossy algorithms distortion is introduced such that the original sample values can no longer be exactly recovered (note that the quantization process also introduces distortion which is inherent in any conversion from the continuous to the digital domain). The common lossy image compression algorithms are predictive coding, transform coding, subband coding, and vector quantization.


When comparing the performance of various lossy compression algorithms, it is important to have image fidelity measures which are both mathematically tractable and easily computable. The most commonly used fidelity measure is the average least squares error (LSE $= \varepsilon_{lse}$), which is an approximation of the mean square error (MSE) and is used when the statistics of the image ensemble are not known [19]. The image signal-to-noise ratio (SNR $= -10 \log_{10}(\varepsilon_{lse}/\sigma^2)$) can be defined as the ratio of the image power $\sigma^2$ to the LSE power. An alternative measure, the peak signal-to-noise ratio (PSNR $= -10 \log_{10}(\varepsilon_{lse}/(255 \times 255))$ for 8-bit images), is defined as the ratio of the squared maximum peak-to-peak value of the image to the LSE power. The value of PSNR is generally 12-15 dB larger than SNR [19]. The LSE measure is mathematically attractive since it can be easily applied when optimizing compression algorithms; however, its performance does not correlate well with subjective evaluations of image degradation. This is because the LSE averages impairments over the entire image; large local distortions which are most visually objectionable do not significantly affect the LSE. Other fidelity measures which try to incorporate HVS properties are also possible, but they are generally harder to compute [16]. Generally PSNR values below 30 dB indicate noticeable image degradation.
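The fidelity measures above can be computed directly from an original image and its coded reproduction. The sketch below assumes images stored as lists of rows of pel values and takes the image power $\sigma^2$ to be the variance of the original pels; both choices, and the function names, are assumptions made for illustration.

```python
import math


def lse(original, coded):
    """Average least squares error between two equally sized images."""
    total = 0.0
    count = 0
    for row_o, row_c in zip(original, coded):
        for o, c in zip(row_o, row_c):
            total += (o - c) ** 2
            count += 1
    return total / count


def image_power(image):
    """Variance of the pel values (assumed definition of sigma^2)."""
    pels = [p for row in image for p in row]
    mean = sum(pels) / len(pels)
    return sum((p - mean) ** 2 for p in pels) / len(pels)


def snr_db(original, coded):
    return -10.0 * math.log10(lse(original, coded) / image_power(original))


def psnr_db(original, coded, peak=255):
    return -10.0 * math.log10(lse(original, coded) / (peak * peak))


# Hypothetical 2 x 2 image with an error of one gray level in every pel.
original = [[0, 255], [255, 0]]
coded = [[1, 254], [254, 1]]
```

Because both measures share the same LSE term, PSNR exceeds SNR by $10 \log_{10}(255^2/\sigma^2)$ dB, so the gap depends only on the variance of the original image.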

2.2 Classical Intraframe Coding Techniques

2.2.1 Predictive Coding

The number of bits/pel required to accurately represent an image (with low distortion) is a function of the intensity range of the image and the variance of the pel values. An image with low pel variance can be represented with few quantization levels while still maintaining low quantization distortion; conversely, an image with high pel variance will require more quantization levels to maintain low quantization distortion and consequently will require more bits/pel. The design of quantizers that minimize mean squared distortion for a given sample probability density function (pdf) is discussed in [16, 19].

For lossy compression algorithms, the achievable compression ratio is a function of the tolerated LSE. For an image which is an uncorrelated random field, the achievable compression ratio is solely a function of the pel variance. In practice, images exhibit statistical pel correlation over various regions. This correlation can be used to reduce the number of quantization levels needed to represent a pel, increasing the compression ratio for a given distortion level.

In differential pulse code modulation (DPCM), a prediction of a pel is computed from a function of previous pel values, and the difference between the prediction and the actual pel value is quantized and coded. Normally the predictor is a causal finite impulse response (FIR) filter and is designed based on an autoregressive (AR) model of the image pel sequence. If the AR model is accurate, then the predictor error sequence will have reduced variance as compared to the pel sequence, and fewer quantization levels will be required for the same distortion level. The pel predictor is usually preceded by the quantizer in a feedback loop; the pel prediction is based on the quantized values of the previous pels [19]. In this case the chosen AR model may not be optimum, but quantization errors cannot accumulate. The coded pel values are reconstructed using a replica of the predictor loop. Note that if the pel values fed to the DPCM encoder are already PCM quantized, then the error sequence can be represented as a sequence of integers and can be coded without distortion using an entropy coder.
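The feedback structure described above can be sketched with a first-order (previous-pel) predictor and a uniform quantizer. The predictor order, the quantizer, the initial prediction of 128, and the function names are assumptions chosen to keep the example short; they do not correspond to any specific design cited in the text.

```python
def quantize(error, step):
    """Uniform quantizer: integer level index for a prediction error."""
    return int(round(error / step))


def dpcm_encode(pels, step, coeff=1.0, initial=128):
    """Code one scan line; the prediction uses reconstructed, not original, pels."""
    levels = []
    recon_prev = initial
    for pel in pels:
        prediction = coeff * recon_prev
        level = quantize(pel - prediction, step)
        levels.append(level)
        # Track the value the decoder will compute, so errors cannot accumulate.
        recon_prev = prediction + level * step
    return levels


def dpcm_decode(levels, step, coeff=1.0, initial=128):
    """Replica of the encoder's predictor loop."""
    recon = []
    recon_prev = initial
    for level in levels:
        recon_prev = coeff * recon_prev + level * step
        recon.append(recon_prev)
    return recon


# Hypothetical scan line of PCM-quantized pels.
line = [130, 131, 135, 140, 139, 120, 118]
```

With `step=1` the integer error sequence is coded without distortion and the decoder recovers the line exactly; with a coarser step, each reconstructed pel differs from the original by at most half a step, because the quantizer sits inside the prediction loop.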

Linear DPCM predictors are often designed based on a stationary p-th order AR model of the image data. Experiments have shown that when the predictor coefficients match the picture statistics, then filter orders greater than 3 do not yield substantial gains in MSE performance; however, if the coefficients do not match, then MSE decreases are small for filter orders greater than 1 [17]. Two-dimensional (2D) causal predictors can also be used; these tend to improve the subjective rendition of vertical edges [17]. Typically, a 2D pel predictor has non-zero coefficients for the three neighbor pels to the upper left (assuming scanning progresses to the right and down) [19]. Because image statistics are generally nonstationary, it is advantageous to vary the predictor model based on the local image characteristics. This is often accomplished by measuring the directional correlation in a region and switching to an appropriate predictor [17].

The pdf of the DPCM prediction error is usually modeled as a Laplacian distribution. The optimal quantizer for such a pdf will be non-uniform. Generally the quantizer is either chosen to be a non-uniform Lloyd-Max quantizer, or a uniform quantizer followed by an entropy coder [17, 19]. An alternative means of designing the error quantizer is to minimize the mean square subjective error based on some predefined visual fidelity criterion; experiments reveal that this technique can yield gains of 1 bit/pel over a Lloyd-Max quantizer [16]. Varying the quantizer characteristic to account for the nonstationarity of the image statistics can lead to performance gains; this can be accomplished by adapting to the local quantization error variance or to some psychovisual criterion [17].

For most images, one-dimensional (1D) DPCM yields an 8-10 dB improvement in SNR over PCM at 1-3 bits/pel. For 2D DPCM, the theoretical SNR improvement over PCM is approximately 20 dB, or about 3.25 bits/pel. Typically, compression ratios of 3-3.5:1 can be achieved for 2D DPCM. If the quantizer is designed based on HVS properties, then compression ratios of 4-5:1 can be achieved at 30 dB PSNR [16]. Entropy coding of a Lloyd-Max quantizer output yields about 1 bit/pel or 6 dB SNR improvement [17].

The primary advantage of DPCM as an image compression algorithm is that its simplicity leads to an economical hardware implementation. The primary disadvantage of DPCM is that the maximum achievable compression ratio for low distortion reproduction is low. Also, DPCM decoders are sensitive to bit errors in the transmission channel, since the decoder forms an infinite impulse response (IIR) filter loop. Care must be taken to ensure that the prediction filter is stable, so that the artifacts caused by bit errors decay rapidly [17].

2.2.2 Transform Coding

The goal of DPCM image coding is to map the image pels into a set of values that are uncorrelated and that have reduced energy, so that coding gains can be achieved via reduced quantization resolution. Because DPCM bases its predictions on causal one- or two-dimensional filters, maximal pel decorrelation cannot be achieved since neighboring pels lying in the "future" can contribute to prediction accuracy. One technique that can improve compression performance is transform coding. Here, an image is segmented into multiple M × N blocks, and each block is transformed into a new domain using a unitary (energy-preserving) transform. The resulting M × N transform coefficients should be uncorrelated and should exhibit considerable energy compaction into only a few coefficients. The transform coefficients correspond to the weights of the transform basis functions needed to reproduce the original block. For correlated images, most energy is compacted into the coefficients of the low-frequency basis functions [19]. Compression is achieved by observing the transform coefficients over an ensemble of image blocks, determining the variance and pdf of each coefficient, and designing quantizers for each coefficient that yield acceptable image reproduction while reducing the number of bits needed to code the block. It has been observed that for the various proposed image transforms, energy compaction improves for larger block size; however, the gains are usually small beyond block sizes of 16 × 16, and hardware implementation is simplified for smaller blocks [17].

For an ensemble of image blocks with a known covariance matrix, the Karhunen-Loeve (KL) transform exhibits optimal decorrelation and energy compaction performance over the ensemble [16, 17]. The basis functions for the KL transform are the eigenvectors of the covariance matrix; the variances of the transform coefficients are the corresponding eigenvalues. The minimum MSE representation that can be achieved using only K basis functions is the set of basis functions corresponding to the K largest eigenvalues. The KL transform is not very useful for coding purposes since the basis functions vary with the image statistics, and no general fast KL algorithm exists [16].

Alternative transforms that have deterministic basis functions are the Hadamard, Haar, Slant, Discrete Fourier (DFT), Discrete Sine (DST), and Discrete Cosine (DCT) transforms. All exhibit good energy compaction and have fast algorithms. The DCT, which belongs to a family of sinusoidal transforms, is particularly suitable for image coding, with compaction performance nearly identical to the KL transform for highly correlated first-order Markov sequences ($\rho > 0.5$) [16]. The DCT requires only real arithmetic and can be computed using an algorithm similar to the fast Fourier transform (FFT) ($O(N^2 \log_2 N)$ for $N \times N$ blocks). The DCT has been chosen as the coding transform for the American National Standards Institute (ANSI) Joint Photographic Experts Group (JPEG) image coding standard [20], the MPEG video coding standard [2], and the H.261 video teleconferencing standard [21]. Although other transforms, such as the Hadamard, have much simpler computational requirements, their reduced energy compaction performance as compared to the DCT prevents their use in high-compression applications.
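The energy-preserving and energy-compacting behavior of the DCT can be checked numerically on a small block. The sketch below builds the orthonormal DCT-II basis directly from its definition and applies it separably by matrix multiplication; it is written for clarity rather than speed (it is not a fast algorithm), and the smooth test block and helper names are assumptions made for the example.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """Separable 2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse transform: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def energy(m):
    return sum(v * v for row in m for v in row)


# Hypothetical smooth (highly correlated) 8 x 8 block.
block = [[100 + 5 * i + 3 * j for j in range(8)] for i in range(8)]
coeffs = dct2(block)
low_freq = sum(coeffs[u][v] ** 2 for u in range(8) for v in range(8) if u + v <= 1)
compaction = low_freq / energy(coeffs)
```

For this block, nearly all of the energy falls in the DC coefficient and its two nearest neighbors, while the total energy is unchanged by the transform.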

Compression is attained when using the DCT by reducing the precision used to represent the transform coefficients. The best visibly acceptable compression is achieved by maintaining high precision (many quantization levels) for low-frequency components, while reducing the number of levels allocated to the higher-frequency components to which the HVS is less sensitive. By observing the transforms of ensembles of image blocks it is possible to determine the variance and pdfs of the various transform coefficients. Usually the lowest-frequency (DC) component is modeled with a Rayleigh density, while the other coefficients are modeled as zero-mean Gaussian or Laplacian densities [17, 19]. The minimum MSE bit allocation within a block will allocate more bits to the lower-frequency coefficients. Compression can be achieved by applying zonal filtering, where only a subset of coefficients with the highest ensemble variance are quantized and the rest are thrown away [19]. The zonal filter mask can be static or adaptive; image blocks can be classified as belonging to different activity classes, each with its own zonal mask and quantization rule. Examples of possible activity classes are those which exhibit predominantly horizontal, vertical, diagonal, or no structure [22].

An alternative to zonal filtering is threshold coding, where the coefficients of a block with energy exceeding a given threshold are quantized and the others are thrown away [19]. The decision threshold can vary with the compression ratio, and should be set based on HVS visibility properties. Threshold quantization offers improved performance over the use of a fixed zonal mask, since the quantization rule can adapt to the varying block statistics. A disadvantage is that addressing information of the quantized coefficients must also be coded; this can be accomplished by run-length coding of the transition boundaries of the quantized coefficients [19]. Usually the coefficients are scanned in a zig-zag pattern from the lowest frequency coefficient up. In addition to adaptively selecting the coefficients to be coded, the quantization levels themselves can be varied according to changes in the coefficients' variances [16]. In either case, each block may be coded with a varying number of bits; image quality at a fixed compression ratio is achieved by allocating more bits to blocks of higher energy.

In general, the DCT yields higher compression ratios than DPCM for a given subjective image quality. PSNR values greater than 30 dB have been achieved when adaptively DCT coding a monochrome image at 0.5 bit/pel, yielding a compression ratio of 8-16:1 [19]. At high compression ratios, block boundaries can become visible. This effect can be reduced by low-pass filtering the image [22], or by recursive block coding, where adjacent blocks overlap [16]. In either case, for a fixed bit rate, image resolution is reduced. The DCT algorithm is much more difficult than the DPCM algorithm to implement in hardware for real-time performance; however, recent advancements in very-large-scale integration (VLSI) processor performance allow the DCT to be utilized in real-time image coders [23, 24, 25].

2.2.3 Subband/Wavelet Coding

Transform compression algorithms take advantage of the non-uniform spatial frequency sensitivity of the HVS by allocating more quantization levels to low-frequency transform coefficients of the image where quantization distortion is most perceptible. However, in the case of the DCT, the spectra of the basis functions contain substantial energy over the normalized frequency range $(0, \pi)$ (one-dimensional case) [26]. Better performance can be achieved if narrow baseband and passband filters are used to decompose the image, since greater control of the quantization noise spectrum can be realized [27]. This is the approach taken by subband image decomposition techniques.

In subband image coding, the image being coded is fed into a filter bank consisting of two or more two-dimensional filters. In a common example, the image is decomposed using four filters, one of which is lowpass in both the horizontal and vertical directions, two of which are lowpass in one direction and highpass in the other, and one of which is highpass in both directions [28]. These filters are designed to have narrow transition bands. Because the output of each filter has one-half the bandwidth of the image in each direction, each filter output can be decimated (subsampled) by a factor of 2:1 in each direction (a reduction in sample points of 4:1 per filter output). The decimated signals form four subimages; the total number of samples from the four filter outputs equals the number of pels in the original image. If the filters are ideal, then the subimages can be combined to reconstruct the original image after upsampling each subimage and interpolating between null sample points using a reconstruction (synthesis) filter.
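The four-band decomposition can be illustrated with the two-tap Haar filter pair. This choice is an assumption made to keep the sketch short: Haar filters have wide transition bands, unlike the narrow filters described above, but they do permit exact reconstruction. Filtering and 2:1 decimation are folded into one step per 2 x 2 pel group, and the synthesis stage is written in closed form rather than as explicit upsampling and interpolation. The band labels and function names are illustrative.

```python
def haar_analysis(image):
    """Split an image with even dimensions into four decimated subimages."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(image), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]          # top-left, top-right
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom-left, bottom-right
            rows["LL"].append((a + b + d + e) / 2)  # lowpass in both directions
            rows["LH"].append((a - b + d - e) / 2)  # highpass horizontally only
            rows["HL"].append((a + b - d - e) / 2)  # highpass vertically only
            rows["HH"].append((a - b - d + e) / 2)  # highpass in both directions
        for name in bands:
            bands[name].append(rows[name])
    return bands

def haar_synthesis(bands):
    """Recombine the four subimages into a full-size image."""
    image = []
    for r in range(len(bands["LL"])):
        top, bottom = [], []
        for c in range(len(bands["LL"][0])):
            ll, lh = bands["LL"][r][c], bands["LH"][r][c]
            hl, hh = bands["HL"][r][c], bands["HH"][r][c]
            top += [(ll + lh + hl + hh) / 2, (ll - lh + hl - hh) / 2]
            bottom += [(ll + lh - hl - hh) / 2, (ll - lh - hl + hh) / 2]
        image += [top, bottom]
    return image

image = [[52, 55, 61, 66],
         [70, 61, 64, 73],
         [63, 59, 55, 90],
         [67, 61, 68, 104]]
bands = haar_analysis(image)
total_samples = sum(len(row) for band in bands.values() for row in band)
reconstructed = haar_synthesis(bands)
```

The four subimages together hold 16 samples, the same as the 4 x 4 input, and the synthesis stage returns the input exactly when no quantization is applied to the subbands.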

Non-idealities in subband filter implementation can lead to aliasing errors since filters with non-zero transition bands are subsampled at their cutoff bandwidth frequencies. A realizable filter function set which yields zero aliasing error is based on the quadrature mirror filter (QMF) [28]. Near-exact image reconstruction can be achieved using a QMF decomposition if no quantization distortion is introduced in the subbands. 2-D QMFs are usually implemented as separable FIR filter structures and have the property that the transition bands of two filters adjacent in frequency response have mirror symmetry. It has been shown that filter design and aliasing errors are visually insignificant in comparison to quantization error when 12 or more taps are used in the QMF implementation [29]. Increasing the number of taps results in a narrower transition band for each filter, but also results in greater computational requirements. Subband image decomposition using QMFs is alternatively referred to as wavelet decomposition [30, 31]. Alternatives to QMFs utilizing fewer filter taps include symmetric short kernel filters [32] and IIR filters [33].
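The mirror symmetry property can be checked numerically in one dimension. In the sketch below the highpass filter is obtained from the lowpass filter by alternating the signs of the taps, so that its magnitude response is the lowpass response mirrored about one quarter of the sampling frequency. The four-tap lowpass prototype is a placeholder chosen for readability and is not an optimized QMF design; the function names are likewise assumptions.

```python
import cmath
import math

def mirror_filter(h0):
    """Highpass taps h1[n] = (-1)^n h0[n], the mirror image of h0."""
    return [((-1) ** n) * tap for n, tap in enumerate(h0)]

def magnitude_response(h, omega):
    """|H(e^{j omega})| of an FIR filter with real taps h."""
    return abs(sum(tap * cmath.exp(-1j * omega * n) for n, tap in enumerate(h)))

h0 = [0.125, 0.375, 0.375, 0.125]  # placeholder lowpass prototype
h1 = mirror_filter(h0)

# The two magnitude responses mirror each other about omega = pi / 2.
test_frequencies = [k * math.pi / 8 for k in range(9)]
max_mismatch = max(
    abs(magnitude_response(h1, w) - magnitude_response(h0, math.pi - w))
    for w in test_frequencies
)
```

A longer prototype narrows the transition band of both filters at once, at the cost of more multiplications per output sample.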

Various decomposition filter bank architectures have been proposed, incorporating 8, 11, and 16 subbands. Coding gains can be realized by applying different quantizer characteristics to each subband. Fewer quantization levels are needed for satisfactory coding of edges, which are found in the high-frequency subbands [27]. The pdf of subband sample values is usually well modeled by a Laplacian distribution [28, 27]. Substantial pel-to-pel correlation exists in the lowest-frequency subband, and therefore DPCM quantization is often applied here. Less correlation is observed in the higher-frequency subbands; PCM quantization is usually applied in these bands. The Lloyd-Max quantizer for each subband often does not produce the best subjective results since quantization levels are clustered in the region of low sample amplitude, where distortion is least perceptible. The quantizer characteristics are often modified to include a large dead zone [27, 32]. Entropy coding of the coded sample values and run-length coding of the addresses of non-zero samples in the high-frequency subbands can further reduce the required bit rate. Various approaches have been investigated to determine the optimal assignment of quantization levels to the subbands [34]. One technique is to use spatially varying quantizers that increase the quantizer resolution in areas of a subband with high activity [28].
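A quantizer with a dead zone can be sketched as below. The uniform step outside the dead zone, the mid-interval reconstruction points, and the parameter names are assumptions; the cited work may shape the characteristic differently. Here `dead_zone` is the half-width of the interval around zero that maps to level 0.

```python
import math

def dead_zone_quantize(x, step, dead_zone):
    """Map a subband sample to an integer level; small samples map to 0."""
    magnitude = abs(x)
    if magnitude < dead_zone:
        return 0
    level = math.floor((magnitude - dead_zone) / step) + 1
    return level if x > 0 else -level

def dead_zone_reconstruct(level, step, dead_zone):
    """Reconstruct at the midpoint of the interval that produced the level."""
    if level == 0:
        return 0.0
    magnitude = dead_zone + (abs(level) - 0.5) * step
    return magnitude if level > 0 else -magnitude

samples = [3.0, -5.5, 7.0, -11.0, 0.5]
levels = [dead_zone_quantize(x, step=4.0, dead_zone=6.0) for x in samples]
decoded = [dead_zone_reconstruct(q, step=4.0, dead_zone=6.0) for q in levels]
```

Widening the dead zone increases the number of zero-valued samples in the high-frequency subbands, which lengthens the runs available to the run-length coder.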

Subband image coding generally exhibits an increase in SNR of 0.6-1.4 dB vs $8 \times 8$ block DCT coding at the same compression ratio [35]. At low bit/pel levels, subband coding produces a more subjectively pleasing result than DCT coding due to the elimination of block boundary effects. Because the subband filters are shift invariant, hardware implementation may be easier than DCT coding. Since linear convolution of an image with a filter function produces a larger resulting image, image extension methods such as circular extension and symmetric extension are usually implemented so that image truncation can be applied without introducing distortion in the reconstruction phase [33].
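The two extension methods can be illustrated on a single image row. The function names and the choice of mirroring that repeats the boundary sample are assumptions; other symmetric conventions exist. Both functions expect 0 < pad <= len(row).

```python
def circular_extend(row, pad):
    """Extend a row periodically by pad samples on each side."""
    return row[-pad:] + row + row[:pad]

def symmetric_extend(row, pad):
    """Extend a row by mirroring pad samples about each boundary."""
    return row[:pad][::-1] + row + row[-pad:][::-1]

row = [1, 2, 3, 4]
circular = circular_extend(row, 2)
symmetric = symmetric_extend(row, 2)
```

Circular extension can place unrelated pels from opposite image edges next to each other, whereas symmetric extension keeps the extended signal continuous at the boundary.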

2.2.4 Vector Quantization

DPCM, transform, and subband coding all achieve compression gains by taking advantage of the correlated structure of images to reduce the number of bits required to represent an image with sufficient fidelity. Each technique functions by transforming the pels of the original image into a new domain using a one-to-one mapping with memory; the new elements are quantized as scalars. One consequence of Shannon's rate-distortion theory is that vectors of elements can be coded more efficiently (with less distortion) than separately quantizing the scalar elements, even when the scalars are uncorrelated or independent [36]. Shannon's theory does not indicate how such an optimal vector quantizer would be designed; over the past decade various vector quantization techniques for image coding have been proposed in the literature [37].

A typical vector quantizer for image coding functions by breaking the image into $M \times N$ pel blocks, where each pel is PCM coded and can take on one of $K$ possible values. The set of all possible image blocks has $K^{MN}$ elements; each possible block can be thought of as a vector in a Euclidean space of dimension $MN$ [37]. Compression is achieved by determining a smaller set of reproduction vectors and mapping each possible image block to the nearest reproduction vector (in a distortion sense). A typical distortion measure is the LSE, which in a Euclidean space corresponds to vector distance. The address of the selected reproduction vector is transmitted (or stored), and the receiver reconstructs the compressed image by using the address to access a codebook of reconstruction vectors.
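The encoding and decoding steps can be sketched as follows, assuming 2 x 2 blocks flattened to 4-element vectors, 8-bit pels, and a hand-made four-entry codebook. The codebook entries, block values, and function names are illustrative; a practical codebook would be trained on image data.

```python
import math

def squared_error(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def vq_encode(blocks, codebook):
    """Return, for each block, the address of the nearest reproduction vector."""
    return [min(range(len(codebook)),
                key=lambda i: squared_error(block, codebook[i]))
            for block in blocks]

def vq_decode(addresses, codebook):
    """Look each address up in the receiver's copy of the codebook."""
    return [codebook[i] for i in addresses]

codebook = [[0, 0, 0, 0],          # dark block
            [255, 255, 255, 255],  # bright block
            [0, 255, 0, 255],      # vertical edge
            [0, 0, 255, 255]]      # horizontal edge
blocks = [[10, 5, 0, 12],
          [250, 240, 255, 251],
          [20, 230, 10, 240]]
addresses = vq_encode(blocks, codebook)
decoded = vq_decode(addresses, codebook)

# Bits per block: address length versus 4 pels at 8 bits each.
bits_per_address = math.ceil(math.log2(len(codebook)))
ratio = (4 * 8) / bits_per_address
```

In this toy case each block is represented by a 2-bit address instead of 32 bits, a ratio of 16:1, at the cost of the distortion between each block and its reproduction vector.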

The most common technique for developing the reconstruction codebook is the