Video Processing & Communications - Wang


Errata for

VIDEO PROCESSING AND COMMUNICATIONS

Yao Wang, Joern Ostermann, and Ya-Qin Zhang

(©2002 by Prentice-Hall, ISBN 0-13-017547-1)

Updated 6/12/2002

Symbols Used

Ti = i-th line from top; Bi = i-th line from bottom; Fi = Figure i; TAi = Table i;
Pi = Problem i; E(i) = Equation (i); X -> Y = replace X with Y

Page | Line/Fig/Tab | Corrections
16 | F1.5 | Add an output from the demultiplexing box to a microphone at the bottom of the figure.
48 | B6, E(2.4.4)-E(2.4.6) | Replace “v_x”, “v_y” by “\tilde v_x”, “\tilde v_y”
119 | E(5.2.7) | C(X) -> C(X,t), r(X) -> r(X,t), E(N) -> E(N,t)
125 | F5.11 | Caption: “cameras” -> “a camera”, “diffuse” -> “ambient”
126 | T7 | “diffuse illumination” -> “ambient illumination”
133 | B10 | T_x,T_y,T_z -> T_x,T_y,T_z, and Z
133 | B4 | Delete “when there is no translational motion in the Z direction, or”
133 | B2 | “aX+bY+cZ=1” -> “Z=aX+bY+c”
133 | Before E(5.5.13) | Add “(see Problem 5.3)” after “before and after the motion”
138 | P5.3 | “a planar patch” -> “any 3-D object”, “projective mapping” -> “Equation (5.5.13)”
138 | P5.4 | “Equation 5.5.14” -> “Equation (5.5.14)”, “aX+bY+cZ=1” -> “Z=aX+bY+c”
143 | T4 | After “true 2-D motion.” add “Optical flow depends on not only 2-D motion, but also illumination and object surface texture.”
159 | T6 | After “block size is 16x16” add “, and the search range is 16x16”
189 | P6.1 | “global” -> “global-based”
190 | P6.12 | Add at the end “Choose two frames that have sufficient motion in between, so that it is easier to observe the effect of motion estimation inaccuracy. If necessary, choose frames that are not immediate neighbors.”
199 | T9 | “Equation (7.1.11) defines a linear dependency … straight line.” -> “Equation (7.1.11) says that the possible positions x’ of a point x after motion lie on a straight line. The actual position depends on the Z-coordinate of the original 3-D point.”
200 | B8 | “[A]” -> “[A]^T [A]”
214 | P7.5 | “Derive” -> “Equation (7.1.5) describes”; add at the end “(assuming F=1)”
214 | P7.6 | Replace “\delta” with “\bf \delta”
218 | F8.1 | “Parameter statistics” -> “Model parameter statistics”
247 | F8.9 | Add a box with words “Update previous distortion \\ D_0=D_1” in the line with the word “No”.
255 | F8.14 | Same as for F8.9
261 | P8.13(a) | “B_l={f_k, k=1,2,… ,K_l}” -> “B_l, which consists of K_l vectors in {\cal F}”
416 | TA13.2 | Item “4CIF/H.263” should be “Opt.”
421 | TA13.3 | Item “Video/Non-QoS LAN” should be “H.261/3”
436 | T13 | “MPEG-2, defined” -> “MPEG-2 defined”
443 | T10 | “I-VOP” -> “I-VOPs”, “B-VOP” -> “B-VOPs”
575 | P1.3 | “red+green=blue” -> “red+green=black”
575 | P1.4 | “(1.4.4)” -> “(1.4.3)”, “(1.4.2)” -> “(1.4.1)”


wang-50214

wang˙fm

August 23, 2001

14:22

Contents

PREFACE, xxi
GLOSSARY OF NOTATIONS, xxv

1 VIDEO FORMATION, PERCEPTION, AND REPRESENTATION, 1
  1.1 Color Perception and Specification, 2
    1.1.1 Light and Color, 2
    1.1.2 Human Perception of Color, 3
    1.1.3 The Trichromatic Theory of Color Mixture, 4
    1.1.4 Color Specification by Tristimulus Values, 5
    1.1.5 Color Specification by Luminance and Chrominance Attributes, 6
  1.2 Video Capture and Display, 7
    1.2.1 Principles of Color Video Imaging, 7
    1.2.2 Video Cameras, 8
    1.2.3 Video Display, 10
    1.2.4 Composite versus Component Video, 11
    1.2.5 Gamma Correction, 11
  1.3 Analog Video Raster, 12
    1.3.1 Progressive and Interlaced Scan, 12
    1.3.2 Characterization of a Video Raster, 14

  1.4 Analog Color Television Systems, 16
    1.4.1 Spatial and Temporal Resolution, 16
    1.4.2 Color Coordinate, 17
    1.4.3 Signal Bandwidth, 19
    1.4.4 Multiplexing of Luminance, Chrominance, and Audio, 19
    1.4.5 Analog Video Recording, 21
  1.5 Digital Video, 22
    1.5.1 Notation, 22
    1.5.2 ITU-R BT.601 Digital Video, 23
    1.5.3 Other Digital Video Formats and Applications, 26
    1.5.4 Digital Video Recording, 28
    1.5.5 Video Quality Measure, 28
  1.6 Summary, 30
  1.7 Problems, 31
  1.8 Bibliography, 32

2 FOURIER ANALYSIS OF VIDEO SIGNALS AND FREQUENCY RESPONSE OF THE HUMAN VISUAL SYSTEM, 33
  2.1 Multidimensional Continuous-Space Signals and Systems, 33
  2.2 Multidimensional Discrete-Space Signals and Systems, 36
  2.3 Frequency Domain Characterization of Video Signals, 38
    2.3.1 Spatial and Temporal Frequencies, 38
    2.3.2 Temporal Frequencies Caused by Linear Motion, 40
  2.4 Frequency Response of the Human Visual System, 42
    2.4.1 Temporal Frequency Response and Flicker Perception, 43
    2.4.2 Spatial Frequency Response, 45
    2.4.3 Spatiotemporal Frequency Response, 46
    2.4.4 Smooth Pursuit Eye Movement, 48
  2.5 Summary, 50
  2.6 Problems, 51
  2.7 Bibliography, 52

3 VIDEO SAMPLING, 53
  3.1 Basics of the Lattice Theory, 54
  3.2 Sampling over Lattices, 59
    3.2.1 Sampling Process and Sampled-Space Fourier Transform, 60
    3.2.2 The Generalized Nyquist Sampling Theorem, 61

    3.2.4 Implementation of the Prefilter and Reconstruction Filter, 65
    3.2.5 Relation between Fourier Transforms over Continuous, Discrete, and Sampled Spaces, 66
  3.3 Sampling of Video Signals, 67
    3.3.1 Required Sampling Rates, 67
    3.3.2 Sampling Video in Two Dimensions: Progressive versus Interlaced Scans, 69
    3.3.3 Sampling a Raster Scan: BT.601 Format Revisited, 71
    3.3.4 Sampling Video in Three Dimensions, 72
    3.3.5 Spatial and Temporal Aliasing, 73
  3.4 Filtering Operations in Cameras and Display Devices, 76
    3.4.1 Camera Apertures, 76
    3.4.2 Display Apertures, 79
  3.5 Summary, 80
  3.6 Problems, 80
  3.7 Bibliography, 83

4 VIDEO SAMPLING RATE CONVERSION, 84
  4.1 Conversion of Signals Sampled on Different Lattices, 84
    4.1.1 Up-Conversion, 85
    4.1.2 Down-Conversion, 87
    4.1.3 Conversion between Arbitrary Lattices, 89
    4.1.4 Filter Implementation and Design, and Other Interpolation Approaches, 91
  4.2 Sampling Rate Conversion of Video Signals, 92
    4.2.1 Deinterlacing, 93
    4.2.2 Conversion between PAL and NTSC Signals, 98
    4.2.3 Motion-Adaptive Interpolation, 104
  4.3 Summary, 105
  4.4 Problems, 106
  4.5 Bibliography, 109

5 VIDEO MODELING, 111
  5.1 Camera Model, 112
    5.1.1 Pinhole Model, 112
    5.1.2 CAHV Model, 114
    5.1.3 Camera Motions, 116
  5.2 Illumination Model, 116

    5.2.2 Radiance Distribution under Differing Illumination and Reflection Conditions, 117
    5.2.3 Changes in the Image Function Due to Object Motion, 119
  5.3 Object Model, 120
    5.3.1 Shape Model, 121
    5.3.2 Motion Model, 122
  5.4 Scene Model, 125
  5.5 Two-Dimensional Motion Models, 128
    5.5.1 Definition and Notation, 128
    5.5.2 Two-Dimensional Motion Models Corresponding to Typical Camera Motions, 130
    5.5.3 Two-Dimensional Motion Corresponding to Three-Dimensional Rigid Motion, 133
    5.5.4 Approximations of Projective Mapping, 136
  5.6 Summary, 137
  5.7 Problems, 138
  5.8 Bibliography, 139

6 TWO-DIMENSIONAL MOTION ESTIMATION, 141
  6.1 Optical Flow, 142
    6.1.1 Two-Dimensional Motion versus Optical Flow, 142
    6.1.2 Optical Flow Equation and Ambiguity in Motion Estimation, 143
  6.2 General Methodologies, 145
    6.2.1 Motion Representation, 146
    6.2.2 Motion Estimation Criteria, 147
    6.2.3 Optimization Methods, 151
  6.3 Pixel-Based Motion Estimation, 152
    6.3.1 Regularization Using the Motion Smoothness Constraint, 153
    6.3.2 Using a Multipoint Neighborhood, 153
    6.3.3 Pel-Recursive Methods, 154
  6.4 Block-Matching Algorithm, 154
    6.4.1 The Exhaustive Block-Matching Algorithm, 155
    6.4.2 Fractional Accuracy Search, 157
    6.4.3 Fast Algorithms, 159
    6.4.4 Imposing Motion Smoothness Constraints, 161
    6.4.5 Phase Correlation Method, 162
    6.4.6 Binary Feature Matching, 163
  6.5 Deformable Block-Matching Algorithms, 165
    6.5.1 Node-Based Motion Representation, 166

  6.6 Mesh-Based Motion Estimation, 169
    6.6.1 Mesh-Based Motion Representation, 171
    6.6.2 Motion Estimation Using the Mesh-Based Model, 173
  6.7 Global Motion Estimation, 177
    6.7.1 Robust Estimators, 177
    6.7.2 Direct Estimation, 178
    6.7.3 Indirect Estimation, 178
  6.8 Region-Based Motion Estimation, 179
    6.8.1 Motion-Based Region Segmentation, 180
    6.8.2 Joint Region Segmentation and Motion Estimation, 181
  6.9 Multiresolution Motion Estimation, 182
    6.9.1 General Formulation, 182
    6.9.2 Hierarchical Block Matching Algorithm, 184
  6.10 Application of Motion Estimation in Video Coding, 187
  6.11 Summary, 188
  6.12 Problems, 189
  6.13 Bibliography, 191

7 THREE-DIMENSIONAL MOTION ESTIMATION, 194
  7.1 Feature-Based Motion Estimation, 195
    7.1.1 Objects of Known Shape under Orthographic Projection, 195
    7.1.2 Objects of Known Shape under Perspective Projection, 196
    7.1.3 Planar Objects, 197
    7.1.4 Objects of Unknown Shape Using the Epipolar Line, 198
  7.2 Direct Motion Estimation, 203
    7.2.1 Image Signal Models and Motion, 204
    7.2.2 Objects of Known Shape, 206
    7.2.3 Planar Objects, 207
    7.2.4 Robust Estimation, 209
  7.3 Iterative Motion Estimation, 212
  7.4 Summary, 213
  7.5 Problems, 214
  7.6 Bibliography, 215

8 FOUNDATIONS OF VIDEO CODING, 217
  8.1 Overview of Coding Systems, 218
    8.1.1 General Framework, 218

  8.2 Basic Notions in Probability and Information Theory, 221
    8.2.1 Characterization of Stationary Sources, 221
    8.2.2 Entropy and Mutual Information for Discrete Sources, 222
    8.2.3 Entropy and Mutual Information for Continuous Sources, 226
  8.3 Information Theory for Source Coding, 227
    8.3.1 Bound for Lossless Coding, 227
    8.3.2 Bound for Lossy Coding, 229
    8.3.3 Rate-Distortion Bounds for Gaussian Sources, 232
  8.4 Binary Encoding, 234
    8.4.1 Huffman Coding, 235
    8.4.2 Arithmetic Coding, 238
  8.5 Scalar Quantization, 241
    8.5.1 Fundamentals, 241
    8.5.2 Uniform Quantization, 243
    8.5.3 Optimal Scalar Quantizer, 244
  8.6 Vector Quantization, 248
    8.6.1 Fundamentals, 248
    8.6.2 Lattice Vector Quantizer, 251
    8.6.3 Optimal Vector Quantizer, 253
    8.6.4 Entropy-Constrained Optimal Quantizer Design, 255
  8.7 Summary, 257
  8.8 Problems, 259
  8.9 Bibliography, 261

9 WAVEFORM-BASED VIDEO CODING, 263
  9.1 Block-Based Transform Coding, 263
    9.1.1 Overview, 264
    9.1.2 One-Dimensional Unitary Transform, 266
    9.1.3 Two-Dimensional Unitary Transform, 269
    9.1.4 The Discrete Cosine Transform, 271
    9.1.5 Bit Allocation and Transform Coding Gain, 273
    9.1.6 Optimal Transform Design and the KLT, 279
    9.1.7 DCT-Based Image Coders and the JPEG Standard, 281
    9.1.8 Vector Transform Coding, 284
  9.2 Predictive Coding, 285
    9.2.1 Overview, 285
    9.2.2 Optimal Predictor Design and Predictive Coding Gain, 286
    9.2.3 Spatial-Domain Linear Prediction, 290

  9.3 Video Coding Using Temporal Prediction and Transform Coding, 293
    9.3.1 Block-Based Hybrid Video Coding, 293
    9.3.2 Overlapped Block Motion Compensation, 296
    9.3.3 Coding Parameter Selection, 299
    9.3.4 Rate Control, 302
    9.3.5 Loop Filtering, 305
  9.4 Summary, 308
  9.5 Problems, 309
  9.6 Bibliography, 311

10 CONTENT-DEPENDENT VIDEO CODING, 314
  10.1 Two-Dimensional Shape Coding, 314
    10.1.1 Bitmap Coding, 315
    10.1.2 Contour Coding, 318
    10.1.3 Evaluation Criteria for Shape Coding Efficiency, 323
  10.2 Texture Coding for Arbitrarily Shaped Regions, 324
    10.2.1 Texture Extrapolation, 324
    10.2.2 Direct Texture Coding, 325
  10.3 Joint Shape and Texture Coding, 326
  10.4 Region-Based Video Coding, 327
  10.5 Object-Based Video Coding, 328
    10.5.1 Source Model F2D, 330
    10.5.2 Source Models R3D and F3D, 332
  10.6 Knowledge-Based Video Coding, 336
  10.7 Semantic Video Coding, 338
  10.8 Layered Coding System, 339
  10.9 Summary, 342
  10.10 Problems, 343
  10.11 Bibliography, 344

11 SCALABLE VIDEO CODING, 349
  11.1 Basic Modes of Scalability, 350
    11.1.1 Quality Scalability, 350
    11.1.2 Spatial Scalability, 353
    11.1.3 Temporal Scalability, 356
    11.1.4 Frequency Scalability, 356

    11.1.5 Combination of Basic Schemes, 357
    11.1.6 Fine-Granularity Scalability, 357
  11.2 Object-Based Scalability, 359
  11.3 Wavelet-Transform-Based Coding, 361
    11.3.1 Wavelet Coding of Still Images, 363
    11.3.2 Wavelet Coding of Video, 367
  11.4 Summary, 370
  11.5 Problems, 370
  11.6 Bibliography, 371

12 STEREO AND MULTIVIEW SEQUENCE PROCESSING, 374
  12.1 Depth Perception, 375
    12.1.1 Binocular Cues - Stereopsis, 375
    12.1.2 Visual Sensitivity Thresholds for Depth Perception, 375
  12.2 Stereo Imaging Principle, 377
    12.2.1 Arbitrary Camera Configuration, 377
    12.2.2 Parallel Camera Configuration, 379
    12.2.3 Converging Camera Configuration, 381
    12.2.4 Epipolar Geometry, 383
  12.3 Disparity Estimation, 385
    12.3.1 Constraints on Disparity Distribution, 386
    12.3.2 Models for the Disparity Function, 387
    12.3.3 Block-Based Approach, 388
    12.3.4 Two-Dimensional Mesh-Based Approach, 388
    12.3.5 Intra-Line Edge Matching Using Dynamic Programming, 391
    12.3.6 Joint Structure and Motion Estimation, 392
  12.4 Intermediate View Synthesis, 393
  12.5 Stereo Sequence Coding, 396
    12.5.1 Block-Based Coding and MPEG-2 Multiview Profile, 396
    12.5.2 Incomplete Three-Dimensional Representation of Multiview Sequences, 398
    12.5.3 Mixed-Resolution Coding, 398
    12.5.4 Three-Dimensional Object-Based Coding, 399
    12.5.5 Three-Dimensional Model-Based Coding, 400
  12.6 Summary, 400
  12.7 Problems, 402

13 VIDEO COMPRESSION STANDARDS, 405
  13.1 Standardization, 406
    13.1.1 Standards Organizations, 406
    13.1.2 Requirements for a Successful Standard, 409
    13.1.3 Standard Development Process, 411
    13.1.4 Applications for Modern Video Coding Standards, 412
  13.2 Video Telephony with H.261 and H.263, 413
    13.2.1 H.261 Overview, 413
    13.2.2 H.263 Highlights, 416
    13.2.3 Comparison, 420
  13.3 Standards for Visual Communication Systems, 421
    13.3.1 H.323 Multimedia Terminals, 421
    13.3.2 H.324 Multimedia Terminals, 422
  13.4 Consumer Video Communications with MPEG-1, 423
    13.4.1 Overview, 423
    13.4.2 MPEG-1 Video, 424
  13.5 Digital TV with MPEG-2, 426
    13.5.1 Systems, 426
    13.5.2 Audio, 426
    13.5.3 Video, 427
    13.5.4 Profiles, 435
  13.6 Coding of Audiovisual Objects with MPEG-4, 437
    13.6.1 Systems, 437
    13.6.2 Audio, 441
    13.6.3 Basic Video Coding, 442
    13.6.4 Object-Based Video Coding, 445
    13.6.5 Still Texture Coding, 447
    13.6.6 Mesh Animation, 447
    13.6.7 Face and Body Animation, 448
    13.6.8 Profiles, 451
    13.6.9 Evaluation of Subjective Video Quality, 454
  13.7 Video Bit Stream Syntax, 454
  13.8 Multimedia Content Description Using MPEG-7, 458
    13.8.1 Overview, 458
    13.8.2 Multimedia Description Schemes, 459
    13.8.3 Visual Descriptors and Description Schemes, 461
  13.9 Summary, 465
  13.10 Problems, 466
  13.11 Bibliography, 467

14 ERROR CONTROL IN VIDEO COMMUNICATIONS, 472
  14.1 Motivation and Overview of Approaches, 473
  14.2 Typical Video Applications and Communication Networks, 476
    14.2.1 Categorization of Video Applications, 476
    14.2.2 Communication Networks, 479
  14.3 Transport-Level Error Control, 485
    14.3.1 Forward Error Correction, 485
    14.3.2 Error-Resilient Packetization and Multiplexing, 486
    14.3.3 Delay-Constrained Retransmission, 487
    14.3.4 Unequal Error Protection, 488
  14.4 Error-Resilient Encoding, 489
    14.4.1 Error Isolation, 489
    14.4.2 Robust Binary Encoding, 490
    14.4.3 Error-Resilient Prediction, 492
    14.4.4 Layered Coding with Unequal Error Protection, 493
    14.4.5 Multiple-Description Coding, 494
    14.4.6 Joint Source and Channel Coding, 498
  14.5 Decoder Error Concealment, 498
    14.5.1 Recovery of Texture Information, 500
    14.5.2 Recovery of Coding Modes and Motion Vectors, 501
    14.5.3 Syntax-Based Repair, 502
  14.6 Encoder–Decoder Interactive Error Control, 502
    14.6.1 Coding-Parameter Adaptation Based on Channel Conditions, 503
    14.6.2 Reference Picture Selection Based on Feedback Information, 503
    14.6.3 Error Tracking Based on Feedback Information, 504
    14.6.4 Retransmission without Waiting, 504
  14.7 Error-Resilience Tools in H.263 and MPEG-4, 505
    14.7.1 Error-Resilience Tools in H.263, 505
    14.7.2 Error-Resilience Tools in MPEG-4, 508
  14.8 Summary, 509
  14.9 Problems, 511
  14.10 Bibliography, 513

15 STREAMING VIDEO OVER THE INTERNET AND WIRELESS IP NETWORKS, 519
  15.1 Architecture for Video Streaming Systems, 520
  15.2 Video Compression, 522

  15.3 Application-Layer QoS Control for Streaming Video, 522
    15.3.1 Congestion Control, 522
    15.3.2 Error Control, 525
  15.4 Continuous Media Distribution Services, 529
    15.4.1 Network Filtering, 529
    15.4.2 Application-Level Multicast, 531
    15.4.3 Content Replication, 532
  15.5 Streaming Servers, 533
    15.5.1 Real-Time Operating System, 534
    15.5.2 Storage System, 537
  15.6 Media Synchronization, 539
  15.7 Protocols for Streaming Video, 542
    15.7.1 Transport Protocols, 543
    15.7.2 Session Control Protocol: RTSP, 545
  15.8 Streaming Video over Wireless IP Networks, 546
    15.8.1 Network-Aware Applications, 548
    15.8.2 Adaptive Service, 549
  15.9 Summary, 554
  15.10 Bibliography, 555

APPENDIX A: DETERMINATION OF SPATIAL–TEMPORAL GRADIENTS, 562
  A.1 First- and Second-Order Gradient, 562
  A.2 Sobel Operator, 563
  A.3 Difference of Gaussian Filters, 563

APPENDIX B: GRADIENT DESCENT METHODS, 565
  B.1 First-Order Gradient Descent Method, 565
  B.2 Steepest Descent Method, 566
  B.3 Newton’s Method, 566
  B.4 Newton-Raphson Method, 567
  B.5 Bibliography, 567

APPENDIX C: GLOSSARY OF ACRONYMS, 568


Preface

In the past decade or so, there have been fascinating developments in multimedia
representation and communications. First of all, it has become very clear that all aspects
of media are “going digital”: from representation to transmission, from processing to
retrieval, from studio to home. Second, there have been significant advances in digital
multimedia compression and communication algorithms, which make it possible to
deliver high-quality video at relatively low bit rates in today’s networks. Third, the
advancement in VLSI technologies has enabled sophisticated software to be
implemented in a cost-effective manner. Last but not least, the establishment of half a dozen
international standards by ISO/MPEG and ITU-T laid the common groundwork for
different vendors and content providers.

At the same time, the explosive growth in wireless and networking technology
has profoundly changed the global communications infrastructure. It is the confluence
of wireless, multimedia, and networking that will fundamentally change the way people
conduct business and communicate with each other. The future computing and
communications infrastructure will be empowered by virtually unlimited bandwidth, full
connectivity, high mobility, and rich multimedia capability.

As multimedia becomes more pervasive, the boundaries between video, graphics,
computer vision, multimedia database, and computer networking start to blur, making
video processing an exciting field with input from many disciplines. Today, video
processing lies at the core of multimedia. Among the many technologies involved, video
coding and its standardization are definitely the key enablers of these developments.
This book covers the fundamental theory and techniques for digital video processing,
with a focus on video coding and communications. It is intended as a textbook for a
graduate-level course on video processing, as well as a reference or self-study text for
researchers and engineers. In selecting the topics to cover, we have tried to achieve
a balance between providing a solid theoretical foundation and presenting complex
system issues in real video systems.

SYNOPSIS

Chapter 1 gives a broad overview of video technology, from the analog color TV
system to digital video. Chapter 2 delineates the analytical framework for video analysis
in the frequency domain, and describes characteristics of the human visual system.
Chapters 3–12 focus on several very important sub-topics in digital video technology.
Chapters 3 and 4 consider how a continuous-space video signal can be sampled to
retain the maximum perceivable information within the affordable data rate, and how
video can be converted from one format to another. Chapter 5 presents models for
the various components involved in forming a video signal, including the camera, the
illumination source, the imaged objects, and the scene composition. Models for the
three-dimensional (3-D) motions of the camera and objects, as well as their projections
onto the two-dimensional (2-D) image plane, are discussed at length, because these
models are the foundation for developing motion estimation algorithms, which are
the subjects of Chapters 6 and 7. Chapter 6 focuses on 2-D motion estimation, which
is a critical component in modern video coders. It is also a necessary preprocessing
step for 3-D motion estimation. We provide both the fundamental principles governing
2-D motion estimation, and practical algorithms based on different 2-D motion
representations. Chapter 7 considers 3-D motion estimation, which is required for various
computer vision applications, and can also help improve the efficiency of video coding.

Chapters 8–11 are devoted to the subject of video coding. Chapter 8 introduces
the fundamental theory and techniques for source coding, including information theory
bounds for both lossless and lossy coding, binary encoding methods, and scalar and
vector quantization. Chapter 9 focuses on waveform-based methods (including
transform and predictive coding), and introduces the block-based hybrid coding framework,
which is the core of all international video coding standards. Chapter 10 discusses
content-dependent coding, which has the potential of achieving extremely high
compression ratios by making use of knowledge of scene content. Chapter 11 presents
scalable coding methods, which are well-suited for video streaming and
broadcasting applications, where the intended recipients have varying network connections and
computing powers. Chapter 12 introduces stereoscopic and multiview video processing
techniques, including disparity estimation and coding of such sequences.

Chapters 13–15 cover system-level issues in video communications. Chapter 13
introduces the H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 standards for video
coding, comparing their intended applications and relative performance. These
standards integrate many of the coding techniques discussed in Chapters 8–11. The MPEG-7
standard for multimedia content description is also briefly described. Chapter 14 reviews
techniques for combating transmission errors in video communication systems, and
also describes the requirements of different video applications, and the characteristics
of various networks. As an example of a practical video communication system, we
end the text with a chapter devoted to video streaming over the Internet and wireless
networks. Chapter 15 discusses the requirements and representative solutions for the
major subcomponents of a streaming system.

SUGGESTED USE FOR INSTRUCTION AND SELF-STUDY

As prerequisites, students are assumed to have finished undergraduate courses in signals
and systems, communications, probability, and preferably a course in image
processing. For a one-semester course focusing on video coding and communications, we
recommend covering the two beginning chapters, followed by video modeling
(Chapter 5), 2-D motion estimation (Chapter 6), video coding (Chapters 8–11), standards
(Chapter 13), error control (Chapter 14), and video streaming systems (Chapter 15).
On the other hand, for a course on general video processing, the first nine chapters,
including the introduction (Chapter 1), frequency domain analysis (Chapter 2), sampling
and sampling rate conversion (Chapters 3 and 4), video modeling (Chapter 5), motion
estimation (Chapters 6 and 7), and basic video coding techniques (Chapters 8 and 9),
plus selected topics from Chapters 10–13 (content-dependent coding, scalable coding,
stereo, and video coding standards) may be appropriate. In either case, Chapter 8 may
be skipped or only briefly reviewed if the students have finished a prior course on
source coding. Chapters 7 (3-D motion estimation), 10 (content-dependent coding),
11 (scalable coding), 12 (stereo), 14 (error control), and 15 (video streaming) may also
be left for an advanced course in video, after covering the other chapters in a first course
in video. In all cases, sections denoted by asterisks (*) may be skipped or left for further
exploration by advanced students.

Problems are provided at the end of Chapters 1–14 for self-study or as
homework assignments for classroom use. Appendix D gives answers to selected problems.
The website for this book (www.prenhall.com/wang) provides MATLAB scripts used to
generate some of the plots in the figures. Instructors may modify these scripts to generate
similar examples. The scripts may also help students to understand the underlying
operations. Sample video sequences can be downloaded from the website, so that
students can evaluate the performance of different algorithms on real sequences. Some
compressed sequences using standard algorithms are also included, to enable instructors
to demonstrate coding artifacts at different rates by different techniques.

ACKNOWLEDGMENTS

We are grateful to the many people who have helped to make this book a reality. Dr.
Barry G. Haskell of AT&T Labs, with his tremendous experience in video coding
standardization, reviewed Chapter 13 and gave valuable input to this chapter as well as other
topics. Prof. David J. Goodman of Polytechnic University, a leading expert in wireless
communications, provided valuable input to Section 14.2.2, part of which summarizes
characteristics of wireless networks. Prof. Antonio Ortega of the University of Southern
California and Dr. Anthony Vetro of Mitsubishi Electric Research Laboratories, then
a Ph.D. student at Polytechnic University, suggested what topics to cover in the
section on rate control, and reviewed Sections 9.3.3–4. Mr. Dapeng Wu, a Ph.D. student
at Carnegie Mellon University, and Dr. Yiwei Hou from Fujitsu Labs helped to draft
Chapter 15. Dr. Ru-Shang Wang of Nokia Research Center, Mr. Fatih Porikli of
Mitsubishi Electric Research Laboratories, also a Ph.D. student at Polytechnic University,
and Mr. Khalid Goudeaux, a student at Carnegie Mellon University, generated several
images related to stereo. Mr. Haidi Gu, a student at Polytechnic University, provided
the example image for scalable video coding. Mrs. Dorota Ostermann provided the
brilliant design for the cover.

We would like to thank the anonymous reviewers who provided valuable
comments and suggestions to enhance this work. We would also like to thank the students
at Polytechnic University, who used draft versions of the text and pointed out many
typographic errors and inconsistencies. Solutions included in Appendix D are based on
their homework. Finally, we would like to acknowledge the encouragement and
guidance of Tom Robbins at Prentice Hall. Yao Wang would like to acknowledge research
grants from the National Science Foundation and New York State Center for Advanced
Technology in Telecommunications over the past ten years, which have led to some of
the research results included in this book.

Most of all, we are deeply indebted to our families, for allowing and even
encouraging us to complete this project, which started more than four years ago and took away
a significant amount of time we could otherwise have spent with them. The arrival of
our new children Yana and Brandon caused a delay in the creation of the book but also
provided an impetus to finish it. This book is a tribute to our families, for their love,
affection, and support.

YAO WANG
Polytechnic University, Brooklyn, NY, USA
[email protected]

JÖRN OSTERMANN
AT&T Labs - Research, Middletown, NJ, USA
[email protected]

YA-QIN ZHANG
Microsoft Research, Beijing, China

VIDEO FORMATION,

PERCEPTION, AND

REPRESENTATION

In this first chapter, we describe what a video signal is, how it is captured and
perceived, how it is stored/transmitted, and which parameters determine the
quality and bandwidth (which in turn determines the data rate)
of a video signal. We first present the underlying physics of color perception
and specification (Sec. 1.1). We then describe the principles and typical devices
for video capture and display (Sec. 1.2). As will be seen, analog videos are
captured/stored/transmitted in a raster scan format, using either progressive or
interlaced scans. As an example, we review the analog color television (TV) system
(Sec. 1.4), and give insights into how certain critical parameters, such as frame
rate and line rate, are chosen, what the spectral content of a color TV signal is, and how
different components of the signal can be multiplexed into a composite signal.
Finally, Section 1.5 introduces the ITU-R BT.601 video format (formerly CCIR601),
the digitized version of the analog color TV signal. We present some of the
considerations that have gone into the selection of various digitization parameters. We also
describe several other digital video formats, including high-definition TV (HDTV).
The compression standards developed for different applications and their associated
video formats are summarized.

The purpose of this chapter is to give the readers background knowledge about
analog and digital video, and to provide insights into common video system design
problems. As such, the presentation is intentionally made more qualitative than
quantitative. In later chapters, we will come back to certain problems mentioned
in this chapter and provide more rigorous descriptions/solutions.

1.1 Color Perception and Specification

A video signal is a sequence of two dimensional (2D) images projected from a
[…] color value at any point in a video frame records the emitted or reflected light at a
particular 3D point in the observed scene. To understand what the color value
means physically, we review in this section the basics of light physics and describe the
attributes that characterize light and its color. We will also describe the principle
of human color perception and different ways to specify a color signal.

1.1.1 Light and Color

Light is an electromagnetic wave with wavelengths in the range of 380 to 780
nanometers (nm), to which the human eye is sensitive. The energy of light is
measured by flux, with a unit of watt, which is the rate at which energy is emitted. The
radiant intensity of a light, which is directly related to the brightness of the light
we perceive, is defined as the flux radiated into a unit solid angle in a particular
direction, measured in watt/solid-angle. A light source usually can emit energy in
a range of wavelengths, and its intensity can vary in both space and time. In
this book, we use C(X, t, λ) to represent the radiant intensity distribution of a light,
which specifies the light intensity at wavelength λ, spatial location X = (X, Y, Z),
and time t.

The perceived color of a light depends on its spectral content (i.e. the wavelength composition). For example, a light that has its energy concentrated near 700 nm appears red. A light that has equal energy in the entire visible band appears white. In general, a light that has a very narrow bandwidth is referred to as a spectral color. On the other hand, a white light is said to be achromatic.

There are two types of light sources: the illuminating source, which emits an electromagnetic wave, and the reflecting source, which reflects an incident wave (see Footnote 1). The illuminating light sources include the sun, light bulbs, television (TV) monitors, etc. The perceived color of an illuminating light source depends on the wavelength range in which it emits energy. The illuminating light follows an additive rule, i.e. the perceived color of several mixed illuminating light sources depends on the sum of the spectra of all light sources. For example, combining red, green, and blue lights in the right proportions creates the white color.

Footnote 1: The illuminating and reflecting light sources are also referred to as primary and secondary light sources, respectively. We do not use those terms, to avoid confusion with the primary colors associated with light. In other places, illuminating and reflecting lights are also called additive and subtractive colors, respectively.

Figure 1.1. Solid line: Frequency responses of the three types of cones on the human retina. The blue response curve is magnified by a factor of 20 in the figure. Dashed line: The luminous efficiency function. From [10, Fig. 1].

The reflecting light sources are those that reflect an incident light (which could itself be a reflected light). When a light beam hits an object, the energy in a certain wavelength range is absorbed, while the rest is reflected. The color of a reflected light depends on the spectral content of the incident light and the wavelength range that is absorbed. A reflecting light source follows a subtractive rule, i.e. the perceived color of several mixed reflecting light sources depends on the remaining, unabsorbed wavelengths. The most notable reflecting light sources are the color dyes and paints. For example, if the incident light is white, a dye that absorbs the wavelength near 700 nm (red) appears as cyan. In this sense, we say that cyan is the complement of red (or white minus red). Similarly, magenta and yellow are complements of green and blue, respectively. Mixing cyan, magenta, and yellow dyes produces black, which absorbs the entire visible spectrum.
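The additive and subtractive rules can be illustrated with a minimal sketch. It assumes, purely for illustration, that a light is summarized by three idealized band energies (red, green, blue) in the range 0 to 1, and that an ideal dye removes its complementary band completely. The function names `add_lights` and `reflect` are hypothetical.

```python
# Idealized three-band model (an assumption): a light is (red, green, blue)
# energy, each in [0, 1].
WHITE = (1.0, 1.0, 1.0)
RED, GREEN, BLUE = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)


def add_lights(*lights):
    """Additive rule: mixed illuminating lights sum their spectra (clipped to 1)."""
    return tuple(min(1.0, sum(band)) for band in zip(*lights))


def reflect(incident, absorbed):
    """Subtractive rule: a dye removes the bands it absorbs from the incident light."""
    return tuple(max(0.0, i - a) for i, a in zip(incident, absorbed))


# Each dye is named by the color it reflects and defined by the band it absorbs.
CYAN_DYE, MAGENTA_DYE, YELLOW_DYE = RED, GREEN, BLUE

white = add_lights(RED, GREEN, BLUE)   # red + green + blue lights
cyan = reflect(WHITE, CYAN_DYE)        # white minus red
black = reflect(reflect(reflect(WHITE, CYAN_DYE), MAGENTA_DYE), YELLOW_DYE)
```

In this toy model `cyan` keeps only the green and blue bands, and stacking all three dyes leaves no energy at all, which mirrors the statement that mixing cyan, magenta, and yellow dyes produces black.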

1.1.2 Human Perception of Color

The perception of a light in the human being starts with the photoreceptors located in the retina (the surface of the rear of the eyeball). There are two types of receptors: cones, which function under bright light and can perceive the color tone, and rods, which work under low ambient light and can only extract the luminance information. The visual information from the retina is passed via optic nerve fibers to the brain area called the visual cortex, where visual processing and understanding is accomplished. There are three types of cones, which have overlapping pass-bands in the visible spectrum with peaks at red (near 570 nm), green (near 535 nm), and blue (near 445 nm) wavelengths, respectively, as shown in Figure 1.1. The responses of these receptors to an incoming light distribution $C(\lambda)$ can be described by:

$$C_i = \int C(\lambda)\, a_i(\lambda)\, d\lambda, \quad i = r, g, b, \tag{1.1.1}$$

where $a_r(\lambda)$, $a_g(\lambda)$, $a_b(\lambda)$ are referred to as the frequency responses or relative absorption functions of the red, green, and blue cones. The combination of these three types of receptors enables a human being to perceive any color. This implies that the perceived color only depends on three numbers, $C_r, C_g, C_b$, rather than the complete light spectrum $C(\lambda)$. This is known as the tri-receptor theory of color vision.

There are two attributes that describe the color sensation of a human being: luminance and chrominance. The term luminance refers to the perceived brightness of the light, which is proportional to the total energy in the visible band. The term chrominance describes the perceived color tone of a light, which depends on the wavelength composition of the light. Chrominance is in turn characterized by two attributes: hue and saturation. Hue specifies the color tone, which depends on the peak wavelength of the light, while saturation describes how pure the color is, which depends on the spread or bandwidth of the light spectrum. In this book, we use the word color to refer to both the luminance and chrominance attributes of a light, although it is customary to use the word color to refer to the chrominance aspect of a light only.

Experiments have shown that there exists a secondary processing stage in the human visual system (HVS), which converts the three color values obtained by the cones into one value that is proportional to the luminance and two other values that are responsible for the perception of chrominance. This is known as the opponent color model of the HVS [3, 9]. It has been found that the same amount of energy produces different sensations of brightness at different wavelengths, and this wavelength-dependent variation of the brightness sensation is characterized by a relative luminous efficiency function, $a_y(\lambda)$, which is also shown (as a dashed line) in Fig. 1.1. It is essentially the sum of the frequency responses of all three types of cones. We can see that the green wavelength contributes most to the perceived brightness, the red wavelength the second, and the blue the least. The luminance (often denoted by $Y$) is related to the incoming light spectrum by:

$$Y = \int C(\lambda)\, a_y(\lambda)\, d\lambda. \tag{1.1.2}$$

In the above equations, we have neglected the time and space variables, since we are only concerned with the perceived color or luminance at a fixed spatial and temporal location. We also neglected the scaling factor commonly associated with each equation, which depends on the desired unit for describing the color intensities and luminance.
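Equations (1.1.1) and (1.1.2) can be approximated numerically by replacing each integral with a sum over sampled wavelengths. The sketch below is illustrative only: the bell-shaped absorption curves are hypothetical stand-ins (real cone responses are measured experimentally), their common 50 nm width is an arbitrary assumption, and only the peak wavelengths come from the text. Following the remark above, $a_y(\lambda)$ is taken here as the plain sum of the three cone curves.

```python
import math

WAVELENGTHS = range(380, 781)  # visible band in nm, sampled in 1 nm steps


def gaussian(lam, peak, width=50.0):
    """Hypothetical bell-shaped curve standing in for a measured response."""
    return math.exp(-0.5 * ((lam - peak) / width) ** 2)


# Illustrative absorption functions a_i(lambda); only the peaks are from the text.
CONE_PEAKS = {"r": 570.0, "g": 535.0, "b": 445.0}


def cone_responses(spectrum):
    """Eq. (1.1.1): C_i = integral of C(lambda) a_i(lambda), as a 1 nm Riemann sum."""
    return {
        name: sum(spectrum(lam) * gaussian(lam, peak) for lam in WAVELENGTHS)
        for name, peak in CONE_PEAKS.items()
    }


def luminance(spectrum):
    """Eq. (1.1.2), assuming a_y is the sum of the three cone curves."""
    return sum(cone_responses(spectrum).values())


def reddish(lam):
    """A narrow-band light with its energy concentrated near 700 nm."""
    return gaussian(lam, 700.0, width=10.0)


def flat_white(lam):
    """Equal energy at every visible wavelength."""
    return 1.0


red_resp = cone_responses(reddish)
white_resp = cone_responses(flat_white)
```

Whatever the shape of the spectrum passed in, `cone_responses` reduces it to three numbers, which is the point of the tri-receptor theory: two different spectra that yield the same three numbers would be indistinguishable in this model.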

1.1.3 The Trichromatic Theory of Color Mixture

A very important finding in color physics is that most colors can be produced by mixing three properly chosen primary colors. This is known as the trichromatic theory of color mixture, first demonstrated by Maxwell in 1855 [9, 13]. Let $C_k$, $k = 1, 2, 3$, represent the colors of three primary color sources, and $C$ a given color. Then the theory essentially says

$$C = \sum_{k=1,2,3} T_k C_k, \tag{1.1.3}$$

where the $T_k$'s are the amounts of the three primary colors required to match color $C$. The $T_k$'s are known as tristimulus values. In general, some of the $T_k$'s can be negative. Assuming only $T_1$ is negative, this means that one cannot match color $C$ by mixing $C_1, C_2, C_3$, but one can match color $C + |T_1| C_1$ with $T_2 C_2 + T_3 C_3$.

In practice, the primary colors should be chosen so that most natural colors can be reproduced using positive combinations of primary colors. The most popular primary set for the illuminating light source contains red, green, and blue colors, known as the RGB primary. The most common primary set for the reflecting light source contains cyan, magenta, and yellow, known as the CMY primary. In fact, the RGB and CMY primary sets are complements of each other, in that mixing two colors in one set will produce one color in the other set. For example, mixing red with green will yield yellow. This complementary information is best illustrated by a color wheel, which can be found in many image processing books, e.g., [9, 4].
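Because Eq. (1.1.3) is linear in the $T_k$'s, the tristimulus values can be found by solving a 3x3 linear system once every color is described by three numbers in a common coordinate. The sketch below assumes such a description (for example, the responses of three sensors); the primaries `C1`, `C2`, `C3` and the target colors are made-up values, and `tristimulus` is a hypothetical helper that uses Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def tristimulus(primaries, color):
    """Solve C = T1*C1 + T2*C2 + T3*C3 for (T1, T2, T3) by Cramer's rule.

    primaries: three 3-vectors C1, C2, C3; color: the 3-vector C to match.
    """
    # The columns of the system matrix are the primaries.
    matrix = [[primaries[k][row] for k in range(3)] for row in range(3)]
    d = det3(matrix)
    if abs(d) < 1e-12:
        raise ValueError("primaries are not linearly independent")
    result = []
    for k in range(3):
        replaced = [row[:] for row in matrix]
        for row in range(3):
            replaced[row][k] = color[row]   # swap column k for the target color
        result.append(det3(replaced) / d)
    return tuple(result)


# Hypothetical primaries, each described by three sensor responses.
C1, C2, C3 = (1.0, 0.2, 0.0), (0.3, 1.0, 0.1), (0.0, 0.1, 1.0)

t_inside = tristimulus((C1, C2, C3), (0.65, 0.65, 0.55))
t_outside = tristimulus((C1, C2, C3), (0.0, 1.0, 0.5))
```

The first target is an equal mix of the three primaries, so all three values come out positive. The second target yields a negative $T_1$: it cannot be matched by mixing the primaries, but adding $|T_1| C_1$ to it gives a color that $T_2 C_2 + T_3 C_3$ does match.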

For a chosen primary set, one way to determine the tristimulus values of any color is by first determining the color matching functions, $m_i(\lambda)$, for the primary colors $C_i$, $i = 1, 2, 3$. These functions describe the tristimulus values of a spectral color with wavelength $\lambda$, for various $\lambda$ in the entire visible band, and can be determined by visual experiments with controlled viewing conditions. Then the tristimulus values for any color with a spectrum $C(\lambda)$ can be obtained by [9]:

$$T_i = \int C(\lambda)\, m_i(\lambda)\, d\lambda, \quad i = 1, 2, 3. \tag{1.1.4}$$

To produce all visible colors with positive mixing, the matching functions associated with the primary colors must be positive.

The above theory forms the basis for color capture and display. To record the color of an incoming light, a camera needs to have three sensors that have frequency responses similar to the color matching functions of a chosen primary set. This can be accomplished by optical or electronic filters with the desired frequency responses. Similarly, to display a color picture, the display device needs to emit three optical beams of the chosen primary colors with appropriate intensities, as specified by the tristimulus values. In practice, electron beams that strike phosphors with the red, green, and blue colors are used. All present display systems use an RGB primary, although the standard spectra specified for the primary colors may be slightly different. Likewise, a color printer can produce different colors by mixing three dyes with the chosen primary colors in appropriate proportions. Most color printers use the CMY primary. For a more vivid and wide-range color rendition, some color printers use four primaries, by adding black (K) to the CMY set. This is known as the CMYK primary, which can render the black color more truthfully.

1.1.4 Color Specification by Tristimulus Values

Tristimulus Values: We have introduced the tristimulus representation of a color in Sec. 1.1.3, which specifies the proportions, i.e. the $T_k$'s in Eq. (1.1.3), of the three primary colors needed to create the desired color. In order to make the color specification independent of the absolute energy of the primary colors, these values should be normalized so that $T_k = 1$, $k = 1, 2, 3$, for a reference white color (equal energy in all wavelengths) with a unit energy. When we use an RGB primary, the tristimulus values are usually denoted by $R$, $G$, and $B$.

Chromaticity Values: The above tristimulus representation mixes the luminance and chrominance attributes of a color. To measure only the chrominance information (i.e. the hue and saturation) of a light, the chromaticity coordinate is defined as:

$$t_k = \frac{T_k}{T_1 + T_2 + T_3}, \quad k = 1, 2, 3. \tag{1.1.5}$$

Since $t_1 + t_2 + t_3 = 1$, two chromaticity values are sufficient to specify the chrominance of a color.
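A short sketch of Eq. (1.1.5) shows why chromaticity carries only chrominance: scaling all three tristimulus values by the same factor (a brighter or dimmer version of the same color) leaves the result unchanged. The function name and the sample values are illustrative assumptions.

```python
def chromaticity(t1, t2, t3):
    """Eq. (1.1.5): normalize the tristimulus values by their sum."""
    total = t1 + t2 + t3
    if total == 0:
        raise ValueError("chromaticity is undefined for a zero-energy color")
    return (t1 / total, t2 / total, t3 / total)


dim = chromaticity(2.0, 1.0, 1.0)
bright = chromaticity(8.0, 4.0, 4.0)   # same color, four times the energy
```

Both calls give the same triple, and since the three values always sum to one, any two of them determine the third.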

Obviously, the color value of an imaged point depends on the primary colors used. To standardize color description and specification, several standard primary color systems have been specified. For example, the CIE, an international body of color scientists, defined a CIE RGB primary system, which consists of colors at 700 ($R_0$), 546.1 ($G_0$), and 435.8 ($B_0$) nm.

Color Coordinate Conversion: One can convert the color values based on one set of primaries to the color values for another set of primaries. Conversion of the (R, G, B) coordinate to the (C, M, Y) coordinate is, for example, often required for printing color images stored in the (R, G, B) coordinate. Given the tristimulus representation of one primary set in terms of another primary, one can determine the conversion matrix between the two color coordinates. The principle of color conversion and the derivation of the conversion matrix between two sets of color primaries can be found in [9].

1.1.5 Color Specification by Luminance and Chrominance Attributes

The RGB primary commonly used for color display mixes the luminance and chrominance attributes of a light. In many applications, it is desirable to describe a color in terms of its luminance and chrominance content separately, to enable more efficient processing and transmission of color signals. Towards this goal, various three-component color coordinates have been developed, in which one component reflects the luminance and the other two collectively characterize hue and saturation. One such coordinate is the CIE XYZ primary, in which $Y$ directly measures the luminance intensity. The $(X, Y, Z)$ values in this coordinate are related to the $(R, G, B)$ values in the CIE RGB coordinate by [9]:

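$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 2.365 & -0.515 & 0.005 \\ -0.897 & 1.426 & -0.014 \\ -0.468 & 0.089 & 1.009 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}. \tag{1.1.6}$$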


In addition to separating the luminance and chrominance information, another advantage of the CIE XYZ system is that almost all visible colors can be specified with non-negative tristimulus values, which is a very desirable feature. The problem is that the X, Y, Z colors so defined are not realizable by actual color stimuli. As such, the XYZ primary is not directly used for color production; rather, it is mainly introduced for defining other primaries and for numerical specification of color. As will be seen later, the color coordinates used for transmission of color TV signals, such as YIQ and YUV, are all derived from the XYZ coordinate.

There are other color representations in which the hue and saturation of a color are explicitly specified, in addition to the luminance. One example is the HSI coordinate, where H stands for hue, S for saturation, and I for intensity (equivalent to luminance). Although this color coordinate clearly separates different attributes of a light, it is nonlinearly related to the tristimulus values and is difficult to compute. The book by Gonzalez has a comprehensive coverage of various color coordinates and their conversions [4].
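To make the nonlinearity concrete, here is a minimal sketch of an RGB-to-HSI conversion. The formulas are one commonly used formulation (the one popularized by the Gonzalez and Woods textbook), adopted here as an assumption rather than taken from this chapter; RGB inputs are assumed to lie in the range 0 to 1, and hue is reported in degrees.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB values in [0, 1] to (hue in degrees, saturation, intensity)."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0           # black: hue and saturation are undefined
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, 0.0, intensity     # gray: hue is undefined, reported as 0
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity


red_hsi = rgb_to_hsi(1.0, 0.0, 0.0)
green_hsi = rgb_to_hsi(0.0, 1.0, 0.0)
blue_hsi = rgb_to_hsi(0.0, 0.0, 1.0)
gray_hsi = rgb_to_hsi(0.5, 0.5, 0.5)
```

The arccosine, the square root, and the minimum are what make the mapping nonlinear; pure red, green, and blue land at hues of 0, 120, and 240 degrees with full saturation, while any gray has zero saturation.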

1.2 Video Capture and Display

1.2.1 Principle of Color Video Imaging

Having explained what light is and how it is perceived and characterized, we are now in a position to understand the meaning of a video signal. In short, a video records the emitted and/or reflected light intensity, i.e. $C(\mathbf{X}, t, \lambda)$, from the objects in the scene that is observed by a viewing system (a human eye or a camera). In general, this intensity changes both in time and space. Here, we assume that there are some illuminating light sources in the scene. Otherwise, there will be neither emitted nor reflected light and the image will be totally dark. When observed by a camera, only those wavelengths to which the camera is sensitive are visible. Let the spectral absorption function of the camera be denoted by $a_c(\lambda)$; then the light intensity distribution in the 3D world that is "visible" to the camera is:

$$\bar{\psi}(\mathbf{X}, t) = \int_0^\infty C(\mathbf{X}, t, \lambda)\, a_c(\lambda)\, d\lambda. \tag{1.2.1}$$

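The image function captured by the camera at any time $t$ is the projection of the light distribution in the 3D scene onto a 2D image plane. Let $P(\cdot)$ represent the camera projection operator so that the projected 2D position of the 3D point $\mathbf{X}$ is given by $\mathbf{x} = P(\mathbf{X})$. Furthermore, let $P^{-1}(\cdot)$ denote the inverse projection operator, so that $\mathbf{X} = P^{-1}(\mathbf{x})$ specifies the 3D position associated with a 2D point $\mathbf{x}$. Then the projected image is related to the 3D image by

$$\psi(P(\mathbf{X}), t) = \bar{\psi}(\mathbf{X}, t) \quad \text{or} \quad \psi(\mathbf{x}, t) = \bar{\psi}\left(P^{-1}(\mathbf{x}), t\right). \tag{1.2.2}$$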

The function $\psi(\mathbf{x}, t)$ is what is known as a video signal. We can see that it describes the radiant intensity at the 3D position $\mathbf{X}$ that is projected onto $\mathbf{x}$ in the image plane at time $t$. In general, the video signal has a finite spatial and temporal range. The spatial range depends on the camera viewing area, while the temporal range depends on the duration in which the video is captured. A point in the image plane is called a pixel (meaning picture element) or simply pel. For most camera systems, the projection operator $P(\cdot)$ can be approximated by a perspective projection. This is discussed in more detail in Sec. 5.1.
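As a rough illustration of the projection operator, the sketch below implements a simple perspective (pinhole) projection. It assumes camera-centered coordinates with the optical axis along $Z$ and a focal length `focal_length`; these conventions and the function names are assumptions made for illustration, and the treatment in Sec. 5.1 may use different notation.

```python
def project(point3d, focal_length=1.0):
    """Perspective projection P: (X, Y, Z) -> (x, y) = (F*X/Z, F*Y/Z)."""
    X, Y, Z = point3d
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


def back_project(point2d, depth, focal_length=1.0):
    """Inverse mapping for a known depth Z; without Z the 3D point is ambiguous."""
    x, y = point2d
    return (x * depth / focal_length, y * depth / focal_length, depth)


near = project((1.0, 2.0, 4.0))
far = project((2.0, 4.0, 8.0))        # same viewing ray, twice as far away
recovered = back_project(near, depth=4.0)
```

Two 3D points on the same viewing ray land on the same image point, so the inverse mapping is only well defined once the depth (or, in a real scene, the visible surface) is known.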

If the camera absorption function is the same as the relative luminous efficiency function of the human being, i.e. $a_c(\lambda) = a_y(\lambda)$, then a luminance image is formed. If the absorption function is non-zero over a narrow band, then a monochrome (or monotone) image is formed. To perceive all visible colors, according to the trichromatic color vision theory (see Sec. 1.1.2), three sensors are needed, each with a frequency response similar to the color matching function for a selected primary color. As described before, most color cameras use red, green, and blue sensors for color acquisition.

If the camera has only one luminance sensor, $\psi(\mathbf{x}, t)$ is a scalar function that represents the luminance of the projected light. In this book, we use the word gray-scale to refer to such a video. The term black-and-white will be used strictly to describe an image that has only two colors: black and white. On the other hand, if the camera has three separate sensors, each tuned to a chosen primary color, the signal is a vector function that contains three color values at every point. Instead of specifying these color values directly, one can use other color coordinates (each consisting of three values) to characterize light, as explained in the previous section.

Note that for special purposes, one may use sensors that work in a frequency range that is invisible to the human being. For example, in X-ray imaging, the sensor is sensitive to the spectral range of the X-ray. On the other hand, an infra-red camera is sensitive to the infra-red range and can function at very low ambient light. These cameras can "see" things that cannot be perceived by the human eye. Yet another example is the range camera, in which the sensor emits a laser beam and measures the time it takes for the beam to reach an object and then be reflected back to the sensor. Because the round trip time is proportional to the distance between the sensor and the object surface, the image intensity at any point in a range image describes the distance or range of its corresponding 3D point from the camera.
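The proportionality between round-trip time and distance can be written out as a small worked example. The sketch assumes the beam travels at the speed of light in vacuum and that the emitter and detector sit at the same place, so the range is half the distance covered; the function names and the sample times are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds):
    """Distance to the surface: the beam covers the range twice."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def range_image(round_trip_times):
    """Map a 2D grid of round-trip times (seconds) to a grid of ranges (meters)."""
    return [[range_from_round_trip(t) for t in row] for row in round_trip_times]


# Hypothetical measurements for a 2x2 image, in seconds.
times = [[20e-9, 40e-9],
         [20e-9, 60e-9]]
ranges = range_image(times)
```

A round trip of 20 nanoseconds corresponds to a surface roughly 3 meters away, which gives a feel for the timing precision such a sensor needs.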

1.2.2 Video Cameras

All the analog cameras of today capture a video in a frame-by-frame manner with a certain time spacing between the frames. Some cameras (e.g. TV cameras and consumer video camcorders) acquire a frame by scanning consecutive lines with a certain line spacing. Similarly, all the display devices present a video as a consecutive set of frames, and with TV monitors, the scan lines are played back sequentially as separate lines. Such capture and display mechanisms are designed to take advantage of the fact that the HVS cannot perceive very high frequency changes in time and space. This property of the HVS will be discussed more extensively in Sec. 2.4.

There are basically two types of video imagers: (1) tube-based imagers such as vidicons, plumbicons, or orthicons, and (2) solid-state sensors such as charge-coupled devices (CCD). The lens of a camera focuses the image of a scene onto a photosensitive surface of the imager of the camera, which converts optical signals into electrical signals. The photosensitive surface of the tube imager is typically scanned line by line (known as raster scan) with an electron beam or other electronic methods, and the scanned lines in each frame are then converted into an electrical signal representing variations of light intensity as variations in voltage. Different lines are therefore captured at slightly different times in a continuous manner. With progressive scan, the electron beam scans every line continuously; while with interlaced scan, the beam scans every other line in one half of the frame time (a field) and then scans the other half of the lines. We will discuss raster scan in more detail in Sec. 1.3. With a CCD camera, the photosensitive surface is comprised of a 2D array of sensors, each corresponding to one pixel, and the optical signal reaching each sensor is converted to an electronic signal. The sensor values captured in each frame time are first stored in a buffer and are then read out sequentially, one line at a time, to form a raster signal. Unlike the tube-based cameras, all the read-out values in the same frame are captured at the same time. With an interlaced scan camera, alternate lines are read out in each field.
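The relation between a frame and its two interlaced fields can be sketched with plain list slicing. The example assumes a frame is stored as a list of scan lines and that the field containing the top line is read out first; both choices, and the function names, are assumptions for illustration.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into two interlaced fields."""
    first_field = frame[0::2]    # lines 0, 2, 4, ...
    second_field = frame[1::2]   # lines 1, 3, 5, ...
    return first_field, second_field


def weave(first_field, second_field):
    """Re-interleave two fields into a full frame."""
    frame = []
    for i in range(len(first_field) + len(second_field)):
        source = first_field if i % 2 == 0 else second_field
        frame.append(source[i // 2])
    return frame


frame = [f"line {n}" for n in range(7)]
first, second = split_fields(frame)
restored = weave(first, second)
```

Weaving the fields back together reproduces the original frame exactly only when both fields were captured at the same instant, as in the CCD case described above; with a tube-based camera the two fields are scanned at different times, so moving objects would not line up.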

To capture color, there are usually three types of photosensitive surfaces or CCD sensors, each with a frequency response that is determined by the color matching function of the chosen primary color, as described previously in Sec. 1.1.3. To reduce the cost, most consumer cameras use a single CCD chip for color imaging. This is accomplished by dividing the sensor area for each pixel into three or four sub-areas, each sensitive to a different primary color. The three captured color signals can be either converted to one luminance signal and two chrominance signals and sent out as a component color video, or multiplexed into a composite signal. This subject is explained further in Sec. 1.2.4.

Many cameras of today are CCD-based because they can be made much smaller and lighter than the tube-based cameras, to acquire the same spatial resolution. Advancement in CCD technology has made it possible to capture a very high resolution image array in a very small chip size. For example, 1/3-in CCDs with 380K pixels are commonly found in consumer-use camcorders, whereas a 2/3-in CCD with 2 million pixels has been developed for HDTV. The tube-based cameras are more bulky and costly, and are only used in special applications, such as those requiring very high resolution or high sensitivity under low ambient light. In addition to the circuitry for color imaging, most cameras also implement color coordinate conversion (from RGB to luminance and chrominance) and compositing of luminance and chrominance signals. For digital output, analog-to-digital (A/D) conversion is also incorporated. Figure 1.2 shows the typical processing involved in a professional video camera. To improve the image quality, digital processing is introduced within the camera. For an excellent exposition of the video camera and display technologies, see [6].

Figure 1.2. Schematic block diagram of a professional color video camera. From [6, Fig. 7(a)].

1.2.3 Video Display

To display a video, the most common device is the cathode ray tube (CRT). With a CRT monitor, an electron gun emits an electron beam across the screen line by line, exciting phosphors with intensities proportional to the intensity of the video signal at corresponding locations. To display a color image, three beams are emitted by three separate guns, exciting red, green, and blue phosphors with the desired intensity combination at each location. To be more precise, each color pixel consists of three elements arranged in a small triangle, known as a triad.

The CRT can produce an image having a very large dynamic range so that the displayed image can be very bright, sufficient for viewing during daylight or from a distance. However, the thickness of a CRT needs to be about the same as the width of the screen, for the electrons to reach the side of the screen. A large screen monitor is thus too bulky, unsuitable for applications requiring thin and portable devices. To circumvent this problem, various flat panel displays have been developed. One popular device is the Liquid Crystal Display (LCD). The principal idea behind the LCD is to change the optical properties and consequently the brightness/color of the liquid crystal by an applied electric field. The electric field can be generated/adapted by either an array of transistors, such as in LCDs using active matrix thin-film-transistors (TFT), or by using plasma. The plasma technology eliminates the need for TFT and makes large-screen LCDs possible. There are also new designs for flat CRTs. A more comprehensive description of video display technologies can be found in [6].

frame instant is completely recorded on the film. For display, consecutive recorded frames are played back using an analog optical projection system.

1.2.4 Composite vs. Component Video

Ideally, a color video should be specified by three functions or signals, each describing one color component, in either a tristimulus color representation, or a luminance-chrominance representation. A video in this format is known as component video. Mainly for historical reasons, various composite video formats also exist, wherein the three color signals are multiplexed into a single signal. These composite formats were invented when the color TV system was first developed and there was a need to transmit the color TV signal in a way so that a black-and-white TV set can extract from it the luminance component. The construction of a composite signal relies on the property that the chrominance signals have a significantly smaller bandwidth than the luminance component. By modulating each chrominance component to a frequency that is at the high end of the luminance component, and adding the resulting modulated chrominance signals and the original luminance signal together, one creates a composite signal that contains both luminance and chrominance information. To display a composite video signal on a color monitor, a filter is used to separate the modulated chrominance signals and the luminance signal. The resulting luminance and chrominance components are then converted to red, green, and blue color components. With a gray-scale monitor, the luminance signal alone is extracted and displayed directly.
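The multiplexing and separation steps can be sketched on a toy signal. The example below makes several simplifying assumptions that are not stated in the text above: the luminance and the two chrominance values are constant over the analysis window, the two chrominance values share one subcarrier in quadrature (cosine and sine), and the subcarrier is sampled at a hypothetical 8 samples per cycle. Separation is done by averaging over whole subcarrier cycles, which acts as an idealized filter.

```python
import math

SAMPLES_PER_CYCLE = 8   # samples per subcarrier period (hypothetical choice)


def composite(y, u, v, num_cycles=4):
    """Add two chrominance values, modulated in quadrature, to the luminance."""
    n = SAMPLES_PER_CYCLE * num_cycles
    out = []
    for k in range(n):
        phase = 2.0 * math.pi * k / SAMPLES_PER_CYCLE
        out.append(y + u * math.cos(phase) + v * math.sin(phase))
    return out


def separate(signal):
    """Recover (y, u, v) by averaging over whole subcarrier cycles."""
    n = len(signal)
    y = sum(signal) / n                      # low-pass: carrier terms average out
    u = v = 0.0
    for k, s in enumerate(signal):
        phase = 2.0 * math.pi * k / SAMPLES_PER_CYCLE
        u += 2.0 * s * math.cos(phase) / n   # synchronous demodulation
        v += 2.0 * s * math.sin(phase) / n
    return y, u, v


signal = composite(0.6, 0.1, -0.2)
y, u, v = separate(signal)
```

With constant inputs the separation is exact. In a real signal the luminance also has energy near the subcarrier frequency, so the filter cannot separate the two perfectly, which is one way to picture the cross-talk between color and luminance components.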

All present analog TV systems transmit color TV signals in a composite format. The composite format is also used for video storage on some analog tapes (such as the VHS tape). In addition to being compatible with a gray-scale signal, the composite format eliminates the need for synchronizing different color components when processing a color video. A composite signal also has a bandwidth that is significantly lower than the sum of the bandwidths of the three component signals, and therefore can be transmitted or stored more efficiently. These benefits are however achieved at the expense of video quality: there often exist noticeable artifacts caused by cross-talk between color and luminance components.

As a compromise between the data rate and video quality, S-video was invented, which consists of two components: the luminance component and a single chrominance component which is the multiplex of the two original chrominance signals. Many advanced consumer-level video cameras and displays enable recording/display of video in S-video format. Component format is used only in professional video equipment.

1.2.5 Gamma Correction
