
Taos - A Revolutionary H.264 Video Codec Architecture For 2-Way Video Communications Applications



Author: Kishan Jainandunsing, PhD, VP Marketing

Abstract

As the video conferencing and telepresence market continues its transition from CIF and D1 resolutions to full HD, driven by broadband proliferation and declining HD display prices, OEMs are realizing that their only option in this transition is H.264/MPEG-4 AVC (Part 10). Traditionally, however, this has come at a price: high cost, high power dissipation, high encode-decode latency and low channel density. The Taos H.264 video codec architecture addresses all of these issues. It implements unique features, such as “zero” latency, high channel density and HD video quality, while keeping cost and power dissipation competitive with or below incumbent solutions. In addition, Taos addresses equally important system-level issues that are inherent to digital video communications, such as noise filtering, optimal network bandwidth usage, and error resiliency and concealment. Taos builds upon the first-generation WW10K low-delay, multi-channel HD codec chipset from W&W Communications. As such, Taos is a tried and proven video codec architecture for practical solutions in real-time video systems.

Introduction

The Taos H.264 video codec architecture addresses crucial requirements for latency, multi-channel operation, video resolution and video quality in 2-way video communications applications, such as video conferencing, video telephony and telepresence. Taos implements unique features, such as “zero” latency, flexible channel resource allocation and HD video quality, which solve the most demanding requirements in these applications. In addition, Taos also addresses equally important system-level issues, such as noise filtering, optimal network bandwidth usage, and error resiliency and concealment.

Latency and Zero Latency Defined

Simply put, video codec latency is defined here as the time lapse between the first pixel of video appearing at the source and the first pixel of decoded video appearing at the destination. Latency-sensitive video applications require this time lapse to be extremely small. How small depends on the application, but as a guideline, keeping latency down to sub 10ms is a good idea. For convenience we will call such low latency “zero” latency. This is in contrast with the orders-of-magnitude higher latency found in applications that are not latency sensitive.

[Figure: Latency between source and decoded video]

Each video port supports multiplexed video streams. This allows Taos to support up to 32 independent video streams simultaneously in encode or decode mode. Dual DDR controllers support external DDR-2 memory and provide sufficient memory bandwidth and storage capacity to support 32 independent video streams at up to 1920x1088 resolution.

A video pre-processor subsystem supports several functions, such as de-multiplexing of input video streams, frame rate adaptation, content-adaptive noise filtering, duplication and downscaling. A video post-processor subsystem provides several functions, such as multiplexing several decoded streams onto a single video display port and on-screen display (OSD) support.

An I2C master interface allows video peripheral circuits to be controlled, such as PAL/NTSC encoders and decoders, HDMI receivers and transmitters, and CMOS and CCD sensors. A flash memory interface controller provides support for flash devices over a serial interface for storage of Taos configuration settings.

A 32-bit/66MHz PCI bus and a 32-bit generic host bus provide communication with an external host processor for network connectivity, audio, driver, operating system and application software support. A high-performance, multi-channel DMA controller handles high-speed data transfers of encoded streams between the codec and the external host processor’s memory. The DMA engine supports scatter/gather data transfers, significantly reducing overhead on the host side.

“Zero” Latency Encode-Decode

In mainstream implementations the encoding process starts only when a complete frame of video is present, introducing at least 33ms of latency at the encoder and another 33ms at the decoder. Together with multi-pass motion estimation, multi-pass rate control and frame-based source filtering, traditional implementations can easily exhibit in excess of 200ms encode-decode latency. In contrast, Taos implements fine-grain pipelining at the macro-block level, advanced bit rate prediction and in-loop source filtering. The encoding process starts as soon as the first lines of video in a frame are available. In this way the encoder does not need to wait for an entire frame to be present before it starts encoding, which keeps the encode-decode latency within the sub 10ms range that this paper calls “zero” latency.
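The 33ms figure quoted for frame-based implementations is simply the frame period at 30 frames/second. The following Python sketch works through that arithmetic and, for comparison, estimates how long one macro-block row takes to arrive. The 16-line macro-block height comes from H.264; the 1088-line frame height, the neglect of blanking intervals and all function names are assumptions made for illustration, not Taos specifications.

```python
def frame_period_ms(fps: float) -> float:
    """Time needed for one complete frame to arrive, in milliseconds."""
    return 1000.0 / fps


def frame_buffering_latency_ms(fps: float) -> float:
    """Minimum latency when encoder and decoder each wait for a full frame."""
    return 2 * frame_period_ms(fps)


def macroblock_row_time_ms(fps: float, frame_lines: int, mb_lines: int = 16) -> float:
    """Time for one macro-block row of video lines to arrive.

    Assumes lines arrive at a uniform rate and ignores blanking intervals.
    """
    return frame_period_ms(fps) * mb_lines / frame_lines


if __name__ == "__main__":
    fps = 30
    print(f"frame period:              {frame_period_ms(fps):.1f} ms")
    print(f"frame-based buffering:     {frame_buffering_latency_ms(fps):.1f} ms")
    print(f"one macro-block row (HD):  {macroblock_row_time_ms(fps, 1088):.2f} ms")
```

Under these assumptions a pipeline that starts work after a few macro-block rows has a buffering delay far below one frame period, which is the intuition behind fine-grain pipelining.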

True Multi-Channel Defined

True multi-channel is defined here as independently encodable and decodable video streams. Each video stream is encoded with its own set of encoding parameters. Changing parameters for one stream does not affect other streams and can be done dynamically during the encoding process. Similarly, decoding one stream does not affect the decoding of other streams, including error propagation and concealment.

True HD Defined

The HD, or High-Definition, moniker is used in the video industry for resolutions of 1280x720 and upwards. The term “true HD” refers here to 1920x1088 resolution at 60 frames per second in progressive scan mode. This represents the highest resolution defined [at present] for high-definition video.

The Taos Architecture

A high-level block diagram of the Taos architecture is shown below. At the heart of the architecture is the multi-stream, “zero” latency, high-definition H.264 codec. The I/O subsystem supports eight physical video ports, which can be configured as eight inputs, eight outputs or a combination of inputs and outputs.

[Figure: Video resolution chart]

[Figure: Taos high-level architecture block diagram]

[Figure: Affecting latency through implementation choices. a) Frame-based pipelining, high latency implementation; b) fine-grain pipelining, “zero” latency implementation]

In case of spatial multiplexing a single stream is created by multiplexing the frames of the streams into single frames. In this case the frames may be of different resolution and size, but they must be of the same frame rate. In this mode a single input port can support multiplexing of up to 16 CIF streams, 4 D1 streams, one 720p stream, or one 1080i/p stream. The aggregate of streams across all 8 ports must not exceed 32 streams.

The high channel density and large number of video ports make DVR and video server implementations based on Taos very cost effective. Video capture subsystems can be kept very simple for 8-port systems. Temporal and spatial multiplexing can be performed with readily available off-the-shelf video decoder-multiplexers.

The high number of channels specifically benefits video conferencing applications. The number of cameras can be expanded drastically compared to existing solutions. This allows each participant to have their own camera, which in turn promotes a better overall experience for the participants.

Because the encoder does not wait for an entire frame to be present before it starts encoding, very little memory is needed for buffering.

In addition, Taos performs single-pass motion estimation, single-pass rate control and in-loop content-adaptive motion-compensated temporal filtering. This, in combination with the macro-block level fine-grain pipelining, results in sub 2ms encode-decode latency for 1080p30 video and sub 4ms latency for D1 video at 30 frames/second. Higher frame rates result in proportionately lower latencies and vice versa, since the latency depends mainly on the pixel clock of Taos’ video ports. For instance, the latency drops to sub 2ms for a D1 stream at 60 frames/second and to sub 1ms for a 1080p60 video stream. Conversely, the latency for a D1 stream at 15 frames/second increases to sub 8ms and for a 1080p15 stream to sub 4ms. Higher latencies at lower frame rates can be avoided by down-sampling the frame rate inside Taos, prior to encoding, using Taos’ frame rate adaptation functionality. Finally, operation in Baseline, Main or High Profile does not affect latency or video quality.
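The inverse relationship between frame rate and latency can be checked with a few lines of Python. The sketch below takes the two 30 frames/second bounds quoted above as reference points and scales them with frame rate; treating the relationship as exactly proportional is an assumption (the text says latency depends "mainly" on the pixel clock), and the names are illustrative only.

```python
# Upper latency bounds at 30 frames/second, taken from the paragraph above.
REFERENCE_FPS = 30
REFERENCE_BOUND_MS = {"1080p": 2.0, "D1": 4.0}


def latency_bound_ms(resolution: str, fps: float) -> float:
    """Scale the 30 frames/second bound inversely with the frame rate.

    Assumes exact inverse proportionality, which is a simplification.
    """
    return REFERENCE_BOUND_MS[resolution] * REFERENCE_FPS / fps


if __name__ == "__main__":
    for resolution in ("1080p", "D1"):
        for fps in (15, 30, 60):
            bound = latency_bound_ms(resolution, fps)
            print(f"{resolution} at {fps} frames/second: sub {bound:g}ms")
```

The values produced for 15 and 60 frames/second agree with the figures quoted in the text.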

Two-way video communications applications are highly sensitive to latency. With noticeable delay a conversation becomes impossible unless a “walkie-talkie”-like protocol is strictly followed, in this case not just for speech but also for motion. This makes the conversation unnatural and cumbersome. With Taos’ “zero” latency, a video conferencing or video telephony session can progress spontaneously and naturally, without the need for awkward and artificial communication protocols between the participants. Latency of sub 33ms is required in this case.

Multi-channel Encoding

The Taos input video ports support temporal and spatial multiplexed streams. Through temporal multiplexing, a single stream is created by time division multiplexing of frames of individual streams. In this case resolution and frame size of all streams must be the same. Only the frame rate may be different between streams. In this mode a single input port can support multiplexing of 32 separate video streams. Alternatively, the 32 streams can be distributed across the eight ports.
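Temporal multiplexing as described here can be pictured as merging the frame sequences of several streams into one sequence ordered by capture time. The Python sketch below illustrates the idea for streams with different frame rates. How a real multiplexer orders frames that coincide in time is not stated in this paper, so the tie-break by stream index, like the function names, is an assumption.

```python
import heapq
from fractions import Fraction
from typing import Iterator, List, Tuple


def frame_times(stream_id: int, fps: int, duration_s: int) -> Iterator[Tuple[Fraction, int, int]]:
    """Yield (capture_time, stream_id, frame_index) for one stream, in time order."""
    for k in range(fps * duration_s):
        yield (Fraction(k, fps), stream_id, k)


def temporal_multiplex(frame_rates: List[int], duration_s: int = 1) -> List[Tuple[int, int]]:
    """Interleave the frames of all streams into one sequence ordered by time.

    Returns (stream_id, frame_index) pairs. All streams are assumed to share
    the same resolution and frame size, as temporal multiplexing requires.
    Frames with equal capture time are ordered by stream index (assumption).
    """
    if len(frame_rates) > 32:
        raise ValueError("at most 32 streams can be multiplexed")
    merged = heapq.merge(
        *(frame_times(i, fps, duration_s) for i, fps in enumerate(frame_rates))
    )
    return [(stream_id, k) for _, stream_id, k in merged]


if __name__ == "__main__":
    # Two streams of equal resolution, one at 30 and one at 15 frames/second.
    sequence = temporal_multiplex([30, 15])
    print(len(sequence), "frames in one second of multiplexed video")
    print(sequence[:6])
```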

[Figure: Temporal multiplexing of video streams. a) Same resolution & frame size, same frame rates; b) same resolution & frame size, different frame rates]

[Figure: Spatial multiplexing of video streams]

[Figure: a) High latency implementation: unnatural and non-spontaneous conversations; b) Taos “zero” latency implementation: natural and spontaneous conversations]

The quality of Taos HD video lies within 2 to 5% of the theoretical performance delivered by the JVT (Joint Video Team) JM (Joint Model) H.264 reference codec.

Continuously increasing broadband coverage, video processing horsepower and image sensor resolutions, against continuously falling prices, are causing OEMs to rapidly incorporate HD resolutions in video conferencing and telepresence equipment. HD conferencing and telepresence provide a much more gratifying experience to participants than VGA or SD resolution. Return on investment for corporations is therefore likely to be higher.

Frame Rates and Resolutions

Each video port may operate at a different frame rate and resolution, completely independently of the other ports. The up to 32 streams mentioned earlier can be distributed across the video ports. The relationship between frame rate and number of streams at a given resolution is given in the table below for 1080, 720, D1 and CIF resolutions, where n is the number of streams.

Relationship Between Frame Rate and Streams By Resolution

Two conditions apply to multi-stream, multi-port distribution for different frame rates and resolutions:

1. The total number of frames per second cannot exceed the equivalent of one 1080p60 stream or the equivalent of 1200 CIF frames/second.

2. The total number of streams cannot exceed 32.

The conversion factor used between CIF and the other resolutions in this example is given in the table below.
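The two conditions can be expressed as a simple admission check. The Python sketch below assumes that each stream consumes a share of the total capacity equal to its frame rate divided by the single-stream maximum for its resolution (60, 120, 300 and 1200 frames/second, the n = 1 values of the frame-rate table in this paper), and that these shares add linearly. Both the linear model and the function names are assumptions for illustration.

```python
from fractions import Fraction
from typing import Iterable, List, Tuple

# Maximum frame rate of a single stream per resolution (frame-rate table, n = 1).
MAX_FPS = {"1080i/p": 60, "720p": 120, "D1": 300, "CIF": 1200}
MAX_STREAMS = 32

Stream = Tuple[str, int]  # (resolution, frames per second)


def capacity_used(streams: Iterable[Stream]) -> Fraction:
    """Share of the total processing capacity used by the given streams.

    Assumes shares add linearly: one stream uses fps / MAX_FPS[resolution].
    """
    return sum((Fraction(fps, MAX_FPS[res]) for res, fps in streams), Fraction(0))


def fits(streams: Iterable[Stream]) -> bool:
    """Check both conditions: stream count and aggregate frame rate."""
    stream_list: List[Stream] = list(streams)
    return len(stream_list) <= MAX_STREAMS and capacity_used(stream_list) <= 1


if __name__ == "__main__":
    mix = [("1080i/p", 30), ("720p", 30), ("D1", 30), ("D1", 30)]
    print("capacity used:", float(capacity_used(mix)))
    print("fits:", fits(mix))
```

Exact fractions are used instead of floating point so that a configuration that uses the capacity exactly is not rejected because of rounding.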

CIF Conversion Factor Between Resolutions

[Figure: Taos HD video quality compared to JVT JM results]

Input Stream Duplication and Scaling

A video input port can duplicate its stream, scale it down and compress it simultaneously with the original stream. For instance, a D1 or 720p30 stream can be copied, scaled down to CIF or QCIF and sub-sampled down to 15 frames/second. It is then compressed separately from the original stream.

This function allows OEMs to offer highly innovative features. For instance, a high-resolution video stream can be transmitted simultaneously with a scaled-down copy of itself at lower resolution and frame rate. This copy can be transmitted via a mobile telephony network to a remote participant’s cell phone.

Multi-channel Decoding

Taos output video ports support multiplexing of decoded video streams. Several modes are supported, such as picture-by-picture (PxP), picture-in-picture (PiP) and picture-on-picture (PoP). Multiplexing of up to 16 video streams per port is possible, with a maximum of 32 streams in aggregate over 8 video output ports. In case of PxP (tiling) an integer number of multiplexed streams must fit within the output resolution. For instance, 4 D1 or 16 CIF frames in a 1920x1088 frame.

The multi-stream display functions of Taos drastically simplify system design. Additional OSD (on-screen display) functions further enhance the functionality and simplify system design.

HD Encoding and Decoding

Taos has the horsepower to encode or decode HD video up to 1080p60, which satisfies the most demanding applications.

Resolution    CIF Conversion Factor
CIF           1
D1            4
720p          8
1080i/p       16

Resolution    Frame Rate (n = streams)
1080i/p       60/n, n ≤ 32
720p          120/n, n ≤ 32
D1            300/n, n ≤ 32
CIF           1200/n, n ≤ 32

Frame rates and resolutions can be changed dynamically, provided the maximum processing capacity of Taos is not exceeded.

Dynamic control of resolution and frame rate allows a video conferencing or telepresence system to increase the resolution and frame rate on the fly for a camera feed whose priority has been increased, at the expense of video streams whose priority has been lowered.

Error Resiliency and Concealment

Taos provides a series of powerful error resiliency features. Among these are variable GoP (Group of Pictures) size, I-frame forcing, macro-block intra-refresh and multiple slices. Variable GoP size can be used to make transmission of the compressed video more robust under noisy channel conditions. I-frame forcing can be used for reasonably noise-free transmission channels, which permit very long or infinite GoP sizes. The few times packets are corrupted or dropped, the decoder requests the encoder to transmit an I-frame, so that the decoder can recover from the problem.

Macro-block intra-refresh allows an I-frame to be distributed across multiple frames, thus smoothing out bit rate peaks in I-frame forcing and making I-frames more robust under noisy channel conditions: an error occurring in an I-frame slice does not corrupt an entire I-frame in this case, but only the slice in which it occurred. Multiple slices are another method to contain errors and recover from them quickly. By dividing frames into multiple slices, an error in a slice does not propagate across the slice’s boundary and is thus contained. Multiple slices and macro-block intra-refresh both have the effect of lowering overall bit error rates.
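To illustrate how an I-frame can be distributed across multiple frames, the Python sketch below assigns a different group of macro-block rows to be intra-coded in each frame, so that the whole picture is refreshed once per refresh period. The row-based cyclic pattern, the parameter names and the 68-row example (1088 lines divided by 16) are assumptions for illustration; this paper does not specify the refresh pattern Taos uses.

```python
from typing import List


def intra_refresh_rows(frame_index: int, mb_rows: int, refresh_period: int) -> List[int]:
    """Macro-block rows to intra-code in the given frame.

    The rows are split into refresh_period contiguous groups. Frame k
    refreshes group (k mod refresh_period), so every row is intra-coded
    exactly once per period.
    """
    if not 1 <= refresh_period <= mb_rows:
        raise ValueError("refresh_period must be between 1 and mb_rows")
    phase = frame_index % refresh_period
    start = phase * mb_rows // refresh_period
    end = (phase + 1) * mb_rows // refresh_period
    return list(range(start, end))


if __name__ == "__main__":
    # A 1088-line frame has 68 macro-block rows of 16 lines each.
    for frame in range(4):
        rows = intra_refresh_rows(frame, mb_rows=68, refresh_period=4)
        print(f"frame {frame}: intra-code rows {rows[0]}..{rows[-1]}")
```

A longer refresh period spreads the intra-coded bits over more frames and flattens the bit rate, at the cost of a longer recovery time after an error.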

On the decode side the decoder can either freeze on the frame immediately preceding the corrupted frame, or substitute corrupt macro-blocks with skips to cover them up.
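The two concealment options can be sketched in Python as follows. A frame is modeled as a grid of macro-blocks, each represented by a placeholder value. Substituting a skip is modeled here as copying the co-located macro-block from the previous frame, which assumes zero motion; real H.264 skip macro-blocks use a predicted motion vector, so this is a simplification, and the names are illustrative.

```python
from typing import List, Set, Tuple

Frame = List[List[int]]  # frame[row][col] is one macro-block (placeholder value)


def conceal(previous: Frame, current: Frame,
            corrupt: Set[Tuple[int, int]], freeze: bool = False) -> Frame:
    """Return a displayable frame, given the positions of corrupt macro-blocks."""
    if freeze and corrupt:
        # Freeze: repeat the last good frame in its entirety.
        return [row[:] for row in previous]
    # Skip substitution: replace each corrupt macro-block by the co-located
    # macro-block of the previous frame (zero motion assumed).
    return [
        [previous[r][c] if (r, c) in corrupt else block
         for c, block in enumerate(row)]
        for r, row in enumerate(current)
    ]


if __name__ == "__main__":
    last_good = [[1, 2], [3, 4]]
    received = [[5, 6], [7, 8]]
    print(conceal(last_good, received, {(0, 1)}))
    print(conceal(last_good, received, {(0, 1)}, freeze=True))
```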

Taos provides support for the implementation of H.241 protocols on a host processor for communication between the encoder and decoder. Through this, the decoder can signal the encoder to change the GoP size, force an I-frame, or change the macro-block intra-refresh and multiple-slice parameters.

These error resiliency and concealment features are very important in two-way video communications applications. The tolerance for errors is very low in these applications and recovery must happen fast.

[Figure: Example of frame rate and resolution distribution across Taos’ video ports]

[Figure: Changing resolutions and frame rates dynamically]

[Figure: Various error resiliency techniques supported by Taos]

Network Efficiency

The Taos encoder takes into consideration maximum transmission unit (MTU) size. Slices can be defined as a function of the number of bytes that optimally fits in the MTU. This avoids fragmentation and segmentation. The result is that network bandwidth is not being wasted unnecessarily, but instead is optimally used, without the need for expensive over-provisioning.
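One way to picture MTU-aware slicing is as a greedy packing of consecutive macro-blocks into slices that stay within a byte budget. The Python sketch below shows this under stated assumptions: the encoded size of each macro-block is known, the budget is a hypothetical 1400 bytes (chosen to leave room for packet headers below a 1500-byte Ethernet MTU), and the function name is illustrative. It is not a description of the Taos rate control or slice logic.

```python
from typing import List


def pack_slices(mb_sizes: List[int], max_slice_bytes: int) -> List[List[int]]:
    """Greedily group consecutive macro-blocks into slices of bounded size.

    mb_sizes holds the encoded size in bytes of each macro-block in coding
    order. Returns one list of macro-block indices per slice.
    """
    slices: List[List[int]] = []
    current: List[int] = []
    used = 0
    for index, size in enumerate(mb_sizes):
        if size > max_slice_bytes:
            raise ValueError(f"macro-block {index} alone exceeds the slice budget")
        if current and used + size > max_slice_bytes:
            # The next macro-block would overflow the budget: close the slice.
            slices.append(current)
            current, used = [], 0
        current.append(index)
        used += size
    if current:
        slices.append(current)
    return slices


if __name__ == "__main__":
    sizes = [600, 500, 400, 900, 100]
    for slice_indices in pack_slices(sizes, max_slice_bytes=1400):
        total = sum(sizes[i] for i in slice_indices)
        print(f"slice {slice_indices}: {total} bytes")
```

Because every slice fits in one packet, a lost packet costs exactly one slice and no packet has to be fragmented on the way.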

Programmability and Time-To-Revenue

Taos strikes a good balance between programmability and hardwired functionality. Its rich register set provides extensive control over many of the video processing and system interface functions. Thus, developers do not have to take on the arduous, time-consuming and expensive task of application software porting and programming of video compression algorithms, as is the case with integrated host CPU and programmable DSP architectures. This in turn means low risk development and quick time-to-revenue for OEMs.

Low Power Dissipation and Cost

Taos is designed with low power dissipation in mind. Total power dissipation in single channel 1080p30 mode is sub 500mW, or sub 25mW in single channel CIF mode at 30 frames/second. This addresses the most stringent power dissipation requirements.

At the same time the Taos architecture has been designed with low cost in mind. This is achieved through a combination of efficient logic implementation, a 90nm silicon process and high channel densities. The result is the most competitive cost per channel in the industry.

Conclusions

Taos is a revolutionary H.264 codec architecture, which provides video-processing functionality highly optimized for two-way video communications applications. Its “zero” latency, true multi-channel and true HD capabilities meet the most difficult-to-satisfy requirements in these applications. The “zero” latency capabilities address the fundamental problem of real-time operation in communications applications. Features such as extracting motion information, stream copying and scaling, and dynamic channel resolution and frame rate changing open up opportunities for OEMs to innovate on top of these features. Beyond video compression and decompression, Taos addresses important system aspects as well, such as error resiliency, error concealment, noise filtering and network bandwidth utilization. Taos builds on the legacy of the proven W&W Communications WW10K H.264 HD codec chipsets for multi-channel, HD and “zero” latency applications. This makes Taos a sure bet for OEMs in the two-way video communications market.

For More Information

For more information on Taos contact W&W Communications at www.wwcoms.com or write an email to info@wwcoms.com.

Bit Rate Control

Taos implements constant bit rate (CBR) control for network transmission applications as well as variable bit rate (VBR) control for storage applications. Bit rate control does not affect “zero” latency. The variance of the bit rate in case of VBR can be set, so as not to exceed the available bandwidth of the storage interface.

Motion Information

Access to motion information adds another dimension of innovation to video conferencing and telepresence systems. This information can be used, for instance, to detect gesturing by a participant, whereupon the camera can promptly zoom in. In another example, where each participant has their own camera, the frame rate and resolution can be instantly increased for a video feed upon detection of gesturing by the participant on that feed.

Taos provides raw motion information in two ways. One is by providing motion vector statistics (average, minimum, maximum and variance) across definable regions, and the other is by providing complete motion vector maps and SAD (Sum of Absolute Differences) information for entire frames. Regions are allowed to overlap. Both kinds of motion information are highly compute-intensive to produce, so Taos relieves the external host CPU of these calculations. Instead, the host may run OEM-specific algorithms on the raw motion information, which interpret whether or not motion is occurring, what relevance the motion has and what action to take.
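To make the region statistics concrete, the Python sketch below computes the average, minimum, maximum and variance of motion vector magnitudes over a rectangular region of a motion vector map, followed by a hypothetical host-side rule of the kind an OEM might run on top. Whether Taos reports statistics on magnitudes or on vector components is not stated in this paper, so the use of magnitudes, the data layout, the threshold and all names are assumptions.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple

MotionVector = Tuple[int, int]  # (dx, dy) of one macro-block


def region_stats(mv_map: List[List[MotionVector]],
                 top: int, left: int, bottom: int, right: int) -> Dict[str, float]:
    """Statistics of motion vector magnitude over a rectangular region.

    The region covers rows top..bottom-1 and columns left..right-1 of the
    per-macro-block motion vector map.
    """
    magnitudes = [
        (dx * dx + dy * dy) ** 0.5
        for row in mv_map[top:bottom]
        for dx, dy in row[left:right]
    ]
    if not magnitudes:
        raise ValueError("region is empty")
    return {
        "average": mean(magnitudes),
        "minimum": min(magnitudes),
        "maximum": max(magnitudes),
        "variance": pvariance(magnitudes),
    }


def is_gesturing(stats: Dict[str, float], threshold: float = 4.0) -> bool:
    """Hypothetical OEM rule: flag a region whose average motion is large."""
    return stats["average"] > threshold


if __name__ == "__main__":
    mv_map = [[(0, 0), (3, 4)],
              [(0, 0), (6, 8)]]
    right_column = region_stats(mv_map, top=0, left=1, bottom=2, right=2)
    print(right_column, is_gesturing(right_column))
```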

Noise Filtering

Taos implements in-loop, content-adaptive motion-compensated temporal filtering (CA-MCTF). This reduces noise levels in the source video with filter strengths adaptively changing based on the content. Subjective quality greatly improves by leaving fine detailed features in the video unaffected, while removing random noise. Sharpness and clarity of the video are maintained, while encoder bit rates are reduced by up to 45%. The single-pass, in-loop operation of the filter maintains “zero” encode-decode latency.

[Figure: Noisy source (bit rate 100%) compared with CA-MCTF filtered video (bit rate 55%)]

2903 Bunker Hill Lane, Suite 107 Santa Clara, CA 95054, USA Tel: +1.408.481.0264 Fax: +1.408.213.2951

notice. W&W Communications is a trademark of W&W Communications, Inc. All other trademarks and registered trademarks are property of their respective holders. Copyright ©2001-2006 W&W Communications, Inc. All rights reserved.

Gran Via 6, 4 Madrid, 28013, Spain Tel: +34.91.524.7467 Fax: +34.91.524.7499

Shangdi DongLu #5-1

JingMeng GaoKe Bldg. A, Suite 201 Beijing, China 100085

Tel: +86.10.6296.8780 Fax: +86.10.6296.5943
