Coding Shaping Delay at Constant Target Rate

5 Transmission of CBR MPEG Video

5.2 Intra-frame and Predictive Coding

5.2.2 Coding Shaping Delay at Constant Target Rate

We consider the encoder operating at a fixed target rate and study the effect of the coding shaping delay on the visual quality of the encoded stream. This is a realistic scenario, as CBR MPEG encoders usually have a fixed target rate and change the ratio between I-frame and P-frame dimensions according to the motion in the scene being encoded. If the scene is slow, P-frames are small and more bits out of the GOP budget are used to encode I-frames. If the scene is fast, P-frames require more bits and the dimension of I-frames decreases accordingly. In order to have a simple (even though not accurate) measure of visual quality that allows different configurations to be compared, we make the following Assumption.

Assumption 4

If the same scene is encoded more than once with different encoding parameters, the visual quality of the encoded streams obtained in the different experiments can be compared by comparing the dimension of I-frames: larger I-frames mean better visual quality.

According to this Assumption and to what was stated above, given the target rate, the faster a scene moves, the lower the quality.

In a videoconferencing scenario, scenes are usually slow and thus, given a target rate, the encoder produces large I-frames. By doing so it introduces a large coding shaping delay, which is lower-bounded by the dimension of the largest picture according to Inequality (17).

We now devise an expression that gives the coding shaping delay as a function of the P-frame dimension and other parameters of a CBR encoder that exploits predictive coding. We assume that the encoder uses a constant number of bits to encode each GOP: it must produce these bits in a GOP period, i.e., each GOP is encoded with $NTB$ bits. Actual CBR encoders do not usually comply with this assumption; thus, the expression we devise does not provide the coding shaping delay of any particular encoder. Nevertheless, it gives a flavor of why CBR encoders for videoconferencing applications should avoid predictive coding in order to reduce the end-to-end delay.

According to the foregoing assumption, the following equation holds for each GOP

$$F_I + \sum_{i=1}^{N-1} F_{P_i} = NTB \qquad (21)$$

where $F_I$ is the dimension of the I-frame in the GOP and $F_{P_i}$ is the dimension of the $i$-th P-frame.

By extracting $F_I$ from Equation (21), we obtain, for each GOP,

$$F_I = NTB - \sum_{i=1}^{N-1} F_{P_i} \qquad (22)$$

The smaller the number of bits used to encode P-frames, the larger the dimension of the I-frame in the GOP.

Equation (18) in Section 5.1.1 gives the coding shaping delay $S_c = \max_{seq} F / B$, where $\max_{seq} F$ is the dimension of the largest encoded picture in the sequence. When predictive coding is exploited, the largest picture is the I-frame in the GOP with the smallest P-frames. For the sake of simplicity (also in the notation), we assume that each I-frame has the same dimension, i.e., $\max_{seq} F = F_I$ as given by Equation (22); substituting in Equation (18):

$$S_c = \frac{NTB - \sum_{i=1}^{N-1} F_{P_i}}{B} = NT - \frac{\sum_{i=1}^{N-1} F_{P_i}}{B} \qquad (23)$$
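Equations (22) and (23) can be evaluated directly. The following is a minimal sketch under the constant-GOP-budget assumption made above; the function names and all numeric values are illustrative choices, not taken from the text or from any particular encoder.

```python
def i_frame_size(n, t, b, p_frame_sizes):
    """Equation (22): bits left for the I-frame once the P-frames are paid for.

    n: pictures per GOP, t: frame period in seconds, b: target rate in bit/s,
    p_frame_sizes: sizes in bits of the N - 1 P-frames of the GOP.
    """
    if len(p_frame_sizes) != n - 1:
        raise ValueError("a GOP of N pictures has exactly N - 1 P-frames")
    return n * t * b - sum(p_frame_sizes)


def coding_shaping_delay(n, t, b, p_frame_sizes):
    """Equation (23): S_c = F_I / B, in seconds."""
    return i_frame_size(n, t, b, p_frame_sizes) / b


# Illustrative values only (assumptions, not from the text).
N = 12            # pictures per GOP
T = 0.04          # frame period in seconds (25 frames/s)
B = 1_500_000     # target rate in bit/s

slow_scene = [10_000] * (N - 1)   # small P-frames
fast_scene = [50_000] * (N - 1)   # large P-frames

print(coding_shaping_delay(N, T, B, slow_scene))
print(coding_shaping_delay(N, T, B, fast_scene))
```

With these assumed numbers the slow scene yields a delay of roughly 0.41 s and the fast scene roughly 0.11 s, which matches the trend expressed by Equation (23).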

Picture Dimension

The smaller the P-frames, the longer the coding shaping delay that the encoding/decoding system must introduce. If the encoded scene is completely static, the P-frames are encoded with a very small number of bits, in principle 0. Thus, the I-frame grows to use all the bits intended for the encoding of the GOP, i.e., $NTB$; this corresponds to a coding shaping delay equal to the GOP period $NT$.
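As a worked instance of the two extremes, take illustrative values that are not from the text: $N = 12$, $T = 40$ ms (25 frames/s) and $B = 1.5$ Mbit/s. For a completely static scene, the GOP budget and the resulting delay are

$$NTB = 12 \times 0.04\,\mathrm{s} \times 1.5\,\mathrm{Mbit/s} = 720\,\mathrm{kbit}, \qquad S_c = \frac{NTB}{B} = NT = 0.48\,\mathrm{s}.$$

At the other extreme, if every picture took the same $TB = 60$ kbit, Equation (23) would give

$$S_c = NT - \frac{(N-1)\,TB}{B} = T = 40\,\mathrm{ms}.$$

Under these assumed numbers, the delay spans a factor of $N = 12$ between uniform picture sizes and a fully static scene.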

Thus, a CBR encoder that aims at a short coding shaping delay must bound the ratio between the dimension of I-frames and P-frames, even when the scene to encode is slowly moving and would allow a large ratio. The key element in determining the number of bits used to encode pictures is the VBV dimension $V_s$; a dimension equal to $NTB$ allows the encoder to take full advantage of predictive coding when the scene is particularly slow and to use a large number of bits to encode I-frames. A larger dimension allows the encoder to deliver even more uniform quality by smoothing sudden increases in the complexity of pictures and in the motion over more than a GOP period. Nevertheless, this requires a larger coding shaping delay and larger buffers in both encoder and decoder.

Besides the VBV, CBR encoders should have other means of controlling the number of bits used to encode pictures. For example, the software encoder dvdenc sets a target dimension before encoding a picture; this is chosen according to a predefined number of bits to be used in the encoding of each GOP and a preferred ratio between the dimension of I-frames and P-frames. This is particularly useful when a particularly complex picture (i.e., one with low spatial redundancy) is being intra-frame coded; if the encoder used only the VBV to determine the target for the picture dimension, then in order to deliver good quality it would use the maximum number of bits allowed by the system to encode a picture, i.e., it would fill up the VBV.
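One simple way to turn a GOP budget and a preferred I/P ratio into per-picture targets is sketched below. This is a hypothetical allocation written for illustration, not the actual algorithm of dvdenc: it assumes all P-frames in the GOP get the same target, and the function name, the `ip_ratio` parameter and the numeric values are assumptions.

```python
def picture_targets(gop_budget_bits, n, ip_ratio):
    """Split a GOP budget into one I-frame target and N - 1 equal P-frame targets.

    Solves F_I + (N - 1) * F_P = gop_budget_bits with F_I = ip_ratio * F_P.
    Hypothetical allocation rule, not taken from any real encoder.
    """
    if n < 1 or ip_ratio <= 0:
        raise ValueError("need n >= 1 and a positive I/P ratio")
    p_target = gop_budget_bits / (ip_ratio + n - 1)
    i_target = ip_ratio * p_target
    return i_target, p_target


# Illustrative values only (assumptions, not from the text).
N, T, B = 12, 0.04, 1_500_000
budget = N * T * B                       # bits per GOP at the target rate
i_bits, p_bits = picture_targets(budget, N, ip_ratio=4.0)

# The last value is the coding shaping delay F_I / B implied by the I-frame target.
print(i_bits, p_bits, i_bits / B)
```

Because the I-frame target is fixed by the ratio rather than by what the VBV would allow, the delay implied by $F_I / B$ stays bounded even for a very slow scene.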

According to Claim 2, the dimension of the following picture(s) would then be bounded by $BT$; this bound can be too small, particularly if the scene suddenly becomes fast. In that case, P-frames are encoded with a number of bits that is not sufficient to match the visual quality of the corresponding I-frame, and thus the quality of the GOP is lowered.

GOP Size

Increasing the GOP size (i.e., $N$) increases the coding shaping delay because the percentage of small pictures (the P-frames) increases, and so I-frames are made larger to keep the target rate. Even though, according to Assumption 4, this increases the visual quality of the encoded scene, it should be avoided in order to keep the delay small.¹⁴

Using a low video frame rate (i.e., a large $T$) increases the coding shaping delay, as shown by Equation (23): the longer the video frame period $T$, the larger the coding shaping delay.

Nevertheless, Equation (22) shows that increasing the video frame period also improves the quality of the encoded scene because I-frames become larger.
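The effect of the GOP size and of the frame period can be checked numerically with Equation (23). The sketch below assumes that all P-frames have the same fixed size, smaller than the average picture size $TB$; this simplification and all numeric values are assumptions made for illustration.

```python
def delay(n, t, b, p_bits):
    """Equation (23) with all N - 1 P-frames assumed to have the same size p_bits."""
    return n * t - (n - 1) * p_bits / b


# Illustrative values only (assumptions, not from the text).
B = 1_500_000      # target rate in bit/s
P_BITS = 20_000    # assumed size of every P-frame, in bits

# Growing GOP size at a fixed frame period of 40 ms.
for n in (6, 12, 24):
    print(f"N={n:2d}, T=40 ms: S_c = {delay(n, 0.04, B, P_BITS):.3f} s")

# Decreasing frame rate (growing frame period) at a fixed GOP size.
for fps in (30, 15, 10):
    print(f"N=12, {fps} frames/s: S_c = {delay(12, 1 / fps, B, P_BITS):.3f} s")
```

In both sweeps the printed delay grows, in line with the statements above: each extra P-frame adds $T - F_P / B$ seconds, which is positive whenever the P-frame is smaller than $TB$.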

Conclusion 1

Given the target rate of a CBR MPEG encoder, predictive coding increases the visual quality. Nevertheless, every parameter setting aimed at quality improvement increases the coding shaping delay.

In document End-to-end Delay of Videoconferencing over Packet (Page 76-78)