B The MPEG Encoding Standard
B.3 Predictive-coded Pictures
Predictive-coding reduces both spatial and temporal redundancy. Each picture is encoded with reference to the previous one (whether intra- or predictive-coded), which is called the reference picture in the following. This section describes the basic steps of predictive-coding; the first two reduce temporal redundancy before the steps of intra-coding are applied.
B.3.1 Motion Estimation
On the luminance component of the digital image, 16x16-pixel squares are identified, each composed of 4 blocks; one block on each of the chrominance components corresponds to the 16x16-pixel square on the luminance component. Motion estimation operates on macroblocks (MBs), which consist of 6 corresponding blocks (4 on the luminance component and one on each of the two chrominance components). The reference picture is searched for an MB "similar" to the one being encoded; the MB found, if any, is called a predictor. The difference between the given MB and the predictor is encoded as a representation of the former.
An algorithm, not specified by the MPEG standard, is run to identify a predictor in the area of the reference picture around the location corresponding to the MB being encoded. The behavior of the algorithm is controlled by two parameters, which determine its performance in terms of running time and contribution to the overall picture compression.
1. The search range identifies the area around the current MB position in which the reference picture is searched for the predictor. The larger the search area, the higher the probability of finding a closely matching MB, and thus the better the compression. On the other hand, the larger the search area, the longer the algorithm takes to complete the search. The search range can be changed on a picture-by-picture basis.
2. The similarity criterion aims at choosing as predictor the MB that provides the highest compression when its difference from the actual MB is encoded. The more complex the similarity criterion, the better the compression obtained, but the harder the computation.
Motion estimation is the most computationally intensive and time-consuming step of the whole encoding process.
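Since the standard leaves the search algorithm open, the following is only a minimal sketch of one common choice: an exhaustive search that uses the sum of absolute differences (SAD) as the similarity criterion. The function names, the picture representation (lists of rows of luminance values), and the choice of SAD are assumptions made for illustration, not something prescribed by MPEG or by this text.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def find_predictor(cur, ref, cx, cy, search_range, size=16):
    """Exhaustively search ref around (cx, cy).

    Returns (best_sad, (dx, dy)) for the best candidate, or None if no
    candidate block lies entirely inside the reference picture.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

With a search range of r, this sketch evaluates up to (2r+1)^2 candidates of size x size pixel comparisons each, which illustrates why a larger search range lengthens the search. An encoder would additionally compare the best cost against a threshold to decide whether the candidate is similar enough to be used as a predictor.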
B.3.2 Motion Compensation
When a predictor is found for the MB, the difference between each pixel and the corresponding one in the predictor is computed. Since two 8-bit integers are subtracted, the result is a signed 9-bit integer. If no sufficiently similar MB is found, each block of the MB is encoded as an I-frame block, i.e., the motion compensation step is skipped.
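The 9-bit figure can be checked with a small sketch. Both helper names below are hypothetical; pixels are assumed to be unsigned 8-bit values (0 to 255), so a difference lies between -255 and 255.

```python
def residual(block, predictor):
    """Pixel-wise difference between a block and its predictor."""
    return [[p - q for p, q in zip(row, pred_row)]
            for row, pred_row in zip(block, predictor)]


def signed_bits_needed(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


# Extreme cases: darkest pixel against brightest predictor, and vice versa.
extremes = residual([[0, 255]], [[255, 0]])   # [[-255, 255]]
width = signed_bits_needed(-255, 255)         # 9
```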
The complete encoding of a motion compensated MB also includes a motion representation, which allows the predictor to be identified. Motion is represented through a motion vector, i.e., the two-dimensional offset of the predictor from the position in the picture of the MB being encoded. The motion vector is encoded as the difference from the motion vector of the previous MB.
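The differential representation of motion vectors can be sketched as follows. This is an illustration only: the function names are hypothetical, and the initial predictor of (0, 0) for the first MB is an assumption, since the rules for resetting the prediction are not covered here.

```python
def encode_motion_vectors(vectors):
    """Replace each (dx, dy) with its difference from the previous vector."""
    prev = (0, 0)  # assumed starting value for the first MB
    diffs = []
    for dx, dy in vectors:
        diffs.append((dx - prev[0], dy - prev[1]))
        prev = (dx, dy)
    return diffs


def decode_motion_vectors(diffs):
    """Rebuild the original vectors by accumulating the differences."""
    prev = (0, 0)  # must match the encoder's starting value
    vectors = []
    for ddx, ddy in diffs:
        prev = (prev[0] + ddx, prev[1] + ddy)
        vectors.append(prev)
    return vectors
```

When neighbouring MBs move together, as in a camera pan, most differences are (0, 0) or close to it, which is what makes the differential form cheaper to encode.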
B.3.3 DCT
The DCT is performed on the output of the previous step. If a predictor has been found, the DCT is performed on the differences between the pixels of the given MB and those of the predictor. If no predictor has been found, the DCT is applied to the pixels of the MB itself.
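The choice of DCT input can be sketched as below. This is a direct, unoptimized orthonormal two-dimensional DCT-II written for readability; real encoders use fast factorizations. The names `dct_input` and `dct_2d` are hypothetical, and a missing predictor is represented by `None` as an assumption of this sketch.

```python
import math


def dct_input(block, predictor=None):
    """Residual if a predictor was found, otherwise the pixels themselves."""
    if predictor is None:
        return [row[:] for row in block]
    return [[p - q for p, q in zip(row, pred_row)]
            for row, pred_row in zip(block, predictor)]


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (O(n^4), for clarity only)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * acc
    return out
```

If the predictor matches the block exactly, the residual is all zeros and so is its transform; a good but imperfect predictor leaves a residual whose coefficients are small.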
B.3.4 Quantization
The result of the DCT is quantized using different quantizers for motion compensated MBs and intra-coded MBs. Motion compensated MBs are likely to have small coefficients, and thus the quantizer must have no deadzone, i.e., the range of values that are quantized to zero must be smaller than the quantization stepsize.
If the coefficients of a motion compensated MB are all zero (i.e., a predictor identical to the MB has been found), the MB is not processed further and is represented in the MPEG stream by a special 6-bit code.
B.3.5 Entropy Encoding
All the coefficients of motion compensated MBs are encoded using run-length encoding and Huffman encoding (i.e., the DC coefficient is not treated differently). Blocks of non motion compensated MBs are encoded according to the intra-coding process.
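The run-length step can be sketched as follows, assuming the quantized coefficients of a block have already been arranged into a one-dimensional sequence. The function name is hypothetical, and the Huffman tables that would then map each pair to a codeword are not reproduced here. Note that the first coefficient goes through the same loop as all the others, in line with the DC coefficient not being treated differently.

```python
def run_length_pairs(coefficients):
    """Return (run, level) pairs for a sequence of quantized coefficients.

    run is the number of zeros preceding each nonzero level. Trailing zeros
    produce no pair; an end-of-block marker is assumed to follow instead.
    """
    pairs = []
    run = 0
    for level in coefficients:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


pairs = run_length_pairs([5, 0, 0, -2, 1, 0, 0, 0])
```

Here `pairs` holds three entries, `(0, 5)`, `(2, -2)` and `(0, 1)`, for eight input coefficients. A block whose coefficients are all zero yields no pairs at all.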
B.3.6 Controlling P-frame Dimension
The number of bits obtained when predictive-coding a picture (P-frame dimension) depends on the following factors:
- Dynamics of scenes: if scenes are static, subsequent pictures do not change much, motion compensation is highly effective, and P-frames are small.
- Resolution of images: see Section B.2.4.
- Search range and similarity criteria used to find the predictor. Both parameters could be used to control the compression ratio, but they are employed too early in the compression process. On the one hand, reiterating with different values would be impracticable because of the large amount of computation required by motion estimation. On the other hand, the relationship between changing these parameters and the number of generated bits is not trivial.
- Quantizer: see Section B.2.4.
- Variance of the first symbols produced by run-length encoding: see Section B.2.4.
The same actions proposed in Section B.2.4 for controlling the size of I-frames can be taken for P-frames. In addition, before encoding a P-frame, the percentage of MBs that can be motion compensated can be roughly estimated from the overall similarity of the two images. Given some knowledge or statistics about the number of bits needed to encode motion compensated MBs, it is possible to roughly estimate the number of bits that are going to be produced and to choose the search range and the quantizers accordingly.
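A rough estimate of this kind could be computed as in the sketch below. The function name, its parameters, and the linear model (a fixed average cost per MB of each type) are assumptions for illustration; the figures in the usage example are made up and are not measurements.

```python
def estimate_p_frame_bits(num_mbs, mc_fraction, bits_per_mc_mb, bits_per_intra_mb):
    """Rough P-frame size from the expected share of motion compensated MBs.

    mc_fraction is the estimated fraction (0.0 to 1.0) of MBs for which a
    predictor will be found; the remaining MBs are costed as intra-coded.
    """
    mc_mbs = num_mbs * mc_fraction
    intra_mbs = num_mbs - mc_mbs
    return mc_mbs * bits_per_mc_mb + intra_mbs * bits_per_intra_mb


# Hypothetical figures: a 352x288 picture has 22 x 18 = 396 MBs.
estimate = estimate_p_frame_bits(396, 0.75, 120, 600)
```

If the estimate exceeds the bit budget for the picture, a coarser quantizer can be selected before encoding starts, which avoids repeating the costly motion estimation step.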