3. Statistical redundancy is constituted by elements that are regularly repeated, including the horizontal and vertical sync pulses, and can be
6.6 Exploiting the Temporal Redundancy
As stated earlier, temporal redundancy can be found by comparing two adjacent frames. The bit-rate reduction techniques used for the removal of that type of redundancy are consequently called interframe compression. Considering that two adjacent frames are separated by either 33 or 40 milliseconds (frame rates of roughly 30 or 25 frames per second), it is clear that in such a short time span not many changes can occur. In essence, the interframe compression technique consists of analyzing two adjacent frames, calculating all the differences that exist between them, and transmitting only those differences.
Figure 6.2 shows the amount of temporal redundancy existing between two adjacent frames. A careful look at this figure will reveal that the differences are minimal and that most of the content of the first frame is repeated in the second. Therefore, the role of the interframe compression will consist in detecting the differences and coding them accordingly. The first of the two frames shown in Figure 6.2 will be compressed by the intraframe compression methods and all the spatial redundancy will be eliminated; the next frame will consist only of the coded information of the differences between these two frames. The differences between two successive frames are analyzed and quantified by comparing two pictures element by element, that is, block by block.
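The block-by-block comparison described above can be sketched as follows. This is only an illustration under stated assumptions: frames are small grayscale images held as nested Python lists, and the function name `changed_blocks`, the 2 x 2 block size, and the `threshold` parameter are hypothetical choices made for the example, not part of MPEG-2.

```python
def changed_blocks(prev, curr, block=2, threshold=0):
    """Return the (top, left) origin of every block that differs between frames.

    The difference measure is the sum of absolute differences (SAD) over
    the block; a block counts as changed when its SAD exceeds `threshold`.
    """
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            sad = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            )
            if sad > threshold:
                changed.append((top, left))
    return changed


# Two 4 x 4 frames that differ in a single pixel.
prev_frame = [[10] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[3][3] = 50

print(changed_blocks(prev_frame, curr_frame))  # [(2, 2)]
```

Only one of the four blocks would need to be coded for the second frame; the other three repeat the content of the first frame.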
However, the simple interframe compression cannot deal successfully with situations when there is a lot of movement in the scene. A moving object will be in two different positions in two adjacent frames, regardless of the fact that the displacement could be very small. If the two pictures are compared, taking as a reference the static background over which an object is moving, the result will be a relatively large difference signal that would have to then be coded. To further reduce the bit rate and facilitate an accurate prediction of all parts of the picture, including the moving ones, it is necessary to apply an additional process known as motion compensation. That technique is used to estimate as precisely as possible the location in the previous picture of the block to be coded, and then the difference signal is calculated by using just that area of the previous picture. This process provides motion vectors that describe the location of the block used for the computation of the difference signal in the previous picture. It is important to stress that the inaccuracies in estimating the motion vectors will not distort the decompressed picture but will lead to an increase of the bit rate to be transmitted and less efficient compression.
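To make the idea of a motion vector concrete, here is a minimal sketch of an exhaustive block-matching search. It assumes grayscale frames stored as nested lists and uses the sum of absolute differences (SAD) as the matching error; the names `sad` and `best_vector`, the block size, and the search range are assumptions chosen for the example.

```python
def sad(prev, curr, top, left, dy, dx, block):
    """SAD between the block at (top, left) in `curr` and the block
    displaced by (dy, dx) in `prev`."""
    return sum(
        abs(curr[top + y][left + x] - prev[top + dy + y][left + dx + x])
        for y in range(block)
        for x in range(block)
    )


def best_vector(prev, curr, top, left, block=2, search=2):
    """Try every displacement within +/- `search` pixels and return
    ((dy, dx), cost) for the displacement with the smallest SAD."""
    height, width = len(prev), len(prev[0])
    best = (0, 0)
    best_cost = sad(prev, curr, top, left, 0, 0, block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would fall outside the previous frame.
            if not (0 <= top + dy and top + dy + block <= height
                    and 0 <= left + dx and left + dx + block <= width):
                continue
            cost = sad(prev, curr, top, left, dy, dx, block)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


# A 2 x 2 object moves from (row 1, col 1) to (row 2, col 3) on a flat background.
prev_frame = [[0] * 6 for _ in range(6)]
curr_frame = [[0] * 6 for _ in range(6)]
for y in range(2):
    for x in range(2):
        prev_frame[1 + y][1 + x] = 9
        curr_frame[2 + y][3 + x] = 9

print(best_vector(prev_frame, curr_frame, top=2, left=3))  # ((-1, -2), 0)
```

Without motion compensation the block would produce a difference of 36 against the static background; with the vector (-1, -2) the difference signal is zero, so only the vector itself has to be transmitted.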
The goal of motion estimation is to detect horizontal and vertical motion vectors that will provide the minimum error for the block to be coded. In that respect the most important parameters are the search range and the hierarchical search algorithms. The search range defines the number of pixels over which the motion-estimation process will check for block matches. A larger search range means a wider span of motion speeds that can be evaluated. However, here again a technical optimum has to be found. If the search range is restricted, the motion-estimation process will not track all fast movements or small moving objects and the compression will be less efficient. On the other hand, a full search range permits the checking of all possible vector displacements, but it requires considerable processing capacity. One way to reduce such a stringent computational requirement is to implement hierarchical search algorithms. The hierarchical search is a series of consecutive search procedures. The first pass investigates possible motion vectors and defines the best one. The subsequent stages are refinements centered on the vector that was defined in the first pass. Compared with an exhaustive search, this system is certainly a compromise since there is no assurance that the first algorithm will find the best possible vector, but it leads to considerable savings in the processing capacity of the encoder chips.
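One possible form of such a coarse-to-fine search is sketched below. It is a simple step-halving scheme, chosen here only for illustration; the text does not specify which hierarchical algorithm an encoder uses. The function names, the synthetic ramp image, and the parameter `first_step` are all assumptions.

```python
def block_cost(prev, curr, top, left, dy, dx, block):
    """SAD for one candidate vector, or None if it leaves the frame."""
    height, width = len(prev), len(prev[0])
    if (top + dy < 0 or left + dx < 0
            or top + dy + block > height or left + dx + block > width):
        return None
    return sum(
        abs(curr[top + y][left + x] - prev[top + dy + y][left + dx + x])
        for y in range(block)
        for x in range(block)
    )


def hierarchical_search(prev, curr, top, left, block=2, first_step=2):
    """Coarse pass with a wide step, then refinements centered on the
    best vector found so far. Returns (vector, cost, candidates tried)."""
    center = (0, 0)
    best_cost = block_cost(prev, curr, top, left, 0, 0, block)
    tried = 1
    step = first_step
    while step >= 1:
        best = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if dy == 0 and dx == 0:
                    continue  # the center was evaluated in the previous pass
                cand = (center[0] + dy, center[1] + dx)
                cost = block_cost(prev, curr, top, left, cand[0], cand[1], block)
                tried += 1
                if cost is not None and cost < best_cost:
                    best, best_cost = cand, cost
        center = best
        step //= 2
    return center, best_cost, tried


# Synthetic ramp image; the current frame is the previous one displaced
# so that the true motion vector is (dy, dx) = (2, 1).
prev_frame = [[10 * y + x for x in range(12)] for y in range(12)]
curr_frame = [[10 * (y + 2) + (x + 1) for x in range(12)] for y in range(12)]

print(hierarchical_search(prev_frame, curr_frame, top=5, left=5))
# ((2, 1), 0, 17)
```

With `first_step=2` the search can reach displacements of up to 3 pixels while testing 17 candidates, whereas an exhaustive search over the same range would test 49. On less regular picture content the coarse pass can settle on the wrong region, which is the compromise noted above.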
After the application of compression tools described above, an MPEG-2 data stream will consist of several types of compressed frames:
• I-frames (intra) use only intraframe compression (i.e., all the spatial and statistical redundancy existing inside that same frame is removed). All the necessary information for decoding these frames is contained in the frame itself and does not require referencing to any adjacent one. For that reason decoders use I-frames as a starting point for decoding a group of pictures that also contain other types of frames.
• P-frames (predicted) use the nearest preceding frame (either an I- or another P-frame) as a base for the computation of their predicted content. Such a content calculation is called forward prediction. P-frames offer a considerably greater compression ratio than I-frames, but they cannot be used independently since the frames they are predicted from, and ultimately an I-frame, are needed in the decoding process.
• B-frames (bidirectional) are computed by means of bidirectional prediction, that is, on the basis of both the previous and the next adjacent reference frame. As such the B-frames are even more efficiently coded than P-frames but cannot be used as a base for predicting other frames. B-frames bring an important contribution to the efficiency of the bit-rate reduction, but their decoding requires more storage capacity in the decoder and this makes it more expensive.
In principle, the predicted frames will be more efficiently compressed than the intraframe-compressed frames, and it could seem advantageous to constitute a data stream composed mainly of such frames. However, to start its operation, a decoder must encounter an I-frame and start decoding from that point. Moreover, without periodic I-frames to restart the prediction chain, an error introduced in one predicted frame would propagate along the data stream. To overcome these problems the MPEG coder controls the number and frequency of I-frames in the data stream, defining the length of a group of pictures (GOP).
A group of pictures (see Figure 6.3) is a set of compressed I-, B-, and P-frames (pictures) that starts with an I-frame and extends to the frame preceding the next I-frame in the sequence. A GOP is open when its last frames are B-frames referenced to the first I-frame of the next GOP, and closed when all the pictures in the group are referenced, directly or indirectly, to the first I-frame of that same group and the last picture in the series is a P-frame.
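The GOP boundaries defined above can be illustrated with a short sketch that cuts a sequence of frame types at every I-frame. The frame sequence is written as a string of the letters I, P, and B; the function name `split_into_gops` and the sample sequence are assumptions made for the example.

```python
def split_into_gops(frame_types):
    """Split a string of frame types into groups of pictures.

    Each group starts at an I-frame and runs up to the frame that
    precedes the next I-frame. Frames that appear before the first
    I-frame are kept as a leading partial group.
    """
    gops = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            gops.append("")
        gops[-1] += frame_type
    return gops


print(split_into_gops("IBBPBBPIBBP"))  # ['IBBPBBP', 'IBBP']
```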
The MPEG coder is also responsible for the definition of the number of B-frames in one GOP and for the order of transmission, which should be such as to permit a correct decoding of the stream. For example, if a GOP contains B-frames, the decoding efficiency will be greatly enhanced and the decoder design simplified if the B-frame is sent after and not between the two reference frames used for its prediction. The normal display order is then re-established before the output of the decoder.
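The reordering described above can be sketched in a few lines. Frames are represented as labels such as "I0" or "B1" (type letter plus display index); this labeling and the function names `transmission_order` and `display_order` are assumptions made for the illustration.

```python
def transmission_order(display):
    """Move each run of B-frames behind the reference frame that follows it,
    so both of its reference frames arrive before the B-frame does."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            out.append(frame)      # reference frame (I or P) goes first
            out.extend(pending_b)  # then the B-frames that precede it on screen
            pending_b = []
    out.extend(pending_b)  # trailing B-frames with no later reference in this list
    return out


def display_order(transmitted):
    """Decoder side: output B-frames at once, but hold each reference
    frame until the next reference frame arrives."""
    out, held_ref = [], None
    for frame in transmitted:
        if frame.startswith("B"):
            out.append(frame)
        else:
            if held_ref is not None:
                out.append(held_ref)
            held_ref = frame
    if held_ref is not None:
        out.append(held_ref)
    return out


gop = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
sent = transmission_order(gop)
print(sent)                 # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
print(display_order(sent))  # ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

The held reference frame in `display_order` corresponds to the extra decoder storage that B-frames require.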
Figure 6.3 MPEG-2 group of pictures (frame sequence I B B P B B P, with the predictions marked).
Figure 6.4 Workflow of MPEG-2 coding. Block labels: 601 bit stream, motion estimation, temporal, spatial, quantizer, VLC, buffer, rate control. Signal labels: variable bit rate, motion vectors, compressed video data with a fixed bit rate.
Such a bit-rate reduction method, or more precisely set of methods, produces a bitstream of a variable bit rate reflecting the variable picture content. However, transmission and recording channels usually require constant bit rates, and, for that reason, at the end of the MPEG encoding process the compressed signal is fed to a “smoothing buffer” and the data are subsequently read from the buffer at a constant data rate (see Figure 6.4). At the same time the buffer monitors the amount of stored data and sends control signals back to the coder. If the buffer is too full, the control signal pushes coding to a higher compression level (coarser quantization), thus reducing the output bit rate. When the level of data in the buffer becomes too low, the control signal switches the coding to a lower compression level, and this results in a higher bit rate at the output.
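The buffer feedback loop can be illustrated with a toy simulation. Everything numeric here is an assumption made for the sketch: the model in which a frame produces `complexity // q` units of data, the 75% and 25% occupancy thresholds, and the single-step adjustment of the quantizer scale `q`. Real encoders use considerably more elaborate rate-control models.

```python
def simulate_rate_control(complexities, drain_per_frame, capacity,
                          q=4, q_min=1, q_max=31):
    """Return a list of (q, buffer fullness) after each coded frame.

    A larger `q` stands for coarser quantization, hence fewer bits.
    """
    fullness = capacity // 2
    history = []
    for complexity in complexities:
        produced = complexity // q  # data written into the buffer
        # The channel reads a constant amount per frame. Clamping keeps
        # the toy model simple; a real buffer must never overflow or underflow.
        fullness = max(0, min(capacity, fullness + produced - drain_per_frame))
        if fullness * 4 > capacity * 3 and q < q_max:
            q += 1  # buffer too full: compress harder, lower bit rate
        elif fullness * 4 < capacity and q > q_min:
            q -= 1  # buffer nearly empty: compress less, higher bit rate
        history.append((q, fullness))
    return history


busy = simulate_rate_control([400] * 4, drain_per_frame=50, capacity=200)
quiet = simulate_rate_control([40] * 3, drain_per_frame=50, capacity=200)
print([q for q, _ in busy])   # [4, 5, 6, 7]
print([q for q, _ in quiet])  # [4, 3, 2]
```

Demanding picture content fills the buffer and drives the quantizer scale up, while undemanding content drains it and lets the scale come back down, which is the behavior described in the paragraph above.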