Key to Success
3. Statistical redundancy is constituted by elements that are regularly repeated, including the horizontal and vertical sync pulses, and can be
6.9 The MPEG-4 Compression System
Figure 6.6 a) Feedback and b) feedforward control of the quantization process.
The MPEG-4 compression is most frequently described as an object-based compression method, where an object is any part of a picture that can be assessed and processed independently. By using a larger number of additional lossless prediction tools, this compression system achieves a considerably higher efficiency than its predecessors. In the case of MPEG-2 and other previously described compression methods, the whole picture content is described with pixels creating fields and pictures. MPEG-4 has a different approach based on additional tools used for image description, and it requires a whole set of new definitions, almost a new vocabulary. Therefore, before offering a short explanation of the workings of that complex compression system it is necessary to explain the basic definitions of that new vocabulary:
• Video object is a planar pixel array of any shape that changes its shape or position with time.
• Video object plane is the equivalent of a picture in MPEG-2. In MPEG-2 a moving object changes its position from one picture to another, while in MPEG-4 a video object intersects object planes.
• Still object or sprite is a planar pixel array that does not change with time.
• Texture is the appearance of a part of the picture.
• Mesh object is a 2D or 3D shape that changes its position and shape with time. By using computer-based processing methods it is possible to apply a texture on the mesh (the warping process) and obtain rendered images.
• Face and body animation is used in transmissions at very low bit rates, such as videoconferencing or video telephones, where it is used to recreate facial and body movements at the decoder.
The MPEG-2 compression method achieves a considerable bit-rate reduction by using motion-compensated prediction and creating P- and B-frames. The motion compensation is based on detecting movement in regular fixed-size areas of the picture called macroblocks. If a moving object is not aligned with the boundaries of a macroblock, the desired smoothness of the movement rendition and the overall decoded picture quality require the transmission of a higher residual bit rate, which lowers the efficiency of the compression process.
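The macroblock alignment problem can be illustrated with a minimal sketch of exhaustive block matching. This is not the search used by any particular MPEG-2 encoder; the frame layout, the 4 x 4 block size, the search range and the function names (`make_frame`, `sad`, `best_vector`) are assumptions chosen only for illustration. A block that lies wholly inside the moving object finds a perfect match, while a block that contains both object and static background cannot be matched by any single vector and leaves a residual.

```python
def make_frame(obj_x, width=12, height=4, obj_w=4):
    """Toy frame: a horizontal ramp background with a flat object (value 200)
    occupying columns obj_x .. obj_x + obj_w - 1 on every row."""
    frame = [[10 * x + y for x in range(width)] for y in range(height)]
    for row in frame:
        for x in range(obj_x, obj_x + obj_w):
            row[x] = 200
    return frame


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_vector(cur, ref, bx, by, n=4, search=2):
    """Try every displacement within +/- search pixels that keeps the
    reference block inside the frame; return (vector, residual cost)."""
    height, width = len(ref), len(ref[0])
    best, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside_x = 0 <= bx + dx and bx + dx + n <= width
            inside_y = 0 <= by + dy and by + dy + n <= height
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# In both cases the object moves one pixel to the right between frames.
# Case 1: in the current frame the object exactly fills the block at x = 4.
aligned = best_vector(make_frame(4), make_frame(3), bx=4, by=0)
# Case 2: the object straddles the boundary between the blocks at x = 4 and x = 8.
straddling = best_vector(make_frame(6), make_frame(5), bx=4, by=0)
print("aligned block:   ", aligned)
print("straddling block:", straddling)
```

In the aligned case the residual cost is zero, so only the vector needs to be sent. In the straddling case the best vector still follows the object, but the static background pixels in the same block no longer match, and that mismatch is the extra residual the text refers to.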
In the case of MPEG-4, moving objects are assessed separately as arbitrary shapes and processed separately from the background. In that way considerably less residual data are transmitted. In the MPEG-2 encoding system, the first frame is encoded; then the differences in the consecutive frames that result from the object’s movement are detected, encoded and sent forward together with the motion vectors that describe the movement itself. In the case of MPEG-4 encoding, only the first frame is coded, and the moving object in the following frames is then described by motion vectors only, which requires fewer bits. Several moving objects as well as the background can be separately assessed and processed. The data resulting from these separate processes are multiplexed and sent as a single bitstream. At the other end, the decoder acts as a multilayer compositing vision mixer. It receives the background data, decodes it and keys in the texture of processed moving video objects and mesh objects. If the background shifts due to camera movements, it can be treated as another video object and shifted at the decoder by using information from motion vectors.
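The decoder's role as a compositing mixer can be sketched as follows. This is a toy model, not the MPEG-4 decoding process: the `composite` function, the representation of an object as a (texture, mask, position) triple and the sample values are all assumptions made for illustration. It shows the idea that, once the object's texture and shape are known, a new picture can be produced from a new position alone.

```python
def composite(background, objects):
    """Key each object into a copy of the background, in list order.

    Each object is a (texture, mask, (x, y)) triple: `mask` marks which
    pixels of the rectangular `texture` belong to the object's shape, and
    (x, y) is where the top-left corner of the texture lands in the picture.
    """
    out = [row[:] for row in background]
    for texture, mask, (ox, oy) in objects:
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if not inside:
                    continue  # outside the shape: the background shows through
                ty, tx = oy + y, ox + x
                if 0 <= ty < len(out) and 0 <= tx < len(out[0]):
                    out[ty][tx] = texture[y][x]
    return out


background = [[0] * 6 for _ in range(4)]
texture = [[5, 6],
           [7, 8]]
mask = [[1, 0],   # the top-right pixel is not part of the object
        [1, 1]]

# First picture: object keyed in at (1, 1).
frame1 = composite(background, [(texture, mask, (1, 1))])
# Next picture: same texture and shape, displaced by a motion vector of (+2, 0).
frame2 = composite(background, [(texture, mask, (3, 1))])
for row in frame1 + [[]] + frame2:
    print(row)
```

Because the objects are applied in list order, later entries are keyed over earlier ones, which mirrors the layered behaviour of a vision mixer.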
MPEG-4 is a very complex and very successful bit-rate reduction method featuring a large number of different tools. Some even believe that a codec equipped with all options offered by this method will never be built. Therefore, the following is not a full explanation of all essential MPEG-4 compression aspects but rather presents a general description of the video object coding method as a sample explanation of the workings of several MPEG-4 compression tools.
As shown in Figure 6.7, the incoming picture is analyzed, and the moving video object (Object 1) is detected. A bounding rectangle is defined around the moving object, encompassing the entirety of the object. The coordinates of the bounding rectangle are then coded. The bounding rectangle can change from one video object plane to another, in line with the object movement. Inside the bounding rectangle all original pixels outside the shape of Object 1 are replaced by texture padding data. Padding data are selected in such a way as to produce minimal DCT coefficients that can be discarded in the process, leaving only the texture contained inside the boundaries of Object 1. The shape of video Object 1 is then coded in a shape coding process known as context-based coding, which relies heavily on prediction, while the texture of the object is coded separately using a conventional DCT coding process. At the same time, the background is coded either as a still sprite or as another object (Object 2) that can also change over time. If the background is a still picture, it can be coded by using a wavelet-based compression. The wavelet transform is a mathematical process that, like the DCT coding explained earlier, does not itself perform the bit-rate reduction but simply represents the picture in a form that facilitates the detection of redundancy, which is then rejected by other means. The name “wavelet” is a literal translation of the French word “ondelette,” coined by French mathematicians who were instrumental in developing this transform.
Figure 6.7 MPEG-4 video object coding.
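The bounding rectangle and texture padding steps of the object coding description can be sketched in a few lines. This is an illustration under stated assumptions, not the padding procedure defined by the standard: the function names are hypothetical, and filling the outside pixels with the mean of the object's pixels is a simple stand-in chosen because a smooth fill avoids introducing sharp edges that would produce large DCT coefficients.

```python
def bounding_rectangle(mask):
    """Smallest rectangle enclosing all non-zero mask pixels, returned as
    (x0, y0, x1, y1) with x1 and y1 exclusive, or None for an empty mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def pad_texture(frame, mask, rect):
    """Cut the bounding rectangle out of the frame and replace every pixel
    outside the object's shape with the mean of the pixels inside it.
    (Mean fill is an assumed stand-in for the real padding rule.)"""
    x0, y0, x1, y1 = rect
    inside = [frame[y][x]
              for y in range(y0, y1) for x in range(x0, x1) if mask[y][x]]
    fill = round(sum(inside) / len(inside))
    return [[frame[y][x] if mask[y][x] else fill for x in range(x0, x1)]
            for y in range(y0, y1)]


frame = [[9, 9, 9, 9, 9],
         [9, 9, 40, 9, 9],
         [9, 44, 42, 46, 9],
         [9, 9, 9, 9, 9]]
mask = [[0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0]]

rect = bounding_rectangle(mask)
padded = pad_texture(frame, mask, rect)
print(rect)
print(padded)
```

Only the rectangle coordinates, the shape (the mask) and the padded texture would go on to the respective coders; the background pixels that originally sat inside the rectangle are no longer part of the object's data.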
In this way, the shape data and the shape movement as well as the texture data and texture movement of all objects and backgrounds are separately analyzed and coded, and the results of all these separate coding processes are multiplexed (combined to represent a single signal) to create one single MPEG-4 bitstream.
At the decoder side, the multiplexed bitstream is separated into its component streams, which are decoded separately to recreate the background and to key the video objects into it.
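The principle of multiplexing several separately coded streams into one bitstream, and separating them again at the decoder, can be shown with a toy container. The chunk layout used here (a one-byte stream identifier followed by a two-byte length and the payload) is purely hypothetical and is not the MPEG-4 systems syntax; the stream numbers and payloads are placeholders.

```python
import struct

HEADER = ">BH"  # assumed toy header: 1-byte stream id, 2-byte big-endian length
HEADER_SIZE = struct.calcsize(HEADER)


def multiplex(chunks):
    """Interleave (stream_id, payload) chunks into one byte string."""
    out = bytearray()
    for stream_id, payload in chunks:
        out += struct.pack(HEADER, stream_id, len(payload))
        out += payload
    return bytes(out)


def demultiplex(bitstream):
    """Split the byte string back into one payload per stream id,
    concatenating chunks of the same stream in arrival order."""
    streams = {}
    pos = 0
    while pos < len(bitstream):
        stream_id, length = struct.unpack_from(HEADER, bitstream, pos)
        pos += HEADER_SIZE
        streams.setdefault(stream_id, bytearray()).extend(
            bitstream[pos:pos + length])
        pos += length
    return {stream_id: bytes(data) for stream_id, data in streams.items()}


# Placeholder payloads standing in for coded background, shape and texture data.
chunks = [(0, b"bg"), (1, b"shape"), (2, b"tex"), (1, b"more")]
muxed = multiplex(chunks)
streams = demultiplex(muxed)
print(len(muxed), streams)
```

Each recovered stream would then be passed to its own decoder (background, shape or texture) before the results are composited.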
Besides video object coding, the complete set of MPEG-4 compression tools for broadcasting applications also encompasses other sophisticated methods that can ensure important additional savings in bit rate. The use of all these tools ensures an impressive compression efficiency.
The latest extension of MPEG, H.264 AVC (advanced video coding), can be considered a sort of extreme refinement and great improvement of MPEG-2 and MPEG-4 compression tools. H.264 is not based on object coding but on the revision and ultimate refinement of all previously developed solutions. It is an extremely complex system requiring considerably greater processing power on both the coding and decoding sides, but offering as a reward two to two-and-a-half times better compression efficiency than MPEG-2.
Although H.264 or the proprietary Windows Media 9 compression methods offer considerably higher compression efficiency than MPEG-2, they are not meant to replace it in existing applications. In the domains of standard-definition digital broadcasting or television production, MPEG-2 is firmly implanted, and it would be difficult and probably counterproductive to change all the adopted standards. In addition, a very large amount of equipment has already been produced in accordance with that compression method. On the other hand, new applications such as DVB-H (transmission of video content to handheld devices) or some areas of HDTV production are domains that could benefit from these new highly efficient methods.