Video compression: Performance of available codec software
1 Introduction
1.1 Digital Video
A digital video is a collection of images presented sequentially to produce the effect of continuous motion. It takes advantage of the spatio-temporal properties of the human eye to simulate continuity in motion. Because of persistence of vision, even a very brief exposure to an image leaves an impression on the retina that lasts for milliseconds. Hence, images presented in sufficiently rapid succession appear continuous. In general, the eye cannot differentiate between individual images when they are played at a rate of 25 per second or higher. Several television standards, such as NTSC and PAL, define the frame rate of the video being displayed. The frame rate varies from 25 fps to 60 fps depending on the standard. A video file consists of the individual images (also known as frames) and the sequencing information.
1.2 The Size Barrier
Consider a video that is played at 30 images per second. For a 640x480 grayscale video stored in a raw, lossless format at one byte per pixel, that is 640x480x30 bytes per second. For a 30 minute video, this comes to approximately 16.6 GB. For a colour video, using three bytes per pixel, that is about 50 GB, not including the audio and the sequencing information. This is almost the capacity of two single-layer Blu-ray discs for a small SD video. For some modern HD transmissions, the frame sizes are as high as 1920x1080, which works out to more than 300 GB for the same 30 minutes when uncompressed.
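The arithmetic above can be checked with a short Python sketch. The function name is illustrative, and decimal gigabytes (10^9 bytes) are assumed.

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video, ignoring audio and sequencing data."""
    return width * height * bytes_per_pixel * fps * seconds

HALF_HOUR = 30 * 60  # duration in seconds

cases = [
    ("640x480 grayscale", 640, 480, 1),
    ("640x480 colour", 640, 480, 3),
    ("1920x1080 colour", 1920, 1080, 3),
]
for label, width, height, bytes_per_pixel in cases:
    size = raw_video_bytes(width, height, bytes_per_pixel, 30, HALF_HOUR)
    print(f"{label}: {size / 1e9:.1f} GB")
```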
1.3 Video Compression
Transmitting videos of such huge sizes is impractical. To reduce the size of the video to manageable proportions, videos are almost never stored or transmitted in the raw format. Even in situations where compression is not strictly required, the video is still compressed. This is because the human eye is insensitive to higher frequencies and to minute variations in colour, so transmitting this information would be a waste of resources. The compression method is chosen based upon the application and the bandwidth constraints under which the video is used.
Compression may be classified as follows.
I. Based upon reproducibility, into
a. Lossless compression
As the name indicates, videos compressed using this method can be reconstructed exactly, without any change in data. Some methods which perform lossless compression are Huffman coding and run-length coding. The amount of compression achieved using these methods is much lower than that achieved using lossy methods (which are discussed next). Further, the amount of compression depends greatly upon the content of the video.
b. Lossy compression
This compression is performed by dropping information which does not significantly affect how the video is perceived. For example, the human eye is insensitive to high frequencies and does not recognize minor variations in colour. Hence this information can be dropped while encoding the video. Methods such as JPEG (for still images) perform lossy compression.
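The difference between the two classes can be illustrated with a minimal Python sketch. It uses run-length coding as the lossless method and a simple quantisation step as a stand-in for the lossy stage. The function names and sample values are illustrative, and real lossy codecs discard information in a transform domain rather than on raw pixel values.

```python
def rle_encode(data):
    """Lossless: collapse each run of equal values into a [value, count] pair."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

def rle_decode(runs):
    """Exact inverse of rle_encode."""
    return bytes(value for value, count in runs for _ in range(count))

def quantise(data, step):
    """Lossy: snap each value down to a multiple of `step`, dropping minor variations."""
    return bytes((value // step) * step for value in data)

row = bytes([100, 101, 100, 102, 101, 100, 103, 102])  # nearly flat row of pixels
lossless = rle_encode(row)
lossy = rle_encode(quantise(row, 4))

print(len(lossless), "runs without loss;", len(lossy), "run after quantisation")
print("lossless round trip exact:", rle_decode(lossless) == row)
print("lossy round trip exact:", rle_decode(lossy) == row)
```

The nearly flat row gains nothing from run-length coding alone, which shows how strongly lossless compression depends on content. After quantisation the row collapses into a single run, at the cost of no longer reproducing the original values.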
II. Based upon where the compression is performed, into
a. Intraframe compression
This method takes advantage of the spatial redundancy present in each frame of the video and compresses each frame independently using one of the compression methods above. In general, indoor videos have a uniform, non-changing background and hence a very high spatial redundancy, so they can be compressed greatly.
b. Interframe compression
This method identifies the temporal redundancies between consecutive frames in a video and attempts to remove them. Usually, videos do not have many scene changes and hence have a lot of temporal redundancy.
Good video formats usually implement both interframe and intraframe compression techniques.
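The idea behind interframe compression can be sketched in a few lines of Python. This is a deliberately simplified illustration that stores only the pixels that changed since the previous frame. Practical codecs use motion-compensated prediction instead, and the function names are hypothetical.

```python
def frame_delta(previous, current):
    """Keep only (index, new_value) pairs for pixels that differ between frames."""
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current)) if p != c]

def apply_delta(previous, delta):
    """Rebuild the current frame from the previous frame and the stored changes."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame

frame0 = [10] * 16   # a 4x4 frame with a uniform background
frame1 = list(frame0)
frame1[5] = 200      # one pixel changes in the next frame

delta = frame_delta(frame0, frame1)
print(delta)                                  # only the changed pixel is stored
print(apply_delta(frame0, delta) == frame1)   # the frame is recovered exactly
```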
1.4 Video encoding formats
A video encoding format is a representation for compressed video. Such a format specifies the representation of each frame, the sequencing information between frames and compression and decompression methods for inter and intra frame redundancy.
Although maximum compression is the target, all formats usually retain a certain amount of redundancy. This maintains performance in environments where frames are dropped or data is lost. Mechanisms that resist error propagation are part of the specifications of all video encoding formats. These also assist in seeking within the data. Without them, every time we watched a movie we would have to start from the beginning, without being able to cue forward.
Some popular formats for video encoding are:
• WMV
• MPEG-1
• MPEG-4
• ASF
1.5 Container formats
Container formats are different from encoding formats. They hold combinations of encoded video and audio streams. They specify the bit rates of the audio and video and help maintain the synchronization between the two. Some containers are designed to hold only a specific combination of audio and video formats, while others are capable of holding several combinations (but only one combination at a time). Two popular container formats are .avi and .wmv. An .avi file can hold several video formats, including MPEG-4 and MPEG-1.
Some video encoding formats, for example MPEG-1 and MPEG-4, are containers in themselves and are capable of holding both audio and video.
1.6 Codec
Codec is short for coder-decoder. A codec is capable of encoding a set of images into a video and decoding a video into a set of images. Each image usually constitutes a frame in the video. However, several additional frames are added for the reasons discussed earlier.
Each codec is capable of working with only a specific video format. However, several codecs can exist for a single format. Usually, each multimedia company has its own codec for a given format, for use in its own player.
For example, theora and mov are both codecs for the MPEG-4 format. Codecs can be implemented in either software or hardware. Software codecs are slower but less expensive than hardware codecs, which are much faster.
The specifications for a format are not rigid and provide for some variations. Although codecs implement a specified format, they may vary in their method of operation resulting in variations in quality and performance.
2 Codec Evaluation
With the ever increasing need for bandwidth, codec designers tend to be overly aggressive and design algorithms which might badly affect the aesthetics of the video content. Hence, evaluation criteria for codec performance are required to verify the quality of the compressed videos.
2.1 Criteria for Comparison
The codecs are compared based on the following criteria
1. Quality of Video
2. Performance of the codec
2.2 Quality of video
Quality of video corresponds to the look and feel of the video: the resolution, the artifacts, the blurring and other visual aesthetic components. The quality of video depends on both the format of the video and the codec used to encode to that format. Usually, several codecs implement a single format, but each one differs from the others. Quality also depends on the amount of information in the video being encoded. Also, the codec's performance will not be constant throughout the video: clips with more information show more artifacts than scenes with little movement and few scene changes. Quality can be measured objectively or subjectively.
2.2.1 Objective Quality
Objective quality measures quality in mathematical terms, which makes it easy to compare and evaluate. Some of the metrics available to measure objective quality are:
a. Mean Square Error (MSE): The mean of the squared pixel-wise difference between the original frame and the encoded frame, i.e. the second moment of the difference.
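b. Peak Signal to Noise Ratio (PSNR): The ratio between the maximum signal level and the noise. Mathematically, it is given by:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
where MAX is the maximum possible pixel value (255 for 8-bit samples) and MSE is the mean square error between the two frames. PSNR is expressed in decibels.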
c. Colour Difference: This is the absolute difference of the individual colour components between the input frame and the output frame. It is calculated by
d. Structural Similarity (SSIM) [2]: This is used to measure the similarity between two images. It is a function of luminance, contrast and structure. For typical images it is a number between 0 and 1, where 1 indicates identical images. It is independent of the colour components.
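The first three metrics can be sketched in plain Python as follows. The frame representation (flat lists of 8-bit samples, or lists of (R, G, B) tuples) is an assumption made for illustration. The per-channel mean used for the colour difference is one plausible reading of the description above, not necessarily the exact formula used in this study. SSIM is omitted because it needs windowed statistics; see [2].

```python
import math

def mse(original, encoded):
    """Mean of squared sample differences between two equally sized frames."""
    return sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)

def psnr(original, encoded, max_value=255):
    """Peak signal to noise ratio in dB; infinite when the frames are identical."""
    error = mse(original, encoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)

def mean_abs_colour_difference(original, encoded):
    """Per-channel mean absolute difference for frames given as (R, G, B) tuples.

    Assumption: averaging over pixels, one value per colour component.
    """
    count = len(original)
    return tuple(
        sum(abs(p[channel] - q[channel]) for p, q in zip(original, encoded)) / count
        for channel in range(3)
    )

gray_in, gray_out = [0, 0, 0, 0], [0, 0, 0, 2]
print(mse(gray_in, gray_out), round(psnr(gray_in, gray_out), 2))

rgb_in = [(10, 20, 30), (10, 20, 30)]
rgb_out = [(12, 20, 29), (10, 20, 31)]
print(mean_abs_colour_difference(rgb_in, rgb_out))
```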
2.2.2 Subjective Quality
Subjective quality is measured by visually inspecting the encoded video for artifacts, blurring, blocking and overall quality.
2.3 Performance of the Codec
The performance of the codec is measured as a function of three quantities:
1. Compression ratio of the codec (file size)
2. Speed of encoding (compression)
3. Speed of decoding (decompression)
2.3.1 Compression ratio of the codec
The compression ratio of the codec is measured by encoding a repetitive set of frames using each codec to yield videos of different formats. The ratio of the file size of the encoded video to that of the uncompressed video acts as a measure of the capacity for compression. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.
2.3.2 Encoding and Decoding speed
The encoding and decoding speeds vary from codec to codec, and within the same codec for different frames. The higher the redundancy, the slower the encoding and the smaller the file. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.
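The three performance quantities can be measured with a small harness like the Python sketch below. It uses zlib from the standard library purely as a stand-in compressor so that the example is self-contained. zlib is not a video codec and is not one of the codecs evaluated in this study, and the frame contents are synthetic.

```python
import random
import time
import zlib

def measure(raw):
    """Return (encoded size / raw size, encode seconds, decode seconds)."""
    start = time.perf_counter()
    packed = zlib.compress(raw)
    encode_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(packed)
    decode_seconds = time.perf_counter() - start

    if restored != raw:
        raise ValueError("round trip failed")
    return len(packed) / len(raw), encode_seconds, decode_seconds

random.seed(0)
FRAME_BYTES = 352 * 288  # one byte per pixel, an assumption for this sketch
uniform = bytes(FRAME_BYTES * 10)  # ten all-black frames: best case
noisy = bytes(random.getrandbits(8) for _ in range(FRAME_BYTES * 10))  # worst case

best = measure(uniform)
worst = measure(noisy)
print(f"best case size ratio:  {best[0]:.4f}")
print(f"worst case size ratio: {worst[0]:.4f}")
```

Timings from a single run are noisy, so in practice each measurement would be repeated and averaged.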
2.4 Bit Rates
Bit rate is measured in kilobits per second (kbps) and represents the amount of data flow per unit time. It is an important factor in deciding the quality of the video. For example, consider a video which has a bit rate of 1000 kbps. For a standard definition video at about 30 frames per second, the 1000 kilobits must be shared among about 30 frames, i.e. about 33 kilobits, or roughly 4 kilobytes, per frame. This restricts the amount of data that can be used to represent a frame. Lower bit rates mean higher compression and lower quality of video: more noise, blocking, discolouration etc.
Application          Bit Rates
a. Video streaming   100-500 kbps
b. SD video          500-2000 kbps
c. HD video          >2000 kbps
By measuring each of the quantities discussed in 2.3 and 2.4, we will be able to identify the appropriate codec for a specific application.
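The per-frame budget worked out in section 2.4 can be reproduced with a short Python sketch. It assumes that a kilobit is 1000 bits and that the video plays at 30 frames per second; the function name is illustrative.

```python
def bits_per_frame(bit_rate_kbps, fps):
    """Average number of bits available to represent one frame."""
    return bit_rate_kbps * 1000 / fps

for rate in (600, 1000, 3000):
    bits = bits_per_frame(rate, 30)
    print(f"{rate} kbps -> {bits / 1000:.1f} kilobits, {bits / 8 / 1000:.1f} kilobytes per frame")
```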
3 Implementation
3.1 Codecs
The following codecs are being evaluated in this study
Sl. No   Codec       Designer/Developer   Format    Container
1        WMV2        Microsoft            wmv       wmv
2        Theora      Xiph.org             MPEG-4    avi
3        Asf         Microsoft            asf       asf
4        MPEG4       MPEG                 MPEG-4    mp4
5        Quicktime   Apple                MPEG-4    mov
6        MPEG-1      MPEG                 MPEG-1    mpeg
All codecs are part of the ffmpeg library.
3.2 Dataset
The following videos are used for evaluation of the codecs. The reason for selection of the video is also described. All videos are of 352x288 pixel dimension, but may appear stretched in this document.
3.2.1 Quality Measurement
3.2.1.1 Akiyo
Figure 1 A frame from the Akiyo video sequence
This is a 300 frame video in the uncompressed YUV format. This video shows a news reader. It has no background changes and almost negligible foreground changes.
3.2.1.2 Foreman
Figure 2 A frame from the Foreman video sequence
This is a 300 frame video in the uncompressed YUV format. This video has a sudden scene change at the end. Other than that, there is no background change. Only the face changes, showing rich emotions which can be hard to compress.
3.2.1.3 Football
Figure 3 A frame from the Football video sequence
This is a 125 frame video in the uncompressed YUV format. This video has a constant background and a very rapid and large change in the foreground as players keep coming into and going out of the frame.
3.2.1.4 Stephan
Figure 4 A frame from the Stephan video sequence
This is a recording of Stephan Edberg’s tennis match. This is 300 frames in length and is also in the uncompressed YUV format. This video has a fast foreground change as the player runs about, and a background change as the camera follows him. This would be the hardest kind of natural video to encode.
Video Foreground Change Background Change
Akiyo • •
Foreman √ •
Football √ •
Stephan √ √
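The following Python sketch shows how raw YUV sequences of this kind could be split into frames. The source only states that the files are uncompressed YUV, so the planar 4:2:0 layout (a full-resolution luma plane followed by two quarter-size chroma planes) is an assumption, and the function names are hypothetical.

```python
WIDTH, HEIGHT = 352, 288

def yuv420_frame_bytes(width, height):
    """Bytes per frame for planar YUV 4:2:0 with 8-bit samples (assumed layout)."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return luma + 2 * chroma

def split_frames(raw, width, height):
    """Cut a raw byte buffer into complete frames, ignoring any trailing partial frame."""
    size = yuv420_frame_bytes(width, height)
    return [raw[i:i + size] for i in range(0, len(raw) - size + 1, size)]

frame_size = yuv420_frame_bytes(WIDTH, HEIGHT)
print(frame_size, "bytes per frame;", frame_size * 300, "bytes for 300 frames")

# Synthetic buffer standing in for a file read with open(path, "rb").read()
frames = split_frames(bytes(frame_size * 3), WIDTH, HEIGHT)
print(len(frames), "frames")
```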
3.2.2 Performance Measurement
In order to measure the performance in terms of compression ratio and speed of encoding, I have proposed a set of frames as shown below. These frames will together allow us to measure the best and worst case scenarios.
Alternate frames   Spatial Redundancy   Temporal Redundancy
3.2.2.1            100%                 100%
3.2.2.2            100%                 0%
3.2.2.3            ≈0%                  100%
3.2.2.4            ≈0%                  ≈0%
These pairs of alternating frames incorporate the best and worst case scenarios for compression.
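Test sequences with these four redundancy profiles could be generated along the lines of the Python sketch below. The frames actually used in the study are not reproduced in this text, so flat frames and random-noise frames are stand-ins chosen for illustration, and all names are hypothetical.

```python
import random

W, H = 352, 288

def flat(value):
    """A frame in which every pixel is equal: maximal spatial redundancy."""
    return bytes([value]) * (W * H)

def noise(rng):
    """A frame of random pixels: almost no spatial redundancy."""
    return bytes(rng.getrandbits(8) for _ in range(W * H))

def make_sequences(pairs=5, seed=0):
    """Build one alternating-frame sequence per row of the redundancy table."""
    rng = random.Random(seed)
    white, black = flat(255), flat(0)
    fixed = noise(rng)
    return {
        "high_spatial_high_temporal": [white, white] * pairs,
        "high_spatial_low_temporal": [black, white] * pairs,
        "low_spatial_high_temporal": [fixed, fixed] * pairs,
        "low_spatial_low_temporal": [noise(rng) for _ in range(2 * pairs)],
    }

sequences = make_sequences(pairs=2)
for name, frames in sequences.items():
    repeated = all(a == b for a, b in zip(frames, frames[1:]))
    print(f"{name}: {len(frames)} frames, consecutive frames identical: {repeated}")
```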
3.2.3 Bit Rates
In order to cover the entire range of applications, the videos will be encoded to the following Bit rates:
a. 600 kbps
This is the range at which YouTube plays its videos.
b. 1,000 kbps
This is the bit rate generally used in video conferencing.
c. 3,000 kbps
These bit rates are generally used in optical disc playback.
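As a rough illustration, the Python sketch below builds ffmpeg command lines that encode a raw sequence at each target bit rate. The study used the ffmpeg library, so the command-line form, the file names, the 30 fps frame rate and the yuv420p pixel format are all assumptions made here. The commands are only printed, not executed.

```python
import shlex

def ffmpeg_command(source, encoder, bit_rate_kbps, output,
                   size="352x288", fps=30, pixel_format="yuv420p"):
    """Build an ffmpeg argument list for encoding a headerless raw video file."""
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo",               # input has no header, so describe it
        "-pixel_format", pixel_format,  # assumed layout of the raw file
        "-video_size", size,
        "-framerate", str(fps),         # assumed playback rate
        "-i", source,
        "-c:v", encoder,
        "-b:v", f"{bit_rate_kbps}k",    # target video bit rate
        output,
    ]

commands = [
    ffmpeg_command("akiyo.yuv", "mpeg4", rate, f"akiyo_{rate}.mp4")
    for rate in (600, 1000, 3000)
]
for command in commands:
    print(shlex.join(command))
```

Each list could be passed to subprocess.run on a machine with ffmpeg installed; the encoder name would be changed for each codec under test.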
4 Results and Discussion
As part of the exercise, I was able to measure most of the evaluation parameters. However, due to issues with the ffmpeg library, I did not get an accurate measure of the encoding and decoding times.
4.1 Akiyo
Figure 5 Mean Squared Error for Akiyo [plot: MSE at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 6 PSNR for Akiyo [plot: PSNR at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 7 Absolute Colour distance for Akiyo [plot: absolute colour distance at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 8 Structural Similarity for Akiyo [plot: SSIM at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 9 Mean Squared Error for Football [plot: MSE at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 10 PSNR for Football [plot: PSNR at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 11 Absolute Colour Distance for Football [plot: absolute colour distance at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 12 Structural Similarity for Football [plot: SSIM at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 13 Mean Squared Error for Foreman [plot: MSE at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 14 PSNR for Foreman [plot: PSNR at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 15 Absolute Colour Distance for Foreman [plot: absolute colour distance at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 16 Structural Similarity for Foreman [plot: SSIM at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 17 Mean Squared Error for Stephan [plot: MSE at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 18 PSNR for Stephan [plot: PSNR at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 19 Absolute Colour Distance for Stephan [plot: absolute colour distance at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 20 Structural Similarity for Stephan [plot: SSIM at 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1]
Figure 21 File sizes with high spatial and low temporal redundancy [plot "B-W File Sizes": size in KB for mpeg-1, Qt, mpeg-4, asf, theora and wmv2]
Figure 22 File sizes with high spatial and temporal redundancy [plot "W-W File Sizes": size in KB for mpeg-1, Qt, mpeg-4, asf, theora and wmv2]
Figure 23 File sizes with low spatial and high temporal redundancy [plot "C-C File Sizes": size in KB for mpeg-1, Qt, mpeg-4, asf, theora and wmv2]
Figure 24 File sizes with low spatial and low temporal redundancy [plot "C-N File Sizes": size in KB for mpeg-1, Qt, mpeg-4, asf, theora and wmv2]
Following are some sample frames from the encoded videos
Figure 25 Counter Clockwise from the top – a frame from the Stephan video – original frame, wmv encoded at 600kbps and wmv encoded at 3000kbps
Figure 26 Counter Clockwise from the top – a frame from the Akiyo video – original frame, wmv encoded at 600kbps and wmv encoded at 1000kbps
In Figure 25, the distortion is clearly visible when encoded at 600kbps, but at 3000kbps it is almost negligible. However, in Figure 26, there is no visible distortion even at 600kbps. This implies that the encoding process is also sensitive to the content of the video.
5 Conclusion
Selection of a format for encoding or representation depends upon the application which uses the video. The various criteria to be considered before selecting a format are:
• Application
o Transmission
Videos used for transmission and viewing over the internet require a high compression ratio. They can compromise on the quality as such videos are rarely used for important applications.
o Video Conferencing
Video conferencing applications have specific criteria when it comes to quality. They need the videos to be clear, but the frame rate can be compromised. Surveillance videos also fall into this category. The encoding and decoding speed are of significance here.
o Archiving
Videos used for this purpose do not have significant demands on encoding or decoding speed. They require higher resolution and quality with lower file sizes.
• Performance Requirements
o Real time video processing for UAVs etc.
The requirement here is for faster encoding speed and very little blurring.
o Video Viewing
Video viewing, in general, does not have many processing requirements. This is because of the availability of sufficient processing capability and the non-real-time nature of the application.
• Quality requirements
o Entertainment
o Conferencing
o Surgical procedures
6 Future Work
Possible future work includes:
a. Measuring blurring effects of the codecs.
b. Measuring blocking effects and their impact on edge detection algorithms.
c. Evaluating coding and decoding times.
d. Identifying the impact of frame size on coding speed and compression ratio.
7 References
[1] Madhuri Khambete and Madhuri Joshi, "Blur and Ringing Artifact Measurement in Image Compression using Wavelet Transform", Proceedings of World Academy of Science, Engineering and Technology, vol. 20, April 2007, ISSN 1307-6884.
[2] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh and Eero P. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004.