Video compression: Performance of available codec software

1 Introduction

1.1 Digital Video

A digital video is a collection of images presented sequentially to produce the effect of continuous motion. It takes advantage of the spatio-temporal properties of the human eye to simulate continuity in motion. Because of persistence of vision, an image that is shown only briefly lingers on the retina for a much longer time, so images shown in quick enough succession appear continuous. In general, the eye cannot differentiate between individual images when they are played at a rate of 25 per second or higher. Several television standards, such as NTSC and PAL, define the frame rate of the video being displayed. The frame rate varies from 25 fps to 60 fps depending on the standard. A video file consists of the individual images (also known as frames) and the sequencing information.

1.2 The Size Barrier

Consider a video that is played at the rate of 30 images per second. For a 640x480 grayscale video stored in a raw lossless format at one byte per pixel, that is 640x480x30 bytes per second. For a 30 minute video, this comes to approximately 16.6 GB. For a colour video, using three bytes per pixel, it comes to approximately 50 GB, not even including the audio and the sequencing information. This is about the capacity of two single-layer Blu-ray discs for a small SD video. For some modern HD transmissions, the frame sizes are as high as 1920x1080, which works out to more than 300 GB for the same 30 minutes when uncompressed.
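The arithmetic above can be checked with a short sketch. The function name and the choice of 1 GB = 10^9 bytes are assumptions made for this illustration; the frame sizes, frame rate and duration are the ones quoted in the text.

```python
def raw_video_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size in bytes of uncompressed video (no audio, no sequencing data)."""
    return width * height * bytes_per_pixel * fps * seconds


THIRTY_MINUTES = 30 * 60  # duration in seconds

gray_sd = raw_video_bytes(640, 480, 1, 30, THIRTY_MINUTES)
colour_sd = raw_video_bytes(640, 480, 3, 30, THIRTY_MINUTES)
colour_hd = raw_video_bytes(1920, 1080, 3, 30, THIRTY_MINUTES)

for name, size in [("grayscale SD", gray_sd),
                   ("colour SD", colour_sd),
                   ("colour HD", colour_hd)]:
    # Report in decimal gigabytes (1 GB = 10**9 bytes).
    print(f"{name}: {size / 1e9:.1f} GB")
# prints 16.6 GB, 49.8 GB and 335.9 GB respectively
```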

1.3 Video Compression

Transmitting videos of such sizes is impractical. To reduce the size to manageable proportions, videos are almost never stored or transmitted in the raw format. Even where bandwidth and storage are not a constraint, the video is still compressed, because the human eye is insensitive to high spatial frequencies and to minute variations in colour, so transmitting this information would waste resources. Practically every video is therefore subjected to some kind of compression. The compression method is chosen according to the application and the bandwidth constraints under which the video is used.

Compression may be classified:

I. Based upon the reproducibility, into

a. Lossless compression

As the name indicates, videos compressed using this method can be reproduced exactly, without any change in data. Some methods which perform lossless compression are Huffman coding and run-length coding. The amount of compression achieved using these methods is much lower than that achieved using lossy methods (which are discussed next). Further, the amount of compression depends greatly upon the content of the video.

b. Lossy compression

This compression is performed by dropping information which does not significantly affect the visualization of the video. For example, the human eye is insensitive to high frequencies and also does not recognize minor variations in colours. Hence this information can be dropped while encoding the video. Methods such as JPEG perform lossy compression.

II. Based upon where the compression is performed, into

a. Intraframe compression

This method takes advantage of the spatial redundancy present in each frame of the video and compresses each frame independently, using one of the compression methods described above. In general, indoor videos have a uniform, unchanging background and hence very high spatial redundancy, so they can be compressed greatly.

b. Interframe compression

This method identifies the temporal redundancies between consecutive frames in a video and attempts to remove them. Videos usually do not have many scene changes and hence have a lot of temporal redundancy.

Good video formats usually implement both interframe and intraframe compression techniques.
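As an illustration of the classification above, the following sketch applies run-length coding (a lossless method) first to single frames, and then to the difference between two consecutive frames. It is a toy under stated assumptions: the "frames" are hypothetical 16-pixel rows, and real codecs use far more elaborate transforms and motion compensation than plain differencing.

```python
def run_length_encode(values):
    """Lossless run-length coding: collapse repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def run_length_decode(runs):
    """Inverse of run_length_encode."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out


def frame_difference(previous, current):
    """Interframe residual: per-pixel change from the previous frame."""
    return [c - p for p, c in zip(previous, current)]


# A uniform frame has high spatial redundancy and collapses to one run.
flat = [128] * 16

# A textured frame has no repeated neighbours, so intraframe RLE gains nothing.
textured1 = [(i * 37) % 256 for i in range(16)]
textured2 = list(textured1)
textured2[5] += 10  # a single pixel changes between the two frames

intra = run_length_encode(textured2)
inter = run_length_encode(frame_difference(textured1, textured2))

print(len(run_length_encode(flat)), len(intra), len(inter))  # prints: 1 16 3

# Decoding is exact: previous frame plus decoded residual gives the current frame.
rebuilt = [p + d for p, d in zip(textured1, run_length_decode(inter))]
```

The textured pair shows why formats combine both techniques: a frame with little spatial redundancy can still be cheap to store when it barely differs from its predecessor.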

1.4 Video encoding formats

A video encoding format is a representation for compressed video. Such a format specifies the representation of each frame, the sequencing information between frames and compression and decompression methods for inter and intra frame redundancy.

Although maximum compression is targeted, usually, all formats have a certain amount of redundancy in them. This is to maintain performance in environments where there is frame dropping and data loss.

Error propagation resistance mechanisms are part of the specifications of all video encoding formats.

These also assist in seeking within the data. Without them, every time we watched a movie we would have to start from the beginning, without being able to cue forward.

Some popular formats for video encoding are:

• WMV

• MPEG-1

• MPEG-4

• ASF

1.5 Container formats

Container formats are different from encoding formats. They hold combinations of encoded video and audio streams. They specify the bit rates of the audio and video and help maintain synchronization between the two. Some containers are designed to hold only a specific combination of audio and video formats, while some are capable of holding several combinations (but only one combination at a time). Two popular container formats are .avi and .wmv. An .avi file can hold several video formats, including MPEG-4 and MPEG-1.

Some video encoding formats are containers in themselves and are capable of holding both audio and video, for example MPEG-1 and MPEG-4.

1.6 Codec

The term codec is a contraction of coder-decoder. A codec is capable of encoding a set of images into a video and decoding a video into a set of images. Each image usually constitutes a frame in the video. However, several additional frames are added for the reasons discussed in Section 1.4 (error resistance and seeking).

Each codec is capable of working with only a specific video format. However, several codecs can exist for a single format. Usually, each multimedia company provides its own codec for a format along with its player.

For example, this study treats Theora and QuickTime (mov) as codecs for the MPEG-4 format. Codecs can be implemented in either software or hardware. Software codecs are slower but less expensive than hardware codecs, which are much faster.

The specifications for a format are not rigid and provide for some variations. Although codecs implement a specified format, they may vary in their method of operation resulting in variations in quality and performance.

2 Codec Evaluation

With the ever-increasing demand on bandwidth, codec designers tend to compress too aggressively and design algorithms which might badly affect the aesthetics of the video content. Hence, evaluation criteria for codec performance are required to verify the quality of the compressed videos.

2.1 Criteria for Comparison

The codecs are compared based on the following criteria

1. Quality of Video

2. Performance of the codec

2.2 Quality of video

Quality of video corresponds to the look and feel of the video: the resolution, the artifacts, the blurring and other visual aesthetic components. It depends on both the format of the video and the codec used to encode to that format. Usually, several codecs implement a single format, and each one differs from the others. Quality also depends on the amount of information in the video being encoded, so it will not be constant throughout a video: clips with more information show more artifacts than scenes with little movement and few scene changes. Quality can be measured objectively or subjectively.

2.2.1 Objective Quality

Objective quality is to measure the quality in mathematical terms which makes it very easy to compare and evaluate. Some of the metrics available to measure objective quality are:

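a. Mean Square Error (MSE): It is the second moment of the difference and describes the variance between the original frame and the encoded frame. For an original frame $I$ and an encoded frame $K$, each of $M \times N$ pixels,

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \bigl( I(i,j) - K(i,j) \bigr)^2$$

b. Peak Signal to Noise Ratio (PSNR): The ratio between the maximum signal level and the noise.

Mathematically, it is given by:

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}^2}{\mathrm{MSE}} \right)$$

where $\mathrm{MAX}$ is the maximum possible pixel value (255 for 8-bit samples) and the result is expressed in decibels.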

c. Colour Difference: This is the absolute difference of the individual colour components between the input frame and the output frame. It is calculated by

d. Structural Similarity (SSIM) [2]: This is used to measure the similarity between two images. It is a function of luminance, contrast and structural similarity. Its value is at most 1, which is reached only when the two images are identical, and for typical image pairs it lies between 0 and 1. It is independent of the colour components.
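The first three metrics can be sketched in a few lines of plain Python. MSE and PSNR follow their standard definitions; the colour difference is written here as a mean absolute difference per colour channel, which is an assumption, since the exact formula used in the study is not given in the text. Frames are represented as flat lists of pixel values (or of three-component tuples), and all function names are illustrative. SSIM is omitted because it needs windowed local statistics; see [2].

```python
import math


def mse(reference, encoded):
    """Mean squared error between two equally sized sequences of pixel values."""
    return sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)


def psnr(reference, encoded, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    error = mse(reference, encoded)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


def mean_abs_colour_difference(reference, encoded):
    """ASSUMED form of the colour difference: mean absolute difference per channel.

    Each pixel is a 3-tuple of colour components (for example RGB or YUV).
    """
    n = len(reference)
    return tuple(
        sum(abs(r[ch] - e[ch]) for r, e in zip(reference, encoded)) / n
        for ch in range(3)
    )


# Tiny hypothetical frames, just to exercise the functions.
original = [100, 100, 100, 100]
decoded = [100, 102, 98, 100]
print(mse(original, decoded))             # prints 2.0
print(round(psnr(original, decoded), 2))  # prints 45.12

original_rgb = [(10, 20, 30), (40, 50, 60)]
decoded_rgb = [(12, 20, 29), (40, 47, 60)]
print(mean_abs_colour_difference(original_rgb, decoded_rgb))  # prints (1.0, 1.5, 0.5)
```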

2.2.2 Subjective Quality

Subjective quality is measured by visually inspecting the encoded video for artifacts, blurring, blocking and overall quality.

2.3 Performance of the Codec

The performance of the codec is measured as a function of three quantities:

1. Compression ratio of the codec (file size)

2. Speed of encoding (compression)

3. Speed of decoding (decompression)

2.3.1 Compression ratio of the codec

The compression ratio of the codec is measured by encoding a repetitive set of frames with each codec to yield videos of different formats. The ratio of the file size of the encoded video to that of the uncompressed video acts as a measure of the capacity for compression. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.

2.3.2 Encoding and Decoding speed

The encoding and decoding speeds vary from codec to codec and, within the same codec, from frame to frame. The higher the redundancy, the slower the encoding and the smaller the file. By selecting appropriate frames to compress, we can measure both the best and worst case scenarios.
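The three quantities can be measured with a small harness like the one below. It is only a sketch: zlib from the Python standard library stands in for a video codec (an assumption made so the example is self-contained), and the helper name `measure` is hypothetical. The size ratio is defined as in Section 2.3.1, encoded size over uncompressed size, so smaller is better.

```python
import os
import time
import zlib


def measure(encode, decode, raw):
    """Return (encoded/raw size ratio, encode seconds, decode seconds)."""
    start = time.perf_counter()
    encoded = encode(raw)
    encode_seconds = time.perf_counter() - start

    start = time.perf_counter()
    decoded = decode(encoded)
    decode_seconds = time.perf_counter() - start

    # zlib is lossless, so the round trip must be exact.
    # A lossy video codec would need a quality metric here instead.
    if decoded != raw:
        raise ValueError("round trip did not reproduce the input")
    return len(encoded) / len(raw), encode_seconds, decode_seconds


FRAME_BYTES = 352 * 288  # one 8-bit grayscale frame at the study's frame size

best_case = bytes(FRAME_BYTES) * 10        # ten identical black frames
worst_case = os.urandom(FRAME_BYTES * 10)  # ten frames of random noise

best_ratio, best_enc, best_dec = measure(zlib.compress, zlib.decompress, best_case)
worst_ratio, worst_enc, worst_dec = measure(zlib.compress, zlib.decompress, worst_case)

print(f"best case:  ratio {best_ratio:.4f}, encode {best_enc:.4f} s, decode {best_dec:.4f} s")
print(f"worst case: ratio {worst_ratio:.4f}, encode {worst_enc:.4f} s, decode {worst_dec:.4f} s")
```

Timings from a single run are noisy; repeating each measurement and taking the median would give steadier numbers.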

2.4 Bit Rates

Bit rate is measured in kilobits per second (kbps) and represents the amount of data flow per unit time. It is an important factor in deciding the quality of the video. For example, consider a video which has a bit rate of 1000 kbps. For a standard definition video, this means about 29.8 frames in every 1000 kilobits, i.e. about 33 kilobits, or roughly 4 kilobytes, per frame. This restricts the amount of data that can be used to represent a frame. Lower bit rates mean higher compression and lower video quality, with more noise, blocking, discolouration etc.
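The per-frame budget worked out above can be reproduced as follows. The comparison against a raw frame assumes the 640x480, three bytes per pixel SD frame from Section 1.2 and 1 kilobit = 1000 bits; both are assumptions of this sketch rather than figures stated in this section.

```python
def kilobits_per_frame(bit_rate_kbps, frames_per_second):
    """Average bit budget available for one frame, in kilobits."""
    return bit_rate_kbps / frames_per_second


budget_kbit = kilobits_per_frame(1000, 29.8)  # figures used in the text
budget_bytes = budget_kbit * 1000 / 8         # 1 kilobit taken as 1000 bits

raw_frame_bytes = 640 * 480 * 3               # assumed raw colour SD frame
implied_factor = raw_frame_bytes / budget_bytes

print(f"{budget_kbit:.1f} kilobits per frame")      # prints 33.6
print(f"{budget_bytes / 1000:.1f} kilobytes per frame")  # prints 4.2
print(f"about {implied_factor:.0f}:1 compression")  # prints about 220:1
```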

   Application        Bit rate

a  Video streaming    100-500 kbps

b  SD video           500-2000 kbps

c  HD video           >2000 kbps

By measuring each of the quantities discussed in 2.3 and 2.4, we will be able to identify the appropriate codec for a specific application.

3 Implementation

3.1 Codecs

The following codecs are being evaluated in this study:

Sl. No  Codec      Designer/Developer  Format  Container

1       WMV2       Microsoft           wmv     wmv

2       Theora     Xiph.org            MPEG-4  avi

3       Asf        Microsoft           asf     asf

4       MPEG4      MPEG                MPEG-4  mp4

5       Quicktime  Apple               MPEG-4  mov

6       MPEG-1     MPEG                MPEG-1  mpeg

All codecs are part of the ffmpeg library.

3.2 Dataset

The following videos are used for evaluation of the codecs. The reason for selection of the video is also described. All videos are of 352x288 pixel dimension, but may appear stretched in this document.

3.2.1 Quality Measurement

3.2.1.1 Akiyo

Figure 1 A frame from the Akiyo video sequence

This is a 300 frame video in the uncompressed YUV format. This video shows a news reader. It has no background changes and almost negligible foreground changes.

3.2.1.2 Foreman

Figure 2 A frame from the Foreman video sequence

This is a 300 frame video in the uncompressed YUV format. It has a sudden scene change at the end; other than that, there is no background change. Only the face changes, showing rich expressions which can be hard to compress.

3.2.1.3 Football

Figure 3 A frame from the Football video sequence

This is a 125 frame video in the uncompressed YUV format. This video has a constant background and very rapid and large changes in the foreground as players keep coming into and going out of the frame.

3.2.1.4 Stephan

Figure 4 A frame from the Stephan video sequence

This is a recording of a tennis match played by Stefan Edberg. It is 300 frames in length and is also in the uncompressed YUV format. This video has fast foreground changes as the player runs about, and background changes as the camera follows him. This would be the hardest kind of natural video to encode.

Video     Foreground Change  Background Change

Akiyo     No                 No

Foreman   Yes                No

Football  Yes                No

Stephan   Yes                Yes

3.2.2 Performance Measurement

In order to measure the performance in terms of compression ratio and speed of encoding, I have proposed a set of frames as shown below. These frames will together allow us to measure the best and worst case scenarios.

Alternate frames  Spatial Redundancy  Temporal Redundancy

3.2.2.1           100%                100%

3.2.2.2           100%                0%

3.2.2.3           ≈0%                 100%

3.2.2.4           ≈0%                 ≈0%

These pairs of alternating frames incorporate the best and worst case scenarios for compression.
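The four cases can be imitated with synthetic frames. The sketch below is hypothetical: the study's actual test frames are not reproduced in the text, so uniform frames stand in for high spatial redundancy and random noise frames for low spatial redundancy. The two helper measures are crude proxies chosen for this illustration, not metrics used in the study.

```python
import random

WIDTH, HEIGHT = 352, 288  # frame size used in the study


def uniform_frame(value):
    """A frame in which every pixel has the same value."""
    return [value] * (WIDTH * HEIGHT)


def noise_frame(rng):
    """A frame of independent random 8-bit pixel values."""
    return [rng.randrange(256) for _ in range(WIDTH * HEIGHT)]


def unchanged_fraction(frame_a, frame_b):
    """Share of pixels identical in both frames: a proxy for temporal redundancy."""
    same = sum(1 for a, b in zip(frame_a, frame_b) if a == b)
    return same / len(frame_a)


def distinct_values(frame):
    """Number of distinct pixel values; 1 means the frame is perfectly uniform."""
    return len(set(frame))


rng = random.Random(0)  # fixed seed so the noise frames are repeatable
white, black = uniform_frame(255), uniform_frame(0)
noise_a, noise_b = noise_frame(rng), noise_frame(rng)

# One alternating pair per row of the table in Section 3.2.2.
pairs = {
    "3.2.2.1 high spatial, high temporal": (white, white),
    "3.2.2.2 high spatial, low temporal": (black, white),
    "3.2.2.3 low spatial, high temporal": (noise_a, noise_a),
    "3.2.2.4 low spatial, low temporal": (noise_a, noise_b),
}

for name, (first, second) in pairs.items():
    print(name, distinct_values(first), round(unchanged_fraction(first, second), 3))
```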

3.2.3 Bit Rates

In order to cover the entire range of applications, the videos will be encoded at the following bit rates:

a. 600 kbps

This is the range at which YouTube plays its videos.

b. 1,000 kbps

This is the bit rate generally used in video conferencing.

c. 3,000 kbps

This bit rate is generally used in optical disc playback.
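Encoding one raw sequence at each of these bit rates could be scripted along the following lines. This is a sketch, not the procedure used in the study: the file names are hypothetical, the 4:2:0 pixel layout and the 30 fps frame rate are assumptions (the text only says "uncompressed YUV"), and whether a given encoder is present depends on how ffmpeg was built. The function only builds the command line; it does not run it.

```python
import shlex

# ffmpeg encoder names for three of the codecs; availability depends on the build.
ENCODERS = {"wmv2": ("wmv2", "wmv"), "mpeg4": ("mpeg4", "mp4"), "mpeg1": ("mpeg1video", "mpeg")}
BIT_RATES_KBPS = [600, 1000, 3000]


def encode_command(source, encoder, bit_rate_kbps, output, fps=30):
    """Build (but do not run) an ffmpeg command for a raw 352x288 YUV input."""
    return [
        "ffmpeg",
        "-f", "rawvideo",             # raw input has no header, so describe it explicitly
        "-pix_fmt", "yuv420p",        # ASSUMED 4:2:0 planar layout
        "-s", "352x288",              # frame size given in Section 3.2
        "-r", str(fps),               # ASSUMED frame rate
        "-i", source,
        "-c:v", encoder,
        "-b:v", f"{bit_rate_kbps}k",  # target video bit rate
        output,
    ]


commands = []
for label, (encoder, extension) in ENCODERS.items():
    for rate in BIT_RATES_KBPS:
        # "akiyo_cif.yuv" and the output names are hypothetical.
        output = f"akiyo_{label}_{rate}k.{extension}"
        commands.append(encode_command("akiyo_cif.yuv", encoder, rate, output))

for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run` to execute it directly.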

4 Results and Discussion

As part of the exercise, I was able to measure most of the evaluation parameters. However, due to issues with the ffmpeg library, I did not get an accurate measure of the encoding and decoding times.

4.1 Akiyo

[Charts not reproduced: each plots the metric for the Akiyo sequence at bit rates of 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1.]

Figure 5 Mean Squared Error for Akiyo

Figure 6 PSNR for Akiyo

Figure 7 Absolute Colour distance for Akiyo

Figure 8 Structural Similarity for Akiyo

[Charts not reproduced: each plots the metric for the Football sequence at bit rates of 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1.]

Figure 9 Mean Squared Error for Football

Figure 10 PSNR for Football

Figure 11 Absolute Colour Distance for Football

Figure 12 Structural Similarity for Football

[Charts not reproduced: each plots the metric for the Foreman sequence at bit rates of 600, 1000 and 3000 kbps for wmv2, theora, asf, mpeg4, qt and mpeg1.]

Figure 13 Mean Squared Error for Foreman

Figure 14 PSNR for Foreman

Figure 15 Absolute Colour Distance for Foreman

Figure 16 Structural Similarity for Foreman

[Charts not reproduced: each plots the metric for the Stephan sequence at bit rates of 600, 1000 and 3000 kbps.]

Figure 17 Mean Squared Error for Stephan

Figure 18 PSNR for Stephan

Figure 19 Absolute Colour Distance for Stephan

Figure 20 Structural Similarity for Stephan

[Charts not reproduced: four panels titled "B-W File Sizes", "W-W File Sizes", "C-C File Sizes" and "C-N File Sizes", each showing the encoded file size in KB for mpeg-1, Qt, mpeg-4, asf, theora and wmv2.]

Figure 21 File sizes with high spatial and low temporal redundancy

Figure 22 File sizes with high spatial and temporal redundancy

Figure 23 File sizes with low spatial and high temporal redundancy

Figure 24 File sizes with low spatial and low temporal redundancy

Following are some sample frames from the encoded videos

Figure 25 Counter Clockwise from the top – a frame from the Stephan video – original frame, wmv encoded at 600kbps and wmv encoded at 3000kbps

Figure 26 Counter Clockwise from the top – a frame from the Akiyo video – original frame, wmv encoded at 600kbps and wmv encoded at 1000kbps

In Figure 25, the distortion is clearly visible when encoded at 600 kbps, but at 3000 kbps it is almost negligible. However, in Figure 26, there is no visible distortion even at 600 kbps. This implies that the quality of the encoded video also depends on the content of the video.

5 Conclusion

Selection of a format for encoding or representation depends upon the application which uses the video. The various criteria to be considered before selecting a format are:

• Application

o Transmission

Videos used for transmission and viewing over the internet require a high compression ratio. They can compromise on the quality as such videos are rarely used for important applications.

o Video Conferencing

Video conferencing applications have specific criteria when it comes to quality. They need the videos to be clear, but the frame rate can be compromised. Surveillance videos also fall into this category. The encoding and decoding speed are of significance here.

o Archiving

Videos used for this purpose do not have significant demands on encoding or decoding speed. They require higher resolution and quality with lower file sizes.

• Performance Requirements

o Real time video processing for UAVs etc.

The requirement here is for faster encoding speed and very little blurring.

o Video Viewing

Video viewing, in general, does not have heavy processing requirements. This is because sufficient processing capability is available and the application is not real-time.

• Quality requirements

o Entertainment

o Conferencing

o Surgical procedures

6 Future Work

Possible future work includes:

a. Measuring blurring effects of the codecs.

b. Measuring blocking effects and impact on edge detection algorithms.

c. Evaluating coding and decoding times.

d. Identifying impact of frame size on coding speed and compression ratio.

7 References

[1] Madhuri Khambete and Madhuri Joshi, “Blur and Ringing Artifact Measurement Image Compression using Wavelet Transform”, Proceedings of World Academy of Science, Engineering and Technology, vol. 20, April 2007, ISSN 1307-6884.

[2] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Transactions on Image Processing, vol. 13, no. 4, April 2004.