3.3 Conventional Distributed Video Coding
3.3.4 Comparison of DVC Architectures
The similarities and differences between these three DVC systems are summarized as follows.
(i) Frame Classification
In both the Stanford system and DISCOVER, the input video sequence is divided into WZ frames and key frames. PRISM performs no frame classification; all video frames are treated alike.
(ii) Spatial Transformation
In all three architectures, a block-based DCT is used. In the Stanford system and DISCOVER, only the WZ frames are transformed. The transform coefficients of each frame are grouped into bands according to their position within the block.
In the Stanford and DISCOVER codecs, after turbo/LDPC decoding, an inverse DCT is applied to reconstruct the WZ frames. In the PRISM codec, a block is reconstructed from the corresponding SI and the quantized bit stream.
(iii) Quantization
In the Stanford system and DISCOVER, each DCT band is uniformly quantized with a number of levels that depends on the target quality or on the DCT coefficients. For a given band, the bits of the quantized symbols are grouped together to form bit-planes, which are then independently turbo encoded or LDPC encoded. In the PRISM architecture, a scalar quantizer is used.
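The quantization and bit-plane extraction described above can be sketched as follows. This is a minimal illustration, not the quantizer of either codec: the function names, the symmetric dynamic range `max_abs`, and the choice of 8 levels are assumptions made for the example, and the sample band values are made up.

```python
def uniform_quantize(coeffs, num_levels, max_abs):
    """Map each coefficient in [-max_abs, max_abs) to an index in [0, num_levels)."""
    step = 2.0 * max_abs / num_levels  # width of one quantization interval
    indices = []
    for c in coeffs:
        q = int((c + max_abs) // step)
        indices.append(min(max(q, 0), num_levels - 1))  # clamp out-of-range values
    return indices


def bit_planes(indices, num_bits):
    """Return the bit-planes, most significant first, one bit per symbol."""
    return [[(q >> b) & 1 for q in indices] for b in range(num_bits - 1, -1, -1)]


# Hypothetical DCT band (one coefficient per block) quantized to 8 levels.
band = [-100.0, -10.0, 0.0, 35.0, 127.0]
symbols = uniform_quantize(band, num_levels=8, max_abs=128.0)
planes = bit_planes(symbols, num_bits=3)
print(symbols)  # [0, 3, 4, 5, 7]
print(planes)   # [[0, 0, 1, 1, 1], [0, 1, 0, 0, 1], [0, 1, 0, 1, 1]]
```

Each inner list of `planes` is what would be passed, as one unit, to the turbo or LDPC encoder.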
(iv) Block Classification
This is done only in PRISM, since the other two are frame-based codecs.
(v) Turbo/LDPC Coding
Only turbo encoding is used in the Stanford system, while DISCOVER makes use of both turbo and LDPC encoding for the coefficient bit-planes. The turbo/LDPC decoder receives successive chunks of parity bits, which it requests through the feedback channel. To decide whether more bits are needed for successful decoding, the decoder uses a simple request stopping criterion, which checks that all turbo/LDPC code parity-check equations are satisfied for the decoded codeword. In DISCOVER, an additional CRC check is performed to ensure good reconstruction quality.
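The request-and-stop loop described above can be sketched as follows, under simplifying assumptions. The parity-check matrix is that of a (7,4) Hamming code, chosen only because it is small; `try_decode` is a hypothetical stand-in for the real turbo/LDPC decoder, and the three decoding attempts are made up.

```python
def parity_checks_satisfied(H, codeword):
    """True when every parity-check equation (row of H) sums to 0 modulo 2."""
    return all(sum(h * c for h, c in zip(row, codeword)) % 2 == 0 for row in H)


def decode_with_feedback(H, try_decode, max_requests):
    """Request one more chunk of parity bits until the stopping criterion holds."""
    for n_chunks in range(1, max_requests + 1):
        candidate = try_decode(n_chunks)  # hypothetical decoder call
        if parity_checks_satisfied(H, candidate):
            return candidate, n_chunks    # stop: no further request is sent
    return None, max_requests             # decoding failed within the budget


# Parity-check matrix of a (7,4) Hamming code, used here as a toy example.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

# Made-up decoder outputs after 1, 2, and 3 chunks of parity bits.
attempts = {
    1: [1, 0, 1, 0, 0, 0, 0],
    2: [1, 1, 1, 0, 0, 0, 1],
    3: [1, 1, 1, 0, 0, 0, 0],
}
result = decode_with_feedback(H, lambda n: attempts[n], max_requests=3)
print(result)  # ([1, 1, 1, 0, 0, 0, 0], 3)
```

A codeword can satisfy every parity check and still be the wrong one, which is why an additional CRC check is useful.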
(vi) Syndrome Coding and Hash Generation
This is performed in the PRISM codec only. For the syndrome class, only the least significant bits of the quantized DCT coefficients are syndrome encoded. In addition, for each block, the encoder sends a 16-bit cyclic redundancy check (CRC) checksum as a signature of the quantized DCT coefficients. This signature is needed to select the best candidate block (SI) at the decoder. Each candidate block is used for syndrome decoding, and a hash signature is generated for each decoded candidate block. Decoding is declared successful when the generated signature matches the CRC received from the encoder.
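The interplay between syndrome bits, candidate blocks, and the CRC signature can be illustrated with the sketch below. It is a scalar simplification, not the PRISM coder: the syndrome is taken to be the `k` least significant bits of each quantized coefficient, coefficients are assumed to lie in 0..255 so that a block packs into bytes, and `binascii.crc_hqx` (CRC-CCITT) stands in for whatever 16-bit CRC the codec actually uses. All function names and sample values are hypothetical.

```python
import binascii


def encode_block(q, k):
    """Return the k least significant bits of each coefficient and a 16-bit CRC."""
    syndrome = [v & ((1 << k) - 1) for v in q]
    crc = binascii.crc_hqx(bytes(q), 0)  # assumes every value is in 0..255
    return syndrome, crc


def nearest_in_coset(side, s, k):
    """Value congruent to s modulo 2**k that lies closest to the side information."""
    m = 1 << k
    base = side - ((side - s) % m)  # largest coset member not exceeding side
    return base if side - base <= (base + m) - side else base + m


def decode_block(syndrome, crc, candidates, k):
    """Try each candidate block; accept the first whose decoding matches the CRC."""
    for cand in candidates:
        decoded = [nearest_in_coset(c, s, k) for c, s in zip(cand, syndrome)]
        in_range = all(0 <= v <= 255 for v in decoded)
        if in_range and binascii.crc_hqx(bytes(decoded), 0) == crc:
            return decoded
    return None  # no candidate passed the signature check


quantized = [12, 45, 7, 100]                      # made-up quantized coefficients
syndrome, crc = encode_block(quantized, k=3)
candidates = [[13, 44, 9, 60], [13, 44, 9, 98]]   # poor match first, good match second
decoded = decode_block(syndrome, crc, candidates, k=3)
print(syndrome)  # [4, 5, 7, 4]
print(decoded)   # [12, 45, 7, 100]
```

The first candidate decodes to a block that differs from the original in its last coefficient, so its CRC does not match and the search moves on to the second candidate.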
(vii) Side Information Creation
This is an important step in DVC decoding. For both the Stanford and DISCOVER codecs, SI is created from previously decoded key frames using motion-compensated frame interpolation. The SI serves as an estimate of the WZ frame. The better the estimate, the smaller the number of parity bits needed for correction. In PRISM, motion estimation is performed on a reference frame by positioning a search window around the center of the block to be decoded.
(viii) Correlation Noise Modelling
The correlation statistics between the side information and the WZ frames are modelled by a Laplacian distribution. This modelling is needed in both the Stanford system and DISCOVER. PRISM does not require this step.
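For reference, the zero-mean Laplacian model has the standard form given below. The notation is introduced here for illustration and is not taken from the codecs' specifications: d denotes the residual between a WZ coefficient and the corresponding SI coefficient, and α is the distribution parameter, related to the residual variance σ² as shown.

```latex
f(d) = \frac{\alpha}{2}\, e^{-\alpha \lvert d \rvert},
\qquad
\alpha = \sqrt{\frac{2}{\sigma^{2}}}
```

A larger α corresponds to a smaller residual variance, that is, to side information that is closer to the WZ frame.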
3.4 Summary
In this chapter, a review of CS-based image and video coding is presented. CS image coding schemes are classified into categories, and the key points of each category are discussed. A similar classification of CS video coding schemes is also discussed. Finally, the differences between the work done in this thesis and the available CS image/video literature are discussed.
Chapter 4
Sensing Matrix, Quantization Matrix and Reconstruction Algorithms for Image Compression
In a conventional lossy image compression system, an invertible transform is applied to the image, which provides its expansion in terms of transform coefficients. Typically, most of the energy of the signal is concentrated in a relatively small subset of the transform coefficients. Consequently, when quantization is applied to the coefficients, a significant number of quantized coefficients will be zero and therefore need not be encoded. After quantization, a lossless compression process called “entropy coding” encodes the data into a bit stream for storage or transmission. Decompression is performed by inverse quantization followed by inverse transformation. This process is used in JPEG [1]. The choice of transformation and the design of the quantization matrix are important factors in the performance of the compression system.
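The transform, quantize, dequantize, and inverse-transform chain described above can be sketched in one dimension as follows. This is an illustration only: JPEG uses a two-dimensional 8×8 DCT with a quantization matrix, whereas this sketch uses a 1-D orthonormal DCT with a single hypothetical step size `STEP`, omits entropy coding, and operates on a made-up row of pixel values.

```python
import math


def dct(x):
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct(), i.e. the orthonormal DCT-III."""
    n = len(c)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c[k]
            * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n)
        )
        for i in range(n)
    ]


STEP = 10.0  # hypothetical uniform quantization step
pixels = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # made-up sample row

coeffs = dct(pixels)                              # forward transform
quantized = [round(c / STEP) for c in coeffs]     # integers passed to the entropy coder
recovered = idct([q * STEP for q in quantized])   # inverse quantization, inverse transform

print(quantized)
print([round(v, 1) for v in recovered])
```

Because the transform is orthonormal, the reconstruction error is bounded by the quantization error of the coefficients, which is at most `STEP / 2` per coefficient.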
For a system based on compressed sensing, the process is somewhat different. Instead of applying a transform to the image, a set of linear measurements is obtained through a sensing matrix. The number of measurements is typically much smaller than the number of pixels in the original image. Figure 4.1 illustrates this process in block diagram form.
[Figure 4.1: CS Image Compression. Block diagram. Encoder: Image, CS Encoding, Quantization. Decoder: Inverse Quantization, CS Decoding, Sparsity Transform / TV, Recovered Image.]
Here the measurement vector y is obtained by applying a sensing matrix Φ to an image x with a total of N pixels. Φ is an m × N matrix, so the length of the measurement vector y = Φx is m. The CS measurements are then quantized and entropy encoded. At the decoder, inverse quantization is followed by a CS recovery process to reconstruct the image. In this case, the performance of the compression system is determined by the number of measurements, the sensing matrix, the quantization matrix, and the CS reconstruction algorithm.
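The encoder side of this pipeline, y = Φx followed by quantization, can be sketched as below. Several choices are assumptions made only for illustration: Φ is drawn as an i.i.d. Gaussian matrix, a single uniform step `STEP` replaces the quantization matrix, the image is a made-up vector of N = 16 pixel values, and the CS recovery step at the decoder is omitted entirely.

```python
import random


def gaussian_sensing_matrix(m, n, seed=0):
    """m x n matrix with i.i.d. N(0, 1/m) entries (one possible choice of Phi)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """Compute y = Phi x: one inner product per measurement."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def quantize(y, step):
    """Uniform scalar quantization to integer indices."""
    return [round(v / step) for v in y]


def dequantize(q, step):
    """Inverse quantization performed at the decoder."""
    return [v * step for v in q]


STEP = 4.0                                        # hypothetical quantization step
x = [float(10 * (i % 4)) for i in range(16)]      # made-up image, N = 16 pixels
phi = gaussian_sensing_matrix(m=6, n=16)          # m = 6 measurements, m < N
y = measure(phi, x)
q = quantize(y, STEP)                             # indices passed to the entropy coder
y_hat = dequantize(q, STEP)                       # input to the CS recovery algorithm

print(len(x), len(y))  # 16 6
```

The decoder sees only `y_hat`, which differs from `y` by at most `STEP / 2` per measurement; recovering x from it requires a CS reconstruction algorithm together with a sparsity or TV prior.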
In this chapter, the effects of the choice of sensing matrix, quantization matrix, and CS reconstruction algorithm are studied in a non-distributed image compression setting. The efficacy of several different sensing matrices is evaluated in terms of encoding complexity and ease of implementation. A quantization matrix is designed and its performance is evaluated. Finally, several different CS reconstruction algorithms are compared in terms of reconstruction time and reconstruction quality. The results obtained in this chapter are then applied to distributed image and video coding in subsequent chapters.