DSC for traditional imaging systems

2.5 Distributed coding

2.5.2 DSC for traditional imaging systems

The Slepian-Wolf and Wyner-Ziv theorems show that correlated sources can be compressed in a distributed manner with minimal coding loss compared to joint encoding. However, it is not straightforward to apply these theoretical results to multi-view or video compression. Several issues must be solved in order to design a practical distributed coding system. First, the statistical dependency between sources, given in terms of the conditional probability mass function P(X|Y), is not an ideal model for describing the correlation between multi-view images or video sequences. In such scenarios, as described in Section 2.3, the correlation is more effectively described by disparity or motion models. Second, the Slepian-Wolf and Wyner-Ziv results assume that the statistical dependency (or equivalently the correlation model parameters) between the Wyner-Ziv encoded source and the side information is perfectly known at the encoder. In practical distributed scenarios, the side information is available only at the decoder and not at the encoder. Therefore, when designing a practical distributed coding solution, one has to accurately model the characteristics of the correlation channel between the source and the side information, and estimate the channel model parameters at the encoder in order to control the Slepian-Wolf coding rate.
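
As a small numerical illustration of the rate that the encoder has to estimate: when Y is available as side information at the decoder, the Slepian-Wolf theorem gives the conditional entropy H(X|Y) as the minimum lossless rate for X. The sketch below computes this quantity from a joint probability mass function. The function names and the example distribution (two binary sources that differ with probability 0.1) are assumptions chosen for illustration and do not come from the cited works.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    h_xy = entropy(p for row in joint for p in row)
    # Marginal of Y: sum each column of the joint table.
    p_y = [sum(row[y] for row in joint) for y in range(len(joint[0]))]
    return h_xy - entropy(p_y)


# Assumed toy model: X and Y are uniform bits that differ with probability 0.1.
correlated = [[0.45, 0.05],
              [0.05, 0.45]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

rate_correlated = conditional_entropy(correlated)    # roughly 0.469 bit per symbol
rate_independent = conditional_entropy(independent)  # 1 bit per symbol
```

Under this toy model, the stronger the correlation, the lower the rate needed for X. This is why an encoder that misjudges P(X|Y) either wastes bits or causes decoding failures.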

The first practical distributed coding schemes for compressing the time-varying images captured by a video camera have been proposed in [79, 80], where the video frames are independently encoded and jointly decoded by exploiting the correlation between images using block-based motion compensation. In [79], the video frames are categorized into key frames and Wyner-Ziv frames with a GOP size of 2, i.e., I_j, j ∈ {1, 3, ..., 2⌈J/2⌉ − 1} represent the key frames and I_j, j ∈ {2, 4, ..., 2⌊J/2⌋} represent the Wyner-Ziv frames, where J is the number of frames in the video sequence. The key frames are encoded independently using standard coding solutions, e.g., JPEG 2000 or H.264. The Wyner-Ziv frames are first transform coded (e.g., with a DCT) and then scalar quantized with 2^M levels. The quantized coefficients are then represented in M bitplanes, and each bitplane is finally channel coded (e.g., with Turbo codes). The resulting parity bits are stored in a buffer and transmitted to the joint decoder upon request. The joint decoder first estimates a side information frame Ĩ_j using block-based motion compensation and interpolation from the decoded key frames, denoted as Î_{j−1} and Î_{j+1}. The side information Ĩ_j efficiently captures the low-frequency components, but not the high-frequency components, as motion compensation usually fails to capture the visual information along edges and in textured regions. Therefore, the side information can be considered as a noisy version of the original Wyner-Ziv frame, and the noise can be corrected using the parity bits transmitted by the encoder. Using a feedback channel, the decoder requests additional parity bits from the encoder, and this process is repeated until the errors in the side information are corrected. In this scenario, the Slepian-Wolf encoding rate is accurately controlled by exchanging the channel statistics through feedback messages.
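
The encoder-side steps described above (GOP-2 frame split, scalar quantization with 2^M levels, bitplane extraction) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the uniform quantizer, and the coefficient range are hypothetical, and the transform and the channel coding of each bitplane are left out.

```python
def split_gop2(num_frames):
    """1-based frame indices: odd frames are key frames, even frames are Wyner-Ziv."""
    key = list(range(1, num_frames + 1, 2))
    wyner_ziv = list(range(2, num_frames + 1, 2))
    return key, wyner_ziv


def quantize(coeffs, m_bits, lo, hi):
    """Uniform scalar quantization of coefficients in [lo, hi) into 2**m_bits levels."""
    levels = 1 << m_bits
    step = (hi - lo) / levels
    indices = []
    for c in coeffs:
        idx = int((c - lo) / step)
        indices.append(min(max(idx, 0), levels - 1))  # clamp out-of-range values
    return indices


def to_bitplanes(indices, m_bits):
    """Plane 0 holds the most significant bit of every quantization index."""
    return [[(q >> (m_bits - 1 - b)) & 1 for q in indices] for b in range(m_bits)]


def from_bitplanes(planes):
    """Inverse of to_bitplanes: rebuild the quantization indices."""
    m_bits = len(planes)
    count = len(planes[0])
    return [sum(planes[b][i] << (m_bits - 1 - b) for b in range(m_bits))
            for i in range(count)]


key_frames, wz_frames = split_gop2(5)               # ([1, 3, 5], [2, 4])
indices = quantize([-3.0, 0.0, 7.9, 100.0], 3, -8.0, 8.0)  # M = 3, so 8 levels
planes = to_bitplanes(indices, 3)
```

In a complete system each bitplane would be passed to the channel encoder in turn, and only the resulting parity bits would be buffered for transmission.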

The PRISM architecture [80] is very similar to the previous one, except that it has no feedback channel. Instead, rate control is performed at the encoder by allowing limited communication between the encoders. Furthermore, in addition to the parity bits, the encoder transmits a cyclic redundancy check (CRC) word computed from the quantized Wyner-Ziv frames in order to assist the motion estimation and compensation at the decoder. The decoder first selects a set of candidate motion-compensated side information blocks from the previously decoded reference frames. The transmitted CRC is then compared with the CRC generated from the decoded block. If the two match, the decoding is declared successful; otherwise, the decoder chooses another candidate block. This process is repeated until successful decoding is achieved. Inspired by these frameworks [79, 80], significant research effort has been devoted over the past decade to improving the compression performance. In particular, much of this research has focused on improving the side information and on accurately modeling the correlation noise [81, 82]. More details on recent advances in distributed video coding can be found in the overview articles [83, 84].
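
The CRC-driven candidate search can be illustrated with a toy sketch. It is not the PRISM implementation: a simple modulo (coset) index stands in for the channel-code bits, the block is a short list of integers, and every function name and the `modulus` parameter are assumptions made for this example. Only `zlib.crc32` is a real library call.

```python
import zlib


def crc_of(block):
    """CRC-32 of a block of integers (serialized as text for simplicity)."""
    return zlib.crc32(",".join(str(v) for v in block).encode("ascii"))


def encode_block(quantized, modulus):
    """Toy encoder: send each value modulo `modulus` (its coset index) and a CRC."""
    return [q % modulus for q in quantized], crc_of(quantized)


def decode_with_candidate(candidate, cosets, modulus):
    """Per coefficient, pick the member of the signalled coset closest to the candidate."""
    decoded = []
    for side, coset in zip(candidate, cosets):
        below = side - ((side - coset) % modulus)  # largest coset member <= side
        above = below + modulus
        decoded.append(below if side - below <= above - side else above)
    return decoded


def crc_checked_decode(candidates, cosets, crc, modulus):
    """Try candidate side information blocks in order until the CRC matches."""
    for index, candidate in enumerate(candidates):
        decoded = decode_with_candidate(candidate, cosets, modulus)
        if crc_of(decoded) == crc:
            return index, decoded
    return None, None  # no candidate led to a successful decoding


original = [12, 200, 37, 90]
cosets, crc = encode_block(original, modulus=8)
candidates = [[40, 150, 10, 60],   # poor motion candidate
              [13, 198, 39, 88]]   # close to the original block
chosen, decoded = crc_checked_decode(candidates, cosets, crc, modulus=8)
```

The second candidate lies within half a coset spacing of every original value, so it decodes to the original block and its CRC matches. The first candidate decodes to a different block and is rejected by the CRC check.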

We now discuss a few studies from the literature on the application of distributed coding principles to camera networks. These works are generally based on coding with side information, where one camera is used as a reference to decode the information from the other cameras. For example, in [85, 86] the cameras are categorized into reference and Wyner-Ziv cameras, and the correlation among views is exploited at the joint decoder using disparity estimation based on epipolar geometry, which usually requires camera parameters. When camera parameters are not available and calibration is not possible, the joint decoder can instead use block-based disparity estimation to exploit the redundancy between images [87, 88]. Gehrig et al. [89, 90] have proposed a geometry-based distributed coding scheme for compressing multi-view images. The authors represent each view using a piecewise polynomial model. Using this polynomial representation, the correlation model is built by relating the locations of discontinuities among different views. However, they consider a special camera arrangement where all the cameras are placed on a straight line. Super-resolution techniques have been applied to distributed coding in camera networks in [91]. In this framework, each sensor transmits a low-resolution image to the decoder. At the decoder, these low-resolution images are registered with respect to a reference image, where the image registration is performed by shape analysis and image warping. The registered images are then jointly processed to decode a high-resolution image. Distributed compression has also been applied to the multiple images captured in omnidirectional sensor networks [92]. The correlation between omnidirectional images is first estimated; it is modeled using local transformations of the sparse features captured by an over-complete structured dictionary. Then, the estimated correlation model is used for Wyner-Ziv image coding based on partitioning the dictionary into several cosets.
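
To make the notion of block-based disparity estimation concrete, the following sketch performs generic block matching along a single scanline using the sum of absolute differences (SAD). It is not the method of [87, 88]: the function names, the one-dimensional setting, and the search range are assumptions, and how the decoder obtains the blocks to be matched depends on the scheme and is not modeled here.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def block_disparity(ref_row, tgt_row, block, max_disp):
    """For each block of tgt_row, find the shift d in [0, max_disp] such that
    ref_row[start + d : start + d + block] is the best SAD match."""
    disparities = []
    for start in range(0, len(tgt_row) - block + 1, block):
        tgt_block = tgt_row[start:start + block]
        best_d, best_cost = 0, None
        for d in range(max_disp + 1):
            pos = start + d
            if pos + block > len(ref_row):
                break  # shifted block would fall outside the reference row
            cost = sad(tgt_block, ref_row[pos:pos + block])
            if best_cost is None or cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities


# Assumed toy data: the target row is the reference row shifted left by 2 pixels.
ref_row = [0, 0, 10, 20, 30, 40, 50, 60, 0, 0]
tgt_row = [10, 20, 30, 40, 50, 60, 0, 0, 0, 0]
disparity = block_disparity(ref_row, tgt_row, block=4, max_disp=3)
```

With calibrated cameras the search could instead be restricted to epipolar lines, which is the role of the camera parameters mentioned above.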


The schemes described above are based on coding with side information, i.e., they target the corner points U or V of the Slepian-Wolf rate region shown in Fig. 2.6. It is clear that these frameworks do not balance the transmission rate between the encoders. In practice, however, it may be desirable to have more flexibility in the rate allocation among the encoders. The first work that addresses balanced rate allocation in distributed coding is based on a time-sharing mechanism [93], which is however hard to implement due to node synchronization issues. The first practical scheme for symmetric coding, based on channel code partitioning, has been proposed in [94]. This scheme was later extended to multiple sources using systematic channel codes [95]. It is based on splitting the generator matrix of the channel code into sub-generator matrices. Codewords are then generated using the sub-matrices and assigned to each encoder. The compression rate of each encoder is determined by the number of rows retained in the corresponding sub-matrix. The advantage of this system is that it needs only one channel code. However, this framework is limited to systematic channel codes. The authors in [96] have developed a symmetric DSC solution using a general linear channel code framework based on the algebraic binning concept. Simulation results have shown that almost the entire Slepian-Wolf region can be covered with this coding algorithm. Symmetric distributed coding can also be achieved by information partitioning. In this context, Sartipi et al. [97] have considered the compression of two sources by information partitioning, where half of the source bits are transmitted directly, while syndrome bits are generated from the other (complementary) half of the source bits. Similar to [96], the authors show that they can approach the entire Slepian-Wolf region, so that the decoding error becomes insensitive to the rate allocation among the encoders. However, both schemes are based on capacity-approaching channel codes, which usually approach the Slepian-Wolf bound only for long source lengths (typically 10^4). Grangetto et al. [98] have proposed a balanced coding scheme for binary sources with small block lengths. The algorithm is based on a time-sharing version of distributed arithmetic codes, which performs better than the Turbo code solution in the considered framework.
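
The relation between the corner points and balanced rate allocation can be checked numerically. The sketch below builds the two corner points of the Slepian-Wolf region from H(X), H(Y), and H(X,Y), forms a time-shared rate pair between them, and tests membership in the region. The function names and the entropy values are assumptions for illustration, and which corner is labeled U or V in Fig. 2.6 is not assumed.

```python
def corner_points(h_x, h_y, h_xy):
    """Two corner points (R_X, R_Y) of the Slepian-Wolf region."""
    corner_a = (h_x, h_xy - h_x)   # X at full rate, Y at H(Y|X)
    corner_b = (h_xy - h_y, h_y)   # X at H(X|Y), Y at full rate
    return corner_a, corner_b


def time_share(corner_a, corner_b, lam):
    """Use corner_a a fraction lam of the time and corner_b otherwise."""
    return (lam * corner_a[0] + (1 - lam) * corner_b[0],
            lam * corner_a[1] + (1 - lam) * corner_b[1])


def in_sw_region(r_x, r_y, h_x, h_y, h_xy, tol=1e-9):
    """Check R_X >= H(X|Y), R_Y >= H(Y|X), and R_X + R_Y >= H(X,Y)."""
    return (r_x >= h_xy - h_y - tol
            and r_y >= h_xy - h_x - tol
            and r_x + r_y >= h_xy - tol)


# Assumed entropies (bits): uniform binary sources with H(X|Y) = 0.469.
H_X, H_Y, H_XY = 1.0, 1.0, 1.469
corner_a, corner_b = corner_points(H_X, H_Y, H_XY)
balanced = time_share(corner_a, corner_b, 0.5)   # equal rate for both encoders
```

Any time-shared pair lies on the line joining the two corners, so its sum rate stays at H(X,Y). The practical difficulty noted for [93] is not the rate itself but the synchronization that switching between the two corner configurations requires.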

DSC with rate allocation has also been considered in imaging applications. The authors in [99] proposed a rate-balanced DSC scheme for video sequences. In this scheme, each frame is divided into two partitions, one of which is transmitted directly. In addition, each frame is Wyner-Ziv encoded, and the side information is generated using motion estimation. This scheme avoids hierarchical relations between frames. However, it results in high coding rates, since one of the partitions in each frame is encoded using both Wyner-Ziv and independent coding. Finally, a balanced distributed coding scheme for camera networks has been proposed in [100], based on a linear channel code construction that can achieve any point in the Slepian-Wolf region. However, the proposed linear codes have not been applied to the actual coding of images in camera networks.

In document Distributed Compressed Representation of Correlated Image Sets (Page 35-37)