5.6 Extension to multiple image sets
5.7.4 Distributed video coding
We study now the performance of the proposed algorithms in distributed video coding applications. The experimental setup is similar to the stereo imaging framework described in the previous section, except
5.7 Experimental results 85
that the correlation estimation relies on motion estimation instead of disparity computation. We have tested our algorithm on three datasets. The first dataset is built from frames 2 and 3 of the Foreman sequence, and the second from frames 65 and 66 of the Tennis sequence. The third dataset, Mequon, is selected from the Middlebury optical flow database⁴. Frames 2 and 66 are selected as the reference image I1 in the first and second datasets, respectively. The quality of the reference image is approximately 45 dB for the first dataset, and 33 dB for the second and third datasets. We use the same dictionary described in the previous section for approximating the image ˆI1. For the first dataset we approximate ˆI1 using K = 60 atoms, and for the second and third datasets we approximate ˆI1 using K = 90 atoms. The measurements Y2 are compressed using a 2-bit uniform quantizer and an arithmetic coder. The search window size is δtx = δty = 4 pixels for the translational components tx and ty in the first dataset, and δtx = δty = 6 pixels in the second and third datasets.
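The measurement budget and the 2-bit quantization described above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names are hypothetical, the quantizer range and mid-point reconstruction are assumptions, and the 176 × 144 (QCIF) frame size is an inference from the measurement counts quoted in this section rather than something stated here. The arithmetic coding stage is omitted.

```python
import math


def measurement_count(num_pixels, rate):
    """Number of linear measurements for a given measurement rate (assumed to be floored)."""
    return int(math.floor(num_pixels * rate))


def uniform_quantize(values, bits=2, lo=None, hi=None):
    """Quantize values with a uniform quantizer of 2**bits bins over [lo, hi].

    Returns the bin indices (the symbols an entropy coder would compress)
    and the mid-point reconstruction values (an assumed decoder rule).
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    levels = 2 ** bits
    step = (hi - lo) / levels
    if step == 0:
        # Degenerate range: every value falls in the first bin.
        return [0] * len(values), [lo] * len(values)
    # Clamp so that values at or beyond the range edges map to the end bins.
    indices = [min(levels - 1, max(0, int((v - lo) // step))) for v in values]
    recon = [lo + (i + 0.5) * step for i in indices]
    return indices, recon


if __name__ == "__main__":
    pixels = 176 * 144  # assumed QCIF resolution
    print(measurement_count(pixels, 0.05), measurement_count(pixels, 0.15))
    print(uniform_quantize([0.0, 0.3, 0.6, 1.0], bits=2, lo=0.0, hi=1.0))
```

Under the assumed frame size, the 5% and 15% rates reproduce the measurement counts of 1267 and 3801 used in the experiments below.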
Fig. 5.15 illustrates the accuracy of the motion information computed in the OPT-2 scheme with 1267 and 3801 quantized measurements (i.e., measurement rates of 5% and 15%, respectively). It compares the image ˜I2, reconstructed by warping the reference image, with the original images I2 and I1. We see that the warped image is closer to I2 than to I1, which confirms the benefit of motion estimation in the joint decoder. We further observe that the error, denoted by black pixels, is reduced significantly in the face region thanks to a good estimation of the motion field in smooth areas. As in the stereo experiments, however, the motion around sharp edges is not perfectly captured, because the chosen dictionary does not include very thin geometrical patterns. Similar findings are observed for the Tennis and Mequon datasets, shown in Fig. 5.16 and Fig. 5.17 respectively, where the proposed algorithm efficiently captures the complex and large motion fields from 1267 (5%) quantized measurements.
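The comparison reported in Figs. 5.15 to 5.17 (warp the reference image with the estimated motion field, then measure the error against I2 and I1) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the motion vectors are assumed to be integer pixel offsets (mh, mv) pointing into the reference image, and samples outside the image are clamped to the border. The warping actually used in the experiments may differ in all of these respects.

```python
import math


def warp(reference, motion):
    """Predict a frame by sampling the reference at (y + mv, x + mh) for each pixel.

    reference: 2-D list of pixel values; motion: 2-D list of (mh, mv) integer offsets.
    Coordinates falling outside the image are clamped to the border (an assumption).
    """
    h, w = len(reference), len(reference[0])
    warped = []
    for y in range(h):
        row = []
        for x in range(w):
            mh, mv = motion[y][x]
            sy = min(h - 1, max(0, y + mv))
            sx = min(w - 1, max(0, x + mh))
            row.append(reference[sy][sx])
        warped.append(row)
    return warped


def mse(a, b):
    """Mean squared error between two images of identical size."""
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def psnr(mse_value, peak=255.0):
    """PSNR in dB for 8-bit images; infinite when the images are identical."""
    return math.inf if mse_value == 0 else 10.0 * math.log10(peak ** 2 / mse_value)


if __name__ == "__main__":
    i1 = [[10, 20, 30], [40, 50, 60]]
    i2 = [[10, 10, 20], [40, 40, 50]]  # i1 shifted right by one pixel
    field = [[(-1, 0)] * 3 for _ in range(2)]
    print(mse(warp(i1, field), i2), mse(i1, i2))
```

In this toy case the warped image matches I2 exactly while the unwarped reference does not, which mirrors the lower MSE of panel (c) relative to panel (a) in the figures.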
We further study the RD performance of the proposed algorithms in the decoding of the image ˜I2. From Fig. 5.18(a) and Fig. 5.18(b) it is clear that our proposed solutions outperform independent coding, since they exploit the correlation between images. We then compare the performance to state-of-the-art solutions in joint and distributed video coding. First, we provide the performance of a DSC scheme based on motion learning [147], using an experimental setup similar to the one described in the previous section and a reference image ˆI1 of 45 dB for a fair comparison (denoted as Motion learning in the figures). In addition, we implement OPT-2 with a different dictionary that is built on blocks of the reference image, similarly to [103] (denoted as Block scheme in the figures). For the sake of completeness, we further provide results of a joint video encoding solution based on H.264 with an IP encoding structure (i.e., a GOP size of 2). We again encode the reference I-frame (I1) at a quality of 45 dB for the Foreman dataset (33 dB for the Tennis dataset), and we vary the quantization parameter for the P-frame (I2) to build the rate-distortion characteristics. We consider two different settings for the H.264 motion estimation, which is performed with either variable or fixed macroblock size. From Fig. 5.18(a) we first observe that the measurement consistency term Et in OPT-2 greatly improves the performance of our motion estimation algorithm. OPT-2 also outperforms the DSC solution based on motion learning, owing to a better model of the geometric correlation between images. The correlation estimation with a block-based dictionary is less efficient than the estimation with a dictionary of geometric atoms. Finally, from Fig. 5.18(a) and Fig. 5.18(b) we see that joint encoding based on H.264 is better than the distributed coding solutions for both the Foreman and Tennis datasets. However, our algorithm is able to compete at low bit rates with H.264 based on fixed block-size motion estimation, which is an interesting and promising result. It should be noted that our scheme predicts the second image by motion compensation (i.e., warping) only; this fails to estimate accurately the visual information along edges and in textured regions. State-of-the-art schemes such as H.264 and DSC based on motion learning, on the other hand, also compensate for the prediction error in addition to estimating the correlation. Despite this, we show by experiments that the proposed scheme outperforms H.264 (at low rates) and DSC based on motion learning thanks to an accurate motion field estimation. In Chapter 7, we propose a joint reconstruction algorithm that improves the quality of the image ˜I2 by estimating the
(a) MSE: 78.49 (b) (mh,mv) (c) MSE: 47.22 (d) MSE: 60.1
(e) (mh,mv) (f) MSE: 38.1 (g) MSE: 62.11
Figure 5.15: Comparison of the warped image ˜I2 with respect to I2 and I1 with the OPT-2 scheme in the Foreman dataset. (a) Inverted absolute error 1 − |I1 − I2| between original images. Top row: results estimated from a measurement rate of 5% with 2-bit quantized linear measurements: (b) motion map (mh, mv); (c) inverted prediction error 1 − |˜I2 − I2| with respect to I2; (d) inverted prediction error 1 − |˜I2 − I1| with respect to I1. Bottom row: results estimated from a measurement rate of 15% with 2-bit quantized linear measurements: (e) motion map (mh, mv); (f) inverted prediction error 1 − |˜I2 − I2| with respect to I2; (g) inverted prediction error 1 − |˜I2 − I1| with respect to I1.
(a) MSE: 282.77 (b) (mh,mv) (c) MSE: 154.25 (d) MSE: 271.12
Figure 5.16: Comparison of the warped image ˜I2 with respect to I2 and I1 with the OPT-2 scheme in the Tennis dataset: (a) inverted absolute error 1 − |I1 − I2| between original images; (b) motion map (mh, mv) estimated with OPT-2; (c) inverted prediction error 1 − |˜I2 − I2| with respect to I2; (d) inverted prediction error 1 − |˜I2 − I1| with respect to I1. The motion field is estimated using a measurement rate of 5% with 2-bit quantized linear measurements.
missing visual information from quantized linear measurements. We then compare the RD performance for the predicted image ˜I2 between the graph-based (global) and the constructive search parameter (local) optimization methodologies. The comparison is shown in Fig. 5.19 for the Foreman dataset, where the RD performance of the global and local optimization schemes is represented by the dashed and dotted lines respectively. From Fig. 5.19, we see that the RD performance is significantly improved when the OPT-1 and OPT-2 optimizations are solved using strong optimization techniques based on Graph Cuts, which is
(a) MSE: 481.01 (b) (mh,mv) (c) MSE: 195.77 (d) MSE: 414.17
Figure 5.17: Comparison of the warped image ˜I2 with respect to I2 and I1 with the OPT-2 scheme in the Mequon dataset: (a) inverted absolute error 1 − |I1 − I2| between original images; (b) motion map (mh, mv) estimated with OPT-2; (c) inverted prediction error 1 − |˜I2 − I2| with respect to I2; (d) inverted prediction error 1 − |˜I2 − I1| with respect to I1. The motion field is estimated using a measurement rate of 5% with 2-bit quantized linear measurements.
[Plots for Figure 5.18. Axes: Rate (bpp) versus PSNR (dB). Panel (a) curves: JPEG 2000; H.264 − Variable blocks; H.264 − Block size 8; OPT-2; Motion learning; Block scheme; OPT-1. Panel (b) curves: Tennis: JPEG 2000; Tennis: H.264 − Variable blocks; Tennis: H.264 − Block size 8; Tennis: OPT-2; Mequon: OPT-2; Mequon: JPEG 2000.]
Figure 5.18: (a) Rate-distortion performance of the OPT-1 and OPT-2 schemes for decoding ˜I2 in the Foreman dataset. Comparisons with state-of-the-art coding solutions in independent, joint and distributed video coding schemes. (b) Rate-distortion performance of the proposed OPT-2 scheme compared with state-of-the-art independent coding solutions based on JPEG 2000 for the Tennis and Mequon datasets.
consistent with our earlier observations in the disparity estimation experiments.