In this chapter, a novel interpolation-based virtual view assisted super-resolution method for mixed-resolution multiview video has been proposed. The low resolution views in
Table 3.6: The Luminance PSNR (dB) and SSIM values and gains over the benchmark method for multiview video
Doorflower QP 22 27 32 37 42 47 PSNR Lan 33.40 33.15 32.55 31.46 29.62 27.43 Pro 37.48 36.89 35.62 33.80 31.49 28.70 ∆PSNR 4.08 3.74 3.07 2.34 1.87 1.27 SSIM Lan 0.972 0.967 0.957 0.941 0.904 0.840 Pro 0.988 0.982 0.973 0.960 0.932 0.876 ∆SSIM 0.015 0.015 0.016 0.019 0.028 0.036 Pantomime QP 22 27 32 37 42 47 PSNR Lan 35.56 35.27 34.53 33.13 30.80 27.54 Pro 40.16 39.71 38.11 35.41 31.91 28.00 ∆PSNR 4.60 4.44 3.58 2.28 1.11 0.46 SSIM Lan 0.994 0.993 0.991 0.986 0.975 0.939 Pro 0.997 0.996 0.994 0.989 0.978 0.942 ∆SSIM 0.003 0.003 0.003 0.003 0.003 0.003 Champagne QP 22 27 32 37 42 47 PSNR Lan 34.01 33.67 32.90 31.62 29.55 26.95 Pro 37.75 36.88 35.48 33.17 30.42 27.31 ∆PSNR 3.74 3.21 2.58 1.55 0.87 0.36 SSIM Lan 0.992 0.990 0.985 0.977 0.961 0.923 Pro 0.996 0.994 0.989 0.981 0.965 0.926 ∆SSIM 0.004 0.004 0.004 0.004 0.004 0.003 Dog QP 22 27 32 37 42 47 PSNR Lan 34.99 34.29 32.95 31.07 28.45 25.82 Pro 36.81 35.77 33.97 31.71 28.76 26.00 ∆PSNR 1.82 1.48 1.02 0.64 0.31 0.18 SSIM Lan 0.983 0.977 0.962 0.933 0.862 0.733 Pro 0.991 0.984 0.970 0.940 0.869 0.743 ∆SSIM 0.008 0.007 0.007 0.007 0.007 0.010 Kendo QP 22 27 32 37 42 47 PSNR Lan 37.56 36.85 35.48 33.49 30.87 27.86 Pro 41.05 39.90 38.01 35.58 32.58 29.35 ∆PSNR 3.49 3.05 2.53 2.09 1.71 1.49 SSIM Lan 0.986 0.981 0.972 0.956 0.927 0.877 Pro 0.990 0.986 0.980 0.968 0.946 0.907 ∆SSIM 0.004 0.005 0.008 0.012 0.020 0.030
the MR multiview video are super-resolved to full resolution size using a two-step pro- cess. In the first stage, the similarity between the LR pixels and their counterparts in the virtual view is measured. Then if necessary, a smoothness check will be carried out to determine whether to use virtual view pixels or interpolated pixels to fill the zero-
filled pixels. Subsequently, the quality of the virtual-view-based pixels is enhanced by compensating the intrinsic luminance difference between the two views. Furthermore, the inter-view correlation is exploited to enhance the LR pixels in the super-resolved frame by reducing their compression distortion. Therefore, different from the previous interpolation-based SR algorithms, the advantages of virtual views have been exploited by the proposed method at different stages. Moreover, it has been shown that the proposed algorithm achieves superior performance with respect to both benchmark and state-of-the-art approaches. Future work will be devoted to combining temporal correlations with inter-view correlation to improve the exploitation of the virtual views. It is worth reporting that the work reported in this section has led to the following publication: Zhi Jin, Tammam Tillo, Jimin Xiao, Chao Yao and Yao Zhao, Virtual View Assisted Video Super-Resolution and Enhancement, Circuits and Systems for Video
Chapter 4
Depth-Map-Assisted Texture
Super-Resolution for Multiview
Video Plus Depth
4.1
Introduction
The MVD format [98], as one popular representation format for 3D multiview data, consists of textures and the associated per-pixel depth data. Although this format allows any intermediate view within a certain range to be generated, referring to 2D video systems, the required transmitted data of 3D multiview video is still very large.
Compared with texture, depth maps require less transmission bitrate [99], however, the quality of the decoded depth maps highly affects the quality of the DIBR generated synthesized views. Hence, much research has only focused on reducing the amount of transmitted texture data. In [100], Garcia et al. proposed a mixed-resolution-based coding approach where all of the frames were divided into groups of N FR frames and M LR frames. In each group, the smallest encoding resolution for the M LR frames was obtained by iteratively comparing theN-th reconstructed FR frame with its original version. Since the best downsampling ratio was obtained based on only one FR frame and then applied to the remaining LR frames, the obtained results may not be optimal. Moreover, if the resolution estimation analysis was done on a large scale, the computational complexity would hugely increase. While, Zhang et al. in [9] proposed a content adaptive downsampling on both views of a stereoscopic video. However, the downsampling mechanism needs information regarding the interpolation mechanism used.
erally improve the rate-distortion performance at low bitrate. However, it may result in subjective quality degradation. In order to overcome this quality degradation, such as, block artifacts, blurred details and ringing artifacts around the edges [90], some sophisticated edge-guided upsampling and super-resolution methods have been pro- posed. In [101], a sharp HR gradient field was obtained from the LR image by an adaptive self-interpolation algorithm, and then this HR gradient was used as an addi- tional edge-preserving constraint to refine the FR image. In [8] and [102], virtual views have been utilized to recover FR frames in a MR-MVD framework, so in [8] the high frequency content has been extracted from the virtual view and then added to the LR frame to reconstruct the FR frame. However, in this work, the high frequency content is extracted from the whole frame, thus the local characteristics of the scene are not taken into account. While, in [102], the local similarity between the LR frames and their corresponding virtual view has been used to steer the FR recovery mechanism. Consequently, for similar areas, virtual view pixels are used to generate the FR frame, whereas, for non-similar areas a conventional interpolation method is used. The main weakness of this approach is that it does not provide a mechanism to jointly fuse these two kinds of pixel. Different from the previously described paradigms, in [103] and [9], an optimized down/upsampling framework is proposed. In this approach, in order to minimize interpolation errors, the image downsampling pattern is evaluated as a function of the interpolation method. Unfortunately, this paradigm is not suitable for video applications, because the downsampling pattern is frame dependent which means that temporal redundancy cannot be efficiently removed by the video encoder. Thus, for video applications there is a need to use temporally static downsampling patterns. In this chapter asystematical framework to downsample and upsample MVD data is proposed. In the proposed downsampling approach, the rows of two adjacent texture views are downsampled following an interlacing and complementary pattern, before compression. The aim of this downsampling approach is to facilitate the upsampling at the decoder side, where the LR views will be upsampled by fusing the virtual view pixels with directional interpolated pixels with the aid of pattern direction of the dis- carded pixels. This approach has two benefits. Firstly, the high frequency information contained in counterpart LR view can be properly utilized to upsample the other LR view through the generated virtual views. Secondly, since the virtual view quality de- pends on many factors, e.g. DIBR technique and depth map quality, it generally has
low quality on depth discontinuous areas, where on the other hand, directional interpo- lation approaches can work well. Hence, by taking advantages of these two strategies, the discarded pixels can be recovered efficiently. Experimental results have shown that the proposed algorithm achieves superior performance with respect to the filter-based interpolation algorithms and other state-of-the-art algorithms. The proposed upsam- pling approach is named Directional Data Fusion Upsampling (DDFU) throughout this chapter.
The rest of this chapter is organized as follows. Section 4.2 describes the details of the proposed down/upsampling algorithm. Experimental results are presented in Section 4.3 and the conclusion is in Section 4.4.