Research on Multi-View Video Plus Depth Coding Based on Network Bandwidth


Procedia Engineering

www.elsevier.com/locate/procedia

2012 International Workshop on Information and Electronics Engineering (IWIEE)


Yapei Zhu^a, Mei Yu^a,*, Gunjun Zhang^a, Gangyi Jiang^a, Zongju Peng^a

^a Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China

Abstract

In free viewpoint television systems, the video coder often has to operate within fixed network bandwidth limitations. This paper first presents detailed coding results at the target bit rates and selects the optimal combinations of quantization parameters (QPs) for video and depth. A joint coding method for multi-view video plus depth that uses the optimal combination of the QPs is then proposed. Simulation results demonstrate that the proposed coding method decreases the bit-rate by 16.07% to 24.02% in comparison with JMVC, while achieving a better trade-off between bit-rate and visual quality.

Keywords: Free viewpoint television systems; Multi-view video coding; Network bandwidth; Quantization parameters

1. Introduction

The demands on video quality and content have risen steadily with the rapid development and wide application of digital video, and traditional 2D imagery no longer satisfies viewers [1]. At the same time, 3D video (3DV) technology has become a hot research topic. 3DTV/FTV is the next generation of television technology following the maturation of high definition television (HDTV) [2,3,4]. Multi-view Video Coding (MVC) was the first phase of FTV (free-viewpoint TV) and enabled the efficient coding of multiple camera views. 3DV, a newer framework, is the second phase of FTV. FTV is a framework that allows viewing of a 3D world by freely changing the viewpoint [5].


* Corresponding author. Tel.: +86-0311-8799-4277.

E-mail address: [email protected].

Open access under CC BY-NC-ND license.


Generally, multi-view video technology is still in its infancy, and many key aspects remain to be studied in depth. In FTV systems, the complete processing chain captures a large number of multi-view videos and the associated per-pixel depth maps, compresses and transmits them, and presents them interactively. As a result, the quality of the synthesized view depends strongly on the reconstructed videos and depth maps. How to optimize the quality of the synthesized video is therefore a central problem in FTV applications.

The remainder of this paper is organized as follows. Section 2 describes the applications and requirements of 3DV. Different combinations of QP values for video and depth are tested in Sections 3 and 4, based on the characteristics and requirements of 3DV. The work is concluded in Section 5.

2. The Applications and Requirements of 3DV

For representing a large number of views, a multi-view extension of stereo video coding is used, typically requiring a bit rate that is proportional to the number of views [6]. However, since the quality improvement of multi-view displays will be governed by an increase in the number of emitted views, a format is needed that allows the generation of an arbitrary number of views while the transmission bit rate stays constant. Consequently, the Joint Video Team proposed the multi-view video plus depth (MVD) structure for 3DV systems [7], which combines video signals with their associated depth maps. The depth maps provide disparities associated with every sample of the video signal, which can be used to render an arbitrary number of additional views via view synthesis.

Due to limitations in the production environment, the 3DV data format is assumed to be based on limited camera inputs; stereo content is most likely, but more views might also be available. The targets of the 3DV format are illustrated in Fig. 1.

[Figure 1: limited camera inputs feed a data format with constrained rate (based on distribution), which serves stereoscopic displays (left/right views; variable stereo baseline, adjustable depth perception) and auto-stereoscopic N-view displays (wide viewing angle, large number of output views).]

Fig. 1. Targets of the 3DV format

The rate required for transmitting the 3DV format should be fixed to the distribution constraints. For support of a large range of high-quality displays, there should not be an increase in the rate simply because the display requires a higher number of views to cover a larger viewing angle. In this way, the transmission rate and the number of output views are decoupled. For support of displays providing a significantly limited number of output views, it should be possible to decrease the bit rate by excluding information that is not required for rendering from the transmission. The amount of MVD data is tremendous because it includes multiple texture videos and associated depth videos of the same scene at different positions and angles [8]. In order to transmit and store these signals for practical use, they must be effectively compressed.


3. The combinations of QP values for multi-view video and depth

Firstly, depth maps were created using the depth estimation reference software (DERS) [9]. In the compression experiments, the JMVC reference software is used to compress the test sequences. The resolution of the sequences is 1024×768, and the cameras are arranged in parallel with a distance of 6.5 cm between adjacent cameras. For four views, as in our case, this leads to limited inter-view prediction, which is applied only for anchor frames. In the compression process, the same multi-view video and depth video test data sets were encoded and decoded using H.264/MVC with typical settings for MVC, such as variable block size, a search range of 32, CABAC enabled and rate control via Lagrangian techniques. From both test data sets, the first 41 frames were used with a GOP size of 8.

Secondly, the depth data were coded using QPs from 20 to 51 with a step size of one. For the video, a QP range from 22 to 51 with a step size of one was used. For all combinations of coded video and coded depth, views 1, 3 and 5 were synthesized from views 0, 2, 4 and 6 using the VSRS 3.5 reference software with default settings.
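To give a sense of the size of this search, the following minimal Python sketch enumerates the QP grid described above. The names `VIDEO_QPS`, `DEPTH_QPS` and `qp_grid` are illustrative and not part of JMVC or VSRS; the sketch only lists the combinations and does not run any encoder.

```python
from itertools import product

# QP ranges as stated in the text (both ends inclusive, step size 1).
VIDEO_QPS = range(22, 52)  # 22..51
DEPTH_QPS = range(20, 52)  # 20..51


def qp_grid():
    """Return every (video_qp, depth_qp) pair in the tested grid."""
    return list(product(VIDEO_QPS, DEPTH_QPS))


if __name__ == "__main__":
    grid = qp_grid()
    # 30 video QPs x 32 depth QPs
    print(len(grid))
```

Each pair in the grid corresponds to one encoding of the video and depth data followed by one view synthesis run.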

Thirdly, the PSNR of the synthesized views 2, 4 and 6 was considered for optimization. The PSNR of the three views was calculated by comparing the view synthesized from unencoded video and depth with the view synthesized from coded video and depth. The sequences coded with the QP combination that gives a total bit rate close to the target bit rates of 3000 kbit/s, 2000 kbit/s, 1000 kbit/s and 500 kbit/s together with the highest PSNR of the synthesized views were chosen for the Exploration Experiments.
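The PSNR comparison described in this step can be sketched as follows. This is a minimal illustration that assumes 8-bit samples (peak value 255) passed as flat sequences; the function name `psnr_db` is hypothetical, and the paper does not state which colour component or tool was used for the measurement.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[int], test: Sequence[int], peak: int = 255) -> float:
    """PSNR in dB between two equally sized sequences of samples.

    `reference` stands for the view synthesized from uncoded video and depth,
    `test` for the view synthesized from coded video and depth.
    """
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)
```

For a whole sequence, one common choice is to average the per-frame PSNR values; that averaging rule is an assumption here, since the text does not specify it.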

Finally, the joint coding for MVD which utilizes the optimal combination of the QPs is tested and compared.
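The selection rule from the third step (total bit rate close to the target, highest synthesis PSNR) can be sketched in Python as below. The paper does not define how close "close to the target" is, so the 5% tolerance is an assumption, and `CodingResult` and `select_qp_pair` are hypothetical names. Any measurements passed in would come from actual encoder and synthesis runs.

```python
from typing import Iterable, NamedTuple, Optional


class CodingResult(NamedTuple):
    video_qp: int
    depth_qp: int
    total_kbps: float      # video + depth bit rate
    synth_psnr_db: float   # PSNR of the synthesized views


def select_qp_pair(
    results: Iterable[CodingResult],
    target_kbps: float,
    tolerance: float = 0.05,  # assumed window; not specified in the paper
) -> Optional[CodingResult]:
    """Pick the result with the highest synthesis PSNR near the target rate."""
    low = target_kbps * (1.0 - tolerance)
    high = target_kbps * (1.0 + tolerance)
    candidates = [r for r in results if low <= r.total_kbps <= high]
    if not candidates:
        return None  # no tested combination lands near this target
    return max(candidates, key=lambda r: r.synth_psnr_db)
```

Calling `select_qp_pair(results, 3000)` once per target rate over the full set of measured combinations yields one (VideoQP, DepthQP) pair per target.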

The PSNR of synthesized views and total bit rate for different combinations of QP values for video and depth are shown in Fig. 2(a). Fig. 2(b) shows detailed coding results at the target bit rates. The numbers in the brackets show the combinations of QPs for video and depth in the format: (VideoQP, DepthQP). Table 1 summarizes the selected QPs for Video and Depth as well as the resulting bit rate distributions.

[Figure 2: two panels, each titled "Book Arrival - View Synthesis Quality"; x-axis: Total Bitrate (kbps); y-axis: Synthesis PSNR (dB). In panel (b) the point (27,32) is labelled.]

Fig. 2. (a) PSNR of synthesized views vs. total bit rate for various QP combinations; target bit rates are marked with green lines; (b) PSNR of synthesized views vs. total bit rate at 3000 kbit/s


Table 1. Selected QPs for video and depth and the resulting bit rate distributions

Target rate (kbps)                     3000   2000   1000   500
QP video                               27     30     35     42
QP depth maps                          32     35     43     47
Bit rate video / total bit rate (%)    74     70     75     71

At each target rate, the video accounts for 70% or more of the total bit rate. Consequently, the quality of the synthesized view depends strongly on the video quality.
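As a rough worked example, the video shares reported in Table 1 can be turned into approximate video and depth bit rates. The sketch below assumes that the total bit rate equals the target rate exactly and that the percentages are exact, neither of which is stated in the paper, so the results are only indicative. The names `TABLE_1_VIDEO_SHARE` and `split_rate` are illustrative.

```python
# Target rate (kbps) -> share of the total bit rate spent on video (%),
# taken from Table 1.
TABLE_1_VIDEO_SHARE = {3000: 74, 2000: 70, 1000: 75, 500: 71}


def split_rate(target_kbps: float, video_share_pct: float):
    """Return (video_kbps, depth_kbps) for a given total rate and video share."""
    video_kbps = target_kbps * video_share_pct / 100
    depth_kbps = target_kbps - video_kbps
    return video_kbps, depth_kbps


if __name__ == "__main__":
    for target, share in TABLE_1_VIDEO_SHARE.items():
        video, depth = split_rate(target, share)
        print(f"{target} kbps: video ~{video:.0f} kbps, depth ~{depth:.0f} kbps")
```

Under these assumptions, the 3000 kbps target corresponds to roughly 2220 kbps for video and 780 kbps for depth.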

Fig. 3. The first frame of synthesized views at each target bit-rate

Figure 3 shows the first frame of view 2 synthesized from coded data as an example. Fig. 3 shows that at the higher bit-rates the synthesized view is sharp and clear, while the view becomes increasingly blurred as the bit-rate decreases.

4. Joint coding for Multi-view Video plus Depth Based on Network Bandwidth

Table 2 shows the RD performance of the proposed method and the JMVC. Every cell shows the bit-rate of a test sequence for a given basis QP. In the table, BS denotes the bit-rate saving in the coding process, which is defined by

$$ BS = \frac{Bitrate_{JMVC} - Bitrate_{proposed}}{Bitrate_{JMVC}} \times 100\% \qquad (1) $$

Compared with the JMVC, the bit-rate of the proposed method decreases significantly, by an average of 17.98%, ranging from 16.07% to 24.02%.
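Equation (1) can be checked against the bit rates reported in Table 2 with a few lines of Python. The function name `bitrate_saving` is illustrative; the numeric inputs in the usage note are the Book Arrival and Leave Laptop values from Table 2.

```python
def bitrate_saving(bitrate_jmvc: float, bitrate_proposed: float) -> float:
    """Bit-rate saving BS in percent, following Eq. (1)."""
    if bitrate_jmvc <= 0:
        raise ValueError("JMVC bit rate must be positive")
    return (bitrate_jmvc - bitrate_proposed) / bitrate_jmvc * 100.0


if __name__ == "__main__":
    # (JMVC kbps, proposed kbps) pairs from Table 2.
    pairs = [
        (3650.00, 2987.97), (2446.83, 1998.86), (1303.42, 990.29),  # Book Arrival
        (2861.58, 2401.70), (1902.94, 1591.05), (1026.02, 813.44),  # Leave Laptop
    ]
    for jmvc, proposed in pairs:
        print(f"{jmvc:8.2f} -> {proposed:8.2f}: BS = {bitrate_saving(jmvc, proposed):.2f}%")
```

The computed savings agree with the "Saved (%)" column of Table 2 to within 0.01 percentage points.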


Figure 4 shows the RD performance of the proposed method and the JMVC. Compared with JMVC, the proposed method achieves clearly better RD performance in terms of synthesized view quality.

[Figure 4: two panels, titled "Book Arrival - View Synthesis Quality" and "Leave Laptop - View Synthesis Quality"; x-axis: Total Bitrate (kbps); y-axis: Synthesis PSNR (dB); curves: Proposed and JMVC.]

Fig. 4. RD performance comparison between JMVC and the proposed method

Table 2. RD performance comparison between the JMVC and the proposed method

            Book Arrival total bit-rate (kbps)    Leave Laptop total bit-rate (kbps)
Color QP    JMVC       Proposed   Saved (%)       JMVC       Proposed   Saved (%)
27          3650.00    2987.97    18.13           2861.58    2401.70    16.07
30          2446.83    1998.86    18.31           1902.94    1591.05    16.39
35          1303.42     990.29    24.02           1026.02     813.44    20.72

5. Conclusion

This paper studies multi-view video plus depth coding under fixed bandwidth limitations, evaluated through view synthesis. Different combinations of QPs for video and depth maps are coded in order to find those that give the best quality of the synthesized virtual views. A joint coding method that uses the optimal combination of the QPs is then proposed. Simulation results demonstrate that the proposed coding method decreases the bit-rate by 16.07% to 24.02% in comparison with JMVC, while achieving a better trade-off between bit-rate and visual quality.

Acknowledgments. This work is supported by the Natural Science Foundation of China (Grants 60832003, 60872094), the Natural Science Foundation of Zhejiang Province (Y1101240, Y1090752), and the Natural Science Foundation of Ningbo (2011A610197, 2011A610200).

References

[1] A. Kubota, et al., “Multi-view imaging and 3DTV”, IEEE Signal Processing Magazine, vol. 11, no. 8, pp. 10-21, 2007.

[2] A. Smolic, K. Mueller, P. Merkle, et al., “3D Video and Free Viewpoint Video-Technologies, Applications and MPEG Standards”, Proceedings of 2006 IEEE International Conference on Multimedia and Exposition, pp. 2161-2164, Jul. 2006.

[3] A. Smolic, P. Kauff, “Interactive 3D Video Representation and Coding Technologies”, Proceedings of 2005 IEEE Special Issue on Advances in Video Coding and Delivery, vol. 93, no. 1, 2005.

[4] ISO/IEC JTC1/SC29/WG11, “Multiview Video Coding Requirements”, Doc. N8064, Montreux, Switzerland, Apr. 2006.

[5] ISO/IEC JTC1/SC29/WG11, “Applications and Requirements on 3D Video Coding”, Doc. W12035, Torino, Italy, Jul. 2011.

[6] K. Mueller, P. Merkle, T. Wiegand, “3-D Video Representation Using Depth Maps”, Proceedings of the IEEE, vol. 99, no. 4, pp. 643-656, 2011.

[7] A. Smolic, K. Mueller, et al., “Multi-view Video plus Depth (MVD) Format for Advanced 3D Video Systems”, San Jose, USA, Apr. 2007.

[8] P. Kauff, N. Atzpadin, et al., “Depth Map Creation and Image Based Rendering for Advanced 3DTV Services Providing Interoperability and Scalability”, Signal Processing: Image Communication, Special Issue on 3D Video and TV, vol. 22, no. 2, pp. 217-234, 2007.

[9] ISO/IEC JTC1/SC29/WG11, “Reference Software for Depth Estimation and View Synthesis”, Doc. M15377, Archamps, France, Apr. 2008.