Foreground-Object-Protected Depth Map Smoothing for DIBR


Xiao-han Lu, Fang Wei, Fang-min Chen

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications

Beijing, China

[email protected], [email protected], [email protected]

Abstract—Depth image based rendering (DIBR) uses a 2-D color image and its associated depth map to render virtual views and, from them, a stereoscopic image. One of the main problems in DIBR is how to reduce the size of the holes in the generated virtual view image. Smoothing the depth map before image warping is a common solution. However, previous smoothing methods introduce edge distortions into the foreground objects, which are more valuable than the background. In this paper, a piecewise smoothing filter is adopted that filters the depth map only in the background regions and leaves the foreground regions unchanged. The filtering is also controlled by the registration points of the depth map. Experimental results show that the proposed method yields better subjective quality for the virtual views, and that it is faster than the former methods while achieving a better PSNR.

Keywords-DIBR; depth map smoothing; foreground object protection; depth map registration

I. INTRODUCTION

Stereoscopic videos are becoming more and more popular with the rapid development of digital technologies. In most stereoscopic video systems, two temporally synchronized video streams (one each for the left and right eyes) are used to provide stereoscopic perception to the viewers. Such a stereoscopic video needs much more storage space and bandwidth than a monocular one. Besides, it can only provide a fixed stereoscopic perception, which is not enough for some applications, such as an interactive stereoscopic application that gives users the freedom to dynamically customize the stereoscopic perception to their own preferences.

Video-plus-depth stereoscopic systems, with lower storage and bandwidth requirements and customizable stereoscopic perception, have emerged as a new choice. In such systems, depth image based rendering (DIBR) is one of the key algorithms; it synthesizes virtual views from a monocular image and its associated depth image. The quality of the final stereoscopic video largely depends on the effectiveness of the DIBR results.

DIBR algorithms are basically divided into three steps: depth image smoothing, image warping, and hole filling. The holes represent the areas in the virtual views that are not visible in the original monocular image. Smoothing the depth map to reduce its discontinuities helps decrease the size of the holes, making them much easier to fill.

In previous works, a Gaussian filter is usually used as the smoothing filter. For example, [2] utilized a symmetric Gaussian low-pass filter to smooth the depth images, while [3] and [9] used an asymmetric Gaussian low-pass filter. Edge-dependent low-pass filters were proposed in [5] and [10], and [7] adopted parallax maps to render the original images. These smoothing filters successfully preclude distortions such as rubber-sheet artifacts and strip-like impairments. However, all of them blur the edges in the depth maps, so that both foreground and background pixels are used to fill the holes in the generated virtual view images.

In this paper, a new approach to smoothing the depth maps for DIBR is presented. It is based on the concept of foreground object protection (Section II). After the sharp depth discontinuities are predicted, the depth map is smoothed under the principle that pixel values are changed only in the background regions, never in the foreground regions. In this way, the proposed method produces stereoscopic images with better subjective quality and provides a good tradeoff among time saving, hole reduction, and PSNR.

II. CONCEPT OF FOREGROUND OBJECT PROTECTION

A. Foreground Object Definition

Fig. 1(c)(d)(e) show the DIBR process without any depth map smoothing. The sharp depth discontinuity (the boundary in Fig. 1(c)) within the depth map results in wider holes in the virtual view image, as in Fig. 1(d). To model the discontinuities in depth maps, we define two relative concepts, foreground and background pixels. A pixel with a higher depth value is defined as a foreground pixel, and a pixel with a lower depth value as a background pixel. Foreground and background objects are defined in the same manner. For example, in Fig. 1(b), the man's head is considered to be a foreground object, and the wall behind him is a background object.

B. Problems of Previous Works in Foreground Object Protection

Previous depth map smoothing methods do not protect foreground objects well enough; they all distort the foreground objects to some degree. For instance, Fig. 1(f)(g)(h) show a DIBR process with a horizontal Gaussian filter. The smoothing step blurs the edges of the depth map (edge gradients in Fig. 1(f)) and makes the holes in the generated virtual view image thinner and more widely spread (Fig. 1(g)). This is because the foreground pixels that lie close to the holes in the virtual view generated without smoothing are shifted to fill parts of the holes. During hole filling, these foreground pixels are usually copied to fill the holes completely, so the foreground objects are prolonged (Fig. 1(h)).

Figure 1. (a) Original color image. (b) Part of the image, the man’s head. (c) Original depth map. (d) Generated virtual view image (holes are marked black). (e) Hole-filling result for (d). (f) Depth map preprocessed by a Gaussian filter. (g) Generated virtual view image using the smoothed depth map. (h) Hole-filling result for (g).

C. Significance in Foreground Object Protection

Choosing appropriate pixels to fill the holes is the key to the stereoscopic visual quality [8]. Subjectively, we would prolong the background window rather than the man's head to fill the holes, since that looks more natural. Moreover, the consistency of the foreground objects between the original image and the generated virtual view image ensures the quality of stereo perception. Objectively, viewers are more sensitive to image quality degradation in the foreground than in the background [6]. In most cases, foreground objects carry more crucial details, which should not be destroyed. In short, it is better to fill the holes with pixels from the background and to leave the foreground pixels unchanged, that is, to protect the foreground objects.

D. Problem Formulation of Depth Map Registration

In practical cases, the edges of the depth map do not exactly match those of the color image [8]. Fig. 2(b)(c)(d) compare the edges of the depth map with those of the color image on ‘teddy’. The edges are 2 to 4 pixels wider in the color image than in the depth map. Moreover, edges in the color image are more closely modeled as having a "ramp-like" profile. As can be seen from Fig. 2(d), the colors of the paw's edge change gradually, as a ramp. In contrast, the depth map usually has a steep edge (see Fig. 1(c)), whose corresponding position in the color image is usually at the middle of the ramp.

These mismatches cause parts of the foreground objects to be separated by holes after warping, as in Fig. 2(e). Filling these holes prolongs and distorts the foreground objects.

Hence, to protect the foreground object, the whole edge ramp should be classified as foreground; that is, the edges of the depth map should be registered to the positions where the edge ramps end in the color image.

Figure 2. (a) Image ‘teddy’. (b) Enlarged depth map of ‘teddy’. (c) Enlarged image ‘teddy’, whose edges do not match those in the depth map. (d) Enlarged teddy’s paw with an edge ramp. (e) Generated virtual view image.

III. PROPOSED METHOD

Our method mainly aims to reduce the foreground object distortions discussed in the previous section. First, the occurrence of holes is predicted by finding the sharp discontinuities in the depth map. Then, a registration of the depth map is used to deal with the misalignment between the color image and the corresponding depth map. Finally, a piecewise smoothing filter is applied only in the background. As a result, the foreground pixels in the depth map are preserved, only the background depth pixels that would lead to holes are smoothed, and all other background pixels are also preserved.

A. Hole Region Prediction

The depth map is smoothed only at the depth pixels that would lead to holes, so it is necessary to find such pixels before applying smoothing filters to the depth map. Here, suppose that a left-eye view and its associated depth map are used to generate the right-eye view, so that holes will appear only on the right side of the foreground objects. In this case, a high-to-low depth value discontinuity in the horizontal direction is what we are looking for.
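To illustrate why a high-to-low discontinuity opens holes on the right side of a foreground object, the following is a minimal one-row warping sketch. It is not the warping used in the experiments: the function name `warp_row`, the parameter `max_disp`, and the rule "disparity proportional to depth, shifted left for the right-eye view" are assumptions made only for illustration.

```python
import numpy as np

def warp_row(colors, depth, max_disp=4):
    """Warp one image row to a hypothetical right-eye view.

    Assumption: each pixel moves left by a disparity proportional to its
    depth value (0..255), so foreground pixels move further than background.
    Returns the warped row, with -1 marking hole positions.
    """
    depth = np.asarray(depth, dtype=np.int32)
    width = len(colors)
    out = np.full(width, -1, dtype=np.int32)        # -1 marks a hole
    out_depth = np.full(width, -1, dtype=np.int32)  # depth of the pixel written so far
    disp = np.rint(depth * max_disp / 255.0).astype(np.int32)
    for x in range(width):
        tx = x - disp[x]                            # target column after the shift
        # keep the nearest (highest-depth) pixel when several land on one column
        if 0 <= tx < width and depth[x] > out_depth[tx]:
            out[tx] = colors[x]
            out_depth[tx] = depth[x]
    return out
```

For a row whose left half is foreground (depth 255) and right half is background (depth 0), the foreground shifts left and the uncovered columns just to the right of it remain holes.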


If d(x, y) satisfies the following rule, (x, y) and some of the pixels to its right will lead to holes after image warping. In that case, we mark $L_d(x, y) = 1$:

$$L_d(x, y) = \begin{cases} 1, & d(x, y) - d(x+1, y) \ge \theta \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where $\theta$ is a threshold depending on the magnitude of the depth changes.
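The rule in (1) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `predict_hole_regions` and the array layout `depth[y, x]` are assumptions, and the default threshold of 10 is the value reported in the experiments.

```python
import numpy as np

def predict_hole_regions(depth, theta=10):
    """Return L_d: 1 where d(x, y) - d(x + 1, y) >= theta, else 0.

    `depth` is a 2-D array indexed as depth[y, x] (an assumed layout).
    """
    d = depth.astype(np.int32)                 # avoid uint8 wrap-around on subtraction
    mask = np.zeros(d.shape, dtype=np.uint8)
    # compare every pixel with its right neighbour; the last column has none
    mask[:, :-1] = (d[:, :-1] - d[:, 1:]) >= theta
    return mask
```

Only high-to-low steps are marked; a low-to-high step gives a negative difference and is ignored, which matches the left-to-right rendering case considered here.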

B. Depth Map Registration

To better protect foreground objects from the impact of the misaligned depth map, we register the foreground regions in the depth map to the position of the edge endings in the color image.

To find the end of the edges in the color image, a 1-D horizontal Laplacian operator is employed, as in (2) and (3). The Laplacian operator computes the second-order derivative, producing a double response to a ramp that reflects well the beginning and the ending of the ramp.

$$\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} \tag{2}$$

$$\nabla^2 f(x, y) = 2f(x, y) - f(x+1, y) - f(x-1, y) \tag{3}$$

Here f(x, y) is the luminance value of the color image at position (x, y). Suppose that hole region prediction finds M positions $(x_i, y_i)$, as given in (4). Each of them usually indicates the middle of an edge ramp. From each such position, go right in the same row and search for the maximum response of the 1-D Laplacian operator within a certain range N, as in (5). The position of the maximum is most likely the ending of the edge ramp, and it is the registration point of the depth edge that we are looking for.
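The search described above can be sketched as follows. This is a hedged illustration, not the original implementation: the function names, the `luma[y, x]` layout, and the clipping of the search window at the image border are assumptions. The signed response is taken as 2f(x, y) - f(x+1, y) - f(x-1, y), following our reading of (3); whether its magnitude should be used instead is not stated in the text.

```python
import numpy as np

def laplacian_1d(luma):
    """Horizontal response 2 f(x,y) - f(x+1,y) - f(x-1,y); border columns stay 0."""
    f = luma.astype(np.int32)
    resp = np.zeros_like(f)
    resp[:, 1:-1] = 2 * f[:, 1:-1] - f[:, 2:] - f[:, :-2]
    return resp

def registration_offsets(luma, hole_mask, n_range=5):
    """For each marked (x_i, y_i), return r_i, the offset j in 1..N with the largest response."""
    resp = laplacian_1d(luma)
    width = luma.shape[1]
    offsets = {}
    for y, x in zip(*np.nonzero(hole_mask)):
        last = min(x + n_range, width - 1)    # clip the search window at the border
        if last <= x:
            continue                          # no pixel to the right of this position
        window = resp[y, x + 1:last + 1]      # responses at x_i + 1 .. x_i + N
        offsets[(int(x), int(y))] = int(np.argmax(window)) + 1
    return offsets
```

With this sign convention, the maximum falls on the far end of a ramp that rises in luminance from the foreground to the background, so the returned offset points at the end of the edge ramp in that case.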

C. Piecewise Smoothing Method

After the depth map is registered, the extents of the foreground regions are known, and hence those of the background regions are determined as well. We can now begin the piecewise smoothing along the edge ramps from the foreground to the background.

To ensure that the foreground objects are preserved after image warping, we set the depth value at every position from each position found by hole region prediction to its final registration point to the same value, as below.

$$\hat{d}(x_i + j, y_i) = d(x_i, y_i), \quad j \in \mathbb{Z},\ 1 \le j \le r_i \tag{6}$$

As a result, the pixels from $(x_i, y_i)$ to $(x_i + r_i, y_i)$ remain contiguous after image warping, as they were in the original image, because image warping is just a translation of the original image pixels with a displacement directly or inversely proportional to the depth values.

Afterwards, we adopt a descent process to smooth the background regions of the depth map, minimizing the width of the holes with appropriate depth gradients [8]. Here, a fixed gradient G, given by (7), is used to make the width of each hole as even as possible.

S represents the number of steps needed to finish the descent. The new depth values for the background are then updated using the fixed gradient G over S steps.

$$\hat{d}(x_i + r_i + k, y_i) = d(x_i + r_i, y_i) - k \times G, \quad k \in \mathbb{Z},\ 1 \le k \le S \tag{8}$$

Figure 3. Example for Depth map smoothing (a) Depth map before smoothing (b) Depth map after smoothing
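The piecewise smoothing of (6) to (8) can be sketched for a single depth-map row as below. This is a sketch under stated assumptions rather than the original code: the function name and arguments are hypothetical, the descent is assumed to start from the held foreground value d(x_i, y_i) (that is, the value at x_i + r_i after the assignment in (6)), and pixels beyond the row end are simply skipped.

```python
import numpy as np

def piecewise_smooth_row(row, x_i, r_i, steps=5):
    """Hold the foreground depth up to x_i + r_i, then descend linearly in `steps` steps."""
    out = row.astype(np.float64)
    width = out.shape[0]
    fg = out[x_i]                              # d(x_i, y_i), foreground side of the edge
    gradient = (fg - out[x_i + 1]) / steps     # G: the depth drop split into S equal steps
    # hold: pixels x_i + 1 .. x_i + r_i take the foreground value
    hold_end = min(x_i + r_i, width - 1)
    out[x_i + 1:hold_end + 1] = fg
    # descent: pixel x_i + r_i + k is lowered by k * G, for k = 1 .. S
    for k in range(1, steps + 1):
        x = x_i + r_i + k
        if x >= width:
            break                              # ran off the row; stop early
        out[x] = fg - k * gradient
    return out
```

After S steps the descent reaches the original background value d(x_i + 1, y_i), so pixels further to the right are left unchanged. The sketch does not check whether the descent runs into another foreground object.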

IV. THE EXPERIMENTAL RESULTS

To evaluate the proposed approach, we use the test images 'Interview' (720×576 pixels) and 'Teddy' and 'Cones' (450×375 pixels).

A. Parameter Selection

The threshold for depth discontinuity detection ($\theta$) should be large enough to reject gradual depth changes but small enough to detect the large discontinuities in depth maps. A value between 5 and 30 is suitable in our cases. Depth maps with a bigger size need a larger threshold. We select $\theta = 10$ in our experiments. The depth discontinuities are generally not greater than 50, so the number of smoothing steps (S) is set to 5.

From the statistics, the width of the edge ramps in color images of this resolution is not larger than 5 pixels, so we set N = 5.

B. Subjective Visual Quality

Fig. 4 shows the comparison of depth maps before and after our proposed smoothing method.

Fig. 5 shows the generated virtual right-view images with and without the depth map registration on the image ‘Teddy’. To give a clearer view, no smoothing is applied to these images. We can see that after the registration, most of the misaligned foreground pixels are rectified.

$$G = \frac{1}{S}\left[d(x, y) - d(x+1, y)\right] \tag{7}$$

$$L_d(x_i, y_i) = 1, \quad i \in \mathbb{Z},\ 1 \le i \le M \tag{4}$$

$$r_i = \arg\max_{j \in \mathbb{Z},\ 1 \le j \le N} \nabla^2 f(x_i + j, y_i) \tag{5}$$

Fig. 6 compares the right-view image generated by our proposed method with the original left-view image for 'Teddy'. The proposed method preserves the shape and edge quality of the teddy bear (the foreground object relative to the wall).

Fig. 7 shows enlarged details of the results on the other two images, where our proposed method is compared with the asymmetric Gaussian method and the edge-dependent method. Our method visually protects the foreground objects, whereas the asymmetric Gaussian method distorts them (a prolonged face and a cone warped to a different angle) and the edge-dependent method blends the foreground pixels with many different background pixels.

Figure 4. Depth map ‘Teddy’ (a) before smoothing (b) after smoothing

Figure 5. Registration result for image 'teddy'. (a) Generated virtual view image without depth map registration. (b) Generated virtual view image with depth map registration.

C. Objective Evaluation

This section compares the PSNR, the computation time, and the hole size of the proposed method with those of the previous works. The experiments are done with OpenCV 2.0 and Microsoft Visual Studio 2008.

PSNR can reveal the fidelity of the depth cues in the generated right-view images. To obtain the PSNR, the generated right-view images of ‘cone’ and ‘teddy’ use the original right-view images as the ground truth. Since no original right-view image is available for ‘interview’, its generated right-view image is instead compared with the one generated by the non-smoothing method.
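For reference, the standard PSNR between a generated view and its reference can be computed as in the sketch below. The text does not say which channels are compared or how hole pixels are treated, so the function name `psnr`, the 8-bit peak value of 255, and the comparison over all pixels are assumptions.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)            # mean squared error over all pixels
    if mse == 0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A higher value means the generated view is closer to the reference; identical images give an infinite PSNR.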

The computation time shows the speed improvement of our depth map smoothing algorithm, and the hole size shows hole reduction performance helpful for reducing visual artifacts.

As Table I shows, the proposed method is more time-saving than the former methods while obtaining a better PSNR and a relatively smaller hole size.

V. CONCLUSION

This paper presents a foreground-object-protected smoothing method for depth maps in DIBR. The proposed method visually protects the foreground objects in generated right view images. Moreover, it has a better PSNR performance and relatively smaller hole size compared to the previous works. Better stereoscopic quality and real-time DIBR could be achieved by using this method.

TABLE I. COMPARISON OF PSNR, COMPUTATION TIME AND HOLES

Image      Method               PSNR (dB)  Time (ms)  Hole size
Interview  Non-smoothing        /          141        43003
Interview  Edge-dependent       40.56      172        41998
Interview  Asymmetric Gaussian  30.70      5651       14132
Interview  Proposed             39.06      171        40857
Cone       Non-smoothing        35.30      78         14464
Cone       Edge-dependent       34.88      95         15299
Cone       Asymmetric Gaussian  30.66      1934       4340
Cone       Proposed             35.21      94         14583
Teddy      Non-smoothing        35.49      78         10915
Teddy      Edge-dependent       35.14      95         11223
Teddy      Asymmetric Gaussian  31.88      1934       5108
Teddy      Proposed             35.91      94         10551

ACKNOWLEDGMENT

This work was supported by “the Fundamental Research Funds for the Central Universities” of China (2012RC0136).

REFERENCES

[1] C. Fehn, “Depth-image-based-rendering (DIBR), compression and transmission for a new approach on 3D-TV,” Proc. International Society for Optical Engineering Conf. Stereoscopic Displays and Virtual Reality Systems XI, San Jose, CA, Vol. 5291, pp. 93-104, Jan. 2004.

[2] W. J. Tam, G. Alain, L. Zhang, T. Martin, and R. Renaud, “Smoothing depth maps for improved stereoscopic image quality,” Proc. SPIE Conf. Three-Dimensional TV, Video, and Display III, Philadelphia, U.S.A., Vol. 5599, pp. 162-172, Oct. 2004.

[3] Liang-Hao Wang, Xiao-Jun Huang, Ming Xi, Dong-Xiao Li, and Ming Zhang, “An Asymmetric Edge Adaptive Filter for Depth Generation and Hole Filling in 3DTV,” IEEE Trans. on Broadcasting, Vol. 56, No. 3, pp. 425-431, Sep 2010.

[4] Pei-Jun Lee, “Non-geometric Distortion Smoothing Approach for Depth Map Preprocessing,” IEEE Trans. on Multimedia, Vol. 13, No. 2, pp. 246-254, Apr 2011.

[5] E. Ekmekcioglu, M. Mrak, S.T. Worrall and A.M. Kondoz, “Edge adaptive up-sampling of depth map videos for enhanced free-viewpoint video quality,” Electronics Letters, Vol. 45, No. 7, pp. 353-354, Mar 2009.

[6] Michael R. Frater, John F. Arnold, and Abedin Vahedian, “Impact of Audio on Subjective Assessment of Video Quality in Videoconferencing Applications,” IEEE Trans. on Circuits and Systems for Video Tech., Vol. 11, No. 9, pp. 1059-1062, Sep 2001.

[7] Ting-Ching Lin, Hsien-Chao Huang, and Yueh-Min Huang, “Preserving Depth Resolution of Synthesized Images Using Parallax-Map-Based DIBR for 3D-TV,” IEEE Trans. on Consumer Electronics, Vol. 56, No. 2, pp. 720-727, May 2010.

[8] Michael Fieseler, Xiaoyi Jiang, “Registration of Depth and Video Data in Depth Image Based Rendering,” 3DTV-Conference: The True Vision-Capture, Transmission and Display of 3D Video, Potsdam, Germany, pp. 1-4, May 2009.

[9] L. Zhang and W. J. Tam, “Stereoscopic image generation based on depth images for 3DTV,” IEEE Trans. Broadcast., vol. 51, pp. 191–199, Jun. 2005.

[10] W. Y. Chen, Y. L. Chang, S. F. Lin, L. F. Ding, and L. G. Chen, “Efficient depth image based rendering with edge dependent depth filter and interpolation,” in Proc. IEEE Int. Conf. Multimedia and Expo, Jul. 2005, pp. 1314–1317.

Figure 6. (a) Original left view for image 'teddy'. (b) Generated right view image with the proposed method.

Figure 7. Enlarged segments for images 'interview' and 'cone'. (a)(e) Original left view image, and generated right view image (b)(f) with the asymmetric Gaussian smoothing method, (c)(g) with the edge-dependent method, (d)(h) with the proposed method.