Semi-global Matching Based on a Modified Census Transform with Gradient Maps


2018 International Conference on Computational, Modeling, Simulation and Mathematical Statistics (CMSMS 2018) ISBN: 978-1-60595-562-9


Li SHI, Cheng-lei PENG, Si-dan DU and Yang LI*

School of Electronic Science and Engineering, Nanjing University, China

*Corresponding author

Keywords: Stereo match, Census transform, Semi-global match, Gradient maps.

Abstract. The census transform has been identified as very effective and robust for stereo matching, but it requires heavy computation and its accuracy is limited. In this paper, a semi-global matching method based on a modified census transform is proposed to reduce the computation. To exploit parallel computation such as CUDA, a new cost model is built from gradient maps. In the experiments, the proposed method reduces the time consumption compared with the original method while maintaining the accuracy.

Introduction

Stereo vision is a technology for extracting depth information about the objects in a scene from two or more cameras [1]. The depth information is obtained by calculating the horizontal displacement between corresponding points on the same row of two images. This process is called stereo matching. Stereo vision is widely used in areas such as obstacle detection, surveillance and advanced driver assistance systems.

Generally, stereo algorithms consist of four steps: matching cost computation, cost aggregation, disparity optimization and disparity refinement [2]. Most algorithms fall into two major types, local and global. Local algorithms aggregate costs over a support window and then apply a winner-takes-all (WTA) strategy to obtain the disparity. They are fast and consume little energy, but their accuracy is limited. Global algorithms mostly minimize a global cost function that combines data and smoothness terms over the whole stereo pair. They are usually more accurate, but their heavy computation and high energy consumption limit their use in real-time applications. Between the two, the Semi-Global Matching (SGM) [3] algorithm has become very popular due to its good trade-off between accuracy and computational requirements. It integrates costs along multiple 1-dimensional (1D) energy paths to approximate a global MRF-regularized cost function.

The matching cost function is an important part of a stereo matching algorithm. The census cost function has been identified as very effective and robust for stereo matching, especially under strong illumination variations, and it is widely used in practical applications. However, as the size of the support window increases, the computation increases sharply, which means more energy consumption and more processing time.

In recent years, many techniques have been proposed to address the shortcomings of the census transform. [4] introduced a four-moded census transform to address matching-quality problems, at the cost of increased complexity. [5] proposed a fast census transform that reduces the computation by decreasing the number of comparisons in the window for each pixel, which weakens the weight of the center pixel. [6] introduced a mixed matching cost called AD-Census, which combines the AD matching cost and the census matching cost. AD-Census improves the matching quality but requires more computation.


and more algorithms are implemented on such hardware for real-time applications. To exploit parallel computation, we restructure the whole algorithm: the gradient maps in each direction are calculated in advance to form a new cost model, and the cost is obtained by calculating the Hamming distance between corresponding points in the two models. The gradient maps are easily calculated in parallel, and the cost of the subsequent processing is very small. The algorithm greatly reduces the time consumption while maintaining the accuracy.

Proposed Method

In this section, we describe the proposed method for disparity map generation. Figure 1 provides an overview. The proposed method consists of four main steps: (1) cost model, (2) cost computation, (3) disparity computation and optimization, and (4) disparity refinement. We first review the traditional census transform before explaining the proposed method.

Cost Model & Cost Computation

The census cost function has been identified as very effective and robust for stereo matching. It is a non-parametric local transform based on the local intensity relations between the pixel at the center of a window and the other pixels within the window. Compared with other cost functions, the census cost function is more robust in real-time applications, especially under strong illumination variations. The census transform is defined as (1):

Figure 1. Overview of the proposed method: the left and right images pass through the cost model, cost computation, disparity computation and optimization, and disparity refinement to produce the disparity map.

Figure 2. (a) Census transform (b) Modified census transform.

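$$T(u, v) = \bigotimes_{i=-n'}^{n'} \bigotimes_{j=-m'}^{m'} \xi\big(I(u, v),\, I(u+i, v+j)\big) \tag{1}$$

$$\xi(x, y) = \begin{cases} 0 & \text{if } x \le y \\ 1 & \text{else} \end{cases} \tag{2}$$

$$n' = \left\lfloor \frac{n}{2} \right\rfloor, \quad m' = \left\lfloor \frac{m}{2} \right\rfloor \tag{3}$$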

where $I(u, v)$ is the intensity of the pixel $(u, v)$, $m \times n$ is the size of the support window and $\otimes$ denotes bit concatenation. The costs are then calculated as the Hamming distance between the two bit strings.
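The traditional census cost described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the function names are hypothetical, the comparison is assumed to return 0 when the center is less than or equal to the neighbor, the center pixel is skipped (which matches the count of one comparison fewer than the window area), and border pixels are clamped, which the paper does not specify.

```python
def xi(x, y):
    """Comparison function of (2), assumed to be 0 if x <= y, else 1."""
    return 0 if x <= y else 1


def census(img, u, v, n=3, m=3):
    """Bit string T(u, v) of (1) for an n x m window, packed into an int.

    img is a list of rows, u is the row index and v the column index.
    Neighbours outside the image are clamped to the border (an assumption).
    """
    h, w = len(img), len(img[0])
    n2, m2 = n // 2, m // 2
    bits = 0
    for i in range(-n2, n2 + 1):
        for j in range(-m2, m2 + 1):
            if i == 0 and j == 0:
                continue  # comparing the centre with itself carries no information
            uu = min(max(u + i, 0), h - 1)
            vv = min(max(v + j, 0), w - 1)
            bits = (bits << 1) | xi(img[u][v], img[uu][vv])
    return bits


def hamming(a, b):
    """Number of differing bits between two census bit strings."""
    return bin(a ^ b).count("1")


left = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
brighter = [[value + 100 for value in row] for row in left]
changed = [[10, 20, 30], [40, 25, 60], [70, 80, 90]]

t_left = census(left, 1, 1)
t_brighter = census(brighter, 1, 1)
t_changed = census(changed, 1, 1)
print(format(t_left, "08b"), hamming(t_left, t_brighter), hamming(t_left, t_changed))
```

Adding a constant offset to every intensity leaves the bit string unchanged, which illustrates why the transform is robust to illumination changes.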


in eight directions will be considered. For an $n \times n$ support window, the traditional method needs $n \times n - 1$ comparisons for each pixel, while the proposed one needs only $4(n-1)$.
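As a quick check of these counts, the two window sizes used in the experiments ($n = 7$ and $n = 9$) give:

$$n = 7:\quad n \times n - 1 = 48, \qquad 4(n-1) = 24$$

$$n = 9:\quad n \times n - 1 = 80, \qquad 4(n-1) = 32$$

A $9 \times 9$ window in the proposed scheme therefore needs fewer comparisons per pixel (32) than a $7 \times 7$ window in the traditional census transform (48).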

To exploit parallel computation, a cost model is constructed. Figure 3 shows the construction process. The cost model for each image is built from the gradient maps obtained in eight directions. There are $(n-1)/2$ gradient maps for each direction.

Figure 4(a) shows the eight directions of the gradient maps for each pixel. Figure 4(b) illustrates the gradient maps in detail. The gradient maps are obtained by (4):

$$G_{r,s} = \xi\big(I(p),\, I(p + s\,r)\big) \tag{4}$$

where $G_{r,s}$ is the value of the gradient map for the pixel $p$ in the direction $r$ with the offset $s$. For an $n \times n$ support window, the offset $s$ takes values in the range $[1, (n-1)/2]$. The variable $L$ in Figure 3, the length of the cost model, equals $4(n-1)$, which is the number of gradient maps.

Figure 3. The construction of the cost model.
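The construction of the cost model from the gradient maps can be sketched as follows. This is a minimal illustration in plain Python rather than the CUDA implementation used in the paper. The names `DIRECTIONS`, `gradient_map`, `cost_model` and `matching_cost` are hypothetical, the gradient map is assumed to compare each pixel with the pixel at offset `s` along direction `r`, borders are clamped, and a left pixel at column `v` is assumed to match the right pixel at column `v - d`.

```python
# Eight directions as (row step, column step); the order is arbitrary.
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def xi(x, y):
    """Comparison function, assumed to be 0 if x <= y, else 1."""
    return 0 if x <= y else 1


def gradient_map(img, r, s):
    """One gradient map: xi(I(p), I(p + s*r)) for every pixel p."""
    h, w = len(img), len(img[0])
    dr, dc = r
    result = []
    for u in range(h):
        row = []
        for v in range(w):
            uu = min(max(u + s * dr, 0), h - 1)  # clamp at the border (assumption)
            vv = min(max(v + s * dc, 0), w - 1)
            row.append(xi(img[u][v], img[uu][vv]))
        result.append(row)
    return result


def cost_model(img, n):
    """Pack the 4*(n-1) gradient maps into one L-bit integer per pixel."""
    h, w = len(img), len(img[0])
    model = [[0] * w for _ in range(h)]
    for r in DIRECTIONS:
        for s in range(1, (n - 1) // 2 + 1):
            g = gradient_map(img, r, s)
            for u in range(h):
                for v in range(w):
                    model[u][v] = (model[u][v] << 1) | g[u][v]
    return model


def matching_cost(model_left, model_right, u, v, d):
    """Hamming distance between left pixel (u, v) and right pixel (u, v - d)."""
    vr = max(v - d, 0)  # clamp instead of wrapping around
    return bin(model_left[u][v] ^ model_right[u][vr]).count("1")


# Synthetic pair: the right image is the left image shifted by one pixel.
left = [[(7 * u + 3 * v * v) % 11 for v in range(6)] for u in range(5)]
right = [[left[u][min(v + 1, 5)] for v in range(6)] for u in range(5)]
ml, mr = cost_model(left, 3), cost_model(right, 3)
print(matching_cost(ml, mr, 2, 3, 1))
```

Each gradient map depends only on the input image, so all maps can be computed independently of one another, which is what makes this formulation convenient for parallel hardware.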

Figure 4. (a) The directions of the gradient maps. (b) The gradient maps for each pixel.

The matching cost is obtained by calculating the Hamming distance between the corresponding vectors of length $L$ in the two cost models. The computation is therefore dominated by the acquisition of the gradient maps, which is easy to parallelize in practice. Moreover, the gradient maps obtained in opposite directions are complementary, so only half of the maps, $2(n-1)$, need to be calculated. For the same window size, the proposed algorithm therefore takes less time than the census transform, and for the same amount of computation it can use a larger window for better results. With parallel programming such as CUDA, the gradient maps and Hamming distances can be obtained efficiently.

Disparity Computation and Optimization


accuracy and complexity. The algorithm approximates a global MRF-regularized cost function by summing several 1-dimensional (1D) energy paths. Using 8 or 16 paths is sufficient to cover the structure of the image. The cost along each path $r$ is calculated by dynamic programming:

$$L_r(p, d) = C(p, d) + \min\big(L_r(p - r, d),\; L_r(p - r, d - 1) + P_1,\; L_r(p - r, d + 1) + P_1,\; \min_i L_r(p - r, i) + P_2\big) - \min_i L_r(p - r, i) \tag{5}$$

where $L_r(p, d)$ is the cost along the path $r$ for the pixel $p$ and the disparity $d$. It equals the matching cost $C(p, d)$ plus the minimum path cost of the previous pixel $p - r$, including the smoothness penalties $P_1$ and $P_2$. The last term subtracts the minimum path cost of the previous pixel so that $L_r$ does not grow without bound along the path. $P_1$ has to be smaller than $P_2$: $P_1$ penalizes small disparity changes such as those on inclined surfaces, and $P_2$ penalizes discontinuities.
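The recursion in (5) can be illustrated for a single horizontal path. The sketch below is a plain-Python illustration under stated assumptions: the function name `aggregate_path` is hypothetical, the path runs left to right along one scanline, and the first pixel of the path simply takes its matching cost because it has no predecessor.

```python
def aggregate_path(cost_row, p1, p2):
    """Path costs L_r along one scanline, with r pointing left to right.

    cost_row[x][d] is the matching cost C(p, d) of pixel x at disparity d.
    """
    n_disp = len(cost_row[0])
    path = [list(cost_row[0])]  # first pixel: no previous pixel on the path
    for x in range(1, len(cost_row)):
        prev = path[-1]
        prev_min = min(prev)
        current = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            current.append(cost_row[x][d] + min(candidates) - prev_min)
        path.append(current)
    return path


costs = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
print(aggregate_path(costs, p1=1, p2=4))
```

In this toy input the best disparity increases by one at each pixel, so the small penalty P1 applies at every step and the large penalty P2 is never the cheapest option.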

The final energy E for the pixel p and disparity d is summed along multiple paths:

$$E(p, d) = \sum_r L_r(p, d) \tag{6}$$

The disparity is then obtained by applying a winner-takes-all strategy to $E$, as shown in (7). A median filter is used to remove outliers.

$$D(p) = \arg\min_{d \in \mathcal{D}} E(p, d) \tag{7}$$
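The summation over paths, the winner-takes-all selection and the median filtering can be sketched as follows. The function names are hypothetical, and the 3 x 3 filter window is an assumption, since the paper does not state the filter size.

```python
from statistics import median_low


def winner_takes_all(path_costs):
    """path_costs[r][d] is L_r(p, d) for one pixel; returns (disparity, energy)."""
    n_disp = len(path_costs[0])
    energy = [sum(l_r[d] for l_r in path_costs) for d in range(n_disp)]
    best = min(range(n_disp), key=energy.__getitem__)
    return best, energy


def median_filter3(disp):
    """3 x 3 median filter; border pixels are left unchanged (an assumption)."""
    h, w = len(disp), len(disp[0])
    out = [row[:] for row in disp]
    for u in range(1, h - 1):
        for v in range(1, w - 1):
            window = [disp[u + i][v + j] for i in (-1, 0, 1) for j in (-1, 0, 1)]
            out[u][v] = median_low(window)
    return out


disparity, energy = winner_takes_all([[6, 5, 1], [4, 2, 3]])
filtered = median_filter3([[3, 3, 3], [3, 9, 3], [3, 3, 3]])
print(disparity, energy, filtered[1][1])
```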

Experiment Results and Discussion

In the experiment, we use the Middlebury datasets [7] to verify the proposed method. The datasets contain four images named Adirondack, Classroom, Recycle and Sticks, together with their ground truth. The resolutions of the images are 718×496, 749×474, 680×463 and 714×501, respectively. Figure 5 shows the disparity results estimated by the traditional census-based SGM and by the proposed method.

Table 1 gives the bad-pixel error rate of the two algorithms against the ground truth provided by the datasets. The error rate is calculated on non-occluded regions, using the criterion in (8).

$$err = \frac{1}{N} \sum_{(x, y)} \big(\,|D_e(x, y) - D_T(x, y)| > \delta\,\big) \tag{8}$$

where $D_e$ and $D_T$ are the estimated disparity and the ground truth disparity, respectively, $N$ is the total number of pixels in the non-occluded region and $\delta$ is the tolerance, which equals 2 in our experiment.
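The criterion in (8) amounts to counting the non-occluded pixels whose disparity error exceeds the tolerance. A minimal sketch, assuming the disparity maps are lists of rows and the non-occluded region is given as a boolean mask (the function name and the mask argument are hypothetical):

```python
def bad_pixel_rate(estimated, truth, valid, delta=2.0):
    """Fraction of valid pixels whose absolute disparity error exceeds delta."""
    total = 0
    bad = 0
    for est_row, gt_row, valid_row in zip(estimated, truth, valid):
        for d_e, d_t, ok in zip(est_row, gt_row, valid_row):
            if not ok:
                continue  # occluded pixels are excluded from the count
            total += 1
            if abs(d_e - d_t) > delta:
                bad += 1
    return bad / total if total else 0.0


estimated = [[10, 12, 20], [5, 5, 5]]
truth = [[10, 15, 21], [5, 8, 0]]
valid = [[True, True, True], [True, True, False]]
rate = bad_pixel_rate(estimated, truth, valid)
print(rate)
```

Two of the five valid pixels differ from the ground truth by more than 2, so the rate for this toy input is 0.4.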

Table 2 gives the processing time of the matching cost step for the census transform and for the proposed method. All the experiments are implemented in C++ with OpenCV and CUDA. The experimental platform is Windows 10 64-bit with an Intel Core i7-4790 CPU @ 3.60 GHz, 32 GB RAM and an NVIDIA GeForce GTX 760. The window size is 7 for the CT-SGM and 9 for the proposed method.

As can be seen from Tables 1 and 2, the proposed method achieves a lower error rate than the census-based SGM on all four images. Its matching cost computation is slightly faster on three of them and marginally slower on Classroom.

Conclusion


computation time decreases. The results indicate that our method achieves better performance than the CT+SGM.

Figure 5. Results of the different methods.

Table 1. Error rate using census transform and proposed method.

Bad-2 average error rate (%)

Dataset       CT+SGM    Proposed method
Adirondack     9.79      9.08
Classroom      9.09      8.47
Recycle       12.84     11.95
Sticks        15.63     15.01
Average       11.84     11.13

Table 2. Computation time of the matching cost using census transform and proposed method.

Computation time of matching cost (ms)

Dataset       Census    Proposed method
Adirondack    50.45     48.67
Classroom     49.23     49.29
Recycle       50.44     49.98
Sticks        62.94     61.71


References

[1] Ttofis, C., C. Kyrkou, and T. Theocharides, A Low-Cost Real-Time Embedded Stereo Vision System for Accurate Disparity Estimation Based on Guided Image Filtering. IEEE Transactions on Computers, 2015. PP (99): p. 1-1.

[2] Scharstein, D., R. Szeliski, and R. Zabih. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. in Stereo and Multi-Baseline Vision, 2001. (SMBV 2001). Proceedings. IEEE Workshop on. 2001.

[3] Hirschmuller, H., Stereo Processing by Semiglobal Matching and Mutual Information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008. 30(2): p. 328-341.

[4] Men, Y., et al., A Stereo Matching Algorithm Based on Four-Moded Census and Relative Confidence Plane Fitting. Chinese Journal of Electronics, 2015. 24(4): p. 807-812.

[5] Guo, S., P. Xu, and Y. Zheng. Semi-global matching based disparity estimate using fast Census transform. in 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). 2016.

[6] Sun, X., et al. Stereo Matching with Reliable Disparity Propagation. in 2011 International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 2011.