A Consensus Matching Algorithm with FREAK Features


2017 2nd International Conference on Computer Science and Technology (CST 2017) ISBN: 978-1-60595-461-5


Xue-wu LI and Yin-wei ZHAN^{a,*}

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China

^{a}[email protected]

*Corresponding author

Keywords: Object tracking, Feature matching, FREAK, Consensus-based matching.

Abstract. While the consensus matching algorithm can rediscover a target that disappears for a while and then reappears in the scene, it fails in complicated scenes involving drastic illumination change or image rotation. To overcome this drawback, we propose an improved strategy for the original consensus matching scheme that replaces BRISK with FREAK and FAST with SURF. Experiments show that adopting SURF and FREAK yields better performance under illumination change and image rotation, and that FREAK helps to reduce the complexity of the algorithm.

Introduction

Target tracking is an important research topic in computer vision, with applications in video surveillance and pedestrian detection. Research continues to be driven by challenging conditions in complex scenes, such as target deformation, motion blur, target occlusion, and illumination variance, for which stable target detectors and trackers are sought.

Among the many achievements in the literature, the tracking-learning-detection (TLD) algorithm [1,2] proposed by Kalal et al. was a successful strategy of the 2010s, performing well under illumination and scale changes. Henriques et al. proposed the circulant structure with kernels (CSK) algorithm [3], which is fast but cannot continue tracking once the target disappears; the crucial reason is that CSK uses grayscale features and ignores the color information of targets. The CN (color name) algorithm [4] and the KCF (kernelized correlation filters) algorithm [5] then tried to add color information on top of CSK, but did not adequately solve the problem of target appearance change. Separately, contributions [6, 7] have improved TLD, but they are slower than CSK, CN, and KCF.

By contrast, the CMT (consensus-based matching and tracking) algorithm [8,9] is an attractive strategy for long-term model-free object tracking. It copes well with changes in target appearance and runs fast with accurate tracking. Nonetheless, it does not fully adapt to drastic changes in illumination and rotation.


A Sketch of the CMT Algorithm

For a given video, i.e. an image sequence I_t, t = 1, ..., n, the CMT algorithm can be described briefly as follows.

1. Initialization. For the first frame I_1, extract its FAST [10] feature points r_i with BRISK [11] feature descriptors f_i, i = 1, ..., N, which make up O = {(r_i, f_i); i = 1, ..., N}. Manually set the bounding box b_1 of the target in I_1. Divide O into two subsets, foreground and background, according to whether a feature point in O falls into b_1.

2. Global matching. For t = 2 to n, the feature points are tracked into frame I_t by forward and backward optical flow, and the resulting feature points are denoted P. Meanwhile, extract the FAST feature points and BRISK feature descriptors of I_t and denote the result Q. Then match P with Q by the KNN algorithm to get a rough foreground O_t of I_t.

3. Local matching. Estimate the scale s and the rotation angle θ of the target in I_t. Each feature point in O_t, after being scaled by s and rotated by θ, is matched with O by the KNN algorithm. Then take the smallest rectangle that encloses the feature points as the target area b_t.
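The geometric parts of steps 1 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, keypoints are plain (x, y) tuples, and the scale and rotation are taken as medians over all keypoint pairs, which is one common way to obtain such estimates and is assumed here rather than taken from the text above.

```python
import math
from itertools import combinations
from statistics import median


def split_by_box(points, box):
    """Step 1: split keypoints into foreground (inside box) and background."""
    x0, y0, x1, y1 = box
    inside = lambda p: x0 <= p[0] <= x1 and y0 <= p[1] <= y1
    foreground = [p for p in points if inside(p)]
    background = [p for p in points if not inside(p)]
    return foreground, background


def estimate_scale_rotation(ref_pts, cur_pts):
    """Step 3 (assumed estimator): median pairwise distance ratio and
    median pairwise angle difference; ref_pts[i] corresponds to cur_pts[i]."""
    ratios, angles = [], []
    for i, j in combinations(range(len(ref_pts)), 2):
        rdx, rdy = ref_pts[j][0] - ref_pts[i][0], ref_pts[j][1] - ref_pts[i][1]
        cdx, cdy = cur_pts[j][0] - cur_pts[i][0], cur_pts[j][1] - cur_pts[i][1]
        ref_len = math.hypot(rdx, rdy)
        if ref_len == 0:
            continue  # coincident reference points carry no information
        ratios.append(math.hypot(cdx, cdy) / ref_len)
        diff = math.atan2(cdy, cdx) - math.atan2(rdy, rdx)
        # Wrap the angle difference into (-pi, pi].
        angles.append(math.atan2(math.sin(diff), math.cos(diff)))
    return median(ratios), median(angles)


def bounding_box(points):
    """Smallest axis-aligned rectangle (x0, y0, x1, y1) enclosing the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


# Toy data: the reference points scaled by 2 and rotated by 90 degrees.
ref = [(0, 0), (2, 0), (2, 1), (0, 1)]
cur = [(-2 * y, 2 * x) for x, y in ref]
s, theta = estimate_scale_rotation(ref, cur)
print(s, math.degrees(theta), bounding_box(cur))
```

Medians are used in this sketch so that a few wrong correspondences do not dominate the estimate.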

Benefiting from the advantageous combination of FAST and BRISK, the CMT algorithm attains quick and accurate tracking.

An Improved Implementation of the CMT

FAST feature points are pixels that differ greatly in intensity from the pixels in their neighborhood, and extracting them amounts to finding such pixels. Because this test relies on raw intensity differences and assigns no orientation to a keypoint, FAST does not cope well with illumination change or image rotation; that is, FAST is not invariant under rotation or illumination change.
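A minimal sketch of the FAST segment test illustrates the sensitivity to illumination. It assumes the usual formulation of FAST [10], in which a pixel is a corner if n contiguous pixels on a 16-pixel circle of radius 3 are all brighter or all darker than the centre by more than a threshold t. The values t = 20 and n = 9, the toy images, and the function names are illustrative assumptions.

```python
# 16 offsets (dx, dy) on a circle of radius 3, listed in circular order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """Segment test: True if n contiguous circle pixels are all brighter than
    img[y][x] + t or all darker than img[y][x] - t."""
    centre = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 tests "brighter", -1 tests "darker"
        flags = [sign * (v - centre) > t for v in ring]
        run = 0
        # Append the first n - 1 flags so that runs crossing the end of the
        # list (the circle wraps around) are also counted.
        for flag in flags + flags[:n - 1]:
            run = run + 1 if flag else 0
            if run >= n:
                return True
    return False


def make_corner(bright, dark, size=9):
    """Toy image whose lower-right quadrant (x >= 4, y >= 4) is bright."""
    return [[bright if x >= 4 and y >= 4 else dark for x in range(size)]
            for y in range(size)]


strong = make_corner(200, 10)  # well-lit corner
dim = make_corner(20, 1)       # same scene at one tenth of the brightness
print(is_fast_corner(strong, 4, 4))  # True
print(is_fast_corner(dim, 4, 4))     # False
```

With a fixed threshold, the same corner is detected in the bright image but missed in the dim one, since the intensity difference drops from 190 to 19.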

Our experiments show that under drastic illumination changes and target rotation, the combination of FAST and BRISK is prone to incorrect matches, and so is the CMT algorithm.

Suo et al. [12] made a systematic comparison of a variety of features, in which FREAK offers a good compromise between robustness to illumination changes and running speed. Therefore, in what follows, we adopt FREAK [13] instead of BRISK, and SURF instead of FAST, in the CMT setting, in order to overcome the tendency of CMT to lose the target under drastic illumination changes.

Comparison of FREAK with BRISK

FREAK (fast retina keypoint) is a keypoint descriptor inspired by the human visual system, especially the retina. A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. To suppress noise, each sampling point is smoothed with a Gaussian kernel. Unlike the BRISK sampling pattern, FREAK uses Gaussian blur kernels of different sizes and therefore performs better in feature description.
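The idea of a binary descriptor built from pairwise intensity comparisons can be sketched as follows. This is a toy illustration, not the FREAK sampling pattern: the five sample values stand in for already-smoothed intensities at sampling points, and the pair list and function names are hypothetical.

```python
def binary_descriptor(samples, pairs):
    """Pack one bit per pair: 1 if samples[a] > samples[b], else 0."""
    bits = 0
    for a, b in pairs:
        bits = (bits << 1) | (1 if samples[a] > samples[b] else 0)
    return bits


def hamming(d1, d2):
    """Number of differing bits between two descriptors."""
    return bin(d1 ^ d2).count("1")


# Hypothetical comparison pairs over five sampling points.
PAIRS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]

patch = [12, 40, 33, 7, 25]              # smoothed intensities (toy values)
brighter = [2 * v + 10 for v in patch]   # same patch under a gain and offset
other = [30, 10, 33, 7, 25]              # a different patch

d_patch = binary_descriptor(patch, PAIRS)
d_brighter = binary_descriptor(brighter, PAIRS)
d_other = binary_descriptor(other, PAIRS)

print(hamming(d_patch, d_brighter))  # 0
print(hamming(d_patch, d_other))     # 3
```

Because each bit depends only on the order of two intensities, any brightness change that preserves that order leaves the descriptor unchanged, and descriptors are compared cheaply by Hamming distance.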

To demonstrate the advantage of FREAK over BRISK, we ran an experiment whose result is shown in Fig. 1, where column (a) shows the matching results with FREAK and column (b) those with BRISK. FREAK clearly produces better matches than BRISK.


The Realization of CMT with FREAK and SURF

We now present our final scheme for the improved version of CMT. Feature points of each frame are extracted with the SURF algorithm and then described with the FREAK descriptor. See Fig. 2 for an illustration of the improved CMT.

Figure 1. Matching results of (a) FREAK and (b) BRISK.

Figure 2. The flowchart of CMT with SURF and FREAK: (1) get the SURF feature points and FREAK descriptors of the first frame; (2) take the next frame; (3) run forward/backward optical-flow tracking; (4) get the SURF feature points and FREAK descriptors of the current frame; (5) match the points and the descriptors; (6) get the minimum bounding rectangle that includes all of the feature points.
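The matching step of the flowchart can be sketched as a 2-nearest-neighbour search over binary descriptors. The ratio test and its threshold of 0.8 are assumptions added for illustration, since the text only states that KNN matching is used; descriptors are represented as Python integers and the function names are hypothetical.

```python
def hamming(d1, d2):
    """Number of differing bits between two binary descriptors."""
    return bin(d1 ^ d2).count("1")


def knn_match(query, train, ratio=0.8):
    """Match each query descriptor to its nearest train descriptor.

    A match is kept only if the best distance is clearly smaller than the
    second-best one (ratio test, an assumed acceptance rule).
    Returns a list of (query_index, train_index) pairs.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted(range(len(train)), key=lambda ti: hamming(q, train[ti]))
        if not ranked:
            continue
        best = ranked[0]
        if len(ranked) == 1:
            matches.append((qi, best))  # no second neighbour to compare with
            continue
        d_best = hamming(q, train[best])
        d_second = hamming(q, train[ranked[1]])
        if d_best < ratio * d_second:
            matches.append((qi, best))
    return matches


# Toy 8-bit descriptors.
train = [0b11110000, 0b00001111, 0b10101010]
query = [0b11110001,   # one bit away from train[0]: accepted
         0b00111100]   # equally far from all three: rejected as ambiguous
print(knn_match(query, train))  # [(0, 0)]
```

Rejecting ambiguous candidates in this way trades a few correct matches for fewer wrong ones, which matters when the matched points later vote on the target position.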

Experiments

The experiments are conducted on a PC with an Intel(R) Core(TM) i5-4590 3.30 GHz CPU and 8 GB of memory, running 64-bit Windows 10, with VS2013 and OpenCV 2.4.13.


We choose the videos Fish, Jogging, and David. Fish has 476 frames and its only challenge is a change of light intensity. Jogging has 307 frames; one target in it is completely occluded at frame 74 and then reappears. David has 770 frames; its main characteristics are light changes and target rotation.

The experimental results are shown in Fig. 3 to Fig. 5, where blue boxes indicate the tracking results of CMT and red ones those of iCMT (the improved CMT). The corresponding running times are shown in Table 1.

Table 1. Comparison of the time costs of the two algorithms.

Algorithm    Fish             Jogging          David
CMT          0.268 s/frame    0.088 s/frame    -
iCMT         0.286 s/frame    0.097 s/frame    0.082 s/frame

For the video David, CMT stops at frame 441, which means that CMT cannot complete the tracking task. This is why no time cost is reported for CMT on David.
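The relative cost of iCMT can be read off Table 1 with a short calculation. The snippet below only restates the table values for the two videos on which both algorithms finish; the function name is hypothetical.

```python
# Seconds per frame from Table 1: (CMT, iCMT).
TIMES = {"Fish": (0.268, 0.286), "Jogging": (0.088, 0.097)}


def overhead_percent(cmt, icmt):
    """Extra time per frame of iCMT relative to CMT, in percent."""
    return 100.0 * (icmt - cmt) / cmt


for video, (cmt, icmt) in TIMES.items():
    print(f"{video}: iCMT is {overhead_percent(cmt, icmt):.1f}% slower "
          f"({1 / cmt:.2f} vs {1 / icmt:.2f} frames/s)")
```

On these figures iCMT is about 6.7% slower than CMT on Fish and about 10.2% slower on Jogging.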

[Figure panels: Frame 7, Frame 41, Frame 94, Frame 386]

Figure 3. Comparison for illumination changes.

Figure 4. Comparison for occlusion.

Figure 5. Comparison for drastic illumination changes.

Notice that there is only a slight change of light in the video Fish. Therefore, both algorithms track the target in Fish stably, with almost the same tracking accuracy (the proportion of the tracked box that is occupied by the target).
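One way to read this accuracy measure is as the fraction of the tracked box that is covered by the target. The sketch below follows that reading under two assumptions that the text does not state: the target is available as a hypothetical ground-truth box, and both boxes are axis-aligned (x0, y0, x1, y1) tuples.

```python
def box_area(box):
    """Area of an axis-aligned box; zero if the box is empty or inverted."""
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def target_fraction(tracked, target):
    """Fraction of the tracked box that is covered by the target box."""
    overlap = (max(tracked[0], target[0]), max(tracked[1], target[1]),
               min(tracked[2], target[2]), min(tracked[3], target[3]))
    area = box_area(tracked)
    return box_area(overlap) / area if area > 0 else 0.0


tracked = (0, 0, 10, 10)   # box reported by the tracker (toy values)
target = (2, 2, 8, 12)     # hypothetical ground-truth box of the target
print(target_fraction(tracked, target))  # 0.48
```

A value of 1.0 means the tracked box contains only target pixels, and a value of 0.0 means the box has drifted off the target entirely.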

Fig. 4 shows that both iCMT and CMT track the target accurately when the illumination change is not obvious but the target is occluded. One target is partially occluded in frame 64, completely hidden in frame 74, and reappears in frame 109.


The video David (Fig. 5) contains drastic illumination changes and target rotation. At the beginning, at frame 2, both algorithms track the target well because the lighting differs little from the first frame; later, at frame 137, the target is detected by iCMT but lost by CMT. At frame 440, where the target is brightly lit, CMT tracks the target but with lower accuracy. From frame 441 on, CMT fails to keep tracking the target, whereas iCMT tracks the target through the whole video.

Summary

In this paper, to overcome the drawback that CMT fails to track targets under drastic illumination changes and target rotation, we altered the ingredients of the CMT algorithm by replacing BRISK with FREAK and FAST with SURF. The proposed algorithm gives CMT the ability to track targets under drastic illumination changes and target rotation. The experiments show that the improved version is somewhat slower than the original one, which is worth further investigation.

Acknowledgement

This work is supported by Project of Science and Technology Program of Guangdong with grant No. 2014B040401012 and Project of Science and Technology Program of Guangzhou with grant No. 201604016034.

References

[1] Z. Kalal, K. Mikolajczyk, J. Matas. Forward-backward error: Automatic detection of tracking failures, in: ICPR, 2010, pp. 2756-2759.

[2] Z. Kalal, J. Matas, K. Mikolajczyk. P-N learning: Bootstrapping binary classifiers by structural constraints, in: CVPR, 2010, pp. 49-56.

[3] J.F. Henriques, R. Caseiro, P. Martins, et al. Exploiting the circulant structure of tracking-by-detection with kernels, in: ECCV, 2012, pp. 702-715.

[4] M. Danelljan, F. Shahbaz Khan, M. Felsberg, et al. Adaptive color attributes for real-time visual tracking, in: CVPR, 2014, pp. 1090-1097.

[5] J.F. Henriques, R. Caseiro, P. Martins, et al. High-speed tracking with kernelized correlation filters, IEEE Transactions on PAMI 37 (2015) 583-596.

[6] Xin Zhou, Qiumeng Qian, Yongqiang Ye, Congqing Wang. Improved TLD visual target tracking algorithm, Journal of Image and Graphics 18:9(2013) 1115-1123. (In Chinese)

[7] Fei Qin, Ronggui Wang, Qixiang Liang, et al. Improved TLD target tracking algorithm based on key feature points, Computer Engineering and Applications 52:4(2016) 181-187. (In Chinese)

[8] G. Nebehay, R. Pflugfelder. Consensus-based matching and tracking of keypoints for object tracking, in: WACV, 2014, pp. 862-869.


[10] E. Rosten, T. Drummond. Machine learning for high-speed corner detection, in: ECCV, 2006, pp. 430-443.

[11] S. Leutenegger, M. Chli, R.Y. Siegwart. BRISK: Binary robust invariant scalable keypoints, in: ICCV, 2011, pp. 2548-2555.

[12] Chun-bao Suo, Dong-qing Yang, Yun-peng Liu. Comparing SIFT, SURF, BRISK, ORB and FREAK in some different perspectives, Beijing Surveying and Mapping, no. 4 (2014) 23-26, 22. (In Chinese)

[13] A. Alahi, R. Ortiz, P. Vandergheynst. FREAK: Fast retina keypoint, in: CVPR, 2012, pp. 510-517.