2017 2nd International Conference on Computer Science and Technology (CST 2017) ISBN: 978-1-60595-461-5
A Consensus Matching Algorithm with FREAK Features
Xue-wu LI and Yin-wei ZHAN
School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
*Corresponding author
Keywords: Object tracking, Feature matching, FREAK, Consensus-based matching.
Abstract. While the consensus matching algorithm is able to rediscover a target that disappears for a while and then reappears in the scene, it fails in complicated scenes involving drastic illumination change and image rotation. To overcome this drawback, we propose an improved strategy for the original consensus matching scheme that replaces BRISK with FREAK and FAST with SURF. According to our experiments, adopting SURF and FREAK yields better performance under illumination change and image rotation, and FREAK helps to reduce the complexity of the algorithm.
Introduction
Target tracking is an important research topic in computer vision, with applications in video surveillance and pedestrian detection. Continuous efforts have been motivated by challenging situations in complex scenes, such as target deformation, motion blur, target occlusion, and illumination variance, where stable target detectors and trackers are the objectives of investigation.
Among the many achievements in the literature, the tracking-learning-detection (TLD) algorithm [1,2] proposed by Kalal et al. is a successful strategy of the 2010s that performs well under illumination and scale changes. Henriques et al. proposed the circulant structure with kernels (CSK) algorithm [3], which is fast but cannot continue tracking once the target disappears; the crucial reason is that CSK uses grayscale features and ignores the color information of targets. The CN (color names) algorithm [4] and the KCF (kernelized correlation filters) algorithm [5] then tried to add color or other multi-channel feature information on the basis of the CSK algorithm, but did not solve the problem of target appearance change well. In another direction, contributions [6, 7] have improved TLD, but unfortunately they run slower than CSK, CN, and KCF.
By contrast, the CMT (consensus-based matching and tracking) algorithm [8,9] is an attractive strategy for long-term model-free object tracking. It copes well with changes in target appearance and runs fast with accurate tracking. Nonetheless, it is not fully robust to drastic illumination changes and rotation.
A Sketch of the CMT Algorithm
The CMT algorithm can be described briefly as follows for a given video, i.e. an image sequence I_t, t = 1, ..., n.
1. Initialization. For the first frame I_1, extract its FAST [10] feature points r_i with BRISK [11] feature descriptors f_i, i = 1, ..., N, which make up O = {(r_i, f_i); i = 1, ..., N}. Manually set the bounding box b_1 of the target in I_1. Divide O into two subsets, foreground and background, according to whether a feature point in O falls inside b_1.
2. Global matching. For t = 2 to n, each frame I_t is subjected to forward and backward optical flow tracking, and the resulting feature points form the set P. Meanwhile, extract the FAST feature points and BRISK feature descriptors of I_t and denote the result by Q. Then match P with Q by the KNN algorithm to get a rough foreground O_t of I_t.
3. Local matching. Estimate the scale s and the rotation angle θ of the target in I_t. Each feature point in O_t, after being scaled by s and rotated by θ, is matched with O by the KNN algorithm. Then take the smallest rectangle that encloses the feature points as the target area b_t.
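The forward and backward optical flow tracking in step 2 is commonly used as a consistency filter in the sense of the forward-backward error of [1]: a point is kept only if tracking it forward and then backward brings it close to where it started. The following is a minimal sketch of that filter, not the authors' implementation. It assumes the tracked positions are already available, and the function name and the pixel threshold `max_error` are hypothetical.

```python
import math

def forward_backward_filter(points, backtracked, max_error=2.0):
    """Return the indices of points with a small forward-backward error.

    points      -- (x, y) positions in the previous frame
    backtracked -- the same points after being tracked forward to the
                   current frame and then backward to the previous frame
    max_error   -- assumed pixel threshold (not taken from the paper)
    """
    keep = []
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(points, backtracked)):
        # Euclidean distance between the start and the round-trip position.
        if math.hypot(x1 - x0, y1 - y0) <= max_error:
            keep.append(i)
    return keep
```

Points rejected by this test are simply dropped from P, so an unreliable optical flow estimate does not contaminate the later matching steps.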
Benefiting from the advantageous combination of FAST and BRISK, the CMT algorithm attains quick and accurate tracking.
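The scale and rotation estimate in step 3 can be sketched as follows. This is an illustration under stated assumptions rather than the code of [8]: it assumes that `model_pts[k]` and `current_pts[k]` are already matched pairs, and it takes the median over all point pairs of the distance ratio (scale) and of the direction difference (rotation). The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def estimate_scale_rotation(model_pts, current_pts):
    """Estimate (scale, rotation in radians) from matched point lists."""
    scales, angles = [], []
    for i, j in combinations(range(len(model_pts)), 2):
        mdx = model_pts[j][0] - model_pts[i][0]
        mdy = model_pts[j][1] - model_pts[i][1]
        cdx = current_pts[j][0] - current_pts[i][0]
        cdy = current_pts[j][1] - current_pts[i][1]
        d_model = math.hypot(mdx, mdy)
        d_curr = math.hypot(cdx, cdy)
        if d_model == 0 or d_curr == 0:
            continue  # coincident points carry no scale or angle information
        scales.append(d_curr / d_model)
        diff = math.atan2(cdy, cdx) - math.atan2(mdy, mdx)
        # Wrap the angle difference into (-pi, pi].
        angles.append(math.atan2(math.sin(diff), math.cos(diff)))
    if not scales:
        return 1.0, 0.0  # fall back to "no change" when there are no pairs
    return median(scales), median(angles)

def bounding_box(points):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

The median makes the estimate tolerant of a minority of wrong matches. One caveat of this simple version is that the median of angles is unreliable when the true rotation is close to ±π, where the wrapped values jump between the two ends of the interval.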
An Improved Implementation of the CMT
FAST feature points are pixels whose intensity differs greatly from that of the pixels in their neighborhood, so extracting FAST feature points amounts to finding such pixels.
Because this test compares raw intensities against a fixed threshold and assigns no orientation to a feature point, FAST does not cope well with illumination change or image rotation; that is, FAST is invariant to neither.
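The neighborhood test described above can be illustrated with a simplified segment test. This sketch is an assumption-laden simplification of FAST [10]: it checks a 16-pixel circle of radius 3 for a contiguous arc of pixels that are all brighter or all darker than the center by more than a fixed threshold. The real detector additionally uses a learned decision tree and non-maximum suppression, and the default values of `threshold` and `arc` here are illustrative only.

```python
# Offsets (dx, dy) of the 16 pixels on a circle of radius 3.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, threshold=20, arc=9):
    """Simplified segment test on a grayscale image given as a list of rows.

    The caller must keep (x, y) at least 3 pixels away from the border.
    """
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1: ring brighter than center, -1: ring darker
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        for f in flags + flags:  # doubled list handles wrap-around arcs
            run = run + 1 if f else 0
            if run >= arc:
                return True
    return False
```

With a fixed threshold, the same structure can pass the test in a well-lit frame and fail it in a dim one: a center of 200 on a background of 10 is detected, whereas a center of 25 on the same background is not when `threshold=20`.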
Our experiments show that under drastic illumination changes and target rotation, the combination of FAST and BRISK is prone to incorrect matches, and so is the CMT algorithm that relies on it.
Suo et al. [12] made a systematic comparison of a variety of features, in which FREAK offers a good compromise between robustness to illumination changes and running speed. Therefore, in what follows, we adopt the FREAK descriptor [13] instead of BRISK, and the SURF detector instead of FAST, in the CMT setting, in order to overcome the tendency of CMT to lose the target under drastic illumination changes.
Comparison of FREAK with BRISK
FREAK (fast retina keypoint) is a keypoint descriptor inspired by the human visual system, especially the retina. A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. To suppress noise, each sampling point is smoothed with a Gaussian kernel. Unlike the BRISK sampling pattern, FREAK uses Gaussian blur kernels of different sizes and therefore performs better in feature description.
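The idea of a binary string built from intensity comparisons can be sketched in a few lines. This is a toy illustration, not the FREAK pattern itself: `samples` stands for already smoothed intensities at the sampling points, and `pairs` is a hypothetical list of index pairs to compare. Descriptors of this kind are compared with the Hamming distance.

```python
def binary_descriptor(samples, pairs):
    """Pack one bit per pair: 1 if samples[a] > samples[b], else 0."""
    bits = 0
    for k, (a, b) in enumerate(pairs):
        if samples[a] > samples[b]:
            bits |= 1 << k
    return bits

def hamming(d1, d2):
    """Number of differing bits between two descriptors."""
    return bin(d1 ^ d2).count("1")

# Toy data: four smoothed sample intensities and five comparison pairs.
samples = [12, 40, 7, 33]
pairs = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
d = binary_descriptor(samples, pairs)

# An affine brightness change with positive gain preserves every comparison.
brighter = [2 * s + 15 for s in samples]
same = binary_descriptor(brighter, pairs) == d
```

Because only the order of two intensities matters, a brightness change of the form a·I + b with a > 0 leaves the descriptor unchanged, which is one reason comparison-based descriptors tolerate illumination changes.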
To demonstrate the advantage of FREAK over BRISK, we conducted an experiment whose result is shown in Fig. 1, where column (a) shows the matching results with FREAK and column (b) those with BRISK. FREAK clearly produces better matches than BRISK.
The Realization of CMT with FREAK and SURF
We now present our final scheme for the improved version of CMT. The feature points of each frame are extracted with the SURF algorithm and then described with the FREAK algorithm. See Fig. 2 for an illustration of the improved CMT.
Figure 1. The matching results of FREAK (a) and BRISK (b).
1. Get the SURF feature points and FREAK descriptors of the first frame.
2. Move to the next frame.
3. Perform forward/backward optical flow tracking.
4. Get the SURF feature points and FREAK descriptors of the current frame.
5. Match the points and the descriptors.
6. Get the minimum rectangle that includes all of the feature points.
Figure 2. The flowchart of CMT with SURF and FREAK.
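The descriptor matching step can be sketched as a 2-nearest-neighbor search under the Hamming distance, followed by a ratio test that rejects ambiguous matches. This is a minimal sketch, assuming descriptors are stored as Python integers; the function names and the value `ratio=0.8` are assumptions for illustration, not settings reported in this paper.

```python
def hamming(a, b):
    """Number of differing bits between two integer-packed descriptors."""
    return bin(a ^ b).count("1")

def match_ratio_test(query, model, ratio=0.8):
    """Return (query_index, model_index, distance) for unambiguous matches."""
    matches = []
    for qi, q in enumerate(query):
        # Rank all model descriptors by distance to this query descriptor.
        ranked = sorted((hamming(q, m), mi) for mi, m in enumerate(model))
        if len(ranked) < 2:
            continue  # the ratio test needs two neighbors
        (d1, best), (d2, _) = ranked[0], ranked[1]
        # Accept only if the best match is clearly better than the second.
        if d1 < ratio * d2:
            matches.append((qi, best, d1))
    return matches

model = [0b00000000, 0b11111111, 0b11110000]
query = [0b00000001, 0b00111100]
result = match_ratio_test(query, model)
```

In this toy run the first query descriptor is matched to model entry 0 at distance 1, while the second is equally far from all three model descriptors and is therefore rejected.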
Experiments
The experiments were conducted on a PC with an Intel(R) Core(TM) i5-4590 3.30 GHz CPU and 8 GB of memory, running 64-bit Windows 10, with VS2013 and OpenCV 2.4.13.
We choose the videos Fish, Jogging, and David. Fish has 476 frames, and its only characteristic is a change in light intensity. Jogging has 307 frames; one target in it is completely occluded at frame 74 and then reappears. David has 770 frames; its main characteristics are light changes and target rotation.
The experimental results are shown in Figs. 3 to 5, where blue boxes indicate the tracking results of CMT and red ones those of the improved CMT (iCMT). The corresponding running times are shown in Table 1.
Table 1. Comparison of the time costs of the two algorithms (seconds per frame).
Algorithm   Fish    Jogging   David
CMT         0.268   0.088     -
iCMT        0.286   0.097     0.082
For the video David, CMT stops at frame 441, which means that it cannot complete the tracking task. This is why no time cost is reported for CMT on David.
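The relative overhead of iCMT can be read off Table 1 with a short calculation. The snippet below only restates the reported per-frame times for the two videos that both algorithms complete; the helper name is hypothetical.

```python
# Per-frame times in seconds from Table 1: (CMT, iCMT).
TIMES = {"Fish": (0.268, 0.286), "Jogging": (0.088, 0.097)}

def slowdown_percent(cmt, icmt):
    """Extra time per frame of iCMT relative to CMT, in percent."""
    return (icmt / cmt - 1.0) * 100.0

overhead = {name: round(slowdown_percent(c, i), 1) for name, (c, i) in TIMES.items()}
# overhead == {"Fish": 6.7, "Jogging": 10.2}
```

On these two videos, iCMT is therefore roughly 7% to 10% slower per frame than CMT.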
Figure 3. Comparison for illumination changes (Frames 7, 41, 94, and 386).
Figure 4. Comparison for occlusion (Frames 4, 64, 74, and 109).
Figure 5. Comparison for drastic illumination changes (Frames 2, 137, 440, and 662).
Notice that there is only a slight light change in the video Fish (Fig. 3). Therefore, both algorithms track the target in Fish stably, with almost the same tracking accuracy (the proportion of the tracked box that is occupied by the target).
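One way to compute the accuracy measure mentioned above is sketched here. It is an interpretation, not the authors' evaluation code: it assumes that the target region is available as an annotated axis-aligned box, and it returns the fraction of the tracked box covered by that target box. Boxes are given as (x_min, y_min, x_max, y_max), and the function name is hypothetical.

```python
def target_ratio(tracked, target):
    """Fraction of the tracked box area that overlaps the target box."""
    tx0, ty0, tx1, ty1 = tracked
    gx0, gy0, gx1, gy1 = target
    # Width and height of the intersection, clamped at zero when disjoint.
    iw = max(0.0, min(tx1, gx1) - max(tx0, gx0))
    ih = max(0.0, min(ty1, gy1) - max(ty0, gy0))
    area = (tx1 - tx0) * (ty1 - ty0)
    return (iw * ih) / area if area > 0 else 0.0
```

For example, a 10×10 tracked box whose right half overlaps the target gives a ratio of 0.5, and a tracked box that misses the target entirely gives 0.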
Fig. 4 shows that both iCMT and CMT track the target accurately when the illumination change is not obvious but the target is occluded. One target is partially occluded in frame 64, is completely hidden in frame 74, and later reappears in frame 109.
Fig. 5 shows the results for the video David, which contains drastic illumination changes and target rotation. At the beginning, at frame 2, both algorithms track the target well because the lighting differs little from that of the first frame; later, at frame 137, the target is detected by iCMT but lost by CMT. At frame 440, where the target is in a bright environment, CMT can track the target but with lower tracking accuracy. From frame 441 on, CMT fails to keep tracking the target, whereas iCMT is able to track the target through the whole video.
Summary
In this paper, to overcome the drawback that CMT fails to track targets under drastic illumination changes and target rotation, we altered the ingredients of the CMT algorithm by replacing BRISK with FREAK and FAST with SURF. The proposed algorithm gives CMT the ability to track targets under drastic illumination changes and target rotation. The experiments also show that the improved version is somewhat slower than the original one, which is worth further investigation.
Acknowledgement
This work is supported by Project of Science and Technology Program of Guangdong with grant No. 2014B040401012 and Project of Science and Technology Program of Guangzhou with grant No. 201604016034.
References
[1] Z. Kalal, K. Mikolajczyk, J. Matas. Forward-backward error: Automatic detection of tracking failures, in: ICPR, 2010, pp. 2756-2759.
[2] Z. Kalal, J. Matas, K. Mikolajczyk. P-N learning: Bootstrapping binary classifiers by structural constraints, in: CVPR, 2010, pp. 49-56.
[3] J.F. Henriques, R. Caseiro, P. Martins, et al. Exploiting the circulant structure of tracking-by-detection with kernels, in: ECCV, 2012, pp. 702-715.
[4] M. Danelljan, F. Shahbaz Khan, M. Felsberg, et al. Adaptive color attributes for real-time visual tracking, in: CVPR, 2014, pp. 1090-1097.
[5] J.F. Henriques, R. Caseiro, P. Martins, et al. High-speed tracking with kernelized correlation filters, IEEE Transactions on PAMI 37 (2015) 583-596.
[6] Xin Zhou, Qiumeng Qian, Yongqiang Ye, Congqing Wang. Improved TLD visual target tracking algorithm, Journal of Image and Graphics 18:9 (2013) 1115-1123. (In Chinese)
[7] Fei Qin, Ronggui Wang, Qixiang Liang, et al. Improved TLD target tracking algorithm based on key feature points, Computer Engineering and Applications 52:4 (2016) 181-187. (In Chinese)
[8] G. Nebehay, R. Pflugfelder. Consensus-based matching and tracking of keypoints for object tracking, in: WACV, 2014, pp. 862-869.
[10] E. Rosten, T. Drummond. Machine learning for high-speed corner detection, in: ECCV, 2006, pp. 430-443.
[11] S. Leutenegger, M. Chli, R.Y. Siegwart. BRISK: Binary robust invariant scalable keypoints, in: ICCV, 2011, pp. 2548-2555.
[12] Chun-bao Suo, Dong-qing Yang, Yun-peng Liu. Comparing SIFT, SURF, BRISK, ORB and FREAK in some different perspectives, Beijing Surveying and Mapping, no. 4 (2014) 23-26, 22. (In Chinese)
[13] A. Alahi, R. Ortiz, P. Vandergheynst. FREAK: Fast retina keypoint, in: CVPR, 2012, pp. 510-517.