An Object Tracking Method Combined Spatio-temporal Context Learning with Color Features


2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017) ISBN: 978-1-60595-458-5


Bo XU¹, Yu-lan WANG²,* and Zhen-hai WANG³

¹ School of Mechanical and Vehicle Engineering, Linyi University, Linyi, China

² School of Civil Engineering and Architecture, Linyi University, Linyi, China

³ College of Informatics, Linyi University, Linyi, China

*Corresponding author

Keywords: Object tracking, Spatio-temporal context learning, Color feature.

Abstract. Correlation filter based tracking approaches have attracted much attention because of their high speed and robustness. To further improve tracking efficiency and robustness, we propose using color features to construct the object appearance model in the spatio-temporal context (STC) tracker. The combined gray and color feature can be computed efficiently and possesses high discriminative power. Experimental results on two video sequences demonstrate that the spatio-temporal context tracker with color features can obtain superior performance in real-world tracking.

Introduction

Object tracking is one of the most active research problems in computer vision and has attracted a growing number of researchers. Tracking has been widely applied in many fields, such as surveillance, human-machine interaction, vehicle navigation and human action recognition. Although many tracking algorithms have been proposed in recent decades, the tracking problem remains challenging [1]. The main challenge is to handle appearance changes of the object and background due to occlusion, pose variation and illumination changes. There are two main ways to approach object tracking: one treats tracking as a sample classification problem that distinguishes the object from the background [2,3]; the other determines the new object location by finding the best similarity match between the search areas and a template in the image [4,5].


Review of STC Tracking Approach

The spatio-temporal context (STC) learning algorithm establishes the spatial relationship between the target of interest and its local context using prior knowledge of low-level features. It builds the statistical correlation between the target and its surrounding area, and formulates tracking as the computation of a confidence map. STC finds the best target position by maximizing the target location probability function, using the FFT for both learning and target detection.


During object tracking, the local context refers to the background region that contains the target object and is directly adjacent to it. The spatio-temporal context learning algorithm models the gray-level and location features in the context region, as illustrated in Figure 1.

Figure 1. Illustration of context features.

Ωc is the local context region, shown as the solid rectangular box. x* is the tracking center of the target region, shown as the dotted rectangular box. v(z) = (I(z), z) is the context feature at location z, where I(z) is the gray value of the image and z is the position. The set of context features is denoted X^c = {v(z) = (I(z), z) | z ∈ Ωc(x*)}.

The confidence map c(x) is formulated as:

$$c(x) = \sum_{v(z) \in X^c} P(x, v(z) \mid o) = \sum_{v(z) \in X^c} P(x \mid v(z), o)\, P(v(z) \mid o) \tag{1}$$

𝑃(𝑥|𝑣(𝑧), 𝑜) models the spatial relationship between the target location and its context, that is, the probability that the target is located at x given the context feature at z. 𝑃(𝑣(𝑧)|𝑜) is the context prior probability, which models the appearance of the local context.

In STC, 𝑃(𝑣(𝑧)|𝑜) is modeled as:

$$P(v(z) \mid o) = I(z)\, \omega_\sigma(z - x^*) \tag{2}$$

where $\omega_\sigma(d) = a \cdot \exp(-d^2/\sigma^2)$ is a weighted Gaussian function, $a$ is a constant, and $\sigma$ is a scale parameter.
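As an illustration of (2), the context prior can be evaluated on a pixel grid as in the following minimal NumPy sketch. The function name `context_prior`, the default `a = 1.0`, and the toy 5 x 5 patch are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def context_prior(gray, center, sigma, a=1.0):
    """Evaluate I(z) * w_sigma(z - x*) for every pixel z of a context patch.

    gray   : 2-D array holding the gray values I(z) of the context region
    center : (row, col) of the target center x* in patch coordinates
    sigma  : scale parameter of the weighting function
    a      : constant factor of w_sigma (assumed to be 1.0 here)
    """
    rows, cols = np.indices(gray.shape)
    # Squared Euclidean distance d^2 = |z - x*|^2 for every pixel z.
    dist_sq = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    weight = a * np.exp(-dist_sq / sigma ** 2)
    return gray * weight

# Toy usage: a constant gray patch, target centered at (2, 2).
patch = np.full((5, 5), 2.0)
prior = context_prior(patch, center=(2, 2), sigma=2.0)
```

Pixels close to the target center keep most of their intensity, while pixels far from it are suppressed, so nearby context contributes more to the model.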

The confidence map function c(x) is defined as:

$$c(x) = b \cdot \exp\left( -\left| \frac{x - x^*}{\alpha} \right|^{\beta} \right) \tag{3}$$

where b is a normalization constant, 𝛼 is a scale parameter, and 𝛽 is a shape parameter, which is set to 𝛽 = 1 in STC. The conditional probability 𝑃(𝑥|𝑣(𝑧), 𝑜) to be solved is defined as:

$$P(x \mid v(z), o) = h^{sc}(x - z) \tag{4}$$

This form depends only on the relative offset x − z, which allows the subsequent computation to use the FFT. Substituting (2), (3) and (4) into (1) gives:

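$$\begin{aligned} c(x) &= b \cdot \exp\left( -\left| \frac{x - x^*}{\alpha} \right|^{\beta} \right) \\ &= \sum_{v(z) \in X^c} h^{sc}(x - z)\, I(z)\, \omega_\sigma(z - x^*) \\ &= h^{sc}(x) \otimes \left[ I(x)\, \omega_\sigma(x - x^*) \right] \end{aligned} \tag{5}$$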

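Because convolution becomes element-wise multiplication in the frequency domain, applying the FFT to (5) yields the spatial context model:

$$h^{sc}(x) = \mathcal{F}^{-1}\left( \frac{\mathcal{F}(c(x))}{\mathcal{F}\left( I(x)\, \omega_\sigma(x - x^*) \right)} \right) \tag{6}$$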
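The learning step in (3) and (6) can be sketched with NumPy's FFT routines as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the default `b = 1.0`, and the small constant `eps` added to the denominator to avoid division by zero are all introduced here for illustration. The defaults α = 2.25 and β = 1 follow the values given in the experiment section.

```python
import numpy as np

def confidence_map(shape, center, alpha=2.25, beta=1.0, b=1.0):
    """Confidence map c(x) = b * exp(-|(x - x*) / alpha|^beta) on a pixel grid."""
    rows, cols = np.indices(shape)
    dist = np.sqrt((rows - center[0]) ** 2 + (cols - center[1]) ** 2)
    return b * np.exp(-(dist / alpha) ** beta)

def learn_spatial_context(prior, conf, eps=1e-8):
    """Spatial context model h^sc = F^-1( F(c) / F(prior) ).

    prior : 2-D array of I(x) * w_sigma(x - x*), the context prior
    conf  : 2-D array of the confidence map c(x), same shape as prior
    eps   : small regularizer (an assumption of this sketch)
    """
    prior_f = np.fft.fft2(prior)
    conf_f = np.fft.fft2(conf)
    return np.real(np.fft.ifft2(conf_f / (prior_f + eps)))

# Toy usage: with a unit impulse as prior, F(prior) is all ones,
# so the learned model reproduces the confidence map itself.
prior = np.zeros((8, 8))
prior[0, 0] = 1.0
conf = confidence_map((8, 8), center=(4, 4))
h_sc = learn_spatial_context(prior, conf)
```

Note that division in the FFT domain corresponds to circular deconvolution, so boundary effects appear unless the context region is windowed or padded.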

The tracking procedure of STC is as follows:

(1) Computing the spatial context model at the t-th frame according to (6).

(2) Updating the spatio-temporal context model by:

$$H_{t+1}^{stc}(x) = (1 - \rho)\, H_t^{stc}(x) + \rho\, h_t^{sc}(x) \tag{7}$$

where $\rho$ is a learning parameter and $H_t^{stc}(x)$ is the spatio-temporal context model at the t-th frame. In the frequency domain, the update can be written as $H_t^{stc}(\omega) = F(\omega)\, h_t^{sc}(\omega)$, where $F(\omega) = \rho / [e^{j\omega} - (1 - \rho)]$ is a temporal filter.

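(3) Computing the confidence map at the (t+1)-th frame by:

$$c_{t+1}(x) = \mathcal{F}^{-1}\left\{ \mathcal{F}\left[ H_{t+1}^{stc}(x) \right] \cdot \mathcal{F}\left[ I_{t+1}(x)\, \omega_\sigma(x - x_t^*) \right] \right\} \tag{8}$$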

(4) Computing the target center location at the (t+1)-th frame:

$$x_{t+1}^* = \arg\max_{x \in \Omega_c(x_t^*)} c_{t+1}(x) \tag{9}$$
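Steps (2) to (4) above can be sketched as follows. The function names are hypothetical, and the value `rho = 0.1` in the usage lines is an arbitrary illustrative choice rather than the setting used in the experiments.

```python
import numpy as np

def update_stc(H_stc, h_sc, rho):
    """Eq. (7): blend the previous model with the newly learned spatial context."""
    return (1.0 - rho) * H_stc + rho * h_sc

def detect(H_stc, prior_next):
    """Eqs. (8) and (9): confidence map of the next frame and its peak location.

    H_stc      : 2-D spatio-temporal context model
    prior_next : 2-D array of I_{t+1}(x) * w_sigma(x - x_t*)
    """
    conf = np.real(np.fft.ifft2(np.fft.fft2(H_stc) * np.fft.fft2(prior_next)))
    # Position of the maximum response inside the context region.
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(conf), conf.shape))
    return conf, peak

# Toy usage: a unit-impulse model leaves the prior unchanged,
# so the detected peak is the brightest weighted pixel.
H = np.zeros((6, 6))
H[0, 0] = 1.0
prior_next = np.zeros((6, 6))
prior_next[2, 3] = 5.0
conf, peak = detect(H, prior_next)
H_new = update_stc(H, np.zeros((6, 6)), rho=0.1)
```

A small ρ makes the model change slowly from frame to frame, which acts as the temporal low-pass filter described in step (2).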

Color Features

Although color is commonly experienced as an indispensable quality in describing the world around us, many feature-based representations rely only on shape description and ignore color information [10,11]. Describing color is difficult because of the many sources of variation that cause the measured color values to vary significantly. Van de Weijer et al. [12] provided three color spaces for the color histograms: RGB, HSL, and L*a*b*. Color attributes, or color names (CN), in the L*a*b* space are linguistic color labels assigned by humans to represent colors. Berlin and Kay [13] concluded that the English language contains eleven basic color terms: black, blue, brown, grey, green, orange, pink, purple, red, white and yellow. In this paper, we employ the mapping method described in [14] to transform the RGB space into the color names space, which is an 11-dimensional color representation.

Following [15], we give the definition of the color names feature: it is a vector containing the probability of each color name given an image region I:

$$CN = \{ p(cn_1 \mid I),\ p(cn_2 \mid I),\ \ldots,\ p(cn_{11} \mid I) \}$$

$$p(cn_i \mid I) = \frac{1}{M} \sum_{c \in I} p(cn_i \mid g(c)) \tag{10}$$

where 𝑐𝑛𝑖 is the i-th color name, c are the spatial coordinates of the M pixels in region I, 𝑔(𝒄) is



(a) RGB spaces (b) CN spaces (c) RGB spaces (d) CN spaces

Figure 2. Images comparison between RGB spaces and CN spaces.
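The region descriptor in (10) is simply the average of the per-pixel color name probabilities. A minimal sketch is given below; it assumes that the per-pixel probabilities p(cn_i | g(c)) have already been obtained (for example from a learned lookup table such as the mapping of [14]) and are supplied as an M x 11 array. The function name and the toy input are hypothetical.

```python
import numpy as np

def color_name_descriptor(pixel_probs):
    """Average per-pixel color name probabilities over a region (Eq. 10).

    pixel_probs : array of shape (M, 11); row m holds p(cn_i | g(c_m))
                  for the 11 color names at pixel c_m (assumed to be given)
    returns     : array of shape (11,) holding p(cn_i | I)
    """
    pixel_probs = np.asarray(pixel_probs, dtype=float)
    return pixel_probs.mean(axis=0)

# Toy usage: a region of two pixels, one assigned entirely to the first
# color name and the other entirely to the second.
probs = np.zeros((2, 11))
probs[0, 0] = 1.0
probs[1, 1] = 1.0
cn = color_name_descriptor(probs)
```

If every row of the input sums to one, the resulting descriptor also sums to one, so it can be read as a probability distribution over the eleven color names.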

The STC Tracking with Color Features

When the target moves rapidly, changes pose, or appears against a complex background, its appearance changes noticeably, which leads to tracking failure and drift. To solve this problem, this paper proposes a new tracking approach that combines color information with the STC algorithm. In this algorithm, the object is modeled using the gray feature together with the color feature after adaptive dimensionality reduction. To accommodate the multi-dimensional color feature, the complex conjugate of the context prior model is introduced to improve the spatial context model.

The context prior probability with color features can be written:

$$P(v(z) \mid o) = (I(z) + x_p)\, \omega_\sigma(z - x^*) \tag{11}$$

where $x_p$ denotes the color features after adaptive dimensionality reduction. According to [10,11], the RGB values of the color image block are first mapped into a 10 dimensional space to get the 11-dimensional color probability $x_c$; then $x_p$ is computed by the linear projection $x_p = B_p^T x_c$, where $B_p^T$ is a projection matrix of size $D_1 \times D_2$.
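The projection $x_p = B_p^T x_c$ can be sketched as below. The paper does not state how the projection matrix is obtained, so this example assumes a plain PCA basis (the leading eigenvectors of the feature covariance) purely for illustration; the function names, the target dimension of 2, and the random test data are likewise assumptions.

```python
import numpy as np

def pca_projection(features, d2):
    """Return a (D1, d2) matrix B_p whose columns are the leading principal axes.

    features : array of shape (N, D1), one color name vector x_c per row
    d2       : number of dimensions to keep
    """
    centered = features - features.mean(axis=0)
    cov = centered.T @ centered / len(features)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:d2]       # indices of the d2 largest
    return eigvecs[:, order]

def project(features, B_p):
    """Apply x_p = B_p^T x_c to every row of `features`."""
    return features @ B_p

# Toy usage: 50 random 11-dimensional probability vectors reduced to 2 dimensions.
rng = np.random.default_rng(0)
x_c = rng.dirichlet(np.ones(11), size=50)
B_p = pca_projection(x_c, d2=2)
x_p = project(x_c, B_p)
```

Keeping only a few dimensions reduces the number of FFTs needed per frame, which is the usual motivation for reducing the color feature before tracking.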

The procedure of the mentioned approach is summarized in table 1.

Table 1. STC tracking with color features.

1. Initializing the tracker parameters α, β, ρ and so on; loading the color mapping and ground_truth.

2. Inputting the video frame and extracting the CN features.

3. Computing the context prior model with the cosine window $\omega_\sigma$:

   $P(v(z) \mid o) = (I(z) + x_p)\, \omega_\sigma(z - x^*)$

   Confidence map of target, the center error and precision data.

4. If frame > 1:

   Computing the confidence map by:

   $c_{t+1}(x) = H_{t+1}^{stc}(x) \otimes \left( I_{t+1}(x)\, \omega_\sigma(x - x^*) \right)$

   Obtaining the max response coordinate; updating the confidence map.

   End

5. Computing the spatio-temporal context model using FFT:

   $h^{sc}(f) = \dfrac{\mathcal{F}(c(x))}{\mathcal{F}(P(v(x) \mid o))}$

6. Locking the target location; computing the center location error and precision.
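The two evaluation quantities in step 6 can be computed as in the following sketch. It assumes the usual definitions: the center location error is the Euclidean distance between the predicted and ground-truth centers in each frame, and precision is the fraction of frames whose error does not exceed a threshold. The 20-pixel default threshold and the function names are assumptions of this example, not values stated in the paper.

```python
import numpy as np

def center_location_error(pred_centers, true_centers):
    """Per-frame Euclidean distance between predicted and ground-truth centers."""
    pred = np.asarray(pred_centers, dtype=float)
    truth = np.asarray(true_centers, dtype=float)
    return np.linalg.norm(pred - truth, axis=1)

def precision(errors, threshold=20.0):
    """Fraction of frames whose center error is within `threshold` pixels."""
    return float(np.mean(np.asarray(errors) <= threshold))

# Toy usage with three frames.
pred = [(0, 0), (3, 4), (30, 40)]
truth = [(0, 0), (0, 0), (0, 0)]
errors = center_location_error(pred, truth)
score = precision(errors, threshold=20.0)
```

In this toy case the errors are 0, 5 and 50 pixels, so two of the three frames fall within the threshold.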

Experiment Results

This section presents two challenging tracking examples that illustrate the benefits of combining the STC algorithm with color features and demonstrate the efficiency and robustness of the proposed method.

The parameters of the map function are set to α = 2.25 and β = 1. The learning parameter is



In the DAVID sequence, the tracking results of the proposed algorithm and the STC tracking method are almost the same while the object is in the dark environment, and both algorithms track the object accurately. As the object moves and the light intensity changes from dark to bright, the two tracking results begin to differ. From Figure 3, we can see that the proposed algorithm outperforms the STC algorithm.

Figure 3. The tracking results of DAVID sequences.

Figure 4. The tracking results of GIRL sequences.

In the GIRL sequence, the light intensity remains unchanged, but the pose of the object changes constantly and occlusion occurs. As the tracking results in Figure 4 show, the proposed tracking method tracks the target more accurately than the STC algorithm under object rotation, pose variation and occlusion.

Conclusion

In this paper, we propose a new object tracking method in which color attributes are incorporated into the STC tracker. First, we review the STC tracking algorithm; next, we introduce the merits of color attributes for object appearance representation; then we propose a tracking method that combines the STC tracking algorithm with color attributes.

To test the accuracy of the proposed method, two videos are used. These videos involve illumination changes, object rotation, pose variation and partial occlusion. The experimental results show that the tracking accuracy of the proposed method on these videos is higher than that of the CN and STC trackers, and that its center error is better than that of the CN tracker and slightly better than that of the STC tracker. The proposed method cannot address occlusion. Therefore, in the future, we intend to add an occlusion handling mechanism to the proposed method.

Acknowledgement


References

[1] Wang Q, Chen F, Xu W L & Yang M H. Online discriminative object tracking with local sparse representation. Application of computer vision, (2012) 425-432.

[2] Qu Z W, Wei F L & Wei W. Pedestrian detect by radar vision data fusion. Journal of Jilin University, 2013 (43): 1230-1234.

[3] Lei P, Wu T F & Pei M T. Robust Tracking by Accounting for Hard Negatives Explicitly. International Conference on Pattern Recognition. Tsukuba, Japan, 2012.

[4] Wu Y, Lim J & Yang M H. Online object tracking: a benchmark. CVPR 2013.

[5] Collins R T & Liu Y. On-line selection of discriminative tracking features. ICCV 2003: 346-352.

[6] K. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and scene recognition. PAMI, 32(9):1582-1596, 2010.

[7] J. van de Weijer and C. Schmid. Coloring local feature extraction. In ECCV, 2006.

[8] M. Danelljan, F. S. Khan, M. Felsberg, and J. v. d. Weijer, Adaptive color attributes for real-time visual tracking, in Computer Vision and Pattern Recognition (CVPR), 2014: 1090-1097.

[9] F. S. Khan, J. van de Weijer, M. Vanrell. Modulating shape features by color attention for object recognition. Int. J. Comput. Vis. 98 (1) (2012) 49-64.

[10] van de Weijer J, Schmid C. Applying Color Names to Image Description. IEEE International Conference on Image Processing, 2007: 493-496.

[11] van de Weijer J, Schmid C. Coloring local feature extraction. European Conference on Computer Vision, 2006: 334-348.

[12] van de Weijer J, Schmid C, Verbeek J. Learning Color Names from Real-World Images. IEEE Conference on Computer Vision & Pattern Recognition, 2007: 1-8.

[13] Berlin B, Kay P. Basic Color Terms: Their Universality and Evolution. The David Hume Series of Philosophy and Cognitive Science Reissues. 1998: 209.

[14] J. van de Weijer, C. Schmid, J. J. Verbeek, and D. Larlus. Learning color names for real-world applications. IEEE Transactions on Image Processing, 2009, 18(7): 1512-1523.