Comparative Performance Evaluation of Three Object Tracking Methods


Prajna Parimita Dash¹, Dipti Patra², Satya Ranjan Behera³, Subhendu Kumar Behera⁴, Sudhansu Kumar Mishra⁵

¹,² Dept. of Electrical Engineering, NIT, Rourkela, India

³,⁴ Dept. of ECE, DRIEMS, Cuttack, India

⁵ Dept. of EEE, BIT, Mesra, India


Abstract— Object tracking is the process of locating a moving object in consecutive video frames. It is a challenging problem in computer vision, with applications in automated surveillance, traffic monitoring, augmented reality, object-based video compression, etc. In this paper three techniques, namely kernel-based tracking using a color histogram, tracking using segmentation, and a covariance feature based mean shift tracking algorithm, are applied to different challenging situations. Experimental results reveal that the histogram-based method is efficient in terms of computation time and that the covariance tracker is better in terms of detection rate. The covariance tracker handles challenges such as occlusion and illumination changes more effectively than the other two methods.

Keywords—Color histogram, Covariance, Segmentation, Object tracking, Occlusion.

I. INTRODUCTION

Fast and reliable detection of moving objects in a video sequence is important for applications such as video surveillance, navigation, traffic management, video broadcasting, teleconferencing and human-computer interfaces. It is also one of the most essential components of computer vision applications that range from consumer electronics to smart weapons. Object tracking is difficult because of several problems, such as noise in the video, motion of the objects, the non-rigid or articulated nature of objects, partial and full occlusion, changes in scene illumination and changes in the background.

Researchers have proposed many methods for object tracking. Surveys on video tracking are presented by Trucco and Plakas [1] and by Yilmaz et al. [2]. Object tracking using image segmentation (region growing) and pattern matching is described in [3]. Patra et al. [4] used image segmentation and pattern matching for object tracking.

Images captured under poor illumination produce unwanted objects after segmentation. These spurious objects can be eliminated, and tracking made more effective, by incorporating adaptive thresholding [5]. Huang et al. [6] used a thresholding technique with adaptive window selection for unevenly lit images. However, this algorithm takes a long time to produce the segmentation results. In addition, it cannot track the object until the next segmentation step is completed, so it may not be suitable for real-time applications.

One tracking approach, based on templates and density-based appearance models, was proposed by Schweitzer et al. [7]. Fieguth and Terzopoulos [8] generate histogram-based object models by finding the mean color of the pixels inside a rectangular object region. A histogram matching technique was also implemented by Porikli [9] for object tracking. Comaniciu et al. [10] use a weighted histogram model, where the mean color is computed from a circular object region. Comaniciu extended the mean-shift tracking approach to use a joint spatial-color histogram instead of just a color histogram [11]. Kang et al. used histograms of color and edges as object models [12]. Yilmaz tracked objects by asymmetric kernel mean shift with automatic scale and orientation selection [13]. Yu et al. applied a color histogram based non-rigid object tracking algorithm using the mean shift kernel approach [14]. Zhou et al. use an adaptive template block for matching the target block [15]. However, when the object color is similar to the background, histogram matching performs poorly.


In [17], Porikli et al. performed covariance tracking with a model update based on means on Riemannian manifolds.

The rest of the paper is organized as follows. The fundamentals of object tracking in video sequences are outlined in Section II. The three techniques for object tracking are presented and discussed in Section III. Section IV provides the simulation results of the present study. Finally, the conclusions of the investigation and possible extensions of the work are outlined in Section V.

II. FUNDAMENTALS OF OBJECT TRACKING IN VIDEO SEQUENCES

Object tracking is a serial process of object representation, feature selection, object detection and, finally, tracking. The object can be represented as a single point or a set of points, as a primitive geometric shape such as a rectangle, ellipse or circle, or as an object silhouette, object contour, articulated shape model, skeletal model, etc. The shape representation may also be combined with an appearance representation for tracking purposes. Some of the common appearance representations are probability densities of object appearance, templates, active appearance models and multiview appearance models. Feature selection is another important step in object tracking. Some of the commonly used features are color, edges, texture, optical flow and gradient. Most tracking methods require an object detection mechanism in every frame. Some commonly used object detection methods are point detectors, background subtraction and segmentation. After the object is detected, the tracker's task is to generate the trajectory of the object over time by locating its position in every frame of the video.

III. THREE TECHNIQUES FOR OBJECT TRACKING

A. Object tracking in video images based on segmentation and pattern matching

Tracking over a long period of time is challenging because the shape and scale of the object change. In this method, tracking is done by repeatedly segmenting the object from the background. The spatial support obtained from segmentation provides rich information about the object and enables reliable tracking. Desirable features of the segmented objects are extracted, and feature matching is then performed to detect the objects in successive frames.
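The matching step can be illustrated with a minimal sketch. It assumes (the text does not specify this) that each segmented object is summarized by a small feature vector, here centroid coordinates and area, and that the candidate with the smallest Manhattan (L1) distance to the tracked object's features is taken as the match. The function names and the feature layout are hypothetical.

```python
from typing import Sequence


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def match_object(target: Sequence[float],
                 candidates: Sequence[Sequence[float]]) -> int:
    """Return the index of the candidate closest to the target features."""
    if not candidates:
        raise ValueError("no candidate objects in this frame")
    distances = [manhattan_distance(target, c) for c in candidates]
    return min(range(len(candidates)), key=distances.__getitem__)


# Hypothetical features: (centroid x, centroid y, area in pixels).
previous = (120.0, 80.0, 400.0)
current_frame = [(30.0, 200.0, 380.0), (126.0, 83.0, 410.0)]
best = match_object(previous, current_frame)
```

In practice the features would need to be scaled to comparable ranges; otherwise a large-valued feature such as area dominates the distance.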

In this paper Otsu thresholding is used for segmentation and the Manhattan distance is calculated to estimate the position and motion of the object in successive frames. Let $f(x, y)$ be the image with $m$ gray levels and let the threshold be $j$, where $0 \le j \le m-1$. Then all pixels in image $f(x, y)$ can be divided into two groups: group $A$ with gray level values less than or equal to $j$, and group $B$ with values greater than $j$.

Also, let $\omega_1(j)$, $\omega_2(j)$ be the number of pixels and $M_1(j)$, $M_2(j)$ be the average gray level value in group A and group B, respectively. Then,

$$\omega_1(j) = \sum_{i=0}^{j} n_i, \qquad 0 \le j \le m-1 \tag{1}$$

$$M_1(j) = \frac{\sum_{i=0}^{j} i \cdot n_i}{\omega_1(j)}, \qquad 0 \le j \le m-1 \tag{2}$$

$$\omega_2(j) = \sum_{i=j+1}^{m-1} n_i, \qquad 0 \le j \le m-1 \tag{3}$$

$$M_2(j) = \frac{\sum_{i=j+1}^{m-1} i \cdot n_i}{\omega_2(j)}, \qquad 0 \le j \le m-1 \tag{4}$$

where $n_i$ is the number of pixels with gray level value $i$. The average gray level value $M_T$ of all the pixels in image $f(x, y)$ can be expressed as:

$$M_T = \frac{\omega_1(j) M_1(j) + \omega_2(j) M_2(j)}{\omega_1(j) + \omega_2(j)}, \qquad 0 \le j \le m-1 \tag{5}$$

The variance between the two groups, denoted as $\sigma_B^2(j)$, is

$$\sigma_B^2(j) = \omega_1(j)\,\bigl(M_1(j) - M_T\bigr)^2 + \omega_2(j)\,\bigl(M_2(j) - M_T\bigr)^2 = \frac{\omega_1(j)\,\omega_2(j)\,\bigl(M_1(j) - M_2(j)\bigr)^2}{\omega_1(j) + \omega_2(j)} \tag{6}$$

The value of $j$ ranges from $0$ to $m-1$. Eq. (6) is applied to calculate each $\sigma_B^2(j)$, and the value of $j$ that maximizes $\sigma_B^2(j)$ is selected as the threshold.


The main advantages of this method are that it makes full use of low-level and mid-level cues and does not require (i) a rigid shape, (ii) distinctive local features, (iii) a unique color, (iv) model dynamics, or (v) a priori knowledge of the object. Another advantage is that the object can be tracked for a long time without drifting.

The disadvantages of this method are that it (i) is sensitive to changes in illumination, (ii) is sensitive to noise, and (iii) works only with a static background. It also fails when the object contours are weak or when there are strong background edges near the object. Hence it does not handle occlusion.
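The threshold selection of Eqs. (1)-(6) can be sketched as follows. This is a minimal illustration, assuming the image is available as a list of rows of integer gray levels; the function names `otsu_threshold` and `binarize` and the toy image are hypothetical, and the gray level that maximizes the between-group variance is taken as the threshold, as in standard Otsu thresholding.

```python
from typing import List, Sequence


def otsu_threshold(hist: Sequence[int]) -> int:
    """Return the gray level j that maximizes the between-group variance.

    hist[i] is n_i, the number of pixels with gray level i (0 <= i <= m-1).
    """
    m = len(hist)
    total_pixels = sum(hist)
    total_sum = sum(i * n for i, n in enumerate(hist))

    best_j, best_var = 0, -1.0
    w1 = 0      # omega_1(j): pixel count of group A, Eq. (1)
    sum1 = 0    # running sum of i * n_i over group A
    for j in range(m):
        w1 += hist[j]
        sum1 += j * hist[j]
        w2 = total_pixels - w1          # omega_2(j), Eq. (3)
        if w1 == 0 or w2 == 0:
            continue                    # one group is empty: mean undefined
        m1 = sum1 / w1                  # M_1(j), Eq. (2)
        m2 = (total_sum - sum1) / w2    # M_2(j), Eq. (4)
        var_b = w1 * w2 * (m1 - m2) ** 2 / (w1 + w2)   # Eq. (6), second form
        if var_b > best_var:
            best_j, best_var = j, var_b
    return best_j


def binarize(image: List[List[int]], j: int) -> List[List[int]]:
    """Label pixels of group B (value > j) as 1 and group A as 0."""
    return [[1 if v > j else 0 for v in row] for row in image]


# Toy 8-level image: dark background (levels 0-1), bright object (levels 6-7).
image = [[0, 1, 0, 1],
         [1, 6, 7, 0],
         [0, 7, 6, 1],
         [1, 0, 1, 0]]
hist = [0] * 8
for row in image:
    for v in row:
        hist[v] += 1
threshold = otsu_threshold(hist)
mask = binarize(image, threshold)
```

The running sums keep the search linear in the number of gray levels, instead of recomputing Eqs. (1)-(4) from scratch for every candidate threshold.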

B. Kernel-based object tracking using color histogram

In this method a feature space is chosen to characterize the target. The reference target model is represented by its probability density function in the feature space. In the subsequent frame, a candidate model is defined at location $y$ and is characterized by the probability density function $p(y)$. The pdfs are estimated from $m$-bin histograms. A similarity function $\hat{\rho}(y)$, called the Bhattacharyya coefficient between $\hat{p}$ and $\hat{q}$, plays the role of a likelihood, and its local maxima in the image indicate the presence of objects.

The target model:

$$\hat{q} = \{\hat{q}_u\}_{u=1 \ldots m}, \qquad \sum_{u=1}^{m} \hat{q}_u = 1 \tag{7}$$

The candidate model:

$$\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1 \ldots m}, \qquad \sum_{u=1}^{m} \hat{p}_u = 1 \tag{8}$$

The distance between two discrete distributions is defined as

$$d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]} \tag{9}$$

where $\rho[\hat{p}(y), \hat{q}]$ is the Bhattacharyya coefficient:

$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u} \tag{10}$$
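Eqs. (7)-(10) can be illustrated with a short sketch. It is a simplification under stated assumptions: the histograms are plain, unweighted $m$-bin histograms of integer gray values (the spatial kernel weighting used in kernel-based tracking is omitted), and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_histogram(values: Sequence[int], m: int,
                         levels: int = 256) -> List[float]:
    """m-bin histogram of integers in [0, levels), normalized to sum to 1."""
    if len(values) == 0:
        raise ValueError("region is empty")
    counts = [0] * m
    for v in values:
        counts[v * m // levels] += 1
    return [c / len(values) for c in counts]


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (10): sum over the bins of sqrt(p_u * q_u)."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Eq. (9): d = sqrt(1 - rho); the clamp guards against rounding below 0."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))


# Target model q and two candidate models p, built from toy pixel values.
q = normalized_histogram([10, 20, 200, 210], m=4)
p_same = normalized_histogram([15, 25, 205, 215], m=4)
p_other = normalized_histogram([100, 110, 120, 130], m=4)

d_same = bhattacharyya_distance(p_same, q)
d_other = bhattacharyya_distance(p_other, q)
```

A candidate whose histogram occupies the same bins as the target gives a coefficient close to 1 and a distance close to 0, while a candidate with no bins in common gives a distance of 1.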

The main advantages of this method are: (i) it copes successfully with complex camera motion, partial occlusion of the target, significant clutter, and large variations in target scale and appearance; (ii) it runs very fast; (iii) it is suitable for models that have multiple dominant colors; (iv) when no object is present, finding the global optimum takes much less time; (v) position, size and orientation can be detected and tracked simultaneously and precisely.

The main disadvantages of this method are: (i) the spatial information of the target is lost; (ii) similarity measures such as the Bhattacharyya coefficient and the Kullback-Leibler divergence are not very discriminative; (iii) the histogram representation of the object disregards the spatial arrangement of the features and does not scale to higher dimensions; (iv) because it depends on the color feature, it performs poorly when an object and its background have similar colors; (v) the color of an object depends on illumination, viewpoint and camera parameters, which tend to change during a long tracking process, so fixed color features are not discriminative enough.

C. Object tracking using covariance tracker

For a given object region, the covariance matrix of some features is computed. It is considered as the model of the object. In the current frame, the region that has the minimum covariance distance from the model is determined, and that region is assigned as the estimated location. Let $I$ be the observed one-dimensional intensity image and $F$ be the $W \times H \times d$ dimensional feature image extracted from $I$:

$$F(x, y) = \Phi(I, x, y) \tag{11}$$

For a given rectangular window region $R \subset F$, $\{f_k\}_{k=1 \ldots n}$ are the $d$-dimensional feature vectors inside $R$:

$$f_k = \bigl[\,x,\; y,\; I(x, y),\; I_x(x, y),\; I_{xx}(x, y)\,\bigr] \tag{12}$$

The covariance matrix for the $M \times N$ rectangular region $R$ is calculated as follows:

$$C_R = \frac{1}{MN} \sum_{k=1}^{MN} (f_k - \mu_R)(f_k - \mu_R)^T \tag{13}$$

where $\mu_R$ is the vector of the means of the corresponding features for the points within the region $R$. To find the region most similar to the target object, the distance between the covariance matrices corresponding to the target object window and the candidate region is calculated as:

$$\rho(C_i, C_j) = \sqrt{\sum_{k=1}^{d} \ln^2 \lambda_k(C_i, C_j)} \tag{14}$$

where $\lambda_k(C_i, C_j)$ are the generalized eigenvalues of $C_i$ and $C_j$, computed from

$$\lambda_k C_i x_k - C_j x_k = 0, \qquad k = 1, 2, \ldots, d \tag{15}$$

where $x_k$ are the generalized eigenvectors.
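The region covariance of Eq. (13) and the distance of Eqs. (14)-(15) can be sketched with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names, the finite-difference derivatives from `np.gradient`, and the small ridge `eps` added for numerical stability are all assumptions.

```python
import numpy as np


def intensity_features(image: np.ndarray) -> np.ndarray:
    """Feature image F with f = [x, y, I, I_x, I_xx] at every pixel."""
    img = image.astype(float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ix = np.gradient(img, axis=1)     # first derivative along x
    ixx = np.gradient(ix, axis=1)     # second derivative along x
    return np.dstack([xs, ys, img, ix, ixx])


def region_covariance(features: np.ndarray) -> np.ndarray:
    """Eq. (13): d x d covariance of the feature vectors of an M x N region."""
    f = features.reshape(-1, features.shape[-1]).astype(float)
    diff = f - f.mean(axis=0)            # f_k - mu_R
    return diff.T @ diff / f.shape[0]    # normalized by MN


def covariance_distance(c_i: np.ndarray, c_j: np.ndarray,
                        eps: float = 1e-8) -> float:
    """Eq. (14): sqrt of the sum of squared log generalized eigenvalues."""
    d = c_i.shape[0]
    # Small ridge (an assumption) keeps both matrices positive definite.
    a = c_i + eps * np.eye(d)
    b = c_j + eps * np.eye(d)
    # Eigenvalues of a^{-1} b solve lambda * a x = b x, which is Eq. (15).
    lam = np.linalg.eigvals(np.linalg.solve(a, b)).real
    lam = np.clip(lam, eps, None)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


# Compare two windows of a random test image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(12, 16))
F = intensity_features(image)
c_target = region_covariance(F[2:8, 3:11])
c_candidate = region_covariance(F[4:10, 6:14])
dist = covariance_distance(c_target, c_candidate)
```

Because the logarithms are squared, swapping the two matrices inverts every eigenvalue but leaves the distance unchanged, so the measure is symmetric.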

At each frame we search the whole image to find the region that has the smallest distance from the current object model. The best matching region determines the location of the object in the current frame.

The main advantages of this method are: (i) the covariance matrix can combine many features such as coordinates, color, gradient, edges, texture and motion, and hence captures both spatial and statistical properties; (ii) covariance matrices are low-dimensional compared to other region descriptors, and due to symmetry $C_R$ has only $(d^2 + d)/2$ distinct values; (iii) it works for detecting single as well as multiple rigid and non-rigid objects; (iv) it works satisfactorily under object deformations and appearance changes; (v) noise is largely filtered out during the covariance computation; (vi) the covariance matrix of any region has the same size, which enables comparison of any regions without being restricted to a constant window size; (vii) covariance is invariant to mean changes such as an identical shift of color values, which is an advantageous property when objects are tracked under varying illumination conditions.

The main disadvantages of this method are: (i) since covariance matrices lie in a Riemannian space, an appropriate distance metric has to be used when comparing regions; (ii) computation is not very fast because tracking requires a global search, so tracking a very fast moving object is difficult.
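The global search, and the reason it is costly, can be seen in a brute-force sketch. It assumes a fixed window size, a simple feature vector $[I, I_x, I_y]$ and synthetic data; the helper names are hypothetical, and practical implementations usually speed up the covariance computation, for example with integral images.

```python
import numpy as np


def region_cov(features: np.ndarray) -> np.ndarray:
    """Covariance of the feature vectors inside one window."""
    f = features.reshape(-1, features.shape[-1]).astype(float)
    diff = f - f.mean(axis=0)
    return diff.T @ diff / f.shape[0]


def cov_distance(a: np.ndarray, b: np.ndarray, eps: float = 1e-6) -> float:
    """Distance based on the generalized eigenvalues of the two matrices."""
    ridge = eps * np.eye(a.shape[0])   # assumption: small ridge for stability
    lam = np.linalg.eigvals(np.linalg.solve(a + ridge, b + ridge)).real
    return float(np.sqrt(np.sum(np.log(np.clip(lam, eps, None)) ** 2)))


def global_search(feature_image, model_cov, win_h, win_w, step=1):
    """Scan every window position; return the best (row, col) and distance."""
    h, w, _ = feature_image.shape
    best_pos, best_d = None, np.inf
    for r in range(0, h - win_h + 1, step):
        for c in range(0, w - win_w + 1, step):
            window = feature_image[r:r + win_h, c:c + win_w]
            d = cov_distance(model_cov, region_cov(window))
            if d < best_d:
                best_pos, best_d = (r, c), d
    return best_pos, best_d


# Sanity check: the model is taken from a known window of a synthetic frame,
# so the search should return that window.
rng = np.random.default_rng(0)
frame = rng.normal(0.0, 1.0, size=(20, 24))
frame[6:12, 10:18] += rng.normal(0.0, 10.0, size=(6, 8))   # textured object
gy, gx = np.gradient(frame)
features = np.dstack([frame, gx, gy])
model = region_cov(features[6:12, 10:18])
position, distance = global_search(features, model, win_h=6, win_w=8)
```

Every candidate window requires a covariance computation and an eigenvalue decomposition, so the cost grows with the number of window positions. Increasing `step` trades localization accuracy for speed.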

IV. SIMULATION RESULTS

Figure 3. Ball sequence, frames 1, 10 and 25 (left to right): (a) segmentation based, (b) histogram based, (c) covariance based.

Figure 4. Panels (a), (b) and (c), each showing frames 1/58, 29/58 and 57/58 (left to right).


Figure 5. Football sequence 1 with even lighting condition, with and without occlusion: (a) histogram based, frames 9, 40 and 49 (left to right); (b) covariance based, frames 26, 49 and 76 (left to right).

Figure 6. Football player sequence 2 with uneven lighting condition, frames 26, 49 and 76 (left to right): (a) histogram based, (b) covariance based.

The detection rate is the ratio of the number of frames in which the object location is accurately estimated to the total number of frames in the sequence. The detection rates of the three methods on four video sequences are shown in Table 1. The covariance tracking method achieves a higher detection rate than the other two methods. The segmentation-based method was not applied to football videos 1 and 2.

TABLE 1

| Method | Measure | Ball video | Table tennis video | Football video 1 with even lighting | Football video 2 with uneven lighting |
|---|---|---|---|---|---|
| Segmentation based | Frames missed / total | 2/92 | 8/58 | ---- | ---- |
| Segmentation based | Detection rate | 97.82 % | 86.20 % | ---- | ---- |
| Histogram based | Frames missed / total | 2/92 | 2/58 | 58/427 | 90/280 |
| Histogram based | Detection rate | 97.82 % | 96.55 % | 86.42 % | 67.86 % |
| Covariance method | Frames missed / total | 0/92 | 0/58 | 16/427 | 8/280 |
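As a worked example of the detection rate definition, the sketch below recomputes the rates from the frames missed and total frames listed in Table 1. The function and variable names are hypothetical. The recomputed values agree with the tabulated rates up to rounding or truncation in the last digit, and the rates for the covariance method are derived here from its missed-frame counts.

```python
def detection_rate(frames_missed: int, total_frames: int) -> float:
    """Percentage of frames in which the object was located correctly."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= frames_missed <= total_frames:
        raise ValueError("frames_missed must lie between 0 and total_frames")
    return 100.0 * (total_frames - frames_missed) / total_frames


# (frames missed, total frames) as listed in Table 1.
table1 = {
    "segmentation": {"ball": (2, 92), "table tennis": (8, 58)},
    "histogram": {"ball": (2, 92), "table tennis": (2, 58),
                  "football 1": (58, 427), "football 2": (90, 280)},
    "covariance": {"ball": (0, 92), "table tennis": (0, 58),
                   "football 1": (16, 427), "football 2": (8, 280)},
}

rates = {
    method: {video: detection_rate(*counts) for video, counts in videos.items()}
    for method, videos in table1.items()
}

for method, videos in rates.items():
    for video, rate in videos.items():
        print(f"{method:13s} {video:13s} {rate:6.2f} %")
```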

An experiment has also been carried out to compare the three methods in terms of the CPU time required to detect the object in each frame. The results are shown in Table 2.

TABLE 2

| Video | Segmentation based: CPU time per frame | Histogram based: CPU time per frame | Covariance method: CPU time per frame |
|---|---|---|---|
| Ball video | 750 msec | 500 msec | 600 msec |
| Tennis video | 800 msec | 550 msec | 700 msec |
| Football video 1 with even lighting condition | --- | 580 msec | 720 msec |
| Football video 2 with uneven lighting condition | --- | 580 msec | 730 msec |


V. CONCLUSION AND FUTURE WORK

In this paper three techniques have been applied to object tracking in different challenging situations. From the experimental results it can be concluded that object tracking using segmentation and feature matching is simple but cannot efficiently track the object in the difficult situations described. Kernel-based object tracking using a color histogram is not very good in terms of detection rate, but its computation time is lower than that of the other two methods. Object tracking using the covariance tracker is more effective than the other two methods, as it can handle occlusion, illumination changes, color changes, etc. Although its detection rate is higher than that of the other techniques, this method still suffers from a computational burden, because of which it fails to track objects with faster motion.

If other features such as spatial information and texture information are included in addition to the color features, the performance of kernel-based object tracking using a color histogram can be improved, and it may become a more efficient and faster algorithm. Future research on this topic includes the incorporation of other features, such as texture and motion vectors, to improve the robustness of the tracking algorithms. Statistical model-based approaches may also be applied to the problem, and evolutionary computing methods may be applied to object initialization.

References

[1] Emanuele Trucco and Konstantinos Plakas, 2006, "Video Tracking: A Concise Survey," IEEE Journal Ocean Eng., vol. 31, no. 2, pp. 520-528.

[2] Alper Yilmaz, Omar Javed and Mubarak Shah, 2006, "Object Tracking: A Survey," ACM Comput. Surv. 38, 4, Article 13, December.

[3] O. Javed and M. Shah, "Tracking and object classification for automated surveillance," in Proc. Eur. Conf. Computer Vision IV, pp. 343–357, 2002.

[4] Dipti Patra, Santosh Kumar K, Debarati Chakraborty, "Object Tracking in Video Images Using Hybrid Segmentation Method and Pattern Matching," IEEE India Council Conference INDICON 2009.

[5] Takashi Morimoto, Osamu Kiriyama, Youmei Harada, Hidekazu Adachi, Tetsushi Koide and Hans Jürgen Mattausch, "Object Tracking in Video Pictures based on Image Segmentation and Pattern Matching," in Proc. Euro Conf. Image Processing, pp. 3215-3218, 2005.

[6] Qingming Huang, Wen Gao, Wenjian Cai, "Thresholding technique with adaptive window selection for uneven lighting image," Elsevier, Pattern Recognition Letters 26 (2005), pp. 801–808, 2005.

[7] H. Schweitzer, J. W. Bell, and F. Wu, 2002, "Very fast template matching," European Conference on Computer Vision (ECCV), pp. 358–372.

[8] P. Fieguth and D. Terzopoulos, 1997, "Color-based tracking of heads and other mobile objects at video frame rates," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 21–27.

[9] F. Porikli, "Integral histogram: A fast way to extract histograms in Cartesian spaces," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, San Diego, CA, volume 1, pages 829–836, 2005.

[10] D. Comaniciu, V. Ramesh, and P. Meer, 2003, "Kernel-based object tracking," IEEE Trans. Patt. Analy. Mach. Intell. 25, pp. 564–575.

[11] D. Comaniciu, P. Meer, 2002, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Patt. Analy. Mach. Intell. 24, 5, pp. 603–619.

[12] J. Kang, I. Cohen, G. Medioni, 2004, "Object reacquisition using geometric invariant appearance model," International Conference on Pattern Recognition (ICPR), pp. 759–762.

[13] Alper Yilmaz, 2007, "Object Tracking by Asymmetric Kernel Mean Shift with Automatic Scale and Orientation Selection," IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1-6.

[14] Hong Yu, Jie Wei, Jin Li, 2009, "Object Tracking by Mean Shift Based on Color Distribution and Simulated Annealing," International Seminar on Future Information Technology and Management Engineering, pp. 128-131.

[15] E. Zhou, C. Liu, Y. Sun, Z. Wang, S. Gong, 2010, "Adaptive tracking window updating algorithm based on particle filtering," IEEE International Congress on Image and Signal Processing, pp. 303-307.

[16] O. Tuzel, F. Porikli, and P. Meer, "Region covariance: A fast descriptor for detection and classification," in Proc. 9th European Conf. on Computer Vision, Graz, Austria, volume 2, pages 589–600, 2006.