Technology (IJRIT) www.ijrit.com

(1)

Preethi K

, IJRIT 606 International Journal of Research in Information

Technology (IJRIT) www.ijrit.com

www.ijrit.com www.ijrit.com

www.ijrit.com ISSN 2001-5569

Object Tracking in Video Based on LSK Descriptors and Object Tracking in Video Based on LSK Descriptors and Object Tracking in Video Based on LSK Descriptors and Object Tracking in Video Based on LSK Descriptors and

Bhattacharyya Similarity Measure Bhattacharyya Similarity Measure Bhattacharyya Similarity Measure Bhattacharyya Similarity Measure

Preethi K¹ and Jayasree K²

1M.Tech Student, Dept. of Computer Engineering, Model Engineering College, Ernakulam, Kerala, India

[email protected]

2Assistant Professor, Dept. of Computer Engineering, Model Engineering College, Ernakulam, Kerala, India

[email protected]

Abstract

A visual object tracking method which is based on local steering kernels and color histograms. In this method an object of interest is determined in the previous video frame and search for the object in the next frame by finding frame region that best matches with the input. As object changes object model is updated using a stack. Here Bhattacharyya distance is used for histogram comparisons. LSK features are extracted for both the candidate region and target object. Based on the LSK similarity new tracked object location is determined. After determining the object position in the current frame, the position of the object in the following frame is predicted using a linear kalman filter.

Keywords: Bhattacharyya measure,Local Steering Kernel,Kalman filter,Color histogram similarity

1. Introduction

Visual tracking has been an area of intensive research in the field of computer vision. Object tracking in video is an important problem which has many applications like video surveillance, target tracking using video sensors, etc. It is generally defined as a problem of estimating the position of an object

over a sequence of images. The main objective is to track followed by analysis to provide crucial information about the behaviors, interactions, and relationships between objects of interest. Many

techniques have been proposed for solving tracking problems. However, there are many factors that make the problem complex, such as illumination variation, appearance change, shape deformation, partial occlusion, and camera motion. Moreover, lots of these applications require a real-time response.

In video object tracking, various techniques are involved, including motion object detection, background subtraction, shadow removing, object tracking, etc. Many moving object detection approaches rely on background subtraction from one or several static cameras. Background subtraction algorithm involves construction of a statistical model that describes the background state of each pixel.

Different algorithms build the background model differently. Kalman filter is used to track linear dynamical background pixels under Gaussian noise, but the background model is easily affected by the

(2)

Preethi K

, IJRIT 607

foreground pixels in actual scene, and the accuracy of motion detection is low.

Two major components can be distinguished in a typical visual tracker. Target Representation and Localization is mostly a bottom-up process which has also to cope with the changes in the appearance of the target. Filtering and Data Association is mostly a top-down process dealing with the dynamics of the tracked object, learning of scene priors, and evaluation of different hypotheses. The way the two components are combined and weighted is application dependent and plays a decisive role in the robustness and efficiency of the tracker. In real-time applications only a small percentage of the system resources can be allocated for tracking, the rest being required for the preprocessing stages or to high-level tasks such as recognition, trajectory interpretation, and reasoning. Therefore, it is desirable to keep the computational complexity of a tracker as low as possible.

The object representation methods can be divided into five categories: model-based, appearance-based, contour-based, feature-based, and hybrid ones. Our tracking method is an appearance based one using both the color histograms to describe object color information and the local steering kernel (LSK) object texture descriptors. Bhattacharyya distance is used as similarity measure.

2. Related Work

The objective of video tracking is to associate target objects in consecutive video frames. A robust tracking algorithm must be able to solve the various difficulties in tracking, such as occlusion, rotation, size changes, the light changes etc. In recent years researchers have proposed a lot of tracking algorithms. They mainly based on motion model, optical flow, feature and image region features etc. Generally, in video object tracking, various techniques are involved, including motion object detection, background subtraction, shadow removing, object tracking, etc.

Tracking methods can be divided into two main categories. In the first category, the state sequence of a target is iteratively predicted and updated using prior information from past measurements and likelihood information from current measurements, respectively. Various filters have been used to predict the state sequence of a target including Kalman filters[11] and extended Kalman filters for linear predictions, and unscented Kalman filters for non-linear predictions. The most general class of filters, however, is represented by particle filters, also called bootstrap filters, which are based on Monte Carlo integration methods. Methods belonging in the second category use various target characteristics, such as color or gray-level information, shape, and motion information and perform tracking by predicting changes in the appearance of the target from frame to frame. In tracking alone methods, the initial locations of the targets is usually specified manually. On the other hand, the majority of methods employing detection along with tracking use a detect-then-track approach where the target is detected in the first frame and then turned over to the tracker in subsequent frames. The main problem with these methods is that they aim to resolve detection and tracking sequentially and independently from each other.

Visual Tracking problem can be formulated in two another different categories: generative and discriminative. Generative tracking methods use an appearance model to represent the target observations.

Tracking is formulated as searching the target location that has the most similar appearance to the model.

One example is using subspace models[15]. Other popular tracking methods include mean shift tracker, covariance tracker, and fragment tracker . Discriminative tracking methods cast the tracking as a binary classification problem. Tracking is formulated as finding the target location that can best separate the target from the background.

3. Proposed Method

An appearance based approach uses both color histograms(CH) and local steering kernels(LSK) to track the objects in video. The algorithm employs spatial information through LSKs and color information through CH for representing both the object instances and search region in the frame. This method uses

(3)

Preethi K

, IJRIT 608

color histograms(CH) and local steering kernel(LSK) object texture descriptors for identifying object on each frame. CHs and LSK features are compared using Bhattacharyya measure[23].

Prediction of the object on the following frame is obtained through Kalman filter method.

Fig 1: Block diagram of the proposed method

3.1 LSK Feature Extraction

(4)

Preethi K

, IJRIT 609

Local steering kernels (LSK) represent the local structures in images, and give a measure of local pixel similarities. LSKs are local descriptors of the image structure, which exploit both spatial and pixel-value information. They are a nonlinear combination of weighted spacial distances between a pixel of an image of size N1 X N2 and its surrounding M X M pixels. Local steering kernel measures the local similarity of a pixel to its neighbors both geometrically and radiometrically. The key idea is to robustly obtain local data structures by analyzing the radiometric (pixel value) differences based on estimated gradients, and use this structure information to determine the shape and size of a canonical kernel.

For each pixel, we have to extract an LSK vector, which measures likeness of a pixel to its neighboring pixels. Steps for calculating LSK for a pixel is given is below.

3.2 Color Histogram Similarity

Histograms are often used in image processing for feature representation (colors, edges, etc.). The complex nature of images implies a large amount of information to be stored in histograms, requiring more and more computation time. Color histograms in particular have many advantages for tracking non-rigid objects as they are robust to partial occlusion, are rotation and scale invariant and are calculated efficiently. Many approaches in computer vision require multiple comparisons

of histograms for rectangular patches of an input image. In such approaches, we dispose a reference histogram and try to find the region of the current image whose histogram is the most similar. The similarity is given by a measure that has to be computed between each target histogram and the reference one. Euclidean distance, histogram intersection, Bhattacharyya distance, or cosine or quadratic distances are used for the calculation of image similarity ratings.

Here we use color histogram similarity to eliminate background patches. In this method, search region is divided into overlapping patches and extract three histograms for each patch. These histograms are

(5)

Preethi K

, IJRIT 610

compared with histograms of query image using Bhattacharyya distance measure. Then a binary similarity matrix is formed from this CH similarity matrix. A 0 indicates that candidate patch is more similar to background and a 1 indicates that candidate patch is more similar to object. Method is shown below.

3.3 Object Localization

Object localization is performed on each frame based on the LSK and CH similarity matrices. Search region is divided into overlapping patches and for each patch, we construct two similarity matrices. LSK similarity between this patch and the object patch detected in the previous frame, MLSK and CH similarity matrix between this patch and previosly detected frame, BCH. Final decision is taken by computing the final decision matrix.

M = MLSK BCH

* denotes the element-wise matrix multiplication. The new candidate object position is at the patch with the maximal value in the final decision matrix.

3.4 Search Region Extraction

Linear Kalman filter is simple and efficient method for motion state prediction method. The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) solution of the least- squares method. The filter is very powerful in several aspects: it supports estimations of past, present, and even future states, and it can do so even when the precise nature of the modeled system is unknown. The Kalman filter[11] addresses the general problem of trying to estimate the state x ∈ R of a discrete-time controlled process that is governed by the linear stochastic difference equation

xk+1 = Axk + Buk + wk

with a measurement z ∈ R that is

zk = Hxk + vk .

(6)

Preethi K

, IJRIT 611

The random variables wk and vk represent the process and measurement noise respectively.

Object position in the initial frame is given to the kalman filter algorithm which predicts the postion of the search region on the next frame. Method is shown below.

4. Experimental Results

The proposed method accepts a video and a query image as input and estimates the position of the object in the initial frame and subsequently tracks its movement in the remaining frames. Given below is an example for the query image provided as input.

Fig 2: Query Image

The accepted video is then divided into frames. The initial frame is then selected and divided into overlapping patches. For each image patch, LSK features are extracted and compared to the query LSK

(7)

Preethi K

, IJRIT 612

features to obtain LSK similarity matrix. Non-maxima suppression is applied to the LSK similarity matrix to obtain the final object patch.

Fig 3: Object detection in initial frame

After detecting the position of the object in initial frame, its position is given to the Kalman filter method which will predict the position of the search region on the next frame. Then color histogram of this search region is extracted for each plane and the binary color histogram similarity matrix is computed. Using this CH similarity matrix and LSK similarity matrix, final decision matrix is computed. Using this decision matrix, final object position on that frame is identified.

Fig 4: Object detection in the 4th frame

(8)

Preethi K

, IJRIT 613

5. Conclusions

This method track the object in a video based on the color histogram and local steering kernel. The method identifies the object in the initial frame using local steering kernel. Using color histogram information and LSK features, target object in the current frame is identified. Its location in the next frame which best suits the object is then determined. Bhattacharyya coefficient is used as the similarity measure. The target object is determined with respect to the detected object in the previous video frame. Search region in the next frame is estimated using Kalman filter.

References

[1] Olga Zoidi, Anastasios Tefas, and Ioannis Pitas, Visual Object Tracking Based on Local Steering Kernels and Color Histograms, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS

FOR VIDEO TECHNOLOGY, VOL. 23, NO.5, MAY 2013

[2] Dorin Comaniciu, Peter Meer, Visvanathan Ramesh, ”Kernel-Based Object Tracking”, Real-Time Vision and Modeling Department, Siemens Corporate Research

[3] Jinhua Wang , JieCao , Di Wu , Yabing Yu, ”An Object Tracking Algorithm Based on the Current Statistical Model and the Multi-Feature Fusion”, JOURNAL OF SOFTWARE, VOL. 7, NO. 9, SEPTEMBER 2012

[4] Hanzi Wang, David Suter, Konrad Schindler, and Chunhua Shen, ”Adaptive Object Tracking Based on an Effective Appearance Filter”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 29, NO. 9, SEPTEMBER 2007

[5] Zulfiqar Hasan Khan, and Irene Yu-Hua Gu, ”Joint Feature Correspondences and Appearance Similarity for Robust Visual Object Tracking”, IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5, NO. 3, SEPTEMBER 2010

[6] Pushe Zhao, Hongbo Zhu, He Li, and Tadashi Shibata, ”A Directional-Edge-Based Real Time Object Tracking System Employing Multiple Candidate-Location Generation”, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 23, NO. 3, MARCH 2013

[7] Du Yong Kim, and Moongu Jeon, ”Spatio-Temporal Auxiliary Particle Filtering With l1 Norm-Based Appearance Model Learning for Robust Visual Tracking”, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 2, FEBRUARY 2013

[8] Katja Nummiaro, Esther Koller-Meier, Luc Van Gool, ”An adaptive color-based particle filter”, Image and Vision Computing

[9] Arthur Louis Alaniz II, Christina Marianne G. Mantaring, Department of Electrical Engineering, Stanford University, ”Using Local Steering Kernels to Detect People in Videos”

(9)

Preethi K

, IJRIT 614

[10] Junxian Wang1 , George Bebis1 and Ronald Miller2, Computer Vision Laboratory, University of Nevada, Reno, NV, ”Robust Video-Based Surveillance by Integrating Target Detection with Tracking”

[11] G. Welch and G. Bishop, An introduction to the Kalman filter, Univ. North Carolina, Chapel Hill, NC, Tech. Rep. TR95041, 2000.

[12] Tao Gao, Guo Li, Shiguo Lian and Jun Zhang, ”Tracking video objects with feature points based particle filtering”

[13] Xiaohe Li, Taiyi Zhang, Xiaodong Shen, and Jiancheng Sun, ”Object tracking using an adaptive Kalman filter combined with mean shift”, SPIE, Optical Engineering, VOL.49

[14] Lian Xiaofeng, Zhang Tao, Liu Zaiwen, ”A Novel Method on Moving-Objects Detection Based on Background Subtraction and Three Frames Differencing”, 2010 International Conference on Measuring Technology and Mechatronics Automation

[15] Xue Mei, Haibin Ling, Yi Wu, Erik P. Blasch, and Li Bai, ”Efficient Minimum Error Bounded Particle Resampling L1 Tracker With Occlusion Detection”, IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 22, NO. 7, JULY 2013

[16] Hae Jong Seo, and Peyman Milanfar, ”Training-Free, Generic Object Detection Using LocallyAdaptive Regression Kernels”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 32, NO. 9, SEPTEMBER 2010

[17] Karthik Hariharakrishnan and Dan Schonfeld, ”Fast Object Tracking Using Adaptive Block Matching”, IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 5, OCTOBER 2005

[18] Peng Zhang, Tie-Yong Cao, Tao Zhu, ”A novel hybrid motion detection algorithm based on dynamic thresholding segmentation ”, 2010 IEEE International Conference on Communication Technology

[19] Changjlang Yang, Duraiswami.R, Davis L, ”Efficient mean-shift tracking via a new similarity measure”, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition

[20] Jing Huang, Sean P. Ponce, Seung In Park, Yong Cao, and Francis Quek, ”GPU- AcceleratedComputation for Robust Motion Tracking Using the CUDA Framework” , Center of Human Computer Interaction Virginia Polytechnic Institute and State University.

[21] Forsyth and Ponce, ”Computer Vision - A modern Approach”, Prentice Hall, second edition, 2002 [22] Jifeng Ning, Lei Zhang, David Zhang, Chengke Wu, ”Interactive image segmentation by maximal similarity based region merging”, Biometric Research Center, Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong, China

[23] Konstantinos G. Derpanis, ”The Bhattacharyya Measure”.