International Journal of Emerging Technology and Advanced Engineering
Website: www.ijetae.com (ISSN 2250-2459,ISO 9001:2008 Certified Journal, Volume 3, Issue 02, February 2013)
A Computational Analysis of Recent Multi-Object Tracking
Methods Based on Particle Filter, HMM and Appearance
Information of Objects
Raksha Shrivastava1, Professor Rajesh Nema2
1,2Department of Electronics and Communication, NRI Institute of Information Science and Technology, Bhopal (M.P)
Abstract— Detecting the movement of objects in a video is an important process for various applications. The crucial step is to determine the path of each object as time advances. Many sophisticated techniques for tracking multiple moving objects in video have been proposed. This paper gives a detailed description of recent object trackers based on particle filtering and Markov models, and analyzes them in terms of time complexity, computational complexity, computation time, robustness and false positives.
Keywords— Computational Complexity (CP), False Positive (FP), False Positive Rate (FPR), frames per second (fps), Finite State Machines (FSM), Hidden Markov Model (HMM), Markov Chain Monte Carlo (MCMC), Particle Filter (PF), PF for Joint Detection and Tracking (PFJDT), True Positive (TP).
I. INTRODUCTION
The advanced multi-object tracking methods can detect the movement, orientation and the path of the objects. The probabilistic characteristics of the object tracking algorithms enable us to detect the unstable and unpredictable trajectory of the object in the video.
The basic steps in tracking objects in video are: choose a feature to describe the objects, detect the objects of interest, track those objects in every frame, and analyze the resulting tracks to infer the objects' behavior.
In this paper, recent multi-object tracking methods are explained broadly and a computational analysis is performed based on time complexity, computational complexity, computation time, robustness, true positive (TP) and false positive (FP).
II. RELATED WORK
A. Parameters in Object Tracking
Several techniques have been proposed for object tracking [7], [8], [12 - 14], [23], [25], [27], [28], [38], [43], [45], [47], [48]. Two of the parameters used to evaluate object tracking are true positives (TP) and false positives (FP).
When the ground truth, i.e. the objects to be tracked, is manually segmented, one can compare and spot the correctly labeled object pixels (true positives, TP) and the non-object pixels that are falsely labeled as object pixels (false positives, FP). A false positive is also known as a false alarm.
False reduction is the decrease in the detection of false objects, measured by TP and FP. False reduction analysis determines the tracking accuracy of the multi-object tracker. The false positive rate is the ratio of the false-positive detections, i.e. those that cannot be matched to any ground-truth trajectory, to the total number of detections.
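As an illustration of the false positive rate defined above, the following Python sketch counts TP and FP for one frame. It is only a sketch under assumptions that are not in the surveyed papers: detections and ground-truth objects are represented by 2-D points, a detection is matched to the nearest unmatched ground-truth point, and the matching radius `max_dist` is a hypothetical parameter.

```python
import math

def count_tp_fp(detections, ground_truth, max_dist=10.0):
    """Greedy one-to-one matching of detections to ground-truth points.

    max_dist is a hypothetical matching radius (an assumption of this sketch).
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    for d in detections:
        # Nearest ground-truth point that has not been matched yet.
        best = min(unmatched, key=lambda g: math.dist(d, g), default=None)
        if best is not None and math.dist(d, best) <= max_dist:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1  # detection cannot be matched to any ground truth
    return tp, fp

def false_positive_rate(tp, fp):
    """Ratio of unmatched detections to all detections."""
    total = tp + fp
    return fp / total if total else 0.0

detections = [(10, 10), (52, 49), (200, 200)]
ground_truth = [(12, 11), (50, 50)]
tp, fp = count_tp_fp(detections, ground_truth)
fpr = false_positive_rate(tp, fp)
```

In this toy frame the third detection has no ground-truth object nearby, so it is counted as a false alarm.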
B.Multi-Track Linking
For tracking large groups of objects the data association is an important process [4]. Long-term occlusions are mainly responsible for problems during the data association process, resulting in “track-switch” or “track-lost” errors.
When occlusions occur, individual or multiple tracks become merged. After the occlusions, the merged tracks separate into individual tracks. To maintain the integrity of merging and splitting process, the “track linking method” assumes each track as a “tracklet” and links these tracklets.
Earlier approaches used a local linking strategy that repeatedly calculated the pairwise cost between tracklets [35, 41, 27]. Nillius et al [42] introduced the track graph and a Bayesian network inference algorithm as a global linking strategy. Global linking allows simultaneous matching of many tracklets, but it is computationally expensive.
Another class of data association techniques is sampling-based algorithms. Oh et al [50] introduced a Markov Chain Monte Carlo (MCMC) framework that samples data association hypotheses in order to track a large number of objects. Khan et al [39] proposed a probabilistic model that associates merged and split components using an MCMC-based particle filter. Yu and Medioni [20] extended the work of Oh et al [50] to find the appropriate temporal and spatial association of segments with a Data-Driven MCMC sampling approach.
A problem with measurement-level data association methods is the huge problem size for batch processing. A tradeoff is usually made between batch size and accuracy by using a sliding window [21]. The framework of temporal data association has been extended to the tracklet level, where a matched unit comprises a pair of trajectory segments [34]. In temporal data association, local links are produced at each level between track fragments [35, 41, 21]. Nillius et al [42], in contrast, processed the track graph globally, which defines all the object interactions.
C. Particle-Filter Based Multi-Object Tracker
The objects in multiple-object tracking methods are larger than those in conventional point tracking methods; they are known as 'extended' objects. For multiple-object tracking, one needs to create point measurements from the extended object detections and apply one of the existing point-target tracking algorithms. Point-target tracking methods such as Kalman filters, JPDA (Joint Probabilistic Data Association) [59], multi-dimensional assignment [60] and the PHD (Probability Hypothesis Density) filter [32] can be used for multi-object tracking.
Comaniciu et al [62] proposed representing the object by a weighted histogram calculated from a circular region. The similarity between the color histograms was evaluated by the Bhattacharyya coefficient. Tracking involved gradient ascent with an adaptive step. For tracking colored objects, Nummiaro et al [54] proposed a particle filter for a single colored object with manual initialization. A multi-object tracker based on color and PF with automatic object initialization/deletion was proposed in [57].
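The histogram similarity mentioned above can be sketched in a few lines of Python. This is a minimal illustration of the Bhattacharyya coefficient between two normalized histograms, not the implementation used in [62]; the three-bin histograms are made-up example data.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [h / total for h in hist]

def bhattacharyya_coefficient(p, q):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

target = normalize([8, 1, 1])      # hypothetical color histogram of the target
candidate = normalize([7, 2, 1])   # hypothetical histogram of a candidate region
similarity = bhattacharyya_coefficient(target, candidate)
```

Identical histograms give a coefficient of 1 and histograms with no common non-empty bin give 0, so a tracker can search for the candidate region that maximizes this value.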
D. Long-Term Online Multiface Tracking Using Particle Filter and Hidden Markov Model (HMM)
Most multi-face detectors of recent years are applicable only when the persons look towards the cameras [1d], which is not the case in all scenarios. When difficult head poses last for a long time, it is hard to track the trajectories of the objects.
Many multiple face tracking methods have been proposed ([57, 21, 32, 28, 17]), which mainly concentrate on new features, better dynamics, multi-cue fusion mechanisms or adaptive models ([7, 8, 9, 10]). The results are always based on short video sequences only.
Only very few multi-object trackers address track termination and track initialization, especially in terms of performance evaluation. A high confidence threshold in the face detector may result in missing an early track initialization. Conversely, a low threshold may produce false tracks.
Some methods combine track termination and track creation within the tracking architecture itself, as in Reversible-Jump Markov Chain Monte Carlo (RJ-MCMC) [46], [30]. However, effective tracking then requires global scene likelihood models [22] with a constant number of observations, which are tedious to implement in multi-face tracking applications. Kalal et al [16] proposed a method for failure detection in visual object tracking, based on the fact that a correctly tracked target can be tracked backwards in time. The backward tracking, however, increases the overall computational complexity (by a scale linear in backward weight). In a particle filtering architecture, another method is to directly model a failure state as a random variable within the probabilistic model [37].
Another technique for multiple pedestrian tracking [11] associates smaller tracklets online in a statistical sampling architecture, but no mechanism was proposed for starting or ending tracks.
III. RECENT MULTI-OBJECT TRACKING METHODS BASED ON PARTICLE FILTER, HIDDEN MARKOV MODEL AND APPEARANCE INFORMATION OF OBJECTS
Many multi-object tracking methods have been proposed for various applications in the field of robotics, medicine, human surveillance and weather tracking. These are adapted to the fast and arbitrary motion of many objects for a video sequence. Each multi-object tracker has its own pros and cons, adaptable to its own purpose and field of application. Some of the multi-object trackers based on particle filter, Hidden Markov model (HMM) and appearance information of objects are described in the following section.
E. Multi-Object Tracking Based on Coupled Layer Utilizing HMM and Sequential Particle Filter
A coupled-layer object tracking method consists of a local layer and a global layer, and adapts to the object's global and local appearance [10]. The local layer and the global layer use the local and global appearance information respectively. The local layer comprises a group of local patches that geometrically constrains the changes in the target's appearance. The whole structure is updated by adding and removing local patches. The addition of patches is governed by the global layer, which models the object's global visual elements such as apparent local motion, shape and color. The global visual elements are updated by the stable patches of the local layer.
The allocation of new patches in the local layer is constrained by the global layer, which combines the target's visual features. To overcome this constraint, a probabilistic model is required. A sequential particle filtering scheme and a Hidden Markov Model (HMM) [6] are combined with this coupled-layer object tracker to delimit the allocation of new patches in the local layer.
1) Overview of the Multi-Object Tracker: The local layer focuses on the target object's geometric deformation, and this information is passed from the local layer to the global layer through the initialization of the particle filter and the HMM.
The sequential particle filter is used to detect the local layer patches, and the sequence of deformation information is stored using the HMM at the global layer. This enhancement to the global layer improves the efficiency of multiple-object tracking. The sequential particle filter functions as a video tracker using distributed multi-object tracking and high-order Markov chains. The local layer patches need to be estimated during tracking to initialize the adaptation of the visual model. The target object's center is taken as the weighted average of the patches' positions. The HMM is applied for global layer prediction and the sequential particle filter is used for initialization of the local layer. Integrating the particle filter localization with the HMM prediction enhances the prediction performance and thereby decreases the time consumption. The positions of detected objects are stored using predetermined memory allocation.
2) Description of the Multi-Object Tracker: The
multi-object tracker consists of several modules such as loading the video sequence, frame conversion, particle filter implementation and HMM. The overview of the multi-object tracker is shown in Fig. 1.
a) Loading of Video Sequence: The input video sequence is a stack of images, one per time frame. The image sequence is loaded using the mmread() function in MATLAB. This function also performs the frame conversion. A reader object is created which can read the image stack. The video file can be of any extension. The mmread() function reads the image sequence together with parameters such as bits per pixel, fps, duration, height, width and number of frames.
b) Particle Filter Implementation: The particle filter approximates the filtered posterior distribution of the images in the image stack by a group of weighted particles. The local layer patches are initialized by the particle filter via prediction from the patches' locations in the previous frame, and they are weighted based on a motion model.
The state of the system, at time t is estimated using Markov model and represented as in equation 1.
Fig. 1. Overview of the Multi-Object Tracker.
Here, $E(1)$ can be initialized using prior knowledge. According to the Markov model, the current state depends only on the immediately preceding state, and the observations depend only on the current state, as expressed in equation 2.
$E(t) = P(y_t \mid x_t) \, P(x_t \mid x_{t-1}) \, E(t-1)$ (2)
where $P(y_t \mid x_t)$ is the observation model and $E(t-1)$ is the proposal distribution. In addition to these parameters, the proposed framework also requires the following attributes: a motion model, an observation model and an initial model. The samples should be weighted by the ratio of the posterior to the proposal distribution. Thus, the weight of each particle changes depending on the observation for the current frame. In the proposed method, particle filtering uses sequential Monte Carlo simulation.
The following steps are carried out by the particle filter:
Initialize $x_t$ for the first frame
Generate a particle set consisting of N particles, i.e. $\{x_t^m\}$, $m = 1, 2, \ldots, N$
Predict each particle (using 2nd-order auto-regressive dynamics)
Compute distance between each particle
Weigh each particle depending on distance
Select the location of target as a particle, which has minimum distance
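The steps listed above can be sketched as one iteration of a simple particle filter in Python. This is an illustrative sketch only, under several assumptions that do not come from [10]: the state is a 2-D position, the second-order auto-regressive model is the constant-velocity form $x_t = 2x_{t-1} - x_{t-2} + \text{noise}$, the distance of a particle is measured to an observed position, and `n_particles` and `noise` are hypothetical parameters.

```python
import math
import random

def track_step(prev, prev2, observation, n_particles=200, noise=2.0, rng=random):
    """One particle-filter iteration for a 2-D position.

    prev, prev2: estimated positions at t-1 and t-2.
    observation: measured position at t (assumed distance reference).
    """
    particles, distances = [], []
    for _ in range(n_particles):
        # Second-order auto-regressive prediction plus Gaussian noise.
        x = 2 * prev[0] - prev2[0] + rng.gauss(0.0, noise)
        y = 2 * prev[1] - prev2[1] + rng.gauss(0.0, noise)
        particles.append((x, y))
        distances.append(math.dist((x, y), observation))
    # Weigh each particle depending on its distance (closer means heavier).
    weights = [math.exp(-d * d / (2 * noise * noise)) for d in distances]
    total = sum(weights)
    if total > 0:
        weights = [w / total for w in weights]
    else:
        weights = [1.0 / n_particles] * n_particles
    # Select the particle with minimum distance as the target location.
    best = min(range(n_particles), key=distances.__getitem__)
    return particles[best], weights

rng = random.Random(0)
observation = (12.5, 11.5)
estimate, weights = track_step((10.0, 10.0), (8.0, 8.0), observation, rng=rng)
```

Selecting the minimum-distance particle follows the last step of the list; a common alternative is to report the weighted mean of all particles instead.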
c) Implementation of Hidden Markov Model: The Markov process used in the sequential particle filtering technique is not suitable for prediction at the global layer, because it is a simple stochastic process. In a Markov process, given the present state, the past states have no influence on the next state.
Let $\{x_t : t \in T\}$ be a stochastic process with discrete state space $S$ and discrete time space $T$. If the process satisfies the Markov property $P(x_{n+1} = j \mid x_n = i, x_{n-1} = i_{n-1}, \ldots, x_0 = i_0) = P(x_{n+1} = j \mid x_n = i)$ for any states $i_0, \ldots, i_{n-1}, i, j \in S$ and any $n \geq 0$, it is called a Markov chain.
The allocation of new patches in the local layer encodes the target's global visual features. This does not have a direct effect on the change of the states, so the model is deemed hidden. The model observes the emissions produced by the changes in the states.
A probabilistic HMM uses the following representations:
A set of states over time, denoted by STATES
A set of emissions, or observations over time, denoted by SEQ
An M-by-M transition matrix TRANS whose entry (i, j) is the probability of a transition from state i to state j.
An M-by-N emission matrix EMIS whose entry (i, k) gives the probability of emitting symbol $s_k$, given that the model is in state i.
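To make the roles of STATES, SEQ, TRANS and EMIS concrete, the following Python sketch samples a state path and an emission sequence from a small HMM. The matrix values are made-up examples, and starting the chain in state 0 before the first transition is an assumption of this sketch rather than something stated in the paper.

```python
import random

def hmm_generate(length, trans, emis, rng):
    """Sample a state path and an emission sequence from an HMM.

    trans[i][j]: probability of moving from state i to state j (M-by-M).
    emis[i][k]:  probability of emitting symbol k in state i (M-by-N).
    Assumption: the chain is in state 0 before the first transition.
    """
    states, seq = [], []
    state = 0
    for _ in range(length):
        # Draw the next hidden state from row `state` of the transition matrix.
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
        # Draw the observed symbol from row `state` of the emission matrix.
        symbol = rng.choices(range(len(emis[state])), weights=emis[state])[0]
        states.append(state)
        seq.append(symbol)
    return states, seq

TRANS = [[0.9, 0.1],
         [0.2, 0.8]]          # M = 2 hidden states
EMIS = [[0.7, 0.2, 0.1],
        [0.1, 0.3, 0.6]]      # N = 3 observable symbols
STATES, SEQ = hmm_generate(20, TRANS, EMIS, random.Random(1))
```

Only SEQ would be visible to a tracker; STATES is the hidden path that the model has to infer from the emissions.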
3) Merits of this Multi-Object Tracker: The combination of the HMM prediction with the particle filter localization enhances the prediction performance and thereby limits the time consumption. Storing the positions of the detected objects using cell array memory allocation decreases the computational complexity, which in turn increases the robustness of the system.
4) Demerits of this Multi-Object Tracker: The
computation time of this multi-object tracker is less when compared to the recent multi-object trackers. Also the number of objects tracked in the video sequence is limited. This multi-object tracker does not resolve occlusions and multiple views of a camera. The false positive analysis was performed only for the first few frames and not for the entire video sequence.
F. Multi-Track Linking Methods for Track Graphs Using Set-cover Techniques and Network-flow
This multi-object tracking method accounts for occlusions in multiple small objects via set-cover techniques and network flow. A track-linking framework is designed for short-term and long-term occlusions. A two-stage network-flow process is constructed to track the merging and splitting events produced by occlusion [4].
Local appearance information is used to differentiate objects, and the linking process connects trajectory segments via a series of optimal bipartite-graph matches, which resolves the short-term occlusions. Global appearance information is used to characterize the objects, and the linking process calculates a logarithmic approximation to the solution of the set-cover problem.
For multiple views of the objects, a track graph is constructed for each view and the track segments from each graph are linked simultaneously.
1) Overview of Multi-Track Linking: A track graph G = (V, E) is defined over a set of vertices 'V', which represent individual or merged tracks, and a set of edges 'E', which represent merging or splitting events. The flow on an edge gives the number of objects involved in the merging or splitting event. A vertex that has only incoming edges is known as a sink and a vertex that has only outgoing edges is known as a source. The set of all source vertices is denoted 'S' and the set of all sink vertices is denoted 'T'. Every vertex has a track capacity that indicates whether it holds a single object or multiple objects. For a source vertex, the associated track capacity is the sum of the outgoing flows, while for a sink vertex it is the sum of the incoming flows.
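The track graph described above maps naturally onto a small data structure. The Python sketch below is an illustration only: the class and method names are hypothetical, and using the sum of incoming flows as the capacity of an intermediate (merged) vertex is an assumption, since the text defines the capacity explicitly only for sources and sinks.

```python
class TrackGraph:
    """Directed graph of tracks (vertices) and merge/split events (edges)."""

    def __init__(self):
        self.vertices = set()
        self.flow = {}  # (tail, head) -> number of objects on that edge

    def add_edge(self, tail, head, flow):
        self.vertices.update((tail, head))
        self.flow[(tail, head)] = flow

    def incoming(self, v):
        return sum(f for (_, head), f in self.flow.items() if head == v)

    def outgoing(self, v):
        return sum(f for (tail, _), f in self.flow.items() if tail == v)

    def sources(self):
        # Vertices with only outgoing edges (set S).
        return self.vertices - {head for (_, head) in self.flow}

    def sinks(self):
        # Vertices with only incoming edges (set T).
        return self.vertices - {tail for (tail, _) in self.flow}

    def capacity(self, v):
        # Source: sum of outgoing flows. Sink: sum of incoming flows.
        # Intermediate vertices use incoming flow (assumption of this sketch).
        return self.outgoing(v) if v in self.sources() else self.incoming(v)

# Two tracks a and b merge into m during an occlusion, then split into c and d.
g = TrackGraph()
g.add_edge("a", "m", 1)
g.add_edge("b", "m", 1)
g.add_edge("m", "c", 1)
g.add_edge("m", "d", 1)
```

In this example the merged track m carries two objects, while each source and sink track carries one.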
2) Algorithm for Track Graph Construction: The algorithm consists of two stages. In the first stage, the algorithm produces track fragments and track merge/split hypotheses, which give the vertices and edges of the track graph. In the second stage, a path-reducing min-flow algorithm estimates the track capacity of every vertex and the flow on every edge. The generation of track-merge hypotheses and track-split hypotheses is shown in Fig. 2 and Fig. 3.
The list of Hm and Hs is ordered according to time and a vertex of the track graph is produced for each track in this order. Starting from the source vertices, flow is pushed through the graph G until the lower-bound capacity of every edge is satisfied, resulting in a feasible flow.
3) Linking Scheme for Track Graph: A local and a global linking procedure process the track graph. Local linking is performed only when a track can find its successor without the need to look further. Thus local information is not passed through the whole graph and linking is effectively performed by a series of bipartite-graph matches. The vertices of the graph are first ordered according to the initiation time of the corresponding track. Local linking processes each vertex sequentially until all matching is complete.
There are three types of local structures in a track graph. The first structure represents a merge hypothesis (Hm), in which each individual track is extended with the merged track. The trajectory is smoothed when the tracks are combined. The second structure represents a split hypothesis (Hs), in which each split track is extended backwards with the merged track. Finally, the third structure represents a merge hypothesis followed by a split hypothesis. Each individual track is extended with the merged track and the best match between the two sets of tracks is searched for.
Fig. 3. Generation of Track-Split Hypotheses (Hs).
Global linking connects many trajectory segments together at the same time. The linking process is defined as a generalized set-cover problem. All the possible paths from the source set 'S' to the sink set 'T' are enumerated. A deterministic greedy method is used to solve the set-cover problem.
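The greedy strategy for set cover can be sketched as follows. This Python example shows the standard weighted greedy heuristic, which repeatedly picks the candidate with the lowest weight per newly covered element; it is not the exact procedure of [4], and the candidate paths and their weights are made-up example values.

```python
def greedy_set_cover(universe, candidates):
    """Greedy weighted set cover.

    candidates: dict mapping a path name to (set of vertices, weight).
    Returns the names of the chosen paths in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the path with the lowest cost per newly covered vertex.
        best = min(
            (name for name, (verts, _) in candidates.items() if verts & uncovered),
            key=lambda n: candidates[n][1] / len(candidates[n][0] & uncovered),
            default=None,
        )
        if best is None:
            raise ValueError("the candidate paths cannot cover every vertex")
        chosen.append(best)
        uncovered -= candidates[best][0]
    return chosen

# Vertices of a small track graph and four hypothetical source-to-sink paths.
vertices = {"a", "b", "m", "c", "d"}
paths = {
    "p1": ({"a", "m", "c"}, 1.0),
    "p2": ({"b", "m", "d"}, 1.0),
    "p3": ({"a", "m", "d"}, 1.5),
    "p4": ({"b", "m", "c"}, 1.5),
}
cover = greedy_set_cover(vertices, paths)
```

Here the two cheaper paths p1 and p2 together cover every vertex, so they are selected as the linked trajectories.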
For linking the tracklets in multiple views, the multi-view global linking problem is described as a joint set-cover problem. A track graph is generated independently for each view. For each graph all valid paths are estimated. A cover on the set of vertices for each view is computed, with a solution of minimum weighted sum.
4) Merits of this Multi-Object Tracker: This multi-object tracker is able to resolve occlusions and can handle a larger number of smaller objects, both in a single view and in multiple camera views. It has a good false positive rate.
5) Demerits of this Multi-Object Tracker: This tracker cannot track people in a highly dense area.
B. Particle Filter for Tracking People in Visual Surveillance
The particle filter detection is based on automatic background modeling rather than on a manually generated object color model [5]. A labeling method is used to create tracks of the objects. Some of the issues that need to be addressed in multiple-object tracking are the creation and removal of object states, missing detections, clutter and ambiguity.
The creation and deletion of object states is framed as an estimation of the number of objects in the video sequence. A particle filter (PF) [51] is a sequential estimator that evaluates the state density with a discrete group of point estimates. The multi-modal state densities allowed by the PF can be useful in handling ambiguity, clutter and missing detections. A specific variant of the PF for visual tracking is the PF for Joint Detection and Tracking (PFJDT) [49].
1) Overview of Particle Filter Based Multi-Object Tracker: In the PFJDT algorithm, each particle defines the number of objects in the video sequence and their boundaries. However, the PFJDT has some limitations that prevent its use for visual surveillance tracking. The PFJDT is effectively only a state estimator, whereas the filtering process should provide a state density that eases tracking by establishing the correspondence of states over time. Another demerit of the PFJDT is that it depends on an object appearance (color) model, which must be pre-constructed.
This multi-object tracker uses the PFJDT framework [49], but with two modifications. First, the measurement update step utilizes the foreground detections from a statistical model of the background. Next, a labeling algorithm is used to create labeled tracks from the estimate of the state density. The algorithm operates on the individual particles, which accounts for accuracy, and uses a threshold approximation, which accounts for efficiency.
2) Architecture of the Particle Filter: In the sequential Bayesian estimation framework [51], the posterior probability density function is estimated using the two steps of prediction and update. PFs are a powerful and relatively easy to implement class of approximations based on the Sequential Monte Carlo method. The PF tracking system consists of three components connected in series: background model, particle filter, and object labeling.
Fig. 4. Process of Background Modeling.
The particle filter is applied to estimate the number of foreground objects in the video sequence and their limits. The object labeling algorithm establishes the relation between object bounding limits at each time step. Object tracks with a unique integer identifier are obtained as a result of object labeling.
The output of a multi-object tracker should be a group of object state estimates with a unique track identifier. The track identifier defines the states over time as belonging to the same object. A track displays the history of an object's state during the video sequence. A particle state labeling step is necessary to establish the correspondence of the particle states over time and create object tracks.
3) Merits of this Multi-Object Tracker: This multi-object
tracker is designed especially for tracking people in applications like visual surveillance.
4)Demerits of this Multi-Object Tracker: This
multi-object tracker cannot focus on the faces of the people for concentrated tracking.
C. Long-Term Online Multiface Tracking Using Particle Filter and Hidden Markov Model (HMM)
An efficient way to track multiple faces is to improve the track management, i.e. deciding when to add or stop tracking a target [1]. Erroneous early stopping must be carefully considered, as the observation models are insufficient to deliver reliable likelihood (tracking) information and to explain the variability of tracked objects. The tracking is performed in a multi-object state-space Bayesian filtering framework solved with Markov Chain Monte Carlo. A probabilistic filtering step decides when to add a target object to or remove it from the tracker, where decisions depend on face detections, long-term observations, track state characteristics and likelihood measures. The long-term observations are collected and processed using two separate HMMs (Hidden Markov Models), which determine when to add a target to or delete it from the tracker.
1) Overview of this Multi-Object Tracker: This multi-object tracker works upon a principled Bayesian filter framework, solved with an MCMC sampling scheme that deals with object interactions. The tracking framework consists of an explicit probabilistic filtering framework, long-term image observations and static observations.
2) Multi-face Tracking Based on Particle Filter: The problem of multi-face tracking is addressed with a recursive Bayesian framework. The posterior probability distribution over the present state is estimated, given the observations over a time period. The estimation process is implemented using a particle filter with an MCMC sampling scheme. The main components of the model are the State Space, State Dynamics, Observation Likelihood and Tracking Algorithm.
A multi-object state space formulation is used with the global state. The state of a face comprises speed, eccentricity (ratio between width and height), position and scale. In the state dynamics, the speed and position factors of the visible faces are given by a mixture of a first-order auto-regressive model and a uniform distribution. A first-order model with steady state is applied to the eccentricity and scale factors. The steady-state values for eccentricity and scale are updated only when a detected face is associated with the face track, and at a slower pace than the frame-to-frame dynamics.
In multi-face meeting scenarios, continuous partial face occlusions happen only rarely. Also, the faces are often occluded by other body parts that are not tracked by the system. For longer full occlusions, the strategy is to have the algorithm delete the occluded face's track and start it over as soon as possible.
The face is split into three horizontal bands and in each band two normalized histograms with two different levels of quantization are computed. The scheme used in [57] decouples the colored pixels from the grey-scale pixels and it is applied with two different quantization levels. This option of semi-global multi-level histograms originates from a compromise between robustness to appearance variations across people, speed, head pose variations and a well maintained likelihood.
The histogram models for a face are initialized when a new target is included in the tracker. To increase the robustness to improper initialization and variance of lighting conditions, the models are updated whenever a detected face is affiliated with the given face track.
The tracking algorithm functions in two stages at every time instant. First, the states of the currently visible faces are estimated recursively, depending on the histogram model and solution by a MCMC sampling scheme. Second, a decision is made on addition/deletion of the face. The MCMC sampling scheme permits the efficient sampling in the high-dimensional state space of interacting targets as in [46].
3) Creation and Removal of Multi-targets: The manner in which objects are inserted into and deleted from the tracker is an important feature of the algorithm. The aim in the application scenario is to avoid false alarms as much as possible, so quick detection is necessary in case of a tracking failure. At the same time, the tracker should not stop tracking when there is no failure, because it may take a long time until the face is detected again.
Two different HMMs are used, one for object creation and the other for object removal, with different types of observations. A face detector (profile and frontal views) is applied every 10 frames. The HMMs depend on observations calculated on all frames since the last update. Before the creation and removal process, each detection is affiliated to a track under the following conditions:
The detection is not affiliated with any other target.
The detection has the smallest distance to the target.
The distance between the detection and the target is less than two times the average width of their limits.
The two boundaries must overlap.
A general way to learn the affiliation rules and parameters is to use the training data, as in [40].
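The affiliation conditions listed above can be sketched as a simple gating test in Python. This is one possible reading of the rules, not the implementation of [1]: boxes are assumed to be `(x, y, width, height)` tuples with the origin at the top-left corner, distances are measured between box centers, and the function names are hypothetical. The first condition is assumed to be handled by the caller, which assigns each detection at most once.

```python
import math

def center(box):
    """Center of a box given as (x, y, width, height)."""
    return (box[0] + box[2] / 2.0, box[1] + box[3] / 2.0)

def boxes_overlap(a, b):
    """True if the two axis-aligned boxes share some area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3])

def affiliate(detection, targets):
    """Return the index of the target affiliated with the detection, or None."""
    if not targets:
        return None
    dists = [math.dist(center(detection), center(t)) for t in targets]
    # Candidate: the target at the smallest distance from the detection.
    i = min(range(len(targets)), key=dists.__getitem__)
    avg_width = (detection[2] + targets[i][2]) / 2.0
    # Gate: closer than twice the average width, and the boxes must overlap.
    if dists[i] < 2 * avg_width and boxes_overlap(detection, targets[i]):
        return i
    return None

targets = [(0, 0, 20, 20), (100, 100, 20, 20)]
near = affiliate((5, 5, 20, 20), targets)      # overlaps the first target
far = affiliate((60, 60, 10, 10), targets)     # too far from both targets
```

A detection that fails the gate stays unaffiliated and can then be passed on to the target creation step.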
a) Creation of Targets: During the initialization of a new target, two objectives must be satisfied. First, erroneous initializations due to false detections must be minimized. Second, the correct targets must be initialized early.
To decide whether to add new targets to the face tracker, a simple HMM is used. The HMM is useful for calculating the probability of a hidden, discrete variable. This variable indicates whether or not a face is present at the specified position.
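The idea of tracking the probability of a hidden binary variable can be illustrated with a two-state HMM forward update in Python. This is a sketch under stated assumptions, not the model of [1]: the only observation is whether the face detector fired, and all probabilities (`p_stay`, `p_appear`, `p_hit`, `p_false`) are hypothetical values chosen for the example.

```python
def update_face_probability(prior, detected,
                            p_stay=0.95, p_appear=0.05,
                            p_hit=0.8, p_false=0.1):
    """One forward step of a two-state HMM (face present / face absent).

    prior:    P(face present) after the previous step.
    detected: True if the face detector fired at this position.
    All transition and emission probabilities are hypothetical.
    """
    # Prediction: apply the state transition model.
    predicted = prior * p_stay + (1.0 - prior) * p_appear
    # Update: weight both hypotheses by the observation likelihood.
    like_face = p_hit if detected else 1.0 - p_hit
    like_none = p_false if detected else 1.0 - p_false
    numerator = like_face * predicted
    return numerator / (numerator + like_none * (1.0 - predicted))

p = 0.1
history = []
for fired in (True, True, True):
    p = update_face_probability(p, fired)
    history.append(p)
```

With these example values the probability rises with each consecutive detection, so a new target would only be created after several consistent detections rather than after a single, possibly false, one.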
b) Removal of Targets: The algorithm checks whether it is still correctly following a face or has lost it during tracking. The algorithm may lose track when it gets distracted by a similar background. Both static and dynamic observations are made.
The first static observation for a specified target relies on the output of the face detector. The second static observation is the tracking memory value at a specified target position in the image. This observation guarantees that the tracking is more likely to be maintained when the face stays at its previous position. Also, the target should be deleted with a higher probability when it shifts to image regions that were never occupied by a face before. The third static observation is the tracker's observation likelihood calculated at the mean state value of the specified target. The fourth static observation corresponds to the variance of the target filtering distribution. A higher variance of the state distribution indicates a higher uncertainty, and the track should be halted more quickly.
The dynamic observations depend upon the temporal variation of different features of the object. They rely on detecting abrupt increases or decreases over time in particle variance and observation likelihood. The values of these features are assumed to be normally distributed during the tracking process.
4)Identification of Different Persons: The algorithm
tracks the identities of different persons and it associates each track with a person. Person models are built for longer-term descriptions of person appearance taken from observations during the tracking.
5) Merits of this Multi-Object Tracker: It is mainly
6) Demerits of this Multi-Object Tracker: The analysis with respect to large objects has not been performed. The number of objects tracked is smaller than in other multi-object trackers. The temporal filtering of the HMM, which accumulates observations over longer time periods, leads to delayed failure decisions. The failure detection is dependent on the time it takes to receive a new face detection to reinitialize the tracking.
IV. BACKGROUND WORK
Other multi-object tracking methods that build on variants of the particle filter, the Hidden Markov Model and the appearance information of objects are discussed in the following subsections.
D. Multi-target Visual Tracking for Surveillance with Multiple Active Cameras
This multi-object tracker is a surveillance system that is capable of tracking with almost real-time response [9]. Multiple pan-tilt cameras are used in this technique. A distributed camera agent is constructed to track the multiple moving objects. The particle filter is extended with an estimate of target depth to track occluding targets. A technique to select a suboptimal camera action is introduced for a camera on a pan-tilt platform that has to track many targets simultaneously within its limited FoV (Field of View). The multi-object tracker is based on the Monte Carlo method and mutual information to maintain coverage of the tracked targets. The surveillance system consists of a small number of active cameras that monitor a wide space. This technique maximizes the number of targets to be tracked. A task assignment strategy, i.e. the online position technique and a hierarchical camera selection, is introduced in the multi-object tracker. The architecture of the multi-camera system consists of an input/output HMM, which models the correlations between each target, the captured image and a given camera action. A variant of the particle filter known as the sequential importance sampling (SIS) particle filter is used to estimate the posterior of an independent target. The SIS particle filter also provides the visual information of a target by obtaining samples and the importance functions of other indistinguishable overlapping targets. A sampling importance resampling (SIR) particle filter is used to obtain the samples, which are resampled from the joint posterior of overlapped targets during the previous time instant.
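The predict, weight and resample cycle of an SIR particle filter can be sketched for a single target with a one-dimensional position state. This is a generic textbook-style illustration, not the multi-camera joint-posterior tracker of [9]; the random-walk motion model, the Gaussian observation model and all parameter values are assumptions.

```python
import math
import random


def sir_step(particles, observation, rng, motion_std=1.0, obs_std=2.0):
    """One predict / weight / resample cycle for a 1-D position state."""
    # Predict: diffuse each particle with a random-walk motion model (assumed).
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle (assumed).
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    if sum(weights) == 0.0:
        # Degenerate case: every likelihood underflowed; use uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw N particles with probability proportional to weight.
    return rng.choices(predicted, weights=weights, k=len(predicted))


def track(observations, n_particles=500, seed=0):
    """Return the posterior-mean position estimate after each observation."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles = sir_step(particles, z, rng)
        estimates.append(sum(particles) / len(particles))
    return estimates
```

An SIS filter would keep the weights and carry them forward instead of resampling at every step. Resampling at each step, as here, trades some particle diversity for protection against weight degeneracy.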
E. High Density Multiple Human Tracking Based on Particle Filter
This multi-object tracker is capable of automatic multiple human detection in high density crowds and under extreme occlusions [3]. Human tracking in high density scenes is a challenging problem. Standard techniques like background modeling fail when most of the objects in the scene are in motion. The human detection and tracking processes are combined into a single framework, and a confirmation-by-classification method is introduced to calculate the confidence in a track trajectory through occlusion and to discard false positive detections. The method uses a Viola and Jones AdaBoost cascade classifier for detection, a particle filter for tracking and color histograms for appearance modeling. It thus integrates head detection, particle filter based tracking via appearance information and confirmation-by-classification.
F. Multi-Object Tracking Based on Markov Chains
A detection-based three-level hierarchical association is presented to track multiple objects in dense environments [35]. One disadvantage of this multi-object tracker is that it takes input only from a single camera. At the low level, dependable tracklets (short tracks) are created by connecting detection responses based on conservative affinity constraints. At the middle level, the tracklets are further associated to obtain longer tracklets using more complex affinity parameters. The association is formulated as a MAP (Maximum a posteriori) problem, which is solved by the Hungarian algorithm. At the high level, entries, scene occlusions and exits are estimated from the pre-computed tracklets and are applied to refine the final trajectories. This approach is applied to the pedestrian class.
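The middle-level association can be pictured as a minimum-cost one-to-one assignment between tracklet ends and tracklet starts, where each cost is a negative log affinity. The sketch below uses an exhaustive search as a stand-in for the Hungarian algorithm, which returns the same optimum in polynomial time. The cost values are hypothetical, and the sketch ignores the affinity models and the handling of track initialization and termination in [35].

```python
from itertools import permutations


def best_assignment(cost):
    """Minimum-cost one-to-one assignment for a small square cost matrix.

    cost[i][j] is the cost of linking the end of tracklet i to the start
    of tracklet j. Exhaustive search is O(n!), so this is only suitable
    for tiny examples; the Hungarian algorithm gives the same optimum.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost


# Hypothetical negative-log-affinity costs for three tracklets.
example_cost = [
    [1.0, 4.0, 5.0],
    [3.0, 0.5, 6.0],
    [4.0, 2.0, 1.5],
]
links, total_cost = best_assignment(example_cost)
```

Minimizing the sum of negative log affinities is equivalent to maximizing the product of the link probabilities, which is how the assignment connects to the MAP formulation.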
V. DISCUSSION AND RESULTS
The performance and computational characteristics of various recent multi-object trackers are compared. The comparisons are made in terms of computational complexity, time complexity, computation time, true positives, false positives and robustness.
G. Multi-Object Tracking Based on Coupled Layer Utilizing HMM and Sequential Particle Filter
The input video sequence, which runs at 30 fps, consists of 101 frames, with a total runtime of 26.2 seconds. The 5th and 91st frames of the input video sequence are shown in Fig. 5.
Fig. 5. Input Video Sequence at 5th Frame and 91st Frame.
1) Computation Time: The average time taken to process a single frame is 1.77311 ms. The computation time analysis implemented in MATLAB is given in Fig. 6.
Fig. 6. Computation Time Analysis of the Multi-Object Tracker.
2) False Reduction: The false reduction analysis for the first 12 frames of the video sequence is implemented in MATLAB and is shown in Fig. 7.
Fig. 7. False Reduction Analysis of the Multi-Object Tracker.
It is computed that for this multi-object tracker, the average TP is 8.3 and average FP is 1.08.
H. Multi-Track Linking Methods for Track Graphs Using Set-cover Techniques and Network-flow
1) Time Complexity: This multi-object tracker involves a deterministic greedy method for solving the set-cover problem. It has a time complexity of $O(MN + M^2)$, where $M = |V|$ and $N = |P|$, and $V$ and $P$ are the sets of vertices and paths in the track graph respectively [4].
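The idea of a deterministic greedy set cover can be sketched as follows, with the vertices of the track graph as the universe and each candidate path as a subset of vertices. This is the plain unweighted greedy heuristic, given only as an illustration. It is not the exact formulation of [4], it makes no attempt to reach the time bound stated above, and the example data are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set cover: repeatedly pick the subset that covers the most
    still-uncovered elements. Returns the indices of the chosen subsets."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first maximum, so ties go to the lowest index,
        # which keeps the procedure deterministic.
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical track graph: six vertices and four candidate paths.
vertices = {1, 2, 3, 4, 5, 6}
paths = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
selected = greedy_set_cover(vertices, paths)
```

In this example the first and third paths together cover every vertex, so the remaining two paths are never selected.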
2) False Positive Rate (FPR): Three video sequences of bats with different densities are considered, denoted as B1, B2 and B3, consisting of 20, 50 and 100 bats per frame respectively. B1 and B3 contained 100 frames and B2 contained 200 frames for each view. The FPR analysis of the different linking methods for these video sequences is given in Fig. 8.
Fig. 8. False Positive Rate (FPR) Analysis of the Multi-Object Tracker.
I. Particle Filter for Tracking People in Visual Surveillance
The CAVIAR (Context Aware Vision using Image-based Active Recognition) video sequence from VS-PETS 2004 [52] is used for testing. This video sequence consists of over 26,000 frames and permits rigorous comparisons.
1) Computational Complexity: The computational complexity is $O(T \cdot \hat{m})$, where $\hat{m}$ is the most frequent value of the estimate of the maximum number of objects in the scene and $T$ is the number of matching particles, with $T \ll N$ ($N$ being the total number of particles in the image scene) [5].
2) False Positive Rate: This multi-object tracker gives a
J. Long-Term Online Multiface Tracking Using Particle Filter and Hidden Markov Model (HMM)
This multi-object tracker is tested with both short and long video sequences in terms of FPR.
1) Short Video Sequences: The snapshots of the video sequence are given in Fig. 9 [1]. The pictures show one face being occluded by another. The tracking of this video sequence yields an FPR of less than 0.1 [1].
Fig. 9. Short-term Video Sequence Tracking Analysis.
2) Long Video Sequences: The snapshots of the video sequence are given in Fig. 10 [1].
Fig. 10. Long-term Video Sequence Tracking Analysis.
In Fig. 10 (b) and Fig. 10 (g) some targets are initialized from misdetections. In Fig. 10 (c) the tracks are not maintained when no detections are found. In Fig. 10 (f) and Fig. 10 (g) the tracking failures are detected early. In Fig. 10 (h) the lost target (the second person from the left) is reinitialized early. The FP rate in this video sequence was estimated to be 0.33 [1].
3) Computational Time: This multi-object tracker runs in real-time, at around 20-23 fps, including the video decoding. The video sequences are at a resolution of 640 x 360 pixels [1]. Around 39 % of the processing time is spent on the tracker's likelihood computation, 27 % on the face detection, 9 % on the MCMC sampling, 9 % on the frame decoding and conversion, 7 % on target creation and 1 % on target removal [1].
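The reported frame rate and percentage shares can be converted into approximate per-frame times for each processing stage. The short calculation below is only a worked illustration of that arithmetic. It assumes that the shares apply uniformly to every frame, and the function and variable names are hypothetical.

```python
def stage_times_ms(fps, shares_percent):
    """Split the per-frame time budget (in ms) across stages by share."""
    frame_ms = 1000.0 / fps
    return {name: frame_ms * pct / 100.0
            for name, pct in shares_percent.items()}


# Percentage shares as reported in the text [1].
SHARES = {
    "likelihood computation": 39,
    "face detection": 27,
    "MCMC sampling": 9,
    "frame decoding and conversion": 9,
    "target creation": 7,
    "target removal": 1,
}

# The itemized shares add up to 92 %; the remainder is not itemized in the text.
ITEMIZED_TOTAL = sum(SHARES.values())

if __name__ == "__main__":
    for fps in (20, 23):
        print(f"{fps} fps -> {1000.0 / fps:.1f} ms per frame")
        for name, ms in stage_times_ms(fps, SHARES).items():
            print(f"  {name}: {ms:.1f} ms")
```

At 20 fps the per-frame budget is 50 ms, of which the likelihood computation takes about 19.5 ms and the face detection about 13.5 ms under this assumption.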
K. Multi-target Visual Tracking for Surveillance with Multiple Active Cameras
This multi-object tracker has been tested with both processed online image sequences and pan-tilt camera platforms operated in real time. The image size of all frames is 320 x 240 pixels [9].
The computational complexity (CP) of a particle filter is $O(N)$, where $N$ is the number of particles per object. For $M$ individual targets the CP is $O(M \cdot N)$. When these targets interact, however, the CP increases to $O(M! \cdot M \cdot N)$, because the joint target state and the secondary state of the depth hypothesis must be estimated. As $M$ increases, $M!$ grows faster than any exponential or polynomial function of $M$. The typical motion paths of individual targets in one interacting group do not show significant variations in their depth order. Therefore, depth hypotheses are needed only for targets newly joining an interacting group, relative to the depth order of the existing targets. Thus the CP can be reduced to $O(M^2 \cdot N)$ [9].
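The gap between the two growth rates can be made concrete by evaluating the dominant terms for a few group sizes. The snippet below only evaluates those expressions; constant factors are ignored and the particle count of 100 is an arbitrary assumption.

```python
import math


def joint_cost(m, n):
    """Term proportional to M! * M * N (all depth orderings considered)."""
    return math.factorial(m) * m * n


def reduced_cost(m, n):
    """Term proportional to M^2 * N (depth hypotheses for joining targets only)."""
    return m * m * n


if __name__ == "__main__":
    n_particles = 100  # hypothetical number of particles per target
    for m in range(1, 7):
        print(m, joint_cost(m, n_particles), reduced_cost(m, n_particles))
```

The two terms coincide for one or two interacting targets. For five targets the factorial term is already 24 times larger (60,000 against 2,500 with 100 particles), and the ratio keeps growing as $(M-1)!$.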
L. High Density Multiple Human Tracking Based on Particle Filter
Fig. 11. Multiple Human Tracking Analysis.
A total of 40 frames containing 1414 heads were labeled, at an average of 35.35 heads per frame. 20 particles were used per trajectory. The average FP per frame was 2.05. The processing time was about 2 seconds per frame, for a frame size of 640 x 480 on an Intel Pentium 4 2.8 GHz with 2 GB RAM. The red boxes in Fig. 11 show the false positives and the green boxes show the true positives.
M. Multi-Object Tracking Based on Markov Chains
This multi-object tracker is tested in an environment with multiple pedestrians. The input video sequences for this multi-object tracker are the CAVIAR set [65] and the i-LIDS AVSS (Advanced Video and Signal Based Surveillance) AB set [64]. The CAVIAR set consists of 26 video sequences captured in a corridor and the i-LIDS AVSS AB set contains three videos captured in a subway station. Both sets contain heavy occlusions. The final False Alarm per Frame (FAPF) is 0.025 for the CAVIAR video sequences and 0.137 for the i-LIDS AVSS AB video sequences [35]. The multi-object tracker ran at about 50 fps on a 3.0 GHz PC. To increase the computational efficiency of the system, a sliding window technique was used at the middle level to decrease the size of the transition matrix for the Hungarian algorithm.
N. Overall Comparison of False Positives
The results of the multi-object tracking methods based on particle filter, Hidden Markov Model (HMM) and the appearance information of objects are compared in terms of false positives (FP) in Fig. 12.
Fig. 12.
VI. CONCLUSION
REFERENCES
[1] S. Duffner and J. M. Odobez, "Track Creation and Deletion Framework for Long-Term Online Multiface Tracking," Image Processing, IEEE Transactions on, vol. 22, pp. 272-285, 2013.
[2] C. Cuevas and N. Garcia, "Efficient Moving Object Detection for Lightweight Applications on Smart Cameras," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 23, pp. 1-14, 2013.
[3] I. Ali and M. N. Dailey, "Multiple human tracking in high-density crowds," Image and Vision Computing, vol. 30, pp. 966-977, 2012.
[4] Z. Wu, et al., "Efficient track linking methods for track graphs using network-flow and set-cover techniques," presented at the Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[5] J. Sherrah, et al., "Particle filter to track multiple people for visual surveillance," Computer Vision, IET, vol. 5, pp. 192-200, 2011.
[6] P. Pan and D. Schonfeld, "Video Tracking Based on Sequential Particle Filtering on Graphs," IEEE Transactions On Image Processing, vol. 20, pp. 1641-1651, 2011.
[7] C.-H. Kuo and R. Nevatia, "How does Person Identity Recognition Help Multi-Person Tracking?," 2011.
[8] S. Duffner and J. M. Odobez, "Exploiting long-term observations for track creation and deletion in online multi-face tracking," in Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on, 2011, pp. 525-530.
[9] H. Cheng-Ming and F. Li-Chen, "Multitarget Visual Tracking Based Effective Surveillance With Cooperation of Multiple Active Cameras," Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 41, pp. 234-247, 2011.
[10] L. Cehovin, et al., "An adaptive coupled-layer visual model for robust visual tracking," in IEEE International Conference on Computer Vision, 2011, pp. 1363-1370.
[11] B. Benfold and I. Reid, "Stable multi-target tracking in real-time surveillance video," in Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 2011, pp. 3457-3464.
[12] B. Song, et al., "A stochastic graph evolution framework for robust multi-target tracking," presented at the Proceedings of the 11th European conference on Computer vision: Part I, Heraklion, Crete, Greece, 2010.
[13] J. Santner, et al., "PROST: Parallel robust online simple tracking," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, 2010, pp. 723-730.
[14] A. Saffari, et al., "Online multi-class LPBoost," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, 2010, pp. 3570-3577.
[15] D. Mikami, et al., "Memory-based particle filter for tracking objects with large variation in pose and appearance," presented at the Proceedings of the 11th European conference on Computer vision: Part III, Heraklion, Crete, Greece, 2010.
[16] Z. Kalal, et al., "Forward-Backward Error: Automatic Detection of Tracking Failures," in Pattern Recognition (ICPR), 2010 20th International Conference on, 2010, pp. 2756-2759.
[17] F. Jialue, et al., "Human Tracking Using Convolutional Neural Networks," Neural Networks, IEEE Transactions on, vol. 21, pp. 1610-1623, 2010.
[18] A. Andriyenko and K. Schindler, "Globally optimal multi-target tracking on a hexagonal lattice," presented at the Proceedings of the 11th European conference on Computer vision: Part I, Heraklion, Crete, Greece, 2010.
[19] W. Zheng, et al., "Tracking a large number of objects from multiple views," in Computer Vision, 2009 IEEE 12th International Conference on, 2009, pp. 1546-1553.
[20] Q. Yu and G. Medioni, "Multiple-Target Tracking by Spatiotemporal Monte Carlo Markov Chain Data Association," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, pp. 2196-2210, 2009.
[21] Z. Wu, et al., "Tracking-reconstruction or reconstruction-tracking? Comparison of two multiple hypothesis tracking approaches to interpret 3D object motion from several camera views," presented at the Proceedings of the 2009 international conference on Motion and video computing, Snowbird, Utah, 2009.
[22] E. Ricci and J. M. Odobez, "Learning large margin likelihoods for realtime head pose tracking," in Image Processing (ICIP), 2009 16th IEEE International Conference on, 2009, pp. 2593-2596.
[23] Y. Ming, et al., "Context-Aware Visual Tracking," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 1195-1209, 2009.
[24] D. Mikami, et al., "Memory-based Particle Filter for face pose tracking robust under complex dynamics," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 999-1006.
[25] Y. Li, et al., "Learning to associate: HybridBoosted multiple object tracking," in Proc. Comput. Vis. Pattern Recognit., 2009, pp. 2953-2960.
[26] Z. Kalal, et al., "Online learning of robust object detectors during unstable tracking," in Proc. Int. Conf. Comput. Vis., 2009, pp. 1417-1424.
[27] X. Junliang, et al., "Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 1200-1207.
[28] S. M. Bhandarkar and X. Luo, "Integrated detection and tracking of multiple faces using particle filtering and optical flow-based elastic matching," Computer Vision and Image Understanding, vol. 113, pp. 708-725, 2009.
[29] B. Babenko, et al., "Visual tracking with online Multiple Instance Learning," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, 2009, pp. 983-990.
[30] J. Yao and J.-M. Odobez, "Multi-camera multi-person 3D space tracking with MCMC in surveillance scenarios," presented at the Proc. Eur. Conf. Comput. Vis., Workshop Multicamera Multimodal Sensor Fusion Algorithms Appl., Marseille, France, 2008.
[31] K. Shafique, et al., "A rank constrained continuous formulation of multi-frame multi-target tracking problem," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, 2008, pp. 1-8.
[32] E. Maggio, et al., "Efficient Multitarget Visual Tracking Using Random Finite Sets," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 18, pp. 1016-1027, 2008.
[34] K. Li, et al., "Cell population tracking and lineage construction with spatiotemporal context," Medical Image Analysis, vol. 12, pp. 546-566, 2008.
[35] C. Huang, et al., "Robust Object Tracking by Hierarchical Association of Detection Responses," presented at the Proceedings of the 10th European Conference on Computer Vision: Part II, Marseille, France, 2008.
[36] G. Bradski and A. Kaehler, Learning OpenCV: computer vision with the OpenCV library: O'Reilly, 2008.
[37] C. Plagemann, et al., "Efficient failure detection on mobile robots using particle filters with Gaussian process proposals," presented at the Proceedings of the 20th international joint conference on Artificial intelligence, Hyderabad, India, 2007.
[38] J. Hao, et al., "A Linear Programming Approach for Multiple Object Tracking," in Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, 2007, pp. 1-8.
[39] K. Zia, et al., "MCMC Data Association and Sparse Factorization Updating for Real Time Multitarget Tracking with Merged and Multiple Measurements," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 28, pp. 1960-1972, 2006.
[40] M. Richardson and P. Domingos, "Markov logic networks," Mach. Learn., vol. 62, pp. 107-136, 2006.
[41] A. G. A. Perera, et al., "Multi-Object Tracking Through Simultaneous Long Occlusions and Split-Merge Conditions," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, 2006, pp. 666-673.
[42] P. Nillius, et al., "Multi-Target Tracking - Linking Identities using Bayesian Network Inference," presented at the Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2, 2006.
[43] H. Grabner and H. Bischof, "On-line Boosting and Vision," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, 2006, pp. 260-267.
[44] Z. Cha and R. Yong, "Robust Visual Tracking via Pixel Classification and Integration," in Pattern Recognition, 2006. ICPR 2006. 18th International Conference on, 2006, pp. 37-42.
[45] A. Adam, et al., "Robust Fragments-based Tracking using the Integral Histogram," presented at the Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1, 2006.
[46] K. Zia, et al., "MCMC-based particle filtering for tracking a variable number of interacting targets," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, pp. 1805-1819, 2005.
[47] J. Yang and J. Y.-T. Leung, "A generalization of the weighted set covering problem," Naval Research Logistics, vol. 52, pp. 142-149, 2005.
[48] K. Smith, et al., "Evaluating Multi-Object Tracking," in Computer Vision and Pattern Recognition - Workshops, 2005. CVPR Workshops. IEEE Computer Society Conference on, 2005, pp. 36-36.
[49] J. Czyz, et al., "A particle filter for joint detection and tracking of multiple objects in color video sequences," in Information Fusion, 2005 8th International Conference on, 2005, p. 7 pp.
[50] O. Songhwai, et al., "Markov chain Monte Carlo data association for general multiple-target tracking problems," in Decision and Control, 2004. CDC. 43rd IEEE Conference on, 2004, pp. 735-742 Vol.1.
[51] B. Ristic, et al., Beyond the Kalman filter: particle filters for tracking applications: Artech House, 2004.
[52] R. Fisher, "PETS04 surveillance ground truth data set," in Proc. Sixth IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance, 2004, pp. 1-5.
[53] W. Ying, et al., "Tracking appearances with occlusions," in Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, 2003, pp. I-789-I-795 vol.1.
[54] K. Nummiaro, et al., "An adaptive color-based particle filter," Image Vision Comput., vol. 21, pp. 99-110, 2003.
[55] A. D. Jepson, et al., "Robust online appearance models for visual tracking," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, pp. 1296-1311, 2003.
[56] S. L. Dockstader, et al., "Markov-Based Failure Prediction for Human Motion Analysis," presented at the Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, 2003.
[57] P. Perez, et al., "Color-Based Probabilistic Tracking," presented at the Proceedings of the 7th European Conference on Computer Vision-Part I, 2002.
[58] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, 2001, pp. I-511-I-518 vol.1.
[59] C. Rasmussen and G. D. Hager, "Probabilistic data association methods for tracking complex visual objects," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, pp. 560-576, 2001.
[60] T. Kirubarajan, et al., "Multiassignment for tracking a large number of overlapping objects and application to fibroblast cells," Aerospace and Electronic Systems, IEEE Transactions on, vol. 37, pp. 2-21, 2001.
[61] I. Haritaoglu, et al., "W4: real-time surveillance of people and their activities," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, pp. 809-830, 2000.
[62] D. Comaniciu, et al., "Real-time tracking of non-rigid objects using mean shift," in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, 2000, pp. 142-149 vol.2.
[63] T. Darrell, et al., "Integrated person tracking using stereo, color, and pattern detection," in Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, 1998, pp. 601-608.
[64] iLIDS. Available: http://www.elec.qmul.ac.uk/staffinfo/andrea/avss2007_d.html