At the core of most contemporary visual-tracking algorithms lies a filter framework. De-terministic methods for tracking have been largely surplanted with probabilistic frameworks better equipped to handle the innate uncertainty in real-world environments. The proba-bilistic framework aggregates temporally-seperated information up to the current frame of the video sequence, in order to provide a better estimate of the target object pose. For long-term tracking, maintaining a joint distribution over all possible trajectories until the current frame can be taxing on computational and memory resources. Therefore, what is
commmonly done is only the distribution of possible poses for the current frame is com-puted - this is termed the filter distribution. Strictly computing the filter distribution for the current frame conditioned on all observations to date can be onerous computationally, if the computation is performed independently for each new frame. Fortunately, filter ap-proaches have been devised in order to compute the filter distribution recursively, leveraging the previously computed distribution at the previous frame.
3.3.1 Bayesian Filter
The objective of the filter frameworks to be discussed is to compute the probability distribution function (PDF) of the object pose at the current frame (i.e. time-step) given all previous observations. Let zt represent the generic pose (e.g. pixel coordinates of object centroid in image) of the object for the current frame t, and let ytrepresent the observation for frame t (ex. the image pixels). Yi:j = {yi, yi+1, . . . , yj} will denote all observations made between frames i and j. The filter distribution is then given by p(zt|Y1:t).
A common weak assumption is that the object pose at the current frame conditioned on the object pose in the previous frame is statistically independent from the poses of all earlier frames. This conditional independence is referred to as the Markov property, specifically the poses follow a first-order Markov model. The property can be generalized to nth-order;
whereby, current frame pose conditioned on the last n frames’ poses is independent of the earlier frames.
p(zt|zt 1, zt 2, . . . , z1) = p(zt|zt 1, zt 2, . . . , zt n) (3.1) Through application of the Markov assumption, the filter distribution can be decomposed into a recursive form which is both computational and memory efficient,
p(zt|Y1:t) = p(yt|zt)1
$ Z
p(zt|zt 1, Y1:t 1)p(zt 1|Y1:t 1)dzt 1
$ = p(yt|Y1:t 1) (3.2)
where we have used the conditional independence of the observation yt and the previous frame pose zt 1 given the current frame pose zt. Typically, the modes of the filter distri-bution p(zt|Y1:t) are of interest, since the modes correspond to the most probable poses of
the object at the current frame. The normalization factor $ is commonly discarded since it effects all possible poses equally.
3.3.2 Kalman Filter
A special case of the Bayesian filter arises when the likelihood p(yt|zt)and the transition distributions p(zt|zt 1, Y1:t 1)are Gaussian. A linear state-space model can be constructed for the state-evolution and observations,
zt+1= Azt+ ⌘⌘⌘t (3.3)
yt= Hzt+ ⌫⌫⌫t (3.4)
where zt 2 Rn is the state vector, A 2 Rn⇥n is the state-transition matrix, yt 2 Rm is the observation vector, and H 2 Rm⇥n is the observation matrix. ⌘⌘⌘t and ⌫⌫⌫t are mutu-ally statisticmutu-ally independent Gaussian noise signals, with covariance matrices Nx and Ny, respectively. The special linear-Gaussian case recursively propagates forward a Gaussian posterior distribution; hence, only the mean and covariance need to be propagated. The propagation equations for this special case is termed a Kalman filter, detailed in Algorithm 1 [122].
The constraint on a linear Gaussian state-space model can be quite restrictive on the possible models. The Kalman filter has been generalized to the non-linear case by lineariza-tion of the state-space model via the Extended Kalman filter (EKF) framework [122]. The non-linear state-space model allows for arbitrary state-transition fs(·) and observation hm(·) models.
zt+1= fs(zt) + ⌘⌘⌘t (3.5)
yt= hm(zt) + ⌫⌫⌫t (3.6)
During the time-update step the state-transition function fs(·) is linearized about the previous posterior mean estimate µµµt|t,
fs(zt)⇡ fs(µµµt|t) +r fs(µµµt|t)(zt µµµt|t) (3.7)
Input : observations yt
Output: state estimates of zt
Data:
µµ
µt|t=E{zt|Y1:t}
⌃t|t=E{(zt µµµt|t)(zt µµµt|t)T|Y1:t} µµ
µt+1|t=E{zt+1|Y1:t}
⌃t+1|t=E{(zt+1 µµµt+1|t)(zt+1 µµµt+1|t)T|Y1:t}
Result: recursive computation of posterior distribution p(zt|Y1:t)
/* Initialization */
µµ
µ0|0= µµµ0 ;
⌃0|0 = ⌃0 ;
for t 0 to N do
/* Time-Update */
µµµt+1|t = Aµµµt|t;
⌃t+1|t= A⌃t|tAT + Nx;
/* Measurement-Update */
Kt+1= ⌃t+1|tHT(H⌃t+1|tHT + Ny) 1; µ
µµt+1|t+1 = µµµt+1|t+ Kt+1(yt+1 Hµµµt+1|t);
⌃t+1|t+1 = (I Kt+1H)⌃t+1|t; end
Algorithm 1: Kalman Filter
the Jacobian r f(µµµt|t) takes the place of the state-transition matrix A in the EKF, when updating the state covariance matrix. The observation model is similarly linearized about the predicted mean estimate µµµt+1|t,
hm(zt+1)⇡ hm(µµµt+1|t) +r hm(µµµt+1|t)(zt+1 µµµt+1|t) (3.8) the Jacobian r hm(µµµt+1|t) is used in place of the observation matrix H for computation of the Kalman gain Kt+1 (i.e. observation covariance and state-observation cross-covariance matrices).
3.3.3 Unscented Kalman Filter
The linearization step required by the EKF can be a crude approximation to the actual system dynamics and may result in filter divergence. For this reason, other means have been devised to better handle non-linear state-space models. Julier et al. proposed an Unscented Transform [77] which approximates the distribution through a nonlinear function as opposed to linearization of the nonlinear function itself. The distribution of the state zt2 Rn is approximated via a collection of 2n + 1 sigma points. The Unscented Transform is shown in Algorithm 2, and the details of the UKF are presented in Algorithm 3. The UKF has a single tuning parameter , for Gaussian distributions the heuristic n + = 3 is applied.
Result: approximate Gaussian distribution N (z;µµµz, ⌃z) through nonlinear function y = f (z)
/* Compute Transformed Sigma Points */
(i)y = f ( (i)z ) 8i;
/* Compute Mean of Transformed Points */
µµ
µy =P2n
i=0w(i) (i)y ;
/* Compute Covariance of Transformed Points */
⌃y =P2n
i=0w(i)( y(i) µµµy)( y(i) µµµy)T;
Algorithm 2: Unscented Transform [77]
Input : observations yt
Output: state estimates of zt
Data:
Result: recursive computation of posterior distribution p(zt|Y1:t)for nonlinear state-transition fs(·) and hm(·)
/* Time-Update: Compute State Prediction */
(µµµzt+1|t, ⌃zt+1|t,{ (i)z }2ni=0) =UnscentedTransform(fs(·), µµµat|t, ⌃at|t);
Algorithm 3:Unscented Kalman Filter [77]
3.3.4 Particle Filter
Different variants of the Kalman filter have been devised in order to handle non-linear state-transition and observation functions; however, the assumption of Gaussianity still remains. This assumption can break-down; for instance, in the presence of heavy-tailed noise in the observations - the Gaussian noise model may be inadequate. Monte-Carlo methods are able to deal with non-Gaussian and non-linear models through the use of
sampling methods. Particle filters are a class of Sequential Monte-Carlo (SMC) methods, which have gained popularity for their ease of implementation and parallelizability. The objective of particle filters is to recursively compute the posterior distribution p(z0:t|y0:t) or filtering distribution p(zt|y0:t), for multi-modal distributions. Particle filters are based on importance-sampling [135],
where zi0:t are samples from the posterior distribution. Since, the posterior is typically unknown - importance sampling works by weighting samples from an approximate proposal distribution q(z0:t|y0:t),
Sequential methods have been developed in order to avoid computing the importance weights (!(z0:t)) anew for each time-step,
p(z0:t|y0:t) = p(yt|zt)p(zt|zt 1)p(z0:n 1|y0:n 1) (3.14) p(z0:t|y0:t)⇡ q(zt|z0:t 1, y0:t)q(z0:t 1|y0:t 1) (3.15)
!t= p(yt|zt)p(zt|zt 1)
q(zt|z0:t 1, y0:t) !t 1 (3.16)
Input : observations yt
Output: state estimates of zt
Result: recursive computation of posterior distribution p(zt|y1:t)
/* Initialization */
z(i)0 ⇠ p(z0) 8i 2 {1, . . . , Ns};
for t 0 to N do
/* Importance Sampling */
˜z(i)t ⇠ p(zt|z(i)t 1);
/* Update Importance Weights */
!(i)t = p(yt|˜z(i)t );
˜
!(i)t = !
(i)
PNst i=0!(i)t ;
/* Resample with replacement Ns particles according to importance
weights */
z(i)t Resample(˜z(i)t , ˜!t(i)); end
Algorithm 4: Bootstrap Particle Filter [135]
3.3.5 Interacting Multiple Model Filter
The target being tracked may undergo various systematic changes during the course of its trajectory: from periods of relative stationarity to periods of high dynamic motion.
Capturing the underlying state-dynamics of a complex trajectory in a single state-space model may be impractical. Blum et al. introduced the Interacting Multiple Model (IMM) filter in order to maintain and combine multiple state-dynamic models. Under the IMM framework, a seperate model can be formulated for the different expected motion the target may undergo. Algorithm 5 presents the details of the IMM filter. Multiple-model filters will be addressed in more detail in Section 4.5.1.
Input : initial model pmf '0, homogeneous model transition probabilities ⇡, and observations yt
output: state estimates of zt
Data:
⇡i|j = p(mt= i|mt 1= j) 'it= p(mt= i)
N is the length of the time-sequence M is the total number of models
Result: cooperative multiple model scheme for estimating latent state mean µµµzt =E{zt|Y1:t}
/* Initialization */
Given {µµµ(j)0|0, ⌃(j)0|0}Mj=1 ; for t 1 to N do
/* Time-Update of Model Probabilities */
'it|t 1=PM
Algorithm 5: Interacting Multiple Model [19, 113]
(a) (b) (c)
(d) (e) (f)
Figure 3.1: MeanShift operation. The red curve denotes the true object trajectory, the blue ellipse represents the predicted object location using MeanShift. (a) is the initial position of the estimated object location (blue cross) and the target object (orange circle). (b) shows the subsequent position of the target object and the initial position of the estimator prior to running any MeanShift iterations. (c) demonstrates the motion prior predicting the subsequent location and scale of the object (blue cross and dashed circle). (d)-(f) are several iterations of the SOAMST algorithm as the estimate of the location and scale of the object are refined.