Generally, according to the type of samples used to build the model, online adaptive algorithms can be divided into two groups: generative methods, which use only positive samples and infer the relationships among them, and discriminative methods, which use both positive and negative samples to train a classification hyperplane. Moreover, from the perspective of the development history of object tracking, there are four stages: optical flow to match two consecutive frames [17], particle filters to model the underlying dynamics of a motion system, tracking by detection, and multi-expert models.
The most basic concept in object tracking is direct image patch matching. Following this basic idea, there are several well-known methods, such as the Lucas-Kanade tracker [17], the fragments-based tracker [131] and mean shift tracking [132]. However, the target model in these methods is not updated according to the appearance change of the object. Thus, an essential step forward is to build a generative appearance model that captures the variation over time, such as online subspace learning [56] and sequential Monte Carlo sampling [133]. Recently, sparse coding based methods have attracted much attention in the object tracking community. In two experimental evaluations [3, 134], the two sparse coding based methods [135] and [136] are reported to achieve high performance. However, despite their superior performance under partial occlusion, the authors of a survey [137] state that their experimental results show that visual tracking may not be a sparse representation problem. Moreover, generative methods fail easily against a cluttered background.
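As a rough illustration of direct image patch matching with a fixed target, the following minimal sketch searches a frame exhaustively for the position whose patch has the smallest sum of squared differences (SSD) to a template. It is not the implementation of any of the cited trackers; the function names (`ssd`, `crop`, `match_patch`) and the toy frame are assumptions introduced here for illustration only.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def ssd(patch_a: Image, patch_b: Image) -> float:
    """Sum of squared differences between two equally sized patches."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(patch_a, patch_b)
        for a, b in zip(row_a, row_b)
    )


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Extract a height x width patch whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def match_patch(image: Image, template: Image) -> Tuple[int, int]:
    """Return the (top, left) position whose patch best matches the template.

    The template is never updated, which is exactly the limitation noted
    in the text: appearance changes of the object are not captured.
    """
    t_h, t_w = len(template), len(template[0])
    i_h, i_w = len(image), len(image[0])
    best_pos, best_cost = (0, 0), float("inf")
    for top in range(i_h - t_h + 1):
        for left in range(i_w - t_w + 1):
            cost = ssd(crop(image, top, left, t_h, t_w), template)
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos


# Toy 5x6 frame (hypothetical data): the 2x2 target sits at row 2, column 3.
frame: Image = [[0.0] * 6 for _ in range(5)]
frame[2][3], frame[2][4] = 1.0, 2.0
frame[3][3], frame[3][4] = 3.0, 4.0
template: Image = [[1.0, 2.0], [3.0, 4.0]]

position = match_patch(frame, template)
```

In this toy case the search recovers the planted location. Once the object's appearance departs from the stored template, the minimum-SSD position is no longer reliable, which motivates the adaptive appearance models discussed above.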
Considering the significant role of discriminative information from the background, and pioneered by the support vector tracker [19] and the ensemble tracker [20], various discriminative algorithms have been built to model the difference between the foreground and the background. Collins et al. [55] explore a mechanism which adaptively selects the most discriminative features from a set of different colour spaces. In addition, random projection is used in compressive tracking (CT) [138]. However, because CT is a data-independent method, it guarantees that no noise is introduced but lacks flexibility. Moreover, numerous methods exploring the different properties of samples and the relationships between samples have also been proposed to improve the performance of trackers, including P-N learning [1], semi-supervised SVM [139], semi-supervised boosting [140], multiple instance learning [58], weighted reservoir sampling [141] and semi-supervised transfer learning [142]. Recently, in [143], the confidence of a classifier is treated as a probability that can be analysed using Gaussian process regression. Structured output tracking with kernels (Struck) [59], using the windows as input, explores the training data in the form of appearance and translation. The experimental survey [144] concludes that Struck performs well on all aspects but one, the change of scale, bringing it to the number one position over their entire dataset.
Furthermore, in the past few decades, numerous methods integrating multiple components have been proposed to address the various challenges. Intuitively, diversity can be improved by using information or knowledge from multiple sources. According to the stage at which the components are combined, we can categorise these methods into three groups: combination of features, ensemble of classifiers in the same hypothesis space, and multi-expert trackers.
Combination of features: Collins et al. [55] explore a mechanism which adaptively selects the most discriminative features from a set of different colour spaces. The method in [145] fuses multiple observation models with parallel and cascaded evaluation. Yoon et al. [146] adopt two steps, tracker selection and interaction, to fuse multiple features. In [147], three different levels of features are modelled to enable robust model relearning.
Ensemble of classifiers: The co-tracking algorithm [139] trains multiple SVM classifiers using different feature types and combines their tracking results to achieve robust tracking. A set of random ferns is adopted to explore comparative features in [1]. The visual tracker sampler [148] incorporates a process of sampling trackers into the framework of particle filtering [18], without differentiating the trackers. Randomised ensemble trackers [149] consider the weights of classifiers as a non-stationary distribution. In [150], three Struck-based [59] trackers with different features are combined, and the best tracking result among the three forward trackers is selected. In [151], several online SVM algorithms are used as the base classifiers and a minimum entropy criterion is designed to evaluate the members.
Multi-expert trackers: In [152], the Lucas-Kanade (LK) [17] method and one random forest based classifier are combined for target tracking. Similar to [152], Yan et al. [153] design an ensemble framework for the optimal selection of detectors and trackers for multi-target tracking. Recently, a complex system with multiple components has been proposed in [154] to produce sensitive and stable responses to complex situations; its components are short-term Integrated Correlation Filter (ICF) processing, short-term key point processing, long-term memory updating, an output controller and an ICF updater.
All these methods require various updating schemes to capture the continuous deformations of the objects. As a consequence, they tend to drift by incorporating wrong information. To avoid drifting, diverse strategies have been adopted, such as different update rates [155], data-independent knowledge [138] and selective updating of parts [156]. However, the essential reason why drifting occurs is that classical trackers do not treat object tracking as a “concept drift” problem and instead try to solve different challenges with only one all-powerful model. In fact, the differences between the various challenges are very large. Thus, the limitation of the classical trackers comes from the basic i.i.d. assumption in machine learning, on which most tracking-by-detection methods depend. In short sequences, the drifting problem is not very obvious and can be partially solved by the classical methods, but it remains quite difficult in long-term tracking [144]. In tracking-by-detection methods, recovering from drift may also prove a useful way to make tracking robust, but updating with wrong information will destroy the structure of the classifier.
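The link between update rate and drift can be made concrete with a toy example. The sketch below assumes a simple running-average appearance model, which is an illustrative stand-in and not the update scheme of any cited tracker; the names `update_model`, `run` and `distance`, the feature vectors and the two rates are all hypothetical. The model is fed five correct target observations followed by five occluder observations, and its distance from the true target appearance is measured afterwards.

```python
from typing import List, Sequence

Vector = List[float]


def update_model(model: Sequence[float], observation: Sequence[float],
                 rate: float) -> Vector:
    """Blend the current model with a new observation (running average)."""
    return [(1.0 - rate) * m + rate * o for m, o in zip(model, observation)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two appearance vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def run(rate: float, observations: Sequence[Sequence[float]],
        initial: Sequence[float]) -> Vector:
    """Apply the running-average update once per observation."""
    model = list(initial)
    for obs in observations:
        model = update_model(model, obs, rate)
    return model


# Hypothetical 4-dimensional appearance features.
target: Vector = [1.0, 0.0, 1.0, 0.0]
occluder: Vector = [0.0, 1.0, 0.0, 1.0]

# Five correct samples, then five wrong samples (e.g. during an occlusion).
observations = [target] * 5 + [occluder] * 5

drift_slow = distance(run(0.05, observations, target), target)
drift_fast = distance(run(0.5, observations, target), target)
```

Under these assumptions the fast-updating model ends up much further from the true target appearance than the slow one, since it absorbs the wrong samples more quickly. A slow rate limits this contamination but adapts more slowly to genuine appearance change, so a single fixed rate involves a trade-off.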