In this chapter, we have proposed a Learn++ based tracker for visual tracking. By automatically adjusting the members of its classifier ensemble, the LPP tracker adopts a democracy mechanism to handle, simultaneously, the numerous challenges appearing in tracking scenarios. Moreover, the LPP tracker achieves an optimal balance both between the flexibility and stability of the classifiers and between the efficiency and performance of the model. In future work, it is worth considering other constraints to guide the sampling of classifiers. In addition, when the target deforms abruptly, typically when n < 5, the LPP tracker may refuse to add a new classifier to the ensemble. How to define an adaptive quantity to tackle such a situation is under investigation. In this chapter, the classifiers are generated in the same function space and based on the same type of feature, but are trained using different filters and samples. In fact, there is evidence that tracking performance may be increased by combining different successful models. Thus, to further improve the diversity of the tracker, it is worth investigating how to combine various successful models in future work.
Chapter 4
A Winner-take-all Strategy for Improved Object Tracking
As pointed out in Chapter 3, visual tracking is a fundamental task in computer vision whose goal is to estimate the locations or motion states of a predefined target in a video. It has many potential applications in surveillance, human-computer interaction, augmented reality and robotics. In Chapter 3, a set of independent classifiers sampled from the same function space is dynamically maintained and updated according to their recent performance and the environment. However, the essential reason why online object tracking is so challenging is that it is an undersampled, incomplete-data problem. From the perspective of statistical learning theory, this leads to overfitting and poor generalisation when the tracker faces various unpredictable changes. Therefore, to further improve the diversity of the system (see Fig. 4.1), this chapter exploits a Winner-take-all strategy for online object tracking. The differences between the LPP model introduced in Chapter 3 and the model in this chapter are:
• The model in the previous chapter selects a set of classifiers from the same functional space, which differ only in their parameters. These classifiers, trained on different datasets and at different times, are used to overcome various challenges. In this chapter, however, the tracker members are designed with diverse considerations and come from different functional spaces, and thus possess more diversity.
• In the previous chapter, a set of competitive classifiers is chosen to solve the current problem, whilst in this chapter only one winner tracker is selected, improving both the performance and the efficiency of the system simultaneously.
The rest of this chapter is organised as follows. The hypotheses and motivation of this work are introduced in Sec. 4.1. Sec. 4.2 details how to build a performance prediction model and how to select a winner tracker online. Sec. 4.4 presents experimental results, and Sec. 4.5 summarises the chapter.
[Figure 4.1 diagram: trackers f1, f2 and f3 drawn from three different function spaces, connected by associations]
Figure 4.1: Visual data association in a single camera using a winner tracker selected from different function spaces.
4.1 Preliminaries
4.1.1 Hypotheses
Besides the basic hypotheses of object tracking introduced in the previous chapter, this chapter makes the following hypotheses:
• The diversity of a system can be improved by incorporating different models. Moreover, these models differ from one another in ways that make them complementary.
• It is assumed that, for a particular application, a set of trackers with diverse properties can be selected from existing methods. Furthermore, the relationship between the performance of these selected trackers and the motions (challenges) of a target can be modelled off-line on a large labelled dataset.
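The second hypothesis, learning off-line how each tracker's performance depends on the target's motion, can be illustrated as one regression model per candidate tracker. In the minimal sketch below, a plain ridge regression stands in for the structural regression model used in this chapter; the feature vectors, the score (e.g. bounding-box overlap with ground truth) and the helper names `fit_performance_predictor` and `predict_performance` are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_performance_predictor(features, scores, ridge=1e-3):
    """Hypothetical ridge regression from environment features to one tracker's score.

    features: (n_samples, n_features) motion features from labelled sequences.
    scores:   (n_samples,) the tracker's measured accuracy on those samples.
    Returns weights whose last entry is a bias (regularised too, for brevity).
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(scores, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])       # regularised normal equations
    return np.linalg.solve(A, X.T @ y)


def predict_performance(weights, feature):
    """Predicted score of the tracker for a single feature vector."""
    return float(np.append(np.asarray(feature, dtype=float), 1.0) @ weights)


# Synthetic example: this tracker's score falls as feature 0 (say, a motion-blur
# measure) increases. One such predictor would be fitted per candidate tracker.
rng = np.random.default_rng(0)
F = rng.random((200, 2))
y = 0.8 - 0.5 * F[:, 0] + 0.1 * F[:, 1]
w = fit_performance_predictor(F, y, ridge=1e-6)
print(predict_performance(w, [0.2, 0.5]))  # close to 0.75
```

Because the predictor is trained once on labelled data, evaluating it online costs only a dot product per tracker, which is what makes an off-line model attractive for per-frame selection.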
Figure 4.2: Comparison of two trackers, Struck [59] (red) and CT [138] (green), on the sequences carDark and Doll in the TB-50 dataset [21]. Struck outperforms CT on the first sequence (first row) but performs noticeably worse on the second sequence. The question is when and how to combine the two trackers without sacrificing efficiency. This chapter answers this question from the perspective of a winner-take-all strategy.
4.1.2 Motivation
Over the past decades, numerous algorithms have been proposed, but different models achieve dissimilar results under different difficulties. For example, part-based models [208] are more robust to partial occlusion, comparative features [138] are more invariant to illumination changes, and tracking-by-detection methods [1] are better able to tackle the out-of-view problem. Fig. 4.2 shows that two trackers, Struck [59] and CT [138], perform very differently on two sequences. Moreover, if several different challenges occur in a long video sequence, most methods will fail to track the target, because a single method cannot deal with all the challenges. In general, it is difficult to identify any existing tracker that completely outperforms all other methods in every environment.
To avoid overfitting and improve generalisation, the easiest way is to directly fuse the results of an ensemble [20, 149], which amplifies the diversity of the system. However, this strategy naturally increases the computational complexity. Complicated trackers [59] normally perform better than simple models in very complex situations, but their computational complexity is so high that they are far from real-time. For example, according to the findings in [134], Struck [59] achieves at least 54.9% higher overall accuracy than CT [138] (47.4 vs. 30.6), but its time complexity is more than three times that of CT. Moreover, several existing evaluation reports [3, 134, 137, 144] comprehensively investigate the performance of recently proposed trackers, and several datasets have been built for evaluating different trackers. In these works, the strengths of various methods and their robustness to different challenges are analysed in detail. These datasets and analyses are very valuable for understanding the intrinsic principles of object tracking. If this knowledge can be exploited, the performance of a new object tracking system can be improved.
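The 54.9% figure follows directly from the two overall accuracy scores reported in [134]. The second line uses per-frame costs $c_k$, introduced here only as illustrative notation (and assumed non-negative), to state why fusing all $K$ members of an ensemble is at least as expensive as running any single selected tracker $k^{*}$:

```latex
\begin{aligned}
\frac{47.4 - 30.6}{30.6} &= \frac{16.8}{30.6} \approx 0.549 = 54.9\%, \\
C_{\text{ens}} &= \sum_{k=1}^{K} c_k \;\geq\; c_{k^{*}} \quad \text{for any selected tracker } k^{*}.
\end{aligned}
```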
In this chapter, a Winner-take-all (WTA) strategy is exploited to select the winner tracker that is most suitable and efficient for the current challenge, according to features extracted from the current environment and an efficiency factor. To extract features from the tracking environment quickly, a motion feature based on dense trajectories is designed to describe the characteristics and challenges of the movement of an object and its surroundings. Using a large public dataset, a model that predicts the performance of different trackers under various challenges is learned off-line. The learned structural regression model is then used directly to select the winner tracker online efficiently. To increase the flexibility of all members, the tracking results of the winner are used to update the other trackers. The advantages of the proposed WTA tracker are threefold: 1) By exploiting knowledge off-line, the performance of the trackers can be carefully characterised on a large sample set; this knowledge can be regarded as transferred from the dataset to a new test sequence. 2) By incorporating the powerful and complementary abilities of multiple trackers, the diversity of the model is improved, so that the WTA tracker can tackle various unpredictable difficulties. 3) Since only one tracker, suitable in terms of both accuracy and speed, is executed at any time, the WTA tracker will be much faster than the slowest member. In the best case, the fastest tracker is chosen for simple situations most of the time, and a complex, accurate tracker is used only occasionally, when a difficult challenge arises.
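The online selection loop described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the scoring rule (predicted accuracy minus `efficiency_weight` times a relative cost), the `CandidateTracker` interface and the two toy trackers are hypothetical. Only the winner runs its tracking step on a frame, and its box is then passed to the other members as an update.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class CandidateTracker:
    """One member of the pool (hypothetical interface)."""
    name: str
    predict_score: Callable[[Sequence[float]], float]  # off-line learned performance predictor
    cost: float                                         # assumed relative per-frame running time
    track: Callable[[object], Box]                      # runs the tracker on a frame
    update: Callable[[object, Box], None]               # refreshes the model with a given box


def select_winner(trackers: List[CandidateTracker], feature: Sequence[float],
                  efficiency_weight: float) -> CandidateTracker:
    # Winner-take-all: highest predicted accuracy, penalised by (assumed) cost, wins.
    return max(trackers,
               key=lambda t: t.predict_score(feature) - efficiency_weight * t.cost)


def wta_step(trackers: List[CandidateTracker], frame: object, feature: Sequence[float],
             efficiency_weight: float = 0.1) -> Tuple[str, Box]:
    winner = select_winner(trackers, feature, efficiency_weight)
    box = winner.track(frame)        # only the winner is executed on this frame
    for t in trackers:
        if t is not winner:
            t.update(frame, box)     # the winner's result updates the other members
    return winner.name, box


# Toy usage: a cheap tracker whose predicted accuracy drops as difficulty rises,
# and an expensive tracker whose predicted accuracy is constant.
updates: List[Tuple[str, Box]] = []


def make_toy(name: str, score_fn: Callable[[Sequence[float]], float],
             cost: float, box: Box) -> CandidateTracker:
    return CandidateTracker(
        name=name,
        predict_score=score_fn,
        cost=cost,
        track=lambda frame: box,
        update=lambda frame, b: updates.append((name, b)),
    )


fast = make_toy("fast_simple", lambda f: 0.75 - 0.5 * f[0], 1.0, (10, 10, 20, 20))
slow = make_toy("accurate_slow", lambda f: 0.7, 3.0, (12, 11, 20, 20))

easy = wta_step([fast, slow], frame=None, feature=[0.0])  # ('fast_simple', (10, 10, 20, 20))
hard = wta_step([fast, slow], frame=None, feature=[1.0])  # ('accurate_slow', (12, 11, 20, 20))
```

With a small `efficiency_weight`, the cheap tracker wins on easy frames and the expensive one is called only when its predicted advantage outweighs its extra cost, which mirrors the best case described in the third advantage above.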