Analysis of long-term tracking - Online Winner-take-all Tracking

4.3 Online Winner-take-all Tracking

4.4.5 Analysis of long-term tracking

Long-term tracking is generally considered one of the most challenging tracking settings, because several different problems may be present in the same sequence. A single tracker that models only one aspect of these challenges normally fails to handle them simultaneously. To further evaluate the proposed WTA, 7 long-term sequences, whose average and minimum numbers of frames are 2791 and 2133 respectively, are selected from public datasets. WTA is compared with TLD, Struck, MUSTer and MEEM, which are regarded as relatively powerful methods for long-term tracking, and the results are reported in Tables 4.4 and 4.5. From the two tables, we can see that the proposed WTA achieves consistently advantageous results over the state-of-the-art methods. It is worth pointing out that WTA is clearly more robust than the other methods on these 7 sequences. For example, MUSTer can track the object completely on the 5th sequence but almost fails on the 4th and 9th sequences. In contrast, WTA can cope with the different sets of problems in the different sequences.

Sequence  S1    S2    S3    S4   S5   S6    S7
MEEM      3     1295  23    21   130  20    14
MUSTer    3     173   10    143  82   34    5
Struck    48    1285  8     57   82   25    13
TLD       872   228   8     65   82   2500  7
WTA       1486  1297  2011  356  325  1663  186

Table 4.5: Failure comparison of tracking results on 7 challenging long-term sequences. Each entry denotes the frame number at which the centre distance of a tracker becomes larger than 20 pixels.
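
The 20-pixel centre-distance criterion used in Table 4.5 can be illustrated with a minimal sketch. It assumes that boxes are given as (x, y, width, height) tuples and that frames are counted from 1; the function names are illustrative and are not taken from the thesis.

```python
import math


def centre(box):
    """Return the centre (cx, cy) of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def centre_distance(pred_box, gt_box):
    """Euclidean distance in pixels between the centres of two boxes."""
    (px, py), (gx, gy) = centre(pred_box), centre(gt_box)
    return math.hypot(px - gx, py - gy)


def first_failure_frame(pred_boxes, gt_boxes, threshold=20.0):
    """Return the 1-based frame number at which the centre distance first
    exceeds `threshold`, or None if it never does.

    The 1-based counting is an assumption made for this sketch.
    """
    for frame_no, (pred, gt) in enumerate(zip(pred_boxes, gt_boxes), start=1):
        if centre_distance(pred, gt) > threshold:
            return frame_no
    return None


# Toy example: the prediction drifts 30 pixels away in the third frame.
ground_truth = [(0, 0, 10, 10)] * 4
predictions = [(0, 0, 10, 10), (5, 0, 10, 10), (30, 0, 10, 10), (0, 0, 10, 10)]
failure = first_failure_frame(predictions, ground_truth)
```

Under this reading, a larger entry means that the tracker follows the target for longer before drifting more than 20 pixels from the ground truth.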

4.5 Summary

In this chapter, a winner-take-all framework has been proposed for object tracking, which improves performance and efficiency by incorporating the strengths of trackers designed for different challenges. The results show that different trackers have different characteristics and that combining them is valuable. Two points need further investigation in the future. On the one hand, several strong trackers are used in this chapter to guarantee the performance; it would be meaningful to examine whether the performance can still be greatly improved when only simple, fast trackers are combined. On the other hand, the trackers in this chapter interact with each other only through the result of the current frame. Whether trackers can learn from each other more deeply and evolve in a way similar to the crossover and mutation steps of genetic programming is still under investigation.
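
The idea that trackers interact only through the result of the current frame can be sketched roughly as follows. This is not the WTA algorithm itself: the tracker interface (`predict`, `update`), the confidence score and the rule of picking the highest score are all assumptions made purely for illustration.

```python
class ConstantTracker:
    """Toy stand-in for a real tracker (hypothetical interface)."""

    def __init__(self, box, score):
        self.box = box
        self.score = score
        self.last_shared_box = None

    def predict(self, frame):
        # A real tracker would localise the target in `frame` here.
        return self.box, self.score

    def update(self, frame, box):
        # A real tracker would update its appearance model with `box` here.
        self.last_shared_box = box


def wta_step(trackers, frame):
    """Run every tracker on one frame and keep only the best-scoring result.

    The winning box is then passed to all trackers, so the only information
    they exchange is the result of the current frame.
    """
    proposals = [tracker.predict(frame) for tracker in trackers]
    winner = max(range(len(proposals)), key=lambda i: proposals[i][1])
    winner_box = proposals[winner][0]
    for tracker in trackers:
        tracker.update(frame, winner_box)
    return winner, winner_box


trackers = [
    ConstantTracker((10, 10, 40, 80), 0.3),
    ConstantTracker((12, 11, 40, 80), 0.9),
]
winner, box = wta_step(trackers, frame=None)
```

In this toy run the second tracker wins, and both trackers receive its box as the shared result for the frame.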

Chapter 5 Learning Cross-view Binary Identities for Fast Person Re-ID

This chapter turns to person re-identification (Re-ID) in a cross-camera setting, see Fig. 5.1. Cross-camera person Re-ID is a fundamental component of automated video surveillance [22]. It addresses the problem of associating people observed at different locations and times by a non-overlapping Closed-Circuit TeleVision (CCTV) system. It has various potential applications, such as long-term multi-person tracking, person re-acquisition and forensic search [22]. Thus, the models introduced in the previous chapters can be considered as preprocessing for person Re-ID. Normally, when solving the task of person Re-ID, single-view person detection and tracking are assumed to have been successfully addressed. By combining object tracking with the re-identification introduced in this chapter, the full trace of a person across a large area can be recovered. Nevertheless, although the methods proposed in the last two chapters can be regarded as a pre-stage of person Re-ID when the predefined target is a person, they are not limited to persons and other objects can also be tracked. Furthermore, both single-view tracking and cross-camera re-identification are designed to realise visual data association. However, the data sources and settings of the two tasks are strikingly different. For instance, only one positive sample is given in online tracking, whilst a large number of samples can be collected offline for re-identification. Hence, the underlying assumptions of the two tasks are different.

[Figure 5.1 diagram labels: Function Space, f1, f2, Associations]

Figure 5.1: Visual data association in a cross-camera setting for person re-identification. The traces of a person in a single view can be obtained using the models introduced in Chapters 3 and 4.

Due to various difficulties, including illumination changes, viewpoint and pose variations, inter-object occlusions and low-resolution images, person re-identification is still a very challenging task and far from being solved. Most state-of-the-art approaches can be categorised into two groups: learning discriminative features that are invariant to view changes [23, 162, 163, 166], and learning metric functions that are used to rank pairs of observations from different views [169, 170, 172, 219]. However, in spite of their good performance on public datasets, existing methods generally neglect the efficiency of the algorithm in the matching stage. In fact, the search speed of a re-identification algorithm plays a significant role in real-world applications. Therefore, in this chapter, a novel approach, learning Cross-view Binary Identities (CBI), is proposed to reduce the computational burden of person re-identification.
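
To illustrate why binary identities can make the matching stage cheap, the following minimal sketch ranks gallery entries against a probe by Hamming distance. It assumes that the binary codes have already been learned and are stored as Python integers, and that matching uses Hamming distance, as is common for binary codes; this excerpt does not state the matching rule used by CBI, and the names and codes below are made up for illustration.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(code_a ^ code_b).count("1")


def rank_gallery(probe_code, gallery_codes):
    """Return gallery identities sorted by Hamming distance to the probe.

    `gallery_codes` maps an identity label to its binary code. Ties are
    broken by the label so that the ranking is deterministic.
    """
    scored = [
        (hamming_distance(probe_code, code), identity)
        for identity, code in gallery_codes.items()
    ]
    scored.sort()
    return [identity for _, identity in scored]


# Toy 8-bit codes (hypothetical values).
probe = 0b10110010
gallery = {
    "id_a": 0b10110011,  # differs from the probe in 1 bit
    "id_b": 0b01001101,  # differs from the probe in all 8 bits
    "id_c": 0b10100000,  # differs from the probe in 2 bits
}
ranking = rank_gallery(probe, gallery)
```

Each comparison reduces to one XOR and one bit count, which is typically much cheaper than evaluating a real-valued metric on high-dimensional features.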

The rest of this chapter is organised as follows. The hypotheses and motivation of this work are introduced in Sec. 5.1. Sec. 5.2 details how to build a model that learns cross-view identities for person re-identification. Sec. 5.3 then presents how to solve the complicated objective function, together with a convergence proof. Sec. 5.4 presents the experimental results, and Sec. 5.6 summarises the chapter.

In document Visual Data Association: Tracking, Re-identification and Retrieval (Page 99-102)