Evaluating and Extending Trajectory Features for Activity Recognition
5.2 Co-recognition Approach
5.2.2 Algorithm for Co-recognition
In this section, we explain our algorithm for co-recognition. As briefly illustrated in Fig. 5.2, the proposed algorithm consists of three main phases: initial matching, multi-layer growing, and object correspondence networking, which are described one by one in the following subsections.
5.2.2.1 Initial Matching
To form seeds for MCs, initial matches are established using local feature detectors [27,29,42,57]. Basically, feature pairs having similar descriptors constitute the initial seed matches. For images, we adopted the Harris-affine [40] and MSER [39] detectors and the SIFT descriptor [34]. For videos, the spatio-temporal interest point (STIP) detector [27] and the HOG-HOF descriptor [29] are used. For initial matching, nearest neighbor (NN) matching schemes are widely used in the literature [8,14,22,34,52]. The NN methods, however, severely degrade inlier detection in the presence of multiple identical objects or similar patterns, because in those cases a feature has several valid matching features rather than a single NN feature. To avoid this problem, we allow each feature to pair with multiple features: all feature pairs whose similarity exceeds a loose threshold are simply collected from the possible feature pairs. In intra-matching within one image, matching a feature with itself is avoided.
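The loose-threshold collection of seed matches can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function name, the use of cosine similarity, and the default threshold value are assumptions made for the example, and descriptors are assumed to be non-zero vectors.

```python
import numpy as np

def loose_threshold_matches(desc_a, desc_b, sim_threshold=0.8, intra=False):
    """Collect every feature pair whose similarity exceeds a loose threshold.

    desc_a, desc_b: (n, d) and (m, d) descriptor arrays (hypothetical inputs).
    intra: set to True when both sets come from the same image, so that a
           feature is never matched with itself.
    """
    # Cosine similarity is an assumed measure; any descriptor similarity works.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = a @ b.T
    if intra:
        np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    # Unlike NN matching, keep ALL pairs above the threshold, not only the best.
    rows, cols = np.nonzero(sim > sim_threshold)
    return list(zip(rows.tolist(), cols.tolist()))
```

With two near-identical objects in the second image, a single query feature keeps a match to each copy, whereas NN matching would retain only one of them.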
Fig. 5.6 Multi-layer match-growing. (a) After multi-layering is performed using the initial seed matches, the iterative growing loop starts. At each iteration, the algorithm grows the matches by intra-layer expansion or inter-layer merge, and increases the posterior probability p(Θ|I). The growing procedure is iterated until p(Θ|I) no longer increases. (b) Each expansion layer consists of overlapping circular regions covering the image domain. (c) In an expansion proposal, a cluster attempts to expand a region (blue circle at the bottom) using a support match (solid black line). When the proposal is accepted, the MC and its expansion layer are updated. (d) In a merge proposal, two clusters attempt to merge into one. When the proposal is accepted, the MCs and their expansion layers are combined into one. In (c) and (d), local regions in expansion layers are represented by red squares for better visualization. Expansion and merge in videos work in a similar manner but in spatio-temporal dimensions. For details, refer to [9,10].
5.2.2.2 Multi-layer Growing
To grow the initial seed matches into object-level correspondences, we propose a multi-layer growing algorithm driven by expansion/merge moves. The growing procedure is summarized in Fig. 5.6a. First, multi-layering is performed using the initial seed matches. Each initial seed match forms an initial singleton MC having its own expansion layer that provides space for expansion. As shown in Fig. 5.6b, each expansion layer consists of an overlapping circular grid of regular local regions that covers the entire image [8] or an overlapping spatio-temporal grid that covers the entire video [51]. In the iterative growing loop, expansion is proposed with probability Q_E or merge with probability 1 − Q_E. In an expansion proposal illustrated in Fig. 5.6c, a match in an MC is selected as a support match (shown as the black solid line), and a local region around it is chosen as a target region (shown as the blue circle) on the expansion layer of the MC. Then, the target region establishes a new match (the blue dotted arrow), which is propagated using the support match and refined by a local search [9]. Likewise, for videos as depicted in Fig. 5.7, an expansion proposal propagates a new cuboid match and then refines it by a local search in a 5-dimensional parameter space [51]. If the expansion proposal is accepted, the propagated match is included in the MC and the target region is eliminated from the expansion layer. On the other hand, in a merge move proposal illustrated in Fig. 5.6d, two geometrically similar MCs are selected and proposed to merge. If the merge move proposal is accepted, the expansion layers are also combined into their intersection. The algorithm accepts an expansion/merge proposal when it increases the posterior probability p(Θ|I). The growing procedure is iterated until p(Θ|I) no longer increases.
Fig. 5.7 Expansion move in video data. (a) The support match Mm|i propagates a target cuboid CT by applying transform Hm|i. (b) Local search around Hm|i along the coordinate axes (top row), spatial scale sσ (second row), and temporal scale sτ (third row) refines the new match to adapt to the deformation.
Although the multi-layer approach multiplies expansion layers at the start, merge moves gradually reduce the number of layers and guide expansion moves to concentrate on potentially matching regions. Likewise, expansion moves also guide merge moves to find compatible clusters by gradually growing them. Through these cooperative moves, our algorithm efficiently finds object correspondences despite a significant number of outliers in the initial matches. Therefore, while the multi-layer growing algorithm maximizes the posterior probability of MCs in a greedy manner, its expansion/merge proposals efficiently drive the solution to a good local optimum.
We refer the reader to [9,10] for details concerning the expansion/merge proposals.
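The control flow of the greedy growing loop can be outlined as below. This is only a schematic sketch under stated assumptions: the proposal functions, the `log_posterior` callable (a monotone stand-in for p(Θ|I)), the parameter name `q_expand` for Q_E, and the `patience` stopping rule (stop after a fixed number of consecutive rejected proposals) are all hypothetical placeholders rather than the procedure of [9,10].

```python
import random

def grow_matches(state, log_posterior, propose_expansion, propose_merge,
                 q_expand=0.5, patience=50, rng=None):
    """Greedy expansion/merge loop: accept a proposal only if it raises the score.

    state: current set of MCs with their expansion layers (opaque here).
    propose_expansion / propose_merge: callables (state, rng) -> new state or None.
    patience: consecutive rejections tolerated before concluding that the
              posterior no longer increases (an assumed stopping rule).
    """
    rng = rng or random.Random(0)
    best = log_posterior(state)
    failures = 0
    while failures < patience:
        # Expansion with probability Q_E, merge with probability 1 - Q_E.
        propose = propose_expansion if rng.random() < q_expand else propose_merge
        candidate = propose(state, rng)
        if candidate is not None:
            score = log_posterior(candidate)
            if score > best:  # accept only posterior-increasing moves
                state, best, failures = candidate, score, 0
                continue
        failures += 1
    return state, best
```

Because only posterior-increasing moves are accepted, the loop is a greedy ascent and terminates at a local optimum, in line with the discussion above.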
5.2.2.3 Networking of Maximal Correspondences
Fig. 5.8 Object correspondence networking. (a) Each MRC obtained from the growing step is represented by convex hull region pairs of the same color. Many small and spurious MRCs are observed arising from outliers of initial matches. (b) Based on the reliability criteria, MRCs are selected as object correspondences. (c) Reliable MRCs are connected by SL-HAC based on region overlaps, and constitute the object correspondence networks [10].
To reveal the relations among detected object regions, we establish the object correspondence networks from the set of MCs Θ∗. Since the resultant set of MCs usually includes trivial MCs arising from outliers of initial matches, we first eliminate such unreliable MCs from Θ∗. Typically, object region correspondences are likely to grow larger and have distinctive textures in their regions. Thus, we evaluate the reliability of an MC by its expanded area and the mean variance of its intensity patterns, and discard MCs that do not satisfy the threshold values. Then, the reliable MCs are considered as object correspondences and connected to construct the networks according to their overlapping regions. In this work, we use the simple and popular Single-Link Hierarchical Agglomerative Clustering (SL-HAC) algorithm [20] to group the MCs. Similarity between two MCs is defined as the ratio of the overlapping area to the area of the smaller MC region, and SL-HAC sequentially connects the most similar MC pairs until the similarity becomes less than 0.8. As shown in Fig. 5.8, the procedure assembles MRCs of the given image into networks, each representing connected object correspondences, and the detected object regions are all classified into sets of identical objects. The same scheme can likewise be used for networking MCCs in videos. Notably, this networking scheme does not require a predetermined number of identical object sets, and is robust to missing object correspondences since others can provide indirect pathways. More advanced clustering algorithms could be adopted for this purpose as well. In particular, the partial linkage HAC [7] or the noise-robust spectral clustering [32] could make the networks robust to falsely detected MCs.
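The networking step can be illustrated with a small sketch. It assumes, purely for the example, that each reliable MC is summarized by a single set of pixel indices; how overlap is measured across the two sides of a correspondence is not specified here, and the function names are hypothetical. Single-link merging with a similarity cutoff is implemented with a union-find structure, processing pairs from most to least similar and stopping once the similarity falls below 0.8.

```python
from itertools import combinations

def overlap_similarity(region_a, region_b):
    """Overlap area divided by the area of the smaller region (sets of pixel ids)."""
    smaller = min(len(region_a), len(region_b))
    return len(region_a & region_b) / smaller if smaller else 0.0

def single_link_networks(regions, stop_similarity=0.8):
    """Group regions by single-link clustering; returns sorted lists of indices."""
    parent = list(range(len(regions)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = sorted(
        ((overlap_similarity(regions[i], regions[j]), i, j)
         for i, j in combinations(range(len(regions)), 2)),
        reverse=True,
    )
    for sim, i, j in pairs:
        if sim < stop_similarity:  # stop once similarity drops below the cutoff
            break
        parent[find(i)] = find(j)  # link the two clusters

    networks = {}
    for i in range(len(regions)):
        networks.setdefault(find(i), []).append(i)
    return sorted(networks.values())
```

For instance, three regions A, B, and C in which only the pairs A-B and B-C overlap sufficiently still end up in one network, which reflects the indirect pathways mentioned above; no number of clusters has to be specified in advance.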