Spatial Inference from Sparse Segmentations

2.5 Sparse Video Object Segmentation

2.5.4 Spatial Inference from Sparse Segmentations

A sparse trajectory segmentation should be a useful first step towards a dense pixel segmentation of an image sequence. In 2008 we began to explore this idea for a visual surveillance application [15] in which inaccurate foreground delineation was tolerable: we only wanted to detect whether a new object had been introduced into an outdoor scene of interest. Fig. 2.11 shows two sequences, Dumping Pedestrian (687 frames) and Dumping Car (1099 frames), which are typical of the sequences we are interested in for this surveillance application. In both sequences a black garbage bag is placed in the scene. We are able to detect the presence of these new objects from the segmentations performed by our technique.

Figure 2.11: Example sequences our surveillance application is expected to segment. Top row: Frames 1, 321 and 613 in the Dumping Pedestrian sequence of 687 frames. In this sequence a pedestrian enters and exits the scene, leaving a black garbage bag behind. Bottom row: Frames 1, 545 and 1097 in the Dumping Car sequence of 1099 frames. In this sequence a car enters and exits the scene, leaving a small black garbage bag behind.

First, the background is modelled as a collection of sparse SIFT features B learnt from a sequence of training images, typically 100 frames showing only the background scene. For a frame f that occurs when the surveillance system is online and monitoring a scene, a set of SIFT features S_f is extracted. The foreground features at frame f are assumed to be those in S_f that have no matching feature in the learnt background feature set B. The second row of fig. 2.12 shows the foreground (red ‘×’) and background (yellow ‘.’) features detected for frames 283 (left), 529 (center) and 278 (right) in the Dumping Pedestrian sequence. Here the background features are those in the extracted feature set S_f for frame f that have matching features in the background model. The top row shows the corresponding frames for the feature maps in the second row. Note that some features classified as foreground actually correspond to background objects. These features arise from environmental changes that occur after the background modelling step, and so they are not included in the background model. However, we design a foreground pseudo-likelihood map which helps to distinguish legitimate foreground features from these new background features.
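The foreground/background split of S_f can be sketched as follows. This is a minimal illustration only: the text does not state the matching criterion, so the nearest-neighbour test, the Euclidean distance, the `max_dist` value and the function name `split_features` are all assumptions, and toy 2-D vectors stand in for the 128-dimensional SIFT descriptors.

```python
import math

def split_features(frame_descs, background_descs, max_dist=0.5):
    """Return (foreground_idx, background_idx) for the descriptors of one frame.

    Assumed rule (not taken from the text): a frame descriptor is treated as
    background when its nearest neighbour in the background model lies within
    max_dist (Euclidean distance); otherwise it is treated as foreground.
    """
    fg, bg = [], []
    for i, d in enumerate(frame_descs):
        nearest = min((math.dist(d, b) for b in background_descs),
                      default=math.inf)
        (bg if nearest <= max_dist else fg).append(i)
    return fg, bg

# Toy 2-D "descriptors" standing in for real SIFT vectors (hypothetical data).
B = [(0.0, 0.0), (1.0, 1.0)]                 # learnt background model
S_f = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.9)]   # features extracted at frame f
fg, bg = split_features(S_f, B)
```

Here only the second feature of `S_f` has no close neighbour in `B`, so it is the only one marked as foreground.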

The pseudo-likelihood map generated for frame f considers the spatial locations of the foreground and background features. The idea is that an image region is more likely to be foreground if it has a high density of foreground features and a low density of background features. The foreground and background features are superimposed on these foreground pseudo-likelihood maps in the second row of fig. 2.12, where high and low foreground likelihood values are coloured bright orange and black respectively.
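One way such a map could be computed is sketched below. The text does not give the formula, so everything here is an assumption made for illustration: the Gaussian kernel, the bandwidth `sigma`, the ratio d_fg / (d_fg + d_bg), the small constant `eps`, and the names `density` and `pseudo_likelihood_map`.

```python
import math

def density(x, y, points, sigma):
    """Sum of isotropic Gaussian kernels centred on each feature location."""
    return sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
               for px, py in points)

def pseudo_likelihood_map(width, height, fg_pts, bg_pts, sigma=2.0, eps=1e-9):
    """Per-pixel score in [0, 1]: high where foreground features are dense
    and background features are sparse (assumed form, not from the text)."""
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            d_fg = density(x, y, fg_pts, sigma)
            d_bg = density(x, y, bg_pts, sigma)
            # eps avoids division by zero far away from every feature.
            row.append(d_fg / (d_fg + d_bg + eps))
        rows.append(row)
    return rows

# Hypothetical feature locations (x, y) in a 10 x 10 image.
fg_pts = [(2, 2), (3, 2)]
bg_pts = [(8, 8)]
L_map = pseudo_likelihood_map(10, 10, fg_pts, bg_pts)
```

With these toy points the score is close to 1 around the foreground cluster (`L_map[2][2]`) and close to 0 around the lone background feature (`L_map[8][8]`), which matches the intended behaviour of the map.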

2.5.4.1 Tracking Foreground Regions

Figure 2.12: Top row: Frames 283 (left), 529 (center) and 278 (right) in the Dumping Pedestrian sequence. Second row: Foreground (×) and background (.) feature points superimposed on a foreground pseudo-likelihood map, for the frames in the top row. In this pseudo-likelihood map high and low foreground likelihoods are coloured bright orange and black respectively. Third row: Blob sequences for the Dumping Pedestrian sequence, superimposed on the frames in the top row. The background regions are shaded black, while the foreground is extracted using the foreground/background matte obtained by applying a threshold to the pseudo-likelihood map in the second row. The arrows indicate the transition of the blob centers for a blob trajectory which is τ frames long. The blobs shown (circled) occur at the temporal middle frame of the respective blob trajectories, and the centers of the starting and ending blobs in each trajectory are indicated with ×s and +s respectively. Bottom row: Blob trajectories for the Dumping Car sequence. Panel labels: τ = 229, τ = 256, τ = 61, τ = 226 and τ = 124.

We apply a threshold to the pseudo-likelihood map in order to identify blobs of foreground pixels. These blobs are then tracked using the Viterbi tracking strategy proposed by Pitie et al. [90]. The tracking yields blob trajectories that correspond either to genuine foreground objects or to background regions, and we differentiate between the two based on their temporal durations. In general, unlike foreground trajectories, background trajectories do not last for more than 20 frames. Hence a blob trajectory is considered to correspond to a legitimate foreground object if it persists for more than 20 frames.
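The thresholding and the duration test can be sketched as follows; the Viterbi tracking step of [90] is not reproduced. The threshold value, the 4-connectivity and the names `label_blobs` and `legitimate_foreground` are assumptions made for illustration. Only the 20-frame rule comes from the text. The durations 229, 256 and 61 are τ labels from fig. 2.12, while 7 and 20 are invented short trajectories.

```python
def label_blobs(score_map, threshold):
    """Group above-threshold pixels into 4-connected blobs.

    Returns a list of blobs, each a list of (row, col) pixels.
    """
    h, w = len(score_map), len(score_map[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or score_map[r][c] <= threshold:
                continue
            # Flood fill from this unvisited above-threshold pixel.
            stack, blob = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and score_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            blobs.append(blob)
    return blobs

def legitimate_foreground(durations, min_frames=20):
    """Keep trajectories that persist for more than min_frames frames."""
    return [tau for tau in durations if tau > min_frames]

# Hypothetical pseudo-likelihood values for a 3 x 4 image.
toy_map = [
    [0.9, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.0, 0.9],
]
blobs = label_blobs(toy_map, threshold=0.5)
kept = legitimate_foreground([229, 256, 61, 7, 20])
```

The toy map yields two blobs of two pixels each, and the duration test discards the trajectories of 7 and 20 frames, since the rule requires strictly more than 20 frames.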

The third row of fig. 2.12 shows the generated foreground blob trajectories for the Dumping Pedestrian sequence. Here τ is the length in frames of each blob trajectory; for example, τ = 229 for the blob trajectory corresponding to the pedestrian (left). The centers of the starting and ending blobs in each trajectory are indicated with a green × and a yellow + respectively. The red arrows indicate the frame-to-frame displacement of the blob centers along a trajectory. We correctly estimate trajectories for the pedestrian and the garbage bag. However, a foreground blob trajectory was also generated for a background image region, because this region corresponded to a background object that was stationary during background modelling but subsequently started to move. This is an understandable result, and cues other than temporal duration would be needed to distinguish this trajectory from a legitimate foreground trajectory. For a sequence like the Dumping Car sequence (fig. 2.11), where the background objects remain roughly stationary during both background modelling and online scene monitoring, we generate the correct number of foreground blob trajectories. The bottom row of fig. 2.12 shows the two correctly detected foreground blob trajectories for the Dumping Car sequence, which correspond to the car (left) and the garbage bag (right).
