Discussion - Multi-view dynamic scene modeling

A new direction in dense geometric and temporal 3D analysis has been explored, and a novel low-level approach has been proposed for simultaneously estimating dense 3D motion fields and probabilistic shape estimates between two consecutive calibrated view sets, using only silhouette cues. Experiments show the viability and robustness of the approach on various real datasets, including outdoor conditions that are challenging for photometric and surface-based methods. Because the method relies on no explicit boundary modeling, its output can serve as input to a variety of scene analysis tasks, such as motion segmentation with no geometric model or prior, 3D tracking, kinematic structure inference, and shape estimation. Existing shape modeling and tracking methods could use the resulting fields as a cue to replace current 2D optical flow or sparse match inputs, without having to deal explicitly with occlusion or with the premature assumptions associated with an explicit boundary model. Other cues could also be included in the temporal analysis, since the proposed Bayesian framework readily allows fusion of heterogeneous cues, such as depth from stereo or z-cameras, or data from other sensor modalities. New temporal shape refinement schemes could be explored by using soft shape priors or more past observations.

5.7.1 Motion Flow Ambiguity

The flow computed in this chapter is called “3D occupancy flow” rather than general 3D motion flow for two reasons. First, it is computed from probabilistic occupancy information inferred from silhouette cues. Second, some specific motions cannot be recovered from this inference, for example the self-rotation of a 3D sphere, during which the occupancy of the shape is static. This drawback may be alleviated if surface texture information is combined into the motion field estimation, but a uniformly colored sphere would remain a problem. Such a degeneracy is intractable for any purely vision-based approach, including 2D optical flow and 3D scene flow.
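The sphere degeneracy can be checked numerically. The following is a minimal sketch in Python with NumPy, not the implementation used in this chapter: the grid resolution, the radius, the rotation angle and all function names are illustrative assumptions, and a binary occupancy grid stands in for the probabilistic one. It rotates a sphere about its own center and compares the occupancy grids before and after.

```python
import numpy as np

def sphere_occupancy(points, center, radius):
    """Binary occupancy: 1.0 where a point lies inside the sphere, else 0.0."""
    return (np.linalg.norm(points - center, axis=1) <= radius).astype(float)

def rotation_z(theta):
    """Rotation matrix about the z axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Regular grid of voxel centers (resolution is an arbitrary choice).
axis = np.linspace(-1.0, 1.0, 21)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

center = np.zeros(3)
radius = 0.65                       # chosen so that no voxel center lies on the boundary
R = rotation_z(np.deg2rad(30.0))    # self-rotation between the two time steps

occ_before = sphere_occupancy(grid, center, radius)

# A world point x is occupied after the motion iff its pre-image
# R^T (x - c) + c was occupied before (row-vector form: (x - c) @ R).
pre_image = (grid - center) @ R + center
occ_after = sphere_occupancy(pre_image, center, radius)

# True displacement of the material points that start at the voxel centers.
true_motion = (grid - center) @ R.T + center - grid
inside = occ_before > 0

occupancy_change = np.abs(occ_after - occ_before).max()
mean_true_speed = np.linalg.norm(true_motion[inside], axis=1).mean()

print("max occupancy change:", occupancy_change)
print("mean true displacement inside the sphere:", mean_true_speed)
```

The occupancy grids are identical although the material points inside the sphere move, so any estimator that observes occupancy alone has no evidence of this motion.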

However, combining surface texture with the 3D volumetric motion field estimation is not straightforward. The proposed approach deliberately assumes no explicit surface during the computation, for robustness, so it would require a probabilistic representation of where the surface, and hence the surface texture, lies in the 3D volume. This would in turn introduce visibility ordering along viewing lines, which complicates the voxel independence assumption, i.e. that every voxel’s state can be independently inferred from the appearance of its projection in the camera views. Combining volume and surface information in this way is a direction for future work.

5.7.2 Motion Discontinuity

There are two aspects to this problem. First, it has been assumed that the motion field in the 3D volume is spatially smooth in local regions. Dynamic shapes, especially articulated ones, may have drastically different motions in different parts. When these parts happen to be spatially very close, the assumption is valid only if the volume resolution is at an even finer granularity; otherwise the recovered motion is “over-smoothed” in the region in question. Fig. 5.8 is likely to be affected by this issue, since the arm and the torso are close together but move with different directions and magnitudes.

Instead of uniformly sampling the 3D space with a regular grid, the problem may be addressed with an occupancy-probability-guided sampling scheme, which samples densely in the regions more likely to contain cluttered parts with different motions and sparsely in more uniform regions. This would lose the convenient ordered indexing of the regular grid, but would help overcome the “over-smoothing” problem.
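One possible reading of such a guided sampling scheme is an octree-style refinement, sketched below in Python with NumPy. This is an assumption-laden illustration rather than a scheme evaluated in this chapter: the toy occupancy field (two nearby blobs standing in for an arm and a torso), the refinement criterion (spread of the occupancy probability over a cell's corners and center, used here as a stand-in for "likely to contain different motion parts"), the threshold and all names are hypothetical.

```python
import numpy as np

def occupancy_prob(points):
    """Toy occupancy probability field (an assumption): two nearby soft blobs."""
    a = np.exp(-np.sum((points - np.array([0.35, 0.5, 0.5])) ** 2, axis=1) / 0.01)
    b = np.exp(-np.sum((points - np.array([0.65, 0.5, 0.5])) ** 2, axis=1) / 0.01)
    return np.clip(a + b, 0.0, 1.0)

# Unit offsets of the 8 corners (and 8 children) of a cubic cell.
OFFSETS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], float)

def refine(lo, size, depth, max_depth, spread_thresh, out):
    """Split a cubic cell recursively while occupancy varies strongly inside it.

    lo is the minimum corner of the cell and size its edge length. Probing only
    the corners and the center can miss structures smaller than the cell; a
    denser probe pattern would be needed to guard against that.
    """
    probes = np.vstack([lo + OFFSETS * size, lo + 0.5 * size])
    p = occupancy_prob(probes)
    if depth == max_depth or p.max() - p.min() < spread_thresh:
        out.append((lo + 0.5 * size, size))   # leaf: keep the cell center as a sample
        return
    for off in OFFSETS:
        refine(lo + off * size / 2, size / 2, depth + 1, max_depth, spread_thresh, out)

samples = []
refine(np.zeros(3), 1.0, 0, 4, 0.05, samples)
centers = np.array([c for c, _ in samples])
sizes = np.array([s for _, s in samples])

print("number of samples:", len(samples), "vs regular grid:", 16 ** 3)
print("finest and coarsest cell size:", sizes.min(), sizes.max())
```

The leaves still partition the volume, but they can no longer be addressed by a simple (i, j, k) index, which is the loss of ordered indexing mentioned above; a neighbor lookup structure would be needed for the spatial smoothness terms.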

Second, temporal motion smoothness is not taken for granted. Although many of the motions observed daily are temporally smooth, sudden motion changes are also very common in reality, for example in a hand-waving motion, where a temporal smoothness assumption would do more harm than good. Whether and how to use a temporal motion smoothness constraint effectively in the formulation therefore remains an open question.

5.7.3 Skeleton Generalization

One interesting and important application is that a skeleton model of an arbitrary piecewise rigid dynamic shape (such as a person or a spider) can be generalized given a few such motion fields. With a second EM framework, the parameters of the rigid skeletal parts can be estimated, given that voxels in the same rigid body part should follow the same motion pattern described by that part's set of parameters.

Figure 5.10: Skeleton generalization from dense motion field. Left: initialization of the three-part cluster of the human volume, thresholded at 0.85 probability; Right: the converged three rigid parts and the motion of these three parts. Best viewed in color.

As an example, shown in Fig. 5.10, the motion field is computed from frame 740 to frame 742 of the ROND hand-waving sequence. The field is first thresholded at 0.95 probability and shown in the left plot of Fig. 5.10. The arms are waving upward; the dense flow is shown as blue vectors, and blue is also the color of one of the three body parts. The system is initialized with three rigid body parts (red, green and blue), obtained with MATLAB k-means clustering. The parameters to be estimated for each rigid part are 𝑅, the 3D rotation matrix, and 𝑇, the 3D translation. After 2 iterations of 𝑅 and 𝑇 computation plus Fast-PD label assignment, which takes spatial continuity into account, the refined three rigid parts are shown in the right plot of Fig. 5.10. The computed 𝑅s and 𝑇s for each of the three parts are shown as blue vector fields.
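The alternation between estimating 𝑅 and 𝑇 per part and reassigning voxel labels can be sketched as follows in Python with NumPy. This is a simplified illustration under stated assumptions, not the procedure used for Fig. 5.10: the rigid fit uses the standard SVD-based least-squares (Kabsch) solution, the label step assigns each voxel to the part with the smallest motion residual and ignores spatial continuity (the text uses Fast-PD for that step), the initial labels come from a position threshold instead of MATLAB k-means, and the two-part data and all names are synthetic and hypothetical.

```python
import numpy as np

def fit_rigid(points, flow):
    """Least-squares R, T such that R @ p + T approximates p + flow (Kabsch)."""
    src, dst = points, points + flow
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = dst_c - R @ src_c
    return R, T

def residuals(points, flow, R, T):
    """Per-voxel distance between the rigid prediction and the observed flow."""
    return np.linalg.norm(points @ R.T + T - (points + flow), axis=1)

def segment_rigid_parts(points, flow, labels, n_parts, n_iters=2):
    """Alternate rigid fits per part with residual-based label reassignment."""
    params = []
    for _ in range(n_iters):
        params = []
        for k in range(n_parts):
            mask = labels == k
            if mask.sum() < 3:                   # too few voxels: fall back to identity
                params.append((np.eye(3), np.zeros(3)))
            else:
                params.append(fit_rigid(points[mask], flow[mask]))
        err = np.stack([residuals(points, flow, R, T) for R, T in params], axis=1)
        labels = err.argmin(axis=1)              # no spatial continuity term here
    return labels, params

# Synthetic two-part example: part A rotates about z, part B translates.
rng = np.random.default_rng(0)
part_a = rng.uniform(-0.5, 0.5, size=(200, 3))
part_b = rng.uniform(-0.5, 0.5, size=(200, 3)) + np.array([2.0, 0.0, 0.0])
theta = np.deg2rad(10.0)
R_a = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
T_b = np.array([0.3, 0.0, 0.0])

points = np.vstack([part_a, part_b])
flow = np.vstack([part_a @ R_a.T - part_a, np.tile(T_b, (200, 1))])

init_labels = (points[:, 0] > 1.0).astype(int)   # position-based initialization
labels, params = segment_rigid_parts(points, flow, init_labels, n_parts=2)

for k, (R, T) in enumerate(params):
    print(f"part {k}: T = {np.round(T, 3)}, rotation trace = {np.trace(R):.4f}")
```

Each part is described by only six degrees of freedom, which is the much smaller parameter space referred to later in this section. With a noisy flow field or a poor initialization, the residual-only label step can fragment parts, which is where a spatially regularized assignment becomes useful.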

The rigid parts can be linked together into a complete skeleton model of the person, given more motion field information from the following sequences. A more detailed rigid part decomposition may also be feasible once enough motion information has been acquired. After the full skeleton model is recovered, the dense motion field computation can be applied to the recovered rigid parts only, which normally have a much smaller parameter space, thus drastically speeding up the motion field computation. Nevertheless, the dense motion field computation proposed in this chapter is the key to the novel initialization of an arbitrary skeleton model.

Chapter 6 Heterogeneous Network for Dynamic Shape Estimation
