Summary - Shape Clustering and Spatial temporal Constraint for Non rigid Structure from Motion

the average of 3 trials for each sequence. In Figure 3.4(c) and Figure 3.4(d), our method outperforms PND in 5 out of 6 sequences under both configurations. Finally, we evaluated our method on incomplete measurements with 10% randomly missing data. Averaging over 3 trials per sequence, as shown in Figure 3.4(e), our method outperforms PND on 5 sequences with fixed parameters, and on all sequences when the parameters are configured individually. To see how performance changes with the missing-data ratio, we conducted experiments on the “Ball” sequence with missing-data ratios ranging from 5% to 25%. As shown in Figure 3.4(f), our method performs better than PND, which shows that shape clustering does not impair the baseline method’s ability to handle missing data. To visualize the 3D reconstructions, Figure 3.5 shows the 3D point clouds produced by our method and by PND on 4 UMPM sequences, with one representative frame per sequence. Red shows the ground truth and blue shows the reconstruction results. Our method outperforms the current state-of-the-art NRSfM method PND by a clear margin.
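The section does not spell out how the missing entries are generated. The following is a minimal sketch that assumes three things, none of which is the thesis's stated protocol: a missing measurement removes both image coordinates of a point in a frame; missing (frame, point) pairs are drawn uniformly at random; and W stacks each frame's x and y rows as consecutive rows. `random_missing_mask` is a hypothetical helper name.

```python
import numpy as np

def random_missing_mask(num_frames, num_points, missing_ratio, seed=0):
    """Return a (2F x P) boolean visibility mask for a measurement matrix W.

    Assumption: frame f occupies rows 2f (x) and 2f+1 (y) of W, and a point
    is either fully observed or fully missing in a given frame.
    """
    rng = np.random.default_rng(seed)
    total = num_frames * num_points
    num_missing = int(round(missing_ratio * total))
    # Pick (frame, point) pairs uniformly at random, without replacement.
    flat_idx = rng.choice(total, size=num_missing, replace=False)
    visible = np.ones(total, dtype=bool)
    visible[flat_idx] = False
    visible = visible.reshape(num_frames, num_points)
    # Duplicate each frame's row so that x and y share the same visibility.
    return np.repeat(visible, 2, axis=0)

# Illustrative sweep over missing-data ratios from 5% to 25%.
for ratio in (0.05, 0.10, 0.15, 0.20, 0.25):
    mask = random_missing_mask(num_frames=100, num_points=40, missing_ratio=ratio)
    print(ratio, 1.0 - mask.mean())
```

A masked objective would then sum reprojection residuals only over the entries where the mask is true.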

We also analyze the trajectory of a single point in the “Free” sequence to show that our method reduces the error globally, on most frames. Figure 3.6 shows this point’s trajectories along the x-, y- and z-axes, as well as the overall 3D reconstruction error of every frame. PND gives poor results on certain groups of frames because of high shape complexity, while our method largely corrects these errors. This indicates that the shape complexity of the whole sequence is beyond the capability of PND, but is handled well by our method once shape clustering has significantly lowered the shape complexity.
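Per-frame error curves such as the one in Figure 3.6 can be computed along the lines below. This sketch assumes a normalized Frobenius error per frame and reconstructions that are already aligned to the ground truth. The exact error definition used in the thesis is not given in this section, and `per_frame_relative_error` is a hypothetical name.

```python
import numpy as np

def per_frame_relative_error(S_rec, S_gt):
    """Relative 3D error of each frame (assumed definition).

    S_rec, S_gt: arrays of shape (F, 3, P) with reconstructed and ground-truth
    shapes, assumed already aligned (e.g. reflection ambiguity resolved).
    Returns an array of length F with ||S_rec_f - S_gt_f||_F / ||S_gt_f||_F.
    """
    # A 2-tuple axis makes np.linalg.norm compute a matrix (Frobenius) norm per frame.
    diff = np.linalg.norm(S_rec - S_gt, axis=(1, 2))
    ref = np.linalg.norm(S_gt, axis=(1, 2))
    return diff / ref

# Usage with synthetic data: corrupt a single frame and inspect the curve.
rng = np.random.default_rng(0)
S_gt = rng.standard_normal((5, 3, 10))      # 5 frames, 10 points
S_rec = S_gt.copy()
S_rec[2] *= 1.5                             # only frame 2 is wrong
errors = per_frame_relative_error(S_rec, S_gt)
trajectory = S_rec[:, :, 0]                 # (F, 3): x, y, z of point 0 over time
```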

3.5 Summary

In this chapter, we presented a novel reconstructability measure for general NRSfM and an iterative, shape-clustering-based NRSfM method. Our method is easy to implement and improves on the performance of state-of-the-art NRSfM methods. Future work includes extending our method to the dense NRSfM case and automatic model selection in shape clustering.


Figure 3.6: Top left, top right, bottom left: the X, Y and Z coordinates, respectively, of a point trajectory in the “Free” sequence. Bottom right: the error of each frame in the “Free” sequence.

Chapter 4 Spatial-Temporal Constraint in Dense NRSfM

While sparse NRSfM methods have become capable of handling long-term, complex motions, dense NRSfM remains a difficult problem. Because dense NRSfM reconstructs the 3D position of every pixel in a video, a huge number of variables is added to the system. This makes the NRSfM problem more under-determined and increases the computational complexity, which prevents most current sparse NRSfM methods from scaling to the dense scenario. However, because the point clouds are dense, spatial constraints between adjacent points become viable. The state-of-the-art method for dense NRSfM is the dense variational method proposed by Garg et al. [3], which uses both trace-norm minimization and total variation in the optimization process. This method adds a total variation term to Dai’s block matrix method [7] to regularize the 3D shape, which gives it the ability to handle dense surfaces. Their paper reported the best performance so far on the dense synthetic benchmark, but the optimization process is quite complex and requires GPU hardware to run efficiently [3]. Furthermore, they only evaluated the method on complete and accurate datasets, so its robustness remains questionable. Subsequent methods have focused on processing raw 2D videos using segmentation. Yu et al. [28] proposed to exploit the temporal smoothness of both camera motion and 3D deformation. Russel et al. [12] exploited the smoothness within segments and performed piece-wise reconstruction. Ranftl [35] investigated the relative scale in a dynamic scene. Because they rely on segmentation, all of these methods are computationally expensive.

In this chapter, we present a simple and robust method for dense NRSfM that exploits the inherent temporal and spatial smoothness of deforming shapes. Our method makes the following contributions: (i) We revisit the temporal smoothness proposed by Dai et al. [7] and show that it can be applied directly to the dense case. (ii) Our optimization uses simple temporal and spatial smoothness terms, which keeps the cost function convex and easy to solve. (iii) We replace the L2-norm with the L1-norm on the reprojection error to make the method robust against noise and outliers. (iv) Our method can be implemented efficiently on a CPU, eliminating the need for additional hardware. Our analysis and experiments are based on the synthetic dense benchmark (4 sequences) and on 2D tracks captured in real-world videos (3 sequences), both proposed by Garg et al. [3].
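Why an L1 penalty is more robust than an L2 penalty can be seen on a scalar analogue. The minimizer of a sum of squared residuals is the mean, while a minimizer of a sum of absolute residuals is a median. The plain-Python sketch below uses illustrative numbers, not data from the thesis, and shows one gross outlier dragging the L2 estimate far away while barely moving the L1 estimate.

```python
import statistics

def l2_cost(values, c):
    # Sum of squared residuals; minimized by the arithmetic mean.
    return sum((v - c) ** 2 for v in values)

def l1_cost(values, c):
    # Sum of absolute residuals; minimized by a median.
    return sum(abs(v - c) for v in values)

clean = [1.0, 1.1, 0.9, 1.05, 0.95]
corrupted = clean + [50.0]  # one gross outlier, e.g. a mistracked point

l2_estimate = statistics.fmean(corrupted)   # pulled far from 1.0 (about 9.17)
l1_estimate = statistics.median(corrupted)  # stays near 1.0 (about 1.025)
```

The same effect carries over to reprojection residuals: with an L1 data term, a few badly tracked points contribute a cost linear in their error instead of quadratic, so they cannot dominate the fit.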

(a) Input W; (b) e = 0.5084; (c) e = 0.1835; (d) e = 0.0757; (e) e = 0.0700

Figure 4.1: Evolution from 2D input to robust 3D shape on Synthetic Face Sequence 2 [3] in the presence of outliers. From left to right: input W; pseudo-inverse results; temporal smooth results; spatial-temporal smooth results; robust spatial-temporal smooth results. Top row: 4th frame; bottom row: 6th frame. e denotes the RMS 3D error of the full 3D reconstruction in each scenario.

Figure 4.1 shows the evolution from 2D to 3D on a dense benchmark sequence with outliers as increasingly strong constraints are applied. Each column shows the reconstruction obtained under one level of constraints. As the temporal, spatial and robust constraints are enforced in turn, the reconstruction improves, with the RMS 3D error e dropping from 0.5084 to 0.0700.
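Why the pseudo-inverse result in Figure 4.1(b) is so poor can be seen in a single frame. This sketch assumes an orthographic camera whose $2\times 3$ matrix $R_f$ has orthonormal rows, and that the pseudo-inverse result is $S_f = R_f^{+} W_f$; `random_orthographic_camera` is a hypothetical helper. The minimum-norm solution reproduces the 2D input exactly but has no component along the viewing direction, so all depth is lost, which is consistent with the large error e in panel (b).

```python
import numpy as np

def random_orthographic_camera(rng):
    """First two rows of a random orthogonal matrix: a 2x3 camera with orthonormal rows."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q[:2]

rng = np.random.default_rng(0)
R_f = random_orthographic_camera(rng)
S_true = rng.standard_normal((3, 50))       # one frame, 50 points
W_f = R_f @ S_true                          # noise-free 2D projections
S_pinv = np.linalg.pinv(R_f) @ W_f          # minimum-norm 3D estimate
view_dir = np.cross(R_f[0], R_f[1])         # optical axis of the camera

print(np.allclose(R_f @ S_pinv, W_f))       # True: 2D input reproduced exactly
print(np.allclose(view_dir @ S_pinv, 0.0))  # True: no depth along the optical axis
```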

4.1 Motivation

Our research begins by revisiting the total variation method proposed by Garg et al. [3]. As this method achieves the best dense NRSfM performance so far, we aim to address some of its weaknesses. Recall that the convex optimization is as follows:


$E_{data}$ is a quadratic penalty on the reprojection error:

$$E_{data} = \frac{\lambda}{2}\,\|W - RS\|_F^2 \tag{4.2}$$

where $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.
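A sketch of evaluating $E_{data}$, assuming the block-matrix layout of Dai et al. [7]: $W$ is $2F\times P$ with each frame's x and y rows stacked, $S$ is $3F\times P$, and $R$ is block-diagonal with one $2\times 3$ camera per frame. The helper names are hypothetical. Because $R$ is block-diagonal, the same value can also be accumulated frame by frame, which avoids storing the mostly-zero $2F\times 3F$ matrix.

```python
import numpy as np

def block_diagonal(cams):
    """Place F cameras of shape (2, 3) on the diagonal of a (2F x 3F) matrix."""
    F = len(cams)
    R = np.zeros((2 * F, 3 * F))
    for f, R_f in enumerate(cams):
        R[2 * f:2 * f + 2, 3 * f:3 * f + 3] = R_f
    return R

def data_term(W, R, S, lam):
    """E_data = (lam / 2) * ||W - R S||_F^2 using the full block-diagonal R."""
    return 0.5 * lam * np.linalg.norm(W - R @ S, "fro") ** 2

def data_term_per_frame(W, cams, S, lam):
    """Same value, accumulated frame by frame without forming R."""
    total = 0.0
    for f, R_f in enumerate(cams):
        residual = W[2 * f:2 * f + 2] - R_f @ S[3 * f:3 * f + 3]
        total += float(np.sum(residual ** 2))
    return 0.5 * lam * total

# Usage with random data: F = 3 frames, P = 8 points, small measurement noise.
rng = np.random.default_rng(0)
cams = [np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(3)]
S = rng.standard_normal((9, 8))
W = block_diagonal(cams) @ S + 0.01 * rng.standard_normal((6, 8))
print(data_term(W, block_diagonal(cams), S, lam=1.0))
```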

The regularization term $E_{reg}$ is defined by a total variation regularizer $TV\{\cdot\}$ that smooths the 3D shape while preserving discontinuities. Assuming that every point in matrix $S$ is associated with a specific pixel in a reference image, every point is adjacent to its 4 neighboring points on the 2D image grid. Let $S_f^i$ be a row vector containing the $i$-th coordinate of all pixels in frame $f$; then $E_{reg}$ has the following form:

$$E_{reg} = \sum_{f=1}^{F}\sum_{i=1}^{3} TV\{S_f^i\} = \sum_{f=1}^{F}\sum_{i=1}^{3}\sum_{p=1}^{N}\left\|\nabla S_f^i(p)\right\| \tag{4.3}$$

where $\nabla S_f^i(p)$ is the 2D gradient of $S_f^i$ at pixel $p$, defined by forward differences in the horizontal and vertical directions of the image.
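A minimal sketch of $E_{reg}$ in (4.3). It assumes that each coordinate map $S_f^i$ is laid out on a rows × cols reference-image grid, that the norm is Euclidean (isotropic TV), and that the forward difference is zero at the last row and column. These boundary and norm choices are assumptions, and the helper names are hypothetical.

```python
import numpy as np

def forward_gradient(img):
    """Forward differences; the last column/row gets a zero difference."""
    img = np.asarray(img, dtype=float)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]  # horizontal differences
    gy[:-1, :] = img[1:, :] - img[:-1, :]  # vertical differences
    return gx, gy

def total_variation(img):
    """Isotropic TV: sum over pixels of the Euclidean norm of the 2D gradient."""
    gx, gy = forward_gradient(img)
    return float(np.sum(np.sqrt(gx ** 2 + gy ** 2)))

def e_reg(coord_maps):
    """coord_maps: array (F, 3, rows, cols) holding the x/y/z maps of each frame."""
    F = coord_maps.shape[0]
    return sum(total_variation(coord_maps[f, i]) for f in range(F) for i in range(3))

# Usage: a horizontal ramp has unit gradient everywhere except the last column.
ramp = np.tile(np.arange(5.0), (4, 1))   # 4 rows x 5 columns
print(total_variation(ramp))             # 16.0 = 4 rows * 4 unit steps
```

Because the per-pixel norm is not squared, sharp depth discontinuities are penalized only linearly in their height, which is what allows TV to smooth the surface while preserving edges.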

The trace norm term $E_{trace}$ expresses a low-rank constraint based on the assumption of a low-dimensional linear shape space. Let $S^\sharp \in \mathbb{R}^{F\times 3P}$ be a rearranged version of $S$ as in [7]; $E_{trace}$ is defined as:

$$E_{trace} = \|S^\sharp\|_* = \sum_{j=1}^{\min(F,3P)} \Lambda_j$$

where $\Lambda_j$ is the $j$-th singular value of $S^\sharp$.
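The rearrangement and the trace norm can be sketched as follows. The sketch assumes that $S$ is $3F\times P$ with the x, y, z rows of frame $f$ at rows $3f$, $3f+1$, $3f+2$, so that row $f$ of $S^\sharp$ is $[x_f, y_f, z_f]$. Because the rearrangement only permutes entries, the Frobenius norm is unchanged, so the trace-norm subproblem of the proximal splitting in (4.4) reduces to standard singular value thresholding of $S^\sharp$ with threshold $\theta\tau$. Helper names are hypothetical; this is a sketch, not the implementation of [3].

```python
import numpy as np

def to_sharp(S):
    """Rearrange S (3F x P) into S_sharp (F x 3P); row f = [x_f, y_f, z_f]."""
    F = S.shape[0] // 3
    return S.reshape(F, 3 * S.shape[1])

def from_sharp(S_sharp):
    """Inverse of to_sharp: (F x 3P) back to (3F x P)."""
    F, three_p = S_sharp.shape
    return S_sharp.reshape(3 * F, three_p // 3)

def trace_norm(S):
    """E_trace = ||S_sharp||_*, the sum of the singular values of S_sharp."""
    return float(np.linalg.svd(to_sharp(S), compute_uv=False).sum())

def singular_value_threshold(A, thresh):
    """Proximal operator of thresh * ||.||_*: shrink every singular value by thresh."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(sigma - thresh, 0.0)) @ Vt

def trace_norm_subproblem(S, theta, tau):
    """Solve min_Sbar 1/(2 theta) ||S - Sbar||_F^2 + tau ||Sbar_sharp||_*."""
    return from_sharp(singular_value_threshold(to_sharp(S), theta * tau))

# Usage: a rigid shape repeated over F = 4 frames gives a rank-1 S_sharp.
rng = np.random.default_rng(0)
shape = rng.standard_normal((3, 10))
S_rigid = np.tile(shape, (4, 1))                   # 12 x 10
print(np.linalg.matrix_rank(to_sharp(S_rigid)))    # 1
```

For such a rigid sequence, $S^\sharp$ has identical rows and its trace norm equals $\sqrt{F}$ times the Frobenius norm of the shape, which illustrates why minimizing the trace norm favors shapes that lie close to a low-dimensional linear space.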

The method starts by initializing $R$ and $S$ using rigid structure from motion, assuming that a dominant rigid component is present. It then optimizes iteratively: fix $S$ and minimize the cost function with respect to $R$ using the Levenberg-Marquardt algorithm, then fix $R$ and minimize the cost function with respect to $S$, and repeat until convergence. The minimization with fixed $R$ is further decomposed into the following two subproblems by decoupling the trace norm and the TV regularization using proximal splitting:

$$\begin{aligned} &\min_{S}\ \frac{1}{2\theta}\|S - \bar{S}\|_F^2 + \frac{\lambda}{2}\|W - RS\|_F^2 + \sum_{f,i,p}\left\|\nabla S_f^i(p)\right\| \\ &\min_{\bar{S}}\ \frac{1}{2\theta}\|S - \bar{S}\|_F^2 + \tau\|\bar{S}^\sharp\|_* \end{aligned} \tag{4.4}$$

where $\bar{S}$ is an auxiliary variable and $\theta$ is a coupling parameter that keeps $S$ and $\bar{S}$ close. The first problem can be solved by transforming it into its primal-dual form using Legendre-Fenchel
