4.5 Robust Spatial-Temporal Smoothness: The System

In previous sections we proposed to use temporal smoothness, spatial smoothness and a modified version of the data term. By enforcing the modified data term together with both the spatial and temporal smoothness constraints, we arrive at the following cost function:

$$\|W - RS\|_1 + \lambda_1 \|HS\|_F^2 + \lambda_2 \|A\,\mathrm{vec}(S)\|_F^2 \tag{4.11}$$

where $\lambda_1$ and $\lambda_2$ are the trade-off parameters. The three terms are the "data term", the "temporal smoothness term" and the "spatial smoothness term", respectively. As discussed above, the L1-norm of the reprojection error can be replaced with an L2-norm using IRLS. Therefore, in each iteration, we solve the following problem:

$$\min_{S} \; \|E(W - RS)\|_F^2 + \lambda_1 \|HS\|_F^2 + \lambda_2 \|A\,\mathrm{vec}(S)\|_F^2, \tag{4.12}$$
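To make the IRLS idea concrete, the following minimal sketch applies the same reweighting principle to a toy scalar problem: minimizing $\sum_i |x_i - s|$ by repeatedly solving a weighted least-squares problem. The function name `irls_l1_location`, the clamp `eps` and the sample data are illustrative assumptions and not part of our system; the weights are chosen here as 1/|residual| so that each weighted squared residual matches the absolute residual at the current iterate.

```python
import numpy as np

def irls_l1_location(x, n_iter=50, eps=1e-8):
    """Toy IRLS: approximate argmin_s sum_i |x_i - s| by reweighted least squares."""
    x = np.asarray(x, dtype=float)
    s = x.mean()  # start from the ordinary least-squares solution
    for _ in range(n_iter):
        # Weight of each squared residual; eps (an assumed clamp) avoids division by zero.
        w = 1.0 / np.maximum(np.abs(x - s), eps)
        # Closed-form minimizer of the weighted quadratic sum_i w_i * (x_i - s)^2.
        s = np.sum(w * x) / np.sum(w)
    return s

samples = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
least_squares_estimate = float(np.mean(samples))
irls_estimate = irls_l1_location(samples)
```

On this data the least-squares estimate is pulled to 22 by the single outlier, while the reweighted estimate settles near the median value 3, which is the behaviour the robust data term relies on.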

The cost function is a quadratic function of the non-rigid shape $S$. Therefore, we can derive a closed-form solution from the first-order optimality condition. We take the gradient of each of the three terms with respect to $S$:

$$\frac{\partial \|E(W - RS)\|_F^2}{\partial S} = 2R^T E^T E R S - 2R^T E^T E W, \tag{4.13}$$

$$\frac{\partial \|HS\|_F^2}{\partial S} = 2H^T H S, \tag{4.14}$$
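As a sanity check, the two gradients can be compared against central finite differences on a small random problem. This is only a sketch: the matrix sizes, the random data and the use of a first-difference matrix for $H$ are assumptions made for illustration, and the data-term gradient is written in the factored form $2R^T E^T E (RS - W)$.

```python
import numpy as np

def data_grad(S, W, R, E):
    """Gradient of ||E (W - R S)||_F^2 with respect to S."""
    return 2.0 * R.T @ E.T @ E @ (R @ S - W)

def temporal_grad(S, H):
    """Gradient of ||H S||_F^2 with respect to S."""
    return 2.0 * H.T @ H @ S

def numeric_grad(f, S, h=1e-6):
    """Central finite differences, perturbing one entry of S at a time."""
    G = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        step = np.zeros_like(S)
        step[idx] = h
        G[idx] = (f(S + step) - f(S - step)) / (2.0 * h)
    return G

rng = np.random.default_rng(0)
rows_w, rows_s, n_points = 4, 6, 5                 # toy sizes (assumed)
R = rng.normal(size=(rows_w, rows_s))
W = rng.normal(size=(rows_w, n_points))
S = rng.normal(size=(rows_s, n_points))
E = np.diag(rng.uniform(0.5, 2.0, size=rows_w))    # diagonal weight matrix
H = np.diff(np.eye(rows_s), axis=0)                # first-difference operator (assumed)

data_cost = lambda X: np.sum((E @ (W - R @ X)) ** 2)
temporal_cost = lambda X: np.sum((H @ X) ** 2)

data_err = np.max(np.abs(data_grad(S, W, R, E) - numeric_grad(data_cost, S)))
temporal_err = np.max(np.abs(temporal_grad(S, H) - numeric_grad(temporal_cost, S)))
```

Both `data_err` and `temporal_err` should be at the level of finite-difference rounding error, since the two cost terms are exactly quadratic in $S$.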

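For the spatial term, the gradient of $\|A\,\mathrm{vec}(S)\|_F^2$ can be computed element-wise. According to the chain rule, we separate the partial derivative with respect to $\mathrm{vec}(S)$ from the partial derivative of $\mathrm{vec}(S)$ with respect to $S_{ij}$:

$$\frac{\partial \|A\,\mathrm{vec}(S)\|_F^2}{\partial S_{ij}} = \mathrm{Tr}\left[\left(\frac{\partial \|A\,\mathrm{vec}(S)\|_F^2}{\partial\,\mathrm{vec}(S)}\right)^{T} \frac{\partial\,\mathrm{vec}(S)}{\partial S_{ij}}\right] = \mathrm{Tr}\left[\left(2A^T A\,\mathrm{vec}(S)\right)^{T} \frac{\partial\,\mathrm{vec}(S)}{\partial S_{ij}}\right], \tag{4.15}$$

where $\frac{\partial\,\mathrm{vec}(S)}{\partial S_{ij}} = e_{ij}$ denotes an all-zero vector except for the entry corresponding to position $ij$, which is 1. Due to the special structure of $e_{ij}$, we have: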

$$\frac{\partial \|A\,\mathrm{vec}(S)\|_F^2}{\partial S_{ij}} = 2(A^T A)_{ij}\,\mathrm{vec}(S), \tag{4.16}$$

where the subscript $ij$ denotes the position corresponding to vectorization (say, the $((j-1)P + i)$-th row of $A^T A$); therefore, each element of the partial gradient is a linear function of $\mathrm{vec}(S)$.

$$\mathrm{vec}\left(\frac{\partial \|A\,\mathrm{vec}(S)\|_F^2}{\partial S}\right) = 2(A^T A)\,\mathrm{vec}(S), \tag{4.17}$$
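The vectorized spatial gradient can be verified numerically in the same spirit. The sketch below assumes column-stacking vectorization (`order="F"` in NumPy) and uses a random matrix as a stand-in for the spatial operator $A$; both choices, as well as the helper names `vec` and `ivec`, are illustrative assumptions.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization (assumed convention)."""
    return M.reshape(-1, order="F")

def ivec(v, shape):
    """Inverse of vec: reshape a vector back into a matrix of the given shape."""
    return v.reshape(shape, order="F")

def spatial_grad(S, A):
    """Gradient of ||A vec(S)||^2 with respect to S, arranged as a matrix."""
    return ivec(2.0 * A.T @ A @ vec(S), S.shape)

rng = np.random.default_rng(1)
S = rng.normal(size=(3, 4))
A = rng.normal(size=(7, S.size))   # random stand-in for the spatial operator (assumed)

def spatial_cost(X):
    return np.sum((A @ vec(X)) ** 2)

# Central finite differences, one entry of S at a time.
h = 1e-6
numeric = np.zeros_like(S)
for idx in np.ndindex(*S.shape):
    step = np.zeros_like(S)
    step[idx] = h
    numeric[idx] = (spatial_cost(S + step) - spatial_cost(S - step)) / (2.0 * h)

spatial_err = np.max(np.abs(spatial_grad(S, A) - numeric))
```

The agreement only holds when `vec` and `ivec` use the same ordering as the one assumed when $A$ is constructed, which is why the convention is fixed in one place.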

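The solution to the convex optimization problem lies where the gradient vanishes, $g(S) = 0$:

$$\frac{\partial \|E(W - RS)\|_F^2}{\partial S} + \lambda_1 \frac{\partial \|HS\|_F^2}{\partial S} + \lambda_2 \frac{\partial \|A\,\mathrm{vec}(S)\|_F^2}{\partial S} = 0, \tag{4.18}$$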

As discussed above, we need to vectorize the function to get a consistent solution to g(S) = 0. According to the Kronecker expression of matrix vectorization:

$$\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X), \tag{4.19}$$
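The identity is easy to confirm numerically. The sketch below uses random matrices of arbitrary (assumed) sizes and column-stacking vectorization, and also checks the special case $\mathrm{vec}(MS) = (I_P \otimes M)\,\mathrm{vec}(S)$, which is the form needed when a matrix multiplies $S$ from the left.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization, the convention under which the identity holds."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 4))
X = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 2))

# General identity: vec(A X B) = (B^T kron A) vec(X).
identity_holds = np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))

# Special case with B = I: vec(A X) = (I_P kron A) vec(X), with P the number of columns of X.
P = X.shape[1]
special_case_holds = np.allclose(vec(A @ X), np.kron(np.eye(P), A) @ vec(X))
```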

we reach the following closed-form solution:

$$\left[ I_P \otimes (R^T E^T E R) + \lambda_1 I_P \otimes (H^T H) + \lambda_2 A^T A \right] \mathrm{vec}(S) = \mathrm{vec}(R^T E^T E W), \tag{4.20}$$

Due to the large number of points in the dense NRSfM problem, the solution above is an extremely large linear system. Its size goes beyond the capability of ordinary CPUs. Since the cost function is convex and smooth, we instead solve it iteratively with a more efficient gradient descent method, using the gradient:

$$g(S) = 2R^T E^T E R S - 2R^T E^T E W + 2\lambda_1 H^T H S + 2\lambda_2\,\mathrm{ivec}\left((A^T A)\,\mathrm{vec}(S)\right), \tag{4.21}$$

where ivec denotes the inverse of the vectorization operator, which transforms a vector back into a matrix. In the iterative gradient descent process, the shape at iteration $i+1$ is calculated as follows:

$$S^{(i+1)} = S^{(i)} - \mu\, g(S^{(i)}), \tag{4.22}$$

where $\mu$ is the step size, which is increased at each iteration. Gradient descent stops when the cost function of $S$ reaches a minimum value.
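On a problem small enough to form the Kronecker system explicitly, the direct solution and the gradient descent iteration can be compared. The following is a sketch under stated assumptions: the sizes, the random stand-ins for $R$, $W$, $E$ and $A$, and the first-difference $H$ are illustrative, and a fixed step size $1/L$ (with $L$ the largest eigenvalue of the Hessian) is swapped in for the increasing step used in our system, purely to keep the example short.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def ivec(v, shape):
    return v.reshape(shape, order="F")

rng = np.random.default_rng(3)
rows_w, rows_s, n_points = 4, 6, 5          # toy sizes (assumed)
lam1, lam2 = 1e-3, 1.0
R = rng.normal(size=(rows_w, rows_s))
W = rng.normal(size=(rows_w, n_points))
E = np.diag(rng.uniform(0.5, 2.0, size=rows_w))
H = np.diff(np.eye(rows_s), axis=0)         # first-difference operator (assumed)
A = rng.normal(size=(3 * rows_s * n_points, rows_s * n_points))  # random stand-in

# Direct solution of the vectorized linear system.
I_P = np.eye(n_points)
M = (np.kron(I_P, R.T @ E.T @ E @ R)
     + lam1 * np.kron(I_P, H.T @ H)
     + lam2 * A.T @ A)
b = vec(R.T @ E.T @ E @ W)
S_closed = ivec(np.linalg.solve(M, b), (rows_s, n_points))

def g(S):
    """Gradient of the reweighted cost in matrix form."""
    return (2.0 * R.T @ E.T @ E @ (R @ S - W)
            + 2.0 * lam1 * H.T @ H @ S
            + 2.0 * lam2 * ivec(A.T @ A @ vec(S), S.shape))

# Fixed-step gradient descent; the Hessian is 2*M, so L = 2 * lambda_max(M).
mu = 1.0 / (2.0 * np.linalg.eigvalsh(M).max())
S_gd = np.zeros((rows_s, n_points))
for _ in range(3000):
    S_gd = S_gd - mu * g(S_gd)

gap = np.max(np.abs(S_gd - S_closed))
grad_at_closed = np.max(np.abs(g(S_closed)))
```

The system matrix here is only 30 x 30; for a dense sequence its side length is the number of entries of $S$, which is what makes the direct solve impractical and motivates the iterative scheme.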

In summary, our algorithm includes a triple-loop optimization: given a solved camera matrix $R$, we obtain an initial $S$ from the temporal smoothness constraint. Then, in the outer loop, gradient descent is used to iteratively update the shape, while in the inner loop different descent steps are tested to find a local minimum of the cost function. The algorithm is summarized in Algorithm 2.


Require: 2D feature tracks $W$ (may be corrupted); rotation matrix $R$ solved from $W$
Initialize: 3D shape $S^{(0)}$ obtained from temporal smoothness.
while not converged do
    1) Calculate the diagonal matrix $E^{(t)}$ with $E_{ii} = 1 / \|W_i - R_i S_i^{(t-1)}\|$
    2) Calculate the cost function of $S^{(t)}$:
       $F(S^{(t)}) = \|E^{(t)}(W - RS^{(t)})\|_F^2 + \lambda_1 \|HS^{(t)}\|_F^2 + \lambda_2 \|A\,\mathrm{vec}(S^{(t)})\|_F^2$
    while cost function decreases do
        1) Increase $\mu$.
        2) Calculate $F(S_{i+1}^{(t)})$, where $S_{i+1}^{(t)} = S_i^{(t)} - \mu\, g(S_i^{(t)})$.
    end while
end while
Ensure: Non-rigid shape $S$.

Algorithm 2: Dense NRSfM based on the spatial-temporal constraint.
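The loop structure of Algorithm 2 can be sketched on toy data as follows. This is an illustrative reading rather than our implementation: the function name `algorithm2_sketch`, the per-row weights with a clamp `eps`, the initial step `mu0`, the doubling factor `growth`, the fixed number of outer iterations, the zero initialization and the random stand-ins for $R$, $W$, $H$ and $A$ are all assumptions.

```python
import numpy as np

def vec(M):
    return M.reshape(-1, order="F")

def ivec(v, shape):
    return v.reshape(shape, order="F")

def cost(S, W, R, E, H, A, lam1, lam2):
    return (np.sum((E @ (W - R @ S)) ** 2)
            + lam1 * np.sum((H @ S) ** 2)
            + lam2 * np.sum((A @ vec(S)) ** 2))

def grad(S, W, R, E, H, A, lam1, lam2):
    return (2.0 * R.T @ E.T @ E @ (R @ S - W)
            + 2.0 * lam1 * H.T @ H @ S
            + 2.0 * lam2 * ivec(A.T @ A @ vec(S), S.shape))

def reweight(S, W, R, eps=1e-3):
    """Diagonal E with E_ii = 1 / ||row i of (W - R S)||, clamped by eps (assumed)."""
    row_norms = np.linalg.norm(W - R @ S, axis=1)
    return np.diag(1.0 / np.maximum(row_norms, eps))

def algorithm2_sketch(W, R, H, A, S0, lam1=1e-3, lam2=1.0,
                      n_outer=10, mu0=1e-6, growth=2.0, max_inner=60):
    S = S0.copy()
    history = []                       # accepted costs, one list per outer iteration
    for _ in range(n_outer):           # outer loop: recompute the weights E
        E = reweight(S, W, R)
        args = (W, R, E, H, A, lam1, lam2)
        current = cost(S, *args)
        costs = [current]
        mu = mu0
        for _ in range(max_inner):     # inner loop: grow mu while the cost decreases
            trial = S - mu * grad(S, *args)
            trial_cost = cost(trial, *args)
            if trial_cost >= current:
                break                  # step too large: keep the last accepted shape
            S, current = trial, trial_cost
            costs.append(current)
            mu *= growth
        history.append(costs)
    return S, history

rng = np.random.default_rng(5)
rows_w, rows_s, n_points = 4, 6, 5     # toy sizes (assumed)
R = rng.normal(size=(rows_w, rows_s))
S_true = rng.normal(size=(rows_s, n_points))
W = R @ S_true + 0.01 * rng.normal(size=(rows_w, n_points))
H = np.diff(np.eye(rows_s), axis=0)
A = 0.1 * rng.normal(size=(3 * rows_s * n_points, rows_s * n_points))
S0 = np.zeros((rows_s, n_points))      # placeholder for the temporal-smoothness initialization
S_est, history = algorithm2_sketch(W, R, H, A, S0)
```

Within each outer iteration the recorded costs decrease strictly by construction, because a trial shape is accepted only when it lowers the reweighted cost; the sketch makes no claim about how many outer iterations a real sequence needs.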

4.6 Experimental Results

Rotation matrix. Our algorithm requires a pre-computed rotation matrix for convex optimization. Here we use the trace-norm minimization method from [7] to solve for the camera rotations. This method scales from the sparse case to the dense case, because the number of points is irrelevant to the solving process. Instead, the only information required is the dimensionality of the shape, $K$. We manually picked the optimal $K$ for each sequence in our experiments.

Systematic experiments. To evaluate our method against other state-of-the-art methods, we picked 4 dense sequences from Garg et al.'s dense synthetic benchmark [3], and 3 sequences extracted from real-world videos [3]. Each sequence contains a 2D correspondence matrix and a quad mesh for neighborhood assignment. These sequences have over 20,000 trajectories forming dense surfaces, which makes the problem much more challenging than the sparse scenario.

Table 4.1: Quantitative evaluation on 4 synthetic face sequences (average RMS 3D reconstruction error).

Dataset    PTA       MP        DV        Ours
Seq1       0.2431    0.2575    0.0531    0.0636
Seq2       0.0988    0.0644    0.0457    0.0569
Seq3       0.0596    0.0682    0.0346    0.0374
Seq4       0.0877    0.0772    0.0379    0.0428
Average    0.1223    0.1168    0.0428    0.0502

Synthetic sequences: 4 synthetic human face sequences with changing expressions, with 3D ground truth and generated rotations. Sequences 1 and 2 have 10 frames and sequences 3 and 4 have 99 frames.


Real sequences: 3 sequences obtained from real videos, each containing a deforming Face, Back and Heart, respectively.

(a) Synthetic Face 1 (b) Synthetic Face 2 (c) Synthetic Face 3 (d) Synthetic Face 4

Figure 4.9: Results on the synthetic face sequences using spatial-temporal constraints. Red: ground truth; blue: 3D reconstruction result. Parameters used: $\lambda_1 = 10^{-3}$, $\lambda_2 = 1$. Top row: front view. Bottom row: side view. The last frame is used to present the results.

On these sequences, after solving for the rotation matrix through trace-norm minimization as in [7], we first enforce a temporal smoothness constraint on the 3D shape to obtain a smooth non-rigid shape as an initialization, and then obtain our results iteratively by optimizing the cost function with spatial-temporal constraints. Note that on sequence 1, where the correct rotation is difficult to obtain, we instead enforced a rigid shape (more accurate than temporal smoothness) to bring the initialization closer to the optimal shape. This is because the cost function decreases slowly long before true convergence, and we stop the iteration at an approximate convergence point. The parameters are set to $\lambda_1 = 10^{-3}$, $\lambda_2 = 1$.

On synthetic sequences, the results of our method are shown in Figure 4.9. We overlay the ground truth shape in red and the 3D reconstruction in blue for direct comparison. These figures show that our method reconstructs the 3D object quite accurately. Table 4.1 shows the quantitative evaluation of our method along with several other methods, including Trajectory Basis (PTA) [9], Metric Projection (MP) [8] and the Dense Variational method (DV) [3]. Among these methods, DV is the state-of-the-art dense NRSfM method, and PTA and MP are sparse methods that are scalable to the dense case. As shown in the table, our method achieves performance competitive with the Dense Variational approach, and outperforms the other 2 methods on all sequences. In particular, on sequence 1, which is the most challenging because of its small camera rotation, our method achieves good performance while both PTA and MP fail to reconstruct the object.


(a) Face (b) Back (c) Heart

Figure 4.10: Top row: real 2D videos from left to right: Face, Back and Heart, respectively. Middle and bottom row: results of dense sequences obtained by spatial-temporal smoothness. Sub-figure (a) to (c) are the front views of the respective sequences, and (d) to (f) are the side views. The last frame is used to present the results.

On real sequences, the results obtained by our method are shown in Figure 4.10. As shown in the figures, our method produces reasonable results on the Face and Back sequences, while on the challenging Heart sequence, which has both large deformations and little rotation, our result appears too flat. This to some extent emphasizes the importance of a correct rotation matrix.

(a) Data with noise (b) Data with outliers

Figure 4.11: (a) Curves of 3D error on synthetic sequences with noise. Noise ratios are selected at 0%, 1%, 2%, 3%, 4%, and 5%. (b) Curves of 3D error on synthetic sequences with outliers. Outlier ratios are selected at 0%, 2%, 4%, 6%, 8%, and 10%.

(a) Input W (b) Temporal Smooth (c) $\lambda_1 = 10^{-3}$, $\lambda_2 = 1$ (d) $\lambda_1 = 10^{-3}$, $\lambda_2 = 5$

(e) Temporal Smooth (f) $\lambda_1 = 10^{-3}$, $\lambda_2 = 1$ (g) $\lambda_1 = 10^{-3}$, $\lambda_2 = 5$

Figure 4.12: Results on noisy $W$ with $\sigma_n = 0.02 \max\{W\}$ under different parameters. Sub-figures (a) to (d) are the front views of the face sequence, and (e) to (g) are the side views. The 119th frame is used to present the results.

Dense input with noise: As stated in previous sections, assuming a smooth 3D surface, our spatial smoothness constraint encourages local smoothness, hence increasing the accuracy and definition. To evaluate the performance of our method, we add Gaussian noise to the 2D input images, with standard deviation $\sigma_n = r \max\{|W|\}$, where $r$ is the noise ratio ranging from 0.01 to 0.05. We take the average of 5 experiments for each setting to ensure an accurate result.
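The noise protocol can be sketched as below. The helper name `add_gaussian_noise`, the random stand-in for the measurement matrix and its size are assumptions made for illustration; only the noise level $\sigma_n = r \max\{|W|\}$, the five ratios and the five trials per ratio follow the description above.

```python
import numpy as np

def add_gaussian_noise(W, ratio, rng):
    """Add zero-mean Gaussian noise with sigma = ratio * max|W| to every entry of W."""
    sigma = ratio * np.max(np.abs(W))
    noisy = W + rng.normal(loc=0.0, scale=sigma, size=W.shape)
    return noisy, sigma

rng = np.random.default_rng(4)
W_clean = rng.uniform(-1.0, 1.0, size=(20, 50))   # stand-in for a 2F x P track matrix (assumed)
noise_ratios = [0.01, 0.02, 0.03, 0.04, 0.05]
n_trials = 5

# One list of (noisy W, sigma) pairs per noise ratio; the reconstruction error
# would be averaged over the n_trials entries of each list.
trials = {r: [add_gaussian_noise(W_clean, r, rng) for _ in range(n_trials)]
          for r in noise_ratios}
```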

Figure 4.11(a) shows the performance of our method under different noise ratios on 4 synthetic sequences. It shows that even at large noise ratios, the 3D error of our method is still kept at a low level.

Dense input with outliers: To evaluate the capability of dealing with outliers, we performed an experiment with the following scenario: a certain fraction of the points in the video ($FP$ points in total) are set to random positions. The outlier ratio is 2%, 4%, 6%, 8% and 10%, respectively. We calculate the final error as the average of 5 trials, in order to obtain a statistically reliable result.
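One possible way to generate such corrupted input is sketched below. Several details are assumptions, since they are not specified above: the helper name `add_outliers`, a $2F \times P$ layout of $W$ with the x and y rows of each frame interleaved, and drawing the random positions uniformly within the value range of $W$.

```python
import numpy as np

def add_outliers(W, ratio, rng):
    """Move a fraction `ratio` of the F*P tracked points to random positions."""
    n_frames, n_points = W.shape[0] // 2, W.shape[1]
    n_total = n_frames * n_points
    n_out = int(round(ratio * n_total))
    # Pick distinct (frame, point) pairs to corrupt.
    chosen = rng.choice(n_total, size=n_out, replace=False)
    frames, points = np.unravel_index(chosen, (n_frames, n_points))
    corrupted = W.copy()
    lo, hi = W.min(), W.max()
    corrupted[2 * frames, points] = rng.uniform(lo, hi, size=n_out)      # x coordinates
    corrupted[2 * frames + 1, points] = rng.uniform(lo, hi, size=n_out)  # y coordinates
    mask = np.zeros((n_frames, n_points), dtype=bool)
    mask[frames, points] = True
    return corrupted, mask

rng = np.random.default_rng(6)
W_clean = rng.uniform(-1.0, 1.0, size=(20, 50))   # stand-in: F = 10 frames, P = 50 points
W_out, outlier_mask = add_outliers(W_clean, 0.10, rng)
# Expand the per-point mask to the 2F x P layout to compare entries of W.
entry_mask = np.repeat(outlier_mask, 2, axis=0)
```

Returning the mask alongside the corrupted matrix makes it possible to confirm that only the selected points were modified.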

Figure 4.11(b) illustrates the performance of our method under different outlier ratios. As the outlier ratio increases, the 3D error increases slightly, remaining below 0.1 for all synthetic sequences. The error curves are close to linear, which further supports the robustness of our method.

Extensive analysis. In previous experiments we fixed the spatial parameter to 1. Here we show the impact of different values of the spatial parameter on the real sequence Face with noise. Figure 4.12 shows that temporal smoothness alone can generally recover the 3D structure from a noisy $W$, but the result remains noisy. With the spatial constraint added, the 3D surface is much smoother, and smoothness increases as the spatial term is emphasized ($\lambda_2$ is increased). Note that an excessive spatial constraint will cause over-smoothing, which turns detailed structures (e.g., the contour of a nose) into smooth surfaces.

4.7 Summary

In this chapter, we provide a unified framework for non-rigid structure from motion based on dense point trajectories, which utilizes spatial and temporal constraints to regularize the under-constrained problem. Furthermore, the cost function has been robustified to deal with real-world noise and outliers. Our method reaches performance competitive with the state-of-the-art method. The implementation of our method only involves solving a series of least squares problems, thus making dense NRSfM easy.

Chapter 5 Conclusion and Future work

5.1 Conclusion

In this thesis, we give a summary of the development of NRSfM, and address the problems of current NRSfM methods in both sparse and dense scenarios. We propose a shape clustering method for sparse NRSfM, and a spatial-temporal constraint method to deal with dense NRSfM. Both of our methods give state-of-the-art performance on NRSfM benchmarks.

In Chapter 3, for sparse NRSfM, we first review the "reconstructability" concept and extend it from trajectory vs. known camera motion to general shape complexity vs. unknown motion complexity. Then we propose a similarity clustering method to divide a long-term, complex motion into several simple motions, reducing the rank of each subsequence in order to lower shape complexity and raise reconstructability. Our method is simple, effective, and can be directly implemented on top of existing baseline algorithms.

In Chapter 4, we propose a robust spatial-temporal constraint for dense NRSfM using convex optimization. We give a step-by-step analysis, starting by extending the temporal constraint from sparse to dense, then adding a spatial constraint that enforces a Laplacian filter, and finally robustifying the reprojection error by replacing the L2-norm with the L1-norm. Our method gives a simple and elegant convex optimization that can effectively solve dense NRSfM problems.

In document Shape Clustering and Spatial temporal Constraint for Non rigid Structure from Motion (Page 54-61)