Once the point-sets are in mixture model form, the registration problem can be posed as the problem of minimising a discrepancy measure between mixtures, as shown in Figure 4.7. If the point-sets are well-represented by the Gaussian mixtures, the transformation that aligns the GMMs will correspond to the transformation that aligns the point-sets. As discussed in Section 3.4.6, the $L_2$ distance between Gaussian mixtures [Jian and Vemuri, 2011] has favourable properties for the geometric alignment problem. It can be expressed in closed form and implemented efficiently, since it avoids numerical approximations of the integral. More critically, it has an estimator that is inherently robust to outliers [Scott, 2001], unlike the maximum likelihood estimator that minimises the Kullback-Leibler divergence. See Section 3.4.6 for a detailed discussion on the robustness of the $L_2E$ estimator that minimises the $L_2$ distance between probability densities.
Figure 4.7: Two misaligned 1D Gaussian mixtures (left), generated from partially-overlapping point-sets, are aligned by minimising the distance between mixtures (right).
The objective function for the $L_2$ distance between Gaussian mixtures (3.75) was derived in Section 3.4.6 for the general case. In this chapter, the Gaussian covariances are constrained to be isotropic and identical, due to the use of an SVM in the learning procedure. This is a standard choice for many GMA approaches that balances the expressiveness of the mixture model against the evaluation speed of the objective function. Let $\theta_k = \{\boldsymbol{\mu}_{ki}, \sigma_k^2, \phi_{ki}\}_{i \in SV_k}$ be the parameter set of an $n_k$-component SVGM with index set $SV_k$, means $\boldsymbol{\mu}_{ki}$, variances $\sigma_k^2$, and mixture weights $\phi_{ki} > 0$, where $\sum_{i \in SV_k} \phi_{ki} = 1$. Then the $L_2$ distance between Gaussian mixtures, up to a constant factor $(2\pi(\sigma_1^2 + \sigma_2^2))^{-n/2}$ and addition by a constant, for a rotation $\mathbf{R} \in SO(n)$ and a translation $\mathbf{t} \in \mathbb{R}^n$ ($n = 2$ or $3$) is given by the objective function

$$f(\mathbf{R}, \mathbf{t}) = -\sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \phi_{1i} \phi_{2j} \exp\left( -\frac{\lVert \mathbf{R}\boldsymbol{\mu}_{1i} + \mathbf{t} - \boldsymbol{\mu}_{2j} \rVert_2^2}{2(\sigma_1^2 + \sigma_2^2)} \right). \tag{4.12}$$
This can be expressed in the form of a discrete Gauss transform with a computational complexity of $O(n_1 n_2)$ or a fast Gauss transform [Greengard and Strain, 1991] with a complexity of $O(n_1 + n_2)$.
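As an illustration of the discrete Gauss transform form of (4.12), the following minimal NumPy sketch evaluates the objective directly in $O(n_1 n_2)$ time. The function name `l2_objective`, its argument names and the toy data are assumptions made for this example and are not part of the original implementation.

```python
import numpy as np


def l2_objective(R, t, mu1, phi1, var1, mu2, phi2, var2):
    """Evaluate f(R, t) from (4.12) by a discrete Gauss transform, O(n1 * n2).

    mu1: (n1, n) means, phi1: (n1,) weights, var1: scalar variance of mixture 1;
    mu2, phi2, var2 likewise for mixture 2 (hypothetical argument names).
    """
    moved = mu1 @ R.T + t                          # row i is R @ mu_1i + t
    diff = moved[:, None, :] - mu2[None, :, :]     # pairwise differences, (n1, n2, n)
    sq_dist = np.sum(diff ** 2, axis=2)            # squared Euclidean norms
    kernel = np.exp(-sq_dist / (2.0 * (var1 + var2)))
    return -float(phi1 @ kernel @ phi2)            # weighted double sum, negated


# Toy data for illustration only: the same mixture used as both inputs.
mu = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
phi = np.full(3, 1.0 / 3.0)
var = 0.25

f_aligned = l2_objective(np.eye(2), np.zeros(2), mu, phi, var, mu, phi, var)
f_shifted = l2_objective(np.eye(2), np.array([0.5, 0.0]), mu, phi, var, mu, phi, var)
```

For identical mixtures, the aligned configuration attains the smallest objective value, so `f_aligned` is lower than `f_shifted`. A fast Gauss transform would replace the explicit pairwise kernel matrix and is not shown here.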
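The gradient vector is derived in the same way as in Jian and Vemuri [2011]. Let $\mathbf{M}_0 = [\boldsymbol{\mu}_{1,1}, \dots, \boldsymbol{\mu}_{1,n_1}]^\top$ be the $n_1 \times n$ matrix of the mean vectors from the GMM parametrised by $\theta_1$ and $\mathbf{M} = T(\mathbf{M}_0, \boldsymbol{\lambda})$ be the matrix after applying a transformation parametrised by $\boldsymbol{\lambda}$. Using the chain rule, the gradient is $\frac{\partial f}{\partial \boldsymbol{\lambda}} = \frac{\partial f}{\partial \mathbf{M}} \frac{\partial \mathbf{M}}{\partial \boldsymbol{\lambda}}$. Let $\mathbf{G} = \frac{\partial f}{\partial \mathbf{M}}$ be an $n_1 \times n$ matrix, which can be found while evaluating the objective function by

$$\mathbf{G}_i = -\frac{1}{\sigma_1^2 + \sigma_2^2} \sum_{j=1}^{n_2} \left( \mathbf{R}\boldsymbol{\mu}_{1i} + \mathbf{t} - \boldsymbol{\mu}_{2j} \right) f_{ij}(\mathbf{R}, \mathbf{t}) \tag{4.13}$$

where $\mathbf{G}_i$ is the $i$th row of $\mathbf{G}$ and $f_{ij}$ is the $(i, j)$th summand of $f$, including its sign. For a rigid motion, $\mathbf{M} = \mathbf{M}_0 \mathbf{R}^\top + \mathbf{1}_{n_1} \mathbf{t}^\top$ and the gradients with respect to each motion parameter are given by

$$\frac{\partial f}{\partial \mathbf{t}} = \mathbf{G}^\top \mathbf{1}_{n_1} \tag{4.14}$$

$$\frac{\partial f}{\partial r_i} = \mathbf{1}_n^\top \left( \mathbf{G}^\top \mathbf{M}_0 \circ \frac{\partial \mathbf{R}}{\partial r_i} \right) \mathbf{1}_n \tag{4.15}$$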
where $\circ$ is the element-wise Hadamard product and $r_i$ are the elements parametrising $\mathbf{R}$: a rotation angle $\alpha$ for 2D rotations and a unit quaternion for 3D rotations. The objective function is smooth, differentiable and convex in the neighbourhood of the optimal motion parameters, and therefore gradient-based numerical optimisation methods can be used, such as nonlinear conjugate gradient or quasi-Newton methods. For this implementation, an interior-reflective Newton method was selected [Coleman and Li, 1996], being time and memory efficient and scaling well with the number of Gaussian components. For the quaternion parametrisation of 3D rotations, the unit-norm constraint was enforced by normalising the quaternion after each update, which projects it back to the space of valid rotations. An alternative formulation using Lagrange multipliers was also implemented; however, it converged slightly less frequently than the normalisation method. See Section 3.5 for a more detailed discussion about this constraint and the alternatives that can be used to enforce it.
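The gradient expressions (4.13), (4.14) and (4.15) can be checked numerically. The sketch below is a minimal illustration for the 2D case only, where the rotation is parametrised by a single angle $\alpha$; the function names, argument names and random test data are assumptions made for this example, and the quaternion case is not covered.

```python
import numpy as np


def rot2d(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])


def objective_and_gradient(alpha, t, mu1, phi1, var1, mu2, phi2, var2):
    """Return f from (4.12) and its gradient with respect to (alpha, t_x, t_y)."""
    var_sum = var1 + var2
    diff = (mu1 @ rot2d(alpha).T + t)[:, None, :] - mu2[None, :, :]   # (n1, n2, 2)
    # f_ij: summands of f, including the leading minus sign.
    f_ij = -np.outer(phi1, phi2) * np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * var_sum))
    G = -np.einsum("ijk,ij->ik", diff, f_ij) / var_sum                # (4.13), (n1, 2)
    grad_t = G.sum(axis=0)                                            # (4.14): G^T 1
    dR = np.array([[-np.sin(alpha), -np.cos(alpha)],
                   [np.cos(alpha), -np.sin(alpha)]])                  # dR / d alpha
    grad_alpha = np.sum((G.T @ mu1) * dR)                             # (4.15)
    return f_ij.sum(), np.concatenate(([grad_alpha], grad_t))


def numerical_gradient(x, args, eps=1e-6):
    """Central finite differences over x = (alpha, t_x, t_y)."""
    grad = np.zeros(3)
    for k in range(3):
        step = np.zeros(3)
        step[k] = eps
        f_plus, _ = objective_and_gradient(x[0] + step[0], x[1:] + step[1:], *args)
        f_minus, _ = objective_and_gradient(x[0] - step[0], x[1:] - step[1:], *args)
        grad[k] = (f_plus - f_minus) / (2.0 * eps)
    return grad


# Random toy mixtures for illustration only.
rng = np.random.default_rng(0)
mu1, mu2 = rng.normal(size=(5, 2)), rng.normal(size=(4, 2))
phi1, phi2 = np.full(5, 0.2), np.full(4, 0.25)
args = (mu1, phi1, 0.3, mu2, phi2, 0.5)

x = np.array([0.3, 0.1, -0.2])
_, analytic = objective_and_gradient(x[0], x[1:], *args)
numeric = numerical_gradient(x, args)
max_abs_error = float(np.max(np.abs(analytic - numeric)))
```

Agreement between `analytic` and `numeric` to within finite-difference error indicates that the sign conventions of (4.13) and the Hadamard-product form of (4.15) are consistent with (4.12).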
Although the objective function is locally convex, it is rarely convex over the entire transformation domain. As a result, this approach is susceptible to local optima, as with all local optimisation methods. This is particularly problematic for alignment problems with large motions and 3D data with symmetries or near-symmetries. There are many heuristic approaches that can alleviate this problem. Since a scale parameter is an input to the algorithm, a multi-resolution approach can be adopted. Like simulated annealing, the scale parameter γ is increased at each iteration, with the algorithm initialised by the transformation found at the previous scale. This coarse-to-fine strategy is appropriate because the objective function is smoother for smaller values of γ and approaches the ICP objective function as γ increases. Another strategy is to use random-start local search, initialising the algorithm at randomly sampled points in the transformation domain. However, random restarts can be applied to any local optimisation algorithm, so they were not used here in order to keep the comparison fair.
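The coarse-to-fine strategy can be sketched for the 2D case as follows. This is a minimal illustration under stated assumptions, not the implementation used in this chapter: it assumes the usual RBF relation $\sigma^2 = 1/(2\gamma)$ with the same γ for both mixtures, it substitutes SciPy's BFGS quasi-Newton optimiser for the interior-reflective Newton method, it uses fixed mixtures rather than retraining an SVM at each scale, and the names `coarse_to_fine`, `gamma0`, `delta` and `levels` are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rot2d(alpha):
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c, -s], [s, c]])


def objective(x, mu1, phi1, mu2, phi2, var_sum):
    """f(R, t) from (4.12) for a 2D rigid motion x = (alpha, t_x, t_y)."""
    moved = mu1 @ rot2d(x[0]).T + x[1:]
    sq_dist = np.sum((moved[:, None, :] - mu2[None, :, :]) ** 2, axis=2)
    return -float(phi1 @ np.exp(-sq_dist / (2.0 * var_sum)) @ phi2)


def coarse_to_fine(mu1, phi1, mu2, phi2, gamma0=0.25, delta=4.0, levels=4):
    """Minimise f at increasing gamma, warm-starting each level from the last."""
    x = np.zeros(3)                      # identity rotation and zero translation
    gamma = gamma0
    for _ in range(levels):
        # Assumed mapping: sigma_k^2 = 1 / (2 * gamma), so the sum is 1 / gamma.
        var_sum = 1.0 / gamma
        result = minimize(objective, x, args=(mu1, phi1, mu2, phi2, var_sum),
                          method="BFGS")
        x = result.x
        gamma *= delta                   # anneal: larger gamma gives a finer scale
    return x


# Toy example: an asymmetric L-shaped set of means and a rigidly moved copy.
mu1 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
phi = np.full(len(mu1), 1.0 / len(mu1))
true_alpha, true_t = 0.4, np.array([0.5, -0.3])
mu2 = mu1 @ rot2d(true_alpha).T + true_t

estimate = coarse_to_fine(mu1, phi, mu2, phi)   # (alpha, t_x, t_y)
```

In this noise-free example the true motion minimises the objective at every scale, so the early, smoother levels mainly serve to widen the basin of convergence before the final levels refine the estimate.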
The Support Vector Registration (SVR) algorithm is outlined in Algorithm 4.1. The initial rotation and translation are typically the identity rotation and translation unless prior information is available from odometry, GPS or some other source. The training parameters ν and γ can be estimated, using (4.6) for γ, or by cross-validation on a training set. For most applications, the γ value is identical for both point-sets, although this is not mandatory.
Algorithm 4.1 Support Vector Registration (SVR): a robust algorithm for point-set registration using support vector–parametrised Gaussian mixtures.
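Input: two point-sets $P_k = \{\mathbf{p}_{ki}\}_{i=1}^{n_k}$ with $k = 1, 2$; initial rotation $\mathbf{R}_0$; initial translation $\mathbf{t}_0$; initial scale parameters $\gamma_1, \gamma_2$; one-class SVM parameter $\nu$
Output: rotation $\mathbf{R}$ and translation $\mathbf{t}$ such that $P_1$ transformed by $(\mathbf{R}, \mathbf{t})$ is well-aligned with $P_2$
1: Initialise rotation and translation: $\mathbf{R} \leftarrow \mathbf{R}_0$, $\mathbf{t} \leftarrow \mathbf{t}_0$
2: repeat
3:   Train an SVM from each point-set: $\theta_k^{SVM} = \{\mathbf{p}_{ki}, \gamma_k, \alpha_{ki}\}_{i \in SV_k} \leftarrow \mathrm{trainSVM}(P_k, \gamma_k, \nu)$
4:   Map the SVMs to GMMs using (4.9), (4.10) and (4.11): $\theta_k = \{\boldsymbol{\mu}_{ki}, \sigma_k^2, \phi_{ki}\}_{i \in SV_k} \leftarrow \mathrm{mapToGMM}(\theta_k^{SVM})$
5:   Optimise the objective function $f(\mathbf{R}, \mathbf{t})$ (4.12) using the gradients (4.14), (4.15), and update the transformation parameters: $(\mathbf{R}, \mathbf{t}) \leftarrow \arg\min_{\mathbf{R}, \mathbf{t}} f(\mathbf{R}, \mathbf{t})$
6:   Anneal the scale parameter: $\gamma \leftarrow \gamma\delta$
7: until change in function value or transformation parameters is below a threshold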