

2.1.2 Alignment Without Correspondences

When a set of correspondences is not available and is difficult to obtain, the problem becomes much more challenging, as identified by Hartley and Kahl [2009]. Methods that address this problem must jointly solve for the transformation and the correspondences between the sets of positional sensor data. However, most algorithms do not explicitly search over correspondence and transformation space simultaneously, instead assuming that the correspondences determine the transformation or vice versa. Moreover, the correspondences themselves need not be explicit; soft or probabilistic assignment is frequently applied.

In addition to being more challenging, the correspondence-free class of problems is more general than the class that assumes correspondences are known a priori, and can therefore be applied in more situations. Furthermore, solutions to the problem of alignment given correspondences, such as Horn’s method [Horn, 1987], can be used as time-efficient sub-algorithms in correspondence-free approaches once the current correspondence set is fixed. This helps to find a locally-optimal transformation before re-optimising over the correspondences.

The alignment of positional sensor data without correspondences has previously been addressed for 2D–2D and 3D–3D geometric matching problems [Besl and McKay, 1992; Fitzgibbon, 2003; Myronenko and Song, 2010; Jian and Vemuri, 2011; Irani and Raghavan, 1999; Aiger et al., 2008; Yang et al., 2016]. While the non-rigid alignment of deformable objects has received significant attention [Van Kaick et al., 2011], this review will focus on the 3- and 6-DoF rigid alignment problems in 2D and 3D respectively. The methods for aligning positional sensor data without correspondences can usefully be classified based on the type of optimisation. The following sections will examine approaches that employ local, global and globally-optimal search techniques.

Local Optimisation

The Iterative Closest Point (ICP) algorithm [Besl and McKay, 1992; Chen and Medioni, 1992; Zhang, 1994] is the dominant solution for positional sensor data alignment without correspondences due to its conceptual simplicity, ease of use and good performance. Given an initial transformation, the algorithm alternates between constructing a correspondence set under the current transformation and estimating the transformation given these correspondences, until convergence. The correspondence set is generated by choosing, for each point in one point-set, its Euclidean nearest neighbour in the other point-set. While in general the nearest neighbour is not the true corresponding point, ICP nonetheless often converges to a reasonable solution. The transformation is estimated by minimising the sum of squared distances between corresponding points, using a closed-form solution such as Horn’s method [Horn, 1987]. Usefully, ICP is able to work on raw sensor data in the form of point-sets, irrespective of their intrinsic properties, such as distribution, sampling density and noise intensity. However, the algorithm is limited by its assumption that closest-point pairs correspond, which fails when the point-sets are not coarsely aligned or the moving ‘model’ point-set is not a proper subset of the static ‘scene’ point-set. The latter occurs frequently, since differing sensor viewpoints and dynamic objects lead to occlusion and partial overlap. Moreover, this closest-point assumption makes ICP susceptible both to missing correspondences, which lead to incorrect data association, and to local minima, in which the optimisation becomes trapped, producing erroneous estimates without a reliable means of detecting failure.
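The two alternating ICP steps can be sketched as follows. This is a minimal point-to-point version in Python with NumPy. It assumes brute-force nearest-neighbour search (no k-d tree). The transformation step uses the SVD-based closed form (often attributed to Arun et al. and Kabsch), which attains the same least-squares optimum as Horn's quaternion method. The names `best_rigid_transform` and `icp` are illustrative and do not come from any particular library.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares R, t with dst ~ src @ R.T + t.

    SVD form; it reaches the same optimum as Horn's quaternion method
    for the sum of squared distances between paired points."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)                 # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.eye(src.shape[1])
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))    # forbid reflections
    R = Vt.T @ D @ U.T
    return R, mu_d - R @ mu_s

def icp(model, scene, iters=100, tol=1e-12):
    """Point-to-point ICP started from the identity.

    Returns R, t and the mean squared closest-point distance."""
    R, t = np.eye(model.shape[1]), np.zeros(model.shape[1])
    prev = np.inf
    for _ in range(iters):
        moved = model @ R.T + t
        # Correspondence step: brute-force Euclidean nearest neighbours.
        d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2[np.arange(len(model)), nn].mean()
        if prev - err < tol:          # no further improvement: converged
            break
        prev = err
        # Transformation step: closed-form fit to the current pairs.
        R, t = best_rigid_transform(model, scene[nn])
    return R, t, err
```

The correspondence step only ever selects the closest point. Started far from the true pose, this sketch can therefore settle in a local minimum, which is the failure mode described above.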

The large quantity of work published on ICP, its variants and other local registration techniques precludes a comprehensive review. For additional background, the reader is directed to surveys on ICP variants [Rusinkiewicz and Levoy, 2001; Pomerleau et al., 2013] and 3D point-set and mesh registration techniques [Castellani and Bartoli, 2012; Tam et al., 2013]. To improve the speed of the ICP algorithm, Nüchter et al. [2007] used a k-d tree and caching for closest-point search, Chen and Medioni [1992] proposed the point-to-plane distance that typically reduces the number of iterations required, and Fitzgibbon [2003] proposed the use of a distance transform for constant-time nearest neighbour look-up. For this, the closest points in one point-set are pre-computed for all grid centres of a discretised volume. While this pre-processing step can be time-consuming, it is amortised if many point-sets are to be aligned with the point-set that has been processed. To improve the robustness of ICP to outliers from occlusion and partial overlap, outlier rejection [Zhang, 1994; Granger and Pennec, 2002], trimming [Chetverikov et al., 2005], and robust error functions [Fitzgibbon, 2003] have been applied. These approaches perform robust statistical analysis of the residual errors, removing or reducing the influence of those most likely to be outlier correspondences. To enlarge the basin of convergence of ICP, Granger and Pennec [2002] proposed Expectation Maximisation ICP (EM-ICP), which used probabilistic correspondences with Gaussian weights and an annealing scheme on the variance. Minguez et al. [2005] observed that a small rotational displacement causes points far from the sensor to be displaced significantly from their correct correspondents. They therefore used a geometric distance measure for finding closest-point correspondences that simultaneously accounts for translational and rotational displacements. However, this distance measure prevents the use of data structures for expediting the search for nearest neighbours, such as a k-d tree or distance transform. Finally, to improve the speed, accuracy and basin of convergence of ICP, Fitzgibbon [2003] proposed Levenberg-Marquardt ICP (LM-ICP), applying the general-purpose LM optimisation algorithm [Moré, 1978]. This approach models registration as a general optimisation problem and is therefore quite versatile. It enables the use of robust error functions to attenuate the influence of points with large errors, and of distance transforms to compute the ICP error without establishing explicit point correspondences.
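A look-up table in the spirit of Fitzgibbon's [2003] distance transform can be sketched as below. The sketch makes four assumptions:

- a regular grid with cell size `cell`;
- a padding `margin` around the scene's bounding box;
- a brute-force pre-processing pass;
- clamping of queries that fall outside the volume.

The class name `ClosestPointGrid` and its parameters are hypothetical. The answer is only approximate, because each query returns the scene point closest to its cell centre rather than to the query itself.

```python
import numpy as np

class ClosestPointGrid:
    """Pre-computed nearest-neighbour look-up on a regular 2D or 3D grid.

    Each cell centre stores the index of its closest scene point, so a query
    costs one floor-and-index; the result is accurate to about the cell size."""

    def __init__(self, scene, cell=0.05, margin=0.5):
        self.scene = scene
        self.cell = cell
        self.lo = scene.min(axis=0) - margin
        hi = scene.max(axis=0) + margin
        self.shape = np.ceil((hi - self.lo) / cell).astype(int)
        # Centres of every cell of the discretised volume, in C order.
        axes = [self.lo[d] + cell * (np.arange(n) + 0.5) for d, n in enumerate(self.shape)]
        centres = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, scene.shape[1])
        # One-off (expensive) pre-processing: brute-force closest scene point.
        d2 = ((centres[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
        self.table = d2.argmin(axis=1).reshape(self.shape)

    def query(self, pts):
        """Indices of (approximately) closest scene points, O(1) per query."""
        idx = np.floor((pts - self.lo) / self.cell).astype(int)
        idx = np.clip(idx, 0, self.shape - 1)    # clamp points outside the volume
        return self.table[tuple(idx.T)]
```

The constructor is the expensive step: its cost grows with the number of cells times the number of scene points. Once built, the table can be reused for every model point-set aligned to the same scene, which is the amortisation argument made above.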

The family of probabilistic alignment approaches also seeks to improve the robustness of ICP to noise, outliers, and poor initialisations. Many of these approaches [Chui and Rangarajan, 2003; Myronenko and Song, 2010; Tsin and Kanade, 2004; Jian and Vemuri, 2011] can be used for both rigid and non-rigid registration, with non-rigid deformations modelled by thin-plate splines [Bookstein, 1989; Chui and Rangarajan, 2003] or Gaussian radial basis functions [Yuille and Grzywacz, 1989; Myronenko and Song, 2010]. Both Chui and Rangarajan [2003] and Myronenko and Song [2010] took a probabilistic approach to correspondence assignment using a Gaussian affinity matrix. Chui and Rangarajan [2003] proposed the Robust Point Matching algorithm that used soft assignment and deterministic annealing to alternately update the correspondences and estimate the transformation. Each point from one point-set is assumed to correspond to a weighted sum of the points from the other point-set using the kernelised pairwise distance affinity matrix. Myronenko and Song [2010] proposed the similar Coherent Point Drift algorithm that interpreted the alternating update strategy using the Expectation Maximisation (EM) framework [Dempster et al., 1977]. Horaud et al. [2011] extended this EM interpretation using the Expectation Conditional Maximisation (ECM) algorithm that shares the desirable convergence properties of EM but is more suitable for use with anisotropic covariances. However, these algorithms used maximum likelihood estimation, which is sensitive to outliers, and therefore required an additional Gaussian component to model outliers.
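The soft-assignment idea can be illustrated with a rigid, CPD-style EM loop. This is a minimal sketch under the following assumptions:

- isotropic Gaussians with a shared variance `sigma2`, centred on the transformed model points;
- no scaling;
- a uniform component of weight `w` to absorb outliers, following the published CPD formulation.

The function names are hypothetical. This is not a faithful reimplementation of Chui and Rangarajan [2003], Myronenko and Song [2010] or Horaud et al. [2011].

```python
import numpy as np

def soft_correspondences(X, Y, sigma2, w=0.1):
    """E-step: P[m, n] = posterior that scene point X[n] was generated by the
    Gaussian centred on (transformed) model point Y[m]."""
    N, D = X.shape
    M = Y.shape[0]
    d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)            # M x N
    K = np.exp(-d2 / (2.0 * sigma2))                                     # Gaussian affinity
    c = (2.0 * np.pi * sigma2) ** (D / 2) * (w / (1.0 - w)) * (M / N)   # uniform outlier term
    return K / (K.sum(axis=0, keepdims=True) + c)

def weighted_rigid_fit(X, Y, P):
    """M-step: R, t minimising sum_mn P[m, n] * ||X[n] - (R Y[m] + t)||^2."""
    Np = P.sum()
    mu_x = X.T @ P.sum(axis=0) / Np
    mu_y = Y.T @ P.sum(axis=1) / Np
    A = (X - mu_x).T @ P.T @ (Y - mu_y)           # weighted cross-covariance
    U, _, Vt = np.linalg.svd(A)
    C = np.eye(X.shape[1])
    C[-1, -1] = np.linalg.det(U @ Vt)             # forbid reflections
    R = U @ C @ Vt
    return R, mu_x - R @ mu_y

def em_rigid(X, Y, iters=100, w=0.1):
    """Alternate soft correspondences (E) and rigid updates (M) to align Y to X."""
    D = X.shape[1]
    R, t = np.eye(D), np.zeros(D)
    sigma2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum() / (D * len(X) * len(Y))
    for _ in range(iters):
        P = soft_correspondences(X, Y @ R.T + t, sigma2, w)
        R, t = weighted_rigid_fit(X, Y, P)
        d2 = (((Y @ R.T + t)[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        sigma2 = max((P * d2).sum() / (P.sum() * D), 1e-10)   # variance update
    return R, t
```

The variance starts large and shrinks as the point-sets align. Early iterations therefore behave like a coarse, heavily smoothed matching and later ones like nearly hard assignment, which plays a role similar to the annealing described above.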

A more versatile framework can be constructed by modelling both point-sets as probability distributions and minimising a discrepancy measure between them, obviating the need for establishing explicit point correspondences. Indeed, the ICP algorithm itself has been shown to be related to minimising the Kullback-Leibler divergence of two Gaussian Mixture Models (GMMs) [Jian and Vemuri, 2011]. Tsin and Kanade [2004] developed the Kernel Correlation algorithm that minimised an objective function that was proportional to the correlation of two kernel density estimates, implicitly modelling the point-sets as GMMs. In a similar way, Glaunes et al. [2004] modelled the point-sets as discrete distributions using weighted sums of Dirac measures and then estimated the optimal diffeomorphic transformation between the distributions. A more generic framework was proposed by Jian and Vemuri [2011] with the GMM Registration algorithm. It modelled the point-sets as GMMs with equally-weighted Gaussians centred at every point in the set with identical and isotropic covariances, and minimised the robust $L_2$ distance between densities. A very similar framework, the Normal Distributions Transform (NDT) method, was developed by Biber and Straßer [2003], Magnusson et al. [2007], and Stoyanov et al. [2012]. The method modelled the point-sets as structured GMMs with full data-driven covariances, by computing Gaussian parameters at each cell of a 3D grid, and one implementation minimised the $L_2$ distance between densities [Stoyanov et al., 2012]. The algorithm was shown to be faster and more robust to poor initial alignments than ICP [Magnusson et al., 2009]. While these $L_2$ methods are robust to outliers, they are not robust to some common sampling artefacts, including occlusions and variable sampling densities, due to the generative models used to construct the Gaussian mixtures.
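For the equally-weighted, isotropic GMMs used by Jian and Vemuri [2011], the squared $L_2$ distance has a closed form. The integral of the product of two Gaussians is itself a Gaussian evaluated at the difference of their means, with the covariances summed. The sketch below assumes one Gaussian per point, all sharing a variance `sigma2`, with weights 1/M and 1/N. The function names are hypothetical.

```python
import numpy as np

def gauss_overlap(A, B, sigma2):
    """Sum over point pairs of  integral N(x; a, s I) N(x; b, s I) dx,
    which equals N(a; b, 2 s I)."""
    D = A.shape[1]
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return (np.exp(-d2 / (4.0 * sigma2)) / (4.0 * np.pi * sigma2) ** (D / 2)).sum()

def gmm_l2_squared(A, B, sigma2):
    """Squared L2 distance between two equally weighted isotropic GMMs,
    one Gaussian per point: int f^2 - 2 int f g + int g^2."""
    wa, wb = 1.0 / len(A), 1.0 / len(B)
    return (wa * wa * gauss_overlap(A, A, sigma2)
            - 2.0 * wa * wb * gauss_overlap(A, B, sigma2)
            + wb * wb * gauss_overlap(B, B, sigma2))
```

Under a rigid transformation of one point-set, the two self-overlap terms do not change. Minimising this $L_2$ distance over the transformation is therefore equivalent to maximising the cross term, a kernel correlation between the two point-sets closely related to the objective of Tsin and Kanade [2004].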

The alignment solution proposed in Chapter 4 of this thesis, the Support Vector Registration (SVR) algorithm [Campbell and Petersson, 2015], belongs to this family of probabilistic approaches, exploiting the outlier robustness of the $L_2$ distance between probability densities. However, it also corrects a deficiency in existing approaches by considering robustness to sampling artefacts as a critical feature. This robustness is achieved by applying a discriminative model, a Support Vector Machine (SVM) classifier, to efficiently construct the Gaussian mixtures, which regularises the sampled points, creating a smooth, occlusion-resistant surface independent of point density and adaptive to local structural complexity. Robustness to occlusions and variable sampling densities improved the viewpoint-invariance of the models and therefore the alignment accuracy. However, while this and the other probabilistic methods are more robust to outliers and poor initialisations than ICP, they are still susceptible to local optima and are dependent on a good transformation initialisation. The next section explores those works that are less susceptible to local optima through the application of global optimisation techniques.

Global Optimisation

Global optimisation endeavours to avoid incorrect locally-optimal alignments by expanding the search domain to cover a much greater region in correspondence or transformation space. No guarantees are given by algorithms in this category that the global optimum will be attained, although some approaches do specify the probability that a certain number of iterations will find the optimum. In addition, while these approaches solve for both the transformation and the correspondences, they typically alternate between the two spaces rather than solving them jointly. As such, the approaches can be classified into those where the search is led by correspondence space search or by transformation space search.

Methods for which correspondence search leads can be divided into the hypothesise-and-test and the hypothesise-and-vote frameworks. Many of these approaches can also be applied to subsets of the original point-sets, such as those extracted by feature detectors, to reduce their runtime. For the family of hypothesise-and-test algorithms, also known as sample-and-verify algorithms, exhaustive search [Huttenlocher and Ullman, 1990] can be performed by hypothesising a transformation from all possible pairs or triplets of points, for 2D or 3D respectively, in each dataset. As discussed in Section 2.1.1, these are the minimal numbers of correspondences required to find the rigid transformation between two sets of positional sensor data. Each hypothesis is tested by transforming one point-set and measuring how well the point-sets align, using a geometric distance or counting the number of inliers. Clearly the complexity of exhaustive search is higher when correspondences are not available, since every point in one point-set could correspond to every point in the other point-set. For the 3D case with point-sets of sizes $M$ and $N$, the time complexity is $O(M^4 N^3 \log N)$ for correspondence sampling and transformation testing.
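The exhaustive-search complexity quoted above can be reconstructed by counting. The count assumes ordered triplets, and that each of the $M$ transformed points is tested with an $O(\log N)$ nearest-neighbour query (for instance with a k-d tree):

```latex
\underbrace{O(M^3)}_{\text{model triplets}}
\times \underbrace{O(N^3)}_{\text{candidate partner triplets}}
\times \underbrace{O(M \log N)}_{\text{testing one hypothesis}}
= O(M^4 N^3 \log N).
```

Two reductions follow from the same count. Replacing the enumeration of model triplets by a constant number of random samples removes a factor of $M^3$, giving $O(M N^3 \log N)$. Testing only a constant number of transformed points removes the remaining factor of $M$, giving $O(N^3 \log N)$.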

The RANSAC framework [Fischler and Bolles, 1981] can also be applied in the correspondence-free case, adding randomisation to the correspondence sampling step. Using a constant-sized set of random hypotheses for one point-set reduces the time complexity to $O(MN^3 \log N)$. That is, the runtime required for a high probability of success scales polynomially with the size of the input. Nonetheless, the time complexity is high, limiting the approach to datasets with a small number of points. Irani and Raghavan [1999] further proposed the randomisation of the transformation testing step, testing only a constant number of random points in the transformed set except when this initial test indicates a good-quality match. This reduces the time complexity to $O(N^3 \log N)$ for 3D data, although Irani and Raghavan [1999] only tested 2D datasets due to the still considerable time complexity. Chen et al. [1999] proposed improvements to the RANSAC framework for 3D alignment, including rigidity constraints and wide 3-point bases to reduce the number of potential correspondences and improve their robustness to noise and outliers.
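The randomised, correspondence-free hypothesise-and-test loop can be sketched in 2D, where two points suffice to fix a rigid transformation. The sketch works as follows:

- A constant number `n_bases` of random model point pairs is drawn.
- Each pair is hypothesised to correspond to every ordered pair of scene points.
- Pairs whose lengths disagree are pruned, a simple rigidity constraint in the spirit of Chen et al. [1999].
- Each surviving hypothesis is scored by counting inliers with a brute-force nearest-neighbour test.

All names and thresholds are illustrative.

```python
import itertools
import numpy as np

def _angle(v):
    return np.arctan2(v[1], v[0])

def rigid_from_pair(p, q):
    """2D R, t taking segment p[0]->p[1] onto q[0]->q[1] (exact if congruent)."""
    a = _angle(q[1] - q[0]) - _angle(p[1] - p[0])
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return R, q[0] - R @ p[0]

def ransac_2d(model, scene, n_bases=3, tol=0.02, seed=0):
    """Correspondence-free RANSAC sketch: a constant number of random model
    bases (point pairs), each tested against every ordered scene pair.
    Returns (inlier count, R, t) of the best hypothesis."""
    rng = np.random.default_rng(seed)
    best = (-1, np.eye(2), np.zeros(2))
    for _ in range(n_bases):
        p = model[rng.choice(len(model), 2, replace=False)]
        dp = np.linalg.norm(p[1] - p[0])
        for j0, j1 in itertools.permutations(range(len(scene)), 2):
            q = scene[[j0, j1]]
            # Rigidity constraint: prune pairs whose lengths disagree.
            if abs(np.linalg.norm(q[1] - q[0]) - dp) > tol:
                continue
            R, t = rigid_from_pair(p, q)
            moved = model @ R.T + t
            d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2)
            inliers = int((d2.min(axis=1) < tol ** 2).sum())
            if inliers > best[0]:
                best = (inliers, R, t)
    return best
```

In 2D the base is a pair, so the enumeration runs over $O(N^2)$ scene pairs rather than the $O(N^3)$ triplets of the 3D case. With brute-force scoring, each hypothesis also costs $O(MN)$ rather than $O(M \log N)$; a k-d tree over the scene would recover the latter.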

Another set of approaches uses geometric invariants to reduce the time complexity of these hypothesise-and-test strategies [Huttenlocher, 1991; Aiger et al., 2008; Mellado et al., 2014; Raposo and Barreto, 2017]. Huttenlocher [1991] observed that the ratio of distances between three collinear points is preserved by rigid and affine transformations. Thus, given a set of 4 coplanar points in one point-set, and hence two invariant ratios, all approximately congruent 4-point sets in the other point-set can be extracted efficiently. Aiger et al. [2008] extended this approach to 3D and, by also pre-processing the invariants and storing them in an appropriate data structure, reduced the time complexity of the problem to $O(N^2 + k)$, where the number of congruent sets $k$ in the second point-set is small in practice. Additional constraints for rigid transformations further reduced the set of candidate congruent 4-point sets. The approach used wide 4-point bases for noise and outlier resilience and operated on raw sensor data, although feature descriptors could be used to further reduce the runtime. More recently, a linear-time $O(N)$ extension was proposed by Mellado et al. [2014] that exploited a hash-table-based data structure tailored to the problem. Finally, Raposo and Barreto [2017] showed that the runtime can be reduced further by using 2-point bases if the normal vector of one of the points is also known. Moreover, using a line base instead of a quadrilateral base allowed wider bases when the overlapping area was small, improving the runtime and the robustness to noise and outliers.
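The invariants underlying these 4-point methods can be computed directly. For a coplanar base $a, b, c, d$ whose lines $ab$ and $cd$ meet at $e$, the parameters $r_1, r_2$ with $e = a + r_1 (b - a) = c + r_2 (d - c)$ are unchanged by rigid and affine transformations.

The sketch below computes them and then performs a naive congruent-set search in the spirit of Aiger et al. [2008], pairing up scene point pairs whose predicted intersection points coincide. It enumerates all pairs and compares intersections by brute force, instead of using the data structures of the cited methods. The function names are hypothetical.

```python
import numpy as np

def base_invariants(a, b, c, d):
    """Return r1, r2 with e = a + r1 (b - a) = c + r2 (d - c), where e is the
    intersection of lines ab and cd (base assumed coplanar, lines non-parallel)."""
    A = np.column_stack([b - a, -(d - c)])
    (r1, r2), *_ = np.linalg.lstsq(A, c - a, rcond=None)
    return r1, r2

def congruent_bases(scene, r1, r2, d1, d2, tol=1e-6):
    """Naive search for index quadruples (i, j, k, l) in 'scene' congruent to a
    base with invariants r1, r2 and segment lengths d1 = |ab|, d2 = |cd|."""
    n = len(scene)
    cand1, cand2 = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            length = np.linalg.norm(scene[j] - scene[i])
            # Predicted intersection if (i, j) plays the role of (a, b) or (c, d).
            if abs(length - d1) < tol:
                cand1.append((i, j, scene[i] + r1 * (scene[j] - scene[i])))
            if abs(length - d2) < tol:
                cand2.append((i, j, scene[i] + r2 * (scene[j] - scene[i])))
    # Coinciding intersection points give a candidate congruent 4-point set.
    return [(i, j, k, l) for i, j, e in cand1 for k, l, f in cand2
            if len({i, j, k, l}) == 4 and np.linalg.norm(e - f) < tol]
```

The final double loop is where the cited methods gain their efficiency. Replacing it with a spatial index over the intersection points, or a hash table as in Mellado et al. [2014], avoids comparing every pair of candidates.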

Like hypothesise-and-test methods, hypothesise-and-vote or sample-and-vote algorithms [Ballard, 1981; Stockman, 1987; Olson, 1997; Wolfson and Rigoutsos, 1997] search stochastically over the set of correspondences to generate transformation hypotheses. However, instead of testing the hypotheses as they are generated, each hypothesis casts a vote for its transformation in a discretised Hough space. At the end, high-probability clusters in transformation space are identified, under the assumption that hypotheses with correct correspondences will vote consistently for the correct transformation. Pose clustering methods [Stockman, 1987; Olson, 1997] use an accumulation table for the voting and have $O(MN^3)$ time complexity when sampling is randomised. A geometric hashing method was proposed by Wolfson and Rigoutsos [1997] to reduce the time complexity of voting methods to $O(N^3 \log N)$ by pre-processing configurations of one point-set using a hash table.
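Pose clustering can be sketched in 2D with a sparse accumulator. Every pair of model points is hypothesised to match every ordered pair of scene points of (nearly) equal length. Each hypothesis votes for the cell of a discretised $(\theta, t_x, t_y)$ space that contains its transformation, and the most-voted cell is returned.

For a deterministic illustration, this sketch enumerates the pairs exhaustively rather than sampling them at random. It uses a dictionary as the accumulator table and averages the poses in the winning cell. The bin size, tolerance and function names are assumptions.

```python
import itertools
from collections import defaultdict
import numpy as np

def pair_transform(p, q):
    """Angle and translation of the 2D rigid motion taking p[0]->q[0], p[1]->q[1]."""
    vp, vq = p[1] - p[0], q[1] - q[0]
    a = np.arctan2(vq[1], vq[0]) - np.arctan2(vp[1], vp[0])
    a = (a + np.pi) % (2 * np.pi) - np.pi          # wrap to [-pi, pi)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    return a, q[0] - R @ p[0]

def pose_clustering(model, scene, bin_size=0.05, tol=1e-6):
    """Hough-style voting over (theta, tx, ty); returns the mean pose of the
    most-voted accumulator cell and its number of votes."""
    acc = defaultdict(list)
    for i, j in itertools.permutations(range(len(model)), 2):
        dm = np.linalg.norm(model[j] - model[i])
        for k, l in itertools.permutations(range(len(scene)), 2):
            if abs(np.linalg.norm(scene[l] - scene[k]) - dm) > tol:
                continue                               # not congruent: no vote
            a, t = pair_transform(model[[i, j]], scene[[k, l]])
            pose = np.array([a, t[0], t[1]])
            acc[tuple(np.floor(pose / bin_size).astype(int))].append(pose)
    votes = max(acc.values(), key=len)
    return np.mean(votes, axis=0), len(votes)
```

Correct correspondences all vote for the same cell, whereas incorrect ones scatter across the table, which is the assumption stated above. A finer discretisation separates nearby poses better but increases the size of the accumulator.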

Another class of correspondence-free global optimisation methods is led by transformation search. For this, heuristic or stochastic methods can be applied, although they are not guaranteed to converge to the correct alignment. Sandhu et al. [2010] employed a particle filtering strategy to expand the search domain of a local Gaussian mixture correlation optimiser. This principled but stochastic approach defined an uncertainty model for the transformation parameters to robustly predict the new distribution from which to sample particles. A related approach was proposed by Wachowiak et al. [2004] using particle swarm optimisation for the registration of 3D biomedical image data. Robertson and Fisher [2002] and Silva et al. [2005] applied genetic algorithms to the alignment problem, representing transformations as chromosomes with six genes corresponding to the transformation parameters. A population of individuals (transformation hypotheses) with these chromosomes underwent an evolutionary procedure, with fitter individuals having a greater chance of reproducing to form new transformation hypotheses. Finally, Blais and Levine [1995] and Papazov and Burschka [2011] used simulated annealing with robust loss functions to widen the basin of convergence significantly, reducing the likelihood that the search becomes trapped in a local optimum near the point of initialisation. However, because the search is stochastic, these methods may not find the correct alignment unless a good prior distribution over transformations or a good initialisation is provided.
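A stochastic, transformation-led search of this kind can be sketched as simulated annealing over a 2D rigid transformation $(\theta, t_x, t_y)$. The sketch assumes:

- a Geman-McClure robust loss on closest-point distances;
- Gaussian proposal steps that shrink with the temperature;
- a geometric cooling schedule.

These choices, the parameter values and the function names are illustrative. They are not taken from Blais and Levine [1995] or Papazov and Burschka [2011].

```python
import numpy as np

def robust_cost(params, model, scene, scale=0.05):
    """Mean Geman-McClure loss of closest-point distances after moving
    'model' by the 2D rigid transformation params = (theta, tx, ty)."""
    theta, tx, ty = params
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    moved = model @ R.T + np.array([tx, ty])
    d2 = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    return float(np.mean(d2 / (d2 + scale ** 2)))    # each term saturates at 1

def anneal(model, scene, iters=3000, T0=0.1, cooling=0.998,
           steps=(0.5, 0.2, 0.2), seed=0):
    """Simulated annealing from the identity; returns the best params and cost."""
    rng = np.random.default_rng(seed)
    x = np.zeros(3)
    fx = robust_cost(x, model, scene)
    best, fbest, T = x.copy(), fx, T0
    for _ in range(iters):
        # Gaussian proposal whose spread shrinks as the temperature falls.
        y = x + rng.normal(size=3) * np.asarray(steps) * (T / T0)
        fy = robust_cost(y, model, scene)
        # Metropolis rule: accept improvements, and worse moves with prob exp(-df/T).
        if fy < fx or rng.random() < np.exp(-(fy - fx) / T):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x.copy(), fx
        T *= cooling
    return best, fbest
```

Occasionally accepting uphill moves lets the search leave the basin around the initialisation. With a finite cooling schedule, however, nothing guarantees that the global optimum is reached, matching the caveat in the text.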

As with the local optimisation algorithms, some global optimisation methods rely on the statistical properties of the point-sets. Principal Component Analysis (PCA) has frequently been applied to coarsely align positional sensor data without correspondences or a transformation prior. Dorai et al. [1997] and Chung et al. [1998] registered 3D data by aligning the centroids of the data and then aligning the principal axes found using PCA. This approach led to 180° rotation errors due to the sign ambiguity of the principal axes, and failed for symmetric objects and incompletely overlapping data. An extension was proposed by Xiong et al. [2013b] to align eigenvectors in feature space using Kernel PCA, but it still required substantial overlap and minimal occlusion. Frequency-domain solutions, surveyed in Sun et al. [2014], provide an alternative approach. Makadia et al. [2006] decoupled rotation and translation search using the observation that surface normal statistics are independent of translation. They obtained the rotation by maximising the convolution of the Extended Gaussian Images (EGI) [Horn, 1984] of the two surface normal sets, using the spherical Fourier Transform, and then estimated the translation using the fast Fourier Transform. However, discretisation artefacts were introduced by the use of histogram density estimates, and the reliance on EGI peaks made the method susceptible to noise. Another 3D spectral registration method was proposed by Bülow and Birk [2013] for partially-overlapping data, using Phase Only Matched Filtering (POMF) to estimate the transformation parameters. However, the use of interpolation for rotation estimation limited the approach to small angular deviations (±30°).
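The PCA-based coarse alignment of Dorai et al. [1997] and Chung et al. [1998] can be sketched as matching centroids and principal axes. Each eigenvector is defined only up to sign, so the sketch enumerates the sign flips that yield a proper rotation and keeps the one with the smallest mean closest-point error. This disambiguation step is an assumption added for illustration, not a claim about the cited methods, and the function names are hypothetical.

```python
import itertools
import numpy as np

def principal_axes(P):
    """Centroid and principal axes (as columns, by decreasing variance)."""
    mu = P.mean(axis=0)
    _, V = np.linalg.eigh(np.cov((P - mu).T))
    return mu, V[:, ::-1]

def pca_align(model, scene):
    """Coarse alignment by matching centroids and principal axes.

    Returns (mean squared closest-point error, R, t) of the best sign choice."""
    mu_m, Vm = principal_axes(model)
    mu_s, Vs = principal_axes(scene)
    best = None
    for signs in itertools.product([1.0, -1.0], repeat=model.shape[1]):
        R = Vs @ np.diag(signs) @ Vm.T      # maps model axes onto +/- scene axes
        if np.linalg.det(R) < 0:            # skip reflections
            continue
        t = mu_s - R @ mu_m
        moved = model @ R.T + t
        err = ((moved[:, None, :] - scene[None, :, :]) ** 2).sum(axis=2).min(axis=1).mean()
        if best is None or err < best[0]:
            best = (err, R, t)
    return best
```

The sketch still assumes distinct eigenvalues and near-complete overlap. Symmetric objects make the axes ill-defined, and partial overlap moves the centroid and axes of one point-set, which is why both defeat the approach.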

Machine learning techniques have had only limited application to the problem of positional sensor data alignment. Shotton et al. [2013] proposed an indirect approach for aligning RGB-D images, using scene coordinate regression forests to infer the camera pose and thereby align the data. However, the method required a training set of RGB-D images and poses to localise new camera views, and therefore could not align, even indirectly, two arbitrary depth images. End-to-end deep learning architectures have not yet been applied to the 3D–3D registration problem, due to the unstructured, permutation-invariant and size-varying nature of 3D point-set data. However, neural networks have been applied to 3D feature detection and matching [Ai et al., 2017], to feature description that encodes local geometry using a dimensionality-reducing auto-encoder [Elbaz et al., 2017], and to a differentiable reformulation of the RANSAC algorithm [Brachmann et al., 2017]. The last of these shows promise for the correspondence-free alignment problem if the difficulties posed by the point-set data representation can be overcome.

In contrast to the work in this section, the solution to positional sensor data alignment proposed by this thesis in Chapter 5 provides a guarantee of optimality. Since