(rotation and translation) that correctly aligns one set of sensor data with another, without any prior knowledge about how the data correspond. An ideal alignment solution would identify all outliers in the data and optimally align the inliers with respect to a geometric error criterion that accounts for noise, such as the L2 error. Note that the terms alignment and registration are used interchangeably in this thesis. The optimisation problem for geometric sensor data alignment can be written as follows. Given two sets of sensor data X1 and X2, a rigid transformation function T, and an objective function f that measures alignment quality, then
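\begin{equation}
\begin{aligned}
\underset{R,\,t}{\text{optimise}} \quad & f(T(X_1, R, t), X_2) \\
\text{subject to} \quad & R \in SO(n) \\
& t \in \mathbb{R}^n
\end{aligned}
\tag{1.1}
\end{equation}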
where the rotation R and translation t are rigid transformations of nD Euclidean space. At the optimum, the arguments R∗ and t∗ constitute the aligning transformation or,
from another perspective, the sensor pose. An example transformation function for a point-set $P = \{p_i\}_{i=1}^{N}$ is the application of the transformation $Rp_i + t$ to each point.
An example objective function for two point-sets $P_1$ and $P_2$ is the sum of the squared closest-point residuals $f(P_1, P_2) = \sum_{p \in P_1} \min_{q \in P_2} \|p - q\|_2^2$.
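As a minimal sketch of these two example functions, the following Python code applies a rigid transformation to a 2D point-set and evaluates the sum of squared closest-point residuals. The function names (`rotation_2d`, `transform`, `closest_point_error`) and the test points are assumptions made for illustration, and the brute-force nearest-neighbour search is chosen for clarity rather than speed.

```python
import math

def rotation_2d(theta):
    """Return the 2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def transform(points, R, t):
    """Apply p -> R p + t to every 2D point."""
    return [
        (R[0][0] * x + R[0][1] * y + t[0],
         R[1][0] * x + R[1][1] * y + t[1])
        for x, y in points
    ]

def closest_point_error(P1, P2):
    """Sum over P1 of the squared distance to the nearest point in P2."""
    return sum(
        min((x1 - x2) ** 2 + (y1 - y2) ** 2 for x2, y2 in P2)
        for x1, y1 in P1
    )

# Hypothetical data: P2 is P1 moved by a known rotation and translation.
P1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
R_true, t_true = rotation_2d(math.pi / 2), (1.0, -1.0)
P2 = transform(P1, R_true, t_true)

error_unaligned = closest_point_error(P1, P2)
error_aligned = closest_point_error(transform(P1, R_true, t_true), P2)
print(error_unaligned, error_aligned)
```

With no noise and no outliers, the error vanishes at the true transformation and is positive for the untransformed points.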
1.1.1 Applications of Sensor Data Alignment
Geometric sensor data alignment is a fundamental task in computer vision, robotics, computer graphics and medical imaging. The underlying tasks of sensor data alignment and sensor pose estimation are themselves useful, and are frequently deployed as basic units in computer vision and robotic systems.
Applications of positional sensor data alignment in 2D and 3D are extensive. They include merging multiple partial scans into a complete model [Blais and Levine, 1995; Huber and Hebert, 2003]; using registration results as fitness scores for object recognition [Johnson and Hebert, 1999; Belongie et al., 2002]; registering a view into a global coordinate system for sensor localisation [Nüchter et al., 2007; Pomerleau et al., 2013]; fusing cross-modality data from different sensors [Makela et al., 2002; Zhao et al., 2005]; acquiring shape data [Gelfand et al., 2005; Aiger et al., 2008]; and finding relative poses between sensors [Yang et al., 2013a; Geiger et al., 2012]. Some higher-level applications include recording cultural heritage [Remondino, 2011], mapping underground mine sites [Magnusson et al., 2007], and Simultaneous Localisation And Mapping (SLAM) tasks in mobile robotics [Smith and Cheeseman, 1986; Leonard and Durrant-Whyte, 1991].
Applications of directional and positional sensor data alignment (2D–3D registration) are also numerous, since the ability to find the pose of a camera and map visual information onto a 3D model and vice versa is useful for many tasks. They include visual localisation [Nöll et al., 2011; Svärm et al., 2014, 2016]; camera pose estimation and tracking [Hartley and Kahl, 2009; Bazin et al., 2013; Kneip et al., 2015]; augmented reality [Marchand et al., 2016]; motion segmentation [Olson, 2001]; object recognition [Huttenlocher and Ullman, 1990; Mundy, 2006; Aubry et al., 2014]; automated cartography [Fischler and Bolles, 1981]; and hand–eye calibration for robotics [Horaud and Dornaika, 1995; Seo et al., 2009; Heller et al., 2012; Ruland et al., 2012]. Some higher-level commercial applications include autonomous vacuum cleaners such as the Dyson 360 Eye, and augmented reality platforms such as the Microsoft HoloLens, the Oculus Rift, and the Google ARCore and Qualcomm Vuforia software development kits [Zia et al., 2016].
While not a focus of this thesis, directional sensor data alignment also has many applications. A commonly used application is panoramic image stitching [Bazin et al., 2013; Enqvist et al., 2015; Parra Bustos et al., 2016], where the homography relating two images can be obtained by rotation-only search if the camera is sufficiently distant from the scene. A commercial application in this domain is Google Photo Sphere, an application for creating 360° panoramas from photos.
Figure 1.3: Two partially-overlapping observations (left and right) of the dragon model (middle) from the Stanford Computer Graphics Laboratory. The pair of point-sets has more structured outliers (in black) than inliers (in blue) because the overlapping region is small.
1.1.2 Key Challenges
There are two key challenges inherent to the alignment problem: outliers and non-convexity. The former arises from the sensor data, whereas the latter arises from the objective function and transformation.
Outliers are pervasive in sensor data and, for two sets of sensor data, consist of those data elements in each set that do not correspond to any element in the other set. They emanate from four major sources: sensor noise and error, sampling effects, changes in the scene and changes in the sensor viewpoint. The first two sources generate random outliers. For example, impulsive noise, multipath errors, and sparse or uneven sampling can produce random outliers. The last two sources generate structured outliers, which are typically more numerous. For example, a dynamic object may be absent in one dataset but present and occluding surfaces behind it in another, and parts of a scene may be visible from one viewpoint but absent or occluded from another. In real data, partially-overlapping observations are the most frequent and significant source of outliers, an example of which is shown in Figure 1.3. Outliers are problematic to alignment algorithms because alignment is a joint transformation and correspondence problem, and outliers invalidate the correspondence assumptions common to many alignment objective functions. That is, many objective functions do not model outliers or inadequately model them.
Non-convexity (or non-concavity for maximisation problems) is a property common to most useful alignment objective functions, as illustrated by Figure 1.4. Furthermore, rotation constraints also lead to non-convex optimisation problems. Hartley and Kahl [2007], for example, showed that many quasi-convex objective functions in multiple view geometry problems can be solved efficiently, unlike the many non-convex functions that arise when rotation parameters are to be solved. While it may be relatively straightforward to find a local optimum of a non-convex alignment function, finding the global optimum is a hard optimisation problem.
Draft Copy – 30 January 2018
[Plot: ICP error (vertical axis, 0 to 0.6) against rotation angle (horizontal axis, 0° to 360°)]
Figure 1.4: Alignment objective function non-convexity. In this example, the ICP objective function [Besl and McKay, 1992] is plotted for 10 random 2D points undergoing a rotation-only transformation with no outliers. The non-convexity of problems with outliers, sensor data of higher dimension, or transformations with higher degrees-of-freedom is even more pronounced.
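The kind of experiment described for Figure 1.4 can be approximated numerically. The sketch below is not the code used to produce the figure: the random seed, the sampling range of the points and the one-degree grid are all assumptions. It rotates a copy of 10 random 2D points about the origin and evaluates the closest-point error against the unrotated set at each grid angle.

```python
import math
import random

def rotate(points, theta):
    """Rotate every 2D point about the origin by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def closest_point_error(P1, P2):
    """Sum over P1 of the squared distance to the nearest point in P2."""
    return sum(
        min((x1 - x2) ** 2 + (y1 - y2) ** 2 for x2, y2 in P2)
        for x1, y1 in P1
    )

def error_profile(points, steps=360):
    """Closest-point error of the rotated copy against the original set."""
    return [
        closest_point_error(rotate(points, 2 * math.pi * k / steps), points)
        for k in range(steps)
    ]

def count_local_minima(values):
    """Count strict local minima on a circular grid of samples."""
    n = len(values)
    return sum(
        1 for k in range(n)
        if values[k] < values[k - 1] and values[k] < values[(k + 1) % n]
    )

random.seed(0)  # arbitrary seed, chosen only for repeatability
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(10)]
profile = error_profile(points)
print("local minima on the grid:", count_local_minima(profile))
```

The error is zero at 0° and again at 360°, yet positive in between, so the function cannot be convex over the full rotation range. Any additional local minima reported on the grid are the traps that a local optimiser may fall into.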
Although not necessarily inherent to the alignment problem, tractability is another challenge for alignment algorithms. In particular, many optimal algorithms developed to circumvent the non-convexity problem have exponential time complexity and therefore cannot be used for large datasets. However, recent work [Straub et al., 2017; Campbell et al., 2017] has shown that under a slightly weakened optimality condition, these algorithms can run in polynomial time. Moreover, sophisticated data structures, data representations and parallel processing [Yang et al., 2016; Parra Bustos et al., 2016; Campbell and Petersson, 2016] can greatly reduce the computational cost.
1.1.3 Existing Alignment Approaches
In this section, the state of the art in geometric alignment methods will be briefly outlined, with an emphasis on how the literature handles the key challenges of outliers and non-convexity as identified above. A full classification and review of the literature is given in Chapter 2.
Algorithms that solve the alignment problem can be classified into two groups: those that require a set of putative correspondences between elements of the sensor datasets [Horn, 1987; Fischler and Bolles, 1981; Enqvist et al., 2009; Lepetit et al., 2009; Svärm et al., 2016; Sattler et al., 2017] and those that do not [Besl and McKay, 1992; Fitzgibbon, 2003; Aiger et al., 2008; Breuel, 2003; Li and Hartley, 2007; Yang et al., 2016; David et al., 2004; Brown et al., 2015]. This thesis focuses on solutions to the more challenging and general problem of alignment without correspondences, designated as geometric matching problems by Breuel [2003].
Foundational solutions to geometric matching problems applied local optimisation techniques to non-robust objective functions and were therefore susceptible to local optima and outliers. That is, the methods were liable to find only a locally-optimal solution to a non-convex objective function whose measure of alignment quality was unreliable when outliers were present in the data. The Iterative Closest Point (ICP) algorithm, proposed by Besl and McKay [1992], became the technique de rigueur for aligning positional sensor data due to its conceptual simplicity, ease of use and good performance. It alternated between finding closest-point correspondences given the current transformation and then finding the least squares transformation given the current correspondences. For aligning directional and positional sensor data (2D–3D alignment), the SoftPOSIT algorithm, proposed by David et al. [2004], was considered the most computationally efficient approach [Moreno-Noguer et al., 2008]. It alternated between probabilistic correspondence assignment given the current transformation and iterative transformation update given the current correspondences, using a least squares objective function.
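The alternation at the heart of ICP can be sketched in a few lines for the 2D case. This is an illustrative toy under stated assumptions, not the implementation of Besl and McKay [1992]: the function names, the fixed iteration count, the brute-force correspondence search and the closed-form 2D least-squares fit are all choices made for this example.

```python
import math

def apply_rigid(points, theta, t):
    """Rotate 2D points by theta (radians) and then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]

def fit_rigid_2d(source, matched):
    """Least-squares rotation and translation mapping source[i] to matched[i]."""
    n = len(source)
    sx = sum(x for x, _ in source) / n
    sy = sum(y for _, y in source) / n
    mx = sum(x for x, _ in matched) / n
    my = sum(y for _, y in matched) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(source, matched):
        ax, ay, bx, by = ax - sx, ay - sy, bx - mx, by - my
        dot += ax * bx + ay * by      # cosine coefficient
        cross += ax * by - ay * bx    # sine coefficient
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation aligns the rotated source centroid with the matched centroid.
    return theta, (mx - (c * sx - s * sy), my - (s * sx + c * sy))

def closest_points(points, target):
    """For each point, return its nearest neighbour in target (brute force)."""
    return [
        min(target, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
        for p in points
    ]

def icp_2d(source, target, iterations=20):
    """Alternate correspondence and least-squares steps from the identity."""
    theta, t = 0.0, (0.0, 0.0)
    for _ in range(iterations):
        current = apply_rigid(source, theta, t)
        matched = closest_points(current, target)
        theta, t = fit_rigid_2d(source, matched)
    return theta, t

# Hypothetical data: the source is the target under a small known motion.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0), (-3.0, 2.0)]
source = apply_rigid(target, 0.1, (0.2, -0.1))
theta, t = icp_2d(source, target)
aligned = apply_rigid(source, theta, t)
print(theta, t)
```

The example succeeds because the initial misalignment is small relative to the point spacing, so the first set of closest-point correspondences is already correct. With a larger initial rotation or with outliers, the same loop can converge to an incorrect local optimum.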
To improve the robustness of these methods to outliers, different alignment objective functions have been advanced. Fitzgibbon [2003] proposed Levenberg-Marquardt ICP (LM-ICP), extending the ICP algorithm into the LM optimisation framework [Moré, 1978]. Modelling registration as a general optimisation problem enabled the use of robust Lorentzian and Huber error functions that attenuate the influence of outlier correspondences. The family of probabilistic alignment approaches [Chui and Rangarajan, 2003; Myronenko and Song, 2010; Tsin and Kanade, 2004; Jian and Vemuri, 2011] also enabled the use of robust objective functions. In particular, modelling point-sets as probability distributions permits the closed-form L2 distance between densities recommended by Jian and Vemuri [2011]. For 2D–3D alignment, Moreno-Noguer et al. [2008] proposed the BlindPnP algorithm, which represented the region of expected transformations (the pose prior) as a Gaussian mixture model from which a Kalman filter was initialised to guide a local pose search routine. It derived its robustness to outliers from a correspondence hypothesising step that was similar to the RANdom SAmple Consensus (RANSAC) algorithm [Fischler and Bolles, 1981] but was restricted to the subset of matches that were plausible from that pose guess.
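To illustrate how robust error functions attenuate the influence of outliers, the sketch below compares a squared loss with Huber and Lorentzian losses. The parameterisations used here are common textbook forms and are assumptions of this example; they are not necessarily the exact forms used in LM-ICP, and the scale parameters `delta` and `sigma` are hypothetical defaults.

```python
import math

def squared_loss(r):
    """Non-robust least-squares loss."""
    return 0.5 * r * r

def huber_loss(r, delta=1.0):
    """Quadratic for small residuals, linear beyond the threshold delta."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def lorentzian_loss(r, sigma=1.0):
    """Grows only logarithmically for large residuals."""
    return math.log(1.0 + 0.5 * (r / sigma) ** 2)

for r in (0.5, 2.0, 10.0):
    print(r, squared_loss(r), huber_loss(r), lorentzian_loss(r))
```

For a small residual the Huber loss equals the squared loss, whereas for a large residual both robust losses grow far more slowly, so a single gross outlier contributes much less to the objective.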
Global optimisation approaches to geometric matching problems endeavour to avoid incorrect locally-optimal alignments by expanding the search domain. Furthermore, optimality can be guaranteed by applying the Branch-and-Bound (BB) paradigm [Land and Doig, 1960], conferring immunity to the non-convexity of the problem. Breuel [2003] introduced the use of BB to optimally solve 2D geometric alignment problems, proposing a family of bounding functions for different transformations and objective functions. However, he identified tractability as the primary impediment to extending
the work to 3D sensor data. Ceding outlier robustness was an early strategy to extend optimality to 3D alignment problems. For example, Li and Hartley [2007] minimised a non-robust Lipschitzised L2 error function using BB, but assumed the transformation was pure rotation and that the point-sets were the same size with no outliers. More recently, Yang et al. [2016] proposed the Go-ICP algorithm for 6-DoF 3D–3D alignment, which found the optimal solution to the closest-point L2 error between point-sets using a nested BB scheme. Brown et al. [2015] proposed a similar solution for 6-DoF 2D–3D alignment. The last two methods proposed a trimming strategy to compensate for their non-robust objective functions; however, this required the inlier fraction to be specified, which can rarely be known in advance. If the inlier fraction is over- or under-estimated, the trimmed objective function may become distorted such that the global optimum does not occur at the correct pose.
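The sensitivity of trimming to the assumed inlier fraction can be seen in a small example. The sketch below is a generic trimmed closest-point error, assumed for illustration and not taken from the cited methods: it keeps only the smallest residuals, with the number kept determined by a user-supplied `inlier_fraction`.

```python
def squared_closest_residuals(P1, P2):
    """Squared distance from each point in P1 to its nearest point in P2."""
    return [
        min((x1 - x2) ** 2 + (y1 - y2) ** 2 for x2, y2 in P2)
        for x1, y1 in P1
    ]

def trimmed_error(P1, P2, inlier_fraction):
    """Sum of the smallest residuals, keeping the assumed fraction of points."""
    residuals = sorted(squared_closest_residuals(P1, P2))
    keep = max(1, round(inlier_fraction * len(residuals)))
    return sum(residuals[:keep])

# Hypothetical data at the correct pose: three inliers and two outliers.
P2 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
P1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (20.0, 0.0)]

for fraction in (0.6, 0.8, 1.0):
    print(fraction, trimmed_error(P1, P2, fraction))
```

With the true inlier fraction of 0.6, the error at the correct pose is zero. Over-estimating the fraction pulls outlier residuals into the sum, so the correct pose is penalised and another pose may score better. Under-estimating it discards genuine inliers and weakens the constraint on the pose.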
While RANSAC approaches [Fischler and Bolles, 1981; Grimson, 1990], which randomly hypothesise a minimal set of correspondences, compute a transformation and evaluate its quality, are global search methods that can confer outlier robustness, they do not guarantee optimality and quickly become intractable as the number of points and outliers increases. The first globally-optimal 6-DoF geometric matching methods for 3D sensor data with inherently robust objective functions were proposed in Parra Bustos et al. [2016], Campbell and Petersson [2016] and Campbell et al. [2017], the latter two of which are presented in Chapters 5 and 6 of this thesis. Using the Go-ICP framework, Parra Bustos et al. [2016] optimised the robust inlier set cardinality maximisation objective function for optimal 3D–3D alignment. They achieved efficient runtimes by exploiting stereographic projection techniques and sophisticated data structures including circular R-trees and matchlists.
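To make the combination of BB and inlier set cardinality maximisation concrete, the following toy searches over 2D rotations only. It is a sketch under stated assumptions and not any of the cited 6-DoF algorithms: the function names, the inlier threshold `eps`, the `min_width` stopping guard and the bounding function are choices made for this example. The upper bound for an angle interval inflates each point's inlier threshold by the distance that the point can move within the interval, which is at most its norm times the interval half-width.

```python
import heapq
import math

def rotate(p, theta):
    """Rotate a 2D point about the origin by theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def inlier_count(source, target, theta, eps, half_width=0.0):
    """Count rotated source points within an inflated threshold of the target."""
    count = 0
    for p in source:
        # A rotation error of half_width moves p by at most ||p|| * half_width.
        radius = eps + half_width * math.hypot(p[0], p[1])
        x, y = rotate(p, theta)
        if min(math.hypot(x - qx, y - qy) for qx, qy in target) <= radius:
            count += 1
    return count

def bb_rotation_search(source, target, eps, min_width=1e-6):
    """Best-first branch-and-bound over the rotation angle in [0, 2*pi]."""
    best_count, best_theta = -1, 0.0
    heap = [(-len(source), 0.0, 2.0 * math.pi)]  # (-upper bound, lo, hi)
    while heap:
        neg_upper, lo, hi = heapq.heappop(heap)
        if -neg_upper <= best_count:
            break  # no remaining interval can beat the incumbent
        mid = 0.5 * (lo + hi)
        lower = inlier_count(source, target, mid, eps)  # value at the centre
        if lower > best_count:
            best_count, best_theta = lower, mid
        if 0.5 * (hi - lo) < min_width:
            continue  # resolution guard: stop subdividing tiny intervals
        for a, b in ((lo, mid), (mid, hi)):
            upper = inlier_count(source, target, 0.5 * (a + b), eps, 0.5 * (b - a))
            if upper > best_count:
                heapq.heappush(heap, (-upper, a, b))
    return best_theta, best_count

# Hypothetical data: five inliers rotated by -1 rad plus two outliers.
target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 5.0), (-3.0, 2.0)]
source = [rotate(q, -1.0) for q in target] + [(7.0, 7.0), (-6.0, 1.0)]
best_theta, best_count = bb_rotation_search(source, target, eps=0.05)
print(best_theta, best_count)
```

Because intervals are discarded only when their upper bound cannot exceed the best count found so far, the returned count is optimal up to the resolution set by `min_width`, irrespective of where the search starts. No inlier fraction needs to be specified, although the inlier threshold must still be chosen.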
This brief summary of the geometric alignment literature highlights the need for methods that are intrinsically robust to outliers and are not susceptible to local optima. In the next section, these considerations will be formalised into an objective to be pursued in this thesis.