Probabilistic inference and high-dimensional nonlinear optimization are closely related through Bayes' rule when Gaussian-distributed measurement likelihood models are used. The factor graph interpretation of Kschischang et al. [129] makes this common understanding possible. Chapters 3 and 5 discuss how nonparametric factor graphs can be assembled, and how non-Gaussian, multi-modal state estimates are inferred.
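As a brief sketch of this connection, assume (with notation introduced here for illustration only) measurements $z_i$ with prediction functions $h_i(x)$, Gaussian noise covariances $\Sigma_i$, and a prior $p(x)$ that is either uniform or itself written as one more Gaussian factor. Under those assumptions, the maximum a posteriori estimate reduces to a nonlinear least-squares problem:

```latex
\begin{align}
p(x \mid z) &\propto p(x) \prod_i p(z_i \mid x), \qquad
p(z_i \mid x) \propto \exp\!\left( -\tfrac{1}{2} \left\| h_i(x) - z_i \right\|^2_{\Sigma_i} \right), \\
\hat{x} &= \arg\max_x \; p(x \mid z)
         = \arg\min_x \; \sum_i \left\| h_i(x) - z_i \right\|^2_{\Sigma_i},
\qquad \| e \|^2_{\Sigma} = e^{\top} \Sigma^{-1} e .
\end{align}
```

Each term in the sum corresponds to one factor in the graph, which is why adding a measurement amounts to adding one more squared residual.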
During the early 2000s, along with a new understanding of using information rather than covariance representations, graph-based descriptions of SLAM started becoming more popular, for example Bailey [12]. The pivotal factor graph representation was described by Kschischang et al. [129] in 2001. Dellaert et al. [41,42] connected the square root information matrix (of previous EKF-SLAM systems) to the factor graph representation.
Factor graphs provide a tractable language (unifying perspective) for large-scale, nonlinear, and belief space interactions of many variables and factors (likelihood models). The interaction of information from various sensors is naturally described by adding the associated measurement likelihood models (algebraic functions) between variable nodes in a graph. Furthermore, factor graphs can then be used to develop the associated inference algorithms to produce the desired state estimates.
Dellaert et al. [41] illustrate how to write measurement likelihoods as a factor graph model, and how to employ known linear system solvers, such as [6], to recover mean parameter estimates. These tools include pivoted Cholesky or QR factorization for quasi-Newton type optimization routines. We direct the reader to a thorough development by Rosen et al. [198], which describes how to use trust-region methods to overcome many of the numerical problems of quasi-Newton methods.
In particular, a Gaussian factor graph solution, when represented as the hidden Markov type model in Fig. 2-2, produces the same parameter estimate as a Kalman filter. Solving over all variables produces a smoothing result, while solving and marginalizing forward along the hidden Markov model (HMM) produces exactly the same set of equations used for the Kalman filter.
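The equivalence can be checked numerically on a toy problem. The following is a minimal sketch, assuming a scalar chain with model $x_{k+1} = x_k + u_k + w$ and direct measurements $z_k = x_k + v$; the function names, noise values, and data are illustrative assumptions and are not taken from the cited works. The batch solution over all variables (smoothing) and the forward Kalman recursion should agree on the most recent state, while earlier states differ because smoothing also uses later measurements.

```python
def kalman_filter(mu0, p0, controls, measurements, q, r):
    """Scalar Kalman filter for x[k+1] = x[k] + u[k] + w, z[k] = x[k] + v."""
    mean, var = mu0, p0
    filtered = []
    for k, z in enumerate(measurements):
        if k > 0:
            # Predict through the odometry (process) model.
            mean, var = mean + controls[k - 1], var + q
        # Update with the direct measurement of the state.
        gain = var / (var + r)
        mean, var = mean + gain * (z - mean), (1.0 - gain) * var
        filtered.append(mean)
    return filtered


def solve(a, b):
    """Gaussian elimination without pivoting (adequate for this small SPD system)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for i in range(n):
        for j in range(i + 1, n):
            f = a[j][i] / a[i][i]
            for c in range(i, n):
                a[j][c] -= f * a[i][c]
            b[j] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = b[i] - sum(a[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def batch_smoother(mu0, p0, controls, measurements, q, r):
    """Solve the same chain as one linear least-squares problem (normal equations)."""
    n = len(measurements)
    info = [[0.0] * n for _ in range(n)]  # information matrix
    eta = [0.0] * n                       # information vector
    # Prior factor on x[0].
    info[0][0] += 1.0 / p0
    eta[0] += mu0 / p0
    for k, z in enumerate(measurements):
        # Unary measurement factor on x[k].
        info[k][k] += 1.0 / r
        eta[k] += z / r
    for k, u in enumerate(controls):
        # Binary odometry factor with residual x[k+1] - x[k] - u.
        info[k][k] += 1.0 / q
        info[k + 1][k + 1] += 1.0 / q
        info[k][k + 1] -= 1.0 / q
        info[k + 1][k] -= 1.0 / q
        eta[k] -= u / q
        eta[k + 1] += u / q
    return solve(info, eta)


controls = [1.0, 1.0, 1.0]
measurements = [0.1, 1.2, 1.9, 3.2]
filtered = kalman_filter(0.0, 1.0, controls, measurements, q=0.1, r=0.5)
smoothed = batch_smoother(0.0, 1.0, controls, measurements, q=0.1, r=0.5)
print(filtered[-1], smoothed[-1])  # final-state estimates should agree
print(filtered[0], smoothed[0])    # earlier states differ: smoothing uses later data
```

The forward elimination inside `solve` visits the variables in chain order, which is the same marginalize-forward structure the text describes for the Kalman filter.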
Consider, for example, a robot exploring a building room to room, where new information from a later room has little impact on the geometry of a previously visited room. Kaess & Dellaert et al. [111] realized that a full batch solution over all variables does not have to be computed at each step, allowing for incremental quasi-Newton routines and leading to the iSAM1 algorithm [116]. The first key insight to incremental updates was to keep a triangular decomposition of the square root information matrix, in which new variables add columns and new measurements add rows. The triangular structure of the augmented matrix can be restored with Givens rotations (or Householder reflections) that modify only a small portion of the matrix, thereby recycling previous computations.
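The row and column bookkeeping described above can be sketched in a few lines. This is an illustrative toy, not the iSAM1 implementation: it assumes whitened (unit-variance) scalar measurements, dense list-of-lists storage, and hypothetical helper names `add_variable`, `givens_add_row`, and `back_substitute`.

```python
import math


def add_variable(R, d):
    """Append a new, still unconstrained variable: one zero column and one zero row."""
    for r in R:
        r.append(0.0)
    R.append([0.0] * (len(R) + 1))
    d.append(0.0)


def givens_add_row(R, d, row, rhs):
    """Fold one new measurement row into the upper-triangular system R x = d, in place.

    Each Givens rotation zeroes one entry of the new row against the diagonal
    of R. Columns where the new row is already zero are skipped, so only part
    of R is touched.
    """
    n = len(R)
    row = row[:]
    for i in range(n):
        if row[i] == 0.0:
            continue
        r = math.hypot(R[i][i], row[i])
        c, s = R[i][i] / r, row[i] / r
        for j in range(i, n):
            upper, lower = R[i][j], row[j]
            R[i][j] = c * upper + s * lower
            row[j] = -s * upper + c * lower
        d[i], rhs = c * d[i] + s * rhs, -s * d[i] + c * rhs
    return rhs  # component of the residual that no variable can explain


def back_substitute(R, d):
    """Solve the upper-triangular system R x = d."""
    n = len(d)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = d[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x


# Prior x0 = 0, then grow the problem one variable and one measurement at a time.
R, d = [[1.0]], [0.0]
add_variable(R, d)
givens_add_row(R, d, [-1.0, 1.0], 1.0)        # odometry: x1 - x0 = 1
add_variable(R, d)
givens_add_row(R, d, [0.0, -1.0, 1.0], 1.0)   # odometry: x2 - x1 = 1
givens_add_row(R, d, [-1.0, 0.0, 1.0], 2.2)   # loop-closure-like: x2 - x0 = 2.2
print(back_substitute(R, d))
```

In this sketch the second odometry row starts with a zero in the first column, so the first row of `R` is left untouched, which is the recycling of earlier computation that the text refers to. The last row touches every column, which mirrors how loop closures are more expensive than plain odometry.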
Maintaining the triangular component of the information matrix still had one lingering difficulty: linearization. The components in the matrix represent linearizations of the nonlinear measurement functions and must periodically be updated as state estimates change. Further research into graphical structures similar to the elimination tree resulted in the development of the Bayes tree.
2.6.1 The Bayes Tree
During the late 1980s, before the definition of factor graphs, Pearl's seminal book [187] introduced Bayesian networks for graphical models in statistical inference. A Bayesian network is topologically similar to the factor graph, as it is assembled by eliminating one variable at a time from the graph. The Bayes network encodes the conditional dependence structure and establishes the chordal property through implicit cliques. In turn, cliques can be discovered with the maximum cardinality search algorithm from Tarjan et al. [222], producing an acyclic graphical model description, also known as a junction tree.
Many methods have been developed to find an acyclic refactoring of variables in a cyclic graphical model. Koller et al. [127] point to cluster trees, rake-and-compress trees and bucket elimination trees. Together with junction trees and the so-called elimination tree from sparse linear algebra, these all represent the same goal of finding the acyclic conditional dependency structure between the variables. Many different tree factorizations are possible, where the best trees have many cliques, each of small dimension. While elimination strategies vary for each of the trees mentioned, the variable ordering in which the tree is assembled is a common and vital step to finding a good Bayes network and tree.
The Bayes tree by Kaess et al. [114] is a specific form of the junction tree where a root clique is carefully selected, and where the conditional dependence structure is directed from the root to the leaf cliques. The Bayes tree is assembled with a variable ordering obtained with the column approximate minimum degree ordering (COLAMD) algorithm developed by Davis et al. [39], published in 2004. Empirical study shows that the variable ordering obtained from COLAMD, while being a heuristic method, is within a few percent of the optimal ordering for a wide variety of cases. COLAMD variable ordering is currently the best known method for finding the acyclic Bayes tree.
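The importance of variable ordering can be illustrated with a symbolic elimination on a small graph. The sketch below is not COLAMD; it assumes a plain greedy minimum-degree rule as a simplified stand-in, and the function names and the star-shaped example graph are hypothetical. Fill edges are the extra dependencies created when a variable is eliminated, and fewer fill edges means smaller cliques.

```python
def count_fill(adjacency, order):
    """Symbolically eliminate variables in `order`; return the number of fill edges."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v couples all of its remaining neighbours into a clique.
        nbrs = sorted(nbrs)
        for i, a in enumerate(nbrs):
            for b in nbrs[i + 1:]:
                if b not in adj[a]:
                    adj[a].add(b)
                    adj[b].add(a)
                    fill += 1
    return fill


def greedy_min_degree(adjacency):
    """Pick, at every step, the remaining variable with the fewest neighbours."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while adj:
        v = min(sorted(adj), key=lambda k: len(adj[k]))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= nbrs - {u}  # connect the neighbours of v to each other
        order.append(v)
    return order


# Star graph: one hub variable connected to four otherwise unrelated variables.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
}
print(count_fill(star, ["hub", "a", "b", "c", "d"]))  # hub first: every pair of leaves gets coupled
print(count_fill(star, greedy_min_degree(star)))      # leaves first: no new coupling
```

Eliminating the hub first turns the four leaves into one dense clique, while eliminating it last leaves the structure sparse, even though both orderings describe the same underlying problem.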
Algebraic operations and assembly of the Bayes tree are discussed in Section 5.3.1 as part of our development, and we refer the reader there for more detail. In broad terms, small, local pieces of the factor graph are grouped into cliques in such a manner that an acyclic tree structure is formed. The Bayes tree represents a symbolic refactoring of the original factor graph model, from which an inference algorithm can consider operations at the clique level. A solution is found by "combining" information from the outermost leaf cliques up towards the one root clique. Once a solution has been found at the root, the combined information is sent back down the tree towards the leaves to recover the full posterior state estimates.
Following Kaess et al. [115], dense linear matrix operations can be used to locally solve portions of the information matrix. The entire Bayes tree represents how different dense portions of the triangular factor of the square root information matrix are put together. The advantage of this approach is that approximations and estimates are local to the measurement functions within each clique, rather than in batch over the entire information matrix. In iSAM2, this allows relinearization of measurement functions only in cliques that see larger shifts in parametric state estimates.
Furthermore, incremental updates to the entire system become more natural on the Bayes tree structure. By forcing the most likely affected variables to be near the root of the Bayes tree, large parts of all the branches remain unaffected as new variables and factors are added to the factor graph. In turn, these unaffected branches can be "unhooked" from the tree and reattached after the root portion is re-eliminated with the updated factor graph. Inference information only needs to be passed up from the reattach point in the new tree, and downward passing only needs to propagate as far as meaningful updates to the state estimates are made, which is known as the wildfire algorithm.
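The downward pass with early stopping can be sketched schematically as follows. This is only a toy under strong assumptions and not the iSAM2 implementation: each clique holds a single scalar whose value is an affine function of its parent (`gain` and `offset` stand in for the clique conditional), and the names `wildfire_update`, `tol`, and the example tree are hypothetical.

```python
def wildfire_update(children, gain, offset, values, clique, new_value, tol, visited):
    """Downward pass that stops wherever the state change drops below `tol`."""
    visited.append(clique)
    change = abs(new_value - values[clique])
    values[clique] = new_value
    if change < tol:
        return  # the update has died out; leave this subtree untouched
    for child in children.get(clique, []):
        # Scalar stand-in for back-substitution: child value given its parent.
        child_value = gain[child] * new_value + offset[child]
        wildfire_update(children, gain, offset, values, child, child_value, tol, visited)


children = {"root": ["A", "B"], "A": ["A1"], "B": ["B1"]}
gain = {"A": 0.9, "B": 0.001, "A1": 0.5, "B1": 0.5}
offset = {"A": 0.0, "B": 0.0, "A1": 0.0, "B1": 0.0}
# Values consistent with an earlier root estimate of 1.0.
values = {"root": 1.0, "A": 0.9, "B": 0.001, "A1": 0.45, "B1": 0.0005}

visited = []
wildfire_update(children, gain, offset, values, "root", 1.5, 0.01, visited)
print(visited)  # cliques that were actually recomputed
print(values)
```

Branch `B` depends only weakly on the root in this example, so its change falls below the threshold and the clique below it is never visited. The price is a small, bounded staleness in the skipped subtree.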
Interestingly, Paskin et al. [185] had, in 2003, suggested a "thin" junction tree approach to an HMM-style filtering solution to EKF-SLAM. The approach also kept the dimension of the problem small by marginalizing out old states to achieve a fixed-lag smoother type of operation. The major difference was that Paskin's approach did not have the COLAMD algorithm to select a variable ordering from a factor graph definition.
The Bayes tree has a broader interpretation than the purely Gaussian model used in iSAM2, and we continue the discussion of more general stochastic inference on the Bayes tree in Section 2.8.
2.6.2 Adding Inertial Measurements to Factor Graphs
As discussed earlier, inertial navigation plays a vital role in almost all autonomous system platforms in use today. SLAM research has led us away from HMM-type inference algorithms such as the Kalman filter, favoring relative frame representations captured by a factor graph model. The introduction of inertial odometry factors for general use of inertial navigation-type measurements is a core competency for navigation and will be discussed in more detail in Chapter 4.
Indelman et al. [102] initially proposed to use individual inertial sensor measurements for odometry constraints in a factor graph, creating a new pose for each sensor measurement. However, the rapid increase in the number of poses makes it difficult to compute real-time solutions for large problems. Their later work [103] adopted the preintegrals of Lupton [143], but again did not present an analytical version of the inertial sensor model they employed.
Martinelli [147] proposed a closed-form solution for visual-inertial odometry constraints, but only considered accelerometer biases during his derivation. While accelerometer bias certainly is an important error source, it is not the most significant. Platform misalignment, which is predominantly driven by gyroscope bias, results in erroneous coupling of gravity compensation terms. This gravity misalignment, when integrated, is a dominant error source in all inertial navigation systems [229].
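A single-axis, small-angle sketch shows why the gyroscope bias term tends to dominate. The symbols here are introduced for illustration only: $b_g$ is a constant uncompensated gyroscope bias, $b_a$ a constant accelerometer bias, $g$ the magnitude of gravity, and $t$ the integration time, assumed short relative to the Schuler period.

```latex
\begin{align}
\delta\theta(t) &\approx b_g \, t
  && \text{(tilt error from integrating the gyroscope bias)} \\
\delta a(t) &\approx g \, \delta\theta(t) = g \, b_g \, t
  && \text{(gravity leaking into a horizontal axis)} \\
\delta p_{\mathrm{gyro}}(t) &\approx \int_0^t \!\! \int_0^{\tau} g \, b_g \, s \; ds \, d\tau
  = \tfrac{1}{6} \, g \, b_g \, t^3,
  \qquad
  \delta p_{\mathrm{accel}}(t) \approx \tfrac{1}{2} \, b_a \, t^2 .
\end{align}
```

Under these assumptions the position error from gyroscope bias grows cubically with time, while that from accelerometer bias grows quadratically, so the former overtakes the latter as the integration interval lengthens.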
Recently, Leutenegger et al. [137] published work on a visual-inertial SLAM solution, which does indeed estimate bias terms. Their work presents an excellent overview of visual-inertial systems, but does not present complete analytical models for compensated interpose inertial constraints. Their work does mention the need for compensation Jacobians, but these are not presented.
Work by Forrester et al. [59] was conducted in parallel with this thesis work and similarly presents an exponential manifold type residual function for interpose constraints with retroactive sensor bias estimation; however, we are able to extend their work. The interpose residual function, based on preintegrated inertial measurements, is currently unknown, and the methods listed above use a linear, first-degree approximation of the unknown compensation function.
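In generic terms, such a first-degree approximation has the following shape. The notation is assumed here for illustration and does not reproduce the model of any of the cited works: $\Delta(b)$ denotes a preintegrated inertial quantity as a function of the sensor bias $b$, and $\hat{b}$ is the bias estimate that was in use when the measurements were integrated.

```latex
\begin{equation}
\Delta(b) \;\approx\; \Delta(\hat{b})
  \;+\; \left. \frac{\partial \Delta}{\partial b} \right|_{\hat{b}} \left( b - \hat{b} \right)
\end{equation}
```

The Jacobian term lets the preintegral be corrected after the fact when inference revises the bias estimate, without re-integrating the raw measurements. The approximation degrades as $b - \hat{b}$ grows.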