This section discusses work on stochastic inference done outside the robotics and guidance-and-control communities. Techniques already developed in the statistical inference literature will be used in our algorithm development. The discussion presented in this section is by no means complete, but is intended as a bridge between two research communities.
2.8.1 Belief Propagation Methods
Beyond the robotics community, Cowell [37] discusses how Gaussian likelihoods propagate information between variables on the graph, using the graphical model as a guide. For example, Cowell shows how the triples method passes three parametric numbers between variables, and how a complete posterior belief for each variable can then be constructed.
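As a minimal sketch of Gaussian messages on a graph, the Python below runs the standard sum-product recursion on a chain of scalar variables in information (canonical) form, where each message is a pair of numbers: a precision and an information term. This is an assumed illustration of generic Gaussian belief propagation, not Cowell's triples method itself; the function name `gaussian_bp_chain` and the example matrix are hypothetical.

```python
import numpy as np

def gaussian_bp_chain(J, h):
    """Sum-product (belief propagation) on a chain-structured Gaussian in
    information form, p(x) ∝ exp(-0.5 x^T J x + h^T x), with J tridiagonal.

    Each message is a (precision, information) pair; the returned marginals
    are exact because a chain contains no cycles.
    """
    n = len(h)
    # Forward messages: (fJ[i], fh[i]) flow from node i-1 into node i.
    fJ, fh = np.zeros(n), np.zeros(n)
    for i in range(1, n):
        Jp, hp = J[i - 1, i - 1] + fJ[i - 1], h[i - 1] + fh[i - 1]
        fJ[i] = -J[i - 1, i] ** 2 / Jp
        fh[i] = -J[i - 1, i] * hp / Jp
    # Backward messages: (bJ[i], bh[i]) flow from node i+1 into node i.
    bJ, bh = np.zeros(n), np.zeros(n)
    for i in range(n - 2, -1, -1):
        Jp, hp = J[i + 1, i + 1] + bJ[i + 1], h[i + 1] + bh[i + 1]
        bJ[i] = -J[i + 1, i] ** 2 / Jp
        bh[i] = -J[i + 1, i] * hp / Jp
    # Belief at each node: local potential combined with both incoming messages.
    prec = np.diag(J) + fJ + bJ
    return (h + fh + bh) / prec, 1.0 / prec

# Usage: a 5-variable chain, compared against a dense solve of the joint.
n = 5
J = 2.0 * np.eye(n) - 0.9 * (np.eye(n, k=1) + np.eye(n, k=-1))
h = np.array([1.0, 0.0, -0.5, 2.0, 0.3])
means, variances = gaussian_bp_chain(J, h)
print(np.allclose(means, np.linalg.solve(J, h)))          # marginal means
print(np.allclose(variances, np.diag(np.linalg.inv(J))))  # marginal variances
```

Because a chain has no cycles, the recovered means and variances match a dense solve of the joint system.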
Ranganathan and Dellaert [194] use loopy belief propagation to transmit belief directly on the factor graph. Their method still assumes that all posterior distributions are Gaussian, but shows that methods other than linear-algebra-based optimization are possible. Loopy belief propagation requires repeated iterations of message passing across an entire network that contains loops, in a seemingly random pattern, with little guarantee that an acceptable solution will emerge. This matters in robotic navigation, where loop closures introduce cycles in the factor graph.
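To illustrate the loop-closure case, the following sketch runs Gaussian loopy belief propagation on a ring of four scalar variables, assuming a synchronous message schedule; the function name and example values are hypothetical. The example matrix is diagonally dominant, a known sufficient condition for Gaussian loopy belief propagation to converge. When it does converge, the means are exact but the variances generally are not.

```python
import numpy as np

def gaussian_loopy_bp(J, h, iters=100):
    """Synchronous Gaussian loopy belief propagation on a pairwise model
    p(x) ∝ exp(-0.5 x^T J x + h^T x). Messages (precision, information)
    are passed along every edge given by the off-diagonal nonzeros of J."""
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and J[i, j] != 0.0] for i in range(n)]
    mJ = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}  # message i -> j, precision part
    mh = dict(mJ)                                           # message i -> j, information part
    for _ in range(iters):
        newJ, newh = {}, {}
        for (i, j) in mJ:
            # Combine node i's potential with all incoming messages except the one from j.
            Jp = J[i, i] + sum(mJ[(k, i)] for k in nbrs[i] if k != j)
            hp = h[i] + sum(mh[(k, i)] for k in nbrs[i] if k != j)
            newJ[(i, j)] = -J[i, j] ** 2 / Jp
            newh[(i, j)] = -J[i, j] * hp / Jp
        mJ, mh = newJ, newh
    prec = np.array([J[i, i] + sum(mJ[(k, i)] for k in nbrs[i]) for i in range(n)])
    info = np.array([h[i] + sum(mh[(k, i)] for k in nbrs[i]) for i in range(n)])
    return info / prec, 1.0 / prec

# Four variables in a ring: the edge (3, 0) plays the role of a loop closure.
J = 3.0 * np.eye(4)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    J[a, b] = J[b, a] = -1.0
h = np.array([1.0, 0.0, -1.0, 2.0])
means, variances = gaussian_loopy_bp(J, h)
print(np.allclose(means, np.linalg.solve(J, h)))  # means match the exact solve
print(variances, np.diag(np.linalg.inv(J)))        # variances are only approximate
```

Here each loopy variance estimate is 1/√5 ≈ 0.447, slightly below the exact value 7/15 ≈ 0.467, so the loopy result is overconfident even though its means are correct.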
The expectation propagation algorithm [152] sends summarizing statistics as messages on the factor graph structure. For example, the first two or three statistical moments of the posterior can be propagated across the network, rather than an estimate of the entire density function. This option may not be well suited for drawing accurate metric solutions from multi-modal distributions, since two modes imply that two different solutions are possible. The first moment of a distribution is its mean, which would incorrectly collapse distinct possibilities into a single parametric value between the modes.
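The collapse described above is easy to reproduce. When the approximating family is Gaussian, the projection step in expectation propagation amounts to moment matching, so the sketch below simply matches the first two moments of a bimodal belief. The helper names and the two-component 1-D mixture used as the posterior are hypothetical.

```python
import numpy as np

def mixture_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture evaluated at a scalar x."""
    w, m, s = map(np.asarray, (weights, means, sigmas))
    return np.sum(w * np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi)))

def moment_match(weights, means, sigmas):
    """Single Gaussian (mean, std) with the same first two moments as the mixture."""
    w, m, s = map(np.asarray, (weights, means, sigmas))
    w = w / w.sum()
    mu = np.sum(w * m)
    var = np.sum(w * (s ** 2 + (m - mu) ** 2))
    return mu, np.sqrt(var)

# Two equally likely hypotheses, e.g. a landmark on either side of a robot (hypothetical).
weights, means, sigmas = [0.5, 0.5], [-5.0, 5.0], [0.5, 0.5]
mu, sd = moment_match(weights, means, sigmas)
print(mu, sd)                                   # mean lands between the modes
print(mixture_pdf(mu, weights, means, sigmas))  # true density there is negligible
```

The matched Gaussian is centred at 0 with a standard deviation of about 5, exactly where the true belief has essentially no probability mass.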
The work of Kuehnel [130] in 2004 investigated Bayesian sampling-type inference techniques, focusing on posterior estimation for structure from motion. His work illustrates that non-parametric representations can be used for localization and mapping. Our work will pursue approximating the posterior distribution of variables, and will avoid inference on cyclic graphs by propagating belief on the Bayes tree instead.
2.8.2 Belief Propagation on the Junction Tree
Given the analytic appeal of acyclic re-factorizations, and the desire for full posterior belief estimation for each variable in the system, we should expect to find existing techniques in the literature that have explored this approach. Indeed, the CHURCH algorithm by Kjaerulff [125], from the mid-1990s statistical inference literature, performs belief-propagation-type inference on the junction tree. Previous work on belief propagation has mostly avoided tree representations because of the increased dimensionality of the cliques. The advent of the Bayes tree, together with improvements in technology, helps us address three existing difficulties.
The CHURCH algorithm was developed before the COLAMD algorithm [39] existed, and suffered from poorer variable orderings for larger problem sizes. The improved variable ordering of COLAMD produces lower-dimensional cliques in the Bayes tree, which greatly reduces the cost of inference within each clique, since that cost grows exponentially with clique dimension.
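The effect of variable ordering on clique size can be seen with a toy elimination game. The sketch below assumes a star-shaped graph and compares eliminating the hub first against a greedy minimum-degree ordering. Minimum degree is a simple relative of the approximate-minimum-degree heuristic behind COLAMD, not COLAMD itself, and all names are hypothetical.

```python
def eliminate(adj, order):
    """Run variable elimination on an undirected graph and return the clique
    formed at each step (the eliminated variable plus its current neighbours)."""
    g = {v: set(nb) for v, nb in adj.items()}
    cliques = []
    for v in order:
        nbrs = g.pop(v)
        cliques.append({v} | nbrs)
        for a in nbrs:            # connect the remaining neighbours (fill-in)
            g[a].discard(v)
            g[a] |= nbrs - {a}
    return cliques

def min_degree_order(adj):
    """Greedy minimum-degree ordering (a simple heuristic, not COLAMD itself)."""
    g = {v: set(nb) for v, nb in adj.items()}
    order = []
    while g:
        v = min(g, key=lambda u: (len(g[u]), u))  # fewest neighbours, ties by label
        nbrs = g.pop(v)
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}
        order.append(v)
    return order

# Star graph: variable 0 linked to six others (e.g. one landmark seen from six poses).
adj = {0: set(range(1, 7)), **{i: {0} for i in range(1, 7)}}
hub_first = [0, 1, 2, 3, 4, 5, 6]
print(max(len(c) for c in eliminate(adj, hub_first)))              # largest clique, hub first
print(max(len(c) for c in eliminate(adj, min_degree_order(adj))))  # largest clique, min-degree
```

Eliminating the hub first creates a single clique over all seven variables, whereas the minimum-degree ordering never produces a clique larger than two.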
The second problem area is working with nonparametric beliefs. The CHURCH algorithm was developed before the multi-scale Gibbs sampling algorithm of Sudderth & Ihler et al. [218]. Rather than dealing with how parametric models interact in a junction tree, as CHURCH requires, we can approximate any continuous belief with a kernel density estimate and then use the generalized multi-scale Gibbs product, as suggested by Sudderth and Ihler.
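As a sketch of the kernel density step, assuming 1-D samples and Gaussian kernels, the code below builds a KDE whose bandwidth follows Silverman's Gaussian-reference rule of thumb, h = (4/(3n))^(1/5) σ. The bandwidth rule, the bimodal test samples, and the helper names are assumptions chosen for simplicity, not necessarily the choices made elsewhere in this work.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Gaussian-reference rule of thumb for 1-D data: h = (4 / (3 n))**(1/5) * sigma."""
    return (4.0 / (3.0 * samples.size)) ** 0.2 * samples.std(ddof=1)

def kde(samples, bandwidth):
    """Return a function that evaluates a 1-D Gaussian kernel density estimate."""
    def density(x):
        z = (np.atleast_1d(x)[:, None] - samples[None, :]) / bandwidth
        norm = samples.size * bandwidth * np.sqrt(2 * np.pi)
        return np.exp(-0.5 * z ** 2).sum(axis=1) / norm
    return density

# Samples from a bimodal belief (an assumed stand-in for particles from a factor).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-3.0, 0.5, 200), rng.normal(3.0, 0.5, 200)])
belief = kde(samples, silverman_bandwidth(samples))
print(belief(np.array([-3.0, 0.0, 3.0])))  # high at both modes, low between them
```

Silverman's rule assumes a roughly Gaussian spread and tends to oversmooth multi-modal samples such as these, yet both modes remain clearly separated here.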
The last aspect is driven by technology and computational power. The CHURCH algorithm did not have access to today's greatly enhanced computational performance, or to more powerful programming languages such as Julia [19], which we use almost exclusively in our development.
2.8.3 Markov Chain Monte Carlo (Sampling)
Markov Chain Monte Carlo (MCMC) estimation of posterior distributions has received considerable attention, dating back to work by Gibbs and on Markov-type processes in the early 1900s. These statistical inference methods were mostly used for distribution estimation in the physics community, to infer hidden parameters with non-Gaussian distribution types or higher levels of uncertainty. As we discussed earlier, parametric optimization-type systems are difficult to solve for nonsingular systems.
Metropolis & Hastings et al. [31] connected Kolmogorov's criterion for reversibility with Markov chain processes through an expression called detailed balance. Detailed balance ensures that the target distribution is a stationary distribution of the resulting reversible Markov chain, so that a sampling scheme satisfying it stochastically produces samples from that embedded stationary distribution. The hybrid (Hamiltonian) Monte Carlo scheme, devised during the 1980s by Duane [47] and further explained in [86, 91, 167], is the best-known mechanism for conducting truly non-parametric posterior inference. Oh et al. [178] point out that many MCMC [165] methods have focused on unimodal probability densities, and have difficulty exploring multi-modal posteriors.
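Detailed balance can be checked directly on a small discrete example. The sketch below builds a Metropolis-Hastings transition matrix and verifies that the probability flux π_i P_ij is symmetric, which in turn makes π stationary. The function name, the four-state target, and the uniform symmetric proposal are hypothetical choices.

```python
import numpy as np

def metropolis_hastings_matrix(pi, Q):
    """Transition matrix of a discrete Metropolis-Hastings chain that targets `pi`
    using proposal matrix `Q` (Q[i, j] = probability of proposing j from i)."""
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and Q[i, j] > 0:
                accept = min(1.0, (pi[j] * Q[j, i]) / (pi[i] * Q[i, j]))
                P[i, j] = Q[i, j] * accept
        P[i, i] = 1.0 - P[i].sum()  # rejected proposals keep the chain at state i
    return P

pi = np.array([0.1, 0.2, 0.3, 0.4])
Q = (np.ones((4, 4)) - np.eye(4)) / 3.0  # propose any other state uniformly
P = metropolis_hastings_matrix(pi, Q)
flux = pi[:, None] * P                    # flux[i, j] = pi_i * P_ij
print(np.allclose(flux, flux.T))          # detailed balance holds
print(np.allclose(pi @ P, pi))            # so pi is a stationary distribution
```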
The complexity of the multi-modal problem is dramatically inflated as the dimension of the posterior increases. Starting in the 1990s, many authors have proposed approaches, such as tempered sampling or multi-scale methods, to improve both the speed of MCMC sampling and its effectiveness in finding all modes. Latent variables introduce a further layer of complexity in the sampling process, but are fundamentally part of any SLAM solution, and our approach must therefore be able to cater for this case. To this end, our approach will rely on the consistency of the likelihood measurement models to propose samples across widely separated modes.
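A one-dimensional toy problem already shows the mode-discovery difficulty. The sketch below assumes an equal mixture of two narrow Gaussians at ±5 and runs a random-walk Metropolis sampler with a small step; the names and parameters are hypothetical.

```python
import numpy as np

def log_target(x):
    """Unnormalized log density of an equal mixture of N(-5, 0.5^2) and N(5, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 5.0) / 0.5) ** 2, -0.5 * ((x - 5.0) / 0.5) ** 2)

def random_walk_metropolis(x0, step, n, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal of scale `step`."""
    x, lp = x0, log_target(x0)
    chain = np.empty(n)
    for t in range(n):
        y = x + step * rng.normal()
        ly = log_target(y)
        if np.log(rng.uniform()) < ly - lp:  # accept with probability min(1, pi(y)/pi(x))
            x, lp = y, ly
        chain[t] = x
    return chain

chain = random_walk_metropolis(-5.0, 0.5, 20_000, np.random.default_rng(0))
print((chain > 0).mean())  # fraction of samples in the right-hand mode
```

Started in the left mode, the chain stays there: reaching the right mode requires accepting moves through a region whose density is roughly e^-50 times smaller, so the sampler never reports the second solution.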
Metropolis-Hastings sampling of multi-modal posteriors has proved exceedingly difficult, and several works, including Langevin-based methods [7], Neal's tempered transitions [166], and a few others [58, 74], have proposed ways to circumvent the mode discovery problem. Further samplers, including importance sampling using the mixture importance function [178] and stratified importance sampling, have also been proposed. The Avramidis AISDE importance sampler, the Wang-Landau sampler [131], and many of the Equi-Energy sampler adaptations, such as [13], are worth mentioning. These samplers all aim to improve multi-modal performance, but still have limitations in discovering distant modes in the posterior distribution.
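As one concrete instance of the tempering idea, the sketch below uses parallel tempering (replica exchange), swapped in here as a close relative of the tempered schemes cited above; it is not Neal's tempered transitions algorithm. It assumes a two-mode toy target (an equal mixture of narrow Gaussians at ±5), a hand-picked temperature ladder, and hypothetical function names. Hot chains cross the barrier easily, and state swaps between neighbouring temperatures carry those crossings down to the cold chain, which samples the target itself.

```python
import numpy as np

def log_target(x):
    """Unnormalized log density of an equal mixture of N(-5, 0.5^2) and N(5, 0.5^2)."""
    return np.logaddexp(-0.5 * ((x + 5.0) / 0.5) ** 2, -0.5 * ((x - 5.0) / 0.5) ** 2)

def parallel_tempering(temps, n, rng, x0=-5.0):
    """Replica-exchange Metropolis; chain k targets pi(x)**(1 / temps[k]).
    Returns the states of the temps[0] == 1 (cold) chain."""
    temps = np.asarray(temps, dtype=float)
    K = temps.size
    x = np.full(K, x0)
    lp = log_target(x)
    cold = np.empty(n)
    for t in range(n):
        # Within-temperature moves: random-walk Metropolis, step size grows with temperature.
        for k in range(K):
            y = x[k] + 0.5 * np.sqrt(temps[k]) * rng.normal()
            ly = log_target(y)
            if np.log(rng.uniform()) < (ly - lp[k]) / temps[k]:
                x[k], lp[k] = y, ly
        # Exchange move: propose swapping the states of one adjacent pair of temperatures.
        k = rng.integers(K - 1)
        log_accept = (lp[k + 1] - lp[k]) * (1.0 / temps[k] - 1.0 / temps[k + 1])
        if np.log(rng.uniform()) < log_accept:
            x[[k, k + 1]] = x[[k + 1, k]]
            lp[[k, k + 1]] = lp[[k + 1, k]]
        cold[t] = x[0]
    return cold

cold = parallel_tempering([1.0, 2.5, 6.0, 15.0, 40.0], 20_000, np.random.default_rng(0))
print((cold > 0).mean())  # fraction of cold-chain samples in the right-hand mode
```

The temperature ladder is a tuning choice: adjacent temperatures must be close enough for swaps to be accepted regularly, while the hottest chain must flatten the barrier between modes.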
Gibbs sampling was popularized in 1984 by the image reconstruction algorithm of Geman and Geman [72], which showed that Gibbs sampling can be used effectively for higher-dimensional problems. Gibbs sampling uses the actual conditional beliefs, assuming they are available, as proposal distributions in the sampling process. This direct use of the user-defined conditional beliefs allows such likelihoods to introduce multiple modes into the proposal distributions.
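For intuition, a minimal Gibbs sampler for a standard bivariate normal with correlation ρ is sketched below; the function name and parameters are hypothetical. Each coordinate is drawn exactly from its conditional given the other, which is the sense in which Gibbs sampling uses the conditionals as proposals: every draw is accepted.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n, rng, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho, using
    the exact conditionals x | y ~ N(rho * y, 1 - rho**2) and y | x ~ N(rho * x, 1 - rho**2)."""
    sd = np.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    out = np.empty((n, 2))
    for t in range(n + burn_in):
        x = rng.normal(rho * y, sd)  # draw x from p(x | y)
        y = rng.normal(rho * x, sd)  # draw y from p(y | x)
        if t >= burn_in:
            out[t - burn_in] = (x, y)
    return out

samples = gibbs_bivariate_normal(0.8, 20_000, np.random.default_rng(1))
print(np.corrcoef(samples.T)[0, 1])  # sample correlation, close to 0.8
```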
The imputation method of Tanner and Wong [221] in 1987 further expanded the notion of Gibbs sampling to include approximation of the belief functions themselves, using only the current states and user-defined models. Tanner and Wong were able to show accurate recovery of a bi-modal posterior distribution and rapid convergence rates, although the probability masses were still relatively closely spaced. Their work showed that function approximation could be done using Gibbs sampling, and was furthered by Gelfand et al. [71] and Celeux et al. [29] during the 1990s. Together, these works showed that several variations of Gibbs-type sampling strategies produce high-quality results.
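The imputation idea can be sketched with a two-component Gaussian mixture whose latent component labels are unknown. The code below is an illustrative data-augmentation-style Gibbs sampler, not Tanner and Wong's exact algorithm. It assumes unit component variances, equal weights, a broad normal prior on the means, and hypothetical names, and it alternates between imputing the labels given the current means and drawing the means given the imputed labels.

```python
import numpy as np

def gibbs_mixture_means(data, n_iter, rng, mu_init=(-1.0, 1.0), prior_var=100.0):
    """Data-augmentation-style Gibbs sampler for an equal-weight, unit-variance,
    two-component Gaussian mixture with unknown means (illustrative only)."""
    mu = np.array(mu_init, dtype=float)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Imputation step: P(z_i = 1 | x_i, mu) from the two unit-variance likelihoods.
        loglik = -0.5 * (data[:, None] - mu[None, :]) ** 2
        p1 = np.exp(loglik[:, 1] - np.logaddexp(loglik[:, 0], loglik[:, 1]))
        z = rng.uniform(size=data.size) < p1  # True means "assigned to component 1"
        # Posterior step: conjugate normal update of each mean given its assigned points.
        for k, mask in enumerate((~z, z)):
            prec = mask.sum() + 1.0 / prior_var
            mu[k] = rng.normal(data[mask].sum() / prec, np.sqrt(1.0 / prec))
        draws[t] = mu
    return draws

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
draws = gibbs_mixture_means(data, 2000, rng)
print(draws[500:].mean(axis=0))  # posterior-mean estimates of the two component means
```

Initializing with the means ordered and using well-separated clusters keeps label switching from occurring in this toy run.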
A powerful generalization of the Gibbs sampling scheme was presented by Sudderth and Ihler et al. [217, 218] in 2003 and 2010. Their approach uses a multi-scale Gibbs sampling scheme [101] to estimate the product between approximate, multi-modal belief functions. Coarser scales are used to explore the entire space between modes, and the result is then refined down to the actual belief at the finest scale. In particular, their approach is built around kernel density estimation for continuous belief approximations (see Silverman [208]), and is a major part of our development. A more detailed discussion is given in Section 5.5.1.
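A minimal sketch of the basic single-scale Gibbs label sampler for products of Gaussian mixtures is given below, assuming 1-D beliefs represented as mixtures of Gaussian kernels (for example, kernel density estimates). For each mixture in turn, it resamples which kernel is selected, given the kernels currently selected in the others, and finally draws from the product of the chosen Gaussians. The multi-scale (KD-tree) refinement of Sudderth and Ihler is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def gibbs_product_sample(means, variances, weights, n_sweeps=10, rng=None):
    """Draw one sample from the normalized product of d >= 2 one-dimensional
    Gaussian mixtures; inputs have shape (d, N), one row per mixture.
    Single-scale Gibbs over kernel labels (no multi-scale refinement)."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = means.shape
    labels = np.array([rng.choice(n, p=weights[j] / weights[j].sum()) for j in range(d)])
    for _ in range(n_sweeps):
        for j in range(d):
            others = [k for k in range(d) if k != j]
            # Product of the kernels currently selected in the other mixtures (information form).
            prec = sum(1.0 / variances[k, labels[k]] for k in others)
            info = sum(means[k, labels[k]] / variances[k, labels[k]] for k in others)
            mu_o, var_o = info / prec, 1.0 / prec
            # Kernel i in mixture j is weighted by w_i * N(mu_i; mu_o, var_i + var_o).
            s2 = variances[j] + var_o
            logp = np.log(weights[j]) - 0.5 * np.log(2 * np.pi * s2) - 0.5 * (means[j] - mu_o) ** 2 / s2
            p = np.exp(logp - logp.max())
            labels[j] = rng.choice(n, p=p / p.sum())
    # Sample from the product of the d selected Gaussians.
    prec = sum(1.0 / variances[k, labels[k]] for k in range(d))
    info = sum(means[k, labels[k]] / variances[k, labels[k]] for k in range(d))
    return rng.normal(info / prec, np.sqrt(1.0 / prec))

# Two bimodal beliefs about the same variable that agree only near x = -3 (hypothetical).
rng = np.random.default_rng(0)
means = np.array([[-3.0, 3.0], [-3.0, 12.0]])
variances = np.full((2, 2), 0.25)
weights = np.full((2, 2), 0.5)
draws = np.array([gibbs_product_sample(means, variances, weights, rng=rng) for _ in range(200)])
print(draws.mean())  # product mass concentrates near -3
```

Even though each input belief is bimodal, their product retains only the mode on which both agree, which is the behaviour a multi-modal belief propagation scheme relies on.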
Other methods, such as progressive Bayes by Schrempf et al. [204], or reproducing kernel Hilbert spaces as presented by Smola, Song, Fukumizu, and Gretton [69, 77, 213, 214], indicate active research in nonparametric inference methods. Work in kernel Hilbert spaces is promising, since much of the theoretical groundwork has already been developed; see Nashed et al. [164] from 1991 as one example. More recent work has also developed samplers directly in the embedded feature space; see Beskos [17].