The Gaussian distribution remains the most widely used parametric model for the SLAM posterior. In SLAM as well as other probabilistic inference problems, one typically represents the Gaussian distribution in what is referred to as the standard form, in terms of its mean vector, µ, and covariance matrix, Σ. The popularity of this parametrization is largely due to the ability to track the distribution over time with the EKF. As we have discussed, however, the need to maintain correlations among the state, an implicit characteristic of the standard form, restricts the size of the map for feature-based SLAM. In this section, we describe the details of the canonical parametrization for the Gaussian distribution, which we mentioned earlier in §2.3.2.
We show that this representation offers advantages over the standard form and, in subsequent chapters, we exploit these characteristics to achieve scalable, feature-based SLAM.
2.4.1 Canonical Gaussian Representation
We first present an alternative parametrization of a general Gaussian distribution and contrast this representation with the standard covariance form. We focus, in particular, on the duality between the two parametrizations in the context of the fundamental aspects of SLAM filtering.
Let ξt be a random vector governed by a multivariate Gaussian probability distribution, ξt ∼ N(µt, Σt), traditionally parametrized in full by the mean vector, µt, and covariance matrix, Σt. Expanding the quadratic term within the Gaussian exponential, we arrive at an equivalent representation for the multivariate distribution,
2.4. Information Formulation to SLAM 47
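N⁻¹(ηt, Λt).

\begin{align}
p(\xi_t) &= \mathcal{N}\bigl(\xi_t;\, \mu_t, \Sigma_t\bigr) \notag \\
&\propto \exp\bigl\{-\tfrac{1}{2}(\xi_t - \mu_t)^\top \Sigma_t^{-1} (\xi_t - \mu_t)\bigr\} \notag \\
&= \exp\bigl\{-\tfrac{1}{2}\bigl(\xi_t^\top \Sigma_t^{-1} \xi_t - 2\mu_t^\top \Sigma_t^{-1} \xi_t + \mu_t^\top \Sigma_t^{-1} \mu_t\bigr)\bigr\} \notag \\
&\propto \exp\bigl\{-\tfrac{1}{2}\xi_t^\top \Sigma_t^{-1} \xi_t + \mu_t^\top \Sigma_t^{-1} \xi_t\bigr\} \notag \\
&= \exp\bigl\{-\tfrac{1}{2}\xi_t^\top \Lambda_t \xi_t + \eta_t^\top \xi_t\bigr\} \notag \\
&\propto \mathcal{N}^{-1}\bigl(\xi_t;\, \eta_t, \Lambda_t\bigr) \tag{2.17}
\end{align}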
The canonical form of the Gaussian (2.17) is completely parametrized by the information matrix, Λt, and information vector, ηt, which are related to the mean vector and covariance matrix by (2.18).
\begin{align}
\Lambda_t &= \Sigma_t^{-1} \tag{2.18a} \\
\eta_t &= \Sigma_t^{-1} \mu_t \tag{2.18b}
\end{align}
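The mapping in (2.18) and its inverse can be sketched in a few lines of NumPy. The function names `to_canonical` and `to_standard` and the numerical values are hypothetical, chosen only for illustration; the sketch uses a dense matrix inverse, which is adequate for a small example but is not how one would treat a large map.

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Standard form (mu, Sigma) -> canonical form (eta, Lambda), per (2.18)."""
    Lambda = np.linalg.inv(Sigma)   # (2.18a)
    eta = Lambda @ mu               # (2.18b)
    return eta, Lambda

def to_standard(eta, Lambda):
    """Canonical form (eta, Lambda) -> standard form (mu, Sigma)."""
    Sigma = np.linalg.inv(Lambda)
    mu = Sigma @ eta
    return mu, Sigma

# Illustrative two-state example (values are assumptions).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

eta, Lambda = to_canonical(mu, Sigma)
mu_back, Sigma_back = to_standard(eta, Lambda)
```

The round trip recovers the original mean and covariance up to floating-point error, which reflects the fact that the two parametrizations describe the same distribution.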
Duality between Standard and Canonical Forms
The canonical parametrization for the multivariate Gaussian is the dual of the standard form in regard to the marginalization and conditioning operations [113], as demonstrated in Table 2.1. Marginalizing over variables with the standard form is easy since we simply remove the corresponding elements from the mean vector and covariance matrix. However, the same operation in the canonical form involves calculating a Schur complement and is computationally hard. The opposite is true when computing the conditional from the joint distribution; it is hard with the standard form yet easy with the canonical parametrization.
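The duality can be checked numerically on a small example. The sketch below implements the standard block identities for marginalization and conditioning in each form and compares the results; the function names, the index sets `a` and `b`, and the random test values are all assumptions made for illustration.

```python
import numpy as np

def marginal_cov(mu, Sigma, a):
    """Marginal over the variables in a: select sub-blocks (easy)."""
    return mu[a], Sigma[np.ix_(a, a)]

def marginal_info(eta, Lam, a, b):
    """Marginal over a, integrating out b: Schur complement of Lam_bb (hard)."""
    K = Lam[np.ix_(a, b)] @ np.linalg.inv(Lam[np.ix_(b, b)])
    return eta[a] - K @ eta[b], Lam[np.ix_(a, a)] - K @ Lam[np.ix_(b, a)]

def conditional_cov(mu, Sigma, a, b, beta):
    """Conditional of a given b = beta: Schur complement of Sigma_bb (hard)."""
    K = Sigma[np.ix_(a, b)] @ np.linalg.inv(Sigma[np.ix_(b, b)])
    return mu[a] + K @ (beta - mu[b]), Sigma[np.ix_(a, a)] - K @ Sigma[np.ix_(b, a)]

def conditional_info(eta, Lam, a, b, beta):
    """Conditional of a given b = beta: select the Lam_aa sub-block (easy)."""
    return eta[a] - Lam[np.ix_(a, b)] @ beta, Lam[np.ix_(a, a)]

# Random, well-conditioned four-state Gaussian (values are assumptions).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)
mu = rng.standard_normal(4)
Lam = np.linalg.inv(Sigma)
eta = Lam @ mu

a, b = [0, 1], [2, 3]
beta = np.array([0.3, -1.0])

mu_m, Sigma_m = marginal_cov(mu, Sigma, a)
eta_m, Lam_m = marginal_info(eta, Lam, a, b)
mu_c, Sigma_c = conditional_cov(mu, Sigma, a, b, beta)
eta_c, Lam_c = conditional_info(eta, Lam, a, b, beta)
```

In each pair, the "easy" routine only selects sub-blocks while the "hard" one inverts a block whose size grows with the number of variables removed or conditioned upon.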
The duality between the two parametrizations has important consequences for SLAM implementations as marginalization and conditioning are integral to the filtering process. The marginalization operation is fundamental to the time prediction step as part of the roll-up process (2.11). Measurement updates (2.13), meanwhile, implement a conditioning operation in order to incorporate new observation data in the distribution over the state. The duality between the two Gaussian parametrizations then helps to explain why time prediction is computationally easy with the standard parametrization and hard with the canonical parametrization, while measurement updates are hard with the standard form and easy with the canonical form.
The quadratic complexity of measurement updates is inherent in the standard form and contributes to the EKF’s scalability problem. However, the subsequent chapters demonstrate the ability to exploit the structure of the information matrix in order to make what is otherwise a hard marginalization operation easy in the canonical form. Consequently, both the measurement update and time prediction components of SLAM information filters can be made to scale better with the size of the environment.
Table 2.1: Summary of the marginalization and conditioning operations on a Gaussian distribution expressed in the covariance and information forms.
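\[
p(\alpha, \beta)
= \mathcal{N}\!\left(
\begin{bmatrix} \mu_\alpha \\ \mu_\beta \end{bmatrix},
\begin{bmatrix} \Sigma_{\alpha\alpha} & \Sigma_{\alpha\beta} \\ \Sigma_{\beta\alpha} & \Sigma_{\beta\beta} \end{bmatrix}
\right)
= \mathcal{N}^{-1}\!\left(
\begin{bmatrix} \eta_\alpha \\ \eta_\beta \end{bmatrix},
\begin{bmatrix} \Lambda_{\alpha\alpha} & \Lambda_{\alpha\beta} \\ \Lambda_{\beta\alpha} & \Lambda_{\beta\beta} \end{bmatrix}
\right)
\]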
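For reference, the standard block identities behind the two operations can be sketched as follows for a joint Gaussian over (α, β) whose mean, covariance, information vector, and information matrix are partitioned conformably. These are the well-known textbook results, written in the notation of (2.18); the layout is an assumption and is offered as a sketch rather than as the table's exact contents.

```latex
\begin{align*}
\text{Marginalization:}\quad
  & p(\alpha) = \int p(\alpha, \beta)\, d\beta \\
  & \mu = \mu_\alpha, \qquad \Sigma = \Sigma_{\alpha\alpha} \\
  & \eta = \eta_\alpha - \Lambda_{\alpha\beta}\Lambda_{\beta\beta}^{-1}\eta_\beta, \qquad
    \Lambda = \Lambda_{\alpha\alpha} - \Lambda_{\alpha\beta}\Lambda_{\beta\beta}^{-1}\Lambda_{\beta\alpha} \\[6pt]
\text{Conditioning:}\quad
  & p(\alpha \mid \beta) = p(\alpha, \beta) / p(\beta) \\
  & \mu' = \mu_\alpha + \Sigma_{\alpha\beta}\Sigma_{\beta\beta}^{-1}(\beta - \mu_\beta), \qquad
    \Sigma' = \Sigma_{\alpha\alpha} - \Sigma_{\alpha\beta}\Sigma_{\beta\beta}^{-1}\Sigma_{\beta\alpha} \\
  & \eta' = \eta_\alpha - \Lambda_{\alpha\beta}\,\beta, \qquad
    \Lambda' = \Lambda_{\alpha\alpha}
\end{align*}
```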
Throughout the thesis, we take advantage of the graphical model [63] representation of the SLAM distribution to better understand the estimation process. This is particularly true in the case of the information form of the Gaussian, as we use the graphical model to motivate novel filtering algorithms. An advantageous property of the canonical parametrization is that the information matrix provides an explicit representation for the structure of the corresponding undirected graph or, equivalently, the GMRF [122, 113]. This property follows from the factorization of a general Gaussian density into a product of node potentials, Ψi(ξi), and edge potentials, Ψij(ξi, ξj), for the corresponding undirected graph.
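One way to write this factorization explicitly follows directly from the canonical form (2.17). The sketch below assumes scalar variables ξi and takes the edge product over unordered pairs i < j; this indexing convention is chosen here for concreteness and may differ from the one used elsewhere in the thesis.

```latex
\begin{align*}
p(\xi) &\propto \exp\bigl\{-\tfrac{1}{2}\xi^\top \Lambda \xi + \eta^\top \xi\bigr\} \\
       &= \prod_{i} \Psi_i(\xi_i) \prod_{i<j} \Psi_{ij}(\xi_i, \xi_j), \\
\Psi_i(\xi_i) &= \exp\bigl\{-\tfrac{1}{2}\lambda_{ii}\xi_i^2 + \eta_i \xi_i\bigr\}, \\
\Psi_{ij}(\xi_i, \xi_j) &= \exp\bigl\{-\lambda_{ij}\,\xi_i \xi_j\bigr\}.
\end{align*}
```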
Random variable pairs with zero off-diagonal elements in the information matrix (i.e. λij = 0) have an edge potential Ψij(ξi, ξj) = 1, signifying the absence of a link between the nodes representing the variables. Conversely, non-zero shared information
Figure 2-1: An example of the effect of marginalization on the Gaussian information matrix. We start out with a joint posterior over ξ1:6 represented by the information matrix and corresponding Markov network pictorialized on the left. The information matrix for the marginalized density, p(ξ2:6) = ∫ p(ξ1:6) dξ1, corresponds to the Schur complement of Λββ = Λξ1ξ1 in Λξ1:6ξ1:6. This calculation essentially passes information constraints from the variable being removed, ξ1, onto its adjacent nodes, adding shared information between these variables. We see, then, that a consequence of marginalization is the population of the information matrix.
indicates that there is an edge joining the corresponding nodes with the strength of the edge proportional to λij. In turn, as the link topology for an undirected graph explicitly captures the conditional dependencies among variables, so does the structure of the information matrix. The presence of off-diagonal elements that are equal to zero then implies that the corresponding variables are conditionally independent given the remaining states.
It is interesting to note that one comes to the same conclusion from a simple analysis of the conditioning operation for the information form. Per Table 2.1, conditioning a pair of random variables, α = [ξi⊤ ξj⊤]⊤, on the remaining states, β, involves extracting the Λαα sub-block from the information matrix. When there is no shared information between ξi and ξj, Λαα is block-diagonal, as is its inverse (i.e. the covariance matrix of the conditional distribution). Conditioned upon β, the two variables are uncorrelated, and we can conclude that they are conditionally independent:⁹ p(ξi, ξj | β) = p(ξi | β) · p(ξj | β).
The fact that the information matrix characterizes the conditional independence relationships emphasizes the significance of its structure.
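A small numerical check makes the distinction between marginal correlation and conditional independence concrete. The three-state information matrix below is hypothetical: the first two variables share no information with each other (λ = 0) but are both linked to the third.

```python
import numpy as np

# Hypothetical information matrix: variables 0 and 1 are not directly linked,
# but both are linked to variable 2 (values are assumptions).
Lam = np.array([[ 2.0,  0.0, -1.0],
                [ 0.0,  2.0, -1.0],
                [-1.0, -1.0,  3.0]])

# Marginally, variables 0 and 1 are correlated through variable 2.
Sigma = np.linalg.inv(Lam)
marginal_cov_01 = Sigma[0, 1]

# Conditioning on variable 2 extracts the Lam_aa sub-block, which is diagonal,
# so the conditional covariance (its inverse) is diagonal as well.
alpha = [0, 1]
Lam_aa = Lam[np.ix_(alpha, alpha)]
Sigma_cond = np.linalg.inv(Lam_aa)
conditional_cov_01 = Sigma_cond[0, 1]
```

Here `marginal_cov_01` is non-zero while `conditional_cov_01` is zero, consistent with the zero entry in the information matrix encoding conditional, rather than marginal, independence.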
⁹ This equality holds for Gaussian distributions but is, otherwise, not generally valid.

With particular regard to the structure of the information matrix, it is important to make a distinction between elements that are truly zero and those that are just small in comparison to others. On that note, we return to the process of marginalization, which modifies zeros in the information matrix, thereby destroying some conditional independencies [113]. Consider a six-state Gaussian random vector, ξ ∼ N⁻¹(η, Λ), characterized by the information matrix and GMRF depicted in the left-hand side of Figure 2-1. The canonical form of the marginal density p(ξ2:6) = ∫ p(ξ1:6) dξ1 = N⁻¹(η′, Λ′) follows from Table 2.1 with α = [ξ2 ξ3 ξ4 ξ5 ξ6]⊤ and β = ξ1. The correction term in the Schur complement, Λαβ Λββ⁻¹ Λβα, is non-zero only at locations associated with variables directly linked with ξ1. This set, denoted as m+ = {ξ2, ξ3, ξ4, ξ5}, comprises the Markov blanket [114] for ξ1. Subtracting the correction matrix modifies a number of entries in the Λαα information submatrix, including some that were originally zero. Specifically, while no links exist between ξ2:5 in the original distribution, the variables in m+ become fully connected due to marginalizing over ξ1. Marginalization results in the population of the information matrix, a characteristic that has important consequences when it comes to applying the information form to feature-based SLAM.
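The fill-in caused by marginalization can be reproduced numerically. The sketch below builds a hypothetical six-state information matrix whose sparsity pattern matches the description in the text (ξ1 linked to ξ2 through ξ5, with no links among ξ2:5). The numerical values and the single link between ξ5 and ξ6 are assumptions made for illustration and are not taken from Figure 2-1.

```python
import numpy as np

# Hypothetical information matrix over xi_1..xi_6 (0-based indices 0..5).
n = 6
Lam = 5.0 * np.eye(n)                 # diagonally dominant, hence positive definite
for j in (1, 2, 3, 4):                # xi_1 linked to xi_2..xi_5
    Lam[0, j] = Lam[j, 0] = -1.0
Lam[4, 5] = Lam[5, 4] = -1.0          # assumed link between xi_5 and xi_6

a = [1, 2, 3, 4, 5]                   # alpha: variables kept (xi_2..xi_6)
b = [0]                               # beta: variable marginalized out (xi_1)

Lam_aa = Lam[np.ix_(a, a)]
Lam_ab = Lam[np.ix_(a, b)]
Lam_bb = Lam[np.ix_(b, b)]

# Schur complement: the correction term is non-zero only among neighbors of xi_1.
correction = Lam_ab @ np.linalg.inv(Lam_bb) @ Lam_ab.T
Lam_marg = Lam_aa - correction

def offdiag_nonzeros(M, tol=1e-12):
    """Count off-diagonal entries whose magnitude exceeds tol."""
    mask = ~np.eye(M.shape[0], dtype=bool)
    return int(np.count_nonzero(np.abs(M[mask]) > tol))

nnz_before = offdiag_nonzeros(Lam_aa)
nnz_after = offdiag_nonzeros(Lam_marg)
```

Under these assumed values, the number of non-zero off-diagonal entries among ξ2:6 grows from 2 to 14: the four members of the Markov blanket become fully connected, while the row and column for ξ6 are untouched by the correction term.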