3.3 Component Adaptation
3.4.2 Adaptive Gaussian Mixture Filter
The estimation error of Gaussian mixture filters significantly depends on the number of Gaussian components used. This number is typically defined by the user. The novel adaptive Gaussian mixture filter (AGMF) depicted in Figure 3.3 on the next page adapts the number of components dynamically and on-line and thus, directly tackles the refinement step of Algorithm 3. The nonlinear system and measurement models are lin-earized locally by means of statistical linear regression (see Section 2) at each component of the Gaussian mixture. The induced linearization error is quantified by means of the linearization error covariance matrix (2.28).
Based on this error, a novel moment-preserving splitting procedure is pro-posed for introducing new mixture components. The component causing
106 3 Gaussian Mixture Filtering
Linearization stop?
Splitting
Estimation Reduction
k → k − 1
5 f¡x0¢ 3
f¡xk¢ f¡xk−1¢
zk
Figure 3.3: Flow chart of the adaptive Gaussian mixture filter.
the highest linearization error is selected, while splitting is performed in direction of the strongest nonlinearity, i.e., the strongest deviation between the nonlinear model and its linearized version. Both lineariza-tion and splitting are independent of the used statistical linear regression method, which makes the proposed filter versatilely applicable.
Component Selection
A straightforward way to select a Gaussian component for splitting is to consider the weightsωi, i = 1...L. The component with the highest weight is then split. This however does not take the nonlinearity of g ( · ) in the support of the selected component into account. Since linearization is performed individually and locally, a more reasonable selection would be to consider also the induced linearization error of each component. For this purpose, statistical linear regression already provides an appropriate measure of the linearization error in form of the covariance matrix Ce
in (2.28).
In order to easily assess the linearization error in the multi-dimensional case, the trace operator is applied to Ce, which gives the measure
ε = Tr(Ce) ∈ [0,∞) . (3.19)
3.4 Contributions 107
x ε,g
1 2
1
-1
(a) g (x) = cos(x)
x ε,g
-1
-3 1 3
1
(b) g (x) = exp¡−x2¢
Figure 3.4: Two examples of the linearization error quantification by means of the measures (3.19). The solid lines correspond to the function g (.), the dashed lines to the measure (3.19), and the dotted lines indicate linearized versions of g (.)
Geometrically speaking, the trace is proportional to circumference of the covariance ellipsoid corresponding to Ce. The larger Ceand thus the linearization error, the larger isε. Conversely, the trace is zero, if and only if Ce is the zero matrix, i.e.,ε = 0 ⇔ Ce= 0. Hence, (3.19) is only zero, when there is no linearization error.
Example 13: Quantified Linearization Error
The error measure in (3.19) is applied to both scalar functions g (x) = cos(x) and g (x) = exp¡−x2¢ ,
where x ∼N¡µ,1¢ has constant variance and the mean is moved along the x-axis. The corresponding linearization error values for both functions are depicted in Figure 3.4. The measure approaches zero whenever the function g (.) is almost linear. This is the case for cos(x) if x = 0 and for exp¡−x2¢ if x = ±p2/2and x → ±∞.
108 3 Gaussian Mixture Filtering
Besides the linearization error, the contribution of a component to the nonlinear transformation is important as well. That is, the probability mass of the component, which is given by its weightωi, has also to be taken into account. This avoids splitting of irrelevant components. Both ingredients are combined in the so-called selection criterion
i∗= arg max
i =1...L{si} , si,ωγi·¡1 − exp(−εi)¢1−γ
(3.20) Here, the term 1 − exp(−εi) normalizes the linearization measure (3.19) to the interval [0,1]. For a geometric interpolation between weight and linearization error of component i , the parameterγ ∈ [0,1] used. With γ = 0, selecting a component for splitting only focuses on the linearization error, whileγ = 1 considers the weight only.
Splitting
Once a component causing a large linearization error is identified, new components are introduced by means of splitting the identified com-ponent. As mentioned above, splitting is an ill-posed problem due to the high number of free parameters. To reduce the degrees of freedom, splitting is performed given the following constraints:
1. Along principal axes of a Gaussian: This reduces the splitting prob-lem of a multivariate Gaussian to the one-dimensional case. Fur-thermore, splitting becomes computationally cheap and numeri-cally stable compared to arbitrary splitting directions.
2. Splitting into two components: Allows trading the reduction of the linearization error off against the increase of mixture components.
3. Symmetric around the mean: This minimizes the error introduced by splitting a Gaussian.
4. Moment preserving: The mean vector and covariance matrix of the split Gaussian and thus, of the entire mixture remains unchanged.
3.4 Contributions 109
The restriction to split a Gaussian into two components is no limitation.
As splitting is performed recursively by the AGMF within a time step, the newly introduced components can be split again in the next rounds if the linearization error is still too high.
Letω·N(x;µ,C) be the component considered for splitting. It is replaced by two components according to (3.13) with parameters
ω1= ω2=ω2 , µ1= µ +p
λ·α·v , µ2= µ −p λ·α·v , C1= C2= C − λ · α · v vT,
(3.21)
whereα ∈ [−1,1] is a free parameter. The parametrization in (3.21) ensures moment preservation, i.e., the original Gaussian component and its split counterpart have the same mean and covariance. Furthermore,λ and v in (3.21) are a particular eigenvalue and eigenvector, respectively, of C.
Splitting Direction
What remains an open question is the selection of an appropriate eigen-vector for splitting. A straightforward choice might be the eigeneigen-vector with the largest eigenvalue as in [64, 86]. But since (3.20) determines the Gaussian component that causes the largest linearization error, merely splitting along the eigenvector with the largest eigenvalue does not take this error into account.
The key idea of the proposed criterion is to evaluate the deviation be-tween the nonlinear transformation g (.) and its linearized version (2.26) along each eigenvector. The eigenvector with the largest deviation is then considered for splitting, i.e., the Gaussian is split in direction of the largest deviation in order to cover this direction with more Gaussians, which will reduce the error in subsequent linearization steps.
110 3 Gaussian Mixture Filtering
By means of the error term (2.27), the desired criterion for the splitting direction is defined as
dl , Z
R
e¡xl(ν)¢T· e¡xl(ν)¢·N¡xl(ν);µ,C¢dν (3.22)
with xl(ν),µ + ν·vl, l = 1...nx, and vl being the l th eigenvector C. The integral in (3.22) cumulates the squared deviations along the l th eigen-vector under the consideration of the probability at each point xl(ν). The eigenvector that maximizes (3.22) is then chosen for splitting. Unfortu-nately, due to the nonlinear transformation g ( · ) this integral cannot be solved in closed-form in general. For an efficient and approximate so-lution, the sample point calculation schemes described in Section 2.2.5 are employed to approximate the Gaussian in (3.22) in direction of vl by means of Dirac delta distributions. This automatically leads to a dis-cretization of the integral at a few but carefully chosen points.
Splitting Termination
As indicated in Figure 3.3, in every splitting round a stopping criterion is evaluated. Splitting stops, if at least one of the three following thresholds is reached:
Error threshold: The value siin the selection criterion (3.20) drops below smax ∈ [0,1] for every component.
Component threshold: The number of mixture components excels Lmax. Deviation threshold: The deviation between the original mixture f¡x¢
and the mixture obtained via splitting ˜f¡x¢ excels dmax ∈ [0,1].
The deviation considered for the latter threshold is determined by means of the normalized version of the ISD according to
D¯¡ f ¡x¢, ˜f¡x¢¢, D¡ f (x), ˜f (x)
¢
R f (x)2dx +R f (x)˜ 2dx ∈ [0,1] . (3.23)
3.4 Contributions 111
Since splitting always introduces an approximation error to the original mixture, continuously monitoring the deviation limits this error.
Example 14: Shape Approximation
In order to demonstrate the effectiveness of splitting in direction of the strongest nonlinearity, the nonlinear growth process
y = g¡x¢ =ξ
2+ 5 · ξ 1 + ξ2+ w
adapted from [102] is considered, where x = [ξ w]T∼N¡[1,0]T, I2¢.
To approximate the density of y , the Gaussian f (x) is split recursively into a Gaussian mixture, where the number of components is always doubled until a maximum of 64 components is reached. No mixture reduction and no thresholds smax, dmaxare used. The true density of y is calculated via numerical integration.
Two different values for the parameterγ of the selection criterion (3.20) are used:γ = 0.5, which makes no preference between the com-ponent weight and the linearization error andγ = 1, which considers the weight only. Furthermore, a rather simple selection criterion is considered for comparison, where selecting a Gaussian for splitting is based on the weights only (as it is the case forγ = 1), while the splitting is performed in direction of the eigenvector with the largest eigenvalue.
Table 3.2 shows the KLD between the true density of y and the ap-proximations obtained by splitting. The apap-proximations of the pro-posed splitting scheme are significantly better than the approxima-tions of the largest eigenvalue scheme. This follows from the fact that the proposed scheme not only considers the spread of a component.
It also takes the linearization errors into account. In doing so, the Gaussians are always split along the eigenvector corresponding toξ, since this variable is transformed nonlinearly, while w is not. This is
112 3 Gaussian Mixture Filtering
Table 3.2: Approximation error (KLD × 10) for different splitting schemes and numbers of components.
splitting number of Gaussians
scheme 1 2 4 8 16 32 64
max. eigenvalue 2.01 0.77 0.64 0.47 0.39 0.21 0.26 γ = 1 2.01 0.77 0.59 0.34 0.20 0.12 0.07 γ = 0.5 2.01 0.77 0.40 0.22 0.07 0.03 0.02
different for the largest eigenvalue scheme, which wastes nearly half of the splits on w .
The inferior approximation quality forγ = 1 compared to γ = 0.5 results from splitting components, which may have a high impor-tance due to their weight but which do not cause severe linearization errors. Thus, splitting these components will not improve the ap-proximation quality much.
In Figure 3.5, the approximate density of y is depicted for different numbers of mixture components forγ = 0.5. With an increasing number of components, the approximation approaches the true density very well.