4.3 Linear Dynamic Sparse Modelling
4.3.1 Hierarchical Bayesian Kalman Filter
Kalman filter [WB95] is the most common technique for tracking a dynamic signal fol- lowing the linear dynamic model. The conventional Kalman filter is based on the Gaus- sian assumption that both the measurement noise ηt and the system noise κt are cho-
sen as zero-mean Gaussian variables with known covariance, i.e. ηt ∼ N (0, σ2I) and
κt∼ N (0, Qt), where I is an identity matrix. Based on the Gaussian assumption, one has
p(wt|wt−1) = N (µt−1, Σt−1+ Qt) and p(yt|wt) = N (Θtµt, σ2I), where µt−1 and Σt−1
refer to the mean and covariance of signal w at time t − 1 respectively. To track the dy- namic signal w, the Kalman filter continuously alternates between two steps: prediction and update. The prediction step predicts the prior state of the signal wt by calculating
the parameters of p(wt|yt−1) while the update step evaluates posterior state of the signal
p(wt|yt) after observing the current measurement yt.
• Prediction step: p(wt|yt−1) = Z p(wt|wt−1)p(wt−1|yt−1)dwt−1 (4.4) = N (µt|t−1, Σt|t−1) (4.5) • Update step: p(wt|yt) = p(yt|wt)p(wt|yt−1) p(yt) (4.6) = N (µt|t, Σt|t) (4.7)
where the notation t|t − 1 means prediction at time instance t from the time instance t − 1. It can be shown that both distributions p(wt|yt−1) and p(wt|yt) are Gaussian. Given
the current measurement yt, the globally optimal estimate of wt can be determined by
using the linear minimum mean square error (LMMSE) estimator [Sch91] in the update step. LMMSE is the conditional expectation of wtgiven the measurement yt, where the
conditional expectation is assumed to be a simple linear function of yt, E{x|y} = Gtyt.
E{·} refers to the expected value and Gt∈ [0, 1]n×mwhich is called Kalman gain reflects
relative certainty of the measurements and current state estimate. A low gain makes the filter follow the predictions which are obtained in the prediction step more closely. In other words, a gain of zero causes the measurements to be ignored while gain of one causes the predictions to be totally ignored. In addition, the LMMSE provides the estimate of the full distribution of the signal wt so that Kalman filter can track the mean and the
covariance of the signal wtsimultaneously.
The conventional Kalman filter can provide the estimate of a signal wtwithout a sparsity
solution. To meet the sparsity constraint, Hierarchical Bayesian Kalman filter [FKDY13] which is derived from the principles behind the Kalman filter and Sparse Bayesian Learn- ing (SBL) is employed in my linear dynamic sparse modelling method for fMRI sequence reconstruction due to: 1) it uses the same model as described in Equations 4.1 and 4.3, 2) it tracks the mean and covariance of the image wtwhich is necessary for implementing
the measurement design process, 3) the employment of hyperparameters which is used to model the image variations promotes sparsity.
The main difference between the conventional Kalman filter and HB-Kalman is the dis- tribution placed on the variable κt in Equation 4.1. Figure 4.7 illustrates this difference
using their graphic models. The conventional Kalman filter places a Gaussian distribution on the variable κt, while in HB-Kalman the distribution of κtis no longer a Gaussian but a
hierarchical sparse distribution (Equation 4.2) which can promote the sparsity of the vari- able. The hierarchical sparse distribution which is constructed by a Gaussian distribution
(a) Kalman filter
(b) HB-Kalman
Figure 4.7: Comparison of Graphic Models for Kalman Filter and HB-Kalman.
κti ∼ N (0, α−1ti ) with a gamma distribution α −1
ti ∼ Γ(a, b) with i = 1...n is approximate
to a Student’s t distribution (detailed in Equation 4.2) that promotes a strong sparsity prop- erty [Tip01]. It also places an inverse Gamma distribution prior p(σ2|c, d) = Γ−1(σ2|c, d)
on the noise variance σ2 where c and d are set to small values. As opposed to the conven- tional Kalman filter where the covariance matrix Qt of κt as well as the noise variance
σ2 are predefined with the knowledge of system and measurement noise, the covariance
defined by A−1t where At = diag(αt) = diag([α1, . . . , αn]t) and the noise variance are
learnt online from the measurement yt and therefore an additional optimisation step is
Similar to the conventional Kalman filter, the prediction and update steps need to be per- formed at each time instance. In the prediction step, one has to evaluate:
µt|t−1= µt−1, Σt|t−1= Σt−1+ A−1t ,
yt|t−1= Θtµt|t−1, ye,t= yt− yt|t−1.
(4.8)
In the update step, one computes:
µt|t = µt|t−1+ Gtye,t, Σt|t = (I − GtΘt)Σt|t−1,
Gt= Σt|t−1Θ0t(σ2I + ΘtΣt|t−1Θ0t) −1;
(4.9)
where Gtis the Kalman gain at time t. Different from the standard Kalman filter, one has
to perform the additional step of learning the hyperparameters αtand σ2. From Equation
4.8 we get ye,t = Θtκt + ηt. Following the analysis in SBL [Tip01], maximising the
likelihood p(ye,t|αt) is equivalent to minimising the following cost function:
L(αt) = log det(Σαt) + y 0 e,tΣ −1 αtye,t, (4.10) where Σαt = σ 2I + Θ
tA−1t Θ0t and det(Σαt) refers to the determinant of it. The hyper-
parameter αt and σ2 can be directly estimated via the Expectation-Maximisation (EM)
algorithm, while it is impractical to process the high dimension functional MR images due to its low computational speed. Therefore, a fast algorithm [TFO03] (described in Section 3.1.3 in Chapter 3) is used. Notice that as log det(Σαt) is a convex function
while ye,t0 Σ−1αtye,t is a concave function, the sum of them (i.e. the cost function L(αt)) is
non-convex. Therefore the obtained estimate wt is generally suboptimal. However, this
suboptimal solution is proved to be very useful in practice [KLD13]. The HB-Kalman algorithm is summarised in Algorithm 4.3.1
Algorithm 4.3.1: HB-KALMAN(yt, Θt, µt−1, Σt−1) µt|t−1 ← µt−1; ye,t ← yt− Θtµt|t−1; αt, σ2 ← SBL (ye,t, Θt); Σt|t−1 ← Σt−1+ diag(α)−1; Gt← Σt|t−1Θ0t(σ2I + ΘtΣt|t−1Θ0t) −1; µt|t ← µt|t−1+ Gtye,t; Σt|t ← (I − GtΘt)Σt|t−1; return (µt|t, Σt|t)
comment: SBL is the optimisation algorithm described in [TFO03].