• No results found

Hierarchical Bayesian Kalman Filter

4.3 Linear Dynamic Sparse Modelling

4.3.1 Hierarchical Bayesian Kalman Filter

Kalman filter [WB95] is the most common technique for tracking a dynamic signal fol- lowing the linear dynamic model. The conventional Kalman filter is based on the Gaus- sian assumption that both the measurement noise ηt and the system noise κt are cho-

sen as zero-mean Gaussian variables with known covariance, i.e. ηt ∼ N (0, σ2I) and

κt∼ N (0, Qt), where I is an identity matrix. Based on the Gaussian assumption, one has

p(wt|wt−1) = N (µt−1, Σt−1+ Qt) and p(yt|wt) = N (Θtµt, σ2I), where µt−1 and Σt−1

refer to the mean and covariance of signal w at time t − 1 respectively. To track the dy- namic signal w, the Kalman filter continuously alternates between two steps: prediction and update. The prediction step predicts the prior state of the signal wt by calculating

the parameters of p(wt|yt−1) while the update step evaluates posterior state of the signal

p(wt|yt) after observing the current measurement yt.

• Prediction step: p(wt|yt−1) = Z p(wt|wt−1)p(wt−1|yt−1)dwt−1 (4.4) = N (µt|t−1, Σt|t−1) (4.5) • Update step: p(wt|yt) = p(yt|wt)p(wt|yt−1) p(yt) (4.6) = N (µt|t, Σt|t) (4.7)

where the notation t|t − 1 means prediction at time instance t from the time instance t − 1. It can be shown that both distributions p(wt|yt−1) and p(wt|yt) are Gaussian. Given

the current measurement yt, the globally optimal estimate of wt can be determined by

using the linear minimum mean square error (LMMSE) estimator [Sch91] in the update step. LMMSE is the conditional expectation of wtgiven the measurement yt, where the

conditional expectation is assumed to be a simple linear function of yt, E{x|y} = Gtyt.

E{·} refers to the expected value and Gt∈ [0, 1]n×mwhich is called Kalman gain reflects

relative certainty of the measurements and current state estimate. A low gain makes the filter follow the predictions which are obtained in the prediction step more closely. In other words, a gain of zero causes the measurements to be ignored while gain of one causes the predictions to be totally ignored. In addition, the LMMSE provides the estimate of the full distribution of the signal wt so that Kalman filter can track the mean and the

covariance of the signal wtsimultaneously.

The conventional Kalman filter can provide the estimate of a signal wtwithout a sparsity

solution. To meet the sparsity constraint, Hierarchical Bayesian Kalman filter [FKDY13] which is derived from the principles behind the Kalman filter and Sparse Bayesian Learn- ing (SBL) is employed in my linear dynamic sparse modelling method for fMRI sequence reconstruction due to: 1) it uses the same model as described in Equations 4.1 and 4.3, 2) it tracks the mean and covariance of the image wtwhich is necessary for implementing

the measurement design process, 3) the employment of hyperparameters which is used to model the image variations promotes sparsity.

The main difference between the conventional Kalman filter and HB-Kalman is the dis- tribution placed on the variable κt in Equation 4.1. Figure 4.7 illustrates this difference

using their graphic models. The conventional Kalman filter places a Gaussian distribution on the variable κt, while in HB-Kalman the distribution of κtis no longer a Gaussian but a

hierarchical sparse distribution (Equation 4.2) which can promote the sparsity of the vari- able. The hierarchical sparse distribution which is constructed by a Gaussian distribution

(a) Kalman filter

(b) HB-Kalman

Figure 4.7: Comparison of Graphic Models for Kalman Filter and HB-Kalman.

κti ∼ N (0, α−1ti ) with a gamma distribution α −1

ti ∼ Γ(a, b) with i = 1...n is approximate

to a Student’s t distribution (detailed in Equation 4.2) that promotes a strong sparsity prop- erty [Tip01]. It also places an inverse Gamma distribution prior p(σ2|c, d) = Γ−12|c, d)

on the noise variance σ2 where c and d are set to small values. As opposed to the conven- tional Kalman filter where the covariance matrix Qt of κt as well as the noise variance

σ2 are predefined with the knowledge of system and measurement noise, the covariance

defined by A−1t where At = diag(αt) = diag([α1, . . . , αn]t) and the noise variance are

learnt online from the measurement yt and therefore an additional optimisation step is

Similar to the conventional Kalman filter, the prediction and update steps need to be per- formed at each time instance. In the prediction step, one has to evaluate:

µt|t−1= µt−1, Σt|t−1= Σt−1+ A−1t ,

yt|t−1= Θtµt|t−1, ye,t= yt− yt|t−1.

(4.8)

In the update step, one computes:

µt|t = µt|t−1+ Gtye,t, Σt|t = (I − GtΘt)Σt|t−1,

Gt= Σt|t−1Θ0t(σ2I + ΘtΣt|t−1Θ0t) −1;

(4.9)

where Gtis the Kalman gain at time t. Different from the standard Kalman filter, one has

to perform the additional step of learning the hyperparameters αtand σ2. From Equation

4.8 we get ye,t = Θtκt + ηt. Following the analysis in SBL [Tip01], maximising the

likelihood p(ye,t|αt) is equivalent to minimising the following cost function:

L(αt) = log det(Σαt) + y 0 e,tΣ −1 αtye,t, (4.10) where Σαt = σ 2I + Θ

tA−1t Θ0t and det(Σαt) refers to the determinant of it. The hyper-

parameter αt and σ2 can be directly estimated via the Expectation-Maximisation (EM)

algorithm, while it is impractical to process the high dimension functional MR images due to its low computational speed. Therefore, a fast algorithm [TFO03] (described in Section 3.1.3 in Chapter 3) is used. Notice that as log det(Σαt) is a convex function

while ye,t0 Σ−1αtye,t is a concave function, the sum of them (i.e. the cost function L(αt)) is

non-convex. Therefore the obtained estimate wt is generally suboptimal. However, this

suboptimal solution is proved to be very useful in practice [KLD13]. The HB-Kalman algorithm is summarised in Algorithm 4.3.1

Algorithm 4.3.1: HB-KALMAN(yt, Θt, µt−1, Σt−1) µt|t−1 ← µt−1; ye,t ← yt− Θtµt|t−1; αt, σ2 ← SBL (ye,t, Θt); Σt|t−1 ← Σt−1+ diag(α)−1; Gt← Σt|t−1Θ0t(σ2I + ΘtΣt|t−1Θ0t) −1; µt|t ← µt|t−1+ Gtye,t; Σt|t ← (I − GtΘt)Σt|t−1; return (µt|t, Σt|t)

comment: SBL is the optimisation algorithm described in [TFO03].