Kalman Filter: A Simple Derivation

(1)

Kalman Filter: A Simple Derivation

Ali M. Mosammam

Department of Statistics, University of Zanjan, Zanjan, Iran ∗_{Corresponding Author: [email protected]}

Abstract

The Kalman ﬁlter is a recursive estimator and plays a fundamental role in statistics for ﬁltering, prediction and smoothing. The key element in any recursive estimator is the estimate of the current state, xk, at timek, based on observations up to and including

observationkand the Kalman ﬁlter enables the estimate of the state to be updated as new observations become available. In this paper we have tried to derive the Kalman ﬁlter as simple as possible.

Keywords

Kalman Filter, State-space Model, Dynamic System, Gaussian Process

1 Introduction

The analysis of a time series [5] and spatial analysis [2] can be done in one of two ways. Firstly, it could be analysed in a batch. In other words, the whole time se-ries to date is analysed. Secondly, analysis of the time series sequentially. With this approach the current esti-mate of the state of the system is updated over time as new measurements are made. Whilst both methods will give the same answer, the sequential method has two advantages:

1. reduction in the computational cost 2. reduction in the storage capacity.

The disadvantage of sequential methods is appeared when the posterior can not be calculated exactly. In this paper we will concentrate on the state-space approach to modelling dynamic systems. In order to analyze a dynamic system, at least two models are required:

• First, a model describing the evolution of the state with time (the state model)

• Second, a model relating the noisy measurements to the state (the measurement model).

In the Bayesian approach to dynamic state estimation, one attempts to construct the posterior probability den-sity function of the state based on all available informa-tion, including the set of received measurements. For many problems, an estimate is required every time that a measurement is received. In this case, a recursive ﬁlter

is a convenient solution. A recursive filtering approach means that received data can be processed sequentially rather than as a batch so that it is not necessary to store the complete data set nor to reprocess existing data if a new measurement becomes available. There are classes of problems for which the recoursive Bayesian solutions are tractable. The most important of these classes is the set of problems where the state and observation equa-tions are linear, and the distribuequa-tions of the prior, and observation and state noise are Gaussian. In this case no algorithm can ever do better than a Kalman filter [4]. In this paper we have tried to derive the Kalman filter properly. If the state space model is linear with uncor-related Gaussian noise and a Gaussian prior, the prior and posterior distributions at each time step are them-selves Gaussian random variables. A Gaussian random variable’s distribution is uniquely defined by the speci-fication of its mean and variance. The Kalman filter is a recursive estimator and plays a fundamental role in statistics for filtering, prediction and smoothing. The key element in any recursive estimator is the estimate of the current state , xk, at time k, based on observations

up to and including observationkand the Kalman ﬁlter enables the estimate of the state to be updated as new observations become available.

2 Basic model

The basic model in order to derive the Kalman ﬁlter is described by linear, discrete-time, ﬁnite-dimensional state-space equations

xk+1 = axk+wk, k≥0 (state equation) (1)

yk = cxk+vk (measurement model)(2)

Here we shall make the following assumptions:

1. {vk} and {wk} are individually independent,

zero-mean, Gaussian processes with known variances. Ifwk∼N(0, σw2), then we haveE(wk) = 0, var(wk) =

σw2 andE(wkwℓ) =E(wk)E(wℓ) = 0for allk̸=ℓ.

If vk ∼ N(0, σv2), we have E(vk) = 0, var(vk) =

E(v_k2) =σv2 andE(vkvℓ) = 0for allk̸=ℓ.

2. {vk} and {wk} are independent processes. Hence

(2)

3. At the initial time, k = 0, the initial state, x0 ∼

N(˜x0, P0), further we shall assume that x0 is

inde-pendent ofvk andwk for anyk. Hence

(a) E(xkvℓ) = 0for all k,ℓ.

(b) E(xkwℓ) = 0for all k,ℓ.

This state-space model has some properties:

1. Sincexkis a linear combination of the jointly

Gaus-sian random variables x0, w0, w1,· · ·, wk−1, it is a

Gaussian variable. 2. {xk}is a Markov process.

3. {yk}is a Gaussian process.

4. {xk}and{yk}are jointly Gaussian.

We have made these assumptions in order to derive the Kalman ﬁlter easily. Some of these assumptions can be relaxed in practice.

3 Scalar Kalman filter

We have already introduced the basic models (1, 2) to derive the Kalman ﬁlter. We want to show that under some assumptions conditional expectation yields a set of recursive equations which is known the Kalman ﬁlter re-cursions. Here we shall denote the following quantities: • xˆk|k=E(xk|y0,· · ·, yk)is the estimate of the current

state , xk, at time k, based on observations up to

and including observationk.

• xˆk|k−1 =E(xk|y0,· · ·, yk₋1) is the predictor of the

current state ,xk, at timek, based onk−1

obser-vations.

• Σ_k_|_k =var(xk|y0,· · ·, yk) is the variance of the

cur-rent state ,xk, at timek, based on observations up

to and including observationk.

• Σ_k_|_k₋₁=var(xk|y0,· · ·, yk−1) is the variance of the

current state , xk, at time k, based on ﬁrst k−1

observations.

We might interested in prediction ofE(yk|y0,· · ·, yk−1).

Similarly notations can be established for observation equation. From (2)and the third assumption it can be written

[

x0

y0 ]

∼N2 ([

˜

x0

cx˜0 ]

,

[

P0 cP0

cP0 c2P0+σv2

])

.

Hence the conditional distribution takes the form x0|y0 ∼ N(ˆx0|0,Σ0|0), which is completely speciﬁed by

following conditional mean and variance:

ˆ

x0_|0=E(x0|y0) = ˜x0+_c2_PcP₀₊0_σ2 v

(y0−cx˜0),

Σ0|0=var(x0|y0) =P0− (cP0) 2

c2_P 0+σ2v.

Using state-space equation (1, 2) fork= 0, we get x1=ax0+w0

y1=cx1+v1.

Using,

[

x1

y0 ]

∼N2 ([

a˜x0

cx˜0 ]

,

[

a2P0+σw2 acP0

acP0 c2P0+σ2v

])

[

y1

y0 ]

∼N2 ([

acx˜0

cx˜0 ]

,

[

c2(a2P0+σ2w) +σv2 ac2P0

ac2P0 c2P0+σ2v

])

,

the conditional distributions are themselves normal and completely speciﬁed by conditional means and vari-ances

x1|y0 ∼ N(ˆx1_|0,Σ1_|0)

ˆ

x1|0 = ax˜0+

acP0

c2_P 0+σv2

(y0−cx˜0) =axˆ0|0

Σ1_|0 = a2P0+σw2 −

(acP0)2

c2_P 0+σ2v

=a2Σ0_|0+σ2w

y1|y0 ∼ N(cxˆ1|0, c 2

Σ₁_|₀+σ2v)

E(y1|y0) = acx˜0+ ac 2

P0

c2_P 0+σv2

(y0−cx˜0) =cxˆ1_|0

var(y1|y0) = c2(a2P0+σw2) +σw2 −

(ac2P0)2

c2_P 0+σv2

=c2Σ₁_|₀+σ2w.

Furthermore,

[

x1

y1 ]

|y0∼N2 ([

ˆ

x1_|0

cxˆ₁_|₀

]

,

[

Σ1_|0 cΣ1_|0

cΣ₁_|₀ c2Σ₁_|₀+σ2v

])

.

Hencex1|y0, y1∼N(ˆx1|1,Σ1|1), where its mean and

vari-ance are

ˆ

x₁_|₁=E(x1|y0, y1) = ˆx1|0+

cΣ₁_|₀

c2_Σ 1|0+σv2

(y1−cxˆ1|0)

Σ1_|1=var(x1|y0, y1) = Σ1_|0−

(cΣ1|0)2

c2_Σ 1|0+σ2v.

Let we have repeated this method and found xk−1|y0,· · ·, yk−1 ∼ N(ˆxk₋1_|k₋1,Σk₋1_|k₋1), using state

(1,2)we can update equations by

xk =axk−1+wk−1 (state equation)

yk−1=cxk−1+vk−1 (measurement model).

Hence the conditional distributions are xk|y0,· · ·, yk−1∼N(ˆxk|k−1,Σk|k−1)

yk|y0,· · ·, yk₋1∼N(cxˆk|k−1, c2Σk|k−1+σ2v)

wherexˆ_k_|_k₋₁=axˆ_k₋₁_|_k₋₁andΣ_k_|_k₋₁=a2Σ_k₋₁_|_k₋₁+σw2

and _[

xk

yk

]

|y0,· · ·, yk−1

∼N2 ([

ˆ

xk|k−1

cxˆ_k_|_k₋₁

]

,

[

Σk|k−1 cΣk|k−1

cΣ_k_|_k₋₁ c2Σ_k_|_k₋₁+σv2

])

Hence xk|y0,· · ·, yk∼N(ˆxk|k,Σk|k), where

ˆ

xk_|k=E(xk|y0,· · ·, yk) = ˆxk_|k₋1+

cΣk|k−1

c2_Σ k|k−1+σv2

(yk−cxˆk_|k₋1)

Σk|k=var(xk|y0,· · ·, yk) = Σk|k−1−

(cΣk|k−1)2

c2_Σ

k|k−1+σ2v.

(3) So the Kalman ﬁlter equations can be written as

ˆ

x_k_|_k=E(xk|y0,· · ·, yk) = ˆxk|k−1+bk(yk−cxˆk|k−1)

Σk|k=var(xk|y0,· · ·, yk) = Σk|k−1−cbkΣk|k−1.

(3)

where

bk=

cΣk|k−1

c2_Σ

k|k−1+σ2v

. (5)

Notice that the conditional variance is independent of y0,· · ·, yk. Alternatively, using xˆk|k−1 = axˆk−1|k−1 the

Kalman ﬁlter can be written by

ˆ

xk_|k=E(xk|y0,· · ·, yk) =axˆk₋1_|k₋1+bk(yk−acˆxk₋1_|k₋1).

Note thatˆxk_|kis the estimate of statexkat timekbased

on the previous estimate and only one data at time k. No algorithm can ever do better than a Kalman ﬁlter in the linear Gaussian state space. It should be noted that it is possible to derive the same results by minimizing MSE.

4 Kalman filter predictor

We might intrested in E(xk+ℓ|y0,· · ·, yk), which is

known as the Kalman ﬁlter predictor. 1. One-step predictor:

The same as the last section we again update xk+1 applying xk+1 = axk + wk. Therefore

xk+1|y0,· · ·, yk ∼N(axˆk|k, a2Σk|k+σ2w)and applying

equation(4), we obtain

ˆ

xk+1_|k=axˆk_|k =axˆk_|k₋1+abk(yk−cxˆk_|k₋1)

Σ_k₊₁_|_k =a2Σ_k_|_k+σw2 =a2(1−cbk)Σk|k−1+σw2.

(6) Note that if we deﬁne the innovation process {y˜k}

byyk−cxˆk|k−1and also assume thaty˜0=y0−E(y0),

then we have xˆk+1|k =axˆk|k−1+abky˜k in terms of

˜

yk.

2. Multi-step predictor:

Using (1), and substituting these sequences we can write

xk+ℓ=aℓxk+

∑ℓ

i=1ai−1wk+ℓ−i.

Then theℓ-step prediction is given by

ˆ

xk+ℓ_|k =E(xk+ℓ|y0,· · ·, yk) =aℓxˆk_|k,

andΣ_k₊_ℓ_|_k =var(x_k₊_ℓ_|_k) =a2ℓΣ_k_|_k+σw21−a

2ℓ

1−a2.Since

yk+ℓ=cxk+ℓ+vk+ℓ we have, then

E(yk+ℓ|y0,· · ·, yk) =cE(xk+ℓ|y0,· · ·, yk) =cxˆk+ℓ|k

var(yk+ℓ|y0,· · ·, yk) =c2Σk+ℓ|k+σ2v.

5 Vector Kalman filter

We have dealt so far with the scalar state equation by a first order autoregressive process. We can write directly the vector Kalman filter which each one is the first-order autoregressive process:

xjk+1=ajxjk+wjk j= 1,2,· · ·, q. (7)

Let Xk = [x1k, x2k,· · ·, xqk]T and Wk =

[w1k, w2k,· · ·, wqk]T.

Theq equations (7) can be written as the ﬁrst-order vector equation, Xk+1=AXk+Wk, where A is a q×q

matrix, in this case given byA=diag(a1, a2,· · ·, aq)and

Wk ∼ N(0, Qk). Assume that observation equations at

timek are

yik=cixik+vik i= 1,· · ·, r r < q.

We deﬁne observation vector equation byYk=CXk+Vk,

where

C=

     

c1 0 . . . 0 . . . 0

0 c2 . . . 0 . . . 0

..

. ... . .. ... . .. ...

0 0 . . . cr . . . 0

     

Yk = [y1k, y2k,· · ·, yrk]T , Vk = [v1k, v2k,· · ·, vqk]T and

Vk ∼N(0, Rk).

Hence we can directly transfer the scalar Kalman ﬁl-ter, given by equations (3) to (5) in to the corresponding vector Kalman ﬁlter form

ˆ

X_k_|_k= ˆX_k_|_k₋₁+βk(Yk−CXˆk|k−1)

Σk|k= Σk|k−1−βkCΣk|k−1,

where βk = Σk_|k₋1CT[CΣk_|k₋1CT +Rk]−1 andΣk_|k₋1=

AΣ_k₋₁_|_k₋₁AT +Qk−1.

• We can obtain the following vector and matrix set of equation directly from scalar Kalman predictor(6).

ˆ

X_k₊₁_|_k=AXˆ_k_|_k₋₁+Aβk(Yk−CXˆk|k−1)

Σk+1_|k=A(I−βkC)Σk_|k₋1AT +Qk,

whichβk is already deﬁned.

6 Some properties of Kalman

fil-ter

1. The Kalman ﬁlter deﬁned in the last sections was conditional expectation of E(xk|y0,· · ·, yk) which

minimizes the mean square error. It is also max-imum likelihood estimator and the best linear un-biased estimator whose variance is less than any other linear unbiased estimators. Even we drop nor-mality assumption, E(xk|y0,· · ·, yk) also minimizes

the variance. Here E(xk|y0,· · ·, yk) is a function

ofy0,· · ·, yk and without the normality assumption

will not necessarily be linear.

2. LetAY+bbe a linear estimator ofXgivenY, where Ais a fixed matrix andbis a fixed vector. We define a linear minimum variance estimator,Xˆ(Y) =A0Y+

b0, contrast with E(X|Y) the calculation of Xˆ(Y)

does not require the joint probability density. If X and Y are jointly Gaussian, the minimum variance and linear minimum variance estimators coincide. 3. If _Xˆ₍_Y₎ _{is a linear minimum variance estimate of}

X, then CXˆ(Y) +d is a linear minimum variance estimator ofCX+d, whereCanddare ﬁxed matrix and vector respectively. It is clearly property of linearity of the expectation operator.

4. Xˆ(Y) is unbiased, because E( ˆX(Y)) = E{µx +

ΣxyΣ−yy1(Y−µy)}=µx. Recall thatE(X|Y)is a

con-ditional minimum variance estimator. Infact _Xˆ₍_y₎

(4)

5. Kalman filter can be extended to many nonlinear problems [3]. As most real world models are non linear and non Gaussian and many analysis tasks in-volve estimating the state of a dynamic model when only partial or inaccurate observations are available, it is of great interest to develop efficient computa-tional methods to solve this so called Bayesian fil-tering problem numerically. Many approximation methods such as a Extended Kalman filter (EKF) have been developed to cope with this problem. The EKF linearises the state and observation equations, and then uses the Kalman Filter to obtain estimates for the state. For more details see e.g. [1].

7 Smoothing Kalman filter

We already have considered xˆk|k = E(xk|y0,· · ·, yk)

and xˆk+ℓ|k = E(xk+ℓ|y0,· · ·, yk) as classic Kalman

ﬁl-ter and predictor. One might be inﬁl-terested in xˆj|k =

E(xj|y0,· · ·, yk), j < k, which is known as smoothing

Kalman ﬁlter. We expect this estimator to be more accurate than E(xk|y0,· · ·, yk), because more

measure-ments are used in producingxˆ_j_|_k. The simplest smooth-ing is ﬁxed-point smoothsmooth-ing where, we want to deter-minexˆ_j_|_k=E(xj|y0,· · ·, yk)and the associated variance,

Σj|k=E{(xj−ˆxj|k)(xj−xˆj|k)T|y0,· · ·, yk}for some ﬁxed

j and all k > j. Let an augmenting state x∗_k is de-ﬁned by x∗_k₊₁ = x∗_k, k ≥ j with x∗_j = xj at the initial

time, then x∗_k₊₁ =xj,∀k ≥j. Hence xˆ∗k+1|k = ˆxj|k and

Σ∗∗_k₊₁_|_k = Σ_j_|_k, whereΣ∗∗_k₊₁_|_k denotes the variance of er-ror (ˆx∗_k₊₁_|_k−x∗_k₊₁). The augment state space model is therefore

Xk+1= [

xk+1

x∗_k₊₁

] = [ a 0 0 1 ] [ xk

x∗_k

] + [ 1 0 ]

wk=AXk+Wk

Yk=yk=

[

c 0

] [ _x

k

x∗_k

]

+Vk=CXk+Vk,

with the state vector atk=j satisfying

[

xj

x∗j

] = [ xj xj ] .

Using the Kalman ﬁlter predictor

ˆ

X_k₊₁_|_k =

[

ˆ

xk+1|k

ˆ

x∗_k₊₁_|_k

] = [ a 0 0 1 ] [ ˆ

xk|k−1

ˆ

x∗_k_|_k₋₁

] + [ a 0 0 1 ] [ bk

b∗_k

] (

Yk−

[

c 0

] [ _x_ˆ_k |k₋1

ˆ

x∗_k_|_k₋₁

])

,

where

βk=

[

bk

b∗k

]

=

[

Σk_|k₋1 Σ∗k_|k₋1

Σ∗_k_|_k₋₁ Σ∗∗_k_|_k₋₁

] [

c

0

]

×([ c 0

] [ _Σ_k_|_k₋₁ _Σ∗

k|k−1

Σ∗_k_|_k₋₁ Σ∗∗_k_|_k₋₁

] [

c

0

]

+σv2

)₋1

,

and _[

Σk+1|k Σ∗k+1_|k

Σ∗_k₊₁_|_k Σ∗∗_k₊₁_|_k

] = [ a 0 0 1 ] ([ 1 0 0 1 ] − [ bk

b∗_k

] [

c 0

])

×

[

Σk|k−1 Σ∗k_|k₋1

Σ∗_k_|_k₋₁ Σ∗∗_k_|_k₋₁

] [ a 0 0 1 ] + [

σw2 0

0 1

]

.

It can be written as

[

Σ_k₊₁_|_k Σ∗_k₊₁_|_k Σ∗_k₊₁_|_k Σ∗∗_k₊₁_|_k

]

=

[

a2(1−cbk)Σk_|k₋1+σw2 a(1−cbk)Σ∗k_|k₋1

a(1−cb∗k)Σk|k−1 Σk∗∗|k−1−cb∗Σ∗k|k−1 ]

.

Note thatΣk+1_|k appearing here is precisely that

associ-ated with the scalar Kalman filter equation, because the first element of the augmented Kalman filter is xˆk_|k₋1.

Hence fork≥j,

ˆ

xj|k=E(xj|y0,· · ·, yk) = ˆxj|k−1+b∗k(yk−cxˆk|k−1),

b∗k=

cΣ∗_k_|_k₋₁

c2_Σ

k|k−1+σ2v,

Σ∗_k₊₁_|_k=a(1−cbk)Σ∗k|k−1,

with initial valueΣ∗_j_|_j₋₁= Σj_|j₋1.

The variance of the error term,(xj−ˆxj|k), is given by

Σj_|k= Σj_|k₋1−cb∗kΣ∗k|k−1.

The ﬁxed-point smoother has some properties:

1. The ﬁxed point smoother is driven from the innova-tions process,y˜k=yk−cˆxk|k−1 of the Kalman ﬁlter

for the nonaugmented model.

2. The smoother is a linear discrete time model of di-mension equal to that of the ﬁlter.

3. As in the case of ﬁlter,xj|y0,· · ·, ykis Gaussian and

therefore deﬁned by the conditional meanxˆj_|k and

Σj_|k.

8 General form of the vector

Kalman filter

In general we can write the state-space equation by Xk+1=AkXk+GkWk, k≥0 (state equation)

Yk=CkXk+Vk (measurement model),

where we have all the assumptions which we made in the scalar one. Furthermore we assume that the covariance of the{Vk}is given byE[VkVℓT] =Rkδkℓwhere

δkℓ=

{

1 k=ℓ

0 o.w

and also covariance of the{Wk}process isE[WkWℓT] =

Qkδkℓ. Hence {Vk} and {Wk} are zero mean,

(5)

X0 ∼N( ˜X0, P0). Similar to scalar one by using

condi-tional expectation we have

[

X0

Y0 ]

∼N

([

˜

X0

C0X˜0 ]

,

[

P0 P0C0T

C0P0 C0P0C0T+R0 ])

.

Hence _Xˆ

0|0 = ˜X0+P0C0T(C0P0C0T +R0)−1(Y0−C0X˜0)

and covariance is deﬁned by

Σ0_|0=P0−P0C0T(C0P0C0T+R0)−1C0P0. We can update

state equation byX1=A0X0+G0W0. SoXˆ1|0=A0Xˆ0|0

andΣ₁_|₀=A0Σ0|0AT+G0Q0GT0. Also fromY1=C1X1+

V1 it follows that Y1 condition on Y0 is Gaussian with

mean, _Yˆ

1|0 = C1Xˆ1|0, and covariance, C1Σ1|0C1T +R1.

Hence

[

X1

Y1 ]

|Y0∼N

([ _ˆ

X1|0

C1Xˆ1|0 ]

,

[

Σ1|0 Σ1|0C1T

C1Σ1|0 C1Σ1|0C1T+R1 ])

.

ThenX1 given Y0, Y1 has normal distribution by

con-ditional mean and covariance,

ˆ

X1|1= ˆX1|0+ Σ1|0C1T(C1Σ1|0C1T+R1)−1(Y1−C1Xˆ1|0)

Σ₁_|₁= Σ₁_|₀−Σ₁_|₀C₁T(C1Σ1|0C1T+R1)−1C1Σ1|0.

Repeating this method yields

ˆ

Xk|k= ˆXk|k−1+Bk(Yk−CkXˆk|k−1)

Σk_|k= Σk_|k₋1−BkCkΣk_|k₋1,

where Bk = Σk_|k₋1CkT(CkΣk_|k₋1CkT + Rk)−1 and

Σk_|k₋1=Ak−1Σk₋1_|k₋1ATk−1+Gk−1Qk−1GTk−1.

REFERENCES

[1] B. D. Anderson and J. B. Moore. Optimal ﬁltering, Courier Corporation, 2012.

[2] N. Cressie. Statistics for Spatial Data, Revised edition, John Wiley, New York, 1993.

[3] D. Crisan and B. Rozovskii. The Oxford handbook of nonlinear ﬁltering, Oxford University Press , 2011.

[4] R. E. Kalman. A new approach to linear ﬁltering and prediction problems, Journal of Fluids Engineering, Vol.82, No.1, 35-45, 1960.