Kalman Filter: A Simple Derivation
Ali M. Mosammam
Department of Statistics, University of Zanjan, Zanjan, Iran ∗Corresponding Author: [email protected]
Copyright c⃝2015 Horizon Research Publishing All rights reserved.
Abstract
The Kalman filter is a recursive estimator and plays a fundamental role in statistics for filtering, prediction and smoothing. The key element in any recursive estimator is the estimate of the current state, xk, at timek, based on observations up to and includingobservationkand the Kalman filter enables the estimate of the state to be updated as new observations become available. In this paper we have tried to derive the Kalman filter as simple as possible.
Keywords
Kalman Filter, State-space Model, Dynamic System, Gaussian Process1
Introduction
The analysis of a time series [5] and spatial analysis [2] can be done in one of two ways. Firstly, it could be analysed in a batch. In other words, the whole time se-ries to date is analysed. Secondly, analysis of the time series sequentially. With this approach the current esti-mate of the state of the system is updated over time as new measurements are made. Whilst both methods will give the same answer, the sequential method has two advantages:
1. reduction in the computational cost 2. reduction in the storage capacity.
The disadvantage of sequential methods is appeared when the posterior can not be calculated exactly. In this paper we will concentrate on the state-space approach to modelling dynamic systems. In order to analyze a dynamic system, at least two models are required:
• First, a model describing the evolution of the state with time (the state model)
• Second, a model relating the noisy measurements to the state (the measurement model).
In the Bayesian approach to dynamic state estimation, one attempts to construct the posterior probability den-sity function of the state based on all available informa-tion, including the set of received measurements. For many problems, an estimate is required every time that a measurement is received. In this case, a recursive filter
is a convenient solution. A recursive filtering approach means that received data can be processed sequentially rather than as a batch so that it is not necessary to store the complete data set nor to reprocess existing data if a new measurement becomes available. There are classes of problems for which the recoursive Bayesian solutions are tractable. The most important of these classes is the set of problems where the state and observation equa-tions are linear, and the distribuequa-tions of the prior, and observation and state noise are Gaussian. In this case no algorithm can ever do better than a Kalman filter [4]. In this paper we have tried to derive the Kalman filter properly. If the state space model is linear with uncor-related Gaussian noise and a Gaussian prior, the prior and posterior distributions at each time step are them-selves Gaussian random variables. A Gaussian random variable’s distribution is uniquely defined by the speci-fication of its mean and variance. The Kalman filter is a recursive estimator and plays a fundamental role in statistics for filtering, prediction and smoothing. The key element in any recursive estimator is the estimate of the current state , xk, at time k, based on observations
up to and including observationkand the Kalman filter enables the estimate of the state to be updated as new observations become available.
2
Basic model
The basic model in order to derive the Kalman filter is described by linear, discrete-time, finite-dimensional state-space equations
xk+1 = axk+wk, k≥0 (state equation) (1)
yk = cxk+vk (measurement model)(2)
Here we shall make the following assumptions:
1. {vk} and {wk} are individually independent,
zero-mean, Gaussian processes with known variances. Ifwk∼N(0, σw2), then we haveE(wk) = 0, var(wk) =
σw2 andE(wkwℓ) =E(wk)E(wℓ) = 0for allk̸=ℓ.
If vk ∼ N(0, σv2), we have E(vk) = 0, var(vk) =
E(vk2) =σv2 andE(vkvℓ) = 0for allk̸=ℓ.
2. {vk} and {wk} are independent processes. Hence
3. At the initial time, k = 0, the initial state, x0 ∼
N(˜x0, P0), further we shall assume that x0 is
inde-pendent ofvk andwk for anyk. Hence
(a) E(xkvℓ) = 0for all k,ℓ.
(b) E(xkwℓ) = 0for all k,ℓ.
This state-space model has some properties:
1. Sincexkis a linear combination of the jointly
Gaus-sian random variables x0, w0, w1,· · ·, wk−1, it is a
Gaussian variable. 2. {xk}is a Markov process.
3. {yk}is a Gaussian process.
4. {xk}and{yk}are jointly Gaussian.
We have made these assumptions in order to derive the Kalman filter easily. Some of these assumptions can be relaxed in practice.
3
Scalar Kalman filter
We have already introduced the basic models (1, 2) to derive the Kalman filter. We want to show that under some assumptions conditional expectation yields a set of recursive equations which is known the Kalman filter re-cursions. Here we shall denote the following quantities: • xˆk|k=E(xk|y0,· · ·, yk)is the estimate of the current
state , xk, at time k, based on observations up to
and including observationk.
• xˆk|k−1 =E(xk|y0,· · ·, yk−1) is the predictor of the
current state ,xk, at timek, based onk−1
obser-vations.
• Σk|k =var(xk|y0,· · ·, yk) is the variance of the
cur-rent state ,xk, at timek, based on observations up
to and including observationk.
• Σk|k−1=var(xk|y0,· · ·, yk−1) is the variance of the
current state , xk, at time k, based on first k−1
observations.
We might interested in prediction ofE(yk|y0,· · ·, yk−1).
Similarly notations can be established for observation equation. From (2)and the third assumption it can be written
[
x0
y0 ]
∼N2 ([
˜
x0
cx˜0 ]
,
[
P0 cP0
cP0 c2P0+σv2
])
.
Hence the conditional distribution takes the form x0|y0 ∼ N(ˆx0|0,Σ0|0), which is completely specified by
following conditional mean and variance:
ˆ
x0|0=E(x0|y0) = ˜x0+c2PcP0+0σ2 v
(y0−cx˜0),
Σ0|0=var(x0|y0) =P0− (cP0) 2
c2P 0+σ2v.
Using state-space equation (1, 2) fork= 0, we get x1=ax0+w0
y1=cx1+v1.
Using,
[
x1
y0 ]
∼N2 ([
a˜x0
cx˜0 ]
,
[
a2P0+σw2 acP0
acP0 c2P0+σ2v
])
[
y1
y0 ]
∼N2 ([
acx˜0
cx˜0 ]
,
[
c2(a2P0+σ2w) +σv2 ac2P0
ac2P0 c2P0+σ2v
])
,
the conditional distributions are themselves normal and completely specified by conditional means and vari-ances
x1|y0 ∼ N(ˆx1|0,Σ1|0)
ˆ
x1|0 = ax˜0+
acP0
c2P 0+σv2
(y0−cx˜0) =axˆ0|0
Σ1|0 = a2P0+σw2 −
(acP0)2
c2P 0+σ2v
=a2Σ0|0+σ2w
y1|y0 ∼ N(cxˆ1|0, c 2
Σ1|0+σ2v)
E(y1|y0) = acx˜0+ ac 2
P0
c2P 0+σv2
(y0−cx˜0) =cxˆ1|0
var(y1|y0) = c2(a2P0+σw2) +σw2 −
(ac2P0)2
c2P 0+σv2
=c2Σ1|0+σ2w.
Furthermore,
[
x1
y1 ]
|y0∼N2 ([
ˆ
x1|0
cxˆ1|0
]
,
[
Σ1|0 cΣ1|0
cΣ1|0 c2Σ1|0+σ2v
])
.
Hencex1|y0, y1∼N(ˆx1|1,Σ1|1), where its mean and
vari-ance are
ˆ
x1|1=E(x1|y0, y1) = ˆx1|0+
cΣ1|0
c2Σ 1|0+σv2
(y1−cxˆ1|0)
Σ1|1=var(x1|y0, y1) = Σ1|0−
(cΣ1|0)2
c2Σ 1|0+σ2v.
Let we have repeated this method and found xk−1|y0,· · ·, yk−1 ∼ N(ˆxk−1|k−1,Σk−1|k−1), using state
(1,2)we can update equations by
xk =axk−1+wk−1 (state equation)
yk−1=cxk−1+vk−1 (measurement model).
Hence the conditional distributions are xk|y0,· · ·, yk−1∼N(ˆxk|k−1,Σk|k−1)
yk|y0,· · ·, yk−1∼N(cxˆk|k−1, c2Σk|k−1+σ2v)
wherexˆk|k−1=axˆk−1|k−1andΣk|k−1=a2Σk−1|k−1+σw2
and [
xk
yk
]
|y0,· · ·, yk−1
∼N2 ([
ˆ
xk|k−1
cxˆk|k−1
]
,
[
Σk|k−1 cΣk|k−1
cΣk|k−1 c2Σk|k−1+σv2
])
Hence xk|y0,· · ·, yk∼N(ˆxk|k,Σk|k), where
ˆ
xk|k=E(xk|y0,· · ·, yk) = ˆxk|k−1+
cΣk|k−1
c2Σ k|k−1+σv2
(yk−cxˆk|k−1)
Σk|k=var(xk|y0,· · ·, yk) = Σk|k−1−
(cΣk|k−1)2
c2Σ
k|k−1+σ2v.
(3) So the Kalman filter equations can be written as
ˆ
xk|k=E(xk|y0,· · ·, yk) = ˆxk|k−1+bk(yk−cxˆk|k−1)
Σk|k=var(xk|y0,· · ·, yk) = Σk|k−1−cbkΣk|k−1.
where
bk=
cΣk|k−1
c2Σ
k|k−1+σ2v
. (5)
Notice that the conditional variance is independent of y0,· · ·, yk. Alternatively, using xˆk|k−1 = axˆk−1|k−1 the
Kalman filter can be written by
ˆ
xk|k=E(xk|y0,· · ·, yk) =axˆk−1|k−1+bk(yk−acˆxk−1|k−1).
Note thatˆxk|kis the estimate of statexkat timekbased
on the previous estimate and only one data at time k. No algorithm can ever do better than a Kalman filter in the linear Gaussian state space. It should be noted that it is possible to derive the same results by minimizing MSE.
4
Kalman filter predictor
We might intrested in E(xk+ℓ|y0,· · ·, yk), which is
known as the Kalman filter predictor. 1. One-step predictor:
The same as the last section we again update xk+1 applying xk+1 = axk + wk. Therefore
xk+1|y0,· · ·, yk ∼N(axˆk|k, a2Σk|k+σ2w)and applying
equation(4), we obtain
ˆ
xk+1|k=axˆk|k =axˆk|k−1+abk(yk−cxˆk|k−1)
Σk+1|k =a2Σk|k+σw2 =a2(1−cbk)Σk|k−1+σw2.
(6) Note that if we define the innovation process {y˜k}
byyk−cxˆk|k−1and also assume thaty˜0=y0−E(y0),
then we have xˆk+1|k =axˆk|k−1+abky˜k in terms of
˜
yk.
2. Multi-step predictor:
Using (1), and substituting these sequences we can write
xk+ℓ=aℓxk+
∑ℓ
i=1ai−1wk+ℓ−i.
Then theℓ-step prediction is given by
ˆ
xk+ℓ|k =E(xk+ℓ|y0,· · ·, yk) =aℓxˆk|k,
andΣk+ℓ|k =var(xk+ℓ|k) =a2ℓΣk|k+σw21−a
2ℓ
1−a2.Since
yk+ℓ=cxk+ℓ+vk+ℓ we have, then
E(yk+ℓ|y0,· · ·, yk) =cE(xk+ℓ|y0,· · ·, yk) =cxˆk+ℓ|k
var(yk+ℓ|y0,· · ·, yk) =c2Σk+ℓ|k+σ2v.
5
Vector Kalman filter
We have dealt so far with the scalar state equation by a first order autoregressive process. We can write directly the vector Kalman filter which each one is the first-order autoregressive process:
xjk+1=ajxjk+wjk j= 1,2,· · ·, q. (7)
Let Xk = [x1k, x2k,· · ·, xqk]T and Wk =
[w1k, w2k,· · ·, wqk]T.
Theq equations (7) can be written as the first-order vector equation, Xk+1=AXk+Wk, where A is a q×q
matrix, in this case given byA=diag(a1, a2,· · ·, aq)and
Wk ∼ N(0, Qk). Assume that observation equations at
timek are
yik=cixik+vik i= 1,· · ·, r r < q.
We define observation vector equation byYk=CXk+Vk,
where
C=
c1 0 . . . 0 . . . 0
0 c2 . . . 0 . . . 0
..
. ... . .. ... . .. ...
0 0 . . . cr . . . 0
Yk = [y1k, y2k,· · ·, yrk]T , Vk = [v1k, v2k,· · ·, vqk]T and
Vk ∼N(0, Rk).
Hence we can directly transfer the scalar Kalman fil-ter, given by equations (3) to (5) in to the corresponding vector Kalman filter form
ˆ
Xk|k= ˆXk|k−1+βk(Yk−CXˆk|k−1)
Σk|k= Σk|k−1−βkCΣk|k−1,
where βk = Σk|k−1CT[CΣk|k−1CT +Rk]−1 andΣk|k−1=
AΣk−1|k−1AT +Qk−1.
• We can obtain the following vector and matrix set of equation directly from scalar Kalman predictor(6).
ˆ
Xk+1|k=AXˆk|k−1+Aβk(Yk−CXˆk|k−1)
Σk+1|k=A(I−βkC)Σk|k−1AT +Qk,
whichβk is already defined.
6
Some properties of Kalman
fil-ter
1. The Kalman filter defined in the last sections was conditional expectation of E(xk|y0,· · ·, yk) which
minimizes the mean square error. It is also max-imum likelihood estimator and the best linear un-biased estimator whose variance is less than any other linear unbiased estimators. Even we drop nor-mality assumption, E(xk|y0,· · ·, yk) also minimizes
the variance. Here E(xk|y0,· · ·, yk) is a function
ofy0,· · ·, yk and without the normality assumption
will not necessarily be linear.
2. LetAY+bbe a linear estimator ofXgivenY, where Ais a fixed matrix andbis a fixed vector. We define a linear minimum variance estimator,Xˆ(Y) =A0Y+
b0, contrast with E(X|Y) the calculation of Xˆ(Y)
does not require the joint probability density. If X and Y are jointly Gaussian, the minimum variance and linear minimum variance estimators coincide. 3. If Xˆ(Y) is a linear minimum variance estimate of
X, then CXˆ(Y) +d is a linear minimum variance estimator ofCX+d, whereCanddare fixed matrix and vector respectively. It is clearly property of linearity of the expectation operator.
4. Xˆ(Y) is unbiased, because E( ˆX(Y)) = E{µx +
ΣxyΣ−yy1(Y−µy)}=µx. Recall thatE(X|Y)is a
con-ditional minimum variance estimator. Infact Xˆ(y)
5. Kalman filter can be extended to many nonlinear problems [3]. As most real world models are non linear and non Gaussian and many analysis tasks in-volve estimating the state of a dynamic model when only partial or inaccurate observations are available, it is of great interest to develop efficient computa-tional methods to solve this so called Bayesian fil-tering problem numerically. Many approximation methods such as a Extended Kalman filter (EKF) have been developed to cope with this problem. The EKF linearises the state and observation equations, and then uses the Kalman Filter to obtain estimates for the state. For more details see e.g. [1].
7
Smoothing Kalman filter
We already have considered xˆk|k = E(xk|y0,· · ·, yk)
and xˆk+ℓ|k = E(xk+ℓ|y0,· · ·, yk) as classic Kalman
fil-ter and predictor. One might be infil-terested in xˆj|k =
E(xj|y0,· · ·, yk), j < k, which is known as smoothing
Kalman filter. We expect this estimator to be more accurate than E(xk|y0,· · ·, yk), because more
measure-ments are used in producingxˆj|k. The simplest smooth-ing is fixed-point smoothsmooth-ing where, we want to deter-minexˆj|k=E(xj|y0,· · ·, yk)and the associated variance,
Σj|k=E{(xj−ˆxj|k)(xj−xˆj|k)T|y0,· · ·, yk}for some fixed
j and all k > j. Let an augmenting state x∗k is de-fined by x∗k+1 = x∗k, k ≥ j with x∗j = xj at the initial
time, then x∗k+1 =xj,∀k ≥j. Hence xˆ∗k+1|k = ˆxj|k and
Σ∗∗k+1|k = Σj|k, whereΣ∗∗k+1|k denotes the variance of er-ror (ˆx∗k+1|k−x∗k+1). The augment state space model is therefore
Xk+1= [
xk+1
x∗k+1
] = [ a 0 0 1 ] [ xk
x∗k
] + [ 1 0 ]
wk=AXk+Wk
Yk=yk=
[
c 0
] [ x
k
x∗k
]
+Vk=CXk+Vk,
with the state vector atk=j satisfying
[
xj
x∗j
] = [ xj xj ] .
Using the Kalman filter predictor
ˆ
Xk+1|k =
[
ˆ
xk+1|k
ˆ
x∗k+1|k
] = [ a 0 0 1 ] [ ˆ
xk|k−1
ˆ
x∗k|k−1
] + [ a 0 0 1 ] [ bk
b∗k
] (
Yk−
[
c 0
] [ xˆk |k−1
ˆ
x∗k|k−1
])
,
where
βk=
[
bk
b∗k
]
=
[
Σk|k−1 Σ∗k|k−1
Σ∗k|k−1 Σ∗∗k|k−1
] [
c
0
]
×([ c 0
] [ Σk|k−1 Σ∗
k|k−1
Σ∗k|k−1 Σ∗∗k|k−1
] [
c
0
]
+σv2
)−1
,
and [
Σk+1|k Σ∗k+1|k
Σ∗k+1|k Σ∗∗k+1|k
] = [ a 0 0 1 ] ([ 1 0 0 1 ] − [ bk
b∗k
] [
c 0
])
×
[
Σk|k−1 Σ∗k|k−1
Σ∗k|k−1 Σ∗∗k|k−1
] [ a 0 0 1 ] + [
σw2 0
0 1
]
.
It can be written as
[
Σk+1|k Σ∗k+1|k Σ∗k+1|k Σ∗∗k+1|k
]
=
[
a2(1−cbk)Σk|k−1+σw2 a(1−cbk)Σ∗k|k−1
a(1−cb∗k)Σk|k−1 Σk∗∗|k−1−cb∗Σ∗k|k−1 ]
.
Note thatΣk+1|k appearing here is precisely that
associ-ated with the scalar Kalman filter equation, because the first element of the augmented Kalman filter is xˆk|k−1.
Hence fork≥j,
ˆ
xj|k=E(xj|y0,· · ·, yk) = ˆxj|k−1+b∗k(yk−cxˆk|k−1),
b∗k=
cΣ∗k|k−1
c2Σ
k|k−1+σ2v,
Σ∗k+1|k=a(1−cbk)Σ∗k|k−1,
with initial valueΣ∗j|j−1= Σj|j−1.
The variance of the error term,(xj−ˆxj|k), is given by
Σj|k= Σj|k−1−cb∗kΣ∗k|k−1.
The fixed-point smoother has some properties:
1. The fixed point smoother is driven from the innova-tions process,y˜k=yk−cˆxk|k−1 of the Kalman filter
for the nonaugmented model.
2. The smoother is a linear discrete time model of di-mension equal to that of the filter.
3. As in the case of filter,xj|y0,· · ·, ykis Gaussian and
therefore defined by the conditional meanxˆj|k and
Σj|k.
8
General form of the vector
Kalman filter
In general we can write the state-space equation by Xk+1=AkXk+GkWk, k≥0 (state equation)
Yk=CkXk+Vk (measurement model),
where we have all the assumptions which we made in the scalar one. Furthermore we assume that the covariance of the{Vk}is given byE[VkVℓT] =Rkδkℓwhere
δkℓ=
{
1 k=ℓ
0 o.w
and also covariance of the{Wk}process isE[WkWℓT] =
Qkδkℓ. Hence {Vk} and {Wk} are zero mean,
X0 ∼N( ˜X0, P0). Similar to scalar one by using
condi-tional expectation we have
[
X0
Y0 ]
∼N
([
˜
X0
C0X˜0 ]
,
[
P0 P0C0T
C0P0 C0P0C0T+R0 ])
.
Hence Xˆ
0|0 = ˜X0+P0C0T(C0P0C0T +R0)−1(Y0−C0X˜0)
and covariance is defined by
Σ0|0=P0−P0C0T(C0P0C0T+R0)−1C0P0. We can update
state equation byX1=A0X0+G0W0. SoXˆ1|0=A0Xˆ0|0
andΣ1|0=A0Σ0|0AT+G0Q0GT0. Also fromY1=C1X1+
V1 it follows that Y1 condition on Y0 is Gaussian with
mean, Yˆ
1|0 = C1Xˆ1|0, and covariance, C1Σ1|0C1T +R1.
Hence
[
X1
Y1 ]
|Y0∼N
([ ˆ
X1|0
C1Xˆ1|0 ]
,
[
Σ1|0 Σ1|0C1T
C1Σ1|0 C1Σ1|0C1T+R1 ])
.
ThenX1 given Y0, Y1 has normal distribution by
con-ditional mean and covariance,
ˆ
X1|1= ˆX1|0+ Σ1|0C1T(C1Σ1|0C1T+R1)−1(Y1−C1Xˆ1|0)
Σ1|1= Σ1|0−Σ1|0C1T(C1Σ1|0C1T+R1)−1C1Σ1|0.
Repeating this method yields
ˆ
Xk|k= ˆXk|k−1+Bk(Yk−CkXˆk|k−1)
Σk|k= Σk|k−1−BkCkΣk|k−1,
where Bk = Σk|k−1CkT(CkΣk|k−1CkT + Rk)−1 and
Σk|k−1=Ak−1Σk−1|k−1ATk−1+Gk−1Qk−1GTk−1.
REFERENCES
[1] B. D. Anderson and J. B. Moore. Optimal filtering, Courier Corporation, 2012.
[2] N. Cressie. Statistics for Spatial Data, Revised edition, John Wiley, New York, 1993.
[3] D. Crisan and B. Rozovskii. The Oxford handbook of nonlinear filtering, Oxford University Press , 2011.
[4] R. E. Kalman. A new approach to linear filtering and prediction problems, Journal of Fluids Engineering, Vol.82, No.1, 35-45, 1960.