III. Sensor Data Collection
8. Data Collection with Dynamical Linear Models
8.2. Dynamical Linear Models
8.2.1. General Definition
A dynamical linear model (DLM) can be specified by a pair of equations for each
time t≥1,
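$$Y_t = F_t \theta_t + v_t, \qquad v_t \sim N_m(0, V_t), \tag{8.1a}$$
$$\theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim N_p(0, W_t), \tag{8.1b}$$
together with a prior for $\theta_0$,
$$\theta_0 \sim N_p(m_0, C_0), \tag{8.1c}$$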
where $G_t$ (of order $p \times p$) and $F_t$ (of order $m \times p$) are known, non-random matrices, and $v_t$, $w_t$ for $t > 0$ are independent zero-mean Gaussian random vectors with variance matrices $V_t$ and $W_t$, respectively. Note that (8.1a) is called the observation equation, and $Y_t$ is the counterpart of $X_t$ in the Hidden Markov Model definition, while (8.1b) is called the state equation and corresponds to the hidden state process of an HMM.
8.2.2. DLM for Sensor
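It can be proved [93] that the hidden states $(\theta_t)$ satisfy the Markov property, i.e.,
$$p(\theta_t \mid \theta^{(t-1)}) = p(\theta_t \mid \theta_{t-1}); \tag{8.2}$$
and that the observation $Y_t$ depends only on $\theta_t$, i.e.,
$$p(Y_t \mid \theta^{(t)}, Y^{(t-1)}) = p(Y_t \mid \theta_t), \tag{8.3}$$
where a superscript in parentheses denotes the history up to that time, e.g. $\theta^{(t-1)} = (\theta_0, \ldots, \theta_{t-1})$.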
Therefore, a DLM is indeed an extension of an HMM, since it satisfies the two defining requirements in Definition 7.1, except that the hidden state $\theta_t$ is a continuous-valued process. In view of this, and in the WSN context, the hidden state equation can be viewed as the process model describing the underlying physical process, say temperature, while the observation equation is a sensing model that samples the physical process and adds zero-mean Gaussian noise. By specifying the hidden state equation differently, we obtain the following two models that are suitable for sensor readings: the Local Level Model and the Local Linear Trend Model.
Local Level Model
A local level model assumes the hidden state evolution follows a random walk. Specifically, the model can be defined as a special DLM by letting $m = p = 1$, $F_t = G_t = 1$, $V_t = V$, $W_t = W$, and $\theta_t = \mu_t$:
$$Y_t = \mu_t + v_t, \qquad v_t \sim N(0, V), \tag{8.4a}$$
$$\mu_t = \mu_{t-1} + w_t, \qquad w_t \sim N(0, W). \tag{8.4b}$$
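To make (8.4) concrete, the following minimal Python sketch simulates a local level model. The function name `simulate_local_level`, its arguments (including the starting level `mu0` and the `seed`) and the parameter values in the usage line are illustrative assumptions rather than part of the model definition.

```python
import random


def simulate_local_level(n, mu0, V, W, seed=0):
    """Simulate n steps of the local level model (8.4).

    V and W are the observation and state noise variances.
    Returns (levels, observations); both names are illustrative.
    """
    rng = random.Random(seed)
    mu = mu0
    levels, observations = [], []
    for _ in range(n):
        # State equation (8.4b): random-walk evolution of the level.
        mu = mu + rng.gauss(0.0, W ** 0.5)
        # Observation equation (8.4a): noisy reading of the level.
        y = mu + rng.gauss(0.0, V ** 0.5)
        levels.append(mu)
        observations.append(y)
    return levels, observations


# Hypothetical temperature-like series: slow drift, noisier readings.
levels, ys = simulate_local_level(n=100, mu0=20.0, V=0.25, W=0.01)
```

Setting both variances to zero gives a constant series, which is a quick sanity check that the two noise terms enter where (8.4a) and (8.4b) say they do.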
Note that (8.4b) is a random walk, so the hidden process is nonstationary [41], which makes $Y_t$ nonstationary as well. Intuitively, the observations $(Y_t)$ are modeled as unbiased but noisy observations of the level component $\mu_t$, which evolves over time and is subject to random changes.
Local Linear Trend Model
A local linear trend model (hereafter called the trend model) is an extension of the local level model. It shares the same observation equation but includes an additional stochastic trend component $\beta_t$ besides the level $\mu_t$. A trend model can be defined as
$$Y_t = \mu_t + v_t, \qquad v_t \sim N(0, V), \tag{8.5a}$$
$$\mu_t = \mu_{t-1} + \beta_{t-1} + w_{t,1}, \qquad w_{t,1} \sim N(0, W_1), \tag{8.5b}$$
$$\beta_t = \beta_{t-1} + w_{t,2}, \qquad w_{t,2} \sim N(0, W_2), \tag{8.5c}$$
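which is equivalent to a DLM with the following settings:
$$\theta_t = \begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix}, \quad G_t = G = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad W_t = W = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}, \quad F_t = F = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$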
Note that by letting $W_2 = 0$, we obtain a deterministic trend model: the slope $\beta_t$ stays constant, so on average the observations either steadily increase or steadily decline. When $W_2 > 0$, the trend component becomes stochastic, which fits the sensor data context well: the sensor readings may either increase or decline over time.
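The equivalence between (8.5) and the matrix settings above can be checked numerically. The sketch below uses plain 2×2 arithmetic and hypothetical helper names (`evolve`, `observe`); it applies the state equation with the noise set to zero and confirms that the level grows by the slope at each step, as (8.5b) and (8.5c) require.

```python
G = [[1.0, 1.0],
     [0.0, 1.0]]   # state transition matrix of the trend model
F = [1.0, 0.0]     # observation vector: picks out the level


def evolve(theta, w=(0.0, 0.0)):
    """One step of theta_t = G theta_{t-1} + w_t for theta = [mu, beta]."""
    return [G[0][0] * theta[0] + G[0][1] * theta[1] + w[0],
            G[1][0] * theta[0] + G[1][1] * theta[1] + w[1]]


def observe(theta, v=0.0):
    """Y_t = F theta_t + v_t."""
    return F[0] * theta[0] + F[1] * theta[1] + v


theta = [10.0, 0.5]          # illustrative start: level 10, slope 0.5
path = []
for _ in range(3):
    theta = evolve(theta)
    path.append(observe(theta))
# With zero noise the level grows by the constant slope at each step.
```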
8.2.3. Filtering and Smoothing
For a given DLM, the main tasks are to make inference on the hidden states $\theta_t$ and, based on that, to predict future observations $Y_t$. The hidden states can be estimated by computing the conditional densities $p(\theta_s \mid y_{1:t})$.¹ When $s = t$, the estimation problem is called filtering; when $s < t$, it is called smoothing; and when $s > t$, it is called state prediction. Note that the filtering and smoothing problems can be solved by the renowned Kalman Filter and Kalman Smoother [123], as a DLM is essentially a special case of the Kalman Filter model [125, 92]. We state the Kalman Filter and Smoother results without proof in Results 8.1 and 8.2; the derivations can be found, for example, in [126].
Result 8.1 (Kalman Filter). Given a DLM specified by (8.1), let
$$\theta_{t-1} \mid y_{1:t-1} \sim N(m_{t-1}, C_{t-1}). \tag{8.6}$$
Then the following statements hold.
1. The one-step-ahead state predictive distribution is Gaussian, i.e. $\theta_t \mid y_{1:t-1} \sim N(a_t, R_t)$, where
$$a_t = G_t m_{t-1}, \qquad R_t = G_t C_{t-1} G_t' + W_t. \tag{8.7a}$$
2. The one-step-ahead observation predictive distribution is also Gaussian, i.e. $Y_t \mid y_{1:t-1} \sim N(f_t, Q_t)$, where
$$f_t = F_t a_t, \qquad Q_t = F_t R_t F_t' + V_t. \tag{8.7b}$$
3. The filtering distribution is still Gaussian, i.e. $\theta_t \mid y_{1:t} \sim N(m_t, C_t)$, where
$$m_t = a_t + R_t F_t' Q_t^{-1} e_t, \qquad C_t = R_t - R_t F_t' Q_t^{-1} F_t R_t, \tag{8.7c}$$
and $e_t = y_t - f_t$.
¹ We use the lowercase $y_{1:t}$ to represent the realizations of random variables.
Note that Result 8.1 essentially provides a one-pass algorithm for calculating the filtering distribution $\theta_t \mid y_{1:t}$: the previous result $\theta_{t-1} \mid y_{1:t-1}$ serves as a prior and is updated with the new observation. Specifically, the update is done in three separate steps (shown in eqs. (8.7a) to (8.7c)). The filtering procedure does not require local storage of historical sensor data and is therefore particularly suitable for sensor nodes. More importantly, the computational effort involved is relatively low, especially for the two sensor models introduced in Section 8.2.2. For example, for a local level model, the update procedure simplifies to a few scalar arithmetic operations:
$$R_t = C_{t-1} + W, \qquad Q_t = R_t + V, \qquad m_t = m_{t-1} + \frac{R_t (y_t - m_{t-1})}{Q_t}, \qquad C_t = R_t - \frac{R_t^2}{Q_t}.$$
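The scalar update can be written in a few lines of code. The following is a minimal Python sketch; the function name, the prior values and the readings are illustrative assumptions.

```python
def local_level_filter_step(m_prev, C_prev, y, V, W):
    """One Kalman filter update for the local level model.

    Takes the previous filtering mean/variance and the new reading y,
    and returns the new filtering mean and variance (m, C).
    """
    R = C_prev + W                      # state predictive variance
    Q = R + V                           # observation predictive variance
    m = m_prev + R * (y - m_prev) / Q   # filtering mean
    C = R - R * R / Q                   # filtering variance
    return m, C


# Streaming use: only (m, C) is carried from one reading to the next.
m, C = 0.0, 1.0                 # illustrative prior m0, C0
for y in [1.0, 1.2, 0.9]:       # made-up readings
    m, C = local_level_filter_step(m, C, y, V=1.0, W=1.0)
```

The loop keeps no history of readings, which is the property that makes the procedure attractive on a sensor node.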
Similar results can be found with a trend model by substituting its corresponding model parameters.
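As a sketch of that substitution, the code below expands (8.7a) to (8.7c) for the trend model with $G = [[1, 1], [0, 1]]$ and $F = [1, 0]$. Because the observation is scalar, $Q_t$ is a scalar and no matrix inversion is needed. The function name and the tuple layout for the symmetric covariance are our own assumptions.

```python
def trend_filter_step(m_prev, C_prev, y, V, W1, W2):
    """One Kalman filter update for the local linear trend model.

    m_prev = (level mean, slope mean); C_prev = (c11, c12, c22) holds the
    distinct entries of the symmetric 2x2 filtering covariance.
    """
    m1, m2 = m_prev
    c11, c12, c22 = C_prev
    # (8.7a): a = G m, R = G C G' + W with G = [[1, 1], [0, 1]].
    a1, a2 = m1 + m2, m2
    r11 = c11 + 2.0 * c12 + c22 + W1
    r12 = c12 + c22
    r22 = c22 + W2
    # (8.7b): f = F a, Q = F R F' + V with F = [1, 0].
    f = a1
    Q = r11 + V
    # (8.7c): m = a + R F' e / Q, C = R - R F' F R / Q.
    e = y - f
    m = (a1 + r11 * e / Q, a2 + r12 * e / Q)
    C = (r11 - r11 * r11 / Q, r12 - r11 * r12 / Q, r22 - r12 * r12 / Q)
    return m, C
```

The update still consists of a fixed number of scalar operations per reading, so the cost argument made for the local level model carries over.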
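Result 8.2 (Kalman Smoother). Given a DLM specified by (8.1), let $\theta_{t+1} \mid y_{1:T} \sim N(s_{t+1}, S_{t+1})$. Then $\theta_t \mid y_{1:T} \sim N(s_t, S_t)$, where
$$s_t = m_t + C_t G_{t+1}' R_{t+1}^{-1} (s_{t+1} - a_{t+1}), \tag{8.8}$$
$$S_t = C_t - C_t G_{t+1}' R_{t+1}^{-1} (R_{t+1} - S_{t+1}) R_{t+1}^{-1} G_{t+1} C_t. \tag{8.9}$$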
According to Result 8.2, the smoothing proceeds backward from $T$ to 1, and the calculation requires the filtering results $(m_t, C_t)$ for $t = T, \ldots, 1$. Therefore, the Kalman Smoother is usually run after the filter so that the filtering results can be reused [72]. Note that the smoothing process is a one-pass algorithm whose time complexity grows linearly with $T$; its space complexity also grows linearly with $T$, since it has to store the complete filtering results of size $T$.
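The backward pass can be sketched for the local level model, where $G = 1$, $a_{t+1} = m_t$ and $R_{t+1} = C_t + W$. The function name and the list-based interface are illustrative assumptions; the recursion starts from $s_T = m_T$, $S_T = C_T$, since at $t = T$ the smoothing and filtering distributions coincide.

```python
def smooth_local_level(ms, Cs, W):
    """Backward Kalman smoother for the local level model (G = 1).

    ms, Cs: filtering means and variances for t = 1..T (index 0 is t = 1).
    Returns the smoothing means and variances for t = 1..T.
    """
    T = len(ms)
    s = [0.0] * T
    S = [0.0] * T
    # At t = T the smoothing and filtering distributions coincide.
    s[-1], S[-1] = ms[-1], Cs[-1]
    for t in range(T - 2, -1, -1):
        R_next = Cs[t] + W          # R_{t+1} = C_t + W
        a_next = ms[t]              # a_{t+1} = m_t
        gain = Cs[t] / R_next       # C_t G' R_{t+1}^{-1} with G = 1
        s[t] = ms[t] + gain * (s[t + 1] - a_next)          # (8.8)
        S[t] = Cs[t] - gain * (R_next - S[t + 1]) * gain   # (8.9)
    return s, S
```

The function needs the whole lists `ms` and `Cs`, which reflects the linear space requirement noted above.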
8.2.4. Observation Forecasting
Observation forecasting predicts future observational readings based on the existing data. Formally, the $h$-step-ahead forecasting distribution can be written as $p(Y_{t+h} \mid y_{1:t})$. Intuitively, the distribution can be viewed as the one-step-ahead observation predictive distribution at time $t+h$ when the observations $y_{t+1}, \ldots, y_{t+h-1}$ are treated as missing. We first present the filtering procedures that accommodate missing observations and then prove this intuition formally.
Result 8.3 (DLM filtering with missing observation). Given a DLM specified by (8.1), let
$$\theta_{t-1} \mid y_{1:t-1} \sim N(m_{t-1}, C_{t-1}), \tag{8.10}$$
and suppose the observation $y_t$ is missing. Then the following statements hold. The one-step-ahead state and observation predictive distributions are the same as in Result 8.1, i.e.
$$\theta_t \mid y_{1:t-1} \sim N(a_t, R_t) \quad \text{and} \quad Y_t \mid y_{1:t-1} \sim N(f_t, Q_t). \tag{8.11a}$$
The filtering distribution is Gaussian, i.e. $\theta_t \mid y_{1:t} \sim N(m_t, C_t)$, where
$$m_t = a_t, \qquad C_t = R_t. \tag{8.11b}$$
Proof. Equation (8.11a) follows from the Kalman Filter, as the conditions are the same as in Result 8.1. Since $y_t$ is missing, i.e. $y_t = \mathrm{NA}$, it does not carry any information, and we have $\theta_t \mid y_{1:t} = \theta_t \mid y_{1:t-1} \sim N(a_t, R_t)$ [93].
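For the local level model, Result 8.3 amounts to skipping the correction step when a reading is absent. The sketch below represents a missing reading by `None`; this encoding, the function name and the sample readings are assumptions made for illustration.

```python
def filter_step_with_missing(m_prev, C_prev, y, V, W):
    """Local level filter step where y may be None (missing reading)."""
    R = C_prev + W      # prediction step is unchanged, as in (8.11a)
    Q = R + V
    if y is None:
        # (8.11b): the prediction becomes the filtering distribution.
        return m_prev, R
    m = m_prev + R * (y - m_prev) / Q
    C = R - R * R / Q
    return m, C


m, C = 0.0, 1.0                      # illustrative prior
for y in [1.0, None, None, 1.1]:     # made-up readings with two gaps
    m, C = filter_step_with_missing(m, C, y, V=1.0, W=1.0)
```

During a gap the mean stays put while the variance grows by $W$ per step, so the filter becomes less certain the longer readings are missing.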
Theorem 8.1 (DLM Forecasting). Given a DLM specified by (8.1), let
$$\theta_t \mid y_{1:t} \sim N(m_t, C_t). \tag{8.12}$$
Then the $h$-step-ahead observation predictive distribution is Gaussian:
$$Y_{t+h} \mid y_{1:t} \sim N(f_t(h), Q_t(h)), \tag{8.13}$$
where $f_t(h) = f_{t+h}$ and $Q_t(h) = Q_{t+h}$, and $f_{t+h}$ and $Q_{t+h}$ are computed by eqs. (8.11a) and (8.11b), treating $y_{t+1} = y_{t+2} = \ldots = y_{t+h-1} = \mathrm{NA}$.
Proof. See Section 8.A.
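As an illustration of Theorem 8.1, the sketch below computes the $h$-step-ahead forecast for the local level model by repeating the prediction step $h$ times, with the intermediate readings treated as missing. The function name is an assumption. For this model the loop reduces to the closed form $f_t(h) = m_t$ and $Q_t(h) = C_t + hW + V$.

```python
def forecast_local_level(m, C, V, W, h):
    """h-step-ahead forecast mean and variance for the local level model.

    Starts from the filtering distribution N(m, C) at time t and treats
    y_{t+1}, ..., y_{t+h-1} as missing, following Theorem 8.1.
    """
    if h < 1:
        raise ValueError("h must be at least 1")
    R = C
    for _ in range(h):
        # Each step adds W; with a missing reading C_{t+k} = R_{t+k}.
        R = R + W
    f = m           # F = G = 1, so the forecast mean stays at m
    Q = R + V       # observation predictive variance at time t + h
    return f, Q
```

The forecast variance grows linearly in $h$, so forecasts further ahead are correspondingly less precise.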
8.2.5. Parameter Estimation
Based on its definition (8.1), a DLM has the parameters $V_t$, $W_t$ and the prior parameters $m_0$, $C_0$, which need to be estimated from data. Note that $G_t$ and $F_t$ are assumed known from context. For example, the local level model assumes a random-walk hidden process, so $G_t = 1$; and the observations are unbiased, so $F_t = 1$.
A popular estimation method is maximum likelihood [92], where the likelihood is defined as
$$L = p(y_1, y_2, \ldots, y_t) = \prod_{i=1}^{t} p(y_i \mid y_{1:i-1}). \tag{8.14}$$
However, according to Commandeur and Koopman [127], the optimization problem has no analytical solution; the estimates can only be obtained iteratively by numerical methods.
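To illustrate how (8.14) can be evaluated in practice, the sketch below computes the log-likelihood of a local level model from the Kalman filter quantities: each factor $p(y_i \mid y_{1:i-1})$ is the $N(f_i, Q_i)$ density of Result 8.1. A coarse grid search then stands in for a proper iterative optimizer. The function names, the fixed vague prior ($m_0 = 0$, $C_0 = 10^6$), the grid and the readings are all assumptions made for illustration.

```python
import math


def local_level_loglik(ys, V, W, m0=0.0, C0=1e6):
    """Log-likelihood of a local level model via (8.14).

    Each factor p(y_i | y_{1:i-1}) is the N(f_i, Q_i) density from the
    Kalman filter. The default m0, C0 (a vague prior) are assumptions.
    """
    m, C = m0, C0
    loglik = 0.0
    for y in ys:
        R = C + W
        Q = R + V
        e = y - m                        # forecast error, f_i = m_{i-1}
        loglik += -0.5 * (math.log(2.0 * math.pi * Q) + e * e / Q)
        m = m + R * e / Q
        C = R - R * R / Q
    return loglik


def grid_search_mle(ys, candidates):
    """Pick the (V, W) pair from `candidates` with the highest likelihood.

    A crude stand-in for an iterative numerical optimizer.
    """
    return max(candidates, key=lambda vw: local_level_loglik(ys, vw[0], vw[1]))


ys = [1.0, 1.3, 0.8, 1.1, 1.6, 1.4]                 # made-up readings
grid = [(v, w) for v in (0.01, 0.1, 1.0) for w in (0.01, 0.1, 1.0)]
V_hat, W_hat = grid_search_mle(ys, grid)
```

In practice the grid search would be replaced by a gradient-based or derivative-free optimizer over $(V, W)$, typically on a log scale so that both variances stay positive.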