Dynamical Linear Models - Data Collection with Dynamical Linear Models


III. Sensor Data Collection

8. Data Collection with Dynamical Linear Models

8.2. Dynamical Linear Models

8.2.1. General Definition

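A dynamical linear model (DLM) can be specified by a pair of equations for each time $t \ge 1$,

$$Y_t = F_t \theta_t + v_t, \qquad v_t \sim \mathcal{N}_m(0, V_t), \tag{8.1a}$$

$$\theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim \mathcal{N}_p(0, W_t), \tag{8.1b}$$

together with a prior for $\theta_0$,

$$\theta_0 \sim \mathcal{N}_p(m_0, C_0), \tag{8.1c}$$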

where $G_t$ (of order $p \times p$) and $F_t$ (of order $m \times p$) are fixed, known matrices, and $v_t$, $w_t$ for $t > 0$ are independent zero-mean Gaussian random vectors with variance matrices $V_t$ and $W_t$, respectively. Equation (8.1a) is called the observation equation, and $Y_t$ is the counterpart of $X_t$ in the Hidden Markov Model (HMM) definition; (8.1b) is called the state equation, which corresponds to the hidden state process of an HMM.

8.2.2. DLM for Sensor

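It can be proved [93] that the hidden states $(\theta_t)$ satisfy the Markov property, i.e.,

$$p(\theta_t \mid \theta^{(t-1)}) = p(\theta_t \mid \theta_{t-1}), \tag{8.2}$$

and that the observation $Y_t$ depends only on $\theta_t$, i.e.,

$$p(Y_t \mid \theta^{(t)}, Y^{(t-1)}) = p(Y_t \mid \theta_t), \tag{8.3}$$

where a parenthesized superscript such as $\theta^{(t)}$ denotes the history of the process up to time $t$.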

Therefore, a DLM is indeed an extension of an HMM, since it satisfies the two defining requirements in Definition 7.1, except that the hidden state $\theta_t$ is a continuous-valued process. In view of this, and in the WSN context, the hidden state equation can be viewed as the process model describing the underlying physical process, say temperature, while the observation equation is a sensing model that samples the physical process and adds zero-mean Gaussian noise. By specifying the hidden state equation differently, we obtain the following two models that are suitable for sensor readings: the Local Level Model and the Local Linear Trend Model.

Local Level Model

A local level model assumes that the hidden state evolves as a random walk. Specifically, the model can be defined as a special DLM by letting $m = p = 1$, $F_t = G_t = 1$, $V_t = V$, $W_t = W$, and $\theta_t = \mu_t$:

$$Y_t = \mu_t + v_t, \qquad v_t \sim \mathcal{N}(0, V), \tag{8.4a}$$

$$\mu_t = \mu_{t-1} + w_t, \qquad w_t \sim \mathcal{N}(0, W). \tag{8.4b}$$
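To make the random-walk structure concrete, the following minimal Python sketch simulates (8.4). The function name `simulate_local_level`, the starting level `mu0`, the seed, and the numerical values of V and W are illustrative assumptions and are not taken from this chapter.

```python
import random

def simulate_local_level(n, V, W, mu0=0.0, seed=0):
    """Draw n steps from the local level model (8.4).

    V and W are variances, so the standard deviations passed to
    gauss() are their square roots. mu0 is an assumed starting level.
    """
    rng = random.Random(seed)
    mu = mu0
    levels, ys = [], []
    for _ in range(n):
        mu = mu + rng.gauss(0.0, W ** 0.5)   # state equation (8.4b)
        y = mu + rng.gauss(0.0, V ** 0.5)    # observation equation (8.4a)
        levels.append(mu)
        ys.append(y)
    return levels, ys

# Hypothetical temperature-like series: slow drift, noisier sensor.
levels, ys = simulate_local_level(n=200, V=0.25, W=0.01, mu0=20.0)
```

Choosing W much smaller than V, as in the usage line, mimics a slowly varying physical quantity observed through a comparatively noisy sensor.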

Note that (8.4b) is a random walk, so the hidden process is nonstationary [41], which makes $Y_t$ nonstationary as well. Intuitively, the observations $(Y_t)$ are modeled as unbiased but noisy observations of the level component $\mu_t$, which evolves over time and is subject to random changes.

Local Linear Trend Model

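A local linear trend model (called the trend model hereafter) is an extension of the local level model: it shares the same observation equation but includes an additional stochastic trend component $\beta_t$ besides the level $\mu_t$. A trend model can be defined as

$$Y_t = \mu_t + v_t, \qquad v_t \sim \mathcal{N}(0, V), \tag{8.5a}$$

$$\mu_t = \mu_{t-1} + \beta_{t-1} + w_{t,1}, \qquad w_{t,1} \sim \mathcal{N}(0, W_1), \tag{8.5b}$$

$$\beta_t = \beta_{t-1} + w_{t,2}, \qquad w_{t,2} \sim \mathcal{N}(0, W_2), \tag{8.5c}$$

which is equivalent to a DLM with the following settings:

$$\theta_t = \begin{bmatrix} \mu_t \\ \beta_t \end{bmatrix}, \quad G_t = G = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad W_t = W = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}, \quad F_t = F = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$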

Note that letting $W_2 = 0$ gives a deterministic trend model: the slope $\beta_t$ is then constant, so the readings trend in one fixed direction, either increasing or declining. When $W_2 > 0$, the trend component becomes stochastic, which fits the sensor data context well: the sensor readings may either increase or decline over time.
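The equivalence between the scalar recursions (8.5b) and (8.5c) and the matrix form $\theta_t = G\theta_{t-1} + w_t$ can be checked numerically. The sketch below is only an illustration: the helper names (`mat_vec`, `trend_step_matrix`, `trend_step_scalar`, `noise_free_observation`) and the numerical state and noise values are assumptions introduced here.

```python
# Trend model (8.5) written in the DLM matrix form of (8.1).
G = [[1.0, 1.0],
     [0.0, 1.0]]      # state transition matrix
F = [1.0, 0.0]        # observation row vector: picks out the level

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def trend_step_matrix(theta_prev, w):
    """State equation (8.1b): theta_t = G theta_{t-1} + w_t."""
    g_theta = mat_vec(G, theta_prev)
    return [g_theta[0] + w[0], g_theta[1] + w[1]]

def trend_step_scalar(mu_prev, beta_prev, w1, w2):
    """Scalar recursions (8.5b) and (8.5c)."""
    mu = mu_prev + beta_prev + w1
    beta = beta_prev + w2
    return [mu, beta]

def noise_free_observation(theta):
    """F theta_t, i.e. the observation (8.5a) without the noise v_t."""
    return sum(f * x for f, x in zip(F, theta))

# Hypothetical state and noise draw, chosen only for illustration.
theta0 = [20.0, 0.5]          # level 20.0, slope 0.5 per step
w = [0.1, -0.02]
via_matrix = trend_step_matrix(theta0, w)
via_scalar = trend_step_scalar(theta0[0], theta0[1], w[0], w[1])
```

Both routes give the same next state, and setting the second noise component to zero at every step keeps the slope fixed, which is the deterministic-trend case $W_2 = 0$.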

8.2.3. Filtering and Smoothing

For a given DLM, the main tasks are to make inference on the hidden states $\theta_t$ and, based on that, to predict future observations $Y_t$. The hidden states can be estimated by computing the conditional densities $p(\theta_s \mid y_{1:t})$.¹ When $s = t$, the estimation problem is called filtering; when $s < t$, it is called smoothing; and when $s > t$, it is called state prediction. The filtering and smoothing problems can be solved by the well-known Kalman Filter and Kalman Smoother [123], as a DLM is essentially a special Kalman Filter model [125, 92]. We state the Kalman Filter and Smoother results without proof in Results 8.1 and 8.2; the derivations can be found, for example, in [126].

Result 8.1 (Kalman Filter). Given a DLM specified by (8.1), let

$$\theta_{t-1} \mid y_{1:t-1} \sim \mathcal{N}(m_{t-1}, C_{t-1}). \tag{8.6}$$

The following statements hold.

1. The one-step-ahead state predictive distribution is Gaussian, i.e. $\theta_t \mid y_{1:t-1} \sim \mathcal{N}(a_t, R_t)$, and

$$a_t = G_t m_{t-1}, \qquad R_t = G_t C_{t-1} G_t' + W_t. \tag{8.7a}$$

2. The one-step-ahead observation predictive distribution is also Gaussian, i.e. $Y_t \mid y_{1:t-1} \sim \mathcal{N}(f_t, Q_t)$, and

$$f_t = F_t a_t, \qquad Q_t = F_t R_t F_t' + V_t. \tag{8.7b}$$

3. The filtering distribution is still Gaussian, i.e. $\theta_t \mid y_{1:t} \sim \mathcal{N}(m_t, C_t)$, and

$$m_t = a_t + R_t F_t' Q_t^{-1} e_t, \qquad C_t = R_t - R_t F_t' Q_t^{-1} F_t R_t, \tag{8.7c}$$

where $e_t = y_t - f_t$.

¹ We use the lowercase $y_{1:t}$ to represent the realizations of random variables. So $p(\theta_s \mid y_{1:t})$ is a ...

Note that Result 8.1 essentially provides a one-pass algorithm to calculate the filtering distribution $\theta_t \mid y_{1:t}$ by using the previous result $\theta_{t-1} \mid y_{1:t-1}$ as a prior. Specifically, the update is done in three separate steps (eqs. (8.7a) to (8.7c)). The filtering procedure does not require local storage of historical sensor data and is therefore particularly suitable for sensor nodes. More importantly, the computational effort involved is relatively cheap, especially for the two sensor models introduced in Section 8.2.2. For example, for a local level model, the update can be simplified to a few scalar arithmetic operations:

$$R_t = C_{t-1} + W, \qquad Q_t = R_t + V, \qquad m_t = m_{t-1} + \frac{R_t (y_t - m_{t-1})}{Q_t}, \qquad C_t = R_t - \frac{R_t^2}{Q_t}.$$

Similar results can be found with a trend model by substituting its corresponding model parameters.
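The scalar recursions for the local level model translate almost line by line into code. The following Python sketch is one possible rendering under stated assumptions: the function name `local_level_filter` and the example readings are hypothetical, and V, W, m0, and C0 are taken as known.

```python
def local_level_filter(ys, V, W, m0, C0):
    """Kalman filter for the local level model (Result 8.1 with F = G = 1).

    Returns the list of filtering moments (m_t, C_t) for t = 1..len(ys).
    Only the latest (m, C) pair is needed to process the next reading.
    """
    m, C = m0, C0
    moments = []
    for y in ys:
        R = C + W                  # (8.7a): a_t = m_{t-1}, R_t = C_{t-1} + W
        Q = R + V                  # (8.7b): f_t = a_t,     Q_t = R_t + V
        m = m + R * (y - m) / Q    # (8.7c): mean update with e_t = y_t - f_t
        C = R - R * R / Q          # (8.7c): variance update
        moments.append((m, C))
    return moments

# Hypothetical sensor readings, for illustration only.
readings = [20.1, 20.3, 19.9, 20.4]
filtered = local_level_filter(readings, V=0.25, W=0.01, m0=20.0, C0=1.0)
```

The list of moments is kept here only for inspection; a sensor node that needs just the current estimate could discard everything except the last `(m, C)` pair, which is what makes the procedure storage-free.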

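Result 8.2 (Kalman Smoother). Given a DLM specified by (8.1), let $\theta_{t+1} \mid y_{1:T} \sim \mathcal{N}(s_{t+1}, S_{t+1})$; then $\theta_t \mid y_{1:T} \sim \mathcal{N}(s_t, S_t)$, and

$$s_t = m_t + C_t G_{t+1}' R_{t+1}^{-1} (s_{t+1} - a_{t+1}), \tag{8.8}$$

$$S_t = C_t - C_t G_{t+1}' R_{t+1}^{-1} (R_{t+1} - S_{t+1}) R_{t+1}^{-1} G_{t+1} C_t. \tag{8.9}$$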

According to Result 8.2, the smoothing recursion proceeds backward from $T$ to 1, and the calculation requires the filtering results $(m_t, C_t)$ for $t = T, \ldots, 1$. Therefore, the Kalman Smoother is usually run after the filter so that the filtering results can be reused [72]. Note that the smoothing process is a one-pass algorithm whose time complexity grows linearly in $T$, and its space complexity also grows linearly in $T$, since it has to store the complete filtering results of size $T$.
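For the local level model, $G = 1$ gives $a_{t+1} = m_t$ and $R_{t+1} = C_t + W$, so (8.8) and (8.9) reduce to scalar updates. The sketch below illustrates the forward-then-backward structure under these assumptions; the function name `local_level_smoother` is hypothetical, and the recursion is started from $s_T = m_T$, $S_T = C_T$, which is the filtering result at the last time point.

```python
def local_level_smoother(ys, V, W, m0, C0):
    """Forward Kalman filter followed by the backward pass of Result 8.2.

    Specialised to the local level model (F = G = 1).
    Returns (ms, Cs, ss, Ss): filtering and smoothing means and variances.
    """
    # Forward pass: store every (m_t, C_t); this is the O(T) storage cost.
    ms, Cs = [], []
    m, C = m0, C0
    for y in ys:
        R = C + W
        Q = R + V
        m = m + R * (y - m) / Q
        C = R - R * R / Q
        ms.append(m)
        Cs.append(C)

    # Backward pass: at the last time point smoothing equals filtering.
    ss, Ss = ms[:], Cs[:]
    for t in range(len(ys) - 2, -1, -1):
        R_next = Cs[t] + W           # R_{t+1} = C_t + W
        a_next = ms[t]               # a_{t+1} = m_t
        B = Cs[t] / R_next           # C_t G' R_{t+1}^{-1} with G = 1
        ss[t] = ms[t] + B * (ss[t + 1] - a_next)          # (8.8)
        Ss[t] = Cs[t] - B * B * (R_next - Ss[t + 1])      # (8.9)
    return ms, Cs, ss, Ss

# Hypothetical readings, for illustration only.
ms, Cs, ss, Ss = local_level_smoother([20.1, 20.3, 19.9, 20.4],
                                      V=0.25, W=0.01, m0=20.0, C0=1.0)
```

Because the smoothed estimate at time $t$ also uses the readings after $t$, its variance `Ss[t]` is never larger than the filtering variance `Cs[t]`.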

8.2.4. Observation Forecasting

Observation forecasting predicts future observational readings based on the existing data. Formally, the $h$-step-ahead forecasting distribution can be written as $p(Y_{t+h} \mid y_{1:t})$. Intuitively, this distribution can be viewed as the one-step-ahead observation predictive distribution at time $t+h$ when the intermediate observations $y_{t+1}, \ldots, y_{t+h-1}$ are treated as missing. In the following, we first present the filtering procedures that accommodate missing observations and then prove this intuition formally.

Result 8.3 (DLM filtering with missing observation). Given a DLM specified by (8.1), let

$$\theta_{t-1} \mid y_{1:t-1} \sim \mathcal{N}(m_{t-1}, C_{t-1}), \tag{8.10}$$

and suppose the observation $y_t$ is missing. Then the following statements hold. The one-step-ahead state and observation predictive distributions are the same as in Result 8.1, i.e.

$$\theta_t \mid y_{1:t-1} \sim \mathcal{N}(a_t, R_t) \quad \text{and} \quad Y_t \mid y_{1:t-1} \sim \mathcal{N}(f_t, Q_t). \tag{8.11a}$$

The filtering distribution is Gaussian, i.e. $\theta_t \mid y_{1:t} \sim \mathcal{N}(m_t, C_t)$, where

$$m_t = a_t, \qquad C_t = R_t. \tag{8.11b}$$

Proof. Equation (8.11a) follows from the Kalman Filter, as the conditions are the same as in Result 8.1. Since $y_t$ is missing, i.e. $y_t = \mathrm{NA}$, it does not carry any information, so we have $\theta_t \mid y_{1:t} = \theta_t \mid y_{1:t-1} \sim \mathcal{N}(a_t, R_t)$ [93].
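In code, Result 8.3 amounts to skipping the update step whenever a reading is absent. The sketch below shows this for the local level model; the function name `local_level_filter_na` and the use of Python's `None` to stand for NA are assumptions made for illustration.

```python
def local_level_filter_na(ys, V, W, m0, C0):
    """Local level Kalman filter that tolerates missing readings.

    A missing observation is represented by None (standing in for NA).
    Returns the filtering moments (m_t, C_t) for every time step.
    """
    m, C = m0, C0
    moments = []
    for y in ys:
        R = C + W                      # prediction step, same as (8.7a)
        if y is None:
            C = R                      # (8.11b): m_t = a_t = m_{t-1}, C_t = R_t
        else:
            Q = R + V                  # (8.7b)
            m = m + R * (y - m) / Q    # (8.7c)
            C = R - R * R / Q
        moments.append((m, C))
    return moments

# Hypothetical series with two dropped readings.
moments = local_level_filter_na([20.1, None, None, 20.4],
                                V=0.25, W=0.01, m0=20.0, C0=1.0)
```

During a run of missing readings the mean stays put while the variance grows by W per step, reflecting the increasing uncertainty about the level.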

Theorem 8.1 (DLM Forecasting). Given a DLM specified by (8.1), let

$$\theta_t \mid y_{1:t} \sim \mathcal{N}(m_t, C_t). \tag{8.12}$$

Then the $h$-step-ahead observation predictive distribution is Gaussian:

$$Y_{t+h} \mid y_{1:t} \sim \mathcal{N}(f_t(h), Q_t(h)), \tag{8.13}$$

where $f_t(h) = f_{t+h}$, $Q_t(h) = Q_{t+h}$, and $f_{t+h}$ and $Q_{t+h}$ are computed by eqs. (8.11a) and (8.11b), treating $y_{t+1} = y_{t+2} = \cdots = y_{t+h-1} = \mathrm{NA}$.

Proof. See Section 8.A.
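For the local level model, Theorem 8.1 can be sketched as a short loop: apply the missing-observation update (8.11b) $h-1$ times and then take one ordinary prediction step. Iterating $C \leftarrow C + W$ shows that this yields $f_t(h) = m_t$ and $Q_t(h) = C_t + hW + V$. The function name `local_level_forecast` and the example values are assumptions for illustration.

```python
def local_level_forecast(m_t, C_t, V, W, h):
    """h-step-ahead forecast moments (f_t(h), Q_t(h)) for the local level model.

    Follows Theorem 8.1: y_{t+1}, ..., y_{t+h-1} are treated as missing.
    """
    if h < 1:
        raise ValueError("h must be a positive integer")
    m, C = m_t, C_t
    for _ in range(h - 1):
        C = C + W          # (8.11b) with G = 1: m stays, C_{t+k} = R_{t+k}
    R = C + W              # state prediction at time t + h
    f = m                  # f_{t+h} = a_{t+h} = m_t
    Q = R + V              # Q_{t+h} = R_{t+h} + V
    return f, Q

# Hypothetical filtering result at time t, for illustration only.
f3, Q3 = local_level_forecast(m_t=20.2, C_t=0.05, V=0.25, W=0.01, h=3)
```

The forecast mean is flat in $h$, while the forecast variance grows linearly with the horizon, so prediction intervals widen the further ahead one forecasts.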

8.2.5. Parameter Estimation

Based on its definition (8.1), a DLM has the variance parameters $V_t$, $W_t$ and the prior parameters $m_0$, $C_0$, which need to be estimated from data. Note that $G_t$ and $F_t$ are assumed known from context. For example, the local level model assumes a random-walk hidden process, so $G_t = 1$, and the observations are unbiased, so $F_t = 1$.

A popular estimation method is maximum likelihood [92], where the likelihood is defined as

$$L = p(y_1, y_2, \ldots, y_t) = \prod_{i=1}^{t} p(y_i \mid y_{1:i-1}). \tag{8.14}$$

However, according to Commandeur and Koopman [127], the optimization problem has no analytical solution and can only be solved iteratively by numerical methods.
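Each factor in (8.14) is the one-step-ahead predictive density $\mathcal{N}(f_i, Q_i)$ of Result 8.1, so the log-likelihood can be accumulated during a single filtering pass. The sketch below does this for the local level model and then maximizes over a small grid of candidate values. It is only an illustration: the grid search is a crude stand-in for a proper numerical optimizer, the prior parameters m0 and C0 are held fixed rather than estimated, and the function names, grid values, and readings are all hypothetical.

```python
import math

def local_level_loglik(ys, V, W, m0, C0):
    """Log of (8.14) for the local level model, via the Kalman filter."""
    m, C = m0, C0
    loglik = 0.0
    for y in ys:
        R = C + W
        Q = R + V
        e = y - m                                   # e_i = y_i - f_i, f_i = m_{i-1}
        loglik += -0.5 * (math.log(2.0 * math.pi * Q) + e * e / Q)
        m = m + R * e / Q
        C = R - R * R / Q
    return loglik

def grid_search_mle(ys, grid, m0, C0):
    """Return (V, W, loglik) maximising the log-likelihood over grid x grid."""
    best = max((local_level_loglik(ys, V, W, m0, C0), V, W)
               for V in grid for W in grid)
    return best[1], best[2], best[0]

# Hypothetical readings and candidate variances, for illustration only.
readings = [20.1, 20.3, 19.9, 20.4, 20.6, 20.5]
grid = [0.01, 0.1, 1.0]
V_hat, W_hat, ll_hat = grid_search_mle(readings, grid, m0=20.0, C0=1.0)
```

In practice the grid would be replaced by an iterative optimizer over $(V, W)$, typically on a log scale so that both variances stay positive.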

In document Wireless sensor network control through statistical methods (Page 160-165)