Chapter 3 Learning chain event graphs
4.3 Dynamic linear models
Dynamic linear models (DLMs) [Harrison and Stevens, 1976; West and Harrison, 1997; Petris et al., 2009] are the classic state-space model. They are defined as follows.
Definition 38. Adynamic linear modelconsists of time vectors of observations X and state parameters θ such that at timet= 0
and at timet≥1
Xt=Ftθt+vt (4.6)
θt=Gtθt−1+wt (4.7)
where Ft and Gt are known matrices of appropriate order and vt, wt are indepen- dent multivariate-Normal variables with mean zero and variancesVt, Wtrespectively. Equation (4.6) is conventionally called theobservation equation while equation (4.7) is thestate equation or system equation.
The DLM is therefore a state-space model with the added assumptions of linearity and Gaussianity. This allows for exact, conjugate updating of distributions when applying the recursive procedure described above, as originally exploited by the Kalman filter [Kalman, 1960]. When either linearity or Gaussianity are not plausible, conjugacy is often hard to retain. I will introduce in the next chapter a dynamic graphical model that allows for complex multi-variate distributions at each time point that also retains conjugacy. First I discuss some general time series modelling tools that will help in this task.
4.3.1 Multi-process Modelling
Even if a process is determined to be accurately represented by, say, a DLM, it is natural to have uncertainty about the underlying parameter process, e.g. because of knowledge of regime change, or external intervention [West and Harrison, 1989]. In the DLM context this corresponds to being unsure as to the exact nature ofF and
Gin the process equations. This uncertainty can itself be modelled by introducing a new level to the standard state-space model class given above. West and Harrison [1997] call this multi-process modelling [Harrison and Stevens, 1976] in the
DLM context, and is also known in the literature as switching state-space models [Frühwirth-Schnatter, 2006].
In the West-Harrison terminology, multi-process models of the first class apply when for alltthere is someM which determines the parameter values for the whole process — in the DLM this corresponds to uncertainty aboutF andG— but it is not known which value ofM from a possible set Mis the true one.
This can be transparently dealt with under the Bayesian paradigm as follows. A prior distributionP(M)overMis specified before the first observations. Predic- tions for eachXtare calculated as a weighted average over the possible values ofM
(shown here for a finiteM) conditional on observations up to time t−1inclusive:
P(Xt|Xt−1) = ∑ M∈M P(Xt|M)P(M |Xt−1) (4.8) = ∑ M∈M ∫ Θt P(Xt|θt)P(θt|M)P(M |Xt−1)dθt (4.9)
It can be seen that the usual assumption is that Xt is independent of M given
θt; in other words, M is purely a description of the latent process, which in turns
determines the distribution of the observable process, as before.
The distribution of M is then updated after each observation in the usual way, similarly to the filtering method described above:
P(M |Xt)∝P(M |Xt−1)P(Xt|M) (4.10)
P(M |Xt−1) was obtained after observing Xt−1, and P(Xt |M) was calculated in
equation (4.8).
Where each process is a DLM, the multi-process model of the first class is, of course, not a DLM itself, but rather a mixture of DLMs.
A usually more realistic assumption is that at each timeta different value of
M holds. The dependence between the values of M at different times must then be modelled explicitly, whether the values at different times are entirely independent or highly correlated. This was named by West and Harrison [1997] a multi-process model of the second class. It is clear that this class includes multi-process models of the first class as a special case.
Now the prediction formula is updated in the following way:
P(Xt|Xt−1) = ∑ Mt−1∈Mt−1 ∑ Mt∈M P(Xt|Mt)P(Mt|Mt−1, Xt−1)P(Mt−1 |Xt−1) (4.11) = ∑ Mt−1∈Mt−1 ∑ Mt∈M ∫ Θt P(Xt|θt)P(θt|Mt)P(Mt|Xt−1, Mt−1)P(Mt−1|Xt−1)dθt (4.12)
While there are is obviously a large class of possible specifications forP(Mt|
Xt−1, Mt−1), the three “practically important possibilities” recommended by West
and Harrison [1997] are as follows:
1. Fixed model probabilities, such that
P(Mt|Xt−1, Mt−1) =π(Mt) for all t≥1 (4.13)
Here one needs to only specify one prior over M. This prior remains fixed through time and is not changed by observations.
2. First-order Markov probabilities, where fixed transition probabilities between the models
are specified a priori for allM, M′∈ M, so that
P(Mt|Xt−1, Mt−1) = ∑ M′∈M
π(M |M′)P(Mt−1=M′ |Xt−1) (4.15)
Some initial prior distribution over M would need to be set. These Markov transition probabilities would also not change throughout the process.
3. Higher-order Markov probabilities, where the probabilities ofMt additionally
depend on the values ofM att−2,t−3, etc. as well ast−1.
It should be clear that multi-process models of the second class are more complicated than those of the first class with the benefit of allowing flexibility in the models to changing circumstances in the system.
In the next chapter I will introduce a multi-process model where at each time pointMt represents a possible underlying CEG, allowing for far more compli-
cated systems to be modelled than is possible with DLMs but nonetheless retaining conjugacy.