8.2 Dynamic Bayesian Ontology Languages
8.2.1 Dynamic Bayesian Networks
BNs are static models, i.e., they cannot capture the dynamic features of the application domain. For instance, a patient is very likely to have a high fever if the patient had a high fever at the previous time step. Such scenarios can be modeled using dynamic Bayesian networks (DBNs) (Dean and Kanazawa 1989; Murphy 2002), which extend BNs to provide a compact representation of evolving JPDs for a fixed set of random variables.
BNs are known for compactly representing a state space, while DBNs can also represent state-transition probabilities and can thus be used to make projections about the future states of a system. The update of the joint probability distribution is typically expressed through a two-slice BN, which specifies the probabilities at the next point in time given the current state.
Formally, a two-slice BN (TBN) over a finite set of variables $V$ is a pair $B_{\to} = (G, \Theta)$, where $G = (V \cup V^+, E)$, with $V^+ = \{X^+ \mid X \in V\}$, is a DAG such that all edges are directed from elements of $V \cup V^+$ to elements of $V^+$, and $\Theta$ contains, for every $X^+ \in V^+$, a conditional probability distribution $P(X^+ \mid \pi(X^+))$ for $X^+$ given the parents of $X^+$. As is standard in BNs, every node in a TBN is independent of all its non-descendants given its parents. Thus, for a TBN $B_{\to}$, the conditional probability distribution at time $t+1$ given time $t$ is

$$P_{B_{\to}}(V_{t+1} \mid V_t) = \prod_{X^+ \in V^+} P_{B_{\to}}(X^+ \mid \pi(X^+)).$$
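To make the product formula concrete, here is a minimal sketch that stores a TBN as parent lists plus CPTs and evaluates $P_{B_{\to}}(V_{t+1} \mid V_t)$ for a pair of states. The TBN used here is hypothetical (variables `F` and `S`, illustrative numbers) and is not the network of Figure 8.2; the names `PARENTS`, `CPT`, and `transition_prob` are our own. Each parent is tagged with its slice (0 for time $t$, 1 for time $t+1$), which is how the sketch allows edges inside $V^+$ as the definition permits.

```python
# Hypothetical TBN over V = {"F", "S"}; the numbers are illustrative only.
# A parent is a pair (variable, slice): slice 0 means time t, slice 1 means t+1.
PARENTS = {
    "F": [("F", 0)],            # F+ has the single parent F
    "S": [("S", 0), ("F", 1)],  # S+ has the parents S and F+
}

# CPT[X][parent values] = P(X+ = 1 | parents), values ordered as in PARENTS[X].
CPT = {
    "F": {(True,): 0.7, (False,): 0.1},
    "S": {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.6, (False, False): 0.1},
}


def transition_prob(current, nxt):
    """P_{B->}(V_{t+1} = nxt | V_t = current) as a product over all X+ in V+."""
    slices = (current, nxt)
    prob = 1.0
    for var, parents in PARENTS.items():
        # Look up each parent's value in the slice it belongs to.
        key = tuple(slices[s][p] for p, s in parents)
        p_true = CPT[var][key]
        prob *= p_true if nxt[var] else 1.0 - p_true
    return prob
```

For example, `transition_prob({"F": False, "S": True}, {"F": True, "S": True})` multiplies $P(f^+ \mid \neg f) = 0.1$ by $P(s^+ \mid s, f^+) = 0.9$. Because every factor is a conditional distribution, the transition probabilities out of any fixed current state sum to 1.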
Example 8.26 Figure 8.2 depicts a TBN $B_{\to}$, which defines the transition probabilities between the two time slices. For instance, the probability that bob has a high fever at time $t+1$, given that he did not have a high fever at time $t$, is $P_{B_{\to}}(f^+ \mid \neg f) = 0.1$. ♦
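A dynamic Bayesian network (DBN) is a pair $D = (B_1, B_{\to})$, where $B_1$ is a BN over $V$ and $B_{\to}$ is a TBN over $V$. The TBN $B_{\to}$ can be thought of as containing two disjoint copies of the random variables in $V$, where the probability distribution at time $t+1$ depends on the distribution at time $t$.

[Figure 8.2 (a): BN over the nodes $F$, $S$, $C$. (b): TBN with nodes $F, S, C$ (time $t$) and $F^+, S^+, C^+$ (time $t+1$), with the conditional probability tables
$P(f^+ \mid F)$: $f$ 0.7; $\neg f$ 0.1.
$P(s^+ \mid F, S, F^+)$: $f, s, f^+$ 0.9; $f, s, \neg f^+$ 0.5; $f, \neg s, f^+$ 0.8; $f, \neg s, \neg f^+$ 0.4; $\neg f, s, f^+$ 0.8; $\neg f, s, \neg f^+$ 0.4; $\neg f, \neg s, f^+$ 0.7; $\neg f, \neg s, \neg f^+$ 0.1.
$P(c^+ \mid C, F^+, S^+)$: $c, f^+, s^+$ 0.7; $c, f^+, \neg s^+$ 0.2; $c, \neg f^+, s^+$ 0.4; $c, \neg f^+, \neg s^+$ 1; $\neg c, f^+, s^+$ 0.6; $\neg c, f^+, \neg s^+$ 0.1; $\neg c, \neg f^+, s^+$ 0.3; $\neg c, \neg f^+, \neg s^+$ 1.]
Figure 8.2: The DBN $D_h = (B_1, B_{\to})$, consisting of (a) a BN $B_1$ (= $B_h$) over $V = \{F, S, C\}$, which compactly represents a joint probability distribution, and (b) a two-slice BN $B_{\to}$ over $V$, which defines the transition probabilities between two time slices $V_t$ and $V_{t+1}$.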
To distinguish the variables in different time slices, we use $V_t$ and $X_t$ to denote the set of variables $V$ and the variable $X \in V$ at time $t$, respectively. As in BNs, $x$ abbreviates $X = 1$ and $\neg x$ abbreviates $X = 0$. Moreover, we assume the (first-order) Markov property: the probability of the future state is independent of the past, given the present state. We note, however, that all of our results can be generalized to $k$-slice BNs, which relax this assumption to dependencies spanning $k$ slices and thus add memory. Given the (first-order) Markov property, a DBN $D = (B_1, B_{\to})$ defines, for every $t \geq 1$, the unique joint probability distribution

$$P_D(V_t) = P_{B_1}(V_1) \cdot \prod_{i=2}^{t} \prod_{X_i \in V_i} P_{B_{\to}}(X_i \mid \pi(X_i)).$$
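The product above can be read as a chain rule along a trajectory: one factor from $B_1$ for the first slice, then one transition factor per later slice. The following minimal sketch evaluates it by brute force for a small hypothetical DBN (variables `F` and `S`; the prior, and the CPT of `S`, are invented for illustration, while the `F` transition reuses the values 0.7 and 0.1 of $P_{B_{\to}}(f^+ \mid F)$ from Figure 8.2). The function names are our own, and summing over all earlier slices to get the distribution at time $t$ is only feasible for tiny models.

```python
from itertools import product

VARS = ("F", "S")  # hypothetical V


def prior(state):
    """Hypothetical B_1: P(f_1) = 0.3 and P(s_1 | F_1) = 0.8 or 0.2."""
    p_f = 0.3 if state["F"] else 0.7
    p_s_true = 0.8 if state["F"] else 0.2
    return p_f * (p_s_true if state["S"] else 1.0 - p_s_true)


def transition(cur, nxt):
    """Inner product over X_i in V_i: F+ depends on F, S+ depends on F+."""
    p_f_true = 0.7 if cur["F"] else 0.1
    p_s_true = 0.8 if nxt["F"] else 0.2
    return ((p_f_true if nxt["F"] else 1.0 - p_f_true)
            * (p_s_true if nxt["S"] else 1.0 - p_s_true))


def all_states():
    """Every assignment of truth values to the variables in VARS."""
    return [dict(zip(VARS, vals)) for vals in product((True, False), repeat=len(VARS))]


def trajectory_prob(traj):
    """P_{B_1}(v_1) times the transition factor for every i = 2..t."""
    prob = prior(traj[0])
    for prev, cur in zip(traj, traj[1:]):
        prob *= transition(prev, cur)
    return prob


def marginal_at(t, state):
    """Probability of `state` at time t, summing over all earlier slices."""
    return sum(trajectory_prob(list(prefix) + [state])
               for prefix in product(all_states(), repeat=t - 1))
```

With this hypothetical prior, summing `marginal_at(2, ...)` over both values of `S` with `F` true gives $0.3 \cdot 0.7 + 0.7 \cdot 0.1 = 0.28$.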
We briefly illustrate these notions on our running example.
Example 8.27 Consider the TBN $B_{\to}$ depicted in Figure 8.2. The pair $D_h = (B_1, B_{\to})$ is a DBN, where $B_1$ is the BN depicted in Figure 8.1. We can pose non-static queries to the DBN $D_h$. For instance, the probability of bob having a high fever at time point 2, $P_{D_h}(f_2)$, can be computed as

$$P_{B_1}(f_1) \cdot P_{B_{\to}}(f_2 \mid f_1) + P_{B_1}(\neg f_1) \cdot P_{B_{\to}}(f_2 \mid \neg f_1),$$

which is a dynamic version of standard probabilistic inference in BNs. ♦

Intuitively, the distribution at time $t$ is defined by unraveling the DBN starting from $B_1$ and repeatedly applying the two-slice structure of $B_{\to}$ until $t$ copies of $V$ have been created. This produces a new BN $B_{1:t}$ encoding the distribution of the different variables over time.
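The computation of Example 8.27 can be iterated along the unraveling. In the TBN of Figure 8.2, $F^+$ has $F$ as its only parent, so the marginals of $F$ evolve on their own and $P(f_{t+1}) = P(f_t) \cdot 0.7 + P(\neg f_t) \cdot 0.1$. The sketch below assumes this; since Figure 8.1 is not reproduced in this section, the starting value `p_f1` for $P_{B_1}(f_1)$ is a caller-supplied placeholder, not the value from $B_1$. The name `prob_fever` is our own.

```python
# P_{B->}(f+ | F) for F = f and F = not f, as given in Figure 8.2.
P_F_NEXT = {True: 0.7, False: 0.1}


def prob_fever(t, p_f1):
    """P_D(f_t), starting from a placeholder P_{B_1}(f_1) = p_f1.

    Exact only because F+ has F as its sole parent, so the marginal of F
    can be propagated one slice at a time, as in Example 8.27.
    """
    p = p_f1
    for _ in range(t - 1):
        p = p * P_F_NEXT[True] + (1.0 - p) * P_F_NEXT[False]
    return p
```

For instance, with a placeholder $P_{B_1}(f_1) = 0.3$, `prob_fever(2, 0.3)` evaluates the formula of Example 8.27 as $0.3 \cdot 0.7 + 0.7 \cdot 0.1 = 0.28$, and for large $t$ the value approaches the fixed point $0.1 / (1 - 0.7 + 0.1) = 0.25$ regardless of the start.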
Figure 8.3 shows the unraveling to $t = 3$ of the DBN $(B_1, B_{\to})$, where $B_1$ and $B_{\to}$ are the networks shown in Figures 8.1 and 8.2, respectively. The conditional probability tables of each node given its parents (not shown) are those of $B_1$ for the nodes in $V_1$, and those of $B_{\to}$ for the nodes in $V_2 \cup V_3$. Notice that $B_{1:t}$ has $t$ copies of each random variable in $V$.

[Figure 8.3: nodes $F_1, C_1, S_1$, $F_2, C_2, S_2$, $F_3, C_3, S_3$]
Figure 8.3: Three-step unraveling $B_{1:3}$ of $(B_1, B_{\to})$.

For a given $t \geq 1$, we call $B_t$ the BN obtained from the unraveling $B_{1:t}$ of the DBN to time $t$ by eliminating all variables not in $V_t$. In particular, we have $P_{B_t}(V) = P_{B_{1:t}}(V_t)$.
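The structural side of the unraveling can be sketched as a function that builds the parent sets of $B_{1:t}$: slice 1 copies the edges of $B_1$, and every later slice $i$ copies the edges of $B_{\to}$, mapping time $t$ to $i - 1$ and time $t+1$ to $i$. This is a minimal sketch; the function `unroll` and both parent maps below are hypothetical (Figure 8.1 is not reproduced in this section), and the CPTs, which are simply copied from $B_1$ and $B_{\to}$ as the text states, are omitted.

```python
def unroll(b1_parents, tbn_parents, t):
    """Return the parent lists of the unraveled BN B_{1:t}.

    b1_parents:  {X: [Y, ...]}           parents of X in B_1
    tbn_parents: {X: [(Y, slice), ...]}  parents of X+ in B->, where
                                         slice 0 is time t, slice 1 is t+1
    Nodes of B_{1:t} are named (X, i) for X in V and 1 <= i <= t.
    """
    graph = {}
    for x, parents in b1_parents.items():
        graph[(x, 1)] = [(y, 1) for y in parents]
    for i in range(2, t + 1):
        for x, parents in tbn_parents.items():
            # Slice 0 maps to time i - 1 and slice 1 maps to time i.
            graph[(x, i)] = [(y, i - 1 + s) for y, s in parents]
    return graph


# Hypothetical structures over V = {F, S, C}, for illustration only.
B1_PARENTS = {"F": [], "S": ["F"], "C": ["F", "S"]}
TBN_PARENTS = {
    "F": [("F", 0)],
    "S": [("F", 0), ("S", 0), ("F", 1)],
    "C": [("C", 0), ("F", 1), ("S", 1)],
}
```

Calling `unroll(B1_PARENTS, TBN_PARENTS, 3)` yields nine nodes, that is, $t$ copies of each of the three variables, matching the shape of Figure 8.3.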
For notational convenience, we write $V_{1:t} := V_1 \cup \dots \cup V_t$, and $W_{1:t}$ for a valuation of $V_{1:t}$. Moreover, we write $X_t \subseteq V_t$ to denote a set of variables at time $t$ and $x_t$ to denote an instantiation of these variables; if, furthermore, $X_t = V_t$, then $x_t$ corresponds to a state at time $t$. Analogously, we write $X_{1:t}$ to denote a sequence of variables $X_1, \dots, X_t$ and $x_{1:t}$ to denote an instantiation of these variables, which is usually called a trajectory.
Traditional inference problems in DBNs are illustrated in Figure 8.4 with the help of a simple timeline. Formally, given a DBN $D$, filtering (also called monitoring) is the task of computing $P_D(x_t \mid y_{1:t})$. Smoothing and prediction are the past ($P_D(x_{t-l} \mid y_{1:t})$) and future ($P_D(x_{t+h} \mid y_{1:t})$) analogs of filtering, respectively, for a lag $l > 0$ and a horizon $h > 0$; classification is a special case of filtering that amounts to computing $P_D(y_{1:t})$. Finally, finding a hypothesis that maximizes the probability of the query is called decoding, and is computed as $\arg\max_{x_{1:t}} P_D(x_{1:t} \mid y_{1:t})$. Decoding is analogous to the maximum a posteriori hypothesis (MAP) and most probable explanation (MPE) problems in BNs (Park and Darwiche 2004b). For further details, we refer the reader to Murphy (2002).
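Several of these tasks can be sketched on a tiny DBN. The following is a minimal illustration, assuming a hypothetical model with one hidden variable `F` and one observed variable `S` per slice: the `F` transition uses the values 0.7 and 0.1 of Figure 8.2, while the prior and the CPT of `S` are invented. Filtering and classification use the standard forward recursion, prediction propagates the filtered belief $h$ steps without evidence, and decoding is done by brute-force enumeration of trajectories instead of the Viterbi algorithm, which is only sensible for short sequences. Smoothing is left out for brevity, and all function names are our own.

```python
from itertools import product

# Hypothetical DBN over V = {F, S}: F is hidden, S is observed in every slice.
P_F1 = 0.3                          # P_{B_1}(f_1), invented
P_F_NEXT = {True: 0.7, False: 0.1}  # P(f_{t+1} | F_t), as in Figure 8.2
P_S = {True: 0.8, False: 0.2}       # P(s_t | F_t), invented, same in every slice


def emit(f, s):
    return P_S[f] if s else 1.0 - P_S[f]


def trans(f, f_next):
    return P_F_NEXT[f] if f_next else 1.0 - P_F_NEXT[f]


def forward(evidence):
    """alpha[f] = P_D(F_t = f, s_{1:t}) for the full evidence sequence s_{1:t}."""
    alpha = {f: (P_F1 if f else 1.0 - P_F1) * emit(f, evidence[0])
             for f in (True, False)}
    for s in evidence[1:]:
        alpha = {f2: sum(alpha[f1] * trans(f1, f2) for f1 in alpha) * emit(f2, s)
                 for f2 in (True, False)}
    return alpha


def classification(evidence):
    """P_D(s_{1:t})."""
    return sum(forward(evidence).values())


def filtering(evidence):
    """P_D(f_t | s_{1:t})."""
    alpha = forward(evidence)
    return alpha[True] / sum(alpha.values())


def prediction(evidence, h):
    """P_D(f_{t+h} | s_{1:t}): filter, then apply h evidence-free transitions."""
    p = filtering(evidence)
    for _ in range(h):
        p = p * P_F_NEXT[True] + (1.0 - p) * P_F_NEXT[False]
    return p


def decoding(evidence):
    """argmax over f_{1:t} of P_D(f_{1:t} | s_{1:t}), by enumeration."""
    def joint(fs):
        # P_D(f_{1:t}, s_{1:t}); same argmax as the conditional.
        p = (P_F1 if fs[0] else 1.0 - P_F1) * emit(fs[0], evidence[0])
        for i in range(1, len(fs)):
            p *= trans(fs[i - 1], fs[i]) * emit(fs[i], evidence[i])
        return p
    return max(product((True, False), repeat=len(evidence)), key=joint)
```

Maximizing the joint $P_D(x_{1:t}, y_{1:t})$ in `decoding` gives the same answer as maximizing the conditional, because the two differ only by the constant factor $P_D(y_{1:t})$.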