Graphical models assume a fixed-size state space, i.e., a fixed number of random variables. Such models are called propositional, since they just assign probabilities to propositions of the form X1:N = x1:N. There has been a lot of work in the AI community on what is called first-order probabilistic logic (FOPL). Below, I will briefly review this work. (See [Pfe00, ch. 9] for a more thorough review and list of references.)
The three main components of a first-order language are objects, relations and quantifiers. Objects are basically groups of attributes which “belong together”, cf. a structure in a programming language, or an AI “frame”. To specify the behavior of an object, it is convenient to use universal quantifiers, e.g., ∀x. human(x) ⇒ mortal(x). This is usually modelled using classes: all objects that are instances of the human class have the property that they are mortal. Hence classes are just a way of specifying parameter tying, cf. the way DBNs allow us to define probability distributions over semi-infinite sequences by parameter tying (quantifying over time: ∀t. P(Xt = i | Xt−1 = j) = A(i, j)). Existential quantifiers (e.g., ∃x. enemy(x) ∧ nearby(x)) in FOPL are handled by checking whether the predicate is true for any object in the current world state.
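The parameter-tying reading of the universal quantifier over time can be sketched in a few lines of Python. This is only an illustration: the matrix entries, the initial distribution `pi` and the function name `sequence_prob` are assumptions, not taken from the text. A single matrix A, with A[i][j] = P(Xt = i | Xt−1 = j) as in the formula above, is reused at every time step, so sequences of any length are scored with the same parameters.

```python
# Hypothetical 2-state example; the numbers are made up for illustration.
# A[i][j] = P(X_t = i | X_{t-1} = j), so each COLUMN of A sums to 1.
A = [[0.9, 0.3],
     [0.1, 0.7]]
pi = [0.5, 0.5]  # assumed initial distribution P(X_1 = i)

def sequence_prob(states):
    """Probability of a state sequence under the tied (time-invariant) model."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[cur][prev]  # the same A is used for every t
    return p

p_short = sequence_prob([0, 0])        # length-2 sequence
p_long = sequence_prob([0, 0, 1, 1])   # length-4 sequence, same parameters
```

No new parameters are needed when the sequence gets longer, which is the sense in which the quantifier over t ties the parameters.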
Finally, an n-ary relation can be thought of as a binary random variable R which has n objects as parents, X1:n; R is true iff X1:n satisfies the relation, e.g., sibling-of(X1, X2) is true iff X1 is a sibling of X2. Often it is very useful to find all X′ s.t. R(X, X′) is true; this is a multi-valued function, e.g., siblings-of(X1) = {X2 : sibling-of(X1, X2) = true}. If this is guaranteed to be a single-valued function, we can write e.g., X1.mother to mean mother-of(X1), etc. This is sometimes called a reference slot, since its value is a pointer to another object. Structural/relational uncertainty means that we are uncertain of the value of a reference slot. A simple example is data association ambiguity: we do not know which object caused the observation. So we can write P(Y | Y.cause = Xi) = f(Xi) to denote that what we expect to see depends on the properties of the object we are actually looking at (whose identity will generally be uncertain).
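As a rough sketch of these ideas, the Python fragment below represents a relation as a set of pairs, derives the multi-valued function siblings-of from it, and then treats Y.cause as an uncertain reference slot. All object names, the `brightness` attribute, the observation model `f` and the prior over causes are hypothetical choices made for the example; the marginal and posterior are obtained with the ordinary sum and Bayes rules.

```python
# Hypothetical relation stored as a set of ordered pairs.
sibling_of = {("Al", "Pat"), ("Pat", "Al"), ("Al", "Jan"), ("Jan", "Al")}

def siblings_of(x):
    """Multi-valued function: all x2 such that sibling_of(x, x2) is true."""
    return {b for (a, b) in sibling_of if a == x}

# Data association: the observation Y has a reference slot Y.cause whose
# value (the object that generated Y) is uncertain.
objects = {"X1": {"brightness": 0.9}, "X2": {"brightness": 0.2}}

def f(obj):
    """Assumed observation model: P(Y = detected | Y.cause = obj)."""
    return obj["brightness"]

cause_prior = {"X1": 0.25, "X2": 0.75}  # assumed P(Y.cause = Xi)

# Marginal P(Y = detected), summing over the unknown cause.
p_detect = sum(cause_prior[i] * f(objects[i]) for i in objects)

# Posterior over the reference slot given that Y was detected (Bayes' rule).
cause_post = {i: cause_prior[i] * f(objects[i]) / p_detect for i in objects}
```

With these made-up numbers the detection is more likely to have come from X1, even though X2 has the higher prior, because X1 is much more likely to produce a detection.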
Figure A.12: A Bayes net created from a temporal knowledge base. Some lines are shown dotted merely to reduce clutter. a(X,T) is true if object X has AIDS at time T; c(X,Y,T) is true if X and Y had sexual contact at time T. Based on Figure 4 of [GK95].
A.5.1 Knowledge-based model construction (KBMC)
KBMC uses a knowledge base to specify a set of rules which can be used to create BN structure on a case-by-case basis (see [WBG92] for an early review). Probabilistic logic programming (PLP) [NH97] is an example of KBMC.
Situation calculus (see e.g., [RBKK95]) extends first order logic to temporal reasoning by adding a time argument to all predicates. A similar approach was used in [GK95] to extend PLP to the temporal case. For example, consider the rule
    ∀X, T. aids(X, T) ⇐ aids(X, T−1) ∨ (∃Y. aids(Y, T−1) ∧ contact(X, Y, T−1))
In Prolog, this is usually written as
    aids(X,T) :- aids(X,T-1)
    aids(X,T) :- exists(Y), aids(Y,T-1), contact(X,Y,T-1)
Given a ground query (i.e., one which does not contain any free variables), such as aids(Pat,2), the system performs backward chaining, finding rules whose heads (left-hand sides) match the query. This generates two new “queries” or “proof obligations”, aids(Pat,1) and aids(Y,1) ∧ contact(Pat,Y,1); the latter is then instantiated for each possible value of Y, and the corresponding nodes are added to the network. If the input facts are aids(Al,1)=1, contact(Al,Pat,1)=1, aids(Jan,1)=0, aids(Pat,1)=0, contact(Jan,Pat,1)=0, then we create the Bayes net shown below.
[Figure: Bayes net over the nodes c(Al,Pat,1), c(Pat,Jan,1), a(Al,1), a(Pat,1), a(Jan,1) and a(Pat,2).]
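The grounding step can be sketched in Python as follows. This is a minimal illustration rather than the algorithm of [GK95]: the function names, the tuple encoding of nodes, the restriction Y ≠ X, and the treatment of contact as a symmetric relation are all assumptions made for the example. Starting from a ground query, the code expands both rule bodies recursively and records the parents of each node.

```python
PEOPLE = ["Al", "Pat", "Jan"]  # assumed domain of objects

def contact(x, y, t):
    # Assumption: contact is symmetric, so each pair is stored in sorted order.
    a, b = sorted((x, y))
    return ("contact", a, b, t)

def ground(query, net=None):
    """Backward-chain from a ground query ("aids", X, T); return {node: parents}."""
    if net is None:
        net = {}
    if query in net:            # already expanded
        return net
    _, x, t = query
    net[query] = []
    if t == 1:                  # time-1 nodes are roots (facts or priors)
        return net
    # Body of rule 1: aids(X, T-1)
    parents = [("aids", x, t - 1)]
    # Body of rule 2: aids(Y, T-1) and contact(X, Y, T-1), for each Y != X
    for y in PEOPLE:
        if y != x:
            parents += [("aids", y, t - 1), contact(x, y, t - 1)]
    net[query] = parents
    for p in parents:
        if p[0] == "aids":
            ground(p, net)          # new proof obligation
        else:
            net.setdefault(p, [])   # contact nodes are roots
    return net

net2 = ground(("aids", "Pat", 2))
net3 = ground(("aids", "Pat", 3))
```

Under these assumptions, the query aids(Pat,2) yields a six-node network and aids(Pat,3) a twelve-node one; the structure grows with the query, which is the case-by-case construction described above.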
The rules which have the same head are combined using a mechanism like noisy-OR, which can handle a variable number of parents. We can then apply any standard inference algorithm to the resulting BN. If the query becomes aids(Pat,3), we create the more complicated network shown in Figure A.12.
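To make the combination step concrete, here is a small Python sketch of a noisy-OR distribution that accepts any number of causes. The function name, the `leak` parameter and the numeric strengths are assumptions for illustration; each rule body that fires is treated as one active cause of the head.

```python
def noisy_or(cause_active, strengths, leak=0.0):
    """P(effect = 1 | causes) under a noisy-OR model.

    cause_active: list of 0/1 flags, one per cause (any length).
    strengths:    strengths[i] = P(cause i alone turns the effect on).
    leak:         probability the effect is on with no active cause.
    """
    p_off = 1.0 - leak
    for active, q in zip(cause_active, strengths):
        if active:
            p_off *= 1.0 - q    # each active cause independently fails w.p. 1-q
    return 1.0 - p_off

# Causes for aids(Pat,2) under the input facts given in the text:
#   persistence: aids(Pat,1) = 0
#   via Al:      aids(Al,1) AND contact(Al,Pat,1)  = 1 AND 1 = 1
#   via Jan:     aids(Jan,1) AND contact(Jan,Pat,1) = 0 AND 0 = 0
causes = [0, 1 and 1, 0 and 0]
strengths = [1.0, 0.2, 0.2]     # made-up numbers for illustration
p_pat_2 = noisy_or(causes, strengths)
```

Because the function only loops over whatever causes it is given, the same parameterisation works whether backward chaining produces two parents or twenty.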
It is clear that the models created in this way have an irregular structure, which is hard to exploit for efficient inference, especially online inference. Also, KBMC does not allow one to express structural or relational uncertainty, since the structure of the graph is automatically generated given the rules and the background facts. We will address both of these problems in the following sections.
A.5.2 Object-oriented Bayes nets
Object-oriented Bayes nets (OOBNs) were introduced in [KP97]. (Similar ideas have been proposed in [LM97, BW00].) The basic idea is that each object has a set of input, output and internal (value) attributes; given the inputs and outputs (the object’s interface), the internal attributes are d-separated from the rest of the graph. This, plus the hierarchical structure of the model, allows for more efficient inference than is possible with KBMC-generated flat models (cf. [XPB93]). Furthermore, all instances of a class share the same parameters, making learning easier [LB01, BLN01].
[FKP98] introduce the concept of a dynamic object-oriented Bayes net (DOOBN). Each object can be either transient or persistent; if it is persistent, it is allowed to refer to the “old” values of its internal attributes, i.e., the ones in the previous time slice. The objects can evolve at different time scales, simply by copying slowly evolving ones less often. Objects can also interact intermittently; this can be modelled by adding switching (guard condition) nodes, which effectively disable links until certain conditions are met (cf. [MP95]).
The DOOBN can be flattened to a regular DBN in a straightforward way for inference purposes. Unfortunately, unlike the static case, the object-oriented structure does not help speed up exact inference, since, as we shall see in Section 3.5, essentially all objects become correlated. However, the structure may be exploitable by certain approximation algorithms such as BK (see Section 4.2.1). Note that, if objects evolve at different time scales, the resulting structure will be irregular, as with KBMC. For online inference, it will often be necessary to copy all objects at every step.
A.5.3 Probabilistic relational models
Probabilistic relational models (PRMs) [FGKP99, Pfe00] extend object-oriented Bayes nets by allowing general relations between objects, not just “part of” relations. (In an OOBN, the inputs to an object are part of the enclosing object; the model must be strictly hierarchical.) In a PRM, each object can have attributes and reference slots, which are pointers to other objects. (A reference slot is just like a foreign key in a relational database.) An example might be course.instructor, where course is an instance of the Course class, and instructor is a pointer to an instance of the Instructor class.
Reference uncertainty means the value of the pointer (reference slot) is unknown. This can easily be modelled by adding all possible objects as parents, and using a multiplexer, as in Section 2.4.6. Alternatively, we can reify a relation into an object itself, e.g., a Registration object might have two pointers, one to a Course and one to a Student. We can then define a joint probability distribution over the values of the Course and Student fields, which might depend on attributes of the Course and Student (e.g., undergrads might be less likely to take grad classes).
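The reified Registration example can be sketched as below. This is a toy illustration under stated assumptions: the course and student names, the `level` and `status` attributes, and the unnormalised `weight` function are all hypothetical, and the joint distribution is simply the normalised weight over every (Course, Student) pair.

```python
from itertools import product

# Hypothetical objects and attributes.
courses = {"CS101": {"level": "undergrad"}, "CS701": {"level": "grad"}}
students = {"Ann": {"status": "undergrad"}, "Bob": {"status": "grad"}}

def weight(course, student):
    """Assumed unnormalised compatibility; undergrads rarely take grad classes."""
    if student["status"] == "undergrad" and course["level"] == "grad":
        return 0.1
    return 1.0

# Joint distribution over the two reference slots of a Registration object.
pairs = list(product(courses, students))
z = sum(weight(courses[c], students[s]) for c, s in pairs)
reg_joint = {(c, s): weight(courses[c], students[s]) / z for c, s in pairs}
```

Here the distribution over the two pointers depends only on attributes of the objects they point to, so the same `weight` function applies unchanged if more courses or students are added.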
PRMs have not yet been applied to temporal reasoning, but it should be straightforward to do so, at least if there is no structural uncertainty. Unfortunately, the flattened graphs may be less structured than in the case of DOOBNs, potentially making inference harder. An MCMC-based approach to inference in PRMs is described in [PR01].
Appendix B
Graphical models: inference
B.1 Introduction
In this appendix, I give a tutorial on how to do exact and approximate inference in Bayesian networks. (Nearly all of the techniques are also applicable to undirected graphical models (MRFs).) This tutorial brings together a lot of material which cannot be found in any one place. I start with the variable elimination algorithm, and then show how this implicitly creates a junction tree (jtree). The jtree can then be used as the basis of a message passing procedure that computes the marginals on all the nodes in a single forwards-backwards pass. I go on to discuss how to handle continuous variables and how to do approximate inference using algorithms based on message passing, including belief propagation and expectation propagation.