2.3 Probabilistic Logic Programming
2.3.1 ProbLog
We will briefly review the probabilistic programming language ProbLog. For more details about this PPL please consult [De Raedt et al., 2007]. We will illustrate all concepts on the following example.
Example 2.5. Consider the following ProbLog program modelling a variation of the classical alarm problem presented in [Russell and Norvig, 2010]. The program has the random variables: burglary, earthquake, hears_alarm(john) and hears_alarm(mary), and states that there is an alarm whenever there is a burglary or an earthquake. The last clause states that if there is an alarm and a person hears the alarm, that person will call.
0.1 :: burglary.
0.2 :: earthquake.
0.7 :: hears_alarm(john).
0.6 :: hears_alarm(mary).
alarm ← burglary.
alarm ← earthquake.
calls(Pers) ← alarm, hears_alarm(Pers).
The probabilistic facts $p_i :: f_i \in F$ signify that $f_i\theta$ is true with probability $p_i$ for all substitutions $\theta$ grounding $f_i$. The probabilistic facts in Example 2.5 are the following:
F = {0.1 :: burglary, 0.2 :: earthquake,
0.7 :: hears_alarm(john), 0.6 :: hears_alarm(mary)}.
A ProbLog program $T$ with $F = \{p_1 :: f_1, \dots, p_n :: f_n\}$ defines a probability distribution over logic programs $L \subseteq L_T = \{f_1, \dots, f_n\} \cup BK$ as:
$$P(L \mid T) = \prod_{f_i \in L} p_i \prod_{f_i \in L_T \setminus L} (1 - p_i). \tag{2.11}$$
In terms of logic, L is a complete interpretation that states that all atoms contained in L are true, while all the rest are false.
For example, the probability of the total choice {burglary, hears_alarm(john)}, expressing that a burglary but no earthquake has occurred, and that John heard the alarm but Mary did not, is: 0.1 × (1 − 0.2) × 0.7 × (1 − 0.6) = 0.0224. To compute the success probability of a query (e.g., the query alarm) we need to compute the probability that the query is provable in a randomly sampled logic program. As there are exponentially many subprograms $L \subseteq L_T$ (e.g., in this example the four probabilistic facts already give $2^4 = 16$ total choices), rather than enumerating these explicitly, one would compute the proofs of the query and observe that alarm is true exactly when earthquake or burglary is true. Several more efficient inference methods have been proposed for computing the success probability of a query, both exactly and approximately. For more details on these, please refer to [De Raedt et al., 2007] or [Gutmann et al., 2011b], among others.
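The naive semantics described above can be checked by brute force on Example 2.5. The following is a minimal sketch, not the inference method used by ProbLog: it enumerates all total choices of the four probabilistic facts, weighs each one with Equation 2.11, and sums the weights of those in which alarm is provable. The function names are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Probabilistic facts of Example 2.5 (fact -> probability).
FACTS = {
    "burglary": 0.1,
    "earthquake": 0.2,
    "hears_alarm(john)": 0.7,
    "hears_alarm(mary)": 0.6,
}


def total_choice_probability(true_facts):
    """Equation 2.11: p_i for every selected fact, (1 - p_i) for every other fact."""
    prob = 1.0
    for fact, p in FACTS.items():
        prob *= p if fact in true_facts else 1.0 - p
    return prob


def alarm_holds(true_facts):
    """alarm is provable iff burglary or earthquake belongs to the total choice."""
    return "burglary" in true_facts or "earthquake" in true_facts


def success_probability(query_holds):
    """Sum the probabilities of all total choices in which the query is provable."""
    names = list(FACTS)
    total = 0.0
    for bits in product([True, False], repeat=len(names)):
        world = {name for name, bit in zip(names, bits) if bit}
        if query_holds(world):
            total += total_choice_probability(world)
    return total


p_world = total_choice_probability({"burglary", "hears_alarm(john)"})
p_alarm = success_probability(alarm_holds)
print(round(p_world, 4), round(p_alarm, 4))  # 0.0224 0.28
```

The enumeration visits 2^n total choices for n probabilistic facts, which is why it is only usable on toy programs such as this one.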
For example, using the ProbLog exact inference method, the (success) probability of the query calls(mary) for the ProbLog program in Example 2.5 is:
$P_s(\mathit{calls}(\mathit{mary})) = 0.168.$
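This value can be reproduced from the proofs of the query. The sketch below assumes the two proofs of calls(mary) in Example 2.5 (one via burglary, one via earthquake) and combines them with inclusion-exclusion, because the proofs overlap and their probabilities cannot simply be added. Inclusion-exclusion is substituted here for illustration only; it is not the exact inference procedure of ProbLog, and the function names are hypothetical.

```python
from itertools import combinations

# Probabilistic facts of Example 2.5 (fact -> probability).
FACTS = {
    "burglary": 0.1,
    "earthquake": 0.2,
    "hears_alarm(john)": 0.7,
    "hears_alarm(mary)": 0.6,
}

# The two proofs of calls(mary), each given as the set of facts it uses.
PROOFS = [
    frozenset({"burglary", "hears_alarm(mary)"}),
    frozenset({"earthquake", "hears_alarm(mary)"}),
]


def conjunction_probability(facts):
    """Probability that all (independent) facts in the set are true."""
    prob = 1.0
    for fact in facts:
        prob *= FACTS[fact]
    return prob


def probability_of_any_proof(proofs):
    """Inclusion-exclusion over the proofs: P(proof_1 or ... or proof_k)."""
    total = 0.0
    for size in range(1, len(proofs) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(proofs, size):
            # The conjunction of several proofs uses the union of their facts.
            total += sign * conjunction_probability(frozenset().union(*subset))
    return total


p_calls_mary = probability_of_any_proof(PROOFS)
print(round(p_calls_mary, 3))  # 0.168
```

Written out, the computation is 0.1 × 0.6 + 0.2 × 0.6 − 0.1 × 0.2 × 0.6 = 0.168. Inclusion-exclusion itself grows exponentially in the number of proofs, so it serves only as a check on small examples.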
Note that predicates in the body of clauses can also be used to define constraints in the program. A predicate $b_i$ is a constraint for a predicate $h$ if it is present in the body of all the clauses whose head is $h$. For example, in Example 2.5, alarm is a constraint for calls(Pers): alarm needs to be true for calls(Pers) to be true.
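The definition of a constraint can be checked mechanically. The sketch below assumes a simplified, hypothetical encoding of the clauses of Example 2.5 as pairs of a head predicate and a list of body predicates, with arguments dropped; it is an illustration of the definition and not part of ProbLog.

```python
# Clauses of Example 2.5 as (head predicate, body predicates); arguments omitted.
CLAUSES = [
    ("alarm", ["burglary"]),
    ("alarm", ["earthquake"]),
    ("calls", ["alarm", "hears_alarm"]),
]


def is_constraint(body_pred, head_pred, clauses):
    """True iff body_pred occurs in the body of every clause whose head is head_pred."""
    bodies = [body for head, body in clauses if head == head_pred]
    # A predicate with no defining clauses has no constraints.
    return bool(bodies) and all(body_pred in body for body in bodies)


print(is_constraint("alarm", "calls", CLAUSES))     # True
print(is_constraint("burglary", "alarm", CLAUSES))  # False
```

burglary is not a constraint for alarm because the second alarm clause does not mention it.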
For ease of presentation of programs, we will sometimes also use annotated disjunctions. For example, to model that the shape of an object a robot detects is randomly chosen from a set of two predefined shapes (i.e., the shape takes exactly one of the two values, each with probability 1/2), one can write the following clause:
1/2 :: shape(Obj, cube); 1/2 :: shape(Obj, cylinder) ← obj(Obj).
where variable Obj is universally quantified over the set of all objects.
Formally, an annotated disjunction, which is a generalisation of a probabilistic fact, is a statement $p_1 :: f_1; \dots; p_n :: f_n \leftarrow b$, where the body $b$ is a conjunction of atoms. For all substitutions $\theta$ grounding $b$ and all $f_1, \dots, f_n$, when $b\theta$ is true at most one $f_i\theta$ is true; this $f_i\theta$ becomes true with probability $p_i$ [Vennekens et al., 2009].
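To make the semantics concrete, the following sketch grounds the shape clause for two hypothetical objects, o1 and o2, and enumerates the joint distribution over their shapes. It assumes that each ground instance of the annotated disjunction makes its choice independently of the others, and that any probability mass not covered by the heads corresponds to no head being true. All names in the code are illustrative.

```python
from fractions import Fraction
from itertools import product

# Heads of the annotated disjunction for shape/2, as (value, probability).
HEADS = [("cube", Fraction(1, 2)), ("cylinder", Fraction(1, 2))]
OBJECTS = ["o1", "o2"]  # hypothetical objects for which obj(Obj) holds


def ad_outcomes(heads):
    """Outcomes of one ground annotated disjunction; None means no head is true."""
    outcomes = list(heads)
    rest = 1 - sum(p for _, p in heads)
    if rest > 0:
        outcomes.append((None, rest))
    return outcomes


def joint_distribution(objects, heads):
    """Each ground instance chooses independently; multiply the chosen probabilities."""
    dist = {}
    for combo in product(ad_outcomes(heads), repeat=len(objects)):
        assignment = tuple((obj, shape) for obj, (shape, _) in zip(objects, combo))
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        dist[assignment] = prob
    return dist


dist = joint_distribution(OBJECTS, HEADS)
p_both_cubes = dist[(("o1", "cube"), ("o2", "cube"))]
print(len(dist), p_both_cubes)  # 4 1/4
```

Because the two head probabilities sum to 1, every object receives exactly one shape; with a smaller sum, `ad_outcomes` would add an outcome in which the object has no shape.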
We can now make the link to the BNs introduced in Section 2.1, and we can show how any BN can be modelled with the help of ProbLog and annotated disjunctions. The procedure, which we will summarise here, is explained in more detail in [Vennekens et al., 2009].
Consider any node $X_i$ of the BN, with domain $\{x_i^1, \dots, x_i^k\}$. Assume node $X_i$ has $m$ parents $pa(X_i) = \{X_{p_1}, \dots, X_{p_m}\}$. Let $f_i(x_i^j)$ be a probabilistic fact that we use to denote that random variable $X_i$ takes on value $x_i^j$. Assume that when $pa(X_i)$ take on values $w_1, \dots, w_m$, the conditional probability table gives probabilities $p_i^j = P(X_i = x_i^j \mid X_{p_1} = w_1, \dots, X_{p_m} = w_m)$. Then, we can write the annotated disjunction clause $p_i^1 :: f_i(x_i^1); \dots; p_i^k :: f_i(x_i^k) \leftarrow f_{p_1}(w_1), \dots, f_{p_m}(w_m)$. The ProbLog program representing the BN consists of the set of all such clauses for all nodes $X_i$ and for all parent values in their domain.
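The translation can be sketched as a small generator that emits one annotated disjunction per row of a conditional probability table. The helper `cpt_to_clauses` and its dictionary encoding of a CPT are hypothetical; the alarm probabilities are those that appear in Example 2.6. The output uses the ← notation of this text, whereas ProbLog source files write the implication as `:-`.

```python
def cpt_to_clauses(node, domain, parents, cpt):
    """Emit one annotated disjunction per CPT row.

    cpt maps a tuple of parent values to probabilities aligned with domain.
    """
    clauses = []
    for parent_values, probs in cpt.items():
        head = "; ".join(f"{p} :: {node}({value})" for p, value in zip(probs, domain))
        body = ", ".join(f"{parent}({w})" for parent, w in zip(parents, parent_values))
        # Nodes without parents get the trivially true body.
        clauses.append(f"{head} ← {body or 'true'}.")
    return clauses


# CPT of the alarm node, keyed by (burglary value, earthquake value).
ALARM_CPT = {
    ("true", "true"): [0.9, 0.1],
    ("true", "false"): [0.8, 0.2],
    ("false", "true"): [0.3, 0.7],
    ("false", "false"): [0.1, 0.9],
}

alarm_clauses = cpt_to_clauses(
    "alarm", ["true", "false"], ["burglary", "earthquake"], ALARM_CPT
)
burglary_clauses = cpt_to_clauses("burglary", ["true", "false"], [], {(): [0.2, 0.8]})

for clause in burglary_clauses + alarm_clauses:
    print(clause)
```

A node with m binary parents yields 2^m clauses, one per combination of parent values, mirroring the size of its CPT.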
Example 2.6. The BN from Figure 2.1 can be represented by the following ProbLog program using annotated disjunctions:
0.2 :: burglary(true); 0.8 :: burglary(false) ← true.
0.1 :: earthquake(true); 0.8 :: earthquake(false) ← true.
0.9 :: alarm(true); 0.1 :: alarm(false) ← burglary(true), earthquake(true).
0.8 :: alarm(true); 0.2 :: alarm(false) ← burglary(true), earthquake(false).
0.3 :: alarm(true); 0.7 :: alarm(false) ← burglary(false), earthquake(true).
0.1 :: alarm(true); 0.9 :: alarm(false) ← burglary(false), earthquake(false).
0.8 :: phonecall(true); 0.2 :: phonecall(false) ← alarm(true).
0.1 :: phonecall(true); 0.9 :: phonecall(false) ← alarm(false).

ProbLog also supports parameter learning through learning from interpretations (LFI) [Gutmann et al., 2011a]. LFI uses a ProbLog program $T(\mathbf{p})$ for the unknown parameters $\mathbf{p} = \langle p_1, \dots, p_n \rangle$, and our gathered training data $D = \{d_1, \dots, d_M\}$, where $d_i$ is the data from instance $i$, to compute the maximum likelihood parameter estimate $\hat{\mathbf{p}} = \arg\max_{\mathbf{p}} P(D \mid T(\mathbf{p})) = \arg\max_{\mathbf{p}} \prod_{m=1}^{M} P(d_m \mid T(\mathbf{p}))$, thus obtaining the probability parameters of the (relationally encoded) BN.
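In the simplest setting the maximum likelihood estimate has a closed form. The sketch below assumes fully observed interpretations over independent probabilistic facts, in which case the estimate of each parameter is the relative frequency of its fact in the data. This is a deliberately restricted illustration and not the LFI algorithm of [Gutmann et al., 2011a], which also handles partially observed interpretations. The training data are hypothetical.

```python
from math import prod  # available from Python 3.8

FACT_NAMES = ["burglary", "earthquake"]

# Hypothetical, fully observed interpretations: the set of facts true in each one.
DATA = [
    {"burglary"},
    set(),
    {"earthquake"},
    set(),
    {"earthquake"},
]


def likelihood(params, data):
    """P(D | T(p)): product over interpretations and facts of p_i or (1 - p_i)."""
    return prod(
        params[fact] if fact in d else 1.0 - params[fact]
        for d in data
        for fact in FACT_NAMES
    )


def ml_estimate(data):
    """Relative frequency of each fact, the maximiser in the fully observed case."""
    return {fact: sum(fact in d for d in data) / len(data) for fact in FACT_NAMES}


p_hat = ml_estimate(DATA)
print(p_hat)  # {'burglary': 0.2, 'earthquake': 0.4}
print(likelihood(p_hat, DATA) >= likelihood({"burglary": 0.3, "earthquake": 0.4}, DATA))  # True
```

The final comparison is a spot check only: moving a parameter away from its relative frequency lowers the likelihood of the data.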
ProbLog will be used for the modelling of relational affordances in a discrete setting. This will be done in Chapter 4 of the thesis.