Distribution Semantics - A Probabilistic Prolog and its Applications

The distribution semantics as rigorously defined by Sato [1995] provides a formal basis for extending logic programming with probabilistic elements. It is a generalization of the least Herbrand model semantics, where the main difference is that logic programs contain a set of dedicated facts whose truth values are not directly set to true as in the least Herbrand model of a usual logic program, but determined probabilistically. Once these truth values are fixed, the program again has a unique least model extending the partial interpretation, which of course can be different depending on the initially chosen assignments. The distribution semantics now defines a distribution over these least Herbrand models of the program by extending a joint probability distribution over the set of dedicated facts. In its basic form, where the joint distribution is defined using a set of independent random events, it is a well-known semantics for probabilistic logics that has been (re)defined multiple times in the literature, often under other names or in a more limited database setting; cf. for instance [Dantsin, 1991; Poole, 1993b; Fuhr, 2000; Poole, 2000; Dalvi and Suciu, 2004]. Sato has, however, formalized a more general setting, including the case of a countably infinite set of random variables and using arbitrary discrete distributions over these basic random variables, in his well-known distribution semantics. We briefly repeat the basic ideas in the following; for more details, the interested reader is referred to [Sato, 1995].

We assume a first order language with denumerably many predicate, constant and functor symbols. Let DB = F ∪ R be a definite clause program, where F is a set of unit clauses, called facts, and R is a set of (possibly non-unit) clauses, called rules. For simplicity, it is assumed that DB is ground and denumerably infinite, and no fact in F unifies with the head of a rule in R. The distribution semantics can be viewed as a possible worlds semantics, where ground atoms are treated as random variables, and worlds thus correspond to interpretations assigning truth values to all ground atoms in DB.

The key idea of the distribution semantics is to extend a basic distribution $P_F$ over subsets or interpretations $F' \subseteq F$ into a distribution $P_{DB}$ over the least Herbrand models of DB, exploiting the uniqueness of the least Herbrand model of $F' \cup R$ for each such $F'$. We first illustrate this for the finite case by means of an example.

Example 2.5 Given the definite clause program DB = F ∪ R with

F = {a(0), a(1)}

R = {(b(0) :- a(0)), (b(1) :- a(1), b(0))}

we enumerate ground atoms in F and DB as ⟨a(0), a(1)⟩ and ⟨a(0), b(0), a(1), b(1)⟩, respectively. This allows us to denote interpretations as binary vectors, where the i-th bit denotes the truth value of the i-th atom in the corresponding enumeration.


Based on this notation, we define the basic distribution $P_F$ over $\Omega_F = \{0, 1\}^2$ as

$P_F(00) = 0.21 \quad P_F(01) = 0.04 \quad P_F(10) = 0.58 \quad P_F(11) = 0.17$

$P_F$ is now extended to a distribution $P_{DB}$ over $\Omega_{DB} = \{0, 1\}^4$ by setting

$P_{DB}(\hat\omega) = P_F(\omega)$

if $\hat\omega$ corresponds to the least Herbrand model of DB extending $\omega$, and $P_{DB}(\hat\omega) = 0$ otherwise, that is

$P_{DB}(0000) = 0.21 \quad P_{DB}(0010) = 0.04$

$P_{DB}(1100) = 0.58 \quad P_{DB}(1111) = 0.17$
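The extension step of Example 2.5 can be replayed mechanically. The following Python sketch is only an illustration under our own assumptions: the names `least_model` and `extend`, and the encoding of worlds as bit strings, are hypothetical and not part of the distribution semantics itself. It computes the least Herbrand model of each choice of facts by naive forward chaining and moves the probability of each ω to the corresponding world.

```python
from itertools import product

# Facts and rules of Example 2.5; each rule is (head, [body atoms]).
FACTS = ["a(0)", "a(1)"]
RULES = [("b(0)", ["a(0)"]), ("b(1)", ["a(1)", "b(0)"])]
ATOMS = ["a(0)", "b(0)", "a(1)", "b(1)"]  # enumeration used for Omega_DB

# Basic distribution P_F from the example, keyed by bit strings over FACTS.
P_F = {"00": 0.21, "01": 0.04, "10": 0.58, "11": 0.17}


def least_model(true_facts, rules):
    """Least Herbrand model of a ground definite program (forward chaining)."""
    model = set(true_facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(atom in model for atom in body):
                model.add(head)
                changed = True
    return model


def extend(p_f):
    """Extend P_F to P_DB: the mass of omega goes to its least model."""
    p_db = {"".join(bits): 0.0 for bits in product("01", repeat=len(ATOMS))}
    for omega, prob in p_f.items():
        chosen = [fact for fact, bit in zip(FACTS, omega) if bit == "1"]
        model = least_model(chosen, RULES)
        world = "".join("1" if atom in model else "0" for atom in ATOMS)
        p_db[world] += prob
    return p_db


P_DB = extend(P_F)
for world, prob in P_DB.items():
    if prob > 0:
        print(world, prob)  # only the four least models carry probability
```

All twelve remaining worlds in $\{0, 1\}^4$ keep probability 0, since none of them is the least model of any choice of facts.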

For an arbitrary sentence G using the vocabulary of DB we define the set of possible worlds $\hat\omega \in \Omega_{DB}$ where G is true as

$$[G] = \{\hat\omega \in \Omega_{DB} \mid \hat\omega \models G\}.$$

Given a distribution $P_{DB}$ over $\Omega_{DB}$, the probability of G is defined as the probability of the set [G], which in the finite case is

$$P_{DB}([G]) = \sum_{\hat\omega \in [G]} P_{DB}(\hat\omega) \qquad (2.1)$$

Example 2.6 Continuing our example, the probability of b(0) is

$P_{DB}([b(0)]) = P_{DB}(\{1100, 1111\}) = 0.58 + 0.17 = 0.75,$

while that of ∀x.b(x) is

$P_{DB}([\forall x.\, b(x)]) = P_{DB}([b(0) \wedge b(1)]) = P_{DB}(\{1111\}) = 0.17.$
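Equation (2.1) amounts to a filtered sum over worlds. As a minimal sketch, assuming worlds are encoded as bit strings over ⟨a(0), b(0), a(1), b(1)⟩ and queries are given as Python predicates (both choices, and the helper names `as_interpretation` and `probability`, are ours), the two query probabilities of Example 2.6 can be recomputed as follows.

```python
# Nonzero part of P_DB from Example 2.5; bits follow <a(0), b(0), a(1), b(1)>.
ATOMS = ["a(0)", "b(0)", "a(1)", "b(1)"]
P_DB = {"0000": 0.21, "0010": 0.04, "1100": 0.58, "1111": 0.17}


def as_interpretation(world):
    """Turn a bit string into a dict from atom name to truth value."""
    return {atom: bit == "1" for atom, bit in zip(ATOMS, world)}


def probability(query, p_db):
    """Sum P_DB over the worlds in [G], i.e. those where `query` holds."""
    return sum(p for w, p in p_db.items() if query(as_interpretation(w)))


# G = b(0)
p_b0 = probability(lambda i: i["b(0)"], P_DB)
# G = forall x. b(x), which over this finite vocabulary is b(0) and b(1)
p_all_b = probability(lambda i: i["b(0)"] and i["b(1)"], P_DB)

print(round(p_b0, 2), round(p_all_b, 2))
```

Worlds with probability 0 are omitted from the dictionary, which does not change the sums.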

While for finitely many basic facts, $P_F$ and thus $P_{DB}$ can be defined by exhaustive enumeration of $\Omega_F$, this is no longer possible for infinite F. Sato showed how to define $P_{DB}$ based on a series of finite distributions $P_F^{(n)}$ over interpretations $\omega_n$ of the first n variables in F. For this to be possible, these distributions have to satisfy the compatibility condition, that is

$$P_F^{(n)}(\omega_n) = P_F^{(n+1)}(\omega_n 1) + P_F^{(n+1)}(\omega_n 0) \qquad (2.2)$$

Intuitively, this condition ensures that if a sentence G satisfies the finite support condition, that is, there are finitely many minimal subsets $F' \subseteq F$ such that $F' \cup R \models G$, we can fix a suitable enumeration of F and restrict probability calculations to a finite prefix of this enumeration covering all facts appearing in these minimal subsets. We do not go into further technical detail here, but instead illustrate one basic and popular choice of such distributions $P_F^{(n)}$ by means of an example.


Example 2.7 We extend our example switching to successor notation for natural numbers.

F = {a(0), a(s(0)), a(s(s(0))), a(s(s(s(0)))), ...}

R = {(b(0) :- a(0)), (b(s(N)) :- a(s(N)), b(N))}

The basic sample space is now $\Omega_F = \{0, 1\}^\infty$, that is, the space of countably infinite Boolean vectors. We fix enumerations for ground atoms in F and DB extending the ones used above, that is, following the order of arguments and alternating between a and b in the case of DB. Again, an interpretation of F, for example $\omega = 110^\infty$, leads to a unique model of DB, in this case $\hat\omega = 11110^\infty$.

We consider all random variables corresponding to ground facts in F to be mutually independent, and assign a probability of being true to each of them. For the sake of simplicity, we use the same probability p for each fact. Consider now a finite prefix $\omega_n$ of an interpretation $\omega \in \Omega_F$, where m variables are assigned 1. Given the independence assumption, the joint probability of the first n random variables taking value $\omega_n$ is thus

$$P_F^{(n)}(\omega_n) = p^m \cdot (1 - p)^{n-m}.$$

Clearly, this series of distributions respects the compatibility condition of Equation (2.2). To calculate the probability of b(s(s(0))) in our example, it is sufficient to use $P_F^{(3)}$, as the first three elements of F already determine the truth value of the query, that is

$$P_{DB}([b(s(s(0)))]) = P_{DB}(\{\hat\omega \in \Omega_{DB} \mid \hat\omega_6 = 111111\}) = P_F^{(3)}(111) = p^3$$
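Both claims of Example 2.7 can be checked numerically on finite prefixes. The sketch below is illustrative only: the function names and the value p = 0.3 are our own assumptions, not taken from the text. It verifies Equation (2.2) for the independent distribution on all prefixes up to a small length, and computes the probability of b(s^k(0)) by summing $P_F^{(k+1)}$ over those prefixes of length k + 1 under which the two rules derive the query.

```python
from itertools import product


def p_prefix(prefix, p):
    """P_F^(n) for independent facts: p^m * (1-p)^(n-m), m = number of 1s."""
    m = prefix.count("1")
    return p ** m * (1 - p) ** (len(prefix) - m)


def is_compatible(n, p, tol=1e-12):
    """Check Equation (2.2) for every prefix of length n."""
    for bits in product("01", repeat=n):
        w = "".join(bits)
        extended = p_prefix(w + "1", p) + p_prefix(w + "0", p)
        if abs(p_prefix(w, p) - extended) > tol:
            return False
    return True


def b_holds(prefix):
    """Truth of b(s^k(0)) for k = len(prefix) - 1, following the two rules."""
    b = prefix[0] == "1"        # b(0) :- a(0).
    for bit in prefix[1:]:
        b = (bit == "1") and b  # b(s(N)) :- a(s(N)), b(N).
    return b


def prob_b(k, p):
    """Probability of b(s^k(0)), using only the first k + 1 facts of F."""
    total = 0.0
    for bits in product("01", repeat=k + 1):
        w = "".join(bits)
        if b_holds(w):
            total += p_prefix(w, p)
    return total


p = 0.3  # hypothetical value, chosen only for illustration
print(all(is_compatible(n, p) for n in range(1, 6)))
print(abs(prob_b(2, p) - p ** 3) < 1e-12)
```

Only the all-ones prefix derives the query, so the sum collapses to the single term $P_F^{(3)}(111)$, in line with the closed form above.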

Finally, let us remark that the key to the distribution semantics is the existence of a unique canonical model of the entire program given an interpretation of the basic facts. While in the original distribution semantics, R is a definite clause program and thus has a unique least Herbrand model, it is equally possible to use the well-founded semantics as discussed in Section 2.1, but parameterized by the set of basic facts, and restrict the set of rules R in such a way that for each two-valued interpretation of the basic facts, the well-founded model of DB is two-valued as well. In this view, R is closely related to the definitions in FO(ID) [Denecker and Vennekens, 2007; Vennekens et al., 2009], but restricts rule bodies to conjunctions of literals instead of arbitrary first order formulae.

In document A Probabilistic Prolog and its Applications (Page 36-38)