Weighted CNF Encoding for Annotated Disjunctions

The inference pipeline of ProbLog2 is based on a transformation of the initial ProbLog program to a Boolean formula in CNF that is then compiled into an sd-DNNF. The sd-DNNF allows efficient weighted model counting (WMC) in order to compute probabilities (see Chapter 2). We now introduce an encoding of annotated disjunctions in line with this reduction which uses ProbLog constraints (see Section 4.1) to retain the semantics of the ADs.

5.4.1 Encoding

The encoding of annotated disjunctions we propose has two parts: (i) a logic program with weighted facts that is transformed to CNF according to the grounding and Boolean formula conversion components of the ProbLog2 pipeline; and (ii) a set of constraints that is directly added to the CNF. This is a special case of cProbLog (see Section 4.1), adapted directly to the specific constraints needed here. ProbLog employs weighted model counting (WMC) techniques for MARG and MPE inference on the sd-DNNF that results from compiling the CNF.

Definition 5.3 (Weighted CNF encoding for ADs). The weighted CNF encoding (or wCNF encoding, in short) of a ground AD

    p1::h1; ...; pn::hn ← b1, ..., bm.

with unique identifier aj consists of:

• for each hi with 1 ≤ i ≤ n, a surrogate probabilistic fact spf(aj, hi, i) with true(spf(aj, hi, i)) = pi and false(spf(aj, hi, i)) = 1.0, and a clause

    hi :- b1, ..., bm, spf(aj, hi, i).

• (the conjunction of) the following two constraints:

    $$\bigwedge_{i=1}^{n-1} \bigwedge_{l=i+1}^{n} \big(\neg spf(a_j, h_i, i) \lor \neg spf(a_j, h_l, l)\big)$$

    $$\Big(\bigwedge_{k=1}^{m} b_k\Big) \Leftrightarrow \Big(\bigvee_{i=1}^{n} spf(a_j, h_i, i)\Big)$$

Intuitively, surrogate probabilistic facts make the choices in ADs explicit, and the constraints ensure that this does not introduce undesired combinations of values. The first constraint states that, for each AD, at most one surrogate probabilistic fact can be true in any possible world, and thus at most one head atom can be made true by the AD. The second constraint states that a choice is made, that is, one surrogate probabilistic fact of the AD is true, if and only if the body of the AD is true. When one head atom hi is selected for an AD, the other head atoms are ignored: they do not influence the probability of any selection with hi true. That is why the false probability of a surrogate probabilistic fact is set to 1.0. We write a surrogate probabilistic fact spf(aj, hi, i) with true(spf(aj, hi, i)) = p as (p, 1.0)::spf(aj, hi, i).
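The construction of Definition 5.3 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name wcnf_encode, the representation of atoms as strings and the layout of the returned tuples are hypothetical and not part of the ProbLog2 implementation.

```python
from itertools import combinations

def wcnf_encode(ad_id, heads, body):
    """Sketch of Definition 5.3 for one ground AD (hypothetical helper).

    heads: list of (probability, head_atom) pairs; body: list of body atoms.
    Atoms are plain strings, which is an assumption of this sketch.
    """
    facts = []    # (surrogate fact, true-weight, false-weight)
    clauses = []  # (head atom, body literals including the surrogate fact)
    for i, (p, h) in enumerate(heads, start=1):
        spf = f"spf({ad_id}, {h}, {i})"
        facts.append((spf, p, 1.0))              # false-weight is always 1.0
        clauses.append((h, list(body) + [spf]))  # h_i :- b_1, ..., b_m, spf
    spfs = [name for name, _, _ in facts]
    # Constraint 1: each pair (a, b) stands for the clause (not a) or (not b).
    at_most_one = list(combinations(spfs, 2))
    # Constraint 2: (b_1 and ... and b_m) <=> (spf_1 or ... or spf_n),
    # stored as the pair (body atoms, surrogate facts).
    body_iff_choice = (list(body), spfs)
    return facts, clauses, at_most_one, body_iff_choice

facts, clauses, at_most_one, body_iff_choice = wcnf_encode(
    1, [(0.6, "red(b1)"), (0.3, "green(b1)"), (0.1, "blue(b1)")], ["pick(b1)"]
)
```

For an AD with n head atoms the sketch produces n surrogate facts, n clauses and n(n−1)/2 pairwise exclusion clauses, in line with the double conjunction of the first constraint.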

Example 5.9. The wCNF encoding of the two ADs of Example 5.3 consists of the program part:

    (0.6, 1.0)::spf(1, red(b1), 1).
    (0.3, 1.0)::spf(1, green(b1), 2).
    (0.1, 1.0)::spf(1, blue(b1), 3).
    red(b1) :- pick(b1), spf(1, red(b1), 1).
    green(b1) :- pick(b1), spf(1, green(b1), 2).
    blue(b1) :- pick(b1), spf(1, blue(b1), 3).

    (0.6, 1.0)::spf(2, pick(b1), 1).
    (0.4, 1.0)::spf(2, no_pick(b1), 2).
    pick(b1) :- spf(2, pick(b1), 1).
    no_pick(b1) :- spf(2, no_pick(b1), 2).

and the constraints:

    (¬spf(1, red(b1), 1) ∨ ¬spf(1, green(b1), 2)) ∧
    (¬spf(1, red(b1), 1) ∨ ¬spf(1, blue(b1), 3)) ∧
    (¬spf(1, green(b1), 2) ∨ ¬spf(1, blue(b1), 3))

    pick(b1) ⇔ (spf(1, red(b1), 1) ∨ spf(1, green(b1), 2) ∨ spf(1, blue(b1), 3))

    ¬spf(2, pick(b1), 1) ∨ ¬spf(2, no_pick(b1), 2)

    spf(2, pick(b1), 1) ∨ spf(2, no_pick(b1), 2)

The possible worlds of the program in which the constraints hold are (r, g, b, p, np abbreviate red(b1), green(b1), blue(b1), pick(b1), no_pick(b1)):

    Possible world  spf(1,r,1)  spf(1,g,2)  spf(1,b,3)  spf(2,p,1)  spf(2,np,2)  r  g  b  p  np  P(ωi)
    ω1              T           F           F           T           F            T  F  F  T  F   0.36
    ω2              F           T           F           T           F            F  T  F  T  F   0.18
    ω3              F           F           T           T           F            F  F  T  T  F   0.06
    ω4              F           F           F           F           T            F  F  F  F  T   0.40

Comparing Example 5.9 to Example 5.5 and Example 5.3 shows that our encoding results in a set of possible worlds that have (i) the same truth value assignments and (ii) the same probabilities as the selections of Example 5.3. In contrast, the set of possible worlds in Example 5.5 is inconsistent with the semantics of ADs. Consequently, we can compute the correct MPE state from the set of possible worlds listed in the table of Example 5.9. Furthermore, as there is a one-to-one correspondence between the possible worlds and the selections, we can perform all other inference tasks correctly.
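The possible worlds of Example 5.9 can be reproduced by brute force. The following Python sketch enumerates all truth value assignments to the surrogate facts, derives the remaining atoms by forward chaining, discards assignments that violate either constraint, and multiplies the weights of the true surrogate facts. It is a naive enumeration for illustration only and not the sd-DNNF based computation of ProbLog2; the names ADS, possible_worlds and WORLDS are our own assumptions.

```python
from itertools import product

# Ground ADs of Example 5.9 as (identifier, body atoms, [(p_i, h_i), ...]).
ADS = [
    (1, ["pick(b1)"], [(0.6, "red(b1)"), (0.3, "green(b1)"), (0.1, "blue(b1)")]),
    (2, [], [(0.6, "pick(b1)"), (0.4, "no_pick(b1)")]),
]

def possible_worlds(ads):
    """Return (true atoms, weight) for every assignment satisfying both constraints."""
    spfs = [(ad, p, h) for ad, _, heads in ads for p, h in heads]
    bodies = {ad: body for ad, body, _ in ads}
    worlds = []
    for bits in product([True, False], repeat=len(spfs)):
        chosen = [s for s, bit in zip(spfs, bits) if bit]
        # Model of the program part: forward chaining over h :- body, spf.
        atoms, changed = set(), True
        while changed:
            changed = False
            for ad, _, h in chosen:
                if h not in atoms and all(b in atoms for b in bodies[ad]):
                    atoms.add(h)
                    changed = True
        ok = True
        for ad, body, _ in ads:
            n_true = sum(1 for s in chosen if s[0] == ad)
            body_true = all(b in atoms for b in body)
            # Constraint 1: at most one true spf per AD.
            # Constraint 2: the body is true iff some spf of the AD is true.
            if n_true > 1 or (n_true == 1) != body_true:
                ok = False
        if ok:
            weight = 1.0
            for _, p, _ in chosen:
                weight *= p  # false surrogate facts contribute weight 1.0
            worlds.append((frozenset(atoms), weight))
    return worlds

WORLDS = possible_worlds(ADS)
```

Under these assumptions, WORLDS contains four entries whose weights, up to floating-point rounding, are those in the P(ωi) column of the table.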

In the next section we formally prove correctness for our approach.

5.4.2 Correctness

We prove correctness of our encoding for probabilistic inference in two steps. First, we show that for a set of annotated disjunctions, there is a one-to-one mapping between the models of the wCNF and the selections in the probability tree (cf. Section 5.1.2), and second, that the weight of a model of the CNF is the probability of the corresponding selection.

Theorem 5.1. For a set A = {a₁, ..., aₖ} of ground annotated disjunctions, there is a bijection from the set M of models of the wCNF for A to the set S of selections in a probability tree Tr for A.

Proof: Let LA be the set of atoms in A, and LF the set of surrogate facts in the wCNF encoding of A. For every truth value assignment lF to LF, there is exactly one truth value assignment lA to LA such that lF ∪ lA is a model of the program part of the encoding [16]. The first constraint filters out all assignments lF that assign true to more than one surrogate fact for the same ground AD, and the second filters out those that assign true to any surrogate fact for an AD whose body is false in lF ∪ lA. Each remaining assignment lF ∪ lA is in one-to-one correspondence with a selection in S. That is, each such assignment (i.e., a model) corresponds to exactly one path from the root to a leaf of the tree Tr (the assignment lA is a model of the node’s interpretation), and there is one model for each path.

Theorem 5.2. Given a set A = {a₁, ..., aₖ} of ground annotated disjunctions, the set M of models of the wCNF for A, and the set S of selections in a probability tree Tr for A, the weight of a model M ∈ M equals the probability of the corresponding selection S ∈ S.

Proof: Follows directly from the fact that the model and the selection follow the same path through the tree and from the definition of the weight function on the CNF. At each node, the probability of the selection so far is multiplied by the probability pi of the chosen head atom, and the weight of the model by the weight of the AD’s surrogate facts, pi · 1.0 · ... · 1.0 = pi.

The validity of Theorem 5.1 and Theorem 5.2 can be verified by comparing the possible worlds of the wCNF in Example 5.9 with the selections associated with the probability tree in Example 5.3: (i) each possible world is associated with exactly one selection and vice versa (consistent with Theorem 5.1) and (ii) the probability of each possible world equals the probability of the selection that is in bijection with this possible world (consistent with Theorem 5.2). Later, in Example 5.10, we observe the correctness of our encoding (with respect to Theorem 5.1 and Theorem 5.2) for multiple ADs that have the same atoms in their heads.
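As a worked instance of the weight computation used in the proof of Theorem 5.2, the weights of the possible worlds ω1 and ω4 of Example 5.9 can be written out factor by factor. The notation w(·) for the weight of a model is our own shorthand here; each factor is the true weight of a surrogate fact that is true in the world, or the false weight 1.0 otherwise.

```latex
\begin{align*}
w(\omega_1) &= \underbrace{0.6 \cdot 1.0 \cdot 1.0}_{\text{AD 1: } red(b1) \text{ chosen}}
               \cdot \underbrace{0.6 \cdot 1.0}_{\text{AD 2: } pick(b1) \text{ chosen}} = 0.36 \\
w(\omega_4) &= \underbrace{1.0 \cdot 1.0 \cdot 1.0}_{\text{AD 1: no choice}}
               \cdot \underbrace{1.0 \cdot 0.4}_{\text{AD 2: } no\_pick(b1) \text{ chosen}} = 0.40
\end{align*}
```

Both values agree with the P(ωi) column of the table in Example 5.9.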

5.4.3 Annotated disjunctions and multiple causes

Example 5.10 illustrates how our approach encodes the more specific case in which multiple ADs have the same atoms in their heads, that is, the same event can result from multiple causes.

Example 5.10. Consider again the same problem as in the previous examples: a bag with colorful balls. According to one source of information, in the bag there are red, green and blue balls (as in Example 5.3), while another source states that there are only red and green balls in the bag:

    r₁: 0.6::red(b1); 0.3::green(b1); 0.1::blue(b1) <- pick(b1).
    r₂: 0.7::red(b1); 0.3::green(b1) <- pick(b1).
    r₃: 0.6::pick(b1); 0.4::no_pick(b1) <- true.

The probability tree associated with these ADs is:

    {}
      0.6  {p(b1)}
             0.6  {p(b1), r(b1)}
                    0.7  I1 = {p(b1), r(b1), r(b1)} = {p(b1), r(b1)}
                    0.3  I2 = {p(b1), r(b1), g(b1)}
             0.3  {p(b1), g(b1)}
                    0.7  I2 = {p(b1), r(b1), g(b1)}
                    0.3  I3 = {p(b1), g(b1), g(b1)} = {p(b1), g(b1)}
             0.1  {p(b1), b(b1)}
                    0.7  I4 = {p(b1), b(b1), r(b1)}
                    0.3  I5 = {p(b1), b(b1), g(b1)}
      0.4  I6 = {np(b1)}

    Interpretation probabilities: P(I1) = 0.252, P(I2) = 0.234, P(I3) = 0.054, P(I4) = 0.042, P(I5) = 0.018, P(I6) = 0.4

and the selections corresponding to the paths from the root to a leaf are:

    Selection  Interpretation                        Complete  Probability P(σi)
    σ₁         I1 = {pick(b1), red(b1)}              ✓         0.252
    σ₂         I2 = {pick(b1), red(b1), green(b1)}   ✓         0.108
    σ₃         I2 = {pick(b1), red(b1), green(b1)}   ✓         0.126
    σ₄         I3 = {pick(b1), green(b1)}            ✓         0.054
    σ₅         I4 = {pick(b1), red(b1), blue(b1)}    ✓         0.042
    σ₆         I5 = {pick(b1), green(b1), blue(b1)}  ✓         0.018
    σ₇         I6 = {no_pick(b1)}                    ×         0.4

In this case the interpretation I2 is defined by two selections (σ₂ and σ₃). According to the definition, the MPE state is the interpretation associated with the selection with highest probability (in this case thus I1). Although the two selections σ₂ and σ₃ define the same interpretation, they need to be considered separately when computing the MPE.
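The relation between selections and interpretations in Example 5.10 can be made concrete with a short Python sketch that enumerates the root-to-leaf paths of the probability tree and groups them by interpretation. The names R1, R2, R3, selections and BY_INTERPRETATION are our own assumptions, and the tree structure is hard-coded for this example rather than derived from an arbitrary program.

```python
from collections import defaultdict
from itertools import product

# Head choices of the three ADs of Example 5.10 as (probability, atom).
R1 = [(0.6, "red(b1)"), (0.3, "green(b1)"), (0.1, "blue(b1)")]  # body: pick(b1)
R2 = [(0.7, "red(b1)"), (0.3, "green(b1)")]                     # body: pick(b1)
R3 = [(0.6, "pick(b1)"), (0.4, "no_pick(b1)")]                  # body: true

def selections():
    """Enumerate root-to-leaf paths as (interpretation, probability) pairs."""
    paths = []
    for p3, h3 in R3:
        if h3 == "pick(b1)":
            # The bodies of r1 and r2 hold, so each of them makes one choice.
            for (p1, h1), (p2, h2) in product(R1, R2):
                paths.append((frozenset({h3, h1, h2}), p3 * p1 * p2))
        else:
            # The bodies of r1 and r2 are false: no further choice is made.
            paths.append((frozenset({h3}), p3))
    return paths

SELECTIONS = selections()

# Sum the probabilities of all selections that define the same interpretation.
BY_INTERPRETATION = defaultdict(float)
for interpretation, probability in SELECTIONS:
    BY_INTERPRETATION[interpretation] += probability
```

The sketch yields seven selections but only six distinct interpretations: the two paths that lead to {pick(b1), red(b1), green(b1)} keep their individual probabilities in SELECTIONS and are only summed in BY_INTERPRETATION.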

The following table states the possible worlds corresponding to the wCNF encoding of this program:

    Poss. world  spf(1,r,1)  spf(1,g,2)  spf(1,b,3)  spf(2,r,1)  spf(2,g,2)  spf(3,p,1)  spf(3,np,2)  r  g  b  p  np  P(ωi)
    ω1           T           F           F           T           F           T           F            T  F  F  T  F   0.252
    ω2           F           T           F           T           F           T           F            T  T  F  T  F   0.126
    ω3           F           F           T           T           F           T           F            T  F  T  T  F   0.042
    ω4           T           F           F           F           T           T           F            T  T  F  T  F   0.108
    ω5           F           T           F           F           T           T           F            F  T  F  T  F   0.054
    ω6           F           F           T           F           T           T           F            F  T  T  T  F   0.018
    ω7           F           F           F           F           F           F           T            F  F  F  F  T   0.4
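Since MARG inference is performed by weighted model counting, the marginal probability of an atom can be read off this table by summing P(ωi) over the worlds in which the atom is true. The following Python sketch does so directly on the table rows; it is a plain summation for illustration, not the sd-DNNF evaluation used by ProbLog2, and the names WORLDS, marginal and MARGINALS are our own assumptions.

```python
# Rows of the possible-worlds table of Example 5.10:
# (atoms among r, g, b, p, np that are true in the world, P(omega_i)).
WORLDS = [
    ({"r", "p"}, 0.252),       # omega_1
    ({"r", "g", "p"}, 0.126),  # omega_2
    ({"r", "b", "p"}, 0.042),  # omega_3
    ({"r", "g", "p"}, 0.108),  # omega_4
    ({"g", "p"}, 0.054),       # omega_5
    ({"g", "b", "p"}, 0.018),  # omega_6
    ({"np"}, 0.4),             # omega_7
]

def marginal(atom, worlds=WORLDS):
    """Sum the weights of all possible worlds in which `atom` is true."""
    return sum(weight for atoms, weight in worlds if atom in atoms)

# Rounded to three decimals to hide floating-point noise.
MARGINALS = {atom: round(marginal(atom), 3) for atom in ["r", "g", "b", "p", "np"]}
```

For instance, red(b1) is true in ω1, ω2, ω3 and ω4, so its marginal is 0.252 + 0.126 + 0.042 + 0.108 = 0.528.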
