Using Less Explanations - Approximative Inference

4.2 Approximative Inference

4.2.1 Using Less Explanations

We first discuss approximation techniques that reduce the size of the DNF by considering a subset of all possible explanations. We here exploit the fact that the DNF formula describing sets of explanations is monotone, meaning that adding more explanations will never decrease the probability of the formula being true. Thus, formulae describing subsets of the full set of explanations of a query will always give a lower bound on the query’s success probability.

Example 4.6 In Example 3.1, the lower bound obtained from the shorter explanation would be Pᵀ(cd) = 0.9, while that from the longer one would be Pᵀ(ce ∧ ed) = 0.4.
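The monotonicity argument can be checked numerically on the graph example. The following is a minimal sketch, not ProbLog's implementation: it evaluates a DNF by brute-force enumeration of truth assignments rather than with BDDs, and the helper name `dnf_probability` is made up for this illustration. The value 0.5 for the edge from e to d is an assumption inferred from the probabilities 0.8 and 0.4 quoted in the examples.

```python
from itertools import product

def dnf_probability(explanations, probs):
    """Probability that at least one explanation (a set of facts) holds.

    Brute force over all truth assignments of the independent facts;
    exponential, so only suitable for tiny illustrations.
    """
    facts = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in expl) for expl in explanations):
            weight = 1.0
            for f in facts:
                weight *= probs[f] if world[f] else 1.0 - probs[f]
            total += weight
    return total

# Edge facts of the graph example; 0.5 for "ed" is inferred (assumption).
probs = {"cd": 0.9, "ce": 0.8, "ed": 0.5}
all_explanations = [{"cd"}, {"ce", "ed"}]

lower_short = dnf_probability([{"cd"}], probs)       # shorter explanation only
lower_long = dnf_probability([{"ce", "ed"}], probs)  # longer explanation only
exact = dnf_probability(all_explanations, probs)     # full DNF

print(lower_short, lower_long, exact)
```

Each subset of the explanations yields a value that does not exceed the value of the full DNF, which is the lower-bound property used in this section.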


Bounded Approximation

The first approximation algorithm, a slight variant of the one proposed in [De Raedt et al., 2007b], uses DNF formulae to obtain both an upper and a lower bound on the probability of a query. It is closely related to work by Poole [1993a] in the context of PHA, but adapted towards ProbLog.

We observe that the probability of an explanation l₁ ∧ … ∧ lₙ, where the lᵢ are positive or negative literals involving random variables bᵢ, will always be at most the probability of an arbitrary prefix l₁ ∧ … ∧ lᵢ, i ≤ n, as each further literal contributes a factor of at most 1.
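The prefix observation can be illustrated with a few lines of Python. This is only a sketch under the independence assumption; the function name `conjunction_probability` and the literal encoding as `(fact, is_positive)` pairs are choices made for this illustration, and the value 0.5 for the fact ed is inferred from the graph example rather than stated in this section.

```python
from math import prod  # available from Python 3.8

def conjunction_probability(literals, probs):
    """Probability of a conjunction of literals over independent facts.

    Each literal is a pair (fact, is_positive); every fact is assumed to
    occur at most once in the conjunction.
    """
    return prod(probs[f] if positive else 1.0 - probs[f]
                for f, positive in literals)

probs = {"ce": 0.8, "ed": 0.5}  # 0.5 is an inferred value (assumption)
explanation = [("ce", True), ("ed", True)]

# Probabilities of the prefixes l1, l1 ∧ l2, ... of the explanation.
prefix_probabilities = [conjunction_probability(explanation[:i], probs)
                        for i in range(1, len(explanation) + 1)]
print(prefix_probabilities)
```

The resulting list is non-increasing, so every prefix gives an upper bound on the probability of the complete explanation.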

Example 4.7 In the graph example, the probability of the second explanation will be at most the probability of its first edge from c to e, i.e., Pᵀ(ce) = 0.8 ≥ 0.4.

As disjoining sets of explanations, i.e., including information on additional facts, can only decrease the contribution of single explanations, this upper bound carries over to a set of explanations or partial explanations, as long as prefixes for all possible explanations are included. Such sets can be obtained from an incomplete SLD-tree, i.e., an SLD-tree where branches are only extended up to a certain point.

These observations motivate ProbLog’s bounded approximation algorithm. The algorithm relies on a probability threshold γ to stop growing the SLD-tree and thus obtain DNF formulae for the two bounds.² The lower bound formula D₁ represents all explanations with a probability above the current threshold. The upper bound formula D₂ additionally includes all derivations that have been stopped due to reaching the threshold, as these still may succeed. Our goal is therefore to refine D₁ and D₂ in order to decrease Pᵀ(D₂) − Pᵀ(D₁).

² Using a probability threshold instead of the depth bound of [De Raedt et al., 2007b] has been

Bounded approximation as outlined in Algorithm 4.6 proceeds in an iterative-deepening manner similar to Algorithm 4.4, but collecting explanations in the two DNF formulae D₁ and D₂ instead of remembering the most likely explanation only. Initially, both D₁ and D₂ are set to False, the neutral element with respect to disjunction, and the probability bounds are 0 and 1, as we have no full explanations yet, and the empty partial explanation holds in any model. After each iteration, BDDs for both formulae are constructed to calculate their probabilities using Algorithm 4.5, and iterative deepening stops once their difference falls below the stopping threshold δ.

It should be clear that Pᵀ(D₁) monotonically increases, as the number of explanations never decreases. On the other hand, as explained above, if D₂ changes from one iteration to the next, this is always because a partial explanation E is either removed from D₂ and therefore no longer contributes to the probability, or it is replaced by explanations E₁, …, Eₙ that extend E by additional literals, that is, Eᵢ = E ∧ Sᵢ for conjunctions Sᵢ, hence Pᵀ(E₁ ∨ … ∨ Eₙ) = Pᵀ((E ∧ S₁) ∨ … ∨ (E ∧ Sₙ)) = Pᵀ(E ∧ (S₁ ∨ … ∨ Sₙ)). As explanations are partial interpretations of the probabilistic facts in the ProbLog program, each literal’s random variable appears at most once in the conjunction representing an explanation, even if the corresponding subgoal is called multiple times during construction. We therefore know that the literals in the prefix E cannot be in any suffix Sᵢ, hence, given ProbLog’s independence assumption, Pᵀ(E ∧ (S₁ ∨ … ∨ Sₙ)) = Pᵀ(E) · Pᵀ(S₁ ∨ … ∨ Sₙ) ≤ Pᵀ(E). Therefore, Pᵀ(D₂) monotonically decreases.

Algorithm 4.6 Bounded approximation using iterative deepening with probability thresholds.

function Bounds(query q, interval width δ, initial threshold γ, constant β ∈ (0, 1))
    D₁ := False; P₁ := 0; P₂ := 1
    repeat
        D₂ := False
        repeat
            (result, E, p) := ResolveThreshold(q, γ)
            if result = success then
                D₁ := D₁ ∨ E; D₂ := D₂ ∨ E
            if result = stop then
                D₂ := D₂ ∨ E
            backtrack to the remaining choice points of ResolveThreshold
        until ResolveThreshold has no choice points remaining
        Construct BDDs B₁ and B₂ corresponding to D₁ and D₂
        P₁ := Probability(root(B₁))
        P₂ := Probability(root(B₂))
        γ := γ · β
    until P₂ − P₁ ≤ δ
    return [P₁, P₂]
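The control flow of Algorithm 4.6 can be sketched in Python for the path query of the graph example. This is an illustration under several assumptions, not the ProbLog implementation: probabilities are computed by brute-force enumeration instead of BDDs, derivations are restricted to simple paths in a hand-coded graph, both formulae are rebuilt in every round instead of accumulating the lower bound formula, and all function names as well as the probability 0.5 for the edge from e to d are chosen or inferred for this example.

```python
from itertools import product

def dnf_probability(dnf, probs):
    """Exact probability of a DNF of positive facts by enumerating worlds.

    Stand-in for the BDD computation; exponential in the number of facts.
    """
    facts = sorted({f for conj in dnf for f in conj})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] for f in conj) for conj in dnf):
            weight = 1.0
            for f in facts:
                weight *= probs[f] if world[f] else 1.0 - probs[f]
            total += weight
    return total

def resolve_threshold(edges, node, target, gamma,
                      conj=frozenset(), prob=1.0, visited=frozenset()):
    """Enumerate derivations of path(node, target) along simple paths.

    Yields ("success", E) for complete explanations and ("stop", E) for
    partial ones whose probability dropped below gamma.
    """
    visited = visited | {node}
    for (src, dst), p in edges.items():
        if src != node or dst in visited:
            continue
        new_conj, new_prob = conj | {(src, dst)}, prob * p
        if new_prob < gamma:
            yield "stop", new_conj
        elif dst == target:
            yield "success", new_conj
        else:
            yield from resolve_threshold(edges, dst, target, gamma,
                                         new_conj, new_prob, visited)

def bounds(edges, source, target, delta, gamma, beta):
    """Iterative deepening on gamma; returns the final bounds and a trace."""
    trace = []
    while True:
        d1, d2 = [], []  # rebuilt from scratch in each round, for simplicity
        for result, conj in resolve_threshold(edges, source, target, gamma):
            if result == "success":
                d1.append(conj)
            d2.append(conj)  # successes and stopped derivations
        p1, p2 = dnf_probability(d1, edges), dnf_probability(d2, edges)
        trace.append((gamma, p1, p2))
        if p2 - p1 <= delta:
            return (p1, p2), trace
        gamma *= beta

# Graph fragment relevant to the query; 0.5 is an inferred value (assumption).
edges = {("c", "d"): 0.9, ("c", "e"): 0.8, ("e", "d"): 0.5}
(p1, p2), trace = bounds(edges, "c", "d", delta=0.01, gamma=0.9, beta=0.5)
for gamma, low, high in trace:
    print(f"gamma={gamma:.3f}  lower={low:.2f}  upper={high:.2f}")
```

With these inputs the sketch needs three rounds. The lower bound never decreases and the upper bound never increases from one round to the next, in line with the monotonicity argument above.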

Example 4.8 Consider a probability threshold γ = 0.9 for the SLD-tree in Figure 4.1. In this case, D₁ encodes the left success path while D₂ additionally encodes the path up to path(e, d), i.e., D₁ = cd and D₂ = cd ∨ ce, whereas the formula for the full SLD-tree is D = cd ∨ (ce ∧ ed). The lower bound thus is 0.9, the upper bound (obtained by disjoining D₂, i.e., rewriting it as the mutually exclusive disjunction cd ∨ (ce ∧ ¬cd)) is 0.98, whereas the true probability is 0.94.
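The three values in Example 4.8 can be verified by hand once the disjuncts are made mutually exclusive. The derivation below is a worked check that uses only the probabilities quoted in the examples (0.9 for cd, 0.8 for ce, 0.4 for ce ∧ ed) and the independence of the probabilistic facts:

```latex
\begin{align*}
P^T(D_1) &= P^T(cd) = 0.9 \\
P^T(D_2) &= P^T(cd) + P^T(ce \wedge \neg cd) = 0.9 + 0.8 \cdot 0.1 = 0.98 \\
P^T(D)   &= P^T(cd) + P^T(ce \wedge ed \wedge \neg cd) = 0.9 + 0.4 \cdot 0.1 = 0.94
\end{align*}
```

The true probability 0.94 therefore lies between the lower bound 0.9 and the upper bound 0.98, and the interval width at this threshold is 0.08.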

K-Best

Using a fixed number of explanations to approximate the probability allows better control of the overall complexity, which is crucial if large numbers of queries have to be evaluated, e.g., in the context of parameter learning as discussed in Chapter 7. We therefore introduce the k-probability Pₖᵀ(q), which approximates the success probability by using the k-best (that is, the k most likely) explanations instead of all explanations when building the DNF formula used in Equation (3.25):

\[
P_k^T(q) = P\Bigl(\bigvee_{E \in \mathit{Expl}_k(q)} \Bigl(\bigwedge_{f_i \in E^1} b_i \wedge \bigwedge_{f_i \in E^0} \neg b_i\Bigr)\Bigr)
\]
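The definition of the k-probability can be made concrete with a small sketch. It assumes that the full list of explanations is already available and simply sorts it, which says nothing about how the k best explanations are found during search; the DNF is again evaluated by brute-force enumeration rather than with BDDs. The function names, the encoding of an explanation as a dictionary from facts to truth values, and the probability 0.5 for the fact ed are assumptions made for this illustration.

```python
from itertools import product
from math import prod  # available from Python 3.8

def explanation_probability(expl, probs):
    """Probability of one explanation, given as {fact: truth value}."""
    return prod(probs[f] if value else 1.0 - probs[f]
                for f, value in expl.items())

def dnf_probability(explanations, probs):
    """Exact probability of a disjunction of explanations (brute force)."""
    facts = sorted({f for expl in explanations for f in expl})
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] == v for f, v in expl.items())
               for expl in explanations):
            total += prod(probs[f] if world[f] else 1.0 - probs[f]
                          for f in facts)
    return total

def k_probability(explanations, probs, k):
    """Probability of the DNF built from the k most likely explanations."""
    best = sorted(explanations,
                  key=lambda e: explanation_probability(e, probs),
                  reverse=True)[:k]
    return dnf_probability(best, probs)

# Explanations of the graph example; 0.5 is an inferred value (assumption).
probs = {"cd": 0.9, "ce": 0.8, "ed": 0.5}
explanations = [{"ce": True, "ed": True}, {"cd": True}]
for k in (1, 2):
    print(k, k_probability(explanations, probs, k))
```

In this example k = 1 keeps only the explanation cd, and k = 2 already covers all explanations, so the k-probability coincides with the success probability.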

In document A Probabilistic Prolog and its Applications (Page 84-87)