4.2 Approximative Inference
4.2.1 Using Less Explanations
We first discuss approximation techniques that reduce the size of the DNF by considering a subset of all possible explanations. We here exploit the fact that the DNF formula describing sets of explanations is monotone, meaning that adding more explanations will never decrease the probability of the formula being true. Thus, formulae describing subsets of the full set of explanations of a query will always give a lower bound on the query’s success probability.
Example 4.6 In Example 3.1, the lower bound obtained from the shorter explanation would be PT(cd) = 0.9, while that from the longer one would be PT(ce ∧ ed) = 0.4.
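The monotonicity argument can be checked numerically on the graph example. The following Python sketch is not part of ProbLog: it evaluates a DNF by brute-force enumeration of possible worlds, which is feasible for toy programs only. The name dnf_probability and the dictionary encoding of explanations are illustrative assumptions, as is the probability 0.5 for the edge ed (it is not quoted in this section, but it reproduces the values 0.4 and 0.94 used in Examples 4.7 and 4.8).

```python
from itertools import product

# Fact probabilities for the graph example. cd and ce are quoted in the text;
# ed = 0.5 is an assumption chosen so that P(ce AND ed) = 0.4.
PROBS = {"cd": 0.9, "ce": 0.8, "ed": 0.5}


def dnf_probability(dnf, probs):
    """Probability that a DNF over independent facts is true.

    `dnf` is a list of explanations; each explanation maps a fact name to the
    truth value it requires. All worlds are enumerated, so toy sizes only.
    """
    facts = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        # A world counts if at least one explanation holds in it.
        if any(all(world[f] == v for f, v in expl.items()) for expl in dnf):
            weight = 1.0
            for f in facts:
                weight *= probs[f] if world[f] else 1.0 - probs[f]
            total += weight
    return total


short = {"cd": True}              # shorter explanation
long_ = {"ce": True, "ed": True}  # longer explanation

lower_short = dnf_probability([short], PROBS)
lower_long = dnf_probability([long_], PROBS)
full = dnf_probability([short, long_], PROBS)

# Each subset of the explanations yields a lower bound on the full DNF.
print(lower_short, lower_long, full)
```

Under these assumptions, both single-explanation values stay below the value of the full formula, as the monotonicity of the DNF predicts.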
Bounded Approximation
The first approximation algorithm, a slight variant of the one proposed in [De Raedt et al., 2007b], uses DNF formulae to obtain both an upper and a lower bound on the probability of a query. It is closely related to work by Poole [1993a] in the context of PHA, but adapted towards ProbLog.
We observe that the probability of an explanation l1 ∧ ... ∧ ln, where the li are positive or negative literals involving random variables bi, will always be at most the probability of an arbitrary prefix l1 ∧ ... ∧ li, i ≤ n.
Example 4.7 In the graph example, the probability of the second explanation will be at most the probability of its first edge from c to e, i.e., PT(ce) = 0.8 ≥ 0.4.
As disjoining sets of explanations, i.e., including information on additional facts, can only decrease the contribution of single explanations, this upper bound carries over to a set of explanations or partial explanations, as long as prefixes for all possible explanations are included. Such sets can be obtained from an incomplete SLD-tree, i.e., an SLD-tree where branches are only extended up to a certain point. These observations motivate ProbLog’s bounded approximation algorithm. The algorithm relies on a probability threshold γ to stop growing the SLD-tree and thus obtain DNF formulae for the two bounds². The lower bound formula D1 represents all explanations with a probability above the current threshold. The upper bound formula D2 additionally includes all derivations that have been stopped due to reaching the threshold, as these still may succeed. Our goal is therefore to refine D1 and D2 in order to decrease PT(D2) − PT(D1).
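The prefix observation underlying the upper bound can be illustrated with a few lines of Python. This is a sketch under the independence assumption, not ProbLog code; the function name prefix_probabilities, the list-of-pairs encoding of literals, and the probability 0.5 for the edge ed are assumptions made for illustration.

```python
from itertools import accumulate
from operator import mul

# ce is quoted in the text; ed = 0.5 is an assumption consistent with the
# value 0.4 given for the second explanation.
PROBS = {"ce": 0.8, "ed": 0.5}


def prefix_probabilities(literals, probs):
    """Probabilities of all non-empty prefixes of a conjunction.

    `literals` is a list of (fact, positive) pairs in the order in which the
    literals were added. Facts are assumed independent and to occur at most
    once, so a prefix probability is a running product of literal weights.
    """
    weights = [probs[f] if positive else 1.0 - probs[f]
               for f, positive in literals]
    return list(accumulate(weights, mul))


prefixes = prefix_probabilities([("ce", True), ("ed", True)], PROBS)

# Every factor is at most 1, so the sequence never increases: each prefix
# bounds the probability of every longer explanation that extends it.
print(prefixes)
```

Because each factor is at most 1, cutting a derivation early can only overestimate the probability of the explanations it might still produce.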
Bounded approximation as outlined in Algorithm 4.6 proceeds in an iterative-deepening manner similar to Algorithm 4.4, but collecting explanations in the two DNF formulae D1 and D2 instead of remembering the most likely explanation only. Initially, both D1 and D2 are set to False, the neutral element with respect to disjunction, and the probability bounds are 0 and 1, as we have no full explanations yet, and the empty partial explanation holds in any model. After each iteration, BDDs for both formulae are constructed to calculate their probabilities using Algorithm 4.5, and iterative deepening stops once their difference falls below the stopping threshold δ. It should be clear that PT(D1) monotonically increases, as the number of explanations never decreases. On the other hand, as explained above, if D2 changes from one iteration to the next, this is always because a partial explanation E is either removed from D2 and therefore no longer contributes to the probability, or it is replaced by explanations E1, ..., En that extend E by additional literals, that is, Ei = E ∧ Si for conjunctions Si, hence PT(E1 ∨ ... ∨ En) = PT((E ∧ S1) ∨ ... ∨ (E ∧ Sn)) = PT(E ∧ (S1 ∨ ... ∨ Sn)). As explanations are partial interpretations of the probabilistic facts in the ProbLog program, each literal’s random variable appears at most once in the conjunction representing an explanation, even if the corresponding subgoal is called multiple times during construction. We therefore know that the literals in the prefix E cannot be in any suffix Si, hence, given ProbLog’s independence assumption, PT(E ∧ (S1 ∨ ... ∨ Sn)) = PT(E) PT(S1 ∨ ... ∨ Sn) ≤ PT(E). Therefore, PT(D2) monotonically decreases.
² Using a probability threshold instead of the depth bound of [De Raedt et al., 2007b] has been
Algorithm 4.6 Bounded approximation using iterative deepening with probability thresholds.
function Bounds(query q, interval width δ, initial threshold γ, constant β ∈ (0, 1))
    D1 := False; P1 := 0; P2 := 1
    repeat
        D2 := False
        repeat
            (result, E, p) := ResolveThreshold(q, γ)
            if result = success then
                D1 := D1 ∨ E; D2 := D2 ∨ E
            if result = stop then
                D2 := D2 ∨ E
            backtrack to the remaining choice points of ResolveThreshold
        until ResolveThreshold has no choice points remaining
        Construct BDDs B1 and B2 corresponding to D1 and D2
        P1 := Probability(root(B1))
        P2 := Probability(root(B2))
        γ := γ · β
    until P2 − P1 ≤ δ
    return [P1, P2]
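To make the control flow of the algorithm concrete, we give a small Python sketch for path queries in a probabilistic graph. It is a simplification under several assumptions and not ProbLog's implementation: SLD resolution is replaced by a depth-first search over acyclic paths, BDDs are replaced by brute-force enumeration of possible worlds, and explanations contain positive facts only. The names resolve_threshold, bounds and dnf_probability, as well as the probability 0.5 for the edge from e to d, are chosen for illustration.

```python
from itertools import product


def dnf_probability(dnf, probs):
    """Probability of a DNF of positive explanations (sets of facts).

    Stands in for BDD-based evaluation; enumerates all worlds, toy sizes only.
    """
    facts = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = {f for f, v in zip(facts, values) if v}
        if any(expl <= world for expl in dnf):
            weight = 1.0
            for f, v in zip(facts, values):
                weight *= probs[f] if v else 1.0 - probs[f]
            total += weight
    return total


def resolve_threshold(edges, source, target, gamma):
    """Yield (result, explanation) pairs for the query path(source, target).

    A branch is cut ("stop") as soon as the probability of its partial
    explanation drops below gamma; otherwise it is extended until it succeeds.
    """
    def walk(node, used, prob, visited):
        if prob < gamma:
            yield "stop", frozenset(used)
            return
        if node == target:
            yield "success", frozenset(used)
            return
        for (a, b), p in edges.items():
            if a == node and b not in visited:
                yield from walk(b, used | {(a, b)}, prob * p, visited | {b})

    yield from walk(source, frozenset(), 1.0, {source})


def bounds(edges, source, target, delta, gamma, beta):
    """Iteratively lower gamma until the bounds differ by at most delta."""
    d1 = set()  # complete explanations (lower bound formula)
    while True:
        d2 = set()  # complete and stopped explanations (upper bound formula)
        for result, expl in resolve_threshold(edges, source, target, gamma):
            if result == "success":
                d1.add(expl)
            d2.add(expl)
        p1 = dnf_probability(d1, edges)
        p2 = dnf_probability(d2, edges)
        if p2 - p1 <= delta:
            return p1, p2
        gamma *= beta


# Graph of the running example; the probability of (e, d) is an assumption.
EDGES = {("c", "d"): 0.9, ("c", "e"): 0.8, ("e", "d"): 0.5}

coarse = bounds(EDGES, "c", "d", delta=0.1, gamma=0.9, beta=0.5)
tight = bounds(EDGES, "c", "d", delta=0.01, gamma=0.9, beta=0.5)
print(coarse, tight)
```

With the coarse interval width the sketch stops after the first iteration, returning bounds of roughly 0.9 and 0.98; with the tighter width it lowers the threshold twice, at which point both bounds coincide at roughly 0.94.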
Example 4.8 Consider a probability threshold γ = 0.9 for the SLD-tree in Figure 4.1. In this case, D1 encodes the left success path while D2 additionally encodes the path up to path(e, d), i.e., D1 = cd and D2 = cd ∨ ce, whereas the formula for the full SLD-tree is D = cd ∨ (ce ∧ ed). The lower bound thus is 0.9, the upper bound (obtained by disjoining D2 to cd ∨ (ce ∧ ¬cd)) is 0.98, whereas the true probability is 0.94.
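The arithmetic behind these three values can be spelled out as follows, rewriting each formula as a disjunction of mutually exclusive conjunctions and using PT(ce ∧ ed) = 0.4 from Example 4.7 together with the independence of the facts:

```latex
\begin{align*}
P_T(D_1) &= P_T(cd) = 0.9,\\
P_T(D_2) &= P_T(cd) + P_T(ce \wedge \neg cd) = 0.9 + 0.8 \cdot 0.1 = 0.98,\\
P_T(D)   &= P_T(cd) + P_T(ce \wedge ed \wedge \neg cd) = 0.9 + 0.4 \cdot 0.1 = 0.94.
\end{align*}
```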
K-Best
Using a fixed number of explanations to approximate the probability allows better control of the overall complexity, which is crucial if large numbers of queries have to be evaluated, e.g., in the context of parameter learning as discussed in Chapter 7. We therefore introduce the k-probability P_k^T(q), which approximates the success probability by using the k-best (that is, the k most likely) explanations instead of all explanations when building the DNF formula used in Equation (3.25):
\[ P_k^T(q) = P\Bigl( \bigvee_{E \in \mathit{Expl}_k(q)} \Bigl( \bigwedge_{f_i \in E^1} b_i \wedge \bigwedge_{f_i \in E^0} \neg b_i \Bigr) \Bigr) \]
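The definition can be illustrated by the following Python sketch. It assumes that all explanations are already available and simply sorts them by probability, whereas an actual system would search for the k best explanations directly; the DNF is again evaluated by brute-force enumeration rather than with BDDs. The function names and the probability 0.5 for the fact ed are assumptions made for illustration.

```python
from itertools import product
from math import prod

# cd and ce are quoted in the text; ed = 0.5 is an assumed value.
PROBS = {"cd": 0.9, "ce": 0.8, "ed": 0.5}


def explanation_probability(expl, probs):
    """Product of literal weights; `expl` maps facts to required truth values."""
    return prod(probs[f] if v else 1.0 - probs[f] for f, v in expl.items())


def dnf_probability(dnf, probs):
    """Probability of a DNF by enumerating all worlds (toy sizes only)."""
    facts = sorted(probs)
    total = 0.0
    for values in product([True, False], repeat=len(facts)):
        world = dict(zip(facts, values))
        if any(all(world[f] == v for f, v in expl.items()) for expl in dnf):
            total += prod(probs[f] if world[f] else 1.0 - probs[f]
                          for f in facts)
    return total


def k_probability(explanations, probs, k):
    """Probability of the disjunction of the k most likely explanations."""
    best = sorted(explanations,
                  key=lambda e: explanation_probability(e, probs),
                  reverse=True)[:k]
    return dnf_probability(best, probs)


explanations = [{"ce": True, "ed": True}, {"cd": True}]
p1 = k_probability(explanations, PROBS, 1)  # uses only the explanation cd
p2 = k_probability(explanations, PROBS, 2)  # uses both explanations
print(p1, p2)
```

Since the k-best explanations form a subset of all explanations, the k-probability is a lower bound on the success probability, and it cannot decrease as k grows.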