6.4 Probabilistic Domination Count
6.4.5 Generating Functions vs Uncertain Generating Functions
Instead of applying the uncertain generating function to approximate the domination count of $B$, two regular generating functions can be used: one that uses the progressive (lower) probability bounds $P_{LB}(A_i \prec_Q B)$ and one that uses the conservative (upper) probability bounds $P_{UB}(A_i \prec_Q B)$. In the following we give an intuition and a formal proof that using regular generating functions yields looser bounds for the approximated domination count.
Let $DB = \{A_1, \ldots, A_N\}$ be an uncertain object database and let $B$ and $Q$ be uncertain objects in $\mathbb{R}^d$. Let $(A_i \prec_Q B)$, $1 \le i \le N$, denote the random Bernoulli event that $A_i$ dominates $B$ w.r.t. $Q$. Let $P_{LB}(A_i \prec_Q B)$ ($P_{UB}(A_i \prec_Q B)$) be a lower (upper) bound of the probability of the event $(A_i \prec_Q B) = 1$.
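A lower bound of the probability that $\mathrm{DomCount}(B, Q) = k$ can be derived using the following generating function:
\[ F^N = \prod_{i \in \{1, \ldots, N\}} \left[ P_{LB}(A_i \prec_Q B) \cdot x + \left(1 - P_{UB}(A_i \prec_Q B)\right) \right] = \sum_{i \ge 0} c_i x^i. \qquad (6.4) \]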
Intuitively, this generating function uses the progressive approximation $P_{LB}(A_i \prec_Q B)$ of the probability that $A_i$ dominates $B$ w.r.t. $Q$ and the progressive approximation $1 - P_{UB}(A_i \prec_Q B)$ of the probability that $A_i$ does not dominate $B$ w.r.t. $Q$. This generating function is equal to the uncertain generating function (cf. Equation 6.3) if the uncertain percentage (i.e. the coefficient of $y$) is omitted for each candidate $A_i$, i.e. if the coefficient of each $y$ is set to 0.
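The coefficients of Equation 6.4 can be obtained by multiplying the linear factors one candidate at a time. The following is a minimal sketch of that expansion, not the author's implementation; the function names `expand_product` and `lower_bound_coeffs` and the representation of the bounds as `(p_lb, p_ub)` pairs are assumptions made for illustration.

```python
def expand_product(factors):
    """Expand prod_i (a_i * x + b_i) and return the coefficient list [c_0, c_1, ...].

    `factors` is a list of (a_i, b_i) pairs (hypothetical representation).
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for a, b in factors:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * b      # candidate does not dominate: exponent stays k
            nxt[k + 1] += c * a  # candidate dominates: exponent becomes k + 1
        coeffs = nxt
    return coeffs


def lower_bound_coeffs(bounds):
    """Coefficients c_k of Equation 6.4 for a list of (p_lb, p_ub) pairs."""
    return expand_product([(p_lb, 1.0 - p_ub) for p_lb, p_ub in bounds])


# Two candidates with made-up bounds (illustrative values only).
bounds = [(0.2, 0.5), (0.6, 0.7)]
coeffs = lower_bound_coeffs(bounds)
print(coeffs)
```

For these made-up bounds the product is $(0.2x + 0.5)(0.6x + 0.3)$, so the coefficients are approximately 0.15, 0.36 and 0.12 for $k = 0, 1, 2$. They sum to less than 1 because the uncertain fraction of each candidate is left out.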
Lemma 21. Let $c_k$ be the coefficients obtained by the generating function in Equation 6.4 and let $P_{LB}(\mathrm{DomCount}(B, Q) = k)$ be the lower bound derived by applying the generating function in Equation 6.3 and exploiting Lemma 20. It holds that
\[ c_k = P_{LB}(\mathrm{DomCount}(B, Q) = k), \]
i.e. the lower bound obtained by the (non-uncertain) generating function in Equation 6.4 is as good as the lower bound obtained using the technique in Section 6.4.4.
Proof. The lower bound derived using the uncertain generating function for the probability that $\mathrm{DomCount}(B, Q) = k$ is equal to the coefficient $c_{k,0}$, which belongs to the term $x^k y^0 = x^k$. Equation 6.4 and Equation 6.3 are identical except for the terms containing at least one $y$. Since the term of $c_{k,0}$ in Equation 6.3 contains no $y$, $c_{k,0}$ is identical to the coefficient $c_k$ in Equation 6.4.
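Lemma 21 can be checked numerically on a small instance. The sketch below is illustrative only. Equation 6.3 is not reproduced in this section, so the factor form $P_{LB} \cdot x + (P_{UB} - P_{LB}) \cdot y + (1 - P_{UB})$ used for the uncertain generating function is an assumption inferred from the worked example later in this section. The function names and the bound values are hypothetical.

```python
def ugf_coeffs(bounds):
    """Expand prod_i (lb_i*x + (ub_i - lb_i)*y + (1 - ub_i)).

    Returns a dict mapping (i, j) to the coefficient c_{i,j} of x^i y^j.
    The factor form is an assumed reading of Equation 6.3.
    """
    coeffs = {(0, 0): 1.0}
    for lb, ub in bounds:
        nxt = {}
        for (i, j), c in coeffs.items():
            # each candidate either dominates (x), is undecided (y), or does not dominate
            for di, dj, w in ((1, 0, lb), (0, 1, ub - lb), (0, 0, 1.0 - ub)):
                key = (i + di, j + dj)
                nxt[key] = nxt.get(key, 0.0) + c * w
        coeffs = nxt
    return coeffs


def gf_lower_coeffs(bounds):
    """Coefficients c_k of Equation 6.4: prod_i (lb_i*x + (1 - ub_i))."""
    coeffs = [1.0]
    for lb, ub in bounds:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - ub)
            nxt[k + 1] += c * lb
        coeffs = nxt
    return coeffs


# Three candidates with made-up (lower, upper) bounds.
bounds = [(0.2, 0.5), (0.6, 0.7), (0.1, 0.9)]
ugf = ugf_coeffs(bounds)
gf = gf_lower_coeffs(bounds)

# Lemma 21: c_k of Equation 6.4 equals c_{k,0} of the uncertain generating function.
matches = all(abs(c_k - ugf[(k, 0)]) < 1e-12 for k, c_k in enumerate(gf))
print(matches)
```

The regular expansion keeps $N + 1$ coefficients, whereas the dictionary for the uncertain generating function holds one entry for every pair $(i, j)$ with $i + j \le N$.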
Thus, a (non-uncertain) generating function can be used to derive the same lower bound. The advantage is that the (non-uncertain) generating function is easier to compute, because it has fewer coefficients (fewer by a factor linear in $k$). However, deriving an upper bound of the probability that $\mathrm{DomCount}(B, Q) = k$ is not as easy, because the upper bound does depend on the uncertainty of the objects. An upper bound can be derived using the following generating function:
\[ F^N = \prod_{i \in \{1, \ldots, N\}} \left[ P_{UB}(A_i \prec_Q B) \cdot x + \left(1 - P_{LB}(A_i \prec_Q B)\right) \right] = \sum_{i \ge 0} c_i x^i. \qquad (6.5) \]
The idea is to use a conservative approximation $P_{UB}(A_i \prec_Q B)$ for the probability that $A_i$ dominates $B$ and a conservative approximation $1 - P_{LB}(A_i \prec_Q B)$ for the probability that $A_i$ does not dominate $B$. The difference of this generating function compared to the uncertain generating function (cf. Equation 6.3) is that the uncertain percentage (the coefficient of $y$) is added to both probabilities $P(A_i \prec_Q B)$ and $P(\neg(A_i \prec_Q B))$.
Lemma 22. Let $c_k$ be the coefficients obtained by the generating function in Equation 6.5 and let $P_{UB}(\mathrm{DomCount}(B, Q) = k)$ be the upper bound derived by applying the generating function in Equation 6.3 and exploiting Lemma 20. It holds that
\[ c_k \ge P_{UB}(\mathrm{DomCount}(B, Q) = k), \]
i.e. the upper bound obtained by the (non-uncertain) generating function in Equation 6.5 is in general not as good as the upper bound obtained using the technique in Section 6.4.4.
Proof. We give an example where the upper bound derived by Equation 6.5 is worse than the upper bound derived by Equation 6.3 and exploiting Lemma 20. Assume a database containing uncertain objects $A_1$, $A_2$, $B$ and $Q$. The task is to determine the probability that $\mathrm{DomCount}(B, Q) = 1$. Also assume that we have determined (e.g. as proposed in Section 6.3.2) a lower bound $P_{LB}(A_1 \prec_Q B)$ ($P_{LB}(A_2 \prec_Q B)$) and an upper bound $P_{UB}(A_1 \prec_Q B)$ ($P_{UB}(A_2 \prec_Q B)$) of the probability that $A_1$ ($A_2$) dominates $B$ w.r.t. $Q$.
To ease the notation, let $A_i^+$ denote $P_{LB}(A_i \prec_Q B)$, let $A_i^-$ denote $1 - P_{UB}(A_i \prec_Q B)$ and let $A_i^?$ denote the uncertain fraction $P_{UB}(A_i \prec_Q B) - P_{LB}(A_i \prec_Q B)$.
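Using this notation, and writing $A$ for the candidate $A_1$ and $B$ for the candidate $A_2$ (so that $B^+$, $B^?$ and $B^-$ refer to $A_2$, not to the object $B$), Equation 6.3 becomes:
\[ (A^+ x + A^? y + A^-) \cdot (B^+ x + B^? y + B^-) \]
Expansion yields:
\[ A^+B^+x^2 + A^+B^?xy + A^+B^-x + A^?B^+xy + A^?B^?y^2 + A^?B^-y + A^-B^+x + A^-B^?y + A^-B^- \]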
Exploiting Lemma 20 yields the following upper bound probability for $\mathrm{DomCount}(B, Q) = 1$:
\[ P_{UB}(\mathrm{DomCount}(B, Q) = 1) = A^+B^? + A^+B^- + A^?B^+ + A^?B^? + A^?B^- + A^-B^+ + A^-B^? \]
On the other hand, Equation 6.5 becomes:
\[ ((A^+ + A^?)x + A^- + A^?) \cdot ((B^+ + B^?)x + B^- + B^?) \]
Expansion yields:
\[ (A^+ + A^?)(B^+ + B^?)x^2 + (A^+ + A^?)(B^- + B^?)x + (A^- + A^?)(B^+ + B^?)x + (A^- + A^?)(B^- + B^?) \]
Extracting the coefficient $c_1$ of $x^1$ yields:
\[ c_1 = (A^+ + A^?)(B^- + B^?) + (A^- + A^?)(B^+ + B^?) \]
Expansion yields:
\[ c_1 = A^+B^- + A^+B^? + A^?B^- + A^?B^? + A^-B^+ + A^-B^? + A^?B^+ + A^?B^? \]
Comparing $P_{UB}(\mathrm{DomCount}(B, Q) = 1)$ and $c_1$, we obtain:
\[ c_1 - P_{UB}(\mathrm{DomCount}(B, Q) = 1) = A^?B^? \]
Thus, the upper bound $c_1$ of the (non-uncertain) generating function is greater (and thus worse) than the upper bound $P_{UB}(\mathrm{DomCount}(B, Q) = 1)$ of the uncertain generating function by a total of $A^?B^?$.
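The gap derived in this example can be reproduced numerically. The sketch below is illustrative only and rests on two assumptions: the uncertain generating function has the factor form $P_{LB} \cdot x + (P_{UB} - P_{LB}) \cdot y + (1 - P_{UB})$, and the upper bound of Lemma 20 for $\mathrm{DomCount}(B, Q) = k$ is the sum of all coefficients $c_{i,j}$ with $i \le k \le i + j$. Both readings are inferred from the expansion in the example, since Equation 6.3 and Lemma 20 are not reproduced in this section. Function names and bound values are hypothetical.

```python
def ugf_coeffs(bounds):
    """Expand prod_i (lb_i*x + (ub_i - lb_i)*y + (1 - ub_i)) into {(i, j): c_ij}."""
    coeffs = {(0, 0): 1.0}
    for lb, ub in bounds:
        nxt = {}
        for (i, j), c in coeffs.items():
            for di, dj, w in ((1, 0, lb), (0, 1, ub - lb), (0, 0, 1.0 - ub)):
                key = (i + di, j + dj)
                nxt[key] = nxt.get(key, 0.0) + c * w
        coeffs = nxt
    return coeffs


def ugf_upper_bound(bounds, k):
    """Assumed reading of Lemma 20: sum c_ij over all (i, j) with i <= k <= i + j."""
    return sum(c for (i, j), c in ugf_coeffs(bounds).items() if i <= k <= i + j)


def gf_upper_coeffs(bounds):
    """Coefficients c_k of Equation 6.5: prod_i (ub_i*x + (1 - lb_i))."""
    coeffs = [1.0]
    for lb, ub in bounds:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - lb)
            nxt[k + 1] += c * ub
        coeffs = nxt
    return coeffs


# Two candidates with made-up bounds: uncertain fractions 0.3 and 0.1.
bounds = [(0.2, 0.5), (0.6, 0.7)]
p_ub = ugf_upper_bound(bounds, 1)   # upper bound from the uncertain generating function
c1 = gf_upper_coeffs(bounds)[1]     # upper bound from Equation 6.5
gap = c1 - p_ub
uncertain_product = (0.5 - 0.2) * (0.7 - 0.6)
print(p_ub, c1, gap, uncertain_product)
```

With these values the uncertain generating function gives an upper bound of about 0.73, Equation 6.5 gives about 0.76, and the difference of about 0.03 equals the product of the two uncertain fractions, 0.3 and 0.1.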
Intuitively, the problem of the upper bound using the (non-uncertain) generating function is that the uncertain fraction (i.e. $A^?$) is added to both the probability that $A_i$ dominates $B$ (i.e. $A^+$) and to the probability that $A_i$ does not dominate $B$ (i.e. $A^-$).
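This double counting can be made explicit with a short calculation, given here as a supplementary sketch in the notation of the example rather than as part of the original argument. By definition, the lower bound, the uncertain fraction and the complement of the upper bound of each candidate sum to 1, so the two coefficients of each factor of Equation 6.5 sum to more than 1 whenever the uncertain fraction is positive:
\[ A_i^+ + A_i^? + A_i^- = 1 \quad \Longrightarrow \quad (A_i^+ + A_i^?) + (A_i^- + A_i^?) = 1 + A_i^?. \]
Setting $x = 1$ in Equation 6.5 therefore gives
\[ \sum_{k \ge 0} c_k = \prod_{i \in \{1, \ldots, N\}} (1 + A_i^?) \ge 1, \]
i.e. the coefficients of Equation 6.5 no longer sum to 1. For the two candidates of the example the product is $(1 + A^?)(1 + B^?) = 1 + A^? + B^? + A^?B^?$, which contains the term $A^?B^?$ by which $c_1$ exceeds the bound of the uncertain generating function.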