Generating Functions vs Uncertain Generating Functions

6.4 Probabilistic Domination Count

6.4.5 Generating Functions vs Uncertain Generating Functions

Instead of applying the uncertain generating function to approximate the domination count of $B$, two regular generating functions can be used: one that uses the progressive (lower) probability bounds $P_{LB}(A_i \prec_Q B)$ and one that uses the conservative (upper) probability bounds $P_{UB}(A_i \prec_Q B)$. In the following we give an intuition and a formal proof that using regular generating functions yields looser bounds for the approximated domination count.


Let $DB = \{A_1, \dots, A_N\}$ be an uncertain object database and let $B$ and $Q$ be uncertain objects in $\mathbb{R}^d$. Let $(A_i \prec_Q B)$, $1 \le i \le N$, denote the random Bernoulli event that $A_i$ dominates $B$ w.r.t. $Q$. Let $P_{LB}(A_i \prec_Q B)$ ($P_{UB}(A_i \prec_Q B)$) be a lower (upper) bound approximation of the probability of the event $(A_i \prec_Q B) = 1$.

A lower bound of the probability DomCount(B, Q) = k can be derived using the following generating function:

$$F_N = \prod_{i \in \{1,\dots,N\}} \left[ P_{LB}(A_i \prec_Q B) \cdot x + \left(1 - P_{UB}(A_i \prec_Q B)\right) \right] = \sum_{i \ge 0} c_i x^i. \qquad (6.4)$$

Intuitively, this generating function uses the progressive approximation $P_{LB}(A_i \prec_Q B)$ of the probability that $A_i$ dominates $B$ w.r.t. $Q$ and the progressive approximation $1 - P_{UB}(A_i \prec_Q B)$ of the probability that $A_i$ does not dominate $B$ w.r.t. $Q$. This generating function is equal to the uncertain generating function (cf. Equation 6.3) if the uncertain percentage (i.e. the coefficient of $y$) is omitted for each candidate $A_i$, i.e. if the coefficient of each $y$ is set to 0.
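
The expansion of Equation 6.4 can be illustrated with a short Python sketch. It multiplies the factors one candidate at a time and keeps only the coefficients $c_0, \dots, c_N$. The function name `lower_bound_coefficients` and the example bound values are assumptions made for illustration; they do not come from the text.

```python
from typing import List, Sequence, Tuple


def lower_bound_coefficients(bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Expand prod_i [p_lb_i * x + (1 - p_ub_i)] and return [c_0, ..., c_N].

    `bounds` holds one hypothetical (p_lb, p_ub) pair per candidate A_i.
    """
    coeffs = [1.0]  # the empty product is the constant polynomial 1
    for p_lb, p_ub in bounds:
        if not 0.0 <= p_lb <= p_ub <= 1.0:
            raise ValueError("expected 0 <= p_lb <= p_ub <= 1")
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p_ub)  # A_i certainly does not dominate B
            nxt[k + 1] += c * p_lb      # A_i certainly dominates B
        coeffs = nxt
    return coeffs


# Made-up bounds for two candidates, for illustration only.
example_bounds = [(0.2, 0.5), (0.6, 0.9)]
c = lower_bound_coefficients(example_bounds)
```

Under these assumed inputs, `c[k]` is the lower bound for $DomCount(B, Q) = k$. The coefficients sum to less than 1 because the uncertain fraction of each candidate is dropped.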

Lemma 21. Let $c_k$ be the coefficients obtained by the generating function in Equation 6.4 and let $P_{LB}(DomCount(B, Q) = k)$ be the lower bound derived by applying the generating function in Equation 6.3 and exploiting Lemma 20. It holds that

$$c_k = P_{LB}(DomCount(B, Q) = k),$$

i.e. the lower bound obtained by the (non-uncertain) generating function in Equation 6.4 is as good as the lower bound obtained using the technique in Section 6.4.4.

Proof. The lower bound derived using the uncertain generating function for the probability $DomCount(B, Q) = k$ is equal to the coefficient $c_{k,0}$. The coefficient $c_{k,0}$ corresponds to the monomial $x^k y^0 = x^k$. Equation 6.4 and Equation 6.3 are identical except for the terms containing at least one $y$. Since the monomial of $c_{k,0}$ in Equation 6.3 contains no $y$, $c_{k,0}$ is identical to the coefficient $c_k$ in Equation 6.4.
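
The statement of Lemma 21 can be checked numerically with the following sketch. It assumes that each factor of Equation 6.3 has the form $P_{LB} \cdot x + (P_{UB} - P_{LB}) \cdot y + (1 - P_{UB})$, which is the form used in the two-object example of this section. The helper names and the bound values are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]  # hypothetical (p_lb, p_ub) per candidate


def uncertain_gf(bounds: Bounds) -> Dict[Tuple[int, int], float]:
    """Expand prod_i [p_lb*x + (p_ub - p_lb)*y + (1 - p_ub)] into {(i, j): c_ij}."""
    coeffs: Dict[Tuple[int, int], float] = {(0, 0): 1.0}
    for p_lb, p_ub in bounds:
        nxt: Dict[Tuple[int, int], float] = defaultdict(float)
        for (i, j), c in coeffs.items():
            nxt[(i + 1, j)] += c * p_lb           # raises the power of x
            nxt[(i, j + 1)] += c * (p_ub - p_lb)  # raises the power of y
            nxt[(i, j)] += c * (1.0 - p_ub)       # constant term
        coeffs = dict(nxt)
    return coeffs


def regular_gf_lower(bounds: Bounds) -> List[float]:
    """Expand prod_i [p_lb*x + (1 - p_ub)] (Equation 6.4) into [c_0, ..., c_N]."""
    coeffs = [1.0]
    for p_lb, p_ub in bounds:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * (1.0 - p_ub)
            nxt[k + 1] += c * p_lb
        coeffs = nxt
    return coeffs


bounds = [(0.2, 0.5), (0.6, 0.9), (0.1, 0.4)]  # made-up values
c_uncertain = uncertain_gf(bounds)
c_regular = regular_gf_lower(bounds)
# Lemma 21: the y-free coefficients c_{k,0} coincide with c_k.
matches = [
    math.isclose(c_uncertain[(k, 0)], c_regular[k]) for k in range(len(c_regular))
]
```

The bivariate expansion stores a coefficient for every pair of exponents, whereas the regular expansion stores only $N + 1$ values, which is the computational saving discussed in the text.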

Thus, a (non-uncertain) generating function suffices to derive the same lower bound. Its advantage is that it is easier to compute, because it has fewer coefficients (by a factor linear in $k$). However, deriving an upper bound of the probability $DomCount(B, Q) = k$ is not as easy, because the upper bound does depend on the uncertainty of the objects. An upper bound can be derived using the following generating function:

$$F_N = \prod_{i \in \{1,\dots,N\}} \left[ P_{UB}(A_i \prec_Q B) \cdot x + \left(1 - P_{LB}(A_i \prec_Q B)\right) \right] = \sum_{i \ge 0} c_i x^i. \qquad (6.5)$$

The idea is to use a conservative approximation $P_{UB}(A_i \prec_Q B)$ for the probability that $A_i$ dominates $B$ and a conservative approximation $1 - P_{LB}(A_i \prec_Q B)$ for the probability that $A_i$ does not dominate $B$. The difference between this generating function and the uncertain generating function (cf. Equation 6.3) is that the uncertain percentage (the coefficient of $y$) is added to both probabilities $P(A_i \prec_Q B)$ and $P(\neg(A_i \prec_Q B))$.
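
One way to see the effect of adding the uncertain percentage twice is to evaluate the generating functions at $x = 1$, which sums all coefficients $c_i$. The short derivation below is a sketch that follows directly from Equations 6.4 and 6.5; it is not part of the original argument.

```latex
\begin{align*}
\text{Equation 6.5:}\quad
\sum_{i \ge 0} c_i = F_N(1)
  &= \prod_{i=1}^{N} \bigl[ P_{UB}(A_i \prec_Q B) + 1 - P_{LB}(A_i \prec_Q B) \bigr] \\
  &= \prod_{i=1}^{N} \bigl[ 1 + \bigl( P_{UB}(A_i \prec_Q B) - P_{LB}(A_i \prec_Q B) \bigr) \bigr] \;\ge\; 1, \\
\text{Equation 6.4:}\quad
\sum_{i \ge 0} c_i = F_N(1)
  &= \prod_{i=1}^{N} \bigl[ 1 - \bigl( P_{UB}(A_i \prec_Q B) - P_{LB}(A_i \prec_Q B) \bigr) \bigr] \;\le\; 1.
\end{align*}
```

Equality holds in both cases only if every bound is tight, i.e. $P_{UB}(A_i \prec_Q B) = P_{LB}(A_i \prec_Q B)$ for all $i$.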

Lemma 22. Let $c_k$ be the coefficients obtained by the generating function in Equation 6.5 and let $P_{UB}(DomCount(B, Q) = k)$ be the upper bound derived by applying the generating function in Equation 6.3 and exploiting Lemma 20. It holds that

$$c_k \ge P_{UB}(DomCount(B, Q) = k),$$

i.e. the upper bound obtained by the (non-uncertain) generating function in Equation 6.5 is in general not as good as the upper bound obtained using the technique in Section 6.4.4.

Proof. We give an example where the upper bound derived by Equation 6.5 is worse than the upper bound derived by Equation 6.3 and exploiting Lemma 20. Assume a database containing uncertain objects $A_1$, $A_2$, $B$ and $Q$. The task is to determine the probability that $DomCount(B, Q) = 1$. Also assume that we have determined (e.g. as proposed in Section 6.3.2) a lower bound $P_{LB}(A_1 \prec_Q B)$ ($P_{LB}(A_2 \prec_Q B)$) and an upper bound $P_{UB}(A_1 \prec_Q B)$ ($P_{UB}(A_2 \prec_Q B)$) of the probability that $A_1$ ($A_2$) dominates $B$ w.r.t. $Q$.

To ease the notation, let $A_i^+$ denote $P_{LB}(A_i \prec_Q B)$, let $A_i^-$ denote $1 - P_{UB}(A_i \prec_Q B)$ and let $A_i^?$ denote the uncertain fraction $P_{UB}(A_i \prec_Q B) - P_{LB}(A_i \prec_Q B)$.


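Using this notation, and writing $A$ for $A_1$ and $B$ for $A_2$ in the superscripted terms, Equation 6.3 becomes:

$$(A^+ x + A^? y + A^-) \cdot (B^+ x + B^? y + B^-)$$

Expansion yields:

$$A^+B^+x^2 + A^+B^?xy + A^+B^-x + A^?B^+xy + A^?B^?y^2 + A^?B^-y + A^-B^+x + A^-B^?y + A^-B^-$$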

Exploiting Lemma 20 yields the following upper bound probability for $DomCount(B, Q) = 1$:

$$P_{UB}(DomCount(B, Q) = 1) = A^+B^? + A^+B^- + A^?B^+ + A^?B^? + A^?B^- + A^-B^+ + A^-B^?$$

On the other hand, Equation 6.5 becomes:

$$((A^+ + A^?)x + A^- + A^?) \cdot ((B^+ + B^?)x + B^- + B^?)$$

Expansion yields:

$$(A^+ + A^?)(B^+ + B^?)x^2 + (A^+ + A^?)(B^- + B^?)x + (A^- + A^?)(B^+ + B^?)x + (A^- + A^?)(B^- + B^?)$$

Extracting the coefficient $c_1$ of $x^1$ yields:

$$c_1 = (A^+ + A^?)(B^- + B^?) + (A^- + A^?)(B^+ + B^?)$$

Expansion yields:

$$c_1 = A^+B^- + A^+B^? + A^?B^- + A^?B^? + A^-B^+ + A^-B^? + A^?B^+ + A^?B^?$$

Comparing $P_{UB}(DomCount(B, Q) = 1)$ and $c_1$, we obtain:

$$c_1 - P_{UB}(DomCount(B, Q) = 1) = A^?B^?$$

Thus, the upper bound $c_1$ of the (non-uncertain) generating function is greater (and thus worse) than the upper bound $P_{UB}(DomCount(B, Q) = 1)$ of the uncertain generating function by a total of $A^?B^?$.

Intuitively, the problem of the upper bound using the (non-uncertain) generating function is that the uncertain fraction (i.e. $A_i^?$) is added to both the probability that $A_i$ dominates $B$ (i.e. $A_i^+$) and to the probability that $A_i$ does not dominate $B$ (i.e. $A_i^-$).
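
The two-object example can be replayed numerically with the sketch below. Lemma 20 is not reproduced in this section, so the code assumes it amounts to summing all coefficients $c_{i,j}$ of the uncertain generating function with $i \le k \le i + j$; this is the selection of terms that the worked example uses for $k = 1$. The function names and the bound values are hypothetical.

```python
import math
from itertools import product
from typing import Tuple


def split(p_lb: float, p_ub: float) -> Tuple[float, float, float]:
    """Return (plus, unknown, minus) = (p_lb, p_ub - p_lb, 1 - p_ub)."""
    return p_lb, p_ub - p_lb, 1.0 - p_ub


def compare_upper_bounds(
    a: Tuple[float, float], b: Tuple[float, float]
) -> Tuple[float, float]:
    """Return (c_1 from Equation 6.5, upper bound from Equation 6.3) for k = 1."""
    k = 1
    a_plus, a_unk, a_minus = split(*a)
    b_plus, b_unk, b_minus = split(*b)

    # Each factor of the uncertain generating function contributes either
    # an x (exponents (1, 0)), a y (exponents (0, 1)) or a constant ((0, 0)).
    factor_a = [((1, 0), a_plus), ((0, 1), a_unk), ((0, 0), a_minus)]
    factor_b = [((1, 0), b_plus), ((0, 1), b_unk), ((0, 0), b_minus)]
    p_ub = 0.0
    for ((ia, ja), wa), ((ib, jb), wb) in product(factor_a, factor_b):
        i, j = ia + ib, ja + jb
        if i <= k <= i + j:  # assumed reading of Lemma 20
            p_ub += wa * wb

    # Coefficient of x^1 in Equation 6.5 for two candidates.
    c_1 = (a_plus + a_unk) * (b_minus + b_unk) + (a_minus + a_unk) * (b_plus + b_unk)
    return c_1, p_ub


a_bounds, b_bounds = (0.2, 0.5), (0.6, 0.9)  # made-up (p_lb, p_ub) pairs
c_1, p_ub = compare_upper_bounds(a_bounds, b_bounds)
gap = c_1 - p_ub
expected_gap = split(*a_bounds)[1] * split(*b_bounds)[1]  # A? * B?
```

Under the stated assumption, `gap` agrees with the product of the two uncertain fractions, matching the difference derived in the proof.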

6.4.6 Efficient Domination Count Approximation Based on Dis-

In document Züfle, Andreas (2013): Similarity search and mining in uncertain spatial and spatio-temporal databases. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 142-146)