
12.2 Efficient Retrieval of the Rank Probabilities

12.2.2 Incremental Probability Computation

Let $(x, P(X = x)) \in X$ and $(y, P(Y = y)) \in Y$ be two observations returned consecutively by the distance browsing. Without loss of generality, let $(x, P(X = x))$ be returned before $(y, P(Y = y))$. In the current state, $x$ is the last processed observation, such that $X \in S$ holds. Each probability $P_{i,S\setminus\{Y\},y}$ ($i \in \{0, \dots, \min(k, |S \setminus \{Y\}|)\}$) can be computed from the probabilities $P_{i,S\setminus\{X\},x}$ in constant time. In fact, the probabilities $P_{i,S\setminus\{Y\},y}$ can be computed by considering at most one recursion step backwards. This is the main improvement compared to [214]: the new probabilities $P_{i,S\setminus\{Y\},y}$ are derived from the previous results, whereas [214] computes the rank probabilities from scratch (i.e., all shaded cells of the matrix illustrated in Chapter 11), requiring an update cost of $O(k \cdot N)$.
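For reference, the following minimal Python sketch evaluates the recursion of Equation (12.2) from scratch, i.e., the kind of full recomputation that the incremental scheme avoids. As a simplification for illustration, it assumes that the probabilities $P_x(Z)$ of the considered objects are passed as a plain list and that the objects are mutually independent; the function name `rank_probs_from_scratch` is hypothetical. Each call processes every object once and keeps at most $k+1$ entries, so it costs $O(k \cdot |S|)$.

```python
from collections.abc import Sequence


def rank_probs_from_scratch(p_closer: Sequence[float], k: int) -> list[float]:
    # Hypothetical helper (illustration only).
    # p_closer[j] stands for P_x(Z_j), the probability that object Z_j is
    # closer to q than observation x (objects assumed independent).
    # Returns [P_0, ..., P_m], m = min(k, len(p_closer)), where P_i is the
    # probability that exactly i of these objects are closer to q than x.
    probs = [1.0]  # empty set: exactly 0 objects are closer, with certainty
    for pz in p_closer:
        m = min(k, len(probs))  # highest index kept after adding one object
        new = []
        for i in range(m + 1):
            one_less = probs[i - 1] if i >= 1 else 0.0  # P_{-1} = 0
            same = probs[i] if i < len(probs) else 0.0   # P_i = 0 beyond the set size
            # Equation (12.2): Z is closer (index shifts by one) or not (index kept)
            new.append(one_less * pz + same * (1.0 - pz))
        probs = new
    return probs


print(rank_probs_from_scratch([0.5, 0.5], k=2))  # [0.25, 0.5, 0.25]
```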

The following three cases, illustrated in Figure 12.1, have to be considered. The first two cases are easy to handle; the third case is the most frequently occurring and the most challenging one.

• Case 1: Both observations belong to the same object, i.e., $X = Y$ (cf. Figure 12.1(a)).

• Case 2: Both observations belong to different objects, i.e., $X \neq Y$, and $(y, P(Y = y))$ is the first retrieved observation of object $Y$ (cf. Figure 12.1(b)).

• Case 3: Both observations belong to different objects, i.e., $X \neq Y$, and $(y, P(Y = y))$ is not the first retrieved observation of object $Y$ (cf. Figure 12.1(c)).
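The case distinction can be sketched as a small Python dispatcher. How the state is kept is an assumption made for illustration: objects are identified by hashable keys, and `seen_before_y` holds the objects of which at least one observation was retrieved before $y$ (i.e., $S$ before $y$ is processed). The names `UpdateCase` and `classify` are hypothetical.

```python
from collections.abc import Hashable
from enum import Enum


class UpdateCase(Enum):
    SAME_OBJECT = 1  # Case 1: X = Y
    FIRST_OF_Y = 2   # Case 2: X != Y, y is the first retrieved observation of Y
    REPEATED_Y = 3   # Case 3: X != Y, Y already had a retrieved observation


def classify(prev_obj: Hashable, cur_obj: Hashable,
             seen_before_y: set[Hashable]) -> UpdateCase:
    # prev_obj: object X of the previously processed observation x
    # cur_obj:  object Y of the newly retrieved observation y
    # seen_before_y: objects with at least one observation retrieved before y
    if cur_obj == prev_obj:
        return UpdateCase.SAME_OBJECT
    if cur_obj not in seen_before_y:
        return UpdateCase.FIRST_OF_Y
    return UpdateCase.REPEATED_Y


# Hypothetical retrieval order of observations, given by object identifier.
order = ["A", "A", "B", "A", "C", "B"]
seen = {order[0]}
for prev, cur in zip(order, order[1:]):
    print(prev, cur, classify(prev, cur, seen).name)
    seen.add(cur)  # the object of y now belongs to S
```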


The following shows how the probabilities $P_{i,S\setminus\{Y\},y}$ for $i \in \{0, \dots, \min(k, |S \setminus \{Y\}|)\}$ can be computed in constant time in each of the above cases.

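In the first case (cf. Figure 12.1(a)), the probabilities $P_x(Z)$ and $P_y(Z)$ are equal for all objects $Z \in S \setminus \{X\}$, because the observations of objects in $S \setminus \{X\}$ that lie within the distance range of $q$ and $y$ are identical to those that lie within the distance range of $q$ and $x$. Since $X = Y$ implies $S \setminus \{Y\} = S \setminus \{X\}$, and since the probabilities $P_{i,S\setminus\{Y\},y}$ and $P_{i,S\setminus\{X\},x}$ depend only on $P_y(Z)$ and $P_x(Z)$, respectively, for the objects $Z \in S \setminus \{X\}$, it follows that $P_{i,S\setminus\{Y\},y} = P_{i,S\setminus\{X\},x}$ for all $i$.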

In the second case (cf. Figure 12.1(b)), one can exploit the fact that $P_{i,S\setminus\{X\},x}$ does not depend on $Y$, as $y$ is the first returned observation of $Y$. At this point, $Y \in S$. Thus, given the probabilities $P_{i,S\setminus\{X\},x}$, the probability $P_{i,S\setminus\{Y\},y}$ can easily be computed by incorporating the object $X$ using the recursive Equation (12.2):

$$P_{i,S\setminus\{Y\},y} = P_{i-1,S\setminus\{Y,X\},y} \cdot P_y(X) + P_{i,S\setminus\{Y,X\},y} \cdot \bigl(1 - P_y(X)\bigr).$$

Since $S \setminus \{Y, X\} = S \setminus \{X, Y\}$ and no object in $S \setminus \{X, Y\}$ has an observation that lies within the distance range of $q$ and $y$ but not within the distance range of $q$ and $x$ (cf. Figure 12.1(b)), $P_y(Z) = P_x(Z)$ holds for all $Z \in S \setminus \{X, Y\}$. Hence, the probabilities with respect to $y$ can be replaced by those with respect to $x$, and the following equation holds:

$$P_{i,S\setminus\{Y\},y} = P_{i-1,S\setminus\{X,Y\},x} \cdot P_y(X) + P_{i,S\setminus\{X,Y\},x} \cdot \bigl(1 - P_y(X)\bigr).$$

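Furthermore, $P_{i,S\setminus\{X,Y\},x} = P_{i,S\setminus\{X\},x}$ (and likewise for index $i-1$), because $Y$ has no observation within the distance range of $q$ and $x$ and, thus, $Y \notin S \setminus \{X\}$ when $x$ is processed. Now, the above equation can be reformulated: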

$$P_{i,S\setminus\{Y\},y} = P_{i-1,S\setminus\{X\},x} \cdot P_y(X) + P_{i,S\setminus\{X\},x} \cdot \bigl(1 - P_y(X)\bigr). \tag{12.3}$$

All probabilities on the right-hand side of Equation (12.3) are known; thus, $P_{i,S\setminus\{Y\},y}$ can be computed in constant time, provided that the probabilities $P_{i,S\setminus\{X\},x}$ computed in the previous step have been stored for all $i \in \{0, \dots, \min(k, |S \setminus \{X\}|)\}$.
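A minimal Python sketch of the Case 2 update, Equation (12.3), assuming that the stored probabilities $P_{i,S\setminus\{X\},x}$ are kept in a list indexed by $i$ and truncated at $k$; the function name `case2_update` is hypothetical. Each entry takes a constant number of operations, so the whole list is obtained in $O(k)$.

```python
def case2_update(prev: list[float], py_x: float, k: int) -> list[float]:
    # Hypothetical helper (illustration only).
    # prev[i] = P_{i, S\{X}, x} for i = 0..min(k, |S\{X}|), stored in the previous step
    # py_x    = P_y(X), the probability that object X is closer to q than y
    # Returns P_{i, S\{Y}, y} for i = 0..min(k, |S\{Y}|); in Case 2, |S\{Y}| = |S\{X}| + 1,
    # and min(k, len(prev)) equals min(k, |S\{X}| + 1).
    m = min(k, len(prev))
    out = []
    for i in range(m + 1):
        one_less = prev[i - 1] if i >= 1 else 0.0  # P_{-1, ...} = 0
        same = prev[i] if i < len(prev) else 0.0   # P_i = 0 for i > |S\{X}|
        out.append(one_less * py_x + same * (1.0 - py_x))  # Equation (12.3)
    return out


# Hypothetical state: S\{X} = {Z} with P_x(Z) = 0.2, so prev = [0.8, 0.2];
# y is the first observation of Y and P_y(X) = 0.7.
print(case2_update([0.8, 0.2], py_x=0.7, k=2))  # ≈ [0.24, 0.62, 0.14]
```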

The third case (cf. Figure 12.1(c)) is the general case; it is not as straightforward as the previous two cases and requires special techniques. In contrast to Case 2, $Y$ already belongs to $S \setminus \{X\}$ when $x$ is processed, so $P_{i,S\setminus\{X\},x}$ depends on $Y$. Again, the probabilities $P_{i,S\setminus\{X\},x}$ computed in the previous step are assumed to be known for all $i \in \{0, \dots, \min(k, |S \setminus \{X\}|)\}$. Similarly to Case 2, the probability $P_{i,S\setminus\{Y\},y}$ can be computed by

$$P_{i,S\setminus\{Y\},y} = P_{i-1,S\setminus\{X,Y\},x} \cdot P_y(X) + P_{i,S\setminus\{X,Y\},x} \cdot \bigl(1 - P_y(X)\bigr). \tag{12.4}$$

Since the probability $P_y(X)$ is known, it remains to compute $P_{i,S\setminus\{X,Y\},x}$ for all $i \in \{0, \dots, \min(k, |S \setminus \{X, Y\}|)\}$, which is done by again exploiting Equation (12.2):

$$P_{i,S\setminus\{X\},x} = P_{i-1,S\setminus\{X,Y\},x} \cdot P_x(Y) + P_{i,S\setminus\{X,Y\},x} \cdot \bigl(1 - P_x(Y)\bigr),$$

which can be resolved to

$$P_{i,S\setminus\{X,Y\},x} = \frac{P_{i,S\setminus\{X\},x} - P_{i-1,S\setminus\{X,Y\},x} \cdot P_x(Y)}{1 - P_x(Y)}. \tag{12.5}$$

For $i = 0$, this yields

$$P_{0,S\setminus\{X,Y\},x} = \frac{P_{0,S\setminus\{X\},x} - P_{-1,S\setminus\{X,Y\},x} \cdot P_x(Y)}{1 - P_x(Y)} = \frac{P_{0,S\setminus\{X\},x}}{1 - P_x(Y)},$$

because $P_{-1,S\setminus\{X,Y\},x} = 0$ by definition (cf. Equation (12.2)). Hence, the case $i = 0$ can be solved, since $P_{0,S\setminus\{X\},x}$ is known from the previous iteration step.
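As a small worked example with hypothetical numbers chosen only for illustration, let $S \setminus \{X\} = \{Y, Z\}$ with $P_x(Y) = 0.7$ and $P_x(Z) = 0.2$, and let $k \geq 1$. By Equation (12.2), the stored probabilities are $P_{0,S\setminus\{X\},x} = 0.3 \cdot 0.8 = 0.24$ and $P_{1,S\setminus\{X\},x} = 0.7 \cdot 0.8 + 0.3 \cdot 0.2 = 0.62$ (and, if $k \geq 2$, $P_{2,S\setminus\{X\},x} = 0.7 \cdot 0.2 = 0.14$). Removing $Y$ with the base case and Equation (12.5) gives:

$$\begin{aligned}
P_{0,S\setminus\{X,Y\},x} &= \frac{P_{0,S\setminus\{X\},x}}{1 - P_x(Y)} = \frac{0.24}{0.3} = 0.8 = 1 - P_x(Z),\\
P_{1,S\setminus\{X,Y\},x} &= \frac{P_{1,S\setminus\{X\},x} - P_{0,S\setminus\{X,Y\},x} \cdot P_x(Y)}{1 - P_x(Y)} = \frac{0.62 - 0.8 \cdot 0.7}{0.3} = \frac{0.06}{0.3} = 0.2 = P_x(Z).
\end{aligned}$$

These are exactly the probabilities that none or one, respectively, of the remaining objects $S \setminus \{X, Y\} = \{Z\}$ is closer to $q$ than $x$.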

Assuming that the probabilities $P_{i,S\setminus\{X\},x}$ for all $i \in \{1, \dots, \min(k, |S \setminus \{X\}|)\}$ as well as $P_x(Y)$ are available from the previous iteration step, Equation (12.5) can be used to recursively compute $P_{i,S\setminus\{X,Y\},x}$ for all $i \in \{1, \dots, \min(k, |S \setminus \{X, Y\}|)\}$ from the previously computed $P_{i-1,S\setminus\{X,Y\},x}$. This recursive computation yields all probabilities $P_{i,S\setminus\{X,Y\},x}$ ($i \in \{0, \dots, \min(k, |S \setminus \{X, Y\}|)\}$), which can be used to compute the probabilities $P_{i,S\setminus\{Y\},y}$ for all $i \in \{0, \dots, \min(k, |S \setminus \{Y\}|)\}$ according to Equation (12.4), where $P_{i,S\setminus\{X,Y\},x} = 0$ for $i > |S \setminus \{X, Y\}|$.
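The two steps of Case 3, removing $Y$ via Equation (12.5) and then incorporating $X$ with $P_y(X)$ via Equation (12.4), can be combined in a minimal Python sketch. It assumes that the stored probabilities $P_{i,S\setminus\{X\},x}$ are kept in a list indexed by $i$ and truncated at $k$, and that $|S \setminus \{X\}|$ is tracked separately (the truncated list alone does not reveal it). The function name `case3_update` and its parameters are hypothetical, and the sketch assumes $P_x(Y) < 1$ so that the division in Equation (12.5) is defined.

```python
def case3_update(prev: list[float], px_y: float, py_x: float,
                 n_others: int, k: int) -> list[float]:
    # Hypothetical helper (illustration only).
    # prev[i] = P_{i, S\{X}, x} for i = 0..min(k, n_others), n_others = |S\{X}|
    # px_y = P_x(Y) (assumed < 1), py_x = P_y(X)
    # Returns P_{i, S\{Y}, y} for i = 0..min(k, |S\{Y}|); in Case 3, |S\{Y}| = |S\{X}|.
    if not 0.0 <= px_y < 1.0:
        raise ValueError("Equation (12.5) requires P_x(Y) < 1")

    # Step 1, Equation (12.5): remove Y, giving P_{i, S\{X,Y}, x}.
    removed: list[float] = []
    for i in range(min(k, n_others - 1) + 1):
        below = removed[i - 1] if i >= 1 else 0.0  # P_{-1, ...} = 0 (base case i = 0)
        removed.append((prev[i] - below * px_y) / (1.0 - px_y))

    # Step 2, Equation (12.4): incorporate X with P_y(X).
    out = []
    for i in range(min(k, n_others) + 1):
        one_less = removed[i - 1] if i >= 1 else 0.0
        same = removed[i] if i < len(removed) else 0.0  # zero beyond |S\{X,Y}|
        out.append(one_less * py_x + same * (1.0 - py_x))
    return out


# Hypothetical state: S\{X} = {Y, Z} with P_x(Y) = 0.7 and P_x(Z) = 0.2,
# hence prev = [0.24, 0.62, 0.14]; after retrieving y, P_y(X) = 0.4.
print(case3_update([0.24, 0.62, 0.14], px_y=0.7, py_x=0.4, n_others=2, k=2))
# ≈ [0.48, 0.44, 0.08], the rank probabilities for S\{Y} = {X, Z}
```

Dividing by $1 - P_x(Y)$ may amplify rounding errors when $P_x(Y)$ is close to 1; an implementation might then prefer to recompute from scratch. This safeguard is an assumption of the sketch, not part of the method as described.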

In document Bernecker, Thomas (2012): Similarity processing in multi-observation data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 144-146)