Probabilistic Ranking Algorithm - Bernecker, Thomas (2012): Similarity processing in mul

12.3.1 Algorithm Description

The pseudocode of the probabilistic ranking algorithm is given in Algorithm 7 and provides the implementation details of the previously discussed steps. The algorithm requires a query object q and a distance browsing operator B that allows the observations to be accessed iteratively, sorted in ascending order of their distance to the query object.


The algorithm maintains the AOL, a list that contains each object X that

• has previously been found in B, i.e., at least one observation of X has been processed,

• and has not yet been completely processed, i.e., at least one observation of X has yet to be found,

associated with the sum P(X) of the probabilities of all its observations that have been found. The AOL offers two functionalities:

• updateAOL(x): adds the probability P(X = x) of the observation x ∈ X to P(X), where X is the object that x belongs to.

• getProb(X): returns the aggregated probability of object X (P(X)).

For efficient retrieval and update, it is mandatory that the position of a tuple (X, P(X)) in the AOL can be found in constant time in order to sustain the constant time complexity of an iteration. This can be achieved by hashing or by storing the probability P(X) directly with each object X; both options require an additional space cost of O(N). Another structure to keep is result, a matrix that contains, for each observation x that has been retrieved from B and each ranking position i, the probability P_q(x, i) that x is located on ranking position i. In order to get an object-based rank probability, observations belonging to the same object can be aggregated using Equation (12.1).
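The hashing variant of the AOL could be sketched as follows. This is only an illustration under stated assumptions: the class name `AOL`, the method signatures (an object identifier plus an observation probability instead of an observation x), and the decision never to evict completely processed objects are hypothetical simplifications, not taken from Algorithm 7.

```python
class AOL:
    """Hash-based sketch of the AOL: object id -> aggregated probability P(X).

    Assumption: objects are identified by hashable ids, and completely
    processed objects are simply kept (no eviction) for brevity.
    """

    def __init__(self):
        # A dict gives expected constant-time lookup and update.
        self._prob = {}

    def update_aol(self, obj_id, obs_prob):
        """Add the probability P(X = x) of a retrieved observation x to P(X)."""
        self._prob[obj_id] = self._prob.get(obj_id, 0.0) + obs_prob

    def get_prob(self, obj_id):
        """Return the aggregated probability P(X), or 0.0 if X was never seen."""
        return self._prob.get(obj_id, 0.0)

    def __contains__(self, obj_id):
        """True if at least one observation of the object has been processed."""
        return obj_id in self._prob
```

The dictionary holds at most one entry per object, which matches the O(N) additional space mentioned above.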

Additionally, two arrays p-rank_x and p-rank_y are initialized, each of length k, which contain, at any iteration of the algorithm, the probabilities P_{i,S\{X},x} and P_{i,S\{Y},y}, respectively, for all i ∈ {0, ..., k−1}. Here, x ∈ X is the observation found in the previous iteration and y ∈ Y is the observation found in the current iteration (cf. Figure 12.1).

In line 5, the algorithm starts by fetching the first observation, i.e., the observation in the database that is closest to the query q. A tuple containing the corresponding object as well as the probability of this observation is added to the AOL.

Then, the probability for the first position of x in p-rank_x is set to 1, while the probabilities for the other k−1 positions remain 0, because

P_{0,S\{X},x} = P_{0,∅,x} = 1 and P_{i,S\{X},x} = P_{i,∅,x} = 0

for i ≥ 1 by definition (cf. Equation (12.2)). This simply reflects the fact that the first observation retrieved from B is always on rank 1. p-rank_y is implicitly assigned to p-rank_x. Then, the first iteration of the main algorithm begins by fetching the next observation from B (line 11). Now, the three cases explained in Subsection 12.2.2 have to be distinguished. In the first case (line 13), both the previous and the current observation refer to the same object. As explained in Subsection 12.2.2, there is nothing to do in this case, since P_{i,S\{X},x} = P_{i,S\{Y},y} for all i ∈ {0, ..., k−1}.

In the second case (line 16), the current observation refers to an object that has not been seen yet. As explained in Subsection 12.2.2, only one additional iteration of the dynamic-programming algorithm has to be applied (cf. Equation (12.2)).

Algorithm 8 Dynamic Iteration for Observation y: dynamicRound(oldRanking, P_y(X))

Require: oldRanking (intermediate result without object X)
Require: P_y(X) (probability that object X is closer to q than observation y)

newRanking ← [0, ..., 0] {length k}
newRanking[0] ← oldRanking[0] · (1 − P_y(X))
for i = 1 → k−1 do
    newRanking[i] ← oldRanking[i−1] · P_y(X) + oldRanking[i] · (1 − P_y(X))
end for
return newRanking (result including object X)

Algorithm 9 Probability Adjustment: adjustProbs(oldRanking, P_x(Y))

Require: oldRanking (intermediate result including object Y)
Require: P_x(Y) (probability that object Y is closer to q than the last retrieved observation x)

adjustedProbs ← [0, ..., 0] {length k}
adjustedProbs[0] ← oldRanking[0] / (1 − P_x(Y))
for i = 1 → k−1 do
    adjustedProbs[i] ← (oldRanking[i] − adjustedProbs[i−1] · P_x(Y)) / (1 − P_x(Y))
end for
return adjustedProbs (intermediate result at observation y ∈ Y, excluding object Y from the current result)

The procedure dynamicRound is shown in Algorithm 8; it is used here to incorporate the probability that X is closer to q than y into p-rank_y in a single iteration of the dynamic algorithm.
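A direct Python transcription of Algorithm 8 may help to make the recurrence concrete. The function name `dynamic_round` and the use of plain lists are illustrative choices; entry i of the array is read as the probability that exactly i of the objects considered so far are closer to q than y.

```python
def dynamic_round(old_ranking, p_closer):
    """One dynamic-programming step, transcribed from Algorithm 8.

    old_ranking[i]: probability that exactly i objects (not counting X)
                    are closer to q than y.
    p_closer:       P_y(X), the probability that X is closer to q than y.
    Returns the array that additionally accounts for X.
    """
    k = len(old_ranking)
    new_ranking = [0.0] * k
    if k == 0:
        return new_ranking
    # Position 0 is kept only if X is not closer than y.
    new_ranking[0] = old_ranking[0] * (1.0 - p_closer)
    for i in range(1, k):
        # Either X is closer (shift by one position) or it is not (stay).
        new_ranking[i] = (old_ranking[i - 1] * p_closer
                          + old_ranking[i] * (1.0 - p_closer))
    return new_ranking


# Starting from "no other object is closer", fold in two objects that are
# closer with probability 0.25 and 0.5, respectively.
step1 = dynamic_round([1.0, 0.0, 0.0], 0.25)   # [0.75, 0.25, 0.0]
step2 = dynamic_round(step1, 0.5)              # [0.375, 0.5, 0.125]
```

Each call touches every array entry once, so one step costs O(k).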

In the third case (line 20), the current observation relates to an object that has already been seen. Thus, the probabilities P_{i,S\{X},x} depend on Y. As explained in Subsection 12.2.2, the influence of previously retrieved y ∈ Y on P_{i,S\{X},x} has to be filtered out first, and then P_{i,S\{X,Y},x} has to be computed. This is performed by the probability adjustment algorithm adjustProbs (cf. Algorithm 9) utilizing the technique explained in Subsection 12.2.2. Using the P_{i,S\{X,Y},x}, the algorithm then computes the P_{i,S\{Y},y} by performing a single iteration of the dynamic algorithm as in Case 2.
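The probability adjustment can be sketched in Python as the inverse of the dynamic iteration: solving newRanking[i] = oldRanking[i−1]·p + oldRanking[i]·(1−p) for oldRanking[i] yields the update rule of Algorithm 9. The function name `adjust_probs` and the explicit guard for a probability of 1 are assumptions of this sketch; the guard only makes the division by 1 − P_x(Y) safe.

```python
def adjust_probs(old_ranking, p_closer):
    """Filter out the influence of object Y, transcribed from Algorithm 9.

    old_ranking: array that includes Y with probability p_closer = P_x(Y).
    Returns the array excluding Y. Requires p_closer < 1, which is assumed
    to hold whenever a further observation of Y is still to be retrieved.
    """
    if p_closer >= 1.0:
        raise ValueError("P_x(Y) must be smaller than 1")
    k = len(old_ranking)
    adjusted = [0.0] * k
    if k == 0:
        return adjusted
    adjusted[0] = old_ranking[0] / (1.0 - p_closer)
    for i in range(1, k):
        # Subtract the share that came from position i-1 via Y being closer.
        adjusted[i] = ((old_ranking[i] - adjusted[i - 1] * p_closer)
                       / (1.0 - p_closer))
    return adjusted


# Removing an object that was folded in with probability 0.5:
without_y = adjust_probs([0.375, 0.5, 0.125], 0.5)   # [0.75, 0.25, 0.0]
```

Because each entry depends only on entries at lower positions, the adjustment also works on arrays that are truncated to the first k positions.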

In line 27, the computed ranking for observation y is added to the result. If the application (i.e., the ranking method) requires objects to be ranked instead of observations, then p-rank_y is used to incrementally update the probabilities of Y for each rank.

The algorithm continues fetching observations from the distance browsing operator B and repeats this case analysis until either no more observations are left in B or an observation is found with a probability of 0 for each of the first k positions. In the latter case, there exist k objects that are closer to q with a probability of 1, i.e., for which all observations have been retrieved. The computation can then be stopped, because the same k objects must also be closer to q than all further observations in the database that have not yet been retrieved by the distance browsing operator B.
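The case analysis and the stopping criterion can be combined into a compact sketch of the main loop. Several points are assumptions rather than details of Algorithm 7: the operator B is stood in for by an already sorted sequence of (object id, probability) pairs, the AOL is a plain dictionary, k ≥ 1, and the returned arrays are the values P_{i,S\{Y},y} as they are, without any further weighting. The two helpers repeat Algorithms 8 and 9 in condensed form so that the block runs on its own.

```python
def dynamic_round(old, p):
    """Algorithm 8: fold in an object that is closer with probability p."""
    return [old[0] * (1.0 - p)] + [
        old[i - 1] * p + old[i] * (1.0 - p) for i in range(1, len(old))
    ]


def adjust_probs(old, p):
    """Algorithm 9: factor out an object folded in with probability p < 1."""
    adjusted = [old[0] / (1.0 - p)]
    for i in range(1, len(old)):
        adjusted.append((old[i] - adjusted[i - 1] * p) / (1.0 - p))
    return adjusted


def rank_observations(sorted_obs, k):
    """Return (object id, p-rank array) for each processed observation.

    sorted_obs: (obj_id, prob) pairs in ascending distance to q
                (hypothetical stand-in for the operator B).
    """
    aol = {}                              # obj_id -> aggregated probability
    result = []
    p_rank = [1.0] + [0.0] * (k - 1)      # first observation is on rank 1
    prev_obj = None
    for obj, prob in sorted_obs:
        if prev_obj is not None and obj != prev_obj:
            if obj in aol:
                # Case 3: remove the influence of earlier observations of Y.
                p_rank = adjust_probs(p_rank, aol[obj])
            # Cases 2 and 3: incorporate the previous object X.
            p_rank = dynamic_round(p_rank, aol[prev_obj])
        # Case 1 (same object as before): p_rank is left unchanged.
        if not any(p_rank):
            break                         # k objects are certainly closer
        result.append((obj, list(p_rank)))
        aol[obj] = aol.get(obj, 0.0) + prob
        prev_obj = obj
    return result


# Objects A (two observations), B and C (one each), top-2 ranking.
ranking = rank_observations([("A", 0.5), ("B", 1.0), ("A", 0.5), ("C", 1.0)], k=2)
```

In this small example the observation of C is never added to the result: when it is fetched, A and B are both closer to q with probability 1, so all k = 2 entries are 0 and the loop stops.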

In document Bernecker, Thomas (2012): Similarity processing in multi-observation data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 148-151)