
12.2 Efficient Retrieval of the Rank Probabilities

12.2.3 Runtime Analysis

Building on this case-based analysis of the cost of computing $P_{i,S\setminus\{X\},x}$ for the currently accessed observation x of an object X, it is now possible to prove that the RPD can be computed at cost O(k · N). The following lemma shows that the incremental cost per observation access is O(k).

Lemma 12.1 Let (x, P(X = x)) ∈ X and (y, P(Y = y)) ∈ Y be two observations consecutively returned by the distance browsing B. Without loss of generality, assume that (x, P(X = x)) was returned in the previous iteration, in which the probabilities $P_{i,S\setminus\{X\},x}$ were computed for all $i \in \{0,\dots,\min(k,|S\setminus\{X\}|)\}$. In the next iteration, in which (y, P(Y = y)) is fetched, the probabilities $P_{i,S\setminus\{Y\},y}$ for all $i \in \{0,\dots,\min(k,|S\setminus\{Y\}|)\}$ can be computed in O(k) time and space.

Proof. In Case 1 (Y = X), the probabilities $P_{i,S\setminus\{X\},x}$ and $P_{i,S\setminus\{Y\},y}$ are equal for all $i \in \{0,\dots,\min(k,|S\setminus\{Y\}|)\}$. No computation is required (O(1) time), and the result can be stored in at most O(k) space.

In Case 2 (Y ≠ X, Y ∉ AOL), the probabilities $P_{i,S\setminus\{Y\},y}$ for all $i \in \{0,\dots,\min(k,|S\setminus\{Y\}|)\}$ can be computed according to Equation (12.3) in O(k) time. This requires the probabilities $P_{i,S\setminus\{X\},x}$ to be stored for all $i \in \{0,\dots,\min(k,|S\setminus\{Y\}|)\}$, which takes at most O(k) space.

In Case 3 (Y ≠ X, Y ∈ AOL), the probabilities $P_{i,S\setminus\{X,Y\},x}$ for all $i \in \{0,\dots,\min(k,|S\setminus\{X,Y\}|)\}$ first have to be computed and stored using the recursive function in Equation (12.5). This can be done in $O(\min(k,|S\setminus\{X,Y\}|))$ time and space. Next, these probabilities are used to compute $P_{i,S\setminus\{Y\},y}$ for all $i \in \{0,\dots,\min(k,|S\setminus\{Y\}|)\}$ according to Equation (12.4), which takes at most O(k) time and space. □
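The three cases of the proof can be illustrated as O(k) vector updates. The following is a minimal Python sketch, not the author's implementation. It assumes that Equation (12.3) has the Poisson-binomial form $P_{i,S\setminus\{Y\},y} = P_{i-1,S\setminus\{X\},x}\,P_y(X) + P_{i,S\setminus\{X\},x}\,(1-P_y(X))$, that Equation (12.4) applies the same update to $P_{i,S\setminus\{X,Y\},x}$, and that Equation (12.5) is its inverse, $P_{i,S\setminus\{X,Y\},x} = (P_{i,S\setminus\{X\},x} - P_{i-1,S\setminus\{X,Y\},x}\,P_x(Y)) / (1-P_x(Y))$. Here $P_y(X)$ and $P_x(Y)$ are taken to be the probabilities that X is ranked before y and Y before x, respectively. The function names are hypothetical and only loosely mirror dynamicRound and adjustProbs in Algorithm 7. Vectors are truncated to k entries, which loses nothing for those entries because entry i only depends on entries i − 1 and i.

```python
from typing import List


def dynamic_round(probs: List[float], p: float) -> List[float]:
    """Assumed form of Eqs. (12.3)/(12.4): fold in one object that is ranked
    before the current observation with probability p."""
    out: List[float] = []
    for i, p_i in enumerate(probs):
        p_prev = probs[i - 1] if i > 0 else 0.0  # object ranked before: count i-1 -> i
        out.append(p_prev * p + p_i * (1.0 - p))  # otherwise the count stays i
    return out


def adjust_probs(probs: List[float], p: float) -> List[float]:
    """Assumed form of Eq. (12.5): remove one object that is ranked before
    the current observation with probability p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("the removed object must precede x with probability < 1")
    out: List[float] = []
    for i, p_i in enumerate(probs):
        q_prev = out[i - 1] if i > 0 else 0.0  # already-adjusted entry i-1
        out.append((p_i - q_prev * p) / (1.0 - p))
    return out


def next_rank_probs(probs_x: List[float], same_object: bool, y_seen: bool,
                    p_y_x: float, p_x_y: float) -> List[float]:
    """Rank-probability vector of y derived from that of x (Cases 1-3)."""
    if same_object:                  # Case 1: Y = X, vector unchanged (copied here)
        return list(probs_x)
    if not y_seen:                   # Case 2: no observation of Y accessed yet
        return dynamic_round(probs_x, p_y_x)
    # Case 3: first remove Y's influence using P_x(Y), then add X using P_y(X)
    return dynamic_round(adjust_probs(probs_x, p_x_y), p_y_x)


# x = a1 of object A, y = b1 of object B (first access of B); A precedes b1 w.p. 0.5
print(next_rank_probs([1.0, 0.0], False, False, 0.5, 0.0))  # -> [0.5, 0.5]
```

Because y is an observation of Y that has not yet been accessed and has positive probability, $P_x(Y)$ stays strictly below 1, so the division in the Case 3 update is defined. When $P_x(Y)$ is close to 1, however, dividing by the small factor $1 - P_x(Y)$ can amplify rounding errors.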

Having established the cost of processing a single observation, it is now possible to extend the cost model to the whole query process. By Lemma 12.1, each observation can be processed in constant time if k is chosen to be constant. Assuming that the total number of observations in the database is linear in the number of database objects, the resulting runtime complexity is also linear in the number of database objects, more precisely O(k · N), where k is the specified ranking depth. So far, this cost model assumes that the pre- and postprocessing steps of the proposed framework require at most linear runtime. Because the postprocessing step only aggregates the results to obtain a final ranking output, its linear runtime is guaranteed. It remains to examine the preprocessing step, i.e., the initial (certain) ranking of the observations needed to initialize the distance browsing B. As is assumed for the competitors [45, 192, 214], the observations may already be sorted, in which case this step also incurs only linear cost. In the general case, however, where the distance browsing first has to be initialized, the runtime complexity of this step increases to O(N · log(N)). Consequently, the total runtime cost of the proposed approach (including distance browsing) is O(N · log(N) + k · N). Table 12.1 gives an overview of the computation costs.

Approach              | No precomputed B        | Precomputed B
Soliman et al. [192]  | exponential             | exponential
Chapter 11 [45]       | exponential             | exponential
Yi et al. [214]       | O(k · N²)               | O(k · N²)
This chapter [43]     | O(N · log(N) + k · N)   | O(k · N)

Table 12.1: Runtime complexity comparison between the probabilistic ranking approaches; N and k denote the database size and the ranking depth, respectively.
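The cost argument above can be condensed into a short derivation. Here M denotes the total number of observations. The constant c in M ≤ c · N is introduced only for this derivation and expresses the linearity assumption stated in the text. The postprocessing is treated as linear in the size of the result (M rows of length k), which is an interpretation of "linear" rather than a statement from the source.

```latex
\begin{align*}
T(N,k) &= \underbrace{O(M \log M)}_{\text{initialize } B}
        + \underbrace{M \cdot O(k)}_{\text{Lemma 12.1}}
        + \underbrace{O(k \cdot M)}_{\text{aggregate results}} \\
       &= O(N \log N) + O(k \cdot N) \qquad (M \le c \cdot N) \\
       &= O(N \log N + k \cdot N).
\end{align*}
```

If B is precomputed, the first term vanishes and the bound reduces to O(k · N), matching the last row of Table 12.1.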

The cost of solving the object-based rank probability problem is similar to that of solving the observation-based rank probability problem. The object-based solution additionally requires only summing the observation-based rank probabilities of each object, which can be done on the fly at no additional cost. Furthermore, the cost of building a final unambiguous ranking from the rank probabilities (e.g., the rankings proposed in Section 12.4 or in Chapter 11) is negligible, because this ranking can also be computed on the fly by simple aggregations of the corresponding (observation-based) rank probabilities.
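The aggregation from observations to objects is a single pass over the rank vectors. The following minimal Python sketch assumes that the observation-based rank probabilities are the rows produced by Algorithm 7 weighted by the observation's own probability P(X = x). In other words, it assumes that the probability of X being ranked at position i + 1 is the sum over its observations x of P(X = x) · $P_{i,S\setminus\{X\},x}$. The function name and the input layout are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def object_rank_probs(
    observations: Sequence[Tuple[Hashable, float]],
    rows: Sequence[Sequence[float]],
) -> Dict[Hashable, List[float]]:
    """Aggregate per-observation rank vectors into per-object rank vectors.

    observations[j]: (object id, P(X = x)) of the j-th accessed observation
    rows[j][i]:      probability that exactly i other objects precede it
    """
    per_object: Dict[Hashable, List[float]] = {}
    for (obj, p_obs), row in zip(observations, rows):
        acc = per_object.setdefault(obj, [0.0] * len(row))
        for i, p_rank in enumerate(row):
            acc[i] += p_obs * p_rank  # assumed weighting by P(X = x)
    return per_object


# Small hand-made example: observations a1, b1, a2 of objects A and B
obs = [("A", 0.5), ("B", 1.0), ("A", 0.5)]
rows = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(object_rank_probs(obs, rows))  # -> {'A': [0.5, 0.5], 'B': [0.5, 0.5]}
```

Each row contributes once and is touched in O(k) time, so the same sum could also be accumulated inside the main loop of Algorithm 7, right after each new rank vector is computed.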

Regarding space complexity, the RPD has size O(k · N), because a vector of length k has to be stored for each object in the database. In addition, the AOL has to be stored, which has a size of at most O(N). This yields a total space complexity of O(k · N + N) = O(k · N). Yi et al. [214] directly combine the probability computation with the output of U-kRanks, with a space complexity of O(N). The approach presented in this chapter, in contrast, solves the problem of computing the RPD, i.e., the bipartite graph problem introduced in Chapter 9, and can apply this solution to any definite ranking output; details are provided in Section 12.4. To compute an RPD according to the current definition, [214] also requires O(k · N) space.
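The O(N) bound on the AOL follows if it holds at most one entry per accessed object. The following minimal Python sketch assumes, as the use of updateAOL and AOL.getProb in Algorithm 7 suggests, that the AOL stores for each accessed object the summed probability of its observations accessed so far. The class and method names are hypothetical.

```python
from typing import Dict, Hashable


class AccessedObjectList:
    """Hypothetical AOL: one running probability per accessed object,
    so its size never exceeds the number of objects N."""

    def __init__(self) -> None:
        self._acc: Dict[Hashable, float] = {}

    def __contains__(self, obj: Hashable) -> bool:
        return obj in self._acc

    def __len__(self) -> int:
        return len(self._acc)

    def update(self, obj: Hashable, p_obs: float) -> None:
        """Record one more accessed observation of obj with probability p_obs."""
        self._acc[obj] = self._acc.get(obj, 0.0) + p_obs

    def get_prob(self, obj: Hashable) -> float:
        """Probability mass of obj's observations accessed so far (0 if unseen)."""
        return self._acc.get(obj, 0.0)


aol = AccessedObjectList()
for obj, p in [("A", 0.5), ("B", 1.0), ("A", 0.25)]:
    aol.update(obj, p)
print(len(aol), aol.get_prob("A"))  # -> 2 0.75
```

Three observations produce only two entries, because a repeated object only updates its running sum. This is why the AOL adds O(N) rather than O(k · N) to the total space.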

Algorithm 7 Probabilistic Ranking Algorithm: probRanking(B, q)

Require: B, q
  AOL ← ∅
  result ← matrix of 0s  // size = N · k
  p-rank_x ← [0, ..., 0]  // length k
  p-rank_y ← [0, ..., 0]  // length k
  y ← B.next()
  updateAOL(y)
  p-rank_x[0] ← 1
  add p-rank_x to the first line of result
  while B is not empty and ∃ p ∈ p-rank_x : p > 0 do
    x ← y
    y ← B.next()
    updateAOL(y)
    if Y = X then
      {Case 1 (cf. Figure 12.1(a))}
      p-rank_y ← p-rank_x
    else if Y ∉ AOL then
      {Case 2 (cf. Figure 12.1(b))}
      P_y(X) ← AOL.getProb(X)
      p-rank_y ← dynamicRound(p-rank_x, P_y(X))
    else
      {Case 3 (Y ≠ X, cf. Figure 12.1(c))}
      P_y(X) ← AOL.getProb(X)
      P_x(Y) ← AOL.getProb(Y)
      adjustedProbs ← adjustProbs(p-rank_x, P_x(Y))
      p-rank_y ← dynamicRound(adjustedProbs, P_y(X))
    end if
    add p-rank_y to the next line of result
    p-rank_x ← p-rank_y
  end while
  return result
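Algorithm 7 can be turned into a runnable Python sketch. This is a minimal illustration, not the author's code. It assumes that dynamicRound and adjustProbs implement the Poisson-binomial update and its inverse (the hypothetical helpers _fold and _unfold below). It also assumes that B is given as an iterable of (object id, P(X = x)) pairs already in distance order. The AOL here is a plain running sum per object, so the sketch updates it only after the case distinction. As a result, the membership test and P_x(Y) refer only to observations accessed before y, while P_y(X) already includes x. Whether updateAOL in the listing behaves equivalently is an assumption.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Observation = Tuple[Hashable, float]  # (object id, P(X = x)), in distance-browsing order


def _fold(probs: List[float], p: float) -> List[float]:
    # add one object ranked before the current observation with probability p
    return [(probs[i - 1] if i else 0.0) * p + probs[i] * (1.0 - p) for i in range(len(probs))]


def _unfold(probs: List[float], p: float) -> List[float]:
    # remove one object ranked before the current observation with probability p < 1
    out: List[float] = []
    for i, p_i in enumerate(probs):
        out.append((p_i - (out[i - 1] if i else 0.0) * p) / (1.0 - p))
    return out


def prob_ranking(browsing: Iterable[Observation], k: int) -> List[List[float]]:
    """One row per accessed observation; row[i] is the probability that exactly
    i other objects are ranked before it (0 <= i < k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    it = iter(browsing)
    first = next(it, None)
    if first is None:
        return []
    aol: Dict[Hashable, float] = {}          # object -> probability mass accessed so far
    x_obj, x_prob = first
    aol[x_obj] = x_prob
    rank_x = [1.0] + [0.0] * (k - 1)         # nothing precedes the first observation
    result = [rank_x]
    for y_obj, y_prob in it:
        if not any(p > 0.0 for p in rank_x):  # every later observation ranks beyond k
            break
        if y_obj == x_obj:                    # Case 1
            rank_y = list(rank_x)
        elif y_obj not in aol:                # Case 2: uses P_y(X)
            rank_y = _fold(rank_x, aol[x_obj])
        else:                                 # Case 3: uses P_x(Y), then P_y(X)
            rank_y = _fold(_unfold(rank_x, aol[y_obj]), aol[x_obj])
        aol[y_obj] = aol.get(y_obj, 0.0) + y_prob  # updateAOL(y), after the case test
        result.append(rank_y)
        x_obj, rank_x = y_obj, rank_y
    return result


print(prob_ranking([("A", 0.5), ("B", 1.0), ("A", 0.5)], 2))
# -> [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
```

Each loop iteration does O(k) work on two vectors of length k, in line with Lemma 12.1. The early exit mirrors the while condition of Algorithm 7: once every entry of the current vector is 0, all remaining observations are ranked beyond depth k.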

In document Bernecker, Thomas (2012): Similarity processing in multi-observation data. Dissertation, LMU München: Fakultät für Mathematik, Informatik und Statistik (Page 146-148)