• No results found

3.5 An Improved Algorithm

3.5.2 The Algorithm

This section describes our new algorithm for the RMQ-problem. The array A to be preprocessed is (conceptually) divided into superblocks B1′, . . . , B′n/s′ of

size s′ := log2+εn, where Bi′ spans from A[(i−1)s′ + 1] to A[is′]. Here, ε is an arbitrary constant greater than 0. Likewise, A is divided into (conceptual) blocks B1, . . . , Bn/s of size s:= logn/(2 +δ). Again, δ >0 is a constant. For

the sake of simplicity we assume that s′ is a multiple of s. The general idea is that a query from l to r can be divided into at most five sub-queries (see also Fig. 3.6): onesuperblock-query that spans several superblocks, twoblock-queries

that span the blocks to the left and right of the superblock-query, and two in- block-queries to the left and right of the block-queries. We will preprocess long queries by a two-level step as presented in Sect. 3.4.3.1, and short queries will be precomputed by a combination of the Four-Russians-Trick (Arlazarov et al., 1970) with the method from Sect. 3.4.2. From now on, we assume that the

≺-relation is used for answering RMQs, such that the answers become unique. 3.5.2.1 A Succinct Data Structure for Handling Long Queries

We first wish to precompute the answers to all RMQs that span over at least one superblock. In essence, we re-use Sadakane’s idea for the succinct computation

of LCAs in BPS-encoded trees (see Sect. 3.4.3). This is actually a two-level storage scheme due to Munro (1996). Define a table M′[1, n/s′][0,log(n/s′)]. M′[i][j] stores the position of the minimum in the sub-arrayA[(i1)s′+ 1,(i+ 2j1)s]. As in Sect. 3.4.1,M[i][0] can be filled by a linear pass over the array,

and for j > 0 we use a dynamic programming approach by settingM′[i][j] =

arg mink∈{M[i][j−1],M[i+2j−1][j−1]}{A[k]}.

In the same manner we precompute the answers to all RMQs that span over at least one block, but not over a superblock. These answers are stored in a table M[0, n/s−1][0,log(s′/s)], where M[i][j] stores the minimum of A[(i−

1)s+ 1,(i+ 2j 1)s]. Again, dynamic programming can be used to fill table

M in optimal time.

3.5.2.2 A Succinct Data Structure for Handling Short Queries

We now show how to store all necessary information for answering in-block- queries using tables of size 2n+o(n) in total. The key to our solution is the following lemma, which has implicitly been used already in the Berkman- Vishkin algorithm. (Recall that the -relation is used for answering RMQs, such that the answers become unique.)

Lemma 3.9(Relating RMQs and Cartesian Trees). LetAandB be two arrays, both of size s. Then rmqA(i, j) =rmqB(i, j) for all 1ijsif and only ifCcan(A) =Ccan(B).

Proof. It is easy to see that rmqA(i, j) = rmqB(i, j) for all 1 ij s if and only if the following three conditions are satisfied:

1. The minimum under “≺” occurs at the same positionm, i.e., arg minA= arg minB =m.

2. For all 1≤i≤j < m,rmqA[1,m−1](i, j) =rmqB[1,m−1](i, j). 3. For all m < ijs,rmqA[m+1,s](i, j) =rmqB[m+1,s](i, j).

Due to the definition of the Canonical Cartesian Tree, points (1)–(3) are true if and only if the root ofCcan(A) equals the root ofCcan(B), andCcan(A[1, m1]) =

Ccan(B[1, m1]), and Ccan(A[m+ 1, s]) = Ccan(B[m+ 1, s]). As this is the

definition of Cartesian Trees, this is true iffCcan(A) =Ccan(B).

This means that tableP does not have to store the in-block-queries for alln/s

occurring blocks, but only for Cs = 4s/(√πs3/2)(1 +O(s−1)) possible blocks.

The total space occupied byP will be analyzed in Sect. 3.5.2.5. 3.5.2.3 Computing the Block Types

In order to index tableP, it remains to show how to compute the types of the n/sblocks Bj occurring in A in linear time; i.e., how to fill an arrayT[1, n/s]

Algorithm 3.2: An algorithm to compute the type of a block Bj Input: a block Bj of sizes

Output: the type ofBj, as defined by Eq. (3.8)

letR be an array of size s+ 1 {R stores elements on the rightmost path}

1 R[1]← −∞ 2 qs, N 0 3 fori←1, . . . , s do 4 while R[q+is]> Bj[i]do 5

N N +C(si)q {add number of skipped paths}

6

qq1 {remove node from rightmost path}

7

endw 8

R[q+i+ 1−s]←Bj[i] {Bj[i] is new rightmost leaf} 9

endfor 10

returnN

11

such that T[j] gives the type of block Bj. Lemma 3.9 implies that there are

only Cs different types of blocks, so we are looking for a surjection

t:As→ {0, . . . , Cs−1}, and t(Bi) =t(Bj) iff Ccan(Bi) =Ccan(Bj) , (3.8)

where As is the set of arrays of size s. The reason for requiring that Bi and

Bj have the same Canonical Cartesian Tree is that in such a case both blocks

share the same RMQs. Algorithm 3.2 shows how to compute the block type by making use of the ideas from Sect. 3.5.1.

Lemma 3.10 (Correctness of Type-Computation). Algorithm 3.2 correctly computes the type of a block Bj of size s in O(s) time, i.e., it computes a function satisfying the conditions given in (3.8).

Proof. Intuitively, Alg. 3.2 simulates the algorithm for constructing Ccan(Bj)

given in Sect. 3.2 and “implements” a function defined by (3.5). First note that array R[1, s+ 1] simulates the stack containing the labels of the nodes on the rightmost path of the partial Canonical Cartesian TreeCcan

i (Bj), withq+i−s

pointing to the top of the stack (i.e., the rightmost leaf), and R[1] acting as a “stopper.” Now let li be the number of times the while-loop (lines 5–8) is

executed during the ith iteration of the outer for-loop. Note thatli equals the

number of elements that are removed from the rightmost path when going from

Ccan

i−1(Bj) to Cican(Bj). So the sequence l1l2. . . ls satisfies (3.1), and therefore

uniquely characterizes Ccan(Bj). As the additions performed in line 6 of Alg.

3.2 are exactly those defined by (3.5), the value computed by the algorithm is the index of Ccan(B

j) in an enumeration of all binary trees with s nodes. We

3.5.2.4 Answering Queries

Having precomputed all the necessary information as above, it is easy to see thatrmq(l, r) is one of the following five positions (precisely, the position where Aattains the minimum value):

• If the interval [l :r] spans the superblocks B′

l′, . . . , Br′′, compute the po-

sition of the minimum as arg mink∈{M′[l][x],M[r−2x+1][x]}{A[k]}, where

x=log(r′l′).

• If the interval [l : r] spans the blocks Bl1, . . . , Br1 to the left of su-

perblock Bl′′, compute the position of the minimum in this interval as

arg mink∈{M[l1][x],M[r1−2x+1][x]}{A[k]}, where x=⌊log(r1−l1)⌋.

• Same as the previous point, but for the blocks Bl2, . . . , Br2 to the right of

B′ r′.

• Compute the positions of the minimum in the blocks to the left of Bl1

and to the right of Br2.

3.5.2.5 Space Analysis

Table M′ has dimensionsn/s′×log(n/s′) =n/log2+εn×log(n/log2+εn) and stores values up to n; the total number of bits needed forM′ is therefore

|M′| = n log2+εnlog n log2+εn ·logn = n

log1+εn(logn−log log

2+εn)

= n

logεn(1−o(1)) = o(n) .

TableM has dimensionsn/s×log(s′/s) = (2 +δ)n/logn×log((2 +δ) log1+εn). If we just store the offsets of the minima then the values do not become greater thans′; the total number of bits needed forM is therefore

|M| = O n lognlog(log 1+εn)·log(log2+εn) = O nlog2logn logn = o(n).

To store the type of each block, arrayT has lengthn/s= (2 +δ)n/logn, and because of Lemma 3.9, the numbers do not get bigger than O(4s/s3/2). This

means that the number of bits to encodeT is |T| = n s ·log O 4s s3/2 = n s ·(2s−O(logs)) = 2nO nlog logn logn = 2no(n) .

Finally, it is possible to store tableP witho(n) bits: Lemma 3.9 implies that P has only O(s43/s2) rows, one for each possible block-type. For each type we

need to precompute rmq(i, j) for all 1ij s, so the number of columns in P is O(s2). If we use the method described in Sect. 3.4.2 to represent the answers to all RMQs inside one block, this takes O(s·s) bits of space for each possible block.9 The total space is thus

|P| = O 4s s3/2s·s = On2/(2+δ)plogn = On1−2/δ1+1plogn = o(n/logn) bits.

Because the space consumption of table M is asymptotically higher than the second order term that is subtracted from the space of table T, the total space needed is 2n+o(n) bits. And because the peak space consumption at construction time is the same as that of the final data structure (apart from the negligible O(lognlog logn) bits needed for arrayR, and theO(log3n) bits for the Ballot-Numbers in Alg. 3.2), we can now state the main result of this section:

Theorem 3.11 (Succinct representation of RMQ-information). For an ar- ray with n elements from a totally ordered set, there exists an algorithm for the RMQ-problem with time complexity hO(n), O(1)i and bit-space complexity

J2n+o(n),2n+o(n)K.

We finally note that our algorithm is also easy to implement on PRAMs (or real-world shared-memory machines), where with n/t processors the prepro- cessing runs in time Θ(t) ift= Ω(logn), which is work-optimal. This is simply because the minimum-operation is associative and can hence be parallelized after a Θ(t) sequential initialization (Schwartz, 1980).

9We remark that the usage of Alstrup et al.’s method for the precomputation of the in-block

queries would not be necessary to achieve the o(n) space bound, but it certainly saves some space.