3.5 An Improved Algorithm
3.5.2 The Algorithm
This section describes our new algorithm for the RMQ-problem. The array $A$ to be preprocessed is (conceptually) divided into superblocks $B'_1, \dots, B'_{n/s'}$ of size $s' := \log^{2+\varepsilon} n$, where $B'_i$ spans from $A[(i-1)s'+1]$ to $A[is']$. Here, $\varepsilon$ is an arbitrary constant greater than 0. Likewise, $A$ is divided into (conceptual) blocks $B_1, \dots, B_{n/s}$ of size $s := \log n/(2+\delta)$. Again, $\delta > 0$ is a constant. For the sake of simplicity we assume that $s'$ is a multiple of $s$. The general idea is that a query from $l$ to $r$ can be divided into at most five sub-queries (see also Fig. 3.6): one superblock-query that spans several superblocks, two block-queries that span the blocks to the left and right of the superblock-query, and two in-block-queries to the left and right of the block-queries. We will preprocess long queries by a two-level step as presented in Sect. 3.4.3.1, and short queries will be precomputed by a combination of the Four-Russians-Trick (Arlazarov et al., 1970) with the method from Sect. 3.4.2. From now on, we assume that the $\prec$-relation is used for answering RMQs, such that the answers become unique.

3.5.2.1 A Succinct Data Structure for Handling Long Queries
We first wish to precompute the answers to all RMQs that span over at least one superblock. In essence, we re-use Sadakane's idea for the succinct computation of LCAs in BPS-encoded trees (see Sect. 3.4.3). This is actually a two-level storage scheme due to Munro (1996). Define a table $M'[1, n/s'][0, \log(n/s')]$. $M'[i][j]$ stores the position of the minimum in the sub-array $A[(i-1)s'+1, (i+2^j-1)s']$. As in Sect. 3.4.1, $M'[i][0]$ can be filled by a linear pass over the array, and for $j > 0$ we use a dynamic programming approach by setting
$$M'[i][j] = \arg\min_{k \in \{M'[i][j-1],\ M'[i+2^{j-1}][j-1]\}} \{A[k]\}.$$
In the same manner we precompute the answers to all RMQs that span over at least one block, but not over a superblock. These answers are stored in a table $M[0, n/s-1][0, \log(s'/s)]$, where $M[i][j]$ stores the position of the minimum of $A[(i-1)s+1, (i+2^j-1)s]$. Again, dynamic programming can be used to fill table $M$ in optimal time.
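The dynamic-programming fill of $M'$ and $M$ can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `build_sparse_table` and `query_blocks` are hypothetical, indices are 0-based, the array length is assumed to be a multiple of the block size, ties are broken toward the left position, and all levels are built (whereas table $M$ above stops at level $\log(s'/s)$).

```python
def build_sparse_table(A, block):
    """Return M with M[i][j] = position in A of the minimum over
    blocks i .. i + 2^j - 1 (0-based blocks of the given size).
    Assumes len(A) is a multiple of `block`."""
    nb = len(A) // block
    # Level 0: one linear pass over each block.
    M = [[min(range(i * block, (i + 1) * block), key=lambda k: (A[k], k))]
         for i in range(nb)]
    j = 1
    while (1 << j) <= nb:
        half = 1 << (j - 1)
        for i in range(nb - (1 << j) + 1):
            # Combine the two half-ranges computed on level j - 1.
            a, b = M[i][j - 1], M[i + half][j - 1]
            M[i].append(a if (A[a], a) <= (A[b], b) else b)
        j += 1
    return M


def query_blocks(A, M, lo, hi):
    """Position of the minimum over blocks lo .. hi (inclusive),
    obtained from two overlapping precomputed ranges."""
    x = (hi - lo + 1).bit_length() - 1  # floor(log2(number of blocks))
    a, b = M[lo][x], M[hi - (1 << x) + 1][x]
    return a if (A[a], a) <= (A[b], b) else b
```

Calling `build_sparse_table(A, s_prime)` would play the role of $M'$ and `build_sparse_table(A, s)` that of $M$; storing absolute positions, as done here for simplicity, does not reflect the offset encoding used in the space analysis.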
3.5.2.2 A Succinct Data Structure for Handling Short Queries
We now show how to store all necessary information for answering in-block-queries using tables of size $2n + o(n)$ in total. The key to our solution is the following lemma, which has implicitly been used already in the Berkman-Vishkin algorithm. (Recall that the $\prec$-relation is used for answering RMQs, such that the answers become unique.)
Lemma 3.9 (Relating RMQs and Cartesian Trees). Let $A$ and $B$ be two arrays, both of size $s$. Then $\mathrm{rmq}_A(i, j) = \mathrm{rmq}_B(i, j)$ for all $1 \le i \le j \le s$ if and only if $C^{\mathrm{can}}(A) = C^{\mathrm{can}}(B)$.
Proof. It is easy to see that $\mathrm{rmq}_A(i, j) = \mathrm{rmq}_B(i, j)$ for all $1 \le i \le j \le s$ if and only if the following three conditions are satisfied:
1. The minimum under "$\prec$" occurs at the same position $m$, i.e., $\arg\min A = \arg\min B = m$.
2. For all $1 \le i \le j < m$, $\mathrm{rmq}_{A[1,m-1]}(i, j) = \mathrm{rmq}_{B[1,m-1]}(i, j)$.
3. For all $m < i \le j \le s$, $\mathrm{rmq}_{A[m+1,s]}(i, j) = \mathrm{rmq}_{B[m+1,s]}(i, j)$.
Due to the definition of the Canonical Cartesian Tree, points (1)–(3) are true if and only if the root of $C^{\mathrm{can}}(A)$ equals the root of $C^{\mathrm{can}}(B)$, and $C^{\mathrm{can}}(A[1, m-1]) = C^{\mathrm{can}}(B[1, m-1])$, and $C^{\mathrm{can}}(A[m+1, s]) = C^{\mathrm{can}}(B[m+1, s])$. As this is the definition of Cartesian Trees, this is true iff $C^{\mathrm{can}}(A) = C^{\mathrm{can}}(B)$.
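Lemma 3.9 can be checked exhaustively for small $s$. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, positions are 0-based, the tree is represented by its shape as nested tuples, and the $\prec$-relation is assumed to break ties toward the left position (the check itself uses permutations, so no ties occur).

```python
from itertools import permutations


def rmq(a, i, j):
    """0-based position of the leftmost minimum of a[i..j] (inclusive)."""
    return min(range(i, j + 1), key=lambda k: (a[k], k))


def rmq_signature(a):
    """Answers to all queries (i, j) with i <= j, in a fixed order."""
    n = len(a)
    return tuple(rmq(a, i, j) for i in range(n) for j in range(i, n))


def cartesian_shape(a, lo=0, hi=None):
    """Shape of the Cartesian tree of a[lo..hi] as nested (left, right)
    tuples; None stands for the empty tree."""
    if hi is None:
        hi = len(a) - 1
    if lo > hi:
        return None
    m = rmq(a, lo, hi)  # the minimum becomes the root
    return (cartesian_shape(a, lo, m - 1), cartesian_shape(a, m + 1, hi))


def lemma_holds(s):
    """Compare all pairs of permutations of size s: equal RMQ answers
    should coincide exactly with equal tree shapes."""
    arrays = list(permutations(range(s)))
    return all(
        (rmq_signature(a) == rmq_signature(b))
        == (cartesian_shape(a) == cartesian_shape(b))
        for a in arrays for b in arrays
    )
```

For example, `lemma_holds(4)` compares all 576 pairs of permutations of size 4; the number of distinct shapes found this way is the Catalan number of $s$.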
This means that table $P$ does not have to store the in-block-queries for all $n/s$ occurring blocks, but only for $C_s = 4^s/(\sqrt{\pi}\, s^{3/2})\,(1 + O(s^{-1}))$ possible blocks. The total space occupied by $P$ will be analyzed in Sect. 3.5.2.5.

3.5.2.3 Computing the Block Types
In order to index table $P$, it remains to show how to compute the types of the $n/s$ blocks $B_j$ occurring in $A$ in linear time; i.e., how to fill an array $T[1, n/s]$ such that $T[j]$ gives the type of block $B_j$. Lemma 3.9 implies that there are only $C_s$ different types of blocks, so we are looking for a surjection
$$t : \mathcal{A}_s \to \{0, \dots, C_s - 1\}, \text{ and } t(B_i) = t(B_j) \text{ iff } C^{\mathrm{can}}(B_i) = C^{\mathrm{can}}(B_j), \tag{3.8}$$
where $\mathcal{A}_s$ is the set of arrays of size $s$. The reason for requiring that $B_i$ and $B_j$ have the same Canonical Cartesian Tree is that in such a case both blocks share the same RMQs. Algorithm 3.2 shows how to compute the block type by making use of the ideas from Sect. 3.5.1.

Algorithm 3.2: An algorithm to compute the type of a block B_j
Input: a block B_j of size s
Output: the type of B_j, as defined by Eq. (3.8)
 1  let R be an array of size s + 1        {R stores elements on the rightmost path}
 2  R[1] ← −∞
 3  q ← s, N ← 0
 4  for i ← 1, . . . , s do
 5      while R[q + i − s] > B_j[i] do
 6          N ← N + C_{(s−i)q}             {add number of skipped paths}
 7          q ← q − 1                      {remove node from rightmost path}
 8      endw
 9      R[q + i + 1 − s] ← B_j[i]          {B_j[i] is new rightmost leaf}
10  endfor
11  return N
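Algorithm 3.2 translates almost line by line into Python. The sketch below makes one assumption that is not visible in this section: the Ballot Numbers $C_{pq}$ of Sect. 3.5.1 are taken to satisfy $C_{00} = 1$, $C_{pq} = C_{p(q-1)} + C_{(p-1)q}$ for $0 \le p \le q \ne 0$, and $C_{pq} = 0$ otherwise. The function names `ballot` and `block_type` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def ballot(p, q):
    """Ballot number C_{pq} under the recurrence assumed in the lead-in."""
    if p == 0 and q == 0:
        return 1
    if 0 <= p <= q and q != 0:
        return ballot(p, q - 1) + ballot(p - 1, q)
    return 0


def block_type(block):
    """Type of a block, following the steps of Algorithm 3.2."""
    s = len(block)
    R = [None] * (s + 2)      # index 0 unused, so R[1..s+1] as in the text
    R[1] = -math.inf          # stopper at the bottom of the stack
    q, N = s, 0
    for i in range(1, s + 1):
        x = block[i - 1]      # B_j[i] in the 1-based notation of the text
        while R[q + i - s] > x:
            N += ballot(s - i, q)   # add number of skipped paths
            q -= 1                  # remove node from rightmost path
        R[q + i + 1 - s] = x        # x is the new rightmost leaf
    return N
```

With this recurrence, `block_type([2, 1, 3])` and `block_type([3, 1, 2])` both evaluate to 3, in line with the fact that both blocks have their minimum in the middle and therefore share all RMQ answers.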
Lemma 3.10 (Correctness of Type-Computation). Algorithm 3.2 correctly computes the type of a block $B_j$ of size $s$ in $O(s)$ time, i.e., it computes a function satisfying the conditions given in (3.8).
Proof. Intuitively, Alg. 3.2 simulates the algorithm for constructing $C^{\mathrm{can}}(B_j)$ given in Sect. 3.2 and "implements" a function defined by (3.5). First note that array $R[1, s+1]$ simulates the stack containing the labels of the nodes on the rightmost path of the partial Canonical Cartesian Tree $C^{\mathrm{can}}_i(B_j)$, with $q + i - s$ pointing to the top of the stack (i.e., the rightmost leaf), and $R[1]$ acting as a "stopper." Now let $l_i$ be the number of times the while-loop (lines 5–8) is executed during the $i$th iteration of the outer for-loop. Note that $l_i$ equals the number of elements that are removed from the rightmost path when going from $C^{\mathrm{can}}_{i-1}(B_j)$ to $C^{\mathrm{can}}_i(B_j)$. So the sequence $l_1 l_2 \dots l_s$ satisfies (3.1), and therefore uniquely characterizes $C^{\mathrm{can}}(B_j)$. As the additions performed in line 6 of Alg. 3.2 are exactly those defined by (3.5), the value computed by the algorithm is the index of $C^{\mathrm{can}}(B_j)$ in an enumeration of all binary trees with $s$ nodes. We
3.5.2.4 Answering Queries
Having precomputed all the necessary information as above, it is easy to see that $\mathrm{rmq}(l, r)$ is one of the following five positions (precisely, the position where $A$ attains the minimum value):
• If the interval $[l : r]$ spans the superblocks $B'_{l'}, \dots, B'_{r'}$, compute the position of the minimum as $\arg\min_{k \in \{M'[l'][x],\ M'[r'-2^x+1][x]\}} \{A[k]\}$, where $x = \lfloor \log(r' - l') \rfloor$.
• If the interval $[l : r]$ spans the blocks $B_{l_1}, \dots, B_{r_1}$ to the left of superblock $B'_{l'}$, compute the position of the minimum in this interval as $\arg\min_{k \in \{M[l_1][x],\ M[r_1-2^x+1][x]\}} \{A[k]\}$, where $x = \lfloor \log(r_1 - l_1) \rfloor$.
• Same as the previous point, but for the blocks $B_{l_2}, \dots, B_{r_2}$ to the right of $B'_{r'}$.
• Compute the positions of the minimum in the blocks to the left of $B_{l_1}$ and to the right of $B_{r_2}$.
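The case distinction above amounts to cutting the query interval at block and superblock boundaries. The following Python sketch shows one possible way to do this; it is an illustration, not the thesis's code. The function name `decompose`, the 0-based inclusive indices and the tuple format are assumptions, and so is the choice to cut a block-query at a superblock boundary when no superblock lies completely inside the query.

```python
def decompose(l, r, s, sp):
    """Split the inclusive 0-based interval [l, r] into at most five
    parts (kind, lo, hi), with kind in {'in-block', 'block', 'superblock'}.
    s is the block size, sp the superblock size (a multiple of s)."""
    assert sp % s == 0 and 0 <= l <= r
    first_blk = -(-l // s)       # first block starting at or after l
    end_blk = (r + 1) // s       # one past the last block ending at or before r
    if first_blk >= end_blk:     # no block lies completely inside [l, r]
        if l // s == r // s:
            return [("in-block", l, r)]
        cut = (r // s) * s
        return [("in-block", l, cut - 1), ("in-block", cut, r)]
    parts = []
    lo, hi = first_blk * s, end_blk * s - 1   # block-aligned middle part
    if l < lo:
        parts.append(("in-block", l, lo - 1))
    first_sb = -(-lo // sp)
    end_sb = (hi + 1) // sp
    if first_sb < end_sb:        # at least one complete superblock
        sb_lo, sb_hi = first_sb * sp, end_sb * sp - 1
        if lo < sb_lo:
            parts.append(("block", lo, sb_lo - 1))
        parts.append(("superblock", sb_lo, sb_hi))
        if sb_hi < hi:
            parts.append(("block", sb_hi + 1, hi))
    elif lo // sp == hi // sp:   # all complete blocks share one superblock
        parts.append(("block", lo, hi))
    else:                        # cut at the superblock boundary
        cut = (hi // sp) * sp
        parts.append(("block", lo, cut - 1))
        parts.append(("block", cut, hi))
    if hi < r:
        parts.append(("in-block", hi + 1, r))
    return parts
```

For instance, with $s = 2$ and $s' = 8$, `decompose(3, 30, 2, 8)` yields the five parts in-block [3, 3], block [4, 7], superblock [8, 23], block [24, 29] and in-block [30, 30]. The final answer would be the position of the smallest element among the minima reported for these parts.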
3.5.2.5 Space Analysis
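Table $M'$ has dimensions $n/s' \times \log(n/s') = n/\log^{2+\varepsilon} n \times \log(n/\log^{2+\varepsilon} n)$ and stores values up to $n$; the total number of bits needed for $M'$ is therefore
$$|M'| = \frac{n}{\log^{2+\varepsilon} n} \log\frac{n}{\log^{2+\varepsilon} n} \cdot \log n = \frac{n}{\log^{1+\varepsilon} n}\left(\log n - \log\log^{2+\varepsilon} n\right) = \frac{n}{\log^{\varepsilon} n}\,(1 - o(1)) = o(n).$$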
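Table $M$ has dimensions $n/s \times \log(s'/s) = (2+\delta)n/\log n \times \log((2+\delta)\log^{1+\varepsilon} n)$. If we just store the offsets of the minima then the values do not become greater than $s'$; the total number of bits needed for $M$ is therefore
$$|M| = O\!\left(\frac{n}{\log n} \log(\log^{1+\varepsilon} n) \cdot \log(\log^{2+\varepsilon} n)\right) = O\!\left(\frac{n \log^2 \log n}{\log n}\right) = o(n).$$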
To store the type of each block, array $T$ has length $n/s = (2+\delta)n/\log n$, and because of Lemma 3.9, the numbers do not get bigger than $O(4^s/s^{3/2})$. This means that the number of bits to encode $T$ is
$$|T| = \frac{n}{s} \cdot \log\!\left(O\!\left(\frac{4^s}{s^{3/2}}\right)\right) = \frac{n}{s} \cdot (2s - O(\log s)) = 2n - O\!\left(\frac{n \log\log n}{\log n}\right) = 2n - o(n).$$
Finally, it is possible to store table $P$ with $o(n)$ bits: Lemma 3.9 implies that $P$ has only $O(4^s/s^{3/2})$ rows, one for each possible block-type. For each type we need to precompute $\mathrm{rmq}(i, j)$ for all $1 \le i \le j \le s$, so the number of columns in $P$ is $O(s^2)$. If we use the method described in Sect. 3.4.2 to represent the answers to all RMQs inside one block, this takes $O(s \cdot s)$ bits of space for each possible block.⁹ The total space is thus
$$|P| = O\!\left(\frac{4^s}{s^{3/2}}\, s \cdot s\right) = O\!\left(n^{2/(2+\delta)} \sqrt{\log n}\right) = O\!\left(n^{1 - \frac{1}{2/\delta + 1}} \sqrt{\log n}\right) = o(n/\log n) \text{ bits.}$$
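The intermediate steps behind the bound on $|P|$ can be spelled out as follows. This is a short worked derivation, assuming (as elsewhere in the analysis) that $\log$ denotes the binary logarithm and that $s = \log n/(2+\delta)$:

```latex
\begin{align*}
4^s &= 2^{2s} = 2^{\frac{2}{2+\delta}\log n} = n^{2/(2+\delta)}, \\
\frac{2}{2+\delta} &= 1 - \frac{\delta}{2+\delta} = 1 - \frac{1}{2/\delta + 1}, \\
\frac{s \cdot s}{s^{3/2}} &= \sqrt{s} = O\bigl(\sqrt{\log n}\bigr).
\end{align*}
```

Since $\delta > 0$ is a constant, the exponent $1 - 1/(2/\delta + 1)$ is a constant strictly smaller than 1, so the polynomial saving dominates every polylogarithmic factor, which gives the $o(n/\log n)$ bound.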
Because the space consumption of table $M$ is asymptotically higher than the second order term that is subtracted from the space of table $T$, the total space needed is $2n + o(n)$ bits. And because the peak space consumption at construction time is the same as that of the final data structure (apart from the negligible $O(\log n \log\log n)$ bits needed for array $R$, and the $O(\log^3 n)$ bits for the Ballot-Numbers in Alg. 3.2), we can now state the main result of this section:
Theorem 3.11 (Succinct representation of RMQ-information). For an array with $n$ elements from a totally ordered set, there exists an algorithm for the RMQ-problem with time complexity $\langle O(n), O(1) \rangle$ and bit-space complexity $\llbracket 2n + o(n),\ 2n + o(n) \rrbracket$.
We finally note that our algorithm is also easy to implement on PRAMs (or real-world shared-memory machines), where with $n/t$ processors the preprocessing runs in time $\Theta(t)$ if $t = \Omega(\log n)$, which is work-optimal. This is simply because the minimum-operation is associative and can hence be parallelized after a $\Theta(t)$ sequential initialization (Schwartz, 1980).
⁹ We remark that the usage of Alstrup et al.'s method for the precomputation of the in-block queries would not be necessary to achieve the $o(n)$ space bound, but it certainly saves some space.