In Section 6.1.5, we mentioned two approaches to the offline range min-gap query via Mo’s algorithm [1]. We elaborate on these here. We first design a data structure D that supports the following three operations:
(1) insert a real into D; (2) remove a real from D;
(3) output the min-gap of all the reals in D.
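A minimal sketch of one way to support these three operations is given here. It assumes a treap as the balanced binary search tree (the text does not fix a particular tree), and the names `MinGapSet`, `pull`, `split`, `merge`, and `erase` are hypothetical. Each node stores the minimum, the maximum, and the min-gap of its subtree, and `pull` recomputes them from the two children in O(1) time.

```python
import random

INF = float("inf")

class Node:
    __slots__ = ("key", "prio", "left", "right", "lo", "hi", "gap")
    def __init__(self, key):
        self.key, self.prio = key, random.random()
        self.left = self.right = None
        self.lo = self.hi = key   # min and max of the subtree
        self.gap = INF            # min-gap of the subtree

def pull(t):
    """Recompute min, max, and min-gap of t's subtree from its children."""
    t.lo = t.hi = t.key
    t.gap = INF
    if t.left:
        t.lo = t.left.lo
        t.gap = min(t.gap, t.left.gap, t.key - t.left.hi)
    if t.right:
        t.hi = t.right.hi
        t.gap = min(t.gap, t.right.gap, t.right.lo - t.key)
    return t

def split(t, key):
    """Split t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        a, b = split(t.right, key)
        t.right = a
        return pull(t), b
    a, b = split(t.left, key)
    t.left = b
    return a, pull(t)

def merge(a, b):
    """Merge two treaps where every key in a is <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return pull(a)
    b.left = merge(a, b.left)
    return pull(b)

def erase(t, key):
    """Remove one copy of key from t (no-op if key is absent)."""
    if t is None:
        return None
    if key < t.key:
        t.left = erase(t.left, key)
    elif key > t.key:
        t.right = erase(t.right, key)
    else:
        return merge(t.left, t.right)
    return pull(t)

class MinGapSet:
    def __init__(self):
        self.root = None
    def insert(self, x):
        a, b = split(self.root, x)
        self.root = merge(merge(a, Node(x)), b)
    def remove(self, x):
        self.root = erase(self.root, x)
    def min_gap(self):
        return self.root.gap if self.root else INF
```

With a treap the O(log n) bounds hold in expectation rather than in the worst case; a deterministic balanced tree would maintain the same three fields during its rotations.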
To implement all three operations efficiently, one can augment a balanced binary search tree with three fields, max, min, and min-gap, which store the maximum, the minimum, and the min-gap of each subtree. Since all three fields can be maintained in O(1) time per node, D can perform each of the three operations in O(log n) time. Now, assume there are m range min-gap queries to answer in total, and each query is of the form [l_i, r_i], where 1 ≤ l_i < r_i ≤ n. According to Mo’s algorithm, we need to
order these intervals by ⌊l_i/√n⌋ first, then by r_i. Since l_i, r_i, and ⌊l_i/√n⌋ are all integers
in the range [0, n], counting sort applies and hence takes O(m + n) time. We then initialize D by inserting the reals in the subarray A[l_1..r_1] and answer the first query.
This step takes O(n log n) time. As we move from the i-th query to the (i + 1)-th query, we need to maintain D by inserting A_j for each j ∈ [l_{i+1}, r_{i+1}] \ [l_i, r_i] and removing A_j for each
j ∈ [l_i, r_i] \ [l_{i+1}, r_{i+1}]. Then, D is ready to answer the (i + 1)-th query, and we repeat
this process until all m queries have been answered. It can be shown that the total number of insertions and removals is bounded by ((m/√n)√n + n)√n = (m + n)√n: the left endpoint moves O(√n) positions per query, and the right endpoint moves O(n) positions for each of the √n groups of queries sharing the same value of ⌊l_i/√n⌋. Thus, it takes O((m + n)√n log n) time to answer all m queries.
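The ordering and the pointer movement described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the thesis. The class `NaiveMinGap` is a hypothetical stand-in for D, built on a sorted list with linear-time operations, so that the block is self-contained. The built-in comparison sort replaces counting sort, and the block size ⌊√n⌋ approximates the key ⌊l_i/√n⌋. Queries are 1-indexed and inclusive, as in the text.

```python
import bisect
import math

class NaiveMinGap:
    """Stand-in for D: a sorted list (linear-time operations, for illustration only)."""
    def __init__(self):
        self.vals = []
    def insert(self, x):
        bisect.insort(self.vals, x)
    def remove(self, x):
        # Assumes x is present; removes one copy.
        self.vals.pop(bisect.bisect_left(self.vals, x))
    def min_gap(self):
        return min((b - a for a, b in zip(self.vals, self.vals[1:])), default=math.inf)

def mo_min_gap(A, queries):
    """Answer offline min-gap queries (l, r), 1 <= l < r <= len(A), in input order."""
    block = max(1, math.isqrt(len(A)))
    # Sort query indices by block of the left endpoint, then by right endpoint.
    order = sorted(range(len(queries)),
                   key=lambda i: (queries[i][0] // block, queries[i][1]))
    D = NaiveMinGap()
    cur_l, cur_r = 1, 0                # current window A[cur_l..cur_r] is empty
    answers = [None] * len(queries)
    for i in order:
        l, r = queries[i]
        # Grow the window first so that it never becomes invalid, then shrink it.
        while cur_r < r:
            cur_r += 1
            D.insert(A[cur_r - 1])
        while cur_l > l:
            cur_l -= 1
            D.insert(A[cur_l - 1])
        while cur_r > r:
            D.remove(A[cur_r - 1])
            cur_r -= 1
        while cur_l < l:
            D.remove(A[cur_l - 1])
            cur_l += 1
        answers[i] = D.min_gap()
    return answers
```

Replacing `NaiveMinGap` by a structure with O(log n) insertions, removals, and min-gap reports gives the running time stated above.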
As we see above, Mo’s algorithm judiciously determines an order so that one does not need to change too many elements when switching between consecutive queries. In fact, we can often do better than that. If we treat each query [l_i, r_i] as a point (l_i, r_i)
in ℝ², the cost of moving from the i-th to the (i + 1)-th query, measured in insertions and removals, is no more than the
L1-distance between (l_i, r_i) and (l_{i+1}, r_{i+1}). Therefore, to reduce the overall cost, we
can map all the queries to points in the plane and compute their rectilinear minimum spanning tree [39, 64]. With the MST in hand, we perform an Euler tour of the tree (starting from any vertex) and then answer the queries in the order in which their vertices are first visited along the tour. By the triangle inequality, the total cost is no more than twice the total tree
length. The planar rectilinear minimum spanning tree can be computed in O(m log m) time for the m query points, and computing the Euler tour takes linear time. So, it is generally a good idea to apply this optimization to obtain a better sequence than the one from Mo’s algorithm.
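The MST-based ordering can be illustrated with the sketch below, written under simplifying assumptions. It uses Prim's algorithm on the complete graph with L1 edge weights, which takes quadratic time and is only a stand-in for the faster rectilinear MST algorithms cited above. The function name `mst_query_order` is hypothetical. The returned order lists the vertices by first visit along an Euler tour started at vertex 0, which is a preorder traversal of the tree.

```python
def l1(p, q):
    """L1-distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mst_query_order(points):
    """Return (order, tree_length) for query points (l, r).

    order is the sequence of point indices by first visit along an Euler tour
    of an L1 minimum spanning tree; tree_length is the total MST length.
    """
    m = len(points)
    if m == 0:
        return [], 0
    in_tree = [False] * m
    best = [float("inf")] * m      # cheapest known connection to the tree
    parent = [-1] * m
    children = [[] for _ in range(m)]
    best[0] = 0
    tree_length = 0
    for _ in range(m):
        # Prim's step: pick the closest vertex not yet in the tree.
        u = min((i for i in range(m) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        tree_length += best[u]
        if parent[u] >= 0:
            children[parent[u]].append(u)
        for v in range(m):
            if not in_tree[v]:
                d = l1(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    # Iterative preorder traversal, i.e. first visits along the Euler tour.
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order, tree_length
```

Answering the queries in this order and summing the L1-distances between consecutive points gives a cost of at most twice `tree_length`, since each skipped stretch of the tour is shortcut by a single step that is no longer by the triangle inequality.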
On the other hand, it is worth mentioning that this approach cannot improve the worst-case performance. Consider the following example. We are given a √n × √n grid whose cells have side length √n, and imagine we have roughly (√n × √n)/2 = Θ(n) queries, located at the centers of the squares in the upper triangle of the grid. Since the L1-distance between any two
distinct cell centers is at least √n, the total length of the MST is Ω(n√n), as it has Θ(n) edges. With m = Θ(n), the bound (m + n)√n for Mo’s algorithm is also Θ(n√n), so this example shows the tightness of Mo’s algorithm.
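The example can be checked numerically for small n with the following sketch. It assumes that n is a perfect square with n ≥ 9, that cell centers are rounded to integers, and that only cells strictly above the diagonal are used, so that l < r holds. The function names are hypothetical.

```python
import math
from itertools import combinations

def grid_center_queries(n):
    """Queries (l, r) at the centers of the grid cells strictly above the diagonal."""
    s = math.isqrt(n)
    assert s * s == n, "sketch assumes n is a perfect square"
    off = (s + 1) // 2             # rounded offset of a cell's center within the cell
    return [(a * s + off, b * s + off) for a in range(s) for b in range(a + 1, s)]

def mst_length_lower_bound(queries):
    """(number of MST edges) times (minimum pairwise L1-distance); needs >= 2 queries."""
    d_min = min(abs(p[0] - q[0]) + abs(p[1] - q[1])
                for p, q in combinations(queries, 2))
    return (len(queries) - 1) * d_min
```

For a perfect square n this yields √n(√n − 1)/2 queries with pairwise L1-distance at least √n, so the lower bound grows as Θ(n√n), in line with the argument above.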
Chapter 7
Conclusion and future work
We summarize the contributions of this thesis and list some open problems for future work.
7.1 Summary of contributions
In Chapter 2, we investigated the preference top-k query problem, where one must preprocess a dataset of points in ℝ^d so that a user can efficiently retrieve the top-k candidates w.r.t. the user’s specific preference. We presented efficient algorithms in 2D and 3D and also considered two query variants, namely, the range preference top-k query and the preference top-k query with fuzzy vectors. Furthermore, in Chapter 3, we proposed a new sampling-based approximation algorithm to answer the preference top-k query. We proved via theoretical analysis that in ℝ² the method samples only a small subset of
the input while guaranteeing that the approximation error is within a user-specified tolerance. For ℝ³ and ℝ⁴ we provided experimental evidence for this claim.
In Chapter 4, we extended the concept of a line arrangement to the stochastic setting and investigated the most-likely k-topmost lines problem. We derived an upper bound on the expected number of changes to the set of most-likely k-topmost lines, taken over the entire x-axis. We also showed, via a concrete example, that the upper bound can be quadratic in the worst case even when k = 1. Moreover, we proposed an efficient algorithm to compute the most-likely k-topmost lines over the entire x-axis. Finally, we considered two related applications, namely, stochastic Voronoi Diagrams in ℝ¹ and
stochastic preference top-k queries in ℝ².
In Chapter 5, we generalized the idea of the stochastic Voronoi Diagram and its related problems from ℝ¹ to a general tree space. Specifically, we investigated two
fundamental proximity problems under the stochastic setting, the closest-pair problem and nearest-neighbor search. For the former, we proposed the first algorithm for computing the ℓ-threshold probability and the expectation of the closest-pair distance of a realization of the stochastic input points. For the latter, we studied the k most-likely nearest-neighbor search (k-LNN) via a notion called the k most-likely Voronoi Diagram (k-LVD).
In Chapter 6, we further explored the proximity problems in query-retrieval mode and proposed efficient exact solutions to the range closest pair problem for queries such as a p-sided axis-aligned rectangle (p = 2, 3, 4), a halfplane, and a disc with fixed radius. We also presented a general approximation framework that is flexible enough to handle other query shapes. Some of our proofs (e.g., those bounding the number of candidate pairs for halfplane queries and radius-fixed discs (for short queries)) are of independent combinatorial interest.