
The BUB-Tree

6.3 BUB-Tree regions

A tuple in the index part of a (U)B⁺-Tree consists of a separator and a link to the associated child node. As a result, the whole data space is partitioned into disjoint but consecutive intervals on the SFC. Whether the resulting regions cover dead space is not known at the index level of the tree.

The fundamental idea of the BUB-Tree is to store two addresses that bound the first and last tuple on the indexed child node w.r.t. address order. This still guarantees a disjoint partitioning, and thus the worst-case performance of the basic operations remains logarithmic, as for the UB-Tree. Further, because the index bounds the populated space, search paths can be pruned during query processing.
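To make the pruning argument concrete, the following Python sketch models a single index node whose entries carry a bounding interval [σ, ε] instead of a single separator. The entry layout `(sigma, epsilon, child_id)` and the function `children_to_visit` are hypothetical names chosen for illustration. A query box generally maps to several SFC intervals; a single interval is used here for simplicity.

```python
from typing import List, Tuple

# Hypothetical index entry: (sigma, epsilon, child_id), where sigma and epsilon
# bound the first and last SFC address stored below the child.
Entry = Tuple[int, int, str]

def children_to_visit(entries: List[Entry], q_start: int, q_end: int) -> List[str]:
    """Return the children whose bounding interval [sigma, epsilon]
    intersects the query interval [q_start, q_end]; all others are pruned."""
    return [child for sigma, epsilon, child in entries
            if sigma <= q_end and q_start <= epsilon]

# Three children; addresses 10..39 and 61..79 are dead space.
node = [(0, 9, "A"), (40, 60, "B"), (80, 99, "C")]
print(children_to_visit(node, 15, 35))  # [] : the query lies entirely in dead space
print(children_to_visit(node, 55, 85))  # ['B', 'C']
```

With plain separators, as in a UB-Tree, the query [15, 35] would still descend into the child whose SFC interval contains it. The bounding intervals let the node reject the query immediately.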

Formally, a BUB-Tree can be defined as follows:

Definition 6.1 (BUB-Tree Structure)

A BUB-Tree is a set of disjoint segments $B = \{[\sigma_1, \varepsilon_1], [\sigma_2, \varepsilon_2], \ldots, [\sigma_k, \varepsilon_k]\}$ on an SFC partitioning the universe $\Omega$, where $\varepsilon_i \le \sigma_{i+1}$ for all $i \in [1, k-1]$.

A B⁺-Tree on the compound address $\sigma_i \circ \varepsilon_i$ indexes the segments. It is not necessary that $\sigma_i$ and $\varepsilon_i$ correspond to the first and last tuple on page $i$; however, when the populated space is bounded optimally, they do.
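The following is a minimal sketch of Definition 6.1, assuming integer SFC addresses. Each page is described by its start address σ_i and end address ε_i. `is_valid_bub_partitioning` checks the ordering constraints, and `find_page` locates a segment with a binary search that stands in for the B⁺-Tree on the compound key. Both function names are hypothetical. Python tuples compare lexicographically, which mimics ordering by the compound key σ_i∘ε_i.

```python
import bisect
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (sigma, epsilon): start and end address of one page

def is_valid_bub_partitioning(segments: List[Segment]) -> bool:
    """Check Definition 6.1: sigma_i <= epsilon_i for every segment and
    epsilon_i <= sigma_{i+1} for consecutive segments."""
    if any(s > e for s, e in segments):
        return False
    return all(segments[i][1] <= segments[i + 1][0] for i in range(len(segments) - 1))

def find_page(segments: List[Segment], address: int) -> Optional[int]:
    """Binary search standing in for the B+-Tree on the compound key.
    Returns the index of the segment containing `address`, or None if the
    address falls into dead space between (or outside) the segments."""
    # Rightmost segment whose compound key is <= (address, +infinity),
    # i.e. the segment with the largest start address not above `address`.
    i = bisect.bisect_right(segments, (address, float("inf"))) - 1
    if i >= 0 and segments[i][0] <= address <= segments[i][1]:
        return i
    return None

segs = [(0, 9), (40, 60), (80, 99)]
print(is_valid_bub_partitioning(segs))  # True
print(find_page(segs, 45))              # 1
print(find_page(segs, 70))              # None (dead space)
```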

Using the compound key $\sigma_i \circ \varepsilon_i$ for the indexing B⁺-Tree allows the tree structure to be reused without modification. Index pages can now store only half as many entries as in a UB-Tree, but using prefix compression instead of storing the plain keys significantly reduces this overhead. For best results, compression should exploit the prefixes of the addresses $\sigma_i$ and $\varepsilon_i$, not the prefix of the compound key $\sigma_i \circ \varepsilon_i$.
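The advice on prefix compression can be checked with a small Python model. This is a simplified, hypothetical cost model and not the thesis's page format. Addresses are treated as fixed-width bit strings. A prefix shared by all entries of a page is stored once, and each entry keeps only its suffix. Length fields and other page overhead are ignored. `compressed_bits` compares two options: compressing the start and end addresses separately, or compressing the concatenated compound key.

```python
from typing import List, Tuple

def common_prefix_len(bitstrings: List[str]) -> int:
    """Length of the longest prefix shared by all bit strings. For a set of
    strings it suffices to compare the lexicographically smallest and largest."""
    if not bitstrings:
        return 0
    lo, hi = min(bitstrings), max(bitstrings)
    n = 0
    while n < min(len(lo), len(hi)) and lo[n] == hi[n]:
        n += 1
    return n

def compressed_bits(entries: List[Tuple[int, int]], width: int, separate: bool) -> int:
    """Bits needed for one index page under the simplified model:
    a shared prefix is stored once, every entry keeps only its suffix."""
    sigmas = [format(s, f"0{width}b") for s, _ in entries]
    epsilons = [format(e, f"0{width}b") for _, e in entries]
    if separate:
        # Compress the sigma column and the epsilon column independently.
        ps, pe = common_prefix_len(sigmas), common_prefix_len(epsilons)
        return ps + pe + len(entries) * ((width - ps) + (width - pe))
    # Compress the concatenated compound key sigma∘epsilon as a whole.
    compound = [s + e for s, e in zip(sigmas, epsilons)]
    pc = common_prefix_len(compound)
    return pc + len(entries) * (2 * width - pc)

entries = [(0b10100000, 0b10100111), (0b10101000, 0b10101111), (0b10110000, 0b10110011)]
print(compressed_bits(entries, 8, separate=True))   # 36
print(compressed_bits(entries, 8, separate=False))  # 42  (uncompressed: 48)
```

The start addresses already differ after three bits, so the compound key's shared prefix ends there. As a result, the prefix shared by the end addresses is never exploited. Compressing both address columns separately saves those bits.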

Pushing the interval concept further up into the index results in index entries whose interval bounds all entries of the child node they refer to. This is similar to the hierarchy of bounding boxes in an R*-Tree.
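The following sketch shows how the bounding intervals propagate upward. It assumes pages are filled greedily in SFC order, as in a simple bulk load, which is not necessarily how the BUB-Tree's algorithms place tuples. `bounding_levels` is a hypothetical helper. Because the children of a node are sorted and disjoint, the parent's interval is simply the first child's start address and the last child's end address.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (sigma, epsilon) on the SFC

def bounding_levels(addresses: List[int], data_cap: int, index_cap: int) -> List[List[Interval]]:
    """Build the index entries of every level bottom-up and return them root level first.
    Each parent entry bounds all entries of the child node it refers to."""
    addrs = sorted(addresses)
    # Data pages: pack consecutive tuples; bound each page by its first and last address.
    level = [(addrs[i], addrs[min(i + data_cap, len(addrs)) - 1])
             for i in range(0, len(addrs), data_cap)]
    levels = [level]
    while len(level) > index_cap:
        # Group up to index_cap entries into one node; its parent entry spans
        # from the first entry's sigma to the last entry's epsilon.
        level = [(level[i][0], level[min(i + index_cap, len(level)) - 1][1])
                 for i in range(0, len(level), index_cap)]
        levels.append(level)
    return levels[::-1]

levels = bounding_levels([3, 5, 12, 13, 30, 31, 33, 50, 58], data_cap=2, index_cap=2)
print(levels[0])  # [(3, 50), (58, 58)]
```

The example uses nine hypothetical addresses, a data page capacity of two, and an index capacity of two. These are the same capacities as in Example 6.1. The function yields three index levels: two root entries, three entries on level 2, and five entries on level 3 pointing to the data pages.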

As with the R*-Tree, the BUB-Tree allows a page utilization below 50% in order to find a better partitioning of the universe. We refer to this lower bound for the page utilization as $U_{\min}$. Although it is a property of the BUB-Tree, it can be overridden by a new $U_{\min}$ during insertion or reorganization.
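A hypothetical split heuristic shows why a lower U_min can help. When a page overflows, it is cut at the largest gap between neighboring addresses. Both halves must still keep at least ⌈U_min · capacity⌉ tuples. This heuristic and the name `gap_split` are illustrative assumptions, not the BUB-Tree's actual split algorithm.

```python
import math
from typing import List, Optional

def gap_split(addresses: List[int], capacity: int, u_min: float) -> Optional[int]:
    """Split point for an overflowing page: cut at the largest gap between
    adjacent SFC addresses such that both halves keep at least
    ceil(u_min * capacity) tuples. Returns the size of the left half,
    or None if no cut satisfies the minimum utilization."""
    addrs = sorted(addresses)
    min_fill = max(1, math.ceil(u_min * capacity))
    best, best_gap = None, -1
    for cut in range(min_fill, len(addrs) - min_fill + 1):
        gap = addrs[cut] - addrs[cut - 1]  # dead space removed by cutting here
        if gap > best_gap:
            best, best_gap = cut, gap
    return best

page = [10, 11, 12, 13, 90]  # hypothetical overflowing page, capacity 4
print(gap_split(page, 4, 0.5))   # 2 -> [10, 11] | [12, 13, 90]
print(gap_split(page, 4, 0.25))  # 4 -> [10, 11, 12, 13] | [90]
```

Consider the overflowing page [10, 11, 12, 13, 90] with capacity four. With U_min = 50%, the split produces [10, 11] and [12, 13, 90], whose bounding segments span 2 + 79 = 81 addresses. With U_min = 25%, the outlier 90 gets its own page, and the segments span only 4 + 1 = 5 addresses.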

Example 6.1 (Structure of a BUB-Tree)

Figure 6.1 on the following page shows nine data points in relation to their position on the SFC. The points are grouped into data pages with a capacity of two tuples.

[Figure 6.1: Data, SFC-segments, and Pages of a BUB-Tree (index levels 1 to 3 above the data points and the SFC)]

The intervals stored in the BUB-Tree are displayed according to the index levels 1 to 3, where level 1 is the level of the root. Index pages have a capacity of two index entries. The hierarchy of indexed SFC-segments is indicated by top-down arrows from the corresponding pages. The dead-space segments not covered by the index are depicted as segments with bottom-up arrows from the SFC to the index level at which the dead space is detected during search.

Example 6.2 (Partitioning of a UB-Tree, BUB-Tree, and R*-Tree)

Figure 6.2 on the next page shows the partitioning of a two-dimensional universe for the UB-Tree, BUB-Tree, and R*-Tree at the last index level, directly above the data pages. Each region of the UB-Tree and BUB-Tree is drawn in a different color.

The bounding boxes of the R*-Tree are filled grey, while each box boundary is drawn in its own color. Figure 6.2(a) shows two clusters of data. The UB-Tree has large Z-regions that also cover dead space.

In contrast, the regions of the BUB-Tree and the R*-Tree closely approximate the populated area, and most of the dead space (white area) is recognized by the index. The R*-Tree already shows some overlapping regions, while the BUB-Tree shows small dead-space areas between regions.

In order to estimate the quality of a BUB-Tree partitioning, we define the coverage of an index, i.e., the amount of space covered by the index. Coverage in this context is the cardinality of the set of all points $\vec{p} \in \Omega$ for which data pages must be inspected to determine whether $\vec{p}$ actually exists. We normalize this value by the size of the universe and thus obtain a percentage.
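The definition of coverage can be read operationally: an address is covered exactly when a search for it reaches a data page. The Python sketch below descends through the bounding intervals and reports the level at which dead space is detected. This mirrors the bottom-up arrows of Figure 6.1. The classes `DataPage` and `IndexNode` and the function `dead_space_level` are hypothetical. The tree built at the end uses hypothetical addresses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass
class DataPage:
    addresses: List[int]  # SFC addresses of the stored tuples

@dataclass
class IndexNode:
    # Each entry is (sigma, epsilon, child); child is an IndexNode or a DataPage.
    entries: List[Tuple[int, int, Union["IndexNode", DataPage]]] = field(default_factory=list)

def dead_space_level(root: IndexNode, address: int) -> Optional[int]:
    """Descend from the root (level 1). Return the level at which `address`
    is recognized as dead space, or None if the search reaches a data page,
    i.e. the address is covered and the page must be inspected."""
    node, level = root, 1
    while isinstance(node, IndexNode):
        child = next((c for s, e, c in node.entries if s <= address <= e), None)
        if child is None:
            return level  # no interval on this level contains the address
        node, level = child, level + 1
    return None

leaf_a, leaf_b, leaf_c = DataPage([3, 5]), DataPage([12, 13]), DataPage([58])
left = IndexNode([(3, 5, leaf_a), (12, 13, leaf_b)])
right = IndexNode([(58, 58, leaf_c)])
root = IndexNode([(3, 13, left), (58, 58, right)])
print(dead_space_level(root, 4))   # None: page [3, 5] must be inspected
print(dead_space_level(root, 8))   # 2
print(dead_space_level(root, 30))  # 1
```

Address 4 lies in the interval [3, 5] of a data page although no tuple with that address is stored, so it counts toward the coverage. Address 8 is rejected on level 2, and address 30 is rejected already at the root.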

We provide no definitions for the B⁺-Tree and the UB-Tree: since they always cover the complete universe, their coverage is always 100%. The coverage of a BUB-Tree is defined as follows:

Definition 6.2 (BUB-Tree-Coverage)

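The coverage of a BUB-Tree partitioning $B(\Omega) = \{[\sigma_1, \varepsilon_1], [\sigma_2, \varepsilon_2], \ldots, [\sigma_k, \varepsilon_k]\}$ for a universe $\Omega$ is defined as

$$\mathrm{cov}(B(\Omega)) = \frac{\sum_{i=1}^{k} \mathrm{vol}([\sigma_i, \varepsilon_i])}{|\Omega|} = \frac{\sum_{i=1}^{k} (\varepsilon_i - \sigma_i + 1)}{|\Omega|}$$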
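Computing the coverage takes a single pass over the segments. The sketch assumes discrete SFC addresses. It counts a closed segment [σ_i, ε_i] as ε_i − σ_i + 1 addresses, so a page holding a single point contributes exactly one address. This matches the capacity-one case k/|Ω| discussed below. The universe size and the segments are hypothetical.

```python
from typing import List, Tuple

def coverage(segments: List[Tuple[int, int]], universe_size: int) -> float:
    """cov(B(Omega)): number of SFC addresses inside the segments divided by |Omega|.
    A closed segment [sigma, epsilon] is counted as epsilon - sigma + 1 addresses."""
    return sum(e - s + 1 for s, e in segments) / universe_size

# Hypothetical 8x8 universe, i.e. 64 Z-addresses 0..63.
print(coverage([(0, 7), (40, 47)], 64))            # 0.25
print(coverage([(3, 3), (17, 17), (42, 42)], 64))  # 0.046875 = 3/64 (capacity one)
```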

[Figure 6.2: Partitioning of Data resp. Universe for UB-Tree, BUB-Tree and R*-Tree; panels include (a) Data, (b) UB-Tree]

Definition 6.3 (Optimal Coverage of a BUB-Tree)

A BUB-Tree partitioning $B_i(\Omega)$ of a universe $\Omega$ is called optimal iff $\nexists\, B_j(\Omega) : \mathrm{cov}(B_j(\Omega)) < \mathrm{cov}(B_i(\Omega))$, where $B_i(\Omega)$ and $B_j(\Omega)$ have the same size, i.e., the same number of regions.

The goal is to minimize the coverage of a BUB-Tree while retaining a designated page utilization. Since a universe admits many possible partitionings, finding the one with minimal coverage is not trivial; we therefore present algorithms that try to minimize the coverage while causing only modifications that are local w.r.t. the tree structure of the BUB-Tree. The number of possible partitionings is finite, since we index a finite set of tuples.
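The finiteness argument also means the optimum of Definition 6.3 can be computed exactly for small inputs. The sketch below is not one of the local algorithms referred to above. It is a hypothetical global baseline using dynamic programming, and it assumes distinct addresses and tight bounds (σ = first, ε = last address on a page). Because segments must be disjoint on the SFC, every page holds a consecutive run of the sorted addresses. The DP therefore only needs to choose the run lengths for exactly k pages.

```python
import math
from functools import lru_cache
from typing import List, Optional

def optimal_coverage(addresses: List[int], k: int, capacity: int,
                     u_min: float, universe_size: int) -> Optional[float]:
    """Minimal coverage over all partitionings of the (distinct) addresses into
    exactly k pages, each holding between ceil(u_min * capacity) and capacity
    tuples. Returns None if no such partitioning exists."""
    addrs = sorted(addresses)
    n = len(addrs)
    min_fill = max(1, math.ceil(u_min * capacity))

    @lru_cache(maxsize=None)
    def best(start: int, pages_left: int) -> float:
        # Minimal number of covered addresses for addrs[start:] using pages_left pages.
        if start == n:
            return 0 if pages_left == 0 else math.inf
        if pages_left == 0:
            return math.inf
        result = math.inf
        for size in range(min_fill, capacity + 1):
            end = start + size
            if end > n:
                break
            # Tight bounds: the page's segment runs from its first to its last address.
            span = addrs[end - 1] - addrs[start] + 1
            result = min(result, span + best(end, pages_left - 1))
        return result

    total = best(0, k)
    return None if math.isinf(total) else total / universe_size

print(optimal_coverage([0, 1, 2, 3, 50, 51, 52], k=2, capacity=4, u_min=0.5, universe_size=64))
```

For the hypothetical addresses above, the optimum places [0..3] and [50..52] on separate pages and covers 4 + 3 = 7 of 64 addresses. The alternative split [0..2] | [3, 50..52] would cover 3 + 50 = 53.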

The coverage also depends on the page capacity: with a capacity of one, the data pages cover only the indexed points themselves, and thus the coverage is $\mathrm{cov}(B(\Omega)) = k/|\Omega|$ for a relation with $k$ data pages, i.e., $k$ points.

In document Advanced Concepts and Applications of the UB-Tree (Page 143-146)