
Cache-sensitive binary search trees

3.4 Local relocation

3.4.3 Fixing broken nodes

After every rotation, and after each actual insertion or actual deletion, Invariant 3.1 is re-established by invoking Algorithm 3.3 using the potentially broken nodes given in Figure 3.3.

Algorithm 3.3 uses an additional definition: D(x) is the set of neighbors of node x that depend on x (i.e., that will be broken if x is moved to another cache block). Thus D(x) ⊆ N(x) and 0 ≤ |D(x)| ≤ |N(x)| ≤ 3. The algorithm relies on the property that a broken node b can be moved freely, because D(b) = ∅: all of its neighbors N(b) are on different cache blocks, as otherwise b would not be broken.
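The definitions of N(x), D(x) and brokenness can be illustrated with a minimal Python sketch. Invariant 3.1 is not restated in this section, so the sketch assumes a simplified reading: a node is broken when it has at least one neighbor and none of its neighbors shares its cache block. Any exemption the invariant may make for particular kinds of nodes is not modeled. The `Node` class, its `block` field and the helper `link` are hypothetical names introduced only for illustration.

```python
class Node:
    """A binary tree node; `block` is the id of the cache block holding it (hypothetical field)."""
    def __init__(self, block):
        self.block = block
        self.parent = self.left = self.right = None

def link(parent, left=None, right=None):
    """Illustrative helper: attach up to two children to `parent`."""
    parent.left, parent.right = left, right
    for child in (left, right):
        if child is not None:
            child.parent = parent

def neighbors(x):
    """N(x): the parent and the children of x that exist (at most three nodes)."""
    return [y for y in (x.parent, x.left, x.right) if y is not None]

def is_broken(x):
    """Assumed reading of the invariant: x has neighbors, and none is on x's block."""
    ns = neighbors(x)
    return bool(ns) and all(y.block != x.block for y in ns)

def dependants(x):
    """D(x): neighbors of x on x's block whose only same-block neighbor is x."""
    return [y for y in neighbors(x)
            if y.block == x.block
            and all(z.block != y.block for z in neighbors(y) if z is not x)]
```

Under this reading a broken node always has an empty D, since none of its neighbors is on its block, which is the property the algorithm relies on.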

Figure 3.3 Broken nodes in actual insertion, actual deletion and rotations. Panels: (a) external tree insertion; (b) external tree deletion; (c) internal tree insertion; (d) internal tree deletion (non-leaf); (e) internal tree deletion (leaf); (f) single rotation; (g) double rotation. Potentially broken nodes are filled black; the dotted lines indicate the nodes that the operation works on.

fix-broken(B):

 1  while B ≠ ∅ do
 2    Remove any non-broken nodes from B (and exit if B becomes empty).
 3    if a node in N(B) has free space in its cache block then
 4      Select such a node x and a broken neighbor b ∈ B. Prefer the x with the most free space and a b with no broken neighbors.
 5      Move b to the cache block containing x.
 6    else if a node b ∈ B has enough free space in its cache block then
 7      Select the neighbor x ∈ N(b) with the smallest |D(x)|.
 8      Move x and all nodes in D(x) to the cache block containing b.
 9    else
10      Select a node x ∈ N(B) and its broken neighbor b ∈ B. Prefer a broken x, and after that an x with small |D(x)|. If there are multiple choices for b, prefer one with N(b) \ {x} non-broken.
11      Move b, x and all nodes in D(x) to a newly-allocated cache block.

Algorithm 3.3 The local relocation algorithm. B is a set of potentially broken nodes which the algorithm will make non-broken; $N(B) = \bigcup_{b \in B} N(b)$, where N(b) is the set of neighbors of node b. An implementation detail is that the algorithm needs access to the parent, grandparent and great-grandparent of each node in B, since the grandparent may have to be moved in lines 8 and 11.

Each iteration in Algorithm 3.3 first removes any non-broken nodes from the set B of potentially broken nodes. Then the first available one of three options is executed, and these two steps are repeated until B becomes empty.

The first option of the three examines the cache blocks of all neighbors of the broken nodes to find a neighbor x with free space in its cache block. If such a neighbor is found, a broken node b ∈ N(x) is fixed by moving it to this cache block. If no such neighbor exists, the second option is to examine the cache blocks of the nodes in B: if one of them has enough space for a neighbor x and its dependants D(x), they are moved to this cache block.

Otherwise, the third option is to forcibly fix a broken node b by moving it and some neighboring nodes to a newly allocated cache block.

At least one neighbor x of b needs to be moved along with b to make b non-broken; but if x was not broken, some of its other neighbors may depend on x staying where it is – these are exactly the nodes in D(x), and they are all moved to the new cache block. It is safe to move the nodes in D(x) together with x: since they depend on x, none of their other neighbors are on the same cache block.
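The three options described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of Section 3.5: a node counts as broken when it has neighbors and none shares its block (Invariant 3.1 is not restated here), a cache block holds `CAPACITY` nodes, occupancy is tracked in a plain dictionary `used`, and new blocks come from an iterator `fresh`. All of these names are hypothetical, and the tie-breaking preferences of the algorithm are only partly modeled.

```python
CAPACITY = 4  # assumed nodes per cache block; 4 matches the B1/B0 ratio quoted in the text

class Node:
    def __init__(self, block):
        self.block = block
        self.parent = self.left = self.right = None

def neighbors(x):
    return [y for y in (x.parent, x.left, x.right) if y is not None]

def is_broken(x):
    # Assumed reading of the invariant: no neighbor of x is on x's block.
    ns = neighbors(x)
    return bool(ns) and all(y.block != x.block for y in ns)

def dependants(x):
    # D(x): same-block neighbors of x that have no other same-block neighbor.
    return [y for y in neighbors(x)
            if y.block == x.block
            and all(z.block != y.block for z in neighbors(y) if z is not x)]

def fix_broken(B, used, fresh):
    """Fix the nodes in B; `used` maps block id -> occupied slots,
    `fresh` yields ids of newly allocated blocks. Returns the number of moves."""
    def free(block):
        return CAPACITY - used.get(block, 0)

    moves = 0
    while True:
        B = [b for b in B if is_broken(b)]                 # step 2
        if not B:
            return moves
        pairs = [(x, b) for b in B for x in neighbors(b)]
        roomy = [(x, b) for x, b in pairs if free(x.block) >= 1]
        fits = [(x, b) for x, b in pairs
                if free(b.block) >= 1 + len(dependants(x))]
        if roomy:                                          # steps 3-5
            x, b = max(roomy, key=lambda p: free(p[0].block))
            group, target = [b], x.block
        elif fits:                                         # steps 6-8
            x, b = min(fits, key=lambda p: len(dependants(p[0])))
            group, target = [x] + dependants(x), b.block
        else:                                              # steps 9-11
            x, b = min(pairs, key=lambda p: (not is_broken(p[0]),
                                             len(dependants(p[0]))))
            group, target = [b, x] + dependants(x), next(fresh)
        for n in group:                                    # relocate the chosen nodes
            used[n.block] -= 1
            used[target] = used.get(target, 0) + 1
            n.block = target
            moves += 1
```

Note that D(x) is computed before any node is moved, since moving x changes which nodes share its block. In the third option the group has at most four nodes (b, x and at most two dependants of x), so it fits into a fresh block under the assumed capacity.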

Clearly, applying these options repeatedly fixes any number of broken nodes:


Theorem 3.6 Assume that Invariant 3.1 holds in all nodes of a tree, except for a set B of broken nodes. Then giving B to Algorithm 3.3 will establish Invariant 3.1 everywhere.

When reading the following theorem, please recall that |B| ≤ 6 when Algorithm 3.3 is executed after a structure modification.

Theorem 3.7 Algorithm 3.3 moves at most 4|B| = O(|B|) nodes in memory. The total time complexity of the algorithm is $O(|B|^2)$.

Proof. Each iteration of the loop in Algorithm 3.3 fixes at least one broken node. Line 5 does this by moving one node; line 11 moves 2–4 nodes (b, x, and at most two other neighbors of x), and line 8 moves 1–3 nodes (x and at most two neighbors). There are at most |B| iterations, and thus at most 4|B| nodes are moved.

Each iteration looks at O(|B|) nodes, making the total time complexity $O(|B|^2)$, assuming that the amount of free space in a cache block can be found in constant time. However, a naïve implementation finds free space by looking at every node in the $B_1$-block to locate free positions for nodes. This increases the time complexity to $O(|B|^2 \cdot \lfloor B_1/B_0 \rfloor)$, but may actually be preferable with the small $B_1$ of current processors. The implementation described in Section 3.5 did this, with $B_1/B_0 = 4$.

With larger $B_1/B_0$, the bound of the theorem is reached by keeping track of the number of free positions for nodes in an integer stored in a fixed location inside the $B_1$-sized block. To find a free node in constant time, a doubly-linked list of free nodes can be stored in the otherwise unused free nodes themselves, as is done in [64], and a pointer to the head of this list stored in a fixed location inside the $B_1$-block.
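The bookkeeping described above can be simulated in a few lines of Python. This is only a sketch under assumptions: list indices stand in for pointers, the class `CacheBlock` and its fields are hypothetical names, and the actual layout used in [64] may differ. Each free slot stores its (previous, next) links in the space a node would otherwise occupy, so the free list needs no extra memory beyond the head pointer and the counter.

```python
class CacheBlock:
    """One block divided into node-sized slots (simulation; indices replace pointers)."""

    def __init__(self, slots=4):
        self.free_count = slots                 # counter kept at a fixed location
        self.head = 0 if slots > 0 else None    # head of the free list
        # A free slot i holds its (prev, next) links in its own unused space.
        self.slot = [(i - 1 if i > 0 else None,
                      i + 1 if i + 1 < slots else None)
                     for i in range(slots)]

    def allocate(self, payload):
        """Take the first free slot in constant time and store `payload` there."""
        if self.head is None:
            raise MemoryError("cache block is full")
        i = self.head
        _, nxt = self.slot[i]
        if nxt is not None:                     # the new head has no predecessor
            self.slot[nxt] = (None, self.slot[nxt][1])
        self.head = nxt
        self.free_count -= 1
        self.slot[i] = payload
        return i

    def release(self, i):
        """Return slot i to the front of the free list in constant time."""
        old = self.head
        self.slot[i] = (None, old)
        if old is not None:
            self.slot[old] = (i, self.slot[old][1])
        self.head = i
        self.free_count += 1
```

With this layout, both the amount of free space (`free_count`) and a free position (`head`) are available without scanning the block, which is what the constant-time assumption in the proof requires.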

A slight optimization is possible in the special case in which a leaf node n with parent p is deleted from an internal AVL tree (Figure 3.3(e)): broken nodes can then be fixed in a simpler way. If p and n are on the same cache block, then look at the other child n′ of p. If n′ is not on the same cache block as p, move n′ to the location where n was deleted from (in an AVL tree, n′ must be a leaf node). Otherwise (if p and n are on different cache blocks, or if n′ does not exist) nothing needs to be done. However, this optimization does not work for red-black trees, since there the node n′ might not be a leaf.
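The case analysis of this optimization is small enough to write out directly. The following Python sketch assumes that a node records its cache block in a hypothetical `block` field and that moving a node amounts to updating that field; the function name and parameters are likewise illustrative only.

```python
class Node:
    def __init__(self, block):
        self.block = block  # id of the cache block holding the node (hypothetical field)

def fix_after_leaf_deletion(p, deleted_block, other):
    """p: parent of the deleted leaf n; deleted_block: the block n occupied;
    other: the remaining child of p, or None. Returns True if `other` was moved."""
    if deleted_block == p.block and other is not None and other.block != p.block:
        other.block = deleted_block  # reuse the position that n just vacated
        return True
    return False                     # nothing needs to be done
```

For example, with p on block 0, the deleted leaf also on block 0 and the sibling on block 1, the sibling is moved to block 0; if the deleted leaf was on another block than p, the function changes nothing.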

A few bits are enough to store this: $\log_2 \lfloor B_1/B_0 \rfloor$ must be much smaller than, e.g., the size of a single child pointer.