Bulk update algorithms for AVL trees

5.2 Log-squared rebalancing algorithm

After the actual insertion has been performed as specified in the previous section, the resulting AVL tree is in balance except possibly at the nodes on the path from the root of the update tree to the root of the whole tree.

This is because the update tree itself is in balance, and inserting it in the original tree only affects the heights of the ancestors of the update tree. Therefore, in single-bulk insertion, the rebalancing algorithm only needs to make this path balanced.

The node balancing algorithm of Section 4.1 suggests a simple strategy for rebalancing: execute the node balancing algorithm on each node on the path up from the parent of the update tree to the root of the whole tree. Algorithm 5.2 includes the optimization that it stops early if it reaches a node which is in balance and whose height is the same as in the original tree, since then the ancestors are already in balance.

rebalance(S, P):
    for each node n in P, from the bottom up, do
        Update the height value of n.
        if n is in balance and the height value did not change then
            return
        balance-node(n)

Algorithm 5.2 The log-squared rebalancing algorithm for bulk insertion.

The argument P is the path up from the parent of the root of the update tree to the root of the whole tree. This algorithm does not use S, which is a pointer to the root of the update tree.

Figure 5.1 Two possibilities for minimum possible growth of consecutive siblings of a path in an AVL tree. Note that $h_{i+4} \ge h_i + 3$ in both cases. The balancing direction is displayed inside each node.
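The bottom-up loop of Algorithm 5.2 can be sketched in Python as follows. This is only an illustration under several assumptions that are not part of the text: nodes are immutable and the path is copied rather than updated in place, so the early return becomes a `settled` flag after which ancestors are merely relinked; the empty tree has height 0; and `balance_node` is a stand-in built on the well-known join procedure for AVL trees, not the node balancing algorithm of Section 4.1, which is not reproduced here. Rotation counts of this sketch therefore need not match the bounds proved in this section. All names (`Node`, `join_right`, `insert_update_tree`, and so on) are hypothetical.

```python
class Node:
    """Immutable AVL node; the height is cached (assumption: empty tree = 0)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(t):
    return t.height if t is not None else 0

def rotate_left(t):
    r = t.right
    return Node(r.key, Node(t.key, t.left, r.left), r.right)

def rotate_right(t):
    l = t.left
    return Node(l.key, l.left, Node(t.key, l.right, t.right))

def join_right(l, key, r):
    """Attach (key, r) along the right spine of the taller AVL tree l."""
    if height(l.right) <= height(r) + 1:
        t = Node(key, l.right, r)
        if t.height <= height(l.left) + 1:
            return Node(l.key, l.left, t)
        return rotate_left(Node(l.key, l.left, rotate_right(t)))
    t = join_right(l.right, key, r)
    n = Node(l.key, l.left, t)
    return n if t.height <= height(l.left) + 1 else rotate_left(n)

def join_left(l, key, r):
    """Mirror image of join_right: here r is the taller AVL tree."""
    if height(r.left) <= height(l) + 1:
        t = Node(key, l, r.left)
        if t.height <= height(r.right) + 1:
            return Node(r.key, t, r.right)
        return rotate_right(Node(r.key, rotate_left(t), r.right))
    t = join_left(l, key, r.left)
    n = Node(r.key, t, r.right)
    return n if t.height <= height(r.right) + 1 else rotate_right(n)

def balance_node(key, l, r):
    """Stand-in for balance-node: l and r are AVL trees of arbitrary heights."""
    if height(l) > height(r) + 1:
        return join_right(l, key, r)
    if height(r) > height(l) + 1:
        return join_left(l, key, r)
    return Node(key, l, r)

def rebalance(sub, path):
    """sub: new subtree; path: (ancestor, side) pairs, bottom-up, side 'L' or 'R'."""
    settled = False
    for node, side in path:
        l, r = (sub, node.right) if side == "L" else (node.left, sub)
        if settled:
            sub = Node(node.key, l, r)  # ancestors only need relinking
            continue
        in_balance = abs(height(l) - height(r)) <= 1
        sub = balance_node(node.key, l, r)
        # Early-stop condition: in balance and same height as before.
        settled = in_balance and sub.height == node.height
    return sub

def insert_update_tree(tree, update_tree):
    """Assumes every update key falls into one gap between adjacent keys of tree."""
    path, n = [], tree
    while n is not None:
        side = "L" if update_tree.key < n.key else "R"
        path.append((n, side))
        n = n.left if side == "L" else n.right
    return rebalance(update_tree, path[::-1])

def build(keys):
    """Balanced tree from a sorted list of keys."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))

def is_avl(t):
    return t is None or (abs(height(t.left) - height(t.right)) <= 1
                         and is_avl(t.left) and is_avl(t.right))

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

For instance, `insert_update_tree(build(list(range(0, 630, 10))), build(list(range(41, 50))))` hangs a nine-key update tree into the gap between keys 40 and 50 and rebalances the path above it. Unlike Algorithm 5.2, this variant does use the update tree `sub`, because a path-copying version has to rebuild the links explicitly.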

We will see below that this rebalancing strategy uses $O(\log^2 m)$ rotations in the worst case, where m is the number of keys that are inserted. Note that the number of rotations is log-squared in the size of the update tree (that is, quadratic in its height) and does not depend on the height of the (possibly much larger) tree in which the insertion is performed. This algorithm was first presented in [55]. However, the analysis below is more precise than in that article: it includes constant factors and gives a sharper bound than can be inferred from the proofs in [55]. These more precise results are needed in the analysis of the more efficient bulk-insertion algorithm presented in Section 5.3.

Figures 5.1 and 5.2 show examples of the extreme cases of the following lemma. The lemma gives a fact about AVL trees which will be frequently used in the proofs below.


Figure 5.2 Maximum possible growth of consecutive siblings of a path in an AVL tree. Note that if $h_{j+1} = h_j + 3$, then $h_{j+2}$ is at most $h_{j+1} + 2$. The balancing direction is displayed inside each node.

Lemma 5.1 Consider a path $q_1, \ldots, q_n$ from a leaf $q_1$ to the root $q_n$ of an AVL tree. The following hold for the siblings $r_1, \ldots, r_{n-1}$ of the nodes on this path: (a) $h_{r_i} \le h_{r_{i+1}} \le h_{r_i} + 3$, and (b) $h_{r_{i+j}} \ge h_{r_i} + j - 1$.

Proof. Because of the balance condition of AVL trees, the height of a child is one or two smaller than the height of its parent: $h_{q_i} + 1 \le h_{q_{i+1}} \le h_{q_i} + 2$, and also $h_{r_i} + 1 \le h_{q_{i+1}} \le h_{r_i} + 2$. Part (a) follows directly from this. Part (b) is also trivial if we note that $h_{q_{i+1}} \ge h_{q_i} + 1$ implies $h_{q_{i+j}} \ge h_{q_i} + j$. Then $h_{r_{i+j}} \ge h_{q_{i+j+1}} - 2 \ge h_{q_{i+1}} + j - 2 \ge h_{r_i} + j - 1$.
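Lemma 5.1 can also be checked empirically. The sketch below generates random AVL tree shapes, follows a random path from the root down to a leaf, and tests parts (a) and (b) on the heights of the siblings along that path. It assumes the convention that an empty subtree has height 0 and a leaf has height 1, and it treats an empty sibling as a sibling of height 0; the helper names are hypothetical.

```python
import random

def random_avl(h, rng):
    """Random AVL tree shape of height h as nested (left, right) pairs; None is empty."""
    if h == 0:
        return None
    if h == 1:
        return (None, None)
    # Child heights allowed by the AVL balance condition.
    lh, rh = rng.choice([(h - 1, h - 1), (h - 1, h - 2), (h - 2, h - 1)])
    return (random_avl(lh, rng), random_avl(rh, rng))

def height(t):
    return 0 if t is None else 1 + max(height(t[0]), height(t[1]))

def sibling_heights(t, rng):
    """Follow a random root-to-leaf path; return the sibling heights, lowest first."""
    hs = []
    while t[0] is not None or t[1] is not None:
        side = rng.choice([s for s in (0, 1) if t[s] is not None])
        hs.append(height(t[1 - side]))
        t = t[side]
    return hs[::-1]

def lemma_5_1_holds(hs):
    """hs[i] plays the role of the height of sibling r_(i+1)."""
    n = len(hs)
    part_a = all(hs[i] <= hs[i + 1] <= hs[i] + 3 for i in range(n - 1))
    part_b = all(hs[i + j] >= hs[i] + j - 1
                 for i in range(n) for j in range(n - i))
    return part_a and part_b
```

Calling `lemma_5_1_holds(sibling_heights(random_avl(8, rng), rng))` with `rng = random.Random(0)` exercises one such path. A check of this kind does not replace the proof; it only guards against misreading the indices.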

Lemma 5.2 Assume that, in a balanced AVL tree, a subtree with height h is replaced by a balanced subtree with height h + d, where d ≥ 0. Then Algorithm 5.2 will use at most $d^2 + 7d/2 + 4$ rotations to rebalance the tree, when called with an argument containing the path upward from the parent of the replaced subtree.

Proof. First note that if d = 0, the parent is in balance, so Algorithm 5.2 does no rotations. We consider d > 0 in the following.

Denote by $n_i$ the node n on iteration $i = 1, \ldots, N$ of the loop in Algorithm 5.2. To find the number of rotations, we need to consider the heights of the children of $n_i$. On the first iteration ($i = 1$), one child is the replaced subtree and has height $s_1 = h + d$. The other child is the sibling of the replaced tree, with height $h - 1 \le h_1 \le h + 1$. On each subsequent iteration ($i > 1$), one child of $n_i$ is the result of the previous iteration (its height satisfies $s_{i-1} \le s_i \le s_{i-1} + 1$ due to Theorem 4.1), and the other child is the original sibling of $n_{i-1}$, i.e., the $i$th sibling on the path up from the original replaced subtree; denote its height by $h_i$ (see Figure 5.3).

Figure 5.3 Notation used in Lemmas 5.2 and 5.3.

By Lemma 5.1, the heights $h_i$ must grow at least fast enough that $h_{i+j} \ge h_i + j - 1$. Assuming that node $n_i$ is not yet in balance, Theorem 4.1 implies that if $s_{i+1} = s_i + 1$ ($h_T = h_S$ in terms of

The height difference on iteration i > 1 is

$$
\begin{aligned}
s_{1+i-1} - h_{1+i-1} &\le s_1 - h_1 + \lceil (i-1)/2 \rceil - (i-1) + 1 \\
&\le d + 2 - \lfloor (i-1)/2 \rfloor \\
&\le d - i/2 + 3.
\end{aligned}
$$

This is $\le 1$ when $i \ge 2d + 4$; thus, the iteration numbered $i = 2d + 3$ must be the last one. By Theorem 4.2, the number of rotations performed by each iteration is at most one less than the height difference. Then the total number of rotations executed before $n_i$ is in balance is at most

$$
\sum_{i=1}^{2d+3} \left( d - \frac{i}{2} + 3 - 1 \right) = d^2 + \frac{7d}{2} + 3.
$$

After the first iteration where $n_i$ is in balance, the height value of $n_i$ may still need to be increased by one from its original value, and thus one further rotation may be necessary somewhere higher up in the tree; the situation is now analogous to single insertion. After this one rotation, $n_i$ will be in balance and the height value will not increase, and so the next iteration will be the last. Thus, the algorithm executes at most $d^2 + 7d/2 + 4$ rotations in total.
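The arithmetic behind this total can be checked mechanically. The sketch below takes one reading of the argument: the height difference at iteration i is bounded by d - i/2 + 3, each iteration uses at most one rotation fewer than that, iterations run from 1 to 2d + 3, and one further rotation is added at the end. It also checks the inequality chain for the height difference, using $s_1 - h_1 \le d + 1$ (which follows from $s_1 = h + d$ and $h_1 \ge h - 1$). The function names are hypothetical.

```python
from fractions import Fraction

def difference_bound(d, i):
    """Upper bound d - i/2 + 3 on the height difference at iteration i."""
    return d - Fraction(i, 2) + 3

def chain_holds(d, i):
    """Check the inequality chain at iteration i with s_1 - h_1 = d + 1."""
    k = i - 1
    first = (d + 1) + (k + 1) // 2 - k + 1   # (k + 1) // 2 is ceil(k / 2)
    second = d + 2 - k // 2                  # k // 2 is floor(k / 2)
    return first == second and second <= difference_bound(d, i)

def total_rotation_bound(d):
    """Sum of (bound - 1) over iterations 1..2d+3, plus one further rotation."""
    return sum(difference_bound(d, i) - 1 for i in range(1, 2 * d + 4)) + 1
```

Exact rational arithmetic via `Fraction` avoids rounding issues with the half-integer terms.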

Theorem 5.1 Algorithm 5.2 uses at most $\lfloor \log_2 m \rfloor^2 + 11 \lfloor \log_2 m \rfloor / 2 + 17/2 = O(\log^2 m)$ rotations in the worst case to rebalance the tree after an update tree with m keys has been inserted in an AVL tree.


Proof. The result is implied by Lemma 5.2: an empty subtree (in an internal tree) or a single leaf (in an external tree) is replaced by the update tree, with $d = \lfloor \log_2 m \rfloor + 1$ in both cases.
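For completeness, the substitution can be written out. Writing $L = \lfloor \log_2 m \rfloor$ (a shorthand introduced only for this derivation) and inserting $d = L + 1$ into the bound of Lemma 5.2 gives:

```latex
\begin{aligned}
d^2 + \tfrac{7}{2} d + 4
  &= (L + 1)^2 + \tfrac{7}{2}(L + 1) + 4 \\
  &= L^2 + 2L + 1 + \tfrac{7}{2} L + \tfrac{7}{2} + 4 \\
  &= L^2 + \tfrac{11}{2} L + \tfrac{17}{2}.
\end{aligned}
```

This agrees with the expression stated in Theorem 5.1.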

It is instructive to compare Algorithm 5.2 with the single-insertion algorithm (Algorithm 4.2). The rebalancing strategy is essentially the same, because the node balancing algorithm does not care how much imbalance the insertion produced. This implies that if only one key is inserted using bulk insertion, the log-squared rebalancing algorithm performs the same (zero or one) rotations as the single-insertion algorithm.