Binary Search Trees

Nodes in a binary search tree (BST) are of the form:

– Key
– Satellite data
– L: left child pointer
– R: right child pointer
– P: parent pointer

The B-S-T has a root node which is the only node whose parent pointer is NIL .
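A minimal C++ sketch of such a node may make the layout concrete. The integer key type, the string used as satellite data, and the names `Node` and `isRoot` are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <string>
#include <utility>

// One BST node: key, satellite data, and the L, R, and P pointers.
struct Node {
    int key;                  // assumed key type
    std::string data;         // assumed satellite data
    Node* left = nullptr;     // L
    Node* right = nullptr;    // R
    Node* parent = nullptr;   // P; nullptr plays the role of NIL
    Node(int k, std::string d) : key(k), data(std::move(d)) {}
};

// The root is the only node whose parent pointer is NIL.
bool isRoot(const Node* n) {
    assert(n != nullptr);
    return n->parent == nullptr;
}
```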

A Generic Tree

[Figure: a generic tree with nodes A through L]

Binary Trees

A binary tree is

– a root
– left subtree (maybe empty)
– right subtree (maybe empty)

Properties

– max # of leaves:

– max # of nodes:

– average height for N nodes:

Representation:

[Figure: a binary tree with nodes A through J; each node stores data, a left pointer, and a right pointer]

1. Is this binary tree complete?

Why not?

(C has just one child, and the right side is much deeper than the left.)

2. What is the maximum number of leaves a binary tree of depth d (height h) can have? 2^d

3. What is the maximum number of nodes a binary tree of depth d (height h) can have? 2^{d+1} - 1

Minimum? 2^d - 1 + 1; 2^d

4. We won't go into this, but if you take N nodes and assume all distinct trees of the nodes are equally likely, you get an average depth/height proportional to SQRT(N).

Is that bigger or smaller than log N?

Bigger, so it's not good enough! We will see that we need to impose structure to get the bounds we want.


Representation

[Figure: a binary tree with nodes A through F and its linked representation]

Implementations of Binary Trees

A representation of a binary tree data node is similar to a doubly linked list in that it has two pointers.

The graphic of a node shows a data area, and a left and right pointer. Some trees only store data in the leaf nodes.

In this tree of five nodes, the following properties can be seen:

• there are 5 nodes
  – 3 internal
  – 2 leaf nodes
• there are a total of 10 pointers
• 6 of the pointers are null pointers

Here is an example of a union binary tree implementation where the internal nodes are of a different construct than the leaf nodes.

The internal nodes contain pointers, and a data element representing a mathematical operator.

The leaf nodes are data nodes only, and contain the operands.

4 * x * (2 * x + a) - c

For the above equation:

• Post-order traversal of the above tree will regenerate the post-fix notation of the equation.

• Pre-order traversal regenerates prefix notation

• In-order traversal regenerates the form depicted to the left of the tree
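The traversal claims can be checked with a small C++ sketch. It assumes the expression is grouped as ((4 * x) * ((2 * x) + a)) - c, and it uses a single `Expr` struct for both operators and operands instead of a union of two node types. The names `Expr`, `exampleTree`, `preOrder`, and `postOrder` are illustrative.

```cpp
#include <string>

// Expression-tree node: internal nodes hold an operator and two children,
// leaves hold an operand and have null children.
struct Expr {
    std::string token;
    const Expr* left;
    const Expr* right;
};

// Root, then left subtree, then right subtree: prefix notation.
std::string preOrder(const Expr* e) {
    if (e == nullptr) return "";
    return e->token + " " + preOrder(e->left) + preOrder(e->right);
}

// Left subtree, then right subtree, then root: postfix notation.
std::string postOrder(const Expr* e) {
    if (e == nullptr) return "";
    return postOrder(e->left) + postOrder(e->right) + e->token + " ";
}

// Builds ((4 * x) * ((2 * x) + a)) - c; the exact grouping is an assumption.
const Expr* exampleTree() {
    static const Expr four{"4", nullptr, nullptr}, x{"x", nullptr, nullptr};
    static const Expr two{"2", nullptr, nullptr}, a{"a", nullptr, nullptr};
    static const Expr c{"c", nullptr, nullptr};
    static const Expr m1{"*", &four, &x};
    static const Expr m2{"*", &two, &x};
    static const Expr sum{"+", &m2, &a};
    static const Expr prod{"*", &m1, &sum};
    static const Expr root{"-", &prod, &c};
    return &root;
}
```

With this grouping, `postOrder(exampleTree())` yields `4 x * 2 x * a + * c - ` and `preOrder(exampleTree())` yields `- * * 4 x + * 2 x a c ` (each token is followed by a space).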


Binary Search Tree

Dictionary Data Structure

[Figure: example binary search tree with keys 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

Binary tree property

– each node has ≤ 2 children
– result:
  • storage is small
  • operations are simple
  • average depth is small (normally)

Search tree property

– all keys in left subtree smaller than root’s key
– all keys in right subtree larger than root’s key
– result:
  • easy to find any given key

Every BST satisfies the BST property:

i. If y is in the LEFT subtree of x, then key[y] < key[x].

ii. If y is in the RIGHT subtree of x, then key[y] > key[x].

The data in a BST are always stored in such a way as to satisfy the BST property.
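One way to test the property is sketched below in C++, assuming distinct integer keys. The property constrains every node in a subtree, not just the immediate children, so the check passes an allowed key range down the tree. The names `isBst` and `isBstInRange` are illustrative.

```cpp
#include <climits>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree rooted at x lies strictly between lo and hi.
// The bounds are long long so that any int key fits strictly inside them.
bool isBstInRange(const Node* x, long long lo, long long hi) {
    if (x == nullptr) return true;               // an empty tree is a BST
    if (x->key <= lo || x->key >= hi) return false;
    return isBstInRange(x->left, lo, x->key) &&  // left keys must be < key[x]
           isBstInRange(x->right, x->key, hi);   // right keys must be > key[x]
}

bool isBst(const Node* root) {
    return isBstInRange(root, LLONG_MIN, LLONG_MAX);
}
```

A tree with root 4 whose left child 1 has a right child 5 is rejected: 5 is larger than its parent 1, but it sits in the left subtree of 4.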

Examples

[Figure: two example trees]

Neither is a binary search tree.


Binary Search Trees ( BST)

The defining property of a BST is that each node has left and right links pointing to another binary search tree or to external nodes (which have no non-NIL links).

⇒ Compare key values in internal nodes with the search key and use the result to control the progress of the search.

Insert A S E R C G H I N into an initially empty BST.

⇒ Notice that each insertion follows a search miss at the bottom of the tree.

⇒ Insertion is as easy to implement as search.

Run times of BST algorithms depend on the shape of the tree.

Best case: the tree is perfectly balanced ⇒ ~ log n nodes from the root to the bottom.

Worst case: there could be n nodes from the root to the bottom.

Searches on a BST: on average, a search requires about 2 ln n (roughly 1.39 lg n) comparisons on a tree with n nodes built from randomly ordered keys.

Proof: the number of compares = 1 + distance of the node to the root. Adding over all nodes gives the internal path length. If C_n = average internal path length of a BST with n nodes,

Sorting

If we look at a BST in the proper manner, it represents a sorted file: read the tree from left to right, ignoring the level (height) of the nodes in the tree, i.e. an in-order traversal of the tree (left subtree => root => right subtree).

BSTs are a dual model to quicksort:

Insert { A S E R A H C G I E N X M P E A L } into an empty BST.

The node at the root corresponds to the pivot element.

Traversals

Many algorithms involve walking through a tree, and performing some computation at each node

Walking through a tree is called a traversal.

Common kinds of traversal:

– Pre-order
– Post-order
– Level-order

Consider the following pseudocode:

InOrder_Traversal(x)
    If x ≠ Nil
        Then InOrder_Traversal(Left[x])
             Print key[x]
             InOrder_Traversal(Right[x])

What is printed if this is applied to the BST in the graph?

How long does the tree traversal take? O(n) time for a tree with n items: each node is visited once and its value is printed.

An InOrder_Traversal prints the node values in monotonically increasing order.
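The in-order traversal can be sketched in C++ as follows. The sketch assumes integer keys and collects the keys into a vector rather than printing them, so the result is easy to inspect. The names `Node` and `inOrder` are illustrative.

```cpp
#include <vector>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Appends the keys of the subtree rooted at x to out: left subtree,
// then the node itself, then the right subtree.
void inOrder(const Node* x, std::vector<int>& out) {
    if (x != nullptr) {
        inOrder(x->left, out);
        out.push_back(x->key);   // the "visit" step; the pseudocode prints here
        inOrder(x->right, out);
    }
}
```

Each node is pushed exactly once, which matches the O(n) bound, and on a valid BST the output vector is sorted in increasing order.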

In Order Listing

[Figure: binary search tree with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

In order listing:

2→5→7→9→10→15→17→20→30

Operations on a BST

Searching:

Find D in the preceding BST.

What happens if we search for C?
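A minimal iterative search is sketched here in C++, assuming single-character keys as in the lettered example. The names `Node` and `treeSearch` are illustrative. A search miss shows up as a null result once the walk falls off the bottom of the tree.

```cpp
struct Node {
    char key;
    Node* left;
    Node* right;
};

// Returns the node holding k, or nullptr (NIL) if k is not in the tree.
const Node* treeSearch(const Node* x, char k) {
    while (x != nullptr && k != x->key) {
        // Go left for smaller keys and right for larger ones.
        x = (k < x->key) ? x->left : x->right;
    }
    return x;
}
```

In a hypothetical tree with root F, left child B (with children A and D), and right child H, searching for D follows F, B, D and succeeds. Searching for C follows the same path, then steps to D's empty left link and returns null.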

Maximum and Minimum: very straightforward from the structure of the BST

Tree_Minimum(x)
    While left[x] not null
        Do x ← left[x]
    Return x

Tree_Maximum(x)
    While right[x] not null
        Do x ← right[x]
    Return x

How long does each procedure take to run? O(h), where h = height of the tree.

Just traveling down the tree one level at a time.
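The two procedures translate almost line for line into C++. This sketch assumes integer keys and a non-empty subtree; the names `treeMinimum` and `treeMaximum` are illustrative.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Follow left links until there are none: the leftmost node holds the minimum.
Node* treeMinimum(Node* x) {
    assert(x != nullptr);
    while (x->left != nullptr) x = x->left;
    return x;
}

// Follow right links until there are none: the rightmost node holds the maximum.
Node* treeMaximum(Node* x) {
    assert(x != nullptr);
    while (x->right != nullptr) x = x->right;
    return x;
}
```

Neither loop compares keys; each simply walks one root-to-leaf path, which is where the O(h) bound comes from.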


Successor and Predecessor :

If all keys are distinct, then the successor of a node x is the node with the smallest key greater than key[x].

If all keys are distinct, then the predecessor of a node x is the node with the largest key less than key[x].

The structure of the BST allows determination of the successor without any comparison of keys:

Tree_Successor(x)
1. if right[x] not null
2.     then return Tree_Minimum(right[x])
3. y ← p[x]
4. while y not null and x = right[y]
       do x ← y
5.        y ← p[y]
6. return y

What happens when the node has no right subtree?

In this case, if x has a successor, then it is the lowest ancestor of x whose left child is also an ancestor of x (a node counts as its own ancestor here).

⇒ To find the successor in this case, move up the tree from x until we find a node that is the left child of its parent; that parent is the successor.
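A C++ sketch of the successor procedure, assuming integer keys and nodes that carry a parent pointer. The names `treeMinimum` and `treeSuccessor` are illustrative. Note that no keys are compared; only the links are followed.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
    Node* parent;
};

Node* treeMinimum(Node* x) {
    while (x->left != nullptr) x = x->left;
    return x;
}

// Returns the node with the next larger key, or nullptr if x holds the maximum.
Node* treeSuccessor(Node* x) {
    assert(x != nullptr);
    if (x->right != nullptr) {
        return treeMinimum(x->right);    // leftmost node of the right subtree
    }
    Node* y = x->parent;
    while (y != nullptr && x == y->right) {
        x = y;                           // keep climbing while we arrive from the right
        y = y->parent;
    }
    return y;                            // first ancestor reached from its left child
}
```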

1. Find the successor of 15

right[x] is not null, so execute a call to Tree_Minimum on right[15], which points to 18, and return 17.

2. Find the successor of 13

i. y gets p[x] and points to the 7 node
ii. y is not null and x = right[y]
iii. x2 is set to point to the 7 node; y2 is set to point to the 6 node
iv. y2 is not null and x2 = right[y2]
v. x3 is set to point to the 6 node; y3 is set to point to the 15 node
vi. y3 is not null, but x3 = left[y3], so the loop ends
vii. return y3 (the 15 node)

⇒ As long as we move left up the tree, we visit smaller keys.

⇒ Our successor is the node of which we are the predecessor.

What is the running time?

In either case, we follow a path up the tree or down the tree (and only one of these paths)

⇒ O(h) run time.

What would the code look like for the predecessor of x?

Theorem : The dynamic set operations :

Search, Minimum, Maximum, Successor and Predecessor

can run in O ( h ) time on a B-S-T of height h.

Idea behind Insertion

1. The goal of the algorithm is to find a place to insert a new node.
2. It is similar to the search code, but with a few twists.
3. As you go, keep two pointers: one to where you are, and one to where you have been (to allow for a quick connection).
4. Trace a path from the root to a null; this locates where the node will go.
5. What if there is no tree? Set this “new” node to be the root.

What if the input string is B D F H J L and no tree exists at the first insert?

Insertion and Deletion

These operations cause the dynamic set represented by the BST to change. Changes are made so that the BST property is preserved.

Insertion:

Begin at the root and trace a path downward in the tree.

- x traces the path; y retains the pointer to the parent of x
- directional choices are determined by the comparison key[x] vs key[z], until x is set to nil
- nil occupies the location where z is to be stored
- Running time: as with the others, this is O(h)
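The insertion walk can be sketched in C++ as below, assuming integer keys, a caller-allocated node z with no links, and keys equal to an existing key going to the right. The `Tree` wrapper and the name `treeInsert` are illustrative.

```cpp
#include <cassert>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    Node* parent = nullptr;
    explicit Node(int k) : key(k) {}
};

struct Tree {
    Node* root = nullptr;
};

// Links the unattached node z into the tree t.
void treeInsert(Tree& t, Node* z) {
    assert(z != nullptr);
    Node* y = nullptr;   // where we have been (parent of x)
    Node* x = t.root;    // where we are
    while (x != nullptr) {
        y = x;
        x = (z->key < x->key) ? x->left : x->right;
    }
    z->parent = y;
    if (y == nullptr) {
        t.root = z;      // there was no tree: z becomes the root
    } else if (z->key < y->key) {
        y->left = z;
    } else {
        y->right = z;
    }
}
```

Inserting already sorted keys such as B, D, F with this routine produces a chain of right children, so the height grows linearly with the number of keys.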

Deleting a Node

1. If the node is an external node, simply replace it with a NIL value.

2. If it is an internal node, then it has 1 or 2 children that cannot simply be orphaned; they need to be reattached to the BST while preserving the BST property.

Case 1: the node has one child.

Replace the node with its child, so that the child's subtree stays attached.

Case 2: the node has two children.

Find the successor (or predecessor) of the node to be removed, replace the node's value (key) with that of the successor (or predecessor), then apply the earlier cases to remove the successor (or predecessor) from its old position.

Deletion

This operation is a bit more complicated. It depends on whether the node to be deleted, z, has:

- No children

In this case, remove the node by changing its parent, p[z], so that NIL replaces z as its child.

- A single child

Remove z by creating a “spliced link” from the parent, p[z], to the child of z.

- Two children

A bit more complicated: find the successor y of z, which has no left child, and replace the contents of z with the contents of y; then splice y out. In this case the successor of z is the minimum of its right subtree, and so that successor has no left child.

Tree_Delete ( T , z )

Theorem : The dynamic set operations Insert and Delete can run in O( h ) time, in a binary search tree of height h.

Note : h not n

Sorting :

Sort(A)
    for i ← 1 to n
        do Tree_Insert(A[i])
    InOrder_Traversal(root)
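A compact C++ sketch of this tree sort follows, assuming integer keys, duplicates sent to the right subtree, and `std::unique_ptr` for ownership so the tree frees itself. The names `insert`, `inOrder`, and `treeSort` are illustrative.

```cpp
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Recursive insert: walk down until an empty link is found.
void insert(std::unique_ptr<Node>& p, int k) {
    if (!p) p.reset(new Node(k));
    else if (k < p->key) insert(p->left, k);
    else insert(p->right, k);            // duplicates go to the right
}

void inOrder(const Node* p, std::vector<int>& out) {
    if (p == nullptr) return;
    inOrder(p->left.get(), out);
    out.push_back(p->key);
    inOrder(p->right.get(), out);
}

// Insert every element, then read the tree back in order.
std::vector<int> treeSort(const std::vector<int>& a) {
    std::unique_ptr<Node> root;
    for (int k : a) insert(root, k);
    std::vector<int> out;
    out.reserve(a.size());
    inOrder(root.get(), out);
    return out;
}
```

The n insertions cost O(h) each and the traversal costs O(n), so the running time depends on the shape of the tree that the input order produces.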

What should you expect for a lower bound on the run time ?

Ω(n lg n)

Why ? - Is this a comparison based sort ?

Average Case Analysis

(same as Quicksort)

The algorithm is a quicksort in which the partitioning process maintains the order of the elements in each partition.

Consider the input: 3 1 8 2 6 7 5

In turn, everything is compared to 3, then to 1 or 8, etc.

The order of the comparisons is different from quicksort, but the concept is the same: at each level there are about n compares, and the depth is ~ lg n ⇒ Ω(n lg n) running time.

For a priority queue, to extract the minimum:

Extract_Min(x)    - returns a pointer to the Min key
    while left[x] ≠ null
        do x ← left[x]
    return x

*** examine the first tree ( F B …) and see what happens.

Deletion

[Figure: the BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Why might deletion be harder than insertion?

Lazy Deletion

Instead of physically deleting nodes, just mark them as deleted

+ simpler
+ some adds just flip the deleted flag
+ physical deletions can be done in batches
– extra memory for the deleted flag
– many lazy deletions slow finds
– some operations may have to be modified (e.g., min and max)
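Lazy deletion can be sketched in C++ as below, assuming integer keys and a per-node `deleted` flag. The names `findNode`, `find`, `lazyDelete`, `insert`, and `destroy` are illustrative. Marked nodes stay in the tree and still guide the search.

```cpp
struct Node {
    int key;
    bool deleted = false;   // the extra memory lazy deletion pays for
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Locates the node holding k whether or not it is marked as deleted.
Node* findNode(Node* p, int k) {
    while (p != nullptr && p->key != k) p = (k < p->key) ? p->left : p->right;
    return p;
}

// A key counts as present only if its node exists and is not marked.
bool find(Node* root, int k) {
    Node* n = findNode(root, k);
    return n != nullptr && !n->deleted;
}

void lazyDelete(Node* root, int k) {
    if (Node* n = findNode(root, k)) n->deleted = true;
}

void insert(Node*& root, int k) {
    Node** p = &root;
    while (*p != nullptr && (*p)->key != k) {
        p = (k < (*p)->key) ? &(*p)->left : &(*p)->right;
    }
    if (*p != nullptr) (*p)->deleted = false;   // re-adding just flips the flag
    else *p = new Node(k);
}

// Physical cleanup, done here in one batch for the whole tree.
void destroy(Node* p) {
    if (p == nullptr) return;
    destroy(p->left);
    destroy(p->right);
    delete p;
}
```

Because marked nodes remain on the search paths, a tree with many lazy deletions is as deep as if nothing had been removed, which is the slowdown for finds noted in the list.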

[Figure: the BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Delete(17), Delete(15), Delete(5), Find(9), Find(16), Insert(5), Find(17)

Deletion - Leaf Case

[Figure: the BST with keys 2, 5, 7, 9, 10, 15, 17, 20, 30]

Delete(17)

Deletion - One Child Case

[Figure: the BST after Delete(17), with keys 2, 5, 7, 9, 10, 15, 20, 30]

Delete(15)

Deletion - Two Child Case

[Figure: the BST after Delete(15), with keys 2, 5, 7, 9, 10, 20, 30]

Delete(5)

Replace the node with a value guaranteed to be between the left and right subtrees: the successor.

Could we have used the predecessor instead?

It is always easy to delete the successor of a node with two children: it always has either 0 or 1 children!

Delete Code

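/* Named deleteKey because "delete" is a reserved word in C++. */
/* successor(p) is assumed to return the smallest node in p->right. */
void deleteKey(Comparable x, Node *& p) {
  Node * q;
  if (p != NULL) {
    if (p->key < x) deleteKey(x, p->right);
    else if (p->key > x) deleteKey(x, p->left);
    else { /* p->key == x */
      if (p->left == NULL) {          /* no left child: splice in the right child */
        q = p; p = p->right; delete q;
      }
      else if (p->right == NULL) {    /* only a left child: splice it in */
        q = p; p = p->left; delete q;
      }
      else {                          /* two children: copy the successor's key up */
        q = successor(p);
        p->key = q->key;
        deleteKey(q->key, p->right);  /* the successor has at most one child */
      }
    }
  }
}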

Beauty is Only Θ(log n) Deep

Binary Search Trees are fast if they’re shallow:

– e.g.: perfectly complete
– e.g.: perfectly complete except the “fringe” (leaves)
– any other good cases?

What matters here?

Problems occur when one branch is much longer than the other!

Balance

Balance measure:

height(left subtree) - height(right subtree)

• zero everywhere ⇒ perfectly balanced
• small everywhere ⇒ balanced enough
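The balance measure can be computed with a short C++ sketch. It assumes the convention that an empty tree has height -1 and a single node has height 0; the names `height` and `balance` are illustrative.

```cpp
#include <algorithm>
#include <cassert>

struct Node {
    int key;
    Node* left;
    Node* right;
};

// Height in edges; an empty subtree has height -1 by the convention assumed here.
int height(const Node* x) {
    if (x == nullptr) return -1;
    return 1 + std::max(height(x->left), height(x->right));
}

// height(left subtree) - height(right subtree) at node x.
int balance(const Node* x) {
    assert(x != nullptr);
    return height(x->left) - height(x->right);
}
```

A chain of three nodes linked through right children has balance -2 at its root, while a root with one child on each side has balance 0. This version recomputes heights on every call, so it is meant for illustration rather than for use inside a balancing routine.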


Balance between -1 and 1 everywhere ⇒ maximum height of about 1.44 log n