Binary Search Trees
CMPSC 122
Note: This notes packet has significant overlap with the first set of trees notes I do in CMPSC 360, but goes into much greater depth on turning BSTs into pseudocode than in 360. Starting in Spring 2014, I've split the introduction to trees in 360 into two packets: one that encompasses all we do here and a second on the deeper mathematical analysis, namely a proof by strong induction of an important theorem relating the height and number of terminal vertices.
If you are not concurrently taking both courses with me, but take 360 with me later, check in with me about potentially being excused from a lecture that will be review for you there.
I. Motivation
We've learned about various structures in which to store data – arrays, lists, stacks, queues – and each has something about it that makes it unique. What often motivates the choice of structure is what we want to do with it, or how we want to get information out of it. All of those other structures were linear structures. We can use the idea of binary trees to store data in a way that allows branching.
Let's do an activity. You'll give me some numbers, and I'll put them into a binary tree in a particular way. As we go, write down the list of numbers in order and the tree. See if you can figure out what I'm doing.
List of numbers:
Resulting tree:
II. Binary Search Trees, Defined
The kind of tree we're working with is something called a binary search tree, sometimes abbreviated BST.
For a binary tree to be a binary search tree, it must satisfy the binary search tree property. That is, for each node n,
• n's left child must be less than n. More formally…
• n's right child must be greater than n. More formally…
In this definition, we work under the assumption that all keys in a BST are unique. (This isn't a stretch, but if we wanted to allow non-unique keys, there are a few different strategies we could employ for "same" keys.)
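To make the property concrete, here is a minimal sketch in Python of a check for it. The names `Node` and `is_bst` are hypothetical, not part of the course's pseudocode. The sketch assumes unique integer keys and checks the usual formal version of the property: every key in a node's left subtree is smaller than the node's key, and every key in its right subtree is larger.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_bst(node, low=None, high=None):
    """Return True if every key in this subtree lies strictly between low and high."""
    if node is None:
        return True  # an empty tree trivially satisfies the property
    if low is not None and node.key <= low:
        return False
    if high is not None and node.key >= high:
        return False
    # Keys on the left must stay below node.key; keys on the right must stay above it.
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)

good = Node(6, Node(4, Node(2), Node(5)), Node(7, None, Node(9)))
bad = Node(6, Node(4, Node(2), Node(8)), Node(7))  # 8 sits in 6's left subtree
print(is_bst(good), is_bst(bad))  # True False
```

In the `bad` tree, 8 is a legal right child of 4 when we look only at that parent and child, but it violates the property with respect to 6. That is why the check carries bounds down the tree instead of comparing each node only to its children.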
Now then, it's worth noting how BSTs can be used. While we could certainly use a BST to store a list of numbers, it's really the meaning of those numbers that makes a BST useful. We really want to use a BST to store records. But, in practice, we don't really store an entire record in a node of a BST; we instead store some key to the record (think primary keys in database tables – as we'll see in CMPSC 221).
So, we store keys to records in a tree and use the structure of a binary tree to locate a record easily. That's why it's called a binary search tree.
III. Searching A BST
Question: In the tree we drew above, how would we go about searching for the key 50 systematically, given that the tree must follow the BST property?
Question: How would we determine that a key isn't found in a BST?
So, let's generalize and write down pseudocode for an algorithm to search for a node in a BST. It should take as an input a pointer to the tree's root and a search key. It should return a pointer to a node containing the search key, or, in the case of failure, NIL.
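One possible shape for such an algorithm, sketched in Python rather than in our pseudocode: the names `Node` and `search` are hypothetical, Python's `None` stands in for NIL, and the sample tree is built by hand just for illustration.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def search(root, key):
    """Return the node holding key, or None (our stand-in for NIL) if it is absent."""
    node = root
    while node is not None and node.key != key:
        # The BST property tells us which single subtree could hold the key.
        node = node.left if key < node.key else node.right
    return node

# A small hand-built tree, assumed to satisfy the BST property.
root = Node(50, Node(30, Node(20), Node(40)), Node(70, Node(60), Node(80)))
found = search(root, 40)
missing = search(root, 45)
print(found.key, missing)  # 40 None
```

This version uses a loop; a recursive version that calls itself on the left or right subtree would do the same comparisons.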
Problem: What is the precondition for the above algorithm?
IV. An Algorithm for Insertion into a BST
To build a binary search tree from a set of input numbers:
1. Make the first input the root of the BST.
2. For each remaining input, recursively compare the input to the root of the tree.
a. If the input is less than the root, it becomes the left child of the root if that spot is empty; otherwise, recursively insert it into the left subtree.
b. If the input is greater than the root, it becomes the right child of the root if that spot is empty; otherwise, recursively insert it into the right subtree.
Example 1: Build a BST from the following lists:
a. 6, 4, 7
b. 6, 4, 7, 2, 5, 9
Problem:
a. Build a BST from these inputs: 10, 20, 30, 40, 5, 8, 50, 60, 70, 15, 80
b. Comment on the shape of the BST.
Problem: Write a recursive algorithm to insert a key into a BST, given that key and a pointer to the BST's root.
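As a point of comparison for your own answer, here is a minimal sketch of recursive insertion in Python. The names `Node` and `insert` are hypothetical. One design choice here is an assumption, not the only option: the function returns the root of the updated tree, so inserting into an empty tree (root is `None`) works without a special case in the caller.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(root, key):
    """Insert key into the BST rooted at root and return the (possibly new) root."""
    if root is None:
        return Node(key)  # empty spot found: the key goes here
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    # An equal key is ignored, matching the unique-keys assumption.
    return root

# Build the tree from Example 1(b) by inserting the inputs in order.
root = None
for k in [6, 4, 7, 2, 5, 9]:
    root = insert(root, k)
print(root.key, root.left.key, root.right.key)  # 6 4 7
print(root.left.left.key, root.left.right.key, root.right.right.key)  # 2 5 9
```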
V. Tree Traversal
Once a tree is in place, we can traverse or walk the tree to list the elements of the tree. There are three kinds of traversals.
The first is called an inorder traversal of the tree.
Algorithm: Inorder Traversal(Tree T)
1. Do an Inorder Traversal on the left subtree of T
2. Print the root of T
3. Do an Inorder Traversal on the right subtree of T
Notice the recursive nature of this procedure.
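The three steps above translate almost line for line into code. Here is a minimal sketch in Python with hypothetical names `Node` and `inorder`. As an assumption for easier checking, it collects the keys into a list instead of printing them one at a time.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def inorder(node):
    """Return the keys of the subtree in inorder: left subtree, root, right subtree."""
    if node is None:
        return []  # base case: an empty subtree contributes nothing
    return inorder(node.left) + [node.key] + inorder(node.right)

# The tree built from 6, 4, 7, 2, 5, 9, written out by hand.
root = Node(6, Node(4, Node(2), Node(5)), Node(7, None, Node(9)))
print(inorder(root))  # [2, 4, 5, 6, 7, 9]
```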
Example: Let's go back and do an inorder traversal on a BST from the first page.
The other two kinds of traversals are called preorder and postorder. In short, here's how all three go:
• Inorder Traversal: left, root, right
• Preorder Traversal: root, left, right
• Postorder Traversal: left, right, root
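The only thing that changes among the three traversals is where the root is handled relative to the two recursive calls. A minimal sketch in Python of the other two orders, with hypothetical names `Node`, `preorder`, and `postorder`, again returning lists instead of printing:

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(node):
    """Root first, then the left subtree, then the right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def postorder(node):
    """Left subtree, then the right subtree, and the root last."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

# The tree built from 6, 4, 7, 2, 5, 9, written out by hand.
root = Node(6, Node(4, Node(2), Node(5)), Node(7, None, Node(9)))
print(preorder(root))   # [6, 4, 2, 5, 7, 9]
print(postorder(root))  # [2, 5, 4, 9, 7, 6]
```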
Example: Let's do a preorder traversal on a BST from the first page.
Example: Let's do a postorder traversal on a BST from the first page.
VI. Tree Sort
Question: Suppose we had a list of numbers we wanted to sort. How could we use a BST to do this?
Question: What advantages does this method have?
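One way to approach the sorting question, sketched in Python: insert every number into a BST, then read the keys back with an inorder traversal. The names `Node`, `insert`, `inorder`, and `tree_sort` are hypothetical, and the sketch assumes the input values are unique, as in our definition.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(root, key):
    """Insert key into the BST rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored (unique-keys assumption)

def inorder(node):
    """Return the keys in inorder: left subtree, root, right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def tree_sort(values):
    """Sort unique values by building a BST and traversing it inorder."""
    root = None
    for v in values:
        root = insert(root, v)
    return inorder(root)

print(tree_sort([50, 30, 20, 40, 70, 80, 60]))  # [20, 30, 40, 50, 60, 70, 80]
```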
VII. Performance of BST Algorithms
Problem: Build a BST from these values: 50, 30, 20, 40, 70, 80, 60.
Trace a search for 50. How many comparisons are necessary?
Trace a search for 20. How many comparisons are necessary?
Trace a search for 45. How many comparisons are necessary?
Can we call any of these best-case or worst-case scenarios?
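To check your traces, here is a minimal sketch in Python that counts comparisons during a search. The names `Node` and `count_comparisons` are hypothetical. The counting convention is an assumption: each node whose key we examine counts as one comparison, so your count may differ if you tally the "equal" and "less than" tests separately.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def count_comparisons(root, key):
    """Search for key; return (found, number of nodes whose key was examined)."""
    count = 0
    node = root
    while node is not None:
        count += 1  # one comparison per node visited (assumed convention)
        if key == node.key:
            return True, count
        node = node.left if key < node.key else node.right
    return False, count

# The tree that results from inserting 50, 30, 20, 40, 70, 80, 60 in order.
root = Node(50, Node(30, Node(20), Node(40)), Node(70, Node(60), Node(80)))
print(count_comparisons(root, 50))  # (True, 1)
print(count_comparisons(root, 20))  # (True, 3)
print(count_comparisons(root, 45))  # (False, 3)
```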
Let's now consider a tree that's slightly larger, one where each of the leaves of the last tree had 2 children.
Let's again extend the last tree in the same way and get a maximum number of comparisons.
Let's generalize the worst-case number of comparisons for the special case of a binary search tree where each non-leaf node has exactly 2 children and all of the leaves are on the same level:
Number of nodes (n)    Worst-Case Number of Comparisons
7
15
31
63
Question: Does this count as a worst-case running time for a search in a BST? Why? If not, what would an accurate worst case be?
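To explore this question, it can help to compare the shapes that different insertion orders produce for the same keys. A minimal sketch in Python, with hypothetical names `Node`, `insert`, `build`, and `levels`; here `levels` counts the nodes on the longest path from the root down to a leaf, which bounds the number of nodes a search can visit.

```python
class Node:
    """Hypothetical BST node: a key plus optional left/right children."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(root, key):
    """Insert key into the BST rooted at root and return the root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # equal keys are ignored (unique-keys assumption)

def build(keys):
    """Build a BST by inserting the keys in the order given."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def levels(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(levels(node.left), levels(node.right))

full = build([4, 2, 6, 1, 3, 5, 7])    # every level completely filled
chain = build([1, 2, 3, 4, 5, 6, 7])   # sorted input: each node has only a right child
print(levels(full), levels(chain))  # 3 7
```

Both trees hold the same 7 keys, but a search for 7 visits 3 nodes in the first tree and all 7 nodes in the second.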
Searching wasn't the only algorithm we looked at. Let's consider the performance of others:
• Insertion
• Traversal