2 Fundamental Data Structures
17 February 2021
Sebastian Wild
Outline
2 Fundamental Data Structures
2.1 Stacks & Queues
2.2 Resizable Arrays
2.3 Priority Queues
2.4 Binary Search Trees
2.5 Ordered Symbol Tables
2.6 Balanced BSTs
Abstract Data Types
abstract data type (ADT)
• list of supported operations
• what should happen
• not: how to do it
• not: how to store data
≈ Java interface (with Javadoc comments)
vs.
data structures
• specify exactly how data is represented
• algorithms for operations
• has concrete costs (space and running time)
≈ Java class (non abstract)
Why separate?
• Can swap out implementations ("drop-in replacements")
  → reusable code!
• (Often) better abstractions
Stacks
Stack ADT
• top()
  Return the topmost item on the stack. Does not modify the stack.
• push(x)
  Add x onto the top of the stack.
• pop()
  Remove the topmost item from the stack (and return it).
• isEmpty()
  Returns true iff stack is empty.
• create()
Linked-list implementation for Stack
Invariants:
• maintain top pointer to topmost element
• each element points to the element below it (or null if bottommost)
Linked stacks:
• require Θ(n) space when n elements on stack
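A minimal sketch of how the ADT / data structure split could look in Java. The names StackADT and LinkedStack are illustrative (not from the slides), and throwing NoSuchElementException on an empty stack is an assumption; the ADT above leaves that case open.

```java
import java.util.NoSuchElementException;

// The ADT: only the supported operations, not how they are realized.
interface StackADT<T> {
    T top();             // return topmost item; stack is not modified
    void push(T x);      // add x onto the top
    T pop();             // remove and return topmost item
    boolean isEmpty();   // true iff stack is empty
}

// One data structure realizing the ADT: a singly linked list.
class LinkedStack<T> implements StackADT<T> {
    // Each node points to the element below it (null if bottommost).
    private static final class Node<T> {
        final T item;
        final Node<T> below;
        Node(T item, Node<T> below) { this.item = item; this.below = below; }
    }

    private Node<T> top = null;   // invariant: points to topmost element

    public boolean isEmpty() { return top == null; }

    public T top() {
        // Assumption: signal misuse with an exception.
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return top.item;
    }

    public void push(T x) { top = new Node<>(x, top); }

    public T pop() {
        T x = top();        // throws if the stack is empty
        top = top.below;    // unlink the topmost node
        return x;
    }
}
```

Code written against StackADT keeps working if LinkedStack is later swapped for an array-based class with the same operations.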
Array-based implementation for Stack
Can we avoid extra space for pointers? → array-based implementation
Invariants:
• maintain array S of elements, from bottommost to topmost
• maintain index top of position of topmost element in S.
What to do if stack is full upon push?
Array stacks:
• require fixed capacity C (known at creation time)!
• require Θ(C) space for a capacity of C elements
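The invariants above could be realized as follows; this is a sketch restricted to int elements. The class name ArrayStack is hypothetical, and the reaction to a full stack (throwing IllegalStateException) is just one possible answer to the question on the slide, chosen here for illustration.

```java
import java.util.NoSuchElementException;

class ArrayStack {
    private final int[] S;   // elements, bottommost at S[0]
    private int top = -1;    // index of topmost element in S; -1 if empty

    ArrayStack(int capacity) { S = new int[capacity]; }   // fixed capacity C

    boolean isEmpty() { return top < 0; }

    void push(int x) {
        // Assumption: a push onto a full stack is rejected.
        if (top + 1 == S.length) throw new IllegalStateException("stack is full");
        S[++top] = x;
    }

    int top() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        return S[top];
    }

    int pop() {
        int x = top();   // throws if the stack is empty
        top--;
        return x;
    }
}
```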
Digression: Arrays as ADT
Arrays can also be seen as an ADT! . . . but are commonly seen as a specific data structure
Array operations:
• create(n)   (Java: A = new int[n];)
  Create a new array with n cells, with positions 0, 1, . . . , n − 1
• get(i)   (Java: A[i])
  Return the content of cell i
• set(i, x)   (Java: A[i] = x;)
  Set the content of cell i to x.
→ Arrays have fixed size (supplied at creation).
Usually directly implemented by compiler + operating system / virtual machine.
Difference to other ADTs: Implementation usually fixed to "a contiguous chunk of memory".
Doubling trick
Can we have unbounded stacks based on arrays? Yes!
Invariants:
• maintain array S of elements, from bottommost to topmost
• maintain index top of position of topmost element in S
• maintain capacity C = S.length so that C/4 ≤ n ≤ C
  → can always push more elements!
How to maintain the last invariant?
• before push:
  If n = C, allocate new array of size 2n, copy all elements.
• after pop:
  If n < C/4, allocate new array of size 2n, copy all elements.
→ "Resizing Arrays": an implementation technique, not an ADT!
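A sketch of the doubling trick for int elements. The class name, the initial capacity of 1 and the Math.max(1, ...) guard are assumptions made so that the empty stack is handled; for an empty stack the lower bound C/4 ≤ n cannot hold literally.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

class ResizingArrayStack {
    private int[] S = new int[1];   // capacity C = S.length (assumed initial capacity 1)
    private int n = 0;              // number of elements; topmost is S[n-1]

    boolean isEmpty() { return n == 0; }
    int size() { return n; }
    int capacity() { return S.length; }

    void push(int x) {
        if (n == S.length) resize(2 * n);   // before push: n = C, so grow to 2n
        S[n++] = x;
    }

    int pop() {
        if (isEmpty()) throw new NoSuchElementException("stack is empty");
        int x = S[--n];
        // after pop: n < C/4, so shrink to 2n (never below 1 cell)
        if (4 * n < S.length) resize(Math.max(1, 2 * n));
        return x;
    }

    // Allocate a new array and copy all elements: Theta(n) time.
    private void resize(int newCapacity) {
        S = Arrays.copyOf(S, newCapacity);
    }
}
```

Shrinking to 2n rather than to n leaves room for about n further pushes before the next copy, which is what makes the amortized argument work.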
Amortized Analysis
• Any individual operation push / pop can be expensive!
  Θ(n) time to copy all elements to new array.
• But: one expensive operation of cost n means the next Ω(n) operations are cheap!
Formally: consider "credits/potential" (distance to boundary)
  Φ = min{n − C/4, C − n} ∈ [0, 0.6n]
• amortized cost of an operation = actual cost (array accesses) − 4 · change in Φ
• cheap push/pop: actual cost 1 array access, consumes ≤ 1 credits → amortized cost ≤ 5
• copying push: actual cost 2n + 1 array accesses, creates n/2 + 1 credits → amortized cost ≤ 5
• copying pop: actual cost 2n + 1 array accesses, creates n/2 − 1 credits → amortized cost 5
→ sequence of m operations: total actual cost ≤ total amortized cost + final credits
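The last step can be spelled out as a telescoping sum. The notation is introduced here for illustration: c_i is the actual cost and a_i the amortized cost of the i-th operation, Φ_i is the potential after it, and n is the number of elements after all m operations. The bound assumes Φ_0 ≥ 0 and uses a_i ≤ 5 from the case analysis above; each credit is worth 4 array accesses.

```latex
\begin{align*}
a_i &= c_i - 4\,(\Phi_i - \Phi_{i-1}) \\
\sum_{i=1}^{m} a_i &= \sum_{i=1}^{m} c_i - 4\,(\Phi_m - \Phi_0) \\
\sum_{i=1}^{m} c_i &= \sum_{i=1}^{m} a_i + 4\,(\Phi_m - \Phi_0)
  \;\le\; 5m + 4\,\Phi_m
  \;\le\; 5m + 4 \cdot 0.6\,n \;=\; 5m + 2.4\,n
\end{align*}
```

Under these assumptions, m stack operations cost O(m + n) array accesses in total.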
Queues
Operations:
• enqueue(x)
  Add x at the end of the queue.
• dequeue()
  Remove item at the front of the queue and return it.
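One possible realization of the two operations is a linked list with pointers to both ends. The slides do not prescribe an implementation here, so the class LinkedQueue, its isEmpty() helper and the exception on an empty queue are all assumptions of this sketch.

```java
import java.util.NoSuchElementException;

class LinkedQueue<T> {
    private static final class Node<T> {
        final T item;
        Node<T> next;   // next item towards the end of the queue
        Node(T item) { this.item = item; }
    }

    private Node<T> front = null;   // item that dequeue() returns next
    private Node<T> end = null;     // most recently enqueued item

    boolean isEmpty() { return front == null; }

    // Add x at the end of the queue.
    void enqueue(T x) {
        Node<T> node = new Node<>(x);
        if (isEmpty()) front = node;
        else end.next = node;
        end = node;
    }

    // Remove the item at the front of the queue and return it.
    T dequeue() {
        if (isEmpty()) throw new NoSuchElementException("queue is empty");
        T x = front.item;
        front = front.next;
        if (front == null) end = null;   // queue became empty
        return x;
    }
}
```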
Bags
What do Stack and Queue have in common?
They are special cases of a Bag!
Operations:
• insert(x)
  Add x to the items in the bag.
• delAny()
  Remove any one item from the bag and return it. (Not specified which; any choice is fine.)
• roughly similar to Java's Collection
Sometimes it is useful to state that order is irrelevant → Bag
Priority Queue ADT (max-oriented; the min-oriented version is analogous)
Now: elements in the bag have different priorities.
(Max-oriented) Priority Queue (MaxPQ):
• construct(A)
  Construct from elements in array A.
• insert(x, p)
  Insert item x with priority p into PQ.
• max()
  Return item with largest priority. (Does not modify the PQ.)
• delMax()
  Remove the item with largest priority and return it.
• changeKey(x, p′)
  Update x's priority to p′.
  Sometimes restricted to increasing priority.
• isEmpty()
PQ implementations
Elementary implementations
• unordered list → Θ(1) insert, but Θ(n) delMax
• sorted list → Θ(1) delMax, but Θ(n) insert
Can we get something between these extremes? Like a "slightly sorted" list? Yes! Binary heaps.
Array view
Heap = array A with
  ∀ k ∈ [n] : A[⌊k/2⌋] ≥ A[k]
≡ (store nodes in level order in A[1..n])
Tree view
Heap = tree that is
(i) a complete binary tree (all but last level full, last level flush left)
(ii) heap ordered
Why heap-shaped trees?
Why complete binary tree shape?
• only one possible tree shape → keep it simple!
• complete binary trees have minimal height among all binary trees with the same number of nodes
• simple formulas for moving from a node to parent or children:
  For a node at index k in A
  • parent at ⌊k/2⌋
  • left child at 2k
  • right child at 2k + 1
Why heap ordered?
• Maximum must be at root! → max() is trivial!
• But: Sorted only along paths of the tree; leaves lots of leeway for fast operations
  → how? . . . stay tuned
Analysis
Height of binary heaps:
• height of a tree: # edges on longest root-to-leaf path
• depth/level of a node: # edges from root → root has depth 0
• How many nodes on the first full levels (levels 0..k)?
  $\sum_{\ell=0}^{k} 2^{\ell} = 2^{k+1} - 1$
→ Height of binary heap: $h = \min\{k : 2^{k+1} - 1 \ge n\} = \lfloor \lg n \rfloor$
Analysis:
• insert: new element "swims" up → ≤ h steps (h cmps)
• delMax: last element "sinks" down → ≤ h steps (2h cmps)
• construct from n elements:
  cost = cost of letting each node in heap sink!
  $\le 1 \cdot h + 2 \cdot (h-1) + 4 \cdot (h-2) + \cdots + 2^{\ell} \cdot (h-\ell) + \cdots + 2^{h-1} \cdot 1 + 2^{h} \cdot 0$
  $= \sum_{\ell=0}^{h} 2^{\ell} (h-\ell) = \sum_{i=0}^{h} \frac{2^{h}}{2^{i}}\, i = 2^{h} \sum_{i=0}^{h} \frac{i}{2^{i}} \le 2 \cdot 2^{h} \le 4n$
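The swim and sink steps analyzed above could look as follows. This is a sketch under simplifying assumptions: the heap stores int priorities only (no separate items, no changeKey), the class name BinaryMaxHeap is hypothetical, and the array grows by doubling as in the resizing-array technique. Indices follow the array view A[1..n], with A[0] unused.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

class BinaryMaxHeap {
    private int[] A;   // heap occupies A[1..n]; A[0] is unused
    private int n;

    // construct: copy the input, then let the nodes sink, bottom-up.
    // Leaves cannot sink, so the loop starts at the last non-leaf n/2.
    BinaryMaxHeap(int[] items) {
        n = items.length;
        A = new int[Math.max(2, 2 * n + 1)];
        System.arraycopy(items, 0, A, 1, n);
        for (int k = n / 2; k >= 1; k--) sink(k);
    }

    boolean isEmpty() { return n == 0; }
    int size() { return n; }

    int max() {
        if (isEmpty()) throw new NoSuchElementException("heap is empty");
        return A[1];   // maximum is at the root
    }

    void insert(int x) {
        if (n + 1 == A.length) A = Arrays.copyOf(A, 2 * A.length);
        A[++n] = x;    // append as last node of the last level ...
        swim(n);       // ... and let it swim up
    }

    int delMax() {
        int max = max();
        A[1] = A[n--];   // move last element to the root ...
        sink(1);         // ... and let it sink down
        return max;
    }

    private void swim(int k) {
        while (k > 1 && A[k / 2] < A[k]) {   // parent at floor(k/2)
            swap(k / 2, k);
            k = k / 2;
        }
    }

    private void sink(int k) {
        while (2 * k <= n) {                      // left child at 2k
            int c = 2 * k;
            if (c < n && A[c + 1] > A[c]) c++;    // right child at 2k+1, pick larger child
            if (A[k] >= A[c]) break;              // heap order restored
            swap(k, c);
            k = c;
        }
    }

    private void swap(int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }
}
```

Each sink step uses up to two comparisons (between the children, then against the larger child), matching the 2h cmps above; each swim step uses one.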
Binary heap summary
Operation            Running Time
construct(A[1..n])   O(n)
max()                O(1)
insert(x, p)         O(log n)
delMax()             O(log n)
changeKey(x, p′)     O(log n)
isEmpty()            O(1)
size()               O(1)
Symbol table ADT
Symbol table / Dictionary / Map (Java: java.util.Map<K,V>) / Associative array / key-value store:
• put(k, v)   (Python dict: d[k] = v)
  Put key-value pair (k, v) into table
• get(k)   (Python dict: d[k])
  Return value associated with key k
• delete(k)
  Remove key k (and any associated value) from table
• contains(k)
  Returns whether the table has a value for key k
• isEmpty(), size()
• create()
Most fundamental building block in computer science.
Symbol tables vs mathematical functions
• similar interface
• but: mathematical functions are static (never change their mapping)
  (Different mapping is a different function)
• symbol table = dynamic mapping
Elementary implementations
Unordered (linked) list:
• Fast put
• Θ(n) time for get
→ Too slow to be useful
Sorted linked list:
• Θ(n) time for put
• Θ(n) time for get
→ Too slow to be useful
Binary search
It does help . . . if we have a sorted array!
Example: search for 69
[Figure: binary search for 69 in the sorted array 11 12 17 28 35 55 57 63 69 77 79 80 82 85 88 97 (indices 0..15); the remaining search range shrinks in each step]
Binary search:
• halve (±1) the remaining list in each step
  → ≤ ⌊lg n⌋ + 1 cmps in the worst case
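A sketch of binary search on a sorted int array. The class and method names and the return value −1 for a missing key are illustrative choices. This version does a three-way comparison per step, so it is the number of loop iterations that is at most ⌊lg n⌋ + 1.

```java
class BinarySearchDemo {
    // Returns an index i with a[i] == key, or -1 if key does not occur.
    // Precondition: a is sorted in ascending order.
    static int binarySearch(int[] a, int key) {
        int lo = 0, hi = a.length - 1;        // remaining candidates: a[lo..hi]
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;     // avoids int overflow of lo + hi
            if (key < a[mid])      hi = mid - 1;   // continue in left half
            else if (key > a[mid]) lo = mid + 1;   // continue in right half
            else return mid;                       // found
        }
        return -1;
    }
}
```

On the example array, searching for 69 inspects indices 7, 11, 9 and 8 and returns 8.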
Binary search trees
Binary search trees (BSTs) ≈ dynamic sorted array
• binary tree
  • Each node has left and right child
  • Either can be empty (null)
• Keys satisfy search-tree property
BST example & find
[Figure: example BST over the keys 11, 12, 17, 28, 35, 55, 57, 63, 69, 77, 79, 80, 82, 85, 97, with a search path highlighted]
BST insert
Example: Insert 88
[Figure: the example BST after inserting 88]
BST delete
• Easy case: remove leaf, e.g., 11 → replace by null
• Medium case: remove unary, e.g., 69 → replace by unique child
• Hard case: remove binary, e.g., 85 → swap with predecessor, recurse
[Figure: the example BST (including 88) before and after deletion]
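The find, insert and delete cases could be sketched as below. Assumptions: the tree stores int keys only (a set rather than a key-value table), the class name IntBST and the helper keys() are hypothetical, and the hard case copies the predecessor's key into the node and then deletes the predecessor from the left subtree, a common variant of "swap with predecessor, recurse".

```java
import java.util.ArrayList;
import java.util.List;

class IntBST {
    private static final class Node {
        int key;
        Node left, right;   // either can be null
        Node(int key) { this.key = key; }
    }

    private Node root;

    boolean contains(int key) {
        Node x = root;
        while (x != null) {
            if (key < x.key)      x = x.left;
            else if (key > x.key) x = x.right;
            else return true;
        }
        return false;
    }

    void insert(int key) { root = insert(root, key); }

    private Node insert(Node x, int key) {
        if (x == null) return new Node(key);   // new key becomes a leaf
        if (key < x.key)      x.left = insert(x.left, key);
        else if (key > x.key) x.right = insert(x.right, key);
        return x;                              // key already present: no change
    }

    void delete(int key) { root = delete(root, key); }

    private Node delete(Node x, int key) {
        if (x == null) return null;            // key not in tree
        if (key < x.key)      x.left = delete(x.left, key);
        else if (key > x.key) x.right = delete(x.right, key);
        else {
            if (x.left == null)  return x.right;   // leaf (null) or unary node
            if (x.right == null) return x.left;    // unary node
            Node pred = x.left;                    // binary node: predecessor is
            while (pred.right != null) pred = pred.right;   // the largest key on the left
            x.key = pred.key;                      // move it up ...
            x.left = delete(x.left, pred.key);     // ... and remove it below
        }
        return x;
    }

    // In-order traversal lists the keys in sorted order.
    List<Integer> keys() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(Node x, List<Integer> out) {
        if (x == null) return;
        collect(x.left, out);
        out.add(x.key);
        collect(x.right, out);
    }
}
```

The predecessor has no right child, so the recursive delete always ends in the easy or medium case.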
BST summary
Operation            Running Time
construct(A[1..n])   O(n h)
put(k, v)            O(h)
get(k)               O(h)
delete(k)            O(h)
contains(k)          O(h)
isEmpty()            O(1)
size()               O(1)
Ordered symbol tables
• min(), max()
  Return the smallest resp. largest key in the ST
• floor(x)   (⌊x⌋ = ℤ.floor(x))
  Return largest key k in ST with k ≤ x.
• ceiling(x)
  Return smallest key k in ST with k ≥ x.
• rank(x)
  Return the number of keys k in ST with k < x.
• select(i)
  Return the i-th smallest key in ST (zero-based, i.e., i ∈ [0..n))
→ With select, we can simulate access as in a truly dynamic array!
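To pin down what these operations return, here is a sketch on a static sorted array of int keys. It illustrates the semantics only and is not a dynamic symbol table. The class name SortedKeys, the use of null for "no such key" and the assumption of a nonempty array without duplicates are choices made for this example.

```java
class SortedKeys {
    private final int[] keys;   // sorted ascending, no duplicates, assumed nonempty

    SortedKeys(int[] sortedKeys) { this.keys = sortedKeys.clone(); }

    int min() { return keys[0]; }
    int max() { return keys[keys.length - 1]; }

    // Number of keys k with k < x: binary search for the first key >= x.
    int rank(int x) {
        int lo = 0, hi = keys.length;   // answer lies in [lo..hi]
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (keys[mid] < x) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // i-th smallest key, zero-based.
    int select(int i) { return keys[i]; }

    // Smallest key k with k >= x, or null if there is none.
    Integer ceiling(int x) {
        int r = rank(x);
        return r < keys.length ? keys[r] : null;
    }

    // Largest key k with k <= x, or null if there is none.
    Integer floor(int x) {
        int r = rank(x);
        if (r < keys.length && keys[r] == x) return x;   // x itself is a key
        return r > 0 ? keys[r - 1] : null;
    }
}
```

For every key x in the table, select(rank(x)) returns x again, which is the sense in which select gives array-like access.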
Augmented BSTs
[Figure: the example BST (with 88 inserted), each node annotated with the size of its subtree]
Rank
[Figure: rank in the size-augmented BST, shown next to the sorted array of the 16 keys at indices 0..15]
Select
[Figure: select in the size-augmented BST, shown next to the sorted array of the 16 keys at indices 0..15]
Balanced BSTs
Balanced binary search trees:
• impose a shape invariant that guarantees O(log n) height
• add rules to restore the invariant after updates
• many examples known
  • AVL trees (height-balanced trees)
  • red-black trees
  • weight-balanced trees (BB[α] trees)
  • . . .
Other options:
• amortization: splay trees, scapegoat trees
I'd love to talk more about all of these . . . (Maybe another time)
BSTs vs. Heaps
Balanced binary search tree
Operation            Running Time
construct(A[1..n])   O(n log n)
put(k, v)            O(log n)
get(k)               O(log n)
delete(k)            O(log n)
contains(k)          O(log n)
isEmpty()            O(1)
size()               O(1)
min() / max()        O(log n) → O(1)
floor(x)             O(log n)
ceiling(x)           O(log n)
rank(x)              O(log n)
select(i)            O(log n)
Binary heaps (second column: strict Fibonacci heaps, where they differ)
Operation            Running Time
construct(A[1..n])   O(n)
insert(x, p)         O(log n)     O(1)
delMax()             O(log n)
changeKey(x, p′)     O(log n)     O(1)
max()                O(1)
isEmpty()            O(1)
size()               O(1)
• apart from faster construct, BSTs are always as good as binary heaps
• MaxPQ abstraction still helpful