
UPGMA with Priority Queues. CS181 Fall 2021


Academic year: 2022


(1)

UPGMA with Priority Queues

CS181 Fall 2021

(2)

UPGMA Algorithm

1. Initialize: assign every sequence to its own cluster

2. Iterate: while multiple clusters remain:

a. Find the two clusters with minimum distance

b. Merge these clusters together

c. Compute the distance between the new cluster and all other clusters

d. Add a new node to the tree for the new cluster

3. Termination occurs when only 1 cluster remains

(3)

UPGMA Algorithm: Straightforward Runtime

1. Initialize: assign every sequence to its own cluster ← O(n) sequences

2. Iterate: while multiple clusters remain: ← O(n) iterations

a. Find the two clusters with minimum distance ← O(n^2) pairs of clusters to check

b. Merge these clusters together ← O(n) sequences to move to the new cluster

c. Compute the distance between the new cluster and all other clusters ← O(n) computations

d. Add a new node to the tree for the new cluster ← O(1) time to compute proper height

3. Termination occurs when only 1 cluster remains

Runtime is dominated by step 2a: O(n^2) work in each of O(n) iterations → O(n^3) time
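The steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function name `upgma_naive`, the nested-tuple tree format `(left, right, height)`, and the input format (a list of labels plus a symmetric distance matrix) are all assumptions made for this example. It uses the standard UPGMA rules that the new distance is the size-weighted average of the old ones and that a new node sits at half the distance between the merged clusters.

```python
def upgma_naive(labels, d):
    """Naive O(n^3) UPGMA. labels: list of names; d: symmetric distance matrix.

    Returns a nested tuple (left, right, height); leaves are plain labels.
    (Names and tree format are hypothetical, chosen for illustration.)
    """
    n = len(labels)
    trees = {i: labels[i] for i in range(n)}   # step 1: one cluster per sequence
    sizes = {i: 1 for i in range(n)}
    # Distances between live clusters, keyed by (smaller id, larger id).
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(trees) > 1:
        # Step 2a: scan every remaining pair for the minimum distance.
        (a, b), dab = min(dist.items(), key=lambda kv: kv[1])
        new = next_id
        next_id += 1
        # Step 2c: size-weighted average distance to every other cluster.
        for c in trees:
            if c in (a, b):
                continue
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            dist[(c, new)] = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
        del dist[(a, b)]
        # Steps 2b and 2d: merge, and place the new node at height dab / 2.
        trees[new] = (trees.pop(a), trees.pop(b), dab / 2)
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return next(iter(trees.values()))


matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma_naive(["A", "B", "C", "D"], matrix))
```

The call to `min` over all pairs is the O(n^2) scan that makes each iteration expensive.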

(4)

Priority Queues

Priority queue: a data structure that stores (key, element) pairs; the key of each element determines its priority in the queue

Priority queues support the following operations:

insert(k,e): insert element e with key k into the priority queue

min(): get the element with the smallest key from the priority queue

removeMin(): remove the element with the smallest key from the priority queue
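As a rough illustration of these three operations, here is a small wrapper around Python's standard `heapq` module. The class name `MinPQ` and the choice to have `remove_min` return the removed element are assumptions made for this sketch; the slides only specify the operation names.

```python
import heapq


class MinPQ:
    """Minimal min-priority queue of (key, element) pairs (illustrative only)."""

    def __init__(self):
        self._heap = []

    def insert(self, k, e):
        # insert(k, e): add element e with key k.
        heapq.heappush(self._heap, (k, e))

    def min(self):
        # min(): return the element with the smallest key without removing it.
        return self._heap[0][1]

    def remove_min(self):
        # removeMin(): remove and return the element with the smallest key.
        return heapq.heappop(self._heap)[1]

    def __len__(self):
        return len(self._heap)


pq = MinPQ()
pq.insert(5, "e")
pq.insert(1, "a")
pq.insert(3, "c")
print(pq.min())         # element with key 1
print(pq.remove_min())  # removes that element
print(pq.min())         # element with key 3 is now the minimum
```

Because entries are stored as tuples, two entries with equal keys are ordered by comparing their elements, so this sketch assumes the elements are themselves comparable.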

(5)

Priority Queues as Heaps

Priority queues are often implemented as heaps

Heap: a complete binary tree where the key at every node is greater than or equal to the key of its parent

Complete binary tree: a binary tree in which every level is filled, except possibly the last level, which is filled from the left

Examples of heaps:

        1
      /   \
     3     7
    / \   / \
   6  30 15  20

        3
      /   \
     9     4
    / \   /
  11  10 5

(6)

Priority Queues as Heaps: Insert

To insert an element into a heap, add it to the bottom of the heap and iteratively swap it with its parent until its key is greater than or equal to its parent's key (or it reaches the root)

Example:

          1
        /   \
       3     7
      / \   / \
     6  30 15  20
    /
   2

          1
        /   \
       3     7
      / \   / \
     2  30 15  20
    /
   6

          1
        /   \
       2     7
      / \   / \
     3  30 15  20
    /
   6
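The upward movement in this example can be sketched with the usual array layout for a complete binary tree, where the parent of index i is at index (i - 1) // 2. The function name `heap_insert` and the use of bare keys (without attached elements) are simplifications assumed for this illustration.

```python
def heap_insert(heap, key):
    """Insert key into a min-heap stored as a list (illustrative sketch)."""
    heap.append(key)              # add at the bottom of the heap
    i = len(heap) - 1
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] <= heap[i]:
            break                 # heap property holds; stop moving up
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


# Level-by-level layout of the first tree in the example, before inserting 2.
heap = [1, 3, 7, 6, 30, 15, 20]
heap_insert(heap, 2)
print(heap)  # [1, 2, 7, 3, 30, 15, 20, 6]
```

The printed list is the last tree of the example read level by level: 2 was swapped first with 6 and then with 3.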

(7)

Priority Queues as Heaps: Min and RemoveMin

The element with the smallest key is always at the root of a heap

To remove this element, move the last element of the heap to the root, and iteratively move this element downwards by swapping it with its smaller child until its key is no larger than either of its children's keys (or it becomes a leaf)

Example:

          1
        /   \
       4     3
      / \   / \
     5  30 15  20
    /
   6

          6
        /   \
       4     3
      / \   / \
     5  30 15  20

          3
        /   \
       4     6
      / \   / \
     5  30 15  20
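The downward movement in this example can be sketched on the same array layout, where the children of index i are at indices 2i + 1 and 2i + 2. The function name `heap_remove_min` and the use of bare keys are assumptions made for this illustration; it expects a non-empty heap.

```python
def heap_remove_min(heap):
    """Remove and return the smallest key of a non-empty list-based min-heap."""
    smallest = heap[0]
    last = heap.pop()             # detach the last element of the heap
    if heap:
        heap[0] = last            # move it to the root
        i, n = 0, len(heap)
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            child = i
            # Pick the smaller child, if it is smaller than the current key.
            if left < n and heap[left] < heap[child]:
                child = left
            if right < n and heap[right] < heap[child]:
                child = right
            if child == i:
                break             # no child is smaller; stop moving down
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


# Level-by-level layout of the first tree in the example.
heap = [1, 4, 3, 5, 30, 15, 20, 6]
print(heap_remove_min(heap))  # 1
print(heap)                   # [3, 4, 6, 5, 30, 15, 20]
```

The resulting list is the last tree of the example read level by level: 6 moved to the root and was then swapped with its smaller child, 3.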

(8)

Priority Queues as Heaps: Operation Runtimes

A heap with n elements has height O(log n)

insert(k,e) and removeMin() both require O(log n) time because they need to move an element up or down the height of the tree

min() requires O(1) time because the minimum key is always at the root

(9)

Priority Queues as Heaps: Additional Functionality

If we store a locator L for every node in a priority queue, then we can access any node in the priority queue in O(1) time and remove any node from the priority queue in O(log n) time

If we know all of the elements that will be inserted into the priority queue in advance, we can construct the priority queue bottom-up in O(n) time

This additional functionality is covered in more detail in CSCI 1570 (Design and Analysis of Algorithms)
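For the bottom-up construction, Python's standard library offers `heapq.heapify`, which rearranges an existing list into a heap in linear time. The short sketch below only demonstrates that call; the helper `is_min_heap` is a hypothetical checker written for this example. Note that `heapq` does not provide locators, so removing an arbitrary node as described above would need a different structure.

```python
import heapq


def is_min_heap(heap):
    """Check that every key is >= the key of its parent (illustrative helper)."""
    return all(heap[(i - 1) // 2] <= heap[i] for i in range(1, len(heap)))


keys = [20, 6, 15, 1, 30, 7, 3]  # all keys known in advance
heap = list(keys)
heapq.heapify(heap)              # bottom-up construction in O(n) time

print(heap[0])                   # smallest key is at the root
print(is_min_heap(heap))
```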

(10)

UPGMA Algorithm with Priority Queues

The elements of the priority queue are pairs of clusters, and the keys of the priority queue are distances between those clusters

1. Initialize: assign every sequence to its own cluster and construct the initial priority queue

2. Iterate: while multiple clusters remain:

a. Find the two clusters with minimum distance

b. Merge these clusters together and remove all pairs containing the merged clusters from the priority queue

c. Compute the distance between the new cluster and all other clusters and add all these pairs of clusters to the priority queue

d. Add a new node to the tree for the new cluster

3. Termination occurs when only 1 cluster remains

(11)

UPGMA Algorithm: Improved Runtime

1. Initialize:

a. Assign every sequence to its own cluster ← O(n) sequences

b. Construct the initial priority queue ← O(n^2) time because we know all O(n^2) elements in advance

2. Iterate: while multiple clusters remain: ← O(n) iterations

a. Find the two clusters with minimum distance ← O(1) time

b. Merge these clusters together ← O(n) sequences to move to the new cluster

c. Remove all pairs containing the merged clusters from the priority queue ← O(n log n) time

i. Note: each removal requires O(log n^2) = O(2 log n) = O(log n) time

d. Compute the distance between the new cluster and all other clusters ← O(n) computations

e. Add all these pairs of clusters to the priority queue ← O(n log n) time

f. Add a new node to the tree for the new cluster ← O(1) time to compute proper height

3. Termination occurs when only 1 cluster remains

Runtime is now dominated by 2c and 2e → O(n^2 log n) time
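A priority-queue version can be sketched with Python's `heapq`. One important difference from the steps above: `heapq` has no locators, so this sketch does not remove pairs eagerly in step 2c. Instead it uses lazy deletion, a swapped-in technique where outdated pairs stay in the queue and are skipped when they reach the front. Since at most O(n^2) pairs are ever inserted, the total cost is still O(n^2 log n). The function name `upgma_pq`, the input format, and the nested-tuple tree format `(left, right, height)` are assumptions made for this example.

```python
import heapq


def upgma_pq(labels, d):
    """UPGMA with a heap of (distance, cluster, cluster) entries (sketch).

    labels: list of names; d: symmetric distance matrix.
    Stale entries are skipped lazily instead of being removed via locators.
    """
    n = len(labels)
    trees = {i: labels[i] for i in range(n)}
    sizes = {i: 1 for i in range(n)}
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}
    # Step 1b: all initial pairs are known, so build the heap bottom-up.
    heap = [(dij, i, j) for (i, j), dij in dist.items()]
    heapq.heapify(heap)
    next_id = n                      # cluster ids are never reused
    while len(trees) > 1:
        # Step 2a: the minimum-distance pair is at the front of the queue.
        dab, a, b = heapq.heappop(heap)
        if a not in trees or b not in trees:
            continue                 # pair mentions a merged cluster: skip it
        new = next_id
        next_id += 1
        # Steps 2d and 2e: compute new distances and add the new pairs.
        for c in trees:
            if c in (a, b):
                continue
            dac = dist.pop((min(a, c), max(a, c)))
            dbc = dist.pop((min(b, c), max(b, c)))
            dnew = (sizes[a] * dac + sizes[b] * dbc) / (sizes[a] + sizes[b])
            dist[(c, new)] = dnew
            heapq.heappush(heap, (dnew, c, new))
        del dist[(a, b)]
        # Steps 2b and 2f: merge, and place the new node at height dab / 2.
        trees[new] = (trees.pop(a), trees.pop(b), dab / 2)
        sizes[new] = sizes.pop(a) + sizes.pop(b)
    return next(iter(trees.values()))


matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma_pq(["A", "B", "C", "D"], matrix))
```

A locator-based priority queue, as described on the earlier slide, would follow step 2c literally and keep the queue free of stale pairs, at the cost of a more involved data structure.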

(12)

Summary

The straightforward runtime of the UPGMA algorithm is O(n^3)

Priority queues are powerful data structures that can efficiently find the maximum or minimum of a large group of elements

Implementing the UPGMA algorithm with priority queues improves its runtime to O(n^2 log n)
