
Quicksort


Academic year: 2020


Quicksort I

To sort a[left...right]:

1. If left < right:

   1.1. Partition a[left...right] such that all of a[left...p-1] are less than a[p], and all of a[p+1...right] are >= a[p]
   1.2. Quicksort a[left...p-1]
   1.3. Quicksort a[p+1...right]

Partitioning (Quicksort II)

A key step in the Quicksort algorithm is partitioning the array.

■ We choose some (any) number p in the array to use as a pivot
■ We partition the array into three parts: the numbers less than p, then p itself, then the numbers greater than or equal to p

Partitioning II

Choose an array value (say, the first) to use as the pivot

Starting from the left end, find the first element that is greater than or equal to the pivot

Searching backward from the right end, find the first element that is less than the pivot

Interchange (swap) these two elements

Repeat, searching from where we left off, until the two searches meet

Partitioning

To partition a[left...right]:

1. Set pivot = a[left], l = left + 1, r = right
2. While l < r, do
   2.1. While l < right and a[l] < pivot, set l = l + 1
   2.2. While r > left and a[r] >= pivot, set r = r - 1
   2.3. If l < r, swap a[l] and a[r]
3. Set a[left] = a[r], a[r] = pivot

Example of partitioning

■ choose pivot: [4] 3  6  9 2 4 3 1 2  1  8 9  3  5 6
■ search:        4  3 [6] 9 2 4 3 1 2  1  8 9 [3] 5 6
■ swap:          4  3 [3] 9 2 4 3 1 2  1  8 9 [6] 5 6
■ search:        4  3  3 [9] 2 4 3 1 2 [1] 8 9  6 5 6
■ swap:          4  3  3 [1] 2 4 3 1 2 [9] 8 9  6 5 6
■ search:        4  3  3  1 2 [4] 3 1 [2] 9 8 9  6 5 6
■ swap:          4  3  3  1 2 [2] 3 1 [4] 9 8 9  6 5 6

(brackets mark the pivot and the pair of elements found or swapped at each step)

The partition method (Java)

static int partition(int[] a, int left, int right) {
    int p = a[left], l = left + 1, r = right;
    while (l < r) {
        while (l < right && a[l] < p) l++;
        while (r > left && a[r] >= p) r--;
        if (l < r) {
            int temp = a[l]; a[l] = a[r]; a[r] = temp;
        }
    }
    a[left] = a[r]; a[r] = p;   // move the pivot into its final position
    return r;
}

The quicksort method (in Java)

static void quicksort(int[] array, int left, int right) {
    if (left < right) {
        int p = partition(array, left, right);
        quicksort(array, left, p - 1);
        quicksort(array, p + 1, right);
    }
}
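For reference, the two methods above can be assembled into one runnable class (the class name and the demo `main` are illustrative additions; the demo array is the one from the partitioning example):

```java
import java.util.Arrays;

public class QuicksortDemo {

    // Partition a[left...right] around the pivot a[left]; returns the pivot's final index
    static int partition(int[] a, int left, int right) {
        int p = a[left], l = left + 1, r = right;
        while (l < r) {
            while (l < right && a[l] < p) l++;      // find an element >= pivot
            while (r > left && a[r] >= p) r--;      // find an element < pivot
            if (l < r) { int temp = a[l]; a[l] = a[r]; a[r] = temp; }
        }
        a[left] = a[r]; a[r] = p;                   // move the pivot into its final position
        return r;
    }

    static void quicksort(int[] array, int left, int right) {
        if (left < right) {
            int p = partition(array, left, right);
            quicksort(array, left, p - 1);
            quicksort(array, p + 1, right);
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 3, 6, 9, 2, 4, 3, 1, 2, 1, 8, 9, 3, 5, 6};  // the slides' example array
        quicksort(a, 0, a.length - 1);
        System.out.println(Arrays.toString(a));
        // [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 9, 9]
    }
}
```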

Analysis of quicksort—best case

Suppose each partition operation divides the array almost exactly in half

Then the depth of the recursion is log₂n

Best case II

We cut the array size in half each time

So the depth of the recursion is log₂n

At each level of the recursion, all the partitions at that level do work that is linear in n

O(log₂n) * O(n) = O(n log₂n)

Hence in the best case, quicksort has time complexity O(n log₂n)

Worst case

In the worst case, partitioning always divides the size n array into these three parts:

■ A length one part, containing the pivot itself
■ A length zero part, and
■ A length n-1 part, containing everything else

We don't recur on the zero-length part

Recurring on the length n-1 part requires (in the worst case) recurring to depth n-1

Worst case for quicksort

In the worst case, recursion may be n levels deep (for an array of size n)

But the partitioning work done at each level is still n

O(n) * O(n) = O(n²)

So the worst case for Quicksort is O(n²)

When does this happen?

■ There are many arrangements that could make this happen
■ Here are two common cases:
   ■ When the array is already sorted
   ■ When the array is sorted in reverse order

Typical case for quicksort

If the array is sorted to begin with, Quicksort is terrible: O(n²)

It is possible to construct other bad cases

However, Quicksort is usually O(n log₂n)

The constants are so good that Quicksort is generally the fastest algorithm known

Tweaking Quicksort

Almost anything you can try to "improve" Quicksort will actually slow it down

One good tweak is to switch to a different sorting method when the subarrays get small (say, 10 or 12)

■ Quicksort has too much overhead for small array sizes

For large arrays, it might be a good idea to check beforehand if the array is already sorted
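The small-subarray tweak can be sketched as follows. The cutoff value, the class and method names, and the choice of insertion sort as the "different sorting method" are illustrative assumptions; the partition method is the one from the earlier slide:

```java
public class QuicksortCutoff {
    static final int CUTOFF = 10;   // threshold from the slide ("say, 10 or 12")

    static void quicksort(int[] a, int left, int right) {
        if (right - left + 1 <= CUTOFF) {
            insertionSort(a, left, right);   // less overhead on tiny subarrays
            return;
        }
        int p = partition(a, left, right);
        quicksort(a, left, p - 1);
        quicksort(a, p + 1, right);
    }

    // Same partition as on the earlier slide: pivot is a[left], returns its final index
    static int partition(int[] a, int left, int right) {
        int p = a[left], l = left + 1, r = right;
        while (l < r) {
            while (l < right && a[l] < p) l++;
            while (r > left && a[r] >= p) r--;
            if (l < r) { int temp = a[l]; a[l] = a[r]; a[r] = temp; }
        }
        a[left] = a[r]; a[r] = p;
        return r;
    }

    static void insertionSort(int[] a, int left, int right) {
        for (int i = left + 1; i <= right; i++) {
            int key = a[i], j = i - 1;
            while (j >= left && a[j] > key) { a[j + 1] = a[j]; j--; }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2, 7, 4, 6, 0, 15, 12, 11, 14, 13, 10};
        quicksort(a, 0, a.length - 1);
        System.out.println(java.util.Arrays.toString(a));
    }
}
```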

Picking a better pivot

Before, we picked the first element of the subarray to use as a pivot

■ If the array is already sorted, this results in O(n²) behavior
■ It's no better if we pick the last element

We could do an optimal quicksort (guaranteed O(n log n)) if we always picked a pivot value that exactly cuts the array in half

■ Such a value is called a median: half of the values in the array are larger, half are smaller
■ The easiest way to find the median is to sort the array

Median of three

Obviously, it doesn't make sense to sort the array in order to find the median to use as a pivot

Instead, compare just three elements of our (sub)array—the first, the last, and the middle

■ Take the median (middle value) of these three as pivot
■ It's possible (but not easy) to construct cases which will make this technique O(n²)

Suppose we rearrange (sort) these three numbers so that the smallest is in the first position, the largest in the last position, and the other in the middle

Final comments

Quicksort is the fastest known sorting algorithm

For optimum efficiency, the pivot must be chosen carefully

"Median of three" is a good technique for choosing the pivot

However, no matter what you do, there will be some cases where Quicksort runs slowly

The Hiring Problem

■ You are using an employment agency to hire a new assistant.
■ The agency sends you one candidate each day.
■ You interview the candidate and must immediately decide whether or not to hire that person. But if you hire, you must also fire your current office assistant—even if it's someone you have recently hired.
■ Cost to interview is ci per candidate.
■ Cost to hire is ch per candidate.
■ You want to have, at all times, the best candidate seen so far.
■ When you interview a candidate who is better than your current assistant, you fire the current assistant and hire the candidate.

Pseudo-code to Model the Scenario

Hire-Assistant(n)
  best ← 0        ;; candidate 0 is a least-qualified sentinel candidate
  for i ← 1 to n
    do interview candidate i
       if candidate i is better than candidate best
         then best ← i
              hire candidate i

Cost Model: Slightly different from the model considered so far. However, the analytical techniques are the same.

• Want to determine the total cost of hiring the best candidate.
• If n candidates are interviewed and m are hired, then the cost is nci + mch.
• Have to pay nci to interview, no matter how many we hire.
• So, focus on analyzing the hiring cost mch.
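A direct translation of the pseudo-code can be sketched in Java (the class and method names are illustrative; candidate quality is modeled as an array of ranks, and `best = -1` plays the role of the "candidate 0" sentinel):

```java
public class HireAssistant {
    // Returns how many times we hire, following the Hire-Assistant pseudo-code.
    // rank[i] is the quality of candidate i; higher is better.
    static int hire(int[] rank) {
        int best = -1;        // stands in for the least-qualified sentinel candidate
        int hires = 0;
        for (int i = 0; i < rank.length; i++) {
            // interview candidate i
            if (best == -1 || rank[i] > rank[best]) {
                best = i;     // best <- i
                hires++;      // hire candidate i
            }
        }
        return hires;
    }

    public static void main(String[] args) {
        // Worst case: candidates arrive in increasing order of quality, so every one is hired
        System.out.println(hire(new int[]{1, 2, 3, 4, 5}));  // prints 5
        // If the best candidate happens to come first, there is only one hire
        System.out.println(hire(new int[]{5, 1, 4, 2, 3}));  // prints 1
    }
}
```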

Worst-case Analysis

In the worst case, we hire all n candidates.

This happens if each candidate is better than all those who came before. Candidates come in increasing order of quality.

Cost is nci + nch.

If this happens, we fire the agency. What should we expect to happen on average?

Probabilistic Analysis

We need a probability distribution of inputs to determine average-case behavior over all possible inputs.

For the hiring problem, we can assume that candidates come in random order.

■ Assign a rank rank(i), a unique integer in the range 1 to n, to each candidate.
■ The ordered list 〈rank(1), rank(2), …, rank(n)〉 is a permutation of the candidate numbers 〈1, 2, …, n〉.
■ Let's assume that the list of ranks is equally likely to be any one of the n! permutations.
■ That is, the ranks form a uniform random permutation.
■ We determine the number of candidates hired on average, under this assumption.

Randomized Algorithm

Impose a distribution on the inputs by using randomization within the algorithm.

Used when the input distribution is not known, or cannot be modeled computationally.

For the hiring problem:

■ We are unsure if the candidates are coming in a random order.
■ To make sure that we see the candidates in a random order, we make the following change.
■ The agency sends us a list of n candidates in advance.
■ Each day, we randomly choose a candidate to interview.
■ Thus, instead of relying on the candidates being presented in a random order, we enforce a random order ourselves.

Discrete Random Variables

A random variable X is a function from a sample space S to the real numbers.

If the space is finite or countably infinite, a random variable X is called a discrete random variable.

It maps each possible outcome of an experiment to a real number.

For a random variable X and a real number x, the event X=x is {s ∈ S : X(s) = x}.

Pr{X=x} = ∑{s∈S : X(s)=x} Pr{s}

f(x) = Pr{X=x} is the probability density function of the random variable X.

Discrete Random Variables

Example:

■ Rolling 2 dice. X: sum of the values on the two dice.
■ Pr{X=7} = Pr{(1,6),(2,5),(3,4),(4,3),(5,2),(6,1)} = 6/36 = 1/6

Expectation

Average or mean

The expected value of a discrete random variable X is

E[X] = ∑x x·Pr{X=x}

Linearity of Expectation

E[X+Y] = E[X] + E[Y], for all X, Y

E[aX+Y] = aE[X] + E[Y], for constant a and all X, Y

For mutually independent random variables X1, X2, …, Xn:

E[X1·X2·…·Xn] = E[X1]·E[X2]·…·E[Xn]

Expectation – Example

Let X be the RV denoting the value obtained when a fair die is thrown. What will be the mean of X, when the die is thrown n times?

■ Let X1, X2, …, Xn denote the values obtained during the n throws.
■ The mean of the values is (X1 + X2 + … + Xn)/n.
■ Since the probability of getting each of the values 1 through 6 is 1/6, on average we can expect each of the 6 values to show up (1/6)n times.
■ So, the numerator in the expression for the mean can be written as (1/6)n·1 + (1/6)n·2 + … + (1/6)n·6 = 3.5n, and the mean is 3.5.
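The same number falls out of the definition E[X] = ∑x x·Pr{X=x} directly; a minimal sketch (the class and method names are illustrative):

```java
public class DieExpectation {
    // E[X] = sum over faces v of v * Pr{X = v}, with Pr{X = v} = 1/6 for a fair die
    static double expectedValue() {
        double expected = 0.0;
        for (int v = 1; v <= 6; v++) {
            expected += v * (1.0 / 6.0);
        }
        return expected;
    }

    public static void main(String[] args) {
        System.out.println(expectedValue());  // 3.5, up to floating-point rounding
    }
}
```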

Indicator Random Variables

A simple yet powerful technique for computing the expected value of a random variable.

A convenient method for converting between probabilities and expectations.

Helpful in situations in which there may be dependence.

Takes only 2 values, 1 and 0.

The indicator random variable for an event A of a sample space is defined as:

I{A} = 1 if A occurs, 0 if A does not occur

Indicator Random Variable

Lemma 5.1

Given a sample space S and an event A in the sample space S, let XA = I{A}. Then E[XA] = Pr{A}.

Proof:

Let Ā = S – A (the complement of A). Then

E[XA] = E[I{A}] = 1·Pr{A} + 0·Pr{Ā} = Pr{A}.

Indicator RV – Example

Problem: Determine the expected number of heads in n coin flips.

Method 1: Without indicator random variables.

Let X be the random variable for the number of heads in n flips.

Then E[X] = ∑k=0..n k·Pr{X=k}

Indicator RV – Example

Method 2: Use indicator random variables.

Define n indicator random variables Xi, 1 ≤ i ≤ n.

Let Xi be the indicator random variable for the event that the i-th flip results in a head:

Xi = I{the i-th flip results in H}

Then X = X1 + X2 + … + Xn = ∑i=1..n Xi.

By Lemma 5.1, E[Xi] = Pr{H} = 1/2, 1 ≤ i ≤ n.

The expected number of heads is E[X] = E[∑i=1..n Xi].

By linearity of expectation, E[∑i=1..n Xi] = ∑i=1..n E[Xi].

E[X] = ∑i=1..n E[Xi] = ∑i=1..n 1/2 = n/2.
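The indicator-variable prediction of n/2 heads can be checked with a quick simulation (a sketch; the class name, trial count, and seed are arbitrary assumptions):

```java
import java.util.Random;

public class CoinFlips {
    // Average number of heads per trial, over many trials of n flips each
    static double averageHeads(int n, int trials, long seed) {
        Random rng = new Random(seed);   // fixed seed so the run is repeatable
        long totalHeads = 0;
        for (int t = 0; t < trials; t++) {
            for (int i = 0; i < n; i++) {
                // X_i = I{the i-th flip results in H}: contributes 1 on heads, 0 on tails
                if (rng.nextBoolean()) totalHeads++;
            }
        }
        return (double) totalHeads / trials;
    }

    public static void main(String[] args) {
        // Linearity of expectation predicts E[X] = n/2 = 5.0 for n = 10
        System.out.println(averageHeads(10, 100_000, 42L));
    }
}
```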

Randomized Hire-Assistant

Randomized-Hire-Assistant(n)
  randomly permute the list of candidates
  best ← 0        ;; candidate 0 is a least-qualified dummy candidate
  for i ← 1 to n
    do interview candidate i
       if candidate i is better than candidate best
         then best ← i
              hire candidate i
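A sketch of the randomized version (the class and method names, the Fisher–Yates shuffle used for "randomly permute the list", and the trial parameters are illustrative assumptions; the hiring loop matches the pseudo-code above):

```java
import java.util.Random;

public class RandomizedHire {
    // One run: randomly permute the candidate ranks (Fisher-Yates), then hire greedily
    static int randomizedHire(int[] rank, Random rng) {
        int[] a = rank.clone();
        for (int i = a.length - 1; i > 0; i--) {   // randomly permute the list of candidates
            int j = rng.nextInt(i + 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        int best = -1, hires = 0;                  // best = -1 plays the least-qualified dummy
        for (int i = 0; i < a.length; i++) {
            if (best == -1 || a[i] > a[best]) {    // candidate i is better than candidate best
                best = i;
                hires++;                           // hire candidate i
            }
        }
        return hires;
    }

    static double averageHires(int[] rank, int trials, long seed) {
        Random rng = new Random(seed);
        long total = 0;
        for (int t = 0; t < trials; t++) total += randomizedHire(rank, rng);
        return (double) total / trials;
    }

    public static void main(String[] args) {
        int[] worstCase = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};  // n hires without shuffling
        // After shuffling, the expected number of hires is the harmonic number H_10, about 2.93
        System.out.println(averageHires(worstCase, 50_000, 1L));
    }
}
```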

Analysis of the Hiring Problem

(Probabilistic analysis of the deterministic algorithm)

X – RV that denotes the number of times we hire a new office assistant.

Define indicator RVs X1, X2, …, Xn:

Xi = I{candidate i is hired}

As in the previous example, X = X1 + X2 + … + Xn.

■ Need to compute Pr{candidate i is hired}.

Pr{candidate i is hired}:

■ Candidate i is hired only if i is better than candidates 1, 2, …, i-1.
■ By assumption, candidates arrive in random order, so candidates 1, 2, …, i arrive in random order.
■ Each of these i candidates has an equal chance of being the best so far.
■ Pr{candidate i is the best so far} = 1/i.
■ E[Xi] = 1/i.

Analysis of the Hiring Problem

■ Compute E[X], the number of candidates we expect to hire:

E[X] = ∑i=1..n E[Xi] = ∑i=1..n 1/i = ln n + O(1)

by Equation (A.7), the sum of a harmonic series.
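The harmonic-series bound can be checked numerically (a sketch; the class name and the choice of n are arbitrary):

```java
public class HarmonicBound {
    // H_n = 1/1 + 1/2 + ... + 1/n, the expected number of hires for n candidates
    static double harmonic(int n) {
        double h = 0.0;
        for (int i = 1; i <= n; i++) h += 1.0 / i;
        return h;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Equation (A.7): H_n = ln n + O(1); concretely, ln n < H_n <= ln n + 1
        System.out.println(harmonic(n) + " vs ln n = " + Math.log(n));
    }
}
```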

Analysis of the randomized hiring problem

Permuting the input results in a situation identical to that of the deterministic version.

Hence, the same analysis applies.

The expected hiring cost is hence O(ch ln n).

Quicksort Analysis

Assume that keys are random, uniformly distributed.

What is the best case running time?

■ Recursion:
   1. Partition splits array in two sub-arrays of size n/2
   2. Quicksort each sub-array
■ Depth of recursion tree? O(log₂n)
■ Partitioning work at each level? O(n)

Best case running time: O(n log₂n)

Quicksort: Worst Case

Assume the first element is chosen as pivot.

Assume we get an array that is already in order:

pivot_index = 0

  2   4  10  12  13  50  57  63 100
[0] [1] [2] [3] [4] [5] [6] [7] [8]

1. While data[too_big_index] <= data[pivot] ++too_big_index
2. While data[too_small_index] > data[pivot] --too_small_index
3. If too_big_index < too_small_index, swap data[too_big_index] and data[too_small_index]
4. While too_small_index > too_big_index, go to 1.
5. Swap data[too_small_index] and data[pivot_index]

On this input the left scan stops immediately at index 1 (4 > 2), the right scan runs all the way back down to the pivot itself, and step 5 swaps the pivot with itself: partitioning produces one sub-array of size 0 and one of size n-1.

Quicksort Analysis

Assume that keys are random, uniformly distributed.

Best case running time: O(n log₂n)

What is the worst case running time?

■ Recursion:
   1. Partition splits array in two sub-arrays:
      • one sub-array of size 0
      • the other sub-array of size n-1
   2. Quicksort each sub-array
■ Depth of recursion tree? O(n)
■ Partitioning work at each level? O(n)

Worst case running time: O(n²)!!!

Improved Pivot Selection

Pick the median value of three elements from the data array: data[0], data[n/2], and data[n-1].
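Median-of-three selection can be sketched as follows (the class and method names and the demo array are illustrative; the method returns the median of data[0], data[n/2], and data[n-1]):

```java
public class MedianOfThree {
    // Returns the median of data[lo], data[mid], data[hi] to use as the pivot value
    static int medianOfThree(int[] data, int lo, int mid, int hi) {
        int a = data[lo], b = data[mid], c = data[hi];
        if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
        if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
        return c;
    }

    public static void main(String[] args) {
        int[] data = {13, 2, 50, 7, 9};   // illustrative array
        // first, middle, last elements are 13, 50, 9; their median is 13
        System.out.println(medianOfThree(data, 0, data.length / 2, data.length - 1));
    }
}
```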

Improving Performance of Quicksort

Improved selection of pivot.

For sub-arrays of size 3 or less, apply brute force search:

■ Sub-array of size 1: trivial
■ Sub-array of size 2: if (data[first] > data[second]) swap them
■ Sub-array of size 3: sort directly with a few compares and swaps
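The brute-force handling of tiny sub-arrays can be sketched like this (the class and method names are illustrative, and since the size-3 case is cut off on the slide, the three-compare version here is an assumption):

```java
public class SmallSort {
    // Sorts data[first .. first+size-1] for size <= 3 by direct comparisons
    static void bruteForceSort(int[] data, int first, int size) {
        if (size <= 1) return;                                          // size 1: trivial
        if (data[first] > data[first + 1]) swap(data, first, first + 1); // size 2: one compare
        if (size == 2) return;
        // size 3: bubble the largest to the end, then fix the first pair
        if (data[first + 1] > data[first + 2]) swap(data, first + 1, first + 2);
        if (data[first] > data[first + 1]) swap(data, first, first + 1);
    }

    static void swap(int[] d, int i, int j) { int t = d[i]; d[i] = d[j]; d[j] = t; }

    public static void main(String[] args) {
        int[] data = {3, 1, 2};
        bruteForceSort(data, 0, 3);
        System.out.println(java.util.Arrays.toString(data));  // [1, 2, 3]
    }
}
```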
