
Data Structures. Algorithm Performance and Big O Analysis


Academic year: 2021


(1)

Data Structures

Algorithm Performance and Big O Analysis

(2)

What’s an Algorithm?

• … a clearly specified set of instructions to be followed to solve a problem.

• In essence: A computer program.

• In detail: Defined mathematically in a course on Theory of Computation (Turing).

(3)

What Does Big “O” Do?

• Measures the growth rate of an algorithm as the size of its input grows.

• Huh? “O” is a math function that helps estimate how much longer it takes to run n+1 inputs versus n inputs (or n+2, 2n, 3n…).

• Doesn’t care what language you use! Only cares about the underlying algorithm.

(4)

What Doesn’t Big “O” Do?

• Doesn’t tell you that algorithm A is faster than algorithm B for a particular input.

– Why not? Only tells you if one grows faster than another in a general sense for all inputs.

– Usually concerned with very large data inputs.

Called asymptotic algorithm analysis.

(5)

Example: Doesn’t Care About Particular Input

public void algorithm1(Object xInput) {
    …16 million lines of code…
}

public void algorithm2(Object xInput) {
    if (xInput.size() == 2074) {
        return;
    } else {
        …16 million different lines…
    }
}

Which one is faster? Always?

We only care about the “average” behavior of the 16 million lines.

(6)

How to Calculate Run Time?

1. Each basic operation in the code counts for 1 time unit.

• A basic operation executes in the same time no matter what values it is supplied.

• Examples:

Adding two integers is a basic operation.

Reading a[1] is a basic operation (independent of array size).

Summing the values in an array is NOT a basic operation (why?).

2. Ignore actual time units (seconds, days, etc.).

Could be 1 ns for a fast computer or 1 day for a really slow computer. But for large inputs, won’t matter.

3. Ignore time for method calls, returns, and declarations.

Doesn’t matter in the long run.
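To see why summing an array is not a basic operation, here is a small sketch that counts the additions it performs. The class and method names (`BasicOps`, `countAdditions`) are made up for illustration. The count grows with the array length, whereas reading a[1] is a single step no matter how big the array is.

```java
// Illustrative sketch (names are hypothetical): count the additions
// needed to sum an array, to show the work depends on the array size.
class BasicOps {
    // Returns how many additions are performed while summing the array.
    static int countAdditions(int[] a) {
        int sum = 0;
        int additions = 0;
        for (int value : a) {
            sum += value;   // one addition per element
            additions++;
        }
        return additions;
    }

    public static void main(String[] args) {
        System.out.println(countAdditions(new int[10]));
        System.out.println(countAdditions(new int[1000]));
    }
}
```

The number of additions equals the number of elements, so the time depends on the values supplied (here, the array's length). That breaks the rule for a basic operation.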

(7)

Run Time Example 1

Calculating $\sum_{i=1}^{N} i^3$:

public int sum(int num) {
    int partialSum = 0;
    for (int i = 1; i <= num; i++) {
        partialSum += i * i * i;
    }
    return partialSum;
}

How long to run this code?

(8)

Run Time Example 1 (cont.)

Calculating $\sum_{i=1}^{N} i^3$:

public int sum(int num) {            // no cost
    int partialSum = 0;              // costs 1 (to init/store in memory)
    for (int i = 1;                  // costs 1 (to init/store in memory)
         i <= num;                   // costs N+1 (once for each test of <=, and +1 because of the last time through, when it fails)
         i++) {                      // costs 2N (once for each + and =; recall i++ is just i = i + 1)
        partialSum += i * i * i;     // total cost of 4N (costs 4 per execution: 1 addition, 2 multiplications, 1 assignment)
    }
    return partialSum;               // no cost
}

How long to run this code?

Final tally: $1 + 1 + (N+1) + 2N + 4N = 7N + 3$
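As a sanity check on the tally, the sketch below re-runs the loop and adds up time units using the same counting rules. The class name `SumCost` and the `units` counter are assumptions made for this illustration; they are not part of the original code.

```java
// Illustrative sketch: tally time units for sum(num) with the slide's rules.
class SumCost {
    static long cost(int num) {
        long units = 0;
        int partialSum = 0; units += 1;          // initialization costs 1
        int i = 1;          units += 1;          // loop initialization costs 1
        while (true) {
            units += 1;                          // each test of i <= num costs 1
            if (!(i <= num)) break;              // the final, failing test is counted too
            partialSum += i * i * i; units += 4; // 2 multiplications, 1 addition, 1 assignment
            i++;            units += 2;          // 1 addition, 1 assignment
        }
        return units;
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 100}) {
            System.out.println("N=" + n + " cost=" + cost(n) + " formula=" + (7L * n + 3));
        }
    }
}
```

For every N tried, the counted cost and the formula 7N + 3 should agree.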

(9)

Run Time Example 2

public int sum(int num) {                // no cost
    int partialSum = 0;                  // costs 1
    for (int i = 1;                      // costs 1
         i <= num;                       // costs N+1
         i++) {                          // costs 2N
        for (int j = 1;                  // costs N*1
             j <= num;                   // costs N*(N+1)
             j++) {                      // costs N*2N
            partialSum += i * j;         // costs N*N*3
        }
    }
    return partialSum;                   // no cost
}

Final tally: $1 + 1 + (N+1) + 2N + N \cdot 1 + N(N+1) + N \cdot 2N + N \cdot N \cdot 3 = 6N^2 + 5N + 3$
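The nested-loop tally can be checked the same way. This is only a sketch: `NestedSumCost` and its `units` counter are hypothetical names, and the loops are restructured slightly so each test and increment can be counted separately.

```java
// Illustrative sketch: tally time units for the nested-loop sum.
class NestedSumCost {
    static long cost(int num) {
        long units = 0;
        units += 1;                     // int partialSum = 0
        units += 1;                     // int i = 1
        for (int i = 1; ; i++) {
            units += 1;                 // outer test i <= num
            if (i > num) break;
            units += 1;                 // int j = 1
            for (int j = 1; ; j++) {
                units += 1;             // inner test j <= num
                if (j > num) break;
                units += 3;             // partialSum += i * j (multiply, add, assign)
                units += 2;             // j++
            }
            units += 2;                 // i++
        }
        return units;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            long formula = 6L * n * n + 5L * n + 3;
            System.out.println("N=" + n + " cost=" + cost(n) + " formula=" + formula);
        }
    }
}
```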

(10)

But This is Overkill!

public int sum(int num) {

int partialSum = 0;

for(int i=1; i<= num; i++) {

for(int j=1; j<= num; j++) {

partialSum += i * j;

} }

return partialSum;

}

Really only one operation, and it happens $N^2$ times.

So we say order of $N^2$, or $O(N^2)$.

(11)

Likewise, More Overkill

Calculating $\sum_{i=1}^{N} i^3$:

public int sum(int num) {
    int partialSum = 0;
    for (int i = 1; i <= num; i++) {
        partialSum += i * i * i;
    }
    return partialSum;
}

Really only one operation, and it happens N times.

So we say order of N, or O(N).

(12)

Another “Order Of” Example

public void cool(int n) {
    for (int i = 2; i <= n; i++) {
        int j = (1 + i * i % 3 % i) / (i + 2);
    }
}

Also, run time, T(N) = 10N - 8.

Can you show me?

The heart of the code is the line that computes j.

And it happens N - 1 times (the loop starts at i = 2), which is roughly N.

So order of N.

Or say O(N).
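One way to "show me" is to count the time units directly. The sketch below assumes the loop body costs 7 units (two additions, one multiplication, two remainders, one division, one assignment); that breakdown is my reading of the counting rules, and `CoolCost` is a made-up name.

```java
// Illustrative sketch: tally time units for cool(n).
class CoolCost {
    static long cost(int n) {
        long units = 1;             // int i = 2
        for (int i = 2; ; i++) {
            units += 1;             // test i <= n
            if (i > n) break;
            units += 7;             // body: +, *, %, %, /, +, = (assumed breakdown)
            units += 2;             // i++
        }
        return units;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100}) {
            System.out.println("N=" + n + " cost=" + cost(n) + " formula=" + (10L * n - 8));
        }
    }
}
```

Under that assumption the count is 1 + N + 2(N-1) + 7(N-1) = 10N - 8 for N ≥ 1, which matches the stated run time.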

(13)

Ah, Back to the Big-O (Definition)

• Call T(N) the run time.

• Definition: $T(N) = O(f(N))$ if there are positive constants $c$ and $n_0$ such that $T(N) \le c\,f(N)$ when $N > n_0$.

• What’s it mean? For big enough N, the run time is never more than a constant multiple of f(N). (Only the highest order term matters!) (And constants don’t matter.)

Note: f(N) should be the smallest such function for which c and $n_0$ exist.

(14)

Example Using Big-O Definition

• In last example, T(N) = 10N - 8

• So, let’s guess $T(N) = 10N - 8 = O(N^3)$

• To show that, must show $10N - 8 \le c N^3$ for all big enough N.

• Let c=1.

• Then $10N - 8 \le c N^3$ is true for all N > 10.

• In fact, true for all N > 4.

» i.e., in definition, let $n_0 = 4$

• So by definition, $10N - 8 = O(N^3)$.

But that’s not as good as we can do! Let’s try O(N).

(15)

Another Example Using Definition

• Let’s guess $T(N) = 10N - 8 = O(N)$

• To show that, must show $10N - 8 \le c N$ for all big enough N.

• Let c=10.

• Then $10N - 8 \le c N$ is true for all N > 0.

» i.e., in definition, let $n_0 = 1$

• So by definition, $10N - 8 = O(N)$.

No matter how hard you try, that’s the smallest exponent on N that will work. i.e., O(N) is the best we can do. And O(N) matches our intuition from the example code!
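The definition can be spot-checked numerically. The sketch below tests whether 10N - 8 ≤ c·N^k holds for every N between n0 and some limit. The names (`BigOCheck`, `boundHolds`) and the finite `limit` are assumptions for illustration; a finite check is evidence, not a proof.

```java
// Illustrative sketch: spot-check the Big-O definition for T(N) = 10N - 8.
class BigOCheck {
    // T(N) from the example.
    static long t(long n) {
        return 10 * n - 8;
    }

    // Returns true if T(N) <= c * N^k for every N with n0 < N <= limit.
    static boolean boundHolds(long c, int k, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            long power = 1;
            for (int e = 0; e < k; e++) {
                power *= n;         // builds N^k by repeated multiplication
            }
            if (t(n) > c * power) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(boundHolds(1, 3, 4, 1000));     // c = 1,    f(N) = N^3, n0 = 4
        System.out.println(boundHolds(10, 1, 1, 100000));  // c = 10,   f(N) = N,   n0 = 1
        System.out.println(boundHolds(1000, 0, 1, 1000));  // c = 1000, f(N) = 1
    }
}
```

The first two checks pass. The third fails once 10N - 8 exceeds 1000, which illustrates why no exponent smaller than 1 will work.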

(16)

Example: O Constants

• If $T(N) = 23N^2 - 562$ then $T(N) = O(N^2)$

• Which means: We guarantee that T(N) grows at a rate no faster than $N^2$.

• We say $c N^2$ is an upper bound on T(N), for any $c \ge 23$.

(17)

Wait, you say…

• Ok, $T(N) = 23N^2 - 562 = O(N^2)$.

But $23N^2$ grows faster than $N^2$.

• What’s up with that? Shouldn’t it be $O(23N^2)$?

• NO! We are concerned with the rate of growth as N increases.

(18)

Wait, you say… (Part 2)

• Consider $23N^2$ and $N^2$.

• If N doubles in size, how much longer does it take to run?

$23(2N)^2 = 4 \cdot (23N^2)$ – and –

$(2N)^2 = 4 \cdot (N^2)$

• In both cases, takes 4 times as long. The rate of growth is just $N^2$.

• The constant didn’t matter!!!

(19)

Another Example

• Consider $T(N) = 5N^3$ versus $N^3$

• If we triple the number of inputs, how much longer does it take to run?

$5(3N)^3 = 27 \cdot (5N^3)$ – and –

$(3N)^3 = 27 \cdot (N^3)$

• In both cases, takes 27 times as long. The rate of growth is just $N^3$.

• The constant didn’t matter!!!

(20)

Yet Another Example

• If $T(N) = 7N^3 - N + 56$ then $T(N) = O(N^3)$

• Which means: We guarantee that T(N) grows at a rate no faster than $N^3$.

(21)

Wait a cotton, pickin’…

• $T(N) = 7N^3 - N + 56 = O(N^3)$

• You mean to say the N doesn’t matter?

Yup!

• We are concerned with the asymptotic behavior for big N.

(Remember those limits in calculus?)

(22)

Wait a cotton, pickin’…(Part 2)

• As N gets huge, $N^3$ dwarfs N.

• For N = 1000, $N^3 = 1000^3 = 1{,}000{,}000{,}000$, which is a lot bigger than N = 1000.

(1 part in a million!)

• For even bigger values, it’s quickly 1 part in a billion billion.

(Then 1 part in a billion billion billion… yada, yada, yada.)

• Only the biggest exponent matters.

(Called asymptotic analysis.)

(23)

Wait a cotton, pickin’…(Part 3)

• Consider $T(N) = 7N^3 - N + 56$ versus $N^3$

• If we take 1000 times the number of inputs, how much longer does it take to run?

$7(1000N)^3 - 1000N + 56 = 1{,}000{,}000{,}000 \cdot (7N^3) - 1000N + 56$ – and –

$(1000N)^3 = 1{,}000{,}000{,}000 \cdot (N^3)$

• The first term is MUCH bigger than the other terms.

• For any value of N, like 10, the smaller terms subtract an insignificant amount from the total.

• Smaller terms don’t matter
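A quick numerical sketch of the same point: doubling N should multiply an N^3 run time by 8, and for large N the lower-order terms barely change that. `GrowthRatio` and `doublingRatio` are hypothetical names used only here.

```java
// Illustrative sketch: how much longer T(N) = 7N^3 - N + 56 takes when N doubles.
class GrowthRatio {
    static double t(double n) {
        return 7 * n * n * n - n + 56;
    }

    // Ratio T(2N) / T(N); for a pure N^3 growth rate this would be exactly 8.
    static double doublingRatio(double n) {
        return t(2 * n) / t(n);
    }

    public static void main(String[] args) {
        for (double n : new double[] {1, 10, 100, 1000}) {
            System.out.println("N=" + n + " ratio=" + doublingRatio(n));
        }
    }
}
```

For small N the constant 56 still distorts the ratio, but it settles very close to 8 as N grows.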

(24)

Review: The Difference Between T(N) and O(N)

• T(N)

• is total run time.

• O(N)

• is the approximation to the run time where we ignore constants and lower order terms. We call it the “growth rate.” Also called “asymptotic approximation.”

• Example:

if $T(N) = 3N^2 + N + 1$ then $T(N) = O(N^2)$

if $T(N) = 3N\log(N) + 2$ then $T(N) = O(N\log(N))$

(25)

Review: The Difference Between T(N) and O(N)

• What do we mean by the equal sign?

• $T(N) = O(N^2)$

– Says the growth rate for T(N) is $N^2$.

• T(N) = O(N log(N))

– Says T(N) has a growth rate of N log(N).

(26)

Big-O Is Worst Case

• Remember, Big-O says nothing about specific inputs, or specific input sizes.

• Suppose I give Bubble Sort the list 1, 2, 3, 4, 5

» Stops right away. Fast!

• Suppose I give Bubble Sort the list 5, 4, 3, 2, 1

» Worst case scenario! Slow.

• So we need to calculate the maximum number of times the code goes through a loop.

• e.g., if we use a “while” loop, then the number of iterations should be calculated for the worst case.

» By the way, a “while” loop is just like a “for” loop when calculating run time and growth rates.
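To make the two Bubble Sort cases concrete, here is a sketch that counts comparisons. It assumes the common early-exit variant (stop as soon as a pass makes no swaps), which is what "stops right away" suggests; the class and method names are made up.

```java
// Illustrative sketch: early-exit Bubble Sort that counts comparisons.
class BubbleSortCount {
    // Sorts a in place and returns the number of element comparisons made.
    static int sortAndCount(int[] a) {
        int comparisons = 0;
        boolean swapped = true;
        for (int pass = 0; pass < a.length - 1 && swapped; pass++) {
            swapped = false;
            for (int j = 0; j < a.length - 1 - pass; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(sortAndCount(new int[] {1, 2, 3, 4, 5})); // already sorted
        System.out.println(sortAndCount(new int[] {5, 4, 3, 2, 1})); // reversed: worst case
    }
}
```

The sorted list needs a single pass of 4 comparisons, while the reversed list needs 4 + 3 + 2 + 1 = 10. The worst case is the one that sets the growth rate.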

(27)

Predicting How Long To Run: A Cool Application of Big-O

• Can predict how long it will take to run a program with a very large data set.

• Do a test with a small practice data set.

• Then use big-O to predict how long it will take with a real (large) data set.

• Cool!

• Suppose we are using a program that is $O(N^2)$.

• Our test shows that it takes 1 minute to run 100 inputs. How long will it take to run 1000 inputs?

• Set it up this way:

$$\frac{(100 \text{ inputs})^2}{1 \text{ min}} = \frac{(1000 \text{ inputs})^2}{x \text{ min}}$$

That’s the growth rate! $100^2$ inputs per minute.
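Solving the proportion gives x = 1000^2 / 100^2 = 100 minutes. The same calculation as a small sketch, where `RunTimePredictor` and its parameter names are assumptions for illustration and the run time is assumed to be exactly proportional to N^exponent:

```java
// Illustrative sketch: predict run time from one measurement and a growth rate.
class RunTimePredictor {
    // Predicts the time for newSize inputs, given that testSize inputs took testTime,
    // assuming run time is proportional to N^exponent.
    static double predictTime(double testSize, double testTime, double newSize, double exponent) {
        return testTime * Math.pow(newSize / testSize, exponent);
    }

    public static void main(String[] args) {
        // 100 inputs took 1 minute; the program is O(N^2); how long for 1000 inputs?
        System.out.println(predictTime(100, 1, 1000, 2) + " minutes");
    }
}
```

This is only an estimate: Big-O ignores lower-order terms and constants, so a real measurement at 1000 inputs may differ somewhat.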

(28)

Predicting How Much Data Will Run: A Cool Application of Big-O

• Can predict how much data can be processed in a fixed amount of time.

• Do a test with a small data set.

• Then predict with big-O.

• Suppose we are using a program that is $O(N^2)$.

• Our test shows that it takes 1 minute to run 100 inputs. How many inputs will run in 60 minutes?

• Set it up this way:

$$\frac{(100 \text{ inputs})^2}{1 \text{ min}} = \frac{(N \text{ inputs})^2}{60 \text{ min}}$$
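Solving for N gives N = 100 · √60, or about 775 inputs. A sketch of that calculation follows; `InputSizePredictor` and its parameter names are hypothetical, and the run time is again assumed to be exactly proportional to N^exponent.

```java
// Illustrative sketch: predict how many inputs fit in a time budget.
class InputSizePredictor {
    // Given that testSize inputs took testTime, returns the input size that
    // should take timeBudget, assuming run time is proportional to N^exponent.
    static double predictSize(double testSize, double testTime, double timeBudget, double exponent) {
        return testSize * Math.pow(timeBudget / testTime, 1.0 / exponent);
    }

    public static void main(String[] args) {
        // 100 inputs took 1 minute; the program is O(N^2); how many inputs in 60 minutes?
        System.out.println(predictSize(100, 1, 60, 2) + " inputs");
    }
}
```

Note that 60 times the time buys fewer than 8 times the data when the growth rate is N^2.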

(29)

General O Rules: Rule 1

• Rule 1

if $T_1(N) = O(f(N))$ and $T_2(N) = O(g(N))$ then

(a) $T_1(N) + T_2(N) = \max(O(f(N)), O(g(N)))$.

(b) $T_1(N) \cdot T_2(N) = O(f(N) \cdot g(N))$.

These are big-O rules, not run-time rules.

They apply equally well to any other math class!

(30)

Example

for(int i=1; i<= nNum; i++) {

nPartialSum += i * i * i;

}

for(int i=1; i<= nNum; i++) {

for(int j=1; j<= nNum; j++) {

nPartialSum += i * j;

} }

$T_1(N) = O(N)$ (the first loop)

$T_2(N) = O(N^2)$ (the nested loops)

So the total run time is

$T_1(N) + T_2(N) = \max(O(N), O(N^2)) = O(N^2)$

(31)

General O Rules: Rule 2

• Rule 2

$(\log N)^k = O(N)$ for any constant k.

What…?

Remember: $T(N) = O(f(N))$ when $T(N) \le c\,f(N)$.

So, we’re just saying that $(\log N)^k$ grows more slowly than N.

In other words:

logarithms grow VERY slowly.

(32)

Comparison of Growth Rates

[Figure: run time T(N) plotted against N (number of inputs) for log N, (log N)^2, N, N log N, N^2, N^3, and 2^N. The slow-growing end of the chart is labeled “good” and the fast-growing end, near 2^N, is labeled “bad”.]

(33)

Rules For Programs

• Previous rules were general math rules.

• The following rules apply to computer programs.

• We will still need to use the math rules!

(34)

Rules For Calculating Growth Rate: Rule 0

• Rule 0: declarations, method calls, returns…

Zero cost.

• Example

int myVariable;

return 0;

No cost. O(1). Constant growth rate. In other words, if we double the # of inputs, it takes the same amount of time to run.

We describe growth rates in terms of N. So, $T(N) = 0 = 0 \cdot N^0 = O(N^0) = O(1)$.

(35)

Rules For Calculating Growth Rate: Rule 1

• Rule 1: “for loops”

• The growth rate of a “for loop” is at most the growth rate of the statements inside the “for loop” times the number of iterations.

• Example: O(N)

for (i = 0; i < n; i++) {
    i++;    // one addition and one assignment operation; happens N/2 times!
}

(i++ is happening two places – sneaky)
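A small sketch confirms the N/2 claim by counting how often the body runs. `SneakyLoop` and `iterations` are made-up names for this illustration.

```java
// Illustrative sketch: count iterations of the loop that increments i twice.
class SneakyLoop {
    static int iterations(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            i++;        // second increment, inside the body
            count++;    // counts one pass through the body
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(iterations(10));
        System.out.println(iterations(1000));
    }
}
```

The body runs about N/2 times (rounded up for odd N). Since constants don't matter, the loop is still O(N).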

(36)

Rules For Calculating Growth Rate: Rule 2

• Rule 2: Nested Loops

• Analyze inside out. Growth rate of a statement inside nested loops is the growth rate of the statement multiplied by the product of the sizes of the loops.

• Example:

for (int i = 1; i <= num1; i++) {        // runs num1 times
    for (int j = 1; j <= num2; j++) {    // runs num2 times
        a[i] = i * a[j];                 // costs 4
    }
}

So total runtime = 4 * num2 * num1 = $O(N^2)$ (taking num1 and num2 to both be about N)

(37)

Rules For Calculating Growth Rate: Rule 3

• Rule 3: Consecutive statements

These just add (which means the maximum one counts).

• Example:

for (int i = 1; i < 3; i++) {    // runs 2 times
    a[i] = i;
}

for (int i = 1; i < n; i++) {    // runs n-1 times
    a[i] = i;
}

T(n) = 2 + (n-1) = O(n)

(just a big-O math rule!)

(38)

Rules For Calculating Growth Rate: Rule 4

• Rule 4: if/else

Growth rate is never more than the longest of the

“if” or the “else” statements.

• Example

if (happyDog) {
    print("bow-wow");                              // costs 1
}
else if (happyCat) {
    for (int i = 1; i < 35; i++) print("meow");    // costs 34
}

$T(N) = 34 = 34 \cdot N^0 = O(N^0) = O(1)$

Assume print runs in 1 time unit.

(39)

Example With Recursion

public long factorial(long n) {
    if (n <= 1)
        return 1;                       // costs 0
    else
        return n * factorial(n - 1);    // 1 for multiplication, plus 1 for subtraction, plus the cost of the evaluation of factorial(n-1)
}

T(n) = 2 + Cost(factorial(n-1)) - or -

T(n) = 2 + T(n-1)

(roughly… we are ignoring the n<=1 in the run time)

(40)

Example With Recursion 2

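$$\begin{aligned}
T(n) &= 2 + \mathrm{Cost}(\mathrm{factorial}(n-1)) \\
&= 2 + (2 + \mathrm{Cost}(\mathrm{factorial}(n-2))) \\
&= 2 + (2 + (2 + \mathrm{Cost}(\mathrm{factorial}(n-3)))) \\
&= 2 + 2 + 2 + \cdots + 2 + 0 \\
&= 2(n-1) \\
&= O(n)
\end{aligned}$$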

Last case of factorial(1) doesn’t cost anything.

Just returns.

(41)

Example With Recursion 3

$$\begin{aligned}
T(n) &= 2 + T(n-1) \\
&= 2 + (2 + T(n-2)) \\
&= 2 + (2 + (2 + T(n-3))) \\
&= 2 + 2 + 2 + \cdots + 2 + 0 \\
&= 2(n-1) \\
&= O(n)
\end{aligned}$$

Same thing, but different notation.
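The closed form 2(n-1) can be checked by letting the recursion count its own cost. In this sketch the `units` counter and the `cost` helper are assumptions added for illustration; the base case is charged nothing, following the counting above.

```java
// Illustrative sketch: the recursive factorial, instrumented to count time units.
class FactorialCost {
    static long units = 0;

    static long factorial(long n) {
        if (n <= 1) {
            return 1;               // base case: no cost under the counting rules above
        }
        units += 2;                 // one multiplication plus one subtraction
        return n * factorial(n - 1);
    }

    // Resets the counter, runs factorial(n), and returns the units counted.
    static long cost(long n) {
        units = 0;
        factorial(n);
        return units;
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 5, 10, 20}) {
            System.out.println("n=" + n + " cost=" + cost(n) + " formula=" + (2 * (n - 1)));
        }
    }
}
```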

(42)

Example With Recursion 4

(in fact, was just a for loop in disguise)

Recursive version: O(N)

public long factorial(long n) {
    if (n <= 1)
        return 1;
    else
        return n * factorial(n - 1);
}

Loop version: T(N) = 1 + N + 0 = O(N)

public long factorial(long n) {
    long factorial = 1;                 // costs 1
    for (int i = 1; i <= n; i++) {
        factorial = i * factorial;      // costs N
    }
    return factorial;                   // costs 0
}

(43)

Another Recursion Example

(Fibonacci numbers)

public long fib(int n) {
    if (n <= 1)
        return 1;                          // costs 0
    else
        return fib(n - 1) + fib(n - 2);    // 3 for “–”, “+” and “–”, and also the cost of fib(n-1) and the cost of fib(n-2)
}

T(n) = 3 + Cost(fib(n-1)) + Cost(fib(n-2)) - or -

T(n) = 3 + T(n-1) + T(n-2)

(Job interviewer once asked me if this was a good way to program Fibonacci #’s!)

(44)

Another Recursion Example 2

T(n) = 3 + T(n-1) + T(n-2)

Each term expands into two more:

T(n-1) = 3 + T(n-2) + T(n-3) and T(n-2) = 3 + T(n-3) + T(n-4)

and then again: T(n-2) = 3 + T(n-3) + T(n-4), etc. etc. etc.

The cost keeps doubling in size!!!

Exponential: $O(2^n)$

bad, bad, bad, bad, bad, bad…
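To see the blow-up, the sketch below counts how many times fib is called. The `calls` counter and the `countCalls` helper are hypothetical additions for illustration; the recursion itself is the one from the example.

```java
// Illustrative sketch: count the calls made by the recursive Fibonacci.
class FibCalls {
    static long calls = 0;

    static long fib(int n) {
        calls++;                    // every invocation is counted
        if (n <= 1) {
            return 1;
        }
        return fib(n - 1) + fib(n - 2);
    }

    // Resets the counter, runs fib(n), and returns the number of calls made.
    static long countCalls(int n) {
        calls = 0;
        fib(n);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5) {
            System.out.println("n=" + n + " calls=" + countCalls(n));
        }
    }
}
```

Going from n = 10 to n = 20 multiplies the call count by more than 100. Strictly, the count grows by a factor of roughly 1.6 per step rather than exactly 2, which still sits within the $O(2^n)$ bound.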

(45)

Tree Doubling

• Called a binary tree.

• Keeps doubling.

• Exponential ($2^n$) growth.

• Bad growth rate!

• But we’ll do some problems that traverse the tree in the other direction.

• Keeps halving.

• Opposite of exponential.

– And what’s the inverse (“opposite”) of exponential? Logs!

• Logarithmic log(n) growth.

• Great growth rate!

• Stay tuned for logarithmic growth…

(46)

Recursion NOT Always a “For” Loop in Disguise

• The Fibonacci recursion is $O(2^N)$. (bad!)

• Most for loops are O(N). (good!) But not all…

Stay tuned…
