Data Structures
Algorithm Performance and Big O Analysis
What’s an Algorithm?
• … a clearly specified set of instructions to be followed to solve a problem.
• In essence: A computer program.
• In detail: Defined mathematically in a
course on Theory of Computation (Turing).
What’s Big “O” do?
• Measures the growth rate of an algorithm as the size of its input grows.
• Huh? “O” is a mathematical notation that helps estimate how much longer it takes to run n+1 inputs versus n inputs (or n+2, 2n, 3n…).
• Doesn’t care what language you use! Only cares
about the underlying algorithm.
What Doesn’t “O” do?
• Doesn’t tell you that algorithm A is faster than algorithm B for a particular input.
– Why not? Only tells you if one grows faster than another in a general sense for all inputs.
– Usually concerned with very large data inputs.
Called asymptotic algorithm analysis.
Example: Doesn’t Care About Particular Input
public void algorithm1(Object xInput) {
…16 million lines of code…
}
public void algorithm2(Object xInput) {
if (xInput.size() == 2074) {
return;
} else {
…16 million different lines…
} }
Which one is faster? Always?
We only care about the “average” behavior of the 16 million lines.
How to Calculate Run Time?
1. Each basic operation in the code counts for 1 time unit.
• A basic operation executes in the same time no matter what values it is supplied.
• Examples:
• Adding two integers is a basic operation.
• Reading a[1] is a basic operation (independent of array size).
• Summing the values in an array is NOT a basic operation (why?).
2. Ignore actual time units (seconds, days, etc.).
• Could be 1 ns for a fast computer or 1 day for a really slow computer. But for large inputs, won’t matter.
3. Ignore time for method calls, returns, and declarations.
• Doesn’t matter in the long run.
Run Time Example 1
Calculating $\sum_{i=1}^{N} i^3$
public int sum(int num) {
int partialSum = 0;
for(int i=1; i<= num; i++) {
partialSum += i * i * i;
}
return partialSum;
}
How long to run this code?
Run Time Example 1 (cont.)
Calculating $\sum_{i=1}^{N} i^3$
How long to run this code?
public int sum(int num) {            // no cost
    int partialSum = 0;              // costs 1 (to init/store in memory)
    for (int i = 1;                  // costs 1 (to init/store in memory)
         i <= num;                   // costs N+1 (once for each test of <=, and +1 because of last time through, when it fails)
         i++) {                      // costs 2N (once for each + and =; recall i++ is just i = i + 1)
        partialSum += i * i * i;     // total cost of 4N (costs 4 per execution: 1 addition, 2 multiplications, 1 assignment)
    }
    return partialSum;               // no cost
}
Final tally: 1 + 1 + (N+1) + 2N + 4N = 7N + 3
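The tally for this loop can be checked mechanically. The sketch below is only an illustration: the class name `SumOpCount` and the counter `ops` are made up here, and the charges follow the cost model of these slides (1 unit per basic operation, nothing for the method call or return).

```java
// Sketch: charge one time unit per basic operation, as in the slide's cost model.
final class SumOpCount {
    // Returns the number of basic operations charged for sum(num).
    static long countOps(int num) {
        long ops = 0;
        long partialSum = 0;
        ops += 1;                      // partialSum = 0
        ops += 1;                      // i = 1
        for (int i = 1; ; i++) {
            ops += 1;                  // one test of i <= num (the failing test is charged too)
            if (i > num) {
                break;
            }
            partialSum += (long) i * i * i;
            ops += 4;                  // 1 addition, 2 multiplications, 1 assignment
            ops += 2;                  // i++ is one addition and one assignment
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.println("N=" + n + " counted=" + countOps(n) + " 7N+3=" + (7L * n + 3));
        }
    }
}
```

For every N tried, the counted total and 7N+3 should agree.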
Run Time Example 2
public int sum(int num) {              // no cost
    int partialSum = 0;                // costs 1
    for (int i = 1;                    // costs 1
         i <= num;                     // costs N+1
         i++) {                        // costs 2N
        for (int j = 1;                // costs N*1
             j <= num;                 // costs N*(N+1)
             j++) {                    // costs N*2N
            partialSum += i * j;       // costs N*N*3
        }
    }
    return partialSum;                 // no cost
}
Final tally: 1 + 1 + (N+1) + 2N + N*1 + N*(N+1) + N*2N + N*N*3 = 6N^2 + 5N + 3
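The nested-loop tally can be checked the same way. This is a sketch under the same cost model; `NestedOpCount` and `ops` are hypothetical names used only for this illustration.

```java
// Sketch: count basic operations for the doubly nested sum.
final class NestedOpCount {
    static long countOps(int num) {
        long ops = 0;
        long partialSum = 0;
        ops += 1;                          // partialSum = 0
        ops += 1;                          // i = 1
        for (int i = 1; ; i++) {
            ops += 1;                      // test i <= num
            if (i > num) {
                break;
            }
            ops += 1;                      // j = 1 (runs once per outer iteration)
            for (int j = 1; ; j++) {
                ops += 1;                  // test j <= num
                if (j > num) {
                    break;
                }
                partialSum += (long) i * j;
                ops += 3;                  // 1 multiplication, 1 addition, 1 assignment
                ops += 2;                  // j++
            }
            ops += 2;                      // i++
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            long formula = 6L * n * n + 5L * n + 3;
            System.out.println("N=" + n + " counted=" + countOps(n) + " formula=" + formula);
        }
    }
}
```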
But This is Overkill!
public int sum(int num) {
int partialSum = 0;
for(int i=1; i<= num; i++) {
for(int j=1; j<= num; j++) {
partialSum += i * j;
} }
return partialSum;
}
Really only one operation, and it happens N^2 times.
So we say order of N^2, or O(N^2).
Likewise, More Overkill
Calculating $\sum_{i=1}^{N} i^3$
public int sum(int num) {
int partialSum = 0;
for(int i=1; i<= num; i++) {
partialSum += i * i * i;
}
return partialSum;
}
Really only one operation, and it happens N times
So we say order of N,
or O(N)
Another “Order Of” Example
public void cool(int n) {
for(int i=2; i<=n; i++) {
int j = (1 + i * i % 3 % i) / (i + 2);
} }
Also, run time, T(N) = 10N - 8.
Can you show me?
The heart of the code is this line.
And it happens about N times (N - 1, to be exact, because i starts at 2).
So order of N.
Or say O(N).
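One way to arrive at T(N) = 10N - 8 is to count 7 operations in the loop body (+, *, %, %, /, + and the assignment), 2 for each i++, 1 for each loop test, and 1 for the initialization. That breakdown is an assumption made for this sketch, as are the names `CoolOpCount` and `ops`; the slides do not spell it out.

```java
// Sketch: count basic operations for cool(n) under an assumed per-line breakdown.
final class CoolOpCount {
    static long countOps(int n) {
        long ops = 0;
        ops += 1;                                    // i = 2
        for (int i = 2; ; i++) {
            ops += 1;                                // test i <= n
            if (i > n) {
                break;
            }
            int j = (1 + i * i % 3 % i) / (i + 2);   // the "heart" of the code
            ops += 7;                                // +, *, %, %, /, + and =
            ops += 2;                                // i++
        }
        return ops;
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 10, 100}) {
            System.out.println("N=" + n + " counted=" + countOps(n) + " 10N-8=" + (10L * n - 8));
        }
    }
}
```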
Ah, Back to the Big-O (Definition)
• Call T(N) the run time.
• Definition: T(N) = O(f(N)) if there are positive constants c and n0 such that
T(N) ≤ c f(N) when N > n0.
• What’s it mean? For big enough N, the run time is never more than a constant multiple of f(N). (Only the highest order term matters!) (And constants don’t matter.)
Note: f(N) should be the smallest function for which such c and n0 exist.
Example Using Big-O Definition
• In last example, T(N) = 10N - 8
• So, let’s guess T(N) = 10N - 8 = O(N^3)
• To show that, must show 10N - 8 ≤ c N^3 for all big enough N.
• Let c=1.
• Then 10N - 8 ≤ N^3 is true for all N > 10.
• In fact, true for all N > 4.
» i.e., in definition, let n0 = 4
• So by definition, 10N - 8 = O(N^3).
But that’s not as good as we can do! Let’s try O(N).
Another Example Using Definition
• Let’s guess T(N) = 10N-8 = O(N)
• To show that, must show 10N-8 ≤ c N for all big enough N.
• Let c=10.
• Then 10N-8 ≤ 10N is true for all N > 0.
» i.e., in definition, let n0 = 1
• So by definition, 10N-8 = O(N).
No matter how hard you try, that’s the smallest
exponent on N that will work. i.e., O(N) is the best
we can do. And O(N) matches our intuition from
the example code!
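The definition can be spot-checked numerically. The sketch below scans a finite range of N, so it is a sanity check and not a proof; `BigOCheck`, `holds`, and the upper limit of 10,000 are assumptions made for illustration.

```java
import java.util.function.LongUnaryOperator;

// Sketch: test T(N) <= c * f(N) for all N with n0 < N <= limit.
final class BigOCheck {
    static boolean holds(LongUnaryOperator t, LongUnaryOperator f, long c, long n0, long limit) {
        for (long n = n0 + 1; n <= limit; n++) {
            if (t.applyAsLong(n) > c * f.applyAsLong(n)) {
                return false;   // found an N that violates the bound
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LongUnaryOperator t = n -> 10 * n - 8;
        // The O(N^3) guess with c = 1, n0 = 4.
        System.out.println(holds(t, n -> n * n * n, 1, 4, 10_000));
        // The O(N) guess with c = 10, n0 = 1.
        System.out.println(holds(t, n -> n, 10, 1, 10_000));
        // An O(1) guess fails even with a generous constant c = 1000.
        System.out.println(holds(t, n -> 1, 1_000, 1, 10_000));
    }
}
```

The first two checks pass and the third fails, which matches the claim that N is the smallest exponent that works.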
Example: O Constants
• If T(N) = 23N^2 – 562 then T(N) = O(N^2)
• Which means: We guarantee that T(N) grows at a rate no faster than N^2.
• We say c N^2 is an upper bound on T(N).
• for any c ≥ 23
Wait, you say…
• Ok, T(N) = 23N^2 – 562 = O(N^2).
But 23N^2 grows faster than N^2.
• What’s up with that? Shouldn’t it be O(23N^2)?
• NO! We are concerned with the rate of
growth as N increases.
Wait, you say… (Part 2)
• Consider 23N^2 and N^2.
• If N doubles in size, how much longer does it take to run?
23 (2N)^2 = 4 * (23 N^2) – and –
(2N)^2 = 4 * (N^2)
• In both cases, takes 4 times as long. The rate of growth is just N^2.
• The constant didn’t matter!!!
Another Example
• Consider T(N) = 5 N^3 versus N^3
• If we triple the number of inputs, how much longer does it take to run?
5 (3N)^3 = 27 * (5 N^3) – and –
(3N)^3 = 27 * (N^3)
• In both cases, takes 27 times as long. The rate of growth is just N^3.
• The constant didn’t matter!!!
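The "constant doesn't matter" argument can be tried with a few lines of code. In this sketch, `GrowthRatio` and `ratio` are hypothetical names; `ratio` reports how many times longer a run takes when the input size is multiplied by `factor`.

```java
import java.util.function.DoubleUnaryOperator;

// Sketch: compare T(factor * N) with T(N) for run-time functions with and without a constant.
final class GrowthRatio {
    static double ratio(DoubleUnaryOperator t, double n, double factor) {
        return t.applyAsDouble(factor * n) / t.applyAsDouble(n);
    }

    public static void main(String[] args) {
        // Doubling N for 23N^2 and for N^2.
        System.out.println(ratio(n -> 23 * n * n, 50, 2));
        System.out.println(ratio(n -> n * n, 50, 2));
        // Tripling N for 5N^3 and for N^3.
        System.out.println(ratio(n -> 5 * n * n * n, 10, 3));
        System.out.println(ratio(n -> n * n * n, 10, 3));
    }
}
```

Both members of each pair give the same ratio, because the leading constant cancels in the division.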
Yet Another Example
• If T(N) = 7N^3 – N + 56 then T(N) = O(N^3)
• Which means: We guarantee that T(N)
grows at a rate no faster than N^3.
Wait a cotton, pickin’…
• T(N) = 7N^3 – N + 56 = O(N^3)
• You mean to say the N doesn’t matter?
Yup!
• We are concerned with the asymptotic
behavior for big N.
(Remember those limits in calculus?)
Wait a cotton, pickin’…(Part 2)
• As N gets huge, N^3 dwarfs N.
• N^3 = 1000^3 = 1,000,000,000, which is a lot bigger than N = 1000. (1 part in a million!)
• For even bigger values, it’s quickly 1 part in a billion billion. (Then 1 part in a billion billion billion… yada, yada, yada.)
• Only the biggest exponent matters. (Called asymptotic analysis.)
Wait a cotton, pickin’…(Part 3)
• Consider T(N) = 7N^3 – N + 56 versus N^3
• If we take 1000 times the number of inputs, how much longer does it take to run?
7 (1000N)^3 – 1000 N + 56 = 1,000,000,000 * (7 N^3) – 1000 N + 56 – and –
(1000N)^3 = 1,000,000,000 * (N^3)
• The first term is MUCH bigger than the other terms.
• For any value of N, like 10, the smaller terms subtract an insignificant amount from the total.
• Smaller terms don’t matter
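To see how quickly the smaller terms fade, the sketch below computes what share of T(N) = 7N^3 – N + 56 comes from the lower order terms. The names `LowerOrderTerms` and `lowerOrderShare` are assumptions for this illustration.

```java
// Sketch: measure how much the lower order terms contribute to T(N) = 7N^3 - N + 56.
final class LowerOrderTerms {
    static double t(double n) {
        return 7 * n * n * n - n + 56;
    }

    // Fraction of T(N) that is NOT explained by the leading term 7N^3.
    static double lowerOrderShare(double n) {
        return Math.abs(t(n) - 7 * n * n * n) / t(n);
    }

    public static void main(String[] args) {
        for (double n : new double[] {10, 100, 1_000, 10_000}) {
            System.out.println("N=" + n + " share of lower order terms=" + lowerOrderShare(n));
        }
    }
}
```

The share shrinks rapidly as N grows, which is why only the N^3 term is kept.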
Review: The Difference Between T(N) and O(N)
• T(N)
• is total run time.
• O(N)
• is the approximation to the run time where we ignore constants and lower order terms. We call it the “growth rate.” Also called “asymptotic approximation.”
• Example:
if T(N) = 3N^2 + N + 1 then T(N) = O(N^2)
if T(N) = 3N log(N) + 2 then T(N) = O(N log(N))
Review: The Difference Between T(N) and O(N)
• What do we mean by the equal sign?
• T(N) = O(N^2)
– Says the growth rate for T(N) is N^2.
• T(N) = O(N log(N))
– Says T(N) has a growth rate of N log(N).
Big-O Is Worst Case
• Remember, Big-O says nothing about specific inputs, or specific input sizes.
• Suppose I give Bubble Sort the list 1, 2, 3, 4, 5
» Stops right away. Fast!
• Suppose I give Bubble Sort the list 5, 4, 3, 2, 1
» Worst case scenario! Slow.
• So need to calculate the max number of times code goes through a loop.
• e.g., if use a “while” loop, then the number of times code iterates should be calculated for the worst case.
» By the way, a “while” loop is just like a “for” loop when calculating run time and growth rates.
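The Bubble Sort contrast can be made concrete by counting comparisons. This sketch assumes the common variant that stops as soon as a full pass makes no swaps (which is what "stops right away" suggests); `BubbleSortCount` and `countComparisons` are hypothetical names.

```java
// Sketch: count comparisons made by an early-exit Bubble Sort on best and worst case inputs.
final class BubbleSortCount {
    static int countComparisons(int[] data) {
        int[] a = data.clone();          // leave the caller's array untouched
        int comparisons = 0;
        boolean swapped = true;
        for (int end = a.length - 1; end > 0 && swapped; end--) {
            swapped = false;
            for (int i = 0; i < end; i++) {
                comparisons++;
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        System.out.println(countComparisons(new int[] {1, 2, 3, 4, 5}));   // already sorted
        System.out.println(countComparisons(new int[] {5, 4, 3, 2, 1}));   // reverse order
    }
}
```

The sorted list needs a single pass of N - 1 comparisons, while the reversed list needs N(N - 1)/2, and it is the second count that the worst case growth rate describes.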
Predicting How Long To Run: A Cool Application of Big-O
• Can predict how long it will take to run a program with a very large data set.
• Do a test with a small practice data set.
• Then use big-O to predict how long it will take with a real (large) data set.
• Cool!
• Suppose we are using a program that is O(N^2).
• Our test shows that it takes 1 minute to run 100 inputs. How long will it take to run 1000 inputs?
• Set it up this way:
(100^2 inputs) / (1 min) = (1000^2 inputs) / (x min)
That’s the growth rate! 100^2 inputs per minute.
• Solving for x: x = 1000^2 / 100^2 = 100 minutes.
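The proportion for predicting run time generalizes to any polynomial growth rate. The sketch below assumes the run time behaves like c * N^k over the range of interest, which is only approximately true for real programs; `RunTimePrediction` and `predictMinutes` are made-up names.

```java
// Sketch: scale a measured run time to a larger input, assuming T(N) is roughly c * N^exponent.
final class RunTimePrediction {
    static double predictMinutes(double testMinutes, double testInputs,
                                 double newInputs, double exponent) {
        return testMinutes * Math.pow(newInputs / testInputs, exponent);
    }

    public static void main(String[] args) {
        // 1 minute for 100 inputs, O(N^2) program, 1000 inputs.
        System.out.println(predictMinutes(1, 100, 1000, 2) + " minutes");
    }
}
```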
Predicting How Much Data Will Run: A Cool Application of Big-O
• Can predict how much data can be processed in a fixed amount of time.
• Do a test with a small data set.
• Then predict with big-O.
• Suppose we are using a program that is O(N^2).
• Our test shows that it takes 1 minute to run 100 inputs. How many inputs will run in 60 minutes?
• Set it up this way:
(100^2 inputs) / (1 min) = (N^2 inputs) / (60 min)
• Solving for N: N^2 = 60 * 100^2, so N = 100 * sqrt(60) ≈ 774.6, or about 774 inputs.
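Solving the same proportion for the input size gives a quick capacity estimate. As before, this sketch assumes the run time behaves like c * N^k, and `InputSizePrediction` and `predictInputs` are hypothetical names.

```java
// Sketch: estimate how many inputs fit in a time budget, assuming T(N) is roughly c * N^exponent.
final class InputSizePrediction {
    static double predictInputs(double testMinutes, double testInputs,
                                double budgetMinutes, double exponent) {
        return testInputs * Math.pow(budgetMinutes / testMinutes, 1.0 / exponent);
    }

    public static void main(String[] args) {
        // 1 minute for 100 inputs, O(N^2) program, 60 minute budget.
        double n = predictInputs(1, 100, 60, 2);
        System.out.println("about " + (long) Math.floor(n) + " inputs");
    }
}
```

Sixty times the time buys fewer than eight times the inputs, because the program is quadratic.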
General O Rules: Rule 1
• Rule 1
if T1(N) = O(f(N)) and T2(N) = O(g(N)) then
(a) T1(N) + T2(N) = max( O(f(N)), O(g(N)) ).
(b) T1(N) * T2(N) = O( f(N)*g(N) ).
These are big-O rules, not run-time rules.
They apply equally well to any other math class!
Example
for(int i=1; i<= nNum; i++) {
nPartialSum += i * i * i;
}
for(int i=1; i<= nNum; i++) {
for(int j=1; j<= nNum; j++) {
nPartialSum += i * j;
} }
First loop: T1(N) = O(N)
Nested loops: T2(N) = O(N^2)
So the total run time is
T1(N) + T2(N) = max(O(N), O(N^2)) = O(N^2)
General O Rules: Rule 2
• Rule 2
(log N)^k = O(N) for any constant k.
What…?
Remember: T(N) = O(f(N)) when T(N) ≤ c f(N).
So, we’re just saying that (log N)^k grows more slowly than N.
In other words:
logarithms grow VERY slowly.
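A quick numeric look at Rule 2: the sketch below prints (log N)^k / N for growing N. The choice of base 2 logarithms and k = 3, and the names `LogVersusLinear` and `logPower`, are assumptions for illustration. The ratio heading toward zero is consistent with (log N)^k = O(N), though a table of values is not a proof.

```java
// Sketch: compare (log2 N)^k with N for increasing N.
final class LogVersusLinear {
    static double logPower(double n, int k) {
        double log2 = Math.log(n) / Math.log(2);
        return Math.pow(log2, k);
    }

    public static void main(String[] args) {
        int k = 3;
        for (int exp = 10; exp <= 40; exp += 10) {
            double n = Math.pow(2, exp);
            System.out.println("N=2^" + exp + "  (log N)^" + k + " / N = " + logPower(n, k) / n);
        }
    }
}
```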
Comparison of Growth Rates
[Figure: run time T(N) versus N (number of inputs), with curves for LogN, (LogN)^2, N, NLogN, N^2, N^3 and 2^N. The chart labels the slow-growing end “good” and 2^N “bad”.]
Rules For Programs
• Previous rules were general math rules.
• The following rules apply to computer programs.
• We will still need to use the math rules!
Rules For Calculating Growth Rate: Rule 0
• Rule 0: declarations, method calls, returns…
Zero cost.
• Example
int myVariable;
return 0;
No cost. O(1). Constant growth rate. In other words, if we double the # of inputs, takes the same amount of time to run. We describe growth rates in terms of N.
So, T(N) = 0 = 0 * N^0 = O(N^0) = O(1).
Rules For Calculating Growth Rate: Rule 1
• Rule 1: “for loops”
• The growth rate of a “for loop” is at most the
growth rate of the statements inside the “for loop”
times the number of iterations.
• Example: O(N)
for(i=0; i<n; i++) {
i++;
}
The body is one addition and one assignment operation, and it happens N/2 times!
(i++ is happening two places – sneaky)
Constants don’t matter, so N/2 is still O(N).
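Counting the iterations confirms the N/2 claim. In this sketch, `HalfLoop`, `iterations`, and the counter `count` are names introduced only for the illustration.

```java
// Sketch: count how many times the loop body runs when i is incremented twice per pass.
final class HalfLoop {
    static int iterations(int n) {
        int count = 0;
        for (int i = 0; i < n; i++) {
            i++;        // the "sneaky" second increment inside the body
            count++;    // one execution of the body
        }
        return count;
    }

    public static void main(String[] args) {
        for (int n : new int[] {7, 10, 1000}) {
            System.out.println("n=" + n + " iterations=" + iterations(n));
        }
    }
}
```

For odd n the count rounds up, but either way it grows in proportion to n.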
Rules For Calculating Growth Rate: Rule 2
• Rule 2: Nested Loops
• Analyze inside out. Growth rate of a statement inside nested loops is the growth rate of the statement multiplied by the product of the sizes of the loops.
• Example:
for(int i=1; i<= num1; i++) {
for(int j=1; j<= num2; j++) {
a[i] = i * a[j];
} }
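The inner statement costs 4; the inner loop runs num2 times and the outer loop runs num1 times.
So total runtime = 4 * num2 * num1 = O(N^2), taking num1 and num2 to both grow like N.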
Rules For Calculating Growth Rate: Rule 3
• Rule 3: Consecutive statements
These just add (which means the maximum one counts).
• Example:
for(int i=1; i<3; i++) {
a[i] = i;
}
for(int i=1; i<n; i++) {
a[i] = i;
}
The first loop runs 2 times and the second loop runs n-1 times.
T(n) = 2+(n-1) = O(n)
(just a big-O math rule!)
Rules For Calculating Growth Rate: Rule 4
• Rule 4: if/else
Growth rate is never more than the longest of the
“if” or the “else” statements.
• Example
if(happyDog) {
print("bow-wow");
}
else if(happyCat) {
for(int i=1; i<35; i++) print("meow");
}
The “if” branch costs 1; the “else if” branch costs 34.
T(N) = 34 = 34 N^0 = O(N^0) = O(1)
Assume print runs in 1 time unit.
Example With Recursion
public long factorial(long n) {
if(n<=1)
return 1;
else
return n*factorial(n-1);
}
• “return 1” costs 0.
• “return n*factorial(n-1)” costs 1 for multiplication, plus 1 for subtraction, plus the cost of the evaluation of factorial(n-1).
T(n) = 2 + Cost(factorial(n-1)) - or -
T(n) = 2 + T(n-1)
(roughly… we are ignoring the n<=1 in the run time)
Example With Recursion 2
T(n) = 2 + Cost(factorial(n-1))
= 2 + (2 + Cost(factorial(n-2)))
= 2 + (2 + (2 + Cost(factorial(n-3))))
= …
= 2 + 2 + 2 + … + 2 + 0
= 2 (n-1)
= O(n)
Last case of factorial(1) doesn’t cost anything.
Just returns.
Example With Recursion 3
T(n) = 2 + T(n-1)
= 2 + (2 + T(n-2))
= 2 + (2 + (2 + T(n-3)))
= …
= 2 + 2 + 2 + … + 2 + 0
= 2 (n-1)
= O(n)
Same thing, but different notation.
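The recurrence T(n) = 2 + T(n-1), with the last case costing nothing, can be evaluated directly and compared with the closed form 2(n-1). `FactorialCost` and `cost` are hypothetical names for this sketch.

```java
// Sketch: evaluate the cost recurrence for the recursive factorial.
final class FactorialCost {
    // T(n) = 2 + T(n-1) for n > 1, and T(n) = 0 for n <= 1.
    static long cost(long n) {
        if (n <= 1) {
            return 0;               // the last call just returns
        }
        return 2 + cost(n - 1);     // 1 multiplication + 1 subtraction + the recursive call
    }

    public static void main(String[] args) {
        for (long n : new long[] {1, 5, 10, 100}) {
            System.out.println("n=" + n + " T(n)=" + cost(n) + " 2(n-1)=" + 2 * (n - 1));
        }
    }
}
```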
Example With Recursion 4
(in fact, was just a for loop in disguise)
public long factorial(long n) {
if(n<=1)
return 1;
else
return n*factorial(n-1);
}
public long factorial(long n) {
long factorial = 1;
for(int i=1; i<=n; i++) {
factorial = i*factorial;
}
return factorial;
}
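Recursive version: O(N)
Loop version: the initialization costs 1, the loop costs N, and the return costs 0, so T(N) = 1+N+0 = O(N)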
Another Recursion Example
(Fibonacci numbers)
public long fib(int n) {
if(n<=1)
return 1;
else
return fib(n-1) + fib(n-2);
}
• “return 1” costs 0.
• “return fib(n-1) + fib(n-2)” costs 3 for “–”, “+” and “–”, and also the cost of fib(n-1) and the cost of fib(n-2).
T(n) = 3 + Cost(fib(n-1)) + Cost(fib(n-2)) - or -
T(n) = 3 + T(n-1) + T(n-2)
(Job interviewer once asked me if this was a good way to program Fibonacci #’s!)
Another Recursion Example 2
T(n) = 3 + T(n-1) + T(n-2)
where T(n-1) = 3 + T(n-2) + T(n-3) and T(n-2) = 3 + T(n-3) + T(n-4),
and each of those terms expands again: T(n-2) = 3 + T(n-3) + T(n-4), etc. etc. etc.
The cost keeps doubling in size!!!
Exponential: O(2^n)
bad, bad, bad, bad, bad, bad…
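Counting calls shows the blow-up directly. The sketch below instruments the recursive fib with a call counter and, for contrast, adds a simple loop version; the loop version, the counter, and the names `FibCalls`, `countCalls`, and `fibLoop` are additions made for this illustration and do not come from the slides.

```java
// Sketch: count how many calls the doubly recursive fib makes, and compare with a loop.
final class FibCalls {
    private static long calls = 0;

    // Same recursion as in the slides (fib(0) = fib(1) = 1), plus a call counter.
    static long fib(int n) {
        calls++;
        if (n <= 1) {
            return 1;
        }
        return fib(n - 1) + fib(n - 2);
    }

    static long countCalls(int n) {
        calls = 0;
        fib(n);
        return calls;
    }

    // Loop version for contrast: one pass, so O(n).
    static long fibLoop(int n) {
        long previous = 1;
        long current = 1;
        for (int i = 2; i <= n; i++) {
            long next = previous + current;
            previous = current;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5) {
            System.out.println("n=" + n + " calls=" + countCalls(n) + " fib=" + fibLoop(n));
        }
    }
}
```

Each step of 5 in n multiplies the call count by roughly 11, which is steady exponential growth and stays within the O(2^n) bound.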
Tree Doubling
• Called a binary tree.
• Keeps doubling.
• Exponential (2^n) growth.
• Bad growth rate!
• But we’ll do some problems that traverse the tree in the other direction.
• Keeps halving.
• Opposite of exponential.
– And what’s the inverse (“opposite”) of exponential? Logs!
• Logarithmic log(n) growth.
• Great growth rate!
• Stay tuned for logarithmic growth…
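As a small preview of "keeps halving", the sketch below counts how many times a problem of size n can be cut in half before only one item is left. `Halving` and `halvings` are made-up names, and this is only an illustration of the growth rate, not one of the tree algorithms covered later.

```java
// Sketch: count how many halvings it takes to shrink n down to 1.
final class Halving {
    static int halvings(long n) {
        int steps = 0;
        while (n > 1) {
            n /= 2;      // integer division: throw away half of the problem
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        for (long n : new long[] {8, 1_024, 1_000_000, 1_000_000_000}) {
            System.out.println("n=" + n + " halvings=" + halvings(n));
        }
    }
}
```

Multiplying n by a thousand adds only about ten steps, which is what logarithmic growth looks like.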