Factors affecting optimization
• The architecture of the target CPU:
  – Number of CPU registers.
  – RISC vs CISC.
  – Pipelines.
  – Number of functional units.
• The architecture of the machine:
  – Cache size.
  – Cache/memory transfer rates.
• The machine itself:
  – General-purpose use.
  – Special-purpose use (embedded systems).
• General-purpose operating system.
Code Optimization Techniques
We will discuss several code optimization techniques,
including the following:
1. Dead Code Elimination
2. Constant Folding
3. Copy Propagation
4. Strength Reduction
5. Common Sub-Expression Elimination
6. Code Motion
Dead Code Elimination
Dead code elimination removes code that can never be reached or whose results are
never used afterwards.
For example, consider the following code fragment:
int count;
void foo() {
    int i;
    i = 1;      // dead code: i is not subsequently used
    count = 1;  // dead code: overwritten by the next assignment
    count = 2;
    return;
    count = 3;  // dead code (unreachable): the function has already returned
}
After applying dead code elimination we have the code below:
int count;
void foo() {
    count = 2;
    return;
}
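The same idea can be sketched in Java. This is only an illustration: the class and method names (DeadCodeDemo, fooBefore, fooAfter) are made up for this example, and the unreachable statement after return is left out because the Java compiler rejects unreachable statements outright.

```java
class DeadCodeDemo {
    static int count;

    // Before: contains a dead local variable and a dead store.
    static void fooBefore() {
        int i = 1;   // dead: i is never read afterwards
        count = 1;   // dead store: overwritten before anything reads it
        count = 2;
    }

    // After dead code elimination: only the observable store remains.
    static void fooAfter() {
        count = 2;
    }
}
```

Both versions leave count with the same value, which is the property an optimizer has to preserve.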
Constant Folding
Constant folding is the technique of evaluating, at compile time, expressions whose
operands are known to be constant.
It involves determining that all of the operands in an expression are constant
values, evaluating the expression at compile time, and then
replacing the expression with its value. For example, the expression
12 + 4 * 3
can be replaced by its result, 24, at compile time, so that no code is generated to
compute it at run time.
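As a small Java sketch of this: an expression built only from literals is a compile-time constant expression in Java, so the compiler may store the folded value directly. The class and field names below are hypothetical.

```java
class ConstantFoldingDemo {
    // All operands are literals, so the compiler can evaluate this at compile time.
    static final int FOLDED = 12 + 4 * 3;

    // The value a folding compiler would emit in its place.
    static final int LITERAL = 24;

    static boolean sameValue() {
        return FOLDED == LITERAL;
    }
}
```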
Constant Propagation
In constant propagation, if a variable is assigned a constant value, then
subsequent uses of that variable can be replaced by the constant, as long as no
intervening assignment has changed the value of the variable.
For example, consider the code:
int x = 12;
int y = 7 - x / 2;
return y * (24 / x + 2);
Applying constant propagation to x, we have:
int x = 12;
int y = 7 - 12 / 2;
return y * (24 / 12 + 2);
Applying constant folding, we have:
int x = 12;
int y = 1;
return y * 4;
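The effect of propagating and folding the constants in this fragment can be checked with a short Java sketch. The method names are hypothetical; the second method is what remains once y is propagated and folded as well.

```java
class ConstantPropagationDemo {
    // The fragment as written in the source.
    static int original() {
        int x = 12;
        int y = 7 - x / 2;
        return y * (24 / x + 2);
    }

    // After propagating x and y and folding every constant expression.
    static int propagatedAndFolded() {
        return 4;
    }
}
```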
Strength Reduction
Strength reduction, also called operator strength reduction, is the replacement of
expensive expressions with cheaper, simpler ones.
For example, an add instruction can be used to replace a multiply instruction.
The code:
T2 = T2 * 2
can be replaced with:
T2 = T2 + T2
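A minimal Java sketch of strength reduction for multiplication by two follows. The method names are illustrative, and the shift variant is an extra alternative not mentioned in the text. For Java int values all three forms agree, including on overflow, because int arithmetic wraps around.

```java
class StrengthReductionDemo {
    // Multiply: typically the most expensive form.
    static int timesTwoMultiply(int t) {
        return t * 2;
    }

    // Add: the replacement described in the text.
    static int timesTwoAdd(int t) {
        return t + t;
    }

    // Shift: another common cheap replacement (an assumption of this sketch).
    static int timesTwoShift(int t) {
        return t << 1;
    }
}
```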
Common Sub-Expression Elimination
This is a code optimization technique that scans the code for identical expressions
and replaces each redundant occurrence with a single variable that holds the computed value.
For example, consider the following fragment:
a = b * c + g;
d = b * c * e;
Here we see that the sub-expression b * c repeats, so we can transform the code to:
t = b * c;
a = t + g;
d = t * e;
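The transformation can be sketched in Java as follows. The class and method names are hypothetical, and the two results are returned as an array only so that they can be compared.

```java
class CseDemo {
    // Before: b * c is computed twice.
    static int[] before(int b, int c, int g, int e) {
        int a = b * c + g;
        int d = b * c * e;
        return new int[] {a, d};
    }

    // After: the common sub-expression is computed once and reused.
    static int[] after(int b, int c, int g, int e) {
        int t = b * c;
        int a = t + g;
        int d = t * e;
        return new int[] {a, d};
    }
}
```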
Code Motion
Code motion, also called loop-invariant code motion, moves a block of code
outside a loop when executing it inside or outside the loop makes no difference
to the result.
Consider the example:
for (int i = 0; i < n; i++) {
    x = y + z;
    a[i] = 6 * i;
}
In the code fragment, the statement x = y + z computes the same value on every iteration, so it
can safely be moved outside of the loop, provided the loop runs at least once (n > 0).
The resulting code would be:
x = y + z;
for (int i = 0; i < n; i++) {
    a[i] = 6 * i;
}
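A runnable Java sketch of this loop, before and after hoisting the invariant statement, is given below. The class name, the static field x, and the method signatures are assumptions made so the two versions can be compared.

```java
class CodeMotionDemo {
    static int x;

    // Before: x = y + z is recomputed on every iteration.
    static int[] before(int y, int z, int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            x = y + z;
            a[i] = 6 * i;
        }
        return a;
    }

    // After: the loop-invariant assignment is executed once, ahead of the loop.
    static int[] after(int y, int z, int n) {
        int[] a = new int[n];
        x = y + z;
        for (int i = 0; i < n; i++) {
            a[i] = 6 * i;
        }
        return a;
    }
}
```

For n greater than zero both versions leave the same values in a and x; for n equal to zero only the hoisted version assigns x, which is why a compiler has to check that case.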
Inlining
Inlining, also referred to as function inlining or inline expansion, is the technique of
replacing a function call with the actual body of the function.
This technique eliminates the overhead associated with the function call, at the cost of
expanding the body of the function inline.
Consider the fragment:
int add(int x, int y)
{
    int z = x + y;
    return z;
}
int sub(int x, int y) {
    return add(x, -y);
}
We can expand the body of add inside the second function instead of calling it, so we have:
int sub(int x, int y) {
    return x + -y;
}
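The inlined form of sub can be compared against the calling form with a short Java sketch. The names subWithCall and subInlined are hypothetical and exist only to keep both versions side by side.

```java
class InliningDemo {
    static int add(int x, int y) {
        int z = x + y;
        return z;
    }

    // Before: sub pays for a call to add.
    static int subWithCall(int x, int y) {
        return add(x, -y);
    }

    // After: the body of add is expanded in place of the call.
    static int subInlined(int x, int y) {
        return x + -y;
    }
}
```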
Is it faster to count down than it is to count up?
• for (i = N - 1; i >= 0; i--)
      putchar('*');
  is better than:
  for (i = 0; i < N; i++)
      putchar('*');
Counting down to 0 can be slightly faster than counting up to N
because of how the hardware handles the comparison.
Note the comparison in each loop: i >= 0 in the first, i < N in the second.
Most processors have an instruction that compares with zero,
so the first loop can be translated to machine code as:
1. Load i
2. Compare with zero and leave the loop if i is less than zero
The second loop may also need to load N from memory each time:
1. Load i
2. Load N
3. Subtract N from i
4. Compare the result with zero and leave the loop if it is greater than or equal to zero
So the gain comes from comparing against zero, not from the direction of counting itself.
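The two loop shapes can be written in Java as below. This sketch only shows that both produce the same output; whether the count-down form is actually faster depends on the compiler and processor, so measure before relying on it. The class and method names are hypothetical, and a StringBuilder stands in for putchar.

```java
class CountDirectionDemo {
    // Counts down: the loop test compares i with zero.
    static String starsCountingDown(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = n - 1; i >= 0; i--) {
            sb.append('*');
        }
        return sb.toString();
    }

    // Counts up: the loop test compares i with n.
    static String starsCountingUp(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('*');
        }
        return sb.toString();
    }
}
```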
Examples
      t1 = t1 + 1           // remove this (dead code elimination)
L0:   t2 = 0
      t3 = t1 * 8 + 1
      t4 = t3 + t2          // remove this (copy propagation)
      t5 = t4 * 4           // (t3 + t2) * 4, then remove the preceding line
      t6 = t5
      t7 = FP + t3          // remove this (dead code elimination)
      *t7 = t2              // replace with *t7 = 0 (constant propagation)
      t8 = t1               // remove this (dead code elimination)
      if (t8 > 0) goto L1   // t8 is surely < 8 (constant folding)
L1:   goto L0
L2:   t1 = 1
      t10 = 16
      t11 = t1 * 2          // change to t1 + t1 (strength reduction)
Code after Optimization
L0:   t2 = 0
      t3 = t1 * 8 + 1
      t5 = (t3 + t2) * 4
      t6 = t5
Java codes
The Optimized code:
• for (i = 0; i < 10; i++) {
      System.out.println(i * 10);
  }
• char x; int y; y = x;
• i = 1; count = 1; count = 2;
• float f[] = new float[100]; float sum = 0;
  for (i = 0; i < 100; i++) {
      sum += f[i];
  }
• int i, sum = 0;
  for (i = 1; i <= N; ++i) {
      sum += i;
  }
Rewrite the following loop using a do…while structure:
for(i=0;i<100;i++)
{
Explain what type of code optimization can be applied to the following code:
public void selectionmethod()
{
    int a = getValue();
    if (a > 0)
    {
        int b = c + 10;
        int d = b * 2;
        System.out.println(d);
    }
    else if (a < 30)
    {
        int b = c + 10;
        int d = b * 2;
        System.out.println(d);
    }
}
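One possible answer, sketched in Java under stated assumptions: both branches contain the same statements, and every int is either greater than 0 or less than 30, so the branching can be removed entirely. In this sketch the field c is a made-up value, a is passed in as a parameter instead of coming from getValue(), and the methods return d instead of printing it so the two versions can be compared.

```java
class SelectionDemo {
    static int c = 5; // hypothetical value for the field used by the exercise

    // Shape of the original method: two branches with identical bodies.
    static int original(int a) {
        if (a > 0) {
            int b = c + 10;
            int d = b * 2;
            return d;
        } else if (a < 30) {
            int b = c + 10;
            int d = b * 2;
            return d;
        }
        return -1; // never reached: every int is > 0 or < 30
    }

    // Simplified: the duplicated branch bodies collapse into one statement.
    static int simplified(int a) {
        return (c + 10) * 2;
    }
}
```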
Rewrite the following code in an optimized form:
public void emptymethod()
{
    final int MIN = 0;
    int i = 100;
    if (i < MIN)
    {
    }
}
Rewrite the following code to remove the unnecessary structure:
public void ifmethod()
{
    if (true)
    {
        // Some code ...
    }
    if (!true)
    {
        // Some code ...
    }
}
Rewrite the following code to remove the unnecessary if statement(s):
public boolean test(String value)
{
    if (value.equals("Hello"))
    {
        return true;
    }
    else
    {
        return false;
    }
}
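One possible answer, as a Java sketch: the condition is already a boolean, so it can be returned directly. The class name and the two method names are hypothetical and only keep both versions side by side.

```java
class EqualsDemo {
    // Original shape: the if/else only converts a boolean into the same boolean.
    static boolean testOriginal(String value) {
        if (value.equals("Hello")) {
            return true;
        } else {
            return false;
        }
    }

    // Simplified: return the result of the comparison itself.
    static boolean testSimplified(String value) {
        return value.equals("Hello");
    }
}
```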
Rewrite the following code to avoid the method call in the loop:
public void method()
{
    String str = "Good morning";
    for (int i = 0; i < str.length(); i++)
    {
        i++;
    }
}
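One possible answer, as a Java sketch: since str does not change inside the loop, its length can be read once before the loop. The iteration counter and the return value are additions made for this example so that both versions can be compared; they are not part of the exercise.

```java
class LoopCallDemo {
    // Original shape: str.length() is called on every loop test.
    static int original() {
        String str = "Good morning";
        int iterations = 0;
        for (int i = 0; i < str.length(); i++) {
            i++;
            iterations++;
        }
        return iterations;
    }

    // Hoisted: the length is read once and kept in a local variable.
    static int hoisted() {
        String str = "Good morning";
        int len = str.length();
        int iterations = 0;
        for (int i = 0; i < len; i++) {
            i++;
            iterations++;
        }
        return iterations;
    }
}
```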
• Data Access Optimizations: Data access
  optimizations are code transformations which
  change the order in which iterations in a loop
  nest are executed. The goal of these
  transformations is mainly to improve
  temporal locality. Moreover, they can also
  expose parallelism and make loop iterations
  vectorizable. Note that the data access
  optimizations we present in this section
  maintain all data dependencies and do not
  change the results of the numerical computations.
• Data Layout Optimizations: Data layout
  optimizations modify how data structures and
  variables are arranged in memory. These
  transformations aim at avoiding effects like cache
  conflict misses and false sharing. They are further
  intended to improve the spatial locality of a code.
  Data layout optimizations include, among others, changing
  the base addresses of variables and modifying array sizes.
Java codes
The Optimization Techniques:
The Optimized code:
• double sum = 0;
  double a[] = new double[1024];
  double b[] = new double[1024];
  for (int i = 1; i <= 1023; i++)
      sum += a[i] * b[i];
• for (int i = 1; i <= n; i++)
      for (int j = 1; j <= n; j++)
          a[i][j] = b[i][j];
• for (int i = 1; i <= n; i++)
      b[i] = a[i] + 1.0;
Java codes
The Optimization Techniques:
The Optimized code:
• double sum = 0;
  double a[][] = new double[n][n];
  for (int i = 1; i <= n; i++)
      for (int j = 1; j <= n; j++)
          sum += a[i][j];
• double a[] = new double[1024];
  double b[] = new double[1024];
Example: Accessing A Set-Associative Cache
Column-by-column traversal:
double sum = 0;
double a[][] = new double[n][n];
for (int j = 0; j < n; j++)
    for (int i = 0; i < n; i++)
        sum += a[i][j];
Memory locations (n = 4), in order:
A[0][0], A[0][1], A[0][2], A[0][3], A[1][0], A[1][1], A[1][2], A[1][3],
A[2][0], A[2][1], A[2][2], A[2][3], A[3][0], A[3][1], A[3][2], A[3][3]
Each cache block holds one row of four elements, and the cache holds three blocks.
Accessing sequence: A[0][0], A[1][0], A[2][0], A[3][0], A[0][1], ...
Current access   Result   Cache after the access
A[0][0]          Miss     row 0
A[1][0]          Miss     row 0, row 1
A[2][0]          Miss     row 0, row 1, row 2
A[3][0]          Miss     cache full, first block (row 0) deleted: row 3, row 1, row 2
A[0][1]          Miss     cache full, second block (row 1) deleted: row 3, row 0, row 2
(row k stands for the block A[k][0], A[k][1], A[k][2], A[k][3])
Oops: every access is a cache miss.
Example: Accessing A Set-Associative Cache
Row-by-row traversal:
double sum = 0;
double a[][] = new double[n][n];
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        sum += a[i][j];
Memory locations (n = 4), in order:
A[0][0], A[0][1], A[0][2], A[0][3], A[1][0], A[1][1], A[1][2], A[1][3],
A[2][0], A[2][1], A[2][2], A[2][3], A[3][0], A[3][1], A[3][2], A[3][3]
Each cache block holds one row of four elements.
Accessing sequence: A[0][0], A[0][1], A[0][2], ...
Current access   Result   Cache after the access
A[0][0]          Miss     row 0
A[0][1]          Hit      row 0
A[0][2]          Hit      row 0
A[0][3]          Hit      row 0
A[1][0]          Miss     row 0, row 1
A[1][1]          Hit      row 0, row 1
A[1][2]          Hit      row 0, row 1
A[1][3]          Hit      row 0, row 1
(row k stands for the block A[k][0], A[k][1], A[k][2], A[k][3])
Wow: after the first miss on each row, every access is a cache hit.
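The two walkthroughs can be reproduced with a small cache simulation. This is a sketch under assumptions taken from the slides or added for illustration: a 4 x 4 matrix stored row by row in consecutive addresses, blocks of four elements, a cache of three blocks, and first-in first-out replacement within a single set. All class, method, and constant names are hypothetical.

```java
class CacheWalkthroughDemo {
    static final int N = 4;            // matrix is N x N, as in the slides
    static final int BLOCK_SIZE = 4;   // elements per cache block (one row)
    static final int CACHE_BLOCKS = 3; // number of blocks the cache can hold

    // Counts cache misses for a sequence of element addresses,
    // replacing the oldest block when the cache is full (FIFO, an assumption).
    static int countMisses(int[] addresses) {
        java.util.ArrayDeque<Integer> cache = new java.util.ArrayDeque<>();
        int misses = 0;
        for (int address : addresses) {
            int block = address / BLOCK_SIZE;
            if (!cache.contains(block)) {
                misses++;
                if (cache.size() == CACHE_BLOCKS) {
                    cache.removeFirst(); // evict the oldest block
                }
                cache.addLast(block);
            }
        }
        return misses;
    }

    // Addresses touched by: for i { for j { sum += a[i][j]; } }
    static int[] rowOrder() {
        int[] seq = new int[N * N];
        int k = 0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                seq[k++] = i * N + j;
        return seq;
    }

    // Addresses touched by: for j { for i { sum += a[i][j]; } }
    static int[] columnOrder() {
        int[] seq = new int[N * N];
        int k = 0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                seq[k++] = i * N + j;
        return seq;
    }
}
```

Under these assumptions, countMisses(columnOrder()) reports a miss on all 16 accesses, while countMisses(rowOrder()) reports only 4, one per row.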