There are several factors that can prevent the compiler from doing the optimizations that we want it to do. It is important for the programmer to be aware of these obstacles and to know how to avoid them. Some important obstacles to optimization are discussed below.
Cannot optimize across modules
The compiler does not have information about functions in modules other than the one it is compiling. This prevents it from making optimizations across function calls. Example:
// Example 8.20

module1.cpp

int Func1(int x) {
   return x*x + 1;
}

module2.cpp

int Func2() {
   int a = Func1(2);
   ...
}
If Func1 and Func2 were in the same module then the compiler would be able to do function inlining and constant propagation and reduce a to the constant 5. But the compiler does not have the necessary information about Func1 when compiling module2.cpp.
The simplest way to solve this problem is to combine the multiple .cpp modules into one by means of #include directives. This is sure to work on all compilers. Some compilers have a feature called whole program optimization, which will enable optimizations across modules (see page 83).
Pointer aliasing
When accessing a variable through a pointer or reference, the compiler may not be able to completely rule out the possibility that the variable pointed to is identical to some other variable in the code. Example:
// Example 8.21
void Func1 (int a[], int * p) {
   int i;
   for (i = 0; i < 100; i++) {
      a[i] = *p + 2;
   }
}

void Func2() {
   int list[100];
   Func1(list, &list[8]);
}
Here, it is necessary to reload *p and calculate *p+2 a hundred times because the value pointed to by p is identical to one of the elements in a[], which will change during the loop. Example 8.21 is indeed a very contrived example, but the point is that the compiler cannot rule out the theoretical possibility that such contrived examples exist. Therefore, the compiler is prevented from assuming that *p+2 is a loop-invariant expression that it can move outside the loop.
Most compilers have an option for assuming no pointer aliasing (/Oa). The easiest way to overcome the obstacle of possible pointer aliasing is to turn on this option. This requires that you analyze all pointers and references in the code carefully to make sure that no variable or object is accessed in more than one way in the same part of the code. It is also possible to tell the compiler that a specific pointer does not alias anything by using the keyword __restrict or __restrict__, if supported by the compiler.
We can never be sure that the compiler takes the hint about no pointer aliasing. The only way to make sure that the code is optimized is to do it explicitly. In example 8.21, you could calculate *p+2 and store it in a temporary variable outside the loop if you are sure that the pointer does not alias any elements in the array. This method requires that you can predict where the obstacles to optimization are.
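The two remedies described above can be sketched as follows. This is only an illustration: the names Func1Hoisted, Func1Restrict and DemoNoAlias are hypothetical and do not appear in example 8.21, and both variants are valid only under the assumption that p does not point into a[]. The __restrict keyword is a non-standard extension, so the second variant assumes a compiler that accepts it.

```cpp
#include <cassert>

// Hypothetical variant of Func1 from example 8.21.
// The loop-invariant expression is moved out of the loop by hand.
// Only valid if p does not point to any element of a[].
void Func1Hoisted(int a[], const int * p) {
    const int temp = *p + 2;          // Calculated once, outside the loop
    for (int i = 0; i < 100; i++) {
        a[i] = temp;
    }
}

// Hypothetical variant that promises the compiler that a and p do not
// alias. The compiler may then hoist *p + 2 itself, but it is not
// obliged to do so.
void Func1Restrict(int * __restrict a, const int * __restrict p) {
    for (int i = 0; i < 100; i++) {
        a[i] = *p + 2;
    }
}

// Usage: the source value lives outside the array, so there is no aliasing.
void DemoNoAlias() {
    int list[100];
    int value = 5;
    Func1Hoisted(list, &value);
    assert(list[0] == 7 && list[99] == 7);
    Func1Restrict(list, &value);
    assert(list[0] == 7 && list[99] == 7);
}
```

Calling either variant as in Func2 of example 8.21, with p pointing to list[8], would break the no-aliasing assumption: the hoisted version would give a different result from the original loop, and the __restrict version would have undefined behavior.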
Dynamic memory allocation
Any array or object that is allocated dynamically (with new or malloc) is necessarily accessed through a pointer. It may be obvious to the programmer that pointers to different dynamically allocated objects are not overlapping or aliasing, but the compiler is usually not able to see this. Dynamic allocation also prevents the compiler from aligning the data optimally, or from knowing that the objects are aligned. It is preferable to declare objects and fixed-size arrays inside the function that needs them.
Pure functions
A pure function is a function that has no side-effects and whose return value depends only on the values of its arguments. This closely follows the mathematical notion of a function. Multiple calls to a pure function with the same arguments are sure to produce the same result. A compiler can eliminate common subexpressions that contain pure function calls and it can move out loop-invariant code containing pure function calls. Unfortunately, the compiler cannot know that a function is pure if the function is defined in a different module or a function library.
Therefore, it is necessary to do optimizations such as common subexpression elimination, constant propagation, and loop-invariant code motion manually when it involves pure function calls.
The Gnu and Clang compilers and the Intel compiler for Linux have an attribute which can be applied to a function prototype to tell the compiler that this is a pure function. Example:
// Example 8.22
#ifdef __GNUC__
#define pure_function __attribute__((const))
#else
#define pure_function
#endif

double Func1(double) pure_function ;

double Func2(double x) {
   return Func1(x) * Func1(x) + 1.;
}
Here, the Gnu compiler will make only one call to Func1, while other compilers will make two.
Some other compilers (Microsoft, Intel) know that standard library functions like sqrt, pow and log are pure functions, but unfortunately there is no way to tell these compilers that a user-defined function is pure.
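Where no such attribute is available, the common subexpression in example 8.22 can be eliminated by hand, as recommended above. The following is a minimal sketch. The body given to Func1 is a hypothetical stand-in (example 8.22 only declares the function), and Func2Manual and DemoManualCse are names introduced here for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a pure function defined in another module
double Func1(double x) {
    return x * 0.5 + 1.0;
}

// As in example 8.22: two calls unless the compiler knows Func1 is pure
double Func2(double x) {
    return Func1(x) * Func1(x) + 1.;
}

// Manual common subexpression elimination: a single call on any compiler
double Func2Manual(double x) {
    const double y = Func1(x);        // Call the pure function once
    return y * y + 1.;
}

// Usage: both versions give the same result because Func1 is pure
void DemoManualCse() {
    assert(Func2(3.0) == Func2Manual(3.0));
}
```

The rewrite is only safe because Func1 has no side effects and depends only on its argument. The programmer has to verify this, since the compiler cannot.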
Virtual functions and function pointers
It is rarely possible for the compiler to predict with certainty which version of a virtual function will be called, or what a function pointer points to. Therefore, it cannot inline the function or otherwise optimize across the function call.
Algebraic reduction
Most compilers can do simple algebraic reductions such as -(-a) = a, but they are not always able to do more complicated reductions. Algebraic reduction is a complicated process which is difficult to implement in a compiler.
Many algebraic reductions are not permissible for reasons of mathematical purity. In many cases it is possible to construct obscure examples where the reduction would cause overflow or loss of precision, especially in floating point expressions (see page 74). The compiler cannot rule out the possibility that a particular reduction would be invalid in a particular situation, but the programmer can. It is therefore necessary to do the algebraic reductions explicitly in many cases.
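A small sketch of such an obscure case is given below. It assumes IEEE 754 double precision and a compiler running without relaxed floating point options. The function names AddThenSubtract and DemoPrecision are hypothetical. Algebraically, (a + b) - a equals b, but the reduction is not valid when b is lost in the rounding of a + b.

```cpp
#include <cassert>

// A compiler with strict floating point semantics must not reduce
// this expression to b
double AddThenSubtract(double a, double b) {
    return (a + b) - a;
}

void DemoPrecision() {
    // Here the reduction would have been harmless
    assert(AddThenSubtract(1.0, 2.0) == 2.0);
    // Here b is far below the precision of a, so a + b rounds to a
    // and the result is 0, not b
    assert(AddThenSubtract(1.0e30, 1.0) == 0.0);
}
```

If the programmer knows that a and b are always of similar magnitude, or that the lost precision does not matter, then the expression can simply be written as b in the source code.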
Integer expressions are less susceptible to problems of overflow and loss of precision for reasons explained on page 74. It is therefore possible for the compiler to do more reductions on integer expressions than on floating point expressions. Most reductions involving integer addition, subtraction and multiplication are permissible in all cases, while many reductions involving division and relational operators (e.g. '>') are not permissible for reasons of mathematical purity. For example, compilers cannot reduce the integer expression -a > -b to a < b because of a very obscure possibility of overflow.
Table 8.1 (page 79) shows which reductions the compilers are able to do, at least in some situations, and which reductions they cannot do. All the reductions that the compilers cannot do must be done manually by the programmer.
Floating point induction variables
Compilers cannot make floating point induction variables for the same reason that they cannot make algebraic reductions on floating point expressions. It is therefore necessary to do this manually. This principle is useful whenever a function of a loop counter can be calculated more efficiently from the previous value than from the loop counter. Any expression that is an nth-degree polynomial of the loop counter can be calculated by n additions and no multiplications. The following example shows the principle for a second-order polynomial:
// Example 8.23a. Loop to make table of polynomial
const double A = 1.1, B = 2.2, C = 3.3;  // Polynomial coefficients
double Table[100];                       // Table
int x;                                   // Loop counter
for (x = 0; x < 100; x++) {
   Table[x] = A*x*x + B*x + C;           // Calculate polynomial
}
The calculation of this polynomial can be done with just two additions by the use of two induction variables:
// Example 8.23b. Calculate polynomial with induction variables
const double A = 1.1, B = 2.2, C = 3.3;  // Polynomial coefficients
double Table[100];                       // Table
int x;                                   // Loop counter
const double A2 = A + A;                 // = 2*A
double Y = C;                            // = A*x*x + B*x + C
double Z = A + B;                        // = Delta Y
for (x = 0; x < 100; x++) {
   Table[x] = Y;                         // Store result
   Y += Z;                               // Update induction variable Y
   Z += A2;                              // Update induction variable Z
}
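The start values and increments in example 8.23b follow from taking differences of the polynomial. As a short derivation, writing Y(x) and Z(x) for the values that the variables Y and Z hold at loop counter x (this function notation is introduced here for illustration only):

```latex
\begin{align*}
Y(x) &= A x^2 + B x + C, & Y(0) &= C \\
Z(x) &= Y(x+1) - Y(x) = 2 A x + A + B, & Z(0) &= A + B \\
Z(x+1) - Z(x) &= 2 A
\end{align*}
```

The second difference is the constant 2A, which is why Z can be updated by adding A2 in each iteration.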
The loop in example 8.23b has two loop-carried dependency chains, namely the two induction variables Y and Z. Each dependency chain has a latency which is the same as the latency of a floating point addition. This is small enough to justify the method. A longer loop-carried dependency chain would make the induction variable method unfavorable, unless the value is calculated from a value that is two or more iterations back.
The method of induction variables can also be vectorized if you take into account that each value is calculated from the value that lies r places back in the sequence, where r is the number of elements in a vector or the loop unroll factor. A little math is required for finding the right formula in each case.
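As an illustration of the math involved, the polynomial of example 8.23 can be calculated with r independent pairs of induction variables, each stepping r places at a time. The sketch below assumes r = 4 and a table size that is divisible by r. The names R, N, MakeTableStrided and DemoStrided are hypothetical. For a step of r, the first difference is Y(x+r) - Y(x) = A(2xr + r*r) + B*r and the second difference is the constant 2*A*r*r.

```cpp
#include <cassert>
#include <cmath>

const double A = 1.1, B = 2.2, C = 3.3;  // Polynomial coefficients
const int R = 4;                         // Assumed vector size or unroll factor
const int N = 100;                       // Table size, assumed divisible by R

void MakeTableStrided(double Table[]) {
    static_assert(N % R == 0, "table size must be a multiple of R");
    const double Z2 = 2.0 * A * R * R;   // Constant second difference
    double Y[R], Z[R];                   // R independent induction pairs
    for (int j = 0; j < R; j++) {
        Y[j] = A*j*j + B*j + C;              // Polynomial at x = j
        Z[j] = A*(2.0*j*R + R*R) + B*R;      // Y(j+R) - Y(j)
    }
    for (int x = 0; x < N; x += R) {
        // The R lanes are independent and can go into one vector
        for (int j = 0; j < R; j++) {
            Table[x + j] = Y[j];         // Store result
            Y[j] += Z[j];                // Step Y by R places
            Z[j] += Z2;                  // Step Z by R places
        }
    }
}

// Usage: compare against the direct calculation of example 8.23a
void DemoStrided() {
    double Table[N];
    MakeTableStrided(Table);
    for (int x = 0; x < N; x++) {
        assert(std::fabs(Table[x] - (A*x*x + B*x + C)) < 1e-8);
    }
}
```

Each lane depends only on its own value from the previous iteration of the outer loop, so the additions in the inner loop can overlap or be done in a single vector operation. The results may differ from the direct calculation by small rounding errors, which is why the comparison uses a tolerance.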
Inlined functions have a non-inlined copy
Function inlining has the complication that the same function may be called from another module. The compiler has to make a non-inlined copy of the inlined function for the sake of the possibility that the function is also called from another module. This non-inlined copy is dead code if no other modules call the function. This fragmentation of the code makes caching less efficient.
There are various ways around this problem. If a function is not referenced from any other module then add the keyword static to the function definition. This tells the compiler that the function cannot be called from any other module. The static declaration makes it easier for the compiler to evaluate whether it is optimal to inline the function, and it prevents the compiler from making an unused copy of an inlined function. The static keyword also makes various other optimizations possible because the compiler does not have to obey any specific calling conventions for functions that are not accessible from other modules. You may add the static keyword to all local non-member functions.
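A minimal sketch of the static method is shown below. The function names Square, SumOfSquares and DemoStatic are hypothetical and are used for illustration only.

```cpp
#include <cassert>

// Helper that is used only within this module. The static keyword tells
// the compiler that no other module can call it, so no separate
// non-inlined copy is needed if every call is inlined.
static int Square(int x) {
    return x * x;
}

// Function that is visible to other modules
int SumOfSquares(int a, int b) {
    return Square(a) + Square(b);
}

// Usage
void DemoStatic() {
    assert(SumOfSquares(2, 3) == 13);
}
```

Whether the compiler actually inlines Square is still its own decision. The static keyword only removes the need to keep a copy for callers in other modules.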
Unfortunately, this method does not work for class member functions because the static keyword has a different meaning for member functions. You can force a member function to be inlined by defining the function body inside the class definition. This will prevent the compiler from making a non-inlined copy of the function, but it has the disadvantage that the function is always inlined even when it is not optimal to do so (i.e. if the member function is big and is called from many different places).
Some compilers have an option (Windows: /Gy, Linux: -ffunction-sections) which allows the linker to remove unreferenced functions. It is recommended to turn on this option.