Compiler optimization passes are a core feature of LLVM. The number of optimization passes provided by LLVM is growing continuously; currently more than 50 passes are available. Each of them exploits a different optimization opportunity to modify LLVM-IR code so that the output binary’s runtime, or sometimes its memory consumption, improves. In addition to these passes, LLBMC contains its own set of LLVM-IR transformations, most of which are not targeted at optimization but at extending LLBMC’s support for more exotic LLVM-IR language features. Interestingly, many of these optimizations not only improve runtime on the target architecture but are also often, though not always, beneficial to the runtime of the SMT solver solving the formulæ that LLBMC generates from this code. This is because SMT solvers use bit-blasting to lower bitvector operations to propositional formulæ, and the generated formulæ closely match the circuits of real architectures.
Using compiler optimizations in a code analysis tool has to be handled with care, though. What is merely a missed optimization opportunity for a compiler might cause unsoundness or incompleteness in a static analysis tool. For example, when loop unrolling fails because a loop does not have the expected structure, the compiled code will at worst run slightly slower, but a static analysis tool that relies on loop unrolling, such as LLBMC, might not be able to analyze the code at all.
3.4.1 Optimizations for Performance Improvement
Many optimizations in LLVM-IR can be used in LLBMC to improve performance. Experience has shown that many things which are expensive for a real processor are similarly expensive for an SMT solver.
One basic, exemplary optimization pass, instcombine, replaces one or more expensive instructions by one or more cheaper instructions, as can be seen in listing 3.9. The fact that the right-hand side argument of the multiplication in listing 3.9a is the constant 3 is used to replace the multiplication by a shift operation and an addition, as can be seen in listing 3.9b.
define i32 @times3(i32 %x) {
  %0 = mul i32 %x, 3
  ret i32 %0
}
(a) Code before optimization
define i32 @times3(i32 %x) {
  %0 = shl i32 %x, 1
  %1 = add i32 %0, %x
  ret i32 %1
}
(b) Code after optimization
Listing 3.9: Example for instruction simplification optimization
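The arithmetic behind this rewrite can be checked at the source level. The following C fragment is a minimal sketch and not part of LLVM or LLBMC; the function names are hypothetical, and unsigned 32-bit arithmetic is assumed so that both variants wrap around modulo 2^32 in the same way.

```c
#include <stdint.h>

/* Multiplication by the constant 3. */
uint32_t times3_mul(uint32_t x) {
    return x * 3u;
}

/* Strength-reduced form: (x << 1) + x = 2x + x = 3x (mod 2^32). */
uint32_t times3_shift_add(uint32_t x) {
    return (x << 1) + x;
}

/* Returns 1 if both variants agree on a small sample that includes
   wrap-around cases, 0 otherwise. The sample values are arbitrary. */
int times3_variants_agree(void) {
    const uint32_t samples[] = {
        0u, 1u, 2u, 7u, 0x55555555u, 0x80000000u, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (times3_mul(samples[i]) != times3_shift_add(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

A sample-based check of this kind only illustrates the identity; it does not replace the reasoning that 2x + x = 3x holds for all bit patterns.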
The single most important pass for LLBMC’s performance is LLVM’s built-in mem2reg pass. This pass lifts values stored in memory to virtual registers where possible. This considerably reduces the number of memory read and write operations, which in turn reduces the load on the SMT solver’s implementation of the theory of arrays.
Consider the following trivial C function:
int double(int x) {
  return x+x;
}
This program is compiled to the IR code in listing 3.10a, and can then be optimized to the code in listing 3.10b. While similar optimizations are done by SMT solvers, it is paramount to do these optimizations early in order to avoid an intermediate blow-up of the formula’s size.
define i32 @double(i32 %x) {
  %x_addr = alloca i32
  store i32 %x, i32* %x_addr
  %0 = load i32, i32* %x_addr
  %1 = add i32 %0, %0
  ret i32 %1
}
(a) Code before optimization
define i32 @double(i32 %x) {
  %1 = add i32 %x, %x
  ret i32 %1
}
(b) Code after optimization
Listing 3.10: Example for mem2reg optimization
The alloca, store, and load instructions are inserted by the clang code generator because, at the time this code is emitted, the generator does not yet know that the address of x will never be used for anything but loading and storing x within the function. Once the whole function has been emitted, the mem2reg pass can analyze it and remove the redundant memory access operations.
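The distinction between a stack slot that is only loaded and stored and one whose address is used otherwise can be illustrated with two small C functions. This is a sketch under stated assumptions: the names twice, twice_plus_one, and bump are hypothetical, and the comments describe the promotion criterion as stated above rather than the output of any particular compiler run.

```c
/* Hypothetical helper that receives the address of a caller's local. */
static void bump(int *p) {
    *p += 1;
}

/* The address of x is never taken. The stack slot emitted for x is only
   loaded from and stored to, so it is a candidate for promotion to a
   virtual register. */
int twice(int x) {
    return x + x;
}

/* The address of y is passed to bump(). The stack slot of y therefore has
   a use that is neither a direct load nor a direct store, which rules out
   promotion by mem2reg alone (assuming bump() is not inlined first). */
int twice_plus_one(int x) {
    int y = x + x;
    bump(&y);
    return y;
}
```

For LLBMC, the second function would keep its memory accesses in the generated formula, whereas the first needs none after promotion.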
3.4.2 Optimizations for Extending Language Support
For programs which contain certain language constructs, the following compiler optimizations are required by LLBMC before encoding (see chapter 4) can happen.
This is because LLBMC’s translation to ILR does not support all of LLVM’s language features, so a preprocessing step is necessary to remove these.
LLVM’s optimization pass lowerindirectbranch replaces any indirectbr by an equivalent construct of direct, conditional branches (br). This is straightforward, because indirectbr contains a list of allowed branch targets, so one br per unique entry in this list suffices.
LLVM’s pass lowerswitch has a purpose similar to that of lowerindirectbranch, but instead of replacing indirectbr instructions, it replaces switch instructions. Again, because the conditions and associated branch targets are explicitly listed in the switch instruction, this pass is straightforward.
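The effect of replacing a multi-way branch by two-way conditional branches can be sketched at the C level. The functions below are hypothetical, as are the case values; the linear chain of comparisons is an assumption made for brevity, and how the actual pass arranges the comparisons is an implementation detail not covered here.

```c
/* Original form: a single multi-way switch. */
int classify_switch(int v) {
    switch (v) {
    case 0:  return 10;
    case 1:  return 20;
    case 7:  return 30;
    default: return -1;
    }
}

/* Lowered form: every case becomes one two-way conditional branch to the
   corresponding target; the final unconditional jump is the default. */
int classify_branches(int v) {
    if (v == 0) goto case0;
    if (v == 1) goto case1;
    if (v == 7) goto case7;
    goto dflt;
case0: return 10;
case1: return 20;
case7: return 30;
dflt:  return -1;
}
```

Both functions compute the same result for every input, since each explicitly listed condition is tested exactly once before falling through to the default target.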
LLBMC’s lowerindirectcall pass replaces indirect calls by a switch over the set of possible call targets, with each case in the switch calling a single target. The set of possible targets is over-approximated by the set of functions with a matching signature. This can be refined using LLVM’s alias analysis, though this currently does not happen in LLBMC.
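The idea can be sketched in C as follows. All names are hypothetical, and the set of candidates is simply every function in the fragment with the signature int(int). Because C cannot switch over pointer values, the dispatch is written as a chain of comparisons. The fallback for an unmatched pointer is a placeholder assumption; how LLBMC treats that case is not described here.

```c
typedef int (*unary_fn)(int);

/* Three functions sharing the signature int(int). */
int inc(int x)    { return x + 1; }
int negate(int x) { return -x; }
int square(int x) { return x * x; }

/* Original form: an indirect call through a function pointer. */
int call_indirect(unary_fn f, int x) {
    return f(x);
}

/* Lowered form: dispatch over all functions with a matching signature.
   Each branch contains exactly one direct call. */
int call_lowered(unary_fn f, int x) {
    if (f == inc)    return inc(x);
    if (f == negate) return negate(x);
    if (f == square) return square(x);
    /* Placeholder (assumption): no candidate matched. */
    return 0;
}
```

The over-approximation is visible in call_lowered: all three functions are candidates at every call site of this type, even if only one of them can actually reach it.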
LLBMC’s generalunroll optimization pass is derived from LLVM’s unroll pass but allows unrolling in corner cases which LLVM itself does not support. Note that this pass still relies on LLVM for the detection of loops to unroll. This detection fails on loops with multiple entry points, e.g. as used in the so-called Duff’s device.
The pass scalarrepl, implemented in LLVM, replaces aggregate values by a set of scalar values. This reduces the need for support of aggregates in LLBMC.
LLBMC’s lowergep replaces a getelementptr with more than two index arguments by a sequence of shorter ones. The code in listing 3.11 is the lowered form of the code in listing 3.6.
%RT = type { i8, [10 x [20 x i32]], i8 }
%ST = type { i32, double, %RT }

define i32* @foo(%ST* %s) {
  %t1 = getelementptr %ST, %ST* %s, i32 1
  %t2 = getelementptr %ST, %ST* %t1, i32 0, i32 2
  %t3 = getelementptr %RT, %RT* %t2, i32 0, i32 1
  %t4 = getelementptr [10 x [20 x i32]], [10 x [20 x i32]]* %t3, i32 0, i32 5
  %t5 = getelementptr [20 x i32], [20 x i32]* %t4, i32 0, i32 13
  ret i32* %t5
}
Listing 3.11: Example for lowering of getelementptr
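The same decomposition can be written at the C level, where each intermediate pointer corresponds to one of the short getelementptr instructions. This is a sketch: the struct definitions are an assumed source-level counterpart of %RT and %ST (with char for i8 and int for i32), and the field and function names are hypothetical.

```c
/* Assumed C counterparts of %RT and %ST. */
struct RT { char A; int B[10][20]; char C; };
struct ST { int X; double Y; struct RT Z; };

/* Single-expression form: one address computation with many indices. */
int *foo_direct(struct ST *s) {
    return &s[1].Z.B[5][13];
}

/* Stepwise form: one short address computation per lowered instruction. */
int *foo_stepwise(struct ST *s) {
    struct ST *t1 = s + 1;          /* element 1 of the %ST array   */
    struct RT *t2 = &t1->Z;         /* field 2 of %ST               */
    int (*t3)[10][20] = &t2->B;     /* field 1 of %RT               */
    int (*t4)[20] = &(*t3)[5];      /* row 5 of the outer array     */
    int *t5 = &(*t4)[13];           /* element 13 of the inner array */
    return t5;
}

/* Returns 1 if both forms yield the same address for a two-element array. */
int gep_variants_agree(void) {
    static struct ST demo[2];
    return foo_direct(demo) == foo_stepwise(demo)
        && foo_stepwise(demo) == &demo[1].Z.B[5][13];
}
```

Every step in foo_stepwise indexes into exactly one type, which is the property that keeps the encoding of each individual address computation simple.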
Finally, LLVM’s simplifycfg pass simplifies the program’s control flow graph and thereby ensures that only the entry basic block has no predecessor.