ARM: 8 arguments - printf() with several arguments



This is the optimized (-O3) version for ARM mode, and here we see B as the last instruction instead of the familiar BL. Another difference between this optimized version and the previous one (compiled without optimization) is that there is no function prologue and epilogue (the instructions that save the values of the R0 and LR registers). The B instruction just jumps to another address without any manipulation of the LR register; that is, it is analogous to JMP in x86.

Why does it work? Because this code is, in fact, effectively equivalent to the previous one. There are two main reasons: 1) neither the stack nor SP, the stack pointer, is modified; 2) the call to printf() is the last instruction, so there is nothing going on after it. After finishing, the printf() function will just return control to the address stored in LR. But LR still holds the address of the point from which our function was called! Consequently, control from printf() will be returned to that point.

As a consequence, we do not need to save LR, since we do not need to modify LR. We do not need to modify LR because there are no function calls other than printf(). Furthermore, after this call we do not need to do anything else! That's why this optimization is possible.
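As a rough illustration of the kind of source code that permits this, here is a minimal C sketch. The function name print_three() is hypothetical and not taken from the original listing. Because the printf() call is the very last action and its result is returned unchanged, an optimizing compiler may (but is not obliged to) emit a plain B to printf() instead of BL followed by a return.

```c
#include <stdio.h>

/* Hypothetical example, not the book's listing: the printf() call is in
   tail position, so nothing remains to be done after it returns. */
int print_three(int a, int b, int c)
{
    /* A compiler is allowed to turn this call into a jump (B on ARM),
       letting printf() return straight to our caller through LR. */
    return printf("a=%d; b=%d; c=%d\n", a, b, c);
}
```

Whether the jump is actually generated depends on the compiler and its options, so the only reliable check is to inspect the disassembly.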

Another similar example is described in the "switch()/case/default" section (11.1.1).

5.3.3 Optimizing Keil + thumb mode

There is no significant difference from the non-optimized code for ARM mode.

5.4 ARM: 8 arguments

Let's again use the example with 9 arguments from the previous section: 5.2.

void printf_main2()

5.4. ARM: 8 ARGUMENTS CHAPTER 5. PRINTF() WITH SEVERAL ARGUMENTS

The very first "STR LR, [SP,#var_4]!" instruction saves LR on the stack, because we will use this register for the printf() call.

The second "SUB SP, SP, #0x14" instruction decreases SP, the stack pointer, in order to allocate 0x14 (20) bytes on the stack. Indeed, we need to pass 5 32-bit values via the stack to the printf() function, and each one occupies 4 bytes, that is, exactly 5 * 4 = 20. The other 4 32-bit values will be passed in registers.
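The arithmetic above can be written down as a small C sketch. The names stack_bytes_for_args(), ARM_REG_ARGS and ARM_WORD_SIZE are made up for illustration, and the sketch assumes every argument is a 32-bit word and ignores any alignment padding a real compiler might add.

```c
#include <stddef.h>

/* Illustrative constants: the first 4 word-sized arguments go in R0-R3,
   and each remaining argument takes one 4-byte stack slot. */
enum { ARM_REG_ARGS = 4, ARM_WORD_SIZE = 4 };

/* Bytes of stack needed for the arguments that do not fit in R0-R3.
   Simplification: all arguments are assumed to be 32-bit values. */
size_t stack_bytes_for_args(size_t nargs)
{
    if (nargs <= ARM_REG_ARGS)
        return 0;
    return (nargs - ARM_REG_ARGS) * ARM_WORD_SIZE;
}
```

For the 9 arguments of this example (the format string plus 8 numbers), stack_bytes_for_args(9) gives 20, that is, the 0x14 seen in the SUB instruction.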

· Passing 5, 6, 7 and 8 via stack:

Then, the values 5, 6, 7 and 8 are written to the R0, R1, R2 and R3 registers respectively. Then, the "ADD R12, SP, #0x18+var_14" instruction writes into the R12 register the address of the point in the stack where these 4 variables will be written. var_14 is an assembly macro equal to -0x14; such macros are created by IDA to succinctly denote code accessing the stack. The var_? macros created by IDA reflect local variables in the stack. So, SP + 4 will be written into the R12 register. The next "STMIA R12, {R0-R3}" instruction writes the contents of the R0-R3 registers to the point in memory to which R12 is pointing. STMIA means Store Multiple Increment After. Increment After means that the store address, which starts at the value in R12, is increased by 4 after each register value is written.

· Passing 4 via stack: 4 is stored in R0 and then this value is saved on the stack with the help of the "STR R0, [SP,#0x18+var_18]" instruction. var_18 is -0x18, so the offset will be 0, and the value from the R0 register (4) will be written to the point where SP is pointing.

· Passing 1, 2 and 3 via registers:

Values of the first 3 numbers (a, b, c) (1, 2, 3 respectively) are passed in the R1, R2 and R3 registers right before the printf() call, and the other 5 values are passed via the stack:

· printf() call:

· Function epilogue:

The "ADD SP, SP, #0x14" instruction returns the SP pointer back to its former point, thus cleaning the stack. Of course, what was written on the stack will stay there, but it will all be overwritten during the execution of subsequent functions.

The "LDR PC, [SP+4+var_4],#4" instruction loads the saved LR value from the stack into the PC register, thus causing the function to exit.
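The stack writes described in the steps above can be modelled in C roughly as follows. This is only a sketch: the 20-byte area allocated by "SUB SP, SP, #0x14" is represented as an array of five words with frame[0] standing for the word at SP, and the names stmia(), build_frame(), IDA_BASE, VAR_14 and VAR_18 are invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets as IDA shows them: 0x18+var_14 = 4 and 0x18+var_18 = 0. */
enum { FRAME_BYTES = 0x14, IDA_BASE = 0x18, VAR_14 = -0x14, VAR_18 = -0x18 };

/* Model of Store Multiple Increment After: store each word at byte offset
   `addr`, then advance the address by 4. Returns the final address. */
static uint32_t stmia(uint32_t *frame, uint32_t addr,
                      const uint32_t *regs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        frame[addr / 4] = regs[i];
        addr += 4;
    }
    return addr;
}

/* Fill the 5-word argument area the way the text describes. */
void build_frame(uint32_t frame[FRAME_BYTES / 4])
{
    const uint32_t r0_r3[4] = {5, 6, 7, 8};
    uint32_t r12 = IDA_BASE + VAR_14;      /* ADD R12, SP, #0x18+var_14 -> SP+4 */
    stmia(frame, r12, r0_r3, 4);           /* STMIA R12, {R0-R3}                */
    frame[(IDA_BASE + VAR_18) / 4] = 4;    /* STR R0, [SP,#0x18+var_18] -> [SP] */
}
```

After build_frame() the array holds 4, 5, 6, 7, 8 from the lowest address upward, which is the order in which printf() expects the fifth to ninth arguments on the stack.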

.text:0000002A 01 AB ADD R3, SP, #0x18+var_14

.text:0000002C 07 C3 STMIA R3!, {R0-R2}

.text:0000002E 04 20 MOVS R0, #4

.text:00000030 00 90 STR R0, [SP,#0x18+var_18]

.text:00000032 03 23 MOVS R3, #3

.text:0000003E loc_3E ; CODE XREF: example13_f+16

.text:0000003E 05 B0 ADD SP, SP, #0x14

.text:00000040 00 BD POP {PC}

Almost the same as in the previous example; however, this is thumb code and the values are packed into the stack differently: 8 first, then 5, 6 and 7 in a second step, and 4 in a third.

5.4.3 Optimizing Xcode (LLVM): ARM mode

Almost the same as what we have already seen, with the exception of the STMFA (Store Multiple Full Ascending) instruction, which is a synonym of the STMIB (Store Multiple Increment Before) instruction. This instruction first increases the value in the SP register and only then writes the next register value into memory, not the other way around.
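The difference between Increment Before and Increment After can be sketched in C as below. The names store_multiple() and stm_mode are hypothetical, memory is modelled as an array of 32-bit words addressed by byte offset, and base-register writeback is reduced to returning the final address.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { INCREMENT_AFTER, INCREMENT_BEFORE } stm_mode;

/* Simplified model of STMIA / STMIB: stores n words from regs[] into mem[],
   starting from byte offset `addr`. Returns the final address. */
uint32_t store_multiple(uint32_t *mem, uint32_t addr,
                        const uint32_t *regs, size_t n, stm_mode mode)
{
    for (size_t i = 0; i < n; i++) {
        if (mode == INCREMENT_BEFORE)
            addr += 4;              /* STMIB: advance first, then store */
        mem[addr / 4] = regs[i];
        if (mode == INCREMENT_AFTER)
            addr += 4;              /* STMIA: store first, then advance */
    }
    return addr;
}
```

Starting from the same address, the Increment Before variant leaves the first word untouched and places every value one slot higher than the Increment After variant does.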

Another thing we easily spot is that the instructions are ostensibly located randomly. For instance, the value in the R0 register is prepared in three places, at addresses 0x2918, 0x2920 and 0x2928, when it would be possible to do it in one single place.

However, the optimizing compiler has its own reasons for how to place instructions better. Usually, the processor attempts to simultaneously execute instructions located side by side. For example, instructions like "MOVT R0, #0" and "ADD R0, PC, R0" cannot be executed simultaneously, since they both modify the R0 register. On the other hand, the "MOVT R0, #0" and "MOV R2, #4" instructions can be executed simultaneously, since the effects of their execution do not conflict with each other. Presumably, the compiler tries to generate code in such a way where possible.
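The conflict check described here can be sketched in C under a deliberately simplified model: each instruction is reduced to the set of registers it reads and the set it writes, and two instructions are considered pairable only if neither touches a register the other writes. The names insn_regs, REG() and can_pair() are invented for the illustration; real processors apply further pairing rules that this sketch does not attempt to capture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register usage of one instruction, as bit masks over R0-R15. */
typedef struct {
    uint16_t reads;   /* bit n set: the instruction reads Rn  */
    uint16_t writes;  /* bit n set: the instruction writes Rn */
} insn_regs;

#define REG(n) ((uint16_t)(1u << (n)))

/* True if the two instructions have no register conflict in this model. */
bool can_pair(insn_regs first, insn_regs second)
{
    bool read_after_write  = (first.writes & second.reads)  != 0;
    bool write_after_write = (first.writes & second.writes) != 0;
    bool write_after_read  = (first.reads  & second.writes) != 0;
    return !(read_after_write || write_after_write || write_after_read);
}
```

With MOVT R0 modelled as reading and writing R0, ADD R0, PC, R0 as reading R0 and PC and writing R0, and MOV R2 as writing R2 only, can_pair() rejects the first pair from the text and accepts the second.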

5.4.4 Optimizing Xcode (LLVM): thumb-2 mode

__text:00002BA0 _printf_main2

In document 23 Hack in Sight 2014 (Page 52-55)