X86 CHAPTER 13. STRLEN() - switch()/case/default


13.1. X86 CHAPTER 13. STRLEN()

First, about MOVSX (13.1.1). It takes a byte from memory and stores the value in a 32-bit register. MOVSX means MOV with Sign-Extend. It sets the remaining bits, from the 8th through the 31st, to 1 if the source byte is negative, or to 0 if it is positive.

Here is why this is needed.

MSVC and GCC on x86 treat the char type as signed by default (the C/C++ standards leave the signedness of plain char to the implementation). Suppose we have two values, one char and the other int (int is signed too). If the first value contains -2 (coded as 0xFE) and we just copy this byte into the int container, we get 0x000000FE, which as a signed int is 254, not -2. In a signed int, -2 is coded as 0xFFFFFFFE. So to transfer the value 0xFE from a char variable to an int, we need to identify its sign and extend it. That is what MOVSX (13.1.1) does.

See also the section "Signed number representations" (32).
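The 0xFE example above can be checked with a small C sketch. The function names here are hypothetical and only model what the two kinds of widening do to a byte; they are not what the compiler emits.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a byte the way MOVSX does: bit 7 is copied into bits 8..31. */
int32_t widen_signed(uint8_t byte)
{
    /* Bytes 0x80..0xFF stand for the negative values -128..-1. */
    return byte >= 0x80 ? (int32_t)byte - 256 : (int32_t)byte;
}

/* Widen a byte with no sign handling: bits 8..31 become 0. */
int32_t widen_unsigned(uint8_t byte)
{
    return (int32_t)byte;
}

/* Self-check using the values from the text. */
void check_widening(void)
{
    assert(widen_unsigned(0xFE) == 254);                  /* 0x000000FE */
    assert(widen_signed(0xFE) == -2);                     /* 0xFFFFFFFE */
    assert((uint32_t)widen_signed(0xFE) == 0xFFFFFFFEu);
    assert(widen_signed(0x7F) == widen_unsigned(0x7F));   /* positive: same */
}
```

For bytes below 0x80 both functions agree, so the difference only shows up for values with the top bit set.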

I'm not sure the compiler needs to store the char variable in EDX; it could use an 8-bit register part (say, DL). Apparently, the compiler's register allocator works like that.

Then we see TEST EDX, EDX. Read more about the TEST instruction in the section about bit fields (17). Here, this instruction just checks whether the value in EDX equals 0.
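As a rough model of this check: TEST ANDs its two operands, throws the result away, and sets ZF when the result is 0. The helper names below are made up for illustration, and only ZF is modelled (the real instruction also updates other flags).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ZF after TEST a, b: the AND result is discarded, only the flag is kept. */
bool zf_after_test(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}

/* TEST EDX, EDX: x AND x is x, so ZF is set only when EDX itself is 0. */
bool edx_is_zero(uint32_t edx)
{
    return zf_after_test(edx, edx);
}

void check_test(void)
{
    assert(edx_is_zero(0));              /* end of string reached */
    assert(!edx_is_zero('h'));           /* ordinary character */
    assert(!edx_is_zero(0xFFFFFFFEu));   /* sign-extended -2 */
}
```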

13.1.2 Non-optimizing GCC

The result is almost the same as in MSVC, but here we see MOVZX instead of MOVSX (13.1.1). MOVZX means MOV with Zero-Extend. This instruction copies an 8-bit or 16-bit value into a 32-bit register and sets the remaining bits to 0. In fact, it is convenient mainly because it lets us replace two instructions at once: xor eax, eax / mov al, [...].

On the other hand, it is obvious to us that the compiler could produce this code: mov al, byte ptr [eax] / test al, al. It is almost the same; however, the highest bits of the EAX register would contain random noise. Let's consider this the compiler's drawback: it cannot produce more understandable code. Strictly speaking, the compiler is not obliged to emit code that is understandable to humans at all.
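To make the "random noise" remark concrete, here is a sketch that models a 32-bit register as a uint32_t. The function names and the stale value 0xDEADBE00 are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* mov al, byte: only bits 0..7 of EAX change, bits 8..31 keep old contents. */
uint32_t mov_al(uint32_t eax, uint8_t byte)
{
    return (eax & 0xFFFFFF00u) | byte;
}

/* movzx eax, byte: the whole register is written, bits 8..31 become 0. */
uint32_t movzx(uint8_t byte)
{
    return byte;
}

/* xor eax, eax / mov al, byte: the two-instruction equivalent of movzx. */
uint32_t xor_then_mov_al(uint8_t byte)
{
    return mov_al(0, byte);
}

void check_partial_write(void)
{
    uint32_t stale = 0xDEADBE00u;                   /* leftover register contents */
    assert(mov_al(stale, 0x68) == 0xDEADBE68u);     /* noise stays in high bits */
    assert(movzx(0x68) == 0x00000068u);
    assert(xor_then_mov_al(0x68) == movzx(0x68));
    assert((mov_al(stale, 0) & 0xFFu) == 0);        /* test al, al still sees 0 */
}
```

The last assertion is the reason the shorter mov al / test al, al pair would still work: the comparison only looks at the low byte.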

The next new instruction for us is SETNZ. Here, if AL contains a non-zero value, test al, al sets the ZF flag to 0, and SETNZ, if ZF==0 (NZ means not zero), sets AL to 1. In natural language: if AL is not zero, jump to loc_80483F0. The compiler emitted slightly redundant code, but let's not forget that optimization is turned off.
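The TEST/SETNZ pair can be modelled in a few lines of C. This is only a sketch with a hypothetical function name; the jump to loc_80483F0 that follows in the listing is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* test al, al / setnz al */
uint8_t test_then_setnz(uint8_t al)
{
    int zf = (al == 0);        /* TEST AL, AL: ZF = 1 only if AL is 0 */
    return zf == 0 ? 1 : 0;    /* SETNZ: write 1 if ZF == 0, otherwise 0 */
}

void check_setnz(void)
{
    assert(test_then_setnz('h') == 1);    /* character loaded: keep looping */
    assert(test_then_setnz(0) == 0);      /* zero byte: loop ends */
    assert(test_then_setnz(0xFF) == 1);
}
```

In C terms the whole pair is just the value of the expression al != 0.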

13.1.3 Optimizing MSVC

Now let's compile all this in MSVC 2012, with optimization turned on (/Ox):

Listing 13.1: MSVC 2012 /Ox /Ob0

_str$ = 8 ; size = 4

_strlen PROC

mov edx, DWORD PTR _str$[esp-4] ; EDX -> pointer to the string

mov eax, edx ; move to EAX

$LL2@strlen:

mov cl, BYTE PTR [eax] ; CL = *EAX

inc eax ; EAX++

test cl, cl ; CL==0?

jne SHORT $LL2@strlen ; no, continue loop

sub eax, edx ; calculate pointers difference

dec eax ; decrement EAX

ret 0

_strlen ENDP

Now it is all simpler. Needless to say, the compiler can use registers this efficiently only in small functions with a small number of local variables.

INC/DEC are the increment/decrement instructions; in other words, they add 1 to a variable or subtract 1 from it.
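Reading the optimized listing back into C gives roughly the following. This is a hand-made transliteration, not the original source; the function name is hypothetical, the variables are named after the registers they mirror, and size_t is my choice of return type.

```c
#include <assert.h>
#include <stddef.h>

size_t strlen_like_listing(const char *str)   /* EDX holds str */
{
    assert(str != NULL);
    const char *eax = str;               /* mov eax, edx */
    char cl;
    do {
        cl = *eax;                       /* mov cl, BYTE PTR [eax] */
        eax++;                           /* inc eax */
    } while (cl != 0);                   /* test cl, cl / jne SHORT $LL2@strlen */
    return (size_t)(eax - str) - 1;      /* sub eax, edx / dec eax */
}
```

Note that the pointer is incremented before the byte is tested, so after the loop it has moved past the terminating zero byte; that is why the final decrement is needed.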

13.1.4 Optimizing MSVC + OllyDbg

We may try this (optimized) example in OllyDbg. Here is the very first iteration: fig. 13.1. We see that OllyDbg found a loop and, for convenience, wrapped its instructions in a bracket. By right-clicking on EAX, we can choose "Follow in Dump" and the memory window will scroll to the right place. We can see the string "hello!" in memory. There is at least one zero byte after it and then random garbage. If OllyDbg sees that a register holds an address pointing to a string, it shows the string.

Let's press F8 (step over) enough times so that the current address is at the beginning of the loop body again: fig. 13.2. We see that EAX contains the address of the second character in the string.

We press F8 enough times to escape from the loop: fig. 13.3. We see that EAX now contains the address of the byte right after the zero byte that terminates the string. Meanwhile, EDX has not changed, so it still points to the beginning of the string. The difference between these two addresses will be calculated now.

The SUB instruction has just been executed: fig. 13.4. The difference in EAX is 7. Indeed, the length of the "hello!" string is 6, but with the zero byte included it is 7. But strlen() must return the number of non-zero characters in the string. So the decrement will be executed now, and then the function returns.
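The 7-versus-6 arithmetic from the debugging session can be reproduced without a debugger. The sketch below (hypothetical function name) returns the raw pointer difference, that is, the value EAX holds right after SUB and before DEC.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes stepped over by the loop, terminating zero byte included. */
ptrdiff_t bytes_walked(const char *str)
{
    const char *eos = str;
    while (*eos++ != 0)
        ;                     /* the pointer moves even when the byte read is 0 */
    return eos - str;
}

void check_hello(void)
{
    ptrdiff_t diff = bytes_walked("hello!");
    assert(diff == 7);        /* six characters plus the zero byte */
    assert(diff - 1 == 6);    /* after the decrement: the string length */
}
```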


Figure 13.1: OllyDbg: first iteration begin

Figure 13.2: OllyDbg: second iteration begin

Figure 13.3: OllyDbg: pointer difference to be calculated now


Figure 13.4: OllyDbg: EAX to be decremented now

13.1.5 Optimizing GCC

Let's check GCC 4.4.1 with optimization turned on (-O3 key):

public strlen

strlen proc near

arg_0 = dword ptr 8

push ebp

mov ebp, esp

mov ecx, [ebp+arg_0]

mov eax, ecx

loc_8048418:

movzx edx, byte ptr [eax]

add eax, 1

test dl, dl

jnz short loc_8048418

not ecx

add eax, ecx

pop ebp

retn

strlen endp

Here GCC is almost the same as MSVC, except for the presence of MOVZX.

However, MOVZX could be replaced here with mov dl, byte ptr [eax].

Probably, it is simpler for GCC's code generator to remember that the whole register is allocated for the char variable, so it can be sure the highest bits will not contain any noise at any point.

After that, we also see a new instruction, NOT. This instruction inverts all bits in its operand. It is effectively a synonym for the XOR ECX, 0ffffffffh instruction. NOT and the following ADD calculate the pointer difference and subtract 1. First ECX, where the pointer to str is stored, is inverted, which is the same as negating it and subtracting 1; then the result is added to EAX.

See also: "Signed number representations" (32).

In other words, at the end of the function, just after the loop body, these operations are executed:

ecx=str;

eax=eos;

ecx=(-ecx)-1;

eax=eax+ecx;

return eax

. . . and this is effectively equivalent to:

ecx=str;

eax=eos;

eax=eax-ecx;

eax=eax-1;

return eax

Why did GCC decide this would be better? I cannot be sure. But I'm sure both variants are effectively equivalent in terms of efficiency.
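That the two variants give the same result can be checked with 32-bit unsigned arithmetic, where NOT x equals (-x)-1 modulo 2^32. The sketch below uses hypothetical function names and made-up addresses.

```c
#include <assert.h>
#include <stdint.h>

/* GCC variant: not ecx / add eax, ecx */
uint32_t length_not_add(uint32_t str, uint32_t eos)
{
    uint32_t ecx = ~str;         /* not ecx: equals (-str) - 1 modulo 2^32 */
    return eos + ecx;            /* add eax, ecx */
}

/* MSVC variant: sub eax, edx / dec eax */
uint32_t length_sub_dec(uint32_t str, uint32_t eos)
{
    uint32_t eax = eos - str;    /* sub eax, edx */
    return eax - 1;              /* dec eax */
}

void check_equivalence(void)
{
    /* Made-up addresses: a 6-character string, eos one past its zero byte. */
    assert(length_not_add(0x08048500u, 0x08048507u) == 6);
    assert(length_sub_dec(0x08048500u, 0x08048507u) == 6);
    assert(~0x08048500u == (0u - 0x08048500u) - 1u);
    /* The identity also holds when the addition wraps around. */
    assert(length_not_add(0xFFFFFFFCu, 0x00000003u) ==
           length_sub_dec(0xFFFFFFFCu, 0x00000003u));
}
```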

On ARM, non-optimizing LLVM generates too much code; however, here we can see how the function works with local variables in the stack. There are only two local variables in our function, eos and str.

In this listing, generated by IDA, I manually renamed var_8 and var_4 to eos and str.

So, the first instructions just save the input value in str and eos.

The loop body begins at the loc_2CB8 label.

The first three instructions in the loop body (LDR, ADD, STR) load the value of eos into R0; then the value is incremented and saved back into the eos local variable located in the stack.

The next instruction, "LDRSB R0, [R0]" (Load Register Signed Byte), loads a byte from memory at the address in R0 and sign-extends it to 32 bits. This is similar to the MOVSX (13.1.1) instruction in x86. The compiler treats this byte as signed because the char type is signed here (the C standard leaves this to the implementation). I already wrote about this (13.1.1) in this section, in relation to x86.

It should be noted that in ARM it is impossible to use an 8-bit or 16-bit part of a 32-bit register separately from the whole register, as can be done in x86. Apparently, this is because x86 has a long history of compatibility with its ancestors, like the 16-bit 8086 and even the 8-bit 8080, while ARM was developed from scratch as a 32-bit RISC processor. Consequently, in order to process separate bytes in ARM, one has to use 32-bit registers anyway.

So, LDRSB loads the symbols from the string into R0, one by one. The next CMP and BEQ instructions check whether the loaded symbol is 0. If it is not 0, control passes to the beginning of the loop body. If it is 0, the loop finishes.

At the end of the function, the difference between eos and str is calculated, 1 is subtracted, and the resulting value is returned via R0.

N.B. Registers were not saved in this function. That is because, by the ARM calling convention, registers R0-R3 are "scratch registers": they are intended for passing arguments, and their values need not be restored upon function exit, since the calling function will not use them anymore. Consequently, they may be used for anything we want. Other registers are not used here, which is why we have nothing to save on the stack. Thus, control may be returned to the calling function by a simple jump (BX) to the address in the LR register.
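The steps described above can be written out in C with every load and store made explicit. This is a sketch of the described behaviour, not a decompilation: the function name is hypothetical, and r0 and loaded merely stand in for the register at different moments.

```c
#include <assert.h>
#include <stddef.h>

int strlen_with_stack_locals(const char *s)
{
    assert(s != NULL);
    const char *str = s;    /* first instructions: save the input in str ... */
    const char *eos = s;    /* ... and in eos */
    for (;;) {                                      /* loop body begins here */
        const char *r0 = eos;                       /* LDR: load eos */
        eos = r0 + 1;                               /* ADD, STR: store eos + 1 */
        int loaded = *(const signed char *)r0;      /* LDRSB R0, [R0] */
        if (loaded == 0)                            /* CMP / BEQ */
            break;                                  /* zero byte: leave the loop */
    }
    return (int)(eos - str - 1);    /* difference minus 1, returned via R0 */
}
```

Reading the byte through a signed char pointer mirrors the sign extension of LDRSB; for the zero test itself the sign does not matter.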


In document 23 Hack in Sight 2014 (Page 119-124)