When we use a frame pointer, a nice property holds (you may have already deduced it from the figures above): local data is always at lower addresses than the address held in fp, while parameters passed on the stack (if any) are always at higher addresses than the one held in fp. Both kinds of data can therefore be accessed through fp, using negative offsets for local data and positive offsets for stack parameters.
In the following example we will use a function that receives an integer by reference (i.e., the address of an integer) and then squares that integer. In C that is:
void sq(int *c) {
    (*c) = (*c) * (*c);
}
You may be wondering why the function sq has a reference parameter (would it not be easier to return a value?), but bear with us for now. Because sq is so simple, we can (should?) implement it without using a frame pointer.
sq:
    ldr r2, [r0]       @ r2 <- (*r0)
    ldr r3, [r0]       @ r3 <- (*r0)
    mul r1, r2, r3     @ r1 <- r2 * r3
    str r1, [r0]       @ (*r0) <- r1
    bx lr              @ Return from the function
Now consider the following function that returns the sum of the squares of its five parameters. It uses the function sq defined above.
int sq_sum5(int a, int b, int c, int d, int e) {
    sq(&a);
    sq(&b);
    sq(&c);
    sq(&d);
    sq(&e);
    return a + b + c + d + e;
}
Parameters a, b, c and d will be passed through registers r0, r1, r2, and r3 respectively. The parameter e will be passed through the stack. The function sq, though, expects a reference, i.e., an address, to an integer, and registers do not have an address. This means we will have to allocate temporary local storage for the values in these registers. At least one integer will have to be allocated in the stack in order to be able to call sq, but for simplicity we will allocate four of them.
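The remark that a single stack integer would be enough can be illustrated in C. The following is only a sketch: the name sq_sum5_one_slot is hypothetical and does not appear in the text. It reuses one addressable temporary for every call to sq, whereas the assembler version below allocates four slots for simplicity.

```c
/* sq as defined in the text: squares the integer that c points to. */
void sq(int *c) {
    (*c) = (*c) * (*c);
}

/* Hypothetical variant of sq_sum5 (not from the text): it needs only one
 * addressable temporary, tmp, because each squared value is added to the
 * running sum before tmp is reused for the next parameter. */
int sq_sum5_one_slot(int a, int b, int c, int d, int e) {
    int tmp;      /* the single integer that must live in memory */
    int sum = 0;  /* accumulator, can stay in a register */

    tmp = a; sq(&tmp); sum += tmp;
    tmp = b; sq(&tmp); sum += tmp;
    tmp = c; sq(&tmp); sum += tmp;
    tmp = d; sq(&tmp); sum += tmp;
    tmp = e; sq(&tmp); sum += tmp;
    return sum;
}
```

The trade-off is one extra copy per parameter in exchange for a smaller frame; whether a compiler would actually generate code this way is not something this sketch claims.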
This time we will use a frame pointer to access both the local storage and the parameter e.
sq_sum5:
    push {fp, lr}            /* Keep fp and all callee-saved registers. */
    mov fp, sp               /* Set the dynamic link */
    sub sp, sp, #16          /* Allocate space for 4 integers in the stack */
    /* Keep parameters in the stack */
    str r0, [fp, #-16]       @ *(fp - 16) <- r0
    str r1, [fp, #-12]       @ *(fp - 12) <- r1
    str r2, [fp, #-8]        @ *(fp - 8) <- r2
    str r3, [fp, #-4]        @ *(fp - 4) <- r3
    /* At this point the stack looks like this
       | Value | Address(es)
       +-------+------------------------
       | r0    | [fp, #-16], [sp]
       | r1    | [fp, #-12], [sp, #4]
       | r2    | [fp, #-8],  [sp, #8]
       | r3    | [fp, #-4],  [sp, #12]
       | fp    | [fp],       [sp, #16]
       | lr    | [fp, #4],   [sp, #20]
       | e     | [fp, #8],   [sp, #24]
       v
       Higher addresses
    */
    sub r0, fp, #16          @ r0 <- fp - 16
    bl sq                    @ call sq(&a);
    sub r0, fp, #12          @ r0 <- fp - 12
    bl sq                    @ call sq(&b);
    sub r0, fp, #8           @ r0 <- fp - 8
    bl sq                    @ call sq(&c);
    sub r0, fp, #4           @ r0 <- fp - 4
    bl sq                    @ call sq(&d)
    add r0, fp, #8           @ r0 <- fp + 8
    bl sq                    @ call sq(&e)
    ldr r0, [fp, #-16]       @ r0 <- *(fp - 16). Loads a into r0
    ldr r1, [fp, #-12]       @ r1 <- *(fp - 12). Loads b into r1
    add r0, r0, r1           @ r0 <- r0 + r1
    ldr r1, [fp, #-8]        @ r1 <- *(fp - 8). Loads c into r1
    add r0, r0, r1           @ r0 <- r0 + r1
    ldr r1, [fp, #-4]        @ r1 <- *(fp - 4). Loads d into r1
    add r0, r0, r1           @ r0 <- r0 + r1
    ldr r1, [fp, #8]         @ r1 <- *(fp + 8). Loads e into r1
    add r0, r0, r1           @ r0 <- r0 + r1
    mov sp, fp               /* Undo the dynamic link */
    pop {fp, lr}             /* Restore fp and callee-saved registers */
    bx lr                    /* Return from the function */
As you can see, we first store all the parameters (except e) in the local storage. This means that we need to enlarge the stack enough, as usual, by subtracting from sp. Once we have the storage, we can do the actual stores through the fp register. Note the use of negative offsets, because local data is always at lower addresses than the address in fp. As mentioned above, the parameter e does not have to be stored because it is already in the stack, at a positive offset from fp (i.e., at a higher address than the address in fp).
Note that, in this example, the frame pointer is not indispensable, as we could have used sp to access all the required data (see the representation of the stack).
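The equivalence between the two kinds of address in the stack diagram comes from sp = fp - 16 after the allocation, so fp + k is the same address as sp + (k + 16). A small C model can make that arithmetic concrete. It is only a sketch: the frame is modelled as an array of seven words, and the names fp_to_sp_offset and load_via_fp are hypothetical helpers, not part of the program above.

```c
/* Hypothetical model of the sq_sum5 frame (not from the text).
 * The frame is an array of 4-byte words, lowest address first, so that
 * frame[0] is the word at [sp] and frame[6] is the parameter e. */
enum { WORD_BYTES = 4, LOCAL_BYTES = 16 };

/* Convert an fp-relative byte offset into the equivalent sp-relative one.
 * After "sub sp, sp, #16" we have sp = fp - 16, hence the + 16. */
int fp_to_sp_offset(int fp_offset) {
    return fp_offset + LOCAL_BYTES;
}

/* Read the word that the assembler would load with [fp, #fp_offset]. */
int load_via_fp(const int *frame, int fp_offset) {
    return frame[fp_to_sp_offset(fp_offset) / WORD_BYTES];
}
```

For instance, with a frame holding a, b, c, d, the saved fp, the saved lr and e in that order, load_via_fp(frame, -16) reads a and load_via_fp(frame, 8) reads e, matching the rows [fp, #-16], [sp] and [fp, #8], [sp, #24] of the diagram.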
In order to call sq we have to pass the addresses of the several integers, so we compute each address by subtracting the proper offset from fp and storing the result in r0, which is used for passing the first (and only) parameter of sq. See how, to pass the address of e, we just compute an address with a positive offset. Finally we add the values by loading them again into r0 and r1 and using r0 to accumulate the sum.
An example program that calls sq_sum5(1, 2, 3, 4, 5) looks like this.
/* squares.s */
.data
message: .asciz "\n Sum of 1^2 + 2^2 + 3^2 + 4^2 + 5^2 is %d\n"

.text
sq: <<defined above>>
sq_sum5: <<defined above>>

.globl main
main:
    push {r4, lr}            /* Keep callee-saved registers */
    /* Prepare the call to sq_sum5 */
    mov r0, #1               @ Parameter a <- 1
    mov r1, #2               @ Parameter b <- 2
    mov r2, #3               @ Parameter c <- 3
    mov r3, #4               @ Parameter d <- 4
    /* Parameter e goes through the stack, so it requires enlarging the stack */
    mov r4, #5               @ r4 <- 5
    sub sp, sp, #8           /* Enlarge the stack 8 bytes, we will use only the topmost 4 bytes */
    str r4, [sp]             @ Parameter e <- 5
    bl sq_sum5               @ call sq_sum5(1, 2, 3, 4, 5)
    add sp, sp, #8           @ Shrink back the stack
    /* Prepare the call to printf */
    mov r1, r0               @ The result of sq_sum5
    ldr r0, =message
    bl printf                @ Call printf
    pop {r4, lr}             /* Restore callee-saved registers */
    bx lr
$ ./squares
Sum of 1^2 + 2^2 + 3^2 + 4^2 + 5^2 is 55
Projects
1. Rewrite some earlier programs to pass the parameters by reference.
2. Rewrite some earlier programs to reference variables using the frame pointer, even if it is not necessary.
18 Inline Assembler in C Code
Although we have concentrated on Assembler programming, we must admit that almost every microprocessor system comes with an excellent C compiler and it is expected that programmers will write in C. Sometimes, however, it is necessary to optimize the code by manually inserting assembler code into the C source code. Based on the GNU C compiler, most manufacturers allow for inline assembler code. Unfortunately, each company does it slightly differently so one must learn the details of each system that one uses.
18.1 The asm Statement
The format of the new asm statement is as follows:
asm("<list of assembler instructions>"
    : <list of write-only parameter formats and names>
    : <list of read-only parameter formats and names>
    : <list of registers that are changed (clobbered) by the code> );
The details of using this statement are many and complicated. Some requirements are unfortunate but depend on the whim of the compiler writer.
If there are no registers that are modified, the third colon “:” is not necessary. If there are no write-only parameters but some read-only ones, they must be preceded by two colons. The following is an example where there are no parameters at all:
int return1(void) {asm("mov r0, #1");}
This trivial function returns the integer value 1 in the usual register r0. No colons are necessary. It is not necessary to inform the compiler that r0 is being changed.
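The case of read-only parameters with no write-only ones, where two colons precede the list, might look like the following sketch. The function name store_word is hypothetical. The asm path is compiled only by GCC-compatible compilers targeting 32-bit ARM; the plain C fallback is an assumption added here so that the example also builds elsewhere. The "memory" entry in the third list is GCC's way of saying that the code writes to memory rather than only to registers.

```c
/* Hypothetical helper (not from the text): stores value at the address
 * dest, using an asm statement that has inputs but no outputs. */
void store_word(int *dest, int value) {
#if defined(__GNUC__) && defined(__arm__)
    /* The output list is empty, so the input list follows two colons.
     * %0 is value and %1 is dest, numbered in the order they are listed.
     * Note: with a strict ISO mode such as -std=c99, GCC spells the
     * keyword __asm__ instead of asm. */
    asm volatile("str %0, [%1]"
                 :
                 : "r"(value), "r"(dest)
                 : "memory");
#else
    /* Portable fallback, an assumption of this sketch. */
    *dest = value;
#endif
}
```

Calling store_word(&x, 42) leaves 42 in x on either path.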
Another simple example is the following complete C program, which initializes two global variables “i” and “j”; it takes no parameters and returns nothing. After running the program, the output should be “i = 100 and j = 200”.
#include <stdio.h>

int i, j;   /* The two global variables initialized below */

void Initialization(void) {
    asm("mov r0,#0; add %0,r0,#100; add %1,r0,#200" : "+r"(i), "+r"(j));
}

void main(void) {
    Initialization();
    printf("\n i = %d and j = %d \n", i, j);
}
Since there is just one colon before the parameter formats and names, they belong to the first list, the one for parameters that the assembler code writes. Because of the ordering of the list, variable “i” is referenced by “%0” and variable “j” by “%1”. This count may be continued for additional parameters. [This is an older method that still works. A more recent way to refer to names is given below.] The formats are both “+r” (the strings themselves) in this case; the “+” marks the parameter as read-write rather than write-only. The choices include “=r” for a write-only parameter with an integer register associated with it, and “=f” for a write-only parameter with a floating point register associated. By the way, the variables may be either global or local in general. We also note that the number of colons and the prefix “=” are redundant, but that is how it is done.
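To show a write-only “=r” parameter together with read-only ones, here is a minimal sketch. The function name add_regs is hypothetical. As an assumption of this sketch, the asm path is guarded so that it is only compiled by GCC-compatible compilers targeting 32-bit ARM, and a plain C fallback is used elsewhere. The numbering continues across the lists: the output is “%0” and the two inputs are “%1” and “%2”.

```c
/* Hypothetical helper (not from the text): returns a + b, computed by a
 * single ARM add instruction when built for 32-bit ARM with GCC. */
int add_regs(int a, int b) {
    int sum;
#if defined(__GNUC__) && defined(__arm__)
    /* One colon, then the write-only output sum ("=r", referenced as %0);
     * a second colon, then the read-only inputs a (%1) and b (%2).
     * No other register is changed, so the third colon is omitted. */
    asm("add %0, %1, %2"
        : "=r"(sum)
        : "r"(a), "r"(b));
#else
    /* Portable fallback, an assumption of this sketch. */
    sum = a + b;
#endif
    return sum;
}
```

Because the compiler chooses the registers for all three parameters, the code never names a specific register such as r0 and therefore needs no clobber list.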
The style of programming we will use here will be to write the I/O portion of the program in C and do most of the actual processing in called functions written in assembler. This follows the HIPO (Hierarchical Input Process Output) model nicely.