Introduction to the GNU Assembler - Introduction to Assembly Language

Introduction to Assembly Language

5.4 Introduction to the GNU Assembler

The GNU Assembler, part of the GNU tools, is used to convert assembly language source code into binary object files. The assembler is extensively documented in the GNU Assembler Manual, which can be found online at http://sourceware.org/binutils/docs/as/index.html or (if you have GNU tools installed on your system) in the gnutools/doc sub-directory.

What follows is a brief description, intended to highlight differences in syntax between the GNU Assembler and standard ARM assembly language, and to provide enough information to allow programmers to get started with the tools.

The names of GNU tool components have prefixes indicating the target options selected, including the operating system. An example is arm-none-eabi-gcc, which might be used for bare metal systems using the ARM EABI (described in Chapter 15 Application Binary Interfaces).

5.4.1 Invoking the GNU Assembler

You can assemble the contents of an ARM assembly language source file by running the arm-none-eabi-as program:

arm-none-eabi-as -g -o filename.o filename.s

The option -g requests the assembler to include debug information in the output file.

When all of your source files have been assembled into binary object files (with the extension .o), you use the GNU Linker to create the final executable in ELF format.

This is done by executing:

arm-none-eabi-ld -o filename.elf filename.o

For more complex programs, where there are many separate source files, it is more common to use a utility like make to control the build process.

You can use the debugger provided by either arm-none-eabi-gdb or arm-none-eabi-insight to run the executable files on your host, as an alternative to a real target processor.

5.4.2 GNU Assembly language syntax

The GNU Assembler can target many different processor architectures and is not ARM-specific. It uses the same syntax for all of the processor architectures that it supports, which means that its syntax is somewhat different from that of other ARM assemblers, such as armasm in the ARM toolchain.

Assembly language source files consist of a sequence of statements, one per line. Each statement has three optional parts, ordered as follows:

label: instruction @ comment

A label lets you identify the address of this instruction. This can then be used as a target for branch instructions or for load and store instructions. A label can be a letter followed (optionally) by a sequence of alphanumeric characters, followed by a colon.

Everything on the line after the @ symbol is treated as a comment and ignored (unless it is inside a string). C style comment delimiters “/*” and “*/” can also be used.

The instruction can be either an ARM assembly instruction or an assembler directive. Directives are pseudo-instructions that tell the assembler itself to do something. They are required, among other things, to control sections and alignment, or to create data.
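As a minimal sketch of this statement format, the fragment below combines labels, instructions, a directive, and both comment styles. The label names loop and done and the choice of instructions are illustrative assumptions, not taken from the GNU documentation.

```asm
        .text                   @ directive: what follows goes in the .text section
loop:   subs    r0, r0, #1      @ label, instruction, and comment on one line
        bne     loop            @ no label here; the branch uses loop as its target
        /* a C style comment can stand on a line by itself */
done:                           @ a label alone marks the address of the next statement
        bx      lr              @ return to the caller
```

Each of the three parts is optional, so a line can hold only a label, only an instruction, or only a comment.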


At link time, an entry point can be specified on the command line if one has not been explicitly provided in the source code.
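The following sketch shows one way an entry point might be provided in the source. It assumes the common GNU Linker convention that the default entry symbol is called _start (this depends on the linker script in use), and the label stop is hypothetical. On a bare metal system there is nothing to return to, so the sketch simply loops.

```asm
        .text
        .global _start          @ make the symbol visible to the linker
_start:                         @ assumed default entry symbol
        mov     r0, #0
stop:   b       stop            @ nothing to return to, so spin here
```

If the entry label had a different name, for example a hypothetical global label main_entry, the GNU Linker's -e option could name it instead:

arm-none-eabi-ld -e main_entry -o filename.elf filename.o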

5.4.3 Sections

An executable program with code will have at least one section, which by convention will be called .text. Data can be included in a .data section.
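As a sketch of how a source file might be divided between these two sections, the fragment below places a small function in .text and the variable it updates in .data. The names bump_counter and counter are assumptions made for illustration.

```asm
        .text                   @ executable code
        .global bump_counter    @ hypothetical function name
bump_counter:
        ldr     r0, =counter    @ address of the variable held in .data
        ldr     r1, [r0]
        add     r1, r1, #1
        str     r1, [r0]        @ store the incremented value back
        bx      lr

        .data                   @ initialized read/write data
counter:
        .word   0
```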

Directives with the same names enable you to specify which of the two sections should hold what follows in the source file. Executable code should appear in a .text section and read/write data in the .data section. Read-only constants can also appear in a .rodata section. Zero-initialized data will appear in .bss. The Block Started by Symbol (bss) segment defines the space for uninitialized static data.

5.4.4 Assembler directives

This is a key area of difference between GNU tools and other assemblers.

All assembler directives begin with a period (“.”). A full list is given in the GNU documentation. Here, we give a subset of commonly used directives.

.align

This causes the assembler to pad the binary with bytes of zero value, in data sections, or NOP instructions in code, ensuring the next location will be on a word boundary.

.ascii “string”

Insert the string literal into the object file exactly as specified, without a NUL character to terminate. Multiple strings can be specified using commas as separators.

.asciz “string”

Does the same as .ascii, but the string is additionally followed by a NUL character (a byte with the value 0).

.byte expression, .hword expression, .word expression

Inserts a byte, halfword, or word value into the object file. Multiple values can be specified using commas as separators. The synonyms .2byte and .4byte can also be used.

.data

Causes the following statements to be placed in the data section of the final executable.

.end

Marks the end of this source code file.

.equ symbol, expression

Sets the value of symbol to expression. The “=” symbol and .set have the same effect.

.extern symbol

Indicates to the assembler (and more importantly, to anyone reading the code) that symbol is defined in another source code file.

.global symbol

Tells the assembler that symbol is to be made globally visible to other source files and to the linker.

.include “filename”

Inserts the contents of filename into the current source file and is typically used to include header files containing shared definitions.

.text

This switches the destination of following statements into the text section of the final output object file. Assembly instructions must always be in the text section.

For reference, Table 5-1 compares common directives and syntax in the GNU Assembler and armasm. Not all directives are listed, and in some cases there is not a 100% correspondence between them.

Table 5-1 Comparison of syntax

GNU Assembler   armasm          Description
@               ;               Comment
#0x             #&              An immediate hex value
.if             IFDEF, IF       Conditional (not 100% equivalent)
.else           ELSE
.elseif         ELSEIF
.endif          ENDIF
.ltorg          LTORG
|               :OR:            OR
&               :AND:           AND
<<              :SHL:           Shift Left
>>              :SHR:           Shift Right
.macro          MACRO           Start macro definition
.endm           ENDM            End macro definition
.include        INCLUDE         GNU Assembler needs “file”
.word           DCD             A data word
.short          DCW
.long           DCD
.byte           DCB
.req            RN
.global         IMPORT, EXPORT
.equ            EQU
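Several of the directives described in this section can be combined as in the following sketch. All symbol names (BUF_SIZE, add_const, count, directive_demo, message, raw, flags, half, and table) are hypothetical and chosen only for illustration.

```asm
        .equ    BUF_SIZE, 16            @ symbolic constant

        @ hypothetical macro: adds a constant to a register
        .macro  add_const reg, value
        add     \reg, \reg, #\value
        .endm

        @ count becomes another name for r2
count   .req    r2

        .text
        .global directive_demo          @ visible to other files and the linker
directive_demo:
        mov     count, #0
        add_const count, BUF_SIZE       @ expands to: add r2, r2, #16
        mov     r0, count               @ r0 = 16
        bx      lr

        .data
        .global message
message:
        .asciz  "Hi"                    @ 3 bytes: 'H', 'i', NUL
raw:
        .ascii  "AB", "CD"              @ 4 bytes, no terminating NUL
flags:
        .byte   1, 2, 3                 @ three single bytes
        .align                          @ pad to a word boundary
half:
        .hword  0x1234                  @ one 16-bit value
        .align
table:
        .word   BUF_SIZE, BUF_SIZE * 2  @ two 32-bit values: 16 and 32
```

The .align directives are placed before half and table because the byte and string data that precede them can leave the current location unaligned.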


5.4.5 Expressions

Assembly instructions and assembler directives often require an integer operand. In the assembler, this is represented as an expression to be evaluated. Typically, this will be an integer number specified in decimal, hexadecimal (with a 0x prefix) or binary (with a 0b prefix) or as an ASCII character surrounded by quotes.

In addition, standard mathematical and logical expressions can be evaluated by the assembler to generate a constant value. These can utilize labels and other pre-defined values. These expressions produce either absolute or relative values. Absolute values are position-independent and constant. Relative values are specified relative to some linker-defined address, determined when the executable image is produced – an example might be some offset from the start of the .data section of the program.
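The following sketch illustrates these kinds of expression. The symbol names (MASK, BITS, COMBINED, msg, MSG_LEN, msg_ptr, and expr_demo) are assumptions made for the example.

```asm
        .equ    MASK, 0xFF              @ hexadecimal constant
        .equ    BITS, 0b1010            @ binary constant, decimal 10
        .equ    COMBINED, (1 << 4) | 3  @ evaluated by the assembler: 16 OR 3 = 19

        .data
msg:
        .ascii  "hello"
        .equ    MSG_LEN, . - msg        @ difference of two addresses in one section: absolute value 5
        .align
msg_ptr:
        .word   msg                     @ relative value: the address of msg, fixed when the image is linked

        .text
        .global expr_demo
expr_demo:
        mov     r0, #(MASK & 0x0F)      @ 15
        mov     r1, #'A'                @ ASCII character constant, 65
        mov     r2, #MSG_LEN            @ 5
        add     r3, r2, #COMBINED       @ 5 + 19 = 24
        bx      lr
```

MSG_LEN is absolute even though it is built from labels, because the two addresses move together when the section is placed. The value stored at msg_ptr is relative, because it depends on where the linker places .data.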

5.4.6 GNU tools naming conventions

Registers are named in GCC as follows:

• General registers: R0 - R15

• Stack pointer register: SP(R13)

• Frame pointer register: FP(R11)

• Link register: LR(R14)

• Program counter: PC(R15)

• Status register flags (x = C for current or S for saved): xPSR, xPSR_all, xPSR_f, xPSR_x, xPSR_ctl, xPSR_fs, xPSR_fx, xPSR_fc, xPSR_cs, xPSR_cf, xPSR_cx, and so on.
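As a short sketch of these names in use, the hypothetical function reg_names below refers to the special-purpose registers by name and accesses the flags field of the CPSR. The GNU Assembler accepts register names in upper or lower case; lower case is used here.

```asm
        .text
        .global reg_names       @ hypothetical function name
reg_names:
        push    {fp, lr}        @ same as push {r11, r14}
        mov     fp, sp          @ same as mov r11, r13
        mrs     r0, cpsr        @ read the current program status register
        msr     cpsr_f, r0      @ write back only the flags field
        mov     sp, fp
        pop     {fp, pc}        @ loading pc (r15) returns to the caller
```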

Note

In Chapter 15 Application Binary Interfaces we will see how all of the registers are assigned a role within the procedure call standard and that the GNU Assembler lets us refer to the registers using their PCS names. See Table 15-1 on page 15-2.