From Assembly to an Executable Memory Image
This handout walks through a detailed example of the steps required to get a machine's processor to perform a task on data held in that machine's memory. Your operating system generally follows these same basic steps when you ask it to run a program. Once understood, they will solidify your understanding of how a machine runs programs.
State of the World
We have seen how to write assembly code in an ASCII file, assemble it using the assembler program as, extract the binary opcodes of its instructions using the program objcopy, and load them into memory using the Gdb loadmem function. So far, we have performed these steps manually. However, they can be automated through the use of Makefiles.
Makefiles
Inside your A1P3A directories, you will find a file called “Makefile”. This Makefile is used by the Unix
“make” utility to guide your computer through “making” your program ready for execution by a processor.
Here is a snippet of your A1P3A Makefile:
[awadyn@csa2 A1P3A]$ cat Makefile
# Copyright (C) 2020 by Jonathan Appavoo, Boston University
# …
# …
# …
… …
and: and.resbin     <--- MAKE THIS TARGET TO TEST YOUR SOLUTION
	echo HEXDUMP of and.resbin
	hexdump and.resbin
	echo Testing and.resbin
	bash -c 'if md5sum and.resbin | grep $$(cat and.md5) ;\
	then echo and: PASS; else echo and: FAIL; fi'
and.resbin: andtest.elf
	gdb -x andtest.gdb
andtest.elf: and.o andtest.obj
	${LD} -e 0x1000 -M -T ${LDS} andtest.obj and.o -o andtest.elf > andtest.map
and.elf: and.o
	${LD} -e 0x1000 -M -T ${LDS} and.o -o and.elf > and.map
and.o: and.s
	${AS} -g -aglsm=and.lst and.s -o and.o
	objdump -r and.o > and.relocations
… …
Makefiles: Targets
A Makefile generally contains a set of targets that can be “made”. These targets are represented by the filenames to the left of the colon (‘:’) symbol. A target can be “made” by invoking the make utility on it:
[awadyn@csa2 A1P3A]$ make and.o
as -g -aglsm=and.lst and.s -o and.o          (1)
objdump -r and.o > and.relocations           (2)
The commands that you see printed out on your terminal are the commands that were executed in order to make your target:
as -g -aglsm=and.lst and.s -o and.o (1)
You have seen this command before: it calls the assembler program (as), telling it to convert your ASCII and.s file to a binary and.o object file.
In addition, the option -aglsm=and.lst creates an assembly listing file called and.lst that contains general information (-ag) about your programming environment, your assembly code (-al), and any symbols (-as) or macros (-am) defined in your assembly code. These flags are combined into one option, -aglsm=and.lst, that is passed as a parameter to your assembler program.
Creating an Executable Binary Image
Once your and.o object file target is generated, you can run the Unix file command on it:
[awadyn@csa2 A1P3A]$ file and.o
and.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
Note that this object file is not yet executable. In particular, at this stage, it is still relocatable: its contents have not yet been assigned their final memory addresses. The contents of and.o need to be loaded at specific locations in memory in order for the execution of the assembly program to complete successfully.
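The "relocatable" label that the file command prints comes from a field in the ELF header called e_type. The following is a minimal Python sketch, not the way the file command is actually implemented, that reads this field directly. The function names elf_type and elf_type_of_file are hypothetical names chosen for this illustration.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {0: "none", 1: "relocatable", 2: "executable",
             3: "shared object", 4: "core"}

def elf_type(header):
    """Return the object file type recorded in the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 of the identification array gives the byte order:
    # 1 means little-endian (LSB), 2 means big-endian (MSB).
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field that follows the 16-byte identification array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, "unknown")

def elf_type_of_file(path):
    """Hypothetical helper: read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return elf_type(f.read(18))
```

Assuming and.o is present in the current directory, elf_type_of_file("and.o") should report "relocatable", in agreement with the output of the file command above.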
In previous labs, we wrote trivial sequences of assembly instructions that did not have any symbolic names (such as function names or routine labels) or reference any variables. We simply moved data between registers and memory locations. Hence, after generating our .o object files, we simply extracted from them the binary opcodes of the assembly instructions using the objcopy utility. Then, we loaded those binary bits into memory and executed them.
However, as we start to write more complex instruction sequences, it will be essential for us to organize them as labeled sequences of computation that our program jumps between. In doing so, we will face the challenge of loading this data into memory such that, if we write an instruction that calls a function by its symbolic name in our assembly program:
[awadyn@csa2 210_S20]$ cat test_func_name.s
_start:
	jmp MY_FUNC
EXIT:
	int3
MY_FUNC:
	addq $0x8, %rbx
	jmp EXIT
it can be loaded into memory in a way that reflects the location in memory where that function's instructions will be found:
test_func_name.o: file format elf64-x86-64            test_func_name.elf: file format elf64-x86-64
Disassembly of section .text:                   --->  Disassembly of section RAM:
0000000000000000 <_start>:                      --->  0000000000001000 <_text_start>:
   0: eb 01          jmp 3 <MY_FUNC>            --->  1000: eb 01          jmp 1003 <MY_FUNC>
0000000000000002 <EXIT>:                              0000000000001002 <EXIT>:
   2: cc             int3                             1002: cc             int3
0000000000000003 <MY_FUNC>:                     --->  0000000000001003 <MY_FUNC>:
   3: 48 83 c3 08    add $0x8,%rbx                    1003: 48 83 c3 08    add $0x8,%rbx
   7: eb f9          jmp 2 <EXIT>                     1007: eb f9          jmp 1002 <EXIT>
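Notice that the instruction bytes (eb 01 and eb f9) are identical in both columns even though the jump targets differ. This is because a short jmp stores a signed displacement relative to the address of the next instruction rather than an absolute address. The Python sketch below decodes that displacement; the function name short_jmp_target is a hypothetical name chosen for this illustration, and it handles only the two-byte eb form of jmp that appears in the listing.

```python
def short_jmp_target(addr, code):
    """Decode a two-byte short jump (opcode 0xEB followed by rel8) located at addr."""
    if len(code) != 2 or code[0] != 0xEB:
        raise ValueError("expected a two-byte short jmp (eb xx)")
    # The second byte is a signed 8-bit displacement.
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    # The displacement is added to the address of the NEXT instruction.
    return addr + len(code) + rel8

# Decode the same bytes at both load addresses from the listing above.
for base in (0x0, 0x1000):
    to_my_func = short_jmp_target(base + 0x0, bytes.fromhex("eb01"))
    to_exit = short_jmp_target(base + 0x7, bytes.fromhex("ebf9"))
    print(hex(to_my_func), hex(to_exit))
# prints:
# 0x3 0x2
# 0x1003 0x1002
```

The printed targets match the jmp operands in the two disassembly columns: the bytes stay the same, and only the addresses they are interpreted against change.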
This challenge can be resolved by converting a relocatable object file such as and.o into an executable binary file, and.elf, through the process of static linking:
[awadyn@csa2 A1P3A]$ make and.elf
as -g -aglsm=and.lst and.s -o and.o
objdump -r and.o > and.relocations
ld -e 0x1000 -M -T A1.lds and.o -o and.elf > and.map
The ld utility uses a file called a linker script. In the above command, the linker script is the file named A1.lds. It tells ld to lay out the contents of and.o in a memory image in a certain format and to label them by predefined sections. The -e 0x1000 flag tells ld that execution of the program begins at address 0x1000 of the memory image, which in this layout is also the address at which the assembly instructions of and.o are placed. This memory image is then saved in the binary file and.elf.
[awadyn@csa2 A1P3A]$ file and.elf
and.elf: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
The -M flag above tells ld to print a link map, which the command redirects into the file and.map. This map shows the mapping between the contents of and.o and their memory locations in the memory image that ld creates. Besides generating and.map, the steps that ld performs to lay out our program in the memory image are the steps that we performed manually in Gdb in previous labs.
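As a rough mental model of this mapping, the sketch below takes the symbol offsets from the test_func_name.o disassembly shown earlier and places them at base address 0x1000. It is a simplified illustration only: it assumes a single section placed contiguously at one base address, it does not reproduce the format of a real ld map file, and place_symbols is a hypothetical name.

```python
def place_symbols(offsets, base):
    """Map each symbol's offset in the object file to an address in the memory image."""
    return {name: base + offset for name, offset in offsets.items()}

# Offsets as they appear in the test_func_name.o disassembly.
object_offsets = {"_start": 0x0, "EXIT": 0x2, "MY_FUNC": 0x3}

image_addresses = place_symbols(object_offsets, 0x1000)
for name, address in image_addresses.items():
    print(f"{address:#018x} {name}")
```

The resulting addresses for EXIT and MY_FUNC (0x1002 and 0x1003) agree with the test_func_name.elf column of that disassembly.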
Note
Making a target completes successfully when all of its dependencies are available.
Dependencies are represented by the filenames to the right of the colon (‘:’) symbol. If a target’s dependencies do not exist, the make utility will attempt to generate them by finding their respective targets in the Makefile. Consequently, making the target and.elf, which depends on and.o, begins by searching for the file and.o; if that file is not found, make first generates and.o by running the assembler (as) and objdump utilities as described above.
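The dependency search described in this note can be sketched in a few lines of Python. This is a simplified model and not how make is implemented: it only checks whether a file exists, whereas the real make utility also compares timestamps and remakes targets that are older than their dependencies. The names RULES and plan are hypothetical, and the commands are copied from the make and.elf output shown above.

```python
# Each target maps to (dependencies, commands), mirroring two Makefile rules.
RULES = {
    "and.o": (["and.s"],
              ["as -g -aglsm=and.lst and.s -o and.o",
               "objdump -r and.o > and.relocations"]),
    "and.elf": (["and.o"],
                ["ld -e 0x1000 -M -T A1.lds and.o -o and.elf > and.map"]),
}

def plan(target, existing, rules=RULES):
    """Return, in order, the commands needed to make target.

    existing is the set of files already present; it is updated as
    targets are (notionally) made.
    """
    if target in existing:
        return []  # Simplification: an existing file is never remade.
    if target not in rules:
        raise FileNotFoundError(f"no rule to make target '{target}'")
    dependencies, commands = rules[target]
    steps = []
    for dependency in dependencies:
        # Make every missing dependency before the target itself.
        steps += plan(dependency, existing, rules)
    steps += commands
    existing.add(target)
    return steps

# Only the source file and.s exists at the start.
for command in plan("and.elf", {"and.s"}):
    print(command)
```

Starting from only and.s, the sketch prints the as, objdump, and ld commands in the same order that make and.elf printed them earlier. If and.o already exists, only the ld command is planned.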
Loading an Executable Binary Image onto Kex
Executing and testing your code via Gdb is now a matter of loading your binary and.elf into the memory of your Kex machine and executing your instructions:
[awadyn@csa2 A1P3A]$ make and.elf
as -g -aglsm=and.lst and.s -o and.o
objdump -r and.o > and.relocations
ld -e 0x1000 -M -T A1.lds and.o -o and.elf > and.map
[awadyn@csa2 A1P3A]$ gdb -x 210.gdb
GNU gdb (GDB) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
… …
…
Breakpoint 1, 0x0000000000001000 in ?? ()
Binary General Purpose Register Dump:
rax: 1000 $1 = 1000000000000
… …
…
r15: 0 $16 = 0
rip: 1000 $17 = 1000000000000
eflags: 86 $18 = 10000110
0x1000: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
0x1008: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
(gdb) x/4i 0x1000
=> 0x1000: add %al,(%rax)
   0x1002: add %al,(%rax)
   0x1004: add %al,(%rax)
   0x1006: add %al,(%rax)
(gdb) help load
Dynamically load FILE into the running program, and record its symbols for access from GDB.
Usage: load [FILE] [OFFSET]
An optional load OFFSET may also be given as a literal address.
When OFFSET is provided, FILE must also be provided. FILE can be provided on its own.
(gdb) load and.elf
Loading section RAM, size 0x3000 lma 0x1000
Start address 0x1000, load size 12288
Transfer rate: 400 KB/sec, 768 bytes/write.
(gdb) x/4i 0x1000
=> 0x1000: int3
   0x1001: nopw %cs:0x0(%rax,%rax,1)
   0x100b: nopw %cs:0x0(%rax,%rax,1)
   0x1015: nopw %cs:0x0(%rax,%rax,1)
(gdb) help file
Use FILE as program to be debugged.
It is read for its symbols, for getting the contents of pure memory, and it is the program executed when you use the `run' command.
If FILE cannot be found as specified, your execution directory path ($PATH) is searched for a command of that name.
No arg means to have no executable file and no symbols.
(gdb) file and.elf
A program is being debugged already.
Are you sure you want to change the file? (y or n) y
Reading symbols from and.elf...
Warning: the current language does not match this frame.
(gdb) x/4i 0x1000
=> 0x1000 <_text_start>: int3
   0x1001 <_text_end>: nopw %cs:0x0(%rax,%rax,1)
   0x100b <_text_end+10>: nopw %cs:0x0(%rax,%rax,1)
   0x1015 <_text_end+20>: nopw %cs:0x0(%rax,%rax,1)
(gdb) stepi
… …
… …
… …
… to be continued.