Systems Programming
02. C Programs in Space and Time
Alexander Holupirek
Database and Information Systems Group, Department of Computer & Information Science
University of Konstanz
Summer Term 2008
C programs in (address) space and (run-)time
Where is my data and why do I have to know?
- C is closely related to the machine. Before talking about pointers, storage allocation, etc., some background knowledge about the address space, (virtual) memory, and its allocation during program execution comes in handy
- Knowledge about the memory layout of a program is quite helpful when debugging
- Knowing what happens inside the machine during program execution is fundamental both to debugging programs and, in the first place, to writing clean code
Review of Computer Architecture
Storage Classes
From Source Code To Executable Code
Construction of an Executable
Relocation Process
C, assembler, and machine code
C source code:

    int a, b;
    a = b * b;

Intel IA-32 assembler source code:

    mov  0x403030,%eax
    imul 0x403030,%eax
    mov  %eax,0x403020

Machine instructions (processor instructions) as executable binary code, shown in hexadecimal (address and contents, 1 byte per address):

    Address  Contents                Instruction
    4012ee   a1 30 30 40 00          mov  0x403030,%eax
    4012f3   0f af 05 30 30 40 00    imul 0x403030,%eax
    4012fa   a3 20 30 40 00          mov  %eax,0x403020

b resides at address 0x403030 and a at 0x403020; these 4-byte addresses appear in the machine code least significant byte first (30 30 40 00).
C, assembler, and machine code
C source code:

    int a = 4, b;

    int main(void)
    {
        if (a > 5)
            b = 1;
        else
            b = 0;
    }

Memory address, memory contents (= machine instruction, executable binary code), and assembler source code:

    8048344: 83 3d 94 94 04 08 05   cmpl $0x5,0x8049494
    804834b: 7e 0c                  jle  8048359
    804834d: c7 05 8c 95 04 08 01   movl $0x1,0x804958c
    8048354: 00 00 00
    8048357: eb 0a                  jmp  8048363
    8048359: c7 05 8c 95 04 08 00   movl $0x0,0x804958c
    8048360: 00 00 00
    8048363: c9                     ...

a is located at address 0x8049494, b at address 0x804958c.
All numeric values in the binary and assembler code are hexadecimal.
Address Space
The address space ranges from the lowest possible address 0 ("start of memory") to the highest possible address max. ("end of memory"). Every individual byte has its own memory address, and each address refers to one byte of memory contents.
- A data block of 16 bytes starting at address 0x10000000 has its last byte at address 0x1000000f; the address of the first byte after the block is 0x10000010
- Addresses of individual bytes, e.g., 0x50000000 and 0x50000001 (contents 0x56 and 0xfc)
Byte Ordering
A program accesses 4-byte data d3 d2 d1 d0 (d3 = MSB, d0 = LSB) through the address n. The two byte orders store these bytes in the address space as follows:

    Address   Big-endian system   Little-endian system
    n         d3 (MSB)            d0 (LSB)
    n+1       d2                  d1
    n+2       d1                  d2
    n+3       d0 (LSB)            d3 (MSB)

MSB = Most Significant Byte; LSB = Least Significant Byte
Alignment Rules
Goal: optimal performance
- Alignment rules determine the addresses at which variables and instructions are placed
- They have a great impact on compiler, assembler, and linker tools
Figure: a misaligned longword. Longword boundaries in the address space are addresses divisible by 4 without remainder (e.g., 0x34 and 0x38); per access, the data bus transfers one longword with the byte offsets +0 to +3. A 4-byte data longword at 0x35 to 0x38 straddles the boundary at 0x38, so reading it takes two bus accesses: the 1st for 0x34 to 0x37, the 2nd for 0x38 to 0x3b.
Alignment Rules (cont.)
For derived types (types constructed from the basic types: arrays, functions, pointers, structures, unions; we will discuss them later), alignment rules apply to each individual component:

    struct artikel {
        char   name[5];   /* alignment(1) */
        int    anzahl;    /* alignment(4) */
        double preis;
    };

Because anzahl must start at an address divisible by 4, the compiler inserts padding bytes after name.
Alignment rules may be influenced through compiler options: GCC's -malign-int (an m68k option) aligns variables on 32-bit boundaries, producing code that runs somewhat faster on processors with 32-bit buses at the expense of memory.
Storage Classes
Placement of data in memory depends on the storage class
- An object, such as a variable, is a location in storage, and its interpretation depends on two main attributes: its storage class and its type
- The storage class determines the lifetime of the storage associated with the identified object
- The type determines the meaning of the values found in the identified object
- In C there are two storage classes: automatic and static
- The storage class specifiers (auto, extern, register, static), together with the context of an object's declaration, specify its storage class
Automatic Storage Class
Automatic Objects
- auto and register give the declared objects automatic storage class and may be used only within functions
- They are local to a block (aka "compound statement", such as the body of a function) and are discarded on exit from the block
- Declarations within a block create automatic objects if no storage class is specified or if auto is used
- Automatic objects are initialized each time the block is entered at the top (if the block is entered by a jump, the initializations are not performed)
- Objects declared register are automatic and are (if possible) stored in fast registers of the machine
- The address operator & may not be applied to objects declared register
Static Storage Class
Static Objects
- May be local to a block or external to all blocks
- In both cases, they retain their values across exit from and reentry to functions and blocks
- Within a block, static objects are declared with static
- Objects declared outside of all blocks (at the same level as function definitions) are always static
- At the outer level, the keyword static makes them local to a particular translation unit (internal linkage)
- They become global to the entire program if no explicit storage class is given or if extern is used (external linkage)
Storage Class and Sections
Intermediate Summary
- An executing program needs storage not only for its instructions but also for data, e.g., variables
- Variables may be temporary, dynamically allocated, or static (i.e., permanent in terms of storage allocation), initialized or uninitialized, or declared constant (const) and thus read-only
- Placement of data in memory depends on its storage class
- During translation, the compiler uses sections to divide the address space into logical units
- Details vary with the operating system and compiler used
Typical Program Organisation
A typical program naturally divides into sections:
Code
- machine instructions; should be unmodifiable; size is known after compilation and does not change (.text)
Data
- static data
  - initialized (.data) / uninitialized (.bss)
  - constant address in memory
  - permanent lifetime
- dynamic data
  - stack or heap
  - storage requirements not known in advance
  - transient lifetime
Program Sections
Figure: placement of the sections in the address space
- .text: PROM or RAM, write-protected
- .data: RAM
- .bss: RAM
PROM: Programmable Read-Only Memory (a memory chip that cannot be written during operation)
RAM: Random Access Memory
Virtual Memory and Segments
Virtual Memory
- Whenever a process is created, the kernel provides a chunk of physical memory, which can be located anywhere
- Through the magic of virtual memory (VM), the process believes it has all the memory on the computer
Typically, the VM space is laid out in a similar manner:
- Text Segment (.text)
- Initialized Data Segment (.data)
- Uninitialized Data Segment (.bss)
- The Stack
- The Heap
A Program in Memory
Figure: memory layout of a running program, from address 0 upward
- Code, constants: loaded from the executable file
- Initialized data: loaded from the executable file
- Uninitialized data: provided at process start and initialized with 0 (cleared)
- Heap: provided at process start for dynamic memory allocation; grows toward the stack
- Stack: provided at process start; grows toward lower addresses (or toward higher addresses, depending on the processor)
Code, constants, and initialized and uninitialized data are static data; heap and stack hold dynamic data.
Different Memory Layouts
Figure: two memory layouts with addresses increasing from 0, each consisting of code and constants, initialized data, uninitialized data, heap, and stack. (A) Solution on a PC (IA-32): the stack sits at the high end of the address space and grows downward toward the heap. (B) Stack growing in the reverse direction. The program start address is marked in the figure.
Memory Segments
Text Segment: The text segment contains the actual code (including constants) to be executed. It is usually sharable, so multiple instances of a program can share the text segment to lower memory requirements. This segment is usually marked read-only so that a program cannot modify its own instructions.
Initialized Data Segment: This segment contains global (and static) variables that are initialized by the programmer.
Uninitialized Data Segment: Also named .bss (block started by symbol), after an operator of an old assembler. This segment contains uninitialized global (and static) variables. All variables in this segment are initialized to 0 or NULL pointers before the program begins to execute.
Memory Segments (cont.)
The Stack: The stack is a collection of stack frames, which we will discuss later. When a new frame needs to be added (as a result of a newly called function), the stack grows downward (on IA-32).
The Heap: Dynamic memory, where storage can be allocated and deallocated via C's malloc(3)/free(3). The C library also obtains dynamic memory for its own workspace from the heap. As more memory is requested "on the fly", the heap grows upward.
Variable Placement and Life Time (Code)

    #include <stdlib.h>

    int a;          /* Permanent lifetime */
    static int b;   /* Ditto, but reduced scope */

    void
    func(void)
    {
        char c;         /* Only for the lifetime of func(), */
                        /* but 2x (once per call); visible only in func() */
        static int d;   /* I'm unique, exist once at a stable */
                        /* address, visible only in func() */
    }

    int
    main(void)
    {
        int e;          /* Lifetime of main() */
        int *pi = (int *)malloc(sizeof(int));   /* Newborn */
        func();
        func();
        free(pi);       /* RIP: pi still holds the old address, which is no longer valid */
        return (0);
    }
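Variable Placement and Life Time (Diagram)
Figure: memory from address 0 to max., divided into code, data, heap, and stack. t=0: program execution is started, i.e., the execution environment is already initialized; t=x: an arbitrary point in time during program execution. The code area holds the 1st, 2nd, 3rd, 4th, ... instruction; PC(t=0) and PC(t=x) mark the program counter at both points in time, SP(t=0) and SP(t=x) the stack pointer. a, b, and int d reside in the data area, the int that pi points to resides on the heap, and c, pi, and e reside on the stack.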
Variable Placement
Variables (outside a function): Globally declared variables go to the Uninitialized Data Segment if they are not initialized, and to the Initialized Data Segment otherwise. The OS needs this distinction to decide whether the storage has to be loaded with initialization data from the executable binary.
Variables (inside a function): auto is implicitly assumed, so they go to the stack. Variables declared static are placed as described above.
Constants (const): Text Segment (this applies to const objects with static storage, e.g., in the read-only section .rodata; const local variables are automatic objects)
Function Parameters: Pushed onto the stack or passed in registers. If pointers are passed, the data they point to resides elsewhere.
From source code to executable code
Translation Steps (multi-phase compilation)
- Compilation: HLL source code to assembler source code
- Assembly: assembler source code to object code
- Linking: object code to executable code
Compilers and assemblers create object files containing the generated binary code and data for a source file. Linkers combine multiple object files into one; loaders take object files and load them into memory.
Goal: an executable binary file (a.out)
From high-level language (HLL) source code to executable code, i.e., concrete processor instructions in combination with data.
Translation steps using gcc(1)
Figure: the four stages of gcc(1) with their input and output files

    Stage          Input files                              Output file
    Preprocessor   *.c/*.cc/*.cpp (C/C++ source code)       *.i/*.ii (preprocessed C/C++ source code)
    Compiler       *.i/*.ii (preprocessed source code)      *.s (assembler source code)
    Assembler      *.s (assembler source code)              *.o (object file, unlinked)
    Linker         *.o/*.a (object files, library files)    a.out (executable file = loadable object file)
File suffixes and their meaning
For any given input file, the file name suffix determines what kind of compilation is done (see gcc(1) for more details and suffixes):

    Suffix   Compilation step
    .c       C source code which must be preprocessed
    .i       C source code which should not be preprocessed
    .h       C header file to be turned into a precompiled header
    .s       Assembler code
    .o       An object file to be fed straight into linking
Creation of an executable file
Figure: (Filename).c -> gcc (compile) -> (Filename).s -> gas (assemble) -> (Filename).o -> ld (link, together with further object/library files) -> a.out. The figure distinguishes operations, inputs or outputs, and commands.
The C Preprocessor
The C preprocessor performs:
- Inclusion of named files
- Macro substitution
- Conditional compilation
File Inclusion
A control line of the form

    #include <filename>

causes the replacement of that line by the entire contents of the file filename.
Note
The characters in the name filename must not include > or newline, and the effect is undefined if it contains any of ", ', \, or /*.
Location
The named file is searched for in a sequence of implementation-dependent places (often starting in /usr/include).
Macro Substitution
A control line of the form

    #define identifier token-sequence

causes the preprocessor to replace subsequent instances of the identifier with the given sequence of tokens.
Example

    #define EXIT_FAILURE 1
    #define EXIT_SUCCESS 0
    #define S_IRWXU 0000700   /* RWX mask for owner */
    #define S_IRUSR 0000400   /* R for owner */
    #define S_IWUSR 0000200   /* W for owner */
    #define S_IXUSR 0000100   /* X for owner */
Macro Substitution (cont.)
A control line of the form

    #define identifier(identifier-list) token-sequence

where there is no space between the first identifier and the '(', is a macro definition with parameters given by the identifier list.
Example

    #define S_ISDIR(m)  ((m & 0170000) == 0040000)   /* directory */
    #define S_ISCHR(m)  ((m & 0170000) == 0020000)   /* char special */
    #define S_ISBLK(m)  ((m & 0170000) == 0060000)   /* block special */
    #define S_ISREG(m)  ((m & 0170000) == 0100000)   /* regular file */
    #define S_ISFIFO(m) ((m & 0170000) == 0010000)   /* fifo */
Macro Substitution (cont.)
A control line of the form

    #undef identifier

causes the identifier's preprocessor definition to be forgotten. It is not an error to apply #undef to an unknown identifier.
Example

    /*
     * Some header files may define an abs macro.
     * If defined, undef it to prevent a syntax error
     * and issue a warning.
     * #warning is a compiler extension (standardized only in C23)
     */
    #ifdef abs
    #undef abs
    #warning abs macro collides with abs() prototype, undefining
    #endif
Conditional Inclusion
Parts of a program may be compiled conditionally.
Example

    #ifndef NULL
    #ifdef __GNUG__
    #define NULL __null
    #else
    #define NULL 0L
    #endif
    #endif
Predefined Names
Several identifiers are predefined and expand to produce special information. They, and also the preprocessor expression operator defined, may not be undefined or redefined.
__LINE__: A decimal constant containing the current source line number
__FILE__: A string literal containing the name of the file being compiled
__DATE__: A string literal containing the date of compilation, in the form "Mmm dd yyyy"
__TIME__: A string literal containing the time of compilation, in the form "hh:mm:ss"
__STDC__: The constant 1. It is intended that this identifier be defined to be 1 only in standard-conforming implementations
Compilation
Figure: HLL source code (text) -> compiler -> assembler source code (text). Additional outputs: a translation listing with error messages (text) and possibly temporary files.
Assembly
Figure: assembler source code (text) -> assembler -> machine code and additional information (object format). Additional outputs: a translation listing with error messages and a symbol table (text), and possibly temporary files.
Linking
Figure: several inputs in object format (machine code and additional information) plus library files in library object format (found by library search) -> linker -> absolute code, or relocatable code with additional information (binary code or object format). Additional outputs: a link map (address space usage) and a symbol list (text), and possibly temporary files.
Program Sections in Virtual Memory
Figure: the sections .text (code) and .data (initialized data) after compilation and after linking. After compilation, each section begins at address 0; sections are "logical address spaces" of the compiler. After linking, all sections are placed "absolutely" in the address space from 0 to 0xffffffff (the figure shows the addresses 0x08048244 and 0x08049370).
Linking an Executable Binary
Input: unlinked object files

    OBJ1: .text1 .data1 .bss1
    OBJ2: .text2        .bss2
    OBJ3: .text3 .data3 .bss3

Result of linking: an executable file OBJtotal (linked, relocated)

    .text (code):                     .text1 .text2 .text3
    .data (initialized variables):    .data1 .data3
    .bss  (uninitialized variables):  .bss1 .bss2 .bss3

- Each object file (compiled separately) starts at address 0
- Linking them together involves
  - merging sections of the same kind
  - relocation of addresses
Relocation Records
- Once the sections have been placed one after another, relocation can start
- Executable code contains embedded addresses
  - of static data, function calls, jump targets
- On relocation, those addresses have to be changed inside the code
- Without a relocation table, the linker could not find the places to change
- A relocation record holds the offset (relative to the start of the section) of a place in the code that refers to a symbol (the name of a variable, a function, etc.), together with the relocation type and the symbol

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE              VALUE
    0000001a R_386_32          b
    00000023 R_386_32          a
    00000029 R_386_32          b
Source File: compile.c

    int a = 1;          /* Global variable, initialized -> .data */
    int b;              /* Global variable, uninitialized -> .bss */

    int
    main(void)
    {
        static int c;   /* Local, static variable -> .bss */
        b = 5;
        c = b + a + 16;
        return c;
    }

- Compile a relocatable object file (creates compile.o):

    cc -c compile.c

- Link an executable binary (one-step compilation):

    cc compile.c -o compile
Analysis of Object Files (compile.o)

    $ file compile.o
    ELF 32-bit LSB relocatable, Intel 80386, version 1, not stripped
    $ objdump -x compile.o

    compile.o:     file format elf32-i386
    compile.o
    architecture: i386, flags 0x00000011:
    HAS_RELOC, HAS_SYMS
    start address 0x00000000

    Sections:
    Idx Name          Size      VMA       LMA       File off  Algn
      0 .text         0000005a  00000000  00000000  00000034  2**2
                      CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
      1 .data         00000004  00000000  00000000  00000090  2**2
                      CONTENTS, ALLOC, LOAD, DATA
      2 .bss          00000004  00000000  00000000  00000094  2**2
                      ALLOC
      3 .rodata       00000005  00000000  00000000  00000094  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
Object File: compile.o (cont.)

    SYMBOL TABLE:
    00000000 l    df *ABS*   00000000 compile.c
    00000000 l    d  .text   00000000
    00000000 l    d  .data   00000000
    00000000 l    d  .bss    00000000
    00000000 l     O .bss    00000004 c.0
    00000000 l    d  .rodata 00000000
    00000000 g     O .data   00000004 a
    00000000 g     F .text   0000005a main
    00000004       O *COM*   00000004 b

    RELOCATION RECORDS FOR [.text]:
    OFFSET   TYPE              VALUE
    0000001a R_386_32          b
    00000023 R_386_32          a
    00000029 R_386_32          b
    00000031 R_386_32          .bss
    00000036 R_386_32          .bss
    0000004c R_386_32          .rodata

Note: in compile.o, b is still a common symbol (*COM*); the linker allocates it in .bss, as the symbol table of the executable shows.
    compile.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <main>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 18                sub    $0x18,%esp
       6:   83 e4 f0                and    $0xfffffff0,%esp
       9:   b8 00 00 00 00          mov    $0x0,%eax
       e:   29 c4                   sub    %eax,%esp
      10:   a1 00 00 00 00          mov    0x0,%eax
      15:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
      18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
      1f:   00 00 00
      22:   a1 00 00 00 00          mov    0x0,%eax
      27:   03 05 00 00 00 00       add    0x0,%eax
      2d:   83 c0 10                add    $0x10,%eax
      30:   a3 00 00 00 00          mov    %eax,0x0
      35:   a1 00 00 00 00          mov    0x0,%eax
      3a:   8b 55 e8                mov    0xffffffe8(%ebp),%edx
      3d:   3b 15 00 00 00 00       cmp    0x0,%edx
      43:   74 13                   je     58 <main+0x58>
      45:   83 ec 08                sub    $0x8,%esp
      48:   ff 75 e8                pushl  0xffffffe8(%ebp)
      4b:   68 00 00 00 00          push   $0x0
      50:   e8 fc ff ff ff          call   51 <main+0x51>
      55:   83 c4 10                add    $0x10,%esp
      58:   c9                      leave
    compile.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <main>:
    int b;              /* Global variable, uninitialized -> .bss */
    int
    main(void)
    {
       0:   55                      push   %ebp
    ... 6 more lines ...
      15:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
        static int c;   /* Local, static variable -> .bss */
        b = 5;
      18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
      1f:   00 00 00
        c = b + a + 16;
      22:   a1 00 00 00 00          mov    0x0,%eax
      27:   03 05 00 00 00 00       add    0x0,%eax
      2d:   83 c0 10                add    $0x10,%eax
      30:   a3 00 00 00 00          mov    %eax,0x0
        return c;
      35:   a1 00 00 00 00          mov    0x0,%eax
    }
    ... 10 more lines ...
Executable Binary File: compile

    compile:     file format elf32-i386
    compile
    architecture: i386, flags 0x00000112:
    EXEC_P, HAS_SYMS, D_PAGED
    start address 0x1c000408

    Sections:
    Idx Name          Size      VMA       LMA       File off  Algn
    ...
      9 .text         00000214  1c000408  1c000408  00000408  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
    ...
     12 .data         00000014  3c001008  3c001008  00001008  2**2
                      CONTENTS, ALLOC, LOAD, DATA
    ...
     20 .bss          00000184  3c003100  3c003100  00001100  2**5
                      ALLOC

    SYMBOL TABLE:
    3c003140 l     O .bss    00000004 c.0
    3c003280 g     O .bss    00000004 b
    1c0005c0 g     F .text   0000005a main
    3c001018 g     O .data   00000004 a
    1c0005c0 <main>:
    int b;              /* Global variable, uninitialized -> .bss */
    int
    main(void)
    {
    1c0005c0:   55                      push   %ebp
    1c0005c1:   89 e5                   mov    %esp,%ebp
    1c0005c3:   83 ec 18                sub    $0x18,%esp
    1c0005c6:   83 e4 f0                and    $0xfffffff0,%esp
    1c0005c9:   b8 00 00 00 00          mov    $0x0,%eax
    1c0005ce:   29 c4                   sub    %eax,%esp
    1c0005d0:   a1 00 31 00 3c          mov    0x3c003100,%eax
    1c0005d5:   89 45 e8                mov    %eax,0xffffffe8(%ebp)
        static int c;   /* Local, static variable -> .bss */
        b = 5;
    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280
    1c0005df:   00 00 00
        c = b + a + 16;
    1c0005e2:   a1 18 10 00 3c          mov    0x3c001018,%eax
    1c0005e7:   03 05 80 32 00 3c       add    0x3c003280,%eax
    1c0005ed:   83 c0 10                add    $0x10,%eax
    1c0005f0:   a3 40 31 00 3c          mov    %eax,0x3c003140
        return c;
    1c0005f5:   a1 40 31 00 3c          mov    0x3c003140,%eax
    }
Relocation Of An Assembler Instruction
During linking, the relocated addresses are patched into the code, for example for the assignment b = 5;

    Before relocation (relocatable 'compile.o'):
          18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
    After relocation (executable 'compile'):
    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280

The proper address for b can be found in the symbol table.

    SYMBOL TABLE: (compile)
    3c003280 g     O .bss    00000004 b

- The symbol table for compile yields 3c003280 for variable b; in the machine code it appears in little-endian byte order as 80 32 00 3c
Relocation Of An Assembler Instruction (cont.)
How does the linker find the right places in the machine code to perform the substitutions?
- The linker has the relocation record for b, which gives the offset of the place to patch, relative to the start of .text in compile.o

    RELOCATION RECORDS FOR [.text]: (compile.o)
    0000001a R_386_32          b

- The linker has the absolute address of main from the symbol table (main starts at offset 0 of the .text section of compile.o)

    SYMBOL TABLE: (compile)
    3c003280 g     O .bss    00000004 b
    1c0005c0 g     F .text   0000005a main
Relocation Of An Assembler Instruction (cont.)
Putting it all together:

    RELOCATION RECORDS FOR [.text]: (compile.o)
    0000001a R_386_32          b          (relative offset)
    SYMBOL TABLE: (compile)
    3c003280 g     O .bss    00000004 b       (abs. address of b)
    1c0005c0 g     F .text   0000005a main    (abs. address of main)

Computing the address where the substitution must be performed:

    1c0005c0 + 0000001a = 1c0005da

The 4 bytes starting at 1c0005da, directly after the opcode bytes c7 05 of the instruction at 1c0005d8, receive the address of b:

          18:   c7 05 00 00 00 00 05    movl   $0x5,0x0
    1c0005d8:   c7 05 80 32 00 3c 05    movl   $0x5,0x3c003280