Lecture 14
Project, Assembler and Exam
Emma Söderberg
Revised by Emma Söderberg on March 5, 2013.
Based on slides by Görel Hedin and Lennart Andersson.
EDA180: Compiler Construction F14-1
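Compiler phases and program representations
Figure: the compiler pipeline. Each phase is listed with the representation it consumes and produces.
Analysis (frontend):
I Lexical analysis (scanning): source code → tokens
I Syntactic analysis (parsing): tokens → AST
I Semantic analysis: AST → attributed AST
Synthesis (backend):
I Intermediate code generation: attributed AST → intermediate code
I Optimization: intermediate code → intermediate code
I Machine code generation: intermediate code → machine code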
Today
I Project
I Intel assembler
I Exam
I Repetition
I Beyond..
Course Project
Build a compiler for your language
Standard project
In teams of 2 persons.
Prerequisites:
I Approved assignments
I Assignment supervisor may grant postponement
Design a small procedural language:
I integer and boolean types
I variables, constants, expressions, statements, ...
I block structure with nested procedures
I parameters, return values, recursion
I name analysis
I type analysis
I intermediate code generation
I assembly code generation
Non-standard project
Design a language of your choice.
Must be accepted by project supervisor in advance.
Should be approximately the same size as the standard project.
Typical requirements:
I non-trivial grammar
I non-trivial name analysis
I significant semantic computations
I translation to some intermediate code
I translation to native code
Project administration
Estimated work load: 40 hours (20-80)
Administration
I Report your project group to your assignment supervisor.
I Your assignment supervisor will be your project supervisor.
I Book a meeting with your supervisor.
I Three tasks: Design, Front end, Back end.
I Three deadlines: March 24, April 22, and May 6.
(also on the course webpage)
Project supervisors:
I Niklas Fors, [email protected].
I Jesper Öqvist, [email protected].
Project administration: Repository
Git (recommended):
I Private repository – don’t assist plagiarism. View the section on “Cooperation or Plagiarism” on the department web page.
Note that this excludes GitHub.
I BitBucket:
  I Private Git repositories
  I Academic license
  I Used by your supervisor
I Set up your own and give access to your supervisor.
Subversion:
I We can set up a repository for you.
Intel Assembler
Generate assembler from ICode
Tools: as, ld, gcc
Intel 386/486/Pentium processor architecture
General-purpose registers:
I EAX, EBX, ECX, EDX, ESI, EDI
I ESP – stack pointer
I EBP – base pointer
Instruction pointer: EIP
Segment registers: CS, DS, ES, SS
Flags register:
I EFLAGS – 32 bits used to store results of comparisons.
Register structure
Structure of the EAX register (bits):
Bit:   31 ........ 16 | 15 ..... 8 | 7 ...... 0
                      |     AH     |     AL
                      |<--------- AX --------->|
       |<------------------ EAX -------------->|
I AL,AH – 8-bit registers.
I AX – a 16-bit register.
I EAX – extended AX to 32 bits.
I EBX, ECX, and EDX have the same structure.
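The sub-register layout can be mimicked with shifts and masks. The following is a minimal Java sketch, assuming a hypothetical class name RegisterParts and an arbitrary example value. It only illustrates which bits AL, AH, and AX cover, not how the processor implements them.

```java
// Illustrative only: class and method names are hypothetical.
class RegisterParts {
    // AL is bits 7..0 of EAX.
    static int al(int eax) { return eax & 0xFF; }

    // AH is bits 15..8 of EAX.
    static int ah(int eax) { return (eax >>> 8) & 0xFF; }

    // AX is bits 15..0 of EAX, i.e. AH followed by AL.
    static int ax(int eax) { return eax & 0xFFFF; }

    public static void main(String[] args) {
        int eax = 0x12345678; // arbitrary example value
        System.out.printf("AL=%02X AH=%02X AX=%04X%n", al(eax), ah(eax), ax(eax));
    }
}
```

For the example value 0x12345678, AL is 0x78, AH is 0x56, and AX is 0x5678.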
Program example
        .data                   # allocating memory
n:      .long 234               # the number
length: .long 0                 # the result
ten:    .long 10                # the divisor
        .text                   # instructions
        .global _start          # make _start globally known
_start: movl $0, %ebx           # use ebx as counter
        movl n, %eax            # copy number to eax
nextdigit:
        movl $0, %edx           # prepare for long division
        idivl ten               # divide combined edx:eax by 10
                                # quotient to eax
        addl $1, %ebx           # add 1 to counter
        cmpl $0, %eax           # compare eax to 0
        jg nextdigit            # jump if eax>0
        movl %ebx, length       # copy counter to memory
Variables may have predetermined locations in memory and be referred to by name.
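To make the control flow of the program example easier to follow, here is a rough Java equivalent of the loop. It is a sketch under the assumption that n is non-negative (the assembly clears edx instead of sign-extending eax); the class and method names are hypothetical.

```java
// Illustrative only: a high-level rendering of the assembly loop.
class DigitCount {
    // Divides by ten until the quotient is 0, counting one digit per division.
    // Like the assembly, the loop body runs at least once, so 0 counts as one digit.
    static int countDigits(int n) {
        int count = 0;                  // movl $0, %ebx
        int quotient = n;               // movl n, %eax
        do {
            quotient = quotient / 10;   // idivl ten (quotient ends up in eax)
            count = count + 1;          // addl $1, %ebx
        } while (quotient > 0);         // cmpl $0, %eax ; jg nextdigit
        return count;                   // movl %ebx, length
    }

    public static void main(String[] args) {
        System.out.println(countDigits(234)); // the value stored in n on the slide
    }
}
```

For n = 234 the loop runs three times, so the value stored in length is 3.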
Memory
Memory size:
I Every byte (b, 8 bits) has an address, 0, 1, . . .
I word (w, 16 bits)
I long (l, 32 bits)
In the project:
I All variables reside on the stack.
I Memory for the stack is allocated by ld (default 2Mb).
I You will not need a .data segment!
Useful operand forms
Operand          Refers to
$1448            constant 1448 (base 10)
nextdigit        label address
%eax             value in eax
(%ebp)           value at address contained in ebp
4(%ebp)          value at 4 bytes after address in ebp
(%ebp,%eax,4)    value at ebp+4*eax
The last three forms refer to values in main memory.
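The three memory forms are all instances of one pattern, displacement(base,index,scale), whose address is base + index*scale + displacement. A small Java sketch of that arithmetic follows; the class name, method name, and register values are made up for illustration.

```java
// Illustrative only: computes the address denoted by disp(base,index,scale).
class EffectiveAddress {
    static int address(int disp, int base, int index, int scale) {
        return base + index * scale + disp;
    }

    public static void main(String[] args) {
        int ebp = 1000; // hypothetical register contents
        int eax = 3;
        System.out.println(address(0, ebp, 0, 1));   // (%ebp)
        System.out.println(address(4, ebp, 0, 1));   // 4(%ebp)
        System.out.println(address(0, ebp, eax, 4)); // (%ebp,%eax,4)
    }
}
```

With ebp = 1000 and eax = 3, the three operands denote addresses 1000, 1004, and 1012.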
Useful instructions
Instruction  Operands            Effect
movl         rmc32, rm32         rm32 ← rmc32
addl         rmc32, rm32         rm32 ← rm32 + rmc32
subl         rmc32, rm32         rm32 ← rm32 - rmc32
negl         rm32                rm32 ← -rm32
idivl        rm32                eax ← edx:eax / rm32, edx ← remainder
notl         rm32                rm32 ← ~rm32, bitwise, false = 0
andl         rmc32, rm32         rm32 ← rm32 & rmc32, bitwise
orl          rmc32, rm32         rm32 ← rm32 | rmc32, bitwise
cmpl         rmc32_1, rm32_2     compare by computing rm32_2 - rmc32_1
leal         m32, r32            r32 ← address denoted by m32
Operand types: r – register, m – memory, c – constant
An instruction can have at most one memory (m) operand.
Conditional and jump instructions
The results of comparisons (cmpl) end up in the EFLAGS register and may be used by succeeding instructions.
Condition codes (cc) set by the cmpl instruction:
cc        l    le   e    ne   g    ge
meaning   <    ≤    =    ≠    >    ≥
Jumps may be conditional:
jmp dest             jump unconditionally
je dest              jump if equal
jg dest              jump if greater
jcc dest             jump if cc (condition code)
Other conditional instructions:
setcc rm8            rm8 = cc ? 1 : 0
cmovcc rm32, r32     r32 = rm32 if cc
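A common stumbling block is the operand order: after "cmpl first, second" the condition codes describe how second relates to first, as in "cmpl $0, %eax; jg nextdigit", which jumps when eax > 0. The Java sketch below spells this out for signed 32-bit values; the class and method names are hypothetical, and it models only the outcome, not the individual EFLAGS bits.

```java
// Illustrative only: outcome of a condition code after "cmpl first, second".
class ConditionCodes {
    // The comparison reads "second cc first" (AT&T operand order), signed.
    static boolean holds(String cc, int first, int second) {
        switch (cc) {
            case "l":  return second <  first;
            case "le": return second <= first;
            case "e":  return second == first;
            case "ne": return second != first;
            case "g":  return second >  first;
            case "ge": return second >= first;
            default: throw new IllegalArgumentException("unknown cc: " + cc);
        }
    }

    // setcc: 1 if the condition holds, otherwise 0.
    static int set(String cc, int first, int second) {
        return holds(cc, first, second) ? 1 : 0;
    }

    public static void main(String[] args) {
        int eax = 5;
        // cmpl $0, %eax ; jg ...  jumps because eax > 0
        System.out.println(holds("g", 0, eax));
    }
}
```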
Stack instructions
Instruction Operand Effect
pushl rmc32 push value in rmc32
popl rm32 pop to rm32
Example:
pushl %ebx
Stack before:
                 value
← towards address 0
Stack after:
    ebx value    value
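The push example can be simulated with a toy memory in which the stack grows towards address 0. This is a minimal Java sketch, assuming 4-byte values and a hypothetical 64-byte memory; the class name StackSketch and its methods are invented for illustration.

```java
// Illustrative only: a toy model of pushl/popl with a downward-growing stack.
class StackSketch {
    private final int[] memory = new int[16];   // 16 words = 64 bytes
    private int esp = memory.length * 4;        // byte address just past the stack bottom

    // pushl: first decrement esp by 4, then store the value at the new top.
    void pushl(int value) {
        esp -= 4;
        memory[esp / 4] = value;
    }

    // popl: read the value at the top, then increment esp by 4.
    int popl() {
        int value = memory[esp / 4];
        esp += 4;
        return value;
    }

    int esp() { return esp; }

    public static void main(String[] args) {
        StackSketch stack = new StackSketch();
        stack.pushl(42);                  // e.g. pushl %ebx with ebx = 42
        System.out.println(stack.esp());  // 4 less than before the push
        System.out.println(stack.popl()); // the pushed value
    }
}
```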
Procedure calls
Instruction Operands Effect
call c32 push return address and jump
ret pop return address and jump
int c32 interrupt to kernel
Example:
call p      # will push address of next instruction
...
p:
...
ret # will pop address and jump
C compiler conventions
I Arguments are pushed on the stack in reverse order in the caller’s activation record.
I Caller pops arguments after return.
I Callee must restore EBX, ESI, EDI, ESP, and EBP before returning.
I EAX is used for return values.
Debugging assembler
The ddd debugger (gdb):
I Inspect memory
I Inspect registers
I Step through program
The Exam
The exam
Regular exam: Wednesday March 13, 8-13, Sparta:D.
Next exam: Friday August 30, 8-13, Victoriastadion 1A.
One week advance registration is required for the August exam.
Allowed material at the exam:
I Manual page on JastAdd syntax.
I ICode reference.
I Dictionary between English and your native language.
Bonus points from the seminar exercises:
I Are counted at both the above examination dates, but not next year.
Prerequisites for writing the exam:
I Approved assignments.
I Assignment supervisor may grant postponement.
Old exams
See the course web site, but note that . . .
I from 2008 a slightly different intermediate code is used.
I in 2003 and earlier, a slightly different JastAdd notation was used.
Now, walk-through of the exam from 2007-03-06 . . .
Exam: Problem 1 – Lexical analysis
According to the Java Language Specification, an identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter. Assume that a Java letter is one of a–z, A–Z, and that a Java digit is one of 0–9.
According to the Java code conventions a class identifier should start with a capital letter, a method name should start with a small letter, and all letters should be capital in a constant name.
a. Specify regular expressions for class and method identifiers according to the Java code conventions. You may use [a-z], but not more complex ranges like [a-zA-Z], as a regular expression denoting the language of all strings with one character from the specified range.
b. An identifier cannot have the same spelling as the null literal.
Construct a DFA recognizing class and method identifiers according to the Java code conventions and the literal null with distinct final states.
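One possible answer to part a can be tried out with java.util.regex. The sketch below is an assumption about what an acceptable solution looks like, not the official exam answer; the class and constant names are hypothetical. Each character class uses only a single simple range, as the problem requires, and the null literal is excluded explicitly, which corresponds to giving it its own final state in the DFA of part b.

```java
// Illustrative only: candidate regular expressions for Problem 1a.
class JavaConventionIds {
    // Any letter or digit, written without combined ranges such as [a-zA-Z0-9].
    private static final String TAIL = "([a-z]|[A-Z]|[0-9])*";
    static final String CLASS_ID = "[A-Z]" + TAIL;
    static final String METHOD_ID = "[a-z]" + TAIL;

    static boolean isClassId(String s) {
        return s.matches(CLASS_ID);
    }

    // "null" has the shape of a method identifier but is the null literal.
    static boolean isMethodId(String s) {
        return s.matches(METHOD_ID) && !s.equals("null");
    }

    public static void main(String[] args) {
        System.out.println(isClassId("Stack2"));
        System.out.println(isMethodId("getNext"));
        System.out.println(isMethodId("null"));
    }
}
```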
Exam: Problem 2 – Parsing
A qualified identifier in Java adheres to the grammar
    qualifiedID → qualifiedID "." qualifiedID
    qualifiedID → ID
where ID is an identifier token.
a. This grammar is ambiguous. Provide a string that has two different parse trees and draw the trees.
b. Construct an equivalent grammar in canonical form that is unambiguous.
c. Consider the language of all strings generated by the first grammar followed by a $ token. Construct a canonical LL(1) grammar for this language and present the LL(1) table.
d. Specify an equivalent EBNF grammar for the first grammar that is not recursive and requires just 1 token lookahead.
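A non-recursive EBNF form such as qualifiedID → ID ("." ID)* maps directly onto a loop in a hand-written parser. The Java sketch below is one way to do it, under simplifying assumptions: tokens are plain strings, any token other than "." and "$" counts as an ID, and all class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a parser for  ID ("." ID)* "$"  with one token of lookahead.
class QualifiedIdParser {
    private final List<String> tokens;
    private int pos = 0;

    QualifiedIdParser(List<String> tokens) { this.tokens = tokens; }

    static boolean accepts(List<String> tokens) {
        QualifiedIdParser p = new QualifiedIdParser(tokens);
        try {
            p.qualifiedId();
            p.expect("$");
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private void qualifiedId() {
        id();
        while (peek().equals(".")) {    // the lookahead token decides whether to loop
            expect(".");
            id();
        }
    }

    private void id() {
        String t = peek();
        if (t.isEmpty() || t.equals(".") || t.equals("$")) {
            throw new IllegalStateException("ID expected");
        }
        pos++;
    }

    private void expect(String s) {
        if (!peek().equals(s)) throw new IllegalStateException(s + " expected");
        pos++;
    }

    // Returns the current token, or "" when the input is exhausted.
    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    public static void main(String[] args) {
        System.out.println(accepts(Arrays.asList("java", ".", "util", ".", "List", "$")));
    }
}
```

The while loop replaces the recursion of the original grammar, and the single call to peek() is the only lookahead the parser needs.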
Exam: Problem 3 – Semantic analysis
Consider the following fragment of an abstract grammar.
ProcedureDecl ::= Type <ID> Parameters Stmt;
abstract Stmt;
Assignment: Stmt ::= <ID> Expr;
IfStmt: Stmt ::= Expr Then:Stmt Else:Stmt;
Return: Stmt ::= Expr;
StmtList: Stmt ::= Stmt*;
a. Every execution path through the procedure block must terminate with a return statement. Construct a .jadd file with a method that checks this. Note that the following concrete program should not generate an error message.
integer fac(integer n) {
  if (n==0) {
    return 1;
  } else {
    return n*fac(n-1);
  }
}
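The check in part a is a recursive computation over the statement classes. The sketch below shows the idea in plain Java rather than in a .jadd aspect, with hand-written stand-ins for the AST classes (the real ones would be generated by JastAdd, and expressions are omitted). It assumes that a statement list returns on all paths exactly when its last statement does; all names beyond those in the abstract grammar are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: hand-written stand-ins for the generated AST classes.
class ReturnCheck {
    static abstract class Stmt {
        // True if every execution path through this statement ends in a return.
        abstract boolean alwaysReturns();
    }

    static class Assignment extends Stmt {
        boolean alwaysReturns() { return false; }
    }

    static class Return extends Stmt {
        boolean alwaysReturns() { return true; }
    }

    static class IfStmt extends Stmt {
        final Stmt thenStmt, elseStmt;
        IfStmt(Stmt thenStmt, Stmt elseStmt) {
            this.thenStmt = thenStmt;
            this.elseStmt = elseStmt;
        }
        // Both branches must return.
        boolean alwaysReturns() {
            return thenStmt.alwaysReturns() && elseStmt.alwaysReturns();
        }
    }

    static class StmtList extends Stmt {
        final List<Stmt> stmts;
        StmtList(Stmt... stmts) { this.stmts = Arrays.asList(stmts); }
        // Assumption: only the last statement of the list is decisive.
        boolean alwaysReturns() {
            return !stmts.isEmpty() && stmts.get(stmts.size() - 1).alwaysReturns();
        }
    }

    public static void main(String[] args) {
        // Body of fac: { if (...) { return ...; } else { return ...; } }
        Stmt facBody = new StmtList(
            new IfStmt(new StmtList(new Return()), new StmtList(new Return())));
        System.out.println(facBody.alwaysReturns());
    }
}
```

In a JastAdd aspect the same methods would be added to the generated classes as intertype declarations, and the error message would be reported from ProcedureDecl when the body does not always return.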
b. Assume that there is a traversing visitor:
class TraversingVisitor implements Visitor {
  ...
  Object visit(IfStmt node, Object data) {
    node.getExpr().accept(this, data);
    node.getThen().accept(this, data);
    node.getElse().accept(this, data);
    return null;
  }
  ...
}
Construct a subclass of this class that provides a method
static int numberOfReturns(ProcedureDecl node)
that will return the number of return statements in the node argument.
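The subclass only needs to override the visit method for Return and keep a counter, since the inherited methods already traverse the tree. The Java sketch below demonstrates this with a cut-down, hand-written visitor framework so that it is self-contained. Those stand-in classes are assumptions, expressions are left out, and numberOfReturns takes the procedure body (a Stmt) instead of a ProcedureDecl.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a minimal visitor framework plus the counting subclass.
class ReturnCounter {
    interface Visitor {
        Object visit(IfStmt node, Object data);
        Object visit(Return node, Object data);
        Object visit(StmtList node, Object data);
    }

    static abstract class Stmt {
        abstract Object accept(Visitor v, Object data);
    }
    static class Return extends Stmt {
        Object accept(Visitor v, Object data) { return v.visit(this, data); }
    }
    static class IfStmt extends Stmt {
        final Stmt thenStmt, elseStmt;
        IfStmt(Stmt t, Stmt e) { thenStmt = t; elseStmt = e; }
        Object accept(Visitor v, Object data) { return v.visit(this, data); }
    }
    static class StmtList extends Stmt {
        final List<Stmt> stmts;
        StmtList(Stmt... s) { stmts = Arrays.asList(s); }
        Object accept(Visitor v, Object data) { return v.visit(this, data); }
    }

    // Stand-in for the given TraversingVisitor: visits all children, nothing else.
    static class TraversingVisitor implements Visitor {
        public Object visit(IfStmt node, Object data) {
            node.thenStmt.accept(this, data);
            node.elseStmt.accept(this, data);
            return null;
        }
        public Object visit(Return node, Object data) { return null; }
        public Object visit(StmtList node, Object data) {
            for (Stmt s : node.stmts) s.accept(this, data);
            return null;
        }
    }

    // The requested subclass: overrides only the Return case.
    static class ReturnCountingVisitor extends TraversingVisitor {
        private int count = 0;

        @Override
        public Object visit(Return node, Object data) {
            count++;
            return null;
        }

        static int numberOfReturns(Stmt body) {
            ReturnCountingVisitor v = new ReturnCountingVisitor();
            body.accept(v, null);
            return v.count;
        }
    }

    public static void main(String[] args) {
        Stmt facBody = new StmtList(
            new IfStmt(new StmtList(new Return()), new StmtList(new Return())));
        System.out.println(ReturnCountingVisitor.numberOfReturns(facBody));
    }
}
```

For the body of fac from part a, the visitor counts two return statements.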
Exam: Problem 4 – Code generation and run-time system
You are going to generate intermediate code for the printR procedure in
void main() {
  int n;
  void printR(int k) {
    if (k >= 0) {
      printR(k-1);
      print(k);
    }
  }
  n = read();
  printR(n);
}
a. Introduce a Print instruction in ICode that can be used for the print statement in the example. You should specify the abstract and the context-free grammars.
b. What code should be generated for printR? Assume the same activation record layout as in the lectures, i.e. header, local variables, and temporaries, and that arguments are pushed on the stack by the caller.
You must not replace the recursive calls by iteration. You must use a labeling scheme that would avoid name clashes in more complex examples.
c. Draw a diagram showing the stack of activation records just before
k=0 is printed for the case that n=2. You should indicate where the
dynamic and static links point and the values of variables, parameters,
and temporaries. The static links should be correct even if they are not
used in this example.
Repetition
F14-F01
F14: Machine code generation
Overall knowledge about:
I Machine architecture with CPU, registers, and memory.
F13: Optimization
SSA form (Static Single Assignment)
I A powerful representation for optimization.
Typical optimizations at the intermediate code level:
I Dominance analysis.
I Copy propagation.
I Constant propagation.
I . . .
Typical optimizations at the machine code level:
I Register allocation.
I Instruction scheduling (to take advantage of pipelining).
F12: Memory Management
Overall knowledge:
I The difference between manual and automatic memory management.
I Terminology: fragmentation, memory leak, dangling pointer, compaction, root pointer, . . .
I Main ideas in the main algorithms: reference counting, mark-sweep, copying, generation-based, conservative, incremental, . . .
I Main benefits and drawbacks of the different algorithms.
You don’t have to:
I Memorize the details of the algorithms.
F11: Intermediate Code
You should know:
I What different kinds of intermediate code are there?
I Why temporary variables are needed and how they are handled.
I Advantages of using intermediate code.
I Difference between intermediate code and machine code.
I Difference between a virtual machine and a real machine.
I Translate a program to ICode.
I How to implement code generation based on the AST.
You don’t have to:
I Memorize the details of ICode — you may use the ICode reference on the exam.
F10: Run-time systems
You should know:
I Terminology: activation record, stack, stack pointer, frame pointer, static link, dynamic link, return address, object, heap, heap pointer, . . .
I How procedure calls work, with parameter and return value transmission.
I How object creation works.
I How local and non-local variables in procedures are accessed.
I How different kinds of variables are accessed in an OO language.
I What v-tables are and how they are used in OO languages for method calls.
I Draw the execution state at a given point in a given program.
F9: Attribute grammars
You should understand:
I General idea.
I What is the difference between inherited and synthesized attributes?
You should be able to:
I Compute values for synthesized and inherited attributes for a given attribute grammar.
I Perform name analysis using synthesized and inherited attributes.
F8: Name and type analysis
You should know:
I Terminology: name analysis, type analysis, scope, block,
homogeneous blocks, declaration-before-use, bindings, symbol table, . . .
I Different kinds of scope rules.
I The difference between IdDecls and IdUses.
I How to implement name analysis based on the AST.
I Typical kinds of errors that can occur during compilation, and what
different compiler phases they are identified in.
F7: LR parsing
You should understand:
I The principles for how an LR parser works, LR items.
I Why LR is more powerful than LL.
I Typical kinds of unambiguous grammars that can be handled by an LR parser but not by an LL parser.
I Shift and reduce actions.
I What is meant by a Shift/Reduce or Reduce/Reduce conflict?
F6: AST computations, AOP, The visitor pattern
You should know:
I The Visitor pattern and how to use it.
I Intertype declarations (static Aspect-oriented programming) and how to use them.
I The benefits and drawbacks of these techniques, compared to each other and compared to writing tangled code.
I Implement various computations using Visitors and Intertype declarations, e.g., unparsing, metrics, interpretation, name analysis, type checking, computation of information needed for code generation, . . .
F5: Nullable, First and Follow, ... Abstract syntax trees
You should know:
I The principles for how an LL parser works.
I Intuitive definitions: nullable, FIRST, FOLLOW.
I Construct the nullable, FIRST, and FOLLOW tables for any CFG.
I Construct the LL(1) table for a CFG.
I Decide if a grammar is LL(1) or not.
I The difference between a parse tree and an abstract syntax tree.
I The difference between a CFG and an abstract grammar.
I How to design an object-oriented abstract grammar with good names.
I Write down an abstract grammar using the JastAdd notation.
I How to build ASTs using semantic actions.
I How to build the AST when an LL parser is used.
You don’t have to:
I Memorize the API for generated JastAdd classes — you may use the JastAdd manual page on the exam.
I Memorize the JJTree way for building ASTs.
F4: LL Parsing
You should know:
I The different names for LL parsing.
I How to implement an LL parser by hand using recursive procedures.
I Typical kinds of grammars that an LL(1) parser cannot accept.
I Given a CFG with some of these typical problems, construct an equivalent CFG that is LL(1).
I What is the difference between local lookahead and global lookahead?
I What the “dangling else” problem is and how to handle it in an LL parser generator.
I Why it is sometimes useful to extend a CFG by an EOF-rule, and how to do it.
I What is meant by ambiguous and unambiguous grammars.
I Given an ambiguous grammar for expressions, construct an equivalent unambiguous grammar (given associativity and precedence rules).
I Typical kinds of unambiguous grammars that cannot be handled by an LL(1) parser.
I When could such grammars be LL(k)?
I Construct equivalent grammars that are LL(1).
F3: Context-free grammars and Parsing
You should know:
I How to design a clear and simple CFG for a language (disregarding ambiguities, non-LL-ness, etc.).
I Terminology: terminals, nonterminals, productions, start symbol.
I The formal definition of a CFG, G = (N, T , P, S ), and what it means.
I The different notation forms for CFGs.
I Given a grammar on EBNF form, how to construct an equivalent grammar on canonical form, and vice versa.
I What is meant by (leftmost/rightmost) derivation.
I Show that a string belongs to a given language.
REs:
I Typical notation for regular expressions.
I The difference between REs and CFGs.
F2: Regular expressions and Scanning
You should know:
I Typical kinds of tokens and non-tokens.
I How to define typical tokens and non-tokens using regular expressions.
I What typical ambiguities may occur for a set of token definitions?
I How can such ambiguities be resolved?
I What a finite automaton (FA) is.
I The difference between a deterministic and nondeterministic FA.
I How to translate an NFA to a DFA.
I How to implement a scanner based on FAs, including handling
ambiguities between regular expressions.
F1: Introduction
You should know:
I The typical phases in a compiler.
I The typical representations of a program inside a compiler.
I The separation into analysis and synthesis.
I The separation into front end and back end.
I Typical applications of compiler construction techniques (in addition to the typical source-to-machine code compiler).
Beyond ..
Examples of compiler-related research:
I Development of programming editors – textual and graphical.
I Evaluation of reference attributes – incremental/parallel.
I Optimizing compilers for multiprocessors.
I . . .
Examples of compiler-related Master’s thesis projects:
I Extend the Java language – Java 7, Lambda expressions . . .
I Develop IDE for the Modelica Language (Modelon/Ideon)
I Optimize the JModelica compiler (Modelon/Ideon)
I . . .
Let us know if you are interested in a Master’s thesis or PhD thesis project!