Compiler and Language Processing Tools
Summer Term 2011
Introduction
Prof. Dr. Arnd Poetzsch-Heffter
Software Technology Group TU Kaiserslautern
Prof. Dr. Arnd Poetzsch-Heffter Compilers 1
Outline
1. Introduction
  • Language Processing Tools
  • Application Domains
  • Tasks of Language-Processing Tools
  • Examples
2. Language Processing
  • Terminology and Requirements
  • Compiler Architecture
3. Compiler Construction
Language processing tools
• Processing of source texts in (source) languages
• Analysis of (source) texts
• Translation to target languages
Language processing tools (2)
Typical source languages
• Programming languages: C, C++, C#, Java, Scala, Haskell, ML, Smalltalk, Prolog
• Script languages: JavaScript, bash
• Languages for configuration management: make, ant
• Application and tool-specific languages: Excel, JFlex, CUP
• Specification languages: Z, CASL, Isabelle/HOL
• Formatting and data description languages: LaTeX, HTML, XML
• Design and architecture description languages: UML, SDL, VHDL, ...
Language processing tools (3)
Typical target languages
• Assembly, machine, and bytecode languages
• Programming languages
• Data and layout description languages
• Languages for printer control
• ...
Language processing tools (4)
Language implementation tasks
• Tool support for language processing
• Integration into existing systems
• Connection to other systems
Application domains
• Programming environments
  - Context-sensitive editors, class browsers
  - Graphical programming tools
  - Pre-processors
  - Compilers
  - Interpreters
  - Debuggers
  - Run-time environments (loading, linking, execution, memory management)
Application domains (2)
• Generation of programs from design documents (UML)
• Program comprehension, re-engineering
• Design and implementation of domain-specific languages
  - Robot control
  - Simulation tools
  - Spreadsheets, active documents
• Web technology
  - Analysis of Web sites
  - Active Web sites (with integrated functionality)
  - Abstract platforms, e.g. JVM, .NET
Related fields
• Formal languages, language specification and design
• Programming and specification languages
• Programming, software engineering, software generation, software architecture
• System software, computer architecture
Tasks of Language-Processing Tools
[Figure: three kinds of tools.
Analyser: Source Code → Analysis Results.
Translator: Source Code → Target Code.
Interpreter: Source Code and Input Data → Output Data.]
Analysis, translation and interpretation are often combined.
Tasks of Language-Processing Tools (2)
1. Translation
  - Compiler implements analysis and translation
  - OS and real machine implement interpretation
Pros:
  - Most efficient solution
  - One interpreter for different programming languages
  - Prerequisite for other solutions
Tasks of Language-Processing Tools (3)
2. Direct interpretation
  - Interpreter implements all tasks.
  - Examples: JavaScript, command line languages (bash)
  - Pros: No translation necessary (but analysis at run-time)
Tasks of Language-Processing Tools (4)
3. Abstract and virtual machines
  - Compiler implements analysis and translation to abstract machine code
  - Abstract machine works as interpreter
  - Examples: Java/JVM, C#/.NET
  - Pros:
    • Platform independent (portability, mobile code)
    • Self-modifying programs possible
4. Other combinations
Example: Analysis
Source file b1_1/Weltklasse.java (the keyword "implements" is misspelled as "implement"):

    package b1_1;

    class Weltklasse
    extends Superklasse
    implement BesteBohnen {
      Qualifikation studieren ( Arbeit schweiss ) {
        return new Qualifikation();
      }
    }

The javac analyser reads this file together with Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ... and reports:

    b1_1/Weltklasse.java:4: '{' expected.
    extends Superklasse
                       ^
    1 error
Example: Translation
Source file Weltklasse.java:

    package b1_1;
    class Weltklasse extends Superklasse implements BesteBohnen {
      Qualifikation studieren ( Arbeit schweiss ) {
        return new Qualifikation();
      }
    }

javac translates this file, using Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ..., into:

    Compiled from Weltklasse.java
    class b1_1/Weltklasse extends ... implements ... {
      b1_1/Weltklasse();
      b1_1.Qualifikation studieren(...);
    }
    Method b1_1/Weltklasse()
    ...
    Method b1_1.Qualifikation studieren(...)
    ...
Example: Translation (2)
Result of translation

    Compiled from Weltklasse.java
    class b1_1/Weltklasse extends b1_1.Superklasse implements b1_1.BesteBohnen {
      b1_1/Weltklasse();
      b1_1.Qualifikation studieren(b1_1.Arbeit);
    }

    Method b1_1/Weltklasse()
      0 aload_0
      1 invokespecial #6 <Method b1_1.Superklasse()>
      4 return

    Method b1_1.Qualifikation studieren(b1_1.Arbeit)
      0 new #2 <Class b1_1.Qualifikation>
      3 dup
      4 invokespecial #5 <Method b1_1.Qualifikation()>
      7 areturn
Example 2: Translation
Source file hello_world.c:

    #include <stdio.h>  /* declares printf */

    int main() {
      printf("Willkommen zur Vorlesung!");
      return 0;
    }

gcc translates this file into:

    .file "hello_world.c"
    .version "01.01"
    gcc2_compiled.:
    .section .rodata
    .LC0:
        .string "Willkommen zur Vorlesung!"
    .text
        .align 16
    .globl main
        .type main,@function
    main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        ...
Example 2: Translation (2)
Result of translation

    .file "hello_world.c"
    .version "01.01"
    gcc2_compiled.:
    .section .rodata
    .LC0:
        .string "Willkommen zur Vorlesung!"
    .text
        .align 16
    .globl main
        .type main,@function
    main:
        pushl %ebp
        movl %esp,%ebp
        subl $8,%esp
        addl $-12,%esp
        pushl $.LC0
        call printf
        addl $16,%esp
        xorl %eax,%eax
        jmp .L2
        .p2align 4,,7
    .L2:
        movl %ebp,%esp
        popl %ebp
        ret
    .Lfe1:
        .size main,.Lfe1-main
        .ident "GCC: (GNU) 2.95.2 19991024 (release)"
Example 3: Translation
groovy.tex (104 bytes):

    \documentclass{article}
    \begin{document}
    \vspace*{7cm}\centerline{\Huge\bf It's groovy}
    \end{document}

latex translates groovy.tex into groovy.dvi (207 bytes, binary); dvips translates groovy.dvi into groovy.ps (7136 bytes):

    %!PS-Adobe-2.0
    %%Creator: dvips(k) 5.86 ...
    %%Title: groovy.dvi
    ...
Example: Interpretation
[Figure: The Java Virtual Machine (JVM) interprets the bytecode of a .class file (e.g. ... 14 iload_1, 15 iload_2, 16 idiv, 17 istore_3 ...). It reads the input data and produces the output data.]
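The interpretation step can be illustrated with a toy stack machine. The sketch below is not the JVM. It mimics only the four instructions shown above; the function name run and the list-based representation of local variables are assumptions made for this illustration.

```python
def run(code, local_vars):
    """Interpret a list of instruction names on an operand stack.

    local_vars is a list indexed by local-variable number; it is
    updated in place and returned.
    """
    stack = []
    for instr in code:
        if instr.startswith("iload_"):
            # push the value of local variable n
            stack.append(local_vars[int(instr[len("iload_"):])])
        elif instr.startswith("istore_"):
            # pop the top of the stack into local variable n
            local_vars[int(instr[len("istore_"):])] = stack.pop()
        elif instr == "idiv":
            divisor = stack.pop()
            dividend = stack.pop()
            # JVM idiv truncates toward zero, Python's // floors, so
            # divide the absolute values and restore the sign afterwards.
            q = abs(dividend) // abs(divisor)
            stack.append(q if (dividend < 0) == (divisor < 0) else -q)
        else:
            raise ValueError("unknown instruction: " + instr)
    return local_vars


# locals 1 and 2 hold the input data; local 3 receives the output
result = run(["iload_1", "iload_2", "idiv", "istore_3"], [0, 17, 5, 0])
```

Here the bytecode plays the role of the source code of the interpreter, and the local variables carry input and output data.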
Example: Combined technique
Java implementation with just-in-time (JIT) compiler
[Figure: A Java source code unit is processed by javac (analyser and translator) into Java bytecode (.class file). The JVM executes the .class file on the input data. Its JIT translator produces machine code that runs on the real machine/hardware and yields the output data. (JIT = just in time)]
Language processing: The task of translation
[Figure: Source Code → Translator → Error Message or Target Code]
• Translator (in a broader sense):
Analysis, optimization and translation
• Source code:
Input (string) for translator in syntax of source language (SL)
• Target Code:
Output (string) of translator in syntax of target language (TL)
Phases of language processing
• Analysis of input:
  - Program text
  - Specification
  - Diagrams
• Dependent on target of implementation
  - Transformation (XSLT, refactoring)
  - Pretty printing, formatting
  - Semantic analysis (program comprehension)
  - Optimization
  - (Actual) translation
Compile time vs. run-time
• Compile time: during run-time of compiler/translator
Static: All information/aspects known at compile time, e.g.:
  - Type checks
  - Evaluation of constant expressions
  - Relative addresses
• Run-time: during run-time of compiled program
Dynamic: All information that is not statically known, e.g.:
  - Allocation of dynamic arrays
  - Bounds check of arrays
  - Dynamic binding of methods
  - Memory management of recursive procedures
For dynamic aspects that cannot be handled at compile time, the compiler generates code that handles them at run-time.
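Evaluation of constant expressions is a typical static task. A minimal sketch, assuming expressions are given as nested tuples (op, left, right) with integers as constants and strings as variable names. This representation and the name fold are assumptions for illustration. Constant subexpressions are evaluated at compile time; everything that depends on a variable is left for run-time.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Return expr with all constant subexpressions evaluated."""
    if not isinstance(expr, tuple):
        return expr  # integer constant or variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # known statically: compute now
    return (op, left, right)  # depends on a variable: keep for run-time


# (2 * 3) + x becomes 6 + x; the value of x stays dynamic
folded = fold(("+", ("*", 2, 3), "x"))
```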
What is a good compiler?
Requirements for translators
• Error handling (static/dynamic)
• Efficient target code
• Choice: Fast translation with slow code vs. slow translation with fast code
• Semantically correct translation
Semantically correct translation
Intuitive definition: Compiled program behaves according to language definition of source language.
Formal definition:
• semSL: SL_Program × SL_Data → SL_Data
• semTL: TL_Program × TL_Data → TL_Data
• compile: SL_Program → TL_Program
• code: SL_Data → TL_Data
• decode: TL_Data → SL_Data
Semantic correctness:
semSL(P,D) = decode(semTL(compile(P), code(D)))
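The correctness equation can be tried out on a toy pair of languages. In the sketch below, all concrete choices are assumptions for illustration: SL programs are expression tuples over "+" and "*" with the input named "d", TL programs are stack code, and TL data is a one-cell memory. Checking a few inputs only tests the equation; it does not prove it.

```python
def sem_sl(p, d):
    """semSL: evaluate expression p on input d."""
    if p == "d":
        return d
    if isinstance(p, int):
        return p
    op, left, right = p
    a, b = sem_sl(left, d), sem_sl(right, d)
    return a + b if op == "+" else a * b


def compile_sl(p):
    """compile: translate an expression to postfix stack code."""
    if p == "d":
        return [("load",)]
    if isinstance(p, int):
        return [("push", p)]
    op, left, right = p
    last = ("add",) if op == "+" else ("mul",)
    return compile_sl(left) + compile_sl(right) + [last]


def sem_tl(instructions, data):
    """semTL: run stack code on a one-cell memory, return the new memory."""
    stack = []
    for instr in instructions:
        if instr[0] == "load":
            stack.append(data[0])
        elif instr[0] == "push":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "add" else a * b)
    return [stack.pop()]


def code(d):
    return [d]   # SL_Data -> TL_Data


def decode(t):
    return t[0]  # TL_Data -> SL_Data


def is_correct_on(p, d):
    """Check semSL(P,D) = decode(semTL(compile(P), code(D))) for one P, D."""
    return sem_sl(p, d) == decode(sem_tl(compile_sl(p), code(d)))


program = ("+", ("*", "d", 3), 4)  # d * 3 + 4
checked = all(is_correct_on(program, d) for d in range(-5, 6))
```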
Compiler Architecture
[Figure: compiler pipeline]
Analysis:
Source Code as String → Scanner → Token Stream → Parser → Syntax Tree → Name and Type Analysis → Decorated Syntax Tree (close to SL)
Synthesis:
Decorated Syntax Tree → Translator → Intermediate Language → Code Generator → Target Code as String
Optimization steps in the figure: Attribution & Optimization (shown twice) and Peep Hole Optimization
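The first phase of the pipeline, from source code as a string to a token stream, can be sketched with regular expressions. The token classes and the name scan are assumptions for this illustration; a real scanner is defined by the lexical rules of the source language.

```python
import re

# Token classes as (name, regular expression) pairs; order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def scan(source):
    """Turn source code (a string) into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(
                "illegal character %r at position %d" % (source[pos], pos))
        if match.lastgroup != "SKIP":  # white space produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("x = 3 * (y + 42)")
```

The parser then works on the token list and no longer on individual characters.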
Properties of compiler architectures
• Phases are conceptual units of translation
• Phases can be interleaved
• Design of phases depends on source language, target language and design decisions
• Phase vs. pass (a phase can comprise more than one pass)
• Separate translation of program parts (interface information must be accessible)
• Combination with other architecture decisions: common intermediate language
Common intermediate language
[Figure: Source Language 1, Source Language 2, ..., Source Language n → Intermediate Language → Target Language 1, Target Language 2, ..., Target Language m]
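A commonly cited motivation for this architecture is the number of translators required. The small calculation below is an illustration under that assumption, not a statement from the slides; the function name is hypothetical.

```python
def translators_needed(n, m):
    """Compare translator counts for n source and m target languages."""
    direct = n * m  # one compiler per (source, target) pair
    via_il = n + m  # n front ends into the IL, m back ends out of it
    return direct, via_il


counts = translators_needed(4, 3)
```

For 4 source and 3 target languages this gives 12 direct compilers versus 7 components that share the intermediate language.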
Dimensions of compiler construction
• Programming languages
  - Sequential procedural, imperative, OO-languages
  - Functional, logical languages
  - Parallel languages/language constructs
• Target languages/machines
  - Code for abstract machines
  - Assembler
  - Machine languages (CISC, RISC, ...)
  - Multi-processor/multi-core architectures
  - Memory hierarchy
• Translation tasks: analysis, optimization, synthesis
• Construction techniques and tools: bootstrapping, generators
• Portability, specification, correctness
Compiler construction techniques
1. Stepwise construction
  - Construction with compiler for different language
  - Construction with compiler for different machine
  - Bootstrapping
2. Compiler-compiler: Tools for compiler generation
  - Scanner generators (regular expressions)
  - Parser generators (context-free grammars)
  - Attribute evaluation generators (attribute grammar)
  - Code generator generators (machine specification)
  - Interpreter generators (semantics of language)
  - Other phase-specific tools
3. Special programming techniques
  - General technique: syntax-driven
  - Special technique: recursive descent
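Recursive descent can be sketched for a small expression grammar: each nonterminal becomes one function, and the value is computed while parsing (syntax-driven). The grammar, the function name evaluate, and the restriction to +, - and * are assumptions for this illustration.

```python
import re


def evaluate(source):
    """Parse and evaluate an arithmetic expression by recursive descent."""
    # Simplified scanning: characters outside the token classes are skipped.
    tokens = re.findall(r"\d+|[-+*()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():  # expr -> term (('+' | '-') term)*
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():  # term -> factor ('*' factor)*
        value = factor()
        while peek() == "*":
            advance()
            value *= factor()
        return value

    def factor():  # factor -> NUMBER | '(' expr ')'
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if token.isdigit():
            return int(token)
        if token == "(":
            value = expr()
            if peek() != ")":
                raise SyntaxError("')' expected")
            advance()
            return value
        raise SyntaxError("unexpected token " + token)

    result = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token " + peek())
    return result


value = evaluate("2 + 3 * (4 - 1)")
```

The call structure of expr, term and factor mirrors the grammar rules, which is why the technique needs no parser generator.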
Stepwise construction
Programming typically depends on an existing compiler for the implementation language. For compiler construction, this does not hold in general.
Source, target, and implementation languages of compilers can be denoted in T-diagrams.
[Figure: T-diagram with SL and TL on the crossbar and CL in the stem]
The T-diagram denotes a compiler from source language SL to target language TL (SL → TL compiler) written in language CL.
Construction with compiler for different language
• Given: C → ML (machine language) compiler in ML
• Construct: SL → ML compiler in ML
• Solution: Develop SL → ML compiler in C, then translate that compiler from C to ML by using the existing C → ML compiler
[Figure: the SL → ML compiler in C (to be developed) is translated by the existing C → ML compiler in ML, yielding the SL → ML compiler in ML (by translation).]
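The combination of T-diagrams can be modelled as a small data structure. The sketch below is only a bookkeeping model: the names Compiler and translate are assumptions for illustration, and nothing is actually compiled.

```python
from collections import namedtuple

# A compiler from `source` to `target`, itself written in `impl`.
Compiler = namedtuple("Compiler", ["source", "target", "impl"])


def translate(program, compiler):
    """Feed the text of `program` (itself a compiler) to `compiler`.

    The result translates the same languages as `program`, but is now
    written in the target language of `compiler`.
    """
    if program.impl != compiler.source:
        raise ValueError("compiler cannot read language " + program.impl)
    return program._replace(impl=compiler.target)


existing = Compiler(source="C", target="ML", impl="ML")
developed = Compiler(source="SL", target="ML", impl="C")  # written by hand
result = translate(developed, existing)
```

The check in translate corresponds to the rule that two T-diagrams only fit together if the stem of one matches the source arm of the other.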
Construction with compiler for different machine
• Construct: C → ML1 compiler in ML1
• Given:
1. C → ML1 compiler in C
2. C → ML2 compiler in ML2
• Method: construct cross compiler
[Figure, first step: the given C → ML1 compiler in C is translated by the given C → ML2 compiler in ML2, yielding a C → ML1 cross compiler in ML2.]
Construction with compiler for different machine (2)
[Figure, second step: the given C → ML1 compiler in C is translated by the C → ML1 cross compiler in ML2, yielding the resulting C → ML1 compiler in ML1.]
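Both steps can be replayed in a bookkeeping model of T-diagrams. The names Compiler and translate and the machine parameter are assumptions for this sketch; it only tracks languages and does not compile anything.

```python
from collections import namedtuple

# A compiler from `source` to `target`, itself written in `impl`.
Compiler = namedtuple("Compiler", ["source", "target", "impl"])


def translate(program, compiler, machine):
    """Translate `program` with `compiler`, which must run on `machine`."""
    if compiler.impl != machine:
        raise ValueError("compiler does not run on " + machine)
    if program.impl != compiler.source:
        raise ValueError("compiler cannot read language " + program.impl)
    return program._replace(impl=compiler.target)


c_to_ml1_in_c = Compiler("C", "ML1", "C")      # given (1)
c_to_ml2_in_ml2 = Compiler("C", "ML2", "ML2")  # given (2)

# First step, on machine 2: the result emits ML1 code but runs on machine 2.
cross = translate(c_to_ml1_in_c, c_to_ml2_in_ml2, machine="ML2")
# Second step, still on machine 2: the cross compiler translates its own source.
native = translate(c_to_ml1_in_c, cross, machine="ML2")
```

Note that both steps run on machine 2; machine 1 is needed only to execute the resulting compiler.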
Bootstrapping
• Construct: SL → ML compiler in ML
• Suppose: no compiler exists yet
• Method:
1. Construct partial languages SLi of SL such that
SL0 ⊂ SL1 ⊂ SL2 ⊂ ... ⊂ SL
2. Implement SL0 compiler for ML in ML
3. Implement SLi+1 compiler for ML in SLi
4. Create SLi+1 compiler for ML in ML
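The method can be written down as a loop over the language levels. The sketch below is a bookkeeping model only: the names Compiler and bootstrap are assumptions for illustration, and each compiler is represented just by its three languages.

```python
from collections import namedtuple

# A compiler from `source` to `target`, itself written in `impl`.
Compiler = namedtuple("Compiler", ["source", "target", "impl"])


def bootstrap(levels, machine="ML"):
    """levels = ["SL0", "SL1", ..., "SL"]; return the compilers obtained in ML."""
    # Step 2: the SL0 compiler is written manually in machine language.
    current = Compiler(levels[0], machine, machine)
    history = [current]
    for prev, nxt in zip(levels, levels[1:]):
        # Step 3: write the SL(i+1) compiler in SLi (by extension).
        written = Compiler(nxt, machine, prev)
        if written.impl != current.source:
            raise ValueError("no compiler available for " + written.impl)
        # Step 4: translate it with the SLi compiler that already runs in ML.
        current = written._replace(impl=current.target)
        history.append(current)
    return history


history = bootstrap(["SL0", "SL1", "SL2", "SL"])
```

Only the first compiler is written in machine language; every later one is written in the language level reached in the previous round.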
Bootstrapping (2)
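[Figure: chain of T-diagrams. The SL0 → ML compiler in ML is written manually. Each SLi+1 → ML compiler is written in SLi (by extension) and then translated with the SLi → ML compiler in ML (by translation), until the SL → ML compiler in ML is obtained.]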
Recommended reading
Wilhelm, Maurer:
• Chap. 1, Introduction (pp. 1–5)
• Chap. 6, Structure of Compilers (pp. 225–238)
Appel:
• Chap. 1, Introduction (pp. 3–14)