1.7
An Example of Compilation
In the rest of this chapter I show, very briefly, the action of some of the phases of compilation, taking as example input the program of figure 1.4. First of all the characters of the program are read in, then these characters are lexically analysed into separate items. Some of the items will represent source program identifiers and will include a pointer to the symbol table.3 Figure 1.5 shows part of the results of lexical analysis: in most languages subdivision of the input into items can be performed without regard to the context in which the name occurs.
Next, the syntax analyser examines the sequence of items produced by the lexical analyser and discovers how the items relate together to form ‘phrase’ fragments, how the phrases inter-relate to form larger phrases and so on. The most general description of these relationships of the fragments is a tree, as shown in figure 1.6.
Each node of the tree includes a tag or type field which identifies the kind of source program fragment described by that node, together with a number of pointers to nodes which describe the subphrases which make it up. Figure 1.7 shows how part of the tree shown in figure 1.6 might actually be represented as a data structure. There are of course a number of different ways of representing the same tree, and that shown in figure 1.7 is merely an example. There are as many different ways of drawing the structure, and throughout most of this book I shall show trees in the manner of figure 1.6. The only significant differences between the two picturings are that, in the first, nodes aren’t shown as sequences of boxes and names aren’t shown as pointers to the symbol table. Figure 1.7 is perhaps more faithful to reality, so it should be borne in mind whenever you encounter a simplified representation like that in figure 1.6.
After syntax analysis, the object description phase takes the tree and the symbol table entries produced by the lexical analyser. It analyses the declarative nodes in the tree, producing descriptive information in the symbol table as shown in figure 1.8. Standard procedures, such as ‘print’, may receive a default declaration before translation: figure 1.8 shows a possible entry in the symbol table. Note that, since the tree contains a pointer to the symbol table in each node which contains a reference to an identifier, neither the object description phase nor the translator need search the symbol table: each need merely follow the pointer to the relevant entry.
After the object description phase has filled in the descriptors in the symbol table, a simple translation phase can take the tree of figure 1.7 together with the symbol table of figure 1.8 and produce an instruction sequence like that shown in figure 1.9.4
3 The item which represents a number may also contain a pointer to a table which contains a representation of the number. For simplicity figures 1.5 and 1.6 show the value of the number as part of the lexical item itself.
18 CHAPTER 1. PHASES AND PASSES
The addresses used in the instructions need finally to be relocated by the loader. Suppose that, when the program is loaded into store, its memory cell space starts at address 23. Then the first line of figure 1.9 would be converted into ‘LOAD 1, 24’ and the last line into ‘STORE 1, 23’.
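Relocation of this kind amounts to adding the load address to each relative address. The following is a minimal Python sketch, not the loader of any particular system. It assumes that instructions are held as (opcode, register, relative address) triples, and that the two lines quoted from figure 1.9 carry the relative addresses 1 and 0; that is an inference from the relocated values 24 and 23, not something read from the figure. The name `relocate` is hypothetical.

```python
def relocate(instructions, base):
    """Add the load address `base` to every relative address."""
    return [(op, reg, addr + base) for (op, reg, addr) in instructions]


# Assumed relative form of the first and last lines of figure 1.9.
relative = [("LOAD", 1, 1), ("STORE", 1, 0)]

# The program's memory cell space starts at address 23.
loaded = relocate(relative, 23)

for op, reg, addr in loaded:
    print(f"{op} {reg}, {addr}")
```

Under those assumptions the two instructions come out as ‘LOAD 1, 24’ and ‘STORE 1, 23’, matching the text.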
Optimisation of the object code in this case could produce an enormous improvement in its execution efficiency. The total effect of the program is to print the number ‘4’. By looking at the way in which the values of variables are used throughout the program and by deferring translation of assignment statements until their result is required – in this case they are never required – an optimisation phase could reduce the program just to the single statement ‘print(4)’. Optimisation is a mighty sledgehammer designed for bigger nuts than this example, of course, and it is always an issue whether the expense of optimisation is worth it: in the case of figure 1.4 it would certainly cost more to optimise the program than it would to run it!
Summary
The underlying organisation of compilers is simple and modular. This chapter discusses how the various phases cooperate so that later chapters can concentrate on the separate phases in isolation.
Input and lexical analysis is discussed in chapters 4 and 8; syntax analysis in chapters 3, 16, 17 and 18; object description in chapter 8; translation in chapters 5, 6, 7 and 9; optimisation in chapter 10; loading in chapter 4; run-time support in chapters 11, 12, 13 and 14; run-time debugging in chapter 20.
Chapter 2
Introduction to Translation
The most important task that a compiler performs is to translate a program from one language into another – from source language to object language. Simple translation is a mechanism which takes a representation of a fragment of the source program and produces an equivalent fragment in the object language – a code fragment which, when executed by the object machine, will perform the operations specified by the original source fragment.
Since the object program produced by a simple translator consists of a sequence of relatively independent object code fragments, it will be less efficient than one produced by a mechanism which pays some attention to the context in which each fragment must operate. Optimisation is a mechanism which exists to cover up the mistakes of simple translation: it translates larger sections of program than the simple translator does, in an attempt to reduce the object code inefficiencies caused by poor interfacing of code fragments.
In order to be able to produce object code phrases the translator must have access to a symbol table which provides a mapping from source program names to the run-time objects which they denote. This table is built by the lexical analyser (see chapters 4 and 8) which correlates the various occurrences of each name throughout the program. The mapping to run-time objects is provided by the object description phase (see chapter 8) which processes declarative information from the source program to associate each identifier in the symbol table with a description of a run-time object.
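The way in which a lexical analyser can correlate the occurrences of a name may be sketched in Python as follows. This is an illustration under stated assumptions, not the mechanism of chapters 4 and 8: the class `SymbolTable`, the function `lex`, the item layout, the tiny token grammar and the input line are all hypothetical. The point is only that every occurrence of a name yields an item which points to the same table entry, so that a descriptor filled in later is visible from all of them.

```python
import re


class SymbolTable:
    """Maps each source name to a single shared entry."""

    def __init__(self):
        self.entries = {}

    def lookup(self, name):
        # Create the entry on the first occurrence, reuse it afterwards.
        return self.entries.setdefault(name, {"name": name, "descriptor": None})


# Hypothetical item grammar: names, unsigned integers, a few symbols.
TOKEN = re.compile(
    r"\s*(?:(?P<name>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>:=|[-+*/=()]))"
)


def lex(text, table):
    """Split `text` into items; name items point into `table`."""
    text = text.rstrip()
    items, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = m.lastgroup
        if kind == "name":
            items.append(("name", table.lookup(m.group(kind))))
        elif kind == "number":
            items.append(("number", int(m.group(kind))))
        else:
            items.append(("symbol", m.group(kind)))
        pos = m.end()
    return items


table = SymbolTable()
items = lex("a := b + a", table)

# Both occurrences of 'a' share one entry.
print(items[0][1] is items[4][1])

# A later phase fills in the descriptor once; every item sees it.
table.lookup("a")["descriptor"] = "integer variable (assumed)"
print(items[4][1]["descriptor"])
```

Because the items hold pointers rather than copies of the name, later phases never need to search the table.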
This chapter introduces the notion of a ‘tree-walking’ translator, which I believe is a mechanism that is not only easy to construct but which can readily and reliably produce efficient object code fragments. Such a translator consists of a number of mutually recursive procedures, each of which is capable of translating one kind of source program fragment and each of which can generate a variety of different kinds of object code fragments depending on the detailed structure of the source fragment which is presented to it.
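A tree-walking translator of this kind might be sketched in Python as below. It is a minimal illustration under assumptions, not the translator developed in later chapters: the tuple layout of the nodes, the procedure names, the opcode names (including the register-to-register forms ending in ‘r’) and the ‘=60’ notation for a literal operand are all hypothetical. The procedures call one another recursively, and `trans_binary` chooses between two code fragments according to the structure of its right operand.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MULT"}


def is_leaf(node):
    return node[0] in ("name", "number")


def address(leaf):
    # Hypothetical operand syntax: '=60' for a literal, else the name.
    kind, value = leaf
    return f"={value}" if kind == "number" else value


def trans_expression(node, reg, code):
    """Emit code which leaves the value of `node` in register `reg`."""
    if is_leaf(node):
        code.append(f"LOAD {reg}, {address(node)}")
    else:
        trans_binary(node, reg, code)


def trans_binary(node, reg, code):
    op, left, right = node
    trans_expression(left, reg, code)
    if is_leaf(right):
        # Simple right operand: operate directly on the store cell.
        code.append(f"{OPCODES[op]} {reg}, {address(right)}")
    else:
        # Complex right operand: evaluate it in the next register first.
        trans_expression(right, reg + 1, code)
        code.append(f"{OPCODES[op]}r {reg}, {reg + 1}")


def trans_statement(node, code):
    tag, target, expression = node
    if tag != ":=":
        raise ValueError(f"cannot translate statement of kind {tag!r}")
    trans_expression(expression, 1, code)
    code.append(f"STORE 1, {target[1]}")


# total := hour*60 + minute
tree = (":=", ("name", "total"),
        ("+", ("*", ("name", "hour"), ("number", 60)), ("name", "minute")))
code = []
trans_statement(tree, code)
print("\n".join(code))
```

For this tree the sketch produces four instructions: load hour, multiply by 60, add minute, store in total. Presented with ‘a*(b+c)’ instead, `trans_binary` would take its second branch and use a second register.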
Because the process of translation depends on the selection at each point of one of a number of possible code fragments, translators tend to be voluminous, but since the selection of a fragment is fairly simple and generating the instructions is very straightforward, they tend to run very rapidly. A simple translator will usually use less than 15% of the machine time used by all phases of the compiler together, but will make up more than 50% of the compiler’s own source code.

Statement: if hour*60+minute=1050 or tired then leave(workplace)

conditional statement
    [EXPRESSION] Boolean or
        [LEFT] relation =
            [LEFT] arithmetic +
                [LEFT] arithmetic *
                    [LEFT] name hour
                    [RIGHT] number 60
                [RIGHT] name minute
            [RIGHT] number 1050
        [RIGHT] name tired
    [STATEMENT] procedure call
        [PROCEDURE] name leave
        [ARGUMENTS] name workplace

Figure 2.1: Tree describing a simple statement
2.1
Phrases and Trees
In this chapter I introduce the technical term phrase, which is used to describe a logically complete fragment of source program. Statements, for example, are phrases; so are expressions. Phrases are either ‘atomic’ – e.g. a name, a number – or are made up of sub-phrases – e.g. a statement may have a sub-phrase which is an expression, the statement itself may be a sub-phrase of a block which is a sub-phrase of a procedure declaration, and so on. In this chapter the largest phrase shown in the examples is an expression or a simple structured statement. In a multi-pass compiler the unit of translation is the entire program – at least at the topmost level – but a single page isn’t large enough to show the necessary trees!

[Figure 2.2: data-structure representation of the tree of figure 2.1. Record nodes tagged cond, or, =, +, *, call, numb 1050 and numb 60; name fields point to the symbol table entries for tired, leave, workplace, minute and hour.]
The state of the art in programming language translation is now, and will remain for some time, that a program can only be translated if it is first analysed to find how the phrases inter-relate: then the translator can consider each phrase in turn. The result of the analysis shows how the program (the phrase being translated) can be divided into sub-phrases and shows how these sub-phrases inter-relate to make up the entire phrase. For each sub-phrase the description shows how it is divided into sub-sub-phrases and how they are inter-related. The description continues to show the subdivision of phrases in this way until the atomic phrases – the items of the source text – are reached.
The most general representation of the results of such an analysis is a tree like that in figure 2.1. The lines are branches, the places where branches start and finish are nodes. The topmost node of the tree is its root – the tree is conventionally drawn ‘upside down’. Nodes which don’t divide into branches are leaves and represent the atomic items of the source program. Each phrase of the source program is represented by a separate node of the tree. To translate a phrase it is merely necessary to concentrate on one node and, if it has sub-nodes, to translate its sub-nodes as well.
Figure 2.2 shows a possible data-structure representation of the tree of figure 2.1 using record-vectors and pointers. Many other representations are possible – for example one in which each pointer is represented by a subscript of a large integer vector. The tree structure, no matter how it is represented, gives the translator the information it needs in order to translate the statement – it shows which phrases are related, and how they are related. From the source statement it isn’t immediately obvious what the object program should do. Should it, for example, evaluate ‘hour*60’ ? Must it evaluate ‘minute=1050’ ? What about ‘1050 or tired’ ? A glance at the tree gives the answers – only the first is represented by a node because analysis has shown it to be a phrase of the program.
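One such representation can be sketched in Python, with a record per node and name nodes holding pointers to shared symbol table entries. The sketch is only one of the many possibilities mentioned above, and the names `Node`, `text` and `phrases`, the field layout and the dictionary used as a symbol table are all hypothetical. It builds the tree of figure 2.1 and then asks which fragments of the source statement correspond to nodes.

```python
from dataclasses import dataclass


@dataclass
class Node:
    tag: str                 # kind of phrase: 'cond', 'or', '=', ...
    children: tuple = ()     # pointers to sub-phrase nodes
    value: object = None     # symbol table entry or number, for leaves


# One shared symbol table entry per name.
names = ("hour", "minute", "tired", "leave", "workplace")
symbols = {n: {"name": n} for n in names}


def name(n):
    return Node("name", value=symbols[n])


def numb(v):
    return Node("numb", value=v)


tree = Node("cond", (
    Node("or", (
        Node("=", (
            Node("+", (Node("*", (name("hour"), numb(60))), name("minute"))),
            numb(1050))),
        name("tired"))),
    Node("call", (name("leave"), name("workplace")))))


def text(node):
    """Reconstruct the source text of the phrase described by `node`."""
    if node.tag == "name":
        return node.value["name"]
    if node.tag == "numb":
        return str(node.value)
    first, second = (text(child) for child in node.children)
    if node.tag == "cond":
        return f"if {first} then {second}"
    if node.tag == "call":
        return f"{first}({second})"
    if node.tag == "or":
        return f"{first} or {second}"
    return f"{first}{node.tag}{second}"


def phrases(node):
    """Yield the text of `node` and of every node beneath it."""
    yield text(node)
    for child in node.children:
        yield from phrases(child)


found = set(phrases(tree))
for fragment in ("hour*60", "minute=1050", "1050 or tired"):
    print(fragment, "is a phrase:", fragment in found)
```

Only ‘hour*60’ is reported as a phrase, which is the answer the tree gives at a glance.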
In order to be able to translate a phrase of a program it is essential to know what kind of phrase it is, what its sub-phrases are and how they are inter-related. There can be non-tree-like representations of the necessary information, but the tree holds and displays the information in its most accessible and useful form. In addition it emphasises the essentially recursive nature of translation – a nature which is somewhat hidden, but not denied, in so-called ‘linear’ representations.