9.3 Record Data Structures
9.3.3 Union types and variant records
It is essential, in most record-processing programs, that a pointer in a record be allowed to refer to one of a number of different types of record-vector. ALGOL 68 allows this with ‘union’ types, PASCAL with ‘variant records’.
The union-type mechanism requires the use of ‘conformity clauses’. When accessing a union-type value, the user must provide an expression which selects one of a number of sections of source program, depending on the actual type of the value. Each of the individual sections can then be translated as if the value is of a particular type within the union. The representation of a union-type object must obviously contain information about its type, and the space allocated for such an object must be large enough to hold the largest kind of object within the union.
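A rough modern analogue may help. A Rust `enum` is a union type in this sense: the compiler stores a hidden type tag beside space for the largest member, and `match` plays the part of a conformity clause, selecting one arm of code according to the actual type. The sketch is illustrative only; the record kinds `Point` and `Circle` and the function `width` are hypothetical and are not taken from ALGOL 68.

```rust
// Hypothetical record kinds that a pointer field might refer to.
struct Point {
    x: i32,
    y: i32,
}

struct Circle {
    x: i32,
    y: i32,
    radius: i32,
}

// A union type, roughly `union(point, circle)`: the enum's hidden
// discriminant records the actual type of the value, and the space
// reserved is large enough for the largest member.
enum Shape {
    Pt(Point),
    Circ(Circle),
}

// A conformity clause: choose a section of code by the actual type.
// Inside each arm the value is treated as that one type only.
fn width(s: &Shape) -> i32 {
    match s {
        Shape::Pt(_) => 0,
        Shape::Circ(c) => 2 * c.radius,
    }
}
```

Because every alternative of the `match` must be covered, the translated code can never treat a `Circle` as a `Point` by mistake.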
The variant-record mechanism is less elegant. Each record in PASCAL may contain a number of variant-fields, whose values determine the layout of the record-vector. It is even possible to assign a value to a variant-field, but I know of no PASCAL implementation which would thereupon reorganise the contents of the record-vector, as it should if such an assignment were properly implemented. Access to a field of a record with variant-fields would require, if strictly interpreted, a run-time check on the values of the variant-fields to ensure that the access is valid: once again I know of no PASCAL implementation which does this.
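The strictly interpreted behaviour can be sketched by making the tag check explicit. The sketch assumes nothing about any particular PASCAL implementation: `Rec` mimics a record whose variant fields share the storage `words`, each accessor checks the tag field `k` at run time and returns `None` for an invalid access, and `set_kind` shows one possible reading of ‘reorganising’ the record-vector on tag assignment (here, simply clearing the variant part). All names are hypothetical.

```rust
// Hypothetical PASCAL record with a variant part:
//   record
//     name : integer;
//     case k : kind of
//       num  : (value : integer);
//       pair : (first, second : integer)
//   end
#[derive(Clone, Copy, PartialEq)]
enum Kind {
    Num,
    Pair,
}

struct Rec {
    name: i32,       // fixed field, present whatever the variant
    k: Kind,         // the variant (tag) field
    words: [i32; 2], // storage shared by `value` and by `first`/`second`
}

impl Rec {
    // Strictly interpreted access: check the tag at run time before
    // touching a variant field; None signals an invalid access.
    fn value(&self) -> Option<i32> {
        if self.k == Kind::Num { Some(self.words[0]) } else { None }
    }

    fn first(&self) -> Option<i32> {
        if self.k == Kind::Pair { Some(self.words[0]) } else { None }
    }

    fn second(&self) -> Option<i32> {
        if self.k == Kind::Pair { Some(self.words[1]) } else { None }
    }

    // Assigning to the tag field: the old contents mean nothing in the
    // new variant, so this sketch clears the variant part (an assumption).
    fn set_kind(&mut self, k: Kind) {
        self.k = k;
        self.words = [0; 2];
    }
}
```

The cost the text alludes to is visible here: every variant-field access carries a comparison and a branch that an unchecked implementation omits.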
Summary
The code fragments which perform data structure accessing are crucial to the efficiency of the object program because they so often appear in the inner loop of the source program. In many numerical analysis programs, for example, the efficiency of array access is the most important factor in determining the efficiency of the object program and in a system program the efficiency of vector and record access can be just as important.
Vector accessing is fairly cheap, array accessing relatively expensive. Record accessing is cheapest of all. While setting up the data space for a dynamic array at block entry time it is necessary to set up a dope vector to aid multiplicative addressing and possibly at the same time to set up an addressing vector (Iliffe vector) structure to aid non-multiplicative access.
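The two addressing schemes can be compared in a small sketch. It assumes row-by-row storage of a two-dimensional array `A[l1:u1, l2:u2]` with `l1 <= u1` and `l2 <= u2`, and, to stay within safe Rust, it records the Iliffe vector as row offsets into the data space rather than as machine addresses. `Array2`, `index_dope` and `index_iliffe` are hypothetical names. Both methods compute the same element position, but only the dope-vector route needs a multiplication.

```rust
// Hypothetical 2-D array A[l1:u1, l2:u2], stored row by row in `data`.
struct Array2 {
    data: Vec<i32>,
    // Dope vector: the bounds plus the row length used when multiplying.
    l1: i64,
    u1: i64,
    l2: i64,
    u2: i64,
    row_len: usize,
    // Iliffe vector: offset of the start of each row within `data`.
    rows: Vec<usize>,
}

impl Array2 {
    // Set up data space, dope vector and Iliffe vector at "block entry".
    fn new(l1: i64, u1: i64, l2: i64, u2: i64) -> Self {
        let nrows = (u1 - l1 + 1) as usize;
        let row_len = (u2 - l2 + 1) as usize;
        let rows = (0..nrows).map(|r| r * row_len).collect();
        Array2 { data: vec![0; nrows * row_len], l1, u1, l2, u2, row_len, rows }
    }

    // Multiplicative addressing through the dope vector, with bound checks.
    fn index_dope(&self, i: i64, j: i64) -> usize {
        assert!(self.l1 <= i && i <= self.u1 && self.l2 <= j && j <= self.u2);
        (i - self.l1) as usize * self.row_len + (j - self.l2) as usize
    }

    // Non-multiplicative access through the Iliffe vector: one indexed
    // load and an addition replace the multiplication (no bound checks).
    fn index_iliffe(&self, i: i64, j: i64) -> usize {
        self.rows[(i - self.l1) as usize] + (j - self.l2) as usize
    }
}
```

The price of the Iliffe vector is the extra space for `rows` and the time to fill it at block entry, paid in exchange for removing the multiplication from every access.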
The unnecessary re-calculation of the address of an element of a data structure in the code generated by a tree-walking simple translator can often produce disappointing inefficiencies. However the treatment in this chapter, in chapters 5, 6 and 7 and in section III emphasises the value of tree-walking as an implementation technique which encourages the compiler writer to generate accurate code in the first instance and then allows incremental refinement of individual procedures to improve the quality of local fragments of code. The code optimisation techniques discussed in chapter 10 may be useful to reduce unnecessary object program activity associated with data structure access.
Chapter 10
Code Optimisation
In this book I argue that for a modern, well-designed programming language a compiler which includes a tree-walking simple translation phase can produce acceptable object code, particularly when careful attention is paid to the ‘crucial code fragments’ discussed in chapters 5, 6, 7 and 9 and in section III. No matter how carefully the individual code fragments are designed, though, the code produced by such a compiler can rarely be ‘optimal’ and it is always easy to spot redundant instructions in the object program or to think of ways in which a particular source construct could be better translated in a particular setting. Code optimisation techniques attempt systematically to reduce the disparity between the code produced by a compiler and the code which might be generated by a very careful hand-translation of the source program. The simpler techniques, discussed in this chapter, merely re-order and merge code fragments which might be produced by simple translation, but more advanced techniques can go so far as to replace the source program’s algorithm with a more efficient algorithm of equivalent effect.
Optimisation mechanisms vary not only in their effectiveness but also in the cost of their application. Unusually heavy optimisation of a program can easily double or triple its compilation time. Just where to draw the line between useful and wasteful mechanisms is a matter of taste, but most compiler-writers believe that in practice a few straightforward optimisation techniques can produce most of the effect that is required at a fraction of the cost of application of more advanced and more powerful techniques. Knuth (1971), in an empirical investigation of programs actually submitted to a computing service, supported this intuition by showing how the mechanisms of ‘constant folding’, ‘global register allocation’, ‘deferred storage’ and ‘code motion out of loops’ could at relatively low cost provide quite substantial improvements in the efficiency of most object programs. This chapter accordingly discusses only these simple mechanisms (and because this book is in part a do-it-yourself aid to the construction of a simple compiler it discusses only the principles of the techniques rather than their detailed implementation). In the same paper Knuth showed how more powerful mechanisms could produce dramatic effects on the efficiency of programs, but at a much higher cost because each mechanism was applicable to only a small proportion of real-life programs.

Sub-optimisations
  constant expression evaluation at compile-time
  expression re-ordering
  local register allocation
  common sub-expression recognition
  jumping code for Booleans
True Optimisations
  constant folding
  code motion
  redundant code elimination
  global register allocation
    - deferred storage
    - index optimisation
  loop optimisation
  strength reduction
  peephole optimisation
Figure 10.1: Sub-optimisation and optimisation techniques
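Code motion out of loops, one of the mechanisms in Knuth’s list, is easiest to see as a source-level before-and-after pair. The sketch performs the transformation by hand; `scale_naive` and `scale_hoisted` are hypothetical names, and a real optimiser would make the same change on its internal representation of the program rather than on source text.

```rust
// Before: as a simple translator would compile it, k * m is recomputed
// on every iteration although neither k nor m changes inside the loop.
fn scale_naive(a: &mut [i64], k: i64, m: i64) {
    for x in a.iter_mut() {
        *x = *x * (k * m);
    }
}

// After code motion: the loop-invariant product is computed once, before
// the loop, and kept in a local (ideally a register).
fn scale_hoisted(a: &mut [i64], k: i64, m: i64) {
    let km = k * m; // moved out of the loop
    for x in a.iter_mut() {
        *x = *x * km;
    }
}
```

Hoisting `k * m` is valid only because neither `k` nor `m` can change inside the loop. The hoisted version also evaluates the product when the slice is empty, so if that multiplication could overflow (a panic in a Rust debug build) the two versions are not strictly equivalent: a small instance of the validity problems discussed later in this chapter.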
Simple translation depends on the hierarchical relationship between source program fragments, expressed as a tree. Code optimisation mechanisms depend also upon the control flow between source program fragments, best expressed as a ‘thread’ from node to node of the tree. Wulf et al. (1975) give a detailed description of an optimiser which works in this way. This chapter gives only a brief description of optimisation mechanisms and in many cases the examples show a linearised form of the program – it is worth remembering the advantages of the threaded tree representation, however, and I give in figure 10.4 below an example which illustrates it.
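A minimal sketch of a threaded tree, restricted to a single expression, where the control thread simply runs through the operands before their operator. To avoid raw pointers it keeps the nodes in a vector and uses indices for the `left`, `right` and `thread` links; `Node`, `thread_tree` and `follow` are hypothetical names. A full optimiser would also thread across statements and around conditional and loop constructs, which this sketch does not attempt.

```rust
// Hypothetical threaded-tree node held in an arena (a Vec), with indices
// in place of pointers. `left`/`right` give the hierarchy; `thread` gives
// the node whose code is executed next.
struct Node {
    label: char,
    left: Option<usize>,
    right: Option<usize>,
    thread: Option<usize>,
}

// Thread one expression tree in post-order (operands before operator),
// the order in which a simple translator evaluates it. Returns the index
// of the first node on the thread.
fn thread_tree(nodes: &mut [Node], root: usize) -> usize {
    fn post(nodes: &[Node], n: usize, out: &mut Vec<usize>) {
        if let Some(l) = nodes[n].left {
            post(nodes, l, out);
        }
        if let Some(r) = nodes[n].right {
            post(nodes, r, out);
        }
        out.push(n);
    }
    let mut order = Vec::new();
    post(nodes, root, &mut order);
    for w in order.windows(2) {
        nodes[w[0]].thread = Some(w[1]);
    }
    nodes[*order.last().unwrap()].thread = None;
    order[0]
}

// Follow the thread from its first node, as an optimiser tracing control
// flow would, collecting the labels in execution order.
fn follow(nodes: &[Node], first: usize) -> String {
    let mut s = String::new();
    let mut cur = Some(first);
    while let Some(n) = cur {
        s.push(nodes[n].label);
        cur = nodes[n].thread;
    }
    s
}
```

For the tree of `a+b*c` the thread visits `a`, `b`, `c`, `*`, `+`, so the optimiser can follow execution order without re-walking the hierarchy.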
In practice there is a continuum of translation mechanisms from the simplest transliteration mechanism discussed in chapter 2 to the most sophisticated optimisation mechanisms; nevertheless in figure 10.1 I have tried to draw a line between ‘sub-optimisation’ and ‘optimisation’ proper. I have classified as a sub-optimisation every technique which relates to the translation of a single expression, considered in isolation from the code fragments which surround it. Most language definitions (with the notable exception of ALGOL 60) state that a programmer cannot rely on the order of evaluation of operands or the order of performance of operations within an expression and therefore even a simple translator can impose any convenient order it wishes. True optimisations, for the purposes of this book, are those translation mechanisms which take account of the control context within which a code fragment will operate – i.e. the effects of code fragments which are executed before it and after it.
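Constant expression evaluation at compile-time, the first sub-optimisation in figure 10.1, fits this definition exactly: it looks at one expression tree and nothing else. A minimal tree-walking sketch follows. The `Expr` type and `fold` function are hypothetical, and the sketch uses 64-bit host arithmetic, whereas a real compiler must reproduce the object machine’s arithmetic, including its behaviour on overflow.

```rust
// Hypothetical expression tree for a single expression.
#[derive(Debug, PartialEq)]
enum Expr {
    Const(i64),
    Var(char),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Bottom-up tree walk: evaluate the operands' subtrees first, then
// replace an operator whose operands are both constants by its value.
fn fold(e: Expr) -> Expr {
    match e {
        Expr::Add(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        Expr::Mul(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Const(a), Expr::Const(b)) => Expr::Const(a * b),
            (l, r) => Expr::Mul(Box::new(l), Box::new(r)),
        },
        other => other, // constants and variables are left alone
    }
}
```

Because the walk sees only adjacent operator nodes, `x+1+2`, parsed as `(x+1)+2`, is left unchanged; re-ordering the expression first (another sub-optimisation in figure 10.1) would expose `1+2` to the evaluator.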
10.0.4 Language restrictions and accurate optimisation
A powerful argument against the use of code optimisation techniques is the undeniable fact that rather too many optimising compilers produce an invalid translation of the source program – i.e. an object program which doesn’t produce the effect specified by the source program. Sometimes this is unavoidable, and in such cases it may be necessary to alter the language definition to make it clear that certain restrictions apply to source programs, but frequently it arises because of the nature of the process of optimisation.
A simple translator produces a range of code fragments, each of which is either always right or always wrong. An error in such a translator therefore produces inaccurate code consistently, and during debugging of the compiler such errors are relatively easy to detect: this makes a tree-walking translator very easy to debug.1
Optimisation isn’t like that: it rejects a transparent translation in favour of one which exploits peculiarities of the source program’s algorithm and the details of control flow in the object program. Quite glaring errors of translation can be made which only show up in programs with a particular pattern of control flow, or which exercise a feature of the optimised code which is rarely used. Many optimising compilers, for example, have implemented multiplication and division by a power of 2 by using ‘shift’ instructions, which is normally valid except for the case of division when the operand has the value ‘-1’ (Steele, 1977): an arithmetic right shift of -1 gives -1, whereas integer division of -1 by 2, truncating towards zero, gives 0. That is almost a simple translation error, but in other cases it is common to find that an optimising compiler will produce an incorrect translation given a particular version of the source program yet a correct translation when some trivial alteration is made, such as interchanging the order of a couple of assignment statements.
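The shift problem is easy to reproduce. In Rust, `>>` on a signed integer is an arithmetic (sign-extending) shift, so it rounds towards minus infinity, whereas `/` truncates towards zero; the two agree for non-negative operands but not for -1. The function names are hypothetical, and `div2_by_shift_fixed` shows one standard way of correcting the shift sequence: add 1 to a negative dividend before shifting.

```rust
// Integer division by 2 as most languages define it: truncate towards zero.
fn div2(x: i32) -> i32 {
    x / 2
}

// The tempting replacement: an arithmetic right shift by one place.
// For i32, `>>` sign-extends, which rounds towards minus infinity.
fn div2_by_shift(x: i32) -> i32 {
    x >> 1
}

// A valid shift-based translation: bias negative operands by 2^1 - 1 = 1
// before shifting, so that the result is rounded towards zero.
fn div2_by_shift_fixed(x: i32) -> i32 {
    (x + (x < 0) as i32) >> 1
}
```

-3 behaves like -1 here (`-3 >> 1` is -2 but `-3 / 2` is -1), so the hazard covers every negative dividend that is not an exact multiple of the divisor.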
Such problems arise when a particular code optimisation technique can only validly be employed given certain properties of the object program. When the compiler-writer has failed to check the validity of application of a technique, or has made an ineffective attempt to check it, there is a bug in the compiler. Often, though, the necessary properties of the object program are impossible to check at compile-time. In such a case it may be necessary to define the semantics of the source language in such a way that optimisation is permitted and that users are warned of the possible anomalous effects of code optimisation on some programs. The language definition must, in effect, prohibit source programs which specify one form of processor behaviour when translated transparently but another behaviour when ‘optimised’.
1 Every compiler-writer has the experience of finding translation bugs after the compiler has been operating apparently perfectly for months – bugs which, it often seems, ought to have affected most users of the compiler but didn’t! The time does arrive, though, when you can be fairly confident that there are no more such bugs to be found in a simple translator.
Examples of the restrictions that are applied are the following (paraphrased from the FORTRAN 66 definition):
(i) No function call may alter the value of any other element in the statement in which it occurs.
(ii) If textually identical function calls appear within a single statement then each evaluation must produce the same result and the function must be written so that the net side-effect is the same no matter how many of the function calls are actually executed.
(iii) If a single memory cell can be referred to in a fragment of source program by more than one name then alteration of the value of that cell via one of the names may not immediately alter its value when referred to by any of the other names.
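To see what restrictions (i) and (ii) protect, consider a function with a side effect. In the hypothetical sketch below, `f` increments its argument in the manner of F(X), and `rand` is a toy linear congruential generator standing in for RAND(I). The two evaluation orders are spelt out with explicit temporaries so that each version is unambiguous.

```rust
// Hypothetical F(X): increments its argument and returns the new value.
fn f(x: &mut i32) -> i32 {
    *x += 1;
    *x
}

// X+F(X) evaluated left to right: X is fetched before F changes it.
fn left_first(mut x: i32) -> i32 {
    let a = x;
    let b = f(&mut x);
    a + b
}

// The same expression after node reversal, i.e. F(X)+X: the later
// fetch of X sees the value that F left behind.
fn right_first(mut x: i32) -> i32 {
    let b = f(&mut x);
    let a = x;
    a + b
}

// Toy linear congruential generator standing in for RAND(I); the +1.0
// keeps the result non-zero so that division is always defined.
fn rand(seed: &mut u32) -> f64 {
    *seed = seed.wrapping_mul(1_103_515_245).wrapping_add(12_345);
    f64::from(*seed >> 16) + 1.0
}

// RAND(I)/RAND(I) with both calls actually executed...
fn ratio_two_calls(seed: &mut u32) -> f64 {
    let a = rand(seed);
    let b = rand(seed);
    a / b
}

// ...and after common sub-expression recognition, with a single call.
fn ratio_cse(seed: &mut u32) -> f64 {
    let t = rand(seed);
    t / t
}
```

Called with X = 1, `left_first` gives 3 and `right_first` gives 4, so a program containing X+F(X) is exactly the kind that restriction (i) rules out. Likewise `ratio_two_calls` generally differs from 1.0 while `ratio_cse` is always 1.0.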
Restriction (i) is an extension of that which allows node-reversal in an expression by demanding that ‘X+F(X)’ must produce the same value as ‘F(X)+X’. Restriction (ii) demands that ‘RAND(I)/RAND(I)’ must always evaluate to ‘1.0’ (which is surprising to the user when RAND is a pseudo-random number function!) and allows the compiler to evaluate ‘common sub-expressions’ in a statement only once rather than many times. Restriction (iii) implies that the program fragment
      I = Z+1
      J = Z*I
      A(J) = A(I)
is invalid if I and J can ever refer to the same memory cell, and is imposed to allow the compiler to hold INTEGER values in registers rather than in the memory.
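The effect of restriction (iii) can be reproduced by letting two names share one cell. Rust normally forbids that for mutable data, so this sketch uses `std::cell::Cell`, which permits shared mutation, as a stand-in for FORTRAN `EQUIVALENCE`. Array indices are 0-based, and `transparent` and `optimised` are hypothetical names for a simple and a register-caching translation of the fragment above.

```rust
use std::cell::Cell;

// Transparent translation: every use of I or J fetches its memory cell.
fn transparent(a: &mut [i32], z: i32, i: &Cell<i32>, j: &Cell<i32>) {
    i.set(z + 1);
    j.set(z * i.get());
    let src = a[i.get() as usize];
    a[j.get() as usize] = src;
}

// Optimised translation: after I = Z+1 the value of I is kept in a
// "register" (a local), so later uses do not re-fetch the cell.
fn optimised(a: &mut [i32], z: i32, i: &Cell<i32>, j: &Cell<i32>) {
    let i_reg = z + 1;
    i.set(i_reg);
    j.set(z * i_reg);
    let src = a[i_reg as usize];
    a[j.get() as usize] = src;
}
```

With Z = 2 and separate cells, both translations copy element 3 into element 6. When I and J name the same cell, the transparent version reloads I after J = Z*I has overwritten it with 6 and so copies element 6 onto itself, while the optimised version still holds 3 in its register and copies element 3 into element 6.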
The restrictions above are specific to FORTRAN, but similar restrictions can be applied in any language. Some implementations of PASCAL, for example, go further and demand that no function may produce any side-effects whatsoever. In any language in which a memory cell can be simultaneously referred to by more than one name – say via a parameter name and a global variable name in a block-structured language – some version of restriction (iii) must be applied. By their very nature such restrictions are uncheckable at compile-time (given the design of conventional programming languages) and even partial checks can prove impossibly expensive: the compiler must therefore generate code which assumes that the restrictions have been obeyed. The naive user will not usually fully understand the purpose or the effect of the restrictions, however, and even the most sophisticated programmer will sometimes make a mistake. As a result it’s a common experience that a program which runs apparently ‘correctly’ when given a simple translation will produce very different results when optimised. The effect is to increase the average effort required to debug a program and, as I argue below and in section V, the cost of debugging may be greater than the
cost of executing the unoptimised program throughout its useful life. Which leads me to ask the question –