
Block-structuring and scope

In document Understanding and Writing Compilers (Page 160-166)

8.4 Single Names with Multiple Descriptors

8.4.2 Block-structuring and scope

In block-structured languages such as ALGOL 60, ALGOL 68, PASCAL or PL/1 a single name can refer to different objects in different blocks of the program. The translator must discover which of the various possible objects is intended by noting the context in which the name occurs. The simplest technique is to maintain a list of descriptors in each symbol table entry and to manipulate the list during translation so that the descriptor at the front of the list is always the currently relevant one. All that is required is to collect together the descriptors for a block or procedure (or whatever program section can declare objects). Just before translation of the section commences, each of these descriptors is inserted at the front of the relevant descriptor list; when translation of the section is over, each descriptor can be removed from its list. This is the function of the ‘declare’ and ‘undeclare’ procedures called from TranBlock and TranProcDecl in chapter 7.

To remove the descriptors after a block has been translated it is possible to trace through all the declarations in the block and to remove the first descriptor associated with each name declared there. An alternative mechanism is to link together all the descriptors for the names declared in a block as they are created and to use this link to remove the names at the end of the block. As with the COBOL descriptors above, this mechanism requires that each descriptor contain a pointer to the symbol table entry with which it is associated. There are a variety of other possible techniques. If the descriptors are to be used to create a run-time debugging table (see chapter 20) then it may be necessary to preserve them after they have been unlinked from the symbol table entry.

    begin co (i) in the outer block X is undeclared oc
      ...
      begin int X;
        ... co (ii) in block 1 X is an int oc
        begin real X;
          ... co (iii) in block 2 X is a real oc
        end;
        ... co (iv) X is an int again oc
        begin bool X;
          ... co (v) in block 3 X is a bool oc
        end;
        ... co (vi) and once again an int oc
      end;
      ... co (vii) back in the outer block and undeclared oc
      begin string X;
        ... co (viii) in block 4 X is a string oc
      end;
      ... co (ix) and finally X is undeclared again oc
    end;

State of descriptor list:

    (i)    after program begin:  empty
    (ii)   after block 1 begin:  D1(int)
    (iii)  after block 2 begin:  D2(real), D1(int)
    (iv)   after block 2 end:    D1(int)
    (v)    after block 3 begin:  D3(bool), D1(int)
    (vi)   after block 3 end:    D1(int)
    (vii)  after block 1 end:    empty
    (viii) after block 4 begin:  D4(string)
    (ix)   after block 4 end:    empty

Whichever method is used, the effect is the same, and figure 8.12 shows the way in which the descriptor list would change during the translation of a sample ALGOL 68 program. Note that this mechanism ensures that whenever the name denotes no object at all the descriptor list is empty, which is the situation recognised by the translator when it produces an error report ‘undeclared identifier X’. In effect each descriptor list mimics the operation of a separate descriptor stack.
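The list manipulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (`SymbolTable`, `enter_block`, `leave_block`, `lookup`) are hypothetical and are not the ‘declare’ and ‘undeclare’ procedures of chapter 7. The sketch uses the second removal technique, linking together the descriptors created for a block and giving each descriptor a pointer back to its symbol table entry.

```python
class Descriptor:
    """Describes one object; 'entry' points back to its symbol table entry."""
    def __init__(self, kind, entry):
        self.kind = kind
        self.entry = entry


class SymbolTableEntry:
    """One entry per name; descriptors[0] is the currently relevant one."""
    def __init__(self, name):
        self.name = name
        self.descriptors = []


class SymbolTable:
    def __init__(self):
        self.entries = {}       # name -> SymbolTableEntry
        self.block_links = []   # one list of linked descriptors per open block

    def entry(self, name):
        return self.entries.setdefault(name, SymbolTableEntry(name))

    def enter_block(self, declarations):
        """Insert a descriptor at the front of the list of each declared name."""
        linked = []
        for name, kind in declarations:
            e = self.entry(name)
            d = Descriptor(kind, e)
            e.descriptors.insert(0, d)
            linked.append(d)
        self.block_links.append(linked)

    def leave_block(self):
        """Follow the block's links and remove each descriptor from its list."""
        for d in reversed(self.block_links.pop()):
            removed = d.entry.descriptors.pop(0)
            assert removed is d   # blocks nest, so it must be at the front

    def lookup(self, name):
        e = self.entry(name)
        if not e.descriptors:
            raise LookupError(f"undeclared identifier {name}")
        return e.descriptors[0]


def trace_sample_program():
    """Replay the block structure of the sample program; record X's list."""
    table = SymbolTable()
    states = []

    def snapshot():
        states.append([d.kind for d in table.entry("X").descriptors])

    table.enter_block([]); snapshot()               # (i)    outer block
    table.enter_block([("X", "int")]); snapshot()   # (ii)   block 1
    table.enter_block([("X", "real")]); snapshot()  # (iii)  block 2
    table.leave_block(); snapshot()                 # (iv)
    table.enter_block([("X", "bool")]); snapshot()  # (v)    block 3
    table.leave_block(); snapshot()                 # (vi)
    table.leave_block(); snapshot()                 # (vii)  end of block 1
    table.enter_block([("X", "string")]); snapshot()  # (viii) block 4
    table.leave_block(); snapshot()                 # (ix)
    table.leave_block()                             # end of outer block
    return states
```

Calling `trace_sample_program()` yields the nine list states (i) to (ix), and `lookup("X")` on an empty list raises the ‘undeclared identifier’ error in the way the text describes.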

Summary

The symbol table is used to correlate the separate occurrences of a name during compilation of a program. Because the lexical analyser must search the table each time it encounters a name or keyword in the source program, it should be searched by a ‘hash addressing’ technique for maximum efficiency. It’s convenient to use the symbol table to recognise keywords. It may be useful to build a similar table which correlates multiple uses of particular constants in the source program.

The descriptors held in the symbol table give information about the run-time objects which source program names denote. An important component of this information is the run-time address of an object. By keeping track of locations within a data frame the object description phase can assign addresses to all data objects: the translator or the loader can assign addresses to labels and procedures. The information in a descriptor is inserted when a declaration is processed by the object-description phase and is used by the translation phases both to check the validity of use of the name and to generate the instructions which will manipulate the run-time object.

In many languages a name may describe more than one object in the program. The descriptors associated with a name will reflect the language structures that give rise to them – in COBOL a hierarchy, in ALGOL 60 a list or stack of descriptors.

Chapter 9

Accessing an Element of a Data Structure

Compiled languages usually have one or more of the kinds of data structures discussed in this chapter. Records are like the nodes of the parse tree used in the examples of this book – multi-element objects which contain a collection of values, some of which may be pointers to other such objects. Accessing an element of a record, via the name of that element, is relatively efficient and may be compile-time checked for validity.

Vectors and arrays are random-access storage areas indexed according to the value of a subscript expression; accessing an element of an array involves calculation of the address of that element, which may be an expensive activity. Much effort can usefully be expended on making array access less expensive and I describe two alternative mechanisms of address calculation.

In this chapter I discuss the code fragments which implement data structure access without showing example translation procedures in every case. I show a translation procedure which generates fairly straightforward code for a vector access and one which generates code for a PASCAL-like record access: for the other examples it would be tedious to display translation procedures which had only minor differences from these two.

The code fragments discussed in this chapter are as crucial as those discussed elsewhere. This chapter shows the tree-walking translation mechanism at its worst, however, because in translating a source program fragment which makes multiple reference to a data structure element, a tree walker cannot easily prevent unnecessary re-calculation of the address of that element in the object code.


    let TranVecAccess(nodep, regno) be
    { let vecname, subscript = nodep.left, nodep.right
      let vecdescr = vecname.descriptor

      if vecdescr=empty then
        Error("undeclared name", vecname)
      elsf vecdescr.kind \= vector then
        Error("not a vector name", vecname)

      if subscript.kind=number then
      { Gen(LOAD, regno, vecdescr.address)
        Address := subscript.value; Modifier := regno
      }
      else
      { TranArithExpr(subscript, regno)
        Gen(ADD, regno, vecdescr.address)
        Address := 0; Modifier := regno
      }
    }

Figure 9.1: Translating accesses to a vector element

9.1 Accessing an Element of a Vector

The symbol table descriptor of an array will give its dimensionality, the bounds of each dimension if they are known at compile-time, and its address within a particular data frame. This address, in a compiler for a language which allows ‘dynamic arrays’, will normally be that of a fixed-size ‘dope vector’ which at run-time will contain the address of the actual elements of the array together with information about its dimensionality and its actual bounds.

I consider first the case of a vector, which in programming language terms is just a single-dimension array. The ‘dope vector’ will contain at run-time three items of information:

#V0 – the address of the element V[0] (note that this address may be outside the bounds of the space allocated to the elements of the vector, for example if it is declared as V[2:24]).

#V1 – the lower bound given in the declaration of the vector.

#V2 – the upper bound given in the declaration of the vector.

The latter two pieces of information are useful for the purposes of run-time error detection and may also be used when the vector is passed as an argument in a procedure call (see chapter 13). Chapter 11 shows how the ‘dope vector’ can be set up at block entry: figure 9.1 shows a ‘TranVecAccess’ procedure which generates code that uses the contents of the ‘dope vector’ to access an element of the vector proper. The procedure assigns values to two global variables ‘Address’ and ‘Modifier’ which can be used in the generation of instructions by the calling procedure. Thus TranAssignStat (chapter 7) can insert these values into a STORE instruction, while a procedure called by TranArithExpr (chapter 5) or TranBoolExpr (chapter 6) could use them in a LOAD, ADD, MULT (or whatever) instruction.

    (a) Source: V[i] := x

        Code:   LOAD  1, x
                LOAD  2, i
                ADD   2, #V0
                STORE 1, 0(2)

    (b) Source: V[3] := y

        Code:   LOAD  1, y
                LOAD  2, #V0
                STORE 1, 3(2)

    (c) Source: V[i-1] := V[i+1]

        Code:   LOAD  1, i
                ADDn  1, 1
                ADD   1, #V0
                LOAD  1, 0(1)
                LOAD  2, i
                SUBn  2, 1
                ADD   2, #V0
                STORE 1, 0(2)

Figure 9.2: Code to access a vector

    Source: V[i-1] := V[i+1]

    Simply translated:      ‘Optimised’:

    LOAD  1, i              LOAD  1, i
    ADD   1, #V0            ADD   1, #V0
    LOAD  1, 1(1)           LOAD  2, 1(1)
    LOAD  2, i              STORE 2, -1(1)
    ADD   2, #V0
    STORE 1, -1(2)
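The three dope vector items listed earlier can be modelled directly. The sketch below is a Python illustration, not code from the book: the class name `DopeVector`, the method `element_address` and the storage address 1000 are assumptions made for the example, and each element is assumed to occupy a single memory cell, as in figure 9.1.

```python
class DopeVector:
    """Run-time description of a vector declared as V[lower:upper]."""

    def __init__(self, first_cell_address, lower, upper):
        if lower > upper:
            raise ValueError("lower bound exceeds upper bound")
        # #V0: the address V[0] would have, possibly outside the allocated space
        self.v0 = first_cell_address - lower
        self.v1 = lower   # #V1: declared lower bound
        self.v2 = upper   # #V2: declared upper bound

    def element_address(self, subscript):
        """Address of V[subscript], with the run-time bounds check."""
        if not self.v1 <= subscript <= self.v2:
            raise IndexError(
                f"subscript {subscript} outside bounds {self.v1}:{self.v2}")
        return self.v0 + subscript


# V[2:24]: 23 cells, placed at the hypothetical addresses 1000..1022
v = DopeVector(first_cell_address=1000, lower=2, upper=24)
```

Here `v.v0` is 998, which lies below the space allocated to the elements, while `v.element_address(2)` and `v.element_address(24)` give the first and last allocated cells. Storing the address of V[0] rather than that of the first element means that the access code never has to subtract the lower bound.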

Figure 9.2 shows how the code produced by TranVecAccess would be used in code for expressions and assignment statements. Code (a) shows how an element of the vector can be accessed in the most general case and code (b) shows the slightly more efficient code fragment which is used to access a constant-subscript element. Code (c) shows how disappointing the code can be in some cases (code optimisation could do something about this code).
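As a rough check on fragments (a) and (b), the procedure of figure 9.1 can be transcribed into Python. This is a sketch under several assumptions. The `Node` class, the `VECTORS` set standing in for the symbol table and the helper `compile_assign` are hypothetical. The stand-in for TranArithExpr handles only a simple name. The address/modifier pair is returned from the function instead of being left in the global variables ‘Address’ and ‘Modifier’.

```python
from dataclasses import dataclass

@dataclass
class Node:
    kind: str              # "number", "name" or "vecaccess"
    value: object = None   # integer for a number, identifier for a name
    left: "Node" = None    # vector name, for a "vecaccess" node
    right: "Node" = None   # subscript expression, for a "vecaccess" node

code = []          # instructions generated so far
VECTORS = {"V"}    # hypothetical stand-in for the symbol table

def gen(op, regno, operand):
    code.append(f"{op} {regno}, {operand}")

def tran_arith_expr(node, regno):
    """Greatly simplified: only a simple name can be loaded."""
    if node.kind != "name":
        raise NotImplementedError("sketch handles simple names only")
    gen("LOAD", regno, node.value)

def tran_vec_access(node, regno):
    """Emit address code; return the (address, modifier) pair."""
    vecname, subscript = node.left, node.right
    if vecname.value not in VECTORS:
        raise NameError(f"undeclared name {vecname.value}")
    base = f"#{vecname.value}0"          # dope vector item #V0
    if subscript.kind == "number":
        gen("LOAD", regno, base)         # constant subscript goes in the address
        return subscript.value, regno
    tran_arith_expr(subscript, regno)    # general case: add #V0 to the subscript
    gen("ADD", regno, base)
    return 0, regno

def compile_assign(target, source_name):
    """Translate 'target := source_name' where target is a vector element."""
    code.clear()
    gen("LOAD", 1, source_name)
    address, modifier = tran_vec_access(target, 2)
    gen("STORE", 1, f"{address}({modifier})")
    return list(code)

v_i = Node("vecaccess", left=Node("name", "V"), right=Node("name", "i"))
v_3 = Node("vecaccess", left=Node("name", "V"), right=Node("number", 3))
```

With these definitions `compile_assign(v_i, "x")` produces the four instructions of fragment (a) and `compile_assign(v_3, "y")` the three instructions of fragment (b).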

Figure 9.3 shows the code which could be produced if TranVecAccess catered specially for cases in which the subscript was <expr>+<constant> or <expr>-<constant>. However, in this example it is in fact unnecessary to recalculate the address of V[i], and code optimisation might exploit this to produce a more efficient code fragment. The optimised code that could be produced is shown in the right-hand column of figure 9.3: although a tree-walker might be designed to produce this code for the example shown, it would fail to produce optimal code in general.
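The special case amounts to moving the constant part of the subscript into the address field of the instruction. A minimal Python sketch of that idea follows. It assumes a deliberately tiny subscript representation (an integer, a variable name, or a tuple such as `("+", "i", 1)`), and the function names are hypothetical. It reproduces only the ‘simply translated’ column, not the optimised one.

```python
def split_subscript(sub):
    """Return (variable, constant) such that sub == variable + constant."""
    if isinstance(sub, int):
        return None, sub            # purely constant subscript
    if isinstance(sub, str):
        return sub, 0               # plain variable
    op, var, const = sub            # ("+", "i", 1) or ("-", "i", 1)
    if op == "+":
        return var, const
    if op == "-":
        return var, -const
    raise ValueError(f"unsupported subscript {sub!r}")

def access(vec, sub, regno, code):
    """Emit address code; return the 'address(modifier)' operand text."""
    var, const = split_subscript(sub)
    if var is None:
        code.append(f"LOAD {regno}, #{vec}0")
    else:
        code.append(f"LOAD {regno}, {var}")
        code.append(f"ADD {regno}, #{vec}0")
    return f"{const}({regno})"      # the constant part costs no instruction

def assign_element(vec, dst, src):
    """Translate vec[dst] := vec[src]."""
    code = []
    operand = access(vec, src, 1, code)
    code.append(f"LOAD 1, {operand}")
    operand = access(vec, dst, 2, code)
    code.append(f"STORE 1, {operand}")
    return code

fragment = assign_element("V", ("-", "i", 1), ("+", "i", 1))
```

Here `fragment` holds the six instructions of the left-hand column. The address of V[i] is still computed twice, once in register 1 and once in register 2, because each call of `access` sees only its own subscript; removing the second computation requires knowledge that spans both accesses.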

The procedure of figure 9.1 assumes that each element of a vector occupies a single memory cell. When accessing a vector of multi-cell objects (ALGOL 68 structs, PASCAL records, FORTRAN COMPLEX variables) or parti-cell objects (characters or bits) it is necessary to adjust the subscript by multiplying, dividing or shifting to produce the correct offset from the base of the array. Every element of a vector must occupy the same number of memory cells as every other element, and the size is known at compile-time, so the computation of an address can be fairly efficient, at least in the case of a vector of multi-cell objects. It is worth exploiting any bit- or byte-addressing instructions which the object machine may possess to access a vector of parti-cell elements: if there are no such instructions then perhaps it is best to store each element of the vector in a separate cell to preserve accessing efficiency.
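The subscript adjustments mentioned above can be illustrated with a short Python sketch. The function names are hypothetical, element sizes are assumed to be positive and known at compile-time, and the packing order for parti-cell elements (element 0 in the low-order bits of a cell) is an assumption, since the text does not fix one.

```python
def multi_cell_offset(subscript, cells_per_element):
    """Offset in cells from the base for elements occupying several cells."""
    if cells_per_element <= 0:
        raise ValueError("element size must be positive")
    if (cells_per_element & (cells_per_element - 1)) == 0:
        # size is a power of two: a shift replaces the multiplication
        return subscript << (cells_per_element.bit_length() - 1)
    return subscript * cells_per_element

def parti_cell_position(subscript, elements_per_cell):
    """Return (cell offset, position within that cell) for a packed element."""
    return divmod(subscript, elements_per_cell)

def extract_packed(cell_value, position, bits_per_element):
    """Shift and mask one packed element out of a cell."""
    mask = (1 << bits_per_element) - 1
    return (cell_value >> (position * bits_per_element)) & mask

# Four 8-bit characters 'A', 'B', 'C', 'D' packed into one 32-bit cell
cell = 0x44434241
```

For two-cell elements `multi_cell_offset(5, 2)` needs only a shift, whereas three-cell elements need a multiplication. Fetching character 10 of a vector packed four to a cell means locating the cell with `parti_cell_position(10, 4)` and then shifting and masking with `extract_packed`, which shows why access to parti-cell elements is costly on a machine without byte-addressing instructions.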
