
3.2 Representation of Language Elements

3.2.4 Control Structures

A node representing a control structure generally results in several disjoint code sequences rather than a single code sequence. The meanings of and relationships among the sequences depend primarily upon the source language, and hence general schemata can be used to specify them. Each of the disjoint sequences can then be thought of as an abstract machine operation with certain defined properties and implemented individually.

The goto statement is implemented by an unconditional jump instruction. If the jump leaves a block or procedure then additional operations, discussed in Section 3.3, are needed to adjust the state. In expression-oriented languages, a jump out of an expression may require adjustment of a hardware stack used for temporary storage of intermediate values. This adjustment is not necessary when the stack is simply an area of memory that the compiler manages as a stack, computing the necessary offsets at compile time. (Unless use of a hardware stack permits cheaper access functions, it should be avoided for this reason.)

Schemata for common control structures are given in Figure 3.9. The operation `condition(expression,truelabel,falselabel)' embodies the jump cascade discussed in Section 3.2.3. The precise mechanism used to implement the analogous `select' operation depends upon the set of selector values k1, ..., km. Let kmin be the smallest and kmax the largest values in this set. If `most' of the values in the range [kmin, kmax] are members of the set then `select' is implemented as shown in Figure 3.10a. Each element of target that does not correspond to an element of k1, ..., km is set to `L0'. When the selector set is sparse and its span is large (for example, the set 0, 5000, 10000), a decision tree or perfect hash function should be used instead of an array. The choice of representation is strictly a space/time tradeoff, and must be made by the code generator for each case clause. The source-to-target mapping must specify the parameters to be used in making this choice.
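The choice between the two representations can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_select`, the density threshold `min_density`, and the use of binary search as the decision tree are hypothetical and not taken from the text, and at least one case label is assumed.

```python
from bisect import bisect_left


def build_select(cases, default, min_density=0.5):
    """Return a function mapping a selector value to a label.

    cases:       dict {selector constant: label}, assumed non-empty
    default:     the label L0 used for values outside the set
    min_density: hypothetical tuning parameter for "most of the range"
    """
    keys = sorted(cases)
    k_min, k_max = keys[0], keys[-1]
    span = k_max - k_min + 1

    if len(keys) / span >= min_density:
        # Dense set: array indexed by k - k_min; holes are set to L0.
        target = [cases.get(k, default) for k in range(k_min, k_max + 1)]

        def dispatch(k):
            if k_min <= k <= k_max:
                return target[k - k_min]
            return default

        dispatch.kind = "table"
    else:
        # Sparse set: binary search over the sorted selector values,
        # which behaves like a balanced decision tree.
        labels = [cases[k] for k in keys]

        def dispatch(k):
            i = bisect_left(keys, k)
            if i < len(keys) and keys[i] == k:
                return labels[i]
            return default

        dispatch.kind = "tree"

    return dispatch
```

With the sparse set 0, 5000, 10000 the table would need 10001 entries for three labels, so the sketch falls back to the search; a real code generator would take the threshold from the source-to-target mapping.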

        condition(e, L1, L2)
    L1: clause
    L2:
a) if e then clause;

        condition(e, L1, L2)
    L1: clause1
        GOTO L
    L2: clause2
    L:
b) if e then clause1 else clause2;

        select(e, k1, L1, ..., kn, Ln, L0)
    L1: clause1
        GOTO L
        ...
    Ln: clausen
        GOTO L
    L0: clause0
    L:
c) case e of k1: clause1; ...; kn: clausen else clause0;

        GOTO L
    L1: clause
    L:  condition(e, L1, L2)
    L2:
d) while e do clause;

    L1: clause
        condition(e, L2, L1)
    L2:
e) repeat clause until e

        forbegin(i, e1, e2, e3)
        clause
        forend(i, e2, e3)
f) for i := e1 by e2 to e3 do clause;

Figure 3.9: Implementation Schemata for Common Control Structures

By moving the test to the end of the loop in Figure 3.9d, we reduce by one the number of jumps executed each time around the loop without changing the total number of instructions required. Further, if the target machine can execute independent instructions in parallel, this schema provides more opportunity for such parallelism than one in which the test is at the beginning.
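The saving can be checked with a small count of executed jump instructions. The sketch below is an assumption-laden model, not part of the text: it treats `condition' as a single conditional jump, uses a loop that runs n times as the clause, and the function names are hypothetical.

```python
def jumps_test_at_top(n):
    """L: condition(e, L1, L2); L1: clause; GOTO L; L2:"""
    executed = 0
    i = 0
    while True:
        executed += 1        # conditional jump that tests e
        if not (i < n):
            break            # control leaves the loop at L2
        i += 1               # the clause
        executed += 1        # GOTO L, back to the test
    return executed


def jumps_test_at_bottom(n):
    """GOTO L; L1: clause; L: condition(e, L1, L2); L2:  (Figure 3.9d)"""
    executed = 1             # the single initial GOTO L
    i = 0
    while True:
        executed += 1        # conditional jump that tests e
        if not (i < n):
            break            # fall through to L2
        i += 1               # the clause
    return executed
```

Both forms contain one unconditional and one conditional jump, but for n iterations the first executes 2n + 1 jumps and the second n + 2.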

`Forbegin' and `forend' can be quite complex, depending upon what the compiler can deduce about the bounds and step, and how the language definition treats the controlled variable. As an example, suppose that the step and bounds are constants less than 2^12, the step is positive, and the language definition states that the value of the controlled variable is undefined on exit from the loop. Figure 3.10b shows the best IBM 370 implementation for this case, which is probably one of the most common. (We assume that the body of the loop is too complex to permit retention of values in registers.) Note that the label LOOP is defined within the `forbegin' operation, unlike the labels used by the other iterations in Figure 3.9. If we permit the bounds to be general expressions, but specify the step to be 1, the general schema of Figure 3.10c holds. This schema works even if the value of the upper bound is the largest representable integer, since it does not attempt to increment the controlled variable after reaching the upper bound. More complex cases are certainly possible, but they occur only infrequently. It is probably best to implement the abstract operations by subroutine calls in those cases (Exercise 3.9).

    target : array [kmin .. kmax] of address;
    k : integer;

    k := e;
    if k ≥ kmin and k ≤ kmax then goto target[k] else goto L0;

a) General schema for `select' (Figure 3.9c)

          LA    1,e1          e1 = constant < 2^12
    LOOP  ST    1,i
          ...                 Body of the clause
          L     1,i
          LA    2,e2          e2 = constant < 2^12
          LA    3,e3          e3 = constant < 2^12
          BXLE  1,2,LOOP

b) IBM 370 code for special-case forbegin ... forend

        i := e1; t := e3;
        if i > t then goto l3 else goto l2;
    l1: i := i + 1;
    l2: ...  (* Body of the clause *)
        if i < t then goto l1;
    l3:

c) Schema for forbegin ... forend when the step is 1

Figure 3.10: Implementing Abstract Operations for Control Structures
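The claim that the schema of Figure 3.10c is safe at the largest representable integer can be illustrated by simulation. The following sketch assumes a 32-bit two's-complement target whose addition wraps around; `add32`, `loop_schema_3_10c`, `naive_loop` and the iteration cap `limit` are hypothetical names introduced only for this illustration.

```python
INT_MAX = 2**31 - 1   # assumed 32-bit two's-complement target


def add32(a, b):
    """Add with 32-bit wraparound, as the assumed target would."""
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s


def loop_schema_3_10c(e1, e3):
    """Step-1 loop following Figure 3.10c; returns the values of i seen."""
    visited = []
    i, t = e1, e3
    if i > t:
        return visited        # goto l3: the body is never executed
    while True:
        visited.append(i)     # l2: body of the clause
        if not (i < t):
            break             # fall through to l3 without incrementing
        i = add32(i, 1)       # l1: i := i + 1
    return visited


def naive_loop(e1, e3, limit=5):
    """Increment first, then test i <= t; capped because it may not stop."""
    visited = []
    i, t = e1, e3
    while i <= t and len(visited) < limit:
        visited.append(i)
        i = add32(i, 1)       # wraps to -2**31 after INT_MAX
    return visited
```

With bounds INT_MAX - 1 and INT_MAX the first function visits exactly two values, while the naive variant wraps to the most negative integer and only stops at the artificial cap.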

Procedure and function invocations are control structures that also manipulate the state. Development of the instruction sequences making up these invocations involves decisions about the form of parameter transmission, and the construction of the activation record (the area of memory containing the parameters and local variables).

A normal procedure invocation, in its most general form, involves three abstract operations:

Callbegin: Obtain access to the activation record of the procedure.

Transfer: Transfer control to the procedure.

Callend: Relinquish access to the activation record of the procedure.

Argument computation and transmission instructions are placed between `callbegin' and `transfer'; instructions that retrieve and store the values of result parameters lie between `transfer' and `callend'. The activation record of the procedure is accessible to the caller between `callbegin' and `callend'.

In simple cases, when the procedure calls no other procedures and does not require complex parameters, the activation record can be deleted entirely and the parameters treated as local variables of the environment statically surrounding the procedure declaration. The invocation then reduces to a sequence of assignments to these variables and a simple subroutine jump. If, as in the case of elementary functions, only one or two parameters are involved then they can be passed in registers. Note that such special treatment leads to difficulties if the functions are invoked as formal parameters. The identity of the procedure is not fixed under those circumstances, and hence special handling of the call or parameter transmission is impossible. Invocations of formal procedures also cause problems if, as in ALGOL 60, the number and types of the parameters are not statically specified and must be verified at execution time. These dynamic checks require additional instructions not only at the call site, but also at the procedure entry. The latter instructions must be avoided by a normal call, and therefore it is useful for the procedure to have two distinct entry points: one with and one without the tests.
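The idea of two entry points can be sketched at a high level. In this hypothetical illustration closures stand in for entry addresses and Python types stand in for the parameter types; `make_entry_points` and its parameters are assumptions, not names from the text.

```python
def make_entry_points(body, param_types):
    """Return (normal_entry, checked_entry) for one procedure body."""

    def normal_entry(*args):
        # Used when the callee is known statically: the compiler has
        # already verified the arguments, so no run-time tests are made.
        return body(*args)

    def checked_entry(*args):
        # Used when the procedure is reached through a formal parameter:
        # verify number and types of the arguments, then continue normally.
        if len(args) != len(param_types):
            raise TypeError("wrong number of arguments")
        for arg, expected in zip(args, param_types):
            if not isinstance(arg, expected):
                raise TypeError("argument of wrong type")
        return normal_entry(*args)

    return normal_entry, checked_entry
```

A normal call uses the first entry and pays nothing for the tests; only calls through formal procedures go through the second.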

Declarations of local variables produce executable code only when some initialization is required. For dynamic arrays, initialization includes bounds computation, storage allocation, and construction of the array descriptor. Normally only the bounds computation would be realized as in-line code; a library subroutine would be invoked to perform the remaining tasks. At least for test purposes, every variable that is not explicitly initialized should be implicitly assigned an initial value. The value should be chosen so that its use is likely to lead to an error report; values recognized as illegal by the target machine hardware are thus best. Under no circumstances should 0 be used for implicit initialization. If it is, the programmer will too easily overlook missing explicit initialization or assume that the implicit initialization is a defined property of the language and hence write incorrect programs.
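The difference between a trapping initial value and 0 can be made concrete with a small model. Everything here is a hypothetical stand-in: the class `Trap` imitates a bit pattern that the hardware rejects, and `new_activation_record` and `total` are illustrative names only.

```python
class Trap:
    """Hypothetical stand-in for a value the target hardware rejects."""

    def _fault(self, *_):
        raise RuntimeError("use of uninitialized variable")

    # Any arithmetic or comparison involving the value faults.
    __add__ = __radd__ = __sub__ = __rsub__ = _fault
    __mul__ = __rmul__ = __lt__ = __gt__ = __le__ = __ge__ = _fault


UNINITIALIZED = Trap()


def new_activation_record(names, fill=UNINITIALIZED):
    """Create storage for local variables, implicitly set to `fill`."""
    return {name: fill for name in names}


def total(record):
    """A use of the locals; fails loudly if one was never assigned."""
    return record["x"] + record["y"]
```

If only x is assigned, `total` raises an error with the trapping fill, whereas with `fill=0` it quietly returns the value of x and the missing initialization of y goes unnoticed.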

Procedure and type declarations do not usually lead to code that is executed at the site of the declaration. Type declarations only result in machine instructions if array descriptors or other variables must be initialized. As with procedures, these instructions constitute a subprogram that is not called at the point of declaration.

ALGOL 68 identity declarations of the form m id = expression are consistently replaced by initialized variable declarations m id' := expression. Here id' is a new internal name, and every applied occurrence of id is consistently replaced by id'↑. The initialization remains the only assignment to id'. Simplification of this schema is possible when the expression can be evaluated at compile time and all occurrences of id replaced by this value.

The same schema describes argument transmission for the reference and strict value mechanisms, in particular in ALGOL 68. Transmission of a reference parameter is implemented by initialization of an internal reference variable: ref m parameter = argument becomes ref m variable := argument.

We have already met the internal transformation used by the value and name mechanisms in Section 2.5.3. In the result and value/result mechanisms, the result is conveniently assigned to the argument after return. In this way, transmission of the argument address to the procedure is avoided. When implementing value/result transmission for FORTRAN, one should generate the result assignment only in the case that the argument was a variable. (Note that if the argument address is transmitted to the procedure then the caller must
