
3.2 Representation of Language Elements

3.2.2 Composite Objects

For composite objects, we are interested in the properties of the standard representation and the possibilities for reducing storage requirements.

An object $a : \mathbf{array}\ [m\,..\,n]\ \mathbf{of}\ M$ will be represented by a sequence of $(n - m + 1)$ components of type $M$. The address of element $a[i]$ becomes:

$$\mathrm{address}(a[m]) + (i - m) \cdot |M| = \mathrm{address}(a[0]) + i \cdot |M|$$

Here $|M|$ is the size of an element in address units and $\mathrm{address}(a[0])$ is the 'fictitious starting address' of the array. The address of $a[0]$ is computed from the location of the array in storage; such an element need not actually exist. In fact, $\mathrm{address}(a[0])$ could be an invalid address lying outside of the address space.

The usual representation of an object $b : \mathbf{array}\ [m_1\,..\,n_1, \ldots, m_r\,..\,n_r]\ \mathbf{of}\ M$ occupies $k_1 k_2 \cdots k_r |M|$ contiguous memory cells, where $k_j = n_j - m_j + 1$, $j = 1, \ldots, r$. The address of element $b[i_1, \ldots, i_r]$ is given by the following storage mapping function when the array is stored in row-major order:

$$\mathrm{address}(b[m_1, \ldots, m_r]) + (i_1 - m_1)\, k_2 \cdots k_r\, |M| + \cdots + (i_r - m_r)\, |M| = \mathrm{address}(b[0, \ldots, 0]) + i_1 k_2 \cdots k_r\, |M| + \cdots + i_r\, |M|$$

By appropriate factoring, this last expression can be rewritten as:

$$\mathrm{address}(b[0, \ldots, 0]) + (\cdots((i_1 k_2 + i_2)\, k_3 + i_3) \cdots + i_r)\, |M|$$

If the array is stored in column-major order then the order of the indices in the polynomial is reversed:

$$\mathrm{address}(b[0, \ldots, 0]) + (\cdots((i_r k_{r-1} + i_{r-1})\, k_{r-2} + i_{r-2}) \cdots + i_1)\, |M|$$
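The factored mapping functions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the representation of the bounds as (m_j, n_j) pairs, and the sample base address 1000 are hypothetical and do not come from the text.

```python
def extents(bounds):
    """k_j = n_j - m_j + 1 for each (m_j, n_j) pair."""
    return [n - m + 1 for m, n in bounds]


def offset_row_major(index, bounds, elem_size):
    """Offset of b[i_1, ..., i_r] from the fictitious address(b[0, ..., 0])."""
    acc = 0
    for i_j, k_j in zip(index, extents(bounds)):
        acc = acc * k_j + i_j  # Horner step; k_1 only multiplies the initial 0
    return acc * elem_size


def offset_column_major(index, bounds, elem_size):
    """Same polynomial with the order of the indices reversed."""
    acc = 0
    for i_j, k_j in zip(reversed(index), reversed(extents(bounds))):
        acc = acc * k_j + i_j
    return acc * elem_size


def fictitious_origin(base, bounds, elem_size, offset):
    """address(b[0, ..., 0]), given the address of b[m_1, ..., m_r]."""
    lower = [m for m, _ in bounds]
    return base - offset(lower, bounds, elem_size)


# Hypothetical example: b : array [1..3, 1..4] of a 4-unit element at address 1000.
bounds = [(1, 3), (1, 4)]
row_origin = fictitious_origin(1000, bounds, 4, offset_row_major)
col_origin = fictitious_origin(1000, bounds, 4, offset_column_major)
addr_row = row_origin + offset_row_major([2, 3], bounds, 4)
addr_col = col_origin + offset_column_major([2, 3], bounds, 4)
```

In this example the fictitious origins (980 and 984) lie below the first real element, which illustrates why address(b[0, ..., 0]) need not denote an existing component.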

The choice of row-major or column-major order is a significant one. ALGOL 60 does not specify any particular choice, but many ALGOL 60 compilers have used row-major order. Pascal implicitly requires row-major order, and FORTRAN explicitly specifies column-major order. This means that Pascal arrays must be transposed in order to be used as parameters to FORTRAN library routines. In the absence of language constraints, make the choice that corresponds to the most extensive library software on the target machine.

Access to $b[i_1, \ldots, i_r]$ is undefined if the relationship $m_j \le i_j \le n_j$ is not satisfied for some $j = 1, \ldots, r$. To increase reliability, this relationship should be checked at run time if the compiler cannot verify it in other ways (for example, that $i_j$ is the controlled variable of a loop and the starting and ending values satisfy the condition). To make the check, we need to evaluate a storage mapping function with the following fixed parameters (or its product with the size of the single element):

$$r,\quad \mathrm{address}(b[0, \ldots, 0]),\quad m_1, \ldots, m_r,\quad n_1, \ldots, n_r$$

Together, these parameters constitute the array descriptor. The array descriptor must be stored explicitly for dynamic and flexible arrays, even in the trivial case $r = 1$. For static arrays the parameters may appear directly as immediate operands in the instructions for computing the mapping function. Several array descriptors may correspond to a single array, so that in addition to questions of equality of array components we have questions of equality or identity of array descriptors.
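A minimal sketch of an array descriptor with a run-time bounds check, assuming row-major order. The class name `ArrayDescriptor`, its field names, and the sample values are hypothetical; the rank r is implied by the number of bounds rather than stored separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayDescriptor:
    origin: int      # address(b[0, ..., 0]); need not be a valid address
    lower: tuple     # m_1, ..., m_r
    upper: tuple     # n_1, ..., n_r
    elem_size: int   # |M| in address units

    @property
    def rank(self):
        return len(self.lower)

    def address(self, *index):
        """Check m_j <= i_j <= n_j, then apply the row-major mapping function."""
        if len(index) != self.rank:
            raise IndexError("wrong number of indices")
        acc = 0
        for i, m, n in zip(index, self.lower, self.upper):
            if not m <= i <= n:
                raise IndexError(f"index {i} outside {m}..{n}")
            acc = acc * (n - m + 1) + i
        return self.origin + acc * self.elem_size


# Two descriptors for the same array: equal in value, but distinct objects.
d1 = ArrayDescriptor(origin=980, lower=(1, 1), upper=(3, 4), elem_size=4)
d2 = ArrayDescriptor(origin=980, lower=(1, 1), upper=(3, 4), elem_size=4)
```

The pair `d1`, `d2` illustrates the distinction drawn above: `d1 == d2` holds (equality of descriptors) while `d1 is d2` does not (identity).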

An $r$-dimensional array $b$ can also be thought of as an array of $(r-1)$-dimensional arrays. We might apply this perception to an object $c : \mathbf{array}\ [1\,..\,m, 1\,..\,n]\ \mathbf{of}\ \mathit{integer}$, representing it as $m$ one-dimensional arrays of type $t = \mathbf{array}\ [1\,..\,n]\ \mathbf{of}\ \mathit{integer}$. The fictitious starting addresses of these arrays are then stored in an object $a : \mathbf{array}\ [1\,..\,m]\ \mathbf{of}\ \uparrow t$. To be sure, this descriptor technique raises the storage requirements of $c$ from $mn$ to $mn + m$ locations for integers or addresses; in return it speeds up access on many machines by replacing the multiplication by $n$ in the mapping function $\mathrm{address}(c[0, 0]) + (i \cdot n + j) \cdot |\mathit{integer}|$ by an indexed memory reference. The saving may be particularly significant on computers that have no hardware multiply instruction, but even then there are contraindications: Multiplications occurring in array accesses are particularly amenable to elimination via simple optimizations.
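The trade-off can be illustrated with a small simulation in which a Python list stands in for memory and each element occupies one cell. The function names and the convention that c[1, 1] lives in cell 0 are assumptions made only for this sketch.

```python
def build(m, n):
    """Allocate c : array [1..m, 1..n] plus the vector of row origins."""
    memory = list(range(m * n))  # m*n integer cells, filled with sample values
    # Fictitious start of row i: the cell where c[i, 0] would lie.
    row_origin = [(i - 1) * n - 1 for i in range(1, m + 1)]
    return memory, row_origin


def load_via_rows(memory, row_origin, i, j):
    """One extra indexed reference, but no multiplication by n."""
    return memory[row_origin[i - 1] + j]


def load_via_mapping(memory, n, i, j):
    """Plain mapping function: address(c[0, 0]) + i*n + j."""
    origin = -(n + 1)  # fictitious; lies outside the allocated cells
    return memory[origin + i * n + j]


m, n = 3, 4
memory, row_origin = build(m, n)
total_cells = len(memory) + len(row_origin)  # m*n + m locations
```

Both access paths reach the same cell for every valid (i, j); the row-origin version pays for this with the m additional locations counted in `total_cells`.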

The descriptor technique is supported by hardware on Burroughs 6700/7700 machines. There, the rows of a two-dimensional array are stored in segments addressed by special segment descriptors. The segment descriptors, which the hardware can identify, are used to access these rows. Actual allocation of storage to the rows is handled by the operating system and occurs at the first reference rather than at the declaration. The allocation process, which is identical to the technique for handling page faults, is also applied to one-dimensional arrays. Each array or array row is divided into pages of up to 256 words. Huge arrays can be declared if the actual storage requirements are unknown, and only that portion actually referenced is ever allocated.

Character strings and sets are usually implemented as arrays of character and Boolean values respectively. In both cases it pays to pack the arrays. In principle, character string variables have variable length. Linked lists provide an appropriate implementation; each list element contains a segment of the string. List elements can be introduced or removed at will. Character strings with fixed maximum length can be represented by arrays of this length. When an array of Boolean values is packed, each component is represented by a single bit, even when simple Boolean variables are represented by larger storage units as discussed above.

A record is represented by a succession of fields. If the fields of a record have alignment constraints, the alignment of the entire record must be constrained also in order to guarantee that the alignment constraints of the fields are met. An appropriate choice for the alignment constraint of the record is the most stringent of the alignment constraints of its fields. Thus a record containing fields with alignments of 2, 4 and 8 bytes would itself have an alignment of 8 bytes. Whenever storage for an object with this record type is allocated, its starting address must satisfy the alignment constraint. Note that this applies to anonymous objects as well as objects declared explicitly.

The amount of storage occupied by the record may depend strongly upon the order of the fields, due to their sizes and alignment constraints. For example, consider a byte-oriented machine on which a character variable is represented by one byte with no alignment constraint and an integer variable occupies four bytes and is constrained to begin at an address divisible by 4. If a record contained an integer field followed by a character field followed by a second integer field then it would occupy 12 bytes: There would be a 3-byte gap following the character field, due to the alignment constraint on integer variables. By reordering the fields, this gap could be eliminated. Most programming languages permit the compiler to do such reordering.
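The 12-byte example can be reproduced with a short layout calculation. This is a sketch under assumptions: the names `layout` and `reorder` are hypothetical, fields are described as (name, size, alignment) triples, and no trailing padding is added after the last field (some implementations round the record size up to its alignment).

```python
def layout(fields):
    """Place fields in the given order; return (offsets, size, record_alignment)."""
    offset = 0
    offsets = {}
    for name, size, align in fields:
        offset = (offset + align - 1) // align * align  # round up to the alignment
        offsets[name] = offset
        offset += size
    record_align = max(align for _, _, align in fields)  # most stringent constraint
    return offsets, offset, record_align


def reorder(fields):
    """One simple reordering: most stringently aligned fields first."""
    return sorted(fields, key=lambda f: -f[2])


# Integer, character, integer on the byte-oriented machine described above.
fields = [("i1", 4, 4), ("c", 1, 1), ("i2", 4, 4)]
offsets, size, align = layout(fields)
packed_offsets, packed_size, _ = layout(reorder(fields))
```

In declaration order the character field sits at offset 4 and is followed by the 3-byte gap, giving 12 bytes; after reordering, the fields occupy 9 contiguous bytes.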

Records with variants can be implemented with the variants sharing storage. If it is known from the beginning that only one variant will be used and that the value of the variant selector will never change, then the storage requirement may be reduced to exactly that for the specified variant. This requirement is often satisfied by anonymous records; Pascal distinguishes the calls new(p) and new(p, variant selector) as constructors for anonymous records. In the latter case the value of the variant selector may not change, whereas in the former all variants are permitted.
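The difference between the two allocation requests can be expressed as a size calculation. The sketch below ignores alignment, and the function names and the sample variant sizes are purely hypothetical.

```python
def shared_size(fixed_size, variant_sizes):
    """Storage when all variants share space: room for the largest one."""
    return fixed_size + max(variant_sizes.values())


def exact_size(fixed_size, variant_sizes, selector):
    """Storage when the variant is fixed at allocation time."""
    return fixed_size + variant_sizes[selector]


# Hypothetical record: a 4-byte fixed part and three variants of different sizes.
variants = {"point": 0, "circle": 8, "rectangle": 16}
size_any = shared_size(4, variants)             # corresponds to new(p)
size_point = exact_size(4, variants, "point")   # corresponds to new(p, point)
```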

The gaps arising from the alignment constraints on the fields of a record can be eliminated by simply ignoring those constraints and placing the fields one after another in memory. This packing of the components generally increases the cost in time and instructions for field access considerably. The cost almost always outweighs the savings gained from packing a single record; packing pays only when many identical records are allocated simultaneously. Packing is often restricted to partial words, leaving objects of word length (register length) or longer aligned. On byte-oriented machines it may pay to pack only the representation of sets to the bit level.

Packing alters the access function of the components of a composite object: The selector must now specify not only the relative address of the component, but also its position within the storage cell. On some computers extraction of a partial word can be specified as part of an operand address, but usually extra instructions are required. This has the result that packed components of arrays, records, and sets may not be accessible via normal machine addresses. They cannot, therefore, appear as reference parameters.
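For a packed array of Boolean values, the altered access function amounts to a word index plus a bit position, followed by shift and mask operations. The sketch below assumes a 32-bit word and numbers bits from the least significant end; both choices, and the function names, are assumptions.

```python
WORD_BITS = 32  # assumed word length


def locate(i):
    """Split a component index into (word index, bit position within the word)."""
    return divmod(i, WORD_BITS)


def get_bit(words, i):
    """Extract one packed Boolean: the shift and mask are the extra instructions."""
    w, b = locate(i)
    return (words[w] >> b) & 1 == 1


def set_bit(words, i, value):
    """Store one packed Boolean without disturbing its neighbours."""
    w, b = locate(i)
    if value:
        words[w] |= 1 << b
    else:
        words[w] &= ~(1 << b)


words = [0, 0]           # room for 64 packed Boolean components
set_bit(words, 35, True)
where = locate(35)
```

Because component 35 is identified by the pair `where` rather than by a single machine address, it cannot be passed as an ordinary reference parameter.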

Machine-dependent programs sometimes use records as templates for hardware objects. For example, the assembly phase of a compiler might use a record to describe the encoding of a machine instruction. The need for a fixed layout in such cases violates the abstract nature of the record, and some additional mechanism (such as the representation specification of Ada) is necessary to specify this. If the language does not provide any special mechanism, the compiler writer can overload the concept of packing by guaranteeing that the fields of a packed record will be allocated in the order given by the programmer.

Addresses are normally used to represent pointer values. Addresses relative to the beginning of the storage area containing the objects are often sufficient, and may require less storage than full addresses. If, as in ALGOL 68, pointers have bounded lifetime, and the correctness of assignments to reference variables must be checked at run time, we must add information to the pointer from which its lifetime may be determined. In general the starting address of the activation record (Section 3.3) containing the reference object serves this purpose; reference objects of unbounded extent are denoted by the starting address of the stack. A comparison of these addresses for relative magnitude then represents inclusion of lifetimes.
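A minimal sketch of such a check follows. It assumes that the stack grows toward higher addresses, so that a smaller activation-record address means a longer lifetime; the names `Pointer`, `STACK_BASE`, and `assignment_is_safe`, and all addresses, are hypothetical.

```python
from typing import NamedTuple

STACK_BASE = 0  # assumed starting address of the stack


class Pointer(NamedTuple):
    address: int  # where the referenced object lives
    scope: int    # start of the activation record containing that object


def assignment_is_safe(ptr, ref_var_scope):
    """The target must live at least as long as the variable receiving the pointer.

    Under the assumed growth direction, a lifetime includes another
    exactly when its activation-record address is not larger.
    """
    return ptr.scope <= ref_var_scope


unbounded = Pointer(address=5000, scope=STACK_BASE)  # object of unbounded extent
inner_local = Pointer(address=140, scope=128)        # local of an inner activation
outer_local = Pointer(address=70, scope=64)          # local of an outer activation
```

Storing `inner_local` in a variable of the outer activation (scope 64) is rejected, whereas storing `outer_local` or `unbounded` in a variable of the inner activation is accepted.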