Establishing Addressability - Engineering A Compiler pdf

As part of the linkage convention, the compiler must ensure that each procedure can generate an address for each variable that it references. In an Algol-like language, this usually includes named global variables, some form of static storage, the procedure’s own local variables, and some of the local variables of its lexical ancestors. In general, two cases arise; they differ in the amount of calculation required to find the starting address, or base address, of the data area.

7.5.1 Trivial Base Addresses

For most variables, the compiler can emit code that generates the base address in one or two instructions. The easiest case is a local variable of the current procedure. If the variable is stored in the procedure’s ar, the compiler can use the arp as its base address. The compiler can load the variable’s value with a single loadAI instruction or a loadI followed by a loadAO. Thus, access to local variables is fast.
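As a rough illustration of why this access is cheap, the following Python sketch models memory as a flat list of words and mimics loadAI as a single indexed read. The names `memory`, `load_ai`, and `OFFSET_OF_X`, and the particular addresses, are assumptions made for this example only; they are not part of the instruction set described in the text.

```python
# Hypothetical model: memory is a flat list of words and the arp is an
# index into it. None of these names or values come from the text.
memory = [0] * 32


def load_ai(base, offset):
    """Mimic loadAI: read the word at address base + constant offset."""
    return memory[base + offset]


arp = 16                          # assumed base address of the current ar
OFFSET_OF_X = 3                   # assumed compile-time offset of local x
memory[arp + OFFSET_OF_X] = 42    # pretend x was stored earlier

x_value = load_ai(arp, OFFSET_OF_X)
```

Because the offset is a compile-time constant and the arp already sits in a register, the whole access reduces to one address addition and one memory read.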

(Sometimes, a local variable is not stored in the procedure’s ar. The value might reside in a register, in which case loads and stores are not needed. The variable might have an unpredictable or changing size, in which case the compiler might need to allocate space for it in the run-time heap. In this case, the compiler would likely reserve space in the ar for a pointer to the heap location. This adds one extra level of indirection to any access for that variable, but defers the need for its size until run-time.)

Access to global and static variables is handled similarly, except that the base address may not be in a register at the point where the access occurs. Thus, the compiler may need to emit code that determines the base address at run-time. While that sounds complex, it is exactly the task that symbolic assemblers were designed to accomplish. The compiler generates base addresses for global and static data areas by using the name of the data area as part of an assembly language label. To avoid conflicts with other labels, the compiler “mangles” the name by adding a prefix, a suffix, or both, to the name. The compiler deliberately adds characters that cannot appear in source-language names.

For example, given a global variable fee, a C compiler might construct the label &fee., counting on the fact that ampersand (&) cannot be used in a source-language name and that no legal C name can end with a period. It would emit the appropriate assembly-language pseudo-operation to reserve space for fee or to initialize fee, attaching the label to the pseudo-operation. To obtain a run-time address for fee, the compiler would emit the instruction

loadI &fee. ⇒ r₁.

The next instruction would use r₁ to access the memory location for fee. Notice that the compiler has no actual knowledge of where fee is stored. It uses a relocatable label to ensure that the appropriate run-time address is written into the instruction stream. At compile time, it makes the link between the contents of r₁ and the location of fee by creating an assembly-level label. That link is resolved by the operating system’s loader when the program is loaded and launched.
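The mangling step can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular compiler: the function names `mangle` and `emit_global_load` are hypothetical, the default prefix and suffix simply follow the &fee. example, and the arrow is written as `=>` in plain ASCII.

```python
def mangle(name, prefix="&", suffix="."):
    """Build an assembly label from a source-level name.

    The defaults follow the &fee. example; a real compiler would pick
    characters that are illegal in its own source language.
    """
    return f"{prefix}{name}{suffix}"


def emit_global_load(name, reg="r1"):
    """Return an instruction string that loads the label's address."""
    return f"loadI {mangle(name)} => {reg}"


label = mangle("fee")
instruction = emit_global_load("fee")
```

The compiler never computes a numeric address here; it only produces the label text and leaves the binding to the assembler, linker, and loader.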

Global variables may be labelled individually or in larger groups. In Fortran, for example, the language collects global variables into “common blocks.” A typical Fortran compiler establishes one label for each common block. It assigns an offset to each variable in each common block and generates load and store instructions relative to the common block’s label.
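One simple way to assign such offsets is to lay the variables out in declaration order. The sketch below assumes sizes are given in bytes and ignores alignment padding, which a real compiler would have to respect; the function name `layout_common_block` and the sample variables are hypothetical.

```python
def layout_common_block(variables):
    """Assign each (name, size) pair an offset from the block's label.

    Assumes sizes are in bytes and that no alignment padding is needed.
    Returns the offset table and the total size of the block.
    """
    offsets = {}
    next_offset = 0
    for name, size in variables:
        offsets[name] = next_offset
        next_offset += size
    return offsets, next_offset


# Hypothetical common block holding three variables.
offsets, total_size = layout_common_block([("a", 4), ("b", 8), ("c", 4)])
```

With this table, a reference to b becomes a load at the block's label plus `offsets["b"]`, so a single base address serves every variable in the block.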

Similarly, the compiler may create a single static data area for all of the static variables within a single static scope. This serves two purposes. First, it keeps the set of labels smaller, decreasing the likelihood of an unexpected conflict. If a name conflict occurs, it will be discovered during linking or loading. When this occurs, it can be quite confusing to the programmer. To further decrease the likelihood of this problem, the compiler can prepend part of the file name or the procedure name to the variable’s name. Second, it decreases the number of base addresses that might be required in a single procedure. This reduces the number of registers tied up to hold base addresses. Using too many registers for addressing may adversely affect overall run-time performance of the compiled code.

7.5.2 Local Variables of Other Procedures

In a language that supports nested lexical scopes, the compiler must provide a mechanism to map static distance coordinates into hardware addresses for the corresponding variables. To accomplish this, the compiler must put in place data structures that let it compute the addresses of the ars of each lexical ancestor of the current procedure.

For example, assume that fee, at level x, references variable a declared in fee’s level y ancestor fie. The parser converts this reference into a static distance coordinate ⟨x−y, offset⟩. Here, x−y specifies how many lexical levels lie between fee and fie, and offset is the distance between the arp for an instance of fie and the storage reserved for a in fie’s ar.
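The conversion from a name to a static distance coordinate can be sketched as a search through the enclosing scopes. The representation below, a list of dictionaries ordered from lexical level 0 up to the current procedure, is an assumption made for illustration; the text does not prescribe a symbol-table layout, and the offsets are invented.

```python
def static_coordinate(name, scopes):
    """Return (x - y, offset) for a reference made in the innermost scope.

    `scopes` is ordered from lexical level 0 up to the current procedure;
    each entry maps a variable name to its offset in that procedure's ar.
    """
    x = len(scopes) - 1
    for y in range(x, -1, -1):      # search the innermost scope first
        if name in scopes[y]:
            return x - y, scopes[y][name]
    raise NameError(name)


# Hypothetical nesting: level 0, then fie at level 1, then fee at level 2.
scopes = [{"g": 0}, {"a": 8}, {"t": 4}]
coord_a = static_coordinate("a", scopes)
```

Here `coord_a` records that a lives one lexical level out from fee, at offset 8 in the ar of fie.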

To convert ⟨x−y, offset⟩ into a run-time address, the compiler must emit two different kinds of code. First, the compiler writer must select a mechanism for tracking lexical ancestry among activation records. The compiler must emit the code necessary to keep this information current at each procedure call. Second, the compiler must emit, at the point of reference, code that will interpret the run-time data structure and the expression x−y to produce the address of the appropriate arp and use that arp and offset to address the variable. Since both x−y and offset are known at compile time, most of the run-time overhead goes into traversing the data structure.

Several mechanisms have been used to solve this problem. We will examine two: access links and a global display.

Access Links In this scheme, the compiler ensures that each ar contains a pointer to the ar of its immediate lexical ancestor. We call this pointer an access link, since it is used to access non-local variables. Starting with the current procedure, the access links form a chain of the ars for all of its lexical ancestors. Any local variable of another procedure that can be accessed from the current procedure must be stored in an ar on the chain of access links. Figure 7.7 shows this situation.

[Figure: three ars at lexical levels 2, 1, and 0, each containing locals, caller arp, access link, ret. address, ret. value, and parameters]

Figure 7.7: Using access links

To use access links, the compiler emits code that walks the chain of links until it finds the appropriate arp. If the current procedure is at level x, and the reference is to offset o at level y, the compiler emits code to follow x−y pointers in the chain of access links. This yields the appropriate arp. Next, it emits code to add the offset o to this arp, and to use the resulting address for the memory access. With this scheme, the cost of the address calculation is proportional to x−y. If programs exhibit shallow levels of lexical nesting, the difference in cost between accessing two variables at different levels will be fairly small. Of course, as memory latencies rise, the constant in this asymptotic equation gets larger.
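The walk along the chain can be simulated directly. The sketch below assumes that memory is a flat list of words and that each ar keeps its access link in a fixed slot relative to its arp (`ACCESS_LINK`, chosen arbitrarily as 0); both are assumptions made for the example, as are the arp values.

```python
ACCESS_LINK = 0   # assumed slot, relative to the arp, holding the access link


def address_via_access_links(memory, arp, distance, offset):
    """Follow `distance` access links starting at arp, then add offset."""
    for _ in range(distance):
        arp = memory[arp + ACCESS_LINK]
    return arp + offset


memory = [0] * 40
level0, level1, level2 = 0, 10, 20        # assumed arp values
memory[level1 + ACCESS_LINK] = level0     # level 1 links to level 0
memory[level2 + ACCESS_LINK] = level1     # level 2 links to level 1
memory[level0 + 3] = 99                   # a variable at offset 3, level 0

addr = address_via_access_links(memory, level2, 2, 3)
value = memory[addr]
```

The loop runs x−y times, one memory read per level, which is the proportional cost described above. Since x−y is known at compile time, a compiler would typically emit the loads as straight-line code rather than as a loop.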

To maintain access links, the compiler must add code to each procedure call to find the appropriate arp and store it into the ar for the called procedure. Two cases arise. If the called procedure is nested inside the current procedure (that is, its lexical level is exactly one more than the level of the calling procedure), then the caller uses its own arp as the access link of the called procedure. Otherwise, the called procedure’s lexical level must be less than or equal to the level of the calling procedure. To find the appropriate arp, the compiler emits code to find the arp one level above the called procedure’s level, that is, the arp of the called procedure’s immediate lexical ancestor. It uses the same mechanism used in accessing a variable at that level; it walks the chain of access links. It stores this arp as the called procedure’s access link.
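The two cases can be written out as a small function. As before, this is a sketch under assumptions: memory is a flat list, the access link sits in slot `ACCESS_LINK` of each ar, and the called procedure is at level 1 or deeper (the text does not say what a level 0 procedure stores as its access link).

```python
ACCESS_LINK = 0   # assumed slot, relative to the arp, holding the access link


def access_link_for_callee(memory, caller_arp, caller_level, callee_level):
    """Compute the access link to store in the called procedure's ar.

    Assumes callee_level >= 1 and callee_level <= caller_level + 1.
    """
    if callee_level == caller_level + 1:
        # Callee is nested directly inside the caller.
        return caller_arp
    # Otherwise, walk from the caller to the ar at level callee_level - 1.
    arp = caller_arp
    for _ in range(caller_level - (callee_level - 1)):
        arp = memory[arp + ACCESS_LINK]
    return arp


memory = [0] * 40
level0, level1, level2 = 0, 10, 20        # assumed arp values
memory[level1 + ACCESS_LINK] = level0
memory[level2 + ACCESS_LINK] = level1

nested_link = access_link_for_callee(memory, level2, 2, 3)
sibling_link = access_link_for_callee(memory, level2, 2, 2)
outer_link = access_link_for_callee(memory, level2, 2, 1)
```

A call to a procedure at the caller's own level costs one hop, and each additional level outward costs one more.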

Global Display In this scheme, the compiler allocates a globally accessible array to hold the arps of the most recent instance of a procedure called at each level. Any reference to a variable that resides in some lexical ancestor becomes an indirect reference through this global table of arps. To convert ⟨x−y, offset⟩ into an address, the compiler takes the arp stored in element y of the global display, adds offset to it, and uses that as the address for the memory reference.
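By contrast with the chain walk, a display lookup is a single indexed read. The sketch below assumes the display is a Python list indexed by lexical level and reuses invented arp values; the name `address_via_display` is hypothetical.

```python
def address_via_display(display, level, offset):
    """Read the arp for `level` from the display and add the offset."""
    return display[level] + offset


# Assumed display contents: the arp of the most recent ar at levels 0..2.
display = [0, 10, 20]

addr = address_via_display(display, 1, 3)
```

The cost of the lookup does not depend on how many levels separate the reference from the declaration.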

