process, except possibly for calls to operating system services, i.e., any library routines that are called are linked statically (before execution) with the user’s code, and all that needs to be done to enable execution is to load the executable image of the program into memory, to initialize the environment as appropriate to the operating system’s standard programming model, and to call the program’s main procedure with the appropriate arguments. There are several drawbacks to this model, having to do with space utilization and the time at which users’ programs and libraries are bound together, that can all be solved by using so-called shared libraries that are loaded and linked dynamically on demand during execution and whose code is shared by all the programs that reference them. The issues, presented as advantages of the shared library model, are as follows:
1. A shared library need exist in the file system as only a single copy, rather than as part of each executable program that uses it.
2. A shared library’s code need exist in memory as only one copy, rather than as part of every executing program that uses it.
3. Should an error be discovered in the implementation of a shared library, it can be replaced with a new version, as long as it preserves the library’s interface, without requiring programs that use it to be relinked; an already executing program continues to use the copy of the library that was present in the file system when it demanded it, but new invocations of that program and others use the new copy.
Note that linking a program with a nonshared library typically results in acquiring only the routines the program calls, plus the transitive closure of routines they call, rather than the whole library, but this usually does not result in a large space savings (especially for large, complex libraries such as those that implement windowing or graphics systems), and spreading this effect over all programs that link with a given library almost always favors shared libraries.
A subtle issue is the need to keep the semantics of linking the same, as much as possible, as with static linking. The most important component of this is being able to determine before execution that the needed routines are present in the library, so that one can indicate whether dynamic linking will succeed, i.e., whether undefined and/or multiply defined external symbols will be encountered. This functionality is obtained by providing, for each shared library, a table of contents that lists its entry points and external symbols and those used by each routine in it (see Figure 5.14 for an example). The first column lists entry points and externally known names in this shared library and the second and third columns list entry points and externals they reference and the shared libraries they are located in. The pre-execution linking operation then merely checks the tables of contents corresponding to the libraries to be linked dynamically, and so can report the same undefined symbols that static linking would. The run-time dynamic linker is then guaranteed to fail if and only if the pre-execution static linker would. Still, some minor differences may be seen
128 Run-Time Support
when one links a dynamic library ahead of a static library, when both were originally linked statically.

Entry Points and      Shared      Entry Points and
External Symbols      Library     External Symbols
Provided              Used        Used

entry1                library1    extern1
                                  entry2
                      library2    entry3
entry2                library1    entry1
                      library2    entry4
                                  entry5
extern1

FIG. 5.14 An example of a shared library’s table of contents.
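The pre-execution check against the tables of contents can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the name library0 for the library that Figure 5.14 describes, and the contents of library1 and library2 are all hypothetical, and only undefined symbols (not multiply defined ones) are reported.

```python
# Hypothetical tables of contents, modeled on Figure 5.14. Each library maps
# the symbols it provides to the (library, symbol) pairs they reference.
TOC = {
    "library0": {  # assumed name for the library whose table the figure shows
        "entry1": [("library1", "extern1"), ("library1", "entry2"),
                   ("library2", "entry3")],
        "entry2": [("library1", "entry1"), ("library2", "entry4"),
                   ("library2", "entry5")],
        "extern1": [],
    },
    # Assumed contents of the referenced libraries.
    "library1": {"extern1": [], "entry1": [], "entry2": []},
    "library2": {"entry3": [], "entry4": []},  # entry5 deliberately missing
}


def undefined_symbols(toc):
    """Return (library, symbol, used_library, used_symbol) for every
    reference that no table of contents satisfies."""
    missing = []
    for lib, provided in toc.items():
        for sym, refs in provided.items():
            for ref_lib, ref_sym in refs:
                if ref_sym not in toc.get(ref_lib, {}):
                    missing.append((lib, sym, ref_lib, ref_sym))
    return missing


if __name__ == "__main__":
    for lib, sym, ref_lib, ref_sym in undefined_symbols(TOC):
        print(f"{lib}:{sym} needs {ref_lib}:{ref_sym}, which is undefined")
```

Because the check reads only the tables, it can run before execution without loading any of the libraries themselves.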
Also, the code that is shared need not constitute a library in the sense in which that term has traditionally been used. It is merely a unit that the programmer chooses to link in at run time, rather than in advance of it. In the remainder of this section, we call the unit a shared object rather than a shared library, to reflect this fact.
Shared objects do incur a small performance impact when a program is running alone, but on a multiprogrammed system, this impact may be balanced entirely or nearly so by the reduced working set size, which results in better paging and cache performance. The performance impact has two sources, namely, the cost of run-time linking and the fact that shared objects must consist of position-independent code, i.e., code that can be loaded at different addresses in different programs, and each shared object’s private data must be allocated one copy per linked program, resulting in somewhat higher overhead to access it.
We next consider the issues and non-issues involved in supporting shared objects. Position independence must be achieved so that each user of a shared object is free to map it to any address in memory, possibly subject to an alignment condition such as the page size, since programs may be of various sizes and may demand shared objects in any order. Accessing local variables within a shared object is not an issue, since they are either in registers or in an area accessed by means of a register, and so are private to each process. Accessing global variables is an issue, since they are often placed at absolute rather than register-relative addresses. Calling a routine in a shared object is an issue, since one does not know until the routine has been loaded what address to use for it. This results in four problems that need to be solved to make objects position-independent and hence sharable, namely, (1) how control is passed within an object, (2) how an object addresses its own external variables, (3) how control is passed between objects, and (4) how an object addresses external variables belonging to other objects.
In most systems, transferring control within an object is easy, since they provide program-counter-relative (i.e., position-based) branches and calls. Even though the
Section 5.7 Code Sharing and Position-Independent Code 129
object as a whole needs to be compiled in such a way as to be positioned at any location when it is loaded, the relative offsets of locations within it are fixed at compile time, so PC-relative control transfers are exactly what is needed. If no PC-relative call is provided by the architecture, it can be simulated by a sequence of instructions that constructs the address of a call’s target from its offset from the current point, as shown below.
For an instance of a shared object to address its own external variables, it needs a position-independent way to do so. Since processors do not generally provide PC-relative loads and stores, a different technique must be used. The most common approach uses a so-called global offset table, or GOT, that initially contains offsets of external symbols within a so-called dynamic area that resides in the object’s data space. When the object is dynamically linked, the offsets in the GOT are turned into absolute addresses within the current process’s data space. It only remains for procedures that reference externals to gain addressability to the GOT. This is done by a code sequence such as the following LIR code:
          gp <- GOT_off - 4
          call next,r31
    next: gp <- gp + r31

where GOT_off is the address of the GOT relative to the instruction that uses it. The code sets the global pointer gp to point to the base of the GOT. Now the procedure can access external variables by means of their addresses in the GOT; for example, to load the value of an external integer variable named a, whose address is stored at offset a_off in the GOT, into register r3, it would execute

          r2 <- [gp+a_off]
          r3 <- [r2]

The first instruction loads the address of a into r2 and the second loads its value into r3. Note that for this to work, the GOT can be no larger than the non-negative part of the range of the offset in load and store instructions. For a RISC, if a larger range is needed, additional instructions must be generated before the first load to set the high-order part of the address, as follows:

          r3 <- high_part(a_off)
          r2 <- gp + r3
          r2 <- [r2+low_part(a_off)]
          r3 <- [r2]

where high_part() and low_part() provide the upper and lower bits of their argument, divided into two contiguous pieces. For this reason, compilers may provide two options for generating position-independent code, one with and one without the additional instructions.
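The GOT mechanism described above can be modeled in Python as a rough sketch. The function names, the dictionary representation of memory and of the GOT, and the 10-bit width of the low part are assumptions made for illustration; they are not taken from any particular linker or architecture.

```python
def relocate_got(got_offsets, dynamic_area_base):
    """Dynamic-linking step: turn each symbol's offset within the dynamic
    area into an absolute address in the current process's data space."""
    return {name: dynamic_area_base + off for name, off in got_offsets.items()}


def load_external(memory, got, name):
    """Two-step access to an external variable through the GOT."""
    address = got[name]      # corresponds to  r2 <- [gp+a_off]
    return memory[address]   # corresponds to  r3 <- [r2]


def split_offset(offset, low_bits=10):
    """Split an offset into contiguous high and low pieces whose sum is the
    original offset. The default width of 10 bits is an arbitrary choice."""
    low = offset & ((1 << low_bits) - 1)
    high = offset - low
    return high, low


if __name__ == "__main__":
    got = relocate_got({"a": 8, "b": 16}, dynamic_area_base=0x4000)
    memory = {0x4008: 42, 0x4010: 7}
    print(load_external(memory, got, "a"))
    print([hex(part) for part in split_offset(0x12345)])
```

The same object can be relocated with a different base in each process, which is why only the GOT, and not the code that indexes it, needs to change at link time.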
Transferring control between objects is not as simple as within an object, since the objects’ relative positions are not known at compile time, or even when the program is initially loaded. The standard approach is to provide, for each routine called from an object, a stub that is the target of calls to that routine. The stub is placed in the calling object’s data space, not its read-only code space, so it can be
modified when the called routine is invoked during execution, causing the routine to be loaded (if this is its first use in the called object) and linked.
There are several possible strategies for how the stubs work. For example, each stub might contain the name of the routine it corresponds to and a call to the dynamic linker, which would replace the beginning of the stub with a call to the actual routine. Alternatively, given a register-relative branch instruction, we could organize the stubs into a structure called a procedure linkage table, or PLT, reserve the first stub to call the dynamic linker, the second one to identify the calling object, and the others to each construct the index of the relocation information for the routine the stub is for, and branch to the first one (thus invoking the dynamic linker). This approach allows the stubs to be resolved lazily, i.e., only as needed, and versions of it are used in several dynamic linking systems. For SPARC, assuming that we have stubs for three procedures, the forms of the PLT before loading and after the first and third routines have been dynamically linked are shown in Figure 5.15(a) and (b), respectively. Before loading, the first two PLT entries are empty and each of the others contains instructions that compute a shifted version of the entry’s index in the PLT and branch to the first entry. During loading of the shared object into memory, the dynamic linker sets the first two entries as shown in Figure 5.15(b) (the second one identifies the shared object and the first creates a stack frame and invokes the dynamic linker) and leaves the others unchanged, as shown by the .PLT3 entry. When the procedure, say f(), corresponding to entry 2 in the PLT is first called, the stub at .PLT2 (which still has the form shown in Figure 5.15(a) at this point) is invoked; it puts the shifted index computed by the sethi in g1 and branches to .PLT0, which calls the dynamic linker.
The dynamic linker uses the object identifier and the value in g1 to obtain the relocation information for f(), and modifies entry .PLT2 correspondingly to create a jmpl to the code for f() that discards the return address (note that the sethi that begins the next entry is executed, harmlessly, in the delay slot of the jmpl). Thus, a call from this object to the PLT entry for f() henceforth branches to the beginning of the code for f() with the correct return address.
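Lazy resolution through a PLT can be mimicked in Python as a minimal sketch. The class, its attribute names, the resolve callback, and the routine name g are hypothetical; the model shows only the control flow (first call goes through the dynamic linker, which patches the entry) and not the SPARC instruction sequences.

```python
class SharedObject:
    """Toy model of a calling object whose PLT stubs are resolved lazily."""

    def __init__(self, object_id, relocations, resolve):
        self.object_id = object_id
        self.relocations = relocations  # PLT index -> routine name
        self.resolve = resolve          # hypothetical lookup: name -> callable
        self.linker_calls = 0
        # Every entry initially routes through the dynamic linker.
        self.plt = {i: self._unresolved_stub(i) for i in relocations}

    def _unresolved_stub(self, index):
        def stub(*args):
            # Like the sethi/branch pair: pass the entry's index to the linker.
            return self._dynamic_linker(index, *args)
        return stub

    def _dynamic_linker(self, index, *args):
        self.linker_calls += 1
        target = self.resolve(self.relocations[index])
        self.plt[index] = target  # patch the entry so later calls go direct
        return target(*args)

    def call(self, index, *args):
        return self.plt[index](*args)


if __name__ == "__main__":
    routines = {"f": lambda x: x + 1, "g": lambda x: -x, "h": lambda x: x * 2}
    obj = SharedObject("object_id", {2: "f", 3: "g", 4: "h"},
                       routines.__getitem__)
    print(obj.call(2, 10), obj.call(2, 20), obj.linker_calls)
```

Entries that are never called are never resolved, which is the saving that lazy binding is meant to provide.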
Accessing another object’s external variables is essentially identical to accessing one’s own, except that one uses that object’s GOT.
A somewhat subtle issue is the ability to form the address of a procedure at run time, to store it as the value of a variable, and to compare it to another procedure address. If the address of a procedure in a shared object, when computed within the shared object, is the address of its first instruction, while its address when computed from outside the object is the address of the first instruction in a stub for it, then we have broken a feature found in C and several other languages. The solution is simple: both within shared code and outside it, we use procedure descriptors (as described in the preceding section) but we modify them to contain the PLT entry address rather than the code’s address, and we extend them to include the address of the GOT for the object containing the callee. The code sequence used to perform a call through a procedure variable needs to save and restore the GOT pointer, but the result is that such descriptors can be used uniformly as the values of procedure variables, and comparisons of them work correctly.
    (a)                               (b)

    .PLT0: unimp                      .PLT0: save  sp,-64,sp
           unimp                             call  dyn_linker
           unimp                             nop
    .PLT1: unimp                      .PLT1: .word object_id
           unimp                             unimp
           unimp                             unimp
    .PLT2: sethi (.-.PLT0),g1         .PLT2: sethi (.-.PLT0),g1
           ba,a  .PLT0                       sethi %hi(f),g1
           nop                               jmpl  g1+%lo(f),r0
    .PLT3: sethi (.-.PLT0),g1         .PLT3: sethi (.-.PLT0),g1
           ba,a  .PLT0                       ba,a  .PLT0
           nop                               nop
    .PLT4: sethi (.-.PLT0),g1         .PLT4: sethi (.-.PLT0),g1
           ba,a  .PLT0                       sethi %hi(h),g1
           nop                               jmpl  g1+%lo(h),r0
           nop                               nop

FIG. 5.15 SPARC PLT (a) before loading, and (b) after two routines have been dynamically linked.
5.8 Symbolic and Polymorphic Language Support
Most of the compiler material in this book is devoted to languages that are well suited to compilation: languages that have static, compile-time type systems, that do not allow the user to incrementally change the code, and that typically make much heavier use of stack storage than heap storage.
In this section, we briefly consider the issues involved in compiling programs written in more dynamic languages, such as LISP, ML, Prolog, Scheme, SELF, Smalltalk, SNOBOL, Java, and so on, that are generally used to manipulate symbolic data and have run-time typing and polymorphic operations. We refer the reader to [Lee91] for a more expansive treatment of some of these issues. There are five main problems in producing efficient code for such a language, beyond those considered in the remainder of this book, namely,
1. an efficient way to deal with run-time type checking and function polymorphism,
2. fast implementations of the language’s basic operations,
3. fast function calls and ways to optimize them to be even faster,
4. heap storage management, and
5. efficient ways of dealing with incremental changes to running programs.
Run-time type checking is required by most of these languages because they assign types to data, not to variables. Thus, when we encounter at compile time an operation of the form “a + b”, or “(plus a b)”, or however it might be written in a particular language, we do not, in general, have any way of knowing whether the
operation being performed is addition of integers, floating-point numbers, rationals, or arbitrary-precision reals; whether it might be concatenation of lists or strings; or whether it is some other operation determined by the types of its two operands. So we need to compile code that includes type information for constants and that checks the types of operands and branches to the appropriate code to implement each operation. In general, the most common cases that need to be detected and dispatched on quickly are integer arithmetic and operations on one other data type, namely, list cells in LISP and ML, strings in SNOBOL, and so on.
Architectural support for type checking is minimal in most systems. SPARC, however, provides tagged add and subtract instructions that, in parallel with performing an add or subtract, check that the low-order two bits of both 32-bit operands are zeros. If they are not, either a trap or a condition code setting can result, at the user’s option, and the result is not written to the target register. Thus, by putting at least part of the tag information in the two low-order bits of a word, one gets a very inexpensive way to check that an add or subtract has integer operands. Some other RISCs, such as MIPS and PA-RISC, support somewhat slower type checking by providing compare-immediate-and-branch instructions. Such instructions can be used to check the tag of each operand in a single instruction, so the overhead is only two to four cycles, depending on the filling of branch-delay slots.
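The effect of a tagged add can be imitated in software with a short Python sketch. The function names and the choice to return a (result, ok) pair are assumptions for illustration; the sketch models only the tag check on the low-order two bits and ignores arithmetic overflow.

```python
TAG_MASK = 0b11        # the low-order two bits of a word hold the tag
WORD_MASK = 0xFFFFFFFF  # words are 32 bits wide


def make_int(n):
    """Represent a small integer with tag 00: the value sits in the upper 30 bits."""
    return (n << 2) & WORD_MASK


def tagged_add(a, b):
    """Add two tagged words. If either operand has a nonzero tag, report
    failure and produce no result, mirroring the unwritten target register."""
    if (a | b) & TAG_MASK:
        return None, False
    return (a + b) & WORD_MASK, True


if __name__ == "__main__":
    print(tagged_add(make_int(3), make_int(4)))  # both operands are integers
    print(tagged_add(make_int(3), 503))          # 503 carries tag 3
```

Because integers carry tag 00, the sum of two tagged integers is itself a correctly tagged integer, so no untagging or retagging is needed around the add.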
The low-order two bits of a word can also be used to do type checking in SPARC for at least one more data type, such as list cells in LISP. Assuming that list cells are doublewords, if one uses the address of the first word plus 3 as the pointer to a list cell (say in register r1), then word accesses to the car and cdr fields use addresses of the form r1 - 3 and r1 + 1, and the addresses are valid if and only if the pointers used in loads or stores to access them have a 3 in the low-order two bits, i.e., a tag of 3 (see Figure 5.16). Note that this leaves two other tag values (1 and 2) available for another type and an indicator that more detailed type information needs to be accessed elsewhere.
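The list-cell scheme relies on the alignment check that word loads perform anyway. The following Python sketch imitates it; the AlignmentTrap exception, the dictionary used as memory, and the cell contents are hypothetical, while the addresses 500, 503, and 504 follow the example in the text.

```python
WORD = 4      # bytes per word
LIST_TAG = 3  # tag carried in the low-order two bits of a list-cell pointer


class AlignmentTrap(Exception):
    """Stands in for the hardware trap on a misaligned word access."""


def load_word(memory, address):
    if address % WORD != 0:
        raise AlignmentTrap(hex(address))
    return memory[address]


def car(memory, pointer):
    return load_word(memory, pointer - 3)  # address of the first word


def cdr(memory, pointer):
    return load_word(memory, pointer + 1)  # address of the second word


if __name__ == "__main__":
    memory = {500: "head", 504: "tail"}  # one list cell at address 500
    r1 = 500 + LIST_TAG                  # tagged pointer, 503
    print(car(memory, r1), cdr(memory, r1))
    try:
        car(memory, 501)                 # a pointer with tag 1 is rejected
    except AlignmentTrap as trap:
        print("trap at", trap)
```

No explicit tag test appears in car or cdr: a pointer with any tag other than 3 yields a misaligned address, so the type check costs nothing beyond the load itself.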
The odd-address facet of the tagging scheme can be used in several other RISC architectures. Other efficient means of tagging data are discussed in Chapter 1 of [Lee91].
The work discussed in Section 9.6 concerns, among other things, software techniques for assigning, where possible, types to variables in languages in which, strictly speaking, only data objects have types.
Fast function calling is essential for these languages because they strongly encourage dividing programs up into many small functions. Polymorphism affects function-calling overhead because it causes determination at run time of the code to invoke for a particular call, based on the types of its arguments. RISCs are ideal
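[Figure 5.16: register r1 holds the tagged pointer 503 to a list cell whose car field is at address 500 and whose cdr field is at address 504.]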
Section 5.9 Wrap-Up 133
in this regard, since they generally provide fast function calls by branch-and-link instructions, pass arguments in registers, and, in most cases, provide quick ways to dispatch on the type of one or more arguments. One can move the type of an argument into a register, convert it to an offset of the proper size, and branch into a table of branches to code that implements a function for the corresponding type.
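Dispatching through a table indexed by type can be sketched in Python as follows. The type codes, the helper names, and the decision to dispatch on the first argument only are assumptions for illustration; a list of functions stands in for the table of branches.

```python
# Hypothetical small-integer type codes, used as indexes into the table.
TYPE_INT, TYPE_FLOAT, TYPE_STRING = 0, 1, 2


def type_code(value):
    """Map a value to its type code (the 'type moved into a register')."""
    if isinstance(value, int):
        return TYPE_INT
    if isinstance(value, float):
        return TYPE_FLOAT
    if isinstance(value, str):
        return TYPE_STRING
    raise TypeError(f"no type code for {type(value).__name__}")


def add_ints(a, b):
    return a + b


def add_floats(a, b):
    return a + b


def concat_strings(a, b):
    return a + b


# One entry per type code, in type-code order: the table of branches.
PLUS_TABLE = [add_ints, add_floats, concat_strings]


def plus(a, b):
    """Select the implementation of plus from the type of the first argument."""
    return PLUS_TABLE[type_code(a)](a, b)


if __name__ == "__main__":
    print(plus(2, 3), plus(1.5, 2.0), plus("ab", "cd"))
```

Indexing the table takes the same time however many types there are, which is the advantage over a chain of type comparisons.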
Dynamic and symbolic languages generally make heavy use of heap storage, largely because the objects they are designed to operate on are very dynamic in size and shape. Thus, it is essential to have a very efficient mechanism for allocating heap storage and for recovering it. Storage recovery is uniformly by garbage collection, not by explicit freeing. The most efficient method of garbage collection for general use for such languages is generation scavenging, which is based on the principle that the longer an object lives, the longer it is likely to live.
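The principle behind generation scavenging, that objects which have survived longer are examined less often, can be shown with a deliberately simplified Python sketch. The class, its promote_after parameter, and the decision to pass in the set of reachable objects (rather than tracing from roots, as a real collector would) are all assumptions made to keep the example short; it is not the algorithm of any particular system.

```python
class GenerationalHeap:
    """Toy two-generation heap that tracks only object names and ages."""

    def __init__(self, promote_after=2):
        self.promote_after = promote_after  # scavenges survived before promotion
        self.young = {}                     # name -> scavenges survived so far
        self.old = set()                    # promoted objects

    def allocate(self, name):
        self.young[name] = 0

    def scavenge(self, reachable):
        """Collect the young generation only. `reachable` is the set of live
        names; a real collector would compute it by tracing from the roots."""
        survivors = {}
        for name, age in self.young.items():
            if name not in reachable:
                continue                    # unreachable: storage is recovered
            if age + 1 >= self.promote_after:
                self.old.add(name)          # promoted: later scavenges skip it
            else:
                survivors[name] = age + 1
        self.young = survivors


if __name__ == "__main__":
    heap = GenerationalHeap()
    heap.allocate("a")
    heap.allocate("b")
    heap.scavenge({"a"})        # b dies young, a survives once
    heap.allocate("c")
    heap.scavenge({"a", "c"})   # a is promoted, c survives once
    print(sorted(heap.young), sorted(heap.old))
```

Objects in the old generation are left alone by scavenges, so recovering them would need a separate, less frequent collection that this sketch omits.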
Finally, the ability to incrementally change the code of a running program is a characteristic of most of these languages. This is usually implemented in compiled implementations by a combination of run-time compilation and indirect access to functions. If the name of a function in a running program is the address of a cell