
Shared objects in Linux, BSD and Mac systems normally use so-called position-independent code. The name "position-independent code" actually implies more than it says. Code that is compiled as position-independent has the following features:

• The code section contains no absolute addresses that need relocation, but only self-relative addresses. Therefore, the code section can be loaded at an arbitrary memory address and shared between multiple processes.

• The data section is not shared between multiple processes because it often contains writeable data. Therefore, the data section may contain pointers or addresses that need relocation.

• All public functions and public data can be overridden in Linux and BSD. If a function in the main executable has the same name as a function in a shared object, then the version in main will take precedence, not only when called from main, but also when called from the shared object. Likewise, when a global variable in main has the same name as a global variable in the shared object, then the instance in main will be used, even when accessed from the shared object. This so-called symbol interposition is intended to mimic the behavior of static libraries. To implement this "override" feature, a shared object has a table of pointers to its functions, called the procedure linkage table (PLT), and a table of pointers to its variables, called the global offset table (GOT). All accesses to functions and public variables go through the PLT and GOT.
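The effect of this indirection can be imitated in ordinary C++. The following is only a toy model, not real loader code: the namespaces `lib` and `app`, the pointers `got_counter` and `plt_next_id`, and the two `bind_...` functions are hypothetical names invented for the illustration. Plain pointers stand in for the GOT and PLT entries, and a function stands in for the dynamic loader that fills them in.

```cpp
#include <cassert>

namespace lib {                       // stands in for the shared object
    int counter = 10;                 // public variable defined in the library
    int next_id() { return 1; }       // public function defined in the library

    int* got_counter = &counter;      // stands in for a GOT entry
    int (*plt_next_id)() = &next_id;  // stands in for a PLT entry

    // Even a caller inside the library goes through the tables.
    int make_label() {
        assert(plt_next_id != nullptr && got_counter != nullptr);
        return plt_next_id() + *got_counter;
    }
}

namespace app {                       // stands in for the main executable
    int counter = 100;                // same name as the library's variable
    int next_id() { return 2; }       // same name as the library's function
}

// Stand-in for the loader when main does not define the symbols.
void bind_to_library() {
    lib::got_counter = &lib::counter;
    lib::plt_next_id = &lib::next_id;
}

// Stand-in for the loader when main defines symbols with the same names:
// the table entries are pointed at the versions in main.
void bind_with_interposition() {
    lib::got_counter = &app::counter;
    lib::plt_next_id = &app::next_id;
}
```

After `bind_to_library()`, `lib::make_label()` uses the library's own function and variable. After `bind_with_interposition()`, the same unchanged library code uses the versions in `app`. The model also shows the cost: every access inside `make_label` is an extra pointer load.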

The symbol interposition feature that allows overriding of public functions and data in Linux and BSD comes at a high price, and in most libraries it is never used. Whenever a function in a shared object is called, it is necessary to look up the function address in the procedure linkage table (PLT). And whenever a public variable in a shared object is accessed, it is necessary to first look up the address of the variable in the global offset table (GOT). These table lookups are needed even when the function or variable is accessed from within the same shared object. Obviously, all these table lookup operations slow down the execution considerably. A more detailed discussion can be found at

http://www.macieira.org/blog/2012/01/sorry-state-of-dynamic-libraries-on-linux/

Another serious burden is the calculation of self-relative references in 32-bit mode. The 32-bit x86 instruction set has no instruction for self-relative addressing of data. The code goes through the following steps to access a public data object: (1) get its own address through a function call, (2) find the GOT through a self-relative address, (3) look up the address of the data object in the GOT, and finally (4) access the data object through this address. Step (1) is not needed in 64-bit mode because the x86-64 instruction set supports self-relative addressing.
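The four steps can be traced in a small toy model. This is a sketch under stated assumptions, not what a compiler emits: "addresses" are indices into a vector, and the type `ToyProcess`, the functions `load` and `read_public_variable`, and the offset constants are all hypothetical. The point it illustrates is that the code only contains the distance from itself to the GOT, so the same code works at any load address.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy address space: indices into 'memory' play the role of addresses.
struct ToyProcess {
    std::vector<std::uint32_t> memory;
    std::uint32_t image_base;  // where the "loader" placed the shared object
};

// Offsets fixed at link time, relative to the start of the image
// (arbitrary values chosen for the illustration).
constexpr std::uint32_t kInstrOffset = 4;   // the instruction doing the access
constexpr std::uint32_t kGotOffset   = 16;  // start of the GOT
constexpr std::uint32_t kSlot        = 1;   // GOT slot of the variable

// Stand-in for the loader: chooses a base address and writes the absolute
// address of the variable into the GOT slot (a relocation in the data section).
ToyProcess load(std::uint32_t base, std::uint32_t var_address, std::uint32_t value) {
    ToyProcess p{std::vector<std::uint32_t>(256, 0), base};
    assert(base + kGotOffset + kSlot < p.memory.size());
    assert(var_address < p.memory.size());
    p.memory[base + kGotOffset + kSlot] = var_address;
    p.memory[var_address] = value;
    return p;
}

std::uint32_t read_public_variable(const ToyProcess& p) {
    std::uint32_t pc  = p.image_base + kInstrOffset;        // (1) get own address
    std::uint32_t got = pc + (kGotOffset - kInstrOffset);   // (2) self-relative distance to the GOT
    std::uint32_t address = p.memory[got + kSlot];          // (3) look up the variable's address
    return p.memory[address];                               // (4) access the variable
}
```

Loading the same image at two different bases, for example `load(32, 200, 7)` and `load(96, 60, 9)`, gives the right value in both cases without changing `read_public_variable`. In 64-bit mode step (1) would disappear because the instruction itself can address data relative to its own position.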

In 32-bit Linux and BSD, the slow GOT lookup process is used for all static data, including local data that do not need the "override" feature. This includes static variables, floating point constants, string constants, and initialized arrays. I have no explanation why this delaying process is used when it is not needed.

Obviously, the best way to avoid the burdensome position-independent code and table lookup is to use static linking, as explained in the previous chapter (page 155). In the cases where dynamic linking cannot be avoided, there are various ways to avoid the time-consuming features of the position-independent code. These workaround methods depend on the system, as explained below.

Shared objects in 32-bit Linux

Shared objects are normally compiled with the option -fpic according to the Gnu compiler manual. This option makes the code section position-independent, and makes a PLT for all functions and a GOT for all public and static data.

It is possible to compile a shared object without the -fpic option. Then we get rid of all the problems mentioned above. The code will run faster because we can access internal variables and internal functions in a single step rather than through the complicated address calculation and table lookup mechanisms explained above. A shared object compiled without -fpic is much faster, except perhaps for a very large shared object where most of the functions are never called. The disadvantage of compiling without -fpic in 32-bit Linux is that the loader will have more references to relocate, but these address calculations are done only once, while the runtime address calculations have to be done at every access. The code section needs one instance for each process when compiled without -fpic because the relocations in the code section will be different for each process. Obviously, we lose the ability to override public symbols, but this feature is rarely needed anyway.

It is preferable to avoid global variables, or to hide them, for the sake of portability to 64-bit mode, as explained below.

Shared objects in 64-bit Linux

The procedure to calculate self-relative addresses is much simpler in 64-bit mode because the 64-bit instruction set has support for relative addressing of data. The need for special position-independent code is smaller because relative addresses are often used by default anyway in 64-bit code. However, we still want to get rid of the GOT and PLT lookups for local references.

If we compile the shared object without -fpic in 64-bit mode, we encounter another problem. The compiler sometimes uses 32-bit absolute addresses, mainly for static arrays. This works in the main executable because it is sure to be loaded at an address below 2 GB, but not in a shared object, which is typically loaded at a higher address that cannot be reached with a 32-bit (signed) address. The linker will generate an error message in this case. The best solution is to compile with the option -fpie instead of -fpic. This will generate relative addresses in the code section, but it will not use GOT and PLT for internal references. Therefore, it will run faster than when compiled with -fpic and it will not have the disadvantages mentioned above for the 32-bit case. The -fpie option is less useful in 32-bit mode, where it still uses a GOT.
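As a minimal sketch of the kind of code this concerns, the following hypothetical library module contains a static array that is indexed at runtime. The file name `lib.cpp`, the library name `libexample.so` and the function `square_of` are assumptions made for the illustration. The build commands in the comment use only the options discussed here, and whether the link step succeeds may depend on the compiler and linker version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical library module, e.g. lib.cpp. A possible build, assuming
// Gnu-style command lines:
//   g++ -O2 -fpie -c lib.cpp -o lib.o
//   g++ -shared lib.o -o libexample.so

// A static array: the kind of object for which the compiler may use a
// 32-bit absolute address when neither -fpie nor -fpic is specified.
static const int squares[8] = {0, 1, 4, 9, 16, 25, 36, 49};

// Public function that indexes the static array at runtime.
int square_of(std::size_t n) {
    assert(n < 8);
    return squares[n];
}
```

With -fpie the array is expected to be addressed relative to the instruction pointer, so no relocation is needed in the code section and no GOT entry is needed for this internal reference.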

Another possibility is to compile with -mcmodel=large, but this will use full 64-bit addresses for everything, which is quite inefficient, and it will generate relocations in the code section so that it cannot be shared.

You cannot have public variables in a 64-bit shared object made with option -fpie because the linker reports an error when it sees a relative reference to a public variable where it expects a GOT entry. You can avoid this error by avoiding any public variables. All global variables (i.e. variables defined outside any function) should be hidden by using the declaration "static" or "__attribute__((visibility

The Gnu compiler version 5.1 and later has an option -fno-semantic-interposition, which makes it avoid the use of PLT and GOT lookups, but only for references within the same file. The same effect can be obtained by using inline assembly code to give the variable two names, one global and one local, and using the local name for local references.
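The following sketch shows hidden global variables together with a simple way of giving a function a local and a global name. Note that this is not the inline assembly method mentioned above: it substitutes a plain static function plus a thin exported wrapper, which gives a similar separation of names in standard C++. The visibility attribute is a Gnu/Clang extension, and all identifiers (`slots`, `slot_count`, `put_impl`, `put`, `fill_all`, `get`) are hypothetical.

```cpp
#include <cassert>

// Hypothetical module of a shared object.

// Internal linkage: invisible outside this source file.
static int slots[4] = {0, 0, 0, 0};

// Hidden visibility: can be shared between the modules of the same
// shared object, but is not exported and cannot be overridden.
__attribute__((visibility("hidden"))) int slot_count = 4;

// Local name: internal linkage, so calls from this file bind directly.
static int put_impl(int index, int value) {
    assert(index >= 0 && index < slot_count);
    slots[index] = value;
    return value;
}

// Global name: thin exported wrapper for callers outside the library.
int put(int index, int value) { return put_impl(index, value); }

// An internal caller uses the local name, not the exported one.
int fill_all(int value) {
    int sum = 0;
    for (int i = 0; i < slot_count; ++i) sum += put_impl(i, value);
    return sum;
}

// Read access goes through a function instead of a public variable.
int get(int index) {
    assert(index >= 0 && index < slot_count);
    return slots[index];
}
```

Only `put`, `fill_all` and `get` are exported here. The data can be reached from outside the library only through these functions, so the library has no public variables.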

Despite these tricks, you may still get the error message: "relocation R_X86_64_PC32 against symbol `functionname' can not be used when making a shared object; recompile with -fPIC", when the shared object is made from multiple modules (source files) and there is a call from one module to another. I have not yet found a solution to this problem.

Shared objects in BSD

Shared objects in BSD work the same way as in Linux.

32-bit Mac OS X

Compilers for 32-bit Mac OS X make position-independent code and lazy binding by default, even when shared objects are not used. The method currently used for calculating self-relative addresses in 32-bit Mac code is unfortunate because it delays execution by causing return addresses to be mispredicted (see manual 3: "The microarchitecture of Intel, AMD and VIA CPUs" for an explanation of return prediction).

All code that is not part of a shared object can be sped up significantly just by turning off the position-independent code flag in the compiler. Remember, therefore, always to specify the compiler option -fno-pic when compiling for 32-bit Mac OS X, unless you are making a shared object.

It is possible to make shared objects without position-independent code when you compile with the option -fno-pic and link with the option -read_only_relocs suppress. GOT and PLT tables are not used for internal references.

64-bit Mac OS X

The code section is always position-independent because this is the most efficient solution for the memory model used here. The compiler option -fno-pic apparently has no effect. GOT and PLT tables are not used for internal references.

There is no need to take special precautions for speeding up 64-bit shared objects in Mac OS X.