Compile Time and Run Time Performance with the Sun C++ Compiler

(1)

Compile−Time and Run−Time Performance

with the Sun C++ Compiler

Lawrence Crowl

Sun Microsystems Inc. −− Forte Tools

ABSTRACT

The organization of your C++ software can have a significant impact on both compile−time and run−time performance. We present a number of techniques for improving performance. Topics include compilation, build environment, include files, inline functions, class design, function design, catching errors, memory allocation, and template design and instantiation. Because inconsiderate persuit of performance can lead to unmaintainable

software, we also discuss the engineering and maintenance implications of the techniques.

IINTRODUCTION

The goal of any software development effort should be to produce a quality product in a timely manner. The run− time performance of the product contributes to its qualityby delivering results faster. The compile−time performance of the product contributes to its timeliness by shortening the edit−compile−debug cycle. However, both run−time performance and compile−time performance are secondary factors in achieving timely quality. Therefore, you should consider run−time and compile−time performance improvements only when justified by improvements in overall product quality and timeliness.

This paper presents a number of techniques for improving the performance of C++ programs [ISO 1998] using the Sun C++ compiler [Sun 1999]. Most of these techniques are not specific to the Sun C++ compiler and may be more generally applied. However, these techniques are not a complete list, so also see the references. The techniques presented are tactical in nature, providing generally local and incremental improvements in performance. You can often gain significantly more performance by

improvements in data structures, algorithms, abstractions, goals, and management

Many of the techniques presented in this paper are not always helpful. For instance, some run−time performance improvements will degrade compile time and coding time. Likewise, some compile−time performance improvements

will degrade run time. Fortunately, while all of a program contributes to compile time and coding time, usually only a small fraction of the program contributes most of the execution time. You can concentrate run−time performance improvements on the fraction of the program that

contributes significantly to execution time, and apply compile−time performance improvements to the rest of the program.

The fraction of the program that contributes most of the execution time is usually not obvious. Furthermore, evidence indicates that developers are usually very poor judges of where the time is spent in programs. Therefore, you should apply performance improvements only after the performance is measured, and the hot spots are known, not suspected.

There are a number of tools available to measure performance. We recommend the Analyzer [Sun 1999], which is part of our product. For more system−level effects,

truss can also be a useful tool.

COMPILATION

We are continually improving our compilers, both in their compile time and the run time of generated programs. Therefore, the most cost−effective technique for improving run−time and compile−time performance is often to simply use the newest compiler.

Turnaround

To achieve the fastest build times, compile without -g and without optimization.

Optimization

The compiler flag -O produces optimized code, but optimized at a fairly low level, specifically -xO2 on a scale from -xO1 to -xO5. For good run−time performance at reasonable compilation times, consider using -xO3. The run−time performance gained is larger than the

(2)

because C++ produces a more complex intermediate form, which the code generator is able to reduce effectively with

-xO3. Conversely, C++ programs generally rely on indirection more heavily than C programs do, and hence compiling at -xO4 and -xO5 is often less effective than it would be for C programs. Optimization at these levels may not yield run−time improvements that are justified by their compile−time costs.

Finally, a note of caution about the -fast flag. This flag is carefully interpreted to provide maximum performance on a single machine. It is not good choice for programs that are to run on a wide variety of machines.

BUILD ENVIRONMENT

The build environment can have a significant affect on product build times. Build times can easily be reduced by an order of magnitude by paying careful attention to system−level effects. The following rules are usually helpful, but their significance to overall build times can vary quite a bit.

Use file systems that are local to the process. NFS file access is significantly slower than local file access, and the compiler may need to read and write many files.

Use make, not scripts fired by make. Taking the time to write explicit dependencies on directory creation, etc., will generally yield faster built times.

Use flat make structures and efficient makefiles. The cost of yet another invocation of make can be quite large, particularly when the makefiles use macros that resolve to executing a program. [Miller 1997]

Use dmake [Sun 1999], which is part of our product. Even on a single−processor machine, dmake can sometimes halve build times by overlapping computation and I/O. Build a program with templates in a single directory. This approach reduces the number of template repositories, and hence reduces the number of template instantiations. These template instantiations can be a significant portion of program build time.

Install a local compiler. For small programs, the cost of loading the compiler from an NFS file system will never be amortized.

LIBRARIES

Most C++ libraries are designed for good performance over

a wide range of uses. For example, the C++ standard library [ISO 1998] embodies good algorithms with a significant number of services. However, their general applicability means that such libraries are not optimal for any given application. You can achieve better run−time performance, and often better compile−time performance, with your own custom library. However, designing, documenting, coding, and debugging a custom library is detrimental to timeliness. Make your own custom library only when you need product quality more than you need product timeliness.

INCLUDE FILES

A significant cost in compilation of C++ programs is simply reading what can be a very large number of include files. Program organizations that reduce the number of files included can significantly reduce compile times.

Abstraction

The most effective technique for reducing file inclusion is to reduce interface complexity. Reducing interface

complexity generally improves the understandability of the product and hence increases quality and timeliness, so good interfaces generally improve both compilation time and debugging time. You should understand abstractions that can help organize your program effectively [Gamma 1995]. However, unbridled abstraction can be detrimental to the product. For example, when a project group must use an external interface, the group often wraps the external interface with an internal interface. This internal interface is often just a simple renaming of the external interface, providing none of the decoupling benefits that abstraction normally provides, but consuming all of the costs in development time, compile time, and run time. So create only abstractions that provide recognizable services. As an example of how abstractions can improve compile times, consider the case of a service provided through an abstract base class and factory, rather than through full classes and the new operator. For example:

class service {

public: virtual answer * query() = 0; virtual void statement( data * ) = 0; static service * generate();

protected: service(); };

...

service * instance = service::generate(); instance->statement( data1 );

(3)

In this example, the entire structure of the actual service is hidden from the clients of the service, and those clients need not spend the time to compile that structure, or include other structures implied by the internals of the service. Just as importantly, clients need not be recompiled when the internals of the service change, which can dramatically reduce incremental product build times.

Multiple Classes

Often several class definitions are needed to provide a service. These classes should all be defined in a single include file, rather than one class per include file. This technique reduces the number of files opened, which can have a significant effect on compilation time.

Incomplete Classes

Many classes only get used indirectly within an include file. For instance, the answer and data classes in the above example are used only through pointers. In these cases, the compiler does not need the full definition of the classes, and incomplete definitions suffice. So, instead of

#include "answer.h"

use

class answer;

Guarded Includes

Because managing the proper inclusion of class definitions can often become difficult, programs tend to rely on idempotent, or self−protecting, include files. For example,

#ifndef SERVICE_H #define SERVICE_H class service { ... }; #endif

If the file has not been previously read, the class is defined, otherwise, the file is essentially empty. We can avoid the unnecessary read of the file by testing the guard on the outside of the include as well.

#ifndef SERVICE_H #include "service.h" #endif

This technique is most effective when the compiler is hitting the limits of available main memory and as a result the file cache is ineffective.

INLINE FUNCTIONS

Inline function expansion can significantly decrease run time, but the cost is significantly increased compilation time.

Do not use inline functions when you anticipate changes to the function definition and recompiling all callers is expensive. In particular, an inline function definition commits you to a much larger ABI. Do not commit yourself without due consideration.

Cost/Benefit

Calls to small and quick functions can be smaller and quicker when expanded inline than when called normally. Conversely, calls to large or slow functions can be larger and slower when expanded inline than when branched to. Furthermore, all calls to an inline function must be recompiled whenever the function definition changes. Consequently, the decision to use inline functions requires considerable care.

The general advice is to use inline functions when the inlined function results in fewer instructions than the call to the function, or when the application performs significantly faster with the function inline. Usually this advice means that inline functions are either trivial, or used extensively in the hot areas of the program.

Un−Inlinable Functions

The compiler cannot inline all function calls, so making the most effective use of function inlining may require some source changes. Use the +w (or +w2) option to learn when function inlining does not occur. The compiler currently will not inline a function below -xO4 under the following circumstances.

The function contains difficult control constructs, such as loops, and switch statements. Many times these functions execute the difficult control constructs infrequently. To inline such a function, split the function into two parts, an inner part that contains the difficult control constructs and an outer part that decides whether or not to call the inner part. This technique of separating the infrequent part from the frequent part of a function can improve performance even when the compiler can inline the full function.

The inline function body is large or complicated. Apparently simple function bodies may be complicated because of calls to other inline functions within the

(4)

body, or because of implicit constructor and destructor calls (as often occurs in constructors and destructors for derived classes). For such functions, inline expansion rarely provides significant performance improvement, and the function is best left un−inlined. Constructors should be out−of−line until proven effective inline. The arguments to an inline function call are large or

complicated. The compiler is particularly sensitive when the object for an inline member function call is itself the result of an inline function call. To inline functions with complicated arguments, simply compute the function arguments into local variables and then pass the variables to the function. This rewrite is particularly effective when the parameters are references.

The function contains static local variables.

However, our compiler’s inlining effectiveness is growing, so what is true this release may not be true next release.

Conditional Inlining

Inlining reduces run time, but increases compile time. Unfortunately, the value of run time versus compile time is not fixed throughout program development. During initial development and debugging, compile time is significantly more important than run time. Conversely, during final program build and benchmarking, run time is more important than compile time. You can control the amount of inlining by making inline definitions conditional on a preprocessor variable. The technique has the general structure:

service.h:

#ifndef SERVICE_H #define SERVICE_H

// Include files needed by the class definitions. // Declare incomplete classes needed.

// Define the classes.

// Declare functions that are never inlined. #ifdef INLINING

#include "service.ii" #else // INLINING

// Declare other functions when not inlined. #endif // INLINING

#endif // SERVICE_H service.ii

// Include other files needed by inline definitions. // Declare incomplete classes needed.

// For each function definition: #ifdef INLINING

inline

#endif INLINING

// Provide the function header and body. service.cc

#include "service.h"

// Include other files needed for all function definitions. #ifndef INLINING

#include "service.ii" #endif // INLINING

// Define the functions that are never inlined.

The following example illustrates the above technique, as well as some techniques described earlier. (The code itself is meaningless.) store.h #ifndef STORE_H #define STORE_H class store { public: store(); private: long variable; }; #endif // STORE_H store.cc #include "store.h" store::store() { } type.h: #ifndef TYPE_H #define TYPE_H #include "store.h" class sym; class type : store { public: sym* mine(); type* strip(); private: sym* my_var; }; #ifdef INLINING #include "type.ii" #endif // INLINING #endif // TYPE_H type.ii: #ifdef INLINING inline #endif sym* type::mine() { return my_var; }; type.cc: #include "type.h" #include "sym.h"

(5)

#ifndef INLINING #include "type.ii" #endif // INLINING type* type::strip() { return mine()->mine(); } sym.h: #ifndef SYM_H #define SYM_H #include "store.h" class type; class sym : store { public: sym* up(); type* mine(); private: sym* up_var; type* my_var; };

extern sym* inner( sym* ); #ifdef INLINING

#include "sym.ii" #else

extern sym* outer( sym* ); #endif // INLINING #endif // SYM_H sym.ii: #include "type.h" #ifdef INLINING inline #endif // INLINING type* sym::mine() { return my_var; }; #ifdef INLINING inline #endif // INLINING sym* outer( sym* arg ) { return inner( arg->up() ); } sym.cc: #include "sym.h" #include "type.h" #ifndef INLINING #include "sym.ii" #endif // INLINING sym* sym::up() { return up_var; }

sym* inner( sym* arg ) { return arg->mine()->mine(); } main.cc: #include "type.h" #include "sym.h" int main() { sym sv; sym* sp = sv.mine()->strip()->mine(); return &sv == outer( sp );

}

This technique is particularly valuable during debugging because the compiler does not inline when the -g switch is present. As a consequence, during debugging, inline functions consume object space and compile time without any performance gain. Likewise, the reduced number of file inclusions needed by inlining also improves compile time. However, when inlining is enabled, compilations will generally include more files than they would without the technique. Because inlining is generally most useful during optimization, and optimization itself is slow, the relative cost in compile time due to extra includes is low. So, for debuggable builds, use -g and do not define INLINING. For fast builds without debugging, do not use -g and do not define INLINING. For run−time optimized builds, use one of the optimization flags and define INLINING.

There is one strong note of caution in using this technique. All objects including files testing INLINING must be compiled with the same definition of INLINING. In practice, this restriction means that only application programmers can use the technique.

CLASS DESIGN

The design of classes and class hierarchies can have a significant impact on run−time performance, through increased operation efficiency, reduced data size, and reduced code size.

Bit−Fields

Convert boolean and small integer values into bit−fields, and then place these fields adjacent to each other. This technique can substantially reduce data size, though it generally requires more instructions to implement. The cost of reading bit−fields depends primarily on how much instruction parallelism is available, while the cost of writing bit−fields is usually double the cost of writing a non−bit−field.

Field Order

Order data fields to limit the size of unused padding between fields. In particular, order fields by their types in the order long double, double, long long int, pointers, long

(6)

be feasible or reliable when the program uses typedefs that change with the build environment.

Devirtualizing Methods

Convert a virtual method to a non−virtual method when the virtualness of the method is not used. This technique will improve the speed of calls, and enable inlining for further speed improvements. In general, this technique should be applied towards the end of program development, when the abstractions used by the program have become stable and program run time is more important than development flexibility.

Converter Methods

The standard mechanism for dynamic cast is very general, and consequently is more expensive than most specific needs warrant. You can achieve similar functionaly by providing dynamic converter methods rather than using

dynamic_cast. For example, instead of

car* car_ptr = dynamic_cast<car*>( vehicle_ptr )

use

car* car_ptr vehicle_ptr->to_car()

where

class vehicle {

virtual car* to_car() { return (car*)0; } };

class car : vehicle {

virtual car* to_car() { return this; } };

Duplicate Methods

Rather than duplicate methods in different derived classes, define them in their least common base class. This transformation reduces the working set, which improves run time, and reduces the number of functions, which improves compile time.

Default Operators

Use the default operators. If a class definition does not declare a parameterless constructor, a copy constructor, a copy assignment operator, or a destructor, the compiler will implicitly declare them. These are called default operators. A C−like struct has these default operators. When the compiler builds a default operator, it knows a great deal

about the work that needs to be done and can produce very good code. This code is often much faster than user− written code because the compiler can take advantage of assembly−level facilities while the programmer usually cannot. So, when the default operators do what is needed, the program should not declare user−defined versions of these operators.

Default operators are inline functions, so do not use default operators when inline functions are inappropriate (see the previous section). Otherwise, default operators are appropriate when:

The user−written parameterless constructor would only call parameterless constructors for its base objects and member variables. Primitive types effectively have "do nothing" parameterless constructors.

The user−written copy constructor would simply copy all base objects and member variables.

The user−written copy assignment operator would simply copy all base objects and member variables. The user−written destructor would be empty. Some C++ programming texts suggest that class

programmers always define all operators so that any reader of the code will know that the class programmer did not forget to consider the semantics of the default operators. Obviously, this advice interferes with the optimization discussed above. The resolution of the conflict is to place a comment in the code stating that the class is using the default operator.

Value Classes

C++ classes, including structures and unions, are passed and returned by value. For Plain−Old−Data (POD) classes, the C++ compiler is required to pass the struct as would the C compiler. Objects of these classes are passed directly. For objects of classes with user−defined copy constructors, the compiler is effectively required to construct a copy of the object, pass a pointer to the copy, and destruct the copy fter the return. Objects of these classes are passed indirectly. For classes that fall between these two requirements, the compiler can choose. However, this choice affects binary compatibility, so the compiler must choose consistently for every class.

For most compilers, passing objects directly can result in faster execution. This execution improvement is

(7)

complex numbers or probability values. You can sometimes improve program efficiency by designing classes that are more likely to be passed directly than indirectly.

In compatibility mode (-compat[=4]), a class is passed indirectly if it has any one of the following:

A user−defined constructor A virtual function

A virtual base class

A base that is passed indirectly

A non−static data member that is passed indirectly Otherwise, the class is passed directly.

In standard mode (the default mode), a class is passed indirectly if it has any one of the following:

A user−defined copy constructor A user−defined destructor A base that is passed indirectly

A non−static data member that is passed indirectly Otherwise, the class is passed directly.

To maximize the chance that a class will be passed directly: Use default constructors, especially the default copy

constructor, where possible.

Use the default destructor where possible. The default destructor is not virtual, therefore a class with a default destructor should generally not be a base class.

Avoid virtual functions and virtual bases.

Classes (and unions) that are passed directly by the C++ compiler are passed exactly as the C compiler would pass a struct (or union). However, C++ structs and unions are passed differently on different architectures.

SPARC V7/V8

Structs and unions are passed and returned by allocating storage within the caller and passing a pointer to that storage. (That is, all structs and unions are passed by reference.)

SPARC V9

Structs with a size no greater than 16 bytes (32 bytes) are passed (returned) in registers. Unions and all other structs are passed and returned by allocating storage within the caller and passing a pointer to that storage. (That is, small structs are passed in registers; unions and large structs are passed by reference.) As a consequence, small value classes are passed as efficiently as primitive

types. IA32

Structs and unions are passed by allocating space on the stack and copying the argument onto the stack. Structs and unions are returned by allocating a temporary object in the caller’s rame and passing the address of the temporary object as an implicit first parameter.

FUNCTION DESIGN

Functions efficiency is the core of good run−time performance, and C++ functions often have subtle inefficiencies because the language often hides significant work behind very terse syntax.

Reference Parameters

Programmers often code functions with reference

parameters rather than value parameters because "it is more efficient", rather than because the semantics are

appropriate. However, as we saw above, value parameters may be more efficient, and even when they are not directly more efficient, the compiler knows that a value parameter cannot be aliased, and so can better optimize access to the parameter.

However, switching between reference parameters and value parameters is not neutral with respect to semantics, so care must be taken that the switch is appropriate. In

particular, if the class has virtual functions, passing the class as a value parameter is probably not appropriate.

Temporary Objects

C++ functions often produce implicit temporary objects, each of which must be created and destroyed. For non− trivial classes, the creation and destruction of temporary objects can be expensive in terms of processing time and memory usage. The C++ compiler does eliminate some temporary objects, but it cannot eliminate all of them. Write functions to minimize the number of temporary objects as long as your programs remain comprehensible. Techniques include using explicit variables rather than implicit temporary objects and using reference parameters rather than value parameters. Another technique is to implement and use operations such as += rather than implementing and using only + and =. For example, the first line below introduces a temporary object for the result of a + b, while the second line does not.

(8)

T x( a ); x += b;

Cache Member Variables

Accessing member variables is a common operation in C++ member functions.

The compiler must often load member variables from memory through the this pointer. Because values are being loaded through a pointer, the compiler sometimes cannot determine when a second load must be performed or whether the value loaded before is still valid. In these cases, the compiler must choose the safe, but slow, approach and reload the member variable each time it is accessed.

You can avoid unnecessary memory reloads by explicitly caching the values of member variables in local variables, as follows:

Declare a local variable and initialize it with the value of the member variable.

Use the local variable in place of the member variable throughout the function.

If the local variable changes, assign the final value of the local variable to the member variable. However, this optimization may yield undesired results if the member function calls another member function on that object. This optimization is most productive when the values can reside in registers, as is the case with primitive types. The optimization may also be productive for memory−based values because the reduced aliasing gives the compiler more opportunity to optimize.

This optimization may be counter−productive if the member variable is often passed by reference, either explicitly or implicitly.

On occasion, the desired semantics of a class requires explicit caching of member variables, for instance when there is a potential alias between the current object and one of the member function’s arguments. For example:

void complex::operator *= ( const complex& right ) { real = real * right.real + imag * right.imag; imag = real * right.imag + imag * right.real; }

will yield unintended results when called with:

x *= x;

The cached version does not exhibit the problem.

void complex::operator *= ( const complex& right ) { double this_real = real, this_imag = image; double that_real = right.real, that_imag = right.image;

real = this_real * that_real + this_imag * that_imag; imag = this_real * that_imag + this_imag * that_real; }

Common Call Elimination

C++ programs usually rely heavily on procedure calls, and a function often calls the same function with the same data more than once. While compilers are generally effective at sharing the results of common subexpressions, they are not effective at sharing the results of common calls. For example,

person* grandfather = me->father()->father(); person* grandmother = me->father()->mother();

has a common call, and will generally be more efficient written as

person* dad = me->father();

person* grandfather = dad->father(); person* grandmother = dad->mother();

While this technique may seem obvious, and indeed it is, it often requires special attention because the extremely concise nature of C++ calls often provides insufficient reminders to consider efficiency.

Const Declarations

Many programmers use const reference parameters with the intent to inform the compiler that the parameter is read− only,

and hence should be better optimized. Unfortunately, this intent is at odds with the C++ language definition. The

const keyword says that the storage may not be modified through the given name. What it does not say is that the storage cannot be modified through some other name. With the exception of variables directly declared const, which means you can only initialize them, const is basically ineffective at improving run−time performance. It does, however, catch errors in the programming process.

CATCHING RUN−TIME ERRORS

While errors are generally uncommon, their very possibility can have a significant effect on program development time and on program performance.

(9)

Exit

In general, you should use assert, abort, or exit (or an application−specific variant) when program termination is acceptable. This technique yields the smallest intrusion on program performance.

Exceptions

When program termination is unacceptable and errors are rare, use exceptions. The C++ exception mechanism is designed for effective handling of rare events. The Sun C++ compiler has an extremely efficient implementation of exceptions when they are not thrown, and it is getting faster with each release. Furthermore, exceptions limit the coding needed to deal with the problem to the point of detection and the point of handling, and avoid handling by

intermediates. However, exceptions have a high cost when actually thrown, and are thus not suitable when an error is common.

Return Codes

For common errors, the traditional function return codes provide the most efficient run−time solution.

MEMORY ALLOCATION

Memory allocation can consume a very large amount of time. Reducing the amount of allocation is usually the most effective technique for improving performance.

Deallocating Memory

Once unnecessary allocations are eliminated, the next most effective technique for improving performance is to deallocate memory when it is no longer needed. This is best accomplished with explicit calls to delete. However, applications may lose control of their memory, and a conservative garbage collector can sometimes be a critical tool. The C++ compiler provides a conservative garbage collector with the option -library=gc.

Allocating Efficiently

Once memory is deallocated, you must turn to improving the efficiency of allocation. The option -library=gc provides afaster malloc and free. If that is not enough, use class− specific new and delete operators. These are commonly called pool allocators.

Finally, when processing character strings, consider using

realloc when adjusting string size rather than separate new

and delete.

TEMPLATES

Programs that use templates heavily can benefit quite a bit from careful template design.

Factoring Common Code

The simple intent of templates is to adapt a general class or function design to a particular type. This is readily

accomplished by a simple template design. Unfortunately, these simple designs often result in template instances with large amounts of identical code because the entire class or function is specialized, not just the part that must be specialized.

template< typename T > struct list {

T* chain;

T* next() { return chain; } void method();

};

You can reduce the amount of code generated, and hence the working set size of the application, by defining a common non−template base class and a derived template class. The base class contains the common code, and the derived class contains the part that must be specialized.

struct list_base { void* chain; void method(); };

template< typename T > struct list : list_base { T* chain;

T* next() { return (T*)chain; } };

However, there are some costs associated with this

technique. First, debugging may be hindered when the base class uses void* rather than a pointer to the actual type. Second, use of a base class will introduce another layer of function calls, which can cost more than the lower working set size justifies. Third, some type−based optimizations may no longer be possible, which is a concern only at very high levels of optimization.

Definitions

You can organize your template definitions in two ways, with definitions included and with definitions separated. The definitions−included organization places the template

(10)

definitions within the template header file.

whatever.h

template< typename T > int function( T ) { return 0; }

The definitions−separated organization places the template definitions in a separate file, which is read as−needed during template instantiation.

whatever.h

template< typename T > int function( T ); whatever.cc

template< typename T > int function( T ) { return 0; }

It is important to note that the template definition file is essentially an automatic include file. Do not compile this file.

The definitions−included organization requires the compiler to store template definitions, thus increasing compiler data size, even when no templates are instantiated, but it reads the least number of files. The definitions−separated organization

avoids saving templates when none are instantiated, but requires searching directories and reading additional files when templates are instantiated. If a particular compilation usually does not instantiate a template, the definitions− separated organization is most efficient. Otherwise, the definitions−included organization is most efficient.

Instantiation

Template instantiation can have a major impact on compile time. The default instantiation mode of the compiler is

-instances=extern, which provides full support of the language with minimal programming effort but is generally the slowest method of compilation.

The fastest template instantiation mode is

-instances=explicit, but it requires manual instantiation of all templates needed, which can require as substantial amount of programmer effort to identify all the needed instantiations.

A slightly easier mode is -instances=semiexplicit, which requires manual instantiation of the directly used instances, but permits the compiler to automatically produce

dependent instances. However, this mode restricts the kinds of templates one can use, and produces larger programs than -instances=explicit. See compiler manual for a

discussion of the restrictions.

The easiest mode is -instances=static, which automatically instantiates, but restricts the kinds of templates one can use, and produces even larger programs than

-instances=semiexplicit.

In summary, use the default instantiation mode when program size is important, but compilation time is not. Use

-instances=static when compilation time is important, but program size is not. Use -instances=explicit or

-instances=semiexplicit when both are important and you are willing to work for the benefits.

SUMMARY

First and foremost, design your program well. Well− designed programs will often have good compile−time performance. If not, local changes will usually bring good compile−time performance. When the run−time

performance is inadequate,

explore compilation options and alternate libraries. When the run−time performance is still inadequate, reprogram the small portion of the code that contributes most to run time.

REFERENCES

1. Bentley, J., (1982). Writing Efficient Programs. Prentice−Hall.

2. Bulka, D., Mayhew, D. (2000). Efficient C++:

Performance Programming Techniques. Addison−

Wesley.

3. Gamma, D., et al., (1995). Design Patterns: Elements

of Reusable Object−Oriented Software. Addison−

Wesley.

4. ISO. (1998). Programming languages − C++. ISO/IEC 14882:1998.

5. Meyers, S., (1992). Effective C++. Addison−Wesley. 6. Meyers, S., (1996). More Effective C++. Addison−

Wesley.

7. Miller, P., (1997). Recursive Make Considered Harmful, AUUGN ’97.

8. Sun Microsystems, (1999). Forte 6 Analyzing Program

Performance With Sun WorkShop.

9. Sun Microsystems, (1999). Forte 6 C++ Programming

Guide.

10. Sun Microsystems, (1999). Forte 6 C++ User’s Guide. 11. Sun Microsystems, (1999). Forte 6 TeamWare User’s

(11)