
Design Considerations

In document prus-1995 (Page 55-63)

Glenn Fowler, David Korn, Stephen North, Herman Rao, and Kiem-Phong Vo

2.2 The ast Libraries

2.2.1 Design Considerations

The primary goals in building the ast components are applicability, efficiency, ease of use, and ease of maintenance. However, there is no simple set of rules that would guarantee the simultaneous achievement of these goals. Often enough, the goals conflict and decisions have to be made to balance the trade-offs. Below is an eclectic set of design considerations used as guidelines in building the ast software.

2.2.1.1 Necessity

A component is not reusable unless it is used. This means that a reusable component should be built out of real needs. A way to meet this condition is to first plan some applications, then build the functions that make up the applications as one or more libraries. Because libraries are often used in different ways, this approach has the additional advantage of forcing the programmer to think in advance about different usages, resulting in better code quality.

An example of writing a library before writing a command is the libpp library (Section 2.2.9). libpp defines the token parsing and symbol processing engine in our K&R [KR88], ANSI [ANS90], and C++ [Str91] compatible C preprocessor. Besides this original use, libpp has found use in other important language related tools, such as Cia (Chapter 6), a system for storing and finding information about C programs, and app (Chapter 5), a preprocessor to annotate C code with assertions.

2.2.1.2 Generality

Except for efficiency concerns, reusable components should be designed for their most general applications. Often, this means putting together separate but related concepts into a single unifying interface. This is important because applications often use similar mechanisms (for example, various search structures) in different ways (such as for storing objects of different types). Unifying the different mechanisms in a standard set of functions both simplifies application construction and increases their ease of maintenance. A further important effect of generality is that it often opens up new uses.

Libraries and File System Architecture 29

The dictionary library libdict (see Section 2.3.3) shows an example of how related concepts are unified under a single general interface. Ordered and unordered objects are treated uniformly. Efficiency is guaranteed by switching storage methods between hash tables for unordered objects and self-adjusting binary trees for ordered ones.

An aspect of generality related to portability is to provide common abstractions that hide the differences in the underlying platforms. Though our software is UNIX-based, it is no secret that no two versions of UNIX are the same. In the short term, the existence of standards bodies, such as POSIX [POS90], actually worsens the situation, as the standards tend to be some amalgam of existing systems but unlike any of them. Sometimes, when the differences in extant implementations of a desired feature are wide enough, the standards may even shy away from defining one. In Section 2.2.2, we describe a set of functions and header files that combine features from various UNIX flavors. Our tools are written based on this interface to increase portability.

2.2.1.3 Extensibility

In building a library, certain low-level but critical functions from the underlying platform are frequently required. Sometimes it is profitable to abstract such dependencies and let applications provide appropriate processing functions. This is similar to the idea of virtual functions in C++, which parameterize the operations of an abstract class. Section 2.3 discusses disciplines, which are interfaces designed to capture external resource dependencies. This allows applications to redefine such resource requirements for a library without tampering with its internals.
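The discipline idea can be illustrated with a small sketch. This is not the ast discipline interface described in Section 2.3; the names MemDisc, lib_strdup(), and the counting functions are hypothetical and chosen only to show how a library can take its resource functions from the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical discipline: the resource functions a library needs,
   supplied by the application instead of being hard-wired. */
typedef struct MemDisc {
    void *(*acquire)(size_t size, void *ctx);  /* obtain memory */
    void  (*release)(void *ptr, void *ctx);    /* give it back */
    void  *ctx;                                /* application-private data */
} MemDisc;

/* Default discipline: plain malloc/free. */
static void *default_acquire(size_t size, void *ctx) { (void)ctx; return malloc(size); }
static void  default_release(void *ptr, void *ctx)   { (void)ctx; free(ptr); }
static MemDisc default_disc = { default_acquire, default_release, NULL };

/* Application-supplied discipline that counts live allocations in *ctx. */
static void *counting_acquire(size_t size, void *ctx) { ++*(int *)ctx; return malloc(size); }
static void  counting_release(void *ptr, void *ctx)   { --*(int *)ctx; free(ptr); }

/* Hypothetical library routine: duplicates a string using whichever
   discipline the caller passes (NULL selects the default). */
char *lib_strdup(const char *s, MemDisc *disc)
{
    size_t n;
    char *p;

    assert(s != NULL);
    if (!disc)
        disc = &default_disc;
    n = strlen(s) + 1;
    p = disc->acquire(n, disc->ctx);
    if (p)
        memcpy(p, s, n);
    return p;
}
```

An application that wants to track or redirect allocation fills in a MemDisc of its own and passes it in; the library code itself does not change.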

2.2.1.4 Efficiency

Efficiency is a primary consideration in the construction of a reusable component, because the performance of such a component is amplified by its repeated use. When a reusable component does not perform adequately, programmers will be tempted to hand-code and create applications that are hard to maintain. There are two aspects of efficiency: internal and external.

30 Fowler, Korn, and Vo

Internal efficiency: This means, first, that library components are implemented using the current best-known data structures and algorithms. Then, even if general algorithms may have good performance over a large class of operations, it is sometimes beneficial to optimize code based on its most popular use or local hardware and platform features. An example of this type of optimization is the numerical conversion algorithm from an internal representation to an ASCII format in the sfprintf() family of functions in the sfio library (Section 2.3.1). Here, because base 10 is most commonly used, it is handled using a fast customized algorithm. Other bases are handled by a general but slower method.
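A minimal sketch of this kind of special-casing follows. It is not the sfio conversion code; utoa_base() is a hypothetical function, and the only point illustrated is that the common base gets its own loop with a constant divisor while other bases share a general loop.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the digits of v in the given base (2..36) into buf, NUL-terminated.
   buf must hold at least 65 bytes. Returns the number of digits written. */
size_t utoa_base(unsigned long long v, unsigned base, char *buf)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    char tmp[64];
    size_t n = 0, i;

    assert(base >= 2 && base <= 36);
    if (base == 10) {
        /* Special case: the divisor is a compile-time constant, which
           compilers can typically turn into cheaper instructions. */
        do { tmp[n++] = (char)('0' + v % 10); v /= 10; } while (v);
    } else {
        /* General case: divisor known only at run time, digit via table. */
        do { tmp[n++] = digits[v % base]; v /= base; } while (v);
    }
    for (i = 0; i < n; i++)        /* digits were produced least significant first */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}
```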

External efficiency: This means that the library interface is designed so that critical resources managed by the library can be efficiently accessed by applications. An example of this is the sfreserve() function of the sfio library, which allows an application to directly and safely access the internal buffer of an I/O stream. For applications accessing large chunks of data, this can dramatically reduce the number of memory copying operations between stream and application buffers while still minimizing system calls. We have rewritten many system commands, such as pack and wc, based on sfreserve(), with performance improvements of up to a factor of four over the BSD4.3 versions of the same commands.
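The copy-avoidance idea can be sketched with a toy in-memory stream. This is not the sfio API: Stream, stream_reserve(), and count_lines() are hypothetical, and a real stream would refill its buffer from a file rather than hold all data up front.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory stream; a real stream would refill from a file. */
typedef struct Stream {
    const char *data;   /* internal buffer */
    size_t size;        /* bytes in the buffer */
    size_t next;        /* read position */
} Stream;

/* Returns a pointer into the stream's own buffer covering up to `want`
   bytes and advances the read position; *got receives the actual count.
   Returns NULL when no data is left. No bytes are copied. */
const char *stream_reserve(Stream *s, size_t want, size_t *got)
{
    size_t avail;

    assert(s != NULL && got != NULL);
    avail = s->size - s->next;
    if (avail == 0) {
        *got = 0;
        return NULL;
    }
    *got = want < avail ? want : avail;
    s->next += *got;
    return s->data + s->next - *got;
}

/* Line count in the style of wc -l, scanning each reserved region in place. */
size_t count_lines(Stream *s)
{
    size_t lines = 0, got, i;
    const char *p;

    while ((p = stream_reserve(s, 4096, &got)) != NULL)
        for (i = 0; i < got; i++)
            if (p[i] == '\n')
                lines++;
    return lines;
}
```

The application never supplies a buffer of its own, so there is no copy from the stream buffer into application memory.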

2.2.1.5 Robustness

A successful reusable component should be robust with respect to stresses on critical resources. There are two aspects of robustness: internal and external.

Internal robustness: This means that the library components should be well tested in a variety of environments, that their implementation does not impose any artificial constraints on resources, and that they can respond well to unexpected events. The ast components are continually tested and used on nearly every UNIX platform. Artificial constraints, such as fixed size arrays, number of bits in an int, etc., are [...] code is written in a style compilable under the K&R C, ANSI C and C++ dialects. This allows the code to be tested with the type checking mechanisms of many C compilers, each with its own strengths and weaknesses. In addition, the code can be used transparently by applications based on different C dialects.

External robustness: This means that the library design should prevent applications from making inherently unsafe usage and provide them with ways to deal with exceptions. An example of inherently unsafe usage is the stdio gets() function, which takes as input a buffer with unspecified size and returns data of unspecified length in the buffer. Since neither buffer size nor data size is known in advance, there is no precaution that either the library or the application can take to prevent buffer overflow. By contrast, the sfio library provides a function sfgetr(), which returns a pointer to a record delineated by some application-defined record separator. The space for the record is internally managed by the library, as only it can know how much space is required.
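The contrast with gets() can be sketched as follows. This is not sfgetr() itself; RecReader and rec_get() are hypothetical names, and the sketch only shows the principle that the library, which sees the data, owns and grows the record buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record reader: the library owns and grows the buffer. */
typedef struct RecReader {
    FILE  *fp;
    char  *buf;   /* library-managed storage, reused across calls */
    size_t cap;   /* current capacity of buf */
} RecReader;

/* Returns the next record with the separator replaced by NUL, or NULL at
   end of input or on allocation failure. The pointer remains valid only
   until the next call. */
char *rec_get(RecReader *r, int sep)
{
    size_t len = 0;
    int c;

    assert(r != NULL && r->fp != NULL);
    while ((c = getc(r->fp)) != EOF) {
        if (len + 1 >= r->cap) {                  /* keep room for the NUL */
            size_t ncap = r->cap ? r->cap * 2 : 64;
            char *nbuf = realloc(r->buf, ncap);
            if (!nbuf)
                return NULL;
            r->buf = nbuf;
            r->cap = ncap;
        }
        if (c == sep) {
            r->buf[len] = '\0';
            return r->buf;
        }
        r->buf[len++] = (char)c;
    }
    if (len == 0)
        return NULL;                              /* nothing left */
    r->buf[len] = '\0';                           /* final record without separator */
    return r->buf;
}
```

The caller never states a buffer size, so there is no size for it to get wrong; the cost is that the returned pointer must be used or copied before the next call.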

An aspect of external robustness related to extensions is to design global data structures so that applications see only what they require. For example, an application based on the sfio library does I/O via stream handles of the type Sfio_t. However, from the application's viewpoint, such a handle contains only the elements necessary to implement fast operations, such as sfputc() or sfgetc(). Other members of the structure are hidden from view. By doing this and by being careful to use memory allocation instead of static arrays, we can guarantee that, even at the binary level, application code will not be affected should the Sfio_t structure need extensions.

2.2.1.6 Modularity

Modularity means that components and functions are sufficiently insulated from one another so that the implementation of one will not severely affect the implementation of another. It also means that the components and functions can be used independently. Modularity is important because it reduces the complexity in interrelations among components. By and large, the ast libraries can be used in arbitrary order. Of course, using some of them may mean that others will be implicitly required, but such requirements are transparent at the application level.

For example, the ast error-handling component uses sfio to format error messages, so using an error-handling routine would implicitly mean using sfio. But this does not mean that any understanding of sfio is required to use the error-handling functions. Within a library, to the extent possible, the functions are designed to be orthogonal. For example, similar to stdio, the sfio package allows an application to set its own buffer for a stream. Unlike stdio, which requires that the buffer be set before any I/O operation is performed, sfio allows arbitrary buffer change. This may seem to be a trivial improvement but for the fact that sfio lets applications create string streams to access memory buffers, and being able to switch such buffers at any time is important.

2.2.1.7 Minimality

Next to having an awkward or inconsistent interface, having too much in the interface is another factor that steepens the learning curve for users. As a general rule, an interface should not be provided unless it does something that cannot be done otherwise without significant loss of efficiency or convenience. For example, unlike stdio, which provides a multitude of convenience functions, such as getchar() and putchar() in addition to general stream manipulation functions, such as getc() and putc(), sfio simply insists that the standard functions sfgetc() and sfputc() be used.

The downside of minimizing the interface is awkward and redundant code at the application level when certain aggregate operations are commonly performed. In such a case, a compromise should be reached. An example is the sfprints() function of sfio that creates a formatted string in some system provided area and returns a pointer to that string. This function avoids the buffer overflow problem that often arises with the sprintf() function of stdio. Here, strictly speaking, an application can create the effect of sfprints() using a combination of a string stream, sfprintf(), sfseek(), and sfreserve(), but this is too awkward to [...]


2.2.1.8 Portability

Given the multitude of hardware and software platforms available today, portability is an absolute requirement for successful software. There are two dimensions to portability: code and data. At the code level, the ast libraries are portable to nearly all known UNIX and UNIX-like platforms (including Windows and Windows NT). This is aided by the iffe probing mechanism and an accompanying coding discipline (see Section 3.2) that allows recording knowledge learned during porting and enables code configuration without user intervention.

At the data level, it is desirable that persistent data (for example, disk files) or data communicated among processes also be portable. That is, such data should be independent of the local hardware representations. This is a hard problem and a complete solution for aggregate data types would require much more cooperation from the languages and compilers than currently available in any flavor of C. However, for primitive types, the problem is more amenable to treatment. Assuming that the order of bits in bytes is the same across hardware platforms, the sfio library provides functions to transparently read and write integers and floating point values.
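For integers, one common way to obtain a representation independent of the host byte order is to build the byte sequence with shifts rather than copying memory. The sketch below uses a fixed four-byte, most-significant-first layout; this is an assumption made for illustration and is not claimed to be the encoding sfio uses. The names put_u32() and get_u32() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes v as 4 bytes, most significant first. Shifts operate on values,
   not on memory layout, so the result is the same on any host. */
void put_u32(unsigned char out[4], uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Decodes 4 bytes written by put_u32(). */
uint32_t get_u32(const unsigned char in[4])
{
    assert(in != NULL);
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```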

2.2.1.9 Evolvability

A successful reusable library will undergo revisions as its design and implementation are stressed by usage or technology advances. When the interface is sufficiently general, certain types of revision can be kept hidden within the package, and the interface can be maintained as is. However, weakness in the design is often not revealed until challenged by new needs; then, the interface must change. Sometimes, this amounts to adding new functions to alter the states of the library.

An example along these lines is the method idea discussed in Section 2.3, which allows customization of the abstract interface by selecting a new method. In other cases where new, clean, and well-designed interfaces provide much more benefit than previous ones, compatibility must be broken. Then, it is important to help users ease the transition. An example is the stdio source and binary compatibility packages provided with sfio. These packages allow applications based on stdio to either recompile or simply link with sfio transparently. This means that a software project can take advantage of new technologies immediately without too much upheaval in its programming practice.

2.2.1.10 Naming Conventions

Good interface conventions help to ease the learning curve of a software package and reduce name clashing when different packages are used together in a single application. As libraries are developed by different people at different times, it is hard to achieve a uniform set of conventions.

Sometimes the interface is already defined by earlier packages (for example, the screen library in Section 2.2.4), so new conventions cannot apply. By and large, the naming conventions followed in ast are:

Standard prefixes: Constants, functions, and variables used in a package are always named using a small and unique set of prefixes that clearly identify the package, including the name of the package. For example, the prefixes SF, Sf, and sf are used for the sfio package.

Standard argument ordering: Functions typically manipulate some structures that carry states across calls. Such state-carrying structures always come first in an argument list. For example, in all sfio calls, the stream argument is always the first. Sometimes arguments come in pairs (for example, a buffer and its size). Then, the one containing data or used to store data comes first (for example, the buffer comes before its size). Finally, flag arguments are always last in the list.

Object identification: A library typically defines and uses many different objects. It is helpful to use naming conventions that distinguish different object types. Preprocessor symbols or macros (for example, SF_READ) are defined using uppercase letters. Nonfunctional global symbols (for example, Sfio_t) often start with an uppercase letter. Sfio_t also shows that a library-defined type often has an affixed _t. Function names (for example, sfopen()) are always in lowercase.

Reducing private global symbols: Global data private to a single library is often placed in a single struct so that only one identifier in the data [...] library are kept in a structure _Sfextern. The leading underscore in _Sfextern further emphasizes that it is a private symbol.

2.2.1.11 Architecture Conventions

Architecture conventions help to fit a library into other families of libraries, simplify the library design, and ease the learning process for new users. Below are some of the conventions used in the ast libraries.

Reusing well-known architecture conventions: Inventing a new library does not necessarily mean inventing new software architecture and conventions. It is often advantageous to follow already familiar conventions. For example, in many libraries, the modus operandi is to create some data structure, manipulate it, and finally destroy it. A good existing convention is practiced by the UNIX file-manipulation system calls: open(), read(), write(), lseek(), and close(). Here, open() creates a file descriptor, a data structure that carries states across system calls, and close() destroys this data structure. The file descriptor, that is, the object being manipulated, is always the first argument to other calls, such as read() or write(), that require it. This is one of the architecture conventions employed in ast.

Saving and restoring states: C and its sibling languages are stack-like in their function-call convention. Certain data structures in the libraries are shared across function calls, so it is good to architect the library functions so that state information can be saved and restored seamlessly. A good convention for functions that alter states is to always return the previous state. This allows an arbitrary function to call a library function to perform some work, then restore the data structures to their previous state before returning. For example, the function sfset() of sfio, used to set the flags controlling a stream, always returns the previous set of flags.
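The return-the-previous-state convention can be sketched in a few lines. The code below is not sfset(); the Stream type, the ST_ flag names, and the three functions are hypothetical and exist only to show how a caller saves and restores state around its own work.

```c
#include <assert.h>
#include <stddef.h>

enum { ST_LINE = 1, ST_SHARE = 2 };   /* hypothetical flag bits */

typedef struct Stream { int flags; } Stream;

/* Turns the given flag bits on or off; always returns the previous flag set. */
int stream_set(Stream *s, int flags, int on)
{
    int old;

    assert(s != NULL);
    old = s->flags;
    s->flags = on ? (old | flags) : (old & ~flags);
    return old;
}

/* Restores a complete flag set previously returned by stream_set(). */
void stream_restore(Stream *s, int saved)
{
    assert(s != NULL);
    s->flags = saved;
}

/* A routine that needs line mode temporarily and leaves the stream exactly
   as it found it. Returns the flags that were in effect during the work. */
int run_line_buffered(Stream *s)
{
    int saved = stream_set(s, ST_LINE, 1);
    int during = s->flags;            /* stand-in for real work on the stream */

    stream_restore(s, saved);
    return during;
}
```

Because the setter hands back the old state, run_line_buffered() needs no separate query function and no knowledge of what its caller had configured.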

Information hiding: A structure publicly advertised by a library needs to reveal only as much of its internals as required by the interface implementation. Revealing too much of the private structure members makes it difficult to improve or extend the library without violating object-code compatibility. So, for example, the Sfio_t structure of sfio reveals only as much of its members as required for implementing fast macro functions, such as sfputc(). Other members are visible only to the sfio functions. This prevents application developers from improper use of information. Further, as the structure is incomplete, certain information, such as its size, is meaningless in computation. On numerous occasions, this has helped to prevent applications from having to recompile when Sfio_t was extended.
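One way to arrange such a partially visible structure is sketched below. This is not the actual Sfio_t layout; Stream, StreamPriv, and the functions are hypothetical, and in a real library StreamPriv would live in a private header that applications never include.

```c
#include <assert.h>
#include <stdlib.h>

/* Public view: only what the fast-path macro needs. */
typedef struct Stream {
    unsigned char *next;   /* next free byte in the buffer */
    unsigned char *endb;   /* one past the end of the buffer */
} Stream;

/* Private view, known only inside the library. Members can be appended
   here without changing the layout applications were compiled against. */
typedef struct StreamPriv {
    Stream pub;            /* must be the first member */
    unsigned char buf[8];
    unsigned long flushes; /* example of a later extension */
} StreamPriv;

/* Streams are always allocated by the library, never by the application. */
Stream *stream_new(void)
{
    StreamPriv *p = calloc(1, sizeof *p);

    if (!p)
        return NULL;
    p->pub.next = p->buf;
    p->pub.endb = p->buf + sizeof p->buf;
    return &p->pub;
}

/* Slow path, taken when the buffer is full. Here the buffer is simply
   reset; a real library would write its contents out first. */
int stream_flush_put(Stream *s, int c)
{
    StreamPriv *p = (StreamPriv *)s;   /* valid because pub is first */

    assert(s != NULL);
    p->flushes++;
    s->next = p->buf;
    return *s->next++ = (unsigned char)c;
}

/* Fast path as a macro that touches only public members. */
#define stream_putc(s, c) \
    ((s)->next < (s)->endb ? (int)(*(s)->next++ = (unsigned char)(c)) \
                           : stream_flush_put((s), (c)))
```

Since applications only ever hold a pointer obtained from stream_new(), sizeof(Stream) says nothing useful about the real object, which is what allows the private part to grow.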

Meaningful use of exceptional values: Separate operations can often be merged into one using certain exceptional values. An example is the sfio call sfstack(base,top) that specifies a base stream and a top stream to be pushed on top of base. I/O operations on the stream stack identified by base are performed on the top stream. Thus, sfstack() is useful for applications that process nested streams, such as the C preprocessor and #include files. Now, an operation required for a stack is the ability to pop the top element. Instead of providing a separate pop function, sfio does this with sfstack(base,NULL). As NULL is often an error value (for example, for malloc()), using it in a meaningful way like this also induces programmers to be more aware and check for it.
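The merging of push and pop into one call can be sketched as follows. This is a simplified stand-in, not sfstack(): here the stack is identified by a separate hypothetical StreamStack handle rather than by a base stream, and stack_op() is an invented name.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Stream {
    const char *name;
    struct Stream *below;   /* next stream down in the stack, or NULL */
} Stream;

/* Hypothetical handle for a stack of streams. */
typedef struct StreamStack { Stream *top; } StreamStack;

/* If s is non-NULL, pushes s and returns it. If s is NULL, pops and
   returns the old top, or NULL if the stack was already empty. */
Stream *stack_op(StreamStack *st, Stream *s)
{
    Stream *old;

    assert(st != NULL);
    if (s) {
        s->below = st->top;
        st->top = s;
        return s;
    }
    old = st->top;
    if (old) {
        st->top = old->below;
        old->below = NULL;
    }
    return old;
}
```

A preprocessor-style caller would push a stream on entering an included file and call stack_op(&st, NULL) at its end of file to resume the including one.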
