Glenn Fowler, David Korn, Stephen North, Herman Rao, and Kiem-Phong Vo
2.3 Disciplines and Methods
2.4.3 Primitives and Implementation
We now consider the interface that libgraph presents to C programmers. Most programmers can write basic graph data structures and functions in a few dozen lines of code. A typical approach is to store nodes in a hash table and give each node an out-edge list. Attributes are hard-wired as fields in the node and edge structs. This approach is simple and usually efficient. The problem is that it does not permit much sharing of code or of graph data files. Usually, if files are considered at all, they are programmed to have fixed-field, line-oriented formats that are not compatible between applications that employ different attribute sets. As argued previously, we need a richer, more flexible model. To support this model, libgraph has about 30 entry point functions. They can be classified into the following groups:
• Create, search for, or delete graphs, subgraphs, nodes, and edges.
• Attach, get, or set attributes.
• Traverse node or edge lists, or subgraph trees.
• Read or write files.
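For contrast, the ad hoc representation described above (nodes in a hash table, one out-edge list per node, attributes hard-wired into the structs) might look like the following minimal sketch. All names here (Graph, Node, Edge, graph_node, graph_edge, NBUCKETS) are illustrative assumptions and are not part of libgraph; mark and weight stand in for whatever attributes a particular application happens to need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64             /* arbitrary table size for this sketch */

struct Node;

typedef struct Edge {
    struct Node *head;          /* destination node */
    double weight;              /* hard-wired edge attribute */
    struct Edge *next;          /* next out-edge of the same tail node */
} Edge;

typedef struct Node {
    char *name;
    int mark;                   /* hard-wired node attribute */
    Edge *out;                  /* out-edge list */
    struct Node *chain;         /* next node in the same hash bucket */
} Node;

typedef struct Graph {
    Node *bucket[NBUCKETS];
} Graph;

static unsigned hash_name(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Find a node by name, creating it if absent. Returns NULL if out of memory. */
Node *graph_node(Graph *g, const char *name) {
    unsigned h = hash_name(name);
    for (Node *n = g->bucket[h]; n; n = n->chain)
        if (strcmp(n->name, name) == 0) return n;
    Node *n = calloc(1, sizeof *n);
    if (!n) return NULL;
    n->name = malloc(strlen(name) + 1);
    if (!n->name) { free(n); return NULL; }
    strcpy(n->name, name);
    n->chain = g->bucket[h];
    g->bucket[h] = n;
    return n;
}

/* Add an edge tail -> head at the front of tail's out-edge list. */
Edge *graph_edge(Node *tail, Node *head, double weight) {
    assert(tail && head);
    Edge *e = malloc(sizeof *e);
    if (!e) return NULL;
    e->head = head;
    e->weight = weight;
    e->next = tail->out;
    tail->out = e;
    return e;
}
```

Any function that needs a different attribute must change the struct definitions, which is the code-sharing problem noted above.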
The basic data structures defined by libgraph are graphs, nodes, and edges. As a client program runs, it can decorate these with attributes. libgraph also supplies a few auxiliary data structures to manage attributes. There are two kinds of attributes. String attributes are name-value pairs with default values, and are intended principally for I/O. For example, when libgraph reads an external file, the string attributes are automatically attached to graphs, nodes, and edges.
The other kind of attributes are runtime records defined in C by application programmers. These attributes allow programs to operate on values, such as weights, counts, and marks, using efficient native representations. In libgraph-1, programmers define one record type shared throughout the entire client program. This proved to be a serious limitation because it impedes the design of layered graph libraries; all functions generally must share the same compile-time definition of attributes.
libgraph-2 allows multiple runtime records. Each record has a header containing a unique name (a string, such as layout_data or union_find_fields), a pointer to the next record in the list, and application-specific fields. These records are kept in a circularly linked list attached to a node, edge, or graph. Thus, a function can find its runtime data by searching for the record with a given name. It is up to application programmers to manage this name space sensibly. Because it is clearly undesirable to search this list on every data reference, libgraph-2 has an optional move-to-front search on this list, with hard and soft lock requests, making frequently referenced data available in one pointer reference within a set of compatible functions.
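The record list can be pictured with a small sketch. The names Rec, Obj, rec_attach, rec_find, and LayoutData are hypothetical and do not reproduce libgraph's actual declarations. Two simplifying assumptions are made: move-to-front is modeled by re-aiming the owner's pointer at the matching record in the circular list, and a single locked flag stands in for the hard and soft lock requests, whose exact semantics are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Header that every runtime record starts with (hypothetical layout). */
typedef struct Rec {
    const char *name;       /* unique record name, e.g. "layout_data" */
    struct Rec *next;       /* next record in the circular list */
} Rec;

/* Stand-in for the part of a graph, node, or edge that anchors its records. */
typedef struct Obj {
    Rec *data;              /* record currently at the front, or NULL */
    int locked;             /* nonzero: the front record must not change */
} Obj;

/* Attach a record; it becomes the new front unless the object is locked. */
void rec_attach(Obj *o, Rec *r, const char *name) {
    assert(o && r && name);
    r->name = name;
    if (!o->data) {
        r->next = r;                /* a one-element circle */
        o->data = r;
    } else {
        r->next = o->data->next;    /* splice in after the current front */
        o->data->next = r;
        if (!o->locked) o->data = r;
    }
}

/* Find the record called name. With move_to_front set and the object
   unlocked, the front pointer is re-aimed at the match, so the next
   lookup of the same name succeeds on the first comparison. */
Rec *rec_find(Obj *o, const char *name, int move_to_front) {
    Rec *first = o->data, *r = first;
    if (!first) return NULL;
    do {
        if (strcmp(r->name, name) == 0) {
            if (move_to_front && !o->locked) o->data = r;
            return r;
        }
        r = r->next;
    } while (r != first);
    return NULL;
}

/* An application record: the header comes first, so a Rec pointer can be
   cast back to the full record. */
typedef struct LayoutData {
    Rec hdr;
    double x, y;
} LayoutData;
```

Once a family of cooperating functions has brought its record to the front and locked it, each of them can reach its fields with a cast of the front pointer, for example ((LayoutData *)node.data)->x, without searching.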
Any desired conversion between runtime and string values must be written explicitly by application programmers. Usually, conversion functions are called immediately after reading or before writing graphs. For example, a program may convert a numeric weight in a runtime record to a string attribute for printing in a graph file.
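A pair of conversion functions for the weight example might look like the sketch below. The function names are hypothetical, and the step that stores the resulting string as an attribute (or fetches it after reading) is omitted because it depends on libgraph calls not shown in this section; only standard C library functions are used.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Runtime -> string: format a weight before the graph is written.
   Returns 0 on success, -1 if buf is too small to hold the result. */
int weight_to_string(double w, char *buf, size_t n) {
    int len = snprintf(buf, n, "%g", w);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}

/* String -> runtime: parse a weight after the graph is read.
   Returns 0 on success, -1 if s is not a complete, in-range number. */
int weight_from_string(const char *s, double *w) {
    char *end;
    assert(s && w);
    errno = 0;
    double v = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE) return -1;
    *w = v;
    return 0;
}
```

Rejecting malformed strings at read time keeps bad input from silently becoming a zero weight inside an algorithm.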
libgraph's node and edge sets are stored in libdict splay tree dictionaries. An advantage of splay trees over hash tables is the support for ordered sets. If nodes and edges are labeled and stored in their input sequence order, filters may process graphs without scrambling their contents, as would be the case if only hashing were employed. Further, some algorithms seem more predictable to users if they process nodes or edges in a known sequence, not some seemingly unpredictable order. libdict also allows changing dictionary ordering functions, and we take advantage of this to order nodes and edges by external keys sometimes, and by internal number at other times. User-defined ordering functions are also permitted. Though we have not exploited this much, it may be useful in coding geometric algorithms.
While libdict provides many convenient features, naive use for node and edge sets as we just described would incur significant overhead. For example, calling Dtnext to move from one item to the next in a splay tree dictionary set involves a function call and possibly several comparisons and pointer operations for tree rotation. Ideally, to compete with the hash table/linked list representation of graphs, we would like the cost of moving from one element to the next in a set to be just a few machine instructions or perhaps even just one. This cost is critical when coding an algorithm whose inner loop involves scanning edge lists.
Our solution is to give libgraph-2 a function to temporarily linearize or flatten node and edge sets. This makes the splay tree look like a linked list: the root node or list head is set to the smallest element, and the left and right tree pointers are prev and next in the list. libgraph also defines macros or inline functions to traverse these lists very quickly. This means that a graph is either in edit mode (having efficient random access), or traversal mode (having efficient sequential access). A boolean flag in each graph records its mode; this flag is tested by random-access operations to trap related errors as early as possible.
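Flattening can be sketched on a plain binary search tree. This is an illustration of the idea under stated assumptions, not libdict's code: the names Link, Set, Item, set_flatten, SET_FIRST, LINK_NEXT, and set_can_search are hypothetical, and the tree is relinked in place by right rotations, which needs neither recursion nor extra memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Link {
    struct Link *left, *right;  /* tree children, or prev/next once flattened */
} Link;

typedef struct Set {
    Link *root;                 /* tree root, or list head once flattened */
    int flat;                   /* 0 = edit mode, 1 = traversal mode */
} Set;

/* Example member: the link comes first, so a Link pointer casts to Item. */
typedef struct Item {
    Link link;
    int key;
} Item;

/* Relink the tree in place as a doubly linked list in ascending order. */
void set_flatten(Set *s) {
    assert(s && !s->flat);
    Link *head = NULL, *tail = NULL, *t = s->root;
    while (t) {
        if (t->left) {
            /* Rotate right: the left child becomes the subtree root. */
            Link *l = t->left;
            t->left = l->right;
            l->right = t;
            t = l;
        } else {
            /* t is the smallest remaining element: append it to the list. */
            t->left = tail;
            if (tail) tail->right = t; else head = t;
            tail = t;
            t = t->right;
        }
    }
    s->root = head;
    s->flat = 1;
}

/* Traversal-mode iteration: one pointer dereference per step. */
#define SET_FIRST(s)  ((s)->root)
#define LINK_NEXT(p)  ((p)->right)

/* Random-access operations would test the mode flag before searching. */
int set_can_search(const Set *s) { return !s->flat; }
```

An inner loop then reads for (p = SET_FIRST(&s); p; p = LINK_NEXT(p)), which compiles to a load and a test per element. Restoring edit mode, which this sketch omits, only requires that the list again be treated as a (degenerate) search tree.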
Naive use of libdict also costs memory in the form of container objects for dictionary members. Each container object has left and right tree pointers and a user object pointer. libgraph eliminates these containers by using the libdict option of embedding the headers in user objects (in this case, the graphs, nodes, and edges). A slight complication is that an edge needs to belong to two dictionaries (both in- and out-edge sets). Accordingly, libgraph creates two structs for every edge; each has a pointer to one endpoint node and to its partner edge. Each node and edge struct consumes 7 words (plus additional storage for node names and any attributes that are attached).
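The embedded-header layout and the paired edge structs can be sketched as follows. The declarations are hypothetical and make no attempt to match libgraph's real field layout or its 7-word size. In particular, which endpoint each half stores is an assumption: here the half kept in the tail's out-edge set points at the head, and the half kept in the head's in-edge set points at the tail. Insertion into the dictionaries themselves is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Dictionary header embedded directly in each member object. */
typedef struct Link {
    struct Link *left, *right;
} Link;

struct Node;

/* One half of an edge; each half sits in exactly one edge dictionary. */
typedef struct Edge {
    Link link;              /* embedded header, no separate container */
    struct Node *node;      /* one endpoint (assumed: the far one) */
    struct Edge *partner;   /* the other half of the same edge */
} Edge;

typedef struct Node {
    Link link;              /* embedded header for the node dictionary */
    const char *name;
} Node;

/* Allocate both halves in one block and cross-link them.
   Returns the out-edge half, or NULL if out of memory. */
Edge *edge_new(Node *tail, Node *head) {
    assert(tail && head);
    Edge *pair = calloc(2, sizeof *pair);
    if (!pair) return NULL;
    Edge *out = &pair[0], *in = &pair[1];
    out->node = head;  out->partner = in;
    in->node = tail;   in->partner = out;
    return out;
}

/* Both endpoints are reachable from the out-edge half. */
Node *edge_head(Edge *out) { return out->node; }
Node *edge_tail(Edge *out) { return out->partner->node; }

/* Recover the enclosing Edge from a pointer to its embedded header. */
#define EDGE_OF(l) ((Edge *)((char *)(l) - offsetof(Edge, link)))
```

Because the header lives inside the object, a dictionary that hands back a Link pointer costs no extra allocation per member; EDGE_OF converts it to the edge half with one subtraction.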