
5. Implementing GP 2

5.6. Data Structures

5.6.1. Host Graphs

A graph structure stores node and edge structures in hand-coded dynamic arrays. The initial array sizes are computed at compile time from the number of nodes and edges in the host graph. For large host graphs, the initial size is the least power of 2 greater than the number of nodes or edges. A minimum size is enforced because a graph program may build a large host graph from a small or empty input, and starting from a very small array would incur repeated resizing. Free lists are used to prevent fragmentation of the arrays. Nodes and edges are uniquely identified by their indices in these arrays. The graph structure also stores the node count, the edge count, and a linked list of root node identifiers for fast access to the root nodes in the host graph.
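The storage scheme above can be illustrated with a minimal sketch in C. It is not the GP 2 library code: the names NodeArray, initialSize, addNode and removeNode, the value of MIN_SIZE, and the choice to double the array when it is full are all assumptions made for illustration, and only the node array is shown (labels, edges and the root node list are omitted).

```c
#include <assert.h>
#include <stdlib.h>

#define MIN_SIZE 128   /* hypothetical minimum capacity; the real value is not stated */

/* Least power of 2 greater than n, but never below MIN_SIZE. */
static int initialSize(int n)
{
    int size = MIN_SIZE;
    while (size <= n) size *= 2;
    return size;
}

typedef struct Node { int index; int in_use; } Node;   /* label and edges omitted */

typedef struct NodeArray {
    Node *items;
    int capacity;      /* allocated slots */
    int next;          /* first slot that has never been used */
    int *free_slots;   /* free list: stack of indices released by removals */
    int free_count;
    int count;         /* number of live nodes */
} NodeArray;

static NodeArray makeNodeArray(int expected_nodes)
{
    NodeArray a;
    a.capacity = initialSize(expected_nodes);
    a.items = malloc(a.capacity * sizeof(Node));
    a.free_slots = malloc(a.capacity * sizeof(int));
    assert(a.items != NULL && a.free_slots != NULL);
    a.next = a.free_count = a.count = 0;
    return a;
}

/* Adds a node and returns its index, which serves as its identifier. */
static int addNode(NodeArray *a)
{
    int index;
    if (a->free_count > 0) {
        index = a->free_slots[--a->free_count];   /* reuse a hole first */
    } else {
        if (a->next == a->capacity) {             /* array full: double it (assumed policy) */
            a->capacity *= 2;
            a->items = realloc(a->items, a->capacity * sizeof(Node));
            a->free_slots = realloc(a->free_slots, a->capacity * sizeof(int));
            assert(a->items != NULL && a->free_slots != NULL);
        }
        index = a->next++;
    }
    a->items[index].index = index;
    a->items[index].in_use = 1;
    a->count++;
    return index;
}

/* Removes a node; its slot goes on the free list instead of shifting the array. */
static void removeNode(NodeArray *a, int index)
{
    assert(index >= 0 && index < a->next && a->items[index].in_use);
    a->items[index].in_use = 0;
    a->free_slots[a->free_count++] = index;
    a->count--;
}
```

Because a removed slot is reused by the next insertion, the identifiers of the remaining nodes never change and the array does not accumulate holes.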

A node structure contains the node’s identifier, a root flag, a matched flag, its label, its degrees, and references to its inedges and outedges. Each node structure contains four integers for storing two inedges and two outedges. Additional incident edges are placed in a dynamic array. These arrays are not supported by free lists. The motivation behind this choice of incident edge storage is to limit memory allocation overhead for host graph construction and modification: many common graph classes such as grids, binary trees, and cycles consist mainly of nodes with a small number of outgoing or incoming edges. While this increases the base size of node structures, it is not especially wasteful because in practice, host graphs contain very few isolated nodes. An edge structure contains the edge’s identifier, its label, the identifiers of its source and target, and a matched flag. The matched flag of nodes and edges, initially false, is set during matching when a host graph item is paired with a rule graph item. It is used to check if candidate host items have already been matched.
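As an illustration of this incident edge storage, the following sketch keeps the first two edges in each direction inside the node and allocates an overflow array only for the third edge onwards. The type EdgeSlots, the function names, the use of -1 for an empty slot and the growth policy are assumptions for this example and are not taken from the GP 2 sources; the label is omitted.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct EdgeSlots {
    int inline_edges[2];   /* first two incident edges, stored in the node itself */
    int *extra;            /* additional edges, allocated only when needed */
    int extra_capacity;
    int count;             /* number of edges in this direction (the degree) */
} EdgeSlots;

typedef struct Node {
    int index;
    int root;              /* root flag */
    int matched;           /* matched flag, initially false */
    EdgeSlots out, in;     /* 2 + 2 inline integers in total */
} Node;

static void initNode(Node *n, int index)
{
    EdgeSlots empty = { { -1, -1 }, NULL, 0, 0 };
    n->index = index;
    n->root = 0;
    n->matched = 0;
    n->out = empty;
    n->in = empty;
}

/* Appends an edge identifier; no heap allocation happens for the first two. */
static void addEdge(EdgeSlots *s, int edge)
{
    if (s->count < 2) {
        s->inline_edges[s->count] = edge;
    } else {
        int used = s->count - 2;
        if (used == s->extra_capacity) {
            s->extra_capacity = s->extra_capacity == 0 ? 4 : s->extra_capacity * 2;
            s->extra = realloc(s->extra, s->extra_capacity * sizeof(int));
            assert(s->extra != NULL);
        }
        s->extra[used] = edge;
    }
    s->count++;
}

/* Returns the i-th incident edge, whether it is stored inline or in the overflow array. */
static int getEdge(const EdgeSlots *s, int i)
{
    assert(i >= 0 && i < s->count);
    return i < 2 ? s->inline_edges[i] : s->extra[i - 2];
}
```

For a node in a grid, binary tree or cycle, addEdge rarely reaches the realloc branch, which is the allocation overhead the design aims to avoid.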

    typedef struct HostLabel {
       MarkType mark;
       int length;
       struct HostList *list;
    } HostLabel;

Figure 5.6.: C data structure for host graph labels.

    typedef struct HostAtom {
       char type;
       union {
          int num;
          string str;
       };
    } HostAtom;

Figure 5.7.: C data structure for host graph labels and atoms. A HostList is a doubly-linked list.

Although the data structure is optimised in some respects, there is nothing that is tailored towards querying host graphs for matching information beyond the bare minimum. GP 1’s graph data structure supported complex queries by, for example, storing lists of nodes and edges by label. One could query host graphs to return a list of edges with a specific label outgoing from a specific node. As a consequence, host graph updating becomes slower, but this is significantly outweighed by the reduction in search time for matching rules. The underlying philosophy is that in graph transformation, a graph is queried more often than it is updated. The current graph data structure could be improved by supporting similar querying operations. This is achievable by auxiliary data structures that store nodes and edges by their labels.
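One possible shape for such an auxiliary structure is sketched below, purely as an illustration of the trade-off discussed in Section 5.6.1; it is not part of the GP 2 implementation. Nodes are grouped into buckets by a hypothetical integer "label class" (for instance derived from the mark or the list length), and all names and the constant NUM_CLASSES are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_CLASSES 8   /* hypothetical number of label classes */

typedef struct Bucket { int *ids; int count, capacity; } Bucket;
typedef struct LabelIndex { Bucket buckets[NUM_CLASSES]; } LabelIndex;

static void initIndex(LabelIndex *ix)
{
    int c;
    for (c = 0; c < NUM_CLASSES; c++) {
        ix->buckets[c].ids = NULL;
        ix->buckets[c].count = 0;
        ix->buckets[c].capacity = 0;
    }
}

/* Extra work on every graph update: record the node under its label class. */
static void indexAdd(LabelIndex *ix, int label_class, int id)
{
    Bucket *b;
    assert(label_class >= 0 && label_class < NUM_CLASSES);
    b = &ix->buckets[label_class];
    if (b->count == b->capacity) {
        b->capacity = b->capacity == 0 ? 4 : b->capacity * 2;
        b->ids = realloc(b->ids, b->capacity * sizeof(int));
        assert(b->ids != NULL);
    }
    b->ids[b->count++] = id;
}

/* Removal scans one bucket and fills the hole with the last entry. */
static void indexRemove(LabelIndex *ix, int label_class, int id)
{
    Bucket *b;
    int i;
    assert(label_class >= 0 && label_class < NUM_CLASSES);
    b = &ix->buckets[label_class];
    for (i = 0; i < b->count; i++) {
        if (b->ids[i] == id) {
            b->ids[i] = b->ids[--b->count];
            return;
        }
    }
}
```

A matcher looking for a node of a given class would then iterate over a single bucket instead of the whole node array, at the cost of the bookkeeping in indexAdd and indexRemove on every update.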

5.6.2. Host Graph Labels

The definitions of the data structures for host labels and host atoms can be seen in Figure 5.6 and Figure 5.7. A label structure contains an enumerated type for marks (MarkType), the length of the list, and a HostList. The HostList type is a doubly-linked list of HostAtoms in order to implement the constant time list matching algorithm from the previous chapter. A HostAtom is a union of integers and C strings, equivalent to GP 2’s atom type. Storage of lists at runtime may have an impact on performance when manipulating large labelled host graphs. We describe and empirically evaluate two implementations in Section 5.7.
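To make the relationship between these types concrete, here is a hedged sketch of how a label could be built up as a doubly-linked list of atoms. HostLabel and HostAtom follow Figures 5.6 and 5.7; everything else is an assumption for illustration: the internals of HostList (the HostListItem cells and the first and last pointers), the string alias, the placeholder MarkType values, and the helper functions.

```c
#include <assert.h>
#include <stdlib.h>

typedef char *string;                         /* assumed alias for C strings */
typedef enum { NONE, RED, GREEN } MarkType;   /* placeholder mark values */

typedef struct HostAtom {
    char type;                                /* 'i' for integer, 's' for string */
    union { int num; string str; };
} HostAtom;

/* Assumed internals of HostList: cells linked in both directions. */
typedef struct HostListItem {
    HostAtom atom;
    struct HostListItem *prev, *next;
} HostListItem;

typedef struct HostList { HostListItem *first, *last; } HostList;

typedef struct HostLabel {
    MarkType mark;
    int length;
    struct HostList *list;
} HostLabel;

static HostAtom intAtom(int n)       { HostAtom a; a.type = 'i'; a.num = n; return a; }
static HostAtom stringAtom(string s) { HostAtom a; a.type = 's'; a.str = s; return a; }

static HostLabel makeEmptyLabel(MarkType mark)
{
    HostLabel label;
    label.mark = mark;
    label.length = 0;
    label.list = malloc(sizeof(HostList));
    assert(label.list != NULL);
    label.list->first = label.list->last = NULL;
    return label;
}

/* Appends an atom at the tail and keeps the stored length in step. */
static void appendAtom(HostLabel *label, HostAtom atom)
{
    HostListItem *item = malloc(sizeof(HostListItem));
    assert(item != NULL);
    item->atom = atom;
    item->next = NULL;
    item->prev = label->list->last;
    if (label->list->last != NULL) label->list->last->next = item;
    else label->list->first = item;
    label->list->last = item;
    label->length++;
}
```

With pointers to both ends and links in both directions, a matcher can consume atoms from the front and from the back of the list, and the stored length is available without traversal.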

5.6.3. Morphisms

The morphism data structure must capture not only the node-to-node and edge-to-edge functions that define a graph morphism (see Definition 3), but also the assignment mapping variables to their values. The data structure used to represent morphisms therefore contains four substructures: (1) an array of host node identifiers, (2) an array of host edge identifiers, (3) an array of assignments, and (4) a stack of variable identifiers. The first three items correspond to the two mapping functions and the assignment. Each entry of the assignment array is a pair of a type character ((n)o value, (i)nteger, (s)tring, (l)ist) and a value. The purpose of the stack will be explained shortly.

    static unsigned hashHostList(HostAtom *list, int length)
    {
       unsigned hash = 0;
       int index;
       for(index = 0; index < length; index++)
       {
          HostAtom atom = list[index];
          int value = atom.type == 'i' ? atom.num : hashString(atom.str);
          hash = ((hash << 5) + hash) + value;
       }
       return hash % LIST_TABLE_SIZE;
    }

Figure 5.8.: GP 2’s list hashing function

The library defines three functions to add variable-value assignments. One of these is addIntegerAssignment, which takes a morphism, an integer identifier i and an integer k. It adds the assignment i → k if it is compatible with the existing assignment. This is checked by inspecting the ith index of the assignment array. If no assignment to i exists, signified by the type ‘n’, then the function updates the morphism by setting the array entry to (‘i’, k) and returning 1. Otherwise, the morphism contains some assignment i → k′. It returns 0 if k = k′ and -1 if k ≠ k′.
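The return-value convention of addIntegerAssignment described above can be sketched as follows. This is a simplified reconstruction rather than the library's code: the Morphism type here holds only the assignment array, values are restricted to integers, MAX_VARIABLES is a hypothetical bound, and returning -1 when the existing value is not an integer is an assumption.

```c
#include <assert.h>

#define MAX_VARIABLES 16   /* hypothetical bound on variables per rule */

typedef struct Assignment {
    char type;   /* 'n' no value, 'i' integer ('s' and 'l' are not sketched) */
    int num;
} Assignment;

typedef struct Morphism {
    int variables;
    Assignment assignment[MAX_VARIABLES];
} Morphism;

static void initMorphism(Morphism *m, int variables)
{
    int i;
    assert(variables >= 0 && variables <= MAX_VARIABLES);
    m->variables = variables;
    for (i = 0; i < variables; i++) {
        m->assignment[i].type = 'n';
        m->assignment[i].num = 0;
    }
}

/* Returns 1 if i -> k was newly added, 0 if i is already bound to k,
 * and -1 if i is bound to a different value. */
static int addIntegerAssignment(Morphism *m, int i, int k)
{
    assert(i >= 0 && i < m->variables);
    if (m->assignment[i].type == 'n') {
        m->assignment[i].type = 'i';
        m->assignment[i].num = k;
        return 1;
    }
    if (m->assignment[i].type == 'i' && m->assignment[i].num == k) return 0;
    return -1;   /* conflicting value (or, by assumption, a non-integer value) */
}
```

Distinguishing 1 from 0 lets the caller know whether a new binding was actually created, which matters when bindings later have to be undone.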

Some care is required to properly manage the assignments. The order of variable indices in the assignment array is determined by the order of variable declarations in the rule. There is no guarantee that the variables encountered at runtime follow this order. This causes a complication during backtracking: if a rule graph item fails to match, only the variables in the label of that item should be removed from the assignment. Variables assigned in the matching of previous items should remain untouched. The stack is used to record assignment indices in the order in which the variables are assigned values. To support this, each node or edge array entry in the morphism contains the number of variables associated with that rule node or rule edge. In this way, backtracking a step is achieved by examining the number of variables associated with the current item, popping that number of items from the stack, and nulling each corresponding assignment entry.
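The backtracking step can be illustrated with the sketch below, which covers rule nodes only and integer values only. The type and function names (Map, bindInteger, matchNode, backtrackNode), the fixed bounds, and the use of -1 for "unmatched" are assumptions for this example and may differ from the actual GP 2 library.

```c
#include <assert.h>

#define MAX_ITEMS 8       /* hypothetical bound on rule nodes */
#define MAX_VARIABLES 8   /* hypothetical bound on variables */

typedef struct Map {
    int host_index;    /* matched host node, or -1 if unmatched */
    int assignments;   /* number of variables bound while matching this item */
} Map;

typedef struct Assignment { char type; int num; } Assignment;

typedef struct Morphism {
    Map node_map[MAX_ITEMS];
    Assignment assignment[MAX_VARIABLES];
    int stack[MAX_VARIABLES];   /* variable indices in the order they were bound */
    int stack_top;
} Morphism;

static void initMorphism(Morphism *m)
{
    int i;
    for (i = 0; i < MAX_ITEMS; i++) {
        m->node_map[i].host_index = -1;
        m->node_map[i].assignments = 0;
    }
    for (i = 0; i < MAX_VARIABLES; i++) {
        m->assignment[i].type = 'n';
        m->assignment[i].num = 0;
    }
    m->stack_top = 0;
}

static void matchNode(Morphism *m, int rule_node, int host_node)
{
    m->node_map[rule_node].host_index = host_node;
}

/* Binds a fresh variable while matching rule_node and records it on the stack. */
static void bindInteger(Morphism *m, int rule_node, int variable, int k)
{
    assert(m->assignment[variable].type == 'n' && m->stack_top < MAX_VARIABLES);
    m->assignment[variable].type = 'i';
    m->assignment[variable].num = k;
    m->stack[m->stack_top++] = variable;
    m->node_map[rule_node].assignments++;
}

/* Undoes the match of rule_node: pops exactly the variables it bound. */
static void backtrackNode(Morphism *m, int rule_node)
{
    int n = m->node_map[rule_node].assignments;
    while (n-- > 0) {
        int variable = m->stack[--m->stack_top];
        m->assignment[variable].type = 'n';
    }
    m->node_map[rule_node].host_index = -1;
    m->node_map[rule_node].assignments = 0;
}
```

For example, if rule node 0 binds variable 2 and rule node 1 then binds variables 1 and 0, backtracking rule node 1 pops two stack entries and clears variables 0 and 1, while the binding of variable 2 made for rule node 0 is left untouched even though it has the highest index in the assignment array.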

In document GP 2: Efficient Implementation of a Graph Programming Language (Page 77-79)