
6. Case Studies in Rooted Graph Programs

6.2. Graph Traversing

6.2.1. Depth-first Search

A depth-first search (DFS) of a graph starts by visiting an arbitrary node. Each step of the search visits the target of an unexplored outgoing edge from the most recently visited node v. If no such edge exists, then v is finished, and the outedges of the next most recently visited node are examined. This process repeats until all outedges of all visited nodes have been explored. If unvisited nodes remain, the search continues from an arbitrary unvisited node. The search terminates when all nodes in the graph have been visited.

For a graph G, a preordering is a list of the nodes in G, where v occurs before w if v is visited before w during a DFS. A postordering is a list of the nodes in G, where v occurs before w if v is finished before w during a DFS. A reverse postordering is the reverse of a postordering, which is in general not equal to the preordering given by the same search. An important property of a reverse postordering is that, when G is acyclic, it is a topological ordering of the nodes of G [THCRS09], a property which we shall use in Section 6.4.

These concepts are illustrated in Figure 6.1, the DFS of a square graph. Colours represent the state of each node: unvisited nodes are white, visited nodes are grey, and finished nodes are black. Each node is labelled with pre/post, where pre and post are the positions of the node in the graph’s preordering and postordering respectively. Explored edges are dashed. We refer to the nodes by their position in the grid. For example, the top left node is TL, which is the first node visited by the algorithm. Then the DFS explores the edges in the order TL → TR, TR → BR, TL → BL, BL → BR. This search order produces the preorder TL, TR, BR, BL, the postorder BR, TR, BL, TL, and the reverse postorder TL, BL, TR, BR. Note that the reverse postorder is different from the preorder: BL was the last node to be visited, but it was not the last node to be fully explored. We note that DFS may also be used to categorise edges, which reveals interesting properties about the graph [THCRS09], but these properties are not relevant to the programs in this chapter.

Main = init; DFS!; label source
DFS = forward!; try back else break

[Rule schemata, drawn as graphs in the original; only the node labels are reproduced here.]
init(): a blank node ⇒ the same node labelled 1, plus a new counter node labelled 1:0
forward(i,m,n:int): nodes labelled i and blank, counter n:m ⇒ nodes labelled i and n+1, counter n+1:m
label source(i,m,n:int): node labelled i, counter n:m ⇒ node labelled i:m+1, counter n:m+1
back(i,j,m,n:int): nodes labelled i and j, counter n:m ⇒ nodes labelled i:m+1 and j, counter n:m+1

Figure 6.2.: The GP 2 program dfs
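The orderings from the square-graph example can be reproduced with a conventional recursive DFS. The following is a minimal Python sketch, not part of the GP 2 development; the function name `dfs_orders` and the adjacency-list encoding are assumptions made for illustration. Successors are listed in the order in which the example explores them.

```python
def dfs_orders(graph, source):
    """Return (preorder, postorder) for a DFS of `graph` from `source`.

    `graph` maps each node to a list of successors, tried in list order.
    """
    preorder, postorder = [], []
    visited = set()

    def visit(v):
        visited.add(v)
        preorder.append(v)       # v is visited
        for w in graph[v]:
            if w not in visited:
                visit(w)
        postorder.append(v)      # v is finished

    visit(source)
    return preorder, postorder


# Square graph from the example: TL -> TR, TR -> BR, TL -> BL, BL -> BR.
square = {
    "TL": ["TR", "BL"],
    "TR": ["BR"],
    "BL": ["BR"],
    "BR": [],
}

pre, post = dfs_orders(square, "TL")
reverse_post = list(reversed(post))
print(pre)           # ['TL', 'TR', 'BR', 'BL']
print(post)          # ['BR', 'TR', 'BL', 'TL']
print(reverse_post)  # ['TL', 'BL', 'TR', 'BR']
```

Because this graph is acyclic, every edge points from an earlier to a later node in `reverse_post`, which is the topological-ordering property mentioned above.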

Remark 10. In the context of graph-traversing GP 2 programs, we say a host graph node is visited when it first participates in the match of a successful rule application.

Remark 11. We use the term blank throughout this chapter to refer specifically to unmarked nodes and edges labelled with the empty list. We use the term blank graph to refer to a graph whose nodes and edges are blank.

The GP 2 program dfs, shown in Figure 6.2, is a concrete realisation of the DFS algorithm. It performs a directed depth-first search on the blank host graph starting at an arbitrary node v. Nodes not reachable from v are not visited by the search. The output graph is the host graph with two changes:

1. The nodes visited in the DFS are labelled pre : post as illustrated in Figure 6.1. During the computation, visited nodes are only labelled with pre until they are finished, at which point post is appended to the label.

2. An additional root node stores two counts of the number of nodes visited in the DFS (obtained through the preorder and postorder labelling). This illustrates that graph traversal can perform a global computation on host graphs.

The program maintains two root nodes. The grey root node in the host graph is used to navigate the graph in a depth-first manner. The second, unmarked root node, called the counter, is created by the program. It stores the current preorder count and the current postorder count. It assigns its preorder count to the root node after it is moved forward, and its postorder count before it is backtracked. When the program terminates, these counts will both be equal to the number of nodes in the host graph that are reachable from the root node.

The rule init prepares the search by matching an arbitrary host graph node, called the source. The source is made a root and coloured grey, which marks it as visited, and it is labelled with its preorder position (1). The rule also creates the counter.

The procedure DFS is applied as long as possible. Each loop iteration has two steps: it first moves forward along a path of unmarked edges through blank nodes for as long as possible, and then moves back one step once forward movement is no longer possible. Each application of forward moves the root node along the path, greying a blank node and labelling it with the next preorder number from the counter. Explored edges are dashed. Unlike in the previous example, this is not a permanent mark. Instead, it acts as a trail of breadcrumbs to facilitate backtracking once a node is finished. Visited nodes are grey and labelled, so they cannot be matched as the third node in forward’s left-hand side.

At some point forward is no longer applicable, either because the root node has no outgoing edges or because the targets of all of its outgoing edges have been visited. In either case, the root node is finished. The rule back appends the next postorder number from the counter to the root node’s label, moves the root back one step along the path of dashed edges, and undashes the edge it traversed. After a single application of back the next loop iteration starts, which searches for an unexplored outgoing edge of the current root node. In this way, all outedges of visited nodes are explored, and every node reachable from the source is reached.

DFS! terminates when back is no longer applicable, at which point the root node is the node matched by init. The construct try back else break exits the loop when back fails, without reverting the graph to its state before the current loop iteration. Finally, label source appends the postorder count to the source, because the source is never the subject of a back rule application.
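The control flow described above can be mimicked outside GP 2. The following Python sketch is an illustration under stated assumptions, not the GP 2 implementation: the function `run_dfs_program` and its data structures are hypothetical, the source is passed in rather than chosen by init, and rule matching (nondeterministic in GP 2) is resolved by taking the first suitable edge in list order. Grey nodes are modelled as the nodes that already carry a label, and the dashed path is modelled as a stack.

```python
def run_dfs_program(nodes, edges, source):
    """Simulate Main = init; DFS!; label source on a blank graph.

    Returns (labels, counter): `labels` maps each visited node to its
    [pre, post] list, `counter` is the counter node's final [pre, post].
    """
    succ = {v: [] for v in nodes}
    for u, w in edges:
        succ[u].append(w)

    # init: root and grey the source, label it 1, create the counter 1:0.
    labels = {source: [1]}        # labelled nodes are exactly the grey ones
    counter = [1, 0]
    root = source
    trail = []                    # sources of the dashed edges, oldest first

    while True:                   # DFS!
        while True:               # forward!
            target = next((w for w in succ[root] if w not in labels), None)
            if target is None:
                break             # forward is no longer applicable
            counter[0] += 1
            labels[target] = [counter[0]]
            trail.append(root)    # dash the edge root -> target
            root = target
        if not trail:             # try back else break
            break
        counter[1] += 1
        labels[root].append(counter[1])
        root = trail.pop()        # undash the edge, move the root back

    # label source: the source is never the subject of back.
    counter[1] += 1
    labels[root].append(counter[1])
    return labels, counter


square_nodes = ["TL", "TR", "BL", "BR"]
square_edges = [("TL", "TR"), ("TR", "BR"), ("TL", "BL"), ("BL", "BR")]
labels, counter = run_dfs_program(square_nodes, square_edges, "TL")
print(labels)   # {'TL': [1, 4], 'TR': [2, 2], 'BR': [3, 1], 'BL': [4, 3]}
print(counter)  # [4, 4]
```

On the square graph the labels agree with the preorder and postorder positions given earlier, and both counter entries equal the number of nodes reachable from the source.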

Figure 6.3 shows an example run of dfs on the same graph as in the previous example. The top left graph is the state after applying init. The top right graph is the state after two applications of forward. Observe that the nodes are labelled in the order in which they are visited. The bottom right graph is the state after two applications of back and one application of forward. The rightmost nodes have been labelled with their postorder positions, but the top left node has not since the top left node is not yet finished: the search has continued on its second outgoing edge 1 → 4. The bottom left graph is the output graph of the program. The counter is labelled with two integers, both equal to the number of nodes in the host graph.

Remark 12. In the following proofs, we use root node to refer only to the root node that was part of the original host graph, i.e. not the counter node.

Figure 6.3.: Example execution of dfs

Lemma 4. Let G be a blank graph with source v. The following property is an invariant of the loop DFS!: the root node is reachable from v through a path of dashed edges. Every node in this path is grey. There are no other dashed edges.

Proof. The property trivially holds immediately after the application of init. Let w be the root node. Assume there is a path, possibly empty, of dashed edges from v to w consisting of only grey nodes, with no other marked edges in the graph. After an application of forward, this path has been extended by a single marked edge connecting w and one of its outgoing neighbours w′. w′ is the new root node, w′ is grey, and the rule creates no additional marked edges. Therefore the property still holds. A similar argument shows that back also preserves the invariant: it undashes the last edge of the path and makes the source of that edge the new root, so the shortened path still satisfies the property.

Lemma 5. Let G be a blank graph. The program dfs terminates when run on G.

Proof. We only need to prove termination of the loop DFS!; the rest of the program consists of single rule applications. Let > be the following lexicographic ordering on graphs: G > H if G contains more blank nodes than H, or if G and H contain the same number of blank nodes and G contains more dashed edges than H. If forward is applied to G to give H, then G > H because forward marks and labels a blank node. Likewise, if back is applied to G to give H, then G > H because back undashes an edge and does not change the number of blank nodes in the graph. Both counts are natural numbers, so the ordering admits no infinite descending chain, and it follows that DFS! terminates.

Lemma 6. Let G be a blank graph, and let v be the source of the DFS. The program dfs visits all nodes reachable from v.

Proof. We give a proof by contradiction. Assume that there exists a node w reachable from v that is unvisited when the loop DFS! terminates. w is blank because it has not been visited. w is not the root node, since neither forward nor back makes a blank node the root. w is not the target of an edge outgoing from a visited node w′: since w′ is visited, it must have been the root at some stage in the computation. If w were the target of an outgoing edge of w′, then forward would have matched for w′ → w, either when w′ was first made the root or immediately after it was made the root from an application of back. Therefore, the source of any edge whose target is w is an unvisited node. We can inductively extend this argument to conclude that all nodes from which w can be reached are unvisited. However, v is visited by init, contradicting the assumption that w is reachable from v. Therefore dfs visits all nodes reachable from v.
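The termination measure of Lemma 5 and the claim of Lemma 6 can be checked empirically on small graphs. The Python sketch below is an illustration only and proves nothing; the function `check_measure_and_reachability` and its abstract state (blank-node set, dashed path, root) are assumptions introduced here. It replays forward and back, asserts that the pair (number of blank nodes, number of dashed edges) strictly decreases lexicographically at every step, and then compares the visited nodes with the nodes reachable from the source.

```python
def check_measure_and_reachability(succ, source):
    """Replay forward/back on an abstract state and check both claims.

    `succ` maps each node to its list of successors. Returns the list of
    (blank nodes, dashed edges) pairs observed, starting after init.
    """
    blank = set(succ) - {source}          # init greys the source
    dashed = []                           # dashed edges, in path order
    root = source
    measures = [(len(blank), len(dashed))]

    def record():
        m = (len(blank), len(dashed))
        # Python compares tuples lexicographically, matching the ordering >.
        assert m < measures[-1], "measure must strictly decrease"
        measures.append(m)

    while True:                           # DFS!
        while True:                       # forward!
            target = next((w for w in succ[root] if w in blank), None)
            if target is None:
                break
            blank.discard(target)
            dashed.append((root, target))
            root = target
            record()
        if not dashed:                    # try back else break
            break
        root, _ = dashed.pop()            # back undashes the last edge
        record()

    # Lemma 6: the non-blank nodes are exactly those reachable from source.
    reachable, todo = {source}, [source]
    while todo:
        v = todo.pop()
        for w in succ[v]:
            if w not in reachable:
                reachable.add(w)
                todo.append(w)
    assert reachable == set(succ) - blank
    return measures


# A cyclic graph with one node (d) that is not reachable from a.
cyclic = {"a": ["b", "c"], "b": ["c", "a"], "c": ["a"], "d": ["a"]}
trace = check_measure_and_reachability(cyclic, "a")
print(trace)  # [(3, 0), (2, 1), (1, 2), (1, 1), (1, 0)]
```

The first component drops on each forward step even though the second rises, which is why the ordering has to be lexicographic rather than a comparison of either count alone.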

Proposition 3 (Correctness of dfs). Given a blank input graph G, dfs chooses a source node v and labels all nodes w reachable from v with a two-element list, where the first element is the preorder position of w, and the second element is the postorder position of w.

Proof. We refer to the first and second elements of the counter’s list as pre and post respectively. First, init matches a node v and labels it 1, which is clearly v’s preorder position. The rule also sets pre to 1 and post to 0. A node w is visited when it is matched by forward, which labels w with pre + 1 and increments pre. No other rule modifies the value of pre, so the node labelling respects the definition of preorder. The rule forward is applied as long as possible, so an application of back is only attempted when the current root node has no outgoing edge whose target is unvisited, which is precisely when that node is finished. An application of back labels a finished node with post + 1 and increments post. The only other rule that modifies post is label source, which does the same for the source once the loop has terminated, so the node labelling respects the definition of postorder. By Lemma 5, dfs terminates, which guarantees a valid output graph. Finally, Lemma 6 ensures that all nodes reachable from v are visited by dfs.

Proposition 4 (Complexity of dfs). The program dfs runs in linear time on host graphs of bounded degree, and in quadratic time on host graphs of unbounded degree.

Proof. We assume that the host graph is blank because this provides the worst-case complexity. The rule init matches in constant time because all nodes in the host graph are valid matches. All other rules are fast rule schemata. By Theorem 2 (see Section 4.5.1), they are applied in constant time on host graphs of bounded degree and in linear time on host graphs of unbounded degree. The rule label source is applied once, while the rules forward and back are each applied a number of times that is linear in the number of nodes of the graph. It follows that the program runs in linear time on host graphs of bounded degree, and in quadratic time otherwise.
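The bound can be made explicit by counting match attempts. The following sketch is an illustration rather than part of the original proof. It assumes that n nodes are reachable from the source, that c(G) bounds the cost of a single match attempt of forward, back or label source (constant for bounded degree and linear in the size of G otherwise, as given by Theorem 2), and that a failed attempt costs no more than this bound. The symbols n, c(G), c_init and T(G) are introduced here for illustration. The loop DFS! runs for n iterations: n − 1 end with a successful back and one ends with break.

```latex
\begin{align*}
\text{successful applications of forward} &= n - 1, \\
\text{failed attempts of forward} &= n
  \quad \text{(one per iteration of \texttt{DFS!})}, \\
\text{successful applications of back} &= n - 1, \\
\text{failed attempts of back} &= 1, \\
\text{applications of label source} &= 1, \\[4pt]
T(G) &\le c_{\mathrm{init}}
  + \bigl((n-1) + n + (n-1) + 1 + 1\bigr)\, c(G)
  = c_{\mathrm{init}} + 3n \, c(G).
\end{align*}
```

With c(G) = O(1) this gives T(G) = O(n), and with c(G) = O(|G|) it gives T(G) = O(n · |G|), which is quadratic in the size of the graph.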

Figure 6.4.: Illustration of a breadth-first search.

In document GP 2: Efficient Implementation of a Graph Programming Language (Page 117-122)