Breadth-first search - Heterogeneous Multi-core Architectures for High Performance Computing

expressed by the properties of the shortest path, given by the BFS search [124].

A wide literature explores different BFS solutions, based on multicore processors [125,126,124] or heterogeneous architectures [127,128,9,129]. In [124] a sophisticated new data structure is presented that reduces cache coherence traffic between CPUs. This implementation outperforms the previous ones, including those on other architectures such as Cell processors [128], clusters [125] and shared memory supercomputers [130]. A simple and faster implementation of BFS on multicore CPUs has been proposed in [9]. The same paper also proposes a new hybrid CPU-GPU method which selects the best execution method between the sequential and the parallel approach, depending on the graph scale.

Unfortunately, traditional software and hardware parallel implementations do not necessarily work well for large scale graphs, due to the graph properties [131]. For instance, many graphs are unstructured and highly irregular, and they require fine-grained memory accesses to be explored. These characteristics lead to suboptimal performance in cache-based microprocessors because of the poor spatial and temporal locality of memory accesses. Moreover, since the BFS algorithm involves virtually no computation, the execution is dominated by memory latency. On reconfigurable architectures, too, graphs with unstructured and irregular memory accesses cannot achieve high performance: the low memory bandwidth generates many pipeline stalls, so the FPGA yields little acceleration.

In this chapter a novel idea for an efficient graph implementation on reconfigurable architectures is presented.

4.2 Breadth-first search

Given a graph G(V,E) composed of a set of vertices V, a set of edges E and a source s in V, the BFS algorithm explores the edges of G to discover all the vertices reachable from s. It computes the distance from s to each reachable vertex, measured as the smallest number of edges, and it produces a breadth-first tree rooted at s. Vertices are visited in levels: when a vertex is visited at level l, it is also said to be at distance l from the root.

the path containing the smallest number of edges. The name breadth-first search comes from the fact that the algorithm expands the frontier between discovered and undiscovered vertices uniformly across the breadth of the frontier. This means that the algorithm discovers all vertices at distance k from s before discovering vertices at distance k+1.


Figure 4.1 illustrates how BFS works on a sample graph. To keep track of progress, vertices in the figure are colored white, gray and black. All vertices are initially white, except for the source vertex, which is gray. When a vertex is encountered during the search, it becomes non-white. Therefore, gray and black vertices have already been discovered. However, BFS distinguishes between them to ensure that the search proceeds in a breadth-first manner. Suppose, for example, that vertices w and t are connected by an edge ((w, t) ∈ E) and vertex w is black; then vertex t can be either gray or black (see also Figure 4.1c and e). Gray vertices may have some adjacent white vertices; they represent the frontier between discovered and undiscovered vertices (see, for example, vertices t and u in Figure 4.1c). The algorithm also builds the breadth-first tree, which initially contains just the source vertex s. Every time a white vertex v is discovered, the vertex v and the corresponding edge are added to the tree. Since a vertex is discovered at most once, it has at most one parent. The sequential breadth-first search algorithm is given in Algorithm 4.1.

The graph representation usually adopted is the "Compressed Sparse Row" (CSR) format. It consists of three vectors:

• V: O(N)-sized, input/output;

• Offset: O(N)-sized, input;

• Adj list: O(M)-sized, input.

The BFS level of each node is computed starting from a source vertex and stored in an O(N)-sized array (V), where N is the number of vertices and M the number of edges. The graph representation merges the successors of all vertices into a single O(M)-sized array (Adj list), with the beginning location of each vertex's adjacency list stored in a separate O(N)-sized array (Offset).

Figure 4.2 shows the CSR representation of the previous example (see also Figure 4.1a). The source vertex is s (i.e. Vs = 0). The offset values in position s (i.e. 4) and in position s − 1 (i.e. vertex r, with value 2) indicate that the successors of vertex s are stored in the adjacency list in positions 2 and 3 (for a generic vertex v, the successors are stored from offset[v − 1] to offset[v] − 1). Then, the vertices vector must be updated according to the vertices stored in the adjacency list (red arrows in Figure 4.2, Vr = 1 and Vw = 1).
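The lookup rule just stated, with successors of v stored from offset[v − 1] to offset[v] − 1, can be illustrated with a short Python sketch. The helper name `successors`, the array contents and the use of −1 as the "not yet reached" level are assumptions for illustration; the data below is a made-up 4-vertex graph in which vertex 0 plays the role of r and vertex 1 the role of s, not a transcription of Figure 4.2.

```python
from typing import List


def successors(offset_end: List[int], adj_list: List[int], v: int) -> List[int]:
    """Successors of v when offset_end[v] is one past v's last slot."""
    # Vertex 0 has no previous entry, so its list starts at position 0.
    start = offset_end[v - 1] if v > 0 else 0
    return adj_list[start:offset_end[v]]


# Hypothetical data: offsets 2 and 4 for vertices 0 and 1, as in the text.
offset_end = [2, 4, 5, 6]
adj_list = [1, 2, 0, 3, 0, 1]

# V array: level of each vertex, -1 meaning "not reached yet" (an assumption).
levels = [-1, 0, -1, -1]  # the source is vertex 1, at level 0

# Visit the successors of the source and assign them level 1.
for n in successors(offset_end, adj_list, 1):
    if levels[n] == -1:
        levels[n] = 1

print(successors(offset_end, adj_list, 1))  # [0, 3]
print(levels)                               # [1, 0, -1, 1]
```

The successors of vertex 1 occupy positions 2 and 3 of the adjacency array, matching the positions quoted in the text for vertex s.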

Algorithm 4.1. Sequential BFS exploration of a graph

Input:
    G(V,E), graph;
    s, source vertex;
    level, exploration level;
    Q, vertices to be explored in the current level;
    Qnext, vertices to be explored in the next level;
    Ev, set of edges connected to v;
    marked, array of booleans: marked_i ∀i ∈ [1 ... |V|]

∀i ∈ [1 ... |V|] : marked_i = false
marked_s = true
level ← 0
Q ← {s}
repeat
    Qnext ← {}
    for all v ∈ Q do
        for all n ∈ Ev do
            if marked_n = false then
                marked_n ← true
                Qnext ← Qnext ∪ {n}
            end if
        end for
    end for
    Q ← Qnext
    level ← level + 1
until Q = {}
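As a rough illustration, the level-by-level exploration of Algorithm 4.1 can be written in Python over CSR arrays. This is a sketch under stated assumptions rather than the implementation discussed in the chapter: the function name `bfs_levels` is hypothetical, the offset array is assumed to carry an extra sentinel entry (N + 1 values), and a level of −1 replaces the separate `marked` array, so that the level of each vertex is recorded at the moment it is discovered.

```python
from typing import List


def bfs_levels(offset: List[int], adj_list: List[int], source: int) -> List[int]:
    """Return the BFS level of every vertex, or -1 if it is unreachable."""
    num_vertices = len(offset) - 1   # assumes a sentinel entry at the end
    levels = [-1] * num_vertices     # -1 doubles as "marked = false"
    levels[source] = 0
    frontier = [source]              # Q in Algorithm 4.1
    level = 0
    while frontier:                  # same as repeat ... until Q = {}
        next_frontier = []           # Qnext in Algorithm 4.1
        for v in frontier:
            for n in adj_list[offset[v]:offset[v + 1]]:
                if levels[n] == -1:  # first time n is discovered
                    levels[n] = level + 1
                    next_frontier.append(n)
        frontier = next_frontier
        level += 1
    return levels


# Hypothetical 4-vertex graph with edges 0-1, 0-2 and 1-3.
offset = [0, 2, 4, 5, 6]
adj_list = [1, 2, 0, 3, 0, 1]
print(bfs_levels(offset, adj_list, 1))  # [1, 0, 2, 1]
```

Keeping two explicit frontiers mirrors the pseudocode: all vertices at distance k are processed before any vertex at distance k + 1 is expanded.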

Figure 4.2: Compressed sparse row format
