2. Graph Structures
2.6. Small-World Network Models
The Watts-Strogatz small-world model [66], also referred to as the β-model [83], is a popular method for generating synthetic small-world graphs. The model starts off with a regular ring lattice of n vertices, each connected to its k nearest neighbours (i.e. k/2 neighbours in either direction). Then, a vertex i ∈ V is chosen and the edge (i, i+1) connecting it to its nearest neighbour in a clockwise sense is rewired with probability p. When rewiring an edge, the old edge is deleted and a new edge (i, j) is created instead. The new neighbour j is picked uniformly at random from all vertices j ∈ V with j ∉ A_i and j ≠ i, where A_i is the adjacency list of vertex i (i.e. duplicate edges and self-connections are not allowed). This process is repeated for every vertex going around the ring in clockwise direction, until one lap has been completed. Then the whole process is repeated, considering the edge (i, i+2) to the second-nearest neighbour of every vertex for rewiring, then the third-nearest and so on, until k/2 laps have been completed. At this point every edge has been considered for rewiring exactly once.
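The rewiring procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this thesis: the function names `ring_lattice` and `watts_strogatz`, the adjacency-set representation and the `seed` parameter are all hypothetical choices made for the example.

```python
import random


def ring_lattice(n, k):
    """Adjacency sets of a ring of n vertices, each linked to its k nearest neighbours."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def watts_strogatz(n, k, p, seed=None):
    """Consider every lattice edge once and rewire it with probability p."""
    rng = random.Random(seed)  # assumption: a seeded generator for reproducibility
    adj = ring_lattice(n, k)
    for d in range(1, k // 2 + 1):      # lap d handles the d-th nearest neighbour
        for i in range(n):              # one clockwise lap around the ring
            if rng.random() >= p:
                continue                # edge (i, i+d) is kept
            # Admissible targets: not i itself and not already adjacent to i.
            candidates = [j for j in range(n) if j != i and j not in adj[i]]
            if not candidates:
                continue                # i is already linked to every other vertex
            old = (i + d) % n
            new = rng.choice(candidates)
            adj[i].discard(old)
            adj[old].discard(i)
            adj[i].add(new)
            adj[new].add(i)
    return adj


if __name__ == "__main__":
    graph = watts_strogatz(n=20, k=4, p=0.1, seed=42)
    print(sum(len(a) for a in graph) // 2, "edges")
```

Because each rewiring step deletes one edge and creates one edge, the total number of edges stays at nk/2 for every value of p. The linear scan over all candidates keeps the sketch short; rejection sampling would be the more usual choice for large n.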
The β-model provides a simple way to create a graph structure that ranges anywhere from a “large-world” regular lattice with high clustering C and an average shortest path length L that scales linearly with the system size n, to a nearly random graph with small C and logarithmic length scaling, merely by adjusting the rewiring parameter from p = 0 to p = 1. However, the most interesting graphs are generated for values of p somewhere in between regular lattice and random graph. It turns out that for a large range of p, the model generates small-world networks with C(p) ≫ C_random yet L(p) almost as small as L_random [66]. The reason for this behaviour is that, as mentioned before, the first shortcuts have a highly nonlinear effect on L, drastically reducing the average path length for the entire graph. Larger values of p give diminishing returns for L. The clustering coefficient C, on the other hand, decreases much more slowly, because every rewired edge only has a local effect on the direct neighbourhood of the affected vertices. This behaviour is illustrated in Figure 2.3.
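To make the contrast between the global effect on L and the local effect on C concrete, the following sketch measures both quantities on a small ring lattice before and after adding a single long-range shortcut. It is an illustrative example only: the helper names, the choice of n = 20 and k = 4, and the use of an added (rather than rewired) edge are assumptions made to keep the example short and deterministic.

```python
from collections import deque


def ring_lattice(n, k):
    """Adjacency sets of a ring of n vertices, each linked to its k nearest neighbours."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def clustering_coefficient(adj):
    """Mean over all vertices of (links among neighbours) / (possible links)."""
    total = 0.0
    for nbrs in adj:
        deg = len(nbrs)
        if deg < 2:
            continue  # vertices with fewer than two neighbours contribute zero
        # Each link between two neighbours is seen from both of its ends.
        twice_links = sum(1 for u in nbrs for v in adj[u] if v in nbrs)
        total += twice_links / (deg * (deg - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean breadth-first-search distance over all vertex pairs (connected graphs only)."""
    n = len(adj)
    total = 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("graph is disconnected")
        total += sum(dist)
    return total / (n * (n - 1))


if __name__ == "__main__":
    lattice = ring_lattice(20, 4)
    shortcut = ring_lattice(20, 4)
    shortcut[0].add(10)   # one shortcut across the ring
    shortcut[10].add(0)
    for name, g in (("lattice", lattice), ("one shortcut", shortcut)):
        print(name, clustering_coefficient(g), average_path_length(g))
```

The shortcut changes the clustering of only its two end vertices, whereas it shortens paths between many pairs of vertices on opposite sides of the ring.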
Figure 2.4.: Watts’ small-world α-model with n = 500 vertices, degree k = 10 and α = 2.0. The highly connected clusters of individual nodes are still clearly visible, but enough random shortcuts have been generated to combine all local communities into a fully connected graph. Visualised using the force-based layout algorithm in GraViz3D.
While increasingly large values of p eventually destroy the local community structure of the graph, moving it out of the region of disorder that classifies a small-world graph and turning it into a random graph, Barthélémy and Amaral [84] have shown that the small-world region can be extended to smaller and smaller values of p as long as the network size n is sufficiently large. The appearance of small-world behaviour is a phase transition⁵ that depends on both n and p. They propose the scaling law [84]:
\[
L(n,p) \sim n^{*}\, F_k\!\left(\frac{n}{n^{*}}\right), \tag{2.11}
\]
where F_k depends only on the degree k, with F_k(u ≪ 1) ∼ u and F_k(u ≫ 1) ∼ ln u, and n* is a function of p such that n* = p^(−τ) with τ = 1 for p ≪ 1 [82,85]. Hence, the crossover size from large-world to small-world behaviour is n*. The mean shortest path length grows linearly for n ≪ O(p^(−1)) and logarithmically for large networks with n ≫ O(p^(−1)). This also holds for small-world networks built on lattices of dimension d greater than one [82].
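Substituting the two limiting forms of F_k into Equation (2.11) makes the crossover explicit. The following is a brief worked restatement, assuming τ = 1 so that n* = p^(−1), which holds only for p ≪ 1:

```latex
\begin{align*}
n \ll n^{*}: &\quad L(n,p) \sim n^{*}\,\frac{n}{n^{*}} = n, \\
n \gg n^{*}: &\quad L(n,p) \sim n^{*}\,\ln\frac{n}{n^{*}} = \frac{\ln(np)}{p}.
\end{align*}
```

As a rough illustration, p = 0.01 places the crossover at a network size of the order of n* ≈ 100 vertices. These are scaling relations, so they fix the order of magnitude only and not the constant prefactors.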
The β-model can easily be extended to more dimensions d, iterating over all nearest neighbours in all dimensions before moving on to the second-nearest neighbours and so on [83]. Because edges are rewired at random, it is possible for the graph to become disconnected, which is usually treated as an infinite distance between two vertices in different disjoint components. This can be undesirable in some circumstances. To circumvent this problem, Newman and Watts [86] proposed a modified version of the model that does not rewire edges, but instead adds links between two randomly chosen vertices with probability p, one for each edge on the original lattice substrate. The number of shortcuts npk/2 thus remains the same on average, while the mean degree of the final graph, k + pk, increases with p. Furthermore, the modified model allows self-edges and multiple edges, so that the distribution of shortcuts is completely uniform.
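The shortcut-adding variant can be sketched as follows for the one-dimensional case. This is a minimal example under stated assumptions rather than the implementation of Newman and Watts: the names `newman_watts` and `mean_degree`, the edge-multiset representation and the `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def newman_watts(n, k, p, seed=None):
    """Ring lattice plus random shortcuts; returns (edge multiset, shortcut count)."""
    rng = random.Random(seed)  # assumption: a seeded generator for reproducibility
    edges = Counter()          # maps an undirected edge (u, v), u <= v, to its multiplicity
    shortcuts = 0
    for i in range(n):
        for d in range(1, k // 2 + 1):
            # The lattice edge is always kept, so the substrate stays connected.
            edges[tuple(sorted((i, (i + d) % n)))] += 1
            # One shortcut trial per lattice edge, with probability p.
            if rng.random() < p:
                u, v = rng.randrange(n), rng.randrange(n)
                edges[tuple(sorted((u, v)))] += 1  # self-edges and duplicates allowed
                shortcuts += 1
    return edges, shortcuts


def mean_degree(edges, n):
    """Every edge contributes two end points; a self-edge counts twice for its vertex."""
    return 2 * sum(edges.values()) / n


if __name__ == "__main__":
    edges, shortcuts = newman_watts(n=1000, k=10, p=0.05, seed=7)
    print(shortcuts, "shortcuts, mean degree", mean_degree(edges, 1000))
```

With p = 0 the result is the plain lattice with mean degree k, and with p = 1 exactly nk/2 shortcuts are added, giving a mean degree of 2k, in line with k + pk.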
Where there is a β-model, there must be an α-model as well. Watts’ α-model [83] seems to have attracted somewhat less attention than the β-model, probably due to its higher computational demands and its more specialised focus on social networks. It was designed to construct a network in a fashion similar to how real social networks form, based on the currently existing network structure. Like the Barabási-Albert model [58] for scale-free graphs (see Section 2.4), the α-model thus generates small-world graphs using preferential attachment following a set of specific rules (see Appendix B). A single parameter α is used to interpolate between a highly clustered but disconnected “caveman” world, where everybody is connected to everybody else in the same “cave” but to no one outside, and a random “Solaria”⁶ world, where current friendships have almost no influence on the establishment of new friendships. An example of a graph generated with the α-model is given in Figure 2.4.
⁵ See Newman and Watts [82] for a note on why this is a phase transition with varying rewiring probability p, contrary to what
In the case of α = 0, even a single shared neighbour drastically increases the propensity of two vertices to form a direct link with each other. Therefore, after the first edges have been created randomly, almost all following edges are created mainly based on the existing connections, with very little chance of creating a random edge between two vertices that have no shared neighbours. The result is a disconnected “caveman” world. As α increases, existing links lose some of their influence on the network structure, allowing the “caves” to become connected by the increasing number of random shortcuts. When α → ∞ the network closely resembles a random graph, although it never becomes entirely random, as the construction algorithm systematically loops over all vertices, which means that edges are not created in an entirely independent fashion as required by the G_{n,p} and G_{n,m} random graph models described in Section 2.2. The model exhibits a phase change in its properties at a particular value of the α parameter. For example, the average shortest path length starts low for small α values and rises to a peak with increasing α, signalling the point where the network becomes fully connected. It then falls away to a flat fixed value at high α [83].
⁶ Named after a planet in an Isaac Asimov novel (1957), where humans live in isolation and interact only via robots and computers
3. Parallel Processing Architectures
This chapter describes the compute architectures and parallel programming libraries used throughout this thesis. Features and restrictions that are specific to a particular processor design are explained.
3.1. Multi-Core CPU
While CPU manufacturers have traditionally increased CPU frequencies from one generation to the next, this trend has slowed down dramatically in recent years, as the increasing power consumption becomes more and more difficult to manage. Manufacturers like Intel and AMD have instead started to incorporate more cores into their chip designs, as illustrated in the CPU architecture diagram in Figure 3.1. Although the processing cores also integrate short vector units, typically 128 bits wide and more recently 256 bits wide with the new advanced vector extensions (AVX) in the latest generation of Intel processors, the CPU implementations described here do not take advantage of them beyond what the compiler can do automatically. These vector units only support floating point operations and are generally more limited than the SIMD units found in GPUs. The author concentrates on developing data-parallel algorithms for the GPU and limits the CPU development efforts to task-level parallelism.
The consequence of the new processor designs for software developers and end users is that sequential software does not automatically run faster on newer CPUs. Programmers have to rethink and modify their software to use multiple threads so that parallel tasks can run concurrently on different CPU cores. Most programming languages either come with built-in support for multi-threading or allow external libraries
Figure 3.1.: A high-level view of a typical multi-core CPU architecture. Most modern desktop CPUs have four or six independent superscalar CPU cores with their own L1 and L2 caches and control logic. Some CPUs integrate a GPU on the chip at the cost of two main processor cores. A large, shared L3 cache provides low latency access to cached data.