Complex networks are mathematical structures which encode relationships between sets of discrete entities. As such, they are particularly useful in the study of complex systems, since they provide a means by which the interactions between components of a system can be represented. Indeed, since the non-trivial configuration of these interactions may itself be the source of complexity, they play a central role in the study of complex systems.
The study of networks in the context of crime is one of the primary themes of this thesis, and a number of examples will be considered. In particular, both physical and abstract networks will be considered, representing the structure of urban streets and the proximity of crimes, respectively. Since the same basic theory and terminology are common to all cases, and will be invoked frequently, a brief introduction to core network concepts will now be given. This will cover the basic structures and notation used in all discussion of networks, while more specific concepts will be introduced throughout the thesis as they become relevant.
Network science is a substantial field of research in its own right, with a number of strands: empirical study of real-world networks, modelling of network formation, and applications in real-world models. Much of this research is not relevant in the present context, and a review will not be given; instead, relevant literature will be introduced at points throughout the thesis. A number of comprehensive reviews can be found elsewhere (Newman, 2003; Boccaletti et al., 2006; Costa et al., 2011), as can those which focus on the particular sub-fields of spatial (Barthélemy, 2011) and temporal (Holme & Saramäki, 2012) networks.
Mathematical representation
From a technical perspective, the term network refers to a mathematical object which is otherwise known as a graph. Indeed, the distinction between the two terms is primarily one of context: ‘graph’ refers to the abstract object, whereas ‘network’ is more often used to imply that the structure in question represents real-world entities. The majority of terminology used to refer to networks is taken from the mathematical field of ‘graph theory’, which is an extensive and mature subject (see Bollobás, 2002).
Basic notation
A network G = (V, E) is an ensemble of vertices, V, together with the links, E, which join them¹. The set of vertices, V = {v}, is a non-empty and countable set of N elements, and E = {e} is a set of M elements, each of which is a pair of vertices. The number of vertices, N, is referred to as the order of the network, and the number of links, M, as its size.
In order to be able to refer to elements of the network, its vertices are labelled using the integers 1, ..., N, where the order is unimportant as long as the labelling is consistent and unique. Each particular vertex is then referred to using a subscript, $v_i$, or the label itself, i, depending on which is convenient; the latter convention will be adopted for the remainder of this discussion. In this way, specific links can be denoted using pair notation: if a link exists between two vertices, i and j, it is represented by the pair (i, j). When a link is present, the vertices are said to be adjacent and each is a neighbour of the other.
At this point it is necessary to note a distinction between two types of network: undirected and directed. In an undirected graph, links have no orientation, and the ordering of vertices in any link (i, j) is unimportant. In the directed case, however, the ordering is significant, and such a link (i, j) is said to exist from i to j. Directed networks can contain links in both directions between a given pair of vertices, or just one. Graphically, directionality is usually represented by adding an arrow to a link; both undirected and directed examples are shown in Figure 1.1.
¹ Vertices are also commonly referred to as nodes, and links as edges, but these terms are not used here, to prevent confusion with the concept of ‘edge effects’ and the use of ‘activity node’ within criminological theory.
[Figure 1.1, two panels, each showing four vertices labelled 1 to 4: (a) An undirected network, G1. (b) A directed network, G2.]
Figure 1.1: Graphical representations of simple undirected and directed networks. Circles represent vertices and the presence of a link is indicated by a line or arrow.
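The notation introduced so far can be made concrete with a minimal sketch. The class name `Network` and its methods are hypothetical, chosen only for illustration; the links of G1 and G2 are those encoded by the adjacency matrices in Equation 1.2, and treating a neighbour in a directed network as a vertex linked in either direction is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Network:
    """A network G = (V, E) whose vertices are labelled 1..N (hypothetical helper)."""
    order: int            # N, the number of vertices
    links: frozenset      # E, a set of (i, j) pairs
    directed: bool = False

    @property
    def size(self) -> int:
        """M, the number of links."""
        return len(self.links)

    def has_link(self, i: int, j: int) -> bool:
        """True if (i, j) is in E; the ordering is ignored when undirected."""
        if (i, j) in self.links:
            return True
        return not self.directed and (j, i) in self.links

    def neighbours(self, i: int) -> set:
        """Vertices adjacent to i (assumption: either direction counts if directed)."""
        return {j for j in range(1, self.order + 1)
                if j != i and (self.has_link(i, j) or self.has_link(j, i))}


# Links read off the adjacency matrices of Equation 1.2.
G1 = Network(4, frozenset({(1, 2), (1, 4), (2, 4), (3, 4)}))
G2 = Network(4, frozenset({(1, 2), (2, 1), (2, 4), (4, 1), (4, 3)}), directed=True)

print(G1.order, G1.size)                     # order N and size M of G1
print(G1.has_link(4, 3))                     # undirected: ordering is unimportant
print(G2.has_link(4, 3), G2.has_link(3, 4))  # directed: the link exists from 4 to 3 only
print(sorted(G1.neighbours(4)))              # neighbours of vertex 4 in G1
```

Storing each undirected link once, as a single pair, keeps the size M equal to the number of stored pairs; the orientation is then ignored only at lookup time.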
Two extreme cases, in terms of the presence of links, arise frequently and are identified by name. An empty network is one which contains no links (that is, E = ∅), and is defined simply by the number of vertices it contains. A complete network, on the other hand, is one in which all possible links are present. A complete undirected network therefore contains N(N − 1)/2 links (the number of pairs of vertices), whereas a complete directed network contains N(N − 1) (since two links are present for each pair, one in each direction).
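The two link counts for complete networks can be checked by direct enumeration. The following is a small sketch rather than part of the original discussion; the function name `complete_links` is hypothetical.

```python
from itertools import combinations, permutations


def complete_links(n: int, directed: bool = False) -> set:
    """All possible links on vertices 1..n (hypothetical helper).

    Undirected links are unordered pairs; directed links are ordered pairs.
    """
    vertices = range(1, n + 1)
    pairs = permutations(vertices, 2) if directed else combinations(vertices, 2)
    return set(pairs)


# An empty network on n vertices has E equal to the empty set.
empty_links = set()

# Compare the enumerated counts with the closed-form expressions.
for n in (1, 4, 10):
    assert len(complete_links(n)) == n * (n - 1) // 2
    assert len(complete_links(n, directed=True)) == n * (n - 1)

print(len(complete_links(4)), len(complete_links(4, directed=True)))
```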
It is also possible for graphs (i.e. networks) to be derived from other networks. This can be done in a number of ways, many of which involve the notion of a subgraph. The vertices of a subgraph are a subset of the vertices of the original network, and its links are a subset of those links of the original network for which both vertices remain. Formally, then, for a network G = (V, E), a subgraph G′ = (V′, E′) is one such that V′ ⊆ V and E′ ⊆ {(i, j) | (i, j) ∈ E and i, j ∈ V′}. An induced subgraph is a particular case in which all of the original links between members of V′ are retained; that is, E′ = {(i, j) | (i, j) ∈ E and i, j ∈ V′}.
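The distinction between a general subgraph and an induced subgraph can be sketched as follows. The function names are hypothetical, and the sketch assumes that each undirected link is stored once with a consistent ordering (smaller label first), so that set comparison of pairs is meaningful. The example links are those of G1, as encoded in Equation 1.2.

```python
def induced_links(links, kept):
    """Links of the subgraph induced by the vertex subset `kept`."""
    kept = set(kept)
    return {(i, j) for (i, j) in links if i in kept and j in kept}


def is_subgraph(sub_vertices, sub_links, vertices, links):
    """Check the two subset conditions of the subgraph definition."""
    return (set(sub_vertices) <= set(vertices)
            and set(sub_links) <= induced_links(links, sub_vertices))


V = {1, 2, 3, 4}
E = {(1, 2), (1, 4), (2, 4), (3, 4)}   # links of G1

kept = {1, 2, 4}
print(induced_links(E, kept))                # every original link among 1, 2 and 4
print(is_subgraph(kept, {(1, 2)}, V, E))     # a subgraph that is not induced
print(is_subgraph(kept, {(3, 4)}, V, E))     # not a subgraph: vertex 3 was removed
```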
Adjacency matrix
The structure of a network can be expressed in a number of ways, the most convenient and common of which is provided by the adjacency matrix. This encodes all information necessary to describe a network: for the network, G = (V, E), introduced above, the adjacency matrix, A, is an N × N matrix such that
\[
a_{ij} =
\begin{cases}
1 & \text{if } (i, j) \in E \text{ (i.e. there is a link connecting } i \text{ and } j\text{)} \\
0 & \text{otherwise.}
\end{cases}
\tag{1.1}
\]
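The adjacency matrices of the two networks shown in Figure 1.1, for example, are
\[
A_1 = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix},
\qquad
A_2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix}.
\tag{1.2}
\]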
For undirected networks, $a_{ij} = a_{ji}$ for all i, j ∈ V, and so these adjacency matrices are symmetric, with all information contained in the upper (or lower) triangular part. There is no such constraint for directed networks.
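As an illustration, the matrices of Equation 1.2 can be rebuilt from lists of links and the symmetry property checked directly. This is a minimal sketch using plain nested lists; the function names are hypothetical, and the link lists are simply read off the two matrices.

```python
def adjacency_matrix(n, links, directed=False):
    """Build the N x N adjacency matrix for vertices labelled 1..n."""
    a = [[0] * n for _ in range(n)]
    for i, j in links:
        a[i - 1][j - 1] = 1          # labels start at 1, list indices at 0
        if not directed:
            a[j - 1][i - 1] = 1      # an undirected link fills both entries
    return a


def is_symmetric(a):
    """True if a[i][j] == a[j][i] for every pair of indices."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(i + 1, n))


A1 = adjacency_matrix(4, [(1, 2), (1, 4), (2, 4), (3, 4)])
A2 = adjacency_matrix(4, [(1, 2), (2, 1), (2, 4), (4, 1), (4, 3)], directed=True)

for row in A1:
    print(row)
print(is_symmetric(A1), is_symmetric(A2))
```

In the directed case the row index is taken as the origin of the link and the column index as its destination, following the statement that a link (i, j) exists from i to j.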
Network metrics
Many properties of network structure can be measured, and a number of metrics are common in the empirical study of networks. Each emphasises some different aspect of structure, and their use is dependent on the particular characteristics of interest. The number of metrics employed is very large, and only the most basic will be given here; a number of others, however, will be introduced in the course of the thesis.
Many of the quantities typically studied involve the properties of individual vertices, and many concern some notion of ‘centrality’. The most common concept is that of degree, which simply measures the number of links which are incident with a particular vertex (i.e. the number of neighbours it has). For a vertex i in an undirected network, this is typically denoted $k_i$. In terms of the adjacency matrix, A, it is also straightforward to see that $k_i = \sum_j a_{ij}$. In the directed case, analogous quantities of in-degree and out-degree are defined as the number of inward-pointing and outward-pointing links, respectively. The degree distribution is the distribution of these values for a given network, and is frequently studied.
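The degree quantities follow directly from row and column sums of the adjacency matrix. The sketch below applies this to the matrices of Equation 1.2; the function names are hypothetical, and it assumes the convention that the row index is the origin of a directed link, so that row sums give out-degrees and column sums give in-degrees.

```python
from collections import Counter

# Adjacency matrices of Equation 1.2.
A1 = [[0, 1, 0, 1],
      [1, 0, 0, 1],
      [0, 0, 0, 1],
      [1, 1, 1, 0]]
A2 = [[0, 1, 0, 0],
      [1, 0, 0, 1],
      [0, 0, 0, 0],
      [1, 0, 1, 0]]


def row_sums(a):
    """Degree (undirected) or out-degree (directed) of each vertex."""
    return [sum(row) for row in a]


def column_sums(a):
    """In-degree of each vertex in a directed network."""
    return [sum(col) for col in zip(*a)]


degrees = row_sums(A1)
print(degrees)                             # degrees of vertices 1..4 in G1
print(row_sums(A2), column_sums(A2))       # out-degrees and in-degrees in G2
print(Counter(degrees))                    # degree distribution of G1 as counts
```

For an undirected network the degrees sum to twice the number of links, while in a directed network the in-degrees and out-degrees each sum to the number of links.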
Other metrics concern larger-scale structures within networks, and the concept of clustering provides one such example. Clustering measures the tendency for links to form between the neighbours of a given vertex, and can be interpreted as the probability that two neighbours of a vertex are themselves neighbours. For a given vertex, i, it is measured by the clustering coefficient, $C_i$, which is the ratio of the number of links between the neighbours of i to the maximum possible number of such links:
\[
C_i = \frac{q_i}{k_i (k_i - 1)/2}
\tag{1.3}
\]
where $q_i$ is the number of links between neighbours of i, and $k_i$ is the degree of i.
This can be averaged over all vertices of the network, providing a macro-level measure of clustering:
\[
\langle C \rangle = \frac{1}{N} \sum_i C_i
\tag{1.4}
\]
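Equations 1.3 and 1.4 can be evaluated for the undirected network G1 as a worked example. The sketch below is illustrative only: the function names are hypothetical, and it assumes the convention that a vertex with fewer than two neighbours has a clustering coefficient of zero, since Equation 1.3 is undefined in that case.

```python
from itertools import combinations

# Adjacency matrix A1 of Equation 1.2 (undirected network G1).
A1 = [[0, 1, 0, 1],
      [1, 0, 0, 1],
      [0, 0, 0, 1],
      [1, 1, 1, 0]]


def local_clustering(a, i):
    """Clustering coefficient of the vertex labelled i (labels start at 1)."""
    nbrs = [j for j, linked in enumerate(a[i - 1]) if linked]
    k = len(nbrs)
    if k < 2:
        return 0.0   # assumed convention: no pair of neighbours exists
    # q: number of links between pairs of neighbours of i
    q = sum(1 for u, v in combinations(nbrs, 2) if a[u][v])
    return q / (k * (k - 1) / 2)


def average_clustering(a):
    """Mean of the local coefficients over all N vertices."""
    n = len(a)
    return sum(local_clustering(a, i) for i in range(1, n + 1)) / n


print([local_clustering(A1, i) for i in range(1, 5)])
print(average_clustering(A1))
```

In G1, vertices 1 and 2 each have two neighbours which are themselves linked, vertex 3 has a single neighbour, and only one of the three possible links exists between the neighbours of vertex 4.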
This is also an example of a metric which can, perhaps more intuitively, be thought of in geometric or graphical terms. The clustering coefficient can be expressed in terms of the closure of triangles in the network; that is, the probability that, when two sides of a triangle are already present, the third side will also be. It is notable that this quantity can be defined in several ways, typically involving the counting of certain types of subgraph; one such definition is
\[
C_\triangle = \frac{\text{number of closed triplets}}{\text{number of connected triples}}.
\tag{1.5}
\]
Graphical interpretations such as these will be used a number of times within the thesis.
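The triangle-based ratio of Equation 1.5 can also be sketched in code. The counting convention used here is an assumption, since the text notes that several definitions exist: a connected triple is taken to be a vertex together with an unordered pair of its neighbours, and it is closed if those two neighbours are themselves linked, so that each triangle contributes three closed triples. The function name `triangle_clustering` is hypothetical.

```python
from itertools import combinations

# Adjacency matrix A1 of Equation 1.2 (undirected network G1).
A1 = [[0, 1, 0, 1],
      [1, 0, 0, 1],
      [0, 0, 0, 1],
      [1, 1, 1, 0]]


def triangle_clustering(a):
    """Ratio of closed to connected triples, under the assumed convention."""
    n = len(a)
    connected = 0
    closed = 0
    for centre in range(n):
        nbrs = [j for j in range(n) if a[centre][j]]
        for u, v in combinations(nbrs, 2):
            connected += 1          # two sides of a potential triangle exist
            if a[u][v]:
                closed += 1         # the third side is also present
    return closed / connected if connected else 0.0


print(triangle_clustering(A1))
```

For G1 there are five connected triples, of which three are closed (all arising from the triangle on vertices 1, 2 and 4). A ratio computed in this way need not equal the vertex average of Equation 1.4, because vertices of high degree contribute more triples.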