
Figure 4.1: Example k-core decomposition of a simple graph

The k-core decomposition of a graph gives a measure of the cohesiveness of the nodes in the graph. The k-cores of a graph are a sequence of nested subgraphs of gradually increasing cohesion, such that each vertex of a k-core sub-graph has at least k neighbours within that same k-core sub-graph (and so, likewise, does each of those neighbours). The number k in this context is then an indicator of centrality within the more cohesive regions of a network: vertices within a k-core of higher k are more central, and better connected to the other vertices of that k-core, than the vertices outside of that k-core.

The k-cores of the Internet are of interest to scalable Internet routing as they appear to be stable and useful for selecting landmarks in a Cowen Landmark Routing scheme. However, as explained in Chapter 3, a stumbling block to a distributed, compact Cowen Landmark Routing protocol for the Internet is the lack of a distributed form of the k-core graph decomposition.

This chapter will give:

• The details of a distributed k-core graph decomposition algorithm for static graphs, with a proof of correctness and convergence, in Section 4.3, building on properties established in Section 4.2.

• The extension and optimisation of the distributed k-core algorithm, and of its proof of correctness and convergence, to dynamic graphs in Section 4.4.

• A detailed analysis of the performance characteristics of the distributed k-core algorithm in Section 4.5 on large-scale Internet AS graphs, with:

  – The behaviour on static AS graphs examined in Subsection 4.5.2.

  – The efficiency of the optimised, dynamic version of the algorithm relative to the static version, as edges of the AS graphs are changed, examined in Subsection 4.5.3.

These results will show that the distributed k-core algorithm is correct and performs well on large Internet AS graphs, even as they change.

4.1.1 k-core Preliminaries

The k-core decomposition of a graph was defined by Seidman [63] as follows: “Let G be a graph. If H is a subgraph of G, δ(H) will denote the minimum degree of H; each point of H is thus adjacent to at least δ(H) other points of H. If H is a maximal connected (induced) subgraph of G with δ(H) ≥ k, we say that H is a k-core of G.”
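As a minimal sketch, the two conditions that Seidman's definition places on a single candidate subgraph H (being connected, and δ(H) ≥ k) can be checked directly. The sketch assumes the graph is held as a Python dict mapping each vertex to the set of its neighbours; the function names and the example graph are hypothetical. Maximality is deliberately not checked here, since it depends on the rest of the graph.

```python
from collections import deque

def induced(G, H):
    """Adjacency of the subgraph of G induced by the vertex set H."""
    return {v: G[v] & H for v in H}

def min_degree(G, H):
    """delta(H): the minimum degree within the subgraph induced by H."""
    return min(len(nbrs) for nbrs in induced(G, H).values())

def is_connected(G, H):
    """Breadth-first search restricted to the subgraph induced by H."""
    sub = induced(G, H)
    start = next(iter(H))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in sub[v] - seen:
            seen.add(w)
            queue.append(w)
    return seen == H

def meets_local_conditions(G, H, k):
    """Connected and delta(H) >= k. Maximality is NOT checked."""
    return bool(H) and is_connected(G, H) and min_degree(G, H) >= k

# Hypothetical example: a triangle a-b-c with a pendant vertex d on c.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(meets_local_conditions(G, {"a", "b", "c"}, 2))       # True
print(meets_local_conditions(G, {"a", "b", "c", "d"}, 2))  # False
```

Note that {"a", "b"} would also pass with k = 1, even though it is not a 1-core: it is not maximal, as the whole graph is connected with δ(G) = 1.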

Note that there may be multiple k-core sub-graphs within a graph. For example, the connected components of the graph that contain at least one edge correspond to the 1-cores of the graph [63], so a graph with multiple disconnected components would have multiple disconnected 1-cores.

A potential ambiguity arises in the terminology as to whether the term k-core refers to the union of all nodes in any k-core of the graph, regardless of whether they are connected, or to some specific, connected k-core component.

The primary concern in this work is efficiently determining which k-cores a vertex is a member of. The connectedness of those k-cores is not of concern to the work here. So, for simplicity, “k-core” will refer to the union of all k-core sub-graphs for some specific value of k, and “k-cores” will refer to the collection of every k-core for all values of k ∈ N.

The k-cores are nested. Any vertex in a (k+1)-core must also be in a k-core, and all k-cores are contained in the 0-core. Vertices which are in the 0-core but not in the 1-core are disconnected vertices, with no edges. Disconnected vertices, and the 0-core, are often ignored.

The k-core decomposition of a graph is the computation of all non-empty k-core sub-graphs: that is, the process of finding all k-cores within the graph and determining the k-core membership of each vertex. An example k-core graph decomposition is shown in Figure 4.1. Its 3-core has multiple components. All the vertices are in the 1-core, and the vertices in the 3-core are also in the 2-core.

4.1.2 Relevant properties of k-cores

This section will set out a few properties of k-cores and their vertices. These will be used to construct the well-known centralised algorithm for determining the k-core membership of vertices (e.g. [62, 12]). These properties should also help inform the later construction of the decentralised algorithm for k-core graph decomposition in Section 4.2.

Let N(G, v) be the set of neighbours of v in the graph G, and deg(G, v) = |N(G, v)|. Let ‘:’ be shorthand for "such that". Let Ndeg(G, v, d) select the subset of the neighbours of a vertex v ∈ G which have a degree of at least d in G, i.e.:

$$N_{\mathrm{deg}}(G, v, d) = \{\, w \in N(G, v) : \deg(G, w) \geq d \,\} \tag{4.1}$$

The graph argument is often clear from the context of the vertex, in which case it may be left out, simply writing N(v), deg(v) or Ndeg(v, k) instead.
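Equation 4.1 translates directly into code. A minimal sketch, assuming the graph is a Python dict mapping each vertex to the set of its neighbours (the function names and the example graph are hypothetical):

```python
def neighbours(G, v):
    """N(G, v): the set of neighbours of v in G."""
    return G[v]

def deg(G, v):
    """deg(G, v) = |N(G, v)|."""
    return len(G[v])

def n_deg(G, v, d):
    """Ndeg(G, v, d) = {w in N(G, v) : deg(G, w) >= d}  (Equation 4.1)."""
    return {w for w in neighbours(G, v) if deg(G, w) >= d}

# Hypothetical example: a triangle a-b-c with a pendant vertex d on c.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}

print(sorted(n_deg(G, "c", 2)))  # ['a', 'b']
```

Here d is excluded from Ndeg(c, 2) because its own degree is only 1, even though it is a neighbour of c.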

The k-core definition in Section 4.1.1 implies that each vertex in a k-core sub-graph has a degree of at least k in that k-core and, since the same holds for each of its neighbours there, has at least k neighbours whose degree is at least k in that k-core. That is, given a k-core H ⊆ G, for some k ∈ N, then for any vertex v ∈ H:

$$|N_{\mathrm{deg}}(H, v, k)| \geq k \tag{4.2}$$

This fact may be used to construct an algorithm to find the k-cores of a known, static graph G. This algorithm, shown in Algorithm 4.1, works by iteratively applying Relation 4.2 to prune out vertices that are not in the relevant k-core. It requires full, global knowledge of the state of the graph.

Algorithm 4.1 Graph-centric k-core algorithm

1: k ← 0
2: while |G| > 0 do
3:   Gk ← G
4:   k ← k + 1
5:   while (I ← {v ∈ G : |Ndeg(v, k)| < k}) ≠ ∅ do
6:     G ← G \ I
7:   end while
8: end while

To see how Algorithm 4.1 may be constructed, say we add to H some or all of the missing vertices and edges from G to produce H′, such that H ⊆ H′ ⊆ G. Relation 4.2 must continue to hold for the original vertices v of H within H′, i.e. |Ndeg(H′, v, k)| ≥ k, because adding new vertices and edges can only increase the degrees of v and of its neighbours. For the added vertices, which were not in H before (i.e. I = H′ \ H), Relation 4.2 may also hold for some, but it cannot hold for all of them; otherwise at least one k-core in H would not be a maximal sub-graph, contradicting its definition. Thus for any H′, either there is at least one vertex that is inconsistent with Relation 4.2, or all vertices are consistent and so H′ = H. Hence, if I is not the empty set, there must exist a vertex in H′ that is not consistent with Relation 4.2, and vice versa:

$$I \neq \emptyset \iff \exists\, w \in H' : |N_{\mathrm{deg}}(H', w, k)| < k \tag{4.3}$$

Removing such inconsistent vertices from H′, successively if required, until all remaining vertices satisfy Relation 4.2 must then recover the k-core H, the vertices removed in total being exactly I. Implication 4.3 means I can equivalently be defined without any knowledge of the actual k-cores, H, through this repeated removal of vertices that do not meet Relation 4.2. Having H′, and being able to determine I, then implies being able to determine the k-core H. This process leads to Algorithm 4.1.
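The repeated removal just described can be sketched for a single, fixed k. This is a minimal sketch assuming a dict-of-sets adjacency representation; the names and the example graph are hypothetical. Starting from a vertex set H′ that contains the k-core (here, the whole graph), it recomputes the induced subgraph, collects the vertices violating Relation 4.2, and removes them until none remain.

```python
def n_deg(G, v, d):
    """Ndeg(G, v, d): neighbours of v whose degree in G is at least d."""
    return {w for w in G[v] if len(G[w]) >= d}

def induced(G, S):
    """Adjacency of the subgraph of G induced by the vertex set S."""
    return {v: G[v] & S for v in S}

def prune_to_k_core(G, H_prime, k):
    """Repeatedly remove the vertices of H' that are inconsistent with
    Relation 4.2, i.e. |Ndeg(H', v, k)| < k, until none remain."""
    S = set(H_prime)
    while True:
        H = induced(G, S)
        inconsistent = {v for v in S if len(n_deg(H, v, k)) < k}
        if not inconsistent:
            return S  # no inconsistent vertex left: S is the k-core (4.3)
        S -= inconsistent

# Hypothetical example: a triangle a-b-c with a tail c-d-e.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e"}, "e": {"d"}}

print(sorted(prune_to_k_core(G, set(G), 2)))  # ['a', 'b', 'c']
```

With k = 2, the first pass removes both d and e, since each has fewer than two neighbours of degree at least 2; the triangle that remains is consistent, so the loop stops.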

Algorithm 4.1 starts by assigning all vertices, including any disconnected vertices, to the 0-core. It then repeatedly removes all 0-degree vertices (and their edges) from the graph until no more such vertices remain; the remaining vertices form the 1-core. Then all 1-degree vertices are repeatedly removed, leaving the 2-core; and so on, until no vertices are left to assign to further cores, and all k-cores for all k are known.
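A minimal sequential sketch of this procedure follows, assuming the same dict-of-sets adjacency representation; the function names and the example graph are hypothetical. It records the current vertex set as the k-core, moves on to k + 1, and then prunes with the condition |Ndeg(v, k)| < k evaluated on the subgraph induced by the remaining vertices, stopping once no vertices are left.

```python
def n_deg(G, v, d):
    """Ndeg(G, v, d): neighbours of v whose degree in G is at least d."""
    return {w for w in G[v] if len(G[w]) >= d}

def induced(G, S):
    """Adjacency of the subgraph of G induced by the vertex set S."""
    return {v: G[v] & S for v in S}

def k_core_decomposition(G):
    """Return {k: vertex set of the k-core} for every non-empty k-core."""
    cores = {}
    S = set(G)                 # remaining vertices; initially the 0-core
    k = 0
    while S:
        cores[k] = set(S)      # S is the (non-empty) k-core
        k += 1
        # Prune vertices inconsistent with Relation 4.2 until none remain.
        while True:
            H = induced(G, S)
            I = {v for v in S if len(n_deg(H, v, k)) < k}
            if not I:
                break
            S -= I
    return cores

def core_numbers(cores):
    """Highest k-core each vertex is in (later k overwrites earlier k)."""
    return {v: k for k in sorted(cores) for v in cores[k]}

# Hypothetical example: triangle a-b-c, tail c-d-e, isolated vertex f.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e"}, "e": {"d"}, "f": set()}

print(sorted(core_numbers(k_core_decomposition(G)).items()))
# [('a', 2), ('b', 2), ('c', 2), ('d', 1), ('e', 1), ('f', 0)]
```

The nesting property from Section 4.1.1 shows up directly in the result: each recorded core is a subset of the one recorded before it.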

If Algorithm 4.1 is evaluated in a purely sequential manner, vertex by vertex, then the number of those evaluations is lower-bounded by Ω(max(kmax)²), where max(kmax) is the value of the highest k-core¹. This is because the outer loop must run at least max(kmax) times, iterating over each k-core from k = 1 to k = max(kmax), and in each such outer-loop iteration there is an inner loop which must examine at least k vertices (a non-empty k-core has at least k + 1 vertices). Accordingly, in total

$$\sum_{k=1}^{\max(k_{\max})} k = \tfrac{1}{2}\left(\max(k_{\max})^2 + \max(k_{\max})\right)$$

vertices must be examined, at a minimum, in the absolute best case.

¹ This terminology may seem slightly awkward. It is chosen for consistency with terminology in
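The lower bound can be compared against an instrumented, purely sequential run. The sketch below (hypothetical names; dict-of-sets adjacency assumed) counts every individual vertex consistency check made by the pruning loop of Algorithm 4.1. A clique K_n is used as a hypothetical input, because its highest core is simply n − 1; the count for it sits comfortably above ½(max(kmax)² + max(kmax)).

```python
def n_deg(G, v, d):
    """Ndeg(G, v, d): neighbours of v whose degree in G is at least d."""
    return {w for w in G[v] if len(G[w]) >= d}

def induced(G, S):
    """Adjacency of the subgraph of G induced by the vertex set S."""
    return {v: G[v] & S for v in S}

def count_sequential_checks(G):
    """Run Algorithm 4.1's pruning one vertex at a time, counting each
    vertex consistency check. Returns (highest k-core, number of checks)."""
    S, k, checks, k_max = set(G), 0, 0, 0
    while S:
        k_max = k                  # S is non-empty, so a k-core exists
        k += 1
        while True:
            H = induced(G, S)
            I = set()
            for v in S:            # one vertex evaluation at a time
                checks += 1
                if len(n_deg(H, v, k)) < k:
                    I.add(v)
            if not I:
                break
            S -= I
    return k_max, checks

def complete_graph(n):
    """Hypothetical test input: the clique K_n, whose highest core is n - 1."""
    return {v: set(range(n)) - {v} for v in range(n)}

m, checks = count_sequential_checks(complete_graph(6))
lower_bound = (m * m + m) // 2     # (1/2)(max(kmax)^2 + max(kmax))
print(m, checks, lower_bound)      # 5 36 15
```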

The checking of vertices for inconsistency in the conditional of each iteration of the inner loop of Algorithm 4.1 could potentially be parallelised, so that every vertex in the k-core being considered in a given iteration is evaluated at once. The inner loop itself would, however, have to remain. With the vertex consistency checking parallelised, Algorithm 4.1 would therefore still require at least Ω(max(kmax)) rounds of evaluation, at least one for each k-core.

While the vertex consistency checks within the conditional of the inner loop of Algorithm 4.1 could be parallelised, the outer loop cannot be. In the inner loop, the consistency of a vertex in any given iteration can be checked independently of the other vertices, which is what makes parallelisation possible there. Each iteration of the outer loop, however, depends on the previous iteration having been completed, and this data dependency means the outer loop cannot be parallelised.
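To illustrate this data dependency, the sketch below (hypothetical names; dict-of-sets adjacency assumed) treats each evaluation of the inner-loop condition as one parallel round: every remaining vertex is checked against the same snapshot of the graph, so the checks within a round are independent of one another and could in principle run concurrently. It is ordinary sequential Python that only counts rounds, not the distributed algorithm of Section 4.3. Because each value of k costs at least one round, the count cannot fall below max(kmax).

```python
def n_deg(G, v, d):
    """Ndeg(G, v, d): neighbours of v whose degree in G is at least d."""
    return {w for w in G[v] if len(G[w]) >= d}

def induced(G, S):
    """Adjacency of the subgraph of G induced by the vertex set S."""
    return {v: G[v] & S for v in S}

def count_parallel_rounds(G):
    """Algorithm 4.1 with each inner-loop condition counted as one round of
    independent checks. Returns (highest k-core, number of rounds)."""
    S, k, rounds, k_max = set(G), 0, 0, 0
    while S:                       # outer loop: needs the previous k-core
        k_max = k
        k += 1
        while True:
            rounds += 1            # one round of independent checks
            H = induced(G, S)      # snapshot shared by every check in it
            I = {v for v in S if len(n_deg(H, v, k)) < k}
            if not I:
                break
            S -= I
    return k_max, rounds

def complete_graph(n):
    """Hypothetical test input: the clique K_n, whose highest core is n - 1."""
    return {v: set(range(n)) - {v} for v in range(n)}

print(count_parallel_rounds(complete_graph(6)))  # (5, 7)
```

For the clique, each k from 1 to 5 needs a single round, and k = 6 needs two: one that removes every vertex and one that confirms nothing is left.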