4.5 Refinement of the Model to Match the Clustering Coefficient
5.1.1 Review of Community-Finding Algorithms
Clustering or community-finding deals with the detection of intrinsic groups present in a network. A loose definition of clustering therefore could be “the process of orga- nizing vertices into groups whose members are similar in some way”. Consequently, a cluster/community is a collection of vertices which are “similar” among them and “different” from vertices belonging to the other clusters. Most of the clustering al- gorithms assume that the number of edges among the vertices within a group (i.e., similar vertices) by far outnumber the edges between vertices from different groups. This problem of clustering turns out to be extremely important in various disciplines including computer science, sociology and network theory. Some of the popular algo- rithms that have been devised by the researchers in each of the above disciplines to tackle this problem are presented below.
Algorithms in Computer Science
In computer science, community-finding corresponds to the graph partitioning prob- lem that involves dividing a graph into pieces such that the pieces are of about the same size and there are only a few connections in between the pieces. Various ap- proaches have been proposed in the literature including those based on (i) different optimization techniques as in [81, 100], (ii) spectral bisection [139] and (iii) message passing [58].
Optimization: One of the classic examples of graph bisection using optimization techniques is the Kernighan-Lin algorithm [81]. This is a greedy optimization method that assigns a benefit function Q to divisions of the network and then attempts to optimize this benefit over possible divisions. The benefit function is equal to the number of edges that are present within the two groups minus the number of edges
that run between them. As an initial condition, one has to specify the size of the two groups into which the network is to be divided and the start configuration for the groups (may be random). The algorithm proceeds in two stages. In the first stage, a vertex is chosen from each of the groups and the change in the benefit function is calculated if these two vertices are swapped. The pair that maximizes this change is chosen and swapped. This process is repeated however, with a limitation that a vertex that has been swapped once is locked and is never swapped again. Once all the vertices in a group have been locked this stage of the algorithm ends. In the second stage, the sequence of swaps made are revisited so as to find out the point where Q was maximum. The corresponding configuration is chosen to be the bisection of the graph.
Another popular method based on optimization is the k-means clustering [100]. Here the number of clusters is preassigned to a value, say k. Each vertex is embedded as a point (xi) in a metric space and a distance measure is defined between the points
in this space. The distance measure indicates the dissimilarity between the pairs of vertices. The goal is to partition the points into k sets (S = {S1, S2, . . . , Sk}) so as
to minimize a cost function. The cost function takes the following form
argmin S k X i=1 X xj∈Si || xj − c1 ||2
where c1 is the centroid (i.e., the center) of the points in Si. The k-means clustering
problem can be solved using the Lloyd’s algorithm [97]. The algorithm begins with an initial distribution of centroids that are as far as possible from each other. In the first iteration, each vertex is assigned to a centroid. The centers of mass of the k clusters formed are then computed and this new set of centroids allows for a new classification of the vertices in the second iteration, and so on. Within a few iterations the centroids become stable and the clusters do not change any more.
Spectral Bisection: Given an adjacency matrix A of a graph G, one can construct the Laplacian matrix L where L = D - A and D is the diagonal matrix of node degrees.
In other words, each diagonal entry of the matrix D is dii=
P
jaij where aij are the
entries of the matrix A. All the other entries of matrix D are zero. Spectral clustering techniques make use of the spectrum of the matrix L to perform dimensionality reduction for clustering in fewer dimensions. One such technique is the Shi-Malik algorithm [139], commonly used for image segmentation. The algorithm partitions based on the eigenvector v corresponding to the second smallest eigenvalue of the matrix L. This partitioning can be done in various ways, such as by taking the median m of the components of the vector v, and placing all nodes whose component in v is greater than m in one partition, and the rest in the other partition. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
Message Passing: A set of data points can be clustered via passing certain mes- sages between the points. The key idea is to construct a similarity network with nodes representing the data points and the weights on the edges representing some similarity between the data points and then iteratively propagating messages from a node to its neighbors finally resulting into to the emergence of the clusters. A popular algorithm in this category is the one suggested by Frey and Dueck [58] that is based on finding data centers or “exemplars” through a method which they call “affinity propagation”. In this algorithm, each node in the similarity network is assumed to be a potential exemplar and real-valued messages are exchanged between nodes until a high-quality set of exemplars and corresponding clusters gradually emerges. There are two kinds of messages that are exchanged between the nodes, namely “responsibil- ities” and “availabilities”. Responsibilities r(i, k) indicate the accumulated evidence for how well-suited the node k is to serve as the exemplar for the node i, taking into account other potential exemplars for the node i. Responsibilities are sent from node i to the candidate exemplar k. On the other hand, availability a(i, k) indicates the accumulated evidence for how appropriate it would be for the node i to choose node k as its exemplar, taking into account the support from other nodes that node k should be an exemplar. Availabilities are sent from candidate exemplar k to the node i.
The algorithm consists of some simple update rules and messages are exchanged only between node pairs connected by an edge. There are three steps in which it pro-
ceeds – (i) update responsibilities given availabilities, (ii) update availabilities given the responsibilities, and (iii) monitor exemplar decisions by combining availabilities and responsibilities. Terminate if the local decisions stay (almost) unchanged over successive iterations.
Algorithms in Sociology
Sociologists concerned with the analysis of social networks have developed a large volume of literature that deals with the problem of community identification. The primary technique adopted is known as hierarchical clustering [135]. A hierarchical clustering on a network of n nodes produces a series of partitions of the network, Pn,
Pn−1, . . . , P1. The first partition Pn consists of n single-node clusters, while the last
partition P1, consists of a single cluster containing all the n nodes. At each particular
stage the method joins together the two clusters which are closest (or most similar) to each other. Note that at the first stage, this amounts to joining together the two nodes that are most similar to each other.
Differences between methods arise because of the different ways of defining the distance (or similarity) between clusters. The most popular similarity measure in the sociology literature is called structural equivalence. Two nodes in the network are said to be structurally equivalent if they have exactly the same set of neighbors (other than each other, if they are connected). In most cases, exact structural equivalence is rare and therefore one needs to define the degree of equivalence between node pairs which, may be done in several ways. One of them, known as Euclidian distance [160], is defined as
distij =
s X
k6=j,i
(aik− ajk)2
where aik and ajk are entries of the adjacency matrix. The more structurally similar
the nodes i and j are, the lower is the value of distij. Another commonly used
to the nodes i and j in the adjacency matrix [160].
There are also several techniques of joining the clusters such as the
(i) Single linkage clustering: One of the simplest form of hierarchical clustering method is single linkage, also known as the nearest-neighbor technique. The fundamental characteristic of this method is that distance between the clusters is defined as the distance between the closest pair of nodes, where only pairs consisting of one node from each cluster are considered.
(ii) Complete linkage clustering: The complete linkage, also known as the farthest- neighbor, clustering method is the opposite of single linkage. Distance between clusters is defined here as the distance between the most distant pair of nodes, one chosen from each cluster.
(iii) Average linkage clustering: In this case, the distance between two clusters is defined as the average of distances between all pairs of nodes, where each pair is made up of one node from each cluster.
Algorithms in Network Theory
Clustering or community analysis is also a very well-researched area in the field of complex networks. Various algorithms have been proposed in order to detect “natu- ral” divisions of the vertices in the network. In the following, we shall outline a few of these techniques that have become quite popular in the recent times.
Spin Model based Algorithm: One of the very popular models in the field of statistical mechanics, usually applied to study various properties of ferromagnets, is the Potts model [166]. The model describes a system of spin interactions where a spin can be in n different states. If the interactions are ferromagnetic then at the ground state all the spins have the same value (i.e., they are aligned). However, if the interactions are anti-ferromagnetic then in the ground state of the system there are different spin values co-existing in homogeneous clusters. If the Potts spin variables
are assigned to the vertices of a network where the interactions are between the neighboring spins then the communities can be recovered from the like-valued spin clusters in the system. In [133], the authors propose a method to detect communities where the network is mapped onto a n-Potts model with nearest neighbor interactions. The Hamiltonian (i.e., the total energy) of the model can be expressed as
H = −JX i,j aijδ(σi, σj) + κ n X s=1 ns(ns− 1) 2
where aij is an element of the adjacency matrix, δ is the Kronecker delta function [74],
σi and σj stand for the spin values, ns is the number of spins in the state s and J
and κ are the coupling parameters. H is a sum of two competing terms: one is the classical ferromagnetic Potts model energy that favors spin alignment while the other peaks when the spins get homogeneously distributed. The ratio κ/J controls the relative importance of these two terms. H is minimized via the process of simulated annealing (see [83]) that starts from an initial configuration where spins are randomly assigned to the vertices and the number of states n is kept very high. The authors in [133] show that the procedure is quite fast and the results do not depend on n (as long as n is sufficiently high).
Random Walk based Algorithm: Random walks [74] on a graph can be very useful for detecting communities. The basic idea is that if the graph has strong community structures then a random walker spends a long time inside a community due to the high density of the internal edges and the number of paths that could be followed. In [167], the author uses random walks to define the distance between pairs of vertices. The distance distij between vertices i and j is defined as the average
number of edges that a random walker has to cross to reach j starting from i. Vertices that are close to each other are likely to belong to the same community. The author defines a “global attractor” of a vertex i to be a vertex closest to i, that is, any vertex which is at a smallest distance from i. On the other hand, the “local attractor” of i are its nearest neighbors (i.e., vertices directly sharing an edge with i). Two types of communities are defined based on the local and the global attractors: i has to
be put into the community of its attractor and also into the community of all other vertices for which i is an attractor. Communities have to be minimal subgraphs i.e., they cannot have smaller subgraphs that are also communities as per the previously mentioned definition. Other variants of random walk based clustering algorithms may be found in [73, 90, 168].
Clique Percolation based Algorithm: In certain real-world graphs a particular vertex may belong to more than one community. For instance, a polysemous word in a semantic network (see Chapter 1) can belong to each different community of words that corresponds to a particular sense of that polysemous word. In such cases, one needs to detect overlapping communities and the term “cover” is more appropriate than partition. The most popular technique to detect overlapping communities is the clique percolation method [120]. This method is based on the concept that the internal edges of a community together form a clique while inter-community edges do not form such cliques. If it were possible for a k-clique (i.e., a clique composed of k nodes) to move on a network then it would probably get trapped within its original community as it would not be able to cross the bottleneck formed by the inter-community edges. The authors define a few terms so as to realize this idea – (i) two k-cliques are adjacent if they share k − 1 nodes, (ii) the union of adjacent cliques is a k-clique chain, (iii) two k-cliques are connected if they are a part of a k-clique chain and (iv) a k-clique community is the largest connected subgraph formed by the union of a k-clique and of all k-cliques that are connected to it. The identification of a k-clique community is carried out by making a k-clique “roll” over adjacent k-cliques, where rolling means rotating a k-clique about the k − 1 vertices it shares with any of the adjacent k-cliques (see [51] for a method outlining the computation of k-cliques and their overlaps). Since, by construction, k-clique communities can share vertices therefore they can be overlapping.
The Girvan-Newman Algorithm: The Girvan-Newman algorithm [60] focuses on those edges in the network that are least “central”, that is the edges which are most “between” the communities. The “edge betweenness” of an edge is defined as the number of shortest paths between pairs of nodes that run along it. If there are more
than one shortest paths between a pair of nodes then each of these paths is assigned equal weight such that the total weight of all of the paths sums to unity. If a network contains communities that are only loosely connected by a very few inter-community edges, then all the shortest paths between different communities must go along one of these few edges. Thus, the edges connecting communities should have high edge betweenness (at least one of them). By removing these edges, the communities can be separated from one another thereby, revealing the underlying community structure of the network.
In short, the steps in which the algorithm proceeds are – (i) calculate the edge betweenness of all existing edges in the network, (ii) remove the edge with the highest betweenness, (iii) recalculate the betweenness of all edges affected by the removal, and (iv) repeat steps 2 and 3 until no edges remain.
The Radicchi et al. Algorithm: The algorithm of Radicchi et al. [129] counts, for each edge, the number of loops of length three it is a part of and declares the edges with very low counts as inter-community edges. The basic idea is that the edges that run between communities are unlikely to belong to many short loops, because, to complete a loop containing such an edge there needs to be another edge that runs between the same two communities, and such other edges are rare. Therefore, it should be possible to spot the between-community edges by looking for the ones that belong to an unusually small number of loops. Such edges can be iteratively removed to decompose the network into disjoint communities.