CHAPTER 3. METHODS
3.3 Second phase
3.3.2 Network properties
3.3.2.3 Clustering coefficient (transitivity)
While the degree is a property of a vertex and the distance is a property pertaining to a pair of vertices (a dyad), the clustering coefficient is based on the concept of triadic closure or transitivity. A triad (or a triple) of vertices A, B, and C is called transitive when the three vertices have ‘balanced’ relations. In an undirected graph, it means that if A connects to B and B connects to C, then A connects to C. In social network analysis, triadic relations have been emphasized as a meaningful building block of a group structure. The number of ties (edges) among three actors (vertices) defines different patterns of triadic relations: no tie means isolates, one tie means a couple and an isolate, two ties means one actor bridging the others, and all three ties means a cluster. By counting the instances of the different triadic patterns, one can get a picture of whether and how actors in the network are scattered or clustered. The clustering coefficient is a measure quantifying this feature.
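The four triadic patterns described above can be tallied directly on a small graph. The following is a minimal sketch only, assuming an undirected graph given as a vertex list and an edge list; the function name `triad_counts` and the toy graph are hypothetical and are not part of the analysis reported in this chapter.

```python
from itertools import combinations

def triad_counts(vertices, edges):
    """Count triads by the number of ties present among the three vertices."""
    edge_set = {frozenset(e) for e in edges}
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(vertices, 3):
        # Check each of the three possible ties within the triad.
        ties = sum(frozenset(pair) in edge_set
                   for pair in ((a, b), (b, c), (a, c)))
        counts[ties] += 1
    return counts

# Hypothetical toy graph: a triangle A-B-C with a pendant vertex D attached to C.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
counts = triad_counts(vertices, edges)
print(counts)
```

In this toy graph, one triad has a single tie (A, B, D), two triads have two ties (A, C, D and B, C, D), and one triad is a full cluster (A, B, C). Enumerating all triads is only practical for small graphs, since the number of triads grows with the cube of the number of vertices.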
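There are two different ways to calculate the clustering coefficient of a network. One way is to calculate the transitivity ratio directly, by defining a clustering coefficient C as:
\[ C = \frac{3 \times \text{number of triangles in the network}}{\text{number of connected triples of vertices}} \]
where a triangle refers to a set of three vertices each of which is connected to the other two, and a connected triple of vertices refers to a set of three vertices at least one of which is connected to the other two. In other words, a triangle is a triad with three edges and a connected triple is a triad with at least two edges. The factor of three accounts for the fact that each triangle contains three connected triples, one centered on each of its vertices. Therefore, C measures the proportion of triples that have the third edge to make them transitive. In effect, C is the probability that two neighbors of a randomly selected vertex are connected.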
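As an illustration, the transitivity ratio can be computed by visiting every vertex and examining each pair of its neighbors. This is a minimal sketch, assuming the usual form of the ratio (three times the number of triangles divided by the number of connected triples) and an undirected graph given as an edge list; the helper names and the toy graph are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build a neighbor-set dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def transitivity(edges):
    """Fraction of connected triples that are closed into triangles."""
    adj = build_adjacency(edges)
    triples = 0  # connected triples, counted once per center vertex
    closed = 0   # triples whose two end vertices are also linked
    for nbrs in adj.values():
        for u, v in combinations(nbrs, 2):
            triples += 1
            if v in adj[u]:
                closed += 1
    # Each triangle is seen from all three of its vertices, so
    # `closed` already equals three times the number of triangles.
    return closed / triples if triples else 0.0

# Hypothetical toy graph: a triangle A-B-C with a pendant vertex D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ratio = transitivity(edges)
print(ratio)
```

The toy graph has one triangle and five connected triples (one centered on A, one on B, and three on C), so the ratio is 3/5.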
An alternative way of obtaining the clustering coefficient C of a graph is to calculate the clustering coefficient of each vertex i in the graph and get the average over all vertices. A definition of the clustering coefficient of a vertex i, proposed by Watts and Strogatz (1998), is:
\[ C_i = \frac{\text{number of triangles connected to vertex } i}{\text{number of triples centered on vertex } i} \]
Note that the denominator of the above equation, the number of the triples centered on a vertex, is in effect the number of pairs among the neighbors of the vertex. If there is an edge connecting a pair of its neighbors, that makes a triangle connected to the vertex. Therefore, the clustering coefficient of a vertex i can be equivalently defined as:
\[ C_i = \frac{2E_i}{k_i(k_i - 1)} \]
where $E_i$ is the number of edges connecting the neighbors of vertex i, and $k_i$ is the degree of vertex i. In other words, the clustering coefficient of a vertex is the ratio of the number of existing edges connecting its neighbors to the maximum possible number of edges between the neighbors, $k_i(k_i - 1)/2$. Note that the above definition of $C_i$ is, in effect, the density of the neighborhood of vertex i. Therefore, it provides a local measure of cohesion among neighboring vertices.
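The local definition can be sketched as follows: count the edges among the neighbors of a vertex and divide by the number of neighbor pairs. This is an illustrative sketch under the assumption of an undirected graph stored as neighbor sets; the function names and the toy graph are hypothetical, and returning `None` for vertices of degree below two is one possible convention, not necessarily the one used in this study.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build a neighbor-set dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, i):
    """Density of the neighborhood of vertex i: 2 * E_i / (k_i * (k_i - 1))."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return None  # no neighbor pairs, so the ratio is undefined
    # E_i: number of edges that connect two neighbors of i.
    e = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2 * e / (k * (k - 1))

# Hypothetical toy graph: a triangle A-B-C with a pendant vertex D attached to C.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
local = {v: local_clustering(adj, v) for v in adj}
print(local)
```

Here A and B each have two neighbors that are connected to each other, giving a coefficient of 1. Vertex C has three neighbors with only one edge among the three possible pairs, giving 1/3. Vertex D has a single neighbor, so its coefficient is left undefined.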
The clustering coefficient of the graph can then be obtained by averaging the local clustering coefficient of all vertices (of degree at least two) in the network.
The two definitions of the clustering coefficient of a graph are both widely used. As Newman (2003) points out, these definitions reverse the order of the operations of taking a ratio and averaging: the first calculates the ratio of the total number of triangles to the total number of triples, whereas the second calculates the mean of the ratios obtained for individual vertices. The latter approach has the advantage of having a local value for each vertex and, hence, of being able to observe the distribution of the local clustering coefficient. However, it should be noted that, since the clustering coefficient of a vertex is in effect a local density measure, it is affected by the size of its neighborhood (i.e., low-degree vertices tend to have higher density due to the small denominator). As a consequence, the global measure of the clustering coefficient of a graph obtained by averaging local values weights low-degree vertices more heavily.
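The difference between the two definitions can be seen on a small example in which a high-degree vertex has a sparse neighborhood. The following is a sketch only, assuming an undirected graph and the convention of averaging over vertices of degree at least two; the function names and the toy graph are hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Build a neighbor-set dictionary from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def transitivity(adj):
    """Ratio of totals: closed triples over all connected triples."""
    closed = triples = 0
    for nbrs in adj.values():
        for u, v in combinations(nbrs, 2):
            triples += 1
            closed += v in adj[u]
    return closed / triples if triples else 0.0

def average_local_clustering(adj):
    """Mean of per-vertex ratios over vertices of degree at least two."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # no neighbor pairs; excluded from the average
        linked = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        values.append(2 * linked / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0

# Hypothetical toy graph: a triangle A-B-C in which C also has two
# pendant neighbors, D and E.
adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"),
                       ("C", "D"), ("C", "E")])
ratio_of_totals = transitivity(adj)
mean_of_ratios = average_local_clustering(adj)
print(ratio_of_totals, mean_of_ratios)
```

In this graph, the low-degree vertices A and B each have a local coefficient of 1, while the degree-four vertex C has 1/6. The average of local values is therefore 13/18 (about 0.72), whereas the transitivity ratio, which is dominated by the six triples centered on C, is 3/8.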
Regardless of which definition is used, however, many networks are found to exhibit a high clustering coefficient. That is, if vertex A is connected to vertex B and vertex B to vertex C, the probability that vertex A and C are also connected is much higher than the probability that two randomly selected vertices are connected. This finding indicates that vertices in a real network tend to cluster to form a cohesive structure.
In the induced network of delicious.com users, transitivity means that if two users are neighbors of the same third user, then the two users are likely to be connected to each other. In other words, if A and B, and B and C, have one or more items in common, then A and C have a high probability of having a shared item. A highly clustered group of users in this network indicates cohesive interests with a dense intersection of their bookmark collections. It should be noted, however, that a network induced from an affiliation network inherently has a higher clustering coefficient than a typical one-mode network, because all members of a group become completely connected in the process of projecting the two-mode graph into a one-mode graph. If users A, B, and C bookmarked the same item, the transitive relations among A, B, and C are given by definition.
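The effect of the projection can be illustrated with a small sketch in which two users are linked whenever they share at least one bookmarked item. The data, the function name `project_users`, and the linking rule as coded here are hypothetical simplifications for illustration, not the procedure or the data used in this study.

```python
from itertools import combinations

def project_users(bookmarks):
    """Project a user-item affiliation onto users.

    `bookmarks` maps each user to the set of items the user bookmarked.
    Returns the set of user pairs that share at least one item.
    """
    users_by_item = {}
    for user, items in bookmarks.items():
        for item in items:
            users_by_item.setdefault(item, set()).add(user)
    edges = set()
    for users in users_by_item.values():
        # Every pair of users of the same item becomes an edge,
        # so each item contributes a complete subgraph (a clique).
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges

# Hypothetical data: item x is bookmarked by A, B, and C; item y by B and D.
bookmarks = {"A": {"x"}, "B": {"x", "y"}, "C": {"x"}, "D": {"y"}}
edges = project_users(bookmarks)
print(sorted(edges))
```

Item x alone produces the triangle A-B-C, so the projected network contains a closed triad before any tendency of users to cluster is considered. An item bookmarked by n users contributes n(n - 1)/2 edges in the same way.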
The high clustering coefficient of a network, together with the short average path length discussed above, contributes to the small-world effect. A theoretical model for small-world networks, suggested by Watts and Strogatz (1998), characterizes a small-world network as a loosely connected set of highly clustered subgroups.