CHAPTER 3: RESEARCH METHODS
3.1 Social network analysis
Social network analysis (SNA) is applied to examine the characteristics of co-authorship networks. SNA uses an established set of mathematical algorithms to map and analyze relationships among entities (Wasserman & Faust, 1994). In network analysis, the attributes of the entities are not ignored, but rather seen in the light of the relationships the nodes have with one another. Several key structural measures and notions in SNA are the result of researchers' insights into empirical phenomena and are driven by social theory (Wasserman & Faust, 1994). Social network methods have been developed over the past seven decades as an integral part of developments in social theory, empirical research, mathematics and statistics (Wasserman & Faust, 1994).
3.1.1 Constructing a co-authorship network
A network of researchers can be constructed by linking two researchers whenever they co-author a scholarly paper. In this case, scholars form the nodes and the paper they have co-authored represents the link between them.
For example, if four authors, V1 = [a,b,c,d], co-write a paper, the links are represented as
E1 = [[a,b],[a,c],[a,d],[b,c],[b,d],[c,d]].
Again, when c co-writes a paper with f, V2 = [c,f], the link is represented as E2 = [c,f].
Similarly, when d co-writes a paper with b and i, V3 = [d,b,i], the links are represented as
E3 = [[d,b], [d,i],[b,i]].
The lines between the nodes in a co-authorship network are undirected, symbolizing a mutual relationship. The example is depicted graphically in Figure 3.1.
Figure 3.1: An example of a co-authorship network
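The pairwise links in the example above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study; the helper name coauthor_links is hypothetical, and a single link is returned as a list containing one pair.

```python
from itertools import combinations

def coauthor_links(authors):
    """Return every unordered pair of co-authors on one paper."""
    return [list(pair) for pair in combinations(authors, 2)]

# Author lists of the three example papers.
V1 = ["a", "b", "c", "d"]
V2 = ["c", "f"]
V3 = ["d", "b", "i"]

E1 = coauthor_links(V1)  # six links among four authors
E2 = coauthor_links(V2)  # one link
E3 = coauthor_links(V3)  # three links
```

A paper with k authors therefore contributes k(k - 1)/2 links.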
If two authors wrote a paper together, a weight of one was accorded to their relationship. When they co-authored two or more times, their edges were merged into a single edge whose weight equals the number of co-authored papers. For example, if A and B co-authored three papers, only one edge passes between them, but that edge carries a weight of three. Optionally, higher weights could be visualized by thickening the edge between A and B. The edge weight is not divided (fractionalized) by the number of authors on the paper.
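The merging of repeated co-authorships into a single weighted edge can be sketched as follows. This is a minimal illustration under the assumption that each paper is given as a list of author labels; the function name weighted_edges is hypothetical.

```python
from collections import Counter
from itertools import combinations

def weighted_edges(papers):
    """Count how many papers each unordered pair of authors shares."""
    weights = Counter()
    for authors in papers:
        # Sorting makes (d, b) and (b, d) the same undirected edge.
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1  # full count, not divided by team size
    return weights

papers = [["a", "b", "c", "d"], ["c", "f"], ["d", "b", "i"]]
weights = weighted_edges(papers)
```

In this example b and d share two papers, so the edge between them carries a weight of two while every other edge carries a weight of one.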
3.1.2 The evaluation of topological characteristics of a network
Topological characteristics of the network are evaluated at two levels: the global or macro level and the local or micro level. At the global level, by calculating density, geodesic paths, clustering coefficients and degree distribution, the overall features of the network are revealed. Global properties indicate the concentration of authority, control and other resources within the network (Yan et al., 2010). At the local level, measures such as degree, betweenness, closeness and PageRank centralities reveal the properties of individual nodes. Centralities indicate the influence of actors in terms of their popularity, approachability, brokerage power and prestige. The social behaviour of authors is governed by opportunities, which in turn determine the influence of actors in the network (Yan et al., 2010).
A path is the sequence of vertices ‘walked’ from one vertex in the network to another. The geodesic distance between two nodes is the length, in edges, of the shortest path between them. There may be more than one geodesic path between two vertices at any particular point in time.
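Geodesic distances, and the number of geodesic paths between two vertices, can be obtained with a breadth-first search. The sketch below assumes an unweighted network stored as a dictionary of neighbour sets; the function name geodesics and the example adjacency (taken from the three papers of Section 3.1.1) are illustrative only.

```python
from collections import deque

def geodesics(adj, source):
    """Breadth-first search from source.

    Returns (dist, count): geodesic distance to each reachable vertex
    and the number of distinct geodesic paths leading to it.
    """
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:           # first time v is reached
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u lies on a shortest path to v
                count[v] += count[u]
    return dist, count

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
}
dist, count = geodesics(adj, "f")
```

Here the geodesic distance from f to i is three, and there are two geodesic paths between them (f-c-b-i and f-c-d-i).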
A component is a set of nodes joined in such a way that any node in the set can reach any other node in the set by “…traversing a suitable path of intermediate collaborators” (Newman, 2004a, p. 5202). A giant component is the component having the largest number of nodes. Initially, most nodes in a network exist either in isolation or in small clusters. Then, as new vertices and edges are added, the network grows dynamically to a tipping point, also known as the percolation level, at a special value of probability:
$$ P = \frac{1}{n} \tag{1} $$
where n is the number of vertices and P is the value of the probability above which a giant component forms (Newman, 2007). In a co-authorship network, a giant component can reflect the group in which the main or central activity is taking place.
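Components, and hence the giant component, can be identified by repeated graph traversal. The following sketch assumes the network is stored as a dictionary of neighbour sets; the vertices x, y and z are hypothetical additions to the earlier example, included only to create a small cluster and an isolated node.

```python
def components(adj):
    """Return the connected components of an undirected network as sets."""
    seen = set()
    result = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = set()
        while stack:  # depth-first traversal of one component
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(comp)
    return result

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
    "x": {"y"},   # hypothetical small cluster
    "y": {"x"},
    "z": set(),   # hypothetical isolated author
}
parts = components(adj)
giant = max(parts, key=len)  # component with the most nodes
```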
The clustering coefficient, C, is also known as ‘transitivity’ and more accurately as the ‘fraction of transitive triples’ (Wasserman & Faust, 1994). Mathematically, the clustering coefficient is calculated as:
$$ C = \frac{3 \times \text{no. of triangles}}{\text{no. of connected triples}} \tag{2} $$
where the number of triangles represents trios of nodes in which each node is connected to both others, and connected triples represent trios of nodes in which at least one node is connected to both others (Barabasi et al., 2002; Newman, 2004a).
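The ratio of triangles to connected triples can be computed directly from the neighbour sets. The sketch below assumes that a connected triple is counted once for each vertex that is linked to the two others, so that a triangle contributes three triples; the function name transitivity and the example network are illustrative.

```python
from itertools import combinations

def transitivity(adj):
    """Clustering coefficient: 3 x triangles / connected triples."""
    triples = 0  # paths u - v - w centred on v
    closed = 0   # triples whose end vertices u and w are also linked
    for v, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    triangles = closed // 3  # each triangle is seen from its three corners
    return 3 * triangles / triples if triples else 0.0

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
}
C = transitivity(adj)
```

The example network has five triangles and 22 connected triples, giving C = 15/22, or about 0.68.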
The density of a network indicates the number of links in the network in ratio to the maximum possible number of links. The density, D, of an undirected network G (a cooperation network in which the relationship is mutual) with n vertices is expressed as (Otte & Rousseau, 2002),
$$ D = \frac{2L}{n(n-1)} \tag{3} $$
where L is the number of links in G and n(n-1)/2 is the maximum possible number of links.
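As a small worked illustration, density can be computed by dividing the number of observed links by the maximum possible number, n(n - 1)/2, for an undirected network with n vertices. The function name density and the example links are assumptions made for this sketch.

```python
def density(n_vertices, n_links):
    """Density of an undirected network: links / maximum possible links."""
    if n_vertices < 2:
        return 0.0  # no link is possible with fewer than two vertices
    return 2 * n_links / (n_vertices * (n_vertices - 1))

# Distinct links of the example network from Section 3.1.1.
links = {
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
    ("c", "d"), ("c", "f"), ("b", "i"), ("d", "i"),
}
vertices = {v for link in links for v in link}
D = density(len(vertices), len(links))
```

With six vertices and nine links, the density is 9/15 = 0.6.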
Degree is the most common and probably the most effective centrality measure to determine both the influence and importance of a node. The degree of a vertex is simply the number of edges incident on it. Mathematically, the degree $k_i$ of a vertex i is
$$ k_i = \sum_{j=1}^{n} g_{ij} \tag{4} $$
where $g_{ij} = 1$ if there is a connection between vertices i and j and $g_{ij} = 0$ if there is no such connection (Otte & Rousseau, 2002).
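The degree of each vertex is a row sum of the adjacency matrix. The sketch below builds that matrix for the example network of Section 3.1.1; the variable names are illustrative.

```python
nodes = ["a", "b", "c", "d", "f", "i"]
edges = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
    ("c", "d"), ("c", "f"), ("b", "i"), ("d", "i"),
]

index = {v: pos for pos, v in enumerate(nodes)}
n = len(nodes)

# g[i][j] = 1 if vertices i and j are connected, 0 otherwise.
g = [[0] * n for _ in range(n)]
for u, v in edges:
    g[index[u]][index[v]] = 1
    g[index[v]][index[u]] = 1  # undirected: the matrix is symmetric

# Degree of each vertex is the sum of its row.
degree = {v: sum(g[index[v]]) for v in nodes}
```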
Betweenness centrality of a vertex i is the fraction of geodesic paths between other vertices that pass through i, which can be represented mathematically as
$$ b(i) = \sum_{j,k} \frac{m_{jik}}{m_{jk}} \tag{5} $$
where $m_{jk}$ is the number of geodesic paths from vertex j to vertex k (j, k ≠ i) and $m_{jik}$ is the number of geodesic paths from vertex j to vertex k passing through vertex i (Linton, 1977; Otte & Rousseau, 2002).
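A direct, unoptimised way to evaluate this sum is to count geodesic paths with a breadth-first search from every vertex. The sketch below assumes an unweighted network and sums over unordered pairs (j, k); it uses the fact that i lies on a geodesic from j to k exactly when the distances j to i and i to k add up to the distance j to k. The function names are hypothetical, and faster algorithms exist for large networks.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, count): geodesic distances and path counts from source."""
    dist, count = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count

def betweenness(adj):
    """Sum over pairs (j, k) of the share of j-k geodesics through i."""
    info = {v: bfs_counts(adj, v) for v in adj}
    b = dict.fromkeys(adj, 0.0)
    for j, k in combinations(adj, 2):
        dist_j, count_j = info[j]
        if k not in dist_j:
            continue  # j and k are in different components
        for i in adj:
            if i == j or i == k:
                continue
            dist_i, count_i = info[i]
            if i in dist_j and k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                # geodesics through i = (paths j to i) x (paths i to k)
                b[i] += count_j[i] * count_i[k] / count_j[k]
    return b

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
}
b = betweenness(adj)
```

In the example, c scores highest because every geodesic path from f to the rest of the network passes through it.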
Closeness centrality of a vertex i is the average geodesic distance from i to every other node in the network. Mathematically, this is computed as
$$ c_i = \sum_{j} d_{ij} \tag{6} $$
where $d_{ij}$ is the number of edges in the geodesic path from vertex i to vertex j (Otte & Rousseau, 2002). Dividing this sum by n - 1 gives the average distance.
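The sum of geodesic distances from a vertex, and the corresponding average, can be obtained with one breadth-first search per vertex. The sketch below assumes a connected, unweighted network; the function name distances_from and the example network are illustrative.

```python
from collections import deque

def distances_from(adj, source):
    """Geodesic distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
}
n = len(adj)

# Sum of geodesic distances from each vertex to all others.
total_distance = {v: sum(distances_from(adj, v).values()) for v in adj}
# Average distance; assumes every vertex can reach all n - 1 others.
mean_distance = {v: total_distance[v] / (n - 1) for v in adj}
```

A smaller value indicates a vertex that is, on average, closer to the rest of the network; in the example c has the smallest total distance among the vertices and f the largest.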
PageRank is a link analysis algorithm (Page, Brin, Motwani, & Winograd, 1999) that measures the relative importance of nodes within the network. PageRank works on the premise that having links to page p from important pages is a good indication that page p is an important one too. PageRank was initially proposed for digraphs. However, it can be calculated for an undirected graph, such as the co-authorship network, by making each edge bidirectional.
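The idea of treating each undirected edge as a pair of directed links can be sketched with a simple power iteration. This is an illustration only: the damping factor of 0.85 and the fixed number of iterations are assumptions not specified in this chapter, edge weights are ignored, and the function name pagerank is hypothetical.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on an undirected network.

    Each undirected edge {u, v} acts as two directed links, u -> v and
    v -> u. Assumes every vertex has at least one neighbour.
    """
    n = len(adj)
    rank = dict.fromkeys(adj, 1.0 / n)
    for _ in range(iterations):
        new_rank = {}
        for v in adj:
            # Each neighbour u shares its rank equally among its links.
            incoming = sum(rank[u] / len(adj[u]) for u in adj[v])
            new_rank[v] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank

adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d", "i"},
    "c": {"a", "b", "d", "f"},
    "d": {"a", "b", "c", "i"},
    "f": {"c"},
    "i": {"b", "d"},
}
pr = pagerank(adj)
```

The scores sum to one, and vertices in symmetric positions, such as b and d in the example, receive the same score.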