• No results found

In this chapter we have reviewed a collection of community detection algorithms, includ- ing variants specifically designed for bipartite networks, that have previously been used to cluster complex networks. We survey modifications of modularity-based algorithms to adapt them to the bipartite case. Modularity based algorithms suffer from a well-known resolution limit. We conclude that Infomap is the best-performing algorithm for cluster- ing large networks based on two benchmark comparative analyses [137, 89].

However, Infomap has not been specifically designed for bipartite networks, thus it does not perform as intended on bipartite networks as it does with unipartite networks. This issue is the subject of the next two chapters.

Chapter 3

Weighted projection method

In recent years, there has been increasing motivation to analyse bipartite networks in general as a separate network category, and in particular to investigate their community structure. The Infomap algorithm utilizes the information flow on a network in order to achieve its clustering. This information flow is approximated in practice by means of a random walk along the network, and iterating until a steady state distribution emerges, as it must, under the assumption that the network in question is strongly connected and aperiodic. However, the Infomap algorithm cannot run directly on a bipartite network without adaptation because a random walk is periodic, see Section 1.5.4. The Infomap

algorithm will not converge to a unique stationary distribution π as t−→ ∞.

To overcome this, a weighted projection approach based on common neighbours of set P of a bipartite network may be applied, to obtain a one mode network that can be clustered byInfomap, as we do in this Chapter. This can solve the issue, since in bipartite networks one set may attract most interest “while the other may be somewhat neglected” [154]. Alternative ways to break the periodicity of random walks on bipartite networks are to use one of our proposed strategies, as we discuss in Chapter 4.

The motivation of our approach is to be able to obtain a stationary distribution for a random walk on the unipartite network obtained by projection. This is achieved by integrating the projection process into the Infomap algorithm. We do the same for the

Louvain algorithm for comparison purposes. This allows us to compute the time com- plexity for the whole operation starting from converting bipartite networks to weighted unipartite networks followed by clustering them by the two algorithms, and to compare performance by the two algorithms. It is possible to use a different projection approach and then apply Infomap and Louvain algorithms, but we chose to integrate projection with Infomap and Louvain algorithms to simplify the testing process, e.g. when testing and optimizing our projection code.

In this Chapter, we survey the projection approaches for bipartite networks that have been already used in the literature in Section 3.1. In Section 3.2 we describe the weighted projection network we use. We compare the community detection outcomes from bench- mark with others in the literature on a small database, which is nonetheless a benchmark for bipartite clustering techniques, the “Southern Women” database [41] in Section 3.3. For 5 real world bipartite networks the results and experimental evaluation are given in Section 3.4. In Section 3.5 the complexity of the algorithm is studied and running time for all case studies are demonstrated. Section 3.6 discusses and highlights the important results we found.

3.1

Projection approaches

The projection method has been used for a long time in recommendation systems in the business area. Its strength is the idea that the emphasis is usually on one of the two node sets. These sets can be switched for different applications. So a projection method allows us to investigate bipartite networks using powerful one mode algorithms, after a transforming process. The projections of bipartite networks “remain an important methodological tool in research” [122]. The weighted projected network is much more informative than the unweighted one [94]. Since its weights are all integers it can be analyzed by standard techniques for unweighted graphs [128, 73].

CHAPTER 3 Weighted projection methods

times the corresponding nodes connect through a common neighbour [149].

Newman [125] extended the concept of the weights on the edge between nodes while working with scientific collaboration networks. He argued that two scientists whose names appear on a paper together with many other co-authors might know one another less well on average than two who were the only authors of a paper. To consider this effect, he defines the weight among the nodes using the factor

wij = X x 1 kx−1 (3.1)

where kx is the number of authors on paper x. So, in scientific collaboration networks,

if two scientists who only publish a single paper together with no other co-authors get weight of 1. However, if the two scientists have a co-author on the same paper, the weight on the tie between them is 0.5.

Another way to project bipartite networks is to produce a weighted directed one mode network based on classifying nodes in the original bipartite network as target (P) and intermediary (S), as mentioned in Collaborative Filtering (CF) [146]. The weight wij on

an edge between i, j ∈ P shows the importance of node i attached to node j measured by a specific function. These weights denote how important the ending node is to the starting one.

However, this method provides a weighted directed one mode network, and the main focus of the thesis is on weightedundirectedone mode networks. Moreover, it is expensive in terms of time complexity as CF can be done in O(m2hk

Pi+mnhkSi) [184].

Another example of projection is the Network-based Interference (NBI) used in rec- ommendation systems [184]. In this method an asymmetrical weighted approach on set

P has been proposed, where wij 6=wji. The weights on edges are stored in the weighted

adjacency matrix W = [wij]k×k. This process requires k2 memory space to store the

weighted matrix [184]. In Collaborative Filtering methods [76, 168, 179] the similarity

matrix between nodes is constructed and stored in m2 memory space.

Another projection technique is to construct a block adjacency matrix that represents the binary bipartite network as in (1.28). The projection of set P has weighted incident matrix is AAT, and the projection of set S has ATA [116]. This is a matrix imple- mentation of the common neighbours approach. We use an adaptation of the common neighbours approach here because of its simplicity and efficiency.

Guimera et al [73] found no difference in the node communities detected inP whether they resulted from modularity maximisation after projection, or projection after bipartite modularity maximisation.

However, one-mode projection implies a certain degree of information loss [94]. An example of the lost information is in movie bipartite networks where there is a set of actors and a set of movies. The bipartite network encodes node attributes such as the running time of movies the actors are common in [94]. This information can be lost when transferring the bipartite network to a one mode network. Another example is the natural structure of bipartite network where in general we can not obtain the bipartite network from its one mode projected network [184, 71], see Figure 3.1.1 for more clarity.

Figure 3.1.1: Illustration of structurally different bipartite networks which exhibit an identical one-mode projection.

CHAPTER 3 Weighted projection methods