• No results found

Complex Network Community Detection by Improved Nondominated Sorting Genetic Algorithm

N/A
N/A
Protected

Academic year: 2020

Share "Complex Network Community Detection by Improved Nondominated Sorting Genetic Algorithm"

Copied!
5
0
0

Loading.... (view fulltext now)

Full text

(1)

2019 International Conference on Information Technology, Electrical and Electronic Engineering (ITEEE 2019) ISBN: 978-1-60595-606-0

Complex Network Community Detection by Improved Nondominated

Sorting Genetic Algorithm

Wen-jun LIU

1,*

and Bin CHEN

2

1

School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430065, China

2

Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan 430065, China

*Corresponding author

Keywords: Complex networks, Community detection, NSGA2, LPA, Modularity.

Abstract. Aimed at the problems of low solution precision and easy to be trapped into local optima by single objective evolutionary algorithm, a self-adaptive multi-objective optimization algorithm based on nondominated sorting genetic algorithm II (NSGA2) and Label Propagation Algorithm (LPA) is proposed. The algorithm takes Kernel K-means (KKM) and Ratio Cut (RC) as the objective functions. Two new crossover operator and the improved mutation operator is used to achieve the evolution of the population. We conducted simulation experiments in the computer-generated networks and the real-world networks environment. The results show that compared with other community detection algorithms, our algorithm has the advantages of high resolution and strong search ability, and it can effectively identify the community structure in complex networks.

Introduction

Real-world systems are complex, but they can be described by abstract network models, such as ecosystem, social network and so on. By analyzing the community structure in complex network, features in complex networks can be found more intuitively [1,2,3].

Community detection can be modeled as an optimization problem. For example, Pizzuti has proposed a single objective genetic algorithm(GA-net) for community detection[4].Because of the limit of single-objective evolutionary method based on modularity several multi-objective evolutionary methods has been proposed, such as MOGA-net [5].

In this paper, we propose a novel community algorithm based on NSGA2 and LPA, termed as NSGA2-LPA. We use the multi-objective evolutionary algorithm based on NSGA2 which was proposed by Deb in 2002. NSGA2 has high computational efficiency when dealing with low-dimensional problems, and the solution set has good diversity and convergence [6,7].

Related Background

Complex Network

A complex network is usually represented by G = (V, E), where V represents the set of all nodes in the complex network, and E represents the set of all edges in the complex network. The topology of complex networks is usually represented by adjacent matrix A = (𝑎𝑖𝑗)𝑛×𝑛, where n is the number of nodes in the network,𝑎𝑖𝑗 = 1, if there is a link between node i and j ,otherwise,𝑎𝑖𝑗 = 0.

Optimization Problem

The community detection algorithm based on improved NSGA2 selects KKM as the evaluation function of the community‘s internal density and RC as the evaluation function of the community‘s external density [8]. Given an unsigned network denoted as G = (V, E) with |V| = n nodes

(2)

as C = {𝐶1, 𝐶2, ⋯ , 𝐶𝑘}, Then KKM and RC can be defined as

min {𝐾𝐾𝑀 = 2(𝑛 − 𝑘) − ∑

𝐿(𝐶𝑖,𝐶𝑖)

|𝐶𝑖|

𝑘 𝑖=1

𝑅𝐶 = ∑ 𝐿(𝐶𝑖,𝐶̅ )𝑖

|𝐶𝑖|

𝑘 𝑖=1

(1)

where L(𝐶𝑖, 𝐶𝑗) is the number of edges between nodes in community i ;L(𝐶𝑖, 𝐶̅ )𝑖 is the number of

edges between nodes in community i and nodes outside community i.

The Proposed NSGA2-LPA for Community Detection

Encoding and Initialization

We adopt character-based encoding and each gene represents the community number of the node. The character-based encoding process is shown in the Fig. 1. Fig. 1(a) is the network topology, and Fig. 1(b) is the code of nodes in the network. The community number for nodes 1,2, and 3 are all 1, so they belong to the same community. Fig. 1(c) is a schematic diagram of network community structure, node 1,2 and 3 form the same community, and node 4,5,6,7 form the same community.

Figure 1. (a) Network structure diagram. (b) Gene encode table. (c) Community structure map.

This paper adopts the initialization method of randomly assigning community numbers to each node, and randomly generates a population composed of n individuals.

Crossover

(a) Multipoint crossover operator based on community structure

[image:2.595.147.451.307.367.2]

Two chromosomes A and B are randomly selected as the crossed source and target chromosomes respectively which are shown in the Fig. 2(a). Randomly select two nodes from the source chromosome A and get the node numbers marked a and b, get the community structure from a to b on source chromosome A and use the community structure to change the region from a to b on target chromosome B to generate a new chromosome as an individual in offspring population. Exchange the source and target chromosomes to generate another new chromosome.

Figure 2. Two crossover operator.

(b) Single point crossover operator based on LPA

[image:2.595.197.399.547.649.2]
(3)

Mutation

In the process of mutation, we randomly pick a chromosome to be mutated. And a node is picked randomly on the chromosome to be the change-node. Randomly select a point that has a link with the change-node as the mutation guidance node, and change the community number of the change-node to the community number of mutation guidance node.

The Main Loop of the NSGA2-LPA Algorithm

Given a network N and the graph G modeling it, NSGA2-LPA optimizes the two objectives (1) presented in Section 2.2. The algorithm starts with a population initialized at random, and then applies the modified crossover and mutation operators to produce the new population. Find the Pareto front set by using the nondominant sort of NSGA2, and the crowding distance is used to select the optimal set as the offspring population. After that, we adopt the concept of modularity, introduced by Girvan and Newman to assess the quality of a partition, to select, among the solutions found, that having the highest value of modularity.

Experimental Results

In order to verify the performance of the algorithm proposed in this paper, Experiments have been carried out on both computer synthetic networks and six real-life networks. And four algorithms named as MOGA-net[9],GA-net[10], IGA[11],GN[1] are chosen to compare with the proposed approach.All experiments were carried out under the desktop PC of Inter(R)Core(TM)i5-4460 processor, 3.2ghz main frequency, 4g memory and Windows 10 operating system.

Synthetic Data Set

LFR(lancichinetti fortunato-radicchi) benchmark network is currently the most commonly used computer-generated data set in community detection which is proposed by Lancichinetti[12]. The LFR network model used in this paper takes as the parameter is:N = 128,k = 16,𝑘𝑚𝑎𝑥=

16,𝑐𝑚𝑎𝑥 = 32,𝑐𝑚𝑖𝑛 = 32,μincreases from 0 to 0.5 in steps of 0.05.

The first evaluation index used in this paper is the function modularity Q proposed by Newman in [1] which is widely used to measure the compactness of community division. The second evaluation index is normalized mutual information (NMI) whichis used to evaluate the similarity between the true community structure and the detected ones[13,14].

Figure 3. (a)Comparing NSGA2-LPA with different algorithms in terms of nmi on LFR benchmark(b)Comparing NSGA2-LPA with different algorithms in terms of Q on real-life network benchmark.

[image:3.595.72.525.520.707.2]
(4)

performance of GN algorithm decreases whenμ=0.4, and the performance of our algorithm decreases when μ=0.5 . Therefore, the NSGA2-LPA algorithm proposed in this paper has advantages in accuracy and efficiency in LRF network test.

[image:4.595.82.510.155.248.2]

Real-life Data Set

Table 1. Information of real network data sets.

Network Node Edge Description

karate 34 78 Zachary’s karate club

dolphin 62 159 Dolphin social network

book 105 441 Books about US politics

football 115 613 American College football netscience 1461 2742 Coauthorship network

power grid 4941 6594 American Power Grid

In Fig. 3(b), we show the modularity obtained by each algorithm, and the abscissas 1-6 represent the karate network, the dolphin network, the books network, the football network, the netscience network, and the power grid network [15,16,17,18] which are detailed description is shown in table 1. It can be seen from the experiment that in most cases, the evolutionary algorithm is higher than the modularity solved by the traditional algorithm like GN algorithm, indicating that the evolutionary algorithm is effective for community detection. And in large complex networks, GN algorithms cannot successfully divide the community structure due to the limitations of the algorithm. Compared to single-objective evolutionary algorithms, multi-objective algorithms improve the accuracy of partitions over most networks, especially for large complex networks.

Figure 4. (a) The true partitioning of the Bottlenose Dolphins network. (b) The partitioning with highest Q detected by NSGA2-LPA.

Dolphin dataset is a network of dolphins that Lesseau et al. observed in a group of bottlenose dolphins inhabiting the Doubtful Sound Bay in New Zealand after seven years. A tie between two dolphins was established by their statistically significant frequent association. The network has 62 nodes and 159 edges. The network naturally is separated into two large groups, the female group and the male one as shown in Fig. 4(a). From the Pareto front set of NSGA2-LPA, we can get a high modularity value of the dolphin network, Q=0.5263. And the partitioning with highest Q detected by NSGA2-LPA is shown in Fig. 4(b). It is clearly known from this experiment, NSGA2-LPA is a more effective algorithm for community detection

Concluding Remarks

[image:4.595.104.494.392.539.2]
(5)

precision and wide applicability. It is more suitable for large complex networks than GN algorithm and it has better accuracy than evolutionary algorithms such as GA-net, IGA, and MOGA-net. It also shows that the proposed method can efficiently reveal community structures in large complex networks.

References

[1] Girvan M, Newman M E J. Community structure in social and biological networks[J]. Proc Natl Acad Sci U S A, 2001, 99(12):7821-7826.

[2] Newman M E, Girvan M. Finding and evaluating community structure in networks.[J]. Physical Review E Statistical Nonlinear & Soft Matter Physics, 2004, 69(2):026113.

[3] Newman M E J. Detecting community structure in networks[J]. European Physical Journal B, 2004, 38(2):321-330.

[4] Pizzuti C. GA-Net: A Genetic Algorithm for Community Detection in Social Networks[C]// Parallel Problem Solving from Nature - PPSN X, 10th International Conference Dortmund, Germany, September 13-17, 2008, Proceedings. Springer-Verlag, 2008.

[5] Pizzuti, C. A Multiobjective Genetic Algorithm to Find Communities in Complex Networks[J]. IEEE Transactions on Evolutionary Computation, 2012, 16(3):0-0.

[6] Deb K. A fast and elitist multiobjective genetic algorithm : NSGA-II[J]. IEEE Transactions on Evolutionary Computation, 2002, 6.

[7] Raghavan U N, Albert, Réka, Kumara S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical Review E, 2007, 76(3):036106.

[8] Angelini L, Boccaletti S, Marinazzo D, et al. Identification of network modules by optimization of ratio association[J]. Chaos, 2006, 17(2):175.

[9] Pizzuti, C. A Multiobjective Genetic Algorithm to Find Communities in Complex Networks[J]. IEEE Transactions on Evolutionary Computation, 2012, 16(3):0-0.

[10] Pizzuti C. GA-Net: A Genetic Algorithm for Community Detection in Social Networks[C]// Parallel Problem Solving from Nature - PPSN X, 10th International Conference Dortmund, Germany, September 13-17, 2008, Proceedings. Springer-Verlag, 2008.

[11] Deng kun, zhang jianpei, Yang jing. Community detection in complex networks using an improved genetic algorithm [J]. Journal of Harbin engineering university, 2013(11):1438-1444.

[12] Lancichinetti A, Fortunato S, Radicchi F. Benchmark graphs for testing community detection algorithms[J]. Physical Review E, 2008, 78(4 Pt 2):046110.

[13] Wu F, B.A. Huberman. Finding communities in linear time: a physics approach[J]. The European Physical Journal B, 2004, 38(2):331-338.

[14] Danon L, Duch J, Diaz-Guilera A, et al. Comparing community structure identification[J]. Journal of Statistical Mechanics, 2005, 2005(09):09008.

[15] M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99, 7821-7826 (2002).

[16] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 396-405 (2003).

[17] D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998).

Figure

Figure 1. (a) Network structure diagram. (b) Gene encode table. (c) Community structure map
Figure 3. (a)Comparing NSGA2-LPA with different algorithms in terms of nmi on LFR benchmark(b)Comparing NSGA2-LPA with different algorithms in terms of Q on real-life network benchmark
Table 1. Information of real network data sets.

References

Related documents

Another trial not included in the meta- analysis randomised 34 head injury patients to receive either 1.6% sodium chloride or Ringer's lactate, but used 0.45% sodium chloride as

There is some data from Ireland indicating that addiction treatment episodes related to NPS use escalated very rapidly in the first months of 2010 and then declined in the two years

Glomus tumor is a rare, benign tumor derived from structures known as glomus bodies [1,2,3,4]... This has been known as the most frequent site in women, while the tumor occurs

Vandetanib versus placebo in patients with advanced non-small-cell lung cancer after prior therapy with an epidermal growth factor receptor tyrosine kinase inhibitor:

(Added-AFRC) AFRC crewmembers who are required by the appropriate AFI 11-2 MDS Specific Volume 1 to attend a simulator or ground training session where CRM is part of the

According to Kumanyika, Wilson, and Guilford-Davenport (1993), the social environment of African American women is less negative about obesity than might commonly be assumed, based

The best 0.1% (10 sets of TMD parameters) design that produces the best performance criteria of peak displacement and drift is recorded and selected first to investigate the trend