A New Method on Modularity Gain Derivation and Enhanced LM

2017 2nd International Conference on Computer, Mechatronics and Electronic Engineering (CMEE 2017) ISBN: 978-1-60595-532-2

Bo YANG¹, Yuan JIANG¹, Shao-yu LI¹, Guang-hui YAN²,* and Ya-fei WANG²

¹ Gansu Electric Power Information & Communication Centre, China

² School of Electronic and Information Engineering, Lanzhou Jiaotong University, China

*Corresponding author

Keywords: Complex network, Community Detection, Modularity, LM.

Abstract. The Louvain Method (LM), one of the mainstream community detection approaches based on modularity optimization, is widely used by virtue of its nearly linear time complexity and high-quality community detection, but it has deficiencies with respect to both theory and efficiency. First, we present a method to calculate the Q gain after a node leaves its community, which existing research does not provide, and thereby supplement the theoretical research in this field. Second, in view of the high storage demands of LM and the sparse nature of complex networks, we propose an isolated node separation strategy, which retains only the connected nodes in each iteration. Experimental results on synthetic and real networks illustrate the effectiveness and efficiency of our approach.

Introduction

Analyzing and predicting the complex relationship networks extracted from massive data is an essential part of research in many contemporary disciplines, spanning social science [1, 2], biology [3, 4], computer science [5-7], energy [8] and economics [9]. In such networks it is possible to identify groups of nodes that are densely connected among themselves but sparsely connected to the rest of the network. Research on community discovery methods is important for analyzing the topology and hierarchical structure of complex networks, understanding how communities form, predicting the dynamic change of complex networks and revealing their regular characteristics [10].

The community detection problem is challenging partly because there is no universally accepted quantitative definition of a community. To deal with this situation, modularity Q [11, 12], one of the most widely used definitions, was proposed. Researchers have developed modularity optimization algorithms that use heuristic strategies to mine the network structure, mainly based on greedy strategies [13, 14], hierarchical clustering [15, 16] and the integration of multiple strategies [17-19] (greedy strategy, local optimization, hierarchical clustering, etc.). Among these approaches, the method proposed by Blondel et al. [17], also called the Louvain Method (LM), has become widely used by virtue of its relative computational efficiency and the high quality of its community detection results.

LM is a multiscale method that obtains the best result it can find in each iteration. In each iteration, modularity is first optimized using a greedy local algorithm; then a 'supernetwork' is formed whose nodes represent the communities discovered, and the greedy algorithm is repeated on this supernetwork. The process iterates until there is no further improvement in modularity.
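The supernetwork step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `aggregate`, the edge-dictionary layout and the toy graph are assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate(edges, community):
    """Collapse every community into a single super-node.

    edges: dict mapping an undirected edge (u, v) to its weight,
           with each edge listed once (hypothetical layout).
    community: dict mapping each node to its community id.
    Returns a dict mapping (cu, cv) with cu <= cv to the summed weight.
    Entries with cu == cv are self-loops holding intra-community weight.
    """
    super_edges = defaultdict(float)
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        key = (min(cu, cv), max(cu, cv))
        super_edges[key] += w
    return dict(super_edges)


# Toy graph: two triangles joined by the single edge (2, 3).
edges = {
    (0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
    (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0,
    (2, 3): 1.0,
}
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
super_edges = aggregate(edges, community)
```

In this toy case each triangle becomes one super-node with a self-loop of weight 3, and the two super-nodes are joined by an edge of weight 1. The greedy local optimization would then be repeated on this smaller network.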

Although LM is efficient, widely used and gives informative results, it has deficiencies and room for improvement. Here we supplement the theoretical basis of modularity Q and reduce the memory requirements of LM in a principled and flexible manner.

second type of gain. In effect, we give a method for calculating the Q gain after a node leaves its original community, which completes the theoretical research in the field.

Second, the LM algorithm produces high-quality community discovery results, partly because it provides the hierarchical community structure through its intermediate results. Storing these intermediate results forces the LM algorithm to demand more storage space, about 20 times that of other similar algorithms [17]. In this paper, an improved strategy that removes isolated nodes is introduced into the LM algorithm. Experiments show that, compared with the original algorithm, the improved algorithm reduces both the storage space required and the running time.

Modularity Q and Gain Calculation Method

Modularity Q

The modularity Q has two equivalent definitions [14], one based on the adjacency matrix and the other on the community connection matrix of the network. Here we give only the latter.

$$
Q = \sum_{i=1}^{k} \left[ \frac{e_{ii}}{2E} - \left( \frac{a_i}{2E} \right)^2 \right] \qquad (1)
$$

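In Eq. 1, k is the number of communities, $e_{ii}$ is the ith diagonal element of the community connection matrix, $a_i$ is the sum of the elements in its ith row, and E is the total edge weight of the network. In terms of the community connection matrix, the larger the value of Q, the larger the share of the sum of all matrix elements that is accounted for by the elements on the diagonal. The modularity Q gives a clear definition of the community structure, and its value range is [-0.5, 1].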
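As an illustration of Eq. 1, the following Python sketch evaluates Q directly from a community connection matrix. It assumes the convention that entry (i, j) holds the sum of adjacency-matrix entries between communities i and j, so each intra-community edge is counted twice on the diagonal and the sum of all entries equals 2E. The function name `modularity` and the example matrix are illustrative, not taken from the paper.

```python
def modularity(e):
    """Evaluate Q from a community connection matrix e (list of lists).

    Assumed convention: e[i][j] is the sum of adjacency-matrix entries
    between communities i and j, so the sum of all entries equals 2E.
    """
    two_E = sum(sum(row) for row in e)
    q = 0.0
    for i, row in enumerate(e):
        a_i = sum(row)  # total degree of community i
        q += e[i][i] / two_E - (a_i / two_E) ** 2
    return q


# Two triangles joined by one edge, one community per triangle:
# 3 internal edges each (counted twice on the diagonal) and 1 bridge.
e = [[6.0, 1.0],
     [1.0, 6.0]]
q = modularity(e)  # 2 * (6/14 - (7/14)**2) = 5/14
```

For this partition the diagonal carries 12 of the 14 units of total degree, which is what pushes Q well above zero.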

Q-Gain Calculation Method

Consider a network divided into n communities, and let A be its community connection matrix. Merging the (n−1)th community and the nth community, that is, adding the elements of the nth row (column) of A to its (n−1)th row (column), we get B. The Q gain of the merge is then calculated as shown in Eq. 2.

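$$
A = \begin{pmatrix}
e_{1,1} & e_{1,2} & \cdots & e_{1,n-1} & e_{1,n} \\
e_{2,1} & e_{2,2} & \cdots & e_{2,n-1} & e_{2,n} \\
\vdots & \vdots & & \vdots & \vdots \\
e_{n-1,1} & e_{n-1,2} & \cdots & e_{n-1,n-1} & e_{n-1,n} \\
e_{n,1} & e_{n,2} & \cdots & e_{n,n-1} & e_{n,n}
\end{pmatrix}
$$

$$
B = \begin{pmatrix}
e_{1,1} & e_{1,2} & \cdots & e_{1,n-2} & e_{1,n-1} + e_{1,n} \\
e_{2,1} & e_{2,2} & \cdots & e_{2,n-2} & e_{2,n-1} + e_{2,n} \\
\vdots & \vdots & & \vdots & \vdots \\
e_{n-2,1} & e_{n-2,2} & \cdots & e_{n-2,n-2} & e_{n-2,n-1} + e_{n-2,n} \\
e_{n-1,1} + e_{n,1} & e_{n-1,2} + e_{n,2} & \cdots & e_{n-1,n-2} + e_{n,n-2} & e_{n-1,n-1} + e_{n-1,n} + e_{n,n-1} + e_{n,n}
\end{pmatrix}
$$

$$
\begin{aligned}
\Delta Q_{merge} &= Q_B - Q_A \\
&= \left[ \sum_{i=1}^{n-2} \left( \frac{e_{ii}}{2E} - \left( \frac{a_i}{2E} \right)^2 \right) + \frac{e_{n-1,n-1} + e_{n-1,n} + e_{n,n-1} + e_{n,n}}{2E} - \left( \frac{a_{n-1} + a_n}{2E} \right)^2 \right] - \sum_{i=1}^{n} \left( \frac{e_{ii}}{2E} - \left( \frac{a_i}{2E} \right)^2 \right) \\
&= \frac{e_{n-1,n} + e_{n,n-1}}{2E} - \frac{2\, a_{n-1}\, a_n}{(2E)^2} \\
&= \frac{1}{E} \left( e_{n-1,n} - \frac{a_{n-1} \cdot a_n}{2E} \right)
\end{aligned}
\qquad (2)
$$

We denote by $c_v$ the community to which node v belongs.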

$c'_v$ signifies the community that remains after removing v and the edges connected to v from $c_v$, where we regard v as a community with only one node. Inverting the derivation of Eq. 2, the Q gain after v leaves $c_v$ is calculated as in Eq. 3.

$$
\Delta Q_{depart} = \frac{1}{E} \left( \frac{a_v \left( a_{c_v} - a_v \right)}{2E} - e_{v,c'_v} \right) \qquad (3)
$$

where $a_v$ denotes the degree of node v, $a_{c_v}$ is the sum of the degrees of all nodes in $c_v$, and $e_{v,c'_v}$ is the sum of the weights of the edges between node v and community $c'_v$.
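The two gain formulas can be checked numerically with a short Python sketch. It assumes an undirected network and the convention that the community connection matrix holds sums of adjacency-matrix entries, so all entries sum to 2E. The helper names (`modularity`, `merge`, `merge_gain`, `depart_gain`) and the example matrix are hypothetical and serve only to compare the closed forms of Eq. 2 and Eq. 3 with a direct recomputation of Q.

```python
def modularity(e):
    """Q from a community connection matrix (all entries sum to 2E)."""
    two_E = sum(sum(row) for row in e)
    return sum(e[i][i] / two_E - (sum(row) / two_E) ** 2
               for i, row in enumerate(e))


def merge(e, p, q):
    """Connection matrix after adding row/column q into row/column p."""
    keep = [i for i in range(len(e)) if i != q]
    b = [[e[i][j] for j in keep] for i in keep]
    ip = keep.index(p)
    for k, i in enumerate(keep):
        if i != p:
            b[ip][k] += e[q][i]
            b[k][ip] += e[i][q]
    b[ip][ip] += e[p][q] + e[q][p] + e[q][q]
    return b


def merge_gain(e, p, q):
    """Closed form of Eq. 2 for merging communities p and q."""
    E = sum(sum(row) for row in e) / 2.0
    a_p, a_q = sum(e[p]), sum(e[q])
    return (e[p][q] - a_p * a_q / (2.0 * E)) / E


def depart_gain(a_v, a_cv, e_v_rest, E):
    """Closed form of Eq. 3 for node v leaving its community c_v."""
    return (a_v * (a_cv - a_v) / (2.0 * E) - e_v_rest) / E


# Two triangles {0,1,2} and {3,4,5} joined by edge (2,3), with node 2
# already split off: communities are {0,1}, {2}, {3,4,5}.
three = [[2.0, 2.0, 0.0],
         [2.0, 0.0, 1.0],
         [0.0, 1.0, 6.0]]
merged = merge(three, 0, 1)          # node 2 rejoins {0,1}
gain = merge_gain(three, 0, 1)
direct = modularity(merged) - modularity(three)
# Node 2 leaving {0,1,2}: a_v = 3, a_cv = 7, two edges to {0,1}, E = 7.
depart = depart_gain(3.0, 7.0, 2.0, 7.0)
```

In this example the merge gain agrees with the directly recomputed difference in Q, and the depart gain is its negative, as expected from the inverse derivation.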

Advanced LM with Isolated Nodes Separation Strategy

Advanced Algorithm

The main procedure of the advanced LM with the isolated node separation strategy (LM+) is shown in Algorithm 1. In the lth iteration, the input of the algorithm is the network G_{l-1} of layer l-1, and the output is the isolated node collection I_{l-1} of G_{l-1} together with the network G_l of layer l.

Algorithm 1: Louvain method with improved strategy

Input: initial network G_0 = (V_0, E_0, w_0)

Output: connected network set G = (G_1, G_2, ..., G_l)
        isolated nodes set I = (I_0, I_1, ..., I_l)

While (true)
    l ← l + 1;
    INIT(G_{l-1})
    While increase do
        increase ← false;
        t ← t + 1;
        Foreach i of V_{l-1} do
            ΔQ_depart | ΔQ_maxEnter | MaxCid ← getΔQ(N_{l-1}(i))
            if (ΔQ_depart > 0 && ΔQ > ΔQ_depart + ΔQ_maxEnter)
                increase ← true;
                V_l[i.cid].all ← V_l[i.cid].all − i.all;
                G_l.Add(Num);
                i.cid ← Num; Num ← Num + 1;
            elseif (ΔQ_depart + ΔQ_maxEnter > 0)
                increase ← true;
                V_l[i.cid].all ← V_l[i.cid].all − i.all;
                V_l[MaxCid].all ← V_l[MaxCid].all + i.all;
                i.cid ← MaxCid
            if (V_l[i.cid].all == 0)
                V_l.Delete(i.cid);
    if (t == 1)
        Return;
    Foreach i of V_{l-1} do
        V_l[i.cid].in ← V_l[i.cid].in + i.in;
        Foreach j of N_{l-1}(i) do
            if (i.cid == j.cid)
                V_l[i.cid].in ← V_l[i.cid].in + w_{l-1}(i, j);
            else
                w_l(i.cid, j.cid) ← w_l(i.cid, j.cid) + w_{l-1}(i, j)
    G.Add(G_l)
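The isolated node separation applied to each layer can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes that a node counts as isolated when it has no edge to any other node (self-loops are ignored), and the function name `separate_isolated` and the data layout are hypothetical.

```python
def separate_isolated(nodes, edges):
    """Split one layer into its connected part and its isolated nodes.

    nodes: list of node ids in the layer.
    edges: dict mapping (u, v) to a weight; (u, u) denotes a self-loop.
    Assumption: a node is isolated when no edge links it to another node.
    Returns (kept_nodes, kept_edges, isolated_nodes).
    """
    connected = set()
    for (u, v), w in edges.items():
        if u != v and w > 0:
            connected.add(u)
            connected.add(v)
    kept_nodes = [n for n in nodes if n in connected]
    isolated = [n for n in nodes if n not in connected]
    kept_edges = {(u, v): w for (u, v), w in edges.items()
                  if u in connected and v in connected}
    return kept_nodes, kept_edges, isolated


# Layer with four super-nodes: 0 and 1 are linked, 2 has only a
# self-loop, and 3 has no edges at all.
nodes = [0, 1, 2, 3]
edges = {(0, 0): 3.0, (0, 1): 1.0, (1, 1): 3.0, (2, 2): 2.0}
kept_nodes, kept_edges, isolated = separate_isolated(nodes, edges)
```

Only `kept_nodes` and `kept_edges` would be passed to the next iteration, while `isolated` would be stored once for the current layer.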

Experimental Comparison and Analysis

In this section, we quantify the performance of LM+ by comparing it with the original algorithm on various social and information networks. We evaluate the methods by assessing the accuracy of the detected communities against the gold-standard, ground-truth communities. We also measure the running time to evaluate scalability across networks of different sizes. Furthermore, we introduce the space compression ratio (SCR) to measure the storage savings of the algorithms, which is defined as follows:

$$
\mathrm{SCR} = \frac{\text{LM storage nodes} - \text{LM+ storage nodes}}{\text{LM storage nodes}} \qquad (4)
$$
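Eq. 4 amounts to a one-line computation, sketched below in Python. The function name and the node counts are hypothetical sample values chosen for illustration. They are not measurements from the experiments.

```python
def space_compression_ratio(lm_nodes, lm_plus_nodes):
    """SCR as in Eq. 4: fraction of stored nodes saved by LM+."""
    if lm_nodes <= 0:
        raise ValueError("LM must store at least one node")
    return (lm_nodes - lm_plus_nodes) / lm_nodes


# Hypothetical counts: LM stores 1000 nodes, LM+ stores 250.
scr = space_compression_ratio(1000, 250)
```

An SCR close to 1 means that LM+ stores only a small fraction of the nodes stored by LM, and an SCR of 0 means no saving.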
Figure 1. Output size of each iteration. Figure 2. The proportion of input nodes in the iteration.

Fig. 1 and Fig. 2 show the LM+ algorithm and the LM algorithm running on the bank customer transaction network. Because complex networks are sparse, and because the number of edges decreases as the community structure becomes more pronounced over the iterations, we count only the nodes in the network, not the edges, when calculating the storage space. The number of nodes to be stored in each iteration is counted from the final result of that iteration, regardless of the number of nodes in the intermediate scanning process.

We show the input and output size of the network in each iteration in Figure 3 and Figure 4. In each iteration, the LM+ algorithm separates the isolated nodes of the input network in advance and allows only the connected nodes to participate in the iterative process. This significantly reduces the size of the input network in subsequent iterations and lowers the demand for time and space resources. For example, in the fourth iteration the LM+ algorithm has 798 nodes, whereas the LM algorithm has 38776. These 38776 nodes are mainly isolated nodes that were not removed in the previous rounds.

Accordingly, the LM+ algorithm stores the isolated nodes and the connected nodes separately when it stores the results of the intermediate iterations, so that the isolated nodes generated by each iteration are stored only once. This avoids repeated storage of such nodes and greatly reduces the storage space required.

Figure 3. Percentage reduction in running time achieved by the improved algorithm. Figure 4. Modularity improvement achieved by our method.

Conclusions

In this paper, we first derive a method for calculating the Q gain after a node leaves its community, supplementing the theoretical research on modularity Q in complex networks. Then, compared with the LM algorithm, the improved LM algorithm reduces both the running time and the storage space required by eliminating isolated nodes, and improves the accuracy of community detection by introducing a new node movement, in which a node leaves its original community to form a community of its own. In future work, we will study the effect of the node input order on the LM+ algorithm in order to reduce its sensitivity to the node input sequence.

Acknowledgements

This work is supported by the National Natural Science Foundation under Grant Nos. 61363060, 61662066 and 61163010, and by the science project of Gansu Electric Power Information & Communication Centre.

References

[1] Mislove A, Marcon M, Gummadi K P, et al. Measurement and analysis of online social networks [C]// ACM SIGCOMM Conference on Internet Measurement 2007, San Diego, California, USA, October. DBLP, 2007:29-42.

[2] González M C, Hidalgo C A, Barabási A L. Understanding individual human mobility patterns [J]. Nature, 2008, 453(7196):779.

[3] Palla G, Derényi I, Farkas I, et al. Uncovering the overlapping community structure of complex networks in nature and society [J]. Nature, 2005, 435(7043):814.

[4] Gavin A C, Aloy P, Grandi P, et al. Proteome survey reveals modularity of the yeast cell machinery [J]. Nature, 2006, 440(7084):631-636.

[5] Newman M. Networks: An Introduction [M]. Oxford University Press, Inc. 2010.

[6] Estrada E. The Structure of Complex Networks: Theory and Applications [M]. Oxford University Press, Inc. 2011.

[7] Knight P A. A First Course in Network Theory [M]. Oxford University Press, 2015.

[8] Watts D J, Strogatz S H. Collective dynamics of 'small-world' networks [J]. Nature, 1998, 393(6684):440.

[8] Vespignani A. Evolution of Networks: From Biological Nets to the Internet and WWW [J]. Oup Catalogue, 2004, 57(10):81-82.

[9] Zhang X, Liu B, Wang X. Research on community detection methods in complex network [J]. Water Science & Technology Water Supply, 2013, 13(2):368.

[10] Newman M E J, Girvan M. Finding and evaluating community structure in networks [J]. Physical Review E Statistical Nonlinear & Soft Matter Physics, 2003, 69(2 Pt 2):026113.

[11] Newman M E J. Modularity and community structure in networks [J]. Proceedings of the National Academy of Sciences, 2006, 103(23):8577-8582.

[12] Newman M E J. Fast algorithm for detecting community structure in networks [J]. Physical Review E Statistical Nonlinear & Soft Matter Physics, 2003, 69(6 Pt 2):066133.

[14] Duch J, Arenas A. Community Detection in Complex Networks Using Extremal Optimization [J]. Physical Review E, 2005, 72(2 Pt 2):027104.

[15] Lü Z, Huang W. Iterated tabu search for identifying community structure in complex networks [J]. Physical Review E Statistical Nonlinear & Soft Matter Physics, 2009, 80(2 Pt 2):026130.

[16] Blondel V D, Guillaume J L, Lambiotte R, et al. Fast unfolding of communities in large networks [J]. Journal of Statistical Mechanics Theory & Experiment, 2008, 2008(10):155-168.

[17] Liu X, Murata T. Advanced modularity-specialized label propagation algorithm for detecting communities in networks [J]. Physica A Statistical Mechanics & Its Applications, 2010, 389(7):1493-1500.

[18] Gach O, Hao J K. A Memetic Algorithm for Community Detection in Complex Networks [M]// Parallel Problem Solving from Nature - PPSN XII. Springer Berlin Heidelberg, 2012:327-336.

[19] Klimt B, Yang Y. Introducing the Enron Corpus [C]// Conference on Email & Anti-Spam. DBLP, 2004.
