Edge Label Propagation Algorithm Based on Node Influence

(1)

2017 2nd International Conference on Computer, Mechatronics and Electronic Engineering (CMEE 2017) ISBN: 978-1-60595-532-2

Edge Label Propagation Algorithm Based on Node Influence

En-dong CHANG, De-cheng ZHANG, Shi QIU, Jie HU and Hong-tao LIU

School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China

Keywords: Community detection, Label propagation, Node influence, Overlapping communities.

Abstract. Community discovery is an important tool for understanding complex networks in depth. For real-world complex networks, overlap of communities is essential. In order to discover overlapping communities, this paper presents an overlapping community detection method (ELPANI) using node influence and side label propagation. Since the birth of LPA, a wide range of attention has been paid, but its results are unstable due to the randomness of its label renewal order. To solve these problems, we use the node's influence to initialize the edge label, and in the edge label update process, we use the degree of edge sorting to avoid these random factors. In order to retain multiple communities, we have retained multiple tabs on the side. Finally, we restore the completed edge label to the node.

Introduction

Communities are abstracted from real systems. Many of the real world networks contain community structures and community mining has seen long-term growth in recent years. Nowadays, community mining has been applied to many fields of research, such as the microblogging network of community relations[1], biological analysis of protein networks[2,3], Abstract Network community testing[4], etc. Community discovery is an important step in understanding complex networks. Although there is no universal definition of community structure, it is widely accepted that a community in a network should have more internal connections than external ones[5]. In the past few years, many methods have been proposed to detect the community structure of complex networks. However, most of these approaches focused on the identification of node community. For example, the Cluster Percolation method proposed by Gergely Palla (CPM) [6], Steve proposed the Combined Split algorithm (CONGA) [7], Gregory proposed the overlapping community propagation algorithm (COPRA) [8].

Until recently, Wei Liu published complex network community discovery algorithms through edge label published in 2016[ELPA] [9]. The community discovery algorithm by the edge label propagation has brought a new idea for community discovery, which combines the natural advantage of link community with the efficiency of the label propagation algorithm.

The edge label propagation algorithm (ELPA) includes four steps: (1) Initialization of community labels, (2) edge label propagation, (3) node label propagation, (4) bridge identification. The initialization step is used to construct all candidate link communities, edge label propagation completes the update iteration of the edge label until stable, node label propagation completes the division of the community to which the node belongs, then, edges that cross any two communities are marked as bridges, finally, the algorithm classifies the edges and nodes into the corresponding communities to which they belong.

(2)

label retention threshold. The rest of this article is organized as follows: in the second section, we introduce the relevant work of community discovery. The third section introduces the algorithm which we proposed, and the fourth section gives the algorithm analysis, section five summarizes the article.

Related Work

The label propagation algorithm [9] discover communities by the interaction of vertex label, update the label for the current vertex by selecting the most of the neighbors. The label of the vertex i is li, then the update step of the label can be expressed in the following formula:

(1)

W is the adjacency matrix, Error! Reference source not found.is the label after the node i updates, Nb(i) is the neighbor node set of i, Error! Reference source not found.is the Kronecker delta function:

(2)

With the gradual updating of node labels, the labels of each node tend to be stable. Until all the labels of the nodes are no longer changing, all the nodes with the same label are regarded as the same community.

Label propagation algorithm is known for its time complexity approaching to linear time, but its random label update order makes the final community result unstable. In the bipartite graph, there will be an infinite loop phenomenon.

After that, Gregory and others extended the LPA to overlapping community discoveries, COPRA algorithm allows each node has multiple labels, the label structure is (l, b), The algorithm adds attribution b to each label of the node, attribution indicates the probability of a label belonging to the node, . The sum of attributions of all label of a node is 1. Use to represent the degree of ownership of the x vertex label i at the tth iteration.

(3) Nb (x) represents the neighbor set of x.

After COPRA there are many improved algorithms, such as, SLPA introduces listeners and speakers[10], HANP, which incorporates score values to label propagation[11], BMLPA is introduced to balance the ownership[12], etc. However, all of these algorithms use vertex labels to propagation and assign a unique label to each vertex. Therefore, in order to speed up the update of edge labels, we use the node's influence to initialize the edge labels, and use edge labels to iterate. For details, see Chapter 3.

Algorithm

Edge Label Initialization

(3)

2. Starting from the most influential node, tag assignment for the label of the node The influence of the node is as follows:

(4)

(5) p(u) represents the influence of node u; deg(u) represents the degree of node u; N(u) represents a neighbor set of node u.

Note: the edge that has been initialized will be deleted from the graph, and it will no longer appear in other subsequent updates. It indicates that if the two nodes are connected, the label on the edge is the label of the node with high influence.

Table 1.The algorithm description of this section. Algorithm.

Input: A graph G=(N,E) Output: Elabel begin

1. For i 1 to |N| do

2. Calculate the node's influence according to equation (1), arrange it in descending order as Nds 3. End

4. For i 1 to |N| do

5. Assign tags to the adjacent edges of Nodes in the order of Nds, and add to Elabel 6. Delete the initialized edge in the adjacency matrix of figure G

7. End

8. Return Elabel

Node Label Propagation

The LPA algorithm in the node update stage by using random strategy update, using random strategy to solve the conflict in the same number of labels, in order to avoid the random factors of instability. We use the degrees of the edge in the edge label update ranking, we define the degree of the edge as the two node degree connected by the side. This not only makes the update order of the edges well, but also keeps the speed advantage of the LPA algorithm because the computation of the edge degree is linear, the running speed of the algorithm will not be reduced.

In the label update phase of label propagation algorithm, we can take synchronous update or asynchronous update, However, the kth update in the synchronization update is based on the label of

the (k-1)th neighbor. Hence, , where is

the label of node x at time k. The problem however is that subgraphs in the network that are bipartite or nearly bipartite in structure lead to oscillations of labels. Hence we use asynchronous updating

where , are neighbors

that are not yet updated in the current iteration. are neighbors that are not yet updated in the current iteration. Our edge label propagation also has two sub graph problems, so our edge label propagation will also use asynchronous update strategy.

In order to find overlapping communities, we set a label pair for each edge, e(l, b), where l is the label and b is the degree of attribution of the label. In order to avoid generating many small communities, we take the label with the degree of ownership less than 1 / v when receiving the label, v represents the total number of labels of the current node.

(4)

[image:4.612.112.509.100.374.2]

Table 2.The algorithm description of this section. Algorithm.

Input: A graph G=(N,E)，Elabel

Output: A set of overlapping communities P={C0,...,Ck} begin

1. P={∅_}

2. Node label propagation. 3. cyclet = 100

4. While cyclet > 0 do 5. cyclet – 1

6. The degree of the side is sorted and stored as Eorder 7. For i 1 to |E| do

8. Update the edge label in the order of Eorder

10. Delete the degree of belongingness below the 1/v label 11. EndFor

12. If The labels on each edge no change 13. Break

14. EndIf 15. Endwhile

16. save the edges of the same side label into the C

17. save the corresponding node to the corresponding community P according to the side label 18. Return P

Algorithm Analysis

In this section, we analyze the time complexity of this algorithm. N stands for nodes, and M represents the number of edges. In the first stage, when each label is initialized, the influence of each node is calculated. The whole process is repeated, so the time complexity of initialization is O (n). In the second step, the edge label propagation is required for each edge to traverse the labels of all its adjacent sides, because each side has two nodes, so the total degree is 2 times of all nodes, and the time complexity is O (2logn). Finally, the time complexity of restoring the tag to the point after the update is completed is O (m). So the time complexity of this algorithm is O (n+2logn+m). It can be seen that this algorithm is very fast, and because this algorithm avoids the random factors, the community will be stable.

Conclusion

In this paper, we propose an edge label propagation algorithm based on the influence of nodes to detect overlapping communities. It not only preserves the speed advantage of the tag propagation algorithm, but also avoids the instability caused by the random update. However, there are still many problems to be improved in our algorithm. For example, label propagation algorithm is easy to generate multiple small communities, etc., which are all problems we need to solve in the future.

Reference

[1] Palla, G., Baraba’si, A. L. & Vicsek, T. Quantifying social group evolution. Nature 446, 664–667 (2007).

(5)

[3] Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N. & Baraba´si, A. L. Hierarchical organization of modularity in metabolic networks. Science 297, 1551–1555 (2002).

[4] Amiri, B., L. Hossain, J. W. Crawford, and R. T. Wigand. 2013. Community detection in complex networks: multi–objective enhanced firefly algorithm. Knowledge-Based Systems, 46: 1–11.

[5] Newman, M. E. J., Baraba’si, A. L. & Watts, D. J. Te Structure and Dynamics of Networks (Princeton Univ. Press, 2006).

[6] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society [J]. Nature, 2005, 435(7043):814-818.

[7] Steve Gregory. An Algorithm to Find Overlapping Communities in Networks [C].Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases, 2007:91-102.

[8] Gregory S. Finding overlapping communities in networks by label propagation [J]. New Journal of Physics, 2009, 12(10): 2011-2024.

[9] Raghavan UN, Albert R, Kumara S. Near linear time algorithm to detect community structures in large-scale networks [J]. Physical review. E, Statistical, nonlinear, and soft matter physics, 2007, 76(3 Pt 2):036106.

[10] Xie J, Szymanski B K, Liu X. SLPA: Uncovering Overlapping Communities in Social Networks via a Speaker-Listener Interaction Dynamic Process[C]// IEEE, International Conference on Data Mining Workshops. IEEE Computer Society, 2011:344-349.

[11] Leung IXY, Hui P, Lio P, Crowcroft J. Towards real-time community detection in large networks. Physical Review E, 2009, 79(6): 066107. [doi: 10.1103/PhysRevE.79.066107]

[12] Wu Z H, Lin Y F, Gregory S, et al. Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks [J]. Journal of computer science and Technology (English Edition), 2012, 27(3):468-479.