Abstract
Community detection, an effective tool to analyze and understand network data, has been paid more and more attention in recent years. One of the most popular methods of detecting community structure is to find the division with the maximal modularity. However, the modularity maximization is an NP-complete problem. In the field of swarm intelligence algorithm, particle swarm optimization (PSO) has been widely used to solve such NP-complete problem. Nevertheless, premature convergence and lower accuracy limit its performance in community detection. In order to overcome these shortcomings, this paper proposes a novel PSO called P-PSO for community detection through combining the computational ability of Physarum, a kind of slime. The proposed algorithm improves the efficiency of PSO by recognizing inter-community edges based on Physarum-inspired network model (PNM). Experiments in eight networks show that the proposed algorithm is effective and promising for community detection, compared with other algorithms.
Access provided by CONRICYT-eBooks. Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Complex networks have numerous characteristics, among which the community structure is an important one. Community detection, a powerful tool to discover community structures, has a wide application prospect, like predicting protein functions [1] and analyzing the information dissemination [2].
In the past few decades, a large number of algorithms have been proposed for community detection. They can be classified into optimization algorithm and heuristic algorithm. Meanwhile, a modularity measure Q [3] is proposed to evaluate the quality of community divisions, which has been widely used. It has been proved that swarm intelligence optimization algorithms including particle swarm optimization algorithm (PSO) [4] show their superiority in local learning and global search. Recently, Cai et al. have successfully used greedy discrete particle swarm optimization algorithm (GDPOS) [5] to detect the community structures in a network. However, failing to make full use of prior knowledge of network and generate high-quality initial population, this algorithm does not lead to the good enough performance of global search and relatively high accuracy.
According to the latest reports, a large number of biological experiments have demonstrated that a slime named Physarum has an intelligence of solving mazes and constructing efficient and robust networks [6, 9]. Meanwhile, the Physarum-inspired Mathematical Model (PM) has been proposed by Tero et al. [7], which has been used for optimizing the heuristic algorithms [8]. Thus, a Physarum-inspired network model (PNM) is proposed for initializing the PSO based on the PM model, which is utilized to distinguish inter-community edges from intra-community edges. Furthermore, we attempt to optimize the phase of PSO’s initialization for higher quality in community detection.
The remaining of this paper is organized as follows: Sect. 2 illustrates the related background and introduces the particle swarm optimization algorithm for community detection. Section 3 proposes the Physarum-inspired particle swarm optimization algorithm. Section 4 reports the experiments in eight real-world networks and the comparisons with state of the art algorithms. Section 5 concludes this paper.
2 Related Work
2.1 Community Detection
A network can be composed of nodes and edges, in which nodes usually stand for members and edges represent relationships between members. Let \(G= (V, E)\) denote a network, where V and E are the aggregations of nodes and edges, respectively. Aiming at dividing the nodes in a network into different communities, community detection results in that nodes across communities are sparsely connected, while nodes within a community are relatively densely connected. Under the premise that a community is a subset of V and \(n_{c}\) is defined as the number of communities, a community division is a set of communities, \(C_{i}\subset G, C=\{C_{1}, C_{2}, \dots , C_{n_{c}}\}\), where \(C_{i}\ne \varnothing , \bigcap \limits _{i=1}^{n_{c}}C_{i}=\varnothing , \bigcup \limits _{i=1}^{n_{c}}C_{i}=G\).
In this paper, the proposed fitness function is the widely used modularity (normally denoted as Q) [3]. The Q function can be written as Eq. (1), where |V| and |E| are the number of nodes and edges of a network, respectively; A is the adjacency matrix of a network and \(A_{ij}=1\) if there exists an edge between node i and j; \(k_{i}\) is the degree of node i, and \(\delta (i, j)=1\) if the nodes i and j are in the same group, otherwise \(\delta (i, j)=0\). Without the loss of generality, we assume that the better division corresponds to the higher Q value. Therefore, the community detection can be transformed into an optimization problem formulated as Eq. (2).
2.2 PSO for Community Detection
Derived from the social behavior seen in some animal populations, like fish school and birds flock, PSO is a type of swarm intelligence algorithm proposed by Eberhart and Kennedy in 1995 [4]. The concise framework, simple principle and fast convergence make PSO a popular algorithm for solving continuous optimization problems. Each particle has a position and velocity vector. The position vector usually stimulates a candidate solution to the optimized problem, and the velocity vector denotes the tendency of position updating. A particle updates its status iteratively according to its own and the other particles’ experiences to search for the optimal solution. Here, we take a typical PSO for network clustering, termed GDPSO, as an example to introduce the basic parts of PSO for community detection.
Particle representation: Considering that the community detection is a discrete optimized problem, we have to redefine the particle positions. One position vector represents a network division and the position vector of the particle i is defined as \(X_{i}=\{x_{i}^{1}, x_{i}^{2}, \dots , x_{i}^{n}\}\), where \(x_{i}^{j}\in [1, n]\) is an integer.
In such definition, \(x_{i}^{j}\) is called a label identifier standing for the community the node j belongs to. If \(x_{i}^{j}=x_{i}^{k}\), then node j and k belong to the same community. Not only is this coding scheme easy to decode, but also it can determine the number of the communities after division directly. As a result, the computational complexity will be reduced. The coding scheme of the particle is shown in Fig. 1.
Particle-status-updating rules: The operation of updating status must be redefined under the discrete background in order to make GDPSO practicable for community detection. The updating rules are put forward as follows:
In the above equations, the \(Pbest_{i} = \{pbest_{i}^1, pbest_{i}^2\dots , pbest_{i}^D\}\) and \(Gbest_{i} = \{gbest_{i}, gbest_{i}\dots , gbest_{i}\}\) are the \(i^{th}\) particle’s best personal position and the best global position of the swarm, respectively; the inertia weight \(\omega \), the learning factors \(c_{1}\) and \(c_{2}\) are set typical values of 0.7298, 1.4961 and 1.4961; the \(r_{1}\) and \(r_{1}\) are random numbers ranging from 0 to 1.
In Eq. (3), \(\ominus \) is defined as an XOR operator. Provided with two velocity vectors \(V_{1}=\{v_{1}^{1}, v_{1}^{2}, \dots , V_{1}^{n}\}\) and \(V_{2}=\{v_{2}^{1}, v_{2}^{2}, \dots , v_{2}^{n}\}\), \(V_{1} \oplus V_{2} = V_{3} = \{v_{3}^{1}, v_{3}^{2}, \dots , v_{3}^{n}\}\) is a velocity vector with a detailed operation shown as follows:
In Eq. (4), given an old position \(X_{old}=\{x_{old}^{1}, x_{old}^{2}, \dots , x_{old}^{n}\}\) and a velocity \(V=\{v_{1}, v_{2}, \dots , v_{n}\}\), \(X_{old}\otimes V = X_{new}=\{x_{new}^{1}, x_{new}^{2}, \dots , x_{new}^{n}\}\) is a position vector whose element is defined as follows:
where \(L_{i} = \{l_{1}, l_{2}, \dots , l_{k}\}\) is the set of label identifiers of node \(i's\) neighbors. The \(\varDelta Q\) is calcluated using the following equation:
In general, each node chooses the community identifier which contributes to the largest increase or the smallest decrease of Q value based on its neighbors.
Mutation: GDPSO implements the mutation operation so as to preserve diversity and avoid falling into local optima. The procedure can be depicted as follows: generating a random number between 0 and 1; for each node in a network, if the random number is smaller than the mutation probability pm, assigning its label identifier to all of its neighbors.
3 Physarum-inspired PSO for Community Detection
3.1 The Physarum-based network mathematical model
In this paper, PM model is modified into Physarum-based network model (PNM) which could be used to recognize the intra-community edges in a network. The key mechanism of PM model is the feedback system between the fluxes and conductivities of tubes based on the Posieuille flow.
First, let \(Q_{i, j}^{t}\), \(D_{i, j}^{t}\), \(L_{i, j}\) and \(p_{i}^{t}\) stand for the flux, the conductivity, the length of \(e_{i, j}\) and the pressure of \(v_{i}\) at time step t, respectively. The relationship among these parameters can be represented as Eq. (8). Second, according to the Kirchhoff’s law formulated in Eq. (9), the pressure and fluxes can be obtained by solving such equations at each iteration step. Third, \(Q_{i,j}^{t}\) feeds back to \(D_{i,j}^{t}\) based on Eq. (10), and as iteration step t is completed, the iteration step \(t+1\) repeats the above procedures on the basis of the data iteration step t returns. Finally, as such positive feedback continues, a highly efficient network is generated [7].
PNM is based on the Physarum-inspired Mathematical Model (PM), whose major modification is the scheme of choosing inlets/outlets in each iteration. In such model, a vertice is chosen as an inlet, while the others are chosen as outlets. Namely, Eq. (9) is modified as Eq. (11), where \( D \) and \( L \) are known. Given a certain inlet and outlet, a set of equations based on Eq. (11) can be obtained. By solving such equations, we get \(p_{i}\) of node i, where i ranges from 1 to |V|. Besides, every vertice is chosen as the inlet once in each iteration step of PNM. When \(v_{i}\) is chosen as the inlet, a local conductivity matrix denoted as \(D^{t} (i)\) is calculated based on the feedback system. Eventually, after all local conductivity matrices are obtained, the global conductivity matrix is updated by the average of \(D^{t} (i)\) based on Eq. (12).
3.2 Physarum-Inspired Network Model for Community Detection
Taking advantage of PNM, we roughly distinguish the inter-community edges from intra-community through conductivities. Then, we adopt PNM optimize initialization generating a high-quality initial solution and accelerating convergence.
We can obtain a matrix D through PNM, and suppose that node i has a neighbor set \(L (i)=\{l_{1}, l_{2}, \dots , l_{k}\}\) and let label(i) be the community label which node i belongs to. First, for each node i, we initialize label(i) as i. In addition, we assume that \(\varOmega _{i} = \{label (j)|j\in L (i)\; and\; D_{i,j}< (1-R\%)*D_{max}\}\) includes the community labels of neighbors of node i. Namely, the top \(R\%\) conductivities \(D_{i,j}\) denote that the edges between node i and j are inter-community edges. Then, each node randomly selects an element from \(\varOmega _{i}\) as its new label.
For the next step, the label propagation is utilized to optimize preliminary initial solution further. Each node determines its community label based on the labels of its neighbors. We assume that each node in the network chooses to join the community with the largest number of its neighbors, which can be represented as Eq. (13), where \(\delta (i, j)\) is 1, if node i and j belong to the same community, otherwise \(\delta (i, j)\) is 0. This step is executed iters times where iters is the number of propagation iteration. For a clear expression, with a prefix (i.e., P−) added to the original GDPSO algorithm for distinction, the novel algorithm is denoted as P-PSO. The detailed process of P-PSO is shown in Algorithm 1.
4 Experiments and Results
All experiments are executed in the same environment to enable fair comparisons between our algorithm and other algorithms including GDPSO [5], IACO-Net [12] and PNGACD [13]. All results are averaged over 30 repeated runnings in order to eliminate fluctuation. There are two popular metrics for evaluating the performance of community detection: the modularity Q and normalized mutual information (NMI) [10].
4.1 Results on Benchmark Networks
Some experiments are carried out in the GN benchmark network proposed by Lancichinetti et al. [11]. \(\alpha \) denotes the mixing parameter which controlls the proportion of links within and out of a community. We test all algorithms in eleven computer-generated networks with the value of \(\alpha \) ranging from 0 to 0.5.
As shown in Fig. 2, when the mixing parameter is no larger than 0.1, all algorithms except PNGACD can discover the correct communities (\(NMI = 1\)). With the mixing parameter increasing, the IACO-Net fails to detect the true partitions. For \(\alpha = 0.4\), P-PSO and GDPSO still obtain \(NMI = 1\). When \(\alpha \) is larger than 0.4, the NMI of GDPSO decreases more quickly than the proposed P-PSO. The experiments in the GN benchmark networks prove that P-PSO is feasible for community detection.
4.2 Results on Real-World Networks
Table 1 shows the structural characteristics of eight real-world networks used in our experiments for evaluating the performance of our proposed method.
Figure 3 shows the results that some experiments are implemented to verify the robustness of P-PSO in the four networks. It can be concluded that P-PSO has a better stability than that of GDPSO. Table 2 reports the maximal and mean values of Q in other real-world networks. Results show that P-PSO is substantially better than the compared algorithms.
Figure 4 reports the dynamic average modularity with the increment of iteration. The optimized algorithm P-PSO has a higher growth rate than the original GDPSO at the initial phase. The difference between them becomes smaller with the increment of iteration, and yet P-PSO converges faster than GDPSO. Above all, P-PSO shows a superiority in Q value during the whole iteration process.
Figure 5 shows the community divisions in Polbooks and Football. In Fig. 5(a), the geometric figures denote the real communities and the colors denote communities detected by P-PSO. Due to the context of books, some books are connected more closely and form smaller communities, which disorganizes the original divisions in the real world. In terms of the Football network, the positions are denoted as the real division and the colors mean five communities in the division of P-PSO. Each node represents a football team in the real world, and an edge stands for a game they have together. The marked circle emphasizes the main difference between the detected communities by P-PSO and the real communities.
5 Conclusion
The research about community detection is helpful for us to analyze the basic characteristics of networks. Taking advantage of the Physarum network model (PNM) and greedy discrete particle swarm optimization algorithm (GDPSO), we propose a particle swarm optimization algorithm (P-PSO). The experimental results in eight real-world networks demonstrate that P-PSO shows a better ability in optimizing the initial solution and can obtain effective and promising results than other state of the art algorithms.
References
Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
Weng, L., Menczer, F., Ahn, Y.Y.: Virality prediction and community structure in social networks. Sci. Rep. 3, 2522 (2013)
Newman, M.E.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103, 8577–8582 (2006)
Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of 1995 IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Press, New York (1995)
Cai, Q., Gong, M., Ma, L.: Greedy discrete particle swarm optimization for large-scale social network clustering. Inf. Sci. 316, 503–516 (2015)
Tero, A., Takagi, S., Saigusa, T., Ito, K., Bebber, D.P., Fricker, M.D., Yumiki, K., Kobayashi, R., Nakagaki, T.: Rules for biologically inspired adaptive network design. Science 327, 439–442 (2010)
Tero, A., Kobayashi, R., Nakagaki, T.: A mathematical model for adaptive transport network in path finding by true slime mold. J. Theor. Biol. 224, 553–564 (2007)
Liu, Y., Gao, C., Zhang, Z., Lu, Y., Chen, S., Liang, M., Tao, L.: Solving np-hard problems with physarum-based ant colony system. IEEE/ACM Trans. Comput. Biol. Bioinf. 14, 108–120 (2017)
Nakagaki, T., Yamada, H., Tóth, Á.: Intelligence: maze-solving by an amoeboid organism. Nature 407, 470–470 (2000)
Danon, L., Daz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech. Theory Exp. 2005, P09008 (2005)
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E 78, 046110 (2008)
Mu, C., Zhang, J., Jiao, L.: An intelligent ant colony optimization for community detection in complex networks. In: 2014 IEEE Congress on Evolutionary Computation(CEC), pp. 700–706. IEEE Press, New York (2014)
Gao, C., Liang, M., Li, X., Zhang, Z., Wang, Z.: Network community detection based on the Physarum-inspired computational framework. IEEE/ACM Trans. Comput. Biol. Bioinf. (2016). doi:10.1109/TCBB.2016.2638824
Acknowledgments
Zhengpeng Chen and Fanzhen Liu contributed equally to this work and should be considered as co-first authors. This work is supported by the National Natural Science Foundation of China (Nos. 61402379, 61403315), Fundamental Research Funds for the Central Universities (No. XDJK2016A008, XDJK2016B029, XDJK2016E074), CQ CSTC (cstc2015gjhz40002).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chen, Z., Liu, F., Gao, C., Li, X., Zhang, Z. (2017). An Enhanced Particle Swarm Optimization Based on Physarum Model for Community Detection. In: Tan, Y., Takagi, H., Shi, Y., Niu, B. (eds) Advances in Swarm Intelligence. ICSI 2017. Lecture Notes in Computer Science(), vol 10386. Springer, Cham. https://doi.org/10.1007/978-3-319-61833-3_11
Download citation
DOI: https://doi.org/10.1007/978-3-319-61833-3_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61832-6
Online ISBN: 978-3-319-61833-3
eBook Packages: Computer ScienceComputer Science (R0)