A Simple and Effective Community Detection Method Combining Network Topology with Node Attributes

He, Dongxiao; Song, Yue; Jin, Di

doi:10.1007/978-3-030-29551-6_15


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11775)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management


Abstract

Community detection is a fundamental problem in the study of complex networks. Many approaches, using either network topology alone or both network topology and attribute information, have been designed to detect the community partitions of networks. However, existing approaches cannot work effectively on networks whose community structure does not match the ground truth well, networks whose topological information contains serious noise, or networks in which the differences in attribute similarity between nodes are tiny. Inspired by force-directed network layout and intuitive characteristics of communities, we propose a simple yet effective approach that uses attribute information to partition nodes into communities by maximizing network modularity. By adding attributes to the network as nodes and exploiting the interactions between nodes, our method can not only effectively improve community detection, but also obtain a number of communities closer to the real one. Through extensive experiments on real-world datasets, we demonstrate the superior performance of the new approach over several state-of-the-art approaches.


1 Introduction

In modern network science, many complex systems can be abstracted as networks consisting of nodes and the edges between them. Complex networks are usually organized into communities, each with its own role or function, such as groups of related individuals in social networks or functional modules in protein-protein interaction networks [6]. Discovering communities helps us understand the role they play in network structure and dynamics, and has many real applications [7].

The primary objective of community detection is to partition a network into communities that correspond to functional modules composed of nodes. Generally, two types of information can be exploited for community detection: the network topology and the attribute information on nodes. Accordingly, a wide variety of community detection methods based on different theories and technologies have been proposed [6,7,8, 12]. Topology-oriented methods fall mainly into two categories: measure-based methods [1, 2, 16, 17] and probabilistic model based methods [10, 13]. However, some real-world networks do not conform to the community structure property, and their topology contains a lot of noise. Methods that use network topology alone therefore cannot work effectively in many cases. To improve the quality of the detected community partitions, many existing methods take node attributes into account to supplement topological information [3,4,5, 15]. However, these methods usually require manually tuning the relative influence of topology and attributes on community detection, which is typically difficult. Moreover, in some networks the attribute similarities between nodes computed by these methods differ little from one another, so the attributes cannot play an effective role. Although some models [9, 20] do not need a parameter to balance the two types of information, they cannot effectively partition nodes into communities when the network contains serious noise.

Force-directed layout is a widely used network layout technique whose very essence is to turn structural proximities into visual proximities. Noack [18] showed that these proximities express communities and that layouts with optimal energy are consistent with clusterings with optimal modularity. Because adding or deleting nodes and edges changes the structural proximities in a network, it also changes both the layout and the clusterings of the whole network. This suggests that deliberately adding nodes and edges could reshape the community structure in a way that benefits community detection. However, how to add nodes and edges effectively for this purpose remains a challenge.

To address these problems, and inspired by force-directed network layout and intuitive characteristics of communities, in this paper we propose a novel community detection method based on an augmented graph. Unlike existing topology- and attribute-based approaches, which use node attributes to calculate similarity, our approach adds attributes to the network as nodes to help detect communities. In this way, our method avoids the problem that tiny differences in attribute similarity between nodes prevent the quality of community partitions from being improved, and it also strengthens the community structure of the network through the interactions between nodes, yielding more accurate partitions. We then obtain the community partition of the original network from the augmented graph using a fast and effective algorithm based on modularity optimization, e.g., Louvain. Extensive experiments on real-world attributed networks show that the new method is effective and outperforms several state-of-the-art approaches for community detection.

2 The Method

Consider an undirected attributed network G defined as a 3-tuple (V, E, X), where \(V=\left\{ v_{1}, v_{2}, \ldots , v_{N}\right\} \) is a set of N nodes; \(E=\{(u, v) : u, v \in V, u \ne v\}\) is a set of e edges, represented by an \(N \times N\) adjacency matrix A with elements \(A_{u v}=1\) if u and v are connected and 0 otherwise; and \(X=\left\{ x_{1}, x_{2}, \dots , x_{M}\right\} \) is a set of M attributes, represented by an \(N \times M\) binary matrix whose element \(X_{u m}\) is 1 if node u has the m-th attribute and 0 otherwise. Community detection is then the task of partitioning the node set V of G into K disjoint communities \(P=\left\{ C_{1}, C_{2}, \ldots , C_{K}\right\} \).
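To make the notation concrete, the following Python sketch builds the adjacency matrix A from an edge list E and checks that a candidate P is a valid partition of V into disjoint communities. It assumes nodes are indexed 0 to N−1 rather than \(v_1\) to \(v_N\); the function names and the toy network are hypothetical, chosen only for illustration.

```python
def adjacency_matrix(n, edges):
    """Return the symmetric N x N 0/1 adjacency matrix A for an undirected edge list E."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            raise ValueError("E excludes self-loops (u != v)")
        A[u][v] = A[v][u] = 1
    return A


def is_partition(n, communities):
    """Check that communities (a list of sets) split V = {0, ..., n-1}
    into non-empty, pairwise disjoint sets that together cover V."""
    seen = set()
    for c in communities:
        if not c or seen & c:  # empty community or overlap with an earlier one
            return False
        seen |= c
    return seen == set(range(n))


# Hypothetical toy attributed network: N = 4 nodes, e = 3 edges, M = 2 attributes.
edges = [(0, 1), (1, 2), (2, 3)]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]  # X[u][m] = 1 if node u has attribute m
A = adjacency_matrix(4, edges)
e = sum(map(sum, A)) // 2  # each edge is counted twice in the symmetric A
print(e, is_partition(4, [{0, 1}, {2, 3}]))  # 3 True
```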

2.1 Motivations

Most community detection methods focus on detecting communities from network topology, on the premise that functional communities are consistent with the structural communities of the network. However, some real-world networks do not satisfy this premise, and their topology also typically contains serious noise. To illustrate this, we use the widely used modularity metric [17] to assess the community structure of seven widely used attributed networks [14], as shown in Table 1. We apply the Louvain method [1] to partition each network into communities and compare the modularity of the resulting partition with that of the ground-truth communities. For some real-world networks, the ground-truth communities do not correspond well to community structure in the usual sense. Table 1 shows that the first four networks have relatively dense connections between ground-truth communities, and that the topology of the latter three networks contains serious noise. It is not uncommon for two disconnected nodes to belong to the same community, and, by relying on topological structure alone, it is difficult to assign a node that is connected to multiple communities to the correct one. Therefore, traditional methods based on network topology, such as modularity-based methods, cannot work effectively in this case.
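For reference, the modularity of [17] for a partition of an undirected, unweighted network with e edges is \(Q=\frac{1}{2e}\sum _{u,v}\left[ A_{u v}-\frac{k_{u}k_{v}}{2e}\right] \delta (c_{u},c_{v})\), where \(k_u\) is the degree of node u and \(c_u\) its community. Summing per community gives the equivalent form \(Q=\sum _{c}\left[ \frac{L_{c}}{e}-\left( \frac{d_{c}}{2e}\right) ^{2}\right] \), with \(L_c\) the number of edges inside community c and \(d_c\) the total degree of its nodes. The Python sketch below evaluates this second form; it is an illustrative implementation, not the code used to produce Table 1, and the toy graph (two triangles joined by one edge) is hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> community label.
    Evaluates Q = sum_c [L_c / e - (d_c / (2e))^2].
    """
    e = len(edges)
    intra = defaultdict(int)    # L_c: edges with both endpoints in community c
    deg_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in community c
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        deg_sum[cu] += 1
        deg_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / e - (deg_sum[c] / (2 * e)) ** 2 for c in deg_sum)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {u: "a" for u in range(6)}
print(round(modularity(edges, split), 4))   # 0.3571 (= 5/14)
print(round(modularity(edges, merged), 4))  # 0.0
```

Evaluating Q once for a detected partition and once for the ground-truth membership is the kind of comparison that Table 1 reports.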

Table 1. Dataset descriptions and comparison between the modularity values of the partitions obtained by the Louvain method and of the ground-truth communities. Q is the modularity value, which lies between −1 and 1; larger values indicate denser connections within communities. N is the number of nodes, E the number of edges, M the number of attributes, and K the real number of communities.


As these results show, network topology reflects only one aspect of a network and usually contains noise. Fortunately, node attributes can be used to improve community detection. However, combining these two kinds of information effectively is technically challenging. Existing methods typically handle network topology and node attributes separately, so they need to tune the relative influence of the two types of information on community detection, which is difficult to achieve in practice. Moreover, most methods exploit attribute information only by computing attribute similarity between nodes. In many networks, however, there is no significant difference between the attribute similarity of a pair of nodes within a community and that of a pair across communities. Fig. 1(a) shows a network with three nodes, which a topology-based approach, such as a modularity-based one, places in the same community. Refining this partition with the nodes' attribute similarity (e.g., cosine similarity) does not help, because all pairs of nodes have similar similarity values. (In Fig. 1(b), black nodes are attributes, and an edge indicates that a node has that attribute.) Therefore, calculating attribute similarity, as existing methods do, is not enough to help community detection methods partition communities.
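The failure mode described for Fig. 1 can be reproduced with cosine similarity on binary attribute vectors. The three attribute vectors below are hypothetical, since the exact attributes in Fig. 1 are not given in the text: every pair of nodes shares exactly one attribute, so every pair receives the same similarity and attribute similarity offers no signal for splitting the community.

```python
import math
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two attribute vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norms if norms else 0.0


def pairwise_cosine(attrs):
    """Cosine similarity for every unordered pair of nodes."""
    return {(u, v): cosine(attrs[u], attrs[v]) for u, v in combinations(sorted(attrs), 2)}


# Hypothetical binary attribute vectors over four attributes.
attrs = {"a": [1, 1, 0, 0], "b": [0, 1, 1, 0], "c": [1, 0, 1, 0]}
sims = pairwise_cosine(attrs)
print(sims)  # {('a', 'b'): 0.5, ('a', 'c'): 0.5, ('b', 'c'): 0.5}
```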

2.2 Augmented Graph Based Method

Here we develop a simple and effective community detection method that combines network topology and node attribute information by means of an augmented graph. The initial topological structure of the original network is augmented with new nodes, called attribute nodes, and new edges, called attribute edges. Our method is mainly based on the following intuitive properties:

Nodes that are connected to each other are more likely to belong to the same community.
Nodes sharing common attributes are more likely to be in the same community.
Even if two nodes are not directly connected, they may belong to the same community through interactions among original nodes and between original nodes and attribute nodes.
Even if both the topological structure and the attribute information contain a lot of noise, communities can still be found through the implicit interaction of these two kinds of nodes.

Algorithm 1 outlines the proposed method, which works as follows. We introduce a set of attribute nodes \(V_{M}=\left\{ v_{1}, v_{2}, \dots , v_{M}\right\} \), and add an attribute edge between node u and attribute node \(v_{m}\) if node u has the m-th attribute (see Fig. 1(b)). In total, M attribute nodes and \(\sum _{u \in V, m \in V_{M}} X_{u m}\) attribute edges are added to the original network. In the augmented graph, two nodes are close if they are connected through many other original nodes, or if they share many attribute nodes as common neighbors. Once the augmented graph has been created, we can apply an existing community detection method, e.g., the modularity-based Louvain algorithm [1], to detect communities in it (see Fig. 1(c)).
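The construction above can be sketched in a few lines of Python. The sketch rests on several assumptions not fixed by the text: attribute node m is given the hypothetical id ("attr", m) so that it cannot collide with original node ids, all edges have unit weight, and the modularity optimizer is only the first (local-moving) phase of Louvain [1] rather than the full algorithm, which also aggregates communities into super-nodes and repeats. The partition of the original network is read off by discarding the attribute nodes. In the toy example the four original nodes form a cycle, so topology alone gives no reason to prefer one split over another, while the hypothetical attributes tip the partition towards {0, 1} and {2, 3}.

```python
from collections import defaultdict


def build_augmented_graph(n, edges, X):
    """Original nodes 0..n-1 plus one attribute node per column of X.

    An attribute edge (u, ("attr", m)) is added whenever X[u][m] == 1; the
    ("attr", m) ids and unit weights are assumptions of this sketch.
    Returns a symmetric weighted adjacency dict: node -> {neighbour: weight}.
    """
    num_attrs = len(X[0]) if X else 0
    adj = {u: {} for u in range(n)}
    adj.update({("attr", m): {} for m in range(num_attrs)})
    for u, v in edges:
        adj[u][v] = adj[v][u] = 1.0
    for u in range(n):
        for m in range(num_attrs):
            if X[u][m]:
                adj[u][("attr", m)] = adj[("attr", m)][u] = 1.0
    return adj


def local_moving(adj, max_passes=100):
    """First phase of Louvain only: greedily move single nodes to the neighbouring
    community with the largest modularity gain until no move improves Q."""
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    m2 = sum(degree.values())       # 2 * total edge weight
    comm = {u: u for u in adj}      # start from singleton communities
    tot = dict(degree)              # total degree of each community
    if m2 == 0:
        return comm
    for _ in range(max_passes):
        moved = False
        for u in adj:
            links = defaultdict(float)  # edge weight from u to each neighbouring community
            for v, w in adj[u].items():
                links[comm[v]] += w
            cu = comm[u]
            tot[cu] -= degree[u]        # take u out of its current community
            best_c = cu
            best_gain = links.get(cu, 0.0) - tot[cu] * degree[u] / m2
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[u] / m2  # modularity gain up to a constant factor
                if gain > best_gain:
                    best_c, best_gain = c, gain
            tot[best_c] += degree[u]
            if best_c != cu:
                comm[u] = best_c
                moved = True
        if not moved:
            break
    return comm


def aug_communities(n, edges, X):
    """Partition of the original nodes obtained from the augmented graph."""
    comm = local_moving(build_augmented_graph(n, edges, X))
    groups = defaultdict(set)
    for u in range(n):              # attribute nodes are dropped here
        groups[comm[u]].add(u)
    return sorted(groups.values(), key=min)


# A 4-cycle 0-1-2-3-0; nodes 0, 1 share attribute 0 and nodes 2, 3 share attribute 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
print(aug_communities(4, edges, X))  # [{0, 1}, {2, 3}]
```

In a real implementation the `local_moving` step would normally be replaced by a complete Louvain implementation; the sketch keeps it inline only so that the example stays self-contained.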

3 Experiments

We first analyze why the new method (AUG for short) works, and then compare it with several state-of-the-art community detection methods. The datasets used are listed in Table 1.

3.1 Why Our Method Works

To validate the new method, we show the original and augmented networks of the Cornell dataset, together with their community partitions, in Fig. 2. For the original Cornell network in Fig. 2(a), the community structure is not consistent with the ground truth (i.e., nodes within communities are sparsely connected, while nodes in different communities are densely connected), which causes a large number of nodes to be misclassified (Fig. 2(b)). In comparison, AUG assigns most nodes to the correct communities (Fig. 2(d)), because adding attributes as nodes and edges improves the community structure of the network (Fig. 2(c)). This indicates that adding attributes as nodes and edges can effectively complement the network topology with community information, so that better communities can be found.

3.2 Comparison with Existing Methods

We compared the new method with two types of state-of-the-art community detection methods. The first type, including DCSBM [13] and NetMRF [10], uses network topology alone. The second type, including PCLDC [20], SCI [19] and NEMBP [9], uses both topological and attribute information. All compared methods require the number of communities to be specified, whereas our method does not. We therefore set their number of communities to the ground-truth value and ran them with their default parameters. We used Accuracy (AC) and Normalized Mutual Information (NMI) [7] as evaluation metrics.
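The text does not spell out how AC and NMI are computed, so the Python sketch below shows one common variant of each, as an assumption rather than the exact evaluation code behind Table 2. NMI uses the normalization \(2I(T;P)/(H(T)+H(P))\) over the ground-truth labels T and the detected labels P. AC is the fraction of nodes labeled correctly under the best one-to-one matching between detected and ground-truth communities; unmatched communities count as errors when \(K' \ne K\). The matching is found by brute force over permutations, which is practical only for a small number of communities; a Hungarian-algorithm solver is the usual choice for larger K.

```python
import math
from collections import Counter
from itertools import permutations


def nmi(truth, pred):
    """NMI = 2 I(T;P) / (H(T) + H(P)) for two flat labelings of the same nodes."""
    n = len(truth)
    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(nij / n * math.log(nij * n / (ct[t] * cp[p])) for (t, p), nij in joint.items())
    ht = -sum(c / n * math.log(c / n) for c in ct.values())
    hp = -sum(c / n * math.log(c / n) for c in cp.values())
    if ht + hp == 0:  # both labelings put all nodes in a single community
        return 1.0
    return 2 * mi / (ht + hp)


def accuracy(truth, pred):
    """Fraction of nodes labeled correctly under the best one-to-one
    matching of detected to ground-truth communities (brute force)."""
    overlap = Counter(zip(pred, truth))  # (detected, true) -> number of shared nodes
    preds, trues = sorted(set(pred)), sorted(set(truth))
    best = 0
    if len(preds) <= len(trues):
        for perm in permutations(trues, len(preds)):
            best = max(best, sum(overlap[(p, t)] for p, t in zip(preds, perm)))
    else:
        for perm in permutations(preds, len(trues)):
            best = max(best, sum(overlap[(p, t)] for p, t in zip(perm, trues)))
    return best / len(truth)


truth = [0, 0, 0, 1, 1, 1]
pred = [0, 0, 1, 1, 2, 2]  # K' = 3 detected communities vs. K = 2
print(round(accuracy(truth, pred), 4))            # 0.6667
print(round(nmi(truth, [1, 1, 1, 0, 0, 0]), 4))  # 1.0 (same partition, relabeled)
```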

Table 2. Comparison of different community detection methods in terms of AC and NMI. ORG denotes applying Louvain to the original network. \(K'\) is the number of communities detected. Bold font denotes the best results.


As shown in Table 2, AUG substantially outperforms ORG, confirming that adding attributes as nodes to the original network effectively improves the quality of the detected community structure. In addition, the number of communities found by AUG (Table 2) is much closer to the ground-truth number (Table 1) than that found by ORG. Compared with existing methods, AUG achieves the best performance on 6 of 7 networks in terms of AC and on 4 of 7 in terms of NMI. In general, methods that use both topology and attribute information perform better than those using topology alone. Among the methods using both types of information, AUG is on average 26.8%, 17.2% and 6.3% more accurate than PCLDC, SCI and NEMBP in AC, and 24.3%, 22.3% and 2.5% more accurate than PCLDC, SCI and NEMBP in NMI. These results further validate the superiority of the new approach in finding communities.

4 Conclusion

In this paper, we proposed a simple and efficient method for community detection that adds attributes as nodes to the original network. It makes full use of network topology and attribute information, especially when the topology contains serious noise. Extensive experiments on 7 real-world networks demonstrated that our method outperforms state-of-the-art approaches for community detection.

References

[1] Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech-Theory E P10008, 1–12 (2008)
[2] Chakraborty, T., Dalmia, A., Mukherjee, A., Ganguly, N.: Metrics for community analysis: a survey. ACM Comput. Surv. 50(4), 54:1–54:37 (2017)
[3] Combe, D., Largeron, C., Géry, M., Egyed-Zsigmond, E.: I-Louvain: an attributed graph clustering method. In: Fromont, E., De Bie, T., van Leeuwen, M. (eds.) IDA 2015. LNCS, vol. 9385, pp. 181–192. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24465-5_16
[4] Falih, I., Grozavu, N., Kanawati, R., Bennani, Y.: ANCA: Attributed network clustering algorithm. In: Complex Networks, pp. 241–252 (2018)
[5] Falih, I., Grozavu, N., Kanawati, R., Bennani, Y.: Community detection in attributed network. In: Proceedings of the WWW, pp. 1299–1306 (2018)
[6] Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010)
[7] Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016)
[8] Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. Proc. Natl. Acad. Sci. 99(12), 7821–7826 (2002)
[9] He, D., Feng, Z., Jin, D., Wang, X., Zhang, W.: Joint identification of network communities and semantics via integrative modeling of network topologies and node contents. In: Proceedings of the AAAI, pp. 116–124 (2017)
[10] He, D., You, X., Feng, Z., Jin, D., Yang, X., Zhang, W.: A network-specific Markov random field approach to community detection. In: Proceedings of the AAAI, pp. 306–313 (2018)
[11] Jacomy, M., Venturini, T., Heymann, S., Bastian, M.: Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS One 9(6), e98679 (2014)
[12] Jin, D., Wang, H., Dang, J., He, D., Zhang, W.: Detect overlapping communities via ranking node popularities. In: Proceedings of the AAAI, pp. 172–178 (2016)
[13] Karrer, B., Newman, M.E.J.: Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011)
[14] Leskovec, J.: Stanford network analysis project (2016). http://snap.stanford.edu
[15] Muslim, N.: A combination approach to community detection in social networks by utilizing structural and attribute data. Soc. Networking 05, 11–15 (2016)
[16] Newman, M.E.J.: Modularity and community structure in networks. Proc. Natl. Acad. Sci. 103, 8577–8582 (2006)
[17] Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113 (2004)
[18] Noack, A.: Modularity clustering is force-directed layout. Phys. Rev. E 79, 026102 (2009)
[19] Wang, X., Jin, D., Cao, X., Yang, L., Zhang, W.: Semantic community identification in large attribute networks. In: Proceedings of the AAAI, pp. 265–271 (2016)
[20] Yang, T., Jin, R., Chi, Y., Zhu, S.: Combining link and content for community detection: a discriminative approach. In: Proceedings of the ACM SIGKDD, pp. 927–936 (2009)


Acknowledgments

This work was supported by the Natural Science Foundation of China (61876128, 61772361).

Author information

Authors and Affiliations

College of Intelligence and Computing, Tianjin University, Tianjin, 300350, China
Dongxiao He, Yue Song & Di Jin


Corresponding author

Correspondence to Di Jin.

Editor information

Editors and Affiliations

University of Piraeus, Piraeus, Greece
Christos Douligeris
University of Vienna, Vienna, Austria
Dimitris Karagiannis
University of Piraeus, Piraeus, Greece
Dimitris Apostolou


About this paper

Cite this paper

He, D., Song, Y., Jin, D. (2019). A Simple and Effective Community Detection Method Combining Network Topology with Node Attributes. In: Douligeris, C., Karagiannis, D., Apostolou, D. (eds) Knowledge Science, Engineering and Management. KSEM 2019. Lecture Notes in Computer Science, vol 11775. Springer, Cham. https://doi.org/10.1007/978-3-030-29551-6_15


DOI: https://doi.org/10.1007/978-3-030-29551-6_15
Published: 21 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29550-9
Online ISBN: 978-3-030-29551-6
eBook Packages: Computer Science, Computer Science (R0)
