Keywords

1 Introduction

Network science is a modern and significant discipline in many fields, such as social and computer science. Networks, consisting of nodes and edges which connect a pair of nodes, always occur in a variety of contexts [1]. The real-world networks usually share the same characteristic: they exhibit strong community structure. The property of community structure is: in which network nodes are joined together in tightly knit groups, between which there are only looser connections [2]. For example, in Facebook, users who have consistent interests often gather together and form a community but there are only few connections between such communities. Community structure reveals the fundamental functional modules of a network and enables us to better understand the interactive behavior of the network.

Community detection has developed rapidly in recent years and various community detection methods, which mainly focus on network topology, have been proposed, e.g., the agglomerative or divisive algorithms [3], modularity optimization based methods [4], and spectral algorithms [5]. Further, it is well known that a node may belong to multiple communities (i.e. overlapping community). As a result, lots of methods were developed to detect overlapping community, such as k-clique community detection algorithms [6], local expansion and optimization algorithms [7] and probabilistic model-based algorithms [8]. Except for network topology, node attributes or link attributes are also taken into account when discovering communities [9,10,11]. In addition to improving community detection, researchers have realized that community detection should not only find community structure but also describe communities semantically by the use of abundant verbal information in the textual content. These descriptions can reveal why some nodes form a community and enable people to better understand the functions or meanings of communities, and in a way, this has much more practical value in real-world applications. Some methods have been proposed which combine topology and content information and give reasonable and interpretable communities [12, 13].

However, some problems still occur and need to be solved when network topology and node contents are integrated. One of the most important issues is the mismatch problem of topology and content. Traditional methods [12,13,14] typically assume that the network topology and node contents share the same community membership, but in many real social networks, this assumption does not always hold. For example, in a Twitter network, social links usually directly reflect which users gather into a community, while users may generate diverse and disordered content information. Thus, the community membership derived by network topology probably differs from the cluster membership derived by node contents.

For the above problem, it is necessary to extract useful content information to assist topology information in detecting more actual and accurate communities. In this paper, we propose a new generative model different from the traditional generative model and design a new community detection method, referred to as Robust and Strong Explanatory Community Detection (RSECD). To be specific, based on nonnegative matrix factorization (NMF), we are able to obtain the community membership matrix for network topology and cluster membership matrix for node contents. More importantly, there exists some implicit relation between network communities and content clusters, thus we introduce a transition probability matrix to depict it. As a result, even though the content information does not match with topology information, our method can still obtain accurate detection results by using the transition matrix with a suitable prior. At last, we put network topology, node content and transition matrix into a unified NMF framework, and optimize them altogether by designing effective updating rules in order to achieve an integral balance of them.

In the experiments, we use artificial networks to analyze the parameter in the objective function and to demonstrate the effectiveness and robustness of our approach. Next, we conduct experiments on seven real-world network datasets and compare RSECD with eight baseline methods in terms of both disjoint community and overlapping community evaluation metrics. Experimental results show that RSECD can significantly improve the performance in all comparisons, which further illustrates our approach’s robustness. And finally, in order to verify that RSECD is strong-explanatory to communities, we use a case study on a musical social network to semantically explain the hidden meanings of some topics and tell the ‘true stories’ behind communities.

2 Related Work

Various community detection methods, which only take the network topology into account, have been proposed. For example, hierarchical clustering methods [3] which include agglomerative and divisive hierarchical algorithms. Optimal modularity approaches (such as spectrum optimization method [5]) can find communities by the use of modularity optimization. Another approach [4] applies modularity into graphs of different networks by correcting modularity, such as symbolized networks. By mapping a network into a Laplacian matrix and calculating its eigenvector values, spectral methods can find each node’s corresponding community accurately.

With in-depth analysis and research of complex network, the content information of complex networks shows its value and some community detection methods, which integrate the content information with network topology, have been developed. For instance, a subgraph overlapping clustering algorithm combining network structure and content information is proposed [9]. This method applies expectation-maximization (EM) algorithm to maximize likelihood function to generate stationary candidate subgraphs, and then uses k-means algorithm to cluster edges in order to obtain the overlapping community structure. A new generative probabilistic model is proposed which is learned by using a nested expectation-maximization algorithm and can describe the generalized communities [10]. In [11], a co-learning strategy is developed to jointly train the two parts (communities and semantics) in the model by combining a nested EM algorithm and belief propagation.

Recently, researchers have also realized that community detection should not only find communities, but also use rich verbal information in the text to give semantic description of communities. The description information reveals why some nodes gather into a community and helps people better understand the functions or implications of communities. For example, the approach in [12] using nonnegative matrix factorization integrates two tasks of community detection and user profiling into a unified model, and then achieves community profiling by a linear operator integrating the profiles of users. A joint community profiling and detection (CPD) model [13] is proposed which describes communities by published content and friendship links of users. In addition, the method SCI [14], which can detect and describe communities, has also been proposed. This method uses nonnegative matrix factorization to integrate topology and content information into a unified model, and achieves relatively high detection accuracy in comparison with other methods. More importantly, SCI can not only detect communities, but also analyzes the semantics of detected communities. In general, this type of method has more practical value than others without semantics.

However, the methods mentioned above mainly focus on how to effectively fuse topology structure and content information to improve the performance of community detection while do not further consider how to detect communities more robustly, especially when the node contents do not match well with network communities. Moreover, most of these methods can only interpret each community using a single topic, which is far from satisfactory in many real applications.

3 RSECD: The Network Model

Our proposed RSECD approach extends the previous SCI approach by introducing a transition probability matrix with a suitable prior to represent the hidden relationship between network communities and content clusters. In this section, firstly, we illustrate the difference between traditional generative model and our proposed new generative model; then we give some notations. Finally, we elaborate how to model RSECD.

3.1 Traditional Generative Model vs. New Generative Model

Most of community detection methods [9,10,11,12,13,14] follow traditional generative model which generally assumes that network topology and node contents share the same community structure (as shown in Fig. 1(a)). While in many real-world networks, network topology and node contents may implicate different community structures, so that we modify the traditional generative model and design a more reasonable generative model, as shown in Fig. 1(b). In this new model, node contents N implicates topic cluster T (not community structure C) and topic cluster T is generated by community structure C and transition probability matrix X together.

Fig. 1.
figure 1

A comparison of traditional generative model and our proposed new model. (a) is the traditional generative model where community structure C directly generates network topology G and node contents N. (b) is RSECD’s generative model where node contents N implicates topic cluster T (not community structure C) and topic cluster T is generated by community structure C and transition probability matrix X together. In addition, identity matrix I, as the transition matrix’s prior, plays a key guiding role in fusing these two types of information.

3.2 Notations

For an undirected network G with n nodes and e edges, we represent it by a binary-valued adjacency matrix \( {\text{A}}\;{ \in }\;{\text{R}}^{n \times n} \). Each node i has its attributes Si, which may be the semantic information of the node. Si is in the form of an m-dimensional binary-valued vector. All of Si form an attribute matrix \( \text{S}\; \in \;\text{R}^{n \times m} \). The community detection task is: when A and S are observed, on topology, we need to find k different communities; on content cluster with semantics, we need to find k′ different topics and infer the semantics for each community. Because all of the baseline algorithms assume that the number of communities is equal to that of topics, we still assume k = k′ in this paper. However, our RSECD algorithm can also apply equally to k = k′.

3.3 Modeling Network Topology

Our network topology model is based on the following intuitive properties: (1) if two nodes belong to the same community, they are more likely to be connected; (2) if two nodes have similar community memberships, they have a high probability to be linked. We define the propensity of node i belonging to community c as uic. Then we have a community membership of all nodes denoted as U = (uic)n×k. Based on the first propensity, we can use uicujc to represent the expected number of edges between nodes i and j in community c. Based on the second propensity, we can achieve that the expected number of edges between nodes i and j in the whole network is \( \sum\nolimits_{c = 1}^{k} {u_{ic} u_{jc} } \). Considering all nodes, we have the following loss function:

$$ \mathop{\text{min}}\limits_{{\rm U} \ge {0}} ||{\text{A}} - {\text{UU}}^{{\rm T}} ||_{\text{F}}^{2} $$
(1)

3.4 Modeling Node Attributes

We define the propensity of topic t having attribute q as cqt and the propensity of node i belonging to topic t as vit. Then we have an attribute membership of all topics denoted as C = (cqt)m×k and a topic cluster membership of all nodes denoted as V = (vit)n×k. In addition, we define the propensity of a node i having attribute q as siq, which is an element of attribute matrix S. We suppose that if node i belongs to topic t, node i and topic t will have similar attributes information. It can be represented as \( s_{iq} = \sum\nolimits_{t = 1}^{k} {v_{it} c_{tq} } \). Then we have the following loss function:

$$ \mathop {\text{min}}\limits_{{\text{C} \ge \text{0,V} \ge {0}}} ||{\text{S}} - {\text{VC}}^{{\rm T}} ||_{\text{F}}^{{\rm 2}} $$
(2)

3.5 Modeling Transition Probabilities

Transition probability is an important concept of Markov chain and is defined as the probability of transferring from one state to another. We introduce transition probabilities to represent the relationship between network communities and topic clusters. Here the probability transferring from community c to topic t is defined as xct, the probability vector transferring from community c to any topic is defined as xc (xc satisfies a probability distribution) and the probability matrix transferring from any community to any topic is defined as X. Moreover, to effectively guide the fusion of topology and content, we employ identity matrix I as the prior of X. Then we have the following loss function:

$$ \mathop {\hbox{min} }\limits_{{{\text{X}} \ge 0}} ||{\text{UX}} - {\text{V}}||_{{\rm F}}^{2} + ||{\text{X}}1_{k}^{{\rm T}} - 1_{k}^{{\rm T}} ||_{\text{F}}^{2} + ||{\text{I}} - {\text{X}}||_{{\rm F}}^{2} $$
(3)

where \( 1_{k}^{\text{T}} \in {\text{R}}^{k \times 1} \) and all of its elements are 1.

3.6 The Unified Model

By combining the objective functions of the above formulas (including (1) to (3)), we obtain RSECD’s overall loss function:

$$ \mathop {\hbox{min} }\limits_{\begin{subarray}{l} {\text{U}} \ge 0,{\text{V}} \ge 0, \\ {\text{C}} \ge 0,{\text{X}} \ge 0 \end{subarray} } L = {\kern 1pt} {\kern 1pt} {\kern 1pt} ||{\text{A}} - {\text{UU}}^{{\rm T}} ||_{{\rm F}}^{2} + {\kern 1pt} {\kern 1pt} \alpha ||{\text{S}} - {\text{VC}}^{{\rm T}} ||_{{\rm F}}^{{2}} + ||{\text{UX}} - {\text{V}} ||_{\text{F}}^{{2}} + ||{\text{X}} {1}_{k}^{{\rm T}} { - 1}_{k}^{{\rm T}} {||}_{{\rm F}}^{{\rm 2}} + {||}{\text{I}} - {\text{X}} {||}_{{\rm F}}^{{2}} $$
(4)

where α is a balance parameter between network topology and node contents.

Our RSECD model can deal with the topology and content’s mismatch problem in networks well. To be specific, (1) when topology matches with content very well, the first two parts of unified model (network topology model and node attributes model) work so that topology and content can reinforce each other in order to find more exact community structure. (2) When only some parts of content match with network topology, RSECD can also extract useful material from content information to assist topology information in detecting more actual and accurate communities by the mapping and tractive function of transition matrix X. (3) When content does not match with topology at all, matrices U and V are almost orthogonal, thus matrix X is close to a random matrix and the final result is equal to that of using only topology information. In addition, the optimized X essentially represents the mapping relationship between communities and topics, so that we can also use X to explain the detected communities. So our RSECD is robust and strong-explanatory to community detection. We will further use extensive experiments (including a case study) to demonstrate these cases.

4 Optimization

Since the objective function in (4) is not convex, it is hard to obtain the global optimal solution. Fortunately, the local minima of (4) can be obtained using the Majorization-Minimization framework [16]. Here we describe an algorithm that iteratively updates U with V, C, X fixed, updates V with U, C, X fixed, updates C with U, V, X fixed, and updates X with U, V, C fixed, which guarantees that our objective does not increase and the parameters keep nonnegative (with any nonnegative initial seeds) after each iteration. The specific formulas are shown in the following subproblems.

4.1 U-Iteration

When updating U, we need to solve the following problem:

$$ \mathop {\hbox{min} }\limits_{{{\text{U}} \ge 0}} L({\text{U}}) = {\kern 1pt} {\kern 1pt} {\kern 1pt} ||{\text{A}} - {{\rm UU}}^{{\rm T}} ||_{{\rm F}}^{2} + ||{\text{UX}} - {{\rm V}}||_{{\rm F}}^{2} $$
(5)

An arbitrary matrix M satisfies \( ||{\text{M}}||_{{\rm F}}^{2} = {{\rm tr}}({{\rm MM}}^{{\rm T}}) \), so we transform this problem as:

$$ L({\text{U}}) = {{\rm tr}}({{\rm A}}^{{\rm T}} {{\rm A}} - {\text{A}}^{{\rm T}} {{\rm UU}}^{{\rm T}} - {{\rm UU}}^{{\rm T}} {{\rm A}} + {\text{UU}}^{{\rm T}} {{\rm UU}}^{{\rm T}} ) + {\text{tr}}({{\rm X}}^{{\rm T}} {{\rm U}}^{{\rm T}} {{\rm UX}} - {\text{X}}^{{\rm T}} {{\rm U}}^{\text{T}} {{\rm V}} - {\text{V}}^{{\rm T}} {{\rm UX}} + {\text{V}}^{{\rm T}} {{\rm V}}) $$
(6)

We then take a derivative with respect to U and get the following formula:

$$ \frac{{\partial L({\text{U}})}}{{\partial {\text{U}}}} = - 2({{\rm A}}^{{\rm T}} + {{\rm A}}){{\rm U}} + 2({\text{UX}} - {\text{V}}){\rm X}^{{\rm T}} + 4 {{\rm UU}}^{{\rm T}} {{\rm U}} $$
(7)

In order to reduce computational cost, we use a multiplicative update algorithm based on the Oja’s iterative learning rule [15] to update U. We decompose (7) into two sets:

$$ \nabla_{\text{U}} L({{\rm U}}) = \nabla_{{ + }} - \nabla_{ - } $$
(8)

where \( \nabla_{{ + }} \) (\( \nabla_{{ - }} \)) is the sum of all positive (negative) components, then we have:

$$ {\text{U}}_{{\rm new}} = {\text{U}}_{{\rm old}} \frac{{{\nabla }_{{ - }} }}{{{\nabla }_{{ + }} }} $$
(9)

In (7), the negative terms are 2ATU, 2AU, 2VXT and the positive terms are 2UXXT, 4UUTU. So we have the updating rule of U as:

$$ u_{ij} \leftarrow u_{ij} (\frac{{{\text{A}}^{{\rm T}} {{\rm U}} + {\text{AU}} + {{\rm VX}}^{{\rm T}} }}{{{{\rm UXX}}^{{\rm T}} + 2 {\text{UU}}^{{\rm T}} {{\rm U}}}})_{ij} $$
(10)

4.2 V-Iteration and C-Iteration

When updating V, we need to solve the following problem:

$$ \mathop {\hbox{min} }\limits_{{{\text{V}} \ge 0}} L({\text{V}}) = \alpha ||{\text{S}} - {{\rm VC}}^{{\rm T}} {||}_{\text{F}}^{{\rm 2}} + ||{\text{UX}} - {{\rm V}}||_{\text{F}}^{{2}} $$
(11)

In order to iterate V, we transform this problem into the following equation:

$$ L({\text{V}}) = \alpha \cdot {\text{tr}}({{\rm S}}^{{\rm T}} {{\rm S}} - {{\rm S}}^{{\rm T}} {{\rm VC}}^{{\rm T}} - {{\rm CV}}^{{\rm T}} {{\rm S}} + {\text{CV}}^{{\rm T}} {{\rm VC}}^{{\rm T}} ) + {{\rm tr}}({{\rm X}}^{{\rm T}} {{\rm U}}^{{\rm T}} {{\rm UX}} - {{\rm X}}^{{\rm T}} {{\rm U}}^{{\rm T}} {{\rm V}} - {\text{V}}^{{\rm T}} {{\rm UX}} + {\text{V}}^{{\rm T}} {{\rm V}}) $$
(12)

We then take a derivative with respect to V and get the next formula:

$$ \frac{{\partial L({\text{V}})}}{{\partial {\text{V}}}} = - 2\alpha {\text{SC}} - 2{\text{UX}} + 2\alpha {\text{VC}}^{{\rm T}} {{\rm C}} + 2{\text{V}} $$
(13)

Similar to (10), we then obtain the updating rule of V as:

$$ v_{ij} \leftarrow v_{ij} (\frac{{\alpha {\text{SC}} + {{\rm UX}}}}{{\alpha {\rm VC}^{{\rm T}} {{\rm C}} + {{\rm V}}}})_{ij} $$
(14)

When updating C, similar to the steps from (11) to (14), we obtain the updating rule of C as:

$$ c_{ij} \leftarrow c_{ij} (\frac{{{\text{S}}^{{\rm T}} {{\rm V}}}}{{{{\rm CV}}^{{\rm T}} {\text{V}}}})_{ij} $$
(15)

4.3 X-Iteration

When updating X, we need to solve the following problem:

$$ \mathop {\hbox{min} }\limits_{{{\text{X}} \ge 0}} L({\text{X}}) = ||{\text{UX}} - {\text{V}}||_{{\rm F}}^{2} + ||{\text{I}} - {\text{X}}||_{{\rm F}}^{2} + ||{\text{X}}1_{k}^{{\rm T}} - 1_{k}^{{\rm T}} ||_{{\rm F}}^{2} $$
(16)

To iterate X, we can transform this problem into the following equation:

$$ \begin{aligned} L({\text{X}}) & = {\text{tr}}({{\rm X}}^{{\rm T}} {{\rm U}}^{{\rm T}} {{\rm UX}} - {\text{X}}^{{\rm T}} {{\rm U}}^{{\rm T}} {{\rm V}} - {\text{V}}^{{\rm T}} {\text{UX}} + {\text{V}}^{{\rm T}} {{\rm V}}) \\ & + {\text{tr}}(1_{k} {\text{X}}^{{\rm T}} {{\rm X1}}_{k}^{{\rm T}} - 1_{k} {\text{X}}^{{\rm T}} 1_{k}^{{\rm T}} - 1_{k} {\text{X}} 1_{k}^{{\rm T}} +1_{k} 1_{k}^{{\rm T}} ) + {{\rm tr}}({{\rm I}} - {\text{X}} - {\text{X}}^{{\rm T}} + {\text{X}}^{{\rm T}} {{\rm X}}) \\ \end{aligned} $$
(17)

We then take a derivative with respect to X and get the next formula:

$$ \frac{{\partial L({\text{X}})}}{{\partial {\text{X}}}} = - 2{\text{U}}^{{\rm T}} {\text{V}} - 2{\text{I}} - 2{\text{M}} + 2{\text{U}}^{{\rm T}} {\text{UX}} + 2{{\rm XM}} + 2{{\rm X}} $$
(18)

where \( {\text{M}} \in {\text{R}}^{{k{ \times }k}} \) and its elements are all 1. In (18), the negative terms are 2UTV, 2I, 2M and positive terms are 2UTUX, 2XTM, 2X. So we obtain the updating rule of X:

$$ x_{ij} \leftarrow x_{ij} (\frac{{{\text{U}}^{{\rm T}} {\text{V}} + {\text{I}} + {\text{M}}}}{{{{\rm U}}^{{\rm T}} {\rm UX} + {\text{XM}} + {\text{X}}}})_{ij} $$
(19)

5 Experiments

Here we first use artificial networks to analyze the influence of parameter α in the objective function and demonstrate that our approach can solve the mismatch problem well. We then compare our method with eight state-of-the-art algorithms on seven real datasets in terms of four well-known metrics. And finally, we discuss a case study analysis to show that our method has a strong explanatory capability to communities.

5.1 Artificial Networks

We use the Newman’s model [2] to generate artificial benchmark networks. Each network has 128 nodes which have been divided into 4 communities. Each node has zin edges connecting to the nodes of the same community and zout edges connecting to the nodes of different communities (zin + zout = 16). In addition, all nodes are partitioned into 4 clusters corresponding to 4 communities. To be specific, for each node in the sth cluster, we use a binomial distribution with mean pin = hin/h to generate a h-dimensional binary vector as its ((s − 1) × h + 1)-th to (s × h)-th attributes and use a binomial distribution with mean pout = hout/(3 h) to generate its rest attributes. In our experiment, we set h = 50, zout = hout = 8 and use normalized mutual information (NMI) [19] as the metric. To simulate real-world networks’ mismatch problem, we use pmis (ranging from 0 to 1) to reveal the mismatch rate between network topology and node contents. For example, if pmis = 0.8, then in this network, there are 20% of nodes whose contents match with topology and 80% of nodes whose contents do not match with topology. In the first experiment, based on experience, we consider four choices for parameter α (α = 1, \( ||{\text{A}}||_{{\rm F}}^{{2}} \), \( 1/||{\text{S}}||_{\text{F}}^{{2}} \), or \( ||\text{A}||_{\text{F}}^{{2}} /||\text{S}||_{\text{F}}^{2} \)) and respectively compute the average NMI values under them. The results are shown in Fig. 2(a), when pmis is less than 0.6 (this corresponds to most cases in real-world networks), the result under α = \( {||}{\text{A}}{||}_{\text{F}}^{{2}} \) is greater than the others, so we conclude that choosing α = \( {||}{\text{A}}{||}_{\text{F}}^{{2}} \) as the default value may be better than the other three choices.

Fig. 2.
figure 2

Results on artificial networks. (a) is the NMI results under 4 different choices of parameter α. (b) is 3 different methods’ NMI results when the mismatch rate pmis varies from 0 to 1. (c) is 3 different methods’ NMI results when hout varies from 0 to 12 under pmis = 0.

Next, to illustrate RSECD’s robustness, we compare three methods—Topo, SCI and RSECD. Topo is a variant of RSECD using topology information alone. SCI is a NMF-based method using topology and content information together but did not consider the mismatch problem [14]. As shown in Fig. 2(b), Topo keeps a stable detection accuracy no matter how pmis changes because the topology information existing in the network is fixed. When pmis is less than 0.3, as SCI combines topology and content information together, it has higher accuracy than Topo. However, because SCI fails to solve the mismatch problem, when pmis is greater than 0.4, the performance of SCI gradually weakens and is worse than Topo. RSECD, as the extended work of SCI, has better performance than Topo and SCI when pmis is less than 0.7. Moreover, when pmis is larger than 0.7 (i.e., a high mismatch rate in the network), RSECD is just slightly worse than Topo but much better than SCI. In summary, the result demonstrates that: (1) when content match with topology well, RSECD can better combine topology and content to find communities; (2) when content does not match with topology, RSECD can also solve the mismatch problem well. Therefore, RSECD is robust.

Finally, because the cluster structure implicated by content information may be indistinct in the real-world networks, we design a third experiment. In this part, we set pmis = 0 and relieve the constraint hout = 8, making hout vary from 0 to 12. The larger hout is, the higher distinct degree is. The final result is shown in Fig. 2(c). As we can see, RSECD’s accuracy is almost always higher than that of SCI. Even though when the cluster structure is very indistinct, RSECD’s accuracy does not decline too much and is very close to that of Topo.

5.2 Real-World Networks

Datasets.

We use 7 real networks [17, 18] with node attributes and ground-truth community labels. These datasets are often used in the field of community detection by researchers and their detailed information is shown in Table 1. In this table, the number of attributes represents the total number of attributes in the network.

Table 1. Datasets used.

Metrics.

To test RSECD’s performance, we conduct a quantitative analysis of the final detection results using two types of metrics (disjoint community metrics and overlapping community metrics). For disjoint community metrics, we choose accuracy (AC) [19] and normalized mutual information (NMI) [19]. AC is used to measure the percentage of correct labels obtained. In clustering applications, NMI is used to measure how similar two sets of clusters are. For overlapping community metrics, we choose F-score [20] and Jaccard similarity [20]. Both of them are common metrics which are used to quantify the performance in terms of the agreement between the ground-truth communities and the detected communities.

Baselines.

To illustrate RSECD’s effectiveness, we choose three types of baseline algorithms including two topology-based methods (DCSBM [21] and BigCLAM [22]), one content-based method (AP [23]), and five methods using both topology and content (CESNA [24], DCM [25], PCL-DC [26], Block-LDA [27] and SCI [14]).

Setting.

In the experiments, first for each network we uniformly set α to be \( ||\text{A}||_{\text{F}}^{{2}} \) based on previous parameter analysis. We then repeat RSECD algorithm 20 times with different random seeds. We obtain the result which corresponds to the smallest loss function value as the final result.

Results.

We show the final results in Tables 2 and 3. It is worth noting that AP cannot compute accuracy (AC) value, and CESNA and DCM are only applicative to overlapping community metrics. In the tables, we use bold to mark the best results. Table 2 shows the comparison results in terms of AC and NMI. In AC, our method RSECD performs best among all the five methods. In NMI, RSECD still achieves the best results in comparison to the other methods. All the comparison results using different algorithms under overlapping community metrics are shown in Table 3. In these results, RSECD again has the best performance in comparison to the other tested approaches. In summary, the main reasons that our algorithm achieves such superior performance are as follows: (1) RSECD assumes that topology and content do not share the same community structure, so that those harmful content information will not interfere with topology information’s important role in community detection; (2) transition probability matrix, as a filter of content information, can retain beneficial content information which can assist topology information in detecting more actual, accurate communities and remove harmful content information which has wrong guidance in community detection. Therefore, RSECD can solve the mismatch problem well and the final performance results are relatively high and stable in any case.

Table 2. Performance comparison of different methods using disjoint community metrics. Here “topo”, “cont”, “both” denote methods using topology, contents, and topology-and-contents.
Table 3. Performance comparison of different methods using overlapping community metrics.

Efficiency.

As like standard nonnegative matrix factorization, the calculational complexity of RSECD is \( O(T(n^{2} k + 2mnk + nk^{2} )) \) where T is the number of iterations, n the number of nodes, k the number of communities (k << n) and m the number of attributes. By taking into account the sparsity of the adjacency matrix A and attribute matrix S, RSECD needs \( O(T(ek + 2e'k + nk^{2} ) \) time where e is the number of edges (e << n) and e’ the number of nonzero elements in the attribute matrix S (e′ << m). Thus, the computational complexity of RSECD is near linear with the number of nodes. We also report RSECD’s running time. It needs 2.893 s (here “s” denotes seconds), 8.9 s, 8.233 s, 10.952 s, 14.041 s, 6248.029 s and 5760.169 s, respectively, on the datasets Facebook, Cornell, Texas, Washington, Wisconsin, Citeseer and Uai2010.

5.3 A Case Study on Lastfm

We select LASTFM datasetFootnote 1, which comes from a musical social network, as our dataset for the case study analysis. This dataset contains 1,892 users and the total number of attributes in the network is 11,946. These attributes reveal users’ favorite songs or singers. LASTFM does not have the ground-truth of community labels. While, all the methods used in this work need the number of communities to be given. So, as did in [14], we use Louvain method [28] to set the number of communities in this network to 38. Two vivid examples to interpret the communities derived are shown in Figs. 3 and 4 in the form of word clouds. Word clouds can graphically show different attribute words’ importance degree in one community in order to explain the current community’s semantics. That is, in a word cloud, the size of a word is proportional to the probability that it belongs to this community.

Fig. 3.
figure 3

Word clouds for the 30th community. (a) denotes topic 1 and (b) denotes topic 32, both of which are dominant topics of the 30th community.

Fig. 4.
figure 4

Word clouds for the 16th community. This community contains three dominant topics, in which (a) denotes topic 13, (b) denotes topic 24 and (c) denotes topic 36.

The first example is the 30th community which contains two dominant topics, i.e., topics 1 and 32. Topic 1, as shown in Fig. 3(a), is highly related to electronic pop music. The total of “electronic”, “electropop” and “electronica” has a high proportion in all attribute words and illustrates that the theme of topic 1 is pop electronic music. In addition, “australian”, “8-bit”, “synth pop”, “big beat” and “dark pop” are different styles of pop electronic music. On the other hand, topic 32, as shown in Fig. 3(b), mainly denotes synth pop music. Synth pop music origins from “new wave”, “post-punk” and is popular in “80 s”. “new romantic” is a synth pop song of Taylor Swift. “depeche mode” is a British band in style of alternative dance and synth pop. “electroclash” is another name of “tech pop” which contains the style of synth pop. “synth” and “synth pop” also appear here. It is worth noting that, these two topics which corresponds to electronic pop music of multiple styles and synth pop music, respectively, both belong to electronic pop music although being the different branches. Therefore, the 30th community will be a group of fans adoring electronic pop music mainly including synth pop music.

Our second example is the 16th community which contains three dominant topics, i.e., topic 13, 24 and 36. They are shown in Fig. 4(a), (b) and (c), respectively. Similar to the previous analysis, we found out that topic 13 is related to opera music (for example, “diva”, “female vocalist” appear here); topic 24 is related to country music and pop music (for example, “country”, “pop” appear here); and topic 36 is related to dance music (for example, “dance”, “disco” appear here). Simultaneously, these three topics have the same theme, i.e., female singer. So, we can conclude that the 16th community’s dominant topic is female singers and the three topics (topic 13, 24, 36) in this community all have their own accurate semantics, respectively. Specifically, topic 13, 24, 36 respectively reflects opera music, country music and dance music.

6 Conclusion

In this paper, we proposed a new community detection method (RSECD) which is able to detect communities and in the same time analyze the semantics of founded communities. We introduced a nonnegative matrix factorization model to depict the relationships between nodes, topics and communities more accurately. A transition probability matrix with a suitable prior was also introduced to show their hidden relationships to improve the robustness of the new model, especially when node contents do not match well with network topology. Through artificial benchmark networks, we analyzed the influence of parameter α in the objective function and demonstrated RSECD’s high level of robustness. On real-world networks, we showed that RSECD outperforms all of the baseline methods. Finally, the case study analysis on a musical social network showed how the semantic explanation of communities derived by RSECD works. This helps people to understand and interpret communities more precisely and in a human-readable form in many real applications.