
1 Introduction

In complex networks, there usually exist groups whose member nodes are connected more densely with one another than with nodes outside the group. These groups of nodes are called communities [13]. In reality, such groups usually have physical meanings, e.g., members of the same organization, scientists publishing in the same area, or proteins sharing the same function. Thus, uncovering latent communities in complex networks has attracted great research interest in the past decade [11]. Classic methods assume communities are mutually exclusive, i.e., each node of a network belongs to one and only one community. However, in real-world complex networks like social networks and biological networks, this restriction does not apply: a node may have multiple characteristics and thus belong to multiple communities. As a result, a more challenging problem named overlapping community detection has been introduced in recent years [31].

Matrix Factorization (MF), as one of the standard frameworks for solving the problem of overlapping community detection, detects communities from a global view [31]. Taking the adjacency matrix G of the given network as input, MF-based models fix the number of communities in advance and seek a node-community weight matrix F that matches the information revealed by the input as accurately as possible. Early work [23, 29] simply approximates G entry by entry with \(FF^T\), which makes use of only the mathematical representation of the adjacency matrix and ignores its physical meaning. The most obvious information an adjacency matrix provides is link information. Thus, recent work [32, 35] assumes that nodes sharing more communities have a higher probability of being linked and formulates the problem with a generative objective function. In other words, a link can be regarded as an indicator of the similarity of community membership between two nodes.
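To make the entry-wise flavor of the early MF models concrete, here is a minimal sketch (ours, not code from [23, 29]) that fits a non-negative F so that \(FF^T\) approximates G, using projected gradient descent on the squared Frobenius error; the function name and hyper-parameters are illustrative:

```python
import numpy as np

def nmf_communities(G, p, iters=200, lr=1e-3, seed=0):
    """Entry-wise MF baseline: find non-negative F (n x p) with G ~ F F^T.

    G: symmetric 0/1 adjacency matrix (dense, for illustration only).
    p: number of communities, fixed in advance as in MF-based models.
    """
    rng = np.random.default_rng(seed)
    n = G.shape[0]
    F = rng.random((n, p)) * 0.1
    for _ in range(iters):
        R = F @ F.T - G                      # residual of the approximation
        grad = 4 * R @ F                     # gradient of ||G - F F^T||_F^2
        F = np.maximum(F - lr * grad, 0.0)   # projection keeps F non-negative
    return F
```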

However, a link is not a perfect indicator for two major reasons. First, it is common that two nodes sharing several communities do not have a link between them, or two nodes with no common community are connected. A survey conducted on Facebook [9] shows that edges between two individuals from different communities outnumber edges connecting users in the same community. For example, a salesperson may make connections with many strangers to sell his products, and the establishment of links between salespeople and customers does not indicate any similarity between their community memberships. In cases like these, links become noise instead of evidence. Second, a link is a binary indicator in an unweighted network. Given two linked node pairs with no other information at all, it is impossible to distinguish which one is more similar.

Inspired by the definition of tie strength [12], we incorporate a more powerful indicator, the number of mutual friends between two nodes, to reflect their community membership similarity. The definition of tie strength reveals that the stronger the tie between two nodes, the larger the overlap between their friendship circles. This idea can be incorporated into our matrix factorization framework for overlapping community detection, matching the intuition that the more communities two nodes share, the more mutual friends they will have. For example, if two individuals attended the same class in high school, joined the same basketball team, and now work in the same company, they should know many mutual friends in different communities, i.e., their ego-networks (friend circles) are densely overlapped. Compared to a link, the number of mutual friends is no longer a binary indicator and provides more confidence for predicting the similarity of community membership between two nodes. However, it still suffers from several issues: two nodes with few friends can have only few mutual friends, and communities of different sizes may contribute different numbers of mutual friends to a node pair. To handle these limitations, we incorporate Mutual Density as a more consistent indicator, defined as the Jaccard similarity of two nodes' ego-networks. Under the general description of "neighborhood similarity", the concept of mutual density has been applied to community detection under different assumptions [1, 3, 21, 26, 28]. However, none of these methods are based on matrix factorization, and none of them use mutual density to measure the similarity of community membership between two nodes.

In this paper, we introduce mutual density and the number of mutual friends as new indicators, in place of links themselves, for inferring community membership similarity in the matrix factorization framework. We conduct data observations on real-world networks with ground-truth communities to validate that mutual density is more consistent with community membership similarity than the other two indicators. We then formulate our Mutual Density based Non-negative Matrix Factorization (MD-NMF) model, which incorporates mutual density as the community similarity indicator and employs a novel objective function to ensure that a node pair with higher mutual density is more likely to have higher community membership similarity. From a node's perspective, we ensure that it is more likely to join the same communities as its acquaintances than as its strangers. To solve the optimization problem, we apply projected stochastic gradient descent with sampling. Applying our model to real-world, openly available network datasets, we find that MD-NMF outperforms several state-of-the-art methods in terms of either modularity or \(F_1\) score.

The main contributions of this paper are:

  1. We incorporate Mutual Density as a new indicator to reflect the community membership similarity between two nodes, in substitution for a link, within the matrix factorization framework for overlapping community detection.

  2. We find that the mutual density of two nodes is consistent with their community membership similarity by empirically studying real-world networks with ground-truth communities.

  3. We propose a novel Mutual Density based Non-negative Matrix Factorization (MD-NMF) model for overlapping community detection by formulating mutual density properly in the matrix factorization framework. Our model outperforms state-of-the-art baselines.

2 Definition and Data Observation

2.1 Problem Definition

Definition 1

(Community Detection). Given an unweighted and undirected graph G(V, E), community detection aims to find a set of communities \(S=\{C_{i} \mid C_{i}\ne \emptyset ,\ C_{i}\ne C_{j}\ \text {for}\ i\ne j,\ 1\le i,j\le p\}\), where \(C_i\) is a community consisting of a set of nodes, that maximizes a particular objective function f, i.e.,

$$\begin{aligned} \max f(G,S), \end{aligned}$$
(1)

where p is the number of communities.

Different from the traditional community detection problem, overlapping community detection allows communities to overlap with each other. This relaxation enables the Matrix Factorization approach to be employed. In MF-based methods, the graph is represented by its adjacency matrix \(G \in \{0,1\}^{n\times n}\), whose (i, j) entry indicates whether node i and node j are connected. The goal is to find a node-community weight matrix F, with entry \(F_{u,c}\) representing the weight of node u in community c, and apply F to approximate the adjacency matrix.

2.2 Indicator Definitions

To infer the community membership similarity between two nodes, we have mentioned three indicators in the Introduction: link existence l(u, v), the number of mutual friends m(u, v), and mutual density d(u, v), where u and v are both nodes in V. We formally define each of them as follows.

Definition 2

(Link Existence). Given a graph G(V, E) and two nodes \(u,v \in V\), the link existence between u and v is

$$\begin{aligned} l(u,v) = \left\{ \begin{array}{ll} 1 &{} \quad \text {if } G_{uv}=1,\\ 0 &{} \quad \text {otherwise}. \end{array} \right. \end{aligned}$$
(2)

Definition 3

(The Number of Mutual Friends). Given a graph G(V, E) and two nodes \(u,v \in V\), the number of mutual friends between u and v is

$$\begin{aligned} m(u,v) = \left| \{i \mid (u,i) \in E \text { and } (v,i) \in E\} \right| . \end{aligned}$$
(3)

Definition 4

(Mutual Density). Given a graph G(V, E) and two nodes \(u,v \in V\), the mutual density between u and v is

$$\begin{aligned} d(u,v) = \frac{\left| \{i \mid (u,i) \in E \text { and } (v,i) \in E\} \right| }{\left| \{j \mid (u,j) \in E \text { or } (v,j) \in E\} \right| }. \end{aligned}$$
(4)
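For concreteness, all three indicators can be computed directly from neighbor sets. Below is a short sketch (our own, assuming a networkx-style graph where G[u] iterates over u's neighbors); the zero-denominator convention for isolated node pairs is our assumption, since the definition leaves it implicit:

```python
import networkx as nx

def link_existence(G, u, v):
    """Eq. (2): 1 if u and v are linked, else 0."""
    return 1 if G.has_edge(u, v) else 0

def mutual_friends(G, u, v):
    """Eq. (3): number of common neighbors of u and v."""
    return len(set(G[u]) & set(G[v]))

def mutual_density(G, u, v):
    """Eq. (4): Jaccard similarity of the two neighborhoods."""
    union = set(G[u]) | set(G[v])
    if not union:
        return 0.0      # convention for two isolated nodes (our assumption)
    return len(set(G[u]) & set(G[v])) / len(union)

# e.g. on a toy triangle plus a pendant edge:
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
print(mutual_density(G, 0, 1))   # neighborhoods {1,2} and {0,2} -> 1/3
```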
Table 1. Dataset statistics. |V|: number of nodes, |E|: number of edges, |C|: number of ground-truth communities, D: average degree of nodes, M: average number of nodes per community, A: average number of joined communities per node.

2.3 Data Observation

To validate that (1) the number of mutual friends is a better indicator than a link for inferring community membership similarity and (2) mutual density is more stable than the number of mutual friends, we conduct two experiments on two large real-world networks with ground-truth communities [33]. Table 1 shows the statistics of these two networks, Amazon and DBLP.

To quantify the community membership similarity between two nodes, we use cosine similarity as our measurement, which is defined as follows.

Definition 5

(Cosine Similarity of Community Membership). Given a graph with p ground-truth communities \(\{C_{i} \mid i=1,2,\cdots ,p\}\), the cosine similarity of community membership s(u, v) between u and v is

$$\begin{aligned} s(u,v) = \frac{\mathbf {u} \mathbf {v}^{T}}{\Vert \mathbf {u}\Vert _{2}\Vert \mathbf {v}\Vert _{2}}\ , \end{aligned}$$
(5)

where \(\mathbf {u} \in \mathcal {R}^{p}\) is the community membership vector of node u, whose entry \(u_{i}\) represents the weight with which u belongs to community \(C_{i}\).
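As a one-line sketch (ours) of Eq. (5), assuming membership vectors stored as NumPy arrays:

```python
import numpy as np

def membership_cosine(u_vec, v_vec, eps=1e-12):
    """Eq. (5): cosine similarity of two community membership vectors."""
    return u_vec @ v_vec / (np.linalg.norm(u_vec) * np.linalg.norm(v_vec) + eps)
```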

Fig. 1. The number of sampled node pairs having the same value of cosine similarity.

First, we randomly sample 100,000 node pairs with links, as well as 100,000 node pairs with at least two mutual friends and 100,000 with at least four, and compute the cosine similarity of community membership for each node pair. Figure 1 plots, for each of the three types, the number of node pairs at each value of cosine similarity. We expect all three types of node pairs to share at least one community and thus to have non-zero cosine similarity. However, nearly 14,000 node pairs with links do not share any community, an error rate of about 14%. In contrast, less than 8% of the node pairs with at least two mutual friends, and only about 1% of those with at least four, violate this expectation. For non-zero values of cosine similarity, the three distributions are quite similar, with node pairs having at least four mutual friends slightly outnumbering the other types. Thus, the number of mutual friends is a more accurate and more flexible indicator than the existence of a link.

Fig. 2. Average value of each indicator as a function of cosine similarity of community membership.

Second, we compare the stability of the number of mutual friends and of mutual density as indicators. A stable indicator is expected to increase monotonically with community membership similarity. We sample 10,000 node pairs at each value of cosine similarity and calculate their average number of mutual friends and average mutual density. The results are shown in Fig. 2. On the DBLP data, the average number of mutual friends fluctuates as cosine similarity increases, while the average mutual density is almost monotonic. Thus, mutual density is a more stable indicator of community membership similarity than the number of mutual friends.

In summary, mutual density is the best of the three indicators, offering the highest accuracy and stability.

3 Mutual Density Based NMF Model

3.1 Model Assumption

From the data observation, we can see that the cosine similarity of community membership between two nodes is correlated with their mutual density. This leads to the intuition behind our model: two nodes with larger mutual density are more likely to have higher cosine similarity of community membership.

To formally state our model assumption, we first need to define two relationships between two nodes: \(\alpha \)-acquaintance and \(\beta \)-stranger.

Definition 6

(\(\alpha \)-acquaintance). Given \(\alpha \in [0,1]\), for two nodes \(u,v\in V\), v is u's \(\alpha \)-acquaintance if and only if

$$\begin{aligned} d(u,v) \ge \alpha . \end{aligned}$$

By the symmetry of d(u, v), u is also v's \(\alpha \)-acquaintance.

Definition 7

(\(\beta \)-stranger). Given \(\beta \in [0,1]\), for two nodes \(u,v\in V\), v is u's \(\beta \)-stranger if and only if

$$\begin{aligned} d(u,v) \le \beta . \end{aligned}$$

By the symmetry of d(u, v), u is also v's \(\beta \)-stranger.

In both definitions, d(u, v) is the mutual density between u and v defined in Eq. (4). Moreover, for a node u, we define its set of \(\alpha \)-acquaintances as \(A(u,\alpha )=\{i \mid d(u,i) \ge \alpha \}\) and its set of \(\beta \)-strangers as \(B(u,\beta )=\{j \mid d(u,j) \le \beta \}\).
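As a sketch under the same assumptions as before (networkx-style neighbor access, reusing mutual_density from Sect. 2.2), the two sets can be materialized as follows; for \(\alpha > 0\), only nodes within two hops of u can qualify as acquaintances, which the search exploits:

```python
def acquaintances(G, u, alpha):
    """A(u, alpha) = {i : d(u, i) >= alpha}. For alpha > 0, d(u, i) > 0
    requires a common neighbor, so candidates lie in u's two-hop ball."""
    candidates = set()
    for n in G[u]:
        candidates.update(G[n])     # friends of friends share a neighbor with u
    candidates.discard(u)
    return {i for i in candidates if mutual_density(G, u, i) >= alpha}

def strangers(G, u, beta):
    """B(u, beta) = {j : d(u, j) <= beta}; a full scan, for illustration only.
    Section 3.3 samples from this set rather than materializing it."""
    return {j for j in G if j != u and mutual_density(G, u, j) <= beta}
```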

Following our intuition, our model assumption can be formally defined as

$$\begin{aligned} \begin{aligned}&s(u,i)>s(u,j), \\&\text {if}~i\in A(u,\alpha ),\,j \in B(u,\beta ), \text {and}~\alpha > \beta , \end{aligned} \end{aligned}$$
(6)

where s(u, i) is the cosine similarity of community membership between u and i.

In other words, we expect the cosine similarity between u and any of its \(\alpha \)-acquaintances to be greater than that between u and any of its \(\beta \)-strangers. Adjusting \(\alpha \) and \(\beta \) for different graphs lets us ensure that the difference in cosine similarity is significant; if \(\alpha \) is only slightly greater than \(\beta \), we are not confident enough to make such an assumption.

3.2 Model Formulation

In the MD-NMF model, we aim to find the node-community weight matrix F that maximizes the likelihood that every node in the graph has higher cosine similarity of community membership with all its \(\alpha \)-acquaintances than with all its \(\beta \)-strangers. For each node u, we want to maximize

$$\begin{aligned} \mathcal {P}(\underset{u}{>} | F, \alpha ,\beta ) = \prod _{i \in A(u,\alpha )}\prod _{j\in B(u,\beta )} \mathcal {P}(s(u,i) > s(u,j) | F). \end{aligned}$$
(7)

Given any two nodes \(u,v \in V\), we can obtain their node-community weight vectors \(F_{u},F_{v}\) from F. From the observation that the higher the cosine similarity of community membership vectors between two nodes, the greater their mutual density, we define the probability that \(s(u,i) > s(u,j)\) given the node-community membership matrix as

$$\begin{aligned} \mathcal {P}(s(u,i) > s(u,j) | F) = \sigma (\frac{F_{u}F_{i}^{T}}{\Vert F_{u}\Vert _2\Vert F_{i}\Vert _2} - \frac{F_{u}F_{j}^{T}}{\Vert F_{u}\Vert _2\Vert F_{j}\Vert _2}), \end{aligned}$$
(8)

where \(\sigma \) is the sigmoid function \(\sigma (x)=\frac{1}{1+e^{-x}}\). For simplicity, we define \(\phi (i,j) = \frac{F_{i}F_{j}^{T}}{\Vert F_{i}\Vert _2\Vert F_{j}\Vert _2}\), so we have

$$\begin{aligned} \mathcal {P}(s(u,i) > s(u,j) | F) = \sigma (\phi (u,i) - \phi (u,j)). \end{aligned}$$
(9)

Since the sigmoid function maps any real value into (0, 1), this probability approaches 1 when \(\phi (u,i)\gg \phi (u,j)\) and approaches 0 when \(\phi (u,i) \ll \phi (u,j)\).

By taking the product of Eq. (7) over all nodes and combining Eqs. (8) and (9), we can derive the final learning objective of the MD-NMF model:

$$\begin{aligned} \underset{F\in R^{n\times p}_{+}}{\max }\ l(F) =&\underset{F\in R^{n\times p}_{+}}{\max } \log \prod _{u\in V} \mathcal {P}(\underset{u}{>} | F, \alpha ,\beta ) - \lambda \cdot reg(F) \\ =&\underset{F\in R^{n\times p}_{+}}{\max } \sum _{u\in V} \sum _{i\in A(u,\alpha )} \sum _{j\in B(u,\beta )} \log \mathcal {P}(s(u,i) > s(u,j) | F) \\&-\, \lambda \cdot reg(F) \\ =&\underset{F\in R^{n\times p}_{+}}{\max } \sum _{u\in V} \sum _{i\in A(u,\alpha )} \sum _{j\in B(u,\beta )} \log \sigma (\phi (u,i) - \phi (u,j)) \\&-\, \lambda \cdot reg(F), \end{aligned}$$
(10)

where reg(F) is a regularization term to prevent overfitting of F and \(\lambda \) is the regularization parameter. For simplicity of differentiation, we set \(reg(F) = \Vert F\Vert ^{2}_{F}\), the squared Frobenius norm of F.
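The updates below require the gradient of l(F), which is not spelled out above. As a sketch (our own derivation from Eqs. (9) and (10), using \(\frac{d}{dx}\log \sigma (x)=\sigma (-x)\)), the contribution of one triple (u, i, j) to the gradient with respect to \(F_{u}\) is

$$\begin{aligned} \frac{\partial }{\partial F_{u}} \log \sigma (\phi (u,i) - \phi (u,j)) = \sigma (\phi (u,j) - \phi (u,i)) \left( \frac{\partial \phi (u,i)}{\partial F_{u}} - \frac{\partial \phi (u,j)}{\partial F_{u}}\right) , \qquad \frac{\partial \phi (u,i)}{\partial F_{u}} = \frac{F_{i}}{\Vert F_{u}\Vert _2 \Vert F_{i}\Vert _2} - \phi (u,i)\, \frac{F_{u}}{\Vert F_{u}\Vert ^{2}_{2}}, \end{aligned}$$

plus \(-2\lambda F_{u}\) from the regularizer; the gradients with respect to \(F_{i}\) and \(F_{j}\) are analogous.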

3.3 Parameter Learning

To make our model scalable to large datasets, we employ the widely used Stochastic Gradient Descent (SGD) paradigm as our learning algorithm. To handle the non-negativity constraint, we apply a projected gradient method [18], which maps any vector with negative entries back to the nearest point in the feasible region. Following the learning objective l, we update the matrix F by

$$\begin{aligned} \varTheta _{t+1} = \max \{\varTheta _{t} + \delta \frac{\partial l}{\partial \varTheta },0\}\ , \end{aligned}$$
(11)

where \(\delta \) is the learning rate and \(\varTheta \) can be any entry of matrix F.

Algorithm 1 describes the whole iterative process of parameter learning. In each iteration, the time complexity is O(|E|p), where |E| is the number of edges and p the number of communities. Because we need to keep the whole node-community weight matrix F in memory, the space complexity of the algorithm is O(|V|p), where |V| is the number of nodes. When |V| becomes too large, the algorithm needs huge memory to store the whole matrix F, which is its main limitation. To scale the algorithm to billions of nodes, distributed storage and updating of F should be considered.

Algorithm 1. Iterative parameter learning of MD-NMF via projected stochastic gradient descent with acquaintance/stranger sampling.
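Since the pseudocode itself is not reproduced here, the following is a minimal Python sketch of the learning loop as Sect. 3.3 describes it, reusing the mutual_density and acquaintances helpers sketched earlier; the gradients follow the derivation after Eq. (10). Node labels are assumed to be 0..n-1, and the sampling schedule and default hyper-parameters are illustrative, not the authors' exact settings:

```python
import numpy as np

def phi(F, a, b, eps=1e-12):
    """Cosine similarity of rows a, b of F (the phi of Eq. (9))."""
    return F[a] @ F[b] / (np.linalg.norm(F[a]) * np.linalg.norm(F[b]) + eps)

def dphi(F, a, b, eps=1e-12):
    """Gradient of phi(a, b) with respect to row F[a]."""
    na = np.linalg.norm(F[a]) + eps
    nb = np.linalg.norm(F[b]) + eps
    return F[b] / (na * nb) - (F[a] @ F[b]) / (na ** 3 * nb) * F[a]

def train_mdnmf(G, n, p, alpha, beta, lr=0.5, lam=1e-3,
                iters=100, triples_per_iter=10000, seed=0):
    """Projected SGD on the MD-NMF objective (Eqs. (10)-(11))."""
    rng = np.random.default_rng(seed)
    F = rng.random((n, p)) * 0.1
    for _ in range(iters):
        for _ in range(triples_per_iter):
            u = int(rng.integers(n))
            acq = list(acquaintances(G, u, alpha))
            if not acq:
                continue                         # resample another u (Sect. 3.3)
            i = acq[int(rng.integers(len(acq)))]
            j = int(rng.integers(n))             # rejection-sample a beta-stranger
            while j == u or mutual_density(G, u, j) > beta:
                j = int(rng.integers(n))
            w = 1.0 / (1.0 + np.exp(phi(F, u, i) - phi(F, u, j)))  # sigma(-x)
            gu = w * (dphi(F, u, i) - dphi(F, u, j)) - 2 * lam * F[u]
            gi = w * dphi(F, i, u) - 2 * lam * F[i]
            gj = -w * dphi(F, j, u) - 2 * lam * F[j]
            for row, g in ((u, gu), (i, gi), (j, gj)):
                F[row] = np.maximum(F[row] + lr * g, 0.0)  # Eq. (11): ascent + projection
    return F
```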

Choosing the Number of Communities. Before running Algorithm 1, we need to set the number of communities p in advance. From experiments on small datasets, we find that if we set p larger than necessary, the detected communities contain the results obtained with a smaller p, together with some duplicated communities or trivial communities with few nodes. Thus, our strategy is to pick a relatively large p based on the number of nodes and edges in the network and to refine the results by merging or deletion.

Acquaintances and Strangers Sampling. For a node \(u \in V\) and any of its \(\alpha \)-acquaintances i, if \(\alpha > 0\), u and i are guaranteed to have mutual friends. To find i, we first do a breadth-first search and group u's neighbors, as well as the friends of these neighbors, into a candidate set. We then filter out every node k with \(d(u,k) < \alpha \) and sample i uniformly at random from the remaining nodes. If u does not have any \(\alpha \)-acquaintance, we sample another u and repeat the above process until we get a valid u. To sample a \(\beta \)-stranger of u, we simply sample random nodes from the graph until we obtain one. From Table 1 we can see that in each graph the average degree of nodes is much smaller than the number of edges, so the time complexity of sampling the acquaintances and strangers of a node remains nearly constant.

Setting Membership Threshold. For each node, to determine whether it belongs to a particular community, our strategy is to set a membership threshold t on the node-community membership matrix, i.e., if \(F_{u,k} \ge t\), we say that node u is associated with community k. t is a hyper-parameter tuned via experimental results.
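A sketch of this thresholding step (our function name; assumes the learned F from Algorithm 1 as a NumPy array):

```python
import numpy as np

def extract_communities(F, t):
    """Assign node u to community k iff F[u, k] >= t; returns one node set
    per column of F. Duplicated or trivial communities can then be merged
    or dropped, as discussed above."""
    return [set(np.where(F[:, k] >= t)[0]) for k in range(F.shape[1])]
```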

4 Experiments

4.1 Dataset

The real-world datasets we use include the two large networks described in the data observation section, as well as six benchmark networks collected by Newman. Table 2 lists the basic information of the six benchmark datasets. They are relatively small compared to the two large networks and have no ground-truth communities.

Table 2. Statistics of six Newman’s datasets. |V|: number of nodes, |E|: number of edges.

4.2 Comparison Methods

For comparison, we select the following six baseline approaches, namely Sequential Clique Percolation (SCP) [16], Demon [8], Bayesian Non-negative Matrix Factorization (BNMF) [23], Bounded Non-negative Matrix Tri-Factorization (BNMTF) [36], BigCLAM [34], and Preference-based Non-negative Matrix Factorization (PNMF) [35]. Notice that the latter four approaches are also based on matrix factorization.

4.3 Evaluation Metrics

We use modularity as the evaluation metric for the small datasets without ground-truth communities and the \(F_{1}\) score for the large datasets with ground-truth communities.

Modularity. The classic modularity is defined as

$$\begin{aligned} Q = \frac{1}{2|E|}\underset{u,v\in V}{\sum }(G_{u,v} - \frac{d(u)d(v)}{2|E|})I_{u,v}, \end{aligned}$$

where d(u) is the degree of node u, \(G_{u,v}\) is the (u, v) entry of the adjacency matrix G, and \(I_{u,v}=1\) if u and v are in the same community and 0 otherwise [20].

In the overlapping scenario, since a node pair may share more than one community, a minor modification is made by replacing \(I_{u,v}\) with \(|C_{u}\cap C_{v}|\), i.e., the number of communities shared by u and v, where \(C_{u}\) denotes the set of communities u belongs to:

$$\begin{aligned} \hat{Q} = \frac{1}{2|E|}\underset{u,v\in V}{\sum }(G_{u,v} - \frac{d(u)d(v)}{2|E|})|C_{u}\cap C_{v}|. \end{aligned}$$

From the definition, we can see that a greater value of modularity reveals denser connectivity within the detected communities, because only linked node pairs sharing common communities contribute positively to the value. This metric has also been frequently used in previous MF-based works [34, 35].
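A direct transcription of \(\hat{Q}\) (our sketch; dense NumPy adjacency matrix, detected communities as a list of node sets):

```python
import numpy as np

def overlapping_modularity(G, communities):
    """Overlapping modularity Q-hat; the dense matrix formulation is
    O(|V|^2) and intended for clarity, not efficiency."""
    n = G.shape[0]
    deg = G.sum(axis=1)
    m2 = deg.sum()                       # 2|E|
    member = np.zeros((n, len(communities)))
    for k, c in enumerate(communities):
        member[list(c), k] = 1.0
    shared = member @ member.T           # |C_u ∩ C_v| for every node pair
    B = G - np.outer(deg, deg) / m2      # modularity matrix
    return (B * shared).sum() / m2
```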

\(F_{1}\) Score. The \(F_{1}\) score of a detected community \(S_{i}\) is defined as the harmonic mean of

$$\begin{aligned} precision(S_{i}) = \underset{j}{\max } \frac{|S_{j}^{'}\cap S_{i}|}{|S_{i}|} \end{aligned}$$

and

$$\begin{aligned} recall(S_{i}) = \underset{j}{\max } \frac{|S_{j}^{'} \cap S_{i}|}{|S_{j}^{'}|}, \end{aligned}$$

i.e.,

$$\begin{aligned} F_{1}(S_{i}) = \frac{2\cdot precision(S_{i})\cdot recall(S_{i})}{precision(S_{i})+ recall(S_{i})}, \end{aligned}$$

where \(S_{j}^{'}\) is one of the given ground-truth communities. The overall \(F_{1}\) score of a set of detected communities is the average \(F_{1}\) score over all detected communities.
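A sketch of the overall score (ours; detected and ground_truth are lists of non-empty node sets):

```python
def average_f1(detected, ground_truth):
    """Average F1 over detected communities; precision and recall each
    match against the best ground-truth community, per the definitions."""
    scores = []
    for s in detected:
        p = max(len(s & t) / len(s) for t in ground_truth)
        r = max(len(s & t) / len(t) for t in ground_truth)
        scores.append(2 * p * r / (p + r) if p + r > 0 else 0.0)
    return sum(scores) / len(scores)
```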

4.4 Results

For the small networks, we set the learning rate \(\delta \) to 0.5 and p ranging from 10 to 50. We assume each node joins at most 3 to 10 communities and set the membership threshold accordingly. For the large network datasets, we set \(\delta \) much larger because the normalization terms in the cosine similarity limit how much the weights can change in each gradient descent iteration. We set p ranging from 1,000 to 5,000 and assume that each node joins at most 100 communities. The maximum number of iterations is set to 100, while in most cases F converges before reaching this limit.

Table 3. Comparison of experiment results in terms of modularity.
Table 4. Comparison of experiment results in terms of \(F_{1}\) score.

Table 3 shows the results in terms of modularity on the six small benchmark networks without ground-truth communities. Our MD-NMF model outperforms all baseline methods on all datasets, including LC, which also leverages the general concept of "neighborhood similarity", and PNMF, which is likewise based on a pairwise objective function but employs links as the indicator.

Table 4 shows the results on the two large benchmark networks with ground-truth communities. Only three of the comparison methods are able to scale to networks of this size. On both the Amazon and DBLP datasets, our MD-NMF model achieves the best \(F_1\) score.

5 Related Work

5.1 Community Detection

Community detection has been an important line of research in physics and computer science for a long time, and many different classes of approaches have been proposed to solve this problem [11]. Apart from traditional graph partitioning/clustering approaches [15], modularity-based methods are particularly designed for community detection tasks [20]. As the most well-known quality function by far, modularity can be directly optimized. Since optimizing modularity has been proven NP-complete [6], many heuristics have been proposed to solve it in polynomial time [7, 10, 19]. However, these classic community detection algorithms share a severe limitation: each node belongs to one and only one community.

More recently, major attention has been focused on the case where communities are allowed to overlap [31]. According to their general strategy, overlapping community detection methods can be classified into local methods and global methods. Local methods adopt a divide-and-conquer strategy, discovering communities in small subgraphs before merging them into larger ones based on some criteria [8, 17, 30]. Global methods employ stochastic block models [2, 14] or community affiliation models [32], which aim to figure out the relationship between nodes and communities from a macro view. As one of the major frameworks, Matrix Factorization (MF) introduces a node-community membership matrix that matches the adjacency matrix according to some optimization objective [23, 29, 34, 36].

5.2 Mutual Friends

The number of mutual friends, as a strong indicator of the closeness between two nodes, has been investigated in many social-network tasks. Friend recommender systems generate potential-friend lists by discovering the latent information behind network topology and friends in common [4, 25]. Link prediction models in complex networks use common neighbors to evaluate the probability of link establishment [5]. Online social rating networks make use of users' co-commenting and co-rating behaviors to recommend products and predict new ratings [27]. In the community detection problem, mutual friends have also been employed to measure the strength of connections between nodes. Newman defines connection strength as a normalized count of mutual friends and uses it to cluster nodes [21]. Tang and Liu directly interpret Jaccard similarity as node similarity to fit into a K-means algorithm for community detection [28]. Steinhaeuser and Chawla examine the Jaccard coefficient as an edge weighting method and employ it in community detection; however, their algorithm fails to detect any community structure without the addition of node attributes [26]. Alvari et al. regard neighborhood similarity, i.e., the number of common neighbors, as a similarity measure and incorporate it into a game-theoretic framework [3]. Ahn et al. explicitly give the definition of link similarity and hierarchically cluster links accordingly [1].

In this paper, mutual density has the same mathematical form as Jaccard similarity or link similarity, but it is used to measure community membership similarity. Thus, we can still calculate the mutual density between two nodes even if they are not linked. Also, our model is built on the matrix factorization framework instead of link clustering.

5.3 Bayesian Personalized Ranking

The pairwise objective function of our model is based on Bayesian Personalized Ranking [24]. This method and its extensions were originally proposed to solve ranking problems in recommender systems [22, 37]. Zhang et al. employ this model for the overlapping community detection problem [35]: they focus on the link indicator and assume that each node shares more communities with its neighbors than with its non-neighbors, an assumption that is more realistic both conceptually and experimentally.

6 Conclusion

In this paper, we propose a Mutual Density based Non-negative Matrix Factorization model for overlapping community detection. We introduce mutual density as a more consistent indicator of community membership similarity than the links used in traditional methods. The formulation of our model is based on the empirical finding that mutual density correlates with the cosine similarity of community membership. Our learning objective maximizes the likelihood that each node has more similar community membership with its acquaintances than with its strangers. Experimental results show that our model outperforms the other baseline methods, including the link-based PNMF model, on real-world datasets.