
1 Introduction

Network data is ubiquitous in our daily life, for example, in social networks, road networks, and the Internet. Analyzing these networks is of both theoretical and practical value. Numerous network analysis tasks have been studied in recent years, among which community detection is one of the most important and challenging problems. A community can be defined as a group of users that (1) interact with each other more frequently than with those outside the group and (2) are more similar to each other than to those outside the group [15]. Research on community detection benefits a variety of real-world applications such as online marketing and recommendation systems.

A variety of approaches have been proposed to solve the problem of community detection in different types of networks, e.g., homogeneous networks, attributed networks, and heterogeneous networks. More details about community detection can be found in the survey paper [6]. Among these methods, Nonnegative Matrix Factorization (NMF) based methods have attracted increasing attention since they have proven effective in detecting communities and offer strong interpretability in clustering. NMF based community detection approaches identify hidden communities by decomposing the adjacency matrix into lower-rank matrices [22, 24]. From a statistical perspective, NMF can also be formulated as probabilistic [10] or Bayesian models [20] by factorizing the input data into distributions instead of real-valued matrices. Nonnegative matrix tri-factorization (NMTF), which extends the standard NMF, factorizes the input matrix into three low-rank matrices and thus provides additional information about the interaction between the lower-rank representations. It has therefore been applied to the community detection task to identify communities and learn community interactions simultaneously [15, 28]. However, real-world networks can be noisy. Existing NMF based community detection methods are sensitive to outliers and noise because they use the squared loss function to measure the quality of graph regularization and network reconstruction [7]. The community detection performance may thus be degraded by this inevitable noise.

To deal with this issue, we propose a framework based on nonnegative residual matrix factorization (NRMF) in this study. A residual matrix, represented by the matrix reconstruction error, is explicitly introduced to capture the impact of outliers and noise. Intuitively, the residual matrix should be sparse, so sparse regularization can be used to model this property. Three different types of norms, i.e., \(L_0\), \(L_{1}\) and \(L_{2,1}\), are studied for sparsity modeling. To optimize the NRMF framework with the different types of regularization, multiplicative update rules [11] are used to learn the lower-rank matrices, and different thresholding operators are used to learn the residual matrix, depending on the regularization type. To evaluate the performance of the proposed NRMF, we conduct community detection experiments on two types of real-world networks from different domains: networks with and without known communities.

Our contributions can be summarized as follows:

  • We propose a nonnegative residual matrix factorization based framework NRMF to detect communities. A residual matrix, which is represented by the matrix reconstruction error, is explicitly introduced to model the impact of outliers and noise. The residual matrix is regularized to capture the sparsity.

  • We explore three different regularization types to model the sparsity, i.e., \(L_0\), \(L_{1}\) and \(L_{2,1}\) norms, on the residual matrix. We also propose the multiplicative update rules and different thresholding operators to optimize the objectives.

  • We evaluate the effectiveness of NRMF using several real-world networks. The experimental results demonstrate that our method outperforms state-of-the-art NMF based methods in community detection.

The rest of the paper is organized as follows. Section 2 provides an overview of the related work. Notations and problem formulation are given in Sect. 3. Section 4 explains the proposed NRMF framework. In Sect. 5 we then discuss our experimental study. Finally, in Sect. 6 we draw conclusions and outline directions for future work.

2 Related Work

Communities are groups of vertices which probably share common properties and/or play similar roles within the graph [6]. Traditional community detection methods aim to partition nodes into different groups such that the number of edges between groups is minimal, for example, cut-based graph partitioning [9] and k-core decomposition [5]. Other methods aim to exploit the graph structure. Modularity maximization seeks communities that optimize a predefined measure [13]. Spectral graph clustering makes use of the eigenvectors of Laplacian matrices to group nodes [6].

Recently, NMF-based clustering approaches have also been applied to community detection. NMF and NMTF techniques have been used for community detection in [22], where the adjacency matrix of the given network is factorized. A Bayesian version of NMF has been proposed in [19] to identify overlapping communities. Since NMTF explicitly models data interactions through an extra latent factor, it provides better interpretability and has therefore been employed in community detection. NMTF with specific bounds has been proposed in [28] to detect overlapping communities. NMTF with graph regularization has been used in [15] to discover communities in attributed social networks. REACT [17] employs two NMTF components for community detection and role discovery and a regularization component for modeling the relations between communities and roles. Deep Autoencoder-like NMF [26] consists of an encoder and a decoder, where the encoder transforms the original network into the community membership space and the decoder reconstructs the original network from the community membership space with the aid of the hierarchical mappings learnt in the encoder. The local community detection problem has been studied in [8], and community detection in multi-layer networks using NMF has been studied in [12]. To autonomously determine the number of communities, an adaptive NMF model has been proposed in [25].

With the rapid development of deep learning techniques, community preserving network embedding approaches have recently attracted enormous attention from the machine learning and graph mining communities. Existing network embedding methods have reported promising results in community detection. For example, M-NMF [23] exploits the consensus relationship between node representations and community structure, and jointly optimizes an NMF based representation learning model and a modularity based community detection model. Community Embedding (ComE) [3] is an embedding based method for joint node embedding and community detection. DNGE [16] uses Gaussian embedding to detect communities in dynamic networks.

Due to the space limitation, we refer the reader to  [6] for more details in community detection research.

3 Preliminary

We first summarize the notations used in this study in Table 1 and then introduce the background of the techniques used in this paper.

Table 1. Summary of the notations.

Data can be noisy, and some entries may be arbitrarily corrupted. To capture such noise in the NMF framework, a sparse error matrix S has been introduced in [7, 18] to capture the sparse corruption. Thus, a robust factorization to approximate the input data matrix A can be defined as:

$$\begin{aligned} A\approx UV^T+S, \end{aligned}$$
(1)

where S is supposed to be sparse. To guarantee sparsity, [18] proposed to use the \(L_0\) norm to regularize S and [7] utilized the \(L_1\) norm as the sparseness constraint. Therefore, the objective function can be defined as

$$\begin{aligned} \min _{U,V,S}\Vert A-UV^T-S\Vert _F^2. \end{aligned}$$
(2)
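As a minimal illustration, the following NumPy sketch evaluates the reconstruction term of Eq. (2); the function name and arguments are illustrative and assume A, U, V, and S are NumPy arrays of compatible shapes.

```python
import numpy as np

def robust_objective(A, U, V, S):
    # Squared Frobenius norm of the reconstruction error A - U V^T - S (Eq. 2).
    R = A - U @ V.T - S
    return np.linalg.norm(R, 'fro') ** 2
```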

Next, we introduce the formal definitions of the different norms:

  • \(L_0\) Norm: The \(L_0\) norm is not a true norm; it is defined as the number of nonzero elements in the given matrix.

  • \(L_1\) Norm: The \(L_1\) norm of a matrix (in its operator form) is the maximum absolute column sum. Given a matrix \(A_{n\times n}\), formally

    $$\begin{aligned} L_1(A)=\max _{1\le j\le n}\sum ^n_{i=1}|A_{ij}|. \end{aligned}$$
    (3)

    \(L_1\) norm regularization is robust to noise and outliers because it encourages sparsity in the residual.

  • \(L_{2,1}\) Norm: The \(L_{2,1}\) norm combines the \(L_1\) and \(L_2\) norms. Given a matrix \(A_{n\times n}\), \(L_{2,1}(A)\) is defined as

    $$\begin{aligned} L_{2,1}(A)=\sum _{i=1}^{n}\Big (\sum _{j=1}^{n}|A_{ij}|^2\Big )^{1/2}. \end{aligned}$$
    (4)

    The \(L_{2,1}\) norm controls the capacity of A and also ensures that A is sparse in rows, so it is robust to noise and outliers. A small sketch of how these three norms can be computed is given after this list.
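As a concrete illustration of the three norms above, the following NumPy sketch computes each of them as defined in Eqs. (3) and (4); the function names are illustrative.

```python
import numpy as np

def l0_norm(A):
    # Number of nonzero entries (not a true norm).
    return np.count_nonzero(A)

def l1_norm(A):
    # Maximum absolute column sum, as in Eq. (3).
    return np.abs(A).sum(axis=0).max()

def l21_norm(A):
    # Sum of the Euclidean norms of the rows, as in Eq. (4).
    return np.linalg.norm(A, axis=1).sum()
```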

4 The Proposed Method

In this paper, motivated by the robust NMF  [7, 18], we propose a general framework, which is less sensitive to noise and outliers, to detect communities. The framework can be formulated as:

$$\begin{aligned} \min _{C,M,S}&\Vert A-CMC^T-S\Vert _F^2+\alpha \cdot \Vert S\Vert _{p}+\beta \cdot Tr(C^TLC),\\\nonumber&s.t.~C\ge 0,M\ge 0,C^TC=I, \end{aligned}$$
(5)

where \(Tr(\cdot )\) denotes the trace of a matrix. L is the graph Laplacian, defined as \(L=D-A\) where \(D_{ii}=\sum _{l}A_{il}\). This framework is flexible enough to incorporate different types of sparse regularization and constraints. It is worth noting that:

  • In this work we only consider undirected networks where the adjacency matrix A is symmetric. This framework can be extended to directed networks straightforwardly by changing the first term to \(\Vert A-CMB^T-S\Vert _F^2\).

  • Empirically, we find that adding the interaction matrix M and the orthogonality constraint leads to worse performance, so in the following discussion we simplify the objective function as:

    $$\begin{aligned} \min _{C,S}&\Vert A-CC^T-S\Vert _F^2+\alpha \cdot \Vert S\Vert _{p}+\beta \cdot Tr(C^TLC),\\\nonumber&s.t.~C\ge 0, \end{aligned}$$
    (6)
  • In this work, we exploit three different norms, \(L_0\), \(L_1\) and \(L_{2,1}\), to capture the reconstruction error, i.e., p is set to 0, 1 and (2, 1), respectively. A sketch of how the objective in Eq. (6) can be evaluated follows this list.
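As a hedged illustration, the following NumPy sketch evaluates the simplified objective in Eq. (6) for given factors; the function and argument names are illustrative, and sparse_norm stands for any of the norm functions sketched in Sect. 3.

```python
import numpy as np

def nrmf_objective(A, C, S, alpha, beta, sparse_norm):
    # Value of the simplified NRMF objective in Eq. (6).
    D = np.diag(A.sum(axis=1))   # degree matrix
    L = D - A                    # graph Laplacian L = D - A
    recon = np.linalg.norm(A - C @ C.T - S, 'fro') ** 2
    return recon + alpha * sparse_norm(S) + beta * np.trace(C.T @ L @ C)
```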

4.1 Optimization

The objective function in Eq. (6) is not convex for all parameters simultaneously. We use the multiplicative update rules to solve this optimization problem due to its good compromise between speed and ease of implementation  [11]. We will optimize the objective with respect to one variable while fixing the other variables.

The update rules for the matrices C and M (since we do not consider M, we can simply set \(M=I\)) are similar to those of standard NMF and are defined as:

$$\begin{aligned} C&\leftarrow C\circ \frac{(A-S)CM+\beta AC}{CMC^TCM+\beta DC}\end{aligned}$$
(7)
$$\begin{aligned} M&\leftarrow M\circ \frac{C^T(A-S)C}{C^TCMC^TC} \end{aligned}$$
(8)

where \(\circ \) denotes the element-wise product.
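The following NumPy sketch shows one step of these multiplicative updates; the helper names are illustrative, D denotes the degree matrix used in the graph Laplacian, and a small constant is added to the denominators to avoid division by zero.

```python
import numpy as np

EPS = 1e-10  # avoid division by zero in the multiplicative updates

def update_C(A, C, S, D, beta, M=None):
    # One multiplicative update of C following Eq. (7); M defaults to the identity.
    if M is None:
        M = np.eye(C.shape[1])
    num = (A - S) @ C @ M + beta * (A @ C)
    den = C @ M @ C.T @ C @ M + beta * (D @ C) + EPS
    return C * (num / den)

def update_M(A, C, M, S):
    # One multiplicative update of M following Eq. (8).
    num = C.T @ (A - S) @ C
    den = C.T @ C @ M @ C.T @ C + EPS
    return M * (num / den)
```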

The update rule for the residual matrix S differs for the different norms.

\(L_{0}\) Regularized Residual. For the \(L_{0}\) norm, a hard-thresholding update rule has been used in [18]. Formally, it is defined as:

$$\begin{aligned} S_{ij}=\left\{ \begin{array}{ll} 0 &{} {\text {if}~|R_{ij}|\le \alpha }\\ R_{ij} &{} {\text {otherwise}} \end{array} \right. \end{aligned}$$
(9)

where \(R_{ij}=(A-CC^T)_{ij}\).
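A minimal sketch of this hard-thresholding operator, assuming R is a NumPy array:

```python
import numpy as np

def hard_threshold(R, alpha):
    # Eq. (9): keep R_ij only where |R_ij| exceeds alpha, otherwise set it to zero.
    return np.where(np.abs(R) > alpha, R, 0.0)
```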

\(L_{1}\) Regularized Residual. For the \(L_{1}\) norm, a soft-thresholding update rule has been used in [7]. Formally, it is defined as:

$$\begin{aligned} S_{ij}=\left\{ \begin{array}{ll} 0 &{} {\text {if}~|R_{ij}|\le \frac{\alpha }{2}}\\ R_{ij}-\frac{\alpha }{2}sign(R_{ij}) &{} {\text {otherwise}} \end{array} \right. \end{aligned}$$
(10)

where \(R_{ij}=(A-CC^T)_{ij}\), \(sign(R_{ij})\) is the sign function: if \(R_{ij}>0\), \(sign(R_{ij})=1\); if \(R_{ij}<0\), \(sign(R_{ij})=-1\); otherwise \(sign(R_{ij})=0\).
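A corresponding sketch of the soft-thresholding operator, again assuming R is a NumPy array:

```python
import numpy as np

def soft_threshold(R, alpha):
    # Eq. (10): shrink each entry by alpha/2 towards zero and zero out small entries.
    shrunk = R - (alpha / 2.0) * np.sign(R)
    return np.where(np.abs(R) > alpha / 2.0, shrunk, 0.0)
```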

\(L_{2,1}\) Regularized Residual. The \(L_{2,1}\) norm is superior to the \(L_0\) and \(L_1\) norms because: (1) it can model the sparsity effectively, and (2) it admits a simple and efficient optimization procedure [14]. Based on [14], the update rule for the \(L_{2,1}\) regularized S is:

$$\begin{aligned} S\leftarrow S\circ \frac{A-CC^T}{S+\alpha DS}, \end{aligned}$$
(11)

where D is a diagonal matrix whose j-th diagonal element is defined as:

$$\begin{aligned} D_{jj}=\frac{1}{\Vert S_{j}\Vert _{2}} \end{aligned}$$
(12)
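A sketch of this update, assuming S is initialized with small positive values so that the multiplicative rule is well defined:

```python
import numpy as np

def update_S_l21(A, C, S, alpha, eps=1e-10):
    # One multiplicative update of S under L_{2,1} regularization (Eqs. 11-12).
    D = np.diag(1.0 / (np.linalg.norm(S, axis=1) + eps))  # D_jj = 1 / ||S_j||_2
    return S * ((A - C @ C.T) / (S + alpha * (D @ S) + eps))
```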
Algorithm 1. The optimization procedure of NRMF (pseudocode not reproduced here).
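Since the pseudocode of Algorithm 1 is not reproduced above, the following is a hedged NumPy sketch of the overall optimization loop for the \(L_{2,1}\) variant with M fixed to the identity; the \(L_0\) and \(L_1\) variants would instead set S by applying the thresholding operators sketched earlier to \(A-CC^T\). The initialization, the fixed iteration count, and the nonnegativity clipping are illustrative choices, not taken from the paper.

```python
import numpy as np

def nrmf_l21(A, k, alpha=0.5, beta=0.1, n_iter=200, seed=0, eps=1e-10):
    # Sketch of the NRMF optimization loop (L_{2,1} variant, M = I).
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    C = rng.random((n, k))                 # community membership matrix
    S = rng.random((n, n)) * 1e-2          # residual (noise) matrix, small positive init
    Dg = np.diag(A.sum(axis=1))            # degree matrix for the Laplacian term

    for _ in range(n_iter):
        # Multiplicative update of C (Eq. 7 with M = I).
        num = (A - S) @ C + beta * (A @ C)
        den = C @ C.T @ C + beta * (Dg @ C) + eps
        C = np.maximum(C * num / den, 0.0)   # clip to keep C nonnegative (practical safeguard)

        # L_{2,1}-regularized update of S (Eqs. 11-12).
        Dr = np.diag(1.0 / (np.linalg.norm(S, axis=1) + eps))
        S = np.maximum(S * (A - C @ C.T) / (S + alpha * (Dr @ S) + eps), 0.0)

    return C, S, C.argmax(axis=1)           # hard community assignment per node
```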

4.2 Computational Complexity

For simplicity, note that multiplying an \(n\times r\) matrix by an \(r\times f\) matrix costs O(nrf). The complexity of the update rules in Algorithm 1 (Lines 3–5) is \(O(n^2c+nc^2+n^3)\). To update S (Lines 6–12), the complexity for \(L_0\) and \(L_1\) is \(O(n^2+nc^2)\) and for \(L_{2,1}\) is \(O(nc^2+n^3)\). Taking the number of iterations i into consideration, the overall complexity is \(O(i(n^2c+nc^2+n^3))\).

5 Experiments

To validate the effectiveness of NRMF in community detection, we conduct experiments on two types of real-world networks from different domains, i.e., networks with and without known communities. To further examine the influence of different regularization terms in sparsity modeling, we compare NRMF with \(L_0\), \(L_1\) and \(L_{2,1}\) regularization.

5.1 Datasets and Evaluation Metrics

We conduct experiments on two types of data: networks with and without known communities. For networks with known communities, we select the Football, Email, and Wiki networks, which contain 5, 42 and 19 communities, respectively. For networks without known communities, we select the Jazz, Email-Univ and Hamsterster networks. The optimal number of communities for each network is inferred using the Louvain algorithm [1]. A brief summary of these datasets is shown in Table 2. We employ purity and normalized mutual information (NMI) as the evaluation metrics for networks with known communities. Purity measures the extent to which each cluster contains data points from primarily one class. The purity of a clustering is obtained by the weighted sum of individual cluster purity values, which is defined as:

$$\begin{aligned} Purity=\frac{1}{N}\sum _{i=1}^{k}max_j|c_i\cap t_j|, \end{aligned}$$
(13)

where N is the number of objects, k is the number of clusters, \(c_i\) is a cluster in C, and \(t_j\) is the ground-truth class that has the maximum overlap with cluster \(c_i\). NMI evaluates the clustering quality based on information theory, and is defined by normalizing the mutual information between the cluster assignments and the pre-existing labeling of the classes:

$$\begin{aligned} NMI(\mathcal {C,D})=\frac{2*\mathcal {I(C,D)}}{\mathcal {H(C)+H(D)}}, \end{aligned}$$
(14)

where \(\mathcal {C}\) is the obtained clustering and \(\mathcal {D}\) is the ground-truth clustering. The mutual information \(\mathcal {I}(\mathcal {C},\mathcal {D})\) is defined as \(\mathcal {I}(\mathcal {C},\mathcal {D})=\mathcal {H}(\mathcal {C})-\mathcal {H(C|D)}\) and \(\mathcal {H}(\cdot )\) is the entropy.
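A hedged NumPy sketch of these two metrics is given below, assuming pred and truth are NumPy arrays of nonnegative integer labels; in practice, a library implementation such as scikit-learn's normalized_mutual_info_score could be used instead.

```python
import numpy as np
from collections import Counter

def purity(pred, truth):
    # Eq. (13): weighted sum of each cluster's majority-class count, divided by N.
    total = 0
    for c in np.unique(pred):
        members = truth[pred == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(truth)

def nmi(pred, truth):
    # Eq. (14): 2 * I(C, D) / (H(C) + H(D)).
    n = len(truth)

    def entropy(labels):
        p = np.bincount(labels) / n
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    mi = 0.0
    for (c, t), count in Counter(zip(pred, truth)).items():
        p_ct = count / n
        mi += p_ct * np.log(p_ct / (np.mean(pred == c) * np.mean(truth == t)))
    return 2.0 * mi / (entropy(pred) + entropy(truth))
```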

For networks without known communities we use modularity as the evaluation metric. Formally, modularity is defined as:

$$\begin{aligned} Q = \frac{1}{2m} \sum _{ij} \left( A_{ij} - \frac{k_ik_j}{2m}\right) \delta (c_i,c_j) \end{aligned}$$
(15)

where m is the number of edges, A is the adjacency matrix of the input graph, \(k_i\) is the degree of node i and \(\delta (c_i, c_j)\) is 1 if node i and node j are in the same community and 0 otherwise.
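A short sketch of modularity, assuming A is a dense NumPy adjacency matrix of an undirected, unweighted graph and communities is a NumPy array with one label per node:

```python
import numpy as np

def modularity(A, communities):
    # Eq. (15): Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(c_i, c_j).
    k = A.sum(axis=1)                                       # node degrees
    two_m = A.sum()                                         # equals 2m for an undirected graph
    same = communities[:, None] == communities[None, :]     # delta(c_i, c_j)
    B = A - np.outer(k, k) / two_m
    return (B * same).sum() / two_m
```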

Table 2. Summary of data sets used in the experiments where n, e and c denote the number of nodes, edges, and communities, respectively.

5.2 Baseline Methods

To demonstrate that our model detects communities more effectively than other NMF based community detection approaches, we select several representative NMF based methods as baselines:

Table 3. Community detection performance w.r.t. Purity. Best performance is in bold font.
  • NMF  [22]: NMF is the basic matrix factorization framework. To make a fair comparison, the objective function for NMF is \(\Vert A-CC^T\Vert _F^2\).

  • Orthogonal NMF (ONMF)  [4]: ONMF is a variant of NMF that enforces an orthogonality constraint on the community membership matrix C, i.e., \(C^TC=I\). Since the constraint is added to C, the objective is \(\Vert A-CMC^T\Vert _F^2\).

  • Projected NMF (PNMF)  [27]: PNMF directly projects the original network to a subspace by minimizing \(\Vert A-CC^TA\Vert _F^2\).

  • Bayesian NMF (BNMF)  [20]: BNMF models the matrix factorization in a Bayesian fashion.

  • Graph regularized NMF (GNMF)  [2]: GNMF incorporates an affinity graph which is constructed to encode the geometrical information to NMF, and then seeks a matrix factorization which respects the graph structure.

  • BigClam  [24]: BigClam is a cluster affiliation model. It relaxes the graph fitting problem into a continuous optimization problem to find overlapping communities.

  • Nonnegative Symmetric Encoder-decoder (NSED)  [21]: NSED is a nonnegative symmetric encoder-decoder approach proposed for community detection. It extracts the community membership by integrating a decoder and an encoder into a unified loss function.

For our framework, we compare three variants: \(L_0\) regularized NRMF (NRMF-\(L_{0}\)), \(L_1\) regularized NRMF (NRMF-\(L_{1}\)) and \(L_{2,1}\) regularized NRMF (NRMF-\(L_{2,1}\)).

5.3 Experimental Results

The experimental results on networks with known communities are shown in Table 3 based on Purity and Table 4 based on NMI. Based on these results, it can be observed that:

  • NRMF-\(L_{2,1}\) achieves the best community detection performance based on both the purity and NMI metrics. This indicates that \(L_{2,1}\) regularization better captures the sparsity of the residual matrix than the other types of regularization, i.e., \(L_0\) and \(L_1\).

  • NRMF methods with \(L_1\) and \(L_{2,1}\) sparse regularization outperform other NMF based approaches. It demonstrates that our proposed framework can effectively identify the communities because it explicitly takes noise into consideration when factorizing the input adjacency matrix.

  • NRMF with \(L_0\) regularization performs worse than some other NMF based methods. This shows that the \(L_0\) norm is not a good choice for sparsity modeling, possibly because it only considers the number of nonzero elements but ignores their values.

  • It is interesting to observe that ONMF and GNMF achieve worse performance than other NMF based methods, including standard NMF. This may result from the fact that real-world network data are very sparse, so constraints on the latent representations may be difficult to satisfy.

Table 4. Community detection performance w.r.t. NMI. Best performance is in bold font.

The experimental results on networks without known communities are shown in Table 5 based on Modularity. From these results, we can draw the same conclusions as for the networks with known communities. Since modularity is a measure based only on the network structure (ground-truth community labels may be related not only to structure but also to semantics), the good performance on modularity further demonstrates the effectiveness of our framework from the structural perspective.

Table 5. Community detection performance w.r.t. Modularity. Best performance is in bold font.

5.4 Parameter Sensitivity

NRMF involves two parameters shown in Eq. (5): \(\alpha \) controls the trade-off of residual modeling and \(\beta \) controls the graph regularization. In this section, we examine how different choices of these parameters affect the performance of NRMF in community detection. Specifically, we measure NMI and Purity on the Email data and Modularity on the Email-Univ data. The results are shown in Figs. 1, 2 and 3. Note that only NRMF with \(L_{2,1}\) is evaluated because it achieves the best performance in the experiments above.

Fig. 1. Parameter sensitivity of \(\alpha \) on Email data.

It can be observed from these results that:

  • A large \(\alpha \) is preferred in order to achieve better NMI and Purity in detecting communities. Specifically, the performance is best when \(\alpha \) is 0.6–0.7.

  • In contrast, a smaller \(\beta \) yields better community detection performance. In our experiments, \(\beta =0.1\) is the best choice w.r.t. both NMI and Purity.

  • For the networks without ground-truth labels, the preferred choices of \(\alpha \) and \(\beta \) are different. On Email-Univ, a smaller \(\alpha \) and a larger \(\beta \) give better performance; in this experiment, \(\alpha =0.1\) and \(\beta =0.8\) are the best parameters.

Fig. 2. Parameter sensitivity of \(\beta \) on Email data.

Fig. 3. Parameter sensitivity results on Email-Univ data.

6 Conclusion

We proposed NRMF, a novel framework using nonnegative residual matrix factorization for community detection, which addresses a key limitation of existing NMF based community detection approaches: their sensitivity to noise and outliers. In NRMF, a residual matrix is introduced explicitly to capture the noise. To model the sparsity of the residual matrix, we exploit three types of regularization, i.e., \(L_0\), \(L_1\) and \(L_{2,1}\). To optimize the objective with the different regularization terms, different update rules are employed. Our experimental study demonstrated that NRMF effectively preserves community structures while capturing noisy information, outperforming state-of-the-art NMF based methods in community detection.

On the basis of NRMF, several new research lines can be pursued. For example, it would be interesting to exploit more advanced optimization methods, e.g., ADMM, or to extend the framework to attributed networks. We leave these extensions for future work.