Community detection in signed networks by relaxing modularity optimization with orthogonal and nonnegative constraints

Zhang, Yunfei; Liu, Yuyan; Ma, Xiaomeng; Song, Jie

doi:10.1007/s00521-019-04597-9

Community detection in signed networks by relaxing modularity optimization with orthogonal and nonnegative constraints

Original Article
Published: 06 November 2019

Volume 32, pages 10645–10654, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Neural Computing and Applications Aims and scope Submit manuscript

Community detection in signed networks by relaxing modularity optimization with orthogonal and nonnegative constraints

Download PDF

Yunfei Zhang¹,
Yuyan Liu¹,
Xiaomeng Ma² &
…
Jie Song¹

358 Accesses
4 Citations
Explore all metrics

Abstract

Community detection in networks including singed edges is a primary challenge that has already attracted substantial attention. In this paper, we show that this task could be reformulated as a combinatorial optimization concerning the trace of the signed modularity matrix. Keeping the orthogonal and nonnegative constraints in the relaxation, we propose a multiplicative update rule, named the SMON algorithm, which results in a solution that is a close approximation to the genuine community indication matrix. In addition, the rows of the solution can be referred to as the probabilities of corresponding vertex falling into each community, which can help us to discover the overlapping community structure of the network and identify vertices that reside on the watersheds between different communities. Experimental results on real-life social networks as well as synthetic signed networks verify that our method is effective and superior to the existing approaches.

Community Detection Based on Regularized Semi-Nonnegative Matrix Tri-Factorization in Signed Networks

Article 30 May 2017

A Regularized Convex Nonnegative Matrix Factorization Model for signed network analysis

Article 02 January 2021

An algorithm based on positive and negative links for community detection in signed networks

Article Open access 07 September 2017

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Community structure exists in a wide range of real-world networked systems, such as social [1, 2], technological [3], biological [4,5,6], and other complex networks [7,8,9]. Given a network, it has been shown that detecting its communities can help to understand its intrinsic topology, characteristics, and even dynamics.

Since the pioneering work [10], there come a large number of algorithms for community detection in networks (see, e.g., [5, 11,12,13,14] and recent reviews [8, 9]). For example, a two-stage approach to community detection in signed networks was proposed by Huang et al. [40]. Further, Jiang et al. [41] applied the characteristics of network structure and game theory to the field of social science.

Although these methods have achieved promising results, they routinely focused on the unsigned networks including positive edges only. In recent years, singed graphs have received increasing attention in literature, due to the fact that it provides a concise mathematical representation for many real-life complex systems which contains both “positive” and “negative” relationships. For example, the friendly and hostile relations coexist among individuals [15, 16], the alliances and disputes occur among countries simultaneously [17], and the active and inhibitive regulations appear among genes at the same time [18].

The study of signed networks can be traced back to the 1940s. Heider’s sociological research found that if two individuals are positively related, they normally have the same attitudes toward other people [19]. This intuitive discovery was subsequently represented as the structural balanced theory [20]. This theory tells that we can split vertices of a structural balanced network into two groups such that all edges within the same group are positive, and meanwhile all edges between the different groups are negative. Soon after, the weak balance [21] concept was proposed when the network has several clusters, each including only positive edges within itself and negative edges connecting to others. Obviously, we can easily obtain the community structure of such networks as long as we remove the negative links. However, it is much more difficult to detect communities in an unbalanced network which is usually encountered in many real applications. In addition, empirical studies have shown that the traditional approaches developed for unsigned networks are unsuitable for networks comprised both positive and negative links [17, 22, 23], since they have no way to take into account the information supplied by negative connections which actually play a very important role in the structure and evolution of the entire network [18, 24,25,26].

The most simple way to address this challenge is seemingly to generalize the existing algorithms to allow for negative links. Following the idea, Gómez et al. introduced a reformulation of modularity to analyze the community structure in networks constructed by correlated data [27]. Coincidentally, Traag and Bruggeman proposed a similar definition of modularity for signed networks based on a Potts model and employed the simulated annealing algorithm to minimize the Hamiltonian [17]. Furthermore, Anchuri and Magdon-Ismail iteratively extracted the communities in a singed network by using the spectral partitioning method [28]. Also, they presented a strategy similar to the Kernighan–Lin method [29] to further improve the performance. Besides, other methods, such as the agent-based random walk [30] and the multi-objective approach [31], have been employed for community detection in signed networks.

In this paper, we attempt to explore the community structure of a given signed network by maximizing the generalized modularity [17, 27]. We show that this task can be reformulated as a discrete optimization problem that involves the trace of a matrix (we name it signed modularity matrix). Relaxing the problem while keeping the orthogonal and nonnegative constraints leads to a solution which is a close approximation to the ideal community indication matrix, and vertices can directly be assigned into groups. In addition, the rows of the solution can be regarded as the possibilities of corresponding vertex falling into each community. This information can help us discover the overlapping community structure of the network and find vertices that locate on the boundaries between the communities.

The remainder of this paper is organized as follows. We begin to go over the definition of modularity for the signed networks in Sect. 2. We reformulate the problem of modularity maximization as a discrete optimization involving the trace of the signed modularity matrix in Sect. 3, whose relaxation with orthogonal and nonnegative constraints lead to a multiplicative update rule that can produce a solution that is a close approximation to the ideal community indication matrix. Experimental results on two social networks and a set of synthetic networks with various designed community structures are given in Sect. 4, followed by the conclusions in Sect. 5.

2 Preliminaries

For an unsigned network, the most efficient and accurate solutions to its community detection problem are those that optimize a quality function named modularity [14]. This function measures the discrepancy between the probability of edges falling into communities in the network and the corresponding probability in a null model that preserves the same degree distribution of the network. Indeed, this function can evaluate the fitness of a partition to a network, since the larger the modularity the better the partitioning. Unfortunately, methods based on modularity optimization suffered a failure when they are directly applied to extract the community structure of signed networks [22, 23, 27]. To make use of the information hidden in negative links, Gómez et al. introduced a modification of modularity [27] that serves as the foundation of our research and will be briefly reviewed as follows.

A signed network $G = (V, E)$ has a vertex set V including n vertices, and an edge set $E \subseteq (V, V)$ containing vertex pairs. Its adjacency matrix A can be defined as $A_{ij} = 1$ if there is a positive edge from vertex i to vertex j, $A_{ij} = -1$ if there is a negative edge from vertex i to vertex j, and $A_{ij} = 0$ otherwise. For weighted networks, $A_{ij}$ describes the weight of the edge (i, j). We can separate the positive and negative edges by setting $A^+_{ij} = A_{ij}$ if $A_{ij} > 0$ and 0 otherwise, and $A^-_{ij} = -A_{ij}$ if $A_{ij} < 0$ and 0 otherwise. Obviously, we have $A = A^+ - A^-$. The positive degree of vertex j is defined by $k^+_i = \sum _{j=1}^n A^+_{ij}$ and the positive total strengths is given by $2M^+ = \sum _{i=1}^n k^+_i = \sum _{i,j=1}^n A^+_{ij}$. Suppose that there are c groups $\{g_k\}_{k=1}^c$^{Footnote 1} and the modularity for the positive component can be given by

$$\begin{aligned} Q^+ = \frac{1}{2M^+} \sum _{i,j = 1}^n \left( A^+_{ij} - \frac{k^+_ik^+_j}{2M^+}\right) \delta (g_i, g_j), \end{aligned}$$

(1)

where the function $\delta (g_i, g_j)$ returns the values 1 if vertices i and j are in the same community, and 0 otherwise. Similarly, the modularity for the negative part can be represented as

$$\begin{aligned} Q^- = \frac{1}{2M^-} \sum _{i,j = 1}^n \left( A^-_{ij} - \frac{k^-_ik^-_j}{2M^-}\right) \delta (g_i, g_j), \end{aligned}$$

(2)

where $k^-_i$ is the negative degree of vertex i such that $k^-_i = \sum _{j=1}^n A^-_{ij}$ and the negative total strengths $2M^- = \sum _{i=1}^n k^-_i = \sum _{i,j=1}^n A^-_{ij}$. The total modularity must be a trade-off between the tendency of building communities by positive links and that of destroying communities by negative links. If $Q^+$ and $Q^-$ are assumed to give contributions to modularity proportionally to their respective positive and negative strengths, the final definition for modularity is

$$\begin{aligned} Q= & {} \frac{2M^+}{2M^+ + 2M^-} Q^+ - \frac{2M^-}{2M^+ + 2M^-} Q^- \nonumber \\= & {} \frac{1}{2M^+ + 2M^-} \sum _{i,j=1}^n \left( A_{ij} - \frac{k^+_ik^+_j}{2M^+} + \frac{k^-_ik^-_j}{2M^-}\right) \delta (g_i, g_j). \end{aligned}$$

(3)

It is easy to check that the standard modularity is recovered without negative links and modularity is equal to zero when all vertices are in one community together. With the definition at hand, the problem of community detection in a signed network becomes to seek an optimal solution that maximizes Eq. (3).

3 Modularity optimization with nonnegative relaxation

We cannot use the exhaustive search to solve the optimization of the modularity, because the number of partitions increases at least exponentially along with the rise of vertices. In other words, this optimization problem is NP-hard. It has been suggested to solve the problem by various optimization heuristics [17, 22, 27]. In the following subsections, we first give a reformulation of the modularity and then propose a relaxation with orthogonal and nonnegative constraints which can be solved by a multiplicative update rule.

3.1 Modularity reformulation

Let $\{\varvec{z}_{k}\}_{k=1}^c$ be the community indicators whose ith element $z_{ik}$ is 1 if vertex i belongs to community k, and 0 otherwise. In particular $\varvec{z}_k = (0, \dots , 0, 1, \dots , 1, 0, \dots , 0)^T$ if vertices within each community are adjacent. Therefore, up to the constant factor, Eq. (3) can be rewritten as

$$\begin{aligned} Q= & {} \sum _{k=1}^c \left( \sum _{i,j=1}^n A_{ij} z_{ik} z_{jk} - \frac{1}{2M^+} \left( \sum _{i=1}^n k_i^+ z_{ik}\right) ^2 \right. \nonumber \\&+\left. \frac{1}{2M^-} \left( \sum _{i=1}^n k_i^- z_{ik}\right) ^2\right) \nonumber \\= & {} \sum _{k=1}^c \left( \varvec{z}_k^T A \varvec{z}_k - \frac{1}{2M^+} (\varvec{k}^T_+ \varvec{z}_k)^2 + \frac{1}{2M^-} (\varvec{k}^T_- \varvec{z}_k)^2 \right) \nonumber \\= & {} \sum _{k=1}^c \varvec{z}^T_k \left( A - \frac{1}{2M^+} \varvec{k}_+ \varvec{k}^+_+ + \frac{1}{2M^-} \varvec{k}_- \varvec{k}^T_-\right) \varvec{z}_k, \end{aligned}$$

(4)

where $\varvec{k}_+$ is the positive degree vector such that $\varvec{k}_+ = (k^+_1, \dots , k^+_n)^T$ and similarly the negative degree vector $\varvec{k}_- = (k^-_1, \dots , k^-_n)^T$. Denoting $Z = (\varvec{z}_1, \dots , \varvec{z}_c)$, the problem of maximizing Eq. (4) can be further represented by

$$\begin{aligned} \max _{Z} \mathrm {Tr}(Z^T B Z),\quad s.t. ,\,Z \in \{0, 1\}^{n \times c}, \end{aligned}$$

(5)

where $B = A - \frac{1}{2M^+} \varvec{k}_+ \varvec{k}^T_+ + \frac{1}{2M^-} \varvec{k}_- \varvec{k}^T_-$. The problem (5) is a combinatorial optimization involving the trace of the matrix B which we name the signed modularity matrix hereafter.

3.2 Nonnegative relaxation

The problem (5) is also NP-hard. We can derive a good approximation by transforming the discrete optimization problem into one that is continuous. It is worthwhile to note that the columns of the indicator matrix Z are orthogonal and nonnegative. If we retain their orthogonality purely, the optimal solution is exactly the eigenvectors corresponding to the top c eigenvalues of the signed modularity matrix B. The leading eigenvector can be used to make a bipartition of the network according to the signs of its elements, on the condition that there only exist two groups [28]. For a signed network with more than two clusters, these eigenvectors could be seriously far away from the true community indication vectors that they approximate, owing to their mixed-sign entries. Consequently, we often need to use other clustering methods (e.g., the K-means algorithm) to obtain the final results.

A more accurate relaxation is preserving both the nonnegative and orthogonal constraints on the matrix Z, which is given by

$$\begin{aligned} \max _{Z} \mathrm {Tr}(Z^T B Z),\quad s.t. ,\,Z^TZ = I, \, Z \ge 0. \end{aligned}$$

(6)

This problem has been well studied in previous works [32,33,34], with different format of the matrix B. Mathematically, the problem (6) is identical to

$$\begin{aligned} \max _{Z} \mathrm {Tr}[Z^T (\rho I + B) Z],\quad s.t. ,\,Z^TZ = I, \, Z \ge 0. \end{aligned}$$

(7)

because the term $\mathrm {Tr}(Z^T \rho I Z) = \rho \mathrm {Tr}(I) = \rho n$ is irrelevant to Z. Specifically, let $\rho$ to be the absolute value of the minimal eigenvalue of the singed modularity matrix, i.e., $\rho = |\lambda _{\min }(B)|$, ensuring that the matrix $\rho I + B$ is positive definite. This trick makes the optimization problem well-behaved. The lagrangian function of problem (7) can be given by

$$\begin{aligned} L = \mathrm {Tr}[Z^T(\rho I + B)Z] - \mathrm {Tr}[\Lambda (Z^TZ-I)] - \mathrm {Tr}(\Sigma Z), \end{aligned}$$

(8)

where the Lagrange multiplier $\Sigma$ ensures the nonnegative constraint $Z \ge 0$ and the Lagrange multiplier $\Lambda$ ensures the orthogonal constraint $Z^TZ = I$, respectively. The Karush–Kuhn–Tucker (KKT) complementary slackness condition becomes

$$\begin{aligned} \left( \frac{\partial L}{\partial Z_{ik}}\right) Z_{ik} = [(\rho I + B) Z - Z\Lambda ]_{ik} Z_{ik} = 0, \end{aligned}$$

(9)

which is mathematically identical to

$$\begin{aligned} {[}(\rho I + B) Z - Z\Lambda ]_{ik} Z^2_{ik} = 0. \end{aligned}$$

(10)

Summing over k, we obtain $\Lambda _{ii} = [Z^T(\rho I + B)Z]_{ii}$, giving the diagonal elements of $\Lambda$. To compute the off-diagonal entries, we temporarily neglect the nonnegative constraint and let $\partial L/\partial Z$ = 0 which leads to $\Lambda _{ii'} = [Z^T(\rho I + B)Z]_{ii'}$. Combining these two results, we have

$$\begin{aligned} \Lambda = [Z^T(\rho I + B)Z]. \end{aligned}$$

(11)

Decomposing matrices B and $\Lambda$ into positive and negative parts by

$$\begin{aligned} B = B^+ - B^-, \quad \Lambda = \Lambda ^+ - \Lambda ^-, \end{aligned}$$

(12)

$$\begin{aligned}&\frac{1}{2}\frac{\partial \left( \mathrm {Tr}[Z^T(\rho I + B)Z] - \mathrm {Tr}(\Lambda Z^TZ)\right) }{\partial Z} \nonumber \\&\quad = (\rho Z + B^+Z + Z\Lambda ^-) - (B^-Z+Z\Lambda ^+) = 0, \end{aligned}$$

(13)

which leads to the multiplicative update formula

$$\begin{aligned} Z_{ik} \leftarrow Z_{ik} \sqrt{\frac{(\rho Z + B^+Z + Z\Lambda ^-)_{ik}}{(B^-Z+Z\Lambda ^+)_{ik}}}. \end{aligned}$$

(14)

We can see that $Z_{ik}$ will decrease when the corresponding element of the gradient in Eq. (14) is smaller than zero, and will increase otherwise. That is to say, the update direction of the rule (14) is always the same to the update direction in the gradient ascent method. The fact guarantees that the Lagrangian function (8) increases monotonically. Since the feasible domain of problem (7) is non-convex, the update rule (14) can only reach a local optimum. The algorithm may return different solutions if the initial conditions are different. To get an acceptable solution, we run our algorithm several times with various possible starting values and output the solution that gives the largest value of Eq. (4) over all the runs.

3.3 Overlapping structure

The solution, returned by the update rule (14), provides us with valuable information for the network’s community structure. When we compute the off-diagonal elements of the lagrangian multiplier, the nonnegative constraint is temporally ignored. Therefore, the columns of indicator matrix Z are not exactly orthogonal. An exact orthogonal constraint means that each row of Z can have only one nonzero element, which implies that each vertex can belong to one community exclusively. This is the hard partition. The approximately orthogonal condition relaxes this constraint slightly. Each vertex could fall fractionally into more than one community, which yields the soft partition of a network and sheds light on the elucidation of its overlapping structure.

In fact, the rows of the indicator matrix Z can be understood as the possibilities that each vertex falls into different groups, as shown in previous studies [32,33,34] as well as our experiments. More precisely, the magnitude of $Z_{ik}$ quantifies the degree that the vertex i should be assigned to community k. Therefore, we can simply group vertex i into the community $k^{*}$ to which it is most likely to belong, i.e., $k^{*} = \max \nolimits _{k}\{Z_{ik}, k = 1, \dots , c\}$. On the other hand, it is very interesting to assign vertices to more than one cluster, namely, overlapping community detection [5, 35,36,37]. The vertices falling into several groups are observed to play an essential role in networks. Furthermore, some “instable” vertices [35], which reside on the border between two communities, are very difficult to be classified into any community. It is very important to discover the overlapping community structure of a signed network and to identify the instable vertices. We can normalize each row of matrix Z such that $\sum _{k} Z_{ik} = 1$ and employ the bridgeness [36] and community entropy [34] to explore the overlapping community structure. The two metrics of vertex i are given by

$$\begin{aligned} b_i= & {} 1 - \sqrt{\frac{c}{c-1} \sum _{k=1}^c \left( Z_{ik} - \frac{1}{c}\right) ^2}, \end{aligned}$$

(15)

$$\begin{aligned} \xi _{i}= & {} - \sum _{k=1}^c Z_{ik} \log _{k} Z_{ik}. \end{aligned}$$

(16)

It is clear that vertex i is likely to participate in more than one community simultaneously if it has a large bridgeness $b_i$ and entropy $\xi _i$.

4 Experiments

In this section, we fist apply the proposed algorithm, denoted by SMON (Signed Modularity maximization with Orthogonal and Nonnegative constraints), to two social networks and make the exploratory analyses of their community structures. After that, we check our method with a set of synthetic networks having controlled community structures and the results demonstrate that it is effective and efficient.

4.1 Real-life social networks

The first real network was about the Slovene Parliamentary Party (SPP) in 1994 and included 10 parties in total [38]. The distances between these parties were designed ranging from − 3 to 3. The community structure of this network is so clear that we can perfectly discover it through the simple eigen-partitioning heuristics [28]. In Fig. 1a, the leading eigenvector of the signed modularity matrix is used to assign the colors, ranging from green to red, to the vertices. By setting $c=2$, the split of the singed network returned by our algorithm is completely consistent with the true community structure, as shown in Fig. 1b. All vertices in the network are assigned to one of two communities. However, the vertex “SNS” marked by the bold border falls into the circle group and the square group with probability 0.0238 and 0.9762 at the same time. This says that the vertex is the overlapping vertex of two communities, as its bridgeness is as high as 0.0476 and community entropy is 0.1623. It is clear to find that two vertices “ZS-ESS” and “DS” are connected with this vertex by negative edges in the same community.

The second experiment is on the Gahuku-Gama Subtribes (GGS) network. This network includes 16 Gahuku-Gama subtribes, which were engaged in warfare with one another in a particular area in 1954. The positive and negative edges stand for political alliance and enmities, respectively. Unlike the first network, the communities of this network cannot be correctly discovered by the eigen-partitioning method. We see from Fig. 2a that the vertex “GAVEV” is misclassified because of its corresponding negative value in the leading eigenvector of the signed modularity matrix. The community with larger size can be further divided into two clusters when we repeat the partitioning procedure. But the misclassification is unable to be revised. Figure 2c shows the three groups categorized by our method, and they are in good agreement with the known communities. In addition, the probabilities of the vertex “MASIL” belonging to the circle group and the square group are 0.7371 and 0.2629, respectively. This leads to its high value of bridgeness 0.3530 and group entropy 0.5244. This indicates that this vertex is the overlapping vertex of these two groups, as there are two positive connections between vertex “MASIL” and “NAGAM” and “UHETO,” respectively.

The experiments show that our method can not only offer the partitions of the two networks that are identical to ones stemming from the sociological studies, but also can facilitate the discovery of their overlapping community structures and the identification of instable vertices that reside on the watersheds between different communities.

4.2 Synthetic signed networks

We further test our algorithm with a set of synthetic networks having controlled community structure and compare it with other approaches. Similar to the previous work [23, 30], we generate these networks at random and denote them by

$$\begin{aligned} \mathrm {SN}\big ((N_1, N_2, \dots , N_c), N_c, k, p_{\rm in}, p_+, p_-\big ) \end{aligned}$$

where $N_i$ is the number of vertices in cluster i, $N_c$ is the number of clusters, k is the average degree of vertices, $p_{\rm in}$ is the probability of vertices linking with each other in the same cluster, $p_+$ is the probability of vertices in different clusters connected by positive links, and $p_-$ is the probability of vertices in same group connected by negative links. Clearly, the parameter $p_{\rm in}$ controls the cohesion of the clusters and the other two parameters $p_+$ and $p_-$ make the community structure to be indistinct. For simplicity, we denote it as $\mathrm {SN}(N, N_c, k, p_{\rm in}, p_+, p_-)$ when the number of vertices in each cluster is identical.

As a beginning, we consider four balanced synthetic networks. Figure 3 successively gives the reordered adjacency matrices of these networks SN(32, 4, 16, 0.8, 0, 0), SN(32, 4, 16, 0.1, 0, 0), SN(32, 20, 16, 0.8, 0, 0) and SN([32, 48, 64, 80], 4, 16, 0.8, 0, 0) according to their indicator matrix Z. In all the cases, our method can achieve a satisfactory performance, ignoring the density of links in each community, the number of the communities and the sizes of the communities. In particular, it works pretty well even if the community structure is much less cohesive when $p_{\rm in} = 0.1$ (see Fig. 3b). It is clear to see that the matrix Z is very close to the ideal community indicator matrix in all the situations. We then conduct a similar study on two unbalanced networks SN (32, 4, 16, 0.8, 0.2, 0.2) and SN([32, 48, 64, 80], 4, 16, 0.8, 0.2, 0.2) shown in Fig. 4. As expected, their community structures can be successfully discovered by the proposed method regardless of the sizes of the communities. The experimental results indicate that the proposed method is rather insensitive to the noise in the signed networks.

So far, all the experiments were conducted on networks that include dozens or hundreds of vertices. To further check our algorithm’s scalability, we test it on a group of large-size synthetic networks generated by the above-mentioned strategy. For these synthetic networks, we only take their hard partition into account. We introduce the normalized mutual information (NMI) to evaluate the performance of our algorithm. Suppose that $c_1$ and $c_2$ are the true and detected community structure, this measure is computed as

$$\begin{aligned} \mathrm {NMI}(c_1, c_2) = \frac{\sum \nolimits _{i=1}^c \sum \nolimits _{j=1}^c n_{ij} \ln \frac{n_{ij}n}{n^{(1)}_i n^{(2)}_j}}{\sqrt{\left( \sum \nolimits _{i=1}^c n^{(1)}_i \ln \frac{n^{(1)}_i}{n}\right) \left( \sum \nolimits _{i=1}^c n^{(2)}_i \ln \frac{n^{(2)}_i}{n}\right) }}, \end{aligned}$$

(17)

where n is the number of vertices in network, $n_{ij}$ is the number of vertices belonging to the true community i and classified into the community i, $n^{(1)}_i$ and $n^{(2)}_i$ are the numbers of vertices belonging to the true community i and assigned to the detected community i, respectively. The partition obtained by the algorithms is better if the NMI value is larger.

Once again, we generate balanced and unbalanced synthetic signed networks. For the balanced network, we use the generation model $\mathrm {SN}(1000, 4, 500, p_{\rm in}, 0, 0)$ in which $p_{\rm in}$ gradually increases from 0 to 1. Figure 5 compares the performance of our method and three other approaches, i.e., the finding and extracting community (FEC) method [30], the signed modularity maximization through simulated annealing (SMMSA) [17, 27] and the eigen-partitioning heuristics with the Kernighan–Lin strategy (EigPart+KL) [28]. We see clearly that both the SMMSA method and our SMON approach are able to exactly discover the communities in all cases. When $0 \le p_{\rm in} \le 0.1$, the NMI of our method is still acceptable, namely, more than 0.92, even though it is slightly lower than that of the SMMSA method. The rest of two approaches, however, can only return satisfactory results when $p_{\rm in}$ is large enough. All of them suffer from a rapid deterioration along with $p_{\rm in}$ becomes increasingly smaller. The NMI of the FEC algorithm begins to drop when $p_{\rm in}$ is smaller than 0.8 and then quickly decreases to less than 0.2 when $p_{\rm in} = 0.5$ and even approximately is equal to 0 when $p_{\rm in}$ is smaller than 0.3. Moreover, the EigPart+KL method can always achieve a competitive performance with our SMON method when $0.3 \le p_{\rm in} \le 1$, but quickly declines as $p_{\rm in}$ approaches 0.

We then set the parameter $p_{\rm in} = 0.8$ and increase the other two parameters $p_+$ and $p_-$ from 0 to 0.5. It is clear that the synthetic networks are all unbalanced in this situation. Figure 6 summarizes the results obtained by our method and three other algorithms. As we can see, the SMMSA method and the SMON method can exhibit a competitive performance, which outshine the other two approaches consistently and significantly sometimes. Our method is able to give a split of the signed networks when $0 \le p_{+} \le 0.3$ or $0 \le p_{-} \le 0.5$, whose NMI is more than 0.2. In contrast, the NMIs of the FEC method and the EigPart+KL method suddenly collapse and continuously decrease once $p_{+}$ is larger than 0.3.

4.3 Optimal number of communities

We suppose that the number of communities c is known in advance. However, we have no idea about this information in many cases. For a given network, it is required to propose a strategy to estimate the community number. Recall that the modularity (3) is a useful tool to evaluate the fitness of a partition to a signed network. The larger the modularity is, the better the partitioning becomes. As a result, we can choose the optimal number of the communities as follows. We run the SMON algorithm with different values c varying from 2 to a sufficiently large integer. The value corresponding to the largest modularity can be referred to as the optimal number. As illustrated in Fig. 7, this strategy can find the true number of communities in the Slovene Parliamentary Party network, the Gahuku-Gama Subtribes network, and six aforementioned synthetic networks.

5 Conclusions

In this paper, we first show that the problem of community detection in signed networks is identical to a combinatorial optimization regarding the trace of a matrix called the signed modularity matrix. We relax this optimization problem preserving the orthogonal and nonnegative constraints, and introduce a multiplicative update rule to solve it. The solution is a close approximation to the genuine community indication matrix and therefore is applicable for dividing vertices into communities directly. Furthermore, the rows of the solution after the normalization can be represented as the probabilities that each vertex falls into different communities. This information facilitates the discovery of the overlapping structure of the network and the identification of vertices locating on the boundaries between different communities. Experimental results on two real social networks and a set of synthetic signed networks validate that our algorithm can not only detect their community structures accurately, but also help to discover their overlapping structures.

Notes

How to choose an optimal value of c will be discussed in Sect. 4.3, and we take it as a given here.

References

Palla G, Barabási A-L, Vicsek T (2007) Quantifying social group evolution. Nature 446(7136):664–667
Article Google Scholar
Traud AL, Kelsic ED, Mucha PJ, Porter MA (2011) Comparing community structure to characteristics in online collegiate social networks. SIAM Rev 53(3):526–543
Article MathSciNet Google Scholar
Flake GW, Lawrence SR, Giles CL, Coetzee FM (2002) Self-organization and identification of web communities. IEEE Comp 35(3):66–70
Article Google Scholar
Guimerà R, Nunes Amaral LA (2005) Functional cartography of complex metabolic networks. Nature 433(7028):895–900
Article Google Scholar
Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Article Google Scholar
Huss M, Holme P (2007) Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks. IET Syst Biol 1(5):280–285
Article Google Scholar
Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
Article MathSciNet Google Scholar
Porter MA, Onnela J-P, Mucha PJ (2009) Communities in networks. Not Am Math Soc 56(9):4294–4303
MathSciNet MATH Google Scholar
Fortunato S (2009) Community detection in graphs. Phys Rep 486(3):75–174
MathSciNet Google Scholar
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Article MathSciNet Google Scholar
Newman MEJ (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74:036104
Article MathSciNet Google Scholar
Newman MEJ, Leicht EA (2007) Mixture models and exploratory analysis in networks. Proc Natl Acad Sci USA 104:9564–9569
Article Google Scholar
Holme P, Liljeros F, Edling CR, Kim BJ (2003) Network bipartivity. Phys Rev E 68:056107
Article Google Scholar
Newman MEJ, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69:026113
Article Google Scholar
Guha R, Kumar R, Raghavan P, Tomkins A (2004) Propagation of trust and distrust. In: Proceedings of 13th international conference on World Wide Web, pp 403–412
Kunegis J, Lommatzsch A, Bauckhage C (2009) The slashdot zoo: mining a social network with negative edges. In: Proceedings of 18th international conference on World Wide Web, pp 741–750
Traag VA, Bruggeman J (2009) Community detection in networks with positive and negative links. Phys Rev E 80:036115
Article Google Scholar
Mason MJ, Fan G, Plath K, Zhou Q, Horvath S (2009) Signed weighted gene co-expression network analysis of transcriptional regulation in murine embryonic stem cells. BMC Genom 10:327
Article Google Scholar
Heider F (1946) Attitude and cognitive organization. J Psychol 21:107–112
Article Google Scholar
Cartwright D, Harary F (1956) Structural balance: a generalisation of Heider’s theory. Psychol Rev 63:277–293
Article Google Scholar
Davis JA (1967) Clustering and structural balance in graphs. Human Relat 20:181–187
Article Google Scholar
Kaplan TD, Forrest S (2008) arXiv preprint arXiv:0801.3290
Jiang JQ (2015) Stochastic block model and exploratory analysis in signed networks. Phys Rev E 91(6):062805
Article Google Scholar
Szell M, Lambiotte R, Thurner S (2010) Multirelational organization of large-scale social networks in an online world. Proc Nat Acad Sci USA 107:13636–13641
Article Google Scholar
Leskovec J, Huttenlocher D, Kleinberg J (2010) Engaging with Massive Online Courses. In: Proceedings of 19th international conference on World Wide Web, pp 641–650
Huang ZX, Qiu YH (2010) A multiple-perspective approach to constructing and aggregating citation semantic link network. Future Gen Comput Syst 26(3):400–407
Article Google Scholar
Gómez S, Jensen P, Arenas A (2009) Analysis of community structure in networks of correlated data. Phys Rev E 80:016114
Article Google Scholar
Anchuri P, Magdon-Ismail M (2012) Communities and balance in signed networks: A spectral approach. In: Proceedings of international conference on advances in social networks analysis and mining, pp 235–242
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307
Article Google Scholar
Yang B, Cheung WK, Liu JM (2007) Community mining from signed social networks. IEEE Trans knowl Data Eng 19(10):1333–1348
Article Google Scholar
Amelio A, Pizzuti C (2013) Community mining in signed networks: A multiobjective approach. In: Proceedings of the 2013 IEEE/ACM international conference on advances in social networks analysis and mining, pp 95–99
Luo D, Ding CHQ, Huang H, Li T (2009) Non-negative Laplacian Embedding. In: Proceedings of 9th IEEE international conference on data mining, pp 337–346
Nie F, Ding C, Luo D, Huang H (2010) Multi-subspace representation and discovery. In: Proceedings of European conference on machine learning and principles and practice of knowledge discovery in databases, pp 451–466
Jiang JQ, McQuay LJ (2012) Modularity functions maximization with nonnegative relaxation facilitates community detection in networks. Physica A 391(3):854–865
Article Google Scholar
Gfeller D, Chappelier JC, De Los Rios P (2005) Finding instabilities in the community structure of complex networks. Phys Rev E 72(5):056135
Article Google Scholar
Nepusz T, Petróczi A, Negyessy L, Bazsó F (2008) Fuzzy communities and the concept of bridgeness in complex networks. Phys Rev E 77:016107
Article MathSciNet Google Scholar
Psorakis I, Roberts S, Ebden M, Sheldon B (2011) Overlapping community detection using Bayesian non-negative matrix factorization. Phys Rev E 83:066114
Article Google Scholar
ropivnik S, Mrvar A (1996) An analysis of the slovene parliamentary parties network. In: Ferligoj A, Kramberger A (eds) Developments in statistics and methodology, pp 209–216
Read KE (1954) Cultures of the Central Highlands, New Guinea. Southwest J Anthropol 10:1–43
Article Google Scholar
Huang C, Hu B, Yang R, Wu G (2018) SNMFP: A two-stage approach to community detection in signed networks. Phys A: Stat Mech Appl 510:754–764
Article Google Scholar
Jiang F, Zhao X, Bai Q (2018) Simulation and stability analysis of conflict events between employees and organization based on the social network. Concurrency and Computation: Practice and Experience, p e5097

Download references

Author information

Authors and Affiliations

School of Information Engineering, Kunming University, Kunming, 650214, China
Yunfei Zhang, Yuyan Liu & Jie Song
College of Economics, Shenzhen University, Shenzhen, 518060, China
Xiaomeng Ma

Authors

Yunfei Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yuyan Liu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaomeng Ma
View author publications
You can also search for this author in PubMed Google Scholar
Jie Song
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Xiaomeng Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Zhang, Y., Liu, Y., Ma, X. et al. Community detection in signed networks by relaxing modularity optimization with orthogonal and nonnegative constraints. Neural Comput & Applic 32, 10645–10654 (2020). https://doi.org/10.1007/s00521-019-04597-9

Download citation

Received: 02 July 2019
Accepted: 24 October 2019
Published: 06 November 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00521-019-04597-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Community detection in signed networks by relaxing modularity optimization with orthogonal and nonnegative constraints

Abstract

Similar content being viewed by others

Community Detection Based on Regularized Semi-Nonnegative Matrix Tri-Factorization in Signed Networks

A Regularized Convex Nonnegative Matrix Factorization Model for signed network analysis

An algorithm based on positive and negative links for community detection in signed networks

1 Introduction

2 Preliminaries