1 Introduction

1.1 Background

Community detection aims to partition a network into several components, called communities, each of which contains densely connected nodes with some structural similarities (Fortunato 2010; Bouguessa et al. 2010). As an important task in network science, community detection finds many real-world applications in diverse domains such as biology, chemistry, transportation, and sociology (Garcia et al. 2018; Chang et al. 2016; Liu and Wang 2022; Magalingam et al. 2015). For example, community detection in biological networks can be used to analyze the interactions of brain regions and their influence on brain functions (Garcia et al. 2018), and community detection in social networks can help identify crime organizations (Magalingam et al. 2015).

Given these wide applications, many algorithms have been developed for community detection, including traditional heuristic algorithms and modern deep learning ones. Heuristic algorithms, such as GN (Newman and Girvan 2004) and Louvain (Blondel et al. 2008), usually optimize an objective function iteratively to improve the quality of detected communities. Modern deep learning-based algorithms, such as CommunityGAN (Jia et al. 2019), DANMF (Ye et al. 2018), and MRFasGCN (Jin et al. 2019), encode graph structural information to learn node representations that are then used to compute community divisions.

Although much progress has been made in the field (Souravlas et al. 2021; Su et al. 2022), most algorithms for community detection are developed in the context of single-layer networks. A single-layer network, called a monoplex network in this paper, is characterized by a unique type of edge connecting two nodes. However, in many practical scenarios multiple types of relations exist between two entities. For example, two people in a social network can be connected by a friend relation, a coworker relation, or both. To better represent different relations, it has recently been proposed to use a multiplex network consisting of multiple network layers, each representing only one relation type. Multiplex networks have been applied in many scenarios such as multi-behavior recommendation (Xia et al. 2021) and community detection (Tang et al. 2009; Magnani et al. 2021).

A multiplex network is composed of L network layers, \(G_{l}(V,E_{l})\), \(l=1,2,...,L\). Figure 1 illustrates a 3-layer network for community detection. The number of nodes in each layer is the same, denoted as N. A multiplex network can also be represented as multiple adjacency matrices, \(\textbf{A}_{l} \in \mathbb {R}^{N \times N}\), for \(l=1,2,...,L\). Similar to monoplex networks, community detection in multiplex networks aims to partition the node set V into M communities, \(\{C_{1},C_{2},...,C_{M}\}\). The detected communities can be seen as the comprehensive associations of nodes under diverse types of relations.
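As a minimal illustration (not part of the original formulation; the node count, layer count, and random edges below are placeholders), a multiplex network can be held as one adjacency matrix per layer over a shared node set:

```python
# A minimal sketch of the multiplex-network representation used in this paper:
# one N x N adjacency matrix per layer, over a shared node set V of size N.
import numpy as np

N, L = 6, 3                          # illustrative values, not from the paper
rng = np.random.default_rng(0)

A = []                               # A[l] is the adjacency matrix of layer l+1
for _ in range(L):
    upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
    A.append(upper + upper.T)        # symmetric, no self-loops

# Community detection seeks one partition of V into M communities that is
# consistent across all L layers.
print([a.shape for a in A])          # [(6, 6), (6, 6), (6, 6)]
```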

In recent years, some solutions have been proposed to address community detection in multiplex networks. A few of them propose to first convert this problem into the classical setting of community detection in a single-layer network (Berlingerio et al. 2011; Suthers et al. 2013; Shao et al. 2022). For example, Berlingerio et al. (2011) first reduce a multiplex network into a single-layer one by judging whether the nodes are connected in at least one network layer and then apply the random walk method to detect communities from this single-layer network. Some algorithms directly apply a greedy modularity-maximization strategy on a multiplex network (Mucha et al. 2010; Tagarelli et al. 2017; Pramanik et al. 2017; Paul and Chen 2022). For example, Tagarelli et al. (2017) propose a multilayer modularity function to find consensus community structures by a greedy search approach. Some other algorithms apply matrix decomposition to extract features from a multiplex network (Ma et al. 2018; Gligorijević et al. 2019; Chen et al. 2019). For example, Ma et al. (2018) propose a nonnegative matrix factorization (NMF) algorithm called s2-jNMF, which applies a joint NMF to each network layer to obtain multiplex basis matrices for community discovery.

1.2 Motivation

The limitations of existing approaches to community detection in multiplex networks mainly lie in the following two aspects. On the one hand, many algorithms (Berlingerio et al. 2011; Boutemine and Bouguessa 2017; Pramanik et al. 2017; Interdonato et al. 2017) extend traditional heuristic ones designed for single-layer networks to multiplex networks, which can lose some inter-layer relational information. For example, Boutemine and Bouguessa (2017) flatten a multiplex network into a single one and then utilize the Label Propagation Algorithm (LPA) (Raghavan et al. 2007). However, a flattened edge cannot describe all latent relations between two nodes in a multiplex network. On the other hand, some representation learning algorithms for multiplex networks (Park et al. 2020; Jing et al. 2021) can be applied to community detection as a downstream task, but they ignore the critical factors for building multilayer community structures. For example, Jing et al. (2021) propose to maximize the mutual information between local node embeddings and the global summary for node classification and community detection, but they mainly focus on how to train a general node embedding. Besides, such methods normally rely on prior knowledge of node attributes for representation learning. How to detect communities from only the topological structure of a multiplex network, without extra node information, is a challenging task for most such node representation learning models.

To overcome such limitations, we consider the following two challenging issues: (a) What kind of structural or topological characteristics are suitable for a community? (b) How to extract and fuse features from each single layer to represent a community in a multiplex network? We note that two kinds of characteristics are important for a community, i.e., path-aware topological characteristics and neighbor-aware structural characteristics. The former can be learned by sampling short node sequences via random walks and maximizing nodes' co-occurrence probability in the sequences. The latter can be learned by aggregating the embeddings of a node's neighbors into its own (Liu et al. 2022). Besides, neighbors of different scales can be used for embedding learning. We also note that intra-layer and inter-layer information not only can be extracted but also can be fused to learn node representations for community detection via neural network models.

Motivated by such considerations, this paper learns not only neighbor-aware intra-layer structural information but also semantics-aware inter-layer relational information to build node representations for community detection. In many cases, little prior knowledge is available about which type of information, intra-layer or inter-layer, is more important for community detection in multiplex networks. For example, each layer of a temporal multiplex network represents a time slice and cannot provide extra information from the semantic perspective. We further argue that the two kinds of information should be jointly exploited for node representation learning in an interwoven manner, rather than separately. In response to these arguments, we design a graph convolutional fusion model (GCFM) for community detection in multiplex networks.

1.3 Contribution

The novelty of our proposed GCFM framework lies in encoding and fusing intra-layer structural information with inter-layer relational information to learn node representations in an interwoven fashion. Specifically, GCFM makes three key contributions: (1) How to learn node topological characteristics in each single layer? In GCFM, we first employ a graph embedding technique to obtain node initial embeddings for each network layer and then design a graph convolutional auto-encoder (GCA) module, executed on a per-network-layer basis, to encode neighbor-aware intra-layer structural information under different convolution scales. (2) How to better learn inter-layer semantics for community detection? In GCFM, we design a multiscale fusion network (MFN) module to fuse nodes' encodings at different layers and different scales for learning a holistic version of nodes' representations. (3) How to detect communities in the node feature space? In GCFM, we use a self-training mechanism to train our model and output community divisions for a multiplex network.

Compared with traditional heuristic studies (Tang et al. 2009; Boutemine and Bouguessa 2017; Gligorijević et al. 2019), our GCFM uses random walks to generate node initial embeddings for subsequent encoding and fusing, instead of optimizing a certain indicator such as modularity, which avoids falling into local optima. Compared with deep learning-based methods that only use an auto-encoder (Song and Thiagarajan 2019), our GCFM designs the GCA and MFN modules that consider the influence of different convolution scales. Compared with node embedding-based methods that train a general node embedding for downstream tasks such as node classification and community detection (Park et al. 2020; Jing et al. 2021), our GCFM applies a community-specific training loss and pays attention to community characteristics when learning node representations. Extensive experiments are conducted on both synthetic and real-world datasets. The results validate that our GCFM achieves higher precision and detects communities with better modularity than peer competitors in most cases.

Fig. 1 A 3-layer network with two communities

The main contributions are summarized as follows:

  • Propose a graph convolutional fusion model (GCFM) to encode and fuse intra-layer and inter-layer information for representation learning in an interwoven fashion for community detection in multiplex networks.

  • Develop a graph convolutional auto-encoder (GCA) to encode neighbor-aware intra-layer structural information at different scales.

  • Design a multiscale fusion network (MFN) to fuse and encode intra-layer structural information and inter-layer relational information at different layers and different scales.

  • Experiment on both synthetic and real-world datasets to validate the superiority of our proposed model.

The rest of the paper is organized as follows. Section 2 reviews the related work. The proposed GCFM is presented in Sect. 3 and evaluated in Sects. 4 and 5. Section 6 concludes the paper.

2 Related work

2.1 Multiplex network community detection

Many traditional heuristic algorithms for community detection in monoplex networks have been extended into multiplex ones, which can be generally categorized into three types: flattening, aggregation, and direct methods (Huang et al. 2021).

Flattening methods (Berlingerio et al. 2011; Gao et al. 2019) first convert (flatten) a multiplex network into a monoplex one, for example by setting new edge weights, and then use any monoplex algorithm to find communities. For example, Gao et al. (2019) propose a modified particle competition model, which constructs an extended adjacency matrix to represent a multiplex network for community detection.

Aggregation methods (Tang et al. 2009; Tagarelli et al. 2017; Ali et al. 2019) capture structural information from each single layer and aggregate it into a comprehensive node feature for community detection. For example, Ali et al. (2019) utilize a variational Bayes method to extract community structures from each network layer and aggregate them to determine the final community divisions.

Direct methods (Ma et al. 2018; Chen et al. 2019; Mercorio et al. 2019) work directly on a multiplex network to detect communities. For example, Ma et al. (2018) propose an s2-jNMF algorithm, which simultaneously factorizes the matrices from each network layer to obtain community divisions.

2.2 Deep learning community detection

Many deep learning techniques have been applied to community detection in monoplex networks. Compared to most traditional methods, the advantage of deep learning lies in that it can automatically learn node representations by, for example, encoding graph structural information (Liu et al. 2020).

Deep neural networks, like convolutional neural networks (Sperlí 2019), auto-encoders (Cao et al. 2018), and generative adversarial networks (Jia et al. 2019), can help capture structural relations between nodes. For example, Jia et al. (2019) propose a GAN-based (Generative Adversarial Network) model to encode the membership strength of nodes to communities for community detection.

Deep graph embedding-based models convert nodes into a low-dimensional vector space to preserve graph structural information, such as deep non-negative matrix factorization (Ye et al. 2018), deep sparse filtering (Xie et al. 2018), and community embedding (Tu et al. 2018). For example, Ye et al. (2018) propose a DANMF model, which learns the hierarchical mappings between the original network and the community distribution by using a deep auto-encoder.

Graph neural network-based models can extract community structures from raw graph data by fusing graph mining and deep learning techniques (Jin et al. 2019; Wang et al. 2019; Bo et al. 2020). For example, Jin et al. (2019) propose an end-to-end model for semi-supervised community detection, which integrates a graph convolutional network with the Markov Random Field technique.

3 Graph convolutional fusion model

Figure 2 presents the framework of our graph convolutional fusion model (GCFM), which contains four modules: (1) node initial embedding, (2) graph convolutional auto-encoder (GCA), (3) multiscale fusion network (MFN), and (4) self-training community detection.

Fig. 2 The overall framework of GCFM. \( X_{1} \) is the node initial embedding learned, for example, by DeepWalk in the 1st network layer. Different colors represent different network layers. The lower part shows the graph convolutional auto-encoder. The top part is the multiscale fusion network. The soft distribution Q and the target distribution P indicate the self-training community detection mechanism

3.1 Node initial embedding

We employ a graph embedding technique to obtain node initial embeddings per network layer. Since nodes have no attributes in our problem setting, we choose DeepWalk (Perozzi et al. 2014) to capture latent path-aware topological features for initializing node embeddings. We note that other techniques can also be used, and we compare a few of them in our experiments.

For the l-th network layer, we use \(\textbf{X}_{l} \in \mathbb {R}^{N \times d_{0}}\) to denote the node initial embedding matrix obtained from DeepWalk, where \(d_{0}\) is the dimension of the initial embedding. Although the initial embeddings could be directly used for community detection, such path-aware topological embeddings may not well describe the latent neighborhood information of a node, which, however, is often the key to the community detection task. To capture such local association characteristics, we next design the GCA module to encode neighbor-aware structural information.
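For concreteness, the following is a hedged sketch of this initialization step for one layer: uniform random walks on the layer's adjacency matrix are fed to a skip-gram model (here gensim's Word2Vec, an assumption on our part rather than the authors' implementation); the walk parameters mirror those listed in Sect. 4.4.

```python
# Illustrative DeepWalk-style initial embedding for one network layer.
# Assumes gensim (>=4) is available; parameter names follow Sect. 4.4.
import numpy as np
from gensim.models import Word2Vec

def deepwalk_embedding(A_l, d0=128, num_walks=20, walk_len=40, window=10, seed=0):
    rng = np.random.default_rng(seed)
    N = A_l.shape[0]
    walks = []
    for _ in range(num_walks):
        for start in range(N):
            walk, cur = [start], start
            for _ in range(walk_len - 1):
                nbrs = np.flatnonzero(A_l[cur])
                if len(nbrs) == 0:           # isolated node: stop the walk
                    break
                cur = rng.choice(nbrs)
                walk.append(cur)
            walks.append([str(v) for v in walk])
    # Skip-gram (sg=1) maximizes co-occurrence probability within the window.
    model = Word2Vec(walks, vector_size=d0, window=window, min_count=1, sg=1)
    # X_l[i] is the d0-dimensional initial embedding of node i in this layer.
    return np.vstack([model.wv[str(i)] for i in range(N)])
```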

3.2 Graph convolutional auto-encoder

The GCA module executes aggregation operations to update a node's embedding from its neighbors', by which local structural information can be encoded into node embeddings. Furthermore, by applying multiple scales of the aggregation operation, not only one-hop neighbors but also multi-hop neighbors, as well as their associated information through the edges between them, can be encoded to represent a node's local structure at different scales.

The GCA aggregation operations are executed on each network layer independently. Take the l-th network layer for example. Let \(\textbf{A}_{l} \in \mathbb {R}^{N \times N}\) denote the adjacency matrix of the l-th network layer. For the first scale of GCA, the input includes \(\textbf{A}_{l}\) and \(\textbf{X}_l\), and the output is given by

$$\begin{aligned} \textbf{H}_{l,1} = ReLU(\widetilde{\textbf{D}}^{-\frac{1}{2}}_{l} \widetilde{\textbf{A}}_{l} \widetilde{\textbf{D}}_{l}^{-\frac{1}{2}} \textbf{X}_{l} \textbf{W}_{l,0}), \end{aligned}$$
(1)

where \(\widetilde{\textbf{D}}^{-\frac{1}{2}}_{l} \widetilde{\textbf{A}}_{l} \widetilde{\textbf{D}}_{l}^{-\frac{1}{2}}\) is the symmetrically normalized adjacency matrix with self-loops. \(\widetilde{\textbf{A}}_{l} = \textbf{A}_{l} + \textbf{I}\) is the adjacency matrix plus the node self-connection matrix and \(\widetilde{\textbf{D}}_{l(ii)} = \sum _{j} \widetilde{\textbf{A}}_{l(ij)}\) is the corresponding degree matrix. \(\textbf{W}_{l,0} \in \mathbb {R}^{d_{0} \times d_{1}}\) is a learnable weight matrix. We choose \(ReLU(\cdot )\) as the activation function.

The forward encoding process in the k-th GCA scale is as follows:

$$\begin{aligned} \textbf{H}_{l,k} = ReLU(\widetilde{\textbf{D}}^{-\frac{1}{2}}_{l} \widetilde{\textbf{A}}_{l} \widetilde{\textbf{D}}_{l}^{-\frac{1}{2}} \textbf{H}_{l,k-1} \textbf{W}_{l,k-1}), \end{aligned}$$
(2)

where \(\textbf{H}_{l,k} \in \mathbb {R}^{N \times d_{k}}\) is the l-th layer node encoding after the k-th GCA scale. It is obtained from the node encoding \(\textbf{H}_{l,k-1} \in \mathbb {R}^{N \times d_{k-1}}\) of the previous scale. \(\textbf{W}_{l,k-1} \in \mathbb {R}^{d_{k-1} \times d_{k}}\) is a learnable weight matrix.
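As an illustrative sketch only (module and variable names are ours, not released code), the K-scale GCA encoder for one layer can be written in PyTorch as follows, directly following Eqs. (1)-(2):

```python
# Sketch of the K-scale GCA encoder for one network layer l:
# H_{l,k} = ReLU(D^{-1/2} (A+I) D^{-1/2} H_{l,k-1} W_{l,k-1}).
import torch
import torch.nn as nn

def normalize_adj(A):
    """Symmetrically normalized adjacency matrix with self-loops."""
    A_tilde = A + torch.eye(A.size(0))
    d = A_tilde.sum(dim=1)
    D_inv_sqrt = torch.diag(d.pow(-0.5))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

class GCAEncoder(nn.Module):
    def __init__(self, dims):          # e.g. dims = [128, 512, 1024, 2048, 10]
        super().__init__()
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.empty(dims[k], dims[k + 1]))
             for k in range(len(dims) - 1)])
        for W in self.weights:
            nn.init.xavier_uniform_(W)

    def forward(self, A, X):
        A_hat = normalize_adj(A)
        H, scales = X, []
        for W in self.weights:          # one iteration per GCA scale
            H = torch.relu(A_hat @ H @ W)
            scales.append(H)            # keep every scale for the MFN module
        return scales                   # [H_{l,1}, ..., H_{l,K}]
```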

Fig. 3 Illustration of using different scales of neighborhood to encode local structural information for the white node in a single network layer. Different numbers of neighbors, possibly from different communities, are included for updating its representation at different scales

The output of each GCA scale can be regarded as determining how large a neighborhood of a node is included in its local structural encoding. Figure 3 illustrates how different GCA scales include different neighborhoods of a node when encoding its local structural information, which is analogous to encoding a node-centric max-k-neighbor subgraph. The GCA module encodes multiscale local structures for each node; such structural encodings can be exploited to discriminate neighborhood associations at different scales, which are key components for the community detection task.

In the decoding process, we reconstruct the edges in between nodes in each single layer by using an inner product decoder:

$$\begin{aligned} \widehat{\textbf{A}}_{l} = sigmoid({\textbf{H}_{l,K}} \textbf{H}_{l,K}^{\textrm{T}}), \end{aligned}$$
(3)

where \(\widehat{\textbf{A}}_{l}\) is the reconstructed adjacency matrix of the l-th layer. \(\textbf{H}_{l,K} \in \mathbb {R}^{N \times d}\) is the K-scale GCA node encoding.

We apply a binary cross entropy loss to minimize the difference between \(\widehat{\textbf{A}}_{l}\) and \(\textbf{A}_{l}\) for the l-th layer:

$$\begin{aligned} L_{rl} = -\frac{1}{N} \sum _{i=1}^{N} \sum _{j=1}^{N} [\textbf{A}_{l(ij)} \log (\widehat{\textbf{A}}_{l(ij)})+(1- \textbf{A}_{l(ij)}) \log (1-\widehat{\textbf{A}}_{l(ij)})]. \end{aligned}$$
(4)

The total reconstruction loss is the sum of the reconstruction errors of all layers:

$$\begin{aligned} L_{r} = \sum _{l=1}^{L} L_{rl}. \end{aligned}$$
(5)
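A short sketch of this decoding side, under the same naming assumptions as above, reconstructs each layer's adjacency from its K-scale encoding and accumulates the per-layer binary cross-entropy losses of Eqs. (3)-(5):

```python
# Sketch of the inner-product decoder and reconstruction loss (Eqs. (3)-(5)).
import torch

def reconstruction_loss(A_list, H_K_list, eps=1e-8):
    """A_list[l]: (N,N) adjacency; H_K_list[l]: (N,d) K-scale encoding of layer l."""
    L_r = 0.0
    for A_l, H_lK in zip(A_list, H_K_list):
        A_hat = torch.sigmoid(H_lK @ H_lK.T)            # Eq. (3)
        bce = -(A_l * torch.log(A_hat + eps)
                + (1 - A_l) * torch.log(1 - A_hat + eps))
        L_r = L_r + bce.sum() / A_l.size(0)             # Eq. (4), averaged over N
    return L_r                                          # Eq. (5): sum over layers
```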

3.3 Multiscale fusion network

The GCA module is executed on a per-network-layer basis, that is, its node encodings at different neighborhood scales are computed separately for different network layers. As the edges in each layer express a specific kind of semantic relation among nodes, we need to take all kinds of semantic relations into account to output a comprehensive community division for a multiplex network. To this end, we propose the MFN module to fuse node encodings not only from different GCA scales, but also from different network layers.

As we have no prior knowledge to decide which layer in a multiplex network is more important, we first design the multilayer fusion of node structural encodings as a simple sum operation for the k-th GCA scale:

$$\begin{aligned} \textbf{H}_{k} = \sum _{l=1}^{L} \textbf{H}_{l,k}. \end{aligned}$$
(6)

\(\textbf{H}_{k} \in \mathbb {R}^{N \times d_{k}}\) is called the k-th scale node multilayer encoding.

We next use a fully connected neural network to output different scales of node representation \(\textbf{Z}_k\). The initial scale of MFN fusion transforms \(\textbf{H}_{1}\) to the hidden state in the next scale:

$$\begin{aligned} \textbf{Z}_{1}= & {} \textbf{H}_{1}, \end{aligned}$$
(7)
$$\begin{aligned} \textbf{Z}_{2}= & {} ReLU(\textbf{W}_{1}\textbf{Z}_{1}^{\textrm{T}} + \textbf{b}_{1})^{\textrm{T}}, \end{aligned}$$
(8)

where \(\textbf{Z}_{1} \in \mathbb {R}^{N \times d_{1}}\) is the initial scale of node representation, and \(\textbf{Z}_{2} \in \mathbb {R}^{N \times d_{2}}\) is the second scale of node representation. \(\textbf{W}_{1} \in \mathbb {R}^{d_{2} \times d_{1}}\) and \(\textbf{b}_{1} \in \mathbb {R}^{d_{2} \times N}\) are learnable parameters.

In the fully connected units, each node exchanges its encodings with those of the other nodes. We emphasize the topological information by adding the aggregated multilayer node encoding at each MFN scale. For the k-th (\(k>1\)) MFN scale, we update the node representation \(\textbf{Z}_{k}\) by inputting the multilayer node encoding \(\textbf{H}_{k-1}\) and the multiscale node representation \(\textbf{Z}_{k-1}\) into a fully connected unit. The forward process can be written as:

$$\begin{aligned} \textbf{Z}_{k} = ReLU(\textbf{W}_{k-1}(\textbf{Z}_{k-1} + \textbf{H}_{k-1})^{\textrm{T}} + \textbf{b}_{k-1})^{\textrm{T}}, \end{aligned}$$
(9)

where \(\textbf{Z}_{k} \in \mathbb {R}^{N \times d_{k}}\) and \(\textbf{Z}_{k-1}, \textbf{H}_{k-1} \in \mathbb {R}^{N \times d_{k-1}}\). The weight matrix \(\textbf{W}_{k-1} \in \mathbb {R}^{d_{k} \times d_{k-1}}\) and bias \(\textbf{b}_{k-1} \in \mathbb {R}^{d_{k} \times N}\) are learnable parameters.

The final MFN scale fuses \(\textbf{Z}_{K} \) and \(\textbf{H}_{K}\) to output the final node representation \(\textbf{Z} \in \mathbb {R}^{N \times d_{K}}\)

$$\begin{aligned} \textbf{Z} = \textbf{Z}_{K} + \textbf{H}_{K}, \end{aligned}$$
(10)

which will be input to the next self-training module for community detection.
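The following is a hedged PyTorch sketch of the MFN module under our naming assumptions, covering the layer-wise sums of Eq. (6) and the fully connected fusion of Eqs. (7)-(10):

```python
# Sketch of the multiscale fusion network (Eqs. (6)-(10)).
import torch
import torch.nn as nn

class MFN(nn.Module):
    def __init__(self, dims):                  # dims = [d_1, ..., d_K]
        super().__init__()
        self.fcs = nn.ModuleList(
            [nn.Linear(dims[k], dims[k + 1]) for k in range(len(dims) - 1)])

    def forward(self, scales_per_layer):
        # scales_per_layer[l][k] = H_{l,k+1}; fuse layers by summation (Eq. (6))
        K = len(scales_per_layer[0])
        H = [sum(layer[k] for layer in scales_per_layer) for k in range(K)]
        Z = H[0]                               # Eq. (7): Z_1 = H_1
        for k, fc in enumerate(self.fcs):      # Eq. (8) for k=0, Eq. (9) for k>0
            Z = torch.relu(fc(Z + H[k])) if k > 0 else torch.relu(fc(Z))
        return Z + H[-1]                       # Eq. (10): final representation Z
```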

3.4 Self-training community detection

Community detection is often an unsupervised problem in the real world. Inspired by Xie et al. (2016), we adapt a self-training mechanism to train our GCFM.

For the i-th node, its final node representation \(\textbf{z}_{i}\), the i-th row of the node representation \(\textbf{Z}\), can be seen as an embedding in a feature space \(\mathbb {Z}\). Let \({\varvec{\nu }}_{j}\) denote the representation of the j-th community center, which is also an embedding in the space \(\mathbb {Z}\). All the community center embeddings are initialized by using the K-means algorithm before training.

Assume the pre-specified number of communities is M. The community centers are computed as follows. We first randomly select M community centers and then repeat the following process until convergence: (a) For each node i, we find the class label \(c_{i}\) that minimizes the distance between node i and the community centers \({\varvec{\nu }}_{j}\), \(j=1,2,...,M\):

$$\begin{aligned} c_{i}=\mathop {\arg \min }\limits _{j}(\Vert \textbf{z}_{i} - {\varvec{\nu }}_{j} \Vert ^{2}). \end{aligned}$$
(11)

(b) We next update each community center by recalculating the centroid of the nodes assigned to it:

$$\begin{aligned} {\varvec{\nu }}_{j}=\frac{1}{\vert C_{j} \vert } \sum \limits _{\textbf{z}_{i} \in C_{j}} \textbf{z}_{i}, \end{aligned}$$
(12)

where \(C_{j}\) contains all node representations with class label \(c_{i}=j\).
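For illustration, a plain version of this center-initialization loop (Eqs. (11)-(12)) on the node representations \(\textbf{Z}\) can be sketched as follows; the function name and the iteration cap are our own choices.

```python
# Sketch of K-means center initialization on node representations Z (Eqs. (11)-(12)).
import torch

def kmeans_centers(Z, M, iters=100, seed=0):
    g = torch.Generator().manual_seed(seed)
    centers = Z[torch.randperm(Z.size(0), generator=g)[:M]].clone()  # random init
    labels = torch.full((Z.size(0),), -1)
    for _ in range(iters):
        dist2 = torch.cdist(Z, centers).pow(2)
        new_labels = dist2.argmin(dim=1)              # Eq. (11): nearest center
        if torch.equal(new_labels, labels):           # stop when labels stabilize
            break
        labels = new_labels
        for j in range(M):                            # Eq. (12): recompute centroids
            members = Z[labels == j]
            if members.numel() > 0:
                centers[j] = members.mean(dim=0)
    return centers, labels
```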

We next use the Student’s t-distribution (Van der Maaten and Hinton 2008) as a kernel to measure the similarity between each node and each community center. The similarity \(\textbf{q}_{ij}\) between a node i and a community j is computed by

$$\begin{aligned} \textbf{q}_{ij} = \frac{(1 + \Vert \textbf{z}_{i} - {\varvec{\nu }}_{j} \Vert ^{2})^{-1}}{\sum _{j'} (1 + \Vert \textbf{z}_{i} - {\varvec{\nu }}_{j'} \Vert ^{2})^{-1}}. \end{aligned}$$
(13)

It can be interpreted as the probability of assigning node i to community j. As such, we treat \(\textbf{Q}=[\textbf{q}_{ij}] \in \mathbb {R}^{N \times M}\) as the distribution of the soft assignments of all nodes.

In order to detect cohesive communities, an objective is to move each node closer to its community center in terms of their similarity in the feature space \(\mathbb {Z}\). This can be translated into obtaining a high-confidence and trustworthy distribution of soft assignments, denoted as the target distribution \(\textbf{P} \in \mathbb {R}^{N \times M}\), where an element \(\textbf{p}_{ij} \in \textbf{P}\) is computed by raising \(\textbf{q}_{ij}\) to the second power and then normalizing by frequency per community:

$$\begin{aligned} \textbf{p}_{ij} = \frac{\textbf{q}_{ij}^{2}/\textbf{f}_{j}}{\sum _{j'} \textbf{q}_{ij'}^{2}/\textbf{f}_{j'} }, \end{aligned}$$
(14)

where \(\textbf{f}_{j} = \sum _{i} \textbf{q}_{ij}\) are soft community frequencies.
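A small sketch of these two distributions, with names of our own choosing, follows; it computes the Student's t soft assignments of Eq. (13) and the sharpened targets of Eq. (14).

```python
# Sketch of the soft assignment Q (Eq. (13)) and target distribution P (Eq. (14)).
import torch

def soft_assignment(Z, centers):
    """q_ij from a Student's t kernel with one degree of freedom."""
    dist2 = torch.cdist(Z, centers).pow(2)         # (N, M) squared distances
    q = 1.0 / (1.0 + dist2)
    return q / q.sum(dim=1, keepdim=True)          # normalize over communities

def target_distribution(q):
    """Square q, normalize by soft community frequency f_j, then renormalize rows."""
    weight = q.pow(2) / q.sum(dim=0)               # f_j = sum_i q_ij
    return weight / weight.sum(dim=1, keepdim=True)
```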

After acquiring soft assignment distribution \(\textbf{Q}\) and target distribution \(\textbf{P}\), we use the Kullback–Leibler divergence loss to measure the difference between the two community distributions.

$$\begin{aligned} L_{c} = KL(\textbf{P} \Vert \textbf{Q}) =\sum _{i} \sum _{j} \textbf{p}_{ij} \log \frac{\textbf{p}_{ij}}{\textbf{q}_{ij}}. \end{aligned}$$
(15)

By minimizing the KL divergence loss, our model improves the distribution \(\textbf{Q}\) under the supervision of the target distribution \(\textbf{P}\), which is called self-training community detection. Note that \(\textbf{P}\) should have the following properties: (a) a high accuracy of community division; (b) community assignments with high confidence; and (c) a normalized loss contribution from each centroid. Therefore, we choose \(\textbf{Q}\) to construct \(\textbf{P}\) because \(\textbf{Q}\) is natural and flexible in its use of softer probabilistic targets.

For model training, we define the final loss of GCFM by

$$\begin{aligned} L = L_{r} + \lambda L_{c}, \end{aligned}$$
(16)

where \(\lambda \) is a coefficient to balance two losses of network reconstruction and community detection.

At last, we obtain final community divisions from the optimized distribution \(\textbf{Q}\). The community label of each node is assigned by:

$$\begin{aligned} g_{i} = \mathop {\arg \max }\limits _{j}(\textbf{q}_{ij}), \end{aligned}$$
(17)

where \(g_{i}\) is the community label of node i.
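Putting the pieces together, one training step and the final label readout (Eqs. (15)-(17)) can be sketched as follows; the helper names and the use of an external optimizer are assumptions, not the authors' implementation.

```python
# Sketch of one training step (Eqs. (15)-(16)) and label assignment (Eq. (17)).
import torch
import torch.nn.functional as F

def train_step(q, p, L_r, optimizer, lam=10.0):
    # Eq. (15): L_c = KL(P || Q); P is treated as a fixed target here.
    L_c = F.kl_div(q.log(), p.detach(), reduction='sum')
    loss = L_r + lam * L_c                 # Eq. (16): L = L_r + lambda * L_c
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def community_labels(q):
    # Eq. (17): assign node i to the community with the highest q_ij.
    return q.argmax(dim=1)
```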

3.5 Complexity analysis

Let N denote the number of nodes, L the number of layers in a multiplex network, K the depth of GCA, and M the number of detected communities. We consider the computational complexity of each module. The node initial embedding is generated by DeepWalk, which involves training a Skip-gram model. Given the number of random walks \(\rho \), walk length t, window size \(\omega \), and initial embedding size \(d_{0}\), its computational complexity is in the order of \(\mathcal {O}(L\rho Nt\omega (d_{0}+d_{0}logN))\) according to Chen et al. (2018). GCA is a GCN-based model whose time complexity is linear in the number of edges. Therefore, we focus on the number of edges in each layer \(\vert E_{l} \vert , l=1,2,...,L\) and the encoding dimension of each GCA scale \(d_{k}, k=1,2,...,K\). The complexity of GCA is in the order of \(\mathcal {O}(\sum _{l=1}^{L} \vert E_{l} \vert d_{0}d_{1}d_{2}...d_{K})\). The MFN consists of multiple fully connected units and its initial input is the first scale of GCA. The time complexity of MFN is in the order of \(\mathcal {O}(LNd_{1}^{2}d_{2}^{2}...d_{K}^{2})\). For the self-training module, the computation time mainly depends on Eq. (13). According to Xie et al. (2016), its computational complexity is \(\mathcal {O}(NM+NlogN)\). To sum up, the total time complexity of GCFM is in the order of \(\mathcal {O}(L\rho Nt\omega (d_{0}+d_{0}logN)+ \sum _{l=1}^{L} \vert E_{l} \vert d_{0}d_{1}d_{2}...d_{K} +LNd_{1}^{2}d_{2}^{2}...d_{K}^{2} +NM+NlogN)\), which is linear in the number of layers and the number of edges in each layer.

4 Experiment setup

4.1 Datasets

We conduct experiments on synthetic datasets with ground-truth community labels and on real-world datasets without such labels. For synthetic datasets, we use mLFR (Bródka 2016), which is controlled by the mixing parameter \(\mu \); a smaller \( \mu \) suggests more obvious community structures, and we vary it in our experiments. For real-world datasets, we use AUCs (Magnani et al. 2013), RM (Eagle and Pentland 2006), C.elegans (Chen et al. 2006), London (De Domenico et al. 2014), Vickers (Zhang et al. 2017), FFTWYT (Magnani and Rossi 2011), CKM (Coleman et al. 1957), Plasmodium (Stark et al. 2006), HumanHIV1 (Stark et al. 2006), and FriendFeed (Magnani and Rossi 2011). The details of the datasets are as follows.

\(\bullet \) mLFR (Bródka 2016): It is a benchmark tool for generating synthetic multiplex networks. The main parameter of mLFR is the mixing parameter \(\mu \) which is the probability of a node connecting to another node in a different community. A smaller value of \( \mu \) suggests that a multiplex network contains more obvious community structures. Table 1 presents the parameter settings for the mLFR datasets in our experiment.

The following real-world multiplex networks are from different domains, and Table 2 summarizes their statistics.

\(\bullet \) AUCs (Magnani et al. 2013): It is a 5-layer social network consisting of 61 nodes as employees in the Aarhus University and 620 edges as their social relations, including work together, lunch together, friendship on Facebook, off-line friendship, and co-authorship.

\(\bullet \) RM (Eagle and Pentland 2006): It is a 3-layer social network from the MIT Reality Mining project, which consists of 94 nodes as project participants and 1,385 edges as their social relations, including friendship, average proximity at work, and average proximity outside the lab.

\(\bullet \) C.elegans (Chen et al. 2006): It is a 3-layer neuronal network of the nematode "Caenorhabditis elegans", consisting of 279 nonpharyngeal neurons and 5,863 synaptic connections of three types, namely, electric connection, chemical monadic connection, and chemical polyadic connection.

\(\bullet \) London (De Domenico et al. 2014): It is a 3-layer traffic transportation network consisting of 369 nodes and 441 edges, where nodes are stations and three types of edges are transportation lines, including the underground lines, overground lines, and DLR.

\(\bullet \) Vickers (Zhang et al. 2017): It is a 3-layer offline social network of 29 nodes as seventh grade students in a school and 740 edges for their social relations, including affinity in the class, best friends and working together.

\(\bullet \) FFTWYT (Magnani and Rossi 2011): It is a 3-layer online social network, consisting of 6,407 users and 74,862 edges. The three layers correspond to three online social platforms, including Friendfeed, Twitter, and YouTube.

\(\bullet \) CKM (Coleman et al. 1957): It is a 3-layer social network, consisting of 246 nodes and 1,551 edges. The data is collected from physicians in four towns in Illinois (Peoria, Bloomington, Quincy, and Galesburg) and records the physicians' adoption of a new drug.

\(\bullet \) Plasmodium (Stark et al. 2006): It is a 3-layer biological network, consisting of 1,203 nodes and 2,521 edges, which includes three types of interactions between genes and proteins, i.e. direct interaction, physical association and association.

\(\bullet \) HumanHIV1 (Stark et al. 2006): It is a 5-layer biological network, consisting of 1,005 nodes and 1,355 edges. It considers different types of genetic interactions about HIV type 1 in the Biological General Repository for Interaction Datasets, i.e. physical association, direct interaction, colocalization, association, and suppressive genetic interaction.

\(\bullet \) FriendFeed (Magnani and Rossi 2011): It is a 3-layer online social network, consisting of 21,006 nodes and 573,600 edges. It mainly contains interactions among users in Friendfeed collected over two months, including commenting, liking, and following.

Fig. 4 The topological structure of each layer in the AUCs multiplex network

Table 1 The basic parameter setting of mLFR dataset
Table 2 The statistics of real datasets

4.2 Competitors

We compare our GCFM with the following competitors:

\(\bullet \) PMM (Tang et al. 2009) is based on modularity optimization and matrix decomposition, which consists of structural feature extraction and cross-layer integration.

\(\bullet \) MDLPA (Boutemine and Bouguessa 2017) introduces a constrained label propagation mechanism for multiplex networks.

\(\bullet \) CSNMF (Gligorijević et al. 2019) introduces a collective factorization framework which uses symmetric NMF (Kuang et al. 2012) for each network layer and then generates a common feature representation by fusing layers to extract community structures.

\(\bullet \) DH-Louvain (Shao et al. 2022) is a multi-greedy algorithm for community detection in multiplex networks which optimizes a weighted modularity density.

\(\bullet \) M-DeepWalk (Song and Thiagarajan 2019) first constructs a supra graph for a multiplex network and then uses the DeepWalk (Perozzi et al. 2014) to obtain node representations that are input into an auto-encoder to extract cohesive structures in the latent space.

\(\bullet \) DMGI (Park et al. 2020) extends Deep Graph Infomax (Velickovic et al. 2019) to multiplex networks and jointly integrates the nodes' embeddings through a consensus regularization framework.

\(\bullet \) HDMI (Jing et al. 2021) designs a joint supervision signal that includes both extrinsic and intrinsic mutual information to optimize node representations in multiplex networks.

4.3 Metrics

For synthetic datasets, as they have ground truth labels, we use the commonly used evaluation metrics for supervised learning, including NMI (Normalized Mutual Information), ARI (Adjusted Rand Index), and Purity. For real-world datasets, as they do not contain ground truth labels, we adopt the widely used modularity metric (Mucha et al. 2010) (also reported for synthetic datasets) with two parameters, the resolution parameter \(\gamma _{s}\) and the coupling parameter \(\mathscr {C}_{jsr}\). The details of the metrics are as follows.

NMI trades off the quality of communities against the number of communities:

$$\begin{aligned} NMI(\Omega ; C) = \frac{I(\Omega ; C)}{[H(\Omega )+H(C)]/2}, \end{aligned}$$
(18)

where \(\Omega \) is the community division and C is the ground truth. \(I(\Omega ; C)\) is the mutual information between \(\Omega \) and C. \(H(\cdot )\) is the Shannon information entropy.

Rand Index is the percentage of true decisions, defined by

$$\begin{aligned} RI(\Omega ; C) = \frac{TP + TN}{TP + FP + FN +TN}, \end{aligned}$$
(19)

where true positive (TP), true negative (TN), false positive (FP), and false negative (FN) are decisions between \(\Omega \) and C. ARI adjusts the Rand Index for chance agreement.

Purity is the percentage of nodes which are classified correctly:

$$\begin{aligned} Purity(\Omega ; C) = \frac{1}{N}\sum _{m} \mathop {max}\limits _{j} \vert \omega _{m} \cap c_{j} \vert , \end{aligned}$$
(20)

where \(\omega _{m}\) and \(c_{j}\) mean the m-th community in \(\Omega \) and the j-th community in C respectively.
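As a quick reference (an illustrative sketch using scikit-learn for NMI and ARI and a direct implementation of purity, not evaluation code from this paper), these supervised metrics can be computed as follows.

```python
# Sketch of the supervised metrics (Eqs. (18)-(20)) on toy label vectors.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def purity(pred, truth):
    """Fraction of nodes whose detected community matches the majority true label."""
    pred, truth = np.asarray(pred), np.asarray(truth)   # non-negative int labels
    total = 0
    for c in np.unique(pred):
        members = truth[pred == c]
        total += np.bincount(members).max()             # size of dominant true class
    return total / len(truth)

pred  = [0, 0, 1, 1, 1, 2]        # detected community labels (toy example)
truth = [0, 0, 1, 1, 2, 2]        # ground-truth labels
print(normalized_mutual_info_score(truth, pred),
      adjusted_rand_score(truth, pred),
      purity(pred, truth))
```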

The multilayer modularity evaluation metric is defined by:

$$\begin{aligned} Q_{m} = \frac{1}{2\eta }\sum \limits _{ijsr} [(\textbf{A}_{ijs} - \gamma _{s} \frac{d_{is} d_{jr}}{2m_{s}} ) \delta (s,r) + \delta (i,j) \mathscr {C}_{jsr}] \delta \left( g_{is}, g_{jr}\right) , \end{aligned}$$
(21)

where \(\textbf{A}_{ijs}\) is the intra-layer edge strength and \(\mathscr {C}_{jsr}\) is the inter-layer edge strength, also called the coupling parameter (subscripts i, j index nodes and s, r index layers). \(\gamma _{s}\) is the resolution parameter of the s-th layer. \( \eta =\left( \sum _{ijs} \textbf{A}_{ijs} + \sum _{jsr} \mathscr {C}_{jsr} \right) /2 \) is the total multilayer edge strength. \(m_{s} = \left( \sum _{ij} \textbf{A}_{ijs} \right) /2\) is the total intra-layer edge strength of the s-th layer. \(d_{is}\) is the degree of node i in the s-th layer. \( g_{is} \) is the community label of node i in the s-th layer. \(\delta (\cdot )\) is the Kronecker function. In our experiments, the resolution parameter \(\gamma _{s}\) and the coupling parameter \(\mathscr {C}_{jsr}\) are set to 1. The multilayer modularity is scaled in the range of [0, 1]. A higher modularity indicates a better community detection result.

4.4 Parameters

In our GCFM algorithm, we use DeepWalk to generate a 128-dimensional initial embedding for each node. For DeepWalk, we set the window size to 10, the walk length to 40, and the number of walks started per node to 20 for all datasets. The depth of GCA is set to 4. The dimensions of the GCA scales are set to 128-512-1024-2048-10 for synthetic datasets and 128-1024-2048-2048-10 for real-world datasets. The learning rate r is set to 0.001. Considering that the reconstruction loss \(L_{r}\) is much larger than the community detection loss \(L_{c}\), we set the balance coefficient \(\lambda =10\).

For the competitor M-DeepWalk, as it assigns a community label to a node in each layer, we use majority voting to determine the node's final community label. MDLPA can automatically determine the number of communities without a priori assumption and does not need any hyper-parameters. PMM has a single hyper-parameter, the number of structural features extracted from each layer; we vary this number from 5 to 20 with a step of one and choose the best result. CSNMF has no external parameters except a pre-specified community number; we set it to the target community number in synthetic datasets and tune it on real-world datasets. DH-Louvain is also parameter-free and can determine the number of communities by itself. For M-DeepWalk, we set the threshold for generating a supra graph to 0.1 for the synthetic datasets and 0.2 for the real-world datasets; the parameters of DeepWalk are the same as ours, and the learning rate is set to 0.001. For DMGI and HDMI, we use the initial embedding in Sect. 3.1 as 128-dimensional node features in each layer, and the learning rate is also set to 0.001. For the coefficients of the module loss and regularization in DMGI, we follow the original paper, select from [0.0001, 0.001, 0.01, 0.1], and tune them to obtain the best result. For HDMI, we set the layer and fusion coefficients to 1 for all layers in synthetic datasets and tune them by grid search on the real-world datasets. For all the competitors and our model, we run each 10 times and report the average result.

Fig. 5 The results of NMI, ARI, purity and modularity on mLFR datasets. The upper four subfigures show the varying of a NMI, b ARI, c purity, d modularity in small mLFR datasets with 500 nodes, and the lower four subfigures show the varying of e NMI, f ARI, g purity, h modularity in larger mLFR datasets with 2000 nodes

5 Results and analysis

5.1 Performance on synthetic datasets

Figure 5 compares the community detection performance on mLFR synthetic networks. We observe that our GCFM performs the best in almost all cases. Furthermore, the performance improvements over the competitors are more significant in the cases of \(\mu \ge 0.25\). We also notice that the much better modularity of our GCFM over that of the competitors suggests stronger modular structures in its detected communities.

These results validate the effectiveness of our GCFM algorithm: The GCA module encodes neighbor-aware structural information for each node to explore locally associated structures at different scales in each network layer. The MFN module fuses structural encodings first for multiple network layers and then for different convolution scales, so as to learn a node representation featuring both intra-layer locally associated structures and inter-layer semantic relations. Finally, the self-training mechanism extracts cohesive communities from the viewpoint of community distribution via the soft assignment, which can iteratively optimize each node representation close to its community center.

Some further analysis of the results follows. We observe that the performance of all algorithms degrades as the mixing parameter \(\mu \) increases. For the mLFR synthetic datasets, a larger value of \(\mu \) introduces more connections between nodes belonging to different communities, which makes the community detection task more difficult. We also observe that M-DeepWalk, based on node representation learning, performs well in many cases. M-DeepWalk extends DeepWalk (Perozzi et al. 2014) by enabling cross-layer random walks for learning node representations. Even given its simple learning technique, the results indicate that learning node representations can be a powerful approach for the community detection task. DMGI and HDMI are both based on mutual information maximization, which can narrow the difference between network layers, and they achieve relatively good performance. PMM achieves average performance by optimizing the principal modularity. CSNMF, which utilizes collective matrix decomposition, is sensitive to the mixing parameter \(\mu \): when \(\mu \ge 0.25\), its performance drops sharply. Though MDLPA can determine the community number automatically, the loss of multiplex information when flattening multiplex networks leads to the worst performance on most mLFR datasets. DH-Louvain has two phases, node merging and community merging, to optimize a weighted modularity density, and its performance is average among all competitors.

Table 3 Modularity on real-world datasets
Fig. 6 Modularity with different pre-specified numbers of communities

5.2 Performance on real-world datasets

Table 3 presents the modularity results on real-world datasets. The numbers in brackets are the numbers of communities detected by these algorithms when achieving their respective best modularity. We note that the PMM, CSNMF, M-DeepWalk, DMGI, HDMI, and GCFM algorithms need the number of communities to be pre-specified, while the MDLPA and DH-Louvain algorithms do not and automatically determine the number of communities upon finishing execution. Figure 6 presents the modularity results with different pre-specified community numbers on three different types of multiplex networks. We observe that our GCFM outperforms the others, demonstrating its superiority and robustness.

We again observe that our GCFM achieves the best modularity on most real-world datasets, and the third place on the London dataset. The specialty of the London dataset lies in its sparsity: it is a 3-layer network containing 369 nodes yet only 441 edges. Indeed, it contains many isolated nodes, each of which does not link to any other node in one or more network layers. Generally speaking, a community division with high modularity should assign each isolated node an independent community label. MDLPA directly regards isolated nodes as independent communities, which can be seen from its much larger community numbers in Table 3. On the other hand, algorithms needing pre-specified community numbers do not discriminate isolated nodes and still cluster them into communities, which leads to a lower modularity. This phenomenon could easily be addressed by first removing isolated nodes before community division. From Fig. 6, it can be observed that our GCFM achieves consistently good performance, while some competitors such as CSNMF and PMM show performance variations as the number of communities increases. Both CSNMF and PMM are based on adjacency matrix decomposition. Varying the community number impacts the eigenvectors during matrix decomposition, which in turn impacts the obtained community structures when two medium-sized communities merge into one or a huge community is divided into multiple communities. In our GCFM, the community number only impacts the self-training module and the KL loss.

An interesting observation is that M-DeepWalk often performs the worst on these real-world datasets, which is contrary to the results on the synthetic datasets. This discrepancy comes from the differences between the two types of datasets. In the synthetic datasets, the topological structure of each layer is similar, that is, each node has similar neighborhoods in different layers, which makes it easier to find more inter-layer edges. In real-world datasets, the topological structures of different network layers can differ greatly, which leads to the loss of structural information when generating a supra graph. Our GCFM can decrease such inconsistency of nodes across layers.

5.3 Ablation study

5.3.1 Node initial embedding

To investigate the impact of node initial embedding, we compare DeepWalk with three other methods, i.e., one-hot encoding, node2vec (Grover and Leskovec 2016), and node centrality. For node2vec, we set its return parameter \(p = 1\) and in-out parameter \(q = 0.5\). For the centrality-based method, we choose four common centrality metrics as the node initial embedding, including degree centrality, betweenness centrality, closeness centrality, and load centrality. Figure 7 presents the modularity results with different node initial embeddings. The two graph embedding algorithms, namely, DeepWalk and node2vec, achieve better performance than the other two. This is not unexpected, as they can pre-train node initial embeddings with some topological information.

Fig. 7 Modularity with different ways of generating node initial embedding

5.3.2 Fusion depth analysis

For a given four-scale GCA module, we conduct experiments to analyze the impact of fusion depths. The outputs of the four-scale GCA are denoted by \(\textbf{H}_{1}\), \(\textbf{H}_{2}\), \(\textbf{H}_{3}\), and \(\textbf{H}_{4}\). Each time we decrease one output scale and input the others into the MFN module. For example, GCFM-3 means that there are three output scales, i.e. \(\textbf{H}_{2}\), \(\textbf{H}_{3}\), and \(\textbf{H}_{4}\), which are input to the MFN module. Besides, GCFM-1 means that only \(\textbf{H}_{4}\) is used for self-training community detection. Table 4 presents the modularity results with fusion under different scales. We can see that the output of each GCA scale contributes to the final community division and GCFM-4 outperforms the others, which suggests the necessity of multiscale fusion.

Table 4 Modularity with different fusion depths

5.3.3 Model depth analysis

We also conduct experiments to investigate the effect of model depth, i.e., the number of GCA scales. Figure 8 plots the modularity results against different model depths. We can observe that the best modularity results are usually obtained with a 4-scale model. When more than four scales are used, the modularity does not keep increasing and may even start dropping. Although multiscale fusion is desirable, a larger model depth, i.e., more GCA scales, might introduce the so-called over-smoothing problem: after more neighbors are included in the aggregation operation, the neighbor-aware structural encodings start becoming similar across topologically close nodes, reducing their capability of discriminating local structures.

Fig. 8 Modularity with different model depths

5.3.4 Parameter analysis

In order to investigate the effect of the hyper-parameters, i.e., the loss coefficient \(\lambda \) and the learning rate r, we test the modularity while increasing \(\lambda \) from 0 to 100 under different learning rates, i.e., \(r \in \{0.0001, 0.001, 0.01, 0.1\}\). Figure 9 shows the modularity results with varying \(\lambda \) and r on the C.elegans dataset. We can see that the best modularity is obtained under the setting of \(\lambda =10\) and \(r=0.001\). When \(\lambda =0\), our model cannot identify highly modular structures, which demonstrates the effectiveness of the community detection loss \(L_{c}\).

Fig. 9 Modularity with different \(\lambda \) and r on the C.elegans dataset

Fig. 10 Network visualization for an instance of a 3-layer mLFR network with \(\mu = 0.35\). a The 3-layer mLFR network consists of 50 nodes and 84, 83, 84 edges in the three layers, respectively. The vertical lines connect the same node across the three layers. b The ground truth of community divisions, as well as the communities detected by the experimented algorithms. Nodes with the same color belong to the same community

Fig. 11 Embedding visualization of the node representation \(\textbf{Z}\) in an mLFR network with \(\mu = 0.4\), consisting of 500 nodes and in total 16,445 edges. a 2D visualization of the nodes' representations during the training process of the GCFM algorithm. Epoch 0 indicates that the representation is only trained with the graph convolutional auto-encoder, and the following epochs include the self-training component in the representation training. b Visualization of the feature similarity matrix \(\mathbf {ZZ^{T}}\). For the node representation \(\textbf{Z}\) at epoch 25, we normalize the matrix \(\mathbf {ZZ^{T}}\) and visualize it as a gray-scale image. The main diagonal blocks indicate the community distribution

5.4 Visualization

5.4.1 Network visualization

Figure 10 visualizes an instance of a 3-layer mLFR network with \(\mu = 0.35\) and the communities detected by the experimented algorithms. We can observe that the result of our GCFM is closest to the ground truth.

5.4.2 Embedding visualization

Figure 11 visualizes the node representations and the feature similarity matrix learned by our GCFM for an instance of a 4-layer mLFR network with \(\mu =0.4\). We use t-SNE (Van der Maaten and Hinton 2008) to visualize the node representation \(\textbf{Z}\) at several training epochs. We can see that not only do the node representations become more evident as the epochs increase, but the relations between nodes also become clearer, that is, the community divisions become more separated in the projected space. On the right, the main diagonal blocks of the feature similarity matrix \(\mathbf {ZZ^{T}}\) show the community distribution corresponding to the result of the 25th training epoch.

6 Discussion and conclusion

In this paper, we have proposed GCFM for the task of community detection in multiplex networks. It contains a graph convolutional auto-encoder for encoding neighbor-aware intra-layer structural information, a multiscale fusion network for learning a holistic version of nodes' representations by fusing nodes' encodings at different layers and different scales, and a self-training mechanism to train the model and detect communities. Experiments on both synthetic and real-world datasets have validated the superiority of our GCFM over state-of-the-art algorithms. A notable feature of our model lies in its full consideration of different scales of local information.

We also observe some limitations that need further investigation. Due to the lack of prior knowledge, we treat each layer equally, while each network layer may contribute differently to the final detected communities in the real world. Therefore, how to weigh each layer in a multiplex network is important, and one possible approach is to introduce node attributes. Also, GCFM is mainly designed for undirected multiplex networks, while some real-world applications involve directed edges. In our future work, we would like to investigate community detection in multiplex networks with node attributes and edge directions.