Keywords

1 Introduction

Community detection is a multidisciplinary research area that is used to study the structural properties of complex networks. These structures of the network have a high-dimensional form of graph data, which requires a large number of trainable parameters. Deep learning techniques are employed to analyze the rich nonlinear structure of real-world networks. Spectral clustering techniques cannot scale for large networks because they can perform well only for small networks. Other traditional methods (as mentioned in Table 1) used for community detection, such as statistical inference, do not perform well on large networks having high-dimensional features. Such networks have high computational complexity in terms of both time and space. Deep learning models when used for community detection in networks provide improved performance over traditional techniques like spectral clustering and statistical interference.

Table 1 Taxonomy of traditional community detection methods

They learn the nonlinear structural properties of the network and represent the network in its low-dimensional form. Reference [7] presented the architecture of the convolutional neural network to mitigate the redundant information existing in the networks by sharing the weights of convolutional layers among residual blocks, thus showing a 45% reduction in network parameters and increased efficiency.

1.1 Community: Definition and Properties

Community: A set of communities \(c = c_1 ,c_2 ...,c_k\) denotes k communities partitioned in a network G (V, E). Informally, a community C is a subgraph of a network that consists of a collection of nodes V such that the number of edges inside the community is denser than the edges linking the vertices of C with other communities of the graph. Here, V is a set of vertices; E is set of edges, n = |V|, m = |E|, C = A subset of V, nc = |C|.

Intra Cluster density

$$\delta_{{\text{in}}} \left( C \right) = \frac{{\# {\text{internal}}\,{\text{edges}}\,{\text{of}}\,C}}{{n_c \left( {n_c - 1} \right)/2}}$$

Inter Cluster density

$$\begin{aligned} & \delta_{{\text{ext}}} \left( C \right) = \frac{{\# {\text{inter - cluster}}\,{\text{edges}}\,{\text{of}}\,C}}{{n_c \left( {n - n_c } \right)}} \\ & \partial_{{\text{ext}}} \left( C \right) \ll 2m/n\left( {n - 1} \right) \ll \partial_{{\text{in}}} \left( C \right) \\ \end{aligned}$$

Connectedness is an important property that maintains the connections between each pair of vertices in C. Community detection is delineated on sparse graphs only.

2 Categorization of Deep Learning Techniques

  1. A.

    Convolutional Neural Networks (CNNs)

It belongs to a specific category of feed-forward neural networks. Reference [8] addresses the problem of community detection in topologically incomplete networks. Their work proposed a deep CNN model that showed more robustness than classical supervised models, even in the case of missing edges in networks.

  1. B.

    Autoencoder-Based CD Approach

Autoencoders (AEs) are unsupervised models similar to spectral clustering frameworks using low-dimensional matrix reconstruction [9]. Several studies have been proposed to use variants of autoencoder models such as stacked autoencoder [10] and sparse AEs [11].

  1. C.

    Generative Adversarial Networks (GANs)

GANs are comprised of two competing neural networks with adversarial training to improve the discriminative ability when applied to community detection problem solves the overfitting challenge and resulting in fast-adjusting precision. Wang [12] introduced a low-dimensional vector space graph representation approach. Here, each vertex of the graph is represented as a low-dimensional vector space. A novel deep learning algorithm was proposed by the authors [13] to utilize graph representation learning techniques to solve overlapping community detection problems. Previous approaches focused only on communities having domain-specific rich topological information and failed in the networks having less structural information. An approach for cross-domain network representation was devised by the authors in [14].

  1. D.

    Deep NMF-Based CD Approaches

Non-negative factorization (NMF) [15] computation involves the factorization of a large matrix into two matrices having non-negative values. NMF approach follows for community detection tasks by decomposing the adjacency matrix of a network into the product of two matrices with non-negative elements. The error function is also minimized for further network partitioning tasks. NMF can be implemented in both overlapping and non-overlapping community detection tasks. Conventional NMF cannot capture all the sophisticated topological information for community detection. The deep learning-based NMF approach proposed for the multilayer learning strategy of complex data to uncover latent feature hierarchies using stacked NMF has shown an improved performance compared to that used for single-layered networks [16].

  1. E.

    Deep SF-Based CD Approach

Sparse filtering (SF) [17] is an effective feature learning algorithm that is known to handle high-dimensional graph data. It is an efficient two-layer learning model which is hyperparameter-free with only a single hyperparameter. It optimizes the cost function-sparsity of l2-normalized features and can scale easily to high-dimensional input data. Also, it is capable of learning significant features in multiple layers using stacked layering. In the discovery of communities, a sparse filtering algorithm is applied to extract the network features for further network partitioning tasks, resulting in meaningful community structures [18].

  1. F.

    Community Embedding-Based Approaches

The graph embedding approach focuses on the distribution of nodes present in communities in low-dimensional space. The approach embeds communities rather than specific nodes, which is a reverse approach. Community embedding is good for community detection as well as node categorization [19]. Reference [20] proposed a probabilistic generative model to learn representations of the social network by observing the information diffusion cascades instead of network structures. The proposed model learns community-preserving social network embeddings from social contagion logs. The focus is to discover social structures and predict information propagation in the network.

  1. G.

    Community Detection Based on Graph Neural Networks (GNN)

GNNs can model the complex relationships in graph-related data. GNN models are based on deep learning and graph mining techniques. The authors have proposed a model for detecting overlapping communities using graph neural networks (GNN) approach. This GNN-based model has proved to be more effective and robust than other existing approaches [21]. The authors have proposed a modified GNN framework that involves a line graph and a non-backtracking operator of the graph to analyze edge adjacency knowledge. The algorithms can be used for node-classification challenges apart from community detection tasks and have shown improvements in supervised community discovery problems [22].

2.1 Discussion

Table 2 summarizes the comparative analysis done based on the comprehensive survey of using deep neural networks (DNNs) for community detection. DNN is a new approach for solving social network analysis problems and is highly effective in graph and node representation in a network [23]. Previous approaches like stochastic block models and modularity maximization provide linear mapping to low-dimensional space. But mostly real-world networks have nonlinear structures so deep neural networks have proved to be effective in nonlinear representation [24].

Table 2 Comparative study of deep learning techniques on community detection

3 Challenges and Future Directions

The models and frameworks we have discussed above are the most recent strategies developed in the last few years to solve the community detection problem. From our study, we have encountered many challenging criteria that need to be further focused on, and the most efficient solutions must be provided. This section presents some of these challenges spotted in our study that may lead to a new future research direction.

Temporal Changes in Communities

As the network continuously reflects changes in user relations and topological information, dynamic community detection should be considered. The models with high computational power which can analyze the dynamic community detection and extract spatio-temporal characteristics of the social structures are in demand to be developed.

Meaningful Representation of Datasets

Generally, an enormous amount of data is generated by social networks, which is used as input datasets to predict communities. Deep learning techniques must use datasets in a meaningful format to predict the correct semantic representation of communities. Additionally, a better interpretation of different communities formed may help in fast information propagation.

No Prior Knowledge of the Number of Communities

According to [30, 31], random walks were performed to get preliminary communities and refine results by modularity. But in the case of disconnected networks, random walks cannot cover each node, thus degrading the performance of community detection algorithms.

Signed Networks

The impact of the type of relationship (positive or negative) on nodes is different. So, existing community detection approaches implemented on unsigned networks cannot be used for signed networks. So, to detect communities in signed networks the focus of the research has to be on representing negative ties. Deep learning strategies developed should be efficient enough to represent positive and negative ties in signed networks. Future work may cover the impact of signed edges.

Community Overlapping Detection

More efforts are needed to focus on the overlapping detection approaches as some of these strategies discussed in this survey have worked on overlapping community detection problems.

Efficient use of Computational Resources

Some of the developed algorithms require heavy computations, one such mentioned in where the processing of adjacency matrix to similarity matrix construction requires large computation resources. Thus, mechanisms for better use of computational resources should be developed for the new computation-specific strategies.

Comparative Analysis Intermediaries

There is a shortfall of straightforward comparative analysis techniques for the strategies we have studied so far. In this regard, the Network kit is the most widely used tool kit for large-scale network analysis tasks and has inbuilt algorithms already implemented by the researchers.

NLP Embeddings

The latest trend used is called random walks for node embeddings. These node embeddings help similar nodes remain close in their representations. So, further trends also include temporal graphs and ego networks.

4 Conclusion

In this survey paper, we analyzed the existing community detection techniques and current trends using deep learning approaches for community discovery tasks in various scenarios. As discussed in this review paper, deep learning models for community detection have emerged to be more robust, effective, efficient, and flexible to handle high-dimensional network data. However, there is a scope for more research work in future to be done on studying the overlapping community detection problem, need of optimized algorithms with less computational complexity, taking into account signed networks, meaning representation of datasets to predict correct number of communities, dynamic community detection as the network is undergoing changes continuously, etc. Finally, along with the taxonomy of traditional and deep learning methods, challenges and prospects for community detection have also been elaborated in this paper.