1 Introduction

A network is a collection of nodes that are connected by links, and this definition is applicable to many systems in today’s world, such as social networks, web networks, and biological networks. The analysis of these networks originated from the work of mathematicians on graphs and has since attracted the attention of other fields, owing to its applicability to a wide range of applications in different fields, including community detection, which is one of the most fundamental and classic problems. Most networks of interest display community structure, where the nodes are organized into groups, called communities, clusters, or modules (Fortunato and Hric 2016). Community detection involves dividing a network into a number of groups or modules, with each group representing a community. To be considered a community, a group or module must not be empty, its nodes must be part of the graph, each community must be different (i.e., the set of nodes in two different communities must not be identical), and the union of all communities must return the set of nodes in the graph (Kumar et al. 2020).

The objective of community detection is to identify hidden communities in a network based on information provided by the network, such as its topology or the characteristics of its nodes and edges (Guo et al. 2022). Community structures are critical to understanding not only the network topology but also how the network functions. Various methods have been proposed to detect community structures from different perspectives (Fortunato 2010).

In this work, we propose a novel approach for community detection, named PCMeans, which combines PageRank, hierarchical clustering, and k-means algorithms to tackle the community entire detection problem on the network. Our technique involves three phases: (1) identifying the most important nodes in the graph using Local PageRank, (2) partitioning the data points into groups based on their similarity using hierarchical clustering, and (3) refining the clusters using k-means clustering to obtain disjoint clusters from the overlapping clusters obtained in the second phase. While our approach employs Local PageRank, which operates locally on each node, the clustering itself is performed globally on the entire network. We adopt the initial number of communities and nodes, and this algorithm has low time complexity, simple operation, and easy implementation. The main objective of this approach is to select the initial nodes in the subgraph network with Local PageRank. Then, a hierarchical classification is conducted to create overlapping communities. Finally, the K-means algorithm is applied to the centers of these communities to find disjoint communities. According to the experimental results and comparative analysis of PCMeans with other state-of-the-art community detection algorithms, PCMeans generates consistently high-quality results for real-world and synthetic networks that are competitive with other algorithms in terms of accuracy. PCMeans is also more efficient than some other algorithms, as it addresses the problem of slow convergence and has low time complexity, simple operation, and easy implementation. Therefore, PCMeans is an effective and efficient method for community detection.

The rest of this paper is organized as follows: the related work is presented in Sect. 2, the proposed method in Sect. 3, the experimental results in Sect. 4, and finally, the concluding remarks in Sect. 5.

2 Related work

There are many methods proposed for detecting communities in social networks. Some of these methods allow for disjoint communities, and examples include:

The Newman–Girvan algorithm (Newman and Girvan 2004) is a method proposed for detecting community and sub-community structures in social networks. This algorithm involves iteratively removing edges from the network to split it into communities. The edges are identified using one of several possible “betweenness” measures, and these measures are recalculated after each removal. This process continues until the network is completely divided into communities. The algorithm has two definitive features: the iterative removal of edges and the recalculation of betweenness measures, which are crucial for its effectiveness. The algorithm has been shown to perform well in detecting community structures in a variety of real-world social networks.

The Louvain method (Blondel et al. 2008) is an iterative algorithm that aims to maximize the modularity of a network by moving nodes between communities. During the first phase, each node is initially assigned to its own community, and then the algorithm iteratively moves nodes between communities to maximize the modularity. In the second phase, the communities found in the first phase are collapsed into “super-nodes” and a new network is constructed, with the weights of the edges between the super-nodes representing the sum of the weights of the edges between nodes in the corresponding communities. The algorithm then starts again with the new network, repeating the two phases until no further improvement in modularity can be achieved.

K’-means (Žalik 2008): A modification of the K-means algorithm for non-overlapping community detection that allows the number of clusters to vary.

Vilcek (2014) proposed a new graph clustering approach for network community detection called the deep K-means algorithm. This algorithm is based entirely on K-means clustering and aims to improve upon traditional spectral clustering. Instead of using eigenvector decomposition in spectral clustering, Vilcek proposed using a multilayer autoencoder pipeline implemented with recursive K-means clustering. This approach was shown to outperform traditional spectral clustering in terms of accuracy, as measured by normalized mutual information. However, the algorithm has a disadvantage in that its execution time increases faster than spectral clustering as the size of the dataset increases. Additionally, the algorithm requires prior knowledge of the number of communities to be found, which is not always practical. Incorporating a technique such as modularity function optimization to automatically choose the number of communities would be an interesting future direction for this approach.

CLPSO-DE (Pourkazemi and Keyvanpour 2017): A hybrid algorithm that combines particle swarm optimization and differential evolution for community detection.

Frequent Pattern (Moosavi et al. 2017): A frequent pattern mining-based approach for community detection that extracts frequent subgraphs as communities.

Cai et al. (2019) proposed that DDJKM (Density, Degree, Jaccard, and K-means) algorithm is a clustering-based overlapping community detection method proposed in 2019. The algorithm uses the uncertainty of nodes, calculated based on their density, degree, and similarity, to describe their membership to different communities. K-means clustering is then applied to the uncertainty matrix to identify overlapping communities. The algorithm has shown promising accuracy and efficiency in various real-world networks.

CMCM (Geng et al. 2019): A multi-objective optimization-based algorithm that simultaneously optimizes both the modularity and the conductance for community detection.

Sheng et al. (2020) in 2020 proposed a new community detection method called IACD (Inter-node Attraction Community Detection), which is based on the attraction of internal nodes. The algorithm consists of three main steps: evaluating node importance, selecting pairs of attractive nodes, and dividing the community. First, they calculate the node influence (IF) and global influence (GIF) of each node, and then identify the most attractive node for each node to form pairs of attractive nodes. Finally, the algorithm creates a two-dimensional array to output the community division result. IACD considers the importance of nodes in the network, and uses a physics-inspired approach to represent the forces between nodes.

Hajij et al. (2020) proposed a novel approach to finding initial centers for the K-means clustering algorithm by using the PageRank vector as a centrality measure. This method was found to be efficient and provided several key benefits. One such advantage is that the PageRank vector can be calculated for both direct and indirect graphs. Additionally, since the PageRank vector was designed to be computed on large graphs, it offers improved speed. Finally, this method can be applied to other domains by easily generalizing to metric spaces.

Yuan et al. (2020) developed an influence maximization algorithm for social networks that selects the most influential node as the initial active node and assigns it the maximum number of nodes. They proposed the edge betweenness algorithm, based on community detection, to maximize node influence. The algorithm uses the K-means algorithm to divide the community and selects the optimal community segmentation result based on modularity. It then calculates the edge betweenness of each community and selects important nodes to form the set of starting nodes for the influence maximization algorithm. Finally, the independent cascade model is used to simulate the propagation of influence and maximize its effect.

Li et al. (2021) proposed an improved algorithm, called LPA-MNI (label propagation algorithm based on modularity and node importance), for detecting community structure in complex networks. The LPA-MNI algorithm aims to address the randomness inherent in the original LPA algorithm by combining modularity and node importance. Initially, LPA-MNI uses modularity optimization procedures to identify initial communities, and all nodes within a community are assigned the same label. In iterative label propagation processes, LPA-MNI updates labels in a decreasing order of node importance. When multiple labels are assigned to the same maximum number of nodes, LPA-MNI calculates the importance of each node and selects the label of the most influential node to update. Overall, the LPA-MNI algorithm improves upon the original LPA algorithm by leveraging modularity and node importance to more accurately detect community structure in complex networks.

Chaudhary and Singh (2021) concluded that unsupervised machine learning techniques can be used for community detection in COVID-19 datasets, which can help in understanding the spread of the virus and identifying potential transmission hotspots. They also noted that further research is needed to improve the accuracy of community detection algorithms and to account for factors such as the time-varying nature of social interactions during the pandemic.

FPPM (Wu et al. 2021): A frequent pattern-based approach for community detection that uses a pattern mining algorithm to extract communities.

Akbar et al. (2021) suggested that the use of modularity maximization for community detection in social networks can provide valuable insights for businesses and help them make more informed decisions.

Entropy gap (Liu et al. 2022): An algorithm that uses the concept of entropy to detect communities by identifying the gap between the entropy of the original network and that of a null model.

CDBNE (Zhou et al. 2023): presented a community detection algorithm based on unsupervised attributed network integration (CDBNE) to solve problems. They propose a framework that simultaneously learns representation based on network structure and attribute information and clustering-oriented representation.

There are also many community detection algorithms proposed by various researchers, allowing for overlapping communities. Some of the algorithms are:

The clique percolation method (CPM) is a community detection algorithm that was proposed by Palla et al. (2005). It is unique in that it allows for overlapping communities to be detected in a network. The algorithm works by finding all k-cliques in the network and creating a clique graph where each k-clique is represented by a node. Overlapping communities are then identified as sets of nodes that are connected by k-cliques. The size of the communities detected is determined by the value of k, with larger values detecting smaller and more tightly knit communities. The CPM algorithm has been effective in identifying overlapping communities in various types of networks, including social, biological, and technological networks.

The Infomap algorithm (Rosvall and Bergstrom 2008) is a popular method for detecting communities in networks proposed in 2008. It uses a random walk approach to identify communities based on information theory principles. The basic idea is to treat the network as a flow of information and to try to identify communities that represent clusters of nodes with high information flow within the cluster, but low information flow between clusters.

RCE (Musdar and Azhari 2015): A randomized clustering-based algorithm for overlapping community detection that uses a random walk-based approach.

LED (Ma et al. 2016) uses a community detection algorithm called the label propagation algorithm (LPA), proposed in 2016, which assigns each vertex to one or more communities based on the network weights and the current community assignments of its neighbors. The algorithm iterates between computing the network weights and applying LPA until convergence is reached.

EMc and PGDc (Van Laarhoven and Marchiori 2016): Two algorithms based on the expectation maximization algorithm and the projected gradient descent method, respectively, for overlapping community detection.

WalkSCAN is a popular algorithm for detecting overlapping communities. It was proposed by Tong and Cao in 2017 (Hollocou et al. 2016). The algorithm is based on a probabilistic generative model that uses a combination of Bayesian and expectation maximization techniques to identify overlapping communities. The idea behind the algorithm is to model the network as a random walk process, where each node represents a state in the process, and the edges represent the probabilities of transitioning from one state to another.

Kumar et al. (2020) proposed a clustering-based overlapping community detection method called NSGA-II that uses a multi-objective optimization approach to balance the trade-off between maximizing intra-community density and minimizing inter-community density. The algorithm starts by randomly selecting initial individuals, representing communities, and evaluating their fitness based on the two objectives. It then generates new individuals using a genetic algorithm-based approach to potentially improve fitness values. The process repeats iteratively until a set of non-dominated solutions, representing the overlapping communities in the network, are identified.

LCDNN (Luo et al. 2020): A deep neural network-based algorithm for overlapping community detection that uses both local and global information.

LGIEM (Ma et al. 2020): A likelihood-based algorithm for overlapping community detection that uses a generative model to describe the community structure.

While overlapping community detection methods offer certain advantages, they may not always be the best choice for detecting complex community structures or handling noisy data. In this article, we propose a new method for detecting communities in social networks. The method is primarily based on identifying important nodes and creates a set of overlapping communities that subsequently converge to a set of disjoint communities.

3 PCMeans: a novel community detection algorithm

In this paper, we proposed a novel approach for community detection called PCMeans algorithm, it is a novel approach for detecting communities in a graph. It consists of three stages: detect influenced nodes by calculate of the Local PageRank, overlapping hierarchical clustering, and K-means clustering. These three phases are successive.

In the first stage, the algorithm calculates the Local PageRank score for each node in the graph and sorts them in descending order.

In the second stage of the PCMeans algorithm, communities are refined by grouping overlapping ones. This involves starting with the node that has the highest Local PageRank score and its neighbors, and adding them to a new class. This process continues until all nodes in the graph have been assigned to at least one classes. Then, the similarity between pairs of communities is calculated based on the number of common nodes they share. Communities that have more than 50% of their nodes in common are merged. This step is repeated until no further merging is possible.

In the third and final stage, PCMeans applies K-means clustering to the communities identified in the second stage. The algorithm uses the centers of these communities as the initial centroids for K-means clustering. The number of clusters is set to be equal to the number of communities identified in the second stage. We use k-means clustering to obtain disjoint clusters from the overlapping clusters obtained in the second phase. This addresses the issue of overlapping data points in the clusters.

Figure 1 illustrates the flowchart of the PCMeans algorithm.

Fig. 1
figure 1

Framework of PCMeans algorithm

The PCMeans algorithm presented globally in Algorithm 1; thereafter, it will be detailed in the following sessions.

figure a

Overall, the PCMeans algorithm for community detection and can be used to analyze social networks, identify key players in a network, and understand the structure of complex systems.

The steps involved in PCMeans algorithm are:

  1. 1.

    Influenced nodes by Local PageRank Stage: The first stage of PCMeans involves community detection, which identifies influential nodes in a graph.

    1. (a)

      Calculate the PageRank score for each node in the subgraph

    2. (b)

      Sort the nodes in decreasing order of their score.

  2. 2.

    Overlapping hierarchical clustering Stage: The second stage of PCMeans involves refining the nodes found in the first stage by order for grouping overlapping communities. The steps involved in this stage are:

    1. (a)

      Take the node with the highest Local PageRank score, and add it and its neighbors to a new cluster.

    2. (b)

      Continue adding nodes to clusters until all nodes in the graph have been assigned to at least one cluster.

    3. (c)

      Calculate the similarity between pairs of clusters based on the number of common nodes between them.

    4. (d)

      If the number of common nodes between two clusters is greater than \(50\%\) of the size of the smaller cluster, then the two clusters are merged.

    5. (e)

      Repeat the previous step until no further merging is possible.

  3. 3.

    K-means clustering stage: The final stage of PCMeans involves applying K-means clustering to the clusters identified in the first two stages. The steps involved in this stage are:

    1. (a)

      Use the centers of the clusters identified in the second stages as the initial centroids for K-means clustering.

    2. (b)

      Set the number of communities to be equal to the number of clusters identified in the last stages.

    3. (c)

      Run K-means clustering on the communities to group nodes within the same community together

3.1 Stage 01: influenced nodes by local PageRank

PCMeans is a community detection algorithm that begins by calculating the Local PageRank (Brin and Page 1998; Bar-Yossef and Mashiach 2008) for each node in the graph. Local PageRank is a measure of the importance of a node within its local neighborhood, which is defined as the set of nodes that are reachable within two hops from the node using a modified breadth-first search algorithm with h = 2. This algorithm creates a subgraph that includes only the vertices and edges that are within two hops of the starting vertex, which can be useful for analyzing the local structure of a larger graph. We use Local PageRank to calculate the importance of nodes within this subgraph.

However, it is important to note that PCMeans is a global community detection method, which means that it takes into account the entire graph, not just the local subgraph. While Local PageRank provides a useful measure of node importance within a specific neighborhood, it is not the only factor taken into consideration by PCMeans. The algorithm also considers the edge weights between nodes and their corresponding cluster assignments to determine the final community structure of the graph.

The nodes in the graph are then sorted based on their Local PageRank values, from highest to lowest. Formula 1 for Local PageRank is defined as:

$$\begin{aligned} \text {PR}(v_i) = \frac{1-d}{N}+d{\sum\nolimits _{v_j\in M(v_i)} \frac{\text {PR}(v_j)}{L(v_j)}} \end{aligned}$$
(1)

where d is the damping factor, \( v_{i}\) and \( v_{j}\) are the nodes under consideration in G’, \( M(v_i)\) is the set of nodes that have a link with \(v_i\) where \(v_i\) in G’, \( L(v_i)\) is the number of links in subgraph G’, and N is the total number of nodes of G’.

All nodes in the network are arranged in descending order based on their Local PageRank values computed using Formula 1. In cases where nodes have the same Local PageRank, these nodes are arranged in ascending order of node number. We store the nodes with their Local PageRank, establishing a ranked list of nodes based on their importance within their local neighborhood.

Algorithm 2 presents this step.

figure b

3.2 Stage 02: overlapping clustering hierarchical

This stage presents an overlapping hierarchical clustering algorithm. The algorithm is based on selecting the node with the highest Local PageRank score found in the previous stage, and adding it and its neighbors to a community. The similarity between every pair of communities is then computed using a measure based on the percentage of common nodes, and two communities are grouped if their similarity exceeds a threshold of 50% compared to the smallest one. The algorithm repeats this process, gradually reducing the level of overlap between communities until no more grouping is possible.

In this paper, we use a measure of similarity based on the percentage of common nodes of two communities. This similarity measure is non-commutative. The formula for the similarity measure is presented as follows:

$$\begin{aligned} \text {Similarity}(C_i)= & {} \frac{|C_i\cap C_j |}{ |C_i|} \end{aligned}$$
(2)
$$\begin{aligned} \text {Similarity}(C_j)= & {} \frac{|C_i\cap C_j |}{ |C_j |} \end{aligned}$$
(3)

here \(C_i\) and \( C_j\) are communities detected in the graph G; \(|C_i |\) and \(|C_j |\) are the number of nodes in \(C_i\) and \(C_j\), respectively.

What interests us in these two measures is the maximum between them, we called SimMax. The formula for SimMax is presented as follows:

$$\begin{aligned} \text {SimMax}(C_i, C_j) = \max (\text {Similarity}(C_i), \text {Similarity}(C_j)) \end{aligned}$$
(4)

An advantage of this approach is its ability to assign nodes to multiple communities, reflecting the intricate and interconnected nature of real-world networks. The hierarchical arrangement of the resultant groupings can offer valuable understanding of the connections between communities at different levels of detail. However, we believe that using disjoint communities with inclusion of nodes would provide more precise results. Therefore, we plan to utilize the number and centers of these communities in the next stage.

The algorithm for this stage is presented as Algorithm 3:

figure c

3.3 Stage 03: K-means clustering

K-means clustering is a widely used method for partitioning a set of data points into K clusters. In the context of community detection, K-means can be used to group nodes in a network into K communities based on their similarity or distance. It is important to note that the quality of the community detection heavily depends on the choice of K, which is often determined by trial and error or by using external validation measures. Additionally, K-means can be sensitive to the initial choice of centroids and can sometimes converge to suboptimal solutions. Therefore, it is recommended to run the algorithm multiple times with different initializations and choose the best solution based on some criteria such as the minimum sum of squared distances between nodes and their assigned centroids.

In PCMeans, the number k of clusters to be equal to the number of communities identified in the previous stage and the centers of this communities are the initial centroids. The resulting K clusters generate the final community structure. We employ K-means clustering as a post-processing step after hierarchical clustering. Since K-means is faster and can partition data points into non-overlapping groups, we apply it to the cluster centers obtained from hierarchical clustering. The steps involved in the K-means clustering algorithm are present in algorithm 4:

figure d

3.4 The time complexity

The time complexity of the PCMeans algorithm depends on the time complexity of the individual steps involved. The time complexity of the first stage is the time for calculating the Local PageRank score for each node is T1. The time complexity of overlapping hierarchical clustering is T2. The time complexity of running K-means clustering on the communities to group nodes into K non-overlapping clusters is T3.

where T1 = Time complexity( first stage ) = O(n log n). T2 = Time complexity(second stage ) = O(\(K^2\) n log n). T3 = Time complexity( third stage )= O(it * K * n ).

where n: The number of nodes in the network. It: The number of iterations required for convergence, K: The number of communities.

Overall, the time complexity of PCMeans algorithm can be expressed as:

$$\begin{aligned} O(n log n + K^2 n log n + it * K * n). \end{aligned}$$
(5)

The time complexity of an algorithm is an important factor to consider when evaluating its performance. PCMeans has low time complexity compared to some other community detection algorithms. This is mainly due to the use of Local PageRank to identify the most influential nodes, which reduces the search space and computation time. Additionally, the overlapping hierarchical clustering strategy used in PCMeans can reduce the number of iterations required to converge to the final community structure, as it can quickly identify the optimal number of clusters. Finally, the K-means algorithm used in PCMeans is known for its rapid convergence and effective classification in large-scale datasets, which further contributes to the algorithm’s low time complexity. Overall, the low time complexity of PCMeans makes it a practical and efficient method for community detection in large networks.

4 Experiments

PCMeans algorithm as an effective tool for community detection in complex networks. It has been compared to other state-of-the-art algorithms (Louvain algorithm (Blondel et al. 2008), Girven-Newman (Newman and Girvan 2004), INFOMAP Fluid Communities algorithm (Parés et al. 2017), and label propagation algorithm (LPA) (Raghavan et al. 2007)).and found to be competitive in terms of accuracy and efficiency. However, the effectiveness of the algorithm may depend on the specific characteristics of the network being analyzed and the parameter settings used in the algorithm. Therefore, it is recommended to experiment with different parameter values and compare the results to other algorithms to determine the effectiveness of PCMeans in a given application. The effectiveness of the PCMeans algorithm was evaluated by conducting experiments on both synthetic and real social networks using Python implementation. The experiments were performed on a computer with an Intel Core i7 processor and 16GB of RAM. The approach was evaluated using four real network datasets and the Lancichinetti Fortunato Radicchi benchmark network.

4.1 Evaluation measures

Modularity (Q) Newman and Girvan (2004) A metric for measuring the quality of communities in a network. The higher the value of Q (closer to 1), the better the result.

Normalized mutual information (NMI) Danon et al. (2005) and Li et al. (2021) A performance measure based on information theory that compares the true partition of a network with the partition obtained by an experimental community detection algorithm. The higher the value of NMI, the better the performance of the algorithm.

Adjusted Rand Index (ARI) A measure of similarity between two partitions that corrects for chance agreement between the partitions. The ARI ranges from \(-1\) to 1, with 1 indicating perfect agreement, 0 indicating agreement no better than chance, and \(-1 \) indicating complete disagreement.

These measures are commonly used to evaluate the performance of PCMeans community detection algorithms.

4.2 Experimental results and analysis in networks

To validate the performance of the proposed algorithm, we used four real-world networks: Zachary Karate (Zachary 1977), Dolphin (Lusseau et al. 2003), Political Books (Krebs 2008), and Football (Jiang and McQuay 2012), along with five artificial networks from the LFR benchmark.

4.2.1 Experiments on real-world networks

Calculate the modularity Q, NMI, and ARI for different algorithms. Each result is an average of 30 repetitions, with the best result highlighted. The descriptions of these networks and all results can be found in Table 1.

Table 1 Modularity and NMI of different algorithms with real networks

Results of this table are presented in Fig. 2a–c for modularity, NMI, and ARI by order.

Moreover, while several algorithms exhibit unstable results, our algorithm remains stable. In summary, PCMeans produces similar partitions as compared to the other algorithms, while maintaining stability in its results.

Fig. 2
figure 2

Comparison of performance metrics between PCMeans and other algorithms

The proposed algorithm achieves remarkable results on various real-world networks. On Zachary’s karate club network, the algorithm converges to the global optimal, as indicated by the NMI score of 1. This means that the communities identified by the algorithm are identical to the actual communities. Similar excellent results are also observed on the Football club network, the Political book network, and the Dolphin network, with average NMIs of 0.9160, 0.6750, and 0.8140, respectively. The proposed algorithm outperforms all other experimental algorithms in terms of NMI and ARI indicators on all real-world networks. It also achieves the best ARI score for all experiments. Furthermore, on the Political book network, the algorithm identifies the second-best modularity score of 0.5191, which is comparable to the performance of INFOMAP. Despite not achieving the best modularity score, the NMI and ARI scores demonstrate that our algorithm’s performance is very close to the real results. This highlights the reliability and effectiveness of the proposed algorithm in identifying community structures in real-world networks.

4.2.2 Experiments on generated network “LFR benchmark”

To assess the accuracy of community detection algorithms, synthetic networks are often utilized due to the ability to control the network’s properties and obtain the corresponding real community structure through tunable parameters. The LFR benchmark network, developed by Lancichinetti, Fortunato, and Radicchi, is a widely adopted synthetic network model that exhibits similar properties to real-world networks. Our study focuses on LFR benchmark networks with 250 to 3000 nodes and average degree between 5 and 10. The degree distribution and community size distribution exponents are set to t1 = 1.5 and t2 = 3, respectively. Additionally, we consider two ranges of community sizes: s = (10:50) and b = (20:100), and maximum degrees of 20 or 50. The mixing parameter nu, which represents the expected fraction of links through which a node connects to other nodes in the same community, is set to 0.1 or 0.3. We evaluate the performance of different algorithms on all artificial networks using modularity, NMI, and ARI metrics.

All the results are present in Fig. 3a for modularity, in Fig. 3b for NMI and in Fig. 3c for ARI.

Fig. 3
figure 3

Comparison of performance metrics in artificial networks

PCMeans achieved the best NMI and ARI results on LFR artificial networks, indicating its superior performance compared to other algorithms. Although its modularity results were not the best, they were still satisfactory

5 Conclusion

In this study, we introduced the concept of Local PageRank through a community detection algorithm that combines K-means clustering and overlapping hierarchical clustering. Our proposed PCMeans algorithm leverages hierarchical clustering to study the similarity between nodes and reduces the number of iterations required for community splitting, resulting in improved efficiency. By initializing the population according to Local PageRank and using the proposed neighbor-based clustering operator, PCMeans outperforms traditional approaches with random initialization, as evidenced by our experiments on both real-world and synthetic networks. Specifically, our method achieves higher NMI and ARI values, indicating that it produces community structures that closely resemble real community structures, and the results are stable. In the future, we plan to explore PCMeans for use in very large dynamic complex networks.