A Comparative Study of Community Detection Techniques for Large Evolving Graphs

Coppens, Lauranne; De Venter, Jonathan; Mitrović, Sandra; De Weerdt, Jochen

doi:10.1007/978-3-030-43823-4_31

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1167))

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

2061 Accesses

Abstract

Community detection has recently received increased attention due to its wide range of applications in many fields. While at first most techniques were focused on discovering communities in static networks, lately the focus has shifted toward evolving networks because of their high relevance in real-life problems. Given the increasing number of the methods being proposed, this paper explores the current availability of empirical comparative studies of dynamic methods and also provides its own qualitative and quantitative comparison with the aim of gaining more insight in the performance of available methods. The results show that no single best performing community detection technique exists, but rather, the choice of the method depends on the objective and dataset characteristics.

You have full access to this open access chapter, Download conference paper PDF

Overlapping Community Detection in Static and Dynamic Networks: A Qualitative Assessment

Dynamic community detection in evolving networks using locality modularity optimization

Article 24 March 2016

Community detection in dynamic networks: a comprehensive and comparative review using external and internal criteria

Article 08 January 2021

Keywords

1 Introduction

Community detection techniques in complex networks are a well-covered topic in academic literature nowadays as identifying meaningful substructures in complex networks has numerous applications in a vast variety of fields ranging from biology, mathematics, and computer science to finance, economics and sociology. A majority of the literature covers static community detection algorithms, i.e. algorithms used to uncover communities in static networks. However, real-world networks often possess temporal properties as nodes and edges can appear and disappear, potentially resulting in a changed community structure. Consequently, researchers have recently taken a keen interest in community detection algorithms that can tackle dynamic networks. Given the increasing number of the methods being proposed, a systematic comparison of both their algorithmic and performance differences is required so as to be able to select a suitable method for a particular community discovery problem. Nonetheless, newly proposed community detection methods for dynamic graphs are typically compared with only very few methods in settings aiming to demonstrate superiority of the proposed method. Consequently, the setup and results of these comparisons might contain an unconscious bias towards one’s own algorithm. As such, a well-founded and extensive comparative analysis of dynamic community detection (DCD) techniques is missing in the current literature. This is not surprising given the many different aspects which come into play when comparing DCD methods: different underlying network models, different community definitions, different temporal segments used for detecting communities, different community evolution events tracked etc.

To bridge this literature gap, we perform a qualitative and quantitative comparison of DCD techniques. To this end, we adopt the classification system of Cazabet and Rossetti [1] to provide a concise framework within which the comparison is framed. For our qualitative comparison, we focus on relevant community detection characteristics like community definition used, ability to detect different type of communities and community evolution events as well as computational complexity. For quantitative analysis we report computational time and partition quality in terms of NF1 statistics, on 900 synthetic RDyn [7] and one real-world DBLP dataset [25]. Results showcase that no single best performing community technique exists. Instead, the choice of the method should adapt to the dataset and the final objective.

2 Methodology for Unbiased Comparison of Dynamic Community Detection Methods

In this section we provide details of our comparative study which basically consists of three parts: first, shortlisting candidate algorithms to be compared, second, analyzing their algorithmic characteristics and, third, performing the empirical analysis.

2.1 Algorithm Selection

Given the soundness and completeness of Cazabet and Rossetti’s classification framework [1], we opt for using this framework as a steering wheel in the process of method selection. Within this framework, three large types of dynamic algorithms for searching communities are distinguished: (1) those that only consider the current state of the network (instant-optimal); (2) those that only consider past and present clustering and past instances of the network topology (temporal trade-off); (3) those that consider the entire network evolution available in the data, both past and future clustering (cross-time).

In an applied setting, neglecting previous states of the network oftentimes leads to sub-optimal solutions. Additionally, it is realistic to assume that communities will be updated using data that become available periodically. Consequently, no future information will be available at time t. With this in mind, we opt for focusing on temporal trade-off algorithms. Within Cazabet and Rossetti’s framework, these are further subdivided into four categories: Global Optimization (denoted originally as 2.1), Rule-Based Updates (2.2), Multi-Objective Optimization (2.3) and Network Smoothing (2.4). For each subcategory, at least one representative is chosen. Additionally, the list of compared algorithms is complemented by more recently published techniques, which, in turn, are also classified in the four previously mentioned categories.

Moreover, three characteristics are instrumental in the selection. Firstly, the algorithm has to be able to detect communities in evolving graphs. Secondly, the algorithm would preferably be able to detect overlapping communities to ensure a realistic partitioning in social network problems. Thirdly, the capability of extracting community evolution events is a desired trait with the goal of having realistic partitions that incorporate as much available information as possible in the partitioning process. Finally, some algorithms will be included as benchmark algorithms in order to compare results with previously performed comparative analyses.

2.2 Qualitative Analysis

The qualitative analysis is based on the comparison of algorithmic characteristics. In particular, comparison is performed with respect to the following six questions:

1.
How does the method search for communities? In other words, which of the categories within the framework of Cazabet and Rossetti does it fit in (if any)?
2.
What community definition is adopted (modularity, density, conductance...)?
3.
How efficient the method is? That is, what is its computational complexity?
4.
Which community evolution events can the method track (birth, death, merge, split, growth, contraction, continuation, resurgence)?
5.
Can the method find overlapping communities?
6.
Can the method find hierarchical communities?

2.3 Empirical Analysis Setup

Given their different characteristics, to provide a fair comparison, selected DCD methods are benchmarked based on both synthetic and real-life datasets. As synthetic datasets, 9 different RDyn graphs [7] were created by varying the number of nodes to 1000, 2000 and 4000 and the communities size distribution parameter \(\alpha \) to be 2.5, 3 and 3.5. Larger \(\alpha \) makes the sizes of communities relatively larger, more dispersed, while smaller makes the differences between community sizes smaller, more uniform. The rate of node appearance and vanishing is fixed to 0.05 and 0.02 respectively. The appearance rate is slightly larger than the vanishing rate in an attempt to mimic a slowly growing graph which could resemble, for instance, a customer base where customers enter, remain for a (long) while, and churn. For each of the 9 different RDyn graphs, 100 RDyn instances are created, yielding 900 graphs in total. The specific number 100 was arbitrarily chosen but is used to account for variations in the results.

As for the real-life dataset, the co-authorship graph from [25] was used. This dataset was originally extracted from DBLP database and for purposes of this analysis, was further limited to data from 1971 to 2002. Resulting dataset has 850 875 nodes, which represent 303 628 unique authors, and 1 656 873 edges.

To measure the relative performance of the different algorithms, two metrics were chosen. On the one side, the quality of the partition is measured by Normalized F1-statistics (NF1) and on the other side, the efficiency of the algorithm is reported in terms of the computation time.

3 Results

In this section we provide results of each of the three phases of our comparative analysis: algorithm selection, qualitative and quantitative analysis.

3.1 Algorithm Selection

For the broad selection, the initial list of 51 papers on DCD methods was used [6, 13,14,15,16,17,18, 20,21,22,23, 27,28,65]. It was obtained by supplementing 32 temporal trade-off algorithms [6, 13,14,15, 17, 21, 22, 24, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] from [1] with 19 algorithms not included in the aforementioned survey [16, 18, 20, 23, 51,52,53,54,55,56,57,58,59,60,61,62,63,64,65] that nonetheless possess interesting characteristics with regards to community and evolution extraction. Figure 1 illustrates the relevance of adding those 19 papers as it ensures the inclusion of more recent methods.

After this list of algorithms was compiled, the three algorithm-specific characteristics mentioned before were compared in order to select the approximately ten most promising algorithms that will be compared qualitatively and empirically. Following the analysis mentioned above, 13 algorithms were selected to be compared, as follows.

Partition Update by Global Optimization (2.1). This category contains algorithms that incrementally update and find communities by globally optimizing a metric such as modularity, density or other utility functions. Two methods represent this category in the analysis. Firstly, D-GT is a game-theory based algorithm proposed in [13] for dynamic social networks. The technique considers nodes as rational agents, maximizes a utility function and finds the optimal structure when a Nash equilibrium is reached. Secondly, Updated BGL is a modularity based incremental algorithm designed by [14]. It is more time-efficient than its modularity-based peers that do not rely on community updating. Both D-GT and Updated BGL are capable of tracking community evolution events.

Partition Update by Set of Rules (2.2). This category seems to be the most promising in terms of efficiency and accuracy for algorithms that take into account past information. The algorithms belonging to this category all consider a list of historic network changes in order to update the network’s partitioning. AFOCS is an algorithm designed for performing well in mobile networks, such as online social networks, wireless sensor networks and Mobile Ad Hoc Networks [15]. The technique is able to uncover overlapping communities in an efficient way by incrementally updating the communities based on past information. It avoids the recalculation of communities at each time step, by identifying community evolution events based on four network changes, namely the appearance of a new node or edge and the removal of an edge or node. The algorithm applies different rules on how to update communities depending on which events occur. HOCTracker is a technique designed to detect hierarchical and overlapping communities in online social networks [16]. The approach detects community evolution by comparing significant evolutionary changes between consecutive time steps, reducing the number of operations to be performed by the algorithm. The algorithm identifies active nodes, which are nodes that (dis)appear or are linked to an edge that (dis)appears, and compares those nodes’ neighborhoods with their previous time step to reassign nodes to new communities if necessary. TILES, is an online algorithm that identifies overlapping communities by iteratively recomputing a node’s community membership in case of a new interaction [17]. The approach is capable of singling out community evolution events such as birth, death, merge, split, growth and contraction. OLCPM is an online, deterministic and DCD method based on clique percolation and label propagation [18]. OLCPM, unlike CPM (Clique Percolation Method) [19], works by updating communities by looking at some predefined events resulting in improved computation times. OLCPM is able to detect overlapping communities in temporal networks. Finally, DOCET [20] incrementally updates overlapping dynamic communities after it finds an initial community structure. It can track community evolution events.

Informed CD by Multi-objective Optimization (2.3). The two previous categories updated partitions by looking at past communities. Informed community detection algorithms, on the other hand, calculate the communities from scratch in each time step. The algorithm tries to balance partition quality and temporal partition coherence or in other words, the current network structure and past partitions. A disadvantage of these kinds of approaches is the computational power necessary to execute the algorithm. An advantage is its temporal independence, potentially resulting in more stable outcomes. In informed community detection by multi-objective optimization, the partition at time t is detected by optimizing a certain metric, e.g. modularity, density. Two algorithms will represent this category in the evaluation. FacetNet was a pioneer in detecting communities in an unified process, in contrast with a two-step approach, where evolution events can be uncovered together with the partitioning [6]. Consequently, FacetNet is used as a benchmark approach in many papers introducing algorithms with similar capabilities. The approach finds communities based on non-negative matrix factorization and iteratively updates the network structure to balance the current partitioning fit and historical cost function. A disadvantage of the technique is that the number of communities is fixed and should be determined by the user. DYNMOGA [21], unlike FacetNet, balances the current partitioning fit and cost function simultaneously and, therefore, does not need a preference parameter with regard to maximizing partition quality and minimizing the historical cost or clustering drift. It optimizes a multi-objective problem and automatically determines the optimal trade-off between cluster accuracy and clustering drift. Neither FacetNet or DYNMOGA are capable of detecting overlapping communities.

Informed CD by Network Smoothing (2.4). ECSD proposed by [22] is a particle-and-density based evolutionary clustering method that is capable of determining the number of communities itself. The method detects the network’s structure and evolutionary events by trading off historic quality and snapshot quality, similar to the previous subcategory. The difference, however, is that ECSD finds its clusters by temporally smoothing data instead of results.

Other Benchmarks. Within this final category of methods introduced for comparison purposes we consider two algorithms: DEMON and iLCD. DEMON introduced in [23] (and extended in [26]) is a technique that is able to hierarchically detect overlapping communities but cannot, unlike all previous methods, identify community evolution events. iLCD [24], in previous empirical comparisons, is repeatedly shown to perform worse in terms of partition quality and computation speed with regards to other tested algorithms (e.g. FacetNet). It will be interesting to evaluate or verify the relative performance of these methods.

3.2 Qualitative Results

The first aspect that stands out is the larger presence of algorithms that update communities by a set of rules (2.2) not only in our final selection, but, likewise, among the more recently proposed methods, such as AFOCS, HOCTracker, OLCPM and DOCET which are also more focused on performing in dynamic social environments.

The second aspect that attracted our attention was the fact that nearly none of the analyzed algorithms focused in particular on the detection of hierarchical communities. Moreover, even though it was expected that hierarchy would be a relevant factor, it was generally not even mentioned whether an algorithm was capable of detecting hierarchical communities. On the other hand, detecting overlapping communities in social networks was oftentimes considered as necessity in current literature.

Thirdly, it is striking that all algorithms from the categories that optimize an objective function use modularity as community definition. Algorithms that do not optimize an objective function sometimes still utilize a metric as a guide to search for communities, but operate by exploiting other characteristics of the network topology, such as the frequency of node neighbors by labeling nodes, done by label propagation [23].

An overview of previously discussed characteristics for selected methods can be found in Table 1. In the last column of Table 1, time complexity per each method, as provided by its introductory study, is presented. It can be seen that the required resources needed for running Extended BGL, LabelRankT, ECSD and iLCD grow linearly with the number edges, making these algorithms the most efficient ones. Next, FacetNet’s computation time grows proportional with its number of edges and communities, DYNMOGA scales log-linearly with the number of nodes and DEMON is dependent on the number of nodes and the maximum degree. Finally, TILES, AFOCS, HOCTracker and DOCET appear to be the most complex algorithms as their computation time is expected to grow quadratically with the number of nodes, which is particularly problematic for large graphs. It cannot be derived whether the complexity is closely related to the category the algorithm belongs to Category 2.2., however, seems to be most complex.

In the context of social networks (and not only these), the knowledge of what types of community evolution events occur at which moments in time can be valuable information in order to understand what is happening with the network structure over time. Currently, the literature recognizes eight community evolution events, namely birth, death, merge, split, growth, contraction, continue and resurgence of a community, although, obviously, not every method is able to detect all of them. Therefore, for each of the methods which do support community evolution tracking (see column “Evolution” in Table 1), it is worth investigating further which in particular event(s) the tracking refers to. As can be seen from Table 2, several remarks can be made along these lines.

Table 1. Comparison of dynamic methods based on observed characteristics and their time complexity (last column). “CS Type” stands for category in Cazabet and Rossetti’s framework. A “-” denotes that methods do not search communities by optimizing a metric, but operate by exploiting characteristics of network topology. Notation used for time complexity: n, m, c, g - number of nodes, edges, communities and generations respectively, K - maximum degree, \(\alpha \) - degree distribution parameter, \(RQ_{ttl}\) - expected average size of interactions processed during interaction removal phase, |U| - number of nodes to be updated.

Full size table

Firstly, it is remarkable that the event resurgence cannot be detected by any of the selected algorithms, nor by any of the other algorithms that were analyzed, even though the event has been included in the literature, among others by [1]. Similarly, the event continue is rarely mentioned explicitly. It might be the case that continue is implied/detected when no event occurs and is therefore not mentioned by the authors.

Secondly, the algorithms, such as OLCPM, HOCTracker and DOCET, that were included in addition to the survey by [1] because they were more recent and possessed good features for social network community detection, can detect most of the events community evolution events. Only resurgence cannot be detected by any of the methods which we assume is due to the fact that detecting resurgence requires more than two timestamps which is not the case with methods from category 2.2, in general.

Thirdly, some algorithms, such as Extended BGL, ECSD and iLCD, only track events that are linked with the emergence of nodes and not their disappearance.

3.3 Quantitative Results

In this section we present the empirical results on synthetic dataset, RDyn, and real-world dataset, DBLP, in terms of partition quality (NF1) and computation times (secs).

Synthetic Graph (RDyn)

Table 2. Tracking of community evolution events by selected algorithms. “CS Type” stands for category in Cazabet and Rossetti framework. Question marks denote that the algorithm is able to detect community evolution events, but the original papers do not specify which ones explicitly.

Full size table

Partition Quality. The results of partition quality in terms of NF1 measure are provided in Table 3. The best performing algorithm is HOCTracker followed by iLCD and DEMON which only slightly differ from each other. Next, OLCPM is the second worst performer followed by Tiles who ended up having very poor results in terms of NF1. In general, the community size distribution parameter and the number of nodes do not have a trend that influences the partition quality. The impact of these variables differs algorithm to algorithm.

HOCTracker returns the highest NF1 values for \(\alpha =3\) and the lowest for \(\alpha =2.5\). However, it also exhibits much higher standard deviations associated with each group of RDyn instances, especially in comparison with iLCD and OLCPM. Note that standard deviation in Table 3 is not the standard deviation of the mean NF1 across every RDyn instance of one of the nine RDyn categories, but represents the average standard deviation of all NF1 measures within one RDyn instance. Even though the small standard deviation values in iLCD could be interpreted as method consistency (thus in its advantage), closer investigation revealed that oftentimes a lot of nodes where not assigned to a community for a specific graph resulting in NF1 scores of 0 for those communities and consequently for their graphs. If NF1 mean is 0 than the standard deviation is either close to 0 or 0. This can be seen in Fig. 2. OLCPM and iLCD appear to classify algorithms quite well once the algorithm succeeds at assigning the majority of the nodes.

Scalability. As can be seen from Table 4, the best performing algorithm on synthetic dataset in terms of execution times is iLCD, followed by TILES, DEMON, OLCPM and lastly HOCTracker. Remarkably, the group with \(\alpha \!=\!3\) takes the longest to execute across all sizes with the exception of OLCPM on the (1000, 3) graph where (1000, 2.5) requires the longest time. Although this observation is very notable, there is no reasonable explanation why this occurs. It can be concluded that specific characteristics can have a significant impact on the time required to analyze a graph but we refrain from specifying a specific relation between community size distribution and execution times.

Another interesting observation is that within each size group, the graphs with relatively large differences in community sizes (\(\alpha = 3.5\)) often require the least time to analyze. If they are not the fastest their performance is rather similar to the fastest.

According to literature, iLCD scales with the number of edges in a graph. However, this is not reflected in the execution times. For OLCPM, it was observed that the execution time is very variable. For the \(\alpha \!=\!2.5\), the execution times for 1000 and 4000 are almost equal. For 2000 nodes it only takes 75% of the time used for the former. An anomaly can clearly be observed for OLCPM, the average execution time for the (2000, 2.5) and (2000, 3.5)-instances group takes less time than their (1000, 2.5) and (1000, 3.5) despite having double the number of nodes and approximately edges. This observation cannot be attributed to a specific aspect of the algorithm. As expected, DEMON was found to scale with the number of nodes and the average degree distribution parameter (kept constant on all instances of RDyn). At last, HOCTracker performed the worst which was not unsurprising as it scales in a quadratic way.

Real-World Graph (DBLP). Three algorithms were run on the DBLP dataset: DEMON, iLCD and TILES. HOCTracker returns an OutOfMemoryException when trying to run it on the DBLP dataset, which demonstrates the unsuitability of the algorithm for large graphs. OLCPM runs itself into a loop on the DBLP dataset. We also suspect the method unsuitability for large datasets as this phenomenon did not occur on the RDyn dataset and the test in its introductory paper encompassed only small datasets (<10 000 nodes).

To analyze the performance of DEMON, iLCD and TILES each partitioning is benchmarked with the resulting partitioning of each of the other algorithms as ground truth. From the results, in Table 5, it can be observed that, on average, TILES is the worst and DEMON the best performing algorithm. Even though DEMON is the best performing algorithm, it needs significantly more computation time 3099.48 s on DBLP dataset as compared to TILES requiring 1436.71 s and iLCD which is the fastest with only 55.73 s. Figure 3 shows the evolution of mean NF1 scores of the three different methods for each year from 1971 to 2002. A general trend can be observed: as time progresses and more nodes and edges are introduced, the NF1 values drop significantly, however, not at the same pace for every algorithm. While TILES starts as the worst-performing algorithm in the earlier timestamps and thus smaller graphs, it ends up being the most performing one once the graph size exceeds 35 000 nodes (1991).

4 Related Work

A plethora of studies focusing purely on comparing algorithms for static community detection methods can be found in the literature [3, 8,9,10,11,12]. In contrast, to the best of our knowledge, there are no studies that focus purely on the empirical comparison of DCD algorithms. Instead, in the studies which introduce new DCD algorithms or new dynamic benchmark graphs, authors typically benchmark their own method with few others with the aim to showcase that their technique performs better with regard to peers.

Table 3. Mean NF1 results and the associated standard deviations of benchmarked algorithms on RDyn dataset. The highest scores per different number of nodes and alpha are underlined while the best averages are boldfaced. “CS Type” stands for category in Cazabet and Rossetti framework.

Full size table

The situation is slightly better with respect to benchmark graphs, where a vast body of literature is available. Although the most prominently used benchmark graphs Girvan-Newman (GN) [4] and Lancichinetti-Fortunato-Radicchi (LFR) [2] are not suited for temporal community discovery, to this end, their extensions in [6] and [5] respectively, were proposed. Next, RDyn, a framework for generating dynamic networks along with time-dependent ground-truth partitions with tunable qualities, was introduced in [1].

Table 4. Average computation time (in sec) over 100 RDyn instances per each benchmarked algorithm (the shortest boldfaced). “CS Type” stands for category in Cazabet and Rossetti framework.

Full size table

Table 5. The mean NF1 results and the associated standard deviations when running DEMON, iLCD and TILES (rows) and using them as ground truth (columns) on DBLP dataset.

Full size table

To gain better insight in how the mentioned sporadic comparisons of DCD algorithms and/or dynamic benchmark graphs have been performed, in the context of this literature review, we considered a selection of 51 papers on DCD methods. The first remarkable finding is that 12 out of 51 papers did not include a single comparison with peer algorithms [28, 33, 35, 40, 41, 44, 45, 48,49,50, 52, 56], while 39 did. A closer investigation of these 39 papers revealed 57 algorithms out of 82 were only benchmarked once. On the other hand, the most frequently referenced algorithm is FacetNet [6]. The second interesting finding is that authors seem to use various datasets for comparison, as they often include synthetic graphs and/or real-world graphs. In the assessed papers, 1 made use of only synthetic graphs [32], 32 used only real graphs [13, 14, 16, 22, 27, 28, 30, 31, 33, 34, 36,37,38,39,40,41, 44, 45, 48,49,50,51,52, 54,55,57, 59,60,63] and 17 used both [6, 15, 17, 18, 20, 21, 23, 24, 29, 42, 43, 46, 47, 53, 58, 64, 65]. In the 49 papers that used real graphs 47 different real graphs were introduced. A little over half of the datasets are only used once. The most popular are graphs extracted from the DBLP database. These occur as benchmark graphs in 19 of 51 papers. Hence, similarly to the use of methods, the use of datasets is also fairly heterogeneous which contributes to the difficulty of assessing the relative performance of techniques.

Due to the fact that the overlap in comparison is limited, it is hard to make any deductions with regard to the relative performance of the algorithms. Moreover, as mentioned before, the setup and results of these comparisons might contain an unconscious bias towards the proposed algorithm. This shows the relevance of this study.

5 Conclusion

Dynamic community detection has numerous applications in different fields and as such is extensively studied in the current literature. Nevertheless, a systematic and unbiased comparison of these methods is still missing. Therefore, in this paper we made steps towards scrutinizing algorithms and performing fairly both qualitative and quantitative comparisons on synthetic as well as real-life evolving graphs. The qualitative analysis included an overall set of characteristics relevant for (social) community detection such as community definition used, the ability to track community life-cycle events, overlapping and hierarchical communities, and the time complexity. For the empirical analysis, several limiting factors such as unavailable/poorly documented source code and inability to run methods on large graphs led to a narrower set of compared methods. Nevertheless, 900 synthetic, evolving graphs of various sizes and community size distributions and the most frequently used real-world DBLP dataset were used for a thorough analysis.

Undoubtedly, the field of community detection techniques that act on evolving graphs is characterized by its inherent heterogeneity in all its aspects. As such, there is no single best performing community technique, but rather, the choice and the performance depends on the objective and dataset characteristics. For future work, we envision an even more extensive empirical evaluation.

References

Rossetti, G., Cazabet, R.: Community discovery in dynamic networks: a survey. ACM Comput. Surv. 51, 1–37 (2018)
Article Google Scholar
Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing community detection algorithms. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 78, 046110 (2008)
Google Scholar
Lancichinetti, A., Fortunato, S.: Community detection algorithms: a comparative analysis. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 80, 056117 (2009)
Google Scholar
Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E - Stat. Nonlinear Soft Matter Phys. 69, 026113 (2004)
Google Scholar
Greene, D., Doyle, D., Cunninngham, P.: Tracking the evolution of communities in dynamic social networks. In: Proceedings of the International Conference on Advances in Social Networks, pp. 1–18 (2011)
Google Scholar
Lin, Y.-R., Chi, Y., Zhu, S., Sundaram, H., Tseng, B.L.: Analyzing communities and their evolutions in dynamic social networks. ACM Trans. Knowl. Discov. Data 3(2), 1–31 (2009)
Article Google Scholar
Rossetti, G.: RDyn: graph benchmark handling community dynamics. J. Complex Netw. 4, 893–912 (2017)
Article Google Scholar
Danon, L., Dìaz-Guilera, A., Duch, J., Arenas, A.: Comparing community structure identification. J. Stat. Mech: Theory Exp. 9, 219–228 (2005)
MATH Google Scholar
Harenberg, S., et al.: Community detection in large-scale networks: a survey and empirical evaluation. Wiley Interdiscip. Rev. Comput. Stat. 6(6), 426–439 (2014)
Article Google Scholar
Yang, Z., Algesheimer, R., Tessone, C.J.: A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6(1), 30750 (2016)
Article Google Scholar
Wagenseller, P., Wang, F., Wu, W.: Size matters: a comparative analysis of community detection algorithms. IEEE Trans. Comput. Soc. Syst. 5, 951–960 (2018)
Article Google Scholar
Zhao, Z., Zheng, S., Li, C., Sun, J., Chang, L., Chiclana, F.: A comparative study on community detection methods in complex networks. J. Intell. Fuzzy Syst. 35, 1077–1086 (2018)
Article Google Scholar
Alvari, H., Hajibagheri, A., Sukthankar, G.: Community detection in dynamic social networks: a game-theoretic approach. In: Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014, pp. 101–107 (2014)
Google Scholar
Shang, J., et al.: A real-time detecting algorithm for tracking community structure of dynamic networks, vol. 12 (2014)
Google Scholar
Nguyen, N.P., Dinh, T.N., Tokala, S., Thai, M.T.: Overlapping communities in dynamic networks: their detection and mobile applications. In: Mobicom, pp. 85–96 (2011)
Google Scholar
Bhat, S.Y., Abulaish, M.: HOCTracker: tracking the evolution of hierarchical and overlapping communities in dynamic social networks. IEEE Trans. Knowl. Data Eng. 27(4), 1013–1019 (2015)
Article Google Scholar
Rossetti, G., Pappalardo, L., Pedreschi, D., Giannotti, F.: Tiles: an online algorithm for community discovery in dynamic social networks. Mach. Learn. 106(8), 1213–1241 (2017)
Article MathSciNet Google Scholar
Boudebza, S., Cazabet, R., Azouaou, F., Nouali, O.: OLCPM: an online framework for detecting overlapping communities in dynamic social networks. Comput. Commun. 123, 36–51 (2018)
Article Google Scholar
Gergely, P., Imre, D., Illés, F., Tamás, V.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(m), 814–818 (2005)
Google Scholar
Wang, Z., Li, Z., Yuan, G., Sun, Y., Rui, X., Xiang, X.: Tracking the evolution of overlapping communities in dynamic social networks. Knowl.-Based Syst. 157, 81–97 (2018)
Article Google Scholar
Folino, F., Pizzuti, C.: Multiobjective evolutionary community detection for dynamic networks. In: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pp. 535–536 (2010)
Google Scholar
Kim, M.-S., Han, J.: A particle-and-density based evolutionary clustering method for dynamic networks. Proc. VLDB Endow. 2(1), 622–633 (2009)
Article Google Scholar
Coscia, M., Rossetti, G., Giannotti, F., Pedreschi, D.: DEMON: a local-first discovery method for overlapping communities, pp. 615–623 (2012)
Google Scholar
Cazabet, R., Amblard, F., Hanachi, C.: Detection of overlapping communities in dynamical social networks. In: Proceedings of the 2010 2nd IEEE International Conference on Socical Computing PASSAT 2010 2nd IEEE International Conference on Privacy, Security and Risk Trust, no. August, pp. 309–314 (2010)
Google Scholar
Demaine, E., Hajiaghayi, M.: DBLP graphs (BigDND: dynamic network data) (2019). http://projects.csail.mit.edu/dnd/DBLP/
Coscia, M., Rossetti, G., Giannotti, F., Pedreschi, D.: Uncovering hierarchical and overlapping communities with a local-first approach. ACM Trans. Knowl. Discov. Data 9(1), 1–27 (2014)
Article Google Scholar
Aynaud, T., Guillaume, J.-L.: Static community detection algorithms for evolving networks. In: Ad Hoc Wireless Networks, pp. 513–519 (2010)
Google Scholar
Bansal, S., Bhowmick, S., Paymal, P.: Fast community detection for dynamic complex networks. Commun. Comput. Inf. Sci. 116, 196–207 (2011). https://doi.org/10.1007/978-3-642-25501-4_20
Article Google Scholar
Görke, R., Maillard, P., Staudt, C., Wagner, D.: Modularity-driven clustering of dynamic graphs. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 436–448. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13193-6_37
Chapter Google Scholar
Miller, K., Eliassi-Rad, T.: Continuous time group discovery in dynamic graphs. In: Networks Learning with Graphs, pp. 1–7 (2009)
Google Scholar
Agarwal, M.K., Ramamritham, K., Bhide, M.: Real time discovery of dense clusters in highly dynamic graphs: identifying real world events in highly dynamic environments, vol. 5, no. 10, pp. 980–991 (2012)
Google Scholar
Cazabet, R., Amblard, F.: Simulate to detect: a multi-agent system for community detection. In: Proceedings of the 2011 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, IAT 2011, vol. 2, September, pp. 402–408 (2011)
Google Scholar
Duan, D., Li, Y., Li, R., Lu, Z.: Incremental K-clique clustering in dynamic social networks. Artif. Intell. Rev. 38(2), 129–147 (2012)
Article Google Scholar
Falkowski, T., Barth, A., Spiliopoulou, M.: DENGRAPH: a density-based community detection algorithm. In: WI 2007 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 112–115 (2007)
Google Scholar
Görke, R., Hartmann, T., Wagner, D.: Dynamic graph clustering using minimum-cut trees. In: Dehne, F., Gavrilova, M., Sack, J.-R., Tóth, C.D. (eds.) WADS 2009. LNCS, vol. 5664, pp. 339–350. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03367-4_30
Chapter Google Scholar
Lee, P., Lakshmanan, L.V.S., Milios, E.E.: Event evolution tracking from streaming social posts (2013)
Google Scholar
Ma, H.S., Huang, J.W.: Cut: community update and tracking in dynamic social networks. In: Proceedings of the 7th Workshop on Social Network Mining and Analysis, pp. 1–8 (2013)
Google Scholar
Nguyen, N.P., Dinh, T.N., Xuan, Y., Thai, M.T.: Adaptive algorithms for detecting community structure in dynamic social networks. In: Proceedings of the IEEE INFOCOM, pp. 2282–2290 (2011)
Google Scholar
Xie, J., Chen, M., Szymanski, B.K.: LabelRankT: incremental community detection in dynamic networks via label propagation (2013)
Google Scholar
Zakrzewska, A., Bader, D.A.: A dynamic algorithm for local community detection in graphs. In: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015, no. 5, pp. 559–564 (2015)
Google Scholar
Crane, H., Dempsey, W.: Community detection for interaction networks, pp. 1–29 (2015)
Google Scholar
Gong, M.G., Zhang, L.J., Ma, J.J., Jiao, L.C.: Community detection in dynamic social networks based on multiobjective immune algorithm. J. Comput. Sci. Technol. 27(3), 455–467 (2012)
Article MathSciNet Google Scholar
Görke, R., Maillard, P., Schumm, A., Staudt, C., Wagner, D.: Dynamic graph clustering combining modularity and smoothness. J. Exp. Algorithmics 18(April), 1.1–1.29 (2013)
Article MathSciNet Google Scholar
Kawadia, V., Sreenivasan, S.: Sequential detection of temporal communities by estrangement confinement. Sci. Rep. 2, 794 (2012)
Article Google Scholar
Sun, Y., Tang, J., Han, J., Gupta, M., Zhao, B.: Community evolution detection in dynamic heterogeneous information networks. In: Proceedings of the Eighth Workshop on Mining and Learning with Graphs, MLG 2010, pp. 137–146 (2010)
Google Scholar
Tang, L., Liu, H., Zhang, J., Nazeri, Z.: Community evolution in dynamic multi-mode networks. Tetrahedron Lett. 45(9), 1903–1906 (2004)
Article Google Scholar
Yang, T., Chi, Y., Zhu, S., Gong, Y., Jin, R.: Detecting communities and their evolutions in dynamic social networks— a Bayesian approach. Mach. Learn. 82(2), 157–189 (2011). https://doi.org/10.1007/s10994-010-5214-7
Article MathSciNet MATH Google Scholar
Zhou, D., Councill, I., Zha, H., Giles, C.L.: Discovering temporal communities from social network documents. In: Proceedings of the IEEE International Conference on Data Mining, ICDM, pp. 745–750 (2007)
Google Scholar
Guo, C., Wang, J., Zhang, Z.: Evolutionary community structure discovery in dynamic weighted networks. Phys. A Stat. Mech. Appl. 413, 565–576 (2014)
Article Google Scholar
Xu, K.S., Hero, A.O.: Dynamic stochastic blockmodels: statistical models for time-evolving networks. In: Greenberg, A.M., Kennedy, W.G., Bos, N.D. (eds.) SBP 2013. LNCS, vol. 7812, pp. 201–210. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37210-0_22
Chapter Google Scholar
Li, J., Huang, L., Bai, T., Wang, Z., Chen, H.: CDBIA: a dynamic community detection method based on incremental analysis. In: 2012 International Conference on Systems and Informatics (ICSAI2012), pp. 2224–2228 (2012)
Google Scholar
Chakrabarti, D., Kumar, R., Tomkins, A.: Evolutionary clustering. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006, p. 554 (2006)
Google Scholar
Hu, Y., Yang, B., Lv, C.: A local dynamic method for tracking communities and their evolution in dynamic networks. Knowl.-Based Syst. 110, 176–190 (2016)
Article Google Scholar
Ilhan, N., Oguducu, I.G.: Community event prediction in dynamic social networks. In: 2013 12th International Conference on Machine Learning and Applications, pp. 191–196 (2013)
Google Scholar
Appel, A.P., Cunha, R.L.F., Aggarwal, C.C., Terakado, M.M.: Temporally evolving community detection and prediction in content-centric networks. In: Berlingerio, M., Bonchi, F., Gärtner, T., Hurley, N., Ifrim, G. (eds.) ECML PKDD 2018. LNCS (LNAI), vol. 11052, pp. 3–18. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-10928-8_1
Chapter Google Scholar
Tajeuna, E.G., Bouguessa, M., Wang, S.: Modeling and predicting community structure changes in time evolving social networks. IEEE Trans. Knowl. Data Eng. 31, 1166–1180 (2018)
Article Google Scholar
Zhao, Z., Li, C., Zhang, X., Chiclana, F., Viedma, E.H.: An incremental method to detect communities in dynamic evolving social networks. Knowl.-Based Syst. 163, 404–415 (2019)
Article Google Scholar
Jiao, P., Wang, W., Jin, D.: Constrained common cluster based model for community detection in temporal and multiplex networks. Neurocomputing 275, 768–780 (2018)
Article Google Scholar
Cheraghchi, H.S., Zakerolhosseini, A.: Toward a novel art inspired incremental community mining algorithm in dynamic social network. Appl. Intell. 46(2), 409–426 (2017). https://doi.org/10.1007/s10489-016-0838-3
Article Google Scholar
Sun, H., et al.: IncOrder: incremental density-based community detection in dynamic networks. Knowl.-Based Syst. 72, 1–12 (2014)
Article Google Scholar
Said, A., Abbasi, R.A., Maqbool, O., Daud, A., Aljohani, N.R.: CC-GA: a clustering coefficient based genetic algorithm for detecting communities in social networks. Appl. Soft Comput. J. 63, 59–70 (2018)
Article Google Scholar
Li, Z., Liu, J., Wu, K.: A multiobjective evolutionary algorithm based on structural and attribute similarities for community detection in attributed networks. IEEE Trans. Cybern. 48(7), 1963–1976 (2018)
Article Google Scholar
Asadi, M., Ghaderi, F.: Incremental community detection in social networks using label propagation method. In: Conference of Open Innovations Association (FRUCT), pp. 13–16, November 2018
Google Scholar
Li, Y., He, K., Kloster, K., Bindel, D., Hopcroft, J.: Local spectral clustering for overlapping community detection. ACM Trans. Knowl. Discov. Data 12(2), 1–27 (2018)
Article Google Scholar
Zhang, C., Zhang, Y., Wu, B.: A parallel community detection algorithm based on incremental clustering in dynamic network. In: Procedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analaysis and Mining, ASONAM 2018, pp. 946–953 (2018)
Google Scholar

Download references

Author information

Authors and Affiliations

Research Center for Information Systems Engineering (LIRIS), KU Leuven, Leuven, Belgium
Lauranne Coppens, Jonathan De Venter, Sandra Mitrović & Jochen De Weerdt

Authors

Lauranne Coppens
View author publications
You can also search for this author in PubMed Google Scholar
Jonathan De Venter
View author publications
You can also search for this author in PubMed Google Scholar
Sandra Mitrović
View author publications
You can also search for this author in PubMed Google Scholar
Jochen De Weerdt
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sandra Mitrović .

Editor information

Editors and Affiliations

Institut National des Sciences Appliquées, Rennes, France
Peggy Cellier
Maastricht University, Maastricht, The Netherlands
Kurt Driessens

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Coppens, L., De Venter, J., Mitrović, S., De Weerdt, J. (2020). A Comparative Study of Community Detection Techniques for Large Evolving Graphs. In: Cellier, P., Driessens, K. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Communications in Computer and Information Science, vol 1167. Springer, Cham. https://doi.org/10.1007/978-3-030-43823-4_31

Download citation

DOI: https://doi.org/10.1007/978-3-030-43823-4_31
Published: 28 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43822-7
Online ISBN: 978-3-030-43823-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

the ECML PKDD community (opens in a new tab)

A Comparative Study of Community Detection Techniques for Large Evolving Graphs

Abstract

Similar content being viewed by others

Overlapping Community Detection in Static and Dynamic Networks: A Qualitative Assessment

Dynamic community detection in evolving networks using locality modularity optimization

Community detection in dynamic networks: a comprehensive and comparative review using external and internal criteria

Keywords

1 Introduction