1 Introduction

Network data are ubiquitous. Most real-world networks, such as social networks, communication networks, and biological networks, contain community structures. Discovering the community structures of a network is useful for many applications. For example, in a biological network, a community may represent a group of molecules with common properties. In a communication network, a community may denote a close group whose members frequently communicate with each other.

Graph clustering is a fundamental tool for identifying such community structures. In the last decade, a huge number of models and algorithms have been proposed for graph clustering. A comprehensive survey on graph clustering and community detection algorithms can be found in [8]. Among all these algorithms, the structural graph clustering algorithm \(\mathsf {SCAN}\) proposed in [23] is a notable one that has been successfully used in many network analysis tasks [23]. Unlike many other graph clustering algorithms, the striking feature of \(\mathsf {SCAN}\) is that it is not only able to detect the clusters of a network, but can also identify hubs and outliers.

The idea of the \(\mathsf {SCAN}\) algorithm is similar to that of DBSCAN, a density-based clustering algorithm that has been widely used for clustering spatial data. Specifically, the \(\mathsf {SCAN}\) algorithm first defines the \(\mathsf {structural}\) \(\mathsf {similarity}\) between the two end vertices of each edge in the graph. If the \(\mathsf {structural}\) \(\mathsf {similarity}\) of an edge is no less than a given threshold \(\varepsilon \), the edge is preserved; otherwise, the algorithm deletes that edge. After this processing, a vertex in the remaining graph that has at least \(\mu \) neighbors is called a core vertex. Then, the algorithm uses the core vertices as seeds and expands the clusters from the seeds by following the preserved \(\mathsf {structural}\) \(\mathsf {similarity}\) edges (more details can be found in Sect. 2).

Unfortunately, the \(\mathsf {SCAN}\) algorithm is tailored to static graph data, whereas real-world networks typically evolve over time. A naive way to handle dynamic networks is to recompute all clusters from scratch using the \(\mathsf {SCAN}\) algorithm. Clearly, such a naive solution is very costly, as the time complexity of the \(\mathsf {SCAN}\) algorithm is \(O(m^{1.5})\) (where m denotes the number of edges of the graph), which is nonlinear with respect to the graph size [2].

To overcome this problem, we propose an efficient incremental structural clustering algorithm for dynamic networks, called \(\mathsf {ISCAN}\). The \(\mathsf {ISCAN}\) algorithm can efficiently maintain the clusters generated by the \(\mathsf {SCAN}\) algorithm without recomputing all the clusters. Specifically, when an edge is updated (inserted or deleted), the \(\mathsf {ISCAN}\) algorithm only works on a small number of edges (i.e., the edges whose \(\mathsf {structural}\) \(\mathsf {similarity}\) may change). The \(\mathsf {structural}\) \(\mathsf {similarity}\) of an edge may decrease or increase after an edge update (see Sect. 3). When the \(\mathsf {structural}\) \(\mathsf {similarity}\) of an edge decreases, we may need to split clusters; when it increases, we may need to merge clusters. In \(\mathsf {ISCAN}\), we propose a BFS-forest structure to maintain the clusters, where each BFS-tree represents a cluster. We also use a set \(\varPhi \) to maintain the non-tree edges whose \(\mathsf {structural}\) \(\mathsf {similarity}\) is no less than the threshold \(\varepsilon \). When the algorithm splits a BFS-tree, we scan the set \(\varPhi \) to check whether the split trees can be merged again by an edge in \(\varPhi \). We conduct extensive experiments on eight large real-world networks. The results show that the \(\mathsf {ISCAN}\) algorithm is at least three orders of magnitude faster than the baseline algorithm.

The rest of this paper is organized as follows. In Sect. 2, we briefly introduce the \(\mathsf {SCAN}\) algorithm. We propose the \(\mathsf {ISCAN}\) algorithm in Sect. 3. The experimental results are reported in Sect. 4. We survey the related work and conclude the paper in Sects. 5 and 6, respectively.

2 Preliminaries

In this section, we briefly introduce several key concepts used in the \(\mathsf {SCAN}\) algorithm [23]. Let \(G=(V, E)\) be a graph, where V and E denote the set of vertices and the set of edges, respectively. The \(\mathsf {vertex}\) \(\mathsf {neighborhood}\) of a vertex \(v \in V\) is defined as \(\small \varGamma (v)\triangleq \{w\in V|(v,w)\in E \}\cup \{ v\}\). The \(\mathsf {structural}\) \(\mathsf {similarity}\) between the two end vertices of an edge \((u, v)\) is defined as

$$\begin{aligned} \sigma (u,v) \triangleq \frac{| \varGamma (u)\cap \varGamma (v) |}{\sqrt{| \varGamma (u) || \varGamma (v) |}}. \end{aligned}$$
(1)

If u and v are not the end vertices of an edge, we define \(\sigma (u,v)=0\). In the \(\mathsf {SCAN}\) algorithm, if \(\sigma (u,v) \) is no less than a given parameter \(\varepsilon \), the vertices u and v will be assigned to the same cluster. The \(\varepsilon \)-\(\mathsf {neighborhood}\) of a vertex v is defined as

$$\begin{aligned} N_{\varepsilon }(v) \triangleq \{ w\in \varGamma (v)| \sigma (w,v)\ge \varepsilon \}. \end{aligned}$$
(2)

A vertex v is called a core vertex if and only if \(|N_{\varepsilon }(v)| \ge \mu \), i.e., \( CORE_{\varepsilon ,\mu }(v)\Leftrightarrow |N_{\varepsilon }(v)|\ge \mu \). In the \(\mathsf {SCAN}\) algorithm, if v is a core vertex and \(u \in N_{\varepsilon }(v)\), u will be assigned to the cluster that v belongs to, and we say that u is directly structure reachable from v (denoted by \(DirREACH_{\varepsilon ,\mu }(v,u)\)). Formally, we define \(\mathsf {direct}\) \(\mathsf {structure}\) \(\mathsf {reachability}\) as
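To make these definitions concrete, the following Python sketch computes \(\varGamma (v)\), \(\sigma (u,v)\), \(N_{\varepsilon }(v)\), and the core test from an adjacency-set representation. The function names and the `adj` dictionary (vertex → set of neighbors) are illustrative conventions, not part of the paper.

```python
from math import sqrt

def gamma(adj, v):
    """Vertex neighborhood Γ(v): v's neighbors plus v itself."""
    return adj[v] | {v}

def sigma(adj, u, v):
    """Structural similarity σ(u, v) of Eq. (1); 0 if (u, v) is not an edge."""
    if u == v:
        return 1.0  # a vertex is fully similar to itself
    if v not in adj[u]:
        return 0.0  # σ is defined to be 0 for non-adjacent pairs
    gu, gv = gamma(adj, u), gamma(adj, v)
    return len(gu & gv) / sqrt(len(gu) * len(gv))

def eps_neighborhood(adj, v, eps):
    """ε-neighborhood N_ε(v) of Eq. (2); note that v itself is included."""
    return {w for w in gamma(adj, v) if sigma(adj, w, v) >= eps}

def is_core(adj, v, eps, mu):
    """CORE_{ε,μ}(v) ⟺ |N_ε(v)| ≥ μ."""
    return len(eps_neighborhood(adj, v, eps)) >= mu
```

For example, in a triangle {a, b, c} with a pendant vertex d attached to a, \(\sigma (a,d) = 2/\sqrt{8} \approx 0.707\); with \(\varepsilon = 0.7\) and \(\mu = 3\), vertex a is a core vertex but d is not.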

$$\begin{aligned} DirREACH_{\varepsilon ,\mu }(v,u) \Leftrightarrow CORE_{\varepsilon ,\mu }(v)\wedge u\in N_{\varepsilon }(v) \end{aligned}$$
(3)

A vertex w is structure reachable from v (denoted by \(REACH_{\varepsilon ,\mu }(v,w)\)) if there is a chain of direct structure reachability relationships leading from v to w. Formally, it is defined by

$$\begin{aligned} REACH_{\varepsilon ,\mu }(v,w) \Leftrightarrow \exists v_{1},....,v_{n}\in V:v_{1}\!=\!v \wedge v_{n}=w \wedge \forall i\!\in \! \{ 1,...,n-1\}:DirREACH_{\varepsilon ,\mu }(v_{i},v_{i+1}). \end{aligned}$$
(4)

If there exists a vertex \(v \in V\) such that both \(REACH_{\varepsilon ,\mu }(v,u)\) and \(REACH_{\varepsilon ,\mu }(v,w)\) hold, we say that u and w are structure connected, denoted by \(CONNECT_{\varepsilon ,\mu }(u,w)\). Based on the above definitions, a cluster C in \(\mathsf {SCAN}\) is defined as

Definition 1

\(CLUSTER_{\varepsilon ,\mu }(C)\Leftrightarrow \)

(1)    \(Connectivity:\forall u,w\in C:CONNECT_{\varepsilon ,\mu }(u,w)\)

(2)    \(Maximality:\forall u,w\in V:u\in C\wedge REACH_{\varepsilon ,\mu }(u, w)\Rightarrow w\in C\)

The \(\mathsf {SCAN}\) algorithm aims to find all clusters defined in Definition 1. Note that there may exist vertices that do not belong to any cluster. Such a vertex is considered a hub if it bridges different clusters, and an outlier otherwise [23]. The \(\mathsf {SCAN}\) algorithm first finds a core vertex and creates a new cluster for it. Then, the algorithm traverses the \(\varepsilon \)-\(\mathsf {neighborhood}\) of the core vertex in a BFS (breadth-first search) manner to add vertices into the cluster. When all the vertices have been visited, the algorithm terminates. Note that the \(\mathsf {SCAN}\) algorithm is tailored to static graphs, and it is nontrivial to maintain the clusters when the graph evolves over time. In this paper, we focus on this cluster maintenance problem when the graph is updated by an edge insertion or deletion.
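The cluster-expansion procedure just described can be sketched as follows. This is a simplified Python rendition of the BFS expansion, not the paper's exact pseudocode; vertices left unlabeled are the hub/outlier candidates.

```python
from collections import deque
from math import sqrt

def scan(adj, eps, mu):
    """Simplified SCAN sketch: BFS from each unvisited core vertex,
    expanding clusters along ε-similar edges.
    Returns a vertex → cluster-id map."""
    def sigma(u, v):
        gu, gv = adj[u] | {u}, adj[v] | {v}
        return len(gu & gv) / sqrt(len(gu) * len(gv))

    def eps_nbrs(v):  # N_ε(v) excluding v itself
        return {w for w in adj[v] if sigma(w, v) >= eps}

    def is_core(v):  # v counts itself as a member of N_ε(v)
        return len(eps_nbrs(v)) + 1 >= mu

    label, next_id = {}, 0
    for s in adj:
        if s in label or not is_core(s):
            continue
        cid, next_id = next_id, next_id + 1
        label[s] = cid
        q = deque([s])
        while q:
            v = q.popleft()
            if not is_core(v):
                continue  # non-core members join but do not expand the cluster
            for w in eps_nbrs(v):
                if w not in label:
                    label[w] = cid
                    q.append(w)
    return label
```

On two triangles {1, 2, 3} and {4, 5, 6} joined by a bridge edge (3, 4), this sketch (with \(\varepsilon =0.7\), \(\mu =2\)) produces two clusters, since \(\sigma (3,4)=0.5<\varepsilon \).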

3 Incremental Structure Clustering Algorithm

To maintain the clusters, a naive algorithm is to recompute all clusters by invoking \(\mathsf {SCAN}\) when inserting or deleting an edge. Clearly, such a naive algorithm is inefficient. Below, we propose the \(\mathsf {ISCAN}\) algorithm to maintain the clusters without recomputing all clusters. Our algorithm is based on the following key observations.

Observation 1

Consider an edge \(e=(u, v)\). Let \(N(e_{uv}) \triangleq \varGamma (u) \cup \varGamma (v)\), and let \(R(e_{uv}) \subseteq E\) be the set of edges whose two end vertices are both in \(N(e_{uv})\). When an edge \(e=(u, v)\) is inserted or deleted, we only need to update the \(\mathsf {structural}\) \(\mathsf {similarity}\) between the two end vertices of the edges in \(R(e_{uv})\); there is no need to update the \(\mathsf {structural}\) \(\mathsf {similarity}\) of the edges in \(E\backslash R(e_{uv})\). When adding or removing an edge \(e=(u, v)\), the \(\mathsf {structural}\) \(\mathsf {similarity}\) may increase or decrease for different edges in \(R(e_{uv})\). Below, we focus mainly on the edge insertion case; similar results also hold for the edge deletion case. When inserting an edge \(e=(u, v)\), we have three different cases.

Algorithm 1. The modified \(\mathsf {SCAN}\) algorithm (pseudocode figure)

First, the \(\mathsf {structural}\) \(\mathsf {similarity}\) between u and v, i.e., \(\sigma (u,v)\), increases to \(\frac{| \varGamma (u)\cap \varGamma (v) |}{\sqrt{(| \varGamma (u) |+1)(| \varGamma (v) |+1)}}\) after inserting \((u, v)\). Here \(\varGamma (v)\) denotes the \(\mathsf {vertex}\) \(\mathsf {neighborhood}\) of v before inserting \((u, v)\). This is because there is no edge between u and v before the insertion, and thus \(\sigma (u,v)=0\) by definition. Second, if \((w, u, v)\) forms a triangle after inserting \((u, v)\), \(\sigma (w,v)\) will increase to \(\frac{| \varGamma (w)\cap \varGamma (v) |+1}{\sqrt{| \varGamma (w) |(| \varGamma (v) |+1)}}\) based on the following lemma.

Lemma 1

\(\frac{| \varGamma (w)\cap \varGamma (v) |}{\sqrt{| \varGamma (w) || \varGamma (v) |}} < \frac{| \varGamma (w)\cap \varGamma (v) |+1}{\sqrt{| \varGamma (w) |(| \varGamma (v) |+1)}}\)

Proof

Let \(a = | \varGamma (w)\cap \varGamma (v) |\), \(b = | \varGamma (v) |\), and \(c = | \varGamma (w) |\), and note that \(a \le b\). Squaring both sides, it suffices to show that \(\frac{a^2}{cb} < \frac{(a+1)^2}{c(b+1)}\), i.e., \(a^2(b+1) < b(a+1)^2\). Expanding both sides, this is equivalent to \(a^2 b + a^2 < a^2 b + 2ab + b\), i.e., \(a^2 < b(2a+1)\). Since \(a \le b\), we have \(a^2 \le ab < b(2a+1)\). This completes the proof.

Third, if the vertices \((w, u, v)\) do not form a triangle after adding \((u, v)\), \(\sigma (w,v)\) decreases to \(\frac{| \varGamma (w)\cap \varGamma (v) |}{\sqrt{| \varGamma (w) |(| \varGamma (v) |+1)}}\). Based on this observation, when the \(\mathsf {structural}\) \(\mathsf {similarity}\) of \((w, v)\) increases, we may merge two clusters; when it decreases, we may need to split a cluster.
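The three cases can be checked numerically on a toy graph. The helper functions below are illustrative (not the paper's code); the assertions mirror the three cases above, with neighborhoods taken before and after the insertion.

```python
from math import sqrt

def sigma(adj, u, v):
    """Structural similarity for an existing edge (u, v); 0 otherwise."""
    if v not in adj.get(u, set()):
        return 0.0
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return len(gu & gv) / sqrt(len(gu) * len(gv))

def insert_edge(adj, u, v):
    """Return a copy of adj with the undirected edge (u, v) added."""
    new = {x: set(s) for x, s in adj.items()}
    new[u].add(v)
    new[v].add(u)
    return new

# Toy graph: edges w-u, w-v, x-v exist; u and v are not yet adjacent.
adj = {'u': {'w'}, 'v': {'w', 'x'}, 'w': {'u', 'v'}, 'x': {'v'}}
new = insert_edge(adj, 'u', 'v')

# Case 1: σ(u, v) jumps from 0 to a positive value.
assert sigma(adj, 'u', 'v') == 0.0 and sigma(new, 'u', 'v') > 0.0
# Case 2: (w, u, v) now forms a triangle, so σ(w, v) increases (Lemma 1).
assert sigma(new, 'w', 'v') > sigma(adj, 'w', 'v')
# Case 3: (x, u, v) forms no triangle, so σ(x, v) decreases.
assert sigma(new, 'x', 'v') < sigma(adj, 'x', 'v')
```

Here \(\sigma (w,v)\) rises from \(2/3 \approx 0.667\) to \(3/\sqrt{12} \approx 0.866\), matching the formula in the second case, while \(\sigma (x,v)\) drops from \(2/\sqrt{6}\) to \(2/\sqrt{8}\).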

Observation 2

A crucial observation is that the clustering procedure of \(\mathsf {SCAN}\) will generate a BFS-forest where each BFS-tree is a cluster [23]. Note that all the non-leaf nodes in a BFS-tree are the core vertices. Based on this, we can use the BFS-forest to maintain the clusters when the graph changes. In Algorithm 1, we give a modified \(\mathsf {SCAN}\) algorithm to generate the BFS-forest (see lines 4 and 10).

3.1 The \(\mathsf {ISCAN}\) Algorithm

As shown in Observation 1, each edge update (insertion or deletion) can cause the \(\mathsf {structural}\) \(\mathsf {similarity}\) of nearby edges to increase or decrease. When the \(\mathsf {structural}\) \(\mathsf {similarity}\) of an edge \((u, v)\) increases, the algorithm may need to merge the clusters of u and v if u (resp. v) is directly structure reachable from v (resp. u). Moreover, the vertices u and v may become core vertices if they were not core vertices before the update. On the other hand, if the \(\mathsf {structural}\) \(\mathsf {similarity}\) of \((u, v)\) decreases, the algorithm may need to split a cluster, because \(\sigma (u,v)\) may drop below the threshold \(\varepsilon \). Also, the vertices u and v may become non-core vertices if they were core vertices before the update. The challenge is how to maintain the BFS-forest structure to handle all these cases.

Algorithm 2. The \(\mathsf {ISCAN}\) algorithm (pseudocode figure)

To tackle this challenge, we additionally maintain a set \(\varPhi \) that stores all the non-tree edges \((u, v)\) such that v (resp. u) is directly structure reachable from u (resp. v). Recall that in the \(\mathsf {SCAN}\) algorithm, there may exist an edge \((u, v)\) meeting the DirREACH relationship that is not in any BFS-tree. We make use of the set \(\varPhi \) to keep all these edges. In other words, we classify the edges satisfying the DirREACH relationship into two classes: tree edges, which are stored in the BFS-forest, and non-tree edges, which are kept in \(\varPhi \). When we split a BFS-tree into two sub-trees, we need to scan \(\varPhi \) to check whether these sub-trees can be merged again by an edge in \(\varPhi \). The \(\mathsf {ISCAN}\) algorithm maintains both the BFS-forest structure and the set \(\varPhi \). Initially, we can obtain \(\varPhi \) using the modified \(\mathsf {SCAN}\) algorithm shown in Algorithm 1 (see line 12).
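The split-and-remerge step can be sketched as follows. This is a simplified Python illustration of the role of \(\varPhi \); all names are hypothetical, the forest is represented by child → parent pointers, and the sketch only relabels cluster IDs, whereas the actual algorithm would also promote the re-merging edge from \(\varPhi \) to a tree edge.

```python
def split_and_remerge(parent, cluster, phi, u, v):
    """Remove tree edge (u, v), where u is v's parent, detach v's subtree
    into a tentative new cluster, then scan the non-tree edge set phi for
    an edge that re-merges the two parts. Returns True if re-merged."""
    assert parent.get(v) == u
    del parent[v]

    def root_of(x):  # follow parent pointers to the tree root
        while x in parent:
            x = parent[x]
        return x

    # Vertices now rooted at v form the detached subtree.
    subtree = {x for x in cluster if root_of(x) == v}
    new_id = max(cluster.values()) + 1
    for x in subtree:
        cluster[x] = new_id

    # Re-merge if some non-tree edge in phi still connects the two parts.
    for (a, b) in phi:
        if {cluster[a], cluster[b]} == {cluster[u], new_id}:
            for x in subtree:
                cluster[x] = cluster[u]
            return True   # clusters merged again via a non-tree edge
    return False          # genuine split

# Toy forest: r → a → b and r → c, all in cluster 0; Φ holds edge (c, b).
parent = {'a': 'r', 'b': 'a', 'c': 'r'}
cluster = {'r': 0, 'a': 0, 'b': 0, 'c': 0}
phi = {('c', 'b')}
merged = split_and_remerge(parent, cluster, phi, 'a', 'b')
assert merged and cluster['b'] == 0
```

Deleting the tree edge (a, b) tentatively detaches {b}, but the non-tree edge (c, b) in \(\varPhi \) reunites the two parts, so the cluster survives the split.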

The \(\mathsf {ISCAN}\) algorithm is outlined in Algorithm 2. It consists of three steps to maintain the clusters after an edge \((u, v)\) update. In the first step, the algorithm handles the case where the \(\mathsf {structural}\) \(\mathsf {similarity}\) increases. In this case, the algorithm scans the core vertices to maintain the BFS-forest and \(\varPhi \). The algorithm recomputes the \(\mathsf {structural}\) \(\mathsf {similarity}\) for each edge in \(R(e_{uv})\), because the \(\mathsf {structural}\) \(\mathsf {similarity}\) of these edges may have changed. For each core vertex in \(N(e_{uv})\), the algorithm invokes Algorithm 3 to maintain the set \(\varPhi \) and merge the clusters (lines 1–4).

Algorithm 3. The cluster merging procedure (pseudocode figure)

In Algorithm 3, the algorithm first checks whether the core vertex w is classified. If it is unclassified (i.e., w does not belong to any cluster), we create a cluster ID for w. Then, the algorithm traverses the \(\varepsilon \)-\(\mathsf {neighborhood}\) of w. For each neighbor u in \(N_{\varepsilon }(w)\), if u is unclassified, we add u into the same cluster as w and set w as the parent of u (line 13). Otherwise, the algorithm checks whether u is a core vertex. If that is the case, the algorithm verifies whether \((w, u)\) is a tree edge. If it is not a tree edge and w and u have the same cluster ID, we insert \((w, u)\) into \(\varPhi \) (lines 8–9). If w and u have different cluster IDs, we merge the two trees (i.e., clusters) of w and u (lines 10–11). On the other hand, if u is not a core vertex, we consider two cases. First, if \((w, u)\) is not a tree edge and w and u have the same cluster ID, we insert \((w, u)\) into \(\varPhi \). Second, if w and u have different cluster IDs, we also add \((w, u)\) into \(\varPhi \) (lines 4–6). For this case, we will add u into the cluster of w in the third step.

Algorithm 4. The cluster splitting procedure (pseudocode figure)

In the second step, Algorithm 2 handles the case where the \(\mathsf {structural}\) \(\mathsf {similarity}\) decreases. To this end, Algorithm 2 scans all the edges in \(R(e_{uv})\). For an edge \(e=(\tilde{u}, \tilde{v})\), if the \(\mathsf {structural}\) \(\mathsf {similarity}\) of e before the update (denoted by \(\sigma (\tilde{u}, \tilde{v})\)) is no less than \(\varepsilon \) and the \(\mathsf {structural}\) \(\mathsf {similarity}\) of e after the update (denoted by \(\sigma ^\prime (\tilde{u}, \tilde{v})\)) is smaller than \(\varepsilon \), the algorithm invokes Algorithm 4 to split the BFS-trees and maintain the set \(\varPhi \).

In Algorithm 4, we consider four different cases for the input edge \((u, v)\). First, both u and v are core vertices after the update. In this case, if \((u, v)\) is not a tree edge, we delete \((u, v)\) from \(\varPhi \) (lines 2–3); otherwise, we remove \((u, v)\) from the corresponding BFS-tree (line 5). Second, u is a core vertex and v is not. In this case, if u was a parent of v before the update, we remove \((u, v)\) from the corresponding BFS-tree (lines 7–8); otherwise, we remove it from \(\varPhi \) (line 10). Third, v is a core vertex, but u is not. This case is similar to the second case, so we omit the details. Fourth, neither u nor v is a core vertex. In this case, we need to consider whether u (or v) was a core vertex before the update. If neither u nor v was a core vertex before the update, we do nothing. If u (or v) was a core vertex and \((u, v)\) is a tree edge, we delete \((u, v)\) from the BFS-tree (lines 14–16); otherwise, we delete \((u, v)\) from \(\varPhi \).

In the third step, Algorithm 2 scans each edge \((\tilde{u}, \tilde{v})\) in \(\varPhi \), and merges the two clusters connected by the edge \((\tilde{u}, \tilde{v})\) if \(\tilde{u}\) and \(\tilde{v}\) have different cluster IDs. Since the \(\mathsf {ISCAN}\) algorithm enumerates all the possible cases for updating both the BFS-forest and \(\varPhi \), it is correct. Below, we analyze the time and space complexity of the algorithm.
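The third step can be sketched as a fixed-point scan over \(\varPhi \). This illustrative Python sketch relabels cluster IDs directly (the paper's algorithm merges BFS-trees instead, and a union-find structure would make this asymptotically cheaper):

```python
def merge_by_phi(cluster, phi):
    """Repeatedly scan the non-tree edge set phi and unify the cluster ids
    of any edge whose endpoints ended up in different clusters.
    cluster: vertex → cluster id; phi: iterable of non-tree edges."""
    changed = True
    while changed:
        changed = False
        for (a, b) in phi:
            ca, cb = cluster[a], cluster[b]
            if ca != cb:
                keep, drop = min(ca, cb), max(ca, cb)
                for x, cx in cluster.items():
                    if cx == drop:
                        cluster[x] = keep  # absorb one cluster into the other
                changed = True
    return cluster

# Two Φ edges chain three clusters {0, 1, 2} into one.
cluster = {'a': 0, 'b': 1, 'c': 1, 'd': 2}
merge_by_phi(cluster, [('a', 'b'), ('c', 'd')])
assert cluster == {'a': 0, 'b': 0, 'c': 0, 'd': 0}
```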

Complexity Analysis. We first analyze the time complexity of the \(\mathsf {ISCAN}\) algorithm. Let m and n be the number of edges and vertices of the graph G, respectively, and let \(\tilde{m} = |\varPhi |\) be the size of \(\varPhi \). Clearly, \(\tilde{m}\) is much smaller than m in real-world graphs. In our experiments, we show that in the Youtube social network \(m=2,987,624\) whereas \(\tilde{m} = 3,210\). Initially, the algorithm recomputes the \(\mathsf {structural}\) \(\mathsf {similarity}\) for all edges in \(R(e_{uv})\). Let O(T) be the time spent in this initial step. Since \(|R(e_{uv})|\) is very small, O(T) is typically dominated by O(m) in real-world graphs. In the first step, the cluster merging procedure can be done in O(n) time, because in the worst case we merge at most O(n) trees. In the second step, we split at most O(n) clusters, so the time spent in this step is also bounded by O(n). In the last step, the algorithm takes \(O(\tilde{m})\) time to scan \(\varPhi \) and merge the clusters. Putting it all together, we conclude that the time complexity is \(O(m+n)\). In the experiments, we will show that the actual running time of our algorithm is much lower than this worst-case bound. For the space complexity, our algorithm only needs to maintain the BFS-forest and \(\varPhi \), which is bounded by \(O(m+n)\).

4 Performance Studies

In this section, we conduct extensive experiments to evaluate the performance of the proposed algorithm. We implement two algorithms: \(\mathsf {ISCAN}\) and \(\mathsf {Basic}\). The \(\mathsf {ISCAN}\) algorithm is the proposed algorithm, while the \(\mathsf {Basic}\) algorithm recomputes the clustering results from scratch using the \(\mathsf {SCAN}\) algorithm when the graph changes. We implement both algorithms in C++. All the experiments are conducted on a Linux server with 2 CPUs and 32 GB main memory.

Datasets. We use eight large real-world datasets in the experiments. The detailed statistics of the datasets are summarized in Table 1. All these datasets are downloaded from the Koblenz Network Collection (http://konect.uni-koblenz.de/networks/). The first three datasets (Youtube, Pokec, and Flixster) are social networks, and the following three datasets (WebGoogle, WebBerkStan, and TREC) are web graphs. The Skitter dataset is a computer network, and the RoadNetPA dataset is a road network.

Parameter Setting. There are two parameters in our algorithm: \(\varepsilon \) and \(\mu \). As recommended in [23], we set the default values of \(\varepsilon \) and \(\mu \) to 0.5 and 2, respectively. We vary \(\varepsilon \) from 0.3 to 0.8, and \(\mu \) from 2 to 7. In all experiments, when varying one parameter, the other parameter is set to its default value. In all experiments, we randomly insert and delete 1000 edges in the original network. For each edge update, we invoke the \(\mathsf {ISCAN}\) and \(\mathsf {Basic}\) algorithms to update the clustering results. We record the total time each algorithm takes to handle the 1000 edge insertions and deletions.

Table 1. Datasets

Efficiency Testing (vary \(\varepsilon \)). In this experiment, we evaluate the efficiency of our algorithm when varying \(\varepsilon \). The results are shown in Fig. 1. As can be seen, the \(\mathsf {ISCAN}\) algorithm is at least three orders of magnitude faster than the \(\mathsf {Basic}\) algorithm over all the datasets. For example, on the Youtube dataset with \(\varepsilon =0.5\), our algorithm takes only 10 s to process 1000 edge updates, whereas the \(\mathsf {Basic}\) algorithm consumes more than 10000 s. Moreover, the running time of our algorithm generally decreases with increasing \(\varepsilon \), while the running time of \(\mathsf {Basic}\) remains stable as \(\varepsilon \) varies. The reason is as follows. When \(\varepsilon \) is large, the clusters obtained by the \(\mathsf {SCAN}\) algorithm are relatively stable with respect to an edge update, so our algorithm may only need to update a small number of edges. The \(\mathsf {Basic}\) algorithm, in contrast, always invokes \(\mathsf {SCAN}\) to recompute the clusters, so its running time is insensitive to \(\varepsilon \).

Fig. 1. Comparison between \(\mathsf {ISCAN}\) and \(\mathsf {Basic}\) (vary \(\varepsilon \))

Efficiency Testing (vary \(\mu \)). In this experiment, we compare the efficiency of \(\mathsf {ISCAN}\) and \(\mathsf {Basic}\) when varying \(\mu \). The results are reported in Fig. 2. From Fig. 2, we can see that the \(\mathsf {ISCAN}\) algorithm is at least three orders of magnitude faster than the \(\mathsf {Basic}\) algorithm for all \(\mu \) values on all datasets. Furthermore, the running time of \(\mathsf {ISCAN}\) decreases as \(\mu \) increases. The rationale is as follows: when the graph is updated, the larger the value of \(\mu \), the smaller the influence on the original clusters. Therefore, our algorithm is more efficient when \(\mu \) is large. As before, the \(\mathsf {Basic}\) algorithm is robust with respect to the parameter \(\mu \), as it always recomputes the clusters using the \(\mathsf {SCAN}\) algorithm.

To summarize, we can conclude that the \(\mathsf {ISCAN}\) algorithm is very efficient in practice. As shown in Figs. 1 and 2, under the default parameter setting, the \(\mathsf {ISCAN}\) algorithm takes only a few seconds to handle 1000 edge updates on a large graph (e.g., the Pokec dataset, which has more than 22 million edges). These results demonstrate the high efficiency of the proposed algorithm.

Fig. 2. Comparison between \(\mathsf {ISCAN}\) and \(\mathsf {Basic}\) (vary \(\mu \))

5 Related Work

Structural Graph Clustering. The original structural graph clustering algorithm (\(\mathsf {SCAN}\)) was proposed by Xu et al. in [23]. Recently, Shiokawa et al. [20] proposed an improved algorithm called \(\mathsf {SCAN}\)++. The \(\mathsf {SCAN}\)++ algorithm is based on a new data structure called the directly two-hop-away reachable node set (DTAR). Specifically, DTAR maintains the set of two-hop-away nodes from a given node that are likely to be in the same cluster as the given node. To further reduce the running time of the \(\mathsf {SCAN}\) algorithm, Chang et al. [2] developed a two-step algorithm called \(\mathsf {pSCAN}\). The \(\mathsf {pSCAN}\) algorithm first clusters the core nodes and then clusters the border nodes; the authors also proposed an efficient technique to cluster the core nodes based on a union-find structure. All these \(\mathsf {SCAN}\)-style algorithms are tailored to static graphs and are costly on dynamic graphs.

Cohesive Subgraph and Community Detection. Our work is closely related to the cohesive subgraph detection problem, which aims to find densely connected subgraphs in a graph. A number of cohesive subgraph models have been proposed in the literature. Notable examples include the maximal clique [4], k-core [12, 15, 24], k-truss [5, 21], maximal k-edge connected subgraph (MkCS) [1, 3, 25], locally dense subgraph [14], influential community [10, 11], and so on. All these methods can be used to find non-overlapping communities, and a comprehensive survey of other community detection algorithms can be found in [8]. Another line of studies focuses on finding overlapping communities. For example, Cui et al. [6] proposed an \(\alpha \)-adjacency \(\gamma \)-quasi-k-clique model to study the problem of overlapping community search. More recently, Huang et al. [9] introduced a k-truss community model to detect overlapping communities. An excellent survey on overlapping community detection can be found in [22].

Community Maintenance in Dynamic Networks. The community maintenance problem in dynamic networks is an important task in social network analysis [7], and our work is closely related to this issue. For community maintenance, it is often unnecessary to recompute the communities when the graph changes; one only needs to detect the affected edges or nodes in a community after the graph is updated. Clearly, different community models require different community updating strategies. Notable community updating algorithms are listed as follows. For the maximal clique model, Cheng et al. [4] introduced an algorithm for dynamically updating the maximal cliques in massive networks. For the k-core model, Li [12] proposed an efficient core maintenance algorithm for large dynamic graphs. Similarly, for the k-truss model, Huang [9] proposed an efficient truss maintenance algorithm for dynamic networks. Different from all the existing algorithms, in this paper we study the problem of dynamically updating the clustering results generated by the \(\mathsf {SCAN}\) algorithm. Our algorithms may also work on location-based social networks [?], spatial networks [17, 19], and trajectory data [16, 18]. As future work, we will study dynamic algorithms in the metric space [13].

6 Conclusion

In this paper, we study the incremental structural clustering problem for dynamic network data. We propose a new algorithm called \(\mathsf {ISCAN}\) to efficiently maintain the clusters generated by the \(\mathsf {SCAN}\) algorithm. In the \(\mathsf {ISCAN}\) algorithm, we use a BFS-forest and a non-tree edge set structure to maintain the clusters. We conduct comprehensive experiments over eight large real-world networks, and the results demonstrate the high efficiency of our algorithm.