Keywords

1 Introduction

\(k-\)core [10] is defined as the maximal subgraph of a simple graph G such that every vertex in the subgraph has at least k degree. The problem of finding the core number of all vertices in G is called core decomposition [4], which is widely used in many real-world applications, including large graph visualization [1], community detection [5], and network analysis [2].

Most graphs in our life are highly dynamic, whose edges are inserted into or removed from the graph over time. The core number of vertices should be updated to reveal the up-to-date structure of the graph. Clearly, it is uneconomical to recalculate the core number of all vertices while a few edges change. Instead, core maintenance [6, 9] is proposed, whose goal is to update the core number of influenced vertices rather than decompose the entire graph. Unfortunately, existing solutions can only deal with a single edge each time, which leads to high cost when a graph with numerous inserted (removed) edges.

To overcome above drawbacks, we provide a novel solution to maintain core number of vertices with multiple inserted (removed) edges simultaneously. Compared with previous studies, our solution is relevant to the maximum core number of a graph. We conduct extensive experiments to evaluate the performance of our method and the existing solution. Experimental results show that our method can significantly outperform the previous algorithm up to 3 orders of magnitude for large real graphs tested.

The main contributions of our paper can be summarized as follows:

  • With the aid of quasi\(-k-\)core, a similar but more loose concept to \(k-\)core, we can estimate the vertices affected by a set of inserted (removed) edges.

  • Unlike existing approaches, our maintenance solution can update the core number of vertices in affected \(k-\)cores from the sparsest to the densest.

  • Through executing extensive experiments on real graphs, our solution performs better than the existing approach.

The rest of this paper is organized as follows: Sect. 2 provides some preliminaries. The details of our solution are introduced in Sect. 3. Section 4 reports experimental results and Sect. 5 describes the related work about our paper. Finally, Sect. 6 concludes the paper.

2 Preliminaries

Usually, G represents a simple graph, which consists of a vertex set V(G) and an edge set E(G) such that \(E(G) \subseteq V(G) \times V(G)\). For convenience, \(|G|=|V(G)|+|E(G)|\) is used to represent the size of |G|, where |V(G)| and |E(G)| are the size of vertices and edges respectively. Additionally, \(K_0\) indicates an empty graph without vertices or edges.

Given an arbitrary vertex \(v \in V(G)\), we define \(N(G,v) = \{u:(u,v) \in E(G)\}\) as the set of neighbors of v. Clearly, |N(Gv)| is the degree of v in G, denoted by d(Gv). For convenience, we also use \(v \in G\ ((u,v) \in G)\) to replace \(v \in V(G)\ ((u,v) \in E(G))\), where u and v are two adjacent vertices of an edge in G.

To clearly illustrate the relation between two graphs \(G_1\) and \(G_2\), we generalize four set notations on graphs: \(G_1 \subseteq G_2\) represents \(G_1\) is a subgraph of \(G_2\); \(G_1 \cap G_2\) refers to the intersection graph of \(G_1\) and \(G_2\); \(G_1\cup G_2\) is the union graph of \(G_1\) and \(G_2\); \(G_1 \setminus G_2\) depicts the difference graph of \(G_1\) and \(G_2\) such that \(E(G_1 \setminus G_2) = E(G_1) \setminus E(G_2)\).

\(k-\)core [10] is a well-established metric to evaluate the importance of vertices as well as their connections in the graph. Besides, \(k-\)core has two important properties: uniqueness and nestedness [4, 8].

Definition 1

A \(k-\)core is the largest subgraph of a graph G, denoted by C(Gk), such that \(d(C(G,k),v) \ge k\) for an arbitrary vertex \(v \in C(G,k)\).

Generally, we require \(k \ge 1\). When \(k=0\), \(0-\)core is the graph itself. If not specified, we assume isolated vertices have been removed from G. Besides, we use \(C(G,k)= K_{0}\) to represent an empty \(k-\)core.

Similar to existing studies [4], we define the core number of vertices in G. If a vertex v is located in \(k-\)core but not contained in \((k+1)-\)core, then its core number is k, denoted by \(\phi (G,v)= k\). Additionally, the maximum core number of vertices in G is denoted as \(\phi (G)\).

3 Solution

Existing methods obey the core update theorem [6, 9], while an edge is inserted into (removed from) a graph, the vertices affected by this edge will change their core number at most 1. When numerous edges change, existing methods repeatedly identify influenced vertices for each edge and some of them may change their core number many times. If the number of edges is very large, the maintenance cost will become expensive.

To address the above issues, we propose a novel solution, which updates core number of influenced vertices from the sparsest \(k-\)core to the densest. To this end, we first propose the quasi\(-k-\)core to estimate the candidate vertices for each influenced \(k-\)core. Secondly, we identify the partial\(-k-\)core of each affected \(k-\)core and update their core number. Lastly, we increase k until all affected \(k-\)cores are updated.

For convenience, we use \(G_c\) to represent the current graph and \(G_p\) to indicate the previous graph before changing. Correspondingly, we define \(S_i = G_c\setminus G_p\ (S_r = G_p \setminus G_c)\) as the insertion (removal) graph.

3.1 Quasi\(-k-\)core

Consider that most graphs are sparse, not all \(k-\)cores will be affected by inserted (removed) edges. To find influenced \(k-\)cores, an intuitive idea is to decompose \(S_i\ (S_r)\). Since some vertices of \(S_i \ (S_r)\) lack adjacent edge information in \(G_c\ (G_p)\), we decompose \(S_i\ (S_r)\) to a set of quasi\(-k-\)cores with the aid of \(G_{c}\ (G_p)\).

Consider that the steps of quasi core decomposition on \(S_i\) and \(S_r\) are similar, we use S (e.g. \(S_i\) or \(S_r\)) and G (e.g. \(G_c\) or \(G_p\)) to represent two arbitrary graphs for ease of presentation. To supplement extra edge information of vertices in S, we define the neighborhood graph S on G, which consists of vertices in S and their adjacent neighbors within one step in G. Similar to \(k-\)core, quasi\(-k-\)core is also unique and nested. Otherwise, it contradicts to the maximal property of quasi\(-k-\)core.

Definition 2

\(S(G)=(V(S(G)),E(S(G)))\) is the neighborhood graph of S on G such that \(V(S(G))= V(S) \cup \{v: v \in N(G,u) \wedge u \in V(S)\}\) and \(E(S(G)) = E(S) \cup \{(u,v):u \in S \wedge v \in N(G,u)\}\). Specially, if \(S \cap G =K_{0}\), then \(S(G)=S\).

Definition 3

The quasi\(-k-\)core \(\hat{C}(S,G,k)\) is the largest subgraph of S on G such that \(d(\hat{C}(G),v) \ge k\) for an arbitrary \(v \in \hat{C}(S,G,k)\), where \(\hat{C}(G)\) is the neighborhood graph of \(\hat{C}(S,G,k)\) on G.

To get a set of quasi\(-k-\)cores, we can revise the decomposition method of \(k-\)cores. Through recursively removing unsatisfied vertices from S for each k, we can get a set of quasi\(-k-\)cores. Besides, we can also define quasi core number for each vertex, which is similar to core number.

3.2 Insertion Case

Our insertion algorithm has four steps: firstly, we decompose \(S_i\) to a set of quasi\(-k-\)cores with the aid of \(G_c\); secondly, we expand each quasi\(-k-\)core to a candidate graph; thirdly, we get the partial\(-k-\)core from the candidate graph; lastly, we update the core number of vertices in the partial\(-k-\)core and continue the next loop until all affected \(k-\)cores are updated.

Note that inserted edges may increase the core number of adjacent vertices of the quasi\(-k-\)core, but they are not contained in the quasi\(-k-\)core. To find all affected vertices, we expand the quasi\(-k-\)core to a candidate graph, which contains all possible affected vertices. Additionally, we terminate the search path when edges are contained in the previous \(k-\)core since they must belong to the current \(k-\)core.

Definition 4

F(k) is a candidate graph whose vertex \( v \in C(G_c,k-1)\) satisfying \(d(C(G_c,k-1),v) \ge k\) is reachable from \(u \in \hat{C}(S_i,G_c,k)\) via a path and satisfies \((u', v') \notin C(G_p,k)\) for an arbitrary edge \((u',v') \in F(k)\).

We can observe that F(k) contains all vertices that may be contained in \(C(G_c,k) \setminus C(G_p,k)\). So, \(C(G_c,k) \subseteq F(k) \cup C(G_p,k)\) holds. Since F(k) contains some redundant vertices, we identify the partial\(-k-\)core from F(k), denoted by \(P(k) = C(G_c,k) \setminus C(G_p,k)\), which is the difference graph of \(C(G_c,k)\) and \(C(G_p,k)\). Since \( \hat{C}(P(k), C(G_p,k),k) \) is a subgraph of P(k) and for any \(v \in P(k)\), \(d(P(C(G_p,k)),v) \ge k\) holds, we have \(\hat{C}(P(k), C(G_p,k),k) = P(k)\).

To get P(k) from F(k), we observe that \(P(k)= \hat{C}(P(k), C(G_p,k),k)\). Since \(P(k) \subseteq F(k)\), we have \(P(k) \subseteq \hat{C}(F(k),C(G_p,k),k)\). On the contrary, if a vertex \(v \in \hat{C}(F(k),C(G_p,k),k) \setminus P(k)\), there must be a vertex \( u \in C(G_p,k) \) which can be reachable from v such that \(d(F(C(G_p,k)),u) < k\) via a path. Again, this is a contradiction. Consider another case \((u,v) \in \hat{C}(F(k),C(G_p,k),k) \setminus P(k)\), since \(u,v \in P(k)\), \(P(k) \subseteq P(k) \cup (u,v)\). When \(C(G_p,k) = K_0\), we can directly get \(C(G_c,k)\) from F(k) since \(C(G_c,k) \subseteq F(k)\).

figure a

Based on above discussions, we implement Algorithm 1 to maintain \(k-\)cores for the insertion case. In detail, it first decomposes \(S_i\) to a map of quasi core numbers and corresponding vertex sets. Then the algorithm updates the core number of influenced vertices from \(k=2\) to \( \hat{\phi }(S_i,G_c) \) (the maximal quasi core number of \(S_i\) on \(G_c\)) according to two cases mentioned above.

Clearly, the time complexity of Algorithm 1 is \(O(|S_i|+\sum _{k=1}^{\hat{\phi }(S_i,G_c)}2|F(k)|)\), where O(2|F(k)|) is the cost to get F(k) and the partial\(-k-\)core. Note that \( \hat{\phi }(S_i,G_c) \) is much less than \(|V(G_c)|\), where \(|V(G_c)|\) is the number of vertices in \(G_c\). As for the space complexity, it only costs \(O(|G_c|)\) to store the entire graph.

figure b

3.3 Removal Case

Our removal algorithm contains three steps: firstly, we decompose \(S_r\) to a set of quasi\(-k-\)cores with the aid of \(G_p\); secondly, we delete the common edges in \(\hat{C}(S_r,G_p,k)\cap C(G_p,k)\) and recursively remove influenced vertices that cannot be located in \(C(G_p,k)\); thirdly, we continue the loop until all affected \(k-\)cores are updated.

The implementation of Algorithm 2 is relatively simple. Firstly, it decomposes \(S_r\) to a map of vertices and their quasi core number (line 1). Secondly, for each influenced \(k-\)core, it deletes common edges in \(\hat{C}(S_r,G_p,k)\cap C(G_p,k)\), recursively removes influenced vertices and updates the core number of affected vertices. The time complexity of Algorithm 2 is \(O(|S_r|+\sum _{k=1}^{\hat{\phi }(S_r,G_p)}|C(G_p,k)|)\). Since it at most traverses the entire affected \(k-\)core for each k, and the space complexity is \(O(|G_p|)\).

4 Experiments

Our real graphs are downloaded from Koblenz Network CollectionFootnote 1, including 10 real graphs (seen Table 1), where Stanford is a direct graph and Youtube is a temporal graph. For the directed graph, we ignore the edge direction and regard it as a simple graph. Then for the temporal graph, we sort their edges by the timestamp. While for the remainder graphs, we keep the initial order of edges as the corresponding graph files.

All algorithms are implemented in C++ and compiled with GCC 7.4.0 at -O2 optimization level. All experiments are executed sequentially on the Linux operating system Ubuntu 18.04, which is running on a machine with two Xeon E5-2683v4@2.1 GHz CPUs and 128 GB RAM.

Table 1. The detail of graphs

Similar to the most of existing studies, we adopt the execution time is as the metric of our experiments. In our experiments, we select the traversal approach [9] as the baseline, which contains Trav-I for the insertion case and Trav-R for the removal case. To support numerous edges, we recursively execute the traversal approach multiple times. Before experiments, we set some necessary parameters. We set the 2-hops for Trav-I and 1-hop for Trav-R in experiments, the details of these algorithms can be seen in [9].

For the insertion case, the last m edges are used to construct the insertion graph \(S_i\) and the remainder are used to construct \(G_p\). As for the removal case, the first m edges are used to construct removal graph \(S_r\) and all edges are used to construct \(G_p\). Generally, we vary m from 100, 000 to 200, 000 for tracing the evolution of the performance of two approaches.

Table 2. The execution time for the insertion case (unit: second)

Table 2 shows the execution time on all graphs for the insertion case. Compared with Trav-I, IC achieves the best performance on all graphs. Table 3 shows the execution time on all graphs for the removal case. Since both two algorithms do not search candidate vertices in the removal case, the execution time is obviously less than that of the insertion case. Even so, the performance of RC is still better than Trav-R on all graphs.

Table 3. The execution time for the removal case (unit: second)

5 Related Work

\(k-\)core decomposition, which assigns each vertex v with a core number to reveal the connected state of v and its neighbors, is strongly related to graph degeneracy [10]. Numerous \(k-\)core decomposition algorithms are proposed to handle different cases. To handle \((k,h)-\)core of a temporal graph, which is an extension of \(k-\)core, Wu et al.  [11] proposed two distributed algorithms to deal with massive temporal graphs. Besides, the probabilistic core decomposition was also studied recently in [3], where \( (k, \eta )-\)cores were proposed.

\(k-\)core and its extensions have been extensively used in numerous applications. To solve the maximal clique problem, Lu et al.  [7] devised a randomized algorithm by utilizing \(k-\)core and \(k-\)truss. With the aid of \(k-\)core, variants of community detection problems are addressed such as local communities detection [5]. Alvarezhamelin et al.  [2] used \(k-\)core as a tool to analyze large scale graphs such as social network and Internet graph.

6 Conclusions

In this paper, we propose a novel solution to tackle \(k-\)core maintenance of dynamic graphs, which provides an effective solution to maintain the core number of vertices affected by multiple inserted (removed) edges simultaneously. We confirm our approaches by conducting extensive experiments on 10 real graphs. The results show that our solution can outperform the existing algorithm up to 3 order of magnitude for real graphs tested.