1 Introduction

The study of complex networks has gained increased attention in recent years due to its applicability in different research fields (e.g. biology [1], ecology [7], telecommunication [12]). A relevant problem in network studies is the identification of a set of nodes that, based on some property, can be considered more important than others. If this property is a network measure, the problem is called the critical node detection problem.

The critical node detection problem (CNDP) has several application possibilities in different research fields, e.g. social network analysis [8, 14], network risk management [5] and network vulnerability studies [10].

Generally, the CNDP consists of finding a set of k nodes in a given graph \(G=(V,E)\) whose deletion maximally degrades the graph according to a given measure \(\sigma \) (\(\sigma \) can be, for example, betweenness centrality, closeness centrality or PageRank [18, 24]).

The critical edge detection problem is defined analogously: given a graph \(G=(V,E)\), the goal is to find a set of l edges whose deletion optimizes a certain network property.

We propose a new combinatorial optimization problem, the critical node and edge detection problem. Although the critical node detection and critical edge detection problems exist separately, unifying the two is essential in some applications (e.g. road networks, computer networks) as it models real-world situations better: in a real network not only nodes but also edges can be deleted.

The next section of the paper presents related work about critical node and edge detection variants and algorithms. The third section describes the new proposed critical node and edge detection problem. In the fourth section, the designed algorithms are described, while section five presents the numerical results. The article ends with conclusions and further work.

2 Related Work

The CNDP can be considered a special case of the node deletion problem [22]. Based on the survey [21], the CNDP can be divided into two main classes. The first class is the k-vertex-CNDP: given a graph \(G=(V,E)\), a connectivity metric \(\sigma \) and a number k, the goal is to minimize the objective function \(f(\sigma )\) by deleting k nodes. Problems in this class include the MaxNum problem [32] (maximizing the number of connected components), MinMaxC [33] (minimizing the size of the largest component), and the CNP (critical node problem, minimizing pairwise connectivity) [26]. The most studied variant is the CNP, with several algorithm proposals based, for example, on integer linear programming [4], iterated local search [23], and greedy randomized adaptive search procedures [29]. In [3] an evolutionary framework is proposed that can deal with several variants of the CNDP.

The other main class is the \(\beta \)-connectivity-CNDP: given a graph \(G=(V,E)\), a connectivity metric \(\sigma \) and an integer \(\beta \), the goal is to limit the objective function \(f(\sigma )\) to \(\beta \) while minimizing the number of deleted nodes. Examples from this class are the Cardinality-Constrained CNP (CC-CNP) [6] and the Component-Cardinality-Constrained CNP (3C-CNP) [20].

In an early work [40], edge deletion was studied in the case of maximum flow networks. In [37], the edge interdiction clique problem is introduced, where edges must be removed so that the size of the maximum clique in the remaining graph is minimized; an exact algorithm is proposed to solve it for small graphs. In [15], a branch-and-cut algorithm is proposed to solve the problem.

In the matching interdiction problem [43] the weight of the maximum matching in the remaining graph after deleting edges or nodes needs to be minimized. In the same article, a pseudo-polynomial algorithm is proposed.

A recent article [9] discusses and analyzes several online node and edge deletion problems. In [27] vertex and edge protection is proposed to stop a spreading process.

Simultaneous deletion of nodes and edges has, to the best of our knowledge, appeared only in two variants: in [39] a joint region identification is proposed, and in [11] the \(\beta \)-connectivity CNDP is studied in a setting where both nodes and edges can be deleted.

3 Combined Critical Node and Edge Detection Problem

The critical node and edge detection problem (CNEDP) consists of finding a set \(W\subseteq V\) of k nodes and a set \(F \subseteq E\) of l edges in a given graph \(G=(V,E)\) whose deletion maximally degrades the graph according to a given measure \(\sigma \). We denote this introduced problem by (k, l)-CNEDP.

In this article we study pairwise connectivity as the network connectivity measure. The objective function to be minimized is the following:

$$\begin{aligned} f(A)=\sum _{C_i \in G[V\backslash A]}\frac{\delta _{i}(\delta _{i}-1)}{2}, \end{aligned}$$
(1)

where \(A\) denotes the set of deleted nodes and edges, the \(C_i\) are the connected components of the remaining graph after the deletion, and \(\delta _i\) is the size of the connected component \(C_i\).

Remark 1

It is obvious that (k, 0)-CNEDP reduces to the CNDP, and (0, l)-CNEDP reduces to the critical edge detection problem (CEDP).

Example 1

Let us consider the graph presented in Fig. 1. For the (1, 1)-CNEDP, deleting node 6 and the edge between nodes 3 and 4 leaves the node set \(A=\{1,2,3,4,5,7\}\), with connected components \(C_1=\{1,2,3\}\), \(C_2=\{4\}\), \(C_3=\{5,7\}\) of sizes \(\delta _1=3\), \(\delta _2=1\), \(\delta _3=2\), so

$$f(A)=\frac{3(3-1)}{2}+\frac{1(1-1)}{2}+\frac{2(2-1)}{2}=4$$
Fig. 1. A simple graph with 7 nodes
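The computation in Example 1 can be reproduced with a short script. Since Fig. 1 is not reproduced here, the edge list below is a hypothetical reconstruction that is consistent with the components stated in the example:

```python
from collections import defaultdict

def pairwise_connectivity(nodes, edges):
    """Sum of delta_i * (delta_i - 1) / 2 over connected components (Eq. 1)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, total = set(), 0
    for s in nodes:
        if s in seen:
            continue
        # Depth-first search collecting one connected component
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend((adj[u] & nodes) - comp)
        seen |= comp
        total += len(comp) * (len(comp) - 1) // 2
    return total

# Hypothetical edge set consistent with Example 1 (Fig. 1 not shown)
edges = {(1, 2), (2, 3), (3, 4), (4, 6), (5, 6), (6, 7), (5, 7)}
nodes = set(range(1, 8))

# Delete node 6 and edge (3, 4)
remaining_nodes = nodes - {6}
remaining_edges = {(u, v) for u, v in edges
                   if {u, v} <= remaining_nodes and (u, v) != (3, 4)}
print(pairwise_connectivity(remaining_nodes, remaining_edges))  # -> 4
```

The three components \(\{1,2,3\}\), \(\{4\}\), \(\{5,7\}\) contribute 3, 0 and 1 pairs respectively, matching the value \(f(A)=4\) computed above.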

Remark 2

The (k, l)-CNEDP is NP-complete. In [4] it is proved that the decision variant of the CNP is NP-complete, and the CNP is a special case of the (k, l)-CNEDP.

4 Methods

We propose two algorithms to solve the (k, l)-CNEDP, presented in the following.

4.1 Greedy Algorithm

In the framework proposed in [3] three non-evolutionary greedy solutions are presented to solve the CNDP. We adapted the second algorithm to fit the CNEDP, using the pairwise connectivity in the greedy decision-making process. The greedy selection is made based on the following function:

$$GR2(S_X)=\mathop {argmax}\{f(S_X)-f(S_X \cup \{t\}) : t \in X \setminus S_X \} $$

where \(S_X\) is one of \(S_{nodes}\) or \(S_{edges}\), and X represents respectively the original set of nodes or edges of the network.

The algorithm selects the nodes and edges that minimize the objective function and, depending on the values of k and l, picks a single node or edge at random among them, which is then removed from the network. The procedure is repeated until the maximal number of removed nodes and edges is reached. Since the selection is made randomly from the pre-selected components that have maximal impact on pairwise connectivity, the algorithm should be run several times to achieve the best possible results. According to [3] the number of iterations should be set to \(|S_{nodes}|^2\); accordingly, we set this value to \((|S_{nodes}|+|S_{edges}|)^2\). After a solution is reached, the original framework recommends executing the other greedy selection rule to minimize the number of removed network components.

$$GR1(S_X)=\mathop {argmax}\{f(S_X \setminus \{t\})-f(S_X) : t \in S_X \} $$

Since the pairwise connectivity is driven by both the removed nodes and edges, and because of the specific structure of the (k, l)-CNEDP, we consider that this step is not mandatory: the reintroduction of the removed components is too resource-intensive, but it can be executed if required. The outline of the algorithm is presented in Algorithm 1.

The total number of fitness function executions can be approximated, since the method stops when k nodes and l edges have been removed. In each iteration the GR2 rule computes the fitness of the network resulting from the removal of each remaining node or edge, one at a time, and selects the items with the best fitness value. Let \(D_{average}\) be the average node degree; the removal of a node decreases the number of fitness calculations in the next iteration by \(1+D_{average}\), and the removal of an edge decreases it by 1. The approximate number of fitness executions is then \(k(V+E)+\frac{1}{2}k(k+1)(1+D_{average})+\frac{1}{2}l(2V-l)\).
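The greedy loop described above can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the helper `pc` computes pairwise connectivity, and ties among best candidates are broken at random, as in the GR2 rule:

```python
import random
from collections import defaultdict

def pc(nodes, edges):
    """Pairwise connectivity: sum of size*(size-1)/2 over components."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, total = set(), 0
    for s in nodes:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend((adj[u] & nodes) - comp)
        seen |= comp
        total += len(comp) * (len(comp) - 1) // 2
    return total

def greedy_cnedp(nodes, edges, k, l):
    """GR2-style greedy for (k, l)-CNEDP: repeatedly evaluate every
    remaining node and edge and remove one candidate with the maximal
    drop in pairwise connectivity, chosen at random among ties."""
    nodes, edges = set(nodes), set(edges)
    removed_nodes, removed_edges = set(), set()
    while len(removed_nodes) < k or len(removed_edges) < l:
        candidates = []
        if len(removed_nodes) < k:
            candidates += [('node', v) for v in nodes]
        if len(removed_edges) < l:
            candidates += [('edge', e) for e in edges]
        if not candidates:
            break
        base = pc(nodes, edges)
        def drop(c):
            kind, x = c
            if kind == 'node':
                return base - pc(nodes - {x}, {e for e in edges if x not in e})
            return base - pc(nodes, edges - {x})
        best = max(drop(c) for c in candidates)
        kind, x = random.choice([c for c in candidates if drop(c) == best])
        if kind == 'node':
            nodes.discard(x)
            edges = {e for e in edges if x not in e}
            removed_nodes.add(x)
        else:
            edges.discard(x)
            removed_edges.add(x)
    return removed_nodes, removed_edges
```

On a path graph 1-2-3-4-5 with \(k=1\), \(l=0\), the sketch removes the middle node 3, the unique choice with maximal connectivity drop.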


4.2 Genetic Algorithm

We designed a simple genetic algorithm; its encoding, fitness evaluation and operators are presented next. The outline of the genetic algorithm is described in Algorithm 2.

Encoding: An individual is represented by two lists, one for nodes and the other for edges.

Fitness: Each individual is evaluated according to the pairwise connectivity of the connected components in the network, after the removal of the individual’s nodes and edges.

Crossover and Parent Selection: Parents are chosen using tournament selection. In each round of the crossover tournament, the algorithm randomly samples a fixed number of individuals from the population and keeps only the two best according to their fitness. These two reproduce by combining their node and edge lists: the pooled lists are split randomly and evenly, subject to restrictions such as uniqueness. This way two children are generated in each tournament round; at the end of the crossover tournament, a set of new individuals has been created (Algorithm 3).
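One round of this tournament crossover can be sketched as below. The function and parameter names are ours, not from the paper; an individual is a pair of a node list and an edge list, following the encoding above, and lower fitness is better since pairwise connectivity is minimized:

```python
import random

def tournament_crossover(population, fitness, tournament_size=4):
    """One crossover-tournament round: select the two fittest individuals
    of a random sample, then split their pooled node and edge lists into
    two children of the same (k, l) shape as the parents."""
    sample = random.sample(population, tournament_size)
    p1, p2 = sorted(sample, key=fitness)[:2]  # lower fitness is better
    k, l = len(p1[0]), len(p1[1])
    # Pool the parents' genes; deduplication keeps genes unique in a child
    node_pool = list(set(p1[0]) | set(p2[0]))
    edge_pool = list(set(p1[1]) | set(p2[1]))
    random.shuffle(node_pool)
    random.shuffle(edge_pool)
    child1 = (node_pool[:k], edge_pool[:l])
    child2 = (node_pool[-k:], edge_pool[-l:])
    return child1, child2
```

If the parents share genes, the pooled lists overlap, so the two children may also share some nodes or edges; uniqueness is only enforced inside each individual, as the text requires.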

Mutation: Two types of mutation are used. The first one is done by randomly replacing either a node or an edge in the offspring. The chance of either selection is 50%. Our new node or edge selection takes into account uniqueness inside an individual.

The second mutation is a time-varying operator. In the first step 50% of the \(k+l\) nodes and edges are changed. The number of changes decreases linearly until half of the maximum generation number is reached, after which it will equal one.
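The schedule of the time-varying operator can be written as a small helper (a sketch with our own names; the paper specifies only the endpoints, so the linear interpolation below is one natural reading):

```python
def mutation_strength(generation, max_generations, k, l):
    """Number of genes to replace: starts at 50% of k + l and decreases
    linearly, reaching one change at half of the maximum generation
    count and staying there afterwards."""
    start = 0.5 * (k + l)
    half = max_generations / 2
    if generation >= half:
        return 1
    # Linear decay from `start` down to 1 over the first half of the run
    return max(1, round(start - (start - 1) * generation / half))
```

For example, with \(k+l=16\) and 5000 generations, the operator changes 8 genes at generation 0 and a single gene from generation 2500 onwards.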

Although strict restrictions hold for each individual in the population, a repair operator is not necessary, since both the crossover and mutation operators repair any damage they may cause to an individual.

Selection of Survivors: The algorithm combines the original population and all newly created (possibly mutated) children into one larger set of individuals, then trims this set back to the original population size by elitism (keeping the best individuals); the result becomes the population of the next iteration (a \((\mu +\lambda )\) selection scheme).
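With this scheme, survivor selection reduces to sorting and truncation (a sketch with our own names; lower fitness is better):

```python
def select_survivors(population, offspring, fitness, mu):
    """(mu + lambda) elitism: keep the mu best individuals out of the
    combined set of parents and (possibly mutated) children."""
    return sorted(population + offspring, key=fitness)[:mu]
```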


4.3 Experimental Set-Up

Benchmarks. The synthetic benchmark set proposed in [38] contains three types of graphs with different basic properties: Barabási-Albert (BA) scale-free networks, Erdős-Rényi (ER) random networks, and Forest-fire (FF) graphs, which simulate how fire spreads through a forest. Table 1 describes basic network measures of the benchmarks employed: number of nodes (|V|), number of edges (|E|), average degree (\(\langle d \rangle \)), density of the graph (\(\rho \)), and average path length (\(l_G\)).

Table 1. Synthetic benchmark test graphs and basic properties.

In Table 2 the set of real networks used for numerical experiments is presented, including the source of the network.

Parameter Setting. To find a good parameter configuration, 16 parameter settings were tested on four networks: two synthetic (ER250 and FF250) and two real-world networks (dolphins and karate). Table 3 presents the tested parameter settings, and Fig. 2 presents the obtained results. Based on a Wilcoxon non-parametric statistical test, configuration S11 was chosen for the further experiments.

Table 2. Real graphs and basic properties.
Table 3. Parameter setting used for parameter tuning
Fig. 2. Pairwise connectivity values over ten independent runs for four networks for 16 different parameter configurations

The number of critical nodes (k) is set to \(5\%\) of the total number of nodes, while the number of critical edges (l) is set to \(3\%\) of the total number of edges (the proportions are kept general, to emphasize critical nodes and edges on different types of networks). The maximum generation number for both GA variants was set to 5000.

4.4 Results and Discussion

An example of the evolution of the fitness value of the genetic algorithm is presented in Fig. 3, where the change of the values in each step can be observed.

For better understanding, Fig. 4 presents the smallest network, the Zebra network, and critical nodes and edges detected with the genetic algorithm.

Table 4 presents the results obtained with the genetic algorithm (\(GA_1\)), the genetic algorithm with time-varying mutation (\(GA_2\)) and the greedy algorithm. \(GA_1\) and \(GA_2\) outperformed the greedy algorithm in most cases. Only in the case of the Football network did all algorithms perform identically (with standard deviation equal to 0); however, for this network the values of k and l were too small, as there was no change from the initial population in \(GA_1\) and \(GA_2\). The incorporation of time-varying mutation did not significantly improve the results.

Regarding the running times of the two methods (\(GA_1\) and greedy), on small networks the greedy algorithm is, as expected, faster (e.g. for the dolphins network the greedy algorithm runs in \(2.47\pm 0.02\) s versus \(183.64 \pm 0.48\) s for \(GA_1\)), but on larger networks \(GA_1\) has the better running time (e.g. for the FF500 network the greedy algorithm runs in \(1420.66 \pm 63.48\) s on average, versus \(1294.3\pm 25.15\) s for \(GA_1\)).

4.5 Application: New Network Robustness Measure Proposal

As an application of critical node and edge detection, we introduce a new network robustness measure. Several robustness measures exist in the literature, each trying to capture different properties of networks. For example, [13] describes the following measures for characterizing network robustness: vertex connectivity (\(k_v\)) - the minimal number of vertices that need to be removed to disconnect the graph; edge connectivity (\(k_e\)) - the same measure for edges; the diameter of the graph (d); the average distance (\(\bar{d}\)); the average efficiency (E); the maximum edge betweenness (\(b_{e_m}\)); the average vertex betweenness (\(b_v\)); and the average edge betweenness (\(b_e\)) - the latter four measures being based on shortest paths. The average clustering coefficient (C) is the proportion of triangles among connected triples. Algebraic connectivity (\(\lambda _2\)) is the second smallest eigenvalue of the Laplacian matrix of G, the number of spanning trees (\(\epsilon \)) counts the possible different spanning trees of the graph, while the effective graph resistance (Kirchhoff index, R) treats the graph as an electrical circuit.

Fig. 3. Error bars of 10 independent runs representing fitness values over evaluations on two example graphs: an Erdős-Rényi graph and the Dolphins network

We study several real-world networks from different application fields. Two infrastructure networks: UsAir97 [31], a weighted undirected network with 332 nodes and 2126 edges (we do not take the weights into account) containing flights in the US in the year 1997 (nodes are airports and edges represent direct flights between them), and a road network [19, 35] containing international E-roads, with nodes representing cities (1174 nodes) and edges representing direct E-road connections between them (1417 edges). Two brain networks are studied: a mouse visual cortex network [2] with 123 nodes and 214 edges, and a cat-mixed-brain-species-1 network [2] with 65 nodes and 1100 edges (we use the abbreviations Mouse_cortex and Cat_brain in the following). Two power networks are studied: 494-bus [31] (494 nodes and 586 edges) and 662-bus [31] (662 nodes and 906 edges); two interaction networks (Infect-dublin [34] with 410 nodes and 2800 edges and Infect-hyper [34] with 113 nodes and 2200 edges); and a computer network, Route_views [19] (6474 nodes and 13895 edges).

The above-mentioned network measures were calculated for the studied networks, as presented in Table 5. As can be seen, the majority of the indices cannot be computed for disconnected networks (in this example, the E-road network is disconnected); this is one of the motivations for introducing a new network robustness measure based on the (k, l)-CNEDP.

The introduced measure (\(NE_{k,l}\)) has the following form:

$$NE_{k,l}=\frac{2\cdot (k,l)\text {-CNEDP}}{(n-k-1)(n-k-2)}$$

where \(\frac{(n-k-1)(n-k-2)}{2}\) is the worst possible value of pairwise connectivity after deleting k nodes and n is the number of nodes in the original network; thus \(NE_{k,l} \in [0,1]\).
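The measure is a direct normalization of the optimized objective value and can be sketched as follows (our own function name; `cnedp_value` stands for the optimized (k, l)-CNEDP pairwise connectivity, obtained e.g. with the algorithms of Sect. 4):

```python
def robustness_ne(cnedp_value, n, k):
    """NE_{k,l}: optimized pairwise connectivity normalized by the worst
    possible value (n-k-1)(n-k-2)/2 after deleting k nodes; in [0, 1]."""
    worst = (n - k - 1) * (n - k - 2) / 2
    return cnedp_value / worst
```

For the graph of Example 1 with \(n=7\), \(k=1\) and optimized pairwise connectivity 4, this gives \(NE_{1,1}=4/10=0.4\).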

In the case of the USAir97 network, for example:

$$NE_{21,6}=\frac{2\cdot 35264}{(332-21-1)(332-21-2)}=0.66.$$

\(NE_{k,l}\) can be seen as a measure that quantifies network robustness based on the deletion of nodes and edges. To analyse the results, a correlation matrix was built (without the results of the E-road network); the new measure \(NE_{k,l}\) (new m) was compared with d, \(\bar{d}\), E, \(b_{e_m}\), \(b_v\) and C. As presented in Fig. 5, a weak correlation exists between the new measure and the clustering coefficient (C).

Fig. 4. The smallest real-world network, the zebra network (left). The remaining network after node and edge deletion, i.e. after (1, 3)-CNEDP (right)

Fig. 5. Pearson's correlation coefficients between network robustness measures

Table 4. Results for synthetic and real graphs. Mean values and standard deviations over 10 independent runs are presented. A (*) indicates the best result based on a Wilcoxon signed-rank test
Table 5. Network robustness measures for studied networks

5 Conclusions

A new combinatorial optimization problem, the combined CNEDP, is defined and analyzed. Two methods are proposed to solve this problem: a greedy approach and a simple GA. Numerical experiments on both synthetic and real-world networks show the effectiveness of the proposed algorithms. As a direct application, the newly introduced problem is used to define a new network robustness measure. Further work will address other network measures (for example, maximum component size and network centrality measures) and the refinement of the GA.