1 Introduction

Graphs are employed to represent social networks in which people and objects are connected. Such modeling allows for an investigation of social networks in a convenient manner. The progressive studies on the properties of graphs offer not only interesting insights into how social beings interact but also several practical applications, such as marketing influence maximization (Lei et al. 2015), fraud detection (Akoglu and Faloutsos 2013), and product recommendation (Huang et al. 2002).

Some of the most important properties of graphs revolve around the concept of k-core (Seidman 1983). The k-core of a graph is the maximal sub-graph in which the degree of each node is at least k. The core number of a node v is the maximum integer k such that v is in the k-core. The core number has demonstrated effectiveness in indicating the centrality of nodes in a network, especially in the problems of finding influential nodes (Kitsak et al. 2010; Shin et al. 2016) and graph clustering (Mei et al. 2021).

Real-world graphs often face attacks that remove or render several parts of the network impaired (Freitas et al. 2022), and a line of work has investigated the resilience of the core structure against such attacks (Linghu et al. 2020; Medya et al. 2020; Zhu et al. 2018). That is, in these works, resilience is characterized by the ability of the core structure of a graph to maintain one or several properties after a portion of the network has been removed. These works focus on how the size of the k-core decreases or how the ranking of core numbers is altered as the consequence of removing several nodes or edges from the network. One may devise strategies to delete some nodes or edges to minimize the k-core size (Medya et al. 2020; Zhu et al. 2018; Chen et al. 2021) or supplement the network with augmented edges to consolidate the core structure (Laishram et al. 2018; Zhou et al. 2019; Linghu et al. 2020).

Despite extensive studies on the properties and robustness of graphs, much is left undiscovered for hypergraphs. Hypergraphs, which are the extension of pair-wise graphs allowing multiple nodes to be in the same hyperedge rather than just two, naturally represent group interactions that are omnipresent in practice (Benson et al. 2018a; Yin et al. 2017; Do et al. 2020; Lee et al. 2021, 2020). For example, each hyperedge may represent a publication whose co-authors are nodes in the hypergraph, an email involving several email addresses as nodes, or a discussion thread consisting of several participants. Hypergraphs have been applied in the domain of image processing (Liu et al. 2011), social networks (Tan et al. 2014; Yang et al. 2019), contagion models (Iacopini et al. 2019; de Arruda et al. 2020), electronic commerce (Zhu et al. 2016), and circuit design (Ouyang et al. 2002).

Real-world hypergraphs may also face attacks that involve removing a portion of the network (Peng et al. 2022; Ma et al. 2018) for the same reasons as graphs. Hypergraphs are abstract structures representing several types of higher-order interactions and are stored in databases for mining purposes. For instance, coauthorship data are stored in academic databases (Sinha et al. 2015), emails are saved in storage systems (Klimt and Yang 2004), and discussion threads are stored in online forums.Footnote 1 Attackers may intrude on those systems to remove several nodes or hyperedges to weaken several properties of the networks, which corresponds to deletion attacks on hypergraphs.

The concept of k-core has been proven useful also in hypergraphs, and thus attackers may aim to impair the core structure in hypergraphs. Similarly to pair-wise graphs, the k-core of a hypergraph (Sun et al. 2020) is construed as the maximal sub-hypergraph within which the degree of each node is at least k. In hypergraphs, the concept of k-cores demonstrates applications in identifying dense regions (Gabert et al. 2021b) or monitoring epidemics (Gabert et al. 2021a), and as shown in Sect. 4, hypergraph cores are also useful in several other practical applications, such as identifying seed nodes for influence maximization or detecting abnormally dense sub-networks. Invaders, hoping to degrade the performances in those tasks, may be incentivized to attack the networks via the deletions of nodes or hyperedges, for the same motivations that attackers aim to impair the core structures in graphs (Linghu et al. 2020; Medya et al. 2020; Zhu et al. 2018).

In this work, we focus on the core resilience of real-world hypergraphs. Motivated by the applications of hypergraph k-cores and the possibilities of attacks on hypergraphs, we formulate CREAM (Core-conserving REsilience mAxiMization), the problem of improving the core resilience of the hypergraphs against deletion attacks through the means of augmenting hyperedges while conserving the original core structure. We first explore the relevant patterns of core resilience of real-world hypergraphs when a portion of the node set or the hyperedge set has been removed. Based on these, we consider supplementing each hypergraph with augmented hyperedges that strengthen the core resilience of the hypergraph while preserving all core numbers. Note that supplementing hyperedges to those hypergraphs constitutes adding “virtual” hyperedge records into the respective databases to strengthen those networks. These virtual hyperedges should be constructed carefully so that they preserve the network properties. Moreover, while remaining indistinguishable from real hypergraphs to attackers, these supplemented hyperedges can be removed by database administrators whenever necessary, thus staying harmless to the network’s applications.

However, there is a major challenge in augmenting hypergraphs through the addition of hyperedges, which is due to the complexity of hypergraphs. In hypergraphs, each hyperedge may contain an arbitrary number of nodes, and thus the number of all possible node combinations, which may form augmented hyperedges, is insurmountable. As a result, the cost of iterating through each possible combination of nodes and checking whether it is desirable to add the combination would be prohibitive.

To address the challenge, we introduce COREA, a fast, effective, and theoretically sound method that augments hyperedges to preserve the core structure and improve the core resilience of the hypergraphs. Inspired by several observations related to core resilience, COREA constructs a pool of candidate hyperedges, which are guaranteed to conserve all core numbers, and selects the best candidates to augment to the hypergraph. Our experiments show that COREA is up to \(1.7\times\) more effective than several baseline approaches while providing a better time-performance trade-off.

In short, our contributions in this research are three-fold:

  • Problem definition We propose and tackle CREAM (Core-conserving REsilience mAxiMization), the problem of core resilience improvement in real-world hypergraphs, for the first time, to the best of our knowledge.

  • Key concepts and empirical observations We propose relevant concepts and present the key observations regarding the core resilience of real-world hypergraphs that motivate the design of our method.

  • Method We propose COREA, a fast, effective, and theoretically sound method for enhancing the core resilience of hypergraphs. Our extensive experiments demonstrate the consistent superiority of COREA over several baseline approaches across ten real-world hypergraphs.

For reproducibility, the code and datasets are available at https://github.com/manhtuando97/CoReA.

The remaining sections of this paper are as follows: In section 2, we review some related work. We introduce some preliminaries and problem formulation in Sect. 3. We then present some applications of core numbers in hypergraphs in Sect. 4 to motivate our work. The key observations are summarized in Sect. 5. We propose our method in Sect. 6. We evaluate our method in Sect. 7, where we also investigate how our proposed method helps support the applications of hypergraph core numbers in the tasks outlined in Sect. 4 under various attack scenarios. Lastly, we conclude our work in Sect. 8.

2 Related work

Hypergraphs: Hypergraphs represent high-order interactions in various fields (Benson et al. 2018a; Yin et al. 2017; Do et al. 2020). There have been numerous studies on the structures and properties of real-world hypergraphs regarding transitivity (Kim et al. 2023), reciprocity (Kim et al. 2022), simplicial closures (Benson et al. 2018a), motifs (Lee et al. 2020), evolution patterns (Kook et al. 2020; Benson et al. 2018b), and realistic generative models (Do et al. 2020; Lee et al. 2021; Kim et al. 2023, 2022; Giroire et al. 2022; Benson et al. 2018b). Meanwhile, some others tackle several learning problems on hypergraphs, such as clustering (Rota Bulò and Pelillo 2013; Li and Milenkovic 2017; Amburg et al. 2020), link prediction (Kumar et al. 2020; Yadati et al. 2020; Hwang et al. 2022), and node classification (Feng et al. 2019; Yadati et al. 2019; Chien et al. 2022).

\(\textit{k-Core in graphs and hypergraphs}:\) The concept of k-core plays an integral role in the graph mining domain. It is used to detect dense subgraphs and influential nodes in (Shin et al. 2016), whereas Giatsidis et al. (2011) employ this concept to evaluate the cooperation within a community in social networks. Some other problems on k-cores include scalable core decomposition (Li et al. 2013; Aridhi et al. 2016), its maintenance on dynamic graphs (Lin et al. 2021), and core decomposition on uncertain graphs (Peng et al. 2018). On the other hand, little attention has been paid to the k-cores of hypergraphs. Some preliminary work focus on scalable maintenance of k-cores in dynamic hypergraphs (Gabert et al. 2021a; Sun et al. 2020) or how the concept of k-core in hypergraphs is applied in discovering dense components in social networks (Gabert et al. 2021b).

Core resilience: Medya et al. (2020) define the resilience of a k-core as its ability to maintain its nodes. After many edges are deleted, several nodes can lose their core numbers, and the size of the k-core can be reduced. Several studies attempt to minimize the number of remaining nodes in the k-core by deleting edges (Medya et al. 2020; Zhu et al. 2018) or removing nodes (Zhang et al. 2017). In contrast, some others enhance the resilience of the k-core against such attacks by anchoring nodes, i.e considering some nodes as having an infinite degree (Bhawalkar et al. 2015; Laishram et al. 2020; Linghu et al. 2020). Following a different approach, Laishram et al. (2018) define core resilience as the rank correlation of nodes in core numbers after several nodes or edges have been deleted. The authors correlate this statistic with several node-level measurements and design an algorithm to enhance the core resilience via adding edges.

In this work, we tackle the problem of improving the core resilience in hypergraphs. We adopt the same notion of hypergraph k-cores in (Leng et al. 2013) and core resilience in (Laishram et al. 2018. To this end, we extend the existing concepts of core strength and core influence in this work from graphs to hypergraphs, introduce new relevant concepts, and design an algorithm for the core resilience improvement problem. In this problem, we face new challenges unique to the complexity of hypergraphs, outlined in Sect. 3.2, and propose our method to address these challenges. The details for our technical contributions are presented in Sects. 5 and 6.

3 Preliminaries and problem definition

3.1 Basic concepts

We introduce some basic concepts. The key notations are in Table. 1.

Hypergraphs: A hypergraph is defined as \(\textbf{G}=(\textbf{V},\textbf{E})\), where \(\textbf{V}\) is the set of nodes, and \(\textbf{E}\subseteq 2^{\textbf{V}}\) is the set of hyperedges. Each hyperedge \(e \subseteq V\) is a set of \(|e |\) (\(\ge 2\)) nodes.Footnote 2 For each node v, we define the set \(\textbf{E}_{\textbf{G}}(v)\) of hyperedges incident to v as \(\textbf{E}_{\textbf{G}}(v)=\{e \in \textbf{E}\mid v \in e \}\). The degree \(d_{\textbf{G}}(v)\) of v is defined as the number of hyperedges incident to v, i.e., \(d_{\textbf{G}}(v) = |\textbf{E}_{\textbf{G}}(v) |\). A node having degree 0 is an isolated node. A sub-hypergraph \(\tilde{\textbf{G}}=(\tilde{\textbf{V}}, \tilde{\textbf{E}})\) of \(\textbf{G}\) is a hypergraph (i.e., \(\tilde{\textbf{E}} \subseteq 2^{\tilde{\textbf{V}}}\)) where \(\tilde{\textbf{V}} \subseteq \textbf{V}\), and \(\tilde{\textbf{E}} \subseteq \textbf{E}\).

Clique expansion: The clique expansion of hypergraph \(\textbf{G}=(\textbf{V}, \textbf{E})\) is a graph \(\textbf{G}_{(1)}=(\textbf{V}, \textbf{E}_{(1)})\) where \(\textbf{E}_{(1)} = \{\{u,v\} \mid u, v \in \textbf{V}, \exists e \in \textbf{E}, \{u,v\} \subseteq e \}\). That is, \(\textbf{G}_{(1)}\) is a graph in which two nodes \(u,v \in \textbf{V}\) are adjacent if and only if there exists a hyperedge e in \(\textbf{E}\) containing both u and v. A hyperedge \(e \in \textbf{E}\) results in a clique of \(|e|\) nodes in \(\textbf{G}_{(1)}\). The clique expansion is a representation of the hypergraph in the form of a pair-wise graph. However, this representation incurs information loss as the original hypergraph \(\textbf{G}\) cannot be reconstructed from \(\textbf{G}_{(1)}\) and two different hypergraphs may result in the same clique expansion, as depicted in Fig. 1.

Fig. 1
figure 1

The clique expansion of a hypergraph is a pair-wise graph in which two nodes are adjacent if and only if there exists at least one hyperedge of the original hypergraph containing them. This representation is lossy, as the original hypergraphs cannot be reconstructed from the clique expansion and different hypergraphs may have the same clique expansion

Table 1 Frequently used symbols

k-Core and core numbers: The k-core of \(\textbf{G}\), denoted by \(\textbf{C}(k,\textbf{G})\), is the sub-hypergraph of \(\textbf{G}\) within which the degree of every node is at least k (Leng et al. 2013). The core number \(N_{\textbf{G}}(v)\) of node v in hypergraph \(\textbf{G}\) is the maximum integer k such that v is in \(\textbf{C}(k,\textbf{G})\). The degeneracy \(N_{\textbf{G}}^{*}\) of hypergraph \(\textbf{G}\) is the highest core number of a node \(v \in \textbf{V}\). The degeneracy core of \(\textbf{G}\) is the \(N_{\textbf{G}}^{*}\)-core of \(\textbf{G}\), denoted by \(\textbf{C}(N_{\textbf{G}}^{*}, \textbf{G})\).

Core decomposition: Core decomposition is the process of obtaining the k-cores and core numbers of nodes in a hypergraph \(\textbf{G}\) (Algorithm 3 in Appendix 1). After removing all isolated nodes, the remaining hypergraph is the 1-core. For each \(k \ge 1\), to obtain the \((k+1)\)-core, a pruning process starts from the k-core and repeatedly removes the nodes of degrees lower than \((k+1)\) until no such removal is possible. The nodes removed in this pruning process are assigned core number k.

Node and hyperedge deletions: Attackers often seek to weaken the structure of \(\textbf{G}\) by deleting several nodes or hyperedges (Peng et al. 2022; Ma et al. 2018). We denote a deletion attack as \(\textsf{A}^{\textbf{V}}(r, s)\) that deletes \(r\%\) of the nodes in \(\textbf{V}\) by a strategy s. Similarly, \(\textsf{A}^{\textbf{E}}(r, s')\) is an attack that deletes \(r\%\) of the hyperedges in \(\textbf{E}\) by a strategy \(s'\). We introduce several potential attack strategies that attackers may employ in Sect. 5.2.

Spearman’s rank correlation: Spearman’s rank correlation is a measurement of rank correlation between two variables. Let \(X = [x_1,...,x_n]\) and \(Y=[y_1,...,y_n]\) be two variables. Let \(R(X)=[\tau _X(x_1),...,\tau _X(x_n)]\) be the rank variable of X in which \(\tau _X(x_i)\), for \(i=1,...,n\), is the relative ranking position of \(x_i\) when the values in \(\{x_1,...,x_n\}\) are sorted in the descending orderFootnote 3. Simiarly, let \(R(Y)=[\tau _Y(y_1),...,\tau _Y(y_n)]\) be the rank variable of Y. The Spearman’s rank correlation between X and Y, denoted as \(\rho (R(X), R(Y))\), equals to the Pearson correlation coefficient of R(X) and R(Y), i.e.,

$$\begin{aligned} \rho (R(X), R(Y))=\frac{\text {cov}(R(X), R(Y))}{\sigma _{R(X)}\sigma _{R(Y)}}, \end{aligned}$$
(1)

where \(\text {cov}(R(X), R(Y))\) is the covariance of R(X) and R(Y); \(\sigma _{R(X)}\) and \(\sigma _{R(Y)}\) are the standard deviations of R(X) and R(Y), respectively. We have \(-1 \le \rho (R(X), R(Y)) \le 1\), with \(\rho (R(X), R(Y))=1\) when R(X) and R(Y) are identical and \(\rho (R(X), R(Y))=-1\) when R(X) and R(Y) are fully opposed.

Core resilience: The core resilience \({\mathcal {R}}^{\textbf{V}}_{\textbf{G}}(r, s)\) against node deletions of a hypergraph \(\textbf{G}\) is defined as the Spearman’s rank correlation coefficient of the core numbers of the nodes before and after \(r \%\) of the nodes have been deleted from \(\textbf{V}\) by attack \(\textsf{A}^{\textbf{V}}(r, s)\). After several nodes are deleted alongside their incident hyperedges, there remains a sub-hypergraph \(\tilde{\textbf{G}}=(\tilde{\textbf{V}}, \tilde{\textbf{E}})\) in which some of the remaining nodes may lose their original core numbers, potentially distorting the ranking of core numbers. Denote the original and post-attack core numbers of the remaining nodes \(\tilde{V}=\{v_{i_1},...,v_{i_m}\}\) as \(N_{\textbf{G}}= [N_{\textbf{G}}(v_{i_1}),...,N_{\textbf{G}}(v_{i_m})]\) and \(N_{\tilde{\textbf{G}}} = [N_{\tilde{\textbf{G}}}(v_{i_1}),...,N_{\tilde{\textbf{G}}}(v_{i_m})]\)Footnote 4, respectively. The core resilience \({\mathcal {R}}^{\textbf{V}}_{\textbf{G}}(r,s)\) is defined as the Spearman’s rank correlation between \(N_{\textbf{G}}\) and \(N_{\tilde{\textbf{G}}}\), which is equal to \(\rho (R(N_{\textbf{G}}), R(N_{\tilde{\textbf{G}}}))\). Similarly, the core resilience \({\mathcal {R}}^{\textbf{E}}_{\textbf{G}}(r, s')\) against hyperedge deletions of a hypergraph \(\textbf{G}\) is defined as the Spearman’s rank correlation coefficient of the core numbers of the nodes before and after \(r \%\) of the hyperedges have been deleted from \(\textbf{E}\) by attack \(\textsf{A}^{\textbf{E}}(r, s')\).

These definitions of core resilience against node deletions and hyperedge deletions are adopted from (Laishram et al. 2018) and extended to hypergraphs. The core number serves as a measure of node centrality (Shin et al. 2016; Laishram et al. 2018), and core resilience measures the tendency of central (or peripheral) nodes to remain central (or peripheral) after the network faces node/hyperedge deletions.

3.2 Problem definition

In this section, we aim to establish a clear understanding of the problem at hand. First, we present a formal definition of the problem. Then, we discuss its objective and constraints. Lastly, we discuss the challenges associated with this problem and discuss its relevance to existing problems.

Problem 1

(CREAM: Core-conserving REsilience mAxiMization)

  • Input: a hypergraph \(\textbf{G}=(\textbf{V},\textbf{E})\) with the hyperedge size distribution D and a budget B,

  • Find: b hyperedges: \(\overline{\textbf{E}} = \{e_1,...,e_b\}\) where \(\overline{\textbf{E}} \subseteq 2^{\textbf{V}}\) and \(\overline{\textbf{E}} \cap \textbf{E}= \emptyset\) to augment to \(\textbf{G}\) to form \(\textbf{G}'=(\textbf{V}, \textbf{E}')\) with \(\textbf{E}'=\textbf{E}\cup \overline{\textbf{E}}\),

  • to Maximize: the core resilience \({\mathcal {R}}_{\textbf{G}'}^{\textbf{T}}(r,s)\) of \(\textbf{G}'\) in a case of attack \(\textsf{A}^{\textbf{T}}(r, s)\), whose target \(\textbf{T}\), degree r and strategy s are unknown in advance (\(\textbf{T}\) is either \(\textbf{V}\), for a node deletion attack, or \(\textbf{E}'\), for a hyperedge deletion attack)

  • Subject to constraints

    • all core numbers of nodes are conserved, i.e., \(N_{\textbf{G}}(v) = N_{\textbf{G}'}(v)\),

    • the hyperedges are augmented within the budget, i.e., \(b \le B\),

    • the size distribution of the hyperedges in \(\overline{\textbf{E}}\) follows D.

Objective: As shown later in Sect. 4, the ranking of core numbers is useful in several applications. Such ranking may be distorted once several nodes or hyperedges are deleted from the hypergraph. Thus, we wish to preserve such ranking under deletion attacks by improving the core resilience.

Constraints on cores numbers: The goal is to consolidate the resilience of the core structure, so it is essential to avoid distorting the core structure, to begin with. Also, we augment hyperedges as a pre-caution measure without any prior knowledge of the attack, so the augmented hyperedges should preserve the core structure even in the case that the attack would not occur. These justify the constraint of preserving the core numbers.

Constraints on the number of augmented hyperedges: While Problem 1 allows a maximum budget of B, in order to satisfy the requirement of preserving all core numbers, the actual number b of hyperedges that any method can augment to \(\textbf{G}\) can be smaller than B.

Constraints on the size of augmented hyperedges: In addition, since the augmented hyperedges should not be easily distinguishable from the real hyperedges or harmful to the network properties, the original hyperedge size distribution should be preserved. Moreover, if the size distribution of the augmented hyperedges deviates significantly from the real distribution, they become easily noticeable to the attackers, which may enable them to deliberately ignore all the augmented hyperedges that they deem unrealistic prior to any attacks, which renders the augmentation unavailing.

Related problems and unique challenges: A similar problem in graphs is defined in (Laishram et al. 2018), where the authors propose a method named MRKC. Among all pairs of non-adjacent nodes, MRKC retains those guaranteed to preserve all core numbers when they are added to the network while discarding the others. MRKC then ranks all the retained pairs by a certain metric and greedily selects the one with the highest score to be added to the network. A naive extension of MRKC to hypergraphs is to check all combinations of nodes that are not actual hyperedges and select only those that preserve all core numbers, However, the number of the combinations is in the order of \(\mathcal {O}(2^{|\textbf{V}|})\), which is huge in practice. Therefore, the cost of checking all possible node combinations is prohibitive and renders this approach impractical. This proves the challenge of Problem 1. We address this challenge by our method, COREA, in Sect. 6.

4 Motivating applications

In this section, we present two applications of the concept of k-core on hypergraphs, in the identification of influential nodes and anomaly detection, to motivate our studies on the core resilience of hypergraphs. Due to the importance of k-cores, attackers are often incentivized to impair the core structures of pair-wise graphs (Linghu et al. 2020; Medya et al. 2020; Zhu et al. 2018). Similarly, hypergraph core structures, proving useful in these applications, are vulnerable to deletion attacks. As subsequently shown in Sect. 7.6, the usefulness of hypergraph cores in those tasks is degraded when the networks face deletion attacks, and our proposed method helps mitigate such degradation.

4.1 Identification of influential nodes

The concept of core number in graphs has proven useful in finding influential nodes in social networks (Kitsak et al. 2010; Shin et al. 2016). We generalize the SIR model in (Kitsak et al. 2010) to hypergraphs (see Algorihm 4 in Appendix 2). In each dataset, we start with one seed node (initially infected), simulate the SIR process, and measure the number of ever-infected nodes (i.e., recovered nodes) as the influence of the seed node.

In Table 2, we report the Spearman’s rank correlation coefficient between the node influences and each of the following node-level statistics:

  • Core: the core numbers in the original hypergraph.

  • Degree: the degrees in the original hypergraph.

  • Clique-C: the core numbers in the clique expansion of the hypergraph.

  • Clique-D: the degrees in the clique expansion of the hypergraph.

Among them, the core number in hypergraph is the most correlated with the individual nodes’ influences, demonstrating the usefulness of hypergraph core number ranking in finding influential nodes in real-world hypergraphs.

Table 2 The Spearman’s rank correlation coefficient between the nodes’ influences and the statistics

4.2 Anomaly detection

Shin et al. (2016) introduce an effective scoring function to detect abnormally dense subgraphs. The scoring function employed to measure the abnormality of node v is the difference in the rankings of v in core number and degree, specifically, \(s(v) = |\log (rank_c(v)) - \log (rank_d(v))|\).

In each hypergraph, we select k nodes uniformly at random as abnormal nodes and inject \(\left\lceil\left\lceil {\frac{k(k-1)}{m}}\right\rceil\right\rceil\) hyperedges of size m in which each of the abnormal nodes is incident to \((k-1)\) hyperedges, with m is the maximum hyperedge size of the hypergraph. Each abnormal node now has core number and degree at least \((k-1)\). We use the score s(v) to estimate how abnormal each node v is in the two settings:

  • Core: \(rank_c(v)\) and \(rank_d(v)\) are the rankings of v in core number and degree in the hypergraph, respectively.

  • Clique-C: \(rank_c(v)\) and \(rank_d(v)\) are the rankings of v in core number and degree in the clique expansion of the hypergraph, respectively.

The AUC-PR of predicting which nodes are the abnormal nodes based on the score s(v) in the two settings Core and Clique-C is reported in Fig. 2. Using core numbers in hypergraphs yields better prediction than using the core numbers in the clique expansion, showing the usefulness of the concept of hypergraph core numbers, particularly the ranking of core numbers.

Fig. 2
figure 2

The AUC-PR of predicting the abnormal nodes. Employing core numbers in hypergraphs results in a more accurate prediction than core numbers in the clique expansions

5 Proposed concepts and observations

The objective of Problem 1, core resilience, is a hypergraph-level measurement, and it is difficult to optimize directly for two major reasons. Firstly, measuring the core resilience is computationally expensive as we need to conduct core decomposition on the original hypergraph, apply a deletion attack, and administer core decomposition again on the attacked networks. Furthermore, due to the unpredictable nature of attacks, it is impossible to anticipate their magnitude and strategy accurately. This lack of foresight hinders the precise computation of core resilience for our network. Thus, we define several node-level and hyperedge-level measurements to characterize the core resilience so that we can improve the core resilience indirectly via these measurements. In this section, we introduce such measurements and show that they are effective indicators of the core resilience of real-world hypergraphs via several empirical observations.

5.1 Proposed concepts

We introduce a number of concepts that are related to core resilience. These concepts serve as the foundation for the observations made in Sect. 5 and our proposed method presented in Sect. 6.

5.1.1 Hyperedge core number and anchor

As we wish to augment the hyperedges that preserve the core numbers, we seek to unravel how hyperedges contribute to the core numbers of nodes. While a node relies on having enough incident hyperedges for its core number, by definition, the existence of a hyperedge may not contribute to the core numbers of all of its incident nodes. In the core decomposition process, when a node v of core number k is removed, its incident hyperedges are also removed. If e is one of those hyperedges, e is not incident to any nodes of core numbers smaller than k; otherwise, e would have been removed before v. Moreover, e cannot contribute to the core numbers of the incident nodes whose core numbers are higher than k as e is not present in the core levels higher than k. In other words, e only helps contribute to the core numbers of the incident nodes whose core numbers are equal to k, and we refer to them as the anchors of e.

Core number of a hyperedge e: It is the maximum integer k such that e is in \(\textbf{C}(k,\textbf{G})\) and denoted by \(\overline{N_{\textbf{G}}}(e)\). In the pruning process to obtain the \((k+1)\)-core from the k-core, a node is removed along its incident hyperedges. Therefore, \(\overline{N_{\textbf{G}}}(e)\) is equal to the lowest core number of a node included in e: \(\overline{N_{\textbf{G}}}(e) = \min _{v \in e} N_{\textbf{G}}(v)\).

Anchor(s) of a hyperedge e: They are the nodes involved in e having core number equal to \(\overline{N_{\textbf{G}}}(e)\). The set of anchors of e is denoted by \(\textbf{A}_{\textbf{G}}(e)\). For each \(v \in \textbf{A}_{\textbf{G}}(e)\), e is said to be anchored at v. The anchors are critical to the core number of the hyperedge as the hyperedge loses its core number once an anchor loses its core number.

Each hyperedge incident to a node v has a core number that is either equal to or lower than that of v. We denote the sets of such hyperedges as \(\textbf{E}_{\textbf{G}}^=(v)=\{e \in \textbf{E}_{\textbf{G}}(v) \mid \overline{N_{\textbf{G}}}(e) = N_{\textbf{G}}(v) \}\) and \(\textbf{E}_{\textbf{G}}^<(v)=\{e \in \textbf{E}_{\textbf{G}}(v) \mid \overline{N_{\textbf{G}}}(e) < N_{\textbf{G}}(v) \}\), respectively.

5.1.2 Core strength and core influence

Before exploring the core resilience of the hypergraph as a whole, which is difficult to compute exactly, we characterize what constitutes the resilience of nodes in keeping their core numbers, how nodes benefit from the connections with other nodes for their core numbers, and in turn how nodes contribute to the core numbers of other nodes. As described, a node v of core number k relies entirely on the incident hyperedges whose core numbers are also k for its core number, i.e., the incident hyperedges consisting of only nodes having core numbers at least k. If v is incident to many hyperedges of such kind, even when some are removed, v may still have enough incident hyperedges, at least k, to maintain its core numbers k. In those hyperedges, the nodes having core numbers greater than k help contribute to the core number of v via the incident hyperedges. We extend the concepts of core strength and core influence in graphs (Laishram et al. 2018) to hypergraphs to quantify how resilient a node is in keeping its core number and how much a node contributes to the connected nodes of lower core numbers, respectively, in a hypergraph.

Core strength of a node v: It is the minimum number of hyperedges to delete to certainly reduce \(N_{\textbf{G}}(v)\), denoted by \(\mathcal{C}\mathcal{S}_{\textbf{G}}(v)\). The node v depends on its incident hyperedges in \(\textbf{E}_{\textbf{G}}^=(v)\) to obtain its core number because all hyperedges in \(\textbf{E}_{\textbf{G}}^<(v)\) are deleted before the core decomposition process reaches the \(N_{\textbf{G}}(v)\)-core. \(|\textbf{E}_{\textbf{G}}^=(v) |- N_{\textbf{G}}(v)\) is the number of “extra” hyperedges incident to v in the \(N_{\textbf{G}}(v)\)-core, beyond its minimum requirement of \(N_{\textbf{G}}(v)\) incident hyperedges, so after merely removing \(\mid \textbf{E}_{\textbf{G}}^=(v) \mid - N_{\textbf{G}}(v)\) incident to v, v is not guaranteed to lose its core number k. Thus, \(\mathcal{C}\mathcal{S}_{\textbf{G}}(v) = \mid \textbf{E}_{\textbf{G}}^=(v) \mid - N_{\textbf{G}}(v) + 1\). A node with a higher core strength has higher resilience to maintain its core number against deletion attacks. In order to improve a node’s resilience, we add hyperedges to improve its core strength.

Core strength of a hyperedge e: It is the minimum number of hyperedges to delete to certainly reduce \(\overline{N_{\textbf{G}}}(e)\). Since \(\overline{N_{\textbf{G}}}(e)\) certainly decreases once the core number of at least 1 anchor of e decreases, \(\overline{N_{\textbf{G}}}(e)\) is equal to the lowest core strength among those of its anchor(s). We denote the core strength of e by \(\overline{\mathcal{C}\mathcal{S}_{\textbf{G}}}(e)=\min _{x \in \textbf{A}_{\textbf{G}}(e)} \mathcal{C}\mathcal{S}_{\textbf{G}}(v)\). A hyperedge with a higher core strength has higher resilience to maintain its core number against deletion attacks.

Core influence of a node v: It is a number measuring v’s contribution to the core numbers of the anchors of the hyperedges in \(\textbf{E}_{\textbf{G}}^<(v)\), denoted by \(\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\). As a node relies on its incident hyperedges consisting of nodes having equal or higher core numbers to maintain its own core number, a node can contribute to the core numbers of lower-core nodes via such incident hyperedges. Particularly, the anchors of these hyperedges benefit from such contribution. \(\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\) measures such contribution and is defined as:

$$\begin{aligned} \mathcal{C}\mathcal{I}_{\textbf{G}}(v) = 1 + \sum _{e \in \textbf{E}_{\textbf{G}}^<(v)}(1 + \frac{\Delta }{N_{\textbf{G}}(v) - 1})\max _{t \in \textbf{A}_{\textbf{G}}(e)}\left[ (1 - \frac{\mathcal{C}\mathcal{S}_{\textbf{G}}(t)-1}{\mid \textbf{E}_{\textbf{G}}^=(t) \mid })\mathcal{C}\mathcal{I}_{\textbf{G}}(t)\right] , \end{aligned}$$

where \(\Delta = N_{\textbf{G}}(v) - \overline{N_{\textbf{G}}}(e)\) indicates the gap in the core numbers between v and e, and \(\frac{\Delta }{N_{\textbf{G}}(v) - 1}\) is the gap normalized by the highest possible gap (\(N_{\textbf{G}}(v) - 1\)). Among the nodes in \(e \setminus \textbf{A}_{\textbf{G}}(e)\), the term \(1 + \frac{\Delta }{N_{\textbf{G}}(v) - 1}\) gives a higher value to a node with a higher core number. For each anchor \(t \in \textbf{A}_{\textbf{G}}(e)\), t has \((\mathcal{C}\mathcal{S}_{\textbf{G}}(t)-1)\) “extra hyperedges” (deleting them does not change the core number of t). The term \(1 - \frac{\mathcal{C}\mathcal{S}_{\textbf{G}}(t)-1}{\mid \textbf{E}_{\textbf{G}}^=(t) \mid }\) reflects the idea that the more extra hyperedges t has, the less dependent t is on e. Among the anchors of e, the node with the greatest dependence on v is selected, explaining the \(\max\) aggregation. To compute the core influences, we first initialize the core influence of each node to 1 We start computing the core ìnluences of the nodes having the minimum core number and continue up until the nodes in the degeneracy core. The core influence of each node only depends on the nodes with lower core numbers, so we only need to iterate through each hyperedge once. A node that has a high core influence is important to the core numbers of many nodes, so if this node disappears or loses its core numbers, numerous nodes are affected. As a result, to preserve the core structure of the network, we wish to enhance the resilience in maintaining core numbers of the nodes having high core influences.

5.1.3 Core influence-strength and degeneracy centralized index of a hypergraph

Having described the resilience to maintain core numbers at the node and hyperedge levels, we aggregate the relevant measures to the hypergraph level to characterize the hypergraph’s core resilience. These characterizations involve core strengths, core influences, and the degeneracy core.

Core influence-strength of G: It is the average of \(\mathcal{C}\mathcal{I}_{\textbf{G}}\times \mathcal{C}\mathcal{S}_{\textbf{G}}\) over the nodes in \(\textbf{V}\), denoted by \(\mathcal {CIS}(\textbf{G})\): \(\mathcal {CIS}(\textbf{G})=\frac{1}{|\textbf{V}|}\sum _{v \in \textbf{V}}\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\mathcal{C}\mathcal{S}_{\textbf{G}}(v)\). If nodes of high core influences have high core strengths, they are resilient in keeping their core numbers, and as a result, many nodes benefit from the contribution of the high-influence nodes in keeping their core numbers, making the core structure more resilient. Thus, we hypothesize that the \(\mathcal {CIS}(\textbf{G})\) is a good indicator of core resilience of \(\textbf{G}\), which is confirmed in Observation 4 in Sect. 5.3.

Degeneracy centralized index of G: It is a value from 0 to 1 measuring how centralized \(\textbf{G}\) is around its degeneracy core. An index of 0 means that in every hyperedge, every node has the same core number. An index of 1 indicates that every hyperedge is incident to at least one node in the degeneracy core. The degeneracy centralized index of a hypergraph \(\textbf{G}\) is defined as: \(i(\textbf{G}) = \frac{1}{ |\textbf{E}|}\sum _{e \in \textbf{E}}\frac{k^*(e) - \overline{N_{\textbf{G}}}(e)}{N_{\textbf{G}}^{*}- \overline{N_{\textbf{G}}}(e)}\), where \(k^*(e)\) denotes the highest \(N_{\textbf{G}}(v)\) among all nodes \(v \in e\). We extend a similar measurement for graphs in (Laishram 2020), which is theoretically proven to be positively correlated with the core resilience of a random graph, to hypergraphs.

5.2 Attack strategies

In this section, we introduce several attack strategies that attackers may exploit to weaken a hypergraph \(\textbf{G}=(\textbf{V}, \textbf{E})\). Each strategy reflects the preferences of the attackers to delete particular nodes/hyperedges, which they may deem more vital to the core structure of the network. By simulating these attacks, we measure the core resilience of each hypergraph against each attack strategy and confirm the usefulness of the concepts proposed in Sect. 5.1.

Node deletions: We introduce different strategies s for an attack \(\textsf{A}^{\textbf{V}}(r,s)\) that deletes \(r \%\) of nodes.

  • If s is Random Attack: \(r\%\) of the nodes, together with their incident hyperedges, are chosen uniformly at random and deleted by \(\textsf{A}^{\textbf{V}}(r,s)\).

  • If s is Degree Attack: The high-degree nodes are targeted, and the chance for a node v to be deleted by \(\textsf{A}^{\textbf{V}}(r,s)\), alongside its incident hyperedges, is proportional to its degree \(d_{\textbf{G}}(v)\).

  • If s is Core Number Attack: The nodes having high core numbers are targeted, and the chance for a node v to be deleted by \(\textsf{A}^{\textbf{V}}(r,s)\), with its incident hyperedges, is proportional to its core number \(N_{\textbf{G}}(v)\).

  • If s is Core Strength Attack: The nodes of low core strengths are targeted, and the chance for a node v to be deleted by \(\textsf{A}^{\textbf{V}}(r,s)\) is proportional to \(\frac{1}{\overline{\mathcal{C}\mathcal{S}_{\textbf{G}}}(v)}\).

Hyperedge deletions: We introduce different strategies s for an attack \(\textsf{A}^{\textbf{E}}(r,s)\) that deletes \(r \%\) of hyperedges.

  • If s is Random Attack: \(r\%\) of the hyperedges are chosen uniformly at random and deleted by \(\textsf{A}^{\textbf{E}}(r,s)\).

  • If s is Cardinality Attack: The large-cardinality hyperedges are targeted, and the chance for a hyperedge e to be deleted by \(\textsf{A}^{\textbf{E}}(r,s)\) is proportional to its cardinality \(|e |\).

  • If s is Degree Attack: The hyperedges incident to high-degree nodes are targeted, and the chance for a hyperedge e to be deleted by \(\textsf{A}^{\textbf{E}}(r,s)\) is proportional to the degree of its highest-degree constituent node.

  • If s is Core Strength Attack: The hyperedges of low core strengths are targeted, and the chance for a hyperedge e to be deleted by \(\textsf{A}^{\textbf{E}}(r,s)\) is proportional to \(\frac{1}{\overline{\mathcal{C}\mathcal{S}_{\textbf{G}}}(e)}\).

5.3 Observations in real-world hypergraphs

Fig. 3
figure 3

The core resilience of real-world hypergraphs against hyperedge-deletion attacks varies among the attack strategies and across deletion ratios. The x-axis shows the deletion ratio, and the y-axis indicates Spearman’s rank correlation coefficient between the original and the post-attack core number distributions. Core strength attack is consistently the most destructive to the core resilience, while random attack is the least destructive

We present several patterns of core resilience of 10 real-world hypergraphs (Benson et al. 2018a) to validate the usefulness of the concepts proposed in Sect. 5.1. More details on the datasets are in Appendix 1. In this section, we present the results of hyperedge deletions only. The figures highlighting the results of node deletions are in Appendix 3.

Observation 1

Core strength attack is the most destructive to the core resilience of real-world hypergraphs for both node-deletion and hyperedge-deletion attacks.

Figure 3 shows the core resilience of real-world hypergraphs against hyperedge-deletion attack strategies, random, degree, cardinality, and core strength, across deletion ratios. The figure illustrates how the Spearman’s rank correlation, between the original and the post-attack core number distributions, changes depending on the ratio of the hyperedges that are deleted. Core strength attack results in the lowest core resilience per deletion ratio, while random attack results in the highest core resilience.

Fig. 4
figure 4

The distribution of core strengths of nodes in each dataset, visualized on a log-log scale, is positively skewed. This indicates that a majority of nodes have relatively low core strengths, indicating the potential for improvement through the augmentation of hyperedges

Fig. 5
figure 5

The skewness of the distribution of core strengths is negatively correlated with the core resilience. “CorrCoef” indicates Spearman’s rank correlation coefficient. It is worth noting that datasets within the same domain exhibit similarities in terms of both skewness and core resilience

Observation 2

The node core-strength distribution in each dataset is positively skewed.

The core strength distribution of nodes for each dataset is illustrated in Fig. 4. In each dataset, the distribution of core strengths is positively skewed, i.e., most nodes have low core strengths, and they are more prone to losing core numbers due to hyperedge deletions. Augmenting hyperedges to enhance their core strengths can make them more robust against deletion attacks.

Observation 3

A hypergraph of high core resilience tends to possess a low skewness of the core-strength distribution and vice versa. Hypergraph datasets within the same domain exhibit similarities in terms of both skewness and core resilience.

The relationship between the skewness of core-strength distribution and core resilience, when \(50\%\) of hyperedges are deleted, is depicted in Fig. 5. A high skewness indicates a tendency for the distribution to have a heavy tail to the right, indicating more nodes of low core strengths. This tendency is negatively correlated with core resilience. The two datasets in each domain (“co-authorship”, “contact”, “email”, “NDC”, and “threads”) exhibit similarities in terms of both core resilience and the skewness of core strength distribution.

Observation 4

A hypergraph of high core resilience tends to possess a high core influence-strength and vice versa.

Fig. 6
figure 6

The core influence-strength is positively correlated with the core resilience. “CorrCoef” indicates Spearman’s rank correlation coefficient

Fig. 7
figure 7

The degeneracy centralized index is positively correlated with core resilience. “CorrCoef” indicates Spearman’s rank correlation coefficient

Observation 5

A hypergraph of high core resilience tends to possess a high degeneracy centralized index, and vice versa.

For each hypergraph \(\textbf{G}\), we measure the core influence-strength, \(\mathcal {CIS}(\textbf{G})\), and the degeneracy centralized index \(i(\textbf{G})\). The positive correlations between the core resilience, when \(50\%\) of hyperedges are deleted, with \(\mathcal {CIS}(\textbf{G})\) and \(i(\textbf{G})\) are shown in Figs. 6 and 7, respectively. The results imply two indicators for high core resilience. The first indicator is that the nodes of high core influences have high resilience against deletion attacks, i.e., high core strengths. The second indicator is that many hyperedges are incident to the nodes in the degeneracy core.

The core resilience, a hypergraph-level measurement, is difficult to optimize directly as core resilience is computationally expensive to measure exactly and the deletion strategies and degree of attacks that attackers employ are unknown. Therefore, we seek to optimize the correlated measurements that are presented in this section. The details of our proposed method, COREA, are described in Sect. 6. Apart from basing on the observations, COREA also has several theoretical merits, outlined in Sect. 6.4.

6 Proposed method: COREA

In this section, we introduce our proposed method, COREA (COre REsilience Improvement by Hyperedge Augmentation), for addressing Problem 1. We begin by providing an overview of the approach, followed by a detailed description of each step. Lastly, we present its theoretical merits.

6.1 Overview

We present an overview of our two-step method, COREA, whose pseudocode is given in Algorithm 1. The inputs of Problem 1, which are defined regardless of specific solutions, are a hypergraph \(\textbf{G}\) with the hyperedge size distribution D and a budget B. Given these problem input parameters, COREA is tasked to find at most B hyperedges to augment to \(\textbf{G}\) such that these hyperedges have a size distribution following D and conserve all core numbers of the nodes in \(\textbf{G}\).

  • Step 1: Construct a pool P of candidate hyperedges that are guaranteed to conserve all core numbers. Firstly, COREA follows the core decomposition process (see Algorithm 2), i.e., a node-deletion process, to determine \(\mathcal {C}\), the maximum number of hyperedges to augment to \(\textbf{G}\) while conserving all core numbers. We introduce a tie-breaking scheme \(\textsf{T}\) to determine the order by which nodes are deleted in this process. Once the number \(\mathcal {C}\) is determined, we introduce a sampling scheme \(\textsf{S}\) to construct \(\mathcal {C}\) candidate hyperedges.

  • Step 2: Theorem 3 shows that there is a maximum number \(\mathcal {M}\) of hyperedges that can be augmented to \(\textbf{G}\) while preserving all core numbers and \(\mathcal {C} = \mathcal {M}\). Therefore, the maximum number b of hyperedges COREA can augment is \(b=\text {min}\{B, \mathcal {C}\}\), subject to the constraints of Problem 1. As our budget is limited and \(|P |\) might be greater than b, we need to select a few of the candidate hyperedges, constructed in Step 1, from P to add to \(\textbf{G}\). The core resilience is a hypergraph-level objective that is hard to maximize directly due to computational cost and attack unpredictability. Therefore, we use the improvement to the core influence-strength of \(\textbf{G}\), demonstrated to correlate with the core resilience in Observation 4, as the ranking metric. At each step, c candidate hyperedges with the highest scores are chosen to augment to \(\textbf{G}\), with c as the batch size of each step, an input parameter of COREA.

Apart from the input parameters given by Problem 1, COREA also employs 3 other algorithm input parameters: the tie-breaking scheme \(\textsf{T}\) in Step 1-1, the sampling scheme \(\textsf{S}\) in Step 1-2, and the batch size c in Step 2, as described above. These algorithm input parameters are the exclusive hyperparameters of our method, which may not be used for other algorithms. In section 7.3, we present our ablation study to investigate the importance of these algorithm input parameters.

figure a

6.2 Step 1: construct candidate hyperedges

As discussed, it is infeasible to check all possible node combinations and select those guaranteed to change no core numbers. As a workaround, we instead answer this question: for each node v of core number k, how many hyperedges anchored at v can be augmented without changing the core number of v?

Suppose a candidate hyperedge e is formed by grouping v with other nodes having core numbers higher than or equal to k. If we can guarantee the augmentation of e preserves the core number k of its anchor(s) including v, e will be deleted in process of obtaining the \((k+1)\)-core from the k-core. Therefore, the core numbers of all the nodes in e are unchanged. Because each hyperedge only contributes to the core number of its anchor(s), e does not affect any nodes of core numbers lower than k. As a result, augmenting e into \(\textbf{G}\) changes no core numbers. In Step 1, COREA forms a pool P of such candidate hyperedges like e. We further divide Step 1 into two parts.

6.2.1 Step 1-1: compute anchor availabilities

This step is outlined in Algorithm 2 Following the core decomposition process, for each node \(v \in \textbf{E}\), Algorithm 2 computes the number of hyperedges anchored at v that can be augmented while preserving \(N_{\textbf{G}}(v)\).

In the pruning process of obtaining the \((k+1)\)-core from the k-core, when node v, \(N_{\textbf{G}}(v) = k\), is about to be deleted, its degree is lower than \((k+1)\), i.e., \(d_{\textbf{G}}(v) \le k\), and let \(a\ge 0\) be the value satisfying \(d_{\textbf{G}}(v) = k -a\). If we augment \(a = k - (k-a)\) hyperedges anchored at v, its degree becomes \(k - a + a = k\), which still qualifies v for removal. Prior to removing v, Algorithm 2 computes the number \(c(v) = a\), referred to as the anchor availability of v. c(v) is the number of hyperedges anchored at v that can be augmented while preserving \(N_{\textbf{G}}(v)\). The total number \(\mathcal {C}\) of hyperedges that can be augmented by COREA, subject to preserving all core numbers, is the sum of all anchor availabilities of the nodes: \(\mathcal {C}=\sum _{v \in \textbf{V}}c(v)\).

figure b
Fig. 8
figure 8

An illustration of Algorithm 2 with two different valid orders of node removals in the core decomposition of hypergraph \(\textbf{G}\). Incorporating the core decomposition process, the method computes the anchor availability c(v) before removing node v of core number \(N_{\textbf{G}}(v)\). While different orders lead to different individual anchor availabilities, the sum of anchor availabilities is always 0 for the one node of core number 1, 1 for the nodes of core number 2, 5 for the nodes of core number 3, and 6 for total

At any point during the pruning process of obtaining the \((k+1)\)-core from the k-core, several nodes may have degree \(\le k\), and the order by which those nodes are removed may affect their respective anchor availabilities. In particular, when both u and v have degree \(\le k\). If we delete u first, the hyperedges anchored in both u and v are removed along u, which further reduces the degree of v. As a result, Algorithm 2 will afford a higher anchor availability for v. The tie-breaking scheme \(\textsf{T}\) that decides which node to remove first impacts the anchor availabilities of the nodes. While COREA does not assume a specific tie-breaking scheme, we set \(\textsf{T}\) to select v to delete first with the chance proportional to \(\mathcal{C}\mathcal{S}_{\textbf{G}}(v)/\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\). By this, we defer the removals of the nodes having high \(\mathcal{C}\mathcal{I}_{\textbf{G}}/\mathcal{C}\mathcal{S}_{\textbf{G}}\) values to potentially afford them higher anchor availabilitiles. Our experiment results in Sects. 7.2 and 7.3 justify this choice for the tie-breaking scheme \(\textsf{T}\).

An example in Fig. 8 illustrates the process of computing the anchor availabilities of Algorithm 2 in two different deletion orders. In the two different deletion orders, the anchor availabilities of a node may be different, but the total anchor availabilities is always zero for the one node of core number 1, one for the nodes of core number 2, five for the nodes of core number 3, and six for total.

Note that our method does not always afford the maximum anchor availabilities for all nodes. Different deletion orders, governed by the tie-breaking scheme \(\textsf{T}\), may result in different anchor availabilities for the same node v, and not every order guarantees the maximum availability for v. In Appendix 5, we conduct additional analysis regarding the reasons why achieving the maximum anchor availabilities for all nodes is not always guaranteed. As presented in Sect. 6.4, Theorem 2 shows that the sum \(\mathcal {C}\) of anchor availabilities, where the anchor availabilities of some nodes might be sub-optimal, is always constant with respect to \(\textbf{G}\). More importantly, however, Theorem 3 shows that \(\mathcal {C}\) is actually the maximum number of hyperedges any method can augment to \(\textbf{G}\), subject to conserving all core numbers of \(\textbf{G}\). That is, any method that augments more than \(\mathcal {C}\) hyperedges, attempting to provide more anchor availabilities than COREA, certainly violates the core-conserving constraint of Problem 1.

Given \(\textbf{G}\) and the tie-breaking scheme \(\textsf{T}\), the first output of Step 1-1 (Algorithm 2) is the anchor availabilities of the nodes in \(\textbf{V}\), which are the number of hyperedges anchored at the respective nodes that can be augmented while conserving all core numbers. The anchor availabilities are exclusive to COREA. Other output results include the core numbers of the nodes in \(\textbf{V}\), the deletion order \(\mathbb {O}\) of the nodes in \(\textbf{V}\) in the core decomposition process, and the degeneracy of \(\textbf{G}\), which are the output of a core decomposition process.

6.2.2 Step 1-2: build a pool P of candidate hyperedges

Given the results of Step 1-1, Step 1-2 constructs a pool P of \(\mathcal {C}\) candidate hyperedges guaranteed to conserved all core numbers if augmented to \(\textbf{G}\).

For each v, COREA constructs c(v) candidate hyperedges anchored at v to add to the pool P of candidates. To conserve the size distribution D of the hyperedges in \(\textbf{E}\), the size s of each candidate hyperedge e is drawn from D. e includes v, and the other \((s-1)\) nodes have the core numbers \(\ge N_{\textbf{G}}(v)\).

As shown in Line 11 of Algorithm 1, those \((s-1)\) nodes are chosen from \(\mathbb {O}[i+1:]\) by the sampling scheme \(\textsf{S}\), which are the nodes removed after v in the core decomposition process. As stated in Theorem 1, it is guaranteed that augmenting e into \(\textbf{G}\) does not alter any core numbers. While our method does not assume a particular sampling scheme \(\textsf{S}\), we set \(\textsf{S}\) to choose each node u with a chance proportional to \(\mathcal{C}\mathcal{I}_{\textbf{G}}(u)/\mathcal{C}\mathcal{S}_{\textbf{G}}(u)\), giving the nodes of high core influences and relatively low core strengths more incident hyperedges, and include at least one node in the degeneracy core. In Sect. 5.3, we show that the core influence-strength and degeneracy centralized index are positively correlated with the core resilience (see Observations 4 and 5). The nodes of high \(\mathcal{C}\mathcal{I}_{\textbf{G}}/\mathcal{C}\mathcal{S}_{\textbf{G}}\) values are favored with higher anchor availabilities (due to the tie-breaking scheme \(\textsf{T}\) described in Sect. 6.2.1) and in turn higher core strenghts in the augmented hypergraph, making them more robust in keeping core numbers and indirectly improve the core influence-strength of \(\textbf{G}\). Therefore, the anchors of e can potentially benefit from the connections with such nodes. Moreoever, to maximize the degeneracy centralized index of the augmented hyperedges, each hyperedge of core number lower than \(N_{\textbf{G}}^{*}\), the degenracy of \(\textbf{G}\), needs to include at least one of in the degeneracy. The choices for \(\textsf{S}\) reflect the results of Observations 4 and 5 and prove helpful in the empirical performance of COREA in Sects. 7.2 and 7.3.

6.3 Step 2: select the best hyperedges from the pool

As shown in Theorem 3, there is a maximum number \(\mathcal {M}\) of hyperedges that can be augmented to \(\textbf{G}\) while preserving all core numbers, and the total anchor availabilities \(\mathcal {C} = \sum _{v \in \textbf{V}}c(v)\) is equal to \(\mathcal {M}\). As a result, in order to satisfy all constraints of Problem 1, the maximum number b of hyperedges that COREA can augment to \(\textbf{G}\) is not only \(\le B\) but also \(\le \mathcal {C}\). In other words, \(b = \text {min}\{B, \mathcal {C}\}\). In the case \(|P |> b\), which is usually true as the budget B is usually tight in practice, COREA needs to select b hyperedges from P to augment to \(\textbf{G}\).

Given the pool P of candidate hyperedges from Step 1, COREA ranks each candidate e in P by the increase in the core influence-strength of the hypergraph. At each iteration, let the current hypergraph snapshot be \(\textbf{G}_{\text {cur}}=(\textbf{V}, \textbf{E}_{\text {cur}})\), where COREA has augmented q hyperedges from P to \(\textbf{E}\) to form \(\textbf{E}_{\text {cur}}\) (\(q=0\) at the beginning of Step 2). For each \(e \in P\), COREA computes a score \(s(e) = \mathcal {CIS}(\textbf{G}_{\text {new}}) - \mathcal {CIS}(\textbf{G}_{\text {cur}})\), with \(\textbf{G}_{\text {new}}=(\textbf{V}, \textbf{E}_{\text {new}}), \textbf{E}_{\text {new}}=\textbf{E}_{\text {cur}} \cup \{e\}\).

COREA keeps greedily selecting c candidate hyperedges with the highest scores, augmenting them to \(\textbf{G}\), and updating the scores of the remaining hyperedges in P until b hyperedges have been augmented to \(\textbf{G}\).

This scoring method is based on Observation 4 in which a higher core influence-strength implies a higher core resilience. Since the core resilience is difficult to optimize directly for the computational challenges and unpredictable behavior of attackers, we employ a surrogate objective that is the improvement to the core influence-strength of \(\textbf{G}\) in Step 2 This surrogate objective reflects the goal of maximizing the core influence-strength of \(\textbf{G}\), which is positively correlated with core resilience, and is more convenient to maximize.

6.4 Theoretical analysis

In this section, we present several theoretical results regarding COREA. All proofs can be found in "Appendix 4".

Theorem 1

(Feasibility of COREA) Step 1 of COREA guarantees to construct a pool P of candidate hyperedges that do not change the core number of any node when they are added together to \(\textbf{G}\).

Theorem 2

(Invariance of COREA) The total number of anchor availabilities \(\mathcal {C} = \sum _{v \in \textbf{V}}c(v)\) realized by COREA is always constant with respect to \(\textbf{G}\).

Theorem 3

(Exhaustivenessof COREA) There is a maximum number \(\mathcal {M}\) of hyperedges that can be augmented to \(\textbf{G}\) while conserving all core numbers, and the total number of anchor availabilities \(\mathcal {C}\) realized by COREA is equal to \(\mathcal {M}\).

Theorems 1 and 2 state that COREA always satisfies the constraint of preserving all core numbers in Problem 1 and returns the same total number \(\mathcal {C}\) of anchor availabilities regardless of the tie-breaking scheme \(\textsf{T}\) in Step 1. According to Theorem 3, \(\mathcal {C}\), is equal to \(\mathcal {M}\), which is the maximum possible number of hyperedges that can be augmented without altering any core numbers. That is, in a case where the budget B exceeds \(\mathcal {M}\), COREA is guaranteed to augment the maximum number \(\mathcal {M}\) of hyperedges while ensuring the preservation of all core numbers. In general, COREA always augments \(b=\text {min}\{B, \mathcal {M}\}\) hyperedges, which is the maximum number of hyperedges that can be augmented subject to all constraints.

Theorem 4

(Time Complexity of COREA) Given the hypergraph \(\textbf{G}=(\textbf{V}, \textbf{E})\) with maximum hyperedge cardinality m, the budget B, the total number of anchor availabilities \(\mathcal {C}\) of all nodes (constant with respect to each dataset), and the batch size c by which COREA augments c hyperedges at a time in Step 2, the time complexity of COREA is \(\mathcal {O}\left[ |\textbf{V}|\text {log}|\textbf{V}|+ \mathcal {C}m \log |\textbf{V}|+ (|\textbf{V}|+ \sum _{e \in \textbf{E}}|e |+ \mathcal {C}m^2)\frac{b}{c}\right]\), where \(b = \text {min}\{B, \mathcal {C}\}\).

7 Empirical evaluation of COREA

In this section, we answer the following questions:

  • Q1. Time and performance how are different methods compared in terms of the running time and improvement of the core resilience in real-world hypergraphs?

  • Q2. Ablation study how do different variants of each component of COREA affect the performance and running time?

  • Q3. Effect of hyperedge size distribution what is the effect of the size distribution of the augmented hyperedges on the performance?

  • Q4. Further insights what are interesting characteristics of the hyperedges returned by COREA?

  • Q5. Applications to what extent do the hyperedges augmented by COREA contribute to the applications of core numbers discussed in Sect. 4?

7.1 Experiment settings

Datasets: We used 10 real-world hypergraphs across several domains. The basic statistics of the datasets are provided in Appendix 1.

Proposed method: For COREA, the tie-breaking scheme \(\textsf{T}\) in Step 1-1 selects v to delete first with the chance proportional to \(\mathcal{C}\mathcal{S}_{\textbf{G}}(v)/\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\) among several nodes up for removals. This defers removing nodes having high \(\mathcal{C}\mathcal{I}_{\textbf{G}}/\mathcal{C}\mathcal{S}_{\textbf{G}}\) to potentially afford them higher anchor availabilities. The sampling scheme \(\textsf{S}\) in Step 1-2 selects v with the chance proportional to \(\mathcal{C}\mathcal{I}_{\textbf{G}}(v)/\mathcal{C}\mathcal{S}_{\textbf{G}}(v)\) and ensures each candidate hyperedge has at least one node in the degeneracy core. These options stem from Observations 4 and 5. Including one node in the degeneracy core maximizes the degeneracy centralized index after augmentation. To improve the core strengths of the nodes having high core influences, COREA prioritizes nodes of high \(\mathcal{C}\mathcal{I}_{\textbf{G}}/\mathcal{C}\mathcal{S}_{\textbf{G}}\) with higher anchor availabilities and more incident hyperedges. COREA is implemented in Java.

Baselines: We consider the following baseline methods:

  • MRKC-G: we apply the method MRKC in (Laishram et al. 2018) to generate the augmented edges for the clique expansion. We augment the edges (i.e., size-2 hyperedges) that satisfy the constraints of Problem 1 to the hypergraph.

  • MRKC-D: we construct the decomposed pairwise graphs from the original hypergraph, as in (Do et al. 2020), and then apply MRKC  (Laishram et al. 2018) to each decomposed graph to generate edges. After that, we construct the hyperedges from those edges (each edge in a decomposed graph corresponds to a hyperedge), select those that satisfy the constraints of Problem 1, and augment them to \(\textbf{G}\).

  • MRKC-H: we generate the hyperedges of size 2 only in Step 2 of COREA and use the same scoring function as MRKC in (Laishram et al. 2018).

  • Random: We replace the tie-breaking scheme \(\textsf{T}\) in Step 1-1 and the sampling scheme \(\textsf{S}\) in Step 1-2 of COREA by uniform random selection. The selection of candidate hyperedges in Step 2 from the pool P is also uniform at random.

MRKC-G and MRKC-D are extentions of the core-resilience improvement method for pair-wise graph (Laishram et al. 2018) with proper adjustments to hypergraphs, and we use the implementation provided by the authors for these two baselines. We implement MRKC-H as a variant of COREA that constructs size-2 hyperedges only. Random is a simplified variant of COREA with randomization at each step outlined in Sects. 6.2 and 6.3.

Experimental details: We evaluate the performance of each method in terms of the improvement of core resilience: \({\mathcal {R}}_{\textbf{G}'}^{\textbf{E}'}(r,s) - {\mathcal {R}}^{\textbf{E}}_{\textbf{G}}(r, s)\) with \(\textbf{G}'\) obtained by augmenting the hyperedges selected by each method to \(\textbf{G}\). The budget B is fixed to \(5\% \times |\textbf{E}|\). For hyperedge-deletion attacks, \(r\% \times |\textbf{E}|\) (\(r=10,20, 30, 40, 50\)) hyperedges are deleted. For node-deletion attacks, \(r\% \times |\textbf{V}|\) (\(r=5,10, 15, 20, 25\)) nodes are deleted along their incident hyperedges. For each method and each dataset, we report the average running time and performance over 10 trials.

In this section, we present the results of hyperedge-deletion attacks when s is Core Strength Attack only. The results for node-deletion attacks when s is Core Strength Attack are in Appendix 3. The results for all other attack strategies are in the supplementary material. In all cases, we draw similar conclusions regarding the superior performance of COREA compared with the baselines and the roles each component of COREA plays in the performance.

7.2 Q1. Time and performance

Performance: The comparison of different methods in core resilience improvement across deletion ratios is in Fig. 9. The x-axis indicates the deletion ratios, the y-axis shows the performance, and the vertical bars indicate the standard deviations. COREA consistently outperforms the others in all datasets. In each dataset, the performance by COREA is \(5\%-35\%\) better than that of the best-performing baseline and up to \(70\%\) superior to the performance of Random. While Random is consistently the worst-performing baseline, for the three baselines MRKC-G, MRKC-D, and MRKC-H, they all perform slightly better than Random, and no method surpasses the other two consistently in all datasets.

Time and performance trade-off: The time-performance tradeoff of the methods is illustrated in Fig. 10. The x-axis indicates the running time, the y-axis shows the performance when the deletion ratio \(r=50\%\), and the vertical bars indicate the standard deviations. COREA significantly outperforms other methods in all datasets, while the running time of COREA is relatively close to the fastest baseline Random, which is the worst-performing method.

In addition to Figs. 9 and 10, for each dataset, we test the difference in the performance of our method with that of the best-performing baseline using an one-tailed Student’s t-test as follows:

  • \(H_0\): the mean performance of COREA is lower than or equal to the mean performance of the baseline.

  • \(H_a\): the mean performance of COREA is greater than the mean performance of the baseline.

At \(95\%\) confidence when \(\alpha = 0.05\), the test rejects \(H_0\) in favor of \(H_a\) (p-value \(<0.05\)), confirming that COREA is significantly superior to all the baselines.

Fig. 9
figure 9

The comparison of different methods in terms of performance. The x-axis shows the deletion ratios, and the y-axis shows the core resilience improvement of the methods. The vertical bars indicate the standard deviations. COREA consistently brings better improvement of core resilience than the others in all datasets regardless of deletion ratios

Fig. 10
figure 10

The trade-off of the methods in terms of time and performance. The x-axis shows the running time, and the y-axis shows the core resilience improvement of each variant when the deletion ratio \(r=50\%\). The vertical bars indicate the standard deviations. COREA consistently provides a better time-performance trade-off than the other methods in all datasets regardless of deletion ratios

Fig. 11
figure 11

The comparison of different variants in terms of performance. The x-axis shows the deletion ratios, and the y-axis shows the core resilience improvement of each variant. The vertical bars indicate the standard deviations. The full-fledged version of COREA consistently outperforms the other variants in all datasets regardless of deletion ratios

Fig. 12
figure 12

The trade-off of different variants in terms of time and performance. The x-axis shows the running time, and the y-axis shows the core resilience improvement of each variant when the deletion ratio \(r=50\%\). The vertical bars indicate the standard deviations. The full-fledged version of COREA consistently provides a better time-performance trade-off than the other variants in all datasets regardless of deletion ratios

7.3 Q2. Ablation study

We investigate the role of each component of COREA in improving the core resilience of the hypergraphs. Similar to Sect. 7.2, in each section of the ablation study, apart from highlighting the results in Figs. 111213, and 14, we also employ an one-tailed Student’s t-test, at \(95\%\) confidence, to verify that our full-fledged method significantly outperforms all the other variants. In all cases, the p-value is smaller than 0.05, so the test rejects \(H_0\) in favor of \(H_a\) that the full-fledged variant of COREA is superior to the best-performing simplified variant.

Simplified variants of COREA: We compare the full-fledged version of COREA, as described in Sect. 7.1, with the following five simplified variants in terms of running time and performance:

  • CoReA-CI: obtained by modifying the scoring function s(.) in Step 2 of COREA to the sum of the core influences of the anchor. The score for each candidate hyperedge e is: \(s'(e) = \sum _{v \in \textbf{A}_{\textbf{G}}(e)}\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\). This scoring function gives high priority to hyperedges anchored at high-influence nodes, those contributing to the core numbers of other nodes.

  • RB1: obtained by replacing the tie-breaking scheme \(\textsf{T}\) in Step 1-1 of COREA by selecting a node uniformly at random.

  • RB2: obtained by replacing the sampling scheme \(\textsf{S}\) in Step 1-2 of COREA by selecting nodes from \(\mathbb {O}[i+1:]\) uniformly at random.

  • RB3: in Step 2 of COREA, choose candidate hyperedges uniformly at random.

  • Random: the same as method Random in Sects. 7.1 and 7.2.

For each method, if we increase the batch size c while keeping other components unchanged, the running time decreases as there are fewer iterations of the loops in lines 16-22 of Algorithm 1. However, the performance declines as the method augments more hyperedges at 1 iteration and undertakes fewer updates on the scores of the candidate hyperedges in Step 2 of Algorithm 1. For the full-fledged version, we set the batch size c equal to the budget b and record the running time as t. For the competitors, we set the batch size \(c'\) to afford them sufficient time and update iterations for potentially better performance. Specifically, for each competitor, we set \(c' = \min \{10, b\}\) if the running time is at least as long as t, and otherwise, we set \(c'=1\) to give it the most possible time. We compare the performance across deletion ratios in Fig. 11 and the time-performance trade-off of all methods when deletion ratio \(r=50\%\) in Fig. 12. It is clear that the full-fledged version of COREA consistently yields a better time-performance trade-off and outperforms the others regardless of deletion ratios.

Fig. 13
figure 13

The performance of COREA when the degeneracy requirement is enforced and waived. The x-axis shows the deletion ratios, and the y-axis shows the core resilience improvement of each variant. The vertical bars show the standard deviations. Enforcing the degeneracy requirement of having at least one node in the degeneracy core in each candidate hyperedge is helpful to the performance

Fig. 14
figure 14

The performance of COREA with different tie-breaking schemes in Step 1-1. The x-axis shows the deletion ratios, and the y-axis shows the core resilience improvement of each variant. The vertical bars show the standard deviations. The tie-breaking scheme \(\mathcal{C}\mathcal{S}_{\textbf{G}}/\mathcal{C}\mathcal{I}_{\textbf{G}}\), leads to the highest improvement of core resilience among the three schemes

Fig. 15
figure 15

The distribution of hyperedge sizes in each dataset, visualized on a log-log scale, is positively skewed. In each distribution, only a small fraction of hyperedges have large sizes, while the majority of hyperedges are of small sizes

Fig. 16
figure 16

The performances of COREA when following the original and uniform hyperedge size distributions, respectively. The x-axis shows the deletion ratios, and the y-axis shows the core resilience improvement. The vertical bars indicate the standard deviations. For the uniform distribution, COREA augments larger-size hyperedges, potentially helping more nodes with the augmentation, and results in a better performance

Degeneracy core: We examine the effectiveness of the idea of including at least one node in the degeneracy core in each candidate hyperedge, as proposed in Sect. 6.2.2. Figure 13 highlights the performance of COREA in two scenarios: when the requirement of including at least one node in the degeneracy core in each candidate hyperedge is enforced in Step 1-2 of COREA, and when the requirement is waived. A better performance is achieved when this requirement is enforced, indicating that it is necessary to meet this requirement in our method.

Tie-breaking scheme: We also examine how different tie-breaking schemes \(\textsf{T}\) in Step 1-1 of COREA, which is discussed in Sect. 6.2.1, leads to different performances. Recall that a tie-breaking scheme \(\textsf{T}\) governs the order nodes are deleted in the core decomposition process and in turn determines the anchor availabilities of nodes. We compare three schemes of selecting which node to delete first when facing multiple nodes qualified for removal in Algorithm 2:

  • \(\mathcal{C}\mathcal{S}_{\textbf{G}}/\mathcal{C}\mathcal{I}_{\textbf{G}}\): the chance of selecting a node v is proportional to \(\mathcal{C}\mathcal{S}_{\textbf{G}}(v)/\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\) as of COREA described in Sect. 7.1.

  • \(1/\mathcal{C}\mathcal{I}_{\textbf{G}}\): the chance to select a node v, to delete first among several nodes up for removal in the core decomposition process, is proportional to \(1/\mathcal{C}\mathcal{I}_{\textbf{G}}(v)\). This defers removing nodes of high \(\mathcal{C}\mathcal{I}_{\textbf{G}}\) values to potentially afford them higher anchor availabilities.

  • Random: a node is selected uniformly at random. This is method RB1 in Sect. 7.3.

Figure 14 shows that the scheme \(\mathcal{C}\mathcal{S}_{\textbf{G}}/\mathcal{C}\mathcal{I}_{\textbf{G}}\) consistently leads to better performance than the other two.

7.4 Q3. Effect of hyperedge size distribution

The distributions of hyperedge sizes in real-world hypergraphs are known to be positively skewed (Kook et al. 2020), where most hyperedges have small sizes while only a small fraction of hyperedges have large sizes (see Fig. 15). To examine the effect of the size distribution, for each dataset, we reconfigure COREA to augment the hyperedges whose size distribution follows the uniform distribution. In other words, we replace the original hyperedge size distribution D of \(\textbf{G}\) in Algorithm 1 by the uniform distribution. The results are highlighted in Fig. 16. In the case of uniform distribution, as COREA creates and augments more hyperedges of larger sizes, due to switching from a heavy-tailed to the uniform distribution, the augmented hyperedges potentially help more nodes to maintain their core numbers, resulting in a better performance of core resilience improvement. However, it would be unrealistic to assume such uniform distribution as we are constrained to preserve the original skewed hyperedge size distributions in order to prevent attackers from deliberately ignoring our augmented hyperedges, as discussed in Sect. 3.2.

7.5 Q4. Further insights

We present three interesting characteristics of the hyperedges returned by COREA.

Insight 1

The augmentation by COREA is more helpful to the nodes of with medium to high original core numbers.

For each dataset, we group the nodes into three groups based on core numbers: low, medium, and high (each accounts for one-third of the range of core numbers) and measure the decrease in core numbers in each group after \(50\%\) of the hyperedges are removed by the Core Strength Attack, with or without the augmentation by COREA. As Fig. 17 shows, COREA mitigates such decrease more clearly in the medium and high groups.

Fig. 17
figure 17

Hyperedges augmented by COREA are more helpful in mitigating the core number degree, due to core strength attack, of the nodes of medium and high core numbers

Insight 2

A hypergraph of higher core resilience tends to possess less availability for augmentation and vice versa.

For each dataset, we define the ratio of availability as the average of anchor availabilities of nodes, found by COREA, normalized by their respective core numbers: \(r(\textbf{G}) = \frac{1}{|\textbf{V}|}\sum _{k=2}^{N_{\textbf{G}}^{*}}\sum _{v \in \textbf{V}_k}\frac{c(v)}{k} = \frac{1}{|\textbf{V}|}\sum _{k=2}^{N_{\textbf{G}}^{*}}\frac{\sum _{v \in \textbf{V}_k}c(v)}{k}\) (\(V_k = \{v \in \textbf{V}\mid N_{\textbf{G}}(v)=k\}\)). For each \(v \in \textbf{V}_k\), \(0 \le c(v) \le k\). A dataset with high \(r(\textbf{G})\) implies more availability for augmentation, and this statistic is negatively correlated with core resilience, as shown in Fig. 18 (left). Intuitively, if we can augment more, i.e., a high value of \(r(\textbf{G})\), the core structure of the hypergraph is “less complete”, resulting in weak core resilience against deletion attacks.

Fig. 18
figure 18

(Left) The ratio of availability is negatively correlated with core resilience. (Right) the set of actual hyperedges and the set of candidate hyperedges, constructed by COREA, have positively correlated distribution skewness of core numbers. “CorrCoef” indicates Spearman’s rank correlation coefficient

Insight 3

The skewness of the distributions of the core numbers of hyperedges in \(\textbf{E}\) is positively correlated with that of the hyperedges constructed by COREA.

This positive correlation is shown in Fig. 18 (right). For example, in threads-ask-ubuntu, the skewness of the core number distribution of hyperedges in \(\textbf{E}\) is positive, indicating more hyperedges of low core numbers but fewer hyperedges of high core numbers, and this tendency is also found in the pool of hyperedges P returned by COREA. By contrast, such skewness for the set \(\textbf{E}\) in contact-primary-school is negative, implying more hyperedges of high core numbers, and this is also true for the hyperedges in P from the dataset.

7.6 Q5. Applications

In this section, we demonstrate that the hyperedges augmented by COREA support the applicability of hypergraph core numbers, introduced in Sect. 4, when the networks face deletion attacks.

Identification of influential nodes: Table 3 reports the Spearman’s rank correlation coefficient between the nodes’ influences in the original hypergraph with: the original core numbers (Before Attack), the core numbers of the hypergraph after \(50\%\) of hyperedges have been deleted (No Augmentation), and the core numbers of the hypergraph after several hyperedges are augmented by COREA and then \(50\%\) of hyperedges are deleted (COREA). After the deletion attack, the ranking of core number is less correlated to the ranking of node influences, i.e., core numbers become less useful in characterizing influential nodes. However, the hyperedges augmented by COREA help alleviate that decrease in the usefulness of core numbers.

Anomaly detection: Fig. 19 highlights the AUC-PR of predicting abnormal nodes of the method Core with the settings detailed in Sect. 4.2 before (Original) and after \(50\%\) of hyperedges have been deleted (COREA and No Augmentation). COREA and No Augmentation indicate the cases where hyperedges are augmented by COREA before the attack and no augmentation is undertaken, respectively. After an attack, the ranking of core numbers is less useful in detecting anomalies, but this decline of usefulness is mitigated with the hyperedges augmented by COREA.

Table 3 Spearman’s rank correlation coefficients between the node’s influences and the core numbers before an attack, after an attack, and after an attack with augmentation by COREA. COREA helps preserve the usefulness of core numbers in finding influential nodes after the networks are attacked
Fig. 19
figure 19

The AUC-PR of predicting abnormal nodes by the method Core, which is based on the ranking of core numbers and outlined in Sect. 4, before (Original) and after deletion attack with (COREA) and without (No Augmentation) the hyperedges augmented by COREA. After the attack, the ranking of core numbers is less useful in predicting anomalies, but the augmentation by COREA helps reduce such drop in usefulness

8 Conclusion

In this work, we formulate and study the problem of enhancing the core resilience of real-world hypergraphs. We discuss the challenges of the problem, introduce the relevant concepts, and present the key patterns regarding the core resilience of the hypergraphs

Based on these, we develop a two-step method, COREA, to consolidate the core structure of hypergraphs by augmenting hyperedges within a given budget. COREA is fast, theoretically sound, and empirically effective in improving the core resilience of the hypergraphs. The hyperedges augmented by COREA not only preserve the core structure of the hypergraphs but also enhance its resilience. Through our extensive experiments in ten real-world hypergraphs, we demonstrate the superiority of COREA over the baseline approaches, investigate the characteristics of the augmentation by COREA, and examine the role each component plays in the performance of COREA. In addition, we show that COREA helps support the applications of hypergraph core numbers when the hypergraphs face deletion attacks. The code and datasets are available at https://github.com/manhtuando97/CoReA.