1 Introduction

Complex networks can represent a broad variety of real-world complex systems, such as ecological, social, and biological systems. Given a limited budget, choosing the right set of nodes is critical whether the objective is to boost or hinder a diffusion process. Accordingly, targeting influential nodes is of great interest. Classical centrality measures are one of the main methods used for identifying key nodes [1, 2]. They are generally divided into local and global measures. Local centrality measures compute a node’s centrality by examining only its neighborhood, which makes them efficient. Global centrality measures, in contrast, assess a node’s centrality by examining its location in the whole network. They are more accurate than local ones but are computationally expensive. The problem with classical centrality measures is that they ignore the network community structure, which is of high importance in real-world networks [3].

Recently developed community-aware centrality measures take the network’s community structure into account when identifying influential nodes [4,5,6,7,8,9,10,11,12,13]. They distinguish intra-community links from inter-community links. Intra-community links join nodes within the same community. Inter-community links join nodes from different communities. Intra-community and inter-community links are tied to the node’s influence at the local and global levels, respectively. Each community-aware centrality measure calculates the node’s centrality differently based on the node’s inter-community and intra-community links. For instance, Comm centrality [7] identifies hubs or bridges as prominent nodes based on the network’s community structure, while giving more importance to bridges. Modularity Vitality [8] calculates a node’s centrality based on the variation of modularity after the node’s removal.

Previous research has used the Susceptible-Infected-Recovered (SIR) and the Linear Threshold (LT) models to analyze the behavior of seven community-aware centrality measures [14,15,16]. This work extends the previous work by studying the performance of the community-aware centrality measures using the Independent Cascade (IC) model. Moreover, previous work did not examine the behavior of the community-aware centrality measures with synthetic networks. We depart from prior studies to systematically investigate eight popular community-aware centrality measures. We use three synthetic networks generated with the LFR model [17] that allows controlling the community structure strength. We also use three real-world networks originating from different domains.

The rest of the paper is organized as follows. Section 2 introduces the community-aware centrality measures that are being evaluated. The Independent Cascade model is described in Sect. 3. Section 4 presents the data and strategies used in the evaluation process. In Sect. 5, experimental results are given. A discussion of the findings and the conclusion are presented in Sect. 6.

2 Community-Aware Centrality Measures

In this section, we briefly recall the definitions of the community-aware centrality measures under evaluation. One can refer to [18] for more information. Assume that G(V,E) is a connected, simple, unweighted, and undirected graph where V is the set of nodes of size \(N=|V|\) and \({E \subseteq V \times V}\). In G, we have \({N_c}\) non-overlapping communities where \({c_k}\) is the \({k^{th}}\) community. For each node i, the total degree is defined as \({k_i^{total} = k_i^{intra} + k_i^{inter}}\) where \({k_i^{intra}}\) and \({k_i^{inter}}\) are the numbers of intra-community and inter-community links of node i, respectively. Let \({k_{i,c}}\) be the number of links node i has in a given community c.

Community Hub-Bridge [4] weights the intra-community links by the node’s community size and the inter-community links by the node’s number of neighboring communities:

$$\begin{aligned} \alpha _{CHB}(i) = Card(c_{k}) \times {k_i^{intra}} + NNC_i \times k_i^{inter} \end{aligned}$$
(1)

where \(Card(c_{k})\) is the community size of node i and \(NNC_i\) is the number of neighboring communities node i has.
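As an illustration of Eq. (1), the following minimal Python sketch computes Community Hub-Bridge on a toy graph made of two triangles joined by a single link. The function name `community_hub_bridge`, the adjacency-dictionary representation, and the toy graph are assumptions made for this example and are not taken from the original study.

```python
from collections import Counter

def community_hub_bridge(adj, community):
    """adj: node -> set of neighbours; community: node -> community label."""
    sizes = Counter(community.values())  # Card(c_k) for every community
    scores = {}
    for i, neighbours in adj.items():
        k_intra = sum(1 for j in neighbours if community[j] == community[i])
        k_inter = len(neighbours) - k_intra
        # NNC_i: number of distinct other communities that node i links to
        nnc = len({community[j] for j in neighbours if community[j] != community[i]})
        scores[i] = sizes[community[i]] * k_intra + nnc * k_inter
    return scores

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

chb = community_hub_bridge(adj, community)
print(chb[0], chb[2])
```

In this toy graph, node 2 has two intra-community links in a community of size 3 and one inter-community link to one neighboring community, so its score is 3 × 2 + 1 × 1 = 7, against 6 for node 0.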

Participation Coefficient [5] answers the question “How well-distributed are the links of node i among various communities?" If the links are uniformly distributed among all communities, the Participation Coefficient is close to 1, and 0 if the node is only linked to other nodes within its community:

$$\begin{aligned} \alpha _{PC}(i) = 1 - \sum _{c=1}^{N_c} \left( \frac{k_{i,c}}{k_i^{total}}\right) ^2 \end{aligned}$$
(2)
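A short Python sketch of Eq. (2) is given below, under the assumption that the graph is stored as an adjacency dictionary. The function name `participation_coefficient` and the toy graph (two triangles joined by one link) are hypothetical and only serve the illustration.

```python
from collections import Counter

def participation_coefficient(adj, community):
    """adj: node -> set of neighbours; community: node -> community label."""
    scores = {}
    for i, neighbours in adj.items():
        k_total = len(neighbours)
        if k_total == 0:
            scores[i] = 0.0  # assumption: isolated nodes get a score of 0
            continue
        # k_{i,c}: number of links of node i that end in community c
        links_per_community = Counter(community[j] for j in neighbours)
        scores[i] = 1.0 - sum((k / k_total) ** 2 for k in links_per_community.values())
    return scores

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

pc = participation_coefficient(adj, community)
print(pc[0], pc[2])
```

Node 0 only has links inside its own community, so its coefficient is 0. Node 2 has two links in its own community and one in the other, which gives 1 − (4/9 + 1/9) = 4/9.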

Community-Based Mediator [6] is based on the concept of entropy of the intra-community and inter-community links of node i. The value of the centrality gets higher as the links of a node i are more mixed:

$$\begin{aligned} \alpha _{CBM}(i) = H_i \times \frac{k_i^{total}}{ \sum _{i=1}^{N}k_i^{total}} \end{aligned}$$
(3)

where \(H_i = [-\sum \rho _i^{intra}\log (\rho _i^{intra})] + [-\sum \rho _i^{inter}\log (\rho _i^{inter})]\), \(\rho _i^{intra}\) and \(\rho _i^{inter}\) represent the node’s ratio of intra-community and inter-community links respectively, and the sum of the degrees of all nodes in the network is represented by \(\sum _{i=1}^{N}k_i^{total}\).
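The sketch below illustrates Eq. (3) in Python. It rests on several assumptions that the text does not fix: the entropy is computed over the two ratios \(\rho _i^{intra}\) and \(\rho _i^{inter}\) only, the natural logarithm is used, and \(0 \log 0\) is taken as 0. The function names and the toy graph are hypothetical.

```python
import math

def entropy_term(p):
    """Return -p * log(p), with the convention 0 * log(0) = 0 (assumption)."""
    return -p * math.log(p) if p > 0 else 0.0

def community_based_mediator(adj, community):
    """adj: node -> set of neighbours; community: node -> community label."""
    total_degree = sum(len(nb) for nb in adj.values())  # sum of all degrees
    scores = {}
    for i, neighbours in adj.items():
        k_total = len(neighbours)
        if k_total == 0:
            scores[i] = 0.0  # assumption: isolated nodes get a score of 0
            continue
        k_intra = sum(1 for j in neighbours if community[j] == community[i])
        rho_intra = k_intra / k_total
        rho_inter = 1.0 - rho_intra
        h_i = entropy_term(rho_intra) + entropy_term(rho_inter)
        scores[i] = h_i * k_total / total_degree
    return scores

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

cbm = community_based_mediator(adj, community)
print(cbm[0], cbm[2])
```

Nodes whose links all stay inside their community have zero entropy and therefore a score of 0, while node 2, whose links are mixed, obtains a positive score.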

Comm Centrality [7] identifies hubs or bridges as prominent nodes based on the network’s community structure strength while giving more importance to bridges:

$$\begin{aligned} \begin{aligned} \alpha _{Comm}(i)&= (1 + \mu _{c_k}) \times \left( \frac{k_i^{intra}}{\max _{j \in c} k_j^{intra}} \times R\right) \\&+ (1 - \mu _{c_k}) \times \left( \frac{k_i^{inter}}{\max _{j \in c} k_j^{inter}} \times R \right) ^2 \end{aligned} \end{aligned}$$
(4)

where \(\mu _{c_k}\) is the number of inter-community links divided by the number of total community links in community \(c_k\), and R is a user-defined parameter to standardize the intra-community and inter-community links.

Modularity Vitality [8] is a signed measure that indicates both how important a node is and in what way it is a key node. It differentiates hubs from bridges based on the variation of modularity after the node’s removal:

$$\begin{aligned} \alpha _{MV}(i) = M(G) - M(G_i) \end{aligned}$$
(5)

where M(G) is the network’s modularity and \(M(G_i)\) is the modularity after removing node i. This study investigates bridges-first, hubs-first, and bridges-and-hubs-first ranking strategies.
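To make Eq. (5) concrete, the following Python sketch recomputes the modularity of a toy graph after removing each node in turn. It assumes the standard Newman modularity for an unweighted, undirected graph and assumes the community partition is kept unchanged after the removal. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(adj, community):
    """Newman modularity of an undirected, unweighted graph (assumed definition)."""
    m = sum(len(nb) for nb in adj.values()) / 2  # number of links
    if m == 0:
        return 0.0
    intra = defaultdict(float)   # links inside each community
    degree = defaultdict(float)  # summed degree of each community
    for i, neighbours in adj.items():
        degree[community[i]] += len(neighbours)
        for j in neighbours:
            if community[i] == community[j]:
                intra[community[i]] += 0.5  # every intra link is visited twice
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def modularity_vitality(adj, community):
    """alpha_MV(i) = M(G) - M(G_i), where G_i is G without node i."""
    base = modularity(adj, community)
    scores = {}
    for i in adj:
        reduced = {u: nb - {i} for u, nb in adj.items() if u != i}
        scores[i] = base - modularity(reduced, community)
    return scores

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

mv = modularity_vitality(adj, community)
print(mv[0], mv[2])
```

In this toy graph, removing node 2 (an endpoint of the only inter-community link) increases modularity, so its value is negative, whereas removing node 0 decreases modularity and yields a positive value. The sign therefore separates the two roles, and ranking by the positive values, the negative values, or the absolute values gives three different orderings.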

Community-Based Centrality [9] quantifies a node’s importance based on the node’s links in each community and the size of these communities:

$$\begin{aligned} \alpha _{CBC}(i) = \sum _{c=1}^{N_c} k_{i,c} \left( \frac{n_c}{N}\right) \end{aligned}$$
(6)
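Equation (6) can be sketched in Python as follows, assuming that \(n_c\) denotes the number of nodes in community c (the text does not define it explicitly). The function name `community_based_centrality` and the toy graph are hypothetical.

```python
from collections import Counter

def community_based_centrality(adj, community):
    """adj: node -> set of neighbours; community: node -> community label."""
    n_nodes = len(adj)
    sizes = Counter(community.values())  # n_c, assumed to be the community size
    scores = {}
    for i, neighbours in adj.items():
        # k_{i,c}: number of links of node i that end in community c
        links_per_community = Counter(community[j] for j in neighbours)
        scores[i] = sum(k * sizes[c] / n_nodes for c, k in links_per_community.items())
    return scores

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

cbc = community_based_centrality(adj, community)
print(cbc[0], cbc[2])
```

Both communities contain half of the nodes here, so each link contributes 0.5: node 0 scores 1.0 and node 2 scores 1.5.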

K-shell with Community [10] divides network G into two networks. The first contains the nodes and their intra-community links, while the second comprises the nodes and their inter-community links. The influence of each node is then assessed using a linear combination of the k-shell hierarchical decomposition of these networks:

$$\begin{aligned} \alpha _{ks}(i) = \delta \times \alpha ^{intra}(i) + (1 - \delta ) \times \alpha ^{inter}(i) \end{aligned}$$
(7)

where k-shell values of node i on the graphs having intra-community and inter-community links are represented respectively by \(\alpha ^{intra}(i)\) and \(\alpha ^{inter}(i)\). In this study, \(\delta \) is adjusted to 0.5 to ensure equal preference for hubs and bridges.
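The two-network construction behind Eq. (7) can be sketched in Python as follows. The k-shell index is obtained here by the usual iterative pruning of low-degree nodes. The function names, the adjacency-dictionary representation, and the toy graph are assumptions made for the illustration.

```python
def k_shell(adj):
    """Return the k-shell index of every node by iterative pruning."""
    remaining = {u: set(nb) for u, nb in adj.items()}
    shell = {}
    k = 0
    while remaining:
        peel = [u for u, nb in remaining.items() if len(nb) <= k]
        if not peel:
            k += 1  # no node left with degree <= k: move to the next shell
            continue
        for u in peel:
            shell[u] = k
            for v in remaining.pop(u):
                remaining[v].discard(u)
    return shell

def k_shell_with_community(adj, community, delta=0.5):
    """Linear combination of k-shell indices on the intra and inter networks."""
    intra = {u: {v for v in nb if community[v] == community[u]} for u, nb in adj.items()}
    inter = {u: {v for v in nb if community[v] != community[u]} for u, nb in adj.items()}
    ks_intra, ks_inter = k_shell(intra), k_shell(inter)
    return {u: delta * ks_intra[u] + (1 - delta) * ks_inter[u] for u in adj}

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

ks = k_shell_with_community(adj, community, delta=0.5)
print(ks[0], ks[2])
```

In the intra-community network, every node sits in a triangle and has a k-shell index of 2. In the inter-community network, only nodes 2 and 3 are linked (index 1), and the others are isolated (index 0). With \(\delta = 0.5\), node 2 scores 1.5 and node 0 scores 1.0.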

Map Equation Centrality [11] follows the vitality principle by assessing the difference between two descriptions of the same network based on the map equation:

$$\begin{aligned} \alpha _{MapEq}(i) = L^i - L^{i*} \end{aligned}$$
(8)

where \(L^i\) and \( L^{i*}\) represent the inefficient code and the efficient code, respectively.

3 Independent Cascade Model

The Independent Cascade (IC) model is a diffusion model proposed by [19]. It starts with an initial fraction of nodes \(f_o\) set in the active state. The probability \(P_{u,v}\) denotes the likelihood of node u activating node v. Once activated, each node has a single chance to activate each of its neighbors in the following time step. The activation is based on the probability associated with that edge. Following that time step, the previously activated node transitions to the inactive state and cannot activate other nodes. Since we use the generalized version of the IC model, a threshold on each edge (u, v), denoted as \(\theta _{u,v}\), can be set to hinder the diffusion. Thresholds can assume a fixed value or be uniformly distributed in the interval [0, 1]. Accordingly, node v is activated by node u if and only if \(P_{u,v} \ge \theta _{u,v}\). The diffusion iterates in discrete steps until activation is no longer possible. Since the IC model is stochastic, simulations are averaged over 100 separate runs for each network.
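The diffusion rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not say how \(P_{u,v}\) is obtained, so the sketch draws it uniformly at random in [0, 1) at each activation attempt, and the threshold is supplied as a hypothetical function `theta(u, v)`. The function names and the toy graph are also assumptions.

```python
import random

def independent_cascade(adj, seeds, theta, rng):
    """Run one cascade and return the set of nodes activated at any time."""
    activated = set(seeds)
    frontier = set(seeds)  # nodes activated in the previous step
    while frontier:
        newly = set()
        for u in frontier:
            for v in adj[u]:
                if v in activated or v in newly:
                    continue
                p_uv = rng.random()  # ASSUMPTION: P_{u,v} drawn uniformly per attempt
                if p_uv >= theta(u, v):  # activation rule stated in the text
                    newly.add(v)
        activated |= newly
        frontier = newly  # earlier nodes get no second chance
    return activated

def mean_activation_size(adj, seeds, theta, runs=100, seed=0):
    """Average the final activation size over several independent runs."""
    rng = random.Random(seed)
    sizes = [len(independent_cascade(adj, seeds, theta, rng)) for _ in range(runs)]
    return sum(sizes) / runs

# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = {n: set() for n in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

always = mean_activation_size(adj, [0], lambda u, v: 0.0)  # threshold never blocks
never = mean_activation_size(adj, [0], lambda u, v: 1.0)   # threshold always blocks
print(always, never)
```

With a threshold of 0 every attempt succeeds and the whole connected toy graph is activated, while a threshold of 1 blocks every attempt and only the seed stays active. A fixed threshold corresponds to a constant function, and a random threshold could be modelled by drawing one value per edge beforehand and looking it up inside `theta`.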

The dynamics of the IC model differ from the Susceptible-Infected-Recovered (SIR) and the Linear Threshold (LT) models [20, 21]. Compared to the SIR model, the infection rate is not constant but changes from one edge to another. Moreover, one can set a threshold value on each edge. In contrast to the LT model, an activated node in the IC model has only one chance to activate its neighbor(s). The IC model can represent many real-world cases such as the propagation of opinions in a group of people. A person with opinion X can influence his/her neighbors once. Moreover, the influence varies with the neighbors’ proximity.

4 Datasets and Evaluation Measure

This section briefly describes the data and the evaluation measure of the community-aware centrality measures. Table 1 reports the basic topological properties of the networks.

Table 1. Macroscopic topological properties of the synthetic and real-world networks. N is the total number of nodes. |E| is the number of edges. \(\mu \) is the mixing parameter. \(\gamma \) is the estimated exponent of the degree distribution. * indicates the largest connected component if the network is disconnected.

4.1 Synthetic Networks

One can control various topological properties in synthetic networks. We generate three synthetic networks with a non-overlapping community structure using the Lancichinetti, Fortunato and Radicchi algorithm (LFR) [17]. One has a strong community structure strength (\(\mu =0.05\)), the second has a medium community structure strength (\(\mu =0.2\)), and the last one has a weak community structure strength (\(\mu =0.7\)). Their degree distribution follows a power law. That is, \(P(k) \sim k^{-\alpha }\) where P(k) is the probability of a node having degree k and \(\alpha \) is a constant such that \(2<\alpha <3\). To mimic real-world networks, the degree and the community size distributions’ exponents are equal to 2.7.

4.2 Real Networks

We use two collaboration networks (GrQc and New Zealand Collaboration) and one infrastructural network (EU Airlines). They are chosen in order to have comparable community structure strength with the synthetic networks. In the GrQc network, nodes are researchers co-authoring in General Relativity and Quantum Cosmology, and links represent co-authorship of scientific papers. In the New Zealand Collaboration network, academic institutions represent the nodes, and they are connected if Scopus lists a minimum of one common publication between authors in these institutions. In the EU Airlines network, nodes are European airports, and links represent airline routes. We use Infomap to uncover their community structure [22].

4.3 Evaluation Measure

For a given fraction of initially activated nodes, we seed the IC model with the top-ranked nodes according to each community-aware centrality measure and compare the activation size once the diffusion process stops. The baseline is the degree centrality. The relative difference is defined as:

$$\begin{aligned} \varDelta A = \frac{A_c - A_b}{A_b} \end{aligned}$$
(9)

where \(A_c\) is the total number of activated nodes using a community-aware centrality measure c, and \(A_b\) is the total number of activated nodes with the baseline centrality. A positive \(\varDelta A\) value shows that the community-aware centrality measure outperforms the baseline centrality.
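The evaluation step can be sketched in Python as follows: the seed set is the top fraction of nodes ranked by a centrality, and Eq. (9) compares two activation sizes. The function names, the tie-breaking by node identifier, and the activation counts used in the example are hypothetical and are not results from the study.

```python
def top_fraction(scores, fraction):
    """Return the top-ranked nodes for a given fraction of initial spreaders."""
    n_seeds = max(1, round(fraction * len(scores)))
    # Assumption: ties are broken by node identifier to keep the ranking deterministic.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    return ranked[:n_seeds]

def relative_difference(a_c, a_b):
    """Delta A = (A_c - A_b) / A_b, as in Eq. (9)."""
    return (a_c - a_b) / a_b

# Hypothetical degree scores of a six-node toy graph (baseline centrality).
degree = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
seeds = top_fraction(degree, 1 / 3)

# Made-up activation sizes, for illustration only.
delta_a = relative_difference(a_c=115, a_b=100)
print(seeds, delta_a)
```

With these made-up counts, \(\varDelta A = 0.15\), which would mean that the community-aware measure activates 15% more nodes than the degree baseline.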

5 Empirical Analysis

In the IC model, one can consider identical threshold values for all the edges or randomly distributed threshold values. In this study, we investigate both types of thresholds. Let’s call “fixed threshold” the case where all the edges share the same threshold, that is, the same constraint on whether or not to accept an opinion. In the following, “random threshold” denotes the case where the thresholds are uniformly distributed. This allows incorporating variation between individuals, reflecting their different sensitivity to their neighbors’ opinions.

5.1 Synthetic Networks

We first analyze the behavior of the community-aware centrality measures in the IC model using fixed thresholds set on the edges. Figure 1 reports the relative activation size as a function of the fraction of initially activated nodes on the three synthetic networks.

Comparing the Activation Size with a Fixed Threshold: In the network with a strong community structure (\(\mu =0.05\)), when the fraction of initially activated nodes is between 0.01 and 0.06, Community-Based Mediator (\(\alpha _{CBM}\)) is the winner alongside Modularity Vitality targeting hubs and bridges (\(|\alpha _{MV}|\)). Comm Centrality (\(\alpha _{Comm}\)) then exhibits a high activation difference when this fraction is between 0.06 and 0.15. The behavior of Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) is very similar to Comm Centrality (\(\alpha _{Comm}\)) when the fraction of initially activated nodes is low. So, one can say that Community-Based Mediator, Comm Centrality, Participation Coefficient, and Modularity Vitality targeting hubs perform better than others with a low fraction of initially activated nodes. Then, Participation Coefficient (\(\alpha _{PC}\)) takes the lead in the medium range, peaking at \(\varDelta A\) = 15%. Finally, Modularity Vitality targeting hubs performs best with a high fraction of initially activated nodes.

Fig. 1. Relative difference of the activation size (\(\varDelta A\)) as a function of the fraction of initially activated nodes for three synthetic networks. The initial spreaders set is built according to the ranks associated with a given community-aware centrality measure. On the left, a fixed threshold is set. On the right, a random threshold is set where each \(\theta _{u,v}\) is distributed uniformly among edges.

In the network with a medium community structure (\(\mu \)=0.20), when the fraction of initially activated nodes is between 0.01 and 0.05, all community-aware centrality measures have a negative \(\varDelta A\) except for Community-Based Mediator (\(\alpha _{CBM}\)) and the Map Equation Centrality (\(\alpha _{MapEq}\)). Then, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) outperforms others with a fraction between 0.05 and 0.11. So, one can say that Community-Based Mediator, Map Equation Centrality, and Modularity Vitality targeting hubs perform better than others with a low fraction of initially activated nodes (i.e., \(\le \)0.11). One can observe very similar behavior for Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)), Comm (\(\alpha _{Comm}\)), and Participation Coefficient (\(\alpha _{PC}\)) when the fraction of initially activated nodes is in the medium range. Then, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) takes the lead with a high fraction of initially activated nodes.

In the network with a weak community structure, overall, the Map Equation Centrality (\(\alpha _{MapEq}\)) and Community-Based Mediator (\(\alpha _{CBM}\)) have positive \(\varDelta A\) with a fraction of initially activated nodes between 0.01 and 0.08. Comm (\(\alpha _{Comm}\)) performs better than others when this fraction is between 0.08 and 0.19. So, one can say that Community-Based Mediator, the Map Equation Centrality, and Comm Centrality perform better than others with a low fraction of initially activated nodes. Then, as this fraction increases above 0.19, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) takes over. Thus, with a medium to a high fraction of initially activated nodes, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) is the winner. The outperformance of \(\alpha _{MV}^+\) persists until the fraction of initially activated nodes reaches 0.50. However, compared with the network with a strong community structure, the magnitude of the relative activation size is less than 5%. Nonetheless, the magnitude of \(\alpha _{MV}^+\) is the same when the fraction of initially activated nodes is 0.50.

Comparing the Activation Size with a Uniform Threshold: In order to investigate the effect of uniformly distributed thresholds on the edges compared to fixed thresholds, we examine the behavior of the centrality measures with uniform thresholds. The results are given in Fig. 1 on the right.

In the network with a strong community structure, Participation Coefficient (\(\alpha _{PC}\)) performs the best when the fraction of initially infected nodes is \(\le \)0.36. Then, Modularity Vitality targeting hubs and bridges (\(|\alpha _{MV}|\)) takes the lead but with a smaller magnitude.

In the network with a medium community structure, Participation Coefficient (\(\alpha _{PC}\)) performs the best when the fraction of initially activated nodes is \(\le \)0.27. Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) takes the lead as the fraction of initially activated nodes increases. One can observe that all the ranking schemes of Modularity Vitality (i.e., \(\alpha _{MV}^+\), \(\alpha _{MV}^-\), and \(|\alpha _{MV}|\)) alongside the Participation Coefficient (\(\alpha _{PC}\)) compete when the fraction of initially activated nodes increases above 0.45.

In the network with a weak community structure, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) outperforms the rest of the centrality measures at all fractions of initially activated nodes. Its gain over the baseline reaches 9%, with a 5% difference with other centralities that rank second.

5.2 Real Networks

In order to study the consistency of the results in the previous section, the same experiments are done on real-world networks. Here, we use three networks: one with a strong community structure strength (EU Airlines), one with a medium community structure strength (GrQc), and one with a weak community structure strength (New Zealand Collaboration). The results are provided in Fig. 2.

Comparing the Activation Size with a Fixed Threshold: In the EU Airlines network, a network with a strong community structure, with a low fraction of initially activated nodes, Comm centrality (\(\alpha _{Comm}\)), Participation Coefficient (\(\alpha _{PC}\)), and Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) perform well. Then, with a medium and high fraction of initially activated nodes, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) takes the lead.

In the GrQc network, a network with a medium community structure, results convey that when the fraction of initially activated nodes is \(\le \)0.28, Comm (\(\alpha _{Comm}\)) performs the best. Then, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) takes the lead.

In the New Zealand Collaboration network, a network with a weak community structure, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) outperforms the rest of the centrality measures at all fractions of initially infected nodes. Its gain is approximately 5% more than the other centralities that rank second.

Fig. 2. Relative difference of the activation size (\(\varDelta A\)) as a function of the fraction of initially activated nodes for three real-world networks. The initial spreaders set is built according to the ranks associated with a given community-aware centrality measure. On the left, a fixed threshold is set. On the right, a random threshold is set where each \(\theta _{u,v}\) is distributed uniformly among edges.

Comparing the Activation Size with a Uniform Threshold: In order to investigate whether the results are stable compared to those seen with fixed thresholds, we also set the thresholds based on the uniform distribution.

It can be seen that with the EU Airlines network, the behavior of the centrality measures can be divided into two categories. The first category behaves similarly to what is observed when the threshold is fixed, but with a lower relative activation size (\(\varDelta A\)). For instance, when the fraction of initially activated nodes is 0.50, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) reaches \(\varDelta A\) = 18% when the thresholds are uniform compared to \(\varDelta A\) = 27% when the thresholds are fixed at \(\theta _{(u,v)}\) = 0.12. However, in both cases, it is still the best performing centrality at a high fraction of initially activated nodes. The second category is characterized by different trends, which result in either a higher or a lower magnitude of \(\varDelta A\). For example, when the fraction of initially activated nodes is less than 0.31, the outperforming centrality measure with a uniform threshold is Community Hub-Bridge (\(\alpha _{CHB}\)). However, when the thresholds are fixed, \(\alpha _{CHB}\) is ranked fifth in terms of performance.

In the GrQc network, a network with a medium community structure strength, one can note that with thresholds based on the uniform distribution, Comm Centrality (\(\alpha _{Comm}\)) performs slightly worse compared to when the threshold is fixed. The maximum \(\varDelta A\) reaches 6.5% while with fixed thresholds, it reaches 9.5%. Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) maintains its performance when the fraction of initially active nodes crosses 0.24 with a magnitude similar to when the threshold is fixed.

In the New Zealand Collaboration network, a network with a weak community structure, Modularity Vitality targeting hubs (\(\alpha _{MV}^+\)) outperforms the rest of the centrality measures at all fractions of initially infected nodes using both fixed thresholds and uniform thresholds. However, its outperformance is slightly lower with uniform thresholds.

6 Discussion and Conclusion

Identifying key nodes in networks is critical since they significantly boost or hinder the diffusion process. The importance of communities in networks cannot be ignored. Thus, using traditional centrality measures becomes inefficient in this case, and using recently developed community-aware measures becomes crucial. Despite the importance of the diffusion process and network topological properties, earlier studies have usually relied on the Susceptible-Infected-Recovered and Linear Threshold models to investigate the performance of these community-aware centrality measures only on real-world networks. In this work, we use the Independent Cascade model to study the behavior of eight community-aware centrality measures on real-world and synthetic networks.

In synthetic networks, when the threshold is fixed among the nodes, results suggest that, in general, the Map Equation Centrality and Community-based Mediator perform better than others when resources are minimal. In contrast, Comm Centrality and Modularity Vitality targeting hubs function better with a medium to a large number of initially activated nodes. When thresholds are uniformly distributed, the results are generally similar except for Participation Coefficient. It performs the best within the low to medium range of initially activated nodes in networks with strong and medium community structure strengths. Thus, allowing a disparity in thresholds among the nodes in such conditions reveals that targeting nodes inter-linked to many communities is more efficient. The behavior of the centrality measures in real-world networks is generally consistent with their behavior in synthetic networks. Indeed, it is more effective to target hubs with high budget availability in all of these networks. If the budget is limited, it is better to use community-aware centrality measures prioritizing bridges or well-mixed nodes, namely, Community-based Mediator, Comm Centrality, or Community Hub-Bridge.

Results in both real-world and synthetic networks also show that when thresholds on edges are set based on the uniform distribution, the diffusive power of the community-aware centrality measures weakens. This is expected compared to the fixed thresholds set in the range \(\theta _{(u,v)} \in [0.10, 0.16]\). Indeed, a variance in the thresholds hinders the diffusive power of the nodes initially activated based on any community-aware centrality measure. Results also show that as the community structure gets weaker, the diffusive power of the community-aware centrality measures decreases.

In previous work, Rajeh et al. [15] performed a similar analysis on real-world networks using the Susceptible-Infected-Recovered (SIR) diffusion model. Results showed that when resources are available, it is more beneficial to target distant hubs, whereas if resources are limited, bridges are better to target. These results are consistent with the results of the IC model. Nevertheless, results with the Linear Threshold model contradict those of the SIR and the IC models [14]. Indeed, results suggest that bridges and highly inter-connected nodes are better to target compared to hubs, regardless of the availability of resources in the Linear Threshold model. In the Linear Threshold model, the change in a node’s decision is highly dependent on the fraction of its neighbors that have adopted the same opinion. Therefore, hubs may inhibit the activation of other nodes, especially in dense communities where nodes have many neighbors imposing a higher threshold. On the contrary, bridges and inter-linked nodes, unlike hubs, are naturally spread out across the network, igniting a larger activation size.

These results demonstrate that three main parameters affect the performance of the community-aware centrality measures. The first is the community structure strength. The diffusive power of the community-aware centrality measures decreases as the community structure strength weakens. The second is the availability of resources. Bridges are more vital to target when resources are limited. The third is the dynamics of the diffusion model. It is more effective to target bridges than hubs in the Linear Threshold model. In contrast, hubs are more critical when resources are highly available in the Susceptible-Infected-Recovered model and the Independent Cascade model.