1 Introduction

The identification of influential nodes in complex networks has attracted considerable attention within the research community [36]. With the development of science and technology, social networking has become a modern tool for people to connect and spread news. Detecting a collection of the most prominent individuals in a network can therefore help monitor the dissemination of messages and rumours on social networks, discover social influencers, and so on. This is particularly important in the case of rumour propagation, as rumours spread quickly and widely and have great destructive power. It is therefore of great theoretical and practical importance to determine whether an influential spreader exists, and to recognize who that spreader is, in the process of spreading or controlling the spread of information [10, 14, 20].

Various approaches to the identification of influential nodes in complex networks, including single-centrality and group-centrality measures, have been explored in the past [21, 23]. A brief discussion of these concepts is given in Sect. 4.1. Beyond the basic centrality measures, Shapley value (SV)-based centrality algorithms have also been employed for similar problems [1, 22]. These rest on the concept of a game-theoretic network: a cooperative game is defined across the network in which agents are nodes, coalitions are groups of nodes, and coalition payoffs are specified to meet the requirements of a given application.

Although many methods have been proposed, most of them fail to take the community structure of graphs into account while identifying influential nodes. Community structure organizes the nodes into sets such that nodes within each set are densely connected to one another but loosely connected to nodes in other communities [27]. In large and complex real-world networks, it is rarely the case that a single node can disseminate a piece of information to the entire network. Therefore, we aim to identify a group of nodes that together can scatter information to the rest of the network. Higher-order community structure deals with small dense subgraph patterns as well as the density of connections between different communities. The benefit of identifying these is that we can find a set of communities (rather than an individual node or a single community) that can disseminate information to the majority of the network.

Various approaches have been used in the past for leveraging higher-order community structure, such as Harmonic Motif Modularity [12], HBNmining [15], and Attribute Homogeneous Motif [16]. The major drawback of all of them, however, is that they do not optimally deal with the lower-order topological structure: they focus only on the larger connected components and ignore the smaller, isolated, and fragmented components that may be present in a large real-world network. The EdMot approach [17] overcomes this by rewiring the network and by taking into account the higher-order community structure instead of only the lower-order one.

Graph clustering aims at partitioning the vertices of a graph into good clusters (based on different objective functions). Yang et al. [34] studied and compared eight state-of-the-art clustering algorithms, and also discussed the merits and demerits of each approach. Poulin et al. [26] propose the Ensemble Clustering for Graphs (ECG) algorithm, which outperforms all the algorithms studied by Yang et al. [34]. The ECG algorithm detects communities in a graph and is based on the multilevel Louvain algorithm [5]. Furthermore, it also provides information about the strength of the connections between the communities found.

Our proposed algorithm first employs the EdMot approach, which leverages higher-order characteristics at the level of dense subgraph patterns, to identify the community structure of a graph. We propose an enhancement of this approach by using the Ensemble Clustering for Graphs (ECG) algorithm for network partitioning: the rewired network is fed to the ECG algorithm. The ECG algorithm is based on the Louvain algorithm but produces more accurate results as a consequence of the ensemble of clusterings it uses. After this, the most influential node of each partitioned community is identified according to its Shapley centrality value using the EDGly algorithm. We thus obtain the most significant partitioned communities from the original network and then employ SV-based games to find the most significant node of each community identified. This gives us the list of influential nodes in the graph.

We have used the primary centrality measures, the game-theoretic approach, and our proposed algorithm to identify the top-k most prominent nodes, and we present a distinct and detailed contrast between all the approaches. Finally, we used the SIR model to evaluate the performance of the proposed algorithm in comparison with the centrality-based measures and the SV games. The SIR model simulates an information-spreading process in which the most significant nodes, identified via any of the algorithms, are considered infected and try to spread the information through the entire network. This helped portray the different aspects and accuracy of the various approaches vividly. More details on the working of the SIR model and the evaluation criteria are given in Sects. 7.1 and 7.2.

The main contribution of our work is to detect influential nodes in a graph by incorporating the community structure. The EDGly algorithm leverages the EdMot approach, an edge enhancement approach for motif-aware community detection, together with the ECG graph clustering algorithm. The SV of each node is then calculated and the most influential nodes are obtained.

The remainder of this paper is organized as follows. Section 2 gives a detailed study of the various works done in the related field. Section 3 describes the datasets used. Section 4 explains the preliminaries, while Sect. 5 explains the algorithmic flow of the proposed algorithm. Section 6 illustrates the working of the proposed algorithm with the help of a diagram. Section 7 describes the evaluations performed and the results obtained. Section 8 concludes the research work with detailed insight into its future scope.

2 Literature review

One of the fundamental research discussions in the literature on network analysis is the topic of connectivity. Domingos and Richardson [8] were the first to experiment with detecting the top-k most prominent nodes. They developed an algorithmic model for this problem by modelling a social network as a Markov random field, which mathematically characterizes the probability of occurrence of an event. Chen and Teng [7] explained that single-node centrality measures are suitable for assessing individual influence in isolation, while Shapley centrality assesses individuals' performance in group influence settings.

Wei et al. [32] explored the need to learn distributed vector representations for each vertex in a network, with emphasis on node classification and link prediction. A unique approach to identifying influential nodes is to treat the k-shell value of each vertex as its mass and the shortest-path distance between two vertices as their distance, and then apply the concept of a gravity centrality index [20].

We further investigated various kinds of centrality measures used for finding the most influential nodes in a network. Degree Centrality (DC), discussed by Gao et al. [10], is used to efficiently measure the significance of nodes; however, it does not take the overall, detailed anatomy of the network into consideration. Eigenvector Centrality (EC), according to Stephenson and Zelen [29], overcomes the defects associated with DC by taking into account the influence of the neighbours of the node under consideration. Betweenness Centrality (BC), as explored by Freeman [9], requires topological information about the network in advance. Al-Garadi et al. [2] describe how Closeness Centrality (CC) can be efficiently used to identify more than one prominent spreader.

Previous research by Tan et al. [30] on spreading rumours focused primarily on viral epidemics in communities. The standard model for viral epidemics is the susceptible-infected-recovered (SIR) model. Various methods have been proposed along these lines, including the weighted k-core decomposition method [2] and rumour centrality with a mass-centre technique [30]. Algorithms of low complexity that can be applied to large-scale networks have been devised by introducing entropy into a complex network and using the SIR model for evaluation; such a model [37] can be used for rumour management and targeted advertisements.

We studied the disadvantages associated with using centrality algorithms to find the most significant node in [7, 19, 24]. An attempt has been made to find the most influential node in a network using Mapping Entropy (ME), which reflects the correlation between a node and its neighbours [24]. We particularly inspected the application of ME to the ENRON email dataset, which is commonly used for the study of social networks [28].

Game theory is a significant paradigm that finds applications in various fields; it is used in statistics and business analytics for modelling the interactions among participating agents [4]. One of its most prominent applications is finding the most influential node within a network. Typical social network analysis cannot capture the dynamics of strategic interactions among the individuals in a network; our proposed model is based on cooperative game theory, which addresses this issue [33]. The elemental constituents of intricate interactions in a network can be efficiently processed using a rich class of games, called influence games, as demonstrated in [13]. Game-theoretic network centrality measures, along with their computational complexity and axiomatic properties, are neatly explained in [31].

We studied the Explosion-Trust (ET) game model [18], which remarkably explains how a rumour-spreading model can be constructed using game theory by considering two very significant factors: the rumour explosion degree and the trust degree of the source node. In complex networks, dense clusters [14] can be formed by detecting influential nodes using degree centrality and selecting the most influential nodes as the initial spreaders; this ensures not only that the spreader nodes are prominent themselves, but also that they are relatively well dispersed across the network. In [36], the probability of information transfer between any pair of nodes was combined with the k-medoid clustering method; such an efficient algorithm helps to identify powerful nodes in large-scale complex networks with community structures.

Amati et al. [3] aimed to find the most influential nodes in various datasets scraped from Twitter. Classical centrality measures such as degree centrality, closeness centrality, eigenvector centrality, and PageRank centrality were employed for this purpose. They also studied how the influence of spreaders evolved over time with the help of a Dynamic Retweet Graph (DRG), which tracks how retweets of a tweet are made over time; this helped them better analyze the relationships between users. They concluded that closeness centrality gave too many nodes with high centrality and was therefore not a good indicator, while the remaining measures gave broadly similar results in terms of the most influential nodes.

3 Data sets

This section gives an elaborate description of the two datasets that have been used for our implementation, described below.

  • The first dataset is the USAir97 dataset [6], which has been transformed into an undirected network of 332 nodes and 2126 edges, where each node represents an airport and each edge reflects a direct airline route between two American airports. Here, weights represent the normalized distance between two airports.

  • The second dataset is a small Facebook network [25] made up of 899 users serving as nodes and 7089 edges. An edge occurs between two vertices if messages have been exchanged between the two users, and edges are weighted by the number of messages exchanged.

3.1 Unweighted graph

An unweighted graph can be technically defined as a graph G(N, E) having ‘n’ nodes represented by set N and ‘e’ edges represented by set E consisting of unordered pairs, such that (n1, n2) = (n2, n1) for (n1, n2) ∈ E with n1, n2 ∈ N; no weight is associated with any edge. Games 1 and 2 are played by creating an unweighted network from both the datasets.

3.2 Weighted graph

A weighted graph can be technically defined as a graph G(N, E, W) having ‘n’ nodes represented by set N and ‘e’ edges represented by set E, with (n1, n2) ∈ E and n1, n2 ∈ N, together with a weight function W that assigns a real value W(n1, n2) to each edge. Games 3–5 are played by creating a weighted network from both the datasets.

4 Preliminaries

Section 4.1 explains the centrality measures used, Sect. 4.2 describes the process used for community detection in detail, and Sect. 4.3 introduces coalition games.

4.1 Centrality measures

We have used certain centrality measures in our implementation to compare the results of our proposed algorithm with these already existing measures. For more insight into these centrality measures, along with their mathematical derivations, refer to the work of Zhan et al. [35]. The centrality measures used in our study are as follows (a short computation sketch is given after the list):

  (i) Eigenvector Centrality (EC): EC [29] considers the relative power or significance of nodes. Each node is assigned a value representing its relative significance, reflecting the fact that nodes connected to high-power nodes have greater influence over the network than those connected to low-power nodes.

    $$C_{e}\left( {v_{i} } \right) = \frac{1}{\lambda }\sum _{{j \in M(v_{i} )}} C_{e} (j)$$
    (1)

    In Eq. (1), \(M({v}_{i})\) is the set of neighbours of \({v}_{i}\), and \(\lambda\) is a constant. Ce(vi) is the measure of EC for vertex vi.

  (ii) Betweenness Centrality (BC): BC [9] measures the extent to which a given node lies on the shortest paths between other pairs of nodes. It is computed as the sum, over all pairs of nodes, of the fraction of shortest paths between the pair that pass through the given node.

    $$C_{b}\left({v}_{i}\right)=\sum_{j \ne i \ne k}\frac{{g}_{jik}}{{g}_{jk}}$$
    (2)

    In Eq. (2), \({g}_{jk}\) is the total number of shortest paths from \({v}_{j}\) to \({v}_{k}\), and \({g}_{jik}\) is the number of paths that pass through \({v}_{i}\). Cb(vi) is the measure of betweenness centrality for vertex vi.

  (iii) Closeness Centrality (CC): CC [2] measures how quickly information can spread from a node to all the other nodes in a network. Here it is measured as the sum of the inverses of the shortest-path distances between the given node and all other nodes in the network.

    $$C_{c}\left({v}_{i}\right)=\sum_{j=1}^{N}\frac{1}{d({v}_{i}{,v}_{j})}$$
    (3)

    In Eq. (3), \(d({v}_{i},{v}_{j})\) is the distance between vertices \({v}_{i}\) and \({v}_{j}\), and Cc(vi) is the measure of closeness centrality for vertex vi.
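All three measures are available off the shelf in the networkx library, which we also use for graph construction in Sect. 5. The sketch below, on a stand-in graph, shows how per-node scores and a top-k ranking can be obtained; note that networkx's closeness_centrality normalizes by average distance, so harmonic_centrality is the variant that matches the sum-of-inverse-distances form of Eq. (3) literally.

```python
import networkx as nx

G = nx.karate_club_graph()  # stand-in graph; the paper uses USAir97/Facebook

ec = nx.eigenvector_centrality(G)   # Eq. (1)
bc = nx.betweenness_centrality(G)   # Eq. (2), normalized by networkx
cc = nx.harmonic_centrality(G)      # sum of inverse distances, as in Eq. (3)

# Rank nodes and report the top-5 per measure.
for name, scores in [("EC", ec), ("BC", bc), ("CC", cc)]:
    top = sorted(scores, key=scores.get, reverse=True)[:5]
    print(name, top)
```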

Single-node centrality measures suffer from a key disadvantage: they fail to capture the effect that groups of nodes have on network functionality. Group centrality measures, in contrast, focus on groups of nodes rather than on their individual functionalities. Nonetheless, group centrality also has a drawback: it operates on a priori determined groups of nodes, which creates ambiguity when prioritizing individual nodes within the network.

4.2 Community detection

The algorithm divides the task of detecting influential nodes into two parts:

  (i) Detecting significant communities in a graph

    We used the ECG algorithm to detect the communities present in the graph, primarily because of its accuracy and the control it offers over the number of communities found. The latter enables us to experiment with the number of communities and observe the effect on the infected scale in our SIR model. The ECG algorithm is based on consensus clustering and has two main steps: generation and integration. The generation step uses the Multilevel (ML) Louvain algorithm [5], which creates a hierarchy of partitions; ECG uses randomized partitions at level 1. The number of randomized partitions used determines the size of the ensemble of clusters. The integration step applies ML to a weighted version of the graph G, where the weight of each edge is given by:

    $${W}_{P}\left(u, v\right)= \begin{cases} {w}_{*}+ \left(1- {w}_{*}\right)\cdot \frac{\sum_{i=1}^{k}{v}_{{P}_{i}}(u, v)}{k}, & \text{if } (u, v) \text{ is in the 2-core of } G\\ {w}_{*}, & \text{otherwise}\end{cases}$$
    (4)

    In Eq. (4), \(0 < {w}_{*} < 1\) is the minimum ECG weight and \({v}_{{P}_{i}}\left(u, v\right)= \sum_{j=1}^{{l}_{i}}{1}_{{C}_{i}^{j}}\left(u\right)\cdot{1}_{{C}_{i}^{j}}\left(v\right)\) indicates whether the vertices u and v co-occur in the same cluster of Pi. The k-core of a graph is its maximal subgraph in which every vertex has degree at least k; here, the 2-core of G is its maximal subgraph in which each vertex has degree at least 2.
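    The following is a minimal, illustrative condensation of these two ECG steps using networkx and the python-louvain package; it is our own sketch based on the description above, not the reference implementation, and the function name is ours.

```python
import networkx as nx
import community as community_louvain  # pip install python-louvain

def ecg_sketch(G, k=16, w_min=0.05):
    """Generation: k randomized level-1 Louvain partitions vote on each
    edge. Integration: the votes become the Eq. (4) edge weights, and
    Louvain is run once more on the re-weighted graph."""
    core2 = nx.k_core(G, 2)          # 2-core: every vertex has degree >= 2
    votes = {e: 0 for e in G.edges()}
    for _ in range(k):
        dendro = community_louvain.generate_dendrogram(G)
        part = community_louvain.partition_at_level(dendro, 0)  # level-1 partition
        for u, v in G.edges():
            votes[(u, v)] += int(part[u] == part[v])
    W = nx.Graph()
    for (u, v), c in votes.items():  # Eq. (4): co-occurrence count -> weight
        w = w_min + (1 - w_min) * c / k if core2.has_edge(u, v) else w_min
        W.add_edge(u, v, weight=w)
    return community_louvain.best_partition(W, weight='weight')
```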

    To obtain the network that ECG partitions, we proposed an enhanced version of the EdMot algorithm, replacing the Louvain algorithm used for network partitioning with the ECG algorithm. The EdMot approach for network rewiring is used to increase connectivity within the connected components of the graph; it takes into account the higher-order community structure instead of only lower-order structures. The algorithm aims to unravel characteristics at the level of network motifs: basic interaction patterns that recur throughout real networks much more often than in random networks. Capturing these patterns is crucial when considering the influence of nodes in a graph.

  (ii) Finding the most influential node from each community

    For finding the most influential node per community in a graph, we use the Shapley value (SV), an important solution concept proposed by Shapley on the basis of marginal contributions [1, 22].

    Player i's SV, denoted by \({SV}_{i}\left(v\right)\), is equal to the weighted mean of i's marginal contributions to each coalition C to which the player may belong.

    $$SV_{i}\left( v \right) = \frac{1}{n!}\sum\limits_{{\pi \in \Pi (I)}} {\{ v\left( {C_{\pi } \left( i \right)\bigcup {\{ i\} } } \right) - v(C_{\pi } \left( i \right))\} }$$
    (5)

    In Eq. (5), the aggregate count of players is given by ‘n’, \(\Pi(I)\) denotes the set of all permutations of the ‘n’ players, and \(C_{\pi}(i)\) is the coalition of players that precede player i in permutation \(\pi\). More insight into the SV and its derivation can be found in [1, 22].
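    Computing Eq. (5) exactly requires enumerating all n! permutations, which is intractable for real networks; for the specific games used in this work, [1, 22] derive closed-form, polynomial-time expressions instead (see Sect. 5). For an arbitrary characteristic function v, the SV can also be approximated by sampling permutations, as in the following generic sketch (our illustration, not part of EDGly itself).

```python
import random

def shapley_monte_carlo(players, v, m=2000):
    """Approximate Eq. (5) by sampling m random permutations instead of
    enumerating all n!; v maps a coalition (a set) to its value."""
    sv = dict.fromkeys(players, 0.0)
    for _ in range(m):
        order = list(players)
        random.shuffle(order)
        coalition, prev = set(), v(set())
        for p in order:
            coalition.add(p)
            cur = v(coalition)
            sv[p] += cur - prev  # marginal contribution of p to its predecessors
            prev = cur
    return {p: s / m for p, s in sv.items()}
```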

This idea is based on the theory of cooperative games, a branch of game theory that allows players to form coalitions to maximize their yields. Cooperative game theory provides a high-level approach, as it describes only the structure, strategies and payoffs of coalitions.

4.3 Coalition games

Coalitions are groups of players that form the fundamental elements of decision making. They are assumed to uphold cooperative conduct, which makes it reasonable to view these games as a contest between alliances of participants rather than between separate players. The core assumption is that, as the game proceeds, a grand coalition comprising all participants will eventually form.

5 Proposed EDGly algorithm for influential node detection

Focusing on game theory's Shapley algorithm, we referred to the algorithms described in the work of Michalak and Szczepański [1, 22], which establishes exact analytical formulae for SV-based centrality in both weighted and unweighted networks, together with polynomial-time algorithms for computing them.

Step 1: Creation of weighted and unweighted network graphs:

Graphs are created using the networkx library in Python for both the USAir97 and Facebook datasets. Games 1 and 2 require unweighted graphs, whereas the remaining games require weighted graphs; please refer to Sect. 3.
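A sketch of this step is given below, assuming edge lists exported from the two datasets; the file name and column layout here are illustrative, not the originals.

```python
import networkx as nx

# Unweighted, undirected graph for Games 1 and 2 (edge weights ignored).
G_unweighted = nx.read_edgelist("usair97_edges.txt", nodetype=int)

# Weighted graph for Games 3-5; the third column of the file holds the
# edge weight (normalized distance for USAir97, message count for Facebook).
G_weighted = nx.read_weighted_edgelist("usair97_edges.txt", nodetype=int)
```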

Step 2: Detection of communities:

Connectivity within the network is enhanced by network rewiring, as explained in Algorithm 1. After this, communities are detected using the ECG algorithm. In line with the EDGly algorithm, we pick the most influential node from each community (as described in Step 3). It was observed that the number of communities identified does not change the results significantly; we report seven communities and their corresponding influential nodes.

Algorithm 1
Step 3: Coalition games based on the Shapley algorithm:

The SV is the average marginal contribution of a node to the characteristic function value, taken across all potential coalitions. The Shapley algorithm is applied to find the top-k nodes that are likely to be the most prominent ones.

Specifically, we concentrated on five coalition games defined over the network that differ in how they use degree and distance. Each game has a characteristic function v(C) that represents the value, i.e., the prominence, of a given coalition C. For more insight into the working of these games and their underlying mathematics, refer to [1, 22]. The game descriptions are as follows:

  (i) Game 1: In this game, the value of a coalition is determined by the nodes in it together with all the nodes immediately reachable from it, i.e., one hop away from some node ni ∈ N(G). Let each random permutation be denoted by Pi, the neighbours of node ni in the graph G(N, E) be denoted by ni.neighbours, and the degree of node ni be denoted by ni.degree. Algorithm 2 describes the procedure involved in the SV calculation (a sketch of the resulting closed form is given after the game descriptions).

  (ii) Game 2: In many real-life social scenarios, taking into account only the nodes that are directly attached to each other is not enough; a rumour source will, more often than not, affect nodes farther away as well.

    To take relationships with farther nodes into account and generalize the game, we introduce a value, p, denoting the number of agents in a coalition that a node must be adjacent to. In this game, a node is considered ‘influenced’ if at least p of its neighbours are influenced. We divide the analysis of this game into two cases: first, where the degree of the node is less than p, and second, where the degree is at least p. Algorithm 3 describes the procedure involved in the SV calculation.

  (iii) Game 3: In this game, we introduce the concept of weighted graph networks. This game is an extension of game 1; it uses Dijkstra's algorithm to compute the distance between two nodes. The cutoff value, d, is the maximum permissible distance of a node from any member of a given coalition. The extended degree is defined as the size of the set of all nodes that are at most distance ‘d’ away from the node ni. Algorithm 4 describes the procedure involved in the SV calculation.

  (iv) Game 4: This is a generalization of game 3. Here we work with the assumption that a node closer to a coalition has a greater effect on it than a node farther away, even if both nodes satisfy the cut-off criterion of game 3. For this purpose, we introduce a positive-valued decreasing function f(x): f(d) scales the contribution a node makes to a coalition that is ‘d’ units away, so closer nodes contribute more to the SV. The marginal contribution of each node ni through each node nj ∈ ni.neighbours, for each coalition Ci, gives the SV, as shown in Algorithm 5.

  (v) Game 5: This is a generalization of game 2 to weighted networks. Here a cut-off value Wcutoff(ni) is defined for each ni ∈ N(G). For every coalition C, W(C, nj) = ∑ni∈C∩nj.neighbours W(ni, nj), where W(ni, nj) represents the weight of the edge between nodes ni and nj (0 if no edge exists).

A node ni marginally contributes a node nj ∈ ni.neighbours to the value of coalition Ci if and only if nj ∉ Ci and Wcutoff(nj) − W(ni, nj) ≤ W(Ci, nj) < Wcutoff(nj). Algorithm 6 describes the procedure for calculating the SVs.
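As an example of these closed forms, [22] shows that for Game 1 the SV of a node reduces to SV(vi) = Σu∈{vi}∪N(vi) 1/(1 + deg(u)): node vi marginally adds a node u to a coalition exactly when vi precedes u and all of u's other neighbours in a random permutation, which happens with probability 1/(1 + deg(u)). A short sketch follows (our own code, standing in for the pseudocode of Algorithm 2):

```python
import networkx as nx

def shapley_game1(G):
    """Exact SV for Game 1, where a coalition's value is the number of
    nodes in it or one hop away from it: SV(v) = sum over u in {v} U N(v)
    of 1 / (1 + deg(u)), following the closed form in [22]."""
    sv = {}
    for v in G.nodes():
        sv[v] = 1.0 / (1 + G.degree(v))
        for u in G.neighbors(v):
            sv[v] += 1.0 / (1 + G.degree(u))
    return sv
```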

Step 4: Estimating centrality measures:

After playing the five coalition games, we computed multiple centrality measures to determine the network's most powerful nodes, i.e., those with the highest scope or effect. To generate an elaborate comparison, various network centrality measures, namely EC, BC, and CC, have been used.

6 Illustration with example

A synthetic network is constructed to illustrate the working of our proposed EDGly algorithm, where nodes in different communities are denoted with different colours and the edges represent the connections between communities.

In the first step, a hypergraph is formed, as shown in Fig. 1. After the largest connected component is partitioned into modules in Step 2, two modules are identified and a new edge set is constructed to derive a clique from each module. A clique is a graph that has an edge connecting every pair of vertices; the new edges are shown as dashed lines. A rewired network is obtained in Step 3 by merging the new edge set into the original network. Finally, the ECG algorithm is used to partition the rewired network into communities.

Fig. 1 Algorithmic flow for EDGly

After the community structure of the graph is obtained via the ECG algorithm in Step 4, the game-theoretic Shapley centrality value is used to detect the most influential node of each community. Accumulating these nodes over all communities of the graph gives the set of most influential nodes of the network.

Table 1 depicts the results of using EDGly on the synthetic dataset. Three communities are detected, and the influential nodes are then obtained by applying the Shapley centrality using Games 1–5. The three influential nodes obtained from games 1 and 4 are nodes 1, 9 and 12. Similarly, for game 2 the nodes are 3, 5 and 11; for game 3 they are 1, 6 and 12; and for game 5 they are 3, 9 and 11. We obtain the set of influential nodes by accumulating the most influential node from each community.

Table 1 Results of EDGly on the synthetic dataset

7 Evaluations and results

Section 7.1 explains the SIR model [37] used to compare our model with prominent existing centrality algorithms for identifying the network's most powerful nodes. To evaluate and compare the performance of EDGly, we have used various benchmark network centrality measures such as EC, BC, and CC; these centrality measures describe the influence of nodes on the basis of the structure of the network and help depict accurately the impact of community structure on the influence of nodes in a graph. We have further used the SIR model and compared the algorithms on the basis of their spreading abilities, measured by the final infected scale F(tc) obtained from the SIR simulation, which is widely used to measure the spreading ability of nodes.

To generate an elaborate comparison, Sect. 7.2 explains the evaluations performed, and Sect. 7.3 shows the detailed comparison of the results of our proposed EDGly algorithm with those of the Shapley centrality measure alone and of Shapley centrality combined with EdMot-Louvain community detection.

7.1 SIR model

This model is used to study the dynamics of rumour spreading. There are three types of nodes in a typical rumour propagation model: (i) susceptible nodes capable of being infected, (ii) infected nodes capable of further spreading the rumour, and (iii) recovered nodes that are healed and no longer capable of infection. The most influential nodes identified by any of the algorithms are considered to be the initial sources of rumour propagation; these nodes are marked as infected, and all the remaining nodes are marked as susceptible.

We used the USAir97 dataset [6] to find the most significant airports in an air-travel network. We assumed the initially infected nodes to be the most influential airport from each graph community. As each edge holds an attribute for the number of flights from one airport to another, we were able to determine the coverage of all locations reached from the initial influential airports at the end of the simulation.

In the Facebook dataset [25], we aimed to depict a simulation of rumour spreading. The most influential nodes identified by any of the algorithms were considered to be the initial sources of rumour propagation. These nodes were marked as infected and all the remaining nodes were marked as susceptible. At the end of the simulation, we were able to understand the influence of each person predicted by our algorithm. A similar understanding can be used for targeted advertising. The versatility of the model can be gauged by analyzing results on datasets from various domains.

Each iteration is considered to be one time step, in which each infected node can spread the rumour to its susceptible neighbours with probability β. Also, each currently infected node recovers with probability γ, i.e., it will not spread the rumour in further iterations.
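A minimal sketch of this simulation on a networkx graph is given below; it returns the infected scale F(t) of Eq. (6) (Sect. 7.2) at each time step, with β and γ defaulting to the values used in our experiments.

```python
import random
import networkx as nx

def sir_simulation(G, seeds, beta=0.8, gamma=0.2):
    """Discrete-time SIR: the seed nodes start infected; at each step every
    infected node infects each susceptible neighbour with probability beta
    and then recovers with probability gamma. Returns F(t) per time step."""
    n = G.number_of_nodes()
    infected, recovered = set(seeds), set()
    F = []
    while infected:
        new_inf = set()
        for u in infected:
            for w in G.neighbors(u):
                if w not in infected and w not in recovered and random.random() < beta:
                    new_inf.add(w)
        recovered |= {u for u in infected if random.random() < gamma}
        infected = (infected | new_inf) - recovered
        F.append((len(infected) + len(recovered)) / n)  # Eq. (6)
    return F
```

In our evaluation this simulation is run 100 times per seed set and the F(t) curves are averaged, as reported in Sect. 7.3.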

7.2 Evaluations

For the purpose of our experiments, we set β = 0.8 and γ = 0.2. The iterations continue until there are no more newly infected nodes. The term R0 = β/γ is known as the reproduction rate and is extremely crucial to the spreading process: if R0 > 1, the rumour will spread; otherwise, it will die out soon. Spreading potential is assessed at each time step ‘t’ by the infected scale F(t) [11], obtained using the formula given by Eq. (6).

$$F\left(t\right)= \frac{{n}_{i(t) }+ {n}_{r(t)}}{n}$$
(6)

Here, ni(t) is the number of infected nodes at time t, nr(t) is the number of recovered nodes at time t, and n is the total number of nodes. The larger the value of F(t), the greater the influencing capacity of our initial infected set.

After a few iterations, when the spread of the rumour starts to slow down, we reach a point of saturation where no new nodes get infected. This is considered a stable state and is represented by F(tc), as shown in Eq. (7).

$$F\left({t}_{c}\right)= \frac{{n}_{r({t}_{c})}}{n}$$
(7)

7.3 Results

Tables 2 and 3 display the F(tc) results obtained in the SIR simulations for each of the five coalition games. From each of the seven communities identified, the most influential node is identified via the EDGly algorithm. The final infected scales produced by these spreading nodes are then compared for each algorithm. It can be seen that the nodes identified by the EDGly algorithm consistently result in a higher final infected scale F(tc) and hence a better coverage of the network.

Table 2 Comparison of infected scale at stable state for various benchmark algorithms on the USAir97 dataset
Table 3 Comparison of infected scale at stable state for various benchmark algorithms on the Facebook dataset

It was observed that changing the number of communities identified in the graph does not significantly vary the infected scale; hence, we have fixed the number of communities at seven.

In Figs. 2 and 3, each graph shows the experimental infected scale over time, F(t) versus t, obtained from the SIR model for the five different coalition games, using the USAir97 and Facebook datasets respectively. The results are averages over 100 independent runs of the SIR simulation. A larger F(t) implies a larger scale of control of the network within the same spreading time t, which in turn indicates a faster spreading rate. It can be seen that the nodes identified by the EDGly algorithm (navy blue) result in a significantly higher final infected scale F(tc); thus, EDGly performs significantly better than the centrality measures EC, BC and CC (brown, light blue and sea green respectively).

Fig. 2 Spreading speeds and influence of the initially infected nodes for the various methods employed on the USAir97 dataset

Fig. 3 Spreading speeds and influence of the initially infected nodes for the various methods employed on the Facebook dataset

Further, EDGly also consistently outperforms the other algorithms. The addition of community detection using the EdMot-Louvain algorithm results in a significant jump in the infected scale, and the subsequent replacement of Louvain by the ECG algorithm improves the model further. Thus, it can be seen from the figures that EDGly spreads faster than the other benchmark methods in all games, in almost all scenarios, and over varied data.

8 Conclusion

We aimed at organizing the nodes into community structures and employing an SV-based centrality algorithm to find the most influential node of each community. The EdMot approach, in tandem with the ECG algorithm, was used to detect communities in a graph because of its unique quality of considering not only lower-order structures but also higher-order community structures. Finally, the SIR model was used to evaluate the spreading ability of each influential node identified during the process.

To evaluate our algorithm in various real-world scenarios, we examined five different game situations, thereby taking into consideration various approaches to determining the most influential nodes in a given dataset. This helped us gain a deeper and more holistic understanding of the game-theoretic algorithm.

It is observed that our proposed EDGly algorithm not only compares well with other commonly accepted centrality measures, but also drastically improves results by adding the task of community detection to the idea of using coalitional game theory. This is in line with our intuitive prediction that applying coalitional game theory alone to an interconnected network, where each node acts as a player maximizing its cumulative yield over all permutations, imposes a certain level of restriction on the possible permutations.

With the application of community detection, we were able to analyze the network at a more individual level while combining the positive effects of coalitions in game theory. This approach thus strikes a fine balance between the two and performs better than state-of-the-art approaches.

The model has shown significant success, as the most prominent nodes are successfully identified for both datasets used. For future work, we plan to extend the idea of finding the most powerful nodes in social networks to numerous other similar applications, such as the Internet, urban networks, and the role of a given node in disease dynamics. This will help us understand our algorithm's efficiency and accuracy in multiple real-world applications.

Further, various optimization techniques for the SV algorithm, for example fuzzy logic, will be implemented for mining much larger social networks and to improve accuracy and other relevant metrics of the project. A fuzzy-based implementation will address various complexities and limitations that we are currently encountering.