Abstract
The influence maximization problem in social networks is an optimization problem in viral marketing. This problem is concerned with identifying a certain number of people with the most influence in the social network level. Considering the NP-hard of this problem, finding an optimal solution with acceptable accuracy and the low running time is of the high importance. To this purpose, GIN (Group of Influential Nodes) algorithm is presented in this article which creates different groups of graph nodes with more connections than other groups. Then, it selects specific nodes from each group to reduce the search space to find the most influential nodes. Following the greedy method, it selects the seed nodes with the highest expected diffusion value. Experimental results show that the GIN algorithm has provided high influence spread along with low running time in comparison algorithms on all seven real-world datasets.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
The social networks like LinkedIn, Facebook, and Twitter provide the grounds for people to interact with no time and space limitation [1]. Consequently, information diffusion occurs across a wide range of social networks at high speed. That’s why business companies are using the speed of information diffusion on social networks to achieve widespread advertising and business optimization. Given the limited advertising resources, the main challenge in this problem is to select a specific set of influential persons that most effectively on other people in a short time. To solve this challenge, the influence maximization problem in social networks has been provided.
The influence maximization problem identifies the active nodes as seed nodes. Algorithms in this problem take the graph G and a number k as input and generate seed set, with the intention that the expected number of nodes influenced by the seed set by the stochastic process of the diffusion model, One major problem in the influence maximization problem to maximize the expected size of the final active set, given some constraints on the seed set. The diffusion process of the seed nodes is performed to maximize the influence spread. The influence of seed nodes is determined by the number of nodes activated. The influence maximization problem under both IC and LT models are NP-hard. In other words, the goal of the influence maximization problem is to find initial active people who have the most influence on other people in a short time under a diffusion model [2, 3].
Viral marketing [1, 4,5,6,7,8,9], terrorist attack prevention [4], and network control [10, 11] constitute the applications of the influence maximization problem that viral marketing has attracted researchers more than other applications. In viral marketing, unlike traditional marketing where information diffusion was through newspaper advertising and the mass media, the focus is on peoples effective through family members, relatives, acquaintances, and friends by word of mouth [12]. In this way, diffusion is information transmission from person to other person, such as virus transmission [1]. Domingos and Richardson were the first who addressed the influence maximization problem [6]. Kempe et al. formulated and proved that this problem is an NP-hard [2], who then proposed an approximation algorithm called General Greedy algorithm to solve the influence maximization problem [2]. This algorithm has very low speed and high running time [2]. Some researchers proposed some algorithms to overcome drawbacks in the General Greedy algorithm. The existing influence maximization algorithms have several major drawbacks:
1. They have no effective methods to reduce the search space for influence spread calculation and do not optimally select the candidate nodes.
2. They are not suitable for large-scale networks due to high computational time.
Handling the NP-Hardness and to optimize the seed selection by reducing search space while keeping the running time faster are important to influence maximization problem. So, to address these issues, we propose a new algorithm named GIN heuristic algorithm. In this algorithm, by the identified nodes with a more common number of connections and neighbors than other nodes of the graph, different groups of nodes are generated. By selecting certain nodes from each group, the search space where the final effective nodes are to be found is reduced. Its goal is to increase the speed of the algorithm. During this process, the nodes with the highest expected diffusion value are selected as the seed nodes.
The contributions of our work are as follows:
• Use a grouping method for graph nodes.
• We use candidate nodes optimally to reduce the search space for influence spread calculation.
• The experimental results on seven real-world networks show that the proposed algorithm GIN performs better than the other algorithms in terms of influence spread and running time.
The structure of this article is organized as follows. In Sect. 2, the related works for influence maximization problem is introduced. In Sect. 3, problem definition is expressed. Our proposed algorithm is presented in Sect. 4. Experiments results and evaluations are described in Sect. 5 and we conclude this article in Sect. 6.
2 Related works
Identifying seed nodes is one of the key issues for influence maximization problem in social networks. This is one of the NP-hard problems [2] that has many challenges like high influence spread and low running time, especially on large-scale graphs. Therefore, researchers have presented different algorithms to solve the influence maximization problem. For the first time, Kempe et al. presented an approximate algorithm called General Greedy algorithm with high influence spread that has an optimal approximation [2]. However, this algorithm has a high running time, especially on large-scale graphs [13]. So they proposed the High Degree algorithm by selecting nodes with the highest degree in graph [2]. The High Degree algorithm is very fast, even on large-scale graphs, but has a lower influence spread than the General Greedy algorithm.
Leskovec et al. proposed the CELF algorithm [14]. This algorithm uses lazy evaluation in functions that have the submodularity, significantly reduces influence spread evaluation. Although the CELF algorithm is faster than the General Greedy algorithm, it does not have acceptable speed. Then, Chen et al. proposed the NewGreedyIC algorithm to improve the CELF algorithm [15]. This algorithm computes the influence spread by creating sample graphs from the original graph. However, the NewGreedyIC algorithm improves the running time of the CELF algorithm, but this algorithm still needs to improve the running time, especially on large-scale graphs.
Chen et al. presented the DegreeDiscount algorithm to improve the running time of the CELF algorithm [15]. This algorithm uses a degree discount method for influence spread computations. The DegreeDiscount algorithm has a better running time than the CELF algorithm but does not guarantee optimal approximation. Also, Jiang et al. presented an evolutionary algorithm SA to improve the DegreeDiscount algorithm [16]. This algorithm has gained more influence spread than the DegreeDiscount algorithm, with a high number of iterations and increased running time. Then presented the SAEDV algorithm to reduce the running time of the SA algorithm [16]. In this algorithm, the seed nodes are selected with the maximum expected diffusion value based on the SA algorithm. The SAEDV algorithm has a lower running time than the SA algorithm.
Goyal et al. introduced the SIMPATH algorithm to improve the running time of the CELF algorithm [17]. This algorithm computes the influence spread without using Monte Carlo simulation and using simple paths count. The SIMPATH algorithm has a low memory overhead but does not guarantee optimal approximation. Another algorithm proposed to solve the influence maximization problem is the MIA algorithm [18]. This algorithm was proposed by Wang et al. to improve the computation of the influence spread of the DegreeDiscount algorithm. In the MIA algorithm, using the Dijkstra algorithm, the shortest maximum influence path for each node of the graph is calculated. Although the MIA algorithm has an acceptable running time by eliminating the number of Monte Carlo simulation, the disadvantage of this algorithm is the high memory overhead.
Jung et al. proposed the IRIE algorithm to improve the MIA algorithm [19]. In this algorithm, the influence spread is calculated recursively based on the influence spread of the neighbors of each node in the graph. In the IRIE algorithm, the memory overhead is very low, but the influence spread is low compared to the General Greedy algorithm. In the following, He et al. presented the CLDAG algorithm to improve the General Greedy algorithm [20]. The focus of this algorithm is on blocking the influence maximization problem. Thus, the most important advantage of this algorithm is not dependent on the number of seed nodes, and its main disadvantage is the high memory overhead.
Tang et al. proposed the TIM algorithm to improve the SIMPATH algorithm [21]. In this algorithm, the first set of RR-Set includes nodes that have paths to each other, and then the seed nodes are selected using graph nodes covering. Benefits of TIM algorithm are low running time and low memory overhead. Then, Lu et al. developed the RR-CIM algorithm to improve the High Degree algorithm [22]. This algorithm solves the influence maximization problem by using complementary products. The parameters of this algorithm close the influence maximization problem to the real-world, but the algorithm has a high running time.
In the following, He et al. presented a new algorithm based on community detection to improve the High Degree algorithm [23]. In this algorithm, each community is considered as one node. Then influential nodes are selected based on the red-black tree. This algorithm avoids the rich-club phenomenon but has a high running time. Also, Morone et al. Presented the CI algorithm based on localization of influence spread computation [24]. In this algorithm, the influence spread computation in a circle is limited to the radius L. The main advantage of the CI algorithm is to provide acceptable influence spread into high-density social networks, however, that the calculation of influence spread and running time of the CI algorithm depends on the parameter L and the number of seed nodes.
Then the topology-based algorithm LIR is presented by Liu et al. to improve DegreeDiscount and NewGreedyIC algorithms [25]. First, in this algorithm is calculated the LI of each node, which is indicated the local index of each node that avoids the Rich-club Effect but does not guarantee optimal approximation. The CoFIM algorithm is also based on community detection proposed by Shang et al. to improve High Degree algorithm [26]. In this algorithm, first, the influence spread is calculated based on diffusion between communities and then diffusion inside communities. The advantage of this algorithm is low running time, but the overlapping nodes are not considered. Then Wu et al. developed the LAIM algorithm to improve CoFIM algorithm [27]. By this algorithm, the influence is estimated locally. The main advantage of the LAIM algorithm is its very low running time, and its main disadvantage is that it does not guarantee optimal approximation.
Cui et al. presented an evolutionary algorithm called DDSE to improve the running time of the CELF algorithm [28]. In this algorithm, using the genetic algorithm and local and global search, the seed nodes are selected with the highest expected diffusion value. The DDSE algorithm has influence spread very close to the CELF algorithm on some small datasets. Xie et al. developed the IRR algorithm based on the MBIC model [29]. This algorithm uses the same idea as the DegreeDiscount algorithm. The main advantage of the IRR algorithm is the use of different states based on real-world criteria for nodes, but this algorithm is dependent on the number of nodes in the graph.
The C2IM algorithm is presented by Singh et al. to improve the time-efficiency of the TIM algorithm for the influence maximization problem [10]. This algorithm uses the user’s interests and community framework to find effective seed nodes. Also, Ahmadi Beni et al. presented an algorithm called TI-SC based on community detection to improve the influence spread of the DegreeDiscount algorithm [30]. This algorithm selects seed nodes by examining the relationships between the core nodes. The TI-SC algorithm is useful to reduce the overlap in selecting the seed nodes, but this algorithm has not an efficient performance in large-scale networks.
3 Problem definition
The social networks’ graph is mapped as G (V, E), where V and E are described as follows:
-
V = {v1, v2, …, vn} is a set of graph nodes, where vi is the individuals, and n is the number of nodes in the graph.
-
E = {e1, 2, e1, 3, …, e1, n, e2, 1, e2, 3, …, en, n − 1} is a set of graph edges, where, ei, j is the relationship between two individuals or an edge between each pair of nodes (i ≠ j).
Considering G (V, E) to find k seed nodes for influence maximization problem, the subset S select from V = {v1, v2, …, vn} due to the diffusion model. The process of seed nodes selection is done until ∣ S ∣ = k so that the influence spread of S, σ(S) is maximized by Eq. (1) under the given diffusion model [31]:
The influence spread is calculated due to the diffusion model for influence maximization problem. One of the most widely used information diffusion models is the independent cascade model proposed by Kempe et al. The independent cascade model has two important properties in the influence spread function which these properties are submodularity and monotonicity. A set function f : 2V → ℝ is submodular if, for any S ⊆ T ⊆ V subsets and anyv ∈ V\T element , the marginal contribution of v element, when added to a T set can not exceed the marginal contribution of v element when added to a S ⊆ T subset due to Eq. (2) [31]:
The objective function f : 2v → ℝ is monotone if we have f(S) ≤ f(T) for eachS ⊆ T ⊆ V subset [31].
4 Proposed algorithm
In GIN algorithm, a heuristic method is proposed to identify the influential nodes for the influence maximization problem. In this algorithm, different graph nodes are created to identify nodes with more communication and more common neighbors than other graph nodes. Then, certain nodes are selected from each group. This process is done to create new search space and reduce the number of influence spread calculations. At last, the final influential nodes are selected among the nodes in the new search space which maximize the expected diffusion value as the seed nodes. Figure 1 shows an overview of the general levels of GIN algorithm.
According to Fig. 1, the GIN algorithm consists of three levels: (1) grouping graph nodes, (2) creating new search space, and (3) selecting seed nodes.
4.1 Grouping graph nodes
The first level of algorithm GIN is due to AntCBO [32] algorithm by modification. This level of GIN algorithm includes the process of creating different groups of graph nodes with more communication and common neighbors than other nodes. This level consists of six basic steps:
Step 1 For every node of graph, an available variable is defined which determines the available or non- available of each node. The value of this variable is either True or False. At first, the value of this variable is considered True for all graph nodes. If the node has an available = True value, it will be available; so, it can be selected for calculations, otherwise non-available; thus, it cannot be selected for calculations. Figure 2 shows a graph consists of 16 nodes with available = True variable for all graph nodes.
Step 2 The graph nodes are sorted from highest to lowest degree in descending order in an undirected graph, or out-degree in a directed graph and placed in list L. Thus, in the first place of list, the node is with the highest degree, and in the last place, the node is with the lowest degree from graph, according to Fig. 3.
Step 3 The group gi = {} is created (i = {1, 2, 3, …, l}). Then, the first node from list L which has an available = True value, is selected and added to gi group (gi ∪ {vi}). Then, the availability value of this node changes into available = False. In the following, vj node with the highest degree among vi node ’s neighbors which has the available = True value, is selected and added to gi group (gi ∪ {vj}). The availability value of vj node is changed into available = False (Fig. 4).
Step 4 If vi and vj nodes do not have common neighbors, the computation continues from the previous step for the next node from L list with the value of avalable = True.
Step 5 If vi and vj nodes have common neighbors, ComNei set will be created including common neighbors of vi and vj nodes (ComNei = {v1, v2, …, vm}) which {v1, v2, …, vm} will be always belonging to ComNei for each vi and vj. The vm node is selected from ComNei set with available = True value randomly and is added to gi group (gi ∪ {vm}). Then, the availability value of this node is changed to available = False. If the neighbors of vm node do not have any common with nodes within ComNei set , the availability of next node from ComNei set will be checked which is added to gi group ; otherwise, Z set including the common nodes of vm’s neighbors and ComNei set is created (Z = neighbor(vm) ∩ ComNei). Then, ComNei set is replaced with Z set . This process is repeated until ComNei = {}, or the nodes in this set have available = false value (Fig. 5).
Step 6 The previous described steps are repeated until the created groups reach a specified number (Itrgi = l). l is the maximum number of groups created; also, that is constant.
4.2 Creating new search space
In this level, the two nodes vi and vj which include the first and second added nodes are selected from each gi group. The first node selected from each gi group is the node vi with the highest degree in that group, and the second node selected is the vj node with the highest degree among the neighbors vi. Each group gi contains at least one vi node. If there exists no node vj in a gi, that group has only one node, thus the only vi node is selected. Also, if there are multiple nodes vi that have the same maximum degree value in each gi group, and similarly, if there are such multiple nodes vj, these are selected randomly. In this way, by identifying and selecting these nodes from each group, the search space to find influential nodes is reduced. This level is repeated until the selected nodes reach a specified number (Ncandidate = m). m is the maximum number of candidate nodes selected.
4.3 Selection of seed nodes
In this level, the selection of seed nodes from a new search space is done by the greedy method. Thus, according to Eq. (3), the Expected Diffusion Value (EDV) of these nodes is calculated. This equation has also been used in other studies [16, 28]. The first time, Jiang et al. proposed the expected diffusion value to estimate the actual influence spread under the independent Cascade model and employed it in their algorithm to replace Monte Carlo simulation [16]. The EDV approximates the spread by computing how many direct neighbors of the nodes in the candidate node set S are expected to be activated. According to the Eq. (3), k is the number of seed node, p is the activation probability the non-seed nodes, \({N}_S^l\backslash S\) is the neighbors of the nodes S except the nodes of set S, and τ (i) is the number of neighbors of i node in set S.
Consequently, by applying the greedy selection method, a node from the new search space which causes maximization of the marginal gain function (EDV(S ∪ {w}) − EDV(S)) as the seed node added to the set S. This process is repeated until ∣ S ∣ = k. Then, the influence spread of seed nodes is calculated using the independent cascade model. Algorithm1 shows the pseudo-code of the GIN algorithm.
5 Experiments and evaluation of the proposed algorithm
The GIN algorithm has been evaluated on seven real datasets. Then the results of influence spread and running time of algorithms evaluated under the independent cascade model in a similar system. Influence spread means the number of final nodes activated by the seed nodes in the diffusion process. The running time means the time has passed to find the seed nodes. All the codes are written in C#, and all algorithms are performed on a Windows 10 operating system with 2.20 GHz CPU (Intel® Core (TN) i5-5200U) and 8 GB of RAM.
5.1 Description of the datasets
The GIN algorithm is implemented on seven datasets of different sizes. The characteristics of these datasets are shown in Table 1.
Email [33]: This dataset is a network of emails at URV University that includes professors, administrators, technicians, researchers, and graduates. In this dataset, each node represents the email address, and the link between the emails is an edge.
NetScience [34]: This dataset contains a collaboration network of scientists working on network theory. Newman compiles this dataset in the year 2006. The nodes in this dataset represent the authors of the articles, and the edges indicate the collaboration between the authors in a common article.
Power [35]: This dataset contains information about the power grid of the Western States of the United States of America that a node is either a generator, a transformator, or a substation. Also, an edge represents a power supply line.
NetHEPT [15]: This dataset contains arXiv articles in the “High Energy Physics Theory” section, published in the years 1991 to 2003. The nodes in this dataset represent the authors and, if they work together in an article, an edge is created between them.
NetPHY [15]: This dataset contains arXiv articles in the “Physics” section. In this network, each node is an author, and the edges between two nodes represent the collaboration of two authors in one article.
Epinions1 Footnote 1: This dataset is an online social network trust-based, collected from www.epinions.com. Each node in this dataset represents customers who are users of the site, and each edge represents a secure connection.
Slashdot0902 Footnote 2: This dataset is a version of the popular site www.slashdot.org about technology news where users can tag other users as friends or foes. The network was collected in February 2009.
5.2 The compared algorithms
The GIN algorithm is compared with various algorithms, including General Greedy, High Degree, SAEDV, LIR, and DDSE algorithms based on two criteria of influence spread and running time. The algorithm is implemented with parameter l = 150, Monte Carlo simulation R = 10000, and the activation probability p = 0.01.
General Greedy According to 2], this algorithm is evaluated by Monte Carlo simulation R = 10,000 and the activation probability p = 0.01 under independent cascade model.
High Degree According to [2], in this algorithm, the influence spread of k seed nodes with the highest degree of the graph is calculated under the independent cascade model.
SAEDV According to [16], in this algorithm, choose to initialize a node set randomly and optimize it with iterations. The parameters for SAEDV algorithm are set as follows: the initial temperature T0 = 500000, the final temperature Tf = 100000, the number of inner loop q = 10 and the temperature drop after each inner loop ∆T = 2000.
LIR According to [25], in this algorithm, nodes with 0-LI are selected for the NetScience, NetHEPT, NetPHY, Epinions1, and Slashdot0902 datasets, while in the Email dataset, nodes with 0-LI are low, then nodes are selected with 1-LI. Influence spread calculations are performed by Monte Carlo simulation R = 10,000 and the activation probability p = 0.01 under independent cascade model.
DDSE According to [28], the results of this algorithm are obtained with the size of the population Npop = 10, mutation probability OPMutation = 0.1, Crossover probability OPCrossover = 0.4, Number of generation the evolution lasts MaxIt = 200 and the diversity of the population Diversity = 0.6.
5.3 Diffusion model
Researchers have proposed various types of information diffusion models [2, 36,37,38,39,40]. One of the most widely used diffusion models is the independent cascade model, which has been used in most studies [2, 5, 37, 41, 42]. Our algorithm focuses on the influence maximization problem under the independent cascade model too. This model was introduced by Goldenberg et al. [5]. In this model, the nodes are either in an active or inactive state. Each edge between two nodes v and u has the activation probability Pvu that means inactive node u is active by node v. Each active node v at time t can only activate its neighbor’s inactive node once. And then, nodes that are activated at time t − 1 can activate their neighbor inactive nodes. Thus, if every active node v at time t with probability Pvu activates its neighbor inactive nodes, then at time t + 1 considered as an active node. The diffusion process will stop when no more new nodes are activated.
5.4 Experimental results
According to the description of the proposed algorithm, first, l groups are created. Here, the GIN algorithm is tested with different l to selecting the best value of parameter l with the highest influence spread on each seven real-world datasets. The influence spread results of this algorithm with different l are shown in Tables 2, 3, 4, 5, 6, 7, and 8 on Email, NetScience, Power, NetHEPT, NetPHY, Epinions1, and Slashdot0902 datasets respectively. According to these results, obviously, the influence spread of the different l is very close to each other algorithms, but the influence spread of the GIN with l = 150 is higher than other values of the parameter l in most of the k nodes. Thus, to compare the influence spread of the proposed algorithm with other algorithms is used parameter l=150 for the GIN algorithm on all of seven datasets.
The results of the evaluated algorithms in terms of influence spread and running time are shown in Fig. 6. These algorithms were implemented with fixed activation probability (p = 0.01) and varying sizes of the seed set with the range from k = 5 to k = 50 based on the independent cascade model. The x-axis indicates the seed nodes, and the y-axis indicates the influence spread.
The influence spread of the evaluated algorithms on the Email dataset is shown in Fig. 6a. In this figure, the results of the algorithms General Greedy and GIN are close to each other. The influence spread in the SAEDV algorithm is higher than the High Degree and DDSE algorithms. The influence spread in the DDSE algorithm from k = 5 to k = 20 is very close to that of the High Degree algorithm, and from k = 25 to k = 50, it follows a gradual reduction. The LIR algorithm has the lowest influence spread than the other algorithms in the Email dataset.
In Fig. 6b, the influence spread of the evaluated algorithms on the NetScience dataset is shown. In this figure, the influence spread of General Greedy, GIN, and DDSE algorithms are higher than other algorithms and are close to each other. The influence spread in LIR and SAEDV algorithms is less than the High Degree algorithm. Also, the influence spread of the comparable algorithms on the Power dataset is shown in Fig. 6c. According to Fig. 6c, General Greedy and GIN have higher influence spread than other algorithms. The highest influence spread belongs to DDSE after than the General Greedy and GIN algorithms. Then, the influence spread from high to low belongs to the High Degree, LIR, and SAEDV in most k nodes, respectively.
The influence spread of the evaluated algorithms on the NetHEPT dataset is shown in Fig. 6d, where the highest influence spread is for General Greedy and then GIN algorithm. The influence spread in the GIN algorithm, for k = 5 to k = 25, is very close to the General Greedy algorithm. The DDSE algorithm has a higher influence spread than the High Degree, LIR, and SAEDV algorithms. Of course, the results of the DDSE algorithm from k = 5 to k = 20 are close to the GIN algorithm. After the DDSE algorithm, the highest influence spread is for the High Degree algorithm and then the LIR algorithm. The influence spread of the SAEDV algorithm is very low on the NetHEPT dataset.
The influence spread of the evaluated algorithms on the NetPHY dataset is shown in Fig. 6e. In this figure, the influence spread of General Greedy and then GIN are higher than other algorithms. Also, the influence spread from most to less is for the DDSE algorithm and then the High Degree, LIR, and SAEDV algorithms. Also, the influence spread of the evaluated algorithms on the Epinions1 dataset is shown in Fig. 6f. Epinions1 is a large-scale dataset, and the General Greedy algorithm cannot run on it because this algorithm has a high running time. Therefore, the run of the General Greedy algorithm on the Epinions1 dataset is ignored. As observed in Fig. 6f, the GIN and High Degree algorithms have higher influence spread than other algorithms. However, GIN and High Degree algorithms have very close results in some k nodes, but the GIN algorithm in most k nodes has a higher influence spread than the High Degree. Then the influence spread of the DDSE algorithm is more than SAEDV and LIR algorithms.
In Fig. 6g, the influence spread of the algorithms on the Slashdot0902 dataset is shown. This dataset, like the Epinions1 dataset, has a large-scale, and the General Greedy algorithm cannot run on it because this algorithm has a high running time. Therefore, the run of the General Greedy algorithm on the Slashdot0902 dataset is ignored. According to Fig. 6g, the influence spread of the GIN algorithm is higher than other algorithms. Of course, the High Degree and DDSE algorithms in some k nodes have results close to the GIN algorithm. But in general, the GIN algorithm in the Slashdot0902 dataset has shown more influence spread than the compared algorithms. The SAEDV algorithm and then the LIR algorithm in this dataset has the lowest influence spread.
The speedup % (in terms of influence spread) of the GIN algorithm over other algorithms is shown in Table 9. In this table, the General Greedy algorithm called GG and the High Degree algorithm called HD. Due to the General Greedy lack of scalability, its results are not reported for Epinions1 and Slashdot0902 datasets. Table 9 shows the speedup % (in terms of influence spread) of our proposed algorithm GIN over other algorithms. The speedup is computed using Eq. (4). We can see that the GIN algorithm has a positive speedup over other algorithms in this table.
According to the Eq. (4), α is the influence spread of the GIN algorithm, and β is the influence spread of the other algorithms for different k seed nodes.
The results of the running time algorithms are shown in Fig. 7 on the seven datasets Email, NetScience, Power, NetHEPT, NetPHY, Epinions1, and Slashdot0902. These results are for k = 50 and the number of the Monte Carlo simulation R = 10000 under the independent cascade model per second. According to Fig. 7, the High Degree and LIR algorithms have the lowest running time in all seven datasets. Of course, the High Degree and LIR algorithms have low the influence spread on more comparable datasets. Then, the running time from lowest to highest belongs to the GIN, DDSE, SAEDV algorithms, and then to the General Greedy algorithm.
Given the results of influence spread and running time of the compared algorithms in Figs. 6 and 7, it is quite obvious that the GIN algorithm has high influence spread along with low running time in all seven datasets.
6 Conclusion
Influence maximization is a problem that has been addressed in social networks analysis and viral marketing. The purpose of this problem is to find the k individual that, by activating them, the highest number of people in a social network can be activated at an acceptable time. For this purpose, the GIN algorithm is presented in this article. In this algorithm, the process of creating different groups, including graph nodes with more number of connections and more common neighbors than other groups, is performed. Hence, the search space to find the final influential nodes is reduced. This process reduces the number of computations and increases the speed of the proposed algorithm. Next, nodes from the new search space that maximize the marginal gain function of the expected diffusion value are selected as the seed node. Experimental results show that the GIN algorithm, in comparison algorithms on all seven real-world datasets, has provided high influence spread along with low running time.
References
Rui X et al (2019) A reversed node ranking approach for influence maximization in social networks. Appl Intell 49(7):2684–2698
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Banerjee S, Jenamani M, Pratihar DK (2019) ComBIM: a community-based solution approach for the budgeted influence maximization problem. Expert Syst Appl 125:1–13
Huang H, Shen H, Meng Z (2019) Community-based influence maximization in attributed networks. Appl Intell 1–11
Goldenberg J, Libai B, Muller E (2001) Talk of the network: a complex systems look at the underlying process of word-of-mouth. Mark Lett 12(3):211–223
Domingos P, Richardson M (2001) Mining the network value of customers. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM
Pei S et al (2017) Efficient collective influence maximization in cascading processes with first-order transitions. Sci Rep 7:45240
Chen Y et al (2020) Semantics-aware influence maximization in social networks. Inf Sci 513:442–464
Singh SS et al (2019) C2IM: community based context-aware influence maximization in social networks. Phys A Stat Mechan Appl 514:796–818
Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web (TWEB) 1(1):5
Liqing Q et al (2020) TSIM: a two-stage selection algorithm for influence maximization in social networks. IEEE Access 8:12084–12095
Srinivasan S, Babu LD (2019) Interest aware influential information disseminators in social networks. SN Appl Sci 1(11):1456
Leskovec J, et al (2007) Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM
Chen W, Wang Y, Yang S (2009) Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Jiang Q, et al (2011) Simulated annealing based influence maximization in social networks. In Twenty-fifth AAAI conference on artificial intelligence
Goyal A, Lu W, Lakshmanan LV (2011) Simpath: an efficient algorithm for influence maximization under the linear threshold model. In 2011 IEEE 11th international conference on data mining. IEEE
Wang C, Chen W, Wang Y (2012) Scalable influence maximization for independent cascade model in large-scale social networks. Data Min Knowl Disc 25(3):545–576
Jung K, Heo W, Chen W (2012) Irie: scalable and robust influence maximization in social networks. In 2012 IEEE 12th international conference on data mining. IEEE
He X, et al (2012) Influence blocking maximization in social networks under the competitive linear threshold model. In Proceedings of the 2012 siam international conference on data mining. SIAM
Tang Y, Xiao X, Shi Y (2014) Influence maximization: Near-optimal time complexity meets practical efficiency. In Proceedings of the 2014 ACM SIGMOD international conference on management of data
Lu W, Chen W, Lakshmanan LV (2015) From competition to complementarity: comparative influence diffusion and maximization. Proc VLDB Endow 9(2):60–71
He J-L, Fu Y, Chen D-B (2015) A novel top-k strategy for influence maximization in complex networks with community structure. PLoS One 10(12):e0145283
Morone F et al (2016) Collective influence algorithm to find influencers via optimal percolation in massively large social media. Sci Rep 6:30062
Liu D et al (2017) A fast and efficient algorithm for mining top-k nodes in complex networks. Sci Rep 7:43330
Shang J et al (2017) CoFIM: a community-based framework for influence maximization on large-scale networks. Knowl-Based Syst 117:88–100
Wu H et al (2018) LAIM: a linear time iterative approach for efficient influence maximization in large-scale networks. IEEE Access 6:44221–44234
Cui L et al (2018) DDSE: a novel evolutionary algorithm based on degree-descending search strategy for influence maximization in social networks. J Netw Comput Appl 103:119–130
Xie G et al (2019) MBIC: a novel influence propagation model for membership-based influence maximization in social networks. IEEE Access 7:75696–75707
Beni HA, Bouyer A (2020) TI-SC: top-k influential nodes selection based on community detection and scoring criteria in social networks. J Ambient Intell Humaniz Comput 1–20
Chen W, Lakshmanan LV, Castillo C (2013) Information and influence propagation in social networks. Synth Lect Data Manag 5(4):1–177
Zhou X et al (2015) An ant colony based algorithm for overlapping community detection in complex networks. Phys A Stat Mech Appl 427:289–301
Guimera R et al (2003) Self-similar community structure in a network of human interactions. Phys Rev E 68(6):065103
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E 74(3):036104
Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’networks. Nature 393(6684):440–442
Peng S et al (2018) Influence analysis in social networks: A survey. J Netw Comput Appl 106:17–32
Wang W, Street WN (2018) Modeling and maximizing influence diffusion in social networks for viral marketing. Appl Netw Sci 3(1):6
Singh SS et al (2019) MIM2: multiple influence maximization across multiple social networks. Phys A Stat Mech Appl 526:120902
Tejaswi V, Bindu P, Thilagam PS (2016) Diffusion models and approaches for influence maximization in social networks. In 2016 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE
Banerjee S, Jenamani M, Pratihar DK (2018) A survey on influence maximization in a social network. arXiv preprint arXiv:1808.05502
Zhong Z, Yang H (2019) Community Structure-based re-ranking information influence maximization algorithm for typhoon disasters. Ekoloji Dergisi 2019(107)
Zhu R, et al (2018) QuickIM: efficient, accurate and robust influence maximization algorithm on billion-scale networks. arXiv preprint arXiv:1805.12320
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflicts of interest.
Additional information
Publisher's Note
pringer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Aghaee, Z., Kianian, S. Influence maximization algorithm based on reducing search space in the social networks. SN Appl. Sci. 2, 2067 (2020). https://doi.org/10.1007/s42452-020-03812-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s42452-020-03812-w