1 Introduction

With the rapid development of information technology, the number of network users is increasing and its hidden potential value is also growing. People’s lives and social network are increasingly closely linked and the influence of the network on people’s lives is also increasing. With the emergence of related research and industries, it is of great practical significance to investigate the influence maximization in social networks for purposes such as advertising, marketing, and public opinion monitoring.

In the field of social network influence maximization, the most commonly used algorithm is the greedy algorithm [1, 2] in the initial stage of research. The greedy algorithm uses a Monte Carlo simulation and the simulation of information dissemination based on this method is very accurate. However, in large-scale networks, there is a need for time-efficient algorithms; therefore, many scholars have improved the greedy algorithm and proposed heuristic algorithms [3,4,5,6]. Because of the need to solve real problems, scholars have put forward algorithms for maximizing the influence of specific issues. For example, researchers have developed influence maximization algorithms with budget constraints and time constraints and algorithms focused on dynamic networks. Methods have been developed to solve problems by combining economic concepts with the Shapley value and genetic algorithms.

Though many improved greedy algorithms and heuristic algorithms are proposed, there are relatively few studies on the attenuation of the influence intensity over time in information dissemination. In the algorithm proposed by Wang et al. [7], the information dissemination model takes into account the information decay over time in the process of information dissemination. A message can spread rapidly among people and disappear quickly or slowly over time. Accurately and effectively capturing the attenuation of this information will contribute to the research of influence maximization. In greedy algorithms, each simulation step requires considerable time. The intensity of information dissemination in word-of-mouth marketing or viral marketing will weaken over time. In a particular stage of information dissemination, the influence of information dissemination is very small. Many studies have shown that the diffusion of information in the network will weaken significantly after three hops. In research, three-hop propagation can solve the problem of low time efficiency.

In influence maximization research, modeling the influence of propagation is a significant problem and challenge. In the dissemination of influence, a node activates inactive neighbor nodes so that they can accept new ideas in order to disseminate the information. The problem is to identify a small number of nodes as initial active nodes and spread the information to maximize the influence. In this paper, a new propagation model is proposed that captures the attenuation of the influence in the diffusion process and quantifies the influence and its diffusion. Based on this propagation model, we develop an efficient heuristic algorithm to maximize the influence.

According to the network structure information of each node, we develop a three-hop propagation path of each node and quantify the influence of each node. By using the influence value of nodes, the algorithm iterates to find the top-k most influential seed nodes. This algorithm takes into account the actual condition of attenuation and maximizes the use of the network structure information to optimize the algorithm to achieve time efficiency.

The main contributions of this paper are as follows:

  1. (1)

    During actual information dissemination, the intensity of information dissemination weakens over time. Therefore, the objective is to capture the factors of influence attenuation and establish a three-hop information attenuation propagation model.

  2. (2)

    For the purpose of influence maximization, the greedy algorithm has high accuracy for simulating information dissemination and can obtain a good seed set. But the time efficiency of the algorithm is very low. In contrast, some heuristic algorithms have good time efficiency but their accuracy is lower than that of greedy algorithms. Therefore, we investigate the effectiveness of the three-hop method for information dissemination simulation and propose a heuristic influence maximization algorithm based on the new model and the three-hop method. The proposed method not only improves the time efficiency but also the quality of the seed set.

The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 introduces the proposed propagation model. Section 4 provides the maximization algorithm proposed in this paper. Section 5 presents extensive experiments and comparisons, and finally section 6 is the conclusion of this paper.

2 Related work

Maximizing influence is the problem of finding k nodes with the greatest influence on the information dissemination model, and Kempe et al. [8] proved that this problem is a NP-hard problem under various dissemination models. The problem of maximizing influence can be divided into two categories: greedy algorithm [1,2,3, 15, 16], and heuristic algorithm [4,5,6, 17,18,19]. Greedy algorithm is proposed by Kempe et al., but it is very time-consuming. In order to improve the time efficiency of greedy algorithm, a large number of heuristic algorithms have emerged. Chen et al. [3] put forward DegreeDiscount algorithm based on the degree. The idea of this algorithm is that when node v has a neighbor node to be selected as a seed node, a specific discount should be made in calculating its degree, to improve the phenomenon of seed node aggregation and make the seed nodes disperse in the network. Guo et al. [9] seeded node search algorithm in a dynamic network graph. Its basic idea is to change the network graph over time. The current moment is based on the seed set of the previous moment, and establish the update strategy of the seed set. The disadvantage of the algorithm is that it only considers the increase and disappearance of network edges and the change of weights, but does not consider the influence of the increase or decrease of nodes.

However, some particular problems of maximizing influence need to be considered, such as time cost, money cost and different characteristics of information dissemination for different disseminators. Bo Liu [10] et al. proposed the problem of maximizing the influence under time constraints. The algorithm uses an improved IC model to take into account the change of propagation probability over time. Manuel Gomez-Rodriguez et al. proposed a method for influence maximization named INFLUMAX that takes the temporal dynamics into account during diffusion processes [17]. Huy Nguyen [11] et al. proposed the influence maximization algorithm with money budget. The algorithm improves the traditional greedy algorithm and uses Bayesian network to infer other probabilities based on some variable information to solve the uncertainty of the information dissemination process. Shi et al. took an online–offline interactive setting into account for influence maximization, so they proposed the location-driven influence maximization (LDIM) problem, in which they aimed to find the optimal offline deployment of locations and durations for hold events in order to maximize the online influence spread [20]. Lichao Sun et al. thought that influence propagates in multiple rounds independently from possibly different seed sets, and studied the Multi-Round Influence Maximization (MRIM) problem [21].

To solve the problem of information transmission attenuation in the problem of maximizing influence, Wang et al. [7] proposed an algorithm that takes into account the attenuation of information over time in the process of information transmission when modeling the model of information transmission. In the Ref. [12], the authors proposed a fast simulation method for the problem of maximizing the influence, and pointed out that the simulation of two-hop information propagation was an effective method for estimating the influence diffusion of nodes. In Ref. [8], the authors used 2-hop influence propagation simulation to find the optimal seed node set. Gong M. et al. [13] proposed the problem of maximizing the influence based on the recommendation of friends. The dissemination and interaction of the influence of the close friends of users were analyzed. Based on the analysis of this problem, an algorithm of maximizing the influence was proposed. The experiment in Ref. [14] also demonstrates the validity of the final result. It shows that the transmission of information in the network has a certain attenuation, and the simulation of 3-hop information transmission is accurate.

Computing influence in IC model is an NP-hard problem, which requires running Monte-Carlo simulation. Under the huge network, the events that need to be consumed are incalculable. In Ref. [12,13,14], the proposed simulation algorithm can solve this problem very well. Based on the above research and analysis, the proposed algorithm constructs a three-hop propagation path for each node to quantify the influence of each node. That is to say, considering the attenuation of influence, the network structure information is maximized, and the time efficiency problem can be well solved.

3 The three hops velocity attenuation propagation model

Generally, a network is abstracted as a directed graph G(V, E). V represents the set of nodes, a node represents a user or a group, E represents the set of edges, and each edge represents the relationship between two users. |V| = N represents the number of nodes. Each node has two states: active and inactive. Active nodes affect inactive neighbors, which transforms inactive nodes into active nodes. This process continues, and more and more nodes change from the inactive state to the active state, which represents the process of information dissemination.

In the process of maximizing influence, the main models of transmission are the independent cascade (IC) model and linear threshold model, as well as the infectious disease model and aerodynamic molecular diffusion model. Each model has its own characteristics. For example, the independent model is a probabilistic propagation model. A node u tries to activate its inactive neighbor node V with a certain probability; therefore, the result of the next simulation of information propagation may be different from that of the last activation. In Ref. [1], a Monte Carlo simulation model was used to solve this problem.

The Monte Carlo simulation is a statistical method invented by the American mathematicians Von Neumann and Ulam. We simulate information propagation based on the IC model and simulate it 10,000 times similar to most other algorithms.

In the traditional greedy algorithm, if we want to identify k seed nodes, we need to find the most influential nodes to join the seed set in each round and k rounds are required. In each round, we need to simulate the information transmission in the network. Each simulation uses a Monte Carlo simulation (assuming T = 10,000) and the time complexity is defined as O(k ∗ n ∗ 10000). Therefore, the method has a very long running time. The proposed algorithm has better time efficiency.

3.1 Propagation velocity

In social networks, the influence of users on neighbors has considerable variability. Initially, information travels quickly between users. With the passage of time, users who are far away from the information dissemination center will receive and disseminate information slowly. Therefore, information dissemination slows down over time and with increasing distance. In this study, the speed of information transmission will be defined.

In social networks, the farther the distance from the information source, the weaker the relationship between the node and the source is. Correspondingly, the response to information becomes slower and the acceptance degree becomes smaller [7]. Assuming that the propagation path is (i, j, k, l), the propagation speed decreases with increasing distance. The distance attenuation factor μ is defined as follows:

$$ \mu =\frac{1}{d^2} $$
(1)

where d denotes the distance between the nodes. The larger d is, the lower the speed.

In addition, the influence on social networks will weaken over time. For example, when we use an excellent product, we are very excited and eager to recommend it to our friends. This recommendation will significantly affect how many people purchase the product but over time, this interaction will weaken. That is, the propagation speed decreases over time. The time attenuation factor φ is defined in Eq. (2):

$$ \varphi ={e}^{-\lambda t} $$
(2)

where λ is a parameter that is network specific and its value is defined according to the network or information that is disseminated. We obtained a reasonable value for λ through experiments. Given a reasonable value of λ, the speed attenuation cannot be increasing or decreasing sharply during the three-hop process, and the obtained propagation effect is relatively better. In this study, the value of λ is set to 1. t is the delay time; when the information is just beginning to spread, t tends to 0 and φ tends to 1. At this time, the influence is not affected by the time delay and has not weakened.

Barabasi et al. demonstrated that human behavior is highly heterogeneous [15]. With the development of data processing and mining capabilities, researchers have discovered different distribution patterns by using large amounts of data, for example, the exponential distribution and power law distribution [22,23]. Note that the probability function of the Poisson distribution is \( \mathrm{P}\left(X=k\right)=\frac{\lambda^k}{k!}{e}^{-\lambda } \). The Poisson distribution is a discrete probability distribution. When X = k = 0, P(X = 0) = eλ. The exponential distribution function is P(τ) = λeλt. The power rate distribution function is P(τ)~τλ. These distributions have similarities - indices. In this study, the time decay factor φ = eλt is its post-deformation application.

The influence propagation rate v is defined based on the distance attenuation factor and time attenuation factor.

$$ \kern0.5em {v}_i^j=\frac{1}{d^2}\ast {e}^{-\lambda t},\mathrm{d}=1,\mathrm{t}=0 $$
(3)

The propagation path (i-j-k-l) (Figure 1) affects the influence propagation velocity \( {v}_i^j \) between the source point i and the node j, where d = 1, t = 0, and velocity \( {v}_i^j \)=1. Since the influence has just started from the source, the influence does not decay.

Fig. 1
figure 1

Propagation path (i-j-k-l)

$$ {v}_i^k=\frac{1}{d^2}\ast {e}^{-\lambda t},d=2,t=1 $$
(4)

where the influence is transmitted to the node k via the node j, and there is a certain distance attenuation and time decay. Assuming that λ is equal to 1, then \( {v}_i^k \)=\( \raisebox{1ex}{$1$}\!\left/ \!\raisebox{-1ex}{$4e$}\right. \) is a number between 0 and 1.

$$ {v}_i^l=\frac{1}{d^2}\ast {e}^{-\lambda t},\mathrm{d}=3,\mathrm{t}=2 $$
(5)

where \( {v}_i^l \) represents the rate at which the influence propagates between node i and node l. When d = 1 and t = 0, \( {v}_i^j=1 \), the influence does not diminish. Here all the rate values v are equal to values less than or equal to 1. When i node is the initial node of influence propagation, the value of v is equal to 1. The other rate values are less than 1. When the rate is multiplied by the influence of the corresponding node, the influence of the node is less than that of the original value and the corresponding influence will decay. The influence from node i to node j, k, l, where d = 3, t = 2, and the influence propagation speed is \( {v}_i^l \)=\( \frac{1}{9{e}^2} \).\( {v}_i^l=\frac{1}{9{e}^2}<{v}_i^k=\frac{1}{4e} \). When d = 4, t = 3, and the influence propagation speed is \( \frac{1}{64{e}^3} \), which is much smaller than \( {v}_i^j,\kern0.5em {v}_i^k, and\ {v}_i^l \), we use three hops for the propagation speed.

Then, on the propagation path (i-j-k-l), the corresponding objective function of the influence is defined in Eq. (6):

$$ \upsigma \left(\mathrm{i}\right)={\mathrm{w}}_i+{w}_j\ast {v}_i^j+{w}_k\ast {v}_i^k+{w}_l\ast {v}_i^l $$
(6)

where wi, wj, wk and wl represents the degree of node i, j, k and l respectively.\( {v}_i^j \) is the velocity from node i to node j.

3.2 The three hops velocity attenuation propagation model

The propagation process in the three-hop velocity attenuation propagation model can be summarized as follows: at time 0, the influence begins to spread from an initial seed node to its neighbor node, at which time the influence does not attenuate and the propagation rate is 1. Next, the information propagates from the neighbor nodes to their neighbor nodes. At this time, the influence decreases and the propagation rate is not zero. This process is repeated until the influence propagates to the initial seed node’s three-hop neighbors. At this time, although the communication process is not complete, the model no longer simulates the propagation beyond three hops because the influence propagation can be neglected later.

Suppose that the information is transmitted from the source point through the path P(u, x, t) (0 < t < 4), where u represents the source point and x is the node at which the information arrives at time t. In this study, the node influence σ is defined as the sum of the influences of the node it activates. The influence on one path p is\( {\sigma}_p=\sum \limits_p{w}_x \) and the influence on all paths is \( {\sigma}_{\sum p}=\sum \limits_{x\in \sum p}{w}_x \). According to the analysis and calculation described in the previous section, it can be concluded that the influence of the corresponding nodes is attenuated by the extension of the propagation distance and the extension of the propagation time. Therefore, wx in \( {\sigma}_p=\sum \limits_{x\in p}{w}_x \) needs to be modified according to the time coefficient and distance coefficient,\( {w_x}^{\prime }={w}_x\ast {v}_d^t \), and v differs and depends on the extension time and distance.

$$ \overline{\sigma_{\sum p}}=\sum \limits_{x\in \sum p}{w}_x\ast {v}_d^t $$
(7)

In order to maximize the influence, the calculation of the influence of the seed set is repeated; for example, the set S of the seed nodes is {a, b...}. The influence range of node a is σ1 and that of node b is σ2, where σ1 and σ2 satisfy |σ1 ∪ σ2| ≤ |σ1| +  ∣ σ2∣. When the left side is less than the right side, this is the coverage phenomenon in the calculation of influence. Therefore, we solve this problem by deleting a duplicate path or node on the information transmission path, where \( {\sigma}_{p^{\prime }}=\sum \limits_{x\in {p}^{\prime }}{w}_x\ast {v}_d^t \) is satisfied and \( {\sigma}_{\sum {p}^{\prime }}=\sum \limits_{x\in \sum {p}^{\prime }}{w}_x\ast {v}_d^t \) is satisfied in all paths.

The propagation process of the proposed three-hop velocity attenuation propagation model is as follows: the activation probability of a node vi to a node vj is pij:

  1. (1)

    At time t = 0, we begin with the initial seed set S.

  2. (2)

    At t = 1, the active node vi(vi ∈ St − 1) in the set tries to deactivate its inactive neighbor vj with probability pij. If vj is activated successfully, then vj is added to the collection S1 and the process is repeated.

  3. (3)

    At t = 3, the propagation process ends.

In the simulation of the propagation process, the objective function of the influence is:

$$ \upsigma \left({u}_i\right)={\sigma}_{\sum {p}^{\prime }}=\sum \limits_{\mathrm{x}\in \sum {p}^{\prime }}{w}_{\mathrm{x}}\ast {v}_d^t=\sum \limits_{u_i\in N\left(\mathrm{x}\right)}{w}_{u_i}\ast {v}_d^t+\sum \limits_{u_j\in N\left({u}_i\right)}{w}_{u_j}\ast {v}_d^t+\sum \limits_{u_k\in N\left({u}_j\right)}{w}_{u_k}\ast {v}_d^t=\sum \limits_{i=1}^3{w}_x\ast {v}_d^t $$
(8)

where wi is the degree of node i. The equation indicates that the greater the speed, the smaller the attenuation of the influence is, and vice versa. Therefore, in the simulation of in the influence propagation process, the proposed model not only considers the attenuation factors of the information transmission but also makes rational use of the characteristics of information transmission. The algorithm improves the time efficiency and quantifies the influence of the nodes by using the structural characteristics of the nodes.

4 The algorithm of maximizing influence based on the three-hop velocity attenuation propagation model

The pseudocode of the algorithm IMMRA is shown in Algorithm 1.

figure a

In the influence maximization algorithm based on the rate attenuation propagation model (IMMRA) (G = (V, E), k), the first row assigns the initial seed to null and the second row defines the set of node states. The third line to the fifth line calculates the degree of all nodes in the network and the sixth to fourteenth line loops to find the top-k seed nodes. The fifteenth line calculates the influence of the seed set. The influence spreads from the Sk set and the number of nodes is activated at the end of the propagation. There is no three-hop communication until the end of the communication process. Next, the seed set and its influence are output and the algorithm is complete.

Algorithm 1 combines a greedy algorithm with a heuristic algorithm. Each round still prioritizes the current optimal node and iterates multiple times using a greedy strategy. This algorithm uses a propagation model based on three-hop velocity attenuation, which effectively reduces the running time and improves the time efficiency of the algorithm. The time complexity is O(k*T*3) and it is evident that the time complexity O(k*T*n) is tens of thousands times lower than that of the greedy algorithm. Moreover, the larger the network scale, the larger the multiplier of the operational efficiency is. The algorithm is based on the greedy algorithm and includes a suitable and effective heuristic strategy, thereby combining the advantages of both approaches and improving the time efficiency and accuracy of the algorithm.

5 Experiments

5.1 Experimental design

For the experiment, we used the DBLP dataset and Facebook dataset. The DBLP dataset is a collaborative network dataset. The nodes in the network represent the authors and the edges represent the cooperative relationship between the two authors. The number of nodes in the DBLP network is 1000 and there are 9887 edges; there are 3722 nodes and 118,117 edges in the Facebook network (Table 1).

Table 1 Statistics of our datasets

In the experiment, data preprocessing consists of re-arrangement of the data. The proposed IMMRA algorithm is used to select the seed nodes. The proposed three-hop velocity attenuation propagation model is used for the simulation of information propagation. In order to validate the performance of the proposed algorithm, the classical greedy algorithm is used as a comparison. Compared with the greedy algorithm without improvement and branch reduction, the accuracy of information simulation is high and the overall results are accurate for the proposed algorithm. Many improved mountain climbing algorithms suffer from corresponding effects to a certain extent. The proposed algorithm also improves time efficiency.

5.2 Experimental results

The evaluation index of the influence maximization algorithm in social networks describes the performance of the algorithm. The number of nodes at the end of propagation is largest for the initial propagation seed set searched by the algorithm. In the following figures, the X-axis represents the number of seed nodes k and the Y-axis represents the range of influence. Different k values were used for the two data sets to compare the performance of the algorithm.

In Figure 2, the square line segment shows the results of the traditional greedy algorithm for different K values and the line with the star markers shows the results of the IMMRA algorithm for the DBLP dataset. For K values of 2, 3, 4, and 5, the number of results of the two algorithms is small and the difference is small; therefore, the results are similar. As the k value increases, the ordinate value also increases but the difference between the results of the two algorithms and the ratio of the k value has not changed much. For example, the influence of the greedy algorithm is 429 and the influence of the IMMRA algorithm is 421 when k is 5 and the ratio of the difference to k is 1.6. The influence of the two algorithms is 861 and 848 when k is 30, with a rate of 0.433. As shown in Table 2, when the k value is very small, the influence of the different seed sets does not change much. As k increases, the ratio of the difference to k increases slightly. However, when k increases sharply, the ratio decreases gradually. Therefore, the accuracy of the proposed algorithm is similar to that of the traditional greedy algorithm.

Fig. 2
figure 2

Experiments on DBLP datasets

Table 2 The ratio of influence difference to k value

Figure 3 shows the time efficiency of the traditional greedy algorithm and IMMRA algorithm for the DBLP dataset. It is evident that the running time of the IMMRA algorithm is much less when searching for the same number of seed nodes in the same network, indicating that the proposed algorithm provides similar accuracy but is more time efficient than the greedy algorithm.

Fig. 3
figure 3

Runtime comparison on DBLP Dataset

Figure 4 shows the experimental results of the traditional greedy algorithm and the proposed algorithm for different k values for the Facebook dataset. Only slight differences are observed in the results of the two algorithms. The trend of the dashed line shows that the influence of the seed set does not differ for the two algorithms when the k value is small. With the increase in the k value and the expansion of the network scale, the results of the proposed algorithm approach the results of the traditional greedy algorithm.

Fig. 4
figure 4

Experiments on FACEBOOK datasets

Figure 5 shows the time efficiency of the algorithms for the Facebook datasets. The upper segment is the time efficiency of the greedy algorithm. The time efficiency of the proposed algorithm is much better than that of the greedy algorithm for the same k value. As the k value increases, the greedy algorithm exhibits exponential growth. When k reaches a certain value, it cannot describe its time; therefore, the blue line segment is represented by dotted lines. When the value of k is large, the running time can be measured and the trend of the curve is smooth. These results demonstrate that the time efficiency of the proposed algorithm is significantly higher than that of the traditional greedy algorithm for the same k value.

Fig. 5
figure 5

Runtime comparison on FACEBOOK Dataset

6 Conclusion

In this study, the influence maximization in social networks is investigated by simulating the influence propagation and determining the attenuation of the information transmission. Although the traditional greedy algorithm provides high accuracy, it is very time-consuming, making it unsuitable for use in large-scale networks. The mountain climbing algorithm has better time efficiency but relatively low accuracy. We proposed a three-hop velocity attenuation propagation model based on the traditional greedy algorithm. The three-hop approach is used for simulating the information dissemination and reduces the required information to improve the time efficiency. In the propagation simulation, the sum of the modified degrees of the active nodes is used to measure the influence of a node. This not only reflects the influence of the node but also compensates for the lack of influence of the three-hop simulation. The experimental results of two example datasets show that the proposed algorithm provides good performance and confirms the feasibility and validity of the proposed model.