Keywords

1 Introduction

In recent years, with the increasing popularity of social networks like Facebook and Twitter, more and more scientists who study the influence maximization problem pay their attention to this field. Indeed this problem has drawn much attention and many researches have proposed various algorithms to detect influential nodes, most of these methods are based on static social networks. However, real social networks keep changing during time. New connections between users are created and some other users lose contact. Since many influential nodes can be detected by carefully studying the relationships among the links, these ones play an important role in dynamic social network analysis.

A social network looks like a graph structure consisting of nodes and edges, where nodes represent users and edges represent the interactions between the connected users. Users develop their connections to each other, while interactions between them vary over time. A changing social network consists of social networks observations at different time stamps \(\left\{ G1,G2,...,\right. Gn \}\) and contains not only a set of relationships between nodes, but also information on how these relationships change in each time stamp.

To resolve the above drawbacks, we propose an extension of SND to tackle the problem of important nodes detection under changing social networks. The remainder of this paper is organized as follows. The related works are reviewed in Sect. 2. In Sects. 3 and 4, the proposed extension SNDUpdate for maximizing the influence propagation is described. The details of the experiments are presented in Sect. 5. Finally, the paper is concluded in Sect. 6.

2 Related Work

This section discusses the review of different researches done on dynamic social networks. In this paper, we concentrate on the problem of influential nodes detection in edges changing social networks.

2.1 Methods Based on a Non-linear Model

Aggarwal et al. proposed [7] the first paper which deals with the problem of information flow authority determination in dynamic networks based on temporary interactions. Moreover, this paper studies both the problem of influence propagation, and that of tracking back from a given pattern of spread the influential nodes. They considered how to discover a set of nodes having the highest influence within a time. They modelled the influence spread as a non-linear system which is very different from triggering models like the Linear Threshold model or the Independent Cascade model. The algorithm in [7] is heuristic and the produced results have not any provable quality guarantee. There are only a few papers focusing on Influence Maximization under changing networks. Aggarwal et al. [7] focuses on finding a seed set at time t, that maximizes the influence spread at some \(\left[ t+\varDelta \right] \) given the dynamics of the evolution of networks over interval time \(\left[ t, t+\varDelta \right] \). In this paper we consider to maximize the influence under a serie of snapshots taken from a social network. Zhuang et al. [18] study the influence maximization problem under dynamic networks where the changes can be only detected by occasionally probing some nodes. The main idea of their paper is how to choose a subset of nodes so that the actual influence propagation is improved.

2.2 Methods Based on Metrics

During the years, some centrality metrics have been proposed to estimate node importance. It is well-known that degree centrality, betweenness centrality and closeness centrality are three basic centrality measures to identify influential nodes. However, all of these centrality measures are designed for static networks and they looked over the feature that networks are usually dynamic. Motivated by this deficiency, to apply those metrics in changing networks we added information about the time of edges evolution.

Kitsak et al. [1] proposed k-shell decomposition and they discover that the influential nodes are those positioned in the core of the network. The k-shell decomposition assigns many nodes in the same K-shell. Basaras et al. [6] proposed an alternative measure, the \(\mu \)-power community index, that is an combination of coreness and betweenness centrality, \(\mu \)-PCI is calculated in a totally localized manner and thus is appropriated for any type of networks ignoring its size or dynamicity. Wei et al. [15] proposed two classes of dynamic metrics to estimate temporal evolution of agents with regard to persistence and emergence. Here, the network activities are measured per time stamp using static network metrics such as degree centrality, authority centrality or clustering coefficients. Ren et al. [14] proposed a new approach to identify influential nodes in a complex network called Evidential K-shell centrality based on edge weight. The authors study only undirected networks, while many surveys could be done on identifying influential nodes in directed networks. As reported by Wang et al. [5], nodes in the network can be evaluated by centrality metrics. Therefore, they defined influential nodes in a dynamic community as nodes which have the most important centrality scores higher than the other nodes and have comparatively long life in this community.

2.3 Methods Based on a Diffusion Model

Chen et al. [8], included time in the influence diffusion model to track influential nodes in changing social networks. The main idea is to choose a budgeted subset to maximize the influence propagation \(\varphi \) time stamps later. However, they assume that the dynamic network is completely observed. This is impossible in many real situations. Ohsaka et al. [16] studied a related problem, maintaining some RR sets over a stream of networks updates under the IC model such that approximation influence maximization can be completed with a fixed probability. Tong et al. [13] proposed the Dynamic Independent Cascade (DIC) model by extending the classic IC model, DIC is able to better take control of the dynamic aspects of real social networks. Liu et al. [4] proposed an incremental approach, IncInf, which can efficiently locate the top-k influential nodes in changing social networks based on previous information. Song et al. [3] proposed Upper Bound Interchange Greedy (UBI) algorithm for the Influential Nodes Tracking (INT) problem, in which they found the seed set that augments the influence under \(G^{t+1}\) based on the seed set \(S^t\) that they have effectively found in \(G^t\). Then they proposed UBI+ algorithm that enhances the computation of the upper bound and achieves better influence propagation. The authors in [9] modelled a changing network as a stream of edge weight updates. In [10], the authors have systematically tackle two essential tasks of tracking \(top-k\) influential nodes. Their goal is to control the error incurred in their algorithm [9], so that even without prior knowledge about the data, they can still obtain meaningful results by setting a relative error threshold.

3 Preliminaries and Problem Statement

In this section, we first introduce the previous diffusion model for static networks. We then present our extension algorithm as a generalization of the influential nodes detection problem to dynamic social networks.

Table 1. Notation explanation

3.1 Problem Statement

The previous algorithm SND aims to detect influential nodes for only static social networks. However, real social networks are dynamic so both the structure and also the influence propagation associated with the edges are constantly changing. As a result, according to the evolution of the network structure and the influence propagation, the leader nodes that maximizes the influence propagation should be changed. In this paper, we model the dynamic social network as a group of snapshot graphs where nodes remain the same while the edges in each snapshot graph change over different time intervals. We denote the snapshot graph as \(G^t=(V, E^t)\), where \(V=\left\{ v_{1},...,v_{i},...,v_{n}\right\} \) represents nodes and E represents edges appearing during time intervals. Notation \(G^t\), \(t=0,...,T\) defines the snapshot graph over time. Our main idea is to identify a set of leading nodes, denoted as \(\mathbb {NL}^t\); \(t = 1,...,T\), that maximizes the influence propagation in each of the snapshot graph \(G^t\). Each user is represented by an attributes vector \(X_{i} = (x_{i1},...,x_{ij})\), where \(x_{ij}\) is the value taken by the attribute j of the vertex \(v_{i}\). In SND this value is binary, either 1 (if the user likes a center of interest) otherwise 0. This approach exploits, on the one hand, the relations between the vertices of the network and, on the other hand, the attributes that characterize them. Table 1 lists the notations to be used extensively in the sequel of this paper.

4 Proposed Algorithm

The main objective of SNDUpdate is to detect the most influential nodes in a dynamic social network. It exploits the structural and semantic aspects of the network. For this reason, the main idea is to propose an approach that contains two phases. Indeed, the first phase of SNDUpdate explores the structural aspect of the network and the second phase concentrates on the semantic aspect. In the sequel of this paper, each user is described by a set of interests that are represented as an attributes vector. In the previous work, the influential nodes are detected in a static social network and thus the weight of the link between two nodes remains unchangeable. In this paper, the network is dynamic so the structure of the network changes and thus the link between two nodes belonging to a snapshot graph \(G^t\) can be removed in the snapshot graph \(G^{t+1}\) which generates a modification in the value of the semantic similarity between two nodes as well as a modification in the set of leader nodes.

4.1 Phase 1: Community Detection

In this phase, we propose to use Combo [11]. This algorithm handles the community detection problem which is able to deal with various objective functions. The majority of search strategies take one of the next steps to improve the quality of partition: merging two communities, splitting a community into two and moving nodes between two distinct communities. Combo covers all these possibilities.

4.2 Phase 2: Influential Nodes Detection

The first part of the phase 2 enables to generate a set of leader nodes. A node is a leader if its degree centrality is greater or equal then its neighbors. The degree centrality enables to measure the total of the node connections with its neighbors. In this paper, we model the dynamic social network as a group of snapshot graphs while the edges in each snapshot graph change over different time intervals and thus the leader nodes change in each snapshot graph. This equation is used to calculate the degree centrality of a node v of a given snapshot graph \(G^t=(V,E^t)\):

$$\begin{aligned} {{dc(v)=\frac{deg(v)}{|V|-1}}} \end{aligned}$$
(1)

In the second part of this phase, we use the diffusion model in [2] to define the inactive nodes that can be activated. In \(G^t\), the link between two nodes is associated with a weight which is defined by the semantic similarity of their information. The semantic similarity related to each snapshot graph \(G^t\) is given by following equation:

$$\begin{aligned} sim_{u,v}^{t}=\frac{Common(u,v)}{long(u)+long(v)} \end{aligned}$$
(2)

Every node v in a partition \(\mathcal {P}=\) \(\left\{ C_{1},...,C_{r} \right\} \) is related to a degree of centrality. Taking into consideration the degree of centrality of each node in \(G^t\) and a set of active nodes, a node v becomes active if the total of the similarity or the total weight of its active neighbors overrides the threshold value dc(v) associated with a node v. Our diffusion model is defined as follows:

$$\begin{aligned} \sum _{u\in \mathbb {A}_{v}}w_{v,u}^{t} > dc(v) \end{aligned}$$
(3)

When we apply our diffusion model in each snapshot \(G^t\), we can get a set of active nodes \(\mathbb {NA}^t\) which allows us to identify the set of influential nodes. Our main objective is to identify from the active nodes those that maximize the influence propagation. In each community, we determine its influence degree which is given by the Eq. 4 and is equal to the number of active nodes in each community (\(V_{active}\)) divided by the sum of nodes in the graph G(V).

$$\begin{aligned} R (C_{r}) = V_{active} / V \end{aligned}$$
(4)

Firstly, the proposed approach determines influential nodes from the community that contains the value of the highest influence degree. Secondly, our objective is to calculate for each active node its closeness centrality given by the Eq. 5. It represents the total of the length of the shortest paths between the node and all other nodes in the graph. The closeness centrality is defined by the following equation:

$$\begin{aligned} CCenter(v_{i})=\frac{1}{\sum _{v_{j}\epsilon V}\left| ShortPath(v_{i},v_{j}) \right| } \end{aligned}$$
(5)

Once we have calculated the closeness centrality of every leader node, we classify these nodes according to an increasing order. The influential node is that admits the highest closeness centrality. The algorithm of SNDUpdate is described in Algorithm 1. Firstly, it detects communities using Combo algorithm. Secondly, on each snapshot graph \(G^t\), it generates a set of nodes playing the role of a leading nodes. Once we have generated the set of the leader nodes that are the initiators, we apply our diffusion model. Finally, we apply our diffusion model to determine the set of active nodes and then identify the influential nodes. In the proposed algorithm we find the set of the active nodes that maximizes the influence within each snapshot graph \(G^{t}\) with updating b (Update b) the function which allow us to update the value of edges over each time stamp t. About the SNDUpdate time complexity, because it is difficult to analyse the complexity of dynamic edges update operations [16], we will only analyze each step in a static snapshot. SNDUpdate consists of T snapshots consisting each one of two phases. The first phase admits a complexity of \(O(V^{2}\log {}M)\) as we used the Combo algorithm. The second phase is composed of two steps. The first generates leader nodes and admits a complexity of \(O(MV^{2})\). The second calculates the temporary shortest path of active nodes and corresponds to a complexity of \(O(E^{t}(NA^t)\log (NA^t))\).

figure a

5 Experiments

In this section, we estimate the effectiveness and efficiency of our proposed SNDUpdate to identify the influential nodes in dynamic social networks on three real networks. The experimental results demonstrate that our algorithm is both efficient and effective.

5.1 Experiment Settings

In our experiments, we use three real social networks as shown in Table 2. For each data set, we used 3 different social networks, updated edges, and thus executed the experiments 3 times. For the 3 examples, however the original networks and updated edges are different, the combination of the 3 groups snapshots is similar to the data set itself. Let b denote the number of edges in each snapshot graph, using different parameters, b and \(\varDelta t\), we can create a family of snapshots with many properties for our next experiments. In Table 2, to simulate dynamic networks, we partitioned all edges into 3 time stamps: 25% of edges in time stamp \(t=5\), 50% of edges in time stamp \(t=10\) and 85% of edges in time stamp \(t=15\). The evolution of edges can reflect that the real-world social networks change rapidly during the considered time periods.

Table 2. Datasets and edge information

Table 2 shows the number of edges in each snapshot graph created from networks. We make a group of snapshot graphs from three networks by changing the number of edges with a constant time difference \(\varDelta t\) = 5 min. The metrics applied in this paper to evaluate the performance of our algorithm are: influence propagation and running time. Influence propagation is the total number of influenced nodes activated by leader nodes and the running time is the time to identify the most influential nodes.

5.2 Experiment Results

In this paper, we compare our algorithm with two static influence maximization algorithm IMM, Degree and one dynamic algorithm DIM [19]. As shown in Figs. 1 and 2, the influence propagation of SNDUpdate algorithm outperforms that of IMM and DIM algorithms on Flixster and NetHept datasets. Varying the network size, Degree algorithm can not detect influential nodes anymore, while SNDUpdate finds better values of the influence propagation than those of both IMM and DIM. According to the results of this evaluation, while increasing the size of the network and updating the number of edges at each time stamp, SNDUpdate is still able to have a better influence propagation. Thus, it covers a large number of influential nodes in any type of networks. As conclusion, on the Flixster and NetHept networks, SNDUpdate is more efficient since the diffusion strategy based on edges change can narrow the search space of influential nodes.

Fig. 1.
figure 1

Detecting influential nodes under NetHept network

Fig. 2.
figure 2

Detecting influential nodes under Flixster network

5.3 Comparison with Methods Based on Metrics

We compared our algorithm SNDUpdate with three algorithms based on metrics which are: K-shell, MDD [17] and CN [12]. Our study is based on metrics to detect influential nodes in dynamic social networks. In the K-shell method, some nodes with large degree are not under consideration. When examining the exhausted degree in the network decomposition, the ranking of these nodes are ameliorated by the MDD algorithm. That is the reason why MDD method outperforms the K-shell method. In spite of the fact that the MDD method can improve the K-shell method in running time, it is still not the best way to deal with the problem of identifying influential nodes. According to this evaluation, when updating the size of the network SNDUpdate still has a better running time than the three other algorithms K-shell, CN and MDD. We explain this result by the fact that our method covers a considerable number of influential nodes in any type of networks. As indicated in Table 3, SNDUpdate is the best one regarding to the running time.

Table 3. Evaluation of the running time in different algorithms

5.4 Comparison with Methods Based on a Diffusion Model

To detect influential nodes, all the algorithms are running under a diffusion model: Independent Cascade model (IC), Linear Threshold model (LT) and the Weighted Cascade (WC). Our diffusion model is based on the semantic similarity between nodes. We evaluate the running time of SNDUpdate, DIM and IMM algorithms on NetHept and Flixster networks in three snapshots. As illustrated in Figs. 3 and 4, the running time of SNDUpdate is better than the other two algorithms on the two networks. Among the three compared algorithms, we can easily find that IMM is very slow, while SNDUpdate performs well in terms of running time on the two networks. The reason is that IMM is a static algorithm so, when the network changes, DIM and SNDUpdate which are two dynamic algorithms run faster than the static algorithm IMM on the two networks. On Flixster network, the SNDUpdate running time increases, but is still better than the running time of DIM algorithm on large dynamic network. On NetHept and Flixster networks, SNDUpdate is faster than DIM algorithm. As conclusion, we can see that SNDUpdate has the best running times. The reason is that we use a diffusion model that improves the influence propagation as well as the running time of SNDUpdate. Note that IMM and DIM cannot detect influential nodes behind any diffusion model because they do not know the influence propagation of each node.

Fig. 3.
figure 3

Running time on NetHEPT

Fig. 4.
figure 4

Running time on Flixster

6 Conclusion

In this paper, we study the problem of identifying influential nodes in a changing social networks. We propose an approach called SNDUpdate algorithm and describe how to integrate it into a classical SND algorithm. The goal of this method is to detect influential nodes in dynamic social networks where edges are evolving over time. To prove the performance of the proposed approach, we compare it with approaches based on metrics and diffusion model in three real networks. Experimental studies carried out on the selected networks shown how it is possible to obtain good results in the detection of nodes that maximize the influence propagation. Empirical studies show that our algorithm has great improvement on the influence propagation compared with IMM, Degree and DIM algorithms. In the future, we plan to predict the change of influential nodes where both nodes and edges evolve in dynamic social networks.