1 Introduction

Since the emergence of social media such as Facebook and Twitter, more and more people have paid attention to social networks. A social network is a social structure composed of a set of social actors, where nodes represent individuals or other entities embedded in a social context and edges reflect the interaction, cooperation or influence between entities.

People’s relationships are continuously evolving: new edges and vertices are added to the graph over time and old ones can be removed. Social networks are therefore highly dynamic. As a key issue in social networks, prediction has attracted growing attention, since predicting links is important for mining and analyzing social network evolution.

One of the interesting issues addressed in social network analysis, which helps in understanding the evolution of social networks, is the problem of prediction. It consists of predicting a future association between a pair of nodes, knowing that there is no link between them in the current state of the graph.

Social networks are not static; they change rapidly through regular updates (e.g. node and edge additions/deletions). As with the network structure, node attributes often change as well; the modification of online user posts is a classic example. Since both topology and attributes change, we refer to such networks as dynamic social networks. The aim of our work is to quantify the influence of individuals within a period of time by using a new approach, and to find influential individuals in such a manner that we can predict the influence of each user with high precision. The contributions of this paper are:

  1. To propose an incremental algorithm to detect influential nodes, taking into consideration the structural evolution of social networks,

  2. To present a new influential nodes prediction method for dynamic networks, based on the egocentric networks detected in the first contribution and the semantic similarity of the nodes extracted from those networks.

The rest of this paper is organized as follows. In Sect. 2, we discuss the related work. Our proposed method is given in Sect. 3. Experimental results are presented in Sect. 4. Finally, we conclude our work and present some future work in Sect. 5.

2 Related Work

This section reviews the research on the prediction problem in the literature. In this paper, we classify the problem of prediction into two approaches, namely link prediction and user influence prediction, as described in Fig. 1. We also classify the existing approaches according to the measures used in the prediction problem (see Table 1).

Fig. 1. Approaches of prediction problem

2.1 Link Prediction Approaches

The link prediction problem has several applications [2], such as link analysis, bioinformatics, information retrieval, tracking temporal topics [17] or the identification of influential nodes [7]. In particular, predicting future links is useful for understanding the network evolution. The evaluation measures used in the field of link prediction are borrowed from other fields of research, i.e. classification and information retrieval [16].

The estimation of future connections is useful for understanding the evolution of the network and of communication. In social media networks, for example, promising links that do not yet exist will facilitate user engagement and interaction, which also affects the structure of the network. In addition, the network structure influences the interaction or the spread of information. For example, future friendships could be predicted when analyzing social networks, or future co-authors in a collaborative network. Existing link prediction approaches can be divided into similarity-based and learning-based approaches. Similarity-based approaches measure the similarity between a pair of nodes through various similarity metrics, and use the similarity score to predict whether the two nodes will be related in the future.

In [3], the authors show that the analysis of user-to-user evaluations can be considerably reinforced by taking into account the similarity of user characteristics, such as the degree to which their contributions to the site involve similar content or involve interactions with a common set of other users. Learning-based methods have difficulties with feature selection and class imbalance, and are affected by computational costs and capacity constraints, so they are not ideal for large and dynamic networks. In order to predict relations, the authors of [15] evaluated six different social attributes in the context of face-to-face communication networks. Language and country are the two attributes that play an important role in communication prediction. They observed that people prefer to contact those who are similar in language and region.

Some traditional machine learning models including classifiers (Markov chains, SVM, etc.) and probabilistic models (Markov Random Fields, Bayes model, etc.) can thus be used to solve this problem. Li et al. suggested a graph-based learning model using profile features (i.e. book title, education, age, introduction, keywords, etc.) to predict a connection in the bipartite network between user and item [10]. In order to overcome the limitations of the learning-based methods, Aggarwal et al. [20] propose a model for predicting links with spatial and temporal precision (LIST), to predict links in a series of networks over time. LIST characterizes the structure of the network as a function of time, which includes the spatial topology of the network at each time and the temporal evolution of the network. LIST integrates network propagation and temporal matrix factorization techniques.

The availability of the overall network structure is generally assumed by existing link prediction algorithms. However, this assumption is unfeasible since real-world networks are always large-scale and measured in terabytes or even petabytes. This fact makes them difficult to store and recall before link prediction takes place.

Table 1. Classification of measures used in the prediction problem

2.2 Social Influence Prediction Approaches

Due to the massive data and the public availability of popular social media systems, researchers have recently become interested in studying the influence of users on social networks. This issue is of practical importance: it helps marketing strategies to target the most influential people in order to maximize sales.

Social influence is everywhere, not only in our real life, but also on social media. The term social influence usually refers to the phenomenon according to which one person’s emotions, opinions, or behaviours are affected by others.

All the existing measures allow users to be ranked at a certain instant or period of time from existing data. Over time, these rankings can change considerably, leaving a trail of historical data that can be used to make social influence predictions. To predict influential users, we can extract information from metrics such as the ones given in Table 1. Some studies have estimated user influence from the network structure perspective to measure the social networking potential of social media users. As shown in [12], people tend to create new relationships with people that are closer to them on a social graph.

Users in online social networks not only make new friends but also seek and share information. Users tend to create relationships with people that are similar to them along certain profile attributes, such as gender, education and religion. When a user shares a message, his/her contacts can be influenced to re-post that information. A new idea to quantify user influence is introduced in [21]. The user’s influence is described as the potential of his/her actions to motivate others to republish or respond to his/her messages. This description calculates influence by taking into account both the quantity of posted messages and their popularity. In [1], the authors suggest a novel user similarity for the evaluation of social networks with regard to network structure and profile attributes. The authors implement two distinct similarity metrics, namely network and profile similarity, and demonstrate how these two measures can be combined to find user similarity.

Qiu et al. [14] developed an end-to-end framework named DeepInf, motivated by the recent success of deep neural networks in a wide variety of computing applications. DeepInf uses the local network of a user to learn its latent social representation as an input into a graph neural network. The authors design several strategies to integrate both network structure and user features into the neural network. Given the action status of the near neighbors of a user and their local structural details, the objective of the authors is to predict the user’s action status.

2.3 Preliminaries and Problem Statement

Prediction is an important problem in social networks. Many of the existing approaches attempt to predict interactions between individuals in static networks, ignoring the dynamic structure of social networks. In this paper, we propose a prediction method that explores the dynamic topology of social networks.

Given a set of influential nodes identified in our first phase, our aim is to predict the collection of influential nodes that can be generated in the future. In the first phase of our proposed approach, we study the evolution of the network between t1 and t2. In the second phase, taking into consideration the structural evolution of the graph, we seek to predict precisely which influential nodes observed at t1 (or t2) will be modified at a given future time t. Our aim is to predict the presence of an influential node in the new set of nodes at time \(t+1\), taking into account the node’s features. We denote the dynamic social network as a sequence of networks at different timestamps: \(DSN=\left\{ G^{1}, G^{2},...,G^{t}\right\} \), where \(G^t\) represents the snapshot of the network at time t, and \(G^t\) = (\(V^t\), \(E^t\)), with \(V^t\) denoting the set of nodes in \(G^t\) and \(E^t\) denoting the set of edges in \(G^t\). We assign to each user a set of interests that describes it. Such interests are described as an attribute vector \(X_{i}=\left\{ x_{i1},...,x_{ij} \right\} \), where \(x_{ij}\) is the value taken by the attribute j of the vertex \(v_{i}\). In this work, we adopt the following definitions:

Definition 1

Influential Nodes (\(Inf^{t}\)). It represents the set of influential nodes detected from \(G^{t}\) after a succession of updates. Therefore \(Inf^{t}=\left\{ S^{t}_{1}, S^{t}_{2},...,S^{t}_{k^{t}}\right\} \), where \(k^{t}\) represents the number of influential nodes in \(G^{t}\) and \(S^{t}_{i}\) represents the \(i^{th}\) influential node in \(G^{t}\).

Definition 2

Subgraph (\( SubG^{t}\)). It represents the set of subgraphs \(\varDelta G^{t+1}\) observed in the time interval from t to \(t+1\). These subgraphs are composed of nodes and edges added/removed over a well-determined time interval.

Definition 3

Influential egocentric network \(G'=(V',E')\). It represents the influential area as an aggregation of egocentric networks detected from DSN based on \(Inf^{t}\). Each egocentric network contains an “ego”, which is an influential node, and the nodes influenced by the ego, which are called “alters”.
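The definitions above can be illustrated with a minimal Python sketch. The `Snapshot` class and both function names are hypothetical, and the sketch assumes that the alters of an ego are simply its direct neighbours, which is a simplification of "nodes influenced by the ego".

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One snapshot G^t = (V^t, E^t) with a set of interests per node."""
    nodes: set
    edges: set  # undirected edges stored as frozenset({u, v})
    interests: dict = field(default_factory=dict)  # node -> set of interests

    def neighbors(self, v):
        """Nodes sharing an edge with v."""
        return {w for e in self.edges if v in e for w in e if w != v}


def egocentric_network(snapshot, ego):
    """Return (V', E') for one ego: the ego, its alters, and the edges among them.

    Assumption: alters are the ego's direct neighbours.
    """
    members = snapshot.neighbors(ego) | {ego}
    ego_edges = {e for e in snapshot.edges if e <= members}
    return members, ego_edges


def influential_egocentric_network(snapshot, influential):
    """Aggregate the egocentric networks of every node in Inf^t (Definition 3)."""
    nodes, edges = set(), set()
    for ego in influential:
        v, e = egocentric_network(snapshot, ego)
        nodes |= v
        edges |= e
    return nodes, edges
```

Storing edges as `frozenset` pairs keeps the graph undirected and lets two snapshots be compared with ordinary set operations.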

3 Semantic and Structural Influential Nodes Prediction

The proposed approach seeks to detect the most influential nodes in dynamic social networks. It considers both the structural and the semantic aspects of the network. For this reason, the main idea is to propose a two-phase approach: the first phase captures the structural evolution of the network, and the second phase focuses on the semantic aspect by presenting the proposed prediction model.

3.1 Phase 1

In the first phase, we begin by applying metrics to identify the influential nodes in the original graph. During the second step, we attempt to detect the changes in the structure of the network: since the network is dynamic, its edges and nodes change.

It is very expensive to recompute influential nodes from scratch after each update, which motivates us to develop a structural approach to updating influential nodes in dynamic social networks. The main objective of this phase is to identify the different subgraphs detected between two different timestamps. Based on the relation between the subgraph observed at time t and the previously observed influential nodes, we propose three types of changed elements.
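As a rough illustration of how the changed elements between two snapshots might be extracted and sorted into the three types described in this section, consider the sketch below. The function names and the exact decision rule are assumptions made for illustration: here "completely separate" means only new nodes were added and no added edge touches an old node, "completely reliant" means only edges between existing nodes were added, and everything else is treated as "mixed".

```python
def snapshot_delta(nodes_t, edges_t, nodes_t1, edges_t1):
    """Changed elements between G^t and G^{t+1} (edges are frozenset pairs)."""
    return {
        "added_nodes": nodes_t1 - nodes_t,
        "removed_nodes": nodes_t - nodes_t1,
        "added_edges": edges_t1 - edges_t,
        "removed_edges": edges_t - edges_t1,
    }


def classify_change(delta, nodes_t):
    """Map a delta to one change type (hypothetical decision rule)."""
    if not any(delta.values()):
        return "unchanged"
    # Any removal is handled by the mixed strategy in this sketch.
    if delta["removed_nodes"] or delta["removed_edges"]:
        return "mixed"
    # Old nodes that are endpoints of newly added edges.
    touched = set()
    for edge in delta["added_edges"]:
        touched |= set(edge)
    old_touched = touched & nodes_t
    if delta["added_nodes"] and not old_touched:
        return "completely separate"
    if not delta["added_nodes"] and delta["added_edges"]:
        return "completely reliant"
    return "mixed"
```

Only the elements reported by `snapshot_delta` need to be revisited, which is what makes an incremental update cheaper than a full recomputation.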

To start, the proposed approach detects the influential nodes in the original graph. In the first step, we propose to use the SND algorithm [6]. On the one hand, it exploits the relationships between the network’s nodes and, on the other hand, the attributes characterizing them. In the second step, we seek to detect the change in the structure of the network. As the network is dynamic, the edges and nodes evolve over time. Thus the influential nodes already identified in the original graph will change over time. In the third step, we present our updating strategies for the subgraphs that have been observed between two consecutive timestamps.

Fig. 2. Example of an egocentric network

In this step, we propose to divide the social network observed at time t into a collection of connected egocentric networks G’ in order to better classify the changed areas. Every egocentric network contains an “ego” node (an influential node) and the nodes affected by (and located between) this ego, called “alters” (see Fig. 2). The egocentric network allows us to observe the influential region. We define the following three strategies to update the influential nodes based on the observed subgraphs.

  1. Strategy to update the completely separate type

    In this type, all nodes added in \(SubG^{i}\) are new ones. We only consider a newly added node v \(\in \) \(Inf^{t+1}\) if it changes the marginal gain of any node that is not in \(G^{'}\).

  2. Strategy to update the completely reliant type

    In this type, edges added between nodes are considered first. Second, we use egocentric networks to identify the influential nodes. Then, to classify the influential nodes affected by this update, we calculate the closeness centrality [13] between added edges and ego nodes. Finally, we change the influence degree of the ego node. Equation 1 is used to calculate the influence degree of a node u, where \(V_{u}\) is the set of nodes influenced by u in \(G^{t}\) and \(N^{t}\) is the set of nodes in \(G^{t}\).

    $$\begin{aligned} \begin{array}{lcl} \displaystyle \sigma (u)=\frac{\left| V_{u} \right| }{\left| N^{t} \right| } \end{array} \end{aligned}$$
    (1)

  3. Strategy to update the mixed type

    In this type, we consider adding/removing new and old nodes. Thus, we need to measure the average relation strength of nodes in N(v) (respectively \(N'(v)\)) with each node in G (respectively \(G'\)), where N(v) denotes the neighbors of v in G and \(N'(v)\) is the set of neighbors of v in \(G'\). The value of \(sim^{t}(v, u)\) represents the Jaccard’s coefficient used to calculate the semantic similarity between two nodes v and u. If the division of \(S_{v}^{G'}\) (see Eq. 2) by \(S_{v}^{G}\) (see Eq. 3) exceeds 1, then we add v to \(G'\) and we change the influence degree of the ego node.

    $$\begin{aligned} \begin{array}{lcl} \displaystyle S_{v}^{G'}=\sum \left( \frac{\sum _{u\in N'(v)}sim^{t}(v,u)}{\left| N'(v) \right| } \right) \end{array} \end{aligned}$$
    (2)
    $$\begin{aligned} \begin{array}{lcl} \displaystyle S_{v}^{G}=\sum \left( \frac{\sum _{u\in N(v)}sim^{t}(v,u)}{\left| N(v) \right| } \right) \end{array} \end{aligned}$$
    (3)
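The influence degree of Eq. 1 and the mixed-type membership test built on Eqs. 2 and 3 can be sketched as follows. This is an illustrative reading under several assumptions: Eq. 1 is taken as a ratio of set sizes, only the inner average of Eqs. 2 and 3 is implemented (the outer summation is not given an index in the text), and all function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient of two interest sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def influence_degree(influenced_by_u, all_nodes):
    """Eq. 1 read as |V_u| / |N^t|."""
    return len(influenced_by_u) / len(all_nodes)


def mean_similarity(v, neighbors, interests):
    """Average Jaccard similarity between v and a set of neighbours."""
    if not neighbors:
        return 0.0
    total = sum(jaccard(interests[v], interests[u]) for u in neighbors)
    return total / len(neighbors)


def should_join_ego_network(v, neighbors_in_ego, neighbors_in_graph, interests):
    """Mixed-type test: add v to G' when S_v^{G'} / S_v^{G} exceeds 1."""
    s_ego = mean_similarity(v, neighbors_in_ego, interests)
    s_graph = mean_similarity(v, neighbors_in_graph, interests)
    # Guard against division by zero when v shares nothing with its neighbours.
    return s_graph > 0 and s_ego / s_graph > 1
```

In this reading, a node joins the egocentric network when it is, on average, semantically closer to its neighbours inside \(G'\) than to its neighbours in the whole graph.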

3.2 Phase 2

In this phase, our aim is to collect the data required to train our learning model. In the first phase, based on the subgraphs observed between two timestamps, we updated the influential nodes. The proposed prediction model is described in Fig. 3. The formation of new influential nodes is predicted from the topology of a social network obtained from the evolutions of the network during the test period. In the data acquisition process, an egocentric network (the output of our first contribution) is generated based on the observed influential nodes. From this input graph, information relating to the influential nodes found in the egocentric network is extracted. Our proposed approach is summarized as follows. First, we apply the SND algorithm to identify the influential nodes in the original network. Second, we adopt our proposed strategies to update the influential nodes based on the structural evolution of the social network between two consecutive timestamps. Then, we propose a model for the prediction of future influential nodes by exploring the semantic aspect of social networks (see Fig. 3).

Fig. 3. The influence prediction model

After updating the graph, we identify the influential node A as shown in Fig. 4, and we extract its two observed egocentric networks, egocentric network 1 and egocentric network 2. A thus has two influential areas, and it has 10 friends observed across these two areas (black nodes). Suppose that there are two stranger nodes u and v. Our objective is to predict, based on the semantic similarity between the influential node A and the two strangers, which node is more likely to be added to the influential area of A. To do so, we calculate the semantic similarity between A and each of the stranger nodes u and v. We associate each link between two nodes of the egocentric network with a weight defined by the semantic similarity of their information, given in the following equation:

Fig. 4. Illustrative example of the semantic aspect

$$\begin{aligned} \begin{array}{lcl} \displaystyle sim^{t}(x,y)=\frac{\left| n_{x}\cap n_{y} \right| }{\left| n_{x}\cup n_{y} \right| } \end{array} \end{aligned}$$
(4)

Semantic similarity compares the centers of interest stored in the attribute vectors associated with two social network users, to determine how similar they are. The easiest way is to compute the semantic similarity over all the paths between an ego node I and a stranger node x (see Eq. 5).

$$\begin{aligned} \begin{array}{lcl} \displaystyle NS(I,x)=\frac{Log(\sum _{Paths \in E^{'}_{(I,x)}}sim(I,x))}{Log(2.\sum _{Paths \in E^{'}_{(I)}}sim(I))} \end{array} \end{aligned}$$
(5)

Stranger u has 3 common friends with user A and the egocentric network contains 9 edges; stranger v also has 3 common friends with A, with 6 edges within the egocentric network. The network similarity between A and v is then NS(A, v) = Log(3.9)/Log(2 * 9.1) ≈ 0.47, while NS(A, u) = Log(5.1)/Log(2 * 9.1) ≈ 0.56. Our metric favors u as it is connected to a stronger influential area around A than v.
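The ratio in Eq. 5 can be evaluated with a few lines of Python. This is a minimal sketch that assumes the two similarity sums (over the paths between the ego and the stranger, and over the ego's whole egocentric network) have already been computed; the function and parameter names are hypothetical.

```python
import math


def network_similarity(path_similarity_sum, ego_similarity_sum):
    """Eq. 5: log of the summed similarity on the ego-stranger paths,
    divided by the log of twice the summed similarity of the ego network.

    A ratio of logarithms does not depend on the base, so the natural
    logarithm is used here.
    """
    if path_similarity_sum <= 0 or ego_similarity_sum <= 0:
        raise ValueError("similarity sums must be positive")
    denominator = math.log(2 * ego_similarity_sum)
    if denominator == 0:
        raise ValueError("2 * ego_similarity_sum must differ from 1")
    return math.log(path_similarity_sum) / denominator


# Sums quoted in the example for stranger v: 3.9 on the paths, 9.1 in total.
ns_v = network_similarity(3.9, 9.1)
```

With the sums quoted for stranger v, the function returns roughly 0.47. A larger path sum for the same egocentric network always yields a larger score, which is why the stranger attached to the stronger influential area is preferred.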

4 Experiments

We test the effectiveness and efficiency of the proposed approach on real dynamic networks where each node has been associated with a set of centers of interest. Experiments were conducted to provide a comparison between the proposed structural and semantic approach and those discussed in the literature.

4.1 Datasets

We evaluated the proposed approach on three real-world network datasets. We treat each dataset as an evolving network observed at two timestamps. Table 2 gives information about these networks.

Table 2. Datasets

4.2 Algorithms and Parameters

Algorithms. We compared our algorithm with two algorithms for dynamic influence in social networks, UBI [4] and Local D&U [18]. UBI’s main aim is to classify the important nodes based on those previously identified, rather than locate them from an empty set. Local D&U’s main objective is to classify influential nodes in dynamic networks by exploiting a local detection and updating strategy. In this paper we used Local D&U’s first step to calculate the fraction of activated nodes. Since nodes with larger degree centrality have a higher influence on their neighbours, those nodes can be considered as seed nodes [11]. The selected seed nodes can be used to spread the influence based on the relationship strength between nodes, because if the neighbours of a node v strongly follow the node then v can be considered as an influential node.

Parameter Settings. For UBI, we vary the maximal number of seed nodes k from 40 to 100 in steps of 20 at each timestamp. In our experiments, the marginal gain of a node is selected empirically by a threshold \(\theta \) which varies from 0.1 to 1 in steps of 0.1. A larger value of \(\theta \) leads to a significant change in the influence of the k selected seed nodes. Thus, the marginal gain of a node v depends on the influence of seed nodes over those nodes that v influences.
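For readers reproducing these settings, the parameter ranges can be enumerated as below. This is only a sketch of the grid; the variable names are hypothetical and the source does not state whether every combination of k and \(\theta \) was evaluated.

```python
from itertools import product

# Maximal number of seed nodes k: 40, 60, 80, 100.
seed_counts = list(range(40, 101, 20))

# Threshold theta: 0.1, 0.2, ..., 1.0 (rounded to avoid floating-point drift).
thresholds = [round(0.1 * i, 1) for i in range(1, 11)]

# Every (k, theta) pair, assuming a full grid is wanted.
parameter_grid = list(product(seed_counts, thresholds))
```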

4.3 Results

Dynamic networks are generated between two timestamps based on each original network and the changed elements. A dynamic network can be generated between 40 and 100 timestamps and updated using the updating strategy and the prediction model.

Evolution of the Influence Degree. We can observe from Tables 3, 4 and 5 that our approach achieves better values of influence degree than both the UBI and Local D&U algorithms.

Table 3. Email
Table 4. Facebook
Table 5. NetHEPT

We may observe that the values of influence degree obtained with our approach and with Local D&U show some variation compared to UBI. The values of the influence degree obtained with our algorithm and with the Local D&U algorithm are close. The Local D&U algorithm takes into account the evolution of nodes and edges in each time interval to change the degree of influence of the important nodes. Our algorithm should have a greater degree of control. Thus, the first phase of our proposed approach allows us to cover a large number of influential nodes in any type of network. UBI aims to dynamically track a set of influential nodes in such a way that the degree of influence is maximized at any time. Therefore, where the current snapshot differs considerably from the previous one, the UBI algorithm reaches a low influence degree.

Fig. 5. Email network

Fig. 6. NetHEPT network

Evolution of the Computational Time. In this experiment, we compare the two calculation processes of our proposed approach: the global and the incremental versions. From Fig. 5, we can note that the computational time of the incremental calculation process is more stable than that of the global calculation version. This can be explained by the small number of changed elements between two consecutive timestamps.

In the global calculation version, we calculate the influence degree of important nodes globally (over the whole network) based on our approach. In the incremental calculation version of our approach, we only need to update the influence degree of the modified elements. From Fig. 6, we can observe that, with the incremental version of our proposed approach, the computational time varies over the timestamps, whereas with the global calculation version it is almost constant. This is due to the considerable variation in the node degrees in the NetHEPT network. Multiple nodes of various degrees are added and/or removed at each timestamp. Thus, the calculation cost varies with each timestamp. Therefore, for both small and large social networks, our proposed solution based on the incremental calculation version produces good results.

5 Conclusion

In this paper, a structural and semantic approach to update influential nodes in dynamic social networks is proposed. The main idea is to propose a two-phase approach: the first phase captures the structural evolution of the network, and the second phase focuses on the semantic aspect by presenting the proposed prediction model. The efficiency and effectiveness of the proposed approach are supported by experiments on the influence degree and the computational time on three real dynamic social networks. In future work, we would like to explore machine learning models to study the dynamic evolution of both the network structure and the users’ features. It is also important to analyze the optimal duration of the training period in dynamic social networks.