Keywords

1 Introduction

In recent years online social networks like Twitter, Flixster, Facebook, etc., are not only used for communication and users’ interaction but also for marketing and promotion. The advertisers utilize the word-of-mouth spreading of information to promote a new product, idea, innovation, etc. The word-of-mouth spreading of information in social networks leads to potential applications in viral marketing [6], revenue maximization [34], rumor control [4, 38], network monitoring [15], and social recommendation [39]. Pedro and Matt [6] were the first ones to introduce the IM problem in the field of viral marketing and proposed a probabilistic model to identify the most influential users. Kempe et al. [11] considered IM problem as a discrete optimization problem and proved that the problem is NP-hard. They also proposed a hill-climbing approach, viz. greedy algorithm.

There are a lot of users who have several accounts across online social networks simultaneously and have the opportunity to propagate information across networks. Therefore, it is necessary to focus on multiple networks simultaneously to identify influential users accurately. Most of the existing works only focus on individual networks [3, 14, 26, 27]. There is some research done in this direction [18, 24, 40]. These studies consider IM problem across multiple networks and ignore multiple product marketing simultaneously [31]. To the best of our knowledge, Singh et al. [29] are the first to study MIM2 problem by considering multiple products and multiple networks simultaneously. With this, we present a centrality measure to find influential users under MIM2 framework.

Contribution. The major contributions of this paper are as follows.

  • We introduce a new centrality measure to identifying seed users across the network under IM2 framework.

  • We incorporate independent cascade model (IC) [30] for information diffusion. Also, we adopt IM2 framework [19].

  • The experimental evaluation show that the superiority of proposed centrality measure against the compared methods. Also, the results indicate the advantage of using IM2 framework over classical IM.

Organization. The organization of the paper is described as follows. Section 2 presents the insight and development of IM problem. Section 3 explains the model and problem definition. Section 4 presents the proposed work with analysis. Section 5 describes the experimental setup and compares the performance of proposed algorithm with baseline methods. Finally, Sect. 6 draws the conclusion and list some future directions of research.

2 Related Work

Singh et al. [28, 29] categorize the IM problem into four classes based on seed selection problem under (1). single network (IM); (2). multiple networks (IM2), single network with multiple products (MIM); (4). multiple networks with multiple products (MIM2). Pedro and Matt [6] and Kempe et al. [11] were the first to study IM as an optimization problem. The application of greedy algorithm is computationally inefficient due to utilization of time-consuming Monte-Carlo simulations. There has been much research into finding influential users efficiently such as heuristic methods [3, 14, 25, 36], community-based methods [17, 27, 37], path-based methods [2, 12, 13], score estimation based [1, 8, 10], and sampling-based [5, 20, 33].

Influence maximization across networks (IM2) framework tackles the situation when some of the users have multiple accounts across networks and are able to propagate information simultaneously. There are some studies which consider this problem such as [18, 24, 40]. These methods work in two major steps: network coupling and seed selection. The network coupling strategy first identifies overlapping users then couple multiple networks into a multiplex network based on these users. The seed selection process identifies the seed nodes and propagates influence through a diffusion model.

Sun et al. [31] present a novel framework called multiple influence maximization (MIM) which allows IM for multiple products simultaneously. Inspired by the idea of IM2 and MIM, the authors of [29] introduced a unique problem of multiple IM across multiple social networks (MIM2). They present a diffusion degree heuristic to identify influential users under this framework. Table 1 provides and overview and comparison of existing IM algorithms.

Table 1. The theoretical overview of the existing IM approaches

3 Model and Problem Definition

Let \(G=\{G^1,G^2,\dots ,G^l\}\) represents a set of l social networks where \(G^i=(V^i,E^i,W^i); 1\le i \le l\) is an individual network. The node set, edge set, and influence weight set of a network \(G^i\) are denoted by \( V^i,E^i \& W^i\). A single multiplex network was created by coupling these l networks. We utilize independent cascade (IC) model to estimate the overall spread of influence from seed nodes S. In the IC model, there is two states: active and inactive. Initially, all the seed nodes are active and other than seed nodes are inactive. Each active node would have the chance to influence only their neighbors. Once a node becomes active, it will never change its state in the future. If no node is activated in the subsequent iteration, the diffusion process is considered to be completed.

Definition 1

(Problem Definition). Given a set of influence graphs \(G=\{G^1,G^2,\dots ,G^l\};G^i=(V^i,E^i,W^i)\), an information diffusion model, two positive integers k and l, then influence maximization process selects a seed set \(S \subseteq V\) of k users to maximize the influence spread in G, i.e., \(\sigma (S)= argmax_{|S^*|=k \wedge S^* \subseteq V} \sigma (S^*)\).

Fig. 1.
figure 1

The working of proposed framework

4 Proposed Work

In this section, we discuss a centrality measure for IM problem under IM2 framework. Algorithm 1 works in three major steps: network coupling, seed selection and influence propagation. Figure 1 shows the framework of proposed algorithm.

  1. 1.

    Network Coupling. To perform network coupling on multiple networks, we first identify overlapping users across networks. Then network coupling [29] is performed using overlapping users based on network topology and information propagation capability of different relationships. With this, influence weight w(xy) of each edge (xy) in coupled network is estimated as follows.

    $$\begin{aligned} \begin{aligned} w(x,y)= \sum _{1 \le i \le l} A_{x,y}^{i}.P^i; 0\le P^i \le 1, 0\le \sum _{1 \le i \le l} P^i \le l \end{aligned} \end{aligned}$$
    (1)

    where \(A_{x,y}^{i} \in [0,1]\) and \(P= [P^{1},P^2,\dots ,P^l]\) denote adjacency value of (xy) in network i and propagation capability of each relationship respectively.

  2. 2.

    Seed Selection. After network coupling, we identify influential users in coupled network. To perform seed selection, we present a centrality measure to identify the most influential users. The influence of a user is dependent on its neighbors connection (local influence) and its own location in the network (global influence). Pei et al. [21] have stated that the local influence of a user is bound within its two-hop area. Therefore local influence \(I_L(x)\) of a node x can be estimated as follows.

    $$\begin{aligned} \begin{aligned} I_L(x) ={}&\sum _{j=0}^{j=2} I_L^j(x) ={}1+|N(x)|_G+\sum _{y\in N(x)}{|N(y)|_{G\setminus x}}\\ ={}&1+D_G(x)+\sum _{y\in N(x)}{D_{G\setminus x}(y)}\\ \end{aligned} \end{aligned}$$
    (2)

    where \(D_G(x)=D(x)\) represents degree of node x in network G. Local influence can be calculated based on influence weight, given as follows.

    $$\begin{aligned} \begin{aligned} I_L(x) =1+ \sum _{y \in N(x)} w(x,y) + \sum _{y \in N(x)} \sum _{z \in N(y)} w(x,y) \times w(y,z) \end{aligned} \end{aligned}$$
    (3)

    The location of a node is defined using coreness score which can be measured by k-shell decomposition method [23]. The coreness centrality is a global method for identifying vital nodes, since the process of getting coreness score requires the global information of the network. Therefore, global influence \(I_G(x)\) of a node x is estimated using coreness score \(k_c(x)\), given as follows.

    $$\begin{aligned} \begin{aligned} I_G(x)= k_c(x)\Big (1+\frac{D(x)}{D_N}\Big ) \end{aligned} \end{aligned}$$
    (4)

    where \(D_N=max(D_x)\) denotes network degree. Now, we present centrality measure based on local and global influence. The overall influence capacity I(x) of a node x is defined as follows.

    $$\begin{aligned} \begin{aligned} I(x)= \frac{I_L(x)}{max_{y\in V}I_L(y)} \times \frac{I_G(x)}{max_{y\in V}I_G(y)} \end{aligned} \end{aligned}$$
    (5)

    With this, we can select a most influential seed node x based on largest influence capacity metric I(x) value. The adjacent nodes who are more similar to each other, have more influence overlap in the network. Therefore, we adopt a similarity index common neighbors to measure the similarity between adjacent nodes. To select subsequent seed nodes, we utilize common neighbors index CN(xy), given as follows.

    $$\begin{aligned} \begin{aligned} CN(x,y)= \frac{|N(x) \cap N(y)|}{|N(x) \cup N(y)|} \end{aligned} \end{aligned}$$
    (6)
  3. 3.

    Influence Propagation. In order to propagate influence in coupled social network, we incorporate IC diffusion model.

figure a
Fig. 2.
figure 2

The example graph: Twitter Network

4.1 Algorithm

The Algorithm 1 takes three inputs: influence graph set G, number of networks l, and seed size k. In line 1, we initialize the seed set S with an empty set. Line 2 performs network coupling based on network topology and the importance of a relationship. The for loop in lines 3–7 iteratively calculates the influence capacity I(x) of each node x and also mark them as unvisited. Line 8 perform sorting based on influence capacity of each node. The for loop in lines 9–16 iteratively select seed nodes. Line 10 selects highest influence capacity unvisited node. Line 11 marks node x as visited. The for loop in lines 12–14 marks visited such nodes that have common neighbors more than the threshold value. Line 15 adds a node to seed set. Line 16 increments the value of i. Finally, line 17 returns the output of the algorithm as seed set.

Returns the seed set S as output.

4.2 Applying the Algorithm

In order to better understand the execution of proposed algorithm C-IM2, we take a running example of Twitter network with three different type of relationship networks: Follower, Re-tweet, and Reply Network as shown in Fig. 2. First, algorithm performs coupling based on Eq. 1. Let probability vector \(P=[0.2,0.5,0.3]\) and each edge in example graph have a unit weight. Now, we perform network coupling. For example, the edge weight of (AB) in coupled multiplex network is calculated as \(w(A,B) \leftarrow 1\,\times \, 0.2\,+\,1\,\times \, 0.5\,+\,1\,\times \, 0.3 \leftarrow 1\). Similarly, \(w(A,C) \leftarrow 1\,\times \, 0.5\,+\,1\,\times \, 0.3 \leftarrow 0.8\). Figure 3 demonstrates the coupled multiplex after performing network coupling. To identify influential users, algorithm calculates the influence capacity of each node as shown in Table 2. For example, local influence of A is computed as \(I_L(A)\leftarrow 1\,+\,(1\,+\,0.8\,+\,0.2)\,+\,(1 \,\times \, (0.5\,+\,0.5)\,+\,0.8 \,\times \, (0.5\,+\,0.7)\,+\,0.2 \,\times \, (0.5\,+\,0.7)) \leftarrow 5.2\). After estimating influence capacity of each node, the algorithm identifies most influential nodes as A and C.

Table 2. The estimation of influence capacity of each node
Fig. 3.
figure 3

The coupled multiplex graph of examplary Twitter network

5 Results and Discussion

5.1 Experimental Setup

In order to evaluate the performance of C-IM2 algorithm, we compare the influence spread of our algorithm with the state-of-the-art algorithms on real-world networks. The propagation probability in IC model follows uniform distribution over tri-valency model, i.e., {0.1,0.01,0.001}. The distribution of activation probability follows uniform distribution, i.e. \(a_p(x,y)\in \) [0,1].

5.2 Dataset

To perform experiment, we utilized three co-author networksFootnote 1 [40]: Network Science (NSN), High-Energy Theory (HTN), and Condensed Matter (CMN). The name of the authors are used to identify identical users across networks. The statistical information of datasets is given in Table 3.

Table 3. The statistical information of the datasets

5.3 Algorithm to Compare

  • Random: Randomaly selects k nodes as seed nodes.

  • MaxDegree: This method selects k nodes based on degree of nodes (maximal degree nodes).

  • Greedy: Selects seed nodes based on the marginal gain of nodes [11].

  • Degree Discount: Selects seed based on degree and discount procedure [3].

  • C-IM2: This is our proposed algorithm which selects influential nodes based on influence capacity of nodes.

5.4 Experimental Result

This section describes the performance of C-IM2 algorithm. The influence spread of C-IM2 is compared with baseline methods to validate the performance of the C-IM2. The algorithm with the highest influence spread is referred to as the best performing IM algorithm. To experiment, we utilized three real-world networks with differently sized seed set (10–50).

Fig. 4.
figure 4

The performance comparison of C-IM2 against the baseline methods in terms of influence spread

Comparison of C-IM2 Under IM Framework. Figure 4 illustrates the comparison of C-IM2 method against the compared methods. The proposed method performs better than heuristics Random, MaxDegree, and Degree Discount. This is because of the fact that these heuristics only consider the degree of nodes and ignores others topological information like local influence, location in the network, etc. Therefore, C-IM2 outperforms these methods. Also, the proposed method has a comparable influence spread with the hill-climbing greedy method.

Advantage of IM2 Framework. Figure 5 shows the performance gain of IM2 framework over traditional IM problem. The proposed algorithm performs better than the classical IM problem. This is because of IM problem ignores the fact that some of the social network users have multiple accounts on different networks simultaneously while IM2 considers this assumption. Therefore, IM2 framework is more realistic and generates more effective seed set.

Fig. 5.
figure 5

The performance comparison of C-IM2 under IM and IM2 framework in terms of influence spread

6 Conclusion and Future Directions

In this paper, we study the IM2 problem in the multiplex network. We perform network coupling to form a multiplex network based on network topology and the importance of relationship in information spreading. Then we present a new centrality measure to identify the most influential users in the coupled multiplex network. Exhaustive experiments show the advantage of IM2 framework over classical IM framework and provide a new insight into the problem. In the future, we can extend our work to MIM2 problem with heterogeneous propagation models. Also, we can add some contextual features like topical, spatial, dynamic, etc.