1 Introduction

The exploding popularity of social networking services, such as Facebook, LinkedIn, Google+ and Twitter, has led to an increasing interest in the effective use of word-of-mouth to market products or brands to consumers. A few individuals, seen as influencers, are targeted with free merchandise, exclusive deals or new information on a product or brand. Marketers hope that these key influencers promote the product to others in their social network through status updates, blog posts or online reviews, and that this information propagates throughout the social network from peers to peers of peers until the product “goes viral.” A key question for marketers with limited budgets and resources is therefore how to identify a small number of individuals to target with promotions and relevant information so as to instigate a cascade of peer influence, taking the network effects into account.

1.1 Literature review

Domingos and Richardson [15] first introduce the problem of finding which customers to target to maximize the spread of their influence in the social network. The authors propose a Markov random field model of the social network, where the probability that a customer is influenced takes into account whether her connections are influenced. After building this network, the authors propose several heuristics to identify which k individuals to target in a viral marketing campaign, where k is a user-defined positive integer. Kempe et al. [21] formalize the optimization problem and introduce two fundamental models to maximize the influence spread in a social network: the independent cascade model and the linear threshold model. The authors show that the optimization problems are NP-hard, assuming that there is an efficient oracle to compute the influence spread function. This seminal work spurred a flurry of research on social networks, with over 4600 citations recorded by Google Scholar in August 2017. Wang et al. [53] show that calculating the influence spread function is #P-hard under the probabilistic assumptions of Kempe et al. [21]. Therefore, the independent cascade problem is #P-hard, and its difficulty has two sources. First, the calculation of the influence spread function is hard because there are exponentially many scenarios; this difficulty is overcome by sampling. Second, the seed selection is combinatorial in nature and requires the evaluation of an exponential number of choices; in the literature, this difficulty is overcome by resorting to heuristics. We describe the results of the seminal paper by Kempe et al. [21] and the subsequent developments in Sect. 2.

The majority of the existing work on optimization-based methods for social network analysis focuses on aspects other than influence maximization (see the review by Xanthopoulos et al. [55]), with the exception of some recent work [18, 42] that develops mathematical programming approaches to solve deterministic weighted target set selection problems, in which the total cost of influencing all nodes in a social network is minimized. Outside the influence maximization realm, the first class of problems studied is that of identifying the influential nodes of a network with respect to the nodes’ centrality and connectivity. As an example of this class of problems, Arulselvan et al. [3] propose an integer programming formulation for the problem of identifying k nodes whose removal from a deterministic social network causes maximum fragmentation (disconnected components). The second class is that of clustering the nodes of a deterministic social network to identify the cohesive subgroups of the network. For example, Balasundaram et al. [4] and Ertem et al. [16] utilize optimization models to identify clique relaxations. Third, game-theoretic approaches are used to study various aspects of social networks, such as modeling the competitive marketing strategies of two firms that aim to maximize their market shares (see, e.g., [7]).

In contrast to these models, we focus on stochastic influence maximization problems and propose a two-stage stochastic programming method. In addition, by utilizing the submodularity of the second-stage value (objective) function, we develop effective decomposition algorithms. Two-stage stochastic programming is a versatile modeling tool for decision-making under uncertainty. In the first stage, a set of decisions must be made while some parameters are still random. In the second stage, after the uncertain parameters are revealed, a second set of (recourse) decisions is made so that the expected total cost is minimized. We refer the reader to Birge and Louveaux [8] and Shapiro et al. [47] for an overview of stochastic (linear) programming. To the best of our knowledge, Song and Dinh [49] provide the only study besides ours that uses a stochastic programming approach to solve a problem in social networks. In that work, the authors consider the problem of protecting some arcs of a social network (subject to a limited budget) so that the damage caused by the spread of rumors from their sources to a set of targeted nodes is minimized.

1.2 Our contributions

Despite the ubiquity of social networks, there has been a paucity of research on finding provably optimal solutions to the two fundamental problems of maximizing influence in social networks (independent cascade and general threshold). The algorithms studied to date are approximation algorithms with a worst-case guarantee of 63% of the optimal value ([22], and references therein). The proposed heuristics are tested on real social networks and compared to other simple heuristics. However, their practical performance has not been tested against the optimal solution, due to the hardness of the problem and the unavailability of an algorithm that can find the optimal solution for large-scale instances of the problem. To fill this gap, we introduce a new class of problems that we refer to as two-stage stochastic submodular optimization models. We propose a delayed constraint generation algorithm to find the optimal solution to this class of problems with a finite number of samples. The proposed delayed constraint generation algorithm exploits the submodularity of the second-stage value function. The influence maximization problems of interest are special cases of this general problem class. Utilizing the special structure of the influence function, we give an explicit characterization of the cut coefficients of the submodular inequalities, and identify conditions under which they are facet-defining for the full master problem that is solved by delayed constraint generation. This leads to a more efficient implementation of the proposed algorithm than is available from a textbook implementation of existing algorithms for this class of problems [5, 35, 51]. In addition, we give the complete linear description of the master problem for \(k=1\), where k is the number of nodes targeted in a social network. We illustrate our proposed algorithm on the classical independent cascade and linear threshold problems [21]. In our computational study, we show that our algorithm outperforms a basic implementation of the greedy heuristic in most of the large-scale real-world instances.

We note that while we demonstrate our algorithms on the independent cascade and linear threshold models, our approach is more generally applicable to many other variants of the influence maximization problem studied previously in the literature. Furthermore, beyond social networks, there are other applications of identifying a few key nodes in complex networks for which our models are applicable. For example, Ostfeld and Salomons [40] consider the problem of locating costly sensors at the crucial junctions of a water distribution network to ensure water quality and safety through the early detection and prevention of outbreaks. The models could also be useful in the development of immunization strategies in epidemic models (see, e.g., [31]), and in the prevention of cascading failures in power systems (see, e.g., [19]). Furthermore, our approach also applies to more general stochastic optimization problems that have submodular second-stage value functions. For example, Contreras and Fernández [13] recently consider a deterministic hub location problem, and prove that the routing costs in the objective function are submodular. Using this observation, the authors employ the delayed constraint generation algorithm of Nemhauser and Wolsey [35] to solve the optimization problem more effectively than the existing models for this problem. Our proposed algorithm can be used to solve a stochastic extension of the hub location problem, where in the first stage, the hub locations are determined, and in the second stage, after the revelation of the uncertain demand of multiple commodities, the optimal routing decisions are made. Hence, the general two-stage stochastic submodular optimization model and method that we introduce in Sect. 3 have a potentially broader impact beyond social networks.

1.3 Outline

In Sect. 2, we formally introduce the influence maximization problem and review the greedy algorithm of Kempe et al. [21]. In Sect. 3, we define a general two-stage stochastic submodular optimization model, and describe a delayed constraint generation algorithm that exploits the submodularity of the second-stage value function. We show that for \(k=1\), solving a linear program with a simple set of submodular optimality cuts and the cardinality restriction on the seed set guarantees an integer optimal solution. In Sect. 4, we consider the two fundamental influence maximization problems as defined by Kempe et al. [21], namely independent cascade and linear threshold. We show that for these special cases of the two-stage stochastic submodular optimization problems, we can obtain an explicit form of the submodular optimality cuts and identify conditions under which they are facet defining. In Sect. 5, we report our computational experience with large-scale real-world datasets, which demonstrates the efficacy of the proposed approach in finding optimal solutions as compared to the greedy algorithm. We share our conclusions and future work in Sect. 6.

2 Greedy algorithm of Kempe et al. [21]

In this section, we describe the modeling assumptions of Kempe et al. [21], and overview the greedy hill-climbing algorithm proposed by these authors. Suppose that we are given a social network \(G=(V,A)\), where \(|V|=n, |A|=m\). The vertices represent the individuals, and an arc \((i,j)\in A\) represents a potential influence relationship between individuals i and j. Our goal is to select a subset of seed nodes, \(X\subset V\), with \(|X|\le k<n\) to activate initially, so that the expected number of people influenced by X (denoted by \(\sigma (X)\)) is maximized, where k is a given integer. (Note that the original problem statement is to select exactly k nodes to activate. However, for the relaxation that seeks \(|X|\le k\) seed nodes that maximize influence, there exists a solution for which the inequality holds at equality.) The influence propagation is assumed to be progressive, in other words, once a node is activated it remains active.

Kempe et al. [21] show that for various influence maximization problems, the influence function \(\sigma (X)\) is nonnegative, monotone and submodular. Therefore, the influence maximization problem involves the maximization of a submodular function. The authors show that this problem is NP-hard even if there is an efficient oracle to compute the influence spread function. However, using the results of Cornuéjols et al. [14] and Nemhauser et al. [36], which show that the greedy method gives a \((1-\frac{1}{e})\)-approximation algorithm for maximizing a nonnegative monotone submodular function, where e is the base of the natural logarithm, Kempe et al. [21] establish that the greedy hill-climbing algorithm solves the influence maximization problem with a constant factor (0.63) guarantee, assuming that the function \(\sigma (X)\) can be calculated efficiently. Recognizing the computational difficulty of calculating \(\sigma (X)\) exactly, which involves taking the expectation of the influence function with respect to a finite (but exponentially large) number of scenarios, Kempe et al. [21] propose Monte-Carlo sampling, which provides a subset of equiprobable scenarios, \(\Lambda \), of moderate size. Letting \(\sigma _\omega \) denote the influence function for scenario \(\omega \in \Lambda \), we get \(\sigma (X)=\frac{1}{|\Lambda |}\sum _{\omega \in \Lambda }\sigma _\omega (X)\). The basic greedy approximation algorithm of Kempe et al. [21] is given in Algorithm 1.

Subsequently, Wang et al. [53] formally show that calculating \(\sigma (X)\) is #P-hard under the assumption of independent arc probabilities \(\pi _{ij}, (i,j)\in A\). Therefore, Kempe et al. [22] propose a modification where an arbitrarily good approximation of \(\sigma (X)\) is obtained in polynomial time by sampling from the true distribution. In particular, Kempe et al. [22] show that for a sample size of \(\varOmega \left( \frac{n^2}{\varepsilon ^2}\ln (1/\alpha )\right) \), the average number of activated nodes over the sample is a \((1\,\pm \, \varepsilon )\)-approximation to \(\sigma (X)\), with probability at least \(1-\alpha \).

Algorithm 1 The greedy hill-climbing algorithm of Kempe et al. [21]
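
To make the steps of Algorithm 1 concrete, the following Python sketch runs the greedy hill-climbing heuristic on a collection of sampled scenarios, assuming each scenario is stored as a live-arc graph (an adjacency list) and that \(\sigma _\omega (X)\) is the number of nodes reachable from the seed set X, as formalized in Sect. 4. The data structures and function names are our own illustrative choices, not the implementation used in the experiments of Sect. 5.

# A minimal sketch of the greedy hill-climbing heuristic (Algorithm 1), assuming
# sampled scenarios are given as live-arc graphs (adjacency lists) and that
# sigma_omega(X) counts the nodes reachable from the seed set X (see Sect. 4).
from collections import deque

def reachable(adj, seeds):
    """Nodes reachable from `seeds` in a live-arc graph, including the seeds."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def greedy(nodes, scenarios, k):
    """Add k seeds one at a time, each maximizing the sample-average influence."""
    seeds = set()
    for _ in range(k):
        best_j, best_val = None, -1.0
        for j in set(nodes) - seeds:
            val = sum(len(reachable(adj, seeds | {j})) for adj in scenarios) / len(scenarios)
            if val > best_val:
                best_j, best_val = j, val
        seeds.add(best_j)
    return seeds

# Toy usage: two sampled live-arc graphs on nodes {1, 2, 3, 4}.
scenarios = [{1: [2, 3]}, {2: [3], 3: [4]}]
print(greedy([1, 2, 3, 4], scenarios, k=2))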

Further algorithmic improvements to the greedy heuristic are given in the literature (see [10, 22], for an overview). Most notably, Borgs et al. [9] give a randomized algorithm for finding a \((1-1/e-\epsilon )\)-approximate seed set in \(O((m+n)\epsilon ^{-3}\log n)\) time for any precision parameter \(\epsilon >0\). Note that this run time is independent of the number of seeds k. The authors show that the running time is close to the lower bound of \(\varOmega (m+n)\) on the time required to obtain a constant-factor randomized approximation algorithm. The proposed randomized algorithm has a success probability of 0.6, and failure is detectable. Therefore, the authors suggest repeated runs if failure is detected to improve the probability of success.

3 A general two-stage stochastic submodular optimization model and method

In this section, we define a general two-stage stochastic submodular optimization model and outline a delayed constraint generation algorithm for its solution. Then, in Sect. 4, we describe how this general model and method is applicable to the influence maximization problems of interest.

Let \((\Lambda , {\mathcal {F}}, \mathbb {P})\) be a finite probability space, where the probability of an elementary event \(\omega \in \Lambda \) is \(p_\omega :=\mathbb {P}(\omega )\). Consider a general two-stage stochastic binary program

$$\begin{aligned} \max ~~&c^\top x +\sum _{\omega \in \Lambda } p_\omega \sigma _\omega (x) \end{aligned}$$
(1a)
$$\begin{aligned} \text {s.t.}~~&x\in {\mathcal {X}} \end{aligned}$$
(1b)
$$\begin{aligned}&x\in \{0,1\}^n, \end{aligned}$$
(1c)

where \(c\in \mathbb {R}^n\) is a given objective vector, the set \({\mathcal {X}}\) represents the constraints on the first-stage variables x and \(\sigma _\omega (x)\) is the objective function of the second-stage problem for scenario \(\omega \in \Lambda \) solved as a function of first-stage decisions given by

$$\begin{aligned} \sigma _\omega (x):=\max ~~&q^\top y \end{aligned}$$
(2a)
$$\begin{aligned} \text {s.t.}~~&y\in {\mathcal {Y}}(x,\omega ). \end{aligned}$$
(2b)

Here q is an objective vector of conformable dimension, y is the vector of second-stage decisions, and \({\mathcal {Y}}(x,\omega )\) defines the set of feasible second-stage decisions for a given first-stage vector x and the realization of the uncertain outcomes given by the scenario \(\omega \in \Lambda \). We assume that \(\sigma _\omega (x):\{0,1\}^n\rightarrow \mathbb {R}\) is known to be a submodular function for each \(\omega \in \Lambda \), and refer to the optimization problem (1) as a two-stage stochastic submodular optimization model. It is well known from the properties of submodular functions that if \(\sigma _\omega (x)\) is submodular for each \(\omega \in \Lambda \), then so is the second-stage value function \(\sigma (x)= \sum _{\omega \in \Lambda } p_\omega \sigma _\omega (x)\), which is a nonnegative (convex) combination of submodular functions. Furthermore, we assume that \({\mathcal {Y}}(x,\omega )\) is a non-empty set for each \(x\in {\mathcal {X}}, \omega \in \Lambda \), a property known as relatively complete recourse in stochastic programming.

Next we overview a delayed constraint generation approach to solve the two-stage program (1). The generic master problem at an iteration is formulated as

$$\begin{aligned} \max ~~&c^\top x +\sum _{\omega \in \Lambda } p_\omega \theta _\omega \end{aligned}$$
(3a)
$$\begin{aligned} \text {s.t.}~~&x\in {\mathcal {X}} \end{aligned}$$
(3b)
$$\begin{aligned}&(x,\theta )\in {\mathcal {C}} , \end{aligned}$$
(3c)

where \(\theta \) is a \(|\Lambda |\)-dimensional vector of variables \(\theta _\omega \) representing the second-stage objective function approximation for scenario \(\omega \), and constraints (3c) represent the so-called optimality cuts generated up to the current iteration. The set of inequalities in \({\mathcal {C}}\) provides a piecewise linear approximation of the second-stage value function, which is iteratively refined through the addition of the optimality cuts. (We will describe different forms of these inequalities in the following discussion.) Let \(({\bar{x}},{\bar{\theta }})\) be the optimal solution to the master problem at the current iteration. Then for all \(\omega \in \Lambda \) we solve the subproblems (2) to obtain \(\sigma _\omega ({\bar{x}})\). We add valid optimality cuts to \({\mathcal {C}}\) if \({\bar{\theta }}_\omega >\sigma _\omega ({\bar{x}})\) for any \(\omega \in \Lambda \); otherwise we deduce that the current solution \({\bar{x}}\) is optimal. The generic version of the delayed constraint generation algorithm is given in Algorithm 2. In this algorithm, \(\varepsilon \) is a user-defined optimality tolerance. The particular implementation of Algorithm 2 depends on the method with which the subproblems are solved to obtain \(\sigma _\omega ({\bar{x}})\) (in line 5 of Algorithm 2), and the form of the optimality cuts added to the master problem (in line 7 of Algorithm 2). In this section, we explore the possibility of utilizing the submodularity of the second-stage value function in a two-stage stochastic programming problem. We discuss a natural alternative in “Appendix A”, which we use as a benchmark.

Algorithm 2 The generic delayed constraint generation algorithm

Nemhauser and Wolsey [35] give submodular inequalities to describe the maximum of a submodular set function (see also [37]). Consider the polyhedra \({\mathcal {S}}_\omega =\{(\theta _\omega ,x)\in \mathbb {R}\times \{0,1\}^n:\theta _\omega \le \sigma _\omega (S) +\sum _{j\in V{\setminus } S} \rho ^\omega _j(S) x_j, \forall S\subseteq V\}\), and \( {\mathcal {S}}'_\omega =\{(\theta _\omega ,x)\in \mathbb {R}\times \{0,1\}^n:\theta _\omega \le \sigma _\omega (S) -\sum _{j\in S} \rho ^\omega _j(V{\setminus }\{j\})(1-x_j) +\sum _{j\in V{\setminus } S} \rho ^\omega _j(S) x_j, \forall S\subseteq V\}\) for \(\omega \in \Lambda \), where \(\rho _j^\omega (S)=\sigma _\omega (S\cup \{j\})-\sigma _\omega (S)\) is the marginal contribution of adding \(j\in V{\setminus } S\) to the set S.

Theorem 3.1

(cf. [35]) For a submodular and nondecreasing set function \(\sigma _\omega :2^n\rightarrow \mathbb {R}\), \({\bar{X}}\), with a characteristic vector \({\bar{x}}\), is an optimal solution to \(\max _{S\subseteq V: |S|\le k} \{\sigma _\omega (S)\}\), if and only if \((\theta _\omega , {\bar{x}})\) is an optimal solution to \(\{\max \ \theta _\omega : \sum _{j\in V} x_j\le k, {(\theta _\omega ,x)\in {\mathcal {S}}_\omega } \}\). Similarly for a submodular and nonmonotone set function \(\sigma _\omega :2^n\rightarrow \mathbb {R}\), \({\bar{X}}\), with a characteristic vector \({\bar{x}}\), is an optimal solution to \(\max _{S\subseteq V: |S|\le k} \{\sigma _\omega (S)\}\), if and only if \((\theta _\omega , {\bar{x}})\) is an optimal solution to \(\{\max \ \theta _\omega : \sum _{j\in V} x_j\le k, {(\theta _\omega ,x)\in \mathcal {S}'_\omega } \}\).

Therefore, we can adapt the algorithm of Nemhauser and Wolsey [37], given for deterministic submodular maximization problems, to two-stage stochastic submodular optimization problems. Note that because there are exponentially many submodular inequalities, we cannot add all of them to the formulation a priori. Instead, we use a delayed constraint generation approach that adds the violated inequalities as needed and solves the resulting mixed-integer program by branch-and-cut (see, e.g., [6] for an overview of the general form of a delayed constraint generation algorithm for linear programs). The proposed method takes the form of Algorithm 2. For a given first-stage solution \({\bar{x}}\), which is the characteristic vector of the set \({\bar{X}}\), and scenario \(\omega \in \Lambda \), we use the optimality cut

$$\begin{aligned} \theta _\omega \le \sigma _\omega ({\bar{X}}) +\sum _{j\in V{\setminus } {\bar{X}}} \rho ^\omega _j({\bar{X}}) x_j, \end{aligned}$$
(4)

if the second-stage value function \(\sigma _\omega (x)\) is nondecreasing and submodular. If the second-stage value function \(\sigma _\omega (x)\) is nonmonotone and submodular, then we use the optimality cut given by the inequality

$$\begin{aligned} \theta _\omega \le \sigma _\omega ({\bar{X}}) -\sum _{j\in {\bar{X}}} \rho ^\omega _j(V{\setminus }\{j\})(1-x_j) +\sum _{j\in V{\setminus } {\bar{X}}} \rho ^\omega _j({\bar{X}}) x_j. \end{aligned}$$
(5)

We refer the reader to Nemhauser and Wolsey [35] for the validity of inequalities (4) and (5). Ahmed and Atamtürk [1] and Yu and Ahmed [56] strengthen the submodular inequalities by lifting, under the condition that the submodular utility function is strictly concave, increasing, and differentiable; these assumptions do not apply to our problem.

Corollary 3.1

Algorithm 2 with optimality cuts (4) and (5) converges to an optimal solution in finitely many iterations for a two-stage stochastic program with binary first-stage decisions, \(x\in \{0,1\}^{|V|}\), for which the second-stage value function \(\sigma _\omega (x), \omega \in \Lambda \) (\(|\Lambda |\) finite), is submodular and nondecreasing, or submodular and nonmonotone, respectively.

Proof

The result follows from the fact that the number of feasible first stage solutions is finite, and from Theorem 3.1. \(\square \)

Note that Algorithm 2 is generally applicable to two-stage stochastic programs with binary first-stage decisions, \(x\in \{0,1\}^{n}\), where the second-stage value function \(\sigma _\omega (x)\) is submodular for all \(\omega \in \Lambda \). There is very limited reporting on the computational performance of this algorithm, even for the deterministic submodular maximization problems for which the method was originally developed (see [13, 26] for computational results on hub location and quadratic cost partition problems, respectively). Kawahara et al. [20] utilize the algorithm of Nemhauser and Wolsey [35] along with convexity cuts derived from the Lovász extension of submodular functions to solve deterministic submodular maximization problems. The authors derive inequalities that are not globally valid, as they cut off solutions that do not strictly improve upon the current incumbent solution. To the best of our knowledge, our work is the first adaptation and testing of this algorithm for two-stage stochastic optimization. While the submodular inequalities (4)-(5) are implicit in that they require the calculation of the \(\rho ^\omega _j(\cdot )\) terms, in Sect. 4, we give an explicit form of the submodular optimality cuts for the influence maximization problems of interest. This allows us to characterize conditions under which the optimality cuts are strong, and to improve the performance of a textbook implementation of the algorithm of Nemhauser and Wolsey [35].
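
To make the generic scheme concrete, the following Python sketch implements Algorithm 2 with the submodular optimality cuts (4) for a cardinality-constrained first stage. It is an illustration under simplifying assumptions: the master problem is solved here by brute-force enumeration over seed sets (and is initialized with the cut at \(S=\emptyset \) so that it is bounded), whereas the implementation reported in Sect. 5 solves the master problem as a mixed-integer program with CPLEX. All names and data structures are hypothetical.

# A minimal, solver-free sketch of Algorithm 2 with cuts (4) for |X| <= k.
# The master problem is solved by enumeration for illustration only; a MIP
# solver is used in practice.
from itertools import combinations

def submodular_cut(sigma, nodes, S):
    """Cut (4) at S: theta <= sigma(S) + sum_{j not in S} rho_j(S) * x_j."""
    const = sigma(S)
    coeffs = {j: sigma(S | {j}) - const for j in nodes if j not in S}
    return const, coeffs

def solve_master(nodes, k, cuts, probs):
    """Maximize sum_w p_w * theta_w over |X| <= k, with theta_w capped by its cuts."""
    best = (-float("inf"), None, None)
    for size in range(k + 1):
        for X in map(set, combinations(nodes, size)):
            thetas = [min(c + sum(a.get(j, 0.0) for j in X) for c, a in cuts_w)
                      for cuts_w in cuts]
            obj = sum(p * t for p, t in zip(probs, thetas))
            if obj > best[0]:
                best = (obj, X, thetas)
    return best

def dcg(nodes, sigmas, probs, k, eps=1e-6):
    """Delayed constraint generation (Algorithm 2) with submodular cuts (4)."""
    nodes = set(nodes)
    # Start every scenario with the cut at S = emptyset so the master is bounded.
    cuts = [[submodular_cut(s, nodes, set())] for s in sigmas]
    while True:
        _, X, thetas = solve_master(nodes, k, cuts, probs)
        violated = False
        for w, s in enumerate(sigmas):
            if thetas[w] > s(X) + eps:          # theta_w overestimates sigma_w(X)
                cuts[w].append(submodular_cut(s, nodes, X))
                violated = True
        if not violated:
            return X

# Toy usage: two coverage-type (hence submodular, nondecreasing) scenario functions.
cover = [{1: {1, 2}, 2: {2, 3}, 3: {3}}, {1: {1}, 2: {2, 3}, 3: {1, 3}}]
sigmas = [lambda S, c=c: len(set().union(*(c[j] for j in S))) for c in cover]
print(dcg([1, 2, 3], sigmas, probs=[0.5, 0.5], k=1))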

Next, we consider the special case of the cardinality-constrained first-stage problem (1), i.e., \({\mathcal {X}}:=\{x\in \{0,1\}^n: \sum _{j\in V}x_j\le k\}\), when \(k=1\). It is easy to see that in this case, the greedy algorithm is optimal. Note also that for fixed k, the problem is polynomially solvable (with respect to the input size given by the numbers of nodes, arcs and scenarios), because it involves evaluating the functions \(\sigma _\omega (X), \omega \in \Lambda \), at \(O(n^k)\) candidate seed sets X. Observe that, without loss of generality, we can assume that \(p_\omega >0\) for all \(\omega \in \Lambda \) (otherwise, we can ignore scenario \(\omega \)), and that \(\sigma _\omega (\emptyset )=0\) (otherwise, we can add a constant to the influence function). Furthermore, because \(\sigma _\omega (\cdot )\) is submodular, \(\rho _j^\omega (\emptyset )\ge \rho _j^\omega (S)\) for any \(S\subseteq V, S\ne \emptyset \) and \(j\in V{\setminus } S\). As a result, if \(\rho _j^\omega (\emptyset )<0\), then \(x_j=0\) in any optimal solution. Therefore, without loss of generality, we can assume that \(\rho ^\omega _j(\emptyset )\ge 0\) for all \(j\in V, \omega \in \Lambda \).

Proposition 3.1

For submodular functions \(\sigma _\omega (x), \omega \in \Lambda \) with \(\rho ^\omega _j(\emptyset )>0\) for all \(j\in V, \omega \in \Lambda \), and \({\mathcal {X}}:=\{x\in \{0,1\}^n: \sum _{j\in V}x_j\le 1\}\), adding the submodular optimality cut (4) with \({\bar{X}} = \emptyset \) to the linear programming (LP) relaxation of the master problem (3) for each \(\omega \in \Lambda \) is sufficient to give the (integer) optimal solution \(x^*\).

Proof

First, note that for \({\bar{X}}=\emptyset \), inequalities (4) and (5) are equivalent. Under the given assumptions, in an optimal solution \(x\ne 0\), the right-hand side of (4) is positive for each \(\omega \in \Lambda \). Therefore, the decision variables \(\theta _\omega > 0, \omega \in \Lambda \), are basic variables at an extreme point optimal solution of the LP relaxation of the master problem (3). This gives us \(|\Lambda |\) basic variables, and the number of constraints is \(|\Lambda |+1\). Hence, only one decision variable \(x_j\) for some \(j\in V\) can be basic, and it is equal to 1 [due to constraint (6)], and \(\theta _\omega =\rho _j^\omega (\emptyset )=\sigma _\omega (\{j\})\). Furthermore, this is the optimal solution to the master problem for the case \(k=1\). \(\square \)
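
As a small numerical illustration of Proposition 3.1 (a sketch assuming SciPy is available, with made-up marginal values \(\rho ^\omega _j(\emptyset )\) that are not taken from any instance in this paper), the following code solves the LP relaxation of the master problem for \(k=1\) with only the cuts (4) at \({\bar{X}}=\emptyset \) and the cardinality constraint; the returned solution is integral, as the proposition guarantees.

# Illustrative check of Proposition 3.1 for k = 1: solve the LP relaxation of the
# master problem (3) with only the cuts (4) at X_bar = emptyset and the cardinality
# constraint, and observe an integral optimal solution.  The rho values below are
# made up; variables are ordered as (x_1, x_2, x_3, theta_1, theta_2).
import numpy as np
from scipy.optimize import linprog

p = [0.5, 0.5]                       # scenario probabilities
rho = [[5, 4, 4],                    # rho_j^1(emptyset), j = 1, 2, 3
       [1, 6, 2]]                    # rho_j^2(emptyset)

c = np.array([0, 0, 0, -p[0], -p[1]])             # maximize sum_w p_w theta_w
A_ub = np.array([
    [-rho[0][0], -rho[0][1], -rho[0][2], 1, 0],   # theta_1 <= sum_j rho_j^1 x_j
    [-rho[1][0], -rho[1][1], -rho[1][2], 0, 1],   # theta_2 <= sum_j rho_j^2 x_j
    [1, 1, 1, 0, 0],                              # sum_j x_j <= k = 1
])
b_ub = np.array([0, 0, 1])
bounds = [(0, 1)] * 3 + [(0, None)] * 2           # LP relaxation of x in {0,1}^3

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(res.x)   # approx. [0, 1, 0, 4, 6]: the integral seed x_2 = 1 is optimal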

4 Application to the stochastic influence maximization problem

In this section, we specify how the general algorithm we propose for two-stage stochastic programs with submodular second-stage value functions applies to the influence maximization problems of interest. Kempe et al. [21] observe that even though the stochastic diffusion process of influence spread is dynamic, the decisions of whom to activate do not affect the probability of one individual influencing another; hence, we may view the process as static and ignore the time aspect. In other words, we can generate sample paths (scenarios) of likely events for each arc, a priori. As a result, the decision-making process considered by Kempe et al. [21] may be viewed as a two-stage stochastic program. In the first stage, the nodes to be activated are determined. The uncertainty, represented by a finite collection of scenarios, \(\Lambda \), is revealed with respect to how the influence spreads in the network. For each scenario \(\omega \in \Lambda \), with associated probability \(p_\omega \), let the influence spread given the initial seed set X be given by \( \sigma _\omega (X): = |\{j \in V: \exists \text { a path from } i \text { to } j \text { in } G_\omega \text { for some } i \in X \}|\), i.e., \( \sigma _\omega (X)\) is the number of vertices reachable from X in the scenario graph \(G_\omega \) (constructed as described below). As a result, the expected total influence spread of the initial seed set X is given by \(\sigma (X)=\sum _{\omega \in \Lambda } p_\omega \sigma _\omega (X)\). Let \(x\in \{0,1\}^n\) be the characteristic vector of \(X\subset V\). Where appropriate, we use \(\sigma (x)\) interchangeably with \(\sigma (X)\).

As observed by Kempe et al. [21], the influence function \(\sigma _\omega (X)\) is submodular and monotone (nondecreasing) for various influence maximization problems. Then the two-stage stochastic programming formulation of the classical influence maximization problem is given by (1), where \(c_j=0\) for all \(j\in V\) and the set \({\mathcal {X}}\) defines the cardinality constraint on the number of seed nodes given by

$$\begin{aligned} \sum _{j\in V} x_j \le k, \end{aligned}$$
(6)

for a given \(0<k<|V|\). Therefore, Algorithm 2 can be used to solve the influence maximization problem. Furthermore, note that the influence functions of interest in this paper satisfy the assumption \(\rho ^\omega _j(\emptyset )>0\) for all \(j\in V, \omega \in \Lambda \), because influencing only node j contributes at least one node (itself) to the influence function. In addition, the first-stage problem is cardinality-constrained. Hence Proposition 3.1 applies to the influence functions considered in this paper.

To model the stochastic diffusion process and calculate the influence spread function, Kempe et al. [21] introduce a technique that generates a finite set, \(\Lambda \), of sample paths (scenarios) by tossing biased coins. The coin tosses reveal, a priori, which influence arcs are active (live). A live arc \((i,j)\) indicates that if node i is influenced during the influence propagation process, then node j is influenced by it. For each scenario \(\omega \in \Lambda \), with a probability of occurrence \(p_\omega \), a so-called live-arc graph \(G_\omega =(V,A_\omega )\) is constructed, where \(A_\omega \) is the set of live arcs under scenario \(\omega \). Then the influence spread under scenario \(\omega \in \Lambda \) is calculated to obtain \(\sigma _\omega (X) \). Hence, the expected influence spread function is given by \(\sigma (X)=\sum _{\omega \in \Lambda }p_\omega \sigma _\omega (X)\). This is referred to as the “triggering model” or the “triggering set technique” by Kempe et al. [22]. The authors show the equivalence of the stochastic diffusion process of the two fundamental influence maximization problems to the live-arc graph model with respect to the final active set. In addition, Kempe et al. [21] show that the influence spread in a live-arc graph representable problem is monotone and submodular under the given assumptions. As a result, our stochastic programming method applies to such problems. Next we describe the two fundamental influence maximization problems that are live-arc representable.

  • Independent Cascade Model In the independent cascade model of Kempe et al. [21], it is assumed that each arc \((i,j)\in A\) of the social network \(G=(V,A)\) has an associated probability of success, \(\pi _{ij}\). In other words, with probability \(\pi _{ij}\) individual i will be successful at influencing individual j; we say that the arc \((i,j)\) is active or live in this case. We generate a sample path (scenario) by tossing a biased coin (with probability \(\pi _{ij}\)) for each arc \((i,j)\in A\) to determine whether the arc is live, and thereby construct the live-arc graph. Because each arc influence probability is independent, and does not depend on which nodes are influenced, Kempe et al. [21] show that the influence maximization problem is equivalent to maximizing the expected influence function in the live-arc graph model. (A sampling sketch for both live-arc constructions is given after this list.)

  • Linear Threshold Model In the linear threshold model of Kempe et al. [21], each arc \((i,j)\) in the social network \(G=(V,A)\) has a deterministic weight \(0\le w_{ij}\le 1\), such that for all nodes \(j\in V\), \(\sum _{i:(i,j)\in A}w_{ij}\le 1\). In addition, each node \(j\in V\) selects a threshold \(\nu _j\) uniformly at random between 0 and 1. A node j is activated if the sum of the weights of its active in-neighbors reaches its threshold, i.e., \(\sum _{i:(i,j)\in A} w_{ij}x_i\ge \nu _j\). Given the set of initial seed nodes, \({\bar{X}}\), the activated nodes in the set U at time t influence their inactive neighbor j at time \(t+1\) if \(\sum _{u\in U} w_{uj} \ge \nu _j \). Kempe et al. [21] show that the linear threshold model also has an equivalent live-arc graph representation, where every node has at most one incoming live arc. Each node \(j \in V\) selects at most one incoming live arc \((i,j)\), with probability \(w_{ij}\), or it selects no arc with probability \(1-\sum _{i:(i,j)\in A}w_{ij}\). Given the seed set \({\bar{X}}\), Kempe et al. [21] prove that the following two are equivalent:

    (i) The distribution of active nodes computed by executing the linear threshold model with starting seed set \({\bar{X}}\), and

    (ii) the distribution of nodes reachable from \({\bar{X}}\) in the live-arc graph representation of the linear threshold model defined above.
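
The following Python sketch (with assumed data structures, not the generation code used in Sect. 5) samples one live-arc graph for each of the two models: for the independent cascade model every arc \((i,j)\) is declared live independently with probability \(\pi _{ij}\), while for the linear threshold model every node keeps at most one incoming live arc, chosen with probabilities \(w_{ij}\).

# Sampling one live-arc graph G_omega per model (triggering-set technique).
# `arcs` maps (i, j) to pi_ij for the independent cascade model and to w_ij for
# the linear threshold model (with sum_i w_ij <= 1 for every j).  Sketch only.
import random

def sample_independent_cascade(arcs):
    """Each arc (i, j) is live independently with probability pi_ij."""
    return [(i, j) for (i, j), pi in arcs.items() if random.random() < pi]

def sample_linear_threshold(arcs):
    """Each node keeps at most one incoming live arc (i, j), chosen w.p. w_ij."""
    incoming = {}
    for (i, j), w in arcs.items():
        incoming.setdefault(j, []).append((i, w))
    live = []
    for j, cands in incoming.items():
        u = random.random()
        acc = 0.0
        for i, w in cands:            # with prob. 1 - sum_i w_ij no arc is kept
            acc += w
            if u < acc:
                live.append((i, j))
                break
    return live

arcs = {(1, 2): 0.3, (1, 3): 0.5, (2, 3): 0.4}
print(sample_independent_cascade(arcs), sample_linear_threshold(arcs))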

Next, we demonstrate how the proposed algorithm (Algorithm 2) can be applied to influence maximization problems that have a live-arc graph representation. Subsequently, we give extensions in which the proposed algorithm applies to models that are not live-arc graph representable. In such models, the form of the cuts changes, but as long as the influence spread function is submodular, the proposed algorithm applies.

4.1 Exploiting the submodularity of the second-stage value function for live-arc graph models

Utilizing Theorem 3.1, we give an explicit description of the submodular inequalities for the influence maximization problems that have live-arc graph representations. We say that a node j is reachable from a set of nodes S in scenario \(\omega \in \Lambda \) if there exists a node \(i\in S\) such that there is a directed path from i to j in the graph \(G_\omega =(V, A_\omega )\). It is well known that reachability can be checked in linear time with respect to the number of arcs using depth- or breadth-first search. For \(S\subseteq V\) and \(\omega \in \Lambda \), let R(S) be the set of nodes reachable from the nodes in S, excluding the nodes of S themselves, and let \({\bar{R}}(S)\) be the set of nodes that are not reachable from the nodes in S in the graph \(G_\omega =(V, A_\omega )\).

Proposition 4.1

For \(S\subseteq V\) and \(\omega \in \Lambda \) the inequality

$$\begin{aligned} \theta _\omega \le \sigma _\omega (S)+\sum _{j\in {\bar{R}}(S)} r_j^\omega (S) x_j, \end{aligned}$$
(7)

is a valid optimality cut for the master problem (3), where \(r_j^\omega (S)\) is the number of nodes reachable from \(j\in {\bar{R}}(S)\) (including j) that are not reachable from any node in S in \(G_\omega \).

Proof

From Theorem 3.1, we know that \(\theta _\omega \le \sigma _\omega (S) +\sum _{j\in V{\setminus } S} \rho ^\omega _j(S) x_j\) is a valid inequality. Note that \({\bar{R}}(S)\subseteq V{\setminus } S\) and for \(j\in {\bar{R}}(S)\), we have \(\rho ^\omega _j(S)=r_j^\omega (S)\), in other words, the marginal contribution of adding \(j\in {\bar{R}}(S)\) to S is precisely \(r_j^\omega (S)\). Furthermore, for any node \(j\in R(S)\), the marginal contribution of adding j to S is zero, because j is already reachable from at least one node in S. This completes the proof. \(\square \)

We refer to the cuts in the form of (7) as submodular optimality cuts. Note that to obtain an inequality (7) for a given \(G_\omega \) and S, we need to solve multiple reachability problems on the same graph, where the time complexity of a single reachability problem solved by depth- or breadth-first search is \({\mathcal {O}}(|A_\omega |)\). We first compute \(\sigma _\omega (S)\) by solving a reachability problem on \(G_\omega \) and mark all nodes reachable from S. Then, for each \(j\in {\bar{R}}(S) {\setminus } S\), we compute \(r_j^\omega (S)\) by solving another reachability problem, in which we count the number of unmarked nodes reachable from j. Hence the overall complexity of generating an inequality (7) is \(\mathcal {O}( |A_\omega | \times |{\bar{R}}(S) {\setminus } S|)\).
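
The procedure just described can be implemented directly with breadth-first search; the sketch below (illustrative only, with an assumed adjacency-list representation of \(G_\omega \)) first marks the nodes reachable from S and then, for every unmarked node j, counts the unmarked nodes reachable from j to obtain \(r_j^\omega (S)\).

# Generating a submodular optimality cut (7) for one scenario: compute
# sigma_omega(S) and the coefficients r_j^omega(S) by repeated breadth-first
# searches on the live-arc graph G_omega (adjacency list).  Illustrative sketch.
from collections import deque

def reachable(adj, sources):
    seen, queue = set(sources), deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def submodular_cut_7(nodes, adj, S):
    """Return (sigma_omega(S), {j: r_j^omega(S)}) defining cut (7)."""
    marked = reachable(adj, S)                      # nodes reachable from S, incl. S
    sigma_S = len(marked)
    coeffs = {}
    for j in set(nodes) - marked:                   # j in R_bar(S) \ S
        coeffs[j] = len(reachable(adj, {j}) - marked)   # unmarked nodes reachable from j
    return sigma_S, coeffs

# Toy live-arc graph on 5 nodes and the cut for S = {1}.
adj = {1: [2], 3: [4], 4: [5]}
print(submodular_cut_7([1, 2, 3, 4, 5], adj, {1}))   # (2, {3: 3, 4: 2, 5: 1})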

Next we give conditions under which inequalities (7) are facet defining for conv(\({\mathcal {S}}_\omega \)).

4.2 Strength of the submodular inequalities

For \(i\in V\), let indeg(i) and outdeg(i) denote the in-degree and out-degree of node i, respectively. Let \(T:=\{i\in V: \text {indeg}(i)=0\}\); we refer to the nodes in T as root nodes. For \(i\in V{\setminus } T\), let \(P_i\) be the set of root nodes from which i is reachable, i.e., \(P_i:=\{j\in T: i\in R(\{j\})\}\). Finally, let \(L:=\{i\in V: \text {indeg}(i)>0,\text {outdeg}(i)=0\}\) denote the set of leaf nodes, which have no outgoing arcs.

First, note that the submodular inequality (7) for a set S is equivalent to that for the set \(S\cup R(S)=:{\hat{R}}(S)\), because \(\sigma _\omega (S)=\sigma _\omega ({\hat{R}}(S))\), \({\bar{R}}(S)={\bar{R}}({\hat{R}}(S))\), \(r_j^\omega (S)=r_j^\omega ({\hat{R}}(S))\) for all \(j\in {\bar{R}}(S)\) and \(\rho _j^\omega (S)=0\) for \(j\in R(S)\). Therefore, in what follows, without loss of generality, we assume that for all non-leaf nodes \(i\in S{\setminus } L\), we have \(R(\{i\})\subseteq S\) (Assumption A1).

Proposition 4.2

For \(S\subseteq V\) and \(\omega \in \Lambda \), the submodular inequality (7) is facet defining for conv(\({\mathcal {S}}_\omega \)) only if the following conditions hold:

  (i) if \(i \in S\), then \(i\not \in T\),

  (ii) there exists \(T'\subseteq T\) with \(|T'|<k\) such that \(S\subseteq R(T')\).

These conditions are also sufficient

  (i) if \(S=\emptyset \) (for any \(k\ge 1\)), or

  (ii) if \(|S|=1\) for \(k\ge 2\).

Proof

Necessity.

  (i)

    Suppose, for contradiction, that there exists \(i \in S \cap T\). Now consider the submodular inequality (7) for the set \(S'=S{\setminus }\{i\}\) given by

    $$\begin{aligned} \theta _\omega \le \sigma _\omega (S')+\sum _{j\in {\bar{R}}(S')} r_j^\omega (S') x_j=\sigma _\omega (S)-1+x_i+\sum _{j\in {\bar{R}}(S)} r_j^\omega (S) x_j, \end{aligned}$$
    (8)

    which follows because the set of all descendants of i, \(R(\{i\})\), is contained in S by Assumption A1, so removing i reduces the influence function by exactly 1 (recall that, by the contradictory assumption, \(i\in T\); hence its in-degree is 0 and it is not influenced by any other node in the graph), and the set of nodes not reachable from \(S'\) is given by \({\bar{R}}(S')={\bar{R}}(S)\cup \{i\}\), and hence the coefficients \(r_j^\omega (S') =r_j^\omega (S)\) for \(j\in {\bar{R}}(S)\), and \(r_i^\omega (S') =1\). Because \(x_i\le 1\), inequality (8) dominates the submodular inequality (7) for this choice of S. Hence, the submodular inequality for a set S such that there exists \(i \in S \cap T\) is not facet defining for conv(\({\mathcal {S}}_\omega \)).

  (ii)

    Suppose, for contradiction, that there does not exist \(T'\subseteq T\) with \(|T'|<k\) such that \(S\subseteq R(T')\). In other words, the minimum cardinality of root nodes \(T'\subseteq T\) such that \(S\subseteq R(T')\) is greater than or equal to k. In this case, consider the set \({\hat{S}}:=\{i\in S: \not \exists j\in S \text { with } i\in R(\{j\})\}\), in other words, \({\hat{S}}\) is the set of nodes in the graph induced by S that have no incoming arcs from other nodes in S. Note that from condition (i), we know that \({\hat{S}}\cap T=\emptyset \). Then, by the contradictory assumption, there exist at least k nodes, say nodes \(1,\ldots ,k\in {\hat{S}}\) such that \(P_i \cap P_j=\emptyset \) for all pairs \(i,j\in \{1,\ldots ,k\}, i\ne j\). Now consider the submodular inequality (7) for the set \(S'=S{\setminus }\{1,\ldots ,k\}\) given by

    $$\begin{aligned} \theta _\omega \le \sigma _\omega (S')+\sum _{j\in {\bar{R}}(S')} r_j^\omega (S') x_j=\sigma _\omega (S)-k+\sum _{j\in {\bar{R}}(S)} r_j^\omega (S) x_j + \sum _{i=1}^k \sum _{j\in {\hat{R}}(P_i){\setminus } R(\{i\})}x_j , \end{aligned}$$
    (9)

    which follows because the set of all descendants of \(i\in \{1,\ldots ,k\}\), \(R(\{i\})\), is contained in S by Assumption A1, so removing nodes \(i=1,\ldots ,k\) reduces the influence function by exactly k, and the set of nodes not reachable from \(S'\) is given by \({\bar{R}}(S')={\bar{R}}(S)\cup \{1,\ldots ,k\}\). In addition, the coefficients \(r_j^\omega (S') =r_j^\omega (S)\) for \(j\in {\bar{R}}(S)\) such that \(j\not \in \cup _{i=1}^k\left( {\hat{R}}(P_i){\setminus } R(\{i\})\right) \), \(r_j^\omega (S') =r_j^\omega (S)+1\) for \(j\in {\bar{R}}(S)\) such that \(j \in \cup _{i=1}^k\left( {\hat{R}}(P_i){\setminus } R(\{i\})\right) \), and \(r_i^\omega (S') =1\) for \(i=1,\ldots ,k\). Because \( \sum _{i=1}^k \sum _{j\in {\hat{R}}(P_i){\setminus } R(\{i\})}x_j\le \sum _{j\in V} x_j \le k\), inequality (9) dominates the submodular inequality (7) for this choice of S. Hence, there must exist \(T'\subseteq T\) with \(|T'|<k\) such that \(S\subseteq R(T')\) for the submodular inequality (7) to be facet defining for conv(\({\mathcal {S}}_\omega \)).

Sufficiency. First, note that for \(\omega \in \Lambda \), dim\(({\mathcal {S}}_\omega )=n+1\). Let \({\mathbf {e}}_i\) be a unit vector of dimension n whose ith component is 1, and other components are zero.

  (i)

    Note that when \(S=\emptyset \), the necessity conditions are trivially satisfied. Consider the \(n+1\) affinely independent points: \((\theta _\omega ,x)^0={\mathbf {0}}\), and \((\theta _\omega ,x)^i=(\sigma _\omega (\{i\}), {\mathbf {e}}_i)\), for \(i\in V\). These points are on the face defined by the inequality (7) for \(S=\emptyset \). Hence inequality (7) for \(S=\emptyset \) is facet-defining for conv(\({\mathcal {S}}_\omega \)).

  (ii)

    Note that for \(|S|=1\), the necessity conditions imply that \(S:=\{j\}\) for some \(j\in L\). Consider the \(n+1\) affinely independent points: \((\theta _\omega ,x)^0=(\sigma _\omega (\{j\}), {\mathbf {0}})\); \((\theta _\omega ,x)^j=(\sigma _\omega (\{j\}), {\mathbf {e}}_j)\); and \((\theta _\omega ,x)^i=(\sigma _\omega (\{i,j\}), {\mathbf {e}}_j+{\mathbf {e}}_i)\), for \(i\in V{\setminus }\{j\}\). The last set of points is feasible because we have \(k\ge 2\) in this case. These points are on the face defined by the inequality (7) for \(S=\{j\}\). Hence inequality (7) for \(S=\{j\}\) is facet-defining for conv(\({\mathcal {S}}_\omega \)).\(\square \)

Note that during the course of the algorithm, if a submodular inequality (7) corresponding to the seed set S does not satisfy the necessary conditions given in Proposition 4.2, then a stronger inequality can be constructed using the arguments in the proof of the proposition.

From Proposition 4.2 we see that inequalities (7) with \(S=\emptyset \) are facets of conv(\({\mathcal {S}}_\omega \)) for any \(k\ge 1\). We will also see their importance in our computational study. Similarly, inequalities (7) with \(|S|=1\) are facets of conv(\({\mathcal {S}}_\omega \)) for any \(k\ge 2\). We note that more conditions are necessary for the inequalities (7) with \(|S|=2\) to be facets of conv(\({\mathcal {S}}_\omega \)). We illustrate this in the next example.

Example 4.1

Consider the network in Fig. 1 for a given scenario \(\omega \in \Lambda \) and let \(k=2\). From Proposition 4.2, inequalities (7) with \(S=\emptyset \), and inequalities (7) with \(S=\{j\}\), for \(j=4,\ldots ,9\) are facet-defining for conv(\({\mathcal {S}}_\omega \)). Inequalities (7) with \(S=\{7,8\}\) or \(S=\{5,6\}\) are facets of conv(\({\mathcal {S}}_\omega \)); each of these sets satisfies the necessary facet conditions in Proposition 4.2, which for these choices of S also turn out to be sufficient. However, the sets \(S=\{7,9\}\) or \(S=\{4,5\}\) satisfy the necessary facet conditions in Proposition 4.2, but they do not lead to facet-defining inequalities for \(k=2\). Finally, \(S=\{4,7\}\) violates the necessity condition (ii) of Proposition 4.2 (the minimum number of root nodes that can influence 4 and 7 is \(2=k\)) and is not a facet.

Fig. 1 Network with 9 nodes and 10 arcs with equal influence probabilities p

It is important to note that in the direct adaptation of the delayed constraint generation algorithm proposed by Nemhauser and Wolsey [37] to our problem, for a given solution \({\bar{x}}\) to the current master problem, one would use the submodular inequalities (7) with \(S=\{i\in V: {\bar{x}}_i=1\}=:{\bar{X}}\). From Corollary 3.1, Algorithm 2 with optimality cuts (7) for \(S={\bar{X}}\) converges to an optimal solution in finitely many iterations for two-stage stochastic submodular maximization problems. However, note that at any iteration, the solution to the master problem, \({\bar{x}}\), will be such that \(\sum _{j\in V} {\bar{x}}_j=k\). Therefore, all submodular optimality cuts (7) added will have \(|S|=k\), and no facets with \(S=\emptyset \) will be added for \(k\ge 1\). This may lead to slow convergence of the delayed constraint generation algorithm with submodular optimality cuts for \(S={\bar{X}}\), which we illustrate next.

Example 4.1

(Continued.) Consider the network in Fig. 1, and suppose that \(|\Lambda |=1\) (hence we consider a deterministic problem) and \(k=1\). Adding the submodular optimality cut (7) for \(S=\emptyset \), namely \(\theta _1\le 5x_1+4x_2+4x_3+\sum _{j=4}^9 x_j\), to the cardinality constraint and solving the linear programming relaxation of the master problem yields the integer optimal solution \({\bar{x}}_1=1, \theta _1=5\) at the first iteration (from Proposition 3.1). In contrast, solving the master problem without any optimality cuts (i.e., with just the cardinality constraint) may lead to an initial solution of \({\bar{x}}_2=1\). Then the following submodular cuts are added, in that order, during the course of the algorithm of Nemhauser and Wolsey [37]:

$$\begin{aligned} \theta _1&\le 4+3x_1+4x_3+\sum _{j=7}^9 x_j \quad (S={\bar{X}}=\{2\}), \end{aligned}$$
(10)
$$\begin{aligned} \theta _1&\le 4+3x_1+4x_2+\sum _{j=4}^6 x_j \quad (S={\bar{X}}=\{3\}), \end{aligned}$$
(11)
$$\begin{aligned} \theta _1&\le 5+2x_2+2x_3+x_4+x_9 \quad (S={\bar{X}}=\{1\}). \end{aligned}$$
(12)

Furthermore, none of the inequalities (10)-(12) are facet-defining. Solving the LP relaxation of the master problem with the optimality cuts (10)-(12) leads to a fractional solution: \({\bar{x}}_2={\bar{x}}_3=0.5\). This small example highlights that a textbook implementation of the algorithm by Nemhauser and Wolsey [37] may lead to slow convergence because the algorithm (1) may explore, in the worst case, \(O(\tbinom{n}{k})\) many locally optimal solutions before finding an optimal solution, and (2) may require long solution times for each master problem, because the optimality cuts given by \(S={\bar{X}}\) may not be facet-defining and hence lead to weak LP relaxations.

These observations motivate us to devise a more efficient implementation of Algorithm 2, which we report in Sect. 5.

4.3 Extensions

In this section, we give various extensions of the influence maximization problems that can be solved using our proposed methods.

4.3.1 Extensions to live-arc graph models

Observe that while we demonstrate our general algorithm on the independent cascade and linear threshold models, our proposed model and method are applicable to many extensions of the social network problems studied in the literature. For example, an extension considered in the literature is to replace the cardinality constraint on the number of nodes selected with a knapsack constraint representing a marketing budget, where each node has a different cost to market. This model also admits an adapted and more involved 0.63-factor greedy approximation algorithm (see [23, 50]). In fact, our model is flexible enough to allow any constraints in \({\mathcal {X}}\) so long as the master problem can be solved with an optimization solver, whereas the greedy approximation algorithm needs careful adjustment and analysis for each additional constraint. Similarly, the time-constrained influence spread problem studied in Chen et al. [11] and Liu et al. [30] can also be solved using our method. In this problem, there is an additional constraint that the number of time periods it takes to influence a node should be no more than a given parameter \(\tau \). The resulting influence spread function is monotone and submodular, hence we can use inequalities (4) as the submodular optimality cuts. Furthermore, we can efficiently calculate the coefficients \(\rho ^\omega _j({\bar{X}}) \) by solving, with breadth-first search, a modified reachability problem that limits the number of hops from the seed set \({\bar{X}}\) to any other node to \(\tau \).
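
For instance, only the reachability routine changes: a hop-limited breadth-first search computes which nodes can be influenced within \(\tau \) time periods. A minimal sketch under these assumptions (adjacency-list live-arc graph, hypothetical names) is given below.

# Hop-limited reachability for the time-constrained influence spread problem:
# only nodes within tau hops of the seed set contribute to the influence function.
from collections import deque

def reachable_within(adj, seeds, tau):
    """Nodes reachable from `seeds` using at most `tau` arcs (hops)."""
    dist = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        if dist[u] == tau:
            continue
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

adj = {1: [2], 2: [3], 3: [4]}
print(reachable_within(adj, {1}, tau=2))   # {1, 2, 3}: node 4 is 3 hops away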

4.3.2 General cascade and general threshold models

In the general cascade model, every node \(j\in V\) has an activation function \(p'_j(i,S)\in [0,1]\) for \(S\subseteq \{i'\in V:(i',j)\in A\}=:N^{in}(j)\) and \(i\in N^{in}(j){\setminus } S\). The activation function represents the probability that node j is influenced by node i, given that the nodes in S have already tried and failed to activate node j. The independent cascade model is a special case, where \(p'_j(i,S)=\pi _{ij}\), independent of S.

In the general threshold model, every node \(j\in V\) has a threshold function \(f_j(S)\) for \(S\subseteq N^{in}(j)\), where \(f_j(\cdot )\) is monotone and \(f_j(\emptyset )=0\). As before, every node j selects a threshold \(\nu _j\) uniformly at random in the range [0, 1]. Then, a node j is activated if, for a given active set S, \(f_j(S\cap N^{in}(j))\ge \nu _j\). The linear threshold model is a special case, where \(f_j(S)=\sum _{i\in S}w_{ij}\).

Kempe et al. [21] show that the general cascade model is equivalent to the general threshold model with an appropriate selection of activation and threshold functions. This is not true for the independent cascade and the linear threshold models (see Example 2.14 in [10]). Furthermore, in these general models, the influence spread function need not be submodular. However, if \(f_j(S)\) is submodular for all \(j\in V\), then the influence spread is submodular (first conjectured by Kempe et al. [21] and later proven by Mossel and Roch [33, 34]). Therefore, the greedy hill-climbing algorithm is a 0.63-approximation algorithm for this case as well. Algorithm 2 is applicable in the case of submodular threshold functions, where the optimality cuts take the more general form (4) or (5), depending on the monotonicity of the function f.

5 Computational experiments

In this section we summarize our experience with solving the influence maximization problem using the delayed constraint generation method (DCG) with various optimality cuts as given in Algorithm 2, and the greedy hill-climbing algorithm (Greedy) of Kempe et al. [21] as given in Algorithm 1.

The algorithms are implemented in C++ with the IBM ILOG CPLEX 12.6 Optimizer. All experiments were executed on a Windows Server 2012 R2 machine with an Intel Xeon E5-2630 2.40 GHz (x64) CPU and 32 GB DRAM. In our implementation of Algorithm 2, we set the parameter \(\varepsilon =0\). For the master problem of the decomposition algorithm, the relative MIP gap tolerance of CPLEX was set to 1%, so a feasible solution with an optimality gap of at most 1% is considered optimal.

5.1 Small-scale network

First, we study the quality of the solutions produced by DCG and Greedy on a small-scale network for which we can enumerate all possible outcomes of the random process. In these experiments, we are able to capture the random process precisely, and no information is lost through sampling from the true distribution. An illustrative network is given in Fig. 1 with 9 nodes, 10 directed arcs and independent influence probability \(\pi _{ij}=p\) for all \((i,j)\in A\). Our goal is to select \(k=2\) seed nodes, so that the objective value, which is the expected number of nodes influenced by the seed nodes, is maximized. We generate all possible influence scenarios (a total of \(2^{10}=1024\) scenarios). Note that under the assumption that each influence is independent of the others, the probability of scenario \(\omega \), which has \(\ell \le 10\) live arcs, is given by \(p_\omega = \left( 1-p \right) ^{ 10-\ell } p^\ell \).
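
For such a small arc set, the scenario collection can be generated exhaustively; the sketch below (with a placeholder arc list that is not meant to reproduce Fig. 1) enumerates all \(2^{|A|}\) live-arc configurations and assigns each one the probability \((1-p)^{|A|-\ell } p^\ell \).

# Exhaustive scenario generation for a small network: every subset of arcs is a
# live-arc graph, with probability (1 - p)^(m - l) * p^l for l live arcs.
# The arc list below is a placeholder, not the exact network of Fig. 1.
from itertools import product

def enumerate_scenarios(arcs, p):
    m = len(arcs)
    for pattern in product([0, 1], repeat=m):      # 2^m live/blocked patterns
        live = [a for a, bit in zip(arcs, pattern) if bit]
        prob = (1 - p) ** (m - len(live)) * p ** len(live)
        yield live, prob

arcs = [(1, 2), (1, 3), (2, 3), (3, 4)]
scenarios = list(enumerate_scenarios(arcs, p=0.1))
print(len(scenarios), sum(prob for _, prob in scenarios))   # 16 scenarios, total prob. 1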

The solutions of the DCG and Greedy methods on 1024 scenarios with various values of \(p = 0.1, 0.2,\ldots , 1\) are shown in Table 1. When \(p \le 0.5\), both algorithms have the same objective value. For \( 0.6\le p \le 1\), Greedy selects node 1 as the seed in the first iteration of Algorithm 1 (line 4 of Algorithm 1) and selects either node 2 or 3 as the seed in the second iteration. However, DCG selects nodes 2 and 3 as the seed nodes, and provides a better objective value than Greedy (up to 12.5% improvement). So while Greedy performs much better than its worst-case bound (63%), its objective value can be up to 12.5% below the optimum.

Next, instead of generating all 1024 scenarios, we employed Monte-Carlo sampling, and independently sampled different numbers of scenarios, \(|\Lambda | = 10, 50\) and 100, for different p values, and let \(p_\omega = 1/|\Lambda |\). We summarize the results of this experiment in Table 2. For eight out of the 15 cases, DCG has a higher objective value than Greedy, and in all other cases Greedy attains the optimal objective value (mostly for the small influence probabilities \(p=0.1, 0.3\)). We also observe that the objective value for the instances with a larger number of scenarios is generally closer to the objective value with all 1024 scenarios (except for \(p=0.1\) and \(|\Lambda | =100\)). Note that Greedy is a 0.63-approximation algorithm even for the sampled problem, which assumes that the true distribution is given by the scenarios in \(\Lambda \), whereas DCG provides the optimal solution to the sampled problem.

Table 1 Expected influence obtained from two algorithms for the small-scale network with 1024 scenarios
Table 2 Expected influence obtained from two algorithms for the small-scale network with \(|\Lambda |\) scenarios
Table 3 The summary of real world datasets

5.2 Large-scale network with real-world datasets

To evaluate the efficiency of DCG and Greedy on large networks, we conduct computational experiments on four real-world datasets with different categories and scales. These datasets are summarized below and in Table 3:

  • UCI-message is the dataset for the online student community network at the University of California, Irvine [39]. The 1,899 nodes represent the students, and each of the 59,835 directed arcs indicates that one student sent a message to another.

  • P2P02 is the dataset for the Gnutella peer-to-peer file sharing network from August 2002 [27, 43]. The 10,876 nodes represent the hosts, and each of the 39,994 undirected edges denotes a connection between two hosts.

  • Phy-HEP is the dataset for the academic collaboration network in the “high energy physics theory” (HEPT) section of the e-print arXiv (www.arxiv.org) [12, 21, 22]. The 15,233 nodes represent the authors, and the 58,891 undirected edges represent the co-authorship between each pair of authors in the “high energy physics theory” papers from 1991 to 2003. Note that this is the original dataset considered in Kempe et al. [21], and it is commonly used as a benchmark in comparing various algorithms for maximizing influence in social networks.

  • Email-Enron is the dataset for the email communication network of the Enron Corporation [24, 29]. It was made public by the Federal Energy Regulatory Commission during its investigation. The 36,692 nodes represent distinct email addresses, and each of the 183,831 directed arcs indicates that one address sent an email to the other.

Note that if the original dataset contains an undirected edge between i and j, then we construct a directed graph with the two directed arcs (i, j) and (j, i). We follow the data generation scheme of Kempe et al. [21] in constructing live-arc graphs for these instances (i.e., the probabilities of influence on each arc for the independent cascade model, and the arc weights and node thresholds for the linear threshold model, follow from Kempe et al. [21]).

In our experiments in this subsection, we compare three algorithms: Greedy is the greedy hill-climbing algorithm (Algorithm 1); DCG-SubIneqs is Algorithm 2 using the submodular optimality cuts (7) for \(S={\bar{X}}\), where \({\bar{X}}\) is the optimal solution given by the master problem in the current iteration; and DCG-SubWarmup adds the submodular inequalities (7) for \(S = \emptyset \) for each scenario as a warm start, before the execution of DCG-SubIneqs. Proposition 3.1 shows that the submodular optimality cut (7) with \(S = \emptyset \), which is referred to as an EmptySetCut, is sufficient to find the optimal solution for \(k=1\) (note that \(\rho ^\omega _j(\emptyset )\ge 1\) for all \(j\in V, \omega \in \Lambda \), hence the assumptions of the proposition are satisfied). Since \(k = 1\) is an easy case for DCG-SubWarmup due to Proposition 3.1, we include DCG-SubWarmup to test whether the EmptySetCuts are also useful for \(k>1\). To this end, we add the EmptySetCuts for all scenarios to the master problem before executing the DCG algorithm. Note that the total computation time in our experiments includes the generation time of all EmptySetCuts, and the total number of user cuts also includes the number of EmptySetCuts. We also implemented Algorithm 2 using alternative optimality cuts (adapted and strengthened versions of the integer L-shaped cuts of [25], referred to as Benders-LC, and described in “Appendix A”); however, the running time of Benders-LC is extremely long. Therefore, we only report our results with DCG-SubIneqs, DCG-SubWarmup and Greedy, and discuss the inefficiency of Benders-LC in “Appendix A.1”.

Table 4 Independent cascade model for UCI-message and P2P02
Table 5 Independent cascade model for Phy-HEP and Email-Enron

5.2.1 Independent cascade model

For the independent cascade model, we assign a uniform influence probability \(\pi _{ij}=p=0.1\) independently to each arc \((i,j)\) in the network, as was done in Kempe et al. [21]. Note that Kempe et al. [21] consider the dataset Phy-HEP with influence probabilities \(\pi _{ij}\in \{0.01, 0.1\}\) for each arc \((i,j)\) in the network. However, we observe that for \(\pi _{ij}=0.01\), the total number of live arcs is very small, resulting in sparse live-arc graphs with a large number of singletons. For example, with \(p=0.01\), the expected number of live arcs in our largest dataset Email-Enron is \(183{,}831 \times 0.01 \approx 1{,}838\). However, Email-Enron has 36,692 nodes, so the live-arc graphs contain more than 30,000 singletons in expectation. Therefore, we focus on the more interesting case of \(\pi _{ij}=0.1\) in our experiments in this section.
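As an illustration of the scenario generation, the following sketch samples one live-arc graph for the independent cascade model with uniform probability \(p=0.1\); the expansion of undirected edges into two opposite arcs follows the construction described above, and the function names are ours for illustration only.

```python
import random

def to_directed(edges):
    """Expand each undirected edge {u, v} of the raw data into arcs (u, v) and (v, u)."""
    return [(i, j) for (u, v) in edges for (i, j) in ((u, v), (v, u))]

def sample_ic_scenario(arcs, p=0.1, rng=random):
    """One live-arc scenario for the independent cascade model:
    each arc (i, j) is live independently with probability p."""
    live = {}
    for (i, j) in arcs:
        if rng.random() < p:
            live.setdefault(i, []).append(j)
    return live

# For example, |Lambda| = 300 sampled scenarios:
# scenarios = [sample_ic_scenario(arcs, p=0.1) for _ in range(300)]
```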

We generate \(|\Lambda | = 100, 200, 300\) and 500 scenarios to find \(k=2\) to 5 seed nodes that maximize influence. Tables 4 and 5 summarize our experiments with the algorithms DCG-SubIneqs, DCG-SubWarmup and Greedy for the independent cascade model. Column “k” denotes the number of seed nodes to be selected. Column “Cuts(#)” reports the total number of submodular inequalities (7) added to the master problem of DCG-SubIneqs, and column “Time(s)” reports the solution time in seconds. We do not report the objective values in these experiments, because we are able to prove that, despite its worst-case performance guarantee of 63%, Greedy is within the optimality tolerance for these instances. In Kempe et al. [21], Greedy is tested empirically against other heuristics, such as choosing the \(k\) highest-degree nodes in the graph G, because an optimal solution is not available there. Therefore, our computational experiments also provide an empirical test of the performance of the greedy heuristic when an optimal solution (to the sampled problem) is available due to our proposed method.

From column Cuts(#) in Tables 4 and 5, we observe that the number of cuts added to the master problem generally increases with the number of seed nodes k. In other words, more iterations are needed to prove optimality if we have more seed nodes to select. Columns DCG-SubIneqs Time(s) and Cuts(#) show that the overall running time does not necessarily increase with the number of user cuts, as more cuts may help the master problem converge to an optimal solution faster. Recall that the running time of DCG-SubIneqs includes the solution time of the master problem (a mixed-integer program) and the cut generation time of submodular inequalities, which decomposes by each scenario.

From column Greedy Time(s) we see that, for the same number of scenarios, the running time of Greedy increases linearly as the number of seed nodes increases. Recall that DCG solves a mixed-integer master problem after the cut generation phase, which makes its running time nonlinear, as we observe from the columns DCG-SubIneqs Time(s) and DCG-SubWarmup Time(s). The majority of the overall time for DCG is spent on cut generation for most instances (for example, the cut generation takes, on average, 80% of the time over all four problem instances with \(|\Lambda |=300\)). Because we observe that the major bottleneck is the cut generation time and most of the remaining time is spent on the solution of the mixed-integer master problem, we did not implement other enhancements known to improve the convergence of related Benders methods, such as the trust-region method or heuristics (cf. [38, 44]).

Considering the average solution time, we observe that there is not much difference between DCG-SubWarmup and DCG-SubIneqs, and that DCG outperforms Greedy for large networks with more than 10,000 nodes. For example, for the instance Email-Enron in Table 5, the average solution time of Greedy is five times that of DCG-SubIneqs. Only for the smallest instance, UCI-message, is Greedy the fastest algorithm (see Table 4).

Table 6 Linear threshold model for UCI-message and P2P02

5.2.2 Linear threshold model

In this section, we summarize our experiments with the linear threshold model. Recall that in the live-arc graph representation of linear threshold models, at most one incoming arc is chosen for each node in the live-arc graph construction for each scenario. As in Kempe et al. [21], we let the deterministic weight on each arc \((i,j)\in A\) be \(w_{ij}=1/\text {indeg}(j)\). We generate \(|\Lambda | \in \{ 100, 200, 300, 500\}\) scenarios for the four real-world datasets described earlier.
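For illustration, a minimal sketch of the live-arc construction for one linear threshold scenario is given below. Since the incoming weights \(w_{ij}=1/\text {indeg}(j)\) of a node sum to one, choosing one live incoming arc uniformly at random realizes these probabilities; with general weights summing to less than one, the node would keep no incoming arc with the residual probability. The function name and the graph representation are illustrative only.

```python
import random

def sample_lt_scenario(in_neighbors, rng=random):
    """One live-arc scenario for the linear threshold model: each node keeps at
    most one incoming arc, arc (i, j) with probability w_ij = 1/indeg(j)."""
    live = {}
    for j, preds in in_neighbors.items():
        if not preds:          # a node with no incoming arcs keeps none
            continue
        i = rng.choice(preds)  # uniform over in-neighbors = probability 1/indeg(j) per arc
        live.setdefault(i, []).append(j)
    return live
```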

The results are shown in Tables 6 and 7. As for the independent cascade model, the running time of Greedy increases linearly in \(k\). Also as in the previous experiments, DCG-SubWarmup is slower than Greedy only for the smallest dataset with fewer than 2,000 nodes (UCI-message) (see Table 6). For the large-scale datasets with over 10,000 nodes (P2P02, Phy-HEP, and Email-Enron) reported in Tables 6 and 7, we observe that the warm-up strategy is highly effective: it provides the best solution times and requires fewer iterations and cuts. For example, DCG-SubWarmup outperforms Greedy by a factor of 2.46 on P2P02 in Table 6, by a factor of 4.12 on Phy-HEP in Table 7, and by a factor of 27 on Email-Enron in Table 7, the largest dataset considered.

Table 7 Linear threshold model for Phy-HEP and Email-Enron

Some comments are in order for both the independent cascade and linear threshold models. First, we make some observations on increasing \(k\). As can be seen from our experiments, the running time of Greedy increases linearly with \(k\). However, the increase in the running time of DCG is nonlinear, as can be expected. Hence, as we increase \(k\), we need to set time limits for both DCG and Greedy (in the experiments reported so far, we impose no time limits). In this case, DCG still returns an incumbent solution with \(k\) seed nodes and an estimate of the optimality gap provided by the bound from the DCG master problem. Greedy, on the other hand, has to be stopped prematurely at the time limit, before it identifies all \(k\) seed nodes. For example, for the independent cascade model on the medium-sized instance P2P02 with \(k=30\) and \(|\Lambda |=300\), under a time limit of one hour, Greedy stops at the time limit with a solution that contains only 4 of the 30 seed nodes (see Table 4). On the other hand, DCG stops with an incumbent solution that has \(k=30\) seed nodes and an optimality gap of 3.7%. For the largest instance Email-Enron, from Table 5, we observe that Greedy cannot even find the first seed node in over three hours. Similarly, as we increase \(|\Lambda |\), the running times of both algorithms increase greatly, and as in the case of increasing \(k\), we have to impose time limits and stop Greedy prematurely to be able to compare the performance of the algorithms.

Note that, throughout the paper, we present a so-called multicut version of the DCG algorithm and its variants, where we add an optimality cut for each scenario at each iteration. We have also tested a single-cut implementation, in which the cuts across all scenarios are aggregated into a single cut at each iteration. We observe a degraded performance of the single-cut version on our problem instances; therefore, we present our results for the multicut approach. In particular, for the independent cascade model, the total computational time of the single-cut version of DCG-SubWarmup is between 20 and 100% higher than that of the multicut version in all four datasets with 300 scenarios for \(k>1\) (see page 167 of [8] for a discussion of the problem-dependent nature of the performance of the single-cut versus multicut approach).
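To illustrate the distinction, suppose the per-scenario cut (7) at a candidate set \(S\) is written generically as a monotone submodular cut
\[
\theta _\omega \le \sigma _\omega (S) + \sum _{j\in V{\setminus } S} \rho ^\omega _j(S)\, x_j, \qquad \omega \in \Lambda ,
\]
where \(\sigma _\omega (S)\) denotes the influence spread of \(S\) in scenario \(\omega \) (the exact form of (7) may contain additional terms). Then, assuming equally likely scenarios, the single-cut version replaces the variables \(\theta _\omega \) by a single variable \(\theta \) and adds, at each iteration, the aggregated inequality
\[
\theta \le \frac{1}{|\Lambda |}\sum _{\omega \in \Lambda }\Bigl (\sigma _\omega (S) + \sum _{j\in V{\setminus } S} \rho ^\omega _j(S)\, x_j\Bigr ).
\]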

We remark that we implemented the basic greedy algorithm as proposed by Kempe et al. [21]. This implementation needs to perform an influence spread function evaluation for each node, for each scenario, at each iteration, resulting in a running time of \({\mathcal {O}}(kn|\Lambda |m)\). Several improvements to this implementation using different data structures and algorithmic enhancements have been proposed (see, e.g., [12, 28]). In particular, [12] propose the so-called Discount Greedy method and report that it solves the Phy-HEP instances within a second. In this paper, our main goal is not to compete with these efficient heuristics. Instead, we focus on solving the problem on large-scale datasets to optimality in a reasonable time. Our computational experiments demonstrate an overlooked opportunity to use optimization methods to solve stochastic influence maximization problems to provable optimality.
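For reference, a minimal sketch of this basic greedy evaluation on the sampled live-arc graphs follows; it is illustrative only (not our implementation) and repeats the reachability helper from the earlier sketch so that it is self-contained. Each of the \(k\) rounds evaluates every candidate node on every scenario with one breadth-first search, which is the source of the \({\mathcal {O}}(kn|\Lambda |m)\) running time.

```python
from collections import deque

def reachable(live_arcs, seeds):
    """Nodes reachable from `seeds` in one live-arc scenario (BFS)."""
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for v in live_arcs.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def greedy_seeds(scenarios, nodes, k):
    """Basic greedy hill-climbing: in each round, add the node with the largest
    average marginal spread over all sampled live-arc scenarios."""
    seeds = []
    for _ in range(k):
        covered = [reachable(live, seeds) if seeds else set() for live in scenarios]
        best, best_gain = None, -1.0
        for j in nodes:
            if j in seeds:
                continue
            # marginal gain of j in a scenario: what j reaches beyond the current seeds
            gain = sum(len(reachable(live, [j]) - cov)
                       for live, cov in zip(scenarios, covered)) / len(scenarios)
            if gain > best_gain:
                best, best_gain = j, gain
        seeds.append(best)
    return seeds
```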

6 Conclusion

In this paper, we propose a delayed constraint generation algorithm to solve influence maximization problems arising in social networks. We show that exploiting the submodularity of the influence function leads to strong optimality cuts. Furthermore, our computational experiments with large-scale real-world test instances indicate that the algorithm performs favorably against a popular greedy heuristic for this problem. In most instances, our algorithm finds a solution with provable optimality guarantees more quickly than a basic implementation of the greedy heuristic, which can only provide a 0.63 performance guarantee. Our algorithm is applicable to many other variants of the influence maximization problem for which the influence function is submodular. Furthermore, we generalize the proposed algorithm to solve any two-stage stochastic program, where the second-stage value function is submodular.

Our results on optimization-based methods for the fundamental influence maximization problems provide a foundation to build algorithms for more advanced models, such as the adaptive model of Seeman and Singer [45], where a subset of additional seed nodes is selected in the second stage based on the realization of some of the uncertain parameters and the seed nodes selected in the first stage. The decomposition methods of Sen [46], Gade et al. [17], and Zhang and Küçükyavuz [57] can be employed in this case to convexify the second-stage problems that involve binary decisions. Another possible future research direction is to develop optimization-based methods for the problem of marketing to nodes [21, 22] to increase their probabilities of getting activated.