Beyond Worst-Case (In)approximability of Nonsubmodular Influence Maximization

Schoenebeck, Grant; Tao, Biaoshuai

doi:10.1007/978-3-319-71924-5_26

Grant Schoenebeck¹⁵ &
Biaoshuai Tao¹⁵

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10660))

Included in the following conference series:

International Conference on Web and Internet Economics

1098 Accesses
6 Citations

Abstract

We consider the problem of maximizing the spread of influence in a social network by choosing a fixed number of initial seeds, formally referred to as the influence maximization problem. It admits a $(1-1/e)$-factor approximation algorithm if the influence function is submodular. Otherwise, in the worst case, the problem is NP-hard to approximate to within a factor of $N^{1-\varepsilon }$, where N is the number of vertices in the graph. This paper studies whether this worst-case hardness result can be circumvented by making assumptions about either the underlying network topology or the cascade model. All of our assumptions are motivated by many real life social network cascades.

First, we present strong inapproximability results for a very restricted class of networks called the (stochastic) hierarchical blockmodel, a special case of the well-studied (stochastic) blockmodel in which relationships between blocks admit a tree structure. We also provide a dynamic-program based polynomial time algorithm which optimally computes a directed variant of the influence maximization problem on hierarchical blockmodel networks. Our algorithm indicates that the inapproximability result is due to the bidirectionality of influence between agent-blocks.

Second, we present strong inapproximability results for a class of influence functions that are “almost” submodular, called 2-quasi-submodular. Our inapproximability results hold even for any 2-quasi-submodular f fixed in advance. This result also indicates that the “threshold” between submodularity and nonsubmodularity is sharp, regarding the approximability of influence maximization.

The authors gratefully acknowledge the support of the National Science Foundation under Career Award 1452915 and AifT Award 1535912.

A full version of this paper is available on arXiv: https://arxiv.org/abs/1710.02827.

Access provided by CONRICYT-eBooks. Download conference paper PDF

Parameterized Approximability of Maximizing the Spread of Influence in Networks

A general greedy approximation algorithm for finding minimum positive influence dominating sets in social networks

Article 06 October 2021

Hardness of Bounding Influence via Graph Modification

1 Introduction

A cascade is a fundamental social network process in which a number of nodes, or agents, start with some property that they then may spread to neighbors. The importance of network structure on cascades has been shown to be relevant in a wide array of environments, including the adoption of products [5, 8, 18, 30], farming technology [15], medical practices [14], participation in microfinancing [4], and the spread of information over social networks [26].

A natural question, known as the influence maximization problem (InfMax), is how to place a limited number k of initial seeds, in order to maximize the spread of the resulting cascade [17, 24, 25, 32, 33]. In order to study influence maximization, we first need to understand how cascades spread. Many cascade models have been proposed [2, 31, 37], and two simple examples are the Independent Cascade model [24, 25, 32] and the Threshold model [20]. In the Independent Cascade model, each newly infected node infects each currently uninfected neighbor in the subsequent round with some fixed probability p. In the Threshold model each node has a threshold (0, 1, 2, etc.) and becomes infected when the number of infected neighbors meets or surpasses that threshold.

Cascade models can be roughly divided into two categories: submodular and nonsubmodular. In submodular cascade models, such as the Independent Cascade model, a node’s marginal probability of becoming infected after a new neighbor is infected decreases with the number of previously infected neighbors [24]. Submodular cascade models are fairly well understood theoretically, and properties of these cascades are usually closely related to a network’s degree distribution and conductance [23].

In nonsubmodular contagion models, like the Threshold model, the marginal probability of being infected may increase as more neighbors are infected. For example, if a node has a threshold of 2, then the first infected neighbor has zero marginal impact, but the second infected neighbor causes this node to become infected with probability 1. Unlike submodular contagions, nonsubmodular contagions can require well-connected regions to spread [9].

Influence maximization becomes qualitatively different in nonsubmodular settings. In the submodular case, one should put as much distance between the k initial adopters as possible, lest they erode each other’s effectiveness. However, in the nonsubmodular case, it may be advantageous to place the initial adopters close together to create synergy and yield more adoptions. The intuition that it is better to saturate one market first, and then expand implicitly assumes nonsubmodular influence in the cascades.

In general, it is NP-hard even to approximate InfMax to within $N^{1-\epsilon }$ of the optimal expected number of infections [25]. However, assuming that we are looking at a submodular contagion, a straightforward greedy algorithm can efficiently find an answer that is at least a $(1- 1/e)$ fraction of the optimal answer. Unfortunately, empirical research shows that many cascades are not submodular [3, 27, 34].

Key Question: Can this worst-case hardness result for nonsubmodular influence maximization be circumvented by making assumptions about either the underlying network topology or the cascade model?

We know a lot about what social networks look like, and previous hardness reductions make no attempt to capture realistic features of networks. It is very plausible that by restricting the space of networks we might regain tractability.

In this paper, we consider two natural network topologies: the hierarchical block model and the stochastic hierarchical blockmodel. Each of both is a natural restriction on the classic (stochastic) blockmodel [16, 22, 38] network structure. In (stochastic) blockmodels, agents are partitioned into $\ell $ blocks. The weight (or likelihood in the stochastic setting) of an edge between two vertices is based solely on blocks to which the vertices belong. The weights (or probabilities) of edges between two blocks can be represented by an $\ell \times \ell $ matrix. In the (stochastic) hierarchical blockmodel, the structure of the $\ell \times \ell $ matrix is severely restricted to be “tree-like”.^{Footnote 1}

Our (stochastic) hierarchical blockmodel describes the hierarchical structure of the communities, in which a community is divided into many sub-communities, and each sub-community is further divided, etc. Typical examples include the structure of a country, which is divided into many provinces, and each province can be divided into cities. Our model captures the natural observation that people in the same sub-community in the lower hierarchy tend to have tighter (or more numerous) bonds among each other [13].

We also consider restrictions on the cascade model. The same research showing that cascades are often not submodular empirically also shows that the local submodularity often fails in one particular way—the second infected neighbor of an agent is, on average, more influential than the first. This has already been observed in community formation [3], viral marketing [27] and Twitter network [34]. This motivates our study of 2-quasi-submodular cascade model where the marginal effect of the second infected neighbor is greater than the first, but after that the marginal effect decreases.

1.1 Our Results

First, we present inapproximability results for InfMax in the hierarchical blockmodel. We show that InfMax is NP-hard to approximate within a factor of $N^{1-\varepsilon }$ for arbitrary $\varepsilon >0$, even if we assume all agents have unit threshold $\theta _v=1$. We also extend this hardness result to the stochastic hierarchical blockmodel in the full version of our paper.

Moreover, for hierarchical blockmodel, we present a dynamic program based polynomial time algorithm for the influence maximization problem when we additionally assume the influence from one block to another is “one-way”. This provides insights to the above intractability result: the difficulty comes from the bidirectionality of influence between agent-blocks.

Secondly, we present a family of inapproximability results for the 2-quasi-submodular cascade model. In particular, for any 2-quasi-submodular influence function f, we show that it is NP-hard to approximate the influence maximization problem within a factor of $N^\tau $ when each agent has f as its local influence function, where $\tau > 0$ is a constant depending on f. This can be seen as a threshold result for inapproximability of influence maximization, because if f is submodular, then the problem can be approximated to within a $(1-1{/}e)$-factor, but if f is just barely nonsubmodular the problem can no longer be approximated to within any constant factor.

Finally, we pose the open question of whether enforcing the aforementioned restrictions simultaneously on the network and the cascade renders the problem tractable.

1.2 Related Works

The influence maximization problem InfMax was first posed by Domingos and Richardson [17, 33]. Kempe, Kleinberg, and Tardos showed that a simple greedy algorithm obtains a $(1-1/e)$ factor approximation to the problem in the independent cascade model and linear threshold model [24], and extended this result to a family of submodular cascades which captures the prior results as a special case [25]. Mossel and Roch [32] further extended this result to capture all submodular cascades.

Perhaps most related to the present work, are several inapproximability results for InfMax. If no assumption is made for the influence function, InfMax is NP-hard to approximate to within a factor of $N^{1 - \varepsilon }$ for any $\varepsilon > 0$ [25].

Chen [10] found inapproximability results on a similar optimization problem: instead of maximizing the total number of infected vertices given k initial targets, he considered the problem of finding a minimum-sized set of initial seeds such that all vertices will eventually be infected. This work studied restrictions of this problem to various fixed threshold models.

An important difference between our hardness result in Sect. 5 and all the previous results is that our result holds for any 2-quasi-submodular functions. In particular, in this work, f is fixed in advance before the NP-hardness reduction, while in previous work, specific influence functions were constructed within the reductions.

Several works looked at slightly different aspects of influence maximization. Borgs et al. [7] provably showed fast running times when the influence function is the independent cascade model. Lucier et al. [28] showed how to parallelize (in a model based on Map Reduce) the subproblem of determining the influence of a particular seed. Seeman and Singer [35] studied the special case where only a subset of the nodes in the network are available to be infected. They showed a constant factor approximation to the problem in their setting. He and Kempe looked at a robust versions of the problem [21] where the exact parameters of the cascade are unknown. Several works [6, 19] studied the problem as a game between two different infectors.

Following the work of Kempe et al. [24, 25], there were extensive works to solve InfMax based on the heuristic implementations of the greedy algorithm designed to be efficient and scalable [11, 12, 28].

Notion of “near submodularity” was also proposed and studied in [36]. Our definition differs from the one in [36] in that a 2-quasi-submodular function can be, intuitively, very far from being submodular (for example, the 2-threshold cascade model). However, our reduction in Sect. 5 works for all a 2-quasi-submodular function, and 2-quasi-submodular functions can be arbitrarily close to a submodular one.

Our dynamic program for the hierarchical blockmodel with “one-way” influence (available in the full version) was further studied and generalized by Angell and Schoenebeck in [1]. They showed that, empirically, this generalized algorithm works very well even for arbitrary graphs. Specifically, they run dynamic programming on a hierarchical decomposition of general graphs, and, empirically, the algorithm effectively leverages the resultant hierarchical structures to return seed sets substantially superior to those of the greedy algorithm.

2 Preliminaries

In general a cascade on a graph is a stochastic mapping from a subset of vertices—the seed vertices, to another set of vertices that always contain the seed vertices—the infected vertices. The cascades we study in this paper all belong to the general threshold model [32], which captures the local decision making of vertices.

Definition 1

The general threshold model $I^{G}_{F,{\mathcal D}}$, is defined by a graph $G=(V,E)$ which may or may not be edge-weighted, and for each vertex v:

i.
monotone local influence function $f_v:\{0, 1\}^{|\Gamma (v)|} \mapsto {\mathbb R}_{\ge 0}$ where $\Gamma (v)$ denotes the neighbor vertices of v and $f_v(\emptyset ) = 0$, and
ii.
a threshold distribution ${\mathcal D}_v$ whose support is ${\mathbb R}_{\ge 0}$. Let F and $\mathcal {D}$ denote the collection of $f_v$ and $\mathcal {D}_v$ respectively.

On input $S \subseteq V$, $I^{G}_{F,{\mathcal D}}(S)$ outputs a set of vertices as follows:

1.
Initially only vertices in S are infected, and for each vertex v the threshold $\theta _v\sim {\mathcal D}_v$ is sampled from $\mathcal {D}_v$ independently.^{Footnote 2}
2.
In each subsequent round, a vertex v becomes infected if the influence of its infected neighbors exceeds its threshold.
3.
The set of infected vertices is the output (after a round where no additional vertices are infected).

We use k to denote |S|—the number of seeds, and use N to denote |V|—the total number of vertices in G. Let

$$\sigma ^{G}_{F,{\mathcal D}}(S) = \mathbb {E}\left[ \left| I^{G}_{F,{\mathcal D}}(S)\right| \right] $$

be the expected total number of infected vertices due to the influence of S, where the expectation is taken over the samplings of the thresholds of all vertices. We refer to $\sigma ^{G}_{F,\mathcal {D}}(\cdot )$ as the global influence function. Sometimes we write $\sigma (\cdot )$ with the parameters $G,F,\mathcal {D}$ omitted, when there is no confusion. Given that each $f_v$ is monotone, it is straightforward to see that $\sigma $ is monotone.

Definition 2

The InfMax problem is an optimization problem which takes as inputs $G=(V,E)$, F, ${\mathcal D}$, and an integer k, and outputs $\max _{S \subseteq V: |S| = k} \sigma ^{G}_{F,\mathcal {D}}(S)$, the maximum global influence of a set of size k.

In this paper, we consider several special cases of the general threshold model $I^G_{F,{\mathcal D}}$ by making assumptions on the network topology G, or the cascade model^{Footnote 3} $F,{\mathcal D}$.

2.1 Assumptions on Graph G

We consider the hierarchical blockmodel, which is the special case of the well studied blockmodel [38].

Definition 3

A hierarchical blockmodel is an undirected edge-weighted graph $G=(V,T)$, where V is the set of all vertices of the graph G, and $T=(V_T,E_T,w_T)$ is a node-weighted binary tree T called a hierarchy tree. In addition, $w_T$ satisfies $w_T(t_1)\le w_T(t_2)$ for any $t_1,t_2\in V_T$ such that $t_1$ is an ancestor of $t_2$.^{Footnote 4} Each leaf node $t\in V_T$ corresponds to a subset of vertices $V(t)\subseteq V$, and the V(t) sets partition the vertices of V. In general, if t is not a leaf, we denote $V(t)=\cup _{t':\text { a leaf, and an offspring of }t}V(t')$.

For $u, v \in V$, the weight of the edge (u, v) in G is just the weight of the least common ancestor of u and v in T. That is $w(u, v) = \max _{t: u, v \in V(t)} w(t)$. If this weight is 0, then we say that the edge does not exist.

Figure 1 provides an example of how a hierarchy tree defines the weights of edges in the corresponding graph.

In the full version, we will present an example for hierarchical blockmodel, and we will generalize Definition 3 to a stochastic version, and define the stochastic hierarchical model.

2.2 Assumptions on Cascade Model $F,{\mathcal D}$

We consider several generalizations of a well-studied cascade model called linear threshold model [24]. The linear threshold model is a special case of the general threshold model $I^G_{F,{\mathcal D}}$, with each $f_v$ being linear (see Definition 4 later), and each ${\mathcal D}_v$ being the uniform distribution on [0, 1].

The cascade model in Definition 4 generalizes the linear threshold model by removing the assumption on ${\mathcal D}_v$. The universal local influence model defined in Definition 5, generalizes the linear threshold model by allowing non-linear $f_v$, while it restricts our attention to unweighted graphs. We also consider a special case where $f_v$ is 2-quasi-submodular in the last subsection.

Linear Local Influence Functions. A natural selection of local influence function $f_v$ is the linear function, by which the influences from v’s neighbors are additive.

Definition 4

Given a general threshold model $I^G_{F,{\mathcal D}}$, we say that F is linear if for each $v\in V$ we have

$f_v(S_v)=\sum _{u\in S_v}w(u,v)$ when G is edge-weighted;
$f_v(S_v)=|S_v|$ when G is unweighted.

For a general threshold model $I^G_{F,{\mathcal D}}$ with linear F, if we additionally assume each ${\mathcal D}_v$ is the uniform distribution on [0, 1], then this becomes the linear threshold model.

Universal Local Influence Functions. We say $f_v$ is symmetric if $f_v(S_v)$ only depends on the number of v’s infected neighbors $|S_v|$ so that each of v’s infected neighbors is of equal importance. In this case, $f_v$ can be viewed as a function $f_v:{\mathbb Z}_{\ge 0}\mapsto {\mathbb R}_{\ge 0}$ which takes an integer as input, rather than a set of vertices. Thus $f_v$ can be encoded by an increasing sequence of positive real numbers $a_0, a_1, a_2, \ldots $ so that $f_v(i)=a_i$. Note that $f_v(0)= a_0 = 0$, as we have assumed $f_v(\emptyset )=0$.

For instance, consider the linear local influence function defined in Definition 4. $f_v$ is symmetric if G is unweighted, in which case $a_i=i$. $f_v$ may not be symmetric if G is edge-weighted, as the neighbors connected by heavier edges contribute more to $f_v(S_v)$.

Definition 5

Given an increasing function $f:{\mathbb Z}_{\ge 0}\mapsto [0,1]$, the universal local influence model $I_f^G$ is a special case of the general threshold model $I^G_{F,{\mathcal D}}$, such that for each $v\in V$ we have

$f_v$ is symmetric, and $f_v=f$ (such that all $f_v$’s are identical).
${\mathcal D}_v$ is the uniform distribution on [0, 1].

Notice that we can assume without loss of generality that G is unweighted in Definition 5, as each $f_v$ is fixed to be some increasing function f which does not depend on the weights of edges.

After assuming G is unweighted, the universal local influence model is a generalization of linear threshold model: the linear threshold model can be viewed as the universal local influence model by restricting $a_i=i$.

As a final remark, for any general threshold model $I^G_{F,{\mathcal D}}$ with each $D_v$ being the uniform distribution on [0, 1], we can intuitively view $f_v(S_v)$ as the probability that v will be infected (where we take $f_v(S_v)>1$ as probability 1). In the universal local influence model, $a_i$ can be viewed as the probability that a vertex will be infected, given that it has i infected neighbors.

Submodular and 2-Quasi-Submodular Functions. Let $g: 2^{S} \rightarrow {\mathbb R}$ be a function which takes as input a subset of a set S. Formally, g is submodular if $g(A\cup \{u\})-g(A)\ge g(B\cup \{u\})-g(B)$ for any $u \in S$ and sets $A\subseteq B \subseteq S$. Intuitively, this means that the marginal effect of each element decreases as the set increases.

Given $G, F, {\mathcal D}$ we say that $I^G_{F,{\mathcal D}}(\cdot )$ is submodular if $\sigma ^G_{F,{\mathcal D}}(\cdot )$ is. We can similarly define submodularity for local influence functions $f_v$. In [32], it has been shown that the local submodularity of all $f_v$’s implies the global submodularity of $I^G_{F,{\mathcal D}}(\cdot )$ for all G when ${\mathcal D}_v$ is the uniform distribution on [0, 1].

We are particularly concerned with the universal local influence model in Definition 5. Here f is submodular if the marginal gain of f by having one more infected neighbor is non-increasing as the number of infected neighbors increases. Formally, for $i_1<i_2$, we have

$$f(i_1+1) - f(i_1) \ge f(i_2+1) - f(i_2).$$

Intuitively, f is submodular if its domain can be smoothly extended to ${\mathbb R}_{\ge 0}$ to make f concave.

We will consider the 2-quasi-submodular local influence function f, which is “almost” submodular such that the submodularity is only violated for the first two inputs of f. In particular, we fail to have the submodular constraint $f(1)-f(0)\ge f(2)-f(1)$, and instead we have $f(1)-f(0)<f(2)-f(1)$, which is just $f(2)>2f(1)$ as $f(0)=0$.

Definition 6

$f:{\mathbb Z}_{\ge 0}\mapsto [0,1]$ is 2-quasi-submodular if $f(2)>2f(1)$ and $f(i)-f(i+1)$ is non-increasing for $i\ge 2$.

In general, for any non-zero submodular function f, if we sufficiently decrease f(1), f becomes 2-quasi-submodular. Thus, from any non-zero submodular function, we can obtain a 2-quasi-submodular function.

We note that the 2-threshold cascade model, where each vertex will be infected if it has at least 2 infected neighbors, can be viewed as the universal local influence function cascade with a 2-quasi-submodular f (with $f(0)=f(1)=0$ and $f(i)=1$ for $i\ge 2$, keeping the assumption $\theta _v$ is drawn uniformly at random from [0, 1]).

3 Hierarchical Blockmodel Influence Maximization

In this section, we provide a strong inapproximability result for InfMax problem for hierarchical blockmodel cascade even when all vertices have a deterministic threshold 1. Specifically, we will show that it is NP-hard to approximate optimal $\sigma (S)$ within a factor of $N^{1-\varepsilon }$ for any $\varepsilon >0$ (recall that $N=|V|$). The same inapproximability result holds for the most general case where ${\mathcal D}$ is given as input to the InfMax problem.

Theorem 1

For any constant $\varepsilon >0$, InfMax $(G,F,{\mathcal D},k)$ is NP-hard to approximate to a factor of $N^{1-\varepsilon }$, even if G is a hierarchical blockmodel, F is linear (see Definition 4), and ${\mathcal D}_v$ is the degenerate distribution with mass 1 on $\theta _v=1$ for all $v\in V$.

We will prove Theorem 1 by a reduction from the VertexCover problem, a well-known NP-complete problem.

Definition 7

Given an undirected graph $\mathsf{G} =(\mathsf{V},\mathsf{E})$ and a positive integer $\mathsf{k} $, the VertexCover problem asks if we can choose a subset of vertices $\mathsf{S} \subseteq \mathsf{V} $ such that $|\mathsf{S} |=\mathsf{k} $ and such that each edge is incident to at least one vertex in $\mathsf{S} $.

Proof

(Proof of Theorem 1 ). Given a VertexCover instance with $\mathsf{G} $ and $\mathsf{k} $, let $n=|\mathsf{V} |$ and $m=|\mathsf{E} |$. We use $A_1,\ldots ,A_n$ to denote the n vertices and $e_1,\ldots ,e_m$ to denote the m edges.^{Footnote 5} We assume $n>\mathsf{k} $ and $m>n+\mathsf{k} $.^{Footnote 6} Let $W=nm$, $M=(n(2W+m)-1)^\frac{1}{\varepsilon }$, and $\delta >0$ be a sufficiently small real number.

We will construct the graph $G=(V,E,w)$ by constructing a hierarchy tree T which uniquely determines G (see Definition 3). The construction of T is shown in Fig. 2.

Each branch $A_i$ corresponds to each vertex $A_i$ in the VertexCover instance. For each edge $e_j$ in VertexCover, we construct n vertices $v_{1j},\ldots ,v_{nj}$ on the n branches respectively in the way shown. The numbers shown on the tree nodes represent the weights, where

$$ w_{ij}=\left\{ \begin{array}{ll} \frac{1-(n+\mathsf{k}-1)W\delta -(n-1)(j-1)\delta -\delta }{W-1+j} &{} \text { if edge }e_j\text { is incident to }A_i\\ \frac{1-(n+\mathsf{k}-1)W\delta -(n-1)(j-1)\delta -2\delta }{W-1+j} &{} \text {otherwise} \end{array} \right. . $$

For VertexCover with $\mathsf{k} =|\mathsf{S} |$, consider InfMax with $k=n+\mathsf{k} $ seeds. We aim to show that,

1.
If the VertexCover instance is a YES instance, we can infect at least M vertices;
2.
If the VertexCover instance is a NO instance, we can infect at most $M^\varepsilon =n(2W+m)-1$ vertices.

To show (1), suppose we have a YES VertexCover instance with a subset of vertices $\mathsf{S} \subseteq \mathsf{V} $ that covers all edges in $\mathsf{E} $. In the InfMax instance, we aim to show that at least M vertices will be infected if we choose

an arbitrary seed in each of the cliques $C_1,\ldots ,C_n$ (n seeds in total), and
an arbitrary seed in the clique $D_i$ for each $A_i\in \mathsf{S} $ ($\mathsf{k} $ seeds in total).

By such a choice of $k=n+\mathsf{k} $ seeds, in the first round of the cascade, all the W vertices in each of $C_1,\ldots ,C_n$ and each of those $\mathsf{k} $ $(D_i)$’s are infected. For each edge $e_j\in \mathsf{E} $, denote $e_j=(A_{i_j},A_{i_j'})$ such that $A_{i_j},A_{i_j'}$ are the two vertices cover the edge $e_j$. By our choice of seeds, a seed is chosen in at least one of $D_{i_j}$ and $D_{i_j'}$, let $D_{i_j}$ be the one. By a careful calculation, we can see that the cascade after the first round carries on in the following order:

$$v_{i_11}\rightarrow v_{i_1'1}\rightarrow \{v_{i1}\}_{i\ne i_1,i_1'}\rightarrow v_{i_22}\rightarrow v_{i_2'2}\rightarrow \{v_{i2}\}_{i\ne i_2,i_2'}\rightarrow \cdots $$

$$\rightarrow v_{i_mm}\rightarrow v_{i_m'm}\rightarrow \{v_{im}\}_{i\ne i_m,i_m'}\rightarrow B.$$

Therefore, we conclude (1) as we already have M infected vertices by just counting those in the bundle B.

To show (2) by contradiction, we assume that we can choose a seed set $S\subseteq V$ such that $|S|=k=n+\mathsf{k} $ and $\sigma (S)>M^\varepsilon $. By a careful analysis, we can conclude that the only possible way to choose S is as follow.

an arbitrary seed from each of $C_1,\ldots ,C_n$ (n seeds in total);
an arbitrary seed from each of $D_{\pi _1},\ldots ,D_{\pi _\mathsf{k}}$ for certain $\{\pi _1,\ldots ,\pi _k\}\subseteq \{1,\ldots ,n\}$ ($\mathsf{k} $ seeds in total).

While we refer the readers to the full version of the paper for a detailed proof of why this is true, here we provide an intuition for this. Firstly, choosing k seeds among the 2n cliques $C_1,\ldots ,C_n,D_1,\ldots ,D_n$ is considerably more beneficial, as a seed would cause the infection of W vertices. Secondly, if we cannot choose both $C_i$ and $D_i$, it is always better to choose $C_i$ because the weights $w_{i1},\ldots ,w_{im}$ are considerably larger than $\delta (1+1/W)$, if $\delta $ is set sufficiently small.

Since the VertexCover instance is a NO instance, there exists an edge $e_j=(A_{i_j},A_{i_j'})$ such that no vertex in $D_{i_j}$ and $D_{i_j'}$ is chosen as seed. By a careful calculation, we show that the cascade would stop at the level $\{v_{ij}\}_{i=1,\ldots ,n}$, which concludes (2).

By (1) and (2), the InfMax problem for G we have constructed is NP-hard to approximate within a factor of at least

$$\frac{M}{M^\varepsilon }=M^{1-\varepsilon }=\varTheta \left( N^{1-\varepsilon }\right) ,$$

as $N=M+M^\varepsilon =\varTheta (M)$. Since $\varepsilon $ is arbitrary, the inapproximability factor can be written as just $N^{1-\varepsilon }$.

In the hard InfMax instances in Fig. 2, if the VertexCover instance is a YES instance, the influence of the properly chosen seeds actually passes through these n branches “back-and-forth” frequently. It is exactly this bi-directional effect that makes InfMax hard. In the full version of this paper, we consider a variant to the hierarchical blockmodel in which the influence between any two vertex-blocks can only be “one-way”, and present a dynamic program to solve InfMax for this variant optimally.

4 Stochastic Hierarchical Blockmodel Influence Maximization

In the full version, we consider a stochastic variant of the hierarchical blockmodel, called stochastic hierarchical blockmodel. The stochastic hierarchical blockmodel is similar to the hierarchical blockmodel, in that the structure of the graph is determined by a hierarchy tree. Instead of assigning weights to different edges measuring the strength of relationships, here we assign a probability with which the edge between each pair of vertices appears.

We show in the full version that InfMax is NP-hard to approximate within factor $N^{1-\varepsilon }$. In particular, we consider two settings respecting if the seed-picker picks the seed before or after seeing the sampling of the graph G, and show that the same inapproximability holds for both settings.

5 2-Quasi-Submodular Influence Maximization

We present a sketch of the proof for the following Theorem in this section. The complete proof is available in the full version of our paper.

Theorem 2

For any fixed 2-quasi-submodular f, there exists a constant $\tau $ depending on f such that InfMax with universal local influence model $I_G^f$ is NP-hard to approximate to within factor $N^\tau $.

We consider two cases: $f(1)\ne 0$ and $f(1)=0$, and we only discuss the first case here. The proof for the case $f(1)=0$ is available in the full version. Denote $a_i=f(i)$ and $p^*=\lim _{i\rightarrow \infty }a_i$.

We prove the theorem by a reduction from the SetCover problem.

Definition 8

Given a universe U of n elements, a set of $\kappa $ subsets $A=\{A_i\mid A_i\subseteq U\}$, and a positive integer k, the SetCover problem asks if we can choose k subsets $\{A_{i_1},\ldots ,A_{i_k}\}\subseteq A$ such that $A_{i_1}\cup \cdots \cup A_{i_k}=U$.

We construct a graph G which consists of two parts: the set cover part and the verification part, where the set cover part simulates SetCover and the verification part verifies if all the elements in the SetCover instance are covered. The construction is shown in Fig. 3. We first assume that the graph G is directed, and then we show that this assumption is not essential by constructing a directed edge gadget to simulate directed edges.

Given a SetCover instance, in the set cover part, we use a single vertex to represent a subset $A_i$ and a clique of size m to represent each element in U. If an element is in a subset, we create m directed edges from the vertex representing the subset to each the m vertices in the clique representing the element. If a vertex representing a subset is picked, then all vertices in the cliques corresponding to the elements contained in this subset will be infected with probability close to $p^*$, by choosing m large enough. We call such cliques as being activated. In a YES instance of SetCover, we can choose k seeds such that all cliques are activated.

In the verification part, we construct a AND gadget, simulating the logical AND operation, to verify if all the cliques are activated. The AND gadget takes n inputs, each of which is a set of vertices from each of the n cliques. The output of the AND gadget is a vertex v, such that it will only be infected with a positive constant probability if all the n cliques are activated.

We connect the output vertex v of this AND gadget to a huge bundle of $M_1$ vertices, such that a constant fraction of those $M_1$ vertices will be infected only if all the cliques are activated (which corresponds to the case the SetCover is a YES instance). By making $M_1$ large enough, we can achieve a hardness of approximation ratio $N^\tau $. To avoid the seed-picker bypassing the set cover game by directed seeding the output vertex v, we duplicate the verification part by $M_2$ times for some sufficiently large $M_2$.

Finally, we replace all directed edges in Fig. 3 by directed edge gadgets, including those connecting the vertices representing subsets and the cliques representing elements, and those connecting the set cover part and the verification part. To complete the proof of Theorem 2, we present the construction of the AND gadget and the directed edge gadget in the full version of this paper. The construction of both gadgets rely on that f is 2-quasi-submodular.

Notes

1.
Previous work on community detection in networks [29] defines a different, but related stochastic hierarchical blockmodel, where the hierarchy is restricted to two levels.
2.
The rationale of sampling thresholds after the seeds selection is to capture the scenario that the seed-picker does not have the full information on the agents in a social network, and this setting has been used in many other works [24, 32].
3.
The phrase “cascade model” here, as well as in the abstract and Sect. 1, refers to the description how each vertex is influenced by its neighbors, which is completely characterized by F and ${\mathcal D}$ in the general threshold model.
4.
Since, as it will be seen later, each node in the hierarchy tree represents a community and its children represent its sub-communities, naturally, the relation between two persons is stronger if they are in a same sub-community in a lower level.
5.
We use the letter A to denote the vertices in a VertexCover instance instead of commonly used v, while v is used for the vertices in an InfMax instance. Since VertexCover can be viewed as a special case of SetCover with vertices corresponding to subsets and edges corresponding to elements, the letter A, commonly used for subsets, is used here.
6.
For the assumption $m>n+\mathsf{k} $, notice that allowing the graph $\mathsf{G} $ to be a multi-graph does not change the nature of VertexCover, we can ensure m to be sufficiently large by just duplicating edges.

References

Angell, R., Schoenebeck, G.: Don’t be greedy: leveraging community structure to find high quality seed sets for influence maximization. In: WINE (2017)
Google Scholar
Arthur, W.B.: Competing technologies, increasing returns, and lock-in by historical events. Econ. J. 99(394), 116–131 (1989). http://www.jstor.org/stable/2234208
Article Google Scholar
Backstrom, L., Huttenlocher, D.P., Kleinberg, J.M., Lan, X.: Group formation in large social networks: membership, growth, and evolution. In: ACM SIGKDD (2006)
Google Scholar
Banerjee, A., Chandrasekhar, A.G., Duflo, E., Jackson, M.O.: The diffusion of microfinance. Science 341(6144), 1236498 (2013)
Article Google Scholar
Bass, F.M.: A new product growth for model consumer durables. Manag. Sci. 15(5), 215–227 (1969)
Article MATH Google Scholar
Bharathi, S., Kempe, D., Salek, M.: Competitive influence maximization in social networks. In: Deng, X., Graham, F.C. (eds.) WINE 2007. LNCS, vol. 4858, pp. 306–311. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-77105-0_31
Chapter Google Scholar
Borgs, C., Brautbar, M., Chayes, J.T., Lucier, B.: Influence maximization in social networks: towards an optimal algorithmic solution. CoRR (2012)
Google Scholar
Brown, J.J., Reingen, P.H.: Social ties and word-of-mouth referral behavior. J. Consum. Res. 14, 350–362 (1987)
Article Google Scholar
Centola, D.: The spread of behavior in an online social network experiment. Science 329(5996), 1194–1197 (2010)
Article Google Scholar
Chen, N.: On the approximability of influence in social networks. SIAM J. Discret. Math. 23(3), 1400–1415 (2009)
Article MathSciNet MATH Google Scholar
Chen, W., Wang, Y., Yang, S.: Efficient influence maximization in social networks. In: ACM SIGKDD, pp. 199–208. ACM (2009)
Google Scholar
Chen, W., Yuan, Y., Zhang, L.: Scalable influence maximization in social networks under the linear threshold model. In: 2010 IEEE International Conference on Data Mining, pp. 88–97. IEEE (2010)
Google Scholar
Clauset, A., Moore, C., Newman, M.E.: Hierarchical structure and the prediction of missing links in networks. Nature 453(7191), 98–101 (2008)
Article Google Scholar
Coleman, J., Katz, E., Menzel, H.: The diffusion of an innovation among physicians. Sociometry 20, 253–270 (1957)
Article Google Scholar
Conley, T.G., Udry, C.R.: Learning about a new technology: Pineapple in Ghana. Am. Econ. Rev. 100(1), 35–69 (2010)
Article Google Scholar
DiMaggio, P.: Structural analysis of organizational fields: a blockmodel approach. Res. Organ. Behav. 8, 335–370 (1986)
Google Scholar
Domingos, P., Richardson, M.: Mining the network value of customers. In: ACM SIGKDD (2001)
Google Scholar
Goldenberg, J., Libai, B., Muller, E.: Using complex systems analysis to advance marketing theory development: modeling heterogeneity effects on new product growth through stochastic cellular automata. Acad. Mark. Sci. Rev. 9(3), 1–18 (2001)
Google Scholar
Goyal, S., Kearns, M.: Competitive contagion in networks. In: STOC, pp. 759–774 (2012)
Google Scholar
Granovetter, M.: Threshold models of collective behavior. Am. J. Sociol. 83(6), 1420–1443 (1978). http://www.journals.uchicago.edu/doi/abs/10.1086/226707
Article Google Scholar
He, X., Kempe, D.: Robust influence maximization. In: ACM SIGKDD (2016)
Google Scholar
Holland, P.W., Laskey, K.B., Leinhardt, S.: Stochastic blockmodels: first steps. Soc. Netw. 5(2), 109–137 (1983)
Article MathSciNet Google Scholar
Jackson, M.O.: Social and Economic Networks. Princeton University Press, Princeton (2008)
MATH Google Scholar
Kempe, D., Kleinberg, J.M., Tardos, É.: Maximizing the spread of influence through a social network. In: ACM SIGKDD, pp. 137–146 (2003)
Google Scholar
Kempe, D., Kleinberg, J., Tardos, É.: Influential nodes in a diffusion model for social networks. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 1127–1138. Springer, Heidelberg (2005). https://doi.org/10.1007/11523468_91
Chapter Google Scholar
Lerman, K., Ghosh, R.: Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. In: ICWSM, pp. 90–97 (2010)
Google Scholar
Leskovec, J., Adamic, L.A., Huberman, B.A.: The dynamics of viral marketing. In: EC (2006)
Google Scholar
Lucier, B., Oren, J., Singer, Y.: Influence at scale: distributed computation of complex contagion in networks. In: ACM SIGKDD, pp. 735–744. ACM (2015)
Google Scholar
Lyzinski, V., Tang, M., Athreya, A., Park, Y., Priebe, C.E.: Community detection and classification in hierarchical stochastic blockmodels. arXiv (2015)
Google Scholar
Mahajan, V., Muller, E., Bass, F.M.: New product diffusion models in marketing: a review and directions for research. J. Mark. 54, 1–26 (1990)
Article Google Scholar
Morris, S.: Contagion. Rev. Econ. Stud. 67(1), 57–78 (2000)
Article MathSciNet MATH Google Scholar
Mossel, E., Roch, S.: Submodularity of influence in social networks: from local to global. SIAM J. Comput. 39(6), 2176–2188 (2010)
Article MathSciNet MATH Google Scholar
Richardson, M., Domingos, P.: Mining knowledge-sharing sites for viral marketing. In: ACM SIGKDD, pp. 61–70 (2002)
Google Scholar
Romero, D.M., Meeder, B., Kleinberg, J.: Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on Twitter. In: WWW, pp. 695–704. ACM (2011)
Google Scholar
Seeman, L., Singer, Y.: Adaptive seeding in social networks. In: FOCS, pp. 459–468. IEEE (2013)
Google Scholar
Horel, T., Singer, Y.: Maximization of approximately submodular functions. In: NIPS (2016). http://papers.nips.cc/paper/6236-maximization-of-approximately-submodular-functions.pdf
Watts, D.J.: A simple model of global cascades on random networks. Proc. Nat. Acad. Sci. 99(9), 5766–5771 (2002). http://www.pnas.org/content/99/9/5766.abstract
Article MathSciNet MATH Google Scholar
White, H.C., Boorman, S.A., Breiger, R.L.: Social structure from multiple networks. I. Blockmodels of roles and positions. Am. J. Sociol. 81(4), 730–780 (1976)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer Science, University of Michigan, Ann Arbor, USA
Grant Schoenebeck & Biaoshuai Tao

Authors

Grant Schoenebeck
View author publications
You can also search for this author in PubMed Google Scholar
Biaoshuai Tao
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Biaoshuai Tao .

Editor information

Editors and Affiliations

Microsoft Research, Redmond, Washington, USA
Nikhil R. Devanur
Shanghai University of Finance and Economics, Shanghai, China
Pinyan Lu

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Schoenebeck, G., Tao, B. (2017). Beyond Worst-Case (In)approximability of Nonsubmodular Influence Maximization. In: R. Devanur, N., Lu, P. (eds) Web and Internet Economics. WINE 2017. Lecture Notes in Computer Science(), vol 10660. Springer, Cham. https://doi.org/10.1007/978-3-319-71924-5_26

Download citation

DOI: https://doi.org/10.1007/978-3-319-71924-5_26
Published: 25 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-71923-8
Online ISBN: 978-3-319-71924-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Beyond Worst-Case (In)approximability of Nonsubmodular Influence Maximization

Abstract

Similar content being viewed by others

Parameterized Approximability of Maximizing the Spread of Influence in Networks

A general greedy approximation algorithm for finding minimum positive influence dominating sets in social networks

Hardness of Bounding Influence via Graph Modification