Keywords

1 Introduction

Online social networks, such as WeChat, Facebook, Twitter, have been important platforms for people to communicate and for business to advertise. People keep in touch with each other and make friends through social networks, they also like to share their innovations and ideas, etc., in the social networks [1, 2]. According to statistics, there are 3.725 billion users active in the social networks by Dec. 2019. Companies make use of the advantage of large crowds and rapid information dissemination in social networks to promote their products. Motivated by the information propagation in social networks, Influence Maximization (IM) problem is put forward by Kempe et al. in [3]. They formulate the IM problem as: selecting a set of users as seeds to maximize the expected number of users who are influenced by seeds. Influence maximization finds applications in many domain, like viral marketing. Viral marketing makes good use of the word-of-mouth effect of the social networks, and it promotes products by giving discounts to a small set of customers to spread their product information. Two classic influence spread model: Independent Cascade (IC) model and Linear Threshold (LT) model, are proposed by Kempe et al. in [3]. They also prove that the influence propagation function is monotone and submodular, thus, the greedy algorithm can be applied to solve the IM problem and obtain an \(1-1/e-\epsilon \) approximation ratio.

In real life, it is very common for multiple types of information propagate simultaneously in the social network. For example, in 2022 Apple Corporation issues two kinds of new iPhone, iPhone 14 is more cost-effective and iPhone 14 Pro is more powerful but more expensive. Although iPhone 14 and iPhone 14 Pro are the same kind of product, but they have different prices, appearances and performances, which can attract different groups of customers with different requirements. Based on the described background, consider the scenario where there are k kinds of products, a constraint K for the number of selected seeds, and a budget B for the activation cost of all selected seeds. Assuming that different seeds have different activation costs and different profits can be obtained when the product is purchased, which nodes to be selected as seeds and how to allocate the budget to seeds for k kinds of products such that the total profit is maximized? We study the profit maximization problem for competitive products in this paper, and assume that each seed can accept the discount of only one product for the fairness. We aim to allocate discounts to k sets of seed users for k kinds of products under two constraints K and B. The objective function that maximizes the total profit of k kinds of products can be formulated as a k-submodular function. Approximation algorithms with theoretical guarantee are proposed in our work. We summarize the main contributions in this paper as follows:

  • We formulate the profit maximization for competitive influence spread problem as a k-submodular function with both a knapsack constraint and a cardinality constraint problem. To best of our knowledge, this is the first time to study a monotone k-submodular maximization problem under both the knapsack and cardinality constraints in social networks.

  • We propose a Singleton+Greedy-Local-Search Algorithm in four steps, which obtains two approximation ratios: 0.216 and 0.158, in two different conditions.

2 Related Work

In this paper, we consider the profit maximization problem with multiple kinds of products in the network. We summarize the related studies on our work as follows.

Influence Maximization with Competitive Influence Spread: Most of the existing relevant studies on influence maximization consider the scenario of a single kind of information spreading in the social network. The competitive IM problem is firstly studied by Bharathi et al. [4]. They propose a game theory based method to solve it. Liang et al. [5] consider that multiple kinds of similar products promote in the network at the same time and the target of the promotion is a specific user group. They formulate such a problem as Targeted Influence Maximization in Competitive social networks (TIMC). A reverse reachable set based greedy method is proposed by them to solve the TIMC with approximation performance guarantee. Wu et al. [6] consider a scenario that multiple information propagate in a social network with different propagation probabilities. The problem is formulated as maximizing the total influence of all the different information under a constraint of seed budget k. They present a greedy algorithm with \(\frac{1}{3}\) approximation ratio, and also propose a parallel algorithm which improves the efficiency of the algorithm.

Profit Maximization: Profit maximization problem is a transformation of the influence maximization problem, which aims at maximizing the profit of products by selecting a seed set with a limited budget. Zhang et al. [7] study the Profit Maximization with Multiple Adoptions (\(PM^2A\)) problem. Two different approximation algorithms are devised to maximize the total profit of multiple kinds of product by selecting a limited number of seeds. Chen et al. [8] propose a Randomized Modified Greedy (RMG) algorithm to solve the Profit Maximization with Multiple Adoptions (\(PM^2A\)) problem, which obtain an \((1-1/e-\epsilon )\) approximation ratio. Yuan et al. [9] design two discount allocation strategies under the non-adaptive setting and adaptive setting, respectively, which achieve the goal of maximizing the expected number of users who adopt the product finally.

k -Submodular Maximization: Huber et al. [10] firstly define the k-submod- ular function, which is a generalization of the submodular function. Ohsaka et al. [11] study the maximization for a monotone k-submodular function with two different size constraints, and propose greedy algorithms with constant approximation factor. Tang et al. [12] study the maximization for a non-negative monotone k-submodular function with a knapsack constraint, they present a greedy algorithm which can obtain an \(\frac{1}{2}-\frac{1}{2e}\) approximation ratio with an \(O(n^4k^3)\) time complexity. Wang et al. [13] propose a framework for relaxing a k-submodular function to continuous space with the technique of multilinear extension. They also improve the approximation ratio to \(1/2-\epsilon \) for maximizing a monotone k-submodular function with knapsack constraint. V. Pham et al. [14] explore the applications of maximizing the k-submodular function under the knapsack constraint in influence maximization of social networks and the sensor placement. However, for the monotone k-submodular maximization problem under the knapsack constraint and cardinality constraint, to best of our knowledge, there is no related conclusion about it in social networks. Thus, we try to fill this gap and apply it to the social networks.

3 Diffusion Model and Problem Definition

3.1 The Diffusion Model

A social network is constructed as a directed graph G(VE), where each node \(v\in V\) represents a user, and each edge \((u,v)\in E\) represents that user v follows user u, u is an incoming neighbor of v, while v is an outgoing neighbor of u. The incoming neighbor set and the outgoing neighbor set of a node v are denoted as \(N^-(v)\) and \(N^+(v)\), respectively. IC model is used as the influence propagation model in our problem, its influence propagation process is described as follows.

Definition 1 (IC model)

Nodes in social networks have two different states: active and inactive, all nodes are initially inactive. There is a activation probability \(p_{uv}\in (0,1]\) associated with each edge \(e=(u,v)\in E\). When a node u is firstly activated at time t, then for each of his inactive outgoing neighbor \(v\in N^+(u)\), he can activate them with a probability \(p_{uv}\) at time \(t+1\). Finally, the influence propagation process terminates if there are no newly activated nodes in the future.

3.2 Problem Formulation

Give a social network \(G=(V,E)\). Assume that marketers wants to promote k kinds of products in the social network. k kinds of products information propagate under the IC model at the same time. We aim to choose k seed sets \(S=\{S_1, S_2,\cdots , S_k\}\) and provide discounts to them, and assume that each seed user can only be used for propagating at most one kind of product information. As different user can bring different levels of influence, so we give different discounts to different seeds, and influential users get bigger discounts. Let \(\sigma (S_i)\) be the expected influence spread of seed set \(S_i\) for product i, i.e., the expected number of users who adopt product i in the social network. Let \(f(S_i)\) be the total profit that obtained by purchasing product i. Moreover, \(\sigma (S|G)\) and f(S|G) are the expected number of influenced people and the total profit obtained by adopting k kinds of products, respectively.

The profit maximization problem for competitive products marketing at the same time in the social network with a total activation cost B and a total number of seeds K constraints can be formulated as follows:

Problem 1

(Profit Maximization Problem for Competitive Influence Spread (PMPCIS)). Given a social network graph \(G=(V,E)\), k kinds of products, the IC model, the cost c(a) that activating a node a to purchase a product, the profit \(p_i\) that a node can gain when he adopts product i, the seed set \(S=\{S_1, S_2,\cdots , S_k\}\), where \(S_i\) is selected for propagating the information of the product i and \(S_i\cap S_j=\varnothing \) for any \(i, j \in [1,k]\) and \(i\ne j\). The total number of selected seeds is represented by \(|S|=\sum _{i=1}^k|S_i|\) and the upper bound is K. The total activation cost for the seed set S is denoted by \(c(S)=\sum _{i=1}^k\sum _{a\in S_i}c(a)\) and the total budget is given as B. The expected influence spread for seed set \(S_i\) is expressed by \(\sigma (S_i)\). Our target is to select an optimal seed set \(S=\{S_1, S_2,\cdots , S_k\}\) such that the total profit f(S) is maximized, i.e.,

$$\begin{aligned} \begin{aligned} &S^*=\arg \max \limits _{S} \ f(S)\\ &s.t.\quad c(S)\le B\\ &\qquad |S|\le K. \end{aligned} \end{aligned}$$
(1)

From the definition of PMPCIS, we can intuitively get \(f(S)=\sum _{i=1}^kp_i\sigma (S_i)\). In literature [3], Kempe et al. have proved that the classical influence maximization problem in IC model is a NP-hard problem. When the types of products are reduced to one, our PMPCIS is equivalent to the traditional IM problem for IC model. Thus, the PMPCIS is a NP-hard problem.

4 Solution for PMPCIS

We propose solution for PMPCIS in this section. At first, we analyze the properties of the objective function f for PMPCIS.

4.1 Properties for Objective Function f

Firstly, we introduce an important property for a set function: k-submodular. Let X be a finite non-empty set, and let \((k+1)^X:=\{(U_1,\cdots , U_k)~|~U_i\subseteq X, \forall i\in \{1,2,\cdots ,k\}, U_i\cap U_j=\varnothing , \forall i\ne j \}\) be the family of k disjoint sets \((U_1,\cdots , U_k)\). A function h: \((k+1)^X\rightarrow \mathbb {R}\) is k-submodular if for any \(\boldsymbol{U}=\{U_1, \cdots , U_k\}\) and \(\boldsymbol{W}=\{W_1,\cdots , W_k\}\) in \((k+1)^X\), it satisfies,

$$\begin{aligned} \begin{aligned} h(\boldsymbol{U})+h(\boldsymbol{W})\ge h(\boldsymbol{U}\sqcup \boldsymbol{W})+h(\boldsymbol{U}\sqcap \boldsymbol{W}), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \boldsymbol{U}\sqcap \boldsymbol{W}\,{:}{=}\,(U_1\cap W_1,\cdots ,U_k\cap W_k), \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned} \boldsymbol{U}\sqcup \boldsymbol{W}\,{:}{=}\,\left( U_1\cup W_1\backslash \left( \bigcup \nolimits _{i\ne 1}U_i\cup W_i\right) ,\cdots ,U_k\cup W_k\backslash \left( \bigcup \nolimits _{i\ne k}U_i\cup W_i\right) \right) . \end{aligned} \end{aligned}$$

If a function satisfies the properties of orthant submodularity and pairwise monotonicity at the same time, which indicates that it is a k-submodular function. It is very intuitively to verify that our objective function is k-submodularity.

Then, we elaborate on additional characteristics and notations of k-submodu- lar functions for our problem. Every k-tuple \(\textbf{x}=(X_1,\ldots ,X_k)\in (k+1)^V\) uniquely corresponds to a set \(A=\{(a,d)~|~a\in X_d, d\in [k]\}\) composed of item-dimension pairs. Hence, a user-product pair (ad) is included in set A (termed as a solution) if and only if \(a\in X_d\) in \(\textbf{x}\).

In our problem, for ease of presentation, we write \(\textbf{x}\) and its corresponding solution A interchangeably. For any solution \(A\in (k+1)^V\), we define \(U(A):=\{a\in V~|~\exists ~d\in [k],~(a,d)\in A\}\) to be the set of seed included, and the size is \(|A|=|U(A)|\).

The marginal gain of adding a user-product pair (ad) to A is

$$\varDelta _{a,d}f(A)=f(A\cup \{(a,d)\})-f(A),$$

and the marginal density is \(\frac{\varDelta _{a,d}f(A)}{c(a)}\). As the profit maximization function f is monotone k-submodular, it satisfies the pairwise monotonicity

$$\begin{aligned} \varDelta _{a,d}f(A)+ \varDelta _{a,l}f(A)\ge 0, \end{aligned}$$

for any \(d, l\in [k]\) and \(d\ne l\). And it also satisfies the orthant submodularity

$$\begin{aligned} \varDelta _{a,d}f(A)\ge \varDelta _{a,d}f(C), \end{aligned}$$
$$\begin{aligned} \forall A, C\in (k+1)^V~\text { with }~ A\subseteq C, a\notin U(C), d\in [k]. \end{aligned}$$

In this problem, each seed node \(a\in V\) has a non-negative cost c(a), and the cost of solution A is \(c(A)=\sum _{a\in U(A)}c(a)\). The goal is to find a solution that maximizes the function value within the given budget \(B\in \mathbb Z_+\) and size \(K\in \mathbb Z_+\). Then we rewrite the PMPCIS in Eq. (1) as follows.

$$\begin{aligned} \begin{aligned} &A^*=\arg \max \limits _{A\in (k+1)^V} \ f(A)\\ &s.t.\quad c(A)\le B\\ &\qquad |A|\le K. \end{aligned} \end{aligned}$$
(2)

4.2 Proposed Algorithm

We present our solution for the proposed PMPCIS in four steps. In the first step, we consider unconstrained profit maximization problem and propose a simple greedy algorithm in Algorithm 1. Given a node set \(V'=\{e_1,e_2,\ldots ,e_m\}\) and \(V'\subseteq V\). Firstly, we remove the two constraints of the problem in Eq. 2, then, Algorithm 1 is devised by greedily finding the maximum function value \(\max _{A\in (k+1)^{V'}}f(A)\) without any constraint.

Let \(T=\{(e_1,d_1^*),\ldots ,(e_m,d_m^*)\}\) be an optimal solution that maximizes the function f(A) over \(V'\). Assume without loss of generality that the seed set are obtained by Greedy Algorithm in the order of \(\{e_1,e_2,\ldots ,e_m\}\), and denote the returned greedy solution by \(A=\{(e_1,d_1),\ldots ,(e_m,d_m)\}\). For \(j=\{0,1,\ldots ,m\}\), define \(A_j=\{(e_1,d_1),\ldots ,(e_j,d_j)\}\). We have \(A_0=\varnothing \) and \(A_m=A\). The following Lemma 1 is crucial to our subsequent proof.

Lemma 1

([15, 16]). If f is monotone, for \(t=\{0,1,\ldots ,m\}\), we have \(f(T)\le 2f(A_t)+\sum _{e_i\in U(T)\setminus U(A_t)}\varDelta _{e_i,d_i^*}f(A_t)\).

Algorithm 1
figure a

. A Simple Greedy Algorithm

Algorithm 2
figure b

. Greedy-Knapsack Algorithm

Then we propose the following Algorithm 2. In Algorithm 2, we select the user who can bring the largest marginal density for objective function f in each iteration, and the selected users can not violate the budget and size constraint at the same time. Algorithm 2 returns the solution \(A_t\).

Next, we devise the Algorithm 3, which aims at further optimizing the solution \(A_t\). If we have solution with size K that is obtained with the Algorithm 2, we can execute the local search procedure utilizing Algorithm 3. We input the solution obtained by Algorithm 2, and denote it as a feasible solution \(A'\). In each iteration, we try to swap a pair of selected and unselected items, aiming to augment the objective value while ensuring adherence to the knapsack constraint. In Algorithm 3, when we obtain the new seed set after the exchange in step 6, we invoke the Algorithm 1 Greedy Algorithm in step 7 to reassign the type of product that the seeds correspond to since the selected seed set may change.

Our main algorithm is as shown in Algorithm 4, which combines the singleton optimum, the greedy strategy in Algorithm 2 and the local search strategy in Algorithm 3. In step 1 of Algorithm 4, we find out the node that can maximize the objective function f among all the nodes, then \(A^*\) is the item pair with a single node set and the product it corresponds.

Algorithm 3
figure c

. Local-Search Algorithm

Algorithm 4
figure d

. Singleton+Greedy-Local-Search Algorithm

If \(f(A^*)\ge f(A)\), then \(A^*\) is returned as the final solution. We always find the node that maximize the marginal density as seeds in Algorithm 3, which may miss some nodes that can bring large profit and also need large costs at the same time, but have small marginal density. However, such nodes are actually a good solution. So the step 1 in Algorithm 4 is to find such kind of nodes. The returned solution \(A^*\) by the Algorithm 4 is a seed-product pair set, but in our original objective function in Eq. 1, we need to find an optimal seed set \(S^*\), so the nodes in \(A^*\) is the returned solution for our PMPCIS in Eq. (1).

4.3 Approximation Performance Analysis

In this section, we analyze the approximation performance guarantee of the proposed Algorithm 4. We intend to establish the approximation ratio in Theorem 1 following the proof framework in [17]. The subsequent observation will be used twice in the proof of Theorem 1.

Lemma 2

Let \(A^1,A^2\in (k+1)^V\) be two node sets with \(|A^1|\le K\) and \(|A^2|=K\), then for any one-to-one function \(y:U(A^1)\backslash U(A^2)\rightarrow U(A^2)\backslash U(A^1)\), we have \(\sum _{x\in U(A^1)\backslash U(A^2)}f(A^2)-f(A^2\backslash \{(y(x),d_{y(x)})\})\le f(A^2)\) where \((y(x),d_{y(x)})\in A^2\).

Proof

Let \(U(A^1)\backslash U(A^2)=\{x_1,x_2,\ldots ,x_{K'}\}\). For any one-to-one function y, suppose \(y_j=y(x_j)\) for each \(j\in \{1,\ldots ,K'\}\). Then we have

$$\begin{aligned} &\sum _{x\in U(A^1)\backslash U(A^2)}f(A^2)-f(A^2\backslash \{(y(x),d_{y(x)})\})\\ \le &~\sum _{j=1}^{K'}f(A^2\backslash \{(y_1,d_{y_1}),\ldots ,(y_{j-1},d_{y_{j-1}})\})-f(A^2\backslash \{(y_1,d_{y_1}),\ldots ,(y_{j},d_{y_{j}})\})\\ \le &~f(A^2), \end{aligned}$$

where \(j=1\), \(\{(y_1,d_{y_1}),\ldots ,(y_{j-1},d_{y_{j-1})}\}\) is an empty set.

Theorem 1

Algorithm 4 has an approximation ratio at least \(\frac{1}{6}(1-e^{-3})\approx 0.15836\).

Proof

Let T be the optimal solution with \(|T|\ge 2\) and A be the seed set output by Algorithm 4. If \(|A|<K\), then A is a \(\frac{1-e^{-2}}{4}\)-approximation solution of maximum k-submodular with a knapsack constraint [16]. Thus, A is also a \(\frac{1-e^{-2}}{4}\)-approximation solution of maximum our k-submodular objective function f with a knapsack constraint and a cardinality constraint.

If \(|A|=K\), we distinguish between two cases based on the last iteration of the algorithm.

Case 1: For any node \(x\in U(T)\backslash U(A)\), node \(y\in U(A)\backslash U(T)\), the swap (xy) was rejected because \(\rho _{(x,y)}\le 0\). Since A is a greedy solution, by Lemma 1, we derive

$$\begin{aligned} f(T)\le &~2f(A)+\sum _{x\in U(T)\backslash U(A)}[f(A\cup \{(x,d_x^*)\})-f(A)]\\ \le &~2f(A)+\sum _{x\in U(T)\backslash U(A)}[f(A\cup \{(x,d_x^*)\}\backslash \{(y(x),d_{y(x)})\})-f(A\backslash \{(y(x),d_{y(x)})\})]\\ \le &~2f(A)+\sum _{x\in U(T)\backslash U(A)}[f(A)-f(A\backslash \{(y(x),d_{y(x)})\})]\\ \le &~3f(A), \end{aligned}$$

where \((x,d_x^*)\in T\) and y(x) is a one-to-one function with \((y(x),d_{y(x)})\in A\). The second inequality holds because of submodularity. The third inequality holds as \(\rho _{(x,y)}\le 0\). The last inequality holds by Lemma 2. Thus, in this case, we have Algorithm 4 yields a \(\frac{1}{3}\)-approximation on the optimum.

Case 2: At least one swap for node pair (xy) with \(\rho _{(x,y)}>0\), for \(x\in U(T)\backslash U(A)\), \(y\in U(A)\backslash U(T)\) was rejected because \(c(x)-c(y)+c(A)>B\).

Let \(A_t\) be the partial greedy solution after t iterations as shown in Algorithm 2 and . Let \(l + 1\) be the first iteration in which a swap \((a_{l+1},b_{l+1})\) was rejected since it violates the knapsack constraint where \(a_{l+1}\in U(T)\backslash U(A_l)\). We can further assume that \(l + 1\) is the first iteration t for which \(A_t=A_{t-1}\). Since \(A_t\) is a greedy solution for \(t=0,1,\ldots ,l\), by Lemma 1, we have

$$f(T)\le 2f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\})-f(A_t)],$$

where \((x,d_x^*)\in T\).

For \(t=0,\ldots ,K-1\), \(|A_t|=t\), we have

$$\begin{aligned} f(T)\le &~2f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\})-f(A_t)]\le 2f(A_t)+B\rho _{t+1}. \end{aligned}$$

The second inequality holds since \(c(T)\le B\).

For \(t=K,\ldots ,l\), \(|A_t|=K\), we have

$$\begin{aligned} f(T)\le &~2f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\})-f(A_t)]\\ \le &~2f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\}\backslash \{(y(x),d_{y(x)})\})-f(A_t\backslash \{(y(x),d_{y(x)})\})]\\ \le &~2f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\}\backslash \{(y(x),d_{y(x)})\})-f(A_t)]+\\ &~\sum _{x\in U(T)\backslash U(A_t)}[f(A_t)-f(A_t\backslash \{(y(x),d_{y(x)})\})]\\ \le &~3f(A_t)+\sum _{x\in U(T)\backslash U(A_t)}[f(A_t\cup \{(x,d_x^*)\}\backslash \{(y(x),d_{y(x)})\})-f(A_t)]\\ \le &~3f(A_t)+B\rho _{t+1}. \end{aligned}$$

The second inequality holds because of submodularity. The fourth inequality holds by Lemma 2. The last inequality holds since \(c(T)\le B\).

Therefore, for each \(t=0,1,\ldots ,l\), we have

$$f(T)\le 3f(A_t)+B\rho _{t+1}.$$

Take advantage of the techniques in [17, 18], we get

$$\frac{f(A_l\backslash \{(b_{l+1},d_{b_{l+1}})\}\cup \{(a_{l+1},d_{a_{l+1}})\})}{f(T)}\ge \frac{1}{3}(1-e^{-3}),$$

where \(\rho _{(a_{l+1},b_{l+1})}=\frac{f(A_l\backslash \{(b_{l+1},d_{b_{l+1}})\}\cup \{(a_{l+1},d_{a_{l+1}})\})-f(A_l)}{c(a_{l+1})}\).

Therefore, Singleton+Greedy-Local-Search has a function value at least

$$\begin{aligned} &\max \{f(A_l),f(\{(a_{l+1},d_{a_{l+1}})\})\}\ge \frac{1}{2} f(A_l\cup \{(a_{l+1},d_{a_{l+1}})\})\\ \ge ~&\frac{1}{2} f(A_l\backslash \{(b_{l+1},d_{b_{l+1}})\}\cup \{(a_{l+1},d_{a_{l+1}})\})\ge \frac{1}{6}(1-e^{-3})f(T). \end{aligned}$$

The theorem is proved.

5 Conclusion

In this paper, we investigate the profit maximization problem for k kinds of competitive products information spreading at the same time in social networks. Considering that one seed user spreads multiple product information at the same time may disperse its followers’ attention, one seed user can only propagate the influence of one kind of product. The goal of the proposed problem is to select k subsets of users as seeds with a budget B and a seed size K constraints such that the total profit for k kinds of products is maximized. Our optimal problem is formulated as maximizing a monotone k-submodular function under a knapsack constraint and a cardinality constraint. A Singleton+Greedy-Local-Search Algorithm is put forward in four steps to solve the profit maximization problem, which achieves a 0.216 and 0.158 approximation performance guarantees in two different cases, respectively.