1 Introduction

Maximizing k-submodular functions has attracted a lot of attention because of its potential in solving various combinatorial optimization problems such as influence maximization (Ohsaka and Yoshida 2015; Rafiey and Yoshida 2020; Qian et al. 2018; Nguyen and Thai 2020), sensor placement (Ohsaka and Yoshida 2015; Rafiey and Yoshida 2020; Qian et al. 2018), feature selection (Singh et al. 2012) and information coverage maximization (Qian et al. 2018). Given a finite set V and an integer k, we define \([k]=\{1, 2, \ldots , k\}\) and \((k+1)^V=\{(X_1, X_2, \ldots , X_k)\, |\, X_i \subseteq V, \forall i \in [k], X_i \cap X_j =\emptyset , \forall i \ne j\}\) as the family of k pairwise disjoint subsets of V; each member is called a k-set. A function \(f: (k+1)^V \mapsto \mathbb {R}_+\) is k-submodular iff for any \(\mathbf{x }=(X_1, X_2, \ldots , X_k)\) and \(\mathbf{y }=(Y_1, Y_2, \ldots , Y_k)\) \(\in (k+1)^V\), we have:

$$\begin{aligned} f(\mathbf{x })+ f(\mathbf{y }) \ge f(\mathbf{x }\sqcap \mathbf{y }) + f(\mathbf{x }\sqcup \mathbf{y }) \end{aligned}$$
(1)

where

$$\mathbf{x }\sqcap \mathbf{y }=(X_1 \cap Y_1, \ldots , X_k \cap Y_k)$$

and

$$\mathbf{x }\sqcup \mathbf{y }=\left( (X_1 \cup Y_1) \setminus \bigcup _{i \ne 1} (X_i \cup Y_i), \ldots , (X_k \cup Y_k) \setminus \bigcup _{i \ne k} (X_i \cup Y_i)\right) $$

In addition to the unconstrained case (Ward and Zivný 2014; Iwata et al. 2016; Soma 2019; Oshima 2017), researchers have also studied the problem under size constraints (Rafiey and Yoshida 2020; Ohsaka and Yoshida 2015; Qian et al. 2018; Nguyen and Thai 2020), matroid constraints (Sakaue 2017; Rafiey and Yoshida 2020) and knapsack constraints (Tang et al. 2022; Wang and Zhou 2021). However, these settings do not cover several real applications in which each element has its own cost and the total budget is limited. We discuss the following two applications:

Influence Maximization with k topics under a budget constraint. Consider a social network under an information diffusion model with k topics. Each user has a cost to start the influence under a topic, which manifests how hard it is to initially influence the respective person for that topic. Given a budget B, we consider the problem of finding a set of users (seed set), each of whom initially adopts a topic, with total cost at most B, to maximize the expected number of users activated by at least one topic.

Sensor placement with k types of measures under a budget constraint. Given k types of sensors for different measures and n locations, each location can be instrumented with exactly one sensor. We assume that placing a sensor at a location incurs a cost that depends on the location and the type of sensor. Given a budget B, we consider the problem of allocating sensors to maximize the information gained with total cost at most B.

In the two applications above, the objective functions are k-submodular (Ohsaka and Yoshida 2015; Rafiey and Yoshida 2020; Nguyen and Thai 2020). Although there have been many attempts to maximize k-submodular functions, they do not cover the case in which an element has different costs when added into different sets of the solution under a limited budget, as shown in the two examples above. Motivated by this observation, in this work we study a novel problem named Budgeted k-submodular maximization (\(\textsf {BkSM}\)), defined as follows.

Definition 1

(\(\textsf {BkSM}\) problem) Given a finite set V, a budget B and a k-submodular function \(f: (k+1)^V \mapsto \mathbb {R}_+\), where each element \(e \in V\) has a cost \(c_i(e)>0\) when added into the i-th set, the problem asks to find a solution \(\mathbf{s }=(S_1, S_2, \ldots , S_k)\) with total cost \(c(\mathbf{s })=\sum _{i \in [k]}\sum _{e \in S_i} c_i(e)\le B\) so that \(f(\mathbf{s })\) is maximized.

In addition, the continual growth of input data makes it impossible to store all of the data in memory. It is therefore critical to devise streaming algorithms for \(\textsf {BkSM}\): a streaming algorithm receives each element of the ground set sequentially and keeps only a small number of elements in memory at any point. After one or a few passes over the ground set, the algorithm returns a solution with a performance guarantee (Badanidiyuru et al. 2014; Yang et al. 2019; Rafiey and Yoshida 2020).

1.1 Our contributions

To address the above challenges, in this paper we propose two single-pass streaming algorithms with provable guarantees for \(\textsf {BkSM}\). Overall, our contributions are as follows:

  • For the special case in which an element has the same cost regardless of which set it is added into, we first propose a deterministic streaming algorithm (Algorithm 2) which runs in a single pass, has \(O( \frac{kn}{\epsilon }\log n)\) query complexity and \(O\left( \frac{n}{\epsilon } \log n\right) \) space complexity, and returns an approximation ratio of \(\frac{1}{4}-\epsilon \) when f is monotone and \(\frac{1}{5}-\epsilon \) when f is non-monotone, for any input parameter \(\epsilon \in (0,\frac{1}{5})\).

  • For the general case, we propose a randomized streaming algorithm (Algorithm 4) which runs in a single pass, has \(O( \frac{kn}{\epsilon }\log n)\) query complexity and \(O\left( \frac{n}{\epsilon } \log n\right) \) space complexity, and returns an approximation ratio of \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}-\epsilon \) when f is monotone and \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \}-\epsilon \) when f is non-monotone, in expectation, where \(\beta =\max _{e\in V, i , j \in [k], i\ne j} \frac{c_i(e)}{c_j(e)}\) and \(\alpha \in (0,1], \epsilon \in (0,1)\) are input parameters.

  • We conduct comprehensive experiments to investigate the performance of our algorithms in two applications of \(\textsf {BkSM}\), Influence Maximization and Sensor Placement. The results show that our algorithms not only reduce the number of queries but also return solutions comparable in quality to those of the state-of-the-art non-streaming algorithms.

1.2 Related work

Although submodular maximization problems are NP-hard in general (Schrijver 2003), they have been extensively studied because of their important roles in combinatorial optimization and machine learning. Nemhauser et al. (1978) first studied the problem of maximizing a monotone submodular function under a cardinality constraint and showed that the traditional greedy algorithm provides an approximation ratio of \((1-1/e)\). Since then, there have been many studies on this problem under various constraints such as cardinality constraints (Badanidiyuru and Vondrák 2014; Mirzasoleiman et al. 2015, 2016; Buchbinder et al. 2015; Krause et al. 2008), knapsack constraints (Wolsey 1982; Sviridenko 2004; Huang et al. 2020), matroid constraints (Călinescu et al. 2011), p-set constraints (Haba et al. 2020) and d-knapsack constraints (Yu et al. 2016). However, submodular maximization algorithms may not be applicable to k-submodular maximization problems due to intrinsic differences between submodularity and k-submodularity.

The study of k-submodular functions was initiated by Singh et al. (2012), but the authors focused only on the case \(k=2\), i.e., bisubmodular functions. Since then, more works have focused on general k. Since the submodular maximization problem, a special case of the k-submodular maximization problem, is NP-hard (Schrijver 2003), the k-submodular maximization problem is also NP-hard. Although a polynomial-time algorithm exists for minimizing a k-submodular function (Thapper and Zivný 2012), devising an efficient algorithm for k-submodular maximization remains a challenge.

Ward and Zivný (2014) first studied unconstrained maximization of a k-submodular function, a special case of \(\textsf {BkSM}\) in which all cost values equal 1 and \(B=n\), and devised a deterministic greedy algorithm with an approximation ratio of 1/3. Later on, Iwata et al. (2016) introduced a random greedy approach which improved the approximation ratio to \(\frac{k}{2k-1}\) by introducing a probability distribution that selects elements with larger marginal gains with higher probability. Oshima (2017) removed the randomness of Iwata et al. (2016), but the number of queries increased to \(O(n^2k^2)\). Unconstrained k-submodular maximization was further studied by Soma (2019) in the online setting. In fact, the algorithms in Ward and Zivný (2014) and Iwata et al. (2016) work as single-pass streaming algorithms, but they cannot be directly applied to our problem. Our streaming algorithm for \(\textsf {BkSM}\) also uses the idea of random selection in Iwata et al. (2016), but it introduces a new distribution that can select an element with various costs and then establishes the relationship between the current solution and the optimal solution.

Maximizing k-submodular functions has been further studied under several types of constraints. Ohsaka and Yoshida (2015) first studied the monotone k-submodular maximization problem under size constraints. Using greedy approaches, they proposed a 1/2-approximation algorithm for the total size constraint and a 1/3-approximation algorithm for individual size constraints. Similarly, Sakaue (2017) showed that greedy selection gives an approximation ratio of 1/2 under a matroid constraint. However, these works did not provide any approximation guarantee when f is non-monotone. Qian et al. (2018) further proposed a multi-objective evolutionary algorithm for the monotone k-submodular maximization problem under the total size constraint; their algorithm provides a 1/2-approximation solution and takes \(O(kn\log ^2 B)\) queries in expectation. Recently, Rafiey and Yoshida (2020) considered the k-submodular maximization problem subject to the total size constraint under noise and proposed two streaming algorithms with approximation ratios of \(O(\epsilon (1-\epsilon )^{-2}B)\) when f is monotone and \(O(\epsilon (1-\epsilon )^{-3}B)\) when f is non-monotone. Zheng et al. (2021) investigated maximizing approximately k-submodular functions subject to size constraints by introducing an approximate version of the objective function, and proposed several greedy algorithms with provable guarantees. However, these algorithms cannot be adapted to our problem because an element's cost varies across sets, which makes devising an approximation algorithm more challenging.

To the best of our knowledge, Zhang et al. (2019) first studied the problem of maximizing a k-submodular function in which each i-th set of the solution has its own budget constraint. In that seminal paper, they devised a \(\frac{1}{5}(1-\frac{1}{e})\)-approximation algorithm with \(O(kn^2)\) query complexity, but the ratio does not hold when f is non-monotone. Furthermore, their problem differs from ours in several ways. First, we consider a general case in which each element has different costs when added into different sets of the solution. Second, instead of individual budget constraints, we consider a total budget constraint. Besides, our algorithm also provides an approximation guarantee when f is non-monotone. More recently, a \((1-1/e)/2\)-approximation algorithm with \(O(n^4k^3)\) queries for monotone k-submodular maximization under a knapsack constraint was proposed (Tang et al. 2022); this ratio was improved to \(1/2-\epsilon \) by Wang and Zhou (2021). However, these two algorithms only hold for a special case of \(\textsf {BkSM}\) in which f is monotone and an element has just one cost value regardless of which set it is added into. In contrast, our Algorithm 2 gives approximation ratios for both the monotone and non-monotone cases.

Streaming algorithms are an efficient approach to submodular maximization under various kinds of constraints, such as cardinality constraints (Gomes and Krause 2010; Badanidiyuru et al. 2014; Kumar et al. 2013; Yang et al. 2019), knapsack constraints (Huang et al. 2020), k-set constraints (Haba et al. 2020) and matroid constraints (Chakrabarti and Kale 2015), but they cannot be directly applied to our \(\textsf {BkSM}\) problem due to intrinsic differences between submodularity and k-submodularity. Instead, our algorithms draw inspiration from Huang et al. (2020) and Badanidiyuru et al. (2014), in which decisions are made sequentially based on the increment of the objective function per cost of each element, and the optimal value is guessed through the maximum singleton value.

1.3 Organization

The rest of the paper is organized as follows: the notations and properties of k-submodular functions are presented in Sect. 2. Sections 3 and 4 present our algorithms and their theoretical analysis. Extensive experiments are shown in Sect. 5. Finally, we conclude this work in Sect. 6.

2 Preliminaries

This section presents notations used throughout the paper and properties of the k-submodular function.

Given a finite set V and an integer k, for \(\mathbf{x }=(X_1, X_2, \ldots , X_k), \mathbf{y }=(Y_1, Y_2, \ldots , Y_k) \in (k+1)^V\), we define \(supp_i(\mathbf{x })=X_i\) and \(supp(\mathbf{x })=\cup _{i\in [k]}X_i\); \(X_i\) is called the i-th set of \(\mathbf{x }\), and the empty k-set is \(\mathbf{0 }=(\emptyset , \ldots , \emptyset )\). If \(e \in X_i\), we write \(\mathbf{x }(e)=i\) and call i the position of e; if \(e \notin \cup _{i \in [k]} X_i\), we write \(\mathbf{x }(e)=0\). Adding \(e \notin supp(\mathbf{x })\) into \(X_i\) is denoted by \(\mathbf{x }\sqcup (e, i) \). In the case \(X_i=\{e\}\) and \(X_j= \emptyset , \forall j\ne i\), we denote \(\mathbf{x }\) by (e, i). We write \(\mathbf{x }\sqsubseteq \mathbf{y }\) iff \(X_i \subseteq Y_i\) for all \(i\in [k]\).
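For concreteness, the following is a minimal Python sketch (our own illustration, not from the paper) of this k-set representation and the operations above; the class and method names are assumptions.

```python
# A k-set x in (k+1)^V: x(e) = i means e lies in the i-th set, x(e) = 0 means unplaced.
class KSet:
    def __init__(self, k):
        self.k = k
        self.pos = {}                       # e -> i in [k]

    def position(self, e):                  # x(e)
        return self.pos.get(e, 0)

    def supp(self, i=None):                 # supp_i(x) for given i, else supp(x)
        if i is None:
            return set(self.pos)
        return {e for e, j in self.pos.items() if j == i}

    def add(self, e, i):                    # x ⊔ (e, i), requires e ∉ supp(x)
        assert e not in self.pos and 1 <= i <= self.k
        self.pos[e] = i

x = KSet(3)                                 # k = 3, x = ({a}, {b}, ∅)
x.add("a", 1); x.add("b", 2)
assert x.supp(1) == {"a"} and x.position("b") == 2 and x.supp() == {"a", "b"}
```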

A function \(f: (k+1)^V \mapsto \mathbb {R}_+\) is k-submodular iff for any \(\mathbf{x }=(X_1, X_2, \ldots , X_k)\) and \(\mathbf{y }=(Y_1, Y_2, \ldots , Y_k)\) \(\in (k+1)^V\), we have:

$$\begin{aligned} f(\mathbf{x })+ f(\mathbf{y }) \ge f(\mathbf{x }\sqcap \mathbf{y }) + f(\mathbf{x }\sqcup \mathbf{y }) \end{aligned}$$

where

$$\mathbf{x }\sqcap \mathbf{y }=(X_1 \cap Y_1, \ldots , X_k \cap Y_k)$$

and

$$\mathbf{x }\sqcup \mathbf{y }=\left( (X_1 \cup Y_1) \setminus \bigcup _{i \ne 1} (X_i \cup Y_i), \ldots , (X_k \cup Y_k) \setminus \bigcup _{i \ne k} (X_i \cup Y_i)\right) $$

A function \(f: (k+1)^V \mapsto \mathbb {R}_+\) is monotone iff for any \(\mathbf{x }\in (k+1)^V, e\notin supp(\mathbf{x })\) and \(i \in [k]\), we have:

$$\begin{aligned} \varDelta _{e, i } f(\mathbf{x })=f(X_1, \ldots , X_{i-1}, X_i \cup \{e\}, X_{i+1}, \ldots , X_k) -f(X_1, \ldots , X_k) \ge 0 \end{aligned}$$
(2)

Given a k-submodular function \(f: (k+1)^V \mapsto \mathbb {R}_+\), from Ward and Zivný (2014) the k-submodularity of f implies orthant submodularity, i.e.,

$$\begin{aligned} \varDelta _{e, i }f(\mathbf{x }) \ge \varDelta _{e, i } f(\mathbf{y }) \end{aligned}$$
(3)

for any \(\mathbf{x }, \mathbf{y }\in (k+1)^V\) with \(\mathbf{x }\sqsubseteq \mathbf{y }\), \(e \notin supp(\mathbf{y })\) and \(i \in [k]\), and pairwise monotonicity, i.e.,

$$\begin{aligned} \varDelta _{e, i }f(\mathbf{x }) + \varDelta _{e, j }f(\mathbf{x }) \ge 0 \end{aligned}$$
(4)

for any \(\mathbf{x }\in (k+1)^V\) with \(e \notin supp(\mathbf{x })\) and \(i, j \in [k]\) with \(i \ne j\).
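As a concrete illustration of these two properties, the following brute-force Python check (our own, using the toy monotone k-submodular function \(f(\mathbf{x })=|supp(\mathbf{x })|\)) verifies (3) and (4) by enumeration on a two-element ground set.

```python
from itertools import product

V, k = ["a", "b"], 2
f = lambda x: sum(1 for e in V if x[e] != 0)           # f(x) = |supp(x)|

def gain(x, e, i):                                      # Δ_{e,i} f(x), e ∉ supp(x)
    y = dict(x); y[e] = i
    return f(y) - f(x)

for x_vals, y_vals in product(product(range(k + 1), repeat=len(V)), repeat=2):
    x = dict(zip(V, x_vals)); y = dict(zip(V, y_vals))
    # keep only pairs with x ⊑ y: every placed element of x is placed identically in y
    if any(x[e] != 0 and y[e] != x[e] for e in V):
        continue
    for e in V:
        if y[e] != 0:                                   # require e ∉ supp(y)
            continue
        # orthant submodularity (3) and pairwise monotonicity (4)
        assert all(gain(x, e, i) >= gain(y, e, i) for i in range(1, k + 1))
        assert all(gain(x, e, i) + gain(x, e, j) >= 0
                   for i in range(1, k + 1) for j in range(1, k + 1) if i != j)
print("orthant submodularity and pairwise monotonicity hold for the toy f")
```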

In this paper, we assume that f is normalized, i.e., \(f(\mathbf{0 })=0\), and that each element e has a positive cost \(c_i(e)\) when added into the i-th set of a solution; the total cost of a k-set \(\mathbf{x }\) is:

$$\begin{aligned} c(\mathbf{x })=\sum _{i \in [k], e \in supp_i(\mathbf{x })} c_i(e) \end{aligned}$$

We define \(\beta \) as the largest ratio between different costs of an element, i.e.,

$$\begin{aligned} \beta =\max _{e\in V,\, i, j \in [k],\, i \ne j}\frac{c_i(e)}{c_j(e)} \end{aligned}$$

Without loss of generality, throughout this paper we assume that every element e satisfies \(c_i(e)\ge 1, \forall i \in [k]\), and \(c_i(e)\le B\), as otherwise we can simply remove e. We only consider \(k\ge 2\), because for \(k = 1\) a k-submodular function is simply a submodular function.

3 A deterministic streaming algorithm when \(\beta =1\)

In this section, we introduce a deterministic streaming algorithm for the special case \(\beta =1\), i.e., each element has the same cost for every set: \(c_i(e)=c_j(e), \forall e \in V, i \ne j\). For simplicity, we denote \(c(e)=c_i(e), \forall i \in [k]\).

The main idea of our algorithms is that (1) we select each observed element e by comparing the ratio of f per total cost of the current solution against a threshold set in advance, and (2) we use the maximum singleton \((e_{max}, i_{max})\), defined as

$$\begin{aligned} (e_{max}, i_{max})=\arg \max _{e\in V, i \in [k]} f((e, i)) \end{aligned}$$
(5)

to obtain the final solution. We first assume that the optimal value is known and then remove this assumption by using the method in Badanidiyuru et al. (2014).

3.1 A deterministic streaming algorithm with known optimal value

We first present a simplified version of our deterministic streaming algorithm when the optimal value is known. Denote by \(\mathbf{o }\) an optimal solution and \({\mathsf {opt}}=f(\mathbf{o })\); the algorithm receives a value v with \(v\le {\mathsf {opt}}\) and a parameter \(\alpha \in (0, 1]\) as inputs. The roles of these parameters will be clarified in the main version (Sect. 3.2). The details of the algorithm are fully presented in Algorithm 1. We define the following notations:

  • \((e^j, i^j)\) as the j-th element added in the main loop of the algorithm.

  • \(\mathbf{s }^j=\{(e^1, i^1), \ldots , (e^j, i^j)\}\) as the solution after j elements have been added in the main loop of the algorithm.

  • \(\mathbf{o }^j=(\mathbf{o }\sqcup \mathbf{s }^j ) \sqcup \mathbf{s }^j\)

  • \(\mathbf{o }^{j-1/2}=(\mathbf{o }\sqcup \mathbf{s }^j ) \sqcup \mathbf{s }^{j-1}\)

  • \(\mathbf{s }^{j-1/2}\): If \(e^j \in supp(\mathbf{o })\), then \(\mathbf{s }^{j-1/2}=\mathbf{s }^{j-1} \sqcup (e^j, \mathbf{o }(e^j)) \). If \(e^j \notin supp(\mathbf{o })\), \(\mathbf{s }^{j-1/2}=\mathbf{s }^{j-1}\)

  • \(\mathbf{u }^t=\{(u_1, j_1), (u_2, j_2), \ldots , (u_r,j_r) \}\): the set of elements, with their positions, that are in \(\mathbf{o }^t\) but not in \(\mathbf{s }^t\), where \(r=|supp(\mathbf{u }^t)|\).

  • \(\mathbf{u }^t_i=\mathbf{s }^t \sqcup \{(u_1, j_1), (u_2, j_2), \ldots , (u_i,j_i) \}, \forall 1 \le i\le r\) and \(\mathbf{u }^t_0=\mathbf{s }^t\).

The algorithm initiates a candidate solution \(\mathbf{s }^0\) as an empty k-set. For each incoming element e, the algorithm updates the tuple \((e_{max}, i_{max})\) to track the maximum singleton, then checks whether the total cost \(c(\mathbf{s }^t) + c(e)\) exceeds B. If not, it finds a position \(i' \in [k]\) such that \(f(\mathbf{s }^t \sqcup (e, i'))\) is maximal and adds \((e, i')\) into \(\mathbf{s }^t\) if \(\frac{f(\mathbf{s }^t \sqcup (e, i'))}{c(\mathbf{s }^t) + c (e)}\ge \frac{\alpha v }{B} \). Otherwise, it ignores e and receives the next element. This step helps the algorithm select elements with a high marginal gain per cost and eliminate bad ones.

After finishing the main loop, the algorithm returns the better of \(\mathbf{s }^{t}\) and \((e_{max}, i_{max})\) when f is monotone, or the best among \(\{\mathbf{s }^{j}: j \le t\}\) and \((e_{max}, i_{max})\) when f is non-monotone.

[Algorithm 1 pseudocode]
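To make the selection rule concrete, the following is a minimal Python sketch of Algorithm 1 for the monotone case. It is our own illustration, not the paper's pseudocode: a solution is assumed to be represented as a dict mapping each element to its position in [k], f is assumed to evaluate such a dict (with \(f(\{\})=0\)), and c(e) returns the common cost of e.

```python
def algorithm1(stream, f, c, k, B, v, alpha):
    s, cost = {}, 0.0                        # candidate solution s^t and c(s^t)
    best_single, best_val = {}, 0.0          # (e_max, i_max) as a singleton dict
    for e in stream:
        for i in range(1, k + 1):            # update the maximum singleton
            val = f({e: i})
            if val > best_val:
                best_single, best_val = {e: i}, val
        if cost + c(e) > B:                  # budget check
            continue
        i_best = max(range(1, k + 1), key=lambda i: f({**s, e: i}))
        if f({**s, e: i_best}) / (cost + c(e)) >= alpha * v / B:
            s[e] = i_best                    # threshold passed: add (e, i')
            cost += c(e)
    return s if f(s) >= best_val else best_single   # best of s^t and the singleton
```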

We now analyze the approximation guarantee of Algorithm 1. By exploiting the relation among \(\mathbf{o }\), \(\mathbf{o }^t\) and \(\mathbf{s }^t\), we obtain the following lemma.

Lemma 1

Denote by \(e^t\) the last element added in the main loop of Algorithm 1. If f is monotone, then \(v-f(\mathbf{o }^t) \le f(\mathbf{s }^t)\); if f is non-monotone, then \(v-f(\mathbf{o }^t) \le 2f(\mathbf{s }^t) \).

Proof

The proof follows the analysis of the relationship among \(\mathbf{s }^j, \mathbf{o }^j, \mathbf{o }\) in Nguyen and Thai (2020). We consider the following two cases:

Case 1 If f is monotone. By the k-submodularity of f, and noting that \(f(\mathbf{o })=f(\mathbf{o }^0)\), we obtain:

$$\begin{aligned} v-f(\mathbf{o }^t)&\le f(\mathbf{o })-f(\mathbf{o }^t) = \sum _{j=1}^t(f(\mathbf{o }^{j-1})-f(\mathbf{o }^{j}))\\&\le \sum _{j=1}^t(f(\mathbf{o }^{j-1})-f(\mathbf{o }^{j-1/2})) \quad (\text{due to the monotonicity of } f)\\&\le \sum _{j=1}^t(f(\mathbf{s }^{j-1/2})-f(\mathbf{s }^{j-1})) \quad (\text{due to the } k\text{-submodularity})\\&\le \sum _{j=1}^t(f(\mathbf{s }^{j})-f(\mathbf{s }^{j-1})) \quad (\text{due to the selection of the algorithm})\\&\le f(\mathbf{s }^t) \end{aligned}$$

Case 2 If f is non-monotone, we further consider the following sub-cases:

- If \(e^j \notin supp(\mathbf{o })\): choose an integer \(l \in [k]\) with \(l \ne i^j\), define the k-set \(\mathbf{o }^{j}_l\) by \(\mathbf{o }^{j}_l(e)=\mathbf{o }^j(e), \forall e \in V\setminus \{e^j\}\) and \(\mathbf{o }^{j}_l(e^j)=l\), and let \(\mathbf{s }^{j}_l=\mathbf{s }^{j-1} \sqcup (e^j, l)\). We have:

$$\begin{aligned} f(\mathbf{o }^{j-1})-f(\mathbf{o }^j)&= f(\mathbf{o }^j_l) -f(\mathbf{o }^{j-1})- (f(\mathbf{o }^j)+ f(\mathbf{o }^j_l)- 2f(\mathbf{o }^{j-1}) )\\&\le f(\mathbf{o }^j_l) -f(\mathbf{o }^{j-1}) \quad (\text{due to the pairwise monotonicity})\\&\le f(\mathbf{s }^j_l) -f(\mathbf{s }^{j-1}) \quad (\text{due to the orthant submodularity})\\&\le f(\mathbf{s }^j) -f(\mathbf{s }^{j-1}) \end{aligned}$$

- If \(e^j \in supp(\mathbf{o })\) and \(\mathbf{o }^{j-1}(e^j)=i^j\): due to the pairwise monotonicity of f, there exists \(i'\in [k]\) such that \(f(\mathbf{s }^{j-1} \sqcup (e^j, i'))-f(\mathbf{s }^{j-1}) \ge 0\), and the selection of the algorithm then guarantees \(f(\mathbf{s }^{j})\ge f(\mathbf{s }^{j-1})\). Therefore,

$$\begin{aligned} f(\mathbf{o }^j)-f(\mathbf{o }^{j-1})=0\le f(\mathbf{s }^j) -f(\mathbf{s }^{j-1}) \end{aligned}$$

If \(\mathbf{o }^{j-1}(e^j)\ne i^j\), we obtain:

$$\begin{aligned} f(\mathbf{o }^{j-1})-f(\mathbf{o }^j)&= 2f(\mathbf{o }^{j-1}) -2f(\mathbf{o }^{j-1/2})- (f(\mathbf{o }^{j-1})+ f(\mathbf{o }^j)- 2f(\mathbf{o }^{j-1/2}) )\\&\le 2f(\mathbf{o }^{j-1}) -2f(\mathbf{o }^{j-1/2}) \le 2f(\mathbf{s }^{j}) -2f(\mathbf{s }^{j-1}) \end{aligned}$$

Overall, we have \(f(\mathbf{o }^{j-1})-f(\mathbf{o }^j) \le 2f(\mathbf{s }^{j}) -2f(\mathbf{s }^{j-1})\) for the non-monotone case. Therefore,

$$\begin{aligned} v-f(\mathbf{o }^t)&\le f(\mathbf{o })-f(\mathbf{o }^t) = \sum _{j=1}^t(f(\mathbf{o }^{j-1})-f(\mathbf{o }^{j}))\\&\le 2\sum _{j=1}^t(f(\mathbf{s }^{j})-f(\mathbf{s }^{j-1})) \le 2f(\mathbf{s }^t) \end{aligned}$$

which completes the proof. \(\square \)

Lemma 1 plays an important role in analyzing the approximation ratio of the algorithm, which is stated in the following theorem.

Theorem 1

Algorithm 1 is a single-pass streaming algorithm and returns a solution \(\mathbf{s }\) satisfying:

  • If f is monotone, \(f(\mathbf{s }) \ge \min \{\frac{\alpha }{2}, \frac{1-\alpha }{2} \}v\). The right hand side is maximized to \( \frac{v}{4}\) when \(\alpha = \frac{1}{2}\).

  • If f is non-monotone, \(f(\mathbf{s }) \ge \min \{\frac{\alpha }{2}, \frac{ 1-\alpha }{3} \}v\). The right hand side is maximized to \(\frac{v}{5}\) when \(\alpha = \frac{2}{5}\).

Proof

We observe that an element \(e \in supp(\mathbf{o })\) does not belong to \(supp(\mathbf{s }^t)\) only if either e does not pass the condition in line 8 or its addition would cause the total cost of \(\mathbf{s }^t\) to exceed B.

We call \(e \in supp(\mathbf{o })\) a bad element if it passes the condition in line 8 of Algorithm 1 but its addition would cause the total cost to exceed B, i.e., there exists an integer \(i \in [k]\) satisfying:

$$\begin{aligned} \frac{f(\mathbf{s }^{t_e} \sqcup (e, i))}{c(\mathbf{s }^{t_e}) + c (e)}\ge \frac{\alpha v }{B} \ \text{ and } \ \ c(\mathbf{s }^{t_e}) + c (e) >B \end{aligned}$$
(6)

where \(\mathbf{s }^{t_e}\) is the candidate solution obtained right before e arrives.

Case 1 There is no bad element.

By applying Lemma 1, we obtain:

$$\begin{aligned} v-f(\mathbf{s }^t)&=v-f(\mathbf{o }^t) + f(\mathbf{o }^t)-f(\mathbf{s }^t)\\&\le f(\mathbf{o })-f(\mathbf{o }^t) + f(\mathbf{o }^t)-f(\mathbf{s }^t)\\&\le f(\mathbf{s }^t) + \sum _{i=1}^{r}\left( f(\mathbf{u }^t_i)-f(\mathbf{u }^t_{i-1})\right) \quad (\text{Lemma 1})\\&\le f(\mathbf{s }^t) + \sum _{i=1}^{r}\left( f(\mathbf{s }^{t_{u_i}} \sqcup (u_i, j_i))-f(\mathbf{s }^{t_{u_i}})\right) \quad (\text{due to the } k\text{-submodularity of } f)\\&\le f(\mathbf{s }^t) + \sum _{i=1}^{r}\left( \frac{\alpha v(c(\mathbf{s }^{t_{u_i}})+ c(u_i))}{B}-\frac{\alpha v c(\mathbf{s }^{t_{u_i}})}{B}\right) \\&\le f(\mathbf{s }^t) + \sum _{i=1}^{r}\frac{\alpha v c(u_i)}{B}\\&\le f(\mathbf{s }^t)+ \alpha v \end{aligned}$$

This implies that \(f(\mathbf{s }^t) \ge \frac{ 1-\alpha }{2}v\).

Case 2 If a bad element e exists, there is an integer \(i \in [k]\) satisfying \(\frac{f(\mathbf{s }^{t_e} \sqcup (e, i))}{c(\mathbf{s }^{t_e}) + c (e)}\ge \frac{\alpha v }{B}\) and \(c(\mathbf{s }^{t_e}) + c (e)>B\). Therefore:

$$f(\mathbf{s }^{t_e} \sqcup (e, i))\ge \frac{(c(\mathbf{s }^{t_e}) + c (e))\alpha v }{B} > \alpha v $$

By the k-submodularity of f, we have:

$$\begin{aligned} f(\mathbf{s }^{t_e} \sqcup (e, i)) \le f(\mathbf{s }^{t_e}) +f((e, i)) \end{aligned}$$

which implies that:

$$\begin{aligned} f(\mathbf{s })&\ge \max \{f(\mathbf{s }^t), f((e_{max}, i_{max}))\}\\&\ge \max \{f(\mathbf{s }^{t_e}), f((e, i))\}\\&\ge \frac{f(\mathbf{s }^{t_e}) +f((e, i))}{2}\\&> \frac{\alpha v}{2} \end{aligned}$$

Combining the two cases above, we obtain \(f(\mathbf{s })\ge \min \{ \frac{1-\alpha }{2}, \frac{\alpha }{2} \}v\), and the right-hand side is maximized to \( \frac{1}{4} v\) when \(\alpha = \frac{1}{2}\).

If f is non-monotone, by arguments similar to the monotone case, we have \(f(\mathbf{s })\ge \min \{ \frac{1-\alpha }{3}, \frac{\alpha }{2} \}v\). The proof is completed. \(\square \)

3.2 A deterministic streaming algorithm

We now present our deterministic streaming algorithm for the case \(\beta =1\), which reuses the framework of Algorithm 1 but removes the assumption that \({\mathsf {opt}}\) is known.

Define \(m=\max _{e \in V, i\in [k]}f((e, i))\); then \(m \le {\mathsf {opt}}\le n\cdot m \). Therefore we use the values \(v=(1+\epsilon ')^j\) with \( m \le (1+\epsilon ')^j \le n\cdot m, j\in \mathbb {Z}_+ \) to guess \({\mathsf {opt}}\), by showing that there exists v such that \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\). However, finding m exactly would require at least one pass over V. Therefore, we adapt the dynamic update method, which was first proposed by Badanidiyuru et al. (2014) and has been widely used in streaming algorithms for both submodular and k-submodular optimization (Yang et al. 2019; Huang et al. 2020; Nguyen and Thai 2020). It updates \(m=\max \{m , \max _{i \in [k]}f((e, i))\}\) with each observed element e to determine the range of the guessed optimal value.

This method helps the algorithm maintain a good estimate of the optimal value as the range shifts forward when new elements are observed. We implement this method by using variables \(\mathbf{s }_j^{t_j}\) and \(t_j\) to store the candidate solution and the number of its elements with respect to each guess j.

We set the value of \(\alpha \) according to Theorem 1, which provides the best approximation guarantees. The value of \(\epsilon '\) is set to a constant multiple of \(\epsilon \) to reduce the complexity while still ensuring the approximation ratios. The details of our algorithm are presented in Algorithm 2.

[Algorithm 2 pseudocode]
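The following is a minimal Python sketch of this dynamic guessing (our own illustration under the same assumed representation as the Algorithm 1 sketch, with \(\alpha =1/2\) for the monotone case); for simplicity, guesses that fall out of range are kept rather than discarded, which can only improve the returned value.

```python
import math

def algorithm2(stream, f, c, k, B, n, eps, alpha=0.5):
    sols, costs = {}, {}                   # per-guess candidate s_j and its cost
    m, best_single, best_val = 0.0, {}, 0.0
    for e in stream:
        for i in range(1, k + 1):          # dynamic update of the max singleton
            val = f({e: i})
            if val > best_val:
                best_single, best_val = {e: i}, val
        m = max(m, best_val)
        if m <= 0:
            continue
        lo = max(math.ceil(math.log(m, 1 + eps)), 0)
        hi = math.floor(math.log(n * m, 1 + eps))
        for j in range(lo, hi + 1):        # one thresholding step per active guess
            v = (1 + eps) ** j
            s, cost = sols.get(j, {}), costs.get(j, 0.0)
            if cost + c(e) > B:
                continue
            i_best = max(range(1, k + 1), key=lambda i: f({**s, e: i}))
            if f({**s, e: i_best}) / (cost + c(e)) >= alpha * v / B:
                s[e] = i_best
                sols[j], costs[j] = s, cost + c(e)
    return max(list(sols.values()) + [best_single], key=f)
```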

Lemma 2

After the main loop of Algorithm 2 ends, there exists a number \(j \in \mathbb {Z}_+\) with \(v=(1+\epsilon ')^j \in O\) satisfying \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\).

Proof

Define \(m=f((e_{max},i_{max}))\). Due to the k-submodularity of f, we have:

$$\begin{aligned} m \le {\mathsf {opt}}=f(\mathbf{o })\le \sum _{e \in supp(\mathbf{o })}f((e, \mathbf{o }(e))) \le m \cdot n \end{aligned}$$

Let \(j=\lfloor \log _{1+\epsilon '} {\mathsf {opt}}\rfloor \); we have \(v=(1+\epsilon ')^j \le {\mathsf {opt}}\le n\cdot m \), and

$$\begin{aligned} v \ge (1+\epsilon ')^{\log _{1+\epsilon '}({\mathsf {opt}}) -1 }=\frac{{\mathsf {opt}}}{1+\epsilon '} \ge {\mathsf {opt}}(1-\epsilon ') \end{aligned}$$

\(\square \)

The performance of Algorithm 2 is stated in the following theorem.

Theorem 2

Algorithm 2 is a single-pass streaming algorithm that has \(O\left( \frac{kn}{\epsilon } \log n\right) \) query complexity, \(O\left( \frac{n}{\epsilon } \log n\right) \) space complexity and provides an approximation ratio of \(\frac{1}{4}-\epsilon \) when f is monotone and \(\frac{1}{5}-\epsilon \) when f is non-monotone.

Proof

The size of O is at most \(\frac{1}{\epsilon '}\log n\), finding each \(\mathbf{s }_j^{t_j}\) takes at most O(kn) queries, and each \(\mathbf{s }_j^{t_j}\) contains at most n elements. Therefore, the query complexity is \(O\left( \frac{kn}{\epsilon } \log n\right) \) and the total space complexity is \(O\left( \frac{n}{\epsilon } \log n\right) \).

By Lemma 2, there exists an integer \(j \in \mathbb {Z}_+\) such that \(v=(1+\epsilon ')^j \in O\) satisfies \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\). Applying Theorem 1, for the monotone case we have:

$$\begin{aligned} f(\mathbf{s }) \ge \frac{1}{4} v \ge \frac{1}{4}(1-\epsilon '){\mathsf {opt}}= \left( \frac{1}{4} -\epsilon \right) {\mathsf {opt}}\end{aligned}$$
(7)

and for the non-monotone case:

$$\begin{aligned} f(\mathbf{s }) \ge \frac{1}{5}v \ge \frac{1}{5}(1-\epsilon '){\mathsf {opt}}= \left( \frac{1}{5} -\epsilon \right) {\mathsf {opt}}\end{aligned}$$
(8)

The theorem is proved. \(\square \)

4 A random streaming algorithm for general case

In the general case, each element may have different costs for different sets, which makes the problem more challenging, and the previous algorithms cannot be applied. We introduce a one-pass streaming algorithm which provides an approximation ratio in expectation for the \(\textsf {BkSM}\) problem. At the core of our algorithm, we introduce a new probability distribution for choosing a position for each element; this distribution allows us to establish the relationship among \(\mathbf{o }\), \(\mathbf{o }^j\) and \(\mathbf{s }^j\) (Lemma 3) and to analyze the performance of our algorithm. Besides, we also use a predefined threshold to filter high-value elements into the candidate solutions and the maximum singleton value to form the final solution.

Similar to the previous section, we first introduce a simplified version of the streaming algorithm when the optimal value is known in advance.

4.1 A random algorithm with known optimal value

This algorithm receives as inputs \(\alpha \in (0, 1)\) and v such that \(v\le {\mathsf {opt}}\). We use the same notations as in Sect. 3. This algorithm also requires one pass over V.

The algorithm initializes an empty k-set \(\mathbf{s }^0\) and updates the solution during a single pass over V. Differing from Algorithm 1, for each observed element \(e \in V\), the algorithm finds the collection J of positions whose addition keeps the total cost at most B and yields a ratio of objective increment per cost of at least a given threshold, i.e.,

$$\begin{aligned} J=\left\{ i\in [k]:c(\mathbf{s }^t) + c_i(e) \le B \ \text{ and } \ \frac{f(\mathbf{s }^t \sqcup (e, i))-f(\mathbf{s }^t)}{ c_i(e)}\ge \frac{\alpha v }{B}\right\} \end{aligned}$$
(9)

These constraints help the algorithm eliminate positions with a low ratio. If \(J\ne \emptyset \), the algorithm puts e into the i-th set of \(\mathbf{s }^t\) with probability:

$$\begin{aligned} \frac{p_i^{|J|-1}}{T}=\frac{\left( \frac{f(\mathbf{s }^t \sqcup (e, i))-f(\mathbf{s }^t)}{c_i(e)}\right) ^{|J|-1}}{\sum _{l \in J}\left( \frac{f(\mathbf{s }^t \sqcup (e, l))-f(\mathbf{s }^t)}{c_l(e)}\right) ^{|J|-1}} \end{aligned}$$
(10)

Simultaneously, the algorithm finds the maximum singleton \((e_{max}, i_{max})\) by updating the current maximal value over the observed elements. As in Algorithm 1, it uses \((e_{max}, i_{max})\) as one of the candidate solutions and returns the best among them. The full details are described in Algorithm 3.

[Algorithm 3 pseudocode]
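The key step of Algorithm 3 is the random choice of a position. The following is a minimal Python sketch of this step (our own illustration; c(e, i) is an assumed interface returning \(c_i(e)\)): it builds J from (9) and samples a position according to (10).

```python
import random

def random_select(s, cost, e, f, c, k, B, v, alpha):
    """Return a position for e drawn from distribution (10) over the
    feasible set J of (9), or None if J is empty (e is skipped)."""
    fs = f(s)
    J, p = [], {}
    for i in range(1, k + 1):
        if cost + c(e, i) > B:
            continue
        ratio = (f({**s, e: i}) - fs) / c(e, i)   # marginal gain per cost p_i
        if ratio >= alpha * v / B:
            J.append(i); p[i] = ratio
    if not J:
        return None
    weights = [p[i] ** (len(J) - 1) for i in J]   # p_i^{|J|-1}
    return random.choices(J, weights=weights)[0]
```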

We call \(e \in supp(\mathbf{o })\) a bad element if

$$\begin{aligned} \frac{f(\mathbf{s }^{t_e} \sqcup (e, \mathbf{o }(e)))-f(\mathbf{s }^{t_e})}{c_{\mathbf{o }(e)}(e)}\ge \frac{\alpha v }{B} \ \text{ and } \ \ c(\mathbf{s }^{t_e}) + c_{\mathbf{o }(e)}(e) >B \end{aligned}$$
(11)

where \(\mathbf{s }^{t_e}\) is the candidate solution obtained right before e arrives.

Lemma 3 provides the relationship among \(\mathbf{o }, \mathbf{o }^j\) and \(\mathbf{s }^j\), which plays an important role in analyzing the performance of the algorithm.

Lemma 3

Assume that there is no bad element. In Algorithm 3, we have:

  • If f is monotone, then:

    $$\begin{aligned} f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})] \le \beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{\alpha v c_{j^*}(e^j)}{kB} \end{aligned}$$
    (12)
  • If f is non-monotone, then:

    $$\begin{aligned} f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^j)] \le 2\beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{2\alpha v c_{j^*}(e^j)}{kB} \end{aligned}$$
    (13)

Proof

We consider the following cases:

Case 1 If f is monotone. If \(J=\emptyset \), the element is skipped and the current solution is kept unchanged, so we consider the case \(J\ne \emptyset \). We further consider the following two sub-cases.

Case 1.1 If \(e^j \notin supp(\mathbf{o })\): due to the monotonicity of f, we have:

$$\begin{aligned} f(\mathbf{o }^{j-1})-f(\mathbf{o }^{j}) \le 0 < \beta \left( 1-\frac{1}{k}\right) \left( f(\mathbf{s }^{j})-f(\mathbf{s }^{j-1})\right) \end{aligned}$$

Case 1.2 If \(e^j \in supp(\mathbf{o })\). We reuse the notations \(\mathbf{o }^j, \mathbf{o }^{j-1/2}, \mathbf{s }^j, \mathbf{s }^{j-1/2}\) as in Sect. 3 and define \(\mathbf{o }_l^j, \mathbf{s }^{j}_{l}\), \(j^*\) as follows: \(\mathbf{o }^{j}_l(e)=\mathbf{o }^j(e), \forall e \in V\setminus \{e^j\}\) and \(\mathbf{o }^{j}_l(e^j)=l\); \(\mathbf{s }^{j}_{l}=\mathbf{s }^{j-1} \sqcup (e^j, l)\); \(j^*=\mathbf{o }(e^j)\). Since there is no bad element, we consider the following two sub-cases.

- If \(j^* \in J\), i.e., \(\frac{f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1})}{c_{j^*} (e^j)}\ge \frac{\alpha v }{B} \ \ \text{ and } \ \ c(\mathbf{s }^{j-1}) +c_{j^*}(e^j) \le B\). If \(|J|=1\), we have \(f(\mathbf{o }^{j-1})-f(\mathbf{o }^{j}) = 0 < \beta \left( 1-\frac{1}{k}\right) \left( f(\mathbf{s }^{j})-f(\mathbf{s }^{j-1})\right) \), so we consider \(|J|>1\). In this case we have:

$$\begin{aligned}&f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})] = \sum _{l\in J}(f(\mathbf{o }^{j-1})- f(\mathbf{o }^j_l)) \frac{p_l^{|J|-1}}{T} \end{aligned}$$
(14)
$$\begin{aligned}&= \sum _{l\in J\setminus \{j^*\}}(f(\mathbf{o }^{j-1})- f(\mathbf{o }^j_l)) \frac{p_l^{|J|-1}}{T} \end{aligned}$$
(15)
$$\begin{aligned}&\le \sum _{l\in J\setminus \{j^*\}}(f(\mathbf{o }^{j-1})- f(\mathbf{o }^{j-1/2})) \frac{p_l^{|J|-1}}{T} \end{aligned}$$
(16)
$$\begin{aligned}&\le \sum _{l\in J\setminus \{j^*\}}(f(\mathbf{s }_{j^*}^{j})- f(\mathbf{s }^{j-1})) \frac{p_l^{|J|-1}}{T} \end{aligned}$$
(17)
$$\begin{aligned}&= \frac{1}{T}\sum _{l\in J\setminus \{j^*\}}c_{j^*}(e^j) \cdot p_{j^*} \cdot p_l^{|J|-1} = \frac{c_{j^*}(e^j)}{T}\sum _{l\in J\setminus \{j^*\}}p_{j^*} \cdot \underbrace{p_l \cdots p_l}_{|J|-1} \end{aligned}$$
(18)
$$\begin{aligned}&\le \frac{c_{j^*}(e^j)}{T}\sum _{l\in J\setminus \{j^*\}} \frac{1}{|J|}\left( p_{j^*}^{|J|} + \underbrace{p_l^{|J|} + \cdots + p_l^{|J|}}_{|J|-1}\right) \ (\text{by the AM-GM inequality}) \end{aligned}$$
(19)
$$\begin{aligned}&= c_{j^*}(e^j) \left( 1-\frac{1}{|J|}\right) \sum _{l\in J} \frac{p^{|J|}_{l}}{T} \end{aligned}$$
(20)
$$\begin{aligned}&\le c_{j^*}(e^j) \left( 1-\frac{1}{k}\right) \sum _{l\in J}\frac{f(\mathbf{s }^{j}_l)-f(\mathbf{s }^{j-1})}{c_l(e^j)}\frac{p_{l}^{|J|-1}}{T} \end{aligned}$$
(21)
$$\begin{aligned}&\le \beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^{j})]-f(\mathbf{s }^{j-1})) \end{aligned}$$
(22)

- If \(j^* \notin J\), i.e., \(\frac{f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1})}{ c_{j^*}(e^j)}< \frac{\alpha v }{B}\), then \(p_{j^*} \le p_l, \forall l \in J\). Similar to the transformation from (14) to (22), we have:

$$\begin{aligned} c_{j^*}(e^j) p_{j^*}&= c_{j^*}(e^j) \sum _{l \in J}p_{j^*} \frac{p_l^{|J|-1}}{T}\\&\le c_{j^*}(e^j) \sum _{l \in J}p_{l} \frac{p_l^{|J|-1}}{T}\\&\le \beta (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) \end{aligned}$$

Therefore,

$$\begin{aligned} f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})]&\le f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1}) =c_{j^*}(e^j) p_{j^*}\\&\le \beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{ c_{j^*}(e^j) p_{j^*}}{k}\\&\le \beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{ c_{j^*}(e^j)\alpha v}{kB} \end{aligned}$$

Case 2 If f is non-monotone: similar to the monotone case, we only consider \(J\ne \emptyset \) and the following two cases:

Case 2.1 If \(e^j \notin supp(\mathbf{o })\), we consider two sub-cases:

- If there exists \(l \in [k]\setminus J \) satisfying \( \frac{f(\mathbf{s }^{j-1} \sqcup (e^j, l))-f(\mathbf{s }^{j-1})}{ c_{l}(e^j)}< \frac{\alpha v }{B}\) and \(c(\mathbf{s }^{j-1}) + c_l(e^j) \le B\), then \(p_l< p_x, \forall x \in J\). By the pairwise monotonicity and k-submodularity of f, we obtain:

$$\begin{aligned} f(\mathbf{o }^{j-1}) -\mathbb {E}[f(\mathbf{o }^{j})]&= f(\mathbf{o }_l^j) - f(\mathbf{o }^{j-1}) - (\mathbb {E}[f(\mathbf{o }^{j})] + f(\mathbf{o }_l^j)- 2 f(\mathbf{o }^{j-1}))\\&\le f(\mathbf{o }_l^j) - f(\mathbf{o }^{j-1}) \le f(\mathbf{s }^j_l) -f(\mathbf{s }^{j-1})\\&< \mathbb {E}[f(\mathbf{s }^j)] -f(\mathbf{s }^{j-1}) \end{aligned}$$

- If no such integer \(l \in [k]\setminus J \) exists, we define a permutation \(\pi : J \mapsto J\) such that \(\pi (i) \ne i, \forall i \in J\). We have:

$$\begin{aligned} f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})]&= \sum _{i\in J}(f(\mathbf{o }^{j-1})- f(\mathbf{o }^j_i)) \frac{p_i^{|J|-1}}{T}\\&= \sum _{i\in J}\left( f(\mathbf{o }_{\pi (i)}^j) - f(\mathbf{o }^{j-1}) - (f(\mathbf{o }_i^{j}) + f(\mathbf{o }_{\pi (i)}^j)- 2 f(\mathbf{o }^{j-1}))\right) \frac{p_i^{|J|-1}}{T}\\&\le \sum _{i\in J}(f(\mathbf{o }_{\pi (i)}^j) - f(\mathbf{o }^{j-1}))\frac{p_i^{|J|-1}}{T}\\&\le \sum _{i\in J}(f(\mathbf{s }_{\pi (i)}^j) - f(\mathbf{s }^{j-1}))\frac{p_i^{|J|-1}}{T}\\&\le \sum _{i\in J}c_{\pi (i)}(e^j) p_{\pi (i)} \frac{p_i^{|J|-1}}{T}\le \frac{c_{max}}{T} \sum _{i\in J} p_{\pi (i)} p_i^{|J|-1} \quad (c_{max}:=\max _{l\in [k]}c_l(e^j))\\&= \frac{c_{max}}{T}\sum _{i \in J} p_{\pi (i)} \cdot \underbrace{p_{i} \cdots p_{i}}_{|J|-1}\\&\le \frac{c_{max}}{T}\sum _{i \in J} \frac{1}{|J|}\left( p_{\pi (i)}^{|J|} + \underbrace{p_i^{|J|} + \cdots + p_i^{|J|}}_{|J|-1}\right) \ (\text{by the AM-GM inequality})\\&\le c_{max} \sum _{i\in J} \frac{p_i^{|J|}}{T}\\&\le \beta (\mathbb {E}[f(\mathbf{s }^j)]- f(\mathbf{s }^{j-1})) \end{aligned}$$

Case 2.2 If \(e^j \in supp(\mathbf{o })\). Similar to the monotone case, noting that \(f(\mathbf{o }^{j-1})=f(\mathbf{o }^j_{j^*})\), we further consider the following two sub-cases:

- If \(j^* \in J\), i.e., \(\frac{f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1})}{c_{j^*} (e^j)}\ge \frac{\alpha v }{B} \ \ \text{ and } \ \ c(\mathbf{s }^{j-1}) +c_{j^*}(e^j) \le B\).

$$\begin{aligned}&f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})]\\&\quad = \sum _{l\in J\setminus \{j^*\}}(f(\mathbf{o }^{j-1})- f(\mathbf{o }^j_l)) \frac{p_l^{|J|-1}}{T} \\&\quad = \sum _{l\in J\setminus \{j^*\}}(2f(\mathbf{o }^j_{j^*})-2f(\mathbf{o }^{j-1/2}) - (f(\mathbf{o }_{j^*}^j) +f(\mathbf{o }^j_l) -2f(\mathbf{o }^{j-1/2})) )\frac{p_l^{|J|-1}}{T} \\&\quad \le \sum _{l\in J\setminus \{j^*\}}2(f(\mathbf{o }^j_{j^*})-f(\mathbf{o }^{j-1/2}))\frac{p_l^{|J|-1}}{T} \\&\quad \le \sum _{l\in J\setminus \{j^*\}}2(f(\mathbf{s }^j_{j^*})-f(\mathbf{s }^{j-1}))\frac{p_l^{|J|-1}}{T} \\&\quad \le 2\beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)]-f(\mathbf{s }^{j-1})) \end{aligned}$$

- If \(j^* \notin J\), i.e., \(\frac{f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1})}{ c_{j^*}(e^j)}< \frac{\alpha v }{B}\), then \(p_{j^*} \le p_l, \forall l \in J\). Applying the transformation as in Case 2.1, we have:

$$\begin{aligned} f(\mathbf{o }^{j-1})-\mathbb {E}[f(\mathbf{o }^{j})]&\le 2(f(\mathbf{s }^{j-1} \sqcup (e^j, j^*))-f(\mathbf{s }^{j-1}))= 2c_{j^*}(e^j)p_{j^*} \\&\le 2\beta \left( 1-\frac{1}{k}\right) (\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) +\frac{2c_{j^*}(e^j)\alpha v}{kB} \end{aligned}$$

Combining all cases, we obtain the proof. \(\square \)

Theorem 3

Algorithm 3 returns a solution \(\mathbf{s }\) satisfying:

  • If f is monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}v\). The right hand side is maximized to \( \frac{v}{3+\beta -\frac{\beta }{k}} \) when \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\).

  • If f is non-monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \}v\). The right hand side is maximized to \( \frac{v}{3+2\beta -\frac{2\beta }{k}}\) when \(\alpha = \frac{2}{3+2\beta -\frac{2\beta }{k}} \).

Proof

Suppose f is monotone. Denote by \(e^t\) the last element added in the main loop of the algorithm; we consider two cases as follows:

Case 1 There is no bad element e. By applying Lemma 3, we obtain:

$$\begin{aligned}&v-\mathbb {E}[f(\mathbf{s }^t)]\\&\quad \le f(\mathbf{o })-\mathbb {E}[f(\mathbf{s }^t)]= f(\mathbf{o })-f(\mathbf{o }^t) + f(\mathbf{o }^t)-\mathbb {E}[f(\mathbf{s }^t)]\\&\quad =\sum _{j=1}^t(f(\mathbf{o }^{j-1})-f(\mathbf{o }^j)) + \sum _{i=1}^{r}(f(\mathbf{u }^t_i)-f(\mathbf{u }^t_{i-1}))\\&\quad \le \beta \left( 1-\frac{1}{k}\right) \sum _{j=1}^t \left( \mathbb {E}[f(\mathbf{s }^j)]-\mathbb {E}[f(\mathbf{s }^{j-1})]\right) + \sum _{j=1}^t\frac{\alpha v c_{j^*}(e^j)}{kB}\\&\qquad + \sum _{i=1}^{r}\left( f(\mathbf{s }^{t_{u_i}} \sqcup (u_i, j_i))-f(\mathbf{s }^{t_{u_i}})\right) \\&\quad \le \beta \left( 1-\frac{1}{k}\right) \mathbb {E}[f(\mathbf{s }^t)] + \sum _{j=1}^t\frac{\alpha v c_{j^*}(e^j)}{kB} + \sum _{i=1}^{r}\frac{\alpha v c_{j_i}(u_i)}{B} \\&\quad \le \beta \left( 1-\frac{1}{k}\right) \mathbb {E}[f(\mathbf{s }^t)] + \sum _{e \in supp(\mathbf{o })\cap supp(\mathbf{s }^t)}\frac{\alpha v c_{\mathbf{o }(e)}(e)}{B}\\&\qquad + \sum _{e \in supp(\mathbf{o })\setminus supp(\mathbf{s }^t)}\frac{\alpha v c_{\mathbf{o }(e)}(e)}{B} \le \beta \left( 1-\frac{1}{k}\right) \mathbb {E}[f(\mathbf{s }^t)] + \alpha v \end{aligned}$$

This implies that \(\mathbb {E}[f(\mathbf{s }^t)]\ge \frac{(1-\alpha )kv}{(1+\beta )k-\beta }\).

Case 2 There exists a bad element e. Let \(j^*=\mathbf{o }(e)\); we have:

$$\begin{aligned} f(\mathbf{s }^{t_e} \sqcup (e, j^*))&\ge \frac{c_{j^*}(e)\alpha v }{B} +f(\mathbf{s }^{t_e})\ge \frac{c_{j^*}(e)\alpha v }{B} +\sum _{j=1}^{t_e}(f(\mathbf{s }^j)-f(\mathbf{s }^{j-1})) \\&\ge \frac{(c(\mathbf{s }^{t_e})+ c_{j^*}(e))\alpha v}{B} \ge \alpha v \end{aligned}$$

Therefore,

$$\begin{aligned} f(\mathbf{s })&\ge \max \{f(\mathbf{s }^{t_e}), f((e_{max}, i_{max}))\} \end{aligned}$$
(23)
$$\begin{aligned}&\ge \max \{f(\mathbf{s }^{t_e}), f((e, j^*))\} \ge \frac{f(\mathbf{s }^{t_e}) +f((e, j^*))}{2} \end{aligned}$$
(24)
$$\begin{aligned}&\ge \frac{ f(\mathbf{s }^{t_e} \sqcup (e, j^*))}{2} \ \ \text{(Due } \text{ to } \text{ the }\, k{\text {-submodularity}}) \end{aligned}$$
(25)
$$\begin{aligned}&\ge \frac{\alpha v}{2} \end{aligned}$$
(26)

Combining the two cases above, we obtain \(\mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}v\); the right-hand side is maximized to \( \frac{v}{3+\beta -\frac{\beta }{k}} \) when \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\).

If f is non-monotone, arguing similarly to the monotone case and combining with Lemma 3, we obtain the proof. \(\square \)

4.2 A random streaming algorithm

In this section we remove the assumption that the optimal value is known and present the random streaming algorithm, which reuses the framework of Algorithm 3.

Similar to Algorithm 2, we use the method in Badanidiyuru et al. (2014) to estimate \({\mathsf {opt}}\). We assume that \(\beta \) is known in advance; this is feasible because \(\beta \) can be computed in O(kn) time. We set \(\alpha \) according to the properties of f to obtain the best performance of the algorithm. The algorithm continuously updates \(O \leftarrow \{j\, |\, f((e_{max}, i_{max}))\le (1+\epsilon )^j \le B f((e_{max}, i_{max})), j\in \mathbb {Z}_+ \}\) to estimate the value of the maximal singleton, and uses \(\mathbf{s }_j^{t_j}\) and \(t_j\) to store candidate solutions, which are updated using the probability distribution of Algorithm 3 with \((1+\epsilon )^j\) as an estimate of the optimal value. The algorithm finally compares all candidate solutions and selects the best one. The details are presented in Algorithm 4.

[Algorithm 4 pseudocode]
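The following is a minimal Python sketch of Algorithm 4 (our own illustration, reusing random_select from the Algorithm 3 sketch); here costs are position-dependent via c(e, i), the guess range follows the set O above, and stale guesses are kept for simplicity.

```python
import math

def algorithm4(stream, f, c, k, B, eps, alpha):
    sols, costs = {}, {}                   # per-guess candidate s_j and its cost
    m, best_single, best_val = 0.0, {}, 0.0
    for e in stream:
        for i in range(1, k + 1):          # dynamic update of the max singleton
            val = f({e: i})
            if val > best_val:
                best_single, best_val = {e: i}, val
        m = max(m, best_val)
        if m <= 0:
            continue
        lo = max(math.ceil(math.log(m, 1 + eps)), 0)
        hi = math.floor(math.log(B * m, 1 + eps))
        for j in range(lo, hi + 1):        # O = {j : m <= (1+eps)^j <= B*m}
            v = (1 + eps) ** j
            s, cost = sols.get(j, {}), costs.get(j, 0.0)
            i_sel = random_select(s, cost, e, f, c, k, B, v, alpha)
            if i_sel is not None:
                s[e] = i_sel
                sols[j], costs[j] = s, cost + c(e, i_sel)
    return max(list(sols.values()) + [best_single], key=f)
```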

Theorem 4

Algorithm 4 is a one-pass streaming algorithm that has \(O\left( \frac{kn}{\epsilon } \log n\right) \) query complexity and \(O\left( \frac{n}{\epsilon } \log n\right) \) space complexity, and

  • If f is monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \left( \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}-\epsilon \right) {\mathsf {opt}}\). The right hand side is maximized to \( \left( \frac{1}{3+\beta -\frac{\beta }{k}} -\epsilon \right) {\mathsf {opt}}\) when \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\).

  • If f is non-monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \left( \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \}-\epsilon \right) {\mathsf {opt}}\). The right hand side is maximized to \(\left( \frac{1}{3+2\beta -\frac{2\beta }{k}}-\epsilon \right) {\mathsf {opt}}\) when \(\alpha = \frac{2}{3+2\beta -\frac{2\beta }{k}} \).

Proof

By Lemma 2, there exists \(j \in \mathbb {Z}_+\) such that \(v=(1+\epsilon )^j \in O\) satisfies \((1-\epsilon ){\mathsf {opt}}\le v \le {\mathsf {opt}}\). Similar to the proof of Theorem 2, we easily obtain the query and space complexities of Algorithm 4. Using arguments similar to the proof of Theorem 3, for the monotone case:

$$\begin{aligned} \mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}v \ge \left( \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}-\epsilon \right) {\mathsf {opt}}\end{aligned}$$

and if \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\), we have:

$$\begin{aligned} \mathbb {E}[f(\mathbf{s })]\ge \frac{v}{3+\beta -\frac{\beta }{k}} \ge \left( \frac{1}{3+\beta -\frac{\beta }{k}}-\epsilon \right) {\mathsf {opt}}\end{aligned}$$

For the non-monotone case we also obtain the proof by applying the same arguments. \(\square \)

Remark 1

In the case of \(\beta =1\), the algorithm returns an approximation ratio of \(\frac{1}{4-1/k}-\epsilon \) when f is monotone and \(\frac{1}{5-2/k}-\epsilon \) when f is non-monotone in expectation (e.g., for \(k=3\) the monotone ratio is \(3/11 \approx 0.273 > 1/4\)). Thus, these approximation ratios are better than those of Algorithm 2, in expectation.

5 Experiments

In this section, we compare the performance of our algorithms with a greedy algorithm in two applications of \(\textsf {BkSM}\), Influence Maximization (IM) with k topics and Sensor Placement, on three major metrics: the value of the objective function, the number of queries, and the running time. Besides, we further show the trade-off between solution quality and the number of queries of the algorithms under various settings of \(\epsilon \).

5.1 The Greedy algorithm

Since existing algorithms for k-submodular maximization problems cannot be applied directly to \(\textsf {BkSM}\), we adapt the Greedy algorithm of Ohsaka and Yoshida (2015), the best current non-streaming algorithm, with some modifications. The Greedy algorithm iteratively adds to the current solution \(\mathbf{s }\) the pair (e, i) that maximizes the marginal gain per cost \(\frac{f(\mathbf{s }\sqcup (e, i)) -f(\mathbf{s })}{c_i(e)}\), until the remaining budget does not allow adding any element. The algorithm has \(O(k^2n^2)\) query complexity. The pseudo-code is presented in Algorithm 5.

[Algorithm 5 pseudocode]
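The following is a minimal Python sketch of this adapted Greedy (our own illustration, with the assumed interface c(e, i) for \(c_i(e)\) and solutions represented as dicts, as in the earlier sketches).

```python
def greedy(V, f, c, k, B):
    s, cost, remaining = {}, 0.0, set(V)
    while True:
        fs = f(s)
        best, best_ratio = None, float("-inf")
        for e in remaining:                       # scan all feasible pairs (e, i)
            for i in range(1, k + 1):
                if cost + c(e, i) > B:
                    continue
                ratio = (f({**s, e: i}) - fs) / c(e, i)
                if ratio > best_ratio:
                    best, best_ratio = (e, i), ratio
        if best is None:                          # budget exhausted
            return s
        e, i = best
        s[e] = i                                  # add the best pair
        cost += c(e, i)
        remaining.discard(e)
```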

5.2 Influence Maximization with k topics subject to a budget constraint

We first recap the information diffusion model, the Linear Threshold (LT) model (Kempe et al. 2003; Nguyen and Thai 2020), and then define the Influence Maximization with k topics subject to a budget constraint (\(\textsf {IMkB}\)) problem under this model.

LT model. In this model, a social network is modeled by a directed graph \(G=(V, E)\), where V and E represent the set of users and the set of links, respectively. Each edge \((u, v)\in E\) is assigned weights \(\{ w^i(u, v) \}_{ i\in [k]}\), where \(w^i(u, v)\) represents the strength of influence from u to v on the i-th topic. Each node \(u \in V\) has an influence threshold for topic i, denoted by \(\theta ^i(u)\), which is chosen uniformly at random in [0, 1]. Given a seed set \(\mathbf{s }=(S_1, S_2, \ldots , S_k) \in (k+1)^V\), the information propagation for topic i happens in discrete steps \(t=0, 1, \ldots \) as follows. At step \(t=0\), all nodes in \(S_i\) become active for topic i. At step \(t\ge 1\), a node u becomes active if \(\sum _{\mathrm{active\, node\, v}} w^i(v, u) \ge \theta ^i(u)\).

The information diffusion process for topic i ends at step t if there is no newly activated node in this step, and the diffusion process of a topic is independent of the other topics. Denote by \(\sigma (\mathbf{s })\) the expected number of nodes that become active in at least one of the k topics after the diffusion process, given a seed k-set \(\mathbf{s }\), i.e.,

$$\begin{aligned} \sigma (\mathbf{s })=\mathbb {E}[|\cup _{i\in [k]} \sigma _i(S_i)|] \end{aligned}$$
(27)

where \(\sigma _i(S_i)\) is a random variable representing the set of active users for topic i with seed \(S_i\). The \(\textsf {IMkB}\) problem is formally defined as follows:

Definition 2

(\(\textsf {IMkB}\) problem) Assume that each user u has a cost \(c_i(u)\) for the i-th topic, which manifests how hard it is to initially influence the respective person for that topic. Given the budget B, the problem asks to find a seed set \(\mathbf{s }\) with \(c(\mathbf{s })\le B\) so that \(\sigma (\mathbf{s })\) is maximized.
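For illustration, \(\sigma (\mathbf{s })\) can be estimated by averaging \(|\cup _{i\in [k]} \sigma _i(S_i)|\) over repeated simulations of the LT process. A minimal Python sketch of one diffusion sample for a single topic follows (our own illustration; the graph representation is an assumption, not the paper's implementation).

```python
import random

def lt_spread(in_neighbors, w_i, seeds):
    """One LT diffusion sample for topic i.
    in_neighbors: dict u -> iterable of in-neighbours of u;
    w_i: dict (v, u) -> weight w^i(v, u); seeds: initially active nodes S_i."""
    theta = {u: random.random() for u in in_neighbors}   # thresholds ~ U[0, 1]
    active, changed = set(seeds), True
    while changed:
        changed = False
        for u in in_neighbors:
            if u in active:
                continue
            weight = sum(w_i.get((v, u), 0.0)
                         for v in in_neighbors[u] if v in active)
            if weight >= theta[u]:                       # u becomes active
                active.add(u)
                changed = True
    return active
```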

Experiment settings. We use the Facebook social network dataset from SNAP (Leskovec and Krevl 2014). The network contains 4,039 nodes and 88,234 edges. The weights \(\{w^i(u, v)\}_{i\in [k]}\) of the edge (u, v) are randomly selected from the set \(\{\frac{1}{k N(v) }, \frac{2}{k N(v)}, \ldots , \frac{k}{k N(v) }\}\) following the recent work (Nguyen and Thai 2020), where N(v) is the in-degree of v.

Since the computation of \(\sigma (\cdot )\) is #P-hard (Chen et al. 2010), we adapt the sampling method in Nguyen and Thai (2020) and Borgs et al. (2014) to obtain an estimate \(\hat{\sigma }(\cdot )\) with a \((\lambda ,\delta )\)-approximation, that is:

$$\begin{aligned} \Pr [(1+\lambda )\sigma (\mathbf{s }) \ge \hat{\sigma }(\mathbf{s }) \ge (1-\lambda )\sigma (\mathbf{s })]\ge 1-\delta \end{aligned}$$
(28)

In the algorithms, we set the parameters \(\lambda =0.5, \delta =0.2\) and \(k=3\) as in Nguyen and Thai (2020). In our algorithms, \(\epsilon \) varies in \(\{0.1, 0.2, 0.3\}\) to show the trade-off between solution quality and the number of queries. The costs of each element e are set in the following two cases:

  • Case 1 (\(\beta =1\)): we set \(c_i(e)=c_j(e)=c(e)\) for all \(i,j \in [k]\), where c(e) is generated under the Normalized Linear model with support [1, 2] following recent works (Nguyen and Zheng 2013; Li et al. 2019), and the budget B varies in \(\{10, 20, 30, 40, 50\}\).

  • Case 2 (general \(\beta \)): \(c_i(u)\) is also generated under the Normalized Linear model with support [1, 2], and the budget B varies in \(\{10, 20, 30, 40, 50\}\). For Algorithm 4, \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\) if f is monotone and \(\alpha = \frac{2}{3+2\beta -\frac{2\beta }{k}}\) if f is non-monotone, with \(\beta =2\).

Experiment results. For the purpose of providing a comprehensive experiment, we divide the experiment into two cases: the special case \(\beta =1\) and the general case. Figure 1 shows the performance of the algorithms for \(\beta =1\). We denote Algorithm x with \(\epsilon =y\) by Alg.x(y). In terms of solution quality (influence spread), our algorithms outperform the Greedy algorithm in most cases. In terms of query complexity, our algorithms outperform the Greedy algorithm by a large gap, except Alg.4(0.1) with \(B=10, 20\); they require up to 10 times fewer queries than the Greedy algorithm.

Fig. 1 Performance of algorithms for \(\textsf {IMkB}\) when \(\beta =1\): a Influence spread, b Number of queries, c Running time

Comparing Algorithm 2 with Algorithm 4, we find that Algorithm 4 provides better solutions than Algorithm 2 for the same value of \(\epsilon \) in most cases. This is consistent with the theoretical analysis and the discussion in Remark 1. However, Algorithm 4 requires more queries and running time than Algorithm 2. This might be because Algorithm 2 reaches the budget B faster than Algorithm 4.

Figure 2 shows the performance of the algorithms in the general case. Algorithm 2 cannot be adapted to \(\textsf {IMkB}\) in this case, so it is not plotted in Fig. 2. Again, our random streaming algorithm gives better-quality solutions and takes fewer queries and less running time than the Greedy algorithm, which is consistent with the results for \(\beta =1\). We now investigate the effect of \(\epsilon \) on the performance of the algorithms and the trade-off between solution quality, number of queries and running time of our streaming algorithms. As \(\epsilon \) increases, our streaming algorithms tend to take fewer queries and less running time but give lower-quality solutions. This is reflected more clearly in the general case in Fig. 2.

Fig. 2 Performance of algorithms for \(\textsf {IMkB}\) in the general case: a Influence spread, b Number of queries, c Running time

5.3 Sensor placement with k types of measures subject to a budget constraint

We further study the performance of the algorithms on the Sensor Placement with k types of measures subject to a budget constraint (\(\textsf {SPkB}\)) problem. In this problem, we have k types of sensors for different measures and a set V of n locations, each of which can be instrumented with only one sensor. Denote by \(X_e^i\) a random variable representing the observation collected by a sensor of type i placed at location e; the information gained by a k-set \(\mathbf{s }\) is

$$\begin{aligned} f(\mathbf{s })=H(\{X_e^{\mathbf{s }(e)}\, |\, e \in supp(\mathbf{s })\}) \end{aligned}$$
(29)

where H is the entropy function. The function f is monotone and k-submodular (Ohsaka and Yoshida 2015). We assume that placing a sensor at a location incurs a cost that depends on the location and the type of sensor. Given the budget B, the \(\textsf {SPkB}\) problem aims to allocate sensors so as to maximize the information gained with total cost at most B.
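For illustration, with discretized readings the objective (29) can be computed as an empirical joint entropy over the chosen (location, sensor type) pairs. A minimal Python sketch follows (our own illustration; the data layout is an assumption).

```python
import math
from collections import Counter

def info_gain(s, readings):
    """s: dict location e -> sensor type i; readings: one dict per time step,
    mapping (location, type) -> discretized value. Returns the empirical
    joint entropy (in bits) of the selected variables; f({}) = 0."""
    keys = list(s.items())                         # the chosen (e, i) pairs
    joint = Counter(tuple(row.get(key) for key in keys) for row in readings)
    total = sum(joint.values())
    return -sum((n / total) * math.log2(n / total) for n in joint.values())
```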

Experiment settings. As in previous works (Ohsaka and Yoshida 2015; Nguyen and Thai 2020), we use the Intel Lab dataset (Bodik et al. 2004). This dataset contains a log of approximately 2.3 million readings collected from 54 sensors deployed in the Intel Berkeley research lab between February 28 and April 5, 2004. Temperature, humidity, and light values are extracted and discretized into bins of 2 degrees Celsius, 5 points, and 100 lux, respectively. Finally, we set \(k=3\), and set \(\epsilon \), the costs, and the budget B as in the \(\textsf {IMkB}\) experiments.

Experiment results. We also conduct this experiment for the same two cases as the previous experiment. Figure 3 shows the performance of the algorithms for the case \(\beta =1\). Differing from the results for \(\textsf {IMkB}\), Greedy gives the best solution quality, but the gap with our algorithms is not significant. Once again, Algorithm 4 is usually better than Algorithm 2; however, it needs more queries and running time than Algorithm 2. The performance of the random streaming algorithm and Greedy in the general case is shown in Fig. 4. Our algorithm performs close to Greedy, but it runs faster and takes 6 times fewer queries than Greedy.

Fig. 3 Performance of algorithms for \(\textsf {SPkB}\) when \(\beta =1\): a Information gain, b Number of queries, c Running time

Fig. 4 Performance of algorithms for \(\textsf {SPkB}\) in the general case: a Information gain, b Number of queries, c Running time

6 Conclusions

This paper studies the \(\textsf {BkSM}\) problem, which generalizes the problem of maximizing a k-submodular function under a size constraint by considering the cost of each element and a limited budget. We propose two single-pass streaming algorithms with provable guarantees. The core of our algorithms is to exploit the relation between the candidate solutions and the optimal solution by analyzing intermediate quantities, using a new probability distribution, and comparing the contribution value (marginal gain per cost) against an appropriate threshold.

In order to investigate the performance of our algorithms in practice, we conducted experiments on two applications, Influence Maximization and Sensor Placement. The experimental results show that our algorithms not only return good solutions in terms of quality but also take a sharply smaller number of queries than the state-of-the-art Greedy algorithm. In the future, we will further investigate k-submodular maximization under individual budget constraints, in which each subset \(S_i\) of the solution has its own budget.