1 Introduction

Maximizing k-submodular functions has attracted a lot of attention because of its potential in solving various combinatorial optimization problems such as influence maximization [8, 9, 11, 12], sensor placement [9, 11, 12], feature selection [14] and information coverage maximization [11]. Given a finite set V and an integer k, we define \([k]=\{1, 2, \ldots , k\}\) and let \((k+1)^V=\{(X_1, X_2, \ldots , X_k)| X_i \subseteq V, \forall i \in [k], X_i \cap X_j =\emptyset , \forall i \ne j\}\) be the family of k-tuples of pairwise disjoint subsets of V. A function \(f: (k+1)^V \mapsto \mathbb {R}_+\) is k-submodular iff for any \(\mathbf{x }=(X_1, X_2, \ldots , X_k)\) and \(\mathbf{y }=(Y_1, Y_2, \ldots , Y_k)\) \(\in (k+1)^V\), we have:

$$\begin{aligned} f(\mathbf{x })+ f(\mathbf{y }) \ge f(\mathbf{x }\sqcap \mathbf{y }) + f(\mathbf{x }\sqcup \mathbf{y }) \end{aligned}$$
(1)

where

$$\mathbf{x }\sqcap \mathbf{y }=(X_1 \cap Y_1, \ldots , X_k \cap Y_k)$$

and

$$\mathbf{x }\sqcup \mathbf{y }=\left( X_1 \cup Y_1 \setminus (\bigcup _{i \ne 1} X_i \cup Y_i), \ldots , X_k \cup Y_k \setminus (\bigcup _{i \ne k} X_i \cup Y_i)\right) $$

Although a k-submodular function can be minimized in polynomial time [16], maximizing a k-submodular function is NP-hard. The study of maximizing a k-submodular function was initiated by Singh et al. [14] for \(k=2\). Ward et al. [17] were the first to study maximization of an unconstrained k-submodular function for general k and devised a deterministic greedy algorithm with an approximation ratio of 1/3. Later on, [6] introduced a random greedy approach which improved the approximation ratio to \(\frac{k}{2k-1}\) by applying a probability distribution under which elements with larger marginal gains are selected with higher probability. The authors in [10] removed the randomness mentioned above, but the number of queries increased to \(O(n^2k^2)\). Unconstrained k-submodular maximization was further studied in online settings in [15].

Under the size constraint, Ohsaka et al. [9] first proposed a 1/2-approximation greedy algorithm for maximizing monotone k-submodular functions. [13] showed that a greedy selection gives an approximation ratio of 1/2 under a matroid constraint. The authors in [11] then further proposed multi-objective evolutionary algorithms that provide a 1/2-approximation ratio under the size constraint but take \(O(kn\log ^2B)\) queries in expectation. Recently, Nguyen et al. [12] considered the k-submodular maximization problem subject to a total size constraint under noise and devised two streaming algorithms which provide approximation ratios of \(O(\epsilon (1-\epsilon )^{-2}B)\) when f is monotone and \(O(\epsilon (1-\epsilon )^{-3}B)\) when f is non-monotone.

Although there have been many attempts to solve the problem of maximizing a k-submodular function under several kinds of constraints, they do not cover several cases that happen frequently in reality, in which each element may have its own cost for joining each set, or the problem is given only a limited budget. Let us consider the following application:

Influence Maximization with k Topics. Consider a social network under an information diffusion model with k topics. Each user has a cost to start the influence under a topic, which reflects how hard it is to initially influence the respective person. Given a budget B, we consider the problem of finding a set of users (seed set), each of whom initially adopts a topic, with total cost at most B, so as to maximize the expected number of users who are eventually activated by at least one topic. In this application, the objective function, the expected number of influenced users, is k-submodular, where each user corresponds to an element of the set V [8, 9, 12].

Motivated by that observation, in this work, we study a novel problem named Budgeted k-submodular maximization (\(\textsf {BkSM}\)), defined as follows:

Definition 1

Given a finite set V, a budget B and a k-submodular function \(f: (k+1)^V \mapsto \mathbb {R}_+\), where each element \(e \in V\) has a cost \(c_i(e)>0\) to be put into \(S_i\), the problem asks to find a solution \(\mathbf{s }=(S_1, S_2, \ldots , S_k)\) with total cost \(c(\mathbf{s })=\sum _{i \in [k]}\sum _{e \in S_i}c_i(e)\le B\) so that \(f(\mathbf{s })\) is maximized.

In addition, constantly growing input data may be impossible to store entirely in computer memory. It is therefore critical to devise streaming algorithms which not only reduce the required memory but are also able to produce guaranteed solutions within a single pass or a few passes. Although streaming is an efficient approach to submodular maximization under various kinds of constraints, such as the cardinality constraint [1, 3, 7, 18], knapsack constraint [5], k-set constraint [4] and matroid constraint [2], it cannot be applied directly to our \(\textsf {BkSM}\) problem due to intrinsic differences between submodularity and k-submodularity.

Our Contributions. In this paper we propose several algorithms which provide theoretical bounds for \(\textsf {BkSM}\). Overall, our contributions are as follows:

  • For a special case when every element has the same cost to be added to any i-th set, we first propose a deterministic streaming algorithm (Algorithm 2) which runs in a single pass, has \(O( \frac{kn}{\epsilon }\log B)\) query complexity, \(O(\frac{B}{\epsilon } \log B)\) space complexity and returns an approximation ratio of \(\frac{1}{4}-\epsilon \) when f is monotone and \(\frac{1}{5}-\epsilon \) when f is non-monotone for any input value of \(\epsilon >0\).

  • For the general case, we propose a random streaming algorithm (Algorithm 4) which runs in a single pass, has \(O( \frac{kn}{\epsilon }\log B)\) query complexity, \(O(\frac{B}{\epsilon } \log B)\) space complexity and returns an approximation ratio of \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}-\epsilon \) when f is monotone and \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \}-\epsilon \) when f is non-monotone in expectation where \(\beta =\max _{e\in V, i , j \in [k], i\ne j} \frac{c_i(e)}{c_j(e)}\) and \(\alpha \in (0,1], \epsilon \in (0,1)\) are inputs.

Our algorithms are inspired by [1, 5]: we also make decisions sequentially, based on the marginal gain of the objective function per cost of each element, and guess the optimal value through the maximum singleton value. In addition, we introduce a new probability distribution for selecting the position at which a new element is added to a candidate solution.

Organization. The rest of the paper is organized as follows: The notations and properties of k-submodular functions are presented in Sect. 2. Sections 3 and 4 present our algorithms and their theoretical analysis. Finally, we conclude this work in Sect. 5.

2 Preliminaries

Given a finite set V and an integer k, denote \([k]=\{1, 2, \ldots , k\}\) and let \((k+1)^V=\{(X_1, X_2, \ldots , X_k)| X_i \subseteq V, \forall i \in [k], X_i \cap X_j =\emptyset , \forall i \ne j\}\) be the family of k-tuples of pairwise disjoint subsets of V; each such tuple is called a k-set. We define \(supp_i(\mathbf{x })=X_i\) and \(supp(\mathbf{x })=\cup _{i\in [k]}X_i\); \(X_i\) is called the i-th set of \(\mathbf{x }\), and the empty k-set is \(\mathbf{0 }=(\emptyset , \ldots , \emptyset )\).

For \(\mathbf{x }=(X_1, X_2, \ldots , X_k)\) and \(\mathbf{y }=(Y_1, Y_2, \ldots , Y_k) \in (k+1)^V\), if \(e \in X_i\) we write \(\mathbf{x }(e)=i\) and call i the position of e; if \(e \notin \cup _{i \in [k]} X_i\), we write \(\mathbf{x }(e)=0\). Adding \(e \notin supp(\mathbf{x })\) into \(X_i\) is represented by \(\mathbf{x }\sqcup (e, i) \). In the case \(X_i=\{e\}\) and \(X_j= \emptyset , \forall j\ne i\), we denote \(\mathbf{x }\) simply as \((e, i)\). We write \(\mathbf{x }\sqsubseteq \mathbf{y }\) iff \(X_i \subseteq Y_i\) for all \(i\in [k]\).

A function \(f: (k+1)^V \mapsto \mathbb {R}\) is k-submodular iff for any \(\mathbf{x }=(X_1, X_2, \ldots , X_k)\) and \(\mathbf{y }=(Y_1, Y_2, \ldots , Y_k)\) \(\in (k+1)^V\), we have:

$$\begin{aligned} f(\mathbf{x })+ f(\mathbf{y }) \ge f(\mathbf{x }\sqcap \mathbf{y }) + f(\mathbf{x }\sqcup \mathbf{y }) \end{aligned}$$
(2)

where

$$\mathbf{x }\sqcap \mathbf{y }=(X_1 \cap Y_1, \ldots , X_k \cap Y_k)$$

and

$$\mathbf{x }\sqcup \mathbf{y }=\left( X_1 \cup Y_1 \setminus (\bigcup _{i \ne 1} X_i \cup Y_i), \ldots , X_k \cup Y_k \setminus (\bigcup _{i \ne k} X_i \cup Y_i)\right) $$
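To make these operations concrete, here is a minimal Python sketch (our illustration, not part of the formal development): a k-set is encoded as a dictionary mapping each assigned element to its position in \([k]\), and \(\sqcap \) and \(\sqcup \) follow the definitions above.

```python
def meet(x, y):
    """x ⊓ y: e keeps position i only if x(e) = y(e) = i."""
    return {e: i for e, i in x.items() if y.get(e) == i}

def join(x, y):
    """x ⊔ y: e is kept at position i if it occurs at i in x or y
    and at no other position; conflicting assignments cancel out."""
    z = {}
    for e in set(x) | set(y):
        positions = {p for p in (x.get(e), y.get(e)) if p is not None}
        if len(positions) == 1:
            z[e] = positions.pop()
    return z

# Example with k = 2: e1 agrees, e2 conflicts, e3 appears once.
x = {'e1': 1, 'e2': 1, 'e3': 2}
y = {'e1': 1, 'e2': 2}
assert meet(x, y) == {'e1': 1}
assert join(x, y) == {'e1': 1, 'e3': 2}
```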

A function f is monotone iff for any \(\mathbf{x }\in (k+1)^V, e\notin supp(\mathbf{x })\) and \(i \in [k]\), we have

$$\begin{aligned} \varDelta _{e, i } f(\mathbf{x })=f(X_1, \ldots , X_{i-1}, X_i \cup \{e\}, X_{i+1}, \ldots , X_k) -f(X_1, \ldots , X_k) \ge 0 \end{aligned}$$
(3)

From [17], the k-submodularity of f implies the orthant submodularity, i.e.,

$$\begin{aligned} \varDelta _{e, i }f(\mathbf{x }) \ge \varDelta _{e, i } f(\mathbf{y }) \end{aligned}$$
(4)

and the pairwise monotonicity, i.e.,

$$\begin{aligned} \varDelta _{e, i }f(\mathbf{x }) + \varDelta _{e, j }f(\mathbf{x }) \ge 0 \end{aligned}$$
(5)

for any \(\mathbf{x }, \mathbf{y }\in (k+1)^V\) with \(\mathbf{x }\sqsubseteq \mathbf{y }\), \(e \notin supp(\mathbf{y })\) and \(i, j \in [k]\) with \(i \ne j\).

In this paper, we assume that f is normalized, i.e., \(f(\mathbf{0 })=0\). Each element e has a cost \(c_i(e)\) to be added into the i-th set of a solution, and the total cost of a k-set \(\mathbf{x }\) is

$$c(\mathbf{x })=\sum _{i \in [k], e \in supp_i(\mathbf{x })} c_i(e)$$

We define \(\beta \) as the largest ratio of different costs of an element, i.e.,

$$\beta =\max _{e\in V, i \ne j}\frac{c_i(e)}{c_j(e)}$$

Without loss of generality, throughout this paper we assume that every element e satisfies \(1 \le c_i(e)\le B, \forall i \in [k]\); otherwise we can simply remove it. We only consider \(k\ge 2\) because for \(k = 1\) a k-submodular function reduces to a submodular function.
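The cost notions above translate directly into code; a minimal sketch, assuming costs are stored as a nested dictionary c[e][i] indexed by element and position (our representation):

```python
def total_cost(x, c):
    """c(x): sum of c_i(e) over all assigned elements of the k-set x."""
    return sum(c[e][i] for e, i in x.items())

def beta(V, c, k):
    """beta: the largest ratio c_i(e) / c_j(e) over e in V and i != j."""
    return max(c[e][i] / c[e][j]
               for e in V
               for i in range(1, k + 1)
               for j in range(1, k + 1) if i != j)
```

With equal costs across all positions, beta returns 1, which is exactly the special case treated in Sect. 3.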

3 Deterministic Streaming Algorithm When \(\beta =1\)

In this section, we introduce a deterministic streaming algorithm for the special case \(\beta =1\), i.e., each element has the same cost for every i-th set: \(c_i(e)=c_j(e), \forall e \in V, i \ne j\). For simplicity, we write \(c(e)=c_i(e)=c_j(e)\).

The main idea of our algorithms is that (1) we decide on each observed element e by comparing the ratio of f per total cost at the current solution against a threshold set in advance, and (2) we use the maximum singleton \((e_{max}, i_{max})\), defined as

$$\begin{aligned} (e_{max}, i_{max})=\arg \max _{e\in V, i \in [k]} f((e, i)) \end{aligned}$$
(6)

to obtain the final solution. We first assume that the optimal value is known and then remove this assumption by using the method in [1] to approximate the optimal value.

3.1 Deterministic Streaming Algorithm with Known Optimal Solution

We first present a simplified version of our deterministic streaming algorithm when the optimal value is known. Denote by \(\mathbf{o} \) an optimal solution and \({\mathsf {opt}}=f(\mathbf{o} )\); the algorithm receives v such that \(v\le {\mathsf {opt}}\) and a parameter \(\alpha \in (0, 1]\) as inputs. The roles of these parameters will be clarified in the main version. The details of the algorithm are fully presented in Algorithm 1. We define the notations as follows:

  • \((e^j, i^j)\) as the j-th element and its position added in the main loop of the algorithm;

  • \(\mathbf{s }^j\) - the solution after j elements have been added in the main loop of the algorithm;

  • \(\mathbf{o} ^j=(\mathbf{o} \sqcup \mathbf{s }^j ) \sqcup \mathbf{s }^j\);

  • \(\mathbf{o} ^{j-1/2}=(\mathbf{o} \sqcup \mathbf{s }^j ) \sqcup \mathbf{s }^{j-1}\);

  • \(\mathbf{s }^{j-1/2}\): If \(e^j \in supp(\mathbf{o} )\), then \(\mathbf{s }^{j-1/2}=\mathbf{s }^{j-1} \sqcup (e^j, \mathbf{o} (e^j)) \). If \(e^j \notin supp(\mathbf{o} )\), \(\mathbf{s }^{j-1/2}=\mathbf{s }^{j-1}\);

  • \(\mathbf{u }^t=\{(u_1, j_1), (u_2, j_2), \ldots , (u_r,j_r) \}\) - the set of elements that are in \(\mathbf{o} ^t\) but not in \(\mathbf{s }^t\), where \(r=|supp(\mathbf{u }^t)|\);

  • \(\mathbf{u }^t_i=\mathbf{s }^t \sqcup \{(u_1, j_1), (u_2, j_2), \ldots , (u_i,j_i) \}\)

The algorithm initiates a candidate solution \(\mathbf{s }^0\) as an empty k-set. For each new incoming element e, the algorithm updates the tuple \((e_{max}, i_{max})\) to maintain the maximum singleton, then checks whether the total cost \(c(\mathbf{s }^t) + c(e)\) exceeds B. If it does not, the algorithm finds a position \(i' \in [k]\) at which \(f(\mathbf{s }^t \sqcup (e, i'))\) is maximal and adds \((e, i')\) into \(\mathbf{s }^t\) if \(\frac{f(\mathbf{s }^t \sqcup (e, i'))}{c(\mathbf{s }^t) + c (e)}\ge \frac{\alpha v }{B} \). Otherwise, it ignores e and receives the next element. This step lets the algorithm select elements with a high marginal value per cost and eliminate bad ones.

After finishing the main loop, the algorithm returns the best solution in \( \{\mathbf{s }^{t} \} \cup \{(e_{max}, i_{max})\}\) when f is monotone, or the best solution in \(\{\mathbf{s }{^j}: j \le t\} \cup \{(e_{max}, i_{max})\}\) when f is non-monotone. We now analyze the approximation guarantee of Algorithm 1. Denote by \(e^t\) the last element added in the main loop of Algorithm 1. By exploiting the relation among \(\mathbf{o} \), \(\mathbf{o} ^j\) and \(\mathbf{s }^j, j \le t\), we obtain the following lemma.

[Algorithm 1 pseudocode]
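To complement the pseudocode of Algorithm 1, the following Python sketch reconstructs the main loop from the description above. The oracle f over dictionary-encoded k-sets (as in the sketch of Sect. 2), the cost map c and all identifiers are our illustrative assumptions, and the bookkeeping of all prefixes \(\mathbf{s }^j\) for the non-monotone case is compressed into tracking the best prefix seen so far.

```python
def algorithm1(stream, f, c, B, k, v, alpha, monotone=True):
    """Single-pass deterministic sketch for BkSM with beta = 1 and known v <= opt.

    Assumes a non-empty stream; f maps a dict {element: position} to a float.
    """
    s, cost_s = {}, 0.0
    best_prefix, best_val = {}, 0.0      # stands in for {s^j : j <= t}
    e_max, i_max, m = None, None, 0.0
    for e in stream:
        for i in range(1, k + 1):        # maintain the maximum singleton
            if f({e: i}) >= m:
                e_max, i_max, m = e, i, f({e: i})
        if cost_s + c[e] > B:
            continue                     # adding e would exceed the budget
        i_best = max(range(1, k + 1), key=lambda i: f({**s, e: i}))
        # density threshold: f(s ⊔ (e, i')) per total cost vs. alpha * v / B
        if f({**s, e: i_best}) / (cost_s + c[e]) >= alpha * v / B:
            s = {**s, e: i_best}
            cost_s += c[e]
            if f(s) > best_val:
                best_prefix, best_val = dict(s), f(s)
    candidates = [s] if monotone else [best_prefix]
    candidates.append({e_max: i_max})
    return max(candidates, key=f)
```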

Lemma 1

If f is monotone then \(v-f(\mathbf{o} ^t) \le f(\mathbf{s }^t)\) and if f is non-monotone then \(v-f(\mathbf{o} ^t) \le 2f(\mathbf{s }^t) \).

Due to space constraints, we omit some proofs and present them in the full version of this paper. Lemma 1 plays an important role in analyzing the approximation ratio of the algorithm, which is stated in the following theorem.

Theorem 1

Algorithm 1 is a single pass streaming algorithm and returns a solution \(\mathbf{s }\) satisfying:

  • If f is monotone, \(f(\mathbf{s }) \ge \min \{\frac{\alpha }{2}, \frac{1-\alpha }{2} \}v\); the bound is maximized to \( \frac{v}{4}\) by balancing the two terms at \(\alpha = \frac{1}{2}\).

  • If f is non-monotone, \(f(\mathbf{s }) \ge \min \{\frac{\alpha }{2}, \frac{ 1-\alpha }{3} \}v\); the bound is maximized to \( \frac{v}{5}\) at \(\alpha = \frac{2}{5}\).

3.2 Deterministic Streaming Algorithm

We present our deterministic streaming algorithm for the case \(\beta =1\), which reuses the framework of Algorithm 1 but removes the assumption that \({\mathsf {opt}}\) is known. We use the dynamic update method in [1] to obtain a good approximation of \({\mathsf {opt}}\).

To be specific, denote \(m=\max _{e \in V, i\in [k]}f((e, i))\); we have \(m \le {\mathsf {opt}}\le Bm \). Therefore we use the values \(v=(1+\epsilon ')^j\) for \(\{j| m \le (1+\epsilon ')^j \le B m, j\in \mathbb {Z}_+ \}\) to guess \({\mathsf {opt}}\), by showing that there exists v such that \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\). However, finding m would itself require at least one pass over V. Therefore, we adapt the dynamic update method in [1], which updates \(m=\max \{m , \max _{i \in [k]}f((e, i))\}\) for each already observed element e to determine the range of guessed optimal values. This method helps the algorithm maintain a good estimation of the optimal value as that range shifts forward when further elements are observed. We implement this method by using variables \(\mathbf{s }^{t_j}\) and \(t_j\) to store a candidate solution and the number of its elements, where \(v=(1+\epsilon ')^j\) is the corresponding guessed value of \({\mathsf {opt}}\).

We set the value of \(\alpha \) according to Theorem 1, which provides the best approximation guarantees. The value of \(\epsilon '\) is set several times higher than \(\epsilon \) to reduce the complexity while still ensuring the approximation ratios. The details of our algorithm are presented in Algorithm 2.

[Algorithm 2 pseudocode]
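To complement Algorithm 2, a sketch of the guessing scaffold under the same assumptions: thresholds \(v=(1+\epsilon ')^j\) are kept only for the live range \([m, Bm]\), and each live guess maintains its own candidate solution under the selection rule of Algorithm 1 (monotone case shown; eps plays the role of \(\epsilon '\)).

```python
import math

def algorithm2(stream, f, c, B, k, eps, alpha):
    """Single-pass sketch of Algorithm 1 with opt guessed dynamically as in [1]."""
    m, e_max, i_max = 0.0, None, None
    sols = {}                              # j -> (candidate s^{t_j}, its cost)
    for e in stream:
        for i in range(1, k + 1):          # dynamic update of the max singleton
            if f({e: i}) >= m:
                m, e_max, i_max = f({e: i}), e, i
        if m <= 0:
            continue
        lo = math.ceil(math.log(m, 1 + eps))
        hi = math.floor(math.log(B * m, 1 + eps))
        # keep only guesses in the live range; stale ones are discarded
        sols = {j: sols.get(j, ({}, 0.0)) for j in range(lo, hi + 1)}
        for j in sols:
            s, cost_s = sols[j]
            if cost_s + c[e] > B:
                continue
            v = (1 + eps) ** j
            i_best = max(range(1, k + 1), key=lambda i: f({**s, e: i}))
            if f({**s, e: i_best}) / (cost_s + c[e]) >= alpha * v / B:
                sols[j] = ({**s, e: i_best}, cost_s + c[e])
    candidates = [s for s, _ in sols.values()] + [{e_max: i_max}]
    return max(candidates, key=f)
```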

Lemma 2

In Algorithm 2, there exists a number \(j \in \mathbb {Z}_+\) so that \(v=(1+\epsilon ')^j \in O\) satisfies \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\).

Proof

Denote \(m=f((e_{max},i_{max}))\). Due to the k-submodularity of f, we have

$$ m \le {\mathsf {opt}}=f(\mathbf{o} )\le \sum _{e \in supp(\mathbf{o} )}f((e, \mathbf{o} (e))) \le B m $$

Let \(j=\lfloor \log _{1+\epsilon '} {\mathsf {opt}}\rfloor \); we have \(v=(1+\epsilon ')^j \le {\mathsf {opt}}\le Bm \) and \( v \ge (1+\epsilon ')^{\log _{1+\epsilon '}({\mathsf {opt}}) -1 }=\frac{{\mathsf {opt}}}{1+\epsilon '} \ge {\mathsf {opt}}(1-\epsilon ') \).    \(\square \)

The performance of Algorithm 2 is stated in the following theorem.

Theorem 2

Algorithm 2 is a single pass streaming algorithm that has \(O(\frac{kn}{\epsilon } \log B)\) query complexity, \(O(\frac{B}{\epsilon } \log B)\) space complexity and provides an approximation ratio of \(\frac{1}{4}-\epsilon \) when f is monotone and \(\frac{1}{5}-\epsilon \) when f is non-monotone.

Proof

The size of O is at most \(\frac{1}{\epsilon '}\log B\), finding each \(\mathbf{s }^{t_j}\) takes at most O(kn) queries, and \(\mathbf{s }^{t_j}\) contains at most B elements. Therefore, the query complexity is \(O(\frac{kn}{\epsilon } \log B)\) and the total space complexity is \(O(\frac{B}{\epsilon } \log B)\).

By Lemma 2, there exists an integer \(j \in \mathbb {Z}_+\) so that \(v=(1+\epsilon ')^j \in O\) satisfies \((1-\epsilon '){\mathsf {opt}}\le v \le {\mathsf {opt}}\). Applying Theorem 1, for the monotone case we have: \( f(\mathbf{s }) \ge \frac{1}{4} v \ge \frac{1}{4}(1-\epsilon '){\mathsf {opt}}= (\frac{1}{4} -\epsilon ){\mathsf {opt}}\), and for the non-monotone case: \( f(\mathbf{s }) \ge \frac{1}{5}v \ge \frac{1}{5}(1-\epsilon '){\mathsf {opt}}= (\frac{1}{5} -\epsilon ){\mathsf {opt}}\). Hence, the theorem is proved.    \(\square \)

4 Random Streaming Algorithm for General Case

In the general case, each element e may have different costs \(c_i(e)\) for different i-th sets, and we cannot apply the previous algorithms. Therefore, in this section we introduce a one-pass streaming algorithm which provides approximation ratios in expectation for the \(\textsf {BkSM}\) problem.

At the core of our algorithm, we introduce a new probability distribution for choosing a position for each element; it allows us to establish the relationship among \(\mathbf{o} \), \(\mathbf{o} ^j\) and \(\mathbf{s }^j\) (Lemma 3) and to analyze the performance of our algorithm. Besides, we also use a predefined threshold to filter high-value elements into candidate solutions and the maximum singleton value to form the final solution. Similar to the previous section, we first introduce a simplified version of the streaming algorithm in which the optimal value is known in advance.

4.1 Random Algorithm with Known Optimal Solution

This algorithm also receives as inputs \(\alpha \in (0, 1)\) and v such that \(v\le {\mathsf {opt}}\). We use the same notations as in Sect. 3, and this algorithm also requires one pass over V. The algorithm initiates an empty k-set \(\mathbf{s }^0\) and subsequently updates the solution while passing over V. Differently from Algorithm 1, for each observed element \(e \in V\), the algorithm finds the collection J of positions at which the total cost remains at most B and the ratio of the increment of the objective function per cost is at least a given threshold, i.e.,

$$\begin{aligned} J=\left\{ i\in [k]:c(\mathbf{s }^t) + c_i(e) \le B \ \text{ and } \ \frac{f(\mathbf{s }^t \sqcup (e, i))-f(\mathbf{s }^t)}{ c_i(e)}\ge \frac{\alpha v }{B}\right\} \end{aligned}$$
(7)

These constraints help the algorithm eliminate positions with a low increment of the objective function relative to their cost. If \(J\ne \emptyset \), the algorithm puts e into the i-th set of \(\mathbf{s }^t\) with probability:

$$\begin{aligned} \frac{p_i^{|J|-1}}{T}=\frac{(\frac{f(\mathbf{s }^t \sqcup (e, i))-f(\mathbf{s }^t)}{c_i(e)})^{|J|-1}}{\sum _{i \in J}(\frac{f(\mathbf{s }^t \sqcup (e, i))-f(\mathbf{s }^t)}{c_i(e)})^{|J|-1}} \end{aligned}$$
(8)

Simultaneously, the algorithm finds the maximum singleton \((e_{max}, i_{max})\) by updating the current maximal value over the set of observed elements. As in Algorithm 1, the algorithm uses \((e_{max}, i_{max})\) as one of the candidate solutions and returns the best among them. The full details of this algorithm are described in Algorithm 3.

[Algorithm 3 pseudocode]
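To illustrate the randomized rule, the fragment below sketches a single element update of Algorithm 3: it builds J as in (7) and samples a position with the weights of (8); random.choices and the helper's name are our choices.

```python
import random

def random_step(s, cost_s, e, f, c, B, k, v, alpha):
    """One-element update: returns the (possibly unchanged) solution and cost."""
    base = f(s)
    gain = {i: f({**s, e: i}) - base for i in range(1, k + 1)}
    J = [i for i in range(1, k + 1)
         if cost_s + c[e][i] <= B and gain[i] / c[e][i] >= alpha * v / B]
    if not J:
        return s, cost_s                 # e is rejected for this solution
    # distribution (8): weight (gain/cost)^(|J|-1), normalized over J
    weights = [(gain[i] / c[e][i]) ** (len(J) - 1) for i in J]
    i = random.choices(J, weights=weights, k=1)[0]
    return {**s, e: i}, cost_s + c[e][i]
```

When \(|J|=1\) the weight equals 1 and the single feasible position is chosen deterministically; the exponent \(|J|-1\) biases the sample toward positions with larger gain per cost as more positions become feasible.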

Lemma 3 provides the relationship among \(\mathbf{o} , \mathbf{o} ^j\) and \(\mathbf{s }^j, j\le t\), which plays an important role in analyzing the algorithm's performance.

Lemma 3

In Algorithm 3, suppose there is no pair \((e, i) \in \mathbf{o} \) and \(j \in [t]\) with \(e \notin supp(\mathbf{s }^j)\) such that \(\frac{f(\mathbf{s }^j \sqcup (e, i))}{c(\mathbf{s }^j) + c_i (e)}\ge \frac{\alpha v }{B} \ \ \text{ and } \ \ c(\mathbf{s }^j) +c_i(e) >B\). Then:

  • If f is monotone, then

    $$f(\mathbf{o} ^{j-1})-\mathbb {E}[f(\mathbf{o} ^{j})] \le \beta (1-\frac{1}{k})(\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{\alpha v c_{j^*}(e^j)}{kB}$$
  • If f is non-monotone, then

    $$f(\mathbf{o} ^{j-1})-\mathbb {E}[f(\mathbf{o} ^j)] \le 2\beta (1-\frac{1}{k})(\mathbb {E}[f(\mathbf{s }^j)] - f(\mathbf{s }^{j-1})) + \frac{2\alpha v c_{j^*}(e^j)}{kB}$$

Theorem 3

Algorithm 3 returns a solution \(\mathbf{s }\) satisfying

  • If f is monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}v\); the bound is maximized to \( \frac{v}{3+\beta -\frac{\beta }{k}} \) when \(\alpha =\frac{2}{3+\beta -\frac{\beta }{k}}\).

  • If f is non-monotone, \(\mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \}v\); the bound is maximized to \( \frac{v}{3+2\beta -\frac{2\beta }{k}}\) when \(\alpha = \frac{2}{3+2\beta -\frac{2\beta }{k}} \).

4.2 Random Streaming Algorithm

In this section we remove the assumption that the optimal value is known and present our random streaming algorithm, which reuses the framework of Algorithm 3.

Similar to Algorithm 2, we use the method in [1] to estimate \({\mathsf {opt}}\). We assume that \(\beta \) is known in advance; this is feasible because \(\beta \) can be computed in O(kn) time. We set \(\alpha \) according to the properties of f to obtain the best performance of the algorithm. The algorithm continuously updates \(O \leftarrow \{j| f((e_{max}, i_{max}))\le (1+\epsilon )^j \le B f((e_{max}, i_{max})), j\in \mathbb {Z}_+ \}\) to estimate the range of guessed optimal values, and it uses \(\mathbf{s }^{t_j}\) and \(t_j\) to store candidate solutions, which are updated by the probability distribution of Algorithm 3 with \((1+\epsilon )^j\) as the estimate of the optimal value. The algorithm finally compares all candidate solutions and selects the best one. The details are presented in Algorithm 4.

[Algorithm 4 pseudocode]
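Algorithm 4 can then be outlined by plugging the random_step fragment above into the guessing scaffold sketched for Algorithm 2; again a minimal sketch under the same assumptions (math imported as before).

```python
def algorithm4(stream, f, c, B, k, eps, alpha):
    """One-pass random streaming sketch for general costs."""
    m, e_max, i_max = 0.0, None, None
    sols = {}                              # j -> (s^{t_j}, its cost)
    for e in stream:
        for i in range(1, k + 1):          # dynamic update of the max singleton
            if f({e: i}) >= m:
                m, e_max, i_max = f({e: i}), e, i
        if m <= 0:
            continue
        lo = math.ceil(math.log(m, 1 + eps))
        hi = math.floor(math.log(B * m, 1 + eps))
        sols = {j: sols.get(j, ({}, 0.0)) for j in range(lo, hi + 1)}
        for j in sols:                     # one randomized update per live guess
            s, cost_s = sols[j]
            sols[j] = random_step(s, cost_s, e, f, c, B, k,
                                  (1 + eps) ** j, alpha)
    candidates = [s for s, _ in sols.values()] + [{e_max: i_max}]
    return max(candidates, key=f)
```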

Theorem 4

Algorithm 4 is one pass streaming algorithm that has \(O(\frac{kn}{\epsilon } \log B)\) query complexity, \(O(\frac{B}{\epsilon } \log B)\) space complexity and provides an approximation ratio of \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \} -\epsilon \) when f is monotone and \(\min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+2\beta )k-2\beta } \} -\epsilon \) when f is non-monotone in expectation.

Proof

By Lemma 2, there exists \(j \in \mathbb {Z}_+\) such that \(v=(1+\epsilon )^j \in O\) satisfies \((1-\epsilon ){\mathsf {opt}}\le v \le {\mathsf {opt}}\). Using arguments similar to those in the proof of Theorem 3, for the monotone case we have

$$ \mathbb {E}[f(\mathbf{s })] \ge \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}v \ge \left( \min \{\frac{\alpha }{2}, \frac{(1-\alpha )k}{(1+\beta )k-\beta } \}-\epsilon \right) {\mathsf {opt}}$$

For the non-monotone case, the proof follows by applying the same arguments.    \(\square \)

5 Conclusions

This paper studies \(\textsf {BkSM}\), a generalized version of the k-submodular maximization problem. In order to solve it, we propose several streaming algorithms with provable guarantees. The core of our algorithms is to exploit the relation between candidate solutions and the optimal solution by analyzing intermediate quantities and to apply a new probability distribution for selecting elements with high contributions to the current solution. In the future we are going to conduct experiments on some instances of \(\textsf {BkSM}\) to show the performance of our algorithms in practice.