1 Introduction

In the set cover problem, there is a finite set \(E=\{e_{1},e_{2}, \ldots ,e_{n}\}\) (called the universe) and a collection of subsets of E (denoted by \(\mathbb {S}\)) whose union equals the universe. The aim is to find a subcollection of \(\mathbb {S}\) that covers the universe at minimum cost. This problem is NP-hard (Karp 1972), which means that no polynomial-time exact algorithm exists under the assumption P\(\ne \)NP. Cover problems arise widely in applications such as airline crew scheduling, circuit design, and vehicle routing.
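For intuition, the classical greedy heuristic for weighted set cover repeatedly picks the set with the smallest cost per newly covered element; this is the heuristic behind the \(H(n)\) guarantee of Johnson (1974) mentioned later. The following minimal Python sketch runs it on a toy instance; the instance data and function name are invented for illustration and are not part of the original formulation.

```python
def greedy_set_cover(universe, sets, cost):
    """Repeatedly pick the set minimizing cost per newly covered
    element -- the classical greedy heuristic for weighted set cover."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # best cost / (newly covered elements) ratio among useful sets
        best = min(
            (S for S in sets if uncovered & sets[S]),
            key=lambda S: cost[S] / len(uncovered & sets[S]),
        )
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical toy instance.
sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 5}}
cost = {"A": 3.0, "B": 1.0, "C": 1.0, "D": 2.0}
picked = greedy_set_cover({1, 2, 3, 4, 5}, sets, cost)
```

On this instance the greedy solution covers the whole universe, though it need not be optimal, reflecting the logarithmic (rather than constant) approximation guarantee.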

A relevant variant of set cover is the partial set cover problem, introduced by Kearns (1990). Here each set \(S\in \mathbb {S}\) is associated with a nonnegative cost c(S), and a nonnegative integer k is given. The goal is to select a minimum-cost family of sets in \(\mathbb {S}\) covering at least k elements. Another common variant is the set cover problem with penalties. In this variant, a subset \(T\subseteq E\) may be left uncovered by paying an additional cost h(T) (called the penalty), where \(h: 2^{E}\rightarrow \mathbb { R_{+}}\) is a nonnegative function. The goal is to select a subcollection of \(\mathbb {S}\) covering some elements of E such that the total cost of the selected subcollection plus the penalty of the uncovered elements is minimized.

Set cover and its variants are classic problems in combinatorial optimization (Hochbaum 1982). The deterministic set cover problem is one of the earliest known NP-hard problems. Johnson (1974) designed an \(O(\log n)\)-approximation algorithm for it. Bellare et al. (1994) proved that no constant-factor polynomial-time approximation algorithm exists for this problem unless P=NP. Raz and Safra (1997) proposed a stronger inapproximability result: for some constant c, there is no approximation algorithm with ratio less than \(c\ln n\) unless P=NP. Arora and Sudan (2003) further proved that it is NP-hard to approximate the problem within a factor of \(\varOmega (\log n)\). In some variants of the set cover problem, such as the partial set cover problem (Gandhi et al. 2004), the prize-collecting set cover problem (Könemann et al. 2011) and the generalized set cover problem (Könemann et al. 2011), the constraint that all elements of the universe be covered is removed or relaxed, i.e., some elements may remain uncovered. For the partial set cover problem, Slavik (1997) designed an \(H(\varDelta )\)-approximation algorithm, where \(\varDelta =\max \{|S|:\;S\in \mathbb {S}\}\). Gandhi et al. (2004) proposed an \(\eta \)-approximation algorithm, where \(\eta \) is the maximum frequency of an element of E in the family of subsets. Ravic and Sinhac (2006) showed that the stochastic set cover problem is no harder to approximate than its deterministic counterpart. Further, Li and Liu (2016) designed an approximation algorithm with ratio \(2(\ln n+1)\), where n is the cardinality of the universe. Parthasarathy (2018) designed an adaptive greedy algorithm with ratio H(n) for the stochastic set cover problem.

Facility location problems have also been widely studied since the early 1960s. The facility location problem is NP-hard, which implies that no polynomial-time exact algorithm is expected to exist unless P=NP. It can be formulated as follows: given a graph \(G = (V,E)\), a set of facilities \(\mathcal {F}\subseteq V\) with opening costs, and a set of demand nodes \(\mathcal {D} \subseteq V\), the aim is to open some facilities in \(\mathcal {F}\) and assign each demand node to one of the open facilities such that the sum of the facility opening costs and the connection costs is minimized. To some extent, the facility location problem can also be seen as a cover problem: find a set of facilities that covers (serves) the customers at minimum cost.

Today, many applications take place in environments where the open facilities also need to communicate with each other. For instance, the facilities could be caches or file servers that must communicate to keep their data consistent, and the clients could be users or processes requesting data items. Telecommunication network design is one such example. It involves selecting a subset of core nodes, connecting the selected core nodes, and routing communication from terminal nodes to the selected core nodes. Here, the clients are the terminal nodes of the network and the facilities are the core nodes. The opening cost of a facility corresponds to the switching cost of the corresponding core node. This setting requires the open facilities to be connected to each other through a Steiner tree.

The problem mentioned above can be modelled as the Connected Facility Location (ConFL) problem. Consider a graph \(G = (V,E)\) with a cost function \(c:E\rightarrow \mathbb {R}_{\ge 0}\), a facility set \(\mathcal {F} \subset V\), a demand set \(\mathcal {D}\subset V\) and a parameter \(M\ge 1\). For \(i\in \mathcal {F}\) and \(j\in \mathcal {D}\), let \(c_{ij}\) denote the length of a shortest path connecting i and j (with respect to the cost function c), let \(d_{j}\) be the demand of j, and let \(f_{i}\) be the opening cost of i. A feasible solution consists of a set \(F^{\prime }\subseteq \mathcal {F}\) of open facilities, an assignment of each client j to an open facility i(j), and a Steiner tree T connecting the open facilities. The cost of connecting the open facilities is M times the cost of T. Thus the total cost is

$$\begin{aligned} \sum _{i\in F^{\prime }}f_{i}+\sum _{j\in \mathcal {D}}d_{j}c_{i(j)j}+M\sum _{e\in T}c_{e}. \end{aligned}$$
(1)

The objective is to find a solution of minimum cost. This problem has received much attention in the fields of operations research and computer science (Gupta et al. 2001; Khuller and Zhu 2002; Karger and Minkoff 2000; Kim et al. 1996; Labbé et al. 2001; Lee et al. 1996).

The rent-or-buy problem is a special case of ConFL in which \(\mathcal {F} = V\) and \(f_{i}=0\) for all \(i\in \mathcal {F}\). In the rent-or-buy problem, there are two ways to install capacity on any given edge: capacity can be rented, at a cost proportional to the capacity used, or bought, allowing unlimited use after paying a large fixed cost. The model can be described by two positive parameters, \(\vartheta \) and M, where the cost of rented capacity equals \(\vartheta \) times the required capacity (per unit length) and the cost of bought capacity equals M (per unit length). Without loss of generality, we assume that \(\vartheta \) equals 1. A feasible solution consists of an allocation of demands to facilities and a subgraph of G (a Steiner tree T). The total cost is:

$$\begin{aligned} \sum _{j\in \mathcal {D}}d_{j}c_{i(j)j}+M\sum _{e\in T}c_{e}. \end{aligned}$$
(2)
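To make objective (2) concrete, the sketch below evaluates it for a small hypothetical instance: shortest-path lengths \(c_{ij}\) are computed by Floyd–Warshall, each demand is rented to its assigned node, and the tree T is bought at M times its length. All instance data and function names here are invented for illustration.

```python
INF = float("inf")

def all_pairs_shortest(n, edges):
    # Floyd-Warshall on an undirected graph given as {(u, v): length}.
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def rob_cost(dist, demand, assign, tree_edges, edges, M):
    """Objective (2): rented connections plus M times the bought tree."""
    rent = sum(demand[j] * dist[assign[j]][j] for j in demand)
    buy = M * sum(edges[e] for e in tree_edges)
    return rent + buy

edges = {(0, 1): 2, (1, 2): 1, (0, 3): 4, (2, 3): 1}
dist = all_pairs_shortest(4, edges)
# Unit demands at nodes 1 and 3, both assigned to node 0; T = {(0, 1)}.
total = rob_cost(dist, {1: 1, 3: 1}, {1: 0, 3: 0}, [(0, 1)], edges, M=3)
```

Here the rental part costs \(1\cdot 2+1\cdot 4=6\) and the bought tree costs \(3\cdot 2=6\), so the total is 12; choosing a different tree or assignment trades the two terms against each other.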

The main focus of our work is the stochastic set cover problem and the stochastic single sink rent-or-buy problem with submodular penalties. Notice that, compared with set cover, in the rent-or-buy problem the open facilities not only need to cover (serve) the customers but also need to be connected so that they can communicate with each other. The difference from the classical models is that, in this stochastic model with submodular penalties, the demands of the clients are uncertain. In addition, not all clients are required to be served, but any unserved client incurs a submodular penalty cost.

The remainder of this paper is organized as follows. We introduce the stochastic set cover problem and propose a primal-dual algorithm for it in Sect. 2. In Sect. 3, we prove that this algorithm has approximation ratio \(2\eta \). In Sect. 4, we design a 4.39-approximation algorithm for the stochastic single sink rent-or-buy problem. We provide concluding remarks in Sect. 5.

2 Stochastic set cover problem with submodular penalties

In this section, we introduce the stochastic set cover problem with submodular penalties and propose an algorithm to solve this problem via the primal dual technique.

Formally, the problem can be described as follows. In the first stage, one can anticipate the possible scenarios and purchase some subsets in advance; the corresponding purchase cost of S is denoted by \(c_{S}^{0}\). In the second stage, one of the scenarios is realized according to a known probability distribution. In this paper we only consider the case where the number m of scenarios is polynomial in the input size of the problem. For a scenario \(k\in \{1,2, \ldots ,m\}\) of the second stage, its probability is denoted by \(p_{k}\), its client (element) set is denoted by \(E_{k}\), its penalty cost function \(h_{k}: 2^{E_{k}}\rightarrow \mathbb {R}_{+}\) is monotone submodular, and the purchase cost of set S in this scenario is denoted by \(c_{S}^{k}\). Related results on submodular optimization can be found in Balkanski and Singer (2020), Tang (2021).

The sets purchased in the first stage and in the k-th scenario of the second stage can be used to serve the clients (elements) that appear in that scenario. Unserved elements are penalized and pay the corresponding penalty cost. For ease of description and modeling, we introduce the following notation: \(\mathcal {S}=\{(S,k):S\in \mathbb {S},\;k = 0,1, \ldots ,m\}\); \(\mathcal {C}=\{(e,k):e\in E,\;k = 0,1, \ldots ,m\}\); and \(\mathcal {C}_{k}=\{(e,k):e\in E_k\}\) for all \(k\in \{0,1, \ldots ,m\}\). The task is to select the sets of \(\mathbb {S}\) purchased in the first stage (denoted by \(\widehat{\mathbb {S}}_{0}\)) and in the k-th scenario of the second stage (denoted by \(\widehat{\mathbb {S}}_{k}\), \(k =1, \ldots ,m\)), together with the penalty sets \(\widehat{T}_{k}\) (\(k=1, \ldots ,m\)), such that the sum of the expected purchase cost and the expected penalty is minimized.

Then we can formulate this problem as follows, in which \(p_{0}=1\) and \(\sum \nolimits _{k=1}^{m}p_{k}=1\).

(IP)

$$\begin{aligned} \begin{aligned} \min \quad \sum \limits _{(S,k)\in \mathcal {S}}p_{k}c_{S}^{k}x_{S}^{k}+\sum \limits _{k=1}^{m}\sum \limits _{T_{k}\subseteq E_{k}}p_{k}h_{k}(T_{k})Z_{T_{k}} \\ \quad \text{ s.t. } \sum \limits _{S\in \mathbb {S}:e\in S}x_{S}^{0}+\sum \limits _{S\in \mathbb {S}:e\in S}x_{S}^{k}+\sum \limits _{T_{k}\subseteq E_{k}:e\in T_{k}}Z_{T_{k}}&\ge 1,&\quad&\forall (e,k)\in \mathcal {C},\\ x_{S}^{0},x_{S}^{k},Z_{T_{k}}&\in \{0,1\},&\quad&\forall S\in \mathbb {S},T_{k}\subseteq E_{k}. \end{aligned} \end{aligned}$$

Its LP relaxation and corresponding dual linear program are as follows:

(LP)

$$\begin{aligned} \begin{aligned} \min \quad \sum \limits _{(S,k)\in \mathcal {S}}p_{k}c_{S}^{k}x_{S}^{k}+\sum \limits _{k=1}^{m}\sum \limits _{T_{k}\subseteq E_{k}}p_{k}h_{k}(T_{k})Z_{T_{k}} \\ \quad \text{ s.t. } \sum \limits _{S\in \mathbb {S}:e\in S}x_{S}^{0}+\sum \limits _{S\in \mathbb {S}:e\in S}x_{S}^{k}+\sum \limits _{T_{k}\subseteq E_{k}:e\in T_{k}}Z_{T_{k}}&\ge 1,&\quad&\forall (e,k)\in \mathcal {C},\\ x_{S}^{0},x_{S}^{k},Z_{T_{k}}&\ge 0,&\quad&\forall S\in \mathbb {S},T_{k}\subseteq E_{k}. \end{aligned} \end{aligned}$$

(DP)

$$\begin{aligned} \begin{aligned} \max \quad \sum \limits _{(e,k)\in \mathcal {C}}y_{e}^{k} \\ \quad \text{ s.t. } \sum \limits _{(e,k)\in \mathcal {C}:e\in S}y_{e}^{k}&\le c_{S}^{0},&\quad&\forall S\in \mathbb {S},\\ \sum \limits _{e\in E_{k}\cap S}y_{e}^{k}&\le p_{k}c_{S}^{k},&\quad&\forall S\in \mathbb {S},k=1,2, \ldots ,m,\\ \sum \limits _{e\in T_{k}}y_{e}^{k}&\le p_{k}h_{k}(T_{k}),&\quad&\forall T_{k}\subseteq E_{k}, k=1,2, \ldots ,m,\\ y_{e}^{k}&\ge 0,&\quad&\forall (e,k)\in \mathcal {C}. \end{aligned} \end{aligned}$$

To describe the algorithm, we define some notations as follows:

\(\overline{\mathcal {C}}_{k}\subseteq \mathcal {C}_{k}\) consists of the frozen element-scenario pairs in the k-th scenario; \(\overline{\mathcal {C}}{:}{=}\bigcup \limits _{k=1}^{m}\overline{\mathcal {C}}_{k}\); and \(N(S,k)=\{(e,k):e\in S \text{ and } y_{e}^{k}>0\}\) for all \((S, k)\in \mathcal {S}\).

[Algorithm 1: primal-dual algorithm for the stochastic set cover problem with submodular penalties]

3 The analysis of Algorithm 1

Let SOL denote the solution output by Algorithm 1; its cost consists of two parts: the purchase cost (denoted by \(\mathbb {S}_{SOL}\)) and the penalty cost (denoted by \(P_{SOL}\)). Before the proof, we highlight the main idea. First, we prove in Lemma 1 that Algorithm 1 has polynomial time complexity. Then, we bound \(\mathbb {S}_{SOL}\) and \(P_{SOL}\) separately so as to obtain the approximation ratio. Note that \(N(S,k)\bigcap N(S',k')\ne \emptyset \) may hold for some \(k,k'=0,1, \ldots ,m\), which means some clients may be served simultaneously by \(S\in \widehat{\mathbb {S}}_{0}\) and \(S'\in \widehat{\mathbb {S}}_{k}\).

Lemma 1

Consider a time \(\tilde{\tau }\) at which one of the three events of Step 2.4 in Algorithm 1 occurs. Then the next time \(\tau ^{*}\) at which one of the three events occurs again can be computed in polynomial time.

Proof

Let \(\widetilde{E}\) be the set of element-scenario pairs frozen before this iteration; then \(\tau ^{*}\) can be obtained by the following process.

If Event 1 occurs first at time \(\tau _{1}\), then the following inequality holds

$$\begin{aligned} \sum \limits _{(e,k)\in \mathcal {C}\bigcap \widetilde{E}:e\in S}y_{e}^{k}+\sum \limits _{(e,k)\in \mathcal {C}\backslash \widetilde{E}:e\in S}\tau \le c_{S}^{0},\quad \forall S\in \mathbb {S},\forall \tau \in (\tilde{\tau },\tau _{1}), \end{aligned}$$

and then

$$\begin{aligned} \tau \le \frac{c_{S}^{0}-\sum \nolimits _{(e,k)\in \mathcal {C}\bigcap \widetilde{E}:e\in S}y_{e}^{k}}{\sum \nolimits _{(e,k)\in \mathcal {C}\backslash \widetilde{E}:e\in S}1},\quad \forall S\in \mathbb {S} \text{ with } \{(e,k)\in \mathcal {C}\backslash \widetilde{E}:e\in S\}\ne \emptyset . \end{aligned}$$

Let

$$\begin{aligned} \gamma _{S}= & {} \sum \limits _{(e,k)\in \mathcal {C}\backslash \widetilde{E}:e\in S}1,\\ \gamma ^{\prime }_{S}= & {} \sum \limits _{(e,k)\in \mathcal {C}\bigcap \widetilde{E}:e\in S}y_{e}^{k}. \end{aligned}$$

Since \(\gamma _{S}\) and \(\gamma ^{\prime }_{S}\) are both modular functions,

$$\begin{aligned} \tau _{1}=\min \limits _{S\in \mathbb {S}:\gamma _{S}\ne 0}\frac{c_{S}^{0}-\gamma ^{\prime }_{S}}{\gamma _{S}} \end{aligned}$$

can be computed in polynomial time.

If Event 2 occurs first at time \(\tau _{2}\), then

$$\begin{aligned} \sum \limits _{e\in E_{k}\bigcap \widetilde{E}\bigcap S}y_{e}^{k}+\sum \limits _{e\in (E_{k}\backslash \widetilde{E})\bigcap S}\tau \le p_{k}c_{S}^{k},\quad \forall S\in \mathbb {S},\forall \tau \in (\tilde{\tau },\tau _{2}), \end{aligned}$$

and thus

$$\begin{aligned} \tau \le \frac{p_{k}c_{S}^{k}-\sum \nolimits _{e\in E_{k}\bigcap \widetilde{E}\bigcap S}y_{e}^{k}}{\sum \nolimits _{e\in (E_{k}\backslash \widetilde{E})\bigcap S}1},\quad \forall S\in \mathbb {S} \text{ with } (E_{k}\backslash \widetilde{E})\bigcap S\ne \emptyset . \end{aligned}$$

Let

$$\begin{aligned} \nu _{S,k}= & {} \sum \limits _{e\in (E_{k}\backslash \widetilde{E})\bigcap S}1,\\ \nu ^{\prime }_{S,k}= & {} \sum \limits _{e\in E_{k}\bigcap \widetilde{E}\bigcap S}y_{e}^{k}. \end{aligned}$$

It is obvious that both \(\nu _{S,k}\) and \(\nu ^{\prime }_{S,k}\) are modular functions. Therefore

$$\begin{aligned} \tau _{2}=\min \limits _{S\in \mathbb {S},k:\nu _{S,k}\ne 0}\frac{p_{k}c_{S}^{k}-\nu ^{\prime }_{S,k}}{\nu _{S,k}} \end{aligned}$$

can also be computed in polynomial time.

If Event 3 occurs first at time \(\tau _{3}\), then

$$\begin{aligned} \sum \limits _{e\in T_{k}\bigcap \widetilde{E}} y_{e}^{k}+\sum \limits _{e\in T_{k}\backslash \widetilde{E}}\tau \le p_{k}h_{k}(T_{k}), \quad \forall T_{k}\subseteq E_{k},\forall \tau \in (\tilde{\tau },\tau _{3}) \end{aligned}$$

and then

$$\begin{aligned} \tau \le \frac{p_{k}h_{k}(T_{k})-\sum \nolimits _{e\in T_{k}\bigcap \widetilde{E}}y_{e}^{k}}{\sum \nolimits _{e\in T_{k}\backslash \widetilde{E}}1}, \quad \forall T_{k}\subseteq E_{k}, T_{k}\backslash \widetilde{E}\ne \emptyset . \end{aligned}$$

Similar to the proof above, we have

$$\begin{aligned} \tau _{3}=\min \limits _{T_{k}\subseteq E_{k}:T_{k}\backslash \widetilde{E}\ne \emptyset } \frac{p_{k}h_{k}(T_{k})-\sum \nolimits _{e\in T_{k}\bigcap \widetilde{E}}y_{e}^{k}}{\sum \nolimits _{e\in T_{k}\backslash \widetilde{E}}1}. \end{aligned}$$

The case that requires particular care is the computation of \(\tau _{3}\), which amounts to minimizing the ratio of a submodular function over a modular function; this is solvable in polynomial time (Fleischer and Iwata 2003). Therefore, all of the above quantities can be computed in polynomial time.

Since \(\tau ^{*}=\min \{\tau _{1},\tau _{2},\tau _{3}\}\), Lemma 1 follows. \(\square \)
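As an illustration of Lemma 1, the sketch below computes the candidate event times for a tiny single-scenario instance, assuming all active dual variables grow from 0 at unit rate and restricting each sum to the elements of the set S in question. For \(\tau _{3}\) it brute-forces all subsets instead of using the polynomial-time procedure of Fleischer and Iwata (2003), so it is for illustration only; the instance data and function name are hypothetical.

```python
from itertools import combinations

def next_event_time(sets, c0, ck, p, E, h, y, frozen):
    """Return tau* = min(tau1, tau2, tau3) of Lemma 1 for one scenario.
    Frozen elements contribute their fixed y-value; each active element
    contributes tau, so every constraint yields the candidate time
    (cost - paid) / (number of active elements)."""
    def candidate(cost, members):
        paid = sum(y.get(e, 0.0) for e in members if e in frozen)
        active = sum(1 for e in members if e not in frozen)
        return (cost - paid) / active if active else None

    taus = []
    for S, members in sets.items():
        for cost in (c0[S], p * ck[S]):       # Event 1 and Event 2
            t = candidate(cost, members)
            if t is not None:
                taus.append(t)
    for r in range(1, len(E) + 1):            # Event 3 (brute force)
        for T in combinations(E, r):
            t = candidate(p * h(set(T)), T)
            if t is not None:
                taus.append(t)
    return min(taus)

sets = {"A": {1, 2}, "B": {2, 3}}
tau_star = next_event_time(
    sets, c0={"A": 4.0, "B": 5.0}, ck={"A": 3.0, "B": 3.0},
    p=0.5, E=[1, 2, 3], h=lambda T: float(len(T)), y={}, frozen=set(),
)
```

With this data the penalty constraints are tight first (every subset gives ratio 0.5, below the purchase candidates), matching the case \(\tau ^{*}=\tau _{3}\) analyzed in Lemma 3.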

Lemma 2

$$\begin{aligned} \mathbb {S}_{SOL}=\sum \limits _{S\in \widehat{\mathbb {S}}_{0}}\sum \limits _{(e,k)\in N(S,0)}y_{e}^{k}+\sum \limits _{k=1}^{m}\sum \limits _{S\in \widehat{\mathbb {S}}_{k}}\sum \limits _{(e,k)\in N(S,k)}y_{e}^{k}. \end{aligned}$$

Proof

If S is purchased in the first stage, i.e., \(S\in \widehat{\mathbb {S}}_{0}\), then its dual constraint is tight:

$$\begin{aligned} c_{S}^{0}=\sum \limits _{(e,k)\in N(S,0)}y_{e}^{k}, \end{aligned}$$

Summing over all sets purchased in the first stage gives

$$\begin{aligned} \sum \limits _{S\in \widehat{\mathbb {S}}_{0}}c_{S}^{0}=\sum \limits _{S\in \widehat{\mathbb {S}}_{0}}\sum \limits _{(e,k)\in N(S,0)}y_{e}^{k}, \end{aligned}$$

which is the total cost of all the sets purchased in the first stage.

Similarly, if \(S\in \widehat{\mathbb {S}}_{k}\), i.e., S is purchased in scenario k of the second stage, where \(k=1,2, \ldots ,m\), then

$$\begin{aligned} p_{k}c_{S}^{k}=\sum \limits _{(e,k)\in N(S,k)}y_{e}^{k}. \end{aligned}$$

Thus the expected purchase cost of the second stage is

$$\begin{aligned} \sum \limits _{k=1}^{m}\sum \limits _{S\in \widehat{\mathbb {S}}_{k}}p_{k}c_{S}^{k}=\sum \limits _{k=1}^{m}\sum \limits _{S\in \widehat{\mathbb {S}}_{k}}\sum \limits _{(e,k)\in N(S,k)}y_{e}^{k}. \end{aligned}$$

The total purchase cost is the sum of the first- and second-stage purchase costs, which proves the lemma. \(\square \)

Let \(\mathcal {C}^{(1)}{:}{=}\mathcal {C}_{\widehat{\mathbb {S}}_{0}}\bigcup (\bigcup \limits _{k=1}^{m}\mathcal {C}_{\widehat{\mathbb {S}}_{k}})\) and \(\mathcal {C}^{(2)}{:}{=}\bigcup \limits _{k=1}^{m}\{(e,k):e\in \widehat{T}_{k}\}\). Then the penalty cost of SOL is as follows.

Lemma 3

$$\begin{aligned} P_{SOL}=\sum \limits _{(e,k)\in \mathcal {C}^{(2)}}y_{e}^{k}=\sum \limits _{k=1}^{m}\sum \limits _{e\in \widehat{T}_{k}}y_{e}^{k}. \end{aligned}$$

Proof

During an iteration, if \(\tau ^{*}=\tau _{3}\), then the algorithm selects a subset (denoted by \(T_{k_{3}^{*}}\)) to add to the penalty set. Let \(\widetilde{T}_{k}\) denote the penalty set before this iteration; then

$$\begin{aligned} \sum \limits _{ e\in \widetilde{T}_{k}}y_{e}^{k}=p_{k}h_{k}(\widetilde{T}_{k}), \end{aligned}$$
$$\begin{aligned} \sum \limits _{ e\in T_{k_{3}^{*}}}y_{e}^{k}=p_{k}h_{k}(T_{k_{3}^{*}}). \end{aligned}$$

It remains to prove that for any \(k=1,2, \ldots ,m,\)

$$\begin{aligned} \sum \limits _{e\in \widetilde{T}_{k}\bigcup T_{k_{3}^{*}}}y_{e}^{k}=p_{k}h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}}). \end{aligned}$$

Based on the submodularity of \(h_{k}(\cdot )\) and the details of Algorithm 1, we have

$$\begin{aligned} \sum \limits _{ e\in \widetilde{T}_{k}\bigcup T_{k_{3}^{*}}}y_{e}^{k}+\sum \limits _{ e\in \widetilde{T}_{k}\bigcap T_{k_{3}^{*}}}y_{e}^{k}&=\sum \limits _{ e\in \widetilde{T}_{k}}y_{e}^{k}+\sum \limits _{e\in T_{k_{3}^{*}}\backslash \widetilde{T}_{k}}y_{e}^{k}+\sum \limits _{ e\in \widetilde{T}_{k}\bigcap T_{k_{3}^{*}}}y_{e}^{k}\\&=\sum \limits _{ e\in \widetilde{T}_{k}}y_{e}^{k}+\sum \limits _{ e\in T_{k_{3}^{*}}}y_{e}^{k}\\&=p_{k}h_{k}(\widetilde{T}_{k})+p_{k}h_{k}(T_{k_{3}^{*}})\\&\ge p_{k}h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}})+p_{k}h_{k}(\widetilde{T}_{k}\bigcap T_{k_{3}^{*}}). \end{aligned}$$

According to the construction of Algorithm 1, we can conclude that the following inequalities hold:

$$\begin{aligned} \sum \limits _{ e\in \widetilde{T}_{k}\bigcup T_{k_{3}^{*}}}y_{e}^{k}\le p_{k}h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}}), \end{aligned}$$
$$\begin{aligned} \sum \limits _{ e\in \widetilde{T}_{k}\bigcap T_{k_{3}^{*}}}y_{e}^{k}\le p_{k}h_{k}(\widetilde{T}_{k}\bigcap T_{k_{3}^{*}}). \end{aligned}$$

Combining the above formulas, we can obtain

$$\begin{aligned} \sum \limits _{ e\in \widetilde{T}_{k}\bigcup T_{k_{3}^{*}}}y_{e}^{k}+\sum \limits _{ e\in \widetilde{T}_{k}\bigcap T_{k_{3}^{*}}}y_{e}^{k}=p_{k}h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}})+p_{k}h_{k}(\widetilde{T}_{k}\bigcap T_{k_{3}^{*}}), \end{aligned}$$

which implies that

$$\begin{aligned} \sum \limits _{e\in \widetilde{T}_{k}\bigcup T_{k_{3}^{*}}}y_{e}^{k}=p_{k}h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}}). \end{aligned}$$

Then the lemma is proved. \(\square \)
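The crux of the argument above is the submodular inequality \(h_{k}(\widetilde{T}_{k})+h_{k}(T_{k_{3}^{*}})\ge h_{k}(\widetilde{T}_{k}\bigcup T_{k_{3}^{*}})+h_{k}(\widetilde{T}_{k}\bigcap T_{k_{3}^{*}})\). As a sanity check, the sketch below verifies it exhaustively for a small weighted coverage function, a standard example of a monotone submodular function; the data are invented for illustration.

```python
from itertools import combinations

# Weighted coverage: h(T) is the total weight of "topics" touched by
# the elements of T. Coverage functions are monotone submodular.
topics = {1: {"x"}, 2: {"x", "y"}, 3: {"y", "z"}, 4: {"z"}}
weight = {"x": 2.0, "y": 1.0, "z": 3.0}

def h(T):
    covered = set().union(*(topics[e] for e in T)) if T else set()
    return sum(weight[t] for t in covered)

# Check h(A) + h(B) >= h(A | B) + h(A & B) for every pair of subsets.
ground = set(topics)
subsets = [frozenset(c) for r in range(len(ground) + 1)
           for c in combinations(ground, r)]
ok = all(h(A) + h(B) >= h(A | B) + h(A & B)
         for A in subsets for B in subsets)
```

Running this confirms the inequality for all \(16\times 16\) pairs of subsets, which is exactly the property the proof of Lemma 3 combines with the constraints of (DP).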

Theorem 1

The approximation ratio of Algorithm 1 is \(2\eta \).

Proof

Note that \(\mathcal {C}^{(1)}\) and \(\mathcal {C}^{(2)}\) may intersect. Combining Lemmas 2 and 3, we can bound the cost of SOL as follows:

$$\begin{aligned} cost(SOL)&=\mathbb {S}_{SOL}+P_{SOL}\\&=\sum \limits _{S\in \widehat{\mathbb {S}}_{0}}\sum _{(e,k)\in N(S,0)}y_{e}^{k}+\sum \limits _{k=1}^{m}\sum \limits _{S\in \widehat{\mathbb {S}}_{k}}\sum _{(e,k)\in N(S,k)}y_{e}^{k}+\sum \limits _{k=1}^{m}\sum _{e\in \widehat{T}_{k}}y_{e}^{k}\\&\le \eta \left( \sum \limits _{(e,k)\in \mathcal {C}^{(1)}}y_{e}^{k}+\sum \limits _{(e,k)\in \mathcal {C}^{(2)}}y_{e}^{k}\right) \\&\le 2\eta \left( \sum \limits _{(e,k)\in \mathcal {C}}y_{e}^{k}\right) \\&\le 2 \eta OPT. \square \end{aligned}$$

Since the set cover problem generalizes the vertex cover problem, in which each element is included in at most two subsets (i.e., the case \(\eta =2\)), we obtain the following result:

Corollary 1

Algorithm 1 is a 4-approximation algorithm for the stochastic vertex cover problem with submodular penalties.
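To see why \(\eta =2\) here: an instance of vertex cover can be written as set cover in which the elements are the edges of a graph and the set of a vertex consists of its incident edges, so every element appears in exactly the two sets of its endpoints. A short sketch on an invented graph:

```python
# Vertex cover as set cover: elements are edges; the set of vertex v
# contains the edges incident to v. Every edge lies in exactly two
# such sets, so the maximum element frequency eta equals 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
sets = {v: {e for e in edges if v in e} for v in range(4)}

freq = {e: sum(1 for S in sets.values() if e in S) for e in edges}
eta = max(freq.values())
```

Substituting \(\eta =2\) into Theorem 1 gives the ratio \(2\eta =4\) of the corollary.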

4 Stochastic single sink rent-or-buy problem with submodular penalties

In this section, we propose an approximation algorithm for the stochastic single sink rent-or-buy problem with submodular penalties.

For simplicity of presentation, we assume that all demands \(d_{j}\) are equal to 1 and denote the sink node by v. For ease of description and modeling, we introduce notation similar to that of Sect. 2: \(\mathcal {E}=\{(e,k):e\in E(G),\;k = 0,1, \ldots ,m\}\); \(\mathcal {D}=\{(j,k):j\in D,\;k = 0,1, \ldots ,m\}\); \(\mathcal {D}_{k}=\{(j,k):j\in D_k\}\) for all \(k\in \{0,1, \ldots ,m\}\); and the penalty sets \(\widehat{P}_{k}\) (\(k=1, \ldots ,m\)). The relaxed linear program of the stochastic single sink rent-or-buy problem with submodular penalties is as follows.

$$\begin{aligned} \begin{aligned} \min \quad&\sum \limits _{(j,k)\in \mathcal {D}}\sum \limits _{i\in \mathcal {F}}p_kc_{ij}^{k}x_{ij}^{k} +M\sum \limits _{(e,k)\in \mathcal {E}} p_{k}c_{e}^{k}z_{e}^{k}+ \sum _{k=1}^{m}\sum \limits _{P_k\subseteq D_k}p_{k}h_{k}(P_k)z_{P_k} \\ \text{ s.t. }\quad&\sum _{i\in \mathcal {F}}x_{ij}^{0}+\sum \limits _{i\in \mathcal {F}}x_{ij}^{k} +\sum \limits _{P_k: j\in P_k}z_{P_k}\ge 1,\quad \forall (j,k)\in \mathcal {D}\\&\sum \limits _{i\in S}x_{ij}^{k} \le \sum \limits _{(e,k):e\in \delta (S)}z_{e}^{k}, \quad \forall S\subseteq V, v\notin S, (j,k)\in \mathcal {D}\\&x_{ij}^{k}, z_{e}^{k}, z_{P_k} \ge 0. \end{aligned} \end{aligned}$$
(3)

The dual of this linear program is presented as (4).

$$\begin{aligned} \begin{aligned} \max \quad&\sum \limits _{(j,k)\in \mathcal {D}} \alpha _{j}^{k} \\ \text{ s.t. }\quad \alpha _{j}^{k}&\le p_{k}c_{ij}^{k}+\sum _{S\subseteq V:i\in S,v\notin S}\theta _{S,j}^{k},&\quad&\forall i\ne v,(j,k)\in \mathcal {D}\\ \alpha _{j}^{k}&\le p_{k}c_{vj}^{k},&\quad&\forall (j,k)\in \mathcal {D}\\ \sum _{(j,k)\in \mathcal {D}}\sum _{S\subseteq V:e\in \delta (S),v\notin S}\theta _{S,j}^{k}&\le Mp_{k}c_{e}^{k},&\quad&\forall (e,k)\in \mathcal {E}\\ \sum \limits _{(j,k): j\in P_k}\alpha _{j}^{k}&\le p_{k}h_{k}(P_k),&\quad&\forall P_{k}\subseteq D_{k}\\ \alpha _{j}^{k},\theta _{S,j}^{k}&\ge 0. \end{aligned} \end{aligned}$$
(4)

Our algorithm is as follows:

[Algorithm 2: primal-dual algorithm for the stochastic single sink rent-or-buy problem with submodular penalties]

Similar to the proof of Lemma 1, it can be shown that Algorithm 2 runs in polynomial time; we omit the details.

We now consider Steps 2 and 4. In the following, let \((\alpha ^{(1)},\theta ^{(1)})\) and \((\alpha ^{(3)},\theta ^{(3)})\) be the dual variables at the end of Steps 2 and 4, respectively.

Lemma 4

\((\alpha ^{(1)},\theta ^{(1)})\) is a feasible solution of the dual program (4).

To prove Lemma 4, we need the following lemma from Swamy and Kumar (2004):

Lemma 5

(Swamy and Kumar 2004) Consider a client j. If \(i\ne v\), then \(\alpha ^{(3)k}_{j}\le \alpha ^{(1)k}_{j}+p_{k}c_{ij}^{k}+\sum \limits _{S\subseteq V :i\in S,v\notin S }\theta ^{(3)k}_{S,j} .\) Further, \(\alpha ^{(3)k}_{j}\le \alpha ^{(1)k}_{j}+p_{k}c_{vj}^{k}\).

From Lemma 5, we conclude that the \(\theta ^{(3)}_{S,\,j}\) values satisfy the third constraint of program (4) for each scenario k. Thus \((\alpha ^{'}, \theta ^{(3)})\) is a feasible solution of the dual program, where \(\alpha ^{'k}_{j} = \max (\alpha ^{(3)k}_{j}-c_{\sigma (j)j}^{k} , 0)\).

Finally, we analyze the cost of the solution obtained by Algorithm 2. Let OPT and SOL denote the optimal solution and the solution constructed by the algorithm, respectively. Also, cost(SOL) denotes the total cost. Then

$$\begin{aligned} cost(SOL)=C_{SOL}+P_{SOL}+T_{SOL}, \end{aligned}$$

where \(C_{SOL}\), \(P_{SOL}\) and \(T_{SOL}\) denote the service cost, penalty cost and the cost of Steiner tree, respectively.

According to the construction of \(D_{k}^{i}\) and F, we have \(D_{k}^{i}\bigcap D_{k}^{i^{'}}=\emptyset \) for all distinct \(i,i^{'}\in F\) at the end of Step 2. The service cost \(C_{SOL}\) is bounded in the following lemma.

Lemma 6

$$\begin{aligned} (C1)\; E[C_{SOL}]&\le \sum _{k=1}^{m}\left( \sum \limits _{j\in D_{k}^{'}\backslash \widehat{P}_k}p_{k}c_{ij}^{k}+3\sum \limits _{j\notin D_{k}^{'}\bigcup \widehat{P}_k}\alpha _{j}^{(1)k}\right) \\&\le \sum _{k=1}^{m}\left( \sum \limits _{j\in D_{k}^{'}\backslash \widehat{P}_k}\alpha _{j}^{(1)k}+3\sum \limits _{j\notin D_{k}^{'}\bigcup \widehat{P}_k}\alpha _{j}^{(1)k}\right) ,\\ (C2)\; E[P_{SOL}]&= \sum _{k=1}^{m}\sum _{j\in \widehat{P}_k}\alpha _{j}^{(1)k},\\ (C3)\; E[T_{SOL}]&\le 2\cdot \sum _{k=1}^{m}\sum _{j\in D_{k}^{'}}\alpha _{j}^{(3)k}. \end{aligned}$$

Proof

(C1) For any client \(j\in D_k\backslash \widehat{P}_k\), consider the following two cases.

Case 1 If \(j\in D_{F}^{k}\backslash \widehat{P}_k\), then there exists a location \(i\in F\) such that \(j\in D_{k}^{i}\) and \(j\notin D_{F}^{k}\backslash D_{k}^{i}\). Connect client j to location i with expected connection cost \(p_{k}c_{ij}^{k}\). Clearly j is tight with i, so \(p_{k}c_{ij}^{k}\le \alpha _{j}^{(1)k}\).

Case 2 If \(j\notin D_{F}^{k}\backslash \widehat{P}_k\), then j is not tight with any location in F; let its witness be \(\sigma ^{k}(j)=i^{'}\). Then there is a location \(i\in F\) and a client \(j^{'}\) such that \(j^{'}\) is tight with both i and \(i^{'}\). Establish a connection between j and i. Define \(t_{i}\) and \(t_{i^{'}}\) to be the times at which locations i and \(i^{'}\) are temporarily opened, respectively. Since \(j^{'}\) is tight with the two locations i and \(i^{'}\), we obtain \(\alpha _{j^{'}}^{(1)k}\ge p_{k}c^{k}_{i^{'}j^{'}}\) and \(\alpha _{j^{'}}^{(1)k}\ge p_{k}c^{k}_{ij^{'}}\). From Algorithm 2, client \(j^{'}\) is frozen no later than \(\min \{t_{i}, t_{i^{'}}\}\), which implies that \(\alpha _{j^{'}}^{(1)k}\le \min \{t_{i}, t_{i^{'}}\}\). In addition, location \(i^{'}\) is a witness for j, so \(\alpha _{j}^{(1)k}\ge t_{i^{'}}\), implying that \(\alpha _{j}^{(1)k}\ge \alpha _{j^{'}}^{(1)k}\). By the triangle inequality, we get

$$\begin{aligned} p_{k}c_{ij}^{k}\le p_{k}c_{ij^{'}}^{k}+p_{k}c^{k}_{i^{'}j^{'}}+p_{k}c^{k}_{i^{'}j}\le 3\alpha _{j}^{(1)k}. \end{aligned}$$

Combining Case 1 and Case 2 proves (C1).

(C2) At any time t of Step 2, the set \(\widehat{P}_k\) always satisfies the following property

$$\begin{aligned} \sum \limits _{j\in \widehat{P}_k}\alpha _{j}^{k}(t)=p_{k}h_{k}(\widehat{P}_k), \end{aligned}$$

where \(\alpha ^{k}_{j}(t)\) is the budget value of client j in scenario k at time t, which increases with t until client j becomes frozen. By Lemma 4, the proof is straightforward.

(C3) This inequality follows from Swamy and Kumar (2004). \(\square \)

By Lemmas 5 and 6, it is not difficult to obtain:

$$\begin{aligned} E[T_{SOL}]\le 2\left( \sum _{k=1}^{m}\sum _{j\in D_{k}}\alpha _{j}^{'k}+2\sum _{k=1}^{m}\sum \limits _{j\in D_{k}^{'}}p_{k}c^{k}_{\sigma (j)j}\right) \le 2\cdot OPT+2\sum _{k=1}^{m}\sum \limits _{j\in D_{k}^{'}}\alpha _{j}^{(1)k}. \end{aligned}$$
(5)

Then we present our main results of this section.

Theorem 2

Algorithm 2 is a 5-approximation combinatorial algorithm for the stochastic single sink rent-or-buy problem with submodular penalties.

Proof

It follows from inequality (5) and Lemma 6 that the cost of SOL is at most

$$\begin{aligned} E[cost(SOL)]&= E[C_{SOL}]+E[P_{SOL}]+E[T_{SOL}]\\&\le \sum _{k=1}^{m}\left( \sum \limits _{j\in D_{k}^{'}\backslash \widehat{P}_k}\alpha _{j}^{(1)k}+3\sum \limits _{j\notin D_{k}^{'}\bigcup \widehat{P}_k}\alpha _{j}^{(1)k}+\sum _{j\in \widehat{P}_k}\alpha _{j}^{(1)k}+2\sum \limits _{j\in D_{k}^{'}}\alpha _{j}^{(1)k}\right) \\&\quad +2\cdot c(OPT)\\&\le 3\sum _{k=1}^{m}\sum \limits _{j\in D_{k}}\alpha _{j}^{(1)k}+2\cdot c(OPT)\\&\le 5\cdot c(OPT). \end{aligned}$$

\(\square \)

If in Step 4 we instead use the Steiner tree algorithm of Byrka and Grandon (2010) to connect the open locations, we obtain a better ratio:

Theorem 3

The ratio of Algorithm 2 can be improved to 4.39.

Proof

Based on the result of Lemma 6 and Byrka and Grandon (2010), the upper bound of cost(SOL) is

$$\begin{aligned} E[cost(SOL)]&= E[C_{SOL}]+E[P_{SOL}]+E[T_{SOL}]\\&\le \sum _{k=1}^{m}\left( \sum \limits _{j\in D_{k}^{'}\backslash \widehat{P}_k}\alpha _{j}^{(1)k}+3\sum \limits _{j\notin D_{k}^{'}\bigcup \widehat{P}_k}\alpha _{j}^{(1)k}+\sum _{j\in \widehat{P}_k}\alpha _{j}^{(1)k}+2\sum \limits _{j\in D_{k}^{'}}\alpha _{j}^{(1)k}\right) \\&\quad +1.39\cdot c(OPT)\\&\le 3\sum _{k=1}^{m}\sum \limits _{j\in D_{k}}\alpha _{j}^{(1)k}+1.39\cdot c(OPT)\\&\le 4.39\cdot c(OPT). \end{aligned}$$

\(\square \)

5 Conclusions

In this paper, we consider the stochastic set cover problem and the stochastic single sink rent-or-buy problem with submodular penalties, and we present a primal-dual approximation algorithm for each. Many techniques exist for stochastic optimization problems, such as sampling, cost-sharing functions, and primal-dual methods. Designing approximation algorithms for more general stochastic optimization problems based on these techniques is left as future work.