1 Introduction

Uncertainty is an inevitable facet of almost all important decision problems. It appears in a variety of forms and arises for many reasons, such as errors or noise in data measurements, uncertain model parameters, or predictions made by probabilistic algorithms. Problems of handling uncertainty have been a subject of extensive research in many areas, including computer science, operations research, economics, management science, and social science. Stochastic optimization is a major approach to optimization under uncertainty: it models the uncertainty using probability distributions over the input instances. The field originated from the work of Dantzig [1], and has been studied extensively and found numerous applications in many areas. A central theme in the field is to solve stochastic versions of mathematical programming problems (the study of such problems is often referred to as stochastic programming as well). We refer interested readers to [2, 3].

Recent years have witnessed a surge of interest in solving combinatorial optimization problems under uncertainty from the theoretical computer science community. This is in part due to the dramatic increase in the number of application domains, such as information extraction systems, data integration systems, sensor networks, and predictive machine learning algorithms, which produce huge amounts of uncertain data. At the same time, applications (including the above) that require solving combinatorial optimization problems over such data have to take the uncertainty into account. This gives rise to a variety of stochastic models and optimization problems. In this article, we review some recent work on various stochastic versions of classical combinatorial optimization problems. It is well known that many combinatorial optimization problems are NP-hard, and the added stochasticity often makes the problems even harder (NP-hard, #P-hard, or even PSPACE-hard). Therefore, it is very unlikely that these stochastic problems admit efficient algorithms that solve them exactly. One principled way to deal with the computational intractability is to settle for polynomial time approximation algorithms. We refer interested readers to the books on the field of approximation algorithms [4, 5]. A major focus in this field is to obtain polynomial time algorithms with provable approximation ratios. For maximization problems, the ratio is defined to be the ratio between the optimal cost and the cost of our solution (or expected cost, if there is any stochasticity and/or randomization). For minimization problems, it is the ratio between the cost of our solution and the optimal cost. The ratio is thus at least 1, and we want it to be as close to 1 as possible.

In this article, we present a selection of recent results on polynomial time approximation algorithms for stochastic combinatorial optimization problems. We have not attempted to be comprehensive and our selection of problems may be idiosyncratic. We include the details of the algorithms and the analysis of the approximation ratios for some problems. None of these results is new, and the expositions for some of them are simplifications of known results (sometimes with slightly worse approximation ratios, in favor of expositional simplicity). Our purpose is to give readers a flavor of the models, ideas, and techniques in this exciting and fast-growing area.

1.1 A Few Representative Problems

There are diverse ways in which uncertainty can interact with and influence the decision process, and correspondingly there are several natural stochastic models. Instead of introducing these abstract models, we choose to be concrete and provide a few representative problems. We also highlight some interesting motivating applications for these problems.

Problem 1.1

(Fixed-Set Stochastic Shortest Path) We are given a stochastic graph where the length \(\mu _{e}\) of each edge e is an independent random variable. See Fig. 1 for an example. The objective is to find an s-t path P such that the probability that the length of P is at most a given threshold C, i.e., \(\Pr \left[ \sum _{e\in P}\mu _e \leqslant {C}\right] \), is maximized.

Fig. 1 A stochastic graph. There are three edges with uncertain lengths and \(2^3=8\) possible realizations in total

We can think of \(\mu _e\) as the travel time of the edge e (the probability distribution of \(\mu _e\) may be obtained from historical data or traffic prediction algorithms). Suppose we need to arrive at the airport before a specific time in order to catch a flight. The above problem asks for a path that maximizes the probability that we arrive on time.
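For a fixed path P with independent, discretely distributed edge lengths, the objective \(\Pr \left[ \sum _{e\in P}\mu _e \leqslant {C}\right] \) can be computed exactly by convolving the edge-length distributions. Below is a minimal Python sketch; the particular distributions are made up for illustration and are not from the figure.

# Each edge's length distribution: a list of (length, probability) pairs.
# The numbers below are illustrative only.
edge_dists = [
    [(1, 0.5), (4, 0.5)],
    [(2, 0.7), (5, 0.3)],
    [(1, 0.9), (6, 0.1)],
]

def on_time_probability(dists, C):
    """Pr[sum of independent discrete edge lengths <= C], by convolution."""
    sum_dist = {0: 1.0}                      # distribution of the partial sum
    for dist in dists:
        new_dist = {}
        for s, p in sum_dist.items():
            for length, q in dist:
                new_dist[s + length] = new_dist.get(s + length, 0.0) + p * q
        sum_dist = new_dist
    return sum(p for s, p in sum_dist.items() if s <= C)

print(on_time_probability(edge_dists, C=8))  # 0.765 for these numbers

Of course, the hard part of Problem 1.1 is searching over the exponentially many paths, not evaluating a single one.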

The problem and its variants have been studied in a number of papers since the 1980s (see e.g., [6–11]). Besides the shortest path problem, one can easily formulate similar problems for other combinatorial optimization problems. We will discuss this problem and its generalizations in more detail in Sect. 8.

Note that the solution path P is chosen before the trip, and once chosen, is a fixed set of edges. We call such problems fixed-set problems. Next, we introduce a problem in which the decision has to be made in an adaptive manner.

Problem 1.2

(Adaptive Stochastic Knapsack) We are given a knapsack with fixed capacity C and a set of items. The size and the profit of each item are uncertain and only their probability distributions are known to us in advance. We only know the actual size and profit of an item when we insert it into the knapsack. Our goal is to design an adaptive policy that maximizes the expected total profit of items that can be put into the knapsack. More specifically, the adaptive policy should determine which item to insert next based on the remaining capacity and the set of available items.

The above problem has a very natural motivating example in stochastic scheduling. Suppose we want to schedule a subset of n jobs on a single machine. We have a fixed deadline and we want to gain as much profit as possible before the deadline. The precise processing time and profit of each job are random variables, and their true values are only revealed when the job is completed.

This is a classical problem in stochastic combinatorial optimization. It was studied back in the 1970s in the stochastic control and operations research literature, and some interesting special cases were identified and solved optimally (see e.g., [12–15]). The problem has a very different nature from the fixed-set problems: a solution is an adaptive policy, rather than a fixed set. Even specifying the optimal policy may require exponential space (see Sect. 3 for details). The problem is likely to be PSPACE-hard (a variant of the problem has been shown to be PSPACE-hard [16]). Designing good approximation algorithms for this problem and its generalizations has received considerable attention in recent years, which we review in more detail in Sect. 3.

In the fixed-set stochastic shortest path problem, we make our final decision in one shot, while in the stochastic knapsack problem, we make decisions in an adaptive fashion. Another popular model that interpolates the above two models is the two-stage stochastic model with recourse. We provide an example below.

Problem 1.3

(Two-stage Stochastic Set Cover) We are given a set U of ground elements and a weighted family \({\mathcal F}\) of subsets of U. We would like to choose some subsets \({\mathcal S}\subseteq {\mathcal F}\) of minimum total weight to cover all elements in U (i.e., each element of U is contained in some subset of \({\mathcal S}\)). This is the classical deterministic set cover problem. In the two-stage stochastic model, as the name suggests, there are two stages. In the first stage, we do not know the actual demand (i.e., the set of elements we need to cover), but only a distribution \(\mathcal {D}\) over it. We can select a few subsets \({\mathcal S}^\mathrm{I}\subseteq {\mathcal F}\) in the first stage. In the second stage, the actual demand set \(U'\) \((U'\subseteq U)\) is revealed. Our first-stage decision \({\mathcal S}^\mathrm{I}\) may not cover \(U'\), in which case we can select some extra subsets \({\mathcal S}^\mathrm{II}\subseteq {\mathcal F}\) to cover the remaining elements of \(U'\). The catch is that each subset in \({\mathcal F}\) is \(\lambda \) times more expensive in the second stage than in the first stage (\(\lambda >1\) is called the inflation factor). The goal is to minimize the expected total cost over both stages.

The above two-stage model naturally captures applications in which a decision has to be made before knowing the demand, and a more expensive recourse action can be taken afterwards. For example, suppose a company needs to set up its facilities. Before building the facilities, it has only partial knowledge of the demand. After setting up the facilities, the actual demand is revealed. If the demand exceeds the capacity of the current facilities, the company can set up more facilities, usually at a higher cost, to meet the extra demand.

The two-stage stochastic optimization model has been a major topic in both the stochastic programming literature (see e.g., [2]) and the stochastic combinatorial optimization literature, which we will discuss in more detail in Sects. 6 and 7.

All three problems above assume the stochastic information is given as part of the input. However, in many real-life problems, there is no readily available stochastic information. All we have are some presumed probabilistic models with unknown parameters (e.g., the cost is normally distributed with unknown mean and variance). We are allowed to take samples (or we may already have some historical samples) to obtain the distributional information (e.g., to learn the parameters of the probabilistic model). The learning step has traditionally been the concern of the statistical learning community and, in many cases, was treated separately from the optimization step. However, several interesting recent research directions attempt to bring the learning step and the optimization step together. We provide an example below. More can be found in Sect. 9.

Problem 1.4

(Combinatorial Bandit-Arm Selection) We have a bandit with n arms, where the i-th arm is associated with an unknown reward distribution supported on [0, 1] with mean \(\theta _i\). Upon each sample (or “pull”) of a particular arm, we receive a reward, which is an i.i.d. sample from the underlying reward distribution. We sequentially decide which arm to pull next and then collect the reward by sampling that arm. The top-k arm identification problem is to identify a subset of k arms with the maximum total mean using as few samples as possible.
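As a point of reference, the following naive strategy (pull every arm equally often and keep the k largest empirical means) is a correct but sample-inefficient baseline; the procedures discussed in Sect. 9 aim to beat exactly this kind of uniform sampling. The Bernoulli arms below are illustrative.

import random

def naive_top_k(arms, k, pulls_per_arm=1000):
    """Pull each arm the same number of times and return the indices of
    the k arms with the largest empirical means (a uniform-sampling
    baseline, not a sample-optimal strategy)."""
    means = []
    for pull in arms:                      # pull() draws one i.i.d. reward
        rewards = [pull() for _ in range(pulls_per_arm)]
        means.append(sum(rewards) / pulls_per_arm)
    order = sorted(range(len(arms)), key=lambda i: means[i], reverse=True)
    return order[:k]

# Four Bernoulli arms with means unknown to the algorithm:
thetas = [0.9, 0.8, 0.5, 0.3]
arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in thetas]
print(naive_top_k(arms, k=2))              # typically [0, 1]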

Instead of the simple cardinality constraint, we can consider other combinatorial constraints over the arms as well. For example, suppose there is a matroid constraint over the set of arms and we need to select a basis of the matroid. This model was proposed recently by Chen et al. [17].

We highlight an interesting motivating application of top-k arm identification in crowdsourcing. In recent years, crowdsourcing services have become increasingly popular for providing labeled data for many machine learning tasks. In such a service (e.g., Amazon Mechanical Turk), the requester submits a batch of microtasks (e.g., unlabeled data) and workers from the crowd are asked to complete the tasks. Upon each task completion, a worker receives a small monetary reward. Since some workers from the crowd are noisy and unreliable, it is important to choose more reliable workers. We can model the reliability of a worker by a Bernoulli distribution. One way to choose reliable workers is to test each worker with a few golden samples (i.e., data with known correct labels). One such test corresponds to a pull of the arm. Clearly, it is desirable to select the best k workers using as few samples as possible.

The reader may have already noticed that the deterministic versions of the above problems are mostly classical combinatorial optimization problems, which are quite well understood. However, the stochasticity drastically changes the nature of these problems. The study of the approximability of many stochastic combinatorial problems is still an active research area and there are many open problems. We attempt to list some of them along the way.

We note that the body of literature on the topic is already large and has been growing quite fast. There are certainly other important stochastic combinatorial optimization problems that do not belong to any of the above four classes. Depending on how the stochasticity interacts with the decision process in new applications, it is possible to formulate new classes of meaningful models and problems. We mention some of them in Sect. 9.

The outline of this article is as follows: We first review some standard terminology in Sect. 2. In the next few sections, we discuss a few problems and techniques in detail. In particular, we discuss the stochastic knapsack problem (Problem 1.2) in Sect. 3, the stochastic matching problem in Sect. 4, and the contention resolution scheme with its applications to stochastic probing in Sect. 5. These three sections are about adaptive stochastic problems. In the following two sections, we discuss two important techniques for two-stage stochastic optimization with recourse: the linear programming (LP) technique (Sect. 6) and the boosted sampling technique (Sect. 7). In Sect. 8, we introduce the Poisson approximation technique, which can be used to deal with sums of random variables in both fixed-set problems and stochastic knapsack. In Sect. 9, we briefly discuss a few other important models in the literature.

2 Preliminaries

We first review some standard terminology. For a minimization problem, we say an algorithm achieves an approximation factor of \(\alpha \) \((\alpha \geqslant 1)\) if

$$\begin{aligned} {E}[\mathrm {SOL}] / \mathrm {OPT} \leqslant \alpha , \end{aligned}$$

where \(\mathrm {SOL}\) denotes the cost of the solution found by the algorithm, \(\mathrm {OPT}\) denotes the optimal cost, and the expectation is over the randomness of the problem instance and the algorithm (if it is randomized). For a maximization problem, an algorithm achieves an approximation factor of \(\alpha \) \((\alpha \geqslant 1)\) if \( \mathrm {OPT}/ {E}[\mathrm {SOL}] \leqslant \alpha . \)

A polynomial time approximation scheme (PTAS) is an algorithm which takes an instance of a maximization problem and a parameter \(\varepsilon \) and produces a solution whose cost is at least \((1 - \varepsilon )\mathrm {OPT}\), and whose running time, for any fixed \(\varepsilon \), is polynomial in the size of the input. If \(\varepsilon \) appears as an additive factor in the above definition, namely the cost of the solution is at least \(\mathrm {OPT}-\varepsilon \), we say the algorithm is an additive PTAS. In many cases, a PTAS (such as an \(O\big (n^{1/\varepsilon ^2}\big )\) time algorithm) is not efficient in practice. However, obtaining a PTAS for a problem is of significant theoretical interest and importance, as it essentially gives the best approximation ratio one can hope to achieve in polynomial time for NP-hard (or harder) problems. We say a PTAS is a fully polynomial time approximation scheme (FPTAS) if the running time is polynomial in both the size of the input and \(\frac{1}{\varepsilon }\). An FPTAS can be quite efficient in practice.

For a deterministic combinatorial optimization problem A, the exact version of A asks for a feasible solution of A with weight exactly equal to a given number K. An algorithm for the exact version runs in pseudopolynomial time if its running time is bounded by a polynomial in n and K.

The following standard Markov inequality will be used frequently.

Proposition 2.1

(Markov inequality) Let X be a non-negative random variable and E[X] its expectation. For any \(\alpha >1\), it holds that

$$\begin{aligned} \Pr [X\geqslant \alpha E [X]]\leqslant 1/\alpha . \end{aligned}$$
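As a quick sanity check of the inequality (and of the role of non-negativity), one can compare the empirical tail of a non-negative variable with the bound \(1/\alpha \); a small Monte Carlo sketch:

import random

# X ~ Exponential(1), a non-negative variable with E[X] = 1, so Markov
# gives Pr[X >= alpha * E[X]] = Pr[X >= alpha] <= 1 / alpha.
samples = [random.expovariate(1.0) for _ in range(100_000)]
for alpha in (2, 4, 8):
    tail = sum(x >= alpha for x in samples) / len(samples)
    print(alpha, round(tail, 4), "<=", 1 / alpha)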

A finite matroid \({\mathcal M}\) is a pair \((V,{\mathcal I})\), where V is a finite set (called the ground set) and \({\mathcal I}\) is a collection of subsets of V. Each element in \({\mathcal I}\) is called an independent set. A maximal independent set is called a basis. Moreover, \(\mathcal{M}=(V,{\mathcal I})\) satisfies the following three properties:

  1. \(\varnothing \in {\mathcal I}\);

  2. if \(A\subseteq B\) and \(B\in {\mathcal I}\), then \(A\in {\mathcal I}\); and

  3. for all \(A, B \in {\mathcal I}\) with \(|A|>|B|\), there exists an element \(e\in A{\setminus }B\) such that \(B\cup \{e\} \in {\mathcal I}\).

Matroids generalize several important combinatorial objects, such as all subsets of a set with cardinality at most k, all forests in a graph, and all linearly independent subsets of a set of vectors. For more information about the theory of matroids, see, e.g., [18].
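One reason matroids are so useful algorithmically is that the natural greedy algorithm finds a maximum-weight basis given only an independence oracle. A minimal sketch (the uniform matroid below is just an example):

def max_weight_basis(elements, weights, is_independent):
    """Greedy over a matroid: scan elements in decreasing weight and keep
    an element whenever adding it preserves independence.  For matroids
    this returns a maximum-weight basis; for general downward closed set
    systems the greedy solution can be far from optimal."""
    basis = []
    for e in sorted(elements, key=lambda e: weights[e], reverse=True):
        if is_independent(basis + [e]):
            basis.append(e)
    return basis

# Uniform matroid of rank 2 over {a, b, c, d}: A is independent iff |A| <= 2.
weights = {'a': 5, 'b': 3, 'c': 8, 'd': 1}
print(max_weight_basis(weights, weights, lambda A: len(A) <= 2))  # ['c', 'a']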

3 Adaptive Stochastic Knapsack

In this section, we consider Problem 1.2, the adaptive stochastic knapsack problem. First, recall that in the deterministic knapsack problem, we have a set \({\mathcal I}\) of n items. The ith item has a value \(v_i\) and a size \(s_i\). Our objective is to choose a maximum value set of items that fits into a knapsack of capacity C.

In the stochastic knapsack problem, the values \(\{v_i\}\) and sizes \(\{s_i\}\) of items are non-negative independent random variables with known distributions. For simplicity, we assume the distributions are discrete. We insert the items one by one. Once an item is inserted, its value and size are revealed (drawn from the distribution), and we procure its value. If the realization of an item causes the knapsack to overflow, we stop, and the overflowing item contributes zero value. Our goal is to design a policy for inserting items that maximizes the expected total value of the items successfully placed in the knapsack. Without loss of generality, we can assume that the values \(\{v_i\}\) are deterministic and that the capacity of the knapsack is \({C} = 1\).

We need to explain what a policy is. A non-adaptive policy is simply an ordering of the items to insert. We also consider adaptive policies, which are more general. An adaptive policy is formally defined by a mapping \({\mathcal P}: 2^{\mathcal I}\times {\mathbb {R}}^+\rightarrow {\mathcal I}\) specifying which item to insert next, given the subset of items still available and the total remaining capacity. We can think of a policy as a decision tree. At every node of the tree, the policy specifies which item to insert next, and the directed edges leaving this node correspond to the different possible realizations of the size of this item. See Fig. 2 for an example. The decision tree corresponding to an adaptive policy may have linear height, and thus exponential size. This means that we may not be able to represent the optimal policy in polynomial space, so it is not even clear whether the problem is in NP. In fact, Dean et al. [19] showed that a close variant of the above problem is PSPACE-hard.

Fig. 2 A decision tree corresponding to an adaptive policy

Now, we present a constant factor approximation algorithm due to [16]. Before stating the algorithm, we first need some definitions and simple observations. For each item, it is convenient to consider the mean truncated size

$$\begin{aligned} \mu _i = {E}[\min \{s_i, 1\}]. \end{aligned}$$

For a set S of items, we define \(\mathrm{size}(S) = \sum _{i\in S} s_i\), \(\mu (S) = \sum _{i\in S} \mu _i\), and \(\mathrm{val}(S) = \sum _{i\in S} v_i\). We make the following simple claim.

Claim 3.1

For any fixed set S of items, \(\Pr [\mathrm{size}(S) < 1] \geqslant 1 - \mu (S)\).

Proof

By the definition of mean truncated size and Markov inequality, we can see that \(\Pr [\mathrm{size}(S) \geqslant 1] =\Pr [\min \{\mathrm{size}(S),1\} \geqslant 1] \leqslant {E}[\min \{\mathrm{size}(S), 1\}] \leqslant {E}\left[ \sum _{i\in S} \min \{s_i, 1\}\right] = \mu (S)\).

Lemma 3.2

For any adaptive policy \({\mathcal P}\), let S be the (random) set of all items that \({\mathcal P}\) tries to insert. We have \({E}[\mu (S)] \leqslant 2\), where the expectation is taken over the executions of the policy.

Proof

(sketch) Denote the truncated item size by \(\tilde{s}_i = \min \{s_i, 1\}\). For any policy \({\mathcal P}\), let \(S_t\) denote the first t items chosen by \({\mathcal P}\). We define \(X_t = \sum _{i\in S_t} (\tilde{s}_i - \mu _i)\). It is easy to verify that \(X_t\) is a martingale. Since \(X_0 = 0\), we get \({E}[X_t] = {E}[X_0]= 0\) by the martingale property (together with the optional stopping theorem). Thus, \({E}[X_t]={E}\left[ \sum _{i\in S_t} \tilde{s}_i\right] -{E}[\mu (S_t)] =0\). Moreover, the process stops once \(\mathrm{size}(S_t) > 1\) or there is no item left. Since each truncated size is at most 1, \(\sum _{i\in S_t} \tilde{s}_i\leqslant 2\) always holds. Taking expectations, we see \({E}[\mu (S_t)]\leqslant 2\).

3.1 The Greedy Algorithm

We divide the items \({\mathcal I}\) into two sets: light items \({\mathcal L}\) with \(\mu _i \leqslant \sigma \) and heavy items \({\mathcal H}\) with \(\mu _i > \sigma \), where \(\sigma \) is a constant to be specified later. Assume \({\mathcal L}= \{1, 2, 3, \cdots \}\) is indexed such that \(\frac{v_1}{\mu _1} \geqslant \frac{v_2}{\mu _2} \geqslant \cdots \). Let \(M_k = \sum _{i = 1}^k \mu _i\), and let \(n^*\) be the maximum index such that \(M_{n^*} \leqslant 1\). We define two important quantities

$$\begin{aligned} m_G = \sum _{i = 1}^{n^*} v_i(1 - M_i) \quad \text { and }\quad m_1 = \max _{i\in {\mathcal I}}\{v_i\Pr [s_i \leqslant 1]\}. \end{aligned}$$

The algorithm (Algorithm 1) compares \(m_1\) and \(m_G\): if \(m_1\geqslant m_G\), it inserts only the single item attaining \(m_1\); otherwise, it inserts the light items in the order \(1, 2, 3, \cdots \), i.e., greedily in decreasing order of \(v_i/\mu _i\).

Algorithm 1 The greedy algorithm for adaptive stochastic knapsack
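The following Python sketch implements this greedy rule under the assumptions above (discrete size distributions, deterministic values, capacity \(C=1\)); the exact pseudocode in [16] may differ in minor details such as tie-breaking.

def greedy_policy(items, sigma=1/3):
    """items: list of (v_i, dist_i), where dist_i is a discrete size
    distribution given as [(size, prob), ...].  Returns the list of
    items the non-adaptive greedy policy inserts, in order."""
    def mu(dist):                  # mean truncated size E[min{s_i, 1}]
        return sum(min(s, 1.0) * p for s, p in dist)
    def prob_fit(dist):            # Pr[s_i <= 1]
        return sum(p for s, p in dist if s <= 1.0)

    # Light items, sorted by decreasing value-to-truncated-size ratio.
    light = [(v, d) for v, d in items if mu(d) <= sigma]
    light.sort(key=lambda vd: vd[0] / max(mu(vd[1]), 1e-12), reverse=True)

    # m_G = sum_{i <= n*} v_i (1 - M_i), with M_i the mu-prefix sums.
    m_G, M = 0.0, 0.0
    for v, d in light:
        if M + mu(d) > 1.0:
            break
        M += mu(d)
        m_G += v * (1.0 - M)

    # m_1 = max_i v_i * Pr[s_i <= 1], over all items.
    best = max(items, key=lambda vd: vd[0] * prob_fit(vd[1]))
    m_1 = best[0] * prob_fit(best[1])

    return [best] if m_1 >= m_G else light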

3.2 Analysis

Now, we analyze the approximation ratio of the greedy algorithm. First, we show the expected value of the algorithm can be lower bounded by \(m_G\) and \(m_1\).

Lemma 3.3

Let GRD be the expected value obtained by Algorithm 1. It holds that \(\mathrm {GRD}\geqslant m_1\) and \(\mathrm {GRD}\geqslant m_G\).

Proof

If \(m_1\geqslant m_G\), the expected value of the algorithm is exactly \(m_1\). Otherwise, we can see

$$\begin{aligned} \mathrm {GRD}=\sum _{k\geqslant 1} v_k\Pr \left[ \sum _{i=1}^k s_i \leqslant 1\right] =\sum _{k\geqslant 1} v_k\Pr \left[ \sum _{i=1}^k \tilde{s}_i \leqslant 1\right] . \end{aligned}$$

By Markov inequality, \( \Pr \left[ \sum _{i=1}^k \tilde{s}_i > 1\right] \leqslant \sum _{i=1}^k \mu _i=M_k, \) for any \(k\leqslant n^*\). Hence, \( \mathrm {GRD}\geqslant \sum _{k=1}^{n^*} v_k (1-M_k) = m_G\).

The remaining task is to upper bound the optimal value \(\mathrm {OPT}\) in terms of \(m_1\) and \(m_G\). We handle light items and heavy items separately. We first deal with the light items \({\mathcal L}\). We define a function \(\varPhi (c)\) as follows:

$$\begin{aligned} \varPhi (c) = \left\{ \max \sum _{i\in {\mathcal L}}v_ix_i: 0\leqslant x_i\leqslant 1, \sum _{i\in {\mathcal L}} \mu _ix_i\leqslant c\right\} . \end{aligned}$$
(3.1)

We can see that \(\varPhi (c)\) is the optimal value of the fractional relaxation of the knapsack problem with capacity c using only light items. A crucial observation is the following lemma, which provides an upper bound on the value of any adaptive policy.

Lemma 3.4

Let \(\textit{OPT}_{{\mathcal L}}\) be the expected value obtained by the optimal policy for the problem where we only have light items. Then, we have \(\mathrm {OPT}_{{\mathcal L}}\leqslant \varPhi (2)\).

Proof

(sketch) Let us fix an arbitrary adaptive policy. Interpret \(x_i\) as the probability that item i is successfully placed in the knapsack by the policy. Then \(\sum _{i\in {\mathcal L}}v_ix_i\) is the expected total value obtained by the policy. By Lemma 3.2, it holds that \(\sum _{i\in {\mathcal L}} \mu _ix_i\leqslant {E}[\mu (S)] \leqslant 2\), where S is the set of items the policy tries to insert. This constraint holds for any policy, including the optimal one. Hence, the optimal policy has expected value at most \(\varPhi (2)\).

It is well known how to compute the optimal fractional solution \(\varPhi (c)\) of the knapsack problem: we simply pack items in decreasing order of \(\frac{v_i}{\mu _i}\), with only the last item possibly packed fractionally. In particular, \(\varPhi (c)\) is a piecewise linear concave function with the following explicit form:

$$\begin{aligned} \forall k\in {\mathcal L}, \forall \delta \in [0,\mu _k), \varPhi (M_{k - 1} + \delta ) = \sum _{i = 1}^{k - 1}v_i + \frac{v_k}{\mu _k}\delta . \end{aligned}$$
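Since only this greedy structure of \(\varPhi \) is used in the analysis, it is straightforward to compute; a short sketch:

def phi(c, light_items):
    """Fractional knapsack value Phi(c): pack light items in decreasing
    v_i / mu_i order, with only the last item taken fractionally.
    light_items: list of (v_i, mu_i) pairs, already sorted by v/mu."""
    value, cap = 0.0, c
    for v, mu in light_items:
        if mu <= cap:
            value, cap = value + v, cap - mu
        else:
            value += v * cap / mu
            break
    return value

print(phi(1.0, [(6, 0.3), (5, 0.3), (4, 0.3), (2, 0.3)]))
# 6 + 5 + 4 + 2 * (0.1 / 0.3) = 15.666...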

Lemma 3.5

\(m_G \geqslant \frac{1 - \sigma }{2}\varPhi (1)\geqslant \frac{1 - \sigma }{4}\varPhi (2)\geqslant \frac{1 - \sigma }{4}\mathrm {OPT}_{{\mathcal L}}\).

Proof

We prove the lemma by a picture; see Fig. 3. \(m_G\) corresponds to the area under the staircase, and \(\varPhi (1)\) is twice the area of the dark-shaded triangle ABC. The part of the triangle not covered by \(m_G\) consists of small triangles whose heights sum up to at most \(\varPhi (1)\) and whose bases are each \(\mu _i \leqslant \sigma \). Therefore \(m_G \geqslant \frac{1- \sigma }{2}\varPhi (1)\). The second inequality is due to the concavity of \(\varPhi \), and the last follows from Lemma 3.4.

Fig. 3 The fractional knapsack solution \(\varPhi (c)\). The light-shaded area corresponds to \(m_G\). As \(\varPhi (c)\) is concave, the curve is above the segment AC (figure from [16])

Lemma 3.6

For any policy \({\mathcal P}\), we use \(H\subset {\mathcal H}\) to denote the subset of heavy items that are successfully placed in the knapsack. Note that H is a random set. Then \(E[\mathrm{val}(H)]\leqslant \frac{2}{\sigma }m_1\).

Proof

Whenever the policy \({\mathcal P}\) attempts to insert an item i, the insertion succeeds with probability at most \(\Pr [s_i\leqslant 1]\). Let S be the set of heavy items that \({\mathcal P}\) attempts to insert. Thus we have \(\Pr [i\in H] \leqslant \Pr [{\mathcal P}\text { attempts to insert }i]\cdot \Pr [s_i \leqslant 1] =\Pr [i\in S]\Pr [s_i \leqslant 1]. \) From Lemma 3.2, we know \(E[\mu (S)] \leqslant 2\). Finally, we can see that

$$\begin{aligned} {E}[\mathrm{val}(H)]&= \sum _i v_i\Pr [i\in H] \leqslant \sum _i v_i\Pr [i\in S]\Pr [s_i \leqslant 1] \\&\leqslant m_1 \sum _i \Pr [i\in S] = {E}[|S|] m_1 \leqslant \frac{2}{\sigma }m_1. \end{aligned}$$

The last inequality follows from the definition of heavy items: \(\mu _i> \sigma \) for each \(i\in S\), hence \(E[|S|]\leqslant E[\mu (S)]/\sigma \leqslant 2/\sigma \).

Theorem 3.7

For \(\sigma = 1/3, \mathrm {OPT}\leqslant 12 \mathrm {GRD}\). So, Algorithm 1 is a 12-approximation.

Proof

Consider the optimal adaptive policy, and write the set of successfully inserted items as \(S = L \cup H\), where L and H are the light and heavy items in S, respectively. By Lemma 3.5, \({E}[\mathrm{val}(L)] \leqslant \mathrm {OPT}_{{\mathcal L}} \leqslant \frac{4}{1 - \sigma }m_G\), and by Lemma 3.6, \({E}[\mathrm{val}(H)] \leqslant \frac{2}{\sigma }m_1\). Combining with Lemma 3.3, we obtain

$$\begin{aligned} \mathrm {OPT}= {E}[\mathrm{val}(L)] + {E}[\mathrm{val}(H)] \leqslant \frac{4}{1 - \sigma }m_G + \frac{2}{\sigma }m_1 \leqslant 12\mathrm {GRD}. \end{aligned}$$

3.3 Historical Notes, Variants, and Generalizations

The adaptive stochastic knapsack problem was first studied by Dean et al. [16]. The above greedy algorithm and the analysis are essentially due to them. With some extra effort, one can in fact show that the greedy algorithm achieves an approximation factor of 7. Note that the greedy algorithm is in fact a non-adaptive policy, and we compare it against the optimal adaptive policy (which may not have a polynomial size representation). The crux of the analysis, which allows us to upper bound the optimal value, is Lemma 3.4. Although the approach is very similar to the linear programming relaxation technique used to approximate deterministic NP-hard problems, there is an essential difference. In the deterministic case, we have integrality constraints on the variables, and we relax them. In our case, the variables should be interpreted as certain probabilities, which can be fractional; we only have some necessary (but not sufficient) constraints on these probabilities (in the above problem, the constraint is \(\sum _{i\in {\mathcal L}} \mu _ix_i\leqslant 2\)). In Sect. 4, we will see another example of this idea.

In the journal version of their paper [20], Dean et al. presented a randomized variant of the greedy algorithm which achieves a factor of 32/7. In the same paper, they proposed another useful idea that can further improve the approximation factor. Note that, in expectation, the number of heavy items that can be inserted successfully is bounded by a small constant. Hence, it is possible to enumerate all possible ways to insert the heavy items (by enumerating the decision tree). Using this idea, they managed to improve the approximation factor to 3.

The idea of enumerating decision trees was carried further by Bhalgat et al. [21]. They showed that there exists a special class of policies, called block-adaptive policies, which, combined with their discretization technique, allows one to enumerate all possible decision trees for such policies in polynomial time. They used this idea to obtain a bi-criterion PTAS: their policy can achieve an expected value of \((1-\varepsilon )\mathrm {OPT}\) while violating the knapsack capacity by at most \(\varepsilon \), for any constant \(\varepsilon >0\). Typically, such a result cannot be achieved by the LP technique. Using this result, Bhalgat obtained a \((2+\varepsilon )\)-approximation, which is currently the best known approximation factor for the adaptive stochastic knapsack problem. Their approach was further simplified and generalized in [8] using the Poisson approximation technique, which we detail in Sect. 8.

3.3.1 Stochastic Knapsack with Correlations and Cancellations

Gupta et al. [22] considered a generalization in which the value and the size of a job can be correlated (described by a joint distribution), and in which we can cancel a job in the middle of processing. They provided an approximation algorithm with a factor of 8, based on a new time-indexed LP relaxation. They also considered a further generalization, called the non-martingale bandit problem, and provided a constant factor approximation algorithm. Using the Poisson approximation technique, Li and Yuan [8] obtained a bi-criterion PTAS for the stochastic knapsack problem with correlations and cancellations. They also showed that the bi-criterion PTAS can be easily converted into a polynomial time \((2 + \varepsilon )\)-approximation if we only allow cancellations. Ma [23] again used the LP technique, but with a different LP corresponding to a dynamic program, and obtained a polynomial time algorithm that finds a \((2 + \varepsilon )\)-approximate adaptive policy for stochastic knapsack with correlations or cancellations when all possible sizes of any item are integers. We conclude the discussion of the stochastic knapsack problem with the following open questions:

Open Question 1 Is there a PTAS for the adaptive stochastic knapsack problem (without violating the capacity constraint)?

Note that certain variants of stochastic knapsack are known to be PSPACE-hard to approximate within some constant factors [16, 19]. However, these results do not preclude an affirmative answer to the above question. Obtaining an approximation ratio better than 2 is also a very interesting open question, which seems to require new ideas. In fact, even for the basic version of the adaptive stochastic knapsack problem, where the sizes are random but the values are deterministic, whether the problem is PSPACE-hard is still open.

3.3.2 Stochastic Orienteering Problem

Stochastic orienteering is a generalization of stochastic knapsack. It is defined as follows:

Problem 3.8

(Stochastic Orienteering) We first introduce the deterministic version. We are given a metric \((V, d)\), where V is the set of vertices and d is the distance metric, a budget B, and an initial vertex \(r\in V\). Each vertex has an associated job with a reward and a processing time. We assume that the travel time between two vertices u and v is proportional to d(u, v). The goal is to compute a path starting from r that visits a subset of vertices and runs the jobs on these vertices, so as to maximize the total reward, subject to the constraint that the total travel time plus the total job processing time does not exceed the budget B.

In the stochastic version, the reward and the processing time of each job are random variables. The distributions are also given as input. The goal now is to compute an adaptive policy that maximizes the expected total reward.

We can easily see that if all distances are 0, the problem reduces to the stochastic knapsack problem. The problem was introduced by Gupta et al. [22]. They also showed that when the rewards are deterministic, there is a non-adaptive policy (one that visits the vertices in a fixed order) which is an O(1) approximation to the best non-adaptive policy and an \(O(\log \log B)\) approximation to the best adaptive policy. It is tempting to believe that there is a non-adaptive policy which is a constant approximation of the best adaptive policy. Somewhat surprisingly, Bansal and Nagarajan [24] showed that the gap between the best adaptive policy and the best non-adaptive policy is at least \(\varOmega \left( (\log \log B)^{1/2}\right) \), even if the rewards are deterministic.

4 Stochastic Matching

In this section, we consider the following adaptive stochastic matching problem. We provide a simple constant factor approximation algorithm using the LP-rounding technique. In Sect. 5, we present a significant extension of this technique, called the contention resolution scheme, which can be used to handle several other combinatorial constraints as well.

4.1 Problem Definition

Problem 4.1

(Adaptive Stochastic Matching) We are given a probabilistic graph G(V, E) on n nodes, in which we would like to find a maximum weight matching. For each edge (u, v), we are given a probability value \(p_{uv}\) and a weight \(w_{uv}\). Each vertex u is associated with a positive integer \(t_{u}\), which can be thought of as the patience level of u. To find out whether the edge (u, v) is present in G or not, we can “probe” the pair (u, v) (assuming that u and v are still unmatched). If the edge (u, v) indeed exists (which happens with probability \(p_{uv}\), independently of all other edges), it is irrevocably added to our matching M. Moreover, we can probe at most \(t_u\) edges incident to vertex u. Our goal is to design an adaptive probing policy that maximizes the expected total weight of the matching M.

The problem is motivated by the kidney exchange program, which we now briefly describe. The problem and some of its variants also find important applications in online dating and online advertisement assignment (see [25] for more details).

A Motivating Example: Kidney Exchange The United Network for Organ Sharing (UNOS) launched the famous kidney exchange program in 2000, which has saved thousands of lives [26]. We briefly describe the program and how it can be modeled as a stochastic matching problem. Suppose a family member of a patient would like to donate a kidney to the patient, but the kidney is incompatible with the patient's body. The idea is to identify two such incompatible patient/donor pairs where each donor is compatible with the other pair's patient, and then perform the kidney exchange between the two pairs. We can think of each patient/donor pair as a vertex, and there is an edge between two vertices if compatibility allows for an exchange. Of course, our goal is to find a large matching. To decide compatibility, three main tests must be performed. The first two, the blood-type test and the antibody screen, are fairly easy to perform, while the third, called crossmatching, is the most critical, yet very expensive and time consuming; compatibility can only be determined by this test. Therefore, if the crossmatch test is passed, the transplant should be performed immediately. Thus, we can estimate the probability that two pairs are compatible based on the initial two tests and model it by a probabilistic edge. A crossmatch test corresponds to a probe of an edge. If the probe is successful, we must include this edge in our matching (i.e., the exchange is performed). The patience level for each vertex is also very natural: it models the fact that a patient will eventually die without a successful match.

4.2 Algorithm

As with the stochastic knapsack problem, we use a linear program to upper bound the optimal profit and to guide our algorithm. Consider the following LP: for any vertex \(v\in V\), \(\partial (v)\) denotes the set of edges incident to v. Variable \(y_{e}\) denotes the probability that edge \(e=(u,v)\) is probed by the adaptive policy, and \(x_{e}=p_{e}\cdot y_{e}\) denotes the probability that u and v get matched by the policy.

$$\begin{aligned} \text {maximize}\quad&\sum _{e\in E} w_{e}\cdot x_{e} \nonumber \\ \text {s.t.}&\sum _{e\in \partial (v)} x_{e} \leqslant 1, \quad \forall v\in V, \nonumber \\&\sum _{e\in \partial (v)} y_{e} \leqslant t_v, \quad \forall v\in V, \nonumber \\&x_{e} =p_{e}\cdot y_{e}, \quad \forall e\in E, \nonumber \\&0\leqslant y_{e} \leqslant 1, \quad \forall e\in E. \end{aligned}$$
(4.1)
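Since \(x_e\) is determined by \(y_e\), substituting \(x_e = p_e y_e\) leaves an ordinary LP in the variables \(y_e\) alone, which any LP solver can handle. Below is a sketch assuming scipy is available; the toy instance is illustrative.

import numpy as np
from scipy.optimize import linprog

def solve_matching_lp(edges, p, w, t, n):
    """Solve LP (4.1) after substituting x_e = p_e * y_e, leaving only
    the variables y_e.  edges: list of (u, v) over vertices 0..n-1."""
    m = len(edges)
    A_ub, b_ub = [], []
    for v in range(n):
        inc = [i for i, e in enumerate(edges) if v in e]
        row_x, row_y = np.zeros(m), np.zeros(m)
        for i in inc:
            row_x[i] = p[i]          # sum_{e in d(v)} p_e y_e <= 1
            row_y[i] = 1.0           # sum_{e in d(v)} y_e <= t_v
        A_ub += [row_x, row_y]
        b_ub += [1.0, t[v]]
    # maximize sum_e w_e p_e y_e  <=>  minimize the negated objective
    res = linprog(c=[-w[i] * p[i] for i in range(m)],
                  A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * m)
    return res.x

# Toy instance: a triangle, unit weights, p_e = 0.5, patience t_v = 1.
edges = [(0, 1), (1, 2), (0, 2)]
y_star = solve_matching_lp(edges, p=[0.5] * 3, w=[1.0] * 3, t=[1, 1, 1], n=3)
print(np.round(y_star, 3))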

Claim 4.2

The optimal value of the above LP is an upper bound on the optimal expected profit of any adaptive policy, which we denote by OPT.

We first solve the LP optimally and let \(\{y^*_e\}_{e\in E}\) denote an optimal solution. Let \(x^*_{e} =p_{e}\cdot y^*_{e}\).

Our rounding algorithm proceeds as follows. Fix a constant \(\alpha \geqslant 1\), to be specified later. The algorithm picks a uniformly random permutation \(\pi : E\rightarrow E\) on all edges and inspects the edges in the order of \(\pi \). We say an edge \(e=(u,v)\) is safe if both u and v are unmatched and both have remaining patience. Whenever it is safe to probe the next edge \(e\in E\), the algorithm does so with probability \(\frac{y^*_e}{\alpha }\). If edge e indeed exists, we include e in the matching; otherwise, we decrease the patience levels of both endpoints of e by 1. The algorithm skips all unsafe edges at the time they appear in \(\pi \). See Algorithm 2 for the pseudocode.

Algorithm 2 The LP-rounding probing algorithm for stochastic matching
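A sketch of one run of Algorithm 2 in code, following the description above (patience is decremented only on failed probes, since a successful probe matches both endpoints):

import random

def probe_matching(edges, y_star, p, t, alpha=4.0):
    """One run of Algorithm 2 (a sketch).  edges: list of (u, v);
    y_star, p: per-edge LP values and existence probabilities;
    t: per-vertex patience levels.  Returns the matching built."""
    patience = dict(t)
    matched, matching = set(), []
    order = list(range(len(edges)))
    random.shuffle(order)                     # uniformly random permutation pi
    for i in order:
        u, v = edges[i]
        safe = (u not in matched and v not in matched
                and patience[u] > 0 and patience[v] > 0)
        if not safe:
            continue                          # unsafe edges are skipped
        if random.random() < y_star[i] / alpha:   # probe with prob y*_e/alpha
            if random.random() < p[i]:        # the edge turns out to exist
                matching.append((u, v))
                matched.update((u, v))
            else:
                patience[u] -= 1
                patience[v] -= 1
    return matching

# Continuing the toy triangle instance (y_star from the LP above):
print(probe_matching([(0, 1), (1, 2), (0, 2)],
                     y_star=[0.5, 0.5, 0.5], p=[0.5] * 3,
                     t={0: 1, 1: 1, 2: 1}))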

4.3 Analysis

The approximation guarantee is summarized in the following theorem:

Theorem 4.3

For \(\alpha =4\), Algorithm 2 achieves an approximation factor of 8 against any adaptive policy for the stochastic matching problem.

We start with a simple lemma, which bounds the probability that an edge is unsafe at the moment the algorithm considers it.

Lemma 4.4

For any edge \((u,v)\in E\), at the point when (u, v) is considered under \(\pi \),

  (a) the probability that vertex u has lost its patience is at most \(\frac{1}{2\alpha }\), and

  (b) the probability that vertex u is matched is at most \(\frac{1}{2\alpha }\).

Proof

We begin with the proof of part (a).

Let the random variable U denote the number of probes incident to vertex u by the time edge (u, v) is considered in \(\pi \).

$$\begin{aligned} {E}[U]= & {} \sum _{e\in \partial (u)} \Pr [\text{ edge } e\hbox { appears before } (u,v)\hbox { in }\pi \hbox { and }e \hbox { is probed}]\\\leqslant & {} \sum _{e\in \partial (u)} \Pr [\text{ edge } e\hbox { appears before }(u,v) \hbox { in }\pi ]\cdot \frac{y^*_e}{\alpha }= \sum _{e\in \partial (u)} \frac{y^*_e}{2\alpha } \quad \leqslant \quad \frac{t_u}{2\alpha }. \end{aligned}$$

The first inequality above follows from the fact that the probability that edge e is probed (conditioned on \(\pi \)) is at most \(y^*_e/\alpha \). Since \(\pi \) is a uniformly random permutation on E, edge e appears before (u, v) with probability 1/2; hence the second equality follows. The last inequality is by the second constraint of LP (4.1).

By Markov inequality, the probability that vertex u has lost its patience when (u, v) is considered is

$$\begin{aligned} \Pr [U\geqslant t_u] \leqslant \frac{{{E}}[U]}{t_u}\leqslant \frac{1}{2\alpha }. \end{aligned}$$

This proves part (a). The proof of part (b) is almost identical: we consider the event that vertex u is matched, replace \(y^*_e\) and \(t_u\) by \(x^*_e\) and 1, respectively, and use the first constraint of LP (4.1).

With Lemma 4.4 in hand, the proof of Theorem 4.3 is now quite simple.

Proof of Theorem 4.3

Given that an edge \(e\in E\) is safe when considered, the expected profit from e is

$$\begin{aligned} w_e p_e y^*_e/\alpha =w_e x^*_e/\alpha . \end{aligned}$$

Consider the point when \(e=(u,v)\) is considered. The edge is unsafe only if u or v is already matched, or u or v has lost its patience. By Lemma 4.4 and a union bound over these four events,

$$\begin{aligned} \Pr [e \text{ is } \text{ safe }] \geqslant 1- 4\cdot \frac{1}{2\alpha } = 1-\frac{2}{\alpha }. \end{aligned}$$

Hence, the total expected profit is

$$\begin{aligned} \sum _e w_e \frac{x^*_e}{\alpha } \Pr [e \text{ is } \text{ safe }] \geqslant \sum _{e} w_e x^*_e\frac{1-2/\alpha }{\alpha } \geqslant \frac{1-2/\alpha }{\alpha }\mathrm {OPT}. \end{aligned}$$

Plugging in \(\alpha = 4\) gives an approximation ratio of 8, as desired.

4.4 Related Results and Variants

The problem was first formulated by Chen et al. [27]. They provided a 4-approximation for the unweighted version of the problem (\(w_e=1\) for all edges e). In fact, they showed that the greedy policy which probes the edge e with the highest \(p_e\) achieves an approximation factor of 4. Their analysis works directly on the relation between the decision tree of the optimal policy and that of the greedy policy. Adamczyk [28] improved their analysis and showed that the greedy algorithm is in fact a 2-approximation for unweighted stochastic matching. Note that this is tight, since the greedy algorithm is a 2-approximation even in the deterministic setting. However, this greedy algorithm (and other simple greedy schemes) can be arbitrarily bad for the weighted version. The algorithm above is from Bansal et al. [25]. They also provided a more nuanced 4-approximation (3-approximation for bipartite graphs) based on the dependent rounding technique [29]. Very recently, Adamczyk et al. [30] improved the ratio to 2.845 for bipartite graphs, and Baveja et al. [31] obtained a 3.224-approximation for general graphs.

Open Question 2 Is there a polynomial time approximation algorithm for the adaptive (unweighted or weighted) stochastic matching problem that can achieve an approximation ratio \(\alpha \) with \(\alpha <2\)?

4.4.1 Online Stochastic Matching

In an influential paper, Karp et al. [32] proposed the following online matching problem: There is a bipartite graph G(U, V, E), where the vertices in U are given upfront, and the vertices in V arrive one by one in an online fashion. When a new vertex \(v\in V\) arrives, the edges incident to v are also revealed. We must either match v to an available vertex in U irrevocably, or leave v unmatched (in which case we cannot match v later). Karp et al. gave a very simple and elegant randomized algorithm that achieves a competitive ratio of \(1-1/e\), which is optimal when the order of arrivals of the vertices in V is chosen by an adversary. A major motivating application of the online matching problem is online advertising: each node in U represents an advertiser/bidder and each node in V represents an ad slot/keyword.

The online stochastic matching problem is a stochastic variant of the online matching problem, initially studied by Feldman et al. [33]. In this model, the bipartite graph is known, but the arrivals are i.i.d. samples from a given distribution (instead of being chosen by an adversary). Under some technical assumptions, they gave a 0.67-competitive algorithm, beating the optimal \(1-1/e\) competitiveness known for adversarial arrivals [32]. Several improved bounds for this model were obtained subsequently [34–36]. The current best competitive ratio is 0.706, due to Jaillet and Lu [35], and the best upper bound on the competitive ratio of any online algorithm is 0.832, due to Manshadi et al. [36]. Goel and Mehta [37] considered a slightly different model in which the arrival sequence is a random permutation (known as the random permutation model). They showed that a greedy algorithm achieves a \((1-1/e)\) competitive ratio. The ratio was improved to 0.653 by Karande et al. [38] and to 0.696 by Mahdian and Yan [39]. The origin of the random permutation model is the popular secretary problem, which we briefly review in Sect. 9.

Mehta and Panigrahi [40] considered another related model where even the distribution is not given in advance; rather, it arrives online. More precisely, when a new vertex v in V arrives, we learn the values \(\{p_{uv}\}_{u\in U}\). We have to assign v to an available vertex u in U, and the assignment succeeds with probability \(p_{uv}\). If the assignment is not successful, we cannot assign v again, but u remains available for later arrivals. It is easy to see that a simple greedy algorithm achieves a competitive ratio of 1/2. Mehta and Panigrahi improved the ratio to 0.567 for equal and vanishing probabilities (i.e., \(p_{uv}=p\) for all \((u,v)\in E\) and \(p\rightarrow 0\)). They also showed that no algorithm for this problem can be \((1-1/e)\) competitive. Recently, Mehta et al. [41] considered the more general case of unequal and vanishing probabilities, and presented a 0.534-competitive online algorithm, beating the trivial 1/2 factor.

4.4.2 The Adword Problem and Online Stochastic LP

A closely related problem is the Adwords problem, which has important applications to sponsored search auctions. The problem is defined as follows. There are n advertisers and m keyword queries. Advertiser i has a budget \(B_i\). In each time slot, a query j arrives and we need to allocate it to some advertiser \(i\in [n]\). Let \(b_{ij}\) be the bid of advertiser i for query j. We use the indicator variable \(x_{ij}\) to denote whether j is allocated to i. The revenue generated by the algorithm is \(\sum _i \min (B_i, \sum _j b_{ij}x_{ij})\).

We can see that the problem reduces to the online matching problem if \(B_i=1\) and \(b_{ij}\in \{0, 1\}\). The seminal paper by Mehta et al. [42] initiated the study of the problem and presented an optimal \((1-1/e)\) competitive algorithm in the worst-case setting. Various stochastic settings (i.i.d. arrivals, random permutation models) have been studied extensively. The problem has been referred to as online stochastic packing LP in the literature (one can think of the variables of an LP as arriving online, where the value of each arriving variable has to be determined immediately). An important quantity in the problem is the so-called budget-to-bid ratio \(\gamma =\min _i B_i/\max _{ij} b_{ij}\). This ratio is usually very large in online advertising applications, and hence it is interesting to ask whether better competitive ratios can be achieved in this case. In fact, it is possible to achieve a competitive ratio of \((1-\varepsilon )\) provided that \(\gamma \) is large enough (in terms of \(\varepsilon , n, m\)). Agrawal et al. [43] proved that \(\gamma \) should be at least \(\varOmega (\varepsilon ^{-2}\log m)\) in the random permutation model. After a series of papers [43–45], Kesselheim et al. [46] presented an optimal algorithm which achieves the condition established by Agrawal et al. In a recent elegant work, Agrawal and Devanur [47] generalized the result to a broader class of online stochastic convex optimization problems, using Fenchel duality and the online mirror descent algorithm developed in online learning. Gupta and Molinaro [48] considered a different generalization to both packing and covering constraints, also making extensive use of results from online learning.

5 Adaptive Stochastic Probing Problem

In this section, we consider the following general problem defined by Gupta and Nagarajan [49]. Let V be a set of elements and \({\mathcal I}\) a family of subsets of V. We say \((V, {\mathcal I})\) is a downward closed set system if for any \(I\in {\mathcal I}\) and \(I'\subseteq I\), we have \(I'\in {\mathcal I}\). Many combinatorial constraints are downward closed, such as matroids, knapsacks, matchings, and any intersection of them.

Problem 5.1

(Stochastic Probing) The stochastic probing problem is defined over a universe V of elements with a weight \(w_e\) for each \(e\in V\). For each element \(e\in V\), there is a probability \(p_e\), indicating the probability that e is active; elements are independent of each other. We are also given two downward closed set systems \((V, \mathcal {I}_{\mathrm {out}})\) and \((V, \mathcal {I}_{\mathrm {in}})\), called the outer packing constraint and the inner packing constraint, respectively. We can adaptively probe the elements. If element e is probed and happens to be active (which occurs with probability \(p_e\)), we must add e to our solution irrevocably. We require that the set Q of probed elements be in \(\mathcal {I}_{\mathrm {out}}\) and that the set of chosen active elements be in \(\mathcal {I}_{\mathrm {in}}\). Our goal is to design an adaptive policy that chooses a (random) set \(S\subseteq V\) of active elements such that the expected weight of S is maximized.

The above problem naturally generalizes the stochastic matching problem: we can encode the patience level constraint by the outer packing constraint (\(\mathcal {I}_{\mathrm {out}}\) is the set of all subgraphs satisfying \(\mathrm {deg}(v)\leqslant t_v\) for all v), and the inner packing constraint dictates that the chosen set of edges forms a matching (\(\mathcal {I}_{\mathrm {in}}\) is the set of all matchings). The result in [49] applies to several important combinatorial packing constraints, such as matroid constraints, k-systems, and k-column sparse packing integer programs. A key tool used here is an elegant abstraction of a class of LP rounding schemes, called contention resolution schemes, introduced by Chekuri et al. [50].

5.1 Contention Resolution Schemes

A very popular and powerful methodology for designing approximation algorithms is to first solve an LP relaxation and then round the fractional LP solution to an integral one. In the context of submodular maximization, Chekuri et al. [50] proposed a class of rounding schemes, called contention resolution (CR) schemes, which can be used to obtain constant factor approximations for submodular maximization under several combinatorial constraints.

Now we define CR schemes formally. There is a set N of n elements, and the set of feasible solutions is captured by a downward closed set system \({\mathcal I}\subseteq 2^N\). Let \({\mathcal P}({\mathcal I})\) be the polytope of the LP relaxation of the problem (so \({\mathcal I}\subseteq {\mathcal P}({\mathcal I})\), identifying sets with their indicator vectors). We solve the LP to obtain an optimal fractional solution \(x \in {\mathcal P}({\mathcal I})\), which we would like to round to a near-optimal integral solution in \({\mathcal I}\) via the following process. For \(0\leqslant b\leqslant 1\) and \(x\in {\mathcal P}({\mathcal I})\), we use \(R(bx)\subseteq N\) to denote the random set obtained by choosing each element \(e\in N\) independently with probability \(bx_e\).

Definition 5.2

(CR Scheme) A \((b, c)\)-balanced CR scheme \(\pi \) for a downward closed set system \({\mathcal I}\) is a scheme such that, for any \(x\in {\mathcal P}({\mathcal I})\), it returns a set \(\pi (I)\subseteq I\), where \(I=R(bx)\), with the following properties:

  1. \(\pi (I)\in {\mathcal I}\);

  2. \(\Pr [e\in \pi (I) \mid e\in I] \geqslant c\) for every element e.

A scheme is said to be monotone if \(\Pr [e\in \pi (R_1)] \geqslant \Pr [e\in \pi (R_2) ]\) whenever \(e\in R_1\subseteq R_2\). A scheme is said to be ordered if there is a permutation \(\sigma \) on N such that, for any \( I =R(bx)\subseteq N\), the output \(\pi (I)\) is the maximal independent subset of I obtained by considering the elements in the order of \(\sigma \).

A CR scheme is in fact an algorithm that takes I as input and returns \(\pi (I)\). The first property guarantees that \(\pi (I)\) is a feasible solution in \({\mathcal I}\). The second property says that, conditioned on \(e\in I\), e is selected by \(\pi \) with probability at least c. Hence, each element e is selected by the CR scheme with probability at least \(bcx_e\). Therefore, if we want to maximize a linear function \(\sum _e w_e x_e\) subject to \(x\in {\mathcal I}\), we immediately obtain a bc-approximation from a \((b, c)\)-balanced CR scheme for \({\mathcal I}\). Monotone and ordered CR schemes will be particularly useful for the stochastic probing problem.
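An ordered scheme, in particular, has a very simple generic implementation given an independence oracle; the constraint-specific work lies in choosing the permutation \(\sigma \) and proving the (b, c) guarantee. A sketch (the matching oracle below is just an example):

def ordered_cr_scheme(I, sigma, is_independent):
    """Ordered CR scheme: scan all elements in the fixed order sigma and
    keep e whenever e lies in the random set I and the kept set stays
    independent.  `is_independent` is an oracle for the downward closed
    family."""
    kept = []
    for e in sigma:
        if e in I and is_independent(kept + [e]):
            kept.append(e)
    return kept

# Example oracle: matchings in a path a-b-c (edges 'ab' and 'bc').
def is_matching(A):
    used = set()
    for e in A:
        if used & set(e):
            return False
        used |= set(e)
    return True

print(ordered_cr_scheme({'ab', 'bc'}, ['ab', 'bc'], is_matching))  # ['ab']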

Efficient CR schemes are known for several important combinatorial constraints. For example, for any \(0<b\leqslant 1\), there is a \(\left( b, \frac{1-e^{-b}}{b}\right) \) CR scheme for matroids and a \((b, 1-kb)\) ordered CR scheme for k-systems. See many more examples in [50].

Moreover, CR schemes have the very nice property that schemes for different constraints can be combined, provided they are all monotone. For example, if there is a monotone \((b,c_1)\)-CR scheme \(\pi _1\) for \(P_{{\mathcal I}_1}\) and a monotone \((b,c_2)\)-CR scheme \(\pi _2\) for \(P_{{\mathcal I}_2}\), one can combine them to obtain a monotone \((b, c_1c_2)\)-CR scheme \(\pi \) for \(P_{{\mathcal I}_1\cap {\mathcal I}_2}\), where \(\pi (x) = \pi _1(x) \cap \pi _2 (x)\). This easily extends to the intersection of more constraints as well.

5.2 Algorithm and Analysis for Stochastic Probing

Now, we show how CR schemes enter the picture for the stochastic probing problem. In particular, we show the following general theorem. Plugging in the known results about CR schemes, we obtain constant factor approximations for stochastic probing with various inner and outer constraints.

Theorem 5.3

Consider an instance of the stochastic probing problem. Suppose the following hold:

  1. There is a \((b,c_{\mathrm {out}})\)-CR scheme \(\pi _{\mathrm {out}}\) for \({\mathcal P}(\mathcal {I}_{\mathrm {out}})\);

  2. There is a monotone \((b,c_{\mathrm {in}})\) ordered CR scheme \(\pi _{\mathrm {in}}\) for \({\mathcal P}(\mathcal {I}_{\mathrm {in}})\).

Then, there is a polynomial time approximation algorithm which can achieve an approximation factor of \(b(c_{\mathrm {out}}+c_{\mathrm {in}}-1)\).

Given an instance of the stochastic probing problem with inner constraint \((V,\mathcal {I}_{\mathrm {in}})\) and outer constraint \((V,\mathcal {I}_{\mathrm {out}})\), we consider the following LP relaxation:

$$\begin{aligned} \text {Maximize}\,\, \,\,&\sum _{e\in V} w_ex_e \\ \text {s.t.}\,\,\,\,&x_e = p_ey_e,\,\,\, \forall e\in V,\\&x\in {\mathcal P}(\mathcal {I}_{\mathrm {in}}), \\&y\in {\mathcal P}(\mathcal {I}_{\mathrm {out}}). \end{aligned}$$

We assume that the above LP relaxation can be solved efficiently. This LP relaxation generalizes (4.1) for stochastic matching, in which the first two constraints can be seen as the outer constraint. Moreover, we can easily obtain an analogue of Claim 4.2.

Claim 5.4

The optimal value of the LP is at least the optimal value OPT of the stochastic probing problem.

Now, we present the algorithm. In step 1, we solve the LP to obtain an optimal solution \((x, y)\). In step 2, we sample \(I=R(b\cdot y)\), i.e., each element e is placed in I independently with probability \(b\cdot y_e\). In step 3, we compute \(P=\pi _{\mathrm {out}}(I)\). Finally, we probe the elements of P in the order \(\sigma \) of the ordered scheme \(\pi _{\mathrm {in}}\), keeping every active element that preserves independence in \(\mathcal {I}_{\mathrm {in}}\).

Algorithm 3 The CR-scheme-based algorithm for stochastic probing
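A sketch of the algorithm in code, with \(\pi _{\mathrm {out}}\) and the independence oracle for \(\mathcal {I}_{\mathrm {in}}\) passed in as black boxes (their concrete implementations depend on the constraints):

import random

def stochastic_probing(elements, y_star, p, b, pi_out, sigma, indep_in):
    """One run of the CR-scheme-based probing algorithm (a sketch).
    Step 2: sample I = R(b * y).  Step 3: P = pi_out(I), which lies in
    I_out.  Step 4: probe the elements of P in the order sigma of the
    ordered inner scheme, keeping active elements while the kept set
    stays independent; the result S then equals pi_in(P intersect J)."""
    I = {e for e in elements if random.random() < b * y_star[e]}
    P = pi_out(I)
    S = []
    for e in sigma:
        if e in P and indep_in(S + [e]):
            if random.random() < p[e]:   # probe e; it is active w.p. p_e
                S.append(e)
    return S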

Now, we analyze the performance of the algorithm. We want to show that the expected value E[w(S)] of our solution is large compared to the optimal LP value \(\sum _e w_ex_e\). Let \(J\subseteq V\) be the (random) set of active elements, let \(I\subseteq V\) be the (random) set we picked in step 2, and let P be the set we selected using \(\pi _{\mathrm {out}}\) in step 3. From the algorithm, we can easily see that our solution is \(S=\pi _{\mathrm {in}}(P\cap J)\).

Lemma 5.5

$$\begin{aligned} \Pr _{I, J, \pi _{\mathrm {out}},\pi _{\mathrm {in}}}[e\in S]= \Pr _{I, J, \pi _{\mathrm {out}},\pi _{\mathrm {in}}}[e\in \pi _{\mathrm {in}}(\pi _{\mathrm {out}}(I)\cap J)] \geqslant b(c_{\mathrm {out}}+c_{\mathrm {in}}-1)x_e. \end{aligned}$$

Proof

(sketch) First, we can see that \(\Pr [e\in I\cap J] = by_e\cdot p_e = bx_e\) by the definition of I. By the definition of CR schemes, we have \(\Pr [e\in P = \pi _{\mathrm {out}}(I) \mid e\in I\cap J] \geqslant c_{\mathrm {out}}\). Since \(P\subseteq I\),

$$\begin{aligned}&\quad \quad \Pr [e\in \pi _{\mathrm {in}}(P\cap J)] \\&\quad = \Pr [e\in \pi _{\mathrm {in}}(P\cap J) \wedge e\in I\cap P\cap J] \\&\quad =\Pr [e\in I\cap P\cap J] - \Pr [e\notin \pi _{\mathrm {in}}(P\cap J) \wedge e\in I\cap P\cap J]\\&\quad = \Pr _{\pi }[e\in P | e\in I\cap J]\Pr _{I,J}[e\in I\cap J]- \Pr [e\notin \pi _{\mathrm {in}}(P\cap J) \wedge e\in I\cap P\cap J]\\&\quad \geqslant bx_e\cdot c_{\mathrm {out}}- \Pr [e\notin \pi _{\mathrm {in}}(P\cap J) \wedge e\in I\cap P\cap J]. \end{aligned}$$

It remains to show that \( K=\Pr [e\notin \pi _{\mathrm {in}}(P\cap J) \wedge e\in I\cap P\cap J]\leqslant (1-c_{\mathrm {in}})\cdot bx_e. \) It is not hard to see that

$$\begin{aligned} K\leqslant \Pr [e\notin \pi _{\mathrm {in}}(I\cap J) \wedge e\in I\cap J]= & {} \Pr [e\notin \pi _{\mathrm {in}}(I\cap J) \mid e\in I\cap J] \Pr [e\in I\cap J] \\\leqslant & {} (1-c_{\mathrm {in}})\cdot bx_e, \end{aligned}$$

where the first inequality is due to the monotonicity of \(\pi _{\mathrm {in}}\) and the final inequality is by the definition of the CR schemes.

Now, it is straightforward to show Theorem 5.3. Indeed, we can see that

$$\begin{aligned} E[w(S)] = \sum _e w_e \Pr [e\in S] \geqslant b(c_{\mathrm {out}}+c_{\mathrm {in}}-1)\sum _e w_ex_e \geqslant b(c_{\mathrm {out}}+c_{\mathrm {in}}-1)\mathrm {OPT}. \end{aligned}$$

5.3 Related Problems and Results

5.3.1 Bayesian Online Selection

A closely related model is the Bayesian online selection problem (BOSP), defined as follows. We are given a set of elements V. Each element e is associated with a non-negative random variable \(X_e\) with a known distribution. We are also given a family of feasible solutions \({\mathcal I}\subseteq 2^V\) (encoded by some combinatorial constraint). We can adaptively choose to observe the elements one by one. Once we see the true value of \(X_e\), we have to decide irrevocably whether to choose the element.Footnote 11 The goal is to design a policy which chooses a feasible subset \(S\in {\mathcal I}\) and maximizes the expected total value \(E\left[ \sum _{e\in S}X_e\right] \). In some settings, the order is chosen by an adversary instead of the policy. Chawla et al. [51] proposed a simple mechanism, called the sequential posted pricing mechanism (SPM), for Bayesian single-parameter auctions. SPMs are closely related to, and in fact serve as important motivations for, the stochastic probing problem and BOSP. See [8, 49, 51–55].

5.3.2 Prophet Inequality

In fact, the special case of BOSP where we can only choose one element was studied by Krengel et al. back in the 1970s [56]. They provided a selection algorithm which returns a single element of expected value at least half of \(E[\max _{e\in V} X_e]\), i.e., half of the expected value obtained by an offline algorithm which knows the realization upfront. More precisely, let e be the element that their algorithm returns. It holds that

$$\begin{aligned} E[X_e] \geqslant \frac{1}{2}E[\max _{e\in V} X_e]. \end{aligned}$$

Such inequalities are often called prophet inequalities. Recently, Kleinberg and Weinberg [55] significantly generalized the above result to general matroid constraints (the constraint of choosing one element is just a uniform matroid of rank 1).
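One classical way to achieve the factor 1/2 is a fixed-threshold rule: accept the first element whose value is at least half the expected maximum. A Monte-Carlo sketch (the uniform distributions are just an illustration):

```python
import random

def prophet_threshold_demo(dists, trials=100_000):
    """dists: zero-argument samplers, one per element, seen in order."""
    # Estimate E[max] by Monte Carlo and set tau = E[max] / 2.
    e_max = sum(max(d() for d in dists) for _ in range(trials)) / trials
    tau = e_max / 2
    got = 0.0
    for _ in range(trials):
        for d in dists:
            v = d()
            if v >= tau:        # accept the first value above the threshold
                got += v
                break
    return got / trials, e_max

alg, opt = prophet_threshold_demo([random.random] * 5)
print(alg, opt)  # the ratio alg / opt stays above 1/2
```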

5.3.3 Online CR Schemes

Recently, Feldman et al. [53] introduced an extension of CR schemes, called online contention resolution schemes (OCRS). An OCRS is almost the same as the original CR scheme, except that the adversary chooses the order of the elements in I, and the OCRS needs to decide irrevocably whether to choose each element. They provided OCRSs for a number of combinatorial constraints, such as matroids, matchings, knapsacks, and their intersections, and applied them to BOSP, stochastic probing, and SPM. See their paper for more details.

6 Two-Stage Stochastic Optimization: LP Approach

In this section, we consider two-stage stochastic optimization models. In particular, we study the two-stage recourse models: In the first stage, only distributional information is available, and one commits to some initial actions. In the second stage, the actual data is revealed and we can take additional actions (called recourse actions) to augment the initial solution to a feasible solution. The recourse actions can be more expensive than the initial actions. The stochastic set cover problem (Problem 1.3) is a typical example in this model.

Let us recall the notations. Let U be the set of ground elements and \({\mathcal S}\) the weighted family of subsets of U. The actual demand (i.e., the set of elements we need to cover) follows a distribution \(\mathcal {D}\). We can select \({\mathcal S}^\mathrm{I}\subseteq {\mathcal S}\) as the first-stage action. In the second stage, the actual demand set \(A\subseteq U\) (\(A\sim \mathcal {D}\)) is revealed. We can select some extra subsets \({\mathcal S}^\mathrm{II}_A\subseteq {\mathcal S}\) as the recourse to cover the remaining elements of A. The inflation factor is \(\lambda >1\); that is, a set in the second stage is \(\lambda \) times as expensive as in the first stage. We call a possible demand set \(A\subseteq U\) a scenario. Let \(p_A\) be the probability of scenario A. The goal is to minimize \(w({\mathcal S}^\mathrm{I})+\lambda {{E}}_A[w({\mathcal S}^\mathrm{II}_A)]\).

We use the following LP relaxation. \(x_S\) denotes whether the set S is chosen in the first stage, and \(y_{A,S}\) denotes whether the set S is chosen in scenario A.

$$\begin{aligned} \text {Minimize}\,\,&\sum _{S\in {\mathcal S}} w_Sx_S + \lambda \sum _{A, S\in {\mathcal S}} p_A w_Sy_{A,S}&\\ \text {s.t.}\,\,&\sum _{S:e\in S} x_S + \sum _{S:e\in S} y_{A,S} \geqslant 1,&\forall A, e\in A,\\&x_S, y_{A,S} \geqslant 0,&\forall S, A. \end{aligned}$$

The above LP is usually called a stochastic LP. The first constraint says that the combined first- and second-stage actions form a set cover for every scenario A. If there are only a polynomial number of scenarios and all \(\{p_A\}\) are given explicitly, we can solve the stochastic LP in polynomial time. However, in most applications, there can be an exponential number of scenarios given in an implicit manner. For example, each element may be in A with a certain probability, independent of the others. A more general and more challenging model is the so-called black-box model, in which we are only given a procedure from which we can sample scenarios from \(\mathcal {D}\). We cannot even write down the stochastic LP explicitly in the black-box model. Nevertheless, Shmoys and Swamy [57] showed that there is a polynomial time approximation scheme for solving the LP by taking at most a polynomial number of samples from the black box, provided \(\lambda \) is bounded by a polynomial of the input size. Their algorithm is an adaptation of the ellipsoid algorithm. Later, they presented another approximation scheme based on the sample average approximation (SAA) method. SAA is a very natural approach for stochastic optimization: We simply sample N scenarios from the black box and then solve the sample-average problem, which is a deterministic problem. Shmoys and Swamy [58] showed that a polynomial number of samples suffices, under some mild conditions. Charikar et al. [59] provided an alternative and somewhat simpler proof of the same result. In fact, their approach is more general: it not only holds for stochastic LPs, but can also be used for computationally intractable problems where only an approximation algorithm for the deterministic sample-average problem is known.
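For concreteness, here is a minimal sketch of the stochastic LP in the polynomial-scenario model, using scipy's generic LP solver (the data layout is ours; any LP solver would do):

```python
import numpy as np
from scipy.optimize import linprog

def stochastic_set_cover_lp(sets, w, scenarios, lam):
    """sets: list of frozensets over the universe; w: their weights;
    scenarios: list of (p_A, A) pairs; lam: the inflation factor."""
    m, K = len(sets), len(scenarios)
    # Variables: x_1..x_m, then y_{A,S} flattened scenario-major.
    c = np.concatenate([np.asarray(w, dtype=float)] +
                       [lam * p * np.asarray(w, dtype=float)
                        for p, _ in scenarios])
    rows, rhs = [], []
    for k, (_, A) in enumerate(scenarios):
        for e in A:              # sum_{S: e in S} (x_S + y_{A,S}) >= 1
            row = np.zeros(m + K * m)
            for j, S in enumerate(sets):
                if e in S:
                    row[j] = -1.0
                    row[m + k * m + j] = -1.0
            rows.append(row)
            rhs.append(-1.0)     # linprog uses <=, so negate
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * (m + K * m))
    return res.x[:m], res.x[m:].reshape(K, m), res.fun
```

For the black-box model, one would instead run SAA on sampled scenarios and feed the empirical probabilities to the same routine.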

Now, we come back to the stochastic set cover problem. We assume the polynomial-scenario model. For the black-box model, we only pay an extra factor of \((1+\varepsilon )\) for any constant \(\varepsilon >0\) based on the above discussion.

Theorem 6.1

There is a \(2(\ln n+1)\) factor approximation algorithm, where n is the number of elements.

Proof

We first consider the standard LP relaxation for set cover with universe U.

$$\begin{aligned} \text {minimize} \,\,\,\sum _{S\in {\mathcal S}} w_S x_S \quad \text {s.t.} \quad \sum _{S: e\in S} x_S\geqslant 1, \,\,\forall e\in U;\quad x_S\geqslant 0, \,\,\forall S\in {\mathcal S}. \end{aligned}$$

Let the optimal LP value be LP(U). It has long been known that a standard independent rounding algorithm finds an integral set cover of cost \(O(\log n)\cdot \mathrm{LP}(U)\) with high probability. Using the exponential clock algorithm of Buchbinder et al. [60], one can get an integral set cover of cost at most \((\ln n+1)\cdot \mathrm{LP}(U)\) with probability 1. Denote this algorithm by \({\mathcal A}\) and let \(\alpha =\ln n+1\).

For any element e and scenario A, we can see that either \(\sum _{S:e\in S} x_S \geqslant 1/2\) or \(\sum _{S:e\in S} y_{A,S} \geqslant 1/2\). Consider \(E = \{e: \sum _{S:e\in S} x_S \geqslant 1/2\}\). Then \(\{2x_S\}\) is a feasible fractional solution for the set cover instance with universe E. Running algorithm \({\mathcal A}\), we get an integral solution covering E with cost at most \(\alpha \,\mathrm{LP}(E)\leqslant \alpha \sum _{S} 2x_Sw_S\). In the second stage, suppose the scenario is A. Then \(\{2y_{A,S}\}\) is a feasible fractional solution covering all elements in \(A\setminus E\). Hence, we can get an integral solution covering \(A\setminus E\) whose (inflated) cost is at most \(\lambda \alpha \,\mathrm{LP}(A\setminus E)\leqslant \alpha \lambda \sum _{S} 2y_{A,S} w_S\). So the overall expected cost is at most \(\alpha \sum _{S} 2x_Sw_S+\alpha \lambda \sum _A p_A \sum _{S} 2y_{A,S} w_S\), and we have a \(2\alpha \)-approximation algorithm for the two-stage stochastic set cover problem.

7 Two-Stage Stochastic Optimization: Boosted Sampling

In this section, we introduce another classical technique for dealing with two-stage stochastic models, called boosted sampling, introduced by Gupta et al. [61]. We use the minimum rooted Steiner tree problem as an example to illustrate the technique.

Problem 7.1

(Two-Stage Stochastic Rooted Steiner Tree) In the deterministic rooted Steiner tree problem, we are given an edge-weighted graph G(V, E) with edge weights \(w:E \rightarrow {\mathbb {R}}^+\) satisfying the triangle inequality and a root \(r\in V\). We are also given a set of terminals \(S\subseteq V\), and we want to choose a tree of minimum total weight connecting all terminals and the root r. In the two-stage stochastic model, there is a probability distribution \(\mathcal {D}\) on the set of terminals. A set of edges \(E_0\) may be bought in the first stage at cost \(\{w_e\}_{e\in E}\). In the second stage, the real set \(S\sim \mathcal {D}\) of terminals is revealed and we may need to buy some extra edges \(E_S\) so that \(E_0\cup E_S\) is a feasible solution. However, we need to pay \(\sigma w_e\) for edge e in the second stage, where \(\sigma >1\) is the inflation factor (which is also part of the input). The goal again is to minimize the expected cost of the solution, that is, \(w(E_0)+{{E}}_S[\sigma w(E_S)]\).

7.1 Cost-Sharing Functions

We will make crucial use of cost-sharing functions in the analysis of the algorithm. Consider the deterministic Steiner tree problem, with a client at each terminal. We would like to divide the cost of the Steiner tree among the client set S. We now formally define cost-sharing functions. Suppose there is an \(\alpha \)-approximation algorithm \({\mathcal A}\) for the Steiner tree problem. Moreover, we require the existence of a polynomial time algorithm \(\mathrm {Aug}_{\mathcal A}\) (called the augmentation algorithm) that can augment a solution \({\mathcal A}(S)\) to a feasible solution for any \(S'\supseteq S\).

Definition 7.2

The function \(\xi : 2^V\times V \rightarrow {\mathbb {R}}\) is a \(\beta \)-strict cost-sharing function if the following properties hold:

  1. (Positivity) For a set \(S\subset V\), \(\xi (S, j) > 0\) only for \(j\in S\).

  2. (Fairness) For a set \(S\subset V\), \(\sum _{j\in S} \xi (S, j) \leqslant w(\mathrm {OPT}(S))\).

  3. (Strictness) If \(S' = S\cup T\) for \(S\cap T = \varnothing \), then \(\sum _{j\in T}\xi (S', j) \geqslant \frac{1}{\beta }\,\times \) the cost of augmenting the solution \({\mathcal A}(S)\) to a solution for \(S'\) using \(\mathrm {Aug}_{\mathcal A}\).

Define \(\xi (S, A) = \sum _{j\in A} \xi (S, j)\).

In some cases, \(\mathrm {Aug}_{\mathcal A}\) can be obtained by zeroing out the costs of elements picked in \({\mathcal A}(S)\) and running \({\mathcal A}\) again, in which case we typically have \(\beta = \alpha \). In general, \(\beta \) is usually larger, because it is usually easier to find a better approximation algorithm than a better augmentation algorithm. We now provide a cost-sharing function for the Steiner tree problem, which will be useful later.

Theorem 7.3

There is a 2-approximation algorithm \({\mathcal A}\) with a 2-strict cost-sharing function \(\xi \) for the rooted minimum Steiner tree problem.

Proof

Given a set of terminals S, \({\mathcal A}\) computes a minimum spanning tree on \(S\cup \{r\}\). It is well known that the cost of \({\mathcal A}(S)\) is within twice the cost of the optimal Steiner tree of S.

Using \({\mathcal A}\), we determine the value of \(\xi (S, j)\) as follows. For any set S and vertex j, if \(j\in S\), set \(\xi (S, j) = w_{(p_j, j)} / 2\), where \(p_j\) is the parent of j in the spanning tree \({\mathcal A}(S)\) rooted at r; if \(j\not \in S\), set \(\xi (S, j) = 0\). Clearly, \(\sum _{j\in S} \xi (S, j) = \sum _{j\in S} w_{(p_j, j)} / 2 = w({\mathcal A}(S)) / 2 \leqslant w(\mathrm {OPT}(S))\). So we have shown fairness.

To show strictness, we first define \(\mathrm {Aug}_{\mathcal A}\). Let \(S'=S\cup T\). \(\mathrm {Aug}_{\mathcal A}\) first zeros out the edges in \({\mathcal A}(S)\) and then runs Prim’s algorithm. The resulting solution obviously includes \({\mathcal A}(S)\). For each terminal \(j\in T\), let \(p_j\) be its parent in \({\mathcal A}(S\cup T)\). Considering the execution of Prim’s algorithm, the edge \((j,p_j)\) is added by \(\mathrm {Aug}_{\mathcal A}\) as well. So the cost of augmenting the solution \({\mathcal A}(S)\) to a solution for \(S'\) is exactly \(\sum _{j\in T} w_{(j,p_j)}\). We can also see that \(\sum _{j\in T}\xi (S', j)=\frac{1}{2} \sum _{j\in T} w_{(j,p_j)}\), which shows \(\beta =2\).
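The construction above is easy to implement: run Prim's algorithm from the root on the metric over \(S\cup \{r\}\) and charge each terminal half the weight of the edge to its parent. A sketch (here `dist` is assumed to be a symmetric metric given as a function):

```python
import heapq

def mst_cost_shares(S, r, dist):
    """Return {j: xi(S, j)} where xi(S, j) = w(p_j, j) / 2 and p_j is
    the parent of j in the MST of S + {r}, rooted at r (Prim's algorithm)."""
    nodes = set(S) | {r}
    parent = {}
    in_tree = set()
    heap = [(0.0, r, None)]          # (edge weight, node, its parent)
    while heap:
        d, v, p = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        parent[v] = p
        for u in nodes - in_tree:    # relax edges out of the new tree node
            heapq.heappush(heap, (dist(v, u), u, v))
    return {j: dist(parent[j], j) / 2 for j in S}
```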

We can in fact use the approximation algorithm proposed in [62] and the same cost-sharing function as above.

Theorem 7.4

There is a 1.39-approximation algorithm, along with a 2-strict cost-sharing function for the rooted minimum Steiner tree problem.

If we are given two sets S, T separately and required to obtain a solution for \(S\cup T\), we can first compute an \(\alpha \)-approximate solution \(\varPhi (S)\) for S, and then augment it to a solution \(\varPhi (S\cup T)\) for \(S\cup T\) using \(\mathrm {Aug}_{\mathcal A}\). As

$$\begin{aligned} w(\varPhi (S\cup T)) - w(\varPhi (S))\leqslant & {} \beta \xi (S\cup T, T) \leqslant \beta \xi (S\cup T, S\cup T) \\\leqslant & {} \beta w({\mathrm{OPT}}(S\cup T)) \end{aligned}$$

and

$$\begin{aligned} w(\varPhi (S)) \leqslant \alpha w({\mathrm{OPT}}(S)) \leqslant \alpha w({\mathrm{OPT}}(S\cup T)), \end{aligned}$$

\(\varPhi (S\cup T)\) is an \((\alpha + \beta )\)-approximate solution for \(S\cup T\).

7.2 Stochastic Steiner Trees

Now, let us come back to the two-stage stochastic Steiner tree problem. Consider the following algorithm.

(Algorithm pseudocode: figure d, Boost-and-Sample, in the original.)
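Since the figure is not reproduced here, the following is a hedged reconstruction of Boost-and-Sample from the analysis below; `sample_D`, the \(\alpha \)-approximation `alg_A`, and the augmentation procedure `augment` (playing the role of \(\mathrm {Aug}_{\mathcal A}\)) are assumed to be supplied:

```python
import math

def boost_and_sample(sigma, sample_D, alg_A, augment):
    # Stage 1: draw ceil(sigma) independent scenarios ("boosted sampling")
    # and buy a solution for their union at first-stage prices.
    samples = [sample_D() for _ in range(math.ceil(sigma))]
    D = set().union(*samples)
    F0 = alg_A(D)

    def second_stage(S):
        # Stage 2: once the real terminal set S is revealed, augment F0
        # to a feasible solution for the union of D and S, paying
        # inflated prices for the extra edges.
        return augment(F0, D, S)

    return F0, second_stage
```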

Now, we analyze the performance of the above algorithm.

Theorem 7.5

Let \({\mathcal A}\) be an \(\alpha \)-approximation algorithm for the minimum rooted Steiner tree problem that admits a \(\beta \)-strict cost-sharing function. Then Boost-and-Sample is an \((\alpha + \beta )\)-approximation algorithm for the two-stage stochastic Steiner tree problem.

Proof

Let \(F_0^*\) be the first-stage component of the optimal solution, and \(F_S^*\) be the second-stage component for scenario S. Hence the optimal cost is

$$\begin{aligned} Z^* = w(F_0^*) + \sigma {{E}}_S[w(F_S^*)]. \end{aligned}$$

Now consider the cost of Boost-and-Sample. Without loss of generality, assume \(\sigma \) is an integer. We use \(\mathrm {Sols}(S)\) to denote the set of feasible solutions for terminal set S.

Consider the first stage. Define \(\hat{F_1} = F_0^*\cup F_{D_1}^*\cup \cdots \cup F_{D_\sigma }^*\). Thus \(\hat{F_1} \in \mathrm {Sols}(D)\). Moreover, we can see that

$$\begin{aligned} {{E}}_D[w(\hat{F_1})] \leqslant w(F_0^*) + \sum _i {{E}}_D[w(F_{D_i}^*)]=w(F_0^*) + \sigma {{E}}_{D_i}[w(F_{D_i}^*)] = Z^*. \end{aligned}$$

Since \({\mathcal A}\) is an \(\alpha \)-approximation algorithm, the expected cost of our first-stage solution satisfies \({{E}}_D[w(F_0)] \leqslant \alpha {{E}}_D[w(\hat{F_1})] \leqslant \alpha Z^*\).

Next consider the second stage, which is slightly trickier. Note that S follows the same distribution as any \(D_i\). The key is to consider an alternate process to generate the sets \(D_i\) and S: Draw \(\sigma + 1\) scenarios \(\hat{D_1}, \cdots , \hat{D}_{\sigma + 1}\) from the distribution \(\mathcal {D}\), choose a random K uniformly from \(\{1, 2, \cdots , \sigma + 1\}\), and set \(S = \hat{D}_K\) and \(D = \cup _{i\ne K} \hat{D}_i\). Now let \(\hat{D} = \cup _{i = 1}^{\sigma + 1} \hat{D}_i\) and \(\hat{D}_{-i} = \cup _{l \ne i} \hat{D}_l\). Intuitively, D is the union of \(\sigma \) copies of \(D_i\), while \(\hat{D}\) is the union of \(\sigma +1\) copies, so \({{E}}_{\hat{D}}[w(\mathrm {OPT}(\hat{D}))]\) is not much larger than \({{E}}_{D}[w(\mathrm {OPT}(D))]\). Moreover, S is simply one of the \(\sigma +1\) copies, so the cost of connecting S should be only a \(1/(\sigma +1)\) fraction of \(w(\mathrm {OPT}(\hat{D}))\). We provide the details below.

Recall \(F_S\) is the set of edges we add in the second stage. By \(\beta \)-strictness, we have \(w(F_S) \leqslant \beta \xi (D\cup S, S\setminus D)\). By fairness, \( \sum _{i=1}^{\sigma +1} \xi (\hat{D}, \hat{D}_i \setminus \hat{D}_{-i}) \leqslant w(\mathrm {OPT}(\hat{D})) \) (note that the sets \(\hat{D}_i \setminus \hat{D}_{-i}\) are disjoint). From the process, \(\xi (D\cup S, S\setminus D)\) is just a uniformly random term on the left-hand side of the above inequality. Hence, we obtain that

$$\begin{aligned} {{E}}_{D,S}[\xi (D\cup S, S\setminus D)] \leqslant \frac{1}{\sigma + 1}{{E}}_{\hat{D}}[w(\mathrm {OPT}(\hat{D}))]. \end{aligned}$$

Moreover, by the sub-additivity, it holds that

$$\begin{aligned} {{E}}_{\hat{D}}[w(\mathrm {OPT}(\hat{D}))] \leqslant w(F_0^*)+\sum _{i=1}^{\sigma +1} {{E}}\left[ w\left( F^*_{\hat{D}_i}\right) \right] \leqslant \frac{\sigma + 1}{\sigma }Z^*. \end{aligned}$$

Combining the above inequalities, we get \(E_S[\sigma w(F_S)] \leqslant \beta Z^*\). Overall, the total expected cost is at most \((\alpha + \beta )Z^*\).

7.3 Related Results

Using the boosted sampling technique, Gupta et al. [61] provided constant factor approximation algorithms for several two-stage stochastic optimization problems, including Steiner tree, vertex cover, facility location, and Steiner network. Note that their framework works even if \(\mathcal {D}\) is given in the black-box model. They also showed that improved approximation algorithms are possible if the nodes are activated independently.

This work has generated a number of follow-ups, of which we mention a very small subset here. Gupta and Pál [63] studied the unrooted version of the two-stage stochastic Steiner tree problem, and provided the first constant factor approximation algorithm. The problem is exactly the same as defined before, except that there is no root r. This case is technically more challenging since the problem is no longer sub-additive.

Gupta and Kumar [64] generalized the above result by providing the first constant factor approximation algorithm for the two-stage stochastic Steiner forest problem,Footnote 12 using a primal-dual LP approach. In fact, the cost-sharing function in the last section is closely related to the dual variables. In a very recent work, Gupta and Kumar [65] gave a simple greedy algorithm for the Steiner forest problem with an approximation ratio of 96 and a 2880-strict cost-sharing function, which implies a 2976-approximation for the two-stage stochastic Steiner forest problem.

In another generalization, Gupta et al. [66] considered the two-stage stochastic Steiner tree problem with non-uniform inflation factors, i.e., the cost of each edge in the second stage may be inflated by a different factor. They provided the first poly-logarithmic approximation factor and showed an approximation hardness of \(\varOmega (\log \log n)\).

7.4 Other Two-Stage or Multi-Stage Models

Ravi and Sinha [67] and Immorlica et al. [68] initiated the study of approximation algorithms for two-stage stochastic combinatorial optimization with recourse. They provided approximation algorithms for several problems in the polynomial-scenario and independent-activation settings. Later, various techniques, such as boosted sampling, stochastic LPs, and SAA, were developed for handling the black-box model. We refer interested readers to [69] for an early survey. In Table 1, we list the current best known approximation ratios for the two-stage stochastic models of several other classic combinatorial problems. Note that here the objective is to minimize the expected overall cost.

Table 1 Best known approximation ratios for two-stage stochastic optimization problems with recourse

Many of the above results have been generalized to k stages for arbitrary constant k. See e.g., [58, 63, 72]. There are also several closely related models proposed in the literature. We review some of them here.

7.4.1 Two-Stage Robust Optimization

Robust optimization takes a more conservative viewpoint on decision making under uncertainty. The goal in robust optimization is to minimize the worst-case cost over all possible scenarios, instead of the average cost over scenarios, as done in stochastic optimization. The setting of a two-stage robust optimization problem is almost the same as its two-stage stochastic counterpart, except that there is no probability distribution over the scenarios, and the objective is to minimize \(w(x) + \max _A f_A(x, y_A)\). Here x is the first-stage action and \(y_A\) is the recourse action for scenario A. This model and its variations have also been studied extensively in the last decade. We refer interested readers to the following papers and the references therein [73–78].

7.4.2 Two-Stage Risk-Averse Stochastic Optimization

The problem setting is exactly the same as in Sect. 6 except that the goal is to minimize the expected total cost subject to the constraint

$$\begin{aligned} \mathop {\Pr }\limits _A[\,\text {2nd stage cost for scenario } A > \theta ] \leqslant \rho . \end{aligned}$$

Here, \(\theta \) and \(\rho \) are input parameters as well. Such constraints are called threshold probability constraints, or chance constraints, and are used to model risk-averse behaviors of the decision maker.

Such chance constraints can be captured using a similar LP relaxation as well. For the black-box model, Swamy [79] proposed an FPTAS for solving such LPs and showed how to round the LP solution for several combinatorial problems.

8 Fixed-Set Problems and Stochastic Knapsack: Poisson Approximation

In this section, we introduce the Poisson approximation technique and apply it to the fixed-set stochastic optimization problems, and the adaptive stochastic knapsack problems. The technique is very useful to handle the distribution of the sum of independent random variables (not necessarily identical), which appears quite frequently in various stochastic optimization problems.

We first illustrate the technique by considering the fixed-set stochastic shortest path in Problem 1.1. The result can be generalized to a wide class of fixed-set problems (see [8]).

Recall that each edge e has a non-negative random length \(\mu _e\). The objective is to find an s-t path P such that \(\Pr [\sum _{e\in P}\mu _e \leqslant {C}]\) is maximized. Without loss of generality, we can assume \({C}=1\) and all \(\mu _e\)s take values in [0, 1]. For ease of notation, for a set S of edges, we write \(\mu (S)=\sum _{e\in S}\mu _e\). Let \(S^*\) denote the optimal feasible set and \(\mathrm {OPT}=\Pr [\mu (S^*)\leqslant 1]\) the optimal value.

Theorem 8.1

For any \(\varepsilon >0\), there is a polynomial time approximation algorithm that finds an s-t path S such that

$$\begin{aligned} \Pr [\mu (S)\leqslant 1+\varepsilon ]\geqslant \Pr [\mu (S^*)\leqslant 1] - \varepsilon , \end{aligned}$$

where \(S^*\) is the optimal path.

We start with a lemma saying that if \(\Pr [\mu (S)\leqslant 1]\) is not negligible, then \({{E}}\left[ \mu (S)\right] \) cannot be very large.

Lemma 8.2

Suppose each edge e has a non-negative random weight \(\mu _e\) taking values from [0, 1]. Then, for any set S of edges, and \(\frac{1}{2} > \varepsilon > 0\), if \(\Pr [\mu (S)\leqslant 1] \geqslant \varepsilon \), then \({{E}}\left[ \mu (S)\right] \leqslant 3/\varepsilon \).

Intuitively, if \({{E}}\left[ \mu (S)\right] \) is very large, then \(\mu (S)\) should be large with high probability (hence \(\Pr [\mu (S)\leqslant 1]\) should be very small). The proof is not difficult and can be found in [8].

If \(\mathrm {OPT}\leqslant \varepsilon \), then there is nothing to do since any feasible solution achieves the desired approximation guarantee. Hence, we focus on the other case where \(\mathrm {OPT}>\varepsilon \). We call an edge e heavy if \({{E}}[\mu _e]>\varepsilon ^{10}\), and light otherwise. By Lemma 8.2, the number of heavy edges in \(S^*\) is at most \(\frac{3}{\varepsilon ^{11}}\).

8.1 Enumerating Heavy Elements

We enumerate all possible sets of heavy edges of size at most \(3/\varepsilon ^{11}\). There are at most \(n^{3/\varepsilon ^{11}}\) such possibilities. Suppose we successfully guess the set of heavy edges in \(S^*\). In the following, we mainly consider this question: given a set H of heavy edges, how do we choose a set L of light edges such that their union S is a feasible solution and \(\Pr [\mu (S)\leqslant 1+\varepsilon ]\) is close to optimal?

8.2 Dealing with Light Elements

Unlike heavy edges, there can be many light edges in \(S^*\). Handling such edges involves two techniques. The first is discretization, which transforms each distribution into one with a constant-size support in [0, 1].

The second key ingredient is the Poisson approximation technique. We apply a theorem of Le Cam [80], which shows that the distribution of the sum of the discretized weights of light edges is very close to a compound Poisson distribution, which can be completely determined by a constant dimensional vector (which we call the signature of L).

Then, we enumerate all possible signatures (there are only a polynomial number of them), and check whether there is an s-t path \(S=L\cup H\) (where H is the set of heavy edges we enumerated, and L is the set of light edges in S) such that the signature of L is the enumerated signature.

8.3 Discretization

We discuss how to discretize the size distributions for edges, using parameter \(\varepsilon \).

We say that edge e realizes to a “large” size if \(\mu _e>\varepsilon ^4\); otherwise we say that e realizes to a “small” size. We use \(\widetilde{\mu }_e\) to denote the size after discretization and \(\widetilde{\pi }_e\) its distribution. The discretization consists of the following two steps:

  1. Small-size region: In the small-size region, \(\widetilde{\mu }_e\) follows a Bernoulli distribution, taking only the values 0 and \(\varepsilon ^4\). The probability values \(\Pr [\widetilde{\mu }_e=0]\) and \(\Pr [\widetilde{\mu }_e=\varepsilon ^4]\) are set such that \( {{E}}[\widetilde{\mu }_e \mid \mu _e \leqslant \varepsilon ^4] = {{E}}[\mu _e \mid \mu _e \leqslant \varepsilon ^4]. \)

  2. Large-size region: If \(\mu _e\) realizes to a large size, we simply discretize it as follows: Let \(\widetilde{\mu }_e = \lfloor \frac{\mu _e}{\varepsilon ^5}\rfloor \varepsilon ^5\) (i.e., we round a large size down to a multiple of \(\varepsilon ^5\)). We denote the set of discretized sizes by \(\mathcal {S}=\{\mathrm {s}_0, \mathrm {s}_1,\cdots , \mathrm {s}_{z-1}\}\), where \(\mathrm {s}_k = k\varepsilon ^5\). Note that \(\mathrm {s}_1=\varepsilon ^5, \cdots , \mathrm {s}_{1/\varepsilon -1}=\varepsilon ^4-\varepsilon ^5\) are also included in \(\mathcal {S}\), even though their probability is 0. It is straightforward to see that \( |\mathcal {S}|= z= O(1/\varepsilon ^5)\). This finishes the description of the discretization; a short code sketch follows the list.
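A per-sample sketch of the two rules (illustrative only; the algorithm itself manipulates distributions, not samples — the Bernoulli trick below preserves the conditional expectation realization by realization):

```python
import random

def discretize_sample(v, eps):
    """Discretize one realized edge weight v in [0, 1]."""
    if v > eps ** 4:
        # Large-size region: round down to a multiple of eps^5.
        return (v // eps ** 5) * eps ** 5
    # Small-size region: Bernoulli on {0, eps^4} with mean v, which
    # preserves E[mu_tilde | small] = E[mu | small].
    return eps ** 4 if random.random() < v / eps ** 4 else 0.0
```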

It is not difficult to show that the behavior of the sum of the discretized distributions is very close to that of the original distributions.Footnote 13 The proof of the following lemma is completely standard and omitted here.

Lemma 8.3

Let S be a set of edges such that \({{E}}\left[ \mu (S)\right] \leqslant 3/\varepsilon \). We have that

  1. \(\Pr [\mu (S) \leqslant 1] \leqslant \Pr [\widetilde{\mu }(S) \leqslant 1 + \varepsilon ] + O(\varepsilon )\);

  2. \(\Pr [\widetilde{\mu }(S) \leqslant 1] \leqslant \Pr [\mu (S) \leqslant 1 + \varepsilon ] + O(\varepsilon )\).

8.4 Poisson Approximation

We use \(\widetilde{\pi }_e\) to denote the distribution of the discretized edge weight \(\widetilde{\mu }_e\), i.e., \(\widetilde{\pi }_e(\mathrm {s}_i)=\Pr [\widetilde{\mu }_e=\mathrm {s}_i]\). For an edge e, we define its signature to be the vector

$$\begin{aligned} \mathrm {Sg}(e)=\bigl (\overline{\pi }_e(\mathrm {s}_1), \overline{\pi }_e(\mathrm {s}_2), \overline{\pi }_e(\mathrm {s}_3), \cdots , \overline{\pi }_e(\mathrm {s}_{z-1})\bigr ), \end{aligned}$$

where \(\overline{\pi }_e(\mathrm {s}) = \left\lfloor \widetilde{\pi }_e(\mathrm {s})\cdot \frac{n}{\varepsilon ^6}\right\rfloor \cdot \frac{\varepsilon ^6}{n}\) for all nonzero discretized size \(\mathrm {s}\in \mathcal {S}\setminus \{0\} =\{ \mathrm {s}_1, \mathrm {s}_2, \cdots , \mathrm {s}_{z-1} \}\). For a set S of edges, its signature is defined to be the sum of the signatures of all edges in S, i.e.,

$$\begin{aligned} \mathrm {Sg}(S)=\sum _{e\in S}\mathrm {Sg}(e). \end{aligned}$$

We use \(\mathrm {Sg}(S)_k\) to denote the kth coordinate of \(\mathrm {Sg}(S)\). By Lemma 8.2, \(\sum _{k=1}^{z-1}\mathrm {Sg}(S)_k \cdot \mathrm {s}_k=\sum _{k=1/\varepsilon }^{z-1}\mathrm {Sg}(S)_k \cdot \mathrm {s}_k \leqslant 3/\varepsilon \). Thus \(\mathrm {Sg}(S)_k\leqslant 3/\varepsilon ^5\) for all k. Therefore, the number of possible signatures is bounded by \(\left( 3n/\varepsilon ^{11}\right) ^{|\mathcal {S}|-1}\), which is polynomial in n.
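A sketch of the signature computation; `pi_tilde[e][k]` is assumed to hold \(\Pr [\widetilde{\mu }_e=\mathrm {s}_k]\), and we keep each coordinate as an integer multiple of \(\varepsilon ^6/n\) so that signatures can be compared exactly:

```python
def signature(S, pi_tilde, n, eps, z):
    """Sg(S) as a tuple of integers: coordinate k counts multiples of
    eps^6 / n, i.e., floor(pi_tilde_e(s_k) * n / eps^6) summed over e in S."""
    unit = eps ** 6 / n
    sg = [0] * (z - 1)
    for e in S:
        for k in range(1, z):   # nonzero sizes s_1, ..., s_{z-1}
            sg[k - 1] += int(pi_tilde[e][k] // unit)
    return tuple(sg)
```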

We will show shortly that it suffices to enumerate all possible signatures for the light edges. For this purpose, we need a notion of distance between two distributions, the total variation distance (also called the statistical distance), defined as follows (for discrete distributions):

$$\begin{aligned} \Delta \Bigl (X, Y\Bigr ) =\sum _{k}\Bigl |\Pr [X=k]-\Pr [Y=k]\Bigr |. \end{aligned}$$

Obviously, if \(\Delta (X, Y)=0\), the two distributions are identical. The following lemma shows that if the signatures of two sets are the same, then the total variation distance between their distributions is very small.

Lemma 8.4

Let \(S_1, S_2\) be two sets of light edges such that \(\mathrm {Sg}(S_1)=\mathrm {Sg}(S_2)\) and \({{E}}\left[ \widetilde{\mu }(S_1)\right] \leqslant 3/\varepsilon , {{E}}\left[ \widetilde{\mu }(S_2)\right] \leqslant 3/\varepsilon \). Then, the total variation distance between \(\widetilde{\mu }(S_1)\) and \(\widetilde{\mu }(S_2)\) satisfies

$$\begin{aligned} \Delta \left( \widetilde{\mu }(S_1), \widetilde{\mu }(S_2)\right) = \sum _\mathrm {s}\, \left| \,\Pr \left[ \widetilde{\mu }(S_1) = \mathrm {s}\right] -\Pr \left[ \widetilde{\mu }(S_2) = \mathrm {s}\right] \, \right| = O(\varepsilon ). \end{aligned}$$

The following Poisson approximation theorem by Le Cam [80], rephrased in our language, is essential for proving Lemma 8.4. Suppose we are given a K-dimensional vector \(V=(V_1,\cdots , V_K)\). Let \(\lambda =\sum _{i=1}^K V_i\). We say a random variable Y follows the compound Poisson distribution corresponding to V if it is distributed as \(Y = \sum _{j=1}^{N}Y_j\), where N follows Poisson distribution with expected value \(\lambda \):

$$\begin{aligned} \Pr (N{=}k)= \frac{\lambda ^k e^{-\lambda }}{k!},\quad \text {for }k\geqslant 0, \end{aligned}$$

(denoted as \(N\sim \mathrm {Pois}(\lambda )\) ) and \(Y_1,\cdots , Y_N\) are i.i.d. random variables with \(\Pr [Y_j=0]=0\) and \(\Pr [Y_j = k] = V_k / \lambda \) for \(k\in \{1,\cdots , K\}\) and \(j\in \{1, \cdots , N\}\).
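Sampling from this compound Poisson distribution is straightforward, which also makes Lemma 8.5 easy to check empirically (a sketch with numpy):

```python
import numpy as np

def sample_compound_poisson(V):
    """V: the vector (V_1, ..., V_K) as above; returns one sample of Y."""
    rng = np.random.default_rng()
    V = np.asarray(V, dtype=float)
    lam = V.sum()
    n = rng.poisson(lam)                 # N ~ Pois(lambda)
    if n == 0:
        return 0
    # Each Y_j takes value j with probability V_j / lambda, j = 1..K.
    return int(rng.choice(np.arange(1, len(V) + 1), size=n, p=V / lam).sum())
```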

Lemma 8.5

[80] Let \(X_1, X_2, \cdots \) be independent random variables taking integer values in \(\{0,1,\cdots ,K\}\), and let \(X = \sum X_i\). Let \(\pi _i=\Pr [X_i\ne 0]\) and \(V=(V_1,\cdots , V_K)\), where \(V_k=\sum _{i} \Pr [X_i=k]\). Suppose \(\lambda =\sum _i \pi _i=\sum _k V_k<\infty \). Let Y be the compound Poisson distribution corresponding to vector V. Then, the total variation distance between X and Y can be bounded as follows:

$$\begin{aligned} \Delta \Bigl (X, Y\Bigr ) =\sum _{k\geqslant 0}\Bigl |\Pr [X=k]-\Pr [Y=k]\Bigr |\leqslant 2\sum _i \pi _i^2. \end{aligned}$$

Proof of Lemma 8.4

For an edge e, we let \(\overline{\mu }_e\) be the random variable with \(\Pr \left[ \overline{\mu }_e = \mathrm {s}\right] = \overline{\pi }_e(\mathrm {s})\) for \(\mathrm {s}= \mathrm {s}_1, \mathrm {s}_2, \cdots , \mathrm {s}_{z-1}\), and \(\overline{\mu }_e = 0\) with the rest of the probability mass. Similarly, we use \(\overline{\mu }(S)\) to denote \(\sum _{e\in S}\overline{\mu }_e\) for a set S of edges. By the definition of \(\overline{\mu }_e\), we have \(\varDelta \left( \overline{\mu }_e, \widetilde{\mu }_e\right) \leqslant \varepsilon /n\) for any edge e. Since \(S_1\) and \(S_2\) each contain at most n edges, we can show that

$$\begin{aligned} \Delta \left( \overline{\mu }(S_1), \widetilde{\mu }(S_1)\right) \leqslant \varepsilon \quad \quad \text { and }\quad \quad \Delta \left( \overline{\mu }(S_2), \widetilde{\mu }(S_2)\right) \leqslant \varepsilon . \end{aligned}$$

First, for any S such that \({{E}}\left[ \widetilde{\mu }(S)\right] \leqslant 3/\varepsilon \), we can see that \( \sum _{e \in S}\Pr \Bigl [\widetilde{\mu }_e \ne 0\Bigr ] \leqslant {{E}}\Bigl [\widetilde{\mu }(S)\Bigr ] / \varepsilon ^4 \leqslant 3/\varepsilon ^5 \) (every nonzero discretized size is at least \(\varepsilon ^4\)). If we apply Lemma 8.5 to both \(\overline{\mu }(S_1)\) and \(\overline{\mu }(S_2)\), we can see they both correspond to the same compound Poisson distribution, say Y, since their signatures are the same. Moreover, since the total variation distance is a metric, we have that

$$\begin{aligned} \Delta \Bigl (\widetilde{\mu }(S_1), \widetilde{\mu }(S_2)\Bigr )\leqslant & {} \Delta \Bigl (\widetilde{\mu }(S_1), \overline{\mu }(S_1)\Bigr ) + \Delta \Bigl (\overline{\mu }(S_1), Y\Bigr ) + \Delta \Bigl (Y, \overline{\mu }(S_2)\Bigr )\\&+ \Delta \Bigl (\overline{\mu }(S_2), \widetilde{\mu }(S_2)\Bigr ) \\\leqslant & {} \varepsilon + 2\sum _{e \in S_1}\Bigl (\Pr \Bigl [\overline{\mu }_e \ne 0\Bigr ]\Bigr )^2 + 2\sum _{e \in S_2}\Bigl (\Pr \Bigl [\overline{\mu }_e \ne 0\Bigr ]\Bigr )^2 + \varepsilon \\= & {} O(\varepsilon ). \end{aligned}$$

This finishes the proof of the lemma.

8.5 Algorithm for Fixed-Set Problems

Now, we present our approximation algorithm for the fixed-set stochastic shortest path problem.

(Algorithm pseudocode: figure e in the original.)
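Since the figure is not reproduced here, the following skeleton is a hedged reconstruction of the overall enumeration; `find_path_with_signature` stands for the pseudopolynomial dynamic program of step (6) below and `estimate_prob` for the FPTAS of [83], both passed in as callables (they are assumptions of this sketch, not actual library routines):

```python
import itertools

def fixed_set_shortest_path(edges, eps, signatures, is_heavy,
                            find_path_with_signature, estimate_prob):
    heavy_edges = [e for e in edges if is_heavy(e)]
    best, best_val = None, -1.0
    bound = min(len(heavy_edges), int(3 / eps ** 11))
    for k in range(bound + 1):                  # guess the heavy part H
        for H in itertools.combinations(heavy_edges, k):
            for sg in signatures:               # guess the light signature
                S = find_path_with_signature(set(H), sg)
                if S is not None:
                    val = estimate_prob(S)      # ~ Pr[mu(S) <= 1 + eps]
                    if val > best_val:
                        best, best_val = S, val
    return best
```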

In step (6), we can use the pseudopolynomial time algorithm for the exact version of the shortest path problem to find a set L with signature exactly equal to \(\mathrm {Sg}\). This can be done by standard dynamic programming.Footnote 14 Since \(\mathrm {Sg}\) is a vector with \(O(\varepsilon ^{-5})\) coordinates and the value of each coordinate is bounded by O(n), it can be encoded by an integer of size at most \(n^{{{\mathrm{poly}}}(1/\varepsilon )}\). Thus the pseudopolynomial time algorithm actually runs in \({{\mathrm{poly}}}(n, n^{{{\mathrm{poly}}}(1/\varepsilon )})\) time, which is polynomial for constant \(\varepsilon \). We also need to compute the value \(\Pr [\mu (S)\leqslant 1+\varepsilon ]\). This problem is in fact #P-hard [82]. However, we can use the FPTAS developed in [83] to obtain a \((1\pm \varepsilon )\)-approximate estimate, which suffices for our purpose. There are a polynomial number of combinations of heavy edges, so it is not hard to see that the algorithm runs in \(n^{{{\mathrm{poly}}}(1/\varepsilon )}\) time overall.

It is fairly straightforward to show that the algorithm achieves the guarantee stated in Theorem 8.1. Suppose we have guessed the heavy edges \(H^*\) of the optimal solution \(S^*\), and the signature of the light edges of \(S^*=H^*\cup L^*\). Suppose the path we found is \(S=H^*\cup L\). Using Lemma 8.3, we can see that the original distribution \(\mu (L)\) (resp. \(\mu (L^*)\)) is close to the discretized distribution \(\widetilde{\mu }(L)\) (resp. \(\widetilde{\mu }(L^*)\)). By Lemma 8.4, \(\widetilde{\mu }(L)\) is close to \(\widetilde{\mu }(L^*)\). Hence, the distribution of \(\mu (L)\) is close to that of \(\mu (L^*)\), and S behaves very similarly to \(S^*\).

Astute readers may have noticed that we did not use any special property of the combinatorial structure of the shortest path problem except a pseudopolynomial time algorithm for the exact version. In fact, we can generalize the above result to all combinatorial optimization problems which admit a pseudopolynomial time algorithm for the exact version, including spanning tree, k-median on trees, and knapsack. Moreover, instead of maximizing \(\Pr [\mu (S)\leqslant 1]\), we can consider the more general expected utility maximization problem. See the detailed results in [7, 8].

8.6 Poisson Approximation for Adaptive Stochastic Knapsack

The Poisson approximation technique can also be used to obtain a bi-criterion PTAS for the adaptive stochastic knapsack problem and its generalizations. This is somewhat surprising since it is not even clear at first sight where to apply the Poisson approximation technique, which can only handle the sum of a fixed set of random variables.

We need an important notion, called block-adaptive policies, introduced by Bhalgat et al. [21]. In a block-adaptive policy, instead of inserting the items one at a time, we insert a subset (a.k.a. block) of items at a time. In terms of the decision tree of a policy, each node in the tree corresponds to the insertion of a block of items.

A remarkable property proved in [21] is that there exists a block-adaptive policy whose decision tree has only O(1) nodes and that approximates the optimal policy, modulo an \(\varepsilon \) fraction of the profit and an \(\varepsilon \) fraction of the knapsack capacity. The result was further generalized to stochastic knapsack with arbitrary subset constraints [8], which allows us to handle cancellation of jobsFootnote 15 and precedence constraints. Hence, it suffices to consider only block-adaptive policies. Since there are only a constant number of nodes in the decision tree, we can enumerate all distinct topologies of the decision tree. Now, for each node of the tree (which corresponds to a block of items), we can use the Poisson approximation technique to approximate the distribution of the sum of the random variables in the block. More concretely, after fixing the topology of the decision tree, we can enumerate the signatures of all blocks in polynomial time. Then, we can use a dynamic program to search for a block-adaptive policy that matches the enumerated signatures. In the analysis, we need to show that if two block-adaptive policies have the same tree topology and the same signature for every node of the tree, then the two policies must behave similarly (in particular, obtain almost the same expected profit).

Using the above idea, we can obtain a policy which can produce a profit at least \((1-\varepsilon )\mathrm {OPT}\) using \((1+\varepsilon )C\) capacity (i.e., a bi-criterion PTAS), for the adaptive stochastic knapsack problem, even when job cancellation is allowed. Even though the above idea is clean, the details are quite technical, and we refer interested readers to [8].

8.7 Related Work

8.7.1 Expected Utility Maximization and Threshold Probability Maximization

In Sect. 8, we consider the fixed-set stochastic shortest path problem, where our goal is to maximize the threshold probability \(\Pr [\mu (S)\leqslant 1]\). This is a very special case of the following more general expected utility maximization problem.

Problem 8.6

(Expected Utility Maximization) Suppose we have n elements, each having a non-negative random weight \(\mu _e\). We would like to choose a subset S, subject to some combinatorial constraint. Consider a utility function \(\mathcal {U}:{\mathbb {R}}^+\rightarrow {\mathbb {R}}^+\). If the cost of the solution is x, we obtain a utility value \(\mathcal {U}(x)\). The goal is to find a solution S which maximizes the expected utility \({{E}}[\mathcal {U}(\mu (S))]\), where \(\mu (S)\) is the (random) cost of the solution S, i.e., \(\mu (S)=\sum _{e\in S}\mu _e\).

Consider the utility function \(\mathcal {U}(x)=1\) for \(x\in [0,1]\) and \(\mathcal {U}(x)=0\) for \(x>1\). It is easy to see that \({{E}}[\mathcal {U}(\mu (S))]=\Pr [\mu (S)\leqslant 1]\). Hence, maximizing the expected utility is equivalent to maximizing the threshold probability. Expected utility is well known to be very versatile in expressing diverse risk-averse or risk-prone behaviors. For example, we can use an increasing concave utility function to capture risk-averse behaviors. The theory was axiomatized by von Neumann and Morgenstern in the 1940s [84, 85] (known as von Neumann–Morgenstern expected utility theory in the economics literature).

Li and Deshpande [7] first studied the expected utility maximization problem in the context of stochastic combinatorial optimization. They considered general combinatorial constraints and utility functions. Specifically, they can handle the class of combinatorial constraints that admit pseudopolynomial time algorithms; examples include spanning tree, matching, simple path, matroid, and knapsack. We can also obtain Theorem 8.1 as a corollary.

Their approach is very different, and the high-level idea is simple. First observe that for an exponential utility function \(\mathcal {U}(x)=\alpha ^{x}\), for any fixed \(\alpha \in {\mathbb {C}}\), the problem essentially reduces to a deterministic optimization problem. Indeed, fix an arbitrary solution S and a number \(\alpha \). Due to the independence of the elements, we can see that \( {E}[\mathcal {U}(\mu (S))]={E}[\alpha ^{\mu (S)}]={E}[\alpha ^{\sum _{e\in S}\mu _e}]={E}[\prod _{e\in S} \alpha ^{\mu _{e}}] =\prod _{e\in S}{E}[\alpha ^{\mu _{e}}]. \) Now, consider a more general utility function \(\mathcal {U}\); for simplicity, assume \(\mathcal {U}\) is essentially supported on a bounded interval. We try to approximate \(\mathcal {U}\) by a short exponential sum: more concretely, we want an exponential sum \({\mathcal C}(x)=\sum _{i=1}^L c_{i} \varPhi _i^x\) with L a constant, which satisfies \( \Bigl | \mathcal {U}(x)-{\mathcal C}(x) \Bigr |\leqslant \varepsilon , \ \ \forall x\geqslant 0. \) If we can do so, instead of directly optimizing \({E}[\mathcal {U}(\mu (S))]\), we can optimize \({E}[{\mathcal C}(\mu (S))]\), which can be done using dynamic programming. What remains is to approximate a function by a short exponential sum. In general, this is not possible for an arbitrary function over the entire \(\mathbb {R}\); however, we only need it on a bounded interval, where a theorem of Jackson from approximation theory, combined with some other tricks, yields such an approximation. See [7] for the details.
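The independence identity at the heart of this reduction is easy to check numerically (a sketch with uniform edge weights; recall \(E[\alpha ^U]=(\alpha -1)/\ln \alpha \) for \(U\sim \mathrm{Uniform}[0,1]\)):

```python
import math
import random

random.seed(0)
alpha, n, trials = 0.5, 4, 200_000
# Monte-Carlo estimate of E[alpha^{mu(S)}] for S with n independent
# Uniform[0,1] edge weights.
lhs = sum(alpha ** sum(random.random() for _ in range(n))
          for _ in range(trials)) / trials
# Product of per-edge moments: E[alpha^U] = (alpha - 1) / ln(alpha).
rhs = ((alpha - 1) / math.log(alpha)) ** n
print(lhs, rhs)   # the two agree up to Monte-Carlo error (~0.27 each)
```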

For increasing concave utility functions, Bhalgat and Khanna [86] obtained the first PTAS, assuming the exact version of the deterministic counterpart can be solved in pseudopolynomial time. They proved a utility equivalence theorem and showed that it suffices to enumerate only a polynomial number of “configurations.” In the full version of [7], the authors reproduced the same result for increasing concave utility functions using the function approximation paradigm we just described.

8.7.2 Fixed-Set Stochastic Shortest Path

Nikolova et al. [11] studied the fixed-set stochastic shortest path problem for Gaussian, Poisson, and exponential distributions. For Gaussian edges, it is easy to see that the length of a path is also Gaussian, with mean (resp. variance) equal to the sum of the means (resp. variances) of the individual edges. We can thus view the problem as a two-dimensional optimization problem, one dimension for the mean and the other for the variance. Imagine the mean-variance plane: each point corresponds to the (mean, variance) pair of a path. Consider the convex hull H of these points (they call it the path polytope). It is not hard to show that maximizing the threshold probability is equivalent to maximizing

$$\begin{aligned} \left( 1-\sum _{i\in S} {{E}}[\mu _i]\right) \Bigg /\sqrt{\sum _{i\in S}{{\text {Var}}[\mu _i]}}, \end{aligned}$$

which is quasi-convex on the path polytope. As a consequence, this quantity, and thus the threshold probability, is maximized at a vertex of the path polytope (under a certain technical condition). The vertices of H can be enumerated in \(O(n^{\log n})\) time due to a result on parametric shortest paths by Carstensen [87]. Hence, by enumerating the vertices of H, they obtained an exact \(O(n^{\log n})\) time algorithm for maximizing the probability that the length of the path is at most 1, i.e., \({{\text {Pr}}}(\mu (S) \leqslant 1)\), assuming all edges are normally distributed and there is a path with mean at most 1. Later, Nikolova [10] extended the result to an FPTAS, under the same Gaussian assumptions, for any problem whose deterministic version can be solved in polynomial time.

8.7.3 Fixed-Set Stochastic Knapsack

In the fixed-set stochastic knapsack problem, we are given a knapsack of capacity 1, and a set of items, each with a random size \(s_{i}\) and a deterministic profit \(v_{i}\), and an overflow probability \(\gamma \). We are asked to pick a subset S of items such that

$$\begin{aligned} \Pr \left( \sum _{i\in S}s_{i}\geqslant 1\right) \leqslant \gamma \end{aligned}$$

and the total profit \(\sum _{i\in S}v_{i}\) is maximized.

Kleinberg et al. [82] first considered the fixed-set stochastic knapsack problem with Bernoulli-type distributions and provided a polynomial time \(O(\log 1/\gamma )\)-approximation. For item sizes with exponential distributions, Goel and Indyk [88] provided a bi-criterion PTAS, and for Bernoulli-distributed items they gave a quasi-polynomial approximation scheme. Goyal and Ravi [89] showed a PTAS for Gaussian-distributed sizes. Bhalgat et al. [21] applied the discretization technique to both the adaptive and the fixed-set stochastic knapsack problems. For the latter, they provided a bi-criterion PTAS: for any constant \(\varepsilon >0\), there is a polynomial time algorithm that finds a solution S with profit at least the optimum and \({{\text {Pr}}}(\sum _{i\in S}x_{i}\geqslant 1+\varepsilon )\leqslant \gamma +\varepsilon \). Using the result for expected utility maximization, the same result can also be obtained via the utility function approximation approach [7] or the Poisson approximation approach [8], with somewhat simpler proofs and better running times.

For both fixed-set stochastic shortest path and fixed-set stochastic knapsack, the current best approximations are bi-criterion additive PTASes. So we have the following obvious open questions.

Open Question 3 Let \(S^*\) be the optimal solution for the fixed-set stochastic shortest path problem. For any \(\varepsilon >0\), is there a polynomial time approximation algorithm that finds an s-t path S such that \( \Pr [\mu (S)\leqslant 1]\geqslant \Pr [\mu (S^*)\leqslant 1] - \varepsilon \)?

Open Question 4 Let \(S^*\) be the optimal solution for the fixed-set stochastic knapsack problem. For any \(\varepsilon >0\), is there a polynomial time approximation algorithm that finds a set of items S with profit at least the optimum and \({{\text {Pr}}}(\sum _{i\in S}x_{i}\geqslant 1)\leqslant \gamma +\varepsilon \)?

We finally note that a recent result by Daskalakis et al. [90] is closely related to the above problems. Their problem can be stated in the following abstract form: Given a random vector X generated by a known product distribution over \(\{0, 1\}^n\) and a threshold value \(0\leqslant \theta \leqslant 1\), output a non-negative vector \(w\in {\mathbb {R}}^n\) with \(\Vert w\Vert _1=1\) which maximizes \(\Pr [w\cdot X\geqslant \theta ]\). They provided an additive PTAS under a certain technical condition. Their technique borrows ideas from the study of linear threshold functions in complexity theory and makes use of the Berry–Esseen theorem (a quantitative version of the central limit theorem). Removing their technical condition is also an interesting open problem.

9 Other Stochastic Models

9.1 Stochastic Universal Approximation

Suppose we want to distribute a file from the source s to a set T of nodes in a network G. If we know T, we can just compute a Steiner tree connecting \(T\cup \{s\}\). Karger and Minkoff [91] considered the maybecast problem, which is another stochastic version of the Steiner tree problem.

Problem 9.1

(Maybecast Problem) Each node i chooses to contact the source (we say i is active) with probability \(p_i\) independently. We need to fix a path \(P_i\) to s for each node i. If the node i is active, all edges in \(P_i\) become active. Our goal is to minimize the expected total number of active edges.Footnote 16

Karger and Minkoff [91] showed that the shortest path tree heuristic can be very bad (an \(\varOmega (\sqrt{n})\) factor worse than the optimum) and also obtained a constant factor approximation algorithm for the problem, by reducing it to the r-gathering problem, a variant of the facility location problem with a capacity lower bound for each open facility.

The maybecast problem is closely related to the notion of universal approximation introduced in [92]. Take the universal Steiner tree problem as an example. In this problem, we still need to fix a path \(P_i\) to s for each node i. However, we do not assume each node becomes active in a probabilistic manner. Instead, we take a worst-case (or robust approximation) perspective, bounding the approximation ratio over every subset \(S\subseteq V\). More precisely, we want to minimize

$$\begin{aligned} \max _{S\subseteq V}\frac{\mathrm {cost}(\cup _{i\in S} P_i)}{\mathrm {cost}(\mathrm {OPT}(S))}, \end{aligned}$$

where \(\mathrm {OPT}(S)\) is the optimal Steiner tree connecting \(S\cup \{s\}\). Since this line of research does not involve any stochasticity, we do not go into the details and refer interested readers to [92, 93] and the references therein.

9.2 Secretary Problem

The secretary problem is a classical stochastic sequential decision problem, introduced by Dynkin in the 1960s [94]. The basic version is very simple to state, yet the result is stunning at first glance, which makes the problem quite popular even in the public media. Suppose we would like to hire the best secretary out of n applicants. The applicants are interviewed one by one in random order. After the interview of each applicant, a decision whether to hire the applicant must be made immediately and irrevocably. During the interview, we know the rank of the applicant among all applicants interviewed so far. The goal is to maximize the probability that we hire the best one. There is a very simple strategy that guarantees this probability is at least 1/e, irrespective of how large n is. It works as follows: first interview the first n/e applicants without hiring any of them; then hire the first applicant who is the best among all interviewed so far. The ratio 1/e is tight for large n.
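The 1/e strategy is a pleasant few-line simulation (rank 0 denotes the best applicant):

```python
import math
import random

def secretary(n):
    ranks = list(range(n))            # 0 is the best applicant
    random.shuffle(ranks)             # applicants arrive in random order
    cutoff = int(n / math.e)
    best_seen = min(ranks[:cutoff], default=n)
    for r in ranks[cutoff:]:
        if r < best_seen:             # first applicant beating all so far
            return r == 0             # success iff we hired the best
    return False                      # never hired anyone

trials = 100_000
print(sum(secretary(50) for _ in range(trials)) / trials)  # about 1/e = 0.368
```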

Kleinberg [95] studied a generalization of the problem in which we want to select k candidates and maximize their sum. He provided an algorithm that achieves a competitive ratio of \(1-O(\sqrt{1/k})\), which is asymptotically optimal. Babaioff et al. [96] studied a significant generalization, called the matroid secretary problem, in which the selected set must be an independent set of a given matroid. The problem has attracted a lot of attention (see e.g., [97–102]). For several special types of matroids, constant approximations are known. However, for a general matroid, the current best approximation is \(O(\log \log r)\) [99, 101], where r is the rank of the given matroid.

Open Question 5 Is there a constant factor approximation algorithm for the matroid secretary problem?

The secretary problem has important applications in mechanism design, and plays a role similar to the prophet inequalities we mentioned in Sect. 5.3. However, the two are very different from a technical perspective. For example, the aforementioned matroid prophet inequality has a 2-approximation [55].

9.3 Stochastic Multi-Armed Bandit

Multi-armed bandit problems nowadays come in many different variants, too many to survey here. They are mostly sequential decision-making problems featuring an exploration-exploitation trade-off. The most basic version is the following: We are given n arms, where the i-th arm is associated with an unknown reward distribution supported on [0, 1] with mean \(\theta _i\). Pulling an arm yields an i.i.d. sample from its distribution. There are T rounds, and in each round we can choose one arm to pull. Our goal is to minimize the regret, defined as the difference between the total reward obtained by the algorithm over the T rounds and the reward we could obtain by always playing the best arm. For this basic problem and its numerous extensions, one can obtain o(T) regret.Footnote 17 Instead of playing one arm, we may play a combinatorial set of arms in each round; this is the combinatorial bandit problem. There are numerous other extensions and variations. We refer interested readers to [103, 104] for more comprehensive treatments.
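As one concrete example of an o(T)-regret strategy, here is a minimal sketch of the classical UCB1 index policy (a standard algorithm for this basic model, not tied to any particular reference above):

```python
import math
import random

def ucb1(arms, T):
    """arms: zero-argument samplers with rewards in [0, 1]."""
    n, mean, total = [0] * len(arms), [0.0] * len(arms), 0.0
    for t in range(1, T + 1):
        if t <= len(arms):
            i = t - 1     # pull every arm once first
        else:             # then pull the arm with the largest UCB index
            i = max(range(len(arms)),
                    key=lambda j: mean[j] + math.sqrt(2 * math.log(t) / n[j]))
        r = arms[i]()
        n[i] += 1
        mean[i] += (r - mean[i]) / n[i]   # running average of rewards
        total += r
    return total

arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
print(ucb1(arms, 10_000) / 10_000)        # approaches 0.7 as T grows
```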

9.3.1 Markovian Bandit

Another extremely important class of stochastic multi-armed bandit problems we have not mentioned yet is the class of Markovian bandit problems. Here, the reward of each arm is governed by a Markov chain, and the chains and transition probabilities are known. Upon playing an arm, the state of that arm changes according to the transition probabilities of the corresponding Markov chain. We would like to design an adaptive policy for playing the arms that maximizes the expected reward. A celebrated result in this domain is the Gittins index [105], due to John Gittins, who obtained a polynomial time solution for maximizing the expected discounted reward. In fact, Gittins’ solution is a very simple-looking greedy rule: it assigns each state of each Markov chain a fixed number (the Gittins index), and always plays the arm with the currently largest index. Several simpler proofs of the result were discovered later (see e.g., [106]) and are covered in several textbooks. Bertsimas and Niño-Mora [107] provided a unified view of several problems that admit similar efficient greedy-like algorithms, via the notion of extended polymatroids. However, many variants are PSPACE-hard, as shown by Papadimitriou and Tsitsiklis [108]. There is also a body of work studying approximation algorithms with provable approximation factors for these problems; see [23, 109–112] and the references therein. In fact, some bandit models generalize the stochastic knapsack problem we considered in the beginning (see e.g., [23, 112]).

9.3.2 Bandit-Arm Selection

Now, we briefly review some problems and results related to Problem 1.4, which we introduced in Sect. 1. Such problems are also called pure exploration multi-armed bandit problems. The most basic version is the best arm identification problem, in which the goal is to select the single best arm. Bechhofer [113] first formulated the problem for Gaussian arms in 1954. There has been a resurgence of interest in the problem in the last decade [114–120]. Mannor and Tsitsiklis [120] showed that any algorithm that returns the correct answer with probability at least \(1-\delta \) requires \(\varOmega \left( \sum \nolimits _{i=2}^{n} \Delta _{i}^{-2} \ln \delta ^{-1}\right) \) samples in expectation on any instance, where \(\varDelta _i\) is the difference between the mean of the best arm and that of the ith arm. Chen and Li [114] obtained the current best upper bound

$$\begin{aligned} O\left( \Delta _{2}^{-2}\ln \ln \Delta _{2}^{-1}+ \sum _{i=2}^{n} \Delta _{i}^{-2} \ln \delta ^{-1}+\sum _{i=2}^{n} \Delta _{i}^{-2}\ln \ln \min (n,\Delta _{i}^{-1}) \right) . \end{aligned}$$

The above bound is worst-case optimal since there is a matching worst-case lower bound for each of the three terms. In fact, the first term is nearly instance optimalFootnote 18 (see [114] for the details), hence not improvable. The second term is instance optimal due to the lower bound in [120]. Only the third term is worst-case optimal.

Open Question 6 Obtain a nearly instance optimal algorithm for the best arm identification problem. We conjecture that the optimal bound is of the form \(\Delta _{2}^{-2}\ln \ln \Delta _{2}^{-1} + L(\mathrm{I})\), where \(L(\mathrm{I})\) is an instance-wise lower bound.

The worst-case sample complexity in the Probably Approximately Correct (PAC) setting is also well studied. In the PAC setting, the algorithm should return an arm whose mean is at most \(\varepsilon \) worse than the optimal one, with probability at least \(1-\delta \). There are matching (worst-case) lower and upper bounds of \(\varTheta (n \ln \delta ^{-1}/\varepsilon ^2)\) [115, 120]. The generalization to selecting the top-k arms has also been studied extensively in the last few years (see e.g., [116, 118, 121–127]). Recently, Chen et al. [17] initiated the study of the combinatorial pure exploration problem, which generalizes the cardinality constraint to more general combinatorial constraints (e.g., matroids).

9.4 Clustering Stochastic Points

There are two popular models for stochastic points in a metric space. In the existential uncertainty model, each node v is present at a fixed point with probability \(p_v\), independent of the other points. In the locational uncertainty model, the location of a node follows some given distribution. Cormode and McGregor [128] considered the k-center clustering problemFootnote 19 for the locational model in a finite metric graph, and gave a bi-criterion constant approximation. The result was later improved to a true constant factor approximation by Guha and Munagala [129]. Munteanu [130] studied the 1-center problem (a.k.a. the minimum enclosing ball problem) for stochastic points in fixed-dimensional Euclidean space and provided a PTAS. Recently, Huang et al. [131] obtained a PTAS for the more general j-flat center problem (where the center is a j-flat, i.e., a j-dimensional affine subspace) for stochastic points, using an extension to the stochastic setting of the powerful geometric notion of \(\varepsilon \)-kernel coresets, introduced by Agarwal et al. [132]. In the (k, j)-projective clustering problem, we are asked to choose k j-flats such that the maximum distance from any point to its closest j-flat is minimized. Extending the above results to other clustering problems (e.g., the k-center problem in \({\mathbb {R}}^d\) with \(k=O(1), d=O(1)\)) is an interesting future direction.

10 Concluding Remarks

Due to the limit of space and the authors’ knowledge, this survey is by no means comprehensive. We apologize in advance for the omission of any important results.