1 Introduction and Summary

The classification of general techniques is an important research program within the field of approximation algorithms. What abstractions are useful for capturing a wide variety of problems and analyses? What are the scopes of, and the relationships between, the various algorithm-design techniques such as the primal-dual method, the local-ratio method [9], and randomized rounding? Which problems admit optimal and fast greedy approximation algorithms [11, 12, 26]? What general techniques are useful for designing online algorithms? What is the role of locality among constraints and variables [9, 46, 53]? We touch on these topics, exploring a simple greedy algorithm for a general class of covering problems. The algorithm has approximation ratio Δ provided each covering constraint in the instance constrains only Δ variables.

Throughout the paper, \(\bar{\mathbb{R}}_{\ge0}\) denotes ℝ≥0∪{∞} and \(\bar{\mathbb{Z}}_{\ge0}\) denotes ℤ≥0∪{∞}.

The conference version of this paper is [44].

Definition 1

(Submodular-Cost Covering)

An instance is a triple \((c,\mathcal{C},U)\), where

  • The cost function \(c:\bar{\mathbb{R}}_{\ge0}^{n}\rightarrow\bar {\mathbb{R}}_{\ge0}\) is submodular,Footnote 1 continuous, and non-decreasing.

  • The constraint set \(\mathcal{C}\subseteq2^{\bar{\mathbb {R}}_{\ge0}^{n}}\) is a collection of covering constraints, where each constraint \(S\in\mathcal{C}\) is a subset of \(\bar{\mathbb{R}}_{\ge0}^{n}\) that is closed upwardsFootnote 2 and under limit. We stress that each S may be non-convex.

  • For each j∈[n], the domain \(U_j\) (for variable \(x_j\)) is any subset of \(\bar{\mathbb{R}}_{\ge0}\) that is closed under limit.

The problem is to find \(x\in\bar{\mathbb{R}}_{\ge0}^{n}\), minimizing c(x) subject to \(x_j\in U_j\) (∀j∈[n]) and \(x\in S\) (\(\forall S\in\mathcal{C}\)).

The definition assumes the objective function c(x) is defined over all \(x\in\bar{\mathbb{R}}_{\ge0}^{n}\), even though the solution space is constrained to xU. This is unnatural, but any c that is properly defined on U extends appropriatelyFootnote 3 to \(\bar{\mathbb{R}}_{\ge0}^{n}\). In the cases discussed here c extends naturally to \(\bar{\mathbb{R}}_{\ge0}^{n}\) and this issue does not arise.

We call this problem Submodular-Cost Covering.Footnote 4

For intuition, consider the well-known Facility Location problem. An instance is specified by a collection of customers, a collection of facilities, an opening cost \(f_j\ge0\) for each facility, and an assignment cost \(d_{ij}\ge0\) for each customer i and facility \(j\in N(i)\). The problem is to open a subset \(\mathcal{F}\) of the facilities so as to minimize the cost to open facilities in \(\mathcal{F}\) (that is, \(\sum_{j\in\mathcal{F}} f_{j}\)) plus the cost for each customer to reach its nearest open, admissible facility (that is, \(\sum_{i} \min\{d_{ij}\mid j\in\mathcal{F}\}\)). This is equivalent to the Submodular-Cost Covering instance \((c,\mathcal{C},U)\), with

  • a variable \(x_{ij}\) for each customer i and facility j, with domain \(U_{ij}=\{0,1\}\),

  • for each customer i, the (non-convex) constraint \(\max_{j\in N(i)} x_{ij}\ge1\) (the customer is assigned a facility),

  • and the (submodular) cost \(c(x)=\sum_{j} f_{j}\max_{i} x_{ij}+\sum_{i,j} d_{ij}x_{ij}\) (opening cost plus assignment cost).
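As a concrete sketch of the encoding above (with hypothetical toy data; the paper itself gives no code), the cost function and per-customer covering constraints can be written directly:

```python
# Hypothetical toy instance: 2 customers, 2 facilities.
f = {0: 3.0, 1: 5.0}                          # opening cost f_j
d = {(0, 0): 1.0, (0, 1): 4.0,                # assignment cost d_ij
     (1, 0): 2.0, (1, 1): 1.0}
N = {0: [0, 1], 1: [0, 1]}                    # admissible facilities N(i)

def cost(x):
    # submodular cost: sum_j f_j * max_i x_ij  +  sum_{i,j} d_ij * x_ij
    opening = sum(fj * max(x.get((i, j), 0) for i in N) for j, fj in f.items())
    assignment = sum(dij * x.get((i, j), 0) for (i, j), dij in d.items())
    return opening + assignment

def covered(x, i):
    # non-convex covering constraint: max_{j in N(i)} x_ij >= 1
    return max(x.get((i, j), 0) for j in N[i]) >= 1

x = {(0, 0): 1, (1, 0): 1}   # open facility 0 and assign both customers to it
print(cost(x))                            # 3 + 1 + 2 = 6.0
print(all(covered(x, i) for i in N))      # True
```

Note that the opening term \(f_j\max_i x_{ij}\) charges \(f_j\) once if any customer uses facility j, which is what makes the cost submodular rather than linear.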

A Greedy Algorithm for Submodular-Cost Covering (Sect. 2)

The core contribution of the paper is a greedy Δ-approximation algorithm for the problem, where Δ is the maximum number of variables that any individual covering constraint S in \(\mathcal{C}\) constrains.

For \(S\in\mathcal{C}\), let \(\operatorname{\mathsf{vars}}(S)\) contain the indices of variables that S constrains (i.e., \(j\in\operatorname{\mathsf{vars}}(S)\) if membership of x in S depends on \(x_j\)). The algorithm is roughly as follows.

Start with an all-zero solution x, then repeat the following step until all constraints are satisfied: Choose any not-yet-satisfied constraint S. To satisfy S, raise each \(x_j\) for \(j\in\operatorname{\mathsf{vars}}(S)\) (i.e., the variables that S constrains), so that each raised variable's increase contributes the same amount to the increase in the cost.

Section 2 gives the full algorithm and its analysis.

Fast Implementations (Sect. 7)

One important special case of Submodular-Cost Covering is Covering Integer Linear Programs with upper bounds on the variables (CIP-UB), that is, problems of the form \(\min\{c\cdot x \mid x\in\mathbb{Z}_{\ge0}^{n};\ Ax \ge b;\ x\le u\}\) where each \(c_j\), \(b_i\), and \(A_{ij}\) is non-negative. This is a Submodular-Cost Covering instance \((c,U,\mathcal{C})\) with variable domain \(U_j=\{0,1,\ldots,u_j\}\) for each j and a covering constraint \(A_i x\ge b_i\) for each i; here Δ is the maximum number of non-zeros in any row of A.

Section 7 describes a nearly linear-time implementation for a generalization of this problem: Covering Mixed Integer Linear Programs with upper bounds on the variables (CMIP-UB). As summarized in the bottom half of Fig. 1, Sect. 7 also describes fast implementations for other special cases: Vertex Cover, Set Cover, Facility Location (linear time); and two-stage probabilistic CMIP-UB (quadratic time).

Fig. 1 Some Δ-approximation algorithms for covering problems. “⋆” = generalized or strengthened here

Related Work: Δ-Approximation Algorithms for Classical Covering Problems (Top Half of Fig. 1)

See e.g. [35, 62] for an introduction to classical covering problems. For Vertex Cover Footnote 5 and Set Cover in the early 1980’s, Hochbaum gave a Δ-approximation algorithm based on rounding an LP relaxation [33]; Bar-Yehuda and Even gave a linear-time greedy algorithm (a special case of the algorithms here) [5]. A few years later, Hall and Hochbaum gave a quadratic-time primal-dual algorithm for Set Multicover [27]. In the late 1990’s, Bertsimas and Vohra further generalized that result with a quadratic-time primal-dual algorithm for Covering Integer Programs with {0,1}-variables (CIP-01), but restricted to integer constraint matrix A and with approximation ratio \(\max_i \sum_j A_{ij}\ge \varDelta \) [10]. In 2000, Carr et al. gave the first Δ-approximation for CIP-01 [16]. In 2009 (independently of our work), Pritchard extended that result to CIP-UB [54, 55]. Both [16] and [54, 55] use the (exponentially many) Knapsack-Cover (KC) inequalities to obtain integrality gapFootnote 6 Δ; their algorithms use the ellipsoid method, so they have high-degree-polynomial running time.

As far as we know, Set Cover is the most general special case of Submodular-Cost Covering for which any nearly linear time Δ-approximation algorithm was previously known, while CIP-UB is the most general special case for which any polynomial-time Δ-approximation algorithm was previously known.

Independently of this paper, Iwata and Nagano give Δ-approximation algorithms for variants of Vertex Cover, Set Cover, and Edge Cover with submodular (and possibly decreasing!) cost [36].

Online Covering, Paging, and Caching (Sect. 3)

In online covering (following, e.g. [2, 13, 14]), the covering constraints are revealed one at a time in any order. An online algorithm must choose an initial solution x, then, as each constraint “x∈S” is revealed, must increase variables in x to satisfy the constraint, without knowing future constraints and without decreasing any variable. The algorithm has competitive ratio Δ if the cost of its final solution is at most Δ times the optimal (offline) cost (plus a constant that is independent of the input sequence). Such an algorithm is said to be Δ-competitive.

The greedy algorithm here is a Δ-competitive online algorithm for Submodular-Cost Covering.

As recently observed in [2, 13, 14], many classical online paging and caching problems reduce to online covering (usually online Set Cover). Via this reduction, the algorithm here generalizes many classical deterministic online paging and caching algorithms. These include LRU and FWF for Paging [59], Balance and Greedy Dual for Weighted Caching [17, 63, 64], Landlord [65, 66] (a.k.a. Greedy Dual Size [15]) for File Caching, and algorithms for Connection Caching [1, 20–22] (all results marked with “⋆” in Fig. 2).

Fig. 2 Δ-competitive online paging and caching. “⋆” = generalized or strengthened here

As usual, the competitive ratio Δ is the cache size, commonly denoted k, or, in the case of File Caching, the maximum number of files ever held in cache (which is at most the cache size).

Section 3 illustrates this connection using Connection Caching as an example.

Section 3 also illustrates the generality of online Submodular-Cost Covering by describing a (d+k)-competitive algorithm for a new class of upgradable caching problems, in which the online algorithm chooses not only which pages to evict, but also how to pay to upgrade d hardware parameters (e.g. cache size, CPU, bus, network speeds, etc.) to reduce later costs and constraints (somewhat like Ski Rental [39] and multi-slope Ski Rental [47]—special cases of online Submodular-Cost Covering with Δ=2).

Section 4 describes a natural randomized generalization of the greedy algorithm (Algorithm 2), with even more flexibility in incrementing the variables. This yields a stateless Δ-competitive online algorithm for Submodular-Cost Covering, generalizing Pitt’s Vertex Cover algorithm [4] and the Harmonic k-server algorithm as it specializes for Paging and Weighted Caching [56].

Related Work: Randomized Online Algorithms

For most online problems here, no deterministic online algorithm can be better than Δ-competitive (where Δ=k), but better-than-Δ-competitive randomized online algorithms are known. Examples include Ski Rental [39, 47], Paging [25, 49], Weighted Caching [2, 15], Connection Caching [20], and File Caching [3]. Some cases of online Submodular-Cost Covering (e.g. Vertex Cover) are unlikely to have better-than-Δ-competitive randomized algorithms. It would be interesting to classify which cases admit better-than-Δ-competitive randomized online algorithms.

Relation to Local-Ratio and Primal-Dual Methods (Sect. 6)

Section 6 describes how the analyses here can be recast (perhaps at some expense in intuition) in either the local-ratio framework or (at least for linear costs) the primal-dual framework. Local ratio is usually applied to problems with variables in {0,1}; the section introduces an interpretation of local ratio for more general domains, based on residual costs.

Similarly, the Knapsack Cover (KC) inequalities are most commonly used for problems with variables in {0,1}, and it is not clear how to extend the KC inequalities to more general domains (e.g. from CMIP-01 to CMIP-UB). (The standard KC inequalities suffice for \(O(\log(\widehat{\varDelta }))\)-approximation of CMIP-UB [41], but may require some modification to give Δ-approximation of CMIP-UB [54, 55].) The primal-dual analysis in Sect. 6 uses a new linear program (LP) relaxation for Linear-Cost Covering that may help better understand how to extend the KC inequalities.

Section 6 also discusses how the analyses here can be interpreted via a certain class of valid linear inequalities, namely inequalities that are “local” in that they can be proven valid by looking only at each single constraint \(S\in\mathcal{C}\) in isolation.

Related Work: Hardness Results, Log-Approximation Algorithms

Even for simple covering problems such as Set Cover, no polynomial-time (Δ−ε)-approximation algorithms (for any constant ε>0) are currently known for small (e.g. constant) Δ. A particularly well-studied special case, with Δ=2, is Vertex Cover, for which some complexity-theoretic evidence suggests that such an algorithm may not exist [6, 24, 28–31, 34, 40, 50].

For instances where Δ is large, \(O(\log\widehat{\varDelta })\)-approximation algorithms may be more appropriate, where \(\widehat{\varDelta }\) is the maximum number of constraints in which any variable occurs. Such algorithms exist for Set Cover [19, 37, 38, 48] (greedy, 1975), for CIP [60, 61] (ellipsoid, 2000), and for CIP-UB [41] (ellipsoid/KC inequalities, 2005). It is an open question whether there is a fast greedy \(O(\log\widehat{\varDelta })\)-approximation algorithm handling all of these problems (via, say, CIP-UB).

Recent works with log-approximations for submodular-cost covering problems include [18, 32, 57, 58]. Most of these have high-degree-polynomial run time. For example, the (lnn)-approximation algorithm for two-stage probabilistic Set-Cover [32] requires solving instances of Submodular Function Minimization [51, 52], which requires high-degree-polynomial run time. ([32] also claims a related 2-approximation for two-stage probabilistic Vertex Cover without details.)

Related Work: Distributed and Parallel Algorithms

Distributed and parallel approximation algorithms for covering problems are an active area of study. The simple form of the greedy algorithm here makes it particularly amenable for distributed and/or parallel implementation. In fact, it admits poly-log-time distributed and parallel implementations, giving (for example) the first poly-log-time 2-approximation algorithms for the well-studied (weighted) Vertex Cover and Maximum Weight Matching problems. See [42, 43, 45] for details and related results.

Organization

Section 2 gives the greedy algorithm for Submodular-Cost Covering (Algorithm 2) and proves that it has approximation ratio Δ. Section 3 describes applications to online problems. Section 4 describes randomized generalizations of the greedy algorithm, including a stateless online algorithm. Sections 5 and 6 explain how to view the analysis via the local-ratio and primal-dual methods. Section 7 details fast implementations for some special cases. After Sect. 2, each section may be read independently of the others.

2 Greedy Algorithm for Submodular-Cost Covering (Algorithm 2)

This section gives the full greedy algorithm for Submodular-Cost Covering (Algorithm 2) and the analysis of its approximation ratio. We assume Submodular-Cost Covering instances are given in canonical form:

Definition 2

(Canonical form)

An instance \((c,U,\mathcal{C})\) is in canonical form if each variable domain is unrestricted (each \(U_{j}=\bar{\mathbb {R}}_{\ge0}\)). Such an instance is specified by just the pair \((c,\mathcal{C})\).

This assumption is without loss of generality by the following reduction:

Observation 1

For any Submodular-Cost Covering instance \((c,U,\mathcal{C})\), there is an equivalent canonical form instance \((c,\mathcal{C}')\). By “equivalent”, we mean that any x that is feasible in \((c,U,\mathcal{C})\) is also feasible in \((c,\mathcal{C}')\), and that any x′ that is minimally feasible in \((c,\mathcal{C}')\) is also feasible in \((c,U,\mathcal{C})\).

Given any feasible solution x′ to \((c,\mathcal{C}')\), one can compute a feasible solution x to \((c,U,\mathcal{C})\) with c(x)≤c(x′) by taking each \(x_j=\max\{\alpha\in U_j \mid \alpha\le x'_j\}\).

The reduction is straightforward and is given in the Appendix. The idea is to incorporate the variable-domain restrictions “\(x_j\in U_j\)” directly into each covering constraint \(S\in\mathcal{C}\), replacing each occurrence of \(x_j\) in each S by \(\max\{\alpha\in U_j \mid \alpha\le x_j\}\). For example, applied to a CIP-UB instance \((c,U,\mathcal{C})\) as described in the introduction, the reduction produces the canonical instance \((c,\mathcal{C}')\) in which each covering constraint \(A_i x\ge b_i\) in \(\mathcal{C}\) is replaced in \(\mathcal{C}'\) by the stronger non-convex covering constraint

$$\sum_j A_{ij} \bigl\lfloor \min(x_j,u_j) \bigr\rfloor\ge b_i.$$

To satisfy these constraints, it doesn’t help to assign any \(x_j\) a value outside of \(U_j=\{0,1,\ldots,u_j\}\): any minimal x satisfying the constraints in \(\mathcal{C}'\) will also satisfy \(x_j\in\{0,1,\ldots,u_j\}\) for each j.
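As a small illustration of this reduced constraint (with assumed data), values above \(u_j\) and fractional parts contribute nothing, which is exactly why minimal solutions land in the original domains:

```python
import math

A_row, b_i = [2.0, 3.0], 7.0     # one covering row A_i x >= b_i (hypothetical)
u = [5, 1]                       # original domains U_j = {0, ..., u_j}

def canonical_row(x):
    # The reduced constraint: sum_j A_ij * floor(min(x_j, u_j)) >= b_i.
    return sum(a * math.floor(min(xj, uj))
               for a, xj, uj in zip(A_row, x, u)) >= b_i

print(canonical_row([2.0, 1.0]))   # 2*2 + 3*1 = 7 >= 7 -> True
print(canonical_row([2.9, 9.0]))   # caps and floors give the same 7 -> True
print(canonical_row([1.5, 0.9]))   # 2*1 + 3*0 = 2 < 7 -> False
```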

In the rest of the paper, we assume all instances are given in canonical form. To handle an instance \((c,U,\mathcal{C})\) that is not in canonical form, apply the above reduction to obtain canonical instance \((c,\mathcal{C}')\), use one of the algorithms here to compute a Δ-approximate solution x for \((c,\mathcal{C}')\), then compute vector x′ as described after Observation 1.

Definition 3

For any covering constraint S and \(x\in\mathbb{R}_{\ge0}^{n}\), let “\(x\le _{{}_{S}}y\)”, “\(x >_{{}_{S}}y\)”, etc., mean that the operator holds coordinate-wise for coordinates in \(\operatorname{\mathsf{vars}}(S)\). E.g. \(x \le _{{}_{S}}y\) if x j y j for all \(j\in\operatorname {\mathsf{vars}}(S)\).

Observation 2

If x∈S and \(y\ge _{{}_{S}}x\), then y∈S.

The observation is true simply because S is closed upwards, and membership of y in S depends only on y j for \(j\in \operatorname{\mathsf{vars}}(S)\). We use this observation throughout the paper.

To warm up the intuition for Algorithm 2, we first introduce and analyze a simpler version, Algorithm 1, that works only for linear costs. The algorithm starts with x←0, then repeatedly chooses any unmet constraint S and, to satisfy S, raises all variables \(x_j\) with \(j\in\operatorname{\mathsf{vars}}(S)\) at rate \(1/c_j\), until x satisfies S:

Algorithm 1 (pseudocode figure)
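A minimal executable sketch of Algorithm 1, specialized (for concreteness) to linear constraints a·x ≥ b with linear cost c·x: raising each \(x_j\) at rate \(1/c_j\) makes the constraint's left-hand side grow at rate \(\sum_j a_j/c_j\), so the length of each continuous step can be computed in closed form.

```python
def greedy_linear(c, constraints):
    """c[j] > 0: cost coefficients. constraints: list of (a, b) pairs,
    each meaning sum_j a[j]*x[j] >= b, handled in the given order."""
    n = len(c)
    x = [0.0] * n
    for a, b in constraints:
        slack = b - sum(a[j] * x[j] for j in range(n))
        if slack <= 0:
            continue                          # constraint already satisfied
        rate = sum(a[j] / c[j] for j in range(n) if a[j] > 0)
        t = slack / rate                      # time until x enters S
        for j in range(n):
            if a[j] > 0:                      # j in vars(S)
                x[j] += t / c[j]              # raised at rate 1/c_j
    return x

# The instance from Fig. 3: min x1+x2+x3 s.t. x1+x2 >= 4, x2+x3 >= 4.
x = greedy_linear([1, 1, 1], [([1, 1, 0], 4), ([0, 1, 1], 4)])
print(x)    # [2.0, 3.0, 1.0]; cost 6 <= Delta * opt = 2 * 4
```

This reproduces the two iterations of the worked example: the first step raises \(x_1,x_2\) by 2, the second raises \(x_2,x_3\) by 1.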

As the variables increase in Line 4, the cost of x increases at rate \(|\operatorname{\mathsf {vars}}(S)|\le \varDelta \) (each variable contributes to the cost increase at unit rate).Footnote 7 The proof of the approximation ratio relies on the following observation:

Observation 3

Let y be any feasible solution. Consider an iteration for a constraint S. Unless the current solution x already satisfies S at the start of the iteration, at the end of the iteration, x has some variable \(x_k\) with \(k\in\operatorname{\mathsf{vars}}(S)\) such that \(x_k\le y_k\). (That is, \(x\not>_{{}_{S}}y\).)

Proof

At the start of the iteration, since y but not x satisfies S, Observation 2 implies that \(x\not\ge _{{}_{S}}y\). During the iteration, while Line 4 is raising the x j for \(j\in\operatorname{\mathsf{vars}}(S)\), if at some moment \(x\ge _{{}_{S}}y\), then, since yS, it must be (by Observation 2) that xS also, so at that moment Line 4 stops raising the variables, before \(x >_{{}_{S}}y\). □

As the variables increase in Line 4, Observation 3 implies that, for some \(x_k\), the growing interval \([0,x_k]\) covers (at rate \(1/c_k\)) a larger and larger fraction of the corresponding interval \([0,x^{*}_{k}]\) in the optimal solution \(x^*\). This allows the Δ-rate increase in the cost of x to be charged at unit rate to the cost of \(x^*\), proving that the final cost of x is at most Δ times the cost of \(x^*\).

For example, consider two iterations of Algorithm 1 on the input \(\min\{x_1+x_2+x_3 \mid x_1+x_2\ge4;\ x_2+x_3\ge4\}\) with optimal solution \(x^*=(0,4,0)\), as shown in Fig. 3. The first iteration does a step for the first constraint, raising \(x_1\) and \(x_2\) by 2, and charging the cost increase of 4 to the [0,2] portion of \(x_2^{*}\). The second iteration does a step for the second constraint, raising \(x_2\) and \(x_3\) both by 1, and charging the cost increase of 2 to the [2,3] portion of \(x_2^{*}\).

Fig. 3 Two steps of Algorithm 1, where \(x^*=(0,4,0)\). Dark portions of \(x^{*}_{2}\) have been charged

We generalize this charging argument by defining the residual problem for the current x, which is the problem of finding a minimum-cost augmentation of the current x to make it feasible. For example, after the first iteration of Algorithm 1 in the example above, the residual problem for x=(2,2,0) is equivalent to \(\min\{y_1+y_2+y_3 \mid y_1+y_2\ge0;\ y_2+y_3\ge2\}\). For notational simplicity, in the definition of the residual problem, instead of shifting each constraint, we (equivalently, but perhaps less intuitively) leave the constraints alone but modify the cost function (we charge y only for the part of y that exceeds x):Footnote 8

Definition 4

(Residual problem)

Given any Submodular-Cost Covering instance \((c,\mathcal{C})\), and any \(x\in\bar{\mathbb{R}}_{\ge0}^{n}\), define the residual problem for x to be the instance \(({\tilde{c}_{x}}, \mathcal{C})\) with cost function \(\tilde{c}_{x}(y) = c(x\vee y)-c(x)\).

For \(Q\subseteq\mathbb{R}_{\ge0}^{n}\), define the cost of Q in the residual problem for x to be \(\tilde{c}_{x}(Q) = \min_{y\in Q} \tilde {c}_{x}(y)\).

If Q is closed upwards, then \(\tilde{c}_{x}(Q)\) equals \(\min\{c(y)-c(x)\mid y\ge x,\ y\in Q\}\).

In all cases here Q is closed upwards, and we interpret \(\tilde {c}_{x}(Q)\) as the minimum increase in c(x) necessary to raise coordinates of x to bring x into Q. The residual problem \(({\tilde{c}_{x}},\mathcal{C})\) has optimal cost \(\tilde {c}_{x}(\bigcap_{S\in\mathcal{C}} S)\).
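For a linear cost c·x (an assumption made here for concreteness), the residual cost of Definition 4 has the closed form used in the proofs below, \(\tilde{c}_x(y)=\sum_j c_j\max(y_j-x_j,0)\):

```python
def residual_cost(c, x, y):
    # c~_x(y) = c(x v y) - c(x); for linear c this is sum_j c_j*max(y_j-x_j, 0),
    # since the coordinate-wise max x v y raises only the coordinates y_j > x_j.
    return sum(cj * max(yj - xj, 0.0) for cj, xj, yj in zip(c, x, y))

c = [1.0, 1.0, 1.0]
y = [0.0, 4.0, 0.0]                      # a feasible solution for the example
print(residual_cost(c, [0, 0, 0], y))    # 4.0: the full cost of y from x = 0
print(residual_cost(c, [2, 2, 0], y))    # 2.0: only x_2 still needs raising
```

The second call matches the residual problem for x=(2,2,0) in the text: after the first iteration, y=(0,4,0) costs only 2 more.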

Here is the formal proof of the approximation ratio, as it specializes for Algorithm 1.

Lemma 1

(Correctness of Algorithm 1)

Algorithm 1 is a Δ-approximation algorithm for Linear-Cost Covering.

Proof

First consider the case when every cost c j is non-zero. Consider an iteration for a constraint \(S\in\mathcal{C}\).

Fix any feasible y. The cost \(\tilde{c}_{x}(y)\) of y in the residual problem for x is the sum \(\sum_j c_j\max(y_j-x_j,0)\). As Line 4 raises each variable \(x_j\) for \(j\in\operatorname{\mathsf{vars}}(S)\) at rate \(1/c_j\), by Observation 3, one of the variables being raised is an \(x_k\) such that \(x_k<y_k\). For this k, the term \(c_k\max(y_k-x_k,0)\) in the sum is decreasing at rate 1. No terms in the sum increase. Thus, \(\tilde{c}_{x}(y)\) decreases at rate at least 1.

Meanwhile, the cost c(x) of x increases at rate \(|\operatorname {\mathsf{vars}}(S)|\le \varDelta \). Thus, the algorithm maintains the invariant \(c(x)/\varDelta + \tilde {c}_{x}(y) \le c(y)\) (true initially because c(0)=0 and \(\tilde{c}_{\mathbf{0}}(y)= c(y)\)). Since \(\tilde{c}_{x}(y) \ge0\), this implies that c(x)≤Δc(y) at all times.

In the case that some \(c_j=0\) during an iteration, the corresponding \(x_j\)’s are set instantaneously to ∞. This increases neither c(x) nor \(\tilde{c}_{x}(y)\), so the above invariant is still maintained and the conclusion still holds. □

The main algorithm (Algorithm 2, next) generalizes Algorithm 1 in two ways. First, the algorithm works with any submodular (not just linear) cost function. (This generalization is more complicated but technically straightforward.) Second, in each iteration, instead of increasing variables just until the constraint is satisfied, it chooses a step size β≥0 explicitly (we will see that this allows a larger step than in Algorithm 1). Then, for each \(j\in\operatorname{\mathsf{vars}}(S)\), it increases \(x_j\) maximally so that the cost c(x) of x increases by (at most) β.

Algorithm 2 (pseudocode figure)
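A sketch of one iteration of Algorithm 2 (assuming the caller supplies a valid \(\beta\le\tilde{c}_x(S)\)): each \(x_j\) with \(j\in\operatorname{\mathsf{vars}}(S)\) is raised maximally so that raising that variable alone would increase c(x) by at most β. Since c is continuous and non-decreasing, the maximal raise can be found by binary search (the cap `hi` stands in for raises to ∞ and is an artifact of this sketch):

```python
def algorithm2_step(cost, x, vars_S, beta, hi=1e6, eps=1e-9):
    """cost: continuous, non-decreasing (submodular) function of a list x.
    Returns x', where each x'_j (j in vars_S) is the maximal value with
    cost(x with only x_j raised) <= cost(x) + beta."""
    base = cost(x)
    x_new = list(x)
    for j in vars_S:
        lo, hi_j = x[j], hi              # binary search for the maximal raise
        while hi_j - lo > eps:
            mid = (lo + hi_j) / 2
            z = list(x)                  # raise x_j ALONE, from the old x
            z[j] = mid
            if cost(z) - base <= beta:
                lo = mid
            else:
                hi_j = mid
        x_new[j] = lo
    return x_new

# Example: cost x_u + x_w, constraint x_u + x_w >= 1, step beta = 1.
x = algorithm2_step(lambda v: v[0] + v[1], [0.0, 0.0], [0, 1], beta=1.0)
print(x)    # approximately [1.0, 1.0]: each raise alone costs beta
```

By the submodularity of c, the total cost increase of the simultaneous raises is at most β·|vars(S)|, which is what the analysis of Theorem 1 uses.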

Choosing the Step Size β

In an iteration for a constraint S, the algorithm can choose any step size β≥0 subject to the restriction \(\beta\le\tilde {c}_{x}(S) =\min\{ c(y) - c(x) \mid y\in S, y\ge x\}\). That is, β is at most the minimum cost that would be necessary to increase variables in x to bring x into S. To understand ways in which Algorithm 2 can choose β, consider the following.

  • In all cases, Algorithm 2 can take β as Algorithm 1 does: just large enough to ensure x∈S after the iteration. By an argument similar to the proof of Lemma 1, this particular β is guaranteed to satisfy the restriction \(\beta\le\tilde{c}_{x}(S)\). (Of course, another option is to take any β smaller than this.)

  • In some cases, Algorithm 2 can take β larger than Algorithm 1 does. For example, consider a linear constraint \(x_u+x_w\ge1\) with linear cost \(c(x)=x_u+x_w\). Consider an iteration for this constraint, starting with \(x_u=x_w=0\). Algorithm 1 would take β=1/2 and \(x_u=x_w=1/2\), satisfying the constraint. But \(\tilde{c}_{x}(S)=1\) (to bring x into S would require raising \(x_u+x_w\) to 1), so Algorithm 2 can take β=1 and \(x_u=x_w=1\), “over-satisfying” the constraint.

  • It would be natural to set β to its maximum allowed value \(\tilde{c}_{x}(S)\), but this value can be hard to compute. Consider a single constraint S: \(\sum_j c_j\min(1,\lfloor x_j\rfloor)\ge1\), with cost function \(c(x)=\sum_j c_jx_j\). Then \(\tilde{c}_{\mathbf{0}}(S)=1\) if and only if there is a subset Q such that \(\sum_{j\in Q} c_j=1\). Determining this for arbitrary c is Subset Sum, which is NP-hard. Still, determining a “good enough” β is not hard: take, e.g. \(\beta=\min\{c_j(1-x_j)\mid x_j<1\}\). If \(x\not\in S\), then this is at most \(\tilde{c}_{x}(S)\), because to bring x into S would require raising at least one \(x_j<1\) to 1. This choice of β is easy to compute, and with it Algorithm 2 will satisfy S within Δ iterations.

In short, computing \(\tilde{c}_{x}(S)\) can be hard, but finding a “good” \(\beta\le\tilde{c}_{x}(S)\) is not hard. A generic choice is to take β just large enough to bring x into S after the iteration, as Algorithm 1 does, but in some cases (especially in online, distributed, and parallel settings where the algorithm is restricted) other choices may be easier to implement or lead to fewer iterations. For a few examples, see the specializations of Algorithm 2 in Sect. 7.
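The “good enough” step size from the last bullet, for the Subset-Sum-style constraint with linear cost, is a one-liner (numbers below are hypothetical):

```python
def easy_beta(c, x):
    # beta = min{ c_j * (1 - x_j) : x_j < 1 }, a valid lower bound on c~_x(S):
    # bringing x into S must raise at least one x_j < 1 all the way to 1.
    return min(cj * (1 - xj) for cj, xj in zip(c, x) if xj < 1)

print(easy_beta([3.0, 5.0, 2.0], [0.0, 0.0, 0.5]))   # min(3, 5, 1) = 1.0
```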

The proof of the approximation ratio for Algorithm 2 generalizes the proof of Lemma 1 in two ways: the proof has a second case to handle step sizes β larger than Algorithm 1 would take, and the proof handles the more general (submodular) cost function (the generality of which makes this proof unfortunately more abstract).

Theorem 1

(Correctness of Algorithm 2)

For Submodular-Cost Covering, if Algorithm 2 terminates, it returns a Δ-approximate solution.

Proof

Consider an iteration for a constraint \(S\in\mathcal{C}\). By the submodularity of c, the iteration increases the cost c(x) of x by at most \(\beta|\operatorname{\mathsf{vars}}(S)|\).Footnote 9 We show that, for any feasible y, the cost \(\tilde{c}_{x}(y)\) of y in the residual problem for x decreases by at least β. Thus, the invariant \(c(x)/\varDelta + \tilde{c}_{x}(y) \le c(y)\), and the theorem, hold.

Recall that xy (resp. xy) denotes the coordinate-wise minimum (resp. maximum) of x and y.

Let x and x′ denote the vector x before and after the iteration, respectively. Fix any feasible y.

First consider the case when y≥x (the general case will follow from this one). The submodularity of c implies c(x′)+c(y)≥c(x′∨y)+c(x′∧y). Subtracting c(x) from both sides and rearranging terms gives (with equality if c is separable, e.g. linear)

$$\bigl[c(y) - c(x)\bigr]-\bigl[c\bigl(y \vee x'\bigr) - c\bigl(x'\bigr)\bigr] \ge c\bigl(x'\wedge y\bigr) -c(x).$$

The first bracketed term is \(c(y\vee x) - c(x) = \tilde{c}_{x}(y)\) (using here that y≥x, so y∨x=y). The second bracketed term is \(\tilde{c}_{x'}(y)\). Substituting \(\tilde{c}_{x}(y)\) and \(\tilde{c}_{x'}(y)\) for the two bracketed terms, respectively, we have

$$ \tilde{c}_{x}(y) - \tilde{c}_{x'}(y) \ge c\bigl(x'\wedge y\bigr) - c(x).$$
(1)

Note that the left-hand side is the decrease in the residual cost for y in this iteration, which we want to show is at least β. The right-hand side is the cost increase when x is raised to x′∧y (i.e., each \(x_j\) for \(j\in\operatorname{\mathsf{vars}}(S)\) is raised to \(\min(x'_{j}, y_{j})\)).

To complete the proof for the case y≥x, we show that the right-hand side is at least β.

Recall that since y is feasible and x does not yet satisfy S, there must be at least one \(x_k\) with \(k\in\operatorname{\mathsf{vars}}(S)\) and \(x_k<y_k\).

Subcase 1.:

When also \(x'_{k} < y_{k}\) for some \(k\in\operatorname{\mathsf{vars}}(S)\). The intuition in this case is that raising x to x′∧y raises \(x_k\) to \(x'_{k}\), which alone costs β (by Algorithm 2). Formally, let z be x with just \(x_k\) raised to \(x'_{k}\). Then:

  1. (i)

    Algorithm 2 chooses \(x'_{k}\) maximally such that c(z)≤c(x)+β.

  2. (ii)

    c(z)=c(x)+β because (i) holds, c is continuous, and \(x'_{k} < \infty\).

  3. (iii)

    z≤x′∧y because z≤x′ and z≤y (using x≤y and \(x'_{k} < y_{k}\)).

  4. (iv)

    c(z)≤c(x′∧y) because c is non-decreasing, and (iii) holds.

Substituting (ii) into (iv) gives c(x)+βc(x′∧y), that is, c(x′∧y)−c(x)≥β.

Subcase 2.:

Otherwise \(x'\ge _{{}_{S}}y\). The intuition in this case is that \(x'\wedge y =_{{}_{S}}y\), so that raising x to x′∧y is enough to bring x into S. And, by the assumption on β in Algorithm 2, it costs at least β to bring x into S.

Here is the formal argument. Let z=x′∧y. Then:

  1. (a)

    \(z =_{{}_{S}}y\) by the definition of z and \(x'\ge _{{}_{S}}y\).

  2. (b)

    zS by (a), yS, and Observation 2.

  3. (c)

    z≥x by the definition of z and x′≥x and y≥x.

  4. (d)

    \(\tilde{c}_{x}(S) \le c(z) - c(x)\) by (b), (c), and the definition of \(\tilde{c}_{x}(S)\).

  5. (e)

    \(\beta\le\tilde{c}_{x}(S)\) by the definition of Algorithm 2.

By transitivity, (d) and (e) imply βc(z)−c(x), that is, c(x′∧y)−c(x)≥β.

For the remaining case (when \(y\not\ge x\)), we show that the case y≥x implies this case. The intuition is that if \(y_j<x_j\), then \(\tilde{c}_{x}(y)\) is unchanged by raising \(y_j\) to \(x_j\), so we may as well assume y≥x. Formally, define \(\hat{y} = y\vee x\ge y\). Then \(\hat{y}\ge x\) and \(\hat{y}\) is feasible.

By calculation, \(\tilde{c}_{x}(y)= c(x\vee y) - c(x)= c(x\vee(y \vee x)) - c(x)= \tilde{c}_{x}(\hat{y})\).

By calculation, \(\tilde{c}_{x'}(y)= c(x'\vee y) - c(x')= c(x'\vee(y \vee x)) - c(x')= \tilde{c}_{x'}(\hat{y})\).

Thus, \(\tilde{c}_{x}(y) - \tilde{c}_{x'}(y)\) equals \(\tilde{c}_{x}(\hat{y}) - \tilde{c}_{x'}(\hat{y})\), which by the case already considered is at least β. □

3 Online Covering, Paging, and Caching

Recall that in online Submodular-Cost Covering, the constraints \(S\in\mathcal{C}\) are revealed one at a time; an online algorithm must raise variables in x to bring x into the given S, without knowing the remaining constraints. Algorithm 1 or Algorithm 2 can do this, so by Theorem 1 they yield Δ-competitive online algorithms.Footnote 10

Corollary 1

Algorithm 1 and Algorithm 2 give Δ-competitive deterministic online algorithms for Submodular-Cost Covering.

Using simple variants of the reduction of Weighted Caching to online Set Cover from [2], Corollary 1 naturally generalizes a number of known results for Paging, Weighted Caching, File Caching, Connection Caching, etc., as described in the introduction. To illustrate such a reduction, consider the following Connection Caching problem. A request sequence r is given online. Each request \(r_t\) is a subset of the nodes in a network. In response to each request \(r_t\), a connection is activated (if not already activated) between all nodes in \(r_t\). Then, if any node in \(r_t\) has more than k active connections, some of the connections (other than \(r_t\)) must be deactivated (paying \(\operatorname{\mathsf{cost}}(r_{s})\) to deactivate connection \(r_s\)) to leave each node with at most k active connections.

Reduce this problem to online Set Cover as follows. Let variable \(x_t\) indicate whether connection \(r_t\) is closed before the next request to \(r_t\) after time t, so the total cost is \(\sum_{t} \operatorname{\mathsf{cost}}(r_{t})\, x_{t}\). For each node u and each time t, for any (k+1)-subset \(Q\subseteq\{r_s \mid s\le t;\ u\in r_s\}\), at least one connection \(r_s\in Q-\{r_{t}\}\) (where s is the time of the most recent request to \(r_s\)) must have been deactivated, so the following constraintFootnote 11 is met: \(\max_{r_{s}\in Q-\{r_{t}\}} x_{s} \ge1\).

This is an instance of online Set Cover, with a set for each time t (corresponding to \(x_t\)) and an element for each triple (u,t,Q) (corresponding to the constraint for that triple as described above).

Algorithm 1 (via Corollary 1) gives the following k-competitive algorithm. In response to a connection request \(r_t\), the connection is activated and \(x_t\) is set to 0. Then, as long as any node, say u, has k+1 active connections, the current x violates the constraint for the triple (u,t,Q), where Q contains u’s active connections. Node u implements an iteration of Algorithm 1 for the violated constraint: for all connections \(r_s\in Q-\{r_{t}\}\), it simultaneously raises \(x_s\) at rate \(1/\operatorname{\mathsf{cost}}(r_{s})\), until some \(x_s\) reaches 1. Node u then arbitrarily deactivates connections \(r_s\in Q\) with \(x_s=1\) so that at most k of u’s connections remain active.
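One iteration at node u can be sketched as follows (names hypothetical): since each \(x_s\) rises at rate \(1/\operatorname{\mathsf{cost}}(r_s)\), the first variable to reach 1 is the one minimizing \(\operatorname{\mathsf{cost}}(r_s)(1-x_s)\), so the whole continuous step collapses to one closed-form update:

```python
def node_iteration(costs, x, t):
    """costs[s], x[s] in [0, 1] for u's k+1 active connections; t = index of
    the just-requested connection r_t (excluded from raising).
    Returns the indices s with x[s] = 1, the candidates for deactivation."""
    others = [s for s in x if s != t]
    dt = min(costs[s] * (1 - x[s]) for s in others)  # time until some x_s hits 1
    for s in others:
        x[s] = min(1.0, x[s] + dt / costs[s])        # raise at rate 1/cost
    return [s for s in others if x[s] >= 1.0 - 1e-12]

x = {0: 0.0, 1: 0.0, 2: 0.0}                 # connection 0 is r_t
evict = node_iteration({0: 1.0, 1: 3.0, 2: 1.0}, x, t=0)
print(evict)    # [2]: the cheapest connection reaches x_s = 1 first
```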

For a more involved example with a detailed analysis, see Sect. 3.1.

Remark: On k/(k−h+1)-Competitiveness

The classic competitive ratio of k/(k−h+1) (versus \(\operatorname{\mathsf{opt}}\) with cache size h≤k) can be reproduced in the above settings as follows. For any set Q as described above, \(\operatorname{\mathsf{opt}}\) must meet the stronger constraint \(\sum_{r_{s}\in Q-\{r_{t}\}} \lfloor x_{s}\rfloor\ge k-h+1\). In this scenario, the proof of Lemma 1 extends to show a ratio of k/(k−h+1) (using that the variables are in [0,1], so there are at least k−h+1 variables \(x_j\) such that \(x_j<y_j\), so \(\tilde{c}_{x}(y)\) decreases at rate at least k−h+1).

3.1 Covering Constraint Generality; Upgradable Online Problems

Recall that the covering constraints in Submodular-Cost Covering need not be convex, only closed upwards. This makes them relatively powerful. The main purpose of this section is to illustrate this power, first by describing a simple example modeling file-segment requests in the http: protocol, then by using it to model upgradable online caching problems.

Http File-Segment Requests

The http: protocol allows retrieval of segments of files. To model this, consider each file f as a group of arbitrary segments (e.g. bytes or pages). Let x t be the number of segments of file r t evicted before its next request. Let c(r t ) be the cost to retrieve a single segment of file r t , so the total cost is ∑ t x t c(r t ). Then (for example), if the cache can hold at most k segments total, model this with constraints of the form (for a given subset Q) \(\sum_{s\in Q}\max\{0, \operatorname{\mathsf{size}}(r_{s}) - \lfloor x_{s} \rfloor\} \le k\) (where \(\operatorname{\mathsf{size}}(r_{s})\) is the total number of segments in r s ).

Running Algorithm 1 on an online request sequence gives the following online algorithm. At time t, respond to the file request r t as follows. Bring all segments of r t into the cache. Until the current set of segments in cache becomes cacheable, increase x s for each file with a segment in cache (other than r t ) at rate 1/c(r s ). Meanwhile, whenever \(\lfloor\min(x_{s},\operatorname{\mathsf {size}}(r_{s})) \rfloor\) increases for some x s , evict segment ⌊x s ⌋ of r s . Continue until the segments remaining in cache are cacheable.
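A hedged sketch of this segment-eviction rule follows, specialized (as an assumption, for concreteness) to the cacheability test "at most k segments total". The names `c` and `size` mirror the text; everything else is ours. Eviction is event-driven: we jump to the next time some ⌊x s ⌋ increases.

```python
import math

def evict_segments(cached, x, c, size, k, t):
    """cached[s]: number of segments of file r_s still in cache.
    Raise each x_s (s != t, file still has cached segments) at rate 1/c[s];
    whenever floor(x_s) increases, evict one more segment of file r_s."""
    while sum(cached.values()) > k:
        others = [s for s in cached if s != t and cached[s] > 0]
        # Time until floor(x_s) next increases for some s.
        dt = min((math.floor(x[s]) + 1 - x[s]) * c[s] for s in others)
        for s in others:
            x[s] += dt / c[s]
            # Keep only size(r_s) - floor(x_s) segments of file r_s.
            keep = max(0, size[s] - math.floor(x[s] + 1e-9))
            cached[s] = min(cached[s], keep)
    return cached
```

Note the cost accounting: the total charged is ∑ s x s c(r s ), matching the objective in the text.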

The competitive ratio will be the maximum number of files in the cache. (In contrast, the obvious approach of modeling each segment as a separate cacheable item will give competitive ratio equal to the maximum number of individual segments ever in cache.)

Upgradable Caching

The main point of this section is to illustrate the wide variety of online caching problems that can be reduced to online covering, and then solved via algorithms such as Algorithm 1.

An Upgradable Caching instance is specified by a maximum cache size k, a number d of hardware components, the eviction-cost function \(\operatorname{\mathsf{cost}}(\cdots)\), and, for each time step t (revealed in an online fashion) a request r t and a cacheability predicate, \(\operatorname{\mathsf {cacheable}}_{t}(\cdots)\). As the online algorithm proceeds, it chooses not only how to evict items, but also how to upgrade the hardware configuration. The hardware configuration is modeled abstractly by a vector \(\gamma \in\mathbb{R}_{\ge0}^{d}\), where γ i is the cost spent so far on upgrading the ith hardware component. Upgrading the hardware configuration is modeled by increasing the γ i ’s, which (via \(\operatorname{\mathsf{cost}}(\cdots)\) and \(\operatorname {\mathsf{cacheable}}(\cdots)\)), can decrease item eviction costs and increase the power of the cache.

In response to each request, if the requested item r t is not in cache, it is brought in. The algorithm can then increase any of the γ i ’s arbitrarily (increasing a given γ i models spending to upgrade the ith hardware component). The algorithm must then evict items (other than r t ) from cache until the set Q of items remaining in cache is cacheable, that is, it satisfies the given predicate \(\operatorname{\mathsf {cacheable}}_{t}(Q,\gamma)\). The cost to evict any given item r s is \(\operatorname{\mathsf {cost}}(r_{s},\gamma)\) for γ at the time of eviction.

The eviction-cost function \(\operatorname{\mathsf{cost}}(\cdots)\) and each predicate \(\operatorname{\mathsf{cacheable}}_{t}(\cdots)\) must meet the following monotonicity restrictions. The eviction-cost function \(\operatorname{\mathsf{cost}}(r_{s}, \lambda )\) must be monotone non-increasing with respect to each λ i . (Intuitively, upgrading the hardware can only decrease eviction costs.) The predicate \(\operatorname{\mathsf{cacheable}}_{t}(Q, \lambda)\) must be monotone with respect to Q and each λ i . That is, increasing any single λ i cannot cause the value of the predicate to switch from true to false. (Intuitively, upgrading the hardware can only increase the power of the cache.) Also, if a set Q is cacheable (for a given t and γ) then so is every subset Q′⊆Q. Finally, for simplicity of presentation, we assume that every cacheable set has cardinality k or less.
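To make the restrictions concrete, here is one illustrative predicate (our own example, not from the paper): cache capacity grows with spending on component 0, capped at k. It is monotone in each γ i (upgrades can only flip the predicate from false to true), subset-closed in Q, and every cacheable set has cardinality at most k.

```python
def cacheable(Q, gamma, k=4):
    """Example cacheability predicate meeting the monotonicity restrictions:
    capacity = min(k, 1 + floor(gamma_0)), so upgrades only enlarge the cache."""
    capacity = min(k, 1 + int(gamma[0]))
    return len(Q) <= capacity
```

Any subset of a cacheable Q is cacheable since it only shrinks len(Q), and raising γ 0 only raises the capacity.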

The cost of a solution is the total paid to evict items, plus the final hardware configuration cost, \(\sum_{i=1}^{d} \gamma_{i}\). The competitive ratio is defined with respect to the minimum cost of any sequence of evictions that meets all the specified cacheability constraints.Footnote 12 Note that the offline solution may as well fix the optimal hardware configuration at the start, before the first request, as this maximizes subsequent cacheability and minimizes subsequent eviction costs.

Standard File Caching is the special case when \(\operatorname{\mathsf{cacheable}}_{t}(Q,\gamma)\) is the predicate “\(\sum_{r_{s}\in Q} \operatorname{\mathsf{size}}(r_{s}) \le k\)” and \(\operatorname{\mathsf{cost}}(r_{s},\gamma)\) depends only on r s ; that is, d=0. Using Upgradable caching with d=0, one could model independent use of the cache by some interfering process: the cacheability predicate could be changed to \(\operatorname{\mathsf{cacheable}}_{t}(Q) \equiv \mbox{``}\sum_{r_{s}\in Q}\operatorname{\mathsf{size}}(r_{s}) \le k_{t}\mbox{''}\), where each k t is at most k but otherwise depends arbitrarily on t. Or, using Upgradable caching with d=1, one could also model a cache that starts with size 1, with upgrades to larger sizes (up to a maximum of k) available for purchase at any time. Or, also with d=1, one could model that upgrades of the network (decreasing the eviction costs of arbitrary items arbitrarily) are available for purchase at any time. One can also model fairly arbitrary restrictions on cacheability: for example (for illustration), one could require that, at odd times t, two specified files cannot both be in cache together, etc.

Next we describe how to reduce Upgradable Caching to online Submodular-Cost Covering with Δ=d+k, giving (via Algorithm 1) a (d+k)-competitive online algorithm for Upgradable Caching. The resulting algorithm is a natural generalization of existing algorithms.

Theorem 2

(Upgradable caching)

Upgradable Caching has a (d+k)-competitive online algorithm, where d is the number of upgradable components and k is the maximum number of files ever held in cache.

Proof

Given an arbitrary Upgradable Caching instance with T requests, define a Submodular-Cost Covering instance \((c,\mathcal{C})\) over \(\mathbb{R}_{\ge0}^{d+T}\) as follows.

The variables are as follows. For i=1,…,d, variable γ i is the amount invested in component i. For t=1,…,T, variable x t is the cost (if any) incurred for evicting the tth requested item r t at any time before its next request. Thus, a solution is a pair \((\gamma,x)\in\mathbb{R}_{\ge0}^{d}\times \mathbb{R}_{\ge0}^{T}\). The cost function is \(c(\gamma,x) = \sum_{i=1}^{d} \gamma_{i}+\sum_{t=1}^{T} x_{t}\).

At any time t, let A(t) denote the set of times of active requests, the times of the most recent requests to each item:

$$A(t) = \bigl\{ s \mid s \le t, \ \bigl(\forall s' \le t\bigr)r_{s'} = r_s \rightarrow s' \le s\bigr\}.$$

In what follows, in the context of the current request r t at time t, we abuse notation by identifying each time s∈A(t) with its requested item r s . (This gives a bijection between A(t) and the requested items.)
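In code, the definition of A(t) amounts to keeping, for each distinct item, the time of its most recent request; a small sketch (the list-based representation is our assumption):

```python
def active_times(requests, t):
    """A(t): for each distinct item among r_1..r_t, the time of its most
    recent request. requests is 0-indexed; times are 1-indexed."""
    last = {}
    for s in range(1, t + 1):
        last[requests[s - 1]] = s   # a later request to the same item overwrites
    return set(last.values())
```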

For any given subset Q⊆A(t) of the currently active items, and any hardware configuration γ, either the set Q is cacheable or at least one item s∈Q−{t} must be evicted by time t. In short, any feasible solution (γ,x) must satisfy the predicate

$$\operatorname{\mathsf{cacheable}}_t(Q,\gamma)\quad\mbox{\textbf{or}}\quad \exists s\in Q-\{t\} \mbox{ such that } x_s \ge \operatorname{\mathsf {cost}}(r_s,\gamma).$$

For a given t and Q, let S t (Q) denote the set of solutions (γ,x) satisfying the above predicate. The set S t (Q) is closed upwards (by the restrictions on \(\operatorname{\mathsf{cacheable}}\) and \(\operatorname{\mathsf{cost}}\)) and so is a valid covering constraint.

The online algorithm adapts Algorithm 1, as follows. It initializes γ=x=0. After request r t , the algorithm keeps in cache the set of active items whose eviction costs have not been paid, which we denote C:

$$C = C_t(\gamma,x) = \{t\}\cup\bigl\{s\in A(t) \mid x_s <\operatorname {\mathsf{cost}}(r_s,\gamma)\bigr\}.$$

To respond to request r t , as long as the cached set C is not legally cacheable (i.e., \(\operatorname{\mathsf{cacheable}}_{t}(C,\gamma)\) is false), the corresponding constraint S t (C) is violated, and the algorithm performs an iteration of Algorithm 1 for that constraint. By inspection, this constraint depends on the following variables: every γ i , and each x s where r s is cached and s≠t (that is, s∈C−{t}). Thus, the algorithm increases these variables at unit rate, until either (a) \(x_{s} \ge\operatorname{\mathsf {cost}}(r_{s},\gamma)\) for some cached r s and/or (b) \(\operatorname{\mathsf{cacheable}}_{t}(C,\gamma)\) becomes true (due to items leaving C and/or increases in γ). When case (a) happens, the algorithm evicts that r s to maintain the invariant that the cached set is C, then continues with the new constraint for the new C. When case (b) happens, the currently cached set is legally cacheable, and the algorithm is done responding to request t.
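The response loop just described can be sketched with a discretized step (EPS approximates the continuous unit-rate increase; `cost` and `cacheable` are passed in as callables, and all names are our assumptions):

```python
EPS = 1e-3  # discretization of the continuous unit-rate increase

def respond(t, C, gamma, x, cost, cacheable):
    """One response of the adapted Algorithm 1: while C is not cacheable,
    raise every gamma_i and every x_s (s in C, s != t) together,
    evicting r_s as soon as x_s reaches cost(r_s, gamma)."""
    while not cacheable(frozenset(C), tuple(gamma)):
        for i in range(len(gamma)):
            gamma[i] += EPS
        for s in list(C):
            if s != t:
                x[s] += EPS
                if x[s] >= cost(s, tuple(gamma)):   # case (a): evict r_s
                    C.remove(s)
    return C
```

For example, with d=0 and a size-1 cache, the loop raises x of the one non-current cached item until it reaches its eviction cost, then evicts it (case (b) then holds).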

This completes the description of the algorithm. For the analysis, we define the constraint collection \(\mathcal{C}\) in the underlying Submodular Covering instance \((c,\mathcal{C})\) to contain just those constraints S t (C) for which the algorithm, given the request sequence, performs steps. When the algorithm performs a step at time t, the cached set C contains only t and items that stayed in cache (and were collectively cacheable) after the previous request. Since at most k items stayed in cache, by inspection, the underlying constraint S t (C) depends on at most d+k variables in (γ,x). Thus, the degree Δ of \((c,\mathcal{C})\) is at most d+k.

For the Submodular-Cost Covering instance \((c,\mathcal{C})\), let (γ ∗ ,x ∗ ) and (γ′,x′) be the solutions corresponding, respectively, to \(\operatorname {\mathsf {opt}}\) and to the run of the algorithm. For the original upgradable caching instance (distinct from \((c,\mathcal{C})\)), let \(\operatorname {\mathsf {opt}}\) and \(\mathcal{A}\) denote the costs of, respectively, the optimal solution and the algorithm’s solution.

Then \(\mathcal{A}\le c(\gamma',x')\) because the algorithm paid at most \(x'_{s}\) to evict each evicted item r s . (We use here that \(x_{s} \ge\operatorname{\mathsf{cost}}(r_{s},\gamma)\) at the time of eviction, and x s does not decrease after that; the inequality may be strict because some items with positive \(x'_{s}\) might not be evicted.) The approximation guarantee for Algorithm 1 (Lemma 1) ensures c(γ′,x′)≤Δc(γ ∗ ,x ∗ ).

By transitivity \(\mathcal{A}\le c(\gamma',x') \le \varDelta \, c(\gamma^{*},x^{*})= \varDelta \operatorname {\mathsf {opt}}.\) □

Flexibility in Tuning the Algorithm

In practice, it is well known that a competitive ratio much lower than k is desirable and usually achievable for paging. Also, for file caching (where items have sizes), carefully tuned variants of Landlord (a.k.a. Greedy-Dual Size) outperform the original algorithms [23]. In this context, it is worth noting that the above algorithm can be adjusted, or tuned, in various ways while keeping its competitive ratio.

First, there is flexibility in how the algorithm handles “free” requests—requests to items that are already in the cache. When the algorithm is responding to request r t , let t′ be the most recent time that item was requested but was not in the cache at the time of the request. Let F(t)={s ∣ t′<s<t, r s =r t } denote the times of these recent free requests to the item. Worst-case sequences have no free requests, and, although each free request r s costs nothing, the analysis in the proof above charges x s for it anyway.

The algorithm in the proof stops the step for the current constraint S t (C) and removes an item s from the cache C when some x s reaches \(\operatorname{\mathsf{cost}}(r_{s}, \gamma)\). Modify the algorithm to stop the step (and remove s from C) sooner, specifically, when x s reaches \(\operatorname{\mathsf{cost}}(r_{s},\gamma) - \sum_{s'\in F(s)} x_{s'}\) for some s∈C (effectively reducing the eviction cost of r s by \(\sum_{s'\in F(s)} x_{s'}\)). The modified algorithm is still a specialization of Algorithm 1. Although the resulting solution (γ,x) may be infeasible, the approximation guarantee still applies, in that (γ,x) has cost at most \(\varDelta \,\operatorname {\mathsf {opt}}\). The online solution \(\mathcal{A}\) is feasible though, has cost equal to the cost of (γ,x), and is thus Δ-competitive.

In the above description, each free request is used to reduce the effective cost of a later request to the same item. Whereas the unmodified algorithm generalizes LRU, the modified algorithm generalizes FIFO.

Even more generally, the sum over the free requests r s of x s can be arbitrarily distributed over the non-free requests to reduce their effective costs (leading to earlier eviction). Essentially the same analysis still shows k-competitiveness.

There is a second, independent source of flexibility—the rates at which the variables are increased in each step. As it specializes for File Caching, the algorithm in the proof raises each x s at unit rate until x s reaches \(\operatorname {\mathsf{cost}}(r_{s})\). This raises the total cost c(x,γ) at rate ∑ sC−{t}1≤k, while (in the analysis of Algorithm 1) the residual cost of \(\operatorname {\mathsf {opt}}\) decreases at rate at least 1, implying a competitive ratio of k. In contrast, Landlord (effectively) raises each x s at rate \(\operatorname{\mathsf{size}}(r_{s})\) until x s reaches \(\operatorname{\mathsf{cost}}(r_{s})\). This raises c(x,γ) more rapidly, at rate \(\sum_{s\in C-\{t\}}\operatorname{\mathsf{size}}(r_{s})\), but this sum is also at most k (since all summed items fitted in the cache before r t was brought in). This implies the (known) competitive ratio of k for Landlord. Generally, for items of size larger than 1, the algorithm could raise x s at any rate in \([1,\operatorname {\mathsf{size}}(r_{s})]\). The more general algorithm still has competitive ratio at most k.

Analogous adjustments can be made in other applications of Algorithm 1. For some applications, adjusting the variables’ relative rates of increase can lead to stronger theoretical bounds.

4 Stateless Online Algorithm and Randomized Generalization of Algorithm 2

This section describes two randomized algorithms for Submodular-Cost Covering: Algorithm 3—a stateless Δ-competitive online algorithm, and an algorithm that generalizes both that and Algorithm 2. For simplicity, in this section we assume each U j has finite cardinality. (The algorithms can be generalized in various ways to arbitrary closed U j , but the presentation becomes more technical.Footnote 13)

Algorithm 3 generalizes the Harmonic k-server algorithm as it specializes for Paging and Caching [56], and Pitt’s weighted vertex cover algorithm [4].

Definition 5

(Stateless online algorithm)

An online algorithm for a (non-canonical) Submodular-Cost Covering instance \((c,U,\mathcal{C})\) is stateless provided the only state it maintains is the current solution x, in which each x j is assigned only values in U j .

Although Algorithm 1 and Algorithm 2 maintain only the current partial solution \(x\in\mathbb{R}_{\ge0}^{n}\), for problems with variable-domain restrictions x j may take values outside U j . So these algorithms are not stateless.Footnote 14

The stateless algorithm initializes each x j to minU j . Given any constraint S, it repeats the following until S is satisfied: it chooses a random subset \(J\subseteq\operatorname{\mathsf{vars}}(S)\), then increases each x j for j∈J to its next allowed value, min{α∈U j ∣ α>x j }. The subset J can be any random subset such that, for some β≥0, for each \(j\in\operatorname{\mathsf{vars}}(S)\), Pr[j∈J] equals β/β j , where β j is the increase in c(x) that would result from increasing x j to its next allowed value.

For example, one could take J={r} where r is chosen so that Pr[r=j]∝1/β j . Or take any β≤min j β j , then, independently for each \(j\in\operatorname{\mathsf{vars}}(S)\), take j in J with probability β/β j . Or, choose τ∈[0,1] uniformly, then take J={j ∣ β/β j ≥τ}.
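These three sampling schemes can be sketched as follows (the function names are ours; each returns a random J with Pr[j∈J]=β/β j under the stated condition on β):

```python
import random

def sample_single(betas):
    """J = {r} with Pr[r = j] proportional to 1/beta_j
    (implicitly, beta = 1 / sum_j (1/beta_j))."""
    js = list(betas)
    return {random.choices(js, weights=[1.0 / betas[j] for j in js])[0]}

def sample_independent(betas, beta):
    """Each j joins J independently with probability beta/beta_j
    (requires beta <= min_j beta_j)."""
    return {j for j in betas if random.random() < beta / betas[j]}

def sample_threshold(betas, beta):
    """One uniform threshold tau in [0,1]; J = {j : beta/beta_j >= tau}.
    Pr[j in J] = Pr[tau <= beta/beta_j] = beta/beta_j."""
    tau = random.random()
    return {j for j in betas if beta / betas[j] >= tau}
```

The threshold scheme correlates the inclusions (cheap-to-raise variables are included whenever expensive ones are), while the independent scheme does not; both satisfy the marginal condition.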

(Figure: pseudocode for Algorithm 3.)

In the case that each U j ={0,1} and c is linear, one natural special case of the algorithm is to repeat the following as long as there is some unsatisfied constraint S:

Choose a single \(k\in\{ j \mid j\in\operatorname{\mathsf {vars}}(S), x_{j}=0\}\) at random, so that Pr[k=j]∝1/c j . Set x k =1.
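For instance, when each constraint is an edge constraint x u +x v ≥1 on a graph, this special case is exactly a randomized rule for weighted vertex cover (in the spirit of Pitt’s algorithm, which the text notes Algorithm 3 generalizes). A sketch, with names that are ours:

```python
import random

def randomized_vertex_cover(edges, c):
    """{0,1} linear special case: for each uncovered edge, set one endpoint
    to 1, chosen with probability proportional to 1/c_j."""
    x = {v: 0 for v in c}
    for (u, v) in edges:
        if x[u] == 0 and x[v] == 0:            # constraint x_u + x_v >= 1 unmet
            k = random.choices([u, v], weights=[1 / c[u], 1 / c[v]])[0]
            x[k] = 1
    return {v for v in x if x[v] == 1}
```

Whatever the random choices, the returned set covers every edge; the Δ=2 guarantee bounds its expected cost by twice the optimum.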

Theorem 3

(Correctness of stateless Algorithm 3)

For online Submodular-Cost Covering with finite variable domains, Algorithm 3 is stateless. If the step sizes are chosen so the number of iterations has finite expectation (e.g. taking β=Ω(min j β j )), then it is Δ-competitive (in expectation).

Proof

By inspection the algorithm maintains each x j U j . It remains to prove Δ-competitiveness.

Consider any iteration of the repeat loop. Let x and x′, respectively, denote x before and after the iteration. Let β and β j be as in the algorithm.

First we observe that the iteration increases the cost of the algorithm’s solution x by at most βΔ in expectation:

Claim 1

Cost c(x) increases by at most \(\sum_{j\in\operatorname{\mathsf{vars}}(S)} (\beta/\beta_{j}) \beta_{j}= \beta|\operatorname{\mathsf{vars}}(S)| \le\beta \varDelta \) in expectation.

The claim follows easily by direct calculation and the submodularity of c.

Inequality (1) from the proof of Theorem 1 still holds: \(\tilde{c}_{x}(y) - \tilde{c}_{x'}(y) \ge c(x'\wedge y) - c(x)\), so the next claim implies that the residual cost of any feasible y≥x decreases by at least β in expectation:

Claim 2

For any feasible y≥x, E J [c(x′∧y)−c(x)∣x]≥β.

Proof of Claim

By Observation 2, there is a \(k\in\operatorname{\mathsf{vars}} (S)\) with y k >x k . Since y k ∈U k , the algorithm’s choice of \(\hat{x}_{k}\) ensures \(y_{k} \ge\hat{x}_{k}\). Let z be obtained from x by raising just x k to \(\hat{x}_{k}\). With probability β/β k , the subroutine raises x k to \(\hat{x}_{k}\le y_{k}\), in which case c(x′∧y)−c(x)≥c(z)−c(x)=β k . This implies E J [c(x′∧y)−c(x)∣x]≥(β/β k )β k =β, proving Claim 2.

Thus, for y≥x, in each iteration, the residual cost of y decreases by at least β in expectation: \(E_{J}[\tilde{c}_{x}(y) -\tilde{c}_{x'}(y) \mid x] \ge\beta\). By the argument at the end of the proof of Theorem 1, this implies the same for all feasible y (even if \(y\not\ge x\)).

In sum, the iteration increases the cost of x by at most Δβ in expectation, while decreasing the residual cost of any feasible y by at least β in expectation. By standard probabilistic arguments, this implies that the expected final cost of x is at most Δ times the initial residual cost of y (which equals the cost of y).

Formally, \(c(x^{t}) + \varDelta \tilde{c}_{x^{t}}(y)\) is a super-martingale, where random variable x t denotes x after t iterations.

Let random variable T be the number of iterations. Using, respectively, \(\tilde{c}_{x^{T}}(y) \ge0\), a standard optional stopping theorem, and \(\tilde{c}_{x^{0}}(y) = c(y) - c(x^{0})\) (because x 0≤y), the expected final cost E[c(x T)] is at most

$$E\bigl[c\bigl(x^{T}\bigr) + \varDelta \tilde{c}_{x^{T}}(y)\bigr]\le E\bigl[c\bigl(x^{0}\bigr) + \varDelta \tilde{c}_{x^{0}}(y)\bigr] = c\bigl(x^{0}\bigr) + \varDelta \bigl(c(y) - c\bigl(x^{0}\bigr) \bigr) \le~\varDelta c(y).$$

 □

Most General Randomized Algorithm

Algorithm 2 raises the variables continuously, whereas Algorithm 3 steps each variable x j through the successive values in U j . For some instances, both of these choices can lead to slow running times. Next is an algorithm that generalizes both of these algorithms. The basic algorithm is simple, but the condition on β is more subtle. The analysis is a straightforward technical generalization of the previous analyses.

The algorithm has more flexibility in increasing variables. This may be important in distributed or parallel applications, where the flexibility allows implementing the algorithm so that it is guaranteed to make rapid (probabilistic) progress. (The flexibility may also be useful for dealing with limited-precision arithmetic.)

The algorithm is Algorithm 2, modified to call subroutine \(\operatorname{\mathsf{random\_step}}_{c}(x,S)\) (Algorithm 4, below) instead of \(\operatorname{\mathsf{step}}_{c}(x,S)\) to augment x in each iteration.

(Figure: pseudocode for Algorithm 4.)

The step-size requirement is a bit more complicated.

Theorem 4

(Correctness of randomized algorithm)

For Submodular-Cost Covering suppose, in each iteration of the randomized algorithm for a constraint \(S\in\mathcal{C}\) and \(x\not\in S\), the step size β≥0 is at most

$$ \min \bigl\{ E_J \bigl[c\bigl(x\uparrow_{J}^{y}\bigr)-c(x) \bigr] : y\ge x; y\in S \bigr\},$$
(2)

where \(x\uparrow_{J}^{y}\) is a random vector obtained by choosing a random subset J from the same distribution used in Line 4 of \(\operatorname {\mathsf{random\_step}}\) and then raising x j to y j for j∈J. Suppose also that the expected number of iterations is finite. Then the algorithm returns a Δ-approximate solution in expectation.

Note that if each p j =1, then (2) simplifies to \(\tilde{c}_{x}(S)\). If c is linear, (2) simplifies to \(\tilde{c}'_{x}(S)\) where \(c'_{j} = p_{j} c_{j}\).

Proof

The proof mirrors the proof of Theorem 3.

Fix any iteration. Let x and x′, respectively, denote x before and after the iteration. Let p, β, \(\hat{x}\), and J be as in \(\operatorname{\mathsf {random\_step}}\).

Claim 1

The expected increase in c(x) is

$$E_J\bigl[c\bigl(x'\bigr)-c(x)|x\bigr] \le \sum _{j\in\operatorname{\mathsf{vars}}(S)} p_j \beta/p_j =\beta|\operatorname{\mathsf{vars}}(S)| \le\beta \varDelta .$$

The claim follows easily by calculation and the submodularity of c.

Inequality (1) from the proof of Theorem 1 still holds: \(\tilde{c}_{x}(y) - \tilde{c}_{x'}(y) \ge c(x'\wedge y) - c(x)\), so the next claim implies that the residual cost of any feasible y≥x decreases by at least β in expectation:

Claim 2

For any feasible y≥x, E J [c(x′∧y)−c(x)∣x]≥β.

Proof of Claim

The structure of the proof is similar to the corresponding part of the proof of Theorem 1.

Recall that if y is feasible, then there must be at least one x k with \(k\in\operatorname{\mathsf{vars}}(S)\) and x k <y k .

Subcase 1:

When also there is an \(\hat{x}_{k} < y_{k}\) for \(k\in\operatorname {\mathsf{vars}}(S)\) with p k >0.

In case of the event k∈J, raising x to x′∧y raises x k to \(\hat{x}_{k}\), which alone (by Algorithm 4) costs β/p k .

Thus, the expected cost to raise x to x′∧y is at least Pr[k∈J] β/p k =β.

Subcase 2:

Otherwise, \(\hat{x}_{j} \ge y_{j}\) for all j∈J (for all possible J).

In this case, \(x'\wedge y \ge x\uparrow_{J}^{y}\) in all outcomes.

Thus, the expected cost to increase x to x′∧y is at least the expected cost to increase x to \(x\uparrow_{J}^{y}\).

By the assumption in the theorem, this is at least β. This proves Claim 2.

Claims 1 and 2 imply Δ-approximation via the argument in the final paragraphs of the proof of Theorem 3.  □

5 Relation to Local-Ratio Method

The local-ratio method has most commonly been applied to problems with variables taking values in {0,1} and with linear objective function cx (see [4, 6, 8, 9]; for one exception, see [7]). For example, [8] shows a form of equivalence between the primal-dual method and the local-ratio method, but that result only considers problems with solution space {0,1}n (i.e., 0/1-variables). Also, the standard intuitive interpretation of local-ratio—that the algorithm reduces the coefficients in the cost vector c—works only for 0/1-variables.

Here we need to generalize to more general solution spaces. To begin, we first describe a typical local-ratio algorithm for a problem with variables over {0,1} (we use CIP-01). After that, we describe one way to extend the approach to more general variable domains. With that extension in place, we then recast Theorem 1 (the approximation ratio for Algorithm 2) as a local-ratio analysis.

Local-Ratio for {0,1} Variable Domains

Given a (non-canonical) Linear-Cost Covering instance \((c,U,\mathcal{C})\) where each U j ={0,1}, the standard local-ratio approach gives the following Δ-approximation algorithm:

Initialize vector ℓ=c. Let “the cost of x under ℓ” be ∑ j ℓ j x j . Let \(\hat{x}(\ell)\) be the maximal x∈{0,1}n that has zero cost under ℓ (i.e., \(\hat{x}_{j}(\ell)=1\) if ℓ j =0). As long as \(\hat{x}(\ell)\) fails to meet some constraint \(S\in\mathcal {C}\) , repeat the following: Until \(\hat{x}(\ell) \in S\) , simultaneously for all \(j\in\operatorname{\mathsf{vars}}(S)\) with ℓ j >0, decrease ℓ j at unit rate. Finally, return \(\hat{x}(\ell)\) .

The algorithm has approximation ratio \(\varDelta =\max_{S}|\operatorname{\mathsf{vars}}(S)|\) by the following argument. Fix the solution x a returned by the algorithm. An iteration for a constraint S decreases \(\ell_{j} x^{a}_{j}\) for each \(j\in \operatorname{\mathsf{vars}}(S)\) at rate \(x^{a}_{j} \le1\), so it decreases the cost of x a under ℓ at rate at most Δ. On the other hand, in any feasible solution x ∗ , as long as the costs ℓ j for \(j\in\operatorname{\mathsf{vars}}(S)\) are being decreased, at least one \(j\in\operatorname{\mathsf{vars}}(S)\) with ℓ j >0 has \(x^{*}_{j}=1\) (otherwise \(\hat{x}(\ell)\) would be in S). Thus the iteration decreases the cost of x ∗ under ℓ at rate at least 1. From this it follows that c⋅x a ≤Δ c⋅x ∗ (details are left as an exercise).
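A runnable sketch of this {0,1} local-ratio algorithm follows, with covering constraints represented (our assumption, for concreteness) as "at least `need` of the variables in `vars_S` must be 1":

```python
def local_ratio_01(c, constraints):
    """Local-ratio for {0,1} domains and linear cost c.
    constraints: list of (vars_S, need), meaning sum_{j in vars_S} x_j >= need.
    Maintains reduced costs ell; xhat() is the maximal zero-cost solution."""
    ell = dict(c)
    def xhat():
        return {j for j in ell if ell[j] == 0}
    for (vars_S, need) in constraints:
        while len(xhat() & set(vars_S)) < need:
            pos = [j for j in vars_S if ell[j] > 0]
            d = min(ell[j] for j in pos)   # lower all positive ell_j by d
            for j in pos:
                ell[j] -= d
    return xhat()
```

On c = {0:3, 1:1, 2:2} with constraints "one of {0,1}" and "one of {0,2}", the algorithm returns {0,1,2} of cost 6, within Δ=2 of the optimum {0} of cost 3.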

This local-ratio algorithm is the same as Algorithm 1 for the case U={0,1}n (and linear cost). To see why, observe that the modified cost vector ℓ in the local-ratio algorithm is implicitly keeping track of the residual problem for x in Algorithm 1. When the local-ratio algorithm reduces a cost ℓ j at unit rate, for the same j, Algorithm 1 increases x j at rate 1/c j . This maintains the mutual invariant (∀j) ℓ j =c j (1−x j )—that is, ℓ j is the cost to raise x j the rest of the way to 1. Thus, as they proceed together, the CIP-01 instance \((\ell, \mathcal{C})\) defined by the current (lowered) costs ℓ is exactly the residual problem \(({\tilde{c}_{x}}, \mathcal{C})\) for the current x in Algorithm 1. To confirm this, note that the cost of any y in the residual problem for x is \(\tilde{c}_{x}(y) = \sum_{j} c_{j}\max(y_{j}-x_{j}, 0) = \sum_{j : y_{j} = 1}c_{j} (1-x_{j})\), whereas in the local-ratio algorithm the cost for y under ℓ is \(\sum_{j:y_{j}=1} \ell_{j}\), and by the mutual invariant above these are equal.

So, at least for linear-cost covering problems with {0,1}-variable domains, we can interpret local-ratio via residual costs, and vice versa. On the other hand, residual costs extend naturally to more general domains. Is it possible to likewise extend the local-ratio cost-reduction approach? Simply reducing some costs ℓ j until some ℓ j =0 does not work—\(\ell_{j}x^{a}_{j}\) may decrease at rate faster than 1, and when ℓ j reaches 0, it is not clear which value x j should take in U j .

Local Ratio for More General Domains

One way to extend local-ratio to more general variable domains is as follows. Consider any (non-canonical) instance \((c,U,\mathcal{C})\) where c is linear. Assume for simplicity that each variable domain U j is the same: U j ={0,1,…,u} for some u independent of j, and that all costs c j are non-zero. For each variable x j , instead of maintaining a single reduced cost ℓ j , the algorithm will maintain a vector \(\ell_{j}\in\mathbb {R}_{\ge0}^{u}\) of reduced costs. Intuitively, ℓ jk represents the cost to increase x j from k−1 to k. (We are almost just reducing the general case to the 0/1 case by replacing each variable x j by multiple copies, but that alone doesn’t quite work, as it increases Δ by a factor of u.) Define the cost of any x∈{0,1,…,u}n under the current ℓ to be \(\sum_{j} \sum_{k=1}^{x_{j}} \ell_{jk}\). As a function of the reduced costs ℓ, define \(\hat{x}(\ell)\) to be the maximal zero-cost solution, i.e. \(\hat{x}_{j}(\ell) = \max\{ k \mid\sum_{i=1}^{k} \ell_{ji} = 0\}\).

The local-ratio algorithm initializes each ℓ jk =c j , so that the cost of any x under ℓ equals the original cost of x (under c). The algorithm then repeats the following until \(\hat{x}(\ell)\) satisfies all constraints.

  1. Choose any constraint S that \(\hat{x}(\ell)\) does not meet. Until \(\hat{x}(\ell)\in S\), do:

  2. Just until an ℓ jk reaches zero, for all \(j\in\operatorname{\mathsf{vars}}(S)\) with \(\hat{x}_{j}(\ell)< u\),

  3. simultaneously, lower \(\ell_{jk_{j}}\) at unit rate, where \(k_{j}= \hat{x}_{j}(\ell) + 1\).

Finally the algorithm returns \(\hat{x}(\ell)\) (the maximal x with zero cost under the final ℓ).

One can show that this algorithm is a Δ-approximation algorithm (for Δ w.r.t. the original CIP-UB instance) by the following argument. Fix x a and x ∗ to be, respectively, the algorithm’s final solution and an optimal solution. In an iteration for a constraint S, as ℓ changes, the cost of x a under ℓ decreases at rate at most Δ, while the cost of x ∗ under ℓ decreases at rate at least 1. We leave the details as an exercise.
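The three numbered steps above can be sketched as follows, with the same illustrative constraint representation as before ("the sum of x j over `vars_S` must reach `need`"; all names are ours):

```python
def local_ratio_general(c, u, constraints):
    """General-domain local-ratio: each variable has domain {0,...,u}.
    ell[j][k-1] is the reduced cost of raising x_j from k-1 to k; only the
    cost of each variable's *next* increment is lowered."""
    n = len(c)
    ell = {j: [c[j]] * u for j in range(n)}
    def xhat(j):
        # Maximal zero-cost value of x_j: count leading zero increment costs.
        k = 0
        while k < u and ell[j][k] == 0:
            k += 1
        return k
    for (vars_S, need) in constraints:
        while sum(xhat(j) for j in vars_S) < need:
            active = [j for j in vars_S if xhat(j) < u]
            ks = {j: xhat(j) for j in active}              # next increment of each j
            d = min(ell[j][ks[j]] for j in active)         # until one hits zero
            for j in active:
                ell[j][ks[j]] -= d
    return [xhat(j) for j in range(n)]
```

With c=[1,2], u=2, and the single constraint x 0 +x 1 ≥3, the cheap variable's increments zero out first, yielding x=(2,1).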

In fact, the above algorithm is equivalent to Algorithm 1 for CIP-UB. If the two algorithms are run in sync, at any given time, the instance with modified cost ℓ exactly captures the residual problem for Algorithm 1.

Local-ratio for Submodular-Cost Covering

The previous example illustrates the basic ideas underlying one approach for extending local-ratio to problems with general variable domains: decompose the cost into parts, one for each possible increment of each variable, then, to satisfy a constraint S, for each variable x j with \(j\in \operatorname{\mathsf{vars}}(S)\), lower just the cost for that variable’s next increment. This idea extends somewhat naturally even to infinite variable domains, and is equivalent to the residual-cost interpretation.

Next we tackle Submodular-Cost Covering in full generality. We recast the proof of Theorem 1 (the correctness of Algorithm 2) as a local-ratio proof. Formally, the minimum requirement for the local-ratio method is that the objective function can be decomposed into “locally approximable” objectives. The common cost-reduction presentation of local ratio described above gives one such decomposition, but others have been used (e.g. [7]). In our setting, the following local-ratio decomposition works. (We discuss the intuition after the lemma and proof.)

Lemma 2

(Local-ratio lemma)

Any algorithm returns a Δ-approximate solution x provided there exist T∈ℤ≥0 and \(c^{t}:\mathbb{R}_{\ge 0}^{n}\rightarrow\mathbb{R}_{\ge0}\) (for t=1,2,…,T) and \(r:\mathbb{R}_{\ge0}^{n}\rightarrow\mathbb {R}_{\ge0}\) such that

  1. (a)

    for any y, \(c(y) = \sum_{t=1}^{T} c^{t}(y) +r(y)\),

  2. (b)

    for all t, and any y and feasible x , c t(y)≤c t(x )Δ,

  3. (c)

    the algorithm returns x such that r(x)=0.

Proof

Properties (a)–(c) state that the cost function can be decomposed into parts, where, for each part c t(), any solution y is Δ-approximate, and, for the remaining part r(), the solution x returned by the algorithm has cost zero. Since x is Δ-approximate w.r.t. each c t(), and x has cost zero for the remaining part, x is Δ-approximate overall. Formally, let x be an optimal solution. By properties (a) and (c), (b), then (a), respectively,

$$c(x) = \sum_{t=1}^T c^t(x)\le \sum_{t=1}^T c^t\bigl(x^*\bigr) \varDelta + r\bigl(x^*\bigr)\varDelta = c\bigl(x^*\bigr)\varDelta .$$

 □

In local-ratio as usually presented, the local-ratio algorithm determines the cost decomposition as it proceeds. The only state maintained by the algorithm after iteration t is the “remaining cost” function ℓ t , defined by ℓ t (y)=c(y)−∑ s≤t c s (y). In iteration t, the algorithm determines some portion c t of ℓ t−1 satisfying Property (b) in the lemma and removes it from the cost. (This is the key step in designing the algorithm.) The algorithm stops when it has removed enough of the cost so that there is a feasible solution x a with zero remaining cost (ℓ T (x a )=0), then returns that x a (taking r=ℓ T for Property (c) in the lemma). By the lemma, this x a is a Δ-approximate solution.

For a concrete example, consider the local-ratio algorithm for the linear-cost, 0/1-variable case described at the start of this section. Let T be the number of iterations. For t=0,1,…,T, let \(\ell^{t}\) be the modified cost vector at the end of iteration t (so \(\ell^{0}\) is the original cost vector). Define \(c^{t}(y)=(\ell^{t-1}-\ell^{t})\cdot y\) to be the decrease in the cost of y due to the change in \(\ell\) in iteration t. Define \(r(y)=\ell^{T}\cdot y\) to be the cost of y under the modified cost vector at termination (so the returned solution \(x=\hat{x}(\ell^{T})\) has r(x)=0). It is easy to see that properties (a) and (c) hold. To see that property (b) holds, recall that in iteration t the algorithm reduces all \(\ell_{j}\) for \(j\in \operatorname{\mathsf{vars}}(S)\) with \(\ell_{j}>0\), simultaneously and continuously at unit rate. It raises each \(x_{j}\) to 1 when \(\ell_{j}\) reaches 0. It stops once x∈S. At most Δ of the \(\ell_{j}\)’s are being lowered at any time, so the rate of decrease in \(\ell\cdot y\) for any \(y\in\{0,1\}^{n}\) is at most Δ. But for any \(x^{*}\in S\), the rate of decrease in \(\ell\cdot x^{*}\) is at least 1, because at least one \(j\in\operatorname{\mathsf{vars}}(S)\) has \(x^{*}_{j} =1\) and \(\ell_{j} >0\) (otherwise x would be in S).
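To make this 0/1 example concrete, here is a minimal sketch (ours, not the paper's pseudocode) of the local-ratio algorithm for Set-Cover-style constraints, where each constraint S is satisfied once some variable in vars(S) equals 1; the continuous lowering of the ℓ_j's is simulated in discrete jumps to the next zero:

```python
def local_ratio_cover(c, constraints):
    """c[j]: cost of setting x_j = 1; each constraint S is a list of
    variable indices, satisfied once x_j = 1 for some j in S."""
    ell = list(c)                  # the modified cost vector ell^t
    x = [0] * len(c)
    for S in constraints:
        while not any(x[j] == 1 for j in S):        # x not yet in S
            active = [j for j in S if ell[j] > 0]   # lowered at unit rate
            delta = min(ell[j] for j in active)     # time until some ell_j hits 0
            for j in active:
                ell[j] -= delta
                if ell[j] == 0:
                    x[j] = 1       # raise x_j to 1 when ell_j reaches 0
    return x

# Vertex cover on the path 0-1-2 (so Delta = 2); the optimal cover is {1}.
cover = local_ratio_cover([3, 1, 3], [[0, 1], [1, 2]])
```

On this instance the first iteration zeroes ℓ_1, setting x_1=1, which also satisfies the second constraint, so the returned cover is optimal here (in general it is only guaranteed Δ-approximate).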

Next we describe how to generate such a decomposition of the cost c corresponding to a run of Algorithm 2 on an arbitrary Submodular-Cost Covering instance \((c,\mathcal{C})\). This gives an alternate proof of Theorem 1. The proof uses the previously described idea for extending local ratio to more general domains. Beyond that, it is slightly more complicated than the argument in the previous paragraph for two reasons: it handles submodular costs, and, more subtly, in an iteration for a constraint S, Algorithm 2 can increase variables more than enough to satisfy S (of course this is handled already in the previous analysis of Algorithm 2, which we leverage below).

Lemma 3

(Correctness of Algorithm 2 via local-ratio)

Algorithm 2, run on any instance \((c,\mathcal{C})\) of Submodular-Cost Covering, implicitly generates a cost decomposition {c t} and r as described in Lemma 2. Thus, Algorithm 2 gives a Δ-approximation.

Proof

Assume without loss of generality that c(0)=0. (Otherwise use cost function c′(x)=c(x)−c(0). Then c′(x) is still non-negative and non-decreasing, and, since Δ≥1, the approximation ratio for c′ implies it for c.)

Let x t denote Algorithm 2’s vector x after t iterations. Let T be the number of iterations.

Recall that \({\tilde{c}_{x^{t}}}\) is the cost in the residual problem \(({\tilde{c}_{x^{t}}},\mathcal{C})\) for x after iteration t: \(\tilde{c}_{x^{t}}(y) = c(x^{t}\vee y) - c(x^{t})\).

Define \(c^{t}\) so that the “remaining cost” function \(r^{t}\) (as discussed before the lemma) equals the objective \({\tilde{c}_{x^{t}}}\) in the residual problem for \(x^{t}\). Specifically, take \(c^{t}(y) = \tilde{c}_{x^{t-1}}(y) - \tilde{c}_{x^{t}}(y)\). Also define \(r(y)=\tilde{c}_{x^{T}}(y)\).

These \(c^{t}\) and r have properties (a)–(c) from Lemma 2.

Properties (a) and (c) follow by direct calculation. To show (b), fix any y. Then \(c^{t}(y)=c(x^{t})-c(x^{t-1})+c(x^{t-1}\vee y)-c(x^{t}\vee y)\le c(x^{t})-c(x^{t-1})\), where the inequality holds because c is non-decreasing and \(x^{t-1}\le x^{t}\). So \(c^{t}(y)\) is at most the increase in the cost c(x) of x during iteration t. In the proof of Theorem 1, this increase in c(x) in iteration t is shown to be at most Δβ. Also, for any feasible \(x^{*}\), the cost \(\tilde{c}_{x}(x^{*})\) for \(x^{*}\) in the residual problem for x is shown to reduce by at least β. But the reduction in \(\tilde{c}_{x}(x^{*})\) during iteration t is exactly \(c^{t}(x^{*})\). Thus, \(c^{t}(y)\le \varDelta \beta \le \varDelta c^{t}(x^{*})\), proving Property (b). □

Each \(c^{t}\) in the proof captures the part of the cost c lying “between” \(x^{t-1}\) and \(x^{t}\). For example, if c is linear, then \(c^{t}(y) = \sum_{j} c_{j}|[0,y_{j}]\cap[x_{j}^{t-1},x_{j}^{t}] |\). The choice of \(x^{t}\) in the algorithm guarantees property (b) in the lemma.
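The telescoping of this decomposition is easy to check numerically. The following sketch (our illustration, using an arbitrary linear cost and an arbitrary non-decreasing sequence x^0 = 0 ≤ x^1 ≤ … ≤ x^T) verifies property (a) and the non-negativity of each part:

```python
def c(v):                       # a linear cost with coefficients (2, 5, 1)
    return 2*v[0] + 5*v[1] + 1*v[2]

def residual(x, y):             # residual cost: c~_x(y) = c(x v y) - c(x)
    join = tuple(max(a, b) for a, b in zip(x, y))
    return c(join) - c(x)

xs = [(0, 0, 0), (1, 0, 0), (1, 2, 0), (1, 2, 3)]   # x^0 <= x^1 <= ... <= x^T
y = (2, 1, 1)
# c^t(y) = c~_{x^{t-1}}(y) - c~_{x^t}(y), and r(y) = c~_{x^T}(y).
parts = [residual(xs[t-1], y) - residual(xs[t], y) for t in range(1, len(xs))]
r = residual(xs[-1], y)
assert abs(sum(parts) + r - c(y)) < 1e-9    # property (a), since c(x^0) = 0
assert all(p >= -1e-9 for p in parts)       # each part is a non-negative cost
```

The sum telescopes to c~_{x^0}(y) = c(y) exactly because x^0 = 0 and c(0) = 0, which is why the proof assumes c(0) = 0 without loss of generality.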

6 Relation to Primal-Dual Method; Local Valid Inequalities

Next we discuss how Algorithm 1 can be reinterpreted as a primal-dual algorithm.

It is folklore that local-ratio and primal-dual algorithms are “equivalent”; for example [8] shows a formal equivalence between the primal-dual method and the local-ratio method. But that result only applies to problems with solution space {0,1}n (i.e., 0/1-variables), and the underlying arguments do not seem to extend directly to this more general setting.

Next we present two linear-program relaxations for Linear-Cost Covering, then use the second one to reprove Lemma 1 (that Algorithm 1 is a Δ-approximation algorithm for Linear-Cost Covering) using the primal-dual method.

Fix any Linear-Cost Covering instance \((c,\mathcal{C})\) in canonical form.

To simplify the presentation, assume at least one optimal solution to \((c,\mathcal{C})\) is finite (i.e., in \(\mathbb{R}_{\ge0}^{n}\)).

For any \(S\in\mathcal{C}\), let \(\overline{S}\) denote the complement of S in \(\bar{\mathbb{R}}_{\ge0}^{n}\). Let \(\overline{S}^{*}\) denote the closure of \(\overline{S}\) under limit.

By Observation 2, if x is feasible, then, for any \(S\in\mathcal{C}\) and \(y\in\overline{S}\), x meets the non-domination constraint \(x \not<_{{}_{S}}y\) (that is, \(x_{j}\ge y_{j}\) for some \(j\in\operatorname{\mathsf{vars}}(S)\)). By a limit argument,Footnote 15 the same is true if \(y\in\overline{S}^{*}\). In sum, if x is feasible, then x meets the non-domination constraint for every (S,y) where \(S\in\mathcal{C}\) and \(y\in\overline{S}^{*}\). For finite x, the converse is also true:

Observation 4

If \(x\in\mathbb{R}_{\ge0}^{n}\) meets the non-domination constraint for every \(S\in\mathcal{C}\) and \(y\in\overline{S}^{*}\), then x is feasible for \((c,\mathcal{C})\).

Proof

Assume x is not feasible. Fix an \(S\in\mathcal{C}\) with \(x\not\in S\). Define y(ε) by y j (ε)=x j +ε, so \(\lim_{\varepsilon\rightarrow0} y(\varepsilon) = x \not\in S\). Since S is closed under limit, \(y(\varepsilon')\not\in S\) for some ε′>0 (otherwise the limit x would lie in S). Since x is finite, x j <y j (ε′) for each \(j\in \operatorname{\mathsf{vars}}(S)\). Thus, \(x <_{{}_{S}}y(\varepsilon')\) (i.e., x fails to meet the non-domination constraint for (S,y(ε′))). □

First Relaxation

The non-domination constraints suggest this relaxation of \((c,\mathcal{C})\):

$$\operatorname {\mathsf {minimize}}\quad c\cdot x\quad \mbox{subject to}\quad \bigl(\forall S\in \mathcal{C}, y\in\overline{S}^*\bigr) \sum_{j\in\operatorname{\mathsf{vars}}(S)}{x_j}/{y_j}\ge 1.$$

Let \((c,\mathcal{R}^{1})\) denote this Linear-Cost Covering instance. Call it Relaxation 1.

Observation 5

Fix any \(x\in\mathbb{R}_{\ge0}^{n}\) that is feasible for \((c,\mathcal{R}^{1})\).

Then Δx is feasible for \((c,\mathcal{C})\).

Proof

Fix any \(S\in\mathcal{C}\) and \(y\in\overline{S}^{*}\).

Then \(\sum_{j\in\operatorname{\mathsf{vars}}(S)} x_{j}/y_{j} \ge1\). Thus, \(\max_{j\in\operatorname{\mathsf{vars}}(S)} x_{j}/y_{j} \ge 1/|\operatorname{\mathsf{vars}}(S)|\).

Thus, \(\max_{j\in\operatorname{\mathsf{vars}}(S)} \varDelta x_{j}/y_{j}\ge1\).

That is, Δx meets the non-domination constraint for (any) (S,y).

By Observation 4, Δx is feasible for \((c,\mathcal{C})\). □
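The coordinate-wise argument in this proof is easy to sanity-check numerically. The sketch below (ours) samples random x and y over vars(S) and confirms that whenever the Relaxation-1 inequality holds, some coordinate satisfies Δ·x_j ≥ y_j, with Δ = |vars(S)|:

```python
import random

def relaxation_implies_domination(x, y):
    """If sum_j x_j/y_j >= 1 over vars(S), then |vars(S)|*x_j >= y_j
    for some j (since the max ratio is at least the average)."""
    k = len(y)                            # here Delta = |vars(S)| = k
    if sum(xj / yj for xj, yj in zip(x, y)) < 1:
        return True                       # hypothesis vacuous for this (x, y)
    return any(k * xj >= yj for xj, yj in zip(x, y))

random.seed(0)
ok = all(relaxation_implies_domination(
             [random.uniform(0, 10) for _ in range(5)],
             [random.uniform(0.1, 10) for _ in range(5)])
         for _ in range(1000))
```

The check can never fail, for the reason given in the proof: the maximum of the k ratios x_j/y_j is at least their average, which is at least 1/k.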

Corollary 2

(Relaxation gap for first relaxation)

The relaxation gap Footnote 16 for \((c,\mathcal{R}^{1})\) is at most Δ.

Proof

Let x be a finite optimal solution for \((c,\mathcal{R}^{1})\). By Obs. 5, Δx is feasible for \((c,\mathcal{C})\), and has cost c⋅(Δx)=Δ(cx). Thus, the optimal cost for \((c,\mathcal{C})\) is at most Δ times the optimal cost for \((c,\mathcal{R}^{1})\). □

Incidentally, \((c,\mathcal{R}^{1})\) gives an ellipsoid-based Linear-Cost Covering Δ-approximation algorithm.Footnote 17

Linear-Cost Covering Reduces to Set Cover

From the Linear-Cost Covering instance \((c,\mathcal{C})\), construct an equivalent (infinite) Set Cover instance \((c',(E,\mathcal{F}))\) as follows. Recall the non-domination constraints: \(x\not<_{{}_{S}}y\) for each \(S\in\mathcal{C}\) and \(y\in\overline{S}^{*}\). Such a constraint is met if, for some \(j\in\operatorname{\mathsf{vars}}(S)\), x j is assigned a value \(r\ge y_{j}\). Introduce an element e=(S,y) into the element set E for each pair (S,y) associated with such a constraint. For each j∈[n] and r∈ℝ≥0, introduce a set s(j,r) into the set family \(\mathcal{F}\), such that set s(j,r) contains element (S,y) if assigning x j =r would ensure \(x\not<_{{}_{S}}y\) (i.e., would satisfy the non-domination constraint for (S,y)). That is, \(s(j,r) = \{ (S,y) \mid j\in\operatorname{\mathsf{vars}}(S),~r\ge y_{j}\}\). Take the cost of set s(j,r) to be \(c'_{jr} = rc_{j}\) (equal to the cost of assigning x j =r).
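For a feel of the construction, the membership test and cost of the sets s(j,r) can be written directly (a sketch with our own names; an element carries vars(S) and the witness y):

```python
def in_set(element, j, r):
    """Element e = (S, y) lies in s(j, r) iff j is in vars(S) and r >= y_j."""
    vars_S, y = element
    return j in vars_S and r >= y[j]

def set_cost(c, j, r):
    """Cost c'_{jr} of set s(j, r): the cost of assigning x_j = r."""
    return r * c[j]

# One element, for a constraint S with vars(S) = {0, 1} and witness y.
e = ({0, 1}, {0: 3.0, 1: 2.0})
```

Here assigning x_0 = 3 covers e (since 3 ≥ y_0 = 3), while x_0 = 2.9 does not, and no assignment to a variable outside vars(S) can cover e.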

Observation 6

(Reduction to Set Cover)

The Linear-Cost Covering instance \((c,\mathcal{C})\) is equivalent to the above Set Cover instance \((c',(E,\mathcal{F}))\). By “equivalent” we mean that each feasible solution x to \((c,\mathcal{C})\) corresponds to a set cover X for \((E,\mathcal{F})\) (where s(j,r)∈X iff x j =r) and, conversely, each set cover X for \((E,\mathcal{F})\) corresponds to a feasible solution x to \((c,\mathcal{C})\) (where x j =∑ r:s(j,r)∈X r). Each correspondence preserves cost.

The observation is a consequence of Observation 4.

Note that the above reduction increases Δ.

Second Relaxation, via Set Cover

Relaxation 2 is the standard LP relaxation of Set Cover, applied to the equivalent Set Cover instance \((c',(E,\mathcal {F}))\) above, with a variable X jr for each set \(s(j,r)\in\mathcal{F}\):

$$\operatorname {\mathsf {minimize}}\sum_{j,r} r c_jX_{jr} \quad \mbox{subject to}\quad \bigl(\forall S\in\mathcal{C}, y\in \overline{S}^*\bigr) \sum_{j\in\operatorname{\mathsf{vars}}(S)} \sum _{r \ge y_j} X_{jr} \ge 1.$$

(There is a technicality in the definition above—the index r of the inner sum ranges over [y j ,∞). Should one sum, or integrate, over r? Either can be appropriate—the problem and its dual will be well-defined and weak duality will hold either way. Here we restrict attention to solutions X with finite support, so we sum. The same issue arises in the dual below.)

We denote the above relaxation \((c',\mathcal{R}^{2})\). By Observation 6, any feasible solution x to \((c,\mathcal{C})\) gives a feasible solution to \((c',\mathcal{R}^{2})\) of the same cost (via X jr =1 if r=x j , and X jr =0 otherwise). Incidentally, any feasible solution X to \((c',\mathcal{R}^{2})\) also gives a solution x to \((c,\mathcal{R}^{1})\) of the same cost, via x j =∑ r rX jr . That is, Relaxation 1 is a relaxation of Relaxation 2. The converse is not generally true.Footnote 18

Dual of Set-Cover Relaxation

The linear-programming dual of Relaxation 2 is the standard Set Cover dual: fractional packing of elements under (capacitated) sets. We use a variable z e for each element e:

$$\operatorname {\mathsf {maximize}}\sum_{e\in E} z_e\quad \mbox{subject to}\quad \bigl(\forall s(j,r)\in\mathcal{F}\bigr) \sum _{e \in s(j,r)} z_e \le r c_j .$$

Recall \(E=\{(S,y) \mid S\in\mathcal{C}, y\in\overline{S}^{*}\}\); \(s(j,r) = \{ (S,y)\in E \mid j\in\operatorname{\mathsf{vars}}(S),~r\ge y_{j}\}\).

We now describe the primal-dual interpretation of Algorithm 1.

Lemma 4

(Primal-dual analysis of Algorithm 1)

Algorithm 1 can be augmented to compute, along with the solution x to \((c,\mathcal{C})\), a solution z to the dual of Relaxation 2 such that cx is at most Δ times the cost of z. Thus, Algorithm 1 is a Δ-approximation algorithm.

Proof

Initialize z=0. Consider an iteration of Algorithm 1 for some constraint S′. Let x and x′, respectively, be the solution x before and after the iteration. Fix element e′=(S′,x′). Augment Algorithm 1 to raiseFootnote 19 the dual variable \(z_{e'}\) by β. This increases the dual cost by β. Since the iteration increases the cost of x by at most βΔ, the iteration maintains the invariant that the cost of x is at most Δ times the dual cost.

To finish, we show the iteration maintains dual feasibility. For any element e=(S,y)∈E, let S(e) denote S. Increasing the dual variable z e by β maintains the following invariant:

$$\mbox{for all}\ j\in[n], \quad x_j c_j = \sum _{e : j\in \operatorname{\mathsf{vars}}(S(e))} z_e.$$

The invariant is maintained because \(z_{e'}\) occurs in the sum for j iff \(j\in \operatorname{\mathsf{vars}}(S(e'))=\operatorname{\mathsf{vars}}(S')\), and each x j is increased (by β/c j ) iff \(j\in\operatorname{\mathsf{vars}}(S')\), so the iteration increases both sides of the equation equally.

Now consider any dual constraint that contains the raised variable \(z_{e'}\). Fix the pair (j,r) defining the dual constraint. That e′∈s(j,r) implies \(j\in\operatorname{\mathsf{vars}}(S')\) and \(x'_{j} \le r\). Each dual variable z e that occurs in this dual constraint has \(j\in\operatorname{\mathsf{vars}}(S(e))\). But, by the invariant, at the end of the iteration, the sum of all dual variables z e with \(j\in \operatorname{\mathsf{vars}}(S(e))\) equals \(x'_{j} c_{j}\). Since \(x'_{j} \le r\), this sum is at most rc j . Thus, the dual constraint remains feasible at the end of the iteration. □
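The augmented algorithm of Lemma 4 can be sketched for the special case of constraints of the form max_{j∈vars(S)} x_j ≥ 1 (a simplification of ours; here the step size β is the exact residual cost of the constraint, so each constraint is settled in one step):

```python
def primal_dual_cover(c, constraints):
    """Algorithm 1 with duals, for constraints max_{j in S} x_j >= 1."""
    x = [0.0] * len(c)
    duals = []                                     # one (vars(S'), z_{e'}) per step
    for S in constraints:
        while max(x[j] for j in S) < 1:
            beta = min(c[j] * (1 - x[j]) for j in S)   # residual cost of S
            for j in S:
                x[j] += beta / c[j]                # the greedy step
            duals.append((set(S), beta))           # raise z_{e'} by beta
    return x, duals

c = [3.0, 1.0, 3.0]
x, duals = primal_dual_cover(c, [[0, 1], [1, 2]])
primal = sum(cj * xj for cj, xj in zip(c, x))
dual_value = sum(z for _, z in duals)
```

On this instance primal = 2 and dual_value = 1, matching the Δ=2 guarantee of the lemma, and the proof's invariant x_j c_j = Σ_{e : j∈vars(S(e))} z_e can be checked coordinate by coordinate.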

6.1 Valid Local Inequalities; the “Price of Locality”

Here is one general way of characterizing the analyses in this paper in terms of valid inequalities. Note that each of the valid inequalities that is used in Relaxation 1 from Sect. 6 can be obtained by considering some single constraint “xS” in isolation, and adding valid inequalities for just that constraint. Call such a valid inequality “local”. This raises the following question: What if we were to add all local valid inequalities (ones that can be obtained by looking at each S in isolation)? What can we say about the relaxation gap of the resulting polytope?

Formally, fix any Submodular-Cost Covering instance \(\min\{c(x) \mid x \in S\ \mbox{for all} \ S \in\mathcal{C}\}\). Consider the “local” relaxation \((c,\mathcal{L})\) obtained as follows. For each constraint \(S\in\mathcal{C}\), let \(\operatorname{\mathsf{conv}}(S)\) denote the convex closure of S. Then let \(\mathcal{L}= \{ \operatorname{\mathsf{conv}}(S) \mid S\in \mathcal{C}\}\). Equivalently, for each \(S\in\mathcal{C}\), let \(\mathcal{L}_{S}\) contain all of the linear inequalities on variables in \(\operatorname{\mathsf{vars}}(S)\) that are valid for S, then let \(\mathcal{L}= \bigcup_{S\in\mathcal{C}} \mathcal{L}_{S}\). For Linear-Cost Covering, Relaxation 1 above is a relaxation of \((c,\mathcal{L})\), so Corollary 2 implies that the gap is at most Δ. It is not hard to find examplesFootnote 20 showing that the gap is at least Δ.

Of course, if we add all (not just local) valid inequalities for the feasible region \(\bigcap_{S\in\mathcal{C}} S\), then every extreme point of the resulting feasible region is feasible for \((c,\mathcal{C})\), so the relaxation gap would be 1.

7 Fast Implementations for Special Cases of Submodular-Cost Covering

This section has a linear-time implementation of Algorithm 2 for Facility Location (and thus also for Set Cover and Vertex Cover), a nearly linear-time implementation for CMIP-UB, and an \(O(N\widehat{\varDelta }\log \varDelta )\)-time implementation for two-stage probabilistic CMIP-UB. (Here N is the number of non-zeroes in the constraint matrix and \(\widehat{\varDelta }\) is the maximum, over all variables x j , of the number of constraints that constrain that variable.) The section also introduces a two-stage probabilistic version of Submodular Covering, and shows that it reduces to ordinary Submodular Covering.

For Facility Location, Δ is the maximum number of facilities that might serve any given customer. For Set Cover, Δ is the maximum set size. For Vertex Cover, Δ=2.

7.1 Linear-Time Implementations for Facility Location, Set Cover, and Vertex Cover

The standard integer linear program for Facility Location is not a covering linear program due to constraints of the form “\(x_{ij}\le y_{j}\)”. Also, the standard reduction of Facility Location to Set Cover increases Δ exponentially. For these reasons, we formulate Facility Location directly as the following special case of Submodular-Cost Covering, taking advantage of submodular cost:

$$\operatorname {\mathsf {minimize}}\ \sum_{j} \Bigl[ f_{j} \max_{i} x_{ij} + \sum_{i} d_{ij} x_{ij} \Bigr]\quad \mbox{subject to}\quad x\in\bar{\mathbb{R}}_{\ge0}^{m\times n};\quad \bigl(\forall i\in[m]\bigr)\ \max_{j\in N(i)} x_{ij} \ge1.$$

Above, j∈N(i) if customer i can use facility j (\(N(i) = \operatorname{\mathsf{vars}}(S_{i})\), where S i is the constraint above for customer i).

Theorem 5

(Linear-time implementations)

For (non-metric) Facility Location, Set Cover, and Vertex Cover, the greedy Δ-approximation algorithm (Algorithm 2) has a linear-time implementation.

Proof

The implementation is as follows.

  1. Start with all x ij =0. Then, for each customer i, in any order, do steps 2 and 3:

  2. Let \(\beta=\min_{j\in N(i)}\,[d_{ij} +f_{j} (1-\max_{i'} x_{i'j})]\) (the minimum cost to raise x ij to 1 for any j∈N(i)).

  3. For each j∈N(i), raise x ij by \(\min[\beta/d_{ij} ,\ (\beta+f_{j} \max_{i'} x_{i'j})/(d_{ij} +f_{j} )]\).

  4. Assign each customer i to any facility j(i) with x ij(i)=1.

  5. Open the facilities that have customers.

Step 3 raises the x ij ’s just enough to increase the cost by β per raised x ij and to increase \(\max_{j\in N(i)} x_{ij}\) to 1.

By maintaining, for each facility j, max i x ij , the implementation can be done in linear time, O(∑ i |N(i)|).

Set Cover is the special case when d ij =0; Vertex Cover is the further special case Δ=2. □
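The five steps above can be rendered directly in code (a sketch in our own notation, assuming d_ij + f_j > 0 for every usable pair; M[j] caches max_{i′} x_{i′j} per facility, which is what makes the implementation linear-time):

```python
def facility_location(f, d, N):
    """f[j]: opening cost; d[i][j]: assignment cost; N[i]: facilities
    usable by customer i. Returns (assignment, open facilities)."""
    x = {}                          # x[i, j]; only touched pairs are stored
    M = [0.0] * len(f)              # M[j] = max over customers of x[i, j]
    assign = {}
    for i, Ni in enumerate(N):
        # Step 2: cheapest way to raise some x[i, j] to 1.
        beta = min(d[i][j] + f[j] * (1 - M[j]) for j in Ni)
        # Step 3: raise each x[i, j] so that each raise costs exactly beta.
        for j in Ni:
            below = beta / d[i][j] if d[i][j] > 0 else float('inf')
            above = (beta + f[j] * M[j]) / (d[i][j] + f[j])
            x[i, j] = min(below, above)     # never exceeds 1
            M[j] = max(M[j], x[i, j])
        # Step 4: some j attains the min in step 2 and reaches 1.
        assign[i] = next(j for j in Ni if x[i, j] >= 1 - 1e-9)
    return assign, sorted(set(assign.values()))     # step 5

assign, opened = facility_location(
    [2.0, 1.0], [[0.5, 3.0], [0.5, 0.2]], [[0, 1], [0, 1]])
```

In step 3, `min(below, above)` picks the first expression exactly when the raise stays below the current maximum M[j] (costing d_ij per unit), and the second when it goes above (costing d_ij + f_j per unit beyond M[j]); in both cases the raise costs exactly β.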

7.2 Nearly Linear-Time Implementation for CMIP-UB

This section describes a nearly linear-time implementation of Algorithm 2 for Covering Mixed Integer Linear Programs with upper bounds on the variables (CMIP-UB), that is, problems of the form

$$\min \bigl\{c\cdot x \mid x\in\mathbb{R}_{\ge0}^n; Ax \ge B;x\le u;\ (\forall j\in I)~x_j\in\mathbb{Z} \bigr\},$$

where \(c\in\mathbb{R}_{\ge0}^{n}\), \(A\in\mathbb{R}_{\ge0}^{m\times n}\), and \(B\in\mathbb{R}_{\ge0}^{m}\) have no negative entries. The set I contains the indices of the variables that are restricted to take integer values, while \(u\in\bar{\mathbb{R}}_{\ge0}^{n}\) gives the upper bounds on the variables. Δ is the maximum number of non-zeroes in any row of A. We prove the following theorem:

Theorem 6

(Implementation for CMIP-UB)

For CMIP-UB, Algorithm 2 can be implemented to return a Δ-approximation in O(NlogΔ) time, where N is the total number of non-zeroes in the constraint matrix.

Proof

Fix any CMIP-UB instance as described above. For each constraint A i x≥B i (each row of A), do the following. For presentation (to avoid writing the subscript i), rewrite the constraint as a⋅x≥b (where a=A i and b=B i ). Then bring the constraint into canonical form, as follows. Assume for simplicity of presentation that the integer-valued variables in S come before the other variables (that is, \(I\cap\operatorname{\mathsf{vars}}(S) = \{1,2,\ldots,\ell \}\) for some ℓ). Assume for later in the proof that these variables are ordered so that \(a_{1}\ge a_{2}\ge\cdots\ge a_{\ell}\). (These assumptions are without loss of generality.) Now incorporate the variable-domain restrictions (x≤u and (∀j∈I) x j ∈ℤ) into the constraint by rewriting it as the canonical constraint S:

$$\sum_{j=1}^{\ell} a_{j} \bigl\lfloor \min(x_{j},u_{j}) \bigr\rfloor + \sum_{j > \ell} a_{j} \min(x_{j},u_{j}) \ge b.$$

Let \(\mathcal{C}\) be the collection of such canonical constraints, one for each original covering constraint A i xB i .

Intuition

The algorithm focuses on a single unsatisfied \(S\in\mathcal{C}\), repeating an iteration of Algorithm 2 (raising the variables x j for \(j\in\operatorname{\mathsf{vars}}(S)\)) until S is satisfied. It then moves on to another unsatisfied S, and so on, until all constraints are satisfied. While working with a particular constraint S, it increases each x j for \(j\in\operatorname{\mathsf{vars}}(S)\) by β/c j for some β. We must choose \(\beta\le\tilde{c}_{x}(S)\) (the optimal cost to augment x to satisfy S), thus each step requires some lower bound on \(\tilde{c}_{x}(S)\). But the steps must also be large enough to satisfy S quickly.

For intuition, consider first the case when S has no variable upper bounds (each u j =∞) and no floors. In this case, the optimal augmentation of x to satisfy S simply raises the single most cost-effective variable x j (minimizing a j /c j ) to satisfy S, so \(\tilde{c}_{x}(S)\) is easy to calculate exactly and taking \(\beta=\tilde{c}_{x}(S)\) satisfies S in one iteration.

Next consider the case when S has some variable upper bounds (finite u j ). In this case, we take β to be the minimum cost to either satisfy S or bring some variable to its upper bound (we call this saturating the variable). This β is easy to calculate, and will satisfy S after at most \(|\operatorname{\mathsf{vars}}(S)|\) iterations (as each variable can be saturated at most once).

Finally, consider the case when S also has floors. This complicates the picture considerably. The basic idea is to relax (remove) the floors, satisfy the relaxed constraint as described above, and then reintroduce the floors one by one. We reintroduce a floor only once the constraint without that floor is already satisfied. This ensures that the constraint with the floor will be satisfied if the term with the floor increases even once. (If the term for a floored variable x j increases, we say x j is bumped.) We also reintroduce the floors in a particular order—in order of decreasing a j . This ensures that introducing one floor (which lowers the value of the left-hand side) does not break the property just stated for previously reintroduced floors.

The above approach ensures that S will be satisfied in \(O(|\operatorname{\mathsf{vars}}(S)|)\) iterations. A careful but straightforward use of heaps allows all the iterations for S to be done in \(O(|\operatorname{\mathsf{vars}}(S)|\log \varDelta )\) time. This will imply the theorem.

Here are the details. To specify the implementation of Algorithm 2, we first specify how, in each iteration, for a given constraint \(S\in\mathcal{C}\) and \(x\not\in S\), the implementation chooses the step size β. It starts by finding a relaxation S h of S (that is, \(S\subseteq S^{h}\), so \(\tilde{c}_{x}(S^{h}) \le\tilde{c}_{x}(S)\)). Having chosen the relaxation, the algorithm then takes β to be the minimum cost needed to raise any single variable x j (with \(j\in\operatorname{\mathsf{vars}}(S)\)) just enough to either satisfy the relaxation S h or to cause x j =u j .

The relaxation S h is as follows. Remove all floors from S, then add in just enough floors (from left to right), so that the resulting constraint is unsatisfied. Let S h be the resulting constraint, where h is the number of floors added in. Formally, for h=0,1,…,ℓ, define \(f^{h}(x) = \sum_{j=1}^{h} a_{j} \lfloor\min(x_{j},u_{j}) \rfloor+ \sum_{j > h} a_{j} \min(x_{j},u_{j})\) to be the left-hand side of constraint S above, with only the first h floors retained. Then fix \(h=\min\{h'\ge0 \mid f^{h'}(x)<b\}\), and take \(S^{h}=\{x\mid f^{h}(x)\ge b\}\).
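The relaxation index is cheap to compute. Here is an illustrative rendering (our code, not the paper's) of f^h and the choice of h for a canonical constraint whose first ℓ variables are the floored ones:

```python
import math

def f_h(a, u, x, h):
    """Left-hand side of S with only the first h floors retained."""
    return sum(a[j] * (math.floor(min(x[j], u[j])) if j < h
                       else min(x[j], u[j]))
               for j in range(len(a)))

def relaxation_index(a, u, x, ell, b):
    """The h of the text: smallest h with f^h(x) < b.
    Returns None if x already satisfies S (i.e., f^ell(x) >= b)."""
    for h in range(ell + 1):
        if f_h(a, u, x, h) < b:
            return h
    return None

# Example: a = (3, 2, 1) with the first ell = 2 coefficients floored.
a, u, x = [3.0, 2.0, 1.0], [5.0, 5.0, 5.0], [2.9, 1.5, 0.2]
```

Since f^0 ≥ f^1 ≥ … ≥ f^ℓ (each added floor can only lower the left-hand side), the linear scan finds the minimum h correctly; in the example f^0(x) = 11.9 ≥ 10 but f^1(x) = 9.2 < 10, so h = 1.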

Next we show that this β satisfies the constraint in Algorithm 2.

Lemma 5

(Validity of step size)

For S, \(x\not\in S\), and β as described above, \(\beta\in\ [0,\tilde{c}_{x}(S)]\).

Proof

As \(S\subseteq S^{h}\), it suffices to prove \(\beta\leq\tilde{c}_{x}(S^{h})\). Recall that a variable x j is saturated if x j =u j . Focus on the unsaturated variables in \(\operatorname{\mathsf{vars}}(S)\). We must show that if we wish to augment (increase) some variables just enough to saturate a variable or bring x into S h, then we can achieve this at minimum cost by increasing a single variable. This is certainly true if we saturate a variable: only that variable needs to be increased. A special case of this is when some c j is 0—we can saturate x j at zero cost, which is minimum. Therefore, consider the case where all c j ’s are positive and the variable increases bring x into S h.

Let P be the set of unsaturated variables in \(\{x_{1},\ldots,x_{h}\}\), and let Q be the set of unsaturated variables among \(\{x_{j} \mid j>h\}\). Consider increasing a variable x j ∈P. Until x j is bumped (i.e., until the term ⌊x j ⌋ increases to ⌊x j ⌋+1 because x j reaches its next higher integer), f h(x) remains unchanged, but the cost increases. Thus, if it is optimal to increase x j at all, x j must be bumped. When x j is bumped, f h(x) jumps by a j , which (by the ordering of coefficients) is at least a h , which (by the choice of h) is sufficient to bring x into S h. Thus, if the optimal augmentation increases a variable in P, then the only variable that it increases is that one variable, which is bumped once.

The only remaining case is when the optimal augmentation of x increases only variables from Q. Let \(x_{k} = \operatorname{argmin}\{c_{j}/a_{j} \mid x_{j}\in Q\}\). Clearly it is not advantageous to increase any variable in Q other than x k . (Let δ j ≥0 denote the amount by which we increase x j ∈Q. If δ j >0 for some j≠k, then we can set δ j =0 and instead increase δ k by a j δ j /a k . This will leave the increase in f h(x) intact, so x will still be brought into S h, yet will not inflate the cost increase, because the cost will decrease by c j δ j , but increase by \(c_{k} a_{j}\delta_{j}/a_{k} \le c_{j}\delta_{j}\), where the inequality holds by the definition of k.) □

By the lemma and Theorem 1, with this choice of β, the algorithm gives a Δ-approximation. It remains to bound the running time.

Lemma 6

(Iterations)

For each \(S\in\mathcal{C}\), the algorithm does at most \(2|\operatorname{\mathsf{vars}}(S)|\) iterations for S.

Proof

Recall that, in a given iteration, β is the minimum such that raising some single x k by β/c k (with \(k\in\operatorname{\mathsf{vars}}(S)\) and x k <u k ) is enough to saturate x k or bring x into S h. If the problem is feasible, β<∞ so there is such an x k . Each iteration increases x j for all \(j\in \operatorname{\mathsf{vars}}(S)\) by β/c j , so must increase this x k by β/c k . Thus, the iteration either saturates x k or brings x into S h.

The number of iterations for S that saturate a variable is clearly at most \(|\operatorname{\mathsf{vars}}(S)|\). The number of iterations for S that satisfy that iteration’s relaxation (bringing x into S h) is also at most \(|\operatorname{\mathsf{vars}}(S)|\), because, by the choice of h, in the next iteration for S the relaxation index h will be at least 1 larger. Thus, there are at most \(2|\operatorname{\mathsf{vars}}(S)|\) iterations for S before x∈S.

The obvious implementation of an iteration for a given constraint S runs in time \(O(|\operatorname{\mathsf{vars}}(S)|)\) (provided the constraint’s a j ’s are sorted in a preprocessing step). By the lemma, the obvious implementation thus yields total time \(O(\sum_{S} |\operatorname{\mathsf{vars}}(S)|^{2})\le O(\sum_{S} |\operatorname{\mathsf{vars}}(S)| \varDelta )= O(N \varDelta )\).
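That obvious implementation can be sketched as follows (our code, with linear scans instead of heaps, and assuming each c_j > 0; the minimum over the candidate costs is exactly the β of Lemma 5, and the saturation candidate automatically caps any bump or raise that would exceed u_j):

```python
import math

def satisfy_constraint(a, b, c, u, x, ell):
    """One canonical constraint, a[0] >= ... >= a[ell-1] floored; repeats
    the step of Algorithm 2 (raise every x_j by beta/c_j) until x is in S."""
    def f(h):   # left-hand side with only the first h floors retained
        return sum(a[j] * (math.floor(min(x[j], u[j])) if j < h
                           else min(x[j], u[j])) for j in range(len(a)))
    while f(ell) < b:                               # x not yet in S
        h = next(k for k in range(ell + 1) if f(k) < b)
        slack = b - f(h)                            # what S^h still needs
        beta = float('inf')
        for j in range(len(a)):
            if x[j] >= u[j]:
                continue                            # saturated: stays put
            beta = min(beta, c[j] * (u[j] - x[j])) # cost to saturate x_j
            if j < h:                               # floored: one bump suffices
                beta = min(beta, c[j] * (math.floor(x[j]) + 1 - x[j]))
            else:                                   # unfloored: close the slack
                beta = min(beta, c[j] * slack / a[j])
        for j in range(len(a)):
            x[j] += beta / c[j]                     # raise all of vars(S)
    return x
```

Each pass either saturates the variable attaining the minimum or brings x into the current relaxation S^h, so by Lemma 6 the while-loop runs at most 2|vars(S)| times.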

To complete the proof of Theorem 6, we show how to use standard heap data structures to implement the above algorithm to run in O(NlogΔ) time. The implementation considers the constraints \(S\in\mathcal{C}\) in any order. For a given S, it repeatedly does iterations for that S until x∈S. As the iterations for a given S proceed, the algorithm maintains the following quantities:

  • A fixed vector x b, which is x at the start of the first iteration for S, initialized in time \(O(|\operatorname{\mathsf{vars}}(S)|)\).

  • A variable τ, tracking the sum of the β’s for S so far (initially 0). Crucially, the current x then satisfies \(x_{j} = x^{b}_{j} + \tau/c_{j}\) for \(j\in\operatorname{\mathsf{vars}}(S)\). While processing a given S, we use this to represent x implicitly.

    We then use the following heaps to find each breakpoint of τ—each value at which a variable becomes saturated, is bumped, or at which S h is satisfied and the index h of the current relaxation S h increases. We stop when \(S^{\ell}\) (that is, S itself) is satisfied.

  • A heap containing, for each unsaturated variable x j in \(\operatorname{\mathsf{vars}}(S)\), the value \(c_{j} (u_{j}-x_{j}^{b})\) of τ at which x j would saturate. This value does not change until x j is saturated, at which point the value is removed from the heap.

  • A heap containing, for each unsaturated integer variable x j (jh) in S h, the value of τ at which x j would next be bumped. This value is initially \(c_{j}(1-(x^{b}_{j} - \lfloor x^{b}_{j}\rfloor))\). It changes only when x j is bumped, at which point it increases by c j .

  • A heap containing, for each unsaturated non-integer variable x j (j>h) in \(S^{h}\), the ratio c j /a j . This value does not change. It is removed from the heap when x j is saturated.

  • The current derivative d of f h(x) with respect to τ, which is \(d = \sum_{j>h, x_{j} < u_{j}} a_{j}/c_{j}\). This value changes by a single term whenever a variable is saturated or h increases.

  • The current slack b h=bf h(x) of S h, updated at each breakpoint of τ.

In each iteration, the algorithm queries the min-values of each of the three heaps. It uses the three values to calculate the minimum value of τ at which, respectively, a variable would become saturated, a variable would be bumped, or a single (non-integer) variable’s increase would increase f h(x) by the slack b h. It then increases τ to the minimum of these three values. (This corresponds to doing a step of Algorithm 1 with β equal to the increase in τ.) With the change in τ, it detects each saturation, bump, and increment of h that occurs, uses the derivative to compute the increase in f h(x), then updates the data structures accordingly. (For example, it removes saturated variables’ keys from all three heaps.)

After the algorithm has finished all iterations for a given constraint S, it explicitly sets \(x_{j} \leftarrow x_{j}^{b} + \tau/c_{j}\) for \(j\in \operatorname{\mathsf{vars}}(S)\), discards the data structures for S, and moves on to the next constraint.

The heap keys for a variable x j change (and are inserted or removed) only when that particular variable is bumped or saturated, or when h increases to j. Each variable is saturated at most once, h increases at most \(\ell\le|\operatorname{\mathsf{vars}}(S)|\) times, and there are at most \(|\operatorname{\mathsf{vars}}(S)|\) bumps (as each bump increases h by at least 1). Thus, during all iterations for S, the total number of breakpoints and heap-key changes is \(O(|\operatorname{\mathsf{vars}}(S)|)\). Since each heap operation takes O(logΔ) time, the overall time is \(O(\sum_{S\in\mathcal{C}}|\operatorname{\mathsf{vars}}(S)|\log \varDelta ) = O(N\log \varDelta )\), where N is the number of non-zeroes in A.

This proves the theorem.  □

7.3 Two-Stage (Probabilistic) Submodular-Cost Covering

An instance of two-stage Submodular-Cost Covering is a tuple \((W,p,(c,\mathcal{C}))\) where \((c,\mathcal{C})\) is an instance of Submodular-Cost Covering over n variables (so \(S\subseteq\bar{\mathbb{R}}_{\ge0}^{n}\) for each \(S\in\mathcal{C}\)), \(W:\bar{\mathbb{R}}_{\ge0}^{|\mathcal{C}|\times n}\rightarrow\bar {\mathbb{R}}_{\ge0}\) is a non-decreasing, submodular, continuous first-stage objective function, and, for each \(S\in\mathcal{C}\), the activation probability of S is p S . A solution is a collection \(X=[x^{S}]_{S\in\mathcal{C}}\) of vectors \(x^{S}\in\bar{\mathbb{R}}_{\ge0}^{n}\), one for each constraint \(S\in\mathcal{C}\), such that \(x^{S}\in S\). Intuitively, x S specifies how S will be satisfied if S is activated, which happens with probability p S . As usual \(\varDelta = \max_{S\in\mathcal{C}} |\operatorname{\mathsf {vars}}(S)|\).

The solution should minimize the cost w(X) of X, as defined by the following random experiment. Each constraint S is independently activated with probability p S . This defines a Submodular-Cost Covering instance \((c, \mathcal{A})\), where \(\mathcal{A}=\{ S\in\mathcal{C}\mid S\mbox{ is activated}\}\subseteq\mathcal{C}\), and a solution \(x^{\mathcal{A}}\) for that instance, defined by \(x^{\mathcal{A}}_{j} = \max\{ x^{S}_{j} \mid S \in\mathcal{A}\}\). Intuitively, \(x^{\mathcal{A}}\) is the minimal vector that meets the first-stage commitment to satisfy each activated constraint S with x S. The cost w(X) is then \(W(X) + E_{\mathcal{A}}[ c(x^{\mathcal{A}})]\): the first-stage cost W(X) (modeling a “preparation” cost) plus the (expectation of the) second-stage cost \(c(x^{\mathcal{A}})\) (modeling an additional cost for assembling the final solution to the second-stage Submodular-Cost Covering instance \((c,\mathcal{A})\)).
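For small instances this objective can be evaluated by brute force. The sketch below (ours) enumerates all 2^|C| activation outcomes, assuming W and c are supplied as Python functions:

```python
from itertools import combinations

def two_stage_cost(W, c, p, X):
    """w(X) = W(X) + E_A[c(x^A)]: enumerate every activation set A,
    weight it by its probability, and build x^A = max of committed vectors."""
    m, n = len(p), len(X[0])
    expected = 0.0
    for k in range(m + 1):
        for A in combinations(range(m), k):
            prob = 1.0
            for s in range(m):
                prob *= p[s] if s in A else 1 - p[s]
            xA = [max((X[s][j] for s in A), default=0.0) for j in range(n)]
            expected += prob * c(xA)
    return W(X) + expected

# Two constraints, p = (1, 0.5), zero first-stage cost, unit linear c.
w = two_stage_cost(lambda X: 0.0, lambda x: sum(x),
                   [1.0, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In the example the first constraint is always activated and the second with probability 1/2, so the expected second-stage cost is 0.5·c((1,0)) + 0.5·c((1,1)) = 1.5.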

Facility-Location Example

For example, consider a Set Cover instance \((c,\mathcal{C})\) with elements [m] and sets s(j)⊆[m] for j∈[n]. That is, \(\operatorname {\mathsf {minimize}}\ c\cdot x\ \operatorname {\mathsf {subject~to}}\ x\in\mathbb{R}_{\ge0}^{n}\), \((\forall i\in[m])\ \max_{j:i\in s(j)} x_{j} \ge1\).

Extend this to a two-stage Set Cover instance \((W,p,(c,\mathcal{C}))\) where \(W_{ij}\ge0\) and each \(p_{i}=1\). Let \(X=[x^{i}]_{i}\) be any (minimal) feasible solution to this instance. That is, \(x^{i}\in\{0,1\}^{n}\) says that element i chooses the set s(j) where \(x^{i}_{j} = 1\). All constraints are activated in the second stage, so each \(x^{\mathcal{A}}_{j} = \max\{ x^{i}_{j} \mid i\in s(j)\}\). That is, \(x^{\mathcal{A}}_{j} = 1\) iff some element i has chosen set s(j). The cost w(X) is \(\sum_{ij} W_{ij} x^{i}_{j} + \sum_{j} c_{j} \max\{ x^{i}_{j}\mid i\in s(j)\}\).

Note that this two-stage Set Cover problem exactly models Facility Location. The first-stage cost W captures the assignment cost; the second-stage cost c captures the opening cost.
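As a sanity check of this correspondence, here is a small sketch (identifiers are illustrative) evaluating the cost formula \(\sum_{ij} W_{ij} x^{i}_{j} + \sum_{j} c_{j} \max\{x^{i}_{j}\mid i\in s(j)\}\), i.e., assignment cost plus opening cost:

```python
def facility_location_cost(W, c, s, X):
    """Two-stage Set Cover cost with all p_i = 1 (Facility Location).
    W[i][j]: cost of assigning element i to set j; c[j]: opening cost of
    set j; s[j]: elements covered by set j; X[i][j] = 1 iff element i
    chooses set j."""
    m, n = len(W), len(c)
    assign = sum(W[i][j] * X[i][j] for i in range(m) for j in range(n))
    # set j is opened (and its opening cost c[j] paid) iff some element chose it
    opening = sum(c[j] * max((X[i][j] for i in s[j]), default=0)
                  for j in range(n))
    return assign + opening
```

Note that the opening cost \(c_j\) is paid once per opened set, regardless of how many elements chose it, which is exactly the max (rather than sum) in the second-stage cost.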

Consider again general two-stage Submodular-Cost Covering. A Δ-approximation algorithm for it follows immediately from the following observation:

Observation 7

Two-stage Submodular-Cost Covering reduces to Submodular-Cost Covering (preserving Δ).

Proof

Any two-stage instance \((W,p,(c,\mathcal{C}))\) over n variables is equivalent to a standard instance \((w, \mathcal{C}')\) over \(n|\mathcal{C}|\) variables (\(X=[x^{S}]_{S\in\mathcal{C}}\)) where w(X) is the cost of X for the two-stage instance as defined above and, for each \(S\in\mathcal{C}\), there is a corresponding constraint \(x^{S}\in S\) on X in \(\mathcal{C}'\). One can easily verify that the cost w(X) is submodular, non-decreasing, and continuous because W(X) and c(x) are. □

Next we describe a fast implementation of Algorithm 2 for two-stage CMIP-UB—the special case of two-stage Submodular-Cost Covering where W is linear and the pair \((c,\mathcal{C})\) form a CMIP-UB instance.

Theorem 7

(Implementation for two-stage CMIP-UB)

For two-stage CMIP-UB:

  (a)

    Algorithm 2 can be implemented to return a Δ-approximation in \(O(N\widehat{\varDelta }\log \varDelta )\) time, where \(\widehat{\varDelta }\) is the maximum number of constraints per variable and N is the input size \(\sum_{S\in\mathcal{C}}|\operatorname{\mathsf{vars}}(S)|\).

  (b)

    When p=1, the algorithm can be implemented to run in time O(NlogΔ). (The case p=1 of two-stage CMIP-UB generalizes CMIP-UB and Facility Location.)

Proof

Fix an instance \((W,p,(c,\mathcal{C}))\) of two-stage CMIP-UB. Let \((w,\mathcal{C}')\) be the equivalent instance of standard Submodular-Cost Covering from Observation 7 over variable vector \(X=[x^{S}]_{S\in \mathcal{C}}\). Let random variable \(x^{\mathcal{A}}\) be as described in the problem definition (\(x^{\mathcal{A}}_{j} = \max\{ x_{j}^{S} \mid S\ \mbox{active}\}\)), so that \(w(X) = W\cdot X + E[c \cdot x^{\mathcal{A}}]\).

We implement Algorithm 2 for the Submodular-Cost Covering instance \((w,\mathcal{C}')\). In an iteration of the algorithm for a constraint S on \(x^{S}\), the algorithm computes β as follows. Recall that the variables in X being increased (to satisfy \(x^{S}\in S\)) are \(x^{S}_{j}\) for \(j\in\operatorname{\mathsf{vars}}(S)\). The derivative of w(X) with respect to \(x^{S}_{j}\) is

\[ c'_{j} \;=\; W_{S,j} \;+\; p_{S}\, c_{j} \prod_{R\in\mathcal{C}\setminus\{S\}\,:\; j\in\operatorname{\mathsf{vars}}(R),\; x^{R}_{j} > x^{S}_{j}} (1-p_{R}), \]

where the product is the probability that no activated constraint R already forces \(x^{\mathcal{A}}_{j}\) above \(x^{S}_{j}\). The derivative will be constant (that is, w(X) will be linear in \(x^{S}\)) until \(x^{S}_{j}\) reaches its next breakpoint \(t_{j} = \min\{ x^{R}_{j} \mid x^{R}_{j} > x^{S}_{j},\; j\in \operatorname{\mathsf{vars}}(R)\}\). Define \(\beta_{t} = \min\{(t_{j}-x^{S}_{j})c'_{j} \mid j\in \operatorname{\mathsf{vars}}(S)\}\) to be the minimum cost to bring some \(x^{S}_{j}\) to its next breakpoint.
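To make this step concrete, here is a hypothetical sketch computing the derivatives and \(\beta_{t}\). It assumes the derivative of w with respect to \(x^{S}_{j}\) has the product form \(W_{S,j} + p_{S} c_{j} \prod_{R\ne S:\, j\in\mathsf{vars}(R),\, x^{R}_{j}>x^{S}_{j}}(1-p_{R})\), consistent with the p=1 special case discussed at the end of the proof; all identifiers are illustrative:

```python
import math

def derivative_and_beta_t(S, vars_S, X, vars_of, W_S, c, p):
    """For each j in vars(S), compute the derivative c'_j of w at the
    current X (assumed form: W_S[j] + p[S]*c[j]*prod{1-p[R] : R != S,
    j in vars(R), x^R_j > x^S_j}) and the next breakpoint t_j; return
    the derivatives and beta_t = min_j (t_j - x^S_j)*c'_j."""
    deriv, beta_t = {}, math.inf
    for j in vars_S:
        # constraints R whose commitment for x_j currently exceeds S's
        above = [R for R in X
                 if R != S and j in vars_of[R] and X[R][j] > X[S][j]]
        deriv[j] = W_S[j] + p[S] * c[j] * math.prod(1.0 - p[R] for R in above)
        if above:  # next breakpoint t_j exists only if some x^R_j lies above
            t_j = min(X[R][j] for R in above)
            beta_t = min(beta_t, (t_j - X[S][j]) * deriv[j])
    return deriv, beta_t
```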

Let w′ be the vector defined above (the gradient of w with respect to \(x^{S}\)). Let β′ be the step size that the algorithm in Theorem 6 would compute given the linear cost w′; that is, the step size it would compute in an iteration for constraint \(x^{S}\in S\) given the CMIP-UB instance (w′,{S}) and the current \(x^{S}\).

The algorithm here computes \(\beta_{t}\) and β′ as defined above, then takes the step size to be \(\beta=\min(\beta_{t},\beta')\). This β is a valid lower bound on \(\tilde{c}_{X}(S)\), because \(\beta_{t}\) is the minimum cost to bring some \(x^{S}_{j}\) to its next breakpoint, while \(\beta'\le{\tilde{c}'_{x^{S}}} (S)\) is a lower bound on the cost to satisfy S without bringing any \(x^{S}_{j}\) to a breakpoint. Thus, by Theorem 1, this algorithm computes a Δ-approximation.

The algorithm is as follows. It considers the constraints in any order. For each constraint S, it does iterations for that S, with step size β defined above, until S is satisfied.

Lemma 7

(Iterations)

For each \(S\in\mathcal{C}\), the algorithm does at most \(|\operatorname{\mathsf {vars}}(S)|(\widehat{\varDelta }+2)\) iterations for S.

Proof

An iteration may cause some \(x^{S}_{j}\) to reach its next breakpoint \(t_{j}\). By inspection of the breakpoints \(t_{j}\), each \(x^{S}_{j}\) can cross at most \(\widehat{\varDelta }\) breakpoints (one for each constraint R on \(x_{j}\) in the original instance). Thus, there are at most \(|\operatorname{\mathsf{vars}}(S)|\widehat{\varDelta }\) such iterations. In each remaining iteration the step size β equals the step size β′ from the algorithm in Theorem 6. Following the proof of Lemma 6 in Theorem 6, there are at most \(2|\operatorname{\mathsf{vars}}(S)|\) such iterations. (In each such iteration, either some variable \(x_{j}^{S}\) reaches its upper bound \(u_{j}\) for the first time, or the constraint \(x^{S}\in S^{h}\) is satisfied for the current relaxation \(S^{h}\) of S. By inspection, \(S^{h}\) depends only on the current \(x^{S}\) and the constraint S, and not on the cost function w′. Thus, as in the proof of Lemma 6, after an iteration for S where the current \(S^{h}\) is satisfied, in the next iteration h will be at least one larger. That can happen at most \(|\operatorname{\mathsf{vars}}(S)|\) times.) □

To complete the proof of Theorem 7, we prove that the algorithm can be implemented to take time \(O(N\widehat{\varDelta }\log \varDelta )\), or, if p=1, time O(NlogΔ).

As the algorithm does iterations for S, the algorithm maintains the data structures described at the end of the proof of Theorem 6, with the following adjustments. When some \(x^{S}_{j}\) reaches its next breakpoint and \(w'_{j}\) increases, the algorithm

  • raises \(x^{b}_{j}\) to maintain the invariant \(x_{j} = x^{b}_{j} + \tau/w'_{j}\);

  • updates the derivative d to account for the change in the term \(a_{j}/w'_{j}\) (if present in the derivative), and

  • updates the values for key j in the three heaps (where present).

By inspection of the proof of Theorem 6, these adjustments are enough to maintain the data structures correctly throughout all iterations for S. The updates take O(logΔ) time per breakpoint. Thus, the total time for the adjustments is \(O(\sum_{S} |\operatorname{\mathsf{vars}}(S)|\widehat{\varDelta }\log \varDelta )\), which is \(O(N\widehat{\varDelta }\log \varDelta )\).

To compute \(\beta_{t}\) in each iteration, the algorithm does the following. While it is doing iterations for a particular constraint S, recall that τ is the sum of the β’s for S so far (from the proof of Theorem 6). The algorithm maintains a fourth heap containing the values \(\{\tau+ (t_{j}-x^{S}_{j}) w'_{j} \mid j\in \operatorname{\mathsf{vars}}(S)\}\) (the values in the definition of \(\beta_{t}\), plus τ). Then \(\beta_{t}\) is the minimum value in this heap, minus τ.
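A minimal sketch of this fourth heap (hypothetical names). The key point is the invariant: while \(w'_{j}\) is unchanged, \(x^{S}_{j}\) grows at rate \(1/w'_{j}\) per unit of τ, so the quantity \(\tau + (t_{j}-x^{S}_{j})w'_{j}\) stays constant and the stored values never go stale; \(\beta_{t}\) is recovered by subtracting the current τ from the heap minimum:

```python
import heapq

def build_breakpoint_heap(tau, x_S, t, w_prime):
    """Heap entries (tau + (t_j - x^S_j)*w'_j, j), evaluated at insertion
    time; by the invariant above they remain valid as tau grows."""
    heap = [(tau + (t[j] - x_S[j]) * w_prime[j], j) for j in x_S]
    heapq.heapify(heap)
    return heap

def beta_t(heap, tau):
    """Minimum cost to bring some x^S_j to its next breakpoint."""
    return heap[0][0] - tau
```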

Then \(x^{S}_{j}\) reaches a breakpoint (and \(w'_{j}\) changes) if and only if \(\beta=\beta_{t}\) and key j has the minimum value in this heap. When that happens, the algorithm finds the next breakpoint \(t'_{j}\) for j (as described in the next paragraph) and updates j’s value in the fourth heap. The total time spent maintaining the fourth heap is O(logΔ) per breakpoint, for \(O(\sum_{S}\sum_{j\in\operatorname{\mathsf{vars}}(S)} \widehat{\varDelta }\log \varDelta ) = O(N\widehat{\varDelta }\log \varDelta )\) in total.

The algorithm computes the breakpoints \(t_{j}\) efficiently as follows. Throughout the entire computation (not just the iterations for S), the algorithm maintains, for each j, an array of j’s variables in X, that is, \(\{x_{j}^{R} \mid R\in\mathcal{C}, j\in \operatorname{\mathsf{vars}}(R)\}\), sorted by the variables’ current values (initially all 0). Then \(t_{j}\) is the value of the first \(x_{j}^{R}\) after \(x_{j}^{S}\) in j’s list. When \(x_{j}^{S}\) reaches its breakpoint \(t_{j}\) (detected as described in the previous paragraph), the algorithm updates the list order by swapping \(x_{j}^{S}\) with the \(x_{j}^{R}\) following it in the list (the one with value \(t_{j}\)). The next breakpoint is then the value of the variable \(x_{j}^{R'}\) that was after \(x_{j}^{R}\) and is now after \(x_{j}^{S}\). The time spent computing breakpoints in this way is proportional to the total number of swaps, which is proportional to the total number of breakpoints, which is at most \(\sum_{S}\sum_{j\in\operatorname{\mathsf{vars}}(S)} \widehat{\varDelta }=N\widehat{\varDelta }\).
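The sorted-array maintenance above can be sketched as follows (a simplified illustration for a single coordinate j, with hypothetical names). Since only the increasing variable \(x^{S}_{j}\) can overtake its successor, each breakpoint crossing is repaired by a single O(1) swap:

```python
def next_breakpoint(order, pos, value, S):
    """order: constraints R with j in vars(R), sorted by current x^R_j;
    pos[R]: R's index in order; value[R]: current x^R_j.
    t_j is the value of the entry just after S (infinity if S is last)."""
    i = pos[S]
    return value[order[i + 1]] if i + 1 < len(order) else float('inf')

def cross_breakpoint(order, pos, S):
    """x^S_j has just reached t_j: restore sorted order with one swap."""
    i = pos[S]
    R = order[i + 1]
    order[i], order[i + 1] = R, S
    pos[S], pos[R] = i + 1, i
```

The total work is thus proportional to the number of breakpoints crossed, matching the \(N\widehat{\varDelta }\) bound in the proof.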

This concludes the proof for the general case.

When p=1, the product in the equation for \(c_{j}'\) is 1 if \(x^{S}_{j} = \max_{R} x^{R}_{j}\) and 0 otherwise. So each constraint S has at most one breakpoint per variable, and the total time for the adjustments above reduces to \(O(\sum_{S} |\operatorname{\mathsf{vars}}(S)|\log \varDelta ) =O(N \log \varDelta )\). As in the proof of Theorem 6, the remaining operations also take O(NlogΔ) time.

This concludes the proof of the theorem.  □