1 Introduction

A very common approach for solving discrete optimization problems is to solve some linear programming relaxation, and then round the fractional solution into an integral one, without (hopefully) incurring much loss in quality. Over the years several ingenious rounding techniques have been developed (see e.g. [24, 25]) based on ideas from optimization, probability, geometry, algebra and various other areas. Randomized rounding and iterative rounding are two of the most commonly used methods.

Recently, discrepancy-based rounding approaches have also been very successful; a particularly notable result is for bin packing due to Rothvoss [18]. Discrepancy is a well-studied area in combinatorics with several surprising results (see e.g. [16]), and as observed by Lovász et al. [14], has a natural connection to rounding. However, until the recent algorithmic developments [1, 9, 15, 17, 19], most of the results in discrepancy were non-constructive and hence not directly useful for rounding. These algorithmic approaches combine probabilistic approaches like randomized rounding with linear algebraic approaches such as iterated rounding [12], which makes them quite powerful.

Interestingly, given the connection between discrepancy and rounding, these discrepancy algorithms can in fact be viewed as meta-algorithms for rounding. We discuss this in Sect. 1.1 in the context of the Lovett–Meka (LM) algorithm [15]. This suggests the possibility of one single approach that generalizes both randomized and iterated rounding. This is our motivating goal in this paper.

While the LM algorithm is already an important step in this direction, it still has some important limitations. For example, it is designed for obtaining additive error bounds and it does not give good multiplicative error bounds (like those given by randomized rounding). This is not an issue for discrepancy applications, but it is crucial for many approximation algorithms. Similarly, iterated rounding can work well with exponentially sized LPs by exploiting their underlying combinatorial structure (e.g., degree-bounded spanning tree [21]), but the current discrepancy results [15, 19] give extremely weak bounds in such settings.

Our Results: We extend the LM algorithm to overcome the limitations stated above. In particular, we give a new variant that also gives Chernoff type multiplicative error bounds (sometimes with an additional logarithmic factor loss). We also show how to adapt the above algorithm to handle exponentially large LPs involving matroid constraints, as in iterated rounding.

This new discrepancy-based algorithm gives new results for problems such as linear system rounding with violations [4, 13], degree-bounded matroid basis [6, 11], low congestion routing [10, 13] and multi-budgeted matroid basis [8]. These results simultaneously combine non-trivial guarantees from discrepancy, randomized rounding and iterated rounding; previously, such bounds were not even known existentially.

Our results are described formally in Sect. 1.2. To place them in the proper context, we first need to describe some existing rounding approaches (Sect. 1.1). The reader familiar with the LM algorithm can directly go to Sect. 1.2.

1.1 Preliminaries

We begin by describing LM rounding [15], randomized rounding and iterated rounding in a similar form, and then discuss their strengths and weaknesses.

LM Rounding: Let A be an m × n matrix with 0–1 entries (Footnote 1), let x ∈ [0, 1]^n be a fractional vector and let b = Ax. Lovett and Meka [15] showed the following rounding result.

Theorem 1 (LM Rounding [15])

Given A and x as above, for j = 1, …, m, pick any λ_j satisfying

$$\displaystyle{ \sum _{j}\exp (-\lambda _{j}^{2}/4) \leq n/16. }$$
(1)

Then there is an efficient randomized algorithm to find a solution x′ such that: (i) at most n∕2 variables of x′ are fractional (strictly between 0 and 1) and (ii) \(\vert \langle a_{j},x^{{\prime}}- x\rangle \vert \leq \lambda _{j}\|a_{j}\|_{2}\) for each j = 1, …, m, where a_j denotes the j-th row of A.

Remark

The right-hand side of (1) can be set to (1 − ε)n for any fixed constant ε > 0, at the expense of an O_ε(1) factor loss in the other parameters of the theorem; see e.g. [2].

Randomized Rounding: Chernoff bounds state that if X_1, …, X_n are independent Bernoulli random variables, X = ∑_i X_i and \(\mu = \mathbb{E}[X]\), then

$$\displaystyle{\mathrm{Pr}[\vert X -\mu \vert \geq \epsilon \mu ] \leq 2\exp (-\epsilon ^{2}\mu /4)\qquad \mbox{ for }\epsilon \leq 1.}$$

Then independent randomized rounding can be viewed as the following (by using Chernoff bounds and a union bound, and denoting \(\lambda _{j} =\epsilon _{j}\sqrt{b_{j}}\)).

Theorem 2 (Randomized Rounding)

For j = 1, …, m, pick any λ_j satisfying \(\lambda _{j} \leq \sqrt{b_{j}}\), and

$$\displaystyle{ \sum _{j}\exp (-\lambda _{j}^{2}/4) <0.5 }$$
(2)

Then independent randomized rounding gives a solution x′ such that: (i) all variables are 0–1, and (ii) \(\vert \langle a_{j},x^{{\prime}}- x\rangle \vert \leq \lambda _{j}\sqrt{b_{j}}\) for each j = 1, …, m.
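To make the comparison with LM rounding concrete, here is a minimal sketch (ours, not from the paper) of independent randomized rounding and the per-constraint deviations it produces; the matrix A and point y used below are placeholder inputs.

```python
import numpy as np

def randomized_rounding(A, y, seed=0):
    """Independently round each y_i to 1 with probability y_i and report the
    per-constraint deviations |<a_j, x - y>|; by Chernoff bounds these are
    O(lambda_j * sqrt(b_j)) with b_j = <a_j, y> for lambda_j as in Theorem 2."""
    rng = np.random.default_rng(seed)
    x = (rng.random(len(y)) < y).astype(float)   # independent 0/1 rounding
    deviations = np.abs(A @ (x - y))             # |<a_j, x - y>| for each row a_j
    return x, deviations

# Toy usage with placeholder data: a 0-1 matrix A and a fractional point y.
A = np.array([[1, 1, 0, 1], [0, 1, 1, 1]], dtype=float)
y = np.array([0.5, 0.3, 0.7, 0.2])
x, dev = randomized_rounding(A, y)
print(x, dev)
```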

Iterated Rounding [12]: This is based on the following linear-algebraic fact.

Theorem 3

If m < n, then there is a solution x′ ∈ [0, 1]^n such that (i) x′ has at least n − m variables set to 0 or 1 and (ii) A(x′ − x) = 0 (i.e., b = Ax′).

In iterated rounding applications, if m > n then some cleverly chosen constraints are dropped until m < n and integral variables are obtained. This is done repeatedly.
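As an illustration (our sketch, not code from [12]), the step underlying Theorem 3 can be implemented by moving along a null-space direction of the constraint matrix restricted to the fractional coordinates until some variable reaches 0 or 1; all names below are our own.

```python
import numpy as np

def iterated_rounding_step(A, x, tol=1e-9):
    """One step behind Theorem 3 (a sketch): move along a null-space direction
    of A restricted to the fractional coordinates until some variable hits 0
    or 1; the product A @ x is unchanged by the move."""
    frac = [i for i in range(len(x)) if tol < x[i] < 1 - tol]
    if not frac:
        return x, False
    M = A[:, frac]
    _, sv, vt = np.linalg.svd(M)
    rank = int(np.sum(sv > tol))
    if rank >= len(frac):              # no null-space direction: m >= #fractional
        return x, False
    d = np.zeros(len(x))
    d[frac] = vt[rank]                 # a unit direction with A @ d ~= 0
    step = np.inf                      # largest step keeping x inside [0, 1]^n
    for i in frac:
        if d[i] > tol:
            step = min(step, (1 - x[i]) / d[i])
        elif d[i] < -tol:
            step = min(step, x[i] / (-d[i]))
    return x + step * d, True
```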

Strengths of LM rounding: Note that if we set λ_j ∈ {0, ∞} in LM rounding, then it gives a statement very similar to Theorem 3. E.g., if we only care about some m = n∕2 constraints, then Theorem 3 gives an x′ with at least n∕2 integral variables and a_j x′ = a_j x for all these m constraints. Theorem 1 (and the remark below it) gives the same guarantee if we set λ_j = 0 for all constraints. In general, LM rounding can be much more flexible as it allows arbitrary λ_j.

Second, LM rounding is also related to randomized rounding. Note that (2) and (1) have the same left-hand side. However, the right-hand side of (1) is \(\Omega (n)\), while that of (2) is O(1). This actually makes a huge difference. In particular, in (2) one cannot set λ_j = 1 for more than a couple of constraints (to get an \(o(\sqrt{b_{j}})\) error bound on those constraints), while in (1) one can even set λ_j = 0 for O(n) constraints. In fact, almost all non-trivial results in discrepancy [16, 22, 23] are based on this ability.

Weaknesses of LM rounding: First, Theorem 1 only gives a partially integral solution instead of a fully integral one as in Theorem 2.

Second, and more importantly, it only gives additive error bounds instead of multiplicative ones. In particular, note the \(\lambda _{j}\|a_{j}\|_{2}\) vs. \(\lambda _{j}\sqrt{ b_{j}}\) error in Theorems 1 and 2. E.g., for a constraint \(\sum _{i}x_{i} =\log n\), Theorem 2 gives \(\lambda \sqrt{\log n}\) error but Theorem 1 gives a much higher \(\lambda \sqrt{n}\) error. So, while randomized rounding can give a good multiplicative error like \(a_{j}x^{{\prime}}\leq (1\pm \epsilon _{j})b_{j}\), LM rounding is completely insensitive to b_j.

Finally, iterated rounding works extremely well in many settings where Theorem 1 does not give anything useful, e.g., in problems involving exponentially many constraints such as the degree-bounded spanning tree problem. The problem is that if m is exponentially large, then the λ_j's in Theorem 1 need to be very large to satisfy (1).

1.2 Our Results and Techniques

Our first result is the following improvement over Theorem 1.

Theorem 4

There is a constant K_0 > 0 and a randomized polynomial-time algorithm that, given any n > K_0, a fractional solution y ∈ [0, 1]^n, m ≤ 2^n linear constraints \(a_{1},\ldots,a_{m} \in \mathbb{R}^{n}\) and λ_1, ⋯, λ_m ≥ 0 with \(\sum _{j=1}^{m}e^{-\lambda _{j}^{2}/K_{0}} <\frac{n} {16}\), finds a solution y′ ∈ [0, 1]^n such that:

$$\displaystyle\begin{array}{rcl} \vert \langle y^{{\prime}}- y,a_{ j}\rangle \vert & \leq & \lambda _{j} \cdot \sqrt{W_{j } (y)} + \frac{1} {n^{2}} \cdot \| a_{j}\|,\quad \forall j = 1,\cdots m{}\end{array}$$
(3)
$$\displaystyle\begin{array}{rcl} y_{i}^{{\prime}}\in \{ 0,1\},& & \mathit{\mbox{ for }}\Omega (n)\mathit{\text{ indices }}i \in \{ 1,\cdots \,,n\}{}\end{array}$$
(4)

Here \(W_{j}(y):=\sum _{i=1}^{n}a_{ji}^{2} \cdot \min \{ y_{i},1 - y_{i}\}^{2}\) for each j = 1, ⋯, m.

Remarks

 

  (1) The error \(\lambda _{j}\sqrt{W_{j } (y)}\) is always smaller than the \(\lambda _{j}\|a_{j}\|\) error in LM-rounding and the \(\lambda _{j}(\sum _{i=1}^{n}a_{ji}^{2} \cdot y_{i}(1 - y_{i}))^{1/2}\) error in randomized rounding. In fact it could even be much less if the y_i are very close to 0 or 1.

  (2) The term n∕16 above can be made (1 − ε)n for any fixed constant ε > 0, at the expense of worsening other constants (just as in LM rounding).

  (3) The additional error term \(\frac{1} {n^{2}} \cdot \| a_{j}\|\) above is negligible and can be reduced to \(\frac{1} {n^{c}} \cdot \| a_{j}\|\) for any constant c, at the expense of a larger running time n^{O(c)}.

We note that Theorem 4 can also be obtained in a “black box” manner from LM-rounding (Theorem 1) by rescaling the polytope and using its symmetry (Footnote 2). However, such an approach does not work in the setting of matroid polytopes (Theorem 5). In the matroid case, we need to modify LM-rounding as outlined below.

Applications: We focus on linear system rounding as the prime example. Here, given a matrix A ∈ [0, 1]^{m×n} and a vector \(b \in \mathbb{Z}_{+}^{m}\), the goal is to find a vector z ∈ {0, 1}^n satisfying Az = b. As this is NP-hard, the focus has been on finding a z ∈ {0, 1}^n where Az ≈ b.

Given any fractional solution y ∈ [0, 1]^n satisfying Ay = b, using Theorem 4 iteratively we can obtain an integral vector z ∈ {0, 1}^n with

$$\displaystyle{ \vert a_{j}z - b_{j}\vert \leq \min \left \{O(\sqrt{n\log (2 + m/n)})\,,\,\,\sqrt{L \cdot b_{j}} + L\right \},\quad \forall j \in [m], }$$
(5)

where L = O(log n · log m) and [m] := {1, 2, …, m} (Footnote 3). Previously known algorithms could provide a bound of either \(O(\sqrt{n\log (m/n)})\) for all constraints [15] or \(O(\sqrt{\log m} \cdot \sqrt{b_{j}} +\log m)\) for all constraints (Theorem 2). Note that this does not imply a \(\min \{\sqrt{ n\log (m/n)},\sqrt{\log m} \cdot \sqrt{b_{j}} +\log m\}\) violation per constraint, as in general it is not possible to combine two integral solutions and achieve the better of their violation bounds on all constraints. To the best of our knowledge, even the existence of an integral solution satisfying the bounds in (5) was not known prior to our work.

In the setting where the matrix A is “column sparse”, i.e. each variable appears in at most \(\Delta\) constraints, we obtain a more refined error of

$$\displaystyle{ \vert a_{j}y - b_{j}\vert \leq \min \left \{O(\sqrt{\Delta }\log n)\,,\,\,\sqrt{L \cdot b_{j}} + L\right \},\quad \forall j \in [m], }$$
(6)

where L = O(log n ⋅ log m). Previous algorithms could separately achieve bounds of \(\Delta - 1\) [4], \(O(\sqrt{\Delta }\log n)\) [15] or \(O(\sqrt{\log \Delta } \cdot \sqrt{b_{j}} +\log \Delta )\) [13]. For clarity, Fig. 1 plots the violation bounds achieved by these different algorithms as a function of the right-hand side b when m = n (we assume \(b,\Delta \geq \log ^{2}n\)). Note again that since there are multiple constraints, we cannot simply combine algorithms to achieve the smaller of their violation bounds.

Fig. 1: Additive violation bounds for linear system rounding when \(\Delta \geq \log ^{2}n\) and b ≥ log^2 n.

One can also combine the bounds in (5) and (6), and use some additional ideas from discrepancy to obtain:

$$\displaystyle{ \vert a_{j}y - b_{j}\vert \,\leq \, O(1) \cdot \min \left \{\sqrt{j},\,\sqrt{n\log (2 + \frac{m} {n} )},\,\sqrt{L \cdot b_{j}} + L,\,\sqrt{\Delta }\log n\right \},\quad \forall j \in [m]. }$$
(7)

Matroid Polytopes: Our main result is an extension of Theorem 4 where the fractional solution lies in a matroid polytope in addition to satisfying the linear constraints \(\{a_{j}\}_{j=1}^{m}\). Recall that a matroid \(\mathcal{M}\) is a tuple \((V,\mathcal{I})\) where V is the ground set of elements and \(\mathcal{I}\subseteq 2^{V }\) is a collection of independent sets satisfying the hereditary and exchange properties [20]. The rank function \(r: 2^{V } \rightarrow \mathbb{Z}\) of a matroid is defined as \(r(S) =\max _{I\in \mathcal{I},I\subseteq S}\,\vert I\vert\). The matroid polytope (i.e. the convex hull of the indicator vectors of all independent sets) is given by the following linear inequalities:

$$\displaystyle{P(\mathcal{M})\quad:= \quad \left \{x \in \mathbb{R}^{n}\,\,:\,\,\sum _{ i\in S}x_{i} \leq r(S)\,\,\forall S \subseteq V,\,\,\,x \geq 0\right \}.}$$

As is usual when dealing with matroids, we assume access to an “independent set oracle” for \(\mathcal{M}\) that, given any subset S ⊆ V, returns whether or not \(S \in \mathcal{I}\) in polynomial time.
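For concreteness, the sketch below (our own, hypothetical example) gives such an oracle for a partition matroid; the rank formula it uses is the standard one for this matroid class.

```python
class PartitionMatroid:
    """Toy independent set oracle: the ground set is split into parts and a set
    is independent iff it uses at most cap[p] elements from each part p. Any
    matroid with such a polynomial-time oracle fits the framework above."""
    def __init__(self, part_of, cap):
        self.part_of = part_of      # part_of[e] = index of the part containing e
        self.cap = cap              # cap[p] = capacity of part p

    def is_independent(self, S):
        counts = {}
        for e in S:
            p = self.part_of[e]
            counts[p] = counts.get(p, 0) + 1
            if counts[p] > self.cap[p]:
                return False
        return True

    def rank(self, S):
        """r(S) = sum over parts p of min(|S intersect part p|, cap[p])."""
        counts = {}
        for e in S:
            counts[self.part_of[e]] = counts.get(self.part_of[e], 0) + 1
        return sum(min(c, self.cap[p]) for p, c in counts.items())

# Example: 4 elements in 2 parts, each part may contribute at most one element.
M = PartitionMatroid(part_of=[0, 0, 1, 1], cap=[1, 1])
print(M.is_independent({0, 2}), M.is_independent({0, 1}), M.rank({0, 1, 2}))
```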

Theorem 5

There is a randomized polynomial time algorithm that, given a matroid \(\mathcal{M}\), a fractional solution \(y \in P(\mathcal{M})\), linear constraints \(\{a_{j} \in \mathbb{R}^{n}\}_{j=1}^{m}\) and values \(\{\lambda _{j}\}_{j=1}^{m}\) satisfying the conditions in Theorem 4, finds a solution \(y^{{\prime}}\in P(\mathcal{M})\) satisfying (3) and (4).

We note that the same result can be obtained even if we want to compute a base (maximal independent set) in the matroid: the only difference here is to add the equality \(\sum _{i\in V }x_{i} = r(V )\) to \(P(\mathcal{M})\), which corresponds to the base polytope of \(\mathcal{M}\).

The fact that we can exactly preserve matroid constraints leads to a number of improvements:

Degree-bounded matroid basis (DegMat).

Given a matroid on elements [n] := {1, 2, …, n} with costs \(d: [n] \rightarrow \mathbb{Z}_{+}\) and m “degree constraints” \(\{S_{j},b_{j}\}_{j=1}^{m}\) where each S_j ⊆ [n] and \(b_{j} \in \mathbb{Z}_{+}\), the goal is to find a minimum-cost basis I in the matroid that satisfies \(\vert I \cap S_{j}\vert \leq b_{j}\) for all j ∈ [m]. Since even the feasibility problem is NP-hard, we consider bicriteria approximation algorithms that violate the degree bounds. We obtain an algorithm where the solution costs at most the optimum and the degree-bound violation is as in (7); here \(\Delta\) denotes the maximum number of sets \(\{S_{j}\}_{j=1}^{m}\) containing any element.

Previous algorithms achieved approximation ratios of \((1,b + O(\sqrt{b\log n}))\) [6], based on randomized swap rounding, and \((1,b + \Delta - 1)\) [11] based on iterated rounding. Again, these bounds could not be combined together as they used different algorithms. We note that in general the \((1,b + O(\sqrt{n\log (m/n)}))\) approximation is the best possible (unless P=NP) for this problem [3, 5].

Multi-criteria matroid basis.

Given a matroid on elements [n] with k different cost functions \(d_{i}: [n] \rightarrow \mathbb{Z}_{+}\) (for i = 1, ⋯, k) and budgets \(\{B_{i}\}_{i=1}^{k}\), the goal is to find (if possible) a basis I with d_i(I) ≤ B_i for each i ∈ [k]. We obtain an algorithm that for any ε > 0 finds, in \(n^{O(k^{1.5}\,/\,\epsilon ) }\) time, a basis I with d_i(I) ≤ (1 + ε)B_i for all i ∈ [k]. Previously, [8] obtained such an algorithm with \(n^{O(k^{2}\,/\,\epsilon ) }\) running time.

Low congestion routing.

Given a directed graph G = (V, E) with edge capacities \(b: E \rightarrow \mathbb{Z}_{+}\), k source–sink pairs \(\{(s_{i},t_{i})\}_{i=1}^{k}\) and a length bound \(\Delta\), the goal is to find an s_i–t_i path P_i of length at most \(\Delta\) for each pair i ∈ [k] such that the number N_e of paths using any edge e is at most b_e. Using an LP-based reduction [6] this can be cast as an instance of DegMat. So we obtain violation bounds as in (7), which implies:

$$\displaystyle{N_{e}\,\, \leq \,\, b_{e} +\min \left \{O(\sqrt{\Delta }\log n),\,O(\sqrt{b_{e}}\log n +\log ^{2}n)\right \},\quad \forall e \in E.}$$

Here n = |V| is the number of vertices. Previous algorithms achieved bounds of \(\Delta - 1\) [10] or \(O(\sqrt{\log \Delta } \cdot \sqrt{b_{e}} +\log \Delta )\) [13] separately. We can also handle a richer set of routing requirements: given a laminar family \(\mathcal{L}\) on the k pairs, with a requirement r_T on each set \(T \in \mathcal{L}\), we want to find a multiset of paths so that there are at least r_T paths between the pairs in each \(T \in \mathcal{L}\). Although this is not an instance of DegMat, the same approach works.

Overview of techniques: Our algorithm in Theorem 4 is similar to the Lovett–Meka algorithm, and is also based on performing a Gaussian random walk at each step in a suitably chosen subspace. However, there are some crucial differences. First, instead of updating each variable by the standard Gaussian N(0, 1), the variance for variable i is chosen proportional to min(y_i, 1 − y_i), i.e. proportional to how close it is to the boundary 0 or 1. This is crucial for getting the multiplicative error instead of the additive error in the constraints. However, this slows down the “progress” of variables toward reaching 0 or 1. To get around this, we add O(log n) additional constraints to define the subspace where the walk is performed: these restrict the total fractional value of variables in a particular “scale” to remain fixed. Using these we can ensure that enough variables eventually reach 0 or 1.

In order to handle the matroid constraints (Theorem 5) we need to incorporate them (although they are exponentially many) in defining the subspace where the random walk is performed. One difficulty that arises here is that we can no longer implement the random walk using “near tight” constraints as in [15] since we are unable to bound the dimension of near-tight matroid constraints. However, as is well known, the dimension of exactly tight matroid constraints is at most n∕2 at any (strictly) fractional solution, and so we implement the random walk using exactly tight constraints. This requires us to truncate certain steps in the random walk (when we move out of the polytope), but we show that the effect of such truncations is negligible.

2 Matroid Partial Rounding

In this section we will prove Theorem 5 which also implies Theorem 4.

We may assume, without loss of generality, that \(\max _{j=1}^{m}\lambda _{j} \leq n\). This is because setting μ_j = min{λ_j, n} we have \(\sum _{j=1}^{m}e^{-\mu _{j}^{2}/K_{0}} \leq \sum _{j=1}^{m}e^{-\lambda _{j}^{2}/K_{0}} + m \cdot e^{-n} <\frac{n} {16} + 1\) (we used the assumption m ≤ 2^n). So we can apply Theorem 5 with the μ_j's instead of the λ_j's to obtain a stronger result.

Let \(y \in \mathbb{R}^{n}\) denote the initial solution. The algorithm will start with X_0 = y and update this vector over time. Let X_t denote the vector at time t for t = 1, …, T. The value of T will be defined later. Let ℓ = 3⌈log_2 n⌉. We classify the n elements into 2ℓ classes based on their initial values y(i) as follows.

$$\begin{array}{c} U_{k}:= \left \{\begin{array}{ll} \left \{i \in [n]: 2^{-k-1} <y(i) \leq 2^{-k}\right \}\mbox{ if }1 \leq k \leq \ell-1 \\ \left \{i \in [n]: y(i) \leq 2^{-\ell}\right \} \mbox{ if }k =\ell. \end{array} \right. \\ V _{k}:= \left \{\begin{array}{ll} \left \{i \in [n]: 2^{-k-1} <1 - y(i) \leq 2^{-k}\right \}&\mbox{ if }1 \leq k \leq \ell-1 \\ \left \{i \in [n]: 1 - y(i) \leq 2^{-\ell}\right \} \mbox{ if }k =\ell. \end{array} \right.\end{array}$$

Note that the U_k's partition the elements of value (in y) between 0 and \(\frac{1} {2}\), and the V_k's form a symmetric partition of the elements valued between \(\frac{1} {2}\) and 1. This partition does not change over time, even though the values of the variables might change. We define the “scale” of each element as:

$$\displaystyle{s_{i}\,\,:=\,\, 2^{-k},\qquad \forall i \in U_{ k} \cup V _{k},\quad \forall k \in [\ell].}$$
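As a small illustration (our code, not part of the paper), the scales s_i can be computed directly from the initial point y:

```python
import math
import numpy as np

def scale_vector(y):
    """Compute the scale s_i = 2^{-k} for each element, where k is the index of
    the class U_k or V_k containing i, following the classification above with
    ell = 3 * ceil(log2 n); values within 2^{-ell} of 0 or 1 fall in class ell."""
    n = len(y)
    ell = 3 * math.ceil(math.log2(n))
    s = np.empty(n)
    for i, v in enumerate(y):
        dist = min(v, 1.0 - v)                 # distance to the nearer of 0 and 1
        if dist <= 2.0 ** (-ell):
            k = ell
        else:
            k = min(ell - 1, int(math.floor(-math.log2(dist))))
        s[i] = 2.0 ** (-k)
    return s
```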

Define \(W_{j}(s) =\sum _{i=1}^{n}a_{ji}^{2} \cdot s_{i}^{2}\) for each j ∈ [m]. Note that W_j(s) ≥ W_j(y) and

$$\displaystyle{W_{j}(s) - 4 \cdot W_{j}(y) \leq \sum _{i=1}^{n}a_{ ji}^{2} \cdot \frac{1} {n^{6}} = \frac{\|a_{j}\|^{2}} {n^{6}}.}$$

So \(\sqrt{ W_{j}(y)} \leq \sqrt{W_{j } (s)} \leq 2\sqrt{W_{j } (y)} + \frac{\|a_{j}\|} {n^{3}}\). Our algorithm will find a solution y′ with \(\Omega (n)\) integral variables such that:

$$\displaystyle{\vert \langle y^{{\prime}}- y,a_{ j}\rangle \vert \leq \lambda _{j} \cdot \sqrt{W_{j } (s)} + \frac{1} {n^{3}} \cdot \| a_{j}\|,\quad \forall j \in [m].}$$

This suffices to prove Theorem 5 as

$$\displaystyle{\lambda _{j} \cdot \sqrt{W_{j } (s)} + \frac{1} {n^{3}} \cdot \| a_{j}\| \leq 2\lambda _{j} \cdot \sqrt{W_{j } (y)} + \left ( \frac{1} {n^{3}} + \frac{\lambda _{j}} {n^{3}}\right ) \cdot \| a_{j}\| \leq 2\lambda _{j} \cdot \sqrt{W_{j } (y)} + \frac{\|a_{j}\|} {n^{2}}.}$$

Consider the polytope \(\mathcal{Q}\) of points \(x \in \mathbb{R}^{n}\) satisfying the following constraints.

$$\displaystyle\begin{array}{rcl} x& \in & P(\mathcal{M}),{}\end{array}$$
(8)
$$\displaystyle\begin{array}{rcl} \vert \langle x - y,a_{j}\rangle \vert & \leq & \lambda _{j} \cdot \sqrt{W_{j } (s)} + \frac{1} {n^{3}} \cdot \| a_{j}\|\qquad \forall j \in [m],{}\end{array}$$
(9)
$$\displaystyle\begin{array}{rcl} \sum _{i\in U_{k}}x_{i}& =& \sum _{i\in U_{k}}y_{i}\qquad \qquad \qquad \qquad \qquad \forall k \in [\ell],{}\end{array}$$
(10)
$$\displaystyle\begin{array}{rcl} \sum _{i\in V _{k}}x_{i}& =& \sum _{i\in V _{k}}y_{i}\qquad \qquad \qquad \qquad \qquad \forall k \in [\ell],{}\end{array}$$
(11)
$$\displaystyle\begin{array}{rcl} 0\,\,\, \leq \,\,\, x_{i}& \leq & \min \{\alpha \cdot 2^{-k},1\}\qquad \qquad \qquad \forall i \in U_{ k},\,\forall k \in [\ell],{}\end{array}$$
(12)
$$\displaystyle\begin{array}{rcl} 0\,\,\, \leq \,\,\, 1 - x_{i}& \leq & \min \{\alpha \cdot 2^{-k},1\}\qquad \qquad \qquad \forall i \in V _{ k},\,\forall k \in [\ell].{}\end{array}$$
(13)

Above, α = 40 is a constant whose choice will be clear later. The algorithm will maintain the invariant that at any time t ∈ [T], the solution X_t lies in \(\mathcal{Q}\). In particular, the constraint (8) requires that X_t stays in the matroid polytope. Constraint (9) controls the violation of the side constraints over all time steps. The last two constraints (12) and (13) enforce that variables in U_k (and symmetrically V_k) do not deviate far beyond their original scale of 2^{-k}. The constraints (10) and (11) ensure that throughout the algorithm, the total value of the elements in U_k (and V_k) stays equal to its initial sum (in y). These constraints will play a crucial role in arguing that the algorithm finds a partial coloring. Note that there are only 2ℓ such constraints.

In order to deal with complexity issues, we will assume (without loss of generality, by scaling) that all entries in the constraints describing \(\mathcal{Q}\) are integers bounded by some value B. Our algorithm will then run in time polynomial in n, m and log_2 B, given an independent set oracle for the matroid \(\mathcal{M}\). Also, our algorithm will only deal with points having rational entries of small “size”. Recall that the size of a rational number is the number of bits needed to represent it, i.e. the size of p∕q (where \(p,q \in \mathbb{Z}\)) is log_2 |p| + log_2 |q|.

The Algorithm: Let γ = n^{-6} and \(T = K/\gamma ^{2}\) where K := 10α^2. The algorithm starts with solution \(X_{0} = y \in \mathcal{Q}\), and does the following at each time step t = 0, 1, …, T:

  1. Consider the set of constraints of \(\mathcal{Q}\) that are tight at the point x = X_t, and define the following sets based on this.

     (a) Let \(\mathcal{C}_{t}^{var}\) be the set of tight variable constraints among (12) and (13). This consists of:

         i. i ∈ U_k (for any k) with X_t(i) = 0 or X_t(i) = min{α ⋅ 2^{-k}, 1}; and

         ii. i ∈ V_k (for any k) with X_t(i) = 1 or X_t(i) = max{1 − α ⋅ 2^{-k}, 0}.

     (b) Let \(\mathcal{C}_{t}^{side}\) be the set of tight side constraints from (9), i.e. those j ∈ [m] with

      $$\displaystyle{\vert \langle X^{t} - y,a_{ j}\rangle \vert =\lambda _{j} \cdot \sqrt{W_{j } (s)} + \frac{1} {n^{3}}\,\|a_{j}\|.}$$
     (c) Let \(\mathcal{C}_{t}^{part}\) denote the set of the 2ℓ equality constraints (10) and (11).

     (d) Let \(\mathcal{C}_{t}^{rank}\) be a maximal linearly independent set of tight rank constraints from (8). As usual, a set of constraints is said to be linearly independent if the corresponding coefficient vectors are linearly independent. Since \(\mathcal{C}_{t}^{rank}\) is maximal, every tight rank constraint is a linear combination of constraints in \(\mathcal{C}_{t}^{rank}\). By Claim 2, \(\vert \mathcal{C}_{t}^{rank}\vert \leq n/2\).

  2. Let \(\mathcal{V}_{t}\) denote the subspace orthogonal to all the constraints in \(\mathcal{C}_{t}^{var}\), \(\mathcal{C}_{t}^{side}\), \(\mathcal{C}_{t}^{part}\) and \(\mathcal{C}_{t}^{rank}\). Let D be an n × n diagonal matrix with entries d_{ii} = 1∕s_i, and let \(\mathcal{V}_{t}^{{\prime}}\) be the subspace \(\mathcal{V}_{t}^{{\prime}} =\{ Dv: v \in \mathcal{V}_{t}\}\). As D is invertible, \(\mathrm{dim}(\mathcal{V}_{t}^{{\prime}}) =\mathrm{ dim}(\mathcal{V}_{t})\).

  3. Let \(\{b_{1},\ldots,b_{k}\}\) be an almost orthonormal basis of \(\mathcal{V}_{t}^{{\prime}}\) given by Fact 2. Note that all entries in these vectors are rationals of size O(n^2 log B).

  4. Let G_t be a random direction defined as \(G_{t}:=\sum _{h=1}^{k}g_{h}\,b_{h}\), where the g_h are independent {−1, +1} Bernoulli random variables.

  5. Let \(\overline{G}_{t}:= D^{-1}G_{t}\). As \(G_{t} \in \mathcal{V}_{t}^{{\prime}}\), it must be that G_t = Dv for some \(v \in \mathcal{V}_{t}\) and thus \(\overline{G}_{t} = D^{-1}G_{t} \in \mathcal{V}_{t}\). Note that all entries in \(\overline{G}_{t}\) are rationals of size O(n^3 log B).

  6. Set \(Y _{t} = X_{t} +\gamma \cdot \overline{G}_{t}\).

     (a) If \(Y _{t} \in \mathcal{Q}\) then set X_{t+1} ← Y_t and continue to the next iteration.

     (b) Else, let X_{t+1} be the point in \(\mathcal{Q}\) that lies on the line segment (X_t, Y_t) and is closest to Y_t. This can be found by binary search and testing membership in the matroid polytope. By Claim 1, the number of steps in the binary search is at most O(n^4 log B).

This completes the description of the algorithm. We actually do not need to compute the tight constraints from scratch in each iteration. We start the algorithm off with a strictly feasible solution \(y \in \mathcal{Q}\) which does not have any tight constraint other than (10) and (11). Then, the only place a new constraint gets tight is Step 6b: at this point, we add the new constraint to the appropriate set among \(\mathcal{C}_{t}^{var}\), \(\mathcal{C}_{t}^{side}\) and \(\mathcal{C}_{t}^{rank}\) and continue.
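To make the update step concrete, here is a simplified numpy sketch (ours, not the paper's implementation) of a single step; it only handles the tight constraints passed in as rows of a matrix, omits the truncation of Step 6b, and all names are our own.

```python
import numpy as np

def walk_step(X, s, tight_rows, gamma, rng):
    """One simplified random-walk step of the algorithm above: form the matrix
    of currently tight constraints (tight matroid rank constraints from Step
    1(d) would be appended as extra rows), take a random +/-1 combination of an
    orthonormal basis of V_t' (the null space after rescaling by D = diag(1/s)),
    and move by gamma * D^{-1} G_t. The truncation of Step 6b is omitted here."""
    n = len(X)
    Dinv = np.diag(s)                           # D^{-1} has diagonal entries s_i
    if len(tight_rows) > 0:
        M = np.asarray(tight_rows, dtype=float) @ Dinv
        _, sv, vt = np.linalg.svd(M)
        rank = int(np.sum(sv > 1e-9))
        basis = vt[rank:]                       # orthonormal basis of V_t'
    else:
        basis = np.eye(n)
    if basis.shape[0] == 0:
        return X                                # subspace is trivial; no move
    g = rng.choice([-1.0, 1.0], size=basis.shape[0])
    G = g @ basis                               # random +/-1 combination: G_t
    return X + gamma * (Dinv @ G)               # Y_t = X_t + gamma * D^{-1} G_t
```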

In order to keep the analysis clean and convey the main ideas, we will assume that the basis {b 1, , b k } in Step 3 is exactly orthonormal. When the basis is “almost orthonormal” as given in Fact 2, the additional error incurred is negligible.

Running Time: Since the number of iterations is polynomial, we only need to show that each of the steps in any single iteration can be implemented in polynomial time. The only step that requires justification is Step 6b, which is handled in Claim 1. Moreover, we need to ensure that all points considered in the algorithm have rational coefficients of polynomial size. This is done by a rounding procedure (see Fact 1) that, given an arbitrary point, finds a nearby rational point of size O(n^2 log B). Since the number of steps in the algorithm is polynomial, the total error incurred by such rounding steps is small.

Claim 1

The number of binary search iterations performed in Step 6b is O(n^4 log B).

Proof

To reduce notation, let a = X_t, \(d =\gamma \overline{G}_{t}\) and \(Y (\mu ):= a +\mu \cdot d \in \mathbb{R}^{n}\) for any \(\mu \in \mathbb{R}\). Recall that Step 6b involves finding the maximum value of μ such that the point \(Y (\mu ) \in \mathcal{Q}\).

By the rounding procedure (Fact 1) we know that a has rational entries of size O(n^2 log B).

We now show that the direction d has rational entries of size O(n^3 log B). This is because (i) the basis vectors {b_1, ⋯, b_k} have rational entries of size O(n^2 log B) by Fact 2, (ii) \(G_{t} =\sum _{h=1}^{k}g_{h} \cdot b_{h}\) (where each g_h = ±1) has rational entries of size O(n^3 log B) and (iii) \(\overline{G}_{t} = D^{-1}G_{t}\) where D^{-1} is a diagonal matrix with rational entries of size O(n log B).

Next, observe that for any constraint 〈a′, x〉 ≤ β in \(\mathcal{Q}\), the point of intersection of the hyperplane 〈a′, x〉 = β with the line \(\{Y (\mu ):\mu \in \mathbb{R}\}\) is \(\mu = \frac{\beta -\langle a^{{\prime}},a\rangle } {\langle a^{{\prime}},d\rangle }\), which is a rational of size at most σ = O(n^4 log B) as a′, a, d, β all have rational entries of size O(n^3 log B). Let ε = 2^{−2σ} be a value such that the difference between any two distinct rationals of size at most σ is more than ε.

In Step 6b, we start the binary search with the interval [0, 1] for μ, where \(Y (0) \in \mathcal{Q}\) and \(Y (1)\not\in \mathcal{Q}\). We perform this binary search until the interval width falls below ε, which requires \(\log _{2}\frac{1} {\epsilon } = O(n^{4}\log B)\) iterations. At the end, we have two values μ_0 < μ_1 with μ_1 − μ_0 < ε such that \(Y (\mu _{0}) \in \mathcal{Q}\) and \(Y (\mu _{1})\not\in \mathcal{Q}\). Moreover, we obtain a constraint 〈a′, x〉 ≤ β in \(\mathcal{Q}\) that is not satisfied by Y(μ_1). We set μ′ to be the (unique) value such that Y(μ′) satisfies this constraint at equality, and set X_{t+1} = Y(μ′). Note that μ_0 ≤ μ′ < μ_1. To see that \(Y (\mu ^{{\prime}}) \in \mathcal{Q}\), suppose (for contradiction) that some constraint in \(\mathcal{Q}\) is not satisfied at Y(μ′); then the point of intersection of the line \(\{Y (\mu ):\mu \in \mathbb{R}\}\) with this constraint must be at some μ″ ∈ [μ_0, μ′), which (by the choice of ε) cannot be a rational of size at most σ; a contradiction. □
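Schematically, the binary search of Step 6b looks as follows (our sketch; `in_Q` and the final snapping onto the violated constraint are left abstract):

```python
def truncate_to_Q(X, Y, in_Q, eps):
    """Binary search on the segment [X, Y] for (approximately) the last point
    inside Q, as in Step 6b; `in_Q` stands for the membership test for Q
    (including the matroid independence oracle) and eps plays the role of the
    resolution 2^{-2*sigma} from Claim 1. The exact algorithm then snaps the
    answer onto the violated constraint; here we simply return the lower end."""
    lo, hi = 0.0, 1.0          # in_Q(X) is assumed to hold and in_Q(Y) to fail
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if in_Q(X + mid * (Y - X)):   # X, Y assumed to be numpy arrays
            lo = mid
        else:
            hi = mid
    return X + lo * (Y - X)
```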

Analysis: The analysis involves proving the following main lemma.

Lemma 1

With constant probability, the final solution X T has \(\vert \mathcal{C}_{T}^{var}\vert \geq \frac{n} {20}\) .

We first show how this implies Theorem 5.

Proof of Theorem 5 from Lemma 1

The algorithm outputs the solution y′ := X_T. By design the algorithm ensures that \(X_{T} \in \mathcal{Q}\), and thus \(X_{T} \in P(\mathcal{M})\) and it satisfies the error bounds (9) on the side constraints. It remains to show that \(\Omega (n)\) variables in X_T must be integer valued whenever \(\vert \mathcal{C}_{T}^{var}\vert \geq \frac{n} {20}\). For each k ∈ [ℓ] define \(u_{k}:= \vert \{i \in U_{k}: X_{T}(i) =\alpha \cdot 2^{-k}\}\vert\) and \(v_{k}:= \vert \{i \in V _{k}: X_{T}(i) = 1 -\alpha \cdot 2^{-k}\}\vert\). By the equality constraints (10) for U_k, it follows that

$$\displaystyle{u_{k} \cdot \alpha \cdot 2^{-k} \leq \sum _{ i\in U_{k}}X_{T}(i) = X_{T}(U_{k}) = y(U_{k}) \leq \vert U_{k}\vert \cdot 2^{-k}.}$$

This gives \(u_{k} \leq \frac{1} {\alpha } \vert U_{k}\vert\). Similarly, \(v_{k} \leq \frac{1} {\alpha } \vert V _{k}\vert\). This implies that \(\sum _{k=1}^{\ell}(u_{k} + v_{k}) \leq n/\alpha\). As the tight variables in \(\mathcal{C}_{t}^{var}\) have values either 0 or 1 or α ⋅ 2^{-k} or 1 − α ⋅ 2^{-k}, it follows that the number of {0, 1} variables is at least

$$\displaystyle{\vert \mathcal{C}_{t}^{var}\vert -\sum _{ k=1}^{\ell}(u_{ k} + v_{k}) \geq \left (\vert \mathcal{C}_{t}^{var}\vert -\frac{n} {\alpha } \right ) \geq \left ( \frac{1} {20} -\frac{1} {\alpha } \right )n}$$

which is at least n∕40 by choosing α = 40. □

In the rest of this section we prove Lemma 1.

Claim 2

Given any \(x \in P(\mathcal{M})\) with 0 < x < 1 , the maximum number of tight linearly independent rank constraints is n∕2.

Proof

Recall that a tight constraint in \(P(\mathcal{M})\) is any subset T ⊆ V with \(\sum _{i\in T}x_{i} = r(T)\). The claim follows from the known property (see e.g. [20]) that for any \(x \in P(\mathcal{M})\) there is a linearly independent collection \(\mathcal{C}\) of tight constraints such that (i) \(\mathcal{C}\) spans all tight constraints and (ii) \(\mathcal{C}\) forms a chain family. Since all right-hand sides are integers and each variable is strictly between 0 and 1, it follows that \(\vert \mathcal{C}\vert \leq \frac{n} {2}\). □

Claim 3

The truncation Step 6b occurs at most n times.

Proof

We will show that whenever Step 6b occurs (i.e. the random move gets truncated) the dimension \(dim(\mathcal{V}_{t+1})\) decreases by at least 1, i.e. \(dim(\mathcal{V}_{t+1}) \leq dim(\mathcal{V}_{t}) - 1\). As the maximum dimension is n this would imply the claim.

Let \(\mathcal{E}_{t}\) denote the subspace spanned by all the tight constraints of \(X_{t} \in \mathcal{Q}\). Recall that \(\mathcal{V}_{t} = \mathcal{E}_{t}^{\perp }\) is the subspace orthogonal to \(\mathcal{E}_{t}\), and thus \(dim(\mathcal{E}_{t}) = n - dim(\mathcal{V}_{t})\). We also have \(\mathcal{E}_{0} \subseteq \mathcal{E}_{1} \subseteq \cdots \subseteq \mathcal{E}_{T}\). Suppose that Step 6b occurs in iteration t. Then we have \(X_{t} \in \mathcal{Q}\), \(Y _{t}\not\in \mathcal{Q}\) and \(Y _{t} - X_{t} \in \mathcal{V}_{t}\). Moreover \(X_{t+1} = X_{t} +\epsilon (Y _{t} - X_{t}) \in \mathcal{Q}\), where ε ∈ [0, 1) is such that \(X_{t} +\epsilon ^{{\prime}}(Y _{t} - X_{t})\not\in \mathcal{Q}\) for all ε′ > ε. So there is some constraint 〈a′, x〉 ≤ β in \(\mathcal{Q}\) with:

$$\displaystyle{\langle a^{{\prime}},X_{ t}\rangle \leq \beta,\quad \langle a^{{\prime}},X_{ t+1}\rangle =\beta \quad \mbox{ and}\quad \langle a^{{\prime}},Y _{ t}\rangle>\beta.}$$

Since this constraint satisfies 〈a′, Y_t − X_t〉 > 0 and \(Y _{t} - X_{t} \in \mathcal{V}_{t}\), we have \(a^{{\prime}}\not\in \mathcal{E}_{t}\). As a′ is added to \(\mathcal{E}_{t+1}\), we have \(dim(\mathcal{E}_{t+1}) \geq 1 + dim(\mathcal{E}_{t})\). This proves the desired property and the claim. □

The statements of the following two lemmas are similar to those in [15], but the proofs require additional work since our random walk is different. The first lemma shows that the expected number of tight side constraints at the end of the algorithm is not too high, and the second lemma shows that the expected number of tight variable constraints is large.

Lemma 2

\(\mathbb{E}[\vert \mathcal{C}_{T}^{side}\vert ] <\frac{n} {4}\) .

Proof

Note that \(X_{T} - y =\gamma \sum _{ t=0}^{T}\overline{G}_{t} +\sum _{ q=1}^{n}\Delta _{t(q)}\), where the \(\Delta\)'s correspond to the truncation incurred during the iterations t = t(1), ⋯, t(n) for which Step 6b applies (by Claim 3 there are at most n such iterations). Moreover, for each q, \(\Delta _{t(q)} =\delta \cdot \overline{G}_{t(q)}\) for some δ with 0 < |δ| < γ.

If \(j \in \mathcal{C}_{T}^{side}\), then \(\vert \langle X_{T} - y,a_{j}\rangle \vert =\lambda _{j}\sqrt{W_{j } (s)} + \frac{1} {n^{3}} \cdot \| a_{j}\|\). We have

$$\displaystyle{ \vert \langle X_{T} - y,a_{j}\rangle \vert \,\,\leq \,\,\Big\vert \gamma \sum _{t=0}^{T}\langle \overline{G}_{t},a_{j}\rangle \Big\vert +\sum _{q=1}^{n}\gamma \,\vert \langle \overline{G}_{t(q)},a_{j}\rangle \vert \,\,\leq \,\,\Big\vert \gamma \sum _{t=0}^{T}\langle \overline{G}_{t},a_{j}\rangle \Big\vert + n\gamma \cdot \max _{t=0}^{T}\vert \langle \overline{G}_{t},a_{j}\rangle \vert. }$$

Note that at any iteration t,

$$\displaystyle{\vert \langle \overline{G}_{t},a_{j}\rangle \vert = \vert \langle D^{-1}G_{ t},a_{j}\rangle \vert \leq \vert \langle G_{t},a_{j}\rangle \vert \leq \sum _{h=1}^{k}\vert \langle b_{ h},a_{j}\rangle \vert \leq n\|a_{j}\|.}$$

The first inequality above uses that D^{-1} is a diagonal matrix with entries at most one, the second inequality is by the definition of G_t where {b_h} is an orthonormal basis of \(\mathcal{V}_{t}^{{\prime}}\), and the last inequality uses that each b_h is a unit vector and that k ≤ n. As γ = n^{-6}, we have \(n\gamma \cdot \max _{t=0}^{T}\vert \langle \overline{G}_{t},a_{j}\rangle \vert \leq \| a_{j}\|/n^{4}\). So it follows that if \(j \in \mathcal{C}_{T}^{side}\), then we must have:

$$\displaystyle{\vert \gamma \sum _{t=0}^{T}\langle \overline{G}_{ t},a_{j}\rangle \vert \,\,\geq \,\,\lambda _{j}\sqrt{W_{j } (s)}.}$$

In order to bound the probability of this event, we consider the sequence {Z_t} where \(Z_{t} =\langle \overline{G}_{t},a_{j}\rangle\), and note the following useful facts.

Observation 1

The sequence {Z t } forms a martingale satisfying:

  1. \(\mathbb{E}\left [Z_{t}\mid Z_{t-1},\ldots,Z_{0}\right ] = 0\) for all t.

  2. \(\vert Z_{t}\vert \leq n\,\|a_{j}\|\) for all t.

  3. \(\mathbb{E}\left [Z_{t}^{2}\mid Z_{t-1},\ldots,Z_{0}\right ] \leq \sum _{i=1}^{n}s_{i}^{2} \cdot a_{ji}^{2} = W_{j}(s)\) for all t.

Proof

As \(\overline{G}_{t} = D^{-1}\sum _{h=1}^{k}g_{h} \cdot b_{h}\) where each \(\mathbb{E}[g_{h}] = 0\), we have \(\mathbb{E}[\overline{G}_{t}\vert \overline{G}_{0},\ldots,\overline{G}_{t-1}]\,=\,\mathbf{0}\). Note that \(\overline{G}_{t}\) is not independent of \(\overline{G}_{0},\ldots,\overline{G}_{t-1}\), as these choices determine the subspace where \(\overline{G}_{t}\) lies. So {Z_t} forms a martingale sequence with the first property.

For the remaining two properties, we fix j ∈ [m] and t and condition on Z_0, …, Z_{t−1}. To reduce notation we drop all subscripts: so a = a_j, G = G_t, \(\mathcal{V}^{{\prime}} = \mathcal{V}_{t}^{{\prime}}\) and Z = Z_t.

Let {b_r} denote an orthonormal basis for the linear subspace \(\mathcal{V}^{{\prime}}\). Then \(G =\sum _{r}g_{r} \cdot b_{r}\), where the g_r are i.i.d. ±1 with probability \(\frac{1} {2}\) each. As \(\overline{G} = D^{-1}G\), we have \(Z =\langle \overline{G},a\rangle =\sum _{r}\langle D^{-1}b_{r},a\rangle \,g_{r} =\sum _{r}\langle D^{-1}a,b_{r}\rangle \,g_{r}\). So, we can bound

$$\displaystyle{\vert Z\vert \,\,\leq \,\,\sum _{r}\vert \langle D^{-1}a,b_{ r}\rangle \vert \cdot \vert g_{r}\vert \,\,\leq \,\,\| D^{-1}a\|\sum _{ r}\vert g_{r}\vert \,\,\leq \,\, n\|a\|.}$$

The first inequality follows from the triangle inequality, the second by Cauchy–Schwarz and as b_r is a unit vector, and the third follows as D^{-1} is a diagonal matrix with entries at most one and there are at most n terms in the sum. This proves property 2.

Finally, \(\mathbb{E}[Z^{2}] =\sum _{r}\langle D^{-1}a,b_{r}\rangle ^{2}\,\mathbb{E}[g_{r}^{2}] =\sum _{r}\langle D^{-1}a,b_{r}\rangle ^{2} \leq \| D^{-1}a\|^{2}\), where the last step follows as {b r } is an orthonormal basis for a subspace of \(\mathbb{R}^{n}\). This proves property 3. □

Using a martingale concentration inequality, we obtain:

Claim 4

\(\Pr \left [\vert \gamma \sum _{t=0}^{T}\langle \overline{G}_{t},a_{j}\rangle \vert \,\geq \,\lambda _{j}\sqrt{W_{j } (s)}\right ] =\Pr \left [\vert \sum _{t=0}^{T}Z_{t}\vert \,\geq \, \frac{\lambda _{j}} {\gamma } \sqrt{W_{j } (s)}\right ] \leq 2 \cdot \exp (-\lambda _{j}^{2}/3K)\) .

Proof

The first equality is by definition of the Z t s. We now use the following concentration inequality:

Theorem 6 (Freedman [7], Theorem 1.6)

Consider a real-valued martingale sequence {Z_t}_{t ≥ 0} such that Z_0 = 0, \(\mathbb{E}\left [Z_{t}\mid Z_{t-1},\ldots,Z_{0}\right ] = 0\) for all t, and |Z_t| ≤ M almost surely for all t. Let \(W_{t} =\sum _{ j=0}^{t}\mathbb{E}\left [Z_{j}^{2}\,\mid \,Z_{j-1},Z_{j-2},\ldots Z_{0}\right ]\) for all t ≥ 1. Then for all ℓ ≥ 0 and σ^2 > 0, and any stopping time τ, we have

$$\displaystyle{\Pr \left [\vert \sum _{j=0}^{\tau }Z_{ j}\vert \geq \ell\,\, and\,\,W_{\tau } \leq \sigma ^{2}\right ]\quad \leq \quad 2\exp \left (- \frac{\ell^{2}/2} {\sigma ^{2} + M\ell/3}\right )}$$

We apply this with M = n∥a_j∥, \(\ell= \frac{\lambda _{j}} {\gamma } \sqrt{W_{j } (s)}\), σ^2 = T ⋅ W_j(s) and τ = T. Note that

$$\displaystyle{ \frac{\ell^{2}} {2\sigma ^{2} + \frac{2} {3}M\ell}\quad = \quad \frac{\lambda _{j}^{2}} {2\gamma ^{2}T + \frac{2} {3}\gamma n\|a_{j}\|\lambda _{j}/\sqrt{W_{j } (s)}}\quad \geq \quad \frac{\lambda _{j}^{2}} {2\gamma ^{2}T + 1},}$$

where the last inequality uses \(W_{j}(s) \geq \| a_{j}\|^{2}/n^{6}\), λ_j ≤ n and γ = n^{-6}. Thus

$$\displaystyle{\Pr \left [\vert \gamma \sum _{t=0}^{T}\langle \overline{G}_{ t},a_{j}\rangle \vert \,\,\geq \,\,\lambda _{j}\sqrt{W_{j } (s)}\right ] \leq 2\exp \left ( \frac{-\lambda _{j}^{2}} {2\gamma ^{2}T + 1}\right ) \leq 2 \cdot \exp (-\lambda _{j}^{2}/3K).}$$

The last inequality uses \(T = K/\gamma ^{2}\) and K ≥ 1. This completes the proof of the claim. □

By the above claim, we have \(\mathbb{E}[\vert \mathcal{C}_{T}^{side}\vert ] <2\sum _{j=1}^{m}\exp (-\lambda _{j}^{2}/(30\alpha ^{2})) <0.25n\). This completes the proof of Lemma 2. □

We now prove that in expectation, at least 0.1n variables become tight at the end of the algorithm. This immediately implies Lemma 1.

Lemma 3

\(\mathbb{E}[\vert \mathcal{C}_{T}^{var}\vert ] \geq 0.1n\) .

Proof

Define the following potential function, which will measure the progress of the algorithm toward the variables becoming tight.

$$\displaystyle{\Phi (x)\quad:= \quad \sum _{k=1}^{\ell}2^{2k} \cdot \left (\sum _{ i\in U_{k}}x(i)^{2} +\sum _{ i\in V _{k}}(1 - x(i))^{2}\right ),\qquad \forall x \in \mathcal{Q}.}$$

Note that since \(X_{T} \in \mathcal{Q}\), we have X_T(i) ≤ α ⋅ 2^{-k} for i ∈ U_k and 1 − X_T(i) ≤ α ⋅ 2^{-k} for i ∈ V_k. So \(\Phi (X_{T}) \leq \alpha ^{2} \cdot n\). We also define the “incremental function” for any \(x \in \mathcal{Q}\) and \(g \in \mathbb{R}^{n}\) as \(f(x,g):= \Phi (x +\gamma D^{-1}g) - \Phi (x)\). Recall that D^{-1} is the n × n diagonal matrix with entries (s_1, …, s_n) where s_i = 2^{-k} for i ∈ U_k ∪ V_k. So

$$\displaystyle\begin{array}{rcl} f(x,g)& =& \gamma ^{2}\sum _{ i=1}^{n}g(i)^{2} + 2\sum _{ k=1}^{\ell}2^{2k} \cdot \left (\sum _{ i\in U_{k}}x(i)\gamma s_{i} \cdot g(i) -\sum _{i\in V _{k}}(1 - x(i))\gamma s_{i} \cdot g(i)\right ) {}\\ & =& \gamma ^{2}\sum _{ i=1}^{n}g(i)^{2} + 2\gamma \sum _{ k=1}^{\ell}\left (\sum _{ i\in U_{k}}\frac{x(i)g(i)} {s_{i}} -\sum _{i\in V _{k}}\frac{(1 - x(i))g(i)} {s_{i}} \right ) {}\\ \end{array}$$

Suppose the algorithm were modified so as never to perform the truncation Step 6b; then in any iteration t, the increase \(\Phi (Y _{t}) - \Phi (X_{t}) = f(X_{t},G_{t})\), where G_t is the random direction chosen in \(\mathcal{V}_{t}^{{\prime}}\). The following is by simple calculation.

$$\displaystyle\begin{array}{rcl} f(X_{t},G_{t}) - f(X_{t},\delta G_{t})& =& \gamma ^{2}(1 -\delta ^{2})\|G_{ t}\|_{2}^{2} + 2\gamma (1-\delta ) \\ & & \sum _{k=1}^{\ell}\left (\sum _{ i\in U_{k}}\frac{X_{t}(i)} {s_{i}} \cdot G_{t}(i) -\sum _{i\in V _{k}}\frac{1 - X_{t}(i)} {s_{i}} \cdot G_{t}(i)\right ) \\ & \leq & \gamma ^{2}(1 -\delta ^{2})\|G_{ t}\|_{2}^{2} + 2\alpha \gamma (1-\delta )\sum _{ i=1}^{n}\vert G_{ t}(i)\vert \\ &\leq & \gamma ^{2}\|G_{ t}\|_{2}^{2} + 2\gamma \alpha \|G_{ t}\|_{1} \\ & \leq & \gamma ^{2}n + 2\gamma \alpha n^{3/2}\quad \leq \quad \frac{1} {n} {}\end{array}$$
(14)

The first inequality in (14) uses the constraints (12) and (13), which give X_t(i) ≤ α ⋅ s_i for i ∈ U_k and 1 − X_t(i) ≤ α ⋅ s_i for i ∈ V_k; the third inequality uses the fact that G_t is the sum of at most n orthogonal unit vectors (with ±1 coefficients); and the last inequality uses γ = n^{-6} and α = O(1).

This implies that

$$\displaystyle\begin{array}{rcl} \Phi (X_{T}) - \Phi (X_{0})& =& \sum _{t=0}^{T}f(X_{ t},\delta _{t}G_{t})\, \geq \,\sum _{t=0}^{T}f(X_{ t},G_{t}) \\ & & -\frac{1} {n}\sum _{t=0}^{T}\mathbf{1}[\mbox{ step 6b occurs in iteration }t] \\ & \geq & \sum _{t=0}^{T}f(X_{ t},G_{t}) - 1\qquad \mbox{ (by Claim 3)} {}\end{array}$$
(15)

Claim 5

\(\mathbb{E}[\Phi (X_{T})] - \Phi (y) \geq \gamma ^{2}T \cdot \mathbb{E}[\mathrm{dim}(\mathcal{V}_{T})] - 1\) .

Proof

From (15) we have:

$$\displaystyle{ \mathbb{E}[\Phi (X_{T})] - \Phi (X_{0}) \geq \sum _{t=0}^{T}\mathbb{E}[\,f(X_{ t},G_{t})] - 1. }$$
(16)

In any iteration t, as \(G_{t} =\sum _{h=1}^{k}g_{h}\,b_{h}\) where {b_h} is an orthonormal basis for \(\mathcal{V}_{t}^{{\prime}}\) and g_h = ±1,

$$\displaystyle\begin{array}{rcl} \mathbb{E}[\,f(X_{t},G_{t})]& =& \gamma ^{2}\sum _{ i=1}^{n}\mathbb{E}[G_{ t}(i)^{2}] =\gamma ^{2}\sum _{ h=1}^{k}\|b_{ h}\|^{2} =\gamma ^{2}k =\gamma ^{2}\mathbb{E}[dim(\mathcal{V}_{ t}^{{\prime}})] {}\\ & =& \gamma ^{2}\mathbb{E}[dim(\mathcal{V}_{ t})]. {}\\ \end{array}$$

Moreover, because \(\mathcal{V}_{0} \supseteq \mathcal{V}_{1} \supseteq \cdots \supseteq \mathcal{V}_{T}\), we have \(\mathbb{E}[dim(\mathcal{V}_{t})] \geq \mathbb{E}[dim(\mathcal{V}_{T})]\). So

$$\displaystyle{ \sum _{t=0}^{T}\mathbb{E}[\,f(X_{ t},G_{t})]\quad \geq \quad \gamma ^{2}T \cdot \mathbb{E}[dim(\mathcal{V}_{ T})]. }$$
(17)

Combining (16) and (17), we complete the proof of Claim 5. □

By Claim 2 and the fact that \(\vert \mathcal{C}_{T}^{part}\vert = 2\ell\), we have

$$\displaystyle\begin{array}{rcl} \mathrm{dim}(\mathcal{V}_{T})& \geq & n -\mathrm{ dim}(\mathcal{C}_{T}^{var}) -\mathrm{ dim}(\mathcal{C}_{ T}^{side}) -\mathrm{ dim}(\mathcal{C}_{ T}^{rank}) -\mathrm{ dim}(\mathcal{C}_{ T}^{part}) {}\\ & \geq & \frac{n} {2} - 2\ell -\mathrm{ dim}(\mathcal{C}_{T}^{var}) -\mathrm{ dim}(\mathcal{C}_{ T}^{side}) {}\\ \end{array}$$

Taking expectations and using Lemma 2, this gives

$$\displaystyle{ \mathbb{E}[\mathrm{dim}(\mathcal{V}_{T})] \geq \frac{n} {4} - 2\ell -\mathbb{E}[\mathrm{dim}(\mathcal{C}_{T}^{var})] }$$
(18)

Using \(\Phi (X_{T}) \leq \alpha ^{2}n\) and Claim 5, we obtain:

$$\displaystyle{\alpha ^{2}n \geq \mathbb{E}[\Phi _{ T}] \geq \gamma ^{2}T \cdot \left (\frac{n} {4} - 2\ell - \mathbb{E}[\mathrm{dim}(\mathcal{C}_{T}^{var})]\right ) - 1.}$$

Rearranging and using \(T = K/\gamma ^{2}\), K = 10α^2 and ℓ = O(log n) gives that

$$\displaystyle{\mathbb{E}[\mathrm{dim}(\mathcal{C}_{T}^{var})] \geq \frac{n} {4} -\frac{\alpha ^{2}n} {K} - 2\ell - \frac{1} {K} \geq 0.1n,}$$

where we used K = 10α^2, α = 40 and ℓ = O(log n). This completes the proof of Lemma 3. □

3 Applications

3.1 Linear System Rounding with Violations

Consider a 0–1 integer program on n variables where each constraint j ∈ [m] requires that some subset S_j ⊆ [n] of the variables has total value \(b_{j} \in \mathbb{Z}_{+}\). That is,

$$\displaystyle{P\quad = \quad \left \{x \in \{ 0,1\}^{n}\quad: \quad \sum _{ i\in S_{j}}x_{i} = b_{j},\,\,\forall j \in [m]\right \}.}$$

Theorem 7

There is a randomized polynomial time algorithm that, given any fractional solution satisfying the constraints in P, finds an integer solution x ∈ {0, 1}^n where for each j ∈ [m],

$$\displaystyle\begin{array}{rcl} \vert x(S_{j}) - b_{j}\vert \quad & \leq & \quad O(1) \cdot \min \left \{\sqrt{j},\,\,\sqrt{n\log (m/n)},\,\,\sqrt{\log m\log n \cdot b_{j}}\right. {}\\ & & \left.+\log m\log n,\,\,\sqrt{\Delta }\log n\right \}. {}\\ \end{array}$$

Above \(\Delta =\max _{ i=1}^{n}\vert \{j \in [m]: i \in S_{j}\}\vert\) is the maximum number of constraints that any variable appears in.

Proof

Let y ∈ [0, 1]^n be a fractional solution with \(\sum _{i\in S_{j}}y_{i} = b_{j}\) for all j ∈ [m]. The algorithm in Theorem 7 uses Theorem 4 iteratively to obtain the integral solution x.

In each iteration, we start with a fractional solution y′ with f ≤ n fractional variables and set the parameters λ_j suitably so that \(\sum _{j=1}^{m}e^{-\lambda _{j}^{2}/K_{0}} \leq \frac{f} {16}\). That is, the condition in Theorem 4 is satisfied. Note that \(W_{j}(y^{{\prime}}) \leq \sum _{i\in S_{j}}(y_{i}^{{\prime}})^{2} \leq y^{{\prime}}(S_{j})\) and W_j(y′) ≤ f. Now, by applying Theorem 4, we obtain a new fractional solution y″ such that:

  • For each j ∈ [m], \(\vert y^{{\prime\prime}}(S_{j}) - y^{{\prime}}(S_{j})\vert \leq \lambda _{j}\sqrt{W_{j } (y^{{\prime} } )} + \frac{1} {n} \leq O(\lambda _{j}) \cdot \sqrt{f}\).

  • The number of fractional variables in y″ is at most \(\frac{f} {K}\) for some constant K > 1.

Therefore, after \(\frac{\log n} {\log K} = O(\log n)\) iterations we obtain a solution with O(1) fractional variables. Setting these fractional variables arbitrarily to 0–1 values, we obtain an integral solution x.
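Schematically, the outer loop reads as follows (our sketch; `partial_round` and `set_lambdas` are placeholders for one application of Theorem 4 and for the parameter settings described next, not implementations):

```python
def round_linear_system(A, y, set_lambdas, partial_round, tol=1e-9):
    """Outer loop of Theorem 7 (schematic): repeatedly apply Theorem 4 until
    only O(1) variables remain fractional, then fix those arbitrarily.
    `partial_round(A, x, lam)` is assumed to return a new fractional vector
    with a constant-factor fewer fractional coordinates."""
    x = list(y)
    while sum(1 for v in x if tol < v < 1 - tol) > 10:      # "O(1)" threshold
        f = sum(1 for v in x if tol < v < 1 - tol)
        lam = set_lambdas(f)          # lambda_j's satisfying the condition of Theorem 4
        x = partial_round(A, x, lam)  # one application of Theorem 4
    return [0.0 if v < 0.5 else 1.0 for v in x]
```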

Let us partition the constraints into sets M_1, M_2, M_3 and M_4 based on which of the four terms in Theorem 7 is minimum. That is, M_1 ⊆ [m] consists of the constraints j ∈ [m] where \(\sqrt{ j}\) is smaller than the other three terms; M_2, M_3, M_4 are defined similarly. Below we show how to set the parameters λ_j and bound the constraint violations for these parts separately.

Error bound of \(\min \{\sqrt{j},\,\sqrt{n\log (m/n)}\}\) for j ∈ M_1 ∪ M_2: In any iteration with f ≤ n fractional variables, we set the parameters λ_j in Theorem 4 as follows:

$$\displaystyle{\lambda _{j} = \left \{\begin{array}{ll} 0 &\mbox{ if }j <c_{1}f \\ \sqrt{c_{2 } \,\log \frac{j} {c_{1}f}} &\mbox{ if }j \geq c_{1}f \end{array} \right.}$$

Here c_1 and c_2 are constants that will be fixed later. Note that

$$\displaystyle\begin{array}{rcl} \sum _{j\in M_{1}\cup M_{2}}^{m}e^{-\lambda _{j}^{2}/K_{ 0}}\,& \leq & \,c_{1}f +\sum _{j\geq c_{ 1}f}e^{-\frac{c_{2}} {K_{0}} \log \frac{j} {c_{1}f} }\, \leq \, c_{1}f +\sum _{i\geq 0}2^{i}c_{1}f \cdot e^{-ic_{2}/K_{0}} {}\\ & \leq & \,c_{1}f + c_{1}f\sum _{i\geq 0}2^{-i}\, \leq \, 3c_{ 1}f, {}\\ \end{array}$$

which is at most f∕48 for c_1 < 1∕150. The second inequality above is obtained by bucketing the j's into intervals of the form [2^i ⋅ c_1 f, 2^{i+1} ⋅ c_1 f). The third inequality uses c_2 ≥ 2K_0.
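Transcribed directly (with hypothetical numerical values for the constants), this schedule reads:

```python
import math

def lambda_M12(j, f, c1=1/200, c2=8.0):
    """lambda_j for a constraint j in M1 or M2 when f variables are fractional:
    zero for j < c1*f, and sqrt(c2 * log(j / (c1*f))) otherwise. The values
    c1 < 1/150 and c2 >= 2*K0 are placeholders for the constants in the text."""
    if j < c1 * f:
        return 0.0
    return math.sqrt(c2 * math.log(j / (c1 * f)))
```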

We now bound the error incurred.

  1. Consider first a constraint j ≤ n. Note that λ_j stays zero until the number of fractional variables f drops below j∕c_1. So we can bound |x(S_j) − b_j| by:

    $$\displaystyle{\sum _{i\geq 0}\sqrt{c_{2 } \frac{j} {c_{1}K^{i}} \cdot \log K^{i}} \leq O(\sqrt{j})\,\sum _{i\geq 0}\sqrt{i}K^{-i/2} = O(\sqrt{j}),}$$

    where i indexes the iterations of the algorithm after f drops below j∕c_1 for the first time.

  2. Now consider a constraint j > n. Similarly, we bound |x(S_j) − b_j| by:

    $$\displaystyle{\sum _{i\geq 0}\sqrt{c_{2 } \frac{n} {K^{i}} \cdot \log ( \frac{j} {c_{1}n}K^{i})} \leq O(\sqrt{n\log (\,j/n)})\,\sum _{i\geq 0}\sqrt{i}K^{-i/2} = O(\sqrt{n\log (\,j/n)}).}$$

    Here i indexes the number of iterations of the algorithm from its start.

Error bound of \(\sqrt{L \cdot b_{j}} + L\) for j ∈ M_3, where \(L = \Theta (\log m\log n)\): Note that the additive term in this expression is at least L. If any b_j < L then we increase it to L (adding dummy elements to S_j to ensure y(S_j) = L); this only affects the error term by a constant factor as \(L \leq \sqrt{L \cdot b_{j}} + L \leq 2L\). So in the following we assume that \(\min _{j}b_{j} \geq L\).

Here we set λ_j = ∞ in all iterations, which satisfies \(\sum _{j\in M_{3}}e^{-\lambda _{j}^{2}/K_{0}} = 0\).

The analysis of the error incurred is similar to that in Lemma 2 and we only sketch the details; the main difference is that we analyze the deviation in a combined manner over all O(log n) iterations. Fix any constraint j ∈ [m]. If we ignore the error due to the truncation steps over all iterations (Footnote 4), then we can write \(\vert x(S_{j}) - b_{j}\vert = \vert \sum _{t=0}^{P}\gamma Z_{t}\vert\), where γ = n^{-6} and \(Z_{t} =\langle \overline{G}_{t},\mathbf{1}_{S_{j}}\rangle\); recall that each \(\overline{G}_{t} = D^{-1}G_{t}\) for the random direction G_t as in Step 4 of the algorithm in Sect. 2. Here P = O(log n∕γ^2), since there are O(log n) iterations and O(1∕γ^2) steps in each iteration. We will use the concentration inequality in Theorem 6 with the martingale {Z_t}_{t ≥ 0} and the stopping time τ being the first time t′ where \(\vert \sum _{t=0}^{t^{{\prime}} }Z_{t}\vert> \frac{1} {\gamma } \sqrt{Lb_{j}}\). Then it follows that at any step t′ before stopping, the current solution y′ satisfies \(y^{{\prime}}(S_{j}) - y(S_{j}) =\gamma \sum _{ t=0}^{t^{{\prime}} }Z_{t} \leq \sqrt{Lb_{j}} \leq b_{j}\) (using the assumption b_j ≥ L), i.e. y′(S_j) ≤ 2b_j. Now we can bound \(W_{\tau } \leq P \cdot O(b_{j}) = O(\log n/\gamma ^{2}) \cdot b_{j}\). Using Theorem 6 with \(\ell= \sqrt{Lb_{j}}/\gamma\), we obtain:

$$\displaystyle{\Pr \left [\vert \gamma \sum _{t=0}^{\tau }Z_{ t}\vert \,\,\geq \,\,\sqrt{Lb_{j}}\right ] \leq 2\exp \left ( \frac{-Lb_{j}} {O(\log n)b_{j}}\right ) \leq \frac{1} {m^{2}},}$$

by choosing a large enough constant in L = O(logmlogn). It follows that with probability at least 1 − m −2, we have τ = P and \(\vert x(S_{j}) - b_{j}\vert = \vert \sum _{t=0}^{P}\gamma Z_{t}\vert \leq \sqrt{L \cdot b_{j}}\). Finally, taking a union bound over | M 3 | ≤ m such events, we obtain that with high probability, \(\vert x(S_{j}) - b_{j}\vert \leq \sqrt{L \cdot b_{j}}\) for all jM 3.

Error bound of \(\sqrt{\Delta }\log n\) for j ∈ M_4: Here we set \(\lambda _{j} = \sqrt{K_{1 } \Delta }/\sqrt{\vert S_{j } \vert }\) in all iterations, where K_1 is a constant to be fixed later. We first bound \(\sum _{j\in M_{4}}e^{-\lambda _{j}^{2}/K_{0}}\). Note that when restricted to the f fractional variables in any iteration, \(\sum _{j=1}^{m}\vert S_{j}\vert \leq \Delta f\) since each variable appears in at most \(\Delta\) constraints. So the number of constraints with \(\vert S_{j}\vert> 64\Delta\) is at most \(\frac{f} {64}\). For h ≥ 0, the number of constraints with \(\vert S_{j}\vert \in [2^{-h-1}64\Delta,2^{-h}64\Delta )\) is at most \(2^{h+1} \frac{f} {64}\). So,

$$\displaystyle\begin{array}{rcl} \sum _{j\in M_{4}}e^{-\lambda _{j}^{2}/K_{ 0}}& \leq & \frac{f} {64} +\sum _{ h=0}^{\infty }2^{h+1} \frac{f} {64}\exp \left ( \frac{-K_{1}\Delta } {2^{-h}64\Delta \cdot K_{0}}\right ) {}\\ & \leq & \frac{f} {64} + \frac{f} {64}\sum _{h=0}^{\infty }2^{h+1}e^{-2^{h+2} } \leq \frac{f} {48}. {}\\ \end{array}$$

The second inequality is by choosing a large enough constant K_1.

We now bound the error incurred for any constraint j ∈ M_4. The error in a single iteration is at most \(O(\sqrt{\Delta }) + \frac{1} {n}\). So the overall error is \(\vert x(S_{j}) - b_{j}\vert = O(\sqrt{\Delta }\log n)\).

Overall: By setting the λ_j parameters for the different parts M_1, M_2, M_3, M_4 as above, it follows that in any iteration with f fractional variables, we have \(\sum _{j=1}^{m}e^{-\lambda _{j}^{2}/K_{0}} \leq \frac{f} {24}\), which satisfies the condition in Theorem 4. □

Remark

The above result also extends to the following “group sparse” setting. Suppose the constraints in M_4 are further partitioned into g groups \(\{G_{k}\}_{k=1}^{g}\), where the column sparsity restricted to constraints in each group G_k is \(\Delta _{k}\). Then we obtain an integral solution with \(\vert x(S_{j}) - b_{j}\vert = O(\sqrt{g \cdot \Delta _{k}}\,\log n)\) for all j ∈ G_k. The only modification required in the above proof is to set \(\lambda _{j} = \sqrt{K_{1 } \cdot g \cdot \Delta _{k}}/\sqrt{\vert S_{j } \vert }\) for j ∈ G_k.

3.2 Minimum Cost Degree Bounded Matroid Basis

The input to the minimum cost degree bounded matroid problem (DegMat) is a matroid defined on elements V = [n] with costs \(d: V \rightarrow \mathbb{Z}_{+}\) and m “degree constraints” \(\{S_{j},b_{j}\}_{j=1}^{m}\), where each S_j ⊆ [n] and \(b_{j} \in \mathbb{Z}_{+}\). The objective is to find a minimum-cost base I in the matroid that obeys all the degree bounds, i.e. \(\vert I \cap S_{j}\vert \leq b_{j}\) for all j ∈ [m]. Here we make the minor technical assumption that all costs are polynomially bounded integers.

An algorithm for DegMat is said to be an (α, β ⋅ b + γ)-bicriteria approximation algorithm if, for any instance, it finds a base I satisfying \(\vert I \cap S_{j}\vert \leq \beta \cdot b_{j} +\gamma\) for all j ∈ [m] and having cost at most α times the optimum (which satisfies all degree bounds).

Theorem 8

There is a randomized algorithm for DegMat that, on any instance, finds a base \(I^{\ast}\) of cost at most the optimum, where for each j ∈ [m]:

$$\displaystyle{\vert I^{{\ast}} \cap S_{j}\vert \leq b_{j} + O(1) \cdot \min \left \{\sqrt{j},\sqrt{n\log (m/n)},\sqrt{\log m\log n \cdot b_{j}} +\log m\log n,\sqrt{\Delta }\log n\right \}.}$$

Proof

Let y ∈ [0, 1]^n be an optimal solution to the natural LP relaxation of DegMat. We now describe the rounding algorithm: it is based on iterative applications of Theorem 5. First, we incorporate the cost as a special degree constraint v_0 = d indexed zero. We will require zero violation in the cost during each iteration, i.e. λ_0 = 0 always. We partition the degree constraints [m] as in Theorem 7: recall the definitions of M_1, M_2, M_3, M_4, and the setting of their λ_j parameters in each iteration.

In each iteration, we start with a fractional solution y′ with f ≤ n fractional variables. Using the same calculations as in Theorem 7, we have \(\sum _{j=0}^{m}e^{-\lambda _{j}^{2}/K_{0}} \leq 1 + \frac{f} {24} \leq \frac{f} {16}\), assuming f ≥ 48. For now assume f ≥ max{K_0, 48}; applying Theorem 5, we obtain a new fractional solution y″ that has:

  • \(\vert \langle v_{0},y^{{\prime\prime}}- y^{{\prime}}\rangle \vert \leq \| d\|/n^{O(1)} \leq \frac{1} {n}\).

  • For each j ∈ [m], \(\vert y^{{\prime\prime}}(S_{j}) - y^{{\prime}}(S_{j})\vert \leq \lambda _{j}\sqrt{W_{j } (y^{{\prime} } )} + \frac{1} {n}\).

  • The number of fractional variables in y″ is at most \(\frac{f} {K^{{\prime}}}\) for some constant K′ > 1.

The first condition uses the fact that the error term \(\|a_{j}\|/n^{2}\) in Theorem 5 can be reduced to \(\|a_{j}\|/n^{c}\) for any constant c, and that ∥d∥ ≤ poly(n) as we assumed all costs to be polynomially bounded.

We repeat these iterations as long as f ≥ max{K_0, 48}: this takes \(T \leq \frac{\log n} {\log K^{{\prime}}} = O(\log n)\) iterations. The violation in the cost (i.e. constraint j = 0) is at most \(\frac{T} {n} <1\). For any degree constraint j ∈ [m], the violation is exactly as in Theorem 7.

At the end of the above iterations, we are left with an almost integral solution x: it has O(1) fractional variables. Notice that x lies in the matroid base polytope, so it can be expressed as a convex combination of (integral) matroid bases. We output the minimum-cost base \(I^{\ast}\) in this convex decomposition of x. Note that the cost of the solution \(I^{\ast}\) is at most that of x, which is less than 〈d, y〉 + 1. Moreover, \(I^{\ast}\) agrees with x on all integral variables of x, so the worst-case additional violation of any degree constraint is just O(1). □

We state two special cases of this result, which improve on prior work.

Corollary 1

There are randomized bicriteria approximation algorithms for DegMat with ratios \((1,b + O(\sqrt{n\log (m/n)}))\) and \((1,O(\sqrt{\Delta }\log n))\).

Previously, [6] obtained a \((1,b + O(\sqrt{n\log (m)}))\) bicriteria approximation and [11] obtained a \((1,\Delta - 1)\) bicriteria approximation for DegMat.

3.3 Multi-criteria Matroid Basis

The input to the multi-criteria matroid basis problem (MCM) is a matroid \(\mathcal{M}\) defined on elements V = [n] with k different cost functions \(d_{j}: [n] \rightarrow \mathbb{Z}_{+}\) (for j = 1, …, k) and budgets \(\{B_{j}\}_{j=1}^{k}\). The goal is to find (if possible) a basis I with \(d_{j}(I) \leq B_{j}\) for each j ∈ [k]. We obtain:

Theorem 9

There is a randomized algorithm for multi-criteria matroid basis that, given any ε > 0, finds in \(n^{O(k^{1.5}\,/\,\epsilon ) }\) time a basis I with \(d_{j}(I) \leq (1+\epsilon )B_{j}\) for all j ∈ [k].

Previously, [8] obtained a deterministic algorithm for MCM that required \(n^{O(k^{2}\,/\,\epsilon ) }\) time. One could also use the algorithm of [6] to obtain a randomized PTAS for MCM, but this approach requires at least \(n^{\Omega (k\,/\,\epsilon ^{2}) }\) time. Our running time is better when \(\epsilon <1/\sqrt{k}\).
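To spell out the comparison with [6] (ignoring the constants hidden in the exponents):

$$\displaystyle{n^{k^{1.5}/\epsilon }\;\leq \; n^{k/\epsilon ^{2} }\;\Longleftrightarrow \;\frac{k^{1.5}} {\epsilon } \;\leq \;\frac{k} {\epsilon ^{2}} \;\Longleftrightarrow \;\epsilon \;\leq \; \frac{1} {\sqrt{k}}.}$$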

We now describe the algorithm in Theorem 9. An element e is said to be heavy if its jth cost \(d_{j}(e)> \frac{\epsilon } {\sqrt{k}}B_{j}\) for some j ∈ [k]. Note that the optimal solution contains at most \(\frac{k^{1.5}} {\epsilon }\) heavy elements. The algorithm first guesses by enumeration all heavy elements in the optimal solution. Let \(\mathcal{M}^{{\prime}}\) denote the matroid obtained by contracting these heavy elements. Let \(B_{j}^{{\prime}}\) denote the residual budget for each j ∈ [k]. The algorithm now solves the natural LP relaxation:

$$\displaystyle{x \in P(\mathcal{M}^{{\prime}}),\quad \langle d_{ j},x\rangle \leq B_{j}^{{\prime}},\,\,\forall j \in [k].}$$
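Before detailing the rounding, here is a hedged Python sketch of the overall guess-and-round structure; the names is_independent, contract, solve_lp and round_basis are illustrative placeholders (the last one stands for the Theorem 5 based rounding described next), not primitives from the paper.

```python
from itertools import combinations
from math import sqrt

def multi_criteria_basis(elements, costs, budgets, eps, matroid, solve_lp, round_basis):
    """Sketch of the enumeration phase of Theorem 9 with hypothetical oracles.

    costs[j][e] is d_j(e) and budgets[j] is B_j; `matroid` is assumed to offer
    is_independent(S) and contract(S).
    """
    k = len(budgets)
    # An element is heavy if d_j(e) > (eps / sqrt(k)) * B_j for some j.
    heavy = [e for e in elements
             if any(costs[j][e] > (eps / sqrt(k)) * budgets[j] for j in range(k))]
    max_heavy = int(k ** 1.5 / eps)   # the optimum has at most k^{1.5}/eps heavy elements

    for size in range(min(max_heavy, len(heavy)) + 1):
        for guess in combinations(heavy, size):
            if not matroid.is_independent(guess):
                continue
            residual = [budgets[j] - sum(costs[j][e] for e in guess) for j in range(k)]
            if any(b < 0 for b in residual):
                continue
            x = solve_lp(matroid.contract(guess), costs, residual)
            if x is None:             # residual LP infeasible for this guess
                continue
            rest = round_basis(matroid.contract(guess), x, costs, residual)
            return set(guess) | rest
    return None
```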

The rounding algorithm is an iterative application of Theorem 5: the number of fractional variables decreases by a factor of K > 1 in each iteration.

As long as the number of fractional variables \(n^{{\prime}} \geq 16k\), we set \(\lambda _{j} = 0\) for all j ∈ [k]; this satisfies the condition \(\sum _{j=1}^{k}e^{-\lambda _{j}^{2}/K_{ 0}} \leq n^{{\prime}}/16\) since \(k \leq n^{{\prime}}/16\). Note that there is no loss in any of the budget constraints in this first phase of the rounding.

Once \(n^{{\prime}} \leq N:= 16k\), we choose each \(\lambda _{j} = \sqrt{K_{0 } \log (N/n^{{\prime} } )}\), which satisfies the condition on the λs. The loss in the jth budget constraint in such an iteration is at most \(\lambda _{j}\sqrt{ n^{{\prime}}}\cdot d_{j}^{max}\), where \(d_{j}^{max} \leq \frac{\epsilon } {\sqrt{k}}B_{j}\) is the maximum cost of any remaining (non-heavy) element. So the increase in the jth budget constraint over all iterations is at most:

$$\displaystyle{d_{j}^{max} \cdot \sum _{ i=0}^{t-1}\sqrt{K_{ 0}\, \frac{N} {K^{i}}\,\log (K^{i})}\,\, \leq \,\, O(\sqrt{N}) \cdot d_{j}^{max}\,\, =\,\, O(\epsilon )B_{ j}.}$$

Above i indexes iterations in the second phase of rounding.
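For completeness, the sum above can be bounded as follows (t denotes the number of second-phase iterations):

$$\displaystyle{\sum _{i=0}^{t-1}\sqrt{K_{ 0}\, \frac{N} {K^{i}}\,\log (K^{i})}\; =\;\sqrt{K_{0 } N\log K}\,\sum _{i=0}^{t-1}\sqrt{ \frac{i} {K^{i}}}\; =\; O(\sqrt{N}),}$$

since \(\sum _{i\geq 0}\sqrt{i/K^{i}}\) converges for any fixed K > 1. Combined with N = 16k and \(d_{j}^{max} \leq \frac{\epsilon } {\sqrt{k}}B_{j}\), this gives \(O(\sqrt{N}) \cdot d_{j}^{max} = O(\sqrt{k}) \cdot \frac{\epsilon } {\sqrt{k}}B_{j} = O(\epsilon )B_{j}\), as claimed.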

3.4 Low Congestion Routing on Short Paths

The routing on short paths (RSP) problem is defined on an n-vertex directed graph G = (V, E) with edge capacities \(b: E \rightarrow \mathbb{Z}_{+}\). There are k source-sink pairs \(\{(s_{i},t_{i})\}_{i=1}^{k}\) and a length bound \(\Delta\). The goal in RSP is to find an \(s_{i}\)–\(t_{i}\) path \(P_{i}\) of length at most \(\Delta\) for each pair i ∈ [k] such that the number of paths using any edge e is at most \(b_{e}\).

The decision problem of determining whether there exist such paths is NP-complete. Hence we focus on bicriteria approximation algorithms, where we attempt to find paths \(P_{i}\) that violate the edge capacities by a small amount. As noted in [6], we can use any LP-based algorithm for DegMat to obtain one for RSP: for completeness we describe this briefly below.

Let \(\mathcal{P}_{i}\) denote the set of all \(s_{i}\)–\(t_{i}\) paths of length at most \(\Delta\). Consider the following LP relaxation for RSP.

$$\displaystyle\begin{array}{rcl} \sum _{P\in \mathcal{P}_{i}}x_{i,P}& \geq & 1,\qquad \forall i \in [k] {}\\ \sum _{i=1}^{k}\sum _{ P\in \mathcal{P}_{i}:e\in P}x_{i,P}& \leq & b_{e},\qquad \forall e \in E {}\\ x& \geq & 0. {}\\ \end{array}$$

Although this LP has an exponential number of variables, it can be solved in polynomial time by an equivalent polynomial-size formulation using a “time-expanded network”.
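One standard way to obtain such a formulation (a sketch; the exact formulation intended by the authors may differ) is a layered graph: for each pair i, make \(\Delta +1\) copies \(v^{0},\ldots ,v^{\Delta }\) of every vertex v, add an arc \((u^{\ell},v^{\ell+1})\) for every edge (u, v) ∈ E and every \(0 \leq \ell <\Delta\), and route one unit of flow from \(s_{i}^{0}\) to the copies of \(t_{i}\). With flow variables \(f^{i}\) on these arcs, the capacity constraints couple the commodities via

$$\displaystyle{\sum _{i=1}^{k}\,\,\sum _{\ell=0}^{\Delta -1}f^{i}(u^{\ell},v^{\ell+1})\;\leq \; b_{(u,v)},\qquad \forall (u,v) \in E,}$$

and a path decomposition of each flow \(f^{i}\) recovers a fractional solution \(\{x_{i,P}\}\) supported on \(s_{i}\)–\(t_{i}\) paths of length at most \(\Delta\), using only \(O(k\vert E\vert \Delta )\) variables.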

Given any feasible instance of RSP, we obtain a fractional solution to the above LP; taking an extreme point solution, the number of non-zero variables \(x_{i,P}\) is at most the number of constraints, i.e. k + | E | = poly(n). Let \(\mathcal{P}_{i}^{{\prime}}\) denote the set of \(s_{i}\)–\(t_{i}\) paths with non-zero value in this fractional solution. Consider now an instance of DegMat on groundset \(U = \cup _{i=1}^{k}\mathcal{P}_{i}^{{\prime}}\) where the matroid is a partition matroid that requires one element from each \(\mathcal{P}_{i}^{{\prime}}\). The degree constraints correspond to edges \(e \in E\), i.e. \(S_{e} =\{ P \in U: e \in P\}\). The goal is to find a base I in the partition matroid such that \(\vert S_{e} \cap I\vert \leq b_{e}\) for all \(e \in E\). Note that the column sparsity of the degree constraints is \(\Delta\) since each path in U has length at most \(\Delta\). Moreover, \(\{x_{i,P}\,:\, P \in \mathcal{P}_{i}^{{\prime}},\,i \in [k]\}\) is a feasible fractional solution to the LP relaxation of this DegMat instance. So we obtain:

Corollary 2

There is an algorithm that, given any feasible instance of RSP, computes an \(s_{i}\)–\(t_{i}\) path of length at most \(\Delta\) for each i ∈ [k] where the number of paths using any edge e is at most \(b_{e} +\min \left \{O(\sqrt{\Delta }\log n),\,O(\sqrt{b_{e}}\log n +\log ^{2}n)\right \}\).
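For concreteness, here is a hedged Python sketch of the reduction just described; the data layout and names are illustrative only.

```python
def rsp_to_degmat(path_solution, capacities):
    """Build the DegMat instance described above from an RSP fractional solution.

    path_solution -- dict mapping commodity i to a dict {path: x_value}, where each
                     path is a tuple of edges with non-zero LP value (the set P_i')
    capacities    -- dict mapping each edge e to its capacity b_e

    Returns the groundset U (as indices), the partition-matroid classes (exactly one
    element must be picked from each class), the degree sets S_e, and the fractional
    starting point x.
    """
    groundset, partition_classes, x = [], [], []
    for i, paths in path_solution.items():
        cls = []
        for path, value in paths.items():
            cls.append(len(groundset))
            groundset.append((i, path))
            x.append(value)
        partition_classes.append(cls)

    # S_e = {P in U : e in P}; the column sparsity is at most Delta because
    # every path in U has length at most Delta.
    degree_sets = {e: [idx for idx, (_, path) in enumerate(groundset) if e in path]
                   for e in capacities}
    return groundset, partition_classes, degree_sets, x
```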

Multipath routing with laminar requirements: Our techniques can also handle a richer set of requirements in the RSP problem. In addition to the graph G, pairs \(\{(s_{i},t_{i})\}_{i=1}^{k}\) and length bound \(\Delta\), there is a laminar family \(\mathcal{L}\) defined on the pairs [k] with an integer requirement \(r_{T}\) on each set \(T \in \mathcal{L}\). The goal in the laminar RSP problem is to find a multiset of \(s_{i}\)–\(t_{i}\) paths (for i ∈ [k]) such that:

  1. each path has length at most \(\Delta\),

  2. for each \(T \in \mathcal{L}\), there are at least \(r_{T}\) paths between pairs of T, and

  3. the number of paths using any edge e is at most \(b_{e}\).

Consider the following LP relaxation for this problem.

$$\displaystyle\begin{array}{rcl} \sum _{i\in T}\,\,\sum _{P\in \mathcal{P}_{i}}x_{i,P}& \geq & r_{T},\qquad \forall T \in \mathcal{L} {}\\ \sum _{i=1}^{k}\,\,\sum _{ P\in \mathcal{P}_{i}:e\in P}x_{i,P}& \leq & b_{e},\qquad \forall e \in E {}\\ x& \geq & 0. {}\\ \end{array}$$

This LP can again be solved using an equivalent polynomial-sized LP. Let \(\mathcal{P}_{i}^{{\prime}}\) denote the set of \(s_{i}\)–\(t_{i}\) paths with non-zero value in this fractional solution, and define groundset \(U = \cup _{i=1}^{k}\mathcal{P}_{i}^{{\prime}}\). As before, we also define “degree constraints” corresponding to edges \(e \in E\), i.e. at most \(b_{e}\) elements can be chosen from \(S_{e} =\{ P \in U: e \in P\}\). Unlike the usual RSP problem we cannot directly cast these laminar requirements as a matroid constraint, but a slight modification of the DegMat algorithm works.

The main idea is that the partial rounding result (Theorem 5) also holds if we want to exactly preserve any laminar family \(\mathcal{L}\) of constraints (instead of a matroid). Note that a laminar family on \(\vert U\vert\) elements might have about \(2\vert U\vert\) sets. However, the number of linearly independent tight constraints of \(\mathcal{L}\) at any strictly fractional solution is at most \(\vert U\vert /2\): each tight set in such an independent family must contain at least two “private” fractional elements lying in none of its tight subsets, since these private values are strictly between 0 and 1 and sum to a nonnegative integer. Using this observation in place of Claim 2, we obtain the partial rounding result also for laminar constraints.

Finally using this partial rounding as in Theorem 8, we obtain:

Theorem 10

There is an algorithm that, given any feasible instance of laminar RSP, computes a multiset \(\mathcal{Q}\) of \(s_{i}\)–\(t_{i}\) paths such that:

  1. each path in \(\mathcal{Q}\) has length at most \(\Delta\),

  2. for each \(T \in \mathcal{L}\), there are at least \(r_{T}\) paths in \(\mathcal{Q}\) between pairs of T, and

  3. the number of paths in \(\mathcal{Q}\) using any edge e is at most:

    $$\displaystyle{b_{e} +\min \left \{O(\sqrt{\Delta }\log n),\,O(\sqrt{b_{e}}\log n +\log ^{2}n)\right \}.}$$