1 Introduction

Mixed-integer linear programming (MILP) has become a mature area of research both in theory and practice, with powerful commercial and non-commercial solvers available, like CPLEX, GUROBI, XPRESS, and SCIP, to name a few. These modern state-of-the-art solvers use linear programming (LP) relaxation as the primary work-horse to provide dual bounds, which is further enhanced with the use of cutting-planes: linear inequalities that are valid for all integral feasible solutions but not implied by the LP relaxation. We assume familiarity with basic cutting-plane theory; for an introduction to this subject, we refer to the books [77, 179].

We recount here a well-known anecdote about the complete change in outlook towards the use of general-purpose cutting-planes within solvers. Until the early 90s the research community was unanimous:

“In order to solve integer programs of meaningful sizes, one had to exploit the structure of the underlying combinatorial problem” [87].

In particular, general-purpose cutting-planes were seen as mostly of theoretical interest. However, in the mid 90s a breakthrough came with a series of papers by Balas, Ceria, Cornuéjols, and Natraj, where they obtained striking computational results using Gomory Mixed Integer cuts and the then newly discovered Lift-and-Project cuts, within the branch-and-bound framework [29, 30]. These papers show the efficacy of general-purpose cutting-planes, when properly selected and employed.

Today, even more families of cutting-planes are known. While we do not attempt to survey the vast literature on cutting-planes, we discuss the broad perspectives from which they are derived, with some representative examples.

Geometric cuts A set \(Q \subseteq \mathbb {R}^n\) is called lattice-free if \(\mathrm{int}(Q) \cap \mathbb {Z}^n = \emptyset \). Given a mixed-integer set \(P \cap (\mathbb {Z}^n \times \mathbb {R}^q)\) and a lattice-free set Q, observe that

$$\begin{aligned} P \cap (\mathbb {Z}^n \times \mathbb {R}^q) \subseteq P {\setminus } (\mathrm{int}(Q) \times \mathbb {R}^q), \end{aligned}$$

and therefore valid inequalities for \(P {\setminus } (\mathrm{int}(Q) \times \mathbb {R}^q)\) are valid for \(P \cap (\mathbb {Z}^n \times \mathbb {R}^q)\). We call such cuts geometric cuts. A set Q is a maximal lattice-free convex set if it is lattice-free, convex, and no other lattice-free convex set properly contains it. A beautiful structural result is that all maximal lattice-free convex sets are polyhedral [38, 169]. Note that when Q is a polyhedral set \( \{x\in \mathbb {R}^n\,|\, (a^i)^\top x \le b^i,~i \in [m]\}\) one can generate a geometric cut by ensuring its validity for all the disjunctions identified with the facets of Q, i.e., \(P \cap \{(x,y) \in \mathbb {R}^{n + q}\,|\, (a^i)^\top x \ge b^i \}\) for \(i \in [m]\).

Famous geometric cuts include the Chvátal–Gomory (CG) cuts [65, 120, 131, 164], implied bounds [145], and more generally split cuts [27, 34, 85]. For all these cuts, the underlying lattice-free set is the so-called split set: \(\{x \in \mathbb {R}^n \,|\, \pi _0 \le \pi ^\top x \le \pi _0 + 1\}\), where \(\pi _0 \in \mathbb {Z}\) and \(\pi \in \mathbb {Z}^n\). More general geometric cuts include cuts from maximal lattice-free convex sets [12, 62] (see the excellent survey [39]) and the closely related intersection cuts developed by Balas in 1970s [25]. Geometric cuts called multi-branch split cuts [99, 104, 165] are based on non-convex lattice-free sets obtained as the union of split sets.

Structured relaxations Given a mixed-integer set \(P \cap (\mathbb {Z}^n \times \mathbb {R}^q)\), the basic idea is to construct a relaxation of P, say \(R \supseteq P\), and analyze the mixed-integer set \(R \cap (\mathbb {Z}^n \times \mathbb {R}^q)\). Valid inequalities for the relaxation then serve as cutting-planes for the original set, and the additional structure of R eases the search for facets or strong inequalities. Two excellent review articles discussing these cutting-planes are [152, 171].

One of the most important families of structured relaxations are the single constraint relaxations, i.e., relaxations described by one non-trivial constraint together with various variable bounds and integrality restrictions. Given a mixed-integer linear set, a single constraint relaxation can be obtained by taking a positive combination of the original constraints. We discuss this in more detail in Sect. 3. Single constraint relaxations include knapsack constraint with binary and continuous variables [26, 35, 67, 97, 142, 200, 202, 204, 210] (which yield the much used lifted cover inequalities), and knapsack constraints with general integer and continuous variables [15,16,17, 101, 190], of which a special case is the MIR set [180, 206]. As a concrete example, the mixed-integer rounding (MIR) set is defined as \(\{(x, y) \in \mathbb {Z} \times \mathbb {R}_+\,|\, x + y \ge b \}\). The only facet missing in this description is the MIR inequality

$$\begin{aligned} x + \frac{1}{f}\cdot y \ge \lceil b \rceil , \end{aligned}$$

where \(f = b - \lfloor b\rfloor > 0 .\) The general MIR inequality can be obtained by embedding a general mixed-integer set into the MIR set through column and constraint aggregations, see [107] for details.

Multiple constraint relaxations have also been studied, i.e., when the relaxation is described using multiple non-trivial constraints. Multiple constraint relaxations include those studied in [6] and the mixing set [103, 121, 140, 193] (both generalizing the MIR set), stable sets [19, 153] (which yield the useful clique inequalities applied on conflict graphs by solvers), fixed-charge networks [18, 138, 183, 205] (used to obtain flow cover, path cover, and pack inequalities), and the corner and group relaxations [133,134,135, 149, 151]. The most important cuts obtained via the latter are the Gomory mixed-integer cuts (GMIC) [132]; also see [10, 92] for important variants of GMICs. Given an equalitiy constraint of the form

$$\begin{aligned} \sum _{j= 1}^n a_j x_j + \sum _{j = 1}^q g_j y_j = b \end{aligned}$$

where \(x_j \in \mathbb {Z}_{+}\) and \(y_j \in \mathbb {R}_{+}\), the GMIC inequality is given by:

$$\begin{aligned} \sum _{ j =1}^n \mathrm{min} \left\{ \frac{f_j}{f_0}, \frac{1 - f_j}{ 1 - f_0}\right\} x_j + \sum _{j = 1}^q \mathrm{max}\left\{ \frac{g_j }{f_0}, \frac{ - g_j}{ 1- f_0} \right\} y_j \ge 1, \end{aligned}$$

where \(f_j = a_j - \lfloor a_j \rfloor \) and \(f_0 = b - \lfloor b \rfloor \).

The study of the group relaxation has been very popular in the last 10-15 years  [14, 22, 32, 41, 49, 76, 102, 118, 119, 123, 136, 137, 156, 157, 175]. Many beautiful and non-trivial structural results have been proved regarding valid inequalities for these relaxations [44, 47, 48, 122, 208]. See [39, 45, 46, 189] for reviews on the group relaxation and its variants.

Subadditive cutting-planes An important concept that was first discovered in relation to the group relaxation is the notion of subadditive cutting-planes [149]. We state it here for a general pure-integer program: Consider a set \(S:= \{x \in \mathbb {Z}^n_+ \,|\, Ax \ge b \}\), where \(A \in \mathbb {Q}^{m \times n}\), and a function \(f: \mathbb {R}^m \rightarrow \mathbb {R}\) that satisfies the following:

  1. 1.

    f is subadditive, i.e., \(f(u) + f(v) \ge f(u + v)\) for all \(u,v \in \mathbb {R}^m\),

  2. 2.

    f is non-decreasing, i.e., \(f(u) \ge f(v)\) for all \(u - v \in \mathbb {R}^m_+\),

  3. 3.

    \(f(\mathbf 0 ) = 0\).

It is straightforward to verify that \(\sum _{j = 1}^n f(A^j)x_j \ge f(b)\) is a valid cutting-plane for S. More importanty, it can be shown that the convex hull of S can be obtained by such valid inequalities. See [51, 141, 147, 148, 150, 177] for generalizations to mixed-integer sets and other extensions.

Convexification via extended formulations and algebraic properties The key idea here is to go to a higher dimension, usually the space of products of variables, and then use algebraic properties of the matrix of these variables to imply non-trivial inequalities in the original space. Some of these methods include generating standard cuts like split cuts for extended formulations [55, 60], the LP based hierarchies Reformulation-linearization technique (RLT) [196,197,198], Lift-and-project cuts (L&P) [29, 34], the semidefinite programming hierarchies due to Lovász and Schrijver (\(N_{+}\)) [170], the Bienstock and Zuckerberg (BZ) hierarchy [53], and the more general Lasserre hierarchy [161]. See the excellent article by Rothvoß [191] on applications of these cuts to develop approximation algorithms.

1.1 Theoretical analysis of the strength of cutting-planes

Given this wealth of cuts at our disposal, a crucial question is that of cutting-plane selection: Which cuts should a solver employ at a given point in time? Motivated by this issue, there is a large literature on theoretical analysis of cutting-planes trying to better understand the strength of different families of cuts. While the overall goal is be able to solve MILP models efficiently, given the complexity of solvers one cannot expect to be able to perform a theoretical analysis of this performance metric directly, and thus proxy measures are required. In fact, many notions of usefulness or strength of cuts have been proposed. We briefly survey known results in this area.

Finite cutting-plane algorithms One natural question is: can we solve an MILP in finite time using solely cuts from a given family without branching or any other operation? Gomory gave the first finitely convergent cutting-plane algorithm for pure integer programs [131], using CG cuts. There have been very few results since then on finite cutting-plane algorithms, see [181] and references therein. See [68] for a polynomial-time cutting-plane algorithm for matching, [69] for a finite cutting-plane algorithm for bounded MILPs, and [104] for a finite cutting-plane algorithm for general MILPs. There are also lower bounds on the length of cutting-plane proofs (length of sequence of cuts needed to prove optimality or infeasibility) [73]. For example, generalizing previous work of Pudlák [188], Dash [98] presents exponential lower bounds for proofs via branch-and-cut procedures that use L&P and CG cuts.

Closure inclusion Given a polyhedron P and a family M of cutting-planes, we define the closure M(P) as the intersection of all cuts from this family valid for the (mixed-) integer solutions in P. For any family of cuts the closure M(P) contains the integer hull, and a smaller closure means a stronger approximation of the latter.

A fundamental result is that the GMIC closure, the MIR closure, and the split closure are equal [90, 180]. The L&P closure [29], defined only for binary MILPs, is weaker than these other closures. Marchand and Wolsey [172] prove that various well-know classes of cutting-planes are all MIR cuts; thus, the split closure is stronger than the closures of these classes of cuts. Cornuéjols and Li [90] provide comparisons between several basic closures, including the split, L&P, and RLT closures. The papers [104, 165] show that the t-branch closure can be strictly stronger than infinitely many iterations of the k-branch closure whenever \(t > k\). See also [99, 105, 108]. Laurent [162] compares RLT, \(N_{+}\), and the Lasserre closures for 0/1 polytopes and shows that the Lasserre construction provides the tightest relaxations. See [174] for an augmented version of the Lasserre hierarchy for 0/1 problems using high-degree polynomials that implies the BZ hierarchy, and comparison with the CG closure.

Rank Rank is a very popular measure of strength, measuring the number of “round” of cuts needed to obtain the integer hull. More precisely, the ith closure \(M^i(P)\) is defined inductively by applying the closure operator to the set \(M^{i-1}(P)\); the rank of P with respect to the family of cuts M is then the smallest integer i such that \(M^i(P)\) is the integer hull of P.

Schrijver [194] showed that the CG rank is finite for any rational polyhedron P. The paper [128] showed that the rank is finite for polytopes, even when restricting to the subclass of mod-2 CG cuts. Chv́atal, Cook, and Hartmann [72] proved lower bounds on the CG rank for natural formulations of various combinatorial problems. It was shown in [82] that for polyhedra with no integer points the CG rank can be upper bounded by a function of the dimension of the ambient space. For polyhedra in \([0, 1]^n\), Eisenbrand and Schulz showed that the CG rank is always upper bounded by \(\mathcal {O}(n^2 \log n)\). Recently, Rothvoß and Sanitá [192], showed that there are binary knapsack instances with CG rank \(\Omega (n^2)\). Also see [186, 187]. The paper [95] provides conditions for a binary IP, in terms of the subgraph induced by the infeasible 0/1 vertices in the skeleton graph of the 0/1-cube, that guarantee small CG rank. See [52] for generalizations that characterize polytopes in the 0/1-cube with bounded CG Rank.

The L&P closure [29], defined only for binary MILPs, has rank at most n, the number of binary variables; since this closure is weaker than the GMIC, MIR, and split closures, this yields the same rank upper bound for these cuts on binary MILPs. Moreover, these results are tight, since [91] gives an example of a binary MILP for which the rank of n is achieved for the split closure. In contrast, for general MILPs [85] showed that the rank with respect to split cuts may not be finite, although the ith split closure converges to the integer hull in the limit as i tends to infinity [109, 182].

Since the \(N_{+}\) operator is also stronger than the L&P operator, it also has a rank upper bound of n for binary MILPs. Cook and Dash [83] show that this rank n can be achieved even when \(N_{+}\) is combined with CG cuts. See [160] for a characterization of 0/1 instances that have rank n for the Lasserre hierarchy. Au and Tuncel [20, 21] prove a comprehensive set of results on the rank of the \(N_{+}\), BZ, and Lasserre closures and extensions. Also see [71] for related results.

The lattice-free cut approach and the subbadditive dual approach can be viewed as “dual” of each other, see  [75, 122] for concrete results along these lines. Indeed the GMIC cut is the subadditive functional version of the geometric cut from a split set. Therefore, it is expected that even for general MILPs lattice-free cuts should have finite rank, since subadditive functions are known to yield the convex hull. An integral lattice-free polyhedron is a lattice-free polyhedron whose vertices are integral; the collection of these lattice-free sets is a proper subset of all possible lattice-free convex sets. Del Pia and Weismantel [109] prove that the rank of general MILPs with respect to integral lattice-free disjunctive cuts is finite.

If the integrality gaps of an instance is high, we expect that it is difficult to solve this instance, and in particular we expect that more rounds of cutting-planes are needed to obtain the integer hull. The papers [56, 57, 187], present lower bounds on rank of CG, split cuts, lattice-free cuts, and closures of cut from multiple-constraint relaxations that increase as the integrality gap increases for packing and covering integer programs (see definition below).

One measure to compare a cut against split cuts is the so-called split rank. This is defined to be the smallest integer i such that the cut under study is valid for the ith split closure. The paper [111] gave a general construction of lower bounds on the split rank of intersection cuts. The papers [42, 115] analyze the split rank of the lattice-free cuts. In fact, given any family of lattice-free convex cuts, one can define the rank with respect to this family. The paper [110] characterizes when this rank is finite.

See [199] for examples showing when low rank CG cuts close the integrality gap significantly and compare these CG cuts to the performance of RLT hierarchy.

Blow-up measure One can develop a more refined measure of strength than closure inclusion for packing and covering sets. We say that \(P \subseteq \mathbb {R}^n_+\) is a covering set if it is upwards closed, namely if \(x \in P\) and \(y \ge x\), then \(y \in P\). Given two covering sets \(Q \supseteq P\) (so Q is a relaxation of P), we say that Q is an \(\alpha \)-approximation of P if \(P \supseteq \alpha Q := \{\alpha x \mid x \in Q\}\), i.e., pushing the bigger set up by an \(\alpha \) factor makes it be contained in the smaller set. Equivalently, Q is an \(\alpha \)-approximation of P if for all nonnegative objective functions \(c \in \mathbb {R}^n_+\)

$$\begin{aligned} \min \{c^\top x \mid x \in Q\} \ge \frac{1}{\alpha } \cdot \min \{c^\top x \mid x \in P\}, \end{aligned}$$
(1)

see [130] and Lemma 23 of [176]. This measure can be defined for packing sets in an analogous way, see Sect. 3.1.

While originally defined in this context for comparing inequalities for the Traveling Salesman Problem [130], this measure prominently appears in the analysis of inequalities for the continuous group relaxation, namely MILPs defined by k equations with k free integer variables and nonnegative continuous variables (whose projection onto the continuous variables is a covering set). For the case \(k=2\), it is known that all facet-defining inequalities are geometric cuts from three types of lattice-free sets: split, lattice-free triangle, or lattice-free quadrilateral inequalities. The paper [37] shows that the triangle closure (and the quadrilateral closure) is a 2-approximation of the integer hull, while the split closure can be arbitrarily weak under this measure. See [24] for improved bounds. However, several computational experiments [36, 106, 114, 124, 168] have indicated that triangle and quadrilateral cuts do not work as well as split cuts in practice. In order to better understand this phenomenon, the paper [43] analyzed uniformly generated MILPs and showed that with high probability both split and triangle closures provide very good approximations for the integer hull, thus triangle cuts do not add much over split cuts. Also see [144, 185].

This measure has also been used to analyze the related corner relaxation. For example, Averkov, Basu, and Paat [23] recently showed that for each natural number n, a corner polyhedron with n basic integer variables and an arbitrary number of continuous non-basic variables is approximated up to a constant factor by intersection cuts from lattice-free sets with at most i facets if \(i > 2^{n -1}\), and that no such approximation is possible if \(i \le 2^{n-1}\).

Minimality, extremality, and facetness Unlike the above measures, these attempt to measure the strength of individual cuts. Given a polyhedron \(P \subseteq \mathbb {R}^n\), a valid inequality \(a^\top x \le b\) is minimal if there is no other valid inequality \(c^\top x \le d\) that strictly dominates it, i.e., \(\{x \mid c^\top x \le d\} \subsetneq \{x \mid a^\top x \le b\}\). An inequality is facet-defining (or extreme) if it cannot be strictly dominated by a positive combination of other valid inequalities. Notice that facet-defining inequalities are a subset of minimal inequalities, which are a subset of all valid inequalities, and it is well-known that facet-defining inequalities are necessary and sufficient to describe a polyhedron; thus, these give a hierarchy of importance of inequalities.

There is a wealth of results on facet-defining inequalities for specific combinatorial optimization problems, and their use has led to vast computational improvements in the 80s and 90s. These notions are also heavily used in the study of the group relaxations. Gomory and Johnson [134] characterize the minimal inequalities for the infinite group relaxation in terms of subadditive functions. Also see [158]. For the infinite group relaxation with 1 equation, [134] also provides a sufficient condition for an inequality to be extreme, the so-called 2-Slope Theorem [135]. This was later generalized to an arbitrary number of equations [47, 96]. For the case of 2 equations, Basu et al. [44, 50] provide algorithms for testing whether a minimal valid function is extreme.

Note that a polyhedron P has a finite number of facet-defining inequalities and, excluding degenerate cases, infinitely many minimal inequalities; thus, restricting to facet-defining inequalities really narrows down the inequalities under consideration. However, interestingly, for the infinite group relaxation (which is infinite dimensional), it was recently proved that the facet-defining inequalities form a dense set within the set of minimal inequalities [48, 163]. This casts doubts on whether being a facet-defining inequality is really useful in this context. See the surveys [45, 46] for a comprehensive list of results on the infinite group relaxation.

1.2 Cutting-plane selection: the need to ask different questions

As described in the sections above, many different classes of general-purpose cutting-planes are known today and a wealth of results have been obtained about their strengths. However, when it comes to cutting-plane use and selection our scientific understanding is far from complete. Indeed, the seminal work of Balas, Ceria, Cornuejols, Natraj [30] were the first to obtain good results using the same GMI cuts that were already available in the 60s and tested in multiple papers. In order to better appreciate the design challenges in implementing cutting-planes, a quote from [13] is fitting:

“Branch-and-cut designers often have to face an Hamletic question: to cut or not to cut?”

Indeed, adding cuts in a naïve way could well reduce the number of branching nodes, but the overall computing time may increase dramatically, for example, due to an increase in time to solve the LPs in the branch-and-bound tree. Extreme care is needed regarding how cuts are added. To quote [30], one of the reason of their success is

“The cuts are used selectively, i.e., not all cuts from the pool are part of the formulation solved at every iteration. This helps us avoid the “clogging” of the linear programs with constraints that are locally useless.”

Modern solvers break down the problem of cut selection in two steps. In the Step 1, they maintain a list of valid cutting-planes called cut-pool that contains cuts that separate the current fractional point, together with all previously non-discarded cuts (but not added to LP) [4, 13]. Then in Step 2 the solver ranks the cuts from this pool and adds a few of them to the LP. As we illustrate next, several issues need to be considered in this selection process. We also point out why unfortunately the traditional analyses of strength of cuts offer only limited help in understanding and addressing these issues.

Measuring the effectiveness of cutting-planes While most theoretical analyses focus on the strength of a whole family of cuts, it is clear that solvers cannot add all of these cuts. Thus, it is important to have measures of the potential effectiveness of individual (or a small set of) cuts as a guide for selection. A common measure of effectiveness of a cut \(\alpha ^{\top }x \le \beta \) separating a fractional point \(x^{*}\) is the depth-of-cut, i.e., \(\frac{\alpha ^{\top } x^{*} - \beta }{||\alpha ||_2}.\) This measure has its drawbacks, and while counter-intuitive, it may rank higher weaker cuts: suppose the problem has non-negativity constraints and \(x^*_i = 0\); to obtain larger depth-of-cut one should try to set \(\alpha _i = 0\), while the same cut with \(\alpha _i > 0\) dominates the latter [13] (see [9] for concrete examples). Another potential measure is the volume cut off of a given LP [136]. The recent paper [40] proves that, for the group relaxation, the depth-of-cut measure may select very weak cuts, while a volume measure is able to single out GMICs, arguably one of the strongest cuts in practice, as the best one. It would be very interesting to better understand cuts with large volume measure and to be able to separate them efficiently. Also see [166] for an efficiency measure based on counting integer points, and [203] for other measures of efficiency.

Cuts from multiple sources Many solvers walk on the optimal face of the LP relaxation and collect cuts from different bases. (These multiple bases can also be helpful for primal heuristics.) One inspiration for this is how Gomory’s finite cutting-plane algorithm makes progress by incrementally cutting off the optimal face [1, 2, 209]. This may also help with obtaining stronger cuts; for example, the paper [94] shows how different corner relaxations for the Stable Set problem have very different strengths. Moreover, it is known that the split closure can be obtained from generating split cuts from multiple bases [11] (see [34, 100, 126] for computational uses of this idea). The paper [30] also shows how cuts from different parts of a branch and bound tree can be used in a different part of the branch-and-bound tree for binary MILPs. However, access to multiple sources of cuts leads to many questions: Should the cuts selected separate all the known optimal bases? What happens if none of the cuts separates all the known bases? How to adapt the measures of effectiveness to incorporate information of two or more fractional points?

Parallelism between cuts/objective function It has been observed that it is very important to use “cuts that improve the polyhedron in diverse directions” [31]. The dot-product between the normalized values of left-hand sides of two cuts is usually used as measure of parallelism. See the papers [4, 13] on how the performance of cutting-plane non-trivially depends on the setting of this parameter. The paper [9] presents cut generation explicitly optimizing also for parallelism using a bilevel optimization strategy. The paper [4] indicates that considering parallelism with respect to the objective function (favoring cuts in similar direction) may be ineffective. See also [81] for an MILP-based strategy for adding in a coordinated way the set of k cuts that provides the largest improvement to the relaxation’s objective value. To the best of our knowledge, there are no theoretical results/understanding pertaining to parallelism.

How many cuts to add A natural design parameter is the decision on how many cuts to add in each round. The importance of this parameter is gauged by the success of the the results in [30]. In particular, [30] prescribed adding a large number of GMIC simultaneously obtained from different rows of a simplex tableau corresponding to fractional basic variables. Previous to [30], most implementations added one cut at a time, which lead to very poor convergence. Also [30] observes that adding multiple cuts in rounds leads to less parallel cuts. Some options used are setting a fixed upper bound on the maximum number to be generated [3] and making this upper bound a fixed parameter times the number of constraints in the original formulation [13]. Whether these are the best ways to decide on the number of cuts to be added is not clear. For example, one may expect the number of integer variables to play a role, but to the best of our knowledge no such study has been conducted.

Whether to use facet-defining or simply minimal inequalities of a relaxation Given a relaxation whose valid inequalities will be used as cutting-planes, we may choose to restrict our search for cutting-planes to only facet-defining inequalities or use any valid inequality. If we do not use only facet-defining inequalities of the relaxation, a natural option is to use a cut measure, such as violation, for separating valid inequalities from it. For some families of cutting-planes, such as L&P, it is possible to write the separation problem as an LP. An important design choice is the selection of normalization used in the resulting LP; see for example [28, 61, 70]. In a recent paper, Wolsey and Conforti [80] address the problem of efficiently finding an inequality that is violated and either defines an improper face or a facet by the proper use of normalization. They also provide some evidence that this method works on structured and unstructured problems. Cadoux [64] proposes a projection-based method for generating a set of inequalities that are facet defining for the relaxation and that together dominate the separating inequality with largest depth-of-cut (see also [89]).

As mentioned above, other recent papers [48, 163] considered the question of separating faces versus facet-defining inequalities in the context of the (infinite) group relaxation. Surprisingly, it was shown that for any minimal inequality, there exists an extreme inequality that approximates the minimal inequality as closely as desired. Thus, in this context separating faces is effectively “equal to” separating facets. While some special cases are understood, many open problems remain regarding the selection of cuts from relaxations.

Cut sparsity It is well established that LP solvers perform much better with sparse formulations. Therefore, it is important to add sparser cutting-planes to maintain sparsity of the LP. We expand on the discussion pertaining to sparsity in Sect. 2.

Numerical properties, validity, and stability There are two main issues related to numerical properties of cuts. The first is that of validity: because most implementations use finite-precision arithmetic, round off errors may induce the generation and use of invalid cuts [93, 173]. This has motivated the line of research on numerically safe computations in integer programs; see for example [84] for the numerically safe MIR/GMI cuts. An alternative method is to have solvers using exact rational arithmetic [86, 129]. The second issue is that of numerical stability (e.g., ill conditioning of LPs), which is particularly salient when multiple rounds of cuts are generated [209]. To avoid this, typically cuts with large dynamism, i.e., ratio of largest to smallest absolute value of coefficients, are discarded [126]. Perhaps more interestingly, there have been a few proposed methods for generating cuts in a branch-and-cut framework to mitigate this issue, such as the use of rank-1 cuts [58, 100], the relax-and-cut framework [126], using a simplified version of the lexicographic simplex [209], and the use of generalized intersection cuts [33]. More research would be welcome to better understand the source of instability introduced by cut generation and how to avoid it in an efficient way.

Strength of cutting-planes with respect to problem being solved It is well-known that clique cuts are extremely useful for the stable set IPs, and flow covers are extremely important for certain fixed-charge network flow sub-structures [5]. However, very few results of this nature are known, i.e., that connect the performance of a particular general purpose cut to broad sub-classes of problems. Indeed, much of the results on the strength of cutting-planes, as discussed in Sect. 1.1, is via worst-case results. Such results only provide a coarse indication of which families of cuts seem to be strong, while not indicating which class of cutting-planes is good (or, equally importantly, bad) for a given class of instances. Recently, the notion of reverse rank was introduced and studied (for CG and split cuts) [78, 79] which connects instances to cut strength. In particular, the question asked is the following: for which problems, can there exist an LP relaxation with arbitrarily bad rank for a given class of cuts? While this is an important first step, many questions in this direction remain to be explored.

Balancing the different goals The points above illustrate that many, possibly conflicting, goals should be taken into account when generating cuts; the question is then how to balance these goals. Overall, the common strategy is to: (1) first generate a collection of violated cuts from different families using heuristics that typically aim at obtaining cuts with maximum effectiveness; (2) discard cuts whose performance on one of the goals (e.g., parallelism) is worse than a given (possibly dynamic) amount (possibly skipping this step for cuts with particularly high “quality”); (3) rank the remaining cuts using a linear combination of the different goals, and add the top cuts [4, 13, 203]. To the best of our knowledge, there are very few works that during cut generation explicitly attempt to optimize on more than one goal simultaneously. Here we can cite [9], which simultaneously optimizes efficiency and parallelism by solving a bilevel optimization problem, and [64] (mentioned above) that generates cuts that are facets for a relaxation and also dominate the separating cut with largest depth-of-cut.

We hope that the above discussion illustrates that there are many important issues pertaining to cut generation and selection, and also several exciting directions that can lead to a better understanding of cutting-plane selection. In the next two sections we review some lines of work that we undertook to provide a preliminary theoretical underpinning for some of the issues appearing in cut-selection discussed above, namely sparsity in Sect. 2 and use of knapsack relaxation cuts for packing and covering instances (i.e. connecting some family of cut to important sub-family of instances) in Sect. 3.

2 Sparsity

Sparsity is a very important topic in all areas of scientific computing. Consider the following statistics: The average number (median) of non-zero entries in the constraint matrix of MIPLIB 2010 [155] instances is \(1.63\%\) (\(0.17\%\)).

Sparse LPs can be solved efficiently [74, Chapter 3] [139, 184]. An IP solver typically solves a large number of LPs in a branch-and-bound tree to solve one IP instance. Therefore, clearly one of the biggest benefits of sparsity for state-of-the-art IP solvers is the sparsity of the underlying LP. In a very revealing study [201], the authors conducted the following experiment: They added a very dense valid equality constraint to a few constraints of the LP relaxation at each node while solving IP instances from MIPLIB using CPLEX. This does not change the underlying polyhedron, but makes the constraints dense. They observed approximately \(25\%\) increase in time to solve the instances if 9 constraints were made artificially dense.

In order to keep the underlying LP sparse, most commercial MILPs solvers consider sparsity of cuts as an important criterion for cutting-plane selection and use. The use of sparse cutting-planes may be viewed as a compromise between two competing objectives. As discussed above, on the one hand, the use of sparse cutting-planes aids in solving the linear programs encountered in the branch-&-bound tree faster. On the other hand, it is possible that ‘important’ facet-defining or valid inequalities for the convex hull of the feasible solutions are dense and thus without adding these cuts, one may not be able to attain significant integrality gap closure. This may lead to a larger branch-&-bound tree and thus result in the solution time to increase.

It is challenging to simultaneously study both the competing objectives in relation to cutting-plane sparsity. Therefore, a first approach to understanding usage of sparse cutting-planes is the following: How much do we lose in the closure of integrality gap if we only use (some of the) sparse cuts (as against dense cuts)?

2.1 Using all sparse cuts

The paper [117] considers the following starting point for understanding better this issue: If we are able to separate and use all valid inequalities with a given level of sparsity how much does this cost in terms of loss in closure of integrality gap?

Considered more abstractly, consider a polytope P contained in the \([0,\ 1]^n\) hypercube, representing the integer hull of an integer program. (The assumption \(P \subseteq [0,1]^n\) is WLOG and just a normalization.) A cut \(a x \le b\) is called k-sparse if the vector a has at most k nonzero components. For the polytope P, define \(P^k\) as the best outer-approximation obtained from k-sparse cuts, that is, it is the intersection of all k-sparse cuts valid for P. Consider the following natural measure of quality of approximation of P by sparse inequalities: \(\text {d}(P, P^k):= \max _{x \in P^k} \left( \mathrm{min}_{y \in P} \Vert x - y\Vert \right) \), where \(\Vert \cdot \Vert \) is the \(\ell _2\) norm. Note that \(\text {d}(P, P^k)\) is an upper bound on the depth of cut measure discussed in Sect. 1.2 when the left-hand-side of the cut is normalized to be a unit vector. The largest distance between any two points in the \([0,\ 1 ]^n\) hypercube is at most \(\sqrt{n}\). Therefore we will compare the value of \(\text {d}(P, P^k)\) to \(\sqrt{n}\).

We were able to obtain the following result in [117] to understand how well sparse cuts approximate the actual shape of the integer hull.

Theorem 1

(Upper Bound on \(\text {d}(P, P^k)\)) Let \(n \ge 2\). Let \(P \subseteq [0, 1]^n\) be the convex hull of points \(\{p^1, \ldots , p^t\}\). Then

$$\begin{aligned} \text {d}(P, P^k) \le \mathrm{min}\left\{ 8\frac{\sqrt{n}}{\sqrt{k}}\sqrt{2\log 4 tn}, 2\sqrt{n}\left( \frac{n}{k} -1\right) \right\} . \end{aligned}$$

In particular, consider polytopes with ‘few’ vertices, say \(n^q\) vertices for some constant q. Suppose we decide to use cutting-planes with half sparsity (i.e. \(k = \frac{n}{2}\)), a reasonable assumption in practice. Then plugging in these values, it is easily verified that \(\text {d}(P, P^k) \le {16}\sqrt{(q + 1)\log n} \approx c \sqrt{\log n}\) for a constant c, which is a significantly small quantity in comparison to \(\sqrt{n}\). In other words, if the number of vertices is small, independent of the location of the vertices, using half sparsity cutting-planes allows us to approximate the integer hull very well. We believe that as the number of vertices increase, the structure of the polytope becomes more important in determining \(\text {d}(P, P^k)\) and Theorem 1 only captures the worst-case scenario.

In the paper [117], it was also shown that the bound presented in Theorem 1 is tight for random 0-1 polytopes.

The proof of Theorem 1 uses probabilistic techniques which we outline below.

Proof sketch of Theorem 1

Consider a polytope \(P= {{\mathrm{Conv}}}\{p^1, p^2, \ldots , p^t\}\) in \([0,1]^n\). We only prove the first bound in the theorem, that is, we show that \(\text {d}(P, P^k)\) is at most \(4 \lambda ^*\) for

$$\begin{aligned} \lambda ^* = \frac{2n^{1/2}}{\sqrt{k}} \sqrt{2\log 4 t n}. \end{aligned}$$

Equivalently, we show that every point at distance more than \(4 \lambda ^*\) away from \(P\) is cut off by a k-sparse inequality valid for P. Assume that \(4 \lambda ^*\) is at most \(\sqrt{n}\), otherwise the result is trivial.

Let \(u \in \mathbb {R}^n\) be a point at distance more than \(4 \lambda ^*\) away from \(P\). Let v be the closest point in \(P\) to u. We can write \(u = v + \lambda d\) for some vector d with \(\Vert d\Vert _2 = 1\) and \(\lambda > 4\lambda ^*\). The inequality \(dx \le dv\) is valid for \(P\), so in particular \(d p^i \le dv\) for all \(i \in [t]\). In addition, this inequality cuts off u: \(du = dv + \lambda > dv\). The idea is to use this extra slack factor \(\lambda \) in the previous equation to show we can sparsify the inequality \(dx \le dv\) while maintaining separation of \(P\) and u. It then suffices to prove the following lemma.

Lemma 1

There is a vector \(\tilde{d} \in \mathbb {R}^n\) such that: \(\tilde{d}\) is k-sparse, \(\tilde{d} p^i \le \tilde{d} v + \frac{\lambda }{2}\) for all \(i \in [t]\), and \(\tilde{d} u > \tilde{d} v + \frac{\lambda }{2}\).

To prove the lemma we construct a random vector \(\tilde{\varvec{D}} \in \mathbb {R}^n\) satisfying these three properties simultaneously with non-zero probability. Let \(\alpha = \frac{k}{2 \sqrt{n}}\). Assume for simplicity that no coordinate of d is too big, that is, \(\alpha |d_i| \le 1\) for all i; bigger coordinates can be handled with simple modifications of the proof. Then define \(\tilde{\varvec{D}}\) as the random vector with independent coordinates where \(\tilde{\varvec{D}}_i\) takes value \(\text {sign}(d_i)/\alpha \) with probability \(\alpha |d_i|\) and takes value 0 with probability \(1 - \alpha |d_i|\). (For convenience we define \(\text {sign}(0) = 1\).)

Property 1. The probability that \(\tilde{\varvec{D}}\) is not k-sparse is at most \(\frac{1}{4n}\). Note that this vector is sparse in expectation, since the expected number of non-zero coordinates is \(\alpha \Vert d\Vert _1 \le k/2\). A simple calculation also upper bound the variance \(\text {Var}(\# {\text {of non-zero coords. of}} \tilde{\varvec{D}}) \le \frac{k}{2}.\) Since the coordinates of this vector are independent, by the Central Limit Theorem we expect it has approximately \(\frac{k}{2} \pm \sqrt{\frac{k}{2}} \ll k\) non-zero coordinates. In fact, one can make this quantitative by using Bernstein’s Inequality [63], obtaining that the probability that \(\tilde{\varvec{D}}\) is not k-sparse is at most \(\frac{1}{4n}\) (this uses the assumption \(4 \lambda ^* \le \sqrt{n}\)).

Property 2. The probability that for some vertex \(p^i\) we have \(\tilde{\varvec{D}} p^i \ge \tilde{\varvec{D}} v + \frac{\lambda }{2}\) is at most 1 / 4n. Given the “slack of \(\lambda \)” (which is at least \(4 \lambda ^*\)) mentioned before, it suffices to show that \(\Pr (\max _{i \in [t]} (\tilde{\varvec{D}} (p^i - v) - d(p^i - v)) \ge 2 \lambda ^*)\) is at most 1 / 4n. Define the centered random variable \(\varvec{Z}= \tilde{\varvec{D}} - d\). Triangle inequality and the fact that \(v \in {{\mathrm{Conv}}}\{p^1,p^2,\ldots ,p^n\}\) give that \(\max _{i \in [t]} \varvec{Z}(p^i - v) \le 2 \max _{i \in [t]} |\varvec{Z}p^i|\); so we upper bound the probability that the right-hand side is larger than \(\lambda ^*\).

First, fix a single \(i \in [t]\). Since \(\varvec{Z}\) is centered we have \(\mathbb {E}[\varvec{Z}p^i] = 0\). Simple calculations show that

$$\begin{aligned} \text {Var}(\varvec{Z}p^i) = \text {Var}(\tilde{\varvec{D}} p^i) \le \frac{\sqrt{n}}{\alpha } \le \frac{(\lambda ^*)^2}{4 \log 4 tn}. \end{aligned}$$

Then, again we expect that \(\varvec{Z}p^i\) is approximately \(0 \pm \sqrt{\text {Var}(\varvec{Z}p^i)} \ll \lambda ^*\), and using Bernstein’s inequality one actually obtains \(\Pr (|\varvec{Z}p^i| \ge \lambda ^*) \le \frac{1}{4tn}.\)

This analysis only considered a single \(i \in [t]\). But employing a union bound we can put all these bounds together and obtain that the probability that any \(i \in [t]\) has \(|\varvec{Z}p^i| \ge \lambda ^*\) is at most \(\frac{t}{4tn} = \frac{1}{4n}\), and thus \(\Pr (\max _{i \in [t]} |\varvec{Z}p^i| \ge \lambda ^*) \le \frac{1}{4n}\) as desired.

Property 3. The probability that \(\tilde{\varvec{D}}u > \tilde{\varvec{D}}v + \frac{\lambda }{2}\) is at most \(\frac{1}{2n - 1}\). Recalling \(u - v = \lambda d\), it is equivalent to upper bound \(\Pr (\tilde{\varvec{D}}d \le 1/2)\). In expectation we have \(\mathbb {E}[\tilde{\varvec{D}}d] = dd = 1\), and simple calculations (using \(4 \lambda ^* \le \sqrt{n}\)) show that for every scenario we have \(\tilde{\varvec{D}}d \le n\). Then employing Markov’s inequality to the non-negative random variable \(n - \tilde{\varvec{D}}d\), we get \(\Pr (\tilde{\varvec{D}}d \le 1/2) \le 1 - \frac{1}{2n - 1}\).

To prove Lemma 1, simply employ a union bound to see that with positive probability \(\tilde{\varvec{D}}\) satisfies the three properties simultaneously.

One drawback of this result is that it considers a very stylized situation, since almost always some dense cutting-planes are used or one is interested in approximating the integer hull only along certain directions or we may be able to use reformulations. Therefore, the paper [112] studies the following questions: (i) Are there polytopes, where the quality of approximation by sparse inequalities cannot be significantly improved by adding polynomial (or even exponential) number of arbitrary valid inequalities? (ii) Are there polytopes that are difficult to approximate under every rotation? (iii) Are there polytopes that are difficult to approximate in almost all directions using sparse inequalities? it was shown that the answer each of the above questions in the positive. This is perhaps not surprising: an indication that sparse inequalities do not always approximate integer hulls well even in the more realistic settings discussed above.

2.2 Sparse cuts for sparse formulations

Another feature missing from the previous results is that they work on worst-case instances. However, as mentioned before, in practice most instances in practice have a sparse LP formulation. For one, it is reasonable to expect that sparse cut perform better on sparse formulations. Moreover, the sparsity structure of the formulation can serve as a guide to choosing the support of the which sparse cuts to use.

As an extreme example, consider the feasible region of the following separable MILP:

$$\begin{aligned}&A^1x^1\le b^1\\&A^2x^2\le b^2\\&x^1\in \mathbb {Z}^{p_1}\times \mathbb {R}^{q_1},x^2\in \mathbb {Z}^{p_2}\times \mathbb {R}^{q_2} \end{aligned}$$

Since the constraints are completely disjoint in the \(x^1\) and \(x^2\) variables, the convex hull is obtained by adding valid inequalities in the support of the first \(p_1 + q_1\) variables and another set of valid inequalities for the second \(p_2 + q_2\) variables. Therefore, sparse cutting-planes, in the sense that their support is not on all the variables, are sufficient to obtain the convex hull. Now one would like to extend such a observation even if the constraints are not entirely decomposable, but “loosely decomposable”. Indeed this is the hypothesis that is mentioned in the classical computational paper [97]. This paper solves fairly large-scale 0-1 integer programs (up to a few thousand variables) within an hour in the early 80s, using various preprocessing techniques and the lifted cover inequalities within a cut-and-branch scheme. To quote from this paper:

“the inequalities that we generate preserve the sparsity of the constraint matrix.”

The paper [116] examines this interplay between formulation sparsity and sparse cuts. While the paper considers packing MILPs, covering MILPs, and a more general form of MILPs where the feasible region is arbitrary together with assumptions guaranteeing that the objective function value is non-negative, here we focus only on packing problems. So consider a packing problem of the form:

$$\begin{aligned} \mathrm{max}~&c^Tx \nonumber \\ s.t. ~&Ax \le b \nonumber \\&x_j \in \mathbb {Z}_+, \forall j \in \mathcal {L} \\&x_j \in \mathbb {R}_+, \forall j \in [n]\backslash \mathcal {L}, \nonumber \end{aligned}$$
(P)

with \(A \in \mathbb {Q}_{+}^{m \times n}\), \(b\in \mathbb {Q}_+^m\), \(c \in \mathbb {Q}_+^{n}\) and \(\mathcal {L} \subseteq [n]\).

In order to formalize the notion of “almost decomposable”, [116] uses an interaction graph of the columns of the constraint matrix A.

Definition 1

(Packing interaction graph of A) Consider a matrix \(A \in \mathbb {Q}^{m \times n}\). Let \(\mathcal {J}:= \{J_1, J_2 \ldots , J_q\}\) be a partition of the index set of columns of A (that is [n]). We define the packing interaction graph \(G^{\text {pack}}_{A, \mathcal {J}}= (V,E)\) as follows: (i) There is one node \(v_j \in V\) for every part \(J_j \in \mathcal {J}\). (ii) For all \(v_i, v_j \in V\), there is an edge \((v_i, v_j) \in E\) if and only if there is a row in A with non-zero entries in both parts \(J_i\) and \(J_j\), namely there are \(k \in [m]\), \(u \in J_i\) and \(w \in J_j\) such that \(A_{ku} \ne 0\) and \(A_{kw} \ne 0\).

Now we turn to the control over the sparsity structure of cuts one is willing to use. For that, [116] uses a list of allowed supports of the cuts and considers the result of adding all the valid cuts within these supports.

Definition 2

(Column block-sparse closure) Given the problem \(\mathrm{(P)}\), let \(\mathcal {J}:= \{J_1, J_2, \ldots , J_q\}\) be a partition of the index set of columns of A (that is [n]) and consider the packing interaction graph \(G^{\text {pack}}_{A, \mathcal {J}}= (V, E)\). With slight overload in notation, for a set of nodes \(S \subseteq V\) we say that inequality \(\alpha x \le \beta \) is a sparse cut on S if its support is contained in \(\bigcup _{v_j \in S} J_j\). The closure of these cuts is denoted by \(P^{(S)} := P^{(\bigcup _{v_j \in S} J_j)}\). Given a collection \(\mathcal {V}\) of subsets of the vertices V (the support list), we use \(P^{\mathcal {V},\text {pack}}\) to denote the closure obtained by adding all sparse cuts on the sets in \(\mathcal {V}\)’s, namely

$$\begin{aligned} P^{\mathcal {V},\text {pack}}:= \bigcap _{S \in \mathcal {V}} P^{(S)}. \end{aligned}$$

Example 1

(Two-stage stochastic packing program) The deterministic equivalent of a two-stage stochastic program with finitely many realizations of uncertain parameters in the second stage has the following form:

$$\begin{aligned} \max ~~~~&\sum _{i = 0}^K (c^i)^T x^i\nonumber \\ s.t. ~~~~&A x^0 \le b \\&A^i x^0 + B^i x^i \le b^i ~~~~~~~~\forall i \in [k]\nonumber \\&x^0, \ldots , x^k \text { integral}, \nonumber \end{aligned}$$
(2-SSP)

where \(x^0\) are the first stage variables and the \(x^i\)’s for \(i \ge 1\) correspond to each realization in the second stage. Note that there are no constraints involving two different realizations.

It is natural to put all the first stage variables \(x^0\) into one block and each of the second stage variables \(x^i\) (for \(i \ge 1\)) into a separate block of variables. So we have a graph \(G^{\text {pack}}_{A, \mathcal {J}}\) with vertex set \(\{v_0, v_1, \ldots , v_{k}\}\) and edges \((v_0, v_1), (v_0, v_2), \ldots , (v_0, v_k)\).

Consider the cuts on the support of first stages variables together with the variables corresponding to one second stage realization, the so-called single-scenario cuts. Such cutting-planes are well-studied, see for example [54, 66, 195, 211].

Intuitively, the “fewer edges” the graph \(G^{\text {pack}}_{A, \mathcal {J}}\) has, the more “almost decomposable” the blocks of variables \(J_1, J_2 \ldots , J_q\) are. However, it is also crucial to capture the interplay between this graph structure and the supports of the sparse cuts to be used. For that, [116] defines the notion of mixed stable sets and fractional mixed chromatic number (denoted by \(\eta ^{\mathcal {V}}\)) based on them (we refer to the paper for the exact definition). In the case the support list \(\mathcal {V}\) consists of just the individual nodes of \(G^{\text {pack}}_{A, \mathcal {J}}\) these notions reduce to the standard stable sets and fractional chromatic number.

With these definitions at hand, the main result of [116] for packing problem is a bound on the strength of sparse cuts, where the support of the sparse cuts is in a given list \(\mathcal {V}\), such that the bound solely depends on the fractional mixed chromatic number of \(G^{\text {pack}}_{A, \mathcal {J}}\). In particular, it is independent of the dimension and the data of the problem.

Theorem 2

Consider a packing integer program as defined in (P). Let \(\mathcal {J} \subseteq 2^{[n]}\) be a partition of the index set of columns of A and let \(G = G^{\text {pack}}_{A, \mathcal {J}}= (V,E)\) be the packing interaction graph of A. Then for any sparse cut support list \(\mathcal {V} \subseteq 2^V\) we have

$$\begin{aligned} z^{\mathcal {V}, \text {pack}} \le \eta ^{\mathcal {V}}(G) \cdot z^I, \end{aligned}$$

where \(z^{\mathcal {V}, \text {pack}}\) is the optimal value over the sparse closure \(P^{\mathcal {V},\text {pack}}\) and \(z^I\) is the optimal value of the original integer program (P).

Since the definitions involved in this theorem are a mouthful, we consider the more concrete case of the 2-stage stochastic packing program (2-SSP). Consider graph \(G^{\text {pack}}_{A, \mathcal {J}}\) defined as in Example 1, and consider the support list \(\mathcal {V}\) consisting of just the individual nodes of \(G^{\text {pack}}_{A, \mathcal {J}}\). In this case, only \(x^i\)-cuts are allowed (for \(i=0,1,\ldots ,k\)), i.e., cuts supported only on the \(x^i\) variables. As mentioned above, in this case the fractional mixed chromatic number reduces to the standard fractional chromatic number, which equals 2 for the star graph \(G^{\text {pack}}_{A, \mathcal {J}}\). Therefore, Theorem 2 reduces to the following.

Corollary 1

Consider a 2-stage stochastic integer program (2-SSP). If \(z^I\) is the optimal value of this integer program and \(z^{sparse}\) is the optimal value of the relaxation consisting only of all \(x^i\)-cuts (for \(i=0,\ldots ,k\)), then

$$\begin{aligned} z^{sparse} \le 2 z^I. \end{aligned}$$

We prove this corollary to illustrate the simple ideas behind the the proof of Theorem 2.

Proof of Corollary 1

Let \(P^I\) denote the integer hull of the integer program. Let \(\bar{x} = (\bar{x}^0, \ldots , \bar{x}^k)\) be a feasible solution for the relaxation consisting only of all the \(x^i\)-cuts cuts (for \(i = 0,\ldots ,k\)). We will upper bound its objective value by \(2 z^I\) by constructing two good integer solutions in \(P^I\).

Let \(\tilde{x}^i\) be the optimal solution just for the ith block of the problem, namely

Because we assumed the problem to be of packing-type, this is equivalent to

$$\begin{aligned} \tilde{x}^i \in {{\mathrm{argmax}}}\bigg \{ (c^i)^T x^i : x^i \in \text {proj}_{x^i} P^I\bigg \}, \end{aligned}$$
(2)

where \(\text {proj}_{x^i}\) denotes the projection onto the variables \(x^i\). Assume without loss of generality that \(\tilde{x}^i\) is integral.

By definition, \((\tilde{x}^0, 0, \ldots , 0)\) is a feasible solution for (2-SSP); moreover, because there are no constraints in (2-SSP) connecting any pair of different blocks \(x^i,x^j\) for \(i,j \ge 1\), we obtain that \((0, \tilde{x}^1, \ldots , \tilde{x}^k)\) is also a feasible solution. (This indicates the importance of stable sets in this context.) Each of these two solutions has value at most the optimum \(z^I\), thus

$$\begin{aligned} \sum _{i=0}^{k} (c^i)^T \tilde{x}^i \le 2 z^I. \end{aligned}$$
(3)

Now we claim that \(\bar{x}^i \in \text {proj}_{x^i} P^I\): this is because every valid inequality \(\alpha ^i x^i \le \beta \) for \(\text {proj}_{x^i} P^I\) naturally yields the \(x^i\)-cut \((0,\ldots ,0,\alpha ^i,0\ldots ,0) x \le \beta \) in the original space, which is satisfied by \(\bar{x}\), and hence \(\alpha ^i \bar{x}^i \le \beta \). The optimality of \(\tilde{x}^i\) in (2) then gives that \((c^i)^T \bar{x}^i \le (c^i)^T \tilde{x}^i\) (for all \(i = 0,\ldots , k\)), and so

$$\begin{aligned} \sum _{i = 0}^k (c^i)^T \bar{x}^i \le \sum _{i = 0}^k (c^i)^T \tilde{x}^i. \end{aligned}$$

Together with inequality (3) this concludes the proof.

The paper [116] also present similar phenomena for covering problems, and for more general MILPs where the feasible region is arbitrary together with assumptions guaranteeing that the objective function value is non-negative. This requires a different notion of interaction graph, and also the so-called “corrected average density” of the support list to be taken into account. The paper also present examples where these bounds are tight.

3 Selecting knapsack relaxation cuts

Recall that an important class of cuts via the method of structured relaxation (Sect. 1), is obtained via the knapsack relaxation. There are numerous ways to generate a knapsack relaxation: one can just use the original constraints, but more generally, we can take a linear combination of the constraints and obtain a knapsack relaxation. We call such cuts as aggregation cuts. Most cuts used by modern IP solvers (except perhaps clique cuts) can be described as aggregation cuts.

The set of all aggregation cuts have been studied empirically [127], but not much is known from a theoretical perspective. An important question from the perspective of cutting-plane selection is: How should we construct these relaxations? What multipliers should we use? What happens if we select cuts from relaxations obtained only using individual constraints, without aggregating?

3.1 Aggregation cuts and packing and covering MILPs

The paper [56] examines the strength of aggregation cuts for packing and covering MILPs. The main result is that for these classes of problems, even considering all infinitely many aggregations, offers limited help. More precisely, they show that the CG and more generally aggregation closures can be 2-approximated by simply generating the respective closures for each of the original constraints, without using any aggregations. Therefore, for these problems, in order to obtain cuts that are much stronger than original constraint cuts, one needs to consider more complicated cuts that cannot be generated through aggregations (e.g. clique cuts).

To give an idea of what is behind these results, we focus here on a weaker version of the result for aggregation cuts and packing polyhedra. We say that a packing polyhedron \(\{x \in \mathbb {R}^n_+ : Ax \le b\}\) is pre-processed if \(0 \le A_{ij} \le b_i\) for all ij (notice that otherwise all integer solutions have \(x_j = 0\) so we can simply exclude this variable). Given a packing polyhedron Q, its aggregation closure is defined as

$$\begin{aligned} \mathcal {A}(Q) := \bigcap _{\lambda \in \mathbb {R}^m_+} {{\mathrm{Conv}}}(\{x \in \mathbb {Z}^n_+ \mid \lambda ^\top A x \le \lambda ^\top b\}). \end{aligned}$$

Given two packing setsFootnote 1 \(U \supseteq V\), we say that U is an \(\alpha \)-approximation of V if for all non-negative objective functions \(c \in \mathbb {R}^n_+\) we have

$$\begin{aligned} \max \{c^\top x \mid x \in U\} \le \alpha \cdot \max \{c^\top x \mid x \in V\}. \end{aligned}$$
(4)

Notice that since \(U \supseteq V\), we have \(\alpha \ge 1\).

One of the results present in [56] is that the for pre-processed packing polyhedra the aggregation closure is quite close to the LP relaxation.

Theorem 3

For any pre-processed packing polyhedron P, P itself is a 2-approximation of the aggregation closure \(\mathcal {A}(P)\).

We remark that the LP relaxation P can have a gap of factor up to \(\Omega (n)\) to the integer hull (for example, in the standard formulation of the stable set problem), thus this result says that the aggregation closure can make very little progress.

We now present the proof of Theorem 3. At a very high level it stems from the following simple principle, which may be useful in guiding the choice of relaxations to be used for generating cuts:

“Relaxations with small integrality gap may yield weak cuts.”

This can be formalized in the case of packing sets as follows. Given a set Q, let \(Q^I\) denote its integer hull.

Lemma 2

Let \(P \in \mathbb {R}^n_+\) be packing set. Let \(\mathcal {Q}\) be a family of packing sets containing P (e.g., relaxations). Suppose there is \(\alpha \) such that for all \(Q \in \mathcal {Q}\), Q is an \(\alpha \)-approximation of \(Q^I\). Let \(\tilde{P} = P \cap \bigcap _{Q \in \mathcal {Q}} Q^I\), i.e., the effect of the addition of all valid cuts for the sets Q. Then P itself is an \(\alpha \)-approximation of \(\tilde{P}\).Footnote 2

Proof

One can show that given any two packing sets \(U \subseteq V\), V is an \(\alpha \)-approximation of U if and only if \(V \subseteq \alpha U := \{\alpha x : x \in U\}\) (see [130] and the extension to the non-polyhedral case in [56]). Thus, by our assumptions we have \(P \subseteq Q \subseteq \alpha Q^I\) for all \(Q \in \mathcal {Q}\), and since \(\alpha \ge 1\) we also have \(P \subseteq \alpha P\). Therefore

$$\begin{aligned} P \subseteq \alpha P \cap \bigcap _{Q \in \mathcal {Q}} \alpha Q^I = \alpha \left( \bigcap _{Q \in \mathcal {Q}} \alpha Q^I\right) = \alpha \tilde{P}, \end{aligned}$$

concluding the proof.

Theorem 3 is a quick consequence of this principle.

Proof of Theorem 3

For a fixed multiplier vector \(\lambda \), let \(P_{\lambda } = \{x \in \mathbb {R}^d_+ : \lambda ^{\top } A x \le \lambda ^{\top } b\}\) be the relaxation obtaining by the corresponding aggregation. Notice that \(P_{\lambda }\) is also a pre-processed packing polyhedron that is actually a knapsack. It is well-known and easy to show that for pre-processed knapsacks the LP relaxation is a 2-approximation of the IP hull [56], thus \(P_{\lambda }\) is a 2-approximation of \((P_{\lambda })^I\). The aggregation closure \(\mathcal {A}(P)\) is precisely the intersection of all the sets \((P_{\lambda })^I\), and so the result follows from the Lemma 2.

The paper [56] shows that results also hold for covering problem (with and without bounds) as well with suitable notions of “pre-processed” polyhedra. Moreover, the paper shows that all these pre-processings of arbitrary packing/covering polyhedra can be achieved by generating CG cuts for each of the rows individually, thus yielding a bound between the 1-row CG/aggregation closures and the full aggregation closure (see [56] for more detail).

3.2 Aggregations for general MIPs

The paper [56] also shows that for general MILPs the aggregation closure can be much stronger than the closure of the cuts implied by the original constraints of the MILPs individually. More precisely, the 1-row closure \(1\mathcal {A}(P)\) of a mixed-integer set \(P = \{ x \in \mathbb {R}^d \times \mathbb {Z}^n : Ax \le b,\, x \ge 0\}\) is obtained by intersecting the integer hulls of all relaxations \(\{ x \in \mathbb {R}^d \times \mathbb {Z}^n : (a_i)^\top x \le b_i,\, x \ge 0\}\) where \(a_i\) is the ith row of A. They show the following result.

Theorem 4

([56]) There is a family of mixed-integer sets where the 1-row closure is arbitrarily worse than the aggregation closure, namely for every \(\alpha \ge 0\) there is a mixed-integer set P and objective function c such that

$$\begin{aligned} \max \{ c^\top x : x \in 1\mathcal {A}(P)\} \ge \alpha \cdot \max \{ c^\top x : x \in \mathcal {A}(P)\}. \end{aligned}$$

As always, such worst-case examples could be very artificial and this phenomenon might not appear in practice. Given the availability of reasonably robust CG cut separating procedure [125], in our experiment we compare 1-row CG cuts (CG cuts for each \(\{ x \in \mathbb {R}^d \times \mathbb {Z}^n : (a_i)^\top x \le b_i,\, x \ge 0\}\)) versus general CG cuts. We study two classes of instances: random equality instances and the so-called market split instances [88]. See details of instances in [56]. It was empirically observed that the 1-row CG are significantly weaker than general CG cuts.

The message for cutting-plane selection is clear: For packing and covering IPs, it may be sufficient to generate cuts from knapsack cuts from original single constraints, while for more general IPs aggregations can be significantly useful.

3.3 Aggregations with sign patterns

We dig deeper into this question of packing and covering vs general IPs and the performance of aggregation cuts. Note that aggregation cuts can be generalized to multi-row aggregation: apply a k different set of multipliers to the original constraints to obtain a k-constraint relaxation and add cut valid for this set.

Observe that for packing and covering IPs all the coefficients of all the variables in all the constraints have the same sign. Therefore when we aggregate constraints we are not able to “cancel” variables, i.e., the support of an aggregated constraint is exactly equal to the union of supports of the original constraints used for the aggregation. A natural conjecture for the fact that the aggregation (resp. multi-row aggregation, i.e.) closure is well approximated by the original (resp. multi-row) 1-row closure for packing and covering problems, is the fact that such cancellations do not occur for these problems. Indeed one of the key ideas used to obtain good candidate aggregations in the heuristic described in [172] is to use aggregations that maximize the chances of a cancellation.

In order to study the effect of cancellations, the paper [113] examines the strength of aggregation closures vis-à-vis original row constraints for sign-pattern IPs. A sign-pattern IP is a problem of the form \(\{x\in \mathbb {Z}^n_+:Ax\le b\}\) where a given variable has exactly the same sign in every constraint, i.e. for a given j, \(A_{i,j}\) is either non-negative for all i or non-positive for all i. Thus aggregations do not create cancellations. The results are interesting: On the one hand we are able to show that the aggregation closure for such sign-pattern IPs is 2-approximated by the original 1-row closure. On the other hand, the multi-row aggregation closure cannot be well approximated by the original multi-row closure. Therefore, these classes of integer programs show results that are in between packing and covering IPs on one side and general IPs on the other side.

4 Final remarks

In Sect. 1, we discussed various types of cutting-planes available to modern day solvers and the theoretical analysis of these cutting-planes that has been conducted. As we discussed, while the theoretical analysis conducted thus far is important, these results have so far failed to directly help in actual cutting-plane selection. In Sect. 1.2, we then highlighted the need to focus on alternative questions in order to better develop the science of cutting-plane selection. We would like to throw a word of caution here. While the questions we have indicated may eventually be important ones to pursue, the answers may not be unique. Indeed, modern day solvers have become complex, highly integrated engineering marvels, where different components of the solvers are heavily inter-dependent, and therefore it may be altogether possible that one kind of cutting-plane selection improves the average solution time in the context of one code and miserably fails in conjunction with other codes! Moreover, different tasks performed by a solver need allocation of time and resource (memory) and the task of obtaining dual bounds is not necessarily the “top priority” for many industrial users of solvers. Given solvers must “please all”, the choice of cutting-plane selection definitely must speak to different requirements and compulsions. Overall, all of this points to the fact that the questions asked in Sect. 1 must be pursued rigorously, but specific answers for each solver may lie in balancing various competing goals in cutting-plane selection (as mentioned in Sect. 1.2).

Based on the above discussion, one is also tempted to ask if there is any value in attempting to pursue “principled and scientific” theory for cutting-plane selection at all. We believe that the answer is yes. Most solvers have implemented heuristics for cutting-planes selection that can be run cheaply and do not put too much pressure on other components of the solvers. To go back to the example of GMICs, one reason that most codes, before the work by [30], perhaps separated one cut and resolved LP was that it was considered expensive to add too many cuts and increase the size of LPs significantly. As computer speeds improve and specific architectures (such as parallel computing, GPUs etc.) steadily get better, it may be possible to revisit all the heuristics that current day solvers implement and improve them on the basis of hard science of integer programming and cutting-plane selection. Therefore, we believe a better understanding of cutting-plane selection in a systematic fashion is likely to improve solvers in the long run.

We also hope that there are some specific take-aways related to cut-selection from the results presented in Sects. 2 and 3: For example, if we have a structured sparse packing IP, Sect. 3 indicates that if we generate aggregation cuts, then we may restrict the selection of cuts from knapsack relaxations of the original constraints. The result of Sect. 2.2 presents indication of specific sparsity patterns of cuts to be selected, based on the sparsity pattern of the packing IP. We recognize that these results are very preliminary results; as such, we are very far from answering the questions raised in Sect. 1.2.

We end this paper with another important question: Is it possible to solve all (or any of) the questions raised in Sect. 1.2 using machine learning? Indeed, there is a significant trend of applying machine learning to integer programming. Recently this has been applied to branching [7, 154, 167], node selection [143], decompositions [159], solver parameter selection [146, 207], load balancing in parallel branch-and-bound [8], probing in MINLPs [178], and selecting solution methods for mixed-integer quadratic programs [59]. To the best of our knowledge machine learning has not been applied to cut generation and selection. We believe that the answer to improved and principled cut-selection may be somewhere in between using only machine learning or only theoretical analysis. It is possible to envisage that results from theoretical analysis of cutting-plane selection work in concert with learning algorithms. Indeed, learning task models may be able to leverage specific theoretical knowledge about cutting planes selection.

Overall, we hope that this review fosters inquiry into new questions and techniques that aid in cutting-plane selection, integrating practical knowledge and considerations with theoretical tools.