1 Introduction

Introduced in 1969 by Garfinkel et al. [8], the Set Partitioning Problem (SPP) is a well-known model of integer linear programming. Its popularity mainly comes from its simple expression and the wide range of its applications. It can be expressed as followsFootnote 1:

$$\begin{aligned} \min \left\{ \varvec{c}^T \varvec{x} \;|\; \varvec{A} \varvec{x} = \varvec{e} \, , \; \varvec{x} \in \{0,1\}^n \right\} , \end{aligned}$$

where \(\varvec{A}\) is an \(m \times n\) {0,1}-matrix, \(\varvec{c}\) is a cost vector, and \(\varvec{e}\) is the vector of all ones.

Applications range from aircrew scheduling [4] to vehicle routing [2] and electricity production planning [25], among others.

Primal methods for integer linear programming were introduced in the 1960s, at the same time as most classical frameworks (e.g., branch and bound). These algorithms, sometimes described as augmenting methods or all-integer algorithms, are based on the following pattern: given a starting integer solution, improve it iteratively to obtain a sequence of integer solutions with improving objective function values until optimality is reached.Footnote 2 Two of their main features are (i) to avoid the combinatorial exploration of a branching tree and (ii) to take advantage of existing starting solutions. In turn, a key feature of SPP is that it possesses the quasi-integral (or Trubin) property [34], i.e., every edge of the convex hull of the feasible set is also an edge of the polytope of the linear relaxation. Therefore, it is suitable for the implementation of all-integer algorithms in which all improvements are obtained by performing simplex pivots;Footnote 3 such algorithms are hence named integral simplex algorithms. One of the first attempts in this direction is the seminal work of Balas and Padberg [1], who first proposed an integral simplex algorithm specifically designed for the set partitioning problem.

One of the main drawbacks of algorithms based on simplex pivots is their inability to perform well on degenerate problems. In mathematical programming, degeneracy occurs when some basic variables are at one of their bounds, which is common in the particular case of SPP. In this case, it is very likely that the value of the variable entering the simplex basis cannot be modified without making the current solution infeasible. The resulting degenerate pivot leads to no change in the solution and no improvement in the objective value. Recently, Zaghrouti et al. [37] proposed a new algorithm for SPP, the Integral Simplex Using Decomposition (Isud), which is an offspring of recent work conducted around the Improved Primal Simplex [6, 17, 18, 33]. It is therefore designed to take advantage of degeneracy rather than suffer from it. Given (i) the canonically degenerate nature of SPP, particularly in industrial applications, and (ii) the observation that primal algorithms experience trouble with degeneracy, particularly when primal cutting planes are used (e.g., [16]), the advantageous way in which Isud copes with degeneracy shows its potential to embody the next generation of primal algorithms. Furthermore, it seems natural to apply primal cutting-plane techniques within an “anti-degeneracy” framework, since degeneracy is reported as the main difficulty experienced when adding cuts in primal methods.

From a practical point of view, and as shown in the last part of this paper, Isud proves highly efficient for reoptimization. This key feature makes it a very promising method in two particularly important settings: the reoptimization of existing solutions, and all-integer column generation. First, the need to quickly reoptimize an existing solution (a schedule) in case of unforeseen events is fundamental in many practical cases. Take for instance the example of airline crew scheduling (see [31]): the planned manpower schedule must often be modified to react to day-to-day operational constraints such as schedule disruptions, aircraft substitutions, crew absences, strikes or even volcanic eruptions! Second, consider the all-integer column generation process. Column generation is an optimization technique used to solve very large problems. When solving problems with integer variables, it is usually embedded in a branch-and-price framework. As for standard integer programming, the exploration of the branching tree can turn out to be a very long process. In the same way that primal algorithms are an alternative to branch and bound, all-integer column generation is an alternative to branch and price. In all-integer column generation, a subset of the columns of the constraint matrix is generated in the beginning, and the optimal solution to this program, called the restricted master problem, is found. Then, subproblems generate new columns to be appended to the matrix, and the new optimal solution for the extended matrix must be determined. The process is then iterated until no improving column can be generated. Obviously, solving the extended problem from scratch (branch and bound) results in a loss of information and a probable lack of efficiency, while using the previous solution as a warm start could give the algorithm a substantial advantage (primal algorithms). Hence, any efficient all-integer column generation code requires a good reoptimization method, since the global process is based on successive updates of an integral solution. Thus, Isud is an excellent candidate for the reoptimization of the optimal solution of the restricted master problem every time it is extended.

This paper is organized as follows. A literature review on primal algorithms, primal cutting planes and integral simplex methods, as well as some notation, problem definitions and the contribution statement, are given in Sect. 2. The Isud algorithm is presented in an innovative way and new theoretical results are discussed in Sect. 3. Primal cutting planes are discussed in Sect. 4: new separation procedures are described, and we show that the search space of the separation can be restricted without preventing the algorithm from finding these cutting planes. Numerical results, which show the potential of adding primal cuts to Isud, are displayed in Sect. 5, and conclusions are drawn in Sect. 6. An extended abstract of this work was published in [24].

2 Literature review and contribution statement

In this paper, lower-case bold symbols are used for column vectors and upper-case bold symbols denote matrices. For subsets \(\mathcal {X} \subseteq \left\{ 1, \ldots , m \right\} \) of row indices and \(\mathcal {Y} \subseteq \left\{ 1, \ldots , n \right\} \) of column indices, the submatrix of \(\varvec{A}\) with rows indexed by \(\mathcal {X}\) and columns indexed by \(\mathcal {Y}\) is denoted as \(\varvec{A}_{\mathcal {X}\mathcal {Y}}\). Similarly, \(\varvec{A}_{\mathcal {X}.}\) is the set of rows of \(\varvec{A}\) indexed by \(\mathcal {X}\), while \(\varvec{A}_{.\mathcal {Y}}\) is the set of columns of \(\varvec{A}\) indexed by \(\mathcal {Y}\), and for any vector \(\varvec{v} \in \mathbb {R}^n\), \(\varvec{v}_\mathcal {Y}\) is the subvector of all \(v_y\), \(y \in \mathcal {Y}\). The vector of all zeros (resp. ones) with dimension dictated by the context is denoted by \(\varvec{0}\) (resp. \(\varvec{e}\)), and \(\varvec{A}^{T}\) is the transpose of \(\varvec{A}\). Finally, the linear span of all columns of any matrix \(\varvec{M}\), also called image of \(\varvec{M}\), is denoted as \( Span \,\left( \varvec{M} \right) \).

2.1 Primal algorithms for integer linear programs

Before addressing the SPP, a general introduction on primal algorithms for ILP is given here. We consider a generic integer linear program

$$\begin{aligned} ({\textsc {ILP}}) \quad z^\star _{{\textsc {ILP}}} = \min \left\{ \varvec{c}^T \varvec{x} \;|\; \varvec{A} \varvec{x} = \varvec{b} \, , \; \varvec{x} \ge \varvec{0} \, , \; \varvec{x} \in \mathbb {Z}^n \right\} , \end{aligned}$$

where \(\varvec{A} \in \mathbb {N}^{m \times n}\), \(\varvec{b} \in \mathbb {N}^m\) and \(\varvec{c}\in \mathbb {N}^n\), with \(\mathbb {N}\) the set of natural integers. \(\varvec{A} \varvec{x} = \varvec{b}\) are called the linear constraints and \(\varvec{x} \ge 0\) the nonnegativity constraints. The set of all feasible solutions of \({\textsc {ILP{}}}\) is denoted by \(\mathcal {F}_{{\textsc {ILP{}}}}\). \(z^\star _{{\textsc {ILP{}}}}\) is called the optimal value of \({\textsc {ILP{}}}\) and any feasible solution \(\varvec{x}^\star \in \mathcal {F}_{{\textsc {ILP{}}}}\) such that \(\varvec{c}^T \varvec{x}^\star = z^\star _{{\textsc {ILP{}}}}\) is called an optimal solution of \({\textsc {ILP{}}}\). The linear relaxation of \({\textsc {ILP{}}}\), denoted as \({\textsc {ILP{}}}^{\mathrm {LR}}\) is the linear program obtained by relaxing the integrality constraints of \({\textsc {ILP{}}}\). Its feasible domain and optimal value are resp. denoted as \(\mathcal {F}_{{\textsc {ILP{}}}^{\mathrm {LR}}}\) and \(z^\star _{{{\textsc {ILP{}}}}^{\mathrm {LR}}}\).

As noted by Letchford and Lodi [14], algorithms for integer linear programming can be divided into three classes: dual fractional, dual integral, and primal methods. Dual fractional algorithms maintain optimality and linear-constraint feasibility at every iteration, and they stop when integrality is reached. They are typically standard cutting plane procedures such as Gomory’s algorithm [10]. The classical branch-and-bound scheme is also based on a dual-fractional approach, in particular for the determination of lower bounds. Dual integral methods maintain integrality and optimality, and they terminate once the (primal) linear constraints are satisfied. Letchford and Lodi give the sole example of another algorithm of Gomory [11]. Finally, primal algorithms maintain feasibility (including integrality) throughout the process and stop when optimality is reached. These are in fact descent algorithms for which the improving sequence \((\varvec{x}^k)_{k = 1, \ldots , K}\) satisfies the following conditions:

C1: \(\varvec{x}^k \in \mathcal {F}_{{\textsc {ILP{}}}}\) for all \(k\);

C2: \(\varvec{x}^K\) is optimal;

C3: \(\varvec{c}^T \varvec{x}^{k+1} < \varvec{c}^T \varvec{x}^k\) for all \(k < K\).

Primal methods (sometimes classified as augmenting algorithms) were first introduced simultaneously by Ben-Israel and Charnes [3] and Young [35], and improved by Young [36] and Glover [9]. In Young’s method [35, 36], at iteration \(k\), a simplex pivot is considered: if it leads to an integer solution, it is performed; otherwise, cuts are generated and added to the problem, thereby changing the underlying structure of the constraints. Young also developed the concept of an augmenting (or improving) vector at \(\varvec{x}^k\), i.e., a vector \(\varvec{z} \in \mathbb {R}^n\) such that \(\varvec{x}^k + \varvec{z}\) is integer, feasible, and of lower cost than \(\varvec{x}^k\). From this notion comes the integral augmentation problem (i AUG), which consists in finding such a vector if one exists or asserting that \(\varvec{x}^k\) is optimal.

(i AUG): Given \(\varvec{x}^k \in \mathcal {F}_{{\textsc {ILP{}}}}\), find an augmenting vector \(\varvec{z}\) at \(\varvec{x}^k\), i.e., \(\varvec{z}\) such that \(\varvec{x}^k + \varvec{z} \in \mathcal {F}_{{\textsc {ILP{}}}}\) and \(\varvec{c}^T \varvec{z} < 0\), or assert that no such vector exists (\(\varvec{x}^k\) is optimal).

Remark 1

Traditionally, papers on constraint aggregation and integral simplex algorithms deal with minimization problems, whereas authors usually present generic primal algorithms for maximization problems. We therefore draw the reader’s attention to the following: to retain the usual classification, we call the improving direction problem i AUG, although it supplies a decreasing direction. In the same way, an improvement (general term) is, in our case, a decrease. For the same reasons, we still call it an augmentation.

For the sake of readability, in this paper, we will differentiate i AUG (above) from the fractional augmentation f AUG. The latter is the relaxation of the former in which \(\varvec{z}\) may be fractional and \(\varvec{x}^k + \varvec{z}\) need only be a solution of the linear relaxation \({\textsc {ILP{}}}^{\mathrm {LR}}\).
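To fix ideas, the following minimal Python sketch illustrates the generic primal scheme built around such an oracle; the function names are hypothetical, and solve_iaug stands for any routine solving i AUG (exactly or heuristically).

```python
def primal_augmentation(x0, solve_iaug):
    """Generic primal scheme (conditions C1-C3): keep an integer feasible
    solution and improve it until the i AUG oracle proves optimality.

    x0         : integer feasible starting solution (list of ints)
    solve_iaug : hypothetical oracle; returns an augmenting integer vector z
                 such that x + z is feasible and cheaper, or None if x is optimal
    """
    x = list(x0)
    while True:
        z = solve_iaug(x)
        if z is None:                      # no augmenting vector: x is optimal (C2)
            return x
        # feasibility and integrality are preserved (C1), the cost strictly decreases (C3)
        x = [xi + zi for xi, zi in zip(x, z)]
```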

At the end of the 1990s, there was renewed interest in primal integer algorithms, inspired by Robert Weismantel, as mentioned in [15]. Many recent works specifically concern {0,1}-LP (integer programming problems in which the variables can only take the values 0 or 1). However, only a few papers have addressed the practical solution of i AUG, most of them considering it as an oracle. As a matter of fact, most of the rare computational work on primal algorithms since 2000 concerns the primal separation problem, defined as

(P-SEP): Given \(\varvec{x}^k \in \mathcal {F}_{{\textsc {ILP{}}}}\) and a point \(\varvec{x}^\star \in \mathcal {F}_{{\textsc {ILP{}}}^{\mathrm {LR}}} \setminus \mathcal {F}_{{\textsc {ILP{}}}}\), find an inequality that is valid for \( Conv \,\left( \mathcal {F}_{{\textsc {ILP{}}}}\right) \), tight at \(\varvec{x}^k\), and violated by \(\varvec{x}^\star \), or assert that no such inequality exists.

In the general case, there is no guarantee that a primal separation hyperplane exists, and no particular conclusion can be drawn if none is found. In our case, however, \(\varvec{x}^\star \) is typically a vertex of the feasible domain of the linear relaxation \(\mathcal {F}_{{\textsc {ILP{}}}^{\mathrm {LR}}}\) adjacent to \(\varvec{x}^k\), obtained by performing one or several simplex pivots from \(\varvec{x}^k\), which guarantees that such a hyperplane exists (see Sects. 3 and 4). An example of a primal separation hyperplane is given in Fig. 1.

Fig. 1 Example of primal separation. \(\left( H\right) \) is a primal cut separating \(\varvec{x}^\star \) from \( Conv \,\left( \mathcal {F}_{{\textsc {ILP{}}}}\right) \), and tight at \(\varvec{x}^k\). The dark area represents the feasible domain of the linear relaxation \(\mathcal {F}_{{\textsc {ILP{}}}^{\mathrm {LR}}}\). The feasible domain of ILP is a finite set represented by bullets (\(\bullet \)) and its convex hull \( Conv \,\left( \mathcal {F}_{{\textsc {ILP{}}}}\right) \) is the light grey polyhedron

In 2003, Eisenbrand et al. [5] proved that the primal separation problem is, from the theoretical point of view, as difficult as the integral optimization problem for {0,1}-LP. It is therefore expected to be a “complicated” problem because {0,1}-LP is \(\mathcal {NP}\)-hard. Letchford and Lodi [14, 16] and Eisenbrand et al. [5] adapt well-known algorithms for the standard separation problem to primal separation. To the best of our knowledge, only a few papers present computational experiments using primal methods. The best examples are those of Salkin and Koncal [26], Letchford and Lodi [14], Haus et al. [12], and Stallmann and Brglez [30]. All these papers present results on small to mid-size instances. Haus et al. [12] describe a solid framework and their implementation is certainly the most complete one. Letchford and Lodi [14] present results for an algorithm using primal cutting planes and, interestingly, they state that degeneracy prevented them from solving larger instances. As already mentioned, the Isud algorithm was originally designed to cope with degeneracy and therefore seems a promising answer to these computational limits of primal algorithms.

For further information on primal algorithms, the reader is referred to the more extensive review of Spille et al. [29].

2.2 The set partitioning problem

The Set Partitioning Problem (SPP) is a particular case of {0,1}-LP, presented in the introduction. In a mathematical programming form, it reads as

$$\begin{aligned} ({\textsc {SPP}}) \quad z^\star _{{\textsc {SPP}}} = \min \left\{ \varvec{c}^T \varvec{x} \;|\; \varvec{A} \varvec{x} = \varvec{e} \, , \; \varvec{0} \le \varvec{x} \le \varvec{e} \, , \; \varvec{x} \in \mathbb {Z}^n \right\} , \end{aligned}$$

where \(\varvec{A} \in \{0,1\}^{m \times n}\) is a {0,1}-matrix, and \(\varvec{c} \in \mathbb {N}^n\) is any integer cost vector. Obviously, given the bounds (\(\varvec{0}\) and \(\varvec{e}\)) and the integrality constraints, \(\mathcal {F}_{{\textsc {SPP{}}}}\) only contains {0,1}-vectors. The set of indices of the rows is denoted as \(\mathcal {R} = \{ 1, \ldots , m\}\). Moreover, given a current solution \(\varvec{x}^0 \in \mathcal {F}_{{\textsc {SPP{}}}}\), the indices \(\left\{ 1, \ldots , n \right\} {}\) of its components can be partitioned into sets \(\mathcal {P} = \{ j | x^0_j = 1 \}\), which is the set of positive-valued variables, and \(\mathcal {Z} = \{ j | x^0_j = 0 \}\), which is the set of null variables. Submatrix \(\varvec{A}_{.\mathcal {P}}\) is referred to as the working basis, and so is \(\mathcal {P}\) by extension. Its indices and the variables of \(\varvec{x}_\mathcal {P}\) are respectively referred to as basic indices and basic variables, and \(p = |\mathcal {P}|\) denotes the cardinality of this working basis. The working basis is different from a standard simplex basis because it only contains linearly independent positive-valued variables (no degenerate variables). In the rest of this paper, the terms basis, basic, etc. refer to the working basis \(\mathcal {P}\).
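As an illustration of this notation, the following small Python example (a hypothetical instance, not taken from the paper) builds the partition \((\mathcal {P}, \mathcal {Z})\) and the working basis from a current solution.

```python
import numpy as np

# A small hypothetical SPP instance: 4 rows, 6 columns, binary matrix A.
A = np.array([[1, 0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0, 1],
              [0, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 0, 1]])
c = np.array([3, 2, 4, 5, 6, 1])

# A current integer solution x0: columns 0 and 1 cover every row exactly once.
x0 = np.array([1, 1, 0, 0, 0, 0])
assert (A @ x0 == 1).all()                          # A x0 = e

# Partition of the column indices into P (positive variables) and Z (null variables).
P = [j for j in range(A.shape[1]) if x0[j] == 1]    # working basis indices, p = |P|
Z = [j for j in range(A.shape[1]) if x0[j] == 0]
working_basis = A[:, P]                             # A_.P: disjoint, linearly independent columns
print(P, Z)                                         # [0, 1] [2, 3, 4, 5]
```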

As proved in 1969 by Trubin [34], the SPP is quasi-integral, i.e., every edge of the convex hull of the feasible set \( Conv \,\left( \mathcal {F}_{{\textsc {SPP{}}}}\right) \) is also an edge of the polytope of the linear relaxation \(\mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\). A consequence of this property is the existence of a decreasing sequence of integer solutions of SPP leading to an optimal solution, such that two consecutive solutions are adjacent vertices of \(\mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\) [easily proven by applying the simplex algorithm over \( Conv \,\left( \mathcal {F}_{{\textsc {SPP{}}}}\right) \)]. When working on the SPP, the following condition C4 can therefore be added to C1C3 to transform a primal algorithm into an integral simplex without preventing the procedure to reach optimality.

C4: \(\varvec{x}^{k+1}\) is a neighbor of \(\varvec{x}^k\) in \(\mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\).

These methods, first introduced in 1975 by Balas and Padberg [1], yield a sequence of improving all-integer solutions obtained by performing simplex pivots only. Since this seminal paper, other integral simplex methods have been proposed (see Thompson [32] and Saxena [27]). Amongst contemporary work conducted in parallel with ours, the most interesting to mention is that of Rönnberg and Larsson (see [21, 22]). Finally, note that the SPP is by nature highly degenerate. Hence, it seems relevant to apply this kind of “anti-degeneracy” technique in this case.

2.3 Contribution statement

With the concepts introduced in Sect. 2, we can describe the contributions of Sects. 3 and 4 more clearly. In Sect. 3, we present the Isud algorithm in a primal way, and relate it to the augmentation problems i AUG and f AUG. We formulate f AUG as a linear program, and i AUG as a nonlinear one of which f AUG is the linear relaxation. Each of these problems is decomposed into two subproblems. We reduce the number of constraints of both decomposed subproblems, and give a geometrical interpretation of these row-reduced problems. We also provide a simple characterization of integer directions. Section 4 addresses the improvement of the linear relaxation of i AUG with cutting planes. We demonstrate that every valid inequality for i AUG can be obtained as the linear transformation of a primal cut of SPP. We prove that such a primal cut always exists and that the characterization of integer directions given in Sect. 3 remains correct after the addition of cutting planes. Finally, we introduce two new P-SEP procedures for primal clique and odd-cycle cuts, and show that the search space for the cuts can be reduced to a small number of variables without changing the outcome of P-SEP.

3 The integral simplex using decomposition (ISUD)

This section aims to present the Integral Simplex Using Decomposition (Isud) of Zaghrouti et al. [37] from a purely primal point of view, so as to make the link with primal algorithms straightforward, and simplify some proofs as well as the geometrical interpretation of the process. Section 3.1 concentrates on f AUG, while Sect. 3.2 also considers integrality and tackles i AUG.

Hereinafter, suppose a decreasing sequence of solutions of \({\textsc {SPP{}}}\) ending at \(\varvec{x}^k\) is known, and i AUG must now be solved. For the sake of readability, we always denote the current (binary) solution as \(\varvec{x}^0\). We want to determine a direction \(\varvec{d} \in \mathbb {R}^n\) and a step \(r > 0\) such that \(\varvec{x}^1 = \varvec{x}^0 + r \varvec{d} \in \mathcal {F}_{{\textsc {SPP{}}}}\) is of lower cost than \(\varvec{x}^0\), or to assert that \(\varvec{x}^0\) is optimal. The set of positive-valued variables is denoted as \(\mathcal {P}^0 = \{ j | x^0_j = 1 \}\), and that of null variables as \(\mathcal {Z}^0 = \{ j | x^0_j = 0 \}\). For the sake of readability, \(\mathcal {P}\), \(\mathcal {Z}\), \(\varvec{d}\), and \(r\) (and later other objects) will not be indexed by the index of the current solution (k or 0), although they depend on \(\varvec{x}^0\) (or \(\varvec{x}^k\)).

3.1 Fractional augmentation f AUG and phase decomposition

In this section, the sole problem of a fractional augmentation, f AUG, is considered. However, for the sake of clarity, \(\varvec{x}^0 \in \mathcal {F}_{{\textsc {SPP{}}}}\) is still supposed to be integer. This section extends the work done by Omer et al. [18], and Rosat et al. [23] and details it in the case of the SPP.

3.1.1 Generic fractional augmentation

To address f AUG in practice, it must be formulated in such a way that it can be solved algorithmically. It consists in finding a direction \(\varvec{d} \in \mathbb {R}^n\) that is feasible (\(\exists \rho > 0 \, | \, \varvec{x}^0 + \rho \varvec{d} \in \mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\)) and augmenting (\(\varvec{c}^T \varvec{d} < 0\)). The set of all feasible directions at \(\varvec{x}^0\) is the cone

$$\begin{aligned} \varGamma = \left\{ \varvec{d} \in \mathbb {R}^n\, |\, \varvec{A} \varvec{d} = 0\, , \, \varvec{d}_\mathcal {P} \le \varvec{0}\, , \, \varvec{d}_\mathcal {Z} \ge \varvec{0} \right\} \text {,} \end{aligned}$$
(1)

from which we will only consider a section,

$$\begin{aligned} \varDelta = \varGamma \cap \left\{ \varvec{d} \in \mathbb {R}^n\, |\, \varvec{e}^T \varvec{d}_\mathcal {Z} = 1 \right\} \text {.} \end{aligned}$$
(2)

The linear constraint \(\varvec{e}^T \varvec{d}_\mathcal {Z} = 1\) is called the normalization constraint. A geometric interpretation of \(\varGamma \) and \(\varDelta \) is given in Fig. 2. Note that, since \(\varvec{A}\) is nonnegative and because of the sign constraints, any nonzero feasible direction \(\varvec{d} \in \varGamma \) has nonzero terms in both \(\varvec{d}_\mathcal {P}\) and \(\varvec{d}_\mathcal {Z}\). Hence, the normalization constraint defines a proper section of the cone and \(\varDelta \) is a (bounded) polytope.

Fig. 2 Geometric description of \(\varDelta \) and \(\varGamma \). The cone \(\varGamma \) of all feasible directions at \(\varvec{x}^0\) is defined by the three arrowed lines. The grey area represents \(\varDelta \), the set of normalized feasible directions

It is easy to see that at least one feasible direction is augmenting if and only if the program

$$\begin{aligned} ({\textsc {Mima}}) \quad z^\star _{{\textsc {Mima}}} = \min \left\{ \varvec{c}^T \varvec{d} \;|\; \varvec{d} \in \varDelta \right\} \end{aligned}$$

satisfies \(z^\star _{{\textsc {Mima}}{}} < 0\). On the one hand, any optimal solution of Mima yields a solution to f AUG; on the other hand, if \(z^\star _{{\textsc {Mima}}{}}\) is nonnegative, no feasible augmenting direction exists and \(\varvec{x}^0\) is optimal for \({\textsc {SPP{}}}\). Finally, since \(\varDelta \) is a polytope, \({\textsc {Mima}}{}\) is a bounded linear program. The name Mima stands for Maximum Incoming Mean Augmentation: The normalization constraint only concerns incoming variables, so the objective is a mean over a reduced subset of the variables. The choice of this normalization constraint follows that of the original algorithm of Zaghrouti et al. [37].

Remark 2

The normalization constraint \(\varvec{e}^T \varvec{d}_\mathcal {Z} = 1\) can be replaced with any other equality of the form \(\varvec{w}^T \varvec{d} = 1\), as long as \(\varDelta \) defines a proper section of \(\varGamma \). On the theoretical side, this does not change the nature of the results presented in Sects. 3 and 4. In particular, the algorithm does not change, and neither do the separation algorithms. On the practical side, the choice of the normalization influences the performance of the algorithm. We use the aforementioned specific constraint in this paper for the following reasons: (1) it has been used in the original version of the algorithm [37] and therefore makes the comparison with it fairer, and (2) most proofs are significantly more readable with this constraint than with a generic one. For a detailed study of the influence (both theoretical and practical) of the choice of the normalization in Isud, the reader is referred to [23].

Once an optimal solution \(\varvec{d}^\star \) to Mima has been found, the idea of an augmentation algorithm is to follow that direction as far as possible while remaining feasible. The maximal feasible step along direction \(\varvec{d}\) is thus defined as \(r(\varvec{d}) = \max \left\{ \rho > 0 \, | \, \varvec{x}^0 + \rho \varvec{d} \in \mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}} \right\} \). From \(\varvec{x}^0\), a fractional augmentation can therefore be performed as \(\varvec{x}^{1} = \varvec{x}^0 + r(\varvec{d}) \varvec{d}\).
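As a concrete illustration (a sketch in Python, not the authors' implementation), when \(\varvec{A} \varvec{d} = \varvec{0}\) the maximal feasible step is dictated by the bound constraints \(\varvec{0} \le \varvec{x} \le \varvec{e}\) only:

```python
import numpy as np

def max_step(x, d, tol=1e-9):
    """Maximal feasible step r(d) = max{rho > 0 | 0 <= x + rho*d <= e},
    assuming A d = 0 so the equality constraints remain satisfied."""
    rho = np.inf
    for xj, dj in zip(x, d):
        if dj < -tol:                       # x_j decreases, limited by the lower bound 0
            rho = min(rho, xj / -dj)
        elif dj > tol:                      # x_j increases, limited by the upper bound 1
            rho = min(rho, (1.0 - xj) / dj)
    return rho
```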

3.1.2 Incompatibility degree of the nonbasic variables

To decompose Mima into smaller problems, both in terms of number of variables and constraints, the notion of incompatibility degree introduced by Elhallaoui et al. [7] must be presented here. Given a column \(\varvec{A}_{.j}\) of the constraint matrix, a row \(r \in \mathcal {R}\) is said to be covered by that column if \({A}_{{r}{j}} = 1\). For each column \(\varvec{A}_{.j}\) of \(\varvec{A}\), let \(\mathcal {R}_j = \left\{ r \in \mathcal {R} \, | \, {A}_{{r}{j}} = 1 \right\} \) be the set of rows covered by that column. By definition, the sets of the rows covered by the columns of \(\mathcal {P}\), i.e., \(\left\{ \mathcal {R}_l \right\} _{l \in \mathcal {P}}\), form a partition of \(\mathcal {R}\) (see Fig. 3). Assume that a total order \(\preceq \) is known over the indices of the rows \(\mathcal {R}\). For any j, that order is extended to \(\mathcal {R}_j\), and the elements of \(\mathcal {R}_j\) are written according to \(\preceq \) as

$$\begin{aligned} \mathcal {R}_j\, : \, r_1^j \preceq r_2^j \preceq \cdots \preceq r_{|\mathcal {R}_j|}^j\text { .} \end{aligned}$$

Definition 1

(Incompatibility degree) The incompatibility degree of a column \(\varvec{A}_{.j}\), \(j\in \{1, \ldots , n\}\), with respect to the working basis \(\mathcal {P}\) is computed as

$$\begin{aligned} \iota ^{\mathcal {P}}_{j} = \underset{l \in \mathcal {P}}{\sum } \, \underset{t=1}{\overset{|\mathcal {R}_l| - 1}{\sum }} \kappa ^t_{lj}, \end{aligned}$$
(3)

where \(\kappa ^t_{lj} = 1\) if \(\varvec{A}_{.j}\) covers exactly one of \(r^l_t\) and \(r^l_{t+1}\), \(\kappa ^t_{lj} = 2\) if \(\varvec{A}_{.j}\) covers both \(r^l_t\) and \(r^l_{t+1}\) but not consecutively, and \(\kappa ^t_{lj} = 0\) otherwise. Here, \(r^l_t\) and \(r^l_{t+1}\) denote two rows covered by the basic column \(l\) that are consecutive within that column. An example is given in Fig. 3.

Fig. 3 Incompatibility degree with respect to \(\mathcal {P}\) on a 5-row example, for two different orderings of the rows of the same matrix, namely (1, 2, 3, 4, 5) in matrix \(\varvec{A}\) and (3, 4, 2, 5, 1) in matrix \(\varvec{\hat{A}}\)

Remark 3

Any ordering of the rows can be used with these definitions. In scheduling applications, the rows correspond to tasks to carry out. It is therefore intuitive to order them by starting time and an incompatibility will correspond to the breaking of a succession of tasks that are performed consecutively in the current solution \(\varvec{x}^0\).

For the sake of clarity, we will always consider that the nonzero entries of a column of the current solution are consecutive within \(\mathcal {R}\), i.e., given any pair of distinct indices \(i,j \in \mathcal {P}\), either \(\forall r \in \mathcal {R}_i, \forall r^\prime \in \mathcal {R}_j, r \preceq r^\prime \), or \(\forall r \in \mathcal {R}_i, \forall r^\prime \in \mathcal {R}_j, r^\prime \preceq r\). Note that this is not the case in the left part of Fig. 3. With \(\iota ^{\mathcal {P}}_{j}\) as defined in Formula (3), \(\varvec{A}_{.j}\) is said to be \(\iota ^{\mathcal {P}}_{j}\)-incompatible. 0-incompatible columns are called compatible and the others are called incompatible. These notions extend to the index of the column and to the corresponding variable.
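The following Python sketch is one possible direct reading of Definition 1; it assumes that “not consecutively” means that \(\varvec{A}_{.j}\) covers some row lying strictly between \(r^l_t\) and \(r^l_{t+1}\) in the order \(\preceq \), and the data layout (sets of covered rows) is of course hypothetical.

```python
def incompat_degree(col_rows_j, basis_rows, order):
    """Incompatibility degree of a column w.r.t. the working basis (Definition 1).

    col_rows_j : set of rows covered by column j (R_j)
    basis_rows : list of sets R_l, one per basic column l in P
    order      : list of all row indices sorted according to the total order
    """
    pos = {r: i for i, r in enumerate(order)}          # rank of each row in the order
    iota = 0
    for R_l in basis_rows:
        rows_l = sorted(R_l, key=pos.get)              # r^l_1 <= ... <= r^l_{|R_l|}
        for r_t, r_next in zip(rows_l, rows_l[1:]):    # consecutive rows of column l
            covers_t = r_t in col_rows_j
            covers_next = r_next in col_rows_j
            if covers_t != covers_next:                # exactly one of the two is covered
                iota += 1
            elif covers_t and covers_next:
                # both covered: count 2 if column j covers some row lying strictly
                # between them in the order (they are not consecutive within j)
                if any(pos[r_t] < pos[r] < pos[r_next] for r in col_rows_j):
                    iota += 2
    return iota
```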

Observation 1 All columns from the working basis are compatible.

Observation 2 An incompatible column is one that breaks the partition of the rows \(\left\{ \mathcal {R}_l \right\} _{l \in \mathcal {P}}\).

Given a solution \(\varvec{x}^0\), we define the following subsets of nonbasic variables \(\mathcal {Z}\) as

$$\begin{aligned} \mathcal {C} = \left\{ j \in \mathcal {Z} \, | \, \iota ^{\mathcal {P}}_{j} = 0 \right\} \text { and } \mathcal {I}^\iota = \left\{ j \in \mathcal {Z} \, | \, 1 \le \iota ^{\mathcal {P}}_{j} \le \iota \right\} , \forall 1 \le \iota \le \iota _{max}. \end{aligned}$$
(4)

Then, \(\mathcal {C}\) is called the compatible set and the corresponding indices and variables are called compatible. Similarly, \(\mathcal {I}^\iota \) is called the at most \(\iota \)-incompatible set and the corresponding indices and variables are said to be at most \(\iota \)-incompatible. By definition, \(\mathcal {I}^1 \subseteq \mathcal {I}^2 \subseteq \ldots \subseteq \mathcal {I}^{\iota _{max}} = \mathcal {I}\). \(\mathcal {I}\) is called the incompatible set, and its elements and the corresponding variables are called incompatible. Finally, \(\left( \mathcal {P},\mathcal {C},\mathcal {I}\right) \) forms a partition of \(\left\{ 1, \ldots , n \right\} \).

Lemma 1

Given \(j \in \mathcal {Z}\), the following statements are equivalent:

(i) \(\varvec{A}_{.j}\) is compatible;

(ii) \(\exists \mathcal {P}_j \subseteq \mathcal {P} \, | \, \varvec{A}_{.j} = \sum _{l \in \mathcal {P}_j} \varvec{A}_{.l}\);

(iii) \(\varvec{A}_{.j} \in Span \,\left( \varvec{A}_{.\mathcal {P}} \right) \).

Lemma 1 does not apply to the left part of the example given in Fig. 3 because the rows are not ordered properly in that case. The notion of compatible/incompatible column is extended to all vectors of \(\mathbb {R}^m\): namely, \(\varvec{w} \in \mathbb {R}^m\) is compatible if and only if \(\varvec{w} \in Span \,\left( \varvec{A}_{.\mathcal {P}} \right) \). In the theoretical part of this paper, we will consider the partition of \(\left\{ 1, \ldots , n \right\} \) into \(\left( \mathcal {P},\mathcal {C},\mathcal {I}\right) \). However, it is algorithmically more efficient to look first for an augmenting direction over \(\mathcal {C}\), and then successively over \(\mathcal {I}^1\), \(\mathcal {I}^2\), \(\ldots \), up to \(\mathcal {I}\).
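Since \(\left\{ \mathcal {R}_l \right\} _{l \in \mathcal {P}}\) partitions \(\mathcal {R}\), Lemma 1-(ii) gives a very cheap compatibility test: a column is compatible iff every \(\mathcal {R}_l\) is either contained in or disjoint from the set of rows it covers. A minimal Python sketch (row sets represented as Python sets, names hypothetical):

```python
def is_compatible(col_rows_j, basis_rows):
    """Lemma 1-(ii): column j is compatible iff the rows it covers form the
    union of the row sets of some basic columns, i.e. each R_l is either
    entirely inside R_j or disjoint from it ({R_l} partitions the rows)."""
    return all(R_l <= col_rows_j or not (R_l & col_rows_j) for R_l in basis_rows)

def support_in_basis(col_rows_j, basis_rows):
    """Return P_j, the basic columns whose row sets are covered by column j
    (meaningful only when is_compatible(...) is True)."""
    return [l for l, R_l in enumerate(basis_rows) if R_l <= col_rows_j]
```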

3.1.3 The IPS decomposition

Identify each basic column \(l \in \mathcal {P}\) with one of the rows it covers (say \(r^l_1\)), so that \(\mathcal {P}\) can also be read as a set of \(p\) row indices. With \(\mathcal {\bar{P}} = \left\{ 1, \ldots , m \right\} \setminus \mathcal {P}\) and reordering the rows and columns of \(\varvec{A}\) so that \(\mathcal {R}\) is partitioned into \((\mathcal {P}, \mathcal {\bar{P}})\), we can write

$$\begin{aligned} \varvec{A} = \begin{bmatrix} \varvec{I}_p&\varvec{A}_{\mathcal {P}\mathcal {C}}&\varvec{A}_{\mathcal {P}\mathcal {I}} \\ \varvec{A}_{\mathcal {\mathcal {\bar{P}}}\mathcal {P}}&\varvec{A}_{\mathcal {\mathcal {\bar{P}}}\mathcal {C}}&\varvec{A}_{\mathcal {\mathcal {\bar{P}}}\mathcal {I}} \end{bmatrix}, \end{aligned}$$
(5)

where \(\varvec{I}_p\) is the \(p \times p\) identity matrix. Given \(\varvec{d} \in \varDelta \), from the constraints \(\varvec{A} \varvec{d} = \varvec{0}\), one can easily see that the aggregation of the columns corresponding to increasing variables, \(\varvec{w} = \varvec{A}_{.\mathcal {Z}} \varvec{d}_\mathcal {Z}\), is compatible: indeed, \(\varvec{w} = - \varvec{A}_{.\mathcal {P}} \varvec{d}_\mathcal {P} \in Span \,\left( \varvec{A}_{.\mathcal {P}} \right) \). As in a reduced-gradient algorithm, we are in fact looking for an aggregate column \(\varvec{w}\) that can enter the working basis and take a positive value by lowering only some variables of \(\mathcal {P}\). We introduce the following problems, respectively called Restricted-Mima (R-Mima) and Complementary-Mima (C-Mima):

$$\begin{aligned} ({\textsc {R-Mima}}) \quad z^\star _{{\textsc {R-Mima}}} = \min \left\{ \varvec{c}^T \varvec{d} \;|\; \varvec{d} \in \varDelta \, , \; \varvec{d}_\mathcal {I} = \varvec{0} \right\} \end{aligned}$$
$$\begin{aligned} ({\textsc {C-Mima}}) \quad z^\star _{{\textsc {C-Mima}}} = \min \left\{ \varvec{c}^T \varvec{d} \;|\; \varvec{d} \in \varDelta \, , \; \varvec{d}_\mathcal {C} = \varvec{0} \right\} . \end{aligned}$$

Theorem 1

\(z^\star _{{\textsc {Mima}}} = \min \left\{ z^\star _{{\textsc {R-Mima}}}, z^\star _{{\textsc {C-Mima}}{}} \right\} \).

Proof

Let \(\varvec{d} = (\varvec{d}_\mathcal {P}, \varvec{d}_\mathcal {C}, \varvec{d}_\mathcal {I})\) be an optimal solution of Mima. If \(\varvec{d}_\mathcal {C} = \varvec{0}\) or \(\varvec{d}_\mathcal {I} = \varvec{0}\), \(\varvec{d}\) is respectively a solution of C-Mima or R-Mima and the result clearly holds.

Suppose now that \(\varvec{d}_\mathcal {C} \ne \varvec{0}\) and \(\varvec{d}_\mathcal {I} \ne \varvec{0}\). We first prove that \(\varvec{d}\) can be written as a convex combination of \(\varvec{u}^\prime \), a solution of R-Mima, and \(\varvec{v}^\prime \), a solution of C-Mima. By Lemma 1-(ii), the surrogate column \(\varvec{A}_{.\mathcal {C}} \varvec{d}_\mathcal {C}\) can be written as a linear combination of columns of \(\mathcal {P}\), i.e., \(\varvec{A}_{.\mathcal {C}} \varvec{d}_\mathcal {C} = - \varvec{A}_{.\mathcal {P}} \varvec{u}_\mathcal {P}\) with \(\varvec{u}_\mathcal {P} \le \varvec{0}\). Let \(\varvec{u} = (\varvec{u}_\mathcal {P}, \varvec{d}_\mathcal {C}, \varvec{0})\) and \(\varvec{v} = \varvec{d} - \varvec{u}\); both satisfy the linear and sign constraints of \(\varGamma \) (for \(\varvec{v}\), note that \(\varvec{A}_{.\mathcal {P}} \varvec{v}_\mathcal {P} = - \varvec{A}_{.\mathcal {I}} \varvec{d}_\mathcal {I} \le \varvec{0}\) and that the columns of \(\varvec{A}_{.\mathcal {P}}\) are disjoint, so \(\varvec{v}_\mathcal {P} \le \varvec{0}\)). Let \(\alpha _{\varvec{u}} = \Vert \varvec{d}_\mathcal {C} \Vert _1 = \sum _{j \in \mathcal {C}} d_j\) and \(\alpha _{\varvec{v}} = \Vert \varvec{d}_\mathcal {I} \Vert _1 = \sum _{j \in \mathcal {I}} d_j\), and let \(\varvec{u}^\prime = \varvec{u} / \alpha _{\varvec{u}}\) and \(\varvec{v}^\prime = \varvec{v} / \alpha _{\varvec{v}}\) be the corresponding normalized directions, which are feasible for R-Mima and C-Mima respectively. Then \(\alpha _{\varvec{u}}, \alpha _{\varvec{v}} > 0\) and, since \(\varvec{d} \in \mathcal {F}_{\textsc {Mima}}\), the normalization constraint yields \(\alpha _{\varvec{u}} + \alpha _{\varvec{v}} = 1\). Therefore, \(\varvec{d} = \alpha _{\varvec{u}} \varvec{u}^\prime + \alpha _{\varvec{v}} \varvec{v}^\prime \) is a convex combination of \(\varvec{u}^\prime \), a solution of R-Mima, and \(\varvec{v}^\prime \), a solution of C-Mima.

Looking at the objective function, the convex combination reads \(\varvec{c}^T \varvec{d} = \alpha _{\varvec{u}} (\varvec{c}^T \varvec{u}^\prime ) + \alpha _{\varvec{v}} (\varvec{c}^T \varvec{v}^\prime )\), so either \(\varvec{c}^T \varvec{d} \ge \varvec{c}^T \varvec{u}^\prime \) or \(\varvec{c}^T \varvec{d} \ge \varvec{c}^T \varvec{v}^\prime \). However, since every feasible solution of R-Mima or C-Mima is also feasible for Mima and \(\varvec{d}\) is optimal for Mima, \(\varvec{c}^T \varvec{d} \le \varvec{c}^T \varvec{u}^\prime \) and \(\varvec{c}^T \varvec{d} \le \varvec{c}^T \varvec{v}^\prime \). One of the two inequalities is thus an equality, and the other then follows from \(\varvec{c}^T \varvec{d} = \alpha _{\varvec{u}} (\varvec{c}^T \varvec{u}^\prime ) + \alpha _{\varvec{v}} (\varvec{c}^T \varvec{v}^\prime )\). Therefore, \(\varvec{c}^T \varvec{d} = \varvec{c}^T \varvec{u}^\prime = \varvec{c}^T \varvec{v}^\prime \), and since \(\varvec{d}\) is an optimal solution of Mima, \(z^\star _{{\textsc {Mima}}{}} = z^\star _{{\textsc {R-Mima}}} = z^\star _{{\textsc {C-Mima}}}\). \(\square \)

In the previous works on IPS by Elhallaoui et al. [6], a weaker version of Theorem 1 stated that \(z^\star _{{\textsc {Mima}}}\) and \(\min \left\{ z^\star _{{\textsc {R-Mima}}}, z^\star _{{\textsc {C-Mima}}{}} \right\} \) have the same sign. Theorem 1 strengthens this result and makes the justification of most subsequent procedures more straightforward. This purely primal interpretation of their less-intuitive dual approach allows us to state a precise decomposition of Mima into R-Mima and C-Mima. As a consequence of Theorem 1, we will consider the pair of problems R-Mima and C-Mima instead of the more complicated Mima, and we will solve them sequentially. In particular, C-Mima will not be solved if \(z^\star _{{\textsc {R-Mima}}{}} < 0\) because an improving direction is already known. In the next sections, we discuss how to reduce the number of rows in R-Mima and C-Mima to ease their solution. Recall that, for the moment, \(\varvec{x}^1 = \varvec{x}^0 + r(\varvec{d}) \varvec{d}\) may be fractional.

3.1.4 Row-reduction of R-Mima

The practical solution of the restriction of f AUG to the compatible variables only is based on the following proposition. Recall that for all \(i \in \mathcal {C}\), \(\mathcal {P}_i \subseteq \mathcal {P}\) is defined as in Lemma 1-(ii), i.e., it is the unique subset of \(\mathcal {P}\) such that \(\varvec{A}_{.i} = \sum _{j \in \mathcal {P}_i} \varvec{A}_{.j}\).

Proposition 1

Given \(j \in \mathcal {C}\), the minimal direction associated with j is the vector \(\varvec{\delta }^j \in \mathbb {R}^{p + |\mathcal {C}|}\) defined as

$$\begin{aligned} \forall i \in \mathcal {P} \cup \mathcal {C} \, , \, \delta ^j_i = \left\{ \begin{array}{ll} -1 &{}\quad \text {if } i \in \mathcal {P}_j \text {,} \\ 1 &{}\quad \text {if } i = j \text {,} \\ 0 &{}\quad \text {otherwise.} \end{array} \right. \end{aligned}$$
(6)

For any \(j \in \mathcal {C}\), \(\varvec{\delta }^j\) is feasible and the set of all extreme points of \(\mathcal {F}_{\textsc {R-Mima}}\) is exactly the set of the minimal directions associated with variables of \(\mathcal {C}\), i.e., \(\left\{ \varvec{\delta }^j | j \in \mathcal {C} \right\} \). Moreover, the maximal feasible step along direction \(\varvec{\delta }^j\) is \(r(\varvec{\delta }^j) = 1\).

Proof

First, let \(j \in \mathcal {C}\). By definition of \(\varvec{\delta }^j\) and \(\mathcal {F}_{\textsc {R-Mima}}\), \(\varvec{\delta }^j\) is obviously feasible for R-Mima.

Second, let \(\varvec{d}^\star = (\varvec{d}_\mathcal {P}^\star , \varvec{d}_\mathcal {C}^\star ) \in \mathcal {F}_{\textsc {R-Mima}}\) be a feasible solution of R-Mima. We will show that \(\varvec{d}^\star \) is a convex combination of one or several minimal directions. Let \(\varvec{u} = \sum _{j \in \mathcal {C}} d^\star _j \varvec{\delta }^j\). By construction, \(\varvec{u}_\mathcal {C} = \varvec{d}_\mathcal {C}^\star \). Because every \(\varvec{\delta }^j\) satisfies the homogeneous linear constraints of R-Mima, so does their linear combination \(\varvec{u}\). Hence, \(\varvec{A}_{.\mathcal {P}} \varvec{u}_\mathcal {P} = - \varvec{A}_{.\mathcal {C}} \varvec{u}_\mathcal {C} = - \varvec{A}_{.\mathcal {C}} \varvec{d}_\mathcal {C}^\star = \varvec{A}_{.\mathcal {P}} \varvec{d}_\mathcal {P}^\star \). The columns of the working basis \(\varvec{A}_{.\mathcal {P}}\) are linearly independent; therefore, \(\varvec{u}_\mathcal {P} = \varvec{d}_\mathcal {P}^\star \) and \(\varvec{d}^\star = \varvec{u}\).

Moreover, \(\varvec{d}^\star \in \mathcal {F}_{\textsc {R-Mima}}\) satisfies the normalization constraint \(\varvec{e}^T \varvec{d}_\mathcal {C}^\star = \sum _{i \in \mathcal {C}} d^\star _i = 1\). Thus, \(\varvec{d}^\star = \varvec{u} = \sum _{j \in \mathcal {C}} d^\star _j \varvec{\delta }^j\) is a convex combination of minimal directions. Finally, given the current solution \(\varvec{x}^0\), for any \(j \in \mathcal {C}\) and \(\rho > 0\),

$$\begin{aligned} \forall i \in \mathcal {P} \cup \mathcal {C} \, , \, \left[ \varvec{x}^0 + \rho \varvec{\delta }^j \right] _i = \left\{ \begin{array}{ll} 1 - \rho &{}\quad \text {if } i \in \mathcal {P}_j \text {,}\\ \rho &{}\quad \text {if } i = j \text {,}\\ x^0_i &{}\quad \text {otherwise.} \end{array} \right. \end{aligned}$$

Therefore, considering the bounds \(\varvec{0} \le (\varvec{x}^0 + \rho \varvec{\delta }^j) \le \varvec{e}\), the maximum possible value for \(\rho \) is \(r(\varvec{\delta }^j) = 1\). \(\square \)

Corollary 1

There exists \(j \in \mathcal {C}\) such that \(\varvec{\delta }^j\) is an optimal solution of R-Mima.

As a consequence of Proposition 1 and Corollary 1, solving R-Mima and following the optimal direction until a bound is reached is equivalent to pivoting the corresponding compatible variable into the working basis. These pivots are guaranteed to be nondegenerate, as proven in the following proposition.

Proposition 2

Compatible variables are exactly those that yield nondegenerate pivots when inserted in the working basis.

Proof

For SPP, when inserted in the working basis, a nonbasic variable yields a nondegenerate pivot iff its column can be written as a nonnegative linear combination of the columns of \(\varvec{A}_{.\mathcal {P}}\) (recall that \(\varvec{x}_\mathcal {P}^0 = \varvec{e}\) and \(\varvec{0} \le \varvec{x} \le \varvec{e}\), so only the basic variables can be lowered). By Lemma 1-(iii), compatible columns are exactly those that can be written as a linear combination of the columns of \(\varvec{A}_{.\mathcal {P}}\). Therefore, if a column yields a nondegenerate pivot, it must be compatible. As proven in Proposition 1, pivoting a compatible variable \(x_j\) into the working basis can be interpreted as following direction \(\varvec{\delta }^j\). Since the corresponding maximal step is positive (\(r(\varvec{\delta }^j) = 1\)), the pivot is nondegenerate and compatible variables all yield nondegenerate pivots. \(\square \)

Proposition 3

R-Mima is equivalent to the minimization program

$$\begin{aligned} ({\textsc {RP}}) \quad z^\star _{{\textsc {RP}}} = \min _{j \in \mathcal {C}} \; \bar{c}_j , \end{aligned}$$

called reduced problem, where \(\varvec{\bar{c}}^T = \varvec{c}^T - \varvec{c}_\mathcal {P}^T \varvec{A}_{\mathcal {P}.}\) is the vector of reduced costs. Moreover, given an optimal solution \(j \in \mathcal {C}\), the corresponding direction is \(\varvec{\delta }^j\).

Proof

Given an index \(j \in \mathcal {C}\), the cost of the corresponding minimal direction in R-Mima is \(\varvec{c}^T \varvec{\delta }^j = \varvec{c}_\mathcal {P}^T \varvec{\delta }_\mathcal {P}^j + \varvec{c}_\mathcal {C}^T \varvec{\delta }_\mathcal {C}^j\). With Eq. (5), the linear constraints of R-Mima become

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{I}_p \varvec{\delta }_\mathcal {P}^j + \varvec{A}_{\mathcal {P}\mathcal {C}} \varvec{\delta }_\mathcal {C}^j = 0 \\ \varvec{A}_{\mathcal {\bar{P}}\mathcal {P}} \varvec{\delta }_\mathcal {P}^j + \varvec{A}_{\mathcal {\bar{P}}\mathcal {C}} \varvec{\delta }_\mathcal {C}^j = 0 \\ \end{array} \right. \text {,} \end{aligned}$$

and, with Eq. (6), the first row gives \(\varvec{\delta }_\mathcal {P}^j = - \varvec{A}_{\mathcal {P}{j}}\), and thus, \(\varvec{c}^T \varvec{\delta }^j = c_j - \varvec{c}_\mathcal {P}^T \varvec{A}_{\mathcal {P}{j}}\). Because the extreme points of \(\mathcal {F}_{\textsc {R-Mima}}\) are the \(\varvec{\delta }^j\) for \(j \in \mathcal {C}\), the result holds. \(\square \)

Note that the previous result extends to any choice of normalization weights. Namely, if the normalization constraint reads \(\varvec{w}^T \varvec{d} = 1\), then \(\bar{c}_j\) is replaced by \(\bar{c}_j / w_j\) in equation (RP) and the result holds (see [23] for more details).

The term row-reduction refers to the following two facts. First, the linear program becomes a simple reduced-cost determination. Second, the whole improving direction is computed without any matrix multiplication since \(\varvec{\delta }_\mathcal {P}^j = - \varvec{A}_{\mathcal {P}{j}}\). This is in the spirit of a standard simplex pivot, in which a reduced-cost analysis is performed and then a system must be solved to determine the next solution \(\varvec{x}^{k+1}\). As in Proposition 3, in practice, the determination of \(j \in \mathcal {C}\) is made through a reduced-cost analysis and exactly reproduces a simplex pivot. Namely, the nonbasic variable of lowest reduced cost \(z^\star _{{\textsc {R-Mima}}{}}\) is determined, and then, if it satisfies \(z^\star _{{\textsc {R-Mima}}{}} < 0\), a pivot is performed by inserting that variable into the working basis. However, if \(z^\star _{{\textsc {R-Mima}}{}} \ge 0\), it becomes necessary to consider C-Mima to determine an augmenting direction. This process is the topic of the following section.
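The following Python sketch (hypothetical data layout, not the authors' code) summarizes this reduced-cost step: for every compatible column, the reduced cost is \(c_j - \sum _{l \in \mathcal {P}_j} c_l\), and the most negative one triggers a pivot along \(\varvec{\delta }^j\) with step 1.

```python
def reduced_problem_step(x, c, compatible, P_sets):
    """One row-reduced step of Sect. 3.1.4 (a sketch).

    x          : current 0-1 solution (list of ints, modified in place)
    c          : cost vector
    compatible : indices j in C (compatible nonbasic variables)
    P_sets     : dict j -> P_j, the basic columns with A_.j = sum_{l in P_j} A_.l

    Returns True if an improving pivot was performed, False if z*_RP >= 0.
    """
    best_j, best_rc = None, 0.0
    for j in compatible:
        rc = c[j] - sum(c[l] for l in P_sets[j])   # reduced cost c_j - c_P^T A_Pj
        if rc < best_rc:
            best_j, best_rc = j, rc
    if best_j is None:
        return False                               # no compatible improvement: go to CP
    # Follow the minimal direction delta^j with the full step r(delta^j) = 1:
    # x_j enters the working basis, the columns of P_j leave it.
    x[best_j] = 1
    for l in P_sets[best_j]:
        x[l] = 0
    return True
```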

3.1.5 Row-reduction of C-Mima

In this section, we suppose that the compatible variables yield no improvement, or equivalently that \(\mathcal {C} = \emptyset \). The first \(p\) constraints of C-Mima read \(\varvec{d}_\mathcal {P} = - \varvec{A}_{\mathcal {P}\mathcal {I}} \varvec{d}_\mathcal {I}\). As in a reduced gradient algorithm (e.g., [13]), the modification of the basic variables is inferred via a linear transformation of the increasing nonbasic variables \(\varvec{d}_\mathcal {Z}\), or in the case of C-Mima, \(\varvec{d}_\mathcal {I}\). Hence, we reduce C-Mima to an equivalent problem over the nonbasic variables of \(\mathcal {I}\) only. For the sake of clarity, denote as \(\varDelta _{\mathcal {I}}\) the feasible domain of C-Mima, i.e., all elements of \(\varDelta \) that satisfy \(\varvec{d}_\mathcal {C} = 0\), and define \(q = |\mathcal {I}|\). That domain can be defined with fewer linear constraints and variables than \(\varDelta \), as shown in Proposition 4.

Proposition 4

With \(\bar{\varDelta _{\mathcal {I}}} = \left\{ \varvec{d}_\mathcal {I} \in \mathbb {R}^q \, |\, \varvec{\bar{A}}_{\mathcal {\bar{P}}\mathcal {I}} \varvec{d}_\mathcal {I} = \varvec{0}\, ,\, \varvec{e}^T \varvec{d}_\mathcal {I} = 1\, ,\, \varvec{d}_\mathcal {I} \ge \varvec{0} \right\} \), then

$$\begin{aligned} \varDelta _{\mathcal {I}} = \left\{ \varvec{d} = \varvec{T} \varvec{d}_\mathcal {I}\, |\, \varvec{d}_\mathcal {I} \in \bar{\varDelta _{\mathcal {I}}} \right\} , \end{aligned}$$
(7)

where \(\varvec{T} = \begin{bmatrix} - \varvec{A}_{\mathcal {P}\mathcal {I}} \\ \varvec{I}_q \end{bmatrix}\) and \(\varvec{\bar{A}}_{\mathcal {\bar{P}}\mathcal {I}} = - \varvec{A}_{\mathcal {\bar{P}}\mathcal {P}} \varvec{A}_{\mathcal {P}\mathcal {I}} + \varvec{A}_{\mathcal {\bar{P}}\mathcal {I}} = \varvec{A}_{\mathcal {\bar{P}}.} \varvec{T}\).

Proof

Let \(\varvec{d} \in \mathbb {R}^{n}\) such that \(\varvec{d}_\mathcal {C} = \varvec{0}\). Then,

$$\begin{aligned} \varvec{d} \in \varDelta _{\mathcal {I}}\Leftrightarrow & {} \varvec{A}_{.\mathcal {P}} \varvec{d}_\mathcal {P} + \varvec{A}_{.\mathcal {I}} \varvec{d}_\mathcal {I} = 0\, ,\, \varvec{e}^T \varvec{d}_\mathcal {I} = 1\, ,\, \varvec{d}_\mathcal {P} \le \varvec{0}\, , \, \varvec{d}_\mathcal {I} \ge \varvec{0} \\\Leftrightarrow & {} \left\{ \begin{array}{ll} \varvec{d}_\mathcal {P} + \varvec{A}_{\mathcal {P}\mathcal {I}} \varvec{d}_\mathcal {I} = \varvec{0} \\ \varvec{A}_{\mathcal {\bar{P}}\mathcal {P}} \varvec{d}_\mathcal {P} + \varvec{A}_{\mathcal {\bar{P}}\mathcal {I}} \varvec{d}_\mathcal {I} = \varvec{0} \\ \varvec{e}^T \varvec{d}_\mathcal {I} = 1 \\ \varvec{d}_\mathcal {P} \le 0, \varvec{d}_\mathcal {I} \ge 0 \end{array} \right. \\\Leftrightarrow & {} \left\{ \begin{array}{l} \varvec{d}_\mathcal {P} = - \varvec{A}_{\mathcal {P}\mathcal {I}} \varvec{d}_\mathcal {I} \\ \left( -\varvec{A}_{\mathcal {\bar{P}}\mathcal {P}} \varvec{A}_{\mathcal {P}\mathcal {I}} + \varvec{A}_{\mathcal {\bar{P}}\mathcal {I}} \right) \varvec{d}_\mathcal {I} = \varvec{0} \\ \varvec{e}^T \varvec{d}_\mathcal {I} = 1 \\ \varvec{d}_\mathcal {I} \ge 0 \end{array} \right. \\\Leftrightarrow & {} \varvec{d} = \varvec{T} \varvec{d}_\mathcal {I} \text { and } \varvec{d}_\mathcal {I} \in \bar{\varDelta _{\mathcal {I}}} \text {.} \end{aligned}$$

Hence, the proposition holds. \(\square \)

The cost of such a direction can be computed as \(\varvec{c}^T \varvec{d} = \varvec{c}^T \varvec{T} \varvec{d}_\mathcal {I}\). Define \(\varvec{\bar{c}}^T = \varvec{c}^T \varvec{T} = \varvec{c}_\mathcal {I}^T - \varvec{c}_\mathcal {P}^T \varvec{A}_{\mathcal {P}\mathcal {I}}\), and C-Mima is equivalent to the following program, called the complementary problem:

$$\begin{aligned} ({\textsc {CP}}) \quad z^\star _{{\textsc {CP}}} = \min \left\{ \varvec{\bar{c}}^T \varvec{d}_\mathcal {I} \;|\; \varvec{d}_\mathcal {I} \in \bar{\varDelta _{\mathcal {I}}} \right\} . \end{aligned}$$

The cost vector of CP is the vector of reduced costs associated with the incompatible variables, i.e., their own cost (\(\varvec{c}_\mathcal {I}^T\)) minus the marginal impact they have on the values of the variables of \(\mathcal {P}\) if they enter the working basis (\(- \varvec{c}_\mathcal {P}^T \varvec{A}_{\mathcal {P}\mathcal {I}}\)).

Remark 4

In Proposition 1, for \(j \in \mathcal {C}\), we called \(\varvec{\delta }^j\) a minimal direction. This denomination can be extended to directions of \(\bar{\varDelta _{\mathcal {I}}}\) (while still applying to those of \(\bar{\varDelta _{\mathcal {C}}}\)) as follows: \(\varvec{d} \in \bar{\varDelta _{\mathcal {I}}}\) is a minimal direction iff there exists no other direction whose support is strictly contained in that of \(\varvec{d}\). In that sense, the set of minimal directions of \(\bar{\varDelta }\) is exactly \( Ext \left( {\bar{\varDelta _{\mathcal {C}}}} \right) \cup Ext \left( {\bar{\varDelta _{\mathcal {I}}}} \right) \) (see [23] for more details on the geometrical structure of \(\varDelta \) and \(\bar{\varDelta }\)).
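As a purely illustrative sketch, the complementary problem of Proposition 4 can be assembled and its linear relaxation solved with an off-the-shelf LP solver. The code below uses NumPy and scipy.optimize.linprog and assumes that the row set \(\mathcal {P}\) has been chosen so that \(\varvec{A}_{\mathcal {P}\mathcal {P}} = \varvec{I}_p\) (one representative row per basic column); it is not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def build_and_solve_cp(A, c, P_rows, Pbar_rows, P_cols, I_cols):
    """Build the row-reduced complementary problem (CP) of Sect. 3.1.5 and
    solve its linear relaxation (a sketch; A and c are NumPy arrays).

    P_rows / Pbar_rows : row indices forming the partition (P, Pbar) of R,
                         with A[P_rows][:, P_cols] assumed equal to I_p
    P_cols / I_cols    : indices of the basic and incompatible columns
    """
    A_PI = A[np.ix_(P_rows, I_cols)]
    A_barP_P = A[np.ix_(Pbar_rows, P_cols)]
    A_barP_I = A[np.ix_(Pbar_rows, I_cols)]

    A_bar = -A_barP_P @ A_PI + A_barP_I                 # \bar A_{Pbar,I}
    c_bar = c[I_cols] - c[P_cols] @ A_PI                # reduced costs over I

    q = len(I_cols)
    # min c_bar^T d_I  s.t.  A_bar d_I = 0,  e^T d_I = 1,  d_I >= 0
    A_eq = np.vstack([A_bar, np.ones((1, q))])
    b_eq = np.concatenate([np.zeros(A_bar.shape[0]), [1.0]])
    res = linprog(c_bar, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * q)
    return res.x, res.fun                               # d_I^*, z*_CP
```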

3.2 Integral augmentation i AUG

The previous section provides a method to find a fractional augmentation. If \(z^\star _{{\textsc {RP}}{}} < 0\) or \(z^\star _{{\textsc {CP}}{}} < 0\), then the corresponding solution of negative reduced cost \(\varvec{d} \in \varDelta \) yields a new solution \(\varvec{x}^1 = \varvec{x}^0 + r(\varvec{d}) \varvec{d} \in \mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\). However, nothing guarantees that \(\varvec{x}^1\) is integral. In this section, we characterize directions that lead to an integral solution \(\varvec{x}^1\). Such directions are called integral directions as opposed to fractional directions. The set of all integral directions within \(\varDelta \) is denoted as \(\varDelta ^{int}\). These names apply to \(\varvec{d}\), and its restrictions \(\varvec{d}_\mathcal {Z}\), \(\varvec{d}_\mathcal {C}\) or \(\varvec{d}_\mathcal {I}\) depending on the context. We also give some insights on the structure of the set of all integral directions, and a generic framework for our algorithm.

3.2.1 Over compatible variables

Note that for any \(j \in \mathcal {C}\), the minimal direction \(\varvec{\delta }^j\) is an integral direction. Indeed, \(\varvec{A}_{.j} = \sum _{i \in \mathcal {P}_j} \varvec{A}_{.i}\), and following a step of length \(r(\varvec{\delta }^j) = 1\) in that direction is equivalent to letting \(j\) enter the working basis \(\mathcal {P}\) and letting \(\mathcal {P}_j\) leave it. Namely,

$$\begin{aligned} \left[ \varvec{x}^0 + \rho \varvec{\delta }^j \right] _i = \left\{ \begin{array}{ll} 1 &{}\quad \text {if } i \in \mathcal {P} \setminus \mathcal {P}_j \\ 1 - \rho &{}\quad \text {if } i \in \mathcal {P}_j \\ \rho &{}\quad \text {if } i = j \\ 0 &{}\quad \text {otherwise.} \end{array} \right. \end{aligned}$$

The corresponding maximal step is \(\rho = r(\varvec{\delta }^j) = 1\), so \(\varvec{x}^1 = \varvec{x}^0 + \varvec{\delta }^j \in \mathcal {F}_{{\textsc {SPP{}}}}\) and integrality is maintained.

3.2.2 Characterization of integral directions in \(\varDelta _{\mathcal {I}}\)

Now, let us consider the problem for C-Mima (and CP). We base our analysis on previous results of Zaghrouti et al. [37] that we first recall here. For any polyhedron P, denote as \( Ext \left( {P} \right) \) the set of its extreme points (or vertices).

Definition 2

A set \(\mathcal {S}\) of indices of the variables in \(\left\{ 1, \ldots , n \right\} \) is column-disjoint if no pair of columns of \(\varvec{A}_{.\mathcal {S}}\) has a common nonzero entry, i.e., \(\forall i,j \in \mathcal {S}, i \ne j, \forall k \in \left\{ 1, \ldots , m \right\} , A_{ki} A_{kj} = 0\). This definition is extended to \(\varvec{A}_{.\mathcal {S}}\) and \(\varvec{x}_\mathcal {S}\). Since \(\varvec{A}\) is binary, \(\mathcal {S}\) is column-disjoint iff \(\varvec{A}_{.\mathcal {S}}\) is a set of orthogonal columns.

Proposition 5

(Propositions 6 and 7, [37]) Given \(\varvec{d} \in Ext \left( {\varDelta } \right) \), \(\varvec{d}\) is an integral direction iff the support of \(\varvec{d}_\mathcal {I}\), denoted as \(S = Supp \,\left( \varvec{d}_\mathcal {I}\right) \), is column-disjoint. In this case, \(r(\varvec{d}) = |S|\).

Fig. 4 Geometrical insight on the extreme points of \(\varDelta \) and \(\varDelta ^{int}\). All extreme points of \( Conv \,\left( \varDelta ^{int}\right) \) (\(\bullet \)) are extreme points of \(\varDelta \). Moreover, \(\varDelta ^{int}\) is a finite set (\(\mathcal {F}_{{\textsc {SPP{}}}}\) is finite)

Because SPP is quasi-integral, the edges of \( Conv \,\left( \mathcal {F}_{{\textsc {SPP{}}}}\right) \) are edges of \(\mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\), and \( Ext \left( {\varDelta ^{int}_{\mathcal {I}}} \right) \subseteq Ext \left( {\varDelta } \right) \). A geometrical description of this is shown in Fig. 4. Since every linear program has at least one optimal solution that is also an extreme point of its feasible domain, and since the simplex algorithm is guaranteed to find such a solution, Proposition 5 has a strong practical interest. In the next section, we will show that it remains valid when cutting planes or branching techniques are used. Hence, it is sufficient to test whether the support of the solution of the relaxation (CP) is column-disjoint to determine whether the corresponding direction is integral.
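In code, the test of Proposition 5 and the corresponding update of the current solution can be sketched as follows (Python, hypothetical helper layout); the sketch assumes \(\varvec{d}_\mathcal {I}\) is a feasible CP solution, so that the rows freed by the leaving basic columns are exactly those covered by the entering ones.

```python
import numpy as np

def integral_direction_step(A, x0, I_cols, d_I, tol=1e-6):
    """Check Proposition 5 on a CP solution and, if it applies, build the
    next integer solution (a sketch under the assumptions stated above).

    A      : binary constraint matrix (NumPy array)
    x0     : current 0-1 solution (NumPy array)
    I_cols : indices of the incompatible columns
    d_I    : optimal CP solution over I_cols
    Returns the new solution x1, or None if the direction is fractional."""
    S = [j for j, v in zip(I_cols, d_I) if v > tol]      # support of d_I
    covered = A[:, S].sum(axis=1)
    if np.any(covered > 1):                              # two support columns share a row
        return None                                      # not column-disjoint: cut or branch
    x1 = x0.copy()
    for l in np.flatnonzero(x0):                         # basic columns replaced by S
        if np.any(A[:, l] * covered > 0):
            x1[l] = 0
    x1[S] = 1                                            # step r(d) = |S| along the direction
    return x1
```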

3.2.3 Algorithmic framework

Algorithm 1

Algorithm 1 is based on successive resolutions of the augmentation problem restricted either to \(\mathcal {C}\) or to \(\mathcal {I}^{\iota }\) for a certain incompatibility degree \(\iota \). Hence, the incompatibility degree defines a neighborhood of the current solution on which the augmentation problem is solved with integer programming techniques. \({\textsc {CP}}^{\iota }\) denotes the restriction of CP to the columns of \(\mathcal {I}^\iota \), and \(\mathrm {INC\_MAX}{}\) denotes the largest incompatibility degree considered during the execution. Whenever \(\mathrm {INC\_MAX}{} \ge \iota _{max}{}\), and provided that one uses exhaustive primal cutting-plane families such as Gomory-Young inequalities (line 18 of Algorithm 1), the algorithm is exact. When this is not the case, the solution returned by Algorithm 1 may not be optimal: this strongly depends on the value of \(\mathrm {INC\_MAX}{}\) and on the chosen branching and cutting-plane techniques. For instance, if \(\mathrm {INC\_MAX}< \iota _{max}\), there may exist augmenting integral directions involving columns of incompatibility degree greater than \(\mathrm {INC\_MAX}\). The same holds when the branching technique is nonexhaustive. Such heuristic stopping criteria make the execution more efficient in practice.
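For readability, the overall framework can be sketched as follows (Python-style code with hypothetical helper functions; this is a simplified paraphrase of the structure described above, not a transcription of Algorithm 1).

```python
def isud_framework(x0, inc_max, solve_rp, solve_cp, is_column_disjoint,
                   apply_direction, cut_or_branch):
    """Sketch of the Isud framework of Sect. 3.2.3 (helper names are hypothetical).

    solve_rp(x)             -> (j, z_rp): best compatible column and its reduced cost
    solve_cp(x, iota)       -> (d_I, z_cp): optimal solution of CP^iota and its value
    is_column_disjoint(d_I) -> True iff the support of d_I is column-disjoint (Prop. 5)
    apply_direction(x, .)   -> follows the corresponding direction with its maximal step
    cut_or_branch(x, iota)  -> tries to recover an integral direction by adding primal
                               cuts to CP^iota or branching; returns a better x or None
    """
    x = x0
    improved = True
    while improved:
        improved = False
        j, z_rp = solve_rp(x)                      # reduced problem over C (Sect. 3.1.4)
        if z_rp < 0:
            x, improved = apply_direction(x, j), True
            continue
        for iota in range(1, inc_max + 1):         # growing neighborhoods I^1, ..., I^INC_MAX
            d_I, z_cp = solve_cp(x, iota)
            if z_cp >= 0:
                continue                           # no improving direction here: enlarge
            if is_column_disjoint(d_I):            # integral direction (Proposition 5)
                x, improved = apply_direction(x, d_I), True
            else:
                x_new = cut_or_branch(x, iota)     # Sect. 4: primal cuts / branching
                if x_new is not None:
                    x, improved = x_new, True
            if improved:
                break
    return x  # optimal when INC_MAX >= iota_max and the cuts/branching are exhaustive
```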

4 Solving the augmentation problem with cutting-planes

In this section, we suppose that we know an optimal solution \(\varvec{d}_\mathcal {I}^\star \) of CP, but that this solution is not integral (\( Supp \,\left( \varvec{d}_\mathcal {I}^\star \right) \) is not column-disjoint). An idea to tighten the known relaxation of \(\bar{\varDelta }_{\mathcal {I}}^{int}\) is to add cutting planes to CP. Given a polyhedron P, a valid inequality for P is an inequality satisfied by all elements of P; and given a point \(\varvec{x}^\prime \), the inequality separates \(\varvec{x}^\prime \) from P if it is valid for P but violated by \(\varvec{x}^\prime \). Such an inequality is called a cut (or cutting plane). The main issue is to characterize valid inequalities for \(\bar{\varDelta }_{\mathcal {I}}^{int}\) that cut off the current optimal solution \(\varvec{d}_\mathcal {I}^\star \).

Notation. Denote as \({\mathcal {H}}^{1}_= = \left\{ \varvec{d}_\mathcal {I} \in \mathbb {R}^q \, |\, \varvec{e}^T \varvec{d}_\mathcal {I} = 1 \right\} \) the hyperplane defined by the normalization constraint of \(\bar{\varDelta }_{\mathcal {I}}\). Let \(\varvec{\bar{\alpha }} \in \mathbb {R}^q\), \(\bar{\beta } \in \mathbb {R}\), and denote by \(\left( \bar{\varGamma }\right) \) the inequality

$$\begin{aligned} \left( \bar{\varGamma }\right) \, :\, \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} \le \bar{\beta } \text {.} \end{aligned}$$
(8)

The associated hyperplane is denoted as \({\mathcal {H}}^{\bar{\varGamma }}_= = \left\{ \varvec{d}_\mathcal {I} \in \mathbb {R}^q \, | \, \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} = \bar{\beta } \right\} \) and the associated half-space as \({\mathcal {H}}^{\bar{\varGamma }}_{\le } = \left\{ \varvec{d}_\mathcal {I} \in \mathbb {R}^q \, | \, \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} \le \bar{\beta }\right\} \). Without loss of generality, we can assume that \(\varvec{\bar{\alpha }}\) is not proportional to \(\varvec{e}\).

Definition 3

Let \(\left( \varvec{\bar{\alpha }}_1, \bar{\beta }_1 \right) , \left( \varvec{\bar{\alpha }}_2, \bar{\beta }_2 \right) \in \mathbb {R}^q \times \mathbb {R}\), and \(\left( \bar{\varGamma }_1\right) \) and \(\left( \bar{\varGamma }_2\right) \) be the corresponding inequalities. \(\left( \bar{\varGamma }_1\right) \) and \(\left( \bar{\varGamma }_2\right) \) are equivalent for \(\bar{\varDelta }_{\mathcal {I}}^{int}\) if

$$\begin{aligned} {\mathcal {H}}^{1}_= \cap {\mathcal {H}}^{\bar{\varGamma }_1}_{\le } = {\mathcal {H}}^{1}_= \cap {\mathcal {H}}^{\bar{\varGamma }_2}_{\le }. \end{aligned}$$

Since \(\bar{\varDelta }_{\mathcal {I}} \subseteq {\mathcal {H}}^{1}_=\), two equivalent inequalities discard exactly the same subset of \(\bar{\varDelta }_{\mathcal {I}}\) and are simultaneously valid. Hence, adding one or the other to the formulation is equivalent.

Proposition 6

There exists \(\varvec{\bar{\alpha }}^\prime \in \mathbb {R}^q\) such that \(\left( \bar{\varGamma }^\prime \right) : (\varvec{\bar{\alpha }}^\prime )^T \varvec{d}_\mathcal {I} \le 0\) is equivalent to \(\left( \bar{\varGamma }\right) \).

Proof

Consider \(\mathcal {E}_= = {\mathcal {H}}^{1}_= \cap {\mathcal {H}}^{\bar{\varGamma }}_=\). The two hyperplanes \({\mathcal {H}}^{1}_=\) and \({\mathcal {H}}^{\bar{\varGamma }}_=\) are not parallel, so \(\dim (\mathcal {E}_=) = q-2\). \(\varvec{0} \notin {\mathcal {H}}^{1}_=\), therefore \(\varvec{0} \notin \mathcal {E}_=\) and \({\mathcal {H}}^{0}_= = Span \,\left( \mathcal {E}_= \cup \{\varvec{0}\} \right) \) is a hyperplane containing \(\varvec{0}\). Thus, there exists \(\varvec{\bar{\alpha }}^0 \in \mathbb {R}^q\) such that \({\mathcal {H}}^{0}_=\) is defined by \(\left( \varvec{\bar{\alpha }}^0 \right) ^T \varvec{d}_\mathcal {I} = 0\). Furthermore, by construction, \({\mathcal {H}}^{0}_= \cap {\mathcal {H}}^{1}_= = \mathcal {E}_= = {\mathcal {H}}^{\bar{\varGamma }}_= \cap {\mathcal {H}}^{1}_=\). Taking \(\varvec{\bar{\alpha }}^\prime = \pm \varvec{\bar{\alpha }}^0\), with the sign chosen so that the two half-spaces coincide on \({\mathcal {H}}^{1}_=\), the proposition holds. \(\square \)

Fig. 5
figure 5

Equivalent inequalities. Both \({\mathcal {H}}^{0}_=\) and \({\mathcal {H}}^{\varGamma }_=\) cut off the same part of \(\bar{\varDelta }_{\mathcal {I}}\). \({\mathcal {H}}^{0}_=\) corresponds to an inequality of the form \(\bar{\alpha }^T \varvec{d}_\mathcal {I} \le 0\), whereas \({\mathcal {H}}^{\varGamma }_=\) does not (\(\varvec{0} \notin {\mathcal {H}}^{\varGamma }_=\))

The geometrical interpretation of Proposition 6 is given in Fig. 5. Hereinafter, we therefore suppose that \(\bar{\beta } = 0\). We can now characterize valid inequalities for \(\bar{\varDelta }_{\mathcal {I}}^{int}\).
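For concreteness, one explicit choice of \(\varvec{\bar{\alpha }}^\prime \) can be given (this particular formula is not spelled out in the proof above, but it satisfies the requirements of Proposition 6): since every point of interest lies on \({\mathcal {H}}^{1}_=\), the inequality can be homogenized with the normalization constraint,

$$\begin{aligned} \text {on } {\mathcal {H}}^{1}_=:\quad \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} \le \bar{\beta } \;\Leftrightarrow \; \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} - \bar{\beta }\, \varvec{e}^T \varvec{d}_\mathcal {I} \le 0 \;\Leftrightarrow \; \left( \varvec{\bar{\alpha }} - \bar{\beta }\, \varvec{e} \right) ^T \varvec{d}_\mathcal {I} \le 0, \end{aligned}$$

so \(\varvec{\bar{\alpha }}^\prime = \varvec{\bar{\alpha }} - \bar{\beta }\, \varvec{e}\) yields an equivalent inequality with zero right-hand side; it is nonzero because \(\varvec{\bar{\alpha }}\) is not proportional to \(\varvec{e}\).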

Proposition 7

Given \(\varvec{\alpha } \in \mathbb {R}^n\) such that \(\bar{\varvec{\alpha }} = \varvec{T}^T \varvec{\alpha }\), \(\left( \bar{\varGamma }\right) \) is a valid inequality for \(\bar{\varDelta }^{int}_{\mathcal {I}}\) if and only if

$$\begin{aligned} \left( \varGamma \right) \, : \, \varvec{\alpha }^T \left( \varvec{x} - \varvec{x}^0 \right) \le 0 \end{aligned}$$
(9)

is a valid inequality for \(\mathcal {F}_{{\textsc {SPP{}}}}\).

Proof

Let \(\varvec{\alpha } \in \mathbb {R}^n\) such that \(\bar{\varvec{\alpha }} = \varvec{T}^T \varvec{\alpha }\). Recall Proposition 4: \(\varDelta _{\mathcal {I}} = \left\{ \varvec{d} = \varvec{T} \varvec{d}_\mathcal {I} \, |\, \varvec{d}_\mathcal {I} \in \bar{\varDelta }_{\mathcal {I}} \right\} \). This extends to \(\varDelta ^{int}_{\mathcal {I}}\) and \(\bar{\varDelta }^{int}_{\mathcal {I}}\). Hence,

$$\begin{aligned} \left( \bar{\varGamma }\right) \text { is valid for } \bar{\varDelta }^{int}_{\mathcal {I}}\Leftrightarrow & {} \forall \varvec{d}_\mathcal {I} \in \bar{\varDelta }^{int}_{\mathcal {I}}, \varvec{\bar{\alpha }}^T \varvec{d}_\mathcal {I} \le 0 \\\Leftrightarrow & {} \forall \varvec{d}_\mathcal {I} \in \bar{\varDelta }^{int}_{\mathcal {I}}, \varvec{\alpha }^T \varvec{T} \varvec{d}_\mathcal {I} \le 0 \\\Leftrightarrow & {} \forall \varvec{d} \in \varDelta ^{int}_{\mathcal {I}}, \varvec{\alpha }^T \varvec{d} \le 0 \\\Leftrightarrow & {} \forall \varvec{d} \in \varDelta ^{int}_{\mathcal {I}}, \varvec{\alpha }^T \left( \varvec{x}^0 + r(\varvec{d}) \varvec{d} \right) \le \varvec{\alpha }^T \varvec{x}^0 \\\Leftrightarrow & {} \forall \varvec{x}^\prime \in \mathcal {F}_{{\textsc {SPP{}}}}, \varvec{\alpha }^T \left( \varvec{x}^\prime - \varvec{x}^0 \right) \le 0 \\\Leftrightarrow & {} \left( \varGamma \right) \text { is valid for SPP} \text {.} \end{aligned}$$

\(\square \)

Proposition 8

Let \(\left( \varGamma \right) \) and \(\left( \bar{\varGamma }\right) \) be valid inequalities defined as in Proposition 7, and let \(\varvec{d}_\mathcal {I}^\star \in \bar{\varDelta }_{\mathcal {I}}\), \(\varvec{d}^\star = \varvec{T} \varvec{d}_\mathcal {I}^\star \), and \(\varvec{x}^\star = \varvec{x}^0 + r(\varvec{d}^\star ) \varvec{d}^\star \). Then,

$$\begin{aligned} \left( \bar{\varGamma }\right) \; {\textit{separates}}\; \varvec{d}_\mathcal {I}^\star \; {\textit{from}} \; \bar{\varDelta }^{int}_{\mathcal {I}} \Leftrightarrow \left( \varGamma \right) \; {\textit{separates}}\; \varvec{x}^\star \; {\textit{from}}\; \mathcal {F}_{{\textsc {SPP{}}}}. \end{aligned}$$

Proof

We only need to prove that \(\left( \bar{\varGamma }\right) \) is violated by \(\varvec{d}_\mathcal {I}^\star \) if and only if \(\left( \varGamma \right) \) is violated by \(\varvec{x}^\star \). Since \(r(\varvec{d}^\star ) > 0\), we have

$$\begin{aligned} \bar{\varvec{\alpha }}^T \varvec{d}_\mathcal {I}^\star> 0\Leftrightarrow & {} \varvec{\alpha }^T \varvec{d}^\star> 0 \\\Leftrightarrow & {} \varvec{\alpha }^T \left( \varvec{x}^0 + r(\varvec{d}^\star ) \varvec{d}^\star \right)> \varvec{\alpha }^T \varvec{x}^0 \\\Leftrightarrow & {} \varvec{\alpha }^T \left( \varvec{x}^\star - \varvec{x}^0 \right) > 0. \end{aligned}$$

\(\square \)

What we have shown must now be read the other way round to take advantage of previous work on primal separation. Assume that we know how to determine a primal cut \(\left( \varGamma \right) \) for SPP that separates \(\varvec{x}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\). Then, with \(\varvec{\bar{\alpha }} = \varvec{T}^T \varvec{\alpha }\), the associated inequality \(\left( \bar{\varGamma }\right) \) is a cut for CP that separates \(\varvec{d}_\mathcal {I}^\star \) from \(\bar{\varDelta }_{\mathcal {I}}^{int}\). Moreover, we have shown that any cut for CP can be obtained in this way. This enables us to develop a procedure based on the primal separation problem P-SEP: given \(\varvec{x}^0\) and \(\varvec{x}^\star \), is there a valid inequality for \(\mathcal {F}_{{\textsc {SPP{}}}}\), tight at \(\varvec{x}^0\), that separates \(\varvec{x}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\)? If such an inequality exists, it is transferred to CP to tighten the relaxation of \(\bar{\varDelta }^{int}_{\mathcal {I}}\). From the theoretical point of view, if \(\varvec{d}_\mathcal {I}^\star \) is extremal, such a cut always exists, as shown in Corollary 2.

Corollary 2

Assume \(\varvec{d}_\mathcal {I}^\star \in Ext \left( {\mathcal {F}_{\textsc {CP}}} \right) \), and let \(\varvec{d}^\star = \varvec{T} \varvec{d}_\mathcal {I}^\star \) (we still assume that \(\varvec{d}_\mathcal {I}^\star \) is not an integral direction). There always exists a valid inequality \(\left( \varGamma \right) \) tight at \(\varvec{x}^0\) that separates \(\varvec{x}^\star = \varvec{x}^0 + r(\varvec{d}^\star ) \varvec{d}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\).

Proof

\(\varvec{d}_\mathcal {I}^\star \) is an extreme point of \(\mathcal {F}_{\textsc {CP}}\) but does not belong to \(\bar{\varDelta }_{\mathcal {I}}^{int}\). Since \(\bar{\varDelta }_{\mathcal {I}}^{int}\) is a finite subset of \(\mathcal {F}_{\textsc {CP}}\), and since \(\mathcal {F}_{\textsc {CP}}\) is a polyhedron, there exists a cut \(\left( \bar{\varGamma }\right) \) that separates \(\varvec{d}_\mathcal {I}^\star \) from \(\bar{\varDelta }_{\mathcal {I}}^{int}\). By Propositions 7 and 8, the result holds. \(\square \)

It is also interesting to note that the linear transformation applied to the primal cut to transfer it to CP (\(\varvec{\bar{\alpha }}^T = \varvec{\alpha }^T \varvec{T}\)) is the same as for the objective function (\(\varvec{\bar{c}}^T = \varvec{c}^T \varvec{T}\)) and for the constraint matrix (\(\varvec{\bar{A}}_{\mathcal {\bar{P}}\mathcal {I}} = \varvec{A}_{\mathcal {\bar{P}}.} \varvec{T}\)). Finally, note that adding cuts does not prevent us from using the characterization of extreme integer solutions as those having a column-disjoint support. Since no cutting plane added to CP cuts off any integer direction, the set of extreme integral directions \( Ext \left( { Conv \,\left( \bar{\varDelta }^{int}_{\mathcal {I}} \right) } \right) \) is still contained in the set of extreme directions \( Ext \left( {\mathcal {F}_{\textsc {CP}}} \right) \) even after the addition of cutting planes. Geometrically, adding any valid inequality for \( Conv \,\left( \bar{\varDelta }^{int}_{\mathcal {I}}\right) \) to the relaxation cannot transform a nonextreme integral direction into an extreme one.
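To make the transfer concrete, the following minimal numpy sketch (ours, for illustration only; the names transfer_primal_cut, cp_lhs and cp_rhs are hypothetical and not part of the paper) adds a primal cut to CP by applying the same linear transformation used for the cost vector and the constraint rows.

```python
import numpy as np

def transfer_primal_cut(T, alpha, cp_lhs, cp_rhs):
    """Transfer a primal cut alpha^T (x - x^0) <= 0 to the complementary problem.

    T      : (n, q) array, the linear transformation d = T d_I
    alpha  : (n,) array, coefficients of the primal cut in the x-space
    cp_lhs : list of (q,) arrays, cut rows already added to CP
    cp_rhs : list of floats, corresponding right-hand sides (all zero here)
    """
    alpha_bar = T.T @ alpha      # same transformation as c_bar^T = c^T T
    cp_lhs.append(alpha_bar)     # new row: alpha_bar^T d_I <= 0
    cp_rhs.append(0.0)
    return alpha_bar
```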

4.1 Specific primal separation procedures

As mentioned in the introduction, there exists extensive work on primal separation. It has been shown that P-SEP is equivalent to SPP in terms of complexity [28]. Also, there exist general-purpose families of primal cuts, such as Gomory–Young's cuts [35]. Iteratively adding cuts from these families would ultimately lead to a column-disjoint solution of CP after a finite number of separations, provided that the choice of incoming variables follows some lexicographic order (see [35]). However, these families are not polyhedral, i.e., they do not take the SPP structure into account. We chose to study two families of inequalities that are known to yield strong cutting planes for SPP, namely clique and odd-cycle inequalities.

4.1.1 Primal clique inequalities

Let us consider the conflict graph of matrix \(\varvec{A}\), \(\mathcal {G} = \left( \mathcal {N}, E \right) \), where each node of \(\mathcal {N} = \left\{ 1, \ldots , n \right\} \) corresponds to a column of \(\varvec{A}\), and E is such that \(\{i,j\} \in E\) if and only if \(\varvec{A}_{.i}^T \varvec{A}_{.j} \ne 0\). Two vertices are linked by an edge if the corresponding columns are not disjoint. Given a clique \(\mathcal {W}\) in this graph, any feasible solution of SPP satisfies

$$\begin{aligned} \left( \varGamma _{\mathcal {W}}\right) \, :\, \sum _{i \in \mathcal {W}} x_i \le 1\text {,} \end{aligned}$$
(10)

called the clique inequality associated with \(\mathcal {W}\). Clique cuts form a family of generally strong cutting planes for SPP and were first introduced by Padberg [20]. Given a fractional solution \(\varvec{x}^\star \) of \({{\textsc {SPP{}}}}^{\mathrm {LR}}\), the standard clique separation problem consists in associating the weight \(w_i = x^\star _i\) with each vertex of \(\mathcal {G}\) and determining a clique of weight greater than 1 in \(\mathcal {G}\), if any exists.
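For illustration, the following sketch (ours; A is assumed to be a dense 0–1 numpy array, and the helper name conflict_graph is hypothetical) builds the edge set of the conflict graph used by the clique separation problem.

```python
import numpy as np

def conflict_graph(A):
    """Return the edges of the conflict graph of a 0-1 matrix A (columns = vertices)."""
    n = A.shape[1]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if A[:, i] @ A[:, j] != 0:   # columns i and j share at least one row
                edges.add(frozenset((i, j)))
    return edges

# Vertex weights for the standard clique separation problem are w_i = x*_i;
# a clique W with total weight strictly greater than 1 yields a violated clique inequality.
```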

Building on the work of Letchford and Lodi [16], we develop here a more efficient procedure for primal clique cuts. In the primal context, let \(\varvec{d}^\star \) be a fractional direction, and \(\varvec{x}^\star = \varvec{x}^0 + r(\varvec{d}^\star ) \varvec{d}^\star \). For \(\left( \varGamma _{\mathcal {W}}\right) \) to be tight at \(\varvec{x}^0\), we need \(\sum _{i \in \mathcal {W}} x^0_i = | \mathcal {W} \cap \mathcal {P} | = 1\). Hence, exactly one variable from \(\mathcal {P}\) must be part of clique \(\mathcal {W}\). Denote that variable as l, and the corresponding clique as \(\mathcal {W}_l\). Furthermore, for \(\left( \varGamma _{\mathcal {W}_l}\right) \) to separate \(\varvec{x}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\), we must have \(\sum _{i \in \mathcal {W}_l} x^\star _i > 1\). Since \(x^\star _i = r(\varvec{d}^\star ) d^\star _i\) for all \(i \in \mathcal {I}\), \(\sum _{i \in \mathcal {W}_l} x^\star _i = x^\star _l + \sum _{i \in \mathcal {W}_l \cap {\mathcal {S}^+}{}} x^\star _i\), where \({\mathcal {S}^+}{} = Supp \,\left( \varvec{d}_\mathcal {I}^\star \right) \). Denote also as \({\mathcal {S}^-}{} = Supp \,\left( \varvec{d}_\mathcal {P}^\star \right) \) the set of columns of \(\mathcal {P}\) that are impacted by direction \(\varvec{d}^\star \). Sets \(({\mathcal {S}^-}{}, {\mathcal {S}^+}{})\) form a partition of \( Supp \,\left( \varvec{d}^\star \right) \) such that for all \(i \in Supp \,\left( \varvec{d}^\star \right) \), \(i \in {\mathcal {S}^+}{}\) if \(d^\star _i > 0\) and \(i \in {\mathcal {S}^-}{}\) if \(d^\star _i < 0\), so

$$\begin{aligned} \sum _{i \in \mathcal {W}_l} x^\star _i> 1 \Leftrightarrow x^\star _l + \sum _{i \in \mathcal {W}_l \cap {\mathcal {S}^+}{}} x^\star _i > 1\text {.} \end{aligned}$$

Because \(\mathcal {W}_l\) is a clique in \(\mathcal {G}\), its subset \(\mathcal {W}_l \cap {\mathcal {S}^+}{}\) is also a clique. The consequence of these observations is summarized in the following proposition, which shows that it is sufficient to look for a primal clique cut within \({\mathcal {S}^+}{} \cup {\mathcal {S}^-}{}\) to solve the complete primal clique separation problem.

Proposition 9

There exists a primal clique inequality that separates \(\varvec{x}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\) if and only if for some \(l \in {\mathcal {S}^-}{}\), there exists a clique \(\mathcal {W}_l\) that satisfies

$$\begin{aligned} (1)\; \mathcal {W}_l \setminus \{l\} \subseteq N_l, \;(2)\; l \in \mathcal {W}_l, \;{\textit{and}} \;(3) \;w(\mathcal {W}_l) > 1, \end{aligned}$$

where \(N_l\) is the set of neighbors of l in \(\mathcal {G}\) that are also in \({\mathcal {S}^+}{}\) and \(w(\mathcal {W}_l) = \sum _{i \in \mathcal {W}_l} x^\star _i\).

Hence, the primal separation procedure for primal clique cuts, called CL_PSEP and summarized in Algorithm 2, returns an empty set of primal clique cuts if and only if none exists, i.e., this separation procedure is exact. Whenever a cut exists, the algorithm finds one, and the cut returned is a most violated one. However, the size of the clique could potentially be increased further by adding zero-weight variables. Even though Step 4 of Algorithm 2 requires solving an \(\mathcal {NP}\)-hard problem,Footnote 4 the procedure is very fast in practice because the size of \(N_l\) is usually small.

figure l
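Since Algorithm 2 itself is displayed as a figure, we also give a compact sketch in the spirit of Proposition 9 (ours, not the paper's exact pseudocode; the brute-force maximum-weight clique search is only reasonable because \(N_l\) is typically small).

```python
from itertools import combinations

def primal_clique_separation(edges, x_star, s_minus, s_plus):
    """Primal clique separation in the spirit of Proposition 9.

    edges   : set of frozensets {i, j}, conflict-graph edges
    x_star  : dict column -> value of x*
    s_minus : columns of P touched by the direction (Supp(d*_P))
    s_plus  : support of d*_I (entering columns)
    Returns a list of cliques W_l whose clique inequality separates x*.
    """
    def is_clique(nodes):
        return all(frozenset(p) in edges for p in combinations(nodes, 2))

    cuts = []
    for l in s_minus:
        # Neighbors of l in the conflict graph that belong to S+ (the set N_l).
        n_l = [j for j in s_plus if frozenset((l, j)) in edges]
        best_w, best_set = 0.0, ()
        # Enumerate candidate cliques within N_l (N_l is small in practice).
        for size in range(1, len(n_l) + 1):
            for subset in combinations(n_l, size):
                if is_clique(subset):
                    w = sum(x_star[j] for j in subset)
                    if w > best_w:
                        best_w, best_set = w, subset
        if x_star[l] + best_w > 1.0 + 1e-9:
            cuts.append((l,) + best_set)   # clique inequality sum_{i in W_l} x_i <= 1
    return cuts
```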

4.1.2 Primal odd-cycle inequalities

Odd-cycle inequalities form another well-known family of valid inequalities for SPP. Given a cycle \(\mathcal {Q}\) of odd length in \(\mathcal {G}\), the inequality

$$\begin{aligned} \left( \varGamma _{\mathcal {Q}}\right) \, :\, \sum _{i \in \mathcal {Q}} x_i \le \frac{|\mathcal {Q}|-1}{2} \end{aligned}$$
(11)

is valid for SPP. Clearly, \(\left( \varGamma _{\mathcal {Q}}\right) \) is tight at \(\varvec{x}^0\) if and only if \(\sum _{i \in \mathcal {Q}} x^0_i = |\mathcal {Q} \cap \mathcal {P}| = (|\mathcal {Q}|-1)/2\). Furthermore, since \(\mathcal {P}\) is column-disjoint, there is no edge between any pair of vertices of \(\mathcal {P}\). Therefore, \(\mathcal {Q}\) is an alternating cycle with vertices in \(\mathcal {P}\) and \(\mathcal {I}\), except for one \(\mathcal {I}-\mathcal {I}\) edge (i.e., an edge that links two vertices of \(\mathcal {I}\)). Based on these observations, and similarly to clique cuts, the search can be restricted to vertices in \( Supp \,\left( \varvec{d}^\star \right) \), but in an even stronger manner.

Proposition 10

Every primal odd-cycle cut \(\left( \varGamma _{\mathcal {Q}}\right) \) that separates \(\varvec{x}^\star \) from \(\mathcal {F}_{{\textsc {SPP{}}}}\) satisfies \(\mathcal {Q} \subseteq Supp \,\left( \varvec{d}^\star \right) \).

Proof

Suppose that the result is false, i.e., that there exists a primal odd-cycle cut \(\left( \varGamma _{\mathcal {Q}}\right) \) that separates \(\varvec{x}^\star \) with \(\mathcal {Q} \nsubseteq Supp \,\left( \varvec{d}^\star \right) \). Let \({\mathcal {S}^-}{} = Supp \,\left( \varvec{d}_\mathcal {P}^\star \right) \) and \({\mathcal {S}^+}{} = Supp \,\left( \varvec{d}_\mathcal {I}^\star \right) \). Write \(\mathcal {Q} = (q_1, q_2, \ldots , q_{2T+1}, q_1)\), where \(q_1, q_{2t+1} \in \mathcal {I}\) and \(q_{2t} \in \mathcal {P}\) for all \(t \in \{1, \ldots , T\}\); \(\mathcal {Q}\) is an alternating cycle except for the \(\{q_{2T+1},q_1\}\) edge. Consider the two following cases:

  1. (i)

    \(\underline{q_1 \notin {\mathcal {S}^+}{}}\). Then \(d^\star _{q_1} = 0\) and

    $$\begin{aligned} \sum _{i \in \mathcal {Q}} x^\star _i = \sum _{i = 2}^{2T+1} x^\star _{q_i} = \sum _{t=1}^{T} \left( x^\star _{q_{2t}} + x^\star _{q_{2t+1}} \right) \text {.} \end{aligned}$$

    Every \(\{q_{2t},q_{2t+1}\}\) is an edge of \(\mathcal {G}\), so for every pair of columns \(\left( \varvec{A}_{.q_{2t}}, \varvec{A}_{.q_{2t+1}} \right) \) there exists a row i such that \({A}_{{i}{q_{2t}}} = {A}_{{i}{q_{2t+1}}} = 1\). The set partitioning constraint corresponding to that conflict yields \(x^\star _{q_{2t}} + x^\star _{q_{2t+1}} \le 1\) for all \(t \in \{1, \ldots , T\}\). Hence, \(\sum _{i \in \mathcal {Q}} x^\star _i \le T = \left( |\mathcal {Q}|-1\right) /2\) and \(\varvec{x}^\star \) does not violate \(\left( \varGamma _{\mathcal {Q}}\right) \).

  2. (ii)

    \(\underline{q_2 \notin {\mathcal {S}^-}{}}\). From \(\varvec{A} \varvec{d}^\star = \varvec{0}\) and \(\varvec{d}_\mathcal {I}^\star \ge \varvec{0}\), necessarily \(q_1, q_3 \notin {\mathcal {S}^+}{}\). Case (i) applies and this concludes the proof.\(\square \)

Proposition 10 suggests distinguishing between \({\mathcal {S}^-}{}-{\mathcal {S}^+}{}\) and \({\mathcal {S}^+}{}-{\mathcal {S}^+}{}\) edges. Let \(\mathcal {G}_B = \left( N_B, E_B \right) \) be a subgraph of \(\mathcal {G}\) with \(N_B = {\mathcal {S}^-}{} \cup {\mathcal {S}^+}{} = Supp \,\left( \varvec{d}^\star \right) \), and \(\{i,j\} \in E_B\) if and only if \(i \in {\mathcal {S}^-}{}\), \(j \in {\mathcal {S}^+}{}\), and \(\{i,j\} \in E\) (in conflict graph \(\mathcal {G}\)). \(\mathcal {G}_B\) is a bipartite graph. In the case of the primal odd-cycle separation, the edges (and not the vertices) of the graph are weighted as \(w_{ij} = 1 - x^\star _i - x^\star _j\) if \(\{i,j\} \in E_B\). The weight of a path or a cycle is the sum of the weights of its edges. Given a cycle \(\mathcal {Q}\), its weight is therefore \(w(\mathcal {Q}) = |\mathcal {Q}| - 2 \sum _{i \in \mathcal {Q}} x^\star _i\) and the corresponding odd-cycle inequality is violated by \(\varvec{x}^\star \) if and only if \(w(\mathcal {Q}) < 1\). The primal odd-cycle separation procedure, CY_PSEP, is given in Algorithm 3. This algorithm is usually fast because it consists of finding at most \(|{\mathcal {S}^-}{}|\) shortest paths in a relatively small bipartite graph \(\mathcal {G}_B\).

figure m
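For illustration, here is one possible implementation of the bipartite shortest-path idea (ours, not the paper's exact Algorithm 3; it closes each candidate \({\mathcal {S}^+}{}-{\mathcal {S}^+}{}\) conflict edge with a shortest path in \(\mathcal {G}_B\) and keeps the cycles of weight below 1). It assumes the networkx library is available for Dijkstra's algorithm; edge weights are nonnegative because conflicting columns satisfy \(x^\star _i + x^\star _j \le 1\).

```python
import networkx as nx

def primal_odd_cycle_separation(edges, x_star, s_minus, s_plus):
    """Primal odd-cycle separation restricted to Supp(d*), following Proposition 10.

    edges   : set of frozensets {i, j}, conflict-graph edges
    x_star  : dict column -> value of x*
    s_minus, s_plus : supports of d*_P and d*_I
    Returns a list of odd cycles (as vertex paths) whose inequality is violated by x*.
    """
    # Bipartite graph G_B: only S- to S+ conflict edges, weighted by 1 - x*_i - x*_j >= 0.
    g_b = nx.Graph()
    g_b.add_nodes_from(s_minus)
    g_b.add_nodes_from(s_plus)
    for i in s_minus:
        for j in s_plus:
            if frozenset((i, j)) in edges:
                g_b.add_edge(i, j, weight=1.0 - x_star[i] - x_star[j])

    cuts = []
    closing = [e for e in edges if all(v in s_plus for v in e)]  # S+ - S+ conflict edges
    for e in closing:
        u, v = tuple(e)
        try:
            path_w = nx.dijkstra_path_length(g_b, u, v, weight="weight")
            path = nx.dijkstra_path(g_b, u, v, weight="weight")
        except nx.NetworkXNoPath:
            continue
        cycle_w = path_w + (1.0 - x_star[u] - x_star[v])  # add the closing edge weight
        if len(path) >= 3 and cycle_w < 1.0 - 1e-9:       # odd cycle, inequality violated
            cuts.append(path)                             # Q = path closed by edge {u, v}
    return cuts
```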

4.2 Algorithm

Algorithm 4 incorporates the cutting plane results discussed in the previous sections into Algorithm 1. The different (nonexhaustive) branching strategies that we use are presented in Sect. 5.

figure n

\(\mathrm {SEP\_MAX}{}\) is a parameter that represents the maximum number of separation problems to be solved consecutively. Explicit values chosen for \(\mathrm {INC\_MAX}{}\) and \(\mathrm {SEP\_MAX}{}\), as well as the methods to determine the set of variables to be fixed (line 33) are discussed in Sect. 5 before the numerical results are given. Different priority rules for choosing the separation algorithm (line 28) are studied and the corresponding results are displayed in the next section.

5 Numerical results

The results presented in this section empirically demonstrate the relevance of our theoretical results. We show that primal cuts indeed improve the performance of our algorithm and help Isud reach integral solutions.

5.1 Methodology

5.1.1 Instances

The instances presented here are those used by Zaghrouti et al. [37] (sppaa01, vcs1200, and vcs1600) to which we added another instance from the OR-Library,Footnote 5 sppaa04. Both sppaa01 and sppaa04 are small flight assignment problems (respectively 823 constraints and 8904 variables, and 426 constraints and 7195 variables) with an average of 9 nonzero entries per column. vcs1200 is a medium-size bus driver scheduling problem (1200 constraints, 133,000 variables), and vcs1600 is a large bus driver scheduling problem (1600 constraints, 571,000 variables); both have an average of 40 nonzero entries per column. These numbers of nonzero entries per column are typical of aircrew and bus driver scheduling problems and do not vary much with the number of constraints. In scheduling instances, they typically correspond to the number of tasks to be completed by a worker in a given time span. The optimal solutions of the problems are known: sppaa01 and sppaa04 were solved with CPLEX,Footnote 6 and vcs1200 and vcs1600 were built from all the columns generated during a column-generation process run with GENCOL,Footnote 7 whose optimal solution is therefore known.

5.1.2 Initial solutions

Generating meaningful initial solutions is crucial for accurately testing primal algorithms. For this reason, we took an “application-oriented” viewpoint here and generated initial solutions (from known optimal ones) that accurately mimic those available in practice, especially in transportation scheduling problems, where SPP is widely used. More precisely, in industrial crew scheduling instances, one can observe the following:

  1. (1)

    In the case of beforehand planning, one can use the vehicle (aircraft/bus) routes as initial crew schedules. Each column of this solution corresponds to the set of legs (flight or bus trips) covered by that vehicle. If some of these schedules are not feasible for the pilots/drivers (they may not respect the collective agreement), the corresponding columns are given prohibitive big-M costs.

  2. (2)

    In the case of reoptimization, one can use the existing schedule as initial solution. If some schedules are not feasible anymore in the new situation, they are given prohibitive big-M costs.

Suppose now that the rows of \(\varvec{A}\) are given in chronological order of the corresponding tasks, i.e., for all \(i, j \in \mathcal {R}\), \(i<j\) if and only if task i starts before task j. Given a solution \(\varvec{x}\) and the corresponding indices of the nonzero variables \(\mathcal {P} = Supp \,\left( \varvec{x}\right) \), a pair \((i,j) \in \mathcal {R}^2\) of tasks is called consecutive in \(\varvec{x}\) if and only if there exists \(k \in \mathcal {P}\) such that: (i) \({A}_{{i}{k}} = {A}_{{j}{k}} = 1\) and (ii) \(\forall j^\prime \in \left\{ i+1, \ldots , j-1 \right\} , {A}_{{j^\prime }{k}} = 0\). Consecutive tasks are therefore all the pairs of tasks performed consecutively by the same employee (see Example 1).

The main observation here is the following: the initial solutions described above share a high proportion of consecutive tasks with the optimal solution. Quantitatively, this is described by the primal information contained in the initial solution \(\varvec{x}^0\) that we define as the percentage of consecutive tasks of \(\varvec{x}^0\) that are also consecutive tasks in the optimal solution \(\varvec{x}^\star _\mathrm{SPP}\). In bus driver scheduling problems, the initial solution that follows the bus routes typically contains 90% of primal information (vcs1200, vcs1600). In aircrew scheduling, the figure is generally around 75% [37] (sppaa01, sppaa04). These percentages are even higher in the reoptimization case. Example 1 shows how primal information is computed on an 11-task case.

Example 1

Consider an 11-task scheduling problem. The tasks (rows) are in chronological order and denoted as A, ..., K. The individual schedules used in the optimal and initial solutions are respectively those of \(\mathcal {P}^\star \) and \(\mathcal {P}^0\) and are described here:

\(\mathcal {P}^\star \) | Task | \(\mathcal {P}^0\)

1 0 0 | A | 1 0 0
0 1 0 | B | 0 1 0
0 1 0 | C | 0 1 0
1 0 0 | D | 1 0 0
1 0 0 | E | 1 0 0
1 0 0 | F | 0 1 0
1 0 0 | G | 0 1 0
0 1 0 | H | 0 0 1
0 0 1 | I | 1 0 0
0 1 0 | J | 0 0 1
0 0 1 | K | 1 0 0

The set of consecutive tasks in the optimal solution is \(\mathcal {CT}^\star = \lbrace \left( A, D \right) , \left( D, E \right) , \left( E, F \right) , \left( F, G \right) , \left( B, C \right) , \left( C, H \right) , \left( H, J \right) , \left( I, K \right) \rbrace \). That of the initial solution is \(\mathcal {CT}^0 = \lbrace \left( A, D \right) , \left( D, E \right) , \left( E, I \right) , \left( I, K \right) , \left( B, C \right) , \left( C, F \right) , \left( F, G \right) , \left( H, J \right) \rbrace \). The two sets share the six pairs \(\left( A, D \right) , \left( D, E \right) , \left( F, G \right) , \left( B, C \right) , \left( H, J \right) , \left( I, K \right) \). Therefore, the primal information contained in \(\varvec{x}^0\) is \(|\mathcal {CT}^\star \cap \mathcal {CT}^0| / |\mathcal {CT}^0| = 6/8 = 75\%\).
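A minimal sketch (ours) of how the primal information of Example 1 can be computed from the individual schedules; the schedule lists below reproduce the columns of the table above, and the helper name consecutive_pairs is only illustrative.

```python
def consecutive_pairs(columns):
    """Set of pairs of tasks performed consecutively by the same worker.

    columns: list of schedules, each a chronologically sorted list of task labels.
    """
    pairs = set()
    for col in columns:
        pairs.update(zip(col, col[1:]))
    return pairs

# Individual schedules of Example 1 (columns of P* and P^0).
p_star = [["A", "D", "E", "F", "G"], ["B", "C", "H", "J"], ["I", "K"]]
p_init = [["A", "D", "E", "I", "K"], ["B", "C", "F", "G"], ["H", "J"]]

ct_star, ct_init = consecutive_pairs(p_star), consecutive_pairs(p_init)
primal_info = len(ct_star & ct_init) / len(ct_init)
print(f"primal information = {100 * primal_info:.1f}%")   # 75.0% for this example
```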

We chose to perturb the (known) optimal solutions to generate initial solutions that contain a similar level of primal information to that experienced in practice. The details of the perturbation method can be found in [37]. These solutions are generally infeasible, so we assign the perturbed columns a high cost, as is done in companies for the schedules that follow the bus/airplane routes. In our perturbation method, the input parameter \(\pi \) is the percentage of columns of the optimal solution that will appear in the new initial solution. For sppaa04 and sppaa01, we generated initial solutions for \(\pi \) = 10, 15, 20, and 35%; for vcs1200 and vcs1600, we used \(\pi =\) 20, 35, and 50%. These parameters were chosen so that the resulting primal information (given in Table 1) is consistent with the typical values. The initial gaps range from 50 to 80%, depending on the instance. Here (and in the rest of the paper), the gap is the difference between the current solution value and the optimal solution value, divided by the latter.

Table 1 Percentage of primal information in the initial solutions of the benchmark

Remark 5

The primal information is related to a form of consecutive-ones property in the SPP. Suppose here that the rows are re-sorted so that each column of the initial solution has consecutive ones and that the corresponding nonzero rows are sorted in chronological order of the tasks. The primal information is then a measure of how close the optimal solution is to the consecutive-ones property with that particular ordering, namely, it is the percentage of consecutive ones of the initial solution that are also consecutive in the optimal solution. With that order, given the typical figures of primal information in practical problems (over 75%), the columns of the optimal solution are thus made of few blocks of consecutive ones (“quasi”-consecutive-ones). From the algorithmic point of view, this characteristic of the problem is only superficial in the sense that it depends on its presentation (ordering of the rows) and not on its intrinsic structure. Furthermore, with a pure chronological ordering (the one that we use), neither the initial nor the optimal solution has the consecutive-ones property. More interesting is the computation of the incompatibility degree (see Sect. 3.1.2): it is a dynamic measure of the number of breaks of consecutive ones in the nonbasic columns, based on the ordering in which the current solution has the consecutive-ones property with chronological order of the tasks. Rather than sticking to a static ordering of the rows corresponding to the initial solution when measuring the incompatibility degree, we compute it in terms of the current solution.

5.1.3 Cutting planes strategies

The tests were conducted for the following cutting planes strategies:

none::

Without primal cuts;

Clique::

With primal clique cuts only;

Cycle::

With primal odd-cycle cuts only;

Both::

Both aforementioned cut types are separated at every P-SEP step;

Prio::

Primal odd-cycle cuts are separated only if no primal clique cut is found.

Moreover, another parameter is included, which is the maximum number of separation problems solved before using another technique (such as branching). We analyzed the results for different values ranging from 40 to 40,000 (virtually infinite). In a very large majority of the cases, either no cut could be found before the 40th P-SEP was solved, or the cutting planes yielded an integral direction in fewer than 40 P-SEP. Hence, we fixed the maximum number of consecutive P-SEP to 40 and only present these results here.

5.1.4 Branching strategies

We propose four different nonexhaustive branching techniques to improve the performance of the algorithm. They correspond to the decision made when no primal cut is found that cuts the current fractional direction.

Nobr::

stop the algorithm;

Last::

all variables of the last fractional direction found are set to zero in \({\textsc {CP}}{}\) until an augmentation is performed;

First::

all variables of the first direction found since the last augmentation or the last branching are set to zero in \({\textsc {CP}}{}\) until an augmentation is performed;

Cover::

    a subproblem is solved to determine a small subset of \(\mathcal {I}\) such that the support of each fractional solution found since the last augmentation or the last branching contains at least one element of this set (see the sketch after this list). These variables are set to zero in \({\textsc {CP}}{}\), so that all the aforementioned fractional solutions become infeasible.
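The covering subproblem of the Cover strategy can be approximated greedily, as in the following sketch (ours; the paper solves a subproblem but does not commit to this greedy rule, and the function name greedy_cover is hypothetical).

```python
def greedy_cover(fractional_supports):
    """Greedy choice of a small set of columns hitting every fractional support.

    fractional_supports: list of sets of column indices (supports of the fractional
    directions found since the last augmentation or the last branching).
    Returns a set H meeting every support; fixing the variables of H to zero in CP
    makes all of these fractional directions infeasible.
    """
    uncovered = [set(s) for s in fractional_supports]
    hitting_set = set()
    while uncovered:
        # Pick the column appearing in the largest number of still-uncovered supports.
        counts = {}
        for s in uncovered:
            for j in s:
                counts[j] = counts.get(j, 0) + 1
        best = max(counts, key=counts.get)
        hitting_set.add(best)
        uncovered = [s for s in uncovered if best not in s]
    return hitting_set
```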

All these techniques were tested, but we present detailed results for Nobr and Last only. The performance of the other two branching strategies is reported more briefly. The variation in the results is not significant enough to warrant a detailed presentation of all four aforementioned strategies.

5.2 Results

All the tests were performed on a Linux PC with a 3.4 GHz processor. For each problem and each perturbation parameter \(\pi \), 10 different instances (different initial solutions and corresponding artificial columns) are generated. Column \(\textsc {Algo}{}\) indicates the cutting strategy used; Best is the best of the four on each instance, i.e., the one with the smallest gap, and in the event of a tie, the shortest computational time to reach the best solution. All times are given in seconds.

The commercial solver CPLEX was tested on all the instances of the benchmark, with the initial solutions given as MIP-start. Instances sppaa01 and sppaa04 are solved to optimality in an average of 22.1 and 14.4 s, respectively. Within a time limit of 1 h, when provided with the initial solutions as a warm start, CPLEX only slightly improves them on instance vcs1200, never lowering the gap below 14% (average gap of 49.7%). Within that same time limit, it never improves any of the initial solutions given for vcs1600 at all (the average gap remains 73%, as for the initial solutions). Note that, if the initial solution is not given, CPLEX finds a feasible solution within 1 h for only 1 of the 30 instances of each of vcs1200 and vcs1600, and that solution is of comparable cost to that of our initial solutions. Here, one should consider sppaa01 and sppaa04 as benchmark instances used for a detailed analysis of our algorithm and its behavior, and vcs1200 and vcs1600 as practical instances that we aim to solve within a few minutes.

Table 2 Comparison of the performances of Isud with branching Nobr for instance sppaa01
Table 3 Comparison of the performances of Isud with branching Last for instance sppaa01
Table 4 Comparison of the performances of Isud with branching Nobr for instance sppaa04
Table 5 Comparison of the performances of Isud with branching Last for instance sppaa04

Tables 2 and 3 show the results of our algorithm for all instances generated from sppaa01, with associated branching strategies Nobr and Last, for parameters \(\mathrm {INC\_MAX}{} = 10\), \(\mathrm {SEP\_MAX}{} = 40\). Other values were tested and gave very similar results. Columns \(\pi \) and \(\textsc {Algo}{}\) display the perturbation degree of the instances and the chosen separation strategy, respectively. The next three columns display the number of instances (out of 10) that were solved to optimality (\(0\%\)), with a positive gap \(\le 2\%\), and with a gap \(>2\%\). The number of instances where the best solution found by Isud still contains artificial columns from the initial solution (with big-M cost) is indicated between brackets next to those with a gap \(>2\%\) (because of the big-M, no solution with a gap lower than or equal to \(2\%\) contains any of these columns). Under label mean is the mean gap, computed only over instances whose solutions contain no artificial column. Then, the overall computation time (\(t_{\textsc {Isud}{}}\)), the time to reach the best solution obtained (\(t_{\textsc {Best}}\)) and the average time per augmentation (\(t_\mathbf{AUG {}}\)) are displayed. The last columns contain the mean number of augmentations K performed to reach the best solution, and the mean size \(|\mathcal {S}|\) of \(\mathcal {S} = Supp \,\left( \varvec{d}\right) \) for disjoint (\(|\mathcal {S}|^{\textsc {D}}\)) and nondisjoint (\(|\mathcal {S}|^{\textsc {N}}\)) directions. Note that, while the other mean values are computed over all executions of the algorithm, the mean number of augmentations K is based on the instances for which the final gap is \(\le 2\%\). Large gaps often mean a much smaller number of iterations, and taking these instances into account would distort that mean. Tables 4 and 5 show the results of the same experiments for instances generated from sppaa04.

Fig. 6
figure 6

Performance diagrams over all sppaa04 and sppaa01 instances for the four branching strategies (Nobr (top-left), Last (top-right), First (bottom-left) and Cover (bottom-right). An instance is considered as solved if the gap is lower than 2%

Consistent with our expectations, and whatever the cutting-plane or branching strategy, the primal information strongly influences the results of the algorithm. Namely, the higher the percentage of primal information, the better the algorithm performs, both in terms of solution quality and of average running time. The branching strategies highlight common features and global differences between the various cutting techniques. First, the performance of the algorithms can be globally ranked from worst to best as follows: \(\textsc {none}{} \preceq \textsc {Cycle}{} \preceq \textsc {Clique}{} \preceq \textsc {Prio}{} \preceq \textsc {Both}{}\). Moreover, the execution time is significantly shorter for \(\textsc {none}{}\) and \(\textsc {Cycle}{}\) than for the others. In the case of \(\textsc {none}{}\), no primal separation problem is solved, and either no branching or very simple fixing rules are applied, hence the high speed of the algorithm. In the case of \(\textsc {Cycle}{}\), the number of primal cycle cuts found is quite small, so the computation time is also smaller. The same holds for the time spent to reach the best solution (\(t_{\textsc {Best}}\)) and the time per augmentation (\(t_\mathbf{AUG {}}\)). Note that the difference is much bigger for \(t_{\textsc {Isud}{}}\) than for \(t_{\textsc {Best}}\) because a significant amount of time is spent at the final solution generating cutting planes. The number of augmentation steps is higher for the algorithms that generate more primal cuts (column K). Generally, cutting planes allow the algorithm to find more disjoint solutions within the same phase, hence yielding a larger number of steps for each incompatibility degree. The algorithms using cuts tend to generate disjoint and nondisjoint combinations of larger size (columns \(|\mathcal {S}|^{\textsc {D}}{}\) and \(|\mathcal {S}|^{\textsc {N}}{}\)). Cutting planes make the problem more complex by adding constraints, and the size of the support of a basic solution of \({\textsc {CP}}{}\) therefore increases with the number of cuts inserted, hence the larger size of the combinations.

Fig. 7
figure 7

Percentage of instances solved over solution time for all sppaa01 instances for the four branching strategies (Nobr (top-left), Last (top-right), First (bottom-left) and Cover (bottom-right). An instance is considered as solved if the gap is lower than 2%

Furthermore, the algorithm performs significantly better when variable fixing is available. This is particularly true when no cut is applied. Both nonexhaustive branching and cutting planes improve the performance of the algorithm. If we analyze the performance over sppaa01 for \(\pi = 15\%\) (closest to real-life problems), we see that (1) when no branching is performed, Prio solves 5/10 problems to optimality (against 2/10 for none), and 9/10 within 2% of the optimum (5/10 for none); and (2) when variables are fixed, Prio solves 9/10 problems to optimality (2/10 for none) and all of them within 2% of the optimum (6/10 for none). Cutting planes therefore solve 7 out of the 8 problems that Isud did not solve previously, hence yielding an improvement of 87.5% over sppaa01 for \(\pi = 15\%\).

Fig. 8
figure 8

Percentage of instances solved over solution time for all sppaa04 instances for the four branching strategies (Nobr (top-left), Last (top-right), First (bottom-left) and Cover (bottom-right). An instance is considered as solved if the gap is lower than 2%

Figure 6 displays the four performance diagrams of the algorithms for branching strategies Nobr, Last, First and Cover, over all sppaa04 and sppaa01 instances (70 in total). Here again, whatever the branching strategy, cutting planes allow many more instances to be solved. However, the improvement factor is much higher when no variable is fixed (less than \(50\%\) of the instances are solved without cuts, versus more than \(80\%\) for the Prio or Both separation strategies). Interestingly, the time-increase factor (x-axis) never exceeds 10, and is lower than 4 most of the time. Hence, the addition of cutting planes does not slow down the algorithm too much. Detailed computing times for each of the two problems are displayed in Figs. 7 and 8, respectively. The Cover branching strategy shows slightly better performance, but not significantly enough to draw any conclusion yet. Moreover, more instances are solved faster by Isud than by CPLEX.

Tables 6 and 7 display specific characteristics concerning the cutting planes generated during the process. For each branching strategy (Nobr and Last), the number of instances solved within 2% of the optimum is shown under label inst. The mean time spent in the primal separation and cut management process, including the cut pool management, is given (\(t_{\textsc {Sep}}\)), as well as the mean number of separation problems solved (\(n_{\textsc {Sep}}\)), the mean number of primal clique cuts (\(n_{\textsc {Cl}}\)), and primal odd-cycle cuts (\(n_{\textsc {Cy}}\)). We can see that the time spent in the separation process is much higher when branching is applied. Indeed, in our algorithm, between two branching stages, \(\mathrm {SEP\_MAX}{}\) separation problems are solved. When no branching is performed, at most one sequence of \(\mathrm {SEP\_MAX}{}\) separation problems is solved, and in case of failure to find a new primal cut, the algorithm instantly stops. This can also be seen in the \(n_{\textsc {Sep}}\) column, for which the number of separation problems solved with branching is significantly higher than without branching. Moreover, the time \(t_{\textsc {Sep}}\) and the number of P-SEP \(n_{\textsc {Sep}}\) are significantly higher when the algorithm fails to find a good solution (gap \(>2\%\)). This reflects the algorithm's struggle to improve the solution and the associated increase in running time. As is often the case for SPP, clique cuts are more efficient and easier to find. Furthermore, as seen in Sect. 4.1, the requirements for an odd-cycle cut to be a primal cut are harder to meet (alternating cycles) than those that apply to clique cuts (only one vertex of the clique must be in the current solution); this also makes them rarer.

Definition 4

A solution \(\varvec{x}^0\) is \(\iota \)-optimal if it is optimal for the restriction of SPP to \(\mathcal {P} \cup \mathcal {I}^{\iota }\), i.e., for the set of all at most \(\iota \)-incompatible columns of \(\varvec{A}\) computed at \(\varvec{x}^0\).

Table 6 Cutting planes behavior of Isud on sppaa01
Table 7 Cutting planes behavior of Isud on sppaa04

Detailed results of Isud on vcs1200 and vcs1600 are displayed in Tables 8 and 9. Branching did not change the results; hence, the figures shown here correspond to the Nobr strategy, and the time limit of half an hour was never reached. The first column shows the perturbation degree \(\pi \), and the second the cutting plane strategy \(\textsc {Algo}{}\). The next four display the number of instances solved to optimality (\(0\%\)), with a positive gap \(\le 2\%\) and with a gap \(>2\%\), as well as the mean gap computed over instances where solutions contained no more artificial columns (mean). Columns \(t_{\textsc {Isud}{}}\) and \(t_{\textsc {Best}}\) indicate the total running time and the time spent before reaching the best solution found, respectively. Then, the number of instances for which \(\iota \)-optimality has been proved is given for \(\iota = 7\) and \(\iota = 8\), and finally the mean sizes of the disjoint (\(|\mathcal {S}|^D\)) and nondisjoint (\(|\mathcal {S}|^N\)) combinations are shown.

Table 8 Results for instance vcs1200
Table 9 Results for instance vcs1600

In the experiments on vcs1200 and vcs1600, we chose a lower value for the maximum incompatibility number (\(\mathrm {INC\_MAX}= 8\)) than for the smaller problems; this choice is motivated by several reasons. First, it limits the running time of the algorithm. Second, this number is already high compared to what swap heuristics can consider. As explained in Sect. 3.1.2, the incompatibility degree of \(\varvec{A}_{.j}\) is proportional to the number of sequences of consecutive tasks from \(\varvec{A}_{.j}\) that are not performed consecutively in the current solution. Hence, it is a kind of measure of the primal distance between a column and the current solution. It can be compared to the typical parameter of a swap heuristic that swaps parts of the current solution to form new individual schedules, namely the number of times an individual schedule of the current solution may be split. Nonbasic columns for which \(\iota =8\) are made of at least 4 separate sequences of tasks performed consecutively in the current solution (there is no maximum number); in practice, this number of sequences usually ranges between 6 and 7. For an average of 40 nonzero entries per column, this value therefore seems reasonable. In particular, it is much higher than the typical number of splits allowed in the existing schedules of a swap heuristic, and the neighborhood explored by our algorithm is hence significantly wider than that of a swap heuristic.

Furthermore, the numerical results show that, even when primal cuts yield no improvement in the objective, they prove the \(\iota \)-optimality of the solution in many cases, i.e., they show that the solution returned by Isud is optimal for the restriction of SPP to the set of at most \(\iota \)-incompatible columns (\(\mathcal {I}^\iota \)). This is especially true in the case of vcs1200, where 3 instances are proven 7-optimal without cutting planes, whereas 26 are with cutting planes.

Finally, the failure of the cutting planes to improve the results of Isud on vcs1200 and vcs1600 can be partly explained by the nature of these instances. Although they describe real-world problems, they are in fact large instances obtained from the column-generation solution of bus driver scheduling instances. All columns generated throughout the branch-and-price process are stored and put together to form a large-scale instance, the optimal solution of which is known. However, due to the intrinsic nature of that solution process, the density of the vertices of the resulting relaxed polyhedron (\(\mathcal {F}_{{\textsc {SPP{}}}^{\mathrm {LR}}}\)) tends to be higher near the optimal solution. Hence, if a wrong decision is made at the beginning, there is a high probability of moving to a region of the polyhedron where very few extreme integer neighbors, and thus few integer directions, exist. Getting back to an area where the density of integer neighbors is higher, only by traveling along integer-to-integer edges, may then become extremely complicated. In the context of all-integer column generation, i.e., alternately generating columns and improving the integer solution with Isud as mentioned in the introduction, this issue would disappear, because the dynamically generated columns increase the density of extreme points around the current solution, hence providing many more potential directions to the complementary problem.

6 Conclusions

In this work, we proposed a purely primal formulation of Isud and introduced cutting planes in the complementary problem CP. This method is guaranteed to reach optimality whenever an appropriate set of cutting planes is used (Gomory–Young, for example). We showed that the incompatibility degree defines a neighborhood of the current solution on which the augmentation problem is solved with integer programming techniques. In order to improve the practical numerical performance, we compromise by both heuristically increasing the neighborhood and stopping the algorithm prematurely. The efficiency of linear programming and of our separation algorithms (Algorithms 2 and 3) allows us to explore a larger neighborhood than a typical exchange heuristic would consider.

The work conducted on the mathematical formulation led us to characterize the set of cuts that can be transferred to CP as a nonempty subset of primal cuts tight at \(\varvec{x}^0\). We showed that, given a fractional direction \(\varvec{d}^\star \), a primal clique (or odd-cycle) cut exists if and only if there exists one that only involves variables of \( Supp \,\left( \varvec{d}^\star \right) \). Furthermore, in the case of primal odd-cycles, the separation over all variables is strictly equivalent to that over \( Supp \,\left( \varvec{d}^\star \right) \). Tests conducted over a benchmark of practice-like instances demonstrated the potential of our method and highlighted the importance of primal cutting planes in Isud.

This work should be extended in three directions. First, to gain a better understanding of our algorithm, a larger benchmark should be taken into consideration, and a combination of cutting planes and other techniques (such as those proposed in [23], for instance) must be tested over that larger set of problems. Second, other families of cutting planes should be considered, and the corresponding primal separation procedures must be developed. Last, the algorithm needs to be extended to more general {0,1}-programs, and not only to SPP instances. This last point seems to be the most important if the method is to be used in practice, since supplementary constraints are often added to set partitioning problems. All these extensions are active research subjects, and the present work will be extended in the near future. Of course, further extensions to mixed {0,1}-problems and to general mixed-integer ones would be very interesting but also harder, especially for the latter class, because the property that any feasible solution is a vertex of the polyhedron of the continuous relaxation no longer holds in general.