1 Introduction

Resource-constrained project scheduling problems (RCPSPs) seek to optimally schedule activities over time while complying with precedence and resource usage constraints. These problems can be notoriously difficult. Despite great progress in optimization methodologies during the last fifty years, there are instances of RCPSP involving as few as sixty activities that cannot be solved with today’s most effective algorithms [24]. In multi-modal extensions of RCPSP, jobs can be processed in different ways (or modes). Changing the mode of a job affects its resource utilization and its processing costs. In some applications, changing the mode of a job also changes the duration of its processing time. In batch-processing extensions of RCPSP, jobs are grouped into clusters that must be processed together. For simplicity of notation, we will henceforth refer to multi-modal batch resource-constrained project scheduling problems simply as General Production Scheduling Problems (GPSPs).

To date, the most effective methods for solving GPSPs, especially multi-modal and batch variants, are based on integer programming methods that use so-called time-index formulations [3, 7, 9, 39]. These formulations define a binary variable for each combination of job and execution time. An important limitation of these formulations is that their linear programming (LP) relaxations are very difficult to solve, because they tend to be large and degenerate, even for small problem instances.

While classical decomposition techniques and column generation approaches can be used to address some of these issues (especially complications related to the large number of variables), they often suffer from slow convergence rates. In this context, an important contribution was made by Bienstock and Zuckerberg [5, 6], who presented an effective alternative for tackling these limitations. They developed a new algorithm that can considerably outperform classical methods on a broad class of problems, thus providing a novel set of tools with high practical impact and a wide range of potential extensions.

In this paper we study the Bienstock–Zuckerberg (BZ) algorithm [5, 6] in depth, and find that it can be used to overcome the stated limitations on a wide range of scheduling problems. We provide evidence, additional to that in [5, 6], supporting the efficacy of the algorithm in practice, and we provide new insights that allow further extensions. Specifically, we study the BZ algorithm as an approach to batch, multi-modal production scheduling problems where jobs can have arbitrary durations, but where these durations are not affected by changes of mode. The application of the BZ algorithm to this class of problems requires us to extend the algorithmic template as it was originally proposed. We are specifically interested in this class of problems because, in addition to generalizing the well-known RCPSP, it generalizes three very important problems that arise in the context of mine planning: Underground Production Scheduling Problems (UPSPs), Open Pit Phase Design Problems (OPPDPs), and Open Pit Production Scheduling Problems (OPPSPs).

We present a new proof of algorithm correctness by casting the BZ algorithm as a column generation method, discuss algorithmic speed-ups, computationally test the algorithm on a variety of problem instances, and compare the methodology to the more traditional Dantzig–Wolfe decomposition (DW) method. As part of our analysis we prove that the BZ algorithm is closely related to a decomposition scheme that produces a bound between that of the DW method and that of the LP relaxation. This study allows us to conclude that the BZ algorithm is an effective way of solving the LP relaxation of large time-index formulations, significantly outperforming the DW method on all considered problem classes, and it provides insights into the effectiveness of the methodology.

This article is organized as follows. In Sect. 2 we present a literature review of related scheduling work in mine planning applications and mathematical programming methodologies. In Sect. 3 we present an integer programming model that generalizes the classes of problems we are interested in studying. In Sect. 4 we present a reformulation of this model that fits the algorithmic framework we will be analyzing. In Sect. 5 we present a general column generation framework that can be used to solve the problem. We use this column generation approach to motivate the BZ algorithm, and to facilitate a comparison to the DW method. We also introduce important speed-ups for the BZ algorithm. Computational results analyzing the performance of BZ and comparing it to that of DW are presented in Sect. 6.

2 Background

2.1 Scheduling applications in mine planning

In this article we are mainly interested in addressing scheduling problems that are of relevance to planning applications in the mining industry.

The class of mining problems we are interested in includes open pit and underground mine planning problems. In these problems, deposits are discretized into three-dimensional arrays known as block models. The problem to solve consists of deciding which blocks should be extracted, when they should be extracted, and what to do with the blocks once they are extracted. In this context, jobs correspond to extraction activities, modes correspond to different processing options, and batches correspond to groups of blocks that must be extracted concurrently in order to comply with equipment extraction requirements. Resource constraints are used to impose extraction, milling and refining capacity limits. In an open-pit mine, precedence constraints are used to impose that the deposit must be extracted from the surface downwards. In an underground mine, specifically in a stoping operation, precedence constraints are used to impose that selected blocks branch out from a network of tunnels descending from the surface.

In all of these mining problems the objective is to maximize the net present value of the extracted minerals. This differs from traditional scheduling problems, in which the objective is typically to minimize completion-time metrics such as makespan, maximum tardiness, or total throughput time. Another difference between mine planning problems and traditional scheduling problems is that in mining it is not a strict requirement to execute all jobs. In all other ways, the Underground Production Scheduling Problem (UPSP) is identical to the RCPSP. The Open Pit Phase Design Problem (OPPDP) is a multi-modal RCPSP in which all jobs take a single time period to complete, regardless of the selected mode; the mode affects only the objective function value and the resource consumption of each activity. The Open Pit Production Scheduling Problem (OPPSP) is just like the OPPDP, with the addition of batch constraints that force groups of blocks to be extracted simultaneously. These batch constraints are used to enforce equipment operability requirements, with each batch corresponding to a contiguous set of blocks typically called a bench-phase or increment in the mining industry.

For an introduction to optimization in underground mining see Alford et al. [1]. For related work in underground mining see Martinez and Newman [27], Newman and Kuchta [32] and O’Sullivan et al. [35, 36]. The user manuals of Deswik Scheduler [15] and MineMax iGantt [28] illustrate how UPSP is solved in practical mining applications. For an introduction to Open Pit Mining see Hustrulid and Kuchta [22], and the user manuals of Dassault Whittle [13], MineMax Scheduler [30], and MineMax Planner [29]. For a discussion on OPPSP see Goycoolea et al. [18]. For a general survey on OR applications in mine planning see Newman et al. [33].

2.2 Mathematical programming methodologies

To our knowledge, the first mathematical programming formulations of GPSPs date back to Johnson [23] and Pritsker et al. [38], in the late 1960s. Each of these articles spawned its own track of academic articles on production scheduling problems. The first track, following the work of Johnson, mostly focused on strategic open pit mine planning problems. The second track, following the work of Pritsker et al., took a more general approach. Surprisingly, though many of the results found in these two tracks coincide, there are few, if any, citations connecting the two literatures.

The academic literature on exact optimization methods for GPSPs is immensely rich. Well-known surveys from the scheduling track include those of Graham et al. [19], Brucker et al. [8] and Hartmann and Briskorn [20]. In a recent book by Artigues et al. [2] a very thorough survey of GPSP is presented, including a computational study that compares existing methodologies on benchmark instances. Surveys from the mining track include those of Newman et al. [33], Espinoza et al. [16] and Osanloo et al. [34].

A brief summary of advances on solving the LP relaxation of time-index formulations is as follows. Since as early as the 1960s, most methods have been based on some form of decomposition that reduces the problem to a sequence of maximum closure or minimum cut problems. Johnson [23], in the context of mining (single mode OPPDPs), was the first to attempt such an approach with a Dantzig–Wolfe decomposition. Shortly after, and independently, Fisher [17], in the context of scheduling, proposed a Lagrangian Relaxation decomposition approach. Since then, a number of decomposition algorithms, primarily Lagrangian Relaxation decomposition algorithms, have been proposed for the problem in both literatures. Important examples include Dagdelen and Johnson [11] and Lambert and Newman [26] in mining (single mode OPPDPs), and Christofides et al. [10] and Möhring et al. [31] in scheduling.

Some recent approaches are as follows. Chicoisne et al. [9] consider single-mode OPPDPs with a single renewable resource, and propose a decomposition algorithm that solves the continuous relaxation in polynomial time. Boland et al. [7] consider a multi-modal variant of the same problem with two renewable resources, and propose a decomposition algorithm for the continuous relaxation that, while not provably polynomial, is very effective in practice. The BZ algorithm [5], developed shortly after, can be considered a generalization of this last algorithm that extends to multi-modal OPPDPs with an arbitrary number of renewable or non-renewable resources. This algorithm proved to be very efficient, even for extremely large instances; it drastically reduced solving times and tackled instances that could not be solved before. Berthold et al. [3] developed a branch-and-bound algorithm that combines mathematical programming and constraint programming methods to close a large number of open benchmark instances of RCPSP. Zhu et al. [39] developed a mathematical programming based algorithm for solving multi-modal variants of RCPSP without batch constraints, but where mode changes affect duration times. In that work they close a large percentage of open benchmark instances.

3 Integer programming formulation

We now describe an integer programming formulation for a class of production scheduling problems that generalizes the RCPSP, UPSP, OPPDP and OPPSP. As mentioned before, we simply refer to this class of problems as the General Production Scheduling Problem (GPSP).

Sets

  • \(\mathscr {A}\): activities that must be scheduled in the problem.

  • \(\mathscr {C}\): clusters of activities that define a partition of \(\mathscr {A}\).

  • \({prec }(c)\): clusters that must be initiated no later than cluster \(c \in \mathscr {C}\).

  • \(\mathscr {R}= \{1,\ldots ,R\}\): resources that are consumed when carrying out activities.

  • \(\mathscr {T}= \{1,\ldots ,T\}\): time periods in which it is possible to initiate activities.

  • \(\mathscr {M}_a = \{1,\ldots , M_a\}\): possible modes for activity \(a \in \mathscr {A}\).

Parameters

  • \(t^-_a\): release date for activity \(a \in \mathscr {A}\).

  • \(t^+_a\): due date for activity \(a \in \mathscr {A}\).

  • \(d_{a,m}\): duration (number of time periods) of activity \(a \in \mathscr {A}\) when executed in mode \(m \in \mathscr {M}_a\).

  • \(p_{a,m,t}\): profit obtained if activity \(a \in \mathscr {A}\) is initiated in period \(t \in \mathscr {T}\) using mode \(m \in \mathscr {M}_a\).

  • \(l(c_1,c_2)\): lag (number of time periods) that must elapse before activities in cluster \(c_2 \in \mathscr {C}\) can be initiated, after activities in cluster \(c_1 \in {prec }(c_2)\) have been initiated.

  • \(Q_{r,t}\): amount of resource \(r \in \mathscr {R}\) available in time period \(t \in \mathscr {T}\).

  • \(q_{r,a,m}\): amount of resource \(r \in \mathscr {R}\) consumed by activity \(a \in \mathscr {A}\) in each time period it is being executed, when executed in mode \(m \in \mathscr {M}_a\).

Variables

$$\begin{aligned} x_{c,t}&= \left\{ \begin{array}{rl} 1 &{} \text { if the activities in cluster } c \text { all start in time period } t \\ 0 &{} \text { otherwise} \end{array}\right. \\ y_{a,m,t}&= \left\{ \begin{array}{rl} 1 &{} \text { if activity } a \text { starts in time period } t \text { using mode } m \\ 0 &{} \text { otherwise} \end{array}\right. \\ \end{aligned}$$

Objective function

$$\begin{aligned} \text {maximize } \sum _{a \in \mathscr {A}} \sum _{m \in \mathscr {M}_a} \sum _{t \in \mathscr {T}} p_{a,m,t} y_{a,m,t} \end{aligned}$$
(1)

Note that we express the objective function only in terms of the y variables. There is no loss of generality in this due to constraints (3), below.

Constraints 

Clusters can only be initiated once over the time horizon [ \(\forall c \in \mathscr {C}\) ]:

$$\begin{aligned} \sum \limits _{t \in \mathscr {T}} x_{c,t} \le 1. \end{aligned}$$
(2)

Activities in a cluster must start simultaneously, and must be carried out using a unique mode [ \(\forall c \in \mathscr {C}, \; \forall a \in c, \; \forall t \in \mathscr {T}\) ]:

$$\begin{aligned} x_{c,t} = \sum \limits _{m \in \mathscr {M}_a} y_{a,m,t}. \end{aligned}$$
(3)

In order to initiate the activities in a cluster, all activities in preceding clusters must be initiated early enough so as to satisfy the lag requirement \(\;\; [ \forall c_2 \in \mathscr {C}, \; \forall c_1 \in {prec }(c_2), \; \forall t \in \mathscr {T}]\):

$$\begin{aligned} \sum \limits _{s \le t} x_{c_2,s} \le \sum \limits _{s \le t - l(c_1,c_2) } x_{c_1,s}. \end{aligned}$$
(4)

The amount of resources consumed cannot exceed the amount of resources available each period [ \(\forall r \in \mathscr {R}, \; t \in \mathscr {T}\) ]:

$$\begin{aligned} \sum \limits _{a \in \mathscr {A}} \sum \limits _{m \in \mathscr {M}_a} q_{r,a,m} \sum \limits _{s=\max \{1,t-d_{a,m}+1\}}^{t} y_{a,m,s} \le Q_{r,t}. \end{aligned}$$
(5)
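For intuition, the inner sum in (5) collects every start period s for which an activity started at s would still be executing at time t; that is, s is counted exactly when \(\max \{1, t-d_{a,m}+1\} \le s \le t\). The following minimal sketch computes this window test and the left-hand side of (5) for a fixed resource; the names and dictionary-based data layout are illustrative, not from the original:

```python
def active(s, t, d):
    """An activity of duration d started in period s is still being
    executed in period t iff max(1, t - d + 1) <= s <= t."""
    return max(1, t - d + 1) <= s <= t

def lhs_resource(y, q, d, t):
    """Left-hand side of constraint (5) for one resource r in period t.

    y : dict (a, m, s) -> 0/1 start decision y_{a,m,s}
    q : dict (a, m)    -> per-period consumption q_{r,a,m}
    d : dict (a, m)    -> duration d_{a,m}
    """
    return sum(q[a, m] * y[a, m, s]
               for (a, m, s) in y
               if active(s, t, d[a, m]))
```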

Activities can only be scheduled after their release dates [ \(\forall a \in \mathscr {A}\) ]:

$$\begin{aligned} \sum \limits _{m \in \mathscr {M}_a} \sum \limits _{t = 1}^{t^-_a - 1} y_{a,m,t} = 0. \end{aligned}$$
(6)

Activities must be terminated no later than their due dates, and exactly one mode must be chosen for each executed activity [ \(\forall a \in \mathscr {A}: t^+_a < \infty \) ]:

$$\begin{aligned} \sum \limits _{m \in \mathscr {M}_a} \sum \limits _{t=1}^{t^+_a - d_{a,m} + 1} y_{a,m,t} = 1. \end{aligned}$$
(7)

Whenever \(t^+_a = \infty \), there is no due date for activity a and we do not include constraint (7). Note, however, that due to (2) and (3) we always have:

$$\begin{aligned} \sum \limits _{m \in \mathscr {M}_a} \sum \limits _{t \in \mathscr {T}} y_{a,m,t} \le 1. \end{aligned}$$

Thus, even when \(t^+_a = \infty \), there is an implicit constraint enforcing the choice of one mode and one time period at most.

It should be noted that the GPSP described by Formulation (1)–(7) generalizes the scheduling formulations found in Bienstock and Zuckerberg [6] in that it allows activities to have durations that span multiple time periods. This allows us to consider instances of RCPSP and UPSP which fell outside the scope of the Bienstock and Zuckerberg studies [5, 6].

Though this formulation is intuitive, and also the most commonly used formulation in the academic literature, it must be reformulated so as to fit the algorithmic schemes that will be presented.

4 Reformulation

In this section we present a reformulation of the GPSP formulation described in Sect. 3. As we will see in Sect. 5, this reformulation is key for the decomposition algorithms that we will discuss in this article.

For each \(c \in \mathscr {C}, a \in \mathscr {A}, m \in \mathscr {M}_a\) and \(t \in \mathscr {T}\) define,

$$\begin{aligned} w_{c,t} = \sum \limits _{s=1}^t x_{c,s} \qquad \text { and } \qquad z_{a,m,t} = \sum \limits _{i \in \mathscr {M}_a} \sum \limits _{s=1}^{t-1} y_{a,i,s} + \sum \limits _{i=1}^m y_{a,i,t}. \end{aligned}$$

In this way, \(w_{c,t}\) is a binary variable that takes value one if and only if cluster c is initiated “by” time t (i.e., no later than t). Likewise, \(z_{a,m,t}\) is a binary variable that takes value one if and only if activity a is initiated by time \(t-1\), or is initiated in time period t using a mode i with \(i \le m\).
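As a concrete check of these definitions, the cumulative variables can be computed directly from the start variables. The following minimal sketch does this for a single cluster c and a single activity a (all names are illustrative):

```python
def to_cumulative(x, y, T, modes):
    """Map start variables to the cumulative variables of the reformulation.

    x     : dict t -> x_{c,t}        (cluster start indicators)
    y     : dict (m, t) -> y_{a,m,t} (activity start indicators)
    modes : the ordered mode set [1, ..., M_a]
    Returns w (dict t -> w_{c,t}) and z (dict (m, t) -> z_{a,m,t}).
    """
    w = {t: sum(x[s] for s in range(1, t + 1)) for t in range(1, T + 1)}
    z = {}
    for t in range(1, T + 1):
        # all starts strictly before t, in any mode
        before = sum(y[i, s] for i in modes for s in range(1, t))
        for m in modes:
            # plus starts at t in a mode i <= m
            z[m, t] = before + sum(y[i, t] for i in modes if i <= m)
    return w, z
```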

To see that it is possible to formulate the production scheduling problem given by (1)–(7) using the (z, w) variables, note that we can map between the two variable spaces by means of the following linear sets of equations:

$$\begin{aligned} y_{a,m,t}&= z_{a,m,t} - z_{a,m-1,t}&\qquad \forall a \in \mathscr {A}, \; m = 2, \ldots , M_a, \; t \in \mathscr {T}\end{aligned}$$
(8a)
$$\begin{aligned} y_{a,1,t}&= z_{a,1,t} - z_{a,M_a,t-1}&\qquad \forall a \in \mathscr {A}, \; t = 2, \ldots , T \end{aligned}$$
(8b)
$$\begin{aligned} y_{a,1,1}&= z_{a,1,1}&\qquad \forall a \in \mathscr {A}\end{aligned}$$
(8c)
$$\begin{aligned} x_{c,t}&= w_{c,t} - w_{c,t-1}&\qquad \forall c \in \mathscr {C}, \; \forall t = 2, \ldots , T \end{aligned}$$
(8d)
$$\begin{aligned} x_{c,1}&= w_{c,1}&\qquad \forall c \in \mathscr {C}\end{aligned}$$
(8e)

Using this mapping we can substitute out the variables in (1)–(7), to trivially obtain the following equivalent formulation:

Objective function

$$\begin{aligned} \text {max } \sum _{a \in \mathscr {A}} \sum _{m \in \mathscr {M}_a} \sum _{t \in \mathscr {T}} \tilde{p}_{a,m,t} z_{a,m,t} \end{aligned}$$
(9)

where,

$$\begin{aligned} \tilde{p}_{a,m,t} = \left\{ \begin{array}{ll} p_{a,m,t} - p_{a,m+1,t} &{} \text { if } m< M_a \\ p_{a,m,t} - p_{a,1,t+1} &{} \text { if } m = M_a, \; t < T \\ p_{a,m,t} &{} \text { if } m = M_a, \; t = T \end{array} \right. \end{aligned}$$

Constraints 

For \(a \in \mathscr {A}, \; m \in \{1,\ldots ,M_a-1\}, \; t \in \mathscr {T}\):

$$\begin{aligned} z_{a,m,t} \le z_{a,m+1,t}. \end{aligned}$$
(10)

For \(a \in \mathscr {A}, \; m = M_a, \; t \in \{1,\ldots ,T-1\}\):

$$\begin{aligned} z_{a,m,t} \le z_{a,1,t+1}. \end{aligned}$$
(11)

For \(c \in \mathscr {C}, \; a \in c, \; t \in \mathscr {T}\),

$$\begin{aligned} w_{c,t} = z_{a,M_a,t}. \end{aligned}$$
(12)

For \(c_2 \in \mathscr {C}, \; c_1 \in {prec }(c_2), \; t \in \{1+l(c_1,c_2),\ldots ,T\}\),

$$\begin{aligned} w_{c_2,t} \le w_{c_1,t-l(c_1,c_2)}. \end{aligned}$$
(13)

For \(r \in \mathscr {R}, \; t \in \mathscr {T}\),

$$\begin{aligned} \sum \limits _{a \in \mathscr {A}} \sum \limits _{m \in \mathscr {M}_a} \sum \limits _{s\in \mathscr {T}} \tilde{q}^{r,t}_{a,m,s} z_{a,m,s} \le Q_{r,t}. \end{aligned}$$
(14)

where

$$\begin{aligned} \tilde{q}^{r,t}_{a,m,s} = \left\{ \begin{array}{ll} q_{r,a,m} \mathbf {1}_{[t-d_{a,m}+1,t]}(s) - q_{r,a,m+1} \mathbf {1}_{[t-d_{a,m+1}+1,t]}(s) &{} \text { if } m < M_a \\ q_{r,a,M_a} \mathbf {1}_{[t-d_{a,M_a}+1,t]}(s) - q_{r,a,1} \mathbf {1}_{[t-d_{a,1},t-1]}(s) &{} \text { if } m = M_a \end{array} \right. \end{aligned}$$

and,

$$\begin{aligned} \mathbf {1}_{[x,y]}(s) = \left\{ \begin{array}{ll} 1 &{} \text { if } x \le s \le y \\ 0 &{} \text { otherwise.} \end{array}\right. \end{aligned}$$

(The derivation of this last constraint from (5) can be found in “Appendix A”).

For \(a \in \mathscr {A}, \; m \in \mathscr {M}_a, \; t < t^-_a\):

$$\begin{aligned} z_{a,m,t} = 0. \end{aligned}$$
(15)

For \(a \in \mathscr {A}, \; m \in \mathscr {M}_a, \; t \ge t^+_a\):

$$\begin{aligned} z_{a,m,t} = 1. \end{aligned}$$
(16)

After the reformulation, if we substitute out the w variables by using linear equalities (12), we obtain what Bienstock and Zuckerberg [6] call a General Precedence Constrained Problem (GPCP):

$$\begin{aligned} Z^* =&\max \quad c'z \end{aligned}$$
(17)
$$\begin{aligned}&\text{ s.t. } \quad z_i \le z_j \qquad \forall (i,j) \in I, \end{aligned}$$
(18)
$$\begin{aligned}&Hz \le h, \end{aligned}$$
(19)
$$\begin{aligned}&z \in \{0,1\}^n \end{aligned}$$
(20)

In this problem constraints (18) correspond to constraints (10), (11), and (13), and constraints (19)–(20) correspond to constraints (14), (15) and (16).

It is interesting that, although the General Production Scheduling Problem (GPSP) formulated in Sect. 3 is more general than the scheduling problem formulations presented by Bienstock and Zuckerberg [5], both problems can be reformulated as instances of GPCP.

5 Methodology

In this section we describe a generalized version of the Bienstock–Zuckerberg (BZ) algorithm, originally introduced in [5, 6]. More specifically, we describe a decomposition algorithm well suited for solving the LP relaxation of large mixed integer programming problems having form,

$$\begin{aligned} \begin{array}{rl} Z^{IP} = \max \ \; &{} c'z + d'u \\ \text{ s.t. } \ \; &{} z_i \le z_j, \qquad \forall (i,j) \in I, \\ &{} Hz + Gu \le h, \\ &{} z \in \{0,1\}^n. \end{array} \end{aligned}$$
(21)

Like Bienstock and Zuckerberg [6] before us, we will call this problem the General Precedence Constrained Problem (GPCP). It should be noted, however, that in our definition of GPCP we consider the presence of the extra u variables. These variables can be used for other modeling purposes. For example, they can be used to model variable capacities, or, in the context of mining problems, they can be used to model the use of stockpiles or other requirements.

Our presentation of the BZ algorithm differs from the original presentation in two respects: first, we cast it as a column generation method, and second, as stated before, we consider the presence of extra variables [the u variables in (21)]. These differences require a new proof of algorithm correctness, which we present below. The advantage of presenting the algorithm this way is that it is easier to compare to existing decomposition algorithms, and thus, in our opinion, becomes easier to understand and extend. It also opens up the possibility of developing new classes of algorithms that might be useful in other contexts.

Before we present the BZ algorithm, we present a generic column generation (GCG) algorithm that can be used as a framework for understanding the BZ algorithm. This column generation scheme can be used to solve a relaxation of mixed integer problems having form:

$$\begin{aligned} \begin{array}{rl} Z^{IP} = \max \ \; &{} c'z + d'u \\ \text{ s.t. } \ \; &{} Az \le b, \\ &{} Hz + Gu \le h, \\ &{} z \in \{0,1\}^n. \end{array} \end{aligned}$$
(22)

Much like the DW algorithm, this relaxation can yield bounds that are tighter than those obtained by solving the LP relaxation of the same problem. Throughout this section we will assume, without loss of generality, that \(Az \le b\) includes \(0 \le z \le 1\). Other than this, we make no additional assumptions on problem structure.

5.1 A general column generation algorithm

In this section we introduce a General Column Generation (GCG) algorithm that will later motivate the BZ algorithm. This column generation algorithm is presented as a decomposition scheme for computing upper bounds of (22) that are no weaker than those provided by the LP relaxation.

Given a set \(S \subseteq \mathbb {R}^n\), let lin.hull(S) denote the linear space spanned by S. That is, let lin.hull(S) represent the smallest linear space containing S. Let \(P = \{ z \in \{0,1\}^n: Az \le b \}\). Define problem,

$$\begin{aligned} \begin{array}{rl} Z^{LIN} = \max \ \; &{} c'z + d'u \\ \text{ s.t. } \ \;&{} Az \le b, \\ &{} Hz + Gu \le h, \\ &{} z \in {\mathtt{lin.hull}}(P), \end{array} \end{aligned}$$
(23)

The GCG algorithm that we present computes a value \(Z^{UB}\) such that \(Z^{IP} \le Z^{UB} \le Z^{LIN}\). Observe that the optimal value \(Z^{LIN}\) of problem (23) is such that \(Z^{IP} \le Z^{LIN} \le Z^{LP}\), where \(Z^{LP}\) is the value of the linear relaxation of problem (22).

The key to solving this problem is the observation that

$$\begin{aligned} {\mathtt{lin.hull}}(P) = \{ z : z = \sum \limits _{i = 1}^d \lambda _i v^i, \; \text { for some } \lambda \in \mathbb {R}^d \}, \end{aligned}$$
(24)

where \(\{v^1,\ldots ,v^d\}\) is a basis for lin.hull(P). If V is a matrix whose columns \(\{v^1,\ldots ,v^d\}\) define a basis of lin.hull(P), it follows that problem (23) is equivalent to solving,

$$\begin{aligned} \begin{array}{llll} Z(V) = &{} \max \ &{} c'V \lambda + d'u &{} \\ &{} \text{ s.t. } \ &{} A V \lambda \le b &{} (\alpha ) \\ &{}&{} H V \lambda + G u \le h &{} (\pi ). \end{array} \end{aligned}$$
(25)

In order for the optimal solution of (25) to be optimal for (23), it is not necessary for the columns of V to define a basis of lin.hull(P). It suffices for the columns of V to span a subspace of lin.hull(P) containing an optimal solution of (23). This weaker condition is what the column generation scheme seeks to achieve. That is, starting with a matrix V with columns spanning at least one feasible solution of (23), the algorithm iteratively computes new columns until it computes the optimal value \(Z^{LIN}\). As we will see, in some cases it is possible for the algorithm to terminate before having computed \(Z^{LIN}\). In these cases the algorithm will terminate having computed a value \(Z^{UB}\) such that \(Z^{UB} \le Z^{LIN}\).

Henceforth, let \(\alpha , \pi \) denote the dual variables corresponding to the constraints of problem (25). To simplify notation, we will henceforth assume that \(\alpha \) and \(\pi \) are always row vectors. Note that \(\alpha , \pi \ge 0\).

To solve problem (23) with column generation we begin each iteration with a set of points \(\{v^1, \ldots , v^k\}\) such that lin.hull \((\{v^1, \ldots , v^k\}) \subseteq \) lin.hull(P) contains at least one feasible solution of problem (23). At each iteration we add a linearly independent point \(v^{k+1}\) to this set, until we can prove that we have generated enough points so as to generate the optimal solution of (23).

The first step of each iteration consists in solving problem (25), with a matrix \(V^k\) having columns \(\{v^1, \ldots , v^k\}\). Let \(Z^L_k = Z(V^k)\). Let \((\lambda ^k, u^k)\) and \((\alpha ^k, \pi ^k)\) be the corresponding optimal primal and dual solutions. Note that \(Z^L_k = \alpha ^k b + \pi ^k h\).

Define \(z^k = V^k \lambda ^k\). It is clear that \((z^k, u^k)\) is feasible for (23); hence, \(Z^L_k \le Z^{LIN}\). However, is \((z^k, u^k)\) optimal for (23)? To answer this question, observe that \((\alpha ,\pi )\) is dual-feasible for problem (25) if and only if \(\pi G = d'\) and,

$$\begin{aligned} \bar{c}(v, \alpha , \pi ) \equiv c' v - \alpha Av - \pi H v = 0, \qquad \forall v \in V. \end{aligned}$$

Let \(V^P\) represent a matrix whose columns \(\{v^1,\ldots ,v^{\small |P|}\}\) coincide with set P. Since \(V^P\) is clearly a generator of lin.hull(P), we know that the optimal solution of (25) (for \(V^P\)) corresponds to an optimal solution of (23). Thus, if \(\bar{c}(v, \alpha ^k, \pi ^k) = 0\) for all \(v \in V^P\) (or equivalently, \(v \in P\)), we conclude that \((z^k, u^k)\) is optimal for (23), since we already know that \(\pi ^k G = d'\). On the other hand, if \((z^k, u^k)\) is not optimal for (23), we conclude that there must exist \(v \in P\) such that \(\bar{c}(v, \alpha ^k, \pi ^k) \ne 0\).

This suggests how the column generation scheme for computing an optimal solution of (23) should work. Solve (25) with \(V^k\) to obtain primal and dual solutions \((z^k, u^k)\) and \((\alpha ^k, \pi ^k)\). In keeping with traditional column generation schemes, we refer to problem (25) as the restricted master problem. If \(\bar{c}(v, \alpha ^k, \pi ^k) = 0\) for all \(v \in P\), then \((z^k,u^k)\) is an optimal solution of (23). If this condition does not hold, let \(v^{k+1} \in P\) be such that \(\bar{c}(v^{k+1}, \alpha ^k, \pi ^k) \not = 0\). Note that \(v^{k+1}\) must be linearly independent of the columns in \(V^k\). In fact, every column v of \(V^k\) must satisfy \(\bar{c} (v, \alpha ^k, \pi ^k) = 0\), hence so must any vector that can be generated with this set. We refer to the problem of computing a vector \(v^{k+1}\) with \(\bar{c}(v^{k+1}, \alpha ^k, \pi ^k) \not = 0\) as the pricing problem. Obtain a new matrix \(V^{k+1}\) by adding column \(v^{k+1}\) to \(V^k\), and repeat the process. Given that the rank of \(V^k\) increases strictly with each iteration, the algorithm must terminate after finitely many iterations.

The pricing problem can be tackled by solving:

$$\begin{aligned} \begin{array}{llll} L(\pi ) = &{} \max \ &{} c' v - \pi (Hv - h) &{} \\ &{} \text{ s.t. } \ &{} Av \le b, &{} \\ &{}&{} v \in \{0,1\}^n &{} \end{array} \end{aligned}$$
(26)

For this, at each iteration compute \(L(\pi ^k)\), and let \(\hat{v}\) be the corresponding optimal solution. Observe that \(L(\pi ^k) \ge Z^{IP}\) for all \(k \ge 1\). In fact, since the u variables are free in problem (25), and since \((\alpha ^k,\pi ^k)\) is dual-feasible in problem (25), we have that \(d' - \pi ^k G = 0\). This implies that problem (26), for \(\pi ^k\), is equivalent to

$$\begin{aligned} \begin{array}{llll} &{} \max \ &{} c' v + d' u - \pi ^k(Hv + G u - h) &{} \\ &{} \text{ s.t. } \ &{} Av \le b, &{} \\ &{}&{} v \in \{0,1\}^n. &{} \end{array} \end{aligned}$$
(27)

Given that \(\pi ^k \ge 0\), problem (27) is a relaxation of problem (22), obtained by penalizing constraints \(Hz + Gu \le h\). Hence \(L(\pi ^k) \ge Z^{IP}\) \(\forall k\ \ge 1\).

Define \(Z^U_k = \min \{ L(\pi ^i) : i = 1,\ldots , k \}\) and note that \(Z^U_k \ge Z^{IP}\), by the argument above. If \(Z^U_k \le Z^L_k\), we can define \(Z^{UB} = Z^U_k\), and terminate the algorithm, as we will have that \(Z^{IP} \le Z^{UB} \le Z^{LIN}\). In fact, since \(Z_k^L \le Z^{LIN}\), we have \(Z^{IP} \le Z^{UB} = Z^U_k \le Z^L_k \le Z^{LIN}\).

On the other hand, if \(Z^U_k > Z^L_k\), then the optimal solution \(\hat{v}\) of (26) is such that \(\bar{c}(\hat{v}, \alpha ^k, \pi ^k) \ne 0\), so we obtain an entering column by defining \(v^{k+1} = \hat{v}\). To see this, note that

$$\begin{aligned} Z^U_k - Z^L_k> 0&\Leftrightarrow c'\hat{v} - \pi ^k (H\hat{v} - h) - \alpha ^k b - \pi ^k h> 0 \\&\Leftrightarrow c'\hat{v} - \pi ^k H\hat{v} - \alpha ^k b> 0 \\&\Rightarrow c' \hat{v} - \pi ^k H \hat{v} - \alpha ^k A \hat{v}> 0 \quad (\mathrm{since } \alpha ^k b \ge \alpha ^k A \hat{v} ) \\&\Rightarrow \bar{c}(\hat{v}, \alpha ^k, \pi ^k) > 0. \end{aligned}$$

Observe that, to be precise, we do not need to know a starting feasible solution in order to solve problem (23). In fact, if we do not have a feasible solution, we can add a variable to the right-hand-side of each constraint \(Hz + Gu \le h\), and heavily penalize it to proceed as in a Phase-I simplex algorithm (see [4]). The first iteration only needs a solution \(v^1\) such that \(Av^1 \le b\).
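Putting the pieces together, the GCG scheme reduces to a short loop. The sketch below is only a schematic rendering of this section; `solve_master` and `solve_pricing` are hypothetical callbacks standing in for the solvers of the restricted master (25) and of the pricing problem (26), respectively:

```python
def gcg(V, solve_master, solve_pricing, tol=1e-6):
    """Generic column generation (a sketch of Sect. 5.1).

    V grows by one linearly independent column per iteration; the loop
    stops when the best pricing bound Z_U crosses the master bound Z_L,
    returning a value Z_UB with Z_IP <= Z_UB <= Z_LIN.
    """
    Z_U = float("inf")
    while True:
        Z_L, z, u, alpha, pi = solve_master(V)  # restricted master (25)
        L_pi, v_hat = solve_pricing(pi)         # Lagrangian pricing (26)
        Z_U = min(Z_U, L_pi)                    # Z_U_k = min_i L(pi^i)
        if Z_U <= Z_L + tol:                    # early termination condition
            return Z_U, z, u
        V.append(v_hat)                         # cbar(v_hat) != 0: new column
```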

5.2 Comparison to the Dantzig–Wolfe decomposition algorithm

When solving problems with form (22) a common technique is to use the Dantzig–Wolfe decomposition [12] to compute relaxation values. The Dantzig–Wolfe decomposition (DW) algorithm is very similar to the General Column Generation (GCG) algorithm presented in the previous section. In fact, the DW algorithm computes the optimal value of problem,

$$\begin{aligned} \begin{array}{rl} Z^{DW} = \max \ &{} c'z + d'u \\ \text{ s.t. } \ &{} Az \le b, \\ &{} Hz + Gu \le h, \\ &{} z \in {\mathtt{conv.hull}}(P). \end{array} \end{aligned}$$
(28)

Given that conv.hull \((P) \subseteq \{ z : Az \le b \}\), this problem is typically written as,

$$\begin{aligned} \begin{array}{rl} Z^{DW} = \max \ &{} c'z + d'u \\ \text{ s.t. } \ &{} z \in {\mathtt{conv.hull}}(P), \\ &{} Hz + Gu \le h, \end{array} \end{aligned}$$
(29)

or equivalently, as

$$\begin{aligned} \begin{array}{rl} Z^{DW} = \max \ &{} c'V \lambda + d'u \\ \text{ s.t. } \ &{} \lambda \cdot 1 = 1, \\ &{} H V \lambda + Gu \le h, \\ &{} \lambda \ge 0. \end{array} \end{aligned}$$
(30)

In this formulation, the columns \(\{v^1,\ldots ,v^k\}\) of V correspond to the extreme points of conv.hull(P).

Given that conv.hull \((P) \subseteq \) lin.hull(P), it follows that \(Z^{IP} \le Z^{DW} \le Z^{LIN} \le Z^{LP}\). When \(\mathtt{conv.hull}(P) = \{ z : Az \le b \}\) it is easy to see that \(Z^{DW} = Z^{LIN} = Z^{LP}\). However, the following example shows that all of these inequalities can also be strict:

$$\begin{aligned} \begin{array}{rl} Z^{IP} = \max \ &{} -x_1 + 2x_2 + x_3 \\ \text{ s.t. } \ &{} -x_1 + x_2 \le 0.5, \\ &{} x_1 + x_2 \le 1.5, \\ &{} x_3 \le 0.5, \\ &{} x \in \{0,1\}^3. \end{array} \end{aligned}$$
(31)

By decomposing such that \(Hz + Gu \le h\) corresponds to \(-x_1 + x_2 \le 0.5\) we get,

$$\begin{aligned} \{ z : Az \le b \} = \{ z \in [0,1]^3 : x_1 + x_2 \le 1.5, \; x_3 \le 0.5 \}, \end{aligned}$$

and,

$$\begin{aligned} P = \{ z \in \{0,1\}^n : Az \le b \} = \{ (0,0,0), (0,1,0),(1,0,0) \}, \end{aligned}$$

and so,

$$\begin{aligned} {\mathtt{lin.hull}}(P)&= \{ (x_1,x_2,x_3) : x_3 = 0 \}, \\ {\mathtt{conv.hull}}(P)&= \{ (x_1,x_2,x_3) : 0 \le x_1, \;0\le x_2, \; x_1+x_2 \le 1, \; x_3 = 0 \}. \end{aligned}$$

From this it can be verified that \(Z^{IP} = 0.0\), \(Z^{DW} = 1.25\), \(Z^{LIN} = 1.5\), and \(Z^{LP} = 2.0\).
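These values are easy to verify numerically. A minimal sketch using `scipy.optimize.linprog` (which minimizes, hence the sign flips) reproduces all four bounds of the example:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, 2.0, 1.0])
A = np.array([[-1, 1, 0],    # -x1 + x2 <= 0.5  (the penalized H-block)
              [ 1, 1, 0],    #  x1 + x2 <= 1.5
              [ 0, 0, 1]])   #       x3 <= 0.5
b = np.array([0.5, 1.5, 0.5])

# Z_IP = 0.0: enumerate all of {0,1}^3.
print(max(c @ x for x in map(np.array, itertools.product([0, 1], repeat=3))
          if (A @ x <= b).all()))

# Z_LP = 2.0: drop integrality.
print(-linprog(-c, A_ub=A, b_ub=b, bounds=[(0, 1)] * 3).fun)

# Z_LIN = 1.5: all constraints, restricted to lin.hull(P) = {x : x3 = 0}.
print(-linprog(-c, A_ub=A, b_ub=b, bounds=[(0, 1), (0, 1), (0, 0)]).fun)

# Z_DW = 1.25: -x1 + x2 <= 0.5 over conv.hull(P) = {x >= 0, x1 + x2 <= 1, x3 = 0}.
A_dw = np.array([[-1, 1, 0], [1, 1, 0]])
b_dw = np.array([0.5, 1.0])
print(-linprog(-c, A_ub=A_dw, b_ub=b_dw,
               bounds=[(0, 1), (0, 1), (0, 0)]).fun)
```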

The DW algorithm is formally presented in Algorithm 1. The proof of correctness is strictly analogous to the proof presented in Sect. 5.1 for the generic column generation algorithm.

Algorithm 1 (figure omitted)

Observe that the DW pricing problem [see (35)] is exactly the same as the GCG pricing problem. However, since optimizing over \(\{ v: Av \le b, \; v \in \{0,1\}^n \}\) is exactly the same as optimizing over conv.hull(P), at every iteration \(j \ge 1\) we will have \(L(\pi ^j) \ge Z^{DW}\). Thus, the bounds will never cross as they might in GCG, and there is no early termination condition.

The main difference between the algorithms is in the master problem, where both solve very different linear models. One would expect that solving the master problem for the DW decomposition method would be much faster. In fact, let \(r_1\) and \(r_2\) represent the number of rows in the A and H matrices, respectively. The DW master problem has \(r_2 + 1\) rows, compared to the \(r_1 + r_2\) rows in the GCG master problem. However, this is only the case because of the way in which problem (22) is presented. If the first system of constraints, \(Az \le b\), instead has form \(Az = b\), the algorithm can be modified to preserve this property. This equality form version of GCG, which we call GCG-EQ, is presented in “Appendix B”.

The discussion above suggests that for the GCG algorithm to outperform the DW algorithm, it would have to perform fewer iterations. It is natural to expect this to happen. After all, given a common set of columns, the feasible region of the GCG master problem is considerably larger than the corresponding feasible region of the DW master problem. That is, the dual solutions obtained by solving the GCG master problem should be “better” than those produced by the DW master problem, driving the pricing problem to find the right columns more quickly throughout the iterative process. This improved performance, of course, would only pay off on problems where the DW algorithm has a tough time converging, and could come at the cost of a worse upper bound on the underlying integer programming problem. In fact, such slow convergence is exactly what happens in General Production Scheduling Problems (GPSPs), and is what motivates the BZ algorithm in the first place.

In both the GCG and DW algorithms, the master problem grows by one column per iteration. At some point the number of columns could grow so large that solving the master problem becomes unmanageable. An interesting feature of the DW algorithm is that this can be managed in practice by removing columns that are not strictly necessary in some iterations. The key is the following Lemma, which is a direct result of well-known linear programming theory (Bertsimas and Tsitsiklis [4]).

Lemma 1

Let \((\lambda ^*, u^*)\) represent an optimal basic solution of problem

$$\begin{aligned} \begin{array}{llll} Z(V) =&{} \max \ &{} c' V \lambda + d' u &{} \\ &{} \text{ s.t. } \ &{} 1 \cdot \lambda = 1 &{} \\ &{}&{} H V \lambda + G u \le h &{} \\ &{}&{} \lambda \ge 0. &{} \end{array} \end{aligned}$$

Then, \(|\{ i : \lambda _i^* \ne 0 \}| \le r_2+1\), where \(r_2\) is the number of rows in matrix H.

In fact, what this Lemma says is that after m iterations, no matter how large m is, we can always choose to remove all but \(r_2 + 1\) columns, thus making the problem smaller. As a means of ensuring that the resulting column generation algorithm does not cycle, a reasonable technique is to remove columns only after the primal bound in the DW algorithm changes, that is, only in those iterations k such that \(Z^L_k > Z^L_{k-1}\).
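A minimal sketch of this column-management rule (the names and data layout are illustrative):

```python
def manage_dw_columns(V, lam, Z_L, Z_L_prev, r2, tol=1e-9):
    """Prune DW columns only in iterations where the primal bound strictly
    improved; in an optimal *basic* solution at most r2 + 1 of the
    lambda_i are nonzero (Lemma 1), so at most r2 + 1 columns survive."""
    if Z_L <= Z_L_prev:                 # no strict improvement: keep all
        return V
    kept = [v for v, l in zip(V, lam) if l > tol]
    assert len(kept) <= r2 + 1          # guaranteed by Lemma 1
    return kept
```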

It should be noted that a variant of this Lemma is also true for the GCG-EQ algorithm (see “Appendix B”). However, there is no similar result for the GCG algorithm. That is, while unused columns can be discarded in GCG, there is no guarantee that the columns remaining in the master problem will be few in number.

5.3 The BZ algorithm

The BZ algorithm, originally proposed by Bienstock and Zuckerberg [5, 6], is a variant of the GCG algorithm that is specialized for General Precedence Constrained Problems (GPCPs). That is, the BZ algorithm assumes that constraints \(Az \le b\) correspond to precedence constraints having form \(z_i - z_j \le 0\), for \((i,j) \in I\), and bound constraints having form \(0 \le z \le 1\). This assumption allows BZ to exploit certain characteristics of the optimal master solutions in order to speed up convergence. As we will see, the BZ algorithm is very effective when the number of rows in H and the number of u variables are small relative to the number of z variables. It should be noted, however, that the optimal value of the problem solved by the BZ algorithm will have value \(Z^{BZ} = Z^{LP}\). That is, the bound will be no tighter than that of the LP relaxation. This follows directly from the fact that \(\{ z : z_i \le z_j \quad \forall (i,j) \in I \}\) defines a totally unimodular system.

The BZ algorithm is exactly as the GCG algorithm, with the exception of two changes:

First, the BZ algorithm uses a very specific class of generator matrices \(V^{k}\). These matrices have two main properties. The first is that the linear space spanned by \(V^{k+1}\) will always contain \(z^k\), the optimal solution of the master problem in the previous iteration. The second is that the columns of \(V^{k}\) are orthogonal 0-1 vectors. That is, for every column \(v^q\) of matrix \(V^k\) define \(I^q = \left\{ i \in \{1, \ldots , n\} : v^q_i \not = 0 \right\} \) (note that \(I^q\) describes the support of \(v^q\), for \(q = 1,\ldots ,k\)). Then, 0-1 vectors \(v^q\) and \(v^r\) are orthogonal if and only if their supports are disjoint, i.e., \(I^q\cap I^r = \emptyset \).

An intuitive explanation for why this would be desirable is that, when \(V^k\) has orthogonal 0–1 columns, problem (25) is equivalent to

$$\begin{aligned} \begin{array}{llll} v^* = &{} \max \ &{} c'z + d'u &{} \\ &{} \text{ s.t. } \ &{} Az \le b &{} \\ &{}&{} Hz + Gu \le h &{} \\ &{}&{} z_i = z_j &{} \forall i,j \in I^q, \; \forall q = 1,\ldots ,k. \end{array} \end{aligned}$$
(36)

That is, restricting the original feasible region to the linear space spanned by \(V^k\) is equivalent to equating the variables corresponding to the non-zero entries of each column in \(V^k\). In a combinatorial optimization problem, this resembles a contraction operation, which is desirable because it can lead to a significant reduction of rows and variables, while yet preserving the original problem structure.

The matrix \(V^k\) used in the BZ algorithm is obtained by the following recursive procedure. Let \(\hat{v}\) be the 0-1 vector obtained from solving the pricing problem. Let \([V^k,\hat{v}]\) be the matrix obtained by appending column \(\hat{v}\) to \(V^k\). We would like to compute a matrix \(V^{k+1}\) comprised of orthogonal 0–1 columns such that, first, \(Span(V^{k+1}) \supseteq Span([V^k,\hat{v}])\); and second, such that \(V^{k+1}\) does not have too many columns. This can be achieved as follows. For two vectors \(x,y \in \{0,1\}^n\), define \(x \wedge y \in \{0,1\}^n\) such that \((x \wedge y)_i = 1\) if and only if \(x_i = y_i = 1\). Likewise, define \(x \setminus y \in \{0,1\}^n\) such that \((x \setminus y)_i = 1\) if and only if \(x_i = 1\) and \(y_i = 0\). Assume that \(V^k\) is comprised of columns \(\{v^1,\ldots ,v^r\}\). Let \(V^{k+1}\) be the matrix made of the non-zero vectors from the collection:

$$\begin{aligned} \{ v^j \wedge \hat{v} : 1 \le j \le r \} \cup \{ v^j \setminus \hat{v} : 1 \le j \le r \} \cup \left\{ \hat{v} \setminus \left( \sum _{j=1}^{r} v^j \right) \right\} . \end{aligned}$$

We call this procedure of obtaining the matrix \(V^{k+1}\) from \(V^k\) and \(\hat{v}\) the refining procedure. Note that in each iteration k, the refining procedure will produce a matrix \(V^{k+1}\) with at most \(2r+1\) columns.
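A minimal sketch of the refining procedure, assuming the columns are stored as 0-1 numpy vectors (names illustrative):

```python
import numpy as np

def refine(V, v_hat):
    """Refine orthogonal 0-1 columns V with a new 0-1 pricing column v_hat.

    Returns orthogonal 0-1 vectors whose span contains span(V + [v_hat]);
    if V has r columns, the result has at most 2r + 1 columns.
    """
    refined = []
    for v in V:
        meet = v * v_hat              # v /\ v_hat: 1 where both are 1
        diff = v * (1 - v_hat)        # v \ v_hat: 1 where v=1, v_hat=0
        for w in (meet, diff):
            if w.any():               # keep only the non-zero vectors
                refined.append(w)
    union = sum(V) if len(V) else np.zeros_like(v_hat)
    rest = v_hat * (union == 0)       # v_hat \ (v^1 + ... + v^r)
    if rest.any():
        refined.append(rest)
    return refined
```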

The second difference between BZ and GCG is that it introduces a mechanism for preventing an unmanageable growth in the number of columns used in the master problem.

Consider any nonzero vector \(z \in \mathbb R^n\). Let \(\lambda _1,\dots ,\lambda _d\) denote the distinct non-zero values of the components of z, and for \(i=1,\dots ,d\) let \(v^i \in \mathbb R^n\) denote the indicator vector of the components of z equal to \(\lambda _i\), that is, \(v^i\) has components \(v^i_j = 1\) if \(z_j = \lambda _i\), and 0 otherwise. Note that \(1 \le d \le n\) and \(v^1,\dots ,v^d\) are orthogonal 0-1 vectors. Thus we can write \(z = \sum _{i=1}^{d} \lambda _i v^i\) and we say that \(v^1,\dots ,v^d\) define the elementary basis of z.
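Computing an elementary basis takes one pass over the distinct values of z; a minimal sketch (rounding is used only to group equal values of a fractional LP solution):

```python
import numpy as np

def elementary_basis(z, decimals=9):
    """Elementary basis of z: one 0-1 indicator vector per distinct nonzero
    value, so that z = sum_i lam_i * v_i with pairwise-orthogonal v_i."""
    zr = np.round(np.asarray(z, dtype=float), decimals)
    values = sorted(set(zr[zr != 0.0]))
    basis = [(zr == lam).astype(int) for lam in values]
    return values, basis
```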

If at some iteration the number of columns in matrix \(V^k\) is too large, the BZ algorithm will replace matrix \(V^k\) with a matrix whose columns make up an elementary basis of the incumbent master solution \(z^k\).

Again, there are two possible problems with this. On the one hand, it is possible that such a coarsification procedure leads to cycling. On the other hand, it might still be the case that, after applying the coarsification procedure, the number of columns remains prohibitively large.

To alleviate the first concern, the BZ algorithm applies the coarsification procedure in exactly those iterations in which the generation of a new column leads to a strict improvement of the objective function. That is, when \(Z^L_k > Z^L_{k-1}\). The second concern is alleviated by the fact that under the BZ algorithm assumptions, the elementary basis associated to a basic feasible solution of problem (36) will always have at most \(r_2\) elements, where \(r_2\) is the number of rows in matrix H (see Bienstock and Zuckerberg [5] for the original proof, or “Appendix C” for the proof adapted to our notation and extra variables).

A formal description of the column generation algorithm that results from the previous discussion is summarized in Algorithm 2.

Algorithm 2 (figure omitted)

5.4 BZ algorithm speedups

In this section we describe a number of computational techniques for improving the performance of the BZ algorithm. The techniques that we present for speeding up the master problem are generic, and can also be used in the GCG algorithm. The speed-ups that we present for the pricing problem, however, are specific to general production scheduling problems (GPSPs), and thus to the BZ algorithm.

5.4.1 Speeding up the BZ pricing algorithm

The pricing problem consists of solving a problem of the form

$$\begin{aligned} \begin{array}{llll} &{} \max \ &{} \bar{c}'z &{} \\ &{} \text{ s.t. } \ &{} z_i \le z_j &{} \forall (i,j) \in I \\ &{}&{} 0 \le z \le 1 &{} \end{array} \end{aligned}$$
(40)

for some objective function \(\bar{c}\) and a set of arcs I. This class of problems is known as maximum closure problems. In order to solve (40), we use the Pseudoflow algorithm of [21]. We use the following two features to speed up this algorithm in the present context.

Pricing hot-starts [PHS] The pricing problem is solved once per iteration of the BZ algorithm, each time with a different objective function. An important feature of the Pseudoflow algorithm is that it can be hot-started. More precisely, the Pseudoflow algorithm uses an internal data structure called a normalized tree that can be used to re-solve an instance of a maximum closure problem after changing the objective function vector. Rather than solve each pricing problem from scratch, we use the normalized tree obtained from the previous iteration to hot-start the algorithm. Because the changes in the objective function are not componentwise monotone from iteration to iteration, we need to re-normalize the tree each time. For more information on the detailed workings of the Pseudoflow algorithm, with indications on how to incorporate hot-starts, see [21].

Path contractions [PC] Consider problem (40), and define an associated directed acyclic graph \(G = (\mathcal {V}, \mathcal {E})\) as follows. For each variable \(z_i\) define a vertex \(v_i\). For each precedence constraint \(z_i \le z_j\) with \((i,j) \in I\), define a directed arc \((v_i, v_j)\) in \(\mathcal {E}\). We say that a directed path \(P=(v(1), v(2), \ldots , v(k))\) in G is contractible if it is a maximal path in G such that (i) \(k\ge 3\), and (ii) every internal vertex \(v(2),\ldots ,v(k-1)\) has both in-degree and out-degree equal to one.

Observe that the variables associated to this path can only take \(k+1\) different combinations of 0-1 values: either (i) \(z_{v(i)} = 0\) for \(i=1,\ldots ,k\) and the total contribution of the nodes in P is zero; or else, (ii) there exists an index \(j\in \{1,\dots ,k\}\) such that \(z_{v(i)} = 0\) for all \(i = 1,\ldots ,j-1\) and \(z_{v(i)} = 1\) for all \(i = j,\ldots , k\).

Since (40) is a maximization problem, in an optimal integer solution to (40) the contribution of the path will either be 0 (this will happen when \(z_{v(k)} = 0\)), or the contribution will be \(\max _{1\le j\le k} \sum _{i=j}^k \bar{c}_{v(i)}\) (which will happen when \(z_{v(k)} = 1\)).

This suggests the following arc-contracting procedure.

Let \(j(P,\bar{c}) = \mathrm{argmax}_{1\le j\le k} \sum _{i=j}^k \bar{c}_{v(i)}\) and define a new graph \(\hat{G} = (\hat{\mathcal {V}}, \hat{\mathcal {E}})\) from G by eliminating the vertices \(v(2),\ldots ,v(k-1)\) from \(\mathcal {V}\) and the arcs incident to them, and then adding an arc that connects v(1) and v(k). Define \(\hat{c}\) with \(\hat{c}_v = \bar{c}_v\) for all vertices \(v \notin \{v(1), \ldots , v(k) \}\), \(\hat{c}_{v(k)} = \sum _{i=j(P,\bar{c})}^k \bar{c}_{v(i)}\) and \(\hat{c}_{v(1)} = \sum _{i<j(P,\bar{c})} \bar{c}_{v(i)}\).

Solving the pricing problem in this smaller graph \(\hat{G}\) with the objective \(\hat{c}\) gives an optimal solution to the original problem on graph G with objective \(\bar{c}\), with the same optimal objective value.
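The index \(j(P,\bar{c})\) and the two surviving coefficients can be computed in a single backward scan over the path; a minimal sketch (names illustrative):

```python
def contract_path(c_bar, path):
    """Contract a contractible path v(1), ..., v(k), k >= 3 (a sketch).

    c_bar : dict vertex -> objective coefficient
    path  : the path's vertices, in order
    Returns the new coefficients for the surviving endpoints v(1), v(k);
    internal vertices are then deleted and replaced by the arc (v(1), v(k)).
    """
    k = len(path)
    tail, best_j, best_tail = 0.0, k, float("-inf")
    for j in range(k, 0, -1):             # tail sum: sum_{i=j}^{k} c_bar[v(i)]
        tail += c_bar[path[j - 1]]
        if tail > best_tail:              # argmax over j
            best_j, best_tail = j, tail
    c_hat_vk = best_tail                  # sum_{i = j(P,c)}^{k} c_bar[v(i)]
    c_hat_v1 = sum(c_bar[path[i]] for i in range(best_j - 1))  # i < j(P,c)
    return c_hat_v1, c_hat_vk
```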

This procedure can be used to contract all contractible paths in G in order to obtain, in some cases, a significantly smaller graph. In fact, problem instances with multiple destinations and batch constraints induce many contractible paths in the pricing problem graph. This is illustrated with an example in Fig. 1.

Fig. 1 (figure omitted) Example illustrating the possible effect of the path contraction speed-up. The example assumes a cluster c comprised of activities \(\{a_1,\ldots ,a_n\}\), each with M possible modes. The graph includes an arc between all pairs of variables for which there is a precedence relationship. a Before path contraction, b after path contraction.

It should be noted that identifying and contracting paths only needs to be done in the first iteration. In all subsequent runs of the pricing problem, it is possible to use the same contracted graph, after updating the indices \(j(P,\bar{c})\) and the objective function \(\hat{c}\).

5.4.2 Speeding up the BZ master algorithm

The following ideas can be used to speed up solving the master problem (38) in the BZ algorithm:

Starting columns [STCOL] As input, the BZ algorithm requires an initial set of columns inducing a (possibly infeasible) solution of the problem. Such columns can be obtained by computing an elementary basis associated to a solution obtained with heuristics.

Table 1 Description of the different instances of the test set used in our computational study

If, when starting the algorithm, we do not have an initial set of columns describing a feasible solution of the problem, we can proceed as in the Phase-I simplex algorithm. For this, start with some arbitrary set of columns, and add “artificial variables” for each row.

Master hot-starts [MHS] Because of the refining procedure, the restricted BZ master problem (38) in one iteration may be quite different from that in the previous iteration. To reduce its solve time, we can feed any simplex-based solver the solution from the previous iteration so that it can be used to identify a good feasible starting basis.

k-Step column management [k-Step] In the original BZ algorithm, the coarsification procedure is applied in every iteration where there is a strict increase in the primal objective function value. Sometimes, however, it is better to keep the existing columns so as to improve the convergence rate. To avoid having too many columns in each iteration, we use what we refer to as a k-step column management rule. This rule, which requires as input an integer parameter k, works as follows: assume that we are at the m-th iteration of the BZ algorithm, where \(m > k\). Further assume that in the previous iteration (iteration \(m-1\)) there was a strict increase in the primal objective function value. The column matrix \(V^m\) used in the m-th iteration is built from scratch, by first obtaining an elementary basis associated to solution \(z^{m-k}\), and then successively refining this basis with the vectors \(v^{m-k+1}, v^{m-k+2}, \ldots , v^{m-1}\) (see Sect. 5.3).
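In terms of the hypothetical `elementary_basis` and `refine` helpers sketched in Sect. 5.3, one step of this rule reads:

```python
def k_step_rebuild(z_m_minus_k, recent_pricing_columns):
    """k-step column management (a sketch): rebuild V^m from the elementary
    basis of z^{m-k}, then successively refine it with the pricing columns
    v^{m-k+1}, ..., v^{m-1}, supplied in chronological order."""
    _, V = elementary_basis(z_m_minus_k)
    for v_hat in recent_pricing_columns:
        V = refine(V, v_hat)
    return V
```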

6 Computational results

Our computational tests consider solving the linear relaxation of three different classes of problems. Specifically, we consider instances of OPPSP, OPPDP and RCPSP. The OPPSP and OPPDP instances are based on both real and hypothetical mine-planning instances obtained from industry partners, and from the MineLib website [16]. In order to protect the confidentiality of the data sets obtained from our industry partners, we have renamed all of the instances using the names of Chilean volcanoes. In Table 1 we present some characteristics of these instances. Note that for each instance we have a specification regarding the maximum number of time periods to consider. For each of the OPPSP problems we have two versions, one with fewer clusters and one with more; we refer to these as the coarse and fine versions, respectively. These clusters correspond to what mining engineers refer to as a bench-phase, or increment (see [22] for formal definitions). We do not have cluster definition data for all of the problem instances; thus, the set of instances considered for OPPSP is a subset of the instances considered for OPPDP. The RCPSP instances that we consider are obtained from PSPlib [25] (datasets j30, j60, j90 and j120) and from [14] (dataset RG300). It should be noted that these problems have a different objective function than the OPPDP and OPPSP problems. That is, these problems seek to minimize makespan, rather than maximize net present value (NPV).

We evaluate the performance of the BZ algorithm, comparing it to the DW algorithm, and to a version of DW that incorporates the dual-stabilization techniques (DW-S) proposed by Pessoa et al. [37]. For each of the algorithms we attempt to compute the linear relaxation solution of the test problems, stopping early if the relative difference between the master problem bound and the pricing problem bound is less than \(10^{-6}\) (the default reduced cost tolerance used by the CPLEX simplex algorithm). We do not report the bounds obtained by each algorithm since they will all coincide with the value of the LP relaxation for each problem.

Table 2 Comparison between the different algorithms for OPPDP instances

We also evaluate the performance of the speed-up techniques described in this paper. Specifically, we analyze the performance of Pricing hot-starts (PHS), Path contractions (PC), Starting columns (STCOL), Master hot-starts (MHS), and k-step column management (k-Step), with \(k=10\). All of these speed-up techniques, with the exception of k-step (which is not applicable) are also implemented for the DW and DW-S algorithms. In fact, our code uses the exact same implementations of these speed-up techniques for the BZ, DW and DW-S algorithms.

To assess the value of our proposed speed-up techniques, we defined a default set of speed-up techniques to activate for each class of problems considered. For OPPDP and OPPSP we defined the default speed-up techniques to be PHS, MHS and PC. We found these to be the features that improved the performance of the algorithm without requiring a heuristic to generate a solution. For the RCPSP class of problems we use a different set of default speed-up options. Since we have to compute a feasible solution to each problem in order to define the number of time periods, we change the default settings so as to use this solution as a set of starting columns (STCOL). In addition, we observe that these problem instances have significantly more time periods per activity when compared to the OPPSP and OPPDP instances. This resulted in very different algorithm behavior, which prompted us to include k-Step column management as a default option, as it improved performance.

In order to assess the contribution of each individual speed-up technique in the algorithm, we turned each of these off (or on) to measure the impact from the change relative to that of the default settings.

All algorithms were implemented in the C programming language, using CPLEX® 12.6 as the optimization solver. The machines were running Linux 2.6.32 under the x86_64 architecture, with four eight-core Intel® Xeon® E5-2670 processors and 128 GB of RAM. For all results, we present normalized geometric means comparing performance against the BZ algorithm under its default configuration.

6.1 Results for OPPDP

Table 2 shows the time required to solve each problem using the different proposed algorithms. The table also shows the number of iterations required by each algorithm, as well as the final number of columns generated by BZ. We do not report the bound generated by each relaxation, since they all generate the same bound for each instance.

As can be seen, DW is nearly six times slower than BZ. Using stabilization techniques, DW-S is able to significantly reduce the time required to reach the optimality condition. In fact, stabilization techniques reduce the time of DW by 60%, but DW-S is still 2.42 times slower than BZ. This can be explained by the number of iterations each algorithm requires to converge. Even though a BZ iteration is, on average, 2.1 times slower than a DW iteration, DW and DW-S require 11.8 and 4.9 times as many iterations as BZ to converge. This is possible because the BZ algorithm can produce a large number of useful columns in few iterations. In fact, the number of columns generated by BZ is considerably larger than the number of columns produced by DW and DW-S (for these algorithms, the number of columns equals the number of iterations). Finally, note that CPLEX is only able to solve the five smallest instances.

In Table 3 we show what happens when we change the tolerance used to define optimality in all of the algorithms. It can be seen that the BZ algorithm still outperforms the DW and DW-S algorithms after relaxing the target optimality gap to \(10^{-4}\) and \(10^{-2}\), although the performance difference narrows as the tolerance is loosened. The resulting times, normalized to the time of BZ under the default setting, are presented in Table 3.

Table 3 Normalized time required for different optimality gaps for OPPDP instances

Finally, we study the impact of the different speed-up techniques on BZ. Starting from the default settings, we disable the speed-ups PHS, MHS and PC one by one, and also disable all three of them together. We also run BZ under the default settings after adding starting columns (STCOL) generated using the Critical Multiplier algorithm (see [9]), and after enabling the k-Step feature. The resulting times, normalized to the time of BZ under the default settings, are presented in Table 4.

Table 4 Normalized time required for BZ algorithm with/without features for OPPDP instances

We can see that, on average, all of the default speed-up features improve the time required by BZ to converge. However, the individual performance of each feature differs from instance to instance. The most important speed-up technique is path contraction (PC): disabling this feature increases the time required to converge by 60%. Since the reduction in the number of variables achieved by this speed-up in OPPDP is proportional to the number of modes of the problem, the feature is particularly important for the Guallatari instance, which has three different modes. Disabling all three features more than doubles the total time required to converge. On the other hand, if we start with a preliminary set of starting columns, the time required decreases by 32%. Finally, we see that k-Step does not improve the convergence time, making the algorithm run 35% slower; in fact, it makes every problem converge more slowly, with the exception of the ranokau instance.
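For intuition, one simple form of path contraction merges chains of the precedence graph into single aggregate nodes, shrinking the formulation. The sketch below is hypothetical (our implementation operates on the variables of the time index formulation): it contracts every arc \((u,v)\) where \(u\) is the unique predecessor of \(v\) and \(v\) is the unique successor of \(u\).

```c
/* Nodes are assumed indexed in topological order, so merged_into[u] is
 * final before any successor of u is processed. succ[u] is meaningful
 * only when out_deg[u] == 1. */
void contract_paths(int n, const int out_deg[], const int in_deg[],
                    const int succ[], int merged_into[])
{
    for (int u = 0; u < n; u++)
        merged_into[u] = u;                      /* start unmerged */
    for (int u = 0; u < n; u++) {
        if (out_deg[u] == 1) {
            int v = succ[u];                     /* unique successor */
            if (in_deg[v] == 1)                  /* unique predecessor */
                merged_into[v] = merged_into[u]; /* glue v to u's chain */
        }
    }
}
```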

6.2 Results for OPPSP

For these instances we use the same default settings as for OPPDP. We note that these problems are considerably smaller than the OPPDP problems, both in the number of variables and in the number of precedence constraints. All algorithms are able to solve these problems in a few hours, with the exception of the ranokau instance, which CPLEX fails to solve due to the memory limit (128 GB). We present the results in Table 5.

Table 5 Comparison between the different algorithms for OPPSP instances

We can see that in this class of problems the performance of DW and DW-S is closer to that of BZ. This is probably explained by the fact that the number of iterations required by DW and DW-S is much smaller. Note also that stabilization only marginally reduces the number of iterations required by DW to converge, with the result that DW-S is 18% slower than DW.

Table 6 Normalized time required for BZ algorithm with/without features for OPPSP instances

Comparing the impact of the different features on BZ (see Table 6), we see that the most important feature is, again, path contraction (PC): disabling this feature makes the algorithm run almost 10 times slower. As in the OPPDP problems, providing starting columns to BZ reduces the time, here by \(23\%\), and introducing k-Step column management makes the algorithm run \(83\%\) slower.

6.3 Results for RCPSP instances

In order to formulate the RCPSP instances we need a maximum number of time periods to consider. Since the objective of these problems is to minimize the makespan, the number of time periods should be an upper bound on the number of periods required to schedule all of the activities. Such a bound can be computed by running any heuristic that produces a feasible integer solution. For this purpose, we use a greedy TOPOSORT heuristic [9], which runs in a fraction of a second on all of our instances.
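A minimal sketch of a greedy list-scheduling heuristic of this kind, assuming a single renewable resource and jobs already indexed in topological order, is shown below; the data layout and names are hypothetical and not the implementation of [9].

```c
#define MAX_T 4096            /* assumed horizon cap for the sketch */

/* Returns a feasible makespan (an upper bound on the periods needed),
 * or -1 if the horizon cap is exceeded. pred[j][p] lists the npred[j]
 * predecessors of job j; all predecessor indices are smaller than j. */
int toposort_schedule(int n, const int dur[], const int req[], int cap,
                      const int npred[], const int pred[][8], int start[])
{
    int usage[MAX_T] = {0};
    int makespan = 0;
    for (int j = 0; j < n; j++) {
        int t = 0;                 /* earliest precedence-feasible start */
        for (int p = 0; p < npred[j]; p++) {
            int f = start[pred[j][p]] + dur[pred[j][p]];
            if (f > t) t = f;
        }
        for (;;) {                 /* delay until the resource fits */
            if (t + dur[j] > MAX_T) return -1;
            int ok = 1;
            for (int s = t; s < t + dur[j]; s++)
                if (usage[s] + req[j] > cap) { ok = 0; t = s + 1; break; }
            if (ok) break;
        }
        start[j] = t;
        for (int s = t; s < t + dur[j]; s++) usage[s] += req[j];
        if (t + dur[j] > makespan) makespan = t + dur[j];
    }
    return makespan;
}
```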

Table 7 describes the performance of the different algorithms on our RCPSP test instances. Note that we only consider instances that CPLEX requires more than one second to solve, for a total of 1575 instances. The running times presented in the table are geometric means over the instances in each dataset.

Table 7 Comparison between the different algorithms for RCPSP instances

Table 7 shows that BZ is again faster than the other algorithms on RCPSP instances: it is two times faster than DW-S and 4.7 times faster than DW. This difference is particularly large for the j120 instances from the PSPLIB repository, where DW and DW-S run 8.4 and 3 times slower, respectively. It would seem that the performance of these algorithms depends heavily on problem structure. This can be seen by considering the RG300 instances, which are generated in a different way; on these instances, DW with stabilization is only marginally slower than BZ.

Table 8 shows how the performance of the BZ algorithm is affected by turning off each of the default speed-up features. It is interesting to note that on RCPSP instances the k-Step column management rule is very important: turning it off makes the algorithm run, on average, 2.51 times slower. This is in stark contrast to what happens with the OPPSP and OPPDP instances, where activating the k-Step feature actually slows the algorithm down. We speculate that this is because the RCPSP problems have a significantly greater number of time periods per activity than the OPPSP and OPPDP instances, which results in a problem with significantly more resource consumption constraints per variable. It is also interesting to note that disabling the PC and MHS features actually makes BZ run faster on the RCPSP instances. We speculate that MHS does not improve performance because, with the k-Step feature also enabled, successive master problem formulations differ significantly from each other. We speculate that PC does not improve performance because the structure of the precedence graphs of RCPSP instances leaves few paths to contract: there are no clusters, and there is a single mode per activity.
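As an illustration of the rule, the sketch below drops any column whose value in the master LP has been (numerically) zero for k consecutive iterations; the bookkeeping names are hypothetical.

```c
/* x[] holds current master values; idle[] persists across iterations.
 * Indices of surviving columns are written to keep[]; returns their
 * count. With k = 10, a column unused for ten consecutive master
 * solves is removed. */
int manage_columns(int ncols, const double x[], int idle[],
                   int keep[], int k)
{
    int kept = 0;
    for (int j = 0; j < ncols; j++) {
        idle[j] = (x[j] > 1e-9) ? 0 : idle[j] + 1;
        if (idle[j] < k)
            keep[kept++] = j;
    }
    return kept;
}
```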

Table 8 Normalized time required for BZ algorithm with/without features for RCPSP instances

6.4 The way forward

As a final remark, we present some computational experiments illustrating the relevance of efficient methodologies for tackling the LP relaxation of the GPSP. In Table 9 we present bounds on the integrality gaps for a subset of our mine planning instances. These gaps are defined as \(\frac{ub}{lb} - 1\), where lb is the value of the best known integer feasible solution of the problem, and ub is the value of the LP relaxation, as computed with the techniques described in this paper. The value lb is computed using the TopoSort rounding heuristic, starting from the solutions obtained by the BZ algorithm, as described in [9]. As can be seen, the LP values provide tight bounds for most of the problem instances, with values averaging \(2\%\) or less, and all values below \(6\%\). Even though the solutions obtained by the BZ algorithm are fractional for every single one of our instances, a very simple rounding algorithm manages to provide integer-feasible solutions with values surprisingly close to that of the LP relaxation, illustrating the value of quickly computing LP relaxations for the problem. Nonetheless, a few instances (namely, ranokau, palomo, and chaiten) exhibit larger gaps in the OPPSP variants of the problem, suggesting the need for further research. This is a natural motivation for developing a branch-and-cut algorithm, in which the BZ algorithm could play a key role. This is the subject of ongoing work.
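For example, a hypothetical instance with best integer value \(lb = 500\) and LP relaxation value \(ub = 515\) would have a gap bound of \(\frac{515}{500} - 1 = 0.03\), i.e., \(3\%\).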

Table 9 Integrality gap bound for a subset of mine planning instances

6.5 Concluding remarks

We summarize with three important conclusions. First, the BZ algorithm significantly outperforms DW and DW-S on all GPSP problem classes considered. Second, the speed-up features proposed in this article significantly improve the performance of the BZ algorithm. Third, the performance of the algorithm and of the speed-ups depends greatly on the problem class under consideration. These conclusions suggest that the BZ algorithm with the proposed speed-ups can be used effectively to compute the LP relaxation solution of different classes of scheduling problems. Such an algorithm might be useful as a means to provide upper bounds for scheduling problems, as part of the many rounding heuristics that have recently been proposed, or, eventually, as part of a branch-and-bound solver. They also suggest that the BZ algorithm's potential use for other classes of problems merits further study.