1 Problem statement

We are interested in computing solutions for two-stage stochastic mixed-integer programs (SMIPs) of the form

$$\begin{aligned} z := \min _x \left\{ c^T x + \mathcal {Q}(x) :\; A x \ge b,\; x \in X \right\} , \end{aligned}$$
(1)

where \(\mathcal {Q}(x) = \mathbb {E}_\xi \phi (h(\xi ) - T(\xi )x)\) is the recourse function and

$$\begin{aligned} \phi (t) := \min _y\; \{q(\xi )^T y :\; W(\xi ) y \ge t, \; y \in Y\}, \; t \in \mathbb {R}^{m_2}. \end{aligned}$$
(2)

We assume that the random parameter \(\xi \) follows a discrete distribution with finite support \(\{\xi _1, \dots , \xi _r\}\) and corresponding probabilities \(p_1,\dots ,p_r\) (continuous distributions can be handled by using a sample-average approximation [32]). The sets \(X \subseteq \mathbb {R}^{n_1}\) and \(Y \subseteq \mathbb {R}^{n_2}\) represent integer or binary restrictions on a subset of the decision variables x and y, respectively. The first-stage problem data comprise \(A \in \mathbb {R}^{m_1\times n_1}\), \(b\in \mathbb {R}^{m_1}\), and \(c\in \mathbb {R}^{n_1}\). The second-stage data are given by \(T(\xi _j)\in \mathbb {R}^{m_2 \times n_1}\), \(W(\xi _j)\in \mathbb {R}^{m_2 \times n_2}\), \(h(\xi _j) \in \mathbb {R}^{m_2}\), and \(q(\xi _j) \in \mathbb {R}^{n_2}\) for \(j=1,\dots ,r\). For simplicity, we use the notation \((T_j, W_j, h_j, q_j)\) for \(j=1,\dots ,r\) to represent the scenario data.

The SMIP (1) can be rewritten in the extensive form

$$\begin{aligned} z = \min _{x_j,y_j} \quad&\sum _{j=1}^r p_j \left( c^T x_j + q_j^T y_j \right) \end{aligned}$$
(3a)
$$\begin{aligned} \text {s.t.} \quad&\sum _{j=1}^r H_j x_j = 0 \end{aligned}$$
(3b)
$$\begin{aligned}&(x_j, y_j) \in G_j, \quad \forall j=1,\dots ,r, \end{aligned}$$
(3c)

where the scenario feasibility set is defined as

$$\begin{aligned} G_j := \{(x_j, y_j) \;:\; A x_j \ge b, \; T_j x_j + W_j y_j \ge h_j, \; x_j \in X, \; y_j \in Y\}. \end{aligned}$$
(4)

The nonanticipativity constraints in (3b) represent the equations \(x_1 = x_r\) and \(x_j = x_{j-1}\) for \(j=2,\dots ,r\), and \(H_j\) is a suitable \(r\cdot n_1 \times n_1\) matrix. We do not assume that the SMIP necessarily has relatively complete recourse. We recall that, without this property, there can exist an \(\hat{x}\in X\) satisfying \(A\hat{x} \ge b\) for which there does not exist a recourse \(y\in \mathbb {R}^{n_2}\) satisfying \((\hat{x}, y) \in G_j\) for some j. In other words, not every choice of the first-stage variables is guaranteed to have feasible recourse for all scenarios.
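For illustration, the following sketch builds one suitable choice of the matrices \(H_j\) for the cyclic equations above. This is only one possible construction (the text requires only that \(H_j\) be "suitable"), written in Python/NumPy rather than in the paper's implementation:

```python
import numpy as np

def nonanticipativity_blocks(r, n1):
    """One suitable choice of H_j in R^{(r*n1) x n1} for the cyclic
    equations x_1 = x_r and x_j = x_{j-1}: row-block j carries +I for
    scenario j and -I in the row-block of the next equation, so that
    sum_j H_j x_j = 0 holds iff x_1 = ... = x_r.
    """
    I = np.eye(n1)
    H = [np.zeros((r * n1, n1)) for _ in range(r)]
    for j in range(r):                           # 0-based scenario index
        H[j][j * n1:(j + 1) * n1, :] = I         # +I in its own row-block
        nxt = (j + 1) % r                        # row-block of next equation
        H[j][nxt * n1:(nxt + 1) * n1, :] = -I
    return H
```

With this choice, row-block \(b\) of \(\sum_j H_j x_j\) evaluates to \(x_b - x_{b-1}\) (indices cyclic), which vanishes exactly when all first-stage copies agree.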

Different techniques exist for solving SMIPs; Benders decomposition (also known as the L-shaped method) can be applied to problems in which integer variables appear only in the first stage [6, 33, 50]. When integer variables appear in the second stage, the recourse value function is nonconvex and discontinuous in the first-stage variables; consequently, more sophisticated approaches are needed. These include convexification of the recourse function (e.g., [16, 30, 51, 52, 59]) and specialized branch-and-bound schemes (e.g., [3, 48]). Except for [48], which may not be computationally practical, these approaches cannot tackle the general SMIP (1). For instance, convexification techniques are based on either finite cutting-plane methods (e.g., lift-and-project cuts [52], Gomory cuts [16, 59], and no-good cuts [2]) or disjunctive programming [51]. Consequently, these approaches are limited to certain problem classes (e.g., mixed-binary second-stage variables [52], pure-integer variables [2, 16, 59], integral technology matrices [52], and special second-stage matrix structures [30]). For example, the method proposed in [2] cannot be applied to some of the standard benchmark problems (the dcap problem instances) used in this paper.

Carøe and Schultz [8] proposed a dual decomposition method for SMIPs. Dual decomposition solves a Lagrangian dual problem by dualizing (relaxing) the nonanticipativity constraints to obtain lower bounds. Such lower bounds are often tight and can be used to guide the computation of feasible solutions (and thus upper bounds) using heuristics or branch-and-bound procedures. Tarhan and Grossmann [55] applied mixed-integer programming sensitivity analysis [10] within a dual decomposition method to improve bounds during the solution, and this approach was shown to reduce the number of iterations. A crucial limitation of dual decomposition is that it is not guaranteed to recover feasible solutions for problems without (relatively) complete recourse, which often appear in applications. As a result, the method may not be able to obtain an upper bound, and it is thus difficult to estimate the optimality gap and stop the search.

Dual variables can be updated in a dual decomposition method by using subgradient methods [2, 8, 45, 46, 55], cutting-plane methods [38], or column-generation methods [38, 40]. The efficiency of dual decomposition strongly depends on the update scheme used, and existing schemes have limitations. For instance, it is well known that cutting-plane schemes can lead to strong oscillations of the dual variables, while subgradient schemes require ad hoc step-length selection criteria and can exhibit slow convergence [12, 43].

Progressive hedging is a popular and flexible method for solving SMIPs and can handle problems with mixed-integer recourse [9, 26, 36, 56]. Connections between progressive hedging and dual decomposition have also been established recently to create hybrid strategies [26]. For SMIPs, however, progressive hedging has no convergence guarantees; consequently, it is often used as a heuristic to find approximate solutions. Moreover, the method is not guaranteed to find a feasible solution for problems without relatively complete recourse.

Few software packages are available for solving large-scale stochastic programs. SMI [31] is an open-source software implementation that can read SMPS files and solve the extensive form of the problem by using COIN-OR solvers such as Cbc [13]. FortSP is a commercial solver that implements variants of Benders decomposition [44, 60]. The C package ddsip [41] implements dual decomposition for SMIPs and uses the ConicBundle [28] package for updating dual variables. This package was unable to solve many small-size SIPLIB instances [38]. Moreover, ddsip does not support parallelism and does not support model specification through SMPS files and algebraic modeling languages. PIPS provides a parallel interior-point method for solving continuous stochastic programs and provides a basic implementation of the dual decomposition method for solving SMIPs [38, 39]. PySP is a Python-based open-source software package that can model and solve SMIPs in parallel computing environments by using progressive hedging and Benders decomposition [57].

Open-source software packages are also available for decomposing general MIPs. GCG is a generic branch-and-cut solver based on Dantzig–Wolfe decomposition [17]. DIP implements Dantzig–Wolfe decomposition and Lagrangian relaxation [47]. Both packages can automatically detect the decomposition structure of a MIP and find an optimal solution by using branch-and-cut and branch-and-price methods. Hence, these solvers can potentially be used for solving SMIPs. However, neither package provides a parallel implementation.

In this work, we present algorithmic innovations for the dual decomposition method. We develop a procedure to generate Benders-like valid inequalities (i.e., Benders feasibility and optimality cuts) that tighten Lagrangian subproblems and that aid the exclusion of infeasible first-stage solutions. The Benders-like cuts are derived from Lagrangian subproblem solutions and shared among subproblems. We demonstrate that this procedure enables us to accelerate solutions and to find upper bounds for problems without relatively complete recourse. We use an interior-point cutting-plane method with new termination criteria for solving the Lagrangian master problem. We prove that the dual decomposition algorithm terminates finitely even without solving the master problem to optimality using the interior point method. We demonstrate that this approach reduces oscillations of the dual variables and ultimately aids convergence. We also introduce DSP, an open-source, object-oriented, parallel software package that enables the implementation and benchmarking of different dual decomposition strategies and of other standard techniques such as Benders decomposition. DSP provides interfaces that can read models expressed in C code and the SMPS format [5, 18]. DSP can also read models expressed in JuMP [37], a Julia-based algebraic modeling package, which can be used to represent large-scale stochastic programs using a compact syntax. We benchmark DSP using SIPLIB instances and large-scale unit commitment problems with up to 1,700,000 rows, 583,200 columns, and 28,512 integer variables on a 310-node computing cluster at Argonne National Laboratory. We demonstrate that our innovations yield important improvements in robustness and solution time.

The contributions of this work can be summarized as follows.

  • We develop a procedure to generate the Benders-like valid inequalities in the dual decomposition framework.

  • We use an interior-point cutting-plane method with new termination criteria for solving the Lagrangian master problem, which allows the finite termination of the dual decomposition even without solving the master problem to optimality.

  • We develop an open-source software package DSP that implements several decomposition methods capable of running on high-performance computing systems by using the MPI library.

The paper is structured as follows. In Sect. 2 we present the standard dual decomposition method and discuss different approaches for updating dual variables. In Sect. 3 we present our algorithmic innovations. Here, we first present valid inequalities for the Lagrangian subproblems to eliminate infeasible first-stage solutions and to tighten the subproblems (Sect. 3.1). We then present an interior-point cutting-plane method and termination criteria for the master problem (Sect. 3.2). In Sect. 4 we describe the design of DSP and the modeling interfaces. In Sect. 5 we present our benchmark results. In Sect. 6 we summarize our innovations and provide directions for future work.

2 Dual decomposition

We describe a standard dual decomposition method for two-stage SMIPs of the form (3). We apply a Lagrangian relaxation of the nonanticipativity constraints (3b) to obtain the Lagrangian dual function of (3):

$$\begin{aligned} D(\lambda ) := \min _{x_j,y_j} \left\{ \sum _{j=1}^r L_j(x_j,y_j,\lambda ) : (x_j,y_j) \in G_j,\; \forall j=1,\dots ,r \right\} , \end{aligned}$$
(5)

where

$$\begin{aligned} L_j(x_j,y_j,\lambda ) := p_j \left( c^T x_j + q_j^T y_j\right) + \lambda ^T (H_j x_j). \end{aligned}$$
(6)

Here, \(\lambda \in \mathbb {R}^{r\cdot n_1}\) are the dual variables of the nonanticipativity constraints (3b). For fixed \(\lambda \), the Lagrangian dual function can be decomposed as

$$\begin{aligned} D(\lambda ) = \sum _{j=1}^r D_j(\lambda ), \end{aligned}$$
(7)

where

$$\begin{aligned} D_j(\lambda ) := \min _{x_j,y_j} \left\{ L_j(x_j,y_j,\lambda ) : (x_j,y_j) \in G_j \right\} . \end{aligned}$$
(8)

We thus seek to obtain the best lower bound for (3) by solving the maximization problem (the Lagrangian dual problem):

$$\begin{aligned} z_\text {LD} := \max _{\lambda } \sum _{j=1}^r D_j(\lambda ). \end{aligned}$$
(9)

Proposition 1 is a well-known result of Lagrangian relaxation that shows the tightness of the lower bound \(z_\text {LD}\) [19].

Proposition 1

The optimal value \(z_\text {LD}\) of the Lagrangian dual problem (9) equals the optimal value of the linear program (LP):

$$\begin{aligned} \min _{x_j,y_j} \left\{ \sum _{j=1}^r p_j \left( c^T x_j + q_j^T y_j\right) \right. :&\;(x_j,y_j) \in \text {conv}(G_j),\; \forall j=1,\dots ,r,\nonumber \\&\left. \sum _{j=1}^r H_j x_j = 0 \right\} , \end{aligned}$$
(10)

where \(\text {conv}(G_j)\) denotes the convex hull of \(G_j\). Moreover, \(z_\text {LD} \ge z_\text {LP}\) holds, where \(z_\text {LP}\) is the optimal value of the LP relaxation of SMIP (3).

We highlight that the solution of the maximization problem (9) provides only a lower bound for SMIP (albeit often a tight one). When the problem at hand has no duality gap, the solution obtained from (9) is feasible for SMIP and thus optimal. When this is not the case, an upper bound \(z_\text {UB}\) for SMIP may be obtained and refined by finding feasible solutions during the dual decomposition procedure. Finding the best upper bound, and thus computing the actual duality (optimality) gap \(z_\text {UB}-z_\text {LB}\), can be done by performing a branch-and-bound search, but this is often computationally prohibitive.

The dual decomposition method can be seen as a primal multistart heuristic procedure [15, 38]. With different objective functions \(L_j(x_j,y_j,\lambda )\), dual decomposition finds multiple primal solutions \((x_j,y_j) \in G_j\) at each iteration. Particularly for stochastic programs with pure-integer first-stage variables, the method can effectively find an optimal solution (e.g., [2]). Note, however, that the original problem (3) may be infeasible for all candidates \(x_j\) if the problem does not have relatively complete recourse. We provide a remedy for this in Sect. 3.
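As an illustration of this primal heuristic, the following sketch evaluates a candidate first-stage solution \(\hat{x}\) by solving each scenario's recourse problem. For simplicity the sketch relaxes second-stage integrality and assumes \(y \ge 0\), so it yields only an estimate; in the actual method the scenario problems would be solved as MIPs. The data layout (`scen` as a list of \((p_j, q_j, T_j, W_j, h_j)\) tuples) is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def evaluate_candidate(x_hat, c, scen):
    """Evaluate a candidate first-stage solution x_hat across all scenarios.

    scen is a hypothetical list of (p_j, q_j, T_j, W_j, h_j) tuples.  This
    sketch relaxes second-stage integrality and assumes y >= 0; the actual
    method solves the scenario problems as MIPs.  Returns an objective
    estimate, or None if some scenario has no feasible recourse at x_hat.
    """
    total = float(c @ x_hat)
    for p, q, T, W, h in scen:
        t = h - T @ x_hat
        # min q^T y  s.t.  W y >= t, y >= 0  (written as -W y <= -t)
        res = linprog(q, A_ub=-W, b_ub=-t, bounds=[(0.0, None)] * len(q))
        if not res.success:               # no feasible recourse for x_hat
            return None
        total += p * res.fun
    return total
```

A `None` return signals exactly the situation discussed above: without relatively complete recourse, a candidate \(\hat{x}\) may admit no feasible recourse in some scenario.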

2.1 Dual-search methods

In a dual decomposition method, we iteratively search for dual values \(\lambda \) that maximize the Lagrangian dual function (5). We now present a conventional subgradient method and a cutting-plane method to perform such a search.

2.1.1 Subgradient method

Subgradient methods have been widely used in nonsmooth optimization. We describe a conventional method with a step-size rule described in [11]. Let \(\lambda ^k\) be the dual variable at iteration \(k\ge 0\), and let \(x_j^k\) be an optimal solution of (8) for given \(\lambda ^k\). The dual variable is updated as

$$\begin{aligned} \lambda ^{k+1} = \lambda ^k + \alpha _k \sum _{j=1}^r H_j x_j^k, \end{aligned}$$
(11)

where \(\alpha _k\in (0,1]\) is the step size. This method updates the duals by using a subgradient of \(D(\lambda )\) at \(\lambda ^k\), denoted by \(\sum _{j=1}^r H_j x_j^k\). The step size \(\alpha _k\) is given by

$$\begin{aligned} \alpha _k := \beta _k \frac{ z_\text {UB} - D(\lambda ^k)}{\left\| \sum _{j=1}^r H_j x_j^k\right\| _2^2}, \end{aligned}$$
(12)

where \(z_\text {UB}\) is the objective value of the best-known feasible solution to (1) up to iteration k and \(\beta _k\) is a user-defined positive scalar. Algorithm 1 summarizes the procedure.

[Algorithm 1 (figure)]

Algorithm 1 is initialized with user-defined parameters \(\lambda ^0, \gamma ^\text {max}\), and \(\beta _0\) and halves \(\beta _k\) whenever the best lower bound \(z_\text {LB}\) has not improved over the last \(\gamma ^\text {max}\) iterations (lines 8–10). The best upper bound \(z_\text {UB}\) can be obtained by solving (3) for fixed \(x_j^{k}\) (line 12). An important limitation of subgradient methods is that enforcing convergence by using line-search schemes can be computationally impractical, since each trial step requires the solution of the Lagrangian subproblems [11]. Therefore, heuristic step-size rules are typically used.
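A minimal sketch of this procedure (assuming a callable `D` that solves the subproblems (8) and returns both the dual value and its minimizers; the ascent direction is the supergradient of the concave function \(D\)):

```python
import numpy as np

def subgradient_ascent(D, H, lam0, z_ub, beta0=2.0, gamma_max=3, iters=50):
    """Sketch of Algorithm 1: maximize D(lam) by supergradient steps.

    D(lam) is assumed to return (D(lam), [x_1, ..., x_r]) by solving the
    Lagrangian subproblems (8); H is the list of matrices H_j.
    """
    lam = np.asarray(lam0, dtype=float)
    beta, z_lb, stall = beta0, -np.inf, 0
    for _ in range(iters):
        val, xs = D(lam)
        if val > z_lb:
            z_lb, stall = val, 0
        else:
            stall += 1
            if stall >= gamma_max:          # halve beta after gamma_max
                beta, stall = beta / 2.0, 0  # non-improving iterations
        g = sum(Hj @ xj for Hj, xj in zip(H, xs))  # supergradient of D
        if np.linalg.norm(g) < 1e-10:       # zero supergradient: optimal
            break
        alpha = beta * (z_ub - val) / float(g @ g)  # step-size rule (12)
        lam = lam + alpha * g               # ascent step (11)
    return lam, z_lb
```

The toy dual function in the test below (two scenarios, binary first stage) is purely illustrative.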

2.1.2 Cutting-plane method

The cutting-plane method is an outer approximation scheme that solves the Lagrangian dual problem by iteratively adding linear inequalities. The outer approximation of (9) at iteration k is given by the Lagrangian master problem:

$$\begin{aligned} m_k := \max _{\theta _j,\lambda } \quad&\sum _{j=1}^r \theta _j \end{aligned}$$
(13a)
$$\begin{aligned} \text {s.t.} \quad&\theta _j \le D_j(\lambda ^l) + \left( H_j x_j^l\right) ^T (\lambda - \lambda ^l), \; j=1,\dots ,r, \; l=0,1,\dots ,k. \end{aligned}$$
(13b)

The dual variable \(\lambda ^{k+1}\) is obtained by solving the approximation (13) at iteration k. We define the primal-dual solution of the Lagrangian master problem as the triplet \((\theta ,\lambda ,\pi )\). Here, \(\theta :=(\theta _1,...,\theta _r)\) and \(\pi :=(\pi _1^0,...,\pi _1^k,...,\pi _r^0,...,\pi _r^k)\), where \(\pi _j^l\) are the dual variables of (13b). The procedure is summarized in Algorithm 2.

[Algorithm 2 (figure)]

The function \(D_j(\lambda )\) is piecewise linear and concave in \(\lambda \) and is outer-approximated by the linear inequalities (13b). Assuming that the master problem (13) and the subproblems (8) can be solved to optimality, Algorithm 2 terminates with an optimal solution of (9) after a finite number of steps, because the number of linear inequalities required to approximate \(D(\lambda )\) is finite. This gives the cutting-plane method a natural termination criterion (i.e., \(m_{k-1} \le D(\lambda ^{k})\)). In other words, this criterion indicates that \(m_{k-1}\) matches the Lagrangian dual function value \(D(\lambda ^k)\) and thus the maximum of the Lagrangian master problem matches the maximum of the Lagrangian dual problem.
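The master (13) is itself an LP, and one iteration's solve can be sketched as follows. The box on \(\lambda\) is an assumption of this sketch (to keep early, cut-poor masters bounded), and every \(\theta_j\) is assumed to carry at least one cut:

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(cuts, r, dim_lam, lam_box=1.0e3):
    """Solve the Lagrangian master (13) over variables (theta, lam).

    cuts is a list of tuples (j, d, g, lam_l) with d = D_j(lam^l) and
    g = H_j x_j^l, encoding theta_j <= d + g^T (lam - lam^l).  Every j
    is assumed to carry at least one cut; the box |lam_i| <= lam_box is
    a stabilization assumption of this sketch, not part of the method.
    """
    n = r + dim_lam
    obj = np.zeros(n)
    obj[:r] = -1.0                       # linprog minimizes: use -theta
    A, b = [], []
    for j, d, g, lam_l in cuts:          # theta_j - g^T lam <= d - g^T lam^l
        row = np.zeros(n)
        row[j] = 1.0
        row[r:] = -g
        A.append(row)
        b.append(d - g @ lam_l)
    bounds = [(None, None)] * r + [(-lam_box, lam_box)] * dim_lam
    res = linprog(obj, A_ub=np.array(A), b_ub=np.array(b), bounds=bounds)
    return res.x[:r], res.x[r:], -res.fun   # theta, lam^{k+1}, m_k
```

With two cuts \(\theta \le \lambda\) and \(\theta \le 2-\lambda\) for a single scenario, the master returns \(m_k = 1\) at \(\lambda = 1\), the vertex at which the two supporting hyperplanes intersect.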

Remark 1

Instead of adding the linear inequalities (13b) for each scenario \(j=1,\dots ,r\), one can add a single aggregated cut

$$\begin{aligned} \sum _{j=1}^r \theta _j \le \sum _{j=1}^r D_j(\lambda ^l) + \left( \sum _{j=1}^r H_j x_j^l\right) ^T (\lambda - \lambda ^l) \end{aligned}$$
(14)

per iteration \(l=0,1,\dots ,k\). While convergence may be slower with aggregated cuts, the master problem maintains a smaller number of constraints.

Remark 2

The column generation method [38, 40] is a variant of the cutting-plane method that solves the dual of the Lagrangian master problem (13). Lubin et al. [38] demonstrated that the dual of the Lagrangian master has a dual angular structure that can be exploited by using decomposition methods.

3 Algorithmic innovations for dual decomposition

We now develop innovations for the dual decomposition method based on cutting planes. In Sect. 3.1 we present a procedure to construct Benders-like valid inequalities that aid the elimination of infeasible first-stage solutions and that tighten the Lagrangian subproblems. As a byproduct, the procedure enables us to obtain upper bounds for SMIP (3). In Sect. 3.2 we present an interior-point method and termination criteria to stabilize the solutions of the Lagrangian master problem (13).

3.1 Tightening inequalities for subproblems

We consider two cases in which a set of valid inequalities can be generated to exclude a subset of (suboptimal) solutions for the subproblems. In the first case we consider a feasible Lagrangian subproblem solution that is infeasible for SMIP.

Proposition 2

Let \((\hat{x},\hat{y})\in G_j\) for some j, and assume that for some scenario \(j'\) there does not exist \(y\in \mathbb {R}^{n_2}\) such that \((\hat{x}, y) \in G_{j'}\) for fixed \(\hat{x}\). Let \(\mu _{j'}\) be an optimal solution of the LP

$$\begin{aligned} \max _{\mu } \left\{ \mu ^T \left( h_{j'} - T_{j'} \hat{x} \right) : \mu ^T W_{j'} = 0, \; 0 \le \mu \le 1 \right\} . \end{aligned}$$
(15)

The inequality

$$\begin{aligned} \mu _{j'}^T \left( h_{j'} - T_{j'} x \right) \le 0 \end{aligned}$$
(16)

excludes \(\hat{x}\) from the set \(\{x : (x,y) \in G_{j'}\}\) and is also valid for SMIP (3).

Proof

From Farkas’ lemma, there exists a \(\mu _{j'} \in \mathbb {R}^{m_2}\) such that \(\mu _{j'}^T W_{j'} = 0\) and \(\mu _{j'}^T (h_{j'} - T_{j'} \hat{x}) > 0\), and thus the hyperplane \(\mu _{j'}^T \left( h_{j'} - T_{j'} x \right) \le 0\) separates \(\hat{x}\) from \(\{x : (x,y) \in G_{j'}\}\). Moreover, the inequality is valid for \(G_{j'}\) and thus for SMIP (3). \(\square \)
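The cut generation of Proposition 2 can be sketched by solving the LP (15) with an off-the-shelf LP solver; the function returns the multiplier \(\mu_{j'}\) from which the cut (16) is formed. The data passed in are illustrative placeholders for one scenario's \((T_{j'}, W_{j'}, h_{j'})\):

```python
import numpy as np
from scipy.optimize import linprog

def feasibility_cut(T, W, h, x_hat, tol=1e-8):
    """Solve the Farkas LP (15) for one scenario and a candidate x_hat.

    Returns the multiplier mu defining the cut (16), mu^T (h - T x) <= 0,
    or None when the optimal value of (15) is zero, i.e., the LP
    relaxation of the recourse is feasible at x_hat.
    """
    rhs = h - T @ x_hat
    res = linprog(-rhs,                     # maximize mu^T (h - T x_hat)
                  A_eq=W.T, b_eq=np.zeros(W.shape[1]),
                  bounds=[(0.0, 1.0)] * len(h))
    if -res.fun <= tol:
        return None
    return res.x
```

In the test below the recourse constraints \(y \ge 1 - x\), \(y \le 0\) are infeasible at \(\hat{x} = 1\), so a separating \(\mu\) with \(\mu^T(h - T\hat{x}) > 0\) is found.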

In the second case we consider a feasible subproblem solution that is also feasible with respect to SMIP. For this case, we present a set of valid inequalities that can tighten the subproblems by using an upper bound \(z_\text {UB}\) of SMIP.

Proposition 3

Assume that for fixed \(\hat{x}\), for all \(j=1,\dots ,r\) there exists \(y\in \mathbb {R}^{n_2}\) such that \((\hat{x},y) \in G_{j}\). Let \(\pi _j\) be the optimal solution of the following recourse problem for each \(j=1,\dots ,r\):

$$\begin{aligned} \max _{\pi } \left\{ \pi ^T \left( h_j - T_j \hat{x} \right) : \pi ^T W_j = q_j, \; \pi \ge 0 \right\} . \end{aligned}$$
(17)

The inequality

$$\begin{aligned} c^T x + \sum _{j=1}^r p_j \pi _j^T \left( h_j - T_j x \right) \le z_\text {UB} \end{aligned}$$
(18)

is valid for SMIP (3).

Proof

Consider a Benders-like decomposition of SMIP with relaxation of second-stage integrality,

$$\begin{aligned} \min _x \quad&c^T x + \sum _{j=1}^r p_j q_j^T \hat{y}_j(x)\\ \text {s.t.} \quad&Ax \ge b, \; x \in X, \end{aligned}$$

where \(\hat{y}_j(x) := \text {argmin} \{ q_j^T y : W_j y \ge h_j - T_j x \}\). This is equivalent to

$$\begin{aligned} \min _x \quad&c^T x + \sum _{j=1}^r p_j \hat{\pi }_j(x)^T (h_j - T_j x)\\ \text {s.t.} \quad&Ax \ge b, \; x \in X, \nonumber \end{aligned}$$

where \(\hat{\pi }_j(x) := \text {argmax} \{ \pi ^T (h_j - T_j x) : \pi ^T W_j = q_j, \; \pi \ge 0\}\). By assumption, there exists a solution \(\pi _j\) of (17) for each \(j=1,\dots ,r\). Because

$$\begin{aligned} c^T x + \sum _{j=1}^r p_j \pi _j^T \left( h_j - T_j x \right)&\le c^T x + \sum _{j=1}^r p_j \hat{\pi }_j(x)^T (h_j - T_j x) \\&\le z_\text {UB} \end{aligned}$$

holds for any feasible x, the inequality (18) is valid for SMIP (3). \(\square \)

The inequalities (16) and (18) are feasibility cuts and optimality cuts, respectively, obtained from a Benders decomposition of the scenario subproblems for each \(j=1,\dots ,r\). These Benders-like cuts are generated by using a relaxation of second-stage integrality. Consequently, the cuts may be loose and are not guaranteed to exclude \(\hat{x}\) or to tighten the objective value. The computational experiments in Sect. 5, however, provide evidence that the inequalities generated by Proposition 2 are effective in practice.
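The cut of Proposition 3 can be sketched analogously, solving each dual recourse LP (17) directly. The probability weights \(p_j\) follow the proof of Proposition 3, feasible recourse at \(\hat{x}\) is assumed for every scenario, and the `scen` data layout is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def optimality_cut(scen, x_hat, c):
    """Build the coefficients of the optimality cut (18) at a candidate x_hat.

    For each scenario the dual recourse LP (17) is solved; the cut is
    returned as (a, a0), to be read as a^T x <= z_UB - a0, with
    a = c - sum_j p_j T_j^T pi_j and a0 = sum_j p_j pi_j^T h_j.
    scen is a hypothetical list of (p_j, q_j, T_j, W_j, h_j) tuples.
    """
    a = np.array(c, dtype=float)
    a0 = 0.0
    for p, q, T, W, h in scen:
        t = h - T @ x_hat
        res = linprog(-t,                  # maximize pi^T (h - T x_hat)
                      A_eq=W.T, b_eq=q, bounds=[(0.0, None)] * len(h))
        pi = res.x
        a -= p * (T.T @ pi)                # from -p_j pi_j^T T_j x
        a0 += p * (pi @ h)                 # from  p_j pi_j^T h_j
    return a, a0
```

As (18) is parameterized by \(z_\text{UB}\), only \((a, a_0)\) need be stored; the right-hand side \(z_\text{UB} - a_0\) tightens automatically as better upper bounds are found.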

Procedure 1 summarizes the proposed cutting-plane procedure for solving the Lagrangian subproblems (8) by adding the valid inequalities (16) and (18). This procedure replaces line 3 of the standard dual decomposition method of Algorithm 2. In other words, Procedure 1 is called in each iteration in order to generate feasibility and optimality cuts (16) and (18).

[Procedure 1 (figure)]

We explain the proposed cutting-plane procedure as follows. The situation stated in Proposition 2 can occur when SMIP does not have relatively complete recourse. Because the inequality (16) is analogous to the feasibility cut of the L-shaped method [6], we call it a feasibility cut. This cut eliminates a candidate first-stage solution \(\hat{x}\) that does not have a feasible recourse for the subproblem. We highlight, however, that the inequality (16) can be added only when the LP relaxation of the subproblem (8) is infeasible. We emphasize that the inequalities (16) and (18) generated in lines 6 and 14, respectively, are added to all the Lagrangian subproblems (lines 8 and 15). In other words, the inequalities generated for one scenario are shared with all the other scenarios. This approach seeks to prevent a given subproblem from visiting a first-stage solution that is infeasible for another subproblem.

The inequality (18) of Proposition 3 is a supporting hyperplane that lower approximates the objective function of SMIP (3). Moreover, the inequality is parameterized by the best-known upper bound \(z_\text {UB}\) and thus can be tightened as better upper bounds are obtained. In other words, the optimality cut seeks to eliminate first-stage solutions that go above a known upper bound. We call the inequality (18) an optimality cut, because it is analogous to the optimality cut of the L-shaped method [6]. We also note that the same optimality cut is used for all the subproblems.

One can easily prove that Procedure 1 terminates in a finite number of steps by showing that only a finite number of feasibility cuts (16) can be generated. This holds because each cut-generation problem (15) has a finite number of bases. Note also that (15) is an LP whose objective function is parameterized by \(\hat{x}\), so the set of bases does not depend on \(\hat{x}\). Consequently, only a finite number of cuts can be generated in the loop of lines 2–12 before a feasible solution is found (i.e., isFeasible = true in line 12).

3.2 Interior-point cutting-plane method for the Lagrangian master problem

The simplex method is an efficient algorithm for solving the Lagrangian master problems (13); this efficiency is due to its warm-starting capabilities [7]. The solutions of the Lagrangian master problem, however, oscillate significantly when the epigraph of the Lagrangian dual function is not well approximated (because many near-optimal solutions of the master problem are present) [21, 43]. The oscillations can in turn slow convergence (we illustrate this behavior in Fig. 6 of Sect. 5). To avoid this situation, we solve the Lagrangian master problems suboptimally by using an interior-point method (IPM). As noted in [43], early termination of an IPM can enable us to find stronger cuts and to avoid degeneracy. The use of an IPM with early termination has been applied in the context of cutting-plane and column-generation methods [21, 23, 25, 43]. Our IPM implementation follows the lines of [43], which uses Mehrotra's primal-dual predictor-corrector method to deliver suboptimal but strictly feasible solutions of the master problem. Our implementation specializes the relative duality gap criterion of [23] to the dual decomposition setting. In addition to satisfying this criterion, the basic requirement for any IPM used is that it return a feasible point for the master problem. From a practical standpoint, we also require that the IPM deliver well-centered iterates (i.e., iterates with well-balanced complementarity products) [22]. This requirement guarantees that the computed solution is an approximate analytic center that is not too close to the boundary of the primal-dual feasible set and ensures that the oscillations of the dual solutions remain relatively small [23]. We also propose a new termination criterion, specific to the dual decomposition setting, that uses upper-bound information to determine whether the IPM should be terminated.

The IPM checks the termination criteria in the following order:

$$\begin{aligned}&\sum _{j=1}^r \theta _j^k \ge z_\text {UB}, \end{aligned}$$
(19a)
$$\begin{aligned}&g_k(\theta ^k,\pi ^k) < \epsilon _\text {IPM}^k. \end{aligned}$$
(19b)

Here, we define the relative duality gap of the primal-dual feasible solution \((\theta ,\lambda ,\pi )\) of the master (13) at iteration k as

$$\begin{aligned} g_k(\theta ,\pi ) := \frac{\sum _{j=1}^r \sum _{l=1}^k \pi _j^l \left( D_j(\lambda ^l) - (H_j x_j^l)^T \lambda ^l \right) - \sum _{j=1}^r \theta _j}{1 + \left| \sum _{j=1}^r \theta _j\right| }. \end{aligned}$$
(20)

This quantity is the difference between the dual and primal objectives of the master problem scaled by the primal objective. We denote \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) as a primal-dual feasible solution of the master (13) obtained at iteration k such that \(g_k(\tilde{\theta }^k,\tilde{\pi }^k) < \epsilon _\text {IPM}^k\) for some duality gap tolerance \(\epsilon _\text {IPM}^k > 0\) or \(\sum _{j=1}^r \tilde{\theta }_j^k \ge z_\text {UB}\) for the current upper bound \(z_\text {UB} < \infty \).

We adjust the tolerance \(\epsilon _\text {IPM}^k\) at each iteration of the dual decomposition procedure. The tolerance \(\epsilon _\text {IPM}^k\) can be made loose when the duality gap of the dual decomposition method is large, and it is updated as follows:

$$\begin{aligned} \epsilon _\text {IPM}^k := \min \left\{ \epsilon _\text {IPM}^{max}, \frac{g_{k-1}(\tilde{\theta }^{k-1},\tilde{\pi }^{k-1})}{\delta } + \frac{\tilde{m}_{k-1} - \sum _{j=1}^r D_j(\tilde{\lambda }^{k-1})}{1 + \left| \tilde{m}_{k-1}\right| } \right\} , \end{aligned}$$
(21)

where \(\tilde{m}_{k-1} := \sum _{j=1}^r \tilde{\theta }_j^{k-1}\), \(\epsilon _\text {IPM}^{max}\) is the maximum tolerance, and the parameter \(\delta > 1\) is the degree of optimality (defined as in [23]). The parameter \(\delta \) controls the tolerance \(\epsilon _\text {IPM}^k\) relative to the duality gap \(g_k(\theta ,\pi )\). This tolerance-control mechanism (21) differs from that defined in [23]. In particular, the tolerance update in [23] depends on a parameter \(\kappa \) that must be defined for each problem instance, whereas (21) requires no such parameter.
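The gap (20) and tolerance update (21) are simple scalar computations; a sketch follows, with the cut multipliers and right-hand sides passed as flat lists (a layout chosen purely for illustration):

```python
def relative_duality_gap(pi, cut_rhs, theta_sum):
    """Relative duality gap (20) of a primal-dual master solution.

    pi and cut_rhs are flat lists over all cuts (j, l); cut_rhs holds
    D_j(lam^l) - (H_j x_j^l)^T lam^l, and theta_sum = sum_j theta_j.
    """
    dual_obj = sum(p * rhs for p, rhs in zip(pi, cut_rhs))
    return (dual_obj - theta_sum) / (1.0 + abs(theta_sum))

def next_tolerance(gap_prev, m_prev, D_prev, eps_max, delta=2.0):
    """Tolerance update (21): gap_prev is g_{k-1}, m_prev is the previous
    master value sum_j tilde-theta_j^{k-1}, D_prev is sum_j D_j(lam^{k-1}),
    and delta > 1 is the degree of optimality.
    """
    return min(eps_max,
               gap_prev / delta + (m_prev - D_prev) / (1.0 + abs(m_prev)))
```

When the master value and \(\sum_j D_j\) coincide (the second term of (21) vanishes), the tolerance contracts by the factor \(1/\delta\), which is precisely the mechanism exploited in Proposition 4.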

At first sight it seems possible that Algorithm 2 may not generate any cut for the Lagrangian master problem because the solution of the master is terminated early. The following propositions show, however, that Algorithm 2 does not stall and eventually generates cutting planes for the Lagrangian master problem or terminates with optimality.

Proposition 4

Let \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) be a feasible suboptimal solution of the master (13) satisfying \(g_{k}(\tilde{\theta }^k,\tilde{\pi }^k) < \epsilon _\text {IPM}^{k}\) at iteration k with \(\epsilon _{\text {IPM}}^k\) defined in (21). If \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) for all \(j=1,\dots ,r\), then \(\epsilon _\text {IPM}^{k+1} \le g_{k}(\tilde{\theta }^k,\tilde{\pi }^k) < \epsilon _\text {IPM}^{k}\).

Proof

Suppose that

$$\begin{aligned} \epsilon _\text {IPM}^{k+1} = \frac{g_{k}(\tilde{\theta }^{k},\tilde{\pi }^{k})}{\delta } + \frac{\tilde{m}_{k} - \sum _{j=1}^r D_j(\tilde{\lambda }^{k})}{1 + \left| \tilde{m}_{k}\right| } < \epsilon _\text {IPM}^{max}. \end{aligned}$$

Because \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) holds for all \(j=1,\dots ,r\), we have \(\tilde{m}_{k} - \sum _{j=1}^r D_j(\tilde{\lambda }^{k}) = \sum _{j=1}^r \left( \tilde{\theta }_j^{k} - D_j(\tilde{\lambda }^{k}) \right) \le 0\) and \(\epsilon _\text {IPM}^{k+1} \le g_{k}(\tilde{\theta }^k,\tilde{\pi }^k) / \delta< g_{k}(\tilde{\theta }^k,\tilde{\pi }^k) < \epsilon _\text {IPM}^{k}\). \(\square \)

Note that \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) in Proposition 4 is a feasible solution of the Lagrangian master problem at any iteration k. Proposition 4 also states that the relative tolerance decreases at each iteration. This implies that a feasible solution at iteration k will not be the same as that at iteration \(k+1\) obtained with the new tolerance \(\epsilon _\text {IPM}^{k+1}\) (even if no cut was generated at iteration k). In fact, the next iteration finds a feasible solution that, if not optimal, is closer to optimality. We also highlight that the proposition requires feasibility of the suboptimal solution of the master problem. This requirement is fulfilled in practice by using an IPM that delivers a feasible solution at each iterate.

We are interested only in feasible solutions satisfying \(\sum _{j=1}^r \tilde{\theta }_j < z_{UB}\) for a given \(z_{UB} < \infty \). Consequently, the IPM can be terminated with a feasible solution \((\tilde{\theta },\tilde{\lambda }, \tilde{\pi })\) whenever \(\sum _{j=1}^r \tilde{\theta }_j \ge z_{UB}\).

Proposition 5

Let \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) be a feasible solution of the master (13) satisfying \(\sum _{j=1}^r \tilde{\theta }_j^k \ge z_\text {UB}\) at iteration k for a given \(z_\text {UB} < \infty \). If \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) for all \(j=1,\dots ,r\), then \(\sum _{j=1}^r D_j(\tilde{\lambda }^k) = z_\text {UB}\).

Proof

Because \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) holds for all \(j=1,\dots ,r\), we have

$$\begin{aligned} z_\text {UB} \le \sum _{j=1}^r \tilde{\theta }_j^k \le \sum _{j=1}^r D_j(\tilde{\lambda }^{k}) \le z_\text {LD}. \end{aligned}$$

Because \(z_\text {LD} \le z_\text {UB}\), we have \(z_\text {LD} = \sum _{j=1}^r D_j(\tilde{\lambda }^{k}) = z_\text {UB}\). \(\square \)

Proposition 5 states that a feasible master solution \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) satisfying \(\sum _{j=1}^r \tilde{\theta }_j^k \ge z_\text {UB}\) at iteration k is an optimal solution of the Lagrangian dual problem if no cut is generated at such a point. The following corollary also suggests that at least one cut is generated for the feasible solution if \(\sum _{j=1}^r \tilde{\theta }_j^k > z_\text {UB}\); that is, in the case that strict inequality holds.

Corollary 1

Let \((\tilde{\theta }^k,\tilde{\lambda }^k,\tilde{\pi }^k)\) be a feasible solution of the master (13) satisfying \(\sum _{j=1}^r \tilde{\theta }_j^k > z_\text {UB}\) at iteration k for given \(\epsilon _\text {IPM}^k\) and \(z_\text {UB}\). There exists some j such that \(\tilde{\theta }_j^k > D_j(\tilde{\lambda }^k)\). \(\square \)

Figure 1 provides a conceptual illustration of the different termination cases for the IPM under the proposed criteria (19). Figure 1a shows the IPM terminated by criterion (19a) prior to satisfying condition (19b). In Fig. 1b, the IPM terminated by criterion (19b), but no cut was generated at the solution. Hence, by decreasing the duality gap tolerance \(\epsilon _\text {IPM}^k\) from Proposition 4, the interior-point solution moves toward an extreme point (see Fig. 1b, c). In Fig. 1c the IPM terminated with the decreased tolerance at the next iteration.

Fig. 1
figure 1

Conceptual illustration of different termination cases of the IPM with the proposed criteria. a Terminated by criterion (19a). b Terminated by criterion (19b), but no cut generation. c Terminated by criterion (19b) with decreased \(\epsilon _\text {IPM}^k\)

We modify the dual decomposition method of Algorithm 2 in order to use the interior-point cutting-plane method with the proposed termination criteria (19) for the master problems (13). Moreover, we solve the Lagrangian subproblems (8) by using Procedure 1. We denote the duality gap tolerance for optimality of the IPM by \(\epsilon _\text {Opt}\); that is, \((\theta ^k,\pi ^k)\) is an optimal solution for the master (13) at iteration k if \(g_k(\theta ^k,\pi ^k) < \epsilon _\text {Opt}\).

figure d
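To make the cut-generation and termination logic of Algorithm 3 concrete, the following is a minimal one-dimensional sketch (illustrative Python, not DSP code): the Lagrangian dual \(D(\lambda)\) is the lower envelope of a few known lines, an exact master solve over a box stands in for the IPM, and the loop stops when the master solution generates no new cut, mirroring the test in Proposition 5.

```python
from itertools import combinations

# Toy Lagrangian dual: D(lam) is the lower envelope of a few lines
# (intercept, slope); concave and piecewise-linear, as in the text.
LINES = [(0.0, 2.0), (3.0, 1.0), (8.0, -1.0), (5.0, 0.5)]

def oracle(lam):
    """'Lagrangian subproblem' solve: evaluate D(lam), return value and active line."""
    val, line = min(((a + b * lam, (a, b)) for a, b in LINES), key=lambda t: t[0])
    return val, line

def solve_master(cuts, lo=-10.0, hi=10.0):
    """Exact 1-D master: max over lam in [lo, hi] of min over cuts of (a + b*lam).
    The optimum lies at a box endpoint or at an intersection of two cuts."""
    cand = [lo, hi]
    for (a1, b1), (a2, b2) in combinations(cuts, 2):
        if b1 != b2:
            lam = (a2 - a1) / (b1 - b2)
            if lo <= lam <= hi:
                cand.append(lam)
    def theta(lam):
        return min(a + b * lam for a, b in cuts)
    lam_star = max(cand, key=theta)
    return theta(lam_star), lam_star

cuts = [oracle(0.0)[1]]          # start with one cut generated at lam = 0
for _ in range(20):
    theta_k, lam_k = solve_master(cuts)
    d_val, new_cut = oracle(lam_k)
    if theta_k <= d_val + 1e-9:  # no cut generated => dual optimal (cf. Proposition 5)
        break
    cuts.append(new_cut)         # exclude the current master solution
print(round(lam_k, 6), round(d_val, 6))
```

For this toy envelope the loop terminates at \(\lambda = 8/3\) with dual value \(16/3\), the maximizer of \(\min(2\lambda,\, 8-\lambda)\).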

Theorem 1

Algorithm 3 terminates after a finite number of iterations with an optimal solution of the Lagrangian dual problem (9).

Proof

We consider the following two cases:

  • Assume that a feasible solution of the master problem is found that satisfies \(\sum _{j=1}^r \tilde{\theta }_j^{k} \ge z_\text {UB}\) for given \(z_\text {UB} < \infty \). We have two subcases:

    • If \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) holds for all \(j=1,\dots ,r\), we have from Proposition 5 that the algorithm terminates with optimality (line 8).

    • Otherwise, from Corollary 1, we must have that the algorithm excludes the current solution by adding cutting-planes (13b) (line 10).

  • Assume that a feasible solution of the master problem is found that satisfies \(g_{k}(\tilde{\theta }^{k}, \tilde{\lambda }^{k}) < \epsilon _\text {IPM}^{k}\). We have two subcases:

    • If \(\tilde{\theta }_j^k \le D_j(\tilde{\lambda }^k)\) holds for all \(j=1,\dots ,r\), we have from Proposition 4 that the algorithm reduces \(\epsilon _\text {IPM}^{k}\) by a factor of \(\delta \) (line 15). An optimal master solution can thus be obtained after a finite number of reductions in \(\epsilon _\text {IPM}^{k}\) (line 17).

    • Otherwise, the algorithm excludes the current solution by adding (13b) (line 20).

Each case shows either a finite termination or the addition of cuts (13b). Because a finite number of inequalities are available for (13b) and because Procedure 1 also terminates in a finite number of steps, the algorithm terminates after a finite number of steps. \(\square \)

4 DSP: an open-source package for SMIP

We now introduce DSP, an open-source software package that provides serial and parallel implementations of different decomposition methods for solving SMIPs. DSP implements the dual decomposition method of Algorithm 3 as well as standard dual decomposition methods (Algorithms 1 and 2) and a standard Benders decomposition method. We note that DSP does not implement a branch-and-bound method to exploit Lagrangian lower bounds (as in [8, 17, 47]). Consequently, our current implementation cannot be guaranteed to obtain an optimal solution for SMIP.

4.1 Software design

We first provide software implementation details and describe different avenues for the user to call the solver. The software design is object-oriented and implemented in C++. It consists of Model classes and Solver classes for handling optimization models and scenario data. In Fig. 2 we illustrate the core classes in DSP using a UML class diagram.

Fig. 2
figure 2

UML (unified modeling language) class diagram for DSP

4.1.1 Model classes

An abstract Model class is designed to define a generic optimization model data structure. The StoModel class defines the data structure for generic stochastic programs, including two-stage stochastic programs and multistage stochastic programs. The underlying data structure of StoModel partially follows the SMPS format. The class also defines core functions for problem decomposition. The derived TssModel class defines the member variables and functions specific to two-stage stochastic programs and their decompositions. Following the design of the model classes, users can derive new classes for their own purposes and efficiently manage model structure provided through several interfaces (e.g., JuMP and SMPS; see Sect. 4.2).

4.1.2 Solver classes

An abstract Solver class is designed to provide different algorithms for solving stochastic programming problems defined in the Model class. DSP implements the TssSolver class to define solvers specific to two-stage stochastic programs. From the TssSolver class, three classes are derived for each method: TssDe, TssBd, and TssDd.

The TssDe class implements a wrapper of external solvers to solve the extensive form of two-stage stochastic programs. The extensive form is constructed and provided by the TssModel class.

The TssBd class implements a Benders decomposition method for solving two-stage stochastic programs with continuous recourse. A proper decomposition of the model is performed and provided by the TssModel class, while the second-stage integrality restriction is automatically relaxed. Depending on the parameters provided, TssModel can produce different forms of the problem decomposition for TssBd. For example, the user can specify the number of cuts added per iteration, which determines the number of auxiliary variables in the master problem of Benders decomposition. Moreover, the Benders master can be augmented for a subset \(\widetilde{\mathcal {J}} \subseteq \mathcal {J} := \{1,\dots ,r\}\) of scenarios to accelerate convergence.

The TssDd class implements the proposed dual decomposition method for solving two-stage stochastic programs with mixed-integer recourse. For this method, an abstract TssDdMaster class is designed to implement methods for updating the dual variables. The subgradient method in Algorithm 1 and the cutting-plane method in Algorithm 2 are implemented in such derived classes. Moreover, a subclass derived from the TssBd is reused for implementing the cutting-plane procedure from Procedure 1. Users can customize this cutting-plane procedure by incorporating more advanced Benders decomposition techniques such as convexification of the recourse function [16, 51, 52, 59]. An \(l_\infty \)-norm trust region is also added to the master problems of Algorithm 2 in order to stabilize the cutting-plane method. The rule of updating the trust region follows that proposed in [35]. Users can also implement their own method for updating the dual variables.
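The class relationships described in this section can be summarized with a skeleton like the following (a Python stand-in for the C++ design; the method bodies and return values are hypothetical placeholders, not the actual DSP API):

```python
from abc import ABC, abstractmethod

class Model(ABC):
    """Generic optimization model data structure."""

class StoModel(Model):
    """Generic stochastic program (two-stage or multistage); partially follows SMPS."""

class TssModel(StoModel):
    """Two-stage stochastic program; provides decomposition-specific data."""

class Solver(ABC):
    @abstractmethod
    def solve(self, model):
        """Solve the given model."""

class TssSolver(Solver):
    """Base class for two-stage stochastic programming solvers."""

class TssDe(TssSolver):
    def solve(self, model):
        # Wraps an external MIP solver applied to the extensive form.
        return "extensive-form"

class TssBd(TssSolver):
    def solve(self, model):
        # Benders decomposition with relaxed second-stage integrality.
        return "benders"

class TssDd(TssSolver):
    def solve(self, model):
        # Dual decomposition (Algorithm 3) with Procedure 1 cut generation.
        return "dual-decomposition"
```

Users extend a leaf class (e.g., a custom master update in the dual decomposition) without touching the shared model-handling code.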

4.1.3 External solver interface classes

DSP uses external MIP solvers to solve subproblems under different decomposition methods. The SolverInterface class is an abstract class for creating interfaces used by the implemented decomposition methods. Several classes are derived from the abstract class in order to support specific external solvers. The current implementation supports three LP solvers (Clp [14], SoPlex [58], and OOQP [20]) and a mixed-integer solver (SCIP [1]). Users familiar with the COIN-OR Open Solver Interface [49] should easily be able to use the SolverInterfaceOsi class to derive classes for other solvers (e.g., CPLEX [29], Gurobi [27]).

4.1.4 Parallelization

The proposed dual decomposition method can be run on distributed-memory and shared-memory computing systems with multiple cores. The implementation protocol is MPI. In a distributed-memory environment, the scenario data are distributed to multiple processors in a round-robin fashion. For the distributed scenario data, each processor solves the Lagrangian subproblems, generates Benders-type cuts, and solves upper-bounding problems (i.e., evaluating feasible solutions) in sequence at each iteration. For example, if a processor received the scenario data for scenarios 1 and 3, it solves the Lagrangian subproblems, cut-generation problems, and upper-bounding problems corresponding to scenarios 1 and 3. The resulting partial information for lower bounds, cuts, and upper bounds is then combined at the master processor (this prevents each processor from loading all the scenario data). Therefore, each iteration solves many MIP problems (i.e., Lagrangian subproblems and upper-bounding problems) and LP problems (i.e., cut-generation problems) in parallel. The root processor also solves the master problem to update the dual variables. By distributing scenario data, we can store and solve large-scale SMIP problems. As long as a single-scenario problem fits in a computing node, the number of scenarios can be increased to as many computing nodes as are available. However, each computing node requires enough memory to store at least one scenario problem in extensive form.
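The round-robin distribution of scenarios described above can be sketched as follows (illustrative only; the function name is ours, not DSP's):

```python
def assign_scenarios(num_scenarios, num_procs):
    """Distribute scenario indices to processors in a round-robin fashion."""
    assignment = {p: [] for p in range(num_procs)}
    for s in range(num_scenarios):
        assignment[s % num_procs].append(s)
    return assignment

# With 2 processors and 5 scenarios, processor 1 receives scenarios 1 and 3
# and solves only the subproblems for those scenarios.
print(assign_scenarios(5, 2))  # {0: [0, 2, 4], 1: [1, 3]}
```

Each processor then iterates over its own list when solving subproblems, cut-generation problems, and upper-bounding problems.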

In a parallel dual decomposition setting, significant amounts of data need to be shared among processors. Shared data include subproblem solutions, dual variables, valid inequalities, best upper bound, and best lower bound. Data are communicated among the processors based on a master-worker framework. That is, the root processor collects and distributes data to all processors in a synchronous manner.

4.2 Interfaces for C, JuMP, and SMPS

The source code of DSP is compiled to a shared object library with C API functions defined in StoCInterface.h. Users can load the shared object library together with their model implementation in order to call the API functions. We also provide interfaces to JuMP [4, 37] and SMPS files [5] via a Julia package Dsp.jl.

The Dsp.jl package exploits the algebraic modeling capabilities of JuMP, which is a Julia-based modeling language. The use of Julia facilitates data handling and the use of supporting tools (e.g., statistical analysis and plotting). To illustrate these capabilities, we consider the two-stage stochastic integer program presented in [8]:

$$\begin{aligned} \min \; \left\{ -1.5 \,x_1 - 4\,x_2 + \sum _{s=1}^3 p_s Q(x_1,x_2,\xi _{1,s},\xi _{2,s}) : x_1,x_2 \in \{0,\dots ,5\}\right\} , \end{aligned}$$

where

$$\begin{aligned} Q(x_1,x_2,\xi _{1,s},\xi _{2,s}) = \min _{y_1,y_2,y_3,y_4} \quad&-16 y_1 - 19 y_2 - 23 y_3 - 28 y_4 \\ \text {s.t.} \quad&2 y_1 + 3 y_2 + 4 y_3 + 5 y_4 \le \xi _{1,s} - x_1 \\&6 y_1 + y_2 + 3 y_3 + 2 y_4 \le \xi _{2,s} - x_2 \\&y_1,y_2,y_3,y_4 \in \{0,1\} \end{aligned}$$

and \((\xi _{1,s},\xi _{2,s}) \in \{(7,7),(11,11),(13,13)\}\), each with probability 1/3. The corresponding JuMP model is given by:

figure e

In the first line of this script we call the required Julia packages: JuMP, Dsp, and MPI. The JuMP model is defined in lines 3 to 14. The Dsp.jl package redefines the functions JuMP.Model and JuMP.solve in order to interface with the DSP solver. The first stage is defined in lines 5 and 6 and the second stage in lines 8 to 12 for each scenario. DSP reads and solves the model in line 14 with optional specifications for the solution method (dual decomposition, :Dual) and the algorithmic parameter file myparam.txt. More details are available in the software repositories. Note that only one line of code is required to invoke the parallel decomposition method. In other words, the user does not need to write any MPI code.
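Because this instance is tiny, its solution can be verified independently by brute-force enumeration of the extensive form. The sketch below (plain Python, not the DSP interface) assumes the all-negative recourse objective \(-16y_1-19y_2-23y_3-28y_4\) of the underlying Carøe–Schultz instance and enumerates every first-stage point and every binary recourse vector:

```python
from itertools import product

# Brute-force check of the small SMIP above: x1, x2 in {0,...,5} and
# four binary y variables per scenario. Sanity check only, not how DSP solves it.
SCENARIOS = [((7, 7), 1 / 3), ((11, 11), 1 / 3), ((13, 13), 1 / 3)]

def recourse(x1, x2, xi1, xi2):
    """Q(x, xi): best second-stage objective over the 16 binary y vectors."""
    best = None
    for y1, y2, y3, y4 in product((0, 1), repeat=4):
        if (2 * y1 + 3 * y2 + 4 * y3 + 5 * y4 <= xi1 - x1
                and 6 * y1 + y2 + 3 * y3 + 2 * y4 <= xi2 - x2):
            obj = -16 * y1 - 19 * y2 - 23 * y3 - 28 * y4
            if best is None or obj < best:
                best = obj
    return best

z_best, x_best = None, None
for x1, x2 in product(range(6), repeat=2):
    z = -1.5 * x1 - 4 * x2 + sum(p * recourse(x1, x2, *xi) for xi, p in SCENARIOS)
    if z_best is None or z < z_best:
        z_best, x_best = z, (x1, x2)

print(x_best, round(z_best, 4))
```

For these three scenarios the enumeration reports the optimal first stage \(x=(1,5)\) with objective value \(\approx -71.17\).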

In the script provided we solve the problem using the dual decomposition method described in Sect. 3 (line 14). Arguments can be passed to the function solve to specify the desired solution method. For instance, if the user wishes to solve SMIP with continuous second-stage variables using Benders decomposition, then line 14 is replaced with

figure f

Note that this finds an optimal solution when the second stage has continuous variables only (otherwise it provides a lower bound). If the user desires to find an optimal solution for the extensive form of SMIP, then line 14 is replaced with

figure g

DSP can also read a model provided as SMPS files. In this format, a model is defined by three files – core, time, and stochastic – with file extensions of .cor, .tim, and .sto, respectively. The core file defines the deterministic version of the model with a single reference scenario, the time file indicates a row and a column that split the deterministic data and stochastic data in the constraint matrix, and the stochastic file defines random data. The user can load SMPS files and call DSP using the Julia interface as follows.

figure h

Remark 3

The DSP software package and the Julia packages (JuMP and Dsp.jl) are under active development. The up-to-date syntax and features are available in the project repositories.

5 Computational experiments

We present computational experiments to demonstrate the capabilities of DSP. We solve publicly available test problem instances (Sect. 5.1) and a stochastic unit commitment problem (Sect. 5.2). All experiments were run on Blues, a 310-node computing cluster at Argonne National Laboratory. Blues has a QLogic QDR InfiniBand network, and each node has two octo-core 2.6 GHz Xeon processors and 64 GB of RAM.

5.1 Test problem instances

We use dcap and sslp problem instances from the SIPLIB test library available at http://www2.isye.gatech.edu/~sahmed/siplib/. Characteristics of the problem instances are described in Table 1. The first column of the table lists the names of the problem instances, in which r is substituted by the number of scenarios in the second column. The other columns present the number of rows, columns, and integer variables for each stage, respectively. Note that the dcap instances have a mixed-integer first stage and a pure-integer second stage, whereas the sslp instances have a pure-integer first stage and a mixed-integer second stage.

Table 1 Characteristics of the SIPLIB instances

5.1.1 Benchmarking dual-search strategies

We experiment with different methods for updating the dual variables: DDSub of Algorithm 1, DDCP of Algorithm 2, and DSP of Algorithm 3. In this experiment, DDSub initializes parameters \(\beta _0 = 2\) and \(\gamma ^\text {max} = 3\) for the step-size rule and terminates if one of the following conditions is satisfied: (i) the optimality gap, \((z_\text {UB} - z_\text {LB}) / |10^{-10} + z_\text {UB}|\), is less than \(10^{-5}\); (ii) \(\beta _k \le 10^{-6}\); (iii) the solution time exceeds 6 hours; or (iv) the number of iterations reaches \(10^4\). DDCP solves the master problem by using the simplex method implemented in SoPlex-2.0.1 [58], taking advantage of warm-start information from the previous iteration. DSP solves the master problem by using Mehrotra’s predictor–corrector algorithm [42] implemented in OOQP-0.99.25 [20]. In each iteration, the three methods evaluate no more than 100 primal solutions for upper bounds \(z_\text {UB}\). Each Lagrangian subproblem is solved by SCIP-3.1.1 with SoPlex-2.0.1. All methods use the initial dual values \(\lambda ^0 = 0\). This experiment was run on a single node with 16 cores. Appendix A provides the computational results obtained with FiberSCIP [54] for the extensive form of the test instances.

Fig. 3
figure 3

Optimality gap obtained with different dual decomposition methods

Figure 3 presents the optimality gap obtained by different dual decomposition methods for each SIPLIB instance. DDCP and DSP found optimal lower bounds \(z_\text {LD}\) for all the test problems, whereas DDSub found only suboptimal values. DSP also found tighter upper bounds for instances dcap233_500, dcap332_200, dcap332_500, and dcap342_200 than did DDCP. Moreover, DDCP returned larger optimality gaps than did DSP for the sslp_10_50_1000 and sslp_10_50_2000 instances, since it did not terminate after 6 hours of solution time (see Table 2).

Table 2 Computational results for SIPLIB instances using different dual decomposition methods
Fig. 4
figure 4

Solution time and number of iterations ratios (relative to those obtained with DSP)

Figure 4 presents the solution time and the number of iterations relative to those from DSP for each SIPLIB instance. DSP reduced the solution time and the number of iterations (relative to DDSub) by factors of up to 199 and 78, respectively. DSP reduced the solution time and number of iterations (relative to DDCP) by factors of up to 7 and 5, respectively. DDSub terminated earlier than DSP for instances dcap_332_x and dcap_342_x because of the termination criterion \(\beta _k < 10^{-6}\); for these instances, it provided poor-quality lower bounds. DDSub did not terminate within the time limit of 6 hours for instance sslp_15_45_15. Detailed numerical results are given in Table 2.

Fig. 5
figure 5

Dual objective values and best lower bounds for SIPLIB instance sslp_15_45_15 under different dual search methods

Figure 5 shows the Lagrangian dual values obtained in each iteration for sslp_15_45_15 by using DDSub, DDCP, and DSP. The lower bound is \(-275.7\) at iteration \(k=0\). Fewer oscillations in the lower bounds and a faster solution were obtained with DSP compared with the standard cutting-plane method DDCP. Figure 6 presents the Euclidean distance between the dual variable values at consecutive iterations (\(\Vert \lambda ^{k+1}-\lambda ^k\Vert _2\)). The figure is truncated at iteration 100. As can be seen, the DSP dual updates oscillate less than the DDCP updates. The DDSub updates seem stable, but this is because of slow progress in the dual variables. These results highlight the benefits gained by the use of the IPM and early termination.

Fig. 6
figure 6

Evolution of dual variables under different methods applied to sslp_15_45_15

Fig. 7
figure 7

Improvement in optimality gaps obtained with DSP over Benders decomposition

5.1.2 Impact of second-stage integrality

We now use DSP to analyze the impact of relaxing recourse integrality in Benders decomposition (this approach is often used as a heuristic). Figure 7 shows the improvement in the optimality gap (i.e., the Benders optimality gap minus the DSP optimality gap). Upper bounds for Benders decomposition were calculated by evaluating the first-stage solutions obtained from the relaxation of the recourse function. For the dcap instances, Benders decomposition approximates lower bounds poorly by relaxing the second-stage integrality. The largest gap, 86%, was obtained for dcap332_500 with Benders decomposition, compared with 0.32% obtained with dual decomposition. Benders solutions for sslp_15_45_x and sslp_10_50_x have duality gaps \(> 0.2 \%\), whereas the dual decomposition solutions are optimal for all problem instances.

5.1.3 Scaling experiments

We present parallel scalability results for DSP running with up to 200 computing cores. We use the sslp_10_50_r instances, where r represents the number of scenarios used (50, 100, 500, 1000, and 2000). We also present scaling profiles for solution times spent on different algorithmic steps and tasks for the sslp_10_50_2000 instance. Figure 8 shows strong scaling performance results of DSP. We define the speedup as the ratio of the solution time with five cores to the solution time with N cores. We observe that DSP scales up as the number of cores increases for all the instances. As expected, scaling efficiency is lower for instances with fewer scenarios because subproblem solutions become less dominant. Figure 9 itemizes time spent on each of the algorithmic steps in addition to the MPI communication tasks. We see that the time spent solving lower- and upper-bounding problems dominates. We note that the cut generation times are only a small fraction. We also observe that the master solution time and communication time increase with the number of cores, as expected.
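The speedup and parallel-efficiency arithmetic relative to the five-core baseline can be written as follows (a sketch with hypothetical timings, not measured DSP results):

```python
def scaling_metrics(times, base=5):
    """Strong-scaling speedup S(N) = T(base)/T(N) and parallel efficiency
    E(N) = S(N)/(N/base), relative to the `base`-core run."""
    t_base = times[base]
    return {n: {"speedup": t_base / t, "efficiency": (t_base / t) / (n / base)}
            for n, t in times.items()}

# Hypothetical wall-clock times in seconds, keyed by core count.
metrics = scaling_metrics({5: 120.0, 10: 65.0, 20: 36.0})
print(metrics)
```

Efficiency below 1 indicates the usual strong-scaling losses (master solves and communication that do not parallelize across scenarios).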

Fig. 8
figure 8

Strong scaling results of DSP for the sslp_10_50_r instances with 5, 10, 25, 50, 100, and 200 cores, where \(r \in \{50,100,500,1000,2000\}\)

Fig. 9
figure 9

Parallel scaling profile of DSP for sslp_10_50_2000 instance with 5, 10, 25, 50, 100, and 250 cores

5.2 Large-scale stochastic unit commitment

We test the performance of different methods using a large-scale day-ahead stochastic unit commitment model. In this model, thermal power generators are scheduled over a day. The schedules are subject to uncertainty in wind power. We use a modified IEEE 118-bus system with 54 generators, 118 buses, and 186 transmission lines provided in [34]. We assume that 17 of the 54 generators are allowed to start on demand (second stage), whereas the other generators should be scheduled in advance (first stage). We also consider 3 identical wind farms, each consisting of 120 wind turbines. The demand load is 3,095 MW on average, with a peak of 3,733 MW. The wind power generation level is 494 MW on average, with a peak of 916 MW for the 64 scenarios generated. Figure 10 shows the 64 scenarios (grey lines) of wind power generation and the mean levels (red lines). We used real wind speed data predicted from the observations of 31 weather stations in Illinois.

Fig. 10
figure 10

Wind generation scenarios used in stochastic unit commitment model

The mathematical formulation and the JuMP modeling script of the two-stage stochastic unit commitment model are provided in Appendices B, C, and D.

Table 3 Sizes for stochastic unit commitment instances
Table 4 Numerical results for stochastic unit commitment problem under DSP

Table 3 presents the size of the stochastic unit commitment instances with 4, 8, 16, 32, and 64 scenarios. The first stage has 10,727 constraints and 2,592 variables including 864 integer variables; the second stage has 27,322 constraints and 9,072 variables including 432 integer variables. Table 4 summarizes the numerical results. Each instance uses the same number of computing cores as scenarios. We add only one aggregated cut to the master problem at each iteration (see Remark 1). DSP found upper and lower bounds with \(< 0.01\%\) optimality gap for the 4-, 8-, 16-, and 32-scenario instances. Most notably, DSP terminated after the first iteration for the 4-, 8-, and 16-scenario instances because of the addition of valid inequalities in Procedure 1 to the Lagrangian subproblems. This fast convergence behavior becomes evident in Sect. 5.2.1, where we compare the LB and Initial LB columns for the instances terminating in one iteration. We observe, however, that the solution time per iteration increases as the number of scenarios increases. The reason is that the method generates more valid inequalities in Procedure 1 and evaluates more solutions to update upper bounds, causing imbalanced computing load among scenarios. In particular, each node needs to evaluate the recourse function for its local first-stage variables and for all scenarios in order to determine the best possible upper bound. This step is currently done sequentially, and its time can be reduced by considering additional computing nodes. The master problem solution time was less than 2 seconds per iteration and thus does not represent a bottleneck.

5.2.1 Impact of procedure 1

For the stochastic unit commitment with 8 scenarios, we use DDSub, DSP without Procedure 1 (DSP-P1), and DSP with Procedure 1 (DSP+P1). Figure 11 shows the best upper bound and the best lower bound of the 8-scenario stochastic unit commitment problem at each iteration. As can be seen, DSP+P1 obtained upper and lower bounds with \(<0.01\%\) duality gap at the first iteration and terminated, whereas DSP-P1 and DDSub were not able to find upper bounds for the first 53 iterations and the first 47 iterations, respectively, because the problem does not have relatively complete recourse. Moreover, DSP+P1 found tighter lower bounds than did DDSub and DSP-P1, because of its ability to tighten the subproblems by Procedure 1. The figure also shows that DSP-P1 still found better lower and upper bounds than did DDSub.

Fig. 11
figure 11

Upper bounds and lower bounds of the stochastic unit commitment problem with 8 scenarios, obtained with DSP with and without Procedure 1 and with the subgradient method

Table 5 reports the lower and upper bounds obtained at the first iteration by DSP-P1 and DSP+P1. Procedure 1 improves the lower bound by up to 0.7% by adding valid inequalities. More importantly, it allowed us to find upper bounds at the first iteration, which then can be used in the termination criterion (19a) of the interior-point cutting-plane method.

Table 5 Upper bounds and lower bounds of the stochastic unit commitment problem resulting from DSP with and without Procedure 1 at the first iteration

Table 6 presents detailed computational results for four methods: DDCP-P1 (Algorithm 2 without Procedure 1), DDCP+P1 (Algorithm 2 with Procedure 1), DSP-P1, and DSP+P1 (as defined previously). These results seek to highlight the positive impact of using the interior-point cutting-plane method and Procedure 1. For the 4-, 8-, and 16-scenario instances, both DDCP+P1 and DSP+P1 found upper and lower bounds with \(<0.01\%\) optimality gaps. However, without Procedure 1, DDCP-P1 did not terminate within the 6-hour time limit for any of the instances, and no upper bound was found for the 4- and 8-scenario instances. The reason is that the problem does not have relatively complete recourse. DSP-P1 did not terminate within the time limit for all the instances except the 4-scenario instance. Note that DDCP+P1 and DSP+P1 are exactly the same method when they terminate at the first iteration, as for the 4-, 8-, and 16-scenario instances. For the 32-scenario instance, DSP+P1 terminated with \(<0.01\%\) optimality gap, whereas DDCP+P1 took more iterations and reached the time limit with a larger gap. For the 64-scenario instance, DSP+P1 found a better lower bound than did DDCP+P1; however, neither terminated within 6 hours. We attribute this behavior to the fact that the interior-point cutting-plane method computed lower bounds more effectively and ameliorated dual oscillations.

Table 6 Computational results from DDCP (Algorithm 2) and DSP (Algorithm 3) with and without Procedure 1

5.2.2 Extensive form solutions

We conclude this section by reporting the computational results from solving the extensive form of the stochastic unit commitment problems. The results are summarized in Table 7. The extensive form of each instance is solved by SCIP on a single node with a single core. For all the instances (except the 4-scenario instance), with a 6-hour time limit, the upper and lower bounds obtained from the extensive forms were not better than those obtained with DSP. We also tested ParaSCIP [53] for solving the extensive forms, but the solver ran out of memory in all instances.

Table 7 Numerical results for the extensive form of the stochastic unit commitment problems

6 Summary and directions of future work

We have provided algorithmic innovations for the dual decomposition method. Our first innovation is a procedure to generate valid inequalities that exclude infeasible first-stage solutions in the absence of relatively complete recourse and that tighten subproblem solutions. Hence, we incorporate Benders-type cut generation within the dual decomposition method. Our second innovation is an interior-point cutting-plane method with early termination criteria to solve the Lagrangian master problem. We proved that the dual decomposition method incorporating our innovations converges in a finite number of iterations to an optimal solution of the Lagrangian dual problem. We have introduced DSP, a software package that provides implementations of the proposed dual decomposition method as well as other standard methods such as Benders decomposition and a subgradient-based dual decomposition method. DSP also allows users to specify large-scale models in C, Julia, and SMPS formats. The object-oriented design of the implementation enables users to customize decomposition techniques. We used DSP to benchmark different methods on standard problem instances and stochastic unit commitment problems. The numerical results show significant improvements in terms of the quality of upper bounds, number of iterations, and time for all instances. We also verified that the solver can achieve strong scaling.

We will seek to scale our method to solve problems with a larger number of scenarios. In particular, we have observed load imbalances because each Lagrangian subproblem is a large mixed-integer program with unpredictable solution times. Such imbalances can be alleviated by an asynchronous parallel implementation that enables early subproblem termination. Moreover, solving the master problem can become a bottleneck as cuts are accumulated. A warm-start technique for the primal-dual interior-point method [22, 24] and a parallelization of the master problem as proposed in [38] can ameliorate such effects. We will address these issues in future work.