1 Introduction

We study algorithms for finding global solutions of nonconvex nonlinear programs (NLPs) arising in two-stage stochastic programming. Our work is motivated by the observation that the direct application of spatial branch and bound (BB) techniques (such as those implemented in popular packages like BARON [21], ANTIGONE [18], and SCIP [17]) does not scale well with the number of scenarios because branching may be performed on all variables (which include both first-stage and second-stage variables). Several approaches have previously been proposed to exploit the structure of stochastic programs in order to achieve better scalability. One class of methods is based on direct application of generalized Benders decomposition (GBD) [10], which solves a sequence of master problems to generate lower bounds and primal problems to generate upper bounds. The master problems are obtained from outer approximation and the primal problems are obtained by fixing the first-stage variables at candidate values. Convergence of GBD is not guaranteed for nonconvex problems. The work in [16] proposes a nonconvex GBD scheme in which a lower bound is generated by solving a convexified problem with GBD and an upper bound is generated by fixing the first-stage variables and solving the resulting scenario subproblems to global optimality. Finite termination of this approach can be guaranteed if the first-stage variables are all bounded integers.

Another class of methods is based on Lagrangian relaxation (LR) [8, 11, 15], in which Lagrangian relaxation is used to generate lower bounds within a BB framework. The methods reported in [5] and [14] use this strategy to solve stochastic MILPs and stochastic MINLPs, respectively. Lagrangian relaxation has also been used to develop reduced-space search approaches that only branch on complicating variables. The approaches reported in the literature, however, do not provide convergence guarantees for general two-stage stochastic problems.

Reduced-space BB approaches with convergence guarantees have been reported for special problem classes. The approach proposed in [6] provides a reduced-space BB scheme for solving partially convex problems (i.e., problems that are convex when a subset of variables are fixed). The authors provide a proof of convergence when branching only in the space of variables that induce nonconvexity. The approaches proposed in [7] and [9] are also reduced-space BB schemes that can be used to solve partially convex problems.

In this paper we introduce a tailored reduced-space BB scheme for general two-stage stochastic NLPs. At each node of the BB scheme, a lower bound is constructed by relaxing the so-called non-anticipativity constraints and an upper bound is constructed by fixing the first-stage variables at a candidate solution and solving the scenario subproblems. The proposed lower bound is a special case of Lagrangian relaxation with the dual variables fixed at zero. A key advantage of the proposed approach is that both the lower and upper bounding problems decompose into scenario subproblems that are solved independently to global optimality. Another key property of this approach is that we only need to perform spatial branching on the first-stage variables to guarantee convergence. The algorithm also exploits the fact that the gap between the upper and lower bounding problems is the so-called expected value of perfect information, which is usually small in applications (relative to the overall magnitude of the objective). Notably, our convergence results also hold for two-stage stochastic MINLPs with mixed-integer first-stage variables and continuous second-stage variables. We provide a software implementation of the algorithm that we call SNGO. This is a Julia-based package that is interfaced to the modeling language Plasmo.jl, which facilitates model processing. Our implementation contains typical algorithmic features of global optimization solvers such as convexification, outer approximation, feasibility-based bound tightening (FBBT), optimality-based bound tightening (OBBT), and local search. SNGO also exploits lower bounds obtained from linear programming (LP) relaxations.

The paper is organized as follows: Sect. 2 introduces basic nomenclature and lower/upper bounding problems. Section 3 introduces the BB algorithm and provides a convergence proof. Section 4 provides implementation details for this algorithm. Section 5 illustrates numerical performance in a stochastic programming formulation for controller tuning, a parameter estimation formulation for microbial community models, and stochastic versions of GLOBALLib instances. The paper closes in Sect. 6 with final remarks and directions of future work.

2 Basic nomenclature and setting

We consider two-stage stochastic programs of the form:

$$\begin{aligned} z = \min \limits _{x\in X_0}&\; \sum _{s\in \mathcal {S}} Q_s(x). \end{aligned}$$
(2.1)

Here, \(\mathcal {S}:=\{1 , \ldots , S\}\) is the scenario set, \(x \in X_0 \subset \mathbb {R}^{n_x}\) are the first-stage variables, \(X_0:=\{x| x_l \le x \le x_u \}\) is a closed set, and \(Q_s (x)\) is the optimal value of the second-stage problem:

$$\begin{aligned} Q_s(x) = \min \limits _{y_s}&\; f_s(x, y_s) \\ \mathrm{s.t.} \;\;&g_{s,j} (x, y_s) \le 0, \;\; j =1,\ldots ,m_s. \end{aligned}$$

Here, \(y_s \in \mathbb {R}^{n_{y_s}}\) are the second-stage variables. The scenario objectives \(f_s: \mathbb {R}^{n_{x}} \times \mathbb {R}^{n_{y_s}} {\rightarrow } \mathbb {R}\) and constraints \(g_{s,j}: \mathbb {R}^{n_{x}} \times \mathbb {R}^{n_{y_s}} {\rightarrow } \mathbb {R}\) are assumed to be continuous and potentially nonconvex. Equality constraints such as \(c_{s}(x, y_s)= 0\) can be reformulated by using pairs of inequalities. We define the feasible set for the recourse variables as \(Y_s(x) := \{y_s | g_{s,j} (x, y_s) \le 0, j =1,\ldots ,m_s\}\). The recourse function \(Q_s (\cdot )\) implicitly defines a feasible set for x, given by \(K_s := \{x \;|\; \exists y_s\in Y_s(x)\}\). If \(\not \exists y_s\in Y_s(x)\) for some x, we set \( Q_s (x) = \infty \). We denote the relative interior of a set X as \(\text {relint}(X)\) and use \(\delta (X)\) to denote the diameter of X. In our context, the diameter of the box set \(\{x\,|\, x_l \le x \le x_u \}\) is \(\delta (X) = ||x_u - x_l ||_\infty \).

The feasible set defined by all second-stage subproblems is denoted as \(K = \bigcap \limits _{s \in \mathcal {S}} K_s\). Consequently, the feasible region of the first-stage variables x is \(X_0 \cap K\). We make the following blanket assumptions:

Assumption 1

The set K is compact and \(X_0 \cap K\) is nonempty.

Assumption 2

The feasible sets \(Y_s(x)\) are compact for all \(x \in X_0\) and \(s\in \mathcal {S}\).

Assumption 2 implies that \(Q_s (x)\) is lower semicontinuous in x for all \(s\in \mathcal {S}\) (see Theorem 35 in [3]). This also implies that the function \(Q(x):=\sum \limits _{s \in \mathcal {S}} Q_s(x)\) is lower semicontinuous in x. Assumption 1 ensures that problem (2.1) attains its minimum according to the generalized Weierstrass theorem.

At each node in a BB algorithm we solve the following problem with respect to the partition set \(X \subseteq X_0\):

$$\begin{aligned} z(X) = \min \limits _{x\in X}&\; \sum _{s\in \mathcal {S}} Q_s(x) \end{aligned}$$
(2.2)

We refer to this problem as the primal node problem. This problem can be written in the extensive form:

$$\begin{aligned}&\min \limits _{x\in X, y_s} \; \sum _{s\in \mathcal {S}} \,f_s(x, y_s) \end{aligned}$$
(2.3a)
$$\begin{aligned}&\mathrm{s.t.} \;\; g_{s,j} (x, y_s) \le 0, \;\; j =1,\ldots ,m_s, s\in \mathcal {S}. \end{aligned}$$
(2.3b)

We can also lift the node problem (2.2) by replicating the first-stage variables across scenarios and enforcing non-anticipativity constraints. This gives the lifted problem:

$$\begin{aligned} \min \limits _{x_s\in X}&\; \sum _{s\in \mathcal {S}} Q_s(x_s) \end{aligned}$$
(2.4a)
$$\begin{aligned} \mathrm{s.t.} \;\;&x_s = x_{s+1}, \;\; s = 1,\ldots ,S-1. \end{aligned}$$
(2.4b)

It is easy to verify that problems (2.2), (2.3), (2.4) are equivalent.

2.1 Lower bounding problem

If the non-anticipativity constraints of (2.4) are removed, we obtain a lower bounding problem of the form:

$$\begin{aligned} \beta (X) := \min \limits _{x_s\in X}&\; \sum _{s\in \mathcal {S}} Q_s(x_s). \end{aligned}$$
(2.5)

Clearly, the lower bounding problem can be decomposed into S subproblems of the form:

$$\begin{aligned} \beta _s(X) := \min \limits _{x_s\in X}&\; Q_s(x_s), \end{aligned}$$
(2.6)

or, equivalently:

$$\begin{aligned} \beta _s(X) = \min \limits _{x_s\in X, y_s}&\; f_s({x_s}, y_s) \nonumber \\ \mathrm{s.t.}\; \;\;&g_{s,j} (x_s, y_s) \le 0\;, j =1,\ldots ,m_s, \end{aligned}$$
(2.7)

with \(\beta (X) =\sum \limits _{s \in \mathcal {S}} \beta _s(X)\). We have that \(\beta (X) \le z(X)\) because the feasible region of (2.4) is a subset of the feasible region of (2.5). Moreover, we have that \(\beta (X_1) \ge \beta (X_2)\) for \(X_1 \subset X_2\). The lower bounding problem is also called the wait-and-see problem, and the gap between the primal problem (2.2) and the lower bounding problem (2.5) is called the expected value of perfect information (EVPI). We highlight that \(\beta _s(X)\) is obtained by solving the scenario subproblems to global optimality. We also note that, if \(\beta _s(X) = \infty \) for some \(s\in \mathcal {S}\), then \(z(X) = \infty \); in other words, if a subproblem is infeasible, the primal node problem is infeasible.
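The scenario-wise decomposition of the lower bound can be sketched as follows. This is a minimal Python sketch, not part of SNGO; `solve_scenario_lb` is a hypothetical oracle standing in for a global solve of subproblem (2.7):

```python
import math

def lower_bound(X, scenarios, solve_scenario_lb):
    """Compute beta(X) = sum_s beta_s(X) by solving each scenario
    subproblem independently over the same box X.

    solve_scenario_lb(s, X) is a hypothetical global-solver call that
    returns beta_s(X), or math.inf when scenario s is infeasible on X."""
    total = 0.0
    for s in scenarios:
        beta_s = solve_scenario_lb(s, X)
        if math.isinf(beta_s):
            # one infeasible scenario implies the node problem is infeasible
            return math.inf
        total += beta_s
    return total
```

Because the subproblems share no variables, they can be dispatched to independent solver instances; the sum of their optimal values is the wait-and-see bound.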

The lower bounding problem can be seen as the first iteration of Lagrangian relaxation (obtained by dualizing the non-anticipativity constraints and initializing the dual variables at zero). In principle, it is possible to perform multiple Lagrangian relaxation iterations to update the dual variables (e.g., by using subgradient schemes) and obtain a tighter lower bound. Unfortunately, subgradient schemes are difficult to tune and their performance is problem-dependent. The approach that we present in this work does not require computation and updates of dual variables.

2.2 Upper bounding problem

An upper bound for the stochastic program can be obtained by fixing the first-stage variables at a candidate solution \(\hat{x} \in X\). The upper bound is denoted as \(\alpha (X) := \sum \limits _{s \in \mathcal {S}} Q_s(\hat{x})\); it can be computed by solving S subproblems with optimal values \(\alpha _s(X)=Q_s(\hat{x})\) and setting \(\alpha (X) =\sum \limits _{s \in \mathcal {S}} \alpha _s(X)\). In our proposed scheme, these subproblems are also solved to global optimality. It is easy to see that \(z(X) \le \alpha (X)\) holds for any \(\hat{x}\in X\). We highlight that a classic BB scheme obtains an upper bound by solving the extensive form (2.3) to local optimality. We note that this approach requires solving a coupled problem, while our approach requires solutions of decoupled scenario subproblems to global optimality. In Sect. 4 we discuss implementations that allow for the possibility of using a local solver to solve the extensive form and possibly obtain a better upper bound. In the convergence analysis that follows we assume that the lower and upper bounding problems are solved exactly to global optimality. In Sect. 4 we discuss implementation strategies to set proper solution tolerances.

3 Convergence of branch and bound scheme

This section establishes convergence for a BB scheme constructed using the proposed lower and upper bounding problems. We outline the BB scheme as follows:

figure b

A key feature of the proposed BB scheme is that it only branches on the first-stage variables (because the recourse variables are handled implicitly in the evaluation of the recourse functions).
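For concreteness, the overall scheme can be sketched as follows. This is a minimal Python sketch, not the SNGO implementation; `lower_bound`, `candidate`, `upper_bound`, and `branch` are hypothetical oracles for the operations described above, and nodes are boxes over the first-stage variables only:

```python
import heapq
import math

def branch_and_bound(X0, lower_bound, candidate, upper_bound, branch, eps=1e-2):
    """Reduced-space BB sketch:
      lower_bound(X) -> beta(X)     (wait-and-see bound, Sect. 2.1)
      candidate(X)   -> x_hat in X  (e.g., from the lower-bounding solutions)
      upper_bound(x) -> Q(x_hat)    (scenario subproblems at fixed x, Sect. 2.2)
      branch(X)      -> children    (exhaustive subdivision of the box X)"""
    best_ub = math.inf
    heap = [(lower_bound(X0), 0, X0)]  # bound-improving selection: lowest beta first
    tie = 0
    while heap:
        beta, _, X = heapq.heappop(heap)
        if beta >= best_ub - eps:       # prune: node cannot improve by more than eps
            continue
        best_ub = min(best_ub, upper_bound(candidate(X)))
        for Xc in branch(X):
            b = lower_bound(Xc)
            if b < best_ub - eps and not math.isinf(b):
                tie += 1                # unique tiebreaker for heap ordering
                heapq.heappush(heap, (b, tie, Xc))
    return best_ub
```

Branching acts only on the first-stage box; the recourse variables never appear in the node representation because they are handled inside the bounding oracles.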

The BB scheme can be viewed as a rooted tree, where \(X_0\) is the root node at level 0 and \(X_{k_q}\) denotes a node at level q explored at iteration \(k_q\). An arc connects a node \(X_{k_q}\) at level q with one of its children \(X_{k_{q+1}}\) at level \(q+1\). In other words, \(X_{k_{q+1}}\) is a partition element of \(X_{k_q}\) satisfying \(X_{k_{q+1}} \subset X_{k_q}\). A path in the tree from the root node corresponds to a sequence \(\{X_{k_q}\}\) of successively refined partition elements.

It is easy to see that the sequence \(\{\alpha _k\}\) is monotonically nonincreasing and that \(\{\beta _k\}\) is monotonically nondecreasing. The BB scheme is said to be convergent if \(\lim \limits _{k\rightarrow \infty } \alpha _k = \lim \limits _{k \rightarrow \infty } \beta _k =z\). If the scheme is convergent, then it produces a global \(\epsilon \)-optimal solution in a finite number of steps. We now proceed to prove convergence; to do so, we adapt basic results from Chapter IV of the seminal work in [12] (Definitions IV.6, IV.7, IV.8, IV.10, Theorem IV.3, and Corollary IV.1) to our context.

Lemma 1

If a BB procedure is infinite, then it generates at least one infinitely decreasing sequence \(\{X_{k_q}\}\) of successively refined partition elements, \(X_{k_{q+1}} \subset X_{k_q}\) [12].

Definition 1

A subdivision is called exhaustive if \(\lim \limits _{q\rightarrow \infty } \delta (X_{k_q}) = 0\), for all decreasing subsequences \(X_{k_q}\) generated by the subdivision [12].

A subdivision can be guaranteed to be exhaustive on \(X_0\) if a first-stage variable \(x_i\) whose range attains the diameter of X is selected for partitioning.
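This subdivision rule can be sketched as follows, assuming boxes are represented by lower and upper bound vectors (a hypothetical helper, not part of SNGO):

```python
def bisect_longest_edge(x_l, x_u):
    """Split the box [x_l, x_u] along a coordinate attaining its diameter
    (a longest edge).  Always splitting a longest edge makes the
    subdivision exhaustive: the diameter of every nested sequence of
    children converges to zero."""
    # index of a coordinate with the largest range
    i = max(range(len(x_l)), key=lambda j: x_u[j] - x_l[j])
    mid = 0.5 * (x_l[i] + x_u[i])
    left = (list(x_l), x_u[:i] + [mid] + x_u[i + 1:])
    right = (x_l[:i] + [mid] + x_l[i + 1:], list(x_u))
    return left, right
```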

In a BB scheme, a “delete by infeasibility” rule is used to delete infeasible partition sets X (i.e., sets X with \(X \cap K = \emptyset \)). For example, if the lower bounding problem is infeasible, the node problem is infeasible and the partition set can be deleted from further consideration.

Definition 2

The “delete by infeasibility” rule throughout a BB procedure is called certain in the limit if, for every infinite decreasing sequence \(X_{k_q}\) of successively refined partition elements with limit \(\bar{X}\), we have \(\bar{X} \cap K \ne \emptyset \) [12].

Lemma 2

Given an exhaustive subdivision, the “delete by infeasibility” rule is certain in the limit.

Proof

Under an exhaustive subdivision, \(X_{k_q}\) eventually collapses to a single point \({\bar{x}}\) and we thus have that \(\bar{X} = \{\bar{x}\}\). Assume by contradiction that there exists a sequence \(X_{k_q}\) converging to an infeasible point \(\bar{x}\). Since \(\bar{x}\) is infeasible and \(\bar{x} \in X_{k_q} \subseteq X_0\), we have that \(\bar{x} \notin K\). Consequently, there is at least one set \(K_i\) satisfying \(\bar{x} \notin K_i\). By the compactness of \(K_i\), the distance between \(\bar{x}\) and \(K_i\) is nonzero and there is a ball around \(\bar{x}\), denoted as \(B_r(\bar{x}) = \{x|\Vert x-\bar{x}\Vert \le r\}\), satisfying \(B_r(\bar{x}) \cap K_i = \emptyset \). Since \(\lim \limits _{q\rightarrow \infty } \delta (X_{k_q}) = 0\), there is a \(q_0\) such that \(X_{k_q} \subset B_r(\bar{x}), \forall q \ge q_0\). At that point, \(X_{k_{q_0}} \cap K_i = \emptyset \), which implies that \(\beta _i(X_{k_{q_0}} ) = \infty \). Consequently, the infeasible set will be detected and deleted. Hence, it is impossible that \(X_{k_q}\) converges to an infeasible point without being detected by the “delete by infeasibility” rule. \(\square \)

Definition 3

A lower bounding operation is called strongly consistent if, at every iteration, any undeleted partition set can be further refined and if any infinite decreasing sequence \({X_{k_q}}\) of successively refined partition elements contains a sub-sequence \({X_{k_{q'}}}\) satisfying \(\bar{X} \cap K \ne \emptyset \), \(\lim \limits _{q' \rightarrow \infty } \beta ({X_{k_{q'}}}) = z(\bar{X} \cap K)\), where \(\bar{X} = \bigcap \limits _{q} X_{k_q}\) [12].

Lemma 3

Given an exhaustive subdivision on x, Algorithm 1 is strongly consistent.

Proof

From Lemma 2 we have that \(\bar{X} \cap K \ne \emptyset \) holds. With an exhaustive subdivision, \(X_{k_q}\) shrinks to a single point \(\bar{x}\) and we thus have that \(\bar{X} = \{\bar{x}\}\) and \(\bar{X} \cap K = \{\bar{x}\}\). The result can thus be proven by showing that \(\lim \limits _{q\rightarrow \infty } \beta ({X_{k_{q}}}) = z(\bar{X} \cap K) = \sum \limits _{s\in \mathcal {S}} Q_s(\bar{x})\). Take \(\tilde{x}_{k_{q}, s} \in \text {argmin} \{Q_s(x_s):x_s \in X_{k_{q}}\}\); since \(X_{k_q}\) shrinks to \(\bar{x}\), we have \(\lim \limits _{q\rightarrow \infty } \tilde{x}_{k_{q}, s} = \bar{x}\). Since \( \bar{x} \in X_{k_{q}}\), it follows that \(Q_s(\tilde{x}_{k_{q}, s}) \le Q_s(\bar{x})\). From the lower semicontinuity of \(Q_s\), it follows that \(Q_s(\bar{x}) \le \lim \limits _{q\rightarrow \infty } Q_s(\tilde{x}_{k_{q}, s})\). Therefore, \(Q_s(\bar{x}) = \lim \limits _{q\rightarrow \infty } Q_s(\tilde{x}_{k_{q}, s})=\lim \limits _{q\rightarrow \infty } \beta _s({X_{k_{q}}})\). Taking the sum over s, we obtain \(\lim \limits _{q\rightarrow \infty }\beta ({X_{k_{q}}}) = \sum \limits _{s\in \mathcal {S}} Q_s(\bar{x})\). \(\square \)

We now proceed to prove convergence of the lower bounds.

Definition 4

A selection operation is said to be bound improving if, after a finite number of steps, at least one partition element where the actual lower bound is attained is selected for further partition [12].

Algorithm 1 is bound improving since, at every step, a partition element where the actual lower bound is attained is selected for further partition.

Lemma 4

For a BB scheme using a lower bounding operation that is strongly consistent and using a selection operation that is bound improving, we have that \(\lim \limits _{k\rightarrow \infty } \beta _k = z\) [12].

Lemma 5

Given an exhaustive subdivision on x, Algorithm 1 satisfies \(\lim \limits _{k\rightarrow \infty } \beta _k = z\).

Proof

This result can be established by combining Lemmas 4 and 3. \(\square \)

To prove finite-epsilon convergence of the upper bounds, we need to make the following assumption.

Assumption 3

The recourse function \(Q(\cdot )\) is Lipschitz continuous in a nonempty neighborhood of a solution \(x^*\).

This assumption requires that the feasible set of the stochastic NLP has a nonempty interior with respect to the first-stage variables, which precludes the application of the following lemma when the formulation has nontrivial equality constraints involving only first-stage variables. Typical regularity conditions on the scenario subproblems that guarantee Lipschitz continuity of \(Q(\cdot )\) are discussed in [4]. In particular, Lipschitz continuity follows if the solutions of the subproblems satisfy the strong second-order condition and the linear independence constraint qualification (at fixed \(x^*\)).

Lemma 6

Given an exhaustive subdivision on x, Algorithm 1 delivers a sequence \(\{\alpha _k\}\) satisfying \(\lim \limits _{k\rightarrow \infty } \alpha _k=z\).

Proof

Because \(Q(\cdot )\) is Lipschitz continuous around \(x^*\), there is a ball \(B_r(x^*) = \{x| \Vert x-x^*\Vert \le r\}\) satisfying \(Q(x)-Q(x^*) \le L \Vert x-x^*\Vert \) for all \(x\in B_r(x^*)\), where L is a Lipschitz constant. Therefore, for \(\epsilon > 0\) and every point \(x\in B_{r^\prime }(x^*)\), we have that \(Q(x)-Q(x^*) \le \epsilon \) holds with \(r^\prime = \min (r, \epsilon /L)\).

Because the subdivision is exhaustive, either after a finite number of iterations \(\bar{k}\) the partition considered satisfies \(X_{\bar{k}} \subseteq B_{r^\prime }(x^*)\), or at some iteration \(\hat{k}\) the partition \(X_{\hat{k}}\) containing \(x^*\) is pruned. In the first case, because any point \(x\in X_{\bar{k}}\) satisfies \(Q(x)-Q(x^*) \le \epsilon \), we have that \(\alpha _{\bar{k}} \le \alpha (X_{\bar{k}}) \le Q(x^*) + \epsilon \). In the second case, the node is pruned because \(\alpha _{\hat{k}} \le \beta (X_{\hat{k}}) + \epsilon \). Since \(x^*\in X_{\hat{k}}\), we have \(\beta (X_{\hat{k}}) \le Q(x^*)\) and thus \(\alpha _{\hat{k}} \le Q(x^*) + \epsilon \). Because the value of \(\epsilon \) is arbitrary, we have \( \lim \limits _{k\rightarrow \infty } \alpha _k=z\). \(\square \)

Combining Lemmas 5 and 6, we obtain our main result:

Theorem 1

Given an exhaustive subdivision on x, Algorithm 1 is convergent in the sense that:

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \alpha _k = \lim \limits _{k\rightarrow \infty } \beta _k =z. \end{aligned}$$
(3.1)

Remark

We note that the assumptions are also expected to hold for two-stage stochastic MINLPs with mixed-integer first-stage variables x but purely continuous recourse variables. In particular, Lipschitz continuity of the recourse function holds under purely continuous recourse variables and typical regularity assumptions. Consequently, the convergence results hold in this case as well.

4 Implementation details

The software implementation of the proposed algorithm is called SNGO (Structured Nonlinear Global Optimizer). SNGO is implemented in Julia and interfaced with the following tools: Plasmo for modeling stochastic programs, IPOPT for solving an extensive form of the problem to local optimality, SCIP to solve subproblems to global optimality, and Gurobi for solving linear programs (LPs). We leverage the syntax and interfaces of the algebraic modeling language JuMP in our implementation.

To compute lower bounds, SNGO first creates an LP relaxation (obtained from convexification and outer approximation using the auxiliary variable technique of [20]) of the form:

$$\begin{aligned} z^{LP}(X)&= \min \limits _{x\in X, y_s, w_s} \; \sum _{s\in \mathcal {S}} \,\bar{f}_s(x, y_s, w_s) \end{aligned}$$
(4.1a)
$$\begin{aligned} \mathrm{s.t.} \;\; \bar{g}_{s,j} (x, y_s, w_s)&\le 0, \;\; j =1,\ldots ,\bar{m}_s, s\in \mathcal {S}, \end{aligned}$$
(4.1b)

where \(\bar{f}_s(\cdot )\) and \(\bar{g}_{s,j}(\cdot )\) are linear underestimators for \({f}_s(\cdot )\) and \({g}_{s,j}(\cdot )\), respectively; and \(w_s\) are auxiliary variables. Here, \(\bar{m}_s \ge m_s\) holds, since auxiliary constraints might be introduced. These LP relaxations are constructed automatically. After solving the LP, the outer approximation is refined at the solution of the LP problem (resulting in a tighter LP relaxation). This process is repeated until the improvement of the lower bounds is sufficiently small. The LP is also a stochastic program because the underestimators are generated for each scenario subproblem. Note that this is a coupled but structured problem in which the number of variables and constraints increases linearly with the number of scenarios. This problem can, in principle, be solved using a structure-exploiting interior-point solver such as PIPS. In our implementation, we solve this problem directly with Gurobi. Since \(z^{LP}(X) \le z(X)\) holds by construction, the solution of the LP relaxation can be used as an alternative lower bound. SNGO also adds \(\alpha \)BB cuts [2] and cuts from the reformulation-linearization technique (RLT) [19] automatically to the relaxed LP. The cost of generating and solving the relaxed LP is modest compared to the solution of the nonlinear scenario subproblems in the branch and bound scheme.

After solving the LP relaxation, SNGO solves the lower bounding problem, which is decomposed into S subproblems of the form (2.6). The lower bound of the node is set to the maximum of the lower bounds from the LP relaxation and from the lower bounding problem. We consider different strategies to tighten the lower bounds. First, it is easy to see that the scenario objective is always greater than or equal to the optimal value of the corresponding subproblem. Consequently, the cut \(f_s(x, y_s)\ge \beta _s(X)\) can be added to the node with partition X. Moreover, for all the descendants of this node \(X^{'} \subset X\), we have that \(\beta _s(X^{'})\ge \beta _s(X)\) holds; consequently, this cut remains valid. SNGO creates an auxiliary variable denoting the scenario objective, and the lower bound of this variable is updated at each node. Second, we note that if we pick \(\tilde{x}_{s} \in \text {argmin} \{Q_s(x_s): x_s \in X\}\) and X is partitioned into two subsets \(X_1\) and \(X_2\) with \(X_1\cap X_2=\emptyset \), then \(\tilde{x}_{s}\) belongs to exactly one of the two subsets. If \(\tilde{x}_{s} \in X_1\), then \(\tilde{x}_{s} \in \text {argmin} \{Q_s(x_s): x_s \in X_1\}\). Consequently, the solution of the subproblem in a parent node can be reused in the child node that contains it, and the associated subproblem does not need to be re-solved. The solutions of the lower bounding subproblems are thus stored and passed to the children nodes.
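The reuse of parent lower-bounding solutions can be sketched as follows (a hypothetical helper, assuming boxes are bound-vector pairs and `solve_scenario_lb` stands in for a global re-solve of (2.6)):

```python
def child_scenario_bounds(parent_solutions, X_child, solve_scenario_lb):
    """For each scenario s, if the parent's minimizer x_tilde_s lies in
    the child box, then beta_s is unchanged and the scenario subproblem
    is inherited rather than re-solved.

    parent_solutions maps s -> (x_tilde_s, beta_s) from the parent node;
    solve_scenario_lb(s, X) is the hypothetical global-solver fallback
    returning a fresh (x_tilde_s, beta_s) pair."""
    def inside(x, X):
        lo, hi = X
        return all(l <= v <= h for v, l, h in zip(x, lo, hi))

    out = {}
    for s, (x_tilde, beta_s) in parent_solutions.items():
        if inside(x_tilde, X_child):
            out[s] = (x_tilde, beta_s)            # inherited, no re-solve
        else:
            out[s] = solve_scenario_lb(s, X_child)  # minimizer left the box
    return out
```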

To compute upper bounds, SNGO first solves the extensive form (2.3) with a local NLP solver. At the root node, the local solver is run with a multi-start technique. If the local solver returns a feasible solution, the first-stage solution from the local solver is set to \(\hat{x}\) for the upper bounding problem. We also note that, when solving the extensive form, removing redundant duplicates of first-stage constraints can aid the local solver. If the local solver fails, then SNGO takes the expected value of the first-stage solutions from the lower bounding subproblems to obtain \(\hat{x}\), as proposed in [14]. Given \(\hat{x}\), the upper bounding problem is solved by solving S separate subproblems \(P_s(\hat{x})\) to global optimality. Through experiments we have found that, in many cases, upper bounds reach optimality at an early stage. Consequently, upper bounding problems are solved at the first \(L_t\) levels (default of three) and then every \(L_e\) levels (default of two).
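The fallback candidate described above can be sketched as follows (a hypothetical helper; `solutions` collects the first-stage solutions \(\tilde{x}_s\) returned by the lower bounding subproblems):

```python
def candidate_from_lb(solutions):
    """Fallback candidate x_hat when the local solve of the extensive
    form fails: the componentwise mean of the first-stage solutions
    x_tilde_s from the lower bounding subproblems (as in [14])."""
    S = len(solutions)
    n = len(solutions[0])
    return [sum(x[i] for x in solutions) / S for i in range(n)]
```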

SNGO also implements bound tightening techniques, including feasibility-based bound tightening (FBBT) and optimality-based bound tightening (OBBT). OBBT is performed by cycling through each component \(x_i\) of the first-stage variables and solving \(2n_x\) LPs of the form:

$$\begin{aligned}&\max / \min \limits _{x\in X, y_s, w_s} \; x_i \end{aligned}$$
(4.2a)
$$\begin{aligned}&\mathrm{s.t.} \;\; \bar{g}_{s,j} (x, y_s, w_s) \le 0, \;\; j =1,\ldots ,\bar{m}_s, s\in \mathcal {S}. \end{aligned}$$
(4.2b)

Each LP can be decomposed into S subproblems, where subproblem s is of the form:

$$\begin{aligned}&\max / \min \limits _{x\in X, y_s, w_s} \; x_i \end{aligned}$$
(4.3a)
$$\begin{aligned}&\mathrm{s.t.} \;\; \bar{g}_{s,j} (x, y_s, w_s) \le 0, \;\; j =1,\ldots ,\bar{m}_s. \end{aligned}$$
(4.3b)

Each pair of subproblems minimizing and maximizing \(x_i\) gives a pair of lower and upper bounds on \(x_i\). The bounds from different scenarios are combined by taking the tightest lower and upper bounds. In many other global optimization solvers, OBBT is not performed in every BB node because the computational expense of this procedure is high. For SNGO, however, the solution of the nonlinear subproblems is the main computational bottleneck and the number of first-stage variables is usually small. Consequently, OBBT is performed at every BB node.
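The aggregation of per-scenario OBBT results can be sketched as follows (a hypothetical helper; each entry is the (min, max) pair obtained from subproblem (4.3) for one scenario):

```python
def aggregate_obbt(per_scenario_bounds):
    """Combine per-scenario OBBT results for one first-stage variable x_i.
    Since x must be feasible for every scenario, the tightest valid
    bounds are the largest of the scenario lower bounds and the smallest
    of the scenario upper bounds."""
    lo = max(b[0] for b in per_scenario_bounds)
    hi = min(b[1] for b in per_scenario_bounds)
    return lo, hi
```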

SNGO uses strong branching [1] to select branching variables. For each component \(x_i\) of the first-stage variables, we compute the branching point \(x^b_i\), divide the current partition set X into two sets \(X_1 = \{x| x \in X, x_i \le x^b_i\}\) and \(X_2 = \{x| x \in X, x_i \ge x^b_i\}\), and compute the lower bounds of the two subsets from the LP relaxations \(z^{LP}(X_1)\) and \(z^{LP}(X_2)\). We compare the improvement of \(z^{LP}(X_1)\) and \(z^{LP}(X_2)\) over the lower bound of the current node \(\beta (X)\), and select the first-stage variable \(x_i\) that achieves the largest improvement. When the improvements of the lower bounds from the LP relaxation are below a threshold, the first-stage variable with the largest width is selected. The branching point is set to the expected value of the solutions from the lower bounding problem, that is, \(\sum \limits _{s\in \mathcal {S}} \tilde{x}_{s} / |\mathcal {S}|\). We also enforce that the branching point keeps a minimum distance away from the variable bounds to ensure that the overall subdivision is exhaustive, which is a standard practice in global optimization software such as ANTIGONE [18] and BARON [21]. In our implementation, we project the expected value of the solutions from the lower bounding problem onto \([x^l_i + \theta (x^u_i - x^l_i), x^u_i - \theta (x^u_i - x^l_i) ]\), where the default setting for \(\theta \) is 0.1. To avoid excessive partitioning on one variable, a first-stage variable with a range below a certain threshold (i.e., \(x^u_i - x^l_i<\gamma \) with \(\gamma =10^{-4}\)) is not considered for further branching until the ranges of all first-stage variables are below this threshold.
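The branching-point projection can be sketched as follows (a hypothetical helper for one coordinate; `x_mean` is the expected value of the lower-bounding solutions):

```python
def branching_point(x_mean, lo, hi, theta=0.1):
    """Project the expected value of the lower-bounding solutions onto
    [lo + theta*(hi - lo), hi - theta*(hi - lo)], so the branching point
    keeps a minimum relative distance theta (default 0.1) from the
    variable bounds.  This preserves exhaustiveness of the subdivision."""
    margin = theta * (hi - lo)
    return min(max(x_mean, lo + margin), hi - margin)
```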

The SNGO implementation follows six major computational steps: (1) solution of the LP relaxation, (2) solution of the extensive form to local optimality, (3) solution of at most S lower bounding subproblems to global optimality, (4) solution of at most S upper bounding subproblems to global optimality, (5) solution of \(2n_x \cdot S\) small LPs for OBBT, and (6) solution of \(2n_x\) LPs for branching variable selection. Experiments in Sect. 5 show that, for most problems, over half of the solution time is spent in steps (3) and (4). For some cases (ex2_1_8 and ex8_4_1 in Sect. 5.3), over \(90\%\) of the solution time is spent on these two steps. Ideally, the time spent on these steps grows linearly with the number of scenarios; however, the solution time for a subproblem varies and the number of subproblems is not always equal to the number of scenarios at every node. For example, the information from steps (1) and (2) might be enough to decide that a node can be pruned, in which case no subproblems are solved. With the number of variables to branch on fixed (i.e., the number of first-stage variables), the number of nodes needed is not expected to explode as the number of scenarios increases. The problems to be solved in steps (1, 2, 6) have an extensive form whose size grows linearly with the number of scenarios, while the problems to be solved in steps (3, 4, 5) can be decomposed into subproblems whose number grows linearly with the number of scenarios. We also note that steps (3, 4, 5, 6) are directly parallelizable and step (1) can also be parallelized by using solvers such as PIPS and PIPS-NLP. In this paper, however, we use a serial implementation because we aim to compare algorithmic performance with off-the-shelf solvers on an equal basis. We also highlight that achieving an efficient parallel implementation is challenging because of memory management and load imbalance issues.

When deriving the convergence results, we assumed that the lower/upper bounding problems are solved exactly. In a practical implementation, however, we need to set a tolerance for the global solver that solves the lower/upper bounding problems. For each subproblem, we set the termination tolerance to \(\frac{\epsilon }{2S}\). For the upper bounding subproblems, the primal bound returned by the global solver is used to compute the upper bounds, while for the lower bounding subproblems, the dual bound returned by the global solver is used to compute the lower bounds.
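The rationale for this split can be illustrated with a small sketch: with S lower bounding and S upper bounding subproblems each solved to tolerance \(\epsilon /(2S)\), the worst-case error introduced into the reported gap is \(2S \cdot \epsilon /(2S) = \epsilon \):

```python
def subproblem_tolerance(eps, S):
    """Per-subproblem termination tolerance: the overall absolute
    tolerance eps is split evenly across the S lower bounding and S
    upper bounding subproblems (2*S solves in total)."""
    return eps / (2 * S)

def worst_case_gap_error(eps, S):
    """Accumulated worst-case error in alpha_k - beta_k when every one
    of the 2*S subproblems is off by its full tolerance."""
    return 2 * S * subproblem_tolerance(eps, S)
```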

Fig. 1
figure 1

Snippet of a stochastic NLP implementation in Plasmo.jl

We use Plasmo.jl to express the stochastic NLPs under study. Plasmo.jl is a Julia-based algebraic modeling framework that facilitates the construction and analysis of large-scale structured optimization models. To do this, it leverages a hierarchical graph abstraction wherein nodes and edges can be associated with individual optimization models that are linked together [13]. Given a graph structure with models and connections, Plasmo.jl can produce either a pure (flattened) optimization model to be solved using off-the-shelf optimization solvers such as IPOPT and SCIP, or it can communicate graph structures to structure-exploiting solvers such as SNGO.

The code snippet shown in Fig. 1 illustrates how to implement stochastic problems in Plasmo.jl. As can be seen, the individual scenario models are created and appended to the parent node on-the-fly to create a two-level graph structure. The snippet also shows how to use Plasmo.jl to create a flattened NLP to be solved by off-the-shelf solvers like SCIP [17]. This allows the user to compare computational performance.

5 Computational experiments

We evaluate the performance of SNGO using stochastic NLPs arising from applications such as optimal controller tuning and parameter estimation for microbial community models, as well as a test set containing stochastic variants of GLOBALLib instances. The Julia scripts of the test cases are available at https://github.com/zavalab/JuliaBox/tree/master/SNGO/examples. We compare the computational results against those of the state-of-the-art global solver SCIP 4.0.0, linked to SoPlex 3.0.0 and IPOPT 3.12.7. Each solver terminates under one of the following conditions: (1) the relative optimality gap satisfies \(\frac{\alpha _k - \beta _k}{\min \{|\beta _k|, |\alpha _k|\}} \le 1\%\), (2) the absolute optimality gap satisfies \({\alpha _k - \beta _k} \le 0.01\), or (3) the search reaches a 12-h time limit. We use a computing server with Intel(R) Xeon(R) CPU E5-2698 v3 processors running at 2.30 GHz to conduct the experiments.
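The stopping test can be sketched as follows, where \(\alpha _k\) and \(\beta _k\) denote the upper and lower bounds at iterate k (an illustrative helper, not solver code):

```python
def should_terminate(alpha, beta, elapsed_hours,
                     rel_tol=0.01, abs_tol=0.01, time_limit=12.0):
    """Termination test used in the experiments (sketch): alpha is the
    upper (primal) bound and beta the lower (dual) bound."""
    gap = alpha - beta
    if gap <= abs_tol:                          # (2) absolute optimality gap
        return True
    denom = min(abs(alpha), abs(beta))
    if denom > 0 and gap / denom <= rel_tol:    # (1) relative optimality gap
        return True
    return elapsed_hours >= time_limit          # (3) 12-h time limit
```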

5.1 Optimal controller tuning

We consider the identification of optimal PID controller parameters capable of withstanding diverse scenarios on set-point changes \(\bar{x}_s\), model structural uncertainty (\(\tau _{x,s}\) and \(\tau _{u,s}\)), and disturbances \(d_s\). The optimal parameters aim to minimize the expected error between the state and the desired set-point. The formulation is given in (5.1). The first stage variables are the controller parameters (proportional gain \(K_p\), integral gain \(K_i\), and derivative gain \(K_d\)) and the second-stage variables are the state time trajectories \(x_s(t)\) for each scenario \(s\in \mathcal {S}\). We generate the scenarios using Monte Carlo sampling and assume that \(\bar{x}\), \(\tau _{x}\), \(\tau _{u}\), and d are independent and uniformly distributed random variables. The state trajectories are discretized using an implicit Euler scheme and the integral term in (5.1d) is approximated by the accumulated errors prior to a given time step. We note that the number of state variables grows linearly with the number of scenarios. The largest problem solved includes 60 scenarios and has a total of 4803 variables, 4800 constraints, and 4800 nonlinear nonconvex terms.

Table 1 Computational performance of SNGO and SCIP on controller tuning problem

Table 1 compares the performance of SNGO, SCIP, and IPOPT. We note that, when the number of scenarios is 10, 30, 40, 50, or 60, SNGO can solve the problem to a gap of \(1\%\) while SCIP cannot solve it within 12 h. For the problem with 20 scenarios, SCIP finds a solution but requires 7.5 h, while SNGO solves the problem in 30 min. The key advantage of SNGO over SCIP is that it only needs to branch on the three first stage variables; branching over the second-stage variables is done implicitly in the solution of the scenario subproblems. SCIP, on the other hand, needs to branch on both first and second-stage variables (2323 in the 20-scenario case) simultaneously. As a result, the number of nodes visited by SCIP is 14,053 times larger than the number visited by SNGO. On the other hand, since SNGO needs to solve subproblems to global optimality at each node, its computational cost per BB node is 923 times larger than that of SCIP. Despite this, the computational benefits are significant. We emphasize that the subproblems in SNGO are solved with SCIP. We have found SCIP to be robust in solving small to medium-sized problems, but it is clear that direct branching on all variables does not scale.

Table 1 also shows the time spent on different tasks of the solver, including solving lower bounding subproblems to global optimality (LB1), solving LP relaxations (LB2), solving upper bounding subproblems to global optimality (UB1), solving extensive form NLPs to local optimality (UB2), bound tightening, and branching variable selection (VS). For this problem, solving the lower bounding subproblems is relatively expensive while solving the upper bounding subproblems is relatively cheap. One reason for this is that the upper bounding subproblems are not solved at every node. Another reason is that this problem has the property that, when the first stage variables are fixed, each subproblem has only one feasible solution. When the number of scenarios is between 10 and 50, the number of nodes is quite consistent and the solution time per node grows almost linearly (as shown in Fig. 2). However, when the number of scenarios is 60, the explored node tree can be quite different, and thus the number of nodes explored and the solution time per node grow quickly. We note that SNGO is not able to solve problems with more than 60 scenarios within the time limit.

$$\begin{aligned}&\min \limits _{x_s(t),K_\text {p},K_\text {i}, K_\text {d}} \; \sum _{s\in \mathcal {S}} \int _{0}^{T} e_s(t)^2 dt \end{aligned}$$
(5.1a)
$$\begin{aligned}&\text {s.t.} \;\; \frac{dx_s(t)}{dt} = - \tau _{x,s} x_s(t)^2 + \tau _{u,s} u_s(t) + \tau _{d,s} d_s, s\in \mathcal {S}\end{aligned}$$
(5.1b)
$$\begin{aligned}&e_s(t) = x_s(t) - \bar{x}_s, s\in \mathcal {S}\end{aligned}$$
(5.1c)
$$\begin{aligned}&u_s(t) = {K_\text {p}} e_s(t) + {K_\text {i}} \int _0^t e_s(\tau ) \,d\tau + {K_\text {d}}\frac{de_s(t)}{dt}, s\in \mathcal {S} \end{aligned}$$
(5.1d)
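To illustrate the discretization described above, the following sketch forward-simulates (5.1b)-(5.1d) with an implicit Euler step and the accumulated-error approximation of the integral term, for fixed controller gains. All parameter values are hypothetical, and with the sign convention \(e_s = x_s - \bar{x}_s\) of (5.1c) stabilizing gains are negative; in the optimization these residual equations enter as nonconvex equality constraints with the trajectories as variables:

```python
def simulate_pid(Kp, Ki, Kd, xbar=1.0, tx=0.5, tu=1.0, td=0.2, d=0.1,
                 x0=0.0, h=0.05, N=100):
    """Implicit Euler simulation of one scenario of (5.1) for fixed gains
    (illustrative parameter values). Returns the discretized objective
    (5.1a), h * sum of squared errors."""
    x, I, e_prev = x0, 0.0, x0 - xbar
    sq_errs = []
    for _ in range(N):
        x_prev = x
        xn = x_prev
        # Newton iterations on the implicit Euler residual; the integral
        # term uses I, the accumulated errors *prior* to this time step.
        for _ in range(50):
            e = xn - xbar
            u = Kp * e + Ki * h * I + Kd * (e - e_prev) / h
            F = xn - x_prev - h * (-tx * xn**2 + tu * u + td * d)
            dF = 1.0 - h * (-2.0 * tx * xn + tu * (Kp + Kd / h))
            xn -= F / dF
        x = xn
        e = x - xbar
        I += e                      # accumulate the error for the integral term
        e_prev = e
        sq_errs.append(e * e)
    return h * sum(sq_errs)
```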
Fig. 2 Total solution time of SNGO and solution time per node to solve the robust controller problem with different numbers of scenarios

Table 2 Computational performance of SNGO and SCIP on estimation problems for microbial growth models
Table 3 Computational performance of SNGO and SCIP on stochastic variants of GLOBALLib instance for \(|\mathcal {S}|=100\)
Table 4 Computational performance of SNGO and SCIP on stochastic variants of GLOBALLib instance for \(|\mathcal {S}|=1000\)
Table 5 Size of problems from stochastic version of GLOBALLib when the number of scenarios is 100 and 1000
Table 6 Computational performance of SNGO on five problem instances from GLOBALLib with different numbers of scenarios. We use “–” to denote the situations when the time limit is reached
Fig. 3 Total solution time of SNGO to solve five problem instances from GLOBALLib with different numbers of scenarios

Fig. 4 Solution time of SNGO per node to solve five problem instances from GLOBALLib with different numbers of scenarios

Fig. 5 Evolution of lower and upper bounds for five problem instances from GLOBALLib

5.2 Estimation for microbial growth models

We now consider the problem of estimating parameters in microbial community models. This problem is not a stochastic program, but it exhibits the same algebraic structure if the time horizon is partitioned into blocks. We can view each time partition as a scenario and the parameters to be estimated, together with the variables linking the partitions, as first stage (complicating) variables. The problem formulation has the form:

$$\begin{aligned}&\min \limits _{x_k(t),\alpha , \beta } \; \sum _{k\in \mathcal {K}}\int _{t_k}^{t_{k+1}} (x_k(t) - \bar{x}_k)^2 dt \end{aligned}$$
(5.2a)
$$\begin{aligned}&\text {s.t.} \;\; \frac{dx_k(t)}{dt} = {\alpha } x_k(t)^2 + {\beta } x_k(t), \quad t\in [t_k,t_{k+1}],\quad k\in \mathcal {K} \end{aligned}$$
(5.2b)
$$\begin{aligned}&x_{k+1}(t_{k+1})=x_k(t_{k+1}),\quad k\in \mathcal {K}, \end{aligned}$$
(5.2c)

where \(\alpha ,\beta \) are the parameters to be estimated, \(\mathcal {K}\) is the set of time partitions, and \(x_k(\cdot )\) is the state trajectory in partition \(k\in \mathcal {K}\). We partition the time domain into 47 blocks to obtain a problem with 48 first stage variables, 1082 total variables, 1080 total constraints, and 1411 nonlinear terms. We solved the problem using 12 different real experimental data sets, corresponding to the growth of different species of bacteria.
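The dimension bookkeeping in this construction can be sketched as follows (a hypothetical helper that simply reproduces the count given in the text):

```python
def first_stage_dimension(n_blocks, n_params=2):
    """Number of complicating variables when the estimation horizon is
    split into n_blocks: the model parameters (alpha, beta) plus one
    linking state per interior block boundary, cf. constraint (5.2c)."""
    return n_params + (n_blocks - 1)
```

With 47 blocks this yields the 48 first stage variables reported above.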

Table 2 summarizes the performance of the solvers on these instances. As can be seen, SCIP cannot solve most problems within 12 h, while SNGO can solve most of the problems within 20 min. The shortest solution time is 8 min and the longest solution time is 2 h.

5.3 Stochastic GLOBALLib instances

We have also tested the algorithm using stochastic variants of the GLOBALLib instances. To construct such variants, we selected 28 problems with 20–50 variables and added random perturbations to the right-hand side of a subset of the constraints. The first 5 variables of each problem are selected as first stage variables. Seven other problems with 20–50 variables (ex5_4_4, ex8_4_2, ex8_6_2, hhfair, launch, maxmin, prolog) were not selected because SCIP cannot solve even a single-scenario instance. The nonconvexities encountered in these 28 problems include bilinear, fractional, logarithmic, and signomial terms, as well as composite functions of these terms and linear terms. The size of the problems depends on the number of scenarios and is outlined in Table 5. With 1000 scenarios, the total number of variables ranges from 21,005 to 44,005 (these are large-scale instances).
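The construction of the stochastic variants can be sketched as follows (the perturbation distribution and magnitude are hypothetical; the text only states that random perturbations are added to a subset of right-hand sides):

```python
import random

def make_stochastic_variant(rhs, n_scenarios, rel=0.1, seed=0):
    """Replicate a deterministic instance into scenarios by randomly
    perturbing the constraint right-hand sides (illustrative sketch;
    rel bounds the relative perturbation size)."""
    rng = random.Random(seed)
    return [
        [b * (1.0 + rel * rng.uniform(-1.0, 1.0)) for b in rhs]
        for _ in range(n_scenarios)
    ]
```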

Tables 3 and 4 summarize the computational performance when the number of scenarios is 100 and 1000, respectively. We use “f” to indicate that the solver failed to return any bounds or candidate solution. For problems with 100 scenarios, SCIP can solve only 7 problems while SNGO can solve 26 out of 28. For problems with 1000 scenarios, SCIP can solve only 4 problems while SNGO can solve 20. Most of the problems are solved in a single node, which might be related to how the random perturbations are introduced. A naive implementation of the lower/upper bounding problems, however, explores significantly more nodes. One reason why SNGO explores only one node for these problems is that bound tightening and multi-start local search are quite extensive at the root node.

By comparing the results of Tables 3 and 4 we can again see the favorable scalability of SNGO. For 19 problems (ex2_1_10, ex2_1_7, ex8_4_1, hydro, immun, st_fp7a, st_fp7b, st_fp7c, st_fp7d, st_fp7e, st_fp8, st_rv2, st_rv3, st_rv7, st_rv8, ex8_4_8, pollut, ramsey, srcpm), the solution time increases by less than a factor of 30 when the number of scenarios increases from 100 to 1000 (a factor of 10). For 5 problems (abel, ex5_2_5, ex5_3_2, harker, and ex8_4_8_bnd), the increase in solution time is more dramatic. For the remaining 4 problems, SNGO reaches the time limit, but the gap is kept below 10% in most cases (only two instances have a larger gap).

To further illustrate the scalability of SNGO, Table 6 shows how the time spent on different tasks changes as the number of scenarios increases. Figures 3 and 4 illustrate how the total solution time and the solution time per node change as the number of scenarios increases. We present five relatively difficult instances: ex2_1_10, ex2_1_8, ex5_3_2, ex8_4_1, and chenery. For ex2_1_8, ex5_3_2, ex8_4_1, and chenery, solving the subproblems to global optimality requires more than half of the solution time. For ex2_1_10, ex2_1_8, and chenery, the number of nodes explored is relatively consistent across numbers of scenarios. For each test problem, the solution time per node grows nearly linearly when the number of scenarios is within a certain range (10–1000 for ex2_1_10, 20–500 for ex2_1_8, 10–200 for ex5_3_2, 20–200 for ex8_4_1 and chenery).

Figure 5 shows the progression of the lower and upper bounds for these five problems. The dots represent iterations at which the bound updates are obtained from the lower and upper bounding problems, while the crosses represent iterations at which updates are obtained from convexification and local search (i.e., lower bounds obtained from the LP relaxation). The bounds obtained from convexification and local search at the root node are shown at iteration zero. Figure 5 shows that, while the proposed lower bounding problems play a significant role, the bounds obtained from convexification also help accelerate the solution process. Interestingly, for all test problems, optimal solutions are always found at the root node and the rest of the process is used to prove that these solutions are within the optimality gap. For ex2_1_8, the solution of the upper bounding problems finds a significantly better solution than the multi-start local search, while for the other problems optimal solutions are found by local search. The gap between the primal problem and the lower bounding problem (the expected value of perfect information (EVPI)) is typically small in many applications. This can be verified in our test set, where the gaps at the first iteration are \(20.0\%, 16.0\%, 14.3\%, 1.36\%, 32.0\%\), while the initial gaps from convexification and local search observed at iteration zero are \(286\%, 126\%, 43.8\%, 354\%, \ge 10{,}000\%\). This illustrates that our lower bounding approach provides tight lower bounds. We expect that a parallel version of our implementation can further reduce solution times.

6 Conclusions and future work

We have proposed and implemented a global optimization algorithm for stochastic nonlinear programs. The main advantages of the proposed algorithm are that both the lower bounding and upper bounding problems can be decomposed into smaller scenario subproblems and that branching needs to be performed only on the first-stage variables. We provide a proof of convergence and numerical evidence that the proposed approach significantly outperforms the state-of-the-art solver SCIP. As part of future work, we are interested in extending the approach to stochastic mixed-integer nonlinear programs and in developing parallel implementations.