1 Introduction

The problem considered in this paper can be classified as a nonconvex, mixed-integer quadratically constrained problem (MIQCP) with the following general form:

where \({{\mathbf {x}}}\) is a vector of continuous non-negative variables and \({{\mathbf {y}}}\) is a vector of binary variables. BL is an \(\left( {i,j} \right) \)-index set defining the bilinear terms \(x_i x_j \) and quadratic terms (\(i=j\)) present in the problem; it is assumed that finite upper bounds \({{\mathbf {x}}}^{U}\) can be inferred for variables \(x_i \) and \(x_j \). Set Q indexes all functions \(f_q \), namely the objective function \(f_0 \) and all the constraints; \(a_{ijq} \) and \(d_q \) are scalars, whereas \(B_q \) and \(C_q \) are matrices.
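In full, and with the caveat that the displayed form is a sketch reconstructed from this description (the placement of \(B_q \), \(C_q \) and \(d_q \) inside \(f_q \) is our assumption), (P) reads:

$$\begin{aligned} \mathop {\min }\limits _{{{\mathbf {x}}},{{\mathbf {y}}}}&\; f_0 \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) \\ \hbox {s.t.}\;&f_q \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) \le 0\quad \forall q\in Q{\setminus }\left\{ 0 \right\} \\&f_q \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) =\sum \limits _{\left( {i,j} \right) \in BL} a_{ijq} x_i x_j +B_q {{\mathbf {x}}}+C_q {{\mathbf {y}}}+d_q \\&{{\mathbf {0}}}\le {{\mathbf {x}}}\le {{\mathbf {x}}}^{U},\quad {{\mathbf {y}}}\in \left\{ {0,1} \right\} ^{\left| {{\mathbf {y}}} \right| }\qquad \qquad \qquad \qquad \mathbf{(P)} \end{aligned}$$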

Problem (P) is frequently encountered in engineering. An example is the optimization of process networks in process systems engineering [1]. Nodes involve equipment units that are connected by streams, which can be characterized by a flowrate and a set of properties. Whenever there are multiple input streams to a node, mixing is involved, with the properties of the output stream being estimated as a weighted sum by flowrates of the properties of the input streams. As a consequence, nonconvex bilinear terms appear.

Process network problems are considerably more general than generalized pooling problems [2, 3] and include blending of crude oil [4–6] or refined products in refineries [7–9], and reducing freshwater consumption and wastewater treatment cost in industrial water networks [10–14]. Researchers started by looking into the steady-state design or operation of the system, typically considering solely continuous variables. Exceptions leading to mixed-integer bilinear problems were due to constraints allowing a connecting stream only if its flowrate is above a given minimum value [3], and to the selection of one of several alternative technologies for a unit [10]. A recent trend, made possible by the advancements in computer hardware and optimization algorithms, is the incorporation of the scheduling component through the consideration of multiple time periods [6, 9]. This is also the case for the thermal unit commitment [15] and hydroelectric scheduling problems [16], which divide a 24-h horizon into hourly intervals. Note that no mixing is involved in the latter systems, with the bilinear terms appearing to compute either the production cost from a generator or the power output from a reservoir. In all mentioned cases, the binary variables appear linearly in the constraints. In contrast, the trim loss problem features bilinear terms with integer variables [17, 18]. While it is not of type (P), the relaxation approaches discussed in this paper can still be applied [19].

Global optimization solvers [20–22] rely on linear or mixed-integer linear programming relaxations of (P). A tight relaxation is crucial to achieve fast convergence, and tightness is highly dependent on the bounds of the variables involved in the bilinear terms, improving as their domain is reduced. In optimality-based bound contraction [23, 24], two LPs or MILPs are solved for each such variable. Spatial branch-and-bound can also be seen as an iterative procedure acting on a single variable at a time (when branching). Simultaneous variable partitioning using piecewise McCormick envelopes [3, 11, 13, 25–28] or multiparametric disaggregation [19, 29, 30] provides better approximations but may be too demanding computationally, since additional binary variables, continuous variables and constraints become part of the MILP relaxation.

Piecewise McCormick (PCM) involves ab initio partitioning of the domain of variables \(x_j \) in (P), with uniform partitioning being the most common. Since the number of binary variables added is related to the number of partitions N, this is an important tuning parameter. Experiments in [31] used \(N=2, 4\,\hbox {and}\,8\) up to a maximum of 30 partitioned variables. For a water network problem, Castro [24] proposed a formula to compute N as a function of problem size, to avoid generating intractable MILPs [26]. However, even though piecewise relaxation increases the chances that the bilinear problem will solve to global optimality within a given time limit, this option in GloMIQO is turned off by default [20] since convergence is generally faster that way. One way to improve the quality of the piecewise McCormick relaxation for a given N is to use partition-dependent bounds also for variables \(x_i \), which can be obtained by optimality-based bound contraction [24].

Multiparametric disaggregation (MDT) is a closely related but conceptually different type of MILP relaxation approach. It works by first discretizing variables \(x_j \) in (P) to a specified accuracy level p [29] and then adding slack variables to recover continuous domains [30]. Only in very special cases can the accuracy parameter of MDT be directly related to the number of uniform partitions N of PCM, with the latter allowing for better control of the complexity of the resulting MILP. As an example, for \(x_j \in \left[ {0,1} \right] \), N can be any integer greater than one, whereas the full set of values for \(p\in \left\{ {\ldots ,-3,-2,-1} \right\} \) corresponds to the subset \(N\in \left\{ {10,100,1000,\ldots } \right\} \). Accuracy in MDT can thus only change by whole orders of magnitude, since partitioning is based on a numeric representation system. On the other hand, for MDT the number of added binary variables grows logarithmically with N rather than linearly (for a common implementation of piecewise McCormick [30, 32]; other piecewise linear schemes are not as harsh [33]). As a consequence, MDT may reach considerably smaller optimality gaps, due to a computational performance that is often orders of magnitude faster [32].

In the context of a spatial branch-and-bound algorithm with frequent domain reduction, PCM has another advantage over MDT. Besides also benefiting from tighter variable bounds, it can generate consistently shorter partitions while keeping the value of N constant. In contrast, MDT will require fewer discrete points (i.e. fewer binary variables) to include the reduced domain, but the location of those still required will be exactly the same. In other words, we have with PCM a double impact in relaxation quality with the same problem size whereas in MDT we have a smaller impact with a reduced size. Since the ultimate goal is to prove global optimality, the former is preferable.

In this paper, we bring multiparametric disaggregation closer to uniform piecewise McCormick. The main novelty is to consider a dimensionless domain for the discretized variables in MDT. More specifically, rather than discretizing all possible values that a variable can assume, we discretize the range [0, 1] between the lower (LB) and upper bound (UB). One immediate consequence is that the accuracy level parameter p can now be directly related to the number of partitions N, regardless of the LB and UB values, allowing for a better comparison between the two alternative relaxation methods for (P). Note that the method's most important feature, scaling logarithmically with the number of partitions, is kept.

2 Normalized multiparametric disaggregation

Given a nonconvex bilinear term \(w_{ij} =x_i x_j \), multiparametric disaggregation works by discretizing \(x_j \) over a set of powers \(l\in \left\{ {p,\ldots ,P} \right\} \), where \(P= \lfloor \log _{10} x_j^U\rfloor \) and p is chosen by the user so as to reach a certain accuracy level [23, 29, 30]. The formula for P assumes discretization with base 10, preferred in this paper over bases 2 and 16 because it makes the approach easier to describe and illustrate (the interested reader can find in references [9, 19, 23, 32, 34] the full set of constraints for a generic base and extensive computational results for the binary representation system).

We now propose a normalized version of multiparametric disaggregation that discretizes \(\lambda _j \in \left[ {0,1} \right] \), an auxiliary variable that is used to compute \(x_j \) as a linear combination of its lower \(x_j^L \) and upper \(x_j^U \) bounds:

$$\begin{aligned} x_j =x_j^L +\lambda _j \left( {x_j^U -x_j^L } \right) \quad \forall j \end{aligned}$$
(1)

The exact representation of \(\lambda _j \) can be achieved by considering an infinite number of positions \(l\in {\mathbb {Z}}^{-}\) in the decimal system,

$$\begin{aligned} \lambda _j =\sum \limits _{l\in {\mathbb {Z}}^{-}} \lambda _{jl} \quad \forall j \end{aligned}$$
(2)

and by picking the appropriate digit \(k\in \left\{ {0,1,\ldots ,9} \right\} \) for each power l. This can be stated as a disjunction [35, 36], where binary variables \(z_{jkl} \) take the value of one if digit k is selected for position l for discretized variable \(\lambda _j \):

$$\begin{aligned} \mathop \bigvee \limits ^9_{k=0} \left[ {{\begin{array}{c} {z_{jkl} } \\ {\lambda _{jl} =10^{l}\cdot k} \\ \end{array} }} \right] \quad \forall j,l\in {\mathbb {Z}}^{-} \end{aligned}$$
(3)

The convex hull reformulation [37] of the disjunction in (3) can be simplified so as to generate a sharp formulation without disaggregated variables [38].

$$\begin{aligned}&\displaystyle \lambda _j =\sum \limits _{l\in {\mathbb {Z}}^{-}} \sum \limits _{k=0}^9 10^{l}\cdot k\cdot z_{jkl} \quad \forall j \end{aligned}$$
(4)
$$\begin{aligned}&\displaystyle \sum \limits _{k=0}^9 z_{jkl} =1\quad \forall j,l\in {\mathbb {Z}}^{-} \end{aligned}$$
(5)

Multiplying variable \(x_i \) by (1) and substituting \(x_i x_j \) and \(x_i \lambda _j \) with bilinear variables \(w_{ij} \) and \(\nu _{ij} \) leads to,

$$\begin{aligned} w_{ij} =x_i x_j^L +\nu _{ij} \left( {x_j^U -x_j^L } \right) \quad \forall \left( {i,j} \right) \end{aligned}$$
(6)

Substituting (4) into the definition of \(\nu _{ij} \) leads to the appearance of bilinear terms involving the product of a continuous and a binary variable.

$$\begin{aligned} \nu _{ij} =\sum \limits _{l\in {\mathbb {Z}}^{-}} \sum \limits _{k=0}^9 10^{l}\cdot k\cdot x_i z_{jkl} \quad \forall \left( {i,j} \right) \end{aligned}$$
(7)

We can now perform an exact linearization [39] by introducing new continuous variables \({\hat{x}}_{ijkl} =x_i z_{jkl} \) so that:

$$\begin{aligned} \nu _{ij}= & {} \sum \limits _{l\in {\mathbb {Z}}^{-}} \sum \limits _{k=0}^9 10^{l}\cdot k\cdot {\hat{x}}_{ijkl} \quad \forall \left( {i,j} \right) \end{aligned}$$
(8)
$$\begin{aligned} z_{jkl} x_i^L\le & {} {\hat{x}}_{ijkl} \le z_{jkl} x_i^U \quad \forall \left( {i,j} \right) ,k\in \left\{ {0,\ldots ,9} \right\} ,l\in {\mathbb {Z}}^{-} \end{aligned}$$
(9)

Finally, multiplying (5) by \(x_i \) and replacing the bilinear terms by the new continuous variables results in,

$$\begin{aligned} x_i =\sum \limits _{k=0}^9 {\hat{x}}_{ijkl}\quad \forall \left( {i,j} \right) ,l\in {\mathbb {Z}}^{-} \end{aligned}$$
(10)

The full set of mixed-integer linear constraints for the exact representation of bilinear terms \(w_{ij} =x_i x_j \) is thus given by Eqs. (1), (4)–(6) and (8)–(10), leading to optimization problem (P’).

The following property can be readily established from the derivation of (P’).

Property 1

Problem (P’) is equivalent to problem (P), i.e. \(f_0^{\prime } =\mathop {\min }\nolimits _{{{\mathbf {x}}},{{\mathbf {y}}}} f_0^{\prime } \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) =\mathop {\hbox {min}}\nolimits _{{{\mathbf {x}}},{{\mathbf {y}}}} f_0 \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) =f_0 \).

2.1 Lower bounding formulation

Since it is impossible to compute the infinite sums over all negative integers, we represent \(\lambda _j \) to a finite accuracy level by replacing \(l\in {\mathbb {Z}}^{-}\) with \(l\in \left\{ {p,p+1,\ldots ,-1} \right\} \), where p is a negative integer chosen by the user. In order to close the gap between discretization points so as to allow for all possible values for \(\lambda _j \), we introduce slack variables \(\Delta \lambda _j \) that are bounded between 0 and \(10^{p}\), see Fig. 1. The continuous representation of \(\lambda _j \) is then given by:

$$\begin{aligned} \lambda _j= & {} \sum \nolimits _{l=p}^{-1} \sum \nolimits _{k=0}^9 10^{l}\cdot k\cdot z_{jkl} +\Delta \lambda _j \quad \forall j \end{aligned}$$
(11)
$$\begin{aligned} 0\le & {} \Delta \lambda _j \le 10^{p}\quad \forall j \end{aligned}$$
(12)

As an example, to obtain \(\lambda _j =0.4385\) for \(p=-2\), we need \(z_{j,4,-1} =1\), \(z_{j,3,-2} =1\) and \(\Delta \lambda _j =0.0085\le 0.01=10^{-2}\). To generate \(\lambda _j =1\) for \(p=-1\), the non-zero variables are \(z_{j,9,-1} =1\) and \(\Delta \lambda _j =0.1\).
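The digit decomposition of Eqs. (11)–(12) can be sketched numerically as follows (our own helper, not the paper's code): greedily extract decimal digits down to position p and keep the remainder as the slack \(\Delta \lambda _j \).

```python
# A numeric sketch of Eqs. (1), (11) and (12): decompose lambda_j in [0, 1]
# into decimal digits at positions l = -1, ..., p plus a slack
# 0 <= dlam <= 10**p, so that lambda_j = sum_l 10**l * k_l + dlam.

def decompose(lam, p):
    digits = {}
    rest = lam
    for l in range(-1, p - 1, -1):
        k = min(int(rest // 10**l), 9)  # digit k selected at position l
        digits[l] = k
        rest -= k * 10**l
    return digits, rest                 # rest is the slack term

digits, dlam = decompose(0.4385, -2)
# digits -> {-1: 4, -2: 3}, dlam -> 0.0085 (up to floating-point rounding)
```

Running `decompose(1.0, -1)` reproduces the second example from the text: digit 9 at position \(-1\) with slack 0.1.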

Fig. 1
figure 1

The continuous representation of variable \(\lambda _{{j}} \) is achieved by discretizing the domain up to a certain level \({\textit{p}}\) and adding a bounded variable \({\Delta }{\lambda } _{j} \)

For the continuous representation of the bilinear term \(\nu _{ij} =x_i \lambda _j \), and following the same reasoning as before, we get:

$$\begin{aligned} \nu _{ij} =\sum \nolimits _{l=p}^{-1} \sum \nolimits _{k=0}^9 10^{l}\cdot k\cdot {\hat{x}}_{ijkl} +x_i \cdot \Delta \lambda _j \quad \forall \left( {i,j} \right) \end{aligned}$$
(13)

Notice the appearance of new bilinear terms \(x_i \cdot \Delta \lambda _j \) that are going to be relaxed using the McCormick envelopes [40], which in this case coincide with the reformulation linearization technique bound factor products \(x_i -x_i^L \ge 0\), \(x_i^U -x_i \ge 0\), \(\Delta \lambda _j \ge 0\) and \(10^{p}-\Delta \lambda _j \ge 0\) [41, 42]. Variables \(\Delta \nu _{ij} \) replace \(x_i \cdot \Delta \lambda _j \) in Eq. (13).

$$\begin{aligned}&x_i^L \cdot \Delta \lambda _j \le \Delta \nu _{ij} \le x_i^U \cdot \Delta \lambda _j\quad \forall \left( {i,j} \right) \end{aligned}$$
(14)
$$\begin{aligned}&\left( {x_i -x_i^U } \right) \cdot 10^{p}+x_i^U \cdot \Delta \lambda _j \le \Delta \nu _{ij} \le \left( {x_i -x_i^L } \right) \cdot 10^{p}+x_i^L \cdot \Delta \lambda _j\quad \forall \left( {i,j} \right) \end{aligned}$$
(15)

Replacing Eqs. (4) and (8) in (P’) by (11)–(15), we obtain the new optimization problem (PR), which is a relaxation of (P). In other words, (PR) admits values of \(w_{ij}\), \(x_i\) and \(x_j \) that do not satisfy \(w_{ij} =x_i x_j \).
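That Eqs. (14)–(15) indeed envelop the product \(x_i \cdot \Delta \lambda _j \) can be checked numerically with a small sketch (ours, not from the paper; the tolerance guards against floating-point noise):

```python
# Numeric check that the McCormick envelopes (14)-(15) always contain the
# true product dnu = x_i * dlam_j on the box
# x_i in [xL, xU], dlam_j in [0, 10**p].

def envelope(x, dlam, xL, xU, p):
    step = 10**p
    lo = max(xL * dlam, (x - xU) * step + xU * dlam)  # lower halves of (14)-(15)
    hi = min(xU * dlam, (x - xL) * step + xL * dlam)  # upper halves of (14)-(15)
    return lo, hi

xL, xU, p = 2.0, 7.0, -2
contained = True
for i in range(11):
    for j in range(11):
        x = xL + (xU - xL) * i / 10
        dlam = 10**p * j / 10
        lo, hi = envelope(x, dlam, xL, xU, p)
        contained &= (lo - 1e-12 <= x * dlam <= hi + 1e-12)
# contained -> True: the envelope holds over the whole box
```

At the corners of the box (e.g. \(x_i =x_i^L \), \(\Delta \lambda _j =0\)) the lower and upper envelopes coincide with the exact product, as expected of McCormick envelopes.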

The following property can be stated from the above derivation and discussion:

Property 2

The solution of problem (PR) yields a lower bound for problem (P), i.e. \(f_0^R =\mathop {\min }\nolimits _{{{\mathbf {x}}},{{\mathbf {y}}}} f_0^R \left( {{{\mathbf {x}}},{{\mathbf {y}}}} \right) \le f_0 \).

2.2 Algorithm for global optimization

Global optimization algorithms work by driving a lower bound, obtained from the solution of a relaxation problem, towards an upper bound, typically obtained by solving a constrained version of the original problem. Spatial branch-and-bound is an iterative procedure for raising the lower bound that acts on a single variable at a time. In contrast, the proposed MILP relaxation (PR) of the original MINLP problem (P) involves simultaneous variable partitioning and so can be made as tight as desired simply by acting on the accuracy level parameter p.

Property 3

As p approaches \(-\infty \), \(f_0^R \) approaches \(f_0 \).

Proof

As p approaches \(-\infty \) in (PR) we get from Eq. (12):

$$\begin{aligned} \mathop {\lim }\limits _{p\rightarrow -\infty } \Delta \lambda _j =\mathop {\lim }\limits _{p\rightarrow -\infty } 10^{p}=0 \end{aligned}$$

Then, from Eq. (14):

$$\begin{aligned} \mathop {\lim }\limits _{p\rightarrow -\infty } \Delta \nu _{ij}= & {} \mathop {\lim }\limits _{p\rightarrow -\infty } \left( {x_i^L \cdot \Delta \lambda _j } \right) =\mathop {\lim }\limits _{p\rightarrow -\infty } \left( {x_i^U \cdot \Delta \lambda _j } \right) \\= & {} x_i^U \cdot \mathop {\lim }\limits _{p\rightarrow -\infty } \Delta \lambda _j =x_i^U \cdot 0=0 \end{aligned}$$

And from Eq. (15):

$$\begin{aligned} \mathop {\lim }\limits _{p\rightarrow -\infty } \Delta \nu _{ij}= & {} \mathop {\lim }\limits _{p\rightarrow -\infty } \left[ {\left( {x_i -x_i^U } \right) \cdot 10^{p}+x_i^U \cdot \Delta \lambda _j } \right] \\= & {} \mathop {\lim }\nolimits _{p\rightarrow -\infty } \left[ {\left( {x_i -x_i^L } \right) \cdot 10^{p}+x_i^L \cdot \Delta \lambda _j } \right] \\= & {} \left( {x_i -x_i^L } \right) \cdot \mathop {\lim }\limits _{p\rightarrow -\infty } 10^{p}+\mathop {\lim }\limits _{p\rightarrow -\infty } \left( {x_i^L \cdot \Delta \lambda _j } \right) \\= & {} \left( {x_i -x_i^L } \right) \cdot 0+0=0 \end{aligned}$$

Variables \(\Delta \lambda _j \) and \(\Delta \nu _{ij} \) are thus eliminated. Since the domain of l in the summations and constraints of (PR) becomes the set of negative integers \({\mathbb {Z}}^{-}\), (PR) becomes (P’) and hence \(f_0^R \) approaches \(f_0^{\prime } \). Then, from Property 1, \(f_0^R \) approaches \(f_0 \). \(\square \)

The optimal solution from (PR) provides a good initialization point for the solution of (P). An efficient method to generate a feasible solution for (P), which will be an upper bound, is to fix all binary variables and solve the resulting NLP with a fast local solver [30, 32]. Note, however, that this may render the problem infeasible, which is more likely to occur for higher p values.

The global optimization algorithm resulting from the aforementioned lower and upper bounding methods is as follows. Starting with the lowest accuracy level, we solve (PR) and (P) in sequence. If the difference between the values of the objective function is smaller than the given relative optimality tolerance \(\varepsilon \), then the algorithm terminates; otherwise, accuracy is increased to the next level and the problems are solved once more.

Algorithm

  • Step 0. Choose \(p=-1\) and let \(f_0^*=+\infty \).

  • Step 1. Solve (PR) to obtain \(f_0^R \) and point \(\left( {{{\mathbf {x}}}^{R},{{\mathbf {y}}}^{R}} \right) \).

  • Step 2. Add constraint \({{\mathbf {y}}}={{\mathbf {y}}}^{R}\) to (P) reducing it to an NLP. Using \({{\mathbf {x}}}^{R}\) as a starting point, solve the NLP with a local solver. If the NLP is feasible and a lower value is obtained for the objective function, update the upper bound \(f_0^*\) and best solution \(\left( {{{\mathbf {x}}}^{*},{{\mathbf {y}}}^{*}} \right) \).

  • Step 3. If \(\left( {f_0^*-f_0^R } \right) /f_0^R \le \varepsilon \), STOP, the solution \(\left( {{{\mathbf {x}}}^{*},{{\mathbf {y}}}^{*}} \right) \) is globally optimal. Otherwise, set \(p=p-1\) and return to step 1.
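The loop structure of Steps 0–3 can be sketched as follows. Note that `solve_PR` and `solve_NLP` are stand-ins for the MILP relaxation (PR) and the fixed-binary NLP of Step 2, which require real solvers; the toy bounding pair at the bottom is ours, chosen only so that the loop terminates.

```python
# Skeleton of the algorithm in Sect. 2.2 (our paraphrase).

def global_opt(solve_PR, solve_NLP, eps=1e-3, p_min=-9):
    f_best = float("inf")                      # Step 0
    p = -1
    while p >= p_min:
        f_lb, point = solve_PR(p)              # Step 1: lower bound from (PR)
        f_ub = solve_NLP(point)                # Step 2: NLP may be infeasible
        if f_ub is not None and f_ub < f_best:
            f_best = f_ub                      # update incumbent
        if (f_best - f_lb) / f_lb <= eps:      # Step 3: relative gap check
            return f_best, f_lb, p
        p -= 1                                 # tighten the relaxation
    return f_best, f_lb, p

# Toy bounding pair (hypothetical): the relaxation gap shrinks 10x per level.
f_opt = 7049.248
f_best, f_lb, p = global_opt(lambda p: (f_opt - 1000 * 10**p, None),
                             lambda point: f_opt)
# converges at p = -3, when the 1e-3 relative gap is first satisfied
```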

3 Motivation for normalized multiparametric disaggregation

The difference between the standard and normalized multiparametric disaggregation comes down to the choice of discretizing either the values of the variables or the range between their lower and upper bounds. Discretized variables will typically have dissimilar domains, perhaps even of different orders of magnitude, and it is difficult to predict which will lead to the largest error between the exact representation of the bilinear term and its mixed-integer linear relaxation. While it is straightforward to alter the formulations to consider a variable-dependent accuracy parameter p, given the large number of discretized variables that may be involved, it is highly desirable to keep the number of tuning parameters to a minimum and consider a global value instead.

Fig. 2
figure 2

Unlike the standard approach (MDT), the normalized multiparametric disaggregation (NMDT) generates the same number of discrete points for all discretized variables using a global accuracy parameter \({\textit{p}}\)

If this is done for the standard MDT, we are effectively giving a different importance to the discretized variables, as can be seen in the simple example in Fig. 2. Assuming accuracy at the units level (\(p=0\)), MDT generates 11 discrete points for variable \(x_1 \in \left[ {0,10} \right] \), whereas variable \(x_2 \in \left[ {3,5} \right] \) gets only three. In contrast, our new NMDT approach generates the same relaxation for \(x_1 \) using \(p=-1\), while \(x_2 \) gets the same number of discrete points as \(x_1 \). Compared to MDT, variable \(x_2 \) is now partitioned into uniform intervals that are five times shorter, potentially leading to a tighter relaxation.
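The point counts behind Fig. 2 amount to simple arithmetic, sketched below (our helpers): standard MDT places points at the multiples of the discretization step that fall inside \([x^L, x^U]\), whereas NMDT always creates \(10^{-p}\) points on the normalized domain, whatever the bounds.

```python
import math

def mdt_points(xL, xU, step):
    # discrete points of standard MDT: multiples of `step` inside [xL, xU]
    return math.floor(xU / step) - math.ceil(xL / step) + 1

def nmdt_points(p):
    # NMDT: one digit position per power l, 10**(-p) points for any bounds
    return 10**(-p)

# x1 in [0, 10] with unit step: 11 points; x2 in [3, 5]: only 3.
# With NMDT and p = -1, both variables get 10 points.
```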

The normalized multiparametric disaggregation can also take further advantage of information resulting from bound contracting procedures that are frequently called in spatial branch-and-bound algorithms, as can be seen from the solution of problem (P1). This has been adapted from Problem 106 in Hock and Schittkowski [43] so as to facilitate comparison. More specifically, the lower bounds for variables \(x_4 \) to \(x_8 \) have been lowered from 10 to 0, but this has no effect on the value of the objective function: 7049.248. Of the different choices available concerning the discretized variables, we choose to discretize \(x_4 ,\ldots ,x_8 \).

$$\begin{aligned}&\hbox {min}\,x_1 +x_2 +x_3\\&\hbox {s.t}.\,0.0025\left( {x_4 +x_6 } \right) -1\le 0\\&\qquad 0.0025\left( {-x_4 +x_5 +x_7 } \right) -1\le 0\\&\qquad 0.01\left( {-x_5 +x_8 } \right) -1\le 0\\&\qquad 100x_1 -x_1 x_6 +833.33252x_4 -83{,}333.333\le 0\\&\qquad x_2 x_4 -x_2 x_7 -1250x_4 +1250x_5 \le 0\\&\qquad x_3 x_5 -x_3 x_8 -2500x_5 +1{,}250{,}000\le 0\\&\qquad 100\le x_1 \le 10{,}000,\quad 1000\le x_2 ,x_3 \le 10{,}000,\quad 0\le x_4 ,\ldots ,x_8 \le 1000\qquad \qquad \mathbf{(P1)} \end{aligned}$$

Assume for illustrative purposes that discrete points are placed at the hundreds, which corresponds to a coarse accuracy level, as can be seen in Fig. 3 for variable \(x_5 \). This corresponds to \(p=-1\) for discretized variable \(\lambda _5 \) in NMDT and to \(p=2\) for discretized variable \(x_5 \) in MDT [30], leading respectively to the definition of 10 and 12 binary variables for this variable. The two extra binaries for MDT result from the way it was coded and are due to discrete point 1000 being part of the domain of \(x_5 \). In contrast, in NMDT, \(x_5 =1000\) is achieved by making \(z_{5,9,-1} =1\) and \(\Delta \lambda _5 =0.1\), recall Eq. (11). Despite this minor difference, the quality of the relaxation is the same for both approaches.

Fig. 3
figure 3

Discrete points generated for variable \({{x}}_5 \) of (P1) by standard (MDT) and normalized multiparametric disaggregation (NMDT) for the cases of original and tightened variable bounds

Table 1 Comparison between standard (MDT) and normalized multiparametric disaggregation (NMDT) for problem (P1)

If we now apply optimality-based bound contraction [23], using the McCormick envelopes [40] to generate the relaxation, the domain can almost be cut in half, to \(x_5 \in \left[ {107.78,580.22} \right] \). As a consequence, MDT reduces the number of discrete points to five, with the first (\(x_5 =100\)) actually being placed outside the variable domain so as to obtain values up to 200 by adding the contribution of the slack variable. Due to the tighter bounds, the relative optimality gap, \(gap=\frac{f_0 -f_0^R }{f_0^R }\), decreases by a factor between 1.02 and 2.74 from \(p=2\) to \(p=-2\) (Table 1). In contrast, NMDT keeps the same ten discrete points, but since these are now closer together, the optimality gap can be reduced further, by a factor between 2.00 (\(p=-1\)) and 4.48 (\(p=-5\)).

With the increase in problem size, it becomes apparent from the results in Table 1 that the relaxation problems generated from the new approach can be solved faster. However, this is no longer the case if their solution is preceded by bound contraction. In fact, there is a reduction in computational time for MDT, which was expected due to the fewer binary variables, but for NMDT the effort typically increases, reaching a factor of 4.37 for \(p=-5\). Thus, there is a clear tradeoff between relaxation quality and computational effort.

4 Piecewise McCormick relaxation

Piecewise McCormick envelopes [2, 11, 25] also provide a MILP relaxation for problem (P). Of at least 15 alternatives that have been proposed [26, 27], we consider the one below that relies on uniform partitioning, noting that others may perform better. The basic principle is to divide the domain of variable \(x_j \) in bilinear term \(w_{ij} =x_i x_j \) into N partitions and apply the McCormick envelopes [40] on each, taking advantage that the partition bounds \(x_{jn}^L \) and \(x_{jn}^U \) are tighter than the global bounds \(x_j^L \) and \(x_j^U \), see Fig. 4. Binary variables \(z_{jn} \) select the best partition.
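The uniform partitioning just described can be sketched as follows (helper and names are ours): each of the N partitions has local bounds \(x_{jn}^L \) and \(x_{jn}^U \) that replace the global bounds in the McCormick envelope of \(w_{ij} =x_i x_j \).

```python
# Uniform partition bounds for piecewise McCormick (a sketch).

def partition_bounds(xL, xU, N):
    width = (xU - xL) / N
    return [(xL + n * width, xL + (n + 1) * width) for n in range(N)]

bounds = partition_bounds(0.0, 1000.0, 10)
# 10 partitions of width 100; e.g. partition n = 5 covers [500.0, 600.0]
```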

The complete lower bounding formulation (PR-PCM) is given below.

5 Normalized multiparametric disaggregation versus piecewise McCormick

The new normalized multiparametric disaggregation approach can also be viewed as partitioning the domain of variable \(x_j \). However, every single decrease in p corresponds to a ten-fold increase in N.

Fig. 4
figure 4

In the piecewise McCormick approach, variable \({{x}}_{\textit{j}}\) is divided into N partitions, with binary variables \({{z}}_{{\textit{jn}}} \) selecting the optimal one

Remark 1

The partitioning (accuracy) level parameters N, of the piecewise McCormick, and p, of normalized multiparametric disaggregation technique, are related as follows: \(N=10^{-p}\).

Piecewise McCormick is thus better at controlling the tightness of the MILP relaxation of the original MINLP problem, which may be an important point when dealing with large bilinear problems. The advantage of multiparametric disaggregation is that it scales much more favorably with the accuracy level [30, 32].

Remark 2

The number of binary variables required per partitioned variable \(x_j \) is equal to N for piecewise McCormick formulation (PR-PCM), and to \(10\log N\) for normalized multiparametric disaggregation.

From Remarks 1 and 2, one can claim that piecewise McCormick is the only option for \(N\in \left\{ {2,\ldots ,9} \right\} \) and predict that normalized multiparametric disaggregation will have a stronger computational performance for \(N\in \left\{ {100,1000,\ldots } \right\} \). But which approach is better when the same number of binary variables is involved (\(N=10)\)? This question will be answered in Sect. 6.1.
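The counts behind Remarks 1 and 2 reduce to simple arithmetic, sketched below (our helpers; valid only for N a power of 10, since NMDT uses 10 digit binaries per decimal position):

```python
import math

def binaries_pcm(N):
    # piecewise McCormick: one binary z_jn per partition
    return N

def binaries_nmdt(N):
    # NMDT: 10 binaries z_jkl per decimal position, log10(N) positions
    return 10 * round(math.log10(N))

# N = 10 is the break-even point; beyond it NMDT grows logarithmically
# while piecewise McCormick grows linearly.
```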

5.1 The special case of quadratic terms

Previous research has shown [30] that if (P) is restricted to strictly bilinear terms, the quality of the relaxation from multiparametric disaggregation is the same as from piecewise McCormick (for the same accuracy level). This is no longer the case if quadratic terms are involved, with the latter approach leading to a tighter relaxation.

For illustration purposes, consider the very simple nonlinear problem (P2) taken from [44], for which the global optimal solution is equal to 58.383672 at (2.555772, 3.130169).

As can be seen in Table 2, the value of the lower bound from (PR-PCM) is always higher than from (PR), with the absolute difference decreasing with an increase in the number of partitions.

Table 2 Value of objective function \({{f}}_{\textit{0}}^{{R}} \) for motivating example (P2) with quadratic terms

6 Numerical experiments

The performance of the new normalized multiparametric disaggregation approach is evaluated through the solution of 43 benchmark problems from the literature. These involve four distinct sets of process network problems of type (P) featuring solely strictly bilinear terms. More specifically, we have: (i) 18 water-using network design problems [34]: WUN_2-18 and 20; (ii) 15 distributed wastewater treatment networks [34]: WTN_2-16; (iii) 7 multiperiod blending problems [9]: MPBP_029, 146, 480, 531, 718, 721, 852; (iv) 3 hydroelectric scheduling problems considering a subset of reservoirs [23]: HYD_2, 4 and 7.

In the given problems, bilinear terms involve the multiplication of two different sets of variables: flowrates by concentrations in (i)–(iii) and flowrates by volumes in (iv). Of the two alternatives available to generate the relaxation MILP, we choose to discretize concentrations in (i)–(iii) and flowrates in (iv). Note that problem types (i)–(ii) are NLPs, whereas (iii)–(iv) are MINLPs. The latter are freely available at www.minlp.org and also include the model for the standard multiparametric disaggregation approach. Statistics related to problem size and number of nonlinear terms for problems (i)–(iii) can be found in [45], since all these problems are part of the GloMIQO 2.2 [20] test suite.

All mathematical formulations and algorithms were implemented in GAMS 24.3 and solved on an Intel i7-4790 (3.6 GHz) processor with 8 GB of RAM, a solid-state drive and a Windows 7, 64-bit operating system. The MILP relaxation problems (PR) and (PR-PCM) were solved by CPLEX 12.6 running in parallel deterministic mode using up to 8 threads. The termination criteria were a relative optimality tolerance equal to \(10^{-6}\) or a maximum computational effort equal to 3600 CPUs for problems (i)–(iii) and 18,000 CPUs for the most difficult problems (iv). The NLPs arising from either the complete problem (P) or its constrained version without the binary variables (recall step 2 of the algorithm presented in Sect. 2.2) were tackled by the local optimization solver CONOPT 3.16C [46]. Problem (P) was also tackled by the global optimization solvers BARON 14.0.3 [21, 47] and GloMIQO 2.3 [20], a MIQCP standalone solver that is also part of the general MINLP solver ANTIGONE [22].

Optimality-based bound contraction, illustrated in Fig. 3, was used exclusively in problem (P1).

6.1 Which approach is best for 10 partitions?

We start by solving the MILP relaxation problems from normalized multiparametric disaggregation (NMDT) and piecewise McCormick (PCM), using respectively \(p=-1\) and \(N=10\), in all 43 benchmark problems. We know from the discussion in Sect. 5 that the MILPs from NMDT and PCM feature the same number of binary variables (the numbers of total variables and constraints are very similar) and give rise to the exact same relaxation \(f_0^R \), and so computational time is the obvious choice of performance measure. To obtain an overall assessment of the performance of a solver compared to the other solvers in a test group, one can generate the performance profiles of Dolan and Moré [48], a widely used tool for benchmarking and comparing optimization software. We plot the cumulative distribution as a function of \(\tau \), a parameter telling us that the performance ratio of a solver with respect to the best solver is below \(2^{\tau }\). As an example, the results for \(\tau =0\) give the probability that a solver is the fastest, while for \(\tau =1\) we have the probability that a solver is at most 2 times slower than the fastest solver.
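A minimal version of this construction can be sketched as follows (our code, with made-up times except for the WTN_9 pair quoted below): for each solver, the profile value at \(\tau \) is the fraction of problems solved within \(2^{\tau }\) times the fastest solver's time.

```python
# A minimal Dolan-More performance profile.

def perf_profile(times, tau):
    # times: dict solver -> list of times, aligned by problem index
    n = len(next(iter(times.values())))
    best = [min(t[i] for t in times.values()) for i in range(n)]
    return {s: sum(t[i] <= 2**tau * best[i] for i in range(n)) / n
            for s, t in times.items()}

times = {"NMDT": [1.0, 2.0, 1360.0], "PCM": [1.5, 2.0, 14436.0]}
rho = perf_profile(times, 0)  # tau = 0: fraction of wins, ties count for both
```

With these three toy problems, NMDT wins or ties all of them (`rho["NMDT"] == 1.0`), while PCM only ties the second, illustrating why ties let the values at \(\tau =0\) sum to more than one.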

From the results in Fig. 5, one can see that the new approach is clearly superior, being the fastest 88.3 % of the time vs. 20.9 % for PCM (note that ties count for both solvers and so the sum can be greater than one). Nevertheless, it should be highlighted that the computational times are typically of the same order of magnitude, the most notable exception being WTN_9, which took 1360 CPUs to solve to optimality by NMDT and 14,436 CPUs by PCM. This is the reason why the two curves meet around \(\tau =3.4\) (\(2^{3.4}\approx \frac{14{,}436}{1360}\)).

Fig. 5
figure 5

Performance profile for computational time when considering 10 partitions for all discretized variables

The above comparison is important in the context of convex hull reformulations of generalized disjunctive programming (GDP) models [24, 35–37, 49]. While for multiparametric disaggregation there is no need to define disaggregated variables associated with discretized variable \(x_j \) [as discussed in Sect. 2, Eqs. (3)–(4)], these can be found in (PR-PCM) in the form of continuous variables \({\hat{x}}_{jn} \). We do require disaggregated variables \({\hat{x}}_{ijkl} \) in (PR), which for \(N=10\) have a direct correspondence with \({\hat{x}}_{ijn} \) in (PR-PCM). Overall, when alternatives exist to model a particular problem as a GDP, one should look for the one leading to the most compact convex hull reformulation, since this will typically lead to the best performance; see also [49].

Table 3 Key performance indicators for the different global optimization algorithms

6.2 Comparison to commercial solvers

The algorithm discussed in Sect. 2.2 for normalized multiparametric disaggregation is capable of finding feasible solutions to problem (P) and of proving global optimality if sufficient computational resources are provided. It has also been implemented for piecewise McCormick, after replacing \(p=-1\) with \(N=10\) in step 0, (PR) with (PR-PCM) in step 1, and \(p=p-1\) with \(N=10N\) in step 3 [32]. Note that it has been verified experimentally that, for a particular iteration, both NMDT and PCM lead to the same value of the objective function \(f_0^R \), provided enough time is given for the MILP solver (CPLEX) to converge. The total computational time listed is the sum over all iterations and does not exceed the maximum limits defined in Sect. 6. Since the focus is on relaxation quality, the value of \(f_0^*\) used to compute the optimality gap was the best known solution and not necessarily the best value found in step 2. The same applies to the outputs of the commercial global optimization solvers BARON and GloMIQO, which are used for comparative purposes.
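
The iterative scheme of Sect. 2.2 can be outlined as follows. This is a minimal sketch with placeholder callbacks (`solve_relaxation`, `solve_local`) and a relative-gap termination test, not the exact implementation used in the paper.

```python
import time

def nmdt_global_opt(solve_relaxation, solve_local, p0=-1, gap_tol=1e-4,
                    max_time=3600.0):
    """Outline of the iterative NMDT algorithm (minimization assumed).

    solve_relaxation(p) -> (lower_bound, relaxation_point): MILP (PR)
    solve_local(point)  -> upper_bound: objective of a feasible solution
                           of (P) obtained with a local NLP solver
    """
    p = p0                                     # step 0: coarsest accuracy level
    best_ub = float("inf")
    t0 = time.perf_counter()
    while True:
        lb, x_rel = solve_relaxation(p)        # step 1: solve MILP relaxation
        best_ub = min(best_ub, solve_local(x_rel))  # step 2: local solve of (P)
        gap = (best_ub - lb) / max(abs(best_ub), 1e-10)
        if gap <= gap_tol or time.perf_counter() - t0 >= max_time:
            return best_ub, lb, gap
        p -= 1                                 # step 3: one more accuracy digit
```

The PCM variant follows by starting from \(N=10\) and multiplying \(N\) by 10 in the refinement step, mirroring the substitutions listed above.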

Table 4 Detailed results for the solution of the benchmark NLP and MINLP problems

Table 3 provides an overview of computational performance by listing key performance indicators. While every algorithm is the best performer in at least two problems, the new method and GloMIQO have the most wins. Note that the first criterion used for this metric was the optimality gap, giving precedence to solutions proven globally optimal, followed by computational time. GloMIQO actually proved global optimality the most times, while BARON was the most successful in finding the actual optima. All algorithms returned worse solutions for HYD_4 and HYD_7 than the outer approximation solver DICOPT [50] (the values in column 2 of Table 4 are taken from [23]). The other two failures for NMDT and GloMIQO came from the multiperiod blending problems. Interestingly, BARON could solve all multiperiod blending problems in less than half an hour (see the detailed results in Table 4). On the other hand, regarding the comparison between the commercial solvers, GloMIQO did a better job solving the NLPs resulting from the two types of water network problems.

Another aspect worth highlighting from the results in Table 3 is the ability of the NMDT algorithm to find optimal solutions. Note that, unlike the commercial algorithms that solve constrained versions of (P) in many nodes of the spatial branch-and-bound tree, the new algorithm solves (P) just a few times but still manages to find the optima. This is a direct consequence of having a tight relaxation that provides excellent starting points for the solution of (P) with a local solver (recall step 2 in Sect. 2.2) and is consistent with our previous work. In that sense, and due to the worse scaling of problem size, PCM cannot reach accuracy levels as deep as NMDT (see the p values in Table 4), returning lower \(f_0^R \) values and inferior initialization points that sometimes lead to failure.

While the key performance indicators are useful, they do not tell the whole story. The detailed results in Table 4 are needed for a more thorough analysis, which is now performed with the help of the performance profiles of Dolan and Moré [48]. The optimality gap is selected as the performance measure, leading to the chart in Fig. 6.

Interestingly, there is a clear ranking amongst the global optimization algorithms tested, NMDT \(>\) GloMIQO \(>\) BARON \(>\) PCM, which remains constant throughout the x-axis. Note that the chosen upper bound (\(\tau =10\), i.e. \(2^{10}=1024\)) counts problems for which the optimality gap is at most three orders of magnitude larger than the gap returned by the best algorithm. Thus, if an algorithm can solve a problem to global optimality, that problem will only contribute to the cumulative distribution function of the other algorithms if they return a gap below 0.1 %. In this case, the probability that a solver is within this region is equal to 93 % for NMDT, 76.7 % for GloMIQO and 65.1 % for BARON, which is an indication that the NMDT and commercial algorithms have complementary strengths (the gap for NMDT was always at least as good as the one for PCM). Overall, NMDT obtains consistently low optimality gaps, with a 72.1 % probability that it will return the lowest gap amongst all algorithms within the given maximum computational time. Its highest gap (3.57 %) is obtained for MPBP_480 and can be reduced to just 0.30 % if more time is given (5877 CPUs). In contrast, GloMIQO and BARON return gaps as high as 41.4 and 72.3 %, respectively.

Fig. 6 Performance profile for optimality gap

7 Conclusions

This paper has presented a mixed-integer linear programming model for the relaxation of mixed-integer quadratically constrained problems. It can be viewed as an upgrade of the computationally efficient multiparametric disaggregation technique, which relies on the concept of discretizing one of the variables in every bilinear term to a certain accuracy level. The novel aspect concerns the discretization of the range between the variable's bounds rather than of its actual values, leading to a normalized approach that is more suitable for integration with the bound contracting procedures that are part of spatial branch-and-bound algorithms.

It has been shown that accuracy levels in normalized multiparametric disaggregation can be related to the number of uniform partitions of a piecewise McCormick relaxation approach, thus allowing for a direct comparison. Through the solution of four sets of NLP and MINLP problems from the literature involving strictly bilinear terms, we have seen that the relaxation quality for a given number of partitions is the same. For the lowest possible accuracy setting, the new approach already outperforms the particular piecewise McCormick formulation presented in the paper, becoming significantly faster as the number of partitions increases due to the logarithmic vs. linear growth in problem size. This translates into an ability to generate optimality gaps that can be orders of magnitude narrower.
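
The logarithmic vs. linear growth can be made concrete by counting the binary variables introduced per discretized variable. Assuming the base-10 digit representation of Sect. 2 (ten binaries per decimal position, with \(|p|\) positions at accuracy level \(p\)) and the equivalence \(N=10^{|p|}\) from Sect. 5, a simple sketch is:

```python
def nmdt_binaries(p):
    """Binaries per discretized variable in NMDT at accuracy level p < 0:
    ten binaries (one per digit 0-9) for each of the |p| decimal positions."""
    return 10 * abs(p)

def pcm_binaries(p):
    """Binaries per discretized variable in piecewise McCormick with the
    equivalent number of uniform partitions, N = 10**|p| (one per partition)."""
    return 10 ** abs(p)

# Equal at the coarsest level, then logarithmic vs. linear growth in N:
for p in (-1, -2, -3, -4):
    print(f"p={p}: NMDT {nmdt_binaries(p):>3} vs PCM {pcm_binaries(p):>6}")
```

At the coarsest setting (\(p=-1\), \(N=10\)) the two formulations coincide in size, consistent with the comparison in Sect. 6.1, but each extra accuracy digit adds only ten binaries to NMDT while multiplying the PCM count by ten.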

The relaxation problem has been used as the basis for a rigorous global optimization algorithm that was compared to the commercial solvers BARON and GloMIQO. It has been shown that the proposed simple algorithm outperforms its state-of-the-art counterparts on forty-three benchmark problems, clearly indicating that multiparametric disaggregation should be used to a greater extent.