1 Introduction

Many process systems engineering problems can be cast as mixed-integer bilinear programs (MIBLP), such as crude oil scheduling [15, 21, 22, 26], multi-period gasoline blending [5, 18, 19, 23], water network design and operation [1, 2, 14, 16, 17, 20, 31], supply chain management [27] and hydrogen network optimization [13]. In this paper, we consider MIBLP in the following form:

$$\begin{aligned} \min&\;\; c({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta }),\\ s.t.&\;\; {\varvec{g}}({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta })\le 0,\\&\varvec{\delta } \in \{0,1\}^n,\\&(w_{l,m}, x_m, y_l)\in {\mathcal {S}},\quad \forall (l,m)\in \varTheta , \end{aligned}$$
(MIBLP)

where \(c\) and \({\varvec{g}}\) are linear functions, \({\varvec{x}}\) and \({\varvec{y}}\) are normalized variables involved in bilinear terms, \({\varvec{w}}\) represents the values of the bilinear terms, index set \(\varTheta \) associates the components of \({\varvec{x}}\) and \({\varvec{y}}\) with components of \({\varvec{w}}\), and set \({{\mathcal {S}}=\{(w,x,y)\in {\mathbb {R}}\times [0,1]^2:w=xy\}}\). Note that the binary variables \(\varvec{\delta }\) appear linearly in the objective and the constraints.

Due to the integer variables and the nonconvex bilinear terms, obtaining a global solution of a MIBLP is computationally challenging. In the spatial branch-and-bound framework, the solution efficiency depends heavily on the tightness of the relaxation problem. A straightforward way to generate a relaxation problem is to replace the bilinear terms with their convex envelopes [24], and then the relaxation problem is a mixed-integer linear program (MILP). Strengthening constraints derived from the Reformulation-Linearization Technique [29] can be added to tighten the MILP relaxation. A more sophisticated relaxation strategy, called piecewise McCormick relaxation, is to partition the domain of variables in a bilinear term into multiple subdomains and replace the bilinear term with its convex envelopes on each individual subdomain [16, 35]. The disjunctive constraints from piecewise McCormick relaxation can be represented by a logarithmic number of binary variables and constraints [34], and this leads to a logarithmic partitioning scheme [7, 25].
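To make these envelopes concrete, the following sketch (in Python; the function names are ours, purely for illustration) evaluates the McCormick bounds for \(w=xy\) on \([0,1]^2\) and their piecewise counterparts on a uniform partition of the x-range:

```python
def mccormick_bounds(x, y):
    """Convex-envelope bounds for w = x*y on the unit box [0,1]^2.

    Underestimators: w >= 0 and w >= x + y - 1;
    overestimators:  w <= x and w <= y.
    """
    lower = max(0.0, x + y - 1.0)
    upper = min(x, y)
    return lower, upper

def piecewise_mccormick_bounds(x, y, n):
    """Piecewise McCormick: partition the x-range into n subintervals
    and apply the envelope on the subinterval containing x."""
    i = min(int(x * n), n - 1)      # index of the subinterval holding x
    lo, hi = i / n, (i + 1) / n     # x-subinterval [lo, hi], y in [0, 1]
    lower = max(lo * y, hi * y + x - hi)
    upper = min(hi * y, lo * y + x - lo)
    return lower, upper
```

On any subinterval, the piecewise bounds are at least as tight as the bounds on the full box, which is the motivation for partitioning.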

Another approach to generate a tight MILP relaxation is based on variable discretization. In this approach, one variable in the bilinear term is represented by an integer part and a decimal part. The integer part can take a set of equally spaced discrete values in the variable range, and the decimal part represents the deviation of the variable value from one of the discrete values. Using a radix-based representation [19], the integer part can be expressed by a set of binary variables; then the bilinear term becomes the sum of a set of bilinear terms, and most of the terms can be rigorously transformed into linear constraints via exact linearization. The base of the radix-based representation, R, can be any natural number, and the existing discretization based MILP relaxation methods differ primarily in the choice of R. To the best of our knowledge, Pham et al. [28] first considered the discretization of a continuous variable involved in a bilinear term. In order to reduce the search space within the branch-and-bound framework, they finitely enumerated possible values of the variable being discretized, so essentially \(R=1\) was selected. Teles et al. [32] developed a multi-parametric disaggregation technique (MDT), which is a discretization based method using \(R=10\). Following the MDT approach, Kolodziej et al. [18] constructed restricted and relaxation MILPs in a branch-and-bound search for bilinear programming. They concluded that MDT yields a better MILP relaxation than piecewise McCormick relaxation because of its smaller problem size. Kolodziej et al. [19] extended MDT to the general radix-based representation. They also implemented both the binary system (\(R=2\)) and the decimal system (\(R=10\)) in their global optimization algorithms to solve multi-period pooling problems. Both methods showed significant computational advantage over commercial global optimization solvers.
Castro [6] proposed a discretization method using a mixed-radix numeral system, which uses multiple bases obtained from the prime factorization of the discretization level b. Through a set of benchmark pooling problems, he showed the computational advantage of the mixed-radix discretization over a single-radix discretization as well as the logarithmic partitioning scheme in [25].
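The single-radix representation used by these methods can be illustrated with a short sketch (the helper names are ours, not from any of the cited papers):

```python
def radix_digits(X, R, k):
    """Digits d[0..k-1] with X = sum_i R**i * d[i], each d[i] in {0,...,R-1}.

    This mirrors the radix-R representation of the integer part X used by
    discretization formulations (i runs from the least significant digit).
    """
    assert 0 <= X < R ** k
    digits = []
    for _ in range(k):
        digits.append(X % R)
        X //= R
    return digits

def from_digits(digits, R):
    """Inverse: reassemble X from its radix-R digits."""
    return sum(d * R ** i for i, d in enumerate(digits))
```

For example, with \(R=2\) and \(k=4\), the integer 13 has digits (1, 0, 1, 1), i.e. \(13=1+4+8\).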

The discretization points divide the variable range into subintervals of equal length. Let b be the number of subintervals; then b indicates the level of discretization. The aforementioned discretization strategies assume that b is a power of the selected base R, but some other studies do not make this assumption. For example, Gupte et al. [10] considered the \(R=2\) case and allowed b to be any natural number no larger than a power of R, and they proved the condition under which the continuous relaxation of the discretization formulation is a hull relaxation. Gupte et al. [11] further performed an extensive computational study on discretization based MILP approximation for the pooling problem, and the results suggested that discretization is a promising approach especially for large-scale standard or generalized pooling problems. Figure 1 can help explain why in some cases a smaller b is preferred and why different choices of R may affect the computational efficiency. The figure shows the same optimal variable value (indicated by the star sign) in the contexts of three different discretization levels. In the first discretization level, \(R=2\) and \(b=2^3=8\), so the variable range is divided into 8 subintervals. In the second discretization level, \(R=10\) and \(b=10\), so the variable range is divided into 10 subintervals. When \(R=10\), the discretization formulation requires more binary variables, but the optimal point is closer to the discretization point that it belongs to (i.e. the discretization point located to the left), implying that the MILP relaxation at the optimal solution is tighter (see Sect. 4 for more details). Therefore, \(R=10\) might be better than \(R=2\). Following this idea, if the optimal point is right on a discretization point, as with the third discretization level in the figure (\(b=6\)), the MILP relaxation is equivalent to the original problem at the optimal point.
Note that for realizing the third discretization level, one may choose any \(R>0\), but \(R=6\) is the most natural choice.

Fig. 1

The same optimal point under different discretization levels

In this paper, we start from the general radix-based discretization formulation, called the R-formulation, where R can be any natural number. We allow the discretization level b to be any natural number no larger than a power of R. We prove the conditions under which the R-formulation is locally sharp (i.e., its continuous relaxation leads to a hull relaxation of a part of the problem). We then propose an \(R^+\)-formulation and prove that it is always locally sharp. We further propose an H-formulation that uses a hybrid base and prove that it is also always locally sharp. We also develop a global optimization method where the MILP relaxations are generated from adaptive discretization. The remainder of the paper is organized as follows: Sect. 2 presents a general process for constructing discretization based MILP relaxations. Section 3 presents the three discretization formulations and provides theoretical results regarding the strength of the formulations. Section 4 proposes the global optimization algorithm with adaptive discretization. Section 5 presents a computational study on a set of multi-period pooling problems, and the paper ends with conclusions in Sect. 6.

2 Discretization based MILP relaxation

Suppose we discretize variable x in each bilinear term; then Problem (MIBLP) can be rewritten as:

$$\begin{aligned} \min&\;\; c({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta }), \\ s.t.&\;\; {\varvec{g}}({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta })\le 0,\\&\;\;\varvec{\delta } \in \{0,1\}^n,\\&\;\;(w_{l,m}, x_m, y_l)\in {\mathcal {S}}_I(b_m),\quad \forall (l,m)\in \varTheta , \end{aligned}$$
(D-MIBLP)

where

$$\begin{aligned} {\mathcal {S}}_I(b) = \left\{ \begin{aligned}&(w, x, y)\in {\mathbb {R}}\times [0,1]^2: \;\; x=(X+{\tilde{z}})/b,\;\; 0\le {\tilde{z}}\le 1, \;\; w=({\bar{w}}+{\tilde{y}})/b,\\&({\bar{w}},X,y)\in {\mathcal {P}}(b), \;\; {\tilde{y}}={\tilde{z}}y \end{aligned}\right\} , \end{aligned}$$

and

$$\begin{aligned} {\mathcal {P}}(b)=\left\{ ({\bar{w}},X,y): \;\; {\bar{w}}=Xy,\quad X\in {\mathbb {Z}},\;\;0\le X\le b,\;\; 0\le y\le 1\right\} . \end{aligned}$$

Here continuous variable x is represented by integer variable \(X \in [0,b]\) and a residual term \({\tilde{z}}\in [0,1]\). Parameter b is the discretization level and it can be different for different bilinear terms. We can relax set \({\mathcal {S}}_I(b)\) by replacing \({\tilde{z}}y\) with its convex envelope, represented by \(\text {conv}({\tilde{z}}y)\), then the relaxed set is:

$$\begin{aligned} {\mathcal {S}}_{I-Relax}(b)= \left\{ \begin{aligned}&(w, x, y)\in {\mathbb {R}}\times [0,1]^2:\;\; x=(X+{\tilde{z}})/b,\;\;0\le {\tilde{z}}\le 1,\;\; w=({\bar{w}}+{\tilde{y}})/b,\\&({\bar{w}},X,y)\in {\mathcal {P}}(b),\;\; {\tilde{y}}=\text {conv}({\tilde{z}}y) \end{aligned}\right\} . \end{aligned}$$

Since \({\tilde{y}}=\text {conv}({\tilde{z}}y)\) only involves a set of linear constraints, set \({\mathcal {S}}_{I-Relax}(b)\) does not contain any bilinear terms. It can be further reformulated such that it only contains 0–1 integer variables:

$$\begin{aligned} {\mathcal {S}}_{B-Relax}(b)= \left\{ \begin{aligned}&(w, x, y)\in {\mathbb {R}}\times [0,1]^2:\;\; x=(X+{\tilde{z}})/b,\;\;0\le {\tilde{z}}\le 1,\;\; w=({\bar{w}}+{\tilde{y}})/b,\\&({\bar{w}},X,y)\in {\mathcal {F}}(b),\;\; {\tilde{y}}=\text {conv}({\tilde{z}}y) \end{aligned}\right\} , \end{aligned}$$

where \({\mathcal {F}}(b)\) denotes any representation of set \({\mathcal {P}}(b)\) that uses binary variables instead of integer variables, such as a radix-based representation. Discretization based methods in the literature have different ways to represent \({\mathcal {F}}(b)\). Naturally, we hope the change from \({\mathcal {S}}_{I-Relax}(b)\) to \({\mathcal {S}}_{B-Relax}(b)\) does not lose tightness of the relaxation, so we assess the quality of any binary representation \({\mathcal {F}}(b)\) according to the following two conditions:

$$\begin{aligned} Proj_{{\bar{w}},X,y}{\mathcal {F}}(b)={\mathcal {P}}(b), \end{aligned}$$
(C1)
$$\begin{aligned} Proj_{{\bar{w}},X,y}(\text {relax}({\mathcal {F}}(b)))=\text {conv}({\mathcal {P}}(b)), \end{aligned}$$
(C2)

where \(Proj_{{\bar{w}},X,y}\) denotes the projection to the \(({\bar{w}},X,y)\) space, and \(\text {relax}(\cdot )\) denotes continuous relaxation. The first condition means that \({\mathcal {F}}(b)\) is a valid binary representation of \({\mathcal {P}}(b)\). \({\mathcal {F}}(b)\) usually includes additional binary and continuous variables, so it is defined on a higher-dimensional space. The second condition means that formulation \({\mathcal {F}}(b)\) is sharp for representing set \({\mathcal {P}}(b)\) [33]. Furthermore, we say that \({\mathcal {F}}(b)\) is locally sharp for the discretized MIBLP problem, because its continuous relaxation leads to the convex hull of part of the problem [33].
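As an illustration of the decomposition \(x=(X+{\tilde{z}})/b\) behind \({\mathcal {S}}_I(b)\), the following sketch (the helper name is ours) splits a value of x into its integer part and residual:

```python
def discretize(x, b):
    """Split x in [0, 1] into an integer part X and residual z in [0, 1]
    such that x = (X + z) / b, following the decomposition in S_I(b).

    X is the index of the discretization point immediately below b*x.
    """
    assert 0.0 <= x <= 1.0
    X = min(int(b * x), b)   # integer part, clipped so that X <= b
    z = b * x - X            # residual term
    return X, z
```

For instance, with \(b=10\) and \(x=0.37\), the decomposition gives \(X=3\) and \({\tilde{z}}=0.7\).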

By replacing set \({\mathcal {S}}_I(b_m)\) in Problem (D-MIBLP) with the relaxed set \({\mathcal {S}}_{B-Relax}(b_m)\), we generate the following MILP relaxation for the original problem:

$$\begin{aligned} \min&\;\; c({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta }), \\ s.t.&\;\; {\varvec{g}}({\varvec{w}}, {\varvec{x}}, {\varvec{y}},\varvec{\delta })\le 0,\\&\;\;\varvec{\delta } \in \{0,1\}^n,\\&\;\; (w_{l,m}, x_m, y_l)\in {\mathcal {S}}_{B-Relax}(b_m),\quad \forall (l,m)\in \varTheta .\\ \end{aligned}$$
(R-MILP)

Table 1 summarizes the list of symbols for parameters, variables and sets used in the formulations and algorithm description.

Table 1 List of symbols

3 Binary representation formulations and their strength

In this section, we present three formulations for the binary representation \({\mathcal {F}}(b)\), which all lead to valid relaxations to Problem (MIBLP). We also prove the tightness of continuous relaxations of the three formulations.

3.1 R-formulation

We first consider the case that the binary representation \({\mathcal {F}}(b)\) is the R-formulation, which is the general radix-based discretization formulation widely used in the literature [19]. We write set \({\mathcal {F}}(b)\) as \({\mathcal {R}}(b)\) for this case, and this set is:

$$\begin{aligned} {\mathcal {R}}(b)=\left\{ \begin{aligned}&({\bar{w}},X,y,z,{\hat{y}})\in {\mathbb {R}}\times {\mathbb {Z}}\times [0,1]\times \{0,1\}\times {\mathbb {R}}:\\&X=\sum _{i=1}^{k} R^{i-1}(\sum _{j=0}^{R-1} j\cdot z_{i,j}),\quad 0\le X\le b,\\&{\bar{w}}=\sum _{i=1}^{k}R^{i-1}(\sum _{j=0}^{R-1} j\cdot {\hat{y}}_{i,j}),\\&\sum _{j=0}^{R-1}z_{i,j}=1,\quad \sum _{j=0}^{R-1}{\hat{y}}_{i,j}=y,\quad \forall i\in \{1,\ldots ,k\},\\&0\le {\hat{y}}_{i,j}\le z_{i,j},\quad \forall i\in \{1,\ldots ,k\},\ j\in \{0,\ldots ,R-1\} \end{aligned}\right\} , \end{aligned}$$

where \(k= \lceil \log _Rb\rceil \) and R is the predefined base. The following proposition implies that the first condition (C1) is satisfied by the R-formulation:

Proposition 1

\({\mathcal {P}}(b)=\)Proj\(_{{\bar{w}},X,y}{\mathcal {R}}(b)\).

Proof

By construction, \({\mathcal {P}}(b)\subseteq \)Proj\(_{{\bar{w}},X,y}{\mathcal {R}}(b)\). Now we prove \(\text {Proj}_{{\bar{w}},X,y}{\mathcal {R}}(b)\subseteq {\mathcal {P}}(b)\). Pick any \(({\bar{w}},X,y,z,{\hat{y}})\in {\mathcal {R}}(b)\), and we are to show \(({\bar{w}},X,y) \in {\mathcal {P}}(b)\), or more specifically, \({\bar{w}}=Xy\).

First we show that \({\hat{y}}_{i,j}=yz_{i,j}\) (\(\forall i, j\)). If \(z_{i,j}=0\), then \({\hat{y}}_{i,j}\le z_{i,j}=0\), so \({\hat{y}}_{i,j}=yz_{i,j}\). If \(z_{i,j}=1\), then \(\forall j'\ne j\), \(z_{i,j'}=0\) and therefore \({\hat{y}}_{i,j'}=0\). Hence, \({\hat{y}}_{i,j}=y\) holds. So in this case we also have \({\hat{y}}_{i,j}=yz_{i,j}\).

Based on the above result,

$$\begin{aligned} {\bar{w}}&=\sum _{j=0}^{R-1}\sum _{i=1}^{k}j\cdot R^{i-1}\cdot {\hat{y}}_{i,j}\\&=\sum _{i=1}^kR^{i-1}\sum _{j=0}^{R-1}j\cdot y\cdot z_{i,j} \\&= \left( \sum _{i=1}^kR^{i-1}\sum _{j=0}^{R-1}j\cdot z_{i,j}\right) \cdot y\\&=Xy. \end{aligned}$$

This completes the proof. \(\square \)

To see whether \({\mathcal {R}}(b)\) satisfies condition (C2), we introduce an intermediate set \({\mathcal {M}}(b)\), which was initially introduced by Gupte et al. [10] for \(R=2\):

$$\begin{aligned} {\mathcal {M}}(b) =\left\{ \begin{aligned}&({\bar{w}},X,y)\in {\mathbb {R}}\times {\mathbb {Z}}\times [0,1]: \;\; by+X-b\le {\bar{w}} \le by,\\&0\le {\bar{w}}\le X,\;\; 0\le X\le b \end{aligned}\right\} . \end{aligned}$$
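As a numerical sanity check (illustrative code, not part of the development), one can verify that every point of \({\mathcal {P}}(b)\) satisfies the inequalities defining relax(\({\mathcal {M}}(b)\)):

```python
def in_relaxed_M(w_bar, X, y, b, tol=1e-9):
    """Membership test for relax(M(b)): the linear conditions
    b*y + X - b <= w_bar <= b*y and 0 <= w_bar <= X, with
    0 <= X <= b and 0 <= y <= 1 (X no longer required integral)."""
    return (b * y + X - b <= w_bar + tol and w_bar <= b * y + tol
            and -tol <= w_bar and w_bar <= X + tol
            and -tol <= X <= b + tol and -tol <= y <= 1 + tol)

# Every point (X*y, X, y) of P(b) lies in relax(M(b)); a point with
# w_bar exceeding b*y does not.
```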

Then we have the following proposition that indicates the conditions under which (C2) is satisfied.

Proposition 2

conv(\({\mathcal {P}}(b)\))= relax(\({\mathcal {M}}(b)\)) = Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {R}}(b)\))) if and only if \({b \ge R^k-1}\).

Proof

First, the convex hull of \({\mathcal {M}}(b)\) and the convex hull of \({\mathcal {P}}(b)\) are the same and they are equal to \(\text {relax}({\mathcal {M}}(b))\), as explained in Gupte et al. [10]. Therefore, \(\text {relax}({\mathcal {M}}(b))=\text {conv}({\mathcal {P}}(b))\).

Second, we prove \(\text {relax}({\mathcal {M}}(b)) \subseteq \text {Proj}_{{\bar{w}},X,y}(\text {relax}({\mathcal {R}}(b)))\). From \({\mathcal {P}}(b)=\) Proj\(_{{\bar{w}},X,y}{\mathcal {R}}(b)\) (Proposition 1), we have \({\mathcal {P}}(b)\subseteq \) Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {R}}(b)\))). Since Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {R}}(b)\))) is convex, \(\text {conv}({\mathcal {P}}(b))\subseteq \text {Proj}_{{\bar{w}},X,y}(\text {relax}({\mathcal {R}}(b)))\). Combining this with \(\text {relax}({\mathcal {M}}(b))=\text {conv}({\mathcal {P}}(b))\) gives the result.

Next, we prove Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {R}}(b)\))) \(\subseteq \) relax(\({\mathcal {M}}(b)\)) if \({b \ge R^k-1}\). Pick any point \(({\bar{w}},X,y,z,{\hat{y}})\in \) relax(\({\mathcal {R}}(b)\)), then:

$$\begin{aligned} {\bar{w}}&=\sum _{i=1}^{k}R^{i-1}\left( \sum _{j=0}^{R-1}j\cdot {\hat{y}}_{i,j}\right) \ge 0,\\ {\bar{w}}&\le \sum _{i=1}^{k}R^{i-1}\left( \sum _{j=0}^{R-1}j\cdot z_{i,j}\right) =X,\\ {\bar{w}}&\le \sum _{i=1}^{k}R^{i-1}(R-1)y=(R^k-1)y\le by,\\ {\bar{w}}&=\sum _{i=1}^{k}R^{i-1}\left( \sum _{j=0}^{R-1}({\hat{y}}_{i,j}-z_{i,j}+z_{i,j})\cdot j\right) \ge \sum _{i=1}^{k}R^{i-1}(R-1)(y-1)+X=(R^k-1)(y-1)+X\ge b(y-1)+X, \end{aligned}$$

where the last inequalities in the third and fourth lines use \(b\ge R^k-1\) together with \(y\ge 0\) and \(y\le 1\), respectively. Therefore, \(({\bar{w}},X,y)\in \) relax(\({\mathcal {M}}(b)\)). Note that the condition \({b \ge R^k-1}\) implies \(b = R^k-1\) or \(b = R^k\) because \(k= \lceil \log _Rb\rceil \).

Finally, we show that \({b \ge R^k-1}\) is also necessary for Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {R}}(b)\))) \(\subseteq \) relax(\({\mathcal {M}}(b)\)) by a counterexample. Construct a point \(({\bar{w}}, X, y, z, {\hat{y}})\in relax({\mathcal {R}}(b))\) by setting \(\forall i\)

$$\begin{aligned} \begin{aligned}&z_{i,0}=\dfrac{R-1}{R},\quad z_{i,R-1}=\dfrac{1}{R},\quad z_{i,j'}=0 \;\; (\forall j'\ne 0,R-1), \\&{\hat{y}}_{i,R-1}=\dfrac{1}{R}, \quad {\hat{y}}_{i,j'}=0 \ (\forall j'\ne R-1), \end{aligned} \end{aligned}$$

then

$$\begin{aligned} \begin{aligned}&X=\sum _{j=0}^{R-1}\sum _{i=1}^k j\cdot R^{i-1}z_{i,j}=\dfrac{R^k-1}{R-1}\dfrac{R-1}{R}=\dfrac{R^k-1}{R},\\&y=\sum _{j=0}^{R-1}{\hat{y}}_{i,j}= \dfrac{1}{R}, \\&{\bar{w}}=\sum _{j=0}^{R-1}\sum _{i=1}^k j\cdot R^{i-1}{\hat{y}}_{i,j}=\dfrac{R^k-1}{R-1}\dfrac{R-1}{R}=\dfrac{R^k-1}{R}. \end{aligned} \end{aligned}$$

Note that \({\bar{w}}=\dfrac{R^k-1}{R}=(R^k-1)y\). Therefore, if \(b<R^k-1\), \({\bar{w}}>by\) and \(({\bar{w}}, X, y)\notin \) relax(\({\mathcal {M}}(b)\)). \(\square \)
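The counterexample can be replicated numerically; the sketch below (function name ours) builds the fractional point for given R and k and projects out the digit variables:

```python
def counterexample(R, k):
    """Build the fractional point used in the necessity part of the proof
    and return (X, y, w_bar) after projecting out the digit variables.

    For each digit position i: z[i][0] = (R-1)/R, z[i][R-1] = 1/R,
    yhat[i][R-1] = 1/R, and all other entries are zero."""
    z = [[0.0] * R for _ in range(k)]
    yhat = [[0.0] * R for _ in range(k)]
    for i in range(k):
        z[i][0] = (R - 1) / R
        z[i][R - 1] = 1 / R
        yhat[i][R - 1] = 1 / R
    X = sum(R ** i * sum(j * z[i][j] for j in range(R)) for i in range(k))
    y = sum(yhat[0][j] for j in range(R))
    w_bar = sum(R ** i * sum(j * yhat[i][j] for j in range(R)) for i in range(k))
    return X, y, w_bar
```

For \(R=10\), \(k=1\), the point has \(X={\bar{w}}=0.9\) and \(y=0.1\), so any \(b<R^k-1=9\) gives \({\bar{w}}>by\), violating relax(\({\mathcal {M}}(b)\)).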

3.2 \(R^+\)-formulation

Proposition 2 indicates that in order for the R-formulation to be locally sharp, the discretization level b needs to be a power of the base R or that power minus 1. To avoid this restriction, we can add strengthening constraints \(X+by-b\le {\bar{w}}\le by\) to \({\mathcal {R}}(b)\). The new formulation is called the \(R^+\)-formulation, as shown below:

$$\begin{aligned} \mathcal {R^+}(b):=\left\{ \begin{aligned}&({\bar{w}},X, y, z,{\hat{y}})\in {\mathbb {R}}\times {\mathbb {Z}}\times [0,1]\times \{0,1\}\times {\mathbb {R}}: \\&X=\sum _{i=1}^{k} R^{i-1}(\sum _{j=0}^{R-1} j\cdot z_{i,j}),\\&{\bar{w}}=\sum _{i=1}^{k}R^{i-1}(\sum _{j=0}^{R-1} j\cdot {\hat{y}}_{i,j}),\\&\sum _{j=0}^{R-1}z_{i,j}=1,\quad \sum _{j=0}^{R-1}{\hat{y}}_{i,j}=y,\quad \forall i\in \{1,\ldots ,k\},\\&0\le {\hat{y}}_{i,j}\le z_{i,j},\quad \forall i\in \{1,\ldots ,k\},\ j\in \{0,\ldots ,R-1\},\\&X+by-b\le {\bar{w}}\le by \end{aligned}\right\} , \end{aligned}$$

where \(k= \lceil \log _Rb\rceil \). By construction, binary representation \(\mathcal {R^+}(b)\) satisfies condition (C1), so the following proposition holds.

Proposition 3

\({\mathcal {P}}(b)=\)Proj\(_{{\bar{w}},X,y}\mathcal {R^+}(b)\).

In addition, we can prove that \(\mathcal {R^+}(b)\) always satisfies condition (C2), as stated in the following proposition.

Proposition 4

conv(\({\mathcal {P}}(b)\))= relax(\({\mathcal {M}}(b)\)) = Proj\(_{{\bar{w}},X,y}\)(relax(\(\mathcal {R^+}(b)\))).

Proof

The inequalities \({\bar{w}}\le by\) and \({{\bar{w}}\ge b(y-1)+X}\) in relax(\({\mathcal {M}}(b)\)) follow directly from the strengthening constraints. The remaining inequalities in relax(\({\mathcal {M}}(b)\)) hold for any point in Proj\(_{{\bar{w}},X,y}\)(relax(\(\mathcal {R^+}(b)\))) regardless of the relationship between b and \(R^k\), as shown in the proof of Proposition 2. The reverse inclusion conv(\({\mathcal {P}}(b)\))\(\subseteq \)Proj\(_{{\bar{w}},X,y}\)(relax(\(\mathcal {R^+}(b)\))) follows by the same argument as in the second part of the proof of Proposition 2. \(\square \)

3.3 H-formulation

When using the discretization based MILP relaxation for global optimization, we may want to increase the discretization level adaptively in order to avoid unnecessarily large MILP relaxations. In this case, the relationship between b and R varies during the optimization procedure. According to the previous discussions, we must either use the R-formulation, which may lose local sharpness, or the \(R^+\)-formulation, which includes additional constraints. Here we propose a novel H-formulation that uses multiple bases to represent b:

$$\begin{aligned} {\mathcal {H}}(b)=\ \left\{ \begin{aligned}&({\bar{w}},X,y,z,{\hat{y}})\in {\mathbb {R}}\times {\mathbb {Z}}\times [0,1]\times \{0,1\}\times {\mathbb {R}}:\\&X=\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}z_{i,j}\cdot j \right) ,\ 0 \le X\le b,\\&{\bar{w}}=\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}\cdot j \right) ,\\&\sum _{j=0}^{R_i-1}z_{i,j}= 1,\quad \quad \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}=y,\quad \quad \forall i=1,\ldots , k,\\&0\le {\hat{y}}_{i,j}\le z_{i,j}, \quad \forall i=1,\ldots ,k, \quad \forall j=0,\ldots , R_i-1\\ \end{aligned}\right\} , \end{aligned}$$

where the bases \(R_i\) are selected such that \(b=\prod _{i=1}^{k}R_i\). For example, if \(b=12\), then we may select \(R_1=3\) and \(R_2=4\), or \(R_1=6\) and \(R_2=2\). Essentially \({\mathcal {H}}(b)\) uses an extension of the classical radix-based number system that may use different bases for different digits. Next, we prove that the H-formulation satisfies both conditions (C1) and (C2) (and therefore it is locally sharp).
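Continuing the \(b=12\) example, the mixed-radix digit computation can be sketched as follows (helper names ours; digit 0 is taken as the most significant, matching the place values \(\prod _{n=i+1}^{k}R_n\) above):

```python
def mixed_radix_digits(X, bases):
    """Digits d[0..k-1] for the mixed-radix system with bases (R_1,...,R_k):
    X = sum_i (product of bases after position i) * d[i],
    with d[i] in {0,...,R_i - 1}."""
    b = 1
    for R in bases:
        b *= R
    assert 0 <= X < b
    digits = []
    for R in reversed(bases):   # peel off the least significant digit first
        digits.append(X % R)
        X //= R
    digits.reverse()
    return digits

def mixed_radix_value(digits, bases):
    """Inverse: reassemble X from its mixed-radix digits."""
    X = 0
    for d, R in zip(digits, bases):
        X = X * R + d
    return X
```

With bases (3, 4), the integer 7 has digits (1, 3), i.e. \(7=1\cdot 4+3\).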

Proposition 5

\({\mathcal {P}}(b)=\)Proj\(_{{\bar{w}},X,y}{\mathcal {H}}(b)\).

Proof

By construction, \({\mathcal {P}}(b)\subseteq \)Proj\(_{{\bar{w}},X,y}{\mathcal {H}}(b)\). Now we prove \(\text {Proj}_{{\bar{w}},X,y}{\mathcal {H}}(b)\subseteq {\mathcal {P}}(b)\). Pick any \(({\bar{w}},X,y,z,{\hat{y}})\in {\mathcal {H}}(b)\). As in the proof of Proposition 1, we can show that \({\hat{y}}_{i,j}=yz_{i,j}\) (\(\forall i, j\)). Consequently,

$$\begin{aligned} {\bar{w}}&=\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}\cdot j \right) ,\\&=\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}yz_{i,j}\cdot j \right) ,\\&=\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}z_{i,j}\cdot j \right) y,\\&=Xy. \end{aligned}$$

This completes the proof. \(\square \)

Proposition 6

\(\text {Proj}_{{\bar{w}},X,y}(\text {relax}({\mathcal {H}}(b)))= \text {relax}({\mathcal {M}}(b))=\text {conv}({\mathcal {P}}(b))\).

Proof

Pick any point \(({\bar{w}},X,y,z,{\hat{y}})\in \) relax(\({\mathcal {H}}(b)\)), then

$$\begin{aligned} {\bar{w}}&= \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}\cdot j\right) \ge 0, \\ {\bar{w}}&=\ \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}\cdot j\right) \le \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}z_{i,j}\cdot j\right) =X,\\ {\bar{w}}&=\ \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}{\hat{y}}_{i,j}\cdot j\right) \\&\le \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) (R_i-1)y \\&=\ y\left( (R_1-1)R_2\ldots R_k+(R_2-1)R_3\ldots R_k+\ldots +(R_{k-1}-1)R_k+R_k-1 \right) \\&=\ y(b-1)\le by,\\ {\bar{w}}&=\ \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}({\hat{y}}_{i,j}-z_{i,j}+z_{i,j})\cdot j\right) \\&=\ \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}({\hat{y}}_{i,j}-z_{i,j})\cdot j\right) +\sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}z_{i,j}\cdot j\right) \\&\ge \sum _{i=1}^{k}\left( \prod _{n=i+1}^{k}R_n\right) \left( \sum _{j=0}^{R_i-1}({\hat{y}}_{i,j}-z_{i,j})(R_i-1)\right) +X\\&=\ \sum _{i=1}^{k}\left( (R_i-1)\prod _{n=i+1}^{k}R_n\right) (y-1)+X \\&=(b-1)(y-1)+X \\&\ge b(y-1)+X, \end{aligned}$$

so \(({\bar{w}},X,y) \in \text {relax}({\mathcal {M}}(b))\). This proves Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {H}}(b)\)))\(\subseteq \) relax(\({\mathcal {M}}(b)\)).

We can also prove conv(\({\mathcal {P}}(b)\))\(\subseteq \)Proj\(_{{\bar{w}},X,y}\)(relax(\({\mathcal {H}}(b)\))) using the same procedure as in the second part of the proof of Proposition 2. In addition, according to the first part of the proof of Proposition 2, \(\text {relax}({\mathcal {M}}(b))=\text {conv}({\mathcal {P}}(b))\). This completes the proof. \(\square \)

4 Global optimization algorithm with adaptive discretization

In this section, we develop a global optimization algorithm with adaptive discretization (GOAD) for MIBLP. The algorithm finds a global solution by iteratively solving a lower bounding problem and an upper bounding problem.

4.1 Lower and upper bounding problems

At iteration \(k'\), we solve the following lower bounding problem:

(L-MILP\(^{k'}\))

This formulation is slightly different from the general form (R-MILP) introduced in Sect. 2. First, integer cuts are added to exclude previously visited \(\varvec{\delta }\) solutions through set \(\Delta ^{k'}\), which is

$$\begin{aligned} \Delta ^{k'}=\left\{ \varvec{\delta }\in \{0,1\}^n: \sum _{i \in {\mathcal {I}}_1^p}\delta _i-\sum _{j \in {\mathcal {I}}_0^p}\delta _j\le |{\mathcal {I}}_{1}^p|-1,\quad \forall p=1,2,\ldots ,k'-1 \right\} . \end{aligned}$$

Index set \({\mathcal {I}}_1^p=\{i\in \{1,\ldots ,n\}: \ \delta ^{p}_i=1 \}\), and \(\varvec{\delta }^{p}\) denotes the solution generated in iteration p. Index set \({\mathcal {I}}_0^p=\{i\in \{1,\ldots ,n\}: \ \delta ^{p}_i=0 \}\). Second, the last constraint in the formulation is added to exclude any solution that is no better than the incumbent solution. UB is the current best upper bound on the optimal objective value and it is updated throughout the solution procedure. Finally, the discretization level \(b^{k'}_m\) may vary over the iterations, which is explained later.
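The integer cut can be illustrated with a small helper (name ours) that tests whether a candidate binary vector is excluded by the cut for a visited solution:

```python
def integer_cut_satisfied(delta, delta_prev):
    """Integer cut for a visited 0-1 vector delta_prev:
        sum_{i in I1} delta_i - sum_{j in I0} delta_j <= |I1| - 1,
    where I1/I0 index the 1/0 entries of delta_prev.
    Returns True iff delta satisfies the cut (i.e. is not excluded)."""
    I1 = [i for i, d in enumerate(delta_prev) if d == 1]
    I0 = [i for i, d in enumerate(delta_prev) if d == 0]
    lhs = sum(delta[i] for i in I1) - sum(delta[j] for j in I0)
    return lhs <= len(I1) - 1
```

A single cut excludes exactly the visited vector and no other binary vector, which is the defining property used for the finite-termination argument.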

Let the value of \(\varvec{\delta }\) at the solution of (L-MILP\(^{k'}\)) be \(\varvec{\delta }^{k'}\), then fixing \(\varvec{\delta }\) to this value will yield the following upper bounding problem:

(U-NLP\(^{k'}\))

This is a nonlinear programming (NLP) problem and has no integer variables. This NLP can be solved to global optimality much more quickly than the original MIBLP. For all problem instances in the computational study, the solution time for Problem (U-NLP\(^{k'}\)) is negligible in comparison to the solution time for the original MIBLP.

4.2 Adaptive discretization scheme

We choose to adaptively determine the discretization levels of variables instead of using predefined discretization levels. The adaptive scheme has two benefits. First, at the optimal solution many variables take values at the boundary of their domains, so these variables do not need to be discretized. Some variables have little impact on the optimal objective value, so a low discretization level is sufficient for them. Our proposed adaptive discretization scheme is likely to result in reasonable discretization levels for different variables. Second, the discretization level of a variable x determines the residual term \({\tilde{z}}\) needed to represent its solution, and a smaller \({\tilde{z}}\) would imply a smaller relaxation gap because the gap comes from the difference between \({\tilde{y}}={\tilde{z}}y\) in \({\mathcal {S}}_I(b)\) and \({\tilde{y}}=\text {conv}({\tilde{z}}y)\) in \({\mathcal {S}}_{B-Relax}(b)\).

The adaptive discretization strategy perfectly matches the hybrid radix-based H-formulation, because it determines one base at each iteration. We use the example in Fig. 2 to explain the adaptive discretization scheme. For convenience, we assume there is only one normalized variable x to be discretized. The optimal solution of the problem is shown as \(x^*\) in the figure. This value is not known by the algorithm in the first several iterations, but is provided in the figure as a reference. In iteration 1, there is no discretization, so the optimal solution of the lower bounding problem (L-MILP\(^{1}\)) only has the x value, represented by \(x^*_1\). We want to divide the range of x into \(R_{1}\) pieces such that \(x^*_1\) is close to the discretization point it belongs to. From the discretization formulation, \(x=(X+{\tilde{z}})/b\), where b here equals the \(R_1\) that we search for, so \({\tilde{z}}=R_1x-X\). Note that X is the largest integer that is no larger than \(R_1x\), so \({\tilde{z}}=R_1x-\lfloor R_1x\rfloor \). Therefore, the residual term for \(x_1^*\) is \(R_1 x^*_1-\lfloor R_1 x^*_1\rfloor \). On the other hand, we do not want to increase the discretization level too much within one iteration, so we limit \(R_{1}\) to be between 2 and 10. Thus we can formalize the calculation of \(R_1\) as:

$$\begin{aligned} R_{1}=\underset{2\le r\le 10}{argmin} \ \left( r\cdot x^*_1-\lfloor r\cdot x^*_1\rfloor \right) . \end{aligned}$$

As shown in Fig. 2, following this formula we get \(R_1=3\) in iteration 1. Then the discretization level \(b=R_1\) and the range of x is divided into 3 equal pieces for the next iteration.

Fig. 2

Example of adaptive discretization

In iteration 2, we have the residual term value at the optimal solution of (L-MILP\(^{2}\)), represented as \({\tilde{z}}^*_{2}\). As the residual term is scaled to [0, 1], \({\tilde{z}}^*_{2}\) reflects the scaled distance between \(x^*_{2}\) and the largest discretization point that is no larger than \(x^*_{2}\). The actual difference is \({\tilde{z}}^*_{2}/R_2\). Optimal point \(x^*_{2}\) is located in the 3rd subinterval of the range. Now we want to further divide the 3rd subinterval into several pieces such that \(x^*_2\) is close to the discretization point it belongs to. Applying the following formula that uses \({\tilde{z}}^*_2\):

$$\begin{aligned} R_{2}=\underset{2\le r\le 10}{argmin}\ \left( r\cdot {\tilde{z}}^*_2-\lfloor r\cdot {\tilde{z}}^*_2\rfloor \right) \end{aligned}$$

we get \(R_2=2\), which indicates further dividing the 3rd subinterval into two pieces, and therefore dividing each other subinterval into two as well. As a result, the new discretization level is \(b=R_1R_2=6\). For all subsequent iterations, we follow the same formula that uses the residual value. For example, in iteration 3, using the residual value, \(R_3\) should be 5, and then the discretization level becomes \(b=R_1 R_2 R_3=30\). From Fig. 2, the optimal solution from the lower bounding problem \(x^*_3\) is already very close to the actual optimal solution \(x^*\), so the algorithm is likely to converge right after this iteration.

In some cases, the residual from the optimal solution of (L-MILP\(^{k'}\)) is close to an extreme point of the range and the above formula will not give a reasonable result. For these cases we simply further divide all subintervals into two. Specifically, for iteration \(k'\), if \(|{\tilde{z}}^*_{k'}-0.5|\ge 0.45\) (or \(|x^*_{k'}-0.5|\ge 0.45\) when \(k'=1\)), we set \(R_{k'}=2\). Therefore, the full formula for calculating \(R_{k'}\) is:

$$\begin{aligned} R_{k'}= {\left\{ \begin{array}{ll} \underset{2\le r\le 10}{argmin} \ \left( r\cdot x^*_{k'}-\lfloor r\cdot x^*_{k'}\rfloor \right) ,\qquad &{}k'=1 \ \text {and}\ |x^*_{k'}-0.5|< 0.45,\\ \underset{2\le r\le 10}{argmin} \ \left( r\cdot {\tilde{z}}^*_{k'}-\lfloor r\cdot {\tilde{z}}^*_{k'}\rfloor \right) ,\qquad &{}k'\ge 2\ \text {and}\ |{\tilde{z}}^*_{k'}-0.5|< 0.45,\\ 2,\qquad &{}k'=1\ \text {and}\ |x^*_{k'}-0.5|\ge 0.45,\\ 2,\qquad &{}k'\ge 2\ \text {and}\ |{\tilde{z}}^*_{k'}-0.5|\ge 0.45.\\ \end{array}\right. } \end{aligned}$$
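The base-selection rule can be sketched as follows (in Python; we assume, as one natural reading of the argmin, that ties are broken toward the smaller base):

```python
def frac(v):
    """Fractional part v - floor(v) for nonnegative v."""
    return v - int(v)

def choose_base(v):
    """Adaptive base selection: given the residual (or, in iteration 1,
    the variable value) v in [0, 1], pick R in {2,...,10} minimizing the
    distance from R*v down to the discretization point below it.
    Values near the interval ends (|v - 0.5| >= 0.45) default to R = 2."""
    if abs(v - 0.5) >= 0.45:
        return 2
    return min(range(2, 11), key=lambda r: frac(r * v))
```

For example, \(v=0.25\) gives base 4, since \(4\cdot 0.25\) lands exactly on a discretization point, while a value near the boundary such as \(v=0.02\) falls back to base 2.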

Note that the R-formulation or the \(R^+\)-formulation can also be used for the adaptive discretization scheme. In this case, the discretization level b is still calculated from \(R_{k'}\), but it is represented in the formulation using the predefined base R.

4.3 The GOAD algorithm

Table 2 presents the GOAD algorithm. Finite termination of the algorithm is guaranteed, because the total number of values \(\varvec{\delta }\) can take is finite and the integer cuts in (L-MILP\(^{k'}\)) exclude all previously generated values of \(\varvec{\delta }\).

Table 2 The GOAD algorithm for problem (MIBLP)
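The overall loop can be sketched schematically as below; this is an illustration with user-supplied solver callbacks (`solve_lower` and `solve_upper` are our placeholder names), not the exact pseudocode of Table 2:

```python
def goad(solve_lower, solve_upper, max_iter=100, tol=1e-6):
    """Schematic GOAD loop (a sketch under stated assumptions).

    solve_lower(cuts, UB) -> (LB, delta, point) solves the MILP relaxation
    under the accumulated integer cuts, or returns None when every delta
    value has been excluded.  solve_upper(delta) -> (obj, point) solves
    the NLP with delta fixed.  Returns the best (UB, point) found."""
    UB, best, cuts = float("inf"), None, []
    for _ in range(max_iter):
        res = solve_lower(cuts, UB)
        if res is None:            # all delta values excluded: terminate
            break
        LB, delta, _point = res
        if LB >= UB - tol:         # lower and upper bounds have converged
            break
        obj, point = solve_upper(delta)
        if obj < UB:               # update the incumbent
            UB, best = obj, point
        cuts.append(delta)         # integer cut on the visited delta
    return UB, best
```

With toy callbacks that enumerate two \(\varvec{\delta }\) values, the loop visits the value with the smaller relaxation bound first and terminates once the remaining bound exceeds the incumbent.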

5 Computational study

The computational study contains two parts. The first part aims to compare the R- and \(R^+\)-formulations and to verify Propositions 2 and 4. The second part compares the performance of the GOAD algorithm using different discretization formulations with that of commercial global optimization solvers.

We test on 20 multi-period pooling problems, including 5 benchmark problems from the literature (P146t3, P480t4, P531t4, P721t3, P852t4), which are available in the MINLP problem library at www.minlp.org. For each benchmark problem, we also construct several variants with different numbers of time periods, so each benchmark problem is extended to 4 problem instances that have 3, 4, 5, 6 time periods (labelled t3–t6, respectively). The parameters for all problem instances are provided in “Appendix A”. We formulate all problem instances using a formulation that is essentially the source-based (SB) formulation in [23], but the blender operating modes are described using the expression in [19]. The details of the formulation are provided in “Appendix B”. In this formulation, each bilinear term includes a variable that represents the fraction of inventory leaving a tank, and we always discretize this variable for the MILP relaxation. The computations are performed on a computer with a 4-core 3.40 GHz CPU, 4 GB memory, and the Windows 10 operating system. The binary variables \(z_{i,j}\) in all formulations are set to be SOS1 variables in GAMS.

5.1 Comparison of R- and \(R^+\)-formulations

From Propositions 2 and 4, the key difference between the R- and \(R^+\)-formulations is that the former is locally sharp only when \(b\ge R^k-1\) while the latter is always locally sharp. In order to show this, we compare the MILP relaxation problem (L-MILP) in three scenarios: \(b=100\), \(b=512\) and \(b=750\). In each scenario, (L-MILP) adopts 4 different formulations: the R-formulation and the \(R^+\)-formulation, each with \(R=2\) and \(R=10\). The problems are modeled in GAMS 24.8.5 [3] and solved by CPLEX 12.7.1 using 6 threads, with absolute/relative tolerance \(10^{-5}\). We set the CPLEX option “cut \(=-1\)” so that the solver does not generate cuts that may influence the comparison. We set a time limit of 3600 s for each problem.

We also compare the root gaps of the formulations by solving their continuous relaxations at the root node. Specifically, the root gap is calculated as

$$\begin{aligned} \text {Gap}=|\frac{\text {Obj}_{MILP}-\text {Obj}_{LP}}{\text {Obj}_{MILP}}|\times 100\%, \end{aligned}$$

where \(\text {Obj}_{MILP}\) and \(\text {Obj}_{LP}\) represent the optimal objective values of the MILP problem and its continuous relaxation, respectively. The continuous relaxations are also solved by CPLEX 12.7.1 with absolute/relative tolerance \(10^{-4}\).
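For reference, the root gap is a direct computation from the two objective values; the following one-liner is a sketch of the formula above:

```python
def root_gap(obj_milp, obj_lp):
    """Root gap (in percent) between an MILP optimum and the optimum of
    its continuous (LP) relaxation: |(Obj_MILP - Obj_LP) / Obj_MILP| * 100."""
    return abs((obj_milp - obj_lp) / obj_milp) * 100.0
```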

Table 3 Comparison of the fixed-base discretization formulations for b \(=\) 100
Table 4 Comparison of the fixed-base discretization formulations for b \(=\) 512
Table 5 Comparison of the fixed-base discretization formulations for b \(=\) 750

Tables 3, 4 and 5 show the results for \(b=100\), \(b=512\) and \(b=750\), respectively. For each problem instance, we highlight the lowest root gap(s) in bold unless all formulations have the same root gap. First, we discuss the second set of problem instances, P480t3–P480t6, which best demonstrate the strength of the formulations. When \(b=100\) (Table 3), \(2^+\)-formulation has a tighter root gap than 2-formulation. This verifies Propositions 2 and 4, because for \(b=100\) and \(R=2\), \(k=\lceil \log _2 100\rceil =7\), so \(b<2^k-1\) and 2-formulation is not locally sharp but \(2^+\)-formulation is. On the other hand, 10-formulation and \(10^+\)-formulation have the same root gap (which is also the same as that of \(2^+\)-formulation). This is because b is a power of 10 and therefore both formulations are locally sharp. When \(b=512\) (Table 4), where b is a power of 2, 2-formulation, \(2^+\)-formulation, and \(10^+\)-formulation have the same root gap, because they are all locally sharp. 10-formulation has a larger root gap because it is not locally sharp. When \(b=750\) (Table 5), where b is neither a power of 2 nor a power of 10, only \(2^+\)- and \(10^+\)-formulations are locally sharp, so the root gaps of 2- and 10-formulations are larger. For the other sets of problem instances in Tables 3, 4 and 5, the different formulations have the same root gap, so they cannot be used to verify Propositions 2 and 4; but they do not contradict the propositions either.

Second, we can see that for each problem the locally sharp formulation with the smallest problem size usually has the best solution efficiency, such as 2-formulation when \(b=512\). However, the solution efficiency is also influenced by many other factors, such as the variables to be branched on and the relaxation problems solved at the nodes. So it is hard to conclude that \(R=2\) is better than \(R=10\) or the other way around. Next, we will examine whether, within the GOAD framework, a certain formulation is better than the others.

5.2 Performance of the GOAD algorithm

We compare three GOAD algorithms that use the \(2^+\)-, \(10^+\)-, and H-formulations, the general-purpose global optimization solvers SCIP 7.0 [9] and BARON 21.1.13 [30], and Gurobi 9.1.2 [12], which can solve MIBLP problems to global optimality. We implement all solution methods via GAMS 36.2 [4]. For a GOAD algorithm, the lower bounding problem (L-MILP) is solved by CPLEX 20.1.0 with 6 threads, the upper bounding problem (U-NLP) is solved by SCIP 7.0, and for both bounding problems the time limit is 1800 s and the absolute/relative tolerance is \(10^{-3}\). When applying a global solver to a problem, we also allow the solver to use 6 threads. For either a GOAD algorithm or a global solver, the total time limit is 3600 s and the absolute/relative tolerance is \(10^{-3}\).

Note that at each iteration we determine the next discretization level according to the optimal solution of (L-MILP). This problem has the same optimal objective value under different discretization formulations, but since it usually has multiple optimal solutions, different discretization formulations may give different optimal solutions and therefore different discretization levels for the next iteration. For a fair comparison, we use the discretization level determined from the H-formulation, no matter which discretization formulation the GOAD algorithm uses.

Figure 3 shows the performance profiles [8] of the solution methods under consideration. The horizontal axis \(\tau \) denotes the relative solution time (also called the performance ratio), and the vertical axis \(\rho _s({\tau })\) denotes the percentage of problem instances solved within relative time \(\tau \). It can be seen that the GOAD algorithm with any of the three discretization formulations performs better than SCIP and BARON. There is no clear winner among the three discretization formulations within the GOAD framework, which is not unexpected as the three formulations are all locally sharp. In addition, the solution times for the three formulations depend not only on the tightness of the continuous relaxations, but also on other factors such as the number of binary variables, the number of constraints, and the nodes explored in the branch-and-bound search. Therefore, the relative performance of the three formulations differs across problem instances. It can also be seen that Gurobi performs better than any of the GOAD algorithms, SCIP, and BARON. This may be because Gurobi can generate additional strengthening cuts from solutions of convex relaxations, while the GOAD algorithm does not generate cuts from solutions of lower bounding problems.

Fig. 3 Performance profiles of different solution methods
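As a minimal sketch of how the profiles \(\rho _s(\tau )\) are computed from solver timings, following the standard Dolan–Moré construction; the data layout (a dict of per-instance times) is an assumption made for illustration:

```python
def performance_profile(times, tau_grid):
    """Dolan-More performance profiles from solver timing data.

    times[s][p] is the solution time of solver s on problem instance p
    (use float('inf') for unsolved instances; each instance is assumed
    solved by at least one solver).  Returns, for each solver s, the list
    of rho_s(tau) values over tau_grid: the fraction of instances whose
    time ratio against the best solver is at most tau.
    """
    solvers = list(times)
    n = len(times[solvers[0]])
    # best (smallest) time per instance across all solvers
    best = [min(times[s][p] for s in solvers) for p in range(n)]
    profiles = {}
    for s in solvers:
        ratios = [times[s][p] / best[p] for p in range(n)]
        profiles[s] = [sum(r <= tau for r in ratios) / n for tau in tau_grid]
    return profiles
```

A solver whose curve \(\rho _s(\tau )\) lies above the others dominates them over that range of performance ratios.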

Table 6 Additional results for the GOAD algorithm

Table 6 provides additional results for the GOAD algorithm. The column \(b_{max}\) shows the maximum discretization level among the discretization levels of all variables at the termination of GOAD, and the column \(b_{min}\) shows the minimum discretization level. The last column gives the percentage of variables that have the minimum discretization level at the termination of GOAD. It can be seen that for all problems, the vast majority of the variables have the minimum discretization level, but some variables may need a much higher discretization level. This indicates the importance of allowing different discretization levels in the adaptive discretization scheme. In addition, the various maximum discretization levels indicate the importance of a flexible choice of bases. For example, P721t6 requires \(b=1024\) for some variable; a traditional 10-based formulation would have to partition the variable range into \(10^4\) pieces. P721t4 requires \(b=10\) for some variables, and a traditional 2-based formulation may not be as good as a 10-based formulation.

6 Concluding remarks

The GOAD method we develop for MIBLP generates strong MILP relaxations via an adaptive discretization scheme. The adaptive scheme allows the discretization levels of different variables to increase over the iterations, and different variables may have different discretization levels. The advantage of this flexibility is demonstrated in the computational study, where different variables do have different discretization levels at the convergence of the algorithm and most of them are much smaller than the maximum discretization level. The adaptive discretization scheme demands that the discretization level b can be any natural number instead of a power of a predefined base R. In this case, the classical fixed-base R-formulation may not be locally sharp if \(b<R^k-1\) (where \(k=\lceil \log _R b\rceil \)). We propose an \(R^+\)-formulation that includes an extra strengthening constraint and prove that this formulation is locally sharp no matter what b is. We further propose an H-formulation that allows multiple bases and prove that it is also locally sharp, because b is a product of the bases used. Compared to a fixed-base formulation, the H-formulation fits more naturally with the adaptive discretization scheme. The GOAD method solves the original MIBLP by iteratively solving a lower bounding MILP that comes from discretization and an upper bounding NLP. The time for global optimization of the NLP is negligible in comparison to the solution time for the MILP, so the latter determines the overall solution time.

The GOAD method has a better performance profile than the general-purpose global solvers SCIP and BARON in the computational study, no matter whether the \(2^+\)-, \(10^+\)-, or H-formulation is used for discretization. However, the GOAD method is less efficient than Gurobi, which is not a general-purpose global solver but can solve MIBLPs to global optimality. Further improvement of the GOAD method may be achieved by introducing improved discretization formulations and strengthening cuts. For example, we may limit the bases in the H-formulation to prime numbers, as in [6], and add more cuts in set \({\mathcal {S}}_{B-Relax}(b_m)\) to strengthen the lower bounding problem. We may also generate additional valid cuts from the solution of the lower bounding problem at each step and use these cuts to strengthen the problem in future steps.