1 Introduction

Polynomial programming is the class of nonlinear optimization problems involving polynomials only:

$$\begin{aligned} f^*:=\min _{x\in F} f(x), \qquad F:=\left\{ x\in {\mathbb {R}}^n \ : \ g_j(x)\ge 0, \ j=1,\ldots ,m\right\} , \end{aligned}$$
(1)

where f and all \(g_j\) are n-variate polynomials. We will assume throughout that

  • the feasible set is compact;

  • for all \(x \in F\) and \(j=1,\ldots ,m\) one has \(g_j(x)\le 1\).

The second condition is theoretically without loss of generality (by scaling the \(g_j\)).

In general, these problems are \(\mathcal {NP}\)-hard, since they contain problems like the maximum cut problem as special cases; see e.g. Laurent (2009). In 2015, Lasserre et al. introduced the so-called bounded degree sum-of-squares (BSOS) hierarchy to obtain a nondecreasing sequence of lower bounds on the optimal value of problem (1) when the feasible set is compact. Each lower bound in the sequence is the optimal value of a semidefinite programming (SDP) problem. Moreover, the authors of Lasserre et al. (2015) showed that, under some assumptions, this sequence converges to the optimal value of problem (1). From their numerical experiments, they concluded that the BSOS hierarchy was efficient for quadratic problems.

In this paper, we analyze the BSOS hierarchy in more detail. We also study variants of the BSOS hierarchy where the number of variables is reduced.

The numerical results in this paper are on pooling problems, which belong to the class of problems with bilinear functions. The pooling problem is well studied in the chemical process and petroleum industries. It has also been generalized for application to wastewater networks; see e.g. Karuppiah and Grossmann (2006). It is a generalization of a minimum cost network flow problem in which products possess different specifications. There are many equivalent mathematical models for the pooling problem, and all of them include bilinear functions in their constraints. Haverly (1978) described the so-called P-formulation, and afterwards many researchers used this model (e.g. Adhya et al. 1999; Ben-Tal et al. 1994; Foulds et al. 1992). There are also Q-, PQ-, and TP-formulations; in this paper, we use the P- and PQ-formulations, and point the reader to the survey by Gupte et al. (2017), where all the formulations are described, as well as to the PhD thesis of Alfaki (2012).

One way of getting a lower bound for a pooling problem is to use a convex relaxation, as done e.g. by Foulds et al. (1992). Adhya et al. (1999) introduced a Lagrangian approach to obtain tighter lower bounds for pooling problems. There are also many other papers studying duality (Ben-Tal et al. 1994), piecewise linear approximation (Misener et al. 2011), heuristics for finding a good feasible solution (Alfaki and Haugland 2014), etc. A relatively recent survey on solution techniques is Misener and Floudas (2009).

In a seminal paper, Lasserre (2001) introduced a hierarchy of lower bounds for polynomial optimization using SDP relaxations. Frimannslund et al. (2010) tried to solve pooling problems with the LMI relaxations obtained from this hierarchy. They found that, due to the growth of the SDP problem sizes along the hierarchy, this method is not effective for pooling problems. In this paper, we therefore consider the BSOS hierarchy as an alternative, since it is computationally cheaper.

The structure of this paper is as follows: We describe the BSOS hierarchy in Sect. 2. In Sect. 3 the pooling problem is defined, and we review three mathematical models for it, namely the P-, Q- and PQ-formulations. Also, we solve some pooling problems by the BSOS hierarchy in this section. Section 4 contains the numerical results after a reduction in the number of linear variables and constraints in each iteration of the BSOS hierarchy.

2 The bounded degree SOS (BSOS) hierarchy for polynomial optimization

In this section, we briefly review the background of the BSOS hierarchy from Lasserre et al. (2015). For easy reference, we will use the same notation as in Lasserre et al. (2015).

In what follows \({\mathbb {N}}^{k}\) will denote all k-tuples of nonnegative integers, and we define

$$\begin{aligned} {\mathbb {N}}^{k} _d=\left\{ \alpha \in {\mathbb {N}}^{k}: \sum _{i=1}^k \alpha _i \le d \right\} . \end{aligned}$$
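Since \({\mathbb {N}}^{k}_d\) indexes all of the multipliers in the hierarchy below, it is convenient to be able to enumerate it explicitly. The following sketch (ours, for illustration only) lists its elements and checks the count \(|{\mathbb {N}}^{k}_d|={k+d \atopwithdelims ()d}\):

```python
from itertools import product
from math import comb

def multi_indices(k, d):
    """All k-tuples of nonnegative integers with sum at most d, i.e. N^k_d."""
    return [alpha for alpha in product(range(d + 1), repeat=k) if sum(alpha) <= d]

# Sanity check of the count |N^k_d| = C(k+d, d) for a few small cases.
for k, d in [(2, 3), (4, 2), (6, 1)]:
    assert len(multi_indices(k, d)) == comb(k + d, d)
```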

The space of \(n\times n\) symmetric matrices will be denoted by \({\mathcal {S}}^n\), and its subset of positive semidefinite matrices by \({\mathcal {S}}_+^n\).

Consider the general nonlinear optimization problem (1). For fixed \(d\ge 1\), the following problem is clearly equivalent to (1):

$$\begin{aligned} f^* = \min _{x\in {\mathbb {R}}^n} \left\{ f(x) \ : \ \prod _{j=1}^{m}g_j(x)^{\alpha _j}\left( 1-g_j(x)\right) ^{\beta _j} \ge 0 \quad \forall (\alpha ,\beta )\in {\mathbb {N}}^{2m}_d \right\} . \end{aligned}$$
(2)

The underlying idea of the BSOS hierarchy is to rewrite problem (1) as

$$\begin{aligned} f^* = \sup _{t} \left\{ t \ : \ f(x)-t \ge 0 \quad \forall x \in F \right\} . \end{aligned}$$
The next step is to use the following positivstellensatz by Krivine (1964) to remove the quantifier ‘\(\forall x \in F\)’.

Theorem 1

(Krivine 1964, see also §3.6.4 in Laurent 2009) Assume that \(g_j(x)\le 1\) for all \(x\in F\) and \(j=1,\ldots ,m\), and that the family \(\{1,g_1,\ldots ,g_m\}\) generates the ring of polynomials. If a polynomial g is positive on F, then

$$\begin{aligned} g(x) = \sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m}}\lambda _{\alpha \beta }\prod _{j=1}^{m}g_j(x)^{\alpha _j}(1-g_j(x))^{\beta _j} \end{aligned}$$

for finitely many \(\lambda _{\alpha \beta } > 0\).

Defining

$$\begin{aligned} h_{\alpha \beta }(x):=\prod _{j=1}^{m}g_j(x)^{\alpha _j}\left( 1-g_j(x)\right) ^{\beta _j} \qquad \left( (\alpha ,\beta )\in {\mathbb {N}}^{2m}\right) , \end{aligned}$$
we arrive at the following sequence of lower bounds (indexed by d) for problem (1):

$$\begin{aligned} f^* \ge \sup _{t,\lambda } \left\{ t: f(x)-t = \sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m}_d}\lambda _{\alpha \beta }h_{\alpha \beta }(x),\ \lambda _{\alpha \beta }\ge 0\right\} . \end{aligned}$$
(3)

For a given integer \(d > 0\) the right-hand side is a linear programming (LP) problem, and the lower bounds converge to \(f^*\) as \(d \rightarrow \infty \), by Krivine’s positivstellensatz. This hierarchy of LP bounds was introduced by Lasserre (2005).

A subsequent idea from Lasserre et al. (2015) was to strengthen the LP bounds by enlarging the feasible set as follows: if we fix \(\kappa \in {\mathbb {N}}\), and denote by \(\varSigma [x]_\kappa \) the space of sum-of-squares polynomials of degree at most \(2\kappa \), then we may define the bounds

$$\begin{aligned} q^\kappa _d:=\sup _{t, \lambda } \left\{ t: f(x)-t -\sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d}\lambda _{\alpha \beta }h_{\alpha \beta }(x)\ \in \varSigma [x]_\kappa ,\ \lambda _{\alpha \beta }\ge 0 \right\} . \end{aligned}$$

The resulting problem is a semidefinite programming (SDP) problem, and the size of the positive semidefinite matrix variable is determined by the parameter \(\kappa \), hence the name bounded-degree sum-of-squares (BSOS) hierarchy. By fixing \(\kappa \) to a small value, the resulting SDP problem is not much harder to solve than the preceding LP problem, but potentially yields a better bound for given d.

For fixed \(\kappa \) and for each d, one has

$$\begin{aligned} q^\kappa _d=\sup _{t,\lambda ,Q} \left\{ t: f(x)-t =\sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d}\lambda _{\alpha \beta }h_{\alpha \beta }(x)+v_\kappa (x)^TQv_\kappa (x) \ \ \forall x\in {\mathbb {R}}^n,\ \lambda _{\alpha \beta }\ge 0,\ Q\in {\mathcal {S}}_+^{s(\kappa )} \right\} , \end{aligned}$$
(4)

where \(s(\kappa )={n+\kappa \atopwithdelims ()\kappa }\), and \(v_\kappa (x)\) is a vector whose entries form a basis for the n-variate polynomials of degree at most \(\kappa \).

Letting \(\tau =\max \{\deg (f),2\kappa ,d\max _j \deg (g_j)\}\), we may eliminate the variables x in two ways to get an SDP problem:

  • Equate the coefficients of the polynomials on both sides of the equality in (4), i.e. use the fact that two polynomials are identical if they have the same coefficients in some basis.

  • Use the fact that two n-variate polynomials of degree \(\tau \) are identical if their function values coincide on a finite set of \(s(\tau ) = {n+\tau \atopwithdelims ()\tau }\) points in general position.

The second way of obtaining an SDP problem is called the ‘sampling formulation’, and was first studied in Lofberg and Parrilo (2004). It was also used for the numerical BSOS hierarchy calculations in Lasserre et al. (2015), with a set of \(s(\tau )\) randomly generated points in \({\mathbb {R}}^n\).

We will instead use the points

$$\begin{aligned} \varDelta (n,\tau )=\left\{ x\in {\mathbb {R}}^n \ : \ \tau x\in {\mathbb {N}}^{n},\ \sum _{i=1}^{n} x_i\le 1 \right\} . \end{aligned}$$
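The set \(\varDelta (n,\tau )\) is simply the regular grid on the simplex, and it contains exactly \(s(\tau )={n+\tau \atopwithdelims ()\tau }\) points. A minimal sketch (ours) that generates it:

```python
from itertools import product
from math import comb

def simplex_grid(n, tau):
    """The points of Delta(n, tau): x in R^n with tau*x integer and sum(x) <= 1."""
    return [tuple(a / tau for a in alpha)
            for alpha in product(range(tau + 1), repeat=n) if sum(alpha) <= tau]

assert len(simplex_grid(3, 4)) == comb(3 + 4, 4)  # s(tau) points
```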

Thus we obtain the following SDP reformulation of (4):

$$\begin{aligned} q^\kappa _d=\sup _{t,\lambda ,Q} \left\{ t: f(x^\ell )-t =\sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d}\lambda _{\alpha \beta }h_{\alpha \beta }(x^\ell )+v_\kappa (x^\ell )^TQv_\kappa (x^\ell ),\ \ell =1,\ldots ,s(\tau ),\ \lambda _{\alpha \beta }\ge 0,\ Q\in {\mathcal {S}}_+^{s(\kappa )} \right\} , \end{aligned}$$
(5)

where \(x^1,\ldots ,x^{s(\tau )}\) are the points of \(\varDelta (n,\tau )\).

The following theorem, proved in Lasserre et al. (2015), gives some information on feasibility and duality issues for the BSOS relaxation.

Theorem 2

(Lasserre et al. 2015) If problem (1) is Slater feasible, then so is the dual SDP problem of (5). Thus (by the conic duality theorem), if the SDP problem (5) has a feasible solution, it has an optimal solution as well.

Note that problem (5) may be infeasible for given d and \(\kappa \). One only knows that it will be feasible, and therefore \(q^\kappa _d\) will be defined, for sufficiently large d.
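To make the sampling formulation concrete, here is a minimal sketch (our illustration, not the implementation used for the experiments in this paper) of (5) for the univariate toy problem \(\min \{(x-\frac{1}{2})^2 : 0\le x\le 1\}\), with \(m=1\), \(g_1(x)=x\), \(d=1\) and \(\kappa =1\), so that \(\tau =2\) and the sample set is \(\varDelta (1,2)=\{0,\frac{1}{2},1\}\). It uses the CVXPY modelling package:

```python
import numpy as np
import cvxpy as cp

f = lambda x: (x - 0.5) ** 2      # objective; f* = 0, attained at x = 1/2
g = lambda x: x                   # single constraint function, 0 <= g_1(x) <= 1 on F

points = [0.0, 0.5, 1.0]          # Delta(1, 2), s(tau) = 3 sample points
t = cp.Variable()
lam = cp.Variable(3, nonneg=True) # lambda_{alpha beta}, (alpha, beta) in N^2_1
Q = cp.Variable((2, 2), PSD=True) # s(kappa) = 2 for kappa = 1

constraints = []
for x in points:                  # equate function values at the sample points
    v = np.array([1.0, x])        # v_1(x)
    h = lam[0] + lam[1] * g(x) + lam[2] * (1.0 - g(x))
    constraints.append(f(x) - t == h + v @ Q @ v)

prob = cp.Problem(cp.Maximize(t), constraints)
prob.solve()
print(t.value)                    # approx. 0: here q^1_1 already equals f*
```

Since two univariate polynomials of degree \(\tau =2\) that agree on three points are identical, the three sampled equalities are equivalent to the polynomial identity in (4).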

Remark 1

Assume that at the d-th level of the hierarchy we have \(q_d^\kappa =f^*\), i.e. finite convergence of the BSOS hierarchy. Then there exist \(\lambda _{\alpha \beta }\ge 0\) and \(Q\in {\mathcal {S}}_+^{s(\kappa )}\) such that

$$\begin{aligned} f(x)-f^* = \sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d}\lambda _{\alpha \beta }h_{\alpha \beta }(x)+v_\kappa (x)^TQv_\kappa (x) \qquad \forall x\in {\mathbb {R}}^n. \end{aligned}$$
(6)

Let \(x^* \in F\) be an optimal solution (\(f(x^*) = f^*\)). Then it is clear from (6) that

$$\begin{aligned} 0 = \sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d}\lambda _{\alpha \beta }h_{\alpha \beta }(x^*)+v_\kappa (x^*)^TQv_\kappa (x^*), \end{aligned}$$

and, since \(h_{\alpha \beta }(x^*)\ge 0\), \(\lambda _{\alpha \beta }\ge 0\), and Q is positive semidefinite, every term on the right-hand side must vanish:

$$\begin{aligned} \lambda _{\alpha \beta }h_{\alpha \beta }(x^*)=0 \qquad \forall (\alpha ,\beta )\in {\mathbb {N}}^{2m} _d. \end{aligned}$$
(7)

Hence, for any \((\alpha , \beta )\in {\mathbb {N}}^{2m} _d\), if the constraint \(h_{\alpha \beta }(x)\ge 0\) is not binding at an optimal solution (i.e. \(h_{\alpha \beta }(x^*)>0\)), then \(\lambda _{\alpha \beta }=0\). We will use this observation to reduce the number of variables later on. \(\square \)

3 The P-, Q- and PQ-formulations of the pooling problem

In this section, we describe the P-, Q- and PQ-formulations of the pooling problem. The notation we are using is the same as in Gupte et al. (2017). To define the pooling problem, consider an acyclic directed graph \(G=({\mathcal {N}},{\mathcal {A}})\) where \({\mathcal {N}}\) is the set of nodes and \({\mathcal {A}}\) is the set of arcs. This graph defines a pooling problem if:

  (i) the set \({\mathcal {N}}\) can be partitioned into three subsets \({\mathcal {I}},{\mathcal {L}}\) and \({\mathcal {J}}\), where \({\mathcal {I}}\) is the set of inputs with I members, \({\mathcal {L}}\) is the set of pools with L members, and \({\mathcal {J}}\) is the set of outputs with J members;

  (ii) \({\mathcal {A}}\subseteq ({\mathcal {I}}\times {\mathcal {L}}) \cup ({\mathcal {I}}\times {\mathcal {J}}) \cup ({\mathcal {L}}\times {\mathcal {L}}) \cup ({\mathcal {L}} \times {\mathcal {J}})\); see Fig. 1.

In this paper, we consider cases where \({\mathcal {A}}\cap ({\mathcal {L}}\times {\mathcal {L}})=\emptyset \), which is called the standard pooling problem because there are no arcs between pools.

Fig. 1 An example of a standard pooling problem with I inputs, L pools and J outputs

For each arc \((i,j)\in {\mathcal {A}}\), let \(c_{ij}\) be the cost of sending a unit flow on this arc. For each node, there is possibly a capacity restriction, which limits the sum of the incoming (or outgoing) flows at the node; the capacity is denoted by \(C_i\) for each \(i\in {\mathcal {N}}\). Also, there are some specifications for the inputs, e.g. their sulfur concentrations, which are indexed by k in a set of specifications \({\mathcal {K}}\) with K members. Letting \(y_{ij}\) be the flow from node i to node j, \(u_{ij}\) the upper bound on the flow that can be carried from i to j, and \(p_{lk}\) the concentration value of the kth specification in pool l, the pooling problem can be written as the following optimization model:

$$\begin{aligned} \min _{y,\,p} \quad \sum _{(i,j)\in {\mathcal {A}}} c_{ij}y_{ij} \end{aligned}$$
(8)
$$\begin{aligned} \text{ s.t. } \quad \sum _{i:(i,l)\in {\mathcal {A}}} y_{il}=\sum _{j:(l,j)\in {\mathcal {A}}} y_{lj} \qquad \forall l \in {\mathcal {L}}, \end{aligned}$$
(9)
$$\begin{aligned} \sum _{j:(i,j)\in {\mathcal {A}}} y_{ij}\le C_i \qquad \forall i \in {\mathcal {I}}, \end{aligned}$$
(10)
$$\begin{aligned} \sum _{j:(l,j)\in {\mathcal {A}}} y_{lj}\le C_l \qquad \forall l \in {\mathcal {L}}, \end{aligned}$$
(11)
$$\begin{aligned} \sum _{i:(i,j)\in {\mathcal {A}}} y_{ij}\le C_j \qquad \forall j \in {\mathcal {J}}, \end{aligned}$$
(12)
$$\begin{aligned} 0\le y_{ij}\le u_{ij} \qquad \forall (i,j) \in {\mathcal {A}}, \end{aligned}$$
(13)
$$\begin{aligned} \sum _{i:(i,l)\in {\mathcal {A}}} \lambda _{ik}y_{il}=p_{lk}\sum _{j:(l,j)\in {\mathcal {A}}} y_{lj} \qquad \forall l \in {\mathcal {L}},\ k \in {\mathcal {K}}, \end{aligned}$$
(14)
$$\begin{aligned} \sum _{i\in {\mathcal {I}}:(i,j)\in {\mathcal {A}}} \lambda _{ik}y_{ij}+\sum _{l\in {\mathcal {L}}:(l,j)\in {\mathcal {A}}} p_{lk}y_{lj}\le \mu ^{max}_{jk}\sum _{i:(i,j)\in {\mathcal {A}}} y_{ij} \qquad \forall j \in {\mathcal {J}},\ k \in {\mathcal {K}}, \end{aligned}$$
(15)
$$\begin{aligned} \sum _{i\in {\mathcal {I}}:(i,j)\in {\mathcal {A}}} \lambda _{ik}y_{ij}+\sum _{l\in {\mathcal {L}}:(l,j)\in {\mathcal {A}}} p_{lk}y_{lj}\ge \mu ^{min}_{jk}\sum _{i:(i,j)\in {\mathcal {A}}} y_{ij} \qquad \forall j \in {\mathcal {J}},\ k \in {\mathcal {K}}, \end{aligned}$$
(16)

where \(\mu ^{max}_{jk}\) and \(\mu ^{min}_{jk}\) are the upper and lower bounds of the kth specification in output \(j\in {\mathcal {J}}\), and \(\lambda _{ik}\) is the concentration of the kth specification in input i. Here is a short interpretation of the constraints:

  • (9): volume balance between the incoming and outgoing flows in each pool.

  • (10): capacity restriction for each input.

  • (11): capacity restriction for each pool.

  • (12): capacity restriction for each output.

  • (13): limitation on each flow.

  • (14): specification balance between the incoming and outgoing flows in each pool.

  • (15): upper bound of the output specification.

  • (16): lower bound of the output specification.

For a general pooling problem, the aforementioned model is called the P-formulation. Consider a pool \(l\in {\mathcal {L}}\) and the arc incident to it from input \(i\in {\mathcal {I}}\). Let us denote by \(q_{il}\) the ratio between the flow in this arc and the total incoming flow to this pool. So, \(y_{il}=q_{il}\sum _{j\in {\mathcal {J}}}y_{lj}\), and \(p_{lk}=\sum _{i\in {\mathcal {I}}}\lambda _{ik}q_{il}\) for any \(k\in {\mathcal {K}}\). Applying these to the P-formulation yields the following problem called the Q-formulation:

(17)

Adding two sets of redundant constraints

$$\begin{aligned} \sum _{i\in {\mathcal {I}}} q_{il}y_{lj}=y_{lj} \qquad \forall l \in {\mathcal {L}},\ j \in {\mathcal {J}}, \end{aligned}$$
(18a)
$$\begin{aligned} \sum _{j\in {\mathcal {J}}} q_{il}y_{lj}\le C_l\,q_{il} \qquad \forall i \in {\mathcal {I}},\ l \in {\mathcal {L}}, \end{aligned}$$
(18b)

gives an equivalent problem, called the PQ-formulation. It is clear that all formulations are nonconvex quadratic optimization problems which are not easy to solve (Haugland 2016).

3.1 McCormick relaxation and the pooling problem

Assume that x and y are variables with given lower and upper bounds

$$\begin{aligned} \ell _x \le x \le u_x, \;\;\; \ell _y \le y \le u_y. \end{aligned}$$

Then, the following inequalities are implied when \(\chi =xy\):

$$\begin{aligned} \chi \ge \ell _x y + \ell _y x - \ell _x \ell _y, \end{aligned}$$
(19a)
$$\begin{aligned} \chi \ge u_x y + u_y x - u_x u_y, \end{aligned}$$
(19b)
$$\begin{aligned} \chi \le \ell _x y + u_y x - \ell _x u_y, \end{aligned}$$
(19c)
$$\begin{aligned} \chi \le u_x y + \ell _y x - u_x \ell _y. \end{aligned}$$
(19d)

It is known that the convex hull of

$$\begin{aligned} {\mathcal {B}}:=\left\{ (x,y, \chi )\; :\;\chi =xy,\; \ell _x \le x \le u_x, \; \ell _y \le y \le u_y \right\} , \end{aligned}$$

which is called the McCormick relaxation (Gupte et al. 2017), is exactly the set of \((x,y,\chi )\) that satisfy the inequalities (19).
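As a quick numerical illustration (ours), the following sketch checks on a sample grid that \(\chi =xy\) always lies between the lower bounds (19a)–(19b) and the upper bounds (19c)–(19d); the box bounds are made up:

```python
import numpy as np

def mccormick_interval(x, y, lx, ux, ly, uy):
    """The range that the inequalities (19a)-(19d) allow for chi at (x, y)."""
    lower = max(lx * y + ly * x - lx * ly, ux * y + uy * x - ux * uy)
    upper = min(lx * y + uy * x - lx * uy, ux * y + ly * x - ux * ly)
    return lower, upper

lx, ux, ly, uy = 0.0, 2.0, -1.0, 1.0            # hypothetical variable bounds
for x in np.linspace(lx, ux, 21):
    for y in np.linspace(ly, uy, 21):
        lo, hi = mccormick_interval(x, y, lx, ux, ly, uy)
        assert lo - 1e-9 <= x * y <= hi + 1e-9  # chi = xy satisfies (19)
```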

In the pooling problem, the following lower and upper bounds on the variables are implied:

$$\begin{aligned} 0\le y_{ij}\le u_{ij} \quad \forall (i,j)\in {\mathcal {A}}, \qquad \min _{i\in {\mathcal {I}}}\lambda _{ik}\le p_{lk}\le \max _{i\in {\mathcal {I}}}\lambda _{ik} \quad \forall l\in {\mathcal {L}},\ k\in {\mathcal {K}}. \end{aligned}$$

So, one can get a lower bound by using the McCormick relaxation of each bilinear term in the P- or PQ-formulations.

The redundant constraints (18) guarantee that the relaxation obtained by using the McCormick relaxation for the PQ-formulation is stronger than that for the P-formulation (Gupte et al. 2017).

In this paper, we are going to use the BSOS hierarchy to find a sequence of lower bounds that converges to the optimal value of the pooling problem. First we analyze the P-formulation and in Sect. 4.3 we compare the results by using the PQ-formulation.

3.2 Solving pooling problems with the BSOS hierarchy

The BSOS hierarchy is only defined for problems without equality constraints, and the P-formulation has \((K+1)L\) equality constraints. The simplest way of dealing with equality constraints is to replace each of them by two inequalities; however, this increases the number of constraints, which is not favorable for the BSOS hierarchy. Another way is to eliminate the equality constraints (9) and (14), if possible.

3.2.1 Eliminating equality constraints

Let \(l\in {\mathcal {L}}\). We assume without loss of generality that the first t inputs feed the pool l. Therefore, equality constraints (9) and (14) can be written as follows:

$$\begin{aligned} \underbrace{\left[ \begin{matrix} 1 & \cdots & 1\\ \lambda _{11} & \cdots & \lambda _{t1}\\ \vdots & & \vdots \\ \lambda _{1K} & \cdots & \lambda _{tK} \end{matrix}\right] }_{=:A}\left[ \begin{matrix} y_{1l}\\ \vdots \\ y_{tl} \end{matrix}\right] =\left( \sum _{j\in {\mathcal {J}}}y_{lj}\right) \left[ \begin{matrix} 1\\ p_{l1}\\ \vdots \\ p_{lK} \end{matrix}\right] . \end{aligned}$$
(20)

Let \(rank(A)=r\). Applying a singular value decomposition (see, e.g. Boyd and Vandenberghe 2004, Appendix A.5.4), there are matrices \(U=[U_1,U_2]\in {\mathbb {R}}^{(K+1)\times (K+1)},V=[V_1,V_2]\in {\mathbb {R}}^{t\times t},\varSigma \in {\mathbb {R}}^{r \times r}\) such that

$$\begin{aligned} \begin{aligned}&A=U\left[ \begin{matrix} \varSigma & 0_1\\ 0_2 & 0_3 \end{matrix}\right] V^T,\quad U^TU=I,\quad V^TV=I,\\&\varSigma =diag(\sigma _1,\ldots ,\sigma _r),\quad \sigma _1\ge \sigma _2\ge \cdots \ge \sigma _r>0, \end{aligned} \end{aligned}$$

where \(V_1\in {\mathbb {R}}^{t \times r}, V_2\in {\mathbb {R}}^{t\times (t-r)}, U_1\in {\mathbb {R}}^{(K+1) \times r}, U_2\in {\mathbb {R}}^{(K+1)\times (K+1-r)}, 0_1\in {\mathbb {R}}^{r \times (t-r)},\;0_2\in {\mathbb {R}}^{(K+1-r)\times r}, 0_3\in {\mathbb {R}}^{(K+1-r) \times (t-r)}\). Therefore, (20) can be written as

$$\begin{aligned} \varSigma V_1^T\left[ \begin{matrix} y_{1l}\\ \vdots \\ y_{tl} \end{matrix}\right] = \left( \sum _{j\in {\mathcal {J}}}y_{lj}\right) U_1^T\left[ \begin{matrix} 1\\ p_{l1}\\ \vdots \\ p_{lK} \end{matrix}\right] , \end{aligned}$$
(21)
$$\begin{aligned} 0 = \left( \sum _{j\in {\mathcal {J}}}y_{lj}\right) U_2^T\left[ \begin{matrix} 1\\ p_{l1}\\ \vdots \\ p_{lK} \end{matrix}\right] . \end{aligned}$$
(22)

The fact that \(V^TV=I\) implies that all columns of V, and hence of \(V_1\), are linearly independent. Therefore, taking the QR decomposition of \(V_1^T\), i.e., \(V_1^T=Q[R_1,R_2]\), where \(R_1\in {\mathbb {R}}^{r \times r}\) is upper triangular and invertible, \(R_2\in {\mathbb {R}}^{r\times (t-r)}\), and \(Q\in {\mathbb {R}}^{r \times r}\) is orthonormal (\(Q^TQ=QQ^T=I\)), (21) is equivalent to

$$\begin{aligned} \left[ \begin{matrix} y_{1l}\\ \vdots \\ y_{rl} \end{matrix}\right] = R_1^{-1}Q^T\varSigma ^{-1}\left( \sum _{j\in {\mathcal {J}}}y_{lj}\right) U_1^T\left[ \begin{matrix} 1\\ p_{l1}\\ \vdots \\ p_{lK} \end{matrix}\right] - R_1^{-1}R_2\left[ \begin{matrix} y_{(r+1)l}\\ \vdots \\ y_{tl} \end{matrix}\right] . \end{aligned}$$
(23)

Note that one may use Eq. (23) to eliminate the variables \(y_{1l},y_{2l}, \ldots , y_{rl}\).

Concerning (22): if, for a feasible solution of (8)–(16), \(\sum _{j\in {\mathcal {J}}}y_{lj}=0 \), then any other choice of the values \(p_{lk}, k=1,\ldots ,K,\) gives another feasible solution. So in this case one may choose values that satisfy

$$\begin{aligned} {0 = U_2^T\left[ \begin{matrix} 1\\ p_{l1}\\ p_{l2}\\ \vdots \\ p_{lK} \end{matrix} \right] ,} \end{aligned}$$
(24)

which is a system of \(K+1-r\) linearly independent equalities in the K variables \(p_{l1},\ldots ,p_{lK}\) (note that \(r\ge 1\)). Conversely, a feasible solution with the property \(\sum _{j\in {\mathcal {J}}}y_{lj}\ne 0 \) necessarily satisfies (24). So, instead of (22), we may solve (24), which may again be done using a QR decomposition. Thus we may write \(p_{lr},\ldots , p_{lK}\) as affine functions of \(p_{l1},\ldots ,p_{l(r-1)}\), and subsequently eliminate \(p_{lr},\ldots , p_{lK}\) as well.
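A small NumPy sketch of this elimination (ours, with made-up data) for a single pool fed by \(t=4\) inputs with \(K=2\) specifications, so that \(r=K+1=3\) and one entering flow can be chosen freely while the rest follow from (23):

```python
import numpy as np

lam = np.array([[0.5, 1.0, 2.0, 0.3],      # hypothetical lambda_{ik}, spec k = 1
                [1.5, 0.2, 0.8, 1.1]])     # hypothetical lambda_{ik}, spec k = 2
A = np.vstack([np.ones(4), lam])           # the (K+1) x t matrix in (20)

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))                 # numerical rank; here r = K+1 = 3
U1, Sigma, V1t = U[:, :r], np.diag(s[:r]), Vt[:r, :]
Q, R = np.linalg.qr(V1t)                   # V_1^T = Q [R_1, R_2]
R1, R2 = R[:, :r], R[:, r:]

s_l, p = 10.0, np.array([0.9, 1.0])        # total leaving flow and concentrations
b = s_l * np.concatenate(([1.0], p))       # right-hand side of (20)

y_free = np.array([2.0])                   # y_{(r+1)l}, ..., y_{tl}, chosen freely
y_dep = np.linalg.solve(R1, Q.T @ np.linalg.inv(Sigma) @ U1.T @ b - R2 @ y_free)
y = np.concatenate([y_dep, y_free])        # eq. (23)
assert np.allclose(A @ y, b)               # the eliminated equalities (20) hold
```

Note that the sign constraints \(y_{il}\ge 0\) are not enforced here; they remain as (bilinear) inequality constraints in the reformulated model.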

Remark 2

We emphasize that after these substitutions, the mathematical model equivalent to the pooling problem is still a nonconvex quadratic optimization problem.

Remark 3

The interpretation of eliminating the equality constraints when the matrix A has full rank (\(rank(A)=\min \{K+1,t\}\)) is as follows. For pools with exactly \(K + 1\) entering arcs, the entering flow values are given by the total leaving flow and the concentrations in the pool. With more than \(K + 1\) entering arcs, say t, \(t - K - 1\) flow values can be chosen freely, and the remaining \(K + 1\) are determined by the total leaving flow and the concentrations. When \(t < K + 1\), a basis of t concentration values determines the \(K + 1 - t\) remaining ones.

3.2.2 First numerical results

In this section, we study the convergence of the BSOS hierarchy of lower bounds \(q^1_d\) (\(d=1,2,\ldots \)) for pooling problems (i.e. with \(\kappa =1\)). First, it is worth pointing out the number of variables and constraints needed to compute \(q^1_d\). The number of constraints, as mentioned in the previous section, is \({n+2d \atopwithdelims ()2d}\) (since \(\tau =2d\) here). Also, the number of linear variables is one more than the size of \({\mathbb {N}}^{2m}_d\), namely \({2m+d\atopwithdelims ()d}+1\).
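These counts are easy to tabulate; a short sketch (ours), valid under the assumptions of this section (quadratic f and \(g_j\), \(\kappa =1\), so \(\tau =2d\)):

```python
from math import comb

def bsos_sizes(n, m, d, kappa=1):
    """(# equality constraints, # nonnegative variables, SDP block order) in (5),
    assuming quadratic data so that tau = max(2, 2*kappa, 2*d)."""
    tau = max(2, 2 * kappa, 2 * d)
    return comb(n + tau, tau), comb(2 * m + d, d) + 1, comb(n + kappa, kappa)

# e.g. a hypothetical instance with n = 10 variables and m = 20 constraints:
print([bsos_sizes(10, 20, d) for d in (1, 2, 3)])
```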

Table 1 Details for some well-known pooling problem instances

Table 1 gives some information about the standard pooling problem instances we use in this paper. The GAMS files of these instances, except DeyGupte4, can be found on the website http://www.ii.uib.no/~mohammeda/spooling/. DeyGupte4 is constructed in this paper (Appendix) by using the results of Dey and Gupte (2015). In Table 1, we recall in the column “PQ-linear relaxation value” the lower bound proposed in Alfaki and Haugland (2013) for each instance; this lower bound is the optimal value of the PQ-formulation after applying the McCormick relaxation to each bilinear term. It is proved in Dey and Gupte (2015) that any piecewise linear approximation of the PQ-formulation for DeyGupte4 (for a precise definition, see the Appendix) has optimal value in \([-4,-3]\). Also, the columns “# var.” and “# const.” contain the number of variables and constraints, respectively, in the P-formulation after eliminating the equality constraints.

Fig. 2 Optimal solution for Haverly1

Example 1

By way of example, we give the details for the first instance in Table 1, called Haverly1. Its optimal solution is shown in Fig. 2, and the optimal value is \(-400\) (Adhya et al. 1999). The optimal flow from node i to node j is denoted by \(y^*_{ij}\) in Fig. 2.

This instance has three inputs (denoted by \(1,\ 2,\ 3\)), one pool (denoted by 4), two outputs (denoted by \(5,\ 6\)), and one specification. The mathematical model for this instance is as follows:

$$\begin{aligned} \min&\quad 6y_{14}+16y_{24}+10\left[ y_{35}+y_{36}\right] -9\left[ y_{45}+y_{35}\right] -15\left[ y_{46}+y_{36}\right] \nonumber \\ \text{ s.t. }&\quad y_{14}+y_{24}=y_{45 }+y_{46}, \nonumber \\&\quad 0\le y_{45 }+y_{35}\le 100,\end{aligned}$$
(25a)
$$\begin{aligned}&\quad 0\le y_{46}+y_{36}\le 200,\end{aligned}$$
(25b)
$$\begin{aligned}&\quad 3y_{14}+y_{24}=p_{1}\left[ y_{45 }+y_{46}\right] ,\nonumber \\&\quad 2y_{35}+p_{1}y_{45 }\le 2.5\left[ y_{35}+y_{45 } \right] ,\end{aligned}$$
(25c)
$$\begin{aligned}&\quad 3y_{36}+p_{1}y_{46 }\le 1.5\left[ y_{36}+y_{46 } \right] ,\nonumber \\&\quad y_{ij}\ge 0, p_1\ge 0. \end{aligned}$$
(25d)

So, we can use the elimination method described in the previous section, which implies that \(y_{14}=\frac{1}{2}(y_{45}+y_{46})(p_1-1), y_{24}=\frac{1}{2}(y_{45}+y_{46})(3-p_1)\). Therefore, the reformulated model of this instance using scaling \(x_1:=\frac{p_1}{3}, \ x_2:=\frac{y_{45}}{200}, \ x_3:=\frac{y_{46}}{200}, \ x_4:=\frac{y_{35}}{200}, \ x_5:=\frac{y_{36}}{200}\), is

(26a)
(26b)
(26c)
(26d)
(26e)
(26f)

where the leftmost inequalities are redundant, (26a) and (26b) are from the sign constraints after the elimination, (26c), (26d), (26e), and (26f) are from (25a), (25b), (25c) and (25d), respectively.

The last step is to multiply the constraint functions by a factor 0.9 (any value in (0, 1) will do, but we used 0.9 for our computations), to ensure that the ‘\(\le 1\)’ conditions hold with strict inequality on the feasible set. Thus, we define \(g_1(x) = -0.9\cdot \frac{3}{4}(x_1-1)(x_2+x_3)\), etc.

We will use the BSOS hierarchy to find the optimal value of this example (Table 2). \(\square \)
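The elimination used in Example 1 is easy to verify symbolically, e.g. with SymPy (our check, not part of the original computations):

```python
import sympy as sp

y14, y24, y45, y46, p1 = sp.symbols('y14 y24 y45 y46 p1')
sol = sp.solve([sp.Eq(y14 + y24, y45 + y46),            # volume balance
                sp.Eq(3*y14 + y24, p1*(y45 + y46))],    # specification balance
               [y14, y24])
assert sp.simplify(sol[y14] - sp.Rational(1, 2)*(y45 + y46)*(p1 - 1)) == 0
assert sp.simplify(sol[y24] - sp.Rational(1, 2)*(y45 + y46)*(3 - p1)) == 0
```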

The results for Haverly1 and the other pooling problem instances are listed in Table 2. All computations in this paper were carried out with MOSEK 8 on an Intel i7-4790 3.60 GHz Windows computer with 16 GB of RAM. “Numerical Prob.” and “\(\approx \)” in the tables mean, respectively, that the solver reported a numerical problem, and that it only obtained an approximate optimal value. In all the tables from now on, the columns “# lin. var.”, “size of SDP var.” and “# const.” present the number of linear variables, the size of the semidefinite matrix variable, and the number of constraints in the hierarchy.

Table 2 Results for computing the lower bounds \(q^1_d\) for various pooling problem instances

Clearly, computing \(q^\kappa _d\) may involve a large number of linear variables and constraints (depending on d), which affects the time needed to solve (5). In the next section, we describe how one can significantly reduce the number of linear variables and constraints at each level of the BSOS hierarchy.

4 Reduction in the number of linear variables and constraints

In this section, we propose a method to reduce the number of linear variables, and we derive an upper bound on the number of linearly independent constraints at each level of the hierarchy.

4.1 Reduction in the number of variables

As mentioned in Remark 1, if we can identify constraints that are not binding at optimality, then we can reduce the number of variables.

In particular, by construction the constraints \(g_j(x) \le 1\) will never be binding at optimality. Recalling that the variable \(\lambda _{\alpha \beta }\) corresponds to the function

$$\begin{aligned} h_{\alpha \beta }(x)=\prod _{j=1}^{m}g_j(x)^{\alpha _j}\left( 1-g_j(x)\right) ^{\beta _j}, \end{aligned}$$

we know from Remark 1 that, in case of finite convergence, we will have \(\lambda _{\alpha \beta } = 0\) whenever \(\alpha = 0\).

Hence, instead of solving (5) to compute \(q_d^\kappa \), we may compute the following (weaker) bound more efficiently:

$$\begin{aligned} {{\hat{q}}}^\kappa _d:=\sup _{t,\lambda ,Q} \left\{ t: f(x^\ell )-t =\sum _{(\alpha ,\beta )\in {\mathbb {N}}^{2m} _d,\ \alpha \ne 0}\lambda _{\alpha \beta }h_{\alpha \beta }(x^\ell )+v_\kappa (x^\ell )^TQv_\kappa (x^\ell ),\ \ell =1,\ldots ,s(\tau ),\ \lambda _{\alpha \beta }\ge 0,\ Q\in {\mathcal {S}}_+^{s(\kappa )} \right\} . \end{aligned}$$
(27)

The advantage of (27) is that it has \({ m+d \atopwithdelims ()d }\) fewer nonnegative variables than (5), since this is the number of pairs \((\alpha ,\beta )\in {\mathbb {N}}^{2m}_d\) with \(\alpha =0\). We emphasize that problem (27) is not equivalent to (5), i.e. the lower bounds \(q_d^\kappa \) and \({{\hat{q}}}_d^\kappa \) are not equal in general; the bound \({{\hat{q}}}_d^\kappa \) may be strictly weaker.

The precise relation of the bounds \(q_d^\kappa \) and \({{\hat{q}}}_d^\kappa \) is spelled out in the next theorem, which follows from the argument in Remark 1.

Theorem 3

If, for given d and \(\kappa \), the bounds \(q_d^\kappa \) and \({{\hat{q}}}_d^\kappa \) are both finite, then \({{\hat{q}}}_d^\kappa \le q_d^\kappa \). Moreover, if the sequence \(q_d^\kappa \) (\(d= 1,2,\ldots \)) from (5) converges finitely to \(f^*\), then so does \({{\hat{q}}}_d^\kappa \) (\(d= 1,2,\ldots \)) from (27). More precisely, if \(q_{d^*}^\kappa = f^*\) for some \(d^* \in {\mathbb {N}}\), then \({{\hat{q}}}_{d^*}^\kappa = f^*\).

It is important to note that finite convergence of the sequence \(q_d^\kappa \) (\(d= 1,2,\ldots \)) is not guaranteed in general. Sufficient conditions for finite convergence are described in Lasserre et al. (2015).

The numerical results obtained by using (27) for the pooling problem instances are shown in Table 3. The “rel. time” column, from this table onward, gives the solution time for each level of the hierarchy as a ratio of the corresponding time in Table 2; these ratios show a significant reduction in computation times.

Table 3 Results for computing the lower bounds \({{\hat{q}}}_d^1\) for pooling problem instances using (27)

4.2 Reduction in the number of constraints

From now on we fix \(\kappa =1\) and \(v_1(x)=\left( 1,x_1,\ldots ,x_n\right) \). As mentioned above, the number of constraints at each level of the BSOS hierarchy is \({n+2d \atopwithdelims ()2d}\), where n is the number of variables in the original problem (1) and d is the level of the BSOS hierarchy. So, the number of constraints increases quickly with d. In this subsection, we discuss the redundancy of constraints and how we can eliminate linearly dependent constraints.

Let \(\text{ svec }\) denote the map from the \((n+1)\times (n+1)\) symmetric matrix space \(S^{n+1}\) to \({\mathbb {R}}^{1 \times {n+2 \atopwithdelims ()2}}\) given by

$$\begin{aligned} \text{ svec }(X)=\left( X_{11},\sqrt{2}X_{12},\ldots ,\sqrt{2}X_{1(n+1)},X_{22},\sqrt{2}X_{23},\ldots ,X_{(n+1)(n+1)}\right) , \end{aligned}$$

i.e. the upper triangular part of X is stacked row-wise, with the off-diagonal entries multiplied by \(\sqrt{2}\).
It will also be convenient to number the elements of \(\varDelta (n,\tau )\) as \(x^1,\ldots ,x^L\) where \(L = s(\tau )\). Finally, we will use the notation \(|\beta | = \sum _i \beta _i\).
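A sketch of this map (ours; the \(\sqrt{2}\) scaling ensures \(\text{ svec }(X)\,\text{ svec }(Y)^T=\langle X,Y\rangle \) for symmetric X, Y). The three printed rows are exactly the rows of the matrix \(V^1_1\) appearing in the proof of Theorem 4 below:

```python
import numpy as np

def svec(X):
    """Stack the upper triangle of a symmetric matrix row-wise, with the
    off-diagonal entries scaled by sqrt(2)."""
    n = X.shape[0]
    return np.array([X[i, j] if i == j else np.sqrt(2) * X[i, j]
                     for i in range(n) for j in range(i, n)])

for x in [0.0, 0.5, 1.0]:            # the points of Delta(1, 2)
    v = np.array([1.0, x])           # v_1(x)
    print(svec(np.outer(v, v)))      # [1 0 0], [1 0.707... 0.25], [1 1.414... 1]
```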

So, for \(d\ge 1\) and \(\kappa =1\) we may write the linear equality constraints in (5) as \(H_dy_d=b_d\), where \(y_d\) collects the variables \(\lambda \) and \(\text{ svec }(Q)\), \(b_d=\left( f(x^1)-t,\ldots ,f(x^L)-t\right) ^T\),

$$\begin{aligned} H_d=\left[ \begin{matrix} \left( h_{\alpha \beta }(x^1)\right) _{(\alpha ,\beta )\in {\mathbb {N}}^{2m}_d} & \text{ svec }\left( v_1(x^1)v_1(x^1)^T\right) \\ \vdots & \vdots \\ \left( h_{\alpha \beta }(x^L)\right) _{(\alpha ,\beta )\in {\mathbb {N}}^{2m}_d} & \text{ svec }\left( v_1(x^L)v_1(x^L)^T\right) \end{matrix}\right] , \end{aligned}$$

and \(L={n+2d \atopwithdelims ()2d}\). It is clear that \(H_d \in {\mathbb {R}}^{L \times \left[ {2m+d \atopwithdelims ()d}+{n+2 \atopwithdelims ()2}\right] }\).

In the following theorem we prove that all the constraints are linearly independent when \(d=1\).

Theorem 4

For the general problem (1) with quadratic functions f(x) and \(g_j(x), j=1,\ldots ,m\), all of the constraints in the first iteration of the BSOS hierarchy are linearly independent, i.e. if \(d=1\), all of the constraints of (5) are linearly independent.

Proof

Fix \(d=1\), which implies \(\tau =2\) and \(L={n+2 \atopwithdelims ()2}\) in (5). Then,

$$\begin{aligned} H_1=\left[ \begin{matrix} 1 & g_1(x^1) & \ldots & g_m(x^1) & 1-g_1(x^1) & \ldots & 1-g_m(x^1) & \text{ svec }\left( v_1(x^1)v_1(x^1)^T\right) \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & g_1(x^L) & \ldots & g_m(x^L) & 1-g_1(x^L) & \ldots & 1-g_m(x^L) & \text{ svec }\left( v_1(x^L)v_1(x^L)^T\right) \end{matrix}\right] , \end{aligned}$$

and

$$\begin{aligned} b_1=\left[ \begin{matrix} f(x^1)-t\\ \vdots \\ f(x^L)-t \end{matrix}\right] , \end{aligned}$$

for \( x^1,\ldots ,x^L \in \varDelta (n,2)\), defined in (5). To show that all of the rows in \(H_1\) are linearly independent, we prove that the submatrix

$$\begin{aligned} V^1_n:=\left[ \begin{matrix} \text{ svec }\left( v_1(x^1)v_1(x^1)^T\right) \\ \vdots \\ \text{ svec }\left( v_1(x^L)v_1(x^L)^T\right) \end{matrix}\right] \end{aligned}$$
is a full rank matrix, by induction over n, the dimension of x. Assume that \(n=1\). Because \(\varDelta (1,2)= \{0,\frac{1}{2},1\}\), it is clear that the rank of the matrix \(V^1_1=\left[ \begin{matrix} 1 & 0 & 0\\ 1 & \frac{\sqrt{2}}{2} & \frac{1}{4}\\ 1 & \sqrt{2} & 1 \end{matrix}\right] \) is 3, which means that \(V^1_1\) is a full rank matrix.

Now, suppose that \(V^1_n\) is a full rank matrix, and let us show that \(V^1_{n+1}\) is full rank as well. When \(x\in {\mathbb {R}}^{n+1}\), we can partition the points in \(\varDelta (n+1,2)\) into three classes:

  (I) points with \(x_{n+1}=0\). These points can be generated by considering all of the points in \(\varDelta (n,2)\), and adding a 0 as their last component.

  (II) points with \(x_{n+1}=\frac{1}{2}\). The points in this class can be sub-partitioned into two groups: (i) points with one nonzero component; (ii) points with two nonzero components.

  (III) points with \(x_{n+1}=1\). Clearly, there is only one point in this class.

According to the definition of \(\text{ svec }\left( v_1(x)v_1(x)^T\right) \), each column of \(V^1_{n+1}\) is related to a monomial \(x^\gamma \), where \(\gamma \in {\mathbb {N}}_2^{n+1}\). Let us order the columns of \(V^1_{n+1}\) as follows: first we put all of the columns related to \(x^{\alpha }\), where \((\alpha ,0)\in {\mathbb {N}}^{n+1}_2\), and after that the columns related to \(x_{n+1},\ x_{n+1}^2,\ x_{n+1}x_i,\ i=1,\ldots ,n\). So, because each row of \(V^1_{n+1}\) is related to a point in \(\varDelta (n+1,2)\), after ordering its rows according to the classes (I), (II)(i), (II)(ii) and (III) above, the matrix looks like this:

$$\begin{aligned} V^1_{n+1}=\left[ \begin{matrix} V^1_n & 0 & 0 & 0\\ a_2 & \frac{\sqrt{2}}{2} & \frac{1}{4} & 0\\ a_1 & \frac{\sqrt{2}}{2}\mathbf {1} & \frac{1}{4}\mathbf {1} & \frac{\sqrt{2}}{4}I_n\\ a_3 & \sqrt{2} & 1 & 0 \end{matrix}\right] , \end{aligned}$$

for some \(a_1\in {\mathbb {R}}^{n\times L}\) and \(a_2, a_3 \in {\mathbb {R}}^{1 \times L}\), where \(\mathbf {1}\) denotes the all-ones vector. Due to the induction assumption, \(V^1_n\) is a full rank matrix; moreover, the lower-right block is nonsingular (the rows of classes (II)(i) and (III) form an invertible \(2\times 2\) system in the \(x_{n+1}\) and \(x_{n+1}^2\) columns, and the \(x_{n+1}x_i\) columns carry the nonsingular diagonal block \(\frac{\sqrt{2}}{4}I_n\)). Together, this implies that \(V^1_{n+1}\) is a full rank matrix. Therefore, the constraints in the first iteration of the BSOS hierarchy are linearly independent.\(\square \)

In Theorem 4, we proved that if \(d=1\), then all of the constraints in (5) are linearly independent. In the next theorem, we prove that for \(d\ge 2\), if we rewrite \(H_d\) as \([{\bar{H}}_d,V_n^d]\), where

$$\begin{aligned} V^d_n:=\left[ \begin{matrix} \text{ svec }\left( v_1(x^1)v_1(x^1)^T\right) \\ \vdots \\ \text{ svec }\left( v_1(x^L)v_1(x^L)^T\right) \end{matrix}\right] , \qquad x^1,\ldots ,x^L \in \varDelta (n,2d), \end{aligned}$$

and \({\bar{H}}_d\) contains the remaining columns of \(H_d\) (those associated with the \(\lambda _{\alpha \beta }\)), then \(Rank(H_d)=Rank({\bar{H}}_d)\).

Theorem 5

Suppose that f is quadratic, that \(d\ge 2\), and that \(\varTheta \subseteq \varDelta (n,2d)\). For the general problem (1) with sign constraints on all of the variables, the equality constraints in (5) corresponding to the points in \(\varTheta \) are linearly independent if and only if the rows of \({\bar{H}}_d\) corresponding to the points in \(\varTheta \) are linearly independent.

Proof

The ‘if’ part is trivial.

To prove the ‘only if’ part, without loss of generality we assume that \(x^p, p=1,\ldots ,t\) generate linearly independent constraints, which means that the first t rows of \(H_d\) are linearly independent. Since the objective function f is quadratic, \(b_d\) is a linear combination of the columns of \(V^d_n\). Because of the sign constraints on all of the variables, each column of \(V_n^d\) is also, up to a positive scalar, a column of \({\bar{H}}_d\), for \(d\ge 2\). This means that \(V_n^d\) adds nothing to the column space of \({\bar{H}}_d\), which implies that the first t rows of \({\bar{H}}_d\) are linearly independent. \(\square \)

After eliminating the equality constraints in the pooling problem (8)–(16), we rewrite the model with sign constraints on all of the remaining variables. So, when using Theorem 5 to find the linearly independent constraints, we only need to check \({\bar{H}}_d\).

Theorem 6

Fix \(d\ge 2\). Consider \({\hat{H}}_d\), the matrix consisting of all columns of \({\bar{H}}_d\) related to \(( \alpha , \beta )\) with \(\beta = 0\). Then \(\text{ Range }({\bar{H}}_d)=\text{ Range }({\hat{H}}_d)\).

Proof

Since we consider \(\beta =0\), we can write \({\hat{H}}_d\) as follows:

$$\begin{aligned} {\hat{H}}_d=\left[ \begin{matrix} \left( g(x^1)^{\alpha }\right) _{\alpha \in {\mathbb {N}}^m_d}\\ \vdots \\ \left( g(x^L)^{\alpha }\right) _{\alpha \in {\mathbb {N}}^m_d} \end{matrix} \right] , \end{aligned}$$

where \(L={n+2d \atopwithdelims ()2d}\), \(g(x)=\left( g_1(x),\ldots ,g_m(x)\right) \), and, for each \(\alpha \in {\mathbb {N}}^m_d\), \(g(x^p)^{\alpha }=\prod _{j=1}^{m}g_j(x^p)^{\alpha _j}\), so that \( \left( g(x^p)^{\alpha }\right) _{\alpha \in {\mathbb {N}}^m_d} \in {\mathbb {R}}^{1 \times {m+d \atopwithdelims ()d}},\ p=1,\ldots ,L\).

Because the columns of \({\hat{H}}_d\) are a subset of the columns of \({\bar{H}}_d\), we have \(\text{ Range }({\hat{H}}_d)\subseteq \text{ Range }({\bar{H}}_d)\). To prove the other containment, we show that all columns of \({\bar{H}}_d\) are linear combinations of the columns of \({\hat{H}}_d\). Each column of \({\bar{H}}_d\) is related to a function \(h_{\alpha \beta }(x)\) for some \((\alpha ,\beta )\in {\mathbb {N}}_d^{2m}\). If \(\beta =0\) for a column of \({\bar{H}}_d\), then it is a column of \({\hat{H}}_d\). Now consider a column with \(\beta \ne 0\). The function \(h_{\alpha \beta }(x)\) related to this column is equal to

$$\begin{aligned} \prod _{j=1}^{m}g_j(x)^{\alpha _j}\prod _{j=1}^{m}\left( 1-g_j(x)\right) ^{\beta _j}= g(x)^{\alpha }\left( \sum _{i=1}^{w}a_ig(x)^{\gamma _i}\right) , \end{aligned}$$

for some \(\gamma _i \in {\mathbb {N}}^m_{|\beta |}\) and \(a_i\in {\mathbb {R}}, i=1,\ldots ,w\). Hence, \( h_{\alpha \beta }(x)=\sum _{i=1}^{w}a_ig(x)^{\gamma _i+\alpha } \). Because \(\gamma _i+\alpha \in {\mathbb {N}}^m_{d}\) (since \(|\gamma _i|\le |\beta |\) and \(|\alpha |+|\beta |\le d\)), the product \(g(x)^{\gamma _i+\alpha }\) is related to a column of \({\hat{H}}_d\), for each \(i=1,\ldots ,w\). This means that any column of \({\bar{H}}_d\) is a linear combination of the columns in \({\hat{H}}_d\). \(\square \)

By Theorem 6, to find the number of linearly independent constraints in (5), we only need to check the columns related to \(h_{\alpha \beta }(x)\) with \(\beta =0\).
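In practice, the linearly independent constraints can be found numerically. The following sketch (ours) builds \({\hat{H}}_d\) for given constraint functions and sample points and returns its rank, on a toy instance with made-up linear \(g_j\):

```python
import numpy as np
from itertools import product

def multi_indices(m, d):
    return [a for a in product(range(d + 1), repeat=m) if sum(a) <= d]

def rank_H_hat(gs, points, d, tol=1e-8):
    """Rank of H_hat_d: rows indexed by sample points, columns by g(x)^alpha."""
    H = np.array([[np.prod([g(x) ** a for g, a in zip(gs, alpha)])
                   for alpha in multi_indices(len(gs), d)]
                  for x in points])
    return np.linalg.matrix_rank(H, tol)

# Toy instance: n = 2, three linear g's, level d = 2, sampled on Delta(2, 4).
gs = [lambda x: x[0], lambda x: x[1], lambda x: 1 - x[0] - x[1]]
pts = [(i / 4, j / 4) for i in range(5) for j in range(5) if i + j <= 4]
print(rank_H_hat(gs, pts, 2), len(pts))   # 6 independent rows out of 15
```

Here the products \(g(x)^\alpha \) only span the polynomials of degree at most 2 in two variables (dimension 6), so only 6 of the 15 sampled constraints are linearly independent.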

It is clear that the results in this paper, except Theorem 4, can be adapted to the LP bounds (3); in fact, Theorems 5 and 6 hold at every level, even for \(d=1\). In Table 4, the results of solving the pooling problem instances of Table 1 are shown after eliminating the linearly dependent constraints, both for the LP bounds (3) and for \({{\hat{q}}}_d^1\) in (27). Note that the computational times at the \(d=2\) and \(d=3\) levels are greatly reduced compared to the times in Table 3. For some instances, because of the large number of constraints at the last level of the hierarchy, we could not find the number of linearly independent constraints, and we put “-” in Table 4. This table also shows how much stronger the BSOS hierarchy is compared to the LP bounds (3), after reducing the number of variables and deleting the linearly dependent constraints. As one can see, the main difference between the BSOS hierarchy and (3) is at the first level, where the number of independent constraints in (3) is much smaller than in the BSOS hierarchy. Whenever the two hierarchies differ, the value for the LP bounds (3) is given in parentheses in Table 4. It can be seen that there is a trade-off between using (3) and the BSOS hierarchy: using the LP bounds, one may solve each level faster (4 cases), but the lower bound can be strictly weaker than the one from the BSOS hierarchy (2 cases).

Table 4 Results for computing the bounds from (3) and \({{\hat{q}}}_d^1\) in (27) after elimination of linearly dependent constraints

4.3 Lower bounds using PQ-formulation

Up to now, we evaluated the BSOS hierarchy on the P-formulation. Since the McCormick relaxation (Sect. 3.1) of the PQ-formulation is stronger than that of the P-formulation (Gupte et al. 2017), it is worthwhile to evaluate the BSOS hierarchy on the PQ-formulation as well. In Table 5 we present these results for the PQ-formulation. To deal with the equality constraints, we replace each of them by two inequalities.

Table 5 Results for computing the bounds \({{\hat{q}}}_d^1\) in (27) after elimination of linearly dependent constraints on the PQ-formulation

4.4 Upper bound for the number of linearly independent constraints

According to Theorem 6, to find the number of linearly independent constraints in (5) for \(d\ge 2\), we only need to find the dimension of the linear space, say \(N_d\), spanned by \(\{g(x)^{\alpha }\}_{\alpha \in {\mathbb {N}}^m_{d}}\). Hence, \(\dim (N_d)\) is an upper bound on the number of linearly independent constraints. In this part we give an upper bound on \(\dim (N_d)\), and thereby on the number of linearly independent constraints in (5).

It is clear that \(N_d\) is a subspace of the linear space \(M_d\) spanned by \(\left\{ w(x)^{\alpha }\right\} _{\alpha \in {\mathbb {N}}^{\omega }_{d}}\), where w(x) is a vector containing all of the monomials appearing in (1), and \(\omega \) is the size of w(x). Therefore, \(\dim (M_d)\) is an upper bound on \(\dim (N_d)\), and hence an upper bound on the number of linearly independent constraints at each level of the BSOS hierarchy.

In the rest of this part, we determine \(\dim (M_d)\) for the pooling problems, assuming that the number of outgoing flows from each pool is equal to J. After elimination of the equality constraints in the pooling problem (8)–(16), the functions defining the inequality constraints can be partitioned into three classes:

  (I) bilinear functions;

  (II) the coordinate functions \(x_i, i=1,\ldots ,n\);

  (III) some other affine functions.

The bilinear functions are those related to constraints (15) and (16), or those related to the constraints (13) after elimination of equality constraints. Hence, the only bilinear terms in the reformulated problem are \(p_{lk}y_{lj}\), for each pool l and specification k, where there is an outgoing flow from pool l to output j. Therefore,

$$\begin{aligned} M_d = \left\langle \left\{ \left( 1,\left\{ y_{ij} \right\} _{i\in {\mathcal {I}},\, j\in {\mathcal {J}}}, \left\{ y_{lj} \right\} _{l \in {\mathcal {L}},\, j\in {\mathcal {J}}}, \left\{ y_{il} \right\} _{(i,l) \in \bar{{\mathcal {I}}}},\left\{ p_{lk} \right\} _{(l,k)\in \bar{{\mathcal {L}}}}, \left\{ p_{lk}y_{lj} \right\} _{(l,k,j)\in \bar{{\mathcal {J}}}} \right) ^{\alpha } \right\} _{\alpha \in {\mathbb {N}}_d^{\omega }} \right\rangle , \end{aligned}$$

where \(\bar{{\mathcal {I}}}, \bar{{\mathcal {L}}}\) and \(\bar{{\mathcal {J}}}\) respectively contain the tuples (i, l), (l, k) and (l, k, j) for which \(y_{il}, p_{lk}\) and \(p_{lk}y_{lj}\) appear in (8)–(16) after elimination of the equality constraints, and

$$\begin{aligned} \omega =1+I\times J+L\times J+|\bar{{\mathcal {I}}}|+|\bar{{\mathcal {L}}}|+|\bar{{\mathcal {J}}}|. \end{aligned}$$

Clearly, the number of variables in the pooling problem (8)–(16) after elimination of the equality constraints is \(I\times J+L\times J+|\bar{{\mathcal {I}}}|+|\bar{{\mathcal {L}}}|\). For \(d=1\), we proved in Theorem 4 that all of the constraints in (5) are linearly independent, and their number is \({n+2 \atopwithdelims ()2}\). For \(d\ge 2\), we seek the monomials up to degree 2d that appear in \(M_d\). If \(d=2\), the number of monomials of degree at most 2 is \({n+2 \atopwithdelims ()2}\). The number of monomials of degree 3 that appear in \(M_d\) is at most

$$\begin{aligned} K\times L\times \left[ {n+1 \atopwithdelims ()2}-{n-J+1 \atopwithdelims ()2}\right] , \end{aligned}$$

because, for each \(k \in {\mathcal {K}}\) and \(l\in {\mathcal {L}}\), the only way of obtaining a monomial of degree 3 is to multiply a bilinear term \(p_{lk}y_{lj}\) by a variable, which gives at most \({n+1 \atopwithdelims ()2}-{n-J+1 \atopwithdelims ()2}\) monomials of degree 3 (the number of degree-2 monomials containing at least one of the variables \(y_{lj}, j\in {\mathcal {J}}\)). Finally, the number of monomials of degree 4 that appear in \(M_d\) is at most \({K\times L\times J \atopwithdelims ()2}+K\times L\times J\), because the only way to obtain such a monomial is as the product of two (possibly equal) bilinear terms. Therefore, the number of linearly independent constraints for \(d=2\) is at most

$$\begin{aligned} {n+2 \atopwithdelims ()2}+K\times L\times \left[ {n+1 \atopwithdelims ()2}-{n-J+1 \atopwithdelims ()2}\right] +{K\times L\times J \atopwithdelims ()2}+K\times L\times J. \end{aligned}$$
(28)

With the same line of reasoning as above, the number of monomials with degree at most 6 for \(d=3\) is less than or equal to

$$\begin{aligned}&\underbrace{{n+3 \atopwithdelims ()3}}_{\begin{array}{c}\text {monomials up} \\ \text {to degree 3}\end{array}}+ \underbrace{K\times L\times \left[ {n+2 \atopwithdelims ()3}-{n-J+2 \atopwithdelims ()3}\right] }_{\begin{array}{c}\text {monomials of} \\ \text { degree 4}\end{array}}\nonumber \\&+\underbrace{K\times L\times \left[ {n+2 \atopwithdelims ()3}-{n-J+2\atopwithdelims ()3}-J\times {n-J+1\atopwithdelims ()2}\right] }_{\begin{array}{c}\text {monomials of}\\ \text { degree 5}\end{array}}\nonumber \\&+ \underbrace{\left[ {K\times L\times J\atopwithdelims ()3}+K\times L\times J+2{K\times L\times J\atopwithdelims ()2}\right] }_{\begin{array}{c}\text {monomials of} \\ \text { degree 6}\end{array}}. \end{aligned}$$
(29)

Example 2

Consider the reformulated model of Haverly1 from Example 1. The only bilinear terms in it are \(x_1x_2\) and \(x_1x_3\). So,

$$\begin{aligned} M_d=\left\langle \left\{ \left( 1, x_1,x_2,x_3,x_4,x_5,x_1x_2,x_1x_3\right) ^\alpha \right\} _{\alpha \in {\mathbb {N}}_d^{8}} \right\rangle . \end{aligned}$$

Therefore, the number of linearly independent constraints is at most

$$\begin{aligned} {7 \atopwithdelims ()2}+{6 \atopwithdelims ()2}-{4 \atopwithdelims ()2}+{2 \atopwithdelims ()2}+2=33, \end{aligned}$$

if \(d=2\), and

$$\begin{aligned} {8 \atopwithdelims ()3}+ 2\times {7 \atopwithdelims ()3}-2\times {5 \atopwithdelims ()3}-2\times {4 \atopwithdelims ()2}+2\times {2 \atopwithdelims ()2}+2=98, \end{aligned}$$

if \(d=3\). \(\square \)
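The counts 33 and 98 can be double-checked by brute force: since distinct monomials are linearly independent, \(\dim (M_d)\) equals the number of distinct monomials among the products \(w(x)^\alpha \). A sketch (ours) for the vector w(x) of Example 2:

```python
from itertools import product

# w(x) = (1, x1, ..., x5, x1*x2, x1*x3), encoded as exponent vectors over x1..x5
w = [(0, 0, 0, 0, 0),
     (1, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 0),
     (0, 0, 0, 1, 0), (0, 0, 0, 0, 1),
     (1, 1, 0, 0, 0),   # x1*x2
     (1, 0, 1, 0, 0)]   # x1*x3

def dim_M(d):
    """Number of distinct monomials among w(x)^alpha, alpha in N^8_d (= dim M_d)."""
    mons = set()
    for alpha in product(range(d + 1), repeat=len(w)):
        if sum(alpha) <= d:
            mons.add(tuple(sum(a * e[i] for a, e in zip(alpha, w))
                           for i in range(5)))
    return len(mons)

print(dim_M(2), dim_M(3))   # prints 33 98: the bounds above are attained here
```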

4.5 Improving lower bounds by adding valid inequalities

Adding redundant constraints to the original problem (1) increases the number of linear variables in (4); this introduces some flexibility at each level of the hierarchy and may provide a stronger lower bound. As mentioned in Sect. 3.1, for each bilinear term in the P- or PQ-formulations there are four valid inequalities given by (19). In Table 6 we present the results of adding these valid inequalities to the P-formulation and using \({{\hat{q}}}_d^1\) in (27) to solve the problem. At each level of the hierarchy in this table, we use the upper bounds on the number of constraints proposed in Sect. 4.4. As Table 6 shows, this improvement helps to obtain the optimal values of Haverly1, Haverly2, Ben-Tal4, and DeyGupte4, and to get a good approximation of the optimal value of Foulds2 at the second level of the hierarchy. Also, for Adhya1, Adhya2 and Adhya4 we obtained better lower bounds than the PQ-linear relaxation values in Table 1.

Table 6 Results for computing the lower bounds \({{\hat{q}}}_d^1\) for the P-formulation after adding valid inequalities and considering (28) and (29) as the upper bound on the number of linearly independent constraints

5 Conclusion

In this paper we analysed and evaluated the bounded degree sum-of-squares (BSOS) hierarchy of Lasserre et al. (2015) for a class of bilinear optimization problems, namely pooling problems. We showed that this approach is successful in obtaining the global optimal values for smaller instances, but scalability remains a problem for larger instances. In particular, the number of nonnegative variables and linear constraints grows quickly with the level of the hierarchy. We have shown that it is possible to eliminate some variables and redundant linear constraints in the hierarchy in a systematic way, and this goes some way toward improving scalability. More ideas are needed, though, if this approach is to become competitive for medium- to larger-scale pooling problems.