1 Introduction

We consider the polynomial optimization problem:

$$\begin{aligned} (P):\quad f^*=\displaystyle \min _x\,\{f(x): x\in \mathbf {K}\,\}, \end{aligned}$$
(1)

where \(f\in \mathbb {R}[x]\) is a polynomial and \(\mathbf {K}\subset \mathbb {R}^n\) is the basic semi-algebraic set

$$\begin{aligned} \mathbf {K}=\{\,x\in \mathbb {R}^n: g_j(x)\,\ge \,0,\quad j=1,\ldots ,m\}, \end{aligned}$$
(2)

for some polynomials \(g_j\in \mathbb {R}[x]\), \(j=1,\ldots ,m\). In order to approximate (and sometimes solve exactly) (P), one may instead solve a hierarchy of convex relaxations of (P) of increasing size, for instance:

\(\bullet \) Semidefinite relaxations, defined in Lasserre (2001) and based on Putinar’s certificate of positivity on \(\mathbf {K}\) Putinar (1993), where the d-th convex relaxation of the hierarchy is a semidefinite program given by

$$\begin{aligned} \gamma _d=\max _{t,\sigma _j}\,\left\{ \,t: f-t=\sigma _0+\sum _{j=1}^m \sigma _j\,g_j\,\right\} . \end{aligned}$$
(3)

The unknowns \(\sigma _j\) are sums of squares (SOS) polynomials with the degree bound constraint \(\mathrm{degree}(\sigma _jg_j)\le 2d\), \(j=0,\ldots ,m\) (with \(g_0:=1\)), and the expression in (3) is a certificate of positivity on \(\mathbf {K}\) for the polynomial \(x\mapsto f(x)-t\).

\(\bullet \) LP-relaxations based on Krivine-Stengle’s certificate of positivity on \(\mathbf {K}\) Krivine (1964), Stengle (1974), where the d-th convex relaxation of the hierarchy is a linear program given by

$$\begin{aligned} \theta _d=\max _{\lambda \ge 0,t}\,\left\{ t:f-t=\sum _{(\alpha ,\beta )\in \mathbb {N}^{2m}_{d}}\lambda _{\alpha \beta }\,\prod _{j=1}^m\left( g_j^{\alpha _j}\,(1-g_j)^{\beta _j}\right) \right\} , \end{aligned}$$
(4)

where \(\mathbb {N}^{2m}_{d}=\{(\alpha ,\beta )\in \mathbb {N}^{2m}:\sum _j\alpha _j+\beta _j\le d\}\). The unknowns are t and the nonnegative scalars \(\lambda =(\lambda _{\alpha \beta })\), and it is assumed that \(0\le g_j\le 1\) on \(\mathbf {K}\) (possibly after scaling) and that the family \(\{1,g_j\}\) generates the algebra \(\mathbb {R}[x]\) of polynomials. Problem (4) is an LP because stating that the two polynomials on both sides of “\(=\)” are equal yields linear constraints on the \(\lambda _{\alpha \beta }\)’s. For instance, the LP-hierarchy from the Sherali–Adams RLT Sherali and Adams (1990) and its variants Sherali and Adams (1999) are of this form.
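To make this linearity concrete, consider a toy illustration (not one of the test problems of this paper): \(n=1\), \(f(x)=x\), \(g_1(x)=x\), \(g_2(x)=1-x\), so that \(\mathbf {K}=[0,1]\) and \(0\le g_j\le 1\) on \(\mathbf {K}\). For \(d=1\), the identity in (4) reads

$$\begin{aligned} x-t=\lambda _{1}+\lambda _{2}\,x+\lambda _{3}\,(1-x)+\lambda _{4}\,(1-x)+\lambda _{5}\,x, \end{aligned}$$

where the five products \(1,\,g_1,\,g_2,\,1-g_1,\,1-g_2\) are indexed by the pairs \((\alpha ,\beta )\in \mathbb {N}^{4}_{1}\). Equating coefficients of the monomials \(1\) and \(x\) gives the linear constraints \(-t=\lambda _1+\lambda _3+\lambda _4\) and \(1=\lambda _2+\lambda _5-\lambda _3-\lambda _4\), and maximizing t over \(\lambda \ge 0\) yields \(\theta _1=0=f^*\).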

In both cases, \((\gamma _d)\) and \((\theta _d)\), \(d\in \mathbb {N}\), provide two monotone nondecreasing sequences of lower bounds on \(f^*\) and if \(\mathbf {K}\) is compact, then both converge to \(f^*\) as d increases. For more details as well as a comparison of such relaxations, the reader is referred to, e.g., Lasserre (2009, 2002) and Laurent (2003), as well as Chlamtac and Tulsiani (2012) for the impact of LP- and SOS-hierarchies on approximation algorithms in combinatorial optimization.

Of course, in principle, one would much prefer to solve LP-relaxations rather than semidefinite relaxations (i.e. compute \(\theta _d\) rather than \(\gamma _d\)) because present LP-software packages can solve sparse problems with millions of variables and constraints, which is far from being the case for today’s semidefinite solvers. Therefore the hierarchy (3) applies to problems of modest size only, although if some sparsity or symmetry is taken into account then specialized variants as in Waki et al. (2006) can handle problems of much larger size. On the other hand, the LP-relaxations (4) suffer from several serious theoretical and practical drawbacks. For instance, it has been shown in Lasserre (2002, 2009) that the LP-relaxations cannot be exact for most convex problems, i.e., the sequence of the associated optimal values converges to the global optimum only asymptotically, and not in finitely many steps. Moreover, the LPs of the hierarchy are numerically ill-conditioned. This is in contrast with the semidefinite relaxations (3), for which finite convergence takes place for convex problems where \(\nabla ^2f(x^*)\) is positive definite at every minimizer \(x^*\in \mathbf {K}\) (see de Klerk and Laurent 2011, Corollary 3.3) and occurs at the first relaxation for SOS-convex (see footnote 1) problems (Lasserre 2009, Theorem 3.3). In fact, as demonstrated in recent works of Marshall (2009) and Nie (2014), finite convergence is generic even for non convex problems.

1.1 Contribution

This paper is in the vein of recent attempts in Lasserre (2013) and Ahmadi and Majumdar (2014) to overcome the important computational burden associated with the standard SOS-hierarchy (3). In particular, in Lasserre (2013) we have suggested another hierarchy of convex relaxations which combines some of the advantages of the SOS- and LP- hierarchies (3) and (4). In the present paper we take advantage of attractive features of the SDPT3 solver Toh et al. (1999, 2003) to provide an effective implementation of this new hierarchy. First preliminary tests on a sample of non convex problems are encouraging and suggest that this new hierarchy might be efficient. This new hierarchy is another type of SOS-hierarchy labelled BSOS (for hierarchy with bounded degree SOS) with the following attractive features:

\(\bullet \) In contrast to the standard SOS-hierarchy (3), for each semidefinite program in the hierarchy, the size \(\binom{n+k}{n}\) of the semidefinite matrix variable is now fixed, parametrized by an integer k that one fixes in advance. This integer k determines the degree of a certain SOS polynomial (for instance one may fix \(k=2\)), whence the label BSOS (for “bounded”-SOS). Recall that in the standard SOS-hierarchy (3) the size of the semidefinite matrix variable is \(\binom{n+d}{n}\) at step d of the hierarchy (a small numerical illustration is given after this list).

\(\bullet \) In contrast to the LP-hierarchy (4), finite convergence occurs at the first step in the hierarchy for a large class of convex problems: typically convex problems defined with convex quadratic polynomials or SOS-convex polynomials of degree at most 2k. Recall that such finite convergence is impossible for the LP-hierarchy (4).

\(\bullet \) Just as in the standard SOS-hierarchy (3), there also exists a sufficient condition for finite convergence of the hierarchy. Namely, it suffices to check whether at an optimal solution of the corresponding SDP some associated moment matrix is rank-one.

\(\bullet \) Last but not least, to implement this hierarchy one uses two techniques that dramatically alleviate the computational burden associated with a standard (careless) implementation. Namely, to declare that two polynomials are identical one imposes that their values agree on finitely many randomly chosen points (instead of equating their coefficients). In addition, the SDP solver SDPT3 Toh et al. (1999, 2003) can be used to handle efficiently the rank-one matrices that appear in our positivity certificate.
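As a rough numerical illustration of the size comparison in the first item of the list above (the values below are ours and are not taken from the paper’s experiments): for \(n=20\) variables and \(k=2\), the BSOS block has size \(\binom{22}{2}=231\) independently of d, whereas the standard SOS block at step \(d=4\) has size \(\binom{24}{4}=10626\). In Matlab:

% Block sizes: fixed nchoosek(n+k,n) for BSOS vs nchoosek(n+d,n) for the
% standard SOS-hierarchy (illustrative values, not from the paper).
n = 20;  k = 2;
bsos_size = nchoosek(n + k, n);                       % = 231, independent of d
sos_sizes = arrayfun(@(d) nchoosek(n + d, n), 1:4);   % = [21 231 1771 10626]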

Preliminary computational experiments First, we have compared our results with those obtained with the GloptiPoly software Henrion et al. (2009) (devoted to solving the SOS-hierarchy (3)) on a sample of non convex problems with up to 20 variables. For problems with low degree (in the initial data) and/or low dimension we obtain the global optimum, whereas good lower bounds are always obtained for problems with high degree or higher dimension (e.g. problems with degree 4 and up to 20 variables).

Next, we have also tested the LP-hierarchy (4) on a sample of convex problems and, as expected, the convergence is very poor and the resulting LPs become ill-conditioned. In addition, the LPs can be expensive to solve as the LP data are typically dense. In contrast, the new hierarchy (with the smallest value \(k=1\) of its parameter) converges at the first step even though some of the problems are defined with polynomials of degree larger than 2.

We have also considered a sample of non convex quadratic problems of the form \(\inf \{x^TAx: x\in \Delta \}\), where \(\Delta \subset \mathbb {R}^n\) is the canonical simplex, and A is a randomly generated real symmetric matrix with r negative eigenvalues and \(n-r\) positive eigenvalues. For all problems that could be solved with GloptiPoly (up to \(n=20\) variables) we obtain the optimal values. For the other problems, with \(n=40, 50, 100\) variables, only the first (dense) relaxation of GloptiPoly can be implemented and it yields only a lower bound on the global optimum. For those problems, a better lower bound is obtained in a reasonable amount of time by running the BSOS hierarchy.

Finally, we have considered the minimization of quadratic and quartic polynomials (with up to 40 variables) on the Euclidean unit ball intersected with the positive orthant. Again in those examples only the first SDP-relaxation of GloptiPoly can be implemented, providing only a lower bound. In contrast, BSOS solves all problems at step 2 of the hierarchy in a reasonable amount of time.

Of course this new hierarchy of semidefinite relaxations also has its drawbacks (at least in its present version). Namely, some submatrix (of the matrix used to describe the linear equality constraints of the resulting SDP) is fully dense and many of these linear constraints are nearly dependent, which yields a lack of accuracy in the optimal solution when the order of relaxation d is increased.

2 Main result

2.1 Notation and definitions

Let \(\mathbb {R}[x]\) be the ring of polynomials in the variables \(x=(x_1,\ldots ,x_n)\). Denote by \(\mathbb {R}[x]_d\subset \mathbb {R}[x]\) the vector space of polynomials of degree at most d, whose dimension is \(s(d)=\binom{n+d}{d}\), with, e.g., the usual canonical basis \((x^\alpha )\) of monomials. Also, denote by \(\Sigma [x]\subset \mathbb {R}[x]\) (resp. \(\Sigma [x]_d\subset \mathbb {R}[x]_{2d}\)) the space of sums of squares (SOS) polynomials (resp. SOS polynomials of degree at most 2d). If \(f\in \mathbb {R}[x]_d\), we write \(f(x)=\sum _{\alpha \in \mathbb {N}^n_d}f_\alpha x^\alpha \) in the canonical basis and denote by \(\varvec{f}=(f_\alpha )\in \mathbb {R}^{s(d)}\) its vector of coefficients. Finally, let \(\mathcal {S}^n\) denote the space of \(n\times n\) real symmetric matrices, with inner product \(\langle \mathbf {A},\mathbf {B}\rangle =\mathrm{trace}\,\mathbf {A}\mathbf {B}\). We use the notation \(\mathbf {A}\succeq 0\) (resp. \(\mathbf {A}\succ 0\)) to denote that \(\mathbf {A}\) is positive semidefinite (resp. positive definite). With \(g_0:=1\), the quadratic module \(Q(g_1,\ldots ,g_m)\subset \mathbb {R}[x]\) generated by the polynomials \(g_1,\ldots ,g_m\) is defined by

$$\begin{aligned} Q(g_1,\ldots ,g_m):=\left\{ \sum _{j=0}^m\sigma _j\,g_j:\sigma _j\in \Sigma [x]\,\right\} . \end{aligned}$$

We briefly recall two important theorems by Putinar (1993) and Krivine (1964), Stengle (1974) respectively, on the representation of polynomials that are positive on \(\mathbf {K}\).

Theorem 1

Let \(g_0=1\) and \(\mathbf {K}\) in (2) be compact.

(a) If the quadratic polynomial \(x\mapsto M-\Vert x\Vert ^2\) belongs to \(Q(g_1,\ldots ,g_m)\) and if \(f\in \mathbb {R}[x]\) is strictly positive on \(\mathbf {K}\), then \(f\in Q(g_1,\ldots ,g_m)\).

(b) Assume that \(0\le g_j\le 1\) on \(\mathbf {K}\) for every j, and the family \(\{1,g_j\}\) generates \(\mathbb {R}[x]\). If f is strictly positive on \(\mathbf {K}\), then

$$\begin{aligned} f=\sum _{\alpha ,\beta \in \mathbb {N}^m}c_{\alpha \beta }\,\prod _{j}\left( g_j^{\alpha _j}\,(1-g_j)^{\beta _j}\right) , \end{aligned}$$

for some (finitely many) nonnegative scalars \((c_{\alpha \beta })\).

2.2 The bounded-SOS-hierarchy (BSOS)

Consider the problem

$$\begin{aligned} {(P):} \quad f^*= \min \{ f(x) \mid x\in \mathbf {K}\} \end{aligned}$$

where \(\mathbf {K}\subset \mathbb {R}^n\) is the basic semi-algebraic set defined in (2), assumed to be compact. Moreover we also assume that \(g_j(x) \le 1\) for all \(x\in \mathbf {K}\) and \(j=1,\dots ,m\), and that \(\{1,g_j\}\) generates the ring of polynomials \(\mathbb {R}[x]\). For every fixed \(d\in \mathbb {N}\) and \(\lambda =(\lambda _{\alpha \beta })\), \((\alpha ,\beta )\in \mathbb {N}^{2m}_d\), let \(L_d(\cdot ,\lambda )\in \mathbb {R}[x]\) be the “Lagrangian” polynomial:

$$\begin{aligned} x\mapsto \, L_d(x,\lambda ):=f(x)-\sum _{(\alpha ,\beta )\in \mathbb {N}^{2m}_d}\lambda _{\alpha \beta }\,\prod _{j=1}^mg_j(x)^{\alpha _j} \,(1-g_j(x))^{\beta _j}. \end{aligned}$$
(5)

With \(k\in \mathbb {N}\) fixed, consider the family of optimization problems indexed by \(d\in \mathbb {N}\):

$$\begin{aligned} q_d^k := \sup _{\mathbf {\lambda },t}\,\{\, t \mid L_d(x,\lambda ) - t \in \Sigma [x]_{k}, \; \lambda \ge 0\,\},\qquad d\in \mathbb {N}. \end{aligned}$$
(6)

Observe that for fixed k and for each \(d\in \mathbb {N}\):

  • Computing \(q^k_d\) in (6) reduces to solving a semidefinite program, and since \(q^k_{d+1}\ge q^k_d\) for all \(d\in \mathbb {N}\), (6) defines a hierarchy of semidefinite programs.

  • The semidefinite constraint is associated with the constraint \(L_d(x,\lambda ) - t \in \Sigma [x]_{k}\) and the associated matrix has fixed size \(\binom{n+k}{n}\), independent of \(d\in \mathbb {N}\), a crucial feature for the computational efficiency of the approach.

  • If \(k=0\) then (6) is the linear program (4) and so \(\theta _d=q^0_d\le q^k_d\) for all d, k.

Interpretation For any fixed \(d\ge 1\), problem (P) is easily seen to be equivalent to the following problem by adding redundant constraints:

$$\begin{aligned} (\tilde{P}): \quad f^*= \min \{ f(x) \mid h_{\alpha \beta }(x)\ge 0\;\; \forall \; (\alpha ,\beta ) \in \mathbb {N}_d^{2m}\}, \end{aligned}$$

where \(\mathbb {N}_d^{2m} = \{ (\alpha ,\beta )\in \mathbb {N}^{2m} \mid |\alpha |+|\beta | \le d\}\) and \(h_{\alpha \beta }\in \mathbb {R}[x]\) is the polynomial

$$\begin{aligned} x\,\mapsto \,h_{\alpha \beta }(x):=\prod _{j=1}^m g_j(x)^{\alpha _j}(1-g_j(x))^{\beta _j},\qquad x\in \mathbb {R}^n. \end{aligned}$$
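For later use in the implementation (Sects. 2.3 and 3), note that evaluating \(h_{\alpha \beta }\) at a point x only requires the m values \(g_j(x)\); a minimal Matlab sketch (the function name is ours, not part of the paper's code):

function val = evalH(gvals, alpha, beta)
% gvals : m-vector with the values g_j(x) at a given point x
% alpha, beta : exponent vectors of the pair (alpha,beta) in N^{2m}
% returns h_{alpha,beta}(x) = prod_j g_j(x)^alpha_j * (1-g_j(x))^beta_j
val = prod(gvals(:).^alpha(:) .* (1 - gvals(:)).^beta(:));
end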

The Lagrangian dual of \((\tilde{P})\) is given by

$$\begin{aligned} (\tilde{P}^*_d):\quad \sup _{\lambda }\,\{\,G_d(\lambda ):\,\lambda \ge 0\,\} \end{aligned}$$

where the function \(G_d(\cdot )\) is given by

$$\begin{aligned} \lambda \,\mapsto G_d(\lambda ) := \inf _{x\in \mathbb {R}^n} \{\,L_d(x,\lambda ) \},\qquad \lambda \,\ge \,0, \end{aligned}$$

and \(L_d(\cdot ,\lambda )\) is the Lagrangian function defined in (5).

Now for a fixed \(\lambda \), the evaluation of \(G_d(\lambda )\) is computationally intractable. However, let \(k\in \mathbb {N}\) be fixed and observe that

$$\begin{aligned} G_d(\lambda ) = \inf _{x\in \mathbb {R}^n} L_d(x,\lambda )= & {} \sup _{t}\,\{ \,t \mid L_d(x,\lambda )-t \ge 0,\quad \forall \, x\,\}\\\ge & {} \sup _{t}\,\{\, t \mid L_d(x,\lambda )-t \in \Sigma [x]_{k}\,\}, \end{aligned}$$

where recall that \(\Sigma [x]_{k}\) is the space of SOS polynomials of degree at most 2k. Moreover,

$$\begin{aligned} f^*\,\ge \, \sup _{\lambda \ge 0}\, G_d(\lambda ) \; \ge \,q^k_d,\qquad \forall \,d\in \mathbb {N}. \end{aligned}$$

So the semidefinite program (6) can be seen as a tractable simplification of the intractable problem \(\sup _{\lambda \ge 0}G_d(\lambda )\). The linear program (4) (which is (6) with \(k=0\)) is an even more brutal simplification, so that \(q^k_d\ge q^0_d=\theta _d\) for all d, k. As a matter of fact we have the following more precise and interesting result.

Theorem 2

(Lasserre 2013) Let \(\mathbf {K}\subset \mathbb {R}^n\) in (2) be compact with nonempty interior and \(g_j(x) \le 1\) for \(x\in \mathbf {K}\) and \(j=1,\dots ,m\). Assume further that the family \(\{1,g_j\}\) generates the algebra \(\mathbb {R}[x]\). Let \(k\in \mathbb {N}\) be fixed and for each \(d\in \mathbb {N}\), let \(q^k_d\) be as in (6). Then

(a) The sequence \((q^k_d)\), \(d\in \mathbb {N}\), is monotone nondecreasing and \(q^k_d\rightarrow f^*\) as \(d\rightarrow \infty \).

(b) Moreover, assume that Slater’s condition holds, i.e., there exists \(x_0\in \mathbf {K}\) such that \(g_j(x_0)>0\) for all \(j=1,\ldots ,m\). If f and \(-g_j\), \(j=1,\ldots ,m\), are SOS-convex (see footnote 2) polynomials of degree at most 2k then \(q^k_1=f^*\), i.e., finite convergence takes place at the first relaxation in the hierarchy. In particular, when \(f,-g_j\) are convex quadratic polynomials then \(q^1_1=f^*\).

Proof

The first result is a direct application of Theorem 1(b) since for any integers d, k, \(f^*\ge q^k_d\ge q^0_d=\theta _d\), and by Theorem 1(b), \(\theta _d\rightarrow f^*\) as \(d\rightarrow \infty \). Next, if Slater’s condition holds and f and \(-g_j\), \(j=1,\ldots ,m\), are SOS-convex, then at any global minimizer \(x^*\in \mathbf {K}\) there exist nonnegative Lagrange-KKT multipliers \(\lambda \in \mathbb {R}^m_+\) such that

$$\begin{aligned}\nabla f(x^*)-\sum _j\lambda _j\,\nabla g_j(x^*)=0;\quad \lambda _j\,g_j(x^*)=0,\quad j=1,\ldots ,m.\end{aligned}$$

In other words, the Lagrangian \(x\mapsto L(x):=f(x)-f^*-\sum _j\lambda _j\,g_j(x)\) is SOS-convex and satisfies \(L(x^*)=0\) and \(\nabla L(x^*)=0\). By Helton and Nie (2010, Lemma 8, p. 33), L is SOS (of degree at most 2k), and thus \(q^k_1=f^*\). \(\square \)

2.3 The SDP formulation of (6)

To formulate (6) as a semidefinite program one has at least two possibilities, depending on how we state that two polynomials \(p,q\in \mathbb {R}[x]_d\) are identical: either by equating their coefficients (e.g. in the monomial basis), i.e., \(p_\alpha =q_\alpha \) for all \(\alpha \in \mathbb {N}^n_d\), or by equating their values on \(\binom{n+d}{n}\) generic points (e.g. randomly generated on the box \([-1,1]^n\)). In the present context of (6) we prefer the latter option since expanding the polynomial \(h_{\alpha \beta }(x)\) symbolically to get the coefficients with respect to the monomial basis can be expensive and memory intensive.

Let \(\tau = \max \{\mathrm{deg}(f), 2k, d\max _{j}\{\mathrm{deg}(g_j)\}\}\). Then for k fixed and for each d, we get

$$\begin{aligned} q_d^k= & {} \sup \Big \{ t \mid f(x) - t -\sum _{(\alpha ,\beta )\in \mathbb {N}^{2m}_{d}} \lambda _{\alpha \beta }\,h_{\alpha \beta }(x) = \langle Q,v_k(x)v_k(x)^T\rangle ;\nonumber \\&\ Q \in \mathcal {S}^{s(k)}_+,\lambda \ge 0\Big \} \end{aligned}$$
(7)
$$\begin{aligned}= & {} \sup \left\{ t\; \Big | \begin{array}{l} f(x^{(p)}) = t +\displaystyle \sum _{(\alpha ,\beta )\in \mathbb {N}^{2m}_{d}} \lambda _{\alpha \beta }\,h_{\alpha \beta }(x^{(p)}) + \langle Q,v_k(x^{(p)})v_k(x^{(p)})^T\rangle ,\\ p=1,\dots ,L, \; Q \in \mathcal {S}^{s(k)}_+,\; \lambda \ge 0,\; t\in \mathbb {R}\end{array} \right\} \nonumber \\ \end{aligned}$$
(8)

where \(L:=|\mathbb {N}^{n}_{\tau }|=\binom{n+\tau }{n}\) and \(\{ x^{(p)}\in \mathbb {R}^n \mid p =1,\dots , L\}\) form a set of generic points in \([-1,1]^n\), and \(v_k(x)\) is the vector of the monomial basis of \(\mathbb {R}[x]_k\), the vector space of polynomials of degree at most k (whose dimension is \(s(k)=\binom{n+k}{k}\)). Therefore, if the optimal value \(q_d^k\) of (7) is finite then every optimal solution \((q^k_d,\lambda ^*,Q^*)\) of (7) is also an optimal solution of (8).
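Schematically, all the data of (8) can thus be assembled by point evaluation only. Below is a minimal Matlab sketch under our own naming conventions (X, gfun, ABpairs, evalmonvec and evalpoly are assumed inputs/helpers, not part of the paper's code):

% Sketch: assemble the equality constraints of (8) by evaluation at the
% generic points (columns of X, drawn as in Remark 1 below).
% Assumed: X (n-by-L matrix of points), F (coefficient format of Sect. 3),
% gfun(x) returning the m-vector (g_1(x),...,g_m(x)), ABpairs (rows = the
% pairs (alpha,beta) in N^{2m}_d), evalmonvec(x,k) returning v_k(x).
sk = nchoosek(n + k, n);                  % size of the PSD block
L  = size(X, 2);                          % number of generic points
b  = zeros(L, 1);                         % right-hand sides f(x^{(p)})
H  = zeros(L, size(ABpairs, 1));          % H(p,i) = h_{alpha,beta}(x^{(p)})
V  = zeros(sk, L);                        % columns v_k(x^{(p)})
for p = 1:L
    x  = X(:, p);
    gv = gfun(x);
    b(p) = evalpoly(F, x);                % f(x^{(p)}), see the sketch in Sect. 3
    for i = 1:size(ABpairs, 1)
        H(p, i) = evalH(gv, ABpairs(i, 1:m), ABpairs(i, m+1:2*m));
    end
    V(:, p) = evalmonvec(x, k);
end
% Constraint p of (8):  b(p) = t + H(p,:)*lambda + <Q, V(:,p)*V(:,p)'>.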

Remark 1

To get a set of generic points one may first start with a set \(\Theta \) of L points randomly generated, e.g., according to the uniform distribution on \([0,1]^n\). Then it is well known that with probability one, the resulting collection of equality constraints in (8) is equivalent to the polynomial identity in (7). In principle, once such a set \(\Theta \) has been generated, the equivalence is guaranteed if the matrix

$$\begin{aligned} R(\Theta ):= \left[ v_\tau (x^{(1)}), \ldots , v_\tau (x^{(L)}) \right] \end{aligned}$$

is nonsingular. In the unlikely event that \(R(\Theta )\) is singular, one can generate another sample \(\Theta \) of random points and check whether the new matrix \(R(\Theta )\) is nonsingular. If it is still singular, the procedure is repeated until a nonsingular matrix \(R(\Theta )\) is obtained. However, in practice there is little need to do so since the above matrix \(R(\Theta )\) is nonsingular with probability one.
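This sampling-and-checking step may be sketched as follows (variable and helper names are ours; with probability one the loop below terminates after its first pass):

% Sketch: draw the L generic points and check nonsingularity of
% R(Theta) = [v_tau(x^{(1)}), ..., v_tau(x^{(L)})]  (helper evalmonvec is ours).
L = nchoosek(n + tau, n);
singular = true;
while singular
    X = rand(n, L);                       % uniform sample on [0,1]^n
    R = zeros(L, L);
    for p = 1:L
        R(:, p) = evalmonvec(X(:, p), tau);   % column v_tau(x^{(p)})
    end
    singular = (rank(R) < L);             % resample in the unlikely singular case
end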

2.4 Sufficient conditions for finite convergence

By looking at the dual of the semidefinite program (8) one obtains sufficient conditions for finite convergence. To describe the dual of the semidefinite program (8) we need to introduce some notation.

For every \(p=1,\ldots ,L\), denote by \(\delta _{x^{(p)}}\) the Dirac measure at the point \(x^{(p)}\in \mathbb {R}^n\) and let \(\langle q,\delta _{x^{(p)}}\rangle =q(x^{(p)})\) for all \(p=1,\ldots ,L\), and all \(q\in \mathbb {R}[x]\).

With a real sequence \(\mathbf {y}=(y_\alpha )\), \(\alpha \in \mathbb {N}^n_{2\ell }\), denote by \(\mathbf {M}_\ell (\mathbf {y})\) the moment matrix associated with \(\mathbf {y}\). It is a real symmetric matrix with rows and columns indexed by \(\mathbb {N}^n_\ell \), and with entries

$$\begin{aligned} \mathbf {M}_\ell (\mathbf {y})(\alpha ,\beta )=y_{\alpha +\beta },\qquad \forall \,\alpha ,\beta \in \mathbb {N}^n_\ell . \end{aligned}$$

Similarly, for \(j=1,\ldots ,m\), letting \(g_j(x)=\sum _\gamma g_{j\gamma }x^\gamma \), denote by \(\mathbf {M}_\ell (g_j\,\mathbf {y})\) the localizing matrix associated with \(\mathbf {y}\) and \(g_j\in \mathbb {R}[x]\). Its rows and columns are also indexed by \(\mathbb {N}^n_\ell \), with entries

$$\begin{aligned} \mathbf {M}_\ell (g_j\,\mathbf {y})(\alpha ,\beta )=\sum _{\gamma \in \mathbb {N}^n}g_{j\gamma }\,y_{\alpha +\beta +\gamma },\qquad \forall \,\alpha ,\beta \in \mathbb {N}^n_\ell . \end{aligned}$$

Moment and localizing matrices play an important role in semidefinite relaxations of polynomial optimization problems. For more details the interested reader is referred to Lasserre (2009).
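For concreteness, a dense construction of \(\mathbf {M}_\ell (\mathbf {y})\) from a moment vector may be sketched as follows (the exponent-enumeration helper monpowers and the ordering of y are our own conventions, not from the paper):

% Sketch: dense moment matrix M_ell(y). The rows of E enumerate the exponents
% alpha with |alpha| <= 2*ell, and y(i) corresponds to y_{E(i,:)}.
E  = monpowers(n, 2*ell);                 % assumed helper: exponents, one per row
B  = monpowers(n, ell);                   % index set N^n_ell for rows/columns
sB = size(B, 1);
M  = zeros(sB);
for i = 1:sB
    for j = 1:sB
        [~, loc] = ismember(B(i,:) + B(j,:), E, 'rows');   % position of alpha+beta
        M(i, j) = y(loc);
    end
end
% The localizing matrix M_ell(g_j*y) is obtained in the same way, with
% M(i,j) = sum over gamma of g_{j,gamma} * y_{alpha+beta+gamma}.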

The dual of the semidefinite program (8) reads

$$\begin{aligned} \begin{array}{rl} \tilde{q}_d^k := \displaystyle \inf _{\theta \in \mathbb {R}^L}&{}\displaystyle \sum _{p=1}^L \theta _p\,\langle f,\delta _{x^{(p)}}\rangle \\ \text{ s.t. } &{} \displaystyle \sum _{p=1}^L\theta _p\,(v_k(x^{(p)})\,v_k(x^{(p)})^T)\succeq 0\\ &{}\displaystyle \sum _{p=1}^L \theta _p\,\langle h_{\alpha \beta },\delta _{x^{(p)}}\rangle \ge 0,\quad (\alpha ,\beta )\in \mathbb {N}^{2m}_d\\ &{}\displaystyle \sum _{p=1}^L \theta _p=1. \end{array} \end{aligned}$$
(9)

(Notice that the weights \(\theta _p\) are not required to be nonnegative.) By standard weak duality in convex optimization, and for every fixed \(k\in \mathbb {N}\), one has

$$\begin{aligned} f^*\,\ge \,\tilde{q}^k_d\,\ge \,q^k_d,\qquad \forall d\in \mathbb {N}. \end{aligned}$$

Next, we have the following verification lemma.

Lemma 1

Let \(\mathbf {K}\) in (2) be compact with nonempty interior and assume that there exists \(x_0\in \mathbf {K}\) such that \(0<g_j(x_0)<1\) for all \(j=1,\ldots ,m\).

(a) For every fixed d sufficiently large (say \(d\ge d_0\)), the semidefinite program (7) (and thus (8) as well) has an optimal solution.

(b) Let \(s\in \mathbb {N}\) be the smallest integer such that \(2s\ge \max \{\mathrm{deg}(f),\,\max _j\mathrm{deg}(g_j)\}\), and let \(r:=\max _j\lceil \mathrm{deg}(g_j)/2\rceil \). Let \(\theta ^*\in \mathbb {R}^L\) be an optimal solution of (9) (whenever it exists) and let \(\mathbf {y}^*=(y^*_\alpha )\), \(\alpha \in \mathbb {N}^n_{2s}\), with

$$\begin{aligned} y^*_\alpha :=\sum _{p=1}^L \theta ^*_p\,(x^{(p)})^\alpha ,\qquad \alpha \in \mathbb {N}^n_{2s}. \end{aligned}$$
(10)

\(\bullet \) If \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=1\) then \(\tilde{q}^k_d=f^*\) and \(x^*=(y^*_\alpha )\), \(\vert \alpha \vert =1\), i.e., \(x^*=\sum _{p=1}^L\theta ^*_p\,x^{(p)}\), is an optimal solution of problem (P).

\(\bullet \) If \(\mathbf {M}_s(\mathbf {y}^*)\succeq 0\), \(\mathbf {M}_{s-r}(g_j\,\mathbf {y}^*)\succeq 0\), \(j=1,\ldots ,m\), and \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=\mathrm{rank}\,\mathbf {M}_{s-r}(\mathbf {y}^*)\), then \(\tilde{q}^k_d=f^*\), and problem (P) has \(\mathrm{rank}\,\mathbf {M}_{s}(\mathbf {y}^*)\) global minimizers that can be extracted by a linear algebra procedure.

The proof is postponed to the Appendix. Notice that we do not claim that problem (9) always has an optimal solution. Lemma 1 is a verification lemma (or a stopping criterion) based on a sufficient rank condition on \(\mathbf {M}_s(\mathbf {y}^*)\) and \(\mathbf {M}_{s-r}(\mathbf {y}^*)\), provided that an optimal solution \(\mathbf {y}^*\) exists. When the latter condition holds true then \(f^*=\tilde{q}^k_d\) and we can stop, as at least one global optimal solution \(x^*\in \mathbf {K}\) has been identified.
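Thanks to (10), this verification is inexpensive once \(\theta ^*\) is available: since \(\mathbf {M}_s(\mathbf {y}^*)(\alpha ,\beta )=\sum _p\theta ^*_p (x^{(p)})^{\alpha +\beta }\), one has \(\mathbf {M}_s(\mathbf {y}^*)=\sum _p\theta ^*_p\,v_s(x^{(p)})v_s(x^{(p)})^T\), so the moment matrix can be assembled directly from the generic points. A Matlab sketch (the \(10^4\) eigenvalue-ratio test for numerical rank one is the criterion used in Sect. 4; X, theta and the helper evalmonvec are our own names):

% Sketch: rank-one test of Lemma 1 from theta* and the points x^{(p)}.
ss = nchoosek(n + s, n);
Ms = zeros(ss);
for p = 1:L
    vs = evalmonvec(X(:, p), s);          % v_s(x^{(p)})
    Ms = Ms + theta(p) * (vs * vs');      % M_s(y*) = sum_p theta*_p v_s v_s'
end
ev = sort(eig((Ms + Ms') / 2), 'descend');
if ev(1) >= 1e4 * abs(ev(2))              % numerical rank one
    xstar = X * theta(:);                 % candidate minimizer: sum_p theta*_p x^{(p)}
end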

2.5 On the rank-one matrices of (6) and SDPT3

Note that in the SDP (8), the constraint matrices associated with Q are all dense rank-1 matrices of the form \(A_p = v_k(x^{(p)})v_k(x^{(p)})^T\). Thus if we let \(\varvec{v}_p = v_k(x^{(p)})\), then the linear maps involved in the equality constraints of the SDP can be evaluated cheaply based on the following formulas:

$$\begin{aligned} \mathcal {A}(X) := \Big [ \langle A_p,X\rangle \Big ]_{p=1}^L = \Big [ \langle \varvec{v}_p, X \varvec{v}_p\rangle \Big ]_{p=1}^L,\quad \mathcal {A}^*y := \sum _{p=1}^{L} y_p A_p = V\,\mathrm{Diag}(y) V^T, \end{aligned}$$

where \(X\in \mathcal {S}^{s(k)}\), \(y\in \mathbb {R}^L\), \(V=[\varvec{v}_1,\dots ,\varvec{v}_L]\in \mathbb {R}^{s(k)\times L}\). Moreover, one need not store the dense constraint matrices \(\{ A_p \mid p=1,\dots ,L\}\) but only the vectors \(\{ \varvec{v}_p\mid p=1,\dots ,L\}.\) To solve the SDP (8) efficiently, we need to exploit the rank-1 structure of the constraint matrices during the iterations. Fortunately, the SDPT3 solver Toh et al. (1999, 2003) based on interior point methods has already been designed to exploit such a rank-1 structure to minimize the memory needed to store the constraint matrices, as well as to minimize the computational cost required to compute the Schur complement matrix arising in each interior-point iteration. More precisely, in each iteration where a positive definite matrix \(W\in \mathcal {S}^{s(k)}\) is given, one needs to compute the Schur complement matrix \(\mathbf{S}\) whose (pq) element is given by

$$\begin{aligned} \mathbf{S}_{pq} = \langle A_p,WA_q W\rangle = \langle \varvec{v}_p\varvec{v}_p^T,W\varvec{v}_q\varvec{v}_q^TW\rangle = \langle \varvec{v}_p,W\varvec{v}_q\rangle ^2, \quad p,q = 1,\dots ,L. \end{aligned}$$

It is the combination of these two implementation techniques (point evaluation in the formulation and exploiting rank-one structure in the interior point algorithm) that makes our implementation of the SOS-hierarchy (6) efficient.
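In Matlab notation, and writing the positive semidefinite variable as Xmat to avoid a clash with the matrix of points, the three quantities above can be formed directly from \(V=[\varvec{v}_1,\dots ,\varvec{v}_L]\) without ever storing the \(A_p\); a sketch:

% Sketch: exploiting the rank-1 structure. V is s(k)-by-L with columns v_p.
AX   = sum(V .* (Xmat * V), 1)';          % A(Xmat): p-th entry is <v_p, Xmat*v_p>
Asty = V * diag(y) * V';                  % A^*(y) = V*Diag(y)*V'
S    = (V' * W * V).^2;                   % Schur complement: S(p,q) = <v_p, W*v_q>^2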

3 Computational issues

Given \(f\in \mathbb {R}[x]_d\), to efficiently evaluate the vector \(f(x^{(p)})\), \(p=1,\dots , L\), we need a convenient representation of the polynomial f(x). In our implementation of BSOS, we use the following data format to input a polynomial:

$$\begin{aligned} \varvec{F}(i,1:n+1) = [\alpha ^T, f_\alpha ] \end{aligned}$$

where \(f_\alpha \) is the ith coefficient, corresponding to the monomial \(x^\alpha \). Note that the order in which the coefficients of f(x) are enumerated is not important. For a given point \(z\in \mathbb {R}^n\) such that \(z_i\not =0\) for all \(i=1,\dots ,n\), we evaluate f(z) via the following procedure written in Matlab syntax:

Step 1::

Set \(\varvec{P}= \varvec{F}(:,1:n)\), \(\varvec{f}=\varvec{F}(:,n+1)\), and \(s = (s_1,\dots , s_n)^T\), where \(s_i = 1\) if \(z_i < 0\), and \(s_i=0\) if \(z_i \ge 0\).

Step 2::

Compute \(\bar{s} = \mathtt{rem}(\varvec{P}s,2)\) and \(\varvec{z}=\exp (\varvec{P}\log |z|)\).

Step 3::

Compute \(f(z) = \langle \varvec{f}^{(a)},\varvec{z}^{(a)}\rangle - \langle \varvec{f}^{(b)}, \varvec{z}^{(b)}\rangle \), where \(\varvec{f}^{(a)} =\varvec{f}(\mathtt{find}(\bar{s}==0))\) and \(\varvec{f}^{(b)} =\varvec{f}(\mathtt{find}(\bar{s}==1)).\)

(The above procedure can be modified slightly to handle the case when z has some zero components.) Note that in the above procedure, \( \langle \varvec{f}^{(a)},\varvec{z}^{(a)}\rangle \) and \( \langle \varvec{f}^{(b)}, \varvec{z}^{(b)}\rangle \) correspond to the sums over the terms for which \(z^\alpha \) is positive and negative, respectively. By separating these two groups of terms in the evaluation of f(z), it is hoped that cancellation errors can be reduced.
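Putting the three steps together, a self-contained Matlab sketch of the evaluation routine reads (the function name evalpoly is ours):

function fz = evalpoly(F, z)
% Evaluate f at z (all z_i ~= 0) from the format F(i,:) = [alpha', f_alpha].
n  = length(z);
P  = F(:, 1:n);                     % exponent matrix
f  = F(:, n+1);                     % coefficient vector
s  = double(z(:) < 0);              % Step 1: sign indicator of z
sb = rem(P * s, 2);                 % Step 2: parity, so z^alpha = (-1)^sb * |z|^alpha
zb = exp(P * log(abs(z(:))));       %         |z|^alpha, one entry per row of P
ia = (sb == 0);  ib = (sb == 1);    % Step 3: split the two groups of terms
fz = dot(f(ia), zb(ia)) - dot(f(ib), zb(ib));
end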

We should mention that some of the equality constraints in (8) may be redundant. For the sake of reducing the computational cost and improving the numerical stability, we remove these redundant constraints before solving the SDP. However, as d increases, the linear constraints become more and more nearly dependent, and typically the SDP problem cannot be solved accurately by either SDPT3 or SeDuMi.
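The paper does not spell out how the redundant constraints are detected; one standard option, sketched below under our own choices (a rank-revealing QR with column pivoting and an ad hoc tolerance), is:

% Sketch: keep a maximal numerically independent subset of the rows of Aeq.
tol = 1e-9;                               % illustrative tolerance (our choice)
[~, R, e] = qr(Aeq', 'vector');           % QR with column pivoting on Aeq'
r    = sum(abs(diag(R)) > tol * abs(R(1,1)));
keep = e(1:r);                            % indices of the retained constraints
Aeq  = Aeq(keep, :);
beq  = beq(keep);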

Another numerical issue which we should point out is that the constraint matrix

$$\begin{aligned} \left[ \begin{array}{c} \big (h_{\alpha \beta }(x^{(1)})\big )_{(\alpha ,\beta )\in \mathbb {N}^{2m}_d} \\ \vdots \\ \big (h_{\alpha \beta }(x^{(L)})\big )_{(\alpha ,\beta )\in \mathbb {N}^{2m}_d} \end{array}\right] \end{aligned}$$

associated with the nonnegative vector \((\lambda _{\alpha \beta })\) is typically fully dense. Such a matrix consumes a large amount of memory and incurs a high computational cost when d increases or when m is large.

4 Numerical experiments

We call our approach BSOS (for hierarchy with bounded degree SOS). As mentioned in the Introduction, we conduct experiments on three classes of problems which will be described in the ensuing subsections.

4.1 Comparison of BSOS with GloptiPoly

We construct a set of test problems, each with 5 constraints. The test problems are mainly generated from the following two problems:

$$\begin{aligned} \begin{array}{r@{\quad }r@{\quad }l@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }l} (P_1) &{} &{} f =&{}x_1^2 &{}- x_2^2 &{} &{}+x_3^2 &{}- x_4^2 &{} +x_1 &{}-x_2\\ \text{ s.t. } &{}0 &{}\le g_1=&{}2x_1^2 &{}+\,3x_2^2 &{}+\,2x_1x_2 &{}+\,2x_3^2 &{}+\,3x_4^2 &{}+\,2x_3x_4 &{} \le 1\\ &{}0 &{}\le g_2=&{}3x_1^2 &{}+\,2x_2^2 &{}-\,4x_1x_2 &{}+\,3x_3^2 &{}+\,2x_4^2 &{}-\,4x_3x_4 &{}\le 1\\ &{}0 &{}\le g_3=&{}x_1^2 &{}+\,6x_2^2 &{}-\,4x_1x_2 &{}+\,x_3^2 &{}+\,6x_4^2 &{}-\,4x_3x_4 &{}\le 1\\ &{}0 &{}\le g_4=&{}x_1^2 &{}+\,4x_2^2 &{}-\,3x_1x_2 &{}+\,x_3^2 &{}+\,4x_4^2 &{}-\,3x_3x_4 &{}\le 1\\ &{}0 &{}\le g_5=&{}2x_1^2&{}+\,5x_2^2 &{}+\,3x_1x_2 &{}+\,2x_3^2 &{}+\,5x_4^2 &{}+\,3x_3x_4 &{}\le 1\\ &{}0 &{}\le x . \quad \end{array} \end{aligned}$$

The optimal value of \((P_1)\) is \(f(x^*)=-0.57491\), as computed by GloptiPoly3. For BSOS, we get the result \(q^{k=1}_{d=1} = -0.57491\), which is the exact result.

The second problem is

$$\begin{aligned}&\begin{array}{r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }r@{\quad }l} (P_2) &{} &{} f =&{}x_{1}^4x_{2}^2 &{}+\,x_{1}^2x_{2}^4 &{}-\,x_{1}^2x_{2}^2 \\ \text{ s.t. } &{}0 &{}\le g_1 =&{}x_{1}^2 &{}+\,x_{2}^2 &{} &{}\le 1\\ &{}0 &{}\le g_2=&{}3x_{1}^2 &{}+\,2x_{2}^2 &{}-\,4x_{1}x_{2}&{}\le 1\\ &{}0 &{}\le g_3=&{}x_{1}^2 &{}+\,6x_{2}^4 &{}-8x_{1}x_{2}+2.5&{}\le 1\\ &{}0 &{}\le g_4=&{}x_{1}^4 &{}+\,3x_{2}^4 &{} &{} \le 1\\ &{}0 &{}\le g_5=&{}x_{1}^2 &{}+\,x_{2}^3 &{} &{} \le 1 \end{array} \\&\qquad \quad \; 0 \le x_1,\quad 0 \le x_2. \end{aligned}$$

The optimal value of \((P_2)\) is \(f(x^*)=-0.037037\), as computed by GloptiPoly3. The results obtained by BSOS are

$$\begin{aligned} \begin{array}{l@{\quad }l@{\quad }l} q^{k=3}_{d=1} = -0.041855, &{}q^{k=3}_{d=2} = -0.037139, &{}q^{k=3}_{d=3} = -0.037087 \\ q^{k=3}_{d=4} = -0.037073, &{} q^{k=3}_{d=5} = -0.037046\\ q^{k=4}_{d=1} = -0.038596, &{}q^{k=4}_{d=2} = -0.037046, &{}q^{k=4}_{d=3} = -0.037040\\ q^{k=4}_{d=4} =-0.037038, &{} q^{k=4}_{d=5} = -0.037037. \end{array} \end{aligned}$$

Based on the above two problems, we increase the degree of the objective function and constraint functions to generate other test instances which are given explicitly in the Appendix.

Table 1 compares the results obtained by BSOS and GloptiPoly3 for the tested instances. We observe that BSOS can give the exact result for those problems with either low degree or low dimension, while also providing a good lower bound for high-degree and high-dimensional problems. In particular, on this sample of problems, k is chosen so that the size of the semidefinite constraint (which is \(\binom{n+k}{n}\)) is the same as the one needed by GloptiPoly to certify global optimality. Then notice that BSOS succeeds in finding the optimal value even though the positivity certificate used in (8) is not Putinar’s certificate (3) used in GloptiPoly. In addition, for most of the test problems, BSOS usually gets better bounds as d increases, and in most cases the bound is already good enough for small \(d=2,3\).

In Table 1, we also use the sufficient condition stated in Lemma 1 to check whether the generated lower bound is indeed optimal. For quite a number of instances, the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) associated with the optimal solution \(\theta ^*\) of (9) indeed has numerical rank equal to one (we declare that the matrix has numerical rank one if its largest eigenvalue is at least \(10^4\) times larger than the second largest eigenvalue), which certifies that the lower bound is actually the optimal value. We should note that for some of the instances, although the lower bound is actually the optimal value (as declared by GloptiPoly), the rank of the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) is larger than one.

Table 1 Comparison of BSOS and GloptiPoly3

4.2 Comparison of BSOS with the LP relaxations of Krivine-Stengle on convex problems

Here we compare the performance of BSOS with the LP relaxations of Krivine-Stengle on convex problems, where each test problem has 5 constraint functions in addition to the nonnegativity constraint \(x\ge 0\). Note that the LP relaxation problem has exactly the same form as in (8), except that the positive semidefinite matrix variable Q is set to 0. We should mention that even though the Krivine-Stengle scheme generates LP problems instead of SDP problems, the size of the corresponding LP problems also increases rapidly with d, just as for the BSOS scheme. In particular, in both the LP- and BSOS-relaxations, the dimension of the nonnegative variable \(\lambda \) is \(\binom{2m+d}{d}\), and the constraint matrix is fully dense. (The BSOS-relaxations include an additional semidefinite constraint with fixed matrix size \(\binom{n+k}{k}\).) The following example illustrates the performance of the LP relaxation method:

$$\begin{aligned} \begin{array}{rlrrrl} (C_1) \quad \min &{}f &{}= &{}x_1^4 &{}+ \,x_2^4 &{}+2x_1^2x_2^2-x_1-x_2\\ \text{ s.t. } &{} 0 \le g_1&{}=&{}-x_1^4 &{}\,-2x_2^4 &{}+1 \\ &{}0 \le g_2 &{}=&{}-2x_1^4 &{}\,-x_2^4&{}+1 \\ &{}0 \le g_3 &{}=&{}-x_1^4 &{}\,-\,4x_2^2&{}+1.25 \\ &{}0 \le g_4 &{}=&{}-4x_1^4 &{}\,-x_2^4&{}+1.25 \\ &{}0 \le g_5 &{}=&{}-2x_1^4 &{}\,-3x_2^2&{}+1.1 \\ &{} 0\le x_1\\ &{} 0 \le x_2. \end{array} \end{aligned}$$

For this problem, the functions f and \(-g_i\) are all convex. The optimal value of this problem is \(f(x^*)=-0.7500\), as computed by GloptiPoly3. With BSOS we get \(q^{k=2}_{d=1}=-0.7500\), i.e., the exact result is already obtained with \(d=1\). This observation is consistent with Theorem 4.1 in Lasserre (2013). For the LP relaxation method, we get the following values for various choices of d:

$$\begin{aligned} q^{LP}_{d=1}= & {} \text{ infeasible },\;\; q^{LP}_{d=2} = -1.2200, \\ q^{LP}_{d=3}= & {} -1.0944,\;\; q^{LP}_{d=4} =-0.9696,\;\; q^{LP}_{d=5} =\text{ fail. } \end{aligned}$$

Observe that as d increases, we get a better lower bound on the exact optimal value. However, as d increases, the LP relaxation problem becomes increasingly ill-conditioned and the solvers have difficulty in solving the LP accurately. In particular, for \(d=5\), both SeDuMi and SDPT3 fail to compute a solution accurate enough for the LP to yield a sensible lower bound on \(f(x^*)\).

In Table 2, we observe that BSOS achieves the exact result with \(d=1\) for all the test instances. In contrast, the LP relaxation method of Krivine-Stengle does not perform very well even though the test instances are convex problems. In particular, observe that for the last instance C20_2, the LP relaxation method cannot produce a good lower bound even when we choose \(d=3\), and the time taken to solve the corresponding LP is about 40 minutes.

Table 2 Comparison of BSOS with LP relaxations of Krivine-Stengle on convex problems

4.3 Performance of BSOS on quadratic problems with polyhedral constraints

Here we consider the following problem:

$$\begin{aligned} \begin{array}{rl} \min &{} x^T A x \\ \text{ s.t. } &{} e^T x \le 1, \quad x \ge 0, \quad x\in \mathbb {R}^n, \end{array} \end{aligned}$$
(11)

where A is a given \(n\times n\) symmetric matrix. In our numerical experiments, we generate random instances such as Qn10_r2, for which \(n=10\) and A is randomly generated so that it has \(r=2\) negative eigenvalues and \(n-r\) positive eigenvalues.

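One way to generate such a matrix, as a sketch (the construction below is our own and only illustrates the stated eigenvalue pattern):

% Sketch: a random symmetric A with r negative and n-r positive eigenvalues.
n = 10;  r = 2;                           % e.g. the instance Qn10_r2
[Q, ~] = qr(randn(n));                    % random orthogonal matrix
ev = [-rand(r, 1); rand(n - r, 1)];       % r negative, n-r positive eigenvalues
A  = Q * diag(ev) * Q';
A  = (A + A') / 2;                        % symmetrize against round-off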

Table 3 compares the performance of BSOS and GloptiPoly3. From the numerical results, we can see that BSOS is far more efficient than GloptiPoly3 in solving the problems (11). For example, for the problem Qn20_r2 with \(n=20\), BSOS took only 1.9 seconds to generate the lower bound \(-2.0356e3\) for the problem, but GloptiPoly3 took more than 1 hour to generate the same bound. The disparity in the efficiency between BSOS and GloptiPoly3 is expected to become even wider for other instances with n larger than 20.

In Table 3, we again use the sufficient condition stated in Lemma 1 to check whether the generated lower bound is indeed optimal. For each of the first eight instances, the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) associated with the optimal solution \(\theta ^*\) of (9) has numerical rank equal to one (we declare that the matrix has numerical rank equal to one if the largest eigenvalue is at least \(10^4\) times larger than the second largest eigenvalue), which certifies that the lower bound is actually the optimal value.

Table 3 Comparison of BSOS and GloptiPoly3 on quadratic problems with polyhedral constraints
Table 4 Comparison of BSOS and GloptiPoly3 on higher order problems with more variables

4.4 Performance of BSOS on higher order problems with more variables

Here we consider the following problem:

$$\begin{aligned} \begin{array}{rl} \min &{} \quad \sum _{|\alpha | \le \ell }\,c_{\alpha }\,x^{\alpha }\\ \text{ s.t. }&{} \quad x_i \ge 0,\quad i=1,\ldots n\\ &{}\sum \limits _{i=1}^n x_i^2 \le 1 \end{array} \end{aligned}$$
(12)

where \(\ell =2\) or \(\ell =4\) and the coefficients \(c_{\alpha }\) are randomly generated in \([-1,1]\). In our numerical experiments, we generated instances such as Hn20_4, for which \(n=20\) and the degree of the polynomial is 4. The problem is to find the minimum value of a polynomial over the Euclidean unit ball intersected with the positive orthant.

Table 4 displays the respective performance of BSOS and GloptiPoly3. From the numerical results, we can see that BSOS is far more efficient than GloptiPoly3 in solving problems (12). For example, for problem Hn20_4 (with \(n=20\) variables and degree \(\ell =4\)), BSOS took about 250s to generate the first lower bound \(-2.1860\), and nearly 300s to generate a better lower bound of \(-1.5943\), which is also the exact optimal value of the problem, whereas GloptiPoly3 ran out of memory when solving the same problem. Similarly, for problem Hn40_2, it took both BSOS and GloptiPoly3 very little time to generate the first lower bound of \(-2.3789\). To improve on it, BSOS needed 29s to generate the optimal value of the problem, whereas GloptiPoly3 ran out of memory when attempting to improve the bound. From our observations, the disparity in efficiency between BSOS and GloptiPoly will become wider for instances with larger n and/or of higher degree.

5 Conclusion

We have described and tested a new hierarchy of semidefinite relaxations for global polynomial optimization. It tries to combine some advantages of previously defined LP- and SOS-hierarchies. Essentially, it uses a positivity certificate already used in the LP-hierarchy but with an additional semidefinite constraint which thus makes it an SOS-hierarchy. However the main and crucial point is that the size of this additional semidefinite constraint is fixed in advance and decided by the user (in contrast to the standard SOS-hierarchy in which the size of the semidefinite constraint increases in the hierarchy). Preliminary results are encouraging especially for non convex problems on convex polytopes where problems with up to 100 variables have been solved in a reasonable amount of time (whereas the standard SOS-hierarchy of GloptiPoly cannot be implemented).

For problems of larger size one needs to address some serious numerical issues due to the presence of a fully dense submatrix and of nearly dependent linear constraints. In addition, to be able to handle large-scale problems one also needs to provide a “sparse version” of this hierarchy, an analogue of the sparse version of the SOS-hierarchy defined in Waki et al. (2006). Both issues (a topic of further investigation) are certainly non trivial, in particular the latter because the positivity certificate used in this new hierarchy involves products of the initial polynomial constraints, which destroys the sparsity pattern considered in Waki et al. (2006).