Abstract
We consider a new hierarchy of semidefinite relaxations for the general polynomial optimization problem \((P):\,f^{*}=\min \{f(x):x\in K\}\) on a compact basic semi-algebraic set \(K\subset \mathbb {R}^n\). This hierarchy combines some advantages of the standard LP-relaxations associated with Krivine’s positivity certificate and some advantages of the standard SOS-hierarchy. In particular it has the following attractive features: (a) in contrast to the standard SOS-hierarchy, for each relaxation in the hierarchy, the size of the matrix associated with the semidefinite constraint is the same and fixed in advance by the user; (b) in contrast to the LP-hierarchy, finite convergence occurs at the first step of the hierarchy for an important class of convex problems; and (c) some important techniques related to the use of point evaluations for declaring a polynomial to be zero and to the use of rank-one matrices make an efficient implementation possible. Preliminary results on a sample of non convex problems are encouraging.
1 Introduction
We consider the polynomial optimization problem:
where \(f\in \mathbb {R}[x]\) is a polynomial and \(\mathbf {K}\subset \mathbb {R}^n\) is the basic semi-algebraic set
for some polynomials \(g_j\in \mathbb {R}[x]\), \(j=1,\ldots ,m\). In order to approximate (and sometimes solve exactly) (P) one may instead solve a hierarchy of convex relaxations of (P) of increasing sizes, namely for instance,
\(\bullet \) Semidefinite relaxations defined in Lasserre (2001) and based on Putinar’s certificate of positivity on K Putinar (1993), where the d-th convex relaxation of the hierarchy is a semidefinite program given by
The unknowns \(\sigma _j\) are sums of squares (SOS) polynomials with the degree bound constraint, \(\mathrm{degree}(\sigma _jg_j)\le 2d\), \(j=0,\ldots ,m\), and the expression in (3) is a certificate of positivity on \(\mathbf {K}\) for the polynomial \(x\mapsto f(x)-t\).
\(\bullet \) LP-relaxations based on Krivine-Stengle’s certificate of positivity on \(\mathbf {K}\) Krivine (1964), Stengle (1974), where the d-th convex relaxation of the hierarchy is a linear program given by
where \(\mathbb {N}^{2m}_{d}=\{(\alpha ,\beta )\in \mathbb {N}^{2m}:\sum _j\alpha _j+\beta _j\le d\}\). The unknowns are t and the nonnegative scalars \(\lambda =(\lambda _{\alpha \beta })\), and it is assumed that \(0\le g_j\le 1\) on \(\mathbf {K}\) (possibly after scaling) and that the family \(\{1,g_j\}\) generates the algebra \(\mathbb {R}[x]\) of polynomials. Problem (4) is an LP because stating that the polynomials on both sides of “\(=\)” are equal yields linear constraints on the \(\lambda _{\alpha \beta }\)’s. For instance, the LP-hierarchy from the Sherali–Adams RLT Sherali and Adams (1990) and its variants Sherali and Adams (1999) are of this form.
In both cases, \((\gamma _d)\) and \((\theta _d)\), \(d\in \mathbb {N}\), provide two monotone nondecreasing sequences of lower bounds on \(f^*\), and if \(\mathbf {K}\) is compact then both converge to \(f^*\) as d increases. For more details as well as a comparison of such relaxations, the reader is referred to, e.g., Lasserre (2009, 2002) and Laurent (2003), as well as Chlamtac and Tulsiani (2012) for the impact of LP- and SOS-hierarchies on approximation algorithms in combinatorial optimization.
Of course, in principle, one would much prefer to solve LP-relaxations rather than semidefinite relaxations (i.e. compute \(\theta _d\) rather than \(\gamma _d\)) because present LP-software packages can solve sparse problems with millions of variables and constraints, which is far from being the case for today’s semidefinite solvers. Therefore the hierarchy (3) applies to problems of modest size only, although if some sparsity or symmetry is taken into account then specialized variants as in Waki et al. (2006) can handle problems of much larger size. On the other hand, the LP-relaxations (4) suffer from several serious theoretical and practical drawbacks. For instance, it has been shown in Lasserre (2002, 2009) that the LP-relaxations cannot be exact for most convex problems, i.e., the sequence of the associated optimal values converges to the global optimum only asymptotically, and not in finitely many steps. Moreover, the LPs of the hierarchy are numerically ill-conditioned. This is in contrast with the semidefinite relaxations (3) for which finite convergence takes place for convex problems where \(\nabla ^2f(x^*)\) is positive definite at every minimizer \(x^*\in \mathbf {K}\) (see de Klerk and Laurent 2011, Corollary 3.3) and occurs at the first relaxation for SOS-convexFootnote 1 problems (Lasserre 2009, Theorem 3.3). In fact, as demonstrated in recent works of Marshall (2009) and Nie (2014), finite convergence is generic even for non convex problems.
1.1 Contribution
This paper is in the vein of recent attempts in Lasserre (2013) and Ahmadi and Majumdar (2014) to overcome the important computational burden associated with the standard SOS-hierarchy (3). In particular, in Lasserre (2013) we have suggested another hierarchy of convex relaxations which combines some of the advantages of the SOS- and LP- hierarchies (3) and (4). In the present paper we take advantage of attractive features of the SDPT3 solver Toh et al. (1999, 2003) to provide an effective implementation of this new hierarchy. First preliminary tests on a sample of non convex problems are encouraging and suggest that this new hierarchy might be efficient. This new hierarchy is another type of SOS-hierarchy labelled BSOS (for hierarchy with bounded degree SOS) with the following attractive features:
\(\bullet \) In contrast to the standard SOS-hierarchy (3), for each semidefinite program in the hierarchy, the size \(\binom{n+k}{n}\) of the semidefinite matrix variable is now fixed, parametrized by an integer k that one fixes in advance. This integer k determines the degree of a certain SOS polynomial (for instance one may fix \(k=2\)), whence the label BSOS (for “bounded”-SOS). Recall that in the standard SOS-hierarchy (3) the size of the semidefinite matrix variable is \(\binom{n+d}{n}\) at step d of the hierarchy.
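The size comparison above can be made concrete with a short sketch (binomial coefficients only; the function names are ours, not part of any solver):

```python
from math import comb

def sos_matrix_size(n, d):
    """Size of the SDP matrix variable at step d of the standard SOS-hierarchy (3)."""
    return comb(n + d, n)

def bsos_matrix_size(n, k):
    """Size of the SDP matrix variable in BSOS: fixed once k is chosen, for every d."""
    return comb(n + k, n)

n = 20
# standard hierarchy: the matrix grows quickly with the step d
sizes = [sos_matrix_size(n, d) for d in (1, 2, 3)]   # [21, 231, 1771]
# BSOS with k = 2: the matrix stays 231 x 231 at every step d of the hierarchy
fixed = bsos_matrix_size(n, 2)                       # 231
```
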
\(\bullet \) In contrast to the LP-hierarchy (4), finite convergence occurs at the first step in the hierarchy for a large class of convex problems: typically convex problems defined with convex quadratic polynomials or SOS-convex polynomials of degree at most k. Recall that such finite convergence is impossible for the LP-hierarchy (4).
\(\bullet \) Just as in the standard SOS-hierarchy (3), there also exists a sufficient condition for finite convergence of the hierarchy. Namely, it suffices to check whether at an optimal solution of the corresponding SDP some associated moment matrix is rank-one.
\(\bullet \) Last but not least, to implement this hierarchy one uses important techniques that dramatically alleviate the computational burden associated with a standard (careless) implementation. Namely, to declare that two polynomials are identical one uses that their values are equal on finitely many randomly chosen points (instead of equating their coefficients). In addition, the SDP solver SDPT3 Toh et al. (1999, 2003) can be used to handle efficiently some types of matrices used in our positivity certificate.
Preliminary computational experiments First, we have compared our results with those obtained with the GloptiPoly software Henrion et al. (2009) (devoted to solving the SOS-hierarchy (3)) on a sample of non convex problems with up to 20 variables. For problems with low degree (in the initial data) and/or low dimension we obtain the global optimum, whereas good lower bounds are always obtained for problems with high degree or higher dimension (e.g. problems with degree 4 and up to 20 variables).
Next, we have also tested the LP-hierarchy (4) on a sample of convex problems and as expected the convergence is very poor and the resulting LPs become ill conditioned. In addition, the LP can be expensive to solve as the LP data are typically dense. In contrast, the new hierarchy (with smallest value \(k=1\) of its parameter) converges at the first step even though some of the problems are defined with polynomials of degree larger than 2.
We have also considered a sample of non convex quadratic problems of the form \(\inf \{x^TAx: x\in \Delta \}\), where \(\Delta \subset \mathbb {R}^n\) is the canonical simplex, and A is a randomly generated real symmetric matrix with r negative eigenvalues and \(n-r\) positive eigenvalues. For all problems that could be solved with GloptiPoly (up to \(n=20\) variables) we obtain the optimal values. For the other problems with \(n=40, 50,100\) variables, only the first (dense) relaxation of GloptiPoly can be implemented and yields only a lower bound on the global optimum. For those problems, a better lower bound is obtained in a reasonable amount of time by running the BSOS hierarchy.
Finally, we have considered the minimization of quadratic and quartic polynomials (with up to 40 variables) on the Euclidean unit ball intersected with the positive orthant. Again in those examples only the first SDP-relaxation of GloptiPoly can be implemented, providing only a lower bound. In contrast, BSOS solves all problems at step 2 of the hierarchy in a reasonable amount of time.
Of course this new hierarchy of semidefinite relaxations also has its drawbacks (at least in its present version). Namely, some submatrix (of the matrix used to describe the linear equality constraints of the resulting SDP) is fully dense and many of these linear constraints are nearly dependent, which yields a lack of accuracy in the optimal solution when the order of relaxation d is increased.
2 Main result
2.1 Notation and definitions
Let \(\mathbb {R}[x]\) be the ring of polynomials in the variables \(x=(x_1,\ldots ,x_n)\). Denote by \(\mathbb {R}[x]_d\subset \mathbb {R}[x]\) the vector space of polynomials of degree at most d, of dimension \(s(d)=\binom{n+d}{d}\), with, e.g., the usual canonical basis \((x^\alpha )\) of monomials. Also, denote by \(\Sigma [x]\subset \mathbb {R}[x]\) (resp. \(\Sigma [x]_d\subset \mathbb {R}[x]_{2d}\)) the space of sums of squares (SOS) polynomials (resp. SOS polynomials of degree at most 2d). If \(f\in \mathbb {R}[x]_d\), we write \(f(x)=\sum _{\alpha \in \mathbb {N}^n_d}f_\alpha x^\alpha \) in the canonical basis and denote by \(\varvec{f}=(f_\alpha )\in \mathbb {R}^{s(d)}\) its vector of coefficients. Finally, let \(\mathcal {S}^n\) denote the space of \(n\times n\) real symmetric matrices, with inner product \(\langle \mathbf {A},\mathbf {B}\rangle =\mathrm{trace}\,\mathbf {A}\mathbf {B}\). We use the notation \(\mathbf {A}\succeq 0\) (resp. \(\mathbf {A}\succ 0\)) to denote that \(\mathbf {A}\) is positive semidefinite (resp. positive definite). With \(g_0:=1\), the quadratic module \(Q(g_1,\ldots ,g_m)\subset \mathbb {R}[x]\) generated by polynomials \(g_1,\ldots ,g_m\), is defined by
We briefly recall two important theorems by Putinar (1993) and Krivine (1964), Stengle (1974) respectively, on the representation of polynomials that are positive on \(\mathbf {K}\).
Theorem 1
Let \(g_0=1\) and \(\mathbf {K}\) in (2) be compact.
(a) If the quadratic polynomial \(x\mapsto M-\Vert x\Vert ^2\) belongs to \(Q(g_1,\ldots ,g_m)\) and if \(f\in \mathbb {R}[x]\) is strictly positive on \(\mathbf {K}\), then \(f\in Q(g_1,\ldots ,g_m)\).
(b) Assume that \(0\le g_j\le 1\) on \(\mathbf {K}\) for every j, and the family \(\{1,g_j\}\) generates \(\mathbb {R}[x]\). If f is strictly positive on \(\mathbf {K}\), then
for some (finitely many) nonnegative scalars \((c_{\alpha \beta })\).
2.2 The bounded-SOS-hierarchy (BSOS)
Consider the problem
where \(\mathbf {K}\subset \mathbb {R}^n\) is the basic semi-algebraic set defined in (2), assumed to be compact. Moreover we also assume that \(g_j(x) \le 1\) for all \(x\in \mathbf {K}\) and \(j=1,\dots ,m\), and \(\{1,g_j\}\) generates the ring of polynomials \(\mathbb {R}[x]\). For every fixed \(d\in \mathbb {N}\) and \(\lambda =(\lambda _\alpha )\), \(\alpha \in \mathbb {N}^{2m}_d\), let \(L_d(\cdot ,\lambda )\in \mathbb {R}[x]\) be the “Lagrangian” polynomial:
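A minimal sketch of evaluating the Lagrangian polynomial at a point, assuming (consistently with the certificate of Theorem 1(b)) that \(L_d(x,\lambda )=f(x)-\sum _{(\alpha ,\beta )}\lambda _{\alpha \beta }\prod _j g_j(x)^{\alpha _j}(1-g_j(x))^{\beta _j}\); the helper names are ours:

```python
from itertools import product

def exponents(m2, d):
    """All exponent tuples in N^{m2} with total degree at most d."""
    return [e for e in product(range(d + 1), repeat=m2) if sum(e) <= d]

def lagrangian(f, gs, lam, x, d):
    """Evaluate L_d(x, lambda) = f(x) - sum_{(a,b)} lam[(a,b)] *
    prod_j g_j(x)^{a_j} * (1 - g_j(x))^{b_j}, where lam maps the
    concatenated exponent tuple (a, b) to a nonnegative scalar."""
    m = len(gs)
    gvals = [g(x) for g in gs]
    val = f(x)
    for e in exponents(2 * m, d):
        a, b = e[:m], e[m:]
        term = 1.0
        for j in range(m):
            term *= gvals[j] ** a[j] * (1.0 - gvals[j]) ** b[j]
        val -= lam.get(e, 0.0) * term
    return val

# toy instance: f(x) = x^2, one constraint g(x) = x, multiplier 0.5 on (a, b) = (1, 0)
val = lagrangian(lambda x: x[0] ** 2, [lambda x: x[0]], {(1, 0): 0.5}, [2.0], 1)
# L_1(2, lambda) = 4 - 0.5 * 2 = 3.0
```
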
With \(k\in \mathbb {N}\) fixed, consider the family of optimization problems indexed by \(d\in \mathbb {N}\):
Observe that when k is fixed, then for each \(d\in \mathbb {N}\)
-
computing \(q^k_d\) in (6) reduces to solving a semidefinite program and therefore (6) defines a hierarchy of semidefinite programs because \(q^k_{d+1}\ge q^k_d\) for all \(d\in \mathbb {N}\).
-
The semidefinite constraint is associated with the constraint \(L_d(x,\lambda ) - t \in \Sigma [x]_{k}\) and the associated matrix has fixed size \(\binom{n+k}{n}\), independent of \(d\in \mathbb {N}\), a crucial feature for the computational efficiency of the approach.
-
If \(k=0\) then (6) is the linear program (4) and so \(\theta _d=q^0_d\le q^k_d\) for all d, k.
Interpretation For any fixed \(d\ge 1\), problem (P) is easily seen to be equivalent to the following problem by adding redundant constraints:
where \(\mathbb {N}_d^{2m} = \{ (\alpha ,\beta )\in \mathbb {N}^{2m} \mid |\alpha |+|\beta | \le d\}\) and \(h_{\alpha \beta }\in \mathbb {R}[x]\) is the polynomial
The Lagrangian dual of \((\tilde{P})\) is given by
where the function \(G_d(\cdot )\) is given by
and \(L_d(\cdot ,\lambda )\) is the Lagrangian function defined in (5).
Now for a fixed \(\lambda \), the evaluation of \(G_d(\lambda )\) is computationally intractable. However, let \(k\in \mathbb {N}\) be fixed and observe that
where recall that \(\Sigma [x]_{k}\) is the space of SOS polynomials of degree at most 2k. Moreover,
So the semidefinite program (6) can be seen as a tractable simplification of the intractable problem \(\sup _{\lambda \ge 0}G(\lambda )\). The linear program (4) (which is (6) with \(k=0\)) is an even more brutal simplification, so that \(q^k_d\ge q^0_d=\theta _d\) for all d, k. In fact we have the following more precise and interesting result.
Theorem 2
(Lasserre 2013) Let \(\mathbf {K}\subset \mathbb {R}^n\) in (2) be compact with nonempty interior and \(g_j(x) \le 1\) for \(x\in \mathbf {K}\) and \(j=1,\dots ,m\). Assume further that the family \(\{1,g_j\}\) generates the algebra \(\mathbb {R}[x]\). Let \(k\in \mathbb {N}\) be fixed and for each \(d\in \mathbb {N}\), let \(q^k_d\) be as in (6). Then
(a) The sequence \((q^k_d)\), \(d\in \mathbb {N}\), is monotone nondecreasing and \(q^k_d\rightarrow f^*\) as \(d\rightarrow \infty \).
(b) Moreover, assume that Slater’s condition holds, i.e., there exists \(x_0\in \mathbf {K}\) such that \(g_j(x_0)>0\) for all \(j=1,\ldots ,m\). If f and \(-g_j\), \(j=1,\ldots ,m\), are SOS-convexFootnote 2 polynomials of degree at most 2k, then \(q^k_1=f^*\), i.e., finite convergence takes place at the first relaxation in the hierarchy. In particular, when \(f,-g_j\) are convex quadratic polynomials then \(q^1_1=f^*\).
Proof
The first result is a direct application of Theorem 1(b) since for any integer d, k, \(f^*\ge q^k_d\ge q^0_d=\theta _d\), and by Theorem 1, \(\theta _d\rightarrow f^*\) as \(d\rightarrow \infty \). Next, if Slater’s condition holds and f and \(-g_j\) are SOS-convex, \(j=1,\ldots ,m\), then there exist nonnegative Lagrange-KKT multipliers \(\lambda \in \mathbb {R}^m_+\) such that
In other words, the Lagrangian \(x\mapsto L(x):=f(x)-f^*-\sum _j\lambda _j\,g_j(x)\) is SOS-convex and satisfies \(L(x^*)=0\) and \(\nabla L(x^*)=0\). By Helton and Nie (2010, Lemma 8, p. 33), L is SOS (of degree at most 2k), and thus \(q^k_1=f^*\). \(\square \)
2.3 The SDP formulation of (6)
To formulate (6) as a semidefinite program one has at least two possibilities depending on how we state that two polynomials \(p,q\in \mathbb {R}[x]_d\) are identical. Either by equating their coefficients (e.g. in the monomial basis), i.e., \(p_\alpha =q_\alpha \) for all \(\alpha \in \mathbb {N}^n_d\), or by equating their values on \(\binom{n+d}{n}\) generic points (e.g. randomly generated on the box \([-1,1]^n\)). In the present context of (6) we prefer the latter option since expanding the polynomial \(h_{\alpha \beta }(x)\) symbolically to get the coefficients with respect to the monomial basis can be expensive and memory intensive.
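The point-evaluation idea can be sketched as follows (a hedged illustration, not the paper's implementation: two polynomials of degree at most d in a space of dimension \(\binom{n+d}{n}\) that agree on that many generic points must coincide, with probability one over a random sample):

```python
import random
from math import comb

def equal_on_generic_points(p, q, n, d, tol=1e-9, seed=0):
    """Test p == q (both callables representing polynomials of degree <= d in
    n variables) by comparing values on L = C(n+d, n) random points in
    [-1, 1]^n: their difference lives in a space of dimension L, so agreement
    at L generic points forces it to vanish identically."""
    rng = random.Random(seed)
    L = comb(n + d, n)
    for _ in range(L):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if abs(p(x) - q(x)) > tol:
            return False
    return True

same = equal_on_generic_points(lambda x: (x[0] + x[1]) ** 2,
                               lambda x: x[0] ** 2 + 2 * x[0] * x[1] + x[1] ** 2,
                               n=2, d=2)
diff = equal_on_generic_points(lambda x: (x[0] + x[1]) ** 2,
                               lambda x: x[0] ** 2 + x[1] ** 2,
                               n=2, d=2)
```
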
Let \(\tau = \max \{\mathrm{deg}(f), 2k, d\max _{j}\{\mathrm{deg}(g_j)\}\}\). Then for k fixed and for each d, we get
where \(L:=|\mathbb {N}^{n}_{\tau }|=\binom{n+\tau }{n}\) and \(\{ x^{(p)}\in \mathbb {R}^n \mid p =1,\dots , L\}\) is a set of generic points in \([-1,1]^n\), and \(v_k(x)\) is the vector of the monomial basis of \(\mathbb {R}[x]_k\), the vector space of polynomials of degree at most k (whose dimension is \(s(k)=\binom{n+k}{k}\)). Therefore, if the optimal value \(q_d^k\) of (7) is finite, then every optimal solution \((q^k_d,\lambda ^*,Q^*)\) of (7) is also an optimal solution of (8).
Remark 1
To get a set of generic points one may first start with a set \(\Theta \) of L points randomly generated, e.g., according to the uniform distribution on \([0,1]^n\). Then it is well known that with probability one, the resulting collection of equality constraints in (8) is equivalent to the polynomial identity in (7). In principle, once such a set \(\Theta \) has been generated, the equivalence is guaranteed if the matrix
is nonsingular. In the unlikely event that \(R(\Theta )\) is singular, one can generate another sample \(\Theta \) of random points and check again whether the new corresponding matrix \(R(\Theta )\) is singular or not. If not, the procedure is repeated until a non-singular matrix \(R(\Theta )\) is obtained. However, in practice there is little need to do so since the above matrix \(R(\Theta )\) is nonsingular with probability one.
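The genericity check of Remark 1 can be sketched as follows (our own helper names; the matrix \(R(\Theta )\) has rows indexed by sample points and columns by monomials, and is nonsingular with probability one for a random sample):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(n, d):
    """Exponent tuples of all monomials of degree <= d in n variables."""
    exps = []
    for total in range(d + 1):
        for c in combinations_with_replacement(range(n), total):
            e = [0] * n
            for i in c:
                e[i] += 1
            exps.append(tuple(e))
    return exps

def evaluation_matrix(points, n, d):
    """R(Theta): entry (p, alpha) is the value of the monomial x^alpha
    at the p-th sample point."""
    exps = monomials(n, d)
    return np.array([[np.prod([x[i] ** e[i] for i in range(n)]) for e in exps]
                     for x in points])

rng = np.random.default_rng(0)
n, d = 2, 2
L = len(monomials(n, d))                 # 6 = C(2+2, 2)
Theta = rng.uniform(0.0, 1.0, size=(L, n))
R = evaluation_matrix(Theta, n, d)
# with probability one the random sample is generic and R is nonsingular;
# otherwise one would redraw Theta and test again
```
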
2.4 Sufficient conditions for finite convergence
By looking at the dual of the semidefinite program (8) one obtains sufficient conditions for finite convergence. To describe the dual of the semidefinite program (8) we need to introduce some notation.
For every \(p=1,\ldots ,L\), denote by \(\delta _{x^{(p)}}\) the Dirac measure at the point \(x^{(p)}\in \mathbb {R}^n\), and let \(\langle q,\delta _{x^{(p)}}\rangle =q(x^{(p)})\) for all \(p=1,\ldots ,L\) and all \(q\in \mathbb {R}[x]\).
With a real sequence \(\mathbf {y}=(y_\alpha )\), \(\alpha \in \mathbb {N}^n_{2\ell }\), denote by \(\mathbf {M}_\ell (\mathbf {y})\) the moment matrix associated with \(\mathbf {y}\). It is a real symmetric matrix with rows and columns indexed by \(\mathbb {N}^n_\ell \), and with entries
Similarly, for \(j=1,\ldots ,m\), letting \(g_j(x)=\sum _\gamma g_{j\gamma }x^\gamma \), denote by \(\mathbf {M}_\ell (g_j\,\mathbf {y})\) the localizing matrix associated with \(\mathbf {y}\) and \(g_j\in \mathbb {R}[x]\). Its rows and columns are also indexed by \(\mathbb {N}^n_\ell \), with entries
Moment and localizing matrices play an important role in semidefinite relaxations of polynomial optimization problems. For more details the interested reader is referred to Lasserre (2009).
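As an illustration of the definition of \(\mathbf {M}_\ell (\mathbf {y})\) (a hedged sketch with our own helper names, not the paper's code), the moment matrix of the Dirac measure at a point is rank one:

```python
import numpy as np
from itertools import combinations_with_replacement

def monomials(n, l):
    """Exponent tuples of all monomials of degree <= l in n variables."""
    exps = []
    for total in range(l + 1):
        for c in combinations_with_replacement(range(n), total):
            e = [0] * n
            for i in c:
                e[i] += 1
            exps.append(tuple(e))
    return exps

def moment_matrix(y, n, l):
    """M_l(y): rows and columns indexed by N^n_l, entry (alpha, beta) = y_{alpha+beta}."""
    exps = monomials(n, l)
    return np.array([[y[tuple(a + b for a, b in zip(al, be))]
                      for be in exps] for al in exps], dtype=float)

# moments of the Dirac measure at x* = (2, 3): y_alpha = (x*)^alpha
xstar = np.array([2.0, 3.0])
y = {e: float(np.prod(xstar ** np.array(e))) for e in monomials(2, 2)}
M = moment_matrix(y, 2, 1)
# for a Dirac measure M_1(y) = v v^T with v = (1, 2, 3), hence rank one
```
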
The dual of the semidefinite program (8) reads
(Notice that the weights \(\theta _p\) are not required to be nonnegative.) By standard weak duality in convex optimization, and for every fixed \(k\in \mathbb {N}\), one has
Next, we have the following verification lemma.
Lemma 1
Let K in (2) be compact with nonempty interior and assume that there exists \(x_0\in K\) such that \(0<g_j(x_0)<1\) for all \(j=1,\ldots ,m\).
(a) For every fixed d sufficiently large (say \(d\ge d_0\)), the semidefinite program (7) (and thus (8) as well) has an optimal solution.
(b) Let \(s\in \mathbb {N}\) be the smallest integer such that \(2s\ge \max [\mathrm{deg}(f);\,\mathrm{deg}(g_j)]\), and let \(r:=\max _j\lceil \mathrm{deg}(g_j)/2\rceil \). Let \(\theta ^*\in \mathbb {R}^L\) be an optimal solution of (9) (whenever it exists) and let \(\mathbf {y}^*=(y^*_\alpha )\), \(\alpha \in \mathbb {N}^n_{2s}\), with
\(\bullet \) If \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=1\) then \(\tilde{q}^k_d=f^*\) and \(x^*=(y^*_\alpha )\), \(\vert \alpha \vert =1\), i.e., \(x^*=\sum _{p=1}^L\theta ^*_p\,x^{(p)}\), is an optimal solution of problem (P).
\(\bullet \) If \(\mathbf {M}_s(\mathbf {y}^*)\succeq 0\), \(\mathbf {M}_{s-r}(g_j\,\mathbf {y}^*)\succeq 0\), \(j=1,\ldots ,m\), and \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=\mathrm{rank}\,\mathbf {M}_{s-r}(\mathbf {y}^*)\), then \(\tilde{q}^k_d=f^*\), and problem (P) has \(\mathrm{rank}\,\mathbf {M}_{s}(\mathbf {y}^*)\) global minimizers that can be extracted by a linear algebra procedure.
The proof is postponed to the Appendix. Notice that we do not claim that problem (9) always has an optimal solution. Lemma 1 is a verification lemma (or a stopping criterion) based on a sufficient rank condition on \(\mathbf {M}_s(\mathbf {y}^*)\) and \(\mathbf {M}_{s-r}(\mathbf {y}^*)\), provided an optimal solution \(\mathbf {y}^*\) exists. When this rank condition holds, then \(f^*=\tilde{q}^k_d\) and we can stop, as at least one global optimal solution \(x^*\in \mathbf {K}\) has been identified.
2.5 On the rank-one matrices of (6) and SDPT3
Note that in the SDP (8), the constraint matrices associated with Q are all dense rank-1 matrices of the form \(A_p = v_k(x^{(p)})v_k(x^{(p)})^T\). Thus if we let \(\varvec{v}_p = v_k(x^{(p)})\), then the linear maps involved in the equality constraints of the SDP can be evaluated cheaply based on the following formulas:
where \(X\in \mathcal {S}^{s(k)}\), \(y\in \mathbb {R}^L\), \(V=[\varvec{v}_1,\dots ,\varvec{v}_L]\in \mathbb {R}^{s(k)\times L}\). Moreover, one need not store the dense constraint matrices \(\{ A_p \mid p=1,\dots ,L\}\) but only the vectors \(\{ \varvec{v}_p\mid p=1,\dots ,L\}.\) To solve the SDP (8) efficiently, we need to exploit the rank-1 structure of the constraint matrices during the iterations. Fortunately, the SDPT3 solver Toh et al. (1999, 2003) based on interior point methods has already been designed to exploit such a rank-1 structure to minimize the memory needed to store the constraint matrices, as well as to minimize the computational cost required to compute the Schur complement matrix arising in each interior-point iteration. More precisely, in each iteration where a positive definite matrix \(W\in \mathcal {S}^{s(k)}\) is given, one needs to compute the Schur complement matrix \(\mathbf{S}\) whose (p, q) element is given by
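For rank-one constraint matrices \(A_p=v_pv_p^T\), each Schur complement entry reduces to \(\langle A_p, WA_qW\rangle =(v_p^TWv_q)^2\), so the whole matrix \(\mathbf {S}\) can be formed from \(V=[\varvec{v}_1,\dots ,\varvec{v}_L]\) alone. A hedged sketch (our own function name, not SDPT3's internals):

```python
import numpy as np

def schur_rank_one(V, W):
    """Schur complement matrix for rank-one constraint matrices A_p = v_p v_p^T:
    S[p, q] = <A_p, W A_q W> = (v_p^T W v_q)^2, computed from V = [v_1 ... v_L]
    without ever forming the dense s(k) x s(k) matrices A_p."""
    T = V.T @ W @ V          # T[p, q] = v_p^T W v_q
    return T ** 2            # elementwise square

rng = np.random.default_rng(1)
s, L = 5, 8
V = rng.standard_normal((s, L))
B = rng.standard_normal((s, s))
W = B @ B.T + s * np.eye(s)          # a positive definite iterate
S = schur_rank_one(V, W)
# brute-force check of one entry against the definition <A_p, W A_q W>
p, q = 2, 5
Ap, Aq = np.outer(V[:, p], V[:, p]), np.outer(V[:, q], V[:, q])
brute = np.trace(Ap @ W @ Aq @ W)
```
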
It is the combination of these two implementation techniques (point evaluation in the formulation and exploiting rank-one structure in the interior point algorithm) that makes our implementation of the SOS-hierarchy (6) efficient.
3 Computational issues
Given \(f\in \mathbb {R}[x]_d\), to efficiently evaluate the vector \(f(x^{(p)})\), \(p=1,\dots , L\), we need a convenient representation of the polynomial f(x). In our implementation of BSOS, we use the following data format to input a polynomial:
where each row of \(\varvec{F}\) consists of an exponent vector \(\alpha \) followed by the corresponding coefficient \(f_\alpha \) of the monomial \(x^\alpha \). Note that the order in which the coefficients of f(x) are enumerated is not important. For a given point \(z\in \mathbb {R}^n\) such that \(z_i\not =0\) for all \(i=1,\dots ,n\), we evaluate f(z) via the following procedure, written in Matlab syntax:
- Step 1::
-
Set \(\varvec{P}= \varvec{F}(:,1:n)\), \(\varvec{f}=\varvec{F}(:,n+1)\), and \(s = (s_1,\dots , s_n)^T\), where \(s_i = 1\) if \(z_i < 0\), and \(s_i=0\) if \(z_i \ge 0\).
- Step 2::
-
Compute \(\bar{s} = \mathtt{rem}(\varvec{P}s,2)\) and \(\varvec{z}=\exp (\varvec{P}\log |z|)\).
- Step 3::
-
Compute \(f(z) = \langle \varvec{f}^{(a)},\varvec{z}^{(a)}\rangle - \langle \varvec{f}^{(b)}, \varvec{z}^{(b)}\rangle \), where \(\varvec{f}^{(a)} =\varvec{f}(\mathtt{find}(\bar{s}==0))\) and \(\varvec{f}^{(b)} =\varvec{f}(\mathtt{find}(\bar{s}==1)).\)
(The above procedure can be modified slightly to handle the case when z has some zero components.) Note that in the above procedure, \( \langle \varvec{f}^{(a)},\varvec{z}^{(a)}\rangle \) and \( \langle \varvec{f}^{(b)}, \varvec{z}^{(b)}\rangle \) correspond to the sums of the positively and negatively signed terms in the evaluation of f(z). By separating these two sums, we hope to reduce cancellation errors.
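The three Matlab-style steps above can be mirrored in Python as follows (a sketch assuming, as in Step 1, that each row of F holds an exponent vector followed by its coefficient, and that all \(z_i\ne 0\)):

```python
import numpy as np

def eval_poly(F, z):
    """Evaluate f(z) from the data format F = [exponent rows | coefficients],
    splitting monomials by the sign of z^alpha as in Steps 1-3 to reduce
    cancellation. Assumes z_i != 0 for all i."""
    P = F[:, :-1]                          # exponent vectors alpha
    f = F[:, -1]                           # coefficients f_alpha
    s = (z < 0).astype(float)              # sign pattern of z
    sbar = np.remainder(P @ s, 2)          # parity: sign of each monomial z^alpha
    zbar = np.exp(P @ np.log(np.abs(z)))   # |z|^alpha for every row alpha
    terms = f * zbar
    pos = terms[sbar == 0].sum()           # monomials with positive sign z^alpha
    neg = terms[sbar == 1].sum()           # monomials with negative sign z^alpha
    return pos - neg

# f(x, y) = 3 x^2 y - 2 y^3 at z = (-1, 2): 6 - 16 = -10
F = np.array([[2.0, 1.0, 3.0], [0.0, 3.0, -2.0]])
value = eval_poly(F, np.array([-1.0, 2.0]))
```
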
We should mention that some of the equality constraints in (8) may be redundant. To reduce the computational cost and improve numerical stability, we remove these redundant constraints before solving the SDP. However, as d increases, the linear constraints become more and more nearly dependent, and typically the SDP problem can no longer be solved accurately by either SDPT3 or SeDuMi.
Another numerical issue which we should point out is that the constraint matrix
associated with the nonnegative vector \((\lambda _{\alpha \beta })\) is typically fully dense. Such a matrix consumes too much memory and computational effort when d increases or when m is large.
4 Numerical experiments
We call our approach BSOS (for hierarchy with bounded degree SOS). As mentioned in the Introduction, we conduct experiments on three classes of problems which will be described in the ensuing subsections.
4.1 Comparison of BSOS with GloptiPoly
We construct a set of test functions with 5 constraints. The test functions are mainly generated based on the following two problems:
The optimal value of \((P_1)\) is \(f(x^*)=-0.57491\), as computed by GloptiPoly3. For BSOS, we get the result \(q^{k=1}_{d=1} = -0.57491\), which is the exact result.
The second problem is
The optimal value of \((P_2)\) is \(f(x^*)=-0.037037\), as computed by GloptiPoly3. The results obtained by BSOS are
Based on the above two problems, we increase the degree of the objective function and constraint functions to generate other test instances which are given explicitly in the Appendix.
Table 1 compares the results obtained by BSOS and GloptiPoly3 for the tested instances. We observe that BSOS can give the exact result for those problems with either low degree or low dimension, while also providing a good lower bound for high-degree and high-dimensional problems. In particular, on this sample of problems, k is chosen so that the size of the semidefinite constraint (which is \(\binom{n+k}{n}\)) is the same as the one needed by GloptiPoly to certify global optimality. Notice then that BSOS succeeds in finding the optimal value even though the positivity certificate used in (8) is not Putinar’s certificate (3) used in GloptiPoly. In addition, for most of the test problems, BSOS usually gets better bounds as d increases, and in most cases the bound is good enough for small \(d=2,3\).
In Table 1, we also use the sufficient condition stated in Lemma 1 to check whether the generated lower bound is indeed optimal. For quite a number of instances, the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) associated with the optimal solution \(\theta ^*\) of (9) indeed has numerical rank equal to one (we declare that the matrix has numerical rank equal to one if the largest eigenvalue is at least \(10^4\) times larger than the second largest eigenvalue), which certifies that the lower bound is actually the optimal value. We should note that for some of the instances, although the lower bound is actually the optimal value (as declared by GloptiPoly), the rank of the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) is larger than one.
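The numerical rank-one test described above can be sketched in a few lines (our own helper name; the \(10^4\) threshold is the one used in the text):

```python
import numpy as np

def numerical_rank_one(M, ratio=1e4):
    """Declare a symmetric moment matrix numerically rank one when its largest
    eigenvalue is at least `ratio` times the second largest (in magnitude)."""
    w = np.sort(np.abs(np.linalg.eigvalsh(M)))[::-1]
    return len(w) < 2 or w[0] >= ratio * w[1]

v = np.array([1.0, 2.0, 3.0])
dirac_like = numerical_rank_one(np.outer(v, v))   # exact rank-one matrix
spread = numerical_rank_one(np.eye(3))            # three equal eigenvalues
```
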
4.2 Comparison of BSOS with the LP relaxations of Krivine-Stengle on convex problems
Here we compare the performance of BSOS with the LP relaxations of Krivine-Stengle on convex problems where each test problem has 5 constraint functions in addition to the nonnegativity constraints \(x\ge 0\). Note that the LP relaxation problem has exactly the same form as in (8), except that the positive semidefinite matrix variable Q is set to 0. We should mention that even though the Krivine-Stengle scheme generates LP problems instead of SDP problems, the size of the corresponding LP problems also increases rapidly with d, as for the BSOS scheme. In particular, in both the LP- and BSOS-relaxations, the dimension of the nonnegative variable \(\lambda \) is \(\binom{2m+d}{d}\), and the constraint matrix is fully dense. (The BSOS relaxations include an additional semidefinite constraint with fixed matrix size \(\binom{n+k}{k}\).) The following example illustrates the performance of the LP relaxation method:
For this problem, the functions f and \(-g_i\)’s are all convex. The optimal value for this problem is \(f(x^*)=-0.7500\), as computed by GloptiPoly3. For BSOS, we get \(q^{k=2}_{d=1}=-0.7500\), and we obtained the exact result by just choosing \(d=1\). This observation is consistent with Theorem 4.1 in Lasserre (2013). For the LP relaxation method, we get the following values for various choices of d:
Observe that as d increases, we obtain a better lower bound on the exact optimal value. However, the LP relaxation problem also becomes increasingly ill-conditioned, and the solver has difficulty solving the LP accurately. In particular, for \(d=5\), both the solvers SeDuMi and SDPT3 fail to compute a solution accurate enough for the LP to yield a sensible lower bound on \(f(x^*)\).
In Table 2, we observe that BSOS can achieve the exact result with \(d=1\) for all the test instances. In contrast, the LP relaxation method of Krivine-Stengle does not perform very well even though the test instances are convex problems. In particular, observe that for the last instance C20_2, the LP relaxation method cannot produce a good lower bound even when we choose \(d=3\), and the time taken to solve the corresponding LP is about 40 minutes.
4.3 Performance of BSOS on quadratic problems with polyhedral constraints
Here we consider the following problem:
where A is a given \(n\times n\) symmetric matrix. In our numerical experiments, we generate random instances such as Qn10_r2 for which \(n=10\) and A is randomly generated so that it has \(r=2\) negative eigenvalues and \(n-r\) positive eigenvalues as follows:
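The paper's exact generator is not reproduced here; a plausible sketch of one standard construction (a hypothetical recipe: prescribe the eigenvalue signs and conjugate by a random orthogonal matrix) is:

```python
import numpy as np

def random_indefinite(n, r, seed=0):
    """Random symmetric A with r negative and n - r positive eigenvalues:
    A = Q diag(d) Q^T with a random orthogonal Q (hypothetical construction,
    not necessarily the one used for the Qn*_r* instances)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    d = np.concatenate([-rng.uniform(0.5, 1.5, r),
                        rng.uniform(0.5, 1.5, n - r)])
    return Q @ np.diag(d) @ Q.T

A = random_indefinite(10, 2)   # analogue of an instance like Qn10_r2
```
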
Table 3 compares the performance of BSOS and GloptiPoly3. From the numerical results, we can see that BSOS is far more efficient than GloptiPoly3 in solving problems (11). For example, for the problem Qn20_r2 with \(n=20\), BSOS took only 1.9 seconds to generate the lower bound \(-2.0356e3\), but GloptiPoly3 took more than 1 hour to generate the same bound. The disparity in efficiency between BSOS and GloptiPoly3 is expected to become even wider for instances with n larger than 20.
In Table 3, we again use the sufficient condition stated in Lemma 1 to check whether the generated lower bound is indeed optimal. For each of the first eight instances, the moment matrix \(\mathbf {M}_\ell (\mathbf {y}^*)\) associated with the optimal solution \(\theta ^*\) of (9) has numerical rank equal to one (we declare that the matrix has numerical rank equal to one if the largest eigenvalue is at least \(10^4\) times larger than the second largest eigenvalue), which certifies that the lower bound is actually the optimal value.
4.4 Performance of BSOS on higher order problems with more variables
Here we consider the following problem:
where \(\ell =2\) or \(\ell =4\) and the \(c_{\alpha }\) are randomly generated in \([-1,1]\). In our numerical experiments, we generated instances such as Hn20_4, for which \(n=20\) and the degree of the polynomial is 4. The problem is to find the minimum of a polynomial on the Euclidean unit ball intersected with the positive orthant.
Table 4 displays the respective performance of BSOS and GloptiPoly3. From the numerical results, we can see that BSOS is far more efficient than GloptiPoly3 in solving problems (12). For example, for problem Hn20_4 (with \(n=20\) variables and degree \(\ell =4\)), BSOS took about 250 s to generate the first lower bound \(-2.1860\), and nearly 300 s to generate a better lower bound of \(-1.5943\), which is also the exact optimal value of the problem; GloptiPoly3 ran out of memory when solving the same problem. Similarly, for problem Hn40_2, both BSOS and GloptiPoly3 generated the first lower bound of \(-2.3789\) in very little time; to improve on it, BSOS took 29 s to reach the optimal value of the problem, whereas GloptiPoly3 ran out of memory. From our observations, the disparity in efficiency between BSOS and GloptiPoly3 will become wider for instances with larger n and/or higher degree.
5 Conclusion
We have described and tested a new hierarchy of semidefinite relaxations for global polynomial optimization that combines some advantages of previously defined LP- and SOS-hierarchies. Essentially, it uses a positivity certificate already employed in the LP-hierarchy, but with an additional semidefinite constraint, which makes it an SOS-hierarchy. The main and crucial point, however, is that the size of this additional semidefinite constraint is fixed in advance and decided by the user (in contrast to the standard SOS-hierarchy, in which the size of the semidefinite constraint increases along the hierarchy). Preliminary results are encouraging, especially for nonconvex problems on convex polytopes, where problems with up to 100 variables have been solved in a reasonable amount of time (whereas the standard SOS-hierarchy of GloptiPoly cannot be implemented at this size).
For problems of larger size, one needs to address serious numerical issues caused by the presence of a fully dense submatrix and nearly dependent linear constraints. In addition, to handle large-scale problems one also needs a “sparse version” of this hierarchy, analogous to the sparse version of the SOS-hierarchy defined in Waki et al. (2006). Both issues (a topic of further investigation) are certainly nontrivial, the latter in particular because the positivity certificate used in this new hierarchy involves products of the initial polynomial constraints, which destroys the sparsity pattern exploited in Waki et al. (2006).
Notes
An SOS-convex polynomial is a convex polynomial whose Hessian factors as \(L(x)L(x)^T\) for some rectangular matrix polynomial L. For instance, separable convex polynomials are SOS-convex.
A polynomial \(f\in \mathbb {R}[x]\) is SOS-convex if its Hessian \(\nabla ^2 f\) is an SOS matrix, i.e., \(\nabla ^2f(x)=L(x)\,L(x)^T\) for some matrix polynomial \(L\in \mathbb {R}[x]^{n\times p}\) and some \(p\in \mathbb {N}\).
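As a concrete illustration (our own example, not from the text), the separable polynomial \(f(x)=x_1^4+x_2^2\) is SOS-convex, since its Hessian factors as required:

\[
\nabla ^2 f(x) \;=\; \begin{pmatrix} 12x_1^2 & 0\\ 0 & 2\end{pmatrix}
\;=\; L(x)\,L(x)^T, \qquad
L(x) \;=\; \begin{pmatrix} 2\sqrt{3}\,x_1 & 0\\ 0 & \sqrt{2}\end{pmatrix}.
\]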
References
Ahmadi AA, Majumdar A (2014) DSOS and SDSOS Optimization: LP and SOCP-Based Alternatives to SOS Optimization. In: Proceedings of the 48th annual conference on information sciences and systems, Princeton, NJ, USA, pp 1–5
Benabbas S, Magen A (2010) Extending SDP integrality gaps to Sherali–Adams with applications to quadratic programming and MaxCutGain, In: Shepherd FB (ed) Integer programming and combinatorial optimization. Lecture Notes in Computer Science, Springer, New York, pp 299–312
Curto RE, Fialkow LA (2000) The truncated complex K-moment problem. Trans Amer Math Soc 352:2825–2855
Curto RE, Fialkow LA (2005) Truncated K-moment problem in several variables. J Oper Theory 54:189–226
Chlamtac E, Tulsiani M (2012) Convex relaxations and integrality gaps. In: Anjos M, Lasserre JB (eds) Handbook of semidefinite, conic and polynomial optimization. Springer, New York, pp 139–170
de Klerk E, Laurent M (2011) On the Lasserre hierarchy of semidefinite programming relaxations of convex polynomial optimization problems. SIAM J Optim 21:824–832
Floudas CA, Pardalos PM (1990) A collection of test problems for constrained global optimization algorithms. Lecture Notes in Computer Science, vol 455, Springer, Berlin
Henrion D, Lasserre JB (2005) Detecting global optimality and extracting solutions in Gloptipoly. In: Henrion D, Garulli A(eds) Positive polynomials in control, Lecture Notes in Control and Information Science, vol 312, pp 293–310
Henrion D, Lasserre JB, Lofberg J (2009) GloptiPoly 3: moments, optimization and semidefinite programming. Optim Methods Softw 24:761–779
Helton JW, Nie J (2010) Semidefinite representations of convex sets. Math Program 122:21–64
Krivine JL (1964) Anneaux préordonnés. J Anal Math 12:307–326
Lasserre JB (2001) Global optimization with polynomials and the problem of moments. SIAM J Optim 11:796–817
Lasserre JB (2002) Semidefinite programming vs. LP relaxations for polynomial programming. Math Oper Res 27:347–360
Lasserre JB (2009) Convexity in semi-algebraic geometry and polynomial optimization. SIAM J Optim 19:1995–2014
Lasserre JB (2009) Moments, positive polynomials and their applications. Imperial College Press, London
Lasserre JB (2013) A Lagrangian relaxation view of linear and semidefinite hierarchies. SIAM J Optim 23:1742–1756
Laurent M (2003) A comparison of the Sherali–Adams, Lovász–Schrijver and Lasserre relaxations for 0–1 programming. Math Oper Res 28:470–496
Marshall M (2009) Representation of non-negative polynomials, degree bounds and applications to optimization. Can J Math 61:205–221
Nie J (2014) Optimality conditions and finite convergence of Lasserre’s hierarchy. Math Program 146:97–121
Putinar M (1993) Positive polynomials on compact semi-algebraic sets. Indiana Univ Math J 42:969–984
Sherali HD, Adams WP (1990) A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM J Discr Math 3:411–430
Sherali HD, Adams WP (1999) A reformulation-linearization technique for solving discrete and continuous nonconvex problems. Kluwer Academic Publishers, Dordrecht
Stengle G (1974) A nullstellensatz and a positivstellensatz in semialgebraic geometry. Math Ann 207:87–97
Toh KC, Todd MJ, Tutuncu RH (1999) SDPT3—a Matlab software package for semidefinite programming. Optim Methods Softw 11:545–581
Toh KC, Todd MJ, Tutuncu RH (2003) Solving semidefinite-quadratic-linear programs using SDPT3. Math Program 95:189–217
Waki H, Kim S, Kojima M, Muramatsu M (2006) Sums of squares and semidefinite programming relaxations for polynomial optimization problems with structured sparsity. SIAM J Optim 17:218–242
Acknowledgments
The work of the first author is partially supported by a PGMO grant from Fondation Mathématique Jacques Hadamard, and an ERC-ADG grant from the European Research Council (ERC): grant agreement 666981 TAMING.
Appendix
Before proving Lemma 1 we need to introduce some notation. Given \(k\in \mathbb {N}\) fixed, let \(\tau = \max \{\mathrm{deg}(f), 2k, d\max _{j}\{\mathrm{deg}(g_j)\}\}\). For a sequence \(\mathbf {y}=(y_\alpha )_{\alpha \in \mathbb {N}^n_\tau }\), let \(L_\mathbf {y}:\mathbb {R}[x]_\tau \rightarrow \mathbb {R}\) be the Riesz functional:
and let \(\mathbf {M}_k(\mathbf {y})\) be the moment matrix of order k, associated with \(\mathbf {y}\). If \(q\in \mathbb {R}[x]_k\) with coefficient vector \(\mathbf {q}=(q_\alpha )\), then \(\langle \mathbf {q},\mathbf {M}_k(\mathbf {y})\,\mathbf {q}\rangle =L_\mathbf {y}(q^2)\) and if \(\mathbf {y}\) is the (truncated) moment sequence of a measure \(\mu \),
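To make the definitions concrete, the following sketch (our own helper names; the moments \(\mathbf {y}\) are passed as a dictionary indexed by exponent tuples) assembles \(\mathbf {M}_k(\mathbf {y})\) entrywise from \(M_k(\mathbf {y})[\alpha ,\beta ]=y_{\alpha +\beta }\), and illustrates the rank-one structure on the moments of a Dirac measure:

```python
import itertools
import numpy as np

def monomials(n, k):
    """Exponent vectors alpha in N^n with |alpha| <= k, in graded order."""
    return [a for d in range(k + 1)
            for a in itertools.product(range(d + 1), repeat=n) if sum(a) == d]

def moment_matrix(y, n, k):
    """M_k(y)[a, b] = y_{a+b}, so that <q, M_k(y) q> = L_y(q^2)."""
    basis = monomials(n, k)
    M = np.empty((len(basis), len(basis)))
    for i, a in enumerate(basis):
        for j, b in enumerate(basis):
            M[i, j] = y[tuple(x + z for x, z in zip(a, b))]
    return M

# Moments of the Dirac measure at x* = (2,): y_alpha = (x*)^alpha = 2^alpha
y = {(a,): 2.0 ** a for a in range(5)}
M = moment_matrix(y, n=1, k=2)
# Rank one, as expected for a Dirac measure: M = v_2(x*) v_2(x*)^T
print(int(np.linalg.matrix_rank(M)))  # 1
```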
1.1 Proof of Lemma 1
(a) We first prove that the dual of (7), which is the semidefinite program:
satisfies Slater’s condition. Recall that \(\mathbf {K}\) has nonempty interior; so let \(\mathbf {y}\) be the sequence of moments of the Lebesgue measure \(\mu \) on \(\mathbf {K}\), scaled to be a probability measure, so that \(L_\mathbf {y}(1)=1\). Necessarily \(\mathbf {M}_k(\mathbf {y})\succ 0\); otherwise there would exist \(0\ne q\in \mathbb {R}[x]_k\) such that
But then q vanishes almost everywhere on \(\mathbf {K}\), which (since \(\mathbf {K}\) has nonempty interior and q is a polynomial) implies \(q=0\), a contradiction.
Next, observe that for each \((\alpha ,\beta )\in \mathbb {N}^{2m}_d\), the polynomial \(h_{\alpha \beta }\in \mathbb {R}[x]_\tau \) is nonnegative on \(\mathbf {K}\), and since there exists \(x_0\in \mathbf {K}\) such that \(0<g_j(x_0)<1\) for all \(j=1,\ldots ,m\), there is an open set \(O\subset \mathbf {K}\) such that \(h_{\alpha \beta }(x)>0\) on O for all \((\alpha ,\beta )\in \mathbb {N}^{2m}_d\). Therefore
Therefore \(\mathbf {y}\) is a strictly feasible solution of (13); that is, Slater’s condition holds for (13). Hence \(\rho ^k_d=q^k_d\) for all d. It remains to prove that \(q^k_d>-\infty \), but this follows from Theorem 1(b) as soon as d is sufficiently large, say \(d\ge d_0\) for some integer \(d_0\): indeed, then \(-\infty <\theta _d\le q^k_d\le f^*\) for all \(d\ge d_0\) (where \(\theta _d\) is defined in (4)). Finally, for each fixed d, (7) and (8) have the same optimal value \(q^k_d\), and an optimal solution \((q^k_d,\lambda ^*,Q^*)\) of (7) is also an optimal solution of (8).
(b) Let \(\theta ^*\) be an optimal solution of (9) and let \(\mathbf {y}^*\) be as in (10).
\(\bullet \) If \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=1\) then \(\mathbf {M}_s(\mathbf {y}^*)=v_s(x^*)\,v_s(x^*)^T\) for some \(x^*\in \mathbb {R}^n\); this is due to the Hankel-like structure of the moment matrix combined with the rank-one property. So by definition of the moment matrix \(\mathbf {M}_s(\mathbf {y}^*)\), \(\mathbf {y}^*=(y^*_\alpha )\), \(\alpha \in \mathbb {N}^n_{2s}\), is the vector of moments (up to order 2s) of the Dirac measure \(\delta _{x^*}\) at the point \(x^*\). That is, \(y^*_\alpha =(x^*)^\alpha \) for every \(\alpha \in \mathbb {N}^n_{2s}\). But from (10),
In particular, for moments of order 1 we obtain \(x^*=\sum _{p=1}^L\theta ^*_p\,x^{(p)}\). In other words, up to moments of order 2s, one cannot distinguish the Dirac measure \(\delta _{x^*}\) at \(x^*\) from the signed measure \(\mu =\sum _p\theta ^*_p\delta _{x^{(p)}}\) (recall that the \(\theta ^*_p\)’s are not necessarily nonnegative). That is, \((x^*)^\alpha =\int x^\alpha d\delta _{x^*}=\int x^\alpha \mathrm{d}\mu \) for all \(\alpha \in \mathbb {N}^n_{2s}\). This in turn implies that for every \(q\in \mathbb {R}[x]_{2s}\):
Next, as \(\theta ^*\) is feasible for (9) and \(2s\ge \max [\mathrm{deg}(f);\,\mathrm{deg}(g_j)]\),
In particular, choosing \((\alpha ,\beta )\in \mathbb {N}^{2m}_{2s}\) such that \(h_{\alpha \beta }=g_j\) (i.e. \(\beta =0\), \(\alpha _i=\delta _{i=j}\)), one obtains \(g_j(x^*)\ge 0\), \(j=1,\ldots ,m\), which shows that \(x^*\in K\). In addition,
which proves that \(x^*\in K\) is an optimal solution of problem (P).
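When \(\mathbf {M}_s(\mathbf {y}^*)\) is (numerically) rank one, recovering \(x^*\) amounts to reading off the normalized first-order moments. A minimal sketch (our own function name), under the assumption that the monomial basis is ordered with the constant monomial first, followed by \(x_1,\ldots ,x_n\):

```python
import numpy as np

def extract_minimizer(M, n):
    """Recover x* from a (numerically) rank-one moment matrix M_s(y*).
    With basis order (1, x_1, ..., x_n, ...), M = v_s(x*) v_s(x*)^T,
    so row 0 reads (1, x*_1, ..., x*_n, ...) after dividing by M[0, 0]."""
    return M[0, 1:n + 1] / M[0, 0]

# Rank-one moment matrix of the Dirac measure at x* = (3, -1), order s = 1
xstar = np.array([3.0, -1.0])
v = np.concatenate([[1.0], xstar])   # v_1(x*) = (1, x*_1, x*_2)
M = np.outer(v, v)
print(extract_minimizer(M, 2))  # [ 3. -1.]
```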
\(\bullet \) If \(\mathbf {M}_s(\mathbf {y}^*)\succeq 0\), \(\mathbf {M}_{s-r}(g_j\,\mathbf {y}^*)\succeq 0\), \(j=1,\ldots ,m\), and \(\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)=\mathrm{rank}\,\mathbf {M}_{s-r}(\mathbf {y}^*)\), then by Curto and Fialkow (2000; 2005, Theorem 1.1), \(\mathbf {y}^*\) is the vector of moments, up to order 2s, of some atomic probability measure \(\mu \) supported on \(v:=\mathrm{rank}\,\mathbf {M}_s(\mathbf {y}^*)\) points \(z(i)\in \mathbf {K}\), \(i=1,\ldots ,v\). That is, there exist positive weights \((w_i)\subset \mathbb {R}_+\) such that
Therefore,
which shows that \(\tilde{q}^k_d=f^*\). In addition
which implies \(f(z(i))=f^*\) for every \(i=1,\ldots ,v\). Finally, the v global minimizers can be extracted from the moment matrix \(\mathbf {M}_s(\mathbf {y}^*)\) by the simple linear algebra procedure described in Henrion and Lasserre (2005). \(\square \)
1.2 Test functions for BSOS and GloptiPoly in Table 1
Example P4_2 (4 variables, degree 2):
Example P4_4 (4 variables, degree 4):
Example P4_6 (4 variables, degree 6):
Example P4_8 (4 variables, degree 8):
Example P6_2 (6 variables, degree 2):
Example P6_4 (6 variables, degree 4):
Example P6_6 (6 variables, degree 6):
Example P6_8 (6 variables, degree 8):
Example P8_2 (8 variables, degree 2):
Example P8_4 (8 variables, degree 4):
Example P8_6 (8 variables, degree 6):
Example P10_2 (10 variables, degree 2):
Example P10_4 (10 variables, degree 4):
Example P20_2 (20 variables, degree 2):
Example P20_4 (20 variables, degree 4): same as P20_2 except that f is replaced by
1.3 Test functions for BSOS versus LP relaxations of Krivine-Stengle on convex problems in Table 2
Example C4_2 (4 variables, degree 2):
Example C4_4 (4 variables, degree 4):
Example C4_6 (4 variables, degree 6):
Example C6_2 (6 variables, degree 2):
Example C6_4 (6 variables, degree 4):
Example C6_6 (6 variables, degree 6):
Example C8_2 (8 variables, degree 2):
Example C8_4 (8 variables, degree 4):
Example C10_2 (10 variables, degree 2):
Example C10_4 (10 variables, degree 4):
Example C20_2 (20 variables, degree 2):
Lasserre, J.B., Toh, KC. & Yang, S. A bounded degree SOS hierarchy for polynomial optimization. EURO J Comput Optim 5, 87–117 (2017). https://doi.org/10.1007/s13675-015-0050-y