1 Introduction

This paper is concerned with polynomial optimization on non-compact semialgebraic sets. Its spirit and main motivation are to deliberately avoid the big-ball trick, which reduces the problem to the compact case. The big-ball “trick” simply assumes that the global minimum is attained in some a priori known ball \(B_M\) centered at zero, of radius \(M>0\) potentially large. By adding this constraint to the definition of the feasible set, one is back in the compact case.

Why? This “trick” certainly has some merit, since in some practical applications such an M can be determined with ad hoc arguments. However, it is not satisfactory from a mathematical point of view. Indeed, after one has found a minimizer \(x^\star \in B_M\), one is still left with the questions: is \(x^\star \) really a global minimizer? Was M chosen sufficiently large? In other words, in doing so one does not obtain a certificate that \(x^\star \) is a global minimizer. As we will see, the challenge is to adapt some certificates of positivity on non-compact sets already available in the literature, and to turn them into a practical algorithm.

Background Deciding nonnegativity of a polynomial has been an important and attractive problem throughout the history of real algebraic geometry. In his famous and seminal work [16], Hilbert characterized all cases where nonnegative polynomials are sums of squares (SOS) of polynomials, and Blekherman later showed in [4] that there are significantly more nonnegative polynomials than SOS. In 1927, Artin proved in [1] that every nonnegative polynomial can be decomposed as a sum of squares of rational functions, thereby solving Hilbert’s 17th problem. Namely, f is nonnegative if and only if \(\sigma _D f = \sigma _N\) for some SOS polynomials \(\sigma _N\) and \(\sigma _D\ne 0\). Later on, certificates of positivity on a general semialgebraic set were proposed by Stengle [58] (see also Krivine [23]). A basic semialgebraic set S(g,h) can be written as

$$\begin{aligned} S(g,h)\,:=\,\left\{ \,x \in {\mathbb {R}}^n :\, g_j(x)\ge 0,\,j=1,\ldots ,m;\,h_t(x)=0,\,t=1,\ldots ,l\,\right\} , \end{aligned}$$
(1.1)

where n is the dimension of the ambient space and \((g_j, h_{t})\) are polynomials. Stengle and Krivine rely on a tool from real algebraic geometry called the preordering

$$\begin{aligned} P\left( {g,h} \right) := \left\{ {\sum \limits _{\alpha \in {{\{ 0,1\} }^m}} {{\sigma _\alpha }g_1^{{\alpha _1}} \ldots g_m^{{\alpha _m}}} + \sum \limits _{t = 1}^l {{\phi _t}{h_t}} :\, {\sigma _\alpha } \in \varSigma \left[ x \right] ,\ {\phi _t} \in {\mathbb {R}}\left[ x \right] } \right\} , \end{aligned}$$
(1.2)

associated with the polynomials \((g_j,h_t)\). Here \({\mathbb {R}}[x]\) denotes the ring of real polynomials and \(\varSigma [x] \subset {\mathbb {R}}[x]\) stands for the set of SOS polynomials. Krivine-Stengle’s Positivstellensatz (or representation) states that

$$\begin{aligned} f \ge 0\text { on }S( {g,h} ) \Leftrightarrow \exists {q_1},{q_2} \in P( {g,h} ),\ s \in {\mathbb {N}}:\ {q_1}f = {f^{2s}} + {q_2} \end{aligned}$$
(1.3)
$$\begin{aligned} f > 0 \text { on }S({g,h} ) \Leftrightarrow \exists {q_1},{q_2} \in P( {g,h} ):\, {q_1}f = 1 + {q_2}. \end{aligned}$$
(1.4)

Notice that the above representations involve a multiplier \(q_1\) for f as well as cross-products of the \(g_j\)’s in (1.2). In 1993, Putinar [45] refined a result of Schmüdgen [54] on certificates of positivity on a basic semialgebraic set (1.1) assumed to be compact and satisfying an Archimedean assumption, described below. It avoids a multiplier for f as well as cross-products of the \(g_j\)’s. Namely, Putinar’s Positivstellensatz states that if f is positive on S(g,h), then f belongs to the set

$$\begin{aligned} Q\left( {g,h} \right) : = \left\{ {\sigma _0 + \sum \limits _{j = 1}^m {{\sigma _j}{g_j}} + \sum \limits _{t = 1}^l {{\phi _t}{h_t}} :\,{\sigma _j} \in \varSigma \left[ x \right] ,\,{\phi _t} \in {\mathbb {R}}\left[ x \right] } \right\} . \end{aligned}$$
(1.5)

The set Q(g,h) is called the quadratic module associated with the polynomials \((g_j,h_t)\).

SOS for optimization. More recently, and since the pioneering works of Lasserre [24] and Parrilo [43], SOS-based certificates of nonnegativity have become a powerful tool in polynomial optimization and control. In the unconstrained case, let \(f^\star := \inf _{x \in {\mathbb {R}}^n} f(x)\). If \(f-f^\star \) (which is \(\ge 0\) on \({\mathbb {R}}^n\)) is an SOS polynomial, then \(f^\star \) can be obtained by solving a single semidefinite program (SDP). However, in general \(f-f^\star \) is an SOS of rational functions, which yields:

$$\begin{aligned} f^\star \,=\,\displaystyle \sup _{\lambda ,\sigma _N,\sigma _D}\, \{\lambda : \,\sigma _D\,(f-\lambda )\,=\,\sigma _N\,;\quad \sigma _N,\sigma _D\,\in \,\varSigma [x]\,;\quad \sigma _D(0)= 1\,\}\, \end{aligned}$$
(1.6)

By fixing in advance a bound d on the degree of \(\sigma _D\), one may solve (1.6) by SDP combined with a bisection search on \(\lambda \), increasing d when no solution exists. The normalization constraint \(\sigma _D(0)= 1\) ensures that neither \(\sigma _D\) nor \(\sigma _N\) is the zero polynomial. In the constrained case, let S(g,h) in (1.1) be compact and assume with no loss of generality that the so-called Archimedean assumption holds, namely that \(L-\Vert x\Vert ^2_2\) belongs to \(Q\left( {g,h} \right) \) for some \(L>0\). This can be automatically ensured by setting \(g_m(x)=L-\Vert x\Vert ^2_2\). Under this assumption, the second author provides in [24] a so-called moment-SOS hierarchy of semidefinite relaxations based on Putinar’s representation, yielding a non-decreasing sequence of lower bounds on \({f^\star } := \min _{x \in S({g,h})}f( x)\), which converges to \(f^\star \). Convergence is finite for generic constraints S(g,h) due to [39], and a numerical procedure from [15] allows one to extract global minimizers from an optimal solution of the (exact) semidefinite relaxation in the hierarchy. It relies on the flat extension condition of Curto and Fialkow [8, 30]. In the above-mentioned frameworks, compactness of S(g,h) is crucial.
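To illustrate the bisection on \(\lambda \) mentioned above: in (1.6) the feasibility oracle is an SDP, but for a univariate toy instance (where global nonnegativity coincides with being SOS) the oracle can be emulated by checking the sign of \(f-\lambda \) at its real critical points. A hedged sketch, with `is_nonnegative` playing the role of the SDP feasibility test (both function names are ours, for illustration only):

```python
import numpy as np

def is_nonnegative(coeffs):
    """Check p(x) >= 0 on R for a univariate polynomial (coefficients in
    decreasing degree order), by inspecting its real critical points."""
    p = np.poly1d(coeffs)
    crit = p.deriv().r                      # roots of p'
    real_crit = crit[np.abs(crit.imag) < 1e-9].real
    vals = np.polyval(coeffs, real_crit)
    return vals.size == 0 or vals.min() >= -1e-9

def bisect_lower_bound(coeffs, lo, hi, tol=1e-8):
    """Largest lambda in [lo, hi] such that f - lambda is nonnegative on R;
    in (1.6) this membership test would be an SDP feasibility problem."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        shifted = list(coeffs)
        shifted[-1] -= mid                  # constant coefficient of f - lambda
        if is_nonnegative(shifted):
            lo = mid                        # f - mid >= 0: raise the bound
        else:
            hi = mid
    return lo

# f(x) = x^4 - 2x^2, with global minimum f* = -1 attained at x = +/-1
f = [1.0, 0.0, -2.0, 0.0, 0.0]
print(bisect_lower_bound(f, -10.0, 10.0))   # ~ -1.0
```

The calculus-based oracle is exact only in the univariate case; in the general setting of (1.6) it must be replaced by an SDP over \(\sigma _N,\sigma _D\).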

Related works on SOS approximations of nonnegative polynomials Blekherman’s result [4] states that, for a fixed degree, the cone of nonnegative polynomials is significantly larger than the cone of SOS polynomials. This is in contrast with the denseness result from [3, Theorem 5], which establishes that the cone of SOS polynomials is dense, for the \(l_1\)-norm of coefficients defined by \(\Vert f\Vert _1 = \sum _{\alpha } |f_\alpha |\) (where one writes \(f = \sum _{\alpha } f_\alpha x^{\alpha }\) in the standard canonical basis of monomials), in the space of polynomials nonnegative on \([-1, 1]\). Other denseness results from [25, 29] are based on perturbations of nonnegative polynomials to obtain SOS certificates. In [25], Lasserre states that any given nonnegative polynomial f can be approximated by a sequence \((f_\varepsilon )_{\varepsilon }\) of SOS polynomials given by

$$\begin{aligned} {f_\varepsilon } := f + \varepsilon \sum \limits _{k = 0}^{{r_\varepsilon }} {\sum \limits _{j = 1}^n {x_j^{2k}/( {k!} )} } , \ \varepsilon >0 , \end{aligned}$$

for some \(r_\varepsilon \in {\mathbb {N}}\), so that \(\Vert f_\varepsilon - f\Vert _1 \rightarrow 0\) as \(\varepsilon \downarrow 0\). Similarly, Lasserre and Netzer prove in [29] that every polynomial f nonnegative on the unit box \([0, 1]^n\) can be approximated in \(l_1\)-norm by a sequence of SOS polynomials

$$\begin{aligned} {f_{r \varepsilon }} := f + \varepsilon \left( {1 + \sum \limits _{j = 1}^n {x_j^{2r}} } \right) , \ \varepsilon >0 . \end{aligned}$$

Provided that r is large enough, one has \(\Vert f_{r \varepsilon } - f \Vert _1 \rightarrow 0\) as \(\varepsilon \downarrow 0\). Jibetean and Laurent [21] compute tight upper bounds for the unconstrained polynomial optimization problem \(f^\star :=\min _{x\in {\mathbb {R}}^n}f(x)\) based on the perturbed problem \(f^\star _\varepsilon :=\inf _{x\in {\mathbb {R}}^n}f(x)+\varepsilon \sum _{j = 1}^n {x_j^{2d + 2}} \) and SDP relaxations over the gradient ideal (see also [41]). Beyond their theoretical interest, these approximation results allow one to interpret some paradoxical behaviors (due to numerical roundoff errors) observed when relying on SDP relaxations for polynomial optimization. Such behavior occurs, for instance, when extracting the minimizers of Motzkin’s polynomial \(f = ( {x_1^2 + x_2^2-1})x_1^2x_2^2+1/27\) with the algorithmic procedure from [15]. Motzkin’s polynomial is globally nonnegative but does not belong to the SOS cone. However, the perturbed polynomial \(\tilde{f} = f + \varepsilon (1 + x_1^6 + x_2^6)\) is an SOS for small \(\varepsilon > 0\). This implies that an SDP solver can find an approximation of the optimal value of f for a sufficiently high relaxation order, and that one can extract the global minimizers of f. In fact, as shown in [28], an SDP solver performs “robust optimization” in the following sense: instead of solving the original optimization problem with nominal criterion f, the solver considers a perturbed criterion which lies in a ball of small radius \(\varepsilon \) centered at f. In [37], the authors explain a similar paradox occurring in a noncommutative setting.
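Motzkin’s example can be checked numerically; a small grid sanity check with numpy (illustrative only, not a certificate of nonnegativity):

```python
import numpy as np

def motzkin(x1, x2):
    # f = (x1^2 + x2^2 - 1) x1^2 x2^2 + 1/27 (globally nonnegative, not SOS)
    return (x1**2 + x2**2 - 1.0) * x1**2 * x2**2 + 1.0 / 27.0

# f vanishes at the four global minimizers (x1, x2) = (+/-1/sqrt(3), +/-1/sqrt(3))
s = 1.0 / np.sqrt(3.0)
print(motzkin(s, s))          # 0 up to roundoff

# nonnegativity on a grid, and strict increase under the perturbation above
X1, X2 = np.meshgrid(np.linspace(-2, 2, 201), np.linspace(-2, 2, 201))
F = motzkin(X1, X2)
print(F.min() >= -1e-12)      # True

eps = 1e-3
F_pert = F + eps * (1.0 + X1**6 + X2**6)   # tilde f = f + eps(1 + x1^6 + x2^6)
print((F_pert > F).all())     # True: the perturbation is strictly positive
```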

As shown in [34], the user can also introduce perturbations to compensate the numerical uncertainties added by the solver. This perturbation/compensation scheme is the main ingredient of the hybrid numeric-symbolic algorithm from [34], designed to compute exact rational SOS decompositions for polynomials lying in the interior of the SOS cone.

The non-compact case There have been several attempts to provide a hierarchy of semidefinite relaxations when S(g,h) is not compact. In [20], the authors consider polynomial optimization problems with a non-compact feasible set in the special case where \(S( {g \cup \{ {c - f} \},h} )\) satisfies the Archimedean assumption. Later on, Dickinson and Povh [11] obtained a certificate for homogeneous polynomials positive on the intersection of the nonnegative orthant with a basic semialgebraic cone. The latter generalizes Pólya’s result [44], which states that one can always multiply a homogeneous polynomial positive on the nonnegative orthant by some power of \((x_1 + \dots + x_n)\) to obtain a polynomial with nonnegative coefficients. Reznick proves in [49] that any positive definite form can be multiplied by a sufficiently large power of \(\Vert x\Vert _2^2\) to become a sum of powers of linear forms (in particular, an SOS polynomial). For this specific class of nonnegative polynomials, Reznick’s result provides a suitable decomposition into SOS of rational functions, which can be computed in practice via SDP. Reznick’s result was generalized in [53], where Scheiderer proves that one can replace \(\Vert x\Vert _2^2\) by an even power of any strictly positive polynomial. An interesting related result is the Positivstellensatz [47] of Putinar and Vasilescu. Let us define

$$\begin{aligned} S(g)\,:=\,\{x:\,g_j(x)\ge 0,\, j=1,\ldots ,m\} \quad \text {and}\quad Q\left( {g} \right) \,: =\, \left\{ {\sigma _0+ \sum \limits _{j = 1}^m {{\sigma _j}{g_j}} :\;{\sigma _j} \in \varSigma [ x ]} \right\} . \end{aligned}$$

Theorem 1

(Putinar–Vasilescu [47, Corollary 4.3 and 4.4]) Let \(\theta \in {\mathbb {R}}[x]\) be the quadratic polynomial \(x\mapsto \theta (x):= 1 + \Vert x\Vert _2^2\), and denote by \(\tilde{p} \in {\mathbb {R}}[ {x,{x_{n + 1}}} ]\) the homogeneous polynomial associated with \(p\in {\mathbb {R}}[ {x} ]\), defined by \((x,x_{n+1})\mapsto \tilde{p}(x,x_{n+1}): = x_{n + 1}^{\deg ( p )}p( {x/{x_{n + 1}}} )\).

  1.

    Let \(f\in {\mathbb {R}}[ {x} ]\) be such that \(\tilde{f}>0\) on \({{\mathbb {R}}^{n + 1}}\backslash \{ 0 \}\). Then \({\theta ^k}f \in \varSigma [ x ]\) for some \(k\in {\mathbb {N}}\).

  2.

    Let \(f,g_1,\dots ,g_m \in {\mathbb {R}}[ {x} ]\) satisfy the following two conditions:

    (a)

      \(f = {f_0} + {f_1}\) such that \(\deg ({f_0}) < \deg ( {f_1})\) and \({{\tilde{f}}_1} > 0\) on \({{\mathbb {R}}^{n + 1}}\backslash \{ 0 \}\);

    (b)

      \(f > 0\) on S(g).

    Then \({\theta ^{2k}}f \in Q(g)\) for some \(k\in {\mathbb {N}}\).

As a consequence, they also obtain:

Corollary 1

(Putinar–Vasilescu [47, Final remark 2]) Let \(\theta := 1 + \Vert x\Vert _2^2\).

  1.

    Let \(f \in {\mathbb {R}}[ x ]_{2d}\) be such that \(f\ge 0\) on \({\mathbb {R}}^n\). Then for all \(\varepsilon >0\), there exists \(k\in {\mathbb {N}}\) such that \({\theta ^k}( {f + \varepsilon {\theta ^d}} ) \in \varSigma [ x]\).

  2.

    Let \(f \in {\mathbb {R}}[ x ]\) be such that \(f\ge 0\) on S(g). Let \(d\in {\mathbb {N}}\) be such that \(2d > \deg ( f )\). Then for all \(\varepsilon >0\), there exists \(k\in {\mathbb {N}}\) such that \({\theta ^{2k}}( {f + \varepsilon {\theta ^d}} ) \in Q ( g )\).

Marshall [36, Corollary 4.3] states a slightly more general result, but with no explicit d, and Schweighofer [55, Corollary 6.3] provides a new algebraic proof of Marshall’s result. To summarize: for every polynomial f nonnegative on a general basic semialgebraic set S(g), one obtains the following representation result: for a given \(\varepsilon >0\), there exist a nonnegative integer k and SOS polynomials \(\sigma _0, \sigma _1, \dots , \sigma _m\) such that

$$\begin{aligned} f + \varepsilon \theta ^d = \frac{{{\sigma _0} + {\sigma _1}{g_1} + \dots + {\sigma _m}{g_m}}}{{{\theta ^k}}} . \end{aligned}$$
(1.7)

Although this representation is theoretically attractive, the proofs just mentioned are not constructive and do not provide any explicit algorithm, especially for polynomial optimization. The underlying reason is that there are no degree bounds on the SOS weights \(\sigma _0,\dots ,\sigma _m\). Therefore, checking membership of \({\theta ^k}( {f + \varepsilon {\theta ^d}} )\) in Q(g) is still a challenge. If one fixes a degree k which is not large enough, then one would have to increase the degree of the \(\sigma _j\)’s forever without getting an answer. This restriction comes from the proof techniques used by Putinar and Vasilescu, Marshall, and Schweighofer in [47], [36] and [55], respectively. The main idea of Putinar and Vasilescu is to localize the ring of polynomials by allowing inverses of \({\Vert {( {x,{x_{n + 1}}} )} \Vert ^2_2}\), and then to use Cassier’s technique for separating two convex cones of rational functions. Marshall proves Corollary 1 by applying a generalized Jacobi–Prestel criterion. Schweighofer’s proof of representation (1.7) is based on the relationship between the subring of bounded elements and the subring of arithmetically bounded elements, together with Monnier’s conjecture. The main point of Schweighofer’s theorem in [55] is that the multiplier \(\theta ^k\) can be replaced by a power of any “quickly growing” polynomial, which does not even have to be an SOS. Another deep theorem of this sort is the one of Scheiderer in [53]. From these proofs, it is possible but difficult to extract degree bounds for the SOS weights \(\sigma _j\) in terms of the input polynomial data \(f,g_j\).

Degree bounds for SOS weights Recent work by Lombardi, Perrucci and Roy [33] provides degree bounds in a quite general situation. The best degree bounds for Hilbert’s 17th problem in three homogeneous variables actually go back to Hilbert’s original 1893 paper [16]. Some lower bounds for Hilbert’s 17th problem (quite far from the upper bounds) were proved in [5] by Blekherman, Gouveia and Pfeiffer. Hilbert’s 1893 result and sharp degree bounds for projective curves (i.e., curves defined by homogeneous polynomials) were obtained in [6] by Blekherman, Smith and Velasco.

Contribution As already mentioned, our approach is to tackle the non-compact case head-on and avoid the big-ball trick. Our contribution is threefold:

I. In Sect. 3 we first provide an alternative proof of (1.7), with an explicit degree bound on the SOS weights, by relying on Jacobi’s technique from the proof of [18, Theorem 7]; this is crucial, as it has immediate implications on the algorithmic side. More precisely, the degrees of the SOS weights \(\sigma _j\) are bounded above by \(k + d - \lceil {\deg ( {{g_j}} )/2} \rceil \). First, one transforms the initial polynomials into homogeneous forms, then one relies on Putinar’s Positivstellensatz for the compact case, and finally one dehomogenizes the resulting forms. As a consequence, with \(\varepsilon >0\) fixed but arbitrary, this degree bound allows us to provide hierarchies \((\rho _k^i(\varepsilon ))_{k\in {\mathbb {N}}}\), \(i=1,2,3\), for unconstrained polynomial optimization (\(m=0\) and \(i=1\), see Sect. 4.1) as well as for constrained polynomial optimization (\(m\ge 1\) and \(i=2,3\), see Sect. 4.2). Computing each \(\rho _k^i(\varepsilon )\) boils down to solving a single SDP for which strong duality holds. For k sufficiently large, \(\rho _k^i(\varepsilon )\) becomes an upper bound on the optimal value \(f^\star \) of the corresponding polynomial optimization problem (POP) \(\min _{x\in S(g)}f(x)\). If this problem has an optimal solution \(x^\star \), the gap between \(\rho _k^i(\varepsilon )\) and \(f^\star \) is at most \(\varepsilon \theta (x^\star )^d\). The related convergence rates are also analyzed in these sections.

II. In Sect. 4.3, we provide a new algorithm to find a feasible solution in the set S(g,h) defined in (1.1). The idea is to include appropriate additional spherical equality constraints \(\varphi _t:={\xi _t} - \Vert {x - {a_t}} \Vert _2^2\), \(t = 0,\dots ,n\), in S(g,h) so that the system \(S( {g,h\cup \{ {{\varphi _0},\dots ,{\varphi _n}} \}} )\) has a unique real solution. The nonnegative reals \(({\xi _t})_{t=0,\dots ,n}\) are computed with an adequate moment-SOS hierarchy. Moreover, this solution can be extracted in certain cases by checking whether some (moment) matrix satisfies a flat extension condition.

III. Finally, we use this method to approximate a global minimizer of f on S(g,h). Namely, we fix a small \(\varepsilon >0\) and find a point in \(S( {g\cup \{ {\rho _k^i(\varepsilon ) - f} \},h } )\). This procedure works in certain cases even if the set of minimizers is infinite, in sharp contrast with the extraction procedure of [15] (via some flat extension condition), which works only for finite solution sets. Assuming that the set of solutions is finite, one may compare our algorithm with the procedure from [15] as follows. On the one hand, the latter extraction procedure provides global minimizers, provided that one has solved an SDP relaxation with sufficiently large “k” (so as to obtain an appropriate rank condition). On the other hand, our algorithm, which adds spherical equality constraints, “divides” the problem into \(n+1\) SDP relaxations with additional constraints but with smaller order “k” (the crucial parameter for SDP solvers). Numerical examples are provided in Sect. 5 to illustrate the difference between these two strategies.

For clarity of exposition, most proofs are postponed to Sect. A.

Comparison to other methods for solving POPs on non-compact semialgebraic sets We consider the general POP \({f^\star } = \inf \{ {f( x ):\, x \in S( {g,h} )} \}\), where S(g,h) is unbounded.

  1.

    In [19, 20], Jeyakumar et al. solve POPs on an unbounded semialgebraic set after checking the coercivity of f on S(g,h) as well as the Archimedeanness of \(S( {g \cup \{ {c - f} \},h})\) for some \(c\ge f(\bar{x})\) with \(\bar{x} \in S(g,h)\). In particular, [19] uses the polynomial optimization solver SparsePOP developed by Waki [61], which exploits structured sparsity of f, g and h. However, checking these conditions can be difficult. Our method, which solves SDPs for the hierarchy \((\rho _k^i( \varepsilon ))_{k\in {\mathbb {N}}}\), avoids checking coercivity and Archimedean assumptions.

  2.

    Demmel et al. [10, 41] provide two representations of polynomials positive (resp. nonnegative) on S(g,h) for solving POPs on unbounded domains. They state that f can be represented as an SOS of polynomials modulo the KKT ideal on S(g,h) if the minimal value of f on S(g,h) is attained at some KKT point and one of the following conditions holds:

    (a)

      \(f>0\) on S(gh);

    (b)

      \(f\ge 0\) on S(gh) and the KKT ideal is radical.

    This method is restricted to the case of global minimizers satisfying the KKT conditions, and testing whether f belongs to the related KKT preordering requires a large number of SOS weights. Moreover, checking the radicality of the KKT ideal is difficult in general. Our method goes beyond these restrictions by only testing membership of a perturbation of \(\theta ^k f\) in the truncation of Q(g,h), even if the KKT conditions are not satisfied. The reader may consult Ha and Pham [60, §3.3] for a comparison similar to that with Demmel et al. [10].

  3.

    Schweighofer [56] extends Nie et al. [41] (where \(S(g,h)={\mathbb {R}}^n\)) to the case where f is bounded from below but does not necessarily attain its minimum. Schweighofer’s gradient-tentacles method replaces the gradient variety by larger semialgebraic sets. Here we assume that f attains its minimum, and we compute an approximation of the optimal value \(f^\star \) as well as an approximation of some minimizer \(x^\star \).

  4.

    Greuet et al. [14] provide a probabilistic algorithm for solving a POP on a real algebraic set \(S(g,h)=V(h)\). They can extract a solution under the following assumptions: \(\langle h \rangle \) is radical, V(h) is equidimensional of dimension \(d > 0\), and V(h) has finitely many singular points. When h generates an ideal of codimension c, the assumptions in Greuet et al. can be checked by verifying that the ideal generated by h and the minors of size \(c+1\) of the related Jacobian matrix has dimension at most 0. To do this, they consider the intersection with a generic hyperplane and check that the resulting complex polynomial system has no solution. This latter step can be performed with efficient algorithms. We emphasize that our method does not require such assumptions on \(\langle h \rangle \) and V(h).

  5.

    In [40], Nie proposes a hierarchy of tight semidefinite relaxations for a POP on a (possibly non-compact) basic semialgebraic set. The idea is to replace the Lagrange multipliers in the constraint qualification condition by polynomials. Under certain assumptions (which hold generically) stated in [40, Theorem 3.3], Nie’s hierarchy has finite convergence if the minimum value of the POP is achieved at a critical point (i.e., a point satisfying the constraint qualification condition). Moreover, there are many additional affine and semidefinite constraints in his relaxations [40, (3.8)–(3.9)]. By comparison, our hierarchy involves a smaller number of affine and semidefinite constraints, while ensuring convergence to a value in a neighborhood of the minimal value of the POP without any assumption, even if the minimum is not attained at a critical point.

2 Notation, definitions and preliminaries

In this section, we introduce notation and basic facts about polynomial optimization and the moment-sums-of-squares (moment-SOS) hierarchy. With \(x = (x_1,\dots ,x_n)\), let \({\mathbb {R}}[x]\) stand for the ring of real polynomials and let \(\varSigma [x]\subset {\mathbb {R}}[x]\) be the subset of SOS polynomials. We denote by \({\mathbb {R}}[x]_t\) and \(\varSigma [x]_t\) the respective restrictions of these two sets to polynomials of degree at most t and 2t. Given \(\alpha = (\alpha _1,\dots ,\alpha _n) \in {\mathbb {N}}^n\), we write \(|\alpha | := \alpha _1 + \dots + \alpha _n\). Let \((x^\alpha )_{\alpha \in {\mathbb {N}}^n}\) be the canonical basis of monomials for \({\mathbb {R}}[x]\) (ordered according to the graded lexicographic order) and let \(v_t(x)\) be the vector of monomials up to degree t, of length \(s(t) = {\left( {\begin{array}{c}n+t\\ n\end{array}}\right) }\). A polynomial \(f\in {\mathbb {R}}[x]_t\) is written as \(f(x)\,=\,\sum _{| \alpha | \le t} f_\alpha \,x^\alpha \,=\,\mathbf {f}^Tv_t(x)\), where \(\mathbf {f}=(f_\alpha )\in {\mathbb {R}}^{s(t)}\) is its vector of coefficients in the canonical basis. A polynomial f is homogeneous of degree t if \(f(\lambda x)=\lambda ^t f(x)\) for all \(x\in {\mathbb {R}}^n\) and all \(\lambda \in {\mathbb {R}}\). Equivalently, a homogeneous polynomial of degree t can be written as \(f = \sum _{| \alpha | = t} {f_\alpha x^\alpha }\). The degree-t homogenization \(\tilde{f}\) of \(f\in {\mathbb {R}}[x]_t\) is the homogeneous polynomial of degree t in \(n+1\) variables defined by \(\tilde{f} ( x,x_{n + 1}) = x_{n + 1}^{t} f ( x/x_{n + 1} )\). A positive definite form is a nonnegative homogeneous polynomial which is positive everywhere except at the origin.
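To illustrate the notation, the homogenization \(\tilde{f}\) and the count s(t) can be computed as follows, representing a polynomial as a dictionary from exponent tuples to coefficients (an encoding chosen here for illustration only):

```python
import math

def s(n, t):
    # dimension of the space of monomials up to degree t: binomial(n + t, n)
    return math.comb(n + t, n)

def homogenize(poly, t):
    """Degree-t homogenization: each monomial c * x^alpha becomes
    c * x^alpha * x_{n+1}^(t - |alpha|).  `poly` maps exponent tuples
    to coefficients, all of total degree <= t."""
    return {alpha + (t - sum(alpha),): c for alpha, c in poly.items()}

# f = x1^2*x2 - x1 + 3 in n = 2 variables, homogenized to degree 3:
f = {(2, 1): 1.0, (1, 0): -1.0, (0, 0): 3.0}
print(homogenize(f, 3))   # {(2, 1, 0): 1.0, (1, 0, 2): -1.0, (0, 0, 3): 3.0}
print(s(2, 3))            # 10
```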

Moment and localizing matrix For a given real-valued sequence \(y=(y_\alpha )_{\alpha \in {\mathbb {N}}^n}\), let us define the Riesz linear functional \(L_y:{\mathbb {R}}[ x ] \rightarrow {\mathbb {R}}\) by:

$$\begin{aligned} f\mapsto {L_y}( f ) := \sum _{\alpha } f_\alpha y_\alpha ,\quad \forall f\in {\mathbb {R}}[x]. \end{aligned}$$

We say that a real infinite (resp. finite) sequence \(( y_\alpha )_{\alpha \in {\mathbb {N}}^n}\) (resp. \(( y_\alpha )_{\alpha \in {\mathbb {N}}^n_t}\)) has a representing measure if there exists a finite Borel measure \(\mu \) such that \(y_\alpha = \int _{{\mathbb {R}}^n} {x^\alpha d\mu (x)}\) for every \(\alpha \in {{\mathbb {N}}^n}\) (resp. \(\alpha \in {{\mathbb {N}}^n_t}\)). In this case, \(( y_\alpha )_{\alpha \in {\mathbb {N}}^n}\) is called the moment sequence of \(\mu \). Next, given \(y=(y_\alpha )_{\alpha \in {\mathbb {N}}^n}\) and \(d\in {\mathbb {N}}^*\), the moment matrix \(M_d(y)\) of degree d associated with y is the real symmetric matrix of size s(d) defined by \(M_d(y) := ( y_{\alpha + \beta })_{\alpha ,\beta \in {\mathbb {N}}^n_d} \). Let \(g = \sum _{\gamma } g_\gamma x^\gamma \in {\mathbb {R}}[x]\). The localizing matrix \(M_d(g\,y)\) of degree d associated with y and g is the real symmetric matrix of size s(d) given by \(M_d(g\,y) = (\sum _\gamma {{g_\gamma }{y_{\gamma + \alpha + \beta }}})_{\alpha , \beta \in {\mathbb {N}}^n_d}\). We next recall an important result of [45], which is crucial for the convergence of the moment-SOS hierarchy provided in the sequel.
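As an illustration (not part of the paper's development), the moment and localizing matrices can be assembled numerically from a truncated moment sequence; a minimal sketch, where polynomials and moment sequences are encoded as dictionaries over exponent tuples:

```python
import itertools
import numpy as np

def monomials(n, d):
    # exponent tuples alpha with |alpha| <= d, listed by increasing degree
    return [a for t in range(d + 1)
            for a in itertools.product(range(t + 1), repeat=n) if sum(a) == t]

def localizing_matrix(y, n, d, g):
    """M_d(g y): entry (alpha, beta) is sum_gamma g_gamma y_{gamma+alpha+beta}.
    `y` maps exponent tuples to moments; g = {(0,)*n: 1.0} gives M_d(y)."""
    basis = monomials(n, d)
    M = np.zeros((len(basis), len(basis)))
    for i, a in enumerate(basis):
        for j, b in enumerate(basis):
            M[i, j] = sum(c * y[tuple(gg + ai + bj for gg, ai, bj in zip(gam, a, b))]
                          for gam, c in g.items())
    return M

# moments of the Dirac measure at x = (1, 2): y_alpha = 1^a1 * 2^a2
y = {a: 1.0**a[0] * 2.0**a[1] for a in monomials(2, 4)}
M1 = localizing_matrix(y, 2, 1, {(0, 0): 1.0})      # moment matrix M_1(y)
print(np.linalg.matrix_rank(M1))                    # 1: a Dirac moment matrix has rank 1
g = {(0, 0): 5.0, (2, 0): -1.0, (0, 2): -1.0}       # g = 5 - x1^2 - x2^2
print(np.linalg.eigvalsh(localizing_matrix(y, 2, 1, g)).min() >= -1e-9)  # True
```

The last check reflects Theorem 2(ii) below: moments of a measure supported on S(g) make the localizing matrices positive semidefinite.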

Theorem 2

(Putinar [45]) Given \(g_1,\ldots ,g_m, h_1,\ldots ,h_l \in {\mathbb {R}}[x]\), let \(S(g,h)\subset {\mathbb {R}}^n\) be as in (1.1) and \(Q(g,h)\subset {\mathbb {R}}[x]\) be as in (1.5). Assume that there exists \(L > 0\) such that \(L - \Vert x\Vert _2^2 \in Q( g,h )\) (Archimedean condition).

  (i)

    If a polynomial \(f\in {\mathbb {R}}[ x ]\) is positive on S(g,h), then \(f\in Q(g,h)\).

  (ii)

    A real sequence \(y=(y_\alpha )_{\alpha \in {\mathbb {N}}^n}\) has a representing measure \(\mu \) on S(gh) if \(M_d(y)\succeq 0\), \(M_d(g_j\,y)\succeq 0\), \(j=1,\ldots ,m\), and \(M_d(h_t\,y)=0\), \(t=1,\ldots ,l\), for all \(d\in {\mathbb {N}}\).

The moment-SOS hierarchy Let \(S(g,h)\subset {\mathbb {R}}^n\) be as in (1.1), let \(Q(g,h)\subset {\mathbb {R}}[x]\) be as in (1.5), and assume that Q(g,h) is Archimedean. Consider the POP:

$$\begin{aligned} f^\star := \min \left\{ {f(x):\ x \in S(g,h)}\right\} , \end{aligned}$$
(2.8)

which is known to be challenging, as it is NP-hard in general [31]. One may rewrite (2.8) as:

$$\begin{aligned} f^\star =\sup \left\{ {\lambda \in {\mathbb {R}}:\ f - \lambda > 0\text { on }S(g,h)} \right\} , \end{aligned}$$
(2.9)

and by invoking Theorem 2(i), \(f^\star =\sup \{ {\lambda \in {\mathbb {R}}:\ f - \lambda \in Q(g,h)}\}\). For each \(d\in {\mathbb {N}}\), let:

$$\begin{aligned} \rho _d:=\sup \left\{ {\lambda \in {\mathbb {R}}: \ f - \lambda \in Q_d( g,h)} \right\} ,\quad d\in {\mathbb {N}}, \end{aligned}$$
(2.10)

where \({Q_d}(g,h)\) stands for the truncated quadratic module of order d:

$$\begin{aligned} Q_d(g,h): = \left\{ {\sum \limits _{j = 0}^m {{\sigma _j}{g_j}} + \sum \limits _{t = 1}^l {{\phi _t}{h_t}} :\;{\sigma _j} \in \varSigma {\left[ x\right] _{d - {u_j}}},\ {\phi _t} \in {\mathbb {R}}[x]_{2d - 2{w_t}}}\right\} , \end{aligned}$$

with \({u_j}: = \lceil {\deg (g_j)/2} \rceil \) and \({w_t}: = \lceil {\deg ( h_t)/2}\rceil \). For each fixed d, problem (2.10) is a strengthening of (2.9), and so \(\rho _d\le f^\star \) for all d. It also turns out that (2.8) can be written as:

$$\begin{aligned} f^\star =\mathop {\inf }\limits _{\mu \in {\mathcal M}(S( g,h))} \int _{S(g,h)} {fd\mu } , \end{aligned}$$
(2.11)

where \({\mathcal M}( {S(g,h)})\) is the set of all finite Borel measures supported on S(g,h). Let us denote by \({\mathbb {R}}^{{\mathbb {N}}^n}\) the set of all real sequences indexed by \({{\mathbb {N}}^n}\). With \(y=( y_\alpha )_{\alpha \in {{\mathbb {N}}^n}}\in {\mathbb {R}}^{{\mathbb {N}}^n}\) being the moment sequence of a measure \(\mu \), one has:

$$\begin{aligned} f^\star = \min \left\{ { L_y(f) :\, y\in {\mathbb {R}}^{{\mathbb {N}}^n}{\text { has a representing measure in }}{\mathcal M}(S( g,h))} \right\} , \end{aligned}$$
(2.12)

and by Theorem 2(ii):

$$\begin{aligned} \begin{array}{rl} {{f^\star } = \mathop {\min }\limits _{y \in {{\mathbb {R}}^{{\mathbb {N}}^n}}} }&{} L_y(f) \\ \qquad {{\text {s.t. }}}&{} M_d ( {{g_j}y}) \succeq 0, \, \forall d \in {\mathbb {N}},\\ {}&{} M_d\left( {{h_t}y}\right) = 0,\, \forall d \in {\mathbb {N}},\\ &{}y_0=1. \end{array} \end{aligned}$$
(2.13)

Let \(d_{\min } := \max \{\lceil \deg (f) / 2 \rceil , u_1,\dots ,u_m, w_1,\dots ,w_l \}\). For every fixed \(d \ge d_{\min }\), consider the finite truncation of the above problem:

$$\begin{aligned} \begin{array}{rl} {\tau _d} := \mathop {\min }\limits _{y \in {{\mathbb {R}}^{s( 2d)} }} &{} L_y(f)\\ \qquad \text {s.t. }&{} M_{d - u_j } ( {{g_j}y} ) \succeq 0,\\ &{}M_{d - w_t }( {{h_t}y} ) = 0,\\ &{}y_0=1. \end{array} \end{aligned}$$
(2.14)

Then (2.14) is a semidefinite relaxation of (2.13), and (2.10) is the dual of (2.14). Moreover, strong duality holds according to Josz and Henrion [22]. Therefore, one has \(\rho _d=\tau _d\le f^\star \) for all d. This primal-dual sequence of semidefinite programs (2.14)–(2.10) is the so-called moment-SOS hierarchy for optimization (also known as “Lasserre’s hierarchy”), and

$$\begin{aligned} f^\star \,=\,\lim _{d\rightarrow \infty }\rho _d\,=\,\lim _{d\rightarrow \infty }\tau _d. \end{aligned}$$

For more details on the moment-SOS hierarchy, the interested reader is referred to [24].
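As a rough illustration of the size of these relaxations, the number of rows of each PSD block \(\sigma _j\) and the dimension of each free weight \(\phi _t\) in \(Q_d(g,h)\) follow directly from the degree bookkeeping above; a small sketch (function and variable names are ours, for illustration only):

```python
import math

def relaxation_block_sizes(n, d, deg_g, deg_h):
    """Sizes of the PSD blocks (SOS weights) and dimensions of the free
    polynomial weights in the order-d truncated quadratic module Q_d(g, h)."""
    s = lambda t: math.comb(n + t, n)                # dim of R[x]_t
    u = [math.ceil(dg / 2) for dg in deg_g]          # u_j = ceil(deg(g_j)/2)
    w = [math.ceil(dh / 2) for dh in deg_h]          # w_t = ceil(deg(h_t)/2)
    sos_blocks = [s(d)] + [s(d - uj) for uj in u]    # sigma_0, sigma_1, ..., sigma_m
    free_weights = [s(2 * d - 2 * wt) for wt in w]   # phi_t in R[x]_{2d - 2w_t}
    return sos_blocks, free_weights

# n = 2 variables, order d = 3, one inequality of degree 2, one equality of degree 1
print(relaxation_block_sizes(2, 3, [2], [1]))   # ([10, 6], [15])
```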

Complexity of Putinar’s Positivstellensatz Let \(c_n(\alpha ) := \frac{|\alpha |!}{\alpha _1!\dots \alpha _n!}\) for each \(\alpha \in {\mathbb {N}}^n\). For a given \(h \in {\mathbb {R}}[x]\), we write \(\Vert h \Vert _{\max }: = \mathop {\max }\limits _\alpha \left\{ {{| {{h_\alpha }} |} / c_n(\alpha )} \right\} \). The convergence rate of the sequence \((\rho _d)_{d\in {\mathbb {N}}}\) relies on the following result.

Theorem 3

(Nie–Schweighofer [42]) Assume that \(\emptyset \ne S(g,h)\subset (-1,1)^n\), that Q(g,h) is Archimedean, and that \(f^\star >0\). Then there exists \(C>0\), depending on g and h, such that for \(d\in {\mathbb {N}}\) and

$$\begin{aligned} d\ge C\exp \left( \left( \deg (f)^2 n^{\deg (f)}\left( f^\star \right) ^{-1} \Vert f \Vert _{\max } \right) ^C \right) , \end{aligned}$$

one has \( f \in {Q_{ d}}( g,h )\).

Extraction of global minimizers Assume that the optimal value \(\tau _d\) is attained at a solution \(y^\star \) and that this solution satisfies the flat extension condition, that is:

$$\begin{aligned} {\hbox {rank}}\left( {{M_d}\left( y^\star \right) }\right) = {\hbox {rank}}\left( {M_{d - w}\left( y^\star \right) }\right) , \end{aligned}$$

for some \(d\ge w:=\max \{ {{u_j},{w_t}}\}\). Let \(r:={\hbox {rank}}( {{M_d}( y^\star )})\) and let \(\delta _a\) stand for the Dirac measure at a point \(a\in {\mathbb {R}}^n\). Then \(\tau _d=f^\star \) and there exist \(x^{(j)}\in {\mathbb {R}}^n\) and \(\lambda _j\ge 0\), \(j=1,\dots ,r\), with \(\sum _{j = 1}^{r} {\lambda _j}=1\), such that the sequence \(y^\star \) has the representing r-atomic measure \(\mu = \sum _{j = 1}^{r} {{\lambda _j}{\delta _{ x^{(j)} }}} \). The support of \(\mu \) is the set \(\{ {x^{(j)}:\ j = 1,\dots ,r} \}\), which is contained in the set of all minimizers of the original problem (2.8). Henrion and Lasserre [15] provide a numerical algorithm to extract the r atoms corresponding to the support of the atomic measure \(\mu \).
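In the univariate case, the flat extension condition can be checked directly on Hankel moment matrices; a toy sketch with a 2-atomic measure (illustrative only, not the extraction algorithm of [15]):

```python
import numpy as np

def hankel(y, d):
    # univariate moment matrix M_d(y) for a moment vector y = (y_0, ..., y_{2d})
    return np.array([[y[i + j] for j in range(d + 1)] for i in range(d + 1)])

# moments of the 2-atomic measure mu = 0.5*delta_0 + 0.5*delta_1: y_k = (0^k + 1^k)/2
y = [1.0, 0.5, 0.5, 0.5, 0.5]
r1 = np.linalg.matrix_rank(hankel(y, 1))
r2 = np.linalg.matrix_rank(hankel(y, 2))
print(r1, r2)   # 2 2 -> flat: rank M_2 = rank M_1 = number of atoms
```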

3 Representation theorems

In this section we provide two exact representations of globally nonnegative polynomials and polynomials nonnegative on basic semialgebraic sets (not necessarily compact). The representations are obtained thanks to a perturbation argument as well as existing representations for positive definite forms. Let \(\theta := 1 + {\Vert x \Vert _2^2}\). We denote by \(\mathbb S^{n-1}\) the unit sphere in \({\mathbb {R}}^n\). For each \(h\in {\mathbb {R}}[x]\), let

$$\begin{aligned} \delta (h):=\frac{{\sup \left\{ {h(x):x \in \mathbb S^{n-1}} \right\} }}{{\inf \left\{ {h(x):x \in \mathbb S^{n-1}} \right\} }}. \end{aligned}$$
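The quantity \(\delta (h)\) can be estimated by sampling the unit sphere. The following sketch uses our own illustrative form \(h = x_1^2 + 2x_2^2\), for which \(\sup _{\mathbb S^1} h = 2\) and \(\inf _{\mathbb S^1} h = 1\), so that \(\delta (h) = 2\) exactly.

```python
import numpy as np

# Monte-Carlo estimate of delta(h) for h = x1^2 + 2*x2^2 on the unit circle.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200000, 2))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)   # uniform samples on S^1
vals = pts[:, 0]**2 + 2.0 * pts[:, 1]**2            # h restricted to S^1
delta_est = vals.max() / vals.min()                 # approximately 2
```

For a positive definite form the infimum over the sphere is positive, so \(\delta (h)\) is finite; this is the quantity entering the degree bound of Lemma 1 below.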

For later use, we recall the following result.

Lemma 1

(Reznick [49, Theorem 3.12]) Suppose that \(p \in {\mathbb {R}}[ x]\) is a positive definite form of degree 2d, for some \(d\in {\mathbb {N}}\). Then for \(k \in {\mathbb {N}}\) and

$$\begin{aligned} k\ge \frac{{2nd(2d - 1)}}{{4\log 2}}\delta (p) - \frac{{n + 2d}}{2}, \end{aligned}$$

one has \({\Vert x \Vert ^{2k}_2}p \in \varSigma {[ x ]_{k + d}}\).

In [49, Theorem 3.12], Reznick guarantees that the SOS decomposition of \({\Vert x \Vert ^{2k}_2}p\) is actually a sum of powers of linear forms.

3.1 Globally nonnegative polynomials

Let us note \(\Vert h\Vert _1:=\sum \nolimits _\alpha {{|h_\alpha |}} \) for a given \(h \in {\mathbb {R}}[x]\). The following result provides a representation of globally nonnegative polynomials.

Theorem 4

Let \(f\in {\mathbb {R}}[ x]_{2d}\) be nonnegative on \({\mathbb {R}}^n\). Then for every \(\varepsilon >0\), for \(k_\varepsilon \in {\mathbb {N}}\) and

$$\begin{aligned} k_\varepsilon \ge \frac{{2(n+1)d(2d - 1)}}{{4\log 2}}(\varepsilon ^{-1}\Vert f\Vert _{1}+1) - \frac{{n+1 + 2d}}{2}, \end{aligned}$$

one has

$$\begin{aligned} { \theta ^{{k_\varepsilon }}}\left( {f + \varepsilon {{\theta }^{d}}} \right) \in \varSigma {[ x ]_{{k_\varepsilon } + d}}. \end{aligned}$$
(3.15)

Proof

The proof consists of three steps:

  1. Associate a positive definite form to the globally nonnegative polynomial f.

  2. Use Reznick’s representation from Lemma 1 to get a representation of this homogeneous form.

  3. Transform the homogeneous polynomial together with its representation back to the original polynomial.

Let \(\tilde{f} = x_{n + 1}^{2d} f ( x/x_{n + 1} )\) be the degree 2d homogenization of f. Since f is globally nonnegative, \(\tilde{f}\) is nonnegative on \({\mathbb {R}}^{n+1}\). Let \(\varepsilon >0\) be fixed. We claim that

$$\begin{aligned} \tilde{f} + \varepsilon {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2d}} \in {\mathbb {R}}\left[ {x,{x_{n + 1}}}\right] \end{aligned}$$

is positive definite, i.e., is homogeneous and positive on \({{\mathbb {R}}^{n + 1}}\backslash \{ 0_{{{\mathbb {R}}^{n+1}}} \}\). Since

$$\begin{aligned} {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2 d}} = {\left( {x_1^2 + \dots + x_n^2 + x_{n + 1}^2} \right) ^{d}}, \end{aligned}$$

the polynomial \({\Vert {( {x,{x_{n + 1}}} )} \Vert _2^{2 d}}\) is homogeneous of degree 2d on \({\mathbb {R}}^{n+1}\). From this and since \(\tilde{f}\) is homogeneous of degree 2d, \(\tilde{f} + \varepsilon {\Vert {( {x,{x_{n + 1}}} )} \Vert _2^{2 d}}\) is homogeneous of degree 2d. For every \(( {x,{x_{n + 1}}} )\in {{\mathbb {R}}^{n + 1}}\backslash \{ 0_{{{\mathbb {R}}^{n+1}}} \}\), \(\Vert {( {x,{x_{n + 1}}} )} \Vert _2 > 0\). From this and since \(\tilde{f}\) is nonnegative on \({\mathbb {R}}^{n+1}\),

$$\begin{aligned} \tilde{f}\left( {x,{x_{n + 1}}} \right) + \varepsilon {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2 d}} > 0 , \end{aligned}$$

for all \(( {x,{x_{n + 1}}})\in {{\mathbb {R}}^{n + 1}}\backslash \{ 0_{{{\mathbb {R}}^{n+1}}} \}\). In addition, it is not hard to show that

$$\begin{aligned} \inf \left\{ {\tilde{f}}\left( x,{x_{n + 1}}\right) + \varepsilon {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2 d}}\, :\left( x,{x_{n + 1}}\right) \in {\mathbb {S}}^{n}\right\} \ge \varepsilon \end{aligned}$$

and

$$\begin{aligned}&\sup \left\{ \tilde{f}\left( x,{x_{n + 1}}\right) + \varepsilon {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2 d}}\,:\left( x,{x_{n + 1}}\right) \in \mathbb S^{n}\right\} \\&\quad \le \sup \{\tilde{f}\left( x,{x_{n + 1}}\right) \, :\left( x,{x_{n + 1}}\right) \in \mathbb S^{n}\}+\varepsilon \\&\quad \le \Vert \tilde{f}\Vert _{1}+\varepsilon . \end{aligned}$$

Thus, \(\delta (\tilde{f} +\varepsilon \Vert \cdot \Vert ^{2d}_2)\le (\Vert \tilde{f}\Vert _{1}+\varepsilon )/\varepsilon =(\Vert f\Vert _{1}+\varepsilon )/\varepsilon \), where the last equality holds since homogenization preserves the coefficients, so that \(\Vert \tilde{f}\Vert _1=\Vert f\Vert _1\). From this and by applying Lemma 1 with \(p=\tilde{f}+ \varepsilon {\Vert {( {x,{x_{n + 1}}} )} \Vert _2^{2 d}}\), for \(k_\varepsilon \in {\mathbb {N}}\) and

$$\begin{aligned} k_\varepsilon \ge \frac{{2(n+1)d(2d - 1)}}{{(4\log 2)}}(\varepsilon ^{-1}\Vert f\Vert _{1}+1) - \frac{{n + 1+2d}}{2}, \end{aligned}$$

there exists \({{\tilde{\sigma }}_\varepsilon } \in \varSigma [ {x,{x_{n + 1}}} ]_{k_{\varepsilon } + d}\) such that

$$\begin{aligned} {\Vert {\left( {x,{x_{n + 1}}} \right) } \Vert _2^{2{k_\varepsilon }}}\left( {\tilde{f} + \varepsilon {{\Vert \left( {x,{x_{n + 1}}} \right) \Vert }_2^{2 d}}}\right) = {{\tilde{\sigma }}_\varepsilon }. \end{aligned}$$

By replacing \(x_{n+1}\) by 1, one has

$$\begin{aligned} {\theta ^{{k_\varepsilon }}}\left( {f + \varepsilon {{\theta }^{d}}} \right) = {{\tilde{\sigma }}_\varepsilon }( {x,1} ). \end{aligned}$$

Let us note \({\sigma _\varepsilon }( x ) := {{\tilde{\sigma }}_\varepsilon }( {x,1})\), for every \(x\in {\mathbb {R}}^n\). Since \({{\tilde{\sigma }}_\varepsilon } \in \varSigma [ {x,{x_{n + 1}}} ]_{k_{\varepsilon } + d}\), it follows that \({\sigma _\varepsilon } \in \varSigma [ x ]_{k_{\varepsilon } + d}\), yielding the desired result. \(\square \)
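As a sanity check on how the degree bound of Theorem 4 scales, one can evaluate it for Motzkin’s polynomial (\(n = 2\), \(2d = 6\), \(\Vert f\Vert _1 = 6\)). The helper below simply instantiates the displayed formula; it is our own illustration, not part of the paper.

```python
import math

def k_eps_bound(n, d, f_norm1, eps):
    """Right-hand side of the degree bound in Theorem 4."""
    return (2.0 * (n + 1) * d * (2 * d - 1) / (4.0 * math.log(2))
            * (f_norm1 / eps + 1.0) - (n + 1 + 2 * d) / 2.0)

# Motzkin's polynomial: coefficients 1, 1, 1, -3, so ||f||_1 = 6.
n, d, f_norm1 = 2, 3, 6.0
bounds = {eps: k_eps_bound(n, d, f_norm1, eps) for eps in (1.0, 0.1, 0.01)}
# The bound grows like O(1/eps) as eps shrinks.
```

For \(\varepsilon = 1\) the bound is already around 223, which illustrates why the \(O(\varepsilon ^{-1})\) growth discussed later matters in practice.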

3.2 Polynomials nonnegative on a basic semialgebraic set

We recall the definition of the truncated quadratic module of order d associated with S(g):

$$\begin{aligned} Q_d(g): = \left\{ \sigma _0+\sum \limits _{j = 1}^m {{\sigma _j}{g_j}} :\;{\sigma _0} \in \varSigma {[ x]_{d}},\, {\sigma _j} \in \varSigma {[ x]_{d - {u_j}}}\right\} , \end{aligned}$$

where \({u_j}: = \lceil {\deg (g_j)/2} \rceil \), \(j=1,\dots ,m\). For every \(h \in {\mathbb {R}}[ x]\), let us define

$$\begin{aligned} d_1( h): = {1 + \lfloor {{{\deg ( h )}}/{2}} \rfloor } \qquad \text { and }\qquad d_2( h): = {\lceil {{{\deg ( h )}}/{2}} \rceil } . \end{aligned}$$

The following result provides a degree bound for the SOS weights of [46, Theorem 1].

Theorem 5

Let \(g = \{ {{g_1},\dots ,{g_m}} \} \subset {\mathbb {R}}[ x]\) and \(f\in {\mathbb {R}}[ x]\) be such that f is nonnegative on S(g). Let \(\varepsilon > 0\) and \(d\in {\mathbb {N}}\) be such that at least one of the following two conditions is satisfied:

  (i) \(d\ge d_1(f)\);

  (ii) \(d\ge d_2(f)\) and \(g_m:=f+\lambda \) for some real \(\lambda \ge 0\).

Then there exists \(k_\varepsilon \in {\mathbb {N}}\) such that

$$\begin{aligned} {\theta ^{{k_\varepsilon }}}\left( {f + \varepsilon \, {\theta ^{d}}}\right) \in {Q_{{k_\varepsilon } + d}}( g ). \end{aligned}$$
(3.16)

The detailed proof of Theorem 5 relies on Jacobi’s technique from his proof of [18, Theorem 7] and is postponed to Appendix A.1. This proof consists of three steps:

  1. Associate a homogeneous polynomial \({\tilde{f}}\) to the polynomial f.

  2. Use Putinar’s Positivstellensatz (Theorem 2 (i)) to obtain a representation of \({\tilde{f}}\).

  3. Transform back the representation of \({\tilde{f}}\) to obtain a representation of f.

Remark 1

Theorem 5 is an extension of Putinar’s Positivstellensatz to (possibly) non-compact sets S(g), and so does not require the Archimedean condition. The price to pay for such an extension is the presence of the multiplier \(\theta ^{k_\varepsilon }\) in front of f and of the perturbation term \(\varepsilon \,\theta ^{d}\). Note that (ii) involves a tighter bound for d compared to (i), since \(d_1(f)\ge d_2(f)\). The counterpart is that (ii) requires including the additional constraint \(f+\lambda \ge 0\), for some \(\lambda \ge 0\).

Complexity of Putinar–Vasilescu’s Positivstellensatz For each \(h\in {\mathbb {R}}[x]_{2l}\), we denote \(\Vert h \Vert _{\max ,l}: = \mathop {\max }\limits _\alpha \left\{ {{| {{h_\alpha }} |} / c_{n+1}(\alpha ,2l-|\alpha |)} \right\} \). Let us recall \(c_n(\alpha ) := \frac{|\alpha |!}{\alpha _1!\dots \alpha _n!}\) for each \(\alpha \in {\mathbb {N}}^n\). By relying on Nie–Schweighofer’s result (Theorem 3) after the homogenization trick, it is straightforward to analyze the complexity of Putinar–Vasilescu’s Positivstellensatz:

Proposition 1

Assume that all assumptions of Theorem 5 hold and \(0_{{\mathbb {R}}^n}\in S(g)\). Then there exists \(C>0\) depending on g such that for all \(k_\varepsilon \in {\mathbb {N}}\) satisfying

$$\begin{aligned} k_\varepsilon \ge C\exp \left( \left( 4^{d+1}d^2(n + 1)^{2d}\left( \varepsilon ^{-1} \Vert f \Vert _{\max , d}+ \max \limits _{\bar{\alpha }\in {\mathbb {N}}^{n+1}_d} \displaystyle \frac{c_{n+1}(\bar{\alpha })}{c_{n+1}(2\bar{\alpha })} \right) \right) ^C \right) -d, \end{aligned}$$

one has \({\theta ^{{k_\varepsilon }}}\left( {f + \varepsilon \, {\theta ^{d}}}\right) \in {Q_{{k_\varepsilon } + d}}( g )\).

The proof of Proposition 1 is postponed to Appendix A.2.

Discussion about the \(\varepsilon \) parameter The (arbitrarily small) positive parameter \(\varepsilon \) in Theorems 4 and 5 ensures the positivity of polynomials over the considered domain \({\mathbb {R}}^n\) or S(g), excluding the origin in the homogenized representations. However, these representations can still hold even when \(\varepsilon =0\), as illustrated in the following two examples:

Example 1

  (i) Motzkin’s polynomial \(f = x_1^4x_2^2 + x_1^2x_2^4 + 1 - 3x_1^2x_2^2\) is globally nonnegative but not SOS. However, \(\theta f\) is SOS since

    $$\begin{aligned} \begin{array}{rl} \theta f =&{} 2{\left( {\frac{1}{2}x_1^3{x_2} + \frac{1}{2}{x_1}x_2^3 - {x_1}{x_2}} \right) ^2} + {\left( {x_1^2{x_2} - {x_2}} \right) ^2} + {\left( {{x_1}x_2^2 - {x_1}} \right) ^2}\\ &{}+\, \frac{1}{2}{\left( {x_1^3{x_2} - {x_1}{x_2}} \right) ^2} + \frac{1}{2}{\left( {{x_1}x_2^3 - {x_1}{x_2}} \right) ^2} + {\left( {x_1^2x_2^2 - 1} \right) ^2}. \end{array} \end{aligned}$$

  (ii) Let \(f = \left( {x_1^2 + x_2^2} \right) x_1^2x_2^2 - 3x_1^2x_2^2\) and \(g = x_1^2 + x_2^2 - 4\). It is not hard to show that f is nonnegative on the non-compact set S(g). Moreover, \(f = \frac{1}{4}x_1^2x_2^2\left( {x_1^2 + x_2^2} \right) + \frac{3}{4}x_1^2x_2^2g\). Thus, \({\theta ^0}f \in {Q_3}\left( g \right) \).
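Both identities of Example 1 can be spot-checked in exact rational arithmetic by evaluating each side at random rational points (a polynomial identity holding at sufficiently many points in exact arithmetic is strong evidence; the helper names below are our own).

```python
from fractions import Fraction
import random

rng = random.Random(0)
H = Fraction(1, 2)

def theta(a, b):
    return 1 + a**2 + b**2

def motzkin(a, b):
    return a**4*b**2 + a**2*b**4 + 1 - 3*a**2*b**2

def sos(a, b):
    # Right-hand side of the decomposition in Example 1 (i).
    return (2*(H*a**3*b + H*a*b**3 - a*b)**2
            + (a**2*b - b)**2 + (a*b**2 - a)**2
            + H*(a**3*b - a*b)**2 + H*(a*b**3 - a*b)**2
            + (a**2*b**2 - 1)**2)

def f2(a, b):
    # Example 1 (ii).
    return (a**2 + b**2)*a**2*b**2 - 3*a**2*b**2

def decomp(a, b):
    g = a**2 + b**2 - 4
    return Fraction(1, 4)*a**2*b**2*(a**2 + b**2) + Fraction(3, 4)*a**2*b**2*g

pts = [(Fraction(rng.randint(-50, 50), 7), Fraction(rng.randint(-50, 50), 11))
       for _ in range(25)]
motzkin_ok = all(theta(a, b)*motzkin(a, b) == sos(a, b) for a, b in pts)
q3_ok = all(f2(a, b) == decomp(a, b) for a, b in pts)
```

Using `Fraction` avoids floating-point round-off, so the equalities are checked exactly.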

However, the certificate (3.15) for global nonnegativity with \(\varepsilon =0\) does not hold in general, as shown in the following lemma. This is due to the fact that any SOS multiplier for Delzell’s polynomial must vanish at the “bad point”, so one cannot find any globally positive multiplier.

Lemma 2

The nonnegative dehomogenized Delzell’s polynomial [9]:

$$\begin{aligned} f=x_1^4x_2^2+x_2^4x_3^2+x_1^2x_3^4-3x_1^2x_2^2x_3^2+x_3^8 \end{aligned}$$

satisfies \({\theta ^k}f \notin \varSigma [x]\) for all \(k\in {\mathbb {N}}\).

Proof

Assume by contradiction that \({\theta ^K}f \in \varSigma [x]\) for some \(K\in {\mathbb {N}}\). Note that \(n=3\) here. We denote by \(\tilde{f}\) the degree 8 homogenization of f, i.e.,

$$\begin{aligned} \tilde{f}=x_4^2\left( x_1^4x_2^2+x_2^4x_3^2+x_1^2x_3^4-3x_1^2x_2^2x_3^2\right) +x_3^8. \end{aligned}$$

Then \({\Vert (x,x_{n+1})\Vert _2^{2K}}\tilde{f} \in \varSigma [x,x_{n+1}]\). As shown in [50, §6], this is impossible. This contradiction yields the conclusion. \(\square \)

The certificate (3.16) for global nonnegativity on basic semialgebraic sets with \(\varepsilon =0\) also does not hold in general, as shown in the following lemma:

Lemma 3

With \(n=1\), let \(f=x\) and \(g = \{ {{x^3}, - {x^3}} \}\). Then \(f = 0\) on \(S( g ) = \{ 0 \}\), so f is nonnegative on S(g), but:

  (i) \({\theta ^k}f \notin Q( g )\) for all \(k\in {\mathbb {N}}\); and

  (ii) for every \(\varepsilon >0\), \({\theta ^k}(f + \varepsilon \theta ) \in Q_{k+1}( g )\) for all \(k\in {\mathbb {N}}\) with \(k \ge \max \{2,\,{\varepsilon ^{ - 2}}/4 - 1\}\).

Proof

We will show statement (i). Assume by contradiction that there exists \(k\in {\mathbb {N}}\) such that \({\theta ^k}f \in Q( g )\). Then there exist \({q_j}( x ) \in {\mathbb {R}}[ x ]\), \(j=0,\dots ,r\), such that

$$\begin{aligned} {\theta ^k}f = \sum \limits _{j = 1}^r {{q_j}{{( x )}^2}} + {q_0}( x ){x^3}. \end{aligned}$$

Write \({q_j}( x ) = {a_j} + {b_j}x + {x^2}{d_j}( x )\), where \(a_j,b_j\in {\mathbb {R}}\) and \(d_j\in {\mathbb {R}}[x]\), \(j=1,\dots ,r\). From this and since \({\theta ^k} = 1 + {x^2}e(x)\) for some \(e\in {\mathbb {R}}[x]\), one has

$$\begin{aligned} \left( 1 + {x^2}e(x)\right) x = \sum \limits _{j = 1}^r {a_j^2} + 2\sum \limits _{j = 1}^r {{a_j}{b_j}x} + {x^2}p(x), \end{aligned}$$

for some \(p\in {\mathbb {R}}[x]\). By comparing the coefficients of the monomials 1 and x on the two sides of this equality, \(\sum \nolimits _{j = 1}^r {a_j^2} = 0\) and \(2\sum \nolimits _{j = 1}^r {{a_j}{b_j}} = 1\). The first equality forces \(a_j=0\), \(j=1,\dots ,r\), so the second one yields \(0=1\), a contradiction.

Let us prove the statement (ii). Let \(\varepsilon >0\) and \(k\in {\mathbb {N}}\), \(k\ge 2\). Since \({\theta ^k} = 1 + kx^2 + {x^4}e(x)\) for some \(e\in {\mathbb {R}}[x]_{2k-4}\), one has

$$\begin{aligned} {\theta ^k}\left( f + \varepsilon \theta \right) = \left( 1 + k{x^2} + {x^4}e(x)\right) \left( \varepsilon + x + \varepsilon {x^2}\right) = \varepsilon + x + \varepsilon ( {k + 1} ){x^2} + {x^3}q( x ), \end{aligned}$$

for some \(q\in {\mathbb {R}}[x]_{2k-2}\). Assume that \(k \ge {\varepsilon ^{ - 2}}/4 - 1\). Then

$$\begin{aligned} {\theta ^k}(f + \varepsilon \theta ) = \varepsilon - \frac{1}{{4\varepsilon (k + 1)}} + {\left( {x\sqrt{\varepsilon (k + 1)} + \frac{1}{{2\sqrt{\varepsilon (k + 1)} }}} \right) ^2} + {x^3}q( x )\in Q_{k+1}( g ). \end{aligned}$$

\(\square \)
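The computation in the proof of statement (ii) can be replayed with exact coefficient arithmetic for a concrete choice (our own test values \(\varepsilon = 1/4\), \(k = 4\), which satisfy \(k \ge \max \{2, \varepsilon ^{-2}/4 - 1\} = 3\)): the low-order coefficients of \(\theta ^k(f + \varepsilon \theta )\) are \(\varepsilon \), 1, \(\varepsilon (k+1)\), and the constant left after completing the square is nonnegative.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (low degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

eps, k = Fraction(1, 4), 4
theta = [Fraction(1), Fraction(0), Fraction(1)]    # 1 + x^2
theta_k = [Fraction(1)]
for _ in range(k):
    theta_k = poly_mul(theta_k, theta)             # theta^k
lhs = poly_mul(theta_k, [eps, Fraction(1), eps])   # theta^k (eps + x + eps x^2)

low_order = lhs[:3]                                # eps, 1, eps*(k+1)
leftover = eps - 1 / (4 * eps * (k + 1))           # constant after the square
```

Here `leftover` equals \(1/4 - 1/5 = 1/20 \ge 0\), which is exactly the condition \(k \ge \varepsilon ^{-2}/4 - 1\) at work.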

From Lemmas 2 and 3, we conclude that the strict positivity of the \(\varepsilon \) parameter is necessary in general, although certification with \(\varepsilon =0\) may still occur in many cases.

When the certificate (3.15) with \(\varepsilon =0\) occurs, one has the following remark about the exponent of \(\theta \) in (3.15).

Remark 2

If \(n=2\), there does not exist a fixed \(K\in {\mathbb {N}}\) such that for all nonnegative \(f\in {\mathbb {R}}[x]_6\) , \(\theta ^K f\) is SOS. Indeed, assume by contradiction that there exists such a K. Then, the degree 6 homogenization \(\tilde{f}\) of f would be a positive ternary sextic such that \(\Vert (x,x_{n+1})\Vert ^{2K}_2 \tilde{f}\) is SOS. By using [51, Theorem 1] and the fact that the homogeneous Motzkin’s polynomial is a positive ternary sextic which is not SOS, we obtain a contradiction.

In (3.15) with \(\varepsilon =0\), the multiplier \(\theta ^{k_{\varepsilon }}\) can be replaced with other SOS multipliers; see, e.g., Leep-Starr’s polynomial [32, Example 2]. With the multiplier \(\varTheta (x_1,x_2,x_3):=x_1^2+x_2^2+x_3^2\) and the homogenized Delzell’s polynomial D, Schabert provides in [52, Example 4.4] the exact SOS decomposition of the product \(\varTheta (x_1,x_2,x_3)\, D(x_1,x_2,x_3,x_4)\).

4 Polynomial optimization

In this section, we exploit the two representations from Theorem 4 and Theorem 5 to construct new hierarchies of semidefinite programs for POPs of the form \({f^\star } = \inf \{ {f( x ):\ x \in \varOmega } \}\) where \(\varOmega ={\mathbb {R}}^n\) for the unconstrained case and \(\varOmega =S\left( g\right) \) for the constrained case (with no compactness assumption), respectively. Instead of solving the original problem, we are rather interested in the perturbed problem:

$$\begin{aligned} f_\varepsilon ^\star := \inf \{ {f( x ) + \varepsilon \,\theta {{( x )}^d}:\ x \in \varOmega } \}, \end{aligned}$$
(4.17)

where \(\varepsilon >0\) is fixed, \(\theta (x) := 1 + {\Vert x \Vert _2^2}\), and \(2d \ge \deg \left( f \right) \). Now, assume that the optimal value \(f^\star \) of the original problem is attained at some \(x^\star \in \varOmega \). It is not difficult to show that if \(\varOmega \) is unbounded, the polynomial \(f + \varepsilon \,{\theta ^d}\) is coercive on \(\varOmega \), i.e.

$$\begin{aligned} \mathop {\lim }\limits _{x\in \varOmega ,\,\Vert x \Vert _2 \rightarrow \infty } \left( {f( x ) + \varepsilon \theta {{( x )}^d}} \right) = \infty , \end{aligned}$$

(see more in [2]). Indeed, this follows since f is bounded from below by \(f^\star \) on \(\varOmega \) and \(\theta {{( x )}^d}\rightarrow \infty \) as \(\Vert x \Vert _2 \rightarrow \infty \). Thus, the optimal value \(f_\varepsilon ^\star \) of the perturbed problem (4.17) is always attained at some global minimizer \(x^\star _\varepsilon \), even if \(\varOmega \) is non-compact. Then:

$$\begin{aligned} \begin{array}{rl} {f^\star } + \varepsilon \,\theta {\left( {{x^\star }} \right) ^d} &{}= f\left( {{x^\star }} \right) + \varepsilon \,\theta {\left( {{x^\star }}\right) ^d}\\ &{}\ge f_\varepsilon ^\star = f\left( {x_\varepsilon ^\star } \right) + \varepsilon \,\theta {\left( {x_\varepsilon ^\star } \right) ^d} \ge f\left( {x_\varepsilon ^\star } \right) \ge {f^\star }. \end{array} \end{aligned}$$

Thus, \({f^\star } \in [ {f_\varepsilon ^\star - \varepsilon \, \theta {{( {{x^\star }} )}^d},f_\varepsilon ^\star } ]\), i.e., \(f^\star _\varepsilon \) is a perturbation of \(f^\star \) and the gap between both of them is at most \(\varepsilon \, \theta ( x^\star )^d\). Next, observe that:

$$\begin{aligned} \begin{array}{rl} f_\varepsilon ^\star &{}= \sup \left\{ {\lambda \in {\mathbb {R}}:\,f + \varepsilon \, {\theta ^d} - \lambda \ge 0{\text { on }}\varOmega } \right\} \\ &{} = \sup \left\{ {\lambda \in {\mathbb {R}}:\,{\theta ^k}\left( {f + \varepsilon \, {\theta ^d} - \lambda } \right) \ge 0{\text { on }}\varOmega } \right\} ,\ k\in {\mathbb {N}}. \end{array} \end{aligned}$$

The following hierarchies are based on the simple idea of replacing the constraint “\(\theta ^k( f + \varepsilon {\theta ^d} - \lambda ) \ge 0{\text { on }}\varOmega \)” by the relaxed constraint “\({\theta ^k}( {f + \varepsilon {\theta ^d} - \lambda } )\) belongs to the truncated quadratic module associated with \(\varOmega \)”.
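The sandwich \(f^\star \le f_\varepsilon ^\star \le f^\star + \varepsilon \,\theta (x^\star )^d\) derived above can be observed numerically on a toy unconstrained instance of our own choosing, \(f(x) = (x-1)^2\) with \(d = 1\):

```python
def f(x):
    return (x - 1.0)**2

def theta(x):
    return 1.0 + x**2

eps, d, x_star, f_star = 0.01, 1, 1.0, 0.0   # f* = 0 attained at x* = 1

# A fine grid search suffices for this coercive one-dimensional objective.
grid = [i / 10000.0 for i in range(-30000, 30001)]
f_eps_star = min(f(x) + eps * theta(x)**d for x in grid)

# For this instance the perturbed minimum has the closed form
# eps*(2 + eps)/(1 + eps), attained at x = 1/(1 + eps).
sandwich_ok = f_star <= f_eps_star <= f_star + eps * theta(x_star)**d
```

With \(\varepsilon = 0.01\) the perturbed value is about 0.0199, inside the predicted interval \([0, \varepsilon \,\theta (1)] = [0, 0.02]\).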

4.1 Unconstrained case

Given \(f \in {\mathbb {R}}[ x]_{2 d}\), let us consider the following problem:

$$\begin{aligned} \begin{array}{l} f^\star :=\inf \limits _{x \in {{\mathbb {R}}^n}} f( x ). \end{array} \end{aligned}$$
(4.18)

In the sequel, we assume that \(f^\star >-\infty \) and let \(\varepsilon >0\) be fixed. Consider the hierarchy of semidefinite programs indexed by \(k\in {\mathbb {N}}\):

$$\begin{aligned} \begin{array}{rl} {\tau _k^1}(\varepsilon ): = \inf &{}{L_y}\left( {{\theta ^k}\left( {f + \varepsilon {\theta ^d}} \right) } \right) \\ \text {s.t.}&{}y = {\left( {{y_\alpha }} \right) _{\alpha \in {\mathbb {N}}^n_{2\left( {d + k} \right) }}} \subset {\mathbb {R}},\\ &{}{M_{k + d}}( y )\succeq 0,\\ &{}{L_y}\left( {{\theta ^k}} \right) = 1. \end{array} \end{aligned}$$
(4.19)

Theorem 6

For every \(k\in {\mathbb {N}}\), the dual of (4.19) reads:

$$\begin{aligned} {\rho _k^1}(\varepsilon ) := \sup \left\{ {\lambda \in {\mathbb {R}}:\ {\theta ^k}\left( {f- \lambda + \varepsilon {\theta ^d} } \right) \in \varSigma {[ x ]_{k + d}}} \right\} . \end{aligned}$$
(4.20)

The following statements hold:

  1. The sequence \({( {{\rho _k^1}}(\varepsilon ))_{k \in {\mathbb {N}}}}\) is monotone non-decreasing.

  2. Assume that \(f^\star \) in (4.18) is attained at \(x^\star \in {\mathbb {R}}^n\). Then there exists \(K\in {\mathbb {N}}\) such that \(f^\star \le {\rho _k^1} (\varepsilon )\le {f^\star } + \varepsilon \, \theta {( {{x^\star }})^d}\) for all \(k\ge K\). In particular, K is upper bounded by \( O(\varepsilon ^{-1})\) as \(\varepsilon \downarrow 0\).

Proof

  1. Let \(k\in {\mathbb {N}}\) and fix an arbitrary \(\bar{\varepsilon }>0\). By (4.20), there exists a real \(\bar{\lambda }\) such that

    $$\begin{aligned} {\rho _k^1}( \varepsilon ) - \bar{\varepsilon }\le \bar{\lambda }\text { and }{\theta ^k}\left( {f- \bar{\lambda }+ \varepsilon {\theta ^d} } \right) \in \varSigma {[ x ]_{k + d}}. \end{aligned}$$

    Since \(\theta \in \varSigma {[ x ]_{1}}\), \({\theta ^{k + 1}}( {f - \bar{\lambda }+ \varepsilon {\theta ^d} } ) \in \varSigma {[ x ]_{k + d+1}}\). By (4.20), \({\rho ^1_{k + 1}}( \varepsilon ) \ge \bar{\lambda }\ge {\rho ^1 _k}( \varepsilon ) - \bar{\varepsilon }\). Since \(\bar{\varepsilon }\) was arbitrary, \({\rho ^1_{k + 1}}( \varepsilon ) \ge {\rho ^1 _k}( \varepsilon )\).

  2. By (4.18), \(f-f^\star \) is nonnegative. By Theorem 4, there exists \(K\in {\mathbb {N}}\) with \(K= O(\varepsilon ^{-1})\) as \(\varepsilon \downarrow 0\) such that

    $$\begin{aligned} {\theta ^K}\left( {f - f^\star + \varepsilon {\theta ^d}} \right) \in \varSigma [ x ]_{K + d}. \end{aligned}$$

    Let \(k\ge K\) be fixed. Since \(\theta \in \varSigma {[ x ]_{1}}\), one has

    $$\begin{aligned} {\theta ^k}\left( {f - {f^\star }+ \varepsilon {\theta ^d} } \right) = {\theta ^{K + ( {k - K} )}}\left( {f - {f^\star }+ \varepsilon {\theta ^d}} \right) \in \varSigma {[ x]_{k + d}}. \end{aligned}$$

    By (4.20), \({f^ \star } \le {\rho ^1 _k}( \varepsilon )\) for all \(k\ge K\). Now let \(k\in {\mathbb {N}}\) and fix an arbitrary \(\bar{\varepsilon }>0\). By (4.20), there exists a real \(\bar{\lambda }\) such that

    $$\begin{aligned} {\rho ^1_k}( \varepsilon ) - \bar{\varepsilon }\le \bar{\lambda }\text { and }{\theta ^k}\left( {f - \bar{\lambda }+ \varepsilon {\theta ^d}} \right) \in \varSigma [ x ]_{k + d}. \end{aligned}$$

    It follows that \(f - \bar{\lambda }+ \varepsilon {\theta ^d} \ge 0\) on \({\mathbb {R}}^n\). From this,

    $$\begin{aligned} f^\star + \varepsilon \theta {\left( {{x^\star }} \right) ^d}=f\left( {{x^\star }} \right) + \varepsilon \theta {\left( {{x^\star }} \right) ^d} \ge \bar{\lambda }\ge {\rho ^1 _k}( \varepsilon ) -\bar{\varepsilon }. \end{aligned}$$

    Since \(\bar{\varepsilon }\) was arbitrary, \(f^\star + \varepsilon \theta {( {{x^\star }} )^d} \ge {\rho ^1_k}( \varepsilon )\), which is the desired result.\(\square \)

We guarantee strong duality for previous primal-dual problems:

Proposition 2

Let \(k\in {\mathbb {N}}\). Then \( \tau _k^1(\varepsilon )=\rho _k^1(\varepsilon )\). Moreover, if \(\tau _k^1(\varepsilon )>-\infty \) then the optimal value \(\rho _k^1(\varepsilon )\) is attained.

Proof

By Slater’s constraint qualification [7, §5.2.3], it suffices to show that (4.19) admits a strictly feasible solution. Let us denote by \(\mu \) the measure with density \({\chi _{{{[ {0,1} ]}^n}}}\theta ^{-k}\) with respect to the Lebesgue measure, where \(\chi _A\) is the characteristic function of a given set \(A\subset {\mathbb {R}}^n\). Set \({y_\alpha }: = \int {{x^\alpha }d\mu } \) for all \(\alpha \in {\mathbb {N}}^n\). We claim that \(y_\alpha \in {\mathbb {R}}\) for all \(\alpha \in {\mathbb {N}}^n\), \({L_y}( {{\theta ^k}} ) = 1\) and \({M_{k + d}}( y )\succ 0\). Indeed, for all \(\alpha \in {\mathbb {N}}^n\)

$$\begin{aligned} \begin{array}{rl} | {{y_\alpha }} | &{}= | {\int {{x^\alpha }{\chi _{{{[ {0,1}]}^n}}}{\theta ^{ - k}}dx} } | = | {\int _{{{[ {0,1} ]}^n}} {{x^\alpha }{\theta ^{ - k}}dx} } |\\ &{}\le \int _{{{[ {0,1} ]}^n}} {{{| {{x_1}} |}^{{\alpha _1}}}\dots {{| {{x_n}} |}^{{\alpha _n}}}{\theta ^{ - k}}dx} \le 1, \end{array} \end{aligned}$$

since \(\theta ^{-k}\le 1\). Thus, \(y_\alpha \in {\mathbb {R}}\) for all \(\alpha \in {\mathbb {N}}^n\). In addition,

$$\begin{aligned} {L_y}\left( {{\theta ^k}}\right) = \int {{\theta ^k}{\chi _{{{[ {0,1}]}^n}}}{\theta ^{ - k}}dx} = \int _{{{[ {0,1}]}^n}} {dx} = 1. \end{aligned}$$

Let \(p\in {{\mathbb {R}}^{s(d+k)}}\backslash \{ 0 \}\) be fixed. We claim that \({p^T}{M_{d + k}}( y )p>0\). Assume by contradiction that \({p^T}{M_{d + k}}( y)p\le 0\). One has

$$\begin{aligned} \begin{array}{rl} 0 &{}\ge {p^T}{M_{d + k}}( y )p = \int {{p^T}{v_{d + k}}v_{d + k}^Tpd\mu } \\ &{}= \int {{p^T}{v_{d + k}}v_{d + k}^Tp{\chi _{{{[ {0,1}]}^n}}}{\theta ^{ - k}}dx} = \int _{{{[ {0,1}]}^n}} {{{\left( {{p^T}{v_{d + k}}}\right) }^2}{\theta ^{ - k}}dx} \ge 0. \end{array} \end{aligned}$$

It follows that \({p^T}{v_{d + k}}=0\) on \({{[ {0,1}]}^n}\), thus \(p=0\) yielding a contradiction. From this, \({( {{y_\alpha }} )_{\alpha \in {\mathbb {N}}_{d + k}^n}}\) is a feasible solution of (4.19) with \({M_{k + d}}( y )\succ 0\). By strong duality, the conclusion follows. \(\square \)
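The strictly feasible point from this proof can be computed explicitly for \(n = 1\) and \(k = d = 1\): the moments \(y_\alpha = \int _{[0,1]} x^\alpha \,\theta (x)^{-k}\,dx\) satisfy \(L_y(\theta ^k) = 1\) and yield a positive definite moment matrix. The sketch below uses midpoint quadrature; the grid size and tolerances are our own.

```python
import numpy as np

k, d = 1, 1
N = 200000
xs = (np.arange(N) + 0.5) / N                  # midpoints of [0, 1]
w = 1.0 / (1.0 + xs**2)**k                     # density chi_[0,1] * theta^{-k}
y = [float((xs**a * w).mean()) for a in range(2 * (k + d) + 1)]

# Normalization L_y(theta^k) = y_0 + y_2 = integral of 1 over [0,1] = 1 ...
L_theta_k = y[0] + y[2]
# ... and the Hankel moment matrix M_{k+d}(y) is positive definite.
M = np.array([[y[i + j] for j in range(k + d + 1)] for i in range(k + d + 1)])
min_eig = float(np.linalg.eigvalsh(M).min())
```

Positive definiteness holds because the measure has full support on \([0,1]\): no nonzero polynomial of degree \(\le k+d\) vanishes on the whole interval.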

4.2 Constrained case

Consider the following problem:

$$\begin{aligned} \begin{array}{l} f^\star :=\inf \limits _{x \in S( g)} f( x), \end{array} \end{aligned}$$
(4.21)

where \(f \in {\mathbb {R}}[ x ]\), \(g = \{ {{g_1},\dots ,{g_m}} \} \subset {\mathbb {R}}[ x]\). Assume that \(S( g) \ne \emptyset \) and \(f^\star >-\infty \). Denote \({u_j}: = \lceil {\deg ( {{g_j}} )/2} \rceil \), \(j=0,1,\dots ,m\). Let \(\varepsilon >0\) be fixed.

4.2.1 Unknown lower bound

Let \(d: = \lfloor \deg (f)/2\rfloor +1\) and consider the hierarchy of semidefinite programs indexed by \(k\in {\mathbb {N}}\):

$$\begin{aligned} \begin{array}{rl} {\tau _k^2}(\varepsilon ): = \inf &{}{L_y}\left( {{\theta ^k}\left( {f + \varepsilon {\theta ^d}} \right) } \right) \\ \text {s.t.}&{}y = {\left( y_\alpha \right) _{\alpha \in {\mathbb {N}}^n_{2( {d + k})}}} \subset {\mathbb {R}},\\ &{}{M_{k + d}}\left( {y}\right) \succeq 0,\\ &{}{M_{k + d - {u_j}}}\left( {{g_j}y}\right) \succeq 0 ,\;j = 1,\dots ,m,\\ &{}{L_y}\left( {{\theta ^k}}\right) = 1. \end{array} \end{aligned}$$
(4.22)

Theorem 7

For every \(k\in {\mathbb {N}}\), the dual of (4.22) reads:

$$\begin{aligned} {\rho _k^2} (\varepsilon ):= \sup \left\{ {\lambda \in {\mathbb {R}}:\ {\theta ^k}\,\left( {f - \lambda + \varepsilon \,{\theta ^d} }\right) \in Q_{k + d}( g)} \right\} . \end{aligned}$$
(4.23)

The following statements hold:

  1. The sequence \({( {{\rho _k^2}(\varepsilon )})_{k \in {\mathbb {N}}}}\) is monotone non-decreasing.

  2. Assume that problem (4.21) has an optimal solution \(x^\star \). Then there exists \(K\in {\mathbb {N}}\) such that \(f^\star \le {\rho _k^2}(\varepsilon ) \le {f^\star } + \varepsilon \, \theta {( {{x^\star }})^d}\) for all \(k\ge K\). In particular, K is upper bounded by \(C \exp \left( O(\varepsilon ^{-1})^C \right) -d\) as \(\varepsilon \downarrow 0\), for some \(C>0\) depending on g.

The proof of Theorem 7 relies on Theorem 5 (i) and is similar to that of Theorem 6. The upper bound on K is based on Proposition 1.

We guarantee strong duality for previous primal-dual problems:

Proposition 3

There exists \(K\in {\mathbb {N}}\) such that \(\tau _k^2(\varepsilon )=\rho _k^2(\varepsilon )\) for all \(k\ge K\). Moreover, if \(\tau _k^2(\varepsilon )>-\infty \), the optimal value \(\rho _k^2(\varepsilon )\) is attained.

The proof of Proposition 3 is postponed to Appendix A.3.

Remark 3

If S(g) has nonempty interior then strong duality holds for all orders k of the primal-dual problems (4.22)–(4.23). Indeed, by constructing a sequence of moments from the Lebesgue measure on an open ball contained in S(g), one can find a strictly feasible solution of (4.22) and then apply Slater’s constraint qualification [7, § 5.2.3].

4.2.2 Known lower bound

Assume that \(g_m:=f-\underline{f}\) for some real \(\underline{f}\le f^\star \) and let \(d: = \lceil \deg ( f)/2\rceil \). We then obtain the same conclusion as in Theorem 7, with \(\tau _k^2(\varepsilon )\) and \(\rho _k^2(\varepsilon )\) replaced by \(\tau _k^3(\varepsilon )\) and \(\rho _k^3(\varepsilon )\), respectively. The proof relies on Theorem 5 (ii) and is similar to that of Theorem 6. Note that here \(g_m=(f-f^\star )+(f^\star -\underline{f} )\) with \(f-f^\star \ge 0\) on S(g) and \(f^\star -\underline{f}\ge 0\). The upper bound on K is also based on Proposition 1.

The next proposition states that strong duality is guaranteed for each relaxation order k.

Proposition 4

Let \(k\in {\mathbb {N}}\). Then \(\tau _k^3(\varepsilon )=\rho _k^3(\varepsilon )\). Moreover, if \(\tau _k^3(\varepsilon )>-\infty \) then the optimal value \(\rho _k^3(\varepsilon )\) is attained.

The proof of Proposition 4 is postponed to Appendix A.4.

Remark 4

A lower bound \(\underline{f}\) of problem (4.21) can be obtained by solving the following SDP:

$$\begin{aligned} \sup \left\{ {\lambda \in {\mathbb {R}}:\ {\theta ^k}\,\left( {f - \lambda } \right) \in Q_{k + d}( g)} \right\} ,\ k \in {\mathbb {N}}. \end{aligned}$$

Assume that we know a lower bound \(\underline{f}\) of problem (4.21). By adding the inequality constraint \(f-\underline{f}\ge 0\) to the description of S(g), we obtain the same semialgebraic set, i.e. \(S( {g \cup \{ {f - \underline{f} } \}} ) = S(g)\). Thus, the two problems \(\inf \{f(x):\,x\in S(g)\}\) and \(\inf \{f(x):\,x\in S( {g \cup \{ {f - \underline{f} } \}} )\}\) are identical and the primal-dual SDP relaxations with values \(\rho _k^3(\varepsilon )\) and \(\tau _k^3(\varepsilon )\) of the latter one satisfy strong duality for each relaxation order k.

General case Since \({g_j} \ge 0\) on S(g) together with \(- {g_j} \ge 0\) on S(g) is equivalent to \({g_j} = 0\) on S(g), S(g) can be rewritten as S(g,h), where \(g =\{ {{g_1},\dots ,{g_m}} \}\) is the set of polynomials involved in the inequality constraints and \(h = \{ {{h_1},\dots ,{h_l}} \} \) is the set of polynomials involved in the equality constraints; in addition, \({\mathbb {R}}^n=S( {\{ 0 \},\emptyset } )\). Consider the general POP:

$$\begin{aligned} f^\star :=\inf \limits _{x \in S( g,h)} f( x), \end{aligned}$$
(4.24)

with \(f^\star \in {\mathbb {R}}\) and define

$$\begin{aligned} (d,i) = \left\{ \begin{array}{ll} \left( \lceil {\deg (f)/2} \rceil ,1\right) &{}\text { if }S(g,h) = {{\mathbb {R}}^n},\\ \left( 1 + \lfloor {\deg (f)/2} \rfloor ,2\right) &{}\text { if }S(g,h) \ne {{\mathbb {R}}^n}\text { and lower bound} \,\underline{f}\, \text { is unknown},\\ \left( \lceil {\deg (f)/2} \rceil ,3\right) &{}\text { otherwise and set }g_m:=f-\underline{f}. \end{array} \right. \end{aligned}$$
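The case distinction above amounts to a small dispatch rule; the helper below (with our own argument names `is_Rn` and `lower_bound` standing for "\(S(g,h) = {\mathbb {R}}^n\)" and a known lower bound \(\underline{f}\)) makes it explicit.

```python
import math

def relaxation_params(deg_f, is_Rn, lower_bound=None):
    """Return (d, i) as in the case distinction for POP (4.24)."""
    if is_Rn:
        return (math.ceil(deg_f / 2), 1)
    if lower_bound is None:
        return (1 + deg_f // 2, 2)
    # The caller is also expected to append g_m := f - lower_bound to g.
    return (math.ceil(deg_f / 2), 3)

cases = [relaxation_params(6, True),
         relaxation_params(5, False),
         relaxation_params(5, False, lower_bound=0.0)]
```

For instance, a degree-5 objective on a proper semialgebraic set gives \(d = 1 + \lfloor 5/2\rfloor = 3\) with hierarchy index \(i = 2\) when no lower bound is known.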

For fixed \(\varepsilon >0\), one considers the following SDP relaxation of POP (4.24)

$$\begin{aligned} \begin{array}{ll} {\tau _k^i}(\varepsilon ): = \inf &{}{L_y}\left( {{\theta ^k}\left( {f + \varepsilon {\theta ^d}} \right) } \right) \\ \text {s.t.}&{}y = {\left( {{y_\alpha }} \right) _{\alpha \in {\mathbb {N}}^n_{2\left( {d + k} \right) }}} \subset {\mathbb {R}},\\ &{}{M_{k + d - {u_j}}}\left( {{g_j}y} \right) \succeq 0 ,\;j = 0,\dots ,m,\\ &{}{M_{k + d - {w_t}}}\left( {{h_t}y} \right) = 0 ,\;t = 1,\dots ,l,\\ &{}{L_y}\left( {{\theta ^k}}\right) = 1, \end{array} \end{aligned}$$
(4.25)

where \(g_0:=1\), \({u_j} = \lceil {\deg ( {{g_j}} )/2}\rceil \), \({w_t} = \lceil {\deg ( {{h_t}} )/2} \rceil \). Its dual is the semidefinite program:

$$\begin{aligned} {\rho _k^i}(\varepsilon ) := \sup \left\{ {\lambda \in {\mathbb {R}}:\ {\theta ^k}\left( {f - \lambda + \varepsilon {\theta ^d} } \right) \in Q_{k + d}( g,h)} \right\} \end{aligned}$$
(4.26)

The zero-duality gap between SDP (4.25) and SDP (4.26) is guaranteed for large enough k.

Remark 5

The condition \(\tau _k^i(\varepsilon )>-\infty \) is always satisfied whenever k is sufficiently large. Indeed, by weak duality, when \(\varepsilon \) is fixed and k is sufficiently large, \(\tau _k^i(\varepsilon ) \ge \rho _k^i(\varepsilon ) \ge f^\star >-\infty \). However, when k is small, \(\tau _k^i(\varepsilon ) = -\infty \) may happen.

Let us now assume that the POP (4.24) has an optimal solution \(x^\star \). Then \(\rho _k^i( \varepsilon ) \in [f^\star ,f^\star +\varepsilon \theta (x^\star )^d]\) when \(\varepsilon >0\) is fixed and k is sufficiently large. Moreover, the gap between \(\rho _k^i( \varepsilon )\) and \(f^\star \) is at most \(\varepsilon \, \theta {( {{x^\star }})^d}\). Therefore, \(\rho _k^i ( \varepsilon )\) is indeed an approximation of \( {f^\star }\). In practice, \((\rho _k^i( \varepsilon ))_{k\in {\mathbb {N}}}\) often converges to the optimal value \(f_\varepsilon ^\star := \min \{ {f( x ) + \varepsilon \,\theta {{( x )}^d}\,:\, x \in S(g,h)} \}\) after finitely many steps (see Sect. 5).

Remark 6

  1. (i)

The term \(\theta ^d\) in both (4.25) and (4.26) can be replaced by \(\varphi _d(x,1)\) where \(\varphi _d :{\mathbb {R}}^{n+1}\rightarrow {\mathbb {R}}\) is a positive form of degree 2d. For instance, one can select \(\varphi _d(x,1)=x_1^{2d}+\dots +x_n^{2d}+1\).

  2. (ii)

Let \(r\in {\mathbb {N}}\) be fixed. For every k divisible by 2r, the term \(\theta ^k\) appearing in both (4.25) and (4.26) can be replaced by \(\psi _r(x,1)^{k/(2r)}\) where \(\psi _r :{\mathbb {R}}^{n+1}\rightarrow {\mathbb {R}}\) is a coercive positive form of degree 2r. For instance, one can select \(\psi _r(x,1)=x_1^{2r}+\dots +x_n^{2r}+1\).

Relation between classical optimality conditions and nonnegativity certificates When \(\rho _k^i (0)= f^\star \), the constraint qualification conditions hold at \(x^\star \).

Proposition 5

Assume that \(\rho _k^i (0)= f^\star \) for some \(k\in {\mathbb {N}}\), i.e., there exists \(\sigma _j\in \varSigma [x]\), \(j=0,\dots ,m\) and \(\phi _t\in {\mathbb {R}}[x]\), \(t=1,\dots ,l\) such that \({\theta ^k}(f - {f^\star }) = {\sigma _0} + \sum \nolimits _{j = 1}^m {{\sigma _j}{g_j}} + \sum \nolimits _{t = 1}^l {{\phi _t}{h_t}}\). Then the constraint qualification conditions hold at \(x^\star \):

  1. 1.

\(\sigma _j\left( x^\star \right) \ge 0\) and \(g_j\left( x^\star \right) \ge 0\), for all \(j=1,\dots ,m\);

  2. 2.

    \(\sigma _j\left( x^\star \right) g_j\left( x^\star \right) =0\), for all \(j=1,\dots ,m\);

  3. 3.

    \({\theta \left( x^\star \right) ^k}\nabla f \left( x^\star \right) = \sum \nolimits _{j = 1}^m {{\sigma _j\left( x^\star \right) }\nabla {g_j}\left( x^\star \right) } + \sum \nolimits _{t = 1}^l {{\phi _t\left( x^\star \right) }\nabla {h_t\left( x^\star \right) }}\).

The proof of Proposition 5 is similar to [26, Theorem 7.4].

If we take an arbitrarily small \(\varepsilon >0\), then \(\rho _k^i ( \varepsilon )\) is arbitrarily close to \(f^\star \) for large enough k. However, if one sets \(\varepsilon =0\), the statement “\(\rho _K^i (0)=f^\star \) for some \(K\in {\mathbb {N}}\)” does not hold in general, as stated in the following proposition:

Proposition 6

If the first order optimality condition fails at a global minimizer of problem (4.24), then \(\rho _k^i (0)< f^\star \) for all \(k\in {\mathbb {N}}\).

The proof of Proposition 6 is similar to [39, Proposition 3.4].

By applying Proposition 6 to the POP \(\min \{x\,:\, x^3=0\}\), we obtain statement (i) of Lemma 3. Indeed, the first-order optimality condition fails at the global minimizer 0 of this problem. Therefore, the positivity of \(\varepsilon \) ensures convergence of \((\rho _k^i (\varepsilon ))_{k\in {\mathbb {N}}}\) to the neighborhood \([f^\star ,f^\star +\varepsilon \theta (x^\star )^d]\) of the optimal value \(f^\star \). We also conjecture that \(\rho _K^i (0)=f^\star \) for some \(K\in {\mathbb {N}}\) when some classical optimality conditions hold at every global minimizer of (4.24). In many cases, \(\rho _K^i(0)=f^\star \) with \(K=0,1\) when the KKT conditions hold (see Example 1 and [38, Example 4.4] with \(x_n=1\)). However, the KKT conditions are not enough for this conjecture: the minimizer \(x^\star =(0,0,0)\) of the dehomogenized Delzell polynomial in Lemma 2 satisfies the KKT conditions, and yet \(\rho _k^i (0)<f^\star \) for all \(k\in {\mathbb {N}}\) in this case.
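To see why the first-order optimality condition fails for the POP \(\min \{x\,:\,x^3=0\}\), note that with \(f=x\) and \(h_1=x^3\), at the global minimizer \(x^\star =0\) one has

$$\begin{aligned} \nabla f\left( x^\star \right) = 1, \qquad \nabla h_1\left( x^\star \right) = 3\left( x^\star \right) ^2 = 0, \end{aligned}$$

so an identity of the form \({\theta \left( x^\star \right) ^k}\nabla f\left( x^\star \right) = \phi _1\left( x^\star \right) \nabla h_1\left( x^\star \right) \), as in condition 3 of Proposition 5, would force \(\theta \left( x^\star \right) ^k=0\), which is impossible since \(\theta \) is positive.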

Reducing the non-compact case to a compact case Consider the POP: \(f^\star :=\inf \{f(x)\,:\,x\in S(g,h)\}\) where the feasible set S(g,h) is possibly non-compact, and the associated perturbed POP: \(f^\star _\varepsilon :=\inf \{f(x)+\varepsilon \theta (x)^d \,:\,x\in S(g,h)\}\) with fixed \(\varepsilon >0\). Here one assumes that \(f^\star \) is attained at \(x^\star \) and \(2d>\deg (f)\). As in Sect. 4, \({f^\star } \in [ {f_\varepsilon ^\star - \varepsilon \, \theta {{( {{x^\star }} )}^d},f_\varepsilon ^\star } ]\). Suppose that a point \(\bar{x}\) in S(g,h) is known. It is not hard to show that \(f+\varepsilon \theta ^d\) is coercive; therefore, with \(C:=f(\bar{x})+\varepsilon \theta (\bar{x})^d\), the set \(S(\{C-f-\varepsilon \theta ^d\})\) is compact. Moreover,

$$\begin{aligned} f^\star _\varepsilon =\inf \left\{ f(x)+\varepsilon \theta (x)^d \,:\,x\in S\left( g\cup \{C-f-\varepsilon \theta ^d\},h\right) \right\} . \end{aligned}$$
(4.27)

Note that the quadratic module associated with the constraint set of POP (4.27) is Archimedean, so \(f^\star _\varepsilon \) can be approximated as closely as desired by the Moment-SOS hierarchy. This approach is similar in spirit to that of [20]. However, determining a point \(\bar{x}\) in S(g,h) is not easy in general. The hierarchy (4.26), which relies on Putinar–Vasilescu’s Positivstellensatz, does not suffer from this restriction.
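As a toy illustration of this reduction (a Python sketch with data of our own choosing, not one of the paper's benchmarks), take \(n=1\), \(f(x)=x^2\), \(\theta (x)=1+x^2\), \(d=2\) (so that \(2d>\deg (f)\)), \(\varepsilon =0.1\) and the known feasible point \(\bar{x}=2\); the added constraint \(C-f-\varepsilon \theta ^d\ge 0\) makes the feasible set a compact interval, over which the perturbed minimum can be found by brute force:

```python
# Toy instance of the reduction (4.27): S(g,h) = R, f(x) = x^2,
# theta(x) = 1 + x^2, d = 2 (so 2d > deg f), eps = 0.1.
eps, d = 0.1, 2

def f(x): return x * x
def theta(x): return 1.0 + x * x
def pert(x): return f(x) + eps * theta(x) ** d  # coercive perturbed objective

xbar = 2.0                  # a known feasible point
C = pert(xbar)              # C = 4 + 0.1 * 25 = 6.5

# pert is even and increasing on [0, inf), so the compact sublevel set
# {x : C - pert(x) >= 0} is exactly the interval [-xbar, xbar].
N = 40001
grid = [-xbar + 2 * xbar * i / (N - 1) for i in range(N)]
f_eps = min(pert(x) for x in grid)  # approximates f_eps^* = 0.1 (at x = 0)

# Sandwich from Sect. 4 with x* = 0: f* lies in
# [f_eps - eps * theta(x*)^d, f_eps].
assert f_eps - eps * theta(0.0) ** d <= 0.0 <= f_eps
```

Here \(f^\star =0\) and \(f_\varepsilon ^\star =\varepsilon \,\theta (x^\star )^d=0.1\), consistent with the bracket \(f^\star \in [f_\varepsilon ^\star -\varepsilon \theta (x^\star )^d,\,f_\varepsilon ^\star ]\).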

Alternatively, if one relies on the big-ball trick, one considers the following POP, for \(\varepsilon > 0\):

$$\begin{aligned} \hat{f}^*_{\varepsilon }:= \inf \left\{ f(x)\,:\,x\in S\left( g\cup \{1-\varepsilon \Vert x\Vert _2^2\},h\right) \right\} . \end{aligned}$$
(4.28)

Obviously, one has \(\hat{f}^*_{\varepsilon }\downarrow f^\star \) as \(\varepsilon \downarrow 0\). The quadratic module associated with the constraint set of POP (4.28) is Archimedean and \(\hat{f}^*_{\varepsilon }\) can be approximated by the Moment-SOS hierarchy:

$$\begin{aligned} \rho _k(\varepsilon ):=\sup \left\{ \lambda \in {\mathbb {R}}\,:\,f-\lambda \in Q\left( g\cup \{1-\varepsilon \Vert x\Vert _2^2\},h\right) \right\} ,k\in {\mathbb {N}},\,\varepsilon >0. \end{aligned}$$
(4.29)

Thus the SOS weight of \(1-\varepsilon \Vert x\Vert _2^2\) appears in the SOS decomposition of \(f-\lambda \). In contrast, the hierarchy (4.26) relying on Putinar–Vasilescu’s Positivstellensatz does not require such an SOS weight in the decomposition of \(f-\lambda \), but requires a perturbation \(\varepsilon \theta ^d\) and a multiplier \(\theta ^k\).

4.3 Global optimizers

In this section we introduce a new method to find an approximation of a feasible point of a basic semialgebraic set \(S(g,h)\) as defined in (1.1). We then apply this method to obtain an approximation of a global minimizer \(x^\star \) associated with \(f^\star =\min \{ {f( x):\, x \in S( {g,h})} \}\) by finding a feasible solution of \(S( {g\cup \{ {\rho _k^i ( \varepsilon )- f} \},h } )\).

Remark 7

Let \(\varepsilon >0\) be fixed and \(k\in {\mathbb {N}}\) be sufficiently large such that \(\rho _k^i( \varepsilon )\) is an upper bound of \(f^\star \). Let \(x^\star \) be a global minimizer of f on S(g,h) and let \(\bar{x} \in S( {g\cup \{ {\rho _k^i( \varepsilon ) - f} \},h} )\). Then \(\bar{x} \in S( {g,h} )\) and \({f^\star }\le f( {\bar{x}} ) \le \rho _k^i( \varepsilon ) \le {f^\star }+\varepsilon \theta {( {{x^\star }} )^d}\).

Let us consider an arbitrarily small \(\varepsilon >0\). The difference between \(\rho _k^i( \varepsilon )\) and \(f^\star \) will be as close as desired to \(\varepsilon \theta {( {{x^\star }} )^d}\) for large enough k. Assume that the solution set \(S( {g\cup \{ {f^\star - f}\},h })\) is finite and denote by \(y^\star _\varepsilon \) an optimal solution of SDP (4.25). In practice, when k is sufficiently large, \(y^\star _\varepsilon \) numerically satisfies the flat extension condition defined in Sect. 2. One may then use the algorithm of Henrion and Lasserre [15] to extract numerically the support of a representing measure for \(y^\star _\varepsilon \), which may include global minimizers of \(f^\star =\min \{ {f( x ):\, x \in S( {g,h})}\}\) (see the same extraction in [21, §3.2]). However, we cannot guarantee the success of this extraction procedure in theory because the set \(S( {g\cup \{ {\rho _k^i( \varepsilon ) - f} \},h } )\) may not be zero-dimensional when \(\rho _k^i( \varepsilon ) > f^\star \). For example, if \(f=\Vert x\Vert _2^2\) and \(S(g,h)={\mathbb {R}}^n\), then \(S( {g\cup \{ {\rho _k^i( \varepsilon ) - f} \},h } )\) is a closed ball centered at the origin with radius \(\rho _k^i( \varepsilon )^{1/2}\). The following method aims at overcoming this issue from both theoretical and algorithmic sides. For further application cases of the flat moment criterion, we refer the interested reader to the framework from [48], which is based on altering the bottom right part of the moment matrix.
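The flat extension condition is a rank test on moment matrices and can be checked numerically. Below is a small self-contained Python sketch (our own helpers, not the authors' implementation, and a Gaussian-elimination rank rather than the more robust SVD-based one): for the moments of a Dirac measure, \(M_k(y)\) has rank one at every order, so the condition \({\hbox {rank}}(M_{k+1}(y)) = {\hbox {rank}}(M_{k}(y))\) holds trivially.

```python
from itertools import product

def monomials(n, k):
    return [a for d in range(k + 1)
            for a in product(range(d + 1), repeat=n) if sum(a) == d]

def moment_matrix(n, k, y):
    b = monomials(n, k)
    return [[y[tuple(u + v for u, v in zip(al, be))] for be in b] for al in b]

def numeric_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting
    (a simple stand-in; in practice an SVD-based rank is preferable)."""
    A = [row[:] for row in M]
    rank, rows, cols = 0, len(A), len(A[0])
    for c in range(cols):
        if rank == rows:
            break
        piv = max(range(rank, rows), key=lambda i: abs(A[i][c]))
        if abs(A[piv][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[piv] = A[piv], A[rank]
        for i in range(rank + 1, rows):
            m = A[i][c] / A[rank][c]
            A[i] = [a - m * b for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

# Moments of the Dirac measure at x* = (1, -2): every M_k(y) has rank one,
# so the flatness test rank M_2(y) = rank M_1(y) passes.
x_star = (1.0, -2.0)
y = {a: x_star[0] ** a[0] * x_star[1] ** a[1] for a in monomials(2, 4)}
flat = numeric_rank(moment_matrix(2, 2, y)) == numeric_rank(moment_matrix(2, 1, y))
```

In the setting of the text, a solver returns a pseudo-moment sequence \(y^\star _\varepsilon \) rather than exact Dirac moments, and the same rank test is applied up to a numerical tolerance.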

The Adding-Spherical-Constraints method (ASC) For \(a\in {\mathbb {R}}^n\) and \(r\ge 0\), let \(\overline{B( {a,r} )}\) (resp. \(\partial {B( {a,r} )} \)) be the closed ball (resp. sphere) centered at a with radius r, i.e.,

$$\begin{aligned} \overline{B( {a,r} )} = \left\{ {x \in {{\mathbb {R}}^n}:\, \Vert {x - a} \Vert _2 \le r} \right\} \quad \left( \text {resp.}\quad \partial {B( {a,r} )} = \left\{ {x \in {{\mathbb {R}}^n}:\, \Vert {x - a} \Vert _2 = r} \right\} \right) . \end{aligned}$$

The following result provides an efficient way to find a sequence of additional spherical equality constraints for a given semialgebraic set such that (i) the resulting set is a singleton (i.e., it contains a single real point), and (ii) this point is the solution of a non-singular system of linear equations.

Lemma 4

Assume that \(S( g,h ) \ne \emptyset \). Let \({( {{a_t}} )_{t = 0,1,\dots ,n}} \subset {{\mathbb {R}}^n}\) be such that \({a_t} - {a_0},\, t = 1,\dots ,n,\) are linearly independent in \({\mathbb {R}}^n\). Let us define the sequence \({( {{\xi _t}} )_{t = 0,1,\dots ,n}} \subset {{\mathbb {R}}_ + }\) as follows:

$$\begin{aligned} \left\{ \begin{array}{l} {\xi _0} := \min \left\{ {\Vert {x - {a_0}} \Vert _2^2}:\, x \in S( {g,h} ) \right\} ,\\ {\xi _t} := \min \left\{ {\Vert {x - {a_t}} \Vert _2^2}:\, x \in S\left( {g,h \cup \left\{ {{\xi _j} - {\Vert {x - {a_j}} \Vert _2^2}:\, j = 0,\dots ,t - 1} \right\} } \right) \right\} ,\\ \qquad t = 1,\dots ,n. \end{array} \right. \end{aligned}$$
(4.30)

Then there exists a unique real point \(x^\star \) in \(S( {g,h \cup \{ {{\xi _t} - {{\Vert {x - {a_t}} \Vert }_2^2}:\, t = 0,\dots ,n} \}} )\) which satisfies the non-singular linear system of equations

$$\begin{aligned} {\left( {{a_t} - {a_0}} \right) ^T}x^\star = -\frac{1}{2}\left( {\xi _t} - {\xi _0} -{\Vert {{a_t}} \Vert _2^2} + {\Vert {{a_0}} \Vert _2^2}\right) ,\ t = 1,\dots ,n. \end{aligned}$$
(4.31)

The proof of Lemma 4 is postponed to Appendix A.5.
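To make Lemma 4 concrete, here is a small numerical illustration in Python (a toy instance of our own: the POPs in (4.30) are "solved" by dense sampling of the feasible set rather than by the Moment-SOS hierarchy). Take \(S(g,h)\) to be the unit circle in \({\mathbb {R}}^2\), with \(a_0=0\), \(a_1=e_1\), \(a_2=e_2\); the sequence \(\xi _0,\xi _1,\xi _2\) singles out the point \(x^\star =(1,0)\), which is then recovered from the linear system (4.31):

```python
import math

# Toy feasible set: the unit circle {x in R^2 : x1^2 + x2^2 = 1},
# sampled densely as a stand-in for solving each POP in (4.30) exactly.
N = 20000
pts = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N))
       for i in range(N)]

def sq_dist(x, a):
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))

a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # a_0 = 0, a_1 = e_1, a_2 = e_2
xi, feas = [], pts
for t in range(3):
    xi_t = min(sq_dist(x, a[t]) for x in feas)
    xi.append(xi_t)
    # intersect with the sphere ||x - a_t||_2^2 = xi_t (sampling tolerance)
    feas = [x for x in feas if abs(sq_dist(x, a[t]) - xi_t) < 1e-8]

# Recover x* from the linear system (4.31); with a_0 = 0 and a_t = e_t it
# reads x*_t = (xi_0 - xi_t + ||a_t||_2^2) / 2 = (xi_0 - xi_t + 1) / 2.
x_rec = tuple((xi[0] - xi[t] + 1.0) / 2.0 for t in (1, 2))
```

Here \(\xi _0=1\), \(\xi _1=0\), \(\xi _2=2\), the three spheres intersect \(S(g,h)\) in the singleton \(\{(1,0)\}\), and the linear system returns that point.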

Fig. 1

Illustration of Lemma 4

Geometrically speaking, we find a sequence of spheres \({\partial {B( {{a_t},\xi _t^{1/2}})} }\), \(t=0,\dots ,n\), such that the intersection between these spheres and S(gh) is the singleton \(\{x^\star \}\) (see Fig. 1). Next, we use Lasserre’s hierarchy to compute the optimal values \(\xi _t\), \(t=0,\dots ,n\) of problem (4.30).

Theorem 8

Assume that \(S(g, h ) \cap \overline{B( {0,L^{1/2}})} \ne \emptyset \) for some \(L>0\). Let \({( {{a_t}} )_{t = 0,1,\dots ,n}} \subset {{\mathbb {R}}^n}\) be such that \({a_t}-a_0,\ t = 1,\dots ,n,\) are linearly independent in \({\mathbb {R}}^n\). Assume that the Moment-SOS hierarchies associated with the following POPs:

$$\begin{aligned} \left\{ \begin{array}{l} {\xi _0} := \min \left\{ {\Vert {x - {a_0}} \Vert _2^2}:\, x \in S\left( {g\cup \left\{ {L - {\Vert x \Vert _2^2}} \right\} ,h} \right) \right\} ,\\ {\xi _t} := \min \left\{ {\Vert {x - {a_t}} \Vert _2^2}:\, x \in S\left( g,h\cup {\left\{ {{\xi _j} - {\Vert {x - {a_j}} \Vert _2^2}:\, j = 0,\dots ,t - 1} \right\} } \right) \right\} ,\\ \qquad t = 1,\dots ,n, \end{array} \right. \end{aligned}$$
(4.32)

have finite convergence, and let \(w:=\max _{j,q}\{ {u_j,{w_q},1} \}\). For every \(k\in {\mathbb {N}}\), consider the following semidefinite programs:

$$\begin{aligned} \left\{ \begin{array}{l} \begin{array}{rl} \eta _{k}^0 := \mathop {\inf }\limits _{y \in {{\mathbb {R}}^{s( {2(k+w)} )}}} & {L_y}\left( {\Vert {x - a_0} \Vert _2^2} \right) \\ \text {s.t. }&{M_{k+w-u_j}}\left( g_jy \right) \succeq 0,\\ &{M_{k+w - 1}}\left( {\left( {L - {\Vert x \Vert _2^2}} \right) y} \right) \succeq 0,\\ &{M_{{k+w} - {w_q}}}\left( {h_qy}\right) = 0,\\ &{y_0} = 1, \end{array}\\ \begin{array}{rl} \eta _{k}^t := \mathop {\inf }\limits _{y \in {{\mathbb {R}}^{s( {2(k+w)} )}}} & {L_y}\left( {\Vert {x - {a_t}} \Vert _2^2} \right) \\ \text {s.t. }&{M_{k+w-u_j}}\left( g_jy \right) \succeq 0,\\ &{M_{{k+w} - {w_q}}}( {h_qy} ) = 0,\\ &{M_{k+w - 1}}\left( {\left( {{\eta _{k}^ j} - {\Vert {x - {a_j}} \Vert _2^2}} \right) y} \right) = 0,\; j = 0,\dots ,t - 1,\\ &{y_0} = 1, \end{array}\\ \qquad t=1,\dots ,n. \end{array} \right. \end{aligned}$$
(4.33)

Then there exists \(K\in {\mathbb {N}}\) such that for all \(k\ge K\), \(\eta _{k}^t = {\xi _t}\), \(t = 0,\dots ,n\). Moreover, there exist \(t \in \left\{ {0,\dots ,n} \right\} \) and \(\tilde{K}\in {\mathbb {N}}\) such that for all \(k\ge \tilde{K}\), the solution y of SDP (4.33) with value \(\eta _{k}^t\) satisfies the flat extension condition, i.e., \(\text {rank}\left( {{M_{k+w}}\left( y \right) } \right) = \text {rank}\left( {{M_{k}}\left( y \right) } \right) \). In addition, y has a representing \(\text {rank}\left( {{M_k}\left( y \right) } \right) \)-atomic measure \(\mu \) and \({\mathop \mathrm{supp}\nolimits } \left( \mu \right) \subset S\left( {g,h} \right) \).

The proof of Theorem 8 is postponed to Appendix A.6.

Remark 8

The Moment-SOS hierarchy of each POP (4.32) has finite convergence when one of the following conditions is satisfied:

  1. 1.

    (Lasserre [26, Theorem 7.5]) The ideal \(\left<h\right>:=\{\sum _j\psi _jh_j\,:\,\psi _j\in {\mathbb {R}}[x]\}\) is real radical, and the second-order sufficient condition holds at every global minimizer of each POP in (4.32).

  2. 2.

    (Lasserre et al. [27, Proposition 1.1] and [26, Theorem 6.13]) The real variety V(h) (\(=S(\emptyset ,h)\)) is finite.

Remark 9

In the final conclusion of Theorem 8, when y has a representing \({\hbox {rank}}( {{M_k}( y )} )\)-atomic measure \(\mu \), we may use the extraction algorithm from [15] to obtain the atomic support of \(\mu \).

Based on Theorem 8, Algorithm 1 below finds a feasible point in a nonempty (possibly non-compact) semialgebraic set \(S\left( {g,h} \right) \).

Algorithm 1

PolySys

Input: \(S( {g,h})\ne \emptyset \), \({( {{a_t}} )_{t = 0,1,\dots ,n}} \subset {{\mathbb {R}}^n}\) such that \({a_t} - {a_0},\, t = 1,\dots ,n\) are linearly independent, \(\varepsilon >0\) and \(k\in {\mathbb {N}}\).

Output: \(\bar{x}\). Begin with \(t:=0\) and do:

  1. 1.

    Solve SDP (4.26) with \(f = {\Vert {x }\Vert _2^2}\) to obtain \(\rho ^i_k( \varepsilon )\). Set \(L:=\rho ^i_k( \varepsilon )\) and go to step 2.

  2. 2.

    Solve SDP (4.33) to obtain \(\eta _{k}^t\) and an associated solution y.

    1. (a)

      If \(t\le n\) and \({\hbox {rank}}\,( {{M_{k+w}}( y )} ) = {\hbox {rank}}\,( {{M_k}( y )} )\), i.e., y has a representing measure \(\mu \), extract \({\mathop \mathrm{supp}\nolimits } ( \mu )\) from y by using the algorithm from [15]. Take \({\bar{x}}\in {\mathop \mathrm{supp}\nolimits } ( \mu )\) and stop.

    2. (b)

      If \(t\le n\) and \({\hbox {rank}}\,( {{M_{k+w}}( y )} ) \ne {\hbox {rank}}\,( {{M_k}( y )} )\), set \(t:=t+1\) and do again step 2.

    3. (c)

      If \(t= n+1\), stop.

Proposition 7

Let the assumptions of Theorem 8 hold. For k sufficiently large, Algorithm 1 terminates and \(\bar{x}\in S(g,h)\).

Proof

The proof follows from Theorem 8 and Remark 9. \(\square \)

In Algorithm 1, step 1 computes the radius \(L^{1/2}\) of the ball \(\overline{B( {0,L^{1/2}} )}\) which has non-empty intersection with S(gh). Then step 2 checks the flat extension condition and extracts the solution \(\bar{x}\).

Remark 10

At step 2 in Algorithm 1, for k sufficiently large, the rank of the moment matrix \({\hbox {rank}}( {{M_{k+w}}( y )} )\) decreases to one as t goes from 0 to n. Indeed, for each t between 0 and n, we replace the semialgebraic set S(g,h) by its intersection with the t spheres \({\partial {B( {{a_j},\xi _j^{1/2}} )} }\), \(j=0,\dots ,t-1\). This intersection contains the support of the measure with moments y. Since \(S( {g,h}) \cap \bigcap _{j = 0}^n {\partial B( {{a_j},\xi _j^{1/2}} )} = \{ {{x^\star }} \}\), this support shrinks to \(\{x^\star \}\) as t goes from 0 to n. Thus for large enough k, the solution y of SDP (4.33) with value \(\eta ^n_k\) has a representing measure supported on \({x^\star } = ( {{y_{{e_1}}},\dots ,{y_{{e_n}}}} )\). Here \(e_i\), \(i=1,\dots ,n\), is the canonical basis of \({\mathbb {R}}^n\).

The decrease of the moment matrix rank in Algorithm 1 for the kissing number problem with \(g_1 = x_1^2 + x_2^2 + x_3^2 + x_4^2 -2x_1x_3-2x_2x_4-1\), \(h_1 = x_1^2 + x_2^2 - 1\) and \(h_2 = x_3^2 + x_4^2 - 1\) is illustrated in Table 1. Here \(e_i\), \(i=1,\dots ,4\), is the canonical basis of \({\mathbb {R}}^4\). In this example, \({\hbox {rank}}( {{M_1}( y )} )\) decreases from 5 to 1 as t goes from 0 to 4, and \(M_1(y)\) fulfills the flat extension condition from \(t=3\) on.
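As a quick sanity check of these constraint data (a Python verification we add for illustration), note that \(g_1=\Vert u-v\Vert _2^2-1\) with \(u=(x_1,x_2)\) and \(v=(x_3,x_4)\), so feasibility asks for two points on the unit circle at distance at least one; for instance, the antipodal pair \(x=(1,0,-1,0)\) is feasible:

```python
def g1(x1, x2, x3, x4):
    # ||u - v||_2^2 - 1 with u = (x1, x2), v = (x3, x4)
    return x1**2 + x2**2 + x3**2 + x4**2 - 2*x1*x3 - 2*x2*x4 - 1

def h1(x1, x2, x3, x4):
    return x1**2 + x2**2 - 1  # u lies on the unit circle

def h2(x1, x2, x3, x4):
    return x3**2 + x4**2 - 1  # v lies on the unit circle

x = (1.0, 0.0, -1.0, 0.0)  # two antipodal unit vectors: ||u - v||^2 = 4
feasible = h1(*x) == 0.0 and h2(*x) == 0.0 and g1(*x) >= 0.0
```

Algorithm 1 then intersects this (positive-dimensional) feasible set with spheres until a single such configuration remains.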

Table 1 Decrease of the moment matrix rank in Algorithm 1

Remark 11

ASC can be used to find an approximation of a real point in S(g,h) even if S(g,h) is positive-dimensional. This is illustrated later on by our numerical experiments from Sect. 5 (see the polynomial systems corresponding to Id 6, 7, 8 and 13).

Obtaining a minimizer by using the ASC method We rely on the following algorithm to find the value \(\rho _k^i( \varepsilon )\) of SDP (4.26), which approximates \(f^\star =\min \{ {f( x ):\, x \in S( {g,h} )}\}\), together with an approximation \( \bar{x} \) of a minimizer \(x^\star \) for this problem.

Algorithm 2

PolyOpt

Input: f, \(S(g,h)\ne \emptyset \), \(\varepsilon >0\) and \(k\in {\mathbb {N}}\).

Output: \(\rho _k^i( \varepsilon )\) and \(\bar{x}\).

  1. 1.

    Solve SDP (4.26) to obtain \(\rho _k^i( \varepsilon )\).

  2. 2.

    Compute \(\bar{x}\) in \(S( {g\cup \{ {{\rho _k^i( \varepsilon )} - f}\},h })\) by using Algorithm 1 and stop.

Proposition 8

If POP \(f^\star :=\inf \{ {f( x ):\, x \in S( {g,h} )}\}\) admits an optimal solution \(x^\star \), then for k large enough, Algorithm 2 terminates and \(f^\star \le \rho _k^i( \varepsilon ) \le f^\star + \varepsilon \theta (x^\star )^d\). Moreover, for k large enough, \(\bar{x} \in S( {g,h} )\) and \({f^\star }\le f( {\bar{x}} ) \le {f^\star }+\varepsilon \theta {( {{x^\star }} )^d}\) if the assumptions of Theorem 8 hold for \(S( {g\cup \{ {{\rho _k^i( \varepsilon )} - f}\},h })\).

In practice, one performs Algorithm 2 several times by updating \(k:=k+1\) until one obtains \(\bar{x}\) in \(S( {g\cup \{ {{\rho _k^i( \varepsilon )} - f}\},h })\). Obviously, one has \(f^\star +\varepsilon \theta (x^\star )^d\ge \rho _k^i( \varepsilon )\ge f(\bar{x})\ge f^\star \).

5 Examples

In this section, we report results obtained after solving some instances of POP (4.24) with Algorithm 2. As before, we denote by \(g = \{ {{g_1},\dots ,{g_m}} \}\) and \(h = \{ {h_1},\dots ,{h_l} \}\) the sets of polynomials involved in the inequality constraints and the equality constraints, respectively. In particular, the resulting set S(g,h) is unbounded for all examples. The experiments are performed with both MATLAB R2018a/Yalmip and Julia 1.1.1/JuMP to model the semidefinite optimization problems and Mosek 8.0 to solve these problems. The codes for Algorithm 1 (PolySys) and Algorithm 2 (PolyOpt) can be downloaded from the link: https://github.com/maihoanganh. In these codes, we always set \(a_0:=0_{{\mathbb {R}}^n}\) and \(a_1,\dots ,a_n\) as the canonical basis of \({\mathbb {R}}^n\). We use a desktop computer with an Intel(R) Pentium(R) CPU N4200 @ 1.10 GHz and 4.00 GB of RAM. The input data given in Table 2 include examples of unconstrained and constrained POPs. The corresponding output data, the exact results and timings are given in Table 3. In these tables, the SOS hierarchy (4.26) is solved via optimization models in Yalmip (Y) and JuMP (J). The symbol “−” in a column entry indicates that the calculation did not finish within a couple of hours.

Table 2 Examples of POPs
Table 3 Numerical experiments with \(\varepsilon =10^{-5}\)

Id 1–5 are unconstrained POPs, Id 6–12 are POPs with inequality constraints, Id 13–18 are POPs with equality constraints and Id 19–25 are POPs with both inequality and equality constraints. Id 8, 11 and 12 correspond to examples from Jeyakumar et al. [19, 20]. Id 9 and 10 are selected from Demmel et al. [10]. Id 13–17 come from Greuet et al. [14]. Id 23, 24 and 25 are POPs constructed from some inequalities issued from Mathematics competitions mentioned in [12, 57], yielding non-compact POPs with known optimal values and optimizers.

Even though the sets of minimizers associated to Id 6, 7, 8 and 13 are positive-dimensional, we can still extract a single approximate optimal solution by using our ASC algorithm. Note that ASC computes a real point \(\bar{x}\) in \(S( {g\cup \{ {{\rho _k^i( \varepsilon )} - f}\},h })\), a set which, for k sufficiently large, is an outer approximation of the set of minimizers \(\{x^\star \in S(g,h): f(x^\star )=f^\star \}\).

In Table 3, Algorithm 2 terminates at some order \(k\le 5\) for all POPs except Id 16. Note that for Id 16, the global minimizer does not satisfy the KKT conditions. Thus the methods of Demmel et al. [10, 41] and Nie [40] cannot be used to solve this POP. Moreover, the convergence rate of \((\rho _k^i( \varepsilon ))_{k\in {\mathbb {N}}}\) in Id 16 is very poor when \(\varepsilon \le 10^{-5}\). We overcome this issue by fixing k, multiplying \(\varepsilon \) by 10, and solving the relaxations again. The price to pay is a larger gap \(\varepsilon \, \theta {( {{x^\star }} )^d}\) between \(\rho _k^i( \varepsilon )\) and \(f^\star \). This behavior is illustrated in Table 4.

Table 4 Numerical experiments for Id 16 with various values of \(\varepsilon \)

In Id 18, even though the ideal \(\langle h \rangle \) is not radical and V(h) is not equidimensional (so the assumptions required to apply the framework in [14] are not guaranteed), our ASC method can still extract one solution of the problem.

For Id 21, we can improve the quality of the approximation \(\rho _k^i( \varepsilon )\) of the optimal value \(f^\star \) by fixing \(k=1\), dividing \(\varepsilon \) by 10, and solving again the relaxations. This is illustrated in Table 5.

Table 5 Numerical experiments for Id 21 with various values of \(\varepsilon \)

We emphasize that we can customize the \(\varepsilon \) parameter for different purposes. On the one hand, one increases \(\varepsilon \) to improve the convergence speed of the sequence \((\rho ^i_k(\varepsilon ))_{k\in {\mathbb {N}}}\) to the neighborhood \([f^\star ,f^\star +\varepsilon \theta (x^\star )^d]\) of \(f^\star \) (see Table 4). On the other hand, one decreases \(\varepsilon \) to improve the accuracy of the approximate optimal value \(\rho ^i_k(\varepsilon )\) and the approximate optimal solution \(\bar{x}\) (see Table 5).

Our numerical benchmarks also show that modeling in JuMP is faster and provides more accurate outputs than modeling in Yalmip. In particular, the JuMP implementation is the only one which provides solutions for Id 11, 12, 17 and 23.

Let us now denote by \(k_\varepsilon \) the smallest nonnegative integer such that \(\rho _{k_\varepsilon }^i(\varepsilon )\ge f^\star \), for each \(\varepsilon >0\). The graph of the function \(\varepsilon ^{-1}\mapsto k_\varepsilon \) on (0, 100] for Id 9 and Id 16 is illustrated in Fig. 2. Here Id 9 (resp. Id 16) is an example of POP whose global minimizers satisfy (resp. do not satisfy) the KKT conditions. This allows us to experimentally compare the complexity of Algorithm 2 in both cases. For Id 9, the function grows very slowly, almost like a constant function, which is in deep contrast with Id 16, where the function increases more quickly and seems to exhibit step-wise linear growth. Theorem 7 states that \(k_\varepsilon \) has an upper bound which is exponential in \(\varepsilon ^{-1}\), i.e., \(k_\varepsilon \le C \exp \left( O(\varepsilon ^{-1})^C \right) -d\) as \(\varepsilon ^{-1} \rightarrow \infty \). It is an open question whether \(k_\varepsilon \le O(\varepsilon ^{-N})\) as \(\varepsilon \downarrow 0\) for some \(N>0\) in the constrained case. Other open questions are whether and how the KKT conditions affect the convergence rate of \((\rho ^i_k(\varepsilon ))_{k}\).

Fig. 2

Plot of the complexity

6 Conclusion

In this paper, we have established new proofs for two representations of globally nonnegative polynomials and of polynomials nonnegative on semialgebraic sets, based on the homogeneous representations in [47, 49]. We have then converted these representations into a practical numerical scheme for approximating the global minimum, providing converging hierarchies of semidefinite relaxations for unconstrained and constrained polynomial optimization problems.

We have also introduced a method based on adding spherical constraints (ASC) to solve systems of polynomial equalities and inequalities, and to obtain global solutions of polynomial optimization problems as well.

In view of the practical efficiency of ASC, a topic of further investigation is to provide a more detailed comparison with other methods for solving polynomial systems. Another direction of research is to derive a sparse variant of Putinar–Vasilescu’s Positivstellensatz in order to improve the scalability of our optimization framework. For this, we cannot directly rely on existing sparse polynomial optimization techniques by [61] since the perturbation term is not sparse. One possible workaround is to rely on a sparse variant of Reznick’s Positivstellensatz [35], involving sparse uniform denominators.

We currently need the assumption that the infimum of a given POP is attained to guarantee convergence of our hierarchy of relaxations to a value in a neighborhood of this infimum. This raises the open question of convergence to the infimum in the case where it is not attained.