1 Introduction

A general class of optimization problems is

$$\begin{aligned} \left\{ \begin{array}{rl} f_{\min } := \min &{} f(x) \\ s.t. &{} c_i(x) = 0 \, ( i \in \mathcal {E} ), \\ &{} c_j(x) \ge 0 \, ( j \in \mathcal {I} ), \end{array} \right. \end{aligned}$$
(1.1)

where f and all \(c_i, c_j\) are polynomials in \(x :=(x_1, \ldots , x_n)\), the real decision variable. The \(\mathcal {E}\) and \(\mathcal {I}\) are two disjoint finite index sets of constraining polynomials. Lasserre’s relaxations [17] are generally used for solving (1.1) globally, i.e., to find the global minimum value \(f_{\min }\) and minimizer(s) if any. The convergence of Lasserre’s relaxations is related to optimality conditions.

1.1 Optimality conditions

A general introduction of optimality conditions in nonlinear programming can be found in [1, Section 3.3]. Let u be a local minimizer of (1.1). Denote the index set of active constraints

$$\begin{aligned} J(u) := \{i \in \mathcal {E}\cup \mathcal {I} \, \mid \, c_i(u) = 0\}. \end{aligned}$$
(1.2)

If the constraint qualification condition (CQC) holds at u, i.e., the gradients \(\nabla c_i(u)\)   \((i \in J(u))\) are linearly independent (\(\nabla \) denotes the gradient), then there exist Lagrange multipliers \(\lambda _i \, (i \in \mathcal {E} \cup \mathcal {I})\) satisfying

$$\begin{aligned} \nabla f(u)= & {} \sum _{ i \in \mathcal {E} \cup \mathcal {I} } \lambda _i \nabla c_i(u), \end{aligned}$$
(1.3)
$$\begin{aligned} c_i(u)= & {} 0 \, (i \in \mathcal {E}), \quad \lambda _j c_j(u) = 0 \, (j \in \mathcal {I}), \end{aligned}$$
(1.4)
$$\begin{aligned} c_j(u)\ge & {} 0 \, (j \in \mathcal {I}), \quad \lambda _j \ge 0 \, (j \in \mathcal {I}). \end{aligned}$$
(1.5)

The second equation in (1.4) is called the complementarity condition. If \(\lambda _j + c_j(u) >0\) for all \(j \in \mathcal {I}\), the strict complementarity condition (SCC) is said to hold. For the \(\lambda _i\)’s satisfying (1.3)–(1.5), the associated Lagrange function is

$$\begin{aligned} \mathscr {L}(x) := f(x) - \sum _{ i \in \mathcal {E} \cup \mathcal {I} } \lambda _i c_i(x). \end{aligned}$$

Under the constraint qualification condition, the second order necessary condition (SONC) holds at u, i.e., (\(\nabla ^2\) denotes the Hessian)

$$\begin{aligned} v^T \Big ( \nabla ^2 \mathscr {L}(u) \Big ) v \ge 0 \quad \hbox { for all } \, v \in \bigcap _{i\in J(u)} \nabla c_i(u)^\perp . \end{aligned}$$
(1.6)

Here, \(\nabla c_i(u)^\perp \) is the orthogonal complement of \(\nabla c_i(u)\). If it further holds that

$$\begin{aligned} v^T \Big ( \nabla ^2 \mathscr {L}(u) \Big ) v > 0 \quad \hbox { for all } \, 0 \ne v \in \bigcap _{i\in J(u)} \nabla c_i(u)^\perp , \end{aligned}$$
(1.7)

then the second order sufficient condition (SOSC) is said to hold. If the constraint qualification condition holds at u, then (1.3), (1.4) and (1.6) are necessary conditions for u to be a local minimizer. If (1.3), (1.4), (1.7) and the strict complementarity condition hold, then u is a strict local minimizer.
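
As a concrete illustration of these conditions, the following is a small numerical sketch (numpy assumed available; all problem data are assumed for illustration only) that checks (1.3)–(1.5) and the restricted Hessian conditions (1.6)–(1.7) at a candidate minimizer.

```python
import numpy as np

# Assumed toy data: f(x) = x1^2 + x2 over the disk c1(x) = 1 - x1^2 - x2^2 >= 0,
# candidate minimizer u = (0, -1) with Lagrange multiplier lambda_1 = 1/2.
u = np.array([0.0, -1.0])
lam1 = 0.5

grad_f = np.array([2 * u[0], 1.0])                      # gradient of f at u
grad_c1 = np.array([-2 * u[0], -2 * u[1]])              # gradient of c1 at u
hess_L = np.array([[2.0, 0.0], [0.0, 0.0]]) \
         - lam1 * np.array([[-2.0, 0.0], [0.0, -2.0]])  # Hessian of the Lagrange function at u

print(np.allclose(grad_f, lam1 * grad_c1))              # stationarity (1.3): True
print(1 - u @ u, lam1)                                  # active constraint and sign (1.4)-(1.5): 0.0, 0.5
V = np.array([[1.0], [0.0]])                            # basis of the space orthogonal to grad_c1(u)
print(np.linalg.eigvalsh(V.T @ hess_L @ V))             # positive eigenvalue, so (1.7) (SOSC) holds
```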

1.2 Some existing work

Under the archimedean condition (see Sect. 2), the hierarchy of Lasserre’s relaxations converges asymptotically [17]. Moreover, in addition to the archimedeanness, if the constraint qualification, strict complementarity, and second order sufficient conditions hold at every global minimizer, then the Lasserre’s hierarchy converges in finitely many steps [33]. For convex polynomial optimization, the Lasserre’s hierarchy has finite convergence under the strict convexity or sos-convexity condition [7, 20]. For unconstrained polynomial optimization, the standard sum of squares relaxation was proposed in [35]. When the equality constraints define a finite set, the Lasserre’s hierarchy also has finite convergence, as shown in [18, 24, 31]. Recently, a bounded degree hierarchy of relaxations was proposed for solving polynomial optimization [23]. General introductions to polynomial optimization and moment problems can be found in the books and surveys [21, 22, 25, 26, 39]. Lasserre’s relaxations provide lower bounds for the minimum value. There also exist methods that compute upper bounds [8, 19]. A convergence rate analysis for such upper bounds is given in [9, 10]. When a polynomial optimization problem does not have minimizers (i.e., the infimum is not achievable), there are relaxation methods for computing the infimum [38, 42].

A new type of Lasserre’s relaxations, based on Jacobian representations, was recently proposed in [30]. The hierarchy of such relaxations always has finite convergence when the tuple of constraining polynomials is nonsingular (i.e., at every point in \(\mathbb {C}^n\), the gradients of the active constraining polynomials are linearly independent; see Definition 5.1). When there are only equality constraints \(c_1(x)=\cdots = c_m(x)=0\), the method needs the maximal minors of the matrix

$$\begin{aligned} \begin{bmatrix} \nabla f(x)&\quad \nabla c_1(x)&\quad \cdots&\quad \nabla c_m(x) \end{bmatrix}. \end{aligned}$$

When there are inequality constraints, it requires enumerating all possible sets of active constraints. The method in [30] is expensive when there are many constraints. For unconstrained optimization, it reduces to the gradient sum of squares relaxations in [27].

1.3 New contributions

When Lasserre’s relaxations are used to solve polynomial optimization, the following issues are typically of concern:

  • The convergence depends on the archimedean condition (see Sect. 2), which is satisfied only if the feasible set is compact. If the set is noncompact, how can we get convergent relaxations?

  • The cost of Lasserre’s relaxations depends significantly on the relaxation order. For a fixed order, can we construct tighter relaxations than the standard ones?

  • When the convergence of Lasserre’s relaxations is slow, can we construct new relaxations whose convergence is faster?

  • When the optimality conditions fail to hold, the Lasserre’s hierarchy might not have finite convergence. Can we construct a new hierarchy of stronger relaxations that also has finite convergence for such cases?

This paper addresses the above concerns. We construct tighter relaxations by using optimality conditions. In (1.3)–(1.4), under the constraint qualification condition, the Lagrange multipliers \(\lambda _i\) are uniquely determined by u. Consider the polynomial system in \((x,\lambda )\):

$$\begin{aligned} \sum _{ i \in \mathcal {E} \cup \mathcal {I} } \lambda _i \nabla c_i(x) = \nabla f(x), \quad c_i(x) = 0 \, (i \in \mathcal {E}), \quad \lambda _j c_j(x) = 0 \, (j \in \mathcal {I}). \end{aligned}$$
(1.8)

A point x satisfying (1.8) is called a critical point, and such \((x,\lambda )\) is called a critical pair. In (1.8), once x is known, \(\lambda \) can be determined by linear equations. Generally, the value of x is not known. One can try to express \(\lambda \) as a rational function in x. Suppose \(\mathcal {E}\cup \mathcal {I} = \{1,\ldots , m\}\) and denote

$$\begin{aligned} G(x) := \begin{bmatrix} \nabla c_1(x)&\quad \cdots&\quad \nabla c_m(x) \end{bmatrix}. \end{aligned}$$

When \(m\le n\) and \(\text{ rank }\,G(x)=m\), we can get the rational expression

$$\begin{aligned} \lambda = \big ( G(x)^TG(x) \big )^{-1} G(x)^T \nabla f(x). \end{aligned}$$
(1.9)

Typically, the matrix inverse \(\big ( G(x)^TG(x) \big )^{-1}\) is expensive to use. The denominator \(\det \big ( G(x)^TG(x) \big )\) is usually a polynomial of high degree. When \(m>n\), \(G(x)^TG(x)\) is always singular and we cannot express \(\lambda \) as in (1.9).
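
For illustration, here is a small symbolic sketch of the rational expression (1.9); sympy is assumed available, and the objective and constraint are assumed only for illustration.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**4 + x2                         # assumed objective
c = [1 - x1**2 - x2**2]                # assumed single constraint, so m = 1 <= n = 2
x = sp.Matrix([x1, x2])

G = sp.Matrix.hstack(*[sp.Matrix([sp.diff(ci, v) for v in x]) for ci in c])  # G(x)
grad_f = sp.Matrix([sp.diff(f, v) for v in x])
lam = (G.T * G).inv() * G.T * grad_f   # the rational expression (1.9)
print(sp.simplify(lam))                # a rational, not polynomial, function of x
```

Even in this tiny instance the result is a genuine rational function, with denominator coming from \(\det \big ( G(x)^TG(x) \big ) = 4(x_1^2+x_2^2)\); this is why a polynomial expression as in (1.10) below is preferable whenever it exists.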

Do there exist polynomials \(p_i\) (\(i \in \mathcal {E} \cup \mathcal {I}\)) such that each

$$\begin{aligned} \lambda _i = p_i(x) \end{aligned}$$
(1.10)

for all \((x,\lambda )\) satisfying (1.8)? If they exist, then we can do the following:

  • The polynomial system (1.8) can be simplified to

    $$\begin{aligned} \sum _{ i \in \mathcal {E} \cup \mathcal {I} } p_i(x) \nabla c_i(x) = \nabla f(x), \quad c_i(x) = 0 (i \in \mathcal {E}), \quad p_j(x) c_j(x) = 0 (j \in \mathcal {I}). \end{aligned}$$
    (1.11)
  • For each \(j \in \mathcal {I}\), the sign condition \(\lambda _j \ge 0\) is equivalent to

    $$\begin{aligned} p_j(x) \ge 0. \end{aligned}$$
    (1.12)

The new conditions (1.11) and (1.12) are only about the variable x, not \(\lambda \). They can be used to construct tighter relaxations for solving (1.1).

When do there exist polynomials \(p_i\) satisfying (1.10)? If they exist, how can we compute them? How can we use them to construct tighter relaxations? Do the new relaxations have advantages over the old ones? These questions are the main topics of this paper. Our major results are:

  • We show that the polynomials \(p_i\) satisfying (1.10) always exist when the tuple of constraining polynomials is nonsingular (see Definition 5.1). Moreover, they can be determined by linear equations.

  • Using the new conditions (1.11)–(1.12), we can construct tight relaxations for solving (1.1). To be more precise, we construct a hierarchy of new relaxations, which has finite convergence. This is true even if the feasible set is noncompact and/or the optimality conditions fail to hold.

  • For every relaxation order, the new relaxations are tighter than the standard ones in the prior work.

The paper is organized as follows. Section 2 reviews some basics in polynomial optimization. Section 3 constructs new relaxations and proves their tightness. Section 4 characterizes when the polynomials \(p_i\)’s satisfying (1.10) exist and shows how to determine them, for polyhedral constraints. Section 5 discusses the case of general nonlinear constraints. Section 6 gives examples of using the new relaxations. Section 7 discusses some related issues.

2 Preliminaries

Notation The symbol \(\mathbb {N}\) (resp., \(\mathbb {R}\), \(\mathbb {C}\)) denotes the set of nonnegative integral (resp., real, complex) numbers. The symbol \(\mathbb {R}[x] := \mathbb {R}[x_1,\ldots ,x_n]\) denotes the ring of polynomials in \(x:=(x_1,\ldots ,x_n)\) with real coefficients. The \(\mathbb {R}[x]_d\) stands for the set of real polynomials with degrees \(\le d\). Denote

$$\begin{aligned} \mathbb {N}^n_d :=\{ \alpha := (\alpha _1, \ldots , \alpha _n) \in \mathbb {N}^n \mid |\alpha |:=\alpha _1+\cdots +\alpha _n \le d \}. \end{aligned}$$

For a polynomial p, \(\deg (p)\) denotes its total degree. For \(t \in \mathbb {R}\), \(\lceil t \rceil \) denotes the smallest integer \(\ge t\). For an integer \(k>0\), denote \( [k]:= \{1,2,\ldots , k\}. \) For \(x =(x_1,\ldots , x_n)\) and \(\alpha = (\alpha _1, \ldots , \alpha _n)\), denote

$$\begin{aligned} x^\alpha := x_1^{\alpha _1} \cdots x_n^{\alpha _n}, \quad [x]_{d} := \begin{bmatrix} 1&x_1&\cdots&x_n&x_1^2&x_1x_2&\cdots&x_n^{d}\end{bmatrix}^T. \end{aligned}$$

The superscript \(^T\) denotes the transpose of a matrix/vector. The \(e_i\) denotes the ith standard unit vector, while e denotes the vector of all ones. The \(I_m\) denotes the m-by-m identity matrix. By writing \(X\succeq 0\) (resp., \(X\succ 0\)), we mean that X is a symmetric positive semidefinite (resp., positive definite) matrix. For matrices \(X_1,\ldots , X_r\), \(\text{ diag }(X_1, \ldots , X_r)\) denotes the block diagonal matrix whose diagonal blocks are \(X_1,\ldots , X_r\). In particular, for a vector a, \(\text{ diag }(a)\) denotes the diagonal matrix whose diagonal vector is a. For a function f in x, \(f_{x_i}\) denotes its partial derivative with respect to \(x_i\).

We review some basics in computational algebra and polynomial optimization. They could be found in [4, 21, 22, 25, 26]. An ideal I of \(\mathbb {R}[x]\) is a subset such that \( I \cdot \mathbb {R}[x] \subseteq I\) and \(I+I \subseteq I\). For a tuple \(h := (h_1,\ldots ,h_m)\) of polynomials, \(\hbox {Ideal}(h)\) denotes the smallest ideal containing all \(h_i\), which is the set

$$\begin{aligned} h_1 \cdot \mathbb {R}[x] + \cdots + h_m \cdot \mathbb {R}[x]. \end{aligned}$$

The 2kth truncation of \(\hbox {Ideal}(h)\) is the set

$$\begin{aligned} \hbox {Ideal}(h)_{2k} := h_1 \cdot \mathbb {R}[x]_{2k-\deg (h_1)} + \cdots + h_m \cdot \mathbb {R}[x]_{2k-\deg (h_m)}. \end{aligned}$$

The truncation \(\hbox {Ideal}(h)_{2k}\) depends on the generators \(h_1,\ldots , h_m\). For an ideal I, its complex and real varieties are respectively defined as

$$\begin{aligned} \mathcal {V}_{\mathbb {C}}(I) := \{v\in \mathbb {C}^n \mid \, p(v) = 0 \, \forall \, p \in I \}, \quad \mathcal {V}_{\mathbb {R}}(I) := \mathcal {V}_{\mathbb {C}}(I) \cap \mathbb {R}^n. \end{aligned}$$

A polynomial \(\sigma \) is said to be a sum of squares (SOS) if \(\sigma = s_1^2+\cdots + s_k^2\) for some polynomials \(s_1,\ldots , s_k \in \mathbb {R}[x]\). The set of all SOS polynomials in x is denoted as \(\varSigma [x]\). For a degree d, denote the truncation

$$\begin{aligned} \varSigma [x]_d := \varSigma [x] \cap \mathbb {R}[x]_d. \end{aligned}$$

For a tuple \(g=(g_1,\ldots ,g_t)\), its quadratic module is the set

$$\begin{aligned} \hbox {Qmod}(g):= \varSigma [x] + g_1 \cdot \varSigma [x] + \cdots + g_t \cdot \varSigma [x]. \end{aligned}$$

The 2kth truncation of \(\hbox {Qmod}(g)\) is the set

$$\begin{aligned} \hbox {Qmod}(g)_{2k} := \varSigma [x]_{2k} + g_1 \cdot \varSigma [x]_{2k - \deg (g_1)} + \cdots + g_t \cdot \varSigma [x]_{2k - \deg (g_t)}. \end{aligned}$$

The truncation \(\hbox {Qmod}(g)_{2k}\) depends on the generators \(g_1,\ldots , g_t\). Denote

$$\begin{aligned} \left\{ \begin{array}{l} \hbox {IQ}(h,g):= \hbox {Ideal}(h) + \hbox {Qmod}(g), \\ \hbox {IQ}(h,g)_{2k}:=\hbox {Ideal}(h)_{2k} + \hbox {Qmod}(g)_{2k}. \end{array} \right. \end{aligned}$$
(2.1)

The set \(\hbox {IQ}(h,g)\) is said to be archimedean if there exists \(p \in \hbox {IQ}(h,g)\) such that \(p(x) \ge 0\) defines a compact set in \(\mathbb {R}^n\). If \(\hbox {IQ}(h,g)\) is archimedean, then

$$\begin{aligned} K := \{x \in \mathbb {R}^n \mid h(x)=0, g(x)\ge 0 \} \end{aligned}$$

must be a compact set. Conversely, if K is compact, say, \(K \subseteq B(0,R)\) (the ball centered at 0 with radius R), then \(\hbox {IQ}(h, (g, R^2-x^Tx) )\) is always archimedean and \(h=0, \, (g, R^2-x^Tx) \ge 0\) define the same set K.

Theorem 2.1

(Putinar [36]) Let h, g be tuples of polynomials in \(\mathbb {R}[x]\). Let K be as above. Assume \(\hbox {IQ}(h,g)\) is archimedean. If a polynomial \(f\in \mathbb {R}[x]\) is positive on K, then \(f\in \hbox {IQ}(h,g)\).

Interestingly, if f is only nonnegative on K but standard optimality conditions hold (see Sect. 1.1), then we still have \(f \in \hbox {IQ}(h,g)\) [33].

Let \(\mathbb {R}^{\mathbb {N}_d^n}\) be the space of real multi-sequences indexed by \(\alpha \in \mathbb {N}^n_d\). A vector in \(\mathbb {R}^{\mathbb {N}_d^n}\) is called a truncated multi-sequence (tms) of degree d. A tms \(y :=(y_\alpha )_{ \alpha \in \mathbb {N}_d^n}\) gives the Riesz functional \(\mathscr {R}_y\) acting on \(\mathbb {R}[x]_d\) as

$$\begin{aligned} \mathscr {R}_y\Big (\sum _{\alpha \in \mathbb {N}_d^n} f_\alpha x^\alpha \Big ) := \sum _{\alpha \in \mathbb {N}_d^n} f_\alpha y_\alpha . \end{aligned}$$
(2.2)

For \(f \in \mathbb {R}[x]_d\) and \(y \in \mathbb {R}^{\mathbb {N}_d^n}\), we denote

$$\begin{aligned} \langle f, y \rangle := \mathscr {R}_y(f). \end{aligned}$$
(2.3)

Let \(q \in \mathbb {R}[x]_{2k}\). The kth localizing matrix of q, generated by \(y \in \mathbb {R}^{\mathbb {N}^n_{2k}}\), is the symmetric matrix \(L_q^{(k)}(y)\) such that

$$\begin{aligned} vec(a_1)^T \Big ( L_q^{(k)}(y) \Big ) vec(a_2) = \mathscr {R}_y(q a_1 a_2) \end{aligned}$$
(2.4)

for all \(a_1,a_2 \in \mathbb {R}[x]_{k - \lceil \deg (q)/2 \rceil }\). (The \(vec(a_i)\) denotes the coefficient vector of \(a_i\).) When \(q = 1\), \(L_q^{(k)}(y)\) is called a moment matrix and we denote

$$\begin{aligned} M_k(y):= L_{1}^{(k)}(y). \end{aligned}$$
(2.5)

The columns and rows of \(L_q^{(k)}(y)\), as well as \(M_k(y)\), are indexed by \(\alpha \in \mathbb {N}^n\) with \(2|\alpha | + \deg (q) \le 2k\). When \(q = (q_1, \ldots , q_r)\) is a tuple of polynomials, we define

$$\begin{aligned} L_q^{(k)}(y) := \hbox {diag}\Big ( L_{q_1}^{(k)}(y), \ldots , L_{q_r}^{(k)}(y) \Big ), \end{aligned}$$
(2.6)

a block diagonal matrix. For the polynomial tuples h, g as above, the set

$$\begin{aligned} \mathscr {S}(h,g)_{2k} := \Big \{\left. y \in \mathbb {R}^{ \mathbb {N}^n_{2k} } \right| L_h^{(k)}(y) =0, \, L_g^{(k)}(y) \succeq 0 \Big \} \end{aligned}$$
(2.7)

is a spectrahedral cone in \(\mathbb {R}^{ \mathbb {N}^n_{2k} }\). The set \(\hbox {IQ}(h,g)_{2k}\) is also a convex cone in \(\mathbb {R}[x]_{2k}\). The dual cone of \(\hbox {IQ}(h,g)_{2k}\) is precisely \(\mathscr {S}(h,g)_{2k}\) [22, 25, 34]. This is because \(\langle p, y \rangle \ge 0\) for all \(p \in \hbox {IQ}(h,g)_{2k}\) and for all \(y \in \mathscr {S}(h,g)_{2k}\).
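
To make these constructions concrete, the following is a minimal Python sketch of the localizing and moment matrices in (2.4)–(2.5); representing polynomials and truncated multi-sequences as dictionaries indexed by exponent tuples is an assumed convention for the sketch, not notation from this paper.

```python
import itertools
import numpy as np

def monomials(n, d):
    """All exponents alpha in N^n with |alpha| <= d, listed degree by degree."""
    return [a for t in range(d + 1)
            for a in itertools.product(range(t + 1), repeat=n) if sum(a) == t]

def localizing_matrix(q, y, n, k):
    """L_q^{(k)}(y) as in (2.4); q = {exponent: coefficient}, y = {exponent: y_alpha} up to degree 2k."""
    dq = max(sum(a) for a in q)                        # deg(q)
    idx = monomials(n, k - (dq + 1) // 2)              # row/column degrees <= k - ceil(deg(q)/2)
    L = np.zeros((len(idx), len(idx)))
    for i, a in enumerate(idx):
        for j, b in enumerate(idx):
            L[i, j] = sum(cf * y[tuple(ai + bi + gi for ai, bi, gi in zip(a, b, g))]
                          for g, cf in q.items())      # Riesz functional of q * x^a * x^b
    return L

def moment_matrix(y, n, k):
    """M_k(y) = L_1^{(k)}(y), as in (2.5)."""
    return localizing_matrix({(0,) * n: 1.0}, y, n, k)

# usage: the tms of a point u has y_alpha = u^alpha; its moment matrix is [u]_k [u]_k^T
n, k, u = 2, 2, np.array([0.5, -0.25])
y = {a: float(np.prod(u ** np.array(a))) for a in monomials(n, 2 * k)}
print(np.linalg.matrix_rank(moment_matrix(y, n, k)))   # 1: a point measure gives a rank-one moment matrix
```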

3 The construction of tight relaxations

Consider the polynomial optimization problem (1.1). Let

$$\begin{aligned} \lambda :=(\lambda _i)_{ i \in \mathcal {E}\cup \mathcal {I} } \end{aligned}$$

be the vector of Lagrange multipliers. Denote the critical set

$$\begin{aligned} \mathcal {K} := \left\{ (x, \lambda ) \in \mathbb {R}^n \times \mathbb {R}^{ \mathcal {E} \cup \mathcal {I} } \left| \begin{array}{c} c_i(x) = 0 (i \in \mathcal {E}), \, \lambda _j c_j(x) = 0\, (j \in \mathcal {I}) \\ \nabla f(x) = \sum \limits _{ i \in \mathcal {E} \cup \mathcal {I} } \lambda _i \nabla c_i(x) \end{array} \right. \right\} . \end{aligned}$$
(3.1)

Each point in \(\mathcal {K}\) is called a critical pair. The projection

$$\begin{aligned} \mathcal {K}_c := \{ u \mid (u, \lambda ) \in \mathcal {K} \} \end{aligned}$$
(3.2)

is the set of all real critical points. To construct tight relaxations for solving (1.1), we need the following assumption for Lagrange multipliers.

Assumption 3.1

For each \(i \in \mathcal {E} \cup \mathcal {I}\), there exists a polynomial \(p_i \in \mathbb {R}[x]\) such that for all \((x,\lambda ) \in \mathcal {K}\) it holds that

$$\begin{aligned} \lambda _i \, = \, p_i(x). \end{aligned}$$

Assumption 3.1 is generically satisfied, as shown in Proposition 5.7. For the following special cases, we can get polynomials \(p_i\) explicitly.

  • (Simplex) For the simplex \(\{e^Tx-1 =0, \, x_1 \ge 0, \ldots , x_n \ge 0\}\), it corresponds to that \(\mathcal {E} = \{0\}\), \(\mathcal {I} = [n]\), \(c_0(x)=e^Tx-1, \, c_j(x)=x_j \, (j\in [n]).\) The Lagrange multipliers can be expressed as

    $$\begin{aligned} \lambda _0 = x^T \nabla f(x), \quad \lambda _j = f_{x_j}-x^T\nabla f(x) \quad (j \in [n]). \end{aligned}$$
    (3.3)
  • (Hypercube) For the hypercube \([-1,1]^n\), it corresponds to that \(\mathcal {E} = \emptyset \), \(\mathcal {I} = [n]\) and each \(c_j(x)= 1-x_j^2\). We can show that

    $$\begin{aligned} \lambda _j = - \frac{1}{2}x_j f_{x_j} \quad (j\in [n]). \end{aligned}$$
    (3.4)
  • (Ball or sphere) The constraint is \(1-x^Tx=0\) or \(1-x^Tx\ge 0\). It corresponds to that \(\mathcal {E} \cup \mathcal {I} = \{1\}\) and \(c_1 = 1-x^Tx\). We have

    $$\begin{aligned} \lambda _1 = -\frac{1}{2}x^T \nabla f(x). \end{aligned}$$
    (3.5)
  • (Triangular constraints) Suppose \(\mathcal {E} \cup \mathcal {I} =\{1, \ldots , m\} \) and each

    $$\begin{aligned} c_i(x) = \tau _i x_i + q_i(x_{i+1}, \ldots , x_n) \end{aligned}$$

    for some polynomials \(q_i \in \mathbb {R}[x_{i+1}, \ldots , x_n]\) and scalars \(\tau _i \ne 0\). The matrix T(x), consisting of the first m rows of \([ \nabla c_1(x), \ldots , \nabla c_m(x) ],\) is an invertible lower triangular matrix with constant diagonal entries. Then,

    $$\begin{aligned} \lambda = T(x)^{-1} \cdot \begin{bmatrix} f_{x_1}&\quad \cdots&\quad f_{x_m} \end{bmatrix}^T. \end{aligned}$$

    Note that the inverse \(T(x)^{-1}\) is a matrix polynomial.

For more general constraints, we can also express \(\lambda \) as a polynomial function in x on the set \(\mathcal {K}_c\). This will be discussed in Sect. 4 and Sect. 5.
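
As a quick check of the special-case formulas above, the following sketch (sympy assumed available, objective assumed for illustration) solves the critical equations (1.8) for the sphere constraint and compares the multipliers with (3.5); analogous checks work for (3.3) and (3.4).

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1**2 + 2 * x2**2                   # assumed objective
c1 = 1 - x1**2 - x2**2                  # sphere constraint
grad_f = [sp.diff(f, v) for v in (x1, x2)]
grad_c = [sp.diff(c1, v) for v in (x1, x2)]

# the critical equations (1.8): grad f = lam * grad c1 and c1 = 0
eqs = [grad_f[0] - lam * grad_c[0], grad_f[1] - lam * grad_c[1], c1]
p1 = -sp.Rational(1, 2) * (x1 * grad_f[0] + x2 * grad_f[1])   # p_1(x) from (3.5)
for sol in sp.solve(eqs, [x1, x2, lam], dict=True):
    print(sol[lam] == p1.subs(sol))     # True at every critical pair
```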

For the polynomials \(p_i\) as in Assumption 3.1, denote

$$\begin{aligned} \phi \, := \Big ( \nabla f - \sum \limits _{ i \in \mathcal {E} \cup \mathcal {I} } p_i \nabla c_i, \,\, \big ( p_j c_j \big )_{ j \in \mathcal {I} } \Big ), \quad \psi \, := \big ( p_j \big )_{ j \in \mathcal {I} }. \end{aligned}$$
(3.6)

Denote by \(c_{eq} := (c_i)_{i \in \mathcal {E}}\) and \(c_{in} := (c_j)_{j \in \mathcal {I}}\) the tuples of equality and inequality constraining polynomials, respectively. When the minimum value \(f_{\min }\) of (1.1) is achieved at a critical point, (1.1) is equivalent to the problem

$$\begin{aligned} \left\{ \begin{array}{rl} f_c := \min &{} f(x) \\ s.t. &{} c_{eq}(x) = 0, \, c_{in}(x) \ge 0, \\ &{} \phi (x) = 0, \, \psi (x) \ge 0. \end{array} \right. \end{aligned}$$
(3.7)

We apply Lasserre’s relaxations to solve it. For an integer \(k>0\) (called the relaxation order), the kth order Lasserre’s relaxation for (3.7) is

$$\begin{aligned} \left\{ \begin{array}{rl} f_k^\prime := \min &{} \langle f, y \rangle \\ s.t. &{} \langle 1, y \rangle = 1, M_k(y) \succeq 0 \\ &{} L_{c_{eq}}^{(k)}(y) = 0, \, L_{c_{in}}^{(k)}(y) \succeq 0, \\ &{} L_{\phi }^{(k)}(y) = 0, \, L_{\psi }^{(k)}(y) \succeq 0, \, y \in \mathbb {R}^{ \mathbb {N}_{2k}^n }. \end{array} \right. \end{aligned}$$
(3.8)

Since \(x^0 = 1\) (the constant one polynomial), the condition \(\langle 1, y \rangle = 1\) means that \((y)_0=1\). The dual optimization problem of (3.8) is

$$\begin{aligned} \left\{ \begin{array}{rl} f_k := \, \max &{} \gamma \\ s.t. &{} f-\gamma \in \hbox {IQ}(c_{eq},c_{in})_{2k} + \hbox {IQ}(\phi ,\psi )_{2k}. \end{array} \right. \end{aligned}$$
(3.9)

We refer to Sect. 2 for the notation used in (3.8)–(3.9). They are equivalent to semidefinite programs (SDPs), so they can be solved by SDP solvers (e.g., SeDuMi [40]). For \(k=1,2,\ldots \), we get a hierarchy of Lasserre’s relaxations. In (3.8)–(3.9), if we remove the usage of \(\phi \) and \(\psi \), they are reduced to standard Lasserre’s relaxations in [17]. So, (3.8)–(3.9) are stronger relaxations.
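
To illustrate the construction, here is a minimal sketch (sympy assumed available) that forms \(\phi \) and \(\psi \) as in (3.6) for the hypercube constraints \(c_j = 1-x_j^2\), using \(p_j(x) = -\frac{1}{2}x_j f_{x_j}\) from (3.4); the objective f is assumed only for illustration. The printed polynomials are exactly the extra constraints that enter (3.7)–(3.8).

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
xs = [x1, x2]
f = x1**3 - x1 * x2 + x2**4                      # assumed objective
c = [1 - v**2 for v in xs]                       # hypercube constraints
p = [-sp.Rational(1, 2) * v * sp.diff(f, v) for v in xs]    # multiplier expressions (3.4)

grad_f = sp.Matrix([sp.diff(f, v) for v in xs])
grad_c = [sp.Matrix([sp.diff(cj, v) for v in xs]) for cj in c]
phi = list(grad_f - sum((p[j] * grad_c[j] for j in range(2)), sp.zeros(2, 1))) \
      + [p[j] * c[j] for j in range(2)]          # the tuple phi in (3.6)
psi = p                                          # the tuple psi in (3.6)
print([sp.factor(e) for e in phi])               # extra equality constraints
print(psi)                                       # extra inequality constraints
```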

By the construction of \(\phi \) as in (3.6), Assumption 3.1 implies that

$$\begin{aligned} \mathcal {K}_c = \{ u \in \mathbb {R}^n :c_{eq}(u)=0, \quad \phi (u)=0 \}. \end{aligned}$$

By Lemma 3.3 of [6], f achieves only finitely many values on \(\mathcal {K}_c\), say,

$$\begin{aligned} v_1< \cdots < v_N. \end{aligned}$$
(3.10)

A point \(u \in \mathcal {K}_c\) might not be feasible for (3.7), i.e., it is possible that \(c_{in}(u) \not \ge 0\) or \(\psi (u) \not \ge 0\). In applications, we are often interested in the optimal value \(f_c\) of (3.7). When (3.7) is infeasible, by convention, we set

$$\begin{aligned} f_c = +\,\infty . \end{aligned}$$

When the optimal value \(f_{\min }\) of (1.1) is achieved at a critical point, \(f_c = f_{\min }\). This is the case if the feasible set is compact, or if f is coercive (i.e., for each \(\ell \), the sublevel set \(\{f(x) \le \ell \}\) is compact), and the constraint qualification condition holds. As in [17], one can show that

$$\begin{aligned} f_k \le f_k^\prime \le f_c \end{aligned}$$
(3.11)

for all k. Moreover, \(\{ f_k \}\) and \(\{ f_k^\prime \}\) are both monotonically increasing. If for some order k it occurs that

$$\begin{aligned} f_k = f_k^\prime = f_c,\end{aligned}$$

then the kth order Lasserre’s relaxation is said to be tight (or exact).

3.1 Tightness of the relaxations

Let \(c_{in},\psi , \mathcal {K}_c,f_c\) be as above. We refer to Sect. 2 for the notation \(\hbox {Qmod}\,(c_{in},\psi )\). We begin with a general assumption.

Assumption 3.2

There exists \(\rho \in \hbox {Qmod}(c_{in},\psi )\) such that if \(u\in \mathcal {K}_c\) and \(f(u) < f_c\), then \(\rho (u) < 0\).

In Assumption 3.2, the hypersurface \(\rho (x) = 0\) separates feasible and infeasible critical points. Clearly, if \(u\in \mathcal {K}_c\) is a feasible point for (3.7), then \(c_{in}(u) \ge 0\) and \(\psi (u) \ge 0\), and hence \(\rho (u) \ge 0\). Assumption 3.2 generally holds. For instance, it is satisfied for the following general cases.

  1. (a)

    When there are no inequality constraints, \(c_{in}\) and \(\psi \) are empty tuples. Then, \(\hbox {Qmod}(c_{in},\psi ) = \varSigma [x]\) and Assumption 3.2 is satisfied for \(\rho =0\).

  2. (b)

    Suppose the set \(\mathcal {K}_c\) is finite, say, \(\mathcal {K}_c = \{ u_1, \ldots , u_D \}\), and

    $$\begin{aligned} f(u_1), \ldots , f(u_{t-1}) < f_c \le f(u_{t}), \ldots , f(u_D). \end{aligned}$$

    Let \(\ell _1,\ldots , \ell _D\) be real interpolating polynomials such that \(\ell _i(u_j) = 1\) for \(i=j\) and \(\ell _i(u_j) = 0\) for \( i \ne j\). For each \(i = 1, \ldots , t{-}1\), there must exist \(j_i \in \mathcal {I}\) such that \(c_{j_i}(u_i) < 0\). Then, the polynomial

    $$\begin{aligned} \rho := \sum _{i < t} \frac{-1}{c_{j_i}(u_i)} c_{j_i}(x) \ell _i(x)^2 + \sum _{i \ge t} \ell _i(x)^2 \end{aligned}$$
    (3.12)

    satisfies Assumption 3.2.

  3. (c)

    For each x with \(f(x)=v_i<f_c\), at least one of the constraints \(c_j(x)\ge 0, p_j(x) \ge 0 (j \in \mathcal {I})\) is violated. Suppose for each critical value \(v_i < f_c\), there exists \(g_i \in \{ c_j, p_j \}_{j \in \mathcal {I} }\) such that

    $$\begin{aligned} g_i < 0 \quad \hbox { on } \quad \mathcal {K}_c \cap \{ f(x) = v_i \}. \end{aligned}$$

    Let \(\varphi _1, \ldots , \varphi _N\) be real univariate polynomials such that \(\varphi _i( v_j) = 0\) for \(i \ne j\) and \(\varphi _i( v_j) = 1\) for \(i = j\). Suppose \(v_t = f_c\). Then, the polynomial

    $$\begin{aligned} \rho := \sum _{ i < t } g_i(x) \big ( \varphi _i(f(x)) \big )^2 + \sum _{ i \ge t } \big ( \varphi _i(f(x)) \big )^2 \end{aligned}$$
    (3.13)

    satisfies Assumption 3.2.
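
A small sketch of the construction (3.13); all data below (the critical values, the objective, and the polynomial \(g_1\) assumed to be negative on the critical set where \(f = v_1\)) are assumed purely for illustration.

```python
import sympy as sp

x1, x2, s = sp.symbols('x1 x2 s', real=True)

def lagrange_basis(vals):
    """Univariate polynomials (in s) equal to 1 at vals[i] and 0 at the other values."""
    return [sp.Mul(*[(s - vj) / (vi - vj) for vj in vals if vj != vi]) for vi in vals]

f = x1**2 - x2**2            # assumed objective
g1 = x2                      # assumed: g1 < 0 on the critical set where f = v_1
v = [-1, 0, 2]               # assumed critical values v_1 < f_c = v_2 < v_3, so t = 2
phi = lagrange_basis(v)

rho = g1 * phi[0].subs(s, f)**2 + phi[1].subs(s, f)**2 + phi[2].subs(s, f)**2   # (3.13)
print(sp.expand(rho))
```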

We refer to Sect. 2 for the archimedean condition and the notation \(\hbox {IQ}(h,g)\) as in (2.1). The following is about the convergence of relaxations (3.8)–(3.9).

Theorem 3.3

Suppose \(\mathcal {K}_c \ne \emptyset \) and Assumption 3.1 holds. If

  1. (i)

    \(\hbox {IQ}(c_{eq},c_{in}) + \hbox {IQ}(\phi , \psi )\) is archimedean, or

  2. (ii)

    \(\hbox {IQ}(c_{eq},c_{in}) \) is archimedean, or

  3. (iii)

    Assumption 3.2 holds,

then \(f_k = f_k^\prime = f_c\) for all k sufficiently large. Therefore, if the minimum value \(f_{\min }\) of (1.1) is achieved at a critical point, then \(f_k = f_k^\prime = f_{\min }\) for all k big enough if one of the conditions (i)–(iii) is satisfied.

Remark

In Theorem 3.3, the conclusion holds if any of conditions (i)–(iii) is satisfied. The condition (ii) is only about constraining polynomials of (1.1). It can be checked without \(\phi ,\psi \). Clearly, the condition (ii) implies the condition (i).

The proof of Theorem 3.3 is given below. The main idea is to consider the set of critical points, which can be expressed as a union of subvarieties such that the objective f is constant on each of them. We get an SOS type representation for f on each subvariety, and then combine them into a single one for f over the entire set of critical points.

Proof of Theorem 3.3

Clearly, every point in the complex variety

$$\begin{aligned} \mathcal {K}_1 := \{ x \in \mathbb {C}^n \mid c_{eq}(x) = 0, \phi (x) = 0 \} \end{aligned}$$

is a critical point. By Lemma 3.3 of [6], the objective f achieves finitely many real values on \(\mathcal {K}_c = \mathcal {K}_1 \cap \mathbb {R}^n\), say, \(v_1< \cdots < v_N\). By shifting f by a constant, we can further assume that \( f_c = 0. \) Clearly, \(f_c\) equals one of \(v_1, \ldots , v_N\), say \( v_t = f_c = 0. \)

Case I  Assume \(\hbox {IQ}(c_{eq},c_{in}) + \hbox {IQ}(\phi , \psi )\) is archimedean. Let

$$\begin{aligned} I := \hbox {Ideal}(c_{eq}, \phi ), \end{aligned}$$

the critical ideal. Note that \(\mathcal {K}_1 = \mathcal {V}_{\mathbb {C}}(I)\). The variety \(\mathcal {V}_{\mathbb {C}}(I)\) is a union of irreducible subvarieties, say, \(V_1, \ldots , V_\ell \). If \(V_i \cap \mathbb {R}^n \ne \emptyset \), then f is a real constant on \(V_i\), which equals one of \(v_1, \ldots , v_N\). This can be implied by Lemma 3.3 of [6] and Lemma 3.2 of [30]. Denote the subvarieties of \(\mathcal {V}_{\mathbb {C}}(I)\):

$$\begin{aligned} T_i := \mathcal {K}_1 \cap \{ f(x) = v_i \} \quad (i=t, \ldots ,N). \end{aligned}$$

Let \(T_{t-1}\) be the union of irreducible subvarieties \(V_i\), such that either \(V_i \cap \mathbb {R}^n = \emptyset \) or \(f \equiv v_j\) on \(V_i\) with \(v_j < v_t = f_c\). Then, it holds that

$$\begin{aligned} \mathcal {V}_{\mathbb {C}}(I) = T_{t-1} \cup T_t \cup \cdots \cup T_N. \end{aligned}$$

By the primary decomposition of I [11, 41], there exist ideals \(I_{t-1}, I_t, \ldots , I_N \subseteq \mathbb {R}[x]\) such that

$$\begin{aligned} I = I_{t-1} \,\cap \, I_t \,\cap \,\cdots \,\cap \, I_N \end{aligned}$$

and \(T_i = \mathcal {V}_{\mathbb {C}}(I_i)\) for all \(i = t-1, t, \ldots , N\). Denote the semialgebraic set

$$\begin{aligned} S := \{ x \in \mathbb {R}^n \mid \, c_{in}(x) \ge 0, \, \psi (x) \ge 0 \}. \end{aligned}$$
(3.14)

For \(i=t-1\), we have \(\mathcal {V}_{\mathbb {R}}(I_{t-1}) \cap S = \emptyset \), because \(v_1, \ldots , v_{t-1} < f_c\). By the Positivstellensatz [2, Corollary 4.4.3], there exists \(p_0 \in \hbox {Preord}(c_{in},\psi )\) (the preordering generated by \(c_{in},\psi \)) satisfying \(2 + p_0 \in I_{t-1}\). Note that \(1+p_0 >0\) on \(\mathcal {V}_{\mathbb {R}}(I_{t-1}) \cap S\). The set \(I_{t-1} + \hbox {Qmod}(c_{in}, \psi )\) is archimedean, because \(I \subseteq I_{t-1}\) and

$$\begin{aligned} \hbox {IQ}(c_{eq},c_{in}) + \hbox {IQ}(\phi ,\psi ) \, \subseteq \, I_{t-1} + \hbox {Qmod}(c_{in}, \psi ). \end{aligned}$$

By Theorem 2.1, we have

$$\begin{aligned} p_1 := 1+ p_0 \in I_{t-1} + \hbox {Qmod}(c_{in}, \psi ). \end{aligned}$$

Then, \(1+p_1 \in I_{t-1}\). There exists \(p_2 \in \hbox {Qmod}(c_{in}, \psi )\) such that

$$\begin{aligned} -1 \equiv p_1 \equiv p_2 \, \mod \, I_{t-1}. \end{aligned}$$

Since \(f = (f/4+1)^2 -1 \cdot (f/4-1)^2\), we have

$$\begin{aligned} f&\equiv \sigma _{t-1} := \left\{ (f/4+1)^2 + p_2 (f/4-1)^2 \right\} \quad \mod \quad I_{t-1}. \end{aligned}$$

So, when k is big enough, we have \(\sigma _{t-1} \in \hbox {Qmod}(c_{in}, \psi )_{2k}\).

For \(i=t\), \(v_t=0\) and f(x) vanishes on \(\mathcal {V}_{\mathbb {C}}(I_t)\). By Hilbert’s Strong Nullstellensatz [4], there exists an integer \(m_t > 0\) such that \(f^{m_t} \in I_t\). Define the polynomial

$$\begin{aligned} s_t(\epsilon ) := \sqrt{\epsilon } \sum \limits _{j=0}^{m_t-1} \left( {\begin{array}{c}1/2\\ j\end{array}}\right) \epsilon ^{-j} f^j. \end{aligned}$$

Then, we have that

$$\begin{aligned} D_1 := \epsilon (1 + \epsilon ^{-1} f ) - \big ( s_t(\epsilon ) \big )^2 \equiv 0 \quad \mod \quad I_t. \end{aligned}$$

This is because in the subtraction of \(D_1\), after expanding \(\big ( s_t(\epsilon ) \big )^2\), all the terms \(f^j\) with \(j < m_t\) are cancelled and \(f^j \in I_t\) for \(j \ge m_t\). So, \(D_1 \in I_t\). Let \(\sigma _t(\epsilon ) := s_t(\epsilon )^2\), then \(f + \epsilon - \sigma _t(\epsilon ) = D_1\) and

$$\begin{aligned} f + \epsilon - \sigma _t(\epsilon ) =\sum \limits _{j=0}^{m_t-2} b_j(\epsilon ) f^{m_t+j} \end{aligned}$$
(3.15)

for some real scalars \(b_j(\epsilon )\), depending on \(\epsilon \).

For each \(i = t+1, \ldots , N\), \(v_i>0\) and \(f(x)/v_i - 1\) vanishes on \(\mathcal {V}_{\mathbb {C}}(I_i)\). By Hilbert’s Strong Nullstellensatz [4], there exists \(0< m_i \in \mathbb {N}\) such that \((f/v_i - 1)^{m_i} \in I_i.\) Let

$$\begin{aligned} s_i := \sqrt{v_i} \sum \limits _{j=0}^{m_i-1} \left( {\begin{array}{c}1/2\\ j\end{array}}\right) (f/v_i - 1)^j. \end{aligned}$$

Like for the case \(i=t\), we can similarly show that \(f - s_i^2 \, \in \, I_i\). Let \(\sigma _i = s_i^2\), then \(f - \sigma _i \, \in \, I_i\).

Note that \(\mathcal {V}_{\mathbb {C}}(I_i) \cap \mathcal {V}_{\mathbb {C}}(I_j) = \emptyset \) for all \(i \ne j\). By Lemma 3.3 of [30], there exist polynomials \(a_{t-1}, \ldots , a_N \in \mathbb {R}[x]\) such that

$$\begin{aligned} a_{t-1}^2+\cdots +a_N^2-1 \, \in \, I, \quad a_i \in \bigcap _{ i \ne j \in \{ t-1, \ldots ,N \} } I_j. \end{aligned}$$

For \(\epsilon >0\), denote the polynomial

$$\begin{aligned} \sigma _\epsilon := \sigma _t(\epsilon ) a_t^2 + \sum _{ t \ne j \in \{ t-1, \ldots ,N \} } (\sigma _j+\epsilon ) a_j^2, \end{aligned}$$

then

$$\begin{aligned} \begin{array}{rcl} f + \epsilon - \sigma _\epsilon &{} = &{} (f+\epsilon )(1-a_{t-1}^2-\cdots -a_N^2) \\ &{} &{}+ \sum \limits _{ t \ne i \in \{ t-1, \ldots ,N \} } (f-\sigma _i) a_i^2 + (f+\epsilon -\sigma _t(\epsilon ) ) a_t^2. \end{array} \end{aligned}$$

For each \(i \ne t\), \(f-\sigma _i \in I_i\), so

$$\begin{aligned} (f-\sigma _i) a_i^2 \, \in \, \bigcap _{j=t-1}^N I_j = I. \end{aligned}$$

Hence, there exists \(k_1 >0\) such that

$$\begin{aligned} (f-\sigma _i) a_i^2 \, \in \, I_{2k_1} \, \, (t \ne i \in \{t-1, \ldots , N \}). \end{aligned}$$

Since \(f+\epsilon -\sigma _t(\epsilon ) \in I_t\), we also have

$$\begin{aligned} (f+\epsilon -\sigma _t(\epsilon ) ) a_t^2 \in \bigcap _{j=t-1}^N I_j = I. \end{aligned}$$

Moreover, by the Eq. (3.15),

$$\begin{aligned} (f+\epsilon -\sigma _t(\epsilon ) ) a_t^2 = \sum _{j=0}^{m_t-2} b_j(\epsilon ) f^{m_t+j} a_t^2. \end{aligned}$$

Each \(f^{m_t+j} a_t^2 \in I\), since \(f^{m_t+j} \in I_t\). So, there exists \(k_2 >0\) such that for all \(\epsilon >0\)

$$\begin{aligned} (f+\epsilon -\sigma _t(\epsilon ) ) a_t^2 \in I_{2k_2}. \end{aligned}$$

Since \(1-a_{t-1}^2-\cdots -a_N^2 \in I\), there also exists \(k_3 >0\) such that for all \(\epsilon >0\)

$$\begin{aligned} (f+\epsilon )(1-a_{t-1}^2-\cdots -a_N^2) \in I_{2k_3}. \end{aligned}$$

Hence, if \(k^* \ge \max \{k_1, k_2, k_3\}\), then we have

$$\begin{aligned} f(x) + \epsilon - \sigma _\epsilon \in I_{2k^*} \end{aligned}$$

for all \(\epsilon >0\). By the construction, the degrees of all \(\sigma _i\) and \(a_i\) are independent of \(\epsilon \). So, \(\sigma _\epsilon \, \in \, \hbox {Qmod}(c_{in}, \psi )_{2k^*}\) for all \(\epsilon >0\) if \(k^*\) is big enough. Note that

$$\begin{aligned} I_{2k^*} + \hbox {Qmod}(c_{in}, \psi )_{2k^*} = \hbox {IQ}(c_{eq}, c_{in})_{2k^*} + \hbox {IQ}(\phi , \psi )_{2k^*}. \end{aligned}$$

This implies that \(f_{k^*} \ge f_c -\epsilon \) for all \(\epsilon >0.\) On the other hand, we always have \(f_{k^*} \le f_c\). So, \(f_{k^*} = f_c\). Moreover, since \(\{ f_k \}\) is monotonically increasing, we must have \(f_{k} = f_c\) for all \(k\ge k^*\).

Case II  Assume \(\hbox {IQ}(c_{eq},c_{in})\) is archimedean. Because

$$\begin{aligned} \hbox {IQ}(c_{eq},c_{in}) \subseteq \hbox {IQ}(c_{eq},c_{in}) + \hbox {IQ}(\phi ,\psi ), \end{aligned}$$

the set \(\hbox {IQ}(c_{eq},c_{in})+ \hbox {IQ}(\phi ,\psi )\) is also archimedean. Therefore, the conclusion is also true by applying the result for Case I.

Case III  Suppose the Assumption 3.2 holds. Let \(\varphi _1, \ldots , \varphi _N\) be real univariate polynomials such that \(\varphi _i( v_j) = 0\) for \(i \ne j\) and \(\varphi _i( v_j) = 1\) for \(i = j\). Let

$$\begin{aligned} s := s_t + \cdots + s_N \quad \hbox {where\,each} \quad s_i := ( v_i - f_c) \big ( \varphi _i ( f ) \big )^2. \end{aligned}$$

Then, \(s \in \varSigma [x]_{2k_4}\) for some integer \(k_4 > 0\). Let

$$\begin{aligned} \hat{f}:=f - f_c - s. \end{aligned}$$

We show that there exist an integer \(\ell > 0\) and \(q \in \hbox {Qmod}(c_{in}, \psi )\) such that

$$\begin{aligned} \hat{f}^{2\ell } + q \in \hbox {Ideal}(c_{eq},\phi ). \end{aligned}$$

This is because, by Assumption 3.2, \(\hat{f}(x) \equiv 0\) on the set

$$\begin{aligned} \mathcal {K}_2 := \{ x \in \mathbb {R}^n: \, c_{eq}(x) = 0, \, \phi (x) = 0, \, \rho (x) \ge 0 \}. \end{aligned}$$

It has only a single inequality. By the Positivstellensatz [2, Corollary 4.4.3], there exist \(0< \ell \in \mathbb {N}\) and \(q = b_0 + \rho b_1 \) (\(b_0, b_1 \in \varSigma [x]\)) such that \(\hat{f}^{2\ell } + q \in \hbox {Ideal}(c_{eq},\phi ).\) By Assumption 3.2, \(\rho \in \hbox {Qmod}(c_{in}, \psi )\), so we have \(q \in \hbox {Qmod}(c_{in}, \psi )\).

For all \(\epsilon >0\) and \(\tau >0\), we have \(\hat{f} + \epsilon = \phi _\epsilon + \theta _\epsilon \) where

$$\begin{aligned} \phi _\epsilon = -\tau \epsilon ^{1-2\ell } \big (\hat{f}^{2\ell } + q \big ), \end{aligned}$$
$$\begin{aligned} \theta _\epsilon = \epsilon \Big (1 + \hat{f}/\epsilon + \tau ( \hat{f}/\epsilon )^{2\ell } \Big ) + \tau \epsilon ^{1-2\ell } q. \end{aligned}$$

By Lemma 2.1 of [31], when \(\tau \ge \frac{1}{2\ell }\), there exists \(k_5\) such that, for all \(\epsilon >0\),

$$\begin{aligned} \phi _\epsilon \in \hbox {Ideal}(c_{eq},\phi )_{2k_5}, \quad \theta _\epsilon \in \hbox {Qmod}(c_{in},\psi )_{2k_5}. \end{aligned}$$

Hence, we can get

$$\begin{aligned} f - (f_c -\epsilon ) = \phi _\epsilon + \sigma _\epsilon , \end{aligned}$$

where \(\sigma _\epsilon = \theta _\epsilon + s \in \hbox {Qmod}(c_{in},\psi )_{2k_5}\) for all \(\epsilon >0\). Note that

$$\begin{aligned} \hbox {IQ}(c_{eq},c_{in})_{2k_5} + \hbox {IQ}(\phi ,\psi )_{2k_5} = \hbox {Ideal}(c_{eq},\phi )_{2k_5} + \hbox {Qmod}(c_{in},\psi )_{2k_5}. \end{aligned}$$

For all \(\epsilon >0\), \(\gamma = f_c-\epsilon \) is feasible in (3.9) for the order \(k_5\), so \(f_{k_5} \ge f_c\). Because of (3.11) and the monotonicity of \(\{ f_k \}\), we have \(f_{k} = f_{k}^\prime = f_c\) for all \(k \ge k_5\). \(\square \)

3.2 Detecting tightness and extracting minimizers

The optimal value of (3.7) is \(f_c\), and the optimal value of (1.1) is \(f_{\min }\). If \(f_{\min }\) is achievable at a critical point, then \(f_c = f_{\min }\). In Theorem 3.3, we have shown that \(f_k = f_c\) for all k big enough, where \(f_k\) is the optimal value of (3.9). The value \(f_c\) or \(f_{\min }\) is often not known. How do we detect the tightness \( f_k = f_c \) in computation? The flat extension or flat truncation condition [5, 14, 32] can be used for checking tightness. Suppose \(y^*\) is a minimizer of (3.8) for the order k. Let

$$\begin{aligned} d:= \lceil \deg (c_{eq}, c_{in}, \phi , \psi )/2 \rceil . \end{aligned}$$
(3.16)

If there exists an integer \(t \in [d, k]\) such that

$$\begin{aligned} \text{ rank }\, M_t (y^*) = \text{ rank }\, M_{t-d}(y^*) \end{aligned}$$
(3.17)

then \(f_k = f_{c}\) and we can get \(r := \text{ rank }\, M_t (y^*)\) minimizers for (3.7) [5, 14, 32]. The method in [14] can be used to extract minimizers. It was implemented in the software GloptiPoly 3 [13]. Generally, (3.17) can serve as a sufficient and necessary condition for detecting tightness. The case that (3.7) is infeasible (i.e., no critical points satisfy the constraints \(c_{in} \ge 0, \psi \ge 0\)) can also be detected by solving the relaxations (3.8)–(3.9).
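
A minimal numerical sketch of the rank test (3.17), assuming the moment matrix \(M_k(y^*)\) is available with rows and columns in graded order, so that each \(M_t(y^*)\) is a leading principal submatrix; the tolerance is an assumed numerical parameter.

```python
import numpy as np
from math import comb

def n_monomials(n, d):
    """Number of monomials in n variables of degree <= d."""
    return comb(n + d, d)

def flat_truncation(Mk, n, k, d, tol=1e-6):
    """Return (t, rank) for the first t in [d, k] satisfying (3.17), or None."""
    rank = lambda M: np.linalg.matrix_rank(M, tol=tol)
    for t in range(d, k + 1):
        s1, s0 = n_monomials(n, t), n_monomials(n, t - d)
        r1 = rank(Mk[:s1, :s1])
        if r1 == rank(Mk[:s0, :s0]):
            return t, r1        # r1 equals the number of extractable minimizers
    return None
```

The extraction of the minimizers themselves can then be done by the method of [14], e.g., as implemented in GloptiPoly 3.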

Theorem 3.4

Under Assumption 3.1, the relaxations (3.8)–(3.9) have the following properties:

  1. (i)

    If (3.8) is infeasible for some order k, then no critical points satisfy the constraints \(c_{in} \ge 0, \psi \ge 0\), i.e., (3.7) is infeasible.

  2. (ii)

    Suppose Assumption 3.2 holds. If (3.7) is infeasible, then the relaxation (3.8) must be infeasible when the order k is big enough.

In the following, assume (3.7) is feasible (i.e., \(f_c < + \infty \)). Then, for all k big enough, (3.8) has a minimizer \(y^*\). Moreover,

  1. (iii)

    If (3.17) is satisfied for some \(t \in [d,k]\), then \(f_k = f_c\).

  2. (iv)

    If Assumption 3.2 holds and (3.7) has finitely many minimizers, then every minimizer \(y^*\) of (3.8) must satisfy (3.17) for some \(t \in [d,k]\), when k is big enough.

Proof

By Assumption 3.1, u is a critical point if and only if \(c_{eq}(u) = 0, \phi (u)=0\).

  1. (i)

    For every feasible point u of (3.7), the tms \([u]_{2k}\) (see Sect. 2 for the notation) is feasible for (3.8), for all k. Therefore, if (3.8) is infeasible for some k, then (3.7) must be infeasible.

  2. (ii)

    By Assumption 3.2, when (3.7) is infeasible, the set

    $$\begin{aligned} \{x\in \mathbb {R}^n:c_{eq}(x)=0, \phi (x)=0, \rho (x) \ge 0 \} \end{aligned}$$

    is empty. It has a single inequality. By the Positivstellensatz [2, Corollary 4.4.3], it holds that \( -1\in \hbox {Ideal}(c_{eq}, \phi ) + \hbox {Qmod}(\rho ). \) By Assumption 3.2,

    $$\begin{aligned} \hbox {Ideal}(c_{eq}, \phi ) + \hbox {Qmod}(\rho ) \subseteq \hbox {IQ}(c_{eq}, c_{in}) + \hbox {IQ}(\phi , \psi ). \end{aligned}$$

    Thus, for all k big enough, (3.9) is unbounded from above. Hence, (3.8) must be infeasible, by weak duality. When (3.7) is feasible, f achieves finitely many values on \(\mathcal {K}_c\), so (3.7) must achieve its optimal value \(f_c\). By Theorem 3.3, we know that \(f_k = f_k^\prime = f_c\) for all k big enough. For each minimizer \(u^*\) of (3.7), the tms \([u^*]_{2k}\) is a minimizer of (3.8).

  3. (iii)

    If (3.17) holds, we can get \(r := \text{ rank }\, M_t (y^*)\) minimizers for (3.7) [5, 14], say, \(u_1, \ldots , u_r\), such that \(f_k = f(u_i)\) for each i. Clearly, \(f_k = f(u_i) \ge f_c\). On the other hand, we always have \(f_k \le f_c\). So, \(f_k = f_c\).

  4. (iv)

    By Assumption 3.2, (3.7) is equivalent to the problem

    $$\begin{aligned} \left\{ \begin{array}{rl} \min &{} f(x) \\ s.t. &{} c_{eq}(x) = 0, \, \phi (x) = 0, \, \rho (x) \ge 0. \end{array} \right. \end{aligned}$$
    (3.18)

    The optimal value of (3.18) is also \(f_c\). Its kth order Lasserre’s relaxation is

    $$\begin{aligned} \left\{ \begin{array}{rl} \gamma _k^\prime := \min &{} \langle f, y \rangle \\ s.t. &{} \langle 1, y \rangle = 1, M_k(y) \succeq 0, \\ &{} L_{c_{eq}}^{(k)}(y) = 0, \, L_{\phi }^{(k)}(y) = 0, \, L_{\rho }^{(k)}(y) \succeq 0. \end{array} \right. \end{aligned}$$
    (3.19)

    Its dual optimization problem is

    $$\begin{aligned} \left\{ \begin{array}{rl} \gamma _k := \, \max &{} \gamma \\ s.t. &{} f-\gamma \in \hbox {Ideal}(c_{eq},\phi )_{2k} + \hbox {Qmod}(\rho )_{2k}. \end{array} \right. \end{aligned}$$
    (3.20)

    By repeating the same proof as for Theorem 3.3(iii), we can show that

    $$\begin{aligned} \gamma _k = \gamma _k^\prime = f_c \end{aligned}$$

    for all k big enough. Because \(\rho \in \hbox {Qmod}(c_{in}, \psi ),\) each y feasible for (3.8) is also feasible for (3.19). So, when k is big, each \(y^*\) is also a minimizer of (3.19). The problem (3.18) also has finitely many minimizers. By Theorem 2.6 of [32], the condition (3.17) must be satisfied for some \(t\in [d,k]\), when k is big enough. \(\square \)

If (3.7) has infinitely many minimizers, then the condition (3.17) is typically not satisfied. We refer to [25, §6.6].

4 Polyhedral constraints

In this section, we assume the feasible set of (1.1) is the polyhedron

$$\begin{aligned} P :=\{ x \in \mathbb {R}^n \, \mid \, Ax-b \ge 0\}, \end{aligned}$$

where \(A = \begin{bmatrix} a_1&\cdots&a_m \end{bmatrix}^T \in \mathbb {R}^{m \times n}\), \(b = \begin{bmatrix} b_1&\cdots&b_m \end{bmatrix}^T \in \mathbb {R}^m\). This corresponds to that \(\mathcal {E}=\emptyset \), \(\mathcal {I}=[m]\), and each \(c_i(x) = a_i^Tx - b_i\). Denote

$$\begin{aligned} D(x):= \text{ diag }( c_1(x), \ldots , c_m(x)), \quad C(x):=\begin{bmatrix} A^T \\ D(x) \end{bmatrix}. \end{aligned}$$
(4.1)

The Lagrange multiplier vector \(\lambda := \begin{bmatrix} \lambda _1&\cdots&\lambda _m \end{bmatrix}^T\) satisfies

$$\begin{aligned} \begin{bmatrix} A^T \\ D(x) \end{bmatrix} \lambda = \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix}. \end{aligned}$$
(4.2)

If \(\text{ rank }\, A= m\), we can express \(\lambda \) as

$$\begin{aligned} \lambda = (AA^T)^{-1} A \nabla f(x). \end{aligned}$$
(4.3)

If \(\text{ rank }\, A < m\), how can we express \(\lambda \) in terms of x? In computation, we often prefer a polynomial expression. If there exists \(L(x) \in \mathbb {R}[x]^{m \times (n+m)}\) such that

$$\begin{aligned} L(x) C(x) = I_m, \end{aligned}$$
(4.4)

then we can get

$$\begin{aligned} \lambda \, = \, L(x) \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix} \,=\, L_1(x) \nabla f(x), \end{aligned}$$

where \(L_1(x)\) consists of the first n columns of L(x). In this section, we characterize when such L(x) exists and give a degree bound for it.

The linear function \(Ax-b\) is said to be nonsingular if \(\text{ rank }\, C(u) = m\) for all \(u \in \mathbb {C}^n\) (also see Definition 5.1). This is equivalent to that for every u, if \(J(u) = \{ i_1, \ldots , i_k \}\) (see (1.2) for the notation), then \( a_{i_1}, \ldots , a_{i_k} \) are linearly independent.

Proposition 4.1

The linear function \(Ax-b\) is nonsingular if and only if there exists a matrix polynomial L(x) satisfying (4.4). Moreover, when \(Ax-b\) is nonsingular, we can choose L(x) in (4.4) with \(\deg (L) \le m-\text{ rank }\,A\).

Proof

Clearly, if (4.4) is satisfied by some L(x), then \(\text{ rank }\,C(u) \ge m\) for all u. This implies that \(Ax-b\) is nonsingular.

Next, assume that \(Ax-b\) is nonsingular. We show that (4.4) is satisfied by some \(L(x) \in \mathbb {R}[x]^{m\times (n+m)}\) with degree \(\le m-\text{ rank }\,A\). Let \(r = \text{ rank }\,A\). Up to a linear coordinate transformation, we can reduce x to an r-dimensional variable. Without loss of generality, we can assume that \(\text{ rank }\, A = n\) and \(m \ge n\).

For a subset \(I := \{i_1, \ldots , i_{m-n}\}\) of [m], denote

$$\begin{aligned} c_I(x) := \prod _{ i \in I } c_i(x), \quad E_I(x):= c_I(x) \cdot \text{ diag }( c_{i_1}(x)^{-1}, \ldots , c_{i_{m-n}}(x)^{-1} ), \end{aligned}$$
$$\begin{aligned} D_I(x):= \text{ diag }( c_{i_1}(x), \ldots , c_{i_{m-n}}(x) ), \quad A_I = \begin{bmatrix} a_{i_1}&\cdots&a_{i_{m-n}} \end{bmatrix}^T. \end{aligned}$$

For the case that \(I=\emptyset \) (the empty set), we set \(c_{\emptyset }(x) = 1\). Let

$$\begin{aligned} V = \{ I \subseteq [m]: \,\, |I|=m-n, \, \text{ rank }\, A_{[m]\backslash I} = n \}. \end{aligned}$$

Step I   For each \(I \in V\), we construct a matrix polynomial \(L_I(x)\) such that

$$\begin{aligned} L_I(x) C(x) = c_I(x) I_m. \end{aligned}$$
(4.5)

The matrix \(L_I:= L_I(x)\) satisfying (4.5) can be given by the following \(2\times 3\) block matrix, whose block rows are indexed by \(I\) and \([m]\backslash I\) and whose block columns are indexed by \([n]\), \(n+I\) and \(n+[m]\backslash I\) (\(L_I(\mathcal {J},\mathcal {K})\) denotes the submatrix whose row indices are from \(\mathcal {J}\) and whose column indices are from \(\mathcal {K}\)):

$$\begin{aligned} L_I(x) \, := \, \begin{bmatrix} 0 &{}\quad E_I(x) &{}\quad 0 \\ c_I(x) \big ( A_{[m]\backslash I} \big )^{-T} &{}\quad -\big ( A_{[m]\backslash I} \big )^{-T} \big ( A_{I} \big )^T E_I(x) &{}\quad 0 \end{bmatrix}. \end{aligned}$$
(4.6)

Equivalently, the blocks of \(L_I\) are:

$$\begin{aligned} L_I\big (I, [n]\big )= 0, \quad L_I\big (I, n + [m]\backslash I \big ) = 0, \quad L_I\big ([m]\backslash I, n + [m]\backslash I \big ) = 0, \end{aligned}$$
$$\begin{aligned} L_I \big (I, n+I\big ) = E_I(x), \quad L_I \big ([m]\backslash I, [n]\big ) = c_I(x) \big ( A_{[m]\backslash I} \big )^{-T}, \end{aligned}$$
$$\begin{aligned} L_I \big ( [m]\backslash I, n+I \big ) = -\big ( A_{[m]\backslash I} \big )^{-T} \big ( A_{I} \big )^T E_I(x). \end{aligned}$$

For each \(I \in V\), \( A_{[m]\backslash I}\) is invertible. The superscript \(^{-T}\) denotes the inverse of the transpose. Let \(G:=L_I(x) C(x)\), then one can verify that

$$\begin{aligned} G(I,I) = E_I(x) D_I(x) = c_I(x) I_{m-n}, \quad G(I,[m]\backslash I) = 0, \end{aligned}$$
$$\begin{aligned} G([m]\backslash I, [m]\backslash I) = \begin{bmatrix} c_I(x) \big ( A_{[m]\backslash I} \big )^{-T}&-A_{[m]\backslash I}^{-T} A_{I}^{T} E_I(x) \end{bmatrix} \begin{bmatrix} \big ( A_{[m]\backslash I} \big )^T \\ 0 \end{bmatrix} = c_I(x) I_n. \end{aligned}$$
$$\begin{aligned} G([m]\backslash I, I) = \begin{bmatrix} c_I(x) \big ( A_{[m]\backslash I} \big )^{-T}&-\big ( A_{[m]\backslash I} \big )^{-T} \big ( A_{I} \big )^T E_I(x) \end{bmatrix} \begin{bmatrix} A_{I}^T \\ D_I(x) \end{bmatrix} = 0. \end{aligned}$$

This shows that the above \(L_I(x)\) satisfies (4.5).

Step II   We show that there exist real scalars \(\nu _I\) satisfying

$$\begin{aligned} \sum _{ I \in V} \nu _I c_I(x) = 1. \end{aligned}$$
(4.7)

This can be shown by induction on m.

  • When \(m=n\), \(V = \{\emptyset \}\) and \(c_{\emptyset }(x) = 1\), so (4.7) is clearly true (take \(\nu _{\emptyset }=1\)).

  • When \(m > n\), let

    $$\begin{aligned} N := \{ i \in [m] \, \mid \, \text{ rank }\, A_{[m]\backslash \{i\} } = n \}. \end{aligned}$$
    (4.8)

For each \(i \in N\), let \(V_i\) be the set of all \(I' \subseteq [m]\backslash \{i\}\) such that \(|I'|=m-n-1\) and \(\text{ rank }\, A_{[m]\backslash (I'\cup \{i\}) } = n\). For each \(i \in N\), by the assumption, the linear function \(A_{[m] \backslash \{i\} } x - b_{[m] \backslash \{i\}}\) is nonsingular. By induction, there exist real scalars \(\nu _{I'}^{(i)}\) satisfying

$$\begin{aligned} \sum _{ I' \in V_i } \nu _{I'}^{(i)} c_{I'}(x) = 1. \end{aligned}$$
(4.9)

Since \(\text{ rank }\, A = n\), we can assume, without loss of generality, that \(\{a_1, \ldots , a_n\}\) is linearly independent. So, there exist scalars \(\alpha _1, \ldots , \alpha _n\) such that

$$\begin{aligned} a_m = \alpha _1 a_1 + \cdots + \alpha _n a_n. \end{aligned}$$

If all \(\alpha _i =0\), then \(a_m = 0\), and hence A can be replaced by its first \(m-1\) rows. So, (4.7) is true by the induction. In the following, suppose at least one \(\alpha _i \ne 0\) and write

$$\begin{aligned} \{ i: \, \alpha _i \ne 0 \} = \{ i_1, \ldots , i_k \}. \end{aligned}$$

Then, \(a_{i_1}, \ldots , a_{i_k}, a_{m}\) are linearly dependent. For convenience, set \(i_{k+1} := m\). Since \(Ax-b\) is nonsingular, the linear system

$$\begin{aligned} c_{i_1}(x) = \cdots = c_{i_k}(x)= c_{i_{k+1}}(x) = 0 \end{aligned}$$

has no solutions. Hence, there exist real scalars \(\mu _1, \ldots , \mu _{k+1}\) such that

$$\begin{aligned} \mu _{1} c_{i_1}(x) + \cdots + \mu _{k} c_{i_k}(x) + \mu _{k+1} c_{i_{k+1}}(x) = 1. \end{aligned}$$

This follows from the row echelon form of an inconsistent linear system. Note that \(i_1, \ldots , i_{k+1} \in N\). For each \(j = 1,\ldots , k+1\), by (4.9),

$$\begin{aligned} \sum _{ I' \in V_{i_j} } \nu _{I'}^{(i_j)} c_{I'}(x) = 1. \end{aligned}$$

Then, we can get

$$\begin{aligned} 1= & {} \sum _{j=1}^{k+1} \mu _{j} c_{i_j}(x) = \sum _{j=1}^{k+1} \mu _{j} \sum _{ I' \in V_{i_j}} \nu _{I'}^{(i_j)} c_{i_j}(x) c_{I'}(x)\\= & {} \sum _{ I = I' \cup \{i_j\}, I' \in V_{i_j}, 1 \le j \le k+1 } \nu _{I'}^{(i_j)} \mu _{j} c_{I}(x). \end{aligned}$$

Since each \(I' \cup \{i_j\} \in V\), (4.7) must be satisfied by some scalars \(\nu _I\).

Step III   For \(L_I(x)\) as in (4.5), we construct L(x) as

$$\begin{aligned} L(x) := \sum _{ I \in V } \nu _I L_I(x). \end{aligned}$$
(4.10)

Clearly, L(x) satisfies (4.4) because

$$\begin{aligned} L(x) C(x) = \sum _{ I \in V } \nu _I L_I(x) C(x) = \sum _{ I \in V } \nu _I c_I(x) I_m = I_m. \end{aligned}$$

Each \(L_I(x)\) has degree \(\le m-n\), so L(x) has degree \(\le m-n\). \(\square \)

Proposition 4.1 characterizes when there exists L(x) satisfying (4.4). When it does, a degree bound for L(x) is \(m-\text{ rank }\,A\). Sometimes, its degree can be smaller than that, as shown in Example 4.3. For given Ab, the matrix polynomial L(x) satisfying (4.4) can be determined by linear equations, which are obtained by matching coefficients on both sides. In the following, we give some examples of \(L(x)C(x)=I_m\) for polyhedral sets.
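
The following is a minimal sketch of this coefficient-matching computation for an assumed tiny instance (the interval constraints \(x_1\ge 0\), \(1-x_1\ge 0\), so \(m=2\), \(n=1\)); the ansatz uses the degree bound \(m - \text{rank}\,A = 1\) from Proposition 4.1, and sympy is assumed available.

```python
import sympy as sp

x1 = sp.symbols('x1')
c = [x1, 1 - x1]                                 # assumed constraints
A = sp.Matrix([[1], [-1]])                       # rows a_i^T
C = sp.Matrix.vstack(A.T, sp.diag(*c))           # C(x) as in (4.1), size (n+m) x m = 3 x 2

basis = [sp.Integer(1), x1]                      # monomials up to the degree bound m - rank(A) = 1
unknowns, rows = [], []
for i in range(2):                               # ansatz for L(x) with unknown coefficients
    row = []
    for j in range(3):
        cf = sp.symbols(f'l_{i}_{j}_0:{len(basis)}')
        unknowns += list(cf)
        row.append(sum(c0 * b for c0, b in zip(cf, basis)))
    rows.append(row)
L = sp.Matrix(rows)

eqs = []                                         # match coefficients of L(x)C(x) - I_m = 0
for entry in sp.expand(L * C - sp.eye(2)):
    eqs += sp.Poly(entry, x1).coeffs()
sol = sp.solve(eqs, unknowns, dict=True)[0]      # an underdetermined linear system
L0 = L.subs(sol).subs({u: 0 for u in unknowns})  # fix the remaining free coefficients to zero
print(L0, sp.simplify(L0 * C))                   # a valid L(x) together with L(x)C(x) = I_2
```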

Example 4.2

Consider the simplicial set

$$\begin{aligned} x_1 \ge 0, \, \ldots , \, x_n \ge 0, \, 1-e^Tx \ge 0. \end{aligned}$$

The equation \(L(x)C(x) = I_{n+1}\) is satisfied by

$$\begin{aligned} L(x) = \begin{bmatrix} 1 - x_1&\quad -\,x_2&\quad \cdots&\quad -\,x_n&\quad 1&\quad \cdots&\quad 1 \\ -\,x_1&\quad 1 -\, x_2&\quad \cdots&\quad -\,x_n&\quad 1&\quad \cdots&\quad 1 \\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots&\quad \vdots&\quad \vdots&\quad \vdots \\ -\,x_1&\quad -\,x_2&\quad \cdots&\quad 1 -x_n&\quad 1&\quad \cdots&\quad 1 \\ -\,x_1&\quad -\,x_2&\quad \cdots&\quad -\,x_n&\quad 1&\quad \cdots&\quad 1 \end{bmatrix}. \end{aligned}$$

Example 4.3

Consider the box constraint

$$\begin{aligned} x_1 \ge 0, \, \ldots , \, x_n \ge 0, \quad 1-x_1 \ge 0, \, \ldots , \, 1-x_n \ge 0. \end{aligned}$$

The equation \(L(x)C(x) = I_{2n}\) is satisfied by

$$\begin{aligned} L(x) = \begin{bmatrix} I_n- \text{ diag }(x)&\quad I_n&\quad I_n \\ -\, \text{ diag }(x)&\quad I_n&\quad I_n \\ \end{bmatrix}. \end{aligned}$$
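
A quick symbolic verification of this L(x) for \(n = 2\) (a sketch; sympy assumed available):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
I2, X = sp.eye(2), sp.diag(x1, x2)

A = sp.Matrix.vstack(I2, -I2)                    # rows a_i^T for c = (x1, x2, 1-x1, 1-x2)
D = sp.diag(x1, x2, 1 - x1, 1 - x2)
C = sp.Matrix.vstack(A.T, D)                     # C(x) as in (4.1)
L = sp.Matrix.vstack(sp.Matrix.hstack(I2 - X, I2, I2),
                     sp.Matrix.hstack(-X, I2, I2))
print(sp.simplify(L * C) == sp.eye(4))           # True: L(x)C(x) = I_{2n}
```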

Example 4.4

Consider the polyhedral set

$$\begin{aligned} 1-x_4 \ge 0, \quad x_4-x_3 \ge 0, \quad x_3-x_2 \ge 0, \quad x_2-x_1 \ge 0, \quad x_1+1 \ge 0. \end{aligned}$$

The equation \(L(x)C(x) = I_{5}\) is satisfied by

$$\begin{aligned} L(x) = \frac{1}{2} \left[ \begin{array}{ccccccccc} -\,x_{1} - 1 &{}\quad -\,x_{2} - 1 &{}\quad -\,x_{3} - 1 &{}\quad -\, x_{4} - 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ -\,x_{1} - 1 &{}\quad -\,x_{2} - 1 &{}\quad -\, x_{3} - 1 &{}\quad 1 - x_{4} &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ -\, x_{1}- 1 &{}\quad -\,x_{2} - 1 &{}\quad 1 - x_{3} &{}\quad 1 - x_{4} &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ -\,x_{1} - 1 &{}\quad 1 - x_{2} &{}\quad 1 - x_{3} &{}\quad 1 - x_{4} &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1\\ 1 - x_{1} &{}\quad 1 - x_{2} &{}\quad 1 - x_{3} &{}\quad 1 - x_{4} &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 &{}\quad 1 \end{array}\right] . \end{aligned}$$

Example 4.5

Consider the polyhedral set

$$\begin{aligned} 1+x_1 \ge 0, \, 1-x_1 \ge 0, \quad 2-x_1-x_2 \ge 0, \quad 2-x_1 +x_2 \ge 0. \end{aligned}$$

The matrix L(x) satisfying \(L(x)C(x) = I_4\) is

$$\begin{aligned} \frac{1}{6} \left[ \begin{array}{cccccc} {x_{1}}^2 - 3\, x_{1} + 2 &{}\quad x_{1}\,x_{2} - x_{2} &{}\quad 4 - x_{1} &{}\quad 2 - x_{1} &{}\quad 1 - x_{1} &{}\quad 1 - x_{1}\\ 3\,{x_{1}}^2 - 3\, x_{1} - 6 &{}\quad 3\, x_{2} + 3\, x_{1}\, x_{2} &{}\quad 6 - 3\,x_{1} &{}\quad - 3\, x_{1} &{}\quad - 3\, x_{1} - 3 &{}\quad - 3\, x_{1} - 3\\ 1-{x_{1}}^2 &{}\quad -\, 2\, x_{2} - x_{1}\, x_{2} - 3 &{}\quad x_{1} - 1 &{}\quad x_{1} + 1&{}\quad x_{1} + 2 &{}\quad x_{1} + 2\\ 1 - {x_{1}}^2 &{}\quad 3 - x_{1}\, x_{2} - 2\,x_{2} &{}\quad x_{1} - 1 &{}\quad x_{1} + 1 &{}\quad x_{1} + 2 &{}\quad x_{1} + 2 \end{array}\right] . \end{aligned}$$

5 General constraints

We consider general nonlinear constraints as in (1.1). The critical point conditions are in (1.8). We discuss how to express Lagrange multipliers \(\lambda _i\) as polynomial functions in x on the set of critical points.

Suppose there are totally m equality and inequality constraints, i.e.,

$$\begin{aligned} \mathcal {E} \cup \mathcal {I} = \{1, \ldots , m \}. \end{aligned}$$

If \((x,\lambda )\) is a critical pair, then \(\lambda _i c_i(x) = 0\) for all \(i \in \mathcal {E} \cup \mathcal {I}\). So, the Lagrange multiplier vector \(\lambda := \begin{bmatrix} \lambda _1&\cdots&\lambda _m \end{bmatrix}^T\) satisfies the equation

$$\begin{aligned} \underbrace{ \begin{bmatrix} \nabla c_1(x)&\quad \nabla c_2(x)&\quad \cdots&\quad \nabla c_m(x) \\ c_1(x)&\quad 0&\quad \cdots&\quad 0 \\ 0&\quad c_2(x)&\quad 0&\quad 0 \\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots \\ 0&\quad 0&\quad \cdots&\quad c_m(x) \\ \end{bmatrix} }_{C(x)} \lambda = \begin{bmatrix} \nabla f(x) \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \end{aligned}$$
(5.1)

Let C(x) be as in above. If there exists \(L(x) \in \mathbb {R}[x]^{m \times (m+n)}\) such that

$$\begin{aligned} L(x) C(x) = I_m, \end{aligned}$$
(5.2)

then we can get

$$\begin{aligned} \lambda = L(x) \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix} = L_1(x) \nabla f(x), \end{aligned}$$
(5.3)

where \(L_1(x)\) consists of the first n columns of L(x). Clearly, (5.2) implies that Assumption 3.1 holds. This section characterizes when such L(x) exists.
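
For instance, for the single sphere constraint \(c_1 = 1 - x^Tx\), the row vector \(L(x) = \begin{bmatrix} -x^T/2&\quad 1 \end{bmatrix}\) satisfies (5.2), and (5.3) then recovers the expression (3.5). A small symbolic sketch (sympy assumed available, objective assumed for illustration):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
x = sp.Matrix([x1, x2])
c1 = 1 - x1**2 - x2**2

C = sp.Matrix.vstack(sp.Matrix([sp.diff(c1, v) for v in x]), sp.Matrix([c1]))  # C(x) as in (5.1)
L = sp.Matrix.hstack(-x.T / 2, sp.Matrix([[1]]))
print(sp.expand(L * C))                          # Matrix([[1]]): L(x)C(x) = I_1

f = x1**4 + x1 * x2                              # assumed objective
grad_f = sp.Matrix([sp.diff(f, v) for v in x])
lam1 = (L[:, :2] * grad_f)[0]                    # lambda_1 = L_1(x) grad f(x), as in (5.3)
print(sp.expand(lam1 + (x.T * grad_f)[0] / 2))   # 0, i.e., lambda_1 = -x^T grad f /2, matching (3.5)
```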

Definition 5.1

The tuple \(c := (c_1, \ldots , c_m)\) of constraining polynomials is said to be nonsingular if \(\text{ rank }\, C(u) = m\) for every \(u \in \mathbb {C}^n\).

Clearly, c being nonsingular is equivalent to that for each \(u \in \mathbb {C}^n\), if \(J(u) = \{ i_1, \ldots , i_k \}\) (see (1.2) for the notation), then the gradients \( \nabla c_{i_1}(u), \ldots , \nabla c_{i_k}(u) \) are linearly independent. Our main conclusion is that an L(x) satisfying (5.2) exists if and only if the tuple c is nonsingular.

Proposition 5.2

  1. (i)

    For each \(W(x) \in \mathbb {C}[x]^{s \times t}\) with \(s \ge t\), \(\text{ rank }\,W(u) = t\) for all \(u \in \mathbb {C}^n\) if and only if there exists \(P(x) \in \mathbb {C}[x]^{t \times s}\) such that

    $$\begin{aligned} P(x) W(x) = I_t. \end{aligned}$$

    Moreover, for \(W(x) \in \mathbb {R}[x]^{s \times t}\), we can choose \(P(x) \in \mathbb {R}[x]^{t \times s}\) for the above.

  2. (ii)

    The constraining polynomial tuple c is nonsingular if and only if there exists \(L(x) \in \mathbb {R}[x]^{m \times (m+n)}\) satisfying (5.2).

Proof

(i) “\(\Leftarrow \)”: If \(P(x) W(x) = I_t\), then for all \(u\in \mathbb {C}^n\)

$$\begin{aligned} t = \text{ rank }\, I_t \le \text{ rank }\, W(u) \le t. \end{aligned}$$

So, W(x) must have full column rank everywhere.

“\(\Rightarrow \)”:  Suppose \(\text{ rank }\,W(u) = t\) for all \(u\in \mathbb {C}^n\). Write W(x) in columns

$$\begin{aligned} W(x) = \begin{bmatrix} w_1(x)&w_2(x)&\cdots&w_t(x) \end{bmatrix}. \end{aligned}$$

Then, the equation \(w_1(x)=0\) does not have a complex solution. By Hilbert’s Weak Nullstellensatz [4], there exists \(\xi _1(x) \in \mathbb {C}[x]^s\) such that \(\xi _1(x)^T w_1(x) = 1.\) For each \(i = 2, \ldots , t\), denote

$$\begin{aligned} r_{1,i}(x):= \xi _1(x)^T w_i(x), \end{aligned}$$

then (use \(\sim \) to denote row equivalence between matrices)

$$\begin{aligned} W(x) \sim \begin{bmatrix} 1&\quad r_{1,2}(x)&\quad \cdots&\quad r_{1,t}(x) \\ w_1(x)&\quad w_2(x)&\quad \cdots&\quad w_t(x) \\ \end{bmatrix} \sim W_1(x) := \begin{bmatrix} 1&\quad r_{1,2}(x)&\quad \cdots&\quad r_{1,t}(x) \\ 0&\quad w_2^{(1)}(x)&\quad \cdots&\quad w_t^{(1)}(x) \\ \end{bmatrix}, \end{aligned}$$

where each (\(i=2,\ldots ,t\))

$$\begin{aligned} w_i^{(1)}(x) = w_i(x) - r_{1,i}(x) w_1(x). \end{aligned}$$

So, there exists \(P_1(x) \in \mathbb {C}[x]^{(s+1) \times s}\) such that

$$\begin{aligned} P_1(x) W(x) = W_1(x). \end{aligned}$$

Since W(x) and \(W_1(x)\) are row equivalent, \(W_1(x)\) must also have full column rank everywhere. Similarly, the polynomial equation

$$\begin{aligned} w_2^{(1)}(x) = 0 \end{aligned}$$

does not have a complex solution. Again, by Hilbert’s Weak Nullstellensatz [4], there exists \(\xi _2(x) \in \mathbb {C}[x]^{s}\) such that

$$\begin{aligned} \xi _2(x)^T w_2^{(1)}(x) = 1. \end{aligned}$$

For each \(i = 3, \ldots , t\), let \(r_{2,i}(x):= \xi _2(x)^T w_i^{(1)}(x)\), then

$$\begin{aligned} W_1(x)\sim & {} \begin{bmatrix} 1&\quad r_{1,2}(x)&\quad r_{1,3}(x)&\quad \cdots&\quad r_{1,t}(x) \\ 0&\quad 1&\quad r_{2,3}(x)&\quad \cdots&\quad r_{2,t}(x) \\ 0&\quad w_2^{(1)}(x)&\quad w_3^{(1)}(x)&\quad \cdots&\quad w_t^{(1)}(x) \\ \end{bmatrix}\\\sim & {} W_2(x) := \begin{bmatrix} 1&\quad r_{1,2}(x)&\quad r_{1,3}(x)&\quad \cdots&\quad r_{1,t}(x) \\ 0&\quad 1&\quad r_{2,3}(x)&\quad \cdots&\quad r_{2,t}(x) \\ 0&\quad 0&\quad w_3^{(2)}(x)&\quad \cdots&\quad w_t^{(2)}(x) \\ \end{bmatrix}, \end{aligned}$$

where each (\(i=3,\ldots ,t\))

$$\begin{aligned} w_i^{(2)}(x) = w_i^{(1)}(x) - r_{2,i}(x) w_2^{(1)}(x). \end{aligned}$$

Similarly, \(W_1(x)\) and \(W_2(x)\) are row equivalent, so \(W_2(x)\) has full column rank everywhere. There exists \(P_2(x) \in \mathbb {C}[x]^{(s+2) \times (s+1)}\) such that

$$\begin{aligned} P_2(x) W_1(x) = W_2(x). \end{aligned}$$

Continuing this process, we can finally get

$$\begin{aligned} W_2(x) \sim \cdots \sim W_t(x):= \begin{bmatrix} 1&\quad r_{1,2}(x)&\quad r_{1,3}(x)&\quad \cdots&\quad r_{1,t}(x) \\ 0&\quad 1&\quad r_{2,3}(x)&\quad \cdots&\quad r_{2,t}(x) \\ 0&\quad 0&\quad 1&\quad \cdots&\quad r_{3,t}(x) \\ \vdots&\quad \vdots&\quad \vdots&\quad \ddots&\quad \vdots \\ 0&\quad 0&\quad 0&\quad \cdots&\quad 1 \\ 0&\quad 0&\quad 0&\quad \cdots&\quad 0 \\ \end{bmatrix}. \end{aligned}$$

Consequently, there exists \(P_i(x) \in \mathbb {C}[x]^{(s+i) \times (s+i-1)}\) for \(i=1,2,\ldots , t\), such that

$$\begin{aligned} P_t(x) P_{t-1}(x) \cdots P_1(x) W(x) = W_t(x). \end{aligned}$$

Since \(W_t(x)\) is a unit upper triangular matrix polynomial, there exists \(P_{t+1}(x) \in \mathbb {C}[x]^{t \times (s+t)}\) such that \(P_{t+1}(x) W_t(x) = I_t.\) Let

$$\begin{aligned} P(x) \, := \, P_{t+1}(x) P_t(x) P_{t-1}(x) \cdots P_1(x), \end{aligned}$$

then \(P(x) W(x) = I_t\). Note that \(P(x) \in \mathbb {C}[x]^{t \times s}\). For \(W(x) \in \mathbb {R}[x]^{s\times t}\), we can replace P(x) by \(\big (P(x) + \overline{P(x)}\big )/2\) (here \(\overline{P(x)}\) denotes the complex conjugate of P(x)), which is a real matrix polynomial and still satisfies the identity, since \(\overline{P(x)}\, W(x) = \overline{P(x)W(x)} = I_t\) when W(x) is real. (ii) The conclusion follows directly from item (i), applied to \(W(x) = C(x)\) with \(s = m+n\) and \(t = m\). \(\square \)

In Proposition 5.2, there is no explicit degree bound for L(x) satisfying (5.2). To the best of the author's knowledge, obtaining such a bound is mostly an open question. However, once a degree is chosen for L(x), its coefficients can be determined by matching the coefficients on both sides of (5.2), which amounts to solving a linear system. In the following, we give some examples of L(x) satisfying (5.2).
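As a small illustration of this coefficient-matching approach, the following SymPy sketch (not part of the paper's implementation) guesses degree one for the entries of L(x), expands \(L(x)C(x) - I\), and solves the resulting linear system in the unknown coefficients; for the single constraint \(1 - x^2 \ge 0\) it recovers, with \(n = 1\), the formula of Example 5.3 below.

```python
# A minimal sketch (not the paper's code) of determining L(x) in (5.2) by
# coefficient matching.  The instance is the single constraint c = 1 - x^2,
# and degree 1 is a guessed bound for the entries of L(x).
import sympy as sp

x = sp.Symbol('x')
c = 1 - x**2
C = sp.Matrix([[sp.diff(c, x)], [c]])          # C(x) as in (5.1): gradient, then diag(c)

a0, a1, b0, b1 = sp.symbols('a0 a1 b0 b1')     # unknown coefficients of L(x)
L = sp.Matrix([[a0 + a1*x, b0 + b1*x]])

residual = sp.expand((L*C)[0, 0] - 1)          # want L(x) C(x) = I_1
eqs = sp.Poly(residual, x).all_coeffs()        # one linear equation per power of x
sol = sp.solve(eqs, [a0, a1, b0, b1])
print(L.subs(sol))                             # Matrix([[-x/2, 1]]), as in Example 5.3 below
```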

Example 5.3

Consider the hypercube with quadratic constraints

$$\begin{aligned} 1-x_1^2 \ge 0, 1-x_2^2 \ge 0, \ldots , 1-x_n^2 \ge 0. \end{aligned}$$

The equation \(L(x)C(x)=I_n\) is satisfied by

$$\begin{aligned} L(x) = \begin{bmatrix} -\frac{1}{2} \text{ diag }(x)&I_n \end{bmatrix}. \end{aligned}$$

Example 5.4

Consider the nonnegative portion of the unit sphere

$$\begin{aligned} x_1 \ge 0, x_2 \ge 0, \ldots , x_n\ge 0, x_1^2+\cdots +x_n^2 - 1 =0. \end{aligned}$$

The equation \(L(x)C(x)=I_{n+1}\) is satisfied by

$$\begin{aligned} L(x) = \begin{bmatrix} I_n - xx^T&\quad x \mathbf {1}_n^T&\quad 2x \\ \frac{1}{2}x^T&\quad -\frac{1}{2}\mathbf {1}_n^T&\quad -\,1 \end{bmatrix}. \end{aligned}$$
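The identity \(L(x)C(x) = I_{n+1}\) can also be checked symbolically. The following SymPy sketch (a verification of ours, with C(x) assembled as in (5.1) and the constraints ordered as listed above) confirms it for \(n = 3\).

```python
# A quick symbolic check (not the paper's code) that the L(x) of Example 5.4
# satisfies L(x) C(x) = I_{n+1}; here n = 3 and C(x) is built as in (5.1).
import sympy as sp

n = 3
x = sp.Matrix(sp.symbols('x1:%d' % (n + 1)))             # x1, x2, x3 as a column
cons = list(x) + [(x.T*x)[0, 0] - 1]                     # x_i >= 0 and x^T x - 1 = 0
grads = sp.Matrix.hstack(*[sp.Matrix([sp.diff(ci, xi) for xi in x]) for ci in cons])
C = sp.Matrix.vstack(grads, sp.diag(*cons))              # gradients on top, diag(c) below

ones = sp.ones(n, 1)
L = sp.Matrix.vstack(
    sp.Matrix.hstack(sp.eye(n) - x*x.T, x*ones.T, 2*x),
    sp.Matrix.hstack(x.T/2, -ones.T/2, sp.Matrix([[-1]])))
print(sp.simplify(L*C))                                  # prints the 4 x 4 identity matrix
```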

Example 5.5

Consider the set

$$\begin{aligned} 1-x_1^3-x_2^4\ge 0,\quad 1- x_3^4 - x_4^3 \ge 0. \end{aligned}$$

The equation \(L(x)C(x)=I_2\) is satisfied by

$$\begin{aligned} L(x) = \left[ \begin{array}{cccccc} -\frac{x_{1}}{3} &{}\quad -\frac{x_{2}}{4} &{}\quad 0&{}\quad 0 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad -\frac{x_{3}}{4} &{}\quad -\frac{x_{4}}{3} &{}\quad 0 &{}\quad 1 \end{array}\right] . \end{aligned}$$

Example 5.6

Consider the quadratic set

$$\begin{aligned} 1-x_1x_2 -x_2x_3 - x_1x_3 \ge 0, \quad 1- x_1^2 - x_2^2 - x_3^2 \ge 0. \end{aligned}$$

The matrix \(L(x)^T\) satisfying \(L(x)C(x)=I_2\) is

$$\begin{aligned} \left[ \begin{array}{cc} 25\, {x_{1}}^3 + 10\, {x_{1}}^2\, x_{2} + 40\, x_{1}\, {x_{2}}^2 - 25\, x_{1} - 2\, x_{3} &{}\quad - 25\, {x_{1}}^3 - 10\, {x_{1}}^2\, x_{2} - 40\, x_{1}\, {x_{2}}^2 + \frac{49\, x_{1}}{2} + 2\, x_{3}\\ - 15\, {x_{1}}^2\, x_{2} + 10\, x_{1}\, {x_{2}}^2 + 20 \,x_{3}\, x_{1}\, x_{2} - 10\, x_{1} &{}\quad 15\, {x_{1}}^2\, x_{2} - 10\, x_{1}\, {x_{2}}^2 - 20\, x_{3}\, x_{1}\, x_{2} + 10\, x_{1} - \frac{x_{2}}{2}\\ 25\, x_{3}\, {x_{1}}^2 - 20\, x_{1}\, {x_{2}}^2 + 10\, x_{3}\, x_{1}\, x_{2} + 2\, x_{1} &{}\quad - 25\, x_{3}\, {x_{1}}^2 + 20\, x_{1}\, {x_{2}}^2 - 10\, x_{3}\, x_{1}\, x_{2} - 2\, x_{1} - \frac{x_{3}}{2}\\ 1 - 20\, x_{1}\, x_{3} - 10\,{x_{1}}^2 - 20\, x_{1}\, x_{2} &{}\quad 20\, x_{1}\, x_{2} + 20\, x_{1}\, x_{3} + 10\, {x_{1}}^2\\ - 50\, {x_{1}}^2 - 20\, x_{2}\, x_{1} &{}\quad 50\, {x_{1}}^2 + 20\, x_{2}\, x_{1} + 1 \end{array}\right] . \end{aligned}$$

We would like to remark that a polynomial tuple \(c=(c_1, \ldots , c_m)\) is generically nonsingular and Assumption 3.1 holds generically.

Proposition 5.7

For all positive degrees \(d_1, \ldots , d_m\), there exists an open dense subset \(\mathcal {U}\) of \(\mathcal {D} := \mathbb {R}[x]_{d_1} \times \cdots \times \mathbb {R}[x]_{d_m}\) such that every tuple \(c=(c_1, \ldots , c_m) \in \mathcal {U}\) is nonsingular. Indeed, such \(\mathcal {U}\) can be chosen as a Zariski open subset of \(\mathcal {D}\), i.e., it is the complement of a proper real variety of \(\mathcal {D}\). Moreover, Assumption 3.1 holds for all \(c \in \mathcal {U}\), i.e., it holds generically.

Proof

The proof uses resultants and discriminants, for which we refer to [29].

First, let \(J_1\) be the set of all \((i_1, \ldots , i_{n+1})\) with \(1 \le i_1< \cdots < i_{n+1} \le m\). The resultant \(Res(c_{i_1}, \ldots , c_{i_{n+1}})\) [29, Section 2] is a polynomial in the coefficients of \(c_{i_1}, \ldots , c_{i_{n+1}}\) such that if \(Res(c_{i_1}, \ldots , c_{i_{n+1}}) \ne 0\) then the equations

$$\begin{aligned} c_{i_1}(x) =\cdots = c_{i_{n+1}}(x) = 0 \end{aligned}$$

have no complex solutions. Define

$$\begin{aligned} F_1 (c) := \prod _{ (i_1, \ldots , i_{n+1}) \in J_1 } Res(c_{i_1}, \ldots , c_{i_{n+1}}). \end{aligned}$$

For the case \(m\le n\), \(J_1 = \emptyset \) and we simply let \(F_1(c) = 1\). Clearly, if \(F_1(c) \ne 0\), then no more than n of the polynomials \(c_1, \ldots , c_m\) have a common complex zero.

Second, let \(J_2\) be the set of all \((j_1, \ldots , j_k)\) with \(k \le n\) and \(1 \le j_1< \cdots < j_k \le m\). When one of \(c_{j_1}, \ldots , c_{j_k}\) has degree bigger than one, the discriminant \(\varDelta (c_{j_1}, \ldots , c_{j_k})\) is a polynomial in the coefficients of \(c_{j_1}, \ldots , c_{j_k}\) such that if \(\varDelta (c_{j_1}, \ldots , c_{j_k}) \ne 0\) then the equations

$$\begin{aligned} c_{j_1}(x) =\cdots = c_{j_k}(x) = 0 \end{aligned}$$

have no singular complex solution [29, Section 3], i.e., at every complex common solution u, the gradients of \(c_{j_1}, \ldots , c_{j_k}\) at u are linearly independent. When all \(c_{j_1}, \ldots , c_{j_k}\) have degree one, the discriminant of the tuple \((c_{j_1}, \ldots , c_{j_k})\) is not a single polynomial, but we can define \(\varDelta (c_{j_1}, \ldots , c_{j_k})\) to be the product of all maximal minors of its Jacobian (a constant matrix). Define

$$\begin{aligned} F_2 (c) := \prod _{ (j_1, \ldots , j_k) \in J_2 } \varDelta (c_{j_1}, \ldots , c_{j_k}). \end{aligned}$$

Clearly, if \(F_2(c) \ne 0\), then no n or fewer of the polynomials \(c_1, \ldots , c_m\) have a singular common complex zero.

Last, let \(F(c) := F_1(c) F_2(c)\) and

$$\begin{aligned} \mathcal {U} := \{ c=(c_1, \ldots , c_m) \in \mathcal {D}: \, F(c) \ne 0 \}. \end{aligned}$$

Note that \(\mathcal {U}\) is a Zariski open subset of \(\mathcal {D}\) and it is open dense in \(\mathcal {D}\). For all \(c\in \mathcal {U}\), no more than n of \(c_1, \ldots , c_m\) can have a common complex zero, and for any k polynomials (\(k \le n\)) of \(c_1, \ldots , c_m\), if they have a common complex zero, say u, then their gradients at u must be linearly independent. This means that every \(c \in \mathcal {U}\) is a nonsingular tuple.

Since every \(c \in \mathcal {U}\) is nonsingular, Proposition 5.2 implies that (5.2) holds, whence Assumption 3.1 is satisfied. Therefore, Assumption 3.1 holds for all \(c \in \mathcal {U}\), i.e., it holds generically. \(\square \)

6 Numerical examples

This section gives examples of using the new relaxations (3.8)–(3.9), with Lagrange multiplier expressions, for solving the optimization problem (1.1). Some polynomials in the examples are from [37]. The computation is implemented in MATLAB R2012a, on a Lenovo laptop with a 2.90 GHz CPU and 16 GB of RAM. The relaxations (3.8)–(3.9) are solved by the software GloptiPoly 3 [13], which calls the SDP package SeDuMi [40]. For neatness, only four decimal digits are displayed in the computational results.

The polynomials \(p_i\) in Assumption 3.1 are constructed as follows. Order the constraining polynomials as \(c_1, \ldots , c_m\). First, find a matrix polynomial L(x) satisfying (4.4) or (5.2). Let \(L_1(x)\) be the submatrix of L(x), consisting of the first n columns. Then, choose \((p_1, \ldots , p_m)\) to be the product \(L_1(x) \nabla f(x)\), i.e.,

$$\begin{aligned} p_i = \Big ( L_1(x) \nabla f(x) \Big )_i. \end{aligned}$$
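For instance (a toy illustration of ours, not one of the examples below), with the hypercube constraints of Example 5.3 in \(n = 2\) variables and an arbitrarily chosen objective, the \(p_i\) are obtained as follows.

```python
# A small sketch of forming p_i = (L_1(x) grad f(x))_i, using the L(x) of
# Example 5.3 (hypercube constraints) with n = 2; the objective f below is
# an arbitrary choice made only for illustration.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**4 + x2**4 - 3*x1*x2
L1 = sp.Matrix([[-x1/2, 0], [0, -x2/2]])       # first n columns of L(x) = [-diag(x)/2, I_n]
gradf = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2)])
p = (L1*gradf).expand()                        # p_1 = -2*x1**4 + 3*x1*x2/2, etc.
print(p)
```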

In all our examples, the global minimum value \(f_{\min }\) of (1.1) is achieved at a critical point. This is the case if the feasible set is compact, or if f is coercive (i.e., the sublevel set \(\{x \in \mathbb {R}^n : f(x) \le \ell \}\) is compact for every \(\ell \)), provided the constraint qualification condition holds.

By Theorem 3.3, we have \(f_k = f_{\min }\) for all k big enough, if \(f_c = f_{\min }\) and any of its conditions (i)–(iii) holds. In practice, it may be inconvenient to check these conditions, but in the computation we do not need to check them at all; the condition (3.17) is more convenient to use. When there are finitely many global minimizers, Theorem 3.4 shows that (3.17) is an appropriate criterion for detecting convergence. It is satisfied for all our examples, except Examples 6.1, 6.7 and 6.9 (which have infinitely many minimizers).

We compare the new relaxations (3.8)–(3.9) with the standard Lasserre relaxations in [17]. The tables show the lower bounds given by the relaxations in [17] (without Lagrange multiplier expressions) and by (3.8)–(3.9) (with Lagrange multiplier expressions), as well as the computational time (in seconds). The results for the standard Lasserre relaxations are titled “w./o. L.M.E.”, and those for the new relaxations (3.8)–(3.9) are titled “with L.M.E.”.

Table 1 Computational results for Example 6.1

Example 6.1

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{rl} \min &{} x_1x_2(10 - x_3 )\\ s.t. &{} x_1\ge 0, x_2 \ge 0, x_3 \ge 0, 1-x_1-x_2-x_3 \ge 0. \end{array}\right. \end{aligned}$$

The matrix polynomial L(x) is given in Example 4.2. Since the feasible set is compact, the minimum \(f_{\min } =0\) is achieved at a critical point. The condition (ii) of Theorem 3.3 is satisfied. Each feasible point \((x_1,x_2,x_3)\) with \(x_1x_2=0\) is a global minimizer. The computational results for standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are in Table 1. It confirms that \(f_k = f_{\min }\) for all \(k\ge 3\), up to numerical round-off errors.

Example 6.2

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{ll} \min &{} x_1^4x_2^2 + x_1^2x_2^4 + x_3^6 - 3 x_1^2x_2^2 x_3^2 + (x_1^4 + x_2^4 + x_3^4) \\ s.t. &{} x_1^2 + x_2^2 + x_3^2 \ge 1. \end{array}\right. \end{aligned}$$

The matrix polynomial \(L(x) = \begin{bmatrix} \frac{1}{2}x_1&\frac{1}{2}x_2&\frac{1}{2}x_3&-1 \end{bmatrix}\). The objective f is the sum of the positive definite form \(x_1^4+x_2^4+x_3^4\) and the Motzkin polynomial

$$\begin{aligned} M(x) := x_1^4x_2^2 + x_1^2x_2^4 + x_3^6 - 3 x_1^2x_2^2 x_3^2. \end{aligned}$$

Note that M(x) is nonnegative everywhere but not SOS [37]. Clearly, f is coercive and \(f_{\min }\) is achieved at a critical point. The set \(\hbox {IQ}(\phi ,\psi )\) is archimedean, because

$$\begin{aligned} c_1(x)p_1(x) = (x_1^2 + x_2^2 + x_3^2 -1)\big (3M(x) + 2(x_1^4 + x_2^4 + x_3^4)\big ) = 0 \end{aligned}$$

defines a compact set. So, the condition (i) of Theorem 3.3 is satisfied. The minimum value is \(f_{\min }= \frac{1}{3}\), and there are eight minimizers \( (\pm \frac{1}{\sqrt{3}}, \, \pm \frac{1}{\sqrt{3}}, \, \pm \frac{1}{\sqrt{3}}). \) The computational results for standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are in Table 2. It confirms that \(f_k = f_{\min }\) for all \(k\ge 4\), up to numerical round-off errors.
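For completeness, here is a short independent verification (our own, not from the computation) that \(\frac{1}{3}\) is indeed the global minimum: since \(M(x) \ge 0\) everywhere and \(3(x_1^4+x_2^4+x_3^4) \ge (x_1^2+x_2^2+x_3^2)^2\) by the Cauchy–Schwarz inequality, every feasible point satisfies

$$\begin{aligned} f(x) \, \ge \, x_1^4+x_2^4+x_3^4 \, \ge \, \frac{1}{3}\big (x_1^2+x_2^2+x_3^2\big )^2 \, \ge \, \frac{1}{3}, \end{aligned}$$

with equality exactly when \(M(x) = 0\) and \(x_1^2=x_2^2=x_3^2=\frac{1}{3}\), i.e., at the eight minimizers listed above.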

Example 6.3

Consider the optimization problem:

$$\begin{aligned} \left\{ \begin{array}{rl} \min &{} x_1x_2 + x_2x_3 + x_3x_4 - 3 x_1x_2 x_3x_4 +( x_1^3+\cdots + x_4^3) \\ s.t. &{} x_1, x_2, x_3, x_4 \ge 0, 1-x_1-x_2 \ge 0, 1 - x_3 - x_4 \ge 0. \end{array}\right. \end{aligned}$$

The matrix polynomial L(x) is

$$\begin{aligned} \left[ \begin{array}{cccccccccc} 1 - x_{1} &{}\quad -\,x_{2} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1&{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0\\ -\, x_{1} &{}\quad 1 - x_{2} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad 1 - x_{3} &{}\quad -\, x_{4} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 1\\ 0 &{}\quad 0 &{}\quad -\,x_{3} &{}\quad 1 - x_{4} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 1\\ -\,x_{1} &{}\quad -\,x_{2} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 0\\ 0 &{}\quad 0 &{}\quad -\,x_{3} &{}\quad -\,x_{4} &{}\quad 0 &{}\quad 0 &{}\quad 1 &{}\quad 1 &{}\quad 0 &{}\quad 1 \end{array}\right] . \end{aligned}$$

The feasible set is compact, so \(f_{\min }\) is achieved at a critical point. One can show that \(f_{\min } =0\) and the minimizer is the origin. The condition (ii) of Theorem 3.3 is satisfied, because \(\hbox {IQ}(c_{eq},c_{in})\) is archimedean. The computational results for standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are in Table 3.

Table 2 Computational results for Example 6.2
Table 3 Computational results for Example 6.3

Example 6.4

Consider the polynomial optimization problem

$$\begin{aligned} \left\{ \begin{array}{rl} \underset{x\in \mathbb {R}^2}{\min } &{} x_1^2+50x_2^2 \\ s.t. &{} x_1^2- \frac{1}{2} \ge 0, x_2^2-2x_1x_2 - \frac{1}{8} \ge 0, x_2^2+2x_1x_2 - \frac{1}{8} \ge 0. \end{array} \right. \end{aligned}$$

It is motivated by an example in [12, §3]. The first column of L(x) is

$$\begin{aligned} \left[ \begin{array}{l} \frac{8\, {x_{1}}^3}{5} + \frac{x_{1}}{5}\\ \frac{288\, x_{2}\, {x_{1}}^4}{5} - \frac{16\, {x_{1}}^3}{5} - \frac{124\, x_{2}\, {x_{1}}^2}{5} + \frac{8\, x_{1}}{5} - 2\, x_{2}\\ - \frac{288\, x_{2}\, {x_{1}}^4}{5} - \frac{16\, {x_{1}}^3}{5} + \frac{124\, x_{2}\, {x_{1}}^2}{5} + \frac{8\, x_{1}}{5} + 2\, x_{2} \end{array}\right] , \end{aligned}$$

and the second column of L(x) is

$$\begin{aligned} \left[ \begin{array}{l} - \frac{8\, {x_{1}}^2\, x_{2}}{5} + \frac{4\, {x_{2}}^3}{5} - \frac{x_{2}}{10}\\ \frac{288\, {x_{1}}^3\, {x_{2}}^2}{5} + \frac{16\, {x_{1}}^2\, x_{2}}{5} - \frac{142\, x_{1}\, {x_{2}}^2}{5} - \frac{9\, x_{1}}{20} - \frac{8\, {x_{2}}^3}{5} + \frac{11\, x_{2}}{5}\\ - \frac{288\, {x_{1}}^3\, {x_{2}}^2}{5} + \frac{16\, {x_{1}}^2\, x_{2}}{5} + \frac{142\, x_{1}\, {x_{2}}^2}{5} + \frac{9\, x_{1}}{20} - \frac{8\, {x_{2}}^3}{5} + \frac{11\, x_{2}}{5} \end{array}\right] . \end{aligned}$$

The objective is coercive, so \(f_{\min }\) is achieved at a critical point. The minimum value \(f_{\min } = 56+3/4+25\sqrt{5} \approx 112.6517\) and the minimizers are \((\pm \,\sqrt{1/2}, \pm \,(\sqrt{5/8}+\sqrt{1/2}))\). The computational results for standard Lasserre’s relaxations and the new ones (3.8)-(3.9) are in Table 4. It confirms that \(f_k = f_{\min }\) for all \(k\ge 4\), up to numerical round-off errors.

Table 4 Computational results for Example 6.4

Example 6.5

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{ll} \min \limits _{x\in \mathbb {R}^3} &{} x_1^3+x_2^3+x_3^3 + 4x_1x_2x_3 -\big ( x_1(x_2^2+x_3^2)+x_2(x_3^2+x_1^2)+x_3(x_1^2+x_2^2) \big ) \\ \hbox {s.t.} &{} x_1 \ge 0, x_1x_2-1 \ge 0, \, x_2x_3 -1 \ge 0. \end{array}\right. \end{aligned}$$

The matrix polynomial L(x) is

$$\begin{aligned} \left[ \begin{array}{cccccc} 1 - x_{1}\, x_{2} &{}\quad 0 &{}\quad 0 &{}\quad x_{2} &{}\quad x_{2} &{}\quad 0\\ x_{1} &{}\quad 0 &{}\quad 0 &{}\quad -\,1 &{}\quad -\,1 &{}\quad 0\\ - x_{1} &{}\quad x_{2} &{}\quad 0 &{}\quad 1 &{}\quad 0 &{}\quad -\,1 \end{array} \right] . \end{aligned}$$

The objective is a variation of Robinson’s form [37]. It is a positive definite form over the nonnegative orthant \(\mathbb {R}_+^3\), so the minimum value is achieved at a critical point. In computation, we got \(f_{\min } \approx 0.9492\) and a global minimizer (0.9071, 1.1024, 0.9071). The computational results for standard Lasserre’s relaxations and the new ones (3.8)-(3.9) are in Table 5. It confirms that \(f_k = f_{\min }\) for all \(k\ge 3\), up to numerical round-off errors.

Table 5 Computational results for Example 6.5

Example 6.6

Consider the optimization problem (\(x_0 := 1\))

$$\begin{aligned} \left\{ \begin{array}{rl} \min \limits _{x\in \mathbb {R}^4} &{} x^Tx + \sum _{i=0}^4\prod \limits _{j\ne i} (x_i-x_j) \\ \hbox {s.t.} &{} x_1^2-1 \ge 0,\, x_2^2 -1 \ge 0,\, x_3^2 - 1 \ge 0,\, x_4^2-1 \ge 0. \end{array}\right. \end{aligned}$$

The matrix polynomial \(L(x) = \begin{bmatrix} \frac{1}{2}\text{ diag }(x)&-I_4 \end{bmatrix}\). The first part of the objective is \(x^Tx\), while the second part is a nonnegative polynomial [37]. The objective is coercive, so \(f_{\min }\) is achieved at a critical point. In computation, we got \(f_{\min } = 4.0000\) and 11 global minimizers:

$$\begin{aligned} (1, 1, 1, 1), \quad (1, -1, -1, 1), \quad (1, -1, 1, -1), \quad (1, 1, -1, -1), \end{aligned}$$
$$\begin{aligned} (1, -1, -1, -1), \quad (-1, -1, 1, 1), \quad (-1, 1, -1, 1), \quad (-1, 1, 1, -1), \end{aligned}$$
$$\begin{aligned} (-1, -1, -1, 1), \quad (-1, -1, 1, -1), \quad (-1, 1, -1, -1). \end{aligned}$$

The computational results for standard Lasserre’s relaxations and the new ones (3.8)-(3.9) are in Table 6. It confirms that \(f_k = f_{\min }\) for all \(k\ge 4\), up to numerical round-off errors.

Table 6 Computational results for Example 6.6

Example 6.7

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{rl} \min \limits _{x\in \mathbb {R}^3} &{} x_1^4x_2^2 + x_2^4x_3^2 + x_3^4x_1^2 - 3 x_1^2x_2^2 x_3^2 + x_2^2 \\ \hbox {s.t.} &{} x_1-x_2x_3 \ge 0, -x_2+x_3^2 \ge 0. \end{array}\right. \end{aligned}$$

The matrix polynomial \( L(x) = \left[ \begin{array}{rrrrr} 1 &{} 0 &{} 0 &{} 0 &{} 0\\ - x_{3} &{} -1 &{} 0 &{} 0 &{} 0 \end{array}\right] . \) By the arithmetic-geometric mean inequality, one can show that \(f_{\min } = 0 \). The global minimizers are \((x_1,0,x_3)\) with \(x_1 \ge 0\) and \(x_1x_3 =0\). The computational results for standard Lasserre’s relaxations and the new ones (3.8)-(3.9) are in Table 7. It confirms that \(f_k = f_{\min }\) for all \(k\ge 5\), up to numerical round-off errors.
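The arithmetic–geometric mean step mentioned above can be made explicit (a short sketch): since the three terms \(x_1^4x_2^2\), \(x_2^4x_3^2\), \(x_3^4x_1^2\) are nonnegative,

$$\begin{aligned} x_1^4x_2^2 + x_2^4x_3^2 + x_3^4x_1^2 \, \ge \, 3\big ( x_1^4x_2^2 \cdot x_2^4x_3^2 \cdot x_3^4x_1^2 \big )^{1/3} \, = \, 3\, x_1^2x_2^2x_3^2, \end{aligned}$$

so the objective is at least \(x_2^2 \ge 0\); on the feasible set it vanishes exactly at the points \((x_1, 0, x_3)\) with \(x_1 \ge 0\) and \(x_1x_3 = 0\) described above.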

Table 7 Computational results for Example 6.7

Example 6.8

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{rl} \min \limits _{x\in \mathbb {R}^4} &{} x_1^2(x_1-x_4)^2 + x_2^2(x_2-x_4)^2 + x_3^2(x_3-x_4)^2 + \\ &{} \qquad 2x_1x_2x_3(x_1+x_2+x_3-2x_4) + (x_1-1)^2 + (x_2-1)^2 + (x_3-1)^2 \\ \hbox {s.t.} &{} x_1 - x_2 \ge 0, \,\, x_2 - x_3 \ge 0. \end{array} \right. \end{aligned}$$

The matrix polynomial \(L(x) = \left[ \begin{array}{rrrrrr} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \end{array}\right] .\) In the objective, the sum of the first 4 terms is a nonnegative form [37], while the sum of the last 3 terms is a coercive polynomial. The objective is coercive, so \(f_{\min }\) is achieved at a critical point. In computation, we got \(f_{\min } \approx 0.9413 \) and a minimizer

$$\begin{aligned} (0.5632, \, 0.5632, \, 0.5632, \, 0.7510). \end{aligned}$$

The computational results for standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are in Table 8. It confirms that \(f_k = f_{\min }\) for all \(k\ge 3\), up to numerical round-off errors.

Table 8 Computational results for Example 6.8

Example 6.9

Consider the optimization problem

$$\begin{aligned} \left\{ \begin{array}{rl} \min \limits _{x\in \mathbb {R}^4} &{} (x_1+ x_2 + x_3 + x_4 + 1)^2 -4(x_1x_2 +x_2x_3 +x_3x_4 + x_4 + x_1) \\ \hbox {s.t.} &{} 0 \le x_1, \ldots , x_4 \le 1. \end{array} \right. \end{aligned}$$

The matrix L(x) is given in Example 4.3. The objective is the dehomogenization of Horn’s form [37]. The feasible set is compact, so \(f_{\min }\) is achieved at a critical point. The condition (ii) of Theorem 3.3 is satisfied. The minimum value is \(f_{\min }=0\). For each \(t\in [0,1]\), the point \((t,0,0,1-t)\) is a global minimizer. The computational results for standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are in Table 9.

Table 9 Computational results for Example 6.9

For some polynomial optimization problems, the standard Lasserre relaxations may converge fast; e.g., the lowest order relaxation is often already tight. For such cases, the new relaxations (3.8)–(3.9) have the same convergence property, but may take more computational time. The following is such a comparison.

Example 6.10

Consider the optimization problem (\(x_0 :=1\))

$$\begin{aligned} \left\{ \begin{array}{rl} \min \limits _{x\in \mathbb {R}^n} &{} \sum _{0 \le i \le j \le k \le n} c_{ijk} x_i x_j x_k \\ \hbox {s.t.} &{} 0 \le x\le 1, \end{array} \right. \end{aligned}$$

where each coefficient \(c_{ijk}\) is randomly generated (by randn in MATLAB). The matrix L(x) is the same as in Example 4.3. Since the feasible set is compact, we always have \(f_c = f_{\min }\). The condition (ii) of Theorem 3.3 is satisfied, because of the box constraints. For this kind of randomly generated problem, standard Lasserre’s relaxations are often tight at the order \(k=2\), which is also the case for the new relaxations (3.8)–(3.9). Here, we compare the computational time consumed by standard Lasserre’s relaxations and by (3.8)–(3.9). The time is shown (in seconds) in Table 10. For each n in the table, we generate 10 random instances and report the average consumed time. For all instances, standard Lasserre’s relaxations and the new ones (3.8)–(3.9) are tight at the order \(k=2\), while their running times differ a bit; we observe that (3.8)–(3.9) consume slightly more time.

Table 10 Consumed time (in seconds) for Example 6.10

7 Discussions

7.1 Tight relaxations using preorderings

When the global minimum value \(f_{\min }\) is achieved at a critical point, the problem (1.1) is equivalent to (3.7). We proposed relaxations (3.8)–(3.9) for solving (3.7). Note that

$$\begin{aligned} \hbox {IQ}(c_{eq},c_{in})_{2k} + \hbox {IQ}(\phi ,\psi )_{2k} = \hbox {Ideal}(c_{eq},\phi )_{2k} + \hbox {Qmod}(c_{in},\psi )_{2k}. \end{aligned}$$

If we replace the quadratic module \(\hbox {Qmod}(c_{in},\psi )\) by the preordering of \((c_{in},\psi )\) [21, 25], we can get even tighter relaxations. For convenience, write \((c_{in},\psi )\) as a single tuple \((g_1, \ldots , g_\ell )\). Its preordering is the set

$$\begin{aligned} \hbox {Preord}(c_{in},\psi ):= \sum _{ r_1, \ldots , r_\ell \in \{0, 1\} } g_1^{r_1}\cdots g_\ell ^{r_\ell } \varSigma [x]. \end{aligned}$$
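For instance, when \(\ell = 2\), the preordering consists of the \(2^\ell = 4\) pieces

$$\begin{aligned} \hbox {Preord}(g_1,g_2) \, = \, \varSigma [x] + g_1\, \varSigma [x] + g_2\, \varSigma [x] + g_1 g_2\, \varSigma [x], \end{aligned}$$

while the quadratic module uses only the pieces without the mixed product \(g_1g_2\). In general, the preordering involves \(2^\ell \) products \(g_1^{r_1}\cdots g_\ell ^{r_\ell }\), so the relaxation (7.1) below has more semidefinite constraints than (3.8) and is typically more expensive to solve.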

The truncation \(\hbox {Preord}(c_{in},\psi )_{2k}\) is defined in the same way as \(\hbox {Qmod}(c_{in},\psi )_{2k}\) in Sect. 2. A tighter relaxation than (3.8), of the same order k, is

$$\begin{aligned} \left\{ \begin{array}{rl} f_k^{\prime , pre} := \min &{} \langle f, y \rangle \\ s.t. &{} \langle 1, y \rangle = 1, \, L_{c_{eq}}^{(k)}(y) = 0, \, L_{\phi }^{(k)}(y) = 0, \\ &{} L_{ g_1^{r_1}\cdots g_\ell ^{r_\ell } }^{(k)}(y) \succeq 0 \quad \forall \, r_1,\ldots , r_\ell \in \{0, 1\}, \\ &{} y \in \mathbb {R}^{ \mathbb {N}_{2k}^n }. \end{array} \right. \end{aligned}$$
(7.1)

Similar to (3.9), the dual optimization problem of the above is

$$\begin{aligned} \left\{ \begin{array}{rl} f_k^{pre} := \, \max &{} \gamma \\ s.t. &{} f-\gamma \in \hbox {Ideal}(c_{eq},\phi )_{2k} + \hbox {Preord}(c_{in},\psi )_{2k}. \end{array} \right. \end{aligned}$$
(7.2)

An attractive property of the relaxations (7.1)–(7.2) is that the conclusion of Theorem 3.3 still holds, even if none of the conditions (i)–(iii) there is satisfied. This gives the following theorem.

Theorem 7.1

Suppose \(\mathcal {K}_c \ne \emptyset \) and Assumption 3.1 holds. Then,

$$\begin{aligned} f_k^{pre} = f_k^{\prime ,pre} = f_c \end{aligned}$$

for all k sufficiently large. Therefore, if the minimum value \(f_{\min }\) of (1.1) is achieved at a critical point, then \(f_k^{pre} = f_k^{\prime ,pre} = f_{\min }\) for all k big enough.

Proof

The proof is very similar to that of Case III in the proof of Theorem 3.3; we follow the same argument there. Without Assumption 3.2, we still have \(\hat{f}(x) \equiv 0\) on the set

$$\begin{aligned} \mathcal {K}_3 := \{ x \in \mathbb {R}^n \, \mid \, c_{eq}(x) = 0, \quad \phi (x) = 0, \quad c_{in}(x) \ge 0, \quad \psi (x) \ge 0 \}. \end{aligned}$$

By the Positivstellensatz, there exist an integer \(N >0\) and \(q \in \hbox {Preord}(c_{in},\psi )\) such that \( \hat{f}^{2N} + q \in \hbox {Ideal}(c_{eq},\phi ). \) The rest of the proof is the same. \(\square \)

7.2 Singular constraining polynomials

As shown in Proposition 5.2, if the tuple c of constraining polynomials is nonsingular, then there exists a matrix polynomial L(x) such that \(L(x) C(x) = I_m.\) Hence, the Lagrange multiplier \(\lambda \) can be expressed as in (5.3). However, if c is not nonsingular, then such L(x) does not exist. For such cases, how can we express \(\lambda \) in terms of x for critical pairs \((x,\lambda )\)? This question is mostly open, to the best of the author’s knowledge.
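For instance, consider the single constraint \(c_1(x) = x_1^2\) in one variable. Then \(C(x) = \begin{bmatrix} 2x_1&x_1^2 \end{bmatrix}^T\) vanishes at \(x_1 = 0\), so the tuple is singular and no L(x) with \(L(x)C(x) = I_1\) can exist. Moreover, for any objective with \(f'(0) = 0\), every value of \(\lambda _1\) makes \((0, \lambda _1)\) a critical pair, so \(\lambda _1\) is not even a function of x on the set of critical points; any representation of Lagrange multipliers for singular tuples must account for this non-uniqueness.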

7.3 Degree bound for L(x)

For a nonsingular tuple c of constraining polynomials, what is a good degree bound for L(x) in Proposition 5.2? When c is linear, a degree bound is given in Proposition 4.1. However, for nonlinear c, an explicit degree bound is not known. Theoretically, we can get a degree bound for L(x): in the proof of Proposition 5.2, Hilbert’s Weak Nullstellensatz is used t times, and there exist sharp degree bounds for Hilbert’s Nullstellensatz [16]. If the degree bound in [16] is applied at each usage, then a degree bound for L(x) can eventually be obtained. However, the bound obtained this way is enormous, because the bound in [16] is already exponential in the number of variables. An interesting topic for future work is to get a useful degree bound for L(x).

7.4 Rational representation of Lagrange multipliers

In (5.1), the Lagrange multiplier vector \(\lambda \) is determined by a linear equation. Naturally, one can get

$$\begin{aligned} \lambda \, = \, \Big ( C(x)^T C(x) \Big )^{-1} C(x)^T \begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix}, \end{aligned}$$

when C(x) has full column rank. This rational representation is expensive to use, because its denominator is typically a polynomial of high degree. However, \(\lambda \) might admit rational representations other than the above. Can we find a rational representation whose denominator and numerator have low degrees? If this is possible, the methods for optimizing rational functions [3, 15, 28] can be applied. This is an interesting question for future research.
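As a small illustration of this degree growth (an example of ours, not from the paper), the following SymPy sketch computes the rational representation for a single ball constraint in two variables; already here the common denominator \(\det \big (C(x)^TC(x)\big )\) has degree four.

```python
# A tiny sketch of the rational representation of lambda above, for the
# single constraint c = 1 - x1^2 - x2^2 and an arbitrarily chosen objective f.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**3 + x2                                   # an arbitrary objective
c = 1 - x1**2 - x2**2
C = sp.Matrix([sp.diff(c, x1), sp.diff(c, x2), c])          # C(x) of (5.1), here 3 x 1
rhs = sp.Matrix([sp.diff(f, x1), sp.diff(f, x2), 0])
lam = (C.T*C).inv() * C.T * rhs                  # lambda = (C^T C)^{-1} C^T [grad f; 0]
print(sp.cancel(lam[0]))                         # numerator -6*x1**3 - 2*x2 over a degree-4 denominator
print(sp.factor((C.T*C)[0, 0]))                  # denominator factors as (x1**2 + x2**2 + 1)**2
```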