1 Introduction

The optimal value of a polynomial optimization problem over a compact semialgebraic set can be approximated as closely as desired by solving a hierarchy of semidefinite programming (SDP) relaxations, and the convergence is generically finite under a mild assumption, known as the Archimedean condition, which in particular forces the feasible region to be compact (see [19, 20, 24, 27, 28, 30, 31]). It is known that the size of the SDP relaxations of this hierarchy, now known as the Lasserre hierarchy, grows rapidly as the number of variables and the relaxation order increase, preventing applications of the hierarchy to large scale polynomial optimization problems, as the resulting SDP relaxations become too large to solve. A great deal of attention has recently been focused on reducing the size of these SDP relaxations. This has led to a sparse variant of the Lasserre hierarchy that allows applications to various large scale polynomial optimization problems over compact semialgebraic feasible sets [5, 8, 16, 18, 36], such as sensor network localization problems [17, 25], hybrid system identification problems [7], optimal power flow problems [9], robust optimization problems [1], portfolio optimization problems [15] and filter design problems [22].

More recently, the standard Lasserre hierarchy of SDP relaxations has been shown to extend to polynomial optimization over unbounded semialgebraic feasible sets via suitable modifications [11]. The purpose of this paper is to present a convergent sparse SDP hierarchy for solving polynomial optimization problems with sparse patterns and unbounded semialgebraic feasible sets, extending the unbounded version of the Lasserre hierarchy [11]. More specifically, we make the following contributions to global polynomial optimization.

  (1)

    We present a sparse SDP hierarchy for solving polynomial optimization problems with unbounded feasible sets, incorporating the objective function in the construction of the quadratic modules that generate the sequence of SDP relaxations. This approach extends the relaxation scheme developed for convex polynomial optimization over noncompact sets in [12]. This is done by employing known sums-of-squares sparsity techniques [18, 21] together with a representation theorem for polynomials over unbounded sets, established recently in [11].

  (2)

    By solving some numerical test problems, we illustrate that our sparse SDP hierarchy can easily be used with the current large scale polynomial optimization solver SparsePOP [37] to solve some classes of large scale polynomial optimization problems with unbounded feasible sets. We also apply our SDP hierarchy to a class of sparse polynomial optimization problems with unbounded feasible sets and hidden coercivity.

The organization of the paper is as follows. In Sect. 2, we fix the notation and recall some basic facts on polynomial optimization. In Sect. 3, we present our sparse SDP hierarchy for polynomial optimization with unbounded feasible sets and establish its convergence. In Sect. 4, we illustrate how our proposed scheme works by solving various large scale numerical test problems.

2 Preliminaries

Throughout this paper, \(\mathbb {R}^n\) denotes the n-dimensional Euclidean space. The inner product in \(\mathbb {R}^n\) is defined by \(\langle x,y\rangle := x^Ty\) for all \(x,y \in \mathbb {R}^n\). The non-negative orthant of \(\mathbb {R}^n\) is denoted by \(\mathbb {R}^n_{+}\) and is defined by \(\mathbb {R}^n_{+}:=\{(x_1,\ldots ,x_n) \in \mathbb {R}^n \ | \ x_i \ge 0, \ i=1,\ldots ,n\}\). The closed ball with center x and radius r is denoted by \(\overline{\mathbb {B}}(x,r)\). We use \(\mathbf{e}_n\) to denote the vector in \(\mathbb {R}^n\) whose elements are all one. For a multi-index \(\alpha =(\alpha _1,\ldots ,\alpha _n)\), the monomial \(x^{\alpha }\) is defined by \(x^{\alpha }=x_1^{\alpha _1}\cdots x_n^{\alpha _n}\). Denote by \(\mathbb {R}[\underline{x}]\) the ring of polynomials in \(x := (x_1,x_2, \ldots , x_n)\) with real coefficients. The degree of a real polynomial f is denoted by \(\mathrm{deg}f\). We say that a real polynomial \(f\in \mathbb {R}[\underline{x}]\) is a sum of squares (SOS) polynomial if there exist real polynomials \(f_j, j = 1,\ldots , r,\) such that \(f=\sum _{j=1}^rf_j^2\). The set of all SOS real polynomials in the variable x is denoted by \(\Sigma ^2[\underline{x}]\), and the set of all SOS real polynomials in the variable x with degree at most d is denoted by \(\Sigma ^2_d[\underline{x}]\). An important property of SOS polynomials is that checking whether or not a given polynomial of degree d is a sum of squares is equivalent to solving a semidefinite linear programming problem (see [10, 13, 14, 20, 24, 28, 32]).
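As an illustration of this equivalence, here is a minimal sketch in Python (our own toy example, assuming the modeling package cvxpy is available): certifying that \(f(x)=x^4+2x^2+1=(x^2+1)^2\) is SOS amounts to finding a positive semidefinite Gram matrix Q with \(f=z^TQz\) for the monomial vector \(z=(1,x,x^2)\), which is a semidefinite feasibility problem.

```python
import cvxpy as cp

# Is f(x) = x^4 + 2x^2 + 1 a sum of squares?  With z = [1, x, x^2],
# f is SOS iff f = z^T Q z for some symmetric positive semidefinite Q.
Q = cp.Variable((3, 3), symmetric=True)
constraints = [
    Q >> 0,
    Q[0, 0] == 1,                # constant term
    2 * Q[0, 1] == 0,            # coefficient of x
    2 * Q[0, 2] + Q[1, 1] == 2,  # coefficient of x^2
    2 * Q[1, 2] == 0,            # coefficient of x^3
    Q[2, 2] == 1,                # coefficient of x^4
]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)  # 'optimal' certifies that f is SOS
```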

The quadratic module \(\mathbf{M}(g;h)=\mathbf{M}(-g_1,\ldots ,-g_m;h_1,\ldots ,h_q)\subset \mathbb {R}[\underline{x}]\) associated with the semi-algebraic set K, defined by

$$\begin{aligned} K\,:=\,\{x\,:\,g_j(x) \le 0,\quad \,j=1,\ldots ,m;\quad h_s(x)=0, \,s=1,\ldots ,q\}, \end{aligned}$$
(2.1)

is given by

$$\begin{aligned} \mathbf{M}(g;h):= & {} \left\{ \sigma _0-\sum _{j=1}^m\sigma _j(x)g_j(x)+\sum _{s=1}^q\phi _s(x)h_s(x) :\right. \\&\left. \sigma _j\,\in \,\Sigma ^2[x],\, j=0,1,\ldots ,m;\quad \phi _s \in \mathbb {R}[x], \, s=1,\ldots ,q\right\} . \end{aligned}$$

We note that any member of the quadratic module \(\mathbf {M}(g;h)\) is a polynomial which is non-negative over the set K. The quadratic module \(\mathbf {M}(g;h)\) is called Archimedean [20, 24] if there exists a polynomial \(\psi \in \mathbf {M}(g;h)\) such that \(\{x: \psi (x) \ge 0\}\) is compact. A necessary condition for the quadratic module \(\mathbf {M}(g;h)\) to be Archimedean is that the set K is a compact set.

Lemma 2.1

(Putinar positivstellensatz) [29] Let \(f,g_j,h_s\), \(j=1,\ldots ,m\), \(s=1,\ldots ,q\), be real polynomials with \(K:=\{x:g_j(x) \le 0,j=1,\ldots ,m, h_s(x)=0, s=1,\ldots ,q\} \ne \emptyset \). Let \(\mathbf{M}(g;h)\) be Archimedean. If \(f(x)>0\) for all \(x \in K\), then \(f \in \mathbf{M}(g;h)\).

We now introduce a sparse version of the Putinar positivstellensatz, which was derived by Lasserre [21] and later improved by Kojima and Muramatsu [18]. To do this, we need the definitions of the support of a polynomial and of the running intersection property. Recall that, for a polynomial \(f(x)=\sum _{\alpha }f_{\alpha }x^{\alpha }\) on \(\mathbb {R}^n\) with degree d, the support of f is denoted by \(\mathrm{supp}f\) and is defined by

$$\begin{aligned} \mathrm{supp}\,f=\left\{ \alpha \in (\mathbb {N} \cup \{0\})^n : f_{\alpha } \ne 0\right\} . \end{aligned}$$

Definition 2.1

(Running intersection property) Let \(I_l\), \(l=1,\ldots ,p\) be index sets such that \(\bigcup _{l=1}^p I_l=\{1,\ldots ,n\}\). We say that the running intersection property holds for \(I_l\), \(l=1,\ldots ,p\), if for each \(l=1,\ldots ,p-1\), there exists \(s \le l\) such that

$$\begin{aligned} I_{l+1} \cap \left( \bigcup _{j=1}^l I_j\right) \subseteq I_s. \end{aligned}$$

We note that the running intersection property has its roots in graph theory, and is satisfied by many polynomial optimization problems with structured sparsity patterns. In particular, it is known that the maximal cliques of a chordal graph (after possible reordering) satisfy the running intersection property [35, Theorem 3.5]. For a detailed discussion, see [18].
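For concreteness, the following small Python sketch (our own; the function name is hypothetical) checks the running intersection property directly from Definition 2.1.

```python
def has_running_intersection_property(index_sets):
    """Check Definition 2.1 for a list of index sets I_1, ..., I_p,
    each given as a Python set."""
    union = set(index_sets[0])
    for l in range(1, len(index_sets)):
        overlap = index_sets[l] & union
        # The overlap with all earlier sets must lie in a single earlier set.
        if not any(overlap <= index_sets[s] for s in range(l)):
            return False
        union |= index_sets[l]
    return True

# The chain I_l = {l, l+1} used in Sect. 4.2 satisfies the property:
print(has_running_intersection_property([{1, 2}, {2, 3}, {3, 4}]))  # True
```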

Lemma 2.2

(Sparse version of Putinar positivstellensatz [18, 21]) Let \(f(x)=\sum _{l=1}^p f^l(x)\), \(g_j(x)=\sum _{\alpha }(g_j)_{\alpha }x^{\alpha }\), \(j=1,\ldots ,m\), and \(h_s(x)=\sum _{\alpha }(h_s)_{\alpha }x^{\alpha }\), \(s=1,\ldots ,q\), be polynomials on \(\mathbb {R}^n\) with degree d. Let \(I_l\) be a set of indices such that \(\mathrm{supp}f^l \subseteq I_l \subseteq \{1,\ldots ,n\}\), \(l=1,\ldots ,p\), and \(\bigcup _{l=1}^p I_l=\{1,\ldots ,n\}\). Suppose that for each \(j=1,\ldots ,m\) and \(s=1,\ldots ,q\), \(\mathrm{supp}g_j \cup \mathrm{supp}h_s \subseteq I_l\) for some \(l \in \{1,\ldots ,p\}\), and the running intersection property holds for \(I_l\), \(l=1,\ldots ,p\). Let \(K:=\{x:g_j(x) \le 0,j=1,\ldots ,m, h_s(x)=0, s=1,\ldots ,q\} \ne \emptyset \) and let \(\mathbf{M}(g;h)\) be Archimedean. If \(f(x)>0\) on K, then

$$\begin{aligned} f=\sum _{l=1}^p\left( \sigma _{0l}-\sum _{j=1}^m \sigma _{jl} g_j+\sum _{s=1}^q \phi _{sl} h_s\right) \end{aligned}$$

where \(\sigma _{jl}\), \(j=0,1,\ldots ,m\), are SOS polynomials with variables \(\{x_i:i \in I_l\}\), and \(\phi _{sl}\), \(s=1,\ldots ,q\), are real polynomials with variables \(\{x_i:i \in I_l\}\).

We note that the assumption “for each \(j=1,\ldots ,m\) and \(s=1,\ldots ,q\), \(\mathrm{supp}g_j \cup \mathrm{supp}h_s \subseteq I_l\) for some \(l \in \{1,\ldots ,p\}\)” and the running intersection property are automatically satisfied in the special case where \(p=1\) and \(I_1=\{1,\ldots ,n\}\). So, in this case, the sparse version of the Putinar positivstellensatz reduces to the Putinar positivstellensatz.

2.1 Coercivity of polynomials

We now recall the definition of coercive polynomials, which plays an important role in our analysis of positivity over noncompact sets later on.

Definition 2.2

(Coercivity and strong coercivity) Let \(f(x)=\sum _{\alpha }f_{\alpha }x^{\alpha }\) be a polynomial on \(\mathbb {R}^n\) with degree d. Let \(f(x)=\sum _{i=0}^d f_i(x)\) where each \(f_i\) is a homogeneous polynomial of degree i, \(i=0,1,\ldots ,d\). We say the polynomial f is

  • coercive if \(f(x) \rightarrow +\infty \) whenever \(\Vert x\Vert \rightarrow +\infty \);

  • s-strongly coercive for some \(s \in \{1,\ldots ,d\}\) if \(f_{s}(x)>0\) for all \(x \ne 0\) and \(f_{i}(x) \ge 0\) for all \(s+1 \le i \le d\) and \(x \in \mathbb {R}^n\);

  • strongly coercive if f is d-strongly coercive.

It follows from the definition that an s-strongly coercive polynomial, \(s \in \{1,\ldots ,d\}\), must be coercive. The converse, however, is not true. As an example, the 2-dimensional Rosenbrock function

$$\begin{aligned} f(x_1,x_2)= 1+\left( x_2-x_1^2\right) ^2+\left( 1-x_2\right) ^2 \end{aligned}$$

is a coercive polynomial which is not s-strongly coercive for any \(s=1,2,3,4\). We also note that it was shown in [11] that strong coercivity can be checked numerically by solving semidefinite programming problems. Furthermore, any polynomial of the form \(\sum _{i=1}^na_ix_i^d+ \sum _{|\alpha |\le d-1}h_{\alpha }x^{\alpha }\), where \(|\alpha |=\sum _{i=1}^n \alpha _i\) with \(\alpha =(\alpha _1,\ldots ,\alpha _n) \in (\mathbb {N} \cup \{0\})^n\), \(d \in 2\mathbb {N}\) and \(a_i>0\), \(i=1,\ldots ,n\), is a strongly coercive polynomial.
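As a quick symbolic illustration (a sketch of our own, assuming the Python package sympy), one can decompose the Rosenbrock function above into its homogeneous parts and see why strong coercivity fails: its top form is \(f_4(x)=x_1^4\), which vanishes at \((0,1)\ne 0\).

```python
import sympy as sp

# Decompose f(x1, x2) = 1 + (x2 - x1^2)^2 + (1 - x2)^2 into homogeneous
# parts f_i and inspect the top-degree form f_4.
x1, x2 = sp.symbols('x1 x2')
f = 1 + (x2 - x1**2)**2 + (1 - x2)**2
parts = {}
for monom, coeff in sp.Poly(f, x1, x2).terms():
    deg = sum(monom)
    parts[deg] = parts.get(deg, 0) + coeff * x1**monom[0] * x2**monom[1]
print(sp.expand(parts[4]))            # x1**4: the top homogeneous form f_4
print(parts[4].subs({x1: 0, x2: 1}))  # 0 at (0, 1) != 0, so f_4 is not
                                      # positive definite: f is coercive but
                                      # not 4-strongly coercive
```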

The following proposition shows that a coercive polynomial always has bounded lower level sets. Moreover, an explicit bound on the lower level set can be computed from the coefficients of the underlying polynomial if the polynomial is s-strongly coercive for some \(s=1,\ldots ,d\).

Proposition 2.1

(Boundedness of the lower level set via coercivity) Let \(f(x)=\sum _{\alpha }f_{\alpha }x^{\alpha }\) be a coercive polynomial on \(\mathbb {R}^n\) with degree d. Let \(f(x)=\sum _{i=0}^d f_i(x)\), where each \(f_i\) is a homogeneous polynomial with degree i, \(i=0,1,\ldots ,d\). Then, for each \(c \in \mathbb {R}\), the lower level set \(\{x:f(x) \le c\}\) is a compact set. Furthermore, if f is s-strongly coercive for some \(s=1,\ldots ,d\), then \(\{x: f(x) \le c\} \subseteq \overline{\mathbb {B}}(0,r)\), where

$$\begin{aligned} r=\max \left\{ 1, \displaystyle \frac{c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }|}{\rho _{s}}\right\} \quad \text{ and }\quad \rho _{s}=\min \{f_{s}(x):\Vert x\Vert =1\}. \end{aligned}$$

Proof

Fix any \(c \in \mathbb {R}\). We first show that the lower level set \(\{x:f(x) \le c\}\) is compact. Suppose on the contrary that there exists an unbounded sequence \(\{x_n\}_{n=1}^{\infty }\) with \(f(x_n) \le c\) for all n. By passing to a subsequence if necessary, we may assume that \(\Vert x_n\Vert \rightarrow +\infty \). As f is a coercive polynomial, we must have \(f(x_n) \rightarrow +\infty \). This contradicts the fact that \(f(x_n) \le c\) for all \(n \in \mathbb {N}\).

To see the second assertion, we assume that f is s-strongly coercive for some \(s=1,\ldots ,d\). Let \(a \in \mathbb {R}^n\) be any point such that \(f(a) \le c\). Then,

$$\begin{aligned} \rho _{s} \Vert a\Vert ^{s} \le f_{s}(a) \le \sum _{j=s}^d f_j(a)= f(a)-\sum _{j=0}^{s-1} f_j(a) \le c-\sum _{j=0}^{s-1} f_j(a) \le c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }|\, |a^{\alpha }|. \end{aligned}$$

This gives us that either \(\Vert a\Vert \le 1\) or

$$\begin{aligned}&\rho _{s} \Vert a\Vert ^{s} \le c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }|\, |a^{\alpha }| \le c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }| \Vert a\Vert ^{|\alpha |}\\&\quad \le \left( c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }|\right) \Vert a\Vert ^{s-1}, \end{aligned}$$

where the second inequality follows from the fact that \(|x^{\alpha }| \le \Vert x\Vert ^{|\alpha |}\) for all \(\Vert x\Vert \ge 1\) and the last inequality is from the fact that \(\Vert x\Vert ^{|\alpha |} \le \Vert x\Vert ^{s-1}\) for all \(|\alpha | \le s-1\) and \(\Vert x\Vert \ge 1\). So, we have

$$\begin{aligned} \Vert a\Vert \le \max \left\{ 1, \displaystyle \frac{c+\sum _{0 \le |\alpha | \le s-1}|f_{\alpha }|}{\rho _{s}}\right\} , \end{aligned}$$

and hence, the conclusion follows. \(\square \)
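The bound of Proposition 2.1 is easy to evaluate numerically. The following Python sketch (a toy example of our own, with \(\rho _s\) estimated by sampling; an SOS computation could make it exact) computes r for the strongly coercive polynomial \(f(x)=x_1^4+x_2^4-2x_1x_2+x_1\), for which \(s=d=4\) and \(\rho _4=1/2\).

```python
import numpy as np

# Level-set bound of Proposition 2.1 for f(x) = x1^4 + x2^4 - 2*x1*x2 + x1:
# rho_4 = min{x1^4 + x2^4 : ||x|| = 1}, estimated here by sampling the circle.
rng = np.random.default_rng(0)
V = rng.normal(size=(100_000, 2))
V /= np.linalg.norm(V, axis=1, keepdims=True)
rho4 = np.min(V[:, 0]**4 + V[:, 1]**4)  # close to the exact value 1/2
sum_low = abs(-2.0) + abs(1.0)          # sum of |f_alpha| over 0 <= |alpha| <= 3
c = 10.0                                # the level defining {x : f(x) <= c}
r = max(1.0, (c + sum_low) / rho4)
print(rho4, r)  # rho4 ~ 0.5, r ~ 26: the level set lies in the ball B(0, r)
```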

Corollary 2.1

Let \(f(x)=\sum _{\alpha }f_{\alpha }x^{\alpha }\) and \(g_j(x)=\sum _{\alpha }(g_j)_{\alpha }x^{\alpha }\), \(j=1,\ldots ,m\), be polynomials on \(\mathbb {R}^n\) with degree d.

  (i)

    If there exist \(\mu _j \ge 0\), \(j=0,1,\ldots ,m\), such that \(\mu _0 f+\sum _{j=1}^m \mu _j g_j\) is coercive, then, for each \(c \in \mathbb {R}\), the set \(\{x: g_j(x) \le 0, j=1,\ldots ,m, f(x) \le c\}\) is a compact set.

  (ii)

    If, in addition, \(\mu _0 f+\sum _{j=1}^m \mu _j g_j\) is s-strongly coercive for some \(s\in \{1,\ldots ,d\}\), then

    $$\begin{aligned} \{x: g_j(x) \le 0, \quad j=1,\ldots ,m, f(x) \le c\} \subseteq \overline{\mathbb {B}}(0,r), \end{aligned}$$

    where

    $$\begin{aligned} r=\max \left\{ 1, \displaystyle \frac{\mu _0 c+\sum _{0 \le |\alpha | \le s-1}|\mu _0f_{\alpha }+\sum _{j=1}^m\mu _j(g_j)_{\alpha }|}{\rho _{s}}\right\} \end{aligned}$$

    and

    $$\begin{aligned} \rho _s=\min \left\{ \left( \mu _0 f+\sum _{j=1}^m\mu _jg_j\right) _{s}(v):\Vert v\Vert =1\right\} . \end{aligned}$$

Proof

Note that \(\{x: g_j(x) \le 0, j=1,\ldots ,m, f(x) \le c\} \subseteq \{x: \mu _0 f(x)+\sum _{j=1}^m \mu _j g_j(x) \le \mu _0 c\}\). The conclusion follows by applying Proposition 2.1 with f replaced by \(\mu _0 f+\sum _{j=1}^m \mu _j g_j\). \(\square \)

3 A sparse hierarchy for optimization over unbounded sets

Consider the polynomial optimization problem

$$\begin{aligned} (P)\qquad \min _{x \in \mathbb {R}^n}\ \sum _{l=1}^p f^l(x) \quad \text{ subject } \text{ to }\quad g_j(x) \le 0, \ j=1,\ldots ,m, \quad h_s(x)=0, \ s=1,\ldots ,q, \end{aligned}$$

where \(f^l,g_j,h_s\) are (nonconvex) polynomials on \(\mathbb {R}^n\), \(l=1,\ldots ,p\), \(j=1,\ldots ,m\) and \(s=1,\ldots ,q\). Let the feasible set be denoted by K, that is, \(K=\{x: g_j(x) \le 0, j=1,\ldots ,m, h_s(x)=0, \, s=1,\ldots ,q\}\).

Let \(f(x)=\sum _{l=1}^p f^l(x)\), \(g_j(x)=\sum _{\alpha }(g_j)_{\alpha }x^{\alpha }\), \(j=1,\ldots ,m\), and \(h_s(x)=\sum _{\alpha } (h_s)_{\alpha } x^{\alpha }\), \(s=1,\ldots ,q\), be polynomials on \(\mathbb {R}^n\) with degree d. For simplicity, we assume here that the objective and constraint polynomials have the same degree: one can always let d be the maximum of the degrees of f, \(g_j\), \(j=1,\ldots ,m\), and \(h_s\), \(s=1,\ldots ,q\), and regard f, \(g_j\) and \(h_s\) as polynomials of degree d.

Let \(I_l\) be a set of indices such that \(\mathrm{supp}f^l \subseteq I_l \subseteq \{1,\ldots ,n\}\), \(l=1,\ldots ,p\) and \(\bigcup _{l=1}^p I_l=\{1,\ldots ,n\}\). Let \(x^0 \in \{x: g_j(x) \le 0, j=1,\ldots ,m\}\) and let c be a number such that \(c>\max _{1 \le l \le p}\{f^l(x^0)\}\). For each integer k,  we define the truncated sparse version of the quadratic module \(\bar{\mathbf {M}}_k\) by

$$\begin{aligned} \bar{\mathbf {M}}_k:= & {} \left\{ \sum _{l=1}^p\bigg (\sigma _{0l} - \sum _{j=1}^m \sigma _{jl} g_{j}+ \bar{\sigma }_l (c - f^l) + \sum _{s=1}^q \phi _{sl} h_s \bigg ) \ \Big | \ \sigma _{0l}, \sigma _{jl},\bar{\sigma }_l \in \Sigma ^2[\underline{x_l}], \ \phi _{sl} \in \mathbb {R}[\underline{x_l}], \right. \\&\quad \qquad \left. \deg \sigma _{0l} \le 2k, \ \deg \sigma _{jl} g_{j} \le 2k, \ \deg \bar{\sigma }_l (c - f^l) \le 2k, \ \deg \phi _{sl} h_s \le 2k\right\} , \end{aligned}$$

where, for each \(l=1,\ldots ,p\), \(\Sigma ^2[\underline{x_l}]\) denotes the set of all SOS polynomials in the variables \(\{x_i: i \in I_l\}\) and \(\mathbb {R}[\underline{x_l}]\) denotes the set of all real polynomials in the variables \(\{x_i:i \in I_l\}\).

Consider the following relaxation problem

$$\begin{aligned} \bar{f}_k^* := \sup \left\{ \mu \in {\mathbb R} \ | \ f - \mu \in \bar{\mathbf {M}}_{k} \right\} . \end{aligned}$$
(3.1)

By construction, \(\bar{f}^*_k \le \bar{f}^*_{k + 1} \le \cdots \le \min (P)\). Note that if we set \(\bar{\sigma }_l \equiv 0\), \(l=1,\ldots ,p\), then the hierarchy (3.1) reduces to the known sparse SDP hierarchy proposed in [17, 18].

Theorem 3.1

(Convergence of sparse SDP hierarchy) Let \(f(x)=\sum _{l=1}^p f^l(x)\), \(g_j(x)=\sum _{\alpha }(g_j)_{\alpha }x^{\alpha }\), \(j=1,\ldots ,m\), and \(h_s(x)=\sum _{\alpha } (h_s)_{\alpha } x^{\alpha }\), \(s=1,\ldots ,q\), be polynomials on \(\mathbb {R}^n\) with degree d. Let \(I_l\) be a set of indices such that \(\mathrm{supp}f^l \subseteq I_l \subseteq \{1,\ldots ,n\}\), \(l=1,\ldots ,p\), and \(\bigcup _{l=1}^p I_l=\{1,\ldots ,n\}\). Consider the SDP hierarchy (3.1) and denote its optimal value by \(\bar{f}_k^*\). Let \(K=\{x: g_j(x) \le 0, j=1,\ldots ,m, \, h_s(x)=0, \, s=1,\ldots ,q\}\), \(x^0 \in K\) and \(c>\max _{1 \le l \le p}\{f^l(x^0)\}\).

Suppose that for each \(j=1,\ldots ,m\) and \(s=1,\ldots ,q\), \(\mathrm{supp}g_j \cup \mathrm{supp}h_s \subseteq I_l \text{ for } \text{ some } l \in \{1,\ldots ,p\}\), and the running intersection property holds for \(I_l\), \(l=1,\ldots ,p\). If there exist \(\mu _{0l} \ge 0\), \(l=1,\ldots ,p\), \(\mu _j \ge 0\), \(j=1,\ldots ,m\), and \(\tau _s \in \mathbb {R}\), \(s=1,\ldots ,q\), such that \(\sum _{l=1}^p\mu _{0l} f^l+\sum _{j=1}^m \mu _j g_j+\sum _{s=1}^q \tau _s h_s\) is coercive, then \(\lim _{k \rightarrow \infty }\bar{f}_k^*=\min (P)\).

Proof

Let \(\epsilon >0\). Then, we have \(f-\min (P)+\epsilon >0\) over the feasible set K. As there exist \(\mu _{0l} \ge 0\), \(l=1,\ldots ,p\), \(\mu _j \ge 0\), \(j=1,\ldots ,m\), and \(\tau _s \in \mathbb {R}\), \(s=1,\ldots ,q\), such that \(\sum _{l=1}^p\mu _{0l} f^l+\sum _{j=1}^m \mu _j g_j+\sum _{s=1}^q \tau _s h_s\) is coercive, we see that \(\sum _{l=1}^p\mu _{0l} (f^l-c)+\sum _{j=1}^m \mu _j g_j+\sum _{s=1}^q \tau _s h_s\) is also coercive. It follows from Proposition 2.1 that \(\{x: \sum _{l=1}^p\mu _{0l} (c-f^l(x))+\sum _{j=1}^m \mu _j (-g_j(x))+\sum _{s=1}^q (-\tau _s) h_s(x) \ge 0\}\) is compact. So, by definition, \(\mathbf{M}(-g_1,\ldots ,-g_m,c-f^1,\ldots ,c-f^p;h_1,\ldots ,h_q)\) is Archimedean. Note that \(f-\min (P)+\epsilon >0\) over \(\hat{K}\), where

$$\begin{aligned} \hat{K}=\{x: g_j(x) \le 0, j=1,\ldots ,m, f^l(x)- c \le 0, l=1,\ldots ,p, h_s(x)=0, \, s=1,\ldots ,q\}. \end{aligned}$$

Hence, by Lemma 2.2, we obtain that

$$\begin{aligned} f-\min (P)+\epsilon =\sum _{l=1}^p\left( \sigma _{0l}-\sum _{j=1}^m \sigma _{jl} g_j+\bar{\sigma }_l (c-f^l) + \sum _{s=1}^q \phi _{sl} h_s \right) , \end{aligned}$$

for some sum-of-squares polynomials \(\sigma _{0l},\ldots ,\sigma _{ml},\bar{\sigma }_{l}\) in the variables \(\{x_i:i \in I_l\}\) and real polynomials \(\phi _{sl}\) in the variables \(\{x_i:i \in I_l\}\), \(l=1,\ldots ,p\). Thus, for each \(\epsilon >0\), there exists \(k_0 \in \mathbb {N}\) such that \(\bar{f}_{k_0}^* \ge \min (P)-\epsilon \). On the other hand, from the construction of the hierarchy, we see that \(\bar{f}^*_k \le \bar{f}^*_{k + 1} \le \cdots \le \min (P)\). Therefore, the conclusion follows. \(\square \)

Remark 3.1

It is worth noting that if a polynomial f on \(\mathbb {R}^n\) is convex in the sense that \(\nabla ^2 f(x)\) is positive semi-definite for all \(x \in \mathbb {R}^n\), and there exists \(x^* \in \mathbb {R}^n\) such that \(\nabla ^2 f(x^*)\) is positive definite, then f is coercive (see, for example, [12]). Therefore, the coercivity assumption of Theorem 3.1, namely that “there exist \(\mu _{0l} \ge 0\), \(l=1,\ldots ,p\), \(\mu _j \ge 0\), \(j=1,\ldots ,m\), and \(\tau _s \in \mathbb {R}\), \(s=1,\ldots ,q\), such that \(\sum _{l=1}^p\mu _{0l} f^l+\sum _{j=1}^m \mu _j g_j+\sum _{s=1}^q \tau _s h_s\) is coercive”, is satisfied if one of the polynomials \(f^l,g_j,h_s\), \(l=1,\ldots ,p\), \(j=1,\ldots ,m\), \(s=1,\ldots ,q\), is convex and has a positive definite Hessian at some point \(x^* \in \mathbb {R}^n\).

In the special case where \(p=1\), \(I_1=\{1,\ldots ,n\}\) and each \(h_s \equiv 0\), \(s=1,\ldots ,q\), the optimization problem (P) reduces to

$$\begin{aligned} \min _{x \in \mathbb {R}^n}\ f(x) \quad \text{ subject } \text{ to }\quad g_j(x) \le 0, \quad j=1,\ldots ,m. \end{aligned}$$

In this case, the sparse truncated quadratic module \(\bar{\mathbf {M}}_k\) reduces to the truncated quadratic module \(\mathbf {M}_k\) generated by the polynomials \(c - f\) and \(-g_1, \ldots , - g_m\) given by

$$\begin{aligned} \mathbf {M}_k := & {} \ \left\{ \sigma _0 - \sum _{j= 1}^{m} \sigma _j g_j + \bar{\sigma }(c - f) \quad \Big | \quad \sigma _0,\sigma _j,\bar{\sigma }\in \Sigma ^2[\underline{x}] \subset {\mathbb R}[\underline{x}], \right. \\&\qquad \left. \deg \sigma _0 \le 2k, \ \deg \sigma _j g_j \le 2k, \text{ and } \deg \bar{\sigma }(c - f) \le 2k\right\} , \end{aligned}$$

and the corresponding relaxation problem collapses to

$$\begin{aligned} f_k^* := \sup \left\{ \mu \in {\mathbb R} \ | \ f - \mu \in \mathbf {M}_{k} \right\} . \end{aligned}$$
(3.2)

So, Theorem 3.1 reduces to the convergence result for the dense SDP hierarchy for polynomial optimization problems over noncompact sets proposed in [11].
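To make the dense relaxation (3.2) concrete, the following is a minimal sketch in Python with the cvxpy package (an assumption on our part; SparsePOP plays this role in the experiments below) of level \(k=2\) of (3.2) for a toy univariate instance of our own: minimize \(f(x)=x^4-3x\) over the unbounded set \(K=\{x: -x \le 0\}\), with \(g_1(x)=-x\), \(x^0=0\) feasible and \(c=1>f(x^0)\). The coefficients of \(f-\mu \) are matched against \(\sigma _0-\sigma _1 g_1+\bar{\sigma }(c-f)\) degree by degree; at this degree bound, \(\bar{\sigma }\) is a nonnegative constant.

```python
import cvxpy as cp

# Level k = 2 of the dense relaxation (3.2) for the toy problem
#   minimize f(x) = x^4 - 3x  subject to  g1(x) = -x <= 0,
# with c = 1 > f(0).  We maximize mu subject to
#   f - mu = s0(x) + x*s1(x) + t*(c - f),
# where s0, s1 are SOS (Gram matrices Q, P) and t = sbar is a
# nonnegative constant, matching coefficients degree by degree.
c = 1.0
mu = cp.Variable()
Q = cp.Variable((3, 3), symmetric=True)  # Gram matrix of s0 on [1, x, x^2]
P = cp.Variable((2, 2), symmetric=True)  # Gram matrix of s1 on [1, x]
t = cp.Variable(nonneg=True)
constraints = [
    Q >> 0, P >> 0,
    Q[2, 2] - t == 1,                          # x^4
    2 * Q[1, 2] + P[1, 1] == 0,                # x^3
    2 * Q[0, 2] + Q[1, 1] + 2 * P[0, 1] == 0,  # x^2
    2 * Q[0, 1] + P[0, 0] + 3 * t == -3,       # x^1
    Q[0, 0] + c * t == -mu,                    # constant term
]
cp.Problem(cp.Maximize(mu), constraints).solve()
print(mu.value)  # ~ -2.044, the global minimum of f over {x >= 0}
```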

Remark 3.2

(Locating a global minimizer) It is worth noting that, in addition to the assumptions of Theorem 3.1, if we further assume that problem (P) has a unique solution \(\bar{x}\), then we can also find the global minimizer \(\bar{x}\) by solving a sequence of semidefinite programming problems which are the dual problems of the equivalent SDP reformulation of (3.2). The details are omitted here and they can be found in [20, 21].

4 Numerical experiments

In this section, we show the effectiveness of the sparse SDP hierarchy proposed in Sect. 3 by solving some numerical test problems with unbounded feasible sets. All the numerical tests were conducted on a computer with a 2.8 GHz Intel Core i7 and 8 GB RAM, running Matlab 7.14 (R2012a).

The purpose of the numerical experiments is to illustrate how our proposed sparse SDP hierarchy works for solving polynomial optimization problems with unbounded feasible sets. We therefore first select some known test problems whose objectives are nonconvex coercive polynomials and minimize them over unbounded feasible sets. The selected test problems are:

  (1)

    a nonconvex quadratic programming problem with an unbounded feasible set, on which Lasserre's hierarchy is known to fail;

  (2)

    the Rosenbrock function over the nonnegative orthant;

  (3)

    the Chained-Wood function over the nonnegative orthant.

For the numerical tests, we first form the proposed sparse SDP hierarchy (3.1) for the above test problems. Then, we use the Matlab software SparsePOP [37] to solve the corresponding sparse SDP problems. Let \(f(x)=\sum _{l=1}^q f^l(x)\). We note that the proposed sparse SDP hierarchy (3.1) can be obtained from the existing sparse SDP hierarchy used in [18, 36, 37], designed for problems with compact feasible sets, by adding the constraints \(f^l(x) \le c\), \(l=1,\ldots ,q\), to the test problems with unbounded feasible sets, where c is appropriately chosen. SparsePOP can solve polynomial optimization problems exploiting the sparsity described in Sect. 3 by setting the parameter sparseSW = 1, and can also implement Lasserre's (dense) hierarchy by setting sparseSW = 0. In addition, a parameter called the relaxation order can be chosen in SparsePOP, depending on the degree of the polynomial optimization problem. The larger the relaxation order, the better the approximation to the optimal value of the polynomial optimization problem that can be expected.

In SparsePOP, the accuracy of an obtained objective value is computed by

$$\begin{aligned} \text{ Relative } \text{ objective } \text{ error } \text{(Rel.Er) } = \frac{ \text{ POP.objValue } - \text{ SDP.objValue } }{ \max \{ 1, | \text{ POP.objValue } | \} }, \end{aligned}$$

where POP.objValue means the value of the objective function of a polynomial optimization problem computed using a candidate for the optimal solution, and SDP.objValue the value of the objective function of the SDP relaxation of the polynomial optimization problem. Moreover, CPU time reported in the subsequent discussion is measured in seconds. For details, we refer to [37].
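In code, this is just the following one-liner (a sketch; the function name is ours):

```python
def relative_objective_error(pop_obj, sdp_obj):
    """Rel.Er as defined above."""
    return (pop_obj - sdp_obj) / max(1.0, abs(pop_obj))

print(relative_objective_error(27.9629, 27.9628))  # ~ 3.6e-06
```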

4.1 A 2-dimensional QP with an unbounded feasible set

Consider the following nonconvex quadratic optimization problem:

(4.1)

It was shown in [23, 26] that the global minimizers are \((\pm \frac{M+\sqrt{M^2+4}}{2},\pm 1)\) and the global minimum is \(1+\frac{2M^2+4+2M\sqrt{M^2+4}}{4}\). For instance, if \(M=5\), then the global minimizers are \((\pm 5.1926,\pm 1)\) and the global minimum is 27.9629.

Let \(M=5\). It was shown in [26] that Lasserre's hierarchy only provides the lower bound 2, no matter how large the relaxation order k is chosen. As explained in [26], the main reason why Lasserre's hierarchy fails to achieve the global minimum of this quadratic problem is that the feasible set of this nonconvex quadratic problem is unbounded.

Notice that the objective \(f(x_1,x_2):=x_1^2+x_2^2\) is coercive, so the proposed hierarchy applies. Since the problem has only two variables and no sparsity structure to exploit, we test the problem using our dense SDP hierarchy (3.2) with \(c=40\). Note that \(c>f(6,1)=37\) and the point (6, 1) is feasible for this quadratic optimization problem.

As shown in Table 1, by running SparsePOP with sparseSW = 0 and relaxation order 4, the relaxation problem in the dense hierarchy (3.2) solves the original problem and returns a good approximation of the true global minimizer (5.1926, 1.000). Table 1 summarizes the optimal values of the dense SDP hierarchy (3.2) with relaxation orders \(k=2,3,4\), which illustrates the effectiveness of our approach.

Table 1 Numerical test on problem (QP2)

4.2 The Rosenbrock function over the nonnegative orthant

The Rosenbrock function is described as

$$\begin{aligned} f_R(x_1,\ldots ,x_n)=1+\sum _{i=2}^n \big ((x_i-x_{i-1}^2)^2+(1-x_i)^2\big ),\quad \ n\ge 2. \end{aligned}$$

Clearly, \(f_R\) is an SOS nonconvex polynomial, and is coercive. We add nonnegativity constraints to obtain a polynomial optimization problem with an unbounded feasible region as follows:

$$\begin{aligned} (EP_R)\qquad \min _{x \in \mathbb {R}^n}\ f_R(x) \quad \text{ subject } \text{ to }\quad x_j \ge 0, \quad j=1,\ldots ,n. \end{aligned}$$

It can be easily verified that this problem has a unique global minimizer \(\mathbf{e}_n:=(\underbrace{1,\ldots ,1}_n)\).

Let \(I_l=\{l,l+1\}\), \(l=1,\ldots ,n-1\), and \(g_j(x)=-x_j\), \(j=1,\ldots ,n\), so that the constraints of \((EP_R)\) take the form \(g_j(x) \le 0\) used in (2.1). Then, the assumptions in Theorem 3.1 are satisfied as

$$\begin{aligned} \mathrm{supp}g_j=\{j\} \subseteq \left\{ \begin{array}{lll} I_j, &{}\quad \text{ if } &{} j=1,\ldots ,n-1, \\ I_{n-1}, &{}\quad \text{ if } &{} j=n, \end{array} \right. \end{aligned}$$

and for each \(l=1,\ldots ,n-2\),

$$\begin{aligned} I_{l+1} \cap \left( \bigcup _{j=1}^l I_j\right) =\{l+1\} \subseteq I_l. \end{aligned}$$

Therefore, according to Theorem 3.1, the optimal value of the proposed sparse SDP hierarchy (3.1) converges to the global minimum of \((EP_R)\).
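As a quick sanity check (a Python sketch of our own), one can evaluate the objective of \((EP_R)\) at the claimed minimizer and list the first few index sets \(I_l\):

```python
import numpy as np

# Rosenbrock-type objective of (EP_R) and the block index sets I_l = {l, l+1}.
def f_R(x):
    x = np.asarray(x, dtype=float)
    return 1.0 + np.sum((x[1:] - x[:-1]**2)**2 + (1.0 - x[1:])**2)

n = 10_000
print(f_R(np.ones(n)))                    # 1.0, the global minimum at e_n
print([{l, l + 1} for l in range(1, 4)])  # [{1, 2}, {2, 3}, {3, 4}]
```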

We now test our proposed sparse SDP hierarchy (3.1) on the global optimization problem \((EP_R)\) for different dimensions n with \(c=2\). From Table 2, we see that for \(n=10{,}000\) or \(n=20{,}000\), SparsePOP returns an accurate solution with relaxation order 2 in the proposed sparse SDP hierarchy (3.1).

Table 2 Numerical tests on the problem \((EP_R)\)

When the dimension n of the problem is large, directly applying the dense SDP hierarchy proposed in [11] without exploiting the sparsity leads to very large SDP problems which cannot be handled by an SDP solver such as SeDuMi [34]. Indeed, we confirmed in our numerical computation that the dense SDP hierarchy proposed in [11] can only be used to solve \((EP_R)\) up to dimension 20; problems of dimension larger than 20 resulted in an out-of-memory error.

4.3 The Chained-Wood function over the nonnegative orthant

Let \(n \in 4\mathbb {N}\), where \(\mathbb {N}\) denotes the set of positive integers. Consider the Chained-Wood function given by

$$\begin{aligned} f_C(x_1,\ldots ,x_n)= & {} 1+\sum _{i \in J} \big ((x_{i+1}-x_{i}^2)^2+(1-x_i)^2+90(x_{i+3}^2-x_{i+2})^2+(x_{i+2}-1)^2\\&+\,10(x_{i+1}+x_{i+3}-2)^2+\frac{1}{10}(x_{i+1}-x_{i+3})^2\big ), \end{aligned}$$

where \(J=\{1,3,5,\ldots ,n-3\}\). The function \(f_C\) is an SOS nonconvex polynomial, and is coercive. Adding nonnegativity constraints on the variables \(x_i, \ i=1,\ldots ,n\), results in the following polynomial optimization problem:

$$\begin{aligned} (EP_C)\qquad \min _{x \in \mathbb {R}^n}\ f_C(x) \quad \text{ subject } \text{ to }\quad x_i \ge 0, \quad i=1,\ldots ,n. \end{aligned}$$

Clearly, the feasible set of this polynomial optimization is unbounded, and this problem has a unique global minimizer \(\mathbf{e}_n:=(\underbrace{1,\ldots ,1}_n)\).

Note that \(f_C\) can be equivalently rewritten as

$$\begin{aligned} f_C(x_1,\ldots ,x_n)= & {} 1+\sum _{l=1}^{\frac{n}{2}-1} \big ((x_{2l}-x_{2l-1}^2)^2+(1-x_{2l-1})^2+90(x_{2l+2}^2-x_{2l+1})^2\\&+\,(x_{2l+1}-1)^2+10(x_{2l}+x_{2l+2}-2)^2+\frac{1}{10}(x_{2l}-x_{2l+2})^2\big ). \end{aligned}$$

For each \(l=1,\ldots ,\frac{n}{2}-1\), let \(I_l=\{2l-1,2l,2l+1,2l+2\}\) and \(g_j(x)=-x_j,\) \(j=1,\ldots ,n\). Then, the assumptions in Theorem 3.1 are satisfied as \(\mathrm{supp}\, g_j=\{j\} \subseteq I_{l(j)}\) with \(l(j)=\max \{1,\lceil \frac{j}{2}\rceil -1\}\), and for each \(l=1,\ldots ,\frac{n}{2}-2\),

$$\begin{aligned} I_{l+1} \cap \left( \bigcup _{j=1}^l I_j\right) =\left\{ 2l+1,2l+2\right\} \subseteq I_l. \end{aligned}$$

Therefore, Theorem 3.1 implies that the optimal value of the proposed sparse SDP hierarchy (3.1) converges to the global minimum of \((EP_C)\).

We now test our proposed sparse SDP hierarchy (3.1) on \((EP_C)\) for different values of n with \(c=n+1\). From Table 3, we see that for dimension \(n=10{,}000\) or \(n=20{,}000\), SparsePOP with relaxation order 2 returns an accurate solution in the proposed sparse SDP hierarchy (3.1).

Table 3 Numerical tests on the problem \((EP_C)\)

The numerical experiment using the dense SDP hierarchy proposed in [11] could solve \((EP_C)\) only up to \(n=10\). We observe again that much larger problems can be solved by the sparse SDP hierarchy.

4.4 Further examples

In this part, for further illustration, we consider the following polynomial optimization problem:

$$\begin{aligned} (P_0)\qquad \min _{(x,w) \in \mathbb {R}^n \times \mathbb {R}^n}\ \sum _{i=1}^n w_i^2 \quad \text{ subject } \text{ to }\quad&g_j^l(x^l) \le 0, \quad j=1,\ldots ,m, \ l=1,\ldots ,q,\\&(1-w_i)x_i=0, \quad i=1,\ldots ,n,\\&\Vert x^l\Vert _2^2 \le M^l, \quad l=1,\ldots ,q, \end{aligned}$$

where \(I_l \subseteq \{1,\ldots ,n\}\), \(l=1,\ldots ,q\), \(\bigcup _{l=1}^qI_l=\{1,\ldots ,n\}\), \(n_l=|I_l|\) is the cardinality of \(I_l\), \(I_l=\{i_1,\ldots ,i_{n_l}\}\), and \(g_j^l\) are polynomials, \(j=1,\ldots ,m, \, l=1,\ldots ,q\). Here \(x^l:=(x_{i_1},\ldots ,x_{i_{n_l}})\) and \(w^l:=(w_{i_1},\ldots ,w_{i_{n_l}})\) denote the subvectors of x and w indexed by \(I_l\).

Note that problem \((P_0)\) is a polynomial optimization reformulation of the sparse optimization problem with polynomial inequality constraints. Problems of this kind arise in signal processing and statistics [3, 4] and have received a great deal of attention in the literature. For a recent comprehensive survey, see [2]. Other related references can be found in [33]. The relationship between problem \((P_0)\) and the sparse optimization problem is briefly outlined in the Appendix.

We also note that there are other efficient approaches for solving problem \((P_0)\) (or the corresponding equivalent sparse optimization problem), such as branch and bound and techniques from mathematical programming with equilibrium constraints (see, for example, [6]). Due to the limitations of current SDP solvers, our approach cannot compete with these specialized, numerically tractable approaches. The purpose of this section is to illustrate that our proposed sparse SDP hierarchy can be used to solve large instances of problem \((P_0)\) where the usual dense SDP hierarchy may not be directly applicable.

We now introduce an SDP hierarchy for problem \((P_0)\) using the results in Sect. 3. It is worth noting that problem \((P_0)\) is a minimization problem with variables \((x,w)\in \mathbb {R}^n \times \mathbb {R}^n\), and its feasible set can be unbounded in general (for example, simply take \(g_j^l\equiv 0\); then (0, w) is feasible for problem \((P_0)\) for any \(w \in \mathbb {R}^n\)).

Let \(f(x,w)=\sum _{l=1}^q f_l(x^l,w^l)\), where \(f_l(x^l,w^l)=\sum _{i \in I_l}(w_{i}^l)^2\), \(x=(x^{1},\ldots ,x^{q}) \in \mathbb {R}^{\sum _{l=1}^q n_l}\) and \(w=(w^1,\ldots ,w^{q}) \in \mathbb {R}^{\sum _{l=1}^q n_l}\). For each \(l=1,\ldots ,q\), define \(\bar{g}_{il}(x^l,w^l)=(1-w_i^l)x_i^l\), \(i \in I_l\), and \(G_l(x^l,w^l):=\Vert x^l\Vert _2^2-M^l\). Let c be a number such that \(c>n_l\), \(l=1,\ldots ,q\). For each integer k, we define the following sparse truncated quadratic module \(\hat{\mathbf {M}}_k\), generated by the polynomials \(c - f_l\), \(-g_j^l\), \(j=1,\ldots ,m\), \(\bar{g}_{il}\), \(i \in I_l\), and \(-G_l\), \(l=1,\ldots ,q\), as follows:

$$\begin{aligned} \hat{\mathbf {M}}_k:= & {} \left\{ \sum _{l=1}^q\bigg (\sigma _{0l} - \sum _{j=1}^m \sigma _{jl} g_{j}^l + \sum _{i \in I_l}h_{il} \bar{g}_{il}+\sigma _l (-G_l)+ \bar{\sigma }_l \left( c - f_l\right) \bigg )\ \Big |\ \sigma _{0l},\sigma _{jl},\sigma _l,\bar{\sigma }_l \in \Sigma ^2\left[ \underline{x^l},\underline{w^l}\right] , \right. \\&h_{il} \in \mathbb {R}\left[ \underline{x^l},\underline{w^l}\right] , \ \deg \sigma _{0l} \le 2k, \ \deg \sigma _{jl} g_{j}^l \le 2k, \ \deg h_{il} \bar{g}_{il} \le 2k, \text { and } \\&\left. \deg \sigma _l G_{l} \le 2k, \ \deg \bar{\sigma }_l\left( c - f_l\right) \le 2k\right\} . \end{aligned}$$

Consider the following relaxation problem

$$\begin{aligned} f_k^* := \sup \left\{ \mu \in {\mathbb R} \ | \ f - \mu \in \hat{\mathbf {M}}_{k} \right\} . \end{aligned}$$
(4.2)

Then, one can show that \(\lim _{k \rightarrow \infty }f_k^*=\min (P_0)\) if the running intersection property holds.

Proposition 4.1

(Convergence of the sparse SDP hierarchy for \((P_0)\)) For problem \((P_0)\) and the SDP hierarchy (4.2), assume that the running intersection property holds for \(I_l\), \(l=1,\ldots ,q\). Then, \(\lim _{k \rightarrow \infty }f_k^*=\min (P_0)\).

Proof

Note that the problem \((P_0)\) can be equivalently rewritten in the block form of problem (P) as

$$\begin{aligned} \min _{(x,w)}\ \sum _{l=1}^q f_l(x^l,w^l) \quad \text{ subject } \text{ to }\quad&g_j^l(x^l) \le 0, \quad j=1,\ldots ,m, \ l=1,\ldots ,q,\\&\bar{g}_{il}(x^l,w^l)=0, \quad i \in I_l, \ l=1,\ldots ,q,\\&G_l(x^l,w^l) \le 0, \quad l=1,\ldots ,q. \end{aligned}$$

As \(\bigcup _{l=1}^qI_l=\{1,\ldots ,n\}\), \(\sum _{l=1}^qf_l(x^l,w^l)+\sum _{l=1}^q G_l(x^l,w^l)\) is strongly coercive. So, the assumptions in Theorem 3.1 are satisfied. Therefore, Theorem 3.1 implies that \(\lim _{k \rightarrow \infty }f_k^*=\min (P_0)\). \(\square \)

We now illustrate the SDP hierarchy with a numerical example. To test the effectiveness of the proposed SDP hierarchy, we let \(q \in \mathbb {N}\). For each \(l=1,\ldots ,q\), we generate a 3-by-4 matrix \(A_l\) containing random values drawn from the uniform distribution on the interval [0, 1] and define \(b_l=4 A_l(:,1)\), where \(A_l(:,1)\) denotes the first column of \(A_l\). Let \(I_l=\{4l-3,4l-2,4l-1,4l\}\), \(l=1,\ldots ,q\). Denote \(x^l=(x^l_i)_{i \in I_l} \in \mathbb {R}^4\) and \(w^l=(w^l_i)_{i \in I_l} \in \mathbb {R}^4\), \(l=1,\ldots ,q\). Consider the problem:

$$\begin{aligned} (EP)\qquad \min _{(x,w)}\ \sum _{l=1}^q\sum _{i \in I_l} (w_i^l)^2 \quad \text{ subject } \text{ to }\quad&A_l x^l = b_l, \quad l=1,\ldots ,q,\\&(1-w_i^l)x_i^l=0, \quad i \in I_l, \ l=1,\ldots ,q,\\&\Vert x^l\Vert _2^2 \le M, \quad l=1,\ldots ,q. \end{aligned}$$

It is not hard to see that the optimal solution of (EP) is \((x^*,w^*)\) with optimal value q, where \(x^*:=[x^{1*},\ldots ,x^{q*}]\) with \(x^{l*}=[4,0,0,0]\) and \(w^*:=[w^{1*},\ldots ,w^{q*}]\) with \(w^{l*}=[1,0,0,0]\), \(l=1,\ldots ,q\). Moreover, it can be verified that the running intersection property holds. So, the optimal value of the proposed sparse SDP hierarchy (4.2) converges to the optimal value of (EP).
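For reproducibility, the random data of (EP) can be generated as follows (a NumPy sketch mirroring the Matlab construction described above):

```python
import numpy as np

# Each A_l is 3-by-4 with entries uniform on [0, 1]; b_l = 4*A_l(:,1) makes
# x^l = (4, 0, 0, 0) a feasible block solution with a single nonzero entry.
rng = np.random.default_rng(1)
q = 200
A = [rng.uniform(size=(3, 4)) for _ in range(q)]
b = [4.0 * A_l[:, 0] for A_l in A]       # first column, Matlab's A_l(:,1)
x_star = np.array([4.0, 0.0, 0.0, 0.0])
print(np.allclose(A[0] @ x_star, b[0]))  # True
# With one nonzero per block, the objective value at the minimizer is q.
```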

As shown in Table 4, for \(q=200\) and \(q=1000\), SparsePOP with relaxation order 2, \(c=5\) and \(M=100\) returns an accurate optimal value and an accurate optimal solution \(x^*\) in the proposed sparse SDP hierarchy (4.2).

We also notice that the numerical experiment using the dense SDP hierarchy proposed in [11] could solve the above problem only for a number of blocks q up to 5. We observe again that much larger problems can be solved by the sparse SDP hierarchy.

Table 4 Numerical tests on the polynomial optimization problem with hidden coercivity

Finally, we note that we can also obtain an optimal solution of (EP) using the solver SparsePOP. For example, consider problem (EP) where \(q=200\),

$$\begin{aligned}&A_l=A:=\left( \begin{array}{llll} 2 &{} \quad -1 &{}\quad 30 &{}\quad 3\\ 3 &{} \quad 3 &{} \quad 44 &{} \quad 2 \\ -2 &{}\quad 7 &{} \quad -40 &{} \quad -6 \end{array}\right) ,\\&\ b_l=b:=4 A(:,1)=\left( \begin{array}{l} 8 \\ 12 \\ -8 \end{array} \right) \quad \text{ for } \text{ all } l=1,\ldots ,200. \end{aligned}$$

Then, the true unique solution of this problem is \((x^*,w^*)\), where \(x^*:=[x^{1*},\ldots ,x^{q*}]\) with \(x^{l*}=[4,0,0,0]\) and \(w^*:=[w^{1*},\ldots ,w^{q*}]\) with \(w^{l*}=[1,0,0,0]\), \(l=1,\ldots ,200\). In our numerical experiment, SparsePOP with relaxation order 2 and \(c=5\) returns the accurate optimal value 200 and an accurate optimal solution \((x^*,w^*)\) in the proposed sparse SDP hierarchy (4.2), as shown in Table 5.

Table 5 Numerical tests on the polynomial optimization problem with hidden coercivity