1 Introduction

Polynomials appear in a wide variety of areas in science. It is not surprising, then, that solving polynomial optimization problems (POPs), in which both the objective function and constraints are defined by multivariate polynomials, continues to be the focus of novel research (cf., [3]). Here, the interest is in the class of non-convex, non-linear POPs. Clearly, a non-convex quadratic program (QP) belongs to this class of problems, and its study has been widely addressed in the literature. For example, semidefinite programming (SDP) relaxations have been actively used to find good bounds and approximate solutions for general QPs (see, e.g., [17, 36, 50]), and for important QP problems such as the max-cut and the stable set problem (see, e.g., [18, 19, 24, 41]). In [27], more computationally efficient second-order cone programming (SOCP) relaxations have also been proposed to approximately solve non-convex QPs.

The early work linking convex optimization and polynomial optimization in [39, 47] revealed the possibility of using conic optimization to obtain global or near-global solutions for non-convex POPs involving polynomials of degree higher than two. In the seminal work of Parrilo [40] and Lasserre [29], SDP is used to obtain the global or near-global optimum of POPs. Besides SDP approximations, other convex approximations for the solution of POPs have been investigated using linear programming (LP) and SOCP techniques [1, 30, 31, 42, 52]. These techniques are at the core of the well-known area of polynomial optimization (cf., [3]).

Alternatively, it has been shown that several NP-hard optimization problems can be reformulated as a completely positive (CP) program; that is, a linear program over the convex cone of CP matrices or its dual cone, the cone of copositive matrices. Such problems include standard QPs [10], stable set problems [18, 22], graph partitioning problems [44], and quadratic assignment problems [45]. In [13], Burer derives a more general result; namely, that every linearly constrained QP with binary variables can be reformulated as a CP program. CP programming relaxations for general quadratically constrained quadratic programs (QCQPs) have been studied in [4, 15]. In [6], CP programming reformulations for QCQPs and QPs with complementarity constraints (QPCCs) are discussed without any boundedness assumption on the problems’ feasible regions. Although the CP matrix cone is not tractable in general, recent advances in approximation algorithms for CP programs [2, 12, 20] provide an alternative way to globally solve QCQPs. Recently, Bomze shows in [8] that CP programming relaxations provide tighter bounds than Lagrangian dual bounds for quadratically and linearly constrained QPs.

A natural question is whether one can extend the copositive or completely positive programming reformulations for QPs to POPs. Arima et al. [5] propose the moment cone relaxation for a class of POPs to extend the results on CP programming relaxations for QCQPs. Recently, Peña et al. [43] show that under certain conditions, general POPs can be reformulated as a conic program over the cone of CP tensors, which is a natural extension of the cone of CP matrices used for the solution of quadratic POPs. This tensor representation was originally proposed in [21], and is now the focus of active research (see, e.g., [25, 26, 35, 48]). In [43], it is also shown that the conditions for the equivalence between a POP and its associated CP programming relaxation, when applied to QCQPs, lead to conditions that are weaker than the ones introduced in [13].

In this article, we study CP and completely positive semidefinite (CPSD) tensor relaxations for POPs (cf., Sect. 2.1). Our main contributions are: (1) We extend the results for QPs in [8] to general POPs by using CP and CPSD tensor cones. In particular, we show that CP tensor relaxations provide tighter bounds than Lagrangian relaxations for general POPs. (2) We provide tractable approximations for CP and CPSD tensor cones that can be used to globally approximate general POPs. (3) We prove that CP tensor relaxations yield tighter bounds than CP matrix relaxations for quadratic reformulations of some classes of POPs. (4) We provide numerical results to show that, more generally, approximations to the CP tensor relaxations of a POP can be used to obtain a tighter bound for the problem than the one obtained based on approximations to the CP matrix relaxation of the POP's associated quadratic reformulation.

The remainder of the article is organized as follows. We briefly introduce the basic concepts of tensor cones and tensor representations of polynomials in Sect. 2. Lagrangian relaxations and CP and CPSD tensor relaxations for POPs are discussed in Sect. 3. In Sect. 4, we discuss the quadratic reformulation of a general POP; that is, auxiliary decision variables are introduced to the problem to reformulate it as a QCQP. Then, for a class of POPs, the bounds on the POP obtained from a CP matrix relaxation of the quadratic reformulation of the POP are compared with the ones obtained from a CP tensor relaxation of the POP. In Sect. 5, linear matrix inequality (LMI) approximation strategies for the CP and CPSD tensor cones are proposed, and a comparison of bounds obtained by tensor relaxation and matrix relaxation is presented for general small-scale POPs. Lastly, Sect. 6 summarizes the article's results and provides future research directions.

2 Preliminaries

2.1 Basic concepts and notation

We first introduce basic concepts and the notation used throughout the article. Following [43], we start by defining tensors.

Definition 1

Let \(\mathcal{T}_{n,d}\) denote the set of tensors of dimension n and order d; that is,

$$\begin{aligned} \mathcal{T}_{n,d} =\underbrace{\mathbb {R}^n\otimes \cdots \otimes \mathbb {R}^n}_d, \end{aligned}$$

where \(\otimes \) is the tensor product.

A tensor \(T\in \mathcal{T}_{n,d}\) is symmetric if its entries are invariant with respect to permutations of its indices. We denote by \(\mathcal{S}_{n,d} \subseteq \mathcal{T}_{n,d}\) the set of symmetric tensors of dimension n and order d. For any \(T^1,T^2\in \mathcal{T}_{n,d}\), let \(\langle \cdot ,\cdot \rangle _{n,d}\) denote the tensor inner product defined by

$$\begin{aligned} \langle T^1,T^2\rangle _{n,d}=\sum _{(i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d} T^1_{(i_1,\ldots ,i_d)}T^2_{(i_1,\ldots ,i_d)}. \end{aligned}$$

Definition 2

For any \(x\in \mathbb {R}^n\), let the mapping \(M_d:\mathbb {R}^n\rightarrow \mathcal{S}_{n,d}\) be defined by

$$\begin{aligned} M_d(x) = \underbrace{x\otimes \cdots \otimes x}_d. \end{aligned}$$

Definitions 1 and 2 are natural extensions of matrix notation to tensors. For example, \(\mathcal{T}_{n,2}\) is the set of \(n\times n\) matrices, while \(\mathcal{S}_{n,2}\) is the set of \(n\times n\) symmetric matrices. Also, \(\langle \cdot ,\cdot \rangle _{n,2}\) is the Frobenius inner product and \(M_2(x)=xx^{\intercal }\) for any \(x\in \mathbb {R}^n\). In general, \(M_d(x)\) is the symmetric tensor whose \((i_1,\ldots ,i_d)\) entry is \(x_{i_1}\cdots x_{i_d}\).
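For intuition, these definitions are easy to experiment with numerically. The following Python sketch (the helper names `M` and `inner` are ours, not the article's notation) builds the rank-one tensor \(M_d(x)\) by repeated outer products and checks the matrix cases mentioned above:

```python
import numpy as np

def M(x, d):
    """Rank-one symmetric tensor M_d(x) = x (x) ... (x) x (d factors)."""
    T = np.array(x, dtype=float)
    for _ in range(d - 1):
        T = np.multiply.outer(T, np.array(x, dtype=float))
    return T

def inner(T1, T2):
    """Tensor inner product <T1, T2>_{n,d}: entrywise product, then sum."""
    return float(np.sum(T1 * T2))

x = np.array([1.0, 2.0, 3.0])
# M_2(x) = x x^T, and <.,.>_{n,2} is the Frobenius inner product
assert np.allclose(M(x, 2), np.outer(x, x))
A = np.arange(9.0).reshape(3, 3)
B = 2.0 * np.ones((3, 3))
assert np.isclose(inner(A, B), 2.0 * A.sum())
# M_3(x) has (i_1, i_2, i_3) entry equal to x_{i_1} x_{i_2} x_{i_3}
assert M(x, 3)[0, 1, 2] == x[0] * x[1] * x[2]
```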

Proposition 1

Let \(\mathbb {E}_{n,d}\) be the all-ones tensor of dimension n and order d, and let \(e\in \mathbb {R}^n\) be the all-ones vector. Then

$$\begin{aligned} \langle \mathbb {E}_{n,d},M_d(x)\rangle _{n,d}=(e^{\intercal }x)^d,\forall x\in \mathbb {R}^n. \end{aligned}$$

Proof

By the definition of \(M_d(\cdot )\) and \(\langle \cdot ,\cdot \rangle _{n,d}\),

$$\begin{aligned} \langle \mathbb {E}_{n,d},M_d(x)\rangle _{n,d}=\sum _{k_1+k_2+\cdots +k_n = d} {d \atopwithdelims ()k_1, k_2, \ldots , k_n} x_1^{k_1} x_2^{k_2} \cdots x_n^{k_n}= (e^{\intercal }x)^d, \end{aligned}$$

where \({d \atopwithdelims ()k_1, k_2, \ldots , k_n}\) is the multinomial coefficient. \(\square \)

Proposition 2

For any \(x,y\in \mathbb {R}^n\),

$$\begin{aligned} \langle M_d(x),M_d(y)\rangle _{n,d}=(x^{\intercal }y)^d. \end{aligned}$$

Proof

Let \(x,y\in \mathbb {R}^n\) be given and \(z\in \mathbb {R}^n\) be defined as \(z_i=x_iy_i, i=1,\ldots ,n\), and let \(e\in \mathbb {R}^n\) be the all-ones vector. From the definition of \(M_d(\cdot )\) and \(\langle \cdot ,\cdot \rangle _{n,d}\) we have

$$\begin{aligned} \begin{aligned} \langle M_d(x),M_d(y)\rangle _{n,d}&= \sum _{(i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d} M_d(x)_{(i_1,\ldots ,i_d)}M_d(y)_{(i_1,\ldots ,i_d)}\\&= \sum _{(i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d} x_{i_1}x_{i_2}\cdots x_{i_d}\cdot y_{i_1}y_{i_2}\cdots y_{i_d}\\&= \sum _{(i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d} (x_{i_1}y_{i_1})(x_{i_2}y_{i_2})\cdots (x_{i_d}y_{i_d})\\&= \langle \mathbb {E}_{n,d},M_d(z)\rangle _{n,d}\\&= (e^{\intercal }z)^d \quad \text {(by Proposition 1)}\\&= (x^{\intercal }y)^d. \end{aligned} \end{aligned}$$

\(\square \)
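Both propositions are easy to sanity-check numerically. A small Python sketch (the helper `M` for the d-fold outer product is ours):

```python
import numpy as np

def M(x, d):
    """d-fold outer product x (x) ... (x) x as a dense numpy array."""
    T = np.array(x, dtype=float)
    for _ in range(d - 1):
        T = np.multiply.outer(T, np.array(x, dtype=float))
    return T

n, d = 3, 4
rng = np.random.default_rng(0)
x, y = rng.standard_normal(n), rng.standard_normal(n)
E = np.ones((n,) * d)   # all-ones tensor E_{n,d}
e = np.ones(n)          # all-ones vector

# Proposition 1: <E_{n,d}, M_d(x)> = (e^T x)^d
assert np.isclose(np.sum(E * M(x, d)), np.dot(e, x) ** d)
# Proposition 2: <M_d(x), M_d(y)> = (x^T y)^d
assert np.isclose(np.sum(M(x, d) * M(y, d)), np.dot(x, y) ** d)
```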

Analogous to PSD and copositive matrices, PSD and copositive tensors can be defined as follows.

Definition 3

Define the \(\mathcal{K}\)-semidefinite (or set-semidefinite) symmetric tensor cone of dimension n and order d as:

$$\begin{aligned} \mathcal{C}_{n,d}(\mathcal{K}) = \left\{ T\in \mathcal{S}_{n,d}:\langle T,M_d(x)\rangle _{n,d}\ge 0,\forall x\in \mathcal{K}\right\} . \end{aligned}$$

For \(\mathcal{K}=\mathbb {R}^n\), \(\mathcal{C}_{n,d}(\mathbb {R}^n)\) denotes the positive semidefinite (PSD) tensor cone. For \(\mathcal{K}=\mathbb {R}^n_+\), \(\mathcal{C}_{n,d}(\mathbb {R}^n_+)\) denotes the copositive tensor cone.

Similar to the one-to-one correspondence of \(n\times n\) PSD matrices to nonnegative homogeneous quadratic polynomials in n variables, there is also a one-to-one correspondence of PSD tensors of dimension n and order d to nonnegative homogeneous polynomials in n variables of degree d (cf., [35]). Note that the only nonnegative homogeneous polynomial of odd degree is the zero polynomial; that is, \(\mathcal{C}_{n,d}(\mathbb {R}^n) = \{0\} \) when d is odd. Next we discuss the dual cones of \(\mathcal{C}_{n,d}(\mathbb {R}^n_+)\) and \(\mathcal{C}_{n,d}(\mathbb {R}^n)\), following the discussion in [35] and [43].
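Membership in \(\mathcal{C}_{n,d}(\mathcal{K})\) involves infinitely many linear inequalities, so sampling cannot certify it; still, checking the defining inequality at random points gives a quick necessary condition. A Python sketch of such a check (helper names are ours):

```python
import numpy as np

def M(x, d):
    # d-fold outer product x (x) ... (x) x
    T = np.array(x, dtype=float)
    for _ in range(d - 1):
        T = np.multiply.outer(T, np.array(x, dtype=float))
    return T

def looks_psd(T, trials=200, seed=0):
    """Sampling check of <T, M_d(x)> >= 0 over x in R^n.
    This is a necessary condition only -- not a certificate."""
    d, n = T.ndim, T.shape[0]
    rng = np.random.default_rng(seed)
    return all(np.sum(T * M(rng.standard_normal(n), d)) >= -1e-12
               for _ in range(trials))

# Diagonal tensor with <T, M_4(x)> = x_1^4 + x_2^4: a PSD tensor
n, d = 2, 4
T = np.zeros((n,) * d)
for i in range(n):
    T[(i,) * d] = 1.0
assert looks_psd(T)
# -T corresponds to -x_1^4 - x_2^4 and fails the check
assert not looks_psd(-T)
```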

Definition 4

Given any cone \(\mathcal{C}\subseteq \mathcal{S}_{n,d}\), the dual cone of \(\mathcal{C}\) is

$$\begin{aligned} \mathcal{C}^*= \left\{ Y\in \mathcal{S}_{n,d}:\langle X,Y\rangle _{n,d}\ge 0,\forall X\in \mathcal{C}\right\} . \end{aligned}$$

If \(\mathcal{C}^*=\mathcal{C}\), then cone \(\mathcal{C}\) is self-dual.

The dual cones of the PSD tensor cone and copositive tensor cone have been studied in [35, 43]. Formally,

Proposition 3

  1. (a)

    \(\mathcal{C}_{n,d}^*(\mathbb {R}^n_+) = \hbox {conv}\{M_d(x):x\in \mathbb {R}^n_+\}\).

  2. (b)

    \(\mathcal{C}_{n,2d}^*(\mathbb {R}^n) = \hbox {conv}\{M_{2d}(x):x\in \mathbb {R}^n\}\).

In analogy with the cones of CP and PSD matrices, we denote by \(\mathcal{C}^*_{n,d}(\mathbb {R}^n_+)\) the completely positive (CP) tensor cone, and by \(\mathcal{C}^*_{n,2d}(\mathbb {R}^n)\) the completely positive semidefinite (CPSD) tensor cone. Also, let the homogeneous sum of squares (SOS) tensor cone of dimension n and order 2d be defined by

$$\begin{aligned} \begin{aligned} \mathcal{C}_{n,2d}({\mathcal {SOS}}) =\left\{ T_{n,2d}\in \mathcal{S}_{n,2d}: \langle T_{n,2d},M_{2d}(x)\rangle _{n,2d} = \sum _{i \in {\mathbb {N}}}\lambda _i\left( \langle T_{n,d}^i,M_d(x)\rangle _{n,d}\right) ^2~\forall x\in \mathbb {R}^n,\ \hbox {for some } T_{n,d}^i\in \mathcal{S}_{n,d},\ \lambda _i\ge 0,~\forall i \in {\mathbb {N}}\right\} . \end{aligned} \end{aligned}$$

It is well known that the cone of PSD matrices is self-dual; however, in general, the PSD tensor cone is not self-dual (cf., [35]) as discussed next.

Proposition 4

([35, Prop. 5.8 (i)])

$$\begin{aligned} \mathcal{C}^*_{n,2d}(\mathbb {R}^{n})\subseteq \mathcal{C}_{n,2d}({\mathcal {SOS}})\subseteq \mathcal{C}_{n,2d}(\mathbb {R}^{n}). \end{aligned}$$

Proof

Let \(T\in \mathcal{C}^*_{n,2d}(\mathbb {R}^{n})\). By Proposition 3, \(T=\sum _{i \in {\mathbb {N}}} \lambda _i M_{2d}(y^i)\), where \(y^i\in \mathbb {R}^n\) and \(\lambda _i\ge 0\) for all \(i \in {\mathbb {N}}\), with \(\sum _{i \in {\mathbb {N}}} \lambda _i=1\). Then, for all \(x\in \mathbb {R}^n\),

$$\begin{aligned} \begin{aligned} \langle T,M_{2d}(x)\rangle _{n,2d}&= \left\langle \sum _{i \in {\mathbb {N}}} \lambda _i M_{2d}(y^i),M_{2d}(x)\right\rangle _{n,2d}\\&= \sum _{i \in {\mathbb {N}}} \lambda _i\left\langle M_{2d}(y^i),M_{2d}(x)\right\rangle _{n,2d}\\&= \sum _{i \in {\mathbb {N}}} \lambda _i (x^{\intercal }y^i)^{2d} \quad \text {(by Proposition 2)}\\&= \sum _{i \in {\mathbb {N}}} \left[ \sqrt{\lambda _i}(x^{\intercal }y^i)^d\right] ^2. \end{aligned} \end{aligned}$$

For each \(i \in {\mathbb {N}}\), let \(z^i\in \mathbb {R}^n\) be defined by \(z^i_k=x_ky^i_k\), \(k=1,\ldots ,n\); then \(x^{\intercal }y^i=e^{\intercal }z^i\), where \(e\in \mathbb {R}^n\) is the all-ones vector. Therefore,

$$\begin{aligned} \begin{aligned} \langle T,M_{2d}(x)\rangle _{n,2d}&= \sum _{i \in {\mathbb {N}}} \left[ \sqrt{\lambda _i}(e^{\intercal }z^i)^d\right] ^2\\&= \sum _{i \in {\mathbb {N}}} \left[ \sqrt{\lambda _i}\langle {\mathbb {E}}_{n,d},M_d(z^i)\rangle _{n,d}\right] ^2 \quad \text {(by Proposition 1)}. \end{aligned} \end{aligned}$$

This shows that \(\mathcal{C}^*_{n,2d}(\mathbb {R}^{n})\subseteq \mathcal{C}_{n,2d}({\mathcal {SOS}})\). By the definition of the homogeneous SOS tensor cone, it is clear that \(\mathcal{C}_{n,2d}({\mathcal {SOS}})\subseteq \mathcal{C}_{n,2d}(\mathbb {R}^{n})\). \(\square \)

The proof of Proposition 4 can be seen as an alternative proof of Proposition 5.8 (i) in [35] that uses the tensor notation introduced in this article. As mentioned before, it is well known that \(\mathcal{C}^*_{n,2}(\mathbb {R}^{n})= \mathcal{C}_{n,2}({\mathcal {SOS}})= \mathcal{C}_{n,2}(\mathbb {R}^{n})\). This statement coincides with the self-duality of the cone of PSD matrices. Luo et al. showed in [35] that \(\mathcal{C}^*_{n,2d}(\mathbb {R}^{n})\subsetneq \mathcal{C}_{n,2d}({\mathcal {SOS}})\) for \(d\ge 2\). On the other hand, the Motzkin polynomial together with the isomorphism between homogeneous polynomials and tensors shows that \(\mathcal{C}_{n,2d}({\mathcal {SOS}})\subsetneq \mathcal{C}_{n,2d}(\mathbb {R}^{n})\) when \(d\ge 3\) and \(n\ge 3\) (recall that, by Hilbert's classical result, nonnegative binary forms and nonnegative ternary quartics are sums of squares).
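The nonnegativity of the Motzkin form (though not its failure to be a sum of squares, which sampling cannot detect) can be checked numerically:

```python
import numpy as np

# Motzkin form m(x,y,z) = x^4 y^2 + x^2 y^4 - 3 x^2 y^2 z^2 + z^6:
# nonnegative on R^3 (by the AM-GM inequality applied to the three
# positive terms) yet not a sum of squares.
def motzkin(x, y, z):
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 * z**2 + z**6

rng = np.random.default_rng(1)
pts = rng.standard_normal((10000, 3))
vals = motzkin(pts[:, 0], pts[:, 1], pts[:, 2])
assert vals.min() >= -1e-9                # nonnegative on all samples
assert np.isclose(motzkin(1, 1, 1), 0.0)  # zero is attained at (1,1,1)
```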

2.2 Tensor representation of general polynomials

In Sect. 2.1, we mentioned the isomorphism between symmetric tensors and homogeneous polynomials. Next, we introduce a tensor representation for general polynomials that are not necessarily homogeneous. Define \(\mathbb {R}[x]\) as the ring of polynomials in \(x=(x_1,\ldots ,x_n)\) with real coefficients, and let \(\mathbb {R}_d[x]:=\{p\in \mathbb {R}[x]:{\text {deg}}(p)\le d \}\) denote the set of polynomials in n variables of degree at most d. For simplicity, we use \(M_d(1,x), x\in \mathbb {R}^n\) to represent \(M_d((1,x^{\intercal })^{\intercal }), x\in \mathbb {R}^n\), and use \(T_d(p)\) to represent \(T_d(p(x))\) for \(p(x)\in \mathbb {R}_d[x]\) throughout the article. Then, for any \(p(x)\in \mathbb {R}_d[x]\), we have

$$\begin{aligned} p(x)=\langle T_d(p),M_d(1,x)\rangle _{n+1,d}, \end{aligned}$$
(1)

where \(T_d(\cdot )\) is the mapping of coefficients of p(x) in terms of \(M_d(1,x)\) in \(\mathcal{S}_{n+1,d}\). Following [43], define \(T_d:\mathbb {R}_d[x]\rightarrow \mathcal{S}_{n+1,d}\) as

$$\begin{aligned} T_d\left( \sum _{\beta \in \mathbb {Z}_+^n:|\beta |\le d}p_\beta x^\beta \right) _{i_1,\ldots ,i_d}:=\frac{(d-|\alpha |)!\,\alpha _1!\cdots \alpha _n!}{d!}p_\alpha , \end{aligned}$$

where the indices \(i_1,\ldots ,i_d\) range over \(\{0,1,\ldots ,n\}\), with index 0 corresponding to the homogenizing coordinate (i.e., \(x_0=1\)), and \(\alpha \) is the (unique) exponent such that \(x^\alpha :=x_1^{\alpha _1}\cdots x_n^{\alpha _n}=x_{i_1}\cdots x_{i_d}\) (i.e., \(\alpha _k\) is the number of times \(k\in \{1,\ldots ,n\}\) appears in the multi-set \(\{i_1,\ldots ,i_d\}\)) and \(|\alpha |=\sum _{i=1}^n\alpha _i\), so that \(d-|\alpha |\) counts the occurrences of index 0. For any polynomial \(p(x)\in \mathbb {R}_d[x]\), let \(\tilde{p}(x)\) denote the homogeneous component of p(x) of degree d; then

$$\begin{aligned} \tilde{p}(x)=\langle T_d(p),M_d(0,x)\rangle _{n+1,d}. \end{aligned}$$
(2)

Equations (1) and (2) can be used to characterize the boundedness of general polynomials using their associated tensor representation.
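A small Python sketch of the mapping \(T_d(\cdot )\) and identity (1) for a bivariate quadratic example (the dictionary-of-exponents encoding and the helper names are ours):

```python
import numpy as np
from itertools import product
from math import factorial

def T_d(coeffs, n, d):
    """Tensor representation T_d(p) in S_{n+1,d} of p given as a dict
    {exponent tuple alpha: coefficient}; index 0 is the homogenizing
    coordinate, so entry (i_1,...,i_d) reads off x_{i_1}...x_{i_d}
    with x_0 := 1."""
    T = np.zeros((n + 1,) * d)
    for idx in product(range(n + 1), repeat=d):
        alpha = tuple(sum(1 for i in idx if i == k) for k in range(1, n + 1))
        if alpha in coeffs:
            num = factorial(d - sum(alpha)) * np.prod([factorial(a) for a in alpha])
            T[idx] = coeffs[alpha] * num / factorial(d)
    return T

def M(x, d):
    # d-fold outer product x (x) ... (x) x
    T = np.array(x, dtype=float)
    for _ in range(d - 1):
        T = np.multiply.outer(T, np.array(x, dtype=float))
    return T

# p(x1, x2) = x1^2 + 2 x1 x2 + 3 x2 + 4, with n = 2 and d = 2
coeffs = {(2, 0): 1.0, (1, 1): 2.0, (0, 1): 3.0, (0, 0): 4.0}
T = T_d(coeffs, n=2, d=2)
for x in ([0.0, 0.0], [1.0, -1.0], [0.5, 2.0]):
    p = x[0]**2 + 2 * x[0] * x[1] + 3 * x[1] + 4
    assert np.isclose(np.sum(T * M([1.0] + x, 2)), p)   # identity (1)
```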

Theorem 1

Let \(\mu \in {\mathbb {R}}\) and

  1. (a)

    \(p(x)\in \mathbb {R}_d[x]\). Then \(p(x)\ge \mu \) for all \(x\in {\mathbb {R}}^n_+\) if and only if \(T_d(p-\mu )\in {\mathcal {C}}_{n+1,d}(\mathbb {R}^{n+1}_+)\).

  2. (b)

    \(p(x)\in \mathbb {R}_{2d}[x]\). Then \(p(x)\ge \mu \) for all \(x\in {\mathbb {R}}^n\) if and only if \(T_{2d}(p-\mu )\in {\mathcal {C}}_{n+1,2d}(\mathbb {R}^{n+1})\).

Proof

For (a), assume \(T_d(p-\mu )\in {\mathcal {C}}_{n+1,d}(\mathbb {R}^{n+1}_+)\). By Definition 3, \(\langle T_d(p-\mu ), M_d(1,x)\rangle _{n+1,d}\ge 0,\forall x\in \mathbb {R}^n_+\), then

$$\begin{aligned} p(x)-\mu =\langle T_d(p-\mu ),M_d(1,x)\rangle _{n+1,d}\ge 0,~\forall x \in \mathbb {R}^n_+. \end{aligned}$$
(3)

For the other direction, assume \(p(x)\ge \mu ,\forall x\in \mathbb {R}^n_+\); then, by (1), \(\langle T_d(p-\mu ),M_d(1,x)\rangle _{n+1,d}=p(x)-\mu \ge 0,\forall x\in \mathbb {R}^n_+\). Thus, for any \((x_0, x) \in \mathbb {R}_{++} \times \mathbb {R}^n_+\),

$$\begin{aligned} \langle T_d(p-\mu ),M_d(x_0,x)\rangle _{n+1,d} = x_0^d\left\langle T_d(p-\mu ),M_d\left( 1,\frac{x}{x_0}\right) \right\rangle _{n+1,d} \ge 0. \end{aligned}$$
(4)

Furthermore, from the continuity of polynomials, letting \(k\rightarrow +\infty \), we have

$$\begin{aligned} \langle T_d(p-\mu ),M_d(0,x)\rangle _{n+1,d}=\lim _{k\rightarrow +\infty }\langle T_d(p-\mu ),M_d(1/k,x)\rangle _{n+1,d}\ge 0, \end{aligned}$$
(5)

where the last inequality follows from (4). From (4), (5), and Definition 3, it follows that \(T_d(p-\mu )\in {\mathcal {C}}_{n+1,d}(\mathbb {R}^{n+1}_+)\).

The proof of (b) is similar to the proof of (a). \(\square \)

Corollary 1

Let

  1. (a)

    \(p(x)\in \mathbb {R}_d[x]\). Then \(\inf \{p(x):x\in {\mathbb {R}}^n_+\}=\sup \{\mu \in {\mathbb {R}}:T_d(p-\mu )\in {\mathcal {C}}_{n+1,d}(\mathbb {R}^{n+1}_+)\}\).

  2. (b)

    \(p(x)\in \mathbb {R}_{2d}[x]\). Then \(\inf \{p(x):x\in {\mathbb {R}}^n\}=\sup \{\mu \in {\mathbb {R}}:T_{2d}(p-\mu )\in {\mathcal {C}}_{n+1,2d}(\mathbb {R}^{n+1})\}\).

Theorem 1 and Corollary 1 generalize the key Lemma 2.1 and Corollary 2.1 in [8] to polynomials of degree higher than 2 by using a tensor representation. Moreover, Corollary 1 can be seen as a convexification of an unconstrained (possibly non-linear, non-convex) POP into a linear conic program over the CP and CPSD tensor cones. In the next section, we will discuss CP and CPSD tensor relaxations for general constrained POPs.
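As a concrete illustration of Corollary 1 (b) in the univariate case, the unconstrained infimum of an even-degree polynomial coincides with the largest \(\mu \) for which \(p-\mu \) is nonnegative on \(\mathbb {R}\). The toy polynomial below is our own example, and the check is by brute-force gridding rather than conic programming:

```python
import numpy as np

# p(x) = x^4 - 2 x^2 + 3 has infimum 2 over R (attained at x = +/-1),
# so the largest mu with p - mu nonnegative on R is mu = 2.
p = np.polynomial.Polynomial([3.0, 0.0, -2.0, 0.0, 1.0])  # 3 - 2x^2 + x^4
xs = np.linspace(-3.0, 3.0, 200001)
mu_star = p(xs).min()
assert np.isclose(mu_star, 2.0, atol=1e-6)
# p - mu_star is (numerically) nonnegative; p - (mu_star + 0.1) is not
assert (p(xs) - mu_star).min() >= -1e-9
assert (p(xs) - (mu_star + 0.1)).min() < 0.0
```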

3 Lagrangian and conic relaxations of POPs

Let \(p_i(x)\in \mathbb {R}_d[x], i=0,\ldots ,m\). Consider two general POPs with polynomial constraints:

$$\begin{aligned} z_+=\inf \left\{ p_0(x):p_i(x)\le 0,~i=1,\ldots ,m,~x\in \mathbb {R}^n_+\right\} \end{aligned}$$
(6)

and

$$\begin{aligned} z=\inf \left\{ p_0(x):p_i(x)\le 0,~i=1,\ldots ,m,~x\in \mathbb {R}^n\right\} \end{aligned}$$
(7)

where \(d=\max \{{\text {deg}}(p_i(x)):i\in \{0,1,\ldots ,m\}\} \) is the degree of the POP. Problems (6) and (7) represent general POPs, which encompass a large class of non-linear, non-convex problems, including non-convex QPs with binary variables (i.e., binary constraints can be written in the polynomial form \(x_i(1-x_i)\le 0\), \(-x_i(1-x_i)\le 0\)). Naturally, we have \(z\le z_+\), since the feasible set of problem (6) is a subset of that of problem (7). Next we show that the results of Bomze for QPs in [8] can be extended to POPs of the form (6) and (7).
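The parenthetical remark on binary constraints can be verified directly: the two polynomial inequalities together force \(x_i(1-x_i)=0\). A brute-force check over a grid:

```python
# The pair x(1-x) <= 0 and -x(1-x) <= 0 forces x(1-x) = 0, i.e. x in {0, 1}.
# Scan a grid over [0, 2] and keep the points satisfying both inequalities.
feasible = [k / 100 for k in range(0, 201)
            if k / 100 * (1 - k / 100) <= 0 and -(k / 100) * (1 - k / 100) <= 0]
assert feasible == [0.0, 1.0]
```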

3.1 Lagrangian relaxations

Let \(u_i\ge 0\) be the Lagrangian multiplier of the inequality constraint \(p_i(x)\le 0\) for \(i=1,\ldots ,m\), and \(v_i\ge 0\) the multiplier of the constraint \(x_i\ge 0\) for \(i=1,\ldots ,n\). The Lagrangian function for problem (6) is

$$\begin{aligned} L_+(x;u,v):=p_0(x)+\sum _{i=1}^mu_ip_i(x)-v^{\intercal }x, \end{aligned}$$

and the Lagrangian dual function of problem (6) is

$$\begin{aligned} {\varTheta }_+(u,v):=\inf \{L_+(x;u,v):x\in {\mathbb {R}}^n\}, \end{aligned}$$

with its optimal value

$$\begin{aligned} z_{LD,+}=\sup \{{\varTheta }_+(u,v):(u,v)\in \mathbb {R}^m_+\times \mathbb {R}^n_+\}. \end{aligned}$$

We also consider a semi-Lagrangian dual function, in which the nonnegativity constraints of problem (6) are kept in the inner minimization,

$$\begin{aligned} {\varTheta }_{{\textit{semi}}}(u):=\inf \{L(x;u):x\in {\mathbb {R}}^n_+\}, \end{aligned}$$

where \(L(x;u):=p_0(x)+\sum _{i=1}^m u_ip_i(x)\), with its optimal value

$$\begin{aligned} z_{{\textit{semi}}}=\sup \{{\varTheta }_{{\textit{semi}}}(u):u\in \mathbb {R}^m_+\}. \end{aligned}$$

Similarly, let \(u_i\ge 0\) be the Lagrangian multiplier of the inequality constraints \(p_i(x)\le 0\) for \(i=1,\ldots ,m\). The Lagrangian function for problem (7) is

$$\begin{aligned} L(x;u):=p_0(x)+\sum _{i=1}^m u_ip_i(x), \end{aligned}$$

and the Lagrangian dual function of problem (7) is

$$\begin{aligned} {\varTheta }(u):=\inf \{L(x;u):x\in \mathbb {R}^n\}, \end{aligned}$$

with its optimal value

$$\begin{aligned} z_{LD}=\sup \{{\varTheta }(u):u\in \mathbb {R}^m_+\}. \end{aligned}$$

Thus we have the following relationship:

$$\begin{aligned} \begin{aligned} {\varTheta }_+(u,v)&=\inf \{L_+(x;u,v):x\in {\mathbb {R}}^n\}\\&\le \inf \{L_+(x;u,v):x\in {\mathbb {R}}^n_+\}\\&=\inf \{L(x;u)-v^{\intercal }x:x\in {\mathbb {R}}^n_+\}\\&\le \inf \{L(x;u):x\in {\mathbb {R}}^n_+\}={\varTheta }_{{\textit{semi}}}(u),\\ \end{aligned} \end{aligned}$$

where the second inequality holds because \(x,v\in \mathbb {R}_+^n\) always implies \(v^{\intercal }x\ge 0\). Therefore, we have:

$$\begin{aligned} z_{LD,+}\le z_{{\textit{semi}}}\le z_+, \end{aligned}$$

where the latter inequality holds by weak duality. Similarly, by weak duality, we have \(z_{LD}\le z\).
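The inequalities \(z_{LD,+}\le z_{{\textit{semi}}}\le z_+\) can be strict. The following toy instance (our own example, evaluated by gridding) has \(z_+=-1\), while \({\varTheta }_{{\textit{semi}}}(u)=-\infty \) for every \(u\ge 0\), so \(z_{{\textit{semi}}}=-\infty \):

```python
import numpy as np

# Toy instance of (6): inf { -x^2 : x - 1 <= 0, x >= 0 }, so z_+ = -1.
xs = np.linspace(0.0, 1.0, 10001)
z_plus = np.min(-xs**2)
assert np.isclose(z_plus, -1.0)

def theta_semi(u, xmax=1e3):
    """Theta_semi(u) = inf_{x >= 0} -x^2 + u (x - 1), approximated on a
    truncated grid.  The true infimum is -infinity for every u >= 0: on
    the grid, the value just tracks the truncation point xmax."""
    x = np.linspace(0.0, xmax, 100001)
    return np.min(-x**2 + u * (x - 1.0))

# The semi-Lagrangian bound is far below z_+ = -1 here:
assert theta_semi(5.0) <= -1e5
```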

3.2 CPSD tensor relaxation for POP with free variables

Consider the following conic program:

$$\begin{aligned} z_{{\textit{SP}}} = \inf \left\{ \langle T_d(p_0),X\rangle :\langle T_d(p_i),X\rangle \le 0,~i=1,\ldots ,m,~\langle T_d(1),X\rangle =1,~X\in \mathcal{C}^*_{n+1,d}(\mathbb {R}^{n+1})\right\} \end{aligned}$$
(8)

and its conic dual problem

$$\begin{aligned} z_{{\textit{SD}}} = \sup \left\{ \mu :T_d(p_0)-\mu T_d(1)+\sum _{i=1}^mu_iT_d(p_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+\right\} . \end{aligned}$$
(9)

Recall that \(\mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\) is trivial (equal to \(\{0\}\)) if d is odd. Thus, for simplicity, in what follows we assume without loss of generality that d is even in (8) (if this is not the case, one can change \(d \rightarrow 2 \lceil d/2 \rceil \) by adding higher order monomials with zero coefficients to \(p_i\), \(i=0,1,\dots ,m\)). Also, we use \(\langle \cdot ,\cdot \rangle \) to represent the tensor inner product of the appropriate dimension and order.

Proposition 5

Problem (8) is a relaxation of problem (7) with \(z_{{\textit{SP}}}\le z\).

Proof

Let \(x\in \mathbb {R}^n\) be a feasible solution of problem (7). By (1), \(X=M_d(1,x)\) is a feasible solution of problem (8), and \(p_0(x)=\langle T_d(p_0),X\rangle \), so X attains the same objective value.\(\square \)

Theorem 2

For problem (7), its Lagrangian dual optimal value satisfies

$$\begin{aligned} z_{{\textit{LD}}}=\sup \left\{ \mu :(\mu ,u)\in {\mathbb {R}}\times {\mathbb {R}}^m_+,T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} \end{aligned}$$

and \(z_{{\textit{LD}}}=z_{{\textit{SD}}}\le z_{{\textit{SP}}}\le z\).

Proof

By Corollary 1 (b),

$$\begin{aligned} \begin{aligned} {\varTheta }(u)&= \inf \{L(x;u):x\in \mathbb {R}^n\}\\&= \sup \{\mu :T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\}, \end{aligned} \end{aligned}$$

then

$$\begin{aligned} \begin{aligned} z_{{\textit{LD}}}&=\sup \{{\varTheta }(u):u\in \mathbb {R}^m_+\}\\&=\sup \left\{ \mu :(\mu ,u)\in {\mathbb {R}}\times {\mathbb {R}}^m_+,T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} . \end{aligned} \end{aligned}$$

From (9), we have

$$\begin{aligned} \begin{aligned} z_{{\textit{SD}}}&= \sup \left\{ \mu :T_d(p_0)-\mu T_d(1)+\sum _{i=1}^mu_iT_d(p_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+\right\} \\&= \sup \left\{ \mu :T_d\left( p_0+\sum _{i=1}^mu_ip_i-\mu \right) \in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+\right\} \\&= \sup \{{\varTheta }(u):u\in \mathbb {R}^m_+\}\\&= z_{{\textit{LD}}}. \end{aligned} \end{aligned}$$

Furthermore, \(z_{{\textit{SD}}}\le z_{{\textit{SP}}}\le z\) holds directly from weak conic duality and Proposition 5.\(\square \)

From Theorem 2, the Lagrangian dual has no duality gap (i.e., \(z_{{\textit{LD}}}=z\)) if and only if the conic program (8) has no duality gap and the CPSD tensor relaxation is tight.

3.3 CP and CPSD tensor relaxations for POP with nonnegative variables

Consider the following conic programs:

$$\begin{aligned} z_{{\textit{CP}}} = \inf \left\{ \langle T_d(p_0),X\rangle :\langle T_d(p_i),X\rangle \le 0,~i=1,\ldots ,m,~\langle T_d(1),X\rangle =1,~X\in \mathcal{C}^*_{n+1,d}(\mathbb {R}^{n+1}_+)\right\} \end{aligned}$$
(10)

and

$$\begin{aligned} z_{{\textit{SP}},+} = \inf \left\{ \langle T_d(p_0),X\rangle :\langle T_d(p_i),X\rangle \le 0,~i=1,\ldots ,m,~\langle T_d(-x_i),X\rangle \le 0,~i=1,\ldots ,n,~\langle T_d(1),X\rangle =1,~X\in \mathcal{C}^*_{n+1,d}(\mathbb {R}^{n+1})\right\} \end{aligned}$$
(11)

and their conic dual problems

$$\begin{aligned} \begin{aligned} z_{{\textit{CD}}}&=\sup \left\{ \mu :T_d(p_0)-\mu T_d(1)+\sum _{i=1}^mu_iT_d(p_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+),u\in \mathbb {R}^m_+\right\} ,\\ z_{{\textit{SD}},+}&=\sup \left\{ \mu :T_d(p_0-\mu )+\sum _{i=1}^mu_iT_d(p_i)+\sum _{i=1}^nv_iT_d(-x_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+,v\in \mathbb {R}^n_+\right\} . \end{aligned} \end{aligned}$$
(12)

Proposition 6

Problems (10) and (11) are relaxations of problem (6), with \(z_{{\textit{CP}}}\le z_+\) and \(z_{{\textit{SP}},+}\le z_+\).

Theorem 3

For problem (6), its semi-Lagrangian dual optimal value and its Lagrangian dual optimal value satisfy

$$\begin{aligned} \begin{aligned} z_{{\textit{semi}}}&=\sup \left\{ \mu :(\mu ,u)\in {\mathbb {R}}\times \mathbb {R}^m_+, T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+)\right\} ,\\ z_{{\textit{LD}},+}&=\sup \left\{ \mu :(\mu ,u,v)\in {\mathbb {R}}\times {\mathbb {R}}^m_+\times {\mathbb {R}}^n_+, T_d(L_+(x;u,v)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} , \end{aligned} \end{aligned}$$

and

  1. (a)

    \(z_{{\textit{LD}},+}\le z_{{\textit{semi}}}=z_{{\textit{CD}}}\le z_{{\textit{CP}}} \le z_+\).

  2. (b)

    \(z_{{\textit{LD}},+}=z_{{\textit{SD}},+}\le z_{{\textit{SP}},+}\le z_+\).

Proof

By Corollary 1,

$$\begin{aligned} \begin{aligned} {\varTheta }_{{\textit{semi}}}(u)&= \inf \{L(x;u):x\in \mathbb {R}^n_+\}\\&= \sup \left\{ \mu :T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+)\right\} ,\\ {\varTheta }_+(u,v)&= \inf \{L_+(x;u,v):x\in \mathbb {R}^n\}\\&= \sup \left\{ \mu :T_d(L_+(x;u,v)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} , \end{aligned} \end{aligned}$$

then

$$\begin{aligned} \begin{aligned} z_{{\textit{semi}}}&=\sup \{{\varTheta }_{{\textit{semi}}}(u):u\in \mathbb {R}^m_+\}\\&=\sup \left\{ \mu :(\mu ,u)\in {\mathbb {R}}\times {\mathbb {R}}^m_+, T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+)\right\} .\\ z_{{\textit{LD}},+}&=\sup \{{\varTheta }_+(u,v):u\in \mathbb {R}^m_+,v\in \mathbb {R}^n_+\}\\&=\sup \left\{ \mu :(\mu ,u,v)\in {\mathbb {R}}\times {\mathbb {R}}^m_+\times {\mathbb {R}}^n_+, T_d(L_+(x;u,v)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} . \end{aligned} \end{aligned}$$

For (a), from (12), we have,

$$\begin{aligned} \begin{aligned} z_{{\textit{CD}}}&=\sup \left\{ \mu :T_d(p_0)-\mu T_d(1)+\sum _{i=1}^mu_iT_d(p_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+),u\in \mathbb {R}^m_+\right\} \\&=\sup \left\{ \mu :T_d\left( p_0+\sum _{i=1}^m u_ip_i-\mu \right) \in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+),u\in \mathbb {R}^m_+\right\} \\&=\sup \left\{ \mu :(\mu ,u)\in \mathbb {R}\times \mathbb {R}_+^m,T_d(L(x;u)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}_+)\right\} \\&=\sup \{{\varTheta }_{{\textit{semi}}}(u):u\in \mathbb {R}^m_+\}\\&=z_{{\textit{semi}}}. \end{aligned} \end{aligned}$$

Then \(z_{{\textit{CD}}}\le z_{{\textit{CP}}} \le z_+\) is an immediate consequence of weak conic duality and Proposition 6. For (b), from (12), we have

$$\begin{aligned} \begin{aligned} z_{{\textit{SD}},+} =&\sup \left\{ \mu :T_d(p_0-\mu ) +\sum _{i=1}^mu_iT_d(p_i)\right. \\&\left. +\sum _{i=1}^nv_iT_d(-x_i)\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+,v\in \mathbb {R}^n_+\right\} \\ =&\sup \left\{ \mu :T_d\left( p_0(x)+\sum _{i=1}^mu_ip_i(x)-v^{\intercal }x-\mu \right) \in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1}),u\in \mathbb {R}^m_+,v\in \mathbb {R}^n_+\right\} \\ =&\sup \left\{ \mu :(\mu ,u,v)\in {\mathbb {R}}\times {\mathbb {R}}^m_+\times {\mathbb {R}}^n_+, T_d(L_+(x;u,v)-\mu )\in \mathcal{C}_{n+1,d}(\mathbb {R}^{n+1})\right\} \\ =&\sup \{{\varTheta }_+(u,v):u\in \mathbb {R}^m_+,v\in \mathbb {R}^n_+\}\\ =&z_{{\textit{LD}},+}. \end{aligned} \end{aligned}$$

Then \(z_{{\textit{SD}},+}\le z_{{\textit{SP}},+}\le z_+\) holds directly from weak conic duality and Proposition 6.\(\square \)

4 Quadratic reformulation for POPs and its relaxations

In Sect. 3, we showed that CP and CPSD tensor relaxations are at least as tight as Lagrangian relaxations for general POPs. In this section, we will compare CP and CPSD tensor relaxations of a POP with CP and PSD matrix relaxations of the quadratic reformulation of the POP. By quadratic reformulation, we refer to the reformulation of the POP obtained by introducing additional variables and constraints so that all the polynomials involved in the POP can be rewritten as quadratic polynomials in the original and additional variables. Given a QCQP reformulation of a POP, an approximate solution to the POP can be obtained using well-studied SDP or CP relaxations. On the other hand, as discussed in Sect. 3, relaxations for general POPs can be obtained directly by using the CP or the CPSD tensor cones. In general, it is difficult to compare these two relaxation approaches. However, we show in this section that for a fairly wide class of POPs, the tensor relaxation approach provides bounds for the POP that are as tight as the bounds for the POP obtained by using the quadratic reformulation approach described above.

Note that the results stated thus far in the article hold analogously for maximization problems. In what follows, we purposely consider maximization POPs for ease of presentation of Theorem 4, the main result of this section.

4.1 QCQP reformulation of a POP

A general POP can be reformulated as a QCQP in different ways by adding appropriate additional variables and constraints (see, e.g., [43, Sect. 4.5]). In this section, the main focus is on some classes of 4th degree POPs. Thus, we use a specific reformulation approach for such problems; that is, we introduce additional variables to represent the quadratic terms (i.e., the squares of single variables and the products of pairs of variables) of the original problem's variables. Specifically, consider the following POP:

$$\begin{aligned} \begin{aligned}&\sup&p_0(x) \\&\hbox {s.t.}&p_i(x)\le d_i,\ i=1,\ldots ,m_0, \\&&q_j(x)\le 0,\ j=1,\ldots ,m_1,\\&&x\in \mathbb {R}^n_+, \end{aligned} \end{aligned}$$
(13)

where \(p_0(x)\in \mathbb {R}_4[x],q_j(x)\in \mathbb {R}_2[x]\) (recall that \(\mathbb {R}_d[x]:=\{p\in \mathbb {R}[x]:{\text {deg}}(p)\le d \}\)) and \(p_i(x)\) are homogeneous polynomials of degree 4. Problem (13) encompasses a large class of 4th degree optimization problems, including QCQPs. Some problems that belong to this class are biquadratic assignment problems [38, 46], alternating current optimal power flow (ACOPF) problems [11, 23, 28, 32], independent component analysis problems [16], blind channel equalization problems in digital communication [37], and sensor localization problems [7].

Define an index set

$$\begin{aligned} S=\left\{ (a,b,c)\in {\mathbb {N}}^3: a=1,\ldots ,n,~b=a,\ldots ,n,~c=\left( n-\frac{a}{2}\right) (a-1)+b\right\} , \end{aligned}$$
(14)

as the index set for the additional variables to be added, where the index c runs from 1 to \(|S|={n+1 \atopwithdelims ()2}\), the maximum number of additional variables needed to reformulate 4th degree polynomials using 2nd degree polynomials. Specifically, introducing the additional variables \(y_c=x_ax_b, \forall (a,b,c)\in S\), problem (13) can be reformulated as the QCQP

$$\begin{aligned} \begin{aligned}&\sup&q_0(x,y) \\&\hbox {s.t.}&h_i(y)\le d_i,\ i=1,\ldots ,m_0, \\&&q_j(x)\le 0,\ j=1,\ldots ,m_1,\\&&y_c=x_ax_b,\ \forall (a,b,c)\in S,\\&&x\in \mathbb {R}^n_+,~y\in \mathbb {R}^{|S|}_+, \end{aligned} \end{aligned}$$
(15)

where \(q_0(x,y)\) and \(h_i(y)\), \(i=1,\dots , m_0\), are the reformulated quadratic polynomials obtained after replacing \(x_ax_b\) with \(y_c\), \(\forall (a,b,c)\in S\), in \(p_0(x)\) and \(p_i(x)\), \(i=1,\dots ,m_0\), respectively. Clearly, problems (13) and (15) are equivalent. Furthermore, since \(p_i(x)\) and \(h_i(y)\) are homogeneous polynomials, it follows that

$$\begin{aligned} \tilde{p}_i(x)=p_i(x)=h_i(y)=\tilde{h}_i(y),i=1,\ldots ,m_0. \end{aligned}$$
(16)

This fact will be used in Theorem 4 later in this section.
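As a quick sanity check of the index set (14), the following Python sketch (our illustration, not part of the paper) enumerates S in exact integer arithmetic and verifies that the index c ranges bijectively over \(1,\ldots ,{n+1 \atopwithdelims ()2}\):

```python
from math import comb

def index_set(n):
    """Enumerate S from (14): triples (a, b, c) with 1 <= a <= b <= n and
    c = (n - a/2)(a - 1) + b, written in exact integer arithmetic."""
    S = []
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            c = (2 * n - a) * (a - 1) // 2 + b  # equals (n - a/2)(a - 1) + b
            S.append((a, b, c))
    return S

n = 4
S = index_set(n)
# |S| = binom(n+1, 2) additional variables y_c = x_a * x_b
print(len(S), sorted(c for _, _, c in S))  # 10 [1, 2, ..., 10]
```

Each additional variable \(y_c\) thus stands for exactly one product \(x_ax_b\), and no index c is repeated or skipped.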

For ease of notation, let \(z=(x,y)\in \mathbb {R}^{n+|S|}_+\), then (15) is equivalent to

(17)

As an illustration of the quadratic reformulation discussed above, consider the next example.

Example 1

(QCQP reformulation) Consider the following univariate program,

(18)

Let \(y=x^2\) and \(z=(x,y)\in \mathbb {R}^2_+\), then problem (18) is equivalent to

4.2 CP matrix relaxations for a QCQP

Consider the following CP matrix relaxation of problem (17),

(19)

After relaxing the equality constraints \(Z_{1,c+n+1} - Z_{a+1,b+1} = 0, \forall (a,b,c)\in S\) into inequality constraints, we obtain the following CP matrix relaxation of problem (19)

(20)

Proposition 7

If problem (19) is feasible and the coefficients of \(q_0(z)\) in problem (19) are nonnegative, then problems (19) and (20) are equivalent.

Proof

This follows from the fact that problem (20) is a relaxation of problem (19), and the fact that if the coefficients of \(q_0(z)\) are nonnegative, then any optimal solution \(Z \in \mathcal{C}^*_{n+|S|+1,2}(\mathbb {R}^{n+|S|+1}_+)\) of problem (20) would satisfy \(Z_{1,c+n+1} = Z_{a+1,b+1}, \forall (a,b,c)\in S\).\(\square \)

Recall the CP tensor relaxation (10) for general POPs. Below, we apply it directly to problem (17) to obtain the following conic program,

(21)

Problems (19) and (21) can be seen as two different relaxations of the POP (13). In problem (19), the polynomials with degree higher than 2 are reformulated as quadratic polynomials; SDP and CP matrix relaxations for the reformulated QCQP are well studied in the literature ([3, 8, 9, 10, 13, 15, 27, 49], among others). However, the additional variables and constraints introduced in problem (19) may result in the problem not satisfying the conditions required for a QCQP to be equivalent to its associated CP relaxation, even when the original POP satisfies the conditions required for its CP tensor relaxation to be equivalent to the POP. Also, the additional variables and constraints can become substantially burdensome in terms of the problem's size. In contrast, in problem (21) the polynomials with degree higher than 2 are represented by higher-order tensors, which avoids introducing additional variables and constraints. Next, we show that under some conditions the latter relaxation provides bounds that are at least as tight as the ones obtained with the former approach for problem (13).

Lemma 1

([43, Lemma 2]) For any \(d>0\) and \(n>0\), \(\mathcal{C}^*_{n+1,d}(\mathbb {R}^{n+1}_+)=\text {conic}(M_d(\{0,1\}\times \mathbb {R}^n_+)).\)

Theorem 4

Consider a feasible problem (13) in which the coefficients of \(p_0(x)\) are nonnegative. Then problem (19) is a relaxation of problem (21).

Proof

By Proposition 7, problems (19) and (20) are equivalent. Using Lemma 1, for any feasible solution \(X \in \mathcal{C}^*_{n+1,4}(\mathbb {R}^{n+1}_+)\) to problem (21) we have

$$\begin{aligned} X = \sum _{s=1}^{n_1}\lambda _sM_4(1,u_s) + \sum _{t=1}^{n_0}\gamma _tM_4(0,v_t), \end{aligned}$$

for some \(n_0,n_1\ge 0,\lambda _s,\gamma _t>0\) and \(u_s,v_t\in \mathbb {R}^n_+\). Then by using (1),

$$\begin{aligned} \begin{aligned} 1&= \langle T_4(1),X\rangle = \sum _{s=1}^{n_1}\lambda _s,\\ d_i&\ge \langle T_4(p_i),X\rangle = \sum _{s=1}^{n_1}\lambda _sp_i(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{p}_i(v_t), \ i=1,\ldots ,m_0,\\ 0&\ge \langle T_4(q_j),X\rangle = \sum _{s=1}^{n_1}\lambda _sq_j(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{q}_j(v_t), \ j=1,\ldots ,m_1, \end{aligned} \end{aligned}$$
(22)

with an objective function value of \(\sum _{s=1}^{n_1}\lambda _sp_0(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{p}_0(v_t)\). Recall the index set S in (14), and construct vectors \(w_s\), \(w'_t\), for \(s=1,\ldots ,n_1\), \(t=1,\ldots ,n_0\), as follows:

$$\begin{aligned}&(w_s)_c = (u_s)_a(u_s)_b,\ \ (a,b,c)\in S,\nonumber \\&(w'_t)_c = (v_t)_a(v_t)_b,\ \ (a,b,c)\in S. \end{aligned}$$
(23)

Next we show that

$$\begin{aligned} Z = \sum _{s=1}^{n_1}\lambda _sM_2(1,(u_s,w_s)) + \sum _{t=1}^{n_0}\gamma _tM_2(0,(v_t,w'_t)), \end{aligned}$$
(24)

is a feasible solution to problem (19). Clearly, \(Z\in \mathcal{C}^*_{n+|S|+1,2}(\mathbb {R}^{n+|S|+1}_+)\), and from equations (23) and (24), we have

$$\begin{aligned} \begin{aligned} Z_{1,c+n+1}&= \sum _{s=1}^{n_1}\lambda _s(w_s)_c = \sum _{s=1}^{n_1}\lambda _s (u_s)_a(u_s)_b, \forall (a,b,c)\in S,\\ Z_{a+1,b+1}&= \sum _{s=1}^{n_1}\lambda _s (u_s)_a(u_s)_b + \sum _{t=1}^{n_0}\gamma _t (v_t)_a(v_t)_b, \forall (a,b,c)\in S, \end{aligned} \end{aligned}$$

which indicates that \(Z_{1,c+n+1} \le Z_{a+1,b+1},\ \forall (a,b,c)\in S\). From (22), it follows that

$$\begin{aligned} \begin{aligned} \langle T_2(1),Z\rangle&= \sum _{s=1}^{n_1}\lambda _s = 1,\\ \langle T_2(q_j),Z\rangle&= \sum _{s=1}^{n_1}\lambda _sq_j(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{q}_j(v_t)\le 0, \ j=1,\ldots ,m_1.\\ \end{aligned} \end{aligned}$$

Also, given that \(p_i(x), i=1,\dots ,m_0\) are homogeneous polynomials of degree 4, it follows from equations (16) and (22) that

$$\begin{aligned} \langle T_2(h_i),Z\rangle= & {} \sum _{s=1}^{n_1}\lambda _sh_i(w_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{h}_i(w'_t)\nonumber \\= & {} \sum _{s=1}^{n_1}\lambda _sp_i(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{p}_i(v_t)\le d_i, \ i=1,\ldots ,m_0. \end{aligned}$$
(25)

Furthermore, the feasible solution Z has an objective value equal to

$$\begin{aligned} \sum _{s=1}^{n_1}\lambda _sq_0(u_s,w_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{q}_0(v_t,w'_t)= \sum _{s=1}^{n_1}\lambda _sp_0(u_s)+\sum _{t=1}^{n_0}\gamma _t\tilde{q}_0(v_t,w'_t). \end{aligned}$$

Under the condition that \(p_0(x)\) has nonnegative coefficients and \(x\in \mathbb {R}^n_+\), we have

$$\begin{aligned} \sum _{t=1}^{n_0}\gamma _t\tilde{q}_0(v_t,w'_t)\ge \sum _{t=1}^{n_0}\gamma _t\tilde{p}_0(v_t). \end{aligned}$$

Therefore, from any feasible solution to problem (21), we can construct a feasible solution to problem (20) with an objective function value that is at least as large, which indicates that problem (19) is a relaxation of problem (21).\(\square \)
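The construction in the proof can be checked numerically. The following sketch (our illustration; it assumes \(M_2(t,x)\) is the rank-one matrix \((t,x)(t,x)^{\intercal }\), consistent with how the entries of Z are used above) builds Z from random nonnegative atoms via (23)–(24) and verifies the relaxed coupling constraints of (20) together with the normalization \(Z_{11}=1\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
# index set S from (14), in integer arithmetic
S = [(a, b, (2 * n - a) * (a - 1) // 2 + b)
     for a in range(1, n + 1) for b in range(a, n + 1)]

def lift(t, x):
    """M_2(t,(x,w)) as the rank-one matrix (t,x,w)(t,x,w)^T, with w from (23)."""
    w = np.array([x[a - 1] * x[b - 1] for a, b, _ in S])
    z = np.concatenate(([t], x, w))
    return np.outer(z, z)

lams = rng.random(4); lams /= lams.sum()   # convex weights for the (1, u_s) atoms
gams = rng.random(3)                       # positive weights for the (0, v_t) atoms
Z = sum(l * lift(1.0, rng.random(n)) for l in lams) \
  + sum(g * lift(0.0, rng.random(n)) for g in gams)

# relaxed coupling constraints of (20): Z_{1,c+n+1} <= Z_{a+1,b+1} (1-based)
for a, b, c in S:
    assert Z[0, c + n] <= Z[a, b] + 1e-12
print("coupling inequalities hold; Z_11 =", round(Z[0, 0], 12))
```

The atoms with first coordinate 0 contribute to \(Z_{a+1,b+1}\) but not to \(Z_{1,c+n+1}\), which is exactly why only the inequality, not the equality, survives.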

To illustrate the role of the condition that \(p_i(x), i=1,\dots ,m_0\), are homogeneous polynomials of degree 4 in Theorem 4, consider a constraint with two variables, \(p_1(x_1, x_2)=x_1^4+x_1^2x_2\le 1\), \(x_1,x_2 \ge 0\), where \(p_1\) is not homogeneous. For the QCQP reformulation of the constraint, the additional variable \(y=x_1^2\) is introduced. Then \(h_1(x_1, x_2,y)=y^2+yx_2\). Thus

$$\begin{aligned} \tilde{h}_1(x_1,x_2,y)=y^2+yx_2 \ge y^2 = x_1^4 = \tilde{p}_1(x_1, x_2), \end{aligned}$$

which indicates that (16) might not hold; equality (16) is needed for the proof of Theorem 4. Also note that for minimization POPs, Theorem 4 applies after changing the condition on the polynomial \(p_0(x)\) to have nonpositive coefficients.
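A numerical check of this counterexample (our illustration; here \(\tilde{p}_1\) and \(\tilde{h}_1\) are taken as the top-degree homogeneous parts, consistent with the display above):

```python
# p_1(x1, x2) = x1^4 + x1^2 * x2 is not homogeneous; its QCQP lift uses y = x1^2,
# giving h_1(x1, x2, y) = y^2 + y * x2. Compare the top-degree homogeneous parts
# tilde(p_1) = x1^4 and tilde(h_1) = y^2 + y * x2 at a sample point.
x1, x2 = 1.0, 2.0
y = x1 ** 2
tilde_p1 = x1 ** 4            # = 1.0
tilde_h1 = y ** 2 + y * x2    # = 3.0
print(tilde_h1 > tilde_p1)    # True: the identity (16) fails strictly here
```

So for non-homogeneous \(p_i\), the lifted polynomial can dominate the original on the degenerate atoms, breaking the chain of equalities in (16).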

Theorem 4 proves that the CP tensor relaxation can provide bounds that are at least as tight as the ones obtained from CP matrix relaxations of a quadratic reformulation of a class of POPs. In the next section, we provide the results of numerical experiments showing that, indeed, bounds for a POP based on CP tensor relaxations are tighter than the ones obtained via CP matrix relaxations.

5 Numerical comparison of two relaxations for POPs

Unlike the PSD matrix cone, the CPSD tensor cone is not tractable in general; similarly, like the CP matrix cone, the CP tensor cone is not tractable in general. In this section, we discuss and develop tractable approximations for the CP and the CPSD tensor cones. Then, we use these approximations to show that the CP and CPSD tensor relaxations of a POP provide tighter bounds than the CP and PSD relaxations of the QCQP obtained from a quadratic reformulation of the POP.

5.1 Approximation of the CP and CPSD tensor cones

Let us first introduce some additional notation. For \(T=M_d(x),x\in \mathbb {R}^n\), let \(T_{(i_1,\ldots ,i_d)}\) denote the element in position \((i_1,\ldots ,i_d)\) of tensor T, where \((i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d\). More specifically, \(i_j\), \(j=1,\ldots ,d\), indicates which of \(x_1,\ldots ,x_n\) occupies the jth position in the tensor product; e.g., \(i_1=2\) means that \(x_2\) occupies the first position in the tensor product. To illustrate, let \(x\in \mathbb {R}^3\) and let

$$\begin{aligned} T^1=M_2(x)=\begin{pmatrix}x_1^2 &{} x_1x_2 &{} x_1x_3\\ x_1x_2&{} x_2^2 &{} x_2x_3\\ x_1x_3&{}x_2x_3&{}x_3^2\end{pmatrix}, \end{aligned}$$

then \(T^1_{(1,2)}=x_1x_2\) and it is in the (1, 2) position in \(T^1\). Also, for \(T=M_d(x),x\in \mathbb {R}^n\) with \(d>2\), let \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\) denote the matrix whose elements are given by

$$\begin{aligned} \left( T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\right) _{{\textit{jk}}}=T_{(i_1,\ldots ,i_{d-2},j,k)},\ j,k=1,\ldots ,n. \end{aligned}$$

For example, let \(T^2=M_3(x),x\in \mathbb {R}^3\), then

$$\begin{aligned} T^2_{(1,\cdot ,\cdot )}=\begin{pmatrix}x_1^3 &{} x_1^2x_2 &{} x_1^2x_3\\ x_1^2x_2&{} x_1x_2^2 &{} x_1x_2x_3\\ x_1^2x_3&{} x_1x_2x_3 &{} x_1x_3^2\end{pmatrix}, T^2_{(2,\cdot ,\cdot )}=\begin{pmatrix}x_1^2x_2 &{} x_1x_2^2 &{} x_1x_2x_3\\ x_1x_2^2&{} x_2^3 &{} x_2^2x_3\\ x_1x_2x_3&{} x_2^2x_3 &{} x_2x_3^2\end{pmatrix}. \end{aligned}$$
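These slices are easy to verify numerically; in particular, \(T^2_{(i,\cdot ,\cdot )}=x_i\,xx^{\intercal }\). A short sketch (our illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
# M_3(x): the symmetric 3-way outer product, with T_{(i,j,k)} = x_i x_j x_k
T2 = np.einsum('i,j,k->ijk', x, x, x)

# each slice T^2_{(i,.,.)} equals x_i * x x^T
for i in range(3):
    assert np.allclose(T2[i], x[i] * np.outer(x, x))

print(T2[0])  # T^2_{(1,.,.)} = 1 * x x^T = [[1,2,3],[2,4,6],[3,6,9]]
```

This matches the first matrix displayed above entry by entry.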

Definition 5

Let \(T=M_d(x),x\in \mathbb {R}^n\). For any \((i_1,\ldots ,i_{d-2})\in \{1,\ldots ,n\}^{d-2}\), \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\) is a principal matrix if \(I_k\) is even for all \(k=1,\ldots ,n\), where \(I_k\in \{0,\ldots ,d-2\}\) denotes the number of times the index value k appears among \(i_1,\ldots ,i_{d-2}\).

For example, let \(T^3=M_8(x),x\in \mathbb {R}^3\), then

$$\begin{aligned} \begin{aligned}&T^{3}_{(1,1,2,2,3,3,\cdot ,\cdot )}, T^{3}_{(1,2,2,2,1,2,\cdot ,\cdot )}, T^{3}_{(2,3,2,1,3,1,\cdot ,\cdot )} {\text { are principal matrices}};\\&T^{3}_{(1,1,1,2,3,3,\cdot ,\cdot )}, T^{3}_{(1,2,2,2,2,2,\cdot ,\cdot )}, T^{3}_{(2,3,2,2,3,1,\cdot ,\cdot )} {\text { are not principal matrices.}} \end{aligned} \end{aligned}$$
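The parity test of Definition 5 is straightforward to code; the following sketch (our illustration) verifies the six examples above:

```python
from collections import Counter

def is_principal(prefix):
    """A slice T_{(i_1,...,i_{d-2},.,.)} is principal iff every index value
    appears an even number of times among i_1,...,i_{d-2} (Definition 5)."""
    return all(count % 2 == 0 for count in Counter(prefix).values())

# the six prefixes of M_8(x), x in R^3, listed above (d - 2 = 6 indices)
assert is_principal((1, 1, 2, 2, 3, 3))
assert is_principal((1, 2, 2, 2, 1, 2))
assert is_principal((2, 3, 2, 1, 3, 1))
assert not is_principal((1, 1, 1, 2, 3, 3))
assert not is_principal((1, 2, 2, 2, 2, 2))
assert not is_principal((2, 3, 2, 2, 3, 1))
print("all six examples classified correctly")
```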

Next we will discuss the approximation strategies for the CP and the CPSD tensor cones based on PSD and DNN matrices.

Definition 6

A symmetric matrix X is called doubly nonnegative (DNN) if and only if \(X\succeq 0\) and \(X\ge 0\), where \(X\ge 0\) indicates that every element of X is nonnegative.
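A direct numerical membership test for Definition 6 (our illustration; the tolerance is an implementation choice):

```python
import numpy as np

def is_dnn(X, tol=1e-9):
    """Definition 6: X is doubly nonnegative iff it is symmetric,
    positive semidefinite, and entrywise nonnegative."""
    X = np.asarray(X, dtype=float)
    return (np.allclose(X, X.T)
            and np.linalg.eigvalsh(X).min() >= -tol
            and X.min() >= -tol)

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # PSD and entrywise nonnegative
B = np.array([[2.0, -1.0], [-1.0, 2.0]])  # PSD but has a negative entry
print(is_dnn(A), is_dnn(B))  # True False
```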

Proposition 8

Let \(T \in \mathcal{S}_{n,d}\) be a given symmetric tensor.

  1. (a)

    If \(T\in \mathcal{C}_{n,d}^*(\mathbb {R}^n_+)\), then \(T_{(i_1,\ldots ,i_d)}\ge 0, T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\succeq 0, \forall (i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d\).

  2. (b)

If \(T\in \mathcal{C}_{n,d}^*(\mathbb {R}^n)\), then \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\succeq 0\) for all principal matrices \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\), \((i_1,\ldots ,i_{d-2})\in \{1,\ldots ,n\}^{d-2}\).

Proof

For part (a), by Proposition 3 (a), \(T=\sum _{i \in {\mathbb {N}}}\lambda _iM_d(x^i)\), where \(x^i\in \mathbb {R}^n_+, \lambda _i\ge 0\), for all \(i \in {\mathbb {N}}\), and \(\sum _{i \in {\mathbb {N}}} \lambda _i=1\). Then it is clear that \(T_{(i_1,\ldots ,i_d)}\ge 0\), and

$$\begin{aligned} T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}=\sum _{i \in {\mathbb {N}}} \lambda _i\prod _{k=1}^n(x^i_k)^{I_k} (x^i(x^i)^{\intercal }), \end{aligned}$$
(26)

as \(x^i(x^i)^{\intercal }\succeq 0,\forall i \in {\mathbb {N}}\) and \(\prod _{k=1}^n(x^i_k)^{I_k}\ge 0\), then \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\succeq 0\), and \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\ge 0\), for all \((i_1,\ldots ,i_d)\in \{1,\ldots ,n\}^d\). For part (b), notice that if \(T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\) is a principal matrix, then each number of appearances \(I_k\), \(k=1,\ldots ,n\), is even, so \(\prod _{k=1}^n(x^i_k)^{I_k}\ge 0\) in (26) even for \(x^i\in \mathbb {R}^n\), and the proof follows as in part (a).\(\square \)

Example 2

To illustrate Proposition 8, take \(T\in \mathcal{C}^*_{2,4}(\mathbb {R}^2_+)\) as an example. By Proposition 3 (a), \(T=\sum _{i \in {\mathbb {N}}}\lambda _iM_4(x^i)\), where \(\lambda _i\ge 0,\sum _i\lambda _i=1\) and \(x^i\in \mathbb {R}^2_+\). Then for any \(y\in \mathbb {R}^2\),

$$\begin{aligned} y^{\intercal }T_{(1,2,\cdot ,\cdot )}y = y^{\intercal }\left( \sum _{i \in {\mathbb {N}}}\lambda _iM_4(x^i)_{(1,2,\cdot ,\cdot )}\right) y=\sum _{i \in {\mathbb {N}}}\lambda _ix^i_1x^i_2(y^{\intercal }x^i)^2\ge 0, \end{aligned}$$

which indicates that \(T_{(1,2,\cdot ,\cdot )}\) is a \(2\times 2\) positive semidefinite matrix.
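Example 2 can also be verified numerically; the sketch below (our illustration) builds a random element of \(\mathcal{C}^*_{2,4}(\mathbb {R}^2_+)\) as a convex combination of rank-one tensors \(M_4(x^i)\) and checks that all of its matrix slices are PSD and entrywise nonnegative:

```python
import numpy as np

def M4(x):
    # the symmetric 4-way outer product M_4(x)
    return np.einsum('i,j,k,l->ijkl', x, x, x, x)

rng = np.random.default_rng(1)
lams = rng.random(5); lams /= lams.sum()            # convex weights
T = sum(l * M4(rng.random(2)) for l in lams)        # T in C*_{2,4}(R^2_+)

# every slice T_{(i1,i2,.,.)} is a PSD (indeed DNN) 2x2 matrix here
for i1 in range(2):
    for i2 in range(2):
        assert np.linalg.eigvalsh(T[i1, i2]).min() >= -1e-12
        assert T[i1, i2].min() >= 0
print("all 2x2 slices of T are doubly nonnegative")
```

Each slice equals \(\sum _i \lambda _i x^i_{i_1}x^i_{i_2}\,x^i(x^i)^{\intercal }\), a nonnegative combination of PSD matrices, matching the computation in Example 2.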

Next, we discuss linear matrix inequality (LMI) approximations of the CPSD and the CP tensor cones. Based on Proposition 8, define the following tensor cones

$$\begin{aligned} \mathcal{K}^{SDP}_{n,d}= & {} \left\{ T\in \mathcal{S}_{n,d}:T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\succeq 0,\ \forall (i_1,\ldots ,i_{d-2})\in \{1,\ldots ,n\}^{d-2}\right\} ,\nonumber \\ \mathcal{K}^L_{n,d}= & {} \left\{ T\in \mathcal{S}_{n,d}:T_{(i_1,\ldots ,i_d)}\ge 0,\ \forall (i_1,\ldots ,i_{d})\in \{1,\ldots ,n\}^{d}\right\} ,\nonumber \\ \mathcal{K}^{DNN}_{n,d}= & {} \left\{ T\in \mathcal{S}_{n,d}:T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\succeq 0,T_{(i_1,\ldots ,i_{d-2},\cdot ,\cdot )}\right. \nonumber \\&\left. \ge 0,\ \forall (i_1,\ldots ,i_{d-2})\in \{1,\ldots ,n\}^{d-2}\right\} . \end{aligned}$$
(27)

It is easy to see that these are closed convex cones satisfying

$$\begin{aligned} \begin{aligned}&\mathcal{C}^*_{n,d}(\mathbb {R}^n)\subseteq \mathcal{K}^{{\textit{SDP}}}_{n,d}, \\&\mathcal{C}^*_{n,d}(\mathbb {R}^n_+)\subseteq \mathcal{K}^{{\textit{DNN}}}_{n,d}\subseteq \mathcal{K}^L_{n,d}. \end{aligned} \end{aligned}$$
(28)
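The cones in (27) admit direct numerical membership tests, and the inclusion \(\mathcal{C}^*_{n,d}(\mathbb {R}^n_+)\subseteq \mathcal{K}^{{\textit{DNN}}}_{n,d}\) in (28) can be spot-checked on random CP tensors. A sketch (our illustration, for \(d=4\)):

```python
import numpy as np
from itertools import product

def M4(x):
    return np.einsum('i,j,k,l->ijkl', x, x, x, x)

def in_K_SDP(T, tol=1e-9):
    """T in K^SDP_{n,d}: every slice T_{(i_1,...,i_{d-2},.,.)} is PSD, per (27)."""
    n, d = T.shape[0], T.ndim
    return all(np.linalg.eigvalsh(T[idx]).min() >= -tol
               for idx in product(range(n), repeat=d - 2))

def in_K_L(T, tol=1e-9):
    """T in K^L_{n,d}: every entry of T is nonnegative."""
    return T.min() >= -tol

def in_K_DNN(T, tol=1e-9):
    return in_K_SDP(T, tol) and in_K_L(T, tol)

# a CP tensor (nonnegative mixture of M_4(x), x >= 0) passes the DNN test, per (28)
rng = np.random.default_rng(2)
T = sum(rng.random() * M4(rng.random(3)) for _ in range(4))
print(in_K_DNN(T))  # True
```

These membership tests are exactly the LMI and sign constraints imposed by the relaxations [TP-\(\mathcal{K}\)] below.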

Consider the following conic program,

From (28), problem [TP-\(\mathcal{K}\)] is an LMI relaxation of problem (8) when \(\mathcal{K}\) is one of the cones defined in (27). These relaxations allow one to approximately solve general POPs. In particular, it follows that

$$\begin{aligned}&z_{[\mathbf TP -\mathcal{K}^{SDP}] } \le z_{SP} \le z,\\&z_{[\mathbf TP -\mathcal{K}^{L}]} \le z_{[\mathbf TP -\mathcal{K}^{DNN}]} \le z_{CP} \le z_+. \end{aligned}$$

5.2 Numerical results

In Sect. 5.1, several LMI approximations for the CP and the CPSD tensor cones have been proposed to provide tractable relaxations for CP and CPSD tensor programs. In this section, we provide numerical results on more general POP cases (compared to Sect. 4.2) in order to compare the bounds of the two relaxation approaches discussed in Sect. 4.2. Similar to \([\mathbf TP -\mathcal{K}^{L}]\) and \([\mathbf TP -\mathcal{K}^{DNN}]\), denote by \([\mathbf QP _{L}]\) and \([\mathbf QP _{DNN}]\) the linear relaxation and the \({\textit{DNN}}\) relaxation for problem (19). Also, denote by \([\mathbf QP _{SDP}]\) the SDP relaxation for the quadratic reformulation of problem (7). In Table 1, we compare the two approaches in terms of the number and size of PSD matrices for 4th-degree POPs, to show that both approaches result in the solution of similarly sized SDPs.

Table 1 Size comparison of different relaxations for 4th-degree POPs

Next we present some preliminary results on small scale POPs to illustrate the performance of CP and CPSD tensor relaxations. Note that only bounds are compared as the time differences are negligible for the small scale examples considered below. All the numerical experiments are conducted on a 2.4 GHz CPU laptop with 8 GB memory. We implement all the models with YALMIP [34] in Matlab. We use SeDuMi as the SDP solver and CPLEX as the LP solver. For examples in Sects. 5.2.4 and 5.2.5, we use Couenne as the global solver.

5.2.1 Simple POP

Consider the following problem,

(29)

By observation, the optimal value is 1, with an optimal solution \(x_1^*=1,x_k^*=0,k=2,\ldots ,n\). The QCQP reformulation of (29) with the least number of additional variables is

(30)

Relaxation \([\mathbf QP _{L}]\) for (30) gives an optimal value of 0. Relaxation \([\mathbf TP -\mathcal{K}^{L}]\) can be applied directly to (29) and gives an optimal value of 1, which means the tensor relaxation bound is tight.

5.2.2 Bi-quadratic POPs

Bi-quadratic problems and their difficulty have been studied in [33]. Consider the following specific bi-quadratic POP,

(31)

where \(\Vert \cdot \Vert \) is the standard Euclidean norm. It is clear that problem (31) is equivalent to

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^n,y\in \mathbb {R}^m}&&\frac{1}{4}[x^{\intercal }(e_ne_n^{\intercal }-I_n)x][y^{\intercal }(e_me_m^{\intercal }-I_m)y] \\ \hbox {s.t.}&&\Vert x\Vert ^2=1,\Vert y\Vert ^2=1, \\ \end{aligned} \end{aligned}$$

where \(e_n,e_m\) are the all-ones vectors of dimension n and m respectively, and \(I_n,I_m\) are the identity matrices of dimension \(n\times n\) and \(m\times m\). It is clear that the optimal value is \(-\frac{1}{4}(\max \{n,m\}-1)\). By defining an index set

$$\begin{aligned} S(n)= & {} \left\{ (i,j,k)\in {\mathbb {N}}^3:i=1,\ldots ,n-1,j=i+1,\ldots ,n,k\right. \\= & {} \left. \left( n-\frac{i}{2}\right) (i-1)+j-i\right\} , \end{aligned}$$

we can reformulate problem (31) as a QCQP by introducing appropriate additional variables and constraints as

(32)

where \(w\in \mathbb {R}^{|S(n)|}\) and \(z\in \mathbb {R}^{|S(m)|}\), with \(|S(n)|=n(n-1)/2,|S(m)|=m(m-1)/2\). Letting \(u=(x,y,w,z)\), a naive SDP relaxation of problem (32) is given by

(33)

Although we use (33) in the analysis that follows, it is worth mentioning that more elaborate SDP relaxations of (31), which provide bounds with guaranteed performance, are discussed in [33].
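The closed-form optimal value \(-\frac{1}{4}(\max \{n,m\}-1)\) can be confirmed by evaluating the equivalent formulation above at a candidate optimizer; with \(n\ge m\), take \(x=e_n/\sqrt{n}\), which maximizes \(x^{\intercal }(e_ne_n^{\intercal }-I_n)x\) on the unit sphere, and \(y=(e_1-e_2)/\sqrt{2}\), which drives the y-factor to \(-1\) (our illustration):

```python
import numpy as np

def objective(x, y):
    """The equivalent bi-quadratic objective (1/4)[x^T(ee^T-I)x][y^T(ee^T-I)y]."""
    n, m = len(x), len(y)
    qx = x @ (np.ones((n, n)) - np.eye(n)) @ x
    qy = y @ (np.ones((m, m)) - np.eye(m)) @ y
    return 0.25 * qx * qy

n, m = 5, 3  # n >= m
x = np.ones(n) / np.sqrt(n)                           # x-factor attains n - 1
y = np.zeros(m); y[0], y[1] = 1.0, -1.0; y /= np.sqrt(2)  # y-factor equals -1
val = objective(x, y)
print(round(val, 10))  # -(max(n, m) - 1)/4 = -1.0
```

Note \(x^{\intercal }(e_ne_n^{\intercal }-I_n)x=(\sum _i x_i)^2-\Vert x\Vert ^2\), so on the unit sphere the factor ranges over \([-1,n-1]\), and the displayed choice attains the minimum of the product.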

Proposition 9

Problem (33) is unbounded.

Proof

Let \(\bar{u}\) be an \((n+m+|S(n)|+|S(m)|)\times 1\) all-zero vector and let \(\bar{Q}\) be an \((n+m+|S(n)|+|S(m)|)\times (n+m+|S(n)|+|S(m)|)\) matrix such that

$$\begin{aligned}&\bar{Q}_{11}=\bar{Q}_{n+1,n+1}=1,\bar{Q}_{n+m+1,n+m+1}=\bar{Q}_{n+m+|S(n)|+1,n+m+|S(n)|+1}=M^2,\\&\bar{Q}_{n+m+1,n+m+|S(n)|+1}=\bar{Q}_{n+m+|S(n)|+1,n+m+1}=-M, \end{aligned}$$

where M is a positive number, and let all other entries of \(\bar{Q}\) be 0. It is clear that \((\bar{u},\bar{Q})\) is a feasible solution to problem (33). However, as \(M\rightarrow \infty \), the objective function value goes to \(-\infty \); thus the problem is unbounded.\(\square \)
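A partial numerical illustration of this proof (ours, not the paper's; it only checks the properties of \(\bar{Q}\) that are explicit above, under the assumption that feasibility of the naive SDP relaxation (33) requires \(\bar{Q}\succeq 0\)): the matrix \(\bar{Q}\) stays PSD for all \(M\ge 1\) while its off-diagonal entry \(-M\) diverges.

```python
import numpy as np

def Q_bar(n, m, M):
    """Build the matrix defined in the proof of Proposition 9."""
    Sn, Sm = n * (n - 1) // 2, m * (m - 1) // 2
    N = n + m + Sn + Sm
    Q = np.zeros((N, N))
    Q[0, 0] = Q[n, n] = 1.0            # Q_{11} = Q_{n+1,n+1} = 1 (1-based)
    i, j = n + m, n + m + Sn           # 0-based positions n+m+1 and n+m+|S(n)|+1
    Q[i, i] = Q[j, j] = M ** 2
    Q[i, j] = Q[j, i] = -M
    return Q

for M in (1.0, 10.0, 100.0):
    Q = Q_bar(4, 3, M)
    # the 2x2 block [[M^2, -M], [-M, M^2]] has eigenvalues M^2 +/- M >= 0
    assert np.linalg.eigvalsh(Q).min() >= -1e-9
print("Q_bar remains PSD as M grows; its entry -M is unbounded below")
```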

Proposition 9 shows that the relaxation \([\mathbf QP _{SDP}]\) for problem (31) fails to provide a bound. However, a CPSD tensor relaxation can be applied directly to problem (31),

(34)

where \(p_0(x)\) is the objective function of problem (31).

In Table 2, we can see that the relaxation \([\mathbf TP -\mathcal{K}^{SDP}]\) provides the optimal value for problem (31), while the relaxation \([\mathbf QP _{SDP}]\) for the QCQP reformulation of problem (31) fails to give a bound.

Table 2 Relaxation comparisons for Example 5.2.2

5.2.3 Non-convex QCQP

Consider the following nonconvex QCQP,

(35)

The optimal solution of problem (35) is \(x^*=(0,0.6667)^{\intercal }\) with \(f_0(x^*)=-6.4444\) (see [51]). A PSD relaxation and a CP relaxation of (35) have been studied in [51], which give bounds of \(-103.43\) and \(-26.67\), respectively, for problem (35) (refer to Table 2 in [51], where (SDP+RLT) is actually a DNN relaxation for the CP relaxation of (35)).

Now consider the equivalent formulation of (35), obtained after adding the valid inequalities \(x_2f_2(x)\le 0,x_1^2f_1(x)\le 0\):

(36)

The QCQP reformulation of (36) obtained by adding appropriate additional variables and constraints is given by

After using the \([\mathbf QP _{DNN}]\) relaxation on problem (36) with valid inequalities, the original bound of \(-26.67\) obtained from the \([\mathbf QP _{DNN}]\) relaxation on problem (35) is not improved. In contrast, the \([\mathbf TP -\mathcal{K}^{DNN}]\) tensor relaxation on problem (36) provides a tighter bound, \(-12.83\), for the optimal value of (35).

In addition to adding valid inequalities discussed above, adding valid PSD constraints based on the reformulation linearization technique (RLT) can further strengthen the relaxations. Similar to the second order RLT-based valid constraints introduced in [14], using the constraint \(\langle T_4(1),X\rangle =1\), the conic constraint \(X\in \mathcal{C}^*_{3,4}(\mathbb {R}^3)\), and a quadratic constraint \(c_0+c_{10}x_1+c_{01}x_2+ c_{11}x_1^2+c_{12}x_1x_2+c_{22}x_2^2\ge 0\), the following valid PSD–RLT constraints for the CP tensor relaxation of problem (36) can be constructed

$$\begin{aligned} \begin{aligned}&\left( c_0+c_{10}x_1+c_{01}x_2+ c_{11}x_1^2+c_{12}x_1x_2+c_{22}x_2^2\right) X_{(0,0,\cdot ,\cdot )} \\&\quad = c_0X_{(0,0,\cdot ,\cdot )}+c_{10}X_{(1,0,\cdot ,\cdot )} +c_{01}X_{(2,0,\cdot ,\cdot )} + c_{11}X_{(1,1,\cdot ,\cdot )} \\&\qquad +c_{12}X_{(1,2,\cdot ,\cdot )} +c_{22}X_{(2,2,\cdot ,\cdot )} \succeq 0, \end{aligned} \end{aligned}$$
(37)

where \(X_{(0,0,\cdot ,\cdot )} \succeq 0\) as discussed in Sect. 5.1. Note that the CP tensor relaxations allow for the straightforward use of the valid PSD–RLT constraints. More importantly, with the valid PSD–RLT constraints, the \([\mathbf TP -\mathcal{K}^{DNN}]\) tensor relaxation gives the optimal value \(-6.4444\) of problem (35).
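The validity of (37) can be illustrated on a rank-one moment tensor \(X=M_4(z)\), \(z=(1,x_1,x_2)\), with index 0 the homogenizing coordinate: the left-hand side collapses to \(q(x)\,zz^{\intercal }\), which is PSD whenever \(q(x)\ge 0\). A sketch with hypothetical coefficients (our illustration):

```python
import numpy as np

# hypothetical coefficients of q(x) = c0 + c10*x1 + c01*x2
#                                   + c11*x1^2 + c12*x1*x2 + c22*x2^2
c0, c10, c01, c11, c12, c22 = 1.0, 0.5, -0.3, 2.0, 0.1, 1.5

x1, x2 = 0.7, 0.4
z = np.array([1.0, x1, x2])                 # index 0 is the homogenizing coordinate
X = np.einsum('i,j,k,l->ijkl', z, z, z, z)  # rank-one moment tensor M_4(z)

# left-hand side of (37): a combination of the matrix slices X_{(i,j,.,.)}
lhs = (c0 * X[0, 0] + c10 * X[1, 0] + c01 * X[2, 0]
       + c11 * X[1, 1] + c12 * X[1, 2] + c22 * X[2, 2])

q = c0 + c10 * x1 + c01 * x2 + c11 * x1**2 + c12 * x1 * x2 + c22 * x2**2
assert np.allclose(lhs, q * np.outer(z, z))      # collapses to q(x) * z z^T
assert np.linalg.eigvalsh(lhs).min() >= -1e-12   # PSD since q(x) >= 0 here
print("PSD-RLT combination is PSD at this rank-one X")
```

By Lemma 1, a general feasible X is a nonnegative combination of such rank-one atoms (plus degenerate atoms), so the same argument extends to the full relaxation.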

A summary of the numerical results on problem (35) is given in Table 3.

Table 3 Relaxation comparisons for problem (35)

5.2.4 Random objective function with ellipsoidal feasible region

Here, we present numerical results on randomly generated 4th degree POPs with an ellipsoidal feasible region. The test problem is

(38)

where \(p_0(x_1, x_2, x_3)\) is a 4th degree polynomial whose coefficients are randomly selected from the integers in the range \([-5,5]\). The first constraint makes the feasible region non-convex. Also, it is easy to see that the problem is feasible and bounded. We use the \([\mathbf TP -\mathcal{K}^{DNN}]\) relaxation to approximate problem (38) and the \([\mathbf QP _{DNN}]\) relaxation to approximate the QCQP reformulation of problem (38). To compare these relaxations, and following [51], we use the following improvement ratio

$$\begin{aligned} {\textit{ratio}}=\frac{[\mathbf TP -\mathcal{K}^{DNN}]-[\mathbf QP _{DNN}]}{f_{{\textit{opt}}}-[\mathbf QP _{DNN}]}, \end{aligned}$$

where \(f_{{\textit{opt}}}\) denotes the optimal objective value of problem (38).

We also add PSD–RLT constraints to problem (38) using the constraints \((x_1-0.5)^2+(x_2-0.5)^2+(x_3-0.5)^2\ge 0.2^2\) and \((x_1-0.5)^2+(x_2-0.5)^2+(x_3-0.5)^2\le 0.6^2\). In Table 4, the relaxation \([\mathbf TP -\mathcal{K}^{DNN}]\) with PSD–RLT constraints provides the tightest bounds, and in fact, the optimal value of the problems. The relaxation \([\mathbf TP -\mathcal{K}^{DNN}]\) provides tighter bounds than \([\mathbf QP _{DNN}]\) for most test instances. For instances 8, 9, 18 and 20, the relaxation \([\mathbf TP -\mathcal{K}^{DNN}]\) gives the optimal objective value, while \([\mathbf QP _{DNN}]\) is not tight. For instances 15 and 17, \([\mathbf TP -\mathcal{K}^{DNN}]\) and \([\mathbf QP _{DNN}]\) give the same bound. An average improvement ratio of 50% implies that \([\mathbf TP -\mathcal{K}^{DNN}]\) performs better than \([\mathbf QP _{DNN}]\) in approximating problem (38).

Table 4 Bound comparisons for problem (38)

5.2.5 Numerical results on randomly generated POPs

Next, we present numerical results on randomly generated POPs. The objective function is a 4th degree homogeneous polynomial in 3 variables, with two 4th degree polynomial inequality constraints, a linear inequality constraint, and nonnegative variables. The coefficients of the objective function are integers in the range \([-5,5]\), the coefficients of the two polynomial constraints are integers in the range \([-10,10]\), and the coefficients of the linear constraint are integers in the range [0, 5], with a right-hand-side coefficient in the range [5, 15]. We generate the problems and solve them with Couenne. For those problems that Couenne reports feasible, we use \([\mathbf TP -\mathcal{K}^{DNN}]\) to directly approximate the problems and \([\mathbf QP _{DNN}]\) to approximate their QCQP reformulations. Note that the convexity of these problems is not tested. Results are shown in Table 5, where we can clearly see that relaxation \([\mathbf QP _{DNN}]\) fails to give a valid bound for instances 1, 3, 6, 7, 8 and 10, while the tensor relaxation \([\mathbf TP -\mathcal{K}^{DNN}]\) provides a valid lower bound for all tested instances.

Table 5 Relaxation comparisons for randomly generated POPs

6 Conclusion

This article presents convex relaxations for general POPs over CP and CPSD tensor cones. Bomze shows in [8] that CP matrix relaxations are as tight as Lagrangian relaxations for QPs with both linear and quadratic constraints. A natural question is whether similar results hold for general POPs that are not necessarily quadratic. By introducing CP and CPSD tensors to reformulate or relax general POPs, we generalize Bomze's results to general POPs; that is, the CP tensor relaxations are as tight as Lagrangian relaxations for general POPs of degree higher than 2. These results provide another way of using symmetric tensor cones to globally approximate non-convex POPs. Burer shows in [13] that every QP with linear constraints and binary variables can be reformulated as a CP program, and that QCQPs can be reformulated as CP programs under appropriate conditions. Note that one can reformulate a general POP as a QP by introducing additional variables and constraints and then apply Burer's results to obtain global bounds on general POPs. Peña et al. generalize Burer's results in [43] to show that, under certain conditions, a general POP can be reformulated as a conic program over the CP tensor cone. A natural question is which reformulations or relaxations provide tighter bounds for general POPs. In this paper, we show that the bound of CP tensor relaxations is tighter than the bound of CP matrix relaxations of the quadratic reformulation for some classes of general POPs. This validates the advantages of using tensor cones for the convexification of non-convex POPs. We also provide some tractable approximations of the CP tensor cone as well as the CPSD tensor cone, which make it possible to compute the bounds based on these tensor relaxations. Some preliminary numerical results on small scale POPs show that these tensor cone approximations can provide good bounds for the global optimum of the original POPs.
More importantly, in the numerical experiments performed, CP and CPSD tensor cone programs yield tighter bounds than CP and SDP matrix relaxations of the quadratic reformulations of general POPs, with a similar computational effort. In the future, it will be interesting to further characterize the classes of POPs for which the CP and CPSD tensor cone relaxations provide tighter bounds than the CP and PSD matrix relaxations of the associated quadratic reformulations. Also, larger POP instances can be tested, and numerical comparisons on these more complicated POP cases can be made by developing appropriate code to address such problems.