
2.1 Introduction

The central question of this chapter is how to determine whether a given nc polynomial is a sum of hermitian squares (SOHS). We rely on Sect. 1.3, where we explained the basic relations between SOHS polynomials and positive semidefinite Gram matrices. In this chapter we package these results into the Gram matrix method and refine it with the Newton chip method.

2.2 The Gram Matrix Method

Recall from Sect. 1.3 that an nc polynomial \(f \in \mathbb{R}\langle \underline{X}\rangle _{2d}\) is SOHS if and only if we can find a positive semidefinite Gram matrix associated with f, i.e., a positive semidefinite matrix G satisfying \(\mathbf{W}_{d}^{{\ast}}G\mathbf{W}_{d} = f\), where \(\mathbf{W}_{d}\) is the vector of all words of degree ≤ d. This is a semidefinite feasibility problem in the matrix variable G. The constraints \(\langle \,A_{i}\,\vert \,G\,\rangle = b_{i}\) are implied by the fact that for each monomial \(w \in \mathbf{W}_{2d}\) we have

$$\displaystyle{ \sum _{\begin{array}{c}u,v\in \mathbf{W}_{d} \\ u^{{\ast}}v=w \end{array}}G_{u,v} = a_{w}, }$$
(2.1)

where \(a_{w}\) is the coefficient of w in f.
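For instance, for n = 2 and d = 1 we have \(\mathbf{W}_{1} = \left [\begin{array}{ccc} 1&X&Y \end{array} \right ]^{T}\), so the word w = X yields the constraint \(G_{1,X} + G_{X,1} = a_{X}\), while \(w = X^{2}\) yields \(G_{X,X} = a_{X^{2}}\), since (X, X) is the only pair \((u,v) \in \mathbf{W}_{1} \times \mathbf{W}_{1}\) with \(u^{{\ast}}v = X^{2}\).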

Problems like this can, in theory, be solved exactly using quantifier elimination [BPR06], as has been suggested in the commutative case by Powers and Wörmann [PW98]. However, this works only for problems of small size, so in practice a numerical approach is needed: we turn to semidefinite programming.

Sums of hermitian squares are symmetric, so we consider only \(f \in \mathrm{ Sym\,}\mathbb{R}\langle \underline{X}\rangle\). Two symmetric polynomials are equal if and only if all of their “symmetrized coefficients” (i.e., \(a_{w} + a_{w^{{\ast}}}\)) coincide, hence Eq. (2.1) can be rewritten as

$$\displaystyle{ \sum \limits _{\stackrel{u,v\in \mathbf{W}_{d}}{u^{{\ast}}v=w}}G_{u,v} +\sum \limits _{\stackrel{u,v\in \mathbf{W}_{d}}{v^{{\ast}}u=w^{{\ast}}}}G_{v,u} = a_{w} + a_{w^{{\ast}}}\quad \forall w \in \mathbf{W}_{2d}, }$$
(2.2)

or equivalently,

$$\displaystyle{ \langle \,A_{w}\,\vert \,G\,\rangle = a_{w} + a_{w^{{\ast}}}\quad \forall w \in \mathbf{W}_{2d}, }$$
(2.3)

where \(A_{w}\) is the symmetric matrix defined by

$$\displaystyle{(A_{w})_{u,v} = \left \{\begin{array}{ll} 2;&\mbox{ if}\ \ u^{{\ast}}v = w,\ w^{{\ast}} = w, \\ 1;&\mbox{ if}\ \ u^{{\ast}}v \in \{ w,w^{{\ast}}\},\ w^{{\ast}}\neq w, \\ 0;&\mbox{ otherwise}. \end{array} \right.}$$
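To make this concrete, the matrices \(A_{w}\) can be assembled directly from this definition. The following Matlab sketch is ours (the helper constraint_matrix is not part of NCSOStools); it encodes a word as a character array with one character per letter, so that, the variables being symmetric, the involution \(w\mapsto w^{{\ast}}\) is simply string reversal:

function A = constraint_matrix(w, W)
% Build the matrix A_w of (2.3) for a word w, given the word vector W
% (a cell array of char-array words, e.g. {'', 'x', 'y'} for 1, X, Y).
s = numel(W);
A = zeros(s, s);
wstar = fliplr(w);                  % w* is the reversed word
for u = 1:s
    for v = 1:s
        uv = [fliplr(W{u}) W{v}];   % the word u*v
        if strcmp(w, wstar)         % symmetric w: entry 2 where u*v = w
            A(u, v) = 2 * strcmp(uv, w);
        else                        % otherwise entry 1 where u*v is w or w*
            A(u, v) = strcmp(uv, w) || strcmp(uv, wstar);
        end
    end
end
end

For example, constraint_matrix('x', {'', 'x', 'y'}) returns the matrix with entries 2 at positions (1, 2) and (2, 1), encoding \(2G_{1,X} + 2G_{X,1} = a_{X} + a_{X^{{\ast}}} = 2a_{X}\).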

Note that in this formulation the constraints obtained from w and \(w^{{\ast}}\) are the same, so we keep only one of them. As we are interested in an arbitrary positive semidefinite G satisfying constraints (2.3), we can choose the objective function freely. However, in practice one prefers solutions of small rank, as they lead to shorter SOHS decompositions. Hence we minimize the trace, a commonly used heuristic for matrix rank minimization (cf. [RFP10]). Therefore our SDP in primal form is as follows:

$$\displaystyle{ \begin{array}{lll} \inf &\langle \,I\,\vert \,G\,\rangle & \\ \mbox{ s.t.}&\langle \,A_{w}\,\vert \,G\,\rangle = a_{w} + a_{w^{{\ast}}}\quad &\forall w \in \mathbf{W}_{2d} \\ &G\succeq 0 &\end{array} \qquad \mathrm{(SOHSSDP)} }$$
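NCSOStools performs these steps internally (see Sect. 2.5.2). As an independent illustration, here is a minimal Matlab sketch of (SOHSSDP) using the CVX modeling package, assuming the matrices A{1},…,A{m} (built, e.g., with the hypothetical constraint_matrix above) and the symmetrized coefficients b(1),…,b(m) are given for a word vector of length s:

% Sketch: solve (SOHSSDP) with CVX, assuming A, b, s, m are prepared.
cvx_begin sdp
    variable G(s, s) symmetric
    minimize( trace(G) )               % trace heuristic for low rank
    subject to
        for i = 1:m
            trace(A{i} * G) == b(i);   % <A_i | G> = a_w + a_{w*}
        end
        G >= 0;                        % G positive semidefinite
cvx_end

From an optimal \(G\succeq 0\), an SOHS decomposition is read off any factorization \(G = R^{T}R\): the entries \(g_{i}\) of the vector \(R\mathbf{W}_{d}\) satisfy \(f =\sum _{i}g_{i}^{{\ast}}g_{i}\) (cf. Sect. 1.3).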

Summing up, the Gram matrix method can be presented in Algorithm 2.1.

Algorithm 2.1: The Gram matrix method for finding SOHS decompositions

Remark 2.1.

The order of G in (SOHSSDP) is the length of \(\mathbf{W}_{d}\), which is \(\sigma = \frac{n^{d+1}-1} {n-1},\) as shown in Remark 1.12. Since \(\sigma =\sigma (n,d)\) grows exponentially with the polynomial degree d, it easily exceeds the size manageable by the state-of-the-art SDP solvers, which is widely accepted to be of order 1000. For n = 2 this means \(\sigma (2,d) = 2^{d+1} - 1 \leq 1000\), i.e., d ≤ 9, so the above algorithm can only handle nc polynomials in two variables if they are of degree 2d < 20. It is therefore very important to find an improvement of the Gram matrix method able to work with much larger nc polynomials. This will be done in the rest of the chapter.

Example 2.2.

Let

$$\displaystyle{ f = X^{2} - X^{10}Y ^{20}X^{11} - X^{11}Y ^{20}X^{10} + X^{10}Y ^{20}X^{20}Y ^{20}X^{10}. }$$
(2.4)

The order of a Gram matrix G for f is \(\sigma (40) =\sigma (2,40) = 2^{41} - 1\), which is far too big for today’s SDP solvers. Therefore any implementation of Algorithm 2.1 will get stuck. On the other hand, it is easy to see that

$$\displaystyle{f = (X - X^{10}Y ^{20}X^{10})^{{\ast}}(X - X^{10}Y ^{20}X^{10}) \in \varSigma ^{2}.}$$
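Indeed, abbreviating \(u = X^{10}Y ^{20}X^{10}\) and noting \(u^{{\ast}} = u\), multiplying out gives

$$\displaystyle{(X - u)^{{\ast}}(X - u) = X^{2} - Xu - uX + u^{2} = X^{2} - X^{11}Y ^{20}X^{10} - X^{10}Y ^{20}X^{11} + X^{10}Y ^{20}X^{20}Y ^{20}X^{10} = f.}$$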

The polynomial f is sparse and an improved SDP for testing whether (sparse) polynomials are sums of hermitian squares will be given below.

The complexity of solving an SDP is also determined by the number of equations of type (2.3), which we denote by m. There are exactly

$$\displaystyle{m =\mathrm{ card}\{w \in \mathbf{W}_{2d}\mid w^{{\ast}} = w\} + \frac{1} {2}\mathrm{card}\{w \in \mathbf{W}_{2d}\mid w^{{\ast}}\neq w\}}$$

such equations in (SOHSSDP). Since \(\mathbf{W}_{d}\) contains all words in \(\langle \underline{X}\rangle\) of degree ≤ d, we have \(m > \frac{1} {2}\sigma (2d) = \frac{n^{2d+1}-1} {2(n-1)}\). For instance, for n = 2 and d = 1 there are m = 6 equations: five for the symmetric words \(1,X,Y,X^{2},Y ^{2}\) and one for the pair \(\{XY,Y X\}\).

For each \(w \in \mathbf{W}_{2d}\) there are t different pairs \((u_{i},v_{i})\) such that \(w = u_{i}^{{\ast}}v_{i}\), where t = deg w + 1 if deg w ≤ d, and t = 2d + 1 − deg w if deg w ≥ d + 1; for example, for d = 2 the word w = XY admits the t = 3 pairs (1, XY ), (X, Y ), and (Y X, 1). Note that t ≤ d + 1. Therefore the matrices \(A_{i}\) defining constraints ( 2.3) have order \(\sigma (d)\), and every matrix \(A_{i}\) has at most d + 1 nonzero entries if it corresponds to a symmetric monomial of f, and at most 2(d + 1) nonzero entries otherwise. Hence the matrices \(A_{i}\) are sparse. They are also pairwise orthogonal with respect to the standard scalar product on matrices \(\langle \,X\,\vert \,Y \,\rangle =\mathrm{ tr\,}X^{T}Y\), and have disjoint supports, as we now proceed to show:

Theorem 2.3.

Let \(\{A_{i}\mid i = 1,\ldots,m\}\) be the matrices constructed in Step 3 of Algorithm  2.1 [i.e., matrices satisfying  (2.3) ]. If \((A_{i})_{u,v}\neq 0\) , then \((A_{j})_{u,v} = 0\) for all j ≠ i. In particular, \(\langle \,A_{i}\,\vert \,A_{j}\,\rangle = 0\) for i ≠ j.

Proof.

The equations in the SDP underlying the SOHS decomposition represent the constraints that the monomials in \(\mathbf{W}_{2d}\) must have coefficients prescribed by the polynomial f. Let us fix i ≠ j. The matrices \(A_{i}\) and \(A_{j}\) correspond to some monomials \(p_{1}^{{\ast}}q_{1}\) and \(p_{2}^{{\ast}}q_{2}\) (\(p_{i},q_{i} \in \mathbf{W}_{d}\)), respectively, with \(p_{1}^{{\ast}}q_{1}\neq p_{2}^{{\ast}}q_{2}\). If \(A_{i}\) and \(A_{j}\) both had a nonzero entry at position (u, v), then \(p_{1}^{{\ast}}q_{1} = u^{{\ast}}v = p_{2}^{{\ast}}q_{2}\), a contradiction. ■ 

Remark 2.4.

Sparsity and orthogonality of the constraints imply that the state-of-the-art SDP solvers can handle about 100,000 such constraints (see, e.g., [MPRW09]) if the order of the matrix variable is about 1000. The boundary point method introduced in [PRW06] and analyzed in [MPRW09] has turned out to perform best for semidefinite programs of this type. It is able to exploit the orthogonality of the matrices \(A_{i}\) (though not the disjointness of their supports): in the computationally most expensive step, solving a linear system, the system matrix becomes diagonal, so solving the system amounts to dividing by the corresponding diagonal entries.

Since \(\mathbf{W}_{d}\) contains all words in \(\langle \underline{X}\rangle\) of degree ≤ d, we have, e.g., for n = 2 and d = 10 that \(m > \frac{1} {2}\sigma (2,20) = \frac{2^{21}-1} {2} > 10^{6}\), which is clearly out of reach for all current SDP solvers. Nevertheless, we show in the sequel that one can replace the vector \(\mathbf{W}_{d}\) in Step 2 of Algorithm 2.1 by a vector W which is usually much smaller and has at most kd words, where k is the number of symmetric monomials in f and 2d = deg f. Hence the order of the matrix variable G and the number of linear constraints m end up being much smaller in general.

2.3 Newton Chip Method

We present a modification of (Step 1 of) the Gram matrix method (Algorithm 2.1): the appropriate non-commutative analogue of the classical Newton polytope method [Rez78], which we call the Newton chip method and present as Algorithm 2.2.

Definition 2.5.

Let us define the right chip function \(\mathrm{rc}:\langle \underline{ X}\rangle \times \mathbb{N}_{0} \rightarrow \langle \underline{ X}\rangle\) by

$$\displaystyle{ \mathrm{rc}(w_{1}\cdots w_{n},i):= \left \{\begin{array}{ll} w_{n-i+1}w_{n-i+2}\cdots w_{n}&\mbox{ if }1 \leq i \leq n; \\ w_{1}\cdots w_{n} &\mbox{ if }i > n; \\ 1 &\mbox{ if }i = 0. \end{array} \right. }$$

Example 2.6.

Given the word \(w = X_{1}X_{2}X_{1}X_{2}^{2}X_{1} \in \langle \underline{ X}\rangle\) we have \(\mathrm{rc}(w,4) = X_{1}X_{2}^{2}X_{1}\), rc(w, 6) = w and rc(w, 0) = 1. 
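In Matlab (the language of NCSOStools, see Sect. 2.5.2), the right chip function is a few lines on words stored as character arrays; the helper rc below is our sketch, not part of NCSOStools:

function c = rc(w, i)
% Right chip rc(w, i): the last i letters of the word w (a char array
% with one character per letter, e.g. 'xyxyyx'); the whole word if i
% exceeds its length; the empty word 1 (here '') if i = 0.
n = length(w);
if i == 0
    c = '';
elseif i > n
    c = w;
else
    c = w(n-i+1:n);
end
end

For instance, rc('xyxyyx', 4) returns 'xyyx', matching Example 2.6.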

We introduce the Newton chip method, presented as Algorithm 2.2. It substantially reduces the word vector needed in the Gram matrix method.

Theorem 2.7.

Suppose \(f \in \mathrm{ Sym\,}\mathbb{R}\langle \underline{X}\rangle\) . Then \(f \in \varSigma ^{2}\) if and only if there exists a positive semidefinite matrix G satisfying

$$\displaystyle{ f = \mathbf{W}^{{\ast}}G\mathbf{W}, }$$

where W is the output given by the Newton chip method (Algorithm  2.2 ).

Proof.

Suppose \(f \in \varSigma ^{2}\). In every SOHS decomposition

$$\displaystyle{ f =\sum _{i}g_{i}^{{\ast}}g_{ i}, }$$

only words from \(\mathcal{D}\) (constructed in Step 4) are used, i.e., \(g_{i} \in \mathrm{span}\,\mathcal{D}\) for every i. This follows from the fact that the lowest and highest degree terms cannot cancel (cf. proof of Proposition 1.16). Let \(\mathcal{W}:=\bigcup _{i}\mathcal{W}_{g_{i}}\) be the union of the supports of the \(g_{i}\). We shall prove that \(\mathcal{W}\subseteq W\). For this, let us introduce a partial ordering on \(\langle \underline{X}\rangle\):

$$\displaystyle{w_{1}\preceq w_{2}\; \Leftrightarrow \;\exists \,i \in \mathbb{N}_{0}:\,\mathrm{ rc}(w_{2},i) = w_{1}.}$$

Note: \(w_{1}\preceq w_{2}\) if and only if there is a \(v \in \langle \underline{ X}\rangle\) with \(w_{2} = vw_{1}\).

Claim.

For every \(w \in \mathcal{W}\) there exists \(u \in \langle \underline{ X}\rangle\): \(w\preceq u\preceq u^{{\ast}}u \in \mathcal{W}_{f}\).

Proof.

Clearly, \(w^{{\ast}}w\) is a word that appears, for some i, in the representation of \(g_{i}^{{\ast}}g_{i}\) which one naturally gets by multiplying out without simplifying. If \(w^{{\ast}}w\not\in \mathcal{W}_{f}\), then there are \(w_{1},w_{2} \in \mathcal{W}\setminus \{w\}\) with \(w_{1}^{{\ast}}w_{2} = w^{{\ast}}w\) (appearing with a negative coefficient so as to cancel the \(w^{{\ast}}w\) term). Then \(w\preceq w_{1}\) or \(w\preceq w_{2}\); without loss of generality, \(w\preceq w_{1}\). Continuing the same line of reasoning, but starting with \(w_{1}^{{\ast}}w_{1}\), we eventually arrive at \(w_{\ell} \in \mathcal{W}\) with \(w_{\ell}^{{\ast}}w_{\ell} \in \mathcal{W}_{f}\) and \(w\preceq w_{1}\preceq \cdots \preceq w_{\ell}\); this process terminates since the degrees strictly increase and are bounded by d. Thus \(w\preceq w_{\ell}\preceq w_{\ell}^{{\ast}}w_{\ell} \in \mathcal{W}_{f}\), concluding the proof of the claim.

The theorem now follows: since \(u^{{\ast}}u \in \mathcal{W}_{f}\) and w is a right chip of u, we have w ∈ W. ■ 

Algorithm 2.2: The Newton chip method
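As an illustration, here is a minimal Matlab sketch of the Newton chip method in the word encoding used above (it relies on the hypothetical helper rc; the admissible chip degrees are assumed to run from half the minimal degree of f to half the degree of f, matching the set \(\mathcal{D}\) in the proof of Theorem 2.7):

function W = newton_chip(monomials)
% Sketch of Algorithm 2.2. monomials: cell array with the support of f,
% each word a char array with one character per letter (so w* is
% fliplr(w), the variables being symmetric). Returns the candidate
% word vector W as a cell array.
degs = cellfun(@length, monomials);
mind = min(degs) / 2;                        % half the minimal degree of f
maxd = max(degs) / 2;                        % half the degree of f
W = {};
for k = 1:numel(monomials)
    w = monomials{k};
    n = length(w);
    % hermitian squares u*u in the support of f are exactly the
    % palindromes of even degree; u is then the right half of w
    if mod(n, 2) == 0 && strcmp(w, fliplr(w))
        u = w(n/2 + 1 : n);
        for i = ceil(mind) : min(length(u), floor(maxd))
            W{end+1} = rc(u, i);             % admissible right chips
        end
    end
end
W = unique(W);                               % remove duplicates
end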

Example 2.8 (Example 2.2 Continued).

The polynomial f from Example 2.2 has two hermitian squares: \(X^{2}\) and \(X^{10}Y ^{20}X^{20}Y ^{20}X^{10}\). The first hermitian square contributes via the Newton chip method only one right chip: X; the second hermitian square \(X^{10}Y ^{20}X^{20}Y ^{20}X^{10}\) contributes to W the words \(X,X^{2},\ldots,X^{10}\) as well as \(Y X^{10},Y ^{2}X^{10},\ldots,Y ^{20}X^{10},XY ^{20}X^{10},\ldots,X^{10}Y ^{20}X^{10}\).

Applying the Newton chip method to f therefore yields the vector W, with entries listed in lexicographic order,

$$\displaystyle{\mathbf{W} = \left [\begin{array}{cccccccccccccc} X\ &X^{2} & \cdots &X^{10}\ & Y X^{10}\ & \cdots &Y ^{20}X^{10}\ & XY ^{20}X^{10}\ & \cdots &X^{10}Y ^{20}X^{10} \end{array} \right ]^{T}}$$

of length 40. Problems of this size are easily handled by today’s SDP solvers. Nevertheless we provide a further strengthening of our Newton chip method reducing the number of words needed in this example to 2 (see Sect. 2.4).
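With the hypothetical sketches above, the vector W of Example 2.8 can be reproduced as follows (x and y standing for X and Y):

% Support of f from (2.4), one character per letter:
mons = {repmat('x',1,2), ...
        [repmat('x',1,10) repmat('y',1,20) repmat('x',1,11)], ...
        [repmat('x',1,11) repmat('y',1,20) repmat('x',1,10)], ...
        [repmat('x',1,10) repmat('y',1,20) repmat('x',1,20) ...
         repmat('y',1,20) repmat('x',1,10)]};
W = newton_chip(mons);
numel(W)     % returns 40, as computed above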

2.4 Augmented Newton Chip Method

The following simple observation is often crucial to reduce the size of W returned by the Newton chip method.

Lemma 2.9.

Suppose W is the vector of words returned by the Newton chip method. If there exists a word u ∈ W such that the constraint in  (SOHSSDP) corresponding to \(u^{{\ast}}u\) can be written as

$$\displaystyle{\langle \,A_{u^{{\ast}}u}\,\vert \,G\,\rangle = 0}$$

and \(A_{u^{{\ast}}u}\) is a diagonal matrix (i.e., \((A_{u^{{\ast}}u})_{u,u} = 2\) and \(A_{u^{{\ast}}u}\) is 0 elsewhere), then we can eliminate u from W and likewise delete this equation from the semidefinite program.

Proof.

Indeed, such a constraint implies that \(G_{u,u} = 0\) for the given u ∈ W. Since G is positive semidefinite, every 2 × 2 principal submatrix \(\left [\begin{array}{cc} G_{u,u} & G_{u,v} \\ G_{v,u} & G_{v,v}\end{array} \right ]\) must be positive semidefinite as well, forcing \(G_{u,v}^{2} \leq G_{u,u}G_{v,v} = 0\); hence the uth row and column of G must be zero. So we can decrease the order of (SOHSSDP) by deleting the uth row and column from G and by deleting this constraint. ■ 

Lemma 2.9 applies if and only if there exists a constraint \(\langle \,A_{w}\,\vert \,G\,\rangle = 0\), where \(w = u^{{\ast}}u\) for some u ∈ W and \(w\neq v^{{\ast}}z\) for all v, z ∈ W with v ≠ z. Therefore we augment the Newton chip method (Algorithm 2.2) by new steps, as shown in Algorithm 2.3.

Algorithm 2.3: The Augmented Newton chip method
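The reduction step can be sketched in Matlab as follows (again in our hypothetical word encoding; f_support is the support of f as a cell array of words):

function W = augmented_newton_chip(W, f_support)
% Sketch of the reduction in Algorithm 2.3: repeatedly delete u from W
% whenever u*u does not occur in f and (u, u) is the only pair (v, z)
% in W x W with v*z = u*u (Lemma 2.9).
changed = true;
while changed
    changed = false;
    for k = 1:numel(W)
        u = W{k};
        w = [fliplr(u) u];                        % the word u*u
        if any(strcmp(f_support, w)), continue; end
        cnt = 0;                                  % count pairs (v, z) in W x W
        for a = 1:numel(W)                        % with v*z = u*u
            for b = 1:numel(W)
                cnt = cnt + strcmp([fliplr(W{a}) W{b}], w);
            end
        end
        if cnt == 1                               % only (u, u) itself
            W(k) = [];                            % delete u, rescan from scratch
            changed = true;
            break
        end
    end
end
end

On the polynomial from Example 2.2 this loop should shrink the 40 words produced by the Newton chip method to the two words obtained in Example 2.10 below.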

Note that in Step 2 there might exist a word u ∈ W which does not satisfy the condition initially, but does so after another word u′ has been deleted from W. We demonstrate Algorithm 2.3 in the following example:

Example 2.10 (Example 2.2 Continued).

By applying the Augmented Newton chip method to f from (2.4) we reduce the vector W significantly. Note that after Step 1, W also contains the words \(X^{8},X^{9},X^{10}\). Although \(X^{18}\) does not appear in f, we cannot delete \(X^{9}\) from W immediately, since \(X^{18} = (X^{9})^{{\ast}}X^{9} = (X^{8})^{{\ast}}X^{10}\). But we can delete \(X^{10}\), since \(X^{20}\) does not appear in f and \((X^{10})^{{\ast}}X^{10}\) is the unique decomposition of \(X^{20}\) inside W. After deleting \(X^{10}\) from W we see that \((X^{9})^{{\ast}}X^{9}\) becomes the unique decomposition of \(X^{18}\), hence we can eliminate \(X^{9}\) too. Eventually the Augmented Newton chip method returns

$$\displaystyle{\mathbf{W} = \left [\begin{array}{cc} X\ &X^{10}Y ^{20}X^{10} \end{array} \right ]^{T},}$$

which is exactly the minimum vector needed for the SOHS decomposition of f.

2.5 Implementation

2.5.1 On the Gram Matrix Method

The Gram matrix method (Algorithm 2.1) consists of two main parts: (1) constructing the matrices corresponding to (SOHSSDP) (Step 3), and (2) solving the constructed SDP (Step 4). The first part is straightforward: running the Augmented Newton chip method (Algorithm 2.3) gives the desired vector of relevant words, and there are no numerical or convergence issues, as Algorithm 2.3 always terminates with the desired vector W.

The second main part is more subtle. Solving an instance of an SDP in practice always involves highly numerical algorithms: computing spectral decompositions, solving systems of linear equations, inverting matrices, etc. Methods for solving SDPs, especially interior point methods [dK02, Ter96, WSV00], but also some first order methods [MPRW09, PRW06], typically assume the existence of strictly feasible solutions on both the primal and the dual side, which implies the strong duality property and the attainability of the optima on both sides. Moreover, this assumption also guarantees that most of the methods converge to a primal–dual \(\varepsilon\)-optimal solution; see also Sect. 1.13.

As the following example demonstrates, the Slater condition is not necessarily satisfied on the primal side in our class of (SOHSSDP) problems.

Example 2.11.

Let \(f = (XY + X^{2})^{{\ast}}(XY + X^{2})\). It is homogeneous, and the Augmented Newton chip method gives

$$\displaystyle{\mathbf{W} = \left [\begin{array}{cc} X^{2} \\ XY \end{array} \right ].}$$

There exists a unique symmetric Gram matrix

$$\displaystyle{G = \left [\begin{array}{cc} 1&1\\ 1 &1 \end{array} \right ]}$$

for f such that \(f = \mathbf{W}^{{\ast}}G\mathbf{W}\); indeed, each of the four words \(X^{4},X^{3}Y,Y X^{3},Y X^{2}Y\) appearing in f has a unique representation \(u^{{\ast}}v\) with u, v ∈ W, so every entry of G is forced. Clearly G, a rank 1 matrix, is the only feasible solution of (SOHSSDP), hence the corresponding SDP has no strictly feasible solution on the primal side.

If we take the objective function in our primal SDP (SOHSSDP) to be equal to \(\langle \,I\,\vert \,G\,\rangle\), then the pair y = 0, Z = I is always strictly feasible for the dual problem of (SOHSSDP) and thus we do have the strong duality property.
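Explicitly, with the dual variable y indexed by the constraints of (SOHSSDP), the dual problem reads (a sketch in the standard form of Sect. 1.13):

$$\displaystyle{\sup \ \sum _{w}(a_{w} + a_{w^{{\ast}}})\,y_{w}\quad \mbox{ s.t.}\quad Z = I -\sum _{w}y_{w}A_{w}\succeq 0,}$$

and y = 0 indeed gives \(Z = I \succ 0\).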

Hence, when the given nc polynomial is in \(\varSigma ^{2}\), the corresponding semidefinite program (SOHSSDP) is feasible and the optimal value is attained. If there is no strictly feasible solution, then numerical difficulties might arise, but state-of-the-art SDP solvers such as SeDuMi [Stu99], SDPT3 [TTT99], SDPA [YFK03], or MOSEK [ApS15] are able to overcome them in most instances. When the given nc polynomial is not in \(\varSigma ^{2}\), the semidefinite program (SOHSSDP) is infeasible, which might cause numerical problems as well. However, state-of-the-art SDP solvers are generally robust and can reliably detect infeasibility for most practical problems; for more details see [dKRT98, PT09].

2.5.2 Software Package NCSOStools

The software package NCSOStools  [CKP11] was developed to help researchers working in the area of non-commutative polynomials. It is an open source Matlab toolbox for solving SOHS related problems using semidefinite programming, and it also implements symbolic computation with non-commuting variables in Matlab.

There is a small overlap in features with Helton’s NCAlgebra package for Mathematica [HMdOS15]. However, NCSOStools  [CKP11] performs basic manipulations with non-commuting variables and is mainly oriented to detect several variants of constrained and unconstrained positivity of nc polynomials, while NCAlgebra is a fully fledged add-on for symbolic computation with polynomials, matrices, and rational functions in non-commuting variables.

When we started writing NCSOStools we decided to use Matlab as a main framework since we solve the underlying SDP instances by existing open source solvers like SeDuMi [Stu99], SDPT3 [TTT99], or SDPA [YFK03] and these solvers can be very easily run within Matlab.

Readers interested in solving sums of squares problems for commuting polynomials are referred to one of the many great existing packages, such as SOSTOOLS [PPSP05], SparsePOP [WKK+09], GloptiPoly [HLL09], or YALMIP [Löf04].

Detecting sums of hermitian squares by the Gram matrix method and using the (Augmented) Newton chip method can be done within NCSOStools by calling NCsos.

Example 2.12 (Example 2.11 Continued).

We declare the polynomial f that we started considering in Example 2.11 within NCSOStools by

>> NCvars x y

>> f=(x*y+x^2)’*(x*y+x^2)

By calling

>> [IsSohs,Gr,W,sohs,g,SDP_data,L] = NCsos(f)

we obtain that f is SOHS (IsSohs=1), the vector given by the Augmented Newton chip method (W), and the corresponding Gram matrix Gr:

W =

    ’x*x’

    ’x*y’

Gr =

    1.0000    1.0000

    1.0000    1.0000

Likewise we obtain the SOHS decomposition of f

sohs =

       x^2+x*y

   2.2e-07*x*y

which means that the SOHS decomposition for f is

$$\displaystyle{f = (X^{2} + XY )^{{\ast}}(X^{2} + XY ) + (2.2 \cdot 10^{-7}XY )^{{\ast}}(2.2 \cdot 10^{-7}XY ).}$$

This is \(\varepsilon\)-correct for \(\varepsilon = 10^{-13}\), i.e., if we cut off all monomials with coefficients smaller than \(10^{-13}\), we obtain f. We can control the precision using the parameter pars.precision: all monomials in sohs having coefficient smaller than pars.precision are ignored. Therefore by running

>> pars.precision=1e-6;

>> [IsSohs,Gr,W,sohs,g,SDP_data,L] = NCsos(f,pars);

we obtain an exact SOHS decomposition of f, i.e., f is exactly the sum of hermitian squares of the elements of sohs.

The data describing the semidefinite program (SOHSSDP) is returned in SDP_data, while the optimal matrix for the dual problem of (SOHSSDP) is returned in L. In g we return the sum of hermitian squares of the entries of sohs, keeping only monomials with coefficients larger than \(10^{-8}\), which is an internal parameter.