1 Introduction

Principal component analysis (PCA) plays an important role in applications arising from data analysis, dimension reduction, bioinformatics, and beyond. PCA finds a few linear combinations of the original variables. These linear combinations, called principal components (PCs), are orthogonal to each other and explain most of the variance of the data. PCs thus provide a powerful tool for compressing data along the directions of maximum variance with minimum information loss. Specifically, let \(\xi = (\xi _1,\ldots ,\xi _m)\) be an \(m\)-dimensional random vector. Then for a given data matrix \(A\in \mathbf {R}^{m\times n}\) which consists of \(n\) samples of the \(m\) variables, finding the PC that explains the largest variance of the variables \((\xi _1,\ldots ,\xi _m)\) corresponds to the following optimization problem:

$$\begin{aligned} \left( \lambda ^*,x^*,y^*\right) := \mathop {\mathrm {argmin}}\limits _{\lambda \in \mathbf {R},\, x\in \mathbf {R}^m,\, y\in \mathbf {R}^n}\Vert A - \lambda x y^{\top }\Vert . \end{aligned}$$
(1)

Problem (1) is well known to be reducible to computing the largest singular value (and corresponding singular vectors) of \(A\), and can be equivalently formulated as:

$$\begin{aligned} \begin{array}{l@{\quad }l} \mathop {\max }\limits _{x,y} &{} \left( \begin{array}{c} x \\ y \end{array} \right) ^{\top } \left( \begin{array}{c@{\quad }c} 0 &{} A \\ A^{\top } &{} 0 \end{array} \right) \left( \begin{array}{c} x \\ y \end{array} \right) \\ \text{ s.t. } &{} \left\| \left( \begin{array}{c} x \\ y \end{array} \right) \right\| = 1. \end{array} \end{aligned}$$
(2)

Note that the optimal value and the optimal solution of Problem (2) correspond to the largest eigenvalue and the corresponding eigenvector of the symmetric matrix \(\begin{pmatrix} 0 &{}\quad A \\ A^{\top } &{}\quad 0\end{pmatrix}\).

Although PCA and the eigenvalue problem for matrices have been well studied in the literature, research on PCA for tensors is still relatively scarce. Nevertheless, tensor PCA is of great importance in practice and has many applications in computer vision [56], diffusion Magnetic Resonance Imaging (MRI) [2, 19, 50], the quantum entanglement problem [27], spectral hypergraph theory [30] and higher-order Markov chains [37]. This is mainly because in real life we often encounter multidimensional data, including images, video, range data and medical data such as CT and MRI. A color image can be considered as 3D data, with row, column and color in each direction, while a color video sequence can be considered as 4D data, where time is the fourth dimension. Moreover, it is often more reasonable to treat multidimensional data as a tensor instead of unfolding it into a matrix. For example, Wang and Ahuja [56] reported that the images obtained by the tensor PCA technique have higher quality than those obtained by matrix PCA. Similar to its matrix counterpart, the problem of finding the PC that explains the most variance of a tensor \(\mathcal {A}\) (of order \(m\)) can be formulated as:

$$\begin{aligned} \begin{array}{l@{\quad }l} \min &{} \Vert \mathcal {A} - \lambda x^1\otimes x^2 \otimes \cdots \otimes x^m \Vert \\ \text{ s.t. } &{} \lambda \in \mathbf {R},\, \Vert x^i\Vert =1, i=1,2,\ldots ,m, \end{array} \end{aligned}$$
(3)

which is equivalent to

$$\begin{aligned} \begin{array}{ll} \max &{} \mathcal {A} (x^1, x^2, \ldots , x^m) \\ \text{ s.t. } &{} \Vert x^i\Vert =1, i=1,2,\ldots ,m, \end{array} \end{aligned}$$
(4)

where \(\otimes \) denotes the outer product between vectors; viz.

$$\begin{aligned} (x^1\otimes x^2 \otimes \cdots \otimes x^m)_{i_1 i_2\ldots i_m} = \prod \limits _{k=1}^m (x^k)_{i_k}. \end{aligned}$$

Let us call the above solution the leading PC. Once the leading PC is found, the other PCs can be computed sequentially via the so-called “deflation” technique. For instance, the second PC is defined as the leading PC of the residual tensor obtained by subtracting the leading PC from the original tensor, and so forth. The theoretical basis of such a deflation procedure for tensors is not entirely sound, although its matrix counterpart is well established (see [44] and the references therein for more details). Nevertheless, the deflation process does provide a heuristic way to compute multiple PCs of a tensor, albeit approximately. Thus, in the rest of this paper, we focus on finding the leading PC of a tensor.

Problem (4) is also known as the best rank-one approximation of tensor \(\mathcal A\). As we shall see later, problem (4) can be reformulated as

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F} (x, x, \ldots , x) \\ \text{ s.t. } &{} \Vert x \Vert = 1, \end{array} \end{aligned}$$
(5)

where \(\mathcal {F}\) is a super-symmetric tensor. Problem (5) is NP-hard and is known as the maximum Z-eigenvalue problem. Note that a variety of eigenvalues and eigenvectors of a real symmetric tensor were introduced independently by Lim [38] and Qi [48] in 2005. Since then, various methods have been proposed to find Z-eigenvalues [10, 31–33, 49], which, however, may correspond only to local optima. In this paper, we shall focus on finding the global optimal solution of (5).

Before proceeding, let us introduce the notation that will be used throughout the paper. We denote by \(\mathbf {R}^n\) the \(n\)-dimensional Euclidean space. A tensor is a high-dimensional array of real data, usually written in a calligraphic letter and denoted as \(\mathcal {A}=(\mathcal {A}_{i_1i_2\ldots i_m})_{n_1\times n_2\times \cdots \times n_m}\). The space of \(n_1\times n_2\times \cdots \times n_m\) real-valued tensors is denoted by \(\mathbf {R}^{n_1\times n_2\times \cdots \times n_m}\). We call \(\mathcal {A}\) super-symmetric if \(n_1=n_2=\cdots =n_m\) and \(\mathcal {A}_{i_1i_2\ldots i_m}\) is invariant under any permutation of \((i_1,i_2,\ldots ,i_m)\), i.e., \(\mathcal {A}_{i_1i_2\ldots i_m} = \mathcal {A}_{\pi (i_1,i_2,\ldots ,i_m)}\), where \(\pi (i_1,i_2,\ldots ,i_m)\) is any permutation of the indices \((i_1,i_2,\ldots ,i_m)\). The space of \(\underbrace{n\times n \times \cdots \times n}_{m}\) super-symmetric tensors is denoted by \(\mathbf {S}^{n^m}\). Special cases of tensors are vectors (\(m=1\)) and matrices (\(m=2\)); conversely, a tensor can also be viewed as a long vector or as a specially arranged matrix. For instance, the tensor space \(\mathbf {R}^{n_1\times n_2\times \cdots \times n_m}\) can also be viewed as the matrix space \(\mathbf {R}^{(n_1\times n_2\times \cdots \times n_{m_1}) \times (n_{m_1+1}\times n_{m_1+2} \times \cdots \times n_{m})}\), where the row index ranges over an \(m_1\)-th order tensor space and the column index over an \((m-m_1)\)-th order tensor space. Such connections between tensors and their matrix re-arrangements will play an important role in this paper. As a convention in this paper, unless otherwise specified we adhere to the Euclidean norm (i.e., the \(L_2\)-norm) for vectors and tensors; in the latter case, the Euclidean norm is also known as the Frobenius norm, sometimes denoted as \(\Vert \mathcal {A}\Vert _F=\sqrt{\sum _{i_1,i_2,\ldots ,i_m} \mathcal {A}_{i_1i_2\ldots i_m}^2}\). For a given matrix \(X\), we use \(\Vert X\Vert _*\) to denote the nuclear norm of \(X\), i.e., the sum of all the singular values of \(X\). Regarding products, we use \(\otimes \) to denote the outer product of tensors; that is, for \(\mathcal {A}_1\in \mathbf {R}^{n_1\times n_2\times \cdots \times n_{m}}\) and \(\mathcal {A}_2\in \mathbf {R}^{n_{m+1}\times n_{m+2} \times \cdots \times n_{m+\ell }}\), the tensor \(\mathcal {A}_1 \otimes \mathcal {A}_2\) is in \(\mathbf {R}^{n_1\times n_2\times \cdots \times n_{m+\ell }}\) with

$$\begin{aligned} (\mathcal {A}_1 \otimes \mathcal {A}_2)_{i_1i_2\ldots i_{m+\ell }} = (\mathcal {A}_1)_{i_1i_2\ldots i_m} (\mathcal {A}_2)_{i_{m+1}\ldots i_{m+\ell }}. \end{aligned}$$

The inner product between tensors \(\mathcal {A}_1\) and \(\mathcal {A}_2\) residing in the same space \(\mathbf {R}^{n_1\times n_2\times \cdots \times n_m}\) is denoted

$$\begin{aligned} \mathcal {A}_1 \bullet \mathcal {A}_2 = \sum _{i_1,i_2,\ldots ,i_m} (\mathcal {A}_1)_{i_1i_2\ldots i_m} (\mathcal {A}_2)_{i_1i_2\ldots i_m}. \end{aligned}$$

In this light, a multi-linear form \(\mathcal {A}(x^1,x^2,\ldots ,x^m)\) can also be written via inner/outer products of tensors as

$$\begin{aligned} \mathcal {A}\bullet (x^1\otimes \cdots \otimes x^m)&:= \sum \limits _{i_1,\ldots , i_m }\mathcal {A}_{i_1{i_2} \ldots i_m }(x^1\otimes \cdots \otimes x^m)_{i_1\ldots i_m }\\&= \sum \limits _{i_1,\ldots , i_m }\mathcal {A}_{i_1\ldots i_m }\prod \limits _{k=1}^{m}{x^k_{i_k}}. \end{aligned}$$
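To make the notation concrete, here is a small numerical illustration (a Python/NumPy sketch of our own, not part of the original paper) showing that the multi-linear form coincides with the inner product against the outer product of the vectors:

```python
import numpy as np

# Illustrative sketch: A(x^1, x^2, x^3) = A • (x^1 ⊗ x^2 ⊗ x^3) for a third-order tensor.
n1, n2, n3 = 2, 3, 4
A = np.random.randn(n1, n2, n3)
x1, x2, x3 = np.random.randn(n1), np.random.randn(n2), np.random.randn(n3)

outer = np.einsum('i,j,k->ijk', x1, x2, x3)           # x^1 ⊗ x^2 ⊗ x^3
val_inner = np.sum(A * outer)                         # A • (x^1 ⊗ x^2 ⊗ x^3)
val_form = np.einsum('ijk,i,j,k->', A, x1, x2, x3)    # A(x^1, x^2, x^3)
print(np.isclose(val_inner, val_form))                # True
```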

In the subsequent analysis, for convenience we assume \(m\) to be even, i.e., \(m=2d\) in (5), where \(d\) is a positive integer. As we will see later, this assumption is essentially non-restrictive. Therefore, we will focus on the following problem of computing the largest eigenvalue of an even-order super-symmetric tensor:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d}) \\ \mathrm{s.t.}&{} \Vert x\Vert =1, \end{array} \end{aligned}$$
(6)

where \(\mathcal {F}\) is a \(2d\)-th order super-symmetric tensor. In particular, problem (6) can be equivalently written as

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}\bullet \underbrace{x\otimes \cdots \otimes x}_{2d}\\ \mathrm{s.t.}&{} \Vert x\Vert =1. \end{array} \end{aligned}$$
(7)

Given any \(2d\)-th order super-symmetric tensor \(\mathcal {F}\), we call it rank one if \(\mathcal {F} = { \lambda }\underbrace{a\otimes \cdots \otimes a}_{2d}\) for some \(a \in \mathbf {R}^n\) and \(\lambda \in \{ 1, -1 \}\). Moreover, the CP rank of \(\mathcal {F}\) is defined as follows.

Definition 1.1

Suppose \(\mathcal {F} \in \mathbf {S}^{n^{2d}}\). The CP rank of \(\mathcal {F}\), denoted by \(\mathrm{rank}(\mathcal {F})\), is the smallest integer \(r\) satisfying

$$\begin{aligned} \mathcal {F} = \sum _{i=1}^{r}\lambda _i\underbrace{a^i\otimes \cdots \otimes a^i}_{2d}, \end{aligned}$$
(8)

where \(a^{i}\in \mathbf {R}^n, {\lambda _i \in \{ 1,-1 \}}\).

The idea of decomposing a tensor into a sum of (asymmetric) outer products of vectors was first introduced and studied by Hitchcock [28, 29]. This concept of tensor rank became popular after its rediscovery in the 1970s in the form of CANDECOMP (canonical decomposition) by Carroll and Chang [7] and PARAFAC (parallel factors) by Harshman [24]. Consequently, many authors in the literature abbreviate the decomposition as ‘CP’ (CANDECOMP/PARAFAC), as in ‘CP rank’.

We remark that the CP rank is usually defined over the complex field, while in Definition 1.1 the decomposition (8) is performed over the real field. Although the choice of the complex or the real field is immaterial for matrices, it does make a difference in the tensor case [11]. Since we focus exclusively on real tensors here, throughout this paper we shall use the term CP rank to denote the symmetric real rank of a super-symmetric tensor.

In the following, to simplify the notation, we denote \({\mathbb K}(n,d)\!=\!\Big \{k=(k_1,\ldots , k_n)\in {\mathbb Z}_+^{n} \,\bigg \vert \, \sum _{j=1}^{n}k_j=d\Big \}\) and

$$\begin{aligned} \mathcal {X}_{1^{2k_1}2^{2k_2}\ldots n^{2k_n}}:=\mathcal {X}_{ \tiny \underbrace{1\ldots 1}_{2k_1}\underbrace{2\ldots 2}_{2k_2} \ldots \underbrace{n\ldots n}_{2k_n} }. \end{aligned}$$

By letting \(\mathcal {X}=\underbrace{x\otimes \cdots \otimes x}_{2d}\) we can further convert problem (7) into:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}\bullet \mathcal {X} \\ \mathrm{s.t.}&{} \sum \limits _{k \in {\mathbb K}(n,d)}\frac{d!}{\prod _{j=1}^{n}{k_j!}}\mathcal {X}_{1^{2k_1}2^{2k_2}\ldots n^{2k_n}} = 1,\\ &{} \mathcal {X} \in \mathbf {S}^{n^{2d}},\;\mathrm{rank}(\mathcal {X})=1, \end{array} \end{aligned}$$
(9)

where the first equality constraint is due to the fact that \( \sum _{k \in {\mathbb K}(n,d)}\frac{d!}{\prod _{j=1}^{n}{k_j!}}\prod _{j=1}^{n}x_j^{2k_j} \!=\Vert x\Vert ^{2d}=1\).

The difficulty of the above problem lies in dealing with the rank constraint \(\mathrm{rank}(\mathcal {X})=1\). Not only is the rank function itself difficult to handle, but determining the rank of a specific given tensor is already a difficult task, which is NP-hard in general [25]. To give an impression of the difficulty involved in computing tensor ranks, note that there is a particular \(9 \times 9 \times 9\) tensor (cf. [34]) whose rank is only known to lie between \(18\) and \(23\). One way to deal with this difficulty is to convert the tensor optimization problem (9) into a matrix optimization problem. A typical matricization technique is the so-called mode-\(n\) matricization [32]. Roughly speaking, given a tensor \(\mathcal {A} \in \mathbf {R}^{n_1\times n_2\times \cdots \times n_m}\), its mode-\(n\) matricization, denoted by \(A(n)\), is obtained by arranging the \(n\)-th index of \(\mathcal A\) as the row index of the resulting matrix and merging all the other indices of \(\mathcal A\) into the column index of \(A(n)\). The precise definition of the mode-\(n\) matricization is as follows.

Definition 1.2

For a given tensor \(\mathcal {A} \in \mathbf {R}^{n_1\times n_2\times \cdots \times n_m}\), the matrix \(A(n)\) is the associated mode-\(n\) matricization. In particular

$$\begin{aligned} A(n)_{i_nj} := \mathcal {A}_{i_1{i_2}\ldots i_m}, \;\forall \; 1 \le i_k \le n_k, \; 1\le k \le m, \end{aligned}$$

where

$$\begin{aligned} j = 1+ \sum \limits _{k=1,\, k\ne n}^{m}(i_k-1)J_k, \quad \text{ with } \quad J_k = \prod \limits _{\ell =1,\, \ell \ne n}^{k-1}n_\ell . \end{aligned}$$
(10)
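For concreteness, the mode-\(n\) matricization can be computed with a single reshape; the following NumPy sketch (our own, using 0-based modes and the standard ordering convention of [32], in which the first of the remaining indices varies fastest) is only illustrative:

```python
import numpy as np

def mode_n_unfold(A, n):
    # Illustrative sketch of the mode-n matricization A(n): mode n becomes the row
    # index; the remaining modes are merged into the column index, with the first
    # remaining index varying fastest.
    return np.reshape(np.moveaxis(A, n, 0), (A.shape[n], -1), order="F")

A = np.arange(24).reshape(2, 3, 4)
print(mode_n_unfold(A, 1).shape)   # (3, 8): each column is a mode-1 fiber of A
```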

The so-called \(n\)-rank of \(\mathcal A \) is defined as the vector \([\mathrm{rank}(A(1)),\mathrm{rank}(A(2)),\ldots , \mathrm{rank}(A(m))]\), whose \(n\)-th component is the column rank of the mode-\(n\) matrix \(A(n)\). The notion of \(n\)-rank has been widely used in problems of tensor decomposition. Recently, Liu et al. [41] and Gandy et al. [18] considered the low-\(n\)-rank tensor recovery problem, which was among the first attempts to solve low-rank tensor optimization problems. Along this line, Tomioka et al. [54] analyzed the statistical performance of the nuclear norm relaxation of the tensor \(n\)-rank minimization problem. Chandrasekaran et al. [8] proposed another interesting idea: they directly applied convex relaxation to the tensor rank and obtained a new norm called the tensor nuclear norm, which is numerically intractable; thus, a further semidefinite-representable relaxation was introduced. However, the authors did not provide any numerical results for this relaxation. In the following, we shall introduce a new scheme to unfold a tensor into a matrix, in which we use half of the indices of the tensor to form the row index of a matrix and the other half to form the column index. Most importantly, in the next section we manage to establish a connection between the CP rank of the tensor and the rank of the resulting unfolding matrix.

Definition 1.3

For a given super-symmetric even-order tensor \(\mathcal {F} \in \mathbf {S}^{n^{2d}}\), we define its square matricization, denoted by \(\varvec{M}\,(\mathcal {F})\in \mathbf {R}^{n^d\times n^d}\), as the following:

$$\begin{aligned} \varvec{M}\,(\mathcal {F})_{k \ell }:=\mathcal {F}_{i_1\ldots i_{d}i_{d+1} \ldots i_{2d}}, \quad 1\le i_1,\ldots ,i_d,i_{d+1},\ldots ,i_{2d} \le n, \end{aligned}$$

where

$$\begin{aligned} k = \sum \limits _{j=1}^{d}(i_j-1)n^{d-j}+1,\quad \hbox { and }\quad \ell = \sum \limits _{j=d+1}^{2d}(i_j-1)n^{2d-j}+1. \end{aligned}$$

Similarly we introduce below the vectorization of a tensor.

Definition 1.4

The vectorization, \(\varvec{V}\,(\mathcal {F})\), of tensor \(\mathcal {F} \in \mathbf {R}^{n^{m}}\) is defined as

$$\begin{aligned} \varvec{V}\,(\mathcal {F})_{k}:=\mathcal {F}_{i_1\ldots i_{m}}, \end{aligned}$$

where

$$\begin{aligned} k = \sum \limits _{j=1}^{m}(i_j-1)n^{m-j}+1, 1 \le i_1,\ldots , i_m \le n. \end{aligned}$$
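Under the index formulas in Definitions 1.3 and 1.4, both operators reduce to plain reshapes when the tensor is stored in row-major order; the following NumPy sketch (the function names are ours) illustrates this:

```python
import numpy as np

def M(F, d):
    # Illustrative sketch of the square matricization M(F): the first d indices form
    # the row index and the last d indices form the column index (row-major in each group).
    n = F.shape[0]
    return F.reshape(n**d, n**d)

def V(F):
    # Vectorization V(F): flatten all indices in row-major order, as in Definition 1.4.
    return F.reshape(-1)

# The inverse operators are reshapes back to the original tensor shape.
F = np.random.randn(3, 3, 3, 3)          # a 4th-order tensor (d = 2, n = 3)
assert np.allclose(M(F, 2).reshape(3, 3, 3, 3), F)
assert np.allclose(V(F).reshape(3, 3, 3, 3), F)
```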

In the same vein, we can convert a vector or a matrix with appropriate dimensions to a tensor. In other words, the inverse of the operators \(\varvec{M}\,\) and \(\varvec{V}\,\) can be defined in the same manner. In the following, we denote \(X=\varvec{M}\,(\mathcal {X})\), and so

$$\begin{aligned} \mathrm{Tr}\,(X) = \sum \limits _{\ell }X_{\ell \ell }\;\text{ with }\;\ell = \sum \limits _{j=1}^{d}(i_j-1)n^{d-j}+1. \end{aligned}$$

If we assume \(\mathcal {X}\) to be of rank one, then

$$\begin{aligned} \mathrm{Tr}\,(X) = \sum \limits _{i_1,\ldots ,i_d }\mathcal {X}_{i_1\ldots i_d i_1 \ldots i_d}=\sum \limits _{i_1,\ldots ,i_d }\mathcal {X}_{i_1^2\ldots i_d^2}. \end{aligned}$$

In the above expression, each index \(i_j\) in \((i_1,\ldots ,i_d )\) takes a value in \(\{1,2,\ldots , n\}\). Suppose that \(j\) appears \(k_j\) times in \((i_1,\ldots ,i_d )\) for \(j=1,2,\ldots , n\), so that \(\sum _{j=1}^{n}k_j=d\). Then, for a fixed combination \((k_1,k_2,\ldots , k_n)\), the total number of tuples \((i_1,\ldots ,i_d )\) realizing this combination is

$$\begin{aligned} \left( \begin{array}{l} d\\ k_1\end{array} \right) \left( \begin{array}{l} d-k_1\\ \quad k_2\end{array}\right) \left( \begin{array}{c} d-k_1-k_2\\ k_3\end{array}\right) \cdots \left( \begin{array}{c} d-k_1 -\cdots - k_{n-1}\\ \quad k_n\end{array}\right) = \frac{d!}{\prod _{j=1}^{n}{k_j!}}. \end{aligned}$$

Consequently,

$$\begin{aligned} \mathrm{Tr}\,(X)= \sum \limits _{i_1,\ldots ,i_d }\mathcal {X}_{i_1^2\ldots i_d^2} =\sum \limits _{k \in {\mathbb K}(n,d)}\frac{d!}{\prod _{j=1}^{n}{k_j!}}\mathcal {X}_{1^{2k_1}2^{2k_2}\ldots n^{2k_n}}. \end{aligned}$$
(11)

In light of the above discussion, if we further denote \(F = \varvec{M}\,(\mathcal {F})\), then the objective in (9) is \(\mathcal {F} \bullet \mathcal {X} = \mathrm{Tr}\,(FX)\), while the first constraint \(\sum _{k \in {\mathbb K}(n,d)}\frac{d!}{\prod _{j=1}^{n}{k_j!}}\mathcal {X}_{1^{2k_1}2^{2k_2}\ldots n^{2k_n}} = 1 \Longleftrightarrow \mathrm{Tr}\,(X)=1\). The hard constraint in (9) is \(\mathrm{rank}(\mathcal {X})=1\). It is straightforward to see that if \(\mathcal {X}\) is of rank one, then by letting \(\mathcal {X}={\lambda }\underbrace{x\otimes \cdots \otimes x}_{2d}\) for some \(\lambda \in \{1,-1 \}\) and \(\mathcal {Y}=\underbrace{x\otimes \cdots \otimes x}_{d}\), we have \(\varvec{M}\,(\mathcal {X})={\lambda } \varvec{V}\,(\mathcal {Y})\varvec{V}\,(\mathcal {Y})^{\top }\), which is to say that matrix \(\varvec{M}\,(\mathcal {X})\) is of rank one too. In the next section we shall continue to show that the other way around is also true.
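The rank-one correspondence just described is easy to verify numerically; the following NumPy check (our own, for \(d=2\) and \(\lambda =1\)) confirms both \(\varvec{M}\,(\mathcal {X})=\varvec{V}\,(\mathcal {Y})\varvec{V}\,(\mathcal {Y})^{\top }\) and the trace identity (11):

```python
import numpy as np

# Illustrative sketch: rank-one correspondence for d = 2 (2d = 4).
n, d = 3, 2
x = np.random.randn(n)
x /= np.linalg.norm(x)                           # ||x|| = 1
X = np.einsum('i,j,k,l->ijkl', x, x, x, x)       # X = x ⊗ x ⊗ x ⊗ x (2d copies)
Y = np.outer(x, x)                               # Y = x ⊗ x (d copies)
M_X = X.reshape(n**d, n**d)                      # M(X)
v_Y = Y.reshape(-1)                              # V(Y)
print(np.allclose(M_X, np.outer(v_Y, v_Y)))      # M(X) = V(Y) V(Y)^T  -> True
print(np.isclose(np.trace(M_X), 1.0))            # Tr(M(X)) = ||x||^{2d} = 1 -> True
print(np.linalg.matrix_rank(M_X))                # 1
```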

2 Equivalence under the rank-one hypothesis

We first present some useful observations below.

Lemma 2.1

Suppose \(\mathcal {A} \in \mathbf {R}^{n^d}\) is an \(n\) dimensional \(d\)-th order tensor and \(\mathcal {A} \otimes \mathcal {A} \in \mathbf {S}^{n^{2d}}\). Then we have:

$$\begin{aligned} \begin{array}{l@{\quad }l} (i) &{} \mathcal {A} \in \mathbf {S}^{n^d}; \\ (ii) &{} the\, n\hbox {-}rank\, of\;\mathcal {A}\;\text{ is }\; [1,1,\ldots ,1].\end{array} \end{aligned}$$

Proof

We denote \(\mathcal {F}=\mathcal {A} \otimes \mathcal {A} \in \mathbf {S}^{n^{2d}}\). For any \(d\)-tuple \(\{i_1,\ldots ,i_d\}\) and any permutation \(\{j_1,\ldots ,j_d\} \in \pi (i_1,\ldots ,i_d)\) of it, it holds that

$$\begin{aligned} \left( \mathcal {A} _{i_1\ldots i_d}-\mathcal {A} _{j_1\ldots j_d}\right) ^2&= \mathcal {A} _{i_1\ldots i_d}^2+\mathcal {A} _{j_1 \ldots j_d}^2-2\mathcal {A} _{i_1 \ldots i_d}\mathcal {A}_{j_1 \ldots j_d}\\&= \mathcal {F}_{i_1 \ldots i_d i_1 \ldots i_d}+\mathcal {F}_{j_1 \ldots j_d j_1 \ldots j_d}-2\mathcal {F}_{i_1 \ldots i_d j_1 \ldots j_d}=0, \end{aligned}$$

where the last equality is due to the fact that \(\mathcal {F}\) is super-symmetric. Therefore, \(\mathcal {A}\) is super-symmetric.

To prove the second statement, we first observe that for any two \(d\)-tuples \(\{i_1,\ldots ,i_d\}\) and \(\{i^\prime _1,\ldots ,i^\prime _d\}\), due to the super-symmetry of \(\mathcal F\), we have

$$\begin{aligned} \mathcal {A}_{i_1\ldots i_d}\mathcal {A}_{i^\prime _1\ldots i^\prime _d}= \mathcal {F}_{i_1\ldots i_d i^\prime _1\ldots i^\prime _d}= \mathcal {F}_{i^\prime _1 i_2\ldots i_d i_1 i^\prime _2\ldots i^\prime _d} =\mathcal {A}_{i^\prime _1 i_2\ldots i_d}\mathcal {A}_{i_1 i^\prime _2 \ldots i^\prime _d}. \end{aligned}$$

Now consider the mode-\(1\) unfolding, which is the matrix \(A(1)\). For any two components \(A(1)_{i_1j}\) and \(A(1)_{{i}^\prime _1{j}^\prime }\) with \(j = 1+ \sum _{k=2}^{d}(i_k-1)J_k\) and \(j^\prime = 1+ \sum _{k=2}^{d}(i^\prime _k-1)J_k\), where \(J_k\) is defined in (10), the equation above implies that

$$\begin{aligned} A(1)_{i_1j} A(1)_{{i}^\prime _1{j}^\prime }=\mathcal {A}_{i_1\ldots i_d}\mathcal {A}_{i^\prime _1\ldots i^\prime _d}=\mathcal {A}_{i^\prime _1 i_2\ldots i_d}\mathcal {A}_{i_1 i^\prime _2\ldots i^\prime _d}=A(1)_{i^\prime _1j} A(1)_{{i}_1{j}^\prime }. \end{aligned}$$

Therefore, every \(2 \times 2\) minor of the matrix \(A(1)\) is zero, and so \(A(1)\) is of rank one. Moreover, since \(\mathcal {A}\) is super-symmetric, the matrices unfolded along different modes are all identical. Thus, we conclude that the \(n\)-rank of \(\mathcal {A}\) is \([1,1,\ldots ,1]\). \(\square \)

The following lemma tells us that if a super-symmetric tensor is of rank one in the sense of the nonsymmetric CP decomposition, then its symmetric CP rank is also one.

Lemma 2.2

If a \(d\)-th order tensor \(\mathcal {A} = a^1\otimes a^2 \otimes \cdots \otimes a^d\) is super-symmetric, then we have \(a^i =\pm a^1\) for \(i=2,\ldots ,d\) and \(\mathcal {A} =\lambda \, \underbrace{a^1\otimes a^1 \otimes \cdots \otimes a^1}_{d}\) for some \(\lambda = \pm 1\).

Proof

Since \(\mathcal {A}\) is super-symmetric, from Theorem 4.1 in [10], we know that

$$\begin{aligned} \max _{\Vert x\Vert =1} |\mathcal {A} \Bigg (\underbrace{x,\ldots ,x}_{d}\!\Bigg )|=\max _{\Vert x^i\Vert =1,i=1,\ldots ,d} \mathcal {A} (x^1,\ldots ,x^d)\!=\!\Vert a^1\Vert \times \Vert a^2\Vert \times \cdots \times \Vert a^d\Vert . \end{aligned}$$

So there must exist an \(x^*\) with \(\Vert x^*\Vert =1\) such that \( |(a^i)^{\top }x^*|=\Vert a^i\Vert \) for all \(i\), which implies that \(a^i =\pm a^1\) for \(i=2,\ldots ,d\), and thus the conclusion follows. \(\square \)

We have the following proposition as the immediate consequence of the above lemmas.

Proposition 2.3

Suppose \(\mathcal {A} \in \mathbf {R}^{n^d}\) is an \(n\) dimensional \(d\)-th order tensor. The following two statements are equivalent:

$$\begin{aligned} \begin{array}{l@{\quad }l} (i) &{} \mathcal {A} \in \mathbf {S}^{n^d}, \text{ and } \ \mathrm{rank}(\mathcal {A})=1; \\ (ii) &{} \mathcal {A} \otimes \mathcal {A} \in \mathbf {S}^{n^{2d}}.\end{array} \end{aligned}$$

Proof

We shall first show (i) \(\Longrightarrow \) (ii). Suppose \(\mathcal {A} \in \mathbf {S}^{n^d}\) with \(\mathrm{rank}(\mathcal {A})=1\). Then there exists a vector \(a \in \mathbf {R}^{n}\) and a scalar \(\lambda \in \{ 1, -1 \}\) such that \(\mathcal {A} = {\lambda } \underbrace{a\otimes a \otimes \cdots \otimes a}_{{d}}\). Consequently, \(\mathcal {A} \otimes \mathcal {A} = \underbrace{a\otimes a \otimes \cdots \otimes a}_{2d} \in \mathbf {S}^{n^{2d}} \).

Now we prove (ii) \(\Longrightarrow \) (i). From Lemma 2.1, we know that \(\mathcal A\) is super-symmetric and the \(n\)-rank of \(\mathcal A\) is \([1,1,\ldots ,1]\). It is well known that the \(n\)-rank of a tensor corresponds to the size of the core tensor associated with the smallest exact Tucker decomposition [32]. Consequently, the fact that the \(n\)-rank of \(\mathcal {A}\) is \([1,1,\ldots ,1]\) means that the core tensor associated with the exact Tucker decomposition of \(\mathcal {A}\) is a scalar, and thus the nonsymmetric real CP rank of \(\mathcal {A}\) is also one. Finally, due to Lemma 2.2 and the fact that \(\mathcal A\) is super-symmetric, we conclude that the symmetric CP rank of \(\mathcal {A}\) is one, i.e., \(\mathrm{rank}(\mathcal {A})=1\). \(\square \)

Now we are ready to present the main result of this section.

Theorem 2.4

Suppose \(\mathcal {X} \in \mathbf {S}^{n^{2d}}\) and \(X = \varvec{M}\,(\mathcal {X})\in \mathbf {R}^{n^d \times n^d}\). Then we have

$$\begin{aligned} \mathrm{rank}(\mathcal {X})=1\Longleftrightarrow \mathrm{rank}(X)=1. \end{aligned}$$

Proof

As remarked earlier, the implication \(\mathrm{rank}(\mathcal {X})=1\Longrightarrow \mathrm{rank}(X)=1\) is evident. To see this, suppose \(\mathrm{rank}(\mathcal {X})=1\) and \(\mathcal {X}={\lambda }\underbrace{x\otimes \cdots \otimes x}_{2d}\) for some \(x \in \mathbf {R}^n\) and \(\lambda \in \{1,-1\}\). By constructing \(\mathcal {Y}\!=\!\underbrace{x\otimes \cdots \otimes x}_{d}\), we have \(X=\varvec{M}\,(\mathcal {X})\!=\!{\lambda }\,\varvec{V}\,(\mathcal {Y})\varvec{V}\,(\mathcal {Y})^{\top }\), which leads to \(\mathrm{rank}(X)\!=\!1\).

To prove the other implication, suppose that we have \(\mathcal {X} \in \mathbf {S}^{n^{2d}}\) and \(\varvec{M}\,(\mathcal {X})\) is of rank one, i.e., \(\varvec{M}\,(\mathcal {X})=yy^{\top }\) for some vector \(y\in \mathbf {R}^{n^d}\). Then \(\mathcal {X} = \varvec{V}\,^{-1}(y)\otimes \varvec{V}\,^{-1}(y)\), which combined with Proposition 2.3 implies that \(\varvec{V}\,^{-1}(y)\) is super-symmetric and of rank one. Thus there exists \(x \in \mathbf {R}^n\) such that \(\varvec{V}\,^{-1}(y) = \underbrace{x\otimes \cdots \otimes x}_{d}\) and \(\mathcal {X}=\underbrace{x\otimes \cdots \otimes x}_{2d}\). \(\square \)

3 A nuclear norm penalty approach

According to Theorem 2.4, a super-symmetric tensor is of rank one if and only if its matrix correspondence, obtained via the matricization operation defined in Definition 1.3, is also of rank one. As a result, we can reformulate Problem (9) equivalently as the following matrix optimization problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathrm{Tr}\,(F X) \\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1, \ \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}, \\ &{} X\in \mathbf {S}^{n^d \times n^d}, \ \mathrm{rank}(X)=1, \end{array} \end{aligned}$$
(12)

where \(X=\varvec{M}\,(\mathcal {X}), F = \varvec{M}\,(\mathcal {F})\), and \(\mathbf {S}^{n^d \times n^d}\) denotes the set of \(n^d\times n^d\) symmetric matrices. Note that the constraint \(\varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}\) requires the tensor corresponding to \(X\) to be super-symmetric, which essentially amounts to \(O(n^{2d})\) linear equality constraints. The rank constraint \(\mathrm{rank}(X)=1\) makes the problem intractable; in fact, Problem (12) is NP-hard in general, due to its equivalence to problem (6).

There has been a large amount of work dealing with low-rank matrix optimization problems. Research in this area was mainly ignited by the recent emergence of compressed sensing [5, 12], matrix rank minimization and low-rank matrix completion problems [4, 6, 51]. Matrix rank minimization seeks a matrix of the lowest rank satisfying some linear constraints, i.e.,

$$\begin{aligned} \min _{X\in \mathbf {R}^{n_1\times n_2}} \ \mathrm{rank}(X), \ \mathrm{s.t.}, \ \mathcal {C}(X)=b, \end{aligned}$$
(13)

where \(b\in \mathbf {R}^p\) and \(\mathcal {C}:\mathbf {R}^{n_1\times n_2}\rightarrow \mathbf {R}^p\) is a linear operator. The results in [4, 6, 51] show that under certain randomness hypotheses on the linear operator \(\mathcal {C}\), with high probability the NP-hard problem (13) is equivalent to the following nuclear norm minimization problem, which is a convex programming problem:

$$\begin{aligned} \min _{X\in \mathbf {R}^{n_1\times n_2}} \ \Vert X\Vert _*, \ \mathrm{s.t.}, \ \mathcal {C}(X)=b. \end{aligned}$$
(14)

In other words, the optimal solution to the convex problem (14) is also the optimal solution to the original NP-hard problem (13).

Motivated by the convex nuclear norm relaxation, one way to deal with the rank constraint in (12) is to add to the objective function a nuclear norm term in \(X\), which penalizes solutions of high rank. This yields the following convex optimization formulation:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathrm{Tr}\,(F X) - \rho \Vert X \Vert _*\\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1, \ \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}, \\ &{} X\in \mathbf {S}^{n^d\times n^d}, \end{array} \end{aligned}$$
(15)

where \(\rho >0\) is a penalty parameter. It is easy to see that if the optimal solution of (15) (denoted by \(\tilde{X}\)) is of rank one, then \(\Vert \tilde{X}\Vert _* = \mathrm{Tr}\,(\tilde{X})=1\), so the penalty term \(-\rho \Vert X\Vert _*\) is constant at the optimum; hence \(\tilde{X}\) remains optimal when the rank-one constraint is imposed. In fact, Problem (15) is the convex relaxation of the following problem

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathrm{Tr}\,(F X) - \rho \Vert X \Vert _* \\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1, \ \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}, \\ &{} X\in \mathbf {S}^{n^d\times n^d},\ \mathrm{rank}(X)=1, \end{array} \end{aligned}$$

which is equivalent to the original problem  (12) since \(\rho \Vert X \Vert _* = \rho \mathrm{Tr}\,(X)=\rho \).

After solving the convex optimization problem (15) and obtaining the optimal solution \(\tilde{X}\), if \(\mathrm{rank}(\tilde{X})=1\), then by Theorem 2.4 we can find \(\tilde{x}\) such that \(\varvec{M}\,^{-1}(\tilde{X})=\underbrace{\tilde{x}\otimes \cdots \otimes \tilde{x}}_{2d}\). In this case, \(\tilde{x}\) is the optimal solution to Problem (6), and the original tensor PCA problem, i.e., the Z-eigenvalue problem (6), is thus solved to optimality.
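For \(d=2\), this recovery step amounts to two small eigenvalue computations; the following NumPy sketch (the function name is ours, and it assumes the solver has already returned a numerically rank-one \(\tilde{X}\)) mirrors the procedure used in the examples below:

```python
import numpy as np

def extract_leading_pc(X_tilde, n):
    # Illustrative sketch: recover x~ from a (numerically) rank-one X~ with d = 2.
    w, V = np.linalg.eigh(X_tilde)                   # eigenvalues in ascending order
    a = np.sqrt(max(w[-1], 0.0)) * V[:, -1]          # X~ ≈ a a^T when rank(X~) = 1
    A = a.reshape(n, n)                              # matrix correspondence of a
    s, U = np.linalg.eigh((A + A.T) / 2)             # A is symmetric up to rounding
    return U[:, np.argmax(np.abs(s))]                # x~, determined up to sign
```

The returned vector is determined only up to sign, since \(\tilde{x}\) and \(-\tilde{x}\) generate the same rank-one tensor of even order.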

Interestingly, we found in our extensive numerical tests that the optimal solution to Problem (15) is a rank-one matrix almost all the time. In the following, we illustrate this phenomenon with some concrete examples. The first example is taken from [31].

Example 3.1

We consider a super-symmetric tensor \(\mathcal {F} \in \mathbf {S}^{3^{4}}\) defined by

$$\begin{aligned} \begin{array}{lllll} \mathcal {F}_{1111}=0.2883, &{}\quad \mathcal {F}_{1112}=-0.0031, &{}\quad \mathcal {F}_{1113}=0.1973, &{}\quad \mathcal {F}_{1122}=-0.2485, &{}\quad \mathcal {F}_{1123}=-0.2939,\\ \mathcal {F}_{1133}=0.3847, &{}\quad \mathcal {F}_{1222}=0.2972, &{}\quad \mathcal {F}_{1223}=0.1862, &{}\quad \mathcal {F}_{1233}=0.0919, &{}\quad \mathcal {F}_{1333}=-0.3619,\\ \mathcal {F}_{2222}=0.1241, &{}\quad \mathcal {F}_{2223}=-0.3420, &{}\quad \mathcal {F}_{2233}=0.2127, &{}\quad \mathcal {F}_{2333}=0.2727, &{}\quad \mathcal {F}_{3333}=-0.3054. \end{array} \end{aligned}$$

We want to compute the largest Z-eigenvalue of \(\mathcal {F}\).

Since the size of this tensor is small, we used CVX [23] to solve Problem (15) with \(F = \varvec{M}\,(\mathcal {F})\) and \(\rho = 10\). It turned out that CVX produced a rank-one solution \(\tilde{X} = aa^{\top }\in \mathbf {R}^{3^2\times 3^2}\), where

$$\begin{aligned} a&= (0.4451,0.1649,-0.4688,0.1649,0.0611,-0.1737,-0.4688,\\&-0.1737,0.4938)^{\top }. \end{aligned}$$

Thus we get the matrix correspondence of \(a\) by reshaping \(a\) into a square matrix \(A\):

$$\begin{aligned} A=[a(1:3),a(4:6),a(7:9)]= \begin{bmatrix} 0.4451&\quad 0.1649&\quad -0.4688\\ 0.1649&\quad 0.0611&\quad -0.1737\\ -0.4688&\quad -0.1737&\quad 0.4938\end{bmatrix}. \end{aligned}$$

It is easy to check that \(A\) is a rank-one matrix with the nonzero eigenvalue being 1. This further confirms our theory on the rank-one equivalence, i.e., Theorem 2.4. The eigenvector that corresponds to the nonzero eigenvalue of \(A\) is given by

$$\begin{aligned} \tilde{x} = (-0.6671,-0.2472,0.7027)^{\top }, \end{aligned}$$

which is the optimal solution to Problem (6).

The next example is from a real MRI application studied by Ghosh et al. [19]. In [19], Ghosh et al. studied a fiber detection problem in diffusion MRI, where they tried to extract geometric characteristics from an antipodally symmetric spherical function (ASSF), which can be described equivalently in the homogeneous polynomial basis constrained to the sphere. They showed that it is possible to extract the maxima and minima of an ASSF by computing the stationary points of a problem in the form of (6) with \(d=2\) and \(n=3\).

Example 3.2

The objective function \(\mathcal {F}({x,x,x,x})\) in this example is given by

$$\begin{aligned} \begin{array}{l} 0.74694 x_1^4 - 0.435103 x_1^3x_2 + 0.454945 x_1^2x_2^2 + 0.0657818 x_1x_2^3 + x_2^4 \\ \quad {}+ 0.37089 x_1^3x_3 - 0.29883 x_1^2x_2x_3 - 0.795157 x_1x_2^2x_3 + 0.139751 x_2^3x_3 + 1.24733 x_1^2x_3^2\\ \quad {}+ 0.714359 x_1x_2x_3^2 + 0.316264 x_2^2x_3^2 - 0.397391 x_1x_3^3 - 0.405544 x_2x_3^3 + 0.794869 x_3^4. \end{array} \end{aligned}$$

Again, we used CVX to solve problem (15) with \(F = \varvec{M}\,(\mathcal {F})\) and \(\rho = 10\), and a rank-one solution was found with \(\tilde{X}= aa^{\top }\), where

$$\begin{aligned} a=(0.0001, 0.0116, 0.0004, 0.0116, 0.9984, 0.0382, 0.0004, 0.0382, 0.0015)^{\top }. \end{aligned}$$

By reshaping vector \(a\), we get the following expression of matrix \(A\):

$$\begin{aligned} A = [a(1:3),a(4:6),a(7:9)] = \begin{bmatrix} 0.0001&\quad 0.0116&\quad 0.0004\\ 0.0116&\quad 0.9984&\quad 0.0382\\ 0.0004&\quad 0.0382&\quad 0.0015\end{bmatrix}. \end{aligned}$$

It is easy to check that \(A\) is a rank-one matrix with 1 being the nonzero eigenvalue. The eigenvector corresponding to the nonzero eigenvalue of \(A\) is given by

$$\begin{aligned} \tilde{x} = (0.0116,0.9992,0.0382)^{\top }, \end{aligned}$$

which is also the optimal solution to the original problem (6).

Next, we conduct some numerical tests on randomly generated examples. We constructed fourth-order tensors \(\mathcal T\) with components drawn independently from the standard normal distribution. The super-symmetric tensor \(\mathcal F\) in the tensor PCA problem was then obtained by symmetrizing \(\mathcal T\). All the numerical experiments in this paper were conducted on an Intel Core i5-2520M 2.5 GHz computer with 4 GB of RAM, and the default settings of Matlab 2012b and CVX 1.22 were used for all the tests. We chose \(d=2\), \(\rho =10\), and varied the dimension of \(\mathcal F\) in the tensor PCA problem from \(n=3\) to \(n=9\). For each \(n\), we tested \(100\) random instances. In Table 1, we report the number of instances that produced rank-one solutions. We also report the average CPU time (in seconds) required by CVX to solve the problems.
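For reference, the random super-symmetric test tensors described above can be generated by averaging a random tensor over all index permutations; the following NumPy sketch (the function name is ours) is one way to do this:

```python
import itertools
import numpy as np

def random_supersym(n, order=4, seed=0):
    # Illustrative sketch: symmetrize a tensor T with i.i.d. standard normal entries
    # by averaging over all permutations of its indices.
    rng = np.random.default_rng(seed)
    T = rng.standard_normal((n,) * order)
    perms = list(itertools.permutations(range(order)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)
```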

Table 1 Frequency of nuclear norm penalty problem (15) having a rank-one solution

Table 1 shows that for these randomly created tensor PCA problems, the nuclear norm penalty problem (15) always gives a rank-one solution, and thus always solves the original problem (6) to optimality.

4 Semidefinite programming relaxation

In this section, we study another convex relaxation for Problem (12). Note that the constraint

$$\begin{aligned} X\in \mathbf {S}^{n^d\times n^d}, \mathrm{rank}(X)=1 \end{aligned}$$

in (12), combined with \(\mathrm{Tr}\,(X)=1\), actually implies that \(X\) is positive semidefinite. To obtain a tractable convex problem, we drop the rank constraint and instead impose a semidefinite constraint, which leads to the following SDP relaxation:

$$\begin{aligned} \begin{array}{lll} (SDR)&{}\max &{} \mathrm{Tr}\,(F X) \\ &{}\mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1,\\ &{}&{} \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}},\;X \succeq 0. \end{array} \end{aligned}$$
(16)

We remark that replacing a rank-one constraint by a semidefinite constraint is by now a common and standard practice; see, e.g., [1, 21, 55]. The next theorem shows that the SDP relaxation (16) is closely related to the nuclear norm penalty problem (15).
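To make the relaxation concrete for \(d=2\), the following Python sketch formulates (16) with the cvxpy package; note that the paper's own experiments use CVX under Matlab, so the package choice, the function name and the way the super-symmetry constraints are enumerated are ours and purely illustrative:

```python
import itertools
import numpy as np
import cvxpy as cp

def solve_sdr(F_tensor):
    # Illustrative sketch of (16) for d = 2; F_tensor is an n x n x n x n
    # super-symmetric array, i.e. F in S^{n^4}.
    n = F_tensor.shape[0]
    N = n * n
    F = F_tensor.reshape(N, N)               # F = M(F), a symmetric N x N matrix
    X = cp.Variable((N, N), PSD=True)        # X >= 0 (and symmetric)
    cons = [cp.trace(X) == 1]                # Tr(X) = 1
    # M^{-1}(X) super-symmetric: every entry equals the entry with sorted indices.
    for idx in itertools.product(range(n), repeat=4):
        srt = tuple(sorted(idx))
        if idx != srt:
            i, j, k, l = idx
            a, b, c, e = srt
            cons.append(X[i * n + j, k * n + l] == X[a * n + b, c * n + e])
    # Since F and X are symmetric, Tr(F X) equals the elementwise sum of F * X.
    prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(F, X))), cons)
    prob.solve()
    return X.value
```

If the returned matrix is (numerically) rank one, the leading PC can then be extracted exactly as in Sect. 3.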

Theorem 4.1

Let \(X^*_{SDR}\) and \(X^*_{PNP}(\rho )\) be the optimal solutions of problems (16) and (15), respectively. Let \(Eig^+(X)\) and \(Eig^-(X)\) denote the sums of the nonnegative eigenvalues and of the negative eigenvalues of \(X\), respectively, i.e.,

$$\begin{aligned} Eig^+(X) := \sum _{i:\, \lambda _i(X)\ge 0} \lambda _i(X), \quad Eig^-(X) := \sum _{i:\, \lambda _i(X)< 0} \lambda _i(X). \end{aligned}$$

It holds that

$$\begin{aligned} {2(\rho - v)}\,\left| Eig^-(X^*_{PNP}(\rho ))\right| \le v - F_0, \end{aligned}$$

where \(F_0 := \max \nolimits _{1\le i \le n}\mathcal {F}_{i^{2d}}\) and \(v\) is the optimal value of the following optimization problem

$$\begin{aligned} \begin{array}{lll} &{}\max &{} \mathrm{Tr}\,(F X) \\ &{}\mathrm{s.t.}&{} \Vert X\Vert _* =1,\\ &{}&{} X \in \mathbf {S}^{n^d \times n^d}. \end{array} \end{aligned}$$
(17)

As a result, \(\lim _{\rho \rightarrow + \infty }\mathrm{Tr}\,(FX^*_{PNP}(\rho ))=\mathrm{Tr}\,(FX^*_{SDR})\).

Proof

Observe that \(\varvec{M}\,(\underbrace{e^i\otimes \cdots \otimes e^i}_{2d})\), where \(e^i\) is the \(i\)-th unit vector, is a feasible solution for problem (15) with objective value \(\mathcal {F}_{i^{2d}} - \rho \) for all \(1 \le i \le n\). Moreover, by denoting \(r(\rho ) = \left| Eig^-(X^*_{PNP}(\rho ))\right| \), we have

$$\begin{aligned} \Vert X^*_{PNP}(\rho )\Vert _*&= Eig^+(X^*_{PNP}(\rho )) + \left| Eig^-(X^*_{PNP}(\rho ))\right| \\&= \left( Eig^+(X^*_{PNP}(\rho )) + Eig^-(X^*_{PNP}(\rho )) \right) + 2\left| Eig^-(X^*_{PNP}(\rho ))\right| \\&= 1+2r(\rho ). \end{aligned}$$

Since \(X^*_{PNP}(\rho )\) is optimal to problem (15), we have

$$\begin{aligned} \mathrm{Tr}\,(FX^*_{PNP}(\rho )) - \rho \,(1+2r(\rho )) \ge \max \limits _{1\le i \le n}\mathcal {F}_{i^{2d}} - \rho \ge F_0 - \rho . \end{aligned}$$
(18)

Moreover, since \(X^*_{PNP}(\rho )/\Vert X^*_{PNP}(\rho ) \Vert _*\) is feasible to problem (17), we have

$$\begin{aligned} \mathrm{Tr}\,(FX^*_{PNP}(\rho )) \le \Vert X^*_{PNP}(\rho ) \Vert _*\,v =(1+2r(\rho ))\,v. \end{aligned}$$
(19)

Combining (19) and (18) yields

$$\begin{aligned} {2(\rho - v)}\,r(\rho ) \le v - F_0. \end{aligned}$$
(20)

Notice that \(\Vert X \Vert _* = 1\) implies that \(\Vert X\Vert _\infty \) is bounded for all \(X \in \mathbf {S}^{n^d \times n^d}\) feasible to (17), where \(\Vert X\Vert _\infty \) denotes the largest entry of \(X\) in magnitude. Thus the set \(\{ X^*_{PNP}(\rho ) \mid \rho > 0\}\) is bounded. Let \(X^*_{PNP}\) be a cluster point of the sequence \(\{ X^*_{PNP}(\rho ) \mid \rho > 0 \}\). By taking the limit \(\rho \rightarrow +\infty \) in (20), we have \(r(\rho )\rightarrow 0\) and thus \(X^*_{PNP} \succeq 0\). Consequently, \(X^*_{PNP}\) is a feasible solution to problem (16) and \(\mathrm{Tr}\,(FX^*_{SDR}) \ge \mathrm{Tr}\,(FX^*_{PNP})\). On the other hand, it is easy to check that for any \(0 < \rho _1 < \rho _2\),

$$\begin{aligned} \mathrm{Tr}\,(FX^*_{SDR}) \le \mathrm{Tr}\,(FX^*_{PNP}(\rho _2)) \le \mathrm{Tr}\,(FX^*_{PNP}(\rho _1)), \end{aligned}$$

which implies \(\mathrm{Tr}\,(FX^*_{SDR}) \!\le \! \mathrm{Tr}\,(FX^*_{PNP})\). Therefore, \(\lim _{\rho \rightarrow + \infty }\mathrm{Tr}\,(FX^*_{PNP}(\rho )) = \mathrm{Tr}\,(FX^*_{PNP})=\mathrm{Tr}\,(FX^*_{SDR})\). \(\square \)

Theorem 4.1 shows that as \(\rho \) goes to infinity in (15), the optimal solution of the nuclear norm penalty problem (15) converges to the optimal solution of the SDP relaxation (16). Since Table 1 shows that the nuclear norm penalty problem (15) returned rank-one solutions for all the randomly created tensor PCA problems we tested, it is to be expected that the SDP relaxation (16) is also likely to give rank-one solutions. This is indeed the case, as shown by the numerical results in Table 2. As in Table 1, we tested \(100\) random instances for each \(n\). In Table 2, we report the number of instances that produced rank-one solutions for \(d=2\). We also report the average CPU time (in seconds) required by CVX to solve the problems. As we see from Table 2, for these randomly created tensor PCA problems, the SDP relaxation (16) always gives a rank-one solution, and thus always solves the original problem (6) to optimality.

Table 2 Frequency of SDP relaxation (16) having a rank-one solution

5 Alternating direction method of multipliers

The computational times reported in Tables 1 and 2 suggest that it can be time-consuming to solve the convex problems (15) and (16) when the problem size is large [especially for the nuclear norm penalty problem (15)]. In this section, we propose an alternating direction method of multipliers (ADMM) for solving (15) and (16) that fully exploits their structure. ADMM is closely related to the operator-splitting methods, known as the Douglas–Rachford and Peaceman–Rachford methods, that were proposed in the 1950s for solving variational problems arising from PDEs (see [13, 47]). These operator-splitting methods were extensively studied later in the literature for finding the zeros of sums of monotone operators and for solving convex optimization problems (see [14–16, 20, 40]). The ADMM we study in this section was shown to be equivalent to the Douglas–Rachford operator-splitting method applied to convex optimization problems (see [17]). ADMM was revisited recently as it was found to be very efficient for many sparse and low-rank optimization problems arising from the recent emergence of compressed sensing [59], compressive imaging [22, 57], robust PCA [53], sparse inverse covariance selection [52, 60], sparse PCA [42] and SDP [58], among others. For a more complete discussion and list of references on ADMM, we refer to the recent survey paper by Boyd et al. [3] and the references therein.

Generally speaking, ADMM solves the following convex optimization problem,

$$\begin{aligned} \begin{array}{l@{\quad }l} \min _{x\in \mathbf {R}^n,y\in \mathbf {R}^p} &{} f(x) + g(y) \\ \mathrm{s.t.}&{} Ax + By = b \\ &{} x\in {\mathcal {C}}, \ y\in {\mathcal {D}}, \end{array} \end{aligned}$$
(21)

where \(f\) and \(g\) are convex functions, \(A\in \mathbf {R}^{m\times n}, B\in \mathbf {R}^{m\times p}, b\in \mathbf {R}^m, \mathcal {C}\) and \(\mathcal {D}\) are some simple convex sets. A typical iteration of ADMM for solving (21) can be described as follows:

$$\begin{aligned} \left\{ \begin{array}{lll} x^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{x\in \mathcal {C}} \ \mathcal {L}_\mu (x,y^k;\lambda ^k) \\ y^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{y\in \mathcal {D}} \ \mathcal {L}_\mu (x^{k+1},y;\lambda ^k) \\ \lambda ^{k+1} &{} := &{} \lambda ^k - (Ax^{k+1}+By^{k+1}-b)/\mu , \end{array}\right. \end{aligned}$$
(22)

where the augmented Lagrangian function \(\mathcal {L}_\mu (x,y;\lambda )\) is defined as

$$\begin{aligned} \mathcal {L}_\mu (x,y;\lambda ) := f(x) + g(y) - \langle \lambda , Ax+By-b \rangle + \frac{1}{2\mu }\Vert Ax+By-b\Vert ^2, \end{aligned}$$

with \(\lambda \) being the Lagrange multiplier and \(\mu >0\) a penalty parameter. The following theorem, which has been well studied in the literature (see, e.g., [14, 16]), gives the global convergence of (22) for solving (21).

Theorem 5.1

Assume that both \(A\) and \(B\) have full column rank. Then the sequence \(\{(x^k,y^k,\lambda ^k)\}\) generated by (22) globally converges to a pair of primal and dual optimal solutions \((x^*,y^*)\) and \(\lambda ^*\) of (21) from any starting point.

Because both the nuclear norm penalty problem (15) and SDP relaxation (16) can be rewritten in the form of (21), we can apply ADMM to solve them.

5.1 ADMM for nuclear norm penalty problem (15)

Note that the nuclear norm penalty problem (15) can be rewritten equivalently as

$$\begin{aligned} \begin{array}{lll} \min &{} - \mathrm{Tr}\,(F Y) + \rho \Vert Y \Vert _* \\ \mathrm{s.t.}&{} X - Y = 0, \\ &{} X \in {\mathcal {C}}, \end{array} \end{aligned}$$
(23)

where \(\mathcal {C}:=\{X\in \mathbf {S}^{n^d\times n^d}\mid \mathrm{Tr}\,(X) =1, \ \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}\}\). A typical iteration of ADMM for solving (23) can be described as

$$\begin{aligned} \left\{ \begin{array}{lll} X^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{X \in \mathcal {C}} - \mathrm{Tr}\,(F Y^k) + \rho \Vert Y^k\Vert _* - \langle \Lambda ^k, X-Y^k \rangle + \frac{1}{2\mu }\Vert X-Y^k\Vert _F^2 \\ Y^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{Y} -\mathrm{Tr}\,(F Y) + \rho \Vert Y\Vert _* - \langle \Lambda ^k, X^{k+1}-Y \rangle + \frac{1}{2\mu }\Vert X^{k+1}-Y\Vert _F^2 \\ \Lambda ^{k+1} &{} := &{} \Lambda ^k - (X^{k+1}-Y^{k+1})/\mu , \end{array}\right. \nonumber \\ \end{aligned}$$
(24)

where \(\Lambda \) is the Lagrange multiplier associated with the equality constraint in (23) and \(\mu >0\) is a penalty parameter. Following Theorem 5.1, we know that the sequence \(\{(X^k,Y^k,\Lambda ^k)\}\) generated by (24) globally converges to a pair of primal and dual optimal solutions \((X^*,Y^*)\) and \(\Lambda ^*\) of (23) from any starting point.

Next we show that the two subproblems in (24) are both easy to solve. The first subproblem in (24) can be equivalently written as

$$\begin{aligned} X^{k+1} := {\mathop {\mathop {\hbox {argmin}}\nolimits }\limits _{{X\in \mathcal {C}}}} \frac{1}{2}\Vert X-(Y^k+\mu \Lambda ^k)\Vert _F^2, \end{aligned}$$
(25)

i.e., the solution of the first subproblem in (24) corresponds to the projection of \(Y^k+\mu \Lambda ^k\) onto convex set \(\mathcal {C}\). We will elaborate how to compute this projection in Sect. 5.2.

The second subproblem in (24) can be reduced to:

$$\begin{aligned} Y^{k+1} := {\mathop {\mathop {\hbox {argmin}}\nolimits }_{{Y}}} \quad \mu \rho \Vert Y\Vert _*+\frac{1}{2}\Vert Y-(X^{k+1}-\mu (\Lambda ^k-F))\Vert _F^2. \end{aligned}$$
(26)

This problem is known to have a closed-form solution that is given by the following so-called matrix shrinkage operation (see, e.g., [43]):

$$\begin{aligned} Y^{k+1} := U\mathrm{Diag}\,\left( \max \left\{ \sigma -\mu \rho ,0\right\} \right) V^{\top }, \end{aligned}$$

where \(U\mathrm{Diag}\,(\sigma )V^{\top }\) is the singular value decomposition of matrix \(X^{k+1}-\mu (\Lambda ^k-F)\).
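Putting the two subproblem solutions together, iteration (24) can be sketched in a few lines of NumPy. In the sketch below (which, along with its function names and starting point, is ours), `project_onto_C` is assumed to implement the projection (25), whose closed form is derived in Sect. 5.2, and the parameter values simply mirror those reported in Sect. 6:

```python
import numpy as np

def shrink(W, tau):
    # Singular value soft-thresholding: the closed-form solution of (26).
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def admm_nnp(F, project_onto_C, rho=10.0, mu=0.5, tol=1e-6, max_iter=5000):
    # Illustrative sketch of ADMM iteration (24) for the nuclear norm penalty
    # problem (15); F = M(F), and project_onto_C is assumed given (Sect. 5.2).
    N = F.shape[0]
    X = np.eye(N) / N
    Y = X.copy()
    Lam = np.zeros((N, N))
    for _ in range(max_iter):
        X_old = X
        X = project_onto_C(Y + mu * Lam)              # subproblem (25)
        Y = shrink(X - mu * (Lam - F), mu * rho)      # subproblem (26)
        Lam = Lam - (X - Y) / mu                      # multiplier update
        if (np.linalg.norm(X - X_old) / np.linalg.norm(X_old)
                + np.linalg.norm(X - Y)) <= tol:      # stopping rule of Sect. 6.1
            break
    return X, Y, Lam
```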

5.2 The projection

In this subsection, we study how to solve (25), i.e., how to compute the following projection for any given matrix \(Z\in \mathbf {S}^{n^d\times n^d}\):

$$\begin{aligned} \begin{array}{l@{\quad }l} \min &{} \Vert X - Z\Vert _{F}^2 \\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1,\\ &{} \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}}. \end{array} \end{aligned}$$
(27)

For the sake of discussion, in the following we consider the equivalent tensor representation of (27):

$$\begin{aligned} \begin{array}{l@{\quad }l} \min &{} \Vert \mathcal {X} - \mathcal {Z}\Vert _{F}^2 \\ \mathrm{s.t.}&{} \sum \limits _{k \in {\mathbb K}(n,d)}\frac{d!}{\prod _{j=1}^{n}{k_j!}}\mathcal {X}_{1^{2k_1}2^{2k_2}\ldots n^{2k_n}} =1,\\ &{} \mathcal {X} \in \mathbf {S}^{n^{2d}}, \end{array} \end{aligned}$$
(28)

where \(\mathcal {X} = \varvec{M}\,^{-1}(X), \mathcal {Z} = \varvec{M}\,^{-1}(Z)\), and the equality constraint is due to (11). Now we denote index set

$$\begin{aligned} \mathbf {I} = \left\{ (i_1\ldots i_{2d}) \in \pi (1^{2k_1}\ldots n^{2k_n} ) \; \big \vert \; k=(k_1,\ldots ,k_n) \in {\mathbb K}(n,d) \right\} . \end{aligned}$$

Then the first-order optimality conditions of Problem (28) imply

$$\begin{aligned} \left\{ \! \begin{array}{l@{\quad }l} 2 \left( |\pi (i_1\ldots i_{2d})|\,\mathcal {X}_{i_1\ldots i_{2d}} \!-\! \sum \limits _{j_1\ldots j_{2d} \in \pi (i_1\ldots i_{2d})}\mathcal {Z}_{j_1\ldots j_{2d}} \right) \!=\!0, &{}\text{ if }\; (i_1\ldots i_{2d}) \!\not \in \! \mathbf {I}, \\ 2 \left( \frac{(2d)!}{\prod _{j=1}^{n}{(2k_j)!}}\mathcal {X}_{1^{2k_1}\ldots n^{2k_n}} - \sum \limits _{j_1\ldots j_{2d} \in \pi (1^{2k_1}\ldots n^{2k_n})}\mathcal {Z}_{j_1\ldots j_{2d}} \right) - \lambda \, \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}} =0, &{}\text{ otherwise }. \end{array} \right. \end{aligned}$$

Denote by \(\hat{\mathcal {Z}}\) the super-symmetric counterpart of the tensor \(\mathcal {Z}\), i.e.,

$$\begin{aligned} \hat{\mathcal {Z}}_{i_1\ldots i_{2d}} = \sum \limits _{j_1\ldots j_{2d} \in \pi (i_1\ldots i_{2d})}\frac{\mathcal {Z}_{j_1\ldots j_{2d}}}{| \pi (i_1\ldots i_{2d})|} \end{aligned}$$

and \(\alpha (k,d):= \left( \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}} \right) / \left( \frac{(2d)!}{\prod _{j=1}^{n}{(2k_j)!}} \right) \). Then due to the first-order optimality conditions for (28), the optimal solution \(\mathcal {X}^*\) of Problem (28) satisfies

$$\begin{aligned} \left\{ \begin{array}{llll} \mathcal {X}_{i_1\ldots i_{2d}}^* &{}=&{} \hat{\mathcal {Z}}_{i_1\ldots i_{2d}}, &{} \text{ if } (i_1\ldots i_{2d}) \not \in \mathbf {I}, \\ \mathcal {X}_{1^{2k_1}\ldots n^{2k_n}}^* &{}=&{} \frac{\lambda }{2}\, \alpha (k,d) + \hat{\mathcal {Z}}_{1^{2k_1}\ldots n^{2k_n}}, &{} \text{ otherwise } . \end{array} \right. \end{aligned}$$
(29)

Multiplying the second equality of (29) by \(\frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}}\) and summing the resulting equality over all \(k=(k_1,\ldots , k_n)\) yield

$$\begin{aligned} \sum \limits _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}} \mathcal {X}_{1^{2k_1}\ldots n^{2k_n}}^*&= \frac{\lambda }{2} \sum \limits _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}}\alpha (k,d)\\&+ \sum \limits _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}}\hat{\mathcal {Z}}_{1^{2k_1}\ldots n^{2k_n}}. \end{aligned}$$

It remains to determine \(\lambda \). Noticing that \(\mathcal {X}^*\) is a feasible solution for problem (28), we have \(\sum _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}} \mathcal {X}_{1^{2k_1}\ldots n^{2k_n}}^*=1\). As a result,

$$\begin{aligned} \lambda = 2\left( 1 - \sum _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}}\hat{\mathcal {Z}}_{1^{2k_1}\ldots n^{2k_n}} \right) \bigg / \sum _{k \in {\mathbb K}(n,d)} \frac{(d)!}{\prod _{j=1}^{n}{(k_j)!}}\alpha (k,d), \end{aligned}$$

and thus we obtain \(\mathcal {X}^*\); then \({X}^* = \varvec{M}\,(\mathcal {X}^*)\) is the desired optimal solution of (27).
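The closed-form solution just derived can be implemented directly on the tensor representation; the following NumPy sketch (ours, using 0-based indices and intended only for moderate \(n\) and \(d\)) computes the projection (28):

```python
import itertools
import math
import numpy as np

def project_onto_C(Z_tensor, d):
    # Illustrative sketch of the projection (28): Z_tensor has 2d axes of size n
    # each; returns the tensor X* given by (29).
    n = Z_tensor.shape[0]
    perms = list(itertools.permutations(range(2 * d)))
    # hat{Z}: symmetrize Z by averaging over all index permutations
    Zhat = sum(np.transpose(Z_tensor, p) for p in perms) / len(perms)

    def m(k):       # multinomial weight d!/prod_j k_j!
        return math.factorial(d) // math.prod(math.factorial(kj) for kj in k)

    def alpha(k):   # alpha(k, d) = (d!/prod_j k_j!) / ((2d)!/prod_j (2k_j)!)
        return m(k) / (math.factorial(2 * d)
                       // math.prod(math.factorial(2 * kj) for kj in k))

    def diag(k):    # the index (1^{2k_1} ... n^{2k_n}), 0-based, as a 2d-tuple
        return tuple(j for j, kj in enumerate(k) for _ in range(2 * kj))

    # K(n, d): all k = (k_1,...,k_n) >= 0 with sum_j k_j = d
    K = [k for k in itertools.product(range(d + 1), repeat=n) if sum(k) == d]
    lam = 2.0 * (1.0 - sum(m(k) * Zhat[diag(k)] for k in K)) \
          / sum(m(k) * alpha(k) for k in K)
    X = Zhat.copy()
    for k in K:
        val = Zhat[diag(k)] + 0.5 * lam * alpha(k)
        for p in set(itertools.permutations(diag(k))):  # keep X super-symmetric
            X[p] = val
    return X
```

In the ADMM sketches of Sects. 5.1 and 5.3, the matrix iterate would be reshaped to its tensor form before calling this routine and reshaped back afterwards.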

5.3 ADMM for SDP relaxation (16)

Note that the SDP relaxation problem (16) can be formulated as

$$\begin{aligned} \begin{array}{ll} \min &{} - \mathrm{Tr}\,(F Y)\\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1, \quad \varvec{M}\,^{-1}(X) \in \mathbf {S}^{n^{2d}} \\ &{} X - Y = 0, \quad Y \succeq 0. \end{array} \end{aligned}$$
(30)

A typical iteration of ADMM for solving (30) is

$$\begin{aligned} \left\{ \begin{array}{lll} X^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{X \in \mathcal {C}} -\mathrm{Tr}\,(F Y^k) - \langle \Lambda ^k, X-Y^k \rangle + \frac{1}{2\mu }\Vert X-Y^k\Vert _F^2 \\ Y^{k+1} &{} := &{} \mathop {\hbox {argmin}}\nolimits _{Y \succeq 0} - \mathrm{Tr}\,(F Y) - \langle \Lambda ^k, X^{k+1}-Y \rangle + \frac{1}{2\mu }\Vert X^{k+1}-Y\Vert _F^2 \\ \Lambda ^{k+1} &{} := &{} \Lambda ^k - (X^{k+1}-Y^{k+1})/\mu , \end{array}\right. \end{aligned}$$
(31)

where \(\mu >0\) is a penalty parameter. Following Theorem 5.1, we know that the sequence \(\{(X^k,Y^k,\Lambda ^k)\}\) generated by (31) globally converges to a pair of primal and dual optimal solutions \((X^*,Y^*)\) and \(\Lambda ^*\) of (30) from any starting point.

Both subproblems in (31) are easy to solve. Specifically, the solution of the first subproblem in (31) corresponds to the projection of \(Y^k+\mu \Lambda ^k\) onto \(\mathcal {C}\), which was discussed in Sect. 5.2. The solution of the second subproblem in (31) corresponds to the projection of \(X^{k+1}+\mu F - \mu \Lambda ^k\) onto the positive semidefinite cone \(\{Y : Y\succeq 0\}\), i.e.,

$$\begin{aligned} Y^{k+1} := U\mathrm{Diag}\,\left( \max \{\sigma ,0\}\right) U^{\top }, \end{aligned}$$

where \(U\mathrm{Diag}\,(\sigma )U^{\top }\) is the eigenvalue decomposition of matrix \(X^{k+1}+\mu F - \mu \Lambda ^k\).
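Iteration (31) is therefore just an alternation between two projections; a compact NumPy sketch (ours, with `project_onto_C` denoting the projection of Sect. 5.2 applied to the matricized variable) reads:

```python
import numpy as np

def proj_psd(W):
    # Projection onto the positive semidefinite cone via eigenvalue thresholding.
    vals, vecs = np.linalg.eigh((W + W.T) / 2)
    return (vecs * np.maximum(vals, 0.0)) @ vecs.T

def admm_sdr(F, project_onto_C, mu=0.5, tol=1e-6, max_iter=5000):
    # Illustrative sketch of ADMM iteration (31) for the SDP relaxation (16); F = M(F).
    N = F.shape[0]
    X = np.eye(N) / N
    Y = X.copy()
    Lam = np.zeros((N, N))
    for _ in range(max_iter):
        X_old = X
        X = project_onto_C(Y + mu * Lam)          # projection onto C
        Y = proj_psd(X + mu * F - mu * Lam)       # projection onto {Y >= 0}
        Lam = Lam - (X - Y) / mu                  # multiplier update
        if (np.linalg.norm(X - X_old) / np.linalg.norm(X_old)
                + np.linalg.norm(X - Y)) <= tol:
            break
    return X, Y, Lam
```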

6 Numerical results

6.1 The ADMM for convex programs (15) and (16)

In this subsection, we report the results on using ADMM (24) to solve the nuclear norm penalty problem (15) and ADMM (31) to solve the SDP relaxation (16). For the nuclear norm penalty problem (15), we choose \(\rho =10\). For ADMM, we choose \(\mu =0.5\) and we terminate the algorithms whenever

$$\begin{aligned} \frac{\Vert X^{k}-X^{k-1}\Vert _F}{\Vert X^{k-1}\Vert _F} + \Vert X^{k}-Y^{k}\Vert _F \le 10^{-6}. \end{aligned}$$

We compare ADMM and CVX for solving (15) and (16), using the default solver of CVX, SeDuMi version 1.21. We report in Table 3 the results on randomly created problems with \(d=2\) and \(n=6,7, 8, 9\). For each pair of \(d\) and \(n\), we tested ten randomly created instances. In Table 3, we use ‘Inst.’ to denote the number of the instance and ‘Iter.’ to denote the number of iterations needed by ADMM to solve a random instance. We use ‘Sol.Dif.’ to denote the relative difference of the solutions obtained by ADMM and CVX, i.e., \(\mathrm{Sol.Dif.}=\frac{\Vert X_{ADMM}-X_{CVX}\Vert _F}{\max \{1, \Vert X_{CVX}\Vert _F\}}\), and ‘Val.Dif.’ to denote the relative difference of the objective values obtained by ADMM and CVX, i.e., \(\mathrm{Val.Dif.}=\frac{|v_{ADMM}-v_{CVX}|}{\max \{1, |v_{CVX}|\}}\). We use \(T_{ADMM}\) and \(T_{CVX}\) to denote the CPU times (in seconds) of ADMM and CVX, respectively. From Table 3 we see that ADMM produced solutions comparable to those of CVX; however, ADMM was much faster than CVX, i.e., the interior-point solver, especially for the nuclear norm penalty problem (15). Note that when \(n=10\), ADMM was about 500 times faster than CVX for solving (15), and about 8 times faster for solving (16).

Table 3 Comparison of CVX and ADMM for small-scale problems

In Table 4, we report the results on larger problems, i.e., \(n=14, 16, 18, 20\). Because it becomes time-consuming to use CVX to solve the nuclear norm penalty problem (15) (our numerical tests showed that it took more than three hours to solve one instance of (15) for \(n=11\) using CVX), we compare the solution quality and objective value obtained by ADMM for solving (15) with the solution generated by CVX for solving the SDP problem (16). From Table 4 we see that the nuclear norm penalty problem (15) and the SDP problem (16) indeed produce the same solution, as both are sufficiently close to the solution produced by CVX. We also see that using ADMM to solve (15) and (16) was much faster than using CVX to solve (16).

Table 4 Comparison of CVX and ADMM for large-scale problems

6.2 Comparison with SOS and MBI

Based on the results of the above tests, we may conclude that solving the SDP relaxation by ADMM is the most efficient approach. In this subsection, we compare this approach with two competing methods: one is based on the Sum of Squares (SOS) approach (Lasserre [35, 36] and Parrilo [45, 46]), and the other is the Maximum Block Improvement (MBI) method proposed by Chen et al. [10].

Theoretically speaking, the SOS approach can solve general polynomial optimization problems to any given accuracy, but it requires solving a sequence of (possibly large) semidefinite programs, which limits its applicability to large-scale problems. Henrion et al. [26] developed a specialized Matlab toolbox known as GloptiPoly 3 based on the SOS approach, which will be used in our tests. The MBI method is tailored for multi-block optimization problems, and polynomial optimization can be treated as a multi-block problem to which MBI can be applied. As mentioned before, MBI aims at finding a stationary point, which may or may not be globally optimal.

In Table 5 we report the results of using ADMM to solve the SDP relaxation of the tensor PCA problem and compare them with the results of applying the SOS method as well as the MBI method to the same problem. When using the MBI, as suggested in [10], we actually work on an equivalent version of problem (6): \(\max \nolimits _{\Vert x \Vert = 1} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d})+ 6(x^{\top }x)^{d}\), where the equivalence is due to the constraint \(\Vert x \Vert = 1\). This transformation helps the MBI avoid getting trapped in a local minimum.

Table 5 Comparison of SDP relaxation by ADMM with GloptiPoly 3 and MBI

We use ‘Val.’ to denote the objective value of the solution and ‘Status’ to denote the optimality status reported by GloptiPoly \(3\); \(\mathrm{Status}=1\) means that GloptiPoly \(3\) successfully certified the optimality of the current solution. ‘Sol.R.’ denotes the rank of the solution returned by the SDP relaxation; by the previous discussion, ‘Sol.R. \(=\) 1’ means that the current solution is already optimal. From Table 5, we see that the MBI is the fastest among all the methods but usually cannot guarantee global optimality, while GloptiPoly \(3\) is very time-consuming but can globally solve most instances. Note that when \(n=20\), our ADMM was about 30 times faster than GloptiPoly \(3\). Moreover, for some instances GloptiPoly \(3\) could not certify optimality even though the current solution was actually already optimal (see instance \(9\) with \(n=16\) and instance \(3\) with \(n=18\)).

6.3 Comparison with Z-eigenvalue methods

Qi et al. [49] proposed two heuristic methods to find the maximum Z-eigenvalue of third order super-symmetric tensors. We will show later in Sect. 8.2 that our method can solve tri-linear (not necessarily super-symmetric) tensor PCA problems. Thus, in this subsection, we report the results of using ADMM to solve the SDP relaxation of third order tensor PCA problems and compare them with the results of applying the two Z-eigenvalue methods, which are referred to as “Z1” and “Z2” in Tables 6 and 7. In Table 6, the 1,000 test tensors are generated with entries drawn from a normal distribution, while in Table 7 the 1,000 instances are generated with entries drawn from the uniform distribution on the interval \((-1,1)\). We report, in Tables 6 and 7, the number of instances that are globally solved by each algorithm. According to these experiments, the performance of our approach is better than that of Z1 and comparable to that of Z2.

Table 6 Comparison with Z-eigenvalue methods with data generated by normal distribution
Table 7 Comparison with Z-eigenvalue methods with data generated by uniform distribution

7 What if the solution is not rank-one?

Although our numerical results strongly indicate that problems (15) and (16) are very likely to admit rank-one solutions (100 % of the randomly created problems we tested), it is in principle possible that the solution \(X^*= \sum _{i=1}^{r}a^i(a^i)^{\top }\) is not of rank one, i.e., \(r > 1\). In this situation, we can introduce a small perturbation to the original tensor \(\mathcal {F}\) and apply the proposed algorithms to the perturbed problem. If the newly obtained solution is rank-one, it can be taken as a good approximation of the “true” solution. Another way to proceed is to apply a post-processing procedure, discussed below, to \(X^*\) in order to obtain a rank-one solution.

We denote \(\mathcal {X}^*=\varvec{M}\,^{-1}(X^*)\), and \(\mathcal {X}^*=\sum \nolimits _{i=1}^{r}\mathcal {A}^i\otimes \mathcal {A}^i \), where \(\varvec{V}\,(\mathcal {A}^i)=a^{i}\). Basically, we want to find the projection of \(\mathcal {X}^*\) onto the rank-one tensor set \(\{ \mathcal {X} \in \mathbf {S}^{2d} \; \big \vert \; \mathrm{rank}(\mathcal {X})=1, \Vert \mathcal {X} \Vert _F=1 \}\):

$$\begin{aligned} \min \limits _{\mathcal {X} \in \mathbf {S}^{2d}, \Vert \mathcal {X}\Vert _F=1, \mathrm{rank}(\mathcal {X})=1}\Vert \mathcal {X}^*-\mathcal {X}\Vert _{F}, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \max \limits _{\Vert x^1\Vert =\cdots =\Vert x^{2d}\Vert =1}\mathcal {X}^* \left( x^1,x^2,\ldots ,x^{2d}\right) . \end{aligned}$$
(32)

This is a problem of the form (6), except that \(\mathcal {X}^*\) has additional structure, which plays an important role in the discussion below.

Proposition 7.1

For a tensor \(\mathcal {F} =\sum _{i=1}^{r}\mathcal {A}^i\otimes \mathcal {A}^i \in \mathbf {S}^{n^{2d}}\), it holds that

$$\begin{aligned} \mathcal {F}\left( x^1,x^1,x^2,x^2,\ldots ,x^d,x^d\right) \ge 0, \quad \forall \;x^1,x^2,\ldots ,x^d. \end{aligned}$$
(33)

Proof

Since \(\mathcal F\) is super-symmetric, for any \(x^1,x^2,\ldots ,x^d\),

$$\begin{aligned} \mathcal {F}\left( x^1,x^1,x^2,x^2,\ldots ,x^d,x^d\right)&= \mathcal {F}\left( x^1,x^2,\ldots ,x^d,x^1,x^2,\ldots ,x^d\right) \\&= \sum \limits _{i=1}^{r}\mathcal {A}^i\otimes \mathcal {A}^i\left( x^1,x^2,\ldots ,x^d,x^1,x^2,\ldots ,x^d\right) \\&= \sum \limits _{i=1}^{r}\left( \mathcal {A}^i\left( x^1,x^2,\ldots ,x^d\right) \right) ^2\ge 0. \end{aligned}$$

\(\square \)

A tensor satisfying inequality (33) is called co-quadratic positive semidefinite. In [9], it was proved that if \(\mathcal F\) is co-quadratic positive semidefinite, then

$$\begin{aligned} \mathcal {F}\left( x^1,x^2,\ldots ,x^{2d}\right) \le \max \limits _{1 \le i \le 2d}\left\{ \mathcal {F}(\underbrace{x^i,\ldots ,x^i}_{2d})\right\} . \end{aligned}$$

As a result,

$$\begin{aligned} \max \limits _{\Vert x\Vert =1}\mathcal {F}(\underbrace{x,\ldots ,x}_{2d})= \max \limits _{\Vert x^1\Vert =\cdots =\Vert x^{2d}\Vert =1}\mathcal {F}\left( x^1,x^2,\ldots ,x^{2d}\right) . \end{aligned}$$
(34)

For a solution \(X^*\) that is optimal to either (15) or (16), Proposition 7.1 implies that \(\mathcal {X}^* =\varvec{M}\,^{-1}(X^*)\) is co-quadratic positive semidefinite. So problem (32) of finding the rank-one projection of \(\mathcal {X}^*\) is equivalent to

$$\begin{aligned} \max \limits _{\Vert x^1\Vert =\cdots =\Vert x^{2d}\Vert =1}\mathcal {X}^*\left( x^1,x^2,\ldots ,x^{2d}\right) \end{aligned}$$

due to equality (34). Essentially, we can resort to a multi-linear problem, which is easier than (6) and can be solved, for instance, by the MBI method proposed in [10]. Note that if we applied the MBI method directly to (6), relation (34) would not be guaranteed to hold, and the MBI could get trapped at a local solution rather than a global maximizer.
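As an illustration of this post-processing step, the following sketch runs a simple cyclic block-coordinate ascent on the multi-linear problem (32) for the case \(2d=4\). It is only a stand-in for the MBI method of [10] (which updates the block giving the maximum improvement rather than cycling), and the reshaping of \(X^*\) into a fourth order tensor assumes the same index ordering as our matricization; all names are ours.

```python
import numpy as np

def contract_all_but(T, vecs, skip):
    """Contract the 4th order tensor T with every vector except block `skip`."""
    out = T
    for ax in range(3, -1, -1):      # contract from the last axis so indices stay valid
        if ax != skip:
            out = np.tensordot(out, vecs[ax], axes=([ax], [0]))
    return out                        # a vector

def rank_one_projection(T, iters=500, seed=0):
    """Block-coordinate ascent for max T(x1,x2,x3,x4) s.t. ||xi|| = 1."""
    n = T.shape[0]
    rng = np.random.default_rng(seed)
    vecs = [v / np.linalg.norm(v) for v in rng.standard_normal((4, n))]
    for _ in range(iters):
        for i in range(4):
            g = contract_all_but(T, vecs, skip=i)
            nrm = np.linalg.norm(g)
            if nrm > 0:
                vecs[i] = g / nrm    # best unit vector for block i, others fixed
    val = float(np.einsum('ijkl,i,j,k,l->', T, *vecs))
    return val, vecs

# usage sketch: X_opt is the n^2-by-n^2 solution of (15) or (16) with 2d = 4
# X_tensor = X_opt.reshape(n, n, n, n); val, vecs = rank_one_projection(X_tensor)
```

Since \(\mathcal {X}^*\) is co-quadratic positive semidefinite, equality (34) ensures that the multi-linear and homogeneous maxima coincide; like MBI, however, the block ascent itself is only guaranteed to reach a stationary point.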

8 Extensions

In this section, we show how to extend the results in the previous sections for super-symmetric tensor PCA problem to tensors that are not super-symmetric.

8.1 Bi-quadratic tensor PCA

A closely related problem to the tensor PCA problem (6) is the following bi-quadratic PCA problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {G}(x,y,x,y) \\ \mathrm{s.t.}&{} x\in \mathbf {R}^{n},\Vert x\Vert =1,\\ &{} y\in \mathbf {R}^{m},\Vert y\Vert =1, \end{array} \end{aligned}$$
(35)

where \(\mathcal G \) is a partial-symmetric tensor defined as follows.

Definition 8.1

A \(4\)-th order tensor \(\mathcal G \in \mathbf {R}^{{(nm)^2}}\) is called partial-symmetric if \(\mathcal {G}_{ijk\ell }=\mathcal {G}_{kji\ell }=\mathcal {G}_{i\ell kj}\) for all \(i,j,k,\ell \). The space of all \(4\)-th order partial-symmetric tensors is denoted by \(\overrightarrow{\overrightarrow{\mathbf {S}}}^{(nm)^2}\).

Various approximation algorithms for the bi-quadratic PCA problem have been studied in [39]. Problem (35) arises from the strong ellipticity condition problem in solid mechanics and the entanglement problem in quantum physics; see [39] for more applications of the bi-quadratic PCA problem.

We can unfold a partial-symmetric tensor \(\mathcal {G}\) in a similar manner as in Definition 1.3.

Definition 8.2

For \(\mathcal {G} \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^2}}\), we define its square matrix rearrangement, denoted by \(\varvec{M}\,(\mathcal {G})\in \mathbf {R}^{mn\times mn}\), as the following:

$$\begin{aligned} \varvec{M}\,(\mathcal {G})_{k \ell }:=\mathcal {G}_{i_1i_{2}i_{3}i_{4}}, \quad 1\le i_1,i_{3} \le n,\; 1\le i_2,i_{4} \le m, \end{aligned}$$

where \(k =(i_1-1)m+i_2\) and \(\ell = (i_3-1)m+i_4\).

It is easy to check that for given vectors \(a \in \mathbf {R}^n\) and \(b \in \mathbf {R}^m\), we have \(a \otimes b \otimes a \otimes b \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^2}}\). Moreover, we say that a partial-symmetric tensor \(\mathcal {G}\) is of rank one if \(\mathcal {G}={\lambda }\,a \otimes b \otimes a \otimes b\) for some vectors \(a,b\) and scalar \(\lambda \).
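To make Definition 8.2 concrete, here is a minimal NumPy sketch (names are ours) showing that, with zero-based indices and row-major ordering, the square matrix rearrangement is just a reshape, and that for a rank-one partial-symmetric tensor it produces a rank-one matrix:

```python
import numpy as np

def square_matrix_rearrangement(G):
    """Map a tensor G of shape (n, m, n, m) to the mn-by-mn matrix M(G),
    with row index k = i1*m + i2 and column index l = i3*m + i4 (zero-based)."""
    n, m = G.shape[0], G.shape[1]
    return G.reshape(n * m, n * m)

def is_partial_symmetric(G, tol=1e-10):
    """Check G_{ijkl} == G_{kjil} == G_{ilkj} entrywise."""
    return (np.allclose(G, G.transpose(2, 1, 0, 3), atol=tol)
            and np.allclose(G, G.transpose(0, 3, 2, 1), atol=tol))

# rank-one example: G = a ⊗ b ⊗ a ⊗ b
n, m = 3, 2
rng = np.random.default_rng(1)
a, b = rng.standard_normal(n), rng.standard_normal(m)
G = np.einsum('i,j,k,l->ijkl', a, b, a, b)
assert is_partial_symmetric(G)
v = np.kron(a, b)                     # entries ordered as k = i1*m + i2
assert np.allclose(square_matrix_rearrangement(G), np.outer(v, v))
```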

Since \(\mathrm{Tr}\,(xy^{\top }yx^{\top })=x^{\top }x\,y^{\top }y=1\), by letting \(\mathcal {X}=x \otimes y \otimes x \otimes y\), problem (35) can be written as the maximization of a linear function of \(\mathcal {X}\) over rank-one partial-symmetric tensors with unit Frobenius norm. In the following, we group the variables \(x\) and \(y\) together and treat \(x \otimes y\) as a long vector by stacking its columns. Denote \(X=\varvec{M}\,(\mathcal {X})\) and \(G = \varvec{M}\,(\mathcal {G})\). Then we end up with a matrix version of this tensor problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathrm{Tr}\,(G X) \\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1,\quad X \succeq 0,\quad \mathrm{rank}(X)=1,\\ &{} \varvec{M}\,^{-1}(X) \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^{2}}}. \end{array} \end{aligned}$$
(36)

As it turns out, the rank-one equivalence theorem can be extended to partial-symmetric tensors, and therefore the tensor formulation above and its matrix version (36) are equivalent.

Theorem 8.1

Suppose \(A\) is an \(n\times m\) dimensional matrix. Then the following two statements are equivalent:

  1. (i)

    \(\mathrm{rank}(A)=1\);

  2. (ii)

    \(A \otimes A \ \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^2}}\).

In other words, suppose \(\mathcal {F} \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^2}}\), then \(\mathrm{rank}(\mathcal {F})=1 \Longleftrightarrow \mathrm{rank}(F)=1\), where \(F = \varvec{M}\,(\mathcal {F})\).

Proof

(i) \(\Longrightarrow \) (ii) is obvious. Suppose \(\mathrm{rank}(A)=1\), say \(A=ab^{\top }\) for some \(a\in \mathbf {R}^n\) and \(b \in \mathbf {R}^m\). Then \(\mathcal {G}=A \otimes A = a \otimes b \otimes a \otimes b\) is partial-symmetric.

Conversely, suppose \(\mathcal {G}=A \otimes A \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^2}}\). Then

$$\begin{aligned} A_{i_1j_1}A_{i_2j_2}=\mathcal {G}_{i_1j_1i_2j_2}=\mathcal {G}_{i_2j_1i_1j_2}=A_{i_2j_1}A_{i_1j_2}, \;\forall 1\le i_1,i_2 \le n,\;1\le j_1,j_2 \le m, \end{aligned}$$

implies \(A_{i_1j_1}A_{i_2j_2} - A_{i_2j_1}A_{i_1j_2}=0\). That is, every \(2 \times 2\) minor of matrix \(A\) is zero. Thus \(A\) is of rank one.\(\square \)

By an argument similar to that of Theorem 4.1, we can show that the following SDP relaxation of (36) has a good chance of returning a low-rank solution.

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathrm{Tr}\,(G X) \\ \mathrm{s.t.}&{} \mathrm{Tr}\,(X) =1,\quad X \succeq 0,\\ &{} \varvec{M}\,^{-1}(X) \in \overrightarrow{\overrightarrow{\mathbf {S}}}^{{(nm)^{2}}}. \end{array} \end{aligned}$$
(37)

Therefore, we applied the same ADMM to solve (37). The frequency of obtaining rank-one solutions for randomly created examples is reported in Table 8. As in Tables 1 and 2, we tested \(100\) random instances for each pair \((n,m)\) and report the number of instances for which a rank-one solution was produced. We also report the average CPU time (in seconds) of ADMM. Table 8 shows that the SDP relaxation (37) yields a rank-one solution for most randomly created instances, and thus is likely to solve the original problem (35) to optimality.
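For readers who want to experiment with (37) without implementing ADMM, the following sketch solves a small instance with CVXPY instead. This is not the ADMM of this paper; the problem sizes, the random partial-symmetric data, and all names are illustrative only, and an SDP-capable solver such as SCS is assumed to be available.

```python
import numpy as np
import cvxpy as cp

n, m = 3, 2
N = n * m
rng = np.random.default_rng(0)

# build a random partial-symmetric data tensor and its unfolding M(G)
Graw = rng.standard_normal((n, m, n, m))
Gt = (Graw + Graw.transpose(2, 1, 0, 3) + Graw.transpose(0, 3, 2, 1)
      + Graw.transpose(2, 3, 0, 1)) / 4.0
G = Gt.reshape(N, N)                      # row index k = i1*m + i2

X = cp.Variable((N, N), PSD=True)
constraints = [cp.trace(X) == 1]
# partial-symmetry of M^{-1}(X): X_{i1 i2 i3 i4} = X_{i3 i2 i1 i4};
# together with the symmetry of X this also enforces X_{i1 i2 i3 i4} = X_{i1 i4 i3 i2}
for i1 in range(n):
    for i2 in range(m):
        for i3 in range(n):
            for i4 in range(m):
                k, l = i1 * m + i2, i3 * m + i4
                k2, l2 = i3 * m + i2, i1 * m + i4
                if (k, l) < (k2, l2):
                    constraints.append(X[k, l] == X[k2, l2])

prob = cp.Problem(cp.Maximize(cp.trace(G @ X)), constraints)
prob.solve()
print("optimal value:", prob.value)
print("numerical rank of X:", np.linalg.matrix_rank(X.value, tol=1e-6))
```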

Table 8 Frequency of problem (37) having rank-one solution

8.2 Tri-linear tensor PCA

Now let us consider a highly non-symmetric case: tri-linear PCA.

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(x,y,z) \\ \mathrm{s.t.}&{} x\in \mathbf {R}^{n},\Vert x\Vert =1,\\ &{} y\in \mathbf {R}^{m},\Vert y\Vert =1,\\ &{} z\in \mathbf {R}^{\ell },\;\, \Vert z\Vert =1, \end{array} \end{aligned}$$
(38)

where \(\mathcal {F} \in \mathbf {R}^{n \times m \times \ell }\) is any \(3\)-rd order tensor and \(n \le m \le \ell \).

The tri-linear PCA problem has recently been found to be very useful in practice. For instance, Wang and Ahuja [56] proposed a tensor rank-one decomposition algorithm to compress image sequences, where essentially a sequence of tri-linear PCA problems is solved.

By the Cauchy–Schwarz inequality, the problem (38) is equivalent to

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \Vert \mathcal {F}(x,y,\cdot )\Vert \\ \mathrm{s.t.}&{} x\in \mathbf {R}^{n},\Vert x\Vert =1,\\ &{} y\in \mathbf {R}^{m},\Vert y\Vert =1, \end{array} \Longleftrightarrow \begin{array}{l@{\quad }l} \max &{} \Vert \mathcal {F}(x,y,\cdot )\Vert ^2 \\ \mathrm{s.t.}&{} x\in \mathbf {R}^{n},\Vert x\Vert =1,\\ &{} y\in \mathbf {R}^{m},\Vert y\Vert =1. \end{array} \end{aligned}$$

We further notice

$$\begin{aligned} \Vert \mathcal {F}(x,y,\cdot )\Vert ^2 =\mathcal {F}(x,y,\cdot )^{\top }\mathcal {F}(x,y,\cdot )&= {\sum \limits _{i,u=1}^{n}\sum \limits _{j,v=1}^{m}}\sum \limits _{k=1}^{\ell }\mathcal {F}_{ijk}\,\mathcal {F}_{uvk}\,x_iy_jx_uy_v\\&= {\sum \limits _{i,u=1}^{n}\sum \limits _{j,v=1}^{m}} \sum \limits _{k=1}^{\ell }\mathcal {F}_{ivk}\,\mathcal {F}_{ujk}\,x_iy_vx_uy_j. \end{aligned}$$

Therefore,

$$\begin{aligned} \Vert \mathcal {F}(x,y,\cdot )\Vert ^2 = \frac{\sum _{i,u=1}^{n}\sum _{j,v=1}^{m}\left( \sum _{k=1}^{\ell }\mathcal {F}_{ijk}\,\mathcal {F}_{uvk}+\sum _{k=1}^{\ell }\mathcal {F}_{ivk}\,\mathcal {F}_{ujk}\right) }{2}x_iy_jx_uy_v, \end{aligned}$$

and we can find a partial symmetric tensor \(\mathcal {G}\) with

$$\begin{aligned} \mathcal {G}_{ijuv}=\sum \limits _{k=1}^{\ell }\left( \mathcal {F}_{ijk}\,\mathcal {F}_{uvk} + \mathcal {F}_{ivk}\,\mathcal {F}_{ujk}\right) /2,\,\,\,\,\forall \; i,j,u,v, \end{aligned}$$

such that \(\Vert \mathcal {F}(x,y,\cdot )\Vert ^2 = \mathcal {G}\left( x,y,x,y \right) \). Hence, the tri-linear problem (38) can be equivalently formulated in the form of problem (35), which can be solved by the method proposed in the previous subsection.
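The construction of \(\mathcal {G}\) above is easy to implement; the following sketch (NumPy, names ours) builds \(\mathcal {G}\) from \(\mathcal {F}\) via two einsum contractions and numerically verifies the identity \(\Vert \mathcal {F}(x,y,\cdot )\Vert ^2 = \mathcal {G}(x,y,x,y)\) on random data:

```python
import numpy as np

def build_partial_symmetric_G(F):
    """G_{ijuv} = 0.5 * sum_k (F_{ijk} F_{uvk} + F_{ivk} F_{ujk})."""
    G1 = np.einsum('ijk,uvk->ijuv', F, F)
    G2 = np.einsum('ivk,ujk->ijuv', F, F)
    return 0.5 * (G1 + G2)

n, m, l = 3, 4, 5
rng = np.random.default_rng(2)
F = rng.standard_normal((n, m, l))
x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(m); y /= np.linalg.norm(y)

G = build_partial_symmetric_G(F)
lhs = np.linalg.norm(np.einsum('ijk,i,j->k', F, x, y)) ** 2   # ||F(x, y, .)||^2
rhs = np.einsum('ijuv,i,j,u,v->', G, x, y, x, y)              # G(x, y, x, y)
assert np.isclose(lhs, rhs)
```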

8.3 Quadri-linear tensor PCA

In this subsection, we consider the following quadri-linear PCA problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(x^1,x^2,x^3,x^4) \\ \mathrm{s.t.}&{} x^i \in \mathbf {R}^{n_i}, \Vert x^i\Vert =1,\;\forall \; i=1,2,3,4, \end{array} \end{aligned}$$
(39)

where \(\mathcal {F} \in \mathbf {R}^{n_1 \times \cdots \times n_{4}}\) with \(n_1 \le n_3 \le n_2 \le n_4\). Let us first convert the quadri-linear function \(\mathcal {F}(x^1,x^2,x^3,x^4)\) to a bi-quadratic function \(\mathcal {T}\left( \begin{array}{l} {x^1} \\ {x^3}\end{array}, \begin{array}{l}{x^2} \\ {x^4}\end{array}, \begin{array}{l} {x^1} \\ {x^3}\end{array}, \begin{array}{l} {x^2}\\ {x^4}\end{array}\right) \) with \(\mathcal {T}\) being partial symmetric. To this end, we first construct \(\mathcal {G}\) with

$$\begin{aligned} \mathcal {G}_{i_1,\,i_2,\,n_1+i_3,\,n_2+i_4}=\left\{ \begin{array}{ll} \mathcal {F}_{i_1 i_2 i_3 i_4},\quad &{} \text{ if }\;1 \le i_k \le n_k,\; k=1,2,3,4, \\ 0, &{} \text{ otherwise }, \end{array}\right. \end{aligned}$$

where \(\mathcal {G}\) is a \(4\)-th order tensor of dimension \((n_1+n_3)\times (n_2+n_4)\times (n_1+n_3)\times (n_2+n_4)\).

Consequently, we have \(\mathcal {F}(x^1,x^2,x^3,x^4) = \mathcal {G}\left( \begin{array}{l} {x^1} \\ {x^3}\end{array}, \begin{array}{l}{x^2}\\ {x^4}\end{array}, \begin{array}{l} {x^1} \\ {x^3}\end{array}, \begin{array}{l} {x^2} \\ {x^4}\end{array}\right) \). Then we can further partial-symmetrize \(\mathcal {G}\) and the desired tensor \(\mathcal {T}\) is as follows,

$$\begin{aligned} \mathcal {T}_{i_1i_2 i_3 i_4}= \frac{1}{4}\left( \mathcal {G}_{i_1i_2 i_3 i_4}+\mathcal {G}_{i_1i_4 i_3 i_2}+\mathcal {G}_{i_3i_2 i_1 i_4}+\mathcal {G}_{i_3i_4 i_1 i_2} \right) \quad \forall \; i_1,i_{2},i_3,i_4, \end{aligned}$$

satisfying \(\mathcal {T}\left( \begin{array}{l} {x^1}\\ {x^3}\end{array},\quad \begin{array}{l} {x^2} \\ {x^4}\end{array},\quad \begin{array}{l} {x^1} \\ {x^3}\end{array},\quad \begin{array}{l} {x^2}\\ {x^4}\end{array}\right) =\mathcal {G}\left( \begin{array}{l}{x^1} \\ {x^3}\end{array},\quad \begin{array}{l}{x^2}\\ {x^4}\end{array},\quad \begin{array}{l} {x^1} \\ {x^3}\end{array}, \quad \begin{array}{l}{x^2} \\ {x^4}\end{array} \right) \). Therefore, problem (39) is now reformulated as a bi-quadratic problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {T}\left( \begin{array}{l} {x^1} \\ {x^3}\end{array},\quad \begin{array}{l} {x^2} \\ {x^4}\end{array},\quad \begin{array}{l} {x^1} \\ {x^3}\end{array},\quad \begin{array}{l} {x^2} \\ {x^4}\end{array}\right) \\ \mathrm{s.t.}&{} x^i \in \mathbf {R}^{n_i}, \Vert x^i\Vert =1,\;\forall \; i=1,\ldots ,4. \end{array} \end{aligned}$$
(40)
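A small numerical sketch of this embedding and partial-symmetrization is given below (NumPy; the sizes and names are illustrative only). It builds \(\mathcal {G}\) by placing \(\mathcal {F}\) in the appropriate block, symmetrizes it into \(\mathcal {T}\), and checks the identity \(\mathcal {F}(x^1,x^2,x^3,x^4) = \mathcal {T}\big ((x^1;x^3),(x^2;x^4),(x^1;x^3),(x^2;x^4)\big )\):

```python
import numpy as np

n1, n2, n3, n4 = 2, 3, 2, 3
rng = np.random.default_rng(3)
F = rng.standard_normal((n1, n2, n3, n4))

# embed: G[i1, i2, n1+i3, n2+i4] = F[i1, i2, i3, i4], all other entries zero
p, q = n1 + n3, n2 + n4
G = np.zeros((p, q, p, q))
G[:n1, :n2, n1:, n2:] = F

# partial-symmetrize: average over swapping index pairs (1,3) and (2,4)
T = (G + G.transpose(0, 3, 2, 1) + G.transpose(2, 1, 0, 3)
     + G.transpose(2, 3, 0, 1)) / 4.0

x1, x2, x3, x4 = (rng.standard_normal(k) for k in (n1, n2, n3, n4))
u, v = np.concatenate([x1, x3]), np.concatenate([x2, x4])
lhs = np.einsum('ijkl,i,j,k,l->', F, x1, x2, x3, x4)
rhs = np.einsum('ijkl,i,j,k,l->', T, u, v, u, v)
assert np.isclose(lhs, rhs)
```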

Moreover, we can show that the above problem is equivalent to a bi-quadratic problem of the form (35).

Proposition 8.2

Suppose \(\mathcal {T}\) is a fourth order partial symmetric tensor. Then problem (40) is equivalent to

$$\begin{aligned} \begin{array}{l@{\quad }l} \max \quad{} \mathcal {T}\left( \begin{array}{l} {x^1} \\ {x^3}\end{array},\quad \begin{array}{l} {x^2} \\ {x^4}\end{array},\quad \begin{array}{l} {x^1} \\ {x^3}\end{array},\quad \begin{array}{l} {x^2} \\ {x^4}\end{array}\right) \\ \mathrm{s.t.}\quad{}\sqrt{\Vert x^1\Vert ^2+ \Vert x^3\Vert ^2}=\sqrt{2}, \\ \quad \quad \sqrt{\Vert x^2\Vert ^2+ \Vert x^4\Vert ^2}=\sqrt{2}. \end{array} \end{aligned}$$
(41)

Proof

It is obvious that (41) is a relaxation of (40). To show that the relaxation is tight, let \((\hat{x}^1, \hat{x}^2, \hat{x}^3, \hat{x}^4)\) be optimal to (41). Then \(\mathcal {T}\left( \begin{array}{l} {\hat{x}^1} \\ {\hat{x}^3}\end{array}, \begin{array}{l} {\hat{x}^2} \\ {\hat{x}^4}\end{array}, \begin{array}{l} {\hat{x}^1} \\ {\hat{x}^3}\end{array}, \begin{array}{l} {\hat{x}^2} \\ {\hat{x}^4}\end{array} \right) = \mathcal {F}(\hat{x}^1,\hat{x}^2,\hat{x}^3,\hat{x}^4)>0\) (assuming \(\mathcal {F}\ne 0\), the optimal value of (40), and hence that of (41), is positive), and so \(\hat{x}^i \ne 0\) for all \(i\). Moreover, notice that

$$\begin{aligned} \sqrt{\Vert \hat{x}^1\Vert \Vert \hat{x}^3\Vert } \le \sqrt{ \frac{ \Vert \hat{x}^1\Vert ^2 + \Vert \hat{x}^3\Vert ^2 }{2}}=1 \;\text{ and } \; \sqrt{\Vert \hat{x}^2\Vert \Vert \hat{x}^4\Vert } \le \sqrt{\frac{\Vert \hat{x}^2\Vert ^2 + \Vert \hat{x}^4\Vert ^2}{2}}=1. \end{aligned}$$

Thus

$$\begin{aligned} \mathcal {T}\left( \begin{array}{l} {\frac{\hat{x}^1}{\Vert \hat{x}^1\Vert }} \\ {\frac{\hat{x}^3}{\Vert \hat{x}^3\Vert }}\end{array}, \begin{array}{l} {\frac{\hat{x}^2}{\Vert \hat{x}^2\Vert }} \\ {\frac{\hat{x}^4}{\Vert \hat{x}^4\Vert }}\end{array},\quad \begin{array}{l} {\frac{\hat{x}^1}{\Vert \hat{x}^1\Vert }} \\ {\frac{\hat{x}^3}{\Vert \hat{x}^3\Vert }}\end{array},\quad \begin{array}{l} {\frac{\hat{x}^2}{\Vert \hat{x}^2\Vert }}\\ {\frac{\hat{x}^4}{\Vert \hat{x}^4\Vert }}\end{array}\quad \right)&= \mathcal {F}\left( \frac{\hat{x}^1}{\Vert \hat{x}^1\Vert }, \frac{\hat{x}^2}{\Vert \hat{x}^2\Vert },\frac{\hat{x}^3}{\Vert \hat{x}^3\Vert }, \frac{\hat{x}^4}{\Vert \hat{x}^4\Vert }\right) \\&= \frac{\mathcal {F}(\hat{x}^1,\hat{x}^2,\hat{x}^3,\hat{x}^4)}{\Vert \hat{x}^1\Vert \Vert \hat{x}^2\Vert \Vert \hat{x}^3\Vert \Vert \hat{x}^4\Vert }\\&\ge \mathcal {F}(\hat{x}^1,\hat{x}^2,\hat{x}^3,\hat{x}^4)\\&= \mathcal {T}\left( \begin{array}{l} {\hat{x}^1} \\ {\hat{x}^3}\end{array}, \begin{array}{l} {\hat{x}^2} \\ {\hat{x}^4}\end{array}, \begin{array}{l} {\hat{x}^1} \\ {\hat{x}^3}\end{array}, \begin{array}{l} {\hat{x}^2} \\ {\hat{x}^4}\end{array} \right) . \end{aligned}$$

To summarize, we have found a feasible solution \(\left( \frac{\hat{x}^1}{\Vert \hat{x}^1\Vert }, \frac{\hat{x}^2}{\Vert \hat{x}^2\Vert }, \frac{\hat{x}^3}{\Vert \hat{x}^3\Vert }, \frac{\hat{x}^4}{\Vert \hat{x}^4\Vert }\right) \) of (40) whose objective value is no less than the optimal value of the relaxation (41); thus the relaxation is tight. \(\square \)

By letting \(y= \left( \begin{array}{l} {x^1} \\ {x^3}\end{array}\right) \) and \(z=\left( \begin{array}{l} {x^2}\\ {x^4}\end{array}\right) \), and after a simple rescaling, we can see that problem (41) shares the same solution as

$$\begin{aligned} \begin{array}{l@{\quad }l} \max \quad {} \mathcal {T}\left( y,z,y,z\right) \\ \mathrm{s.t.} \quad {} \Vert y\Vert =1,\\ \quad \quad \Vert z\Vert =1, \end{array} \end{aligned}$$

which was studied in Subsect. 8.1.

8.4 Even order multi-linear PCA

The above discussion can be extended to the even order multi-linear PCA problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(x^1,x^2,\ldots ,x^{2d}) \\ \mathrm{s.t.}&{} x^i \in \mathbf {R}^{n_i}, \Vert x^i\Vert =1,\;\forall \; i=1,2,\ldots ,2d, \end{array} \end{aligned}$$
(42)

where \(\mathcal {F} \in \mathbf {R}^{n_1 \times \cdots \times n_{2d}}\). An immediate relaxation of (42) is the following

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(x^1,x^2,\ldots ,x^{2d}) \\ \mathrm{s.t.}&{} x^i \in \mathbf {R}^{n_i},\quad \sqrt{\sum \limits _{i=1}^{2d}\Vert x^i\Vert ^2 }=\sqrt{2d}. \end{array} \end{aligned}$$
(43)

The following result shows that these two problems are actually equivalent.

Proposition 8.3

It holds that problem (42) is equivalent to (43).

Proof

It suffices to show that the relaxation (43) is tight. To this end, suppose \((\hat{x}^1,\ldots , \hat{x}^{2d})\) is an optimal solution of (43). Then \(\mathcal {F}(\hat{x}^1,\hat{x}^2,\ldots ,\hat{x}^{2d}) >0\) (assuming \(\mathcal {F}\ne 0\)), and so \(\hat{x}^i \ne 0\) for \(i=1,\ldots , 2d\). We also notice that

$$\begin{aligned} \sqrt{\left( \,\prod \limits _{i=1}^{2d}\Vert \hat{x}^i\Vert ^2\right) ^{\frac{1}{2d}} } \le \sqrt{{\sum \limits _{i=1}^{2d}\Vert \hat{x}^i\Vert ^2}\,\Big /\,{2d}}=1. \end{aligned}$$

Consequently, \(\prod \nolimits _{i=1}^{2d}\Vert \hat{x}^i\Vert \le 1\) and

$$\begin{aligned} \mathcal {F}\left( \frac{\hat{x}^1}{\Vert \hat{x}^1 \Vert },\frac{\hat{x}^2}{\Vert \hat{x}^2 \Vert },\ldots ,\frac{\hat{x}^{2d}}{\Vert \hat{x}^{2d} \Vert }\right) =\frac{\mathcal {F}(\hat{x}^1,\hat{x}^2,\ldots ,\hat{x}^{2d})}{\prod \nolimits _{i=1}^{2d}\Vert \hat{x}^i\Vert } \ge \mathcal {F}(\hat{x}^1,\hat{x}^2,\ldots ,\hat{x}^{2d}). \end{aligned}$$

Therefore, we have found a feasible solution \(\left( \frac{\hat{x}^1}{\Vert \hat{x}^1 \Vert },\frac{\hat{x}^2}{\Vert \hat{x}^2 \Vert },\ldots ,\frac{\hat{x}^{2d}}{\Vert \hat{x}^{2d} \Vert }\right) \) of (42) whose objective value is no less than the optimal value of (43). This implies that the relaxation is tight. \(\square \)

We now focus on (43). Based on \(\mathcal {F}\), we can construct a larger tensor \(\mathcal {G}\) as follows

$$\begin{aligned} \mathcal {G}_{i_1\ldots i_{2d}}=\left\{ \begin{array}{cl} \mathcal {F}_{j_1\ldots j_{2d}},\quad &{}\text{ if }\;1+\sum \nolimits _{\ell =1}^{k-1}n_{\ell }\le i_k \le \sum \nolimits _{\ell =1}^{k}n_{\ell } \;\text{ and }\; j_k=i_k-\sum \nolimits _{\ell =1}^{k-1}n_{\ell },\;k=1,\ldots ,2d, \\ 0,&{} \text{ otherwise }. \end{array}\right. \end{aligned}$$

By this construction, we have

$$\begin{aligned} \mathcal {F}(x^1,x^2,\ldots ,x^{2d}) = \mathcal {G}(\underbrace{y,\ldots ,y}_{2d}) \end{aligned}$$

with \(y=((x^1)^{\top },(x^2)^{\top },\ldots ,(x^{2d})^{\top })^{\top }\). We can further symmetrize \(\mathcal {G}\) and find a super-symmetric \(\mathcal {T}\) such that

$$\begin{aligned} \mathcal {T}_{i_1\ldots i_{2d}} := \frac{1}{|\pi (i_1\ldots i_{2d})|}\sum _{j_1\ldots j_{2d} \in \pi (i_1\ldots i_{2d})} \mathcal {G}_{j_1\ldots j_{2d}},\quad \forall \,1\le i_1,\ldots ,i_{2d} \le \sum \limits _{\ell =1}^{2d}n_{\ell }, \end{aligned}$$

and

$$\begin{aligned} \mathcal {T}(\underbrace{y,\ldots ,y}_{2d})=\mathcal {G} (\underbrace{y,\ldots ,y}_{2d})=\mathcal {F}(x^1,x^2,\ldots ,x^{2d}). \end{aligned}$$

Therefore, problem (43) is equivalent to

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {T}(\underbrace{y,\ldots ,y}_{2d}) \\ \mathrm{s.t.}&{} \left\| y\right\| =\sqrt{2d}, \end{array} \end{aligned}$$

which is further equivalent, after rescaling \(y\), to

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {T}(\underbrace{y,\ldots ,y}_{2d}) \\ \mathrm{s.t.}&{} \left\| y\right\| =1. \end{array} \end{aligned}$$

Thus the methods we developed for solving (6) can be applied to solve (42).
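The embedding used in this subsection can also be checked numerically; the sketch below (NumPy, illustrative names and sizes, shown for \(2d=4\)) builds \(\mathcal {G}\) from \(\mathcal {F}\) by block placement and verifies \(\mathcal {F}(x^1,\ldots ,x^{2d}) = \mathcal {G}(y,\ldots ,y)\) with \(y\) the stacked vector; the further symmetrization into \(\mathcal {T}\) does not change the value \(\mathcal {G}(y,\ldots ,y)\).

```python
import numpy as np

def embed_even_order(F, dims):
    """Place F (of shape dims) into one block of a tensor G living on the
    stacked space of dimension N = sum(dims), so that
    F(x^1, ..., x^{2d}) = G(y, ..., y) with y = (x^1; ...; x^{2d})."""
    N = sum(dims)
    offsets = np.cumsum([0] + list(dims))[:-1]
    G = np.zeros([N] * len(dims))
    block = tuple(slice(off, off + n) for off, n in zip(offsets, dims))
    G[block] = F
    return G

dims = (2, 3, 2, 2)                 # 2d = 4 for this illustration
rng = np.random.default_rng(4)
F = rng.standard_normal(dims)
xs = [rng.standard_normal(n) for n in dims]
y = np.concatenate(xs)

G = embed_even_order(F, dims)
lhs = np.einsum('ijkl,i,j,k,l->', F, *xs)
rhs = np.einsum('ijkl,i,j,k,l->', G, y, y, y, y)
assert np.isclose(lhs, rhs)
```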

8.5 Odd degree tensor PCA

The last problem studied in this section is the following odd degree tensor PCA problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d+1}) \\ \mathrm{s.t.}&{} \Vert x\Vert =1, \end{array} \end{aligned}$$
(44)

where \(\mathcal {F}\) is a \((2d+1)\)-th order super-symmetric tensor. As the degree is odd,

$$\begin{aligned} \max _{\Vert x\Vert =1} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d+1})=\max _{\Vert x\Vert =1} |\mathcal {F}(\underbrace{x,\ldots ,x}_{2d+1})|=\max _{\Vert x^i\Vert =1,\;i=1,\ldots ,2d+1}| \mathcal {F}(x^1,\ldots ,x^{2d+1})|, \end{aligned}$$

where the last identity is due to Corollary 4.2 in [10]. The above formula combined with the fact that

$$\begin{aligned} \max _{\Vert x\Vert =1} |\mathcal {F}(\underbrace{x,\ldots ,x}_{2d+1})|&\le \max _{\Vert x\Vert =1,\;\Vert y\Vert =1} |\mathcal {F}(\underbrace{x,\ldots ,x}_{2d},y)|\\&\le \max _{\Vert x^i\Vert =1,\;i=1,\ldots ,2d+1}| \mathcal {F}(x^1,\ldots ,x^{2d+1})| \end{aligned}$$

implies

$$\begin{aligned} \max _{\Vert x\Vert =1} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d+1})=\max _{\Vert x\Vert =1,\;\Vert y\Vert =1} |\mathcal {F}(\underbrace{x,\ldots ,x}_{2d},y)| =\max _{\Vert x\Vert =1,\;\Vert y\Vert =1} \mathcal {F}(\underbrace{x,\ldots ,x}_{2d},y). \end{aligned}$$

By using a similar technique as in Sect. 8.2, problem (44) is equivalent to an even order tensor PCA problem:

$$\begin{aligned} \begin{array}{l@{\quad }l} \max &{} \mathcal {G}(\underbrace{x,\ldots ,x}_{4d}) \\ \mathrm{s.t.}&{} \Vert x\Vert =1, \end{array} \end{aligned}$$

where \(\mathcal {G}\) is super-symmetric with

$$\begin{aligned} \mathcal {G}_{i_1\ldots i_{4d}}= \frac{1}{|\pi (i_1\ldots i_{4d})|} \sum \limits _{j_1\ldots j_{4d}\in \pi (i_1\ldots i_{4d})}\left( \sum \limits _{k=1}^{n}\mathcal {F}_{j_1\ldots j_{2d}k}\,\mathcal {F}_{j_{2d+1}\ldots j_{4d}k}\right) . \end{aligned}$$
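For \(d=1\) (a third order tensor \(\mathcal {F}\)), the construction of \(\mathcal {G}\) can be sketched as follows (NumPy, names ours); the final check confirms the identity \(\mathcal {G}(x,x,x,x) = \Vert \mathcal {F}(x,x,\cdot )\Vert ^2 = \big (\max _{\Vert y\Vert =1}\mathcal {F}(x,x,y)\big )^2\), i.e., the square of the quantity maximized in (44) once \(y\) is optimized out:

```python
import numpy as np
from itertools import permutations

n = 4
rng = np.random.default_rng(5)

# build a random 3rd order super-symmetric tensor F
A = rng.standard_normal((n, n, n))
F = sum(A.transpose(p) for p in permutations(range(3))) / 6.0

# H_{i1 i2 i3 i4} = sum_k F_{i1 i2 k} F_{i3 i4 k}, then symmetrize over all four indices;
# averaging over all 24 axis permutations equals the 1/|pi(...)| average in the text
H = np.einsum('abk,cdk->abcd', F, F)
G = sum(H.transpose(p) for p in permutations(range(4))) / 24.0

# check G(x, x, x, x) = ||F(x, x, .)||^2
x = rng.standard_normal(n); x /= np.linalg.norm(x)
lhs = np.einsum('ijkl,i,j,k,l->', G, x, x, x, x)
rhs = np.linalg.norm(np.einsum('ijk,i,j->k', F, x, x)) ** 2
assert np.isclose(lhs, rhs)
```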

9 Conclusions

Tensor PCA is an emerging area of research with many important applications in image processing, data analysis, statistical learning, and bioinformatics. In this paper we introduced a new matricization scheme, which ensures that the tensor is of rank one (in the sense of the CP rank) if and only if its matricization is a rank-one matrix. This enables one to apply the methodology of compressive sensing and matrix rank minimization, in particular the \(L_1\)-norm and nuclear norm optimization techniques. As it turns out, this approach is very likely to yield a rank-one solution, and thus effectively finds the leading PC by convex optimization, at least for randomly generated problem instances. The resulting convex optimization model is still large in general. We proposed to use a first-order method, the ADMM, which turns out to be very effective in this case. Multiple PCs can be computed sequentially via the so-called “deflation” technique.