1 Introduction

Tensor Candecomp/Parafac (CP) decomposition and tensor rank are basic problems in tensor research with many important applications in data analysis [1, 2], signal processing [3, 4] and other fields [5, 6]. Although the tensor decomposition problem is NP-hard [7], it remains a very active research topic. The decomposition of a tensor into an outer product of vectors and the corresponding notation were first proposed and studied by Frank L. Hitchcock in [8, 9]. The same decomposition was rediscovered in the 1970s [20] in order to extend data analysis models to multidimensional arrays.

Symmetric tensors are a class of tensors with many distinctive decomposition properties and special applications [10,11,12], so many scholars have conducted in-depth research on them. Comon et al. [13] proved that any symmetric tensor can be decomposed into a linear combination of rank-1 tensors. Brachat et al. [14] presented an algebraic method for symmetric tensor decomposition in 2010, based on linear algebra computations with Hankel matrices, by extending Sylvester's theorem to multiple variables. Kolda [15] considered real-valued, nonnegative and sparse decompositions of symmetric tensors and presented a gradient-based numerical optimization algorithm in 2015. In 2017, Nie [16] devised a combination of polynomial optimization and numerical methods for solving complex-valued symmetric tensor decompositions. In 2022, Liu [17] proposed an alternating gradient descent algorithm for symmetric tensor CP decomposition that minimizes a multiconvex optimization problem.

Partially symmetric tensors can be used to solve practical engineering problems, such as elastic material analysis [18] and quantum information theory [19]. Carroll and Chang [20] first considered the CP decomposition of partially symmetric tensors and proposed an alternating method that ignores the symmetry; the decomposition obtained by this method does not preserve the structure unless it satisfies the uniqueness condition [21]. In 2013, Li and Navasca [22] proposed the so-called partial column-wise least squares method to obtain the CP decomposition of third- and fourth-order partially symmetric tensors. Ni and Li [23] proved that any partially symmetric tensor has a partially symmetric CP decomposition and presented a semidefinite relaxation algorithm. However, the semidefinite relaxation becomes impractical for tensors of higher order and dimension, since the number of moments increases sharply with the order and dimension.

Motivated by the above, we consider numerical algorithms for the structure preserving rank-R approximation of partially symmetric tensors. We deduce the gradient formula of the structure preserving rank-R approximation loss function, propose a gradient descent algorithm and a BFGS algorithm, and analyze the convergence of the BFGS algorithm. The BFGS algorithm can be applied to the structure preserving CP decomposition of partially symmetric, completely symmetric and non-symmetric tensors, respectively. Numerical examples show that the BFGS algorithm has good numerical performance.

The paper is structured as follows. Section 2 introduces basic concepts and related properties, including matrix and tensor products, the inner product and Frobenius norm of tensors, and partially symmetric tensors. Section 3 deduces the gradient of the partially symmetric rank-R approximation loss function. In Sect. 4, we propose the BFGS algorithm and discuss its convergence. In Sect. 5, we derive a criterion for deciding whether a partially symmetric tensor admits a positive decomposition. Finally, numerical experiments are given in Sect. 6.

Notation. \({\mathbb {N}}_{+}\), \({\mathbb {R}}\) and \({\mathbb {C}}\) denote the set of positive integers, the real field and the complex field, respectively. An uppercase letter in calligraphic font denotes a tensor, e.g., \({\mathcal {T}}\). An uppercase letter represents a matrix, e.g., U. A boldface lowercase letter represents a vector, e.g., \({\textbf{v}}\). A lowercase letter represents a scalar, e.g., x. The entry with row index i and column index j in a matrix U, i.e., \((U)_{ij}\), is symbolized by \(u_{ij}\) (also \(({\textbf{v}})_i = v_i\) and \(({\mathcal {T}})_{i_1i_2\cdots i_N} = t_{i_1i_2\cdots i_N}\)). For an integer \(s>0\), denote the integer set \(\left[ s\right] :=\{1,2,\cdots ,s\}\). \({\mathfrak {S}}_t\) denotes the set of all permutations of \(\{1,2,\cdots ,m_t\}\), where \(m_t\) is a positive integer and \(t\in [s]\).

2 Preliminaries

2.1 Matrix and tensor products

A tensor can be regarded as a multiway array, which is a generalization of matrices. A tensor is represented with calligraphic script uppercase letters in this paper, e.g., \({\mathcal {T}}\). In this subsection, some basic concepts and products of matrix and tensor will be reviewed. For more details, please refer to the literature [24].

The Kronecker product of matrices \(A \in {{{\mathbb {R}}}^{I\times J}}\) and \( B \in {{{\mathbb {R}}}^{K\times L}}\), denoted as \(A \otimes B\), is defined as

$$\begin{aligned} A \otimes B:= \begin{bmatrix} a_{11} B &{} a_{12} B &{} \cdots &{} a_{1J} B\\ a_{21} B &{} a_{22} B &{} \cdots &{} a_{2J} B\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ a_{I1} B &{} a_{I2} B &{} \cdots &{} a_{IJ} B\\ \end{bmatrix} \in {{\mathbb {R}}^{IK\times JL}}. \end{aligned}$$

The Khatri-Rao product of two matrices \(A\in {{\mathbb {R}}^{I\times K}}\) and \(B\in {{\mathbb {R}}^{J\times K}}\) is defined as

$$\begin{aligned} A \odot B:=[{\textbf{a}}_1\otimes {\textbf{b}}_1\ {\textbf{a}}_2\otimes {{\textbf{b}}_2}\ \cdots \ {\textbf{a}}_K\otimes {{\textbf{b}}}_K] \in {{\mathbb {R}}^{IJ\times K}}. \end{aligned}$$

If \({\textbf{a}}\) and \({\textbf{b}}\) are vectors, then the Khatri-Rao and Kronecker products are identical, i.e., \({\textbf{a}}\otimes {\textbf{b}}={\textbf{a}}\odot {\textbf{b}}\).

The Hadamard product of two same-sized matrices \(A,\ B\in {{\mathbb {R}}^{I\times J}}\), denoted as \(A*B\in {{\mathbb {R}}^{I\times J}}\), is defined as

$$\begin{aligned} {\left( A*B\right) }_{ij}:=a_{ij}b_{ij}. \end{aligned}$$
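The three matrix products above are easy to check numerically. The following NumPy sketch (with arbitrary small matrices; the names are ours) illustrates all three definitions and is not part of the algorithms developed later.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)   # 2 x 3
B = np.arange(8.0).reshape(4, 2)   # 4 x 2

# Kronecker product: result is (2*4) x (3*2)
K = np.kron(A, B)
assert K.shape == (8, 6)

# Khatri-Rao product: column-wise Kronecker, needs equal column numbers
C = np.arange(6.0).reshape(3, 2)   # 3 x 2, same column number as B
KR = np.column_stack([np.kron(C[:, k], B[:, k]) for k in range(2)])
assert KR.shape == (12, 2)         # IJ x K with I = 3, J = 4, K = 2

# Hadamard product: entrywise, needs equal shapes
H = A * A
assert H[1, 2] == A[1, 2] ** 2
```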

We use "\(\circ \)" to represent the outer product of two arbitrary tensors \({\mathcal {X}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\) and \({\mathcal {Y}}\in {\mathbb {R}}^{J_1\times J_2\times \cdots \times J_n}\):

$$\begin{aligned} {\mathcal {X}}\circ {\mathcal {Y}}=\left( x_{i_1i_2\cdots i_m}y_{j_{1}j_{2}\cdots j_{n}}\right) \in {\mathbb {R}}^{I_1\times \cdots \times I_m\times J_1\times \cdots \times J_n}. \end{aligned}$$
(2.1)

Invoking the definition of the tensor outer product in (2.1), we see that for vectors \({{\textbf{x}}_k}\in {\mathbb {R}}^{I_k}\), \(k\in [m]\), their tensor outer product \({\textbf{x}}_1\circ {\textbf{x}}_2\circ \cdots \circ {\textbf{x}}_m\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\) is an mth-order rank-1 tensor such that

$$\begin{aligned} ({{\textbf{x}}_1}\circ {{\textbf{x}}_2}\circ \cdots \circ {{\textbf{x}}_m})_{i_1i_2\cdots i_m}= {\left( {\textbf{x}}_1\right) }_{i_1} {\left( {\textbf{x}}_2\right) }_{i_2}\cdots {\left( {\textbf{x}}_m\right) }_{i_m}. \end{aligned}$$

In particular, if \({\textbf{x}}_1={\textbf{x}}_2=\cdots ={\textbf{x}}_m={\textbf{x}}\in {\mathbb {R}}^n\), then \({\textbf{x}}^{\circ m}\equiv \overbrace{{\textbf{x}}\circ {\textbf{x}}\circ \cdots \circ {\textbf{x}}}^m\) is an mth-order n-dimensional symmetric rank-1 tensor, denoted as \({\textbf{x}}^m\) for simplicity.

Definition 2.1

Let \({\mathcal {T}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\) be a tensor and vectors \({{\textbf{x}}_k}\in {\mathbb {R}}^{I_k}\) for \(k\in [m]\). Define the contraction product of tensor \({\mathcal {T}}\) and a rank-1 tensor as follows

$$\begin{aligned}{} & {} {\mathcal {T}}\cdot \circ _{i=1}^m {\textbf{x}}_i:= \sum _{i_1, \cdots , i_m=1}^{I_1, \cdots , I_m} t_{i_1\cdots i_m} ({\textbf{x}}_1)_{i_1}\cdots ({\textbf{x}}_m)_{i_m}\in {\mathbb {R}},\\{} & {} {\mathcal {T}}\cdot \circ _{i=1, i\not =k}^m {\textbf{x}}_i:= \left( \sum \limits _{i_1, \cdots , i_{k-1}, i_{k+1}, \cdots , i_m=1}^{I_1, \cdots , I_{k-1}, I_{k+1},\cdots , I_m} t_{i_1\cdots i_m} ({\textbf{x}}_1)_{i_1}\cdots ({\textbf{x}}_{k-1})_{i_{k-1}} ({\textbf{x}}_{k+1})_{i_{k+1}}\cdots ({\textbf{x}}_m)_{i_m}\right) \in {\mathbb {R}}^{I_k}. \end{aligned}$$
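Definition 2.1 can be checked directly with np.einsum; the following sketch (a third-order example with arbitrary sizes) computes the full contraction, the mode-k partial contraction, and verifies their consistency.

```python
import numpy as np

T = np.random.rand(2, 3, 4)        # a third-order tensor, m = 3
x1, x2, x3 = np.random.rand(2), np.random.rand(3), np.random.rand(4)

# full contraction T . (x1 o x2 o x3): a scalar
full = np.einsum('ijk,i,j,k->', T, x1, x2, x3)

# contraction over all modes except k = 2: a vector in R^{I_2}
partial = np.einsum('ijk,i,k->j', T, x1, x3)

# contracting the partial result with x2 recovers the scalar
assert np.isclose(full, partial @ x2)
```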

Generally, the inner product of two tensors \({\mathcal {X}},{\mathcal {Y}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\), denoted as \({\mathcal {X}}\cdot {\mathcal {Y}}\) or \(\langle {\mathcal {X}},{\mathcal {Y}}\rangle \), is defined as

$$\begin{aligned} {\mathcal {X}}\cdot {\mathcal {Y}}:=\sum \limits _{i_1=1}^{I_1}\sum \limits _{i_2=1}^{I_2}\cdots \sum \limits _{i_m=1}^{I_m}{x_{i_1i_2\cdots i_m}}{y_{i_1i_2\cdots i_m}}. \end{aligned}$$

The Frobenius norm of a tensor \({\mathcal {X}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\) is defined as

$$\begin{aligned} {\Vert {{\mathcal {X}}}\Vert }_{F}:=\sqrt{\sum \limits _{i_1=1}^{I_1} \sum \limits _{i_2=1}^{I_2}\cdots \sum \limits _{i_m=1}^{I_m}{x_{i_1i_2\cdots i_m}^2}}. \end{aligned}$$

It follows immediately that \(\left<{\mathcal {X}},{\mathcal {X}}\right>={\Vert {{\mathcal {X}}}\Vert }^2_{F}\).

The mode-k matricization of a tensor \({\mathcal {T}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\), denoted as \(T_{\left( k\right) }\), rearranges its elements into a matrix of size \(I_k\times \prod _{i\ne k}I_i\). Element \(\left( i_1,i_2,\cdots ,i_m\right) \) maps to matrix entry \(\left( i_k,j\right) \), where

$$\begin{aligned} j=1+\sum \limits _{t=1,t\not =k}^m{\left( i_t-1\right) }\prod \limits _{n=1,n\not =k}^{t-1}I_n. \end{aligned}$$
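In NumPy this index map is reproduced by moving mode k to the front and reshaping in column-major (Fortran) order, so that the first remaining index varies fastest; a small sketch (the helper name unfold is ours):

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization T_(k) (k is 0-based here)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

T = np.random.rand(3, 4, 5)
i = (2, 1, 4)                       # a sample multi-index (0-based)
# for k = 1, the remaining indices i0, i2 give the 0-based column
# j = i0 + i2 * I0, the 0-based analogue of the formula above
assert unfold(T, 1)[i[1], i[0] + i[2] * 3] == T[i]
```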

Let \({\mathcal {T}}\in {\mathbb {R}}^{I_1\times I_2\times \cdots \times I_m}\) be an mth-order tensor. The CP decomposition of \({\mathcal {T}}\) is

$$\begin{aligned} {\mathcal {T}}=[\![A_1,A_2,\cdots ,A_m]\!]\equiv \sum \limits _{i=1}^{R}{\textbf{a}}_i^{(1)}\circ {\textbf{a}}_i^{(2)}\circ \cdots \circ {\textbf{a}}_i^{(m)}, \end{aligned}$$

where \(A_k=({\textbf{a}}_1^{(k)}\ {\textbf{a}}_2^{(k)}\ \cdots \ {\textbf{a}}_R^{(k)})\in {\mathbb {R}}^{I_k\times R}\), \(k\in [m]\), are called factor matrices. The mode-k matricization of the tensor \({\mathcal {T}}=[\![A_1,A_2,\cdots ,A_m]\!]\), \(A_k\in {\mathbb {R}}^{I_k\times R}\) for \(k\in [m]\), can be written in terms of the factor matrices:

$$\begin{aligned} {} T_{\left( k\right) }=A_k{\left( A_m \odot \cdots \odot {A_{k+1}}\odot {A_{k-1}} \odot \cdots \odot {A_1}\right) }^T. \end{aligned}$$
(2.2)
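Equation (2.2) can be verified numerically: build a tensor from random factor matrices, unfold it, and compare with the Khatri-Rao formula. A sketch for the third-order case (unfold as above, with the reversed ordering \(A_m,\cdots ,A_1\) of (2.2); helper names are ours):

```python
import numpy as np
from functools import reduce

def unfold(T, k):
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order='F')

def khatri_rao(mats):
    """Column-wise Kronecker product of a list of matrices."""
    R = mats[0].shape[1]
    return np.column_stack(
        [reduce(np.kron, [M[:, r] for M in mats]) for r in range(R)])

dims, R = (3, 4, 5), 2
A = [np.random.rand(n, R) for n in dims]
T = np.einsum('ir,jr,kr->ijk', *A)          # T = [[A1, A2, A3]]

k = 1                                        # mode k (0-based)
others = [A[m] for m in reversed(range(3)) if m != k]   # A3 then A1
assert np.allclose(unfold(T, k), A[k] @ khatri_rao(others).T)
```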

2.2 Partially symmetric tensor

Partially symmetric tensors are the main subject of this paper, so their basic concepts and some properties are indispensable. For a detailed introduction, see [23, 25,26,27].

Definition 2.2

[23] Let \({\textbf{m}}=\left( m_1,m_2,\cdots ,m_s\right) ,\ {\textbf{n}}=\left( n_1,n_2,\cdots ,n_s\right) \in {\mathbb {N}}_{+}^s\). An \({\textbf{m}}\)th-order \({\textbf{n}}\)-dimensional tensor \({\mathcal {T}}\) is an array over the field \({\mathbb {F}}\) indexed by integer tuples \(\left( i_1,\cdots ,i_{m_1},j_1,\cdots ,j_{m_2},\cdots ,l_1,\cdots ,l_{m_s}\right) \), i.e.,

$$\begin{aligned} {\mathcal {T}}=\left( {t_{i_1,\cdots ,i_{m_1},j_1,\cdots ,j_{m_2},\cdots ,l_1,\cdots ,l_{m_s}}}\right) \in {{\mathbb {F}}}^{n_1^{m_1}\times n_2^{m_2}\times \cdots \times n_s^{m_s}} \end{aligned}$$

with \(i_1,\cdots ,i_{m_1}\in {\left[ n_1\right] }\), \({j_1,\cdots ,j_{m_2}\in {\left[ n_2\right] }}\), \(\cdots \), \({l_1,\cdots ,l_{m_s}\in {\left[ n_s\right] }}\).

The space of all such tensors over the field \({\mathbb {F}}\) is denoted as \(T\left[ {\textbf{m}}\right] {\mathbb {F}}\left[ {\textbf{n}}\right] \). \({\mathcal {T}}\) is called a square tensor if all dimensions are equal, that is, \(n_1=n_2=\cdots =n_s\).

Definition 2.3

[25] Let \({\textbf{m}}=\left( m_1,m_2,\cdots ,m_s\right) ,\ {\textbf{n}}=\left( n_1,n_2,\cdots ,n_s\right) \in {\mathbb {N}}_{+}^s\). A tensor \({\mathcal {S}}\in {T\left[ {\textbf{m}}\right] {\mathbb {F}}\left[ {\textbf{n}}\right] }\) is called partially symmetric if

$$\begin{aligned} s_{i_1\cdots i_{m_1}j_1\cdots j_{m_2}\cdots l_1\cdots l_{m_s}}= s_{{\sigma _1(i_1)}\cdots {\sigma _1(i_{m_1})}{\sigma _2(j_1)}\cdots {\sigma _2(j_{m_2})}\cdots {\sigma _s(l_1)}\cdots {\sigma _s(l_{m_s})}} \end{aligned}$$

for every permutation \(\sigma _t\in {\mathfrak {S}}_t\), where \({\mathfrak {S}}_t\) is the set of all permutations of \(\left\{ 1,2,\cdots ,m_t\right\} \), \(t\in {\left[ s\right] }\).

The space of all \({\textbf{m}}\)th-order \({\textbf{n}}\)-dimensional partially symmetric tensors over the field \({\mathbb {F}}\) is denoted by \({S[{\textbf{m}}]{\mathbb {F}}[{\textbf{n}}]}\).

Definition 2.4

[23] Let vectors \({\textbf{u}}^{\left( t\right) }_k\in {{\mathbb {F}}}^{n_t}\) for all \(t\in \left[ s\right] \). The result of the tensor outer product \({\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1}\circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s}\) is a rank-1 partially symmetric tensor. A tensor \({\mathcal {S}}\in {S\left[ {\textbf{m}}\right] {\mathbb {F}}\left[ {\textbf{n}}\right] }\) is said to have a partially symmetric CP decomposition if the tensor \({\mathcal {S}}\) has a decomposition as

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^R{\lambda _k}{\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1} \circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s}, \lambda _k\in {\mathbb {F}}, {\textbf{u}}^{\left( t\right) }_k\in {{\mathbb {F}}}^{n_t}, \Vert {{\textbf{u}}^{(t)}_k}\Vert _2=1. \end{aligned}$$

If \(m_1=m_2=\cdots =m_s=1\), then the tensor \({\mathcal {T}}\in {T[{\textbf{m}}]{\mathbb {F}}[{\textbf{n}}]}\) is an sth-order \(n_1\times n_2\times \cdots \times n_s\)-dimensional tensor, and the partially symmetric CP decomposition of \({\mathcal {T}}\) is the usual nonsymmetric CP decomposition. If \(s=1\), write \(m_1=m\) and \(n_1=n\); then \({\mathcal {T}}\in {T[m]{\mathbb {F}}[n]}\) is an mth-order n-dimensional symmetric tensor.

Theorem 2.1

[23] Let \({\textbf{m}}=\left( m_1,m_2,\cdots ,m_s\right) ,\ {\textbf{n}}=\left( n_1,n_2,\cdots ,n_s\right) \in {\mathbb {N}}_{+}^s\), and \({\mathcal {S}}\in S\left[ {\textbf{m}}\right] {\mathbb {F}}\left[ {\textbf{n}}\right] \) be a partially symmetric tensor. Then there exist \(\lambda _k\in {\mathbb {F}}\), \({\textbf{u}}^{(t)}_k\in {{\mathbb {F}}}^{n_t}\), \(\Vert {{\textbf{u}}^{(t)}_k}\Vert _2=1\) for \(k\in [R]\) and \(t\in [s]\) such that

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^R{\lambda _k}{\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1} \circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s}. \end{aligned}$$
(2.3)

That is, any partially symmetric tensor (over any field) has a partially symmetric CP decomposition.

If \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) has a decomposition as (2.3) with \(\lambda _k\ge 0\) for all \(k\in [R]\), then we say that \({\mathcal {S}}\) has a positive partially symmetric CP decomposition. According to Theorem 2.1, we have the following corollary.

Corollary 2.1

If there is \(i\in [s]\) such that \(m_i\) is an odd number, then any nonzero tensor \({\mathcal {S}}\in {S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]}\) has a positive partially symmetric CP decomposition. However, \({\mathcal {S}}\) may not have a positive partially symmetric CP decomposition if \(m_1, \cdots , m_s\) are all even.

Proof

If there is \(i\in \left[ s\right] \) such that \(m_i\) is an odd number, without loss of generality, let \(m_1\) be an odd number. Then the \(m_1\)th real root of \(\lambda _k\) always exists for \(k\in [R]\), so we can rewrite the term \(\lambda _k\left( {\textbf{u}}_k^{(1)}\right) ^{m_1}\) as

$$\begin{aligned} \lambda _k\left( {\textbf{u}}_k^{(1)}\right) ^{m_1}=\left( \tilde{{\textbf{u}}}_k^{(1)}\right) ^{m_1},\quad \mathrm{{where}}\ \tilde{{\textbf{u}}}_k^{(1)}=\root m_1 \of {\lambda _k}\,{\textbf{u}}_k^{(1)}. \end{aligned}$$

Then,

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^R {\left( \tilde{{\textbf{u}}}^{(1)}_k\right) }^{m_1} \circ {\left( {\textbf{u}}^{(2)}_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{(s)}_k\right) }^{m_s}. \end{aligned}$$

That is, \({\mathcal {S}}\in {S\left[ {{\textbf{m}}}\right] {\mathbb {R}}\left[ {{\textbf{n}}}\right] }\) has a positive partially symmetric CP decomposition. However, if all orders are even, then the \(m_t\)th real root of \(\lambda _k\) does not exist when \(\lambda _k<0\), for \(t\in [s]\), so the scalar cannot be absorbed. In this case, \({\mathcal {S}}\) may not have a positive partially symmetric CP decomposition. This completes the proof. \(\square \)

If \({\mathcal {S}}\) has a positive partially symmetric CP decomposition as in (2.3), then we can write \(\lambda _k \left( {\textbf{u}}^{(t)}_k\right) ^{m_t} = \left( (\lambda _k)^{1/m_t} {\textbf{u}}^{(t)}_k\right) ^{m_t}\) for some \(t\in [s]\). In this case, the decomposition (2.3) can be rewritten as follows

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^R {\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1} \circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s}. \end{aligned}$$
(2.4)

Let \(U_t=({\textbf{u}}_1^{(t)},{\textbf{u}}_2^{(t)},\cdots ,{\textbf{u}}_R^{(t)})\in {\mathbb {R}}^{n_t\times R}\) for \(t\in [s]\). We call \(\{U_1,U_2,\cdots ,U_s\}\) a tuple of factor matrices of the partially symmetric tensor \({\mathcal {S}}\). The partially symmetric CP decomposition (2.4) can also be written as

$$\begin{aligned} {\mathcal {S}}= & {} [\![{\overbrace{U_1,\cdots ,U_1}^{m_1}}, {\overbrace{U_2,\cdots ,U_2}^{m_2}},\cdots ,{\overbrace{U_s,\cdots ,U_s}^{m_s}}]\!]\nonumber \\\equiv & {} [\![U_1^{\times m_1},U_2^{\times m_2},\cdots ,U_s^{\times m_s}]\!]. \end{aligned}$$
(2.5)
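The form (2.5) is convenient for computation: every factor matrix \(U_t\) simply occupies \(m_t\) consecutive modes. A small sketch (names ours) that assembles \([\![U_1^{\times m_1},\cdots ,U_s^{\times m_s}]\!]\) with np.einsum and checks the partial symmetry:

```python
import numpy as np

def ps_cp(Us, ms):
    """Assemble [[U1^{x m1}, ..., Us^{x ms}]] from factor matrices Us[t]."""
    letters = iter('abcdefghijklmnop')       # one index letter per mode
    subs, operands = [], []
    for U, m in zip(Us, ms):
        for _ in range(m):                   # U_t is repeated in m_t modes
            subs.append(next(letters))
            operands.append(U)
    spec = ','.join(c + 'z' for c in subs) + '->' + ''.join(subs)
    return np.einsum(spec, *operands)        # 'z' sums over the R rank-1 terms

U1, U2 = np.random.rand(3, 2), np.random.rand(4, 2)
S = ps_cp([U1, U2], ms=(2, 1))               # m = (2, 1): shape (3, 3, 4)
assert np.allclose(S, S.transpose(1, 0, 2))  # symmetric in the first block
```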

3 Positive partially symmetric rank-R approximation

Now, we study the positive partially symmetric rank-R approximation of partially symmetric tensors. Let s, R be two positive integers and \({\textbf{m}}=\left( m_1,m_2,\cdots ,m_s\right) ,{\textbf{n}}=\left( n_1,n_2,\cdots ,n_s\right) \in {\mathbb {N}}_{+}^{s}\). Given a real partially symmetric tensor \({\mathcal {S}}\in S\left[ {\textbf{m}}\right] {\mathbb {R}}\left[ {\textbf{n}}\right] \), find positive scalars \(\lambda _k > 0\) and unit-norm vectors \({\textbf{u}}_k^{(t)}\in {\mathbb {R}}^{n_t}\) for \(t\in [s]\) and \(k\in [R]\) such that the rank-R tensor

$$\begin{aligned} \hat{{\mathcal {S}}}:=\sum \limits _{k=1}^R{\lambda _k}{\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1} \circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots } \circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s} \end{aligned}$$

minimizes the function

$$\begin{aligned} f(\hat{{\mathcal {S}}})=\frac{1}{2}\Vert {{\mathcal {S}}}-\hat{{\mathcal {S}}}\Vert _{F}^2. \end{aligned}$$

The positive partially symmetric rank-R approximation problem of the tensor \({\mathcal {S}}\) is equivalent to solving the following optimization problem

$$\begin{aligned}&\min \frac{1}{2}\left\| {{\mathcal {S}}-\sum \limits _{k=1}^R{\lambda _k}{\left( {\textbf{u}}^{\left( 1\right) }_k\right) }^{m_1}\circ {\left( {\textbf{u}}^{\left( 2\right) }_k\right) }^{m_2}\circ {\cdots }\circ {\left( {\textbf{u}}^{\left( s\right) }_k\right) }^{m_s} }\right\| ^2_F\\&\quad \mathrm{s.t.}\ \lambda _k>0,\ {\textbf{u}}^{(t)}_k\in {{\mathbb {R}}}^{n_t},\ \Vert {{\textbf{u}}^{(t)}_k}\Vert _2=1,\ k\in [R],\ t\in [s]. \end{aligned}$$
(P1)

Since \({\lambda }_k>0\) for all \(k\in [R]\), we can rewrite \(\hat{{\mathcal {S}}}\) as

$$\begin{aligned} \hat{{\mathcal {S}}}=\sum \limits _{k=1}^R{\left( \tilde{{\textbf{u}}}^{(1)}_k\right) }^{m_1} \circ {\left( \tilde{{\textbf{u}}}^{(2)}_k\right) }^{m_2}\circ {\cdots } \circ {\left( \tilde{{\textbf{u}}}^{(s)}_k\right) }^{m_s}. \end{aligned}$$
(3.1)

Then \({{\textbf{u}}}^{\left( t\right) }_k={\tilde{{\textbf{u}}}^{\left( t\right) }_k}/{\Vert {\tilde{{\textbf{u}}}^{(t)}_k}\Vert _2}\) and \({\lambda }_k=\prod \nolimits _{t=1}^s{{\Vert {\tilde{{\textbf{u}}}^{(t)}_k}\Vert }_2^{m_t}}\), for all \(k\in [R]\) and \(t\in [s]\).

The right-hand side of (3.1) may also be written in terms of a tuple of factor matrices

$$\begin{aligned} \hat{{\mathcal {S}}}=[\![U_1^{\times m_1},U_2^{\times m_2},\cdots ,U_s^{\times m_s}]\!], \end{aligned}$$

where \(U_t=\left( \tilde{{\textbf{u}}}_1^{(t)}, \tilde{{\textbf{u}}}_2^{(t)}, \cdots , \tilde{{\textbf{u}}}_R^{(t)}\right) \in {\mathbb {R}}^{n_t\times R}\) is a matrix for all \(t\in [s]\).

Let \({\textbf{U}} \in {\mathbb {R}}^{(n_1+\cdots +n_s)\times R}\) be the block-matrix vector

$$\begin{aligned} \textbf{U}= \begin{pmatrix} U_1 \\ U_2 \\ \vdots \\ U_s \end{pmatrix}. \end{aligned}$$

Then the constrained optimization problem (P1) is transformed into the unconstrained optimization problem

$$\begin{aligned} \min \limits _\mathbf{{U}} f_{{\mathcal {S}}}(\textbf{U})=f_{{\mathcal {S}}}(U_1, U_2, \cdots , U_s) :=\frac{1}{2}\left\| {\mathcal {S}}- [\![U_1^{\times m_1},U_2^{\times m_2},\cdots ,U_s^{\times m_s}]\!] \right\| _F^{2}. \end{aligned}$$
(3.2)

3.1 Gradient calculation of \(f_{{\mathcal {S}}}({{\textbf{U}}})\)

Lemma 3.1

Let s and R be two positive integers, \({\textbf{m}}=(m_1,m_2,\cdots ,m_s),{\textbf{n}}=(n_1,n_2,\cdots ,n_s)\in {\mathbb {N}}_{+}^s\) and \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be a partially symmetric tensor. Let

$$\begin{aligned} \hat{{\mathcal {S}}}=\sum _{k=1}^R \left( {\textbf{u}}^{(1)}_k\right) ^{m_1} \circ \left( {\textbf{u}}^{(2)}_k\right) ^{m_2}\circ {\cdots } \circ \left( {\textbf{u}}^{(s)}_k\right) ^{m_s},\ {\textbf{u}}^{(t)}_k\in {\mathbb {R}}^{n_t}. \end{aligned}$$

Then

$$\begin{aligned} \frac{\partial {\langle {\mathcal {S}},\hat{{\mathcal {S}}}\rangle }}{\partial {{\textbf{u}}_k^{(t)}}}= m_t\ {\mathcal {S}}\cdot \left( {\left( {\textbf{u}}^{(1)}_k\right) }^{m_1}\circ \cdots \circ {\left( {\textbf{u}}^{(t)}_k\right) }^{{m_t}-1}\circ \cdots \circ {\left( {\textbf{u}}^{(s)}_k\right) }^{m_s}\right) \in {{{\mathbb {R}}}^{n_t}}. \end{aligned}$$

Proof

Let \({\mathcal {A}}\in S[m]{\mathbb {R}}[n]\) be an mth-order symmetric tensor. Then,

$$\begin{aligned} \frac{\partial {\mathcal {A}}\cdot {\textbf{x}}^{m}}{\partial {\textbf{x}}} = m{\mathcal {A}}\cdot {\textbf{x}}^{m-1}. \end{aligned}$$
(3.3)

Define a function \(F(\hat{{\mathcal {S}}})=\langle {\mathcal {S}},\hat{{\mathcal {S}}}\rangle \). Then,

$$\begin{aligned} F(\hat{{\mathcal {S}}})= & {} \langle {\mathcal {S}},\sum _{k=1}^R \left( {\textbf{u}}^{(1)}_k\right) ^{m_1} \circ \left( {\textbf{u}}^{(2)}_k\right) ^{m_2}\circ {\cdots } \circ \left( {\textbf{u}}^{(s)}_k\right) ^{m_s}\rangle \\= & {} \sum _{k=1}^R {\mathcal {S}}\cdot \left( {\textbf{u}}^{(1)}_k\right) ^{m_1} \circ \left( {\textbf{u}}^{(2)}_k\right) ^{m_2}\circ {\cdots } \circ \left( {\textbf{u}}^{(s)}_k\right) ^{m_s}. \end{aligned}$$

According to the formula (3.3), we have that

$$\begin{aligned} \frac{\partial {F(\hat{{\mathcal {S}}})}}{\partial {{\textbf{u}}_k^{\left( t\right) }}}= m_t\ {\mathcal {S}}\cdot \left( \left( {\textbf{u}}^{(1)}_k\right) ^{m_1}\circ \cdots \circ \left( {\textbf{u}}^{(t)}_k\right) ^{{m_t}-1}\circ \cdots \circ \left( {\textbf{u}}^{(s)}_k\right) ^{m_s}\right) . \end{aligned}$$

This completes the proof. \(\square \)

Theorem 3.1

Let \({\textbf{m}}=(m_1,m_2,\cdots ,m_s),\ {\textbf{n}}=(n_1,n_2,\cdots ,n_s) \in {\mathbb {N}}_{+}^s\) and \(M_t:=m_1+\cdots +m_t\) for \(t\in [s]\). Let \({\mathcal {S}}\in {S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]}\) be a given partially symmetric tensor, \(U_t=({\textbf{u}}_1^{(t)},{\textbf{u}}_2^{(t)},\cdots , {\textbf{u}}_R^{(t)})\in {{\mathbb {R}}}^{{n_t}\times R}\) be a matrix for all \(t\in [s]\), and \({\textbf{U}}=(U_1^T, U_2^T, \cdots , U_s^T)^T\) be a block-matrix vector. Then the first-order partial derivative of the function \(f_{\mathcal {S}}({\textbf{U}})\) defined in (3.2) with respect to \(U_t\) is

$$\begin{aligned} \frac{\partial f_{\mathcal {S}}({\textbf{U}})}{\partial {U_t}}=-m_t(S_{(M_t)}W_t-U_tW_t^T W_t), \end{aligned}$$

where \(S_{(M_t)}\) is the mode-\(M_t\) matricization of the tensor \({\mathcal {S}}\) and

$$\begin{aligned} W_t={U_s}^{\odot {m_s}}\odot \cdots \odot {U_t}^{\odot ({m_{t}}-1)}\odot \cdots \odot {U_1}^{\odot {m_1}}. \end{aligned}$$
(3.4)

Proof

According to the definition of the derivative, the first-order partial derivative of the function with respect to \(U_t\) is

$$\begin{aligned} \frac{\partial {f_{\mathcal {S}}(\textbf{U})}}{\partial {U_t}}=\left( {\frac{\partial {f_{\mathcal {S}}({\textbf{U}})}}{\partial {\textbf{u}}_1^{(t)}}}, {\frac{\partial {f_{\mathcal {S}}(\textbf{U})}}{\partial {{\textbf{u}}_2^{(t)}}}},\cdots , {\frac{\partial {f_{\mathcal {S}}(\textbf{U})}}{\partial {{\textbf{u}}_R^{(t)}}}}\right) . \end{aligned}$$

Let \(\hat{{\mathcal {S}}}=[\![U_1^{\times m_1},U_2^{\times m_2},\cdots ,U_s^{\times m_s}]\!].\) Then

$$\begin{aligned} {\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert }^2_F=\langle {\mathcal {S}}, {\mathcal {S}}\rangle -2\langle {\mathcal {S}},\hat{{\mathcal {S}}}\rangle +\langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}\rangle ,\ \frac{\partial f_{\mathcal {S}}({\textbf{U}})}{\partial {\textbf{u}}_k^{(t)} }= -\frac{\partial \langle {\mathcal {S}},\hat{{\mathcal {S}}} \rangle }{\partial {{\textbf{u}}_k^{(t)}} }+ \frac{\partial \langle \hat{{\mathcal {S}}}, \hat{{\mathcal {S}}}\rangle }{2\partial {{\textbf{u}}_k^{(t)}} }. \end{aligned}$$

From Lemma 3.1, for every \(k\in [R]\), it follows that

$$\begin{aligned} \frac{\partial f_{\mathcal {S}}(\textbf{U}) }{\partial {\textbf{u}}_k^{(t)} }= -m_t\ ({\mathcal {S}}-\hat{{\mathcal {S}}})\cdot \left( \left( {\textbf{u}}^{(1)}_k\right) ^{m_1}\circ \cdots \circ \left( {\textbf{u}}^{(t)}_k\right) ^{{m_t}-1}\circ \cdots \circ \left( {\textbf{u}}^{(s)}_k\right) ^{m_s}\right) . \end{aligned}$$

According to the definition of the mode-k matricization, we have that

$$\begin{aligned} \frac{\partial f_{\mathcal {S}}({\textbf{U}})}{\partial {\textbf{u}}_k^{(t)} } =-m_t{({S-\hat{S}})}_{(M_t)} \left( {\left( {\textbf{u}}^{(s)}_k\right) }^{\otimes {m_s}}\otimes \cdots \otimes {\left( {\textbf{u}}^{(t)}_k\right) }^{\otimes ({m_t}-1)}\otimes \cdots \otimes {\left( {\textbf{u}}^{(1)}_k\right) }^{\otimes {m_1}}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \frac{\partial {f_{\mathcal {S}}({\textbf{U}})}}{\partial {U_t}} =-m_t(S-\hat{S})_{(M_t)} {\left( {{U_s}}^{\odot {m_s}}\odot \cdots \odot {{U_t}}^{\odot ({m_{t}}-1)}\odot \cdots \odot {{U_1}}^{\odot {m_1}}\right) }. \end{aligned}$$

Substituting (3.4) into the above equation, we get

$$\begin{aligned} \frac{\partial f_{\mathcal {S}}({\textbf{U}})}{\partial {U_t}}=-m_t({S}_{(M_t)}W_t-U_tW_t^T W_t). \end{aligned}$$
(3.5)

This completes the proof. \(\square \)

Computing the gradient directly from formula (3.5) is computationally expensive; the cost can be reduced as follows. The first term \(S_{(M_t)}W_t\) can be obtained by MTTKRP, a routine in the Tensor Toolbox [29] that efficiently calculates a matricized tensor times a Khatri-Rao product. The second term can also avoid forming \(W_t\), since \(W_t^T W_t\) is given by

$$\begin{aligned} W_t^T W_t=(U_s^T U_s)^{*m_s}*\cdots *(U_t^T U_t)^{*({m_t}-1)}*\cdots *(U_1^T U_1)^{*m_1}. \end{aligned}$$
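This identity replaces the very tall matrix \(W_t\) by Hadamard products of small \(R\times R\) Gram matrices. A sketch of the trick, checked against an explicitly formed \(W_t\) for \(s=2\), \({\textbf{m}}=(2,2)\) (helper names ours):

```python
import numpy as np

def gram_WtW(Us, ms, t):
    """W_t^T W_t as a Hadamard product of R x R Gram matrices (t is 0-based)."""
    R = Us[0].shape[1]
    G = np.ones((R, R))
    for j, (U, m) in enumerate(zip(Us, ms)):
        e = m - 1 if j == t else m          # block t contributes m_t - 1 copies
        G *= (U.T @ U) ** e                 # elementwise (Hadamard) power
    return G

U1, U2 = np.random.rand(3, 2), np.random.rand(4, 2)
# explicit W_0 = U2 (Khatri-Rao) U2 (Khatri-Rao) U1, for m = (2, 2), t = 0
W0 = np.column_stack(
    [np.kron(np.kron(U2[:, r], U2[:, r]), U1[:, r]) for r in range(2)])
assert np.allclose(gram_WtW([U1, U2], (2, 2), 0), W0.T @ W0)
```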

The following algorithm calculates the gradient of \(f_{\mathcal {S}}({\textbf{U}})\) defined in (3.2) at a given point.

[Algorithm 1: computation of the gradient of \(f_{\mathcal {S}}({\textbf{U}})\) at a given point]
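A compact NumPy sketch of this gradient computation follows. It implements formula (3.5) for general \({\textbf{m}}\), performing the MTTKRP step with np.einsum instead of the Tensor Toolbox routine; all helper names are ours, and the sketch assumes at most 16 modes (one index letter per mode plus the rank index).

```python
import numpy as np

def grad_fS(S, Us, ms):
    """Gradient of f_S(U): block t is -m_t (S_(M_t) W_t - U_t W_t^T W_t)."""
    R = Us[0].shape[1]
    blocks = [t for t, m in enumerate(ms) for _ in range(m)]  # block of each mode
    letters = 'abcdefghijklmnop'
    grads = []
    for t in range(len(ms)):
        keep = max(i for i, b in enumerate(blocks) if b == t)  # mode M_t (0-based)
        # MTTKRP S_(M_t) W_t: contract S with every factor copy except mode M_t
        spec_in, ops = [letters[:S.ndim]], [S]
        for i, b in enumerate(blocks):
            if i != keep:
                spec_in.append(letters[i] + 'z')
                ops.append(Us[b])
        mttkrp = np.einsum(','.join(spec_in) + '->' + letters[keep] + 'z', *ops)
        # W_t^T W_t via Hadamard products of Gram matrices (previous sketch)
        G = np.ones((R, R))
        for j, (U, m) in enumerate(zip(Us, ms)):
            G *= (U.T @ U) ** (m - 1 if j == t else m)
        grads.append(-ms[t] * (mttkrp - Us[t] @ G))
    return grads

# sanity check: at an exact decomposition the gradient vanishes
U1, U2 = np.random.rand(3, 2), np.random.rand(4, 2)
S = np.einsum('az,bz,cz->abc', U1, U1, U2)   # m = (2, 1)
assert all(np.allclose(g, 0) for g in grad_fS(S, [U1, U2], (2, 1)))
```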

3.2 Gradient descent method

We introduce two line search methods. The Armijo inexact search rule adopts a backtracking strategy to obtain the step size. In other words, the step size is picked as \({\alpha }_k={\beta }{\gamma }^{m_k}\), where \(m_k\) is the smallest nonnegative integer satisfying the following condition

$$\begin{aligned} f\left( {{\textbf{u}}}^k+{\beta }{\gamma }^{m_k}{{\textbf{d}}}^k\right) \le {f\left( {{\textbf{u}}}^k\right) }+\sigma \beta {\gamma }^{m_k}\langle {{\textbf{g}}}^k,{{\textbf{d}}}^k\rangle , \end{aligned}$$
(3.6)

where \(\beta >0,\ \sigma ,\ \gamma \in \left( 0,1\right) \).

Another inexact rule for choosing a step size \(\alpha _k>0\) is the Armijo-Wolfe conditions [28],

$$\begin{aligned} \left\{ \begin{array}{ll} f\left( {{\textbf{u}}}^k+\alpha {{\textbf{d}}}^k\right) \le {f\left( {{\textbf{u}}}^k\right) }+c_1\alpha \langle {{\textbf{g}}}^k,{{\textbf{d}}}^k\rangle \\ \langle \nabla f\left( {{\textbf{u}}}^k+\alpha {{\textbf{d}}}^k\right) ,{\textbf{d}}^k\rangle \ge c_2\langle {\textbf{g}}^k,{\textbf{d}}^k\rangle , \end{array}\right. \end{aligned}$$
(3.7)

where \(0<c_1<c_2<1\) are the parameters of the line search algorithm; the above two conditions are called the sufficient decrease condition and the curvature condition, respectively.

Therefore, the iterative scheme is

$$\begin{aligned} {{\textbf{u}}}^{k+1}={{\textbf{u}}}^k+{\alpha }_k{{\textbf{d}}}^k, \end{aligned}$$

where \(\alpha _k\) is obtained from the above Armijo or Armijo-Wolfe rule and \({{\textbf{d}}}^k\) is the search direction of the k-th iteration. If \({{\textbf{d}}}^k=-\nabla f({\textbf{u}}^k)\), the method is called the gradient descent method.

The optimization algorithm terminates when the norm of the gradient is less than a given tolerance, i.e.,

$$\begin{aligned} \Vert {\textbf{g}}^k\Vert _{F}<\epsilon . \end{aligned}$$

For the partially symmetric rank-R approximation problem of a partially symmetric tensor, the iterative formula of the gradient descent method is

$$\begin{aligned} {{\textbf{U}}}^{k+1}={{\textbf{U}}}^k+{\alpha }_k{{\textbf{G}}}^k, \end{aligned}$$

where \({\textbf{G}}^k=-\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\), \(\alpha _k\) satisfies the Armijo or Armijo-Wolfe rule.

The following is the gradient descent algorithm with the Armijo rule (3.6) for the partially symmetric rank-R approximation of a partially symmetric tensor.

[Algorithm 2: gradient descent with the Armijo rule for the partially symmetric rank-R approximation]
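A minimal sketch of this gradient descent loop is given below. It reuses the grad_fS helper sketched after Algorithm 1, adds a loss function for (3.2), and uses the backtracking Armijo rule (3.6) with \({\textbf{d}}^k=-{\textbf{G}}^k\), so that \(\langle {\textbf{g}}^k,{\textbf{d}}^k\rangle =-\Vert {\textbf{G}}^k\Vert _F^2\); the default parameter values are our assumptions.

```python
import numpy as np

def f_S(S, Us, ms):
    """Loss (3.2): 0.5 * || S - [[U1^{x m1}, ..., Us^{x ms}]] ||_F^2."""
    blocks = [t for t, m in enumerate(ms) for _ in range(m)]
    letters = 'abcdefghijklmnop'
    spec = (','.join(letters[i] + 'z' for i in range(len(blocks)))
            + '->' + letters[:len(blocks)])
    Shat = np.einsum(spec, *[Us[b] for b in blocks])
    return 0.5 * np.linalg.norm(S - Shat) ** 2

def gradient_descent(S, Us, ms, beta=1.0, gamma=0.5, sigma=1e-4,
                     eps=1e-8, max_iter=500):
    """Algorithm 2 sketch: gradient descent with the Armijo rule (3.6)."""
    for _ in range(max_iter):
        G = grad_fS(S, Us, ms)               # gradient sketch from Sect. 3.1
        gnorm2 = sum(np.sum(g * g) for g in G)
        if np.sqrt(gnorm2) < eps:            # termination: ||g^k||_F < eps
            break
        fk, alpha = f_S(S, Us, ms), beta
        while f_S(S, [U - alpha * g for U, g in zip(Us, G)], ms) \
                > fk - sigma * alpha * gnorm2:
            alpha *= gamma                   # backtrack until (3.6) holds
        Us = [U - alpha * g for U, g in zip(Us, G)]
    return Us
```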

4 BFGS method

We now review the standard BFGS method with Armijo-Wolfe line search for solving the optimization problem \(\min \nolimits _{{\textbf{x}}\in {\mathbb {R}}^n}f({\textbf{x}})\). For more details, please refer to [28]. The iterative formula of the BFGS method is

$$\begin{aligned} {\textbf{x}}^{k+1} = {\textbf{x}}^{k}-\alpha _k(B^k)^{-1}{\textbf{g}}^k\ \textrm{or}\ {\textbf{x}}^{k+1} = {\textbf{x}}^{k}-\alpha _k H^k{\textbf{g}}^k, \end{aligned}$$

where \({\textbf{g}}^k=\nabla f({\textbf{x}}^k)\) and \(\alpha _k\) is a step size satisfying the Armijo-Wolfe rule. The initial point \({\textbf{x}}^0\) of the iterative algorithm is usually chosen randomly; \(H^0\) and \(B^0\) must be positive definite matrices, usually the identity matrix.

Let \({\textbf{y}}^k={\textbf{g}}^{k+1}-{\textbf{g}}^{k}\) and \({\textbf{s}}^k={\textbf{x}}^{k+1}-{\textbf{x}}^{k}\). The update formula of \(B^k\) is

$$\begin{aligned} B^{k+1} = B^{k}+\frac{{\textbf{y}}^k{({\textbf{y}}^k)}^T}{({\textbf{y}}^k)^T{\textbf{s}}^k} -\frac{B^{k}{\textbf{s}}^k ({\textbf{s}}^k)^T B^{k}}{({\textbf{s}}^k)^T B^k {\textbf{s}}^k}. \end{aligned}$$

The update formula of \(H^k\) is

$$\begin{aligned} H^{k+1}=\left( I-\frac{{{\textbf{s}}}^k({{\textbf{y}}}^k)^T}{({\textbf{y}}^k)^T{{\textbf{s}}}^k}\right) H^{k} {\left( I-\frac{{\textbf{y}}^k ({\textbf{s}}^k)^T}{({\textbf{y}}^k)^T{{\textbf{s}}}^k}\right) }+\frac{{{\textbf{s}}}^k({{\textbf{s}}}^k)^T}{({\textbf{y}}^k)^T {\textbf{s}}^k}. \end{aligned}$$

4.1 The BFGS method for partially symmetric rank-R approximation

Following the standard BFGS method, we now give a tensor BFGS method to solve the partially symmetric rank-R approximation of a partially symmetric tensor. Let \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be a partially symmetric tensor, \({\textbf{U}}=(U^T_1,\cdots ,U_s^T)^T\) be a block-matrix vector with \(U_t\in {\mathbb {R}}^{n_t\times R}\) for \(t\in [s]\), and \(n:=n_1+\cdots +n_s\). The function \(f_{\mathcal {S}}({\textbf{U}})\) is defined as (3.2); its gradient \(\nabla f_{{\mathcal {S}}}({\textbf{U}})\) is

$$\begin{aligned} \nabla f_{{\mathcal {S}}}({\textbf{U}})= \begin{pmatrix} \frac{\partial f_{{\mathcal {S}}}({\textbf{U}})}{\partial U_1}\\ \frac{\partial f_{{\mathcal {S}}}({\textbf{U}})}{\partial U_2}\\ \vdots \\ \frac{\partial f_{{\mathcal {S}}}({\textbf{U}})}{\partial U_s} \end{pmatrix}\in {\mathbb {R}}^{n\times R}; \end{aligned}$$

and the Hessian is

$$\begin{aligned} {\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})=\left( \frac{\partial ^2 f_{{\mathcal {S}}}({\textbf{U}})}{\partial u_{ij}\partial u_{kl}}\right) \in {\mathbb {R}}^{n\times R\times n\times R}, \end{aligned}$$

which is a fourth-order tensor.

In order to obtain the BFGS iterative formula for tensors, we define a multiplication of two fourth-order tensors and a product between a fourth-order tensor and a matrix. For any two tensors \({\mathcal {A}},\ {\mathcal {B}}\in {\mathbb {R}}^{K\times L\times K\times L}\), define \({\mathcal {C}}={\mathcal {A}}\oslash {\mathcal {B}}\in {\mathbb {R}}^{K\times L\times K\times L}\) as

$$\begin{aligned} c_{i_1i_2i_3i_4}=\sum _{j_1,j_2=1}^{K,L}a_{i_1i_2j_1j_2}b_{j_1j_2i_3i_4}. \end{aligned}$$
(4.1)

For any \({\mathcal {T}}\in {\mathbb {R}}^{K\times L\times K\times L}\) and \(A\in {\mathbb {R}}^{K\times L}\), define \(B={\mathcal {T}} A \in {\mathbb {R}}^{K\times L}\) as

$$\begin{aligned} b_{i_1i_2}=\sum _{i_3,i_4=1}^{K,L} t_{i_1i_2i_3i_4}a_{i_3i_4}. \end{aligned}$$
(4.2)

In the sense of the multiplication of two fourth-order tensors, we define the fourth-order identity tensor \({\mathcal {I}}\in {\mathbb {R}}^{K\times L\times K\times L}\) as

$$\begin{aligned} {({\mathcal {I}})}_{k_1l_1k_2l_2}={\left\{ \begin{array}{ll}1&{} \textrm{if }\ k_1=k_2,\ l_1=l_2,\\ 0&{}\mathrm{otherwise.}\end{array}\right. } \end{aligned}$$
(4.3)

According to the definitions (4.3) and (4.2), for any matrix \(A\in {\mathbb {R}}^{K\times L}\), we have \({\mathcal {I}}A=A\). A tensor \({\mathcal {A}}\in {\mathbb {R}}^{K\times L\times K\times L}\) is invertible if there is a same-sized tensor \({\mathcal {B}}\) such that \({\mathcal {A}}\oslash {\mathcal {B}}={\mathcal {I}}\) and \({\mathcal {B}}\oslash {\mathcal {A}}={\mathcal {I}}\); if so, \({\mathcal {B}}\) is called the inverse of \({\mathcal {A}}\) and is denoted by \({\mathcal {A}}^{-1}\).
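After flattening the index pairs \((k,l)\), the products (4.1) and (4.2) are ordinary matrix-matrix and matrix-vector multiplication, and \({\mathcal {I}}\) becomes the \(KL\times KL\) identity matrix. The following sketch verifies this correspondence, which is useful both for intuition and for implementing the \(\oslash \)-updates below:

```python
import numpy as np

K, L = 3, 2
A4 = np.random.rand(K, L, K, L)
B4 = np.random.rand(K, L, K, L)
X = np.random.rand(K, L)

C4 = np.einsum('ijab,abkl->ijkl', A4, B4)            # A oslash B, eq. (4.1)
Y = np.einsum('ijkl,kl->ij', A4, X)                  # T X, eq. (4.2)
I4 = np.einsum('ik,jl->ijkl', np.eye(K), np.eye(L))  # identity tensor (4.3)
assert np.allclose(np.einsum('ijkl,kl->ij', I4, X), X)

# the same operations after grouping (i, j) and (k, l) into single indices
mat = lambda T: T.reshape(K * L, K * L)
assert np.allclose(mat(C4), mat(A4) @ mat(B4))
assert np.allclose(Y.ravel(), mat(A4) @ X.ravel())
```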

Lemma 4.1

Let fourth-order tensor \({\mathcal {T}}\in {\mathbb {R}}^{K\times L\times K\times L}\) be invertible and its elements satisfy \({\mathcal {T}}_{k_1l_1k_2l_2}={\mathcal {T}}_{k_2l_2k_1l_1}\), and \(A, B, C\in {\mathbb {R}}^{K\times L}\). Then

(1) \(\langle {\mathcal {T}}^{-1}A,{\mathcal {T}}B\rangle =\langle A,B\rangle .\)

(2) \((A\circ B)C=A\langle B,C\rangle .\)

Proof

According to the multiplication defined as (4.2),

$$\begin{aligned} \langle {\mathcal {T}}^{-1}A,{\mathcal {T}}B\rangle= & {} \sum _{i,j} \left( \sum _{k_1,l_1} {{\mathcal {T}}^{-1}}_{ijk_1l_1}A_{k_1l_1}\right) \left( \sum _{k_2,l_2} {\mathcal {T}}_{ijk_2l_2}B_{k_2l_2}\right) \\= & {} \sum _{k_1,l_1}\sum _{k_2,l_2} A_{k_1l_1}B_{k_2l_2}\sum _{i,j}{{\mathcal {T}}^{-1}}_{ijk_1l_1}{\mathcal {T}}_{ijk_2l_2}. \end{aligned}$$

Since \({\mathcal {T}}_{k_1l_1k_2l_2}={\mathcal {T}}_{k_2l_2k_1l_1}\), we have that

$$\begin{aligned} \langle {\mathcal {T}}^{-1}A,{\mathcal {T}}B\rangle= & {} \sum _{k_2,l_2}\sum _{k_1,l_1}^{} {\mathcal {I}}_{k_2l_2k_1l_1}A_{k_1l_1}B_{k_2l_2}\\= & {} \sum _{k_2,l_2}A_{k_2l_2}B_{k_2l_2}. \end{aligned}$$

Therefore, \(\langle {\mathcal {T}}^{-1}A,{\mathcal {T}}B\rangle =\langle A,B\rangle .\) According to the multiplication defined in (4.2) and the tensor outer product defined in (2.1),

$$\begin{aligned} \left( (A\circ B)C\right) _{ij}= & {} \sum _{k=1}^{K}\sum _{l=1}^{L}(A\circ B)_{ijkl}c_{kl}\\= & {} \sum _{k=1}^{K}\sum _{l=1}^{L}a_{ij}b_{kl}c_{kl}\\= & {} a_{ij}\sum _{k=1}^{K}\sum _{l=1}^{L}b_{kl}c_{kl}. \end{aligned}$$

Hence, \((A\circ B)C = A\langle B,C\rangle \). The proof is completed. \(\square \)

Let \(f_{{\mathcal {S}}}({\textbf{U}})\) be defined as (3.2); then the iterative formula of the BFGS algorithm with Armijo-Wolfe line search for solving the optimization problem \(\min _{{\textbf{U}}} f_{{\mathcal {S}}}({\textbf{U}})\) is

$$\begin{aligned} {\textbf{U}}^{k+1} = {\textbf{U}}^{k}-\alpha _k({\mathcal {B}}^k)^{-1}{\textbf{G}}^k, \end{aligned}$$

where \({\textbf{G}}^k=\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\) is the gradient of \(f_{{\mathcal {S}}}({\textbf{U}})\) at \({\textbf{U}}^k\), and the step size \(\alpha _k\) satisfies the Armijo-Wolfe rule.

The update formula of \({\mathcal {B}}^k\) is

$$\begin{aligned} {\mathcal {B}}^{k+1} = {\mathcal {B}}^{k}+\frac{{\textbf{Y}}^k\circ {{\textbf{Y}}^k}}{\langle {\textbf{Y}}^k,{\textbf{S}}^k\rangle } -\frac{\left( {\mathcal {B}}^{k}{\textbf{S}}^k\right) \circ ({\mathcal {B}}^{k}{\textbf{S}}^k)}{\langle {\mathcal {B}}^{k}{\textbf{S}}^k,{\textbf{S}}^k\rangle }, \end{aligned}$$
(4.4)

where \({\textbf{Y}}^k=\nabla f_{{\mathcal {S}}}({\textbf{U}}^{k+1})-\nabla f_{{\mathcal {S}}}({\textbf{U}}^{k})\), \({\textbf{S}}^k={\textbf{U}}^{k+1}-{\textbf{U}}^{k}\).

Lemma 4.2

Let \({\mathcal {B}}^{k+1}\) be defined as (4.4). Then, the inverse of \({\mathcal {B}}^{k+1}\) can be given by

$$\begin{aligned} {\mathcal {H}}^{k+1}=\left( {\mathcal {I}}-\frac{{\textbf{S}}^k\circ {\textbf{Y}}^k}{\langle {\textbf{S}}^k,{\textbf{Y}}^k\rangle }\right) \oslash {\mathcal {H}}^{k}\oslash \left( {\mathcal {I}}-\frac{{\textbf{Y}}^k\circ {\textbf{S}}^k}{\langle {\textbf{S}}^k,{\textbf{Y}}^k\rangle }\right) +\frac{{\textbf{S}}^k\circ {\textbf{S}}^k}{\langle {\textbf{S}}^k,{\textbf{Y}}^k\rangle }. \end{aligned}$$
(4.5)

Proof

In order to prove that \({\mathcal {H}}^{k+1}\) is the inverse of \({\mathcal {B}}^{k+1}\), we just need to prove that \({\mathcal {B}}^{k+1}({\mathcal {H}}^{k+1}{\textbf{X}})={\textbf{X}}\) for any block matrix \({\textbf{X}}\). Let \(\langle {\textbf{S}}^k,{\textbf{Y}}^k\rangle = \frac{1}{\mu }\) and \(\langle {\mathcal {B}}^k{\textbf{S}}^k,{\textbf{S}}^k\rangle =\frac{1}{\nu }\). From the formula (4.4), we have

$$\begin{aligned} {\mathcal {B}}^{k+1}({\mathcal {H}}^{k+1}{\textbf{X}}) \nonumber= & {} {\mathcal {B}}^k({\mathcal {H}}^{k+1}{\textbf{X}})+\mu ({\textbf{Y}}^k\circ {\textbf{Y}}^k)({\mathcal {H}}^{k+1}{\textbf{X}})\\{} & {} -\nu ({\mathcal {B}}^{k}{\textbf{S}}^k\circ {\mathcal {B}}^{k}{\textbf{S}}^k)({\mathcal {H}}^{k+1}{\textbf{X}}). \end{aligned}$$
(4.6)

According to the Lemma 4.1,

$$\begin{aligned} {\mathcal {H}}^{k+1}{\textbf{X}} \nonumber= & {} {\mathcal {H}}^k{\textbf{X}}-\mu {\mathcal {H}}^k{\textbf{Y}}^k \langle {\textbf{S}}^k,{\textbf{X}}\rangle -\mu {\textbf{S}}^k \langle {\textbf{Y}}^k,{\mathcal {H}}^k{\textbf{X}}\rangle \\{} & {} +\mu ^2{\textbf{S}}^k\langle {\textbf{Y}}^k,{\mathcal {H}}^k{\textbf{Y}}^k\rangle \langle {\textbf{S}}^k,{\textbf{X}}\rangle +\mu {\textbf{S}}^k \langle {\textbf{S}}^k,{\textbf{X}}\rangle . \end{aligned}$$
(4.7)

By substituting (4.7) into (4.6) and using Lemma 4.1, we get \({\mathcal {B}}^{k+1}({\mathcal {H}}^{k+1}{\textbf{X}})={\textbf{X}}\). The proof is completed. \(\square \)

Now we give the BFGS algorithm with Armijo-Wolfe step size (3.7) for the positive partially symmetric rank-R approximation of a partially symmetric tensor. The initial factor matrix \({\textbf{U}}^1\) is chosen randomly, and \({\mathcal {H}}^1\) is usually selected as the fourth-order identity tensor \({\mathcal {I}}\) defined in (4.3).

[Algorithm 3: BFGS with Armijo-Wolfe line search for the positive partially symmetric rank-R approximation]
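Exploiting the flattening observation from the previous sketch, the \(\oslash \)-updates (4.4)-(4.5) reduce to the standard BFGS updates on the vectorization of \({\textbf{U}}\). The sketch below implements Algorithm 3 in this flattened form, reusing the f_S and grad_fS helpers from the earlier sketches; for brevity it uses backtracking Armijo instead of the full Armijo-Wolfe search (3.7) and, as a common safeguard, skips the update when \(\langle {\textbf{Y}}^k,{\textbf{S}}^k\rangle \le 0\) so that the inverse approximation stays positive definite.

```python
import numpy as np

def bfgs_ps_cp(S, Us, ms, eps=1e-8, max_iter=200):
    """Algorithm 3 sketch: BFGS on the flattened block-matrix vector U."""
    shapes = [U.shape for U in Us]
    pack = lambda Ms: np.concatenate([M.ravel() for M in Ms])
    def unpack(v):
        out, p = [], 0
        for a, b in shapes:
            out.append(v[p:p + a * b].reshape(a, b))
            p += a * b
        return out
    f = lambda v: f_S(S, unpack(v), ms)               # loss sketch (3.2)
    g = lambda v: pack(grad_fS(S, unpack(v), ms))     # gradient sketch (3.5)
    x = pack(Us)
    H, gx = np.eye(x.size), g(x)                      # H^1 = identity
    for _ in range(max_iter):
        if np.linalg.norm(gx) < eps:
            break
        d = -H @ gx                                   # search direction
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * (gx @ d):
            alpha *= 0.5                              # Armijo backtracking
        s = alpha * d
        g_new = g(x + s)
        y = g_new - gx
        if y @ s > 1e-12:                             # curvature safeguard
            rho = 1.0 / (y @ s)
            V = np.eye(x.size) - rho * np.outer(s, y)
            H = V @ H @ V.T + rho * np.outer(s, s)    # inverse update, cf. (4.5)
        x, gx = x + s, g_new
    return unpack(x)
```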

4.2 Convergence analysis of BFGS algorithm

In this subsection, the super-linear convergence of the algorithm is discussed. The proofs of the following results draw on [28]. First, we give the relevant assumptions.

Assumption 4.1

Let \(s,R>0\) be two integers and \({\textbf{m}}=(m_1,m_2,\cdots ,m_s),\ {\textbf{n}}=(n_1,n_2,\cdots ,n_s)\in {\mathbb {N}}^s_{+}\). Let \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be a partially symmetric tensor and \(n=n_1+\cdots +n_s\). The function \(f_{{\mathcal {S}}}({\textbf{U}})\) is defined as (3.2), where the block-matrix vector \({\textbf{U}}=(U_1^T,U_2^T,\cdots ,U_s^T)^T\in {\mathbb {R}}^{n\times R}\) and \(U_t\in {\mathbb {R}}^{n_t\times R}\) for all \(t\in [s]\).

1) \(\langle \nabla ^2 f_{{\mathcal {S}}} ({\textbf{U}}^{*}){\textbf{W}},{\textbf{W}}\rangle >0\) for any \({\textbf{W}}\in {\mathbb {R}}^{n\times R}\), where \({\textbf{U}}^{*}\) is a local minimum point of \(f_{{\mathcal {S}}}({\textbf{U}})\);

2) there is a neighborhood \(N({\textbf{U}}^{*})\) of \({\textbf{U}}^{*}\) and a constant \(L>0\) such that for any \({\textbf{U}},\ {\textbf{V}}\in N({\textbf{U}}^{*})\)

$$\begin{aligned} \Vert {\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})-{\nabla }^2f_{{\mathcal {S}}}({\textbf{V}})\Vert _F\le {L}\Vert {\textbf{U}}-{\textbf{V}}\Vert _F. \end{aligned}$$

Lemma 4.3

If the function \(f_{{\mathcal {S}}}({\textbf{U}})\) satisfies the second condition of Assumption 4.1, then for any \({\textbf{U}},{\textbf{V}},{\textbf{W}}\in {\mathbb {R}}^{n\times R}\),

$$\begin{aligned} \Vert {\nabla }f_{{\mathcal {S}}}({\textbf{V}})-{\nabla }f_{{\mathcal {S}}}({\textbf{W}}) -{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F \le L\frac{\Vert {\textbf{U}}-{\textbf{W}}\Vert _F +\Vert {\textbf{U}}-{\textbf{V}}\Vert _F}{2}\Vert {\textbf{V}}-{\textbf{W}}\Vert _F. \end{aligned}$$
(4.8)

Furthermore, if \({\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})\) is invertible, then for every \(\varepsilon \in (0,\frac{1}{L\Vert ({\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}))^{-1}\Vert _F})\) and any \({\textbf{V}}\) and \({\textbf{W}}\) satisfying \(\Vert {\textbf{U}}-{\textbf{V}}\Vert _F <\varepsilon \) and \(\Vert {\textbf{U}}-{\textbf{W}}\Vert _F <\varepsilon \), there are constants \(\beta>\alpha >0\) related to \({\textbf{U}}\) such that

$$\begin{aligned} \alpha \Vert {\textbf{V}}-{\textbf{W}}\Vert _F\le \Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})\Vert _F\le \beta \Vert {\textbf{V}}-{\textbf{W}}\Vert _F. \end{aligned}$$
(4.9)

Proof

For any \({\textbf{U}},{\textbf{V}},{\textbf{W}}\in {\mathbb {R}}^{n\times R}\), according to the Cauchy-Schwarz inequality,

$$\begin{aligned}{} & {} \Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})-\nabla ^2 f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F\\= & {} \left\| \int \nolimits _0^1(\nabla ^2f_{{\mathcal {S}}}({\textbf{W}}+\tau ({\textbf{V}}-{\textbf{W}}))-\nabla ^2f_{{\mathcal {S}}}({\textbf{U}}))({\textbf{V}}-{\textbf{W}})d\tau \right\| _F\\\le & {} \Vert {\textbf{V}}-{\textbf{W}}\Vert _F \int \nolimits _0^1 L\Vert {\textbf{W}}+\tau ({\textbf{V}}-{\textbf{W}})-{\textbf{U}}\Vert _Fd\tau \\\le & {} L\Vert {\textbf{V}}-{\textbf{W}}\Vert _F \int \nolimits _0^1 (\tau \Vert {\textbf{V}}-{\textbf{U}}\Vert _F+(1-\tau )\Vert {\textbf{W}}-{\textbf{U}}\Vert _F)d\tau \\= & {} L \Vert {\textbf{V}}-{\textbf{W}}\Vert _F \frac{\Vert {\textbf{V}}-{\textbf{U}}\Vert _F+\Vert {\textbf{W}}-{\textbf{U}}\Vert _F}{2}. \end{aligned}$$

For every given \(\varepsilon \in (0,\frac{1}{L\Vert [{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})]^{-1}\Vert _F})\) and any \({\textbf{V}}\) and \({\textbf{W}}\) satisfying

$$\begin{aligned} \max \{{\Vert {\textbf{V}}-{\textbf{U}}\Vert _F},{\Vert {\textbf{W}}-{\textbf{U}}\Vert _F}\}<\varepsilon , \end{aligned}$$

we have that

$$\begin{aligned}{} & {} \Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})\Vert _F\\\le & {} \Vert \nabla ^2f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F+\Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F\\\le & {} \left( \Vert \nabla ^2f_{{\mathcal {S}}}({\textbf{U}})\Vert _F+\frac{L(\Vert {\textbf{V}}-{\textbf{U}}\Vert _F+\Vert {\textbf{W}}-{\textbf{U}}\Vert _F)}{2}\right) \Vert {\textbf{V}}-{\textbf{W}}\Vert _F\\\le & {} (\Vert \nabla ^2f_{{\mathcal {S}}}({\textbf{U}})\Vert _F+L\varepsilon )\Vert {\textbf{V}}-{\textbf{W}}\Vert _F. \end{aligned}$$

Let \(\beta =\Vert \nabla ^2f_{{\mathcal {S}}}({\textbf{U}})\Vert _F+L\varepsilon \); then the second inequality of (4.9) holds. Using the properties of the norm, we have that

$$\begin{aligned}{} & {} \Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})\Vert _F\\\ge & {} \Vert \nabla ^2f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F-\Vert \nabla f_{{\mathcal {S}}}({\textbf{V}})-\nabla f_{{\mathcal {S}}}({\textbf{W}})-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}})({\textbf{V}}-{\textbf{W}})\Vert _F\\\ge & {} \left( 1/\Vert {(\nabla ^2f_{{\mathcal {S}}}({\textbf{U}}))}^{-1}\Vert _F-\frac{L(\Vert {\textbf{V}}-{\textbf{U}}\Vert _F+\Vert {\textbf{W}}-{\textbf{U}}\Vert _F)}{2}\right) \Vert {\textbf{V}}-{\textbf{W}}\Vert _F\\\ge & {} \left( 1/\Vert {(\nabla ^2f_{{\mathcal {S}}}({\textbf{U}}))}^{-1}\Vert _F-L\varepsilon \right) \Vert {\textbf{V}}-{\textbf{W}}\Vert _F. \end{aligned}$$

Let \(\alpha =1/\Vert {(\nabla ^2f_{{\mathcal {S}}}({\textbf{U}}))}^{-1}\Vert _F-L\varepsilon \); then the first inequality of (4.9) holds. The proof is completed. \(\square \)

Theorem 4.1

Assume that the function \(f_{{\mathcal {S}}}({\textbf{U}})\) satisfies Assumption 4.1, \(\{{\mathcal {B}}^k\}\) and \(\{{\textbf{U}}^k\}\) are obtained by Algorithm 3 without line search (that is, the step size \(\alpha _k\) is uniformly 1), and \(\{{\textbf{U}}^k\}\) converges to a local minimum point \({\textbf{U}}^{*}\) of \(f_{{\mathcal {S}}}({\textbf{U}})\). Then the sequence \(\{{\textbf{U}}^k\}\) converges super-linearly to \({\textbf{U}}^{*}\) if and only if

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{\Vert ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$
(4.10)

Proof

(\(\Leftarrow \)) First we prove the sufficiency. Let \({\textbf{S}}^k={\textbf{U}}^{k+1}-{\textbf{U}}^k\). Then

$$\begin{aligned} ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*})){\textbf{S}}^k= & {} -\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)-{\nabla }^2 f_{{\mathcal {S}}}({\textbf{U}}^{*}){\textbf{S}}^k\\= & {} \nabla f_{{\mathcal {S}}}({\textbf{U}}^{k+1})-{\nabla }f_{{\mathcal {S}}}({\textbf{U}}^k)-{\nabla }^2f_{{\mathcal {S}}} ({\textbf{U}}^{*}){\textbf{S}}^k-{\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1}). \end{aligned}$$

According to the triangle inequality and Lemma 4.3, for any \( k\ge 0\), we have that

$$\begin{aligned} 0\le & {} \frac{\Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F}{\Vert {\textbf{S}}^k\Vert _F}\nonumber \\\le & {} \frac{\Vert ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*})) {\textbf{S}}^k\Vert _F}{\Vert {\textbf{S}}^k\Vert _F} + \frac{\Vert {\nabla }f_{{\mathcal {S}}} ({\textbf{U}}^{k+1}) -{\nabla }f_{{\mathcal {S}}}({\textbf{U}}^k)-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}){\textbf{S}}^k\Vert _F}{\Vert {\textbf{S}}^k\Vert _F}\nonumber \\\le & {} \frac{\Vert ({\mathcal {B}}^k- {\nabla }^2f_{{\mathcal {S}}} ({\textbf{U}}^{*})) {\textbf{S}}^k\Vert _F}{\Vert {\textbf{S}}^k\Vert _F}+ \frac{L}{2} (\Vert {\textbf{U}}^k -{\textbf{U}}^{*}\Vert _F+\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F). \end{aligned}$$
(4.11)

If (4.10) holds, substituting (4.10) and the assumption \(\lim \limits _{k\rightarrow \infty }{{\textbf{U}}^k}={\textbf{U}}^{*}\) into (4.11), we get

$$\begin{aligned} \lim \limits _{{k\rightarrow \infty }}\frac{\Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F}{\Vert {\textbf{S}}^k\Vert _F}=0. \end{aligned}$$
(4.12)

Because \({\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*})\) is invertible and \({\textbf{U}}^k\rightarrow {\textbf{U}}^{*}\), according to Lemma 4.3, there exist a real number \(\alpha >0\) and an integer \(k_0>0\), for any integer \( k\ge {k_0}\), such that

$$\begin{aligned} \Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F=\Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})-{\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{*})\Vert _F\ge \alpha \Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F. \end{aligned}$$

Thus

$$\begin{aligned} \frac{\Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{k}\Vert _F}\ge \frac{\alpha \Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F+\Vert {\textbf{U}}^{k}-{\textbf{U}}^{*}\Vert _F} =\frac{\alpha \frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k} -{\textbf{U}}^{*}\Vert _F}}{1+\frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k} -{\textbf{U}}^{*}\Vert _F}}>0. \end{aligned}$$

Since the limit of the left-hand side of the above inequality is zero by (4.12), the limit of the right-hand side is also zero, i.e.,

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k} -{\textbf{U}}^{*}\Vert _F}}{1+\frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k} -{\textbf{U}}^{*}\Vert _F}}=0. \end{aligned}$$

It follows that

$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k} -{\textbf{U}}^{*}\Vert _F}=0, \end{aligned}$$

which means that \(\{{\textbf{U}}^k\}\) converges super-linearly to \({\textbf{U}}^{*}\).

(\(\Rightarrow \)) Next we prove the necessity. By Lemma 4.3, there exist a real number \(\beta >0\) and an integer \(k_0>0\) such that for any integer \( k\ge {k_0}\),

$$\begin{aligned} \Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F=\Vert {\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})-{\nabla }f_{{\mathcal {S}}}({\textbf{U}}^{*})\Vert _F\le \beta \Vert {{\textbf{U}}^{k+1}}-{\textbf{U}}^{*}\Vert _F. \end{aligned}$$

Because \(\{{\textbf{U}}^k\}\) converges super-linearly to \({\textbf{U}}^{*}\), we have that

$$\begin{aligned} 0= & {} \lim \limits _{k\rightarrow \infty } \frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k}-{\textbf{U}}^{*}\Vert _F}\\\ge & {} \lim \limits _{k\rightarrow \infty }\frac{{\Vert \nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F}{\beta \Vert {\textbf{U}}^{k}-{\textbf{U}}^{*}\Vert _F}\\= & {} \lim \limits _{k\rightarrow \infty } \frac{1}{\beta }\frac{\Vert \nabla f_{\mathcal {S}}({\textbf{U}}^{k+1})\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F} \frac{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}{\Vert {\textbf{U}}^{k}-{\textbf{U}}^{*}\Vert _F}. \end{aligned}$$

Then

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{{\Vert \nabla }f_{{\mathcal {S}}}({\textbf{U}}^{k+1})\Vert _F}{\Vert {\textbf{U}}^{k+1}-{{\textbf{U}}^k}\Vert _F}=0. \end{aligned}$$

Thus, equation (4.10) holds. This completes the proof. \(\square \)

Based on Theorem 4.1, we give a theorem on the super-linear convergence of Algorithm 3.

Theorem 4.2

Let \(\{{\mathcal {B}}^k\}\) and \(\{{\textbf{U}}^k\}\) be obtained by Algorithm 3. Suppose \(f_{{\mathcal {S}}}({\textbf{U}})\) satisfies Assumption 4.1, \(\{{\textbf{U}}^k\}\) converges to a local minimum point \({\textbf{U}}^{*}\), and

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{\Vert ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$
(4.13)

Then \(\{{\textbf{U}}^k\}\) converges to \({\textbf{U}}^{*}\) super-linearly if and only if the step sizes \(\{\alpha _k\}\) converge to 1.

Proof

First, we prove the sufficiency. From the triangle inequality, we get

$$\begin{aligned}{} & {} \frac{\Vert (\alpha _k^{-1} {\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}\\\le & {} \frac{\Vert ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}+\frac{\Vert (\alpha _k^{-1}-1){\mathcal {B}}^k({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}. \end{aligned}$$

If (4.13) holds and \(\{\alpha _k\}\) converges to 1, then the limit of the right-hand side of the above inequality is equal to 0 as \(k\rightarrow \infty \). It follows that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{\Vert (\alpha _k^{-1} {\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$
(4.14)

By Theorem 4.1 and (4.14), we get that \(\{{\textbf{U}}^k\}\) converges super-linearly to \({\textbf{U}}^{*}\).

Next, we prove the necessity. If \(\{{\textbf{U}}^k\}\) converges to \({\textbf{U}}^{*}\) super-linearly, then by Theorem 4.1, (4.14) holds. Since \({\mathcal {B}}^k({\textbf{U}}^{k+1}-{\textbf{U}}^k)=-\alpha _k\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\), by (4.13), we have that

$$\begin{aligned}{} & {} \lim \limits _{k\rightarrow \infty }\frac{\Vert (\alpha _k-1)\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F} = \lim \limits _{k\rightarrow \infty }\frac{\Vert (\alpha _k^{-1}-1) {\mathcal {B}}^k({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}\\\le & {} \lim \limits _{k\rightarrow \infty } \frac{\Vert (\alpha _k^{-1} {\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}\\{} & {} +\lim \limits _{k\rightarrow \infty } \frac{\Vert ({\mathcal {B}}^k-{\nabla }^2f_{{\mathcal {S}}}({\textbf{U}}^{*}))({\textbf{U}}^{k+1}-{\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$

Hence,

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{\Vert (\alpha _k-1)\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$
(4.15)

By Assumption 4.1 and Lemma 4.3, there exist a real number \(\alpha >0\) and an integer \(k_0>0\) such that for any integer \( k\ge {k_0}\),

$$\begin{aligned} \alpha \Vert {\textbf{U}}^k-{\textbf{U}}^{*}\Vert _F \le \Vert \nabla f_{{\mathcal {S}}}({\textbf{U}}^k)-\nabla f_{{\mathcal {S}}}({\textbf{U}}^{*})\Vert _F =\Vert \nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\Vert _F . \end{aligned}$$

So

$$\begin{aligned} \alpha |\alpha _k-1| \frac{\Vert {\textbf{U}}^k-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}\le \frac{\Vert (\alpha _k-1)\nabla f_{{\mathcal {S}}}({\textbf{U}}^k)\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}. \end{aligned}$$
(4.16)

By (4.15) and (4.16), we have that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\alpha |\alpha _k-1| \frac{\Vert {\textbf{U}}^k-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=0. \end{aligned}$$

Since \(\{{\textbf{U}}^k\}\) converges super-linearly to \({\textbf{U}}^{*}\),

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \frac{\Vert {\textbf{U}}^k-{\textbf{U}}^{*}\Vert _F}{\Vert {\textbf{U}}^{k+1}-{\textbf{U}}^k\Vert _F}=1. \end{aligned}$$

Hence \(\lim \limits _{k\rightarrow \infty }\alpha |\alpha _k-1|=0\), i.e., \(\lim \limits _{k\rightarrow \infty }\alpha _k=1\). This completes the proof. \(\square \)

5 Discrimination of positive partially symmetric CP decompositions

Let \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\). When all orders \(m_1,m_2,\cdots ,m_s\) are even, according to Corollary 2.1, the tensor \({\mathcal {S}}\) may not have a positive partially symmetric CP decomposition. In this case, we cannot obtain its partially symmetric CP decomposition by Algorithm 3. Therefore, we discuss how to identify whether an even-order partially symmetric tensor has a positive partially symmetric CP decomposition over the real field.

Theorem 5.1

Let \({\textbf{m}}=(m_1,m_2,\cdots ,m_s)\), \({\textbf{n}}=(n_1,n_2,\cdots ,n_s)\), and \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be a partially symmetric tensor. For any vectors \({\textbf{x}}^{(i)}\in {\mathbb {R}}^{n_i}\) and \({\textbf{y}}^{(j)}\in {\mathbb {R}}^{n_j}\), \(i,j\in [s]\), denote

$$\begin{aligned} P^t_{{\textbf{m}}}(m_t-k,k):=({\textbf{x}}^{(1)})^{m_1}\circ \cdots \circ ({\textbf{x}}^{(t)})^{m_t-k}\circ ({\textbf{y}}^{(t)})^k\circ \cdots \circ ({\textbf{x}}^{(s)})^{m_s}, \end{aligned}$$

where \(k\in \{0,1,\cdots ,m_t\}.\) Then,

$$\begin{aligned} \langle {\mathcal {S}},({\textbf{x}}^{(1)})^{m_1}\circ \cdots \circ ({\textbf{x}}^{(t)}+{\textbf{y}}^{(t)})^{m_t}\circ \cdots \circ ({\textbf{x}}^{(s)})^{m_s}\rangle =\sum \limits _{k=0}^{m_t}\left( {\begin{array}{c}m_t\\ k\end{array}}\right) \langle {\mathcal {S}},P^t_{{\textbf{m}}}(m_t-k,k)\rangle . \end{aligned}$$

Proof

Let \(I_t:=\{{\textbf{i}}_t=(i_{t,1},i_{t,2},\cdots ,i_{t,m_t})|i_{t,1},\cdots , i_{t,m_t}\in [n_t]\}\), \(t\in [s]\). For any vectors \({\textbf{x}}^{(i)}\in {\mathbb {R}}^{n_i}\) and \({\textbf{y}}^{(j)}\in {\mathbb {R}}^{n_j}\), \(i,j\in [s]\),

$$\begin{aligned}{} & {} \langle {\mathcal {S}},({\textbf{x}}^{(1)})^{m_1}\circ \cdots \circ ({\textbf{x}}^{(t)}+{\textbf{y}}^{(t)})^{m_t}\circ \cdots \circ ({\textbf{x}}^{(s)})^{m_s}\rangle \\= & {} \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]} {\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s}(({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1} \cdots \\{} & {} \cdot ({\textbf{x}}^{(t)}+{\textbf{y}}^{(t)})_{i_{t,1}}\cdots ({\textbf{x}}^{(t)}+{\textbf{y}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}\\= & {} \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]}{\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s}(({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1} \cdots ({\textbf{x}}^{(t)})_{i_{t,1}}\cdots ({\textbf{x}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}\\{} & {} +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]} {\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s}(({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1} \cdots \\{} & {} \cdot ({\textbf{x}}^{(t)})_{i_{t,1}}\cdots ({\textbf{x}}^{(t)})_{i_{t,m_t-1}}({\textbf{y}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}\\{} & {} +\left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]} {\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s}(({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1} \cdots \\{} & {} \cdot ({\textbf{x}}^{(t)})_{i_{t,1}}\cdots ({\textbf{x}}^{(t)})_{i_{t,m_t-2}}({\textbf{y}}^{(t)})_{i_{t,m_t-1}}({\textbf{y}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}\\{} & {} +\cdots +\left( {\begin{array}{c}m_t\\ m_t\end{array}}\right) \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]} {\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s}(({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1} \cdots \\{} & {} \cdot ({\textbf{y}}^{(t)})_{i_{t,1}}\cdots ({\textbf{y}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}. \end{aligned}$$

For \(k=0,1,2,\cdots ,m_t\), we have

$$\begin{aligned} \langle {\mathcal {S}},P^t_{{\textbf{m}}}(m_t-k,k)\rangle= & {} \sum \limits _{{\textbf{i}}_t\in I_t,t\in [s]}{\mathcal {S}}_{{\textbf{i}}_1\cdots {\textbf{i}}_t\cdots {\textbf{i}}_s} (({\textbf{x}}^{(1)})^{m_1})_{{\textbf{i}}_1}\cdots ({\textbf{x}}^{(t)})_{i_{t,1}}\cdots \\{} & {} \cdot ({\textbf{x}}^{(t)})_{i_{t,m_t-k}} ({\textbf{y}}^{(t)})_{i_{t,m_t-k+1}}\cdots ({\textbf{y}}^{(t)})_{i_{t,m_t}}\cdots (({\textbf{x}}^{(s)})^{m_s})_{{\textbf{i}}_s}. \end{aligned}$$

Hence,

$$\begin{aligned}{} & {} \langle {\mathcal {S}},({\textbf{x}}^{(1)})^{m_1}\circ \cdots \circ ({\textbf{x}}^{(t)}+{\textbf{y}}^{(t)})^{m_t}\circ \cdots \circ ({\textbf{x}}^{(s)})^{m_s}\rangle \\= & {} \sum \limits _{k=0}^{m_t}\left( {\begin{array}{c}m_t\\ k\end{array}}\right) \langle {\mathcal {S}},P^t_{{\textbf{m}}}(m_t-k,k)\rangle . \end{aligned}$$

This completes the proof. \(\square \)
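Theorem 5.1 can be checked numerically. The following Python sketch verifies the expansion for \(s=2\), \({\textbf{m}}=(2,3)\), \({\textbf{n}}=(2,3)\), with \(t=2\); constructing \({\mathcal {S}}\) as a random sum of structured rank-1 terms is only a convenient way to obtain a partially symmetric test tensor:

import numpy as np
from functools import reduce
from math import comb

# Numerical check of Theorem 5.1: <S, x1^2 o (x2+y2)^3> equals
# sum_k C(3,k) <S, x1^2 o x2^(3-k) o y2^k>. Sizes are illustrative.
rng = np.random.default_rng(1)
m, n = (2, 3), (2, 3)

def rank1(vecs):                       # outer product of a list of vectors
    return reduce(np.multiply.outer, vecs)

# a partially symmetric test tensor: a sum of structured rank-1 terms
S = sum(rank1([a] * m[0] + [b] * m[1])
        for a, b in [(rng.standard_normal(n[0]),
                      rng.standard_normal(n[1])) for _ in range(4)])

x1 = rng.standard_normal(n[0])
x2, y2 = rng.standard_normal(n[1]), rng.standard_normal(n[1])

lhs = np.tensordot(S, rank1([x1] * m[0] + [x2 + y2] * m[1]), axes=S.ndim)
rhs = sum(comb(m[1], k) *
          np.tensordot(S, rank1([x1] * m[0] + [x2] * (m[1] - k) + [y2] * k),
                       axes=S.ndim)
          for k in range(m[1] + 1))
print(abs(lhs - rhs))                  # zero up to rounding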

Theorem 5.2

Let \({\mathcal {S}}_1\), \({\mathcal {S}}_2\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be two partially symmetric tensors. Assume that all orders \(m_1,m_2,\cdots ,m_s\) are even. Then \(\langle {\mathcal {S}}_1,{\mathcal {S}}_2\rangle \ge 0\) if \({\mathcal {S}}_1\) and \({\mathcal {S}}_2\) both have positive partially symmetric CP decompositions.

Proof

Since \({\mathcal {S}}_1,{\mathcal {S}}_2\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) have positive partially symmetric CP decompositions, they can be represented in the following form

$$\begin{aligned} {\mathcal {S}}_1= & {} \sum \limits _{k=1}^{R_1}{({\textbf{u}}^{(k)}_1)}^{m_1}\circ \cdots \circ {({\textbf{u}}^{(k)}_s)}^{m_s},\\ {\mathcal {S}}_2= & {} \sum \limits _{l=1}^{R_2}{({\textbf{v}}^{(l)}_1)}^{m_1}\circ \cdots \circ {({\textbf{v}}^{(l)}_s)}^{m_s}, \end{aligned}$$

where \({\textbf{u}}_t^{(k)},{\textbf{v}}_t^{(l)}\in {\mathbb {R}}^{n_t}\), \(k\in [R_1],l\in [R_2], t\in [s]\).

Then, the inner product of \({\mathcal {S}}_1\) and \({\mathcal {S}}_2\) is

$$\begin{aligned} \langle {\mathcal {S}}_1,{\mathcal {S}}_2\rangle= & {} \langle \sum \limits _{k=1}^{R_1}{({\textbf{u}}^{(k)}_1)}^{m_1}\circ \cdots \circ {({\textbf{u}}^{(k)}_s)}^{m_s},\sum \limits _{l=1}^{R_2}{({\textbf{v}}^{(l)}_1)}^{m_1}\circ \cdots \circ {({\textbf{v}}^{(l)}_s)}^{m_s}\rangle \\= & {} \sum \limits _{k=1}^{R_1}\sum \limits _{l=1}^{R_2}\langle {({\textbf{u}}^{(k)}_1)}^{m_1}\circ \cdots \circ {({\textbf{u}}^{(k)}_s)}^{m_s},{({\textbf{v}}^{(l)}_1)}^{m_1}\circ \cdots \circ {({\textbf{v}}^{(l)}_s)}^{m_s}\rangle \\= & {} \sum \limits _{k=1}^{R_1}\sum \limits _{l=1}^{R_2} {\left( \langle {{\textbf{u}}^{(k)}_1},{\textbf{v}}^{(l)}_1\rangle \right) }^{m_1}\cdots \left( \langle {\textbf{u}}^{(k)}_s,{\textbf{v}}^{(l)}_s\rangle \right) ^{m_s}.\\ \end{aligned}$$

Since the orders \(m_1,m_2,\cdots ,m_s\) are even, \(\left( \langle {\textbf{u}}^{(k)}_t,{\textbf{v}}^{(l)}_t\rangle \right) ^{m_t}\ge 0\) for all \(k\in [R_1],l\in [R_2], t\in [s]\). Hence, it follows that \(\langle {\mathcal {S}}_1,{\mathcal {S}}_2\rangle \ge 0\). This completes the proof. \(\square \)
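The key step is that the inner product factors into products of even powers of vector inner products, \(\langle {\mathcal {S}}_1,{\mathcal {S}}_2\rangle =\sum _{k,l}\prod _t{\langle {\textbf{u}}_t^{(k)},{\textbf{v}}_t^{(l)}\rangle }^{m_t}\). A small Python sketch verifying this identity, with illustrative sizes \({\textbf{m}}=(2,2)\), \({\textbf{n}}=(3,2)\):

import numpy as np
from functools import reduce

# Check of Theorem 5.2: two tensors with positive partially symmetric
# CP decompositions have <S1, S2> >= 0. Sizes and ranks are illustrative.
rng = np.random.default_rng(2)
m, n, R1, R2 = (2, 2), (3, 2), 3, 4

def rank1(vecs):
    return reduce(np.multiply.outer, vecs)

U = [(rng.standard_normal(n[0]), rng.standard_normal(n[1])) for _ in range(R1)]
V = [(rng.standard_normal(n[0]), rng.standard_normal(n[1])) for _ in range(R2)]
S1 = sum(rank1([u1] * m[0] + [u2] * m[1]) for u1, u2 in U)
S2 = sum(rank1([v1] * m[0] + [v2] * m[1]) for v1, v2 in V)

direct = np.tensordot(S1, S2, axes=S1.ndim)
factored = sum((u1 @ v1) ** m[0] * (u2 @ v2) ** m[1]
               for u1, u2 in U for v1, v2 in V)
print(direct, factored)                # equal, and nonnegative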

Assume that \(\{U_1,U_2,\cdots ,U_s\}\) is a tuple of factor matrices of \(\hat{{\mathcal {S}}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\), i.e.,

$$\begin{aligned} \hat{{\mathcal {S}}}=[\![U_1^{\times m_1},U_2^{\times m_2},\cdots ,U_s^{\times m_s}]\!]. \end{aligned}$$

In what follows, we view the objective function \(F_{{\mathcal {S}}}(\hat{{\mathcal {S}}})=\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert _F^2\) as a function of the factor matrix \(U_t\), \(t\in [s]\), i.e.,

$$\begin{aligned} F_{{\mathcal {S}}}(U_t)=F_{{\mathcal {S}}}(\hat{{\mathcal {S}}})=\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert _F^2. \end{aligned}$$
(5.1)

For convenience, we denote

$$\begin{aligned}{} & {} \hat{{\mathcal {S}}}[U_t^{\times k},{\varDelta U_t}^{\times (m_t-k-l)},U_t^{\times l}]\\:= & {} [\![U_1^{\times m_1},\cdots , U_{t-1}^{\times m_{t-1}}, U_t^{\times k},{\varDelta U_t}^{\times (m_t-k-l)},U_t^{\times l},U_{t+1}^{\times m_{t+1}}, \cdots , U_s^{\times m_s}]\!]. \end{aligned}$$

Next, in order to study the convexity of the function \(F_{{\mathcal {S}}}(U_t)\), we derive its second-order Taylor expansion.

Theorem 5.3

Let \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) be a partially symmetric tensor and let \(F_{{\mathcal {S}}}(U_t)\) be defined as in (5.1). Let

$$\begin{aligned} L_1:= & {} -2 \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle {\mathcal {S}}-\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\rangle , \end{aligned}$$
(5.2)
$$\begin{aligned} L_2:= & {} -2 \left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle {\mathcal {S}}-\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} \quad +\Vert \hat{{\mathcal {S}}}[\varDelta U_t,U_t^{\times (m_t-1)}]+\hat{{\mathcal {S}}}[U_t,\varDelta U_t,U_t^{\times (m_t-2)}]+\cdots +\hat{{\mathcal {S}}}[U_t^{\times (m_t-1)},\varDelta U_t]\Vert _F^2.\nonumber \\ \end{aligned}$$
(5.3)

Then, the second-order Taylor formula of the function \(F_{{\mathcal {S}}}(U_t)\) is as follows

$$\begin{aligned} F_{{\mathcal {S}}}{(U_t+\varDelta U_t)}=F_{{\mathcal {S}}}{(U_t)}+L_1+L_2+o(\Vert \varDelta U_t\Vert _F^2). \end{aligned}$$
(5.4)

Proof

From (5.1), \(F_{{\mathcal {S}}}{(U_t+\varDelta U_t)}=\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}]\Vert _F^2\). Then,

$$\begin{aligned} F_{{\mathcal {S}}}{(U_t+\varDelta U_t)}= & {} \langle {\mathcal {S}},{\mathcal {S}} \rangle -2\langle {\mathcal {S}},\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}]\rangle \nonumber \\{} & {} \quad +\langle \hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}],\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}] \rangle . \end{aligned}$$
(5.5)

Now, we compute the first-order and the second-order terms of \(\varDelta U_t\) on the right-hand side of (5.5). Note that

$$\begin{aligned} \langle {\mathcal {S}},\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}] \rangle = \langle {\mathcal {S}},\sum \limits _{k=1}^{R} ({{\textbf{u}}}_1^{(k)})^{m_1}\circ \cdots \circ ({{\textbf{u}}}_t^{(k)}+\varDelta {{\textbf{u}}}_t^{(k)})^{m_t}\circ \cdots \circ ({{\textbf{u}}}_s^{(k)})^{m_s} \rangle . \end{aligned}$$

According to Theorem 5.1, we obtain

$$\begin{aligned}{} & {} \langle {\mathcal {S}},\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}] \rangle \nonumber \\= & {} \langle {\mathcal {S}},\hat{{\mathcal {S}}} \rangle + \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle {\mathcal {S}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}] \rangle \nonumber \\{} & {} + \left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle {\mathcal {S}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}] \rangle + o (\Vert \varDelta U_t\Vert _F^2). \end{aligned}$$
(5.6)

Note that

$$\begin{aligned}{} & {} \Vert \hat{{\mathcal {S}}}[\varDelta U_t,U_t^{\times (m_t-1)}]+\hat{{\mathcal {S}}}[U_t,\varDelta U_t,U_t^{\times (m_t-2)}]+\cdots +\hat{{\mathcal {S}}}[U_t^{\times (m_t-1)},\varDelta U_t]\Vert _F^2\nonumber \\= & {} \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \left( {\begin{array}{c}m_{t}-1\\ 1\end{array}}\right) \langle \hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}],\hat{{\mathcal {S}}}[U_t,{\varDelta U_t},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} \quad +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \Vert \hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\Vert _F^2. \end{aligned}$$
(5.7)

Moreover, for all \(k,l\in [R]\) and \(t\in [s]\),

$$\begin{aligned}{} & {} \langle ({{\textbf{u}}}_t^{(k)}+\varDelta {{\textbf{u}}}_t^{(k)})^{m_t},({{\textbf{u}}}_t^{(l)}+\varDelta {{\textbf{u}}}_t^{(l)})^{m_t}\rangle \nonumber \\= & {} \left( \langle {{\textbf{u}}}_t^{(k)},{{\textbf{u}}}_t^{(l)}\rangle +\langle \varDelta {{\textbf{u}}}_t^{(k)},{{\textbf{u}}}_t^{(l)}\rangle + \langle {{\textbf{u}}}_t^{(k)},\varDelta {{\textbf{u}}}_t^{(l)}\rangle + \langle \varDelta {{\textbf{u}}}_t^{(k)},\varDelta {{\textbf{u}}}_t^{(l)}\rangle \right) ^{m_t}\nonumber \\= & {} {\langle {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^{m_t} + \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \left( \langle {\textbf{u}}_t^{(k)},\varDelta {\textbf{u}}_t^{(l)}\rangle +\langle \varDelta {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle \right) {\langle {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^{m_t-1}\nonumber \\{} & {} \quad +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle \varDelta {\textbf{u}}_t^{(k)},\varDelta {\textbf{u}}_t^{(l)}\rangle {\langle {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^{m_t-1}\nonumber \\{} & {} \quad +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \left( {\begin{array}{c}m_t-1\\ 1\end{array}}\right) \langle {\textbf{u}}_t^{(k)},\varDelta {\textbf{u}}_t^{(l)}\rangle \langle \varDelta {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle {\langle {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^{m_t-2}\nonumber \\{} & {} \quad +\left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \left( {\langle {\textbf{u}}_t^{(k)},\varDelta {\textbf{u}}_t^{(l)}\rangle }^2 +{\langle \varDelta {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^2 \right) {\langle {\textbf{u}}_t^{(k)},{\textbf{u}}_t^{(l)}\rangle }^{m_t-2}\nonumber \\{} & {} \quad +o\left( \Vert \varDelta {\textbf{u}}_t^{(l)}\Vert _F^2+\Vert \varDelta {\textbf{u}}_t^{(k)}\Vert _F^2\right) . \end{aligned}$$
(5.8)

Let

$$\begin{aligned} {\mathcal {P}}_k:= & {} ({{\textbf{u}}}_1^{(k)})^{m_1}\circ \cdots \circ ({{\textbf{u}}}_{t-1}^{(k)})^{m_{t-1}},k\in [R],\\ {\mathcal {Q}}_k:= & {} ({{\textbf{u}}}_{t+1}^{(k)})^{m_{t+1}}\circ \cdots \circ ({{\textbf{u}}}_{s}^{(k)})^{m_{s}},k\in [R]. \end{aligned}$$

Then, by (5.7) and (5.8), we have that

$$\begin{aligned}{} & {} \langle \hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}],\hat{{\mathcal {S}}}[{\left( U_t+\varDelta U_t\right) }^{\times m_t}] \rangle \nonumber \\= & {} \sum \limits _{k=1}^R \sum \limits _{l=1}^R\langle {\mathcal {P}}_k\circ ({{\textbf{u}}}_t^{(k)}+\varDelta {{\textbf{u}}}_t^{(k)})^{m_t}\circ {\mathcal {Q}}_k, {\mathcal {P}}_l\circ ({{\textbf{u}}}_t^{(l)}+\varDelta {{\textbf{u}}}_t^{(l)})^{m_t}\circ {\mathcal {Q}}_l\rangle \nonumber \\= & {} \sum \limits _{k=1}^R \sum \limits _{l=1}^R \langle {\mathcal {P}}_k,{\mathcal {P}}_l\rangle \langle ({{\textbf{u}}}_t^{(k)}+\varDelta {{\textbf{u}}}_t^{(k)})^{m_t},({{\textbf{u}}}_t^{(l)}+\varDelta {{\textbf{u}}}_t^{(l)})^{m_t}\rangle \langle {\mathcal {Q}}_k,{\mathcal {Q}}_l\rangle \nonumber \\= & {} \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}\rangle + 2\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\rangle +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \Vert \hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\Vert _F^2\nonumber \\{} & {} +\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \left( {\begin{array}{c}m_{t}-1\\ 1\end{array}}\right) \langle \hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}],\hat{{\mathcal {S}}}[U_t,{\varDelta U_t},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} +2\left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle +o(\Vert \varDelta U_t\Vert _F^2)\nonumber \\= & {} \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}\rangle +2\left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\rangle +2\left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} \quad +\Vert \hat{{\mathcal {S}}}[\varDelta U_t,U_t^{\times (m_t-1)}]+\hat{{\mathcal {S}}}[U_t,\varDelta U_t,U_t^{\times (m_t-2)}]+\cdots +\hat{{\mathcal {S}}}[U_t^{\times (m_t-1)},\varDelta U_t]\Vert _F^2\nonumber \\{} & {} \quad +o(\Vert \varDelta U_t\Vert _F^2). \end{aligned}$$
(5.9)

Hence, by (5.6) and (5.9), the first-order and the second-order terms of \(\varDelta U_t\) on the right-hand side of (5.5) are, respectively,

$$\begin{aligned} L_1= & {} -2 \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle {\mathcal {S}}-\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\rangle ,\\ L_2= & {} -2 \left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle {\mathcal {S}}-\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} +\Vert \hat{{\mathcal {S}}}[\varDelta U_t,U_t^{\times (m_t-1)}]+\hat{{\mathcal {S}}}[U_t,\varDelta U_t,U_t^{\times (m_t-2)}]+\cdots +\hat{{\mathcal {S}}}[U_t^{\times (m_t-1)},\varDelta U_t]\Vert _F^2. \end{aligned}$$

This completes the proof. \(\square \)
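The expansion (5.4) can be verified by finite differences: replacing \(\varDelta U_t\) by \(\tau \varDelta U_t\), the remainder \(F_{{\mathcal {S}}}(U_t+\tau \varDelta U_t)-F_{{\mathcal {S}}}(U_t)-\tau L_1-\tau ^2 L_2\) shrinks like \(\tau ^3\). The following Python sketch checks this for \(s=2\), \({\textbf{m}}=(1,2)\), \({\textbf{n}}=(3,2)\); the random instance is illustrative:

import numpy as np
from functools import reduce

# Finite-difference check of the second-order expansion (5.4),
# perturbing the second factor matrix (m_t = 2, so C(2,1)=2, C(2,2)=1).
rng = np.random.default_rng(3)
R, n1, n2 = 2, 3, 2

def rank1(vecs):
    return reduce(np.multiply.outer, vecs)

def model(U1, U2):                      # S_hat = sum_k u1 o u2 o u2
    return sum(rank1([U1[:, k], U2[:, k], U2[:, k]]) for k in range(R))

def inner(A, B):
    return np.tensordot(A, B, axes=A.ndim)

S = rng.standard_normal((n1, n2, n2))
S = 0.5 * (S + S.transpose(0, 2, 1))    # symmetric in the last two modes
U1, U2 = rng.standard_normal((n1, R)), rng.standard_normal((n2, R))
D = rng.standard_normal((n2, R))        # the perturbation Delta U_2

E = S - model(U1, U2)
SdU = sum(rank1([U1[:, k], D[:, k], U2[:, k]]) for k in range(R))
SUd = sum(rank1([U1[:, k], U2[:, k], D[:, k]]) for k in range(R))
Sdd = sum(rank1([U1[:, k], D[:, k], D[:, k]]) for k in range(R))

L1 = -2 * 2 * inner(E, SdU)             # -2 C(2,1) <S - S_hat, S_hat[D, U]>
L2 = -2 * inner(E, Sdd) + inner(SdU + SUd, SdU + SUd)

F = lambda U2_: inner(S - model(U1, U2_), S - model(U1, U2_))
for tau in [1e-1, 1e-2, 1e-3]:
    rem = F(U2 + tau * D) - F(U2) - tau * L1 - tau ** 2 * L2
    print(tau, rem)                     # remainder shrinks like tau**3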

It is well known that (1) if \(U_t\) is a stationary point of the function \(F_{{\mathcal {S}}}(U_t)\), then \(L_1=0\) for any \(\varDelta U_t\); (2) the function \(F_{{\mathcal {S}}}(U_t)\) is convex at the point \(U_t\) if and only if \(L_2\ge 0\) for any \(\varDelta U_t\). Based on these facts, we discuss how to determine whether a partially symmetric tensor has a positive partially symmetric CP decomposition.

Theorem 5.4

Assume that the orders \(m_1,\cdots ,m_s\) are even and \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) has a positive partially symmetric CP decomposition. Then the zero tensor \({\mathcal {O}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) is the global minimum point of the function \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\), and the global minimum value is \(F_{-{\mathcal {S}}}({\mathcal {O}})=\Vert {\mathcal {S}}\Vert _F^2\).

Proof

Note that \(F_{-{\mathcal {S}}}(U_t)=\Vert {\mathcal {S}}+\hat{{\mathcal {S}}}\Vert _F^2\). By Theorem 5.3, the terms \(L_1\) and \(L_2\) of \(F_{-{\mathcal {S}}}(U_t)\) are as follows

$$\begin{aligned} L_1= & {} 2 \left( {\begin{array}{c}m_t\\ 1\end{array}}\right) \langle {\mathcal {S}}+\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t},U_t^{\times (m_t-1)}]\rangle ,\\ L_2= & {} 2 \left( {\begin{array}{c}m_t\\ 2\end{array}}\right) \langle {\mathcal {S}}+\hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \nonumber \\{} & {} +\Vert \hat{{\mathcal {S}}}[\varDelta U_t,U_t^{\times (m_t-1)}]+\hat{{\mathcal {S}}}[U_t,\varDelta U_t,U_t^{\times (m_t-2)}]+\cdots +\hat{{\mathcal {S}}}[U_t^{\times (m_t-1)},\varDelta U_t]\Vert _F^2. \end{aligned}$$

Since \({\mathcal {S}}\) has a positive partially symmetric CP decomposition and the orders \(m_1\), \(m_2,\cdots ,m_s\) are even, according to Theorem 5.2, if \(U_i\ne {\textbf{0}}\) for all \(i\in [s],\ i\ne t\), then \(\langle {\mathcal {S}}, \hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \) is nonnegative, and \(\langle \hat{{\mathcal {S}}},\hat{{\mathcal {S}}}[{\varDelta U_t}^{\times 2},U_t^{\times (m_t-2)}]\rangle \) is positive if \(U_t\not = {\textbf{0}} \). Hence, \(L_2\) is positive at \(U_t\) for any \(\varDelta U_t\), i.e., the function \(F_{-{\mathcal {S}}}(U_t)\) is convex for every \(t\in [s]\) whenever \(U_i\ne {\textbf{0}}\) for all \(i\in [s],\ i\ne t\). Note that \(\hat{{\mathcal {S}}}={\mathcal {O}}\) if and only if \(U_i={\textbf{0}}\) for some \(i\in [s]\). Hence, \(\hat{{\mathcal {S}}}={\mathcal {O}}\) is the global minimum point of the function \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\) and the global minimum value is \(F_{-{\mathcal {S}}}({\mathcal {O}})=\Vert {\mathcal {S}}\Vert _F^2\). This completes the proof. \(\square \)

Assume that \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) is a partially symmetric tensor with even orders. From Theorem 5.4, if \({\mathcal {S}}\) has a positive partially symmetric CP decomposition, then \(U_t={\textbf{0}}\) is the global minimum point of the function \(F_{-{\mathcal {S}}}(U_t)\), and the global minimum value is equal to \(\Vert {\mathcal {S}}\Vert _F^2\). This yields the following corollary.

Corollary 5.1

Assume that \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) has even orders, and let the function \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\) be defined as in (5.1). If the zero tensor \({\mathcal {O}}\) is not the global minimum point of the function \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\), then the partially symmetric tensor \({\mathcal {S}}\) has no positive partially symmetric CP decomposition.

Proof

This result follows directly from Theorem 5.4. \(\square \)
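Corollary 5.1 suggests a numerical test: minimize \(F_{-{\mathcal {S}}}\) and check whether the computed minimizer is (numerically) the zero tensor. The following Python sketch applies the test with a rank-1 surrogate model and a generic BFGS routine standing in for Algorithm 3, on a tensor that by construction has no positive partially symmetric CP decomposition; all sizes are illustrative:

import numpy as np
from scipy.optimize import minimize

# Sketch of the test in Corollary 5.1 on a rank-1 surrogate: minimize
# F_{-S}(u, v) = ||S + u^2 o v^2||_F^2 and inspect the minimizer.
rng = np.random.default_rng(4)
a, b = rng.standard_normal(3), rng.standard_normal(2)
S = -np.einsum('i,j,k,l->ijkl', a, a, b, b)    # has no positive decomposition

def F_neg(z):
    u, v = z[:3], z[3:]
    T = np.einsum('i,j,k,l->ijkl', u, u, v, v)
    return np.sum((S + T) ** 2)

res = minimize(F_neg, rng.standard_normal(5), method='BFGS')
u, v = res.x[:3], res.x[3:]
Shat = np.einsum('i,j,k,l->ijkl', u, u, v, v)
# the minimizer is not the zero tensor and the value drops below ||S||_F^2,
# so Corollary 5.1 rules out a positive decomposition
print(np.linalg.norm(Shat), res.fun, np.sum(S ** 2))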

When dealing with the partially symmetric rank-R approximation of \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) with even orders, i.e., \(m_t\) is even for all \(t\in [s]\), we construct a new tensor \(\tilde{{\mathcal {S}}}\in S[1,{\textbf{m}}]{\mathbb {R}}[1,{\textbf{n}}]\) satisfying

$$\begin{aligned} \tilde{{\mathcal {S}}}_{1 i_1\cdots i_{m_1}j_1\cdots j_{m_2}\cdots l_1\cdots l_{m_s}} = {\mathcal {S}}_{i_1\cdots i_{m_1}j_1\cdots j_{m_2}\cdots l_1\cdots l_{m_s}}. \end{aligned}$$

We compute the partially symmetric rank-R approximation of \(\tilde{{\mathcal {S}}}\) by Algorithm 3 as

$$\begin{aligned} \tilde{{\mathcal {S}}} \approx [\![\tilde{U}_0,\tilde{U}_1^{\times m_1},\cdots ,\tilde{U}_s^{\times m_s}]\!], \end{aligned}$$

where \(\tilde{U}_t=\left( \tilde{{\textbf{u}}}_t^{(1)},\tilde{{\textbf{u}}}_t^{(2)},\cdots ,\tilde{{\textbf{u}}}_t^{(R)}\right) \), \(t\in [s]\). We compute \({\textbf{u}}_t^{(k)}\) and \(\lambda _k\), \(t\in [s]\), \(k\in [R]\), by the following formulas

$$\begin{aligned} \lambda _k=\tilde{u}_0^{(k)}\prod _{t=1}^{s}\Vert \tilde{{\textbf{u}}}_t^{(k)}\Vert _2^{m_t},\ \textrm{and}\ {\textbf{u}}_t^{(k)} = \tilde{{\textbf{u}}}_t^{(k)}/\Vert \tilde{{\textbf{u}}}_t^{(k)}\Vert _2. \end{aligned}$$

So, we obtain that

$$\begin{aligned} \varvec{\lambda }=(\lambda _1,\lambda _2, \cdots , \lambda _R)^T,\ U_t=\left( {\textbf{u}}_t^{(1)},{\textbf{u}}_t^{(2)},\cdots ,{\textbf{u}}_t^{(R)}\right) ,\ t\in [s]. \end{aligned}$$

That is, the general partially symmetric rank-R approximation of \({\mathcal {S}}\) is

$$\begin{aligned} \mathcal {S} \approx [\![{\varvec{\lambda }}; U_1^{\times m_1},\cdots ,U_s^{\times m_s}]\!]. \end{aligned}$$
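The embedding and rescaling above are simple bookkeeping once the factors of \(\tilde{{\mathcal {S}}}\) are available. A minimal Python sketch, with randomly generated placeholders standing in for the output \(\tilde{U}_0,\tilde{U}_1,\cdots ,\tilde{U}_s\) of Algorithm 3 (sizes follow \({\textbf{m}}=(2,2)\), \({\textbf{n}}=(3,2)\)):

import numpy as np

# Even-order post-processing: embed S into tilde_S with a leading mode
# of size 1, then turn a factorization of tilde_S into the weighted form
# [[lambda; U_1^{x m_1}, ..., U_s^{x m_s}]]. The factors tilde_U below
# are placeholders for what Algorithm 3 would return.
rng = np.random.default_rng(5)
m, n, R = (2, 2), (3, 2), 4

S = rng.standard_normal((3, 3, 2, 2))           # stand-in partially symmetric tensor
tilde_S = S.reshape((1,) + S.shape)             # tilde_S in S[1,m] R[1,n]

tilde_U0 = rng.standard_normal((1, R))          # placeholder factor matrices
tilde_U = [rng.standard_normal((n[t], R)) for t in range(len(n))]

norms = [np.linalg.norm(tilde_U[t], axis=0) for t in range(len(n))]
lam = tilde_U0[0] * np.prod([norms[t] ** m[t] for t in range(len(n))], axis=0)
U = [tilde_U[t] / norms[t] for t in range(len(n))]   # unit-norm columns
print(lam.shape, [Ut.shape for Ut in U])        # (R,), [(3, R), (2, R)]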

We give an algorithm, based on the BFGS method, to calculate the general partially symmetric rank-R approximation of partially symmetric tensors as follows.

Algorithm 4 General partially symmetric rank-R approximation of partially symmetric tensors by the BFGS method

6 Numerical examples

We give some examples in this section. All experiments were performed on a laptop with an Intel(R) Core(TM) i7-8550U CPU and 8.00 GB of RAM, using MATLAB 2018b on Microsoft Windows 10. All tensor computations use the Tensor Toolbox for MATLAB [29].

Example 6.1

(A random tensor) Consider the partially symmetric CP decomposition of a tensor \({\mathcal {S}}\in S[2,2]{\mathbb {R}}[3,2]\), generated randomly as follows:

$$\begin{aligned} {\mathcal {S}}(:,:,1,1)= & {} \begin{pmatrix} 1.4897&{}0.6715&{}1.6302\\ 0.6715&{}1.2075&{}0.4889\\ 1.6302&{}0.4889&{}1.0347 \end{pmatrix},\\ {\mathcal {S}}(:,:,1,2)= & {} \begin{pmatrix} 0.7296 &{}0.7873&{}-1.0689\\ 0.7873&{}0.8884&{}-0.8095\\ -1.0689&{}-0.8095&{}-2.9443 \end{pmatrix},\\ {\mathcal {S}}(:,:,2,2)= & {} \begin{pmatrix} 1.4384 &{}1.3703&{}-0.2412\\ 1.3703&{}-1.7115&{}0.3192\\ -0.2412&{}0.3192&{}0.3129 \end{pmatrix}. \end{aligned}$$
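By the partial symmetry, the three slices above determine the whole tensor: each displayed slice is a symmetric matrix, and \({\mathcal {S}}(:,:,2,1)={\mathcal {S}}(:,:,1,2)\). A minimal Python sketch of the assembly (0-based indices; the entries are copied from the slices above):

import numpy as np

# Assemble the full tensor S in S[2,2]R[3,2] from the three given slices,
# using the partial symmetry in the last two modes.
S = np.zeros((3, 3, 2, 2))
S[:, :, 0, 0] = [[1.4897, 0.6715, 1.6302],
                 [0.6715, 1.2075, 0.4889],
                 [1.6302, 0.4889, 1.0347]]
S[:, :, 0, 1] = [[0.7296, 0.7873, -1.0689],
                 [0.7873, 0.8884, -0.8095],
                 [-1.0689, -0.8095, -2.9443]]
S[:, :, 1, 0] = S[:, :, 0, 1]                  # partial symmetry, modes 3-4
S[:, :, 1, 1] = [[1.4384, 1.3703, -0.2412],
                 [1.3703, -1.7115, 0.3192],
                 [-0.2412, 0.3192, 0.3129]]
print(np.allclose(S, S.transpose(1, 0, 2, 3)),
      np.allclose(S, S.transpose(0, 1, 3, 2)))  # True True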

Firstly, we discuss whether the tensor \({\mathcal {S}}\) has a positive partially symmetric CP decomposition. We compute the minimum value of \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\), defined as in (5.1), for \(R=1,2,\cdots ,6\) by Algorithm 3 and report the minimum values min \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\), the norms of the minimum points \(\Vert \hat{{\mathcal {S}}}\Vert _F\), and the running times in Table 1. From the data in Table 1, it can be seen that the minimum points \(\hat{{\mathcal {S}}}\) are not the zero tensor. Hence, by Corollary 5.1, the tensor \({\mathcal {S}}\) has no positive partially symmetric CP decomposition.

Table 1 The minimum values of \(F_{-{\mathcal {S}}}(\hat{{\mathcal {S}}})\) for \(R=1,2,\cdots ,6\)

Secondly, we compute the positive partially symmetric rank-R approximation of \({\mathcal {S}}\) by Algorithm 3 for \(R=1,2,\cdots ,6\). We report the minimum values min \(F_{{\mathcal {S}}}(\hat{{\mathcal {S}}})\) and the running times in Table 2. It can be seen from Table 2 that the minimum values of the objective function are always greater than 14.

Table 2 The positive partially symmetric rank-R approximation of \({\mathcal {S}}\)

Finally, we consider the general partially symmetric CP decomposition of the tensor \({\mathcal {S}}\). We compute the partially symmetric rank-R approximation of the tensor \({{\mathcal {S}}}\) by Algorithm 4 for \(R=1,2,\cdots ,6\) and report the minimum values min \(F_{{\mathcal {S}}}(\hat{{\mathcal {S}}})\) and the running times in Table 3.

Table 3 The general partially symmetric rank-R approximation of \({{\mathcal {S}}}\)

Therefore, we obtain a general partially symmetric CP decomposition of the tensor \({\mathcal {S}}\) with \(R=6\) as

$$\begin{aligned} \hat{{\mathcal {S}}}=\sum \limits _{k=1}^6{\lambda }_k({\textbf{u}}_k)^{2}\circ ({\textbf{v}}_k)^{2}, \end{aligned}$$

where \(\lambda _k\in {\mathbb {R}}\), \({\textbf{u}}_k\in {\mathbb {R}}^3\) and \({\textbf{v}}_k\in {\mathbb {R}}^2\) for \(k=1,2,\cdots ,6\) are given in Table 4.

Table 4 A general partially symmetric CP decomposition \(\hat{{\mathcal {S}}}\)

Example 6.2

(Comparison of convergence speed between the gradient descent method and the BFGS method) A partially symmetric tensor \({\mathcal {S}}\in S[1,2]{\mathbb {R}}[n_1,n_2]\) is given by \( {\mathcal {S}}=[\![U_1,U_2^{\times 2}]\!] \), where \(U_1\in {\mathbb {R}}^{n_1\times R}\) and \(U_2\in {\mathbb {R}}^{n_2\times R}\) are generated randomly. We use Algorithm 2 and Algorithm 3 to calculate the rank-R approximation of the tensor \({\mathcal {S}}\), that is, its structure preserving CP decomposition. The termination condition of both methods is that the norm of the gradient is less than \(10^{-5}\) or the number of iterations reaches 5000. We consider four types of tensors: \((n_1, n_2, R)=(5,5,3)\), (5, 5, 5), (10, 10, 5), (10, 10, 10). For each type, we run the computation 10 times. The results are shown in Table 5, where ‘Iter’ denotes the average number of iterations, ‘Time’ denotes the average running time, and ‘Error’ denotes the average value of \(\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert _F\). From Table 5, we see that the running time of the BFGS method is less than that of the gradient descent method.
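A minimal sketch of this kind of comparison, assuming the loss \(f(U_1,U_2)=\Vert {\mathcal {S}}-[\![U_1,U_2^{\times 2}]\!]\Vert _F^2\) and using SciPy's generic BFGS routine in place of Algorithm 3 (the instance size, initial point, step rule, and tolerances are all illustrative):

import time
import numpy as np
from scipy.optimize import minimize

# Gradient descent with Armijo backtracking vs. BFGS on
# f(U1, U2) = ||S - [[U1, U2^{x2}]]||_F^2 for a small random instance.
rng = np.random.default_rng(6)
n1, n2, R = 5, 5, 3
U1s, U2s = rng.standard_normal((n1, R)), rng.standard_normal((n2, R))
S = np.einsum('ik,jk,lk->ijl', U1s, U2s, U2s)   # S = [[U1*, U2*^{x2}]]

def unpack(z):
    return z[:n1 * R].reshape(n1, R), z[n1 * R:].reshape(n2, R)

def f(z):
    U1, U2 = unpack(z)
    E = S - np.einsum('ik,jk,lk->ijl', U1, U2, U2)
    return np.sum(E ** 2)

def grad(z):                                     # gradient of f (E is
    U1, U2 = unpack(z)                           # symmetric in modes 2-3)
    E = S - np.einsum('ik,jk,lk->ijl', U1, U2, U2)
    g1 = -2 * np.einsum('ijl,jk,lk->ik', E, U2, U2)
    g2 = -4 * np.einsum('ijl,ik,lk->jk', E, U1, U2)
    return np.concatenate([g1.ravel(), g2.ravel()])

z0 = rng.standard_normal(n1 * R + n2 * R)

t0, z = time.time(), z0.copy()                   # gradient descent
for it in range(5000):
    g = grad(z)
    if np.linalg.norm(g) < 1e-5:
        break
    step = 1.0
    while f(z - step * g) > f(z) - 1e-4 * step * (g @ g):
        step *= 0.5                              # Armijo backtracking
    z -= step * g
print('GD  ', it, time.time() - t0, f(z))

t0 = time.time()                                 # BFGS
res = minimize(f, z0, jac=grad, method='BFGS', options={'gtol': 1e-5})
print('BFGS', res.nit, time.time() - t0, res.fun)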

Table 5 Comparison of gradient descent method and BFGS method

Example 6.3

(Rank partially symmetric CP decomposition) Assume that the orders \({\textbf{m}}\) of \({\mathcal {S}}\in S[{\textbf{m}}]{\mathbb {R}}[{\textbf{n}}]\) are even. If the tensor \({\mathcal {S}}\) has a positive partially symmetric CP decomposition, must its rank partially symmetric CP decomposition be a positive CP decomposition? Here, we assume that the tensor \({\mathcal {S}}\in S[2,2,2]{\mathbb {R}}[3,4,5]\) has the following form

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^r{\left( {{\textbf{u}}}_k\right) }^{2} \circ {\left( {{\textbf{v}}}_k\right) }^{2} \circ {\left( {{\textbf{w}}}_k\right) }^{2}, \end{aligned}$$

where \({\textbf{u}}_k\in {\mathbb {R}}^{3}\), \({\textbf{v}}_k\in {\mathbb {R}}^{4}\) and \({\textbf{w}}_k\in {\mathbb {R}}^{5}\) are randomly generated unit vectors for all \(k\in [r]\).

Let \(r=5\). The tensor \({\mathcal {S}}\) is obtained randomly as in Table 6.

Table 6 A random tensor \({\mathcal {S}}\)

We first compute the partially symmetric rank of \({\mathcal {S}}\) by Algorithm 4: we compute the general partially symmetric rank-R approximation for \(R=1,2,\cdots ,5\), with the results reported in Table 7. It is clear that the partially symmetric rank of the tensor \({\mathcal {S}}\) is equal to 5.

Table 7 The positive partially symmetric rank-R approximation of \({\mathcal {S}}\)

Furthermore, we obtain a general partially symmetric rank decomposition of the tensor \({\mathcal {S}}\) with \(R=5\) as in Table 8. It is observed that the rank partially symmetric CP decomposition is also a positive CP decomposition. When we take \(r=10\) or 15, the results are the same. These experiments suggest that if a tensor \({\mathcal {S}}\) has a positive partially symmetric CP decomposition, then its rank partially symmetric CP decomposition is also a positive CP decomposition.

Table 8 A rank partially symmetric CP decomposition of \({\mathcal {S}}\)

Example 6.4

(Comparison with the ALS method) Given a random nonsymmetric tensor \({\mathcal {S}}\in S[1,1,1]{{\mathbb {R}}}[3,4,5]\) as follows, we compare the computational efficiency of the ALS method and the BFGS method.

$$\begin{aligned} {\mathcal {S}}(1,:,:)= & {} \begin{pmatrix} 0.8147&{} 0.9572&{} 0.6787&{} 0.6948&{} 0.7094\\ 0.9134&{} 0.1419&{} 0.3922&{} 0.0344&{} 0.6797\\ 0.2785&{} 0.7922&{} 0.7060&{} 0.7655&{} 0.1189\\ 0.9649&{} 0.0357&{} 0.0462&{} 0.4898&{} 0.3403 \end{pmatrix},\\ {\mathcal {S}}(2,:,:)= & {} \begin{pmatrix} 0.9058&{} 0.4854&{} 0.7577&{} 0.3171&{} 0.7547\\ 0.6324&{} 0.4218&{} 0.6555&{} 0.4387&{} 0.6551\\ 0.5469&{} 0.9595&{} 0.0318&{} 0.7952&{} 0.4984\\ 0.1576&{} 0.8491&{} 0.0971&{} 0.4456&{} 0.5853 \end{pmatrix},\\ {\mathcal {S}}(3,:,:)= & {} \begin{pmatrix} 0.1270&{} 0.8003&{} 0.7431&{} 0.9502&{} 0.2760\\ 0.0975&{} 0.9157&{} 0.1712&{} 0.3816&{} 0.1626\\ 0.9575&{} 0.6557&{} 0.2769&{} 0.1869&{} 0.9597\\ 0.9706&{} 0.9340&{} 0.8235&{} 0.6463&{} 0.2238 \end{pmatrix}. \end{aligned}$$

We compute the CP-rank \( \textrm{rank} ({\mathcal {S}})=7\) by Algorithm 3. Hence, we take \(R=7\) and compute its CP decomposition ten times by the ALS method and the BFGS method, respectively. Each time, the initial point is chosen randomly. For the ALS method, the program code “cp_als” comes from the tensor toolbox [29]. The termination condition of both algorithms is that the number of iterations reaches 10,000 or the absolute error \(\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert _F\) is less than \(10^{-5}\). We report the number of iterations (‘it-number’), the running time, and the absolute error \(\Vert {{\mathcal {S}}-\hat{{\mathcal {S}}}}\Vert _F\) in Table 9. The average running times of Algorithm 3 and the ALS algorithm are 1.4743 seconds and 8.8574 seconds, respectively. The numerical results show that the BFGS method is more effective than the ALS method in terms of both calculation accuracy and calculation speed.
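For reference, one sweep of the ALS baseline solves a linear least squares problem per factor matrix. A bare-bones Python sketch of this kind of update (illustrating the iteration that “cp_als” performs, not its actual code) for a third-order tensor of the above size is:

import numpy as np

# One-sweep-at-a-time ALS for a 3-way CP model S ~ [[A, B, C]]; shapes
# follow Example 6.4 (3 x 4 x 5), rank R = 7. Loop and stopping rule
# are illustrative only.
rng = np.random.default_rng(7)
S = rng.random((3, 4, 5))
R = 7

def kr(X, Y):                                   # column-wise Khatri-Rao
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, X.shape[1])

A = rng.standard_normal((3, R))
B = rng.standard_normal((4, R))
C = rng.standard_normal((5, R))
for sweep in range(500):
    # normal equations per mode: X = S_(t) (Y kr Z) ((Y'Y) * (Z'Z))^+
    A = S.reshape(3, -1) @ kr(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
    B = S.transpose(1, 0, 2).reshape(4, -1) @ kr(A, C) \
        @ np.linalg.pinv((A.T @ A) * (C.T @ C))
    C = S.transpose(2, 0, 1).reshape(5, -1) @ kr(A, B) \
        @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    err = np.linalg.norm(S - np.einsum('ir,jr,kr->ijk', A, B, C))
    if err < 1e-5:
        break
print(sweep, err)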

Table 9 Comparison between ALS method and BFGS method

Example 6.5

(Comparison with the ALS method) We compare the stability and computing speed of the BFGS algorithm and the ALS algorithm. The BFGS algorithm is the structure preserving CP decomposition method proposed in this paper, while the ALS algorithm is the usual CP decomposition method and does not preserve the structure. In this numerical example, the partially symmetric tensors have the following form

$$\begin{aligned} {\mathcal {S}}=\sum \limits _{k=1}^R{\left( {{\textbf{u}}}_k\right) }^{m_1} \circ {\left( {{\textbf{v}}}_k\right) }^{m_2}\circ {\left( {{\textbf{w}}}_k\right) }^{m_3}, \end{aligned}$$

where \({\textbf{u}}_k\in {\mathbb {R}}^{n_1}\), \({\textbf{v}}_k\in {\mathbb {R}}^{n_2}\) and \({\textbf{w}}_k\in {\mathbb {R}}^{n_3}\) are randomly generated for all \(k\in [R]\).

For each parameter tuple \(({\textbf{m}}, {\textbf{n}}, R)\), we randomly generate the tensor \({\mathcal {S}}\) ten times and use Algorithm 3 and the ALS algorithm to compute its CP decomposition, where the program code “cp_als” comes from the tensor toolbox [29]. The termination condition of both algorithms is that the number of iterations reaches 10,000 or the relative error \(\Vert {\mathcal {S}}-\hat{{\mathcal {S}}}\Vert _F/\Vert {\mathcal {S}}\Vert _F\) is less than \(10^{-5}\); a run is called successful if the relative error is less than \(10^{-5}\). We report the total running time and the number of successful runs for each tuple \(({\textbf{m}}, {\textbf{n}}, R)\) in Table 10. The numerical results show that the BFGS method is more stable and faster than the ALS method.

Table 10 Comparison with stability between ALS method and BFGS method

7 Conclusion

In this paper, we study the numerical problems of structure preserving rank-R approximation and structure preserving CP decomposition of partially symmetric tensors. For the structure preserving rank-R approximation problem, we deduce the gradient formula of the objective function, obtain the BFGS iterative formula in tensor form, propose a BFGS algorithm for positive partially symmetric rank-R approximation, and discuss the convergence of the algorithm. For the structure preserving CP decomposition problem, we give a necessary condition for partially symmetric tensors with even orders to have a positive partially symmetric CP decomposition, and design a general partially symmetric rank-R algorithm to obtain structure preserving CP decompositions. Finally, some numerical examples are given. We compute the partially symmetric CP decomposition of random partially symmetric tensors. The numerical examples suggest that if a tensor has a positive partially symmetric CP decomposition, then its rank partially symmetric CP decomposition is also a positive CP decomposition. Meanwhile, in some numerical examples, we compare the BFGS algorithm proposed in this paper with the standard CP-ALS method.

When \(m_1, m_2, \ldots , m_s\) are all even, it is difficult to decide whether \({\mathcal {S}}\) has a positive partially symmetric CP decomposition and to compute one. In particular, when \(m_1=m_2=\cdots =m_s=2\), the tensor \({\mathcal {S}}\) can be regarded as a real Hermitian tensor, and \({\mathcal {S}}\) is real Hermitian separable if and only if \({\mathcal {S}}\) has a positive partially symmetric CP decomposition; see reference [30].