1 Introduction

A tensor is a multi-dimensional array whose order is the number of dimensions, also known as ways or modes. In the past few years, tensors have attracted significant attention in several applications, such as image processing, machine learning, and scientific computing [15, 20]. One of the most popular problems in tensor-based modeling is the following equation, known as the Sylvester tensor equation

$$ \mathcal{X}\times_{1} A^{(1)} + \mathcal{X} \times_{2} A^{(2)} + {\cdots} + \mathcal{X} \times_{N} A^{(N)} = \mathcal{B}, $$
(1)

where the matrices \(A^{(n)}\in \mathbb {R}^{I_{n}\times I_{n}}, n=1,2,\cdots , N\), and the right-hand side tensor \({\mathscr{B}} \in \mathbb {R}^{I_{1} \times I_{2} \times {\cdots } \times I_{N}}\) are known, and \(\mathcal {X}\in \mathbb {R}^{I_{1} \times I_{2} \times {\cdots } \times I_{N}}\) is the unknown tensor. The product ×i, i = 1, … , N, and some notation related to tensors will be specified in the next section. For simplicity, in the sequel, we define the following linear operator

$$ \begin{array}{cccc} \mathcal{M} : & \mathbb{R}^{I_{1}\times {\cdots} \times I_{N}} & \longrightarrow & \mathbb{R}^{I_{1}\times {\cdots} \times I_{N}}\\ & \mathcal{X} & \longmapsto & \mathcal{M}\left( \mathcal{X}\right):= {\sum}_{i=1}^{N}\mathcal{X}\times_{i} A^{(i)}. \end{array} $$
(2)

It is easy to verify that (1) is equivalent to the following linear system of equations

$$ \mathbb{A}x=b, $$
(3)

with \(\mathbb {A}={\sum }_{n=1}^{N} I_{I_{N}}\otimes \cdots \otimes I_{I_{n+1}}\otimes A^{(n)}\otimes I_{I_{n-1}}\otimes {\cdots }\otimes I_{I_{1}}, x=vec\left (\mathcal {X}\right ),\) and \(b=vec\left ({\mathscr{B}}\right ),\) where ⊗ denotes the Kronecker product (defined in the next section), \(I_{I_{n}}\) stands for the identity matrix of order In and the operator vec rearranges a tensor's elements into a vector. It is well known that the Sylvester tensor equation (1) has a unique solution if and only if λ1 + λ2 + … + λN ≠ 0, for all λi ∈ σ(A(i)), where σ(A(i)) denotes the spectrum of A(i) [6, Lemma 4.2].
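For small mode sizes, the equivalence between (1) and (3) can be illustrated directly in Matlab; the following sketch (with purely illustrative sizes and N = 3) forms the Kronecker-structured matrix \(\mathbb{A}\) and solves the small system. It is, of course, not a practical solver for large In.

```matlab
% Minimal sketch of (3) for N = 3 (illustrative sizes only): forming the
% Kronecker-structured matrix is feasible only for small mode sizes.
I = [4, 5, 6];                             % mode sizes I_1, I_2, I_3
A = {rand(I(1)), rand(I(2)), rand(I(3))};  % coefficient matrices A^{(n)}
B = rand(I);                               % right-hand side tensor
AA = kron(eye(I(2)*I(3)), A{1}) ...        % I kron I kron A^{(1)}
   + kron(kron(eye(I(3)), A{2}), eye(I(1))) ... % I kron A^{(2)} kron I
   + kron(A{3}, eye(I(1)*I(2)));           % A^{(3)} kron I kron I
x = AA \ B(:);                             % x = vec(X), column-major vec
X = reshape(x, I);                         % solution tensor of (1)
```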

Note that the coefficient matrix of the linear system (3) is of order \(\displaystyle \prod \limits _{i=1}^{N}I_{i}\), which may become too large even for moderate values of I1, … , IN, so that solving this linear system can be a real challenge. When \(\mathcal {X}\) is a tensor of order two, i.e., a matrix, (1) reduces to

$$ A^{(1)}X+XA^{(2)T}=B, $$
(4)

which is exactly the well-known Sylvester matrix equation; it has been widely used in control and communication theory, image restoration, and numerical methods for ordinary differential equations (see [4] and the references therein). The Sylvester tensor equation (1) can arise when discretizing high dimensional linear partial differential equations using finite difference or spectral methods [19, 21, 24]; a survey of tensor-structured numerical methods applied to multidimensional problems in scientific computing is given in [15]. As an example of the applications of (1), one can think of some Laplace-like operator L in an N-dimensional domain

$$ \begin{array}{@{}rcl@{}} Lu&=&f \quad \text{in } {\Omega},\\ u&=&0 \quad \text{on } \partial{\Omega}. \end{array} $$

In this paper, we are interested in the case where the discretized right-hand side in (1) is of low rank; this is possible when the function f is sufficiently smooth to be well approximated by a short sum of separable functions (see, e.g., [3, 11]). In general, the right-hand side tensor \({\mathscr{B}}\) can be approximated by a low rank tensor using the CP decomposition [16].

In recent years, various methods have been proposed to solve (1). For instance, the tensor format of the GMRES method (GMRES-BTF) has been established by Chen and Lu [6]. In [7], gradient-based iterative algorithms have been proposed for solving (1); they are based on the hierarchical identification principle [9] and tensor arithmetic concepts. Kressner and Tobler proposed a tensor Krylov subspace method to solve (3) when the right-hand side is given in a tensor product structure, i.e., is of rank one; the idea is based on applying a standard Krylov subspace method to the coefficient matrices, in order to approximate the solution by a vector of low tensor rank [17]. Ballani and Grasedyck presented an iterative scheme similar to a Krylov subspace method to solve (3), relying on a truncation operator implemented in the hierarchical Tucker format [16] to allow applications in high dimensions [2]. Some well-known Krylov subspace methods have been studied in their tensor format by Beik et al. in [1]; the authors have described the tensor format of the full orthogonalization method (FOM-BTF) and conjugate gradient (CG-BTF)–type iterative algorithms. These methods are attractive if the coefficient matrices are not large, since they are based on the use of the tensor Krylov subspace associated with the operator \({\mathscr{M}}\) defined by (2).

In the spirit of Krylov subspace methods, we apply an Arnoldi-based algorithm to the coefficient matrices to get a reduced Sylvester tensor equation, which can be solved by a global iterative scheme. The approximate solution is then constructed from the solution of the reduced equation and the Krylov subspace bases associated with the coefficient matrices.

The rest of this paper is organized as follows. In Section 2, we give the notation adopted in this paper and some basic definitions and properties related to tensors. Section 3 is dedicated to theoretical results when the right-hand side tensor in (1) is of rank one. In Section 4, we present our approaches to solve (1) with a right-hand side tensor of a specific rank; we give the block Krylov approach in Section 4.1 and the global Krylov approach in Section 4.2. Some numerical examples are presented in Section 5 to evaluate the performance of our approaches. Finally, in Section 6, we give a brief conclusion and perspectives.

2 Notations and preliminary concepts

In this section, we introduce some basic definitions, tensor notations, and common operations related to tensors adopted in this paper (for more details, see [16]). Throughout this paper, vectors (tensors of order one), matrices (tensors of order two), and higher order tensors (order three or higher) are signified by lower-case letters, capital letters, and Euler script letters, respectively. A tensor \(\mathcal {X}\) is an element of \(\mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}}\), where \(I_{1},I_{2},\cdots ,I_{N}\in \mathbb {N}\); its entries are denoted by \(\mathcal {X}_{i_{1}i_{2}{\cdots } i_{N}},\) with in ∈{1,⋯ , In}, for every 1 ≤ n ≤ N. Fibers are the higher order analogue of matrix rows and columns. A fiber is defined by fixing every index but one. The n-mode fiber is denoted by \(\mathcal {X}_{i_{1},{\cdots } i_{n-1},:,i_{n+1},{\cdots } ,i_{N}}\in \mathbb {R}^{I_{n}}\), where the indices ij are fixed for j = 1, ⋯ , n − 1, n + 1, ⋯ , N. The notation \(\overline {i_{1}{\ldots } i_{N}}\) corresponds to a multi-index, which is obtained as follows

$$\overline{i_{1}{\ldots} i_{N}}=i_{N}+(i_{N-1}-1)I_{N}+{\ldots} +(i_{1}-1)I_{2}{\ldots} I_{N}.$$

The inner product of two same size tensors \(\mathcal {A},{\mathscr{B}}\in \mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}}\) is defined by

$$\langle\mathcal{A},\mathcal{B}\rangle := \sum\limits_{i_{1}=1}^{I_{1}}\sum\limits_{i_{2}=1}^{I_{2}}\cdots\sum\limits_{i_{N}=1}^{I_{N}}{\mathcal{A}}_{i_{1}{\cdots} i_{N}}{\mathcal{B}}_{i_{1}{\cdots} i_{N}},$$

and the norm induced by this inner product is

$$\|\mathcal{A}\|=\sqrt{\langle\mathcal{A},\mathcal{A}\rangle}.$$

Definition 1

[16, 18]

Let \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}} \) be an N th-order tensor and \( U\in \mathbb {R}^{J\times I_{n}} \) be a matrix. The n-mode product of \( \mathcal {A} \) and U, denoted by \(\mathcal {A}\times _{n}U \), is a tensor of size

$$ I_{1}\times I_{2}\times\cdots\times I_{n-1}\times J\times I_{n+1}\times\cdots\times I_{N},$$

whose entries are given by

$$(\mathcal{A}\times_{n}U)_{i_{1}{\cdots} i_{n-1}ji_{n+1}{\cdots} i_{N}}=\sum\limits_{i_{n}=1}^{I_{n}}\mathcal{A}_{i_{1}{\cdots} i_{N}}U_{ji_{n}}.$$

For \(\mathcal {X}\in \mathbb {R}^{I_{1}\times {\cdots }\times I_{N}}\) and {A} a set of matrices \(A_{n}\in \mathbb {R}^{I_{n}\times I_{n}}, n=1,2,{\cdots } ,N\), their multiplication in all possible modes (n = 1, 2, ... , N) is denoted by

$$\mathcal{X}\times\lbrace A \rbrace :=\mathcal{X}\times_{1}A_{1}\times_{2}A_{2}\cdots\times_{N}A_{N},$$

and

$$\mathcal{X}\times\lbrace A\rbrace^{T} :=\mathcal{X}\times_{1}{A_{1}^{T}}\times_{2}{A_{2}^{T}}\cdots\times_{N}{A_{N}^{T}}.$$
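In Matlab, the n-mode product can be implemented with permute and reshape; the routine below is a minimal sketch of Definition 1 (the name tmprod and its interface are ours, not those of a particular toolbox).

```matlab
function Y = tmprod(X, U, n)
% n-mode product Y = X x_n U (Definition 1): every n-mode fiber of the
% tensor X is multiplied by the matrix U (of size J x I_n).
dims = size(X); N = ndims(X);
perm = [n, 1:n-1, n+1:N];
Xn = reshape(permute(X, perm), dims(n), []);  % n-mode unfolding of X
dims(n) = size(U, 1);                         % mode n becomes J
Y = ipermute(reshape(U * Xn, dims(perm)), perm);
end
```

The product \(\mathcal{X}\times\lbrace A\rbrace\) is then obtained by chaining calls, e.g., tmprod(tmprod(X, A1, 1), A2, 2); by Proposition 1 below, the order of the calls is irrelevant for distinct modes.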

Proposition 1

[16, 18]

Let\(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\)be an N th-order tensor, \(U\in \mathbb {R}^{J\times I_{m}}\), \(V\in \mathbb {R}^{K\times I_{n}}\)and\(W\in \mathbb {R}^{I_{n}\times I_{n}}\)be three matrices, then for distinct modes in a series of multiplication, the order of the multiplication is irrelevant, i.e.,

$$\mathcal{A}\times_{m}U\times_{n}V=\mathcal{A}\times_{n}V\times_{m}U.$$

If the modes are the same, then

$$\mathcal{A}\times_{n}W\times_{n}V=\mathcal{A}\times_{n}VW.$$

Definition 2

[16]

The outer product of two tensors \(\mathcal {A}\in \mathbb {R}^{I_{1}\times I_{2}\times \ldots \times I_{N}}\) and \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times J_{2}\times \cdots \times J_{M}}\) is a tensor denoted by \(\mathcal {A}\circ {\mathscr{B}}=\mathcal {C}\in \mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}\times J_{1}\times J_{2}\times \cdots \times J_{M}}\).

Elementwise,

$$\mathcal{C}_{i_{1}{\ldots} i_{N}j_{1}{\ldots} j_{M}}=\mathcal{A}_{i_{1},{\ldots} ,i_{N}}\mathcal{B}_{j_{1},{\ldots} ,j_{M}}.$$

If v1, v2, … , vN are N vectors of sizes Ii, i = 1, … , N, their outer product is an N th-order tensor of size I1 ×… × IN and we have

$$(v_{1}\circ\cdots\circ v_{N})_{i_{1},{\cdots} ,i_{N}}=v_{1}(i_{1}){\cdots} v_{N}(i_{N}).$$

Definition 3

[16]

An N th-order tensor \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times \cdots \times I_{N}} \) is of rank one if it can be written as the outer product of N vectors \(v_{k}\in \mathbb {R}^{I_{k}}, k=1,2,\cdots ,N\), i.e.,

$$\mathcal{A}=v_{1}\circ v_{2}\circ\cdots\circ v_{N}.$$

A tensor is of rank \(R\in \mathbb {N}\) if it can be written as the sum of R rank one tensors.

Definition 4

[16, 18]

The Kronecker product of two matrices \(A\in \mathbb {R}^{I_{1}\times I_{2}}\) and \(B\in \mathbb {R}^{J_{1}\times J_{2}}\) is a matrix of size I1J1 × I2J2 denoted by A ⊗ B, where

$$ A\otimes B= \left( \begin{array}{ccccc} a_{11}B & {\cdots} & a_{1I_{2}}B\\ {\vdots} & {\ddots} & \vdots\\ a_{I_{1}1}B & {\cdots} & a_{I_{1}I_{2}}B \end{array}\right). $$

The Kronecker product of two tensors \(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\) and \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\) is defined by

$$\mathcal{C}=\mathcal{A}\otimes \mathcal{B}\in\mathbb{R}^{I_{1}J_{1}\times\cdots\times I_{N}J_{N}},$$

where

$$\mathcal{C}_{\overline{i_{1}j_{1}},{\cdots} ,\overline{i_{N}j_{N}}}=\mathcal{A}_{i_{1},{\cdots} ,i_{N}}\mathcal{B}_{j_{1},{\cdots} ,j_{N}},$$

for in = 1, ⋯ , In, jn = 1, ⋯ , Jn, n = 1, ⋯ , N.

In the following remark, we state the link between the Kronecker product of three vectors and their outer product.

Remark 1

[8, Page 33]

Let v1, v2, and v3 be 3 vectors of sizes I1, I2, and I3, respectively, we have

$$vec(v_{1}\circ v_{2}\circ v_{3})=v_{3}\otimes v_{2}\otimes v_{1}.$$

It is easy to verify that the above remark remains valid for N vectors. This property shows that the idea in Section 3 of the current paper can be considered as a reformulation of the one in [17].
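Since Matlab's column-major vectorization coincides with the vec operator used here, Remark 1 is easy to check numerically; a small sketch with illustrative sizes:

```matlab
% Numerical check of Remark 1 (illustrative sizes).
v1 = rand(3,1); v2 = rand(4,1); v3 = rand(5,1);
T = zeros(3,4,5);                          % T = v1 o v2 o v3
for i = 1:3
  for j = 1:4
    for k = 1:5
      T(i,j,k) = v1(i)*v2(j)*v3(k);
    end
  end
end
norm(T(:) - kron(v3, kron(v2, v1)))        % zero up to round-off
```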

Proposition 2

[18]

Let \(\mathcal {A}=a_{1}\circ a_{2}\circ {\cdots } \circ a_{N}\) and \({\mathscr{B}}=b_{1}\circ b_{2}\circ {\cdots } \circ b_{N}\) denote two rank one tensors, then, the Kronecker product \(\mathcal {A}\otimes {\mathscr{B}}\) can be expressed by

$$\mathcal{A}\otimes\mathcal{B}=(a_{1}\otimes b_{1})\circ\cdots\circ(a_{N}\otimes b_{N}).$$

It is well known that if A, B, C, and D are four matrices, we have

$$(A\otimes B)(C\otimes D)=(AC)\otimes(BD).$$

We give an elegant generalization of this property to tensors as follows

Proposition 3

Let\(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\), \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\)be two tensors, and\(A\in \mathbb {R}^{K_{n}\times I_{n}}\)and\(B\in \mathbb {R}^{L_{n}\times J_{n}}\)be two matrices, then

$$\left( \mathcal{A}\otimes\mathcal{B}\right)\times_{n}\left( A\otimes B\right) =\left( \mathcal{A}\times_{n}A\right)\otimes\left( \mathcal{B}\times_{n}B\right).$$

Proof

$$ \begin{array}{@{}rcl@{}} \left( \left( \mathcal{A}\times_{n}A\right)\otimes\left( \mathcal{B}\times_{n}B\right)\right)_{\overline{i_{1}j_{1}}\cdots\overline{k_{n}l_{n}}\cdots\overline{i_{N}j_{N}}}&=&\left( \mathcal{A}\times_{n}A\right)_{i_{1}{\cdots} k_{n}{\cdots} i_{N}}\left( \mathcal{B}\times_{n}B\right)_{j_{1}{\cdots} l_{n}{\cdots} j_{N}}\\ &=&\sum\limits_{i_{n}=1}^{I_{n}}\mathcal{A}_{i_{1}{\cdots} i_{n}{\cdots} i_{N}}A_{k_{n}i_{n}}\sum\limits_{j_{n}=1}^{J_{n}}\mathcal{B}_{j_{1}{\cdots} j_{n}{\cdots} j_{N}}B_{l_{n}j_{n}}\\ &=&\sum\limits_{i_{n}=1}^{I_{n}}\sum\limits_{j_{n}=1}^{J_{n}}\left( \mathcal{A}\otimes\mathcal{B}\right)_{\overline{i_{1}j_{1}}\cdots\overline{i_{n}j_{n}}\cdots\overline{i_{N}j_{N}}}\left( A\otimes B\right)_{\overline{k_{n}l_{n}},\overline{i_{n}j_{n}}}\\ &=&\left( \left( \mathcal{A}\otimes\mathcal{B}\right)\times_{n}\left( A\otimes B\right)\right)_{\overline{i_{1}j_{1}}\cdots\overline{k_{n}l_{n}}\cdots\overline{i_{N}j_{N}}}. \end{array} $$

□

It is easy to verify the following result

Proposition 4

Let\(\mathcal {A}=a_{1}\circ a_{2}\circ {\cdots } \circ a_{N}\)be a rank one tensor and {V} a set of N matrices {V1, V2, … , VN}, then we have

$$\mathcal{A}\times\lbrace V\rbrace=V_{1}a_{1}\circ\ldots\circ V_{N}a_{N}.$$

Definition 5 (CP decomposition [16])

Let \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}} \) be an N th-order tensor. The CP decomposition of \(\mathcal {A}\) is

$$\mathcal{A}=\sum\limits_{r=1}^{R} a_{r}^{(1)} \circ a_{r}^{(2)}\circ\cdots\circ a_{r}^{(N)},$$

where \(a_{r}^{(k)}\) are vectors of size Ik with 1 ≤ kN. If we define \(A_{n}=\left [a_{1}^{(n)} a_{2}^{(n)}{\cdots } a_{R}^{(n)} \right ] \) for \(n\in \left \lbrace 1,{\cdots } ,N \right \rbrace \), the CP decomposition can be symbolically written as

$$ \mathcal{A}= A_{1} \circ A_{2}\circ\cdots\circ A_{N},$$

the matrices \(A_{n}\in \mathbb {R}^{I_{n}\times R}\) are called factor matrices. Often, the vectors \(a_{r}^{(n)}\) are chosen such that \(\Vert a_{r}^{(n)} \Vert =1\). In this case, the CP decomposition is written as

$$ \mathcal{A}=\sum\limits_{r=1}^{R} \lambda_{r} a_{r}^{(1)} \circ a_{r}^{(2)}\circ\cdots\circ a_{r}^{(N)},$$

where λr is a scalar that compensates for the magnitudes of the vectors \( a_{r}^{(n)}\).
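For illustration, the full tensor of a rank-R CP decomposition can be assembled from its factor matrices by combining Remark 1 with Matlab's reshape; a minimal sketch with illustrative sizes and random factors:

```matlab
% Build the full tensor of a rank-R CP decomposition (Definition 5)
% from factor matrices A1, A2, A3 (random here, for illustration).
R = 4; I = [6, 7, 8];
A1 = rand(I(1),R); A2 = rand(I(2),R); A3 = rand(I(3),R);
T = zeros(I);
for r = 1:R                                % sum of R rank one tensors
  T = T + reshape(kron(A3(:,r), kron(A2(:,r), A1(:,r))), I);
end
```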

In the following, we denote by m = (m1, m2, ⋯ , mN) a multi-index that represents all the Krylov subspaces dimensions, \(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}} \), \(e_{1}^{(m_{i})}=\left [1,0,{\ldots } ,0\right ]^{T}\in \mathbb {R}^{m_{i}} \), \(E_{m_{i}}=\left [0_{R},{\ldots } ,0_{R},I_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\), \(E_{1}^{(m_{i})}=\left [I_{R},0_{R}{\ldots } ,0_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\) and \(e_{r}^{(R)}\in \mathbb {R}^{R}\) the r th vector from the canonical basis of \(\mathbb {R}^{R}\), where 0R and IR correspond to the square zero matrix and the identity matrix of order R, respectively.

3 Rank one right-hand side tensor

In this section, we assume that the right-hand side tensor in (1) is of rank one; i.e., it can be written as follows

$$\mathcal{B}=b_{1}\circ b_{2}\circ\cdots\circ b_{N}$$

where \(b_{i}\in \mathbb {R}^{I_{i}}, i=1,{\ldots } ,N\). Applying the Arnoldi algorithm (Section 6.3 in [23]) to the pairs (A(i), bi), i = 1, … , N, leads to the following relations, for i = 1, … , N,

$$ A^{(i)}V_{m_{i}}=V_{m_{i}}H_{m_{i}}+h_{m_{i}+1}v_{m_{i}+1}e_{m_{i}}^{T}, $$
(5)

and

$$ V_{m_{i}}^{T} A^{(i)}V_{m_{i}}=H_{m_{i}}, $$
(6)

where the first vector of each basis \(V_{m_{i}}\) is exactly the normalized vector

$$v_{1}^{(i)}=\frac{b_{i}}{\|b_{i}\|}.$$
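A minimal Matlab sketch of the Arnoldi process producing (5) and (6) might look as follows (modified Gram-Schmidt variant; breakdown, i.e., a zero \(h_{m+1}\), is not handled).

```matlab
function [V, H, hnext, vnext] = arnoldi(A, b, m)
% Arnoldi process applied to the pair (A, b), yielding relation (5):
% A*V = V*H + hnext*vnext*e_m', with V'*A*V = H as in (6).
n = size(A, 1);
V = zeros(n, m); H = zeros(m, m);
V(:,1) = b / norm(b);                      % v_1 = b/||b||
for j = 1:m
    w = A * V(:,j);
    for i = 1:j                            % modified Gram-Schmidt
        H(i,j) = V(:,i)' * w;
        w = w - H(i,j) * V(:,i);
    end
    if j < m
        H(j+1,j) = norm(w);
        V(:,j+1) = w / H(j+1,j);
    else
        hnext = norm(w); vnext = w / hnext;
    end
end
end
```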

We consider the following approximate solution of (1)

$$\mathcal{X}_{m}=\mathcal{Y}_{m}\times\lbrace V_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\in \mathbb {R}^{m_{1}\times \cdots \times m_{N}}\) and {Vm} is the set of matrices \(\lbrace V_{m_{1}},{\cdots } ,V_{m_{N}}\rbrace \).

The associated residual tensor is given by

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}. \end{array} $$

We consider the Petrov-Galerkin condition in its tensor format as follows

$$\mathcal{R}_{m}\times\lbrace V_{m}\rbrace^{T}=0,$$

hence

$$ \begin{array}{@{}rcl@{}} 0&=\mathcal{B}\times\lbrace V_{m}\rbrace^{T}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}\times_{1} V_{m_{1}}^{T}\cdots\times_{N} V_{m_{N}}^{T}\\ &=V_{m_{1}}^{T}b_{1}\circ V_{m_{2}}^{T}b_{2}\circ\cdots\circ V_{m_{N}}^{T}b_{N}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\\ &=\upbeta e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ\cdots\circ e_{1}^{(m_{N})}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}} \end{array} $$

where \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|b_{i}\|\). Thus, the reduced Sylvester tensor equation is given as follows

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}=\upbeta \mathcal{E}_{m}, $$
(7)

where \(\mathcal {E}_{m}=e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ \cdots \circ e_{1}^{(m_{N})}\). The following proposition gives the associated residual tensor.

Proposition 5

Let\(\mathcal {Y}_{m}\)be the solution of (7) and, for i = 1, … , N, let \(V_{m_{i}}\)be the bases obtained by applying the Arnoldi algorithm to the pairs (A(i), bi), then

$$\mathcal{R}_{m}=-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}.$$

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Using the relations (5) and the expression of the right-hand side \({\mathscr{B}}\), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=b_{1}\circ b_{2}\circ\cdots\circ b_{N}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\times\lbrace V_{m}\rbrace\\ &-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Taking into consideration the fact that \(b_{i}=\|b_{i}\|v_{1}^{(i)}=\|b_{i}\|V_{m_{i}}e_{1}^{(m_{i})}\) and using Proposition 4, it follows that

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=\left( \upbeta e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ\cdots\circ e_{1}^{(m_{N})}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\right)\times\lbrace V_{m}\rbrace\\ &-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Invoking (7), the result is achieved. □

Theorem 1

Let\(\mathcal {R}_{m}\)be the corresponding residual, then

$$\|\mathcal{R}_{m}\|=\left( {\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\|^{2}}\right)^{1/2},$$

where\(\mathcal {Y}_{m}\)is the solution of (7) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

$$ \begin{array}{@{}rcl@{}} \|\mathcal{R}_{m}\|^{2} &=&\langle\mathcal{R}_{m},\mathcal{R}_{m}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i} +1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}\|^{2}\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\langle\mathcal{Y}_{m}\times_{i} e_{m_{i}}e_{m_{i}}^{T},\mathcal{Y}_{m}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\langle\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T},\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\|^{2}, \end{array} $$

where the second equality holds since the cross terms vanish (\(V_{m_{i}}^{T}v_{m_{i}+1}=0\)), and the remaining steps use the orthonormality of the bases \(V_{m_{i}}\). □
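In practice, Theorem 1 allows the residual norm to be monitored from the small tensor \(\mathcal{Y}_m\) alone, without forming \(\mathcal{X}_m\). A sketch for N = 3 (assuming the tensor Y and the scalars h(i) = \(h_{m_i+1}\) are already available, and using the hypothetical tmprod routine from Section 2):

```matlab
% Residual norm via Theorem 1 (sketch, N = 3): Y solves (7) and
% h(i) = h_{m_i+1} comes from the Arnoldi relations (5); X_m is never formed.
m = size(Y);
res2 = 0;
for i = 1:3
    emiT = [zeros(1, m(i)-1), 1];          % e_{m_i}^T as a row vector
    Yi = tmprod(Y, emiT, i);               % Y x_i e_{m_i}^T
    res2 = res2 + h(i)^2 * norm(Yi(:))^2;
end
res = sqrt(res2);
```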

4 Rank R right-hand side tensor

Inspired by the work addressing the Sylvester matrix equation in [10, 12], and by the fact that an N th-order tensor can be decomposed using the CP decomposition of Definition 5, we propose two approaches to extract approximate solutions to (1) with low rank right-hand sides; accordingly, we assume in the following that the right-hand side is of rank R, i.e.,

$$\mathcal{B} = \sum\limits_{r=1}^{R} b_{1}^{(r)}\circ\cdots\circ b_{N}^{(r)},$$

where \(b_{i}^{(r)}\in \mathbb {R}^{I_{i}}\), for \(i\in \left \lbrace 1,{\ldots } ,N \right \rbrace \), and \(r\in \left \lbrace 1,{\ldots } ,R \right \rbrace \). We set for i = 1,2, ⋯ , N

$$B^{(i)}=\left[b_{i}^{(1)},b_{i}^{(2)},{\cdots} ,b_{i}^{(R)}\right].$$

Straightforward computations show that the right-hand side tensor can also be written as follows

$$\mathcal{B}=\mathcal{I}_{R}\times_{1} B^{(1)}\cdots\times_{N} B^{(N)},$$

where \(\mathcal {I}_{R}\), called the identity tensor, is the N th-order tensor of size R ×… × R with ones along the super-diagonal; a small numerical check of this identity is sketched below. In the following two subsections, we will show how to extract approximate solutions to (1), via block and global Krylov methods (for more details about the block and global Arnoldi algorithms, we refer the reader to Section 6.1 in [23] and to [13], respectively).
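The following Matlab sketch (illustrative sizes, N = 3; tmprod is the n-mode product routine sketched in Section 2) checks the identity numerically.

```matlab
% Check B = I_R x_1 B^{(1)} x_2 B^{(2)} x_3 B^{(3)} against the CP form.
R = 3; I = [4, 5, 6];
B1 = rand(I(1),R); B2 = rand(I(2),R); B3 = rand(I(3),R);
IR = zeros(R, R, R);
for r = 1:R, IR(r,r,r) = 1; end            % identity tensor I_R
Bt = tmprod(tmprod(tmprod(IR, B1, 1), B2, 2), B3, 3);
Bcp = zeros(I);
for r = 1:R                                % CP form of the same tensor
  Bcp = Bcp + reshape(kron(B3(:,r), kron(B2(:,r), B1(:,r))), I);
end
norm(Bt(:) - Bcp(:))                       % zero up to round-off
```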

4.1 Block Krylov approach

Let \(\mathbb {U}_{m_{i}}=\left [U_{1}^{(i)},{\ldots } ,U_{m_{i}}^{(i)}\right ] \) and \(\mathbb {H}_{m_{i}}\) be the matrices obtained by applying the block Arnoldi algorithm to the pairs (A(i), B(i)), i = 1,2, … , N, starting with \(U_{1}^{(i)}=Q^{(i)}\), where Q(i) is obtained from the QR factorization of B(i), i.e., B(i) = Q(i)R(i); then the following relation holds, for i = 1, … , N,

$$ A^{(i)}\mathbb{U}_{m_{i}}=\mathbb{U}_{m_{i}}\mathbb{H}_{m_{i}}+U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}. $$
(8)
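A minimal Matlab sketch of the block Arnoldi process yielding (8) might look as follows (block modified Gram-Schmidt; loss of rank in the blocks is not handled).

```matlab
function [U, H, Unext, Hnext] = block_arnoldi(A, B, m)
% Block Arnoldi applied to (A, B), with B of full column rank R, giving
% relation (8): A*U = U*H + Unext*Hnext*E_m'.
[~, R] = size(B);
n = size(A, 1);
U = zeros(n, m*R); H = zeros(m*R, m*R);
[U(:,1:R), ~] = qr(B, 0);                  % U_1 = Q^{(i)} from B = Q*R
for j = 1:m
    cols = (j-1)*R+1 : j*R;
    W = A * U(:,cols);
    for i = 1:j                            % block modified Gram-Schmidt
        rows = (i-1)*R+1 : i*R;
        H(rows,cols) = U(:,rows)' * W;
        W = W - U(:,rows) * H(rows,cols);
    end
    [Q, Rj] = qr(W, 0);
    if j < m
        U(:, j*R+1 : (j+1)*R) = Q;
        H(j*R+1 : (j+1)*R, cols) = Rj;
    else
        Unext = Q; Hnext = Rj;             % last block: H_{m+1,m}
    end
end
end
```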

An approximate solution is given by

$$\mathcal{X}_{m} = \mathcal{Y}_{m}\times\lbrace \mathbb{U}_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\in \mathbb {R}^{{Rm_{1}\times \cdots \times Rm_{N}}}\) and \(\lbrace \mathbb {U}_{m}\rbrace =\lbrace \mathbb {U}_{m_{1}},{\ldots } ,\mathbb {U}_{m_{N}}\rbrace \).

We consider the following Petrov-Galerkin condition of orthogonality

$$ \mathcal{R}_{m}\times\lbrace \mathbb{U}_{m}\rbrace^{T} = 0, $$
(9)

with \(\mathcal {R}_{m}\) the residual tensor, which is given by

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}(\mathcal{X}_{m})\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times\lbrace \mathbb{U}_{m}\rbrace \times_{i}A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i} A^{(i)}\mathbb{U}_{m_{i}}\cdots\times_{N}\mathbb{U}_{m_{N}}. \end{array} $$

Using the relation (8) and the condition (9), we obtain

$$0=\mathcal{B}\times\lbrace\mathbb{U}_{m}\rbrace^{T}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}.$$

Using the expression of the right-hand side tensor \({\mathscr{B}}\) and the fact that \(U_{1}^{(i)}\), i = 1, … , N, are obtained from QR factorizations of B(i), i = 1, … , N, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{B}\times\lbrace\mathbb{U}_{m}\rbrace^{T}&=&\mathcal{I}_{R}\times_{1}\mathbb{U}_{m_{1}}^{T}B^{(1)}\cdots\times_{N} \mathbb{U}_{m_{N}}^{T} B^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \mathbb{U}_{m_{1}}^{T}U_{1}^{(1)}R^{(1)}\cdots\times_{N} \mathbb{U}_{m_{N}}^{T}U_{1}^{(N)}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} E_{1}^{(m_{1})}R^{(1)}\cdots\times_{N} E_{1}^{(m_{N})}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \tilde{R}^{(1)}\cdots\times_{N} \tilde{R}^{(N)}, \end{array} $$

where \(\tilde {R}^{(i)}=E_{1}^{(m_{i})}R^{(i)}\).

Then, the low dimensional Sylvester tensor equation is given as follows

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}=\mathcal{B}_{m}, $$
(10)

where \({\mathscr{B}}_{m}=\mathcal {I}_{R}\times _{1} \tilde {R}^{(1)}\cdots \times _{N} \tilde {R}^{(N)}\).

Proposition 6

Let\(\mathcal {Y}_{m}\)be the solution of (10); the residual tensor associated with the approximate solution\(\mathcal {X}_{m}=\mathcal {Y}_{m}\times \lbrace \mathbb {U}_{m}\rbrace \)is

$$\mathcal{R}_{m}=-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i}U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{U}_{m_{N}}.$$

Proof

We have

$$ \mathcal{R}_{m}=\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i} A^{(i)}\mathbb{U}_{m_{i}}\cdots\times_{N}\mathbb{U}_{m_{N}}.$$

Since \(U_{1}^{(i)}\), i = 1, … , N, are obtained from QR factorizations of B(i), i = 1, … , N, the right-hand side tensor \({\mathscr{B}}\) can be written as follows

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=&\mathcal{I}_{R}\times_{1} B^{(1)}\cdots\times_{N} B^{(N)}\\ &=&\mathcal{I}_{R}\times_{1}U_{1}^{(1)}R^{(1)}\cdots\times_{N}U_{1}^{(N)}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1}\mathbb{U}_{m_{1}}E_{1}^{(m_{1})}R^{(1)}\cdots\times_{N}\mathbb{U}_{m_{N}}E_{1}^{(m_{N})}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \tilde{R}^{(1)}\cdots\times_{N} \tilde{R}^{(N)}\times\lbrace\mathbb{U}_{m}\rbrace\\ &=&\mathcal{B}_{m}\times\lbrace\mathbb{U}_{m}\rbrace . \end{array} $$

Using the expression above and the relation (8), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\left( \mathcal{B}_{m}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}\right)\times\lbrace\mathbb{U}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i}U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{U}_{m_{N}}. \end{array} $$

Invoking (10), the result in the proposition is achieved. □

The following theorem can be established in the same way as Theorem 1 in Section 3.

Theorem 2

Let\(\mathcal {R}_{m}\)be the corresponding residual to the approximate solution obtained by the block Arnoldi approach, then

$$\|\mathcal{R}_{m}\|=\left( {\sum\limits_{i=1}^{N}\|\mathcal{Y}_{m}\times_{i}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\|^{2}}\right)^{1/2},$$

where\(\mathcal {Y}_{m}\)is the solution of (10) and\(E_{m_{i}}=\left [0_{R},{\ldots } ,0_{R},I_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\).

The block Arnoldi algorithm for the Sylvester tensor equation is summarized in Algorithm 1. We end this section by pointing out that the solution of (10) is of size Rm1 × Rm2 × ⋯ × RmN, which may become large even for moderate values of R and mi. Solving (10), as required in step 3 of Algorithm 1, can then be challenging; this motivates the method proposed in the following section.

[Algorithm 1: block Arnoldi method for the Sylvester tensor equation (1)]

4.2 Global Krylov approach

The previous block Krylov approach reduced (1) to a low dimensional Sylvester tensor equation whose solution is of size Rm1 × Rm2 × ⋯ × RmN, while the global Krylov approach constructs an approximate solution from an m1 ×… × mN tensor.

Let \(\mathbb {V}_{m_{i}}=\left [ V_{1}^{(i)},\cdots ,V_{m_{i}}^{(i)}\right ] \) be the matrices obtained by applying the global Arnoldi algorithm to the pairs (A(i), B(i)), i = 1,2, … , N, starting with \(V_{1}^{(i)}=B^{(i)}/\|B^{(i)}\|\); then the following relations hold, for i = 1, … , N,

$$ A^{(i)}\mathbb{V}_{m_{i}} = \mathbb{V}_{m_{i}}\left( H_{m_{i}}\otimes I_{R}\right)+h_{m_{i}+1,m_{i}}V_{m_{i}+1}E_{m_{i}}^{T}. $$
(11)
$$ A^{(i)}\mathbb{V}_{m_{i}}=\mathbb{V}_{m_{i}+1}\left( \tilde{H}_{m_{i}}\otimes I_{R}\right). $$
(12)

An approximate solution is given by

$$\mathcal{X}_{m}=(\mathcal{Y}_{m}\otimes\mathcal{I}_{R})\times\lbrace\mathbb{V}_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\) is an m1 × ⋯ × mN tensor satisfying the low dimensional Sylvester tensor equation

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}H_{m_{i}}=\upbeta\mathcal{E}_{m}, $$
(13)

with \(\mathcal {E}_{m}=e_{1}^{(m_{1})}\circ \cdots \circ e_{1}^{(m_{N})}\) and \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|B^{(i)}\|\).

Proposition 7

Let\(\mathcal {X}_{m}\)be the approximate solution obtained by the global Arnoldi approach and\(\mathcal {R}_{m}\)be the corresponding residual, then

$$ \mathcal{R}_{m}=-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}\cdots\times_{N}\mathbb{V}_{m_{N}}, $$
(14)

where\(\mathcal {Y}_{m}\)is the solution of (13) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-{\sum}_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times \lbrace \mathbb{V}_{m}\rbrace\times_{i} A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{i}\left( H_{m_{i}}\otimes I_{R}\right)\times\lbrace\mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Using Proposition 3 and the fact that \(E_{m_{i}}=e_{m_{i}}\otimes I_{R}\), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Since the first blocks in each basis are taken from the right-hand side tensor \({\mathscr{B}}\), we have

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=&\sum\limits_{r=1}^{R} b_{1}^{(r)}\circ\cdots\circ b_{N}^{(r)}\\ &=&\sum\limits_{r=1}^{R} \|B^{(1)}\| V_{1}^{(1)}(:,r)\circ\cdots\circ\|B^{(N)}\|V_{1}^{(N)}(:,r)\\ &=&\upbeta\sum\limits_{r=1}^{R} \left( E_{1}^{(m_{1})}e_{r}^{(R)}\circ\cdots\circ E_{1}^{(m_{N})}e_{r}^{(R)}\right)\times\lbrace\mathbb{V}_{m}\rbrace , \end{array} $$

where \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|B^{(i)}\|\).

Using the fact that \(E_{1}^{(m_{i})}e_{r}^{(R)}=e_{r}^{(Rm_{i})}=e_{1}^{(m_{i})}\otimes e_{r}^{(R)}\) and Proposition 2, we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=\upbeta\sum\limits_{r=1}^{R} (e_{1}^{(m_{1})}\circ\cdots\circ e_{1}^{(m_{N})})\otimes (e_{r}^{(R)}\circ\cdots\circ e_{r}^{(R)})\times\lbrace \mathbb{V}_{m}\rbrace\\ &=\upbeta(e_{1}^{(m_{1})}\circ\cdots\circ e_{1}^{(m_{N})})\otimes\sum\limits_{r=1}^{R}(e_{r}^{(R)}\circ\cdots\circ e_{r}^{(R)})\times\lbrace \mathbb{V}_{m}\rbrace , \end{array} $$

then

$$ \mathcal{B}=\upbeta\mathcal{E}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace. $$
(15)

Now, we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\left( \upbeta\mathcal{E}_{m}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)\otimes\mathcal{I}_{R}\times\lbrace \mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Invoking (13), the result is achieved. □

Lemma 1

Let\(\mathbb {V}_{m_{i}}\)be the bases generated by the global Arnoldi algorithm applied to the pairs (A(i), B(i)), and let\(\mathcal {X}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\)with Ji = Rmiand\(\mathcal {Z}\in \mathbb {R}^{K_{1}\times \cdots \times K_{N}}\)with Ki = mi. Then

$$ \|\mathcal{X}\times_{i}\mathbb{V}_{m_{i}}\|\leq\|\mathcal{X}\|, $$
(16)
$$ \|\left( \mathcal{Z}\otimes\mathcal{I}_{R}\right)\times_{i}\mathbb{V}_{m_{i}}\|=\|\mathcal{Z}\|. $$
(17)

Proof

We set \(\mathcal {P}=\mathcal {X}\times _{i}\mathbb {V}_{m_{i}}\); the i th-mode fibers of the tensor \(\mathcal {P}\) can be written as follows

$$ \begin{array}{@{}rcl@{}} \mathcal{P}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}&=&\mathbb{V}_{m_{i}}\mathcal{X}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\\ &=&\sum\limits_{k=1}^{m_{i}}V_{k}\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T}. \end{array} $$

As the squared norm of a tensor can be expressed as the sum of the squared norms of all its i-mode fibers, it follows that

$$ \begin{array}{@{}rcl@{}} \|\mathcal{P}\|^{2} & = & \sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\|\mathcal{P}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\|^{2}\\ & = & \sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\|\sum\limits_{k=1}^{m_{i}}V_{k}\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T} \|^{2}. \end{array} $$

Since the blocks of \(\mathbb {V}_{m_{i}}=\left [ V_{1},{\ldots } ,V_{m_{i}}\right ]\) are orthonormal with respect to the Frobenius inner product, we obtain

$$ \begin{array}{@{}rcl@{}} \|\mathcal{P}\|^{2}&\leq&\sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\sum\limits_{k=1}^{m_{i}}\|\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T}\|^{2}\\ &\leq&\|\mathcal{X}\|^{2}, \end{array} $$

therefore (16) is achieved.

We set \(\mathcal {Q}=\left (\mathcal {Z}\otimes \mathcal {I}_{R}\right )\times _{i}\mathbb {V}_{m_{i}}\); if \(\mathcal {Q}_{(\overline {r_{1}j_{1}},{\ldots } ,:,{\ldots } ,\overline {r_{N}j_{N}})}, \mathcal {Z}_{(j_{1},{\ldots } ,:,{\ldots } ,j_{N})}\), and \({\mathcal {I}_{R}}_{(r_{1},{\ldots } ,:,\ldots ,r_{N})}\) denote the i th-mode fibers of \(\mathcal {Q}, \mathcal {Z}\), and \(\mathcal {I}_{R}\), respectively, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{Q}_{(\overline{r_{1}j_{1}},{\ldots} ,:,{\ldots} ,\overline{r_{N}j_{N}})}&=\mathbb{V}_{m_{i}}\left( \mathcal{Z}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\right)\\ &=\sum\limits_{k=1}^{m_{i}}V_{k}\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\\ &=\sum\limits_{k=1}^{m_{i}}\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}V_{k}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}. \end{array} $$

As the blocks of \(\mathbb {V}_{m_{i}}=\left [ V_{1},{\ldots } ,V_{m_{i}}\right ]\) are orthonormal with respect to the Frobenius inner product, we obtain

$$ \begin{array}{@{}rcl@{}} \|\mathcal{Q}_{(\overline{r_{1}j_{1}},{\ldots} ,:,{\ldots} ,\overline{r_{N}j_{N}})}\|^{2}&=&\sum\limits_{k=1}^{m_{i}}\vert\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}\vert^{2}\|V_{k}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\|^{2}\\ &=&\|\mathcal{Z}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\|^{2}, \end{array} $$

then (17) is achieved. □

In the following theorem, we give an upper bound for the residual norm.

Theorem 3

Let\(\mathcal {X}_{m}\)be the approximate solution obtained by the global Arnoldi approach and\(\mathcal {R}_{m}\)be the corresponding residual, then

$$ \|\mathcal{R}_{m}\|\leq\left( {\sum\limits_{i=1}^{N}\vert h_{m_{i}+1,m_{i}}\vert^{2}\|\mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\|^{2}}\right)^{1/2} , $$
(18)

where\(\mathcal {Y}_{m}\)is the solution of (13) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times \lbrace \mathbb{V}_{m}\rbrace\times_{i} A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\ldots\times_{i}A^{(i)}\mathbb{V}_{m_{i}}\ldots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Using the relations (12) and (15), we obtain

$$\mathcal{R}_{m}=\upbeta\mathcal{E}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\times_{i}\tilde{H}_{m_{i}}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\ldots\times_{i}\mathbb{V}_{m_{i}+1}\ldots\times_{N}\mathbb{V}_{m_{N}}.$$

Let the tensors \(\mathcal {Z}_{m}^{(i)}\in \mathbb {R}^{(m_{1}+1)\times \cdots \times (m_{N}+1)}\), for i = 1, … , N, be defined by

$$ \left( \mathcal{Z}_{m}^{(i)}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} \left( \mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)_{j_{1},{\ldots} ,j_{N}}, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)_{j_{1},{\ldots} ,j_{i-1},1,j_{i+1},{\ldots} ,j_{N}}, & j_{i}=m_{i}+1 \text{ and } j_{k}\in\{1,{\ldots} ,m_{k}\},\ k\neq i,\\ 0, & \text{otherwise,} \end{array} \right. $$

and the tensor \(\mathcal {F}_{m}\in \mathbb {R}^{(m_{1}+1)\times \cdots \times (m_{N}+1)}\), defined by

$$ \left( \mathcal{F}_{m}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} \left( \mathcal{E}_{m}\right)_{j_{1},{\ldots} ,j_{N}}, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ 0, & \text{otherwise.} \end{array} \right. $$

Then

$$\mathcal{R}_{m}=\upbeta\mathcal{F}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace-\sum\limits_{i=1}^{N}\mathcal{Z}_{m}^{(i)}\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}+1}\ldots\times_{i}\mathbb{V}_{m_{i}+1}\ldots\times_{N}\mathbb{V}_{m_{N}+1}.$$

By setting \(\tilde {\mathcal {Z}}_{m}={\sum }_{i=1}^{N}\mathcal {Z}_{m}^{(i)}\), we obtain

$$ \mathcal{R}_{m}=\left( \upbeta\mathcal{F}_{m}-\tilde{\mathcal{Z}}_{m}\right)\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace . $$

We set \(\mathcal {Z}_{m}^{(0)}=\upbeta \mathcal {F}_{m}-\tilde {\mathcal {Z}}_{m}\); invoking (13), we have

$$ \left( \mathcal{Z}_{m}^{(0)}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} 0, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ -h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)_{j_{1},{\ldots} ,j_{i-1},1,j_{i+1},{\ldots} ,j_{N}}, & j_{i}=m_{i}+1 \text{ and } j_{k}\in\{1,{\ldots} ,m_{k}\},\ k\neq i, \end{array} \right. $$

and

$$\mathcal{R}_{m}=\mathcal{Z}_{m}^{(0)}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace .$$

By applying the relation (16) of Lemma 1 (N − 1) times to \(\|\mathcal {R}_{m}\|\), we obtain

$$\|\mathcal{R}_{m}\|^{2}\leq\|\mathcal{Z}_{m}^{(0)}\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}+1}\|^{2}.$$

Then, the relation (17) of lemma 1 leads to

$$ \begin{array}{@{}rcl@{}} \|\mathcal{R}_{m}\|^{2}&\leq&\|\mathcal{Z}_{m}^{(0)}\|^{2}\\ &\leq&{\sum}_{i=1}^{N}\vert h_{m_{i}+1,m_{i}}\vert^{2}\|\mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\|^{2}. \end{array} $$

□

The global Arnoldi algorithm for the Sylvester tensor equation is summarized in Algorithm 2.

[Algorithm 2: global Arnoldi method for the Sylvester tensor equation (1)]
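A minimal Matlab sketch of the global Arnoldi process used in Algorithm 2 might look as follows (the blocks are orthonormalized with respect to the Frobenius inner product, giving relation (11); breakdown is not handled).

```matlab
function [V, H, hnext, Vnext] = global_arnoldi(A, B, m)
% Global Arnoldi applied to (A, B): F-orthonormal blocks V_j, giving
% relation (11): A*V = V*kron(H, eye(R)) + hnext*Vnext*E_m'.
[n, R] = size(B);
V = zeros(n, m*R); H = zeros(m, m);
V(:,1:R) = B / norm(B, 'fro');             % V_1 = B/||B||
for j = 1:m
    W = A * V(:, (j-1)*R+1 : j*R);
    for i = 1:j
        Vi = V(:, (i-1)*R+1 : i*R);
        H(i,j) = W(:)' * Vi(:);            % Frobenius inner product
        W = W - H(i,j) * Vi;
    end
    if j < m
        H(j+1,j) = norm(W, 'fro');
        V(:, j*R+1 : (j+1)*R) = W / H(j+1,j);
    else
        hnext = norm(W, 'fro'); Vnext = W / hnext;
    end
end
end
```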

4.3 Complexity consideration

In this section, we present the number of operations required to apply the global or block Arnoldi algorithm to the coefficient matrices. For the sake of simplicity, we consider the 3-mode case, i.e., \({\mathscr{B}}\in \mathbb {R}^{n\times n\times n}\) and \(A^{(i)}\in \mathbb {R}^{n\times n}, i=1,2,3\). The number of operations in the Arnoldi process associated with computing \(A^{(i)}V_{j}^{(i)}\) is

$$(2nz(A^{(i)})-1)R,$$

where nz(A(i)) refers to the number of non-zero entries of the matrix A(i). So, for three matrices and after m iterations, the total cost is

$$mR[2(nz(A^{(1)}) + nz(A^{(2)}) + nz(A^{(3)})) -3],$$

which is 3mR(2n2 − 1) in the worst case when A(i), i = 1,2,3, are full matrices.

5 Numerical examples

In this section, we present three numerical examples to show the effectiveness of our approaches for solving (1), including the case of large-scale coefficient matrices (Example 1). The low dimensional Sylvester tensor equations (10) and (13) are solved by the GLS-BTF algorithm given in [14] when the size of the reduced equation is small, or by the recursive algorithm presented in [5]. The numerical experiments were performed on a 2.7-GHz Intel Core i5 with 8 GB of 1600-MHz DDR3 memory, using Matlab R2016a. In all the examples, the right-hand side tensor is either constructed randomly or constructed so that the exact solution \(\mathcal {X}^{\ast }\) is known. Note that each cycle corresponds to \(k^{\prime }=5\) iterations in all examples. The stopping criterion used is

$$\|\mathcal{R}_{m}\|<\epsilon,$$

where 𝜖 is a given tolerance and \(\mathcal {R}_{m}\) is the m th residual associated with the approximate solution \(\mathcal {X}_{m}\).

5.1 Example 1

We point out that this example is restricted to the special case of the Sylvester tensor equation with N = 3, i.e.,

$$\mathcal{X}\times_{1} A^{(1)} + \mathcal{X} \times_{2} A^{(2)}+\mathcal{X} \times_{3} A^{(3)} = \mathcal{B}.$$

In this first example, the coefficient matrices are taken from [25, Example 35.1] and have the same size n. They are generated by the Matlab commands eye and rand as follows

$$A^{(i)}=\texttt{eye}(n) + \frac{0.5}{\sqrt{n}}\ \texttt{rand}(n), \quad \text{for } i=1,2,3,$$

and the right-hand side tensor \({\mathscr{B}}\) is chosen so that the exact solution \(\mathcal {X}^{\ast }\) is a random tensor of rank r, i.e.,

$$\mathcal{X}^{\ast}=\sum\limits_{k=1}^{r} x_{1}^{(k)}\circ x_{2}^{(k)}\circ x_{3}^{(k)},$$

where \(x_{i}^{(k)}=\texttt {rand}(n,1)\), for i = 1, 2, 3, and k = 1, … , r. Notice that, in this case, straightforward computations show that the right-hand side tensor is of rank R = 3r. We point out that the Krylov subspace dimensions are chosen to be the same, i.e., m1 = m2 = m3 = m, since the coefficient matrices are the same. We take n in the set {500, 10,000} and the tolerance 𝜖 in the set {10− 12, 10− 9}, respectively. The numerical results are reported in Table 1. We point out that the CPU time does not cover the construction of the approximate solution; it covers only the construction of the Krylov subspace bases and the solution of the reduced Sylvester tensor equation. For n = 500, we give both the exact error and the residual norm, whereas for n = 10,000, we give only the residual norm, due to the time needed to construct the whole approximate solution tensor. The larger CPU time needed for the block method is caused by the computational expense of the block Arnoldi algorithm and by the computation of the solution of the reduced Sylvester tensor equation, of order mR × mR × mR, for increasing m; for this reason, the remaining examples are run using only the global method.
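For reference, the data of this example can be reproduced along the following lines (a sketch with a deliberately small n so that the full tensors fit in memory; tmprod is the n-mode product routine sketched in Section 2).

```matlab
% Sketch of the setup of Example 1 (small n for illustration).
n = 50; r = 2;
A = cell(1,3);
for i = 1:3
    A{i} = eye(n) + 0.5/sqrt(n)*rand(n);   % coefficient matrices
end
Xstar = zeros(n, n, n);                    % exact solution of rank r
for k = 1:r
    x1 = rand(n,1); x2 = rand(n,1); x3 = rand(n,1);
    Xstar = Xstar + reshape(kron(x3, kron(x2, x1)), n, n, n);
end
B = tmprod(Xstar, A{1}, 1) + tmprod(Xstar, A{2}, 2) ...
  + tmprod(Xstar, A{3}, 3);                % B = M(X*), of rank R = 3r
```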

Table 1 Example 1

5.2 Example 2

In this example, the coefficient matrices A(i), i = 1, … , N, are obtained from the 5-point discretization of the Poisson equation on an n0-by-n0 mesh, with homogeneous Dirichlet boundary conditions. We use the following Matlab command to generate A(i), i = 1, … , N,

$$A^{(i)}=\texttt{gallery}('poisson',n_{0})$$

The matrices A(i) are then of size \({n_{0}^{2}}\times {n_{0}^{2}}\). For this experiment, we take n in the set {400, 121, 100} and N in the set {3, 4}; the tolerance 𝜖 is set to 10− 6. We construct the right-hand side tensor so that all the components of the exact solution \(\mathcal {X}^{\ast }\) are one (the numerical results are reported in Table 2). We run the same example with a random rank R right-hand side tensor (see Table 3).

Table 2 \(\mathcal {X}^{\ast }=ones(n,n,n)\) for N = 3, n = 400 and \(\mathcal {X}^{\ast }=ones(n,n,n,n)\) for N = 4, n = 100
Table 3 The right-hand side \({\mathscr{B}}\) is a random tensor of rank R = 5

5.3 Example 3

Here, we keep the same data as in the previous example, except for the coefficient matrices A(i), i = 1, … , N, which are obtained from the discretization of the operator

$$Lu:={\Delta} u-f_{1}(x,y)\frac{\partial u}{\partial x}+f_{2}(x,y)\frac{\partial u}{\partial y}+g(x,y),$$

on the unit square \(\left [ 0,1\right ]\times \left [ 0,1\right ] \) with homogeneous Dirichlet boundary conditions. The number of inner grid points in each direction is n0 for the operator L, and the dimensions of the matrices A(i), i = 1, … , N, are \(I_{i} = {n_{0}^{2}}\). The discretization of the operator L yields matrices extracted from the Lyapack package [22], using the command fdm, denoted as

$$A^{(i)} = \texttt{fdm}(n_{0},f_{1}(x,y),f_{2}(x,y),g(x,y)),$$

with \(f_{1}(x,y) = e^{xy}\), \(f_{2}(x,y) = \sin (xy)\), and \(g(x,y) = y^{2}-x^{2}\). The numerical results for this example are reported in Tables 2 and 3.

6 Conclusion

In this paper, we have proposed new approaches to extract approximate solutions to (1) with low rank right-hand sides. The first approach is based on the use of the block Arnoldi algorithm for the coefficient matrices in (1), which leads to the reduced Sylvester tensor equation (10). The second approach is based on the use of the global Arnoldi algorithm, in order to obtain the low dimensional Sylvester tensor equation (13). We gave the expressions of the residuals and the residual norms for each approach. Numerical examples show that our approaches lead to satisfactory results when applied to (1). Combining our approaches with some tensor decompositions in order to work with full rank right-hand sides is a future project.