1 Introduction

A tensor is a multi-dimensional array whose order is the number of dimensions, also known as ways or modes. In the past few years, tensors have attracted significant attention in several applications, such as image processing, machine learning, and scientific computing [15, 20]. One of the most popular problems in tensor-based modeling is the following equation, known as the Sylvester tensor equation

$$ \mathcal{X}\times_{1} A^{(1)} + \mathcal{X} \times_{2} A^{(2)} + {\cdots} + \mathcal{X} \times_{N} A^{(N)} = \mathcal{B}, $$
(1)

where the matrices \(A^{(n)}\in \mathbb {R}^{I_{n}\times I_{n}}, n=1,2,\cdots , N\), and the right-hand side tensor \({\mathscr{B}} \in \mathbb {R}^{I_{1} \times I_{2} \times {\cdots } \times I_{N}}\) are known, and \(\mathcal {X}\in \mathbb {R}^{I_{1} \times I_{2} \times {\cdots } \times I_{N}}\) is the unknown tensor. The product ×i, i = 1, … , N, and some notation related to tensors will be specified in the next section. For simplicity, in the sequel, we define the following linear operator

$$ \begin{array}{cccc} \mathcal{M} : & \mathbb{R}^{I_{1}\times {\cdots} \times I_{N}} & \longrightarrow & \mathbb{R}^{I_{1}\times {\cdots} \times I_{N}}\\ & \mathcal{X} & \longmapsto & \mathcal{M}\left( \mathcal{X}\right):= {\sum}_{i=1}^{N}\mathcal{X}\times_{i} A^{(i)}. \end{array} $$
(2)

It is easy to verify that (1) is equivalent to the following linear system of equations

$$ \mathbb{A}x=b, $$
(3)

with \(\mathbb {A}={\sum }_{n=1}^{N} I_{I_{N}}\otimes \cdots \otimes I_{I_{n+1}}\otimes A^{(n)}\otimes I_{I_{n-1}}\otimes {\cdots }\otimes I_{I_{1}}, x=vec\left (\mathcal {X}\right ),\) and \(b=vec\left ({\mathscr{B}}\right ),\) where ⊗ denotes the Kronecker product (defined in the next section), \(I_{I_{n}}\) stands for the identity matrix of order In and the operator vec rearranges a tensor's elements into a vector. It is well known that the Sylvester tensor equation (1) has a unique solution if and only if λ1 + λ2 + … + λN ≠ 0, for all λi ∈ σ(A(i)), where σ(A(i)) denotes the spectrum of A(i) [6, Lemma 4.2].
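For small mode sizes, the equivalence between (1) and (3) can be illustrated directly in Matlab; the following sketch (with purely illustrative sizes and N = 3) forms the Kronecker-structured matrix \(\mathbb{A}\) and solves the small system. It is, of course, not a practical solver for large In.

```matlab
% Minimal sketch of (3) for N = 3 (illustrative sizes only): forming the
% Kronecker-structured matrix is feasible only for small mode sizes.
I = [4, 5, 6];                             % mode sizes I_1, I_2, I_3
A = {rand(I(1)), rand(I(2)), rand(I(3))};  % coefficient matrices A^{(n)}
B = rand(I);                               % right-hand side tensor
AA = kron(eye(I(2)*I(3)), A{1}) ...        % I kron I kron A^{(1)}
   + kron(kron(eye(I(3)), A{2}), eye(I(1))) ... % I kron A^{(2)} kron I
   + kron(A{3}, eye(I(1)*I(2)));           % A^{(3)} kron I kron I
x = AA \ B(:);                             % x = vec(X), column-major vec
X = reshape(x, I);                         % solution tensor of (1)
```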

Note that the coefficient matrix of the linear system (3) is of order \(\displaystyle \prod \limits _{i=1}^{N}I_{i}\), which may become too large even for moderate values of I1, … , IN, so that solving this linear system can be a real challenge. When \(\mathcal {X}\) is a tensor of order two, i.e., a matrix, (1) reduces to

$$ A^{(1)}X+XA^{(2)T}=B, $$
(4)

which is exactly the well-known Sylvester matrix equation; it has been widely used in control and communication theory, image restoration, and numerical methods for ordinary differential equations (see [4] and the references therein). The Sylvester tensor equation (1) can arise when discretizing high dimensional linear partial differential equations using finite difference or spectral methods [19, 21, 24]; a survey of tensor-structured numerical methods applied to multidimensional problems in scientific computing is given in [15]. As an example of the applications of (1), one can think of some Laplace-like operator L in an N-dimensional domain

$$ \begin{array}{@{}rcl@{}} Lu&=&f \quad \text{in } {\Omega},\\ u&=&0 \quad \text{on } \partial{\Omega}. \end{array} $$

In this paper, we are interested in the case where the discretized right-hand side in (1) is of low rank; this is possible when the function f is sufficiently smooth to be well approximated by a short sum of separable functions (see, e.g., [3, 11]). In general, the right-hand side tensor \({\mathscr{B}}\) can be approximated by a low rank tensor using the CP decomposition [16].

In recent years, various methods have been proposed to solve (1). For instance, the tensor format of the GMRES method (GMRES-BTF) has been established by Chen and Lu [6]. In [7], gradient-based iterative algorithms have been proposed for solving (1); they are based on the hierarchical identification principle [9] and tensor arithmetic concepts. Kressner and Tobler proposed a tensor Krylov subspace method to solve (3) when the right-hand side is given in a tensor product structure, i.e., is of rank one; the idea is based on applying a standard Krylov subspace method to the coefficient matrices, in order to approximate the solution by a vector of low tensor rank [17]. Ballani and Grasedyck presented an iterative scheme similar to a Krylov subspace method to solve (3), relying on a truncation operator implemented in the hierarchical Tucker format [16] to allow applications in high dimensions [2]. Some well-known Krylov subspace methods have been studied in their tensor format by Beik et al. in [1]; the authors have described the tensor format of the full orthogonalization method (FOM-BTF) and conjugate gradient (CG-BTF)–type iterative algorithms. These methods are attractive if the coefficient matrices are not large, since they are based on the use of the tensor Krylov subspace associated with the operator \({\mathscr{M}}\) defined by (2).

In the spirit of Krylov subspace methods, we apply an Arnoldi-based algorithm to the coefficient matrices to get a reduced Sylvester tensor equation, which can be solved by a global iterative scheme. The approximate solution is then constructed from the solution of the reduced equation and the Krylov subspace bases associated with the coefficient matrices.

The rest of this paper is organized as follows. In Section 2, we give the notation adopted in this paper and some basic definitions and properties related to tensors. Section 3 is dedicated to theoretical results when the right-hand side tensor in (1) is of rank one. In Section 4, we present our approaches to solve (1) with a right-hand side tensor of a specific rank; we give the block Krylov approach in Section 4.1 and the global Krylov approach in Section 4.2. Some numerical examples are presented in Section 5 to evaluate the performance of our approaches. Finally, in Section 6, we give a brief conclusion and perspectives.

2 Notations and preliminary concepts

In this section, we introduce some basic definitions, tensor notations, and common operations related to tensors adopted in this paper (for more details, see [16]). Throughout this paper, vectors (tensors of order one), matrices (tensors of order two), and higher order tensors (order three or higher) are signified by lower-case letters, capital letters, and Euler script letters, respectively. A tensor \(\mathcal {X}\) is an element of \(\mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}}\), where \(I_{1},I_{2},\cdots ,I_{N}\in \mathbb {N}\); its entries are denoted by \(\mathcal {X}_{i_{1}i_{2}{\cdots } i_{N}},\) with in ∈{1,⋯ , In}, for every 1 ≤ n ≤ N. Fibers are the higher order analogue of matrix rows and columns. A fiber is defined by fixing every index but one. The n-mode fiber is denoted by \(\mathcal {X}_{i_{1},{\cdots } i_{n-1},:,i_{n+1},{\cdots } ,i_{N}}\in \mathbb {R}^{I_{n}}\), where the indices ij are fixed for j = 1, ⋯ , n − 1, n + 1, ⋯ , N. The notation \(\overline {i_{1}{\ldots } i_{N}}\) corresponds to a multi-index, which is obtained as follows

$$\overline{i_{1}{\ldots} i_{N}}=i_{N}+(i_{N-1}-1)I_{N}+{\ldots} +(i_{1}-1)I_{2}{\ldots} I_{N}.$$

The inner product of two same size tensors \(\mathcal {A},{\mathscr{B}}\in \mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}}\) is defined by

$$\langle\mathcal{A},\mathcal{B}\rangle := \sum\limits_{i_{1}=1}^{I_{1}}\sum\limits_{i_{2}=1}^{I_{2}}\cdots\sum\limits_{i_{N}=1}^{I_{N}}{\mathcal{A}}_{i_{1}{\cdots} i_{N}}{\mathcal{B}}_{i_{1}{\cdots} i_{N}},$$

and the norm induced by this inner product is

$$\|\mathcal{A}\|=\sqrt{\langle\mathcal{A},\mathcal{A}\rangle}.$$

Definition 1

[16, 18]

Let \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}} \) be an N th-order tensor and \( U\in \mathbb {R}^{J\times I_{n}} \) be a matrix. The n-mode product of \( \mathcal {A} \) and U, denoted by \(\mathcal {A}\times _{n}U \), is a tensor of size

$$ I_{1}\times I_{2}\times\cdots\times I_{n-1}\times J\times I_{n+1}\times\cdots\times I_{N},$$

whose entries are given by

$$(\mathcal{A}\times_{n}U)_{i_{1}{\cdots} i_{n-1}ji_{n+1}{\cdots} i_{N}}=\sum\limits_{i_{n}=1}^{I_{n}}\mathcal{A}_{i_{1}{\cdots} i_{N}}U_{ji_{n}}.$$

For \(\mathcal {X}\in \mathbb {R}^{I_{1}\times {\cdots }\times I_{N}}\) and {A} a set of matrices \(A_{n}\in \mathbb {R}^{I_{n}\times I_{n}}, n=1,2,{\cdots } ,N\), their multiplication in all possible modes (n = 1, 2, ... , N) is denoted by

$$\mathcal{X}\times\lbrace A \rbrace :=\mathcal{X}\times_{1}A_{1}\times_{2}A_{2}\cdots\times_{N}A_{N},$$

and

$$\mathcal{X}\times\lbrace A\rbrace^{T} :=\mathcal{X}\times_{1}{A_{1}^{T}}\times_{2}{A_{2}^{T}}\cdots\times_{N}{A_{N}^{T}}.$$
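In Matlab, the n-mode product can be implemented with permute and reshape; the routine below is a minimal sketch of Definition 1 (the name tmprod and its interface are ours, not those of a particular toolbox).

```matlab
function Y = tmprod(X, U, n)
% n-mode product Y = X x_n U (Definition 1): every n-mode fiber of the
% tensor X is multiplied by the matrix U (of size J x I_n).
dims = size(X); N = ndims(X);
perm = [n, 1:n-1, n+1:N];
Xn = reshape(permute(X, perm), dims(n), []);  % n-mode unfolding of X
dims(n) = size(U, 1);                         % mode n becomes J
Y = ipermute(reshape(U * Xn, dims(perm)), perm);
end
```

The product \(\mathcal{X}\times\lbrace A\rbrace\) is then obtained by chaining calls, e.g., tmprod(tmprod(X, A1, 1), A2, 2); by Proposition 1 below, the order of the calls is irrelevant for distinct modes.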

Proposition 1

[16, 18]

Let\(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\)be an N th-order tensor, \(U\in \mathbb {R}^{J\times I_{m}}\), \(V\in \mathbb {R}^{K\times I_{n}}\)and\(W\in \mathbb {R}^{I_{n}\times I_{n}}\)be three matrices, then for distinct modes in a series of multiplication, the order of the multiplication is irrelevant, i.e.,

$$\mathcal{A}\times_{m}U\times_{n}V=\mathcal{A}\times_{n}V\times_{m}U.$$

If the modes are the same, then

$$\mathcal{A}\times_{n}W\times_{n}V=\mathcal{A}\times_{n}VW.$$

Definition 2

[16]

The outer product of two tensors \(\mathcal {A}\in \mathbb {R}^{I_{1}\times I_{2}\times \ldots \times I_{N}}\) and \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times J_{2}\times \cdots \times J_{M}}\) is a tensor denoted by \(\mathcal {A}\circ {\mathscr{B}}=\mathcal {C}\in \mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}\times J_{1}\times J_{2}\times \cdots \times J_{M}}\).

Elementwise,

$$\mathcal{C}_{i_{1}{\ldots} i_{N}j_{1}{\ldots} j_{M}}=\mathcal{A}_{i_{1},{\ldots} ,i_{N}}\mathcal{B}_{j_{1},{\ldots} ,j_{M}}.$$

If v1, v2, … , vN are N vectors of sizes Ii, i = 1, … , N, their outer product is an N th-order tensor of size I1 ×… × IN and we have

$$(v_{1}\circ\cdots\circ v_{N})_{i_{1},{\cdots} ,i_{N}}=v_{1}(i_{1}){\cdots} v_{N}(i_{N}).$$

Definition 3

[16]

An N th-order tensor \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times \cdots \times I_{N}} \) is of rank one if it can be written as the outer product of N vectors \(v_{k}\in \mathbb {R}^{I_{k}}, k=1,2,\cdots ,N\), i.e.,

$$\mathcal{A}=v_{1}\circ v_{2}\circ\cdots\circ v_{N}.$$

A tensor is of rank \(R\in \mathbb {N}\) if it can be written as the sum of R rank one tensors.

Definition 4

[16, 18]

The Kronecker product of two matrices \(A\in \mathbb {R}^{I_{1}\times I_{2}}\) and \(B\in \mathbb {R}^{J_{1}\times J_{2}}\) is a matrix of size I1J1 × I2J2 denoted by A ⊗ B, where

$$ A\otimes B= \left( \begin{array}{ccccc} a_{11}B & {\cdots} & a_{1I_{2}}B\\ {\vdots} & {\ddots} & \vdots\\ a_{I_{1}1}B & {\cdots} & a_{I_{1}I_{2}}B \end{array}\right). $$

The Kronecker product of two tensors \(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\) and \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\) is defined by

$$\mathcal{C}=\mathcal{A}\otimes \mathcal{B}\in\mathbb{R}^{I_{1}J_{1}\times\cdots\times I_{N}J_{N}},$$

where

$$\mathcal{C}_{\overline{i_{1}j_{1}},{\cdots} ,\overline{i_{N}j_{N}}}=\mathcal{A}_{i_{1},{\cdots} ,i_{N}}\mathcal{B}_{j_{1},{\cdots} ,j_{N}},$$

for in = 1, ⋯ , In, jn = 1, ⋯ , Jn, n = 1, ⋯ , N.

In the following remark, we state the link between the Kronecker product of three vectors and their outer product.

Remark 1

[8, Page 33]

Let v1, v2, and v3 be 3 vectors of sizes I1, I2, and I3, respectively, we have

$$vec(v_{1}\circ v_{2}\circ v_{3})=v_{3}\otimes v_{2}\otimes v_{1}.$$

It is easy to verify that the above remark remains valid for N vectors. This property shows that the idea in Section 3 of the current paper can be considered as a reformulation of the one in [17].
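Since Matlab's column-major vectorization coincides with the vec operator used here, Remark 1 is easy to check numerically; a small sketch with illustrative sizes:

```matlab
% Numerical check of Remark 1 (illustrative sizes).
v1 = rand(3,1); v2 = rand(4,1); v3 = rand(5,1);
T = zeros(3,4,5);                          % T = v1 o v2 o v3
for i = 1:3
  for j = 1:4
    for k = 1:5
      T(i,j,k) = v1(i)*v2(j)*v3(k);
    end
  end
end
norm(T(:) - kron(v3, kron(v2, v1)))        % zero up to round-off
```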

Proposition 2

[18]

Let \(\mathcal {A}=a_{1}\circ a_{2}\circ {\cdots } \circ a_{N}\) and \({\mathscr{B}}=b_{1}\circ b_{2}\circ {\cdots } \circ b_{N}\) denote two rank one tensors, then, the Kronecker product \(\mathcal {A}\otimes {\mathscr{B}}\) can be expressed by

$$\mathcal{A}\otimes\mathcal{B}=(a_{1}\otimes b_{1})\circ\cdots\circ(a_{N}\otimes b_{N}).$$

It is well known that if A, B, C, and D are four matrices, we have

$$(A\otimes B)(C\otimes D)=(AC)\otimes(BD).$$

We give an elegant generalization of this property to tensors as follows

Proposition 3

Let\(\mathcal {A}\in \mathbb {R}^{I_{1}\times \cdots \times I_{N}}\), \({\mathscr{B}}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\)be two tensors, and\(A\in \mathbb {R}^{K_{n}\times I_{n}}\)and\(B\in \mathbb {R}^{L_{n}\times J_{n}}\)be two matrices, then

$$\left( \mathcal{A}\otimes\mathcal{B}\right)\times_{n}\left( A\otimes B\right) =\left( \mathcal{A}\times_{n}A\right)\otimes\left( \mathcal{B}\times_{n}B\right).$$

Proof

$$ \begin{array}{@{}rcl@{}} \left( \left( \mathcal{A}\times_{n}A\right)\otimes\left( \mathcal{B}\times_{n}B\right)\right)_{\overline{i_{1}j_{1}}\cdots\overline{k_{n}l_{n}}\cdots\overline{i_{N}j_{N}}}&=&\left( \mathcal{A}\times_{n}A\right)_{i_{1}{\cdots} k_{n}{\cdots} i_{N}}\left( \mathcal{B}\times_{n}B\right)_{j_{1}{\cdots} l_{n}{\cdots} j_{N}}\\ &=&\sum\limits_{i_{n}=1}^{I_{n}}\mathcal{A}_{i_{1}{\cdots} i_{n}{\cdots} i_{N}}A_{k_{n}i_{n}}\sum\limits_{j_{n}=1}^{J_{n}}\mathcal{B}_{j_{1}{\cdots} j_{n}{\cdots} j_{N}}B_{l_{n}j_{n}}\\ &=&\sum\limits_{i_{n}=1}^{I_{n}}\sum\limits_{j_{n}=1}^{J_{n}}\left( \mathcal{A}\otimes\mathcal{B}\right)_{\overline{i_{1}j_{1}}\cdots\overline{i_{n}j_{n}}\cdots\overline{i_{N}j_{N}}}\left( A\otimes B\right)_{\overline{k_{n}l_{n}},\overline{i_{n}j_{n}}}\\ &=&\left( \left( \mathcal{A}\otimes\mathcal{B}\right)\times_{n}\left( A\otimes B\right)\right)_{\overline{i_{1}j_{1}}\cdots\overline{k_{n}l_{n}}\cdots\overline{i_{N}j_{N}}}. \end{array} $$

□

It is easy to verify the following result

Proposition 4

Let\(\mathcal {A}=a_{1}\circ a_{2}\circ {\cdots } \circ a_{N}\)be a rank one tensor and {V} a set of N matrices {V1, V2, … , VN}, then we have

$$\mathcal{A}\times\lbrace V\rbrace=V_{1}a_{1}\circ\ldots\circ V_{N}a_{N}.$$

Definition 5 (CP decomposition [16])

Let \( \mathcal {A} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}} \) be an N th-order tensor. The CP decomposition of \(\mathcal {A}\) is

$$\mathcal{A}=\sum\limits_{r=1}^{R} a_{r}^{(1)} \circ a_{r}^{(2)}\circ\cdots\circ a_{r}^{(N)},$$

where \(a_{r}^{(k)}\) are vectors of size Ik with 1 ≤ kN. If we define \(A_{n}=\left [a_{1}^{(n)} a_{2}^{(n)}{\cdots } a_{R}^{(n)} \right ] \) for \(n\in \left \lbrace 1,{\cdots } ,N \right \rbrace \), the CP decomposition can be symbolically written as

$$ \mathcal{A}= A_{1} \circ A_{2}\circ\cdots\circ A_{N},$$

the matrices \(A_{n}\in \mathbb {R}^{I_{n}\times R}\) are called factor matrices. Often, the vectors \(a_{r}^{(n)}\) are chosen such that \(\Vert a_{r}^{(n)} \Vert =1\). In this case, the CP decomposition is written as

$$ \mathcal{A}=\sum\limits_{r=1}^{R} \lambda_{r} a_{r}^{(1)} \circ a_{r}^{(2)}\circ\cdots\circ a_{r}^{(N)},$$

where λr is a scalar that compensates for the magnitudes of the vectors \( a_{r}^{(n)}\).
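For illustration, the full tensor of a rank-R CP decomposition can be assembled from its factor matrices by combining Remark 1 with Matlab's reshape; a minimal sketch with illustrative sizes and random factors:

```matlab
% Build the full tensor of a rank-R CP decomposition (Definition 5)
% from factor matrices A1, A2, A3 (random here, for illustration).
R = 4; I = [6, 7, 8];
A1 = rand(I(1),R); A2 = rand(I(2),R); A3 = rand(I(3),R);
T = zeros(I);
for r = 1:R                                % sum of R rank one tensors
  T = T + reshape(kron(A3(:,r), kron(A2(:,r), A1(:,r))), I);
end
```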

In the following, we denote by m = (m1, m2, ⋯ , mN) a multi-index that represents all the Krylov subspaces dimensions, \(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}} \), \(e_{1}^{(m_{i})}=\left [1,0,{\ldots } ,0\right ]^{T}\in \mathbb {R}^{m_{i}} \), \(E_{m_{i}}=\left [0_{R},{\ldots } ,0_{R},I_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\), \(E_{1}^{(m_{i})}=\left [I_{R},0_{R}{\ldots } ,0_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\) and \(e_{r}^{(R)}\in \mathbb {R}^{R}\) the r th vector from the canonical basis of \(\mathbb {R}^{R}\), where 0R and IR correspond to the square zero matrix and the identity matrix of order R, respectively.

3 Rank one right-hand side tensor

In this section, we assume that the right-hand side tensor in (1) is of rank one; i.e., it can be written as follows

$$\mathcal{B}=b_{1}\circ b_{2}\circ\cdots\circ b_{N}$$

where \(b_{i}\in \mathbb {R}^{I_{i}}, i=1,{\ldots } ,N\). Applying the Arnoldi algorithm (Section 6.3 in [23]) to the pairs (A(i), bi), i = 1, … , N, leads to the following relations, for i = 1, … , N,

$$ A^{(i)}V_{m_{i}}=V_{m_{i}}H_{m_{i}}+h_{m_{i}+1}v_{m_{i}+1}e_{m_{i}}^{T}, $$
(5)

and

$$ V_{m_{i}}^{T} A^{(i)}V_{m_{i}}=H_{m_{i}}, $$
(6)

where the first vector of each basis \(V_{m_{i}}\) is exactly the normalized vector

$$v_{1}^{(i)}=\frac{b_{i}}{\|b_{i}\|}.$$
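A minimal Matlab sketch of the Arnoldi process producing (5) and (6) might look as follows (modified Gram-Schmidt variant; breakdown, i.e., a zero \(h_{m+1}\), is not handled).

```matlab
function [V, H, hnext, vnext] = arnoldi(A, b, m)
% Arnoldi process applied to the pair (A, b), yielding relation (5):
% A*V = V*H + hnext*vnext*e_m', with V'*A*V = H as in (6).
n = size(A, 1);
V = zeros(n, m); H = zeros(m, m);
V(:,1) = b / norm(b);                      % v_1 = b/||b||
for j = 1:m
    w = A * V(:,j);
    for i = 1:j                            % modified Gram-Schmidt
        H(i,j) = V(:,i)' * w;
        w = w - H(i,j) * V(:,i);
    end
    if j < m
        H(j+1,j) = norm(w);
        V(:,j+1) = w / H(j+1,j);
    else
        hnext = norm(w); vnext = w / hnext;
    end
end
end
```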

We consider the following approximate solution of (1)

$$\mathcal{X}_{m}=\mathcal{Y}_{m}\times\lbrace V_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\in \mathbb {R}^{m_{1}\times \cdots \times m_{N}}\) and {Vm} is the set of matrices \(\lbrace V_{m_{1}},{\cdots } ,V_{m_{N}}\rbrace \).

The associated residual tensor is given by

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}. \end{array} $$

We consider the Petrov-Galerkin condition in its tensor format as follows

$$\mathcal{R}_{m}\times\lbrace V_{m}\rbrace^{T}=0,$$

hence

$$ \begin{array}{@{}rcl@{}} 0&=\mathcal{B}\times\lbrace V_{m}\rbrace^{T}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}\times_{1} V_{m_{1}}^{T}\cdots\times_{N} V_{m_{N}}^{T}\\ &=V_{m_{1}}^{T}b_{1}\circ V_{m_{2}}^{T}b_{2}\circ\cdots\circ V_{m_{N}}^{T}b_{N}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\\ &=\upbeta e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ\cdots\circ e_{1}^{(m_{N})}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}} \end{array} $$

where \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|b_{i}\|\). Thus, the reduced Sylvester tensor equation is given as follows

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}=\upbeta \mathcal{E}_{m}, $$
(7)

where \(\mathcal {E}_{m}=e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ \cdots \circ e_{1}^{(m_{N})}\). The following proposition gives the associated residual tensor.

Proposition 5

Let\(\mathcal {Y}_{m}\)be the solution of (7) and, for i = 1, … , N, let \(V_{m_{i}}\)be the bases obtained by applying the Arnoldi algorithm to the pairs (A(i), bi), then

$$\mathcal{R}_{m}=-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}.$$

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} A^{(i)}V_{m_{i}}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Using the relations (5) and the expression of the right-hand side \({\mathscr{B}}\), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=b_{1}\circ b_{2}\circ\cdots\circ b_{N}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\times\lbrace V_{m}\rbrace\\ &-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Taking into consideration the fact that \(b_{i}=\|b_{i}\|v_{1}^{(i)}=\|b_{i}\|V_{m_{i}}e_{1}^{(m_{i})}\) and using Proposition 4, it follows that

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=\left( \upbeta e_{1}^{(m_{1})}\circ e_{1}^{(m_{2})}\circ\cdots\circ e_{1}^{(m_{N})}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i} H_{m_{i}}\right)\times\lbrace V_{m}\rbrace\\ &-\sum\limits_{i=1}^{N}h_{m_{i}+1}\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i}+1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}. \end{array} $$

Invoking (7), the result is achieved. □

Theorem 1

Let\(\mathcal {R}_{m}\)be the corresponding residual, then

$$\|\mathcal{R}_{m}\|=\left( {\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\|^{2}}\right)^{1/2},$$

where\(\mathcal {Y}_{m}\)is the solution of (7) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

$$ \begin{array}{@{}rcl@{}} \|\mathcal{R}_{m}\|^{2} &=&\langle\mathcal{R}_{m},\mathcal{R}_{m}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{1} V_{m_{1}}\cdots\times_{i} v_{m_{i} +1}e_{m_{i}}^{T}\cdots\times_{N} V_{m_{N}}\|^{2}\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\langle\mathcal{Y}_{m}\times_{i} e_{m_{i}}e_{m_{i}}^{T},\mathcal{Y}_{m}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\langle\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T},\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\rangle\\ &=&\sum\limits_{i=1}^{N}\vert h_{m_{i} +1}\vert^{2}\|\mathcal{Y}_{m}\times_{i} e_{m_{i}}^{T}\|^{2}, \end{array} $$

where the second equality holds since the cross terms vanish (\(V_{m_{i}}^{T}v_{m_{i}+1}=0\)), and the remaining steps use the orthonormality of the bases \(V_{m_{i}}\). □
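In practice, Theorem 1 allows the residual norm to be monitored from the small tensor \(\mathcal{Y}_m\) alone, without forming \(\mathcal{X}_m\). A sketch for N = 3 (assuming the tensor Y and the scalars h(i) = \(h_{m_i+1}\) are already available, and using the hypothetical tmprod routine from Section 2):

```matlab
% Residual norm via Theorem 1 (sketch, N = 3): Y solves (7) and
% h(i) = h_{m_i+1} comes from the Arnoldi relations (5); X_m is never formed.
m = size(Y);
res2 = 0;
for i = 1:3
    emiT = [zeros(1, m(i)-1), 1];          % e_{m_i}^T as a row vector
    Yi = tmprod(Y, emiT, i);               % Y x_i e_{m_i}^T
    res2 = res2 + h(i)^2 * norm(Yi(:))^2;
end
res = sqrt(res2);
```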

4 Rank R right-hand side tensor

Inspired by the work addressing the Sylvester matrix equation in [10, 12], and by the fact that an N th-order tensor can be decomposed using the CP decomposition of Definition 5, we propose two approaches to extract approximate solutions to (1) with low rank right-hand sides; accordingly, we assume in the following that the right-hand side is of rank R, i.e.,

$$\mathcal{B} = \sum\limits_{r=1}^{R} b_{1}^{(r)}\circ\cdots\circ b_{N}^{(r)},$$

where \(b_{i}^{(r)}\in \mathbb {R}^{I_{i}}\), for \(i\in \left \lbrace 1,{\ldots } ,N \right \rbrace \), and \(r\in \left \lbrace 1,{\ldots } ,R \right \rbrace \). We set for i = 1,2, ⋯ , N

$$B^{(i)}=\left[b_{i}^{(1)},b_{i}^{(2)},{\cdots} ,b_{i}^{(R)}\right].$$

Straightforward computations show that the right-hand side tensor can also be written as follows

$$\mathcal{B}=\mathcal{I}_{R}\times_{1} B^{(1)}\cdots\times_{N} B^{(N)},$$

where \(\mathcal {I}_{R}\), called the identity tensor, is the N th-order tensor of size R ×… × R with ones along the super-diagonal; a small numerical check of this identity is sketched below. In the following two subsections, we will show how to extract approximate solutions to (1), via block and global Krylov methods (for more details about the block and global Arnoldi algorithms, we refer the reader to Section 6.1 in [23] and to [13], respectively).
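The following Matlab sketch (illustrative sizes, N = 3; tmprod is the n-mode product routine sketched in Section 2) checks the identity numerically.

```matlab
% Check B = I_R x_1 B^{(1)} x_2 B^{(2)} x_3 B^{(3)} against the CP form.
R = 3; I = [4, 5, 6];
B1 = rand(I(1),R); B2 = rand(I(2),R); B3 = rand(I(3),R);
IR = zeros(R, R, R);
for r = 1:R, IR(r,r,r) = 1; end            % identity tensor I_R
Bt = tmprod(tmprod(tmprod(IR, B1, 1), B2, 2), B3, 3);
Bcp = zeros(I);
for r = 1:R                                % CP form of the same tensor
  Bcp = Bcp + reshape(kron(B3(:,r), kron(B2(:,r), B1(:,r))), I);
end
norm(Bt(:) - Bcp(:))                       % zero up to round-off
```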

4.1 Block Krylov approach

Let \(\mathbb {U}_{m_{i}}=\left [U_{1}^{(i)},{\ldots } ,U_{m_{i}}^{(i)}\right ] \) and \(\mathbb {H}_{m_{i}}\) be the matrices obtained by applying the block Arnoldi algorithm to the pairs (A(i), B(i)), i = 1,2, … , N, starting with \(U_{1}^{(i)}=Q^{(i)}\), where Q(i) is obtained from the QR factorization of B(i), i.e., B(i) = Q(i)R(i); then the following relation holds, for i = 1, … , N,

$$ A^{(i)}\mathbb{U}_{m_{i}}=\mathbb{U}_{m_{i}}\mathbb{H}_{m_{i}}+U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}. $$
(8)
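A minimal Matlab sketch of the block Arnoldi process yielding (8) might look as follows (block modified Gram-Schmidt; loss of rank in the blocks is not handled).

```matlab
function [U, H, Unext, Hnext] = block_arnoldi(A, B, m)
% Block Arnoldi applied to (A, B), with B of full column rank R, giving
% relation (8): A*U = U*H + Unext*Hnext*E_m'.
[~, R] = size(B);
n = size(A, 1);
U = zeros(n, m*R); H = zeros(m*R, m*R);
[U(:,1:R), ~] = qr(B, 0);                  % U_1 = Q^{(i)} from B = Q*R
for j = 1:m
    cols = (j-1)*R+1 : j*R;
    W = A * U(:,cols);
    for i = 1:j                            % block modified Gram-Schmidt
        rows = (i-1)*R+1 : i*R;
        H(rows,cols) = U(:,rows)' * W;
        W = W - U(:,rows) * H(rows,cols);
    end
    [Q, Rj] = qr(W, 0);
    if j < m
        U(:, j*R+1 : (j+1)*R) = Q;
        H(j*R+1 : (j+1)*R, cols) = Rj;
    else
        Unext = Q; Hnext = Rj;             % last block: H_{m+1,m}
    end
end
end
```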

An approximate solution is given by

$$\mathcal{X}_{m} = \mathcal{Y}_{m}\times\lbrace \mathbb{U}_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\in \mathbb {R}^{{Rm_{1}\times \cdots \times Rm_{N}}}\) and \(\lbrace \mathbb {U}_{m}\rbrace =\lbrace \mathbb {U}_{m_{1}},{\ldots } ,\mathbb {U}_{m_{N}}\rbrace \).

We consider the following Petrov-Galerkin condition of orthogonality

$$ \mathcal{R}_{m}\times\lbrace \mathbb{U}_{m}\rbrace^{T} = 0, $$
(9)

with \(\mathcal {R}_{m}\) the residual tensor, which is given by

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m} &=&\mathcal{B}-\mathcal{M}(\mathcal{X}_{m})\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times\lbrace \mathbb{U}_{m}\rbrace \times_{i}A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i} A^{(i)}\mathbb{U}_{m_{i}}\cdots\times_{N}\mathbb{U}_{m_{N}}. \end{array} $$

Using the relation (8) and the condition (9), we obtain

$$0=\mathcal{B}\times\lbrace\mathbb{U}_{m}\rbrace^{T}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}.$$

Using the expression of the right-hand side tensor \({\mathscr{B}}\) and the fact that \(U_{1}^{(i)}\), i = 1, … , N, are obtained from QR factorizations of B(i), i = 1, … , N, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{B}\times\lbrace\mathbb{U}_{m}\rbrace^{T}&=&\mathcal{I}_{R}\times_{1}\mathbb{U}_{m_{1}}^{T}B^{(1)}\cdots\times_{N} \mathbb{U}_{m_{N}}^{T} B^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \mathbb{U}_{m_{1}}^{T}U_{1}^{(1)}R^{(1)}\cdots\times_{N} \mathbb{U}_{m_{N}}^{T}U_{1}^{(N)}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} E_{1}^{(m_{1})}R^{(1)}\cdots\times_{N} E_{1}^{(m_{N})}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \tilde{R}^{(1)}\cdots\times_{N} \tilde{R}^{(N)}, \end{array} $$

where \(\tilde {R}^{(i)}=E_{1}^{(m_{i})}R^{(i)}\).

Then, the low dimensional Sylvester tensor equation is given as follows

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}=\mathcal{B}_{m}, $$
(10)

where \({\mathscr{B}}_{m}=\mathcal {I}_{R}\times _{1} \tilde {R}^{(1)}\cdots \times _{N} \tilde {R}^{(N)}\).

Proposition 6

Let\(\mathcal {Y}_{m}\)be the solution of (10); the residual tensor associated with the approximate solution\(\mathcal {X}_{m}=\mathcal {Y}_{m}\times \lbrace \mathbb {U}_{m}\rbrace \)is

$$\mathcal{R}_{m}=-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i}U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{U}_{m_{N}}.$$

Proof

We have

$$ \mathcal{R}_{m}=\mathcal{B}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i} A^{(i)}\mathbb{U}_{m_{i}}\cdots\times_{N}\mathbb{U}_{m_{N}}.$$

Since \(U_{1}^{(i)}\), i = 1, … , N, are obtained from QR factorizations of B(i), i = 1, … , N, the right-hand side tensor \({\mathscr{B}}\) can be written as follows

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=&\mathcal{I}_{R}\times_{1} B^{(1)}\cdots\times_{N} B^{(N)}\\ &=&\mathcal{I}_{R}\times_{1}U_{1}^{(1)}R^{(1)}\cdots\times_{N}U_{1}^{(N)}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1}\mathbb{U}_{m_{1}}E_{1}^{(m_{1})}R^{(1)}\cdots\times_{N}\mathbb{U}_{m_{N}}E_{1}^{(m_{N})}R^{(N)}\\ &=&\mathcal{I}_{R}\times_{1} \tilde{R}^{(1)}\cdots\times_{N} \tilde{R}^{(N)}\times\lbrace\mathbb{U}_{m}\rbrace\\ &=&\mathcal{B}_{m}\times\lbrace\mathbb{U}_{m}\rbrace . \end{array} $$

Using the expression above and the relation (8), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\left( \mathcal{B}_{m}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}\mathbb{H}_{m_{i}}\right)\times\lbrace\mathbb{U}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{1}\mathbb{U}_{m_{1}}\cdots\times_{i}U_{m_{i}+1}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{U}_{m_{N}}. \end{array} $$

Invoking (10), the result in the proposition is achieved. □

The following theorem can be established in the same way as Theorem 1 in Section 3.

Theorem 2

Let\(\mathcal {R}_{m}\)be the corresponding residual to the approximate solution obtained by the block Arnoldi approach, then

$$\|\mathcal{R}_{m}\|=\left( {\sum\limits_{i=1}^{N}\|\mathcal{Y}_{m}\times_{i}H_{m_{i}+1,m_{i}}E_{m_{i}}^{T}\|^{2}}\right)^{1/2},$$

where\(\mathcal {Y}_{m}\)is the solution of (10) and\(E_{m_{i}}=\left [0_{R},{\ldots } ,0_{R},I_{R}\right ]^{T}\in \mathbb {R}^{Rm_{i}\times R}\).

The block Arnoldi algorithm for the Sylvester tensor equation is summarized in Algorithm 1. We end this section by pointing out that the solution of (10) is of size Rm1 × Rm2 × ⋯ × RmN, which may become large even for moderate values of R and mi. Solving (10), as required in step 3 of Algorithm 1, can then be challenging; this motivates the method proposed in the following section.

[Algorithm 1: block Arnoldi method for the Sylvester tensor equation (1)]

4.2 Global Krylov approach

The previous block Krylov approach reduced (1) to a low dimensional Sylvester tensor equation whose solution is of size Rm1 × Rm2 × ⋯ × RmN, while the global Krylov approach constructs an approximate solution from an m1 ×… × mN tensor.

Let \(\mathbb {V}_{m_{i}}=\left [ V_{1}^{(i)},\cdots ,V_{m_{i}}^{(i)}\right ] \) be the matrices obtained by applying the global Arnoldi algorithm to the pairs (A(i), B(i)), i = 1,2, … , N, starting with \(V_{1}^{(i)}=B^{(i)}/\|B^{(i)}\|\); then the following relations hold, for i = 1, … , N,

$$ A^{(i)}\mathbb{V}_{m_{i}} = \mathbb{V}_{m_{i}}\left( H_{m_{i}}\otimes I_{R}\right)+h_{m_{i}+1,m_{i}}V_{m_{i}+1}E_{m_{i}}^{T}. $$
(11)
$$ A^{(i)}\mathbb{V}_{m_{i}}=\mathbb{V}_{m_{i}+1}\left( \tilde{H}_{m_{i}}\otimes I_{R}\right). $$
(12)

An approximate solution is given by

$$\mathcal{X}_{m}=(\mathcal{Y}_{m}\otimes\mathcal{I}_{R})\times\lbrace\mathbb{V}_{m}\rbrace,$$

where \(\mathcal {Y}_{m}\) is an m1 × ⋯ × mN tensor satisfying the low dimensional Sylvester tensor equation

$$ \sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}H_{m_{i}}=\upbeta\mathcal{E}_{m}, $$
(13)

with \(\mathcal {E}_{m}=e_{1}^{(m_{1})}\circ \cdots \circ e_{1}^{(m_{N})}\) and \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|B^{(i)}\|\).

Proposition 7

Let\(\mathcal {X}_{m}\)be the approximate solution obtained by the global Arnoldi approach and\(\mathcal {R}_{m}\)be the corresponding residual, then

$$ \mathcal{R}_{m}=-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}\cdots\times_{N}\mathbb{V}_{m_{N}}, $$
(14)

where\(\mathcal {Y}_{m}\)is the solution of (13) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\mathcal{M}\left( \mathcal{X}_{m}\right)\\ &=&\mathcal{B}-{\sum}_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times \lbrace \mathbb{V}_{m}\rbrace\times_{i} A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{i}\left( H_{m_{i}}\otimes I_{R}\right)\times\lbrace\mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Using Proposition 3 and the fact that \(E_{m_{i}}=e_{m_{i}}\otimes I_{R}\), we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Since the first blocks in each basis are taken from the right-hand side tensor \({\mathscr{B}}\), we have

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=&\sum\limits_{r=1}^{R} b_{1}^{(r)}\circ\cdots\circ b_{N}^{(r)}\\ &=&\sum\limits_{r=1}^{R} \|B^{(1)}\| V_{1}^{(1)}(:,r)\circ\cdots\circ\|B^{(N)}\|V_{1}^{(N)}(:,r)\\ &=&\upbeta\sum\limits_{r=1}^{R} \left( E_{1}^{(m_{1})}e_{r}^{(R)}\circ\cdots\circ E_{1}^{(m_{N})}e_{r}^{(R)}\right)\times\lbrace\mathbb{V}_{m}\rbrace , \end{array} $$

where \(\upbeta =\displaystyle \prod \limits _{i=1}^{N}\|B^{(i)}\|\).

Using the fact that \(E_{1}^{(m_{i})}e_{r}^{(R)}=e_{r}^{(Rm_{i})}=e_{1}^{(m_{i})}\otimes e_{r}^{(R)}\) and Proposition 2, we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{B}&=\upbeta\sum\limits_{r=1}^{R} (e_{1}^{(m_{1})}\circ\cdots\circ e_{1}^{(m_{N})})\otimes (e_{r}^{(R)}\circ\cdots\circ e_{r}^{(R)})\times\lbrace \mathbb{V}_{m}\rbrace\\ &=\upbeta(e_{1}^{(m_{1})}\circ\cdots\circ e_{1}^{(m_{N})})\otimes\sum\limits_{r=1}^{R}(e_{r}^{(R)}\circ\cdots\circ e_{r}^{(R)})\times\lbrace \mathbb{V}_{m}\rbrace , \end{array} $$

then

$$ \mathcal{B}=\upbeta\mathcal{E}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace. $$
(15)

Now, we obtain

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\left( \upbeta\mathcal{E}_{m}-\sum\limits_{i=1}^{N}\mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)\otimes\mathcal{I}_{R}\times\lbrace \mathbb{V}_{m}\rbrace\\ &&-\sum\limits_{i=1}^{N}h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\cdots\times_{i}V_{m_{i}+1}E_{m_{i}}^{T}\cdots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Invoking (13), the result is achieved. □

Lemma 1

Let\(\mathbb {V}_{m_{i}}\)be the bases generated by the global Arnoldi algorithm applied to the pairs (A(i), B(i)), and let\(\mathcal {X}\in \mathbb {R}^{J_{1}\times \cdots \times J_{N}}\)with Ji = Rmiand\(\mathcal {Z}\in \mathbb {R}^{K_{1}\times \cdots \times K_{N}}\)with Ki = mi. Then

$$ \|\mathcal{X}\times_{i}\mathbb{V}_{m_{i}}\|\leq\|\mathcal{X}\|, $$
(16)
$$ \|\left( \mathcal{Z}\otimes\mathcal{I}_{R}\right)\times_{i}\mathbb{V}_{m_{i}}\|=\|\mathcal{Z}\|. $$
(17)

Proof

We set \(\mathcal {P}=\mathcal {X}\times _{i}\mathbb {V}_{m_{i}}\); the i th-mode fibers of the tensor \(\mathcal {P}\) can be written as follows

$$ \begin{array}{@{}rcl@{}} \mathcal{P}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}&=&\mathbb{V}_{m_{i}}\mathcal{X}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\\ &=&\sum\limits_{k=1}^{m_{i}}V_{k}\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T}. \end{array} $$

As the squared norm of a tensor can be expressed as the sum of the squared norms of all its i-mode fibers, it follows that

$$ \begin{array}{@{}rcl@{}} \|\mathcal{P}\|^{2} & = & \sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\|\mathcal{P}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\|^{2}\\ & = & \sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\|\sum\limits_{k=1}^{m_{i}}V_{k}\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T} \|^{2}. \end{array} $$

Since the blocks of \(\mathbb {V}_{m_{i}}=\left [ V_{1},{\ldots } ,V_{m_{i}}\right ]\) are orthonormal with respect to the Frobenius inner product, we obtain

$$ \begin{array}{@{}rcl@{}} \|\mathcal{P}\|^{2}&\leq&\sum\limits_{j_{1},{\ldots} ,j_{i-1},j_{i+1},{\ldots} ,j_{N}}\sum\limits_{k=1}^{m_{i}}\|\left[ \mathcal{X}_{(j_{1},{\ldots} ,(k-1)R+1,{\ldots} ,j_{N})},{\ldots} ,\mathcal{X}_{(j_{1},{\ldots} ,kR,{\ldots} ,j_{N})}\right]^{T}\|^{2}\\ &\leq&\|\mathcal{X}\|^{2}, \end{array} $$

therefore (16) is achieved.

We set \(\mathcal {Q}=\left (\mathcal {Z}\otimes \mathcal {I}_{R}\right )\times _{i}\mathbb {V}_{m_{i}}\); if \(\mathcal {Q}_{(\overline {r_{1}j_{1}},{\ldots } ,:,{\ldots } ,\overline {r_{N}j_{N}})}, \mathcal {Z}_{(j_{1},{\ldots } ,:,{\ldots } ,j_{N})}\), and \({\mathcal {I}_{R}}_{(r_{1},{\ldots } ,:,\ldots ,r_{N})}\) denote the i th-mode fibers of \(\mathcal {Q}, \mathcal {Z}\), and \(\mathcal {I}_{R}\), respectively, we have

$$ \begin{array}{@{}rcl@{}} \mathcal{Q}_{(\overline{r_{1}j_{1}},{\ldots} ,:,{\ldots} ,\overline{r_{N}j_{N}})}&=\mathbb{V}_{m_{i}}\left( \mathcal{Z}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\right)\\ &=\sum\limits_{k=1}^{m_{i}}V_{k}\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\\ &=\sum\limits_{k=1}^{m_{i}}\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}V_{k}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}. \end{array} $$

As the blocks of \(\mathbb {V}_{m_{i}}=\left [ V_{1},{\ldots } ,V_{m_{i}}\right ]\) are orthonormal with respect to the Frobenius inner product, we obtain

$$ \begin{array}{@{}rcl@{}} \|\mathcal{Q}_{(\overline{r_{1}j_{1}},{\ldots} ,:,{\ldots} ,\overline{r_{N}j_{N}})}\|^{2}&=&\sum\limits_{k=1}^{m_{i}}\vert\mathcal{Z}_{(j_{1},{\ldots} ,k,{\ldots} ,j_{N})}\vert^{2}\|V_{k}\otimes{\mathcal{I}_{R}}_{(r_{1},{\ldots} ,:,{\ldots} ,r_{N})}\|^{2}\\ &=&\|\mathcal{Z}_{(j_{1},{\ldots} ,:,{\ldots} ,j_{N})}\|^{2}, \end{array} $$

then (17) is achieved. □

In the following theorem, we give an upper bound for the residual norm.

Theorem 3

Let\(\mathcal {X}_{m}\)be the approximate solution obtained by the global Arnoldi approach and\(\mathcal {R}_{m}\)be the corresponding residual, then

$$ \|\mathcal{R}_{m}\|\leq\left( {\sum\limits_{i=1}^{N}\vert h_{m_{i}+1,m_{i}}\vert^{2}\|\mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\|^{2}}\right)^{1/2} , $$
(18)

where\(\mathcal {Y}_{m}\)is the solution of (13) and\(e_{m_{i}}=\left [0,{\ldots } ,0,1\right ]^{T}\in \mathbb {R}^{m_{i}}\).

Proof

We have

$$ \begin{array}{@{}rcl@{}} \mathcal{R}_{m}&=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times \lbrace \mathbb{V}_{m}\rbrace\times_{i} A^{(i)}\\ &=&\mathcal{B}-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\otimes\mathcal{I}_{R}\right)\times_{1}\mathbb{V}_{m_{1}}\ldots\times_{i}A^{(i)}\mathbb{V}_{m_{i}}\ldots\times_{N}\mathbb{V}_{m_{N}}. \end{array} $$

Using the relations (12) and (15), we obtain

$$\mathcal{R}_{m}=\upbeta\mathcal{E}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m}\rbrace-\sum\limits_{i=1}^{N}\left( \mathcal{Y}_{m}\times_{i}\tilde{H}_{m_{i}}\right)\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}}\ldots\times_{i}\mathbb{V}_{m_{i}+1}\ldots\times_{N}\mathbb{V}_{m_{N}}.$$

Let the tensors \(\mathcal {Z}_{m}^{(i)}\in \mathbb {R}^{(m_{1}+1)\times \cdots \times (m_{N}+1)}\), for i = 1, … , N, be defined by

$$ \left( \mathcal{Z}_{m}^{(i)}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} \left( \mathcal{Y}_{m}\times_{i}H_{m_{i}}\right)_{j_{1},{\ldots} ,j_{N}}, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)_{j_{1},{\ldots} ,j_{i-1},1,j_{i+1},{\ldots} ,j_{N}}, & j_{i}=m_{i}+1 \text{ and } j_{k}\in\{1,{\ldots} ,m_{k}\},\ k\neq i,\\ 0, & \text{otherwise,} \end{array} \right. $$

and the tensor \(\mathcal {F}_{m}\in \mathbb {R}^{(m_{1}+1)\times \cdots \times (m_{N}+1)}\), defined by

$$ \left( \mathcal{F}_{m}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} \left( \mathcal{E}_{m}\right)_{j_{1},{\ldots} ,j_{N}}, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ 0, & \text{otherwise.} \end{array} \right. $$

Then

$$\mathcal{R}_{m}=\upbeta\mathcal{F}_{m}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace-\sum\limits_{i=1}^{N}\mathcal{Z}_{m}^{(i)}\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}+1}\ldots\times_{i}\mathbb{V}_{m_{i}+1}\ldots\times_{N}\mathbb{V}_{m_{N}+1}.$$

By setting \(\tilde {\mathcal {Z}}_{m}={\sum }_{i=1}^{N}\mathcal {Z}_{m}^{(i)}\), we obtain

$$ \mathcal{R}_{m}=\left( \upbeta\mathcal{F}_{m}-\tilde{\mathcal{Z}}_{m}\right)\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace . $$

We set \(\mathcal {Z}_{m}^{(0)}=\upbeta \mathcal {F}_{m}-\tilde {\mathcal {Z}}_{m}\); invoking (13), we have

$$ \left( \mathcal{Z}_{m}^{(0)}\right)_{j_{1},{\ldots} ,j_{N}}= \left\{ \begin{array}{ll} 0, & j_{k}\in\{1,{\ldots} ,m_{k}\},\ k=1,{\ldots} ,N,\\ -h_{m_{i}+1,m_{i}}\left( \mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\right)_{j_{1},{\ldots} ,j_{i-1},1,j_{i+1},{\ldots} ,j_{N}}, & j_{i}=m_{i}+1 \text{ and } j_{k}\in\{1,{\ldots} ,m_{k}\},\ k\neq i, \end{array} \right. $$

and

$$\mathcal{R}_{m}=\mathcal{Z}_{m}^{(0)}\otimes\mathcal{I}_{R}\times\lbrace\mathbb{V}_{m+1}\rbrace .$$

By applying the relation (16) of Lemma 1 (N − 1) times to \(\|\mathcal {R}_{m}\|\), we obtain

$$\|\mathcal{R}_{m}\|^{2}\leq\|\mathcal{Z}_{m}^{(0)}\otimes\mathcal{I}_{R}\times_{1}\mathbb{V}_{m_{1}+1}\|^{2}.$$

Then, the relation (17) of lemma 1 leads to

$$ \begin{array}{@{}rcl@{}} \|\mathcal{R}_{m}\|^{2}&\leq&\|\mathcal{Z}_{m}^{(0)}\|^{2}\\ &\leq&{\sum}_{i=1}^{N}\vert h_{m_{i}+1,m_{i}}\vert^{2}\|\mathcal{Y}_{m}\times_{i}e_{m_{i}}^{T}\|^{2}. \end{array} $$

□

The global Arnoldi algorithm for the Sylvester tensor equation is summarized in Algorithm 2.

[Algorithm 2: global Arnoldi method for the Sylvester tensor equation (1)]
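A minimal Matlab sketch of the global Arnoldi process used in Algorithm 2 might look as follows (the blocks are orthonormalized with respect to the Frobenius inner product, giving relation (11); breakdown is not handled).

```matlab
function [V, H, hnext, Vnext] = global_arnoldi(A, B, m)
% Global Arnoldi applied to (A, B): F-orthonormal blocks V_j, giving
% relation (11): A*V = V*kron(H, eye(R)) + hnext*Vnext*E_m'.
[n, R] = size(B);
V = zeros(n, m*R); H = zeros(m, m);
V(:,1:R) = B / norm(B, 'fro');             % V_1 = B/||B||
for j = 1:m
    W = A * V(:, (j-1)*R+1 : j*R);
    for i = 1:j
        Vi = V(:, (i-1)*R+1 : i*R);
        H(i,j) = W(:)' * Vi(:);            % Frobenius inner product
        W = W - H(i,j) * Vi;
    end
    if j < m
        H(j+1,j) = norm(W, 'fro');
        V(:, j*R+1 : (j+1)*R) = W / H(j+1,j);
    else
        hnext = norm(W, 'fro'); Vnext = W / hnext;
    end
end
end
```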

4.3 Complexity consideration

In this section, we present the number of operations required to apply the global or block Arnoldi algorithm to the coefficient matrices. For the sake of simplicity, we consider the 3-mode case, i.e., \({\mathscr{B}}\in \mathbb {R}^{n\times n\times n}\) and \(A^{(i)}\in \mathbb {R}^{n\times n}, i=1,2,3\). The number of operations in the Arnoldi process associated with computing \(A^{(i)}V_{j}^{(i)}\) is

$$(2nz(A^{(i)})-1)R,$$

where nz(A(i)) refers to the number of non-zero entries of the matrix A(i). So, for three matrices and after m iterations, the total cost is

$$mR[2(nz(A^{(1)}) + nz(A^{(2)}) + nz(A^{(3)})) -3],$$

which is 3mR(2n2 − 1) in the worst case when A(i), i = 1,2,3, are full matrices.

5 Numerical examples

In this section, we present three numerical examples to show the effectiveness of our approaches for solving (1), including the case of large-scale coefficient matrices (Example 1). The low dimensional Sylvester tensor equations (10) and (13) are solved by the GLS-BTF algorithm given in [14] when the size of the reduced equation is small, or by the recursive algorithm presented in [5]. The numerical experiments were performed on a 2.7-GHz Intel Core i5 with 8 GB of 1600-MHz DDR3 memory, using Matlab R2016a. In all the examples, the right-hand side tensor is either constructed randomly or constructed so that the exact solution \(\mathcal {X}^{\ast }\) is known. Note that each cycle corresponds to \(k^{\prime }=5\) iterations in all examples. The stopping criterion used is

$$\|\mathcal{R}_{m}\|<\epsilon,$$

where 𝜖 is a given tolerance and \(\mathcal {R}_{m}\) is the m th residual associated with the approximate solution \(\mathcal {X}_{m}\).

5.1 Example 1

We point out that this example is restricted to the special case of the Sylvester tensor equation with N = 3, i.e.,

$$\mathcal{X}\times_{1} A^{(1)} + \mathcal{X} \times_{2} A^{(2)}+\mathcal{X} \times_{3} A^{(3)} = \mathcal{B}.$$

In this first example, the coefficient matrices are taken from [25, Example 35.1] and have the same size n. They are generated by the Matlab commands eye and rand as follows

$$A^{(i)}=\texttt{eye}(n) + \frac{0.5}{\sqrt{n}}\ \texttt{rand}(n), \quad \text{for } i=1,2,3,$$

and the right-hand side tensor \({\mathscr{B}}\) is chosen so that the exact solution \(\mathcal {X}^{\ast }\) is a random tensor of rank r, i.e.,

$$\mathcal{X}^{\ast}=\sum\limits_{k=1}^{r} x_{1}^{(k)}\circ x_{2}^{(k)}\circ x_{3}^{(k)},$$

where \(x_{i}^{(k)}=\texttt {rand}(n,1)\), for i = 1, 2, 3, and k = 1, … , r. Notice that, in this case, straightforward computations show that the right-hand side tensor is of rank R = 3r. We point out that the Krylov subspace dimensions are chosen to be the same, i.e., m1 = m2 = m3 = m, since the coefficient matrices are the same. We take n in the set {500, 10,000} and the tolerance 𝜖 in the set {10− 12, 10− 9}, respectively. The numerical results are reported in Table 1. We point out that the CPU time does not cover the construction of the approximate solution; it covers only the construction of the Krylov subspace bases and the solution of the reduced Sylvester tensor equation. For n = 500, we give both the exact error and the residual norm, whereas for n = 10,000, we give only the residual norm, due to the time needed to construct the whole approximate solution tensor. The larger CPU time needed for the block method is caused by the computational expense of the block Arnoldi algorithm and by the computation of the solution of the reduced Sylvester tensor equation, of order mR × mR × mR, for increasing m; for this reason, the remaining examples are run using only the global method.
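For reference, the data of this example can be reproduced along the following lines (a sketch with a deliberately small n so that the full tensors fit in memory; tmprod is the n-mode product routine sketched in Section 2).

```matlab
% Sketch of the setup of Example 1 (small n for illustration).
n = 50; r = 2;
A = cell(1,3);
for i = 1:3
    A{i} = eye(n) + 0.5/sqrt(n)*rand(n);   % coefficient matrices
end
Xstar = zeros(n, n, n);                    % exact solution of rank r
for k = 1:r
    x1 = rand(n,1); x2 = rand(n,1); x3 = rand(n,1);
    Xstar = Xstar + reshape(kron(x3, kron(x2, x1)), n, n, n);
end
B = tmprod(Xstar, A{1}, 1) + tmprod(Xstar, A{2}, 2) ...
  + tmprod(Xstar, A{3}, 3);                % B = M(X*), of rank R = 3r
```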

Table 1 Example 1

5.2 Example 2

In this example, the coefficient matrices A(i), i = 1, … , N, are obtained from the 5-point discretization of the Poisson equation on an n0-by-n0 mesh, with homogeneous Dirichlet boundary conditions. We use the following Matlab command to generate A(i), i = 1, … , N,

$$A^{(i)}=\texttt{gallery}('poisson',n_{0})$$

The matrices A(i) are then of size \({n_{0}^{2}}\times {n_{0}^{2}}\). For this experiment, we take n in the set {400, 121, 100} and N in the set {3, 4}; the tolerance 𝜖 is set to 10− 6. We construct the right-hand side tensor so that all the components of the exact solution \(\mathcal {X}^{\ast }\) are one (the numerical results are reported in Table 2). We run the same example with a random rank R right-hand side tensor (see Table 3).

Table 2 \(\mathcal {X}^{\ast }=ones(n,n,n)\) for N = 3, n = 400 and \(\mathcal {X}^{\ast }=ones(n,n,n,n)\) for N = 4, n = 100
Table 3 The right-hand side \({\mathscr{B}}\) is a random tensor of rank R = 5

5.3 Example 3

Here, we keep the same data as in the previous example, except for the coefficient matrices A(i), i = 1, … , N, which are obtained from the discretization of the operator

$$Lu:={\Delta} u-f_{1}(x,y)\frac{\partial u}{\partial x}+f_{2}(x,y)\frac{\partial u}{\partial y}+g(x,y),$$

on the unit square \(\left [ 0,1\right ]\times \left [ 0,1\right ] \) with homogeneous Dirichlet boundary conditions. The number of inner grid points in each direction is n0 for the operator L, and the dimensions of the matrices A(i), i = 1, … , N, are \(I_{i} = {n_{0}^{2}}\). The discretization of the operator L yields matrices extracted from the Lyapack package [22], using the command fdm, denoted as

$$A^{(i)} = \texttt{fdm}(n_{0},f_{1}(x,y),f_{2}(x,y),g(x,y)),$$

with \(f_{1}(x,y) = e^{xy}\), \(f_{2}(x,y) = \sin (xy)\), and \(g(x,y) = y^{2}-x^{2}\). The numerical results for this example are reported in Tables 2 and 3.

6 Conclusion

In this paper, we have proposed new approaches to extract approximate solutions to (1) with low rank right-hand sides. The first approach is based on the use of the block Arnoldi algorithm for the coefficient matrices in (1), which leads to the reduced Sylvester tensor equation (10). The second approach is based on the use of the global Arnoldi algorithm, in order to obtain the low dimensional Sylvester tensor equation (13). We gave the expressions of the residuals and the residual norms for each approach. Numerical examples show that our approaches lead to satisfactory results when applied to (1). Combining our approaches with some tensor decompositions in order to work with full rank right-hand sides is a future project.