1 Introduction

This paper is concerned with the numerical solution of severely ill-conditioned tensor equations. We are particularly interested in the solution of Sylvester and Stein tensor equations. The proposed iterative schemes also can be used to solve equations of the form

$$ \mathcal{L}(\mathcal{X})=\mathcal{C}, $$
(1)

where \({\mathcal{L}}:\mathbb {R}^{I_{1}\times I_{2}\times {\ldots } \times I_{N}} \to \mathbb {R}^{I_{1}\times I_{2}\times {\ldots } \times I_{N}}\) is a linear tensor operator. Severely ill-conditioned tensor equations arise in color image restoration, video restoration, and when solving certain partial differential equations in several space-dimensions by collocation methods; see, e.g., [3, 21,22,23,24]. Throughout this work, vectors and matrices are denoted by lowercase and capital letters, respectively, and tensors of order three (or higher) are represented by Euler script letters.

Before discussing the problems to be solved, we recall the definition of an n-mode product from [19]:

Definition 1

The n-mode (matrix) product of a tensor \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times I_{2}\times {\ldots } \times I_{N}}\) with a matrix \(U\in \mathbb {R}^{J\times I_{n}}\) is denoted by \({\mathcal{X}} \times _{n} U\). It is of size

$$ {I}_{1}\times \cdots\times I_{n-1}\times J \times I_{n+1}\times{\cdots} \times I_{N}, $$

and has the elements

$$ (\mathcal{X} \times_{n} U)_{_{i_{1}{\cdots} i_{n-1}ji_{n+1}{\cdots} i_{N}}} = \sum\limits_{i_{n}=1}^{I_{n}} x_{i_{1}i_{2}{\cdots} i_{N}}u_{ji_{n}}. $$

The n-mode (vector) product of a tensor \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times I_{2}\times {\ldots } \times I_{N}}\) with a vector \(v \in \mathbb {R}^{I_{n}}\) is of order N − 1 and is denoted by \({\mathcal{X}} \bar {\times }_{n} v\); its size is \(I_{1}\times \cdots \times I_{n-1}\times I_{n+1}\times \cdots \times I_{N}\).
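The n-mode product is straightforward to evaluate via the mode-n matricization recalled in Section 1.1 below. The following MATLAB function is a minimal sketch of this approach (our own illustration; the name nmode_product is ours, and the Tensor Toolbox used in Section 4 provides the same operation as ttm):

```matlab
function Y = nmode_product(X, A, n)
% Y = X x_n A for a multidimensional array X and a matrix A with size(A,2) = size(X,n).
N    = ndims(X);
dims = size(X);
perm = [n, 1:n-1, n+1:N];
Xn   = reshape(permute(X, perm), dims(n), []);   % mode-n matricization X_(n)
Yn   = A * Xn;                                   % multiply along mode n
dims(n) = size(A, 1);
Y    = ipermute(reshape(Yn, [dims(n), dims(perm(2:end))]), perm);
end
```

For example, for \(\mathcal{X}\in\mathbb{R}^{4\times 5\times 6}\) and \(U\in\mathbb{R}^{7\times 5}\), nmode_product(X, U, 2) returns an array of size 4 × 7 × 6, in agreement with Definition 1.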

The Sylvester and Stein tensor equations are given by

$$ \mathcal{X}\times_{1} A^{(1)}+\mathcal{X}\times_{2} A^{(2)}+\ldots+\mathcal{X}\times_{N} A^{(N)}=\mathcal{D} $$
(2)

and

$$ \mathcal{X}-\mathcal{X}\times_{1} A^{(1)}\times_{2} A^{(2)}\ldots\times_{N} A^{(N)}=\mathcal{F}, $$
(3)

respectively, where the right-hand side tensors \({\mathcal{D}}, {\mathcal{F}} \in \mathbb {R}^{I_{1}\times I_{2} \times {\ldots } \times I_{N}}\) and the coefficient matrices \(A^{(n)}\in \mathbb {R}^{I_{n}\times I_{n}}\) (n = 1,2,…,N) are known, and \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times I_{2} \times {\ldots } \times I_{N}}\) is the unknown tensor to be determined.

Many linear partial differential equations in several space-dimensions, when discretized by finite differences [2, 3, 9] or spectral methods [3, 21,22,23, 27], can be expressed with the aid of a Sylvester tensor equation. A discussion of the conditioning of (2) under certain conditions is provided by Najafi et al. [24], who proposed the application of Tikhonov regularization in conjunction with the global Hessenberg process in tensor form to solve (2) with a perturbed right-hand side. Some perturbation results for (3) are provided by Liang and Zheng [20] and by Xu and Wang [28], who solve (3) by using the tensor forms of the BiCG and BiCR iterative methods. Liang and Zheng [20] present perturbation results for (3) for the case when N is even and A^{(1)} = ⋯ = A^{(N)} = A is Schur stable, i.e., when all eigenvalues of A lie in the open unit disc of the complex plane. These results rely on the spectral norm of the matrix \((I - A^{(N)} \otimes \cdots \otimes A^{(2)} \otimes A^{(1)})^{-1}\).

Recently, Huang et al. [16] proposed to apply the global form of well-known iterative methods in their tensor forms to the solution of a class of tensor equations via the Einstein product. The iterative methods in the present work are well suited to solve problems discussed in [16] when they are severely ill-conditioned; Huang et al. [16] do not consider this situation.

This paper first establishes some results on the conditioning of (3) motivated by [20, 28]. Then the tensor form of the Golub–Kahan bidiagonalization (GKB) process for the solution of severely ill-conditioned tensor equations is described. In particular, we consider the solution of severely ill-conditioned tensor equations of the forms (2) and (3). To this end, we apply results in [3] and generalize techniques described in [5]. We remark that the results discussed in Section 3 also can be applied to the solution of severely ill-conditioned problems of the form (1).

The remainder of this section introduces notation used throughout this paper. We also recall the concept of the contracted product between two tensors. Section 2 presents some results on the sensitivity of the solution of (3), and in Section 3 we describe a tensor form of the GKB process and discuss the use of Gauss-type quadrature to determine quantities of interest for Tikhonov regularization. Section 4 presents some numerical results, and Section 5 contains concluding remarks.

1.1 Notation

Let \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times I_{2}\times \cdots \times I_{N}}\) be an N-mode tensor, and let \(x_{i_{1}i_{2}{\ldots } i_{N}}\) denote the element \((i_{1},i_{2},\ldots,i_{N})\) of \({\mathcal{X}}\). For a real square matrix A with real eigenvalues, \({\lambda }_{\min \limits }(A)\) and \({\lambda }_{\max \limits }(A)\) stand for its smallest and largest eigenvalues, respectively. The set of all eigenvalues of A is denoted by σ(A). The symmetric and skew-symmetric parts of A are given by

$$ \mathcal{H}(A)=\frac{1}{2}\left( A+A^{T}\right)\quad \text{and} \quad \mathcal{S}(A)=\frac{1}{2}\left( A-A^{T}\right), $$

respectively, where the superscript T denotes transposition. The condition number of an invertible matrix A is defined by

$$ \text{cond}(A)=\|A\|_{2}\|A^{-1}\|_{2}, $$

where ∥⋅∥2 stands for the spectral norm. The largest and smallest singular values of a matrix A are denoted by \({\sigma }_{\max \limits }(A)\) and \({\sigma }_{\min \limits }(A)\), respectively. In particular, for an invertible matrix it holds

$$ \text{cond}(A)=\frac{{\sigma}_{\max}(A)}{{\sigma}_{\min}(A)}. $$

We use the notation \(\underset {i = 1}{\overset {{\ell }}{\bigotimes }} x_{i}:= x_{1}\otimes x_{2}\otimes \cdots \otimes x_{\ell }\) for the multi-dimensional Kronecker product. The vector \(\text {vec}({\mathcal{X}})\) is obtained by using the standard vectorization operator with respect to frontal slices of \({\mathcal{X}}\). The mode-n matricization of a tensor \({\mathcal{X}}\) is denoted by X(n); it arranges the mode-n fibers to be the columns of the resulting matrix. Recall that a fiber of a tensor is defined by fixing all indices but one; see [19] for more details.

1.2 Contracted product

The \(\boxtimes ^{N}\) product between two N-mode tensors

$$ \mathcal{X}\in \mathbb{R}^{I_{1}\times I_{2} \times {\cdots} \times I_{N-1} \times I_{N}} \quad \text{and} \quad \mathcal{Y}\in \mathbb{R}^{I_{1}\times {I}_{2} \times {\cdots} \times I_{N-1} \times \tilde{I}_{N}} $$

is defined as the \(I_{N} \times \tilde {I}_{N}\) matrix, whose (i,j)th entry is given by

$$ {[\mathcal{X} \boxtimes^{N} \mathcal{Y}]}_{ij}=\text{tr} \left( \mathcal{X}_{{::\dots:}i} \boxtimes^{N-1} \mathcal{Y}_{{::\dots:}j}\right),\qquad N=3,4,\ldots, $$

where

$$ \mathcal{X} \boxtimes^{2} \mathcal{Y}= \mathcal{X}^{T} \mathcal{Y}, \qquad \mathcal{X}\in \mathbb{R}^{I_{1}\times I_{2}}, \mathcal{Y}\in \mathbb{R}^{I_{1}\times \tilde{I}_{2}}, $$

and tr(⋅) denotes the trace of its argument. The \(\boxtimes ^{N}\) product is a special case of the contracted product [10]. Specifically, \({\mathcal{X}} \boxtimes ^{N} {\mathcal{Y}}\) is the contracted product of the N-mode tensors \({\mathcal{X}}\) and \({\mathcal{Y}}\) along the first N − 1 modes. For \({\mathcal{X}}, {\mathcal{Y}} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\), we have

$$ \left\langle {\mathcal{X}, \mathcal{Y}} \right\rangle = \text{tr}(\mathcal{X} \boxtimes^{N} \mathcal{Y}),\qquad N=2,3,\ldots, $$

and \(\left \| {\mathcal{X}} \right \|^{2}= \text {tr} ({\mathcal{X}} \boxtimes ^{N} {\mathcal{X}})= {\mathcal{X}} \boxtimes ^{(N+1)} {\mathcal{X}}\) for \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\). We conclude this section by recalling the following two results from [3].
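Before doing so, we note that the \(\boxtimes ^{N}\) product is inexpensive to evaluate numerically: since MATLAB stores arrays in column-major order, the first N − 1 modes can be collapsed by a single reshape. The one-line sketch below is our own illustration (the name boxN is ours):

```matlab
% [X boxtimes^N Y](i,j) = <X(:,...,:,i), Y(:,...,:,j)>; X and Y share their first N-1 modes
boxN = @(X, Y) reshape(X, [], size(X, ndims(X)))' * reshape(Y, [], size(Y, ndims(Y)));
```

In particular, trace(boxN(X, X)) equals \(\|\mathcal{X}\|^{2}\).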

Lemma 1

Let \({\mathcal{X}}\in \mathbb {R}^{I_{1}\times \cdots \times I_{n}\times {\cdots } \times I_{N}}\), \(A\in \mathbb {R}^{J_{n}\times I_{n}}\), and \(y\in \mathbb {R}^{J_{n}}\). Then

$$ \mathcal{X}\times_{n} A \bar{\times}_{n} y=\mathcal{X}\bar{\times}_{n} (A^{T}y). $$

Proposition 1

Let \({\mathcal{B}}\in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}\times m}\) be an (N + 1)-mode tensor with the column tensors \({\mathcal{B}}_{1},{\mathcal{B}}_{2},\ldots ,{\mathcal{B}}_{m}\in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\) and \(z=(z_{1},z_{2},\ldots ,z_{m})^{T}\in \mathbb {R}^{m}\). Then for an arbitrary (N + 1)-mode tensor \({\mathcal{A}}\) with N-mode column tensors \({\mathcal{A}}_{1},{\mathcal{A}}_{2},\ldots ,{\mathcal{A}}_{m}\), we have

$$ \mathcal{A} \boxtimes^{(N+1)} (\mathcal{B} \bar{\times}_{_{N+1}} z) = (\mathcal{A} \boxtimes^{(N+1)} \mathcal{B}) z. $$

2 Sensitivity analysis of the Stein tensor equation

This section mainly discusses the conditioning of the Stein tensor equation (3). To this end, we first consider a linear system of equations that is equivalent to (3), and then derive lower and upper bounds for the condition number of the matrix of this linear system.

It is well-known that (2) is equivalent to the linear system of equations

$$ \tilde{\mathcal{A}} x=b, $$

with \(x=\text {vec} ({\mathcal{X}})\), \(b=\text {vec} ({\mathcal{D}})\), and

$$ \tilde{\mathcal{A}}=\sum\limits_{j=1}^{N} {I^{(I_{N})}\otimes {\cdots} \otimes I^{(I_{j+1})}\otimes A^{(j)}\otimes I^{(I_{j-1})}\otimes {\cdots} \otimes I^{(I_{1})}}. $$

Moreover, we have (see [19])

$$ \mathcal{Y} = \mathcal{X}\times_{1}A^{(1)}\times_{2}A^{(2)} {\cdots} \times_{N}A^{(N)} \quad \Leftrightarrow \quad Y_{(1)} = A^{(1)}X_{(1)}\left(A^{(N)} \otimes {\cdots} \otimes A^{(2)}\right)^{T}. $$

As a result, it follows that (3) corresponds to the linear system of equations

$$ \mathcal{A}x:=\left( {I - {A^{(N)}} \otimes {\cdots} \otimes {A^{(2)}} \otimes {A^{(1)}}} \right)\text{vec}(\mathcal{X}) =\text{vec}(\mathcal{F}). $$
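This equivalence is easy to check numerically on small problems. The sketch below (our own illustration) does so for N = 3; it uses the nmode_product helper sketched after Definition 1 and the fact that vectorization with respect to frontal slices coincides with MATLAB's column-major X(:):

```matlab
I1 = 3; I2 = 4; I3 = 5;
A1 = randn(I1); A2 = randn(I2); A3 = randn(I3);
X  = randn(I1, I2, I3);
% left-hand side of (3) applied to X
MX  = X - nmode_product(nmode_product(nmode_product(X, A1, 1), A2, 2), A3, 3);
% matrix form (I - A3 kron A2 kron A1) vec(X)
lhs = (eye(I1*I2*I3) - kron(A3, kron(A2, A1))) * X(:);
norm(MX(:) - lhs)                 % of the order of machine precision
```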

We use the tensor norm

$$ \left\| {\mathcal{X}} \right\|=\left\| {\text{vec}(\mathcal{X})} \right\|_{2}. $$

Therefore, the sensitivity analyses of (2) and (3) are closely related to deriving bounds for the condition numbers of the matrices \(\tilde {\mathcal {A}}\) and \(\mathcal {A}\). For linear systems of equations \(\mathcal {A}x=b\) and \(\mathcal {A}({x+\varDelta x})={b+\varDelta b}\) with a non-singular matrix \(\mathcal {A}\), it is well known that

$$ \frac{{\left\| {\varDelta x} \right\|_{2}}}{{\left\| x \right\|_{2}}} \le \text{cond}(\mathcal{A})\frac{{\left\| {\varDelta b} \right\|_{2}}}{{\left\| b \right\|_{2}}}. $$

Moreover, if \(\left \|{\mathcal {A}^{- 1}}\right \|_{2}\left \|\varDelta \mathcal {A} \right \|_{2}<1\), then

$$ \frac{{\left\| {\varDelta x} \right\|_{2}}}{{\left\| x \right\|_{2}}} \le \frac{{\text{cond}(\mathcal{A})}}{{1 - \text{cond}(\mathcal{A}) \frac{{\left\| {\varDelta \mathcal{A}} \right\|_{2}}}{{\left\| \mathcal{A} \right\|_{2}}}}} \left\{ \frac{{\left\| {\varDelta \mathcal{A}} \right\|_{2}}} {{\left\| \mathcal{A} \right\|_{2}}} +\frac{{\left\| {\varDelta b} \right\|_{2}}} {{{\left\| b \right\|}_{2}}} \right\}; $$

see, e.g., [13] for further details on perturbation analysis for linear systems of equations.

Lower and upper bounds for the condition number of \(\tilde {\mathcal {A}}\) have been derived in [24] under suitable conditions. Therefore, we limit our discussion to the matrix \(\mathcal {A}\), which we will assume to be invertible. It is shown in [28] that

$$ \text{cond}(\mathcal{A}) \ge \frac{{{{\max }_{{{\lambda}_{{i_{k}}}} \in \sigma ({A^{(k)}}),k=1,2,\ldots,N}}\left| {1 - {{\lambda}_{{i_{1}}}}{{\lambda}_{{i_{2}}}} \ldots {{\lambda}_{{i_{N}}}}} \right|}}{{{{\min }_{{{\lambda}_{{i_{k}}}} \in \sigma ({A^{(k)}}),k=1,2,\ldots,N}} \left| {1 - {{\lambda}_{{i_{1}}}}{{\lambda}_{{i_{2}}}} {\ldots} {{\lambda}_{{i_{N}}}}} \right|}} $$

and

$$ \text{cond}(\mathcal{A}) \le \frac{{1 + \prod\nolimits_{i = 1}^{N} {{{\| {{A^{(i)}}} \|}_{2}}} }}{{1 - \prod\nolimits_{i = 1}^{N} {{{\| {{A^{(i)}}}\|}_{2}}} }}, $$
(4)

where the latter bound requires the inequality \(\|A^{(N)} \otimes {\cdots} \otimes A^{(2)} \otimes A^{(1)}\|_{2} < 1\) to hold. The following proposition presents an alternative upper bound.

Proposition 2

Assume that \(\prod \nolimits _{i = 1}^{N} {{\sigma }_{{\min \limits } }} ({A^{(i)}})>1\). Then

$$ \text{cond}(\mathcal{A}) \le \left( \frac{\prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }} ({A^{(i)}}){ }}{{\prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }} ({A^{(i)}}){ } - 1 }}\right) \left( 1+\prod\nolimits_{i = 1}^{N} {{{\| {{A^{(i)}}} \|}_{2}}}\right). $$

Proof

Define \(\mathcal {F}={A^{(N)}} \otimes {\cdots } \otimes {A^{(1)}}\) and let ρ(M) denote the spectral radius of the matrix M. Then

$$ \begin{array}{@{}rcl@{}} \|\mathcal{A}\|_{2} &\le& 1 + \| \mathcal{F} \|_{2}= 1 + \sqrt{\rho\left( \mathcal{F} \mathcal{F}^{T}\right)}\\ & = & 1+ {\prod\limits_{i = 1}^{N} {{\sigma}_{\max }} \left( {A^{(i)}}\right)}\\ & = & 1+ \prod\nolimits_{i = 1}^{N} {{{\| {{A^{(i)}}} \|}_{2}}}. \end{array} $$
(5)

Since \((I-\mathcal {F})^{-1} = -(I-\mathcal {F}^{-1})^{-1}\mathcal {F}^{-1}\) and

$$ \mathcal{F}^{-1}=\left( {A^{(N)}}\right)^{-1} \otimes {\cdots} \otimes \left( {A^{(1)}}\right)^{-1}, $$

we obtain

$$ \|\mathcal{F}^{-1}\|_{2}=\prod\nolimits_{i = 1}^{N} \|\left( A^{(i)}\right)^{-1}\|_{2}= \left( \prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }} \left( {A^{(i)}}\right)\right)^{-1} < 1 $$

and

$$ \| (I-\mathcal{F})^{-1}\|_{2} \le \| (I-\mathcal{F}^{-1})^{-1}\|_{2} \| \mathcal{F}^{-1}\|_{2} \le \| (I-\mathcal{F}^{-1})^{-1}\|_{2} \le \frac{1}{1-\|\mathcal{F}^{-1}\|_{2} }, $$

which shows the proposition. □

Remark 1

We note that the assumption in Proposition 2 differs from the one used in [28]. Because of the importance of determining upper bounds in perturbation analysis, we report the upper bounds provided by (4) and Proposition 2 for two matrices \({\mathcal A}\). The bounds and the exact condition numbers are plotted in Fig. 1. We used the MATLAB function “\(\text {cond}(\text {full}(\mathcal {A}))\)”; this allowed us to calculate the condition number of \(\mathcal {A}\) for small n only, due to lack of computer memory. When the matrix \(\mathcal {A}\) is large and sparse, we can compute an estimate of the condition number with the MATLAB function “condest(\(\mathcal {A}\)).”

Fig. 1 Computed bounds versus the exact condition numbers for different values of n; Case I (left) and Case II (right)

Case I: We let the matrices A^{(i)}, i = 1,2,3, be ill-conditioned “prolate” Toeplitz matrices. This kind of Toeplitz matrix can be generated with the MATLAB command A = gallery('prolate',n,w), which returns the n-by-n prolate Toeplitz matrix with parameter w. We set w = 0.11 for A^{(1)}, w = 0.12 for A^{(2)}, and w = 0.13 for A^{(3)}. Then \(\|A^{(3)}\otimes A^{(2)}\otimes A^{(1)}\|_{2} < 1\). Notice that A^{(1)}, A^{(2)}, and A^{(3)} are full matrices. We therefore do not report “condest(\(\mathcal {A}\))” for this case.

Case II: For i = 1,2,3, consider the matrices

$$ A^{(i)} =\frac{\nu}{h^{2}} \left[\begin{array}{lllll} 2&-1&&&\\ -1&2&-1&&\\ &\ddots&\ddots&\ddots&\\ &&-1&2&-1\\ &&&-1&2 \end{array}\right]+\frac{c_{i}}{4h} \left[\begin{array}{lllll} 3&-5&1&&\\ 1&3&-5&1&\\ &\ddots&\ddots&\ddots&1\\ &&1&3&-5\\ &&&1&3 \end{array}\right] \in \mathbb{R}^{n\times n}, $$
(6)

that are the sum of a symmetric tridiagonal matrix and a banded upper Hessenberg Toeplitz matrix with ν = 0.1, c1 = 1, c2 = 2, c3 = 3, and h = 1/(n + 1). It can be verified that \(\prod \limits _{i=1}^{3}{\sigma }_{\min \limits }(A^{(i)})>1\).
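The comparison reported in Fig. 1 can be set up along the following lines (a minimal MATLAB sketch, our own illustration; n is kept small because cond(full(·)) operates on the n³ × n³ matrix, and each bound is meaningful only when its assumption holds):

```matlab
n = 8;
% Case I: prolate Toeplitz matrices with w = 0.11, 0.12, 0.13
A = {gallery('prolate',n,0.11), gallery('prolate',n,0.12), gallery('prolate',n,0.13)};
% Case II: the matrices (6) -- uncomment to use instead of Case I
% nu = 0.1; h = 1/(n+1); c = [1 2 3];
% T = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
% H = 3*eye(n) - 5*diag(ones(n-1,1),1) + diag(ones(n-2,1),2) + diag(ones(n-1,1),-1);
% for i = 1:3, A{i} = (nu/h^2)*T + (c(i)/(4*h))*H; end
exact  = cond(eye(n^3) - kron(A{3}, kron(A{2}, A{1})));   % exact condition number
nrm    = cellfun(@(B) norm(B, 2), A);
smin   = cellfun(@(B) min(svd(B)), A);
bound4 = (1 + prod(nrm)) / (1 - prod(nrm));               % (4), requires prod(nrm) < 1
p      = prod(smin);
bound2 = p / (p - 1) * (1 + prod(nrm));                   % Proposition 2, requires p > 1
```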

We next derive new bounds for \(\text {cond}(\mathcal {A})\). This requires the following two propositions.

Proposition 3

Let \(A^{(i)}\in \mathbb {R}^{n_{i}\times n_{i}}\) and \(x_{i}\in \mathbb {R}^{n_{i}}\) for \(i = 1,2,\ldots,\ell\). Then

$$ \left( \underset{i = 1}{\overset{\ell}{\bigotimes}} x_{i}\right)^{T}\mathcal{H}\left( A^{(1)}\otimes A^{(2)}\otimes {\cdots} \otimes A^{(\ell)}\right) \underset{i = 1}{\overset{\ell}{\bigotimes}} x_{i} = \prod\limits_{i=1}^{\ell} {{x_{i}^{T}}\mathcal{H}(A^{(i)}) x_{i}}. $$
(7)

Proof

We show the assertion by induction. Let \(\ell = 2\). Using the fact that \({x_{i}^{T}}\mathcal {S}(A^{(i)})x_{i}=0\) for i = 1,2, we obtain (7) from the following equality (see [29]):

$$ \mathcal{H}(A^{(1)}\otimes A^{(2)}) = \mathcal{H}(A^{(1)})\otimes \mathcal{H}(A^{(2)})+ \mathcal{S}(A^{(1)})\otimes\mathcal{S}(A^{(2)}). $$

Now assume that (7) holds for \(\ell = k\). Let \(\ell = k + 1\) and define

$$ \mathcal{Y}_{k}=\underset{i = 2}{\overset{(k+1)}{\bigotimes}} x_{i}, \qquad \mathcal{Y}_{k+1}=x_{1}\otimes\mathcal{Y}_{k}, \qquad \mathcal{A}_{k}=A^{(2)}\otimes {\cdots} \otimes A^{(k+1)}. $$

Then

$$ \begin{array}{@{}rcl@{}} \mathcal{Y}_{k+1}^{T} \mathcal{H}(A^{(1)}\otimes A^{(2)}\otimes {\cdots} \otimes A^{(k+1)})\mathcal{Y}_{k+1} & =&(x_{1} \otimes \mathcal{Y}_{k})^{T}\mathcal{H}(A^{(1)}\otimes \mathcal{A}_{k})(x_{1} \otimes \mathcal{Y}_{k})\\ & = & ({x_{1}^{T}} \mathcal{H}(A^{(1)}) x_{1}) \times (\mathcal{Y}_{k}^{T}\mathcal{H}(\mathcal{A}_{k}) \mathcal{Y}_{k}). \end{array} $$

The proposition now follows from the induction hypothesis. □

Proposition 4

Let \(\mathcal {A}=I - {A^{(N)}} \otimes {\cdots } \otimes {A^{(2)}} \otimes {A^{(1)}}\). Then

$$ {{\lambda}_{\max }}(\mathcal{A}\mathcal{A}^{T}) \ge 1 + \prod\nolimits_{i = 1}^{N} {{\sigma}_{\max }^{2}} ({A^{(i)}})-2\prod\nolimits_{i = 1}^{N} {{y_{i}^{T}}\mathcal{H}} ({A^{(i)}}){y_{i}} $$

and

$$ {{\lambda}_{\min }}(\mathcal{A}\mathcal{A}^{T}) \le 1 + \prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) -2 \prod\nolimits_{i = 1}^{N} {{z_{i}^{T}}\mathcal{H}} ({A^{(i)}}){z_{i}}, $$

where the yi and zi are unit eigenvectors such that, for i = 1,2,…,N,

$$ A^{(i)}(A^{(i)})^{T}z_{i} = \sigma^{2}_{\min}(A^{(i)}) z_{i}\quad\text{and}\quad A^{(i)}(A^{(i)})^{T}y_{i} = \sigma^{2}_{\max}(A^{(i)}) y_{i}. $$

Proof

It is easy to verify that

$$ \begin{array}{@{}rcl@{}} \mathcal{A}{\mathcal{A}^{T}} & = & (I - {A_{N}} \otimes {\cdots} \otimes {A_{1}})(I - {A_{N}^{T}} \otimes {\cdots} \otimes {A_{1}^{T}})\\ & = & I + {A_{N}}{A_{N}^{T}} \otimes {\cdots} \otimes {A_{1}}{A_{1}^{T}} - 2\mathcal{H}({A_{N}} \otimes {\cdots} \otimes {A_{1}}). \end{array} $$
(8)

Let \(\mathcal {Y}=(y_{N}\otimes {\cdots } \otimes y_{1})\) and \(\mathcal {Z}=(z_{N}\otimes {\cdots } \otimes z_{1})\). Then it follows from Proposition 3 that

$$ \mathcal{Y}^{T}\mathcal{A}{\mathcal{A}^{T}}\mathcal{Y} = 1+ \prod\nolimits_{i = 1}^{N} {{\sigma}_{\max }^{2}} ({A^{(i)}})-2 \prod\nolimits_{i = 1}^{N} {{y_{i}^{T}}\mathcal{H}} ({A^{(i)}}){y_{i}} $$

and

$$ \mathcal{Z}^{T}\mathcal{A}{\mathcal{A}^{T}}\mathcal{Z} = 1 + \prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) - 2\prod\nolimits_{i = 1}^{N} {{z_{i}^{T}}\mathcal{H}} ({A^{(i)}}){z_{i}}. $$

This shows the desired result. □

Remark 2

If the matrices A(i), for i = 1,2,…,N, are positive definite, then

$$ {{\lambda}_{\min }}(\mathcal{A}\mathcal{A}^{T}) \le 1+ \prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}). $$

We note that the matrices A(i) are not required to be symmetric. Positive definiteness of the matrix A(i), i = 1,2,…,N, implies that \({\mathcal{H}}(A^{(i)})\) is symmetric positive definite. Furthermore, if

$$ \prod\nolimits_{i = 1}^{N} {{\sigma}_{\max}^{2}} ({A^{(i)}})\ge 2\prod\nolimits_{i = 1}^{N} {\lambda}_{\max }({\mathcal{H}} ({A^{(i)}})), $$

then the following lower bound follows from Proposition 4,

$$ \begin{array}{@{}rcl@{}} \text{cond}(\mathcal{A}) &\ge& \frac{\sqrt{1+\prod\nolimits_{i = 1}^{N} {{\sigma}_{\max }^{2}} ({A^{(i)}})- 2\prod\nolimits_{i = 1}^{N} {\lambda}_{\max} ({\mathcal{H}} ({A^{(i)}}))}}{\sqrt{1+\prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) }}\\&\ge& \frac{1}{\sqrt{1+ \prod\nolimits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) }}. \end{array} $$

Under additional assumptions, we can derive an alternative upper bound for the condition number. To this end, we need the following result, which is a consequence of Weyl’s Theorem [14, Theorem 4.3.1].

Proposition 5

Let the matrices \(A,B\in \mathbb {R}^{n\times n}\) be symmetric. Then

$$ \begin{array}{@{}rcl@{}} {\lambda}_{\max}(A + B) &\leq& {\lambda}_{\max}(A) + {\lambda}_{\max}(B),\\ {\lambda}_{\min}(A + B) &\geq& {\lambda}_{\min}(A) + {\lambda}_{\min}(B). \end{array} $$

Remark 3

Let \(\mathcal {F}={A^{(N)}} \otimes {\cdots } \otimes {A^{(1)}}\) and \(\lambda \in \sigma ({\mathcal{H}}(\mathcal {F}))\). Let \(\mathcal {E}_{N}\) denote the set of non-negative even numbers less than or equal to N. Then

$$ {{\lambda}_{max}(\mathcal{H}(\mathcal{F}))} \le \sum\limits_{r\in \mathcal{E}_{N}}{\frac{{N!}}{{r!\left( {N - r} \right)!}}{M_{S}^{r}}M_{H}^{N - r}} \le (M_{S}+M_{H})^{N}, $$

where

$$ M_{S} = \underset{i = 1,2,...,N}{\max} {\| {\mathcal{S}({A^{(i)}})} \|_{2}}\quad \text{and} \quad {M_{H}} = \underset{i = 1,2,...,N}{\max} {{\|\mathcal{H}({A^{(i)}})\|_{2}}}. $$

The result can be shown by considering the symmetric part of \(\mathcal {F}\). For simplicity, let N = 3. Then

$$ \begin{array}{@{}rcl@{}} \mathcal{H}(\mathcal{F})&=&\mathcal{H}(A^{(3)})\otimes\mathcal{H}(A^{(2)})\otimes \mathcal{H}(A^{(1)}) + \mathcal{H}(A^{(3)})\otimes\mathcal{S}(A^{(2)})\otimes \mathcal{S}(A^{(1)})\\ &&+\mathcal{S}(A^{(3)})\otimes\mathcal{H}(A^{(2)})\otimes\mathcal{S}(A^{(1)})+ \mathcal{S}(A^{(3)})\otimes \mathcal{S}(A^{(2)})\otimes \mathcal{H}(A^{(1)}). \end{array} $$

Using Proposition 5, we have

$$ \begin{array}{@{}rcl@{}} {{\lambda}_{max}(\mathcal{H}(\mathcal{F}))} &\le & \prod\limits_{i = 1}^{3} \|\mathcal{H}({A^{(i)}})\|_{2} + \|\mathcal{H}({A^{(3)}})\|_{2} \|\mathcal{S}({A^{(2)}})\|_{2} \|\mathcal{S}({A^{(1)}})\|_{2} \\ && +\|\mathcal{S}({A^{(3)}})\|_{2}\|\mathcal{H}({A^{(2)}})\|_{2}\|\mathcal{S}({A^{(1)}})\|_{2}\\ &&+ \|\mathcal{S}({A^{(3)}})\|_{2}\|\mathcal{S}({A^{(2)}})\|_{2}\|\mathcal{H}({A^{(1)}})\|_{2}\\ &\le& {M_{H}^{3}}+3M_{H}{M_{S}^{2}} \le (M_{H}+M_{S})^{3}. \end{array} $$

It follows from the above discussions that if \(1 + \prod \limits _{i = 1}^{N} {{\sigma }_{{\min \limits } }^{2}} ({A^{(i)}}) -2 {({M_{S}} + {M_{H}})^{N}}>0\), then we can derive an upper bound for \(\|\mathcal {A}^{-1}\|_{2}\) in the following manner: We obtain from (8) that

$$ \begin{array}{@{}rcl@{}} {{\lambda}_{\min }}(\mathcal{A}{\mathcal{A}^{T}}) &\ge& 1 + \prod\limits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) - 2{{\lambda}_{max}(\mathcal{H}(\mathcal{F}))}\\ &\ge& 1 + \prod\limits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) -2 {({M_{S}} + {M_{H}})^{N}}. \end{array} $$

Therefore,

$$ \|\mathcal{A}^{-1}\|_{2} \le \frac{1}{\sqrt{1 + \prod\limits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) -2 {({M_{S}} + {M_{H}})^{N}}}}. $$
(9)

Combining the inequalities (5) and (9) yields

$$ \text{cond}(\mathcal{A}) \le \frac{1+ {\prod\limits_{i = 1}^{N} {{\sigma}_{\max }} ({A^{(i)}})}}{\sqrt{1 + \prod\limits_{i = 1}^{N} {{\sigma}_{\min }^{2}} ({A^{(i)}}) - 2{({M_{S}} + {M_{H}})^{N}}}}. $$
(10)

To illustrate the bound (10), we let \(A^{(1)}=A^{(2)}=A^{(3)}=\tilde {A}\), where the matrix \(\tilde {A}\in {\mathbb R}^{n\times n}\) is defined by

$$ \tilde{A}=M+2rL+\frac{100}{(n+1)^{2}}I $$

with M = tridiag(− 1,2,− 1) and L = tridiag(0.5,0,− 0.5). We note that the matrix \(\tilde {A}\) is taken from [29,30,31]. The condition

$$ 1 + \prod\limits_{i = 1}^{3} {{\sigma}_{\min }^{2}} ({A^{(i)}}) -2 {({M_{S}} + {M_{H}})^{3}}>0 $$

holds for suitable choices of r and even values of n. Figure 2 displays graphs for the exact condition number cond(\(\mathcal {A}\)) and the bound (10). The computations are carried out on the same computer as for Fig. 1. In particular, the function cond(⋅) can be evaluated only for fairly small values of n.
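The quantities entering (10) can be computed as in the following MATLAB sketch (our own illustration; the test on the radicand corresponds to the positivity assumption stated above):

```matlab
n = 10;  r = 70;                                      % n even, as in the text
M    = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
Lmat = 0.5*diag(ones(n-1,1),-1) - 0.5*diag(ones(n-1,1),1);
Atil = M + 2*r*Lmat + (100/(n+1)^2)*eye(n);           % A^(1) = A^(2) = A^(3) = Atil
MH   = norm((Atil + Atil')/2, 2);                     % M_H (all three factors coincide)
MS   = norm((Atil - Atil')/2, 2);                     % M_S
smin = min(svd(Atil));  smax = max(svd(Atil));
rad  = 1 + smin^6 - 2*(MS + MH)^3;                    % radicand in (10)
if rad > 0
    bound10 = (1 + smax^3) / sqrt(rad);
    exact   = cond(eye(n^3) - kron(Atil, kron(Atil, Atil)));
end
```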

Fig. 2 Computed bounds by (10) versus the exact condition numbers for different values of n; r = 70 (left), r = 100 (center), and r = 500 (right)

We conclude this section by considering the situation when all the matrices A(i) are diagonalizable.

Remark 4

Let the matrices A(i) be diagonalizable, i.e., there are non-singular matrices Si and diagonal matrices Di such that \(A^{(i)}=S_{i}D_{i}S_{i}^{-1}\) for i = 1,2,…,N. Introduce

$$ \mathcal{A}=I-{A^{(N)}} \otimes {\cdots} \otimes {A^{(1)}}\text{~~~and~~~} \mathcal{S}={S_{N}} \otimes {\cdots} \otimes {S_{1}}. $$

Then \(\mathcal {A}=\mathcal {S}(I-D_{N} \otimes {\cdots } \otimes D_{1})\mathcal {S}^{-1}\). Hence, if 1∉σ(A(N) ⊗⋯ ⊗ A(1)), then

$$ \mathcal{A}^{-1}= \mathcal{S} (I-D_{N} \otimes {\cdots} \otimes D_{1})^{-1}\mathcal{S}^{-1}. $$

As a result, we get

$$ \begin{array}{@{}rcl@{}} \|\mathcal{A}^{-1} \|_{2} &\le& \prod\limits_{i = 1}^{N} \|S_{i}^{-1}\|_{2}\|S_{i}\|_{2} M_{1,D}= \prod\limits_{i = 1}^{N} \text{cond}(S_{i}) M_{1,D},\\ \|\mathcal{A} \|_{2} &\le& \prod\limits_{i = 1}^{N} \|S_{i}^{-1}\|_{2}\|S_{i}\|_{2} M_{2,D}= \prod\limits_{i = 1}^{N} \text{cond}(S_{i}) M_{2,D}, \end{array} $$

where

$$ \begin{array}{@{}rcl@{}} M_{1,D}&=&\max \left\{ \frac{1}{\left| 1 - {\lambda}_{\min }(D_{N} \otimes \cdots \otimes D_{1}) \right|},\ \frac{1}{\left| 1 - {\lambda}_{\max }(D_{N} \otimes \cdots \otimes D_{1}) \right|} \right\},\\ M_{2,D}&=&\max \left\{ \left| 1 - {\lambda}_{\min }(D_{N} \otimes \cdots \otimes D_{1}) \right|,\ \left| 1 - {\lambda}_{\max }(D_{N} \otimes \cdots \otimes D_{1}) \right| \right\} \le 1 + \prod\limits_{i = 1}^{N} \|D_{i}\|_{2}. \end{array} $$

We obtain the inequality

$$ \text{cond}(\mathcal{A}) \le \prod\limits_{i = 1}^{N} \left( \text{cond}(S_{i})\right)^{2} M_{1,D}M_{2,D}. $$

Let \(\prod \limits _{i = 1}^{N} \|D_{i}^{-1}\|_{2} <1\). Then analogously to the proof of Proposition 2, we have

$$ \text{cond}(\mathcal{A}) \le \prod\limits_{i = 1}^{N} \left( \text{cond}(S_{i})\right)^{2} \frac{M_{2,D}}{1-\prod\limits_{i = 1}^{N} \|D_{i}^{-1}\|_{2}}. $$

If \(\prod \limits _{i = 1}^{N} \|D_{i}\|_{2} <1\), then

$$ \text{cond}(\mathcal{A}) \le \prod\limits_{i = 1}^{N} \left( \text{cond}(S_{i})\right)^{2} \frac{M_{2,D}}{1-\prod\limits_{i = 1}^{N} \|D_{i}^{}\|_{2}} \le \prod\limits_{i = 1}^{N} \left( \text{cond}(S_{i})\right)^{2} \frac{2}{1-\prod\limits_{i = 1}^{N} \|D_{i}^{}\|_{2}}. $$

Finally, we note that if the matrices Di, i = 1,2,…,N, are all positive definite, then

$$ {{\lambda}_{\min }}({D_{N}} \otimes {\cdots} \otimes {D_{1}}) = \prod\limits_{i = 1}^{N} {{\lambda}_{\min }(D_{i})} \quad \text{and} \quad {{\lambda}_{\max }}({D_{N}} \otimes {\cdots} \otimes {D_{1}}) = \prod\limits_{i = 1}^{N} {{\lambda}_{\max }(D_{i})}. $$

3 The tensor form of GKB and Tikhonov regularization

We first describe the implementation of the Golub–Kahan bidiagonalization (GKB) process in the tensor framework. Subsequently, we discuss an application of the GKB process to Tikhonov regularization. For notational simplicity, we introduce the two linear operators \(\tilde {{\mathcal{M}}}, {{\mathcal{M}}}:\mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}} \to \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\) defined by

$$ \begin{array}{@{}rcl@{}} \tilde{\mathcal{M}}(\mathcal{X}) & := &\mathcal{X}\times_{1} A^{(1)}+ \mathcal{X}\times_{2} A^{(2)}+\cdots+\mathcal{X}\times_{N} A^{(N)}, \\ \mathcal{M}(\mathcal{X}) &:=&\mathcal{X}-\mathcal{X}\times_{1} A^{(1)}\times_{2} A^{(2)} \ldots\times_{N} A^{(N)}. \end{array} $$

The adjoint operators of \(\tilde {{\mathcal{M}}}\) and \({\mathcal{M}}\) are given by

$$ \begin{array}{@{}rcl@{}} \tilde{\mathcal{M}}^{*}(\mathcal{Y}) & := & \mathcal{Y}\times_{1} (A^{(1)})^{T}+\mathcal{Y}\times_{2} (A^{(2)})^{T}+\cdots+\mathcal{Y}\times_{N} (A^{(N)})^{T}, \\ \mathcal{M}^{*}(\mathcal{Y}) & := &\mathcal{Y}-\mathcal{Y}\times_{1} (A^{(1)})^{T}\times_{2} (A^{(2)})^{T}\ldots\times_{N} (A^{(N)})^{T}, \end{array} $$

for \({\mathcal{Y}} \in \mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\). The tensor equations (2) and (3) can be expressed as

$$ \begin{array}{@{}rcl@{}} \tilde{\mathcal{M}}(\mathcal{X})=\mathcal{D},\\ \mathcal{M}(\mathcal{X})=\mathcal{F}. \end{array} $$
(11)

We remark that the results and methods of this section also can be applied to other linear operators from \(\mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\) to \(\mathbb {R}^{I_{1}\times I_{2} \times {\cdots } \times I_{N}}\). For notational convenience, we discuss in the sequel results and methods for (11).
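In MATLAB, the operators \(\mathcal{M}\) and \(\mathcal{M}^{*}\) (and, analogously, \(\tilde{\mathcal{M}}\) and \(\tilde{\mathcal{M}}^{*}\)) can be represented conveniently by function handles. The sketch below is our own illustration for N = 3; it assumes coefficient matrices A1, A2, A3 corresponding to A^{(1)}, A^{(2)}, A^{(3)} and uses the nmode_product helper sketched in Section 1:

```matlab
% Stein operator M and its adjoint M* for given coefficient matrices A1, A2, A3
Mop  = @(X) X - nmode_product(nmode_product(nmode_product(X, A1, 1), A2, 2), A3, 3);
Madj = @(Y) Y - nmode_product(nmode_product(nmode_product(Y, A1', 1), A2', 2), A3', 3);
```

These handles are all the iterative method below requires; only evaluations of \(\mathcal{M}\) and \(\mathcal{M}^{*}\) are needed.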

Consider for the moment the linear system of equations Ax = b with a non-singular matrix \(A\in \mathbb {R}^{n\times n}\). Application of k steps of the GKB process to A with initial vector b produces the decompositions

$$ AU_{k}=V_{k+1}\bar{T}_{k},\qquad A^{T}V_{k}=U_{k}{T_{k}^{T}}, $$
(12)

where the matrices \(V_{k+1}\in \mathbb {R}^{n\times (k+1)}\) and \(U_{k}\in \mathbb {R}^{n\times k}\) have orthonormal columns, the matrix Vk is made up of the first k columns of Vk+ 1, the first column of Vk+ 1 is b/∥b2, the matrix \(\bar {T}_{k}\in \mathbb {R}^{(k+1)\times k}\) is lower bidiagonal with all diagonal and subdiagonal entries positive, and Tk is the leading k × k submatrix of \(\bar {T}_{k}\). We assume that k is small enough so that the decompositions (12) with the stated properties exist. This is the generic situation. Otherwise, the GKB process is said to break down. In the latter event, the computations simplify. We will not dwell on the handling of breakdowns. Thorough discussions on the GKB process can be found in [13, 25].

It is natural to extend the GKB process to tensor equations. Algorithm 1 describes the application of the GKB process to (11). We refer to the process so defined as the GKB based on tensor format (GKBBTF) process.

Algorithm 1 The GKB process based on tensor format (GKBBTF)
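Algorithm 1 appears as a figure in the original article and is therefore not reproduced here. The following MATLAB function is our own sketch of the underlying recurrences; it is consistent with relation (15) and the analogous relation for \(\mathcal{M}^{*}\) used in the proof of Theorem 1, but its line numbering does not correspond to that of Algorithm 1. The names gkbbtf, Mop, and Madj are ours.

```matlab
function [U, V, alpha, beta] = gkbbtf(Mop, Madj, F, k)
% GKB process based on tensor format (GKBBTF), sketched.
% Mop, Madj : function handles for M and its adjoint M*
% F         : right-hand side tensor; k : number of bidiagonalization steps
% Returns U{1..k}, V{1..k+1}, the diagonal alpha(1:k) and the subdiagonal
% beta(2:k+1) of bar(T)_k; beta(1) = ||F||.
alpha = zeros(k, 1);  beta = zeros(k+1, 1);
U = cell(1, k);  V = cell(1, k+1);
beta(1) = norm(F(:));  V{1} = F / beta(1);
for j = 1:k
    W = Madj(V{j});                           % M*(V_j)
    if j > 1, W = W - beta(j)*U{j-1}; end
    alpha(j) = norm(W(:));  U{j} = W / alpha(j);
    W = Mop(U{j}) - alpha(j)*V{j};            % so M(U_j) = alpha_j V_j + beta_{j+1} V_{j+1}
    beta(j+1) = norm(W(:));  V{j+1} = W / beta(j+1);
end
end
```

A breakdown corresponds to some coefficient alpha(j) or beta(j+1) vanishing; for simplicity, the sketch does not test for this.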

Assume that the first k steps of Algorithm 1 can be carried out without breakdown, i.e., without any coefficients αj and βj vanishing. The analogue of the lower bidiagonal matrix \(\bar {T}_{k}\in \mathbb {R}^{(k+1)\times k}\) in (12), which we also refer to as \(\bar {T}_{k}\), has the diagonal entries α1,α2,…,αk. They are computed in line 6 of Algorithm 1. The subdiagonal elements β2,β3,…,βk+ 1 of \(\bar {T}_{k}\) are computed in line 12 of the algorithm. We can express the matrix \(\bar {T}_{k}\) in the form

$$ \bar{T}_{k} = \left( {\begin{array}{*{20}{c}} {{T_{k}}}\\ {{{\beta}_{k + 1}}{e_{k}^{T}}} \end{array}} \right). $$

Theorem 1

Let \({\tilde {{\mathcal{V}}}}_{k}\), \({\tilde {{\mathcal{U}}}}_{k}\), \({\tilde {{\mathcal{W}}}}_{k}\), and \({\tilde {{\mathcal{W}}}}^{*}_{k}\) be (N + 1)-mode tensors with frontal slices \({{{\mathcal{V}}}}_{j}\), \({{{\mathcal{U}}}}_{j}\), \({\mathcal{W}}_{j}:={\mathcal{M}}({\mathcal{U}}_{j})\), and \({\mathcal{W}}_{j}^{*}:={\mathcal{M}}^{*}({\mathcal{V}}_{j})\), respectively, for j = 1,2,…,k, computed by Algorithm 1. Then

$$ \begin{array}{@{}rcl@{}} {\tilde{\mathcal{W}}}_{k}&=&{\tilde{\mathcal{V}}}_{k}\times_{(N+1)}{T_{k}^{T}}+ {\beta}_{k+1}\mathcal{Z}\times_{(N+1)}E_{k} = {\tilde{\mathcal{V}}}_{k+1}\times_{(N+1)}\bar{T}_{k}^{T}, \end{array} $$
(13)
$$ \begin{array}{@{}rcl@{}} {\tilde{\mathcal{W}}}_{k}^{*}&=&\mathcal{\tilde{\mathcal{U}}}_{k}\times_{(N+1)}T_{k}, \end{array} $$
(14)

where \({\mathcal{Z}}\) is an (N + 1)-mode tensor with k column tensors \(0,\ldots ,0,{\mathcal{V}}_{k+1}\). The last column of the matrix \(E_{k}=[0,\ldots ,0,e_{k}]\in {\mathbb R}^{k\times k}\) is the last column of the identity matrix of order k.

Proof

From lines 11 and 16 of Algorithm 1, we have

$$ \mathcal{M}(\mathcal{U}_{j-1})=\alpha_{j-1}\mathcal{V}_{j-1}+{\beta}_{j}\mathcal{V}_{j}. $$
(15)

Note that the (j − 1)st frontal slice of (13) is given by

$$ ({\tilde{\mathcal{V}}}_{k+1}\times_{(N+1)}\bar{T}_{k}^{T} )_{:\ldots:(j-1)} = \sum\limits_{\ell=1}^{k+1} { \mathcal{V}_{\ell} (\bar T_{k})_{\ell,j-1}} = \alpha_{j-1} \mathcal{V}_{j-1} + {\beta}_{j} \mathcal{V}_{j}. $$

Equation (13) now follows from (15) and the above relation.

To show (14), we first note that lines 2, 5, and 10 of Algorithm 1 yield

$$ \mathcal{M}^{*}(\mathcal{V}_{j})={\beta}_{j}\mathcal{U}_{j-1}+\alpha_{j}\mathcal{U}_{j},\quad j=1,2,\ldots~, $$

where \({\mathcal{U}}_{0}\) is defined to be zero. Equation (14) now follows by comparing the above equation and the j th frontal slice of the right-hand side of (14). □

We turn to the situation when the operator \({\mathcal M}\) in (11) is severely ill-conditioned and the right-hand side tensor \({{\mathcal{F}}}\) is contaminated by error. Let \(\hat {{\mathcal{F}}}\) denote the unknown error-free tensor associated with \({{\mathcal{F}}}\), and assume that \(\hat {{\mathcal{F}}}\) is in the range of \({\mathcal M}\). We would like to determine the solution of minimal norm, \(\hat {{\mathcal{X}}}\), of

$$ {\mathcal M}({\mathcal{X}})=\hat{\mathcal{F}}. $$

Straightforward solution of (11) may not give a meaningful approximation of \(\hat {{\mathcal{X}}}\) due to a large propagated error in the solution of (11) stemming from the error in \({\mathcal{F}}\). A common way to address this difficulty is to replace (11) by a nearby problem, whose solution is less sensitive to the error in \({\mathcal{F}}\). This replacement is known as regularization. One of the most popular regularization methods is due to Tikhonov. This regularization method replaces the solution of (11) by the minimization problem

$$ \min_{\mathcal{X}\in \mathbb{R}^{I_{1}\times I_{2} \times {\ldots} \times I_{N}}} \left\{\left\|\mathcal{M}(\mathcal{X})-\mathcal{F}\right\|^{2}+\mu\|\mathcal{X}\|^{2}\right\}. $$
(16)

The parameter μ > 0 is referred to as a regularization parameter. Its purpose is to balance the influence of the first term (the fidelity term) and the second term (the regularization term).

Let \({\mathcal{X}}_{k,\mu _{k}}={\tilde {{\mathcal{U}}}}_{k}\bar {\times }_{(N+1)} y_{k,\mu _{k}}\) be an approximate solution of (16), where \({\tilde {{\mathcal{U}}}}_{k}\) is defined as above. We obtain from (13), by using Lemma 1 and Proposition 1, that

$$ \begin{array}{@{}rcl@{}} \left\| \mathcal{M}(\mathcal{X}_{k,\mu_{k}})-\mathcal{F} \right\| & = & \left\| {\tilde{\mathcal{V}}}_{k+1}\times_{(N+1)}\bar {T_{k}^{T}}\bar{\times}_{(N+1)} y_{k,\mu_{k}} -\mathcal{F} \right\|\\ & = & \left\| {\tilde{\mathcal{V}}}_{k+1}\bar \times_{(N+1)}\bar T_{k} y_{k,\mu_{k}} -\mathcal{F} \right\|\\ & = & \left\| \tilde{\mathcal{V}}_{k+1}\boxtimes^{(N+1)} ({\tilde{\mathcal{V}}}_{k+1} \bar{\times}_{(N+1)}\bar{T}_{k} y_{k,\mu_{k}} -\mathcal{F}) \right\|_{2}\\ & = & \left\| (\tilde{\mathcal{V}}_{k+1}\boxtimes^{(N+1)} {\tilde{\mathcal{V}}}_{k+1})\bar{T}_{k} y_{k,\mu_{k}}-\tilde{\mathcal{V}}_{k+1} \boxtimes^{(N+1)}\mathcal{F} \right\|_{2}\\ & = & \left\| \bar{T}_{k} y_{k,\mu_{k}} -{\beta}_{1} e_{1} \right\|_{2}. \end{array} $$
(17)

This shows that (16) is equivalent to the following low-dimensional minimization problem

$$ \underset{y \in {\mathbb{R}^{k}}}{\min} \left\{ \left\| \bar{T}_{k} y -{\beta}_{1} e_{1} \right\|_{2}^{2} +\mu\|y\|^{2}_{2} \right\} = \underset{y \in {\mathbb{R}^{k}}}{\min} {\left\| {\left( {\begin{array}{*{20}{c}} {{{{ \bar T }}_{k}}}\\ {\sqrt \mu I} \end{array}} \right)y - {\beta}_{1} e_{1} }\right\|^{2}_{2}}. $$
(18)

The minimization problem on the right-hand side can be solved in only \({\mathcal O}(k)\) arithmetic floating point operations for each value of μ > 0; see Eldén [11] for details.
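For a fixed value of μ, the right-hand side of (18) is a small linear least-squares problem; a direct, if not O(k), way of solving it is sketched below (our own illustration; the workspace variables Tbar, beta1, and mu are assumed to hold \(\bar{T}_{k}\), β1, and μ):

```matlab
k   = size(Tbar, 2);                      % Tbar is the (k+1) x k matrix bar(T)_k
rhs = [beta1; zeros(2*k, 1)];             % beta_1 e_1, padded for the stacked system
y   = [Tbar; sqrt(mu)*eye(k)] \ rhs;      % Elden's scheme [11] does this in O(k) operations
```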

We turn to the choice of the regularization parameter and assume that an upper bound 𝜖 > 0 for the norm of the error in the right-hand side \({\mathcal F}\) is explicitly known. Then the discrepancy principle can be applied to determine the regularization parameter μ. The discrepancy principle prescribes that μ > 0 be chosen so that

$$ \left\| \mathcal{M}(\mathcal{X}_{k,\mu_{k}})-\mathcal{F} \right\| = \eta\epsilon $$
(19)

for some parameter η > 1 that is independent of 𝜖. This is a non-linear equation for μ > 0; see, e.g., Engl et al. [12] for a discussion on the discrepancy principle. Of course, other techniques for determining a suitable value of μ also can be applied; see, e.g., Kindermann and Raik [17, 18] for discussions.

It is not advisable to use the normal equations associated with the right-hand side of (18) in computations. However, the normal equations are convenient to apply when deriving expressions for determining a value of μ > 0 that approximately satisfies (19). Let yk,μ denote the solution of (18). Using the normal equations associated with the right-hand side of (18), yk,μ can be expressed as

$$ y_{k,\mu}={\beta}_{1} (\bar{T}_{k}^{T}\bar{T}_{k}+\mu I)^{-1} \bar{T}_{k}^{T}e_{1}. $$
(20)

Consequently,

$$ \begin{array}{@{}rcl@{}} \left\| \bar{T}_{k} y_{k,\mu} -{\beta}_{1} e_{1} \right\|_{2}^{2} & = & \left\| {\beta}_{1}\bar{T}_{k} (\bar{T}_{k}^{T}\bar{T}_{k}+\mu I)^{-1} \bar{T}_{k}^{T}e_{1} -{\beta}_{1} e_{1} \right\|_{2}^{2}\\ & = & \left\| (\bar{T}_{k} (\bar{T}_{k}^{T}\bar{T}_{k}+\mu I)^{-1} \bar{T}_{k}^{T} -I){\beta}_{1} e_{1} \right\|_{2}^{2}\\ & = & \left\| (\mu^{-1}\bar{T}_{k} \bar{T}_{k}^{T}+I)^{-1} {\beta}_{1} e_{1}\right\|_{2}^{2}\\ & = &{{\beta}_{1}^{2}} {e_{1}^{T}} (\mu^{-1}\bar{T}_{k} \bar{T}_{k}^{T}+I)^{-2} e_{1}. \end{array} $$

Introduce the functions

$$ \begin{array}{@{}rcl@{}} {\psi}_{k} (\mu) &=& {{\beta}_{1}^{2}} {e_{1}^{T}} (\mu^{-1}\bar{T}_{k} \bar{T}_{k}^{T}+I)^{-2} e_{1},\\ \phi_{k} (\mu) &=& {{\beta}_{1}^{2}} {e_{1}^{T}} (\mu^{-1}T_{k} {T_{k}^{T}}+I)^{-2} e_{1}. \end{array} $$
(21)

Proposition 6

Let η > 1 and 𝜖 > 0 be constants, and let the function ϕk(μ) be defined by (21). If μ > 0 satisfies

$$ \epsilon^{2} \le \phi_{k} (\mu)\le \eta^{2} \epsilon^{2}, $$
(22)

then the associated solution yk,μ of (18) is such that

$$ \epsilon \le \left\| T_{k} y_{k,\mu} -{\beta}_{1} e_{1} \right\|_{2}\le \eta \epsilon, $$

and \({\mathcal{X}}_{k,\mu }= {\tilde {{\mathcal{V}}}}_{k}\bar {\times }_{(N+1)} y_{k,\mu }\) fulfills

$$ \epsilon \le \left\| \mathcal{M}(\mathcal{X}_{k,\mu})-\mathcal{F} \right\|\le \eta \epsilon. $$
(23)

Moreover,

$$ {\psi}_{k}(\mu)=\left\| \mathcal{M}(\mathcal{X}_{k,\mu})-\mathcal{F} \right\|^{2}. $$
(24)

Proof

It can be shown that ϕk(μ) ≤ ψk(μ) for μ ≥ 0. A proof based on interpreting ϕk(μ) as a Gauss quadrature rule and ψk(μ) as a Gauss–Radau quadrature rule with a fixed node at the origin is provided in [8] in the context of solving large linear systems of equations with a severely ill-conditioned matrix and an error-contaminated right-hand side. Equation (24) follows from (17). □

The following result is easy to show. A proof can be found in [8].

Proposition 7

Let ϕk(μ) be defined by (21). Then the function μϕk(1/μ) is strictly decreasing and convex for μ > 0. Moreover,

$$ \underset{\mu \to \infty }{\lim } {\phi_{k}}(1/\mu ) = {{\beta}_{1}^{2}}. $$

In particular, Newton’s method applied to compute the solution μk of the equation

$$ \phi_{k}(1/\mu)=\eta^{2} \epsilon^{2} $$
(25)

with an initial approximate solution μ0 ≥ 0 to the left of the solution converges monotonically and quadratically. For instance, one may choose μ0 = 0 when the function \(\mu \rightarrow \phi _{k}(1/\mu )\) and its derivative are suitably defined at μ = 0.

It follows from Proposition 7 that Newton's method for solving (25) is easy to implement, because the method does not have to be safeguarded when starting with μ0 = 0. This is discussed in [8]. However, a cubically convergent zero-finder described in [26] and applied in [7, 26] requires fewer iterations and less CPU-time.

The most expensive part of the computations with Algorithm 1 is the evaluation of \({\mathcal M}^{*}({\mathcal V}_{j})\) and \({\mathcal M}({\mathcal U}_{j})\) in lines 5 and 11 of the algorithm. With the aim of keeping the computational effort required by Algorithm 1 as small as possible, we would like to choose the number of steps, k, of the algorithm small, but large enough to be able to satisfy (23). To achieve this, we proceed as follows: Carry out a few steps k > 0 with Algorithm 1, say k = 2, and compute the solution μk > 0 of ϕk(1/μ) = 𝜖2. If ψk(1/μk) ≤ η2𝜖2, then (23) holds for

$$ \mathcal{X}_{k,\mu_{k}}= {\tilde{\mathcal{U}}}_{k}\bar{\times}_{(N+1)} y_{k,\mu_{k}}, $$
(26)

where \(y_{k,\mu _{k}}\) is defined by (20) with μ = μk. If, instead, ψk(1/μk) > η2𝜖2, then we increase k by one, i.e., we set k = k + 1 and carry out one more step with Algorithm 1. We increase the number of steps until (23) holds. Typically, only a few steps of Algorithm 1 are required to satisfy (23). The required number of evaluations of the expressions \({\mathcal M}^{*}({\mathcal V}_{j})\) and \({\mathcal M}({\mathcal U}_{j})\) typically is fairly small. This is illustrated in Section 4. Algorithm 2 summarizes the computations required for Tikhonov regularization based on the GKBBTF process.

Algorithm 2 Tikhonov regularization based on the GKBBTF process
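Algorithm 2 likewise appears as a figure in the original article. The driver below is our own schematic rendering of the strategy just described: for simplicity it runs the gkbbtf sketch given after Algorithm 1 for kmax steps up front (Algorithm 2 extends the factorization one step at a time), determines μ from φ_k(μ) = ε² via MATLAB's fzero in the variable m = 1/μ instead of Newton's method or the zero-finder of [26], and accepts μ when ψ_k(μ) ≤ η²ε². The inputs Mop, Madj, F, epsilon, eta, and kmax are assumed to be given.

```matlab
[U, V, alpha, beta] = gkbbtf(Mop, Madj, F, kmax);
beta1 = beta(1);
for k = 2:kmax
    Tbar = zeros(k+1, k);                            % bar(T)_k, lower bidiagonal
    for j = 1:k, Tbar(j,j) = alpha(j); Tbar(j+1,j) = beta(j+1); end
    Tk  = Tbar(1:k, :);
    phi = @(mu) beta1^2 * norm((Tk*Tk'/mu     + eye(k))   \ eye(k,1))^2;    % cf. (21)
    psi = @(mu) beta1^2 * norm((Tbar*Tbar'/mu + eye(k+1)) \ eye(k+1,1))^2;  % cf. (21), (24)
    m   = fzero(@(m) phi(1/m) - epsilon^2, [1e-12, 1e12]);  % bracket assumes epsilon < beta1
    mu  = 1/m;
    if psi(mu) <= (eta*epsilon)^2                     % discrepancy principle (23) satisfied
        y = (Tbar'*Tbar + mu*eye(k)) \ (beta1 * Tbar' * eye(k+1,1));        % cf. (20)
        X = zeros(size(F));
        for j = 1:k, X = X + y(j)*U{j}; end           % X_{k,mu}, cf. (26)
        break
    end
end
```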

This section focused on the solution of equation (11). However, the solution method described also can be applied to the solution of the more general tensor equation (1).

4 Numerical examples

This section presents a few numerical experiments that illustrate the performance of the method described in Section 3. We limit ourselves to the case N = 3 in (2) and (3). For notational simplicity, we write (2) and (3) in the form \({\mathcal{L}}({\mathcal{X}})={\mathcal{C}}\). In all test problems, the right-hand side tensor is contaminated by an error tensor \({\mathcal{E}}\) with normally distributed random entries with zero mean. The entries are scaled to yield a specified noise level

$$ \nu :=\frac{\|\mathcal{E}\|}{\|\mathcal{C}\|}. $$

All computations were carried out using the Tensor Toolbox [1] in MATLAB version R2018b with an Intel Core i7-4770K CPU @ 3.50-GHz processor and 24-GB RAM.

We report the relative errors

$$ e_{k}:= \frac{{\Vert {{\mathcal{X}_{{\mu_{k}},k}} - \hat{\mathcal{X}}} \Vert}} {{\Vert \hat{\mathcal{X}} \Vert}}, $$

where \(\hat {{\mathcal{X}}}\) denotes the desired solution of the problem with error-free right-hand side tensor \(\hat {{\mathcal{C}}}\) associated with \({\mathcal{C}}\), and \({\mathcal{X}}_{{\mu _{k}},k}\) denotes the k th computed approximation determined by the algorithms.

In the computations for Tables 1, 5, and 7, the iterations were terminated as soon as an approximate solution \({\mathcal{X}}_{{\mu _{k}},k}\) was found such that the discrepancy principle

$$ \left\| \mathcal{L}(\mathcal{X}_{{\mu_{k}},k})-\mathcal{C} \right\|\leq \eta \varepsilon, $$
(27)

was satisfied, where η = 1.01 is a user-chosen constant and ε is the norm of the error in \({\mathcal{C}}\), i.e., \(\varepsilon =\|{\mathcal{E}}\|\). Our numerical results illustrate that the performance of the algorithms is not very sensitive to the choice of η(≥ 1); we illustrate the convergence behavior of the algorithms for several values of η in Example 5. We remark that the left-hand side of (27) can be computed inexpensively by using (17) with \({\mathcal{M}}\) and \({\mathcal{F}}\) replaced by \({\mathcal{L}}\) and \({\mathcal{C}}\), respectively. We compare Algorithm 2 of the present paper to methods that apply the Hessenberg and flexible Hessenberg processes based on tensor format to reduce the given large problem to smaller ones. These methods are used together with Tikhonov regularization and are described in [24]. The discrepancy principle is used to determine the regularization parameter. We refer to the method that uses the Hessenberg process based on tensor format together with Tikhonov regularization as the HTBTF method; when the Hessenberg process based on tensor format is replaced by the flexible Hessenberg process based on tensor format, the resulting method is referred to as the FHTBTF method.

Table 1 Comparison results for Example 1 with respect to stopping criterion (27)

When the coefficient matrices are dense and not very large, the FHTBTF method outperforms the other methods in our comparison. However, for large and sparse coefficient matrices, FHTBTF requires more CPU-time than Algorithm 2. For large problems, the FHTBTF method requires many iterations to satisfy the stopping criterion (27). For the results reported in Tables 2, 3, 4, 6, 8, 9, and 10 we therefore used the alternative stopping criterion

$$ \frac{{\| {{\mathcal{X}_{{\mu_{k}},k}} - {\mathcal{X}_{{\mu_{k - 1}},k - 1}}} \|}} {{\| {{\mathcal{X}_{{\mu_{k - 1}},k - 1}}} \|}} \le \tau $$
(28)

for a user-specified value of the parameter τ > 0. Moreover, at most 300 iterations were allowed. In the FHTBTF method, we used two steps of the stabilized biconjugate gradient method based on tensor format (BiCGSTABBTF) [9] as inner iteration; see [24] for further details. Choosing a smaller value of τ results in a larger number of iterations being required to satisfy (28). We illustrate the performance of Algorithm 2 for several values of τ in Example 5.

We report the number of iterations and the CPU-time (in seconds) required by the methods in our comparison to compute approximate solutions that satisfy the specified stopping criteria. Section 4.1 discusses the solution of severely ill-conditioned problems of the form (2) and Section 4.2 considers severely ill-conditioned problems of the form (3). The blurring matrices used in Section 4.1 can be expressed as

$$ I \otimes I \otimes A^{(1)} + I \otimes A^{(2)} \otimes I + A^{(3)} \otimes I \otimes I, $$

while the blurring matrices applied in Section 4.2 can be written as

$$ {I - {A^{(3)}} \otimes {A^{(2)}} \otimes {A^{(1)}}}, $$

where each \(A^{(\ell)}\) is either a Gaussian Toeplitz matrix \(A = [a_{ij}]\) given by

$$ a_{ij} = \left\{\begin{array}{ll} \frac{1}{\sigma \sqrt{2\pi}}\exp\left( -\frac{(i-j)^{2}}{2\sigma^{2}}\right) ,&|i-j|\leq r,\\ 0,&\text{otherwise}, \end{array}\right. $$
(29)

or a Toeplitz matrix with entries

$$ a_{ij} = \left\{\begin{array}{ll} \frac{1}{2r-1},&|i-j|\leq r,\\ 0,&\text{otherwise.} \end{array}\right. $$
(30)
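Both families of Toeplitz blurring matrices are easy to generate; the one-line constructors below are our own illustration (they assume r < n, and the names gauss_blur and box_blur are ours):

```matlab
% Symmetric banded Toeplitz matrices (29) and (30), built from their first columns
gauss_blur = @(n, sigma, r) toeplitz([exp(-((0:r)'.^2)/(2*sigma^2))/(sigma*sqrt(2*pi)); zeros(n-r-1,1)]);
box_blur   = @(n, r) toeplitz([ones(r+1,1)/(2*r-1); zeros(n-r-1,1)]);
% e.g., A1 = gauss_blur(576, 2, 7);  A2 = box_blur(787, 2);
```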

At the end of each subsection, we also present some experiments for Sylvester and Stein tensor equations with the coefficient matrices given in Case II of Remark 1. Blurring matrices of the types (29) and (30) have been used in the literature to test iterative schemes for image deblurring; see [4,5,6, 15].

4.1 Experimental results for severely ill-conditioned Sylvester tensor equations

We consider (2) with coefficient matrices that are dense and very ill-conditioned. This kind of equation arises from the discretization of a fully three-dimensional microscale dual-phase lag problem by a mixed-collocation finite difference method; see [21,22,23] for details.

Example 1

Consider (2) with the matrices \(A^{(\ell )}=[a_{ij}]\in \mathbb {R}^{n\times n}\) for \(\ell = 1,2,3\) defined by

$$ a_{ij}= \left\{\begin{array}{ll} -2\left( \frac{\pi}{L} \right)^{2}\frac{(-1)^{i+j}}{\sin^{2}\left[\frac{1}{2} \left( \frac{2\pi \xi_{j}}{L}-x_{i}\right) \right]}, & i\ne j\\ -\left( \frac{\pi}{L} \right)^{2}\left( \frac{n^{2}+2}{3} \right),& i=j, \end{array}\right. $$

where \(x_{i}=\frac {2\pi (i-1)}{n}\), \(\xi _{j}=\frac {(j-1)L}{n}\), \(i,j=1,2,\dots ,n\), and L = 300. When n is odd, the coefficient matrices \(A^{(\ell)}\) are well-conditioned and the problem can be solved successfully with a block iterative method; see [3]. However, when n is even, the coefficient matrices are very ill-conditioned. This is illustrated in [24, Example 5.4]. The error-free right-hand side of (2) is constructed so that \(\hat {{\mathcal{X}}}=\text {randn}(n,n,n)\) is the exact solution, i.e., \(\hat {{\mathcal{X}}}\) has normally distributed random entries with mean zero and variance one. Table 1 shows the numerical results obtained. Computed approximate solutions and the exact solution are displayed in Fig. 3.

Fig. 3 (a) Exact solution on grid 180 × 180 × 180, (b) noisy data with noise level ν = 0.01, (c) restored solution by HTBTF, (d) FHTBTF, and (e) Algorithm 2

Table 1 shows the FHTBTF method to perform better than the other methods. This is typical for problems with dense coefficient matrices.

We next turn to an image restoration problem, in which the error-free right-hand side in (2) is constructed so that the exact solution is a hyperspectral image. Here, the matrices A(i), i = 1,2,3, are sparse and we will see that Algorithm 2 performs the best.

Example 2

We consider the situation when the exact solution of (2) is a tensor of order 1019 × 1337 × 33 that represents a hyperspectral image of a natural scene. The coefficient matrices A^{(1)}, A^{(2)}, and A^{(3)} are defined by (30) with suitable dimensions and with r = 2 for A^{(1)} and A^{(2)}, and r = 3 for A^{(3)}. This gives cond(A^{(1)}) = 5.26 ⋅ 10^{16}, cond(A^{(2)}) = 1.75 ⋅ 10^{17}, and cond(A^{(3)}) = 4.75 ⋅ 10^{16}. Thus, all the coefficient matrices are numerically singular.

As mentioned above, the (F)HTBTF methods cannot be used efficiently with the stopping criterion (27). Therefore, we used the stopping criterion (28) for all algorithms. The results are reported in Table 2. Algorithm 2 can be seen to perform better than the HTBTF and FHTBTF methods. Table 7 illustrates that the computational effort increases as the error in the tensor \({\mathcal{C}}\) decreases. Here, Algorithm 2 was terminated as soon as (27) was satisfied. The contaminated and restored images are displayed in Figs. 4 and 5.

Table 2 Results for Example 2 using the stopping criterion (28) with τ = 2 ⋅ 10^{−2}
Fig. 4 Example 2. (a) Exact image, (b) blurred and noisy image with noise level ν = 0.01, and (c) restored image by Algorithm 2 using the stopping criterion (27)

Fig. 5 Example 2. (a) Blurred and noisy image with noise level ν = 0.01, restored images by (b) HTBTF, (c) FHTBTF, and (d) Algorithm 2 using the stopping criterion (28)

Example 3

Consider the Sylvester tensor equation (2) whose coefficient matrices A^{(1)}, A^{(2)}, and A^{(3)} are defined by (30) with r = 30 for A^{(1)}, r = 20 for A^{(2)}, and r = 20 for A^{(3)}. We examine the performance of the algorithms for the following cases:

Case I: Let the exact solution of (2) be the hyperspectral image of order 1019 × 1337 × 33 from the previous example. Here, we have cond(A^{(1)}) = 1.66 ⋅ 10^{18}, cond(A^{(2)}) = 4.13 ⋅ 10^{19}, and cond(A^{(3)}) = 5.59 ⋅ 10^{18}.

Case II: Let \(\hat {{\mathcal{X}}}=\text {randn}(1000,500,100)\) be the exact solution of (2), i.e., \(A^{(1)}\in \mathbb {R}^{1000\times 1000}\), \(A^{(2)}\in \mathbb {R}^{500\times 500}\), and \(A^{(3)}\in \mathbb {R}^{100\times 100}\), for which cond(A^{(1)}) = 1.74 ⋅ 10^{18}, cond(A^{(2)}) = 8.07 ⋅ 10^{17}, and cond(A^{(3)}) = 3.66 ⋅ 10^{18}.

Results for these cases are reported in Table 3. The table shows Algorithm 2 to converge faster for Case I. However, the HTBTF method outperforms the other approaches for the noise level 0.01 for Case II. We remark that the performance of the methods when applied to the Stein tensor equation is different when increasing r in the coefficient matrices; see Example 7 for more details.

Table 3 Results for Example 3 using the stopping criterion (28) with τ = 2 ⋅ 10^{−2}

We turn to results for the Sylvester tensor equation with the coefficient matrices given in Case II of Remark 1. This equation arises from the discretization of a three-dimensional convection-diffusion equation on a uniform grid, using a standard finite difference approximation for the diffusion term and a second-order convergent scheme (Fromm's scheme) for the convection term with mesh size h = 1/(n + 1); see [2, 3]. This problem was examined in [3] for n × n × n grids with n ≤ 110, for which the corresponding matrix \(\mathcal {A}\) is not severely ill-conditioned. However, the condition number increases with the value of n.

Example 4

Consider the Sylvester tensor equation for N = 3 with the coefficient matrices A() for = 1,2,3 given in the second case of Remark 1. Table 4 shows that Algorithm 2 is an efficient solver. When the noise level is small, FHTBTF requires more CPU-time than HTBTF and produces slightly more accurate approximate solutions.

Table 4 Results for Example 4 using stopping criterion (28)

4.2 Experimental results for severely ill-conditioned Stein tensor equations

In this subsection, we consider the solution of three severely ill-conditioned problems of the form (3). For the first two examples, error-free right-hand sides are constructed so that the exact solutions are color images. The iterations with the algorithms were terminated with the stopping criteria (27) or (28). We conclude this subsection by reporting the results for Stein tensor equations with the coefficient matrices given in Case II of Remark 1.

Example 5

The “exact” image is represented by a 576 × 787 × 3 tensor and is displayed in Fig. 6a. The coefficient matrices of (3) are A^{(1)}, which is defined by (29), and A^{(2)} and A^{(3)}, which are given by (30); all have suitable dimensions. We set r = 7, σ = 2 for A^{(1)}, and r = 2 for A^{(2)} and A^{(3)}. Then cond(A^{(1)}) = 1.79 ⋅ 10^{6}, cond(A^{(2)}) = 4.05 ⋅ 10^{17}, and cond(A^{(3)}) = 6.45 ⋅ 10^{49}. We found that when using the stopping criterion (28), the performance of Algorithm 2 is not very sensitive to small changes in η(> 1) and τ; see Fig. 7 for details.

Fig. 6 Example 5. (a) Exact image, (b) blurred and noisy image with noise level ν = 0.01, restored image by (c) HTBTF, (d) FHTBTF, and (e) Algorithm 2 using the stopping criterion (27)

Fig. 7 Convergence history of Algorithm 2 for Example 5: relative error versus iteration number for different values of η and τ, with noise level 0.01

Example 6

Let the exact solution of (3) be of order 1019 × 1337 × 33; it represents the hyperspectral image shown in Fig. 8. The coefficient matrices A^{(1)}, A^{(2)}, and A^{(3)} of suitable dimensions are defined by (30) with r = 12 for A^{(1)}, r = 2 for A^{(2)}, and r = 6 for A^{(3)}. Then cond(A^{(1)}) = 2.05 ⋅ 10^{18}, cond(A^{(2)}) = 1.75 ⋅ 10^{17}, and cond(A^{(3)}) = 2.44 ⋅ 10^{17}.

Fig. 8 Example 6. (a) Exact image, (b) blurred and noisy image with noise level ν = 0.01, and (c) restored image by Algorithm 2 using the stopping criterion (27)

Tables 5, 6, 7, and 8 show results for Examples 5 and 6. Algorithm 2 can be seen to be superior to the other methods examined. The exact, contaminated, and restored images are shown in Figs. 6, 8, and 9.

Table 5 Results for Example 5 using the stopping criterion (27)
Table 6 Results for Example 5 using the stopping criterion (28) with τ = 2 ⋅ 10^{−2}
Table 7 Results for Algorithm 2 with the stopping criterion (27)
Table 8 Results for Example 6 using the stopping criterion (28) with τ = 3 ⋅ 10^{−2}
Fig. 9 Example 6. (a) Blurred and noisy image, restored images by (b) HTBTF, (c) FHTBTF, and (d) Algorithm 2 using the stopping criterion (28) with τ = 3 ⋅ 10^{−2}

Similarly to Example 3, we consider coefficient matrices (30) with larger values of r. In contrast to the Sylvester tensor equations, all algorithms perform better when the value of r is increased. For the Stein tensor equation, we note that Algorithm 2 can be competitive with the (F)HTBTF methods.

Example 7

Consider the Stein tensor equation (3) with the matrices \(A^{(\ell)}\) given by (30) for \(\ell = 1,2,3\). Let r = 40 for A^{(1)}, r = 50 for A^{(2)}, and r = 30 for A^{(3)}. Table 9 reports results for the following two cases:

Table 9 Results for Example 7 using the stopping criterion (28) with τ = 3 ⋅ 10^{−2}
Case I: Let the exact solution of (3) be the hyperspectral image of order 1019 × 1337 × 33 mentioned above. We have cond(A^{(1)}) = 1.18 ⋅ 10^{18}, cond(A^{(2)}) = 4.87 ⋅ 10^{18}, and cond(A^{(3)}) = 3.12 ⋅ 10^{114}.

Case II: Let \(\hat {{\mathcal{X}}}=\text {randn}(1000,500,100)\) be the exact solution of (3); i.e., \(A^{(1)}\in \mathbb {R}^{1000\times 1000}\), \(A^{(2)}\in \mathbb {R}^{500\times 500}\), and \(A^{(3)}\in \mathbb {R}^{100\times 100}\). We have cond(A^{(1)}) = 2.48 ⋅ 10^{19}, cond(A^{(2)}) = 6.70 ⋅ 10^{17}, and cond(A^{(3)}) = 5.70 ⋅ 10^{18}.

The results reported in Table 9 show Algorithm 2 to perform better than (F)HTBTF for larger values of r.

We conclude this subsection by reporting results for a Stein tensor equation, whose coefficient matrices are given by (6).

Example 8

Let \(\hat {{\mathcal{X}}}=\text {randn}(n,n,n)\) be the exact solution of equation (3) and let the coefficient matrices A^{(1)}, A^{(2)}, and A^{(3)} be defined by (6). We observe that the (F)HTBTF methods perform less well when the problem size is increased. Therefore, we used a slightly larger value of τ for n = 200. Table 10 shows that HTBTF is superior to Algorithm 2 for n = 120. When n = 200, Algorithm 2 outperforms (F)HTBTF.

Table 10 Results for Example 8 using the stopping criterion (28)

5 Conclusions

This paper first presents some results on the conditioning of the Stein tensor equation. Then it describes a tensor form of the Golub–Kahan bidiagonalization process and applies it to the solution of severely ill-conditioned linear tensor equations, such as Sylvester and Stein tensor equations. The iterative scheme also can be applied to the solution of general linear tensor equations defined by a linear operator on \(\mathbb {R}^{n_{1}\times n_{2}\times {\cdots } \times n_{k}}\). We provide new theoretical results and present some numerical examples, with applications to high-dimensional PDEs and color image restoration, that illustrate the applicability and effectiveness of the proposed iterative method.