1 Introduction

Throughout this paper, vectors are written as italic lowercase letters such as u, v, matrices as uppercase letters such as A, B, and tensors as calligraphic capital letters such as \( {\mathscr {A}} \), \( {\mathscr {B}} \). Let \(\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}\) denote the set of \({I_1} \times {I_2} \times {I_3}\) tensors over the real field \(\mathbb {R}\). The zero tensor \( {\mathscr {O}} \) is the tensor with all entries equal to zero. A slice is a two-dimensional section of a tensor, obtained by fixing all but two indices. The kth frontal slice \( \textbf{X}_{::k} \) of a third-order tensor \( {{\mathscr {X}}} \in \mathbb {R}^{{I_1} \times {I_2} \times {I_3}} \) is denoted by \( {{\mathscr {X}}}^{(k)} \). The tensor Frobenius norm \(||\mathscr {A} ||_F = \sqrt{\left\langle {{\mathscr {A}},\mathscr {A}} \right\rangle }\) is induced by the tensor inner product

$$\begin{aligned} \left\langle {{\mathscr {A}},{\mathscr {B}}} \right\rangle = \sum \limits _{i = 1}^{I_1}\sum \limits _{j = 1}^{I_2}\sum \limits _{k = 1}^{I_3} a_{ijk}b_{ijk}, \end{aligned}$$

where \( {{\mathscr {A}}} = (a_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \), \( {{\mathscr {B}}} = (b_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \).

In this paper, we consider the following tensor least squares problem.

Problem 1.1

Given tensors \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\), \({{\mathscr {B}} } \in \mathbb {R}^{{p} \times {q} \times {n}}\), and \({{\mathscr {C}}} \in \mathbb {R}^{{m} \times {q} \times {n}}\), find a tensor \( {\tilde{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) such that

$$\begin{aligned} ||{{\mathscr {A}}}{*_L}{\tilde{{\mathscr {X}}}}{*_L}{{\mathscr {B}}} - {{{\mathscr {C}}}}||_F^2 = \mathop {\min }\limits _{{{\mathscr {X}}}\in \mathbb {R}^{ {l} \times {p} \times {n}}} ||{{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}} - {\mathscr {C}}||_F^2. \end{aligned}$$
(1.1)

Here we define the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) by

$$\begin{aligned} {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}. \end{aligned}$$

The operator \( *_L \) in Problem 1.1 is a tensor-tensor product with an invertible linear transform, which was first proposed by Kernfeld, Kilmer and Aeron [27]. It is a generalization of the t-product [28] and the cosine transform product (the \( *_c \) product). The definitions of, and connections between, the t-product and the \( *_c \) product are described in Section 2. The \( *_L \) product is also referred to as the \( \star _M \)-product family in the literature [29]. This kind of tensor-tensor product has been applied to the processing of seismic data [13], color image restoration [16, 18], facial recognition [21] and data compression [41].

Problem 1.1 often arises in image restoration. As described in [16], the 3D model of color image restoration can be expressed by the tensor equation

$$\begin{aligned} {{{\mathscr {B}}}} = {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}} + {{{\mathscr {E}}}}, \end{aligned}$$
(1.2)

where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {\mathscr {E}} \) are the observation image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the restored image to be sought. Obviously, Problem 1.1 is a generalized form of (1.2) which considers both horizontal and vertical blurring of the image. Compared with the previous image restoration models, namely the matrix product model [5]

$$\begin{aligned} g=Hx+n, \end{aligned}$$
(1.3)

and the Einstein product model [10, 31, 40]

$$\begin{aligned} {{{\mathscr {M}}}} = {{{\mathscr {T}}}}{ * _3}{{{\mathscr {Q}}}} + {{{\mathscr {N}}}}, \end{aligned}$$
(1.4)

the advantage of (1.2) is that it avoids the loss of information inherent in unfolding the tensor into a matrix and takes into account the ordering of and the correlation between the data.

Tensor equations and their least squares problems have been investigated by many scholars. Some mathematical tools, for example, the tensor generalized inverse, have been used in the literature [2, 3, 8, 11, 22, 37, 38]. The methods for solving tensor equations and their least squares problems can generally be divided into two classes. The first class consists of the so-called direct methods, such as the elimination method [7] and the tensor generalized inverse methods [2, 3, 6, 8]. However, these methods do not perform well on large-scale data. The second class consists of iterative methods. Wang, Xu and Duan [39] proposed a conjugate gradient method to solve the quaternion Sylvester tensor equation. Huang and Ma [23] and Xie et al. [40] solved the tensor least squares problem in the Einstein product by the conjugate gradient method and the preconditioned conjugate gradient method, respectively. Ding and Wei [12] proposed a Newton method to solve the homogeneous tensor equation of degree \( m-1 \) with a symmetric M-tensor coefficient. Guide, Ichi et al. [18] proposed GMRES-type methods for solving image processing problems with the t-product. Similar algorithms can also be found in the literature [9, 16, 17, 24, 39]. For ill-posed problems, Reichel and Ugwu [34, 35] used the Golub-Kahan bidiagonalization and the Arnoldi (upper Hessenberg) processes to solve the Tikhonov regularization problem in the image restoration model. Some other algorithms [19, 36] rely on matricization of tensors, and thus they are difficult to apply to large-scale tensor least squares problems. However, as far as we know, there are very few results on Problem 1.1.

In this paper, we consider the tensor least squares Problem 1.1 arising in image restoration. We first give the operator-bidiagonal procedures for \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use them to design two Paige's algorithms for solving Problem 1.1. Two convergence theorems for the new methods are also given. A numerical example shows the feasibility of the Paige's methods for solving Problem 1.1, and simulation experiments on image restoration illustrate the feasibility and effectiveness of the new methods. Specifically, the algorithms in this paper differ from the ones described in [34, 35] in the following two ways:

  • Our proposed operator-bidiagonal procedure is distinguished from the GG-tGKB process [35] in that it provides a unified iterative framework for the bidiagonalization of different linear operators.

  • We derive a coefficient relation between the minimal Frobenius norm solution of Problem 1.1 and the orthogonal tensor bases of the Krylov subspace

    $$\begin{aligned} {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}) = span\{ {{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}}){{\mathscr {M}}^*}({{{{\mathscr {U}}}}_1}), \cdots ,{({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}})^{k - 1}}{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})\}, \end{aligned}$$

    which allows our algorithms to use only the tensors generated at the current (kth) iteration. We do not need the additional memory for the orthogonal bases required in [34, 35], and this reduces the space complexity of the algorithm.

This paper is organized as follows. In Sect. 2, we introduce some notation and definitions, and summarize the connections among the \( *_L \) product, the t-product and the \( *_c \) product. In Sect. 3, we design the Paige's algorithms for solving Problem 1.1 and give the convergence theorems. In Sect. 4, some numerical experiments are given to illustrate that the algorithms are feasible and effective.

2 Preliminaries

In this section, we give some preliminaries, recall the \( *_L \) product proposed by Kernfeld, Kilmer and Aeron [27], and analyze the connections among the t-product, the \( *_c \) product, and the \( *_L \) product.

Given \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\) with frontal slices \( {{\mathscr {A}}}^{(i)} \), \( i = 1, 2, \cdots , n \), the operators bcirc, unfold and fold are defined as [28]

$$\begin{aligned} bcirc({{{\mathscr {A}}}}): = \left[ \begin{array}{cccc} {\mathscr {A}^{(1)}}&{}{\mathscr {A}^{(n)}}&{} \cdots &{}{\mathscr {A}^{(2)}}\\ {\mathscr {A}^{(2)}}&{}{\mathscr {A}^{(1)}}&{} \cdots &{}{\mathscr {A}^{(3)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {\mathscr {A}^{(n)}}&{}{\mathscr {A}^{(n - 1)}}&{} \cdots &{}{\mathscr {A}^{(1)}} \end{array} \right] , unfold({{\mathscr {A}}}) := \left[ \begin{array}{c} {\mathscr {A}^{(1)}}\\ {\mathscr {A}^{(2)}}\\ \vdots \\ {\mathscr {A}^{(n)}} \end{array} \right] , \end{aligned}$$

and the operator fold is defined so that \( fold(unfold({{\mathscr {A}}})) = {\mathscr {A}} \). Then the t-product can be defined as follows.

Definition 2.1

(t-product [28]) The t-product is the operator \( * \): \( \mathbb {R}^{ {m} \times {l} \times {n}} \times \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {p} \times {n}} \) which, for two tensors \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \), produces a tensor in \( \mathbb {R}^{ {m} \times {p} \times {n}} \) as follows

$$\begin{aligned} {{\mathscr {A}}}*{{\mathscr {B}}} = fold(bcirc({{\mathscr {A}}})unfold(\mathscr {B})). \end{aligned}$$

Notice that the block matrix \( bcirc ({{\mathscr {A}}}) \) can be block diagonalized by the discrete Fourier transform (DFT), i.e.,

$$\begin{aligned} A = blockdiag({{{\hat{{\mathscr {A}}}}}^{(1)}},{{{\hat{{\mathscr {A}}}}}^{(2)}}, \cdots ,{{{\hat{{\mathscr {A}}}}}^{(n)}}) = ({F_n} \otimes {I_m})bcirc({{{\mathscr {A}}}})({F_{n}^{*}} \otimes {I_l}), \end{aligned}$$

where \( \otimes \) is the Kronecker product and \( F_n \) is the unitary DFT matrix satisfying

$$\begin{aligned} F_{n}F_{n}^{*} = F_{n}^{*}F_{n} = I_n. \end{aligned}$$

Therefore, \( {{\mathscr {A}}}*{{\mathscr {B}}} \) with the t-product can be efficiently implemented by the fast Fourier transform.
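As a concrete illustration of this remark (not part of [28]), a minimal NumPy sketch of the t-product might look as follows: the tensors are transformed by the FFT along the third mode, the frontal slices are multiplied face-wise, and the result is transformed back. The helper name `t_product` is ours.

```python
import numpy as np

def t_product(A, B):
    """Sketch of the t-product A * B for A of shape (m, l, n) and B of shape (l, p, n)."""
    n = A.shape[2]
    Ah = np.fft.fft(A, axis=2)          # transform the tubes of A
    Bh = np.fft.fft(B, axis=2)          # transform the tubes of B
    Ch = np.empty((A.shape[0], B.shape[1], n), dtype=complex)
    for k in range(n):                  # face-wise product of the transformed frontal slices
        Ch[:, :, k] = Ah[:, :, k] @ Bh[:, :, k]
    return np.real(np.fft.ifft(Ch, axis=2))   # back-transform; real up to rounding for real inputs

# Tiny usage example
rng = np.random.default_rng(0)
A = rng.random((5, 4, 3)); B = rng.random((4, 6, 3))
C = t_product(A, B)                     # C has shape (5, 6, 3)
```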

Considering the discrete cosine transform instead of the discrete Fourier transform, we obtain the \( *_c \) product, which is analogous to the t-product. The block Toeplitz-plus-Hankel matrix \( mat({{\mathscr {A}}}) \) and the operator ten are defined in [27] in terms of the matrices \( E_l \) and T,

where \( E_l = \left[ {\begin{array}{*{20}{c}} {{I_l}}\\ 0\\ \vdots \\ 0 \end{array}} \right] \in \mathbb {R}^{nl \times l}\) consists of the first l columns of the identity matrix \( I_{nl} \) and

$$\begin{aligned} T = \left[ {\begin{array}{*{20}{c}} {{I_m}}&{}{ - {I_m}}&{} \cdots &{}{{{( - {I_m})}^{n - 1}}}\\ 0&{}{{I_m}}&{} \cdots &{}{{{( - {I_m})}^{n - 2}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0&{}0&{}0&{}{{I_m}} \end{array}} \right] \in \mathbb {R}^{nm \times nm}. \end{aligned}$$

Definition 2.2

(\( *_c \) product [27]) Let \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) be two real tensors. Then the \( *_c \) product \( {{\mathscr {A}}} *_c {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) real tensor defined by

$$\begin{aligned} {{\mathscr {A}}} *_c {{\mathscr {B}}} = ten(mat({{\mathscr {A}}})mat({{\mathscr {B}}})). \end{aligned}$$

Both the t-product and the \( *_c \) product use linear transformations that block-diagonalize the associated block matrix. If we regard each diagonal block as a frontal slice of a tensor, we can redefine the t-product and the \( *_c \) product using the mode-3 product.

Definition 2.3

(\( *_L \) product [27]) Let L be an invertible linear operator (see Definition 2.4), and set \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \). Then the \( *_L \) product \( {{\mathscr {A}}} *_L {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) tensor defined by

$$\begin{aligned} {{\mathscr {A}}} *_L {{\mathscr {B}}} = L^{-1}(L({{\mathscr {A}}}) \bigtriangleup L(\mathscr {B})), \end{aligned}$$

where the face-wise product \( {{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}} \) is defined by

$$\begin{aligned} ({{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}})^{(i)} = {{\mathscr {A}}}^{(i)}\mathscr {B}^{(i)}. \end{aligned}$$

Definition 2.4

([30]) Let \( M \in \mathbb {R}^{{n} \times {n}} \) be an invertible matrix. Then the invertible linear operator L: \( \mathbb {R}^{ {m} \times {l} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {l} \times {n}} \) is defined as

$$\begin{aligned} L({{\mathscr {A}}}) = {{\mathscr {A}}} \times _{3} M, \end{aligned}$$

where \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \).
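To make Definitions 2.3 and 2.4 concrete, the following minimal NumPy sketch (function names ours) forms \( {{\mathscr {A}}} *_L {{\mathscr {B}}} \) for an arbitrary invertible M by a mode-3 transform, a face-wise product, and the inverse transform.

```python
import numpy as np

def mode3(T, M):
    """Mode-3 product T x_3 M: each tube T[i, j, :] is multiplied by M."""
    return np.einsum('ijt,kt->ijk', T, M)

def star_L(A, B, M):
    """A *_L B = L^{-1}( L(A) face-wise-times L(B) ) with L(X) = X x_3 M."""
    LA, LB = mode3(A, M), mode3(B, M)
    LC = np.einsum('ilk,lpk->ipk', LA, LB)   # frontal slices multiplied face-wise
    return mode3(LC, np.linalg.inv(M))

# Usage with a random invertible transform of size n = 3
rng = np.random.default_rng(1)
M = rng.random((3, 3)) + 3 * np.eye(3)       # strictly diagonally dominant, hence invertible
A = rng.random((5, 4, 3)); B = rng.random((4, 6, 3))
C = star_L(A, B, M)                          # C has shape (5, 6, 3)
```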

As described by Kernfeld, Kilmer and Aeron [27], the \( *_c \) product is an example of the \( *_L \) product, where \( M = W_{c}^{-1}C_n(I+Z) \) in the invertible linear operator L. Here \( C_n \) denotes the \( n \times n \) orthogonal DCT matrix, \( W_c = diag(C_n(:,1)) \) and Z is the \( n \times n \) (singular) circulant upshift matrix. The connection between the t-product and the \( *_L \) product is

$$\begin{aligned} M = W_{t}^{-1}F_n, \end{aligned}$$

where \( W_t = diag(F_n(:,1)) \).

Using the invertible linear operator L, we can give a definition of the tensor transpose that is consistent with both the t-product and the \( *_c \) product.

Definition 2.5

([27]) Let \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \). Then the transpose \( {{\mathscr {A}}}^T \in \mathbb {R}^{ {l} \times {m} \times {n}} \) satisfies

$$\begin{aligned} L({{\mathscr {A}}}^T)^{(i)} = (L({{\mathscr {A}}})^{(i)})^T,~i=1, \cdots ,n. \end{aligned}$$

We also need the following definitions for the paper.

Definition 2.6

([18]) Let \( \mathbb {V} = [{{\mathscr {V}}}_1, {{\mathscr {V}}}_2, \cdots ,{{\mathscr {V}}}_k] \in \mathbb {R}^{m \times kp \times n} \), where \( {{\mathscr {V}}}_i \in \mathbb {R}^{m \times p \times n} \), \( i = 1,2, \cdots ,k \), and let \( y = [y_1, y_2, \cdots ,y_k]^T \in \mathbb {R}^k \), \( Z = [z_1,z_2, \cdots ,z_k] \in \mathbb {R}^{k \times k} \). Then the products \( \mathbb {V} \circledast y \) and \( \mathbb {V} \circledast Z \) are defined as

$$\begin{aligned} \mathbb {V} \circledast y = \sum \limits _{i = 1}^k {{y_i}{\mathscr {V}_i}}, ~ \mathbb {V} \circledast Z = [\mathbb {V} \circledast z_1, \mathbb {V} \circledast z_2, \cdots ,\mathbb {V} \circledast z_k]. \end{aligned}$$

Definition 2.7

([18]) Let \( \mathbb {A} = [{{\mathscr {A}}}_1, {{\mathscr {A}}}_2, \cdots ,{{\mathscr {A}}}_k] \in \mathbb {R}^{m \times kl \times n} \) and \( \mathbb {B} = [{{\mathscr {B}}}_1, {{\mathscr {B}}}_2, \cdots ,{{\mathscr {B}}}_p] \in \mathbb {R}^{m \times pl \times n} \), where \( {{\mathscr {A}}}_i, {{\mathscr {B}}}_i \in \mathbb {R}^{{m} \times {l} \times {n}} \), \( i = 1,2,\cdots \). Then the product \( {\mathbb {A}}^T \lozenge {\mathbb {B}} \) is the \( k \times p \) matrix defined by

$$\begin{aligned} ({\mathbb {A}}^T \lozenge {\mathbb {B}})_{ij} = \left\langle { \mathscr {A}_i,{{\mathscr {B}}}_j} \right\rangle . \end{aligned}$$

Notice that if \( \mathbb {B} \) reduces to a single tensor \( {{\mathscr {B}}} \in \mathbb {R}^{m \times l \times n} \), then the product \( {\mathbb {A}}^T \lozenge {{\mathscr {B}}} \) is a column vector of length k.
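A small NumPy sketch of Definitions 2.6 and 2.7 (names ours), storing the block \( \mathbb {V} \) as a Python list of equally sized tensors rather than as a concatenated \( m \times kp \times n \) tensor:

```python
import numpy as np

def circ_star(Vs, y):
    """V (circled-ast) y = sum_i y_i V_i for a list Vs of equally sized tensors."""
    return sum(yi * Vi for yi, Vi in zip(y, Vs))

def circ_star_mat(Vs, Z):
    """V (circled-ast) Z = [V (circled-ast) z_1, ..., V (circled-ast) z_k], one combination per column of Z."""
    return [circ_star(Vs, Z[:, j]) for j in range(Z.shape[1])]

def diamond(As, Bs):
    """(A^T lozenge B)_{ij} = <A_i, B_j>, the matrix of tensor Frobenius inner products."""
    return np.array([[np.sum(Ai * Bj) for Bj in Bs] for Ai in As])

# Usage
rng = np.random.default_rng(2)
Vs = [rng.random((5, 4, 3)) for _ in range(3)]
y = rng.random(3)
X = circ_star(Vs, y)                     # a 5 x 4 x 3 tensor
G = diamond(Vs, Vs)                      # 3 x 3 Gram matrix of <V_i, V_j>
```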

Definition 2.8

([26]) Let \( {\mathscr {M}} \): \( \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {q} \times {n}} \) be a linear operator. If the linear operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {l} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) satisfies

$$\begin{aligned} \left\langle {{{{\mathscr {M}}}}({{{\mathscr {X}}}}),{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{{\mathscr {M}}}}^*}({{{\mathscr {Y}}}})} \right\rangle ,~ \forall {{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} , ~ {{{\mathscr {Y}}}} \in \mathbb {R}^{ {l} \times {q} \times {n}}, \end{aligned}$$

then it is called the adjoint of \( {\mathscr {M}} \).

3 Paige’s algorithms for solving Problem 1.1

In this section, we first give the operator-bidiagonal procedure of \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use it to design the Paige's algorithms for solving Problem 1.1. The convergence of the new methods is also established.

3.1 The operator-bidiagonal procedure

This subsection gives the operator-bidiagonal procedure. Firstly, the adjoint operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) of the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) in Problem 1.1 is derived.

Lemma 3.1

Set \( {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}} \). The adjoint of the linear operator \( {\mathscr {M}} \) is

$$\begin{aligned} {{\mathscr {M}}}^*({{\mathscr {X}}}) = {{\mathscr {A}}}^T{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}^T. \end{aligned}$$

Proof

Here we first prove that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}}} \right\rangle ,~ \forall {{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} , ~ {{{\mathscr {Y}}}} \in \mathbb {R}^{ {m} \times {p} \times {n}}. \end{aligned}$$

Set \( M^T = (m_1, m_2, \cdots , m_n) \) and \( (M^{-1})^T = ({\hat{m}}_1, {\hat{m}}_2, \cdots , {\hat{m}}_n) \), that is, \( m_k \) and \( {\hat{m}}_k \) denote the kth rows of M and \( M^{-1} \) written as column vectors. Set \( {{\mathscr {A}}} = (a_{ijk}) \in \mathbb {R}^{m \times l \times n} \), \( {{\mathscr {X}}} = (x_{ijk}) \in \mathbb {R}^{l \times p \times n} \), \( {{\mathscr {Y}}} = (y_{ijk}) \in \mathbb {R}^{m \times p \times n} \), and set

$$\begin{aligned} a_{ij}^{(k)} := \textbf{a}_{ij:}^{T}m_k , ~ k = 1, 2, \cdots , n. \end{aligned}$$

Combining this with Definition 2.4, we have

$$\begin{aligned} L({{\mathscr {A}}})^{(i)} = ({{\mathscr {A}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {a_{11}^{(i)}}&{}{a_{12}^{(i)}}&{} \cdots &{}{a_{1l}^{(i)}}\\ {a_{21}^{(i)}}&{}{a_{22}^{(i)}}&{} \cdots &{}{a_{2l}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {a_{m1}^{(i)}}&{}{a_{m2}^{(i)}}&{} \cdots &{}{a_{ml}^{(i)}} \end{array}} \right] . \end{aligned}$$

Similarly, we have

$$\begin{aligned}&L({{\mathscr {X}}})^{(i)} = ({{\mathscr {X}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {x_{11}^{(i)}}&{}{x_{12}^{(i)}}&{} \cdots &{}{x_{1p}^{(i)}}\\ {x_{21}^{(i)}}&{}{x_{22}^{(i)}}&{} \cdots &{}{x_{2p}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {x_{l1}^{(i)}}&{}{x_{l2}^{(i)}}&{} \cdots &{}{x_{lp}^{(i)}} \end{array}} \right] ,\\&L({{\mathscr {Y}}})^{(i)} = ({{\mathscr {Y}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {y_{11}^{(i)}}&{}{y_{12}^{(i)}}&{} \cdots &{}{y_{1p}^{(i)}}\\ {y_{21}^{(i)}}&{}{y_{22}^{(i)}}&{} \cdots &{}{y_{2p}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {y_{m1}^{(i)}}&{}{y_{m2}^{(i)}}&{} \cdots &{}{y_{mp}^{(i)}} \end{array}} \right] , \end{aligned}$$

where \( x_{ij}^{(k)}:= \textbf{x}_{ij:}^{T}m_k, ~ y_{ij}^{(k)}:= \textbf{y}_{ij:}^{T}m_k, ~ k = 1, 2, \cdots , n \). Then the expression for any element of \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) can be given as

$$\begin{aligned} ({{\mathscr {A}}}{*_L}{{\mathscr {X}}})_{ijk} = \left( {\begin{array}{*{20}{c}} {\sum \nolimits _{t = 1}^l {a_{it}^{(1)}x_{tj}^{(1)}} }&{\sum \nolimits _{t = 1}^l {a_{it}^{(2)}x_{tj}^{(2)}} }&\cdots&{\sum \nolimits _{t = 1}^l {a_{it}^{(n)}x_{tj}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_k}. \end{aligned}$$

Noting that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \sum \limits _{i=1}^n {\left\langle {({{\mathscr {A}}}{*_L}\mathscr {X})^{(i)},{{\mathscr {Y}}}^{(i)}} \right\rangle }, \end{aligned}$$

and by making use of the matrix inner product of each frontal slice of the tensor \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) and tensor \( {\mathscr {Y}} \), we get

$$\begin{aligned} \begin{aligned}&\sum \limits _{i=1}^n{(({{\mathscr {A}}}{*_L}{{\mathscr {X}}})^{(i)T}\mathscr {Y}^{(i)})_{rr}}\\ {}&= \sum \limits _{i=1}^n {\sum \limits _{j=1}^m {\left( {\begin{array}{*{20}{c}} {\sum \limits _{t = 1}^l {a_{jt}^{(1)}x_{tr}^{(1)}} }&{\sum \limits _{t = 1}^l {a_{jt}^{(2)}x_{tr}^{(2)}} }&\cdots&{\sum \limits _{t = 1}^l {a_{jt}^{(n)}x_{tr}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_i}} {y_{jr}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{j=1}^m {\left( {\begin{array}{*{20}{c}} {\sum \limits _{t = 1}^l {a_{jt}^{(1)}x_{tr}^{(1)} {y_{jr}}} }&{\sum \limits _{t = 1}^l {a_{jt}^{(2)}x_{tr}^{(2)} {y_{jr}}} }&\cdots&{\sum \limits _{t = 1}^l {a_{jt}^{(n)}x_{tr}^{(n)} {y_{jr}}} } \end{array}} \right) {{{{\hat{m}}}}_i}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{t = 1}^l {\left( {\begin{array}{*{20}{c}} {\sum \limits _{j=1}^m {a_{jt}^{(1)}y_{jr}^{(1)} {x_{tr}}} }&{\sum \limits _{j=1}^m {a_{jt}^{(2)}y_{jr}^{(2)} {x_{tr}}} }&\cdots&{\sum \limits _{j=1}^m {a_{jt}^{(n)}y_{jr}^{(n)} {x_{tr}}} } \end{array}} \right) {{\hat{m}}_i}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{t = 1}^l {\left( {\begin{array}{*{20}{c}} {\sum \limits _{j=1}^m {a_{jt}^{(1)}y_{jr}^{(1)}} }&{\sum \limits _{j=1}^m {a_{jt}^{(2)}y_{jr}^{(2)} } }&\cdots&{\sum \limits _{j=1}^m {a_{jt}^{(n)}y_{jr}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_i}} {x_{tr}}} \\ {}&= \sum \limits _{i=1}^n{({{\mathscr {X}}}^{(i)T}(\mathscr {A}^{T}{*_L}{{\mathscr {Y}}})^{(i)})_{rr}}, ~ r = 1, 2, \cdots ,p. \end{aligned} \end{aligned}$$

This implies that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}}} \right\rangle . \end{aligned}$$
(3.1)

In a similar way, we have

$$\begin{aligned} \left\langle {{{\mathscr {X}}}{*_L}{{\mathscr {B}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {Y}}}{*_L}{{\mathscr {B}}}^T}} \right\rangle . \end{aligned}$$
(3.2)

Combining (3.1) and (3.2), we can obtain that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}{*_L}{{\mathscr {B}}}^T}} \right\rangle , \end{aligned}$$

which means that

$$\begin{aligned} {{\mathscr {M}}}^*({{\mathscr {X}}}) = {{\mathscr {A}}}^T{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}^T. \end{aligned}$$

The proof is completed. \(\square \)
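The identity in Lemma 3.1 can be checked numerically. The following hedged sketch does so on random data using the tensor transpose of Definition 2.5; for concreteness the transform matrix M is taken to be orthogonal (obtained from a QR factorization), and all helper names are ours.

```python
import numpy as np

def mode3(T, M):
    """Mode-3 product T x_3 M."""
    return np.einsum('ijt,kt->ijk', T, M)

def star_L(A, B, M):
    """A *_L B with the invertible transform L(X) = X x_3 M (cf. Definition 2.3)."""
    LC = np.einsum('ilk,lpk->ipk', mode3(A, M), mode3(B, M))
    return mode3(LC, np.linalg.inv(M))

def transpose_L(A, M):
    """Tensor transpose of Definition 2.5: transpose the frontal slices in the transform domain."""
    return mode3(mode3(A, M).transpose(1, 0, 2), np.linalg.inv(M))

rng = np.random.default_rng(3)
m, l, p, q, n = 5, 4, 3, 6, 3
M, _ = np.linalg.qr(rng.standard_normal((n, n)))    # an orthogonal transform, for simplicity
A, B = rng.random((m, l, n)), rng.random((p, q, n))
X, Y = rng.random((l, p, n)), rng.random((m, q, n))

lhs = np.sum(star_L(star_L(A, X, M), B, M) * Y)                 # <M(X), Y>
AT, BT = transpose_L(A, M), transpose_L(B, M)
rhs = np.sum(X * star_L(star_L(AT, Y, M), BT, M))               # <X, M*(Y)>
print(abs(lhs - rhs))                                           # ~1e-13, up to rounding
```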

Combining this with Definition 2.8, we reduce the operator \( {\mathscr {M}} \) in Problem 1.1 via the following operator-bidiagonal procedure, which is similar to the Golub-Kahan bidiagonalization technique [14].

Algorithm 1

The lower operator-bidiagonal procedure of \( {{\mathscr {M}}} \)

In Algorithm 1, the scalars \( \alpha _k \) and \( \beta _k \) are chosen so that \( ||{{{\mathscr {U}}}_k}||_F = ||{{{\mathscr {V}}}_k}||_F = 1 \). It is not difficult to find that the tensors generated by Algorithm 1 satisfy the following results

$$\begin{aligned} \begin{aligned} {{{{\mathscr {V}}}}_k}&\in span\{ {{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}}){{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}), \cdots ,{({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}})^{k - 1}}{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})\}\\ {}&\buildrel \Delta \over = {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})) = {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}),\\ {{{{\mathscr {U}}}}_k}&\in span\{ {{{{\mathscr {U}}}}_1},({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*})({{{{\mathscr {U}}}}_1}), \cdots ,{({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*})^{k - 1}}({{{{\mathscr {U}}}}_1})\} \buildrel \Delta \over = {{{{\mathscr {K}}}}_k}({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*},{{{{\mathscr {U}}}}_1}). \end{aligned} \end{aligned}$$

By using the definitions

$$\begin{aligned} \begin{aligned}&{\mathbb {V}_k}: = [{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_2}, \cdots ,{{{{\mathscr {V}}}}_k}] \in \mathbb {R}{^{{l} \times pk \times {n}}}, ~ {\mathbb {U}_k}: = [{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_2}, \cdots ,{{{{\mathscr {U}}}}_k}] \in \mathbb {R}{^{{m} \times qk \times {n}}},\\&{L_k} = \left[ {\begin{array}{*{20}{c}} {{\alpha _1}}&{}{}&{}{}&{}{}\\ {{\beta _2}}&{}{{\alpha _2}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\beta _k}}&{}{{\alpha _k}} \end{array}} \right] , ~ {{\tilde{L}}_k} = \left[ {\begin{array}{*{20}{c}} {{\alpha _1}}&{}{}&{}{}&{}{}\\ {{\beta _2}}&{}{{\alpha _2}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\beta _k}}&{}{{\alpha _k}}\\ {}&{}{}&{}{}&{}{{\beta _{k + 1}}} \end{array}} \right] ,\\&{{{{\mathscr {M}}}}^*}({\mathbb {U}_k}): = [{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_2}), \cdots ,{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_k})] \in \mathbb {R}{^{{l} \times pk \times {n}}},\\&{{{\mathscr {M}}}}({\mathbb {V}_k}): = [{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_1}),{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_2}), \cdots ,{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_k})] \in \mathbb {R}{^{{m} \times qk \times {n}}}, \end{aligned} \end{aligned}$$

the recurrence relations of Algorithm 1 can be rewritten as

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({\mathbb {U}_k}) = {\mathbb {V}_k} \circledast L_k^T, ~ {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k}. \end{aligned}$$
(3.3)
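Read column by column, (3.3) corresponds to the per-step recurrences \( \beta _{k+1}{{\mathscr {U}}}_{k+1} = {{\mathscr {M}}}({{\mathscr {V}}}_k) - \alpha _k {{\mathscr {U}}}_k \) and \( \alpha _{k+1}{{\mathscr {V}}}_{k+1} = {{{\mathscr {M}}}}^*({{\mathscr {U}}}_{k+1}) - \beta _{k+1}{{\mathscr {V}}}_k \). Since the algorithm box itself is reproduced only as a figure, the following minimal NumPy sketch of Algorithm 1 is our own reading of these recurrences: `Mop` and `Madj` stand for \( {{\mathscr {M}}}(\cdot ) \) and \( {{{\mathscr {M}}}}^*(\cdot ) \), and no safeguard against breakdown (\( \alpha _k = 0 \) or \( \beta _k = 0 \)) is included.

```python
import numpy as np

def lower_bidiag(Mop, Madj, C, kmax):
    """Hedged sketch of the lower operator-bidiagonal procedure (Algorithm 1).

    Mop and Madj are callables for M(.) and M*(.); C is the starting tensor,
    so that beta_1 U_1 = C.  Returns U_1..U_{kmax+1}, V_1..V_{kmax+1} and the
    scalars alpha_k, beta_k that form L_k and tilde L_k in (3.3).
    """
    Us, Vs, alphas, betas = [], [], [], []
    beta = np.linalg.norm(C); U = C / beta
    Us.append(U); betas.append(beta)
    W = Madj(U); alpha = np.linalg.norm(W); V = W / alpha
    Vs.append(V); alphas.append(alpha)
    for _ in range(kmax):
        W = Mop(V) - alpha * U                  # beta_{k+1} U_{k+1} = M(V_k) - alpha_k U_k
        beta = np.linalg.norm(W); U = W / beta
        Us.append(U); betas.append(beta)
        W = Madj(U) - beta * V                  # alpha_{k+1} V_{k+1} = M*(U_{k+1}) - beta_{k+1} V_k
        alpha = np.linalg.norm(W); V = W / alpha
        Vs.append(V); alphas.append(alpha)
    return Us, Vs, np.array(alphas), np.array(betas)
```

For Problem 1.1 one would take, for instance, `Mop = lambda X: star_L(star_L(A, X, M), B, M)` and `Madj = lambda Y: star_L(star_L(AT, Y, M), BT, M)`, re-using the star_L and transpose_L helpers sketched earlier.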

Theorem 3.1

The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {U}}}_k\} \) generated by Algorithm 1 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{\mathscr {V}}_1}) \) and \({{{{\mathscr {K}}}}_k}({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*},{{\mathscr {U}}_1}) \), respectively, i.e.

$$\begin{aligned} \mathbb {U}_k^T \lozenge {\mathbb {U}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$
(3.4)

Proof

We prove this theorem by mathematical induction. When \( m = 1 \), we have

$$\begin{aligned} \begin{aligned} \left\langle {{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_2}} \right\rangle&= \left\langle {{{{{\mathscr {U}}}}_1},{{({{{\mathscr {M}}}}({{{{\mathscr {V}}}}_1}) - {\alpha _1}{{{{\mathscr {U}}}}_1})} / {{\beta _2}}}} \right\rangle \\ {}&= \frac{{\left\langle {{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),{{{{\mathscr {V}}}}_1}} \right\rangle - {\alpha _1}\left\langle {{{{{\mathscr {U}}}}_1},{{\mathscr {U}}_1}} \right\rangle }}{{{\beta _2}}}\\ {}&= \frac{{\left\langle {{\alpha _1}{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_1}} \right\rangle - {\alpha _1}\left\langle {{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_1}} \right\rangle }}{{{\beta _2}}} = 0. \end{aligned} \end{aligned}$$

Assume that the result is true for \( m = k \); then for \( m = k+1 \), we have

$$\begin{aligned} \begin{aligned} \left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 2}}} \right\rangle&= \left\langle {{{{\mathscr {U}}}_i},{{({{\mathscr {M}}}({{{\mathscr {V}}}_{k + 1}}) - {\alpha _{k + 1}}{{{\mathscr {U}}}_{k + 1}})} / {{\beta _{k + 2}}}}} \right\rangle \\ {}&= \dfrac{{\left\langle {{\mathscr {U}_i},{{\mathscr {M}}}({{{\mathscr {V}}}_{k + 1}})} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \dfrac{{\left\langle {{{\mathscr {M}}}^*({\mathscr {U}_i}),{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \dfrac{{\left\langle {\alpha _i \mathscr {V}_i + \beta _i {{\mathscr {V}}}_{i-1},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \left\{ \begin{array}{l} \dfrac{{\left\langle {{\alpha _i}{{{\mathscr {V}}}_i} + {\beta _i}{\mathscr {V}_{i - 1}},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}}, \qquad ~ ~ i = 1,2, \cdots , k \\ \dfrac{{\left\langle {{\alpha _{k + 1}}{{{\mathscr {V}}}_{k + 1}} + {\beta _{k + 1}}{{{\mathscr {V}}}_k},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_{k + 1}},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}}, ~ i = k + 1 \end{array} \right. \\ {}&= 0. \end{aligned} \end{aligned}$$

Similarly, we also have

$$\begin{aligned} \begin{aligned} \left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k + 1}}} \right\rangle&= \left\langle {{{{\mathscr {V}}}_i},{{({{\mathscr {M}}}^*({{{\mathscr {U}}}_{k + 1}}) - {\beta _{k + 1}}{{{\mathscr {V}}}_{k}})} / {{\alpha _{k + 1}}}}} \right\rangle \\ {}&= \dfrac{{\left\langle {{{{\mathscr {V}}}_i}},\mathscr {M}^*({{{\mathscr {U}}}_{k + 1}}) \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \dfrac{{\left\langle {{{\mathscr {M}}}({\mathscr {V}_i}),{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \dfrac{{\left\langle {\alpha _i \mathscr {U}_i + \beta _{i+1} {{\mathscr {U}}}_{i+1},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \left\{ \begin{array}{l} \dfrac{{\left\langle {\alpha _i {{\mathscr {U}}}_i + \beta _{i+1} {{\mathscr {U}}}_{i+1},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}}, ~ ~ ~ i = 1,2, \cdots , k-1 \\ \dfrac{{\left\langle {{\alpha _{k}}{{{\mathscr {U}}}_{k}} + {\beta _{k + 1}}{{{\mathscr {U}}}_{k+1}},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_{k}},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}}, ~ i = k \end{array} \right. \\ {}&= 0. \end{aligned} \end{aligned}$$

It is easy to obtain that \( ||{{\mathscr {V}}}_{k+1}||_F = ||{{\mathscr {U}}}_{k+1}||_F = 1 \), which means

$$\begin{aligned} \mathbb {U}_k^T \lozenge {\mathbb {U}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$

The proof is completed. \(\square \)

In a similar way we can get the upper operator-bidiagonal procedure of the linear operator \( {{\mathscr {T}}} \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \)

$$\begin{aligned} {{\mathscr {T}}}({{\mathscr {X}}}) = {\hat{{\mathscr {A}}}} *_L {{\mathscr {X}}} *_L {\hat{\mathscr {B}}}, \end{aligned}$$

where \( {\hat{{\mathscr {A}}}} \in \mathbb {R}^{ {l} \times {m} \times {n}} \) and \( {\hat{{\mathscr {B}}}} \in \mathbb {R}^{ {q} \times {p} \times {n}} \), which can be seen in the following Algorithm 2.

Algorithm 2

The upper operator-bidiagonal procedure of \({{\mathscr {T}}}\)

Similar to Algorithm 1, assume that

$$\begin{aligned} \begin{aligned}&{\mathbb {V}_k}: = [{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_2}, \cdots ,{{\mathscr {V}}_k}] \in \mathbb {R}{^{{l} \times pk \times {n}}}, ~ {\mathbb {P}_k}: = [{{{{\mathscr {P}}}}_1},{{{{\mathscr {P}}}}_2}, \cdots ,{{\mathscr {P}}_k}] \in \mathbb {R}{^{{m} \times qk \times {n}}},\\ {}&{B_k} = \left[ {\begin{array}{*{20}{c}} {{\rho _1}}&{}{}{{\theta _2}}&{}{}{}&{}{}{}\\ {}&{}{}{{\rho _2}}&{}{} \ddots &{}{}{}\\ {}&{}{}{}&{}{} \ddots &{}{}{{\theta _k}}\\ {}&{}{}{}&{}{}{}&{}{}{{\rho _k}} \end{array}} \right] ,{{{\tilde{B}}}_k} = \left[ {\begin{array}{*{20}{c}} {{\rho _1}}&{}{}{{\theta _2}}&{}{}{}&{}{}{}&{}{}{}\\ {}&{}{}{{\rho _2}}&{}{} \ddots &{}{}{}&{}{}{}\\ {}&{}{}{}&{}{} \ddots &{}{}{{\theta _k}}&{}{}{}\\ {}&{}{}{}&{}{}{}&{}{}{{\rho _k}}&{}{}{{\theta _{k + 1}}} \end{array}} \right] , \end{aligned} \end{aligned}$$

the recurrence relations of Algorithm 2 can be rewritten as

$$\begin{aligned} {{{{\mathscr {T}}}}^*}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast B_k^T, ~ {{{\mathscr {T}}}}({\mathbb {P}_k}) = {\mathbb {V}_{k + 1}} \circledast {{\tilde{B}}_k}. \end{aligned}$$
(3.5)

Theorem 3.2

The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {P}}}_k\} \) generated by Algorithm 2 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{\mathscr {T}}}}{{{{\mathscr {T}}}}^*},{{{{\mathscr {V}}}}_1}) \) and \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {T}}}}^*}{{{\mathscr {T}}}},{{{{\mathscr {P}}}}_1}) \), respectively, i.e.

$$\begin{aligned} \mathbb {P}_k^T \lozenge {\mathbb {P}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$

Proof

It is similar to that of Theorem 3.1 and is omitted here. \(\square \)

Comparing (3.3) with (3.5), it can be found that the lower operator-bidiagonal procedure of the adjoint operator \( {{\mathscr {M}}}^* \) is equivalent to the upper operator-bidiagonal procedure of the linear operator \( {\mathscr {M}} \).

3.2 New algorithms for Problem 1.1 based on the operator-bidiagonal procedure

In this subsection, we present two Paige’s Algorithms (denoted as Paige1-TTP and Paige2-TTP) based on Algorithms 1 and 2.

Set the initial tensor \( {{\mathscr {X}}}_0 = {{\mathscr {O}}} \) and \( \beta _1 {{\mathscr {U}}}_1 = {{\mathscr {C}}} \). Let \( ||{{\mathscr {R}}}||_F^2 = \mathop {\min }\nolimits _{{{\mathscr {X}}}} ||{{{\mathscr {M}}}}({{{\mathscr {X}}}}) - {{{\mathscr {C}}}}||_F^2 \) and let \( \hat{{\mathscr {X}}} \) be a solution of Problem 1.1. Then Problem 1.1 is equivalent to

$$\begin{aligned} {{{\mathscr {R}}}} + {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {C}}}} = {\beta _1}{{{{\mathscr {U}}}}_1}, \end{aligned}$$

and its optimality condition is

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({{{\mathscr {M}}}}({\hat{{\mathscr {X}}}}) - {{{\mathscr {C}}}}) = {{\mathscr {O}}}, \end{aligned}$$

i.e.,

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({{\mathscr {R}}}) = {{\mathscr {O}}}. \end{aligned}$$

By constructing the approximate solution \( {{\mathscr {X}}}_k \) of Problem 1.1 as

$$\begin{aligned} {{\mathscr {X}}}_k = {\mathbb {V}_k} \circledast y_k, \end{aligned}$$

we have

$$\begin{aligned} {{{\mathscr {R}}}}_k + {{{\mathscr {M}}}}({\mathbb {V}_k} \circledast y_k) = {\beta _1}{{{{\mathscr {U}}}}_1}, \end{aligned}$$

where \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). Combining this with (3.3), we can deduce that

$$\begin{aligned} {{{\mathscr {R}}}}_k + ({\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k}) \circledast y_k = {\beta _1}{{{{\mathscr {U}}}}_1}. \end{aligned}$$
(3.6)

Simplifying (3.6) by (3.4), we obtain that

$$\begin{aligned} {{\tilde{L}}_k}{y_k} = {\beta _1}{e_1} - \mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}_k}. \end{aligned}$$
(3.7)

Clearly, there are two cases for Problem 1.1. In the first case, the equation \( {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {C}}}} \) is consistent, i.e., \( {{\mathscr {R}}} = {{\mathscr {O}}} \). In the second case, \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \) and the equation is inconsistent.

We first consider Case 1, in which \( {{\mathscr {R}}} = {{\mathscr {O}}} \). According to (3.7), we have

$$\begin{aligned} {{\tilde{L}}_k}{y_k} = {\beta _1}{e_1}. \end{aligned}$$

Thus \( y_k \) can be obtained recursively by

$$\begin{aligned} {\eta _1} = \frac{{{\beta _1}}}{{{\alpha _1}}}, ~ {\eta _{k + 1}} = - \frac{{{\beta _{k + 1}}}}{{{\alpha _{k + 1}}}}{\eta _k}. \end{aligned}$$
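A hedged sketch of this consistent-case update (our own transcription of the Case 1 recursion; the full Paige1-TTP method, Algorithm 3 below, additionally handles the inconsistent case) interleaves the operator-bidiagonal recurrences with the recursion for \( \eta _k \):

```python
import numpy as np

def paige1_consistent(Mop, Madj, C, kmax, tol=1e-10):
    """Sketch of Case 1 (consistent equations, R = O): run the operator-bidiagonal
    recurrences and accumulate X_k = sum_i eta_i V_i with eta_1 = beta_1/alpha_1
    and eta_{k+1} = -(beta_{k+1}/alpha_{k+1}) eta_k."""
    beta = np.linalg.norm(C); U = C / beta                   # beta_1 U_1 = C
    W = Madj(U); alpha = np.linalg.norm(W); V = W / alpha    # alpha_1 V_1 = M*(U_1)
    eta = beta / alpha
    X = eta * V
    for _ in range(kmax):
        W = Mop(V) - alpha * U
        beta = np.linalg.norm(W)
        if beta < tol:                                       # bidiagonalization has terminated
            break
        U = W / beta
        W = Madj(U) - beta * V
        alpha = np.linalg.norm(W)
        if alpha < tol:
            break
        V = W / alpha
        eta = -(beta / alpha) * eta                          # next entry of y_k
        X = X + eta * V
    return X
```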

In the case \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \), since \( {\mathscr {R}} \) is unknown a priori, the ith element \( \eta _{i} \) of \( y_k \) obviously cannot be determined at the same time as \( {{\mathscr {U}}}_i \) and \( {{\mathscr {V}}}_i \) are produced. We set

$$\begin{aligned} \gamma \equiv \left\langle {{{{{\mathscr {U}}}}_1},{{{\mathscr {R}}}}} \right\rangle , ~ t \equiv {\left( {{\tau _1},{\tau _2}, \cdots ,{\tau _{k - 1}}} \right) ^T} , ~ \left( {\begin{array}{*{20}{c}} 1\\ t\\ {{\tau _k}} \end{array}} \right) \equiv \frac{{\mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}}}}{\gamma }. \end{aligned}$$

Together with (3.3) and (3.4), we have

$$\begin{aligned} {{\tilde{L}}_k^T}(\mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}}) = {({\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k})^T} \lozenge {{{\mathscr {R}}}} = {{{\mathscr {M}}}}{({\mathbb {V}_k})^T} \lozenge {{{\mathscr {R}}}} = {\mathbb {V}_k}^T \lozenge ({{{{\mathscr {M}}}}^*}({{{\mathscr {R}}}})) = 0. \end{aligned}$$

Set

$$\begin{aligned} {{{\bar{L}}}_k} = \left[ {\begin{array}{*{20}{c}} {{\beta _2}}&{}{}&{}{}&{}{}\\ {{\alpha _2}}&{}{{\beta _3}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\alpha _k}}&{}{{\beta _{k + 1}}} \end{array}} \right] , \end{aligned}$$

it follows on dividing by \( \gamma \) that

$$\begin{aligned} {{{\bar{L}}}_k}\left( {\begin{array}{*{20}{c}} t\\ {{\tau _k}} \end{array}} \right) = - {\alpha _1}{e_1}. \end{aligned}$$

Therefore, t can be solved by

$$\begin{aligned} {\tau _k} = - \frac{{{\alpha _k}}}{{{\beta _{k + 1}}}}{\tau _{k - 1}}, ~ k = 1,2, \cdots \end{aligned}$$

where \( {\tau _0} = 1 \). Then let \( {z_k} = {\left( {{\xi _1},{\xi _2}, \cdots ,{\xi _k}} \right) ^T}\) and \({w_k} = {\left( {{\omega _1},{\omega _2}, \cdots ,{\omega _k}} \right) ^T} \) be such that

$$\begin{aligned} {L_k}{z_k} = {\beta _1}{e_1}, ~{L_k}{w_k} = \left( {\begin{array}{*{20}{c}} 1\\ t \end{array}} \right) . \end{aligned}$$

By \( {y_k} = {z_k} - \gamma {w_k} \), we have

$$\begin{aligned} {\beta _{k + 1}}{\eta _k} = {\beta _{k + 1}}({\xi _k} - \gamma {\omega _k}) = - \gamma {\tau _k}, \end{aligned}$$

i.e.,

$$\begin{aligned} \gamma = \frac{{{\beta _{k + 1}}{\xi _k}}}{{{\beta _{k + 1}}{\omega _k} - {\tau _k}}}. \end{aligned}$$

Then \( {{\mathscr {X}}}_k \) in Algorithm 3 can be rewritten as

$$\begin{aligned} {{{{\mathscr {X}}}}_k} = {\mathbb {V}_k} \circledast {y_k} = {\mathbb {V}_k} \circledast {z_k} - \gamma {\mathbb {V}_k} \circledast {w_k}. \end{aligned}$$

It is easy to find that the residual after the kth step is

$$\begin{aligned} {{\mathscr {R}}}_k = \beta _{k+1}\gamma \omega _{k}{{\mathscr {U}}}_{k+1} -\beta _{k+1}\xi _{k}{{\mathscr {U}}}_{k+1} + \gamma \sum \limits _{i = 1}^k {{\tau _{i - 1}}{\mathscr {U}_i}} . \end{aligned}$$
Algorithm 3

Paige1-TTP method for solving Problem 1.1

In addition to monitoring whether \( {{\mathscr {X}}}_k \) has stabilized, by combining Algorithm 1 and Algorithm 3 we can easily see that \( \alpha _i \), \( \beta _i \), \( \xi _i \), and \( \omega _i \) become negligible, and that \( \gamma \) also tends to stabilize. Therefore we can also use \( |{{{\gamma }}_k} - {{{\gamma }}_{k - 1}}| \le \varepsilon \) or \(|\xi _i| \le \varepsilon \) as the stopping criterion. This also indicates that Algorithm 3 converges to the least squares solution.

Now we will give the convergence theorem for Algorithm 3.

Theorem 3.3

The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.

Proof

Suppose that the sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to \( {{\mathscr {X}}}^* \), which satisfies

$$\begin{aligned} {{\mathscr {X}}}^* = {\mathbb {V}_k} \circledast y_k \in {{\mathscr {K}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}), ~ {{\mathscr {M}}}^*(\mathscr {R}^*) = {{\mathscr {O}}}, \end{aligned}$$

where \( {{\mathscr {R}}}^* = {{\mathscr {M}}}({{\mathscr {X}}}^*) - {{\mathscr {C}}} \). Since \( {{\mathscr {X}}}^* \in {{\mathscr {K}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}) \), there exists an auxiliary tensor \( {{\mathscr {D}}} \in \mathbb {R}^{m \times q \times n} \) such that \( {{\mathscr {X}}}^* = {{\mathscr {M}}}^*({{\mathscr {D}}}) \). Let \( {\tilde{{\mathscr {X}}}} \) be any least squares solution of Problem 1.1 and \( {{\mathscr {Y}}} = {\tilde{{\mathscr {X}}}} - {{\mathscr {X}}}^* \). We can obtain that

$$\begin{aligned} \begin{aligned} ||{{\mathscr {R}}}^*||_{F}^2&= ||{\tilde{{\mathscr {R}}}}||_{F}^2 = ||{{\mathscr {C}}} - {{\mathscr {M}}}({\tilde{{\mathscr {X}}}})||_{F}^2 = ||{{\mathscr {C}}} - {{\mathscr {M}}}(\mathscr {X}^*+{{\mathscr {Y}}})||_{F}^2 \\ {}&= ||{{\mathscr {R}}}^* + {{\mathscr {M}}}({{\mathscr {Y}}}) ||_{F}^2 = ||{{\mathscr {R}}}^*||_{F}^2 +2\left\langle {{{\mathscr {R}}}^*,\mathscr {M}({{\mathscr {Y}}})} \right\rangle + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2 \\ {}&= ||{{\mathscr {R}}}^*||_{F}^2 +2\left\langle {{{\mathscr {M}}}^*({{\mathscr {R}}}^*),\mathscr {Y}} \right\rangle + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2 \\ {}&= ||\mathscr {R}^*||_{F}^2 + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2, \end{aligned} \end{aligned}$$

which implies that \( {{\mathscr {M}}}({{\mathscr {Y}}}) = {{\mathscr {O}}} \). Then we have

$$\begin{aligned} \begin{aligned} ||{\tilde{{\mathscr {X}}}}||_{F}^2&= ||{{{\mathscr {X}}}^*}+{{\mathscr {Y}}}||_{F}^2 = ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {{{\mathscr {X}}}^*,{{\mathscr {Y}}}} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {{{\mathscr {M}}}^*({{\mathscr {D}}}),{{\mathscr {Y}}}} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {\mathscr {D},{{\mathscr {M}}}({{\mathscr {Y}}})} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 + ||{{\mathscr {Y}}}||_{F}^2 \ge ||{{\mathscr {X}}}^*||_{F}^2. \end{aligned} \end{aligned}$$

The equality holds if and only if \( {{\mathscr {Y}}} = {{\mathscr {O}}} \), which implies that \( {{\mathscr {X}}}^* \) is the minimum Frobenius norm solution. \(\square \)

Next, we will give the Paige2-TTP method. The upper operator-bidiagonal procedure of the operator \( {{\mathscr {M}}} \) is equivalent to the lower operator-bidiagonal procedure of the operator \( {{\mathscr {M}}}^* \) as mentioned above. Therefore, we have

$$\begin{aligned} {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast L_k^T, ~ {{{{\mathscr {M}}}}^*}({\mathbb {P}_k}) = {\mathbb {V}_{k + 1}} \circledast {{\tilde{L}}_k},~ {\mathbb {P}}_{k}^{T} \lozenge {\mathbb {P}}_{k} = {\mathbb {V}}_{k}^{T} \lozenge {\mathbb {V}}_{k} = I_k. \end{aligned}$$

As can be seen in Algorithm 3, we construct the approximate solution as

$$\begin{aligned} {{\mathscr {X}}}_k = {\mathbb {V}_k} \circledast y_k. \end{aligned}$$

For convenience, we choose \( {\theta _1}{{{{\mathscr {V}}}}_1} = {{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}}) \). Paige [32] points out that with this choice the bidiagonalization process will stop when \( {\theta _{k+1}}{{{{\mathscr {V}}}}_{k+1}} ={{\mathscr {O}}} \), i.e.

$$\begin{aligned} {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast L_k^T, ~ {{{{\mathscr {M}}}}^*}({\mathbb {P}_k}) = {\mathbb {V}_{k}} \circledast {L_k}. \end{aligned}$$

Let \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). We are more interested in forming \( {\mathbb {V}_k} \circledast y_k \) step by step than in waiting for the end of the bidiagonalization, which would completely determine \( y_k \) and hence \( {{\mathscr {X}}}_k \). We rederive the iterative relation

$$\begin{aligned} {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {M}}}}({\mathbb {V}_k} \circledast {y_k}) = ({\mathbb {P}_k}\circledast L_k^T)\circledast {y_k} = {\mathbb {P}_k}\circledast (L_k^T{y_k}) = {{{\mathscr {C}}}} - {{{\mathscr {R}}}}, \end{aligned}$$

which implies

$$\begin{aligned} L_k^T{y_k} = \mathbb {P}_k^T \lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}). \end{aligned}$$

Notice that

$$\begin{aligned} \begin{aligned} {L_k}({\mathbb {P}_k}^T\lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}))&= {({\mathbb {P}_k}\circledast L_k^T)^T}\lozenge ({{{\mathscr {C}}}} - {\mathscr {R}}) = {({{{\mathscr {M}}}}({\mathbb {V}_k}))^T}\lozenge ({{{\mathscr {C}}}} - {\mathscr {R}}) \\ {}&= \mathbb {V}_k^T\lozenge ({{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}} - {{{\mathscr {R}}}})) = {\theta _1}{e_1}. \end{aligned} \end{aligned}$$

Setting \( z = {\mathbb {P}_k}^T\lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}) \), we have

$$\begin{aligned} L_k^T{y_k} = z, ~ {L_k}z = {\theta _1}{e_1}. \end{aligned}$$
(3.8)

Combining (3.8) we obtain that

$$\begin{aligned} {{{{\mathscr {X}}}}_k} = {\mathbb {V}_k}\circledast {y_k} = ({\mathbb {V}_k}\circledast L_k^{ - T})\circledast (L_k^T{y_k}) = {\mathbb {W}_k}\circledast z, \end{aligned}$$

where \( {\mathbb {W}_k} = {\mathbb {V}_k}\circledast L_k^{ - T} \) and \( {\mathbb {W}_k} = [{{{{\mathscr {W}}}}_1},{{{{\mathscr {W}}}}_2}, \cdots ,{{{{\mathscr {W}}}}_k}] \). We can determine the \( {{\mathscr {W}}}_i \) sequentially from

$$\begin{aligned} {\mathbb {W}_k} \circledast L_k^{T} = {\mathbb {V}_k}. \end{aligned}$$

We give the Paige2-TTP Algorithm for solving Problem 1.1 as follows.

Algorithm 4

Paige2-TTP method for solving Problem 1.1

We also give the convergence theorem for Algorithm 4.

Theorem 3.4

The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 4 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.

Proof

It is similar to that of Theorem 3.3 and is omitted here. \(\square \)

3.3 Computational complexity analysis

In this subsection, we analyze the computational complexity of Algorithms 3 and 4. We note that the computational costs of the multiplications \( L({{\mathscr {A}}}) = {{\mathscr {A}}} \times _{3}M \) and \( {{\mathscr {A}}} {* _L} {{\mathscr {B}}} \) are \( O(mln^2) \) and \( O(mln^2 + lpn^2 + mlpn) \), respectively. Thus the cost of computing \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \) in Algorithm 1 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Similarly, the cost of computing \( {{\mathscr {V}}}_k \) and \( {{\mathscr {P}}}_k \) in Algorithm 2 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).

The computational cost of Algorithm 3 mainly involves tensor multiplications and additions. Apart from the cost of calculating \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \), which is the same as in Algorithm 1, each iteration of Algorithm 3 costs O(lpn), which means that the computational complexity of Algorithm 3 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Analogously, the computational cost of Algorithm 4 is dominated by steps 4 and 5. Therefore, the computational cost of Algorithm 4 is also \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).

4 Numerical experiments

In this section we give numerical examples and simulation experiments of image restoration to illustrate the performance of Algorithms 3 and 4. All of the tests are implemented in MATLAB R2018b with machine precision \( 10^{-16} \) on a PC with an Intel(R) Core(TM) i5-1155G7 CPU at 2.50 GHz and 16.0 GB of memory. The implementations of the algorithms are based on functions from the MATLAB Tensor Toolbox developed by Bader and Kolda [1].

Example 4.1

Consider Problem 1.1 with \( m = 5 \), \( l = 4 \), \( n = 3 \), \( p = 4 \), and \( q = 5 \), where the tensors \( {{\mathscr {A}}} \) and \( {{\mathscr {B}}} \) are constructed by using the MATLAB command \( rand(I_1, I_2, I_3) \) as follows

$$\begin{aligned}&\begin{aligned} \mathscr {A}(:,:,1)&= \left( {\begin{array}{*{20}{c}} {0.7663}&{}{\mathrm{{0}}\mathrm{{.9361}}}&{}{\mathrm{{0}}\mathrm{{.3733}}}&{}{\mathrm{{0}}\mathrm{{.1862}}}\\ {\mathrm{{0}}\mathrm{{.1746}}}&{}{\mathrm{{0}}\mathrm{{.8927}}}&{}{\mathrm{{0}}\mathrm{{.1163}}}&{}{\mathrm{{0}}\mathrm{{.2087}}}\\ {\mathrm{{0}}\mathrm{{.5031}}}&{}{\mathrm{{0}}\mathrm{{.4435}}}&{}{\mathrm{{0}}\mathrm{{.5251}}}&{}{0.5304}\\ {\mathrm{{0}}\mathrm{{.3827}}}&{}{\mathrm{{0}}\mathrm{{.7515}}}&{}{\mathrm{{0}}\mathrm{{.3411}}}&{}{\mathrm{{0}}\mathrm{{.0843}}}\\ {\mathrm{{0}}\mathrm{{.9772}}}&{}{\mathrm{{0}}\mathrm{{.9094}}}&{}{\mathrm{{0}}\mathrm{{.8494}}}&{}{0.2408} \end{array}} \right) , \\ {{\mathscr {A}}}(:,:,2)&= \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.6346}}}&{}{\mathrm{{0}}\mathrm{{.3135}}}&{}{\mathrm{{0}}\mathrm{{.4382}}}&{}{\mathrm{{0}}\mathrm{{.7165}}}\\ {\mathrm{{0}}\mathrm{{.1111}}}&{}{\mathrm{{0}}\mathrm{{.7621}}}&{}{\mathrm{{0}}\mathrm{{.1864}}}&{}{\mathrm{{0}}\mathrm{{.6203}}}\\ {\mathrm{{0}}\mathrm{{.6073}}}&{}{\mathrm{{0}}\mathrm{{.9545}}}&{}{\mathrm{{0}}\mathrm{{.4431}}}&{}{\mathrm{{0}}\mathrm{{.0898}}}\\ {\mathrm{{0}}\mathrm{{.8898}}}&{}{\mathrm{{0}}\mathrm{{.6459}}}&{}{\mathrm{{0}}\mathrm{{.4881}}}&{}{\mathrm{{0}}\mathrm{{.1118}}}\\ {\mathrm{{0}}\mathrm{{.4907}}}&{}{\mathrm{{0}}\mathrm{{.4853}}}&{}{\mathrm{{0}}\mathrm{{.8611}}}&{}{\mathrm{{0}}\mathrm{{.5756}}} \end{array}} \right) ,\\ {{\mathscr {A}}}(:,:,3)&= \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.4617}}}&{}{\mathrm{{0}}\mathrm{{.4819}}}&{}{\mathrm{{0}}\mathrm{{.5076}}}&{}{\mathrm{{0}}\mathrm{{.1959}}}\\ {\mathrm{{0}}\mathrm{{.5454}}}&{}{\mathrm{{0}}\mathrm{{.1605}}}&{}{\mathrm{{0}}\mathrm{{.2380}}}&{}{\mathrm{{0}}\mathrm{{.9748}}}\\ {\mathrm{{0}}\mathrm{{.0380}}}&{}{\mathrm{{0}}\mathrm{{.7544}}}&{}{\mathrm{{0}}\mathrm{{.1927}}}&{}{\mathrm{{0}}\mathrm{{.5751}}}\\ {\mathrm{{0}}\mathrm{{.5793}}}&{}{\mathrm{{0}}\mathrm{{.7935}}}&{}{\mathrm{{0}}\mathrm{{.8087}}}&{}{\mathrm{{0}}\mathrm{{.5832}}}\\ {\mathrm{{0}}\mathrm{{.6290}}}&{}{\mathrm{{0}}\mathrm{{.2914}}}&{}{\mathrm{{0}}\mathrm{{.0667}}}&{}{\mathrm{{0}}\mathrm{{.9500}}} \end{array}} \right) ; \end{aligned}\\&\begin{aligned} {{\mathscr {B}}}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.4521}}}&{}{\mathrm{{0}}\mathrm{{.0431}}}&{}{\mathrm{{0}}\mathrm{{.0431}}}&{}{\mathrm{{0}}\mathrm{{.8928}}}&{}{\mathrm{{0}}\mathrm{{.6670}}}\\ {\mathrm{{0}}\mathrm{{.0268}}}&{}{\mathrm{{0}}\mathrm{{.5561}}}&{}{\mathrm{{0}}\mathrm{{.7263}}}&{}{\mathrm{{0}}\mathrm{{.7794}}}&{}{\mathrm{{0}}\mathrm{{.9781}}}\\ {\mathrm{{0}}\mathrm{{.4253}}}&{}{\mathrm{{0}}\mathrm{{.6113}}}&{}{\mathrm{{0}}\mathrm{{.7133}}}&{}{\mathrm{{0}}\mathrm{{.4949}}}&{}{\mathrm{{0}}\mathrm{{.1171}}}\\ {\mathrm{{0}}\mathrm{{.1058}}}&{}{\mathrm{{0}}\mathrm{{.7388}}}&{}{\mathrm{{0}}\mathrm{{.5038}}}&{}{\mathrm{{0}}\mathrm{{.7615}}}&{}{\mathrm{{0}}\mathrm{{.8693}}} \end{array}} \right) , \\ {{\mathscr {B}}}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.5679}}}&{}{\mathrm{{0}}\mathrm{{.0623}}}&{}{\mathrm{{0}}\mathrm{{.9191}}}&{}{\mathrm{{0}}\mathrm{{.4355}}}&{}{\mathrm{{0}}\mathrm{{.1427}}}\\ {\mathrm{{0}}\mathrm{{.1202}}}&{}{\mathrm{{0}}\mathrm{{.2781}}}&{}{\mathrm{{0}}\mathrm{{.4152}}}&{}{\mathrm{{0}}\mathrm{{.6354}}}&{}{\mathrm{{0}}\mathrm{{.2006}}}\\ {\mathrm{{0}}\mathrm{{.9703}}}&{}{\mathrm{{0}}\mathrm{{.4096}}}&{}{\mathrm{{0}}\mathrm{{.7115}}}&{}{\mathrm{{0}}\mathrm{{.9016}}}&{}{\mathrm{{0}}\mathrm{{.6209}}}\\ 
{\mathrm{{0}}\mathrm{{.8400}}}&{}{\mathrm{{0}}\mathrm{{.6268}}}&{}{\mathrm{{0}}\mathrm{{.9709}}}&{}{\mathrm{{0}}\mathrm{{.8716}}}&{}{\mathrm{{0}}\mathrm{{.7750}}} \end{array}} \right) , \\ {{\mathscr {B}}}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.2468}}}&{}{\mathrm{{0}}\mathrm{{.1708}}}&{}{\mathrm{{0}}\mathrm{{.0821}}}&{}{\mathrm{{0}}\mathrm{{.6053}}}&{}{\mathrm{{0}}\mathrm{{.6116}}}\\ {\mathrm{{0}}\mathrm{{.3695}}}&{}{\mathrm{{0}}\mathrm{{.7456}}}&{}{\mathrm{{0}}\mathrm{{.7815}}}&{}{\mathrm{{0}}\mathrm{{.3545}}}&{}{\mathrm{{0}}\mathrm{{.8432}}}\\ {\mathrm{{0}}\mathrm{{.3411}}}&{}{\mathrm{{0}}\mathrm{{.0980}}}&{}{\mathrm{{0}}\mathrm{{.8487}}}&{}{\mathrm{{0}}\mathrm{{.7591}}}&{}{\mathrm{{0}}\mathrm{{.4570}}}\\ {\mathrm{{0}}\mathrm{{.4564}}}&{}{\mathrm{{0}}\mathrm{{.8903}}}&{}{\mathrm{{0}}\mathrm{{.4509}}}&{}{\mathrm{{0}}\mathrm{{.5894}}}&{}{\mathrm{{0}}\mathrm{{.8828}}} \end{array}} \right) . \end{aligned} \end{aligned}$$

The tensor \( {\mathscr {C}} \) of Problem 1.1 is generated by \({{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}}^* *_L {{\mathscr {B}}} \), where \( {{\mathscr {X}}}^* \) is the exact solution.

$$\begin{aligned}&\begin{aligned} {{\mathscr {C}}}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm{{2}}\mathrm{{.0709}}}&{}{\mathrm{{2}}\mathrm{{.9842}}}&{}{\mathrm{{3}}\mathrm{{.7128}}}&{}{\mathrm{{4}}\mathrm{{.6192}}}&{}{\mathrm{{4}}\mathrm{{.2295}}}\\ {\mathrm{{1}}\mathrm{{.2905}}}&{}{\mathrm{{2}}\mathrm{{.2195}}}&{}{\mathrm{{2}}\mathrm{{.7104}}}&{}{\mathrm{{3}}\mathrm{{.5677}}}&{}{\mathrm{{3}}\mathrm{{.2243}}}\\ {\mathrm{{2}}\mathrm{{.0320}}}&{}{\mathrm{{2}}\mathrm{{.6432}}}&{}{\mathrm{{3}}\mathrm{{.3329}}}&{}{\mathrm{{4}}\mathrm{{.2888}}}&{}{\mathrm{{3}}\mathrm{{.7896}}}\\ {\mathrm{{1}}\mathrm{{.3801}}}&{}{\mathrm{{2}}\mathrm{{.6006}}}&{}{\mathrm{{2}}\mathrm{{.7593}}}&{}{\mathrm{{3}}\mathrm{{.9167}}}&{}{\mathrm{{3}}\mathrm{{.8628}}}\\ {\mathrm{{2}}\mathrm{{.7502}}}&{}{\mathrm{{3}}\mathrm{{.6893}}}&{}{\mathrm{{4}}\mathrm{{.8344}}}&{}{\mathrm{{5}}\mathrm{{.7619}}}&{}{\mathrm{{5}}\mathrm{{.1022}}} \end{array}} \right) , \\ {{\mathscr {C}}}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm{{1}}\mathrm{{.9866}}}&{}{\mathrm{{1}}\mathrm{{.1023}}}&{}{\mathrm{{2}}\mathrm{{.5292}}}&{}{\mathrm{{2}}\mathrm{{.4022}}}&{}{\mathrm{{1}}\mathrm{{.4136}}}\\ {\mathrm{{1}}\mathrm{{.3878}}}&{}{\mathrm{{0}}\mathrm{{.6701}}}&{}{\mathrm{{1}}\mathrm{{.9098}}}&{}{\mathrm{{1}}\mathrm{{.7834}}}&{}{\mathrm{{0}}\mathrm{{.9858}}}\\ {\mathrm{{1}}\mathrm{{.6750}}}&{}{\mathrm{{0}}\mathrm{{.8739}}}&{}{\mathrm{{2}}\mathrm{{.2706}}}&{}{\mathrm{{2}}\mathrm{{.1246}}}&{}{\mathrm{{1}}\mathrm{{.1841}}}\\ {\mathrm{{1}}\mathrm{{.9704}}}&{}{\mathrm{{0}}\mathrm{{.7190}}}&{}{\mathrm{{2}}\mathrm{{.2637}}}&{}{\mathrm{{1}}\mathrm{{.9022}}}&{}{\mathrm{{1}}\mathrm{{.0239}}}\\ {\mathrm{{2}}\mathrm{{.2009}}}&{}{\mathrm{{1}}\mathrm{{.1848}}}&{}{\mathrm{{2}}\mathrm{{.8530}}}&{}{\mathrm{{2}}\mathrm{{.7460}}}&{}{\mathrm{{1}}\mathrm{{.5583}}} \end{array}} \right) , \end{aligned}\\&{{\mathscr {C}}}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm{{1}}\mathrm{{.5129}}}&{}{\mathrm{{2}}\mathrm{{.6267}}}&{}{\mathrm{{2}}\mathrm{{.5285}}}&{}{\mathrm{{3}}\mathrm{{.1925}}}&{}{\mathrm{{3}}\mathrm{{.8561}}}\\ {\mathrm{{1}}\mathrm{{.6347}}}&{}{\mathrm{{2}}\mathrm{{.4602}}}&{}{\mathrm{{2}}\mathrm{{.6092}}}&{}{\mathrm{{3}}\mathrm{{.0806}}}&{}{\mathrm{{3}}\mathrm{{.3725}}}\\ {\mathrm{{1}}\mathrm{{.2401}}}&{}{\mathrm{{2}}\mathrm{{.0938}}}&{}{\mathrm{{1}}\mathrm{{.9912}}}&{}{\mathrm{{2}}\mathrm{{.8409}}}&{}{\mathrm{{3}}\mathrm{{.2008}}}\\ {\mathrm{{2}}\mathrm{{.4940}}}&{}{\mathrm{{3}}\mathrm{{.4742}}}&{}{\mathrm{{3}}\mathrm{{.7211}}}&{}{\mathrm{{4}}\mathrm{{.8108}}}&{}{\mathrm{{5}}\mathrm{{.2201}}}\\ {\mathrm{{1}}\mathrm{{.6014}}}&{}{\mathrm{{3}}\mathrm{{.1166}}}&{}{\mathrm{{3}}\mathrm{{.0049}}}&{}{\mathrm{{3}}\mathrm{{.6039}}}&{}{\mathrm{{4}}\mathrm{{.2954}}} \end{array}} \right) . \end{aligned}$$

Given a random invertible linear transformation matrix

$$\begin{aligned} M = \left( {\begin{array}{*{20}{c}} { - 0.5560}&{}{0.1429}&{}{ - 0.8188}\\ { - 0.6669}&{}{ - 0.6646}&{}{0.3368}\\ { - 0.4961}&{}{0.7334}&{}{0.4648} \end{array}} \right) , \end{aligned}$$

we solve Problem 1.1 by using Algorithms 3 and 4 with \( \varepsilon = 10^{-6} \). After 129 iterations, we get the solution

$$\begin{aligned} {{\mathscr {X}}}^* = {{\mathscr {X}}}_{Paige1}^{129} = {{\mathscr {X}}}_{Paige2}^{129}, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} {{\mathscr {X}}}_{Paige1}^{129}(:,:,1) = {{\mathscr {X}}}_{Paige2}^{129}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.1000}}}&{}{}{\mathrm {{0}}\mathrm {{.1670}}}&{}{}{\mathrm {{0}}\mathrm {{.3653}}}&{}{}{\mathrm {{0}}\mathrm {{.5469}}}\\ {\mathrm {{0}}\mathrm {{.8787}}}&{}{}{\mathrm {{0}}\mathrm {{.6461}}}&{}{}{\mathrm {{0}}\mathrm {{.4024}}}&{}{}{\mathrm {{0}}\mathrm {{.9860}}}\\ {\mathrm {{0}}\mathrm {{.8915}}}&{}{}{\mathrm {{0}}\mathrm {{.1803}}}&{}{}{\mathrm {{0}}\mathrm {{.0898}}}&{}{}{\mathrm {{0}}\mathrm {{.2127}}}\\ {\mathrm {{0}}\mathrm {{.2140}}}&{}{}{\mathrm {{0}}\mathrm {{.1165}}}&{}{}{\mathrm {{0}}\mathrm {{.6838}}}&{}{}{\mathrm {{0}}\mathrm {{.9190}}} \end{array}} \right) , \\ {{\mathscr {X}}}_{Paige1}^{129}(:,:,2) = \mathscr {X}_{Paige2}^{129}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.6652}}}&{}{}{\mathrm {{0}}\mathrm {{.1063}}}&{}{}{\mathrm {{0}}\mathrm {{.7282}}}&{}{}{\mathrm {{0}}\mathrm {{.7956}}}\\ {\mathrm {{0}}\mathrm {{.7435}}}&{}{}{\mathrm {{0}}\mathrm {{.8838}}}&{}{}{\mathrm {{0}}\mathrm {{.2856}}}&{}{}{\mathrm {{0}}\mathrm {{.0735}}}\\ {\mathrm {{0}}\mathrm {{.3785}}}&{}{}{\mathrm {{0}}\mathrm {{.4451}}}&{}{}{\mathrm {{0}}\mathrm {{.5937}}}&{}{}{\mathrm {{0}}\mathrm {{.9927}}}\\ {\mathrm {{0}}\mathrm {{.4748}}}&{}{}{\mathrm {{0}}\mathrm {{.6682}}}&{}{}{\mathrm {{0}}\mathrm {{.3705}}}&{}{}{\mathrm {{0}}\mathrm {{.7760}}} \end{array}} \right) , \\ {{\mathscr {X}}}_{Paige1}^{129}(:,:,3) = \mathscr {X}_{Paige2}^{129}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.0074}}}&{}{}{\mathrm {{0}}\mathrm {{.8850}}}&{}{}{\mathrm {{0}}\mathrm {{.9972}}}&{}{}{\mathrm {{0}}\mathrm {{.6549}}}\\ {\mathrm {{0}}\mathrm {{.8889}}}&{}{}{\mathrm {{0}}\mathrm {{.4656}}}&{}{}{\mathrm {{0}}\mathrm {{.8698}}}&{}{}{\mathrm {{0}}\mathrm {{.4118}}}\\ {\mathrm {{0}}\mathrm {{.8641}}}&{}{}{\mathrm {{0}}\mathrm {{.3716}}}&{}{}{\mathrm {{0}}\mathrm {{.0001}}}&{}{}{\mathrm {{0}}\mathrm {{.4764}}}\\ {\mathrm {{0}}\mathrm {{.6976}}}&{}{}{\mathrm {{0}}\mathrm {{.0830}}}&{}{}{\mathrm {{0}}\mathrm {{.6800}}}&{}{}{\mathrm {{0}}\mathrm {{.3883}}} \end{array}} \right) . \end{aligned} \end{aligned}$$

The residual errors are

$$\begin{aligned} \Vert {{\mathscr {A}}} *_L {{\mathscr {X}}}_{Paige1}^{129} *_L {{\mathscr {B}}} - {{\mathscr {C}}}\Vert _F = 2.5535 \times 10^{-7}, ~ \Vert {{\mathscr {A}}} *_L {{\mathscr {X}}}_{Paige2}^{129} *_L {{\mathscr {B}}} - {{\mathscr {C}}}\Vert _F = 2.0378 \times 10^{-7}. \end{aligned}$$

The relative errors are

$$\begin{aligned} \frac{{\Vert {{\mathscr {X}}}_{Paige1}^{129} - {{\mathscr {X}}}^*\Vert _F}}{{\Vert \mathscr {X}^*\Vert _F}} = 1.6625 \times 10^{-7}, ~ \frac{{\Vert \mathscr {X}_{Paige2}^{129} - {{\mathscr {X}}}^*\Vert _F}}{{\Vert {{\mathscr {X}}}^*\Vert _F}} = 1.3598 \times 10^{-7}. \end{aligned}$$
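To make the computations in this example easy to reproduce, the following NumPy sketch shows one way to evaluate the \( *_L \) product and the residual, assuming the usual construction recalled in Section 2 (apply \( M \) along the third mode, multiply the frontal slices in the transform domain, then apply \( M^{-1} \)); the helper names and array layout are ours and are not part of the paper's algorithms.

```python
import numpy as np

def mode3_transform(X, M):
    """Apply the invertible transform M along the third mode: X x_3 M."""
    return np.einsum('ijk,lk->ijl', X, M)

def tprod_L(A, X, M):
    """*_L product of A (m x l x n) and X (l x p x n) under the transform M:
    transform along mode 3, multiply the frontal slices, transform back."""
    A_hat = mode3_transform(A, M)
    X_hat = mode3_transform(X, M)
    n = A.shape[2]
    C_hat = np.stack([A_hat[:, :, k] @ X_hat[:, :, k] for k in range(n)], axis=2)
    return mode3_transform(C_hat, np.linalg.inv(M))

# Hypothetical usage with A, X, B, C stored as NumPy arrays and M the
# 3 x 3 transform given above:
# residual = np.linalg.norm(tprod_L(tprod_L(A, X, M), B, M) - C)
```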

We also give the convergence curves of Algorithms 3 and 4 in Fig. 1.

Fig. 1 Convergence curves of Algorithms 3 and 4

Example 4.1 demonstrates the feasibility of Algorithms 3 and 4 for solving Problem 1.1. To further illustrate their effectiveness, we present two simulation experiments on color image restoration.

The color image restoration problem can be modeled by the tensor equation [16]

$$\begin{aligned} {{{\mathscr {B}}}} = {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}} + {{{\mathscr {E}}}}, \end{aligned}$$

where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {{\mathscr {E}}} \in \mathbb {R}^{n \times n \times 3} \) are the observed image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the image to be restored. Each frontal slice of \( {{\mathscr {A}}}\in \mathbb {R}^{n \times n \times 3} \) is a Toeplitz matrix obtained from a matrix \( {S} \in \mathbb {R}^{ {n} \times {n} }\), whose entries are given by the two-dimensional Gaussian function

$$\begin{aligned} S(i,j) = \left\{ \begin{array}{l} \frac{1}{{\sqrt{2\pi } \sigma }}\exp ( - \frac{{{{(i - j)}^2}}}{{2{\sigma ^2}}}),|i - j| \le r,\\ 0,\quad \quad \quad \quad \quad \quad \;otherwise, \end{array} \right. \end{aligned}$$
(4.1)

and \( {\mathscr {A}} \) is obtained from

$$\begin{aligned} {{\mathscr {A}}}^{(1)} = \alpha S, ~ {{\mathscr {A}}}^{(2)} = \beta S, ~ \mathscr {A}^{(3)} = \gamma S. \end{aligned}$$
(4.2)

Here \( \alpha \), \( \beta \) and \( \gamma \) are the entries of the circulant matrix

$$\begin{aligned} S_{color} = \left( {\begin{array}{*{20}{c}} \alpha &{}\gamma &{}\beta \\ \beta &{}\alpha &{}\gamma \\ \gamma &{}\beta &{}\alpha \end{array}} \right) , \end{aligned}$$

taken from [20], which satisfy \( \alpha +\beta +\gamma = 1 \). This matrix produces a cyclic mixing of the RGB channels. We set the noise level

$$\begin{aligned} v: = \frac{\Vert {{\mathscr {E}}}\Vert _F}{\Vert {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}}\Vert _F}. \end{aligned}$$

The noise \( {{\mathscr {E}}} \) follows a Gaussian distribution with zero mean and variance determined by the noise level \( v \).
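As a rough illustration, the blur model (4.1)-(4.2) and the noise can be generated as in the following NumPy sketch; the helper names are ours, and the noise is drawn as Gaussian and then scaled so that the noise-level definition above holds exactly.

```python
import numpy as np

def gaussian_blur_matrix(n, sigma, r):
    """Banded Toeplitz matrix S from (4.1): Gaussian within bandwidth r, zero outside."""
    i = np.arange(n)
    D = np.abs(i[:, None] - i[None, :]).astype(float)
    S = np.exp(-D ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    S[D > r] = 0.0
    return S

def blurring_tensor(n, sigma, r, alpha, beta, gamma):
    """Frontal slices of A from (4.2): A^(1) = alpha*S, A^(2) = beta*S, A^(3) = gamma*S."""
    S = gaussian_blur_matrix(n, sigma, r)
    A = np.zeros((n, n, 3))
    A[:, :, 0], A[:, :, 1], A[:, :, 2] = alpha * S, beta * S, gamma * S
    return A

def add_noise(Y, v, seed=0):
    """Gaussian noise E, scaled so that ||E||_F / ||Y||_F equals the noise level v."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(Y.shape)
    return Y + v * np.linalg.norm(Y) / np.linalg.norm(E) * E
```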

We use Algorithms 3 and 4, the PGD algorithm [25], the CG algorithm [33], the NSPG algorithm [4] and the GMRES algorithm [16] to solve the above model under both the \( *_c \) product and the t-product. The stopping criterion for all algorithms is either

$$\begin{aligned} \frac{{\Vert {{{{\mathscr {X}}}}_k} - {{{{\mathscr {X}}}}_{k - 1}}\Vert }}{{\Vert {{{{\mathscr {X}}}}_{k - 1}}\Vert }} \le {10^{ - 4}}, \end{aligned}$$

or the iteration count k reaching the upper limit of 1000. All color images in the experiments are from the USC-SIPI image database. We use the CPU time, the relative error (RE) and the peak signal-to-noise ratio (PSNR) to measure the restoration quality. The RE and PSNR are defined as

$$\begin{aligned} RE = \frac{{\Vert {{\mathscr {X}} - \mathscr {Q}}\Vert _F}}{{\Vert {{\mathscr {Q}}}\Vert _F}},~PSNR = 10{\log _{10}}(\frac{{{{\mathscr {Q}}}_{\max }^2{I_1}{I_2}{I_3}}}{{\Vert {{\mathscr {X}} - \mathscr {Q}}\Vert _F^2}}), \end{aligned}$$

where \( {\mathscr {X}} \) and \( {\mathscr {Q}} \) are the restored image and the true image, respectively, and \( {{\mathscr {Q}}}_{\max } \) is the largest entry of \( {\mathscr {Q}} \). The higher the PSNR and the lower the RE, the better the restoration.
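For reference, a small sketch of these quality measures and of the stopping rule above (helper names ours, with \( {{\mathscr {Q}}}_{\max } \) taken as the largest entry of the true image):

```python
import numpy as np

def relative_error(X, Q):
    """RE between the restored image X and the true image Q."""
    return np.linalg.norm(X - Q) / np.linalg.norm(Q)

def psnr(X, Q):
    """PSNR as defined above, with Q_max the largest entry of Q."""
    I1, I2, I3 = Q.shape
    err = np.linalg.norm(X - Q) ** 2
    return 10 * np.log10(Q.max() ** 2 * I1 * I2 * I3 / err)

def stop(X_k, X_prev, k, tol=1e-4, max_iter=1000):
    """Stopping rule used for all algorithms: relative change below tol,
    or the iteration count reaching the upper limit."""
    rel_change = np.linalg.norm(X_k - X_prev) / np.linalg.norm(X_prev)
    return rel_change <= tol or k >= max_iter
```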

Example 4.2

We consider Problem 1.1 with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 2 \) and \( r = 3 \), with \( \alpha = 0.5 \), \( \beta = 0.1 \) and \( \gamma = 0.4 \) controlling the within-channel and cross-channel blur, respectively. We take \( House \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Airplane \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images, and generate the blurred images \( {{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} *_L {{\mathscr {B}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-4} \), where \( {\mathscr {B}} \) is a third-order tensor with \( {{\mathscr {B}}}^{(1)} = S^T \) and \( {{\mathscr {B}}}^{(2)} = {{\mathscr {B}}}^{(3)} = 0 \).

For this example we consider both the t-product and the \( *_c \) product. We show the restored images produced by the various algorithms and magnify local details to compare how well each method handles the blurred images. We also report the CPU time, relative error (RE) and peak signal-to-noise ratio (PSNR) of each algorithm. All experimental results are listed in Figs. 2 and 3 and Tables 1 and 2.

Figures 2 and 3 show the restored images (House and Airplane). The two images restored by Algorithms 3 and 4 have clearer texture details and features, which indicates that their results are of higher quality and closer to the original images.

On the other hand, Tables 1 and 2 show that our methods (Algorithms 3 and 4) attain the highest PSNR, the lowest RE and the shortest CPU time, so they outperform the PGD, CG, NSPG and GMRES methods.

Fig. 2 The restored image (House) for each algorithm; experiments were performed under the \( *_c \) product

Fig. 3 The restored image (Airplane) for each algorithm; experiments were performed under the t-product

Table 1 The computational results for Algorithms 3, 4, PGD, CG, NSPG and GMRES when the stopping criterion is reached under \( *_c \) product
Table 2 The computational results for Algorithms 3, 4, PGD, CG, NSPG and GMRES when the stopping criterion is reached under t-product

The image restoration model is a typical ill-posed problem, and iterative solvers often exhibit the semi-convergence phenomenon. In particular, once the residual of Problem 1.1 drops below \( \Vert {{\mathscr {E}}}\Vert _F \), the least squares fit begins to treat part of the noise as signal to be restored. This overfitting causes semi-convergence and makes the restored image deviate noticeably from the true image; the effect is especially pronounced at high noise levels. In this case, we add Tikhonov regularization to Problem 1.1 to overcome this difficulty, i.e.,

$$\begin{aligned} \mathop {\min }\limits _{{{\mathscr {X}}}\in \mathbb {R}^{ {n} \times {n} \times {3}}} \Vert {{\mathscr {A}}}{*_L}{{\mathscr {X}}} - {{{\mathscr {B}}}}\Vert _F^2 + {\mu }\Vert {{\mathscr {X}}}\Vert _F^2. \end{aligned}$$
(4.3)

Set the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {n} \times {n} \times {3}} \rightarrow \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be

$$\begin{aligned} {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = \left[ \begin{array}{c} {{{\mathscr {A}}}}\\ {\mu ^{1/2}{{\mathscr {I}}}} \end{array} \right] *_L {{\mathscr {X}}}, \end{aligned}$$

and the tensor \( {{\mathscr {C}}} \in \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be

$$\begin{aligned} {{\mathscr {C}}} = \left[ \begin{array}{l} {{{\mathscr {B}}}}\\ {{\mathscr {O}}} \end{array} \right] . \end{aligned}$$

Our proposed algorithms can also be used to solve Problem (4.3). In the upcoming experiments, we will address this problem by utilizing Algorithms 3 and 4. Additionally, we will compare our results not only with PGD, CG, NSPG, and GMRES algorithms but also with the recently proposed GG-tGKT [34, 35] and GG-LtAT [35] methods.
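As a sketch of how (4.3) fits the framework of Problem 1.1, the stacked operator and right-hand side can be formed as follows, reusing the hypothetical tprod_L helper from the earlier sketch; since the identity tensor satisfies \( {{\mathscr {I}}} *_L {{\mathscr {X}}} = {{\mathscr {X}}} \), the lower block reduces to \( \mu ^{1/2}{{\mathscr {X}}} \).

```python
import numpy as np

def augmented_operator(A, X, mu, M):
    """M(X) = [A; mu^(1/2) I] *_L X, stacked along the first mode.
    Since I *_L X = X, the lower block is simply sqrt(mu) * X."""
    top = tprod_L(A, X, M)       # A *_L X, reusing the earlier tprod_L sketch
    bottom = np.sqrt(mu) * X     # sqrt(mu) * (I *_L X) = sqrt(mu) * X
    return np.concatenate([top, bottom], axis=0)

def augmented_rhs(B):
    """C = [B; O], the observed image stacked over a zero tensor."""
    return np.concatenate([B, np.zeros_like(B)], axis=0)
```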

Example 4.3

This example studies the image restoration model with a higher noise level. We consider Problem (4.3) with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 3 \) and \( r = 3 \), with \( \alpha = 0.8 \), \( \beta = 0.1 \) and \( \gamma = 0.1 \) controlling the within-channel and cross-channel blur, respectively. We take \( Peppers \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Sailboat \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images and generate the blurred images \( {{\mathscr {B}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-2} \). The blurred images are displayed on the left of Fig. 4. The optimal regularization parameter \( \mu _{opt} = 5.1 \times 10^{-3} \) under the \(*_c\) product (\( \mu _{opt} = 2.9 \times 10^{-3} \) under the t-product) was obtained with the generalized GCV method [15].

The visual results of the two color images (Peppers and Sailboat) restored by Algorithms 3 and 4 are shown in Fig. 4. The images restored by our algorithms are very close to the original images, with clear contour and texture features.

Further, the comparison of Algorithms 3 and 4 with the other algorithms is presented in Tables 3 and 4. These tables show that Algorithms 3 and 4 achieve the same PSNR and RE values as the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms, so all algorithms restore the images to a comparable quality. However, our methods require the least CPU time among all compared algorithms. Therefore, Algorithms 3 and 4 converge faster than the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms.

Fig. 4 The restored images (Peppers and Sailboat); the first row is obtained under the t-product, and the second row under the \( *_c \) product

Table 3 The computational results for Algorithms 3, 4, PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT when the stopping criterion is reached under \( *_c \) product
Table 4 The computational results for Algorithms 3, 4, PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT when the stopping criterion is reached under t-product

5 Conclusion

In this paper, we consider a class of tensor least squares problems under the tensor-tensor product with an invertible linear transform, which arises in image restoration. Two Paige-type algorithms are proposed to solve this problem, and their convergence theorems are derived. Numerical experiments show that the new algorithms are feasible and effective for Problem 1.1, and two simulation experiments on image restoration demonstrate their good performance.