1 Introduction

Throughout this paper, vectors are written as italic lowercase letters such as u, v, matrices as uppercase letters such as A, B, and tensors as calligraphic capital letters such as \( {\mathscr {A}} \), \( {\mathscr {B}} \). Let \(\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}\) denote the set of \({I_1} \times {I_2} \times {I_3}\) tensors over the real field \(\mathbb {R}\). The zero tensor \( {\mathscr {O}} \) is the tensor with all entries equal to zero. A slice is a two-dimensional section of a tensor, obtained by fixing all but two indices. The kth frontal slice \( \textbf{X}_{::k} \) of a third-order tensor \( {{\mathscr {X}}} \in \mathbb {R}^{{I_1} \times {I_2} \times {I_3}} \) is denoted by \( {{\mathscr {X}}}^{(k)} \). The tensor Frobenius norm \(||\mathscr {A} ||_F = \sqrt{\left\langle {{\mathscr {A}},\mathscr {A}} \right\rangle }\) is induced by the tensor inner product

$$\begin{aligned} \left\langle {{\mathscr {A}},{\mathscr {B}}} \right\rangle = \sum \limits _{i = 1}^{I_1}\sum \limits _{j = 1}^{I_2}\sum \limits _{k = 1}^{I_3} a_{ijk}b_{ijk}, \end{aligned}$$

where \( {{\mathscr {A}}} = (a_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \), \( {{\mathscr {B}}} = (b_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \).

In this paper, we consider the following tensor least squares problem.

Problem 1.1

Given tensors \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\), \({{\mathscr {B}} } \in \mathbb {R}^{{p} \times {q} \times {n}}\), and \({{\mathscr {C}}} \in \mathbb {R}^{{m} \times {q} \times {n}}\), find a tensor \( {\tilde{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) such that

$$\begin{aligned} ||{{\mathscr {A}}}{*_L}{\tilde{{\mathscr {X}}}}{*_L}{{\mathscr {B}}} - {{{\mathscr {C}}}}||_F^2 = \mathop {\min }\limits _{{{\mathscr {X}}}\in \mathbb {R}^{ {l} \times {p} \times {n}}} ||{{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}} - {\mathscr {C}}||_F^2. \end{aligned}$$
(1.1)

Here we define the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) by

$$\begin{aligned} {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}. \end{aligned}$$

The operator \( *_L \) in Problem 1.1 is a tensor-tensor product with an invertible linear transform, which was first proposed by Kernfeld, Kilmer and Aeron [27]. It is a generalization of the t-product [28] and the cosine transform product (the \( *_c \) product). The definitions of, and connections between, the t-product and the \( *_c \) product are described in Section 2. The \( *_L \) product is also referred to as the \( \star _M \)-product family in the literature [29]. This kind of tensor-tensor product has been applied to the processing of seismic data [13], color image restoration [16, 18], facial recognition [21] and data compression [41].

Problem 1.1 often arises in image restoration. As described in [16], the 3D model of color image restoration can be expressed by the tensor equation

$$\begin{aligned} {{{\mathscr {B}}}} = {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}} + {{{\mathscr {E}}}}, \end{aligned}$$
(1.2)

where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {\mathscr {E}} \) are the observation image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the restored image to be sought. Obviously, Problem 1.1 is a generalized form of (1.2) which considers both horizontal and vertical blurring of the image. Compared with the previous image restoration models, namely the matrix product model [5]

$$\begin{aligned} g=Hx+n, \end{aligned}$$
(1.3)

and the Einstein product model [10, 31, 40]

$$\begin{aligned} {{{\mathscr {M}}}} = {{{\mathscr {T}}}}{ * _3}{{{\mathscr {Q}}}} + {{{\mathscr {N}}}}, \end{aligned}$$
(1.4)

the advantage of (1.2) is that it avoids the loss of information inherent in unfolding the tensor into a matrix and takes into account the ordering of and the correlation between the data.

Tensor equations and their least squares problems have been investigated by many scholars. Some mathematical tools, for example, the tensor generalized inverse, have been used in the literature [2, 3, 8, 11, 22, 37, 38]. The methods for solving tensor equations and their least squares problems can generally be divided into two classes. The first class consists of the so-called direct methods, such as the elimination method [7] and the tensor generalized inverse methods [2, 3, 6, 8]. However, these methods do not perform well on large-scale data. The second class consists of iterative methods. Wang, Xu and Duan [39] proposed a conjugate gradient method to solve the quaternion Sylvester tensor equation. Huang and Ma [23] and Xie et al. [40] solved the tensor least squares problem in the Einstein product by the conjugate gradient method and the preconditioned conjugate gradient method, respectively. Ding and Wei [12] proposed a Newton method to solve the homogeneous tensor equation of degree \( m-1 \) with a symmetric M-tensor coefficient. Guide, Ichi et al. [18] proposed GMRES-type methods for solving image processing problems with the t-product. Similar algorithms can also be found in the literature [9, 16, 17, 24, 39]. For ill-posed problems, Reichel and Ugwu [34, 35] used the Golub-Kahan bidiagonalization and the Arnoldi (upper Hessenberg) processes to solve the Tikhonov regularization problem in the image restoration model. Some other algorithms [19, 36] rely on matricization of tensors, and thus they are difficult to apply to large-scale tensor least squares problems. However, as far as we know, there are very few results on Problem 1.1.

In this paper, we consider the tensor least squares Problem 1.1 arising in image restoration. We first give the operator-bidiagonal procedures for \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use them to design two Paige's algorithms for solving Problem 1.1. Two convergence theorems for the new methods are also given. A numerical example shows the feasibility of the Paige's methods for solving Problem 1.1, and simulation experiments on image restoration illustrate the feasibility and effectiveness of the new methods. Specifically, the algorithms in this paper differ from the ones described in [34, 35] in the following two ways:

  • Our proposed operator-bidiagonal procedure is distinguished from the GG-tGKB process [35] in that it provides a unified iterative framework for the bidiagonalization of different linear operators.

  • We derive a coefficient relation between the minimal Frobenius norm solution of Problem 1.1 and the orthogonal tensor bases of the Krylov subspace

    $$\begin{aligned} {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}) = span\{ {{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}}){{\mathscr {M}}^*}({{{{\mathscr {U}}}}_1}), \cdots ,{({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}})^{k - 1}}{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})\}, \end{aligned}$$

    which allows our algorithms to use only the tensors generated at the current (kth) iteration. We do not need the additional memory for the orthogonal bases required in [34, 35], and this reduces the space complexity of the algorithm.

This paper is organized as follows. In Sect. 2, we introduce some notation and definitions, and summarize the connections among the \( *_L \) product, the t-product and the \( *_c \) product. In Sect. 3, we design the Paige's algorithms for solving Problem 1.1 and give the convergence theorems. In Sect. 4, some numerical experiments are given to illustrate that the algorithms are feasible and effective.

2 Preliminaries

In this section, we give some preliminaries, recall the \( *_L \) product proposed by Kernfeld, Kilmer and Aeron [27], and analyze the connections among the t-product, the \( *_c \) product, and the \( *_L \) product.

Given \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\) with frontal slices \( {{\mathscr {A}}}^{(i)} \), \( i = 1, 2, \cdots , n \), the operators bcirc, unfold and fold are defined as [28]

$$\begin{aligned} bcirc({{{\mathscr {A}}}}): = \left[ \begin{array}{cccc} {\mathscr {A}^{(1)}}&{}{\mathscr {A}^{(n)}}&{} \cdots &{}{\mathscr {A}^{(2)}}\\ {\mathscr {A}^{(2)}}&{}{\mathscr {A}^{(1)}}&{} \cdots &{}{\mathscr {A}^{(3)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {\mathscr {A}^{(n)}}&{}{\mathscr {A}^{(n - 1)}}&{} \cdots &{}{\mathscr {A}^{(1)}} \end{array} \right] , unfold({{\mathscr {A}}}) := \left[ \begin{array}{c} {\mathscr {A}^{(1)}}\\ {\mathscr {A}^{(2)}}\\ \vdots \\ {\mathscr {A}^{(n)}} \end{array} \right] , \end{aligned}$$

and the operator fold is defined so that \( fold(unfold({{\mathscr {A}}})) = {\mathscr {A}} \). Then the t-product can be defined as follows.

Definition 2.1

(t-product [28]) The t-product is the operator \( * \): \( \mathbb {R}^{ {m} \times {l} \times {n}} \times \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {p} \times {n}} \) which, for two tensors \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \), produces a tensor in \( \mathbb {R}^{ {m} \times {p} \times {n}} \) as follows

$$\begin{aligned} {{\mathscr {A}}}*{{\mathscr {B}}} = fold(bcirc({{\mathscr {A}}})unfold(\mathscr {B})). \end{aligned}$$

Notice that the block matrix \( bcirc ({{\mathscr {A}}}) \) can be block diagonalized by the discrete Fourier transform (DFT), i.e.,

$$\begin{aligned} A = blockdiag({{{\hat{{\mathscr {A}}}}}^{(1)}},{{{\hat{{\mathscr {A}}}}}^{(2)}}, \cdots ,{{{\hat{{\mathscr {A}}}}}^{(n)}}) = ({F_n} \otimes {I_m})bcirc({{{\mathscr {A}}}})({F_{n}^{*}} \otimes {I_l}), \end{aligned}$$

where \( \otimes \) is the Kronecker product and \( F_n \) is the unitary DFT matrix satisfying

$$\begin{aligned} F_{n}F_{n}^{*} = F_{n}^{*}F_{n} = I_n. \end{aligned}$$

Therefore, \( {{\mathscr {A}}}*{{\mathscr {B}}} \) with the t-product can be efficiently implemented by the fast Fourier transform.
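As a concrete illustration of this remark (not part of [28]), a minimal NumPy sketch of the t-product might look as follows: the tensors are transformed by the FFT along the third mode, the frontal slices are multiplied face-wise, and the result is transformed back. The helper name `t_product` is ours.

```python
import numpy as np

def t_product(A, B):
    """Sketch of the t-product A * B for A of shape (m, l, n) and B of shape (l, p, n)."""
    n = A.shape[2]
    Ah = np.fft.fft(A, axis=2)          # transform the tubes of A
    Bh = np.fft.fft(B, axis=2)          # transform the tubes of B
    Ch = np.empty((A.shape[0], B.shape[1], n), dtype=complex)
    for k in range(n):                  # face-wise product of the transformed frontal slices
        Ch[:, :, k] = Ah[:, :, k] @ Bh[:, :, k]
    return np.real(np.fft.ifft(Ch, axis=2))   # back-transform; real up to rounding for real inputs

# Tiny usage example
rng = np.random.default_rng(0)
A = rng.random((5, 4, 3)); B = rng.random((4, 6, 3))
C = t_product(A, B)                     # C has shape (5, 6, 3)
```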

Considering the discrete cosine transform instead of the discrete Fourier transform, we obtain the \( *_c \) product, which is analogous to the t-product. The block Toeplitz-plus-Hankel matrix \( mat({{\mathscr {A}}}) \) and the operator ten are defined in [27] in terms of the matrices \( E_l \) and T,

where \( E_l = \left[ {\begin{array}{*{20}{c}} {{I_l}}\\ 0\\ \vdots \\ 0 \end{array}} \right] \in \mathbb {R}^{nl \times l}\) consists of the first l columns of the identity matrix \( I_{nl} \) and

$$\begin{aligned} T = \left[ {\begin{array}{*{20}{c}} {{I_m}}&{}{ - {I_m}}&{} \cdots &{}{{{( - {I_m})}^{n - 1}}}\\ 0&{}{{I_m}}&{} \cdots &{}{{{( - {I_m})}^{n - 2}}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 0&{}0&{}0&{}{{I_m}} \end{array}} \right] \in \mathbb {R}^{nm \times nm}. \end{aligned}$$

Definition 2.2

(\( *_c \) product [27]) Let \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) be two real tensors. Then the \( *_c \) product \( {{\mathscr {A}}} *_c {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) real tensor defined by

$$\begin{aligned} {{\mathscr {A}}} *_c {{\mathscr {B}}} = ten(mat({{\mathscr {A}}})mat({{\mathscr {B}}})). \end{aligned}$$

Both the t-product and the \( *_c \) product use linear transformations that block-diagonalize the associated block matrix. If we regard each diagonal block as a frontal slice of a tensor, we can redefine the t-product and the \( *_c \) product using the mode-3 product.

Definition 2.3

(\( *_L \) product [27]) Let L be an invertible linear operator (see Definition 2.4), and set \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \). Then the \( *_L \) product \( {{\mathscr {A}}} *_L {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) tensor defined by

$$\begin{aligned} {{\mathscr {A}}} *_L {{\mathscr {B}}} = L^{-1}(L({{\mathscr {A}}}) \bigtriangleup L(\mathscr {B})), \end{aligned}$$

where the face-wise product \( {{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}} \) is defined by

$$\begin{aligned} ({{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}})^{(i)} = {{\mathscr {A}}}^{(i)}\mathscr {B}^{(i)}. \end{aligned}$$

Definition 2.4

([30]) Let \( M \in \mathbb {R}^{{n} \times {n}} \) be an invertible matrix. Then the invertible linear operator L: \( \mathbb {R}^{ {m} \times {l} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {l} \times {n}} \) is defined as

$$\begin{aligned} L({{\mathscr {A}}}) = {{\mathscr {A}}} \times _{3} M, \end{aligned}$$

where \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \).
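To make Definitions 2.3 and 2.4 concrete, the following minimal NumPy sketch (function names ours) forms \( {{\mathscr {A}}} *_L {{\mathscr {B}}} \) for an arbitrary invertible M by a mode-3 transform, a face-wise product, and the inverse transform.

```python
import numpy as np

def mode3(T, M):
    """Mode-3 product T x_3 M: each tube T[i, j, :] is multiplied by M."""
    return np.einsum('ijt,kt->ijk', T, M)

def star_L(A, B, M):
    """A *_L B = L^{-1}( L(A) face-wise-times L(B) ) with L(X) = X x_3 M."""
    LA, LB = mode3(A, M), mode3(B, M)
    LC = np.einsum('ilk,lpk->ipk', LA, LB)   # frontal slices multiplied face-wise
    return mode3(LC, np.linalg.inv(M))

# Usage with a random invertible transform of size n = 3
rng = np.random.default_rng(1)
M = rng.random((3, 3)) + 3 * np.eye(3)       # strictly diagonally dominant, hence invertible
A = rng.random((5, 4, 3)); B = rng.random((4, 6, 3))
C = star_L(A, B, M)                          # C has shape (5, 6, 3)
```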

As described by Kernfeld, Kilmer and Aeron [27], the \( *_c \) product is an example of the \( *_L \) product, where \( M = W_{c}^{-1}C_n(I+Z) \) in the invertible linear operator L. Here \( C_n \) denotes the \( n \times n \) orthogonal DCT matrix, \( W_c = diag(C_n(:,1)) \) and Z is the \( n \times n \) (singular) circulant upshift matrix. The connection between the t-product and the \( *_L \) product is

$$\begin{aligned} M = W_{t}^{-1}F_n, \end{aligned}$$

where \( W_t = diag(F_n(:,1)) \).

Using the invertible linear operator L, we can give a definition of the tensor transpose that is consistent with both the t-product and the \( *_c \) product.

Definition 2.5

([27]) Let \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \). Then the transpose \( {{\mathscr {A}}}^T \in \mathbb {R}^{ {l} \times {m} \times {n}} \) satisfies

$$\begin{aligned} L({{\mathscr {A}}}^T)^{(i)} = (L({{\mathscr {A}}})^{(i)})^T,~i=1, \cdots ,n. \end{aligned}$$

We also need the following definitions for the paper.

Definition 2.6

([18]) Let \( \mathbb {V} = [{{\mathscr {V}}}_1, {{\mathscr {V}}}_2, \cdots ,{{\mathscr {V}}}_k] \in \mathbb {R}^{m \times kp \times n} \), where \( {{\mathscr {V}}}_i \in \mathbb {R}^{m \times p \times n} \), \( i = 1,2, \cdots ,k \), and let \( y = [y_1, y_2, \cdots ,y_k]^T \in \mathbb {R}^k \), \( Z = [z_1,z_2, \cdots ,z_k] \in \mathbb {R}^{k \times k} \). Then the products \( \mathbb {V} \circledast y \) and \( \mathbb {V} \circledast Z \) are defined as

$$\begin{aligned} \mathbb {V} \circledast y = \sum \limits _{i = 1}^k {{y_i}{\mathscr {V}_i}}, ~ \mathbb {V} \circledast Z = [\mathbb {V} \circledast z_1, \mathbb {V} \circledast z_2, \cdots ,\mathbb {V} \circledast z_k]. \end{aligned}$$

Definition 2.7

([18]) Let \( \mathbb {A} = [{{\mathscr {A}}}_1, {{\mathscr {A}}}_2, \cdots ,{{\mathscr {A}}}_k] \in \mathbb {R}^{m \times kl \times n} \) and \( \mathbb {B} = [{{\mathscr {B}}}_1, {{\mathscr {B}}}_2, \cdots ,{{\mathscr {B}}}_p] \in \mathbb {R}^{m \times pl \times n} \), where \( {{\mathscr {A}}}_i, {{\mathscr {B}}}_i \in \mathbb {R}^{{m} \times {l} \times {n}} \), \( i = 1,2,\cdots \). Then the product \( {\mathbb {A}}^T \lozenge {\mathbb {B}} \) is the \( k \times p \) matrix defined by

$$\begin{aligned} ({\mathbb {A}}^T \lozenge {\mathbb {B}})_{ij} = \left\langle { \mathscr {A}_i,{{\mathscr {B}}}_j} \right\rangle . \end{aligned}$$

Notice that if \( \mathbb {B} \) reduces to a single tensor \( {{\mathscr {B}}} \in \mathbb {R}^{m \times l \times n} \), then the product \( {\mathbb {A}}^T \lozenge {{\mathscr {B}}} \) is a column vector of length k.
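A small NumPy sketch of Definitions 2.6 and 2.7 (names ours), storing the block \( \mathbb {V} \) as a Python list of equally sized tensors rather than as a concatenated \( m \times kp \times n \) tensor:

```python
import numpy as np

def circ_star(Vs, y):
    """V (circled-ast) y = sum_i y_i V_i for a list Vs of equally sized tensors."""
    return sum(yi * Vi for yi, Vi in zip(y, Vs))

def circ_star_mat(Vs, Z):
    """V (circled-ast) Z = [V (circled-ast) z_1, ..., V (circled-ast) z_k], one combination per column of Z."""
    return [circ_star(Vs, Z[:, j]) for j in range(Z.shape[1])]

def diamond(As, Bs):
    """(A^T lozenge B)_{ij} = <A_i, B_j>, the matrix of tensor Frobenius inner products."""
    return np.array([[np.sum(Ai * Bj) for Bj in Bs] for Ai in As])

# Usage
rng = np.random.default_rng(2)
Vs = [rng.random((5, 4, 3)) for _ in range(3)]
y = rng.random(3)
X = circ_star(Vs, y)                     # a 5 x 4 x 3 tensor
G = diamond(Vs, Vs)                      # 3 x 3 Gram matrix of <V_i, V_j>
```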

Definition 2.8

([26]) Let \( {\mathscr {M}} \): \( \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {q} \times {n}} \) be a linear operator. If the linear operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {l} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) satisfies

$$\begin{aligned} \left\langle {{{{\mathscr {M}}}}({{{\mathscr {X}}}}),{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{{\mathscr {M}}}}^*}({{{\mathscr {Y}}}})} \right\rangle ,~ \forall {{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} , ~ {{{\mathscr {Y}}}} \in \mathbb {R}^{ {l} \times {q} \times {n}}, \end{aligned}$$

then it is called the adjoint of \( {\mathscr {M}} \).

3 Paige’s algorithms for solving Problem 1.1

In this section, we first give the operator-bidiagonal procedure of \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use it to design the Paige's algorithms for solving Problem 1.1. The convergence of the new methods is also established.

3.1 The operator-bidiagonal procedure

This subsection gives the operator-bidiagonal procedure. Firstly, the adjoint operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) of the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) in Problem 1.1 is derived.

Lemma 3.1

Set \( {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}} \). The adjoint of the linear operator \( {\mathscr {M}} \) is

$$\begin{aligned} {{\mathscr {M}}}^*({{\mathscr {X}}}) = {{\mathscr {A}}}^T{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}^T. \end{aligned}$$

Proof

Here we first prove that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}}} \right\rangle ,~ \forall {{{\mathscr {X}}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} , ~ {{{\mathscr {Y}}}} \in \mathbb {R}^{ {m} \times {p} \times {n}}. \end{aligned}$$

Set \( M^T = (m_1, m_2, \cdots , m_n) \) and \( (M^{-1})^T = ({\hat{m}}_1, {\hat{m}}_2, \cdots , {\hat{m}}_n) \), that is, \( m_k \) and \( {\hat{m}}_k \) denote the kth rows of M and \( M^{-1} \) written as column vectors. Set \( {{\mathscr {A}}} = (a_{ijk}) \in \mathbb {R}^{m \times l \times n} \), \( {{\mathscr {X}}} = (x_{ijk}) \in \mathbb {R}^{l \times p \times n} \), \( {{\mathscr {Y}}} = (y_{ijk}) \in \mathbb {R}^{m \times p \times n} \), and set

$$\begin{aligned} a_{ij}^{(k)} := \textbf{a}_{ij:}^{T}m_k , ~ k = 1, 2, \cdots , n. \end{aligned}$$

Combining this with Definition 2.4, we have

$$\begin{aligned} L({{\mathscr {A}}})^{(i)} = ({{\mathscr {A}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {a_{11}^{(i)}}&{}{a_{12}^{(i)}}&{} \cdots &{}{a_{1l}^{(i)}}\\ {a_{21}^{(i)}}&{}{a_{22}^{(i)}}&{} \cdots &{}{a_{2l}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {a_{m1}^{(i)}}&{}{a_{m2}^{(i)}}&{} \cdots &{}{a_{ml}^{(i)}} \end{array}} \right] . \end{aligned}$$

Similarly, we have

$$\begin{aligned}&L({{\mathscr {X}}})^{(i)} = ({{\mathscr {X}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {x_{11}^{(i)}}&{}{x_{12}^{(i)}}&{} \cdots &{}{x_{1p}^{(i)}}\\ {x_{21}^{(i)}}&{}{x_{22}^{(i)}}&{} \cdots &{}{x_{2p}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {x_{l1}^{(i)}}&{}{x_{l2}^{(i)}}&{} \cdots &{}{x_{lp}^{(i)}} \end{array}} \right] ,\\&L({{\mathscr {Y}}})^{(i)} = ({{\mathscr {Y}}} \times _{3} M)^{(i)} = \left[ {\begin{array}{*{20}{c}} {y_{11}^{(i)}}&{}{y_{12}^{(i)}}&{} \cdots &{}{y_{1p}^{(i)}}\\ {y_{21}^{(i)}}&{}{y_{22}^{(i)}}&{} \cdots &{}{y_{2p}^{(i)}}\\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ {y_{m1}^{(i)}}&{}{y_{m2}^{(i)}}&{} \cdots &{}{y_{mp}^{(i)}} \end{array}} \right] , \end{aligned}$$

where \( x_{ij}^{(k)}:= \textbf{x}_{ij:}^{T}m_k, ~ y_{ij}^{(k)}:= \textbf{y}_{ij:}^{T}m_k, ~ k = 1, 2, \cdots , n \). Then the expression for any element of \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) can be given as

$$\begin{aligned} ({{\mathscr {A}}}{*_L}{{\mathscr {X}}})_{ijk} = \left( {\begin{array}{*{20}{c}} {\sum \nolimits _{t = 1}^l {a_{it}^{(1)}x_{tj}^{(1)}} }&{\sum \nolimits _{t = 1}^l {a_{it}^{(2)}x_{tj}^{(2)}} }&\cdots&{\sum \nolimits _{t = 1}^l {a_{it}^{(n)}x_{tj}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_k}. \end{aligned}$$

Noting that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \sum \limits _{i=1}^n {\left\langle {({{\mathscr {A}}}{*_L}\mathscr {X})^{(i)},{{\mathscr {Y}}}^{(i)}} \right\rangle }, \end{aligned}$$

and by making use of the matrix inner product of each frontal slice of the tensor \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) and tensor \( {\mathscr {Y}} \), we get

$$\begin{aligned} \begin{aligned}&\sum \limits _{i=1}^n{(({{\mathscr {A}}}{*_L}{{\mathscr {X}}})^{(i)T}\mathscr {Y}^{(i)})_{rr}}\\ {}&= \sum \limits _{i=1}^n {\sum \limits _{j=1}^m {\left( {\begin{array}{*{20}{c}} {\sum \limits _{t = 1}^l {a_{jt}^{(1)}x_{tr}^{(1)}} }&{\sum \limits _{t = 1}^l {a_{jt}^{(2)}x_{tr}^{(2)}} }&\cdots&{\sum \limits _{t = 1}^l {a_{jt}^{(n)}x_{tr}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_i}} {y_{jr}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{j=1}^m {\left( {\begin{array}{*{20}{c}} {\sum \limits _{t = 1}^l {a_{jt}^{(1)}x_{tr}^{(1)} {y_{jr}}} }&{\sum \limits _{t = 1}^l {a_{jt}^{(2)}x_{tr}^{(2)} {y_{jr}}} }&\cdots&{\sum \limits _{t = 1}^l {a_{jt}^{(n)}x_{tr}^{(n)} {y_{jr}}} } \end{array}} \right) {{{{\hat{m}}}}_i}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{t = 1}^l {\left( {\begin{array}{*{20}{c}} {\sum \limits _{j=1}^m {a_{jt}^{(1)}y_{jr}^{(1)} {x_{tr}}} }&{\sum \limits _{j=1}^m {a_{jt}^{(2)}y_{jr}^{(2)} {x_{tr}}} }&\cdots&{\sum \limits _{j=1}^m {a_{jt}^{(n)}y_{jr}^{(n)} {x_{tr}}} } \end{array}} \right) {{\hat{m}}_i}}} \\ {}&= \sum \limits _{i=1}^n {\sum \limits _{t = 1}^l {\left( {\begin{array}{*{20}{c}} {\sum \limits _{j=1}^m {a_{jt}^{(1)}y_{jr}^{(1)}} }&{\sum \limits _{j=1}^m {a_{jt}^{(2)}y_{jr}^{(2)} } }&\cdots&{\sum \limits _{j=1}^m {a_{jt}^{(n)}y_{jr}^{(n)}} } \end{array}} \right) {{{{\hat{m}}}}_i}} {x_{tr}}} \\ {}&= \sum \limits _{i=1}^n{({{\mathscr {X}}}^{(i)T}(\mathscr {A}^{T}{*_L}{{\mathscr {Y}}})^{(i)})_{rr}}, ~ r = 1, 2, \cdots ,p. \end{aligned} \end{aligned}$$

This implies that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}}} \right\rangle . \end{aligned}$$
(3.1)

In a similar way, we have

$$\begin{aligned} \left\langle {{{\mathscr {X}}}{*_L}{{\mathscr {B}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {Y}}}{*_L}{{\mathscr {B}}}^T}} \right\rangle . \end{aligned}$$
(3.2)

Combining (3.1) and (3.2), we can obtain that

$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}},{{{\mathscr {Y}}}}} \right\rangle = \left\langle {{{{\mathscr {X}}}},{{{\mathscr {A}}}^T{*_L}{{\mathscr {Y}}}{*_L}{{\mathscr {B}}}^T}} \right\rangle , \end{aligned}$$

which means that

$$\begin{aligned} {{\mathscr {M}}}^*({{\mathscr {X}}}) = {{\mathscr {A}}}^T{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}^T. \end{aligned}$$

The proof is completed. \(\square \)
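The identity in Lemma 3.1 can be checked numerically. The following hedged sketch does so on random data using the tensor transpose of Definition 2.5; for concreteness the transform matrix M is taken to be orthogonal (obtained from a QR factorization), and all helper names are ours.

```python
import numpy as np

def mode3(T, M):
    """Mode-3 product T x_3 M."""
    return np.einsum('ijt,kt->ijk', T, M)

def star_L(A, B, M):
    """A *_L B with the invertible transform L(X) = X x_3 M (cf. Definition 2.3)."""
    LC = np.einsum('ilk,lpk->ipk', mode3(A, M), mode3(B, M))
    return mode3(LC, np.linalg.inv(M))

def transpose_L(A, M):
    """Tensor transpose of Definition 2.5: transpose the frontal slices in the transform domain."""
    return mode3(mode3(A, M).transpose(1, 0, 2), np.linalg.inv(M))

rng = np.random.default_rng(3)
m, l, p, q, n = 5, 4, 3, 6, 3
M, _ = np.linalg.qr(rng.standard_normal((n, n)))    # an orthogonal transform, for simplicity
A, B = rng.random((m, l, n)), rng.random((p, q, n))
X, Y = rng.random((l, p, n)), rng.random((m, q, n))

lhs = np.sum(star_L(star_L(A, X, M), B, M) * Y)                 # <M(X), Y>
AT, BT = transpose_L(A, M), transpose_L(B, M)
rhs = np.sum(X * star_L(star_L(AT, Y, M), BT, M))               # <X, M*(Y)>
print(abs(lhs - rhs))                                           # ~1e-13, up to rounding
```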

Combining this with Definition 2.8, we reduce the operator \( {\mathscr {M}} \) in Problem 1.1 via the following operator-bidiagonal procedure, which is similar to the Golub-Kahan bidiagonalization technique [14].

Algorithm 1

The lower operator-bidiagonal procedure of \( {{\mathscr {M}}} \)

In Algorithm 1, the scalars \( \alpha _k \) and \( \beta _k \) are chosen so that \( ||{{{\mathscr {U}}}_k}||_F = ||{{{\mathscr {V}}}_k}||_F = 1 \). It is not difficult to find that the tensors generated by Algorithm 1 satisfy the following results

$$\begin{aligned} \begin{aligned} {{{{\mathscr {V}}}}_k}&\in span\{ {{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}}){{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}), \cdots ,{({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}})^{k - 1}}{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})\}\\ {}&\buildrel \Delta \over = {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})) = {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}),\\ {{{{\mathscr {U}}}}_k}&\in span\{ {{{{\mathscr {U}}}}_1},({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*})({{{{\mathscr {U}}}}_1}), \cdots ,{({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*})^{k - 1}}({{{{\mathscr {U}}}}_1})\} \buildrel \Delta \over = {{{{\mathscr {K}}}}_k}({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*},{{{{\mathscr {U}}}}_1}). \end{aligned} \end{aligned}$$

By using the definitions

$$\begin{aligned} \begin{aligned}&{\mathbb {V}_k}: = [{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_2}, \cdots ,{{{{\mathscr {V}}}}_k}] \in \mathbb {R}{^{{l} \times pk \times {n}}}, ~ {\mathbb {U}_k}: = [{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_2}, \cdots ,{{{{\mathscr {U}}}}_k}] \in \mathbb {R}{^{{m} \times qk \times {n}}},\\&{L_k} = \left[ {\begin{array}{*{20}{c}} {{\alpha _1}}&{}{}&{}{}&{}{}\\ {{\beta _2}}&{}{{\alpha _2}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\beta _k}}&{}{{\alpha _k}} \end{array}} \right] , ~ {{\tilde{L}}_k} = \left[ {\begin{array}{*{20}{c}} {{\alpha _1}}&{}{}&{}{}&{}{}\\ {{\beta _2}}&{}{{\alpha _2}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\beta _k}}&{}{{\alpha _k}}\\ {}&{}{}&{}{}&{}{{\beta _{k + 1}}} \end{array}} \right] ,\\&{{{{\mathscr {M}}}}^*}({\mathbb {U}_k}): = [{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_2}), \cdots ,{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_k})] \in \mathbb {R}{^{{l} \times pk \times {n}}},\\&{{{\mathscr {M}}}}({\mathbb {V}_k}): = [{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_1}),{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_2}), \cdots ,{{{\mathscr {M}}}}({{{{\mathscr {V}}}}_k})] \in \mathbb {R}{^{{m} \times qk \times {n}}}, \end{aligned} \end{aligned}$$

the recurrence relations of Algorithm 1 can be rewritten as

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({\mathbb {U}_k}) = {\mathbb {V}_k} \circledast L_k^T, ~ {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k}. \end{aligned}$$
(3.3)
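Read column by column, (3.3) corresponds to the per-step recurrences \( \beta _{k+1}{{\mathscr {U}}}_{k+1} = {{\mathscr {M}}}({{\mathscr {V}}}_k) - \alpha _k {{\mathscr {U}}}_k \) and \( \alpha _{k+1}{{\mathscr {V}}}_{k+1} = {{{\mathscr {M}}}}^*({{\mathscr {U}}}_{k+1}) - \beta _{k+1}{{\mathscr {V}}}_k \). Since the algorithm box itself is reproduced only as a figure, the following minimal NumPy sketch of Algorithm 1 is our own reading of these recurrences: `Mop` and `Madj` stand for \( {{\mathscr {M}}}(\cdot ) \) and \( {{{\mathscr {M}}}}^*(\cdot ) \), and no safeguard against breakdown (\( \alpha _k = 0 \) or \( \beta _k = 0 \)) is included.

```python
import numpy as np

def lower_bidiag(Mop, Madj, C, kmax):
    """Hedged sketch of the lower operator-bidiagonal procedure (Algorithm 1).

    Mop and Madj are callables for M(.) and M*(.); C is the starting tensor,
    so that beta_1 U_1 = C.  Returns U_1..U_{kmax+1}, V_1..V_{kmax+1} and the
    scalars alpha_k, beta_k that form L_k and tilde L_k in (3.3).
    """
    Us, Vs, alphas, betas = [], [], [], []
    beta = np.linalg.norm(C); U = C / beta
    Us.append(U); betas.append(beta)
    W = Madj(U); alpha = np.linalg.norm(W); V = W / alpha
    Vs.append(V); alphas.append(alpha)
    for _ in range(kmax):
        W = Mop(V) - alpha * U                  # beta_{k+1} U_{k+1} = M(V_k) - alpha_k U_k
        beta = np.linalg.norm(W); U = W / beta
        Us.append(U); betas.append(beta)
        W = Madj(U) - beta * V                  # alpha_{k+1} V_{k+1} = M*(U_{k+1}) - beta_{k+1} V_k
        alpha = np.linalg.norm(W); V = W / alpha
        Vs.append(V); alphas.append(alpha)
    return Us, Vs, np.array(alphas), np.array(betas)
```

For Problem 1.1 one would take, for instance, `Mop = lambda X: star_L(star_L(A, X, M), B, M)` and `Madj = lambda Y: star_L(star_L(AT, Y, M), BT, M)`, re-using the star_L and transpose_L helpers sketched earlier.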

Theorem 3.1

The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {U}}}_k\} \) generated by Algorithm 1 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{\mathscr {V}}_1}) \) and \({{{{\mathscr {K}}}}_k}({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*},{{\mathscr {U}}_1}) \), respectively, i.e.

$$\begin{aligned} \mathbb {U}_k^T \lozenge {\mathbb {U}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$
(3.4)

Proof

We prove this theorem by mathematical induction. When \( m = 1 \), we have

$$\begin{aligned} \begin{aligned} \left\langle {{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_2}} \right\rangle&= \left\langle {{{{{\mathscr {U}}}}_1},{{({{{\mathscr {M}}}}({{{{\mathscr {V}}}}_1}) - {\alpha _1}{{{{\mathscr {U}}}}_1})} / {{\beta _2}}}} \right\rangle \\ {}&= \frac{{\left\langle {{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),{{{{\mathscr {V}}}}_1}} \right\rangle - {\alpha _1}\left\langle {{{{{\mathscr {U}}}}_1},{{\mathscr {U}}_1}} \right\rangle }}{{{\beta _2}}}\\ {}&= \frac{{\left\langle {{\alpha _1}{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_1}} \right\rangle - {\alpha _1}\left\langle {{{{{\mathscr {U}}}}_1},{{{{\mathscr {U}}}}_1}} \right\rangle }}{{{\beta _2}}} = 0. \end{aligned} \end{aligned}$$

Assume that the result is true for \( m = k \); then for \( m = k+1 \), we have

$$\begin{aligned} \begin{aligned} \left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 2}}} \right\rangle&= \left\langle {{{{\mathscr {U}}}_i},{{({{\mathscr {M}}}({{{\mathscr {V}}}_{k + 1}}) - {\alpha _{k + 1}}{{{\mathscr {U}}}_{k + 1}})} / {{\beta _{k + 2}}}}} \right\rangle \\ {}&= \dfrac{{\left\langle {{\mathscr {U}_i},{{\mathscr {M}}}({{{\mathscr {V}}}_{k + 1}})} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \dfrac{{\left\langle {{{\mathscr {M}}}^*({\mathscr {U}_i}),{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \dfrac{{\left\langle {\alpha _i \mathscr {V}_i + \beta _i {{\mathscr {V}}}_{i-1},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}} \\ {}&= \left\{ \begin{array}{l} \dfrac{{\left\langle {{\alpha _i}{{{\mathscr {V}}}_i} + {\beta _i}{\mathscr {V}_{i - 1}},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_i},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}}, \qquad ~ ~ i = 1,2, \cdots , k \\ \dfrac{{\left\langle {{\alpha _{k + 1}}{{{\mathscr {V}}}_{k + 1}} + {\beta _{k + 1}}{{{\mathscr {V}}}_k},{{{\mathscr {V}}}_{k + 1}}} \right\rangle - {\alpha _{k + 1}}\left\langle {{{{\mathscr {U}}}_{k + 1}},{{{\mathscr {U}}}_{k + 1}}} \right\rangle }}{{{\beta _{k + 2}}}}, ~ i = k + 1 \end{array} \right. \\ {}&= 0. \end{aligned} \end{aligned}$$

Similarly, we also have

$$\begin{aligned} \begin{aligned} \left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k + 1}}} \right\rangle&= \left\langle {{{{\mathscr {V}}}_i},{{({{\mathscr {M}}}^*({{{\mathscr {U}}}_{k + 1}}) - {\beta _{k + 1}}{{{\mathscr {V}}}_{k}})} / {{\alpha _{k + 1}}}}} \right\rangle \\ {}&= \dfrac{{\left\langle {{{{\mathscr {V}}}_i}},\mathscr {M}^*({{{\mathscr {U}}}_{k + 1}}) \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \dfrac{{\left\langle {{{\mathscr {M}}}({\mathscr {V}_i}),{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \dfrac{{\left\langle {\alpha _i \mathscr {U}_i + \beta _{i+1} {{\mathscr {U}}}_{i+1},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}} \\ {}&= \left\{ \begin{array}{l} \dfrac{{\left\langle {\alpha _i {{\mathscr {U}}}_i + \beta _{i+1} {{\mathscr {U}}}_{i+1},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_i},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}}, ~ ~ ~ i = 1,2, \cdots , k-1 \\ \dfrac{{\left\langle {{\alpha _{k}}{{{\mathscr {U}}}_{k}} + {\beta _{k + 1}}{{{\mathscr {U}}}_{k+1}},{{{\mathscr {U}}}_{k + 1}}} \right\rangle - {\beta _{k + 1}}\left\langle {{{{\mathscr {V}}}_{k}},{{{\mathscr {V}}}_{k}}} \right\rangle }}{{{\alpha _{k + 1}}}}, ~ i = k \end{array} \right. \\ {}&= 0. \end{aligned} \end{aligned}$$

It is easy to obtain that \( ||{{\mathscr {V}}}_{k+1}||_F = ||{{\mathscr {U}}}_{k+1}||_F = 1 \), which means

$$\begin{aligned} \mathbb {U}_k^T \lozenge {\mathbb {U}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$

The proof is completed. \(\square \)

In a similar way we can get the upper operator-bidiagonal procedure of the linear operator \( {{\mathscr {T}}} \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \)

$$\begin{aligned} {{\mathscr {T}}}({{\mathscr {X}}}) = {\hat{{\mathscr {A}}}} *_L {{\mathscr {X}}} *_L {\hat{\mathscr {B}}}, \end{aligned}$$

where \( {\hat{{\mathscr {A}}}} \in \mathbb {R}^{ {l} \times {m} \times {n}} \) and \( {\hat{{\mathscr {B}}}} \in \mathbb {R}^{ {q} \times {p} \times {n}} \), which can be seen in the following Algorithm 2.

Algorithm 2

The upper operator-bidiagonal procedure of \({{\mathscr {T}}}\)

Similar to Algorithm 1, assume that

$$\begin{aligned} \begin{aligned}&{\mathbb {V}_k}: = [{{{{\mathscr {V}}}}_1},{{{{\mathscr {V}}}}_2}, \cdots ,{{\mathscr {V}}_k}] \in \mathbb {R}{^{{l} \times pk \times {n}}}, ~ {\mathbb {P}_k}: = [{{{{\mathscr {P}}}}_1},{{{{\mathscr {P}}}}_2}, \cdots ,{{\mathscr {P}}_k}] \in \mathbb {R}{^{{m} \times qk \times {n}}},\\ {}&{B_k} = \left[ {\begin{array}{*{20}{c}} {{\rho _1}}&{}{}{{\theta _2}}&{}{}{}&{}{}{}\\ {}&{}{}{{\rho _2}}&{}{} \ddots &{}{}{}\\ {}&{}{}{}&{}{} \ddots &{}{}{{\theta _k}}\\ {}&{}{}{}&{}{}{}&{}{}{{\rho _k}} \end{array}} \right] ,{{{\tilde{B}}}_k} = \left[ {\begin{array}{*{20}{c}} {{\rho _1}}&{}{}{{\theta _2}}&{}{}{}&{}{}{}&{}{}{}\\ {}&{}{}{{\rho _2}}&{}{} \ddots &{}{}{}&{}{}{}\\ {}&{}{}{}&{}{} \ddots &{}{}{{\theta _k}}&{}{}{}\\ {}&{}{}{}&{}{}{}&{}{}{{\rho _k}}&{}{}{{\theta _{k + 1}}} \end{array}} \right] , \end{aligned} \end{aligned}$$

the recurrence relations of Algorithm 2 can be rewritten as

$$\begin{aligned} {{{{\mathscr {T}}}}^*}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast B_k^T, ~ {{{\mathscr {T}}}}({\mathbb {P}_k}) = {\mathbb {V}_{k + 1}} \circledast {{\tilde{B}}_k}. \end{aligned}$$
(3.5)

Theorem 3.2

The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {P}}}_k\} \) generated by Algorithm 2 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{\mathscr {T}}}}{{{{\mathscr {T}}}}^*},{{{{\mathscr {V}}}}_1}) \) and \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {T}}}}^*}{{{\mathscr {T}}}},{{{{\mathscr {P}}}}_1}) \), respectively, i.e.

$$\begin{aligned} \mathbb {P}_k^T \lozenge {\mathbb {P}_k} = \mathbb {V}_k^T \lozenge {\mathbb {V}_k} = {I_k}. \end{aligned}$$

Proof

It is similar to that of Theorem 3.1 and is omitted here. \(\square \)

Comparing (3.3) with (3.5), it can be found that the lower operator-bidiagonal procedure of the adjoint operator \( {{\mathscr {M}}}^* \) is equivalent to the upper operator-bidiagonal procedure of the linear operator \( {\mathscr {M}} \).

3.2 New algorithms for Problem 1.1 based on the operator-bidiagonal procedure

In this subsection, we present two Paige’s Algorithms (denoted as Paige1-TTP and Paige2-TTP) based on Algorithms 1 and 2.

Set the initial tensor \( {{\mathscr {X}}}_0 = {{\mathscr {O}}} \) and \( \beta _1 {{\mathscr {U}}}_1 = {{\mathscr {C}}} \). Let \( ||{{\mathscr {R}}}||_F^2 = \mathop {\min }\nolimits _{{{\mathscr {X}}}} ||{{{\mathscr {M}}}}({{{\mathscr {X}}}}) - {{{\mathscr {C}}}}||_F^2 \) and let \( \hat{{\mathscr {X}}} \) be a solution of Problem 1.1. Then Problem 1.1 is equivalent to

$$\begin{aligned} {{{\mathscr {R}}}} + {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {C}}}} = {\beta _1}{{{{\mathscr {U}}}}_1}, \end{aligned}$$

and its optimality condition is

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({{{\mathscr {M}}}}({\hat{{\mathscr {X}}}}) - {{{\mathscr {C}}}}) = {{\mathscr {O}}}, \end{aligned}$$

i.e.,

$$\begin{aligned} {{{{\mathscr {M}}}}^*}({{\mathscr {R}}}) = {{\mathscr {O}}}. \end{aligned}$$

By constructing the approximate solution \( {{\mathscr {X}}}_k \) of Problem 1.1 as

$$\begin{aligned} {{\mathscr {X}}}_k = {\mathbb {V}_k} \circledast y_k, \end{aligned}$$

we have

$$\begin{aligned} {{{\mathscr {R}}}}_k + {{{\mathscr {M}}}}({\mathbb {V}_k} \circledast y_k) = {\beta _1}{{{{\mathscr {U}}}}_1}, \end{aligned}$$

where \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). Combining this with (3.3), we can deduce that

$$\begin{aligned} {{{\mathscr {R}}}}_k + ({\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k}) \circledast y_k = {\beta _1}{{{{\mathscr {U}}}}_1}. \end{aligned}$$
(3.6)

Simplifying (3.6) by (3.4), we obtain that

$$\begin{aligned} {{\tilde{L}}_k}{y_k} = {\beta _1}{e_1} - \mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}_k}. \end{aligned}$$
(3.7)

Clearly, there are two cases for Problem 1.1. In the first case, the equation \( {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {C}}}} \) is consistent, i.e., \( {{\mathscr {R}}} = {{\mathscr {O}}} \). In the second case, \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \) and the equation is inconsistent.

We first consider Case 1, in which \( {{\mathscr {R}}} = {{\mathscr {O}}} \). According to (3.7), we have

$$\begin{aligned} {{\tilde{L}}_k}{y_k} = {\beta _1}{e_1}. \end{aligned}$$

Thus \( y_k \) can be obtained recursively by

$$\begin{aligned} {\eta _1} = \frac{{{\beta _1}}}{{{\alpha _1}}}, ~ {\eta _{k + 1}} = - \frac{{{\beta _{k + 1}}}}{{{\alpha _{k + 1}}}}{\eta _k}. \end{aligned}$$
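A hedged sketch of this consistent-case update (our own transcription of the Case 1 recursion; the full Paige1-TTP method, Algorithm 3 below, additionally handles the inconsistent case) interleaves the operator-bidiagonal recurrences with the recursion for \( \eta _k \):

```python
import numpy as np

def paige1_consistent(Mop, Madj, C, kmax, tol=1e-10):
    """Sketch of Case 1 (consistent equations, R = O): run the operator-bidiagonal
    recurrences and accumulate X_k = sum_i eta_i V_i with eta_1 = beta_1/alpha_1
    and eta_{k+1} = -(beta_{k+1}/alpha_{k+1}) eta_k."""
    beta = np.linalg.norm(C); U = C / beta                   # beta_1 U_1 = C
    W = Madj(U); alpha = np.linalg.norm(W); V = W / alpha    # alpha_1 V_1 = M*(U_1)
    eta = beta / alpha
    X = eta * V
    for _ in range(kmax):
        W = Mop(V) - alpha * U
        beta = np.linalg.norm(W)
        if beta < tol:                                       # bidiagonalization has terminated
            break
        U = W / beta
        W = Madj(U) - beta * V
        alpha = np.linalg.norm(W)
        if alpha < tol:
            break
        V = W / alpha
        eta = -(beta / alpha) * eta                          # next entry of y_k
        X = X + eta * V
    return X
```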

In the case \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \), since \( {\mathscr {R}} \) is unknown a priori, the ith element \( \eta _{i} \) of \( y_k \) obviously cannot be determined at the same time as \( {{\mathscr {U}}}_i \) and \( {{\mathscr {V}}}_i \) are produced. We set

$$\begin{aligned} \gamma \equiv \left\langle {{{{{\mathscr {U}}}}_1},{{{\mathscr {R}}}}} \right\rangle , ~ t \equiv {\left( {{\tau _1},{\tau _2}, \cdots ,{\tau _{k - 1}}} \right) ^T} , ~ \left( {\begin{array}{*{20}{c}} 1\\ t\\ {{\tau _k}} \end{array}} \right) \equiv \frac{{\mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}}}}{\gamma }. \end{aligned}$$

Together with (3.3) and (3.4), we have

$$\begin{aligned} {{\tilde{L}}_k^T}(\mathbb {U}_{k + 1}^T \lozenge {{{\mathscr {R}}}}) = {({\mathbb {U}_{k + 1}} \circledast {{\tilde{L}}_k})^T} \lozenge {{{\mathscr {R}}}} = {{{\mathscr {M}}}}{({\mathbb {V}_k})^T} \lozenge {{{\mathscr {R}}}} = {\mathbb {V}_k}^T \lozenge ({{{{\mathscr {M}}}}^*}({{{\mathscr {R}}}})) = 0. \end{aligned}$$

Set

$$\begin{aligned} {{{\bar{L}}}_k} = \left[ {\begin{array}{*{20}{c}} {{\beta _2}}&{}{}&{}{}&{}{}\\ {{\alpha _2}}&{}{{\beta _3}}&{}{}&{}{}\\ {}&{} \ddots &{} \ddots &{}{}\\ {}&{}{}&{}{{\alpha _k}}&{}{{\beta _{k + 1}}} \end{array}} \right] , \end{aligned}$$

it follows on dividing by \( \gamma \) that

$$\begin{aligned} {{{\bar{L}}}_k}\left( {\begin{array}{*{20}{c}} t\\ {{\tau _k}} \end{array}} \right) = - {\alpha _1}{e_1}. \end{aligned}$$

Therefore, t can be solved by

$$\begin{aligned} {\tau _k} = - \frac{{{\alpha _k}}}{{{\beta _{k + 1}}}}{\tau _{k - 1}}, ~ k = 1,2, \cdots \end{aligned}$$

where \( {\tau _0} = 1 \). Then let \( {z_k} = {\left( {{\xi _1},{\xi _2}, \cdots ,{\xi _k}} \right) ^T}\) and \({w_k} = {\left( {{\omega _1},{\omega _2}, \cdots ,{\omega _k}} \right) ^T} \) be such that

$$\begin{aligned} {L_k}{z_k} = {\beta _1}{e_1}, ~{L_k}{w_k} = \left( {\begin{array}{*{20}{c}} 1\\ t \end{array}} \right) . \end{aligned}$$

By \( {y_k} = {z_k} - \gamma {w_k} \), we have

$$\begin{aligned} {\beta _{k + 1}}{\eta _k} = {\beta _{k + 1}}({\xi _k} - \gamma {\omega _k}) = - \gamma {\tau _k}, \end{aligned}$$

i.e.,

$$\begin{aligned} \gamma = \frac{{{\beta _{k + 1}}{\xi _k}}}{{{\beta _{k + 1}}{\omega _k} - {\tau _k}}}. \end{aligned}$$

Then \( {{\mathscr {X}}}_k \) in Algorithm 3 can be rewritten as

$$\begin{aligned} {{{{\mathscr {X}}}}_k} = {\mathbb {V}_k} \circledast {y_k} = {\mathbb {V}_k} \circledast {z_k} - \gamma {\mathbb {V}_k} \circledast {w_k}. \end{aligned}$$

It is easy to find that the residual after the kth step is

$$\begin{aligned} {{\mathscr {R}}}_k = \beta _{k+1}\gamma \omega _{k}{{\mathscr {U}}}_{k+1} -\beta _{k+1}\xi _{k}{{\mathscr {U}}}_{k+1} + \gamma \sum \limits _{i = 1}^k {{\tau _{i - 1}}{\mathscr {U}_i}} . \end{aligned}$$
Algorithm 3

Paige1-TTP method for solving Problem 1.1

In addition to monitoring whether \( {{\mathscr {X}}}_k \) has stabilized, by combining Algorithm 1 and Algorithm 3 we can easily see that \( \alpha _i \), \( \beta _i \), \( \xi _i \), and \( \omega _i \) become negligible, and that \( \gamma \) also tends to stabilize. Therefore we can also use \( |{{{\gamma }}_k} - {{{\gamma }}_{k - 1}}| \le \varepsilon \) or \(|\xi _i| \le \varepsilon \) as the stopping criterion. This also indicates that Algorithm 3 converges to the least squares solution.

Now we will give the convergence theorem for Algorithm 3.

Theorem 3.3

The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.

Proof

Suppose that the sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to \( {{\mathscr {X}}}^* \), which satisfies

$$\begin{aligned} {{\mathscr {X}}}^* = {\mathbb {V}_k} \circledast y_k \in {{\mathscr {K}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}), ~ {{\mathscr {M}}}^*(\mathscr {R}^*) = {{\mathscr {O}}}, \end{aligned}$$

where \( {{\mathscr {R}}}^* = {{\mathscr {M}}}({{\mathscr {X}}}^*) - {{\mathscr {C}}} \). Since \( {{\mathscr {X}}}^* \in {{\mathscr {K}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}) \), there exists an auxiliary tensor \( {{\mathscr {D}}} \in \mathbb {R}^{m \times q \times n} \) such that \( {{\mathscr {X}}}^* = {{\mathscr {M}}}^*({{\mathscr {D}}}) \). Let \( {\tilde{{\mathscr {X}}}} \) be any least squares solution of Problem 1.1 and \( {{\mathscr {Y}}} = {\tilde{{\mathscr {X}}}} - {{\mathscr {X}}}^* \). We can obtain that

$$\begin{aligned} \begin{aligned} ||{{\mathscr {R}}}^*||_{F}^2&= ||{\tilde{{\mathscr {R}}}}||_{F}^2 = ||{{\mathscr {C}}} - {{\mathscr {M}}}({\tilde{{\mathscr {X}}}})||_{F}^2 = ||{{\mathscr {C}}} - {{\mathscr {M}}}(\mathscr {X}^*+{{\mathscr {Y}}})||_{F}^2 \\ {}&= ||{{\mathscr {R}}}^* + {{\mathscr {M}}}({{\mathscr {Y}}}) ||_{F}^2 = ||{{\mathscr {R}}}^*||_{F}^2 +2\left\langle {{{\mathscr {R}}}^*,\mathscr {M}({{\mathscr {Y}}})} \right\rangle + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2 \\ {}&= ||{{\mathscr {R}}}^*||_{F}^2 +2\left\langle {{{\mathscr {M}}}^*({{\mathscr {R}}}^*),\mathscr {Y}} \right\rangle + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2 \\ {}&= ||\mathscr {R}^*||_{F}^2 + ||{{\mathscr {M}}}({{\mathscr {Y}}})||_{F}^2, \end{aligned} \end{aligned}$$

which implies that \( {{\mathscr {M}}}({{\mathscr {Y}}}) = {{\mathscr {O}}} \). Then we have

$$\begin{aligned} \begin{aligned} ||{\tilde{{\mathscr {X}}}}||_{F}^2&= ||{{{\mathscr {X}}}^*}+{{\mathscr {Y}}}||_{F}^2 = ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {{{\mathscr {X}}}^*,{{\mathscr {Y}}}} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {{{\mathscr {M}}}^*({{\mathscr {D}}}),{{\mathscr {Y}}}} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 +2\left\langle {\mathscr {D},{{\mathscr {M}}}({{\mathscr {Y}}})} \right\rangle + ||{{\mathscr {Y}}}||_{F}^2 \\ {}&= ||{{\mathscr {X}}}^*||_{F}^2 + ||{{\mathscr {Y}}}||_{F}^2 \ge ||{{\mathscr {X}}}^*||_{F}^2. \end{aligned} \end{aligned}$$

The equality holds if and only if \( {{\mathscr {Y}}} = {{\mathscr {O}}} \), which implies that \( {{\mathscr {X}}}^* \) is the minimum Frobenius norm solution. \(\square \)

Next, we will give the Paige2-TTP method. The upper operator-bidiagonal procedure of the operator \( {{\mathscr {M}}} \) is equivalent to the lower operator-bidiagonal procedure of the operator \( {{\mathscr {M}}}^* \) as mentioned above. Therefore, we have

$$\begin{aligned} {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast L_k^T, ~ {{{{\mathscr {M}}}}^*}({\mathbb {P}_k}) = {\mathbb {V}_{k + 1}} \circledast {{\tilde{L}}_k},~ {\mathbb {P}}_{k}^{T} \lozenge {\mathbb {P}}_{k} = {\mathbb {V}}_{k}^{T} \lozenge {\mathbb {V}}_{k} = I_k. \end{aligned}$$

As can be seen in Algorithm 3, we construct the approximate solution as

$$\begin{aligned} {{\mathscr {X}}}_k = {\mathbb {V}_k} \circledast y_k. \end{aligned}$$

For convenience, we choose \( {\theta _1}{{{{\mathscr {V}}}}_1} = {{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}}) \). Paige [32] points out that with this choice the bidiagonalization process will stop when \( {\theta _{k+1}}{{{{\mathscr {V}}}}_{k+1}} ={{\mathscr {O}}} \), i.e.

$$\begin{aligned} {{{\mathscr {M}}}}({\mathbb {V}_k}) = {\mathbb {P}_k} \circledast L_k^T, ~ {{{{\mathscr {M}}}}^*}({\mathbb {P}_k}) = {\mathbb {V}_{k}} \circledast {L_k}. \end{aligned}$$

Let \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). We are more interested in forming \( {\mathbb {V}_k} \circledast y_k \) step by step than in waiting for the end of the bidiagonalization, which would completely determine \( y_k \) and hence \( {{\mathscr {X}}}_k \). We rederive the iterative relation

$$\begin{aligned} {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {M}}}}({\mathbb {V}_k} \circledast {y_k}) = ({\mathbb {P}_k}\circledast L_k^T)\circledast {y_k} = {\mathbb {P}_k}\circledast (L_k^T{y_k}) = {{{\mathscr {C}}}} - {{{\mathscr {R}}}}, \end{aligned}$$

which implies

$$\begin{aligned} L_k^T{y_k} = \mathbb {P}_k^T \lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}). \end{aligned}$$

Notice that

$$\begin{aligned} \begin{aligned} {L_k}({\mathbb {P}_k}^T\lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}))&= {({\mathbb {P}_k}\circledast L_k^T)^T}\lozenge ({{{\mathscr {C}}}} - {\mathscr {R}}) = {({{{\mathscr {M}}}}({\mathbb {V}_k}))^T}\lozenge ({{{\mathscr {C}}}} - {\mathscr {R}}) \\ {}&= \mathbb {V}_k^T\lozenge ({{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}} - {{{\mathscr {R}}}})) = {\theta _1}{e_1}. \end{aligned} \end{aligned}$$

Setting \( z = {\mathbb {P}_k}^T\lozenge ({{{\mathscr {C}}}} - {{{\mathscr {R}}}}) \), we have

$$\begin{aligned} L_k^T{y_k} = z, ~ {L_k}z = {\theta _1}{e_1}. \end{aligned}$$
(3.8)

Combining (3.8) we obtain that

$$\begin{aligned} {{{{\mathscr {X}}}}_k} = {\mathbb {V}_k}\circledast {y_k} = ({\mathbb {V}_k}\circledast L_k^{ - T})\circledast (L_k^T{y_k}) = {\mathbb {W}_k}\circledast z, \end{aligned}$$

where \( {\mathbb {W}_k} = {\mathbb {V}_k}\circledast L_k^{ - T} \) and \( {\mathbb {W}_k} = [{{{{\mathscr {W}}}}_1},{{{{\mathscr {W}}}}_2}, \cdots ,{{{{\mathscr {W}}}}_k}] \). We can determine the \( {{\mathscr {W}}}_i \) sequentially from

$$\begin{aligned} {\mathbb {W}_k} \circledast L_k^{T} = {\mathbb {V}_k}. \end{aligned}$$

We give the Paige2-TTP Algorithm for solving Problem 1.1 as follows.

Algorithm 4

Paige2-TTP method for solving Problem 1.1

We also give the convergence theorem for Algorithm 4.

Theorem 3.4

The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 4 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.

Proof

It is similar to that of Theorem 3.3 and is omitted here. \(\square \)

3.3 Computational complexity analysis

In this subsection, we analyze the computational complexity of Algorithms 3 and 4. We note that the computational costs of the multiplications \( L({{\mathscr {A}}}) = {{\mathscr {A}}} \times _{3}M \) and \( {{\mathscr {A}}} {* _L} {{\mathscr {B}}} \) are \( O(mln^2) \) and \( O(mln^2 + lpn^2 + mlpn) \), respectively. Thus the cost of computing \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \) in Algorithm 1 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Similarly, the cost of computing \( {{\mathscr {V}}}_k \) and \( {{\mathscr {P}}}_k \) in Algorithm 2 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).

The computational cost of Algorithm 3 mainly involves tensor multiplications and additions. Apart from the cost of calculating \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \), which is the same as in Algorithm 1, each iteration of Algorithm 3 costs O(lpn), which means that the computational complexity of Algorithm 3 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Analogously, the computational cost of Algorithm 4 is dominated by steps 4 and 5. Therefore, the computational cost of Algorithm 4 is also \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).

4 Numerical experiments

In this section we give numerical examples and simulation experiments of image restoration to illustrate the performance of Algorithms 3 and 4. All of the tests are implemented in MATLAB R2018b with machine precision \( 10^{-16} \) on a PC with an Intel(R) Core(TM) i5-1155G7 CPU at 2.50 GHz and 16.0 GB of memory. The implementations of the algorithms are based on functions from the MATLAB Tensor Toolbox developed by Bader and Kolda [1].

Example 4.1

Consider Problem 1.1 with \( m = 5 \), \( l = 4 \), \( n = 3 \), \( p = 4 \), and \( q = 5 \), where the tensors \( {{\mathscr {A}}} \) and \( {{\mathscr {B}}} \) are constructed by using the MATLAB command \( rand(I_1, I_2, I_3) \) as follows

$$\begin{aligned}&\begin{aligned} \mathscr {A}(:,:,1)&= \left( {\begin{array}{*{20}{c}} {0.7663}&{}{\mathrm{{0}}\mathrm{{.9361}}}&{}{\mathrm{{0}}\mathrm{{.3733}}}&{}{\mathrm{{0}}\mathrm{{.1862}}}\\ {\mathrm{{0}}\mathrm{{.1746}}}&{}{\mathrm{{0}}\mathrm{{.8927}}}&{}{\mathrm{{0}}\mathrm{{.1163}}}&{}{\mathrm{{0}}\mathrm{{.2087}}}\\ {\mathrm{{0}}\mathrm{{.5031}}}&{}{\mathrm{{0}}\mathrm{{.4435}}}&{}{\mathrm{{0}}\mathrm{{.5251}}}&{}{0.5304}\\ {\mathrm{{0}}\mathrm{{.3827}}}&{}{\mathrm{{0}}\mathrm{{.7515}}}&{}{\mathrm{{0}}\mathrm{{.3411}}}&{}{\mathrm{{0}}\mathrm{{.0843}}}\\ {\mathrm{{0}}\mathrm{{.9772}}}&{}{\mathrm{{0}}\mathrm{{.9094}}}&{}{\mathrm{{0}}\mathrm{{.8494}}}&{}{0.2408} \end{array}} \right) , \\ {{\mathscr {A}}}(:,:,2)&= \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.6346}}}&{}{\mathrm{{0}}\mathrm{{.3135}}}&{}{\mathrm{{0}}\mathrm{{.4382}}}&{}{\mathrm{{0}}\mathrm{{.7165}}}\\ {\mathrm{{0}}\mathrm{{.1111}}}&{}{\mathrm{{0}}\mathrm{{.7621}}}&{}{\mathrm{{0}}\mathrm{{.1864}}}&{}{\mathrm{{0}}\mathrm{{.6203}}}\\ {\mathrm{{0}}\mathrm{{.6073}}}&{}{\mathrm{{0}}\mathrm{{.9545}}}&{}{\mathrm{{0}}\mathrm{{.4431}}}&{}{\mathrm{{0}}\mathrm{{.0898}}}\\ {\mathrm{{0}}\mathrm{{.8898}}}&{}{\mathrm{{0}}\mathrm{{.6459}}}&{}{\mathrm{{0}}\mathrm{{.4881}}}&{}{\mathrm{{0}}\mathrm{{.1118}}}\\ {\mathrm{{0}}\mathrm{{.4907}}}&{}{\mathrm{{0}}\mathrm{{.4853}}}&{}{\mathrm{{0}}\mathrm{{.8611}}}&{}{\mathrm{{0}}\mathrm{{.5756}}} \end{array}} \right) ,\\ {{\mathscr {A}}}(:,:,3)&= \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.4617}}}&{}{\mathrm{{0}}\mathrm{{.4819}}}&{}{\mathrm{{0}}\mathrm{{.5076}}}&{}{\mathrm{{0}}\mathrm{{.1959}}}\\ {\mathrm{{0}}\mathrm{{.5454}}}&{}{\mathrm{{0}}\mathrm{{.1605}}}&{}{\mathrm{{0}}\mathrm{{.2380}}}&{}{\mathrm{{0}}\mathrm{{.9748}}}\\ {\mathrm{{0}}\mathrm{{.0380}}}&{}{\mathrm{{0}}\mathrm{{.7544}}}&{}{\mathrm{{0}}\mathrm{{.1927}}}&{}{\mathrm{{0}}\mathrm{{.5751}}}\\ {\mathrm{{0}}\mathrm{{.5793}}}&{}{\mathrm{{0}}\mathrm{{.7935}}}&{}{\mathrm{{0}}\mathrm{{.8087}}}&{}{\mathrm{{0}}\mathrm{{.5832}}}\\ {\mathrm{{0}}\mathrm{{.6290}}}&{}{\mathrm{{0}}\mathrm{{.2914}}}&{}{\mathrm{{0}}\mathrm{{.0667}}}&{}{\mathrm{{0}}\mathrm{{.9500}}} \end{array}} \right) ; \end{aligned}\\&\begin{aligned} {{\mathscr {B}}}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.4521}}}&{}{\mathrm{{0}}\mathrm{{.0431}}}&{}{\mathrm{{0}}\mathrm{{.0431}}}&{}{\mathrm{{0}}\mathrm{{.8928}}}&{}{\mathrm{{0}}\mathrm{{.6670}}}\\ {\mathrm{{0}}\mathrm{{.0268}}}&{}{\mathrm{{0}}\mathrm{{.5561}}}&{}{\mathrm{{0}}\mathrm{{.7263}}}&{}{\mathrm{{0}}\mathrm{{.7794}}}&{}{\mathrm{{0}}\mathrm{{.9781}}}\\ {\mathrm{{0}}\mathrm{{.4253}}}&{}{\mathrm{{0}}\mathrm{{.6113}}}&{}{\mathrm{{0}}\mathrm{{.7133}}}&{}{\mathrm{{0}}\mathrm{{.4949}}}&{}{\mathrm{{0}}\mathrm{{.1171}}}\\ {\mathrm{{0}}\mathrm{{.1058}}}&{}{\mathrm{{0}}\mathrm{{.7388}}}&{}{\mathrm{{0}}\mathrm{{.5038}}}&{}{\mathrm{{0}}\mathrm{{.7615}}}&{}{\mathrm{{0}}\mathrm{{.8693}}} \end{array}} \right) , \\ {{\mathscr {B}}}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.5679}}}&{}{\mathrm{{0}}\mathrm{{.0623}}}&{}{\mathrm{{0}}\mathrm{{.9191}}}&{}{\mathrm{{0}}\mathrm{{.4355}}}&{}{\mathrm{{0}}\mathrm{{.1427}}}\\ {\mathrm{{0}}\mathrm{{.1202}}}&{}{\mathrm{{0}}\mathrm{{.2781}}}&{}{\mathrm{{0}}\mathrm{{.4152}}}&{}{\mathrm{{0}}\mathrm{{.6354}}}&{}{\mathrm{{0}}\mathrm{{.2006}}}\\ {\mathrm{{0}}\mathrm{{.9703}}}&{}{\mathrm{{0}}\mathrm{{.4096}}}&{}{\mathrm{{0}}\mathrm{{.7115}}}&{}{\mathrm{{0}}\mathrm{{.9016}}}&{}{\mathrm{{0}}\mathrm{{.6209}}}\\ 
{\mathrm{{0}}\mathrm{{.8400}}}&{}{\mathrm{{0}}\mathrm{{.6268}}}&{}{\mathrm{{0}}\mathrm{{.9709}}}&{}{\mathrm{{0}}\mathrm{{.8716}}}&{}{\mathrm{{0}}\mathrm{{.7750}}} \end{array}} \right) , \\ {{\mathscr {B}}}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm{{0}}\mathrm{{.2468}}}&{}{\mathrm{{0}}\mathrm{{.1708}}}&{}{\mathrm{{0}}\mathrm{{.0821}}}&{}{\mathrm{{0}}\mathrm{{.6053}}}&{}{\mathrm{{0}}\mathrm{{.6116}}}\\ {\mathrm{{0}}\mathrm{{.3695}}}&{}{\mathrm{{0}}\mathrm{{.7456}}}&{}{\mathrm{{0}}\mathrm{{.7815}}}&{}{\mathrm{{0}}\mathrm{{.3545}}}&{}{\mathrm{{0}}\mathrm{{.8432}}}\\ {\mathrm{{0}}\mathrm{{.3411}}}&{}{\mathrm{{0}}\mathrm{{.0980}}}&{}{\mathrm{{0}}\mathrm{{.8487}}}&{}{\mathrm{{0}}\mathrm{{.7591}}}&{}{\mathrm{{0}}\mathrm{{.4570}}}\\ {\mathrm{{0}}\mathrm{{.4564}}}&{}{\mathrm{{0}}\mathrm{{.8903}}}&{}{\mathrm{{0}}\mathrm{{.4509}}}&{}{\mathrm{{0}}\mathrm{{.5894}}}&{}{\mathrm{{0}}\mathrm{{.8828}}} \end{array}} \right) . \end{aligned} \end{aligned}$$

The tensor \( {\mathscr {C}} \) of Problem 1.1 is generated by \({{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}}^* *_L {{\mathscr {B}}} \), where \( {{\mathscr {X}}}^* \) is the exact solution.

$$\begin{aligned}&\begin{aligned} {{\mathscr {C}}}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm{{2}}\mathrm{{.0709}}}&{}{\mathrm{{2}}\mathrm{{.9842}}}&{}{\mathrm{{3}}\mathrm{{.7128}}}&{}{\mathrm{{4}}\mathrm{{.6192}}}&{}{\mathrm{{4}}\mathrm{{.2295}}}\\ {\mathrm{{1}}\mathrm{{.2905}}}&{}{\mathrm{{2}}\mathrm{{.2195}}}&{}{\mathrm{{2}}\mathrm{{.7104}}}&{}{\mathrm{{3}}\mathrm{{.5677}}}&{}{\mathrm{{3}}\mathrm{{.2243}}}\\ {\mathrm{{2}}\mathrm{{.0320}}}&{}{\mathrm{{2}}\mathrm{{.6432}}}&{}{\mathrm{{3}}\mathrm{{.3329}}}&{}{\mathrm{{4}}\mathrm{{.2888}}}&{}{\mathrm{{3}}\mathrm{{.7896}}}\\ {\mathrm{{1}}\mathrm{{.3801}}}&{}{\mathrm{{2}}\mathrm{{.6006}}}&{}{\mathrm{{2}}\mathrm{{.7593}}}&{}{\mathrm{{3}}\mathrm{{.9167}}}&{}{\mathrm{{3}}\mathrm{{.8628}}}\\ {\mathrm{{2}}\mathrm{{.7502}}}&{}{\mathrm{{3}}\mathrm{{.6893}}}&{}{\mathrm{{4}}\mathrm{{.8344}}}&{}{\mathrm{{5}}\mathrm{{.7619}}}&{}{\mathrm{{5}}\mathrm{{.1022}}} \end{array}} \right) , \\ {{\mathscr {C}}}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm{{1}}\mathrm{{.9866}}}&{}{\mathrm{{1}}\mathrm{{.1023}}}&{}{\mathrm{{2}}\mathrm{{.5292}}}&{}{\mathrm{{2}}\mathrm{{.4022}}}&{}{\mathrm{{1}}\mathrm{{.4136}}}\\ {\mathrm{{1}}\mathrm{{.3878}}}&{}{\mathrm{{0}}\mathrm{{.6701}}}&{}{\mathrm{{1}}\mathrm{{.9098}}}&{}{\mathrm{{1}}\mathrm{{.7834}}}&{}{\mathrm{{0}}\mathrm{{.9858}}}\\ {\mathrm{{1}}\mathrm{{.6750}}}&{}{\mathrm{{0}}\mathrm{{.8739}}}&{}{\mathrm{{2}}\mathrm{{.2706}}}&{}{\mathrm{{2}}\mathrm{{.1246}}}&{}{\mathrm{{1}}\mathrm{{.1841}}}\\ {\mathrm{{1}}\mathrm{{.9704}}}&{}{\mathrm{{0}}\mathrm{{.7190}}}&{}{\mathrm{{2}}\mathrm{{.2637}}}&{}{\mathrm{{1}}\mathrm{{.9022}}}&{}{\mathrm{{1}}\mathrm{{.0239}}}\\ {\mathrm{{2}}\mathrm{{.2009}}}&{}{\mathrm{{1}}\mathrm{{.1848}}}&{}{\mathrm{{2}}\mathrm{{.8530}}}&{}{\mathrm{{2}}\mathrm{{.7460}}}&{}{\mathrm{{1}}\mathrm{{.5583}}} \end{array}} \right) , \end{aligned}\\&{{\mathscr {C}}}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm{{1}}\mathrm{{.5129}}}&{}{\mathrm{{2}}\mathrm{{.6267}}}&{}{\mathrm{{2}}\mathrm{{.5285}}}&{}{\mathrm{{3}}\mathrm{{.1925}}}&{}{\mathrm{{3}}\mathrm{{.8561}}}\\ {\mathrm{{1}}\mathrm{{.6347}}}&{}{\mathrm{{2}}\mathrm{{.4602}}}&{}{\mathrm{{2}}\mathrm{{.6092}}}&{}{\mathrm{{3}}\mathrm{{.0806}}}&{}{\mathrm{{3}}\mathrm{{.3725}}}\\ {\mathrm{{1}}\mathrm{{.2401}}}&{}{\mathrm{{2}}\mathrm{{.0938}}}&{}{\mathrm{{1}}\mathrm{{.9912}}}&{}{\mathrm{{2}}\mathrm{{.8409}}}&{}{\mathrm{{3}}\mathrm{{.2008}}}\\ {\mathrm{{2}}\mathrm{{.4940}}}&{}{\mathrm{{3}}\mathrm{{.4742}}}&{}{\mathrm{{3}}\mathrm{{.7211}}}&{}{\mathrm{{4}}\mathrm{{.8108}}}&{}{\mathrm{{5}}\mathrm{{.2201}}}\\ {\mathrm{{1}}\mathrm{{.6014}}}&{}{\mathrm{{3}}\mathrm{{.1166}}}&{}{\mathrm{{3}}\mathrm{{.0049}}}&{}{\mathrm{{3}}\mathrm{{.6039}}}&{}{\mathrm{{4}}\mathrm{{.2954}}} \end{array}} \right) . \end{aligned}$$

Given a random invertible linear transformation matrix

$$\begin{aligned} M = \left( {\begin{array}{*{20}{c}} { - 0.5560}&{}{0.1429}&{}{ - 0.8188}\\ { - 0.6669}&{}{ - 0.6646}&{}{0.3368}\\ { - 0.4961}&{}{0.7334}&{}{0.4648} \end{array}} \right) , \end{aligned}$$

we solve Problem 1.1 by using Algorithms 3 and 4 with \( \varepsilon = 10^{-6} \). After 129 iterations, we get the solution

$$\begin{aligned} {{\mathscr {X}}}^* = {{\mathscr {X}}}_{Paige1}^{129} = {{\mathscr {X}}}_{Paige2}^{129}, \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} {{\mathscr {X}}}_{Paige1}^{129}(:,:,1) = {{\mathscr {X}}}_{Paige2}^{129}(:,:,1) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.1000}}}&{}{}{\mathrm {{0}}\mathrm {{.1670}}}&{}{}{\mathrm {{0}}\mathrm {{.3653}}}&{}{}{\mathrm {{0}}\mathrm {{.5469}}}\\ {\mathrm {{0}}\mathrm {{.8787}}}&{}{}{\mathrm {{0}}\mathrm {{.6461}}}&{}{}{\mathrm {{0}}\mathrm {{.4024}}}&{}{}{\mathrm {{0}}\mathrm {{.9860}}}\\ {\mathrm {{0}}\mathrm {{.8915}}}&{}{}{\mathrm {{0}}\mathrm {{.1803}}}&{}{}{\mathrm {{0}}\mathrm {{.0898}}}&{}{}{\mathrm {{0}}\mathrm {{.2127}}}\\ {\mathrm {{0}}\mathrm {{.2140}}}&{}{}{\mathrm {{0}}\mathrm {{.1165}}}&{}{}{\mathrm {{0}}\mathrm {{.6838}}}&{}{}{\mathrm {{0}}\mathrm {{.9190}}} \end{array}} \right) , \\ {{\mathscr {X}}}_{Paige1}^{129}(:,:,2) = \mathscr {X}_{Paige2}^{129}(:,:,2) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.6652}}}&{}{}{\mathrm {{0}}\mathrm {{.1063}}}&{}{}{\mathrm {{0}}\mathrm {{.7282}}}&{}{}{\mathrm {{0}}\mathrm {{.7956}}}\\ {\mathrm {{0}}\mathrm {{.7435}}}&{}{}{\mathrm {{0}}\mathrm {{.8838}}}&{}{}{\mathrm {{0}}\mathrm {{.2856}}}&{}{}{\mathrm {{0}}\mathrm {{.0735}}}\\ {\mathrm {{0}}\mathrm {{.3785}}}&{}{}{\mathrm {{0}}\mathrm {{.4451}}}&{}{}{\mathrm {{0}}\mathrm {{.5937}}}&{}{}{\mathrm {{0}}\mathrm {{.9927}}}\\ {\mathrm {{0}}\mathrm {{.4748}}}&{}{}{\mathrm {{0}}\mathrm {{.6682}}}&{}{}{\mathrm {{0}}\mathrm {{.3705}}}&{}{}{\mathrm {{0}}\mathrm {{.7760}}} \end{array}} \right) , \\ {{\mathscr {X}}}_{Paige1}^{129}(:,:,3) = \mathscr {X}_{Paige2}^{129}(:,:,3) = \left( {\begin{array}{*{20}{c}} {\mathrm {{0}}\mathrm {{.0074}}}&{}{}{\mathrm {{0}}\mathrm {{.8850}}}&{}{}{\mathrm {{0}}\mathrm {{.9972}}}&{}{}{\mathrm {{0}}\mathrm {{.6549}}}\\ {\mathrm {{0}}\mathrm {{.8889}}}&{}{}{\mathrm {{0}}\mathrm {{.4656}}}&{}{}{\mathrm {{0}}\mathrm {{.8698}}}&{}{}{\mathrm {{0}}\mathrm {{.4118}}}\\ {\mathrm {{0}}\mathrm {{.8641}}}&{}{}{\mathrm {{0}}\mathrm {{.3716}}}&{}{}{\mathrm {{0}}\mathrm {{.0001}}}&{}{}{\mathrm {{0}}\mathrm {{.4764}}}\\ {\mathrm {{0}}\mathrm {{.6976}}}&{}{}{\mathrm {{0}}\mathrm {{.0830}}}&{}{}{\mathrm {{0}}\mathrm {{.6800}}}&{}{}{\mathrm {{0}}\mathrm {{.3883}}} \end{array}} \right) . \end{aligned} \end{aligned}$$

The residual errors are

$$\begin{aligned} \Vert {{\mathscr {A}}} *_L {{\mathscr {X}}}_{Paige1}^{129} *_L {{\mathscr {B}}} - {{\mathscr {C}}}\Vert _F = 2.5535 \times 10^{-7}, ~ \Vert {{\mathscr {A}}} *_L {{\mathscr {X}}}_{Paige2}^{129} *_L {{\mathscr {B}}} - {{\mathscr {C}}}\Vert _F = 2.0378 \times 10^{-7}. \end{aligned}$$

The relative errors are

$$\begin{aligned} \frac{{\Vert {{\mathscr {X}}}_{Paige1}^{129} - {{\mathscr {X}}}^*\Vert _F}}{{\Vert \mathscr {X}^*\Vert _F}} = 1.6625 \times 10^{-7}, ~ \frac{{\Vert \mathscr {X}_{Paige2}^{129} - {{\mathscr {X}}}^*\Vert _F}}{{\Vert {{\mathscr {X}}}^*\Vert _F}} = 1.3598 \times 10^{-7}. \end{aligned}$$
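To make the computations in this example easy to reproduce, the following NumPy sketch shows one way to evaluate the \( *_L \) product and the residual, assuming the usual construction recalled in Section 2 (apply \( M \) along the third mode, multiply the frontal slices in the transform domain, then apply \( M^{-1} \)); the helper names and array layout are ours and are not part of the paper's algorithms.

```python
import numpy as np

def mode3_transform(X, M):
    """Apply the invertible transform M along the third mode: X x_3 M."""
    return np.einsum('ijk,lk->ijl', X, M)

def tprod_L(A, X, M):
    """*_L product of A (m x l x n) and X (l x p x n) under the transform M:
    transform along mode 3, multiply the frontal slices, transform back."""
    A_hat = mode3_transform(A, M)
    X_hat = mode3_transform(X, M)
    n = A.shape[2]
    C_hat = np.stack([A_hat[:, :, k] @ X_hat[:, :, k] for k in range(n)], axis=2)
    return mode3_transform(C_hat, np.linalg.inv(M))

# Hypothetical usage with A, X, B, C stored as NumPy arrays and M the
# 3 x 3 transform given above:
# residual = np.linalg.norm(tprod_L(tprod_L(A, X, M), B, M) - C)
```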

We also give the convergence curves of Algorithms 3 and 4 in Fig. 1.

Fig. 1 Convergence curves of Algorithms 3 and 4

Example 4.1 demonstrates the feasibility of Algorithms 3 and 4 for solving Problem 1.1. To further illustrate their effectiveness, we present two simulation experiments on color image restoration.

The color image restoration problem can be modeled by the tensor equation [16]

$$\begin{aligned} {{{\mathscr {B}}}} = {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}} + {{{\mathscr {E}}}}, \end{aligned}$$

where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {{\mathscr {E}}} \in \mathbb {R}^{n \times n \times 3} \) are the observed image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the image to be restored. Each frontal slice of \( {{\mathscr {A}}}\in \mathbb {R}^{n \times n \times 3} \) is a Toeplitz matrix obtained from a matrix \( {S} \in \mathbb {R}^{ {n} \times {n} }\), whose entries are given by the two-dimensional Gaussian function

$$\begin{aligned} S(i,j) = \left\{ \begin{array}{l} \frac{1}{{\sqrt{2\pi } \sigma }}\exp ( - \frac{{{{(i - j)}^2}}}{{2{\sigma ^2}}}),|i - j| \le r,\\ 0,\quad \quad \quad \quad \quad \quad \;otherwise, \end{array} \right. \end{aligned}$$
(4.1)

and \( {\mathscr {A}} \) is obtained from

$$\begin{aligned} {{\mathscr {A}}}^{(1)} = \alpha S, ~ {{\mathscr {A}}}^{(2)} = \beta S, ~ \mathscr {A}^{(3)} = \gamma S. \end{aligned}$$
(4.2)

Here \( \alpha \), \( \beta \) and \( \gamma \) are the entries of the circulant matrix

$$\begin{aligned} S_{color} = \left( {\begin{array}{*{20}{c}} \alpha &{}\gamma &{}\beta \\ \beta &{}\alpha &{}\gamma \\ \gamma &{}\beta &{}\alpha \end{array}} \right) , \end{aligned}$$

taken from [20], which satisfy \( \alpha +\beta +\gamma = 1 \). This matrix produces a cyclic mixing of the RGB channels. We set the noise level

$$\begin{aligned} v: = \frac{\Vert {{\mathscr {E}}}\Vert _F}{\Vert {{{\mathscr {A}}}}{ * _L}{{{\mathscr {X}}}}\Vert _F}. \end{aligned}$$

The noise \( {{\mathscr {E}}} \) follows a Gaussian distribution with zero mean and variance determined by the noise level \( v \).
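As a rough illustration, the blur model (4.1)-(4.2) and the noise can be generated as in the following NumPy sketch; the helper names are ours, and the noise is drawn as Gaussian and then scaled so that the noise-level definition above holds exactly.

```python
import numpy as np

def gaussian_blur_matrix(n, sigma, r):
    """Banded Toeplitz matrix S from (4.1): Gaussian within bandwidth r, zero outside."""
    i = np.arange(n)
    D = np.abs(i[:, None] - i[None, :]).astype(float)
    S = np.exp(-D ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    S[D > r] = 0.0
    return S

def blurring_tensor(n, sigma, r, alpha, beta, gamma):
    """Frontal slices of A from (4.2): A^(1) = alpha*S, A^(2) = beta*S, A^(3) = gamma*S."""
    S = gaussian_blur_matrix(n, sigma, r)
    A = np.zeros((n, n, 3))
    A[:, :, 0], A[:, :, 1], A[:, :, 2] = alpha * S, beta * S, gamma * S
    return A

def add_noise(Y, v, seed=0):
    """Gaussian noise E, scaled so that ||E||_F / ||Y||_F equals the noise level v."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(Y.shape)
    return Y + v * np.linalg.norm(Y) / np.linalg.norm(E) * E
```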

We use Algorithms 3 and 4, the PGD algorithm [25], the CG algorithm [33], the NSPG algorithm [4] and the GMRES algorithm [16] to solve the above model under both the \( *_c \) product and the t-product. The stopping criterion for all algorithms is either

$$\begin{aligned} \frac{{\Vert {{{{\mathscr {X}}}}_k} - {{{{\mathscr {X}}}}_{k - 1}}\Vert }}{{\Vert {{{{\mathscr {X}}}}_{k - 1}}\Vert }} \le {10^{ - 4}}, \end{aligned}$$

or the iteration count k reaching the upper limit of 1000. All color images in the experiments are from the USC-SIPI image database. We use the CPU time, the relative error (RE) and the peak signal-to-noise ratio (PSNR) to measure the restoration quality. The RE and PSNR are defined as

$$\begin{aligned} RE = \frac{{\Vert {{\mathscr {X}} - \mathscr {Q}}\Vert _F}}{{\Vert {{\mathscr {Q}}}\Vert _F}},~PSNR = 10{\log _{10}}(\frac{{{{\mathscr {Q}}}_{\max }^2{I_1}{I_2}{I_3}}}{{\Vert {{\mathscr {X}} - \mathscr {Q}}\Vert _F^2}}), \end{aligned}$$

where \( {\mathscr {X}} \) and \( {\mathscr {Q}} \) are the restored image and the true image, respectively, and \( {{\mathscr {Q}}}_{\max } \) is the largest entry of \( {\mathscr {Q}} \). The higher the PSNR and the lower the RE, the better the restoration.
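For reference, a small sketch of these quality measures and of the stopping rule above (helper names ours, with \( {{\mathscr {Q}}}_{\max } \) taken as the largest entry of the true image):

```python
import numpy as np

def relative_error(X, Q):
    """RE between the restored image X and the true image Q."""
    return np.linalg.norm(X - Q) / np.linalg.norm(Q)

def psnr(X, Q):
    """PSNR as defined above, with Q_max the largest entry of Q."""
    I1, I2, I3 = Q.shape
    err = np.linalg.norm(X - Q) ** 2
    return 10 * np.log10(Q.max() ** 2 * I1 * I2 * I3 / err)

def stop(X_k, X_prev, k, tol=1e-4, max_iter=1000):
    """Stopping rule used for all algorithms: relative change below tol,
    or the iteration count reaching the upper limit."""
    rel_change = np.linalg.norm(X_k - X_prev) / np.linalg.norm(X_prev)
    return rel_change <= tol or k >= max_iter
```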

Example 4.2

We consider Problem 1.1 with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 2 \) and \( r = 3 \), with \( \alpha = 0.5 \), \( \beta = 0.1 \) and \( \gamma = 0.4 \) controlling the within-channel and cross-channel blur, respectively. We take \( House \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Airplane \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images, and generate the blurred images \( {{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} *_L {{\mathscr {B}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-4} \), where \( {\mathscr {B}} \) is a third-order tensor with \( {{\mathscr {B}}}^{(1)} = S^T \) and \( {{\mathscr {B}}}^{(2)} = {{\mathscr {B}}}^{(3)} = 0 \).

For this example we consider both the t-product and the \( *_c \) product. We show the restored images produced by the various algorithms and magnify local details to compare how well each method handles the blurred images. We also report the CPU time, relative error (RE) and peak signal-to-noise ratio (PSNR) of each algorithm. All experimental results are listed in Figs. 2 and 3 and Tables 1 and 2.

Figures 2 and 3 show the restored images (House and Airplane). The two images restored by Algorithms 3 and 4 have clearer texture details and features, which indicates that their results are of higher quality and closer to the original images.

On the other hand, Tables 1 and 2 show that our methods (Algorithms 3 and 4) attain the highest PSNR, the lowest RE and the shortest CPU time, so they outperform the PGD, CG, NSPG and GMRES methods.

Fig. 2 The restored image (House) for each algorithm; experiments were performed under the \( *_c \) product

Fig. 3 The restored image (Airplane) for each algorithm; experiments were performed under the t-product

Table 1 The computational results for Algorithms 3, 4, PGD, CG, NSPG and GMRES when the stopping criterion is reached under \( *_c \) product
Table 2 The computational results for Algorithms 3, 4, PGD, CG, NSPG and GMRES when the stopping criterion is reached under t-product

The image restoration model is a typical ill-posed problem, and iterative solvers often exhibit the semi-convergence phenomenon. In particular, once the residual of Problem 1.1 drops below \( \Vert {{\mathscr {E}}}\Vert _F \), the least squares fit begins to treat part of the noise as signal to be restored. This overfitting causes semi-convergence and makes the restored image deviate noticeably from the true image; the effect is especially pronounced at high noise levels. In this case, we add Tikhonov regularization to Problem 1.1 to overcome this difficulty, i.e.,

$$\begin{aligned} \mathop {\min }\limits _{{{\mathscr {X}}}\in \mathbb {R}^{ {n} \times {n} \times {3}}} \Vert {{\mathscr {A}}}{*_L}{{\mathscr {X}}} - {{{\mathscr {B}}}}\Vert _F^2 + {\mu }\Vert {{\mathscr {X}}}\Vert _F^2. \end{aligned}$$
(4.3)

Set the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {n} \times {n} \times {3}} \rightarrow \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be

$$\begin{aligned} {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = \left[ \begin{array}{c} {{{\mathscr {A}}}}\\ {\mu ^{1/2}{{\mathscr {I}}}} \end{array} \right] *_L {{\mathscr {X}}}, \end{aligned}$$

and the tensor \( {{\mathscr {C}}} \in \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be

$$\begin{aligned} {{\mathscr {C}}} = \left[ \begin{array}{l} {{{\mathscr {B}}}}\\ {{\mathscr {O}}} \end{array} \right] . \end{aligned}$$

Our proposed algorithms can also be used to solve Problem (4.3). In the upcoming experiments, we will address this problem by utilizing Algorithms 3 and 4. Additionally, we will compare our results not only with PGD, CG, NSPG, and GMRES algorithms but also with the recently proposed GG-tGKT [34, 35] and GG-LtAT [35] methods.
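As a sketch of how (4.3) fits the framework of Problem 1.1, the stacked operator and right-hand side can be formed as follows, reusing the hypothetical tprod_L helper from the earlier sketch; since the identity tensor satisfies \( {{\mathscr {I}}} *_L {{\mathscr {X}}} = {{\mathscr {X}}} \), the lower block reduces to \( \mu ^{1/2}{{\mathscr {X}}} \).

```python
import numpy as np

def augmented_operator(A, X, mu, M):
    """M(X) = [A; mu^(1/2) I] *_L X, stacked along the first mode.
    Since I *_L X = X, the lower block is simply sqrt(mu) * X."""
    top = tprod_L(A, X, M)       # A *_L X, reusing the earlier tprod_L sketch
    bottom = np.sqrt(mu) * X     # sqrt(mu) * (I *_L X) = sqrt(mu) * X
    return np.concatenate([top, bottom], axis=0)

def augmented_rhs(B):
    """C = [B; O], the observed image stacked over a zero tensor."""
    return np.concatenate([B, np.zeros_like(B)], axis=0)
```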

Example 4.3

This example studies the image restoration model with a higher noise level. We consider Problem (4.3) with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 3 \) and \( r = 3 \), with \( \alpha = 0.8 \), \( \beta = 0.1 \) and \( \gamma = 0.1 \) controlling the within-channel and cross-channel blur, respectively. We take \( Peppers \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Sailboat \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images and generate the blurred images \( {{\mathscr {B}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-2} \). The blurred images are displayed on the left of Fig. 4. The optimal regularization parameter \( \mu _{opt} = 5.1 \times 10^{-3} \) under the \(*_c\) product (\( \mu _{opt} = 2.9 \times 10^{-3} \) under the t-product) was obtained with the generalized GCV method [15].

The visual results of the two color images (Peppers and Sailboat) restored by Algorithms 3 and 4 are shown in Fig. 4. The images restored by our algorithms are very close to the original images, with clear contour and texture features.

Further, the comparison of Algorithms 3 and 4 with the other algorithms is presented in Tables 3 and 4. These tables show that Algorithms 3 and 4 achieve the same PSNR and RE values as the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms, so all algorithms restore the images to a comparable quality. However, our methods require the least CPU time among all compared algorithms. Therefore, Algorithms 3 and 4 converge faster than the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms.

Fig. 4 The restored images (Peppers and Sailboat); the first row is obtained under the t-product, and the second row under the \( *_c \) product

Table 3 The computational results for Algorithms 3, 4, PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT when the stopping criterion is reached under \( *_c \) product
Table 4 The computational results for Algorithms 3, 4, PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT when the stopping criterion is reached under t-product

5 Conclusion

In this paper, we consider a class of tensor least squares problems under the tensor-tensor product with an invertible linear transform, which arises in image restoration. Two Paige-type algorithms are proposed to solve this problem, and their convergence theorems are derived. Numerical experiments show that the new algorithms are feasible and effective for Problem 1.1, and two simulation experiments on image restoration demonstrate their good performance.