Abstract
In this paper, we consider a class of tensor least squares problems with an invertible linear transform, which arises in image restoration. Based on the operator-bidiagonal procedure, two Paige’s algorithms are designed to solve it. The convergence theorems of the new methods are derived. Numerical experiments, both with synthetic data and on image restoration problems, are performed to illustrate the feasibility and efficiency of the new methods. Comparisons with some previous methods are also given.
1 Introduction
Throughout this paper, vectors are written in italic lowercase letters such as u, v, matrices correspond to uppercase letters, e.g., A, B, and tensors are denoted by calligraphic capital letters such as \( {{\mathscr {A}}} \), \( {{\mathscr {B}}} \). Let \(\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}\) denote the set of \({I_1} \times {I_2} \times {I_3}\) tensors over the real field \(\mathbb {R}\). The zero tensor \( {\mathscr {O}} \) is the one with all entries being zero. A slice is a two-dimensional section of a tensor, defined by fixing all but two indices. The kth frontal slice \( \textbf{X}_{::k} \) of the third-order tensor \( {{\mathscr {X}}} \in \mathbb {R}^{{I_1} \times {I_2} \times {I_3}} \) is denoted as \( {{\mathscr {X}}}^{(k)} \). The tensor Frobenius norm \(||\mathscr {A} ||_F = \sqrt{\left\langle {{\mathscr {A}},\mathscr {A}} \right\rangle }\) is induced by the tensor inner product
$$\begin{aligned} \left\langle {{\mathscr {A}},{\mathscr {B}}} \right\rangle = \sum \limits _{i = 1}^{{I_1}} \sum \limits _{j = 1}^{{I_2}} \sum \limits _{k = 1}^{{I_3}} {a_{ijk}}{b_{ijk}}, \end{aligned}$$
where \( {{\mathscr {A}}} = (a_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \), \( {{\mathscr {B}}} = (b_{ijk}) \in {\mathbb {R}^{{I_1} \times {I_2} \times {I_3}}} \).
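As a concrete illustration, the inner product sums the entrywise products and the Frobenius norm is its induced square root. A minimal numpy sketch (the function names are ours):

```python
import numpy as np

def tensor_inner(A, B):
    # <A, B> = sum_{i,j,k} a_{ijk} * b_{ijk}
    return float(np.sum(A * B))

def tensor_fro(A):
    # ||A||_F = sqrt(<A, A>)
    return float(np.sqrt(tensor_inner(A, A)))

A = np.arange(24.0).reshape(2, 3, 4)
# the tensor Frobenius norm agrees with the 2-norm of the vectorized tensor
print(tensor_fro(A))
```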
In this paper, we consider the following tensor least squares problem.
Problem 1.1
Given tensors \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\), \({{\mathscr {B}} } \in \mathbb {R}^{{p} \times {q} \times {n}}\), and \({{\mathscr {C}}} \in \mathbb {R}^{{m} \times {q} \times {n}}\), find a tensor \( \mathcal{\tilde{X}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) such that
$$\begin{aligned} ||{{\mathscr {M}}}(\mathcal{\tilde{X}}) - {{\mathscr {C}}}||_F = \mathop {\min }\limits _{{{\mathscr {X}}}} ||{{{\mathscr {M}}}}({{{\mathscr {X}}}}) - {{{\mathscr {C}}}}||_F. \end{aligned}$$
Here the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) is given by
$$\begin{aligned} {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}}. \end{aligned}$$
The operator \( *_L \) in Problem 1.1 is a tensor-tensor product with an invertible linear transform, which was first proposed by Kernfeld, Kilmer and Aeron [27]. It is a generalization of the t-product [28] and the cosine transform product (\( *_c \) product). The definitions of and connections between the t-product and the \( *_c \) product are described in Section 2. The \( *_L \) product is also referred to as the \( \star _M \)-product family in the literature [29]. This kind of tensor-tensor product has been applied to the processing of seismic data [13], color image restoration [16, 18], facial recognition [21] and data compression [41].
Problem 1.1 often arises in image restoration. As described in [16], the 3D model of color image restoration can be expressed as the tensor equation
$$\begin{aligned} {{\mathscr {B}}} = {{\mathscr {A}}}{*_L}{{\mathscr {X}}} + {{\mathscr {E}}}, \end{aligned}$$
(1.2)
where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {\mathscr {E}} \) are the observation image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the restored image to be sought. Obviously, Problem 1.1 is a generalized form of (1.2), as it considers both horizontal and vertical blurring of the image. Compared with the previous image restoration models, the matrix product model [5]
and the Einstein product model [10, 31, 40]
the advantage of (1.2) is that it avoids the information loss inherent in unfolding the tensor and takes the ordering of, and correlation between, the data into account.
Tensor equations and their least squares problems have been investigated by many scholars. Some mathematical tools, for example, the tensor generalized inverse, have been used in the literature [2, 3, 8, 11, 22, 37, 38]. The methods for solving tensor equations and their least squares problems can be roughly divided into two classes. The first class consists of the so-called direct methods, such as the elimination method [7] and the tensor generalized inverse methods [2, 3, 6, 8]. However, these methods do not perform well on large-scale data. The second class consists of the iterative methods. Wang, Xu and Duan [39] proposed a conjugate gradient method to solve the quaternion Sylvester tensor equation. Huang and Ma [23] and Xie et al. [40] solved the tensor least squares problem under the Einstein product by the conjugate gradient method and the preconditioned conjugate gradient method, respectively. Ding and Wei [12] proposed a Newton method to solve the degree-\((m-1)\) homogeneous tensor equation with a symmetric M-tensor coefficient. El Guide et al. [18] proposed GMRES-type methods for solving image processing problems with the t-product. Similar algorithms can also be found in [9, 16, 17, 24, 39]. For ill-posed problems, Reichel and Ugwu [34, 35] used the Golub-Kahan bidiagonalization and the Arnoldi process to solve the Tikhonov regularization problem in the image restoration model. Some other algorithms [19, 36] involve matricization of tensors and are therefore difficult to apply to large-scale tensor least squares problems. To the best of our knowledge, however, there are few results on Problem 1.1.
In this paper, we consider the tensor least squares Problem 1.1 arising in image restoration. We first give the operator-bidiagonal procedures of \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use them to design two Paige’s Algorithms for solving Problem 1.1. Two convergence theorems of the new methods are also given. A numerical example shows the feasibility of the new methods for solving Problem 1.1. The simulation experiments on image restoration illustrate the feasibility and effectiveness of the new methods. Specifically, the algorithms in this paper differ from the ones described in [34, 35] in the following two ways:
-
Our proposed operator-bidiagonalization process is distinguished from the GG-tGKB process [35] in that it provides a unified iterative framework for the bidiagonalization of different linear operators.
-
We derive a coefficient relation between the minimal Frobenius norm solution of Problem 1.1 and the orthogonal tensor bases of the Krylov subspace
$$\begin{aligned} {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{{{\mathscr {V}}}}_1}) = span\{ {{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1}),({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}}){{\mathscr {M}}^*}({{{{\mathscr {U}}}}_1}), \cdots ,{({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}})^{k - 1}}{{{{\mathscr {M}}}}^*}({{{{\mathscr {U}}}}_1})\}, \end{aligned}$$which allows our algorithms to use only the tensor generated by the current kth iteration. We do not need additional memory for the orthogonal bases as described in [34, 35], and this approach reduces the space complexity of the algorithm.
This paper is organized as follows. In Sect. 2, we introduce some notations and definitions, and summarize the connections among the \( *_L \) product, the t-product and the \( *_c \) product. In Sect. 3, we design the Paige’s algorithms for solving Problem 1.1 and give the convergence theorems. In Sect. 4, some numerical experiments are given to illustrate that the algorithms are feasible and effective.
2 Preliminaries
In this section, we give some preliminaries, recall the \( *_L \) product proposed by Kernfeld, Kilmer and Aeron [27], and analyze the connections among the t-product, the \( *_c \) product, and the \( *_L \) product.
Given \({{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}}\) and its frontal slices \( {{\mathscr {A}}}^{(i)} \), \( i = 1, 2, \cdots , n \), the operators bcirc, unfold and fold can be defined as [28]
and \( fold(unfold({{\mathscr {A}}})):= {\mathscr {A}} \). Then the t-product can be defined as follows.
Definition 2.1
(t-product [28]) Let the t-product be the operator \( * \): \( \mathbb {R}^{ {m} \times {l} \times {n}} \times \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {p} \times {n}} \) between two tensors \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \), which produces a tensor of \( \mathbb {R}^{ {m} \times {p} \times {n}} \) as follows
$$\begin{aligned} {{\mathscr {A}}}*{{\mathscr {B}}} = fold\left( bcirc({{\mathscr {A}}})\, unfold({{\mathscr {B}}}) \right) . \end{aligned}$$
Notice that the block matrix \( bcirc ({{\mathscr {A}}}) \) can be block diagonalized by the discrete Fourier transform (DFT), i.e.,
$$\begin{aligned} ({F_n} \otimes {I_m})\, bcirc({{\mathscr {A}}})\, (F_n^{*} \otimes {I_l}) = blockdiag\left( {\hat{A}}^{(1)}, {\hat{A}}^{(2)}, \cdots , {\hat{A}}^{(n)} \right) , \end{aligned}$$
where \( {\hat{A}}^{(k)} \) denotes the kth frontal slice of the tensor obtained by applying the DFT along the third mode of \( {{\mathscr {A}}} \), \( \otimes \) is the Kronecker product, and \( F_n \) is the unitary DFT matrix satisfying
Therefore, \( {{\mathscr {A}}}*{{\mathscr {B}}} \) with the t-product can be efficiently implemented by the fast Fourier transform.
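To make this equivalence concrete, the following numpy sketch (our own illustration, not the paper's code) computes \( {{\mathscr {A}}}*{{\mathscr {B}}} \) both by the block-circulant definition and face-wise in the Fourier domain:

```python
import numpy as np

def bcirc(A):
    # block-circulant matrix whose (i, j) block is the frontal slice with index (i - j) mod n
    m, l, n = A.shape
    return np.block([[A[:, :, (i - j) % n] for j in range(n)] for i in range(n)])

def unfold(B):
    # stack the frontal slices vertically into an (n*l) x p matrix
    l, p, n = B.shape
    return np.concatenate([B[:, :, k] for k in range(n)], axis=0)

def fold(M, n):
    # inverse of unfold: split the block column back into frontal slices
    m = M.shape[0] // n
    return np.stack([M[k * m:(k + 1) * m, :] for k in range(n)], axis=2)

def t_product_fft(A, B):
    # multiply corresponding frontal slices in the Fourier domain, then invert the DFT
    Ah, Bh = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
    return np.real(np.fft.ifft(np.einsum('ilk,lpk->ipk', Ah, Bh), axis=2))

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 2, 4)), rng.standard_normal((2, 5, 4))
C1 = fold(bcirc(A) @ unfold(B), 4)   # definition via bcirc/unfold/fold
C2 = t_product_fft(A, B)             # FFT-based implementation
```

Both routes produce the same \( 3 \times 5 \times 4 \) tensor; the FFT route costs \( O(n \log n) \) per tube instead of forming the \( nm \times nl \) block matrix.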
Considering the discrete cosine transform instead of the discrete Fourier transform, we obtain the \( *_c \) product similar to the t-product. The block Toeplitz-plus-Hankel matrix and the operator ten can be defined as [27]
where \( E_l = \left[ {\begin{array}{*{20}{c}} {{I_l}}\\ 0\\ \vdots \\ 0 \end{array}} \right] \in \mathbb {R}^{nl \times l}\) is the first l columns of the identity matrix \( I_{nl} \) and
Definition 2.2
(\( *_c \) product [27]) Set \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) be two real tensors. Then the \( *_c \) product \( {{\mathscr {A}}} *_c {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) real tensor defined by
Both the t-product and the \( *_c \) product use linear transformations that block diagonalize the associated block matrix. If we regard each diagonal block as a frontal slice of a tensor, we can redefine the t-product and the \(*_c\) product using the n-mode product.
Definition 2.3
(\( *_L \) product [27]) Let L be an invertible linear operator (see Definition 2.4), and set \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \) and \( {{\mathscr {B}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \). Then the \( *_L \) product \( {{\mathscr {A}}} *_L {{\mathscr {B}}} \) is an \( {m} \times {p} \times {n} \) tensor defined by
where the face-wise product \( {{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}} \) is defined by
$$\begin{aligned} {({{\mathscr {A}}} \bigtriangleup {{\mathscr {B}}})^{(i)}} = {{{\mathscr {A}}}^{(i)}}{{{\mathscr {B}}}^{(i)}}, ~ i = 1, 2, \cdots , n. \end{aligned}$$
Definition 2.4
([30]) Let \( M \in \mathbb {R}^{{n} \times {n}} \) be an invertible matrix. Then the invertible linear operator L: \( \mathbb {R}^{ {m} \times {l} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {l} \times {n}} \) is defined as
$$\begin{aligned} L({{\mathscr {A}}}) = {{\mathscr {A}}} { \times _3} M, \end{aligned}$$
where \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \).
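Definitions 2.3 and 2.4 can be realized in a few lines; the sketch below (ours) implements L as the mode-3 product with M and the face-wise product as a slice-by-slice matrix multiplication:

```python
import numpy as np

def mode3(A, M):
    # L(A) = A x_3 M : each tube A[i, j, :] is multiplied by M
    return np.einsum('ijk,rk->ijr', A, M)

def facewise(A, B):
    # (A △ B)^{(k)} = A^{(k)} B^{(k)}
    return np.einsum('ilk,lpk->ipk', A, B)

def L_product(A, B, M):
    # A *_L B = L^{-1}( L(A) △ L(B) )
    return mode3(facewise(mode3(A, M), mode3(B, M)), np.linalg.inv(M))

rng = np.random.default_rng(1)
n = 3
# a diagonally dominant (hence invertible) transform matrix, chosen for illustration
M = np.eye(n) + 0.1 * rng.standard_normal((n, n))
A, B = rng.standard_normal((4, 2, n)), rng.standard_normal((2, 5, n))
C = L_product(A, B, M)               # a 4 x 5 x n tensor
```

With \( M = I_n \), the \( *_L \) product reduces to the plain face-wise product, which is a quick way to check an implementation.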
As described by Kernfeld, Kilmer and Aeron [27], the \( *_c \) product is an example of the \( *_L \) product, where \( M = W_{c}^{-1}C_n(I+Z) \) in the invertible linear operator L. Here \( C_n \) denotes the \( n \times n \) orthogonal DCT matrix, \( W_c = diag(C_n(:,1)) \) and Z is the \( n \times n \) (singular) circulant upshift matrix. The connection between the t-product and the \( *_L \) product is
where \( W_t = diag(F_n(:,1)) \).
Using the invertible linear operator L, we can give the definition of a tensor transpose which is consistent with both the t-product and the \( *_c \) product.
Definition 2.5
([27]) Set \( {{\mathscr {A}}} \in \mathbb {R}^{ {m} \times {l} \times {n}} \), then the transpose \( {{\mathscr {A}}}^T \in \mathbb {R}^{ {l} \times {m} \times {n}} \) satisfies
$$\begin{aligned} {\left( L({{{\mathscr {A}}}^T}) \right) ^{(i)}} = {\left( {\left( L({{\mathscr {A}}}) \right) ^{(i)}} \right) ^T}, ~ i = 1, 2, \cdots , n. \end{aligned}$$
We also need the following definitions for the paper.
Definition 2.6
([18]) Let \( \mathbb {V} = [{{\mathscr {V}}}_1, {{\mathscr {V}}}_2, \cdots ,{{\mathscr {V}}}_k] \in \mathbb {R}^{m \times kp \times n} \), where \( {{\mathscr {V}}}_i \in \mathbb {R}^{m \times p \times n} \), \( i = 1,2, \cdots ,k \), and \( y = [y_1, y_2, \cdots ,y_k]^T \in \mathbb {R}^k \), \( Z = [z_1,z_2, \cdots ,z_k] \in \mathbb {R}^{k \times k} \). Then the products \( \mathbb {V} \circledast y \) and \( \mathbb {V} \circledast Z \) are defined as
$$\begin{aligned} \mathbb {V} \circledast y = \sum \limits _{i = 1}^{k} {y_i}{{{\mathscr {V}}}_i}, \qquad \mathbb {V} \circledast Z = \left[ \mathbb {V} \circledast {z_1}, \mathbb {V} \circledast {z_2}, \cdots , \mathbb {V} \circledast {z_k} \right] . \end{aligned}$$
Definition 2.7
([18]) Let \( \mathbb {A} = [{{\mathscr {A}}}_1, {{\mathscr {A}}}_2, \cdots ,{{\mathscr {A}}}_k] \in \mathbb {R}^{m \times kl \times n} \) and \( \mathbb {B} = [{{\mathscr {B}}}_1, {{\mathscr {B}}}_2, \cdots ,{{\mathscr {B}}}_p] \in \mathbb {R}^{m \times pl \times n} \), where \( {{\mathscr {A}}}_i \in \mathbb {R}^{{m} \times {l} \times {n}} \), \( i = 1,2,\cdots ,k \), and \( {{\mathscr {B}}}_j \in \mathbb {R}^{{m} \times {l} \times {n}} \), \( j = 1,2,\cdots ,p \). Then the product \( {\mathbb {A}}^T \lozenge {\mathbb {B}} \) is the \( k \times p \) matrix defined by
$$\begin{aligned} {\left[ {\mathbb {A}}^T \lozenge {\mathbb {B}} \right] _{ij}} = \left\langle {{{{\mathscr {A}}}_i},{{{\mathscr {B}}}_j}} \right\rangle , ~ i = 1, \cdots , k, ~ j = 1, \cdots , p. \end{aligned}$$
Notice that if \( \mathbb {B} = {{\mathscr {B}}} \in \mathbb {R}^{m \times l \times n} \), then the product \( {\mathbb {A}}^T \lozenge {{\mathscr {B}}} \) yields a column vector.
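Based on [18], \( \mathbb {V} \circledast y \) forms the tensor linear combination \( \sum _i y_i {{\mathscr {V}}}_i \) and \( {\mathbb {A}}^T \lozenge {\mathbb {B}} \) collects block-wise inner products. A small numpy sketch of this reading (the function names are ours, and block tensors are held as Python lists for clarity):

```python
import numpy as np

def circledast(Vs, y):
    # V ⊛ y = sum_i y_i * V_i : a tensor-valued linear combination of the blocks
    return sum(yi * Vi for yi, Vi in zip(y, Vs))

def lozenge(As, Bs):
    # (A^T ◊ B)_{ij} = <A_i, B_j> : a k x p matrix of tensor inner products
    return np.array([[np.sum(Ai * Bj) for Bj in Bs] for Ai in As])

rng = np.random.default_rng(2)
Vs = [rng.standard_normal((2, 3, 4)) for _ in range(3)]
y = np.array([1.0, -2.0, 0.5])
T = circledast(Vs, y)    # a 2 x 3 x 4 tensor
G = lozenge(Vs, Vs)      # 3 x 3 Gram matrix of the blocks
```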
Definition 2.8
([26]) Let \( {\mathscr {M}} \): \( \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {q} \times {n}} \) be a linear operator. If the linear operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {l} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) satisfies
$$\begin{aligned} \left\langle {{{\mathscr {M}}}({{\mathscr {X}}}),{{\mathscr {Y}}}} \right\rangle = \left\langle {{{\mathscr {X}}},{{{\mathscr {M}}}^*}({{\mathscr {Y}}})} \right\rangle \end{aligned}$$
for all \( {{\mathscr {X}}} \in \mathbb {R}^{ {l} \times {p} \times {n}} \) and \( {{\mathscr {Y}}} \in \mathbb {R}^{ {l} \times {q} \times {n}} \),
then it is called the adjoint of \( {\mathscr {M}} \).
3 Paige’s Algorithms for solving problem 1.1
In this section, we first give the operator-bidiagonal procedure of \( {{\mathscr {M}}}({{\mathscr {X}}}) \), and then use it to design the Paige’s Algorithms for solving Problem 1.1. The convergence of the new methods is also analyzed.
3.1 The operator-bidiagonal procedure
This subsection gives the operator-bidiagonal procedure. Firstly, the adjoint operator \( {{\mathscr {M}}}^* \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \) of the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {l} \times {p} \times {n}} \rightarrow \mathbb {R}^{ {m} \times {q} \times {n}}\) in Problem 1.1 is derived.
Lemma 3.1
Set \( {{\mathscr {M}}}({{\mathscr {X}}}) = {{\mathscr {A}}}{*_L}{{\mathscr {X}}}{*_L}{{\mathscr {B}}} \). The adjoint of the linear operator \( {\mathscr {M}} \) is
$$\begin{aligned} {{{\mathscr {M}}}^*}({{\mathscr {Y}}}) = {{{\mathscr {A}}}^T}{*_L}{{\mathscr {Y}}}{*_L}{{{\mathscr {B}}}^T}. \end{aligned}$$
Proof
Here we first prove that
$$\begin{aligned} \left\langle {{{\mathscr {A}}}{*_L}{{\mathscr {X}}},{{\mathscr {Y}}}} \right\rangle = \left\langle {{{\mathscr {X}}},{{{\mathscr {A}}}^T}{*_L}{{\mathscr {Y}}}} \right\rangle . \end{aligned}$$
Set \( M = (m_1, m_2, \cdots , m_n) \), \( M^{-1} = ({\hat{m}}_1, {\hat{m}}_2, \cdots , {\hat{m}}_n) \), \( {{\mathscr {A}}} = (a_{ijk}) \in \mathbb {R}^{m \times l \times n} \), \( {{\mathscr {X}}} = (x_{ijk}) \in \mathbb {R}^{l \times p \times n} \), \( {{\mathscr {Y}}} = (y_{ijk}) \in \mathbb {R}^{m \times p \times n} \), and set
Combining Definition 2.4, we have
Similarly, we have
where \( x_{ij}^{(k)}:= \textbf{x}_{ij:}^{T}m_k, ~ y_{ij}^{(k)}:= \textbf{y}_{ij:}^{T}m_k, ~ k = 1, 2, \cdots , n \). Then the expression for any element in \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) can be given as
Noting that
and by making use of the matrix inner product of each frontal slice of the tensor \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}} \) and tensor \( {\mathscr {Y}} \), we get
This implies that
In a similar way, we have
Combining (3.1) and (3.2), we can obtain that
which means that
The proof is completed. \(\square \)
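As a numerical sanity check of Lemma 3.1, the sketch below verifies \( \left\langle {{\mathscr {M}}}({{\mathscr {X}}}),{{\mathscr {Y}}} \right\rangle = \left\langle {{\mathscr {X}}}, {{\mathscr {A}}}^T *_L {{\mathscr {Y}}} *_L {{\mathscr {B}}}^T \right\rangle \), with the transpose taken slice-wise in the transform domain as in Definition 2.5. For this illustration only, we take M orthogonal so that L preserves the inner product (an assumption of the sketch, not of the lemma):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthogonal transform matrix

def L(T, M):                                       # mode-3 product T x_3 M
    return np.einsum('ijk,rk->ijr', T, M)

def Linv(T, M):
    return np.einsum('ijk,rk->ijr', T, np.linalg.inv(M))

def Lprod(A, B, M):                                # A *_L B
    return Linv(np.einsum('ilk,lpk->ipk', L(A, M), L(B, M)), M)

def ttrans(A, M):                                  # slice-wise transpose in the transform domain
    return Linv(L(A, M).transpose(1, 0, 2), M)

m, l, p, q = 3, 4, 5, 6
A = rng.standard_normal((m, l, n)); B = rng.standard_normal((p, q, n))
X = rng.standard_normal((l, p, n)); Y = rng.standard_normal((m, q, n))

lhs = np.sum(Lprod(Lprod(A, X, M), B, M) * Y)                         # <M(X), Y>
rhs = np.sum(X * Lprod(Lprod(ttrans(A, M), Y, M), ttrans(B, M), M))   # <X, M*(Y)>
```

Up to floating-point rounding, `lhs` and `rhs` coincide.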
Combining Definition 2.8, we simplify the operator \( {\mathscr {M}} \) in Problem 1.1 using the operator-bidiagonal procedure, which is similar to the Golub-Kahan bidiagonalization technique [14], as follows.
In Algorithm 1, the scalars \( \alpha _k \) and \( \beta _k \) are chosen so that \( ||{{{\mathscr {U}}}_k}||_F = ||{{{\mathscr {V}}}_k}||_F = 1 \). It is not difficult to find that after m iterations of Algorithm 1, we have the following results
By using the definitions
the recurrence relations of Algorithm 1 can be rewritten as
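The procedure can be written generically for any linear operator supplied together with its adjoint. Below is a sketch (ours) following the classical Golub-Kahan recurrences \( \beta _1 {{\mathscr {U}}}_1 = {{\mathscr {C}}} \), \( \alpha _1 {{\mathscr {V}}}_1 = {{\mathscr {M}}}^*({{\mathscr {U}}}_1) \), \( \beta _{k+1} {{\mathscr {U}}}_{k+1} = {{\mathscr {M}}}({{\mathscr {V}}}_k) - \alpha _k {{\mathscr {U}}}_k \), \( \alpha _{k+1} {{\mathscr {V}}}_{k+1} = {{\mathscr {M}}}^*({{\mathscr {U}}}_{k+1}) - \beta _{k+1} {{\mathscr {V}}}_k \); it assumes no breakdown (no vanishing \( \alpha _k \), \( \beta _k \)):

```python
import numpy as np

def operator_bidiag(Mop, Mstar, C, steps):
    # lower bidiagonalization of the operator Mop (with adjoint Mstar), started from C;
    # alpha[k] and beta[k] normalize V_{k+1} and U_{k+1} in the Frobenius norm
    fro = lambda T: float(np.sqrt(np.sum(T * T)))
    alpha, beta, Us, Vs = [], [], [], []
    beta.append(fro(C)); U = C / beta[-1]; Us.append(U)
    W = Mstar(U)
    alpha.append(fro(W)); V = W / alpha[-1]; Vs.append(V)
    for _ in range(steps - 1):
        W = Mop(V) - alpha[-1] * U
        beta.append(fro(W)); U = W / beta[-1]; Us.append(U)
        W = Mstar(U) - beta[-1] * V
        alpha.append(fro(W)); V = W / alpha[-1]; Vs.append(V)
    return alpha, beta, Us, Vs

# toy operator: ordinary matrix-vector multiplication with its transpose as adjoint
rng = np.random.default_rng(3)
A = rng.standard_normal((6, 5))
alpha, beta, Us, Vs = operator_bidiag(lambda x: A @ x, lambda y: A.T @ y,
                                      rng.standard_normal(6), 4)
```

The generated `Us` and `Vs` are (numerically) orthonormal, which is the content of Theorem 3.1 below for the tensor operator case.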
Theorem 3.1
The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {U}}}_k\} \) generated by Algorithm 1 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {M}}}}^*}{{{\mathscr {M}}}},{{\mathscr {V}}_1}) \) and \({{{{\mathscr {K}}}}_k}({{{\mathscr {M}}}}{{{{\mathscr {M}}}}^*},{{\mathscr {U}}_1}) \), respectively, i.e.
Proof
We prove this theorem by mathematical induction. When \( m = 1 \), we have
Assume that the result is true for \( m = k \); then for \( m = k+1 \), we have
Similarly, we also have
It is easy to obtain that \( ||{{\mathscr {V}}}_{k+1}||_F = ||{{\mathscr {U}}}_{k+1}||_F = 1 \), which means
The proof is completed. \(\square \)
In a similar way we can get the upper operator-bidiagonal procedure of the linear operator \( {{\mathscr {T}}} \): \( \mathbb {R}^{ {m} \times {q} \times {n}} \rightarrow \mathbb {R}^{ {l} \times {p} \times {n}} \)
where \( {\hat{{\mathscr {A}}}} \in \mathbb {R}^{ {l} \times {m} \times {n}} \) and \( {\hat{{\mathscr {B}}}} \in \mathbb {R}^{ {q} \times {p} \times {n}} \), which can be seen in the following Algorithm 2.
Similar to Algorithm 1, assume that
the recurrence relations of Algorithm 2 can be rewritten as
Theorem 3.2
The tensor sequences \( \{{{\mathscr {V}}}_k\} \) and \( \{{{\mathscr {P}}}_k\} \) generated by Algorithm 2 form orthonormal bases of the Krylov spaces \( {{{{\mathscr {K}}}}_k}({{{{\mathscr {T}}}}^*}{{{\mathscr {T}}}},{{{\mathscr {V}}}_1}) \) and \( {{{{\mathscr {K}}}}_k}({{{\mathscr {T}}}}{{{{\mathscr {T}}}}^*},{{{\mathscr {P}}}_1}) \), respectively, i.e.
Proof
The proof is similar to that of Theorem 3.1 and is omitted here. \(\square \)
Comparing (3.3) with (3.5), it can be found that the lower operator-bidiagonal procedure of the adjoint operator \( {{\mathscr {M}}}^* \) is equivalent to the upper operator-bidiagonal procedure of the linear operator \( {\mathscr {M}} \).
3.2 New algorithms for Problem 1.1 based on the operator-bidiagonal procedure
In this subsection, we present two Paige’s Algorithms (denoted as Paige1-TTP and Paige2-TTP) based on Algorithms 1 and 2.
Set the initial tensor \( {{\mathscr {X}}}_0 = {{\mathscr {O}}} \) and \( \beta _1 {{\mathscr {U}}}_1 = {{\mathscr {C}}} \). Suppose that \( ||{{\mathscr {R}}}||_F^2 = \mathop {\min }\nolimits _{{{\mathscr {X}}}} ||{{{\mathscr {M}}}}({{{\mathscr {X}}}}) - {{{\mathscr {C}}}}||_F^2 \) and that \( \hat{{\mathscr {X}}} \) is the solution of Problem 1.1. Then Problem 1.1 is equivalent to
and its optimality condition is
i.e.,
By constructing the approximate solution \( {{\mathscr {X}}}_k \) of Problem 1.1 as
we have
where \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). Combining (3.3), we can deduce that
Simplifying (3.6) by (3.4), we obtain that
Clearly, there are two cases of Problem 1.1. In the first case, the equations \( {{{\mathscr {M}}}}({{{\mathscr {X}}}}) = {{{\mathscr {C}}}} \) are consistent, i.e., \( {{\mathscr {R}}} = {{\mathscr {O}}} \). In the second case, \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \) and the equations are inconsistent.
We first consider Case 1, where \( {{\mathscr {R}}} = {{\mathscr {O}}} \). According to (3.7), we have
Thus y can be obtained by
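In this consistent case the lower-bidiagonal system can be solved by forward substitution as the iteration proceeds, one new entry \( \eta _i \) per step. A sketch under the standard lower-bidiagonal structure (diagonal entries \( \alpha _1, \cdots , \alpha _k \), subdiagonal entries \( \beta _2, \cdots , \beta _k \); the scalar names follow Algorithm 1, and this layout is our reading of the omitted display):

```python
import numpy as np

def solve_lower_bidiag(alpha, beta):
    # solve L_k y = beta[0] * e_1, where L_k has diagonal alpha[0..k-1]
    # and subdiagonal beta[1..k-1]; forward substitution, one eta per row
    y = [beta[0] / alpha[0]]
    for a, b in zip(alpha[1:], beta[1:]):
        y.append(-b * y[-1] / a)   # b * y_{i-1} + a * y_i = 0
    return np.array(y)

alpha, beta = [2.0, 3.0, 4.0], [5.0, 1.0, 0.5]
y = solve_lower_bidiag(alpha, beta)
```

The cost per iteration is O(1) scalar work, which is why \( \eta _k \) can be produced alongside \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \).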
Now consider the case with \( {{\mathscr {R}}} \ne {{\mathscr {O}}} \). Since \( {\mathscr {R}} \) is unknown a priori, the ith element \( \eta _{i} \) of y obviously cannot be found at the same time as \( {{\mathscr {U}}}_i \) and \( {{\mathscr {V}}}_i \) are produced. We suppose that
Together with (3.3) and (3.4), we have
Set
it follows on dividing by \( \gamma \) that
Therefore, t can be solved by
where \( {\tau _0} = 1 \). Then assume that \( {z_k} = {\left( {{\xi _1},{\xi _2}, \cdots ,{\xi _k}} \right) ^T}\) and \({w_k} = {\left( {{\omega _1},{\omega _2}, \cdots ,{\omega _k}} \right) ^T} \) such that
By \( {y_k} = {z_k} - \gamma {w_k} \), we have
i.e.,
Then, \( {{\mathscr {X}}}_k \) in Algorithm 3 can be redescribed as
It is easy to find that the residual after the kth step is
Besides monitoring whether \( {{\mathscr {X}}}_k \) has stabilized, we can combine Algorithm 1 and Algorithm 3 to observe that \( \alpha _i \), \( \beta _i \), \( \xi _i \), and \( \omega _i \) eventually become negligible. Since \( \gamma \) also tends to stabilize, we can likewise use \( |{{{\gamma }}_k} - {{{\gamma }}_{k - 1}}| \le \varepsilon \) or \(|\xi _i| \le \varepsilon \) as the stopping criterion. This also shows that Algorithm 3 converges to the least squares solution.
Now we will give the convergence theorem for Algorithm 3.
Theorem 3.3
The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.
Proof
Suppose that the sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 3 converges to \( {{\mathscr {X}}}^* \), which satisfies
where \( {{\mathscr {R}}}^* = {{\mathscr {M}}}({{\mathscr {X}}}^*) - {{\mathscr {C}}} \). We introduce an auxiliary tensor \( {{\mathscr {D}}} \in \mathbb {R}^{m \times q \times n} \) such that \( {{\mathscr {X}}}^* = {{\mathscr {M}}}^*({{\mathscr {D}}}) \). Let \( {\tilde{{\mathscr {X}}}} \) be any least squares solution of Problem 1.1 and set \( {{\mathscr {Y}}} = {\tilde{{\mathscr {X}}}} - {{\mathscr {X}}}^* \). We can obtain that
which implies that \( {{\mathscr {M}}}({{\mathscr {Y}}}) = {{\mathscr {O}}} \). Then we have
The equality holds if and only if \( {{\mathscr {Y}}} = {{\mathscr {O}}} \), which implies that \( {{\mathscr {X}}}^* \) is the minimum Frobenius norm solution. \(\square \)
Next, we will give the Paige2-TTP method. The upper operator-bidiagonal procedure of the operator \( {{\mathscr {M}}} \) is equivalent to the lower operator-bidiagonal procedure of the operator \( {{\mathscr {M}}}^* \) as mentioned above. Therefore, we have
As in Algorithm 3, we construct the approximate solution as
For convenience, we choose \( {\theta _1}{{{{\mathscr {V}}}}_1} = {{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}}) \). Paige [32] points out that when \( {\theta _1}{{{{\mathscr {V}}}}_1} = {{{{\mathscr {M}}}}^*}({{{\mathscr {C}}}}) \), the bidiagonalization process will stop when \( {\theta _{k+1}}{{{{\mathscr {V}}}}_{k+1}} ={{\mathscr {O}}} \), i.e.
Let \( {y_k} = {\left( {{\eta _1},{\eta _2}, \cdots ,{\eta _k}} \right) ^T} \). We are more interested in forming \( {\mathbb {V}_k} \circledast y_k \) as the iteration proceeds than in waiting for the end of the bidiagonalization, solving completely for \( y_k \) and then determining \( {{\mathscr {X}}}_k \). We rederive the iterative relation
which implies
Notice that
By assuming that \( z = {\mathbb {P}_k}^T\lozenge ({{{\mathscr {C}}}} - {{\mathscr {R}}}) \), we have
Combining (3.8) we obtain that
where \( {\mathbb {W}_k} = {\mathbb {V}_k}\circledast L_k^{ - T} \) and \( {\mathbb {W}_k} = [{{{{\mathscr {W}}}}_1},{{{{\mathscr {W}}}}_2}, \cdots ,{{{{\mathscr {W}}}}_k}] \). We can determine \( {{\mathscr {W}}}_i \) sequentially by
We give the Paige2-TTP Algorithm for solving Problem 1.1 as follows.
We also give the convergence theorem for Algorithm 4.
Theorem 3.4
The sequence \( \{{{\mathscr {X}}}_k\} \) generated by Algorithm 4 converges to the minimum Frobenius norm solution \( {{\mathscr {X}}}^* \) of Problem 1.1 in a finite number of steps.
Proof
The proof is similar to that of Theorem 3.3 and is omitted here. \(\square \)
3.3 Computational complexity analysis
In this subsection, we provide the computational complexity of Algorithms 3 and 4. We note that the computational cost of the multiplication operations of \( L({{\mathscr {A}}}) = {{\mathscr {A}}} \times _{3}M \) and \( {{\mathscr {A}}} {* _L} {{\mathscr {B}}} \) are \( O(mln^2) \) and \( O(mln^2 + lpn^2 + mlpn) \), respectively. Thus the cost of computing \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \) in Algorithm 1 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Similarly, the cost of computing \( {{\mathscr {V}}}_k \) and \( {{\mathscr {P}}}_k \) in Algorithm 2 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).
The computational cost of Algorithm 3 mainly involves tensor multiplication and addition. Apart from the cost of calculating \( {{\mathscr {U}}}_k \) and \( {{\mathscr {V}}}_k \), which is the same as in Algorithm 1, each iteration of Algorithm 3 costs O(lpn), which means that the computational complexity of Algorithm 3 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \). Analogously, the computational cost of Algorithm 4 is dominated by steps 4 and 5. Therefore, the computational cost of Algorithm 4 is \( O(mln^2 + lpn^2 + pqn^2 + \min \{ (l+q)mpn, (m+p)lqn \}) \).
4 Numerical experiments
In this section, we give numerical examples and simulation experiments on image restoration to illustrate the performance of Algorithms 3 and 4. All of the tests are implemented in MATLAB R2018b with machine precision \( 10^{-16} \) on a PC with an Intel(R) Core(TM) i5-1155G7 CPU (2.50 GHz) and 16.0 GB of memory. The implementations of the algorithms are based on functions from the MATLAB Tensor Toolbox developed by Bader and Kolda [1].
Example 4.1
Consider Problem 1.1 with \( m = 5 \), \( l = 4 \), \( n = 3 \), \( p = 4 \), and \( q = 5 \), where the tensors \( {{\mathscr {A}}} \) and \( {{\mathscr {B}}} \) are constructed by using the MATLAB command \( rand(I_1, I_2, I_3) \) as follows
The tensor \( {\mathscr {C}} \) of Problem 1.1 is generated by \({{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}}^* *_L {{\mathscr {B}}} \), where \( {{\mathscr {X}}}^* \) is the exact solution.
Given a random invertible linear transformation matrix
we solve Problem 1.1 by using Algorithms 3 and 4 with \( \varepsilon = 10^{-6} \). After 129 iterations, we get the solution
where
The residual errors are
The relative errors are
We also give the convergence curves of Algorithms 3 and 4 in Fig. 1.
Example 4.1 demonstrates the feasibility of Algorithms 3 and 4 in solving Problem 1.1. To further illustrate their effectiveness, we present two simulation experiments for color image restoration as follows.
Colored 3D models for the image restoration can be modeled as tensor expressions [16]
where \( {\mathscr {B}}\), \( {\mathscr {A}}\) and \( {{\mathscr {E}}} \in \mathbb {R}^{n \times n \times 3} \) are the observation image, the blurring operator and the noise, respectively, and \( {\mathscr {X}} \) is the restored image to be sought. Each frontal slice of \( {{\mathscr {A}}}\in \mathbb {R}^{n \times n \times 3} \) is a Toeplitz matrix obtained from a matrix \( {S} \in \mathbb {R}^{ {n} \times {n} }\), where S is generated by a two-dimensional Gaussian function
and \( {\mathscr {A}} \) is obtained from
Here \( \alpha \), \( \beta \) and \( \gamma \) are the entries of the circulant matrix
obtained from [20], which satisfy \( \alpha +\beta +\gamma = 1 \). This matrix gives rise to cyclic mixing between the different (RGB) channels. We set the noise level
The noise \( {{\mathscr {E}}} \) obeys a Gaussian distribution with zero mean and variance v.
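The banded Gaussian blur matrix described above can be generated as in the sketch below. The exact scaling of S in (4.1) is not reproduced here, so the usual \( 1/(\sigma \sqrt{2\pi }) \) normalization is assumed for illustration:

```python
import numpy as np

def gaussian_blur_matrix(n, sigma, r):
    # banded symmetric Toeplitz matrix: entries decay with |i - j|
    # and are truncated to zero beyond the half-bandwidth r
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    S = np.exp(-((i - j) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
    S[np.abs(i - j) > r] = 0.0
    return S

S = gaussian_blur_matrix(8, sigma=2.0, r=3)
```

Each row of S averages neighboring pixels with Gaussian weights, which is what makes the resulting least squares problem ill-conditioned.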
We use Algorithm 3, Algorithm 4, PGD algorithm [25], CG algorithm [33], NSPG algorithm [4] and GMRES algorithm [16] to solve the above model with \( *_c \) product and t-product. The stopping criterion for all algorithms is either
or the iteration step k reaching the upper limit 1000. All color images in the experiments are from the USC-SIPI image database. In the experiments, we use CPU time, relative error (RE) and peak signal-to-noise ratio (PSNR) as measurements of the restoration quality of the color image. The relative error (RE) and peak signal-to-noise ratio (PSNR) are defined as
where \( {\mathscr {X}} \) and \( {\mathscr {Q}} \) are the restored image and the real image, respectively. The higher the PSNR and the lower the RE, the better the restoration performance.
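These two measures can be computed as follows (a sketch assuming 8-bit images, i.e. peak value 255, and PSNR defined through the mean squared error, which are the usual conventions):

```python
import numpy as np

def relative_error(X, Q):
    # RE = ||X - Q||_F / ||Q||_F
    return float(np.linalg.norm((X - Q).ravel()) / np.linalg.norm(Q.ravel()))

def psnr(X, Q, peak=255.0):
    # PSNR = 10 log10( peak^2 / MSE ), MSE averaged over all entries and channels
    mse = float(np.mean((X - Q) ** 2))
    return 10.0 * np.log10(peak ** 2 / mse)

Q = np.full((4, 4, 3), 100.0)
X = Q + 1.0   # a restored image off by one gray level everywhere
```

For this toy example the MSE is exactly 1, so the PSNR equals \( 20\log _{10} 255 \approx 48.13 \) dB.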
Example 4.2
We consider Problem 1.1 with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 2 \), \( r = 3 \), \( \alpha = 0.5 \), \( \beta = 0.1 \) and \( \gamma = 0.4 \) for the within-channel and cross-channel blur, respectively. We take \( House \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Airplane \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images, and obtain a series of blurred images \( {{\mathscr {C}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} *_L {{\mathscr {B}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-4} \), where \( {\mathscr {B}} \) is a third-order tensor with \( {{\mathscr {B}}}^{(1)} = S^T \) and \( {{\mathscr {B}}}^{(2)} = {{\mathscr {B}}}^{(3)} = 0 \).
In the implementation of the algorithms for this example, we considered both the t-product and the \( *_c \) product. We present visual representations of the restored images obtained through the various algorithms, and magnify local details in these images to compare the effectiveness of the different restoration techniques on blurred images. We also provide numerical results for the CPU time, relative error (RE), and peak signal-to-noise ratio (PSNR) of each algorithm to compare their performance. All experimental results are listed in Figs. 2−3 and Tables 1−2.
Figures 2 and 3 show the visual effects of the restored images (House and Airplane). We can observe that the two images restored by Algorithms 3 and 4 have clearer texture details and features. This indicates that the results recovered by Algorithms 3 and 4 are of higher quality and more similar to the original images.
On the other hand, from Tables 1 and 2, we can see that our methods (Algorithms 3 and 4) obtain the highest PSNR values, the lowest RE values, and the least CPU time. So our methods outperform the PGD, CG, NSPG and GMRES methods in both PSNR and RE.
The image restoration model is a typical ill-posed problem. When we use an algorithm to solve this problem, we often encounter the semi-convergence phenomenon (see Note 1). In particular, when the residual of Problem 1.1 is less than \( \Vert {{\mathscr {E}}}\Vert _F \), the tensor least squares problem will treat some of the noise as part of the image to be restored. This overfitting leads to the semi-convergence phenomenon, which makes the restored image significantly different from the real image. The phenomenon is pronounced when solving the image restoration model with a high noise level (see Note 2). In this case, we add Tikhonov regularization to Problem 1.1 to overcome this difficulty, i.e.,
We set the linear operator \( {{\mathscr {M}}}: \mathbb {R}^{ {n} \times {n} \times {3}} \rightarrow \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be
and the tensor \( {{\mathscr {C}}} \in \mathbb {R}^{ {2n} \times {n} \times {3}} \) to be
Our proposed algorithms can also be used to solve Problem (4.3). In the upcoming experiments, we will address this problem by utilizing Algorithms 3 and 4. Additionally, we will compare our results not only with PGD, CG, NSPG, and GMRES algorithms but also with the recently proposed GG-tGKT [34, 35] and GG-LtAT [35] methods.
Example 4.3
This example studies the image restoration model with a higher noise level. We consider Problem (4.3) with \( {\mathscr {A}} \) obtained from (4.1) and (4.2) by setting \( \sigma = 3 \), \( r = 3 \), \( \alpha = 0.8 \), \( \beta = 0.1 \) and \( \gamma = 0.1 \) for the within-channel and cross-channel blur, respectively. We take \( Peppers \in \mathbb {R}^{512 \times 512 \times 3} \) and \( Sailboat \in \mathbb {R}^{512 \times 512 \times 3} \) as the true images, and obtain a series of blurred images \( {{\mathscr {B}}} = {{\mathscr {A}}} *_L {{\mathscr {X}}} +{{\mathscr {E}}} \) by adding noise \( {{\mathscr {E}}} \) with noise level \( v=10^{-2} \). The obtained images are displayed on the left of Fig. 4. The optimal regularization parameter \( \mu _{opt} = 5.1 \times 10^{-3} \) (\( \mu _{opt} = 2.9 \times 10^{-3} \)) under the \(*_c\) product (t-product) was obtained by using the generalized GCV method [15].
The visual results of two color images (Peppers and Sailboat) restored by Algorithms 3 and 4 are shown in Fig. 4. We can observe that the images restored by our algorithm are very close to the original images, with clear contour and texture features.
Further, the results of comparing Algorithms 3 and 4 with other algorithms are presented in Tables 3 and 4. These tables show that Algorithms 3 and 4 achieve the same PSNR and RE values as the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms. This suggests that all algorithms restore images to a comparable quality. However, our methods require the least CPU time among the algorithms compared. Therefore, Algorithms 3 and 4 converge faster than the PGD, CG, NSPG, GMRES, GG-tGKT and GG-LtAT algorithms.
5 Conclusion
In this paper, we consider a class of tensor least squares problems under the tensor-tensor product with an invertible linear transform, which arises in image restoration. Two Paige's algorithms are proposed to solve this problem, and the corresponding convergence theorems are derived. Numerical experiments show that the new algorithms are feasible and effective for Problem 1.1, and two simulated image restoration experiments demonstrate their good performance.
Data Availability
The data used to support the findings of this study are included within the article.
Notes
When an algorithm is applied to an ill-posed problem, it typically exhibits gradual convergence during the first k iterations; after the kth step, however, the error begins to increase and the convergence is lost. This behavior is called "semi-convergence".
Let \({{\mathscr {X}}}_k\) be the kth iterate of an algorithm. If the noise \({\mathscr {E}}\) satisfies \(\Vert {{\mathscr {E}}}\Vert _F \le \Vert {{\mathscr {A}}}{*_L}{{\mathscr {X}}}_k - {{{\mathscr {B}}}}\Vert _F\), it is called a low noise level. Conversely, if \({\mathscr {E}}\) satisfies \( \Vert {{\mathscr {E}}}\Vert _F > \Vert {{\mathscr {A}}}{*_L}{{\mathscr {X}}}_k - {{{\mathscr {B}}}}\Vert _F \), it is called a high noise level.
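The footnote's criterion compares the Frobenius norm of the noise with that of the kth residual. A minimal sketch, assuming tensors stored as NumPy arrays and using a hypothetical helper `classify_noise` (the residual \( {{\mathscr {A}}}{*_L}{{\mathscr {X}}}_k - {{\mathscr {B}}} \) is passed in precomputed):

```python
import numpy as np

def classify_noise(E, R):
    """Classify the noise level against the kth residual
    R = A *_L X_k - B: 'low' if ||E||_F <= ||R||_F, else 'high'."""
    return "low" if np.linalg.norm(E) <= np.linalg.norm(R) else "high"

R = np.ones((4, 4, 3))                 # toy residual tensor
print(classify_noise(0.5 * R, R))      # low
print(classify_noise(2.0 * R, R))      # high
```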
References
Bader, B.W., Kolda, T.G., et al.: Tensor Toolbox for MATLAB, Version 3.2.1, https://www.tensortoolbox.org
Behera, R., Mishra, D.: Further results on generalized inverses of tensors via the Einstein product. Linear Multilinear Algebra 65, 1662–1682 (2017)
Behera, R., Sahoo, J.K., Mohapatra, R.N., et al.: Computation of generalized inverses of tensors via t-product. Numer. Linear Algebra Appl. 29, 1–23 (2022)
Birgin, E.G., Martínez, J.M., Raydan, M.: Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim. 10, 1196–1211 (2000)
Bouhamidi, A., Jbilou, K., Raydan, M.: Convex constrained optimization for large-scale generalized Sylvester equations. Comput. Optim. Appl. 48, 233–253 (2011)
Brazell, M., Li, N., Navasca, C., et al.: Solving multilinear systems via tensor inversion. SIAM J. Matrix Anal. Appl. 34, 542–570 (2013)
Bykov, V.I., Kytmanov, A.M., Lazman, M.Z., et al.: Elimination Methods in Polynomial Computer Algebra. Kluwer Academic Publishers, The Netherlands (1998)
Cao, Z., Xie, P.: Perturbation analysis for \(t\)-product based tensor inverse, Moore-Penrose inverse and tensor system, arXiv preprint arXiv:2107.09544 (2021)
Chen, Z., Lu, L.Z.: A projection method and Kronecker product preconditioner for solving Sylvester tensor equations. Sci. China Math. 55, 1281–1292 (2012)
Cui, L.B., Chen, C., Li, W., et al.: An eigenvalue problem for even order tensors with its applications. Linear Multilinear Algebra 64, 602–621 (2016)
Dehdezi, E.K., Karimi, S.: A fast and efficient Newton-Shultz-type iterative method for computing inverse and Moore-Penrose inverse of tensors. J. Math. Model. 9, 645–664 (2021)
Ding, W.Y., Wei, Y.M.: Solving multi-linear systems with M-tensors. J. Sci. Comput. 68, 689–715 (2016)
Ely, G., Aeron, S., Hao, N., et al.: 5D and 4D pre-stack seismic data completion using tensor nuclear norm (TNN). In: SEG International Exposition and Eighty-Third Annual Meeting, Houston, TX (2013)
Golub, G., Kahan, W.: Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Ind. Appl. Math., Ser. B: Numer. Anal. 2, 205–224 (1965)
Golub, G.H., Heath, M., Wahba, G.: Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, 215–223 (1979)
Guide, M.E., Ichi, A.E., Jbilou, K.: Discrete cosine transform LSQR and GMRES methods for multidimensional ill-posed problems, arXiv preprint arXiv:2103.11847 (2021)
Guide, M.E., Ichi, A.E., Jbilou, K.: Discrete cosine transform LSQR methods for multidimensional ill-posed problems. J. Math. Model. 10, 21–37 (2022)
Guide, M.E., Ichi, A.E., Jbilou, K., et al.: Tensor Krylov subspace methods via the T-product for color image processing, arXiv preprint arXiv:2006.07133 (2020)
Guide, M.E., Ichi, A.E., Beik, F.P., et al.: Tensor GMRES and Golub-Kahan Bidiagonalization methods via the Einstein product with applications to image and video processing, arXiv preprint arXiv:2005.07458 (2020)
Hansen, P.C., Nagy, J.G., O’leary, D.P.: Deblurring Images: Matrices, Spectra, and Filtering. SIAM, Philadelphia (2006)
Hao, N., Kilmer, M.E., Braman, K., et al.: Facial recognition using tensor-tensor decompositions. SIAM J. Imaging Sci. 6, 437–463 (2013)
Huang, B.: Numerical study on Moore-Penrose inverse of tensors via Einstein product. Numer. Algorithms 87, 1767–1797 (2021)
Huang, B., Ma, C.: An iterative algorithm to solve the generalized Sylvester tensor equations. Linear Multilinear Algebra 68, 1175–1200 (2020)
Huang, B., Ma, C.: Global least squares methods based on tensor form to solve a class of generalized Sylvester tensor equations. Appl. Math. Comput. 369, 1–16 (2020)
Iusem, A.N.: On the convergence properties of the projected gradient method for convex optimization. Comput. Appl. Math. 22, 37–52 (2003)
Karimi, S., Dehghan, M.: Global least squares method based on tensor form to solve linear systems in Kronecker format. Trans. Inst. Meas. Control 40, 2378–2386 (2018)
Kernfeld, E., Kilmer, M., Aeron, S.: Tensor-tensor products with invertible linear transforms. Linear Algebra Appl. 485, 545–570 (2015)
Kilmer, M.E., Martin, C.D.: Factorization strategies for third-order tensors. Linear Algebra Appl. 435, 641–658 (2011)
Kilmer, M.E., Horesh, L., Avron, H., et al.: Tensor-tensor algebra for optimal representation and compression of multiway data. Proc. Natl. Acad. Sci. 118, e2015851118 (2021)
Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51, 455–500 (2009)
Liu, X., Wang, L., Wang, J., et al.: A three-dimensional point spread function for phase retrieval and deconvolution. Opt. Express 20, 15392–15405 (2012)
Paige, C.C.: Bidiagonalization of matrices and solution of linear equations. SIAM J. Numer. Anal. 11, 197–209 (1974)
Polyak, B.T.: The conjugate gradient method in extremal problems. USSR Comp. Math. Math. Phys. 9, 94–112 (1969)
Reichel, L., Ugwu, U.O.: The tensor Golub-Kahan-Tikhonov method applied to the solution of ill-posed problems with a \(t\)-product structure. Numer. Linear Algebra Appl. 29, e2412 (2022)
Reichel, L., Ugwu, U.O.: Tensor Krylov subspace methods with an invertible linear transform product applied to image processing. Appl. Numer. Math. 166, 186–207 (2021)
Savas, B., Eldén, L.: Krylov-type methods for tensor computations. Linear Algebra Appl. 438, 891–918 (2013)
Sun, L., Zheng, B., Bu, C., et al.: Moore-Penrose inverse of tensors via Einstein product. Linear Multilinear Algebra 64, 686–698 (2016)
Sun, L., Zheng, B., Wei, Y., et al.: Generalized inverses of tensors via a general product of tensors. Front. Math. China 13, 893–911 (2018)
Wang, Q.W., Xu, X., Duan, X.: Least squares solution of the quaternion Sylvester tensor equation. Linear Multilinear Algebra 69, 104–130 (2021)
Xie, Z.J., Jin, X.Q., Sin, V.K.: An optimal preconditioner for tensor equations involving Einstein product. Linear Multilinear Algebra 68, 886–902 (2020)
Zhang, Z., Ely, G., Aeron, S., et al.: Novel methods for multilinear data completion and de-noising based on tensor-SVD. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3842–3849 (2014)
Acknowledgements
The authors thank the editor and the reviewers for the constructive and helpful comments on the revision of this article.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest to disclose.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was supported by the National Natural Science Foundation of China (Grant No. 12361079; 12201149; 12261026; 12371023), the Natural Science Foundation of Guangxi Province (Grant No. 2023GXNSFAA026067), and the Innovation Project of GUET Graduate Education (Grant No. 2022YCXS147; 2022YCXS140).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Duan, X.F., Zhang, Y.S., Wang, Q.W., et al.: Paige's Algorithm for solving a class of tensor least squares problem. BIT Numer. Math. 63, 48 (2023). https://doi.org/10.1007/s10543-023-00990-y