1 Introduction

Low-rank matrix approximation is widely used in scientific computing and data science, including image processing [1, 2], high-dimensional data analysis [3, 4], face recognition [5], and structured matrix approximation [6, 7]. In theory, the rank-k truncated singular value decomposition (SVD) provides the optimal rank-k approximation in both the matrix spectral norm and the Frobenius norm [8, 9].

In recent years, randomized algorithms designed for high-performance computing have been developed for computing low-rank approximations of real or complex data matrices [3, 10,11,12,13]. For an input matrix A, the goal of a randomized algorithm is to find a low-rank approximation \(A\approx QB\), called a matrix sketch, which preserves the important features of A while filtering out unnecessary information. A randomized algorithm consists of two stages: in the first stage, the original high-dimensional data matrix is projected onto a low-dimensional matrix via random matrices; in the second stage, a deterministic algorithm is applied to the low-dimensional projected matrix. Compared with a deterministic algorithm (e.g., the SVD or the rank-revealing QR decomposition (RRQR)), a randomized algorithm has the advantages of low complexity, fast running speed, and easy implementation. Recently, many matrix-decomposition based randomized algorithms have been proposed for low-rank matrix approximation, such as the randomized SVD [3], randomized LU [14], randomized QR decomposition with column pivoting (QRCP) [15,16,17], and randomized QLP [18].

The concept of quaternions was introduced by Hamilton in 1843 [19]. Quaternions and quaternion matrices arise in various applications such as quantum mechanics [20] and image processing [21,22,23]. The quaternion toolbox for MATLAB (QTFM), developed by Sangwine and Le Bihan [24], supports computation with quaternion matrices (e.g., the quaternion QR factorization [25] and the quaternion SVD (QSVD) [26]). By exploiting the real counterpart of a quaternion matrix under different real transformations, various structure-preserving algorithms have been proposed for quaternion matrix decompositions such as the quaternion LU factorization [27, 28], quaternion QR decomposition [29, 30], QSVD [29, 31], and quaternion Cholesky decomposition [32].

The low-rank quaternion matrix approximation has drawn growing attention in many applications, including color image processing [33] and signal processing [34]. The truncated QSVD provides the optimal low-rank approximation of a quaternion matrix in the least-squares sense under the Frobenius norm [8]. However, it is expensive to compute the QSVD of a large-scale quaternion data matrix. In [35], Liu et al. presented a randomized QSVD algorithm for computing a low-rank approximation of a large-scale quaternion matrix, together with a matrix approximation error analysis. Compared with the QSVD, the randomized QSVD improves the computational efficiency significantly. To date, this is the only work on randomized algorithms for quaternion matrices.

Although a quaternion-matrix based method has been shown to be more effective [22] than popular tensor-based algorithms in color image processing, randomized algorithms for quaternion matrices have not received as much attention as their real counterparts. To enrich the applications and efficient numerical algorithms of quaternion matrices, in this paper we propose a randomized quaternion QLP decomposition algorithm for computing a low-rank approximation of a quaternion data matrix. This work is motivated by the papers of Liu et al. [35] and Stewart [36]. The QLP decomposition was introduced by Stewart [36] as an approximate SVD. It is essentially a two-step QRCP: for a real matrix A, one first obtains an upper triangular factor \(R_0\) from the QRCP \(AP_{0}=Q_{0}R_{0}\) of A and then a lower triangular matrix L from the QRCP \(R_{0}^{T}P_{1}=Q_{1}L^{T}\) of \(R_0^T\), which yields \(A=QLP^T\), where \(Q=Q_0P_1\) and \(P=P_0Q_1\), with \(Q_0\) and \(Q_1\) orthogonal and \(P_0\) and \(P_1\) permutation matrices. As noted in [36], the diagonal entries of L track the singular values of A with considerable fidelity. The QLP decomposition requires no more than twice the cost of a single QRCP. A minimal sketch of this two-step procedure on a real matrix is given below.
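The following is a minimal real-matrix sketch of the two-step QRCP described above, using SciPy's column-pivoted QR; it only illustrates how the diagonal of L tracks the singular values and is not the quaternion algorithm developed later (the function name `qlp` and the test matrix are our own assumptions).

```python
import numpy as np
from scipy.linalg import qr

def qlp(A):
    """Stewart's QLP decomposition of a real matrix A via two pivoted QRs:
    A P0 = Q0 R0 and R0^T P1 = Q1 L^T, so that A = Q L P^T."""
    Q0, R0, p0 = qr(A, mode='economic', pivoting=True)     # A[:, p0] = Q0 R0
    Q1, LT, p1 = qr(R0.T, mode='economic', pivoting=True)  # R0^T[:, p1] = Q1 L^T
    L = LT.T
    P0 = np.eye(A.shape[1])[:, p0]
    P1 = np.eye(R0.shape[0])[:, p1]
    return Q0 @ P1, L, P0 @ Q1

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 30)) @ np.diag(0.5 ** np.arange(30)) @ rng.standard_normal((30, 30))
Q, L, P = qlp(A)
print(np.allclose(Q @ L @ P.T, A))               # exact factorization
print(np.abs(np.diag(L))[:5])                    # diagonal of L ...
print(np.linalg.svd(A, compute_uv=False)[:5])    # ... tracks the leading singular values
```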

In this paper, we directly apply the QLP decomposition to a quaternion data matrix and obtain a quaternion QLP (QQLP) decomposition. A good low-rank approximation of a quaternion data matrix can then be obtained by truncating the resulting QQLP decomposition. However, the QQLP decomposition of a large quaternion data matrix can be very costly. To improve the efficiency, we develop a randomized QQLP algorithm for computing a low-rank approximation. Based on quaternion probability theory and careful technical treatment in numerical linear algebra, our main contributions are as follows.

  • The convergence properties of QQLP. In [37], Huckaby and Chan presented convergence results for the real QLP decomposition. With a new technical treatment, we establish convergence results for the QQLP decomposition. These theoretical results show that the singular values of the diagonal blocks of L approximate those of A quadratically in the gap ratio of neighboring singular values. When the quaternion data matrix degenerates to a real matrix, the corresponding results deliver slightly tighter error bounds than those in [37].

  • Error analysis of randomized QQLP. In [35], the authors only considered the matrix approximation error analysis for the randomized quaternion SVD. Our error analysis of randomized QQLP covers both the matrix approximation error and the singular value approximation error. The results show that the randomized QQLP algorithm computes low-rank matrix approximations and approximate singular values that are optimal up to constants depending on the dimension of the data matrix and the gap ratio of the singular values.

The rest of this paper is organized as follows. In Sect. 2 we review some preliminary results on quaternions and quaternion matrices. In Sect. 3 a randomized quaternion QLP decomposition algorithm is proposed for computing a low-rank approximation to a quaternion data matrix. In Sect. 4 the convergence of QQLP is analyzed, and both the matrix approximation error and the singular value approximation error of randomized QQLP are studied. In Sect. 5 we report some numerical experiments to demonstrate the effectiveness of the proposed algorithm. Finally, some concluding remarks are given in Sect. 6.

2 Preliminaries

In this section, we review some preliminary results on quaternions and quaternion matrices.

2.1 Notation

Let \({\mathbb {R}}\) and \({\mathbb {Q}}\) be the set of all real numbers and the set of all quaternions, respectively. Let \({\mathbb {R}}^{m\times n}\) and \({\mathbb {Q}}^{m\times n}\) be the sets of all \(m\times n\) real matrices and all \(m\times n\) quaternion matrices, respectively. Denote by \(\Vert \cdot \Vert _2\) and \(\Vert \cdot \Vert _F\) the matrix spectral norm and the Frobenius norm, respectively. The superscripts “\(\cdot ^{T}\)”, “\(\cdot ^*\)”, and “\(\cdot ^{\dag }\)” denote the transpose, the conjugate transpose, and the Moore-Penrose inverse of a matrix, respectively. Let \(\sigma _{\max }(A)=\sigma _{1}(A)\ge \sigma _{2}(A)\ge \cdots \ge \sigma _{\ell }(A)=\sigma _{\min }(A)\ge 0\) denote the singular values of an \(m\times n\) matrix A, where \(\ell =\min \{m,n\}\). Denote by \({\mathcal {R}}(A)\) the subspace spanned by the column vectors of a matrix A. Let \(I_n\) be the identity matrix of order n. The symbol “\(\otimes \)” denotes the Kronecker product. Finally, let \({\mathbb {P}}(\cdot )\) and \({\mathbb {E}}(\cdot )\) denote the probability of a random event and the expectation of a random variable, respectively.

2.2 Quaternion Matrix and its Properties

A quaternion \(a\in {\mathbb {Q}}\) is defined by

$$\begin{aligned} a=a_0+a_1{\mathbf {i}}+a_2{\mathbf {j}}+a_3{\mathbf {k}}, \end{aligned}$$

where \(a_0\), \(a_1\), \(a_2\), \(a_3\in {{\mathbb {R}}}\) and the quaternion units \(\mathbf {i}\), \(\mathbf {j}\) and \(\mathbf {k}\) satisfy the following properties:

$$\begin{aligned} {\mathbf {i}}^2={\mathbf {j}}^2={\mathbf {k}}^2=-1,\quad \mathbf {ij}=-\mathbf {ji}={\mathbf {k}} ,\quad \mathbf {jk}=-\mathbf {kj}={\mathbf {i}},\quad \mathbf {ki}=-\mathbf {ik}={\mathbf {j}}. \end{aligned}$$

The quaternion a is called a pure quaternion if \(a_0=0\). The conjugate of a is given by

$$\begin{aligned} {\bar{a}}=a_0-a_1{\mathbf {i}}-a_2{\mathbf {j}}-a_3{\mathbf {k}}. \end{aligned}$$

For two quaternions \(a=a_0+a_1\mathbf{i}+a_2\mathbf{j}+a_3\mathbf{k}\in {\mathbb {Q}}\) and \(b=b_0+b_1\mathbf{i}+b_2\mathbf{j}+b_3\mathbf{k}\in {\mathbb {Q}}\), their product (i.e., the Hamilton product) is given by

$$\begin{aligned} ab= & {} (a_0b_0-a_1b_1-a_2b_2-a_3b_3) +(a_0b_1+a_1b_0+a_2b_3-a_3b_2)\mathbf{i}\\&+(a_0b_2-a_1b_3+a_2b_0+a_3b_1)\mathbf{j}+(a_0b_3+a_1b_2-a_2b_1+a_3b_0)\mathbf{k}. \end{aligned}$$

The modulus of \(a=a_0+a_1\mathbf{i}+a_2\mathbf{j}+a_3\mathbf{k}\in {\mathbb {Q}}\) is defined by

$$\begin{aligned} \quad |a|=\sqrt{a_0^2+a_1^2+a_2^2+a_3^2}=\sqrt{a{\bar{a}}}=\sqrt{{\bar{a}}a}. \end{aligned}$$

It is worth noting that quaternion multiplication does not satisfy the commutative law but does satisfy the associative law. Therefore, \({\mathbb {Q}}\) is a four-dimensional associative division algebra over \({\mathbb {R}}\).
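As a quick numerical illustration of the formulas above, the sketch below stores a quaternion as its four real components and checks the Hamilton product, the identity \(|a|=\sqrt{a{\bar{a}}}\), and the non-commutativity of the product (the helper names `qmul` and `qconj` are ours, not QTFM functions).

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions a = (a0, a1, a2, a3) and b = (b0, b1, b2, b3)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return np.array([
        a0*b0 - a1*b1 - a2*b2 - a3*b3,
        a0*b1 + a1*b0 + a2*b3 - a3*b2,
        a0*b2 - a1*b3 + a2*b0 + a3*b1,
        a0*b3 + a1*b2 - a2*b1 + a3*b0,
    ])

def qconj(a):
    """Conjugate of a quaternion given by its four real components."""
    return np.array([a[0], -a[1], -a[2], -a[3]])

a = np.array([1.0, 2.0, -1.0, 0.5])
b = np.array([0.3, -1.0, 2.0, 1.5])
print(np.sqrt(np.sum(a**2)), np.sqrt(qmul(a, qconj(a))[0]))  # |a| = sqrt(a * conj(a))
print(qmul(a, b), qmul(b, a))                                 # ab != ba in general
```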

Let \(A=A_0+A_1{\mathbf {i}}+A_2{\mathbf {j}}+A_3{\mathbf {k}}\in {{\mathbb {Q}}}^{m\times n}\) be a quaternion matrix, where \(A_0\), \(A_1\), \(A_2\), \(A_3\in {\mathbb {R}}^{m\times n}\). Then the conjugate transpose of A is given by \(A^*=A_0^T-A_1^T{\mathbf {i}}-A_2^T{\mathbf {j}}-A_3^T{\mathbf {k}}\). The product of two quaternion matrices \(A=A_0+A_1{\mathbf {i}}+A_2{\mathbf {j}}+A_3{\mathbf {k}}\in {{\mathbb {Q}}}^{m\times s}\) and \(B=B_0+B_1{\mathbf {i}}+B_2{\mathbf {j}}+B_3{\mathbf {k}}\in {{\mathbb {Q}}}^{s\times n}\) is given by

$$\begin{aligned} AB= & {} (A_0B_0-A_1B_1-A_2B_2-A_3B_3) +(A_0B_1+A_1B_0+A_2B_3-A_3B_2)\mathbf{i}\\&+(A_0B_2-A_1B_3+A_2B_0+A_3B_1)\mathbf{j}+(A_0B_3+A_1B_2-A_2B_1+A_3B_0)\mathbf{k}\in {{\mathbb {Q}}}^{m\times n}. \end{aligned}$$

We recall the inner product and norms of quaternion vectors and matrices [38].

Definition 2.1

The inner product of two quaternion vectors \(\mathbf{x}=[\mathbf{x}_1,\ldots ,\mathbf{x}_n]^T\in {{\mathbb {Q}}}^n\) and \(\mathbf{y}=[\mathbf{y}_1,\ldots ,\mathbf{y}_n]^T\in {{\mathbb {Q}}}^n\) is defined by

$$\begin{aligned} \langle \mathbf{x},\mathbf{y}\rangle =\sum _{i=1}^{n}{\bar{\mathbf{y}}}_i\mathbf{x}_i \end{aligned}$$

with the induced 2-norm

$$\begin{aligned} \Vert \mathbf{x}\Vert _2=\Big (\sum _{i=1}^{n}|\mathbf{x}_i|^2\Big )^{1/2},\quad \forall \mathbf{x}=[\mathbf{x}_1,\ldots ,\mathbf{x}_n]^T\in {{\mathbb {Q}}}^n. \end{aligned}$$

The inner product of two quaternion matrices \(A=[\mathbf{a}_{ij}]\in {{\mathbb {Q}}}^{m\times n}\) and \(B=[\mathbf{b}_{ij}]\in {{\mathbb {Q}}}^{m\times n}\) is defined by

$$\begin{aligned} \langle A,B\rangle =\sum _{i=1}^{m}\sum _{j=1}^{n}{\bar{\mathbf{b}}}_{ij}\mathbf{a}_{ij}; \end{aligned}$$

with the induced Frobenius norm

$$\begin{aligned} \Vert A\Vert _F=\Big (\sum _{i=1}^{m}\sum _{j=1}^{n}|\mathbf{a}_{ij}|^2\Big )^{1/2},\quad \forall A=[\mathbf{a}_{ij}]\in {{\mathbb {Q}}}^{m\times n}. \end{aligned}$$

In addition, the spectral norm of the quaternion matrix \(A=[\mathbf{a}_{ij}]\in {{\mathbb {Q}}}^{m\times n}\) is defined by

$$\begin{aligned} \Vert A\Vert _2=\max _{\mathbf{x}\in {{\mathbb {Q}}}^n, \Vert \mathbf{x}\Vert _2=1}\Vert A\mathbf{x}\Vert _2. \end{aligned}$$

From Definition 2.1, it is easy to check the following unitary invariance

$$\begin{aligned} \Vert Z_1\mathbf{x}\Vert _2=\Vert \mathbf{x}\Vert _2,\quad \Vert Z_1AZ_2\Vert _2=\Vert A\Vert _2,\quad \Vert Z_1AZ_2\Vert _F=\Vert A\Vert _F \end{aligned}$$

and the following inequalities

$$\begin{aligned} \Vert AB\Vert _F\le \Vert A\Vert _2\Vert B\Vert _F,\quad \Vert AB\Vert _F\le \Vert A\Vert _F\Vert B\Vert _2, \end{aligned}$$

for all consistent quaternion matrices A and B and unitary quaternion matrices \(Z_1\) and \(Z_2\).

Next, we recall some properties of the real counterpart of quaternion matrices [30, 39]. We recall that a real counterpart of \(A=A_0+A_1{\mathbf {i}}+A_2{\mathbf {j}}+A_3{\mathbf {k}}\in {{\mathbb {Q}}}^{m\times n}\) is given by [40]:

$$\begin{aligned} \Upsilon _A=\left[ \begin{array}{cccc} A_0 &{} -A_1 &{} -A_2 &{} -A_3 \\ A_1 &{} A_0 &{} -A_3 &{} A_2 \\ A_2 &{} A_3 &{} A_0 &{} -A_1 \\ A_3 &{} -A_2 &{} A_1 &{} A_0 \\ \end{array} \right] \in {\mathbb {R}}^{4m\times 4n}. \end{aligned}$$
(2.1)

The matrix \(\Upsilon _A\) has a special algebraic symmetry structure that is preserved under the operations listed below.

Lemma 2.2

([40]). Let A, \(B\in {\mathbb {Q}}^{m\times n}\), \(C\in {\mathbb {Q}}^{n\times s}\), and \(\tau \in {\mathbb {R}}\). Then

  (i) \(\Upsilon _{A+B}=\Upsilon _A+\Upsilon _B\), \(\Upsilon _{\tau A}=\tau \Upsilon _A\), \(\Upsilon _{AC}=\Upsilon _A\Upsilon _C\).

  (ii) \(\Upsilon _{A^*}=\Upsilon _A^T\).

  (iii) \(A\in {\mathbb {Q}}^{m\times m}\) is a unitary matrix if and only if \(\Upsilon _A\) is an orthogonal matrix.

  (iv) \(\Vert A\Vert _F=\frac{1}{2}\Vert \Upsilon _A\Vert _F\) and \(\Vert A\Vert _2=\Vert \Upsilon _A\Vert _2\).
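The properties in Lemma 2.2 are easy to verify numerically. A minimal sketch, storing a quaternion matrix as a tuple of its four real components (our own convention here, not the paper's notation), is given below; it checks \(\Upsilon _{AC}=\Upsilon _A\Upsilon _C\), \(\Upsilon _{A^*}=\Upsilon _A^T\), and \(\Vert A\Vert _F=\frac{1}{2}\Vert \Upsilon _A\Vert _F\).

```python
import numpy as np

def real_counterpart(A):
    """Upsilon_A in (2.1) for A = A0 + A1 i + A2 j + A3 k, stored as (A0, A1, A2, A3)."""
    A0, A1, A2, A3 = A
    return np.block([[A0, -A1, -A2, -A3],
                     [A1,  A0, -A3,  A2],
                     [A2,  A3,  A0, -A1],
                     [A3, -A2,  A1,  A0]])

def qmatmul(A, C):
    """Quaternion matrix product, written component-wise (the formula of Sect. 2.2)."""
    A0, A1, A2, A3 = A
    C0, C1, C2, C3 = C
    return (A0@C0 - A1@C1 - A2@C2 - A3@C3,
            A0@C1 + A1@C0 + A2@C3 - A3@C2,
            A0@C2 - A1@C3 + A2@C0 + A3@C1,
            A0@C3 + A1@C2 - A2@C1 + A3@C0)

def qconjT(A):
    """Conjugate transpose of a component-wise quaternion matrix."""
    A0, A1, A2, A3 = A
    return (A0.T, -A1.T, -A2.T, -A3.T)

rng = np.random.default_rng(1)
A = tuple(rng.standard_normal((4, 3)) for _ in range(4))
C = tuple(rng.standard_normal((3, 5)) for _ in range(4))
print(np.allclose(real_counterpart(qmatmul(A, C)),
                  real_counterpart(A) @ real_counterpart(C)))           # item (i)
print(np.allclose(real_counterpart(qconjT(A)), real_counterpart(A).T))  # item (ii)
frobA = np.sqrt(sum(np.sum(Ai**2) for Ai in A))
print(np.isclose(frobA, 0.5 * np.linalg.norm(real_counterpart(A), 'fro')))  # item (iv)
```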

The following result on the QSVD follows from [41, Theorem 7.2].

Lemma 2.3

([41]). Let \(A\in {\mathbb {Q}}^{m\times n}\). Then there exist two unitary quaternion matrices \(U_m\in {\mathbb {Q}}^{m\times m}\) and \(V_n\in {\mathbb {Q}}^{n\times n}\) such that

$$\begin{aligned} A=U_m\Sigma V_n^*,\quad \Sigma ={\mathrm{diag}}(\sigma _1(A),\ldots ,\sigma _\ell (A))\in {{\mathbb {R}}}^{m\times n},\quad \ell =\min \{m,n\}, \end{aligned}$$
(2.2)

where \(\sigma _{1}(A)\ge \sigma _{2}(A)\ge \cdots \ge \sigma _\ell (A)\ge 0\).

It is easy to see that, for any quaternion matrix A, we have \(\Vert A\Vert _2=\sigma _{\max }(A)\). We must point out that there exist some fast structure-preserving algorithms for computing the QSVD [29, 31].

3 Randomized Quaternion QLP Decomposition

In this section, we first present the quaternion QLP decomposition. Then we propose a randomized quaternion QLP decomposition algorithm for computing a low-rank approximation of a large-scale quaternion data matrix.

3.1 Quaternion QLP Decomposition

As described in Sect. 1, the QLP decomposition of a real \(m\times n\) \((m\ge n)\) matrix A applies two consecutive QRCPs to A [36]: one first computes the QRCP of A and then applies the same procedure to the transpose of the resulting R-factor. The convergence results in [37] show that the singular values of the diagonal blocks of L in the QLP decomposition track the singular values of A under certain circumstances, with a relative error that is quadratic in the gap ratio of the singular values.

We now present the quaternion QLP decomposition. Let \(A\in {\mathbb {Q}}^{m\times n}\) with \(m\ge n\). Then, by using the following two consecutive quaternion QRCPs (qrcpQ)

$$\begin{aligned} AP_0=Q_0R_0\quad \text{ and }\quad R_0^{*}P_1=Q_1L^*, \end{aligned}$$

we obtain the following quaternion QLP decomposition of A:

$$\begin{aligned} A=QLP^{*},\quad Q=Q_{0}P_{1}, \quad P=P_{0}Q_{1}, \end{aligned}$$

where \(P_{0}\in {{\mathbb {Q}}}^{n\times n}\) and \(P_{1}\in {{\mathbb {Q}}}^{m\times m}\) are two permutation matrices, \(Q_{0}\in {{\mathbb {Q}}}^{m\times m}\) and \(Q_{1}\in {{\mathbb {Q}}}^{n\times n}\) are two unitary matrices, \(R_0\in {{\mathbb {Q}}}^{m\times n}\) is an upper triangular matrix, and \(L\in {{\mathbb {Q}}}^{m\times n}\) is a lower triangular matrix.

Based on the above descriptions, the quaternion QLP decomposition of a quaternion matrix can be described as Algorithm 3.1.

[Algorithm 3.1: QQLP decomposition of a quaternion matrix]

One may implement Algorithm 3.1 with the toolbox QTFM, in which the required quaternion operations are available. Alternatively, Algorithm 3.1 can be carried out via a real structure-preserving quaternion QR algorithm using only real operations (e.g., [29, 30]).

3.2 Randomized Quaternion QLP Decomposition

In this subsection, we propose a randomized quaternion QLP decomposition algorithm for computing a low-rank approximation of a large-scale quaternion data matrix.

We first recall the randomized QLP decomposition of a real matrix in [18]. Let \(A\in {{\mathbb {R}}}^{m\times n}\) be a given data matrix. The randomized QLP decomposition of A consists of two stages. The first stage constructs an approximate basis for the range of A or \(A^T\): the basis matrix V has only a few orthonormal columns and satisfies \(A\approx VV^TA\) or \(A^T\approx VV^TA^T\). The second stage computes an approximate rank-k QLP decomposition of A by using V. Specifically, we first form the product \(Y=A\Omega \) or \(Y=\Omega A\) by random sampling of A with a random test matrix \(\Omega \), and then construct an orthonormal basis V for the range of Y or \(Y^T\), e.g., via the QR factorization \(Y=VR\) or \(Y^T=VR\). Afterwards we form the matrix \(B=V^TA\) or \(B=AV\) and compute the QLP decomposition of B: \(B={\widehat{Q}}{\widehat{L}}{\widehat{P}}^T\). The final approximate randomized QLP decomposition of A is given by \(A\approx (V{\widehat{Q}}){\widehat{L}}{\widehat{P}}^T\) or \(A\approx {\widehat{Q}}{\widehat{L}}(V{\widehat{P}})^T\).

As in [35], we use the quaternion random test matrix defined by

$$\begin{aligned} \Omega =\Omega _0+\Omega _1{\mathbf {i}}+\Omega _2{\mathbf {j}}+\Omega _3{\mathbf {k}}\in {{\mathbb {Q}}}^{m\times n}, \end{aligned}$$
(3.1)

where \(\Omega _0,\Omega _1,\Omega _2,\Omega _3\in {{\mathbb {R}}}^{m\times n}\) are all real independent standard Gaussian random matrices. Then we can develop a randomized QLP decomposition algorithm for computing a low-rank approximation of a quaternion data matrix. This can be described as Algorithm 3.2 in MATLAB notation, where the power iterations \(Y=\Omega A(A^*A)^q\) in Steps 3–6 help produce a good low-rank basis of \({{{\mathcal {R}}}}(A^*\Omega ^*)\). In practice, \(q=1\) or \(q=2\) is usually sufficient.

[Algorithm 3.2: randomized QQLP decomposition of a quaternion data matrix]
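As a rough guide to the structure of Algorithm 3.2, the following is a minimal real-matrix sketch of the same pipeline (row sampling \(Y=\Omega A\), q power iterations, an orthonormal basis V of \({\mathcal {R}}(A^*\Omega ^*)\), and a two-step QR of \(B=AV\), here with pivoting only in the first step as assumed in the analysis of Sect. 4.1). The actual algorithm replaces each step by its quaternion, structure-preserving counterpart, and a practical implementation may re-orthonormalize between power iterations for numerical stability; the test matrix and function name below are our own assumptions.

```python
import numpy as np
from scipy.linalg import qr

def randomized_qlp(A, k, p=5, q=1, rng=None):
    """Real-matrix sketch of the randomized QLP pipeline (row-sampling variant)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    l = k + p
    Omega = rng.standard_normal((l, m))                   # random test matrix
    Y = Omega @ A
    for _ in range(q):                                    # power iterations: Y = Omega A (A^T A)^q
        Y = (Y @ A.T) @ A
    V, _ = np.linalg.qr(Y.T)                              # n x l orthonormal basis of range(A^T Omega^T)
    B = A @ V                                             # m x l compressed matrix
    Q0, R0, p0 = qr(B, mode='economic', pivoting=True)    # first QR with column pivoting
    Q1, LT = np.linalg.qr(R0.T)                           # second QR without pivoting
    L = LT.T
    P = V @ np.eye(l)[:, p0] @ Q1                         # so that A ~ Q0 L P^T
    return Q0, L, P

rng = np.random.default_rng(1)
A = rng.standard_normal((500, 200)) * (0.8 ** np.arange(200))   # columns with decaying scales
k = 20
Q, L, P = randomized_qlp(A, k)
Ef = np.linalg.norm(A - Q[:, :k] @ L[:k, :k] @ P[:, :k].T, 'fro') / np.linalg.norm(A, 'fro')
s = np.linalg.svd(A, compute_uv=False)
print(Ef, np.sqrt(np.sum(s[k:] ** 2)) / np.linalg.norm(A, 'fro'))  # vs. optimal rank-k error
```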

Remark 3.1

In Algorithm 3.2, by using row sampling, we get an approximate basis \(V\in {{\mathbb {Q}}}^{n\times l}\) for the range of \(A^*\), and thus \(B=AV\) is an \(m\times l\) matrix. Then, in Steps 9–10, we can compute an economy-size QQLP decomposition of the \(m\times l\) matrix \(B=AV\) since \(l<m\). However, if we use column sampling to get an approximate basis \(V\in {{\mathbb {Q}}}^{m\times l}\) for the range of A, then \(B=V^*A\) is an \(l\times n\) matrix. In this case, the QQLP decomposition of the \(l\times n\) matrix \(B=V^*A\) is more expensive, since more pivoting selections and updates of the trailing column vectors are needed in the quaternion QRCP.

Remark 3.2

In Step 7 of Algorithm 3.2, we need to compute the quaternion QR decomposition of \(Y^*\), where the Q-factor \(V\in {\mathbb {Q}}^{n\times l}\) should have orthonormal columns. This can be done by applying the quaternion modified Gram-Schmidt (QMGS) algorithm [9, Algorithm 5.2.6] or the quaternion Householder QR decomposition (QHQR) [29] to the quaternion matrix \(Y^*\). In the QHQR method, we need to form V explicitly. As in [9, p. 238], by using the Householder vectors \(\mathbf{v}^{(1)},\ldots ,\mathbf{v}^{(l)}\), we can form the matrix V in MATLAB notation as follows:

$$\begin{aligned} \begin{aligned}&V=I_n(:,1:l)\\&{\textbf {for}}\quad j=l:-1:1\\&\quad \beta _{j}=2/\Vert \mathbf{v}^{(j)}\Vert _2^2\\&\quad V(j:n,j:l)=V(j:n,j:l)-(\beta _j\mathbf{v}^{(j)})((\mathbf{v}^{(j)})^*V(j:n,j:l))\\&{\textbf {end}} \end{aligned} \end{aligned}$$

The QHQR method can also be used to form \(Q_0\) in Step 9 of Algorithm 3.2.
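For the real case, the backward accumulation above can be checked with the short sketch below; `householder_vectors` is our own helper that merely produces the vectors \(\mathbf{v}^{(j)}\), and a structure-preserving quaternion version would carry four real components per vector.

```python
import numpy as np

def householder_vectors(Y):
    """Householder QR of a real Y (m x l, m >= l): returns the l x l R-factor
    and the Householder vectors v^(1), ..., v^(l) (v^(j) acts on rows j:m)."""
    R = np.array(Y, dtype=float)
    m, l = R.shape
    vs = []
    for j in range(l):
        v = R[j:, j].copy()
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        beta = 2.0 / (v @ v)
        R[j:, j:] -= beta * np.outer(v, v @ R[j:, j:])
        vs.append(v)
    return R[:l, :], vs

def accumulate_Q(vs, m):
    """Backward accumulation of the thin Q-factor, mirroring the loop in Remark 3.2."""
    l = len(vs)
    Q = np.eye(m)[:, :l].copy()
    for j in range(l - 1, -1, -1):
        v = vs[j]
        beta = 2.0 / (v @ v)
        Q[j:, j:] -= beta * np.outer(v, v @ Q[j:, j:])
    return Q

Y = np.random.default_rng(2).standard_normal((8, 3))
R, vs = householder_vectors(Y)
Q = accumulate_Q(vs, 8)
print(np.allclose(Q @ R, Y), np.allclose(Q.T @ Q, np.eye(3)))
```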

Remark 3.3

We must point out that, in Algorithm 3.2, the data matrix A is accessed twice when the power iteration number \(q=0\). When the data is stored outside the core memory, however, the cost of accessing it can be very high. If the data arrives as a stream, A can be accessed only once: in Step 8 of Algorithm 3.2, we can replace \(B=AV\) by \(B={\widetilde{Y}}(V^*{\widetilde{\Omega }})^{\dag }\), where \({\widetilde{Y}}=A{\widetilde{\Omega }}\) with \({\widetilde{\Omega }}\in {\mathbb {Q}}^{n\times {\tilde{l}} }(l \le {\tilde{l}}\le \min \{m,n\})\) being a quaternion random test matrix. This is a single-pass version [3].
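A real-matrix sketch of this single-pass replacement is given below: it forms both sketches \(Y=\Omega A\) and \({\widetilde{Y}}=A{\widetilde{\Omega }}\) in a single pass over A and then compares the single-pass \(B\approx {\widetilde{Y}}(V^*{\widetilde{\Omega }})^{\dag }\) with the two-pass \(B=AV\); all matrix names and sizes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, l, lt = 300, 200, 25, 35                    # lt plays the role of l~ in Remark 3.3
A = rng.standard_normal((m, n)) * (0.7 ** np.arange(n))

Omega = rng.standard_normal((l, m))               # test matrix for Y = Omega A
Omega_t = rng.standard_normal((n, lt))            # test matrix Omega~ for Y~ = A Omega~
Y, Yt = Omega @ A, A @ Omega_t                    # both sketches formed in one pass over A

V, _ = np.linalg.qr(Y.T)                          # n x l orthonormal basis; A is not touched again
B_single = Yt @ np.linalg.pinv(V.T @ Omega_t)     # single-pass estimate of B = A V
B_double = A @ V                                  # two-pass reference (needs a second pass over A)
print(np.linalg.norm(B_single - B_double, 'fro') / np.linalg.norm(B_double, 'fro'))
```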

Remark 3.4

In general, it is difficult to prescribe a suitable target rank for Algorithm 3.2. However, we can determine the target rank k, an \(m\times k\) orthonormal quaternion matrix \({\widetilde{V}}_k\), and a \(k\times n\) quaternion matrix \(B_k={\widetilde{V}}_k^*A\) such that \(\Vert A-{\widetilde{V}}_kB_k\Vert \le \varepsilon \) for a prescribed accuracy \(\varepsilon >0\). This can be done adaptively in a greedy fashion (see [42, Algorithm 12.2.1]); one such strategy is sketched below.
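The sketch below is one simple blocked greedy variant of such an adaptive strategy for a real matrix (it is not necessarily identical to [42, Algorithm 12.2.1]); the basis V is grown block by block until the residual \(\Vert A-VV^TA\Vert _F\) drops below the prescribed tolerance.

```python
import numpy as np

def adaptive_range(A, eps, block=10, max_rank=None, rng=None):
    """Grow an orthonormal V (for the range of A) until ||A - V V^T A||_F <= eps."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    max_rank = max_rank or min(m, n)
    V = np.zeros((m, 0))
    E = A.copy()                                   # current residual A - V V^T A
    while np.linalg.norm(E, 'fro') > eps and V.shape[1] < max_rank:
        Y = E @ rng.standard_normal((n, block))    # sample the residual
        Qb, _ = np.linalg.qr(Y - V @ (V.T @ Y))    # orthogonalize against the current basis
        V = np.hstack([V, Qb])
        E -= Qb @ (Qb.T @ E)                       # deflate the captured directions
    return V

rng = np.random.default_rng(4)
A = rng.standard_normal((400, 300)) * (0.6 ** np.arange(300))
V = adaptive_range(A, eps=1e-6 * np.linalg.norm(A, 'fro'))
print(V.shape[1], np.linalg.norm(A - V @ (V.T @ A), 'fro') / np.linalg.norm(A, 'fro'))
```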

Algorithm 3.2 can be implemented with the toolbox QTFM, but the computational efficiency of this implementation is relatively low. To improve the efficiency, one may carry out Algorithm 3.2 using real structure-preserving operations. For example, in Steps 7 and 9–10, Algorithm 3.2 can be made more efficient by using the fast real structure-preserving quaternion QR algorithm [30] with or without column pivoting.

4 Theoretical Analysis

In this section, we provide a theoretical analysis of Algorithms 3.1 and 3.2, including the convergence analysis of QQLP as well as the matrix approximation and singular value approximation error analyses of the randomized QQLP (RQQLP) algorithm.

4.1 Improved Convergence Results of QQLP Decomposition

In [37], Huckaby and Chan gave the convergence analysis of the QLP decomposition for a real matrix. In this section, we extend the relevant results to quaternion matrices and give tighter error bounds.

We start with an eigenvalue inequality for sums of Hermitian quaternion matrices. It is worth noting that, due to the non-commutativity of quaternion multiplication, a square quaternion matrix in general has left and right eigenvalues with different values and properties [41]. However, for a quaternion Hermitian matrix the two kinds of eigenvalues coincide and are real [40], and any right eigenvalue of a quaternion Hermitian matrix A is also an eigenvalue of the symmetric matrix \(\Upsilon _{A}\) with multiplicity four. With this, we obtain the following eigenvalue inequality for quaternion matrices.

Lemma 4.1

If \(A,B\in {\mathbb {Q}}^{n\times n}\) are quaternion Hermitian matrices, then

$$\begin{aligned} \lambda _i(A)+\lambda _n(B)\le \lambda _i(A+B)\le \lambda _i(A)+\lambda _1(B), \end{aligned}$$

for \(i=1,\ldots ,n\), where \(\lambda _i(A)\) denotes the i-th largest right eigenvalue of A.

Proof

According to [9, Theorem 8.1.5], the inequality

$$\begin{aligned} \lambda _i({\tilde{A}})+\lambda _n({\tilde{B}})\le \lambda _i({\tilde{A}}+{\tilde{B}})\le \lambda _i({\tilde{A}})+\lambda _1({\tilde{B}}),\quad i=1,\ldots ,n \end{aligned}$$

holds for any real symmetric matrices \({\tilde{A}},{\tilde{B}}\in {\mathbb {R}}^{n\times n}\). Since A and B are Hermitian, \(\Upsilon _A\) and \(\Upsilon _B\) are real symmetric by Lemma 2.2, and every right eigenvalue of A, B, and \(A+B\) is an eigenvalue of \(\Upsilon _A\), \(\Upsilon _B\), and \(\Upsilon _{A+B}\), respectively. Applying the above result to \(\Upsilon _A\) and \(\Upsilon _B\) leads to Lemma 4.1. \(\square \)

On the singular value inequalities for the quaternion matrix sum and product, we have the following results.

Lemma 4.2

Let \(A,B\in {\mathbb {Q}}^{m\times s}\) and \(C\in {{\mathbb {Q}}}^{s\times n}\). Then we have

$$\begin{aligned} |\sigma _{i}(A+B)-\sigma _{i}(A)|\le \sigma _{\max }(B), \end{aligned}$$

for \(i=1,\ldots ,\min \{m,s\}\) and

$$\begin{aligned} \sigma _{\min }(A)\sigma _i(C)\le \sigma _i(AC)\le \sigma _{\max }(A)\sigma _i(C), \end{aligned}$$

for \(i=1,\ldots ,\min \{m,s,n\}\).

Proof

It is well known that

$$\begin{aligned} |\sigma _{i}({\tilde{A}}+{\tilde{B}})-\sigma _{i}({\tilde{A}})|\le \sigma _{\max }({\tilde{B}}), \quad i=1,\ldots ,\min \{m,s\} \end{aligned}$$

for \({\tilde{A}},{\tilde{B}}\in {\mathbb {R}}^{m\times s}\) [43, Theorem 3.3.16] and

$$\begin{aligned} \sigma _{\min }({\tilde{A}})\sigma _i({\tilde{C}})\le \sigma _i({\tilde{A}}{\tilde{C}})\le \sigma _{\max }({\tilde{A}})\sigma _i({\tilde{C}}), \quad i=1,\ldots ,\min \{m,s,n\} \end{aligned}$$

for \({\tilde{A}}\in {\mathbb {R}}^{m\times s}\) and \({\tilde{C}}\in {\mathbb {R}}^{s\times n}\) [44, Proposition 9.6.4]. Notice that each singular value of A is also a singular value of \(\Upsilon _A\) with multiplicity four. Thus Lemma 4.2 can be established by applying the above relations to \(\Upsilon _A\), \(\Upsilon _B\), and \(\Upsilon _C\). \(\square \)

Using Lemma 4.2, we obtain the following result.

Lemma 4.3

Let \(A\in {\mathbb {Q}}^{m\times n}\), and let \({{{\mathcal {U}}}}_1\in {\mathbb {Q}}^{m\times s}\) and \({{{\mathcal {V}}}}_1\in {\mathbb {Q}}^{n\times t}\) have orthonormal columns (\(s\le m, t\le n\)). Then \(\sigma _{j}(A)\ge \sigma _{j}({{{\mathcal {U}}}}_1^{*}A{{{\mathcal {V}}}}_1)\) for \(j=1,\ldots , \min \{s,t\}\).

We now analyze the interior singular value approximation errors for the QLP decomposition of a quaternion matrix. As noted in [36, 37], for the QLP decomposition, the pivoting in the first QRCP is indispensable, while the pivoting in the second QRCP merely avoids certain contrived counterexamples. For simplicity, we assume that there is no pivoting in the second QRCP of the QQLP decomposition of the matrix B generated by Algorithm 3.2.

Theorem 4.4

Let \(A\in {\mathbb {Q}}^{m\times n}\) and \(\sigma _{k}(A)>\sigma _{k+1}(A)\) for some \(k<\ell =\min \{m,n\}\). Let \(R_0\) be the R-factor in the QRCP of A, \(AP_{0}=Q_{0}R_{0}\), where \(Q_0\in {{\mathbb {Q}}}^{m\times m}\) and \(R_0\in {{\mathbb {Q}}}^{m\times n}\), and let \(L^*\) be the R-factor in the QR of \(R_0^*\), \(R_{0}^{*}=Q_{1}L^{*}\), where \(Q_1\in {{\mathbb {Q}}}^{n\times n}\) and \(L\in {{\mathbb {Q}}}^{m\times n}\). Partition \(R_{0}\) and L as

$$\begin{aligned} R_{0}=\left[ \begin{array}{cc} R_{11} &{} R_{12} \\ 0 &{} R_{22} \\ \end{array} \right] \quad \text{ and }\quad L=\left[ \begin{array}{cc} L_{11} &{} 0 \\ L_{21} &{} L_{22} \\ \end{array} \right] , \end{aligned}$$

where \(R_{11}\), \(L_{11}\in {\mathbb {Q}}^{k\times k}\). If \(\Vert R_{22}\Vert _{2}\le \sqrt{(k+1)(n-k)}\cdot \sigma _{k+1}(A)\), \(\sigma _{k}(R_{11})\ge \frac{\sigma _{k}(A)}{\sqrt{k(n-k+1)}}\), and \(\rho =\frac{\Vert R_{22}\Vert _{2}}{\sigma _{k}(R_{11})}<1\), then for \(j=1,\ldots ,\ell -k\),

$$\begin{aligned} \frac{\sigma _{j}(L_{22})-\sigma _{k+j}(A)}{\sigma _{k+j}(A)}\le \left( \frac{\sigma _{k+1}(A)}{\sigma _{k}(A)}\right) ^{2} {\mathcal {O}}\Big (\frac{n^{5/2}\Vert R_{11}^{-1}R_{12}\Vert _2^2}{1-\rho _1^2}\Big ) \end{aligned}$$
(4.1)

and for \(j=1,\ldots ,k\),

$$\begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)}\le \left( \frac{\sigma _{k+1}(A)}{\sigma _{k}(A)} \right) ^{2}{\mathcal {O}}\Big (\frac{n^{5/2}\Vert R_{11}^{-1}R_{12}\Vert _2^2}{1-\rho _1^2}\Big ), \end{aligned}$$
(4.2)

where \(\rho _{1}=\frac{\Vert L_{22}\Vert _{2}}{\sigma _{k}(L_{11})}\le \rho <1\).

Proof

We first show that

$$\begin{aligned}&\rho _1=\frac{\Vert L_{22}\Vert _2}{\sigma _{k}(L_{11})}\le \rho =\frac{\Vert R_{22}\Vert _{2}}{\sigma _{k}(R_{11})}<1, \end{aligned}$$
(4.3a)
$$\begin{aligned}&\sigma _j(L_{11})\ge \sigma _j(R_{11}),\quad j=1,\ldots ,k, \end{aligned}$$
(4.3b)
$$\begin{aligned}&\Vert L_{21}\Vert _2 \le \rho \Vert R_{12}\Vert _2<\Vert R_{12}\Vert _2, \end{aligned}$$
(4.3c)
$$\begin{aligned}&\Vert L_{21}L_{11}^{-1}\Vert _2\le \rho \Vert R_{11}^{-1}R_{12}\Vert _2<\Vert R_{11}^{-1}R_{12}\Vert _2, \end{aligned}$$
(4.3d)
$$\begin{aligned}&\sigma _j(L_{22})\le \sigma _j(R_{22}),\quad j=1,\ldots ,\ell -k, \end{aligned}$$
(4.3e)
$$\begin{aligned}&\sigma _j(L_{22})\ge \sigma _j(R_{22})(1-\Vert R_{11}^{-1}R_{12}\Vert _2^2)^{1/2},\quad j=1,\ldots ,\ell -k. \end{aligned}$$
(4.3f)

By assumption, we have \(AP_0=Q_0R_0\) and \(\sigma _{k}(A)>\sigma _{k+1}(A)\). Thus \(\sigma _k(R_0)>\sigma _{k+1}(R_0)\) and \(\sigma _j(A)=\sigma _j(R_0)\) for \(j=1,\ldots ,\ell \). By hypothesis, \(R_0^*=Q_1L^*\), i.e., \(R_0Q_1=L\), and hence \(R_0R_0^*=LL^*\), i.e.,

$$\begin{aligned} \left[ \begin{array}{cc} R_{11} &{} R_{12} \\ 0 &{} R_{22} \\ \end{array} \right] \left[ \begin{array}{cc} R_{11}^* &{} 0 \\ R_{12}^* &{} R_{22}^* \\ \end{array} \right] =\left[ \begin{array}{cc} L_{11} &{} 0 \\ L_{21} &{} L_{22} \\ \end{array} \right] \left[ \begin{array}{cc} L_{11}^* &{} L_{21}^* \\ 0 &{} L_{22}^* \\ \end{array} \right] . \end{aligned}$$
(4.4)

It follows from (4.4) that

$$\begin{aligned} R_{11}R_{11}^*+R_{12}R_{12}^*=L_{11}L_{11}^*. \end{aligned}$$
(4.5)

By Lemma 4.1 and (4.5) we obtain

$$\begin{aligned} \sigma _j^2(L_{11})=\lambda _j(L_{11}L_{11}^*)\ge \lambda _j(R_{11}R_{11}^*)+\lambda _k(R_{12}R_{12}^*)\ge \lambda _j(R_{11}R_{11}^*)=\sigma _j^2(R_{11}), \end{aligned}$$

for \(j=1,\ldots ,k\). This shows that (4.3b) holds.

Also, it follows from (4.4) that

$$\begin{aligned} R_{12}R_{22}^*=L_{11}L_{21}^*. \end{aligned}$$
(4.6)

Thus,

$$\begin{aligned} \Vert L_{21}\Vert _2=\Vert L_{11}^{-1}R_{12}R_{22}^*\Vert _2\le \frac{\Vert R_{22}\Vert _2}{\sigma _k(L_{11})}\Vert R_{12}\Vert _2 \le \frac{\Vert R_{22}\Vert _2}{\sigma _k(R_{11})}\Vert R_{12}\Vert _2=\rho \Vert R_{12}\Vert _2<\Vert R_{12}\Vert _2, \end{aligned}$$

where the second inequality follows from (4.3b). Then we obtain (4.3c).

From (4.6) and (4.5) we have

$$\begin{aligned} (L_{11}^{-1})^*L_{21}^*= & {} (L_{11}L_{11}^*)^{-1}R_{12}R_{22}^*=(R_{11}R_{11}^*+R_{12}R_{12}^*)^{-1}R_{12}R_{22}^*\\= & {} (R_{11}^{-1})^*\big (I_k+R_{11}^{-1}R_{12}(R_{11}^{-1}R_{12})^*\big )^{-1}R_{11}^{-1}R_{12}R_{22}^*. \end{aligned}$$

Thus,

$$\begin{aligned} \Vert L_{21}L_{11}^{-1}\Vert _2\le & {} \Vert R_{11}^{-1}\Vert _2\Vert R_{22}\Vert _2\Vert R_{11}^{-1}R_{12}\Vert _2\\&=\frac{\Vert R_{22}\Vert _2}{\sigma _k(R_{11})} \Vert R_{11}^{-1}R_{12}\Vert _2 = \rho \Vert R_{11}^{-1}R_{12}\Vert _2<\Vert R_{11}^{-1}R_{12}\Vert _2. \end{aligned}$$

Therefore, we obtain (4.3d).

Again, it follows from (4.4) that

$$\begin{aligned} R_{22}R_{22}^*=L_{21}L_{21}^*+L_{22}L_{22}^*. \end{aligned}$$
(4.7)

Using Lemma 4.1 we have \(\sigma _j(R_{22})\ge \sigma _j(L_{22})\) for \(j=1,\ldots ,\ell -k\). This shows that (4.3e) holds. Using (4.3b) and (4.3e), we obtain (4.3a).

By (4.5)–(4.7), we obtain

$$\begin{aligned} R_{22}R_{22}^*= & {} L_{21}L_{11}^*(L_{11}L_{11}^*)^{-1}L_{11}L_{21}^*+L_{22}L_{22}^*\\= & {} R_{22}R_{12}^*(R_{11}R_{11}^*+R_{12}R_{12}^*)^{-1}R_{12}R_{22}^*+L_{22}L_{22}^*. \end{aligned}$$

Thus,

$$\begin{aligned} L_{22}L_{22}^*= & {} R_{22}\big (I_{n-k}-R_{12}^*(R_{11}R_{11}^*+R_{12}R_{12}^*)^{-1}R_{12}\big )R_{22}^* \nonumber \\= & {} R_{22}\Big (I_{n-k}-(R_{11}^{-1}R_{12})^*\big (I_k+(R_{11}^{-1}R_{12})(R_{11}^{-1}R_{12})^*\big )^{-1} (R_{11}^{-1}R_{12})\Big )R_{22}^*.\nonumber \\ \end{aligned}$$
(4.8)

Let \(R_{11}^{-1}R_{12}\) admit the SVD: \(R_{11}^{-1}R_{12}={\widehat{U}}\widehat{\Sigma }{\widehat{V}}^*\), where \(\widehat{\Sigma }={\mathrm{diag}}(\sigma _1(R_{11}^{-1}R_{12}),\ldots ,\sigma _{s_0}(R_{11}^{-1}R_{12}))\) \(\in {{\mathbb {R}}}^{k\times (n-k)}\) with \(s_0=\min \{k, n-k\}\). By simple calculations, we obtain

$$\begin{aligned} I_{n-k}-(R_{11}^{-1}R_{12})^*\big (I_k+(R_{11}^{-1}R_{12})(R_{11}^{-1}R_{12})^*\big )^{-1} (R_{11}^{-1}R_{12})=GG^*, \end{aligned}$$

where \(G:={\widehat{V}}D_G{\widehat{V}}^*\) with \(D_G:={\mathrm{diag}}(1/(1+\sigma _1^2(R_{11}^{-1}R_{12}))^{1/2},\ldots ,1/(1+\sigma _{s_0}^2(R_{11}^{-1}R_{12}))^{1/2},1,\ldots ,1)\) \(\in {{\mathbb {R}}}^{(n-k)\times (n-k)}\). This, together with Lemma 4.2 and (4.8), yields

$$\begin{aligned} \sigma _j^2(L_{22})&=\sigma _j^2(R_{22}G)\ge \sigma _j^2(R_{22})\cdot \sigma _{\min }^2(G) \ge \frac{\sigma _j^2(R_{22})}{1+\Vert R_{11}^{-1}R_{12}\Vert _2^2} \\&\ge \sigma _j^2(R_{22})(1-\Vert R_{11}^{-1}R_{12}\Vert _2^2) \end{aligned}$$

for \(j=1,\ldots ,\ell -k\). This shows that (4.3f) holds.

Next, we consider the QLP iteration, which aims to compute the QR decomposition of the conjugate transpose of the R-factor produced by the last step. With \(R_1=L^*\), the algorithm can be described by

$$\begin{aligned} \text{ Compute } \text{ the } \text{ QR } \text{ decomposition } \text{ of }\, R_i^*: R_i^*=Q_{i+1}R_{i+1},\quad i=0,1,2,\ldots . \end{aligned}$$
(4.9)

For any \(i\ge 0\), let

$$\begin{aligned} R_{i}=\left[ \begin{array}{cc} S_{i} &{} H_{i} \\ 0 &{} E_{i} \end{array} \right] ,\quad S_{i}\in {{\mathbb {Q}}}^{k\times k}, \end{aligned}$$

where \(S_0=R_{11}\), \(H_{0}=R_{12}\), \(E_{0}=R_{22}\), \(S_1=L_{11}^*\), \(H_{1}=L_{21}^*\), and \(E_{1}=L_{22}^*\). By using a similar proof of (4.3), we have

$$\begin{aligned}&\rho _i :=\frac{\Vert E_{i}\Vert _2}{\sigma _k(S_{i})} \le \rho _{i-1} \quad (\rho _0=\rho ),\quad i=1,2,\ldots ,\end{aligned}$$
(4.10a)
$$\begin{aligned}&\sigma _{j}(S_{i+1})\ge \sigma _{j}(S_{i}),\quad j=1,\ldots ,k,\quad i=0,1,\ldots ,\end{aligned}$$
(4.10b)
$$\begin{aligned}&\Vert H_{i+1}\Vert _2 \le \rho _{i}\Vert H_{i}\Vert _2,\quad i=0,1,\ldots ,\end{aligned}$$
(4.10c)
$$\begin{aligned}&\Vert S_{i+1}^{-1}H_{i+1}\Vert _2\le \rho _{i}\Vert S_{i}^{-1}H_{i}\Vert _2,\quad i=0,1,\ldots ,\end{aligned}$$
(4.10d)
$$\begin{aligned}&\sigma _j(E_{i+1})\le \sigma _j(E_{i}),\quad j=1,\ldots ,\ell -k,\quad i=0,1,\dots ,\end{aligned}$$
(4.10e)
$$\begin{aligned}&\sigma _j(E_{i+1})\ge \sigma _j(E_{i})(1-\Vert S_{i}^{-1}H_{i}\Vert _2^2)^{1/2},\quad j=1,\ldots ,\ell -k,\quad i=0,1,\dots . \end{aligned}$$
(4.10f)

Using (4.10a)–(4.10f) and (4.3a)–(4.3f), we can establish (4.1)–(4.2) by following the arguments similar to those of [37, Theorem 3.4] under the assumptions that \(\sigma _{k}(R_{11})\ge \frac{\sigma _{k}(A)}{\sqrt{k(n-k+1)}}\) and \(\Vert R_{22}\Vert _{2}\le \sqrt{(k+1)(n-k)}\cdot \sigma _{k+1}(A)\). \(\square \)

On the maximum singular value approximation error for the QLP decomposition of a quaternion matrix, we have the following theorem.

Theorem 4.5

With the matrices \(R_0\) and L defined in Theorem 4.4, partition \(R_{0}\) and L as

$$\begin{aligned} R_{0}=\left[ \begin{array}{cc} r_{11} &{} R_{12} \\ 0 &{} R_{22} \\ \end{array} \right] \quad \text{ and }\quad L=\left[ \begin{array}{cc} l_{11} &{} 0 \\ L_{21} &{} L_{22} \\ \end{array} \right] , \end{aligned}$$

where \(r_{11}\), \(l_{11}\in {\mathbb {Q}}\). If \(\Vert R_{22}\Vert _{2}\le \sqrt{2(n-1)}\sigma _{2}(A)\) and \(\frac{\Vert R_{22}\Vert _{2}}{|r_{11}|}<1\), then

$$\begin{aligned} |l_{11}|^{-1}-\sigma _{1}^{-1}(A)\le \frac{\sigma _{2}^{2}(A)}{\sigma _{1}^{3}(A)}{\mathcal {O}} \left( \frac{n^{\frac{5}{2}}\Vert r_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\right) , \end{aligned}$$

where \(\rho _{1}=\frac{\Vert L_{22}\Vert _{2}}{|l_{11}|}<\frac{\Vert R_{22}\Vert _2}{|r_{11}|}<1\).

Proof

This follows from a similar proof of Theorem 4.4 by using the fact that

$$\begin{aligned} \sigma _1^2(A)\le \Vert A\Vert _F^2=\Vert R_0\Vert _F^2\le n\max \limits _{1\le j\le n}\Vert R_0\mathbf{e}_j\Vert _2^2=n|r_{11}|^2. \end{aligned}$$

\(\square \)

Remark 4.6

The results indicate that the singular values of the diagonal blocks of L are good approximations to those of A, when there exists a significant gap between \(\sigma _k(A)\) and \(\sigma _{k+1}(A)\). When the quaternion data matrix degenerates to a real matrix, the bounds provided in [37, Theorem 3.4] are given by

$$\begin{aligned} \frac{\sigma _{j}(L_{22})-\sigma _{k+j}(A)}{\sigma _{k+j}(A)}\le \left( \frac{\sigma _{k+1}(A)}{\sigma _{k}(A)}\right) ^{2} {\mathcal {O}}\Big (\frac{n^{5/2}\Vert R_{12}\Vert _2^2}{(1-\rho _1^2) \sigma _{k}^2(R_{11})}\Big ), \; j=1,\ldots ,\ell -k \end{aligned}$$

and

$$\begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)}\le \left( \frac{\sigma _{k+1}(A)}{\sigma _{k}(A)} \right) ^{2} {\mathcal {O}}\Big (\frac{n^{5/2}\Vert R_{12}\Vert _2^2}{(1-\rho _1^2) \sigma _{k}^2(R_{11})}\Big ), \;j=1,\ldots ,k. \end{aligned}$$

We note that \(\Vert R_{11}^{-1}R_{12}\Vert _2\le \Vert R_{11}^{-1}\Vert _2\Vert R_{12}\Vert _2=\Vert R_{12}\Vert _2/ \sigma _{k}(R_{11})\), and the experimental results for the QRCP of 300 real test matrices in [45] show that \(\Vert R_{11}^{-1}R_{12}\Vert _2\) is always smaller than 10. This shows that our error bound in Theorem 4.4 is tighter. In fact, the main difference is the following: in the proof of [37, Theorem 3.4], the main inequalities (4.10a)–(4.10c) and (4.10e) were derived by employing the block structure and the singular value properties of the blocks of \(Q_{i+1}\) in (4.9), whereas we deduce (4.10a)–(4.10f) from \(R_iR_i^*=R_{i+1}^*R_{i+1}\) without using the singular value properties of the blocks of \(Q_{i+1}\), and we obtain the new inequalities associated with \(\Vert S_j^{-1}H_j\Vert _2\) in (4.10d) and (4.10f) in place of those in [37] associated with \(\Vert S_j^{-1}\Vert _2\Vert H_j\Vert _2\).
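For a given real matrix, the quantities compared in this remark are easy to inspect directly. The sketch below computes \(\rho \), \(\Vert R_{11}^{-1}R_{12}\Vert _2\) (used in our bound), and \(\Vert R_{12}\Vert _2/\sigma _k(R_{11})\) (used in the bound of [37]) from the first QRCP factor of a test matrix with a gap after the k-th singular value; the test matrix itself is an arbitrary assumption of ours and reports no claim about typical sizes.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(5)
n, k = 60, 10
scales = np.r_[np.ones(k), 1e-3 * 0.9 ** np.arange(n - k)]   # gap after the k-th column scale
A = rng.standard_normal((n, n)) * scales

_, R0, _ = qr(A, mode='economic', pivoting=True)              # first QRCP: A P0 = Q0 R0
R11, R12, R22 = R0[:k, :k], R0[:k, k:], R0[k:, k:]
sk = np.linalg.svd(R11, compute_uv=False)[-1]                 # sigma_k(R11)
print(np.linalg.norm(R22, 2) / sk,                            # rho
      np.linalg.norm(np.linalg.solve(R11, R12), 2),           # ||R11^{-1} R12||_2  (Theorem 4.4)
      np.linalg.norm(R12, 2) / sk)                            # ||R12||_2 / sigma_k(R11)  ([37])
```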

4.2 Error Analysis of Randomized QQLP

The idea and implementation of RQQLP are simple, while its theoretical analysis is rather lengthy. In this subsection we first state the main approximation results of RQQLP.

4.2.1 Matrix Approximation Error Analysis

In this subsection, we give the matrix approximation error bound for Algorithm 3.2.

On the matrix approximation error of Algorithm 3.2, we have the following results. The proofs are similar to those of [35, Theorems 4.1–4.3], and thus we omit the details here.

Theorem 4.7

(Average error) Let \(A\in {{\mathbb {Q}}}^{m\times n}\) admit the QSVD: \(A=U_m\Sigma V_n^* \), where \(U_m=[U_{1m}, U_{2m}]\) with \(U_{1m}\in {{\mathbb {Q}}}^{m\times k}\) and \(\Sigma ={\mathrm{diag}}(\sigma _1(A),\ldots ,\sigma _\ell (A))\in {{\mathbb {R}}}^{m\times n}\) with \(\sigma _{1}(A)\ge \sigma _{2}(A)\ge \ldots \ge \sigma _\ell (A)\ge 0\) (\(\ell =\min \{m,n\}\)). With the notations of Algorithm 3.2, if the oversampling parameter \(p\ge 2\), \(l=k+p\le \min \{m,n\}\), the power iteration number \(q=0\), and \(\Omega _1=\Omega U_{1m}\) has full column rank for the quaternion random test matrix \(\Omega \in {\mathbb {Q}}^{l\times m}\), then the expected approximation error for output matrices \(Q_l, L_l, P_l\) from Algorithm 3.2 satisfies

$$\begin{aligned}&{\mathbb {E}}\Vert A-Q_{l}L_{l}P_{l}^*\Vert _F\le \Big (1+\frac{4k}{4p+2}\Big )^{1/2} \Big (\sum _{j>k}\sigma _j^2(A)\Big )^{1/2},\\&{\mathbb {E}}\Vert A-Q_{l}L_{l}P_{l}^*\Vert _2\le \Bigg (1+3\sqrt{\frac{k}{4p+2}}\Bigg )\sigma _{k+1}(A)+ \frac{3e\sqrt{4k+4p+2}}{2p+2}\Big (\sum _{j>k}\sigma _j^2(A)\Big )^{1/2}. \end{aligned}$$

If the power iteration number \(q>0\), then the expected approximation error is given by

$$\begin{aligned}&{\mathbb {E}}\Vert A-Q_{l}L_{l}P_{l}^*\Vert _2 \\&\quad \le \left( \left( 1+3\sqrt{\frac{k}{4p+2}}\right) \sigma _{k+1}^{2q+1}(A)+ \frac{3e\sqrt{4k+4p+2}}{2p+2}\Big (\sum _{j>k}\sigma _j^{2(2q+1)}(A)\Big )^{1/2}\right) ^{1/(2q+1)}. \end{aligned}$$

Theorem 4.8

(Deviation bound for the approximation error) With the notations of Theorem 4.7, for Algorithm 3.2, we have the following bounds for the approximation error:

$$\begin{aligned} \Vert A-Q_{l}L_{l}P_{l}^*\Vert _F\le \left( 1+v\sqrt{\frac{3k}{p+1}}\right) \Big (\sum _{j>k}\sigma _j^2(A)\Big )^{1/2}+ uv\frac{e\sqrt{4k+4p+2}}{4p+4}\sigma _{k+1}(A)\nonumber \\ \end{aligned}$$
(4.11)

with probability not less than \(1-2v^{-4p}-e^{-u^2/2}\) and

$$\begin{aligned} \Vert A-Q_{l}L_{l}P_{l}^*\Vert _2\le \left( 1+\frac{3v}{2}\sqrt{\frac{3k}{p+1}}+uv\eta _{k,p}\right) \sigma _{k+1}(A)+ 3v\eta _{k,p}\Big (\sum _{j>k}\sigma _j^2(A)\Big )^{1/2} \end{aligned}$$

with probability not less than \(1-2v^{-4p}-e^{-u^2/2}\), where \(u,v\ge 1\) and \(\eta _{k,p}=\frac{e\sqrt{4k+4p+2}}{4p+4}\).

We observe from Theorems 4.7–4.8 that the decay rate of the singular values is a key factor governing the quality of the low-rank approximation, and that Algorithm 3.2 provides low-rank approximations that are optimal up to constants depending on the oversampling parameter p and the gap ratio of the singular values. If the singular values decay slowly, increasing the number q of power iterations may significantly improve the matrix approximation error with high probability.

4.2.2 Singular Value Approximation Error Analysis

In this subsection, we provide the singular value approximation error bounds for Algorithm 3.2.

On the maximum and interior singular value approximation errors of Algorithm 3.2, we have the following results.

Theorem 4.9

With the notation of Algorithm 3.2, compute the QQLP decomposition of B, where \(R_0\) is the R-factor in the QRCP of B, \(BP_{0}=Q_{0}R_{0}\), and \(L^*\) is the R-factor in the QR decomposition of \(R_0^*\), \(R_{0}^{*}=Q_{1}L^{*}\). Suppose \(\sigma _{1}(B)>\sigma _{2}(B)\) and partition \(R_{0}\) and L as

$$\begin{aligned} R_{0}=\left[ \begin{array}{cc} r_{11} &{} R_{12} \\ 0 &{} R_{22} \\ \end{array} \right] \quad \text{ and }\quad L=\left[ \begin{array}{cc} l_{11} &{} 0 \\ L_{21} &{} L_{22} \\ \end{array} \right] , \end{aligned}$$

where \(r_{11}\), \(l_{11}\in {\mathbb {Q}}\). If \(\Vert R_{22}\Vert _{2}\le \sqrt{2(l-1)}\sigma _{2}(B)\) and \(\frac{\Vert R_{22}\Vert _{2}}{|r_{11}|}<1\), then we have

$$\begin{aligned} |l_{11}|^{-1}-\sigma _{1}^{-1}(A)\le \frac{1}{\sigma _{1}(B)}\left( 1-\frac{1}{\delta _1}\right) + \frac{\sigma _{2}^{2}(B)}{\sigma _{1}^{3}(B)}{\mathcal {O}}\left( \frac{l^{\frac{5}{2}}\Vert r_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\right) , \end{aligned}$$

with probability not less than \(1-\Delta \), where \(\Delta =\exp (-\alpha ^2/2)+\frac{\pi ^{-3}}{4(p+1)(2p+3)}\gamma ^{-4(p+1)}\), \(\rho _{1}=\frac{\Vert L_{22}\Vert _{2}}{|l_{11}|}<1\), and \(\delta _1=\sqrt{1+{\mathcal {C}}_{\alpha ,\gamma }^2 \left( \frac{\sigma _{k+1}(A)}{\sigma _{1}(A)}\right) ^{4q+2}}\) with \({\mathcal {C}}_{\alpha ,\gamma }=(3\sqrt{m-k}+3\sqrt{\ell }+\alpha ) \frac{e\sqrt{4l+2}}{4(p+1)}\gamma \) for \(\alpha >0\) and \(\gamma \ge 1\).

Theorem 4.10

With the notation of Algorithm 3.2, compute the QQLP decomposition of B, where \(R_0\) is the R-factor in the QRCP of B, \(BP_{0}=Q_{0}R_{0}\), and \(L^*\) is the R-factor in the QR decomposition of \(R_0^*\), \(R_{0}^{*}=Q_{1}L^{*}\). Suppose \(\sigma _{s}(B)>\sigma _{s+1}(B)\) for some \(1\le s< k\). Partition \(R_{0}\) and L as

$$\begin{aligned} R_{0}=\left[ \begin{array}{cc} R_{11} &{} R_{12} \\ 0 &{} R_{22} \\ \end{array} \right] \quad \text{ and }\quad L=\left[ \begin{array}{cc} L_{11} &{} 0 \\ L_{21} &{} L_{22} \\ \end{array} \right] , \end{aligned}$$

where \(R_{11}\), \(L_{11}\in {\mathbb {Q}}^{s\times s}\). Assume that \(\Vert R_{22}\Vert _{2}\le \sqrt{(s+1)(l-s)}\sigma _{s+1}(B)\), \(\sigma _{s}(R_{11})\ge \frac{\sigma _{s}(B)}{\sqrt{s(l-s+1)}}\), and \(\frac{\Vert R_{22}\Vert _{2}}{\sigma _s(R_{11})}<1\). Then we have

$$\begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)}\le \delta _j-1+ \delta _j\Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2}{\mathcal {O}} \Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ) \end{aligned}$$

with probability not less than \(1-\Delta \) for all \(j=1,\ldots ,s\), and

$$\begin{aligned} \frac{\sigma _{j}(L_{22})-\sigma _{s+j}(A)}{\sigma _{s+j}(A)} \le \Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2} {\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ), \end{aligned}$$

for all \(j=1,\ldots ,k-s\), where \(\rho _{1}=\frac{\Vert L_{22}\Vert _{2}}{\sigma _{s}(L_{11})}<1\), \(\delta _j=\Big (1+{\mathcal {C}}_{\alpha ,\gamma }^2 \Big (\frac{\sigma _{k+1}(A)}{\sigma _{j}(A)}\Big )^{4q+2}\Big )^{1/2}\), \(\Delta \) and \({\mathcal {C}}_{\alpha ,\gamma }\) are defined in Theorem 4.9.

4.3 Proof of Theorems 4.94.10

4.3.1 Setup

To study the singular value approximation error of Algorithm 3.2, we first establish some preliminary results for later analysis.

Lemma 4.11

([3, Proposition 10.3]) Let \(h(\cdot )\) be a Lipschitz function on real matrices: \(|h(X)-h(Y)|\le L\Vert X-Y\Vert _F\) for all \(X, Y\in {{\mathbb {R}}}^{m\times n}\). Then for any \(m\times n\) standard real Gaussian matrix G and \(u>0\), we have \(\mathbb {P}(h(G) \ge \mathbb {E}h(G)+Lu)\le e^{-u^2/2}\).

In the following lemma, we give some upper bounds on the spectral norm of a random quaternion test matrix defined by (3.1).

Lemma 4.12

Let \(\Omega =\Omega _0+\Omega _1{\mathbf {i}}+\Omega _2{\mathbf {j}}+\Omega _3{\mathbf {k}}\in {\mathbb {Q}}^{s\times \ell }\), where \(\Omega _0,\Omega _1,\Omega _2,\Omega _3\in {\mathbb {R}}^{s\times \ell }\) are all real independent standard Gaussian random matrices. Then

$$\begin{aligned} {\mathbb {E}}\Vert \Omega \Vert _2\le 3\sqrt{s}+3\sqrt{\ell } \end{aligned}$$

and

$$\begin{aligned} \Vert \Omega \Vert _2<3\sqrt{s}+3\sqrt{\ell }+\alpha \end{aligned}$$

with probability not less than \(1-\exp (-\alpha ^2/2)\) for any \(\alpha >0\).

Proof

In [35, Lemma 4.15], the relation \({\mathbb {E}}\Vert S\Omega T\Vert _2\le 3(\Vert S\Vert _2\Vert T\Vert _F+\Vert T\Vert _2\Vert S\Vert _F)\) for any \(S\in {{\mathbb {Q}}}^{s\times s}\) and \(T\in {{\mathbb {Q}}}^{\ell \times \ell }\) has been established. By taking \(S=I_s\) and \( T=I_{\ell }\), the upper bound for \({\mathbb {E}}\Vert \Omega \Vert _2\) is as desired.

We now establish the probability upper bound of \(\Vert \Omega \Vert _2\). Define

$$\begin{aligned} h(X_c)=\Vert \Upsilon _{X}\Vert _2=\Vert [J_0X_c, J_1X_c, J_2X_c, J_3X_c]\Vert _2,\quad X_c:=[X_0^T, X_1^T, X_2^T, X_3^T]^T, \end{aligned}$$

for all \(X=X_0+X_1{\mathbf {i}}+X_2{\mathbf {j}}+X_3{\mathbf {k}}\in {\mathbb {Q}}^{s\times \ell }\), and \(J_0=I_4\otimes I_s\), \(J_1=[-\mathbf{e}_2, \mathbf{e}_1,\mathbf{e}_4,-\mathbf{e}_3]^T\otimes I_s\), \(J_2=[-\mathbf{e}_3, -\mathbf{e}_4, \mathbf{e}_1,\mathbf{e}_2]^T\otimes I_s\), \(J_3=[-\mathbf{e}_4,\mathbf{e}_3,-\mathbf{e}_2,\mathbf{e}_1]^T \otimes I_s\) with \(\mathbf{e}_i\) being the i-th column of \(I_4\). By Lemma 2.2,

$$\begin{aligned} |h(F_c)-h(G_c)|=|\Vert \Upsilon _F\Vert _2-\Vert \Upsilon _G\Vert _2|=|\Vert F\Vert _2-\Vert G\Vert _2|\le \Vert F-G\Vert _2\le \Vert F_c-G_c\Vert _F, \end{aligned}$$

for all \(F_c=[F_0^T, F_1^T, F_2^T, F_3^T]^T\), \(G_c=[G_0^T, G_1^T, G_2^T, G_3^T]^T\in {\mathbb {R}}^{4s\times \ell }\) with \(F=F_0+F_1{\mathbf {i}}+F_2{\mathbf {j}}+F_3{\mathbf {k}}\), \(G=G_0+G_1{\mathbf {i}}+G_2{\mathbf {j}}+G_3{\mathbf {k}}\in {\mathbb {Q}}^{s\times \ell }\). This shows that \(h(\cdot )\) is a Lipschitz function with the Lipschitz constant \(L=1\). By hypothesis, \(\Omega _c=[\Omega _0^T,\Omega _1^T,\Omega _2^T,\Omega _3^T]^T\) is a \(4s\times \ell \) standard Gaussian random matrix. By Lemma 2.2 and Lemma 4.11, we have

$$\begin{aligned} \begin{array}{lcl} \Vert \Omega \Vert _2&{}=&{}\Vert \Upsilon _\Omega \Vert _2=h(\Omega _c)\le {\mathbb {E}}h(\Omega _c)+\alpha \\ &{}=&{}{\mathbb {E}}\Vert \Upsilon _\Omega \Vert _2+\alpha ={\mathbb {E}}\Vert \Omega \Vert _2+\alpha \le 3\sqrt{s}+3\sqrt{\ell }+\alpha \end{array} \end{aligned}$$

with probability not less than \(1-e^{-\alpha ^2/2}\) for any \(\alpha >0\). \(\square \)

On the spectral norm of the Moore-Penrose inverse of a quaternion random test matrix, we have the following result [35].

Lemma 4.13

[35, Theorem 4.14] Let \(\Omega =\Omega _0+\Omega _1{\mathbf {i}}+\Omega _2{\mathbf {j}}+\Omega _3{\mathbf {k}}\in {\mathbb {Q}}^{s\times \ell }\) with \(s\le \ell \), where \(\Omega _0,\Omega _1,\Omega _2,\Omega _3\in {{\mathbb {R}}}^{s\times \ell }\) are all real independent standard Gaussian random matrices. Then for any \(\gamma \ge 1\), we have

$$\begin{aligned} {\mathbb {E}}\Vert \Omega ^{\dag }\Vert _2\le \frac{e\sqrt{4\ell +2}}{2\ell -2s+2} \end{aligned}$$

and

$$\begin{aligned} {\mathbb {P}}\Big \{\Vert \Omega ^{\dag }\Vert _2>\frac{e\sqrt{4\ell +2}}{4(\ell -s+1)}\gamma \Big \}\le \frac{\pi ^{-3}}{4(\ell -s+1)(2\ell -2s+3)} \gamma ^{-4(\ell -s+1)}. \end{aligned}$$

Let the matrix \(A\in {\mathbb {Q}}^{m\times n}\) and the parameters k, l and q be defined as in Algorithm 3.2. The following lemma provides lower bounds for the k largest singular values of \(B=W^*A\), where the columns of \(W\in {\mathbb {Q}}^{m\times l}\) form an orthonormal basis of \({\mathcal {R}}((AA^*)^qA\Psi )\) and \(\Psi =\Psi _0+\Psi _1{\mathbf {i}}+\Psi _2{\mathbf {j}} +\Psi _3{\mathbf {k}} \in {\mathbb {Q}}^{n\times l}\) is a quaternion random test matrix.

Lemma 4.14

Let \(A\in {\mathbb {Q}}^{m\times n}\) admit the SVD: \(A=U_\ell \Sigma _\ell V_\ell ^*\), where \(U_\ell \in {{\mathbb {Q}}}^{m\times \ell }\) and \(V_\ell \in {{\mathbb {Q}}}^{n\times \ell }\) have orthonormal columns and \(\Sigma _\ell ={\mathrm{diag}}(\sigma _1(A),\ldots ,\sigma _\ell (A))\in {{\mathbb {R}}}^{\ell \times \ell }\) with \(\ell =\min \{m,n\}\). Suppose k, l and q are integers such that \(0<k\le l\le \ell \), and \(q\ge 0\). Let \(\Psi =\Psi _0+\Psi _1{\mathbf {i}}+\Psi _2{\mathbf {j}} +\Psi _3{\mathbf {k}} \in {\mathbb {Q}}^{n\times l}\) be a quaternion random test matrix. Partition \(V_\ell ^{*}\Psi =[\widehat{\Psi }_{1}^T, \widehat{\Psi }_{2} ^T]^T\) with \(\widehat{\Psi }_{1}\in {\mathbb {Q}}^{k\times l}\). Let \(B=W^{*}A\), where the column vectors of \(W\in {\mathbb {Q}}^{m\times l}\) form an orthonormal basis of \({{{\mathcal {R}}}}((AA^*)^qA\Psi )\). If \(\widehat{\Psi }_{1}\) has full row rank, then

$$\begin{aligned} \sigma _{j}(B)=\sigma _{j}(W^{*}A)\ge \frac{\sigma _{j}(A)}{\sqrt{1+\Vert \widehat{\Psi }_{2}\Vert _{2}^{2} \Vert \widehat{\Psi }_{1}^{\dag }\Vert _{2}^{2}\left( \frac{\sigma _{k+1}(A)}{\sigma _{j}(A)}\right) ^{4q+2}}} \end{aligned}$$

for \(j=1,\ldots ,k\).

Proof

Using the SVD of A we have \((AA^*)^qA\Psi =U_\ell \Sigma _\ell ^{2q+1} V_\ell ^*\Psi \). By hypothesis, we have

$$\begin{aligned} \widehat{\Psi }:=V_\ell ^*\Psi =[\widehat{\Psi }_1^T,\widehat{\Psi }_2^T]^T,\quad \widehat{\Psi }_1\in {\mathbb {Q}}^{k\times l}. \end{aligned}$$

Let

$$\begin{aligned} \Sigma _\ell ={\mathrm{diag}}(\Sigma _k,\Sigma _c),\quad \Sigma _k\in {\mathbb {R}}^{k\times k}. \end{aligned}$$
(4.12)

Then we obtain

$$\begin{aligned} (AA^*)^qA\Psi =U_\ell \left[ \begin{array}{c} \Sigma _k^{2q+1}\widehat{\Psi }_1 \\ \Sigma _c^{2q+1}\widehat{\Psi }_2 \\ \end{array} \right] . \end{aligned}$$

Let \(X\in {{\mathbb {Q}}}^{l\times l}\) be nonsingular. Then we can easily prove that \({\mathcal {R}}((AA^*)^qA\Psi X)={\mathcal {R}}((AA^*)^qA\Psi )\). Let \(X=\left[ \widehat{\Psi }_1^{\dag }\Sigma _k^{-(2q+1)}, {\widehat{X}}_2 \right] ,\) where \({\widehat{X}}_2\in {{\mathbb {Q}}}^{l\times (l-k)}\) satisfies \(\widehat{\Psi }_1{\widehat{X}}_2=0\). In this case, we have

$$\begin{aligned} Z:=(AA^*)^qA\Psi X=U_\ell \left[ \begin{array}{cc} I_k &{} 0 \\ H_1 &{} H_2 \\ \end{array} \right] =:U_\ell {\widehat{H}}, \end{aligned}$$
(4.13)

where \(H_1=\Sigma _c^{2q+1}\widehat{\Psi }_2\widehat{\Psi }_1^{\dag }\Sigma _k^{-(2q+1)} \in {{\mathbb {Q}}}^{(\ell -k)\times k}\), and \(H_2=\Sigma _c^{2q+1}\widehat{\Psi }_2{\widehat{X}}_2\in {{\mathbb {Q}}}^{(\ell -k)\times (l-k)}\) has full column rank. By assumption, \({{{\mathcal {R}}}}(W)={\mathcal {R}}\left( (AA^*)^qA\Psi \right) \) with \(W^*W=I_l\) yields \(WW^*=ZZ^\dag \).

Let \({\widehat{H}}\) admit the following QR decomposition:

$$\begin{aligned} {\widehat{H}}=\left[ \begin{array}{cc} I_k &{} 0 \\ H_1 &{} H_2 \\ \end{array} \right] =[{\widehat{W}}_1, {\widehat{W}}_2]\left[ \begin{array}{cc} {\widehat{R}}_{11} &{} {\widehat{R}}_{12} \\ 0 &{} {\widehat{R}}_{22} \\ \end{array} \right] \equiv {\widehat{W}}{\widehat{R}}, \quad {\widehat{W}}_1\in {{\mathbb {Q}}}^{\ell \times k},\quad {\widehat{R}}_{11}\in {{\mathbb {Q}}}^{k\times k}, \end{aligned}$$

which implies that

$$\begin{aligned}{}[I_k,H_1^T]^T={\widehat{W}}_1{\widehat{R}}_{11}. \end{aligned}$$
(4.14)

Therefore,

$$\begin{aligned} \sigma _k(WW^*A)= & {} \sigma _k(ZZ^{\dag }A)=\sigma _k((U_\ell {\widehat{W}})(U_\ell {\widehat{W}})^*A)= \sigma _k(U_\ell {\widehat{W}}{\widehat{W}}^*\Sigma _\ell V_\ell ^*)\nonumber \\= & {} \sigma _k({\widehat{W}}^*\Sigma _\ell )\ge \sigma _k({{{\mathcal {U}}}}_1^*{\widehat{W}}^*\Sigma _\ell {{{\mathcal {V}}}}_1)=\sigma _k\left( {\widehat{W}}_1^*\left[ \begin{array}{c} \Sigma _k \\ 0 \\ \end{array} \right] \right) \nonumber \\= & {} \sigma _k\left( \left( \left[ \begin{array}{c} I_k \\ H_1 \\ \end{array} \right] {\widehat{R}}_{11}^{-1}\right) ^*\left[ \begin{array}{c} \Sigma _k \\ 0 \\ \end{array} \right] \right) =\sigma _k({\widehat{R}}_{11}^{-*}\Sigma _k) \end{aligned}$$
(4.15)

where the first inequality follows from Lemma 4.3 with \({{{\mathcal {U}}}}_1={{{\mathcal {V}}}}_1=I(:,1:k)\), and the penultimate equality follows from (4.14).

On the other hand, by Lemma 4.2 and (4.14), we obtain

$$\begin{aligned} \sigma _k(A)=\sigma _k\left( {\widehat{R}}_{11}^*{\widehat{R}}_{11}^{-*}\Sigma _k\right) \le \Vert {\widehat{R}}_{11}\Vert _2\sigma _k \left( {\widehat{R}}_{11}^{-*}\Sigma _k\right) = \sqrt{1+\Vert H_1\Vert _2^2}\cdot \sigma _k\left( {\widehat{R}}_{11}^{-*}\Sigma _k\right) . \end{aligned}$$

This, together with (4.15), gives rise to

$$\begin{aligned} \sigma _k(B)=\sigma _k(WW^*A)\ge \sigma _k\left( {\widehat{R}}_{11}^{-*}\Sigma _k\right) \ge \frac{\sigma _k(A)}{\sqrt{1+\Vert H_1\Vert _2^2}}. \end{aligned}$$
(4.16)

From the definition of \(H_1\) in (4.13) we have

$$\begin{aligned} \Vert H_1\Vert _2\le \Vert \widehat{\Psi }_2\Vert _2\Vert \widehat{\Psi }_1^{\dag }\Vert _2 \left( \frac{\sigma _{k+1}(A)}{\sigma _k(A)}\right) ^{2q+1}. \end{aligned}$$

This, together with (4.16), yields

$$\begin{aligned} \sigma _k(B)\ge \frac{\sigma _k(A)}{\sqrt{1+\Vert \widehat{\Psi }_2\Vert _2^2\Vert \widehat{\Psi }_1^{\dag }\Vert _2^2 \left( \frac{\sigma _{k+1}(A)}{\sigma _k(A)}\right) ^{4q+2}}}. \end{aligned}$$

This shows that the result in Lemma 4.14 holds for \(j=k\). By repeating all previous arguments for the rank-j truncated SVD of B for all \(1\le j<k\), we can establish Lemma 4.14. \(\square \)

We point out that one may also establish Lemma 4.14 by employing arguments similar to those in [10, Lemmas 4.1–4.2, Theorem 4.3].

4.3.2 The Detailed Proof

Based on the preliminary results developed in Sect. 4.3.1, we give the proof of Theorems 4.94.10.

Proof of Theorem 4.9

Let \(A\in {\mathbb {Q}}^{m\times n}\) admit the SVD: \(A=U_\ell \Sigma _\ell V_\ell ^*\), where \(U_\ell \in {{\mathbb {Q}}}^{m\times \ell }\) and \(V_\ell \in {{\mathbb {Q}}}^{n\times \ell }\) have orthonormal columns and \(\Sigma _\ell ={\mathrm{diag}}(\sigma _1(A),\ldots ,\sigma _\ell (A))\in {{\mathbb {R}}}^{\ell \times \ell }\). Denote

$$\begin{aligned} U_m^*\Omega ^*=[\widehat{\Omega }_{1}^T, \widehat{\Omega }_{2}^T]^T,\quad \widehat{\Omega }_{1}\in {\mathbb {Q}}^{k\times l}. \end{aligned}$$

By Lemma 4.3 and Lemma 4.14, we obtain, for any \(1\le j\le k\),

$$\begin{aligned} \frac{\sigma _j(A)}{\hat{\delta }_j}\le \sigma _j(B)=\sigma _{j}(AV)=\sigma _{j}(V^*A^*)\le \sigma _{j}(A^*)=\sigma _{j}(A), \end{aligned}$$
(4.17)

where \(\hat{\delta }_j=\big (1+\Vert \widehat{\Omega }_{2}\Vert _{2}^{2} \Vert \widehat{\Omega }_{1}^{\dag }\Vert _{2}^{2}\left( \frac{\sigma _{k+1}(A)}{\sigma _{j}(A)}\right) ^{4q+2}\big )^{1/2}.\) Using Lemmas 4.124.13 we have

$$\begin{aligned} \Vert \widehat{\Omega }_2\Vert _2\Vert \widehat{\Omega }_1^{\dag }\Vert _2\le (3\sqrt{m-k}+3\sqrt{\ell }+\alpha ) \frac{e\sqrt{4l+2}}{4(p+1)}\gamma =:{\mathcal {C}}_{\alpha ,\gamma } \end{aligned}$$

with probability not less than \(1-\Delta \), where \(\Delta =\exp (-\alpha ^2/2)+\frac{\pi ^{-3}}{4(p+1)(2p+3)}\gamma ^{-4(p+1)}\). From (4.17) we have, for any \(1\le j\le k\),

$$\begin{aligned} \frac{\sigma _j(A)}{\delta _j}\le \sigma _j(B) \end{aligned}$$
(4.18)

with probability not less than \(1-\Delta \), where \(\delta _j=\left( 1+{\mathcal {C}}_{\alpha ,\gamma }^2\left( \frac{\sigma _{k+1}(A)}{\sigma _{j}(A)}\right) ^{4q+2}\right) ^{1/2}\). It follows from Theorem 4.5 and (4.18) with \(j=1\) that

$$\begin{aligned} |l_{11}|^{-1}-\sigma _{1}^{-1}(A)= & {} \sigma _1^{-1}(B)-\sigma _1^{-1}(A)+(|l_{11}|^{-1}-\sigma _1^{-1}(B))\\\le & {} \sigma _{1}^{-1}(B)-\frac{1}{\delta _1\sigma _{1}(B)}+ \frac{\sigma _{2}^{2}(B)}{\sigma _{1}^{3}(B)}{\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert r_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big )\\= & {} \sigma _{1}^{-1}(B)\Big (1-\frac{1}{\delta _1}\Big )+\frac{\sigma _{2}^{2}(B)}{\sigma _{1}^{3}(B)} {\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert r_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ) \end{aligned}$$

with probability not less than \(1-\Delta \). \(\square \)

Proof of Theorem 4.10

By Theorem 4.4 and (4.17) we have, for any \(1\le j\le s\),

$$\begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)}= & {} \frac{\sigma _j^{-1}(B)-\sigma _j^{-1}(A)}{\sigma _j^{-1}(A)}+\frac{\sigma _j(A)}{\sigma _j(B)} \frac{\sigma _j^{-1}(L_{11})-\sigma _j^{-1}(B)}{\sigma _j^{-1}(B)}\\\le & {} \frac{\sigma _{j}^{-1}(B)-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)} +\frac{\sigma _{j}(A)\sigma _{s+1}^{2}(B)}{\sigma _{j}(B)\sigma _{s}^{2}(B)} {\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big )\\\le & {} \frac{\sigma _{j}^{-1}(B)-\big (\delta _j\sigma _{j}(B)\big )^{-1}}{(\delta _j\sigma _{j}(B))^{-1}}+ \delta _j\frac{\sigma _{j}(B)\sigma _{s+1}^{2}(B)}{\sigma _{j}(B)\sigma _{s}^{2}(B)}{\mathcal {O}} \Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big )\\= & {} \delta _j-1+\delta _j\Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2}{\mathcal {O}} \Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ) \end{aligned}$$

with probability not less than \(1-\Delta \).

On the other hand, using Theorem 4.4 and (4.17) again we have, for any \(1\le j\le k-s\),

$$\begin{aligned} \frac{\sigma _{j}(L_{22})-\sigma _{s+j}(A)}{\sigma _{s+j}(A)}= & {} \frac{\sigma _{j+s}(B)-\sigma _{j+s}(A)}{\sigma _{j+s}(A)}-\frac{\sigma _{j+s}(B)}{\sigma _{j+s}(A)} \frac{\sigma _j(L_{22})-\sigma _{j+s}(B)}{\sigma _{j+s}(B)}\\\le & {} \frac{\sigma _{s+j}(B)-\sigma _{s+j}(A)}{\sigma _{s+j}(A)}+\frac{\sigma _{s+j}(B)}{\sigma _{s+j}(A)} \Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2}{\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big )\\\le & {} \Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2} {\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ). \end{aligned}$$

\(\square \)

Finally, we have the following corollary from Theorem 4.10.

Corollary 4.15

Under the same assumptions as in Theorem 4.10, if \(s\ge k\), then

$$\begin{aligned} \begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)} \le \delta _j-1+\delta _j \Big (\frac{\sigma _{s+1}(B)}{\sigma _{s}(B)}\Big )^{2}{\mathcal {O}}\Big (\frac{l^{\frac{5}{2}} \Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ) \end{aligned} \end{aligned}$$

with probability not less than \(1-\Delta \) for all \(j=1,\ldots ,k\). In particular, if \(s=k\), then

$$\begin{aligned} \begin{aligned} \frac{\sigma _{j}^{-1}(L_{11})-\sigma _{j}^{-1}(A)}{\sigma _{j}^{-1}(A)} \le \delta _j-1+\delta _j\Big (\frac{\sigma _{k+1}(B)}{\sigma _{k}(B)}\Big )^{2} {\mathcal {O}}\Big (\frac{l^{\frac{5}{2}}\Vert R_{11}^{-1}R_{12}\Vert _{2}^{2}}{1-\rho _{1}^{2}}\Big ) \end{aligned} \end{aligned}$$

with probability not less than \(1-\Delta \) for all \(j=1,\ldots ,k\).

Remark 4.16

From Theorems 4.9 and 4.10 and Corollary 4.15, we see that if the numerical rank of A is well defined, i.e., \(\sigma _k(A)\gg \sigma _{k+1}(A)\) [46], then \(\delta _j\) tends to 1 for \(j=1,2,\ldots , k\), and the error bounds become sharper with high probability whenever there is a significant gap in the singular values of B.
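As a rough numerical illustration with hypothetical values \({\mathcal {C}}_{\alpha ,\gamma }=10\), \(q=1\) and \(\sigma _{k+1}(A)/\sigma _{j}(A)=10^{-1}\), the definition of \(\delta _j\) gives

$$\begin{aligned} \delta _j=\Big (1+10^{2}\cdot \big (10^{-1}\big )^{4\cdot 1+2}\Big )^{1/2}=\big (1+10^{-4}\big )^{1/2}\approx 1.00005, \end{aligned}$$

so the relative error bounds are then dominated by the remaining \({\mathcal {O}}\)-terms.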

5 Numerical Experiments

In this section, we report some numerical results on the performance of Algorithm 3.2 for computing low-rank quaternion matrix approximations. All the numerical experiments are carried out in MATLAB R2019b on a personal laptop with an Intel(R) Core i5-10210U CPU at 1.60 GHz and 8 GB of RAM. To improve efficiency, as described in Sect. 3.2, we implement Algorithm 3.2 via real structure-preserving operations. As noted in Remark 3.2, in Algorithm 3.2 we can compute an approximate orthonormal basis of the range of \(A^*\) (i.e., V in Step 7) via either the QHQR method or the QMGS method.

For Algorithms 3.1 and 3.2, the relative matrix approximation error is given by

$$\begin{aligned} E_f=\frac{\Vert A-Q(:,1:k)L(1:k,1:k)P(:,1:k)^{*}\Vert _F}{\Vert A\Vert _F} \end{aligned}$$

and the absolute and relative singular value approximation errors are measured by

$$\begin{aligned} |\sigma _{j}(A)-|l_{jj}|| \quad \text{ and }\quad \frac{|\sigma _{j}(A)-|l_{jj}||}{\sigma _{j}(A)},\quad \forall 1\le j\le k, \end{aligned}$$

respectively. We set the oversampling parameter \(p=5\) and \(q=0\) unless otherwise specified.
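To make these error measures concrete, the following MATLAB sketch indicates how they can be evaluated with the QTFM toolbox; the helper fro_norm, the variable names, and the use of QTFM's overloaded svd and abs are our assumptions, and Q, L, P denote the factors returned by Algorithm 3.1 or 3.2.

fro_norm = @(X) sqrt(sum(abs(X(:)).^2));   % Frobenius norm via elementwise quaternion moduli
sigmaA = svd(A);                           % reference singular values of A (QTFM's QSVD)
Ak  = Q(:,1:k) * L(1:k,1:k) * P(:,1:k)';   % rank-k approximation Q(:,1:k) L(1:k,1:k) P(:,1:k)^*
E_f = fro_norm(A - Ak) / fro_norm(A);      % relative matrix approximation error
abs_err = zeros(k,1);  rel_err = zeros(k,1);
for j = 1:k
    ljj = abs(L(j,j));                     % |l_jj| approximates sigma_j(A)
    abs_err(j) = abs(sigmaA(j) - ljj);
    rel_err(j) = abs_err(j) / sigmaA(j);
end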

We first consider a numerical example with multiple positive singular values in Example 5.1.

Example 5.1

We consider a synthetic quaternion data matrix \(A=U_n\Sigma V_n^*\in {\mathbb {Q}}^{n\times n}\), where the unitary matrices \(U_n, V_n\in {\mathbb {Q}}^{n\times n}\) are generated by the QSVD of an \(n\times n\) random quaternion matrix via the toolbox QTFM and \(\Sigma \) takes the form of

  • Polynomially decaying spectrum (pds):

    $$\begin{aligned} \Sigma ={\mathrm{diag}}(\underbrace{1,\ldots ,1}_{t},2^{-s},3^{-s},\ldots ,(n-t+1)^{-s}); \end{aligned}$$
  • Exponentially decaying spectrum (eds):

    $$\begin{aligned} \Sigma ={\mathrm{diag}}(\underbrace{1,\ldots ,1}_{t},2^{-s},2^{-2s},\ldots ,2^{-(n-t)s}), \end{aligned}$$

where \(t>0\) controls the numerical rank of the matrix and \(s>0\) controls the singular value decay rate. We report our numerical results for \(n=400\) (\(s=2, t=30\) in pds and \(s=0.25, t=30\) in eds) from several aspects below.
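A minimal QTFM-based sketch of this construction (for the pds spectrum; the eds case is analogous) is given below; filling the quaternion components with randn entries and using QTFM's overloaded svd to obtain the unitary factors are our assumptions about the setup.

n = 400;  t = 30;  s = 2;
X = quaternion(randn(n), randn(n), randn(n), randn(n));   % random quaternion matrix (QTFM)
[Un, ~, Vn] = svd(X);                                     % unitary factors from its QSVD
sigma_pds = [ones(1,t), (2:(n-t+1)).^(-s)];               % polynomially decaying spectrum
% sigma_eds = [ones(1,t), 2.^(-0.25*(1:(n-t)))];          % exponentially decaying spectrum (s = 0.25)
A = Un * diag(sigma_pds) * Vn';                           % synthetic test matrix A = U_n Sigma V_n^*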

5.1 Comparison of Running Time

Table 1 lists the running time (in seconds) of Algorithms 3.1 and 3.2 with different target ranks for one of the test matrices in Example 5.1. Since Algorithm 3.1 is a deterministic algorithm, its running time does not depend on the target rank; the low-rank approximation is obtained by truncation. We find from Table 1 that the running time of Algorithm 3.2 increases as the target rank k increases. We also see that Algorithm 3.2 is much faster than Algorithm 3.1.

Table 1 Running time (in seconds) for Example 5.1

5.2 Comparison of Matrix Approximation Error

Figure 1 shows the relative matrix approximation error of Algorithms 3.1 and 3.2 with different target ranks for one of the test matrices in Example 5.1. It is observed from Fig. 1(pds) that both Algorithm 3.1 and Algorithm 3.2 reduce the matrix approximation error as the target rank increases. We also see from Fig. 1(eds) that Algorithm 3.1 and Algorithm 3.2 with QHQR work effectively, while Algorithm 3.2 with QMGS does not work very well. Moreover, there is no significant difference between Algorithm 3.2 with QHQR and Algorithm 3.1 in terms of the matrix approximation error. Finally, the experimental results show that the QHQR approach is more stable than the QMGS process, especially for extremely ill-conditioned matrices.

Fig. 1 Relative matrix approximation error for Example 5.1

5.3 Comparison of Singular Value Approximation Error

Figures 2 and 3 display the singular value approximation errors of Algorithms 3.1 and 3.2 for one of the test matrices in Example 5.1, where "Index" runs from 1 to the target rank \(k=190\). From Figs. 2 and 3, we observe that Algorithm 3.2 reduces the singular value approximation error as significantly as Algorithm 3.1 in most cases, and even more in some cases. We also see from Fig. 3 that Algorithm 3.2 works much better than Algorithm 3.1 in terms of the relative singular value approximation error for most of the top 30 singular values of the test matrix. This is perhaps because the first \(t=30\) singular values of the test matrix are all equal to 1.

Fig. 2 Absolute singular value error for Example 5.1

Fig. 3 Relative error of top 30 singular values for Example 5.1

Next, we give a numerical example with distinct positive singular values.

Example 5.2

In this example, we construct a test matrix with distinct positive singular values \( A=U\Sigma V^*\in {\mathbb {Q}}^{m\times n} (m\ge n), \) where the unitary matrices \(U\in {\mathbb {Q}}^{m\times m}\) and \(V\in {\mathbb {Q}}^{n\times n}\) are generated by the QSVD of two random quaternion matrices using the toolbox QTFM, and \(\Sigma =[\Sigma _1,0]^T\) with \(\Sigma _1={\mathrm{diag}}(\sigma _1(A), \ldots , \sigma _n(A))\). Here, \(\sigma _1(A)=1\), \(\frac{\sigma _{i+1}(A)}{\sigma _i(A)}=0.9\) for \(i=1, \ldots , n-1\). We report our numerical results for \(m=600\) and \(n=400\).

Fig. 4 Absolute and relative singular value error for Example 5.2

Figure 4 shows the effect of using Algorithms 3.1 and 3.2 to approximate the singular values of the test matrix in Example 5.2. We observe from Fig. 4 that there is no significant difference between Algorithm 3.1 and Algorithm 3.2 in the singular value approximation error. Why Algorithm 3.2 works better than Algorithm 3.1 in the case of multiple positive singular values, however, needs further study.

5.4 Application in Image Processing

In this subsection, we apply Algorithm 3.2 to image compression.

Example 5.3

We consider the standard test image "Building" with \(300\times 300\) pixels, whose \((i,j)\) pixel is set to be

$$\begin{aligned} \mathbf{a}_{ij}=r_{ij}\mathbf{i}+g_{ij}\mathbf{j}+b_{ij}\mathbf{k}, \end{aligned}$$

where \(r_{ij}\), \(g_{ij}\) and \(b_{ij}\) are respectively the red, green and blue pixel values at location \((i,j)\).
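For illustration, assuming the image is stored in an RGB file (the file name building.png below is hypothetical), the pure quaternion pixel matrix can be formed with QTFM's quaternion constructor as follows.

img = double(imread('building.png')) / 255;   % 300 x 300 x 3 RGB image, scaled to [0,1]
R = img(:,:,1);  G = img(:,:,2);  B = img(:,:,3);
A = quaternion(R, G, B);                      % pure quaternion matrix a_ij = r_ij i + g_ij j + b_ij k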

Figure 5 displays the reconstructed images via Algorithm 3.1 and Algorithm 3.2 with QHQR for \(k=20, 40, 60, 80\). Figure 6 shows the reconstructed image quality for different target ranks, where we use the peak signal-to-noise ratio (PSNR) defined by

$$\begin{aligned} {\mathrm {PSNR}}=20\log _{10}\frac{\max (A)\sqrt{mn}}{\Vert A-{\widehat{A}}\Vert _F}, \end{aligned}$$

where \(A\in {{\mathbb {Q}}}^{m\times n}\) denotes the original image, \({\widehat{A}}\) denotes the reconstructed image, and \(\max (A)\) represents the maximum pixel value of the original image.
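Continuing the sketch above, the PSNR can be evaluated as follows, where Ahat denotes the reconstruction \(Q(:,1:k)L(1:k,1:k)P(:,1:k)^{*}\) and fro_norm is the same Frobenius-norm shorthand used earlier; the variable names are ours.

fro_norm = @(X) sqrt(sum(abs(X(:)).^2));      % Frobenius norm via elementwise quaternion moduli
[m, n] = size(A);                             % here m = n = 300
max_pix = max(img(:));                        % maximum pixel value of the original image
PSNR = 20*log10(max_pix*sqrt(m*n) / fro_norm(A - Ahat));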

We observe from Figs. 5 and 6 that the reconstructed image quality of Algorithm 3.2 is very close to that of Algorithm 3.1 under the same target rank, and the quality improves as the target rank increases. To illustrate the effect of the parameter q in Algorithm 3.2, Fig. 6 also shows the image quality reconstructed by Algorithm 3.2 for \(q = 0,1,2,3\). We see from Fig. 6 that the reconstructed image quality of Algorithm 3.2 with \(q=1,2\) or 3 is much better than that of Algorithm 3.1. We also see that \(q=1\) produces significantly better image quality than \(q = 0\), while there is no essential further improvement for \(q=2,3\). Therefore, to improve the effectiveness of our algorithm, we suggest \(q = 1\).

Fig. 5 Low-rank image reconstruction

Fig. 6 PSNR of reconstructed image

5.5 Application in Signal Denoising Problem

In this subsection, we apply Algorithm 3.2 to the signal denoising problem.

Example 5.4

[48] The Lorenz attractor was first introduced in the study of atmospheric turbulence and can be regarded as a three-dimensional nonlinear system. Specifically, the Lorenz system can be expressed as the following coupled differential equations:

$$\begin{aligned} \frac{\partial x}{\partial t}=\phi (y-x),\quad \frac{\partial y}{\partial t}=x(\psi -z)-y,\quad \frac{\partial z}{\partial t}=xy-\omega z, \end{aligned}$$

where \(\phi , \psi , \omega >0\). To obtain chaotic behavior of the Lorenz attractor, we set \(\phi =10\), \(\psi =28\) and \(\omega =8/3\). In our experiment, the Lorenz system is solved by the built-in function ode45 in MATLAB over the time interval [0, 40] with initial value (12, 4, 0), where \(x,y,z\in {\mathbb {R}}^{2001}\). For convenience, we take \(x_1=x_r(401:800)\), \(x_2=x_g(401:800)\), \(x_3=x_b(401:800)\) as the three true signals, where \(x_r\), \(x_g\) and \(x_b\) denote the three solution components of the Lorenz system. We add white Gaussian noise

$$\begin{aligned} y_1=\mathtt{awgn}(x_1,snr),\quad y_2=\mathtt{awgn}(x_2,snr),\quad y_3=\mathtt{awgn}(x_3,snr), \end{aligned}$$

by the built-in function awgn in MATLAB, where the signal-to-noise ratio is set to \(snr=5\). For a signal \(x=[x(1),x(2),\cdots ,x(N)]^T\), we can construct a Hankel matrix as follows:

$$\begin{aligned} X=\left[ \begin{array}{cccc} x(1) &{} x(2) &{} \cdots &{} x(s) \\ x(2) &{} x(3) &{} \cdots &{} x(s+1) \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ x(s) &{} x(s+1) &{} \cdots &{} x(N) \\ \end{array} \right] , \end{aligned}$$

where \(s=\lfloor N/2\rfloor \). In a similar way, we construct three Hankel matrices \(Y_1, Y_2, Y_3\) for the signals \(y_1\), \(y_2\) and \(y_3\), respectively. In this way, we obtain a pure quaternion noisy signal matrix

$$\begin{aligned} A=Y_1\mathbf{i}+Y_2\mathbf{j}+Y_3\mathbf{k}. \end{aligned}$$
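The whole construction can be sketched in MATLAB as follows; the right-hand-side function lorenz_rhs, the uniform time grid of 2001 points, and the use of the built-in hankel function are our assumptions, and awgn requires the Communications Toolbox.

lorenz_rhs = @(t,u) [10*(u(2)-u(1)); u(1)*(28-u(3))-u(2); u(1)*u(2)-(8/3)*u(3)];  % phi=10, psi=28, omega=8/3
[~, sol] = ode45(lorenz_rhs, linspace(0,40,2001), [12; 4; 0]);
xr = sol(:,1);  xg = sol(:,2);  xb = sol(:,3);              % solution components, each of length 2001
x1 = xr(401:800);  x2 = xg(401:800);  x3 = xb(401:800);     % true signals
snr = 5;
y1 = awgn(x1,snr);  y2 = awgn(x2,snr);  y3 = awgn(x3,snr);  % noisy signals ('measured' may be added to scale to the signal power)
N = numel(y1);  s = floor(N/2);
Y1 = hankel(y1(1:s), y1(s:N));                              % Hankel matrix of y1; Y2, Y3 analogously
Y2 = hankel(y2(1:s), y2(s:N));
Y3 = hankel(y3(1:s), y3(s:N));
A  = quaternion(Y1, Y2, Y3);                                % pure quaternion noisy signal matrix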
Fig. 7 Original signal and noisy signal

Fig. 8 Recovered signal

The original signal and the noisy signal are shown in Fig. 7(a). We see from Fig. 7(b) that, except for the few largest singular values, the remaining singular values decrease slowly. Therefore, we choose the target rank k such that the cumulative contribution ratio of the k largest singular values reaches 80%. As is well known, the noise can be removed effectively by truncating the SVD of the noisy signal matrix. Therefore, we denoise the signal by applying the truncated QSVD and Algorithm 3.2 with \(q=1\) to the noisy signal matrix A, and the recovered signals are shown in Fig. 8. We see from Fig. 8 that Algorithm 3.2 is nearly as effective as the truncated QSVD. However, as a randomized algorithm, Algorithm 3.2 is more economical.
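A possible realization of this rank selection and of the truncated-QSVD denoising step is sketched below, using QTFM's overloaded svd; the 80% threshold follows the text, the variable names are ours, and the recovery of the one-dimensional signals from the denoised Hankel matrices (e.g., by anti-diagonal averaging) is omitted.

sv = svd(A);                                    % singular values of the noisy quaternion matrix
k  = find(cumsum(sv)/sum(sv) >= 0.8, 1);        % smallest k whose cumulative contribution ratio reaches 80%
[U, S, V] = svd(A);                             % truncated QSVD denoising (Algorithm 3.2 with q = 1 is applied analogously)
Ahat = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';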

6 Conclusions

In this paper, we have presented a randomized quaternion QLP decomposition algorithm for computing a low-rank approximation to a quaternion data matrix, together with a matrix approximation error analysis of the proposed algorithm. To study the singular value approximation error, we first established convergence results for the quaternion QLP decomposition, where the provided bounds are sharper than those for the real QLP decomposition. Based on this new convergence analysis and quaternion probability theory, we also derived error bounds for the singular value approximation of the proposed randomized algorithm. These theoretical results confirm that the proposed randomized algorithm can approximate the prescribed largest singular values of the data matrix with high probability. Our numerical experiments verify the effectiveness of the algorithm for image compression and signal denoising.