Abstract
A greedy block extended Kaczmarz method is introduced for solving the least squares problem, where the greedy rule combines the maximum distance with relaxation parameters. To save the computational cost of the Moore–Penrose inverse, an average projection technique is used. The convergence theory of the greedy block extended Kaczmarz method is established, and an upper bound on the convergence rate is derived. Numerical experiments show that the proposed method is efficient and outperforms the randomized block extended Kaczmarz methods in terms of the number of iteration steps and computational time.
1 Introduction
Consider the solution of the least squares problem
\(\min \limits _{x\in \mathbb {R}^{n}}\Vert b-Ax\Vert _2^2, \qquad (1.1)\)
where \(A \in \mathbb {R}^{m \times n}\) and \( b \in \mathbb {R}^{m}\). This problem arises widely in scientific and engineering computing, for instance in image reconstruction [15], big data analysis [6] and optimization [16].
Iterative methods, particularly stochastic iterative methods, have recently attracted much attention for solving the least squares problem, since direct methods such as the QR decomposition and the singular value decomposition are usually expensive in memory and computation. One stochastic iterative method is the randomized extended Kaczmarz (REK) method [26], which was proved to converge exponentially in expectation to the least squares solution \(x_{LS}=A^{\dagger }b\) of (1.1). To accelerate the randomized extended Kaczmarz method, the randomized double block Kaczmarz (RDBK) method was introduced in [20] by selecting multiple rows and columns for projection. To save the computational cost of the Moore–Penrose inverse, the randomized extended average block Kaczmarz method [10] and the extended randomized multiple rows method [23] were presented and well studied. For more research on the randomized extended Kaczmarz method, we refer the reader to [3, 4, 9, 24].
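As a reference point, the REK iteration of [26] can be sketched as follows. This is a minimal illustration; the variable names and the iteration count are illustrative, not from the paper.

```python
import numpy as np

# A minimal sketch of the randomized extended Kaczmarz (REK) iteration:
# a column step drives z toward b_{R(A)^perp}, then a row step applies
# one Kaczmarz sweep to A x = b - z.
def rek(A, b, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    col_p = np.sum(A**2, axis=0)
    col_p = col_p / col_p.sum()            # column sampling probabilities
    row_p = np.sum(A**2, axis=1)
    row_p = row_p / row_p.sum()            # row sampling probabilities
    x, z = np.zeros(n), b.astype(float).copy()
    for _ in range(iters):
        j = rng.choice(n, p=col_p)         # column step
        z -= (A[:, j] @ z) / (A[:, j] @ A[:, j]) * A[:, j]
        i = rng.choice(m, p=row_p)         # row step
        x += (b[i] - z[i] - A[i, :] @ x) / (A[i, :] @ A[i, :]) * A[i, :]
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 8))
b = rng.standard_normal(60)                # inconsistent right-hand side
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
err = np.linalg.norm(rek(A, b) - x_ls)     # small after enough iterations
```

On a well-conditioned Gaussian problem such as this one, the iterate approaches \(x_{LS}\) up to machine precision.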
Greedy techniques, including maximizing the distance and the residual, were first considered for the Schwarz method in [18] and were shown to improve the efficiency of Kaczmarz methods [11]. A greedy randomized Kaczmarz method was proposed in [2], which combines the maximum distance and the average distance into a novel greedy strategy. Further, a different greedy randomized Kaczmarz method, based on the maximum distance and a relaxation parameter, was presented and studied in [21]. From a geometric point of view, a geometric probability randomized Kaczmarz method and its greedy version were established in [25].
To improve the performance of the randomized block extended Kaczmarz methods, a greedy block extended Kaczmarz method is proposed for solving the least squares problem, with an average block projection used to save computational cost. The convergence theory of the greedy block extended Kaczmarz method is established, and an upper bound on the convergence rate is derived and analyzed in detail. Numerical experiments show that the proposed method is efficient and outperforms the existing randomized block extended Kaczmarz methods.
The rest of the paper is organized as follows. In Sect. 2, the greedy block extended Kaczmarz method is presented and its convergence theory is established. Numerical experiments are provided in Sect. 3 to illustrate the efficiency and excellent performance of the proposed method. Finally, we conclude the paper with a brief summary in Sect. 4.
2 The Greedy Block Extended Kaczmarz Method
This section introduces the greedy block extended Kaczmarz method for solving the least squares problem and establishes its convergence theory.
In the past decade, a number of greedy rules have been proposed and studied, for instance, maximizing the distance, the residual or the geometric angle, and different greedy strategies usually lead to different block iterative methods [2, 21, 25].
In this paper, the greedy rule of maximum distance with relaxation parameters is utilized for both row and column projections. At the \((k+1)\)-th iteration, the column block \(\mathcal {J}_k\) and the row block \(\mathcal {I}_k\) are selected as follows:
where \(A^{(i)}, A_{(j)}\) denote the i-th row and the j-th column of A respectively, [m] represents the set \(\{1,2,\ldots ,m\}\), and
The condition \(\rho _z, \rho _x \in (0,1]\) guarantees that \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are non-empty, since the maximizing index always satisfies the threshold. Without pre-partitioning the rows and columns of A, the blocks \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are adaptive and consist of the larger entries of the distance vectors at each iteration.
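Since the displays defining \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are not reproduced above, the following sketch shows one plausible reading of the rule: an index is kept when its normalized squared distance reaches a fraction (the relaxation parameter) of the current maximum. The exact normalization in the paper may differ.

```python
import numpy as np

# Hedged sketch of a maximum-distance greedy rule with relaxation:
# keep every index whose normalized squared residual reaches a fraction
# (rho_z for columns, rho_x for rows) of the current maximum.
def greedy_columns(A, z, rho_z=0.5):
    d = (A.T @ z) ** 2 / np.sum(A**2, axis=0)   # squared distance per column
    return np.flatnonzero(d >= rho_z * d.max())

def greedy_rows(A, r, rho_x=0.5):
    d = r ** 2 / np.sum(A**2, axis=1)           # squared distance per row
    return np.flatnonzero(d >= rho_x * d.max())

A = np.array([[2.0, 1.0], [0.0, 1.0]])
J = greedy_columns(A, np.array([1.0, 1.0]), rho_z=0.6)   # only column 1 passes
I = greedy_rows(A, np.array([1.0, 2.0]), rho_x=0.5)      # only row 1 passes
```

Because the maximizing index always passes its own threshold whenever the relaxation parameter lies in \((0,1]\), the selected sets are non-empty.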
By combining the above greedy selection rule with the average block projection technique, the greedy block extended Kaczmarz method is proposed and described in detail in Algorithm 1.
In Algorithm 1, \(\tilde{\eta }_{k}\) and \(\eta _k\) are sparse residual vectors used to create two linear combinations of the rows of \(A_{:,\mathcal {J}_{k}}^T\) and of \(A_{\mathcal {I}_{k},:}\), respectively. These linear combinations serve as the directions of the column and row projections, so the computation of Moore–Penrose inverses is not required.
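Putting the greedy selection and the average projections together, one GBEK iteration can be sketched as follows. This is our reconstruction from the proof of Theorem 1 (Algorithm 1 itself is not reproduced here), so the ordering of steps and minor details are assumptions.

```python
import numpy as np

# Hedged reconstruction of a GBEK iteration: z is driven toward
# b_{R(A)^perp} by a column step, then x is updated by a row step on
# A x = b - z. No Moore-Penrose inverse is formed: each projection uses
# a single combined direction A @ eta_t (resp. A.T @ eta).
def gbek(A, b, rho_z=0.5, rho_x=0.5, iters=2000):
    m, n = A.shape
    col_norms = np.sum(A**2, axis=0)
    row_norms = np.sum(A**2, axis=1)
    x, z = np.zeros(n), b.astype(float).copy()
    for _ in range(iters):
        s = A.T @ z                                      # column-wise distances
        d = s**2 / col_norms
        eta_t = np.where(d >= rho_z * d.max(), s, 0.0)   # sparse residual
        Ae = A @ eta_t
        nz = Ae @ Ae
        if nz > 0:
            z -= (eta_t @ s) / nz * Ae                   # eta_t @ s = ||eta_t||^2
        r = b - z - A @ x                                # residual with updated z
        d = r**2 / row_norms
        eta = np.where(d >= rho_x * d.max(), r, 0.0)     # sparse residual
        Ate = A.T @ eta
        nz = Ate @ Ate
        if nz > 0:
            x += (eta @ r) / nz * Ate
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 6))
b = rng.standard_normal(50)
err = np.linalg.norm(gbek(A, b) - np.linalg.lstsq(A, b, rcond=None)[0])
```

Each projection is along one combined direction, so the per-iteration cost is dominated by a few matrix-vector products rather than a block pseudoinverse.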
Before discussion of the convergence property of the greedy block extended Kaczmarz method, the following useful lemma is introduced.
Lemma 1
([10]). Let \(A\in \mathbb {R}^{m\times n}\) with \({\text {rank}}(A) = r\). For any \(u\in R(A)\), it holds that \(\sigma _{r}^{2}(A)\Vert u\Vert _2^{2}\le \Vert A^{T}u\Vert _2^{2}\le \sigma _{1}^{2}(A)\Vert u\Vert _2^{2},\)
where \(\sigma _{1 }(A)\ge \sigma _{2 }(A)\ge \ldots \ge \sigma _{r }(A)>0\) denote all the nonzero singular values of A.
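Lemma 1 is the standard two-sided singular-value bound \(\sigma _{r}^{2}(A)\Vert u\Vert _2^{2}\le \Vert A^{T}u\Vert _2^{2}\le \sigma _{1}^{2}(A)\Vert u\Vert _2^{2}\) for \(u\in R(A)\); a quick numerical check on a rank-deficient matrix:

```python
import numpy as np

# Numerical check of sigma_r^2 ||u||^2 <= ||A^T u||^2 <= sigma_1^2 ||u||^2
# for a rank-deficient A and a vector u in R(A).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 8))  # rank 5 (a.s.)
u = A @ rng.standard_normal(8)                                  # u in R(A)
sig = np.linalg.svd(A, compute_uv=False)
sig = sig[sig > 1e-10]                                          # nonzero singular values
val = np.linalg.norm(A.T @ u) ** 2
lower = sig[-1] ** 2 * (u @ u)
upper = sig[0] ** 2 * (u @ u)
assert lower <= val <= upper
```

Note the lower bound requires \(u\in R(A)\): for \(u\) in the left null space of A, \(A^Tu=0\) while \(\Vert u\Vert _2>0\).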
Denote by \(R(A)^{\perp }\) the orthogonal complement of the column space of A and by \(b_{R(A)^{\perp }}\) the orthogonal projection of b onto \(R(A)^{\perp }\). The convergence theory of the sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by Algorithm 1 is established as follows.
Theorem 1
The sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by the GBEK method converges to \(z^*=b_{R(A)^{\perp }}\). Moreover, it holds that
where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\).
Proof
By subtracting \(z^*=b_{R(A)^{\perp }}\) from both sides of step 5 in Algorithm 1, we get
Let \(\tilde{P}_k:=\frac{A\tilde{\eta }_{k}\tilde{\eta }_{k}^TA^T}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\), which is an orthogonal projection. Since \(\tilde{P}_kz^*=0,\) it holds that
Let \(\tilde{e}^{(k)}=z^{(k)}-z^*\). Taking the norm of both sides of the above equality yields:
Note that \(\tilde{e}^{(0)} = z^{(0)}-z^* = AA^{\dagger }b \in R(A)\) and \(\tilde{P}_k\tilde{e}^{(k)}\in R(A)\), then it follows that \(\tilde{e}^{(k+1)}\in R(A)\). For the second term on the right side of (2.2), it holds that
where the first inequality holds because \(\Vert A\tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^T\tilde{\eta }_{k}\Vert _2^2\le \sigma _{1}^2(A_{:,\mathcal {J}_k})\Vert \tilde{\eta }_{k}\Vert _2^2\) and the second inequality holds because \(\Vert \tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^Tz^{(k)}\Vert _2^2\ge \tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2\). Note that
therefore, \(\Vert (A_{:,\mathcal {J}_{k-1}})^Tz^{(k)}\Vert _2^2=0\) and then
Thus,
Substituting (2.3) and (2.4) into (2.2) yields
From the fact \(\frac{\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}\ge 1\) and the definition \(\tilde{\phi }_{\min }:= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\), the recursive expression (2.1) is derived. \(\square \)
Remark 1
In the extended randomized multiple rows method, the expected decrease in mean squared error at the \((k+1)\)-th iteration is
which is obtained by taking the expectation over \(\mathcal {J}_k\) for the first equality of (2.3). Here \(\mathbb {E}\) denotes the expected value conditional on the first k iterations. For the greedy block extended Kaczmarz method, the error reduction is \(\frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\). It is obvious that
which indicates that the sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) in the greedy block extended Kaczmarz method converges at least as fast as that of the extended randomized multiple rows method.
The convergence analysis of \(\{x^{(k)}\}_{k=0}^{\infty }\) in the greedy block extended Kaczmarz method relies on the following lemma.
Lemma 2
([5]). Let \(c_1, c_2\) be real numbers such that \(c_1\in [0,1),\quad c_2\ge -1, \quad c_2-c_1=c_1c_2,\) then
By Theorem 1 and Lemma 2, the convergence property for the greedy block extended Kaczmarz method is constructed as follows.
Theorem 2
Assume \({\text {rank}}(A) = r \). The sequence \(\{x^{(k)}\}_{k=0}^{\infty }\) with the initial guess \(x^{(0)}=0\) generated by the GBEK method converges to the least squares solution \(x_{LS}=A^{\dagger }b\). Moreover, the solution error satisfies
where
with constants \(c_1, c_2 \) from Lemma 2, \( \phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\) and \( \tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2. \)
Proof
Subtracting \(x_{LS}\) from both sides of step 9 in Algorithm 1 leads to
For simplicity, let \(e^{(k)}=x^{(k)}-x_{LS}\) and \(P_k=\frac{A^T\eta _k\eta _k^TA}{\Vert A^T\eta _k\Vert _2^2}\). Then we have
Observe that \(e^{(k)}-P_ke^{(k)}\) is perpendicular to \(\frac{A^T\eta _k\eta _k^T\tilde{e}^{(k+1)}}{\Vert A^T\eta _k\Vert _2^2}\) and \(P_k\) is an orthogonal projection. By taking the norm of both sides of the above equality and using the Pythagorean Theorem, it yields
Since \(e^{(0)} = x^{(0)}-x_{LS}=A^{\dagger }b\in R(A^T), \frac{\eta _k^T(b-z^{(k+1)}-Ax^{(k)})}{\Vert A^T\eta _k\Vert _2^2}A^T\eta _k\in R(A^T)\), it holds \(e^{(k+1)}\in R(A^T)\) by induction. It follows that
For the term \(\Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2\),
where the first inequality holds because of Lemma 2. Therefore equation (2.5) becomes
where the second inequality holds because \(\frac{\Vert A_{\mathcal {I}_{k},:}\Vert _F^2}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}\ge 1\) and \(\sigma _{1}^2(A_{\mathcal {I}_{k},:})\ge \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). For the lower bound of \(\varepsilon _k\), it holds that
Let \(\phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). With the lower bound of \(\varepsilon _k\), the inequality (2.7) is reformulated as
According to Theorem 1,
where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\). For simplicity, let
then the inequality (2.8) is rewritten as
where the third inequality holds because of \(\tilde{e}^{(0)}=z^{(0)}-b_{R(A)^{\perp }}=b_{R(A)}\), and the last inequality holds because of \(x^{(0)}=0\) and \(\Vert b_{R(A)}\Vert _2^2\le \sigma _{1}^2(A)\Vert x_{LS}\Vert _2^2\). This completes the proof. \(\square \)
3 Numerical Experiments
In this section, numerical examples are presented to show the efficiency of the greedy block extended Kaczmarz (GBEK) method compared with the randomized double block Kaczmarz (RDBK) method, the randomized extended average block Kaczmarz (REABK) method, and the extended randomized multiple rows (ERMR) method.
The inconsistent system is \(Ax + \varepsilon = b\), where \(\varepsilon \) is a noise vector whose entries are drawn from a normal distribution, scaled so that \(\Vert \varepsilon \Vert _2 = 0.01\times \Vert Ax\Vert _2\). The number of iteration steps (denoted “IT”) and the computational time in seconds (denoted “CPU”) are used for evaluation. The row blocks \(\left\{ \mathcal {I}_i\right\} _{i=1}^s\) and the column blocks \(\left\{ \mathcal {J}_j\right\} _{j=1}^t\) of the RDBK, REABK and ERMR methods are partitioned as follows:
and
where \(\tau _r\) and \(\tau _c\) are block sizes for the row and column partitions respectively. To ensure a fair comparison, it is necessary to use the same block size for all four methods. This is achieved by initially applying the GBEK method to get the average sizes of the row and column blocks, then utilizing these sizes to partition the rows and columns for the RDBK, REABK and ERMR methods.
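As a sketch of this setup (the dimensions and block sizes below are illustrative placeholders, since the paper derives its block sizes from an initial GBEK run), the noisy right-hand side and uniform contiguous partitions can be generated as:

```python
import numpy as np

# Illustrative setup: Gaussian noise rescaled so that ||eps||_2 = 0.01 ||A x||_2,
# and contiguous row/column blocks of sizes tau_r and tau_c.
rng = np.random.default_rng(0)
m, n, tau_r, tau_c = 1000, 100, 30, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
eps = rng.standard_normal(m)
eps *= 0.01 * np.linalg.norm(A @ x_true) / np.linalg.norm(eps)
b = A @ x_true + eps

row_blocks = [np.arange(i, min(i + tau_r, m)) for i in range(0, m, tau_r)]
col_blocks = [np.arange(j, min(j + tau_c, n)) for j in range(0, n, tau_c)]
```

The partitions cover every row and column exactly once, with a possibly smaller final block when the block size does not divide the dimension.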
All the methods are started from the initial vectors \(x^{(0)}=0\) and \( z^{(0)}=b\) and stopped if the relative solution error (RSE) satisfies
or the number of iteration steps exceeds 50000. To compare the difference in computational time between the proposed method and other methods, define the following speed-ups:
Example 1
Apply the GBEK method with different parameters \(\rho _x\) and \(\rho _z\) to solve the problem, where A is either a random Gaussian matrix or a sparse matrix from [8].
The influence of parameters \(\rho _x\) and \( \rho _z\) on the efficiency of the GBEK method is firstly explored in Example 1.
In Fig. 1, the curves of computational time versus \(\rho _z\) with fixed \(\rho _x\) for the GBEK method are presented for two different matrices. For \(A\in \mathbb {R}^{1000\times 100}\) with \({\text {cond}}(A)=50\), it is observed that the computing time first decreases and then increases as \(\rho _z\) increases with \(\rho _x\) fixed. A similar phenomenon is observed for \(A={\text {rel5}}\). For the rest of the numerical examples, we set the parameters \(\rho _x= \rho _z=0.5\) in the GBEK method.
Example 2
The coefficient matrix A is an overdetermined random Gaussian matrix.
In Table 1, the number of iteration steps, computational time and speed-ups for the RDBK, REABK, ERMR and GBEK methods for solving Example 2 are presented respectively.
From Table 1, it is clear that the GBEK method significantly reduces the number of iteration steps and the computing time compared with the RDBK, REABK and ERMR methods. In particular, the GBEK method shows a clear advantage over the ERMR method, with a maximum \({\text {speed-up}}_3\) of 12.9021. Since the GBEK method and the ERMR method employ the same iterative format, the likely reason for the superior performance of the GBEK method is the use of the greedy block criterion.
Example 3
The coefficient matrix A is an underdetermined random Gaussian matrix.
The numerical results of Example 3 are listed in Table 2. When the coefficient matrix A is underdetermined, the GBEK method outperforms the RDBK, REABK and ERMR methods in terms of both iteration counts and computing time. The GBEK method exhibits the lowest iteration counts and the shortest computational time to achieve the desired accuracy.
Example 4
The matrix A is taken from the SuiteSparse Matrix Collection [8].
For solving Example 4, the numbers of iteration steps, the computational time and the \({\text {speed-up}}\)s for the RDBK, REABK, ERMR and GBEK methods are provided in Table 3. All the coefficient matrices are sparse and rank-deficient, with different sizes, densities, and condition numbers. Here the density of A is defined as the ratio of the number of nonzero entries of A to the total number of entries mn, which describes the sparsity of A.
From Table 3, it is seen that the GBEK method outperforms the RDBK, REABK and ERMR methods in terms of the number of iteration steps and computational time. Furthermore, in Table 3, the maximum values of \({\text {speed-up}}_1\) and \({\text {speed-up}}_2\) are 55.3191 and 13.7952 respectively, which further confirms the superiority of the GBEK method for solving large sparse least squares problems.
The curves of the relative solution error versus the iteration counts for the RDBK, REABK, ERMR and GBEK methods for different matrices are shown in Fig. 2. It is clear that the relative solution error of the GBEK method decreases the fastest as the number of iteration steps increases for these three examples.
Example 5
Consider solving the X-ray computed tomography problem in AIR Tools II [14]. The size of the matrix A is set to be \(15300\times 3600\).
In Example 5, the effectiveness of the RDBK, REABK, ERMR and GBEK methods is evaluated by the peak signal-to-noise ratio (PSNR), a widely used metric in image processing for measuring the similarity between two images. A higher PSNR value indicates better image quality. All methods were run with the same number of iteration steps.
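For reference, the usual PSNR definition, with MAX taken as the peak value of the reference image (the paper's exact convention is assumed to match), is:

```python
import numpy as np

# PSNR = 10 log10(MAX^2 / MSE); higher means the reconstruction is closer
# to the reference image.
def psnr(ref, rec):
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

ref = np.ones((4, 4))
rec = ref - 0.1          # uniform error of 0.1 -> MSE = 0.01
value = psnr(ref, rec)   # 10 * log10(1 / 0.01) = 20 dB
```

Note that PSNR is undefined for a perfect reconstruction (MSE = 0), which is why comparisons are made at a fixed iteration count.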
The original image and the approximate images recovered by the four methods are given in Fig. 3. It is obvious that the image reconstructed by the GBEK method is the best and attains the highest PSNR value of 34.9177.
4 Conclusions
A greedy block extended Kaczmarz method is proposed for solving least squares problems. A convergence theory is established and a linear convergence rate is derived. Numerical experiments show that the proposed method performs better than the randomized block extended Kaczmarz methods in terms of both the number of iteration steps and the computational time.
References
Arioli, M., Duff, I.S.: Preconditioning linear least-squares problems by identifying a basis matrix. SIAM J. Sci. Comput. 37(5), S544–S561 (2015)
Bai, Z.-Z., Wu, W.-T.: On greedy randomized Kaczmarz method for solving large sparse linear systems. SIAM J. Sci. Comput. 40(1), A592–A606 (2018)
Bai, Z.-Z., Wu, W.-T.: On partially randomized extended Kaczmarz method for solving large sparse overdetermined inconsistent linear systems. Linear Algebra Appl. 578, 225–250 (2019)
Bai, Z.-Z., Wu, W.-T.: On greedy randomized augmented Kaczmarz method for solving large sparse inconsistent linear systems. SIAM J. Sci. Comput. 43(6), A3892–A3911 (2021)
Bao, W., Lv, Z., Zhang, F., Li, W.: A class of residual-based extended Kaczmarz methods for solving inconsistent linear systems. J. Comput. Appl. Math. 416, 114529 (2022)
Björck, Å.: Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996)
Chen, J.-Q., Huang, Z.-D.: On a fast deterministic block Kaczmarz method for solving large-scale linear systems. Numer. Algorithms 89(3), 1007–1029 (2022)
Davis, T.A., Hu, Y.: The University of Florida sparse matrix collection. ACM Trans. Math. Softw. (TOMS) 38(1), 1–25 (2011)
Du, K.: Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss–Seidel algorithms. Numer. Linear Algebra Appl. 26(3), e2233 (2019)
Du, K., Si, W.-T., Sun, X.-H.: Randomized extended average block Kaczmarz for solving least squares. SIAM J. Sci. Comput. 42(6), A3541–A3559 (2020)
Gower, R.M., Molitor, D., Moorman, J., Needell, D.: On adaptive sketch-and-project for solving linear systems. SIAM J. Matrix Anal. Appl. 42(2), 954–989 (2021)
Haddock, J., Ma, A.: Greed works: An improved analysis of sampling Kaczmarz-Motzkin. SIAM J. Math. Data Sci. 3(1), 342–368 (2021)
Haddock, J., Needell, D.: On Motzkin’s method for inconsistent linear systems. BIT Numer. Math. 59, 387–401 (2019)
Hansen, P.C., Jørgensen, J.S.: AIR Tools II: algebraic iterative reconstruction methods, improved implementation. Numer. Algorithms 79(1), 107–137 (2018)
Herman, G.T.: Fundamentals of Computerized Tomography: Image Reconstruction from Projections. Springer, New York (2009)
Hoyos-Idrobo, A., Weiss, P., Massire, A., Amadon, A., Boulant, N.: On variant strategies to solve the magnitude least squares optimization problem in parallel transmission pulse design and under strict SAR and power constraints. IEEE Trans. Med. Imaging 33(3), 739–748 (2013)
Ma, A., Needell, D., Ramdas, A.: Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl. 36(4), 1590–1604 (2015)
McCormick, S.F.: The method of Kaczmarz and row orthogonalization for solving linear equations and least squares problems in Hilbert space. Indiana Univ. Math. J. 26(6), 1137–1150 (1977)
Necoara, I.: Faster randomized block Kaczmarz algorithms. SIAM J. Matrix Anal. Appl. 40(4), 1425–1452 (2019)
Needell, D., Zhao, R., Zouzias, A.: Randomized block Kaczmarz method with projection for solving least squares. Linear Algebra Appl. 484, 322–343 (2015)
Niu, Y.-Q., Zheng, B.: A greedy block Kaczmarz algorithm for solving large-scale linear systems. Appl. Math. Lett. 104, 106294 (2020)
Strohmer, T., Vershynin, R.: A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl. 15(2), 262 (2009)
Wu, N.-C., Liu, C., Wang, Y., Zuo, Q.: On the extended randomized multiple row method for solving linear least-squares problems. arXiv preprint arXiv:2210.03478 (2022)
Wu, W.-T.: On two-subspace randomized extended Kaczmarz method for solving large linear least-squares problems. Numer. Algorithms 89(1), 1–31 (2022)
Yang, X.: A geometric probability randomized Kaczmarz method for large scale linear systems. Appl. Numer. Math. 164, 139–160 (2021)
Zouzias, A., Freris, N.M.: Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl. 34(2), 773–793 (2013)
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Communicated by Rosihan M. Ali.
Ke, N.H., Li, R., Yin, J.F.: Greedy Block Extended Kaczmarz Method for Solving the Least Squares Problems. Bull. Malays. Math. Sci. Soc. 47, 148 (2024). https://doi.org/10.1007/s40840-024-01739-8