1 Introduction

Consider the solution of the least squares problem

$$\begin{aligned} \min \limits _{x\in \mathbb {R}^{n}} \Vert b-Ax\Vert _2^2, \end{aligned}$$
(1.1)

where \(A \in \mathbb {R}^{m \times n}\) and \( b \in \mathbb {R}^{m}\). Such problems arise widely in scientific and engineering computing, for example in image reconstruction [15], big data analysis [6] and optimization [16].

Iterative methods, particularly stochastic iterative methods, have recently attracted much attention for solving the least squares problem, since direct methods such as the QR decomposition and the singular value decomposition are usually expensive in memory and computation. One stochastic iterative method is the randomized extended Kaczmarz (REK) method [26], which was proved to converge exponentially in expectation to the least squares solution \(x_{LS}=A^{\dagger }b\) of (1.1). To accelerate the randomized extended Kaczmarz method, the randomized double block Kaczmarz (RDBK) method was introduced in [20], which selects multiple rows and columns for projection. To avoid the computational cost of Moore-Penrose inverses, the randomized extended average block Kaczmarz method [10] and the extended randomized multiple rows method [23] were proposed and analyzed. For more research on the randomized extended Kaczmarz method, we refer the reader to [3, 4, 9, 24].

Greedy techniques, including maximizing the distance and the residual, were first considered for the Schwarz method in [18] and were shown to improve the efficiency of Kaczmarz methods [11]. A greedy randomized Kaczmarz method was proposed in [2], which combines the maximum distance and the average distance to construct a novel greedy strategy. Furthermore, a different greedy randomized Kaczmarz method based on the maximum distance and a relaxation parameter was presented and studied in [21]. From a geometric point of view, a geometric probability randomized Kaczmarz method and its greedy version were established in [25].

To improve the performance of randomized block extended Kaczmarz methods, a greedy block extended Kaczmarz method is proposed for solving the least squares problem, where the average block projection is used to save computational cost. The convergence theory of the greedy block extended Kaczmarz method is established, and an upper bound for the convergence rate is derived and analyzed in detail. Numerical experiments show that the proposed method is efficient and outperforms existing randomized block extended Kaczmarz methods.

The rest of the paper is organized as follows. In Sect. 2, the greedy block extended Kaczmarz method is presented and its convergence theory is established. Numerical experiments are provided in Sect. 3 to illustrate the efficiency and excellent performance of the proposed method. Finally we conclude the paper with a brief summary in Sect. 4.

2 The Greedy Block Extended Kaczmarz Method

This section introduces the greedy block extended Kaczmarz method for solving the least squares problem and establishes its convergence theory.

In the past decade, a number of greedy rules have been proposed and studied, for instance, maximizing the distance, the residual or the geometric angle, and different greedy strategies usually lead to different block iterative methods [2, 21, 25].

In this paper, the maximum-distance greedy rule with a relaxation parameter is utilized for both the row and column projections. At the \((k+1)\)-th iteration, the column block \(\mathcal {J}_k\) and the row block \(\mathcal {I}_k\) are selected as follows:

$$ \begin{aligned} \mathcal {J}_{k}&=\{j\in [n]:|A_{(j)}^Tz^{(k)}|^2\ge \tilde{\varepsilon }_k\Vert A_{(j)}\Vert _2^2\},\\ \mathcal {I}_{k}&=\{i\in [m]:|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2\ge \varepsilon _k\Vert A^{(i)}\Vert _2^2\}, \end{aligned} $$

where \(A^{(i)}\) and \(A_{(j)}\) denote the i-th row and the j-th column of A, respectively, [m] represents the set \(\{1,2,\ldots ,m\}\), and

$$\begin{aligned} \tilde{\varepsilon }_k=\rho _z\max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} ,\quad \varepsilon _k=\rho _x\max \limits _{i\in [m]}\left\{ \frac{|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2}{\Vert A^{(i)}\Vert _2^2}\right\} ,\quad \rho _z, \rho _x \in (0,1]. \end{aligned}$$

The condition \(\rho _z, \rho _x \in (0,1]\) guarantees that \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are non-empty. Without pre-partitioning the rows and columns of A, the blocks \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are adaptive and consist of the indices corresponding to the larger entries of the distance vectors at each iteration.
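As an illustration, the two selection rules can be sketched in NumPy as follows; the function and variable names are ours (not from Algorithm 1), and A is assumed to have no zero rows or columns.

```python
# A minimal NumPy sketch of the greedy block selection rules above; the
# function and variable names are illustrative, not taken from the paper,
# and A is assumed to have no zero rows or columns.
import numpy as np

def select_column_block(A, z, rho_z=0.5):
    """J_k: columns j with |A_(j)^T z|^2 >= eps_tilde_k * ||A_(j)||_2^2."""
    col_norms2 = np.sum(A * A, axis=0)             # ||A_(j)||_2^2 for each column j
    s = A.T @ z                                    # entries A_(j)^T z
    eps_tilde = rho_z * np.max(s**2 / col_norms2)  # relaxed maximum distance
    return np.where(s**2 >= eps_tilde * col_norms2)[0]

def select_row_block(A, b, x, z_new, rho_x=0.5):
    """I_k: rows i with |b_i - z_i^(k+1) - (A^(i))^T x|^2 >= eps_k * ||A^(i)||_2^2."""
    row_norms2 = np.sum(A * A, axis=1)             # ||A^(i)||_2^2 for each row i
    r = b - z_new - A @ x                          # residual entries b_i - z_i^(k+1) - (A^(i))^T x
    eps = rho_x * np.max(r**2 / row_norms2)        # relaxed maximum distance
    return np.where(r**2 >= eps * row_norms2)[0]
```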

By combining the above greedy selection rule with the average block projection technique, the greedy block extended Kaczmarz method is proposed and described in detail in Algorithm 1.

Algorithm 1 (The greedy block extended Kaczmarz method)

In Algorithm 1, \(\tilde{\eta }_{k}\) and \(\eta _k\) are sparse residual vectors used to form linear combinations of the rows of \(A_{:,\mathcal {J}_{k}}^T\) and \(A_{\mathcal {I}_{k},:}\), respectively. These linear combinations serve as the projection directions, so the computation of Moore-Penrose inverses is not required.
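To make the projection steps concrete, the following is a minimal sketch of the GBEK iteration as reconstructed from steps 5 and 9 of Algorithm 1 (which appear explicitly in the proofs below), with \(\tilde{\eta }_{k}\) and \(\eta _k\) formed as just described; it is an illustrative reading of the algorithm, not the authors' reference implementation.

```python
# A minimal sketch of the GBEK iteration, reconstructed from steps 5 and 9
# of Algorithm 1 as they appear in the proofs of Theorems 1 and 2; the
# sparse residual vectors eta_t (~eta_k) and eta (eta_k) are built as
# described in the text. Illustrative only.
import numpy as np

def gbek(A, b, rho_x=0.5, rho_z=0.5, max_iter=1000):
    m, n = A.shape
    x, z = np.zeros(n), b.astype(float)           # x^(0) = 0, z^(0) = b
    col_norms2 = np.sum(A * A, axis=0)
    row_norms2 = np.sum(A * A, axis=1)
    for _ in range(max_iter):
        # Column part: greedy block J_k and averaged projection of z (step 5).
        s = A.T @ z
        Jk = s**2 >= rho_z * np.max(s**2 / col_norms2) * col_norms2
        eta_t = np.where(Jk, s, 0.0)              # sparse residual vector ~eta_k
        v = A @ eta_t
        if v @ v > 0:
            z = z - (eta_t @ s) / (v @ v) * v
        # Row part: greedy block I_k and averaged projection of x (step 9).
        r = b - z - A @ x
        Ik = r**2 >= rho_x * np.max(r**2 / row_norms2) * row_norms2
        eta = np.where(Ik, r, 0.0)                # sparse residual vector eta_k
        w = A.T @ eta
        if w @ w > 0:
            x = x + (eta @ r) / (w @ w) * w
    return x
```

On a small random test one can compare the output of `gbek(A, b)` with `np.linalg.lstsq(A, b, rcond=None)[0]` to check that the iterates approach \(x_{LS}\).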

Before discussing the convergence properties of the greedy block extended Kaczmarz method, the following useful lemma is introduced.

Lemma 1

([10]). Let \(A\in \mathbb {R}^{m\times n}\) with \({\text {rank}}(A) = r\), and let \(R(A)\) denote the column space of A. For any \(u\in R(A)\), it holds that

$$ \sigma _1^2(A)\Vert u\Vert _2^2 \ge \Vert A^Tu\Vert _2^2 \ge \sigma _r^2(A)\Vert u\Vert _2^2, $$

where \(\sigma _{1 }(A)\ge \sigma _{2 }(A)\ge \ldots \ge \sigma _{r }(A)>0\) denote all the nonzero singular values of A.
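A quick numerical sanity check of Lemma 1 (illustrative; the rank-deficient test matrix below is our own choice):

```python
# Numerical sanity check of Lemma 1 on a rank-deficient random matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 20))   # rank r = 8
u = A @ rng.standard_normal(20)                                   # u in R(A)
sv = np.linalg.svd(A, compute_uv=False)
sigma1, sigma_r = sv[0], sv[sv > 1e-10 * sv[0]][-1]               # largest / smallest nonzero
lhs = np.linalg.norm(A.T @ u) ** 2
assert sigma_r**2 * np.linalg.norm(u)**2 <= lhs <= sigma1**2 * np.linalg.norm(u)**2
```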

Denote \(R(A)^{\perp }\) as the orthogonal complement of the column space of A and \(b_{R(A)^{\perp }}\) as the orthogonal projection of b onto \(R(A)^{\perp }\). The convergence theory of the sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by Algorithm 1 is established as follows.

Theorem 1

The sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by the GBEK method converges to \(z^*=b_{R(A)^{\perp }}\). Moreover, it holds that

$$\begin{aligned} \Vert z^{(k+1)}-z^*\Vert _2^2\le \left( 1-\frac{\rho _z\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}\right) ^{k+1}\Vert z^{(0)}-z^*\Vert _2^2, \end{aligned}$$
(2.1)

where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\).

Proof

By subtracting \(z^*=b_{R(A)^{\perp }}\) from both sides of step 5 in Algorithm 1, we get

$$ z^{(k+1)}-z^*=z^{(k)}-z^*-\frac{\tilde{\eta }_{k}^TA^Tz^{(k)}}{\Vert A\tilde{\eta }_{k}\Vert _2^2}A\tilde{\eta }_{k}. $$

Let \(\tilde{P}_k:=\frac{A\tilde{\eta }_{k}\tilde{\eta }_{k}^TA^T}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\), which is an orthogonal projection onto the span of \(A\tilde{\eta }_{k}\). Since \(z^*\in R(A)^{\perp }\) and \(A\tilde{\eta }_{k}\in R(A)\), it holds that \(\tilde{P}_kz^*=0\) and hence

$$ z^{(k+1)}-z^*=z^{(k)}-z^*-\tilde{P}_k(z^{(k)}-z^*). $$

Let \(\tilde{e}^{(k)}=z^{(k)}-z^*\). Taking the squared norm of both sides of the above equality and using the orthogonality of \(\tilde{P}_k\) yields

$$\begin{aligned} \Vert \tilde{e}^{(k+1)}\Vert _2^2=\Vert \tilde{e}^{(k)}\Vert _2^2-\Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2. \end{aligned}$$
(2.2)

Note that \(\tilde{e}^{(0)} = z^{(0)}-z^* = b-b_{R(A)^{\perp }} = AA^{\dagger }b \in R(A)\) and \(\tilde{P}_k\tilde{e}^{(k)}\in R(A)\); it then follows by induction that \(\tilde{e}^{(k+1)}\in R(A)\) for all k. For the second term on the right side of (2.2), it holds that

$$\begin{aligned} \Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2=\frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\ge \frac{\Vert (A_{:,\mathcal {J}_k})^Tz^{(k)}\Vert _2^4}{\sigma _{1}^2(A_{:,\mathcal {J}_k})\Vert \tilde{\eta }_{k}\Vert _2^2} \ge \frac{\tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}, \end{aligned}$$
(2.3)

where the first inequality holds because \(\Vert A\tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^T\tilde{\eta }_{k}\Vert _2^2\le \sigma _{1}^2(A_{:,\mathcal {J}_k})\Vert \tilde{\eta }_{k}\Vert _2^2\) and the second inequality holds because \(\Vert \tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^Tz^{(k)}\Vert _2^2\ge \tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2\). Note that

$$ \tilde{\eta }_{k-1}^TA^Tz^{(k)}=\tilde{\eta }_{k-1}^TA^T\left( z^{(k-1)}-\frac{\tilde{\eta }_{k-1}^TA^Tz^{(k-1)}}{\Vert A\tilde{\eta }_{k-1}\Vert _2^2}A\tilde{\eta }_{k-1}\right) =0, $$

therefore, \(\Vert (A_{:,\mathcal {J}_{k-1}})^Tz^{(k)}\Vert _2^2=0\) and then

$$ \begin{aligned} \Vert A^Tz^{(k)}\Vert _2^2&=\sum \limits _{j\in [n]\backslash \mathcal {J}_{k-1}}\frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\Vert A_{(j)}\Vert _2^2 \\&\le \max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} (\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2). \end{aligned} $$

Thus, by Lemma 1 and the fact that \(A^Tz^{(k)}=A^T\tilde{e}^{(k)}\) (since \(z^*\in R(A)^{\perp }\)),

$$\begin{aligned} \tilde{\varepsilon }_k=\rho _z\max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} \ge \rho _z\frac{\Vert A^Tz^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}\ge \rho _z \frac{\sigma _{r}^2(A)\Vert \tilde{e}^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}. \end{aligned}$$
(2.4)

Substituting (2.3) and (2.4) into (2.2) yields

$$ \Vert \tilde{e}^{(k+1)}\Vert _2^2\le \Vert \tilde{e}^{(k)}\Vert _2^2-\frac{\tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})} \le \left( 1-\rho _z\frac{\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}\right) \Vert \tilde{e}^{(k)}\Vert _2^2. $$

From the facts \(\frac{\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}\ge 1\) and \(\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2\ge \tilde{\phi }_{\min }:= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\) (the block \(\mathcal {J}_{k-1}\) is non-empty), the recursive expression (2.1) is derived. \(\square \)

Remark 1

In the extended randomized multiple rows method, the expected decrease in mean squared error at the \((k+1)\)-th iteration is

$$ \mathbb {E}\Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2 = \frac{\Vert A^Tz^{(k)}\Vert _2^4}{\Vert AA^Tz^{(k)}\Vert _2^2}, $$

which is obtained by taking the expectation over \(\mathcal {J}_k\) for the first equality of (2.3). Here \(\mathbb {E}\) denotes the expected value conditional on the first k iterations. For the greedy block extended Kaczmarz method, the error reduction is \(\frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\). It is obvious that

$$ \frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2} = \frac{\sum \limits _{j\in \mathcal {J}_{k}}(A^Tz^{(k)})_j^4}{\sum \limits _{j\in \mathcal {J}_{k}}\Vert A_{(j)}\Vert _2^2(A^Tz^{(k)})_j^2}\ge \frac{\sum \limits _{j\in [n]}(A^Tz^{(k)})_j^4}{\sum \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2(A^Tz^{(k)})_j^2}= \frac{\Vert A^Tz^{(k)}\Vert _2^4}{\Vert AA^Tz^{(k)}\Vert _2^2}, $$

which indicates that the per-iteration error reduction of \(\{z^{(k)}\}_{k=0}^{\infty }\) in the greedy block extended Kaczmarz method is at least as large as the expected reduction in the extended randomized multiple rows method.

The convergence analysis of \(\{x^{(k)}\}_{k=0}^{\infty }\) generated by the greedy block extended Kaczmarz method relies on the following lemma.

Lemma 2

([5]). Let \(c_1, c_2\) be real numbers such that \(c_1\in [0,1)\), \(c_2\ge -1\) and \(c_2-c_1=c_1c_2\). Then

$$ (r_1+r_2)^2\ge c_1 r_1^2-c_2 r_2^2,\quad \forall \ r_1, r_2\in \mathbb {R}. $$
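For instance, choosing \(c_1=1/2\) forces \(c_2=c_1/(1-c_1)=1\); the following short check verifies the inequality numerically for this choice (illustrative only):

```python
# Quick check of Lemma 2: choose c1 in [0,1), set c2 = c1/(1-c1) so that
# c2 - c1 = c1*c2, and verify (r1+r2)^2 >= c1*r1^2 - c2*r2^2 on random data.
import numpy as np

rng = np.random.default_rng(1)
c1 = 0.5
c2 = c1 / (1.0 - c1)                                   # c2 = 1 here
r1, r2 = rng.standard_normal(2)
assert (r1 + r2)**2 >= c1 * r1**2 - c2 * r2**2 - 1e-12
```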

By Theorem 1 and Lemma 2, the convergence property of the greedy block extended Kaczmarz method is established as follows.

Theorem 2

Assume \({\text {rank}}(A) = r \). The sequence \(\{x^{(k)}\}_{k=0}^{\infty }\) with the initial guess \(x^{(0)}=0\) generated by the GBEK method converges to the least squares solution \(x_{LS}=A^{\dagger }b\). Moreover, the solution error satisfies

$$ \Vert x^{(k+1)}-x_{LS}\Vert _2^2\le \max \{\alpha _x,\alpha _z\}^{k+1}\left( 1+(k+1)\beta \sigma _{1}^2(A)\right) \Vert x^{(0)}-x_{LS}\Vert _2^2, $$

where

$$ \alpha _x:=1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}, \alpha _z:=1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}, \beta :=\frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}}, $$

with constants \(c_1, c_2 \) from Lemma 2, \( \phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\) and \( \tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2. \)

Proof

Subtracting \(x_{LS}\) from both sides of step 9 in Algorithm 1 leads to

$$ x^{(k+1)}-x_{LS}=x^{(k)}-x_{LS}+\frac{\eta _k^T(b-z^{(k+1)}-Ax^{(k)})}{\Vert A^T\eta _k\Vert _2^2}A^T\eta _k. $$

For simplicity, let \(e^{(k)}=x^{(k)}-x_{LS}\) and \(P_k=\frac{A^T\eta _k\eta _k^TA}{\Vert A^T\eta _k\Vert _2^2}\). Then we have

$$ e^{(k+1)}=e^{(k)}-P_ke^{(k)}-\frac{A^T\eta _k\eta _k^T\tilde{e}^{(k+1)}}{\Vert A^T\eta _k\Vert _2^2}. $$

Observe that \(e^{(k)}-P_ke^{(k)}\) is perpendicular to \(\frac{A^T\eta _k\eta _k^T\tilde{e}^{(k+1)}}{\Vert A^T\eta _k\Vert _2^2}\) and that \(P_k\) is an orthogonal projection. Taking the squared norm of both sides of the above equality and using the Pythagorean theorem yields

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2=\Vert e^{(k)}\Vert _2^2-\Vert P_ke^{(k)}\Vert _2^2+\frac{(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2}. \end{aligned}$$
(2.5)

Since \(e^{(0)} = x^{(0)}-x_{LS}=-A^{\dagger }b\in R(A^T)\) and \(\frac{\eta _k^T(b-z^{(k+1)}-Ax^{(k)})}{\Vert A^T\eta _k\Vert _2^2}A^T\eta _k\in R(A^T)\), it holds by induction that \(e^{(k+1)}\in R(A^T)\). It follows that

$$ \begin{aligned} \Vert P_ke^{(k)}\Vert _2^2-\frac{(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2}&=\frac{(\eta _k^TAe^{(k)})^2-(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2} \\&=\frac{\eta _k^T(Ae^{(k)}+\tilde{e}^{(k+1)})\eta _k^T(Ae^{(k)}-\tilde{e}^{(k+1)})}{\Vert A^T\eta _k\Vert _2^2} \\&\ge \frac{\eta _k^T(-Ae^{(k)}+\tilde{e}^{(k+1)})\Vert \eta _k\Vert _2^2}{\sigma _{1}^2(A_{\mathcal {I}_k,:})\Vert \eta _k\Vert _2^2} \\&=\frac{\sum \limits _{i\in \mathcal {I}_{k}}\left( -(A^{(i)})^Te^{(k)}-\tilde{e}^{(k+1)}_i\right) \left( -(A^{(i)})^Te^{(k)}+\tilde{e}^{(k+1)}_i\right) }{\sigma _{1}^2(A_{\mathcal {I}_k,:})} \\&=\frac{\Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2-\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2}{\sigma _{1}^2(A_{\mathcal {I}_k,:})}. \end{aligned} $$

where the relation \(b-z^{(k+1)}-Ax^{(k)}=-Ae^{(k)}-\tilde{e}^{(k+1)}\), and hence \(\eta _k^T(Ae^{(k)}+\tilde{e}^{(k+1)})=-\Vert \eta _k\Vert _2^2\), has been used, together with \(\Vert A^T\eta _k\Vert _2^2\le \sigma _{1}^2(A_{\mathcal {I}_k,:})\Vert \eta _k\Vert _2^2\). For the term \(\Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2\),

$$\begin{aligned} \Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2&=\sum \limits _{i\in \mathcal {I}_{k}}|(A^{(i)})^Te^{(k)}|^2=\sum \limits _{i\in \mathcal {I}_{k}}|(b_{R(A)})_i-(A^{(i)})^Tx^{(k)}|^2\nonumber \\&\ge \sum \limits _{i\in \mathcal {I}_{k}}\left( c_1\left( b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}\right) ^2-c_2\left( z^{(k+1)}_i-(b_{R(A)^{\perp }})_i\right) ^2 \right) \nonumber \\&\ge \sum \limits _{i\in \mathcal {I}_{k}}\left( c_1\varepsilon _k\Vert A^{(i)}\Vert _2^2-c_2|\tilde{e}^{(k+1)}_i|^2\right) \nonumber \\&=c_1\varepsilon _k\Vert A_{\mathcal {I}_{k},:}\Vert _F^2-c_2\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2, \end{aligned}$$
(2.6)

where the first inequality follows from Lemma 2 with \(r_1=b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}\) and \(r_2=z^{(k+1)}_i-(b_{R(A)^{\perp }})_i\), and the second inequality follows from the definition of \(\mathcal {I}_k\). Therefore, (2.5) becomes

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \Vert e^{(k)}\Vert _2^2-\frac{c_1\varepsilon _k\Vert A_{\mathcal {I}_{k},:}\Vert _F^2}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}+\frac{c_2+1}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2\nonumber \\&\le \Vert e^{(k)}\Vert _2^2-c_1\varepsilon _k+ \frac{c_2+1}{\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2}\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2 , \end{aligned}$$
(2.7)

where the second inequality holds because \(\frac{\Vert A_{\mathcal {I}_{k},:}\Vert _F^2}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}\ge 1\) and \(\sigma _{1}^2(A_{\mathcal {I}_{k},:})\ge \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). For the lower bound of \(\varepsilon _k\), it holds that

$$ \begin{aligned} \varepsilon _k&=\rho _x\max \limits _{i\in [m]}\left\{ \frac{|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2}{\Vert A^{(i)}\Vert _2^2}\right\} \ge \rho _x\frac{\Vert b-z^{(k+1)}-Ax^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2} \\&\ge \rho _x\frac{c_1\Vert b_{R(A)}-Ax^{(k)}\Vert _2^2-c_2\Vert z^{(k+1)}-b_{R(A)^{\perp }}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2} \\&\ge \rho _x\frac{c_1\sigma _{r}^2(A)\Vert e^{(k)}\Vert _2^2-c_2\Vert \tilde{e}^{(k+1)}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2}. \end{aligned} $$

Let \(\phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). With the lower bound of \(\varepsilon _k\), the inequality (2.7) is reformulated as

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \left( 1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}\right) \Vert e^{(k)}\Vert _2^2\\&\quad +\left( \frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}} \right) \Vert \tilde{e}^{(k+1)}\Vert _2^2. \end{aligned}$$
(2.8)

According to Theorem 1,

$$ \Vert \tilde{e}^{(k+1)}\Vert _2^2\le \left( 1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}\right) \Vert \tilde{e}^{(k)}\Vert _2^2, $$

where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\). For simplicity, let

$$ \alpha _x:=1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}, \alpha _z:=1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}, \beta :=\frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}}, $$

then the inequality (2.8) is rewritten as

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \alpha _x\Vert e^{(k)}\Vert _2^2+\beta \Vert \tilde{e}^{(k+1)}\Vert _2^2\end{aligned}$$
(2.9)
$$\begin{aligned}&\le \alpha _x^{k+1}\Vert e^{(0)}\Vert _2^2+\beta \sum \limits _{l=0}^k\alpha _x^l\Vert \tilde{e}^{(k+1-l)}\Vert _2^2\nonumber \\&\le \alpha _x^{k+1}\Vert e^{(0)}\Vert _2^2+\beta \sum \limits _{l=0}^k\alpha _x^l\alpha _z^{k+1-l}\Vert \tilde{e}^{(0)}\Vert _2^2\nonumber \\&\le \max \{\alpha _x,\alpha _z\}^{k+1}\Vert e^{(0)}\Vert _2^2+(k+1)\beta \max \{\alpha _x,\alpha _z\}^{k+1}\Vert b_{R(A)}\Vert _2^2\nonumber \\&\le \max \{\alpha _x,\alpha _z\}^{k+1}\left( 1+(k+1)\beta \sigma _{1}^2(A)\right) \Vert e^{(0)}\Vert _2^2, \end{aligned}$$
(2.10)

where the third inequality holds because of \(\tilde{e}^{(0)}=z^{(0)}-b_{R(A)^{\perp }}=b_{R(A)}\), and the last inequality holds because of \(x^{(0)}=0\) and \(\Vert b_{R(A)}\Vert _2^2\le \sigma _{1}^2(A)\Vert x_{LS}\Vert _2^2\). This completes the proof. \(\square \)

3 Numerical Experiments

In this section, numerical examples are presented to show the efficiency of the greedy block extended Kaczmarz (GBEK) method compared with the randomized double block Kaczmarz (RDBK) method, the randomized extended average block Kaczmarz (REABK) method and the extended randomized multiple rows (ERMR) method.

The inconsistent system is \(Ax + \varepsilon = b\), where \(\varepsilon \) is a noise vector whose entries are drawn from a normal distribution and which is scaled so that \(\Vert \varepsilon \Vert _2 = 0.01\times \Vert Ax\Vert _2\). The number of iteration steps (denoted by “IT”) and the computational time in seconds (denoted by “CPU”) are used for evaluation. The row blocks \(\left\{ \mathcal {I}_i\right\} _{i=1}^s\) and the column blocks \(\left\{ \mathcal {J}_j\right\} _{j=1}^t\) of the RDBK, REABK and ERMR methods are partitioned as follows:

$$ \begin{aligned}&\mathcal {I}_i=\{(i-1) \tau _r+1,(i-1) \tau _r+2, \ldots , i \tau _r\}, \quad i=1, 2, \ldots , s-1, \\&\mathcal {I}_s=\{(s-1) \tau _r+1,(s-1) \tau _r+2, \ldots , m\}, \quad \left| \mathcal {I}_s\right| \le \tau _r, \end{aligned} $$

and

$$ \begin{aligned} \mathcal {J}_j&=\{(j-1) \tau _c+1,(j-1) \tau _c+2, \ldots , j \tau _c\}, \quad j=1, 2, \ldots , t-1, \\ \mathcal {J}_t&=\{(t-1) \tau _c+1,(t-1) \tau _c+2, \ldots , n\}, \quad \left| \mathcal {J}_t\right| \le \tau _c, \end{aligned} $$

where \(\tau _r\) and \(\tau _c\) are the block sizes for the row and column partitions, respectively. To ensure a fair comparison, the same block sizes are used for all four methods: the GBEK method is first applied to obtain the average sizes of its row and column blocks, and these sizes are then used to partition the rows and columns for the RDBK, REABK and ERMR methods.
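A minimal sketch of this uniform partitioning is given below (0-based indices; the dimensions and block sizes are hypothetical placeholders, since the actual sizes come from a preliminary GBEK run):

```python
# Uniform row/column partitioning used for the RDBK, REABK and ERMR
# baselines; indices are 0-based here, and the sizes below are placeholders.
def uniform_blocks(dim, tau):
    """Split {0, ..., dim-1} into consecutive blocks of size tau;
    the last block may contain fewer than tau indices."""
    return [list(range(start, min(start + tau, dim))) for start in range(0, dim, tau)]

m, n, tau_r, tau_c = 1000, 100, 30, 10            # hypothetical dimensions and block sizes
row_blocks = uniform_blocks(m, tau_r)             # {I_1, ..., I_s}
col_blocks = uniform_blocks(n, tau_c)             # {J_1, ..., J_t}
```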

All the methods are started from the initial vectors \(x^{(0)}=0\) and \( z^{(0)}=b\) and stopped if the relative solution error (RSE) satisfies

$$ {\text {RSE}}=\frac{\Vert x^{(k)}-x_{LS}\Vert _{2}^{2}}{\Vert x_{LS}\Vert _{2}^{2}}\le 10^{-6}, $$

or the number of iteration steps exceeds 50000. To compare the computational time of the proposed method with that of the other methods, the following speed-ups are defined:

$$\begin{aligned} {\text {speed-up}}_{1}=\frac{{\text {CPU of RDBK}}}{{\text {CPU of GBEK}}},\quad {\text {speed-up}}_2=\frac{{\text {CPU of REABK}}}{{\text {CPU of GBEK}}},\quad {\text {speed-up}}_3=\frac{{\text {CPU of ERMR}}}{{\text {CPU of GBEK}}}. \end{aligned}$$
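For reference, the stopping criterion and the speed-ups translate directly into code (illustrative; the CPU times below are hypothetical placeholders, not measured values):

```python
# Relative solution error (RSE) used as the stopping criterion, and the
# speed-ups; the CPU times are hypothetical placeholders.
import numpy as np

def rse(x, x_ls):
    return np.linalg.norm(x - x_ls)**2 / np.linalg.norm(x_ls)**2

def converged(x, x_ls, tol=1e-6):
    return rse(x, x_ls) <= tol                     # stop when RSE <= 10^{-6}

cpu_rdbk, cpu_reabk, cpu_ermr, cpu_gbek = 2.1, 1.3, 0.9, 0.2   # hypothetical CPU times
speed_up_1 = cpu_rdbk / cpu_gbek
speed_up_2 = cpu_reabk / cpu_gbek
speed_up_3 = cpu_ermr / cpu_gbek
```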

Example 1

Apply the GBEK method with different parameters \(\rho _x\) and \(\rho _z\) to solve the problem, where A is either a random Gaussian matrix or a sparse matrix from [8].

The influence of the parameters \(\rho _x\) and \(\rho _z\) on the efficiency of the GBEK method is first explored in Example 1.

In Fig. 1, the curves of the computational time versus \(\rho _z\) with fixed \(\rho _x\) for the GBEK method are presented for two different matrices. For \(A\in \mathbb {R}^{1000\times 100}\) with \({\text {cond}}(A)=50\), it is observed that the computing time first decreases and then increases as \(\rho _z\) increases with \(\rho _x\) fixed. A similar phenomenon is observed for \(A={\text {rel5}}\). For the remaining numerical examples, the parameters are set to \(\rho _x= \rho _z=0.5\) in the GBEK method.

Fig. 1 Curves of the computing time versus \(\rho _z\) with fixed \(\rho _x\) (left: \(A\in \mathbb {R}^{1000\times 100}\), \({\text {cond}}(A)=50\); right: \(A={\text {rel5}}\), \({\text {cond}}(A)={\text {Inf}}\))

Example 2

The coefficient matrix A is an overdetermined random Gaussian matrix.

Table 1 Numerical results for Example 2

In Table 1, the number of iteration steps, the computational time and the speed-ups of the RDBK, REABK, ERMR and GBEK methods for solving Example 2 are presented.

From Table 1, it is obvious that the GBEK method significantly reduces the number of iteration steps and the computing time compared with the RDBK, REABK and ERMR methods. In particular, the GBEK method shows a noticeable advantage over the ERMR method, with the maximum value of \({\text {speed-up}}_3\) reaching 12.9021. Since the GBEK method and the ERMR method employ the same iterative format, the likely reason for the superior performance of the GBEK method is the use of the greedy block criterion.

Example 3

The coefficient matrix A is an underdetermined random Gaussian matrix.

Table 2 Numerical results for Example 3

The numerical results of Example 3 are listed in Table 2. When the coefficient matrix A is underdetermined, the GBEK method again outperforms the RDBK, REABK and ERMR methods, requiring the fewest iteration steps and the shortest computational time to reach the desired accuracy.

Example 4

The matrix A is taken from the SuiteSparse Matrix Collection [8].

Table 3 Numerical results for Example 4

For Example 4, the numbers of iteration steps, the computational time and the speed-ups of the RDBK, REABK, ERMR and GBEK methods are provided in Table 3. All the coefficient matrices are sparse and rank-deficient, with different sizes, densities and condition numbers. Here the density of A is defined as

$$ {\text {density}} = \frac{{\text {number}}\hspace{5.0pt}\textrm{of}\hspace{5.0pt}{\text {nonzeros}}\hspace{5.0pt}\textrm{of}\hspace{5.0pt}A}{mn}, $$

which measures the sparsity of A.
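For a matrix stored in a SciPy sparse format, the density can be computed directly from the number of stored nonzeros (illustrative; the random test matrix below is a placeholder, not one of the matrices from [8]):

```python
# Density of a sparse test matrix, matching the definition above.
from scipy.sparse import random as sparse_random

A_sp = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)
density = A_sp.nnz / (A_sp.shape[0] * A_sp.shape[1])   # number of nonzeros / (m*n)
```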

From Table 3, it is seen that the GBEK method outperforms the RDBK, REABK and ERMR methods in terms of the number of iteration steps and computational time. Furthermore, in Table 3, the maximum values of \({\text {speed-up}}_1\) and \({\text {speed-up}}_2\) are 55.3191 and 13.7952 respectively, which further confirms the superiority of the GBEK method for solving large sparse least squares problems.

Fig. 2 Convergence curves of the RDBK, REABK, ERMR and GBEK methods for different matrices (left: \(A\in \mathbb {R}^{10000\times 500}\), middle: \(A\in \mathbb {R}^{500\times 10000}\), right: \(A={\text {abtaha1}}\))

The curves of the relative solution error versus the iteration count for the RDBK, REABK, ERMR and GBEK methods for different matrices are shown in Fig. 2. It is clear that the relative solution error of the GBEK method decreases the fastest as the number of iteration steps increases for these three examples.

Example 5

Consider solving the X-ray computed tomography problem in AIR Tools II [14]. The size of the matrix A is set to be \(15300\times 3600\).

In Example 5, the effectiveness of the RDBK, REABK, ERMR and GBEK methods is evaluated by the Peak Signal-to-Noise Ratio (PSNR), a widely used metric in image processing that measures the similarity between two images; a higher PSNR value indicates better image quality. All methods were run with the same number of iteration steps.
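Since the paper does not spell out its exact PSNR normalization, the following sketch uses a common definition with the peak taken from the reference image (an assumption on our part):

```python
# A common definition of PSNR for comparing a reconstruction with the
# ground-truth image; the peak value used here is an assumption.
import numpy as np

def psnr(x_true, x_rec):
    mse = np.mean((np.asarray(x_true) - np.asarray(x_rec)) ** 2)
    peak = np.max(np.abs(x_true))                  # assumed peak intensity of the reference
    return 10.0 * np.log10(peak**2 / mse)
```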

The original image and the approximate images recovered by the four methods are given in Fig. 3. It is obvious that the image reconstructed by the GBEK method is the best and attains the highest PSNR value of 34.9177.

Fig. 3 Numerical results for Example 5

4 Conclusions

A greedy block extended Kaczmarz method is proposed for solving least squares problems. Its convergence theory is established and a linear convergence rate is derived. Numerical experiments show that the proposed method outperforms existing randomized block extended Kaczmarz methods in terms of both the number of iteration steps and the computational time.