1 Introduction

Consider the solution of the least squares problem

$$\begin{aligned} \min \limits _{x\in \mathbb {R}^{n}} \Vert b-Ax\Vert _2^2, \end{aligned}$$
(1.1)

where \(A \in \mathbb {R}^{m \times n}\) and \( b \in \mathbb {R}^{m}\). Such problems arise widely in scientific and engineering computing, for example in image reconstruction [15], big data analysis [6] and optimization [16].

Iterative methods, particularly stochastic iterative methods, have recently attracted much attention for solving the least squares problem, since direct methods such as the QR decomposition and the singular value decomposition are usually expensive in memory and computation. One stochastic iterative method is the randomized extended Kaczmarz (REK) method [26], which was proved to converge exponentially in expectation to the least squares solution \(x_{LS}=A^{\dagger }b\) of (1.1). To accelerate the randomized extended Kaczmarz method, the randomized double block Kaczmarz (RDBK) method was introduced in [20], which selects multiple rows and columns for projection. To avoid the computational cost of Moore-Penrose inverses, the randomized extended average block Kaczmarz method [10] and the extended randomized multiple rows method [23] were proposed and analyzed. For more research on the randomized extended Kaczmarz method, we refer the reader to [3, 4, 9, 24].

Greedy techniques, including maximizing the distance and the residual, were first considered for the Schwarz method in [18] and were shown to improve the efficiency of Kaczmarz methods [11]. A greedy randomized Kaczmarz method was proposed in [2], which combines the maximum distance and the average distance to construct a novel greedy strategy. Furthermore, a different greedy randomized Kaczmarz method based on the maximum distance and a relaxation parameter was presented and studied in [21]. From a geometric point of view, a geometric probability randomized Kaczmarz method and its greedy version were established in [25].

To improve the performance of randomized block extended Kaczmarz methods, a greedy block extended Kaczmarz method is proposed for solving the least squares problem, where the average block projection is used to save computational cost. The convergence theory of the greedy block extended Kaczmarz method is established, and an upper bound for the convergence rate is derived and analyzed in detail. Numerical experiments show that the proposed method is efficient and outperforms existing randomized block extended Kaczmarz methods.

The rest of the paper is organized as follows. In Sect. 2, the greedy block extended Kaczmarz method is presented and its convergence theory is established. Numerical experiments are provided in Sect. 3 to illustrate the efficiency and excellent performance of the proposed method. Finally we conclude the paper with a brief summary in Sect. 4.

2 The Greedy Block Extended Kaczmarz Method

This section introduces the greedy block extended Kaczmarz method for solving the least squares problem and establishes its convergence theory.

In the past decade, a number of greedy rules have been proposed and studied, for instance, maximizing the distance, the residual or the geometric angle, and different greedy strategies usually lead to different block iterative methods [2, 21, 25].

In this paper, the maximum-distance greedy rule with a relaxation parameter is utilized for both the row and column projections. At the \((k+1)\)-th iteration, the column block \(\mathcal {J}_k\) and the row block \(\mathcal {I}_k\) are selected as follows:

$$ \begin{aligned} \mathcal {J}_{k}&=\{j\in [n]:|A_{(j)}^Tz^{(k)}|^2\ge \tilde{\varepsilon }_k\Vert A_{(j)}\Vert _2^2\},\\ \mathcal {I}_{k}&=\{i\in [m]:|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2\ge \varepsilon _k\Vert A^{(i)}\Vert _2^2\}, \end{aligned} $$

where \(A^{(i)}\) and \(A_{(j)}\) denote the i-th row and the j-th column of A, respectively, [m] represents the set \(\{1,2,\ldots ,m\}\), and

$$\begin{aligned} \tilde{\varepsilon }_k=\rho _z\max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} ,\quad \varepsilon _k=\rho _x\max \limits _{i\in [m]}\left\{ \frac{|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2}{\Vert A^{(i)}\Vert _2^2}\right\} ,\quad \rho _z, \rho _x \in (0,1]. \end{aligned}$$

The condition \(\rho _z, \rho _x \in (0,1]\) guarantees that \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are non-empty. Without pre-partitioning the rows and columns of A, the blocks \(\mathcal {J}_k\) and \(\mathcal {I}_k\) are adaptive and consist of the indices corresponding to the larger entries of the distance vectors at each iteration.
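As an illustration, the two selection rules can be sketched in NumPy as follows; the function and variable names are ours (not from Algorithm 1), and A is assumed to have no zero rows or columns.

```python
# A minimal NumPy sketch of the greedy block selection rules above; the
# function and variable names are illustrative, not taken from the paper,
# and A is assumed to have no zero rows or columns.
import numpy as np

def select_column_block(A, z, rho_z=0.5):
    """J_k: columns j with |A_(j)^T z|^2 >= eps_tilde_k * ||A_(j)||_2^2."""
    col_norms2 = np.sum(A * A, axis=0)             # ||A_(j)||_2^2 for each column j
    s = A.T @ z                                    # entries A_(j)^T z
    eps_tilde = rho_z * np.max(s**2 / col_norms2)  # relaxed maximum distance
    return np.where(s**2 >= eps_tilde * col_norms2)[0]

def select_row_block(A, b, x, z_new, rho_x=0.5):
    """I_k: rows i with |b_i - z_i^(k+1) - (A^(i))^T x|^2 >= eps_k * ||A^(i)||_2^2."""
    row_norms2 = np.sum(A * A, axis=1)             # ||A^(i)||_2^2 for each row i
    r = b - z_new - A @ x                          # residual entries b_i - z_i^(k+1) - (A^(i))^T x
    eps = rho_x * np.max(r**2 / row_norms2)        # relaxed maximum distance
    return np.where(r**2 >= eps * row_norms2)[0]
```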

By combining the above greedy selection rule with the average block projection technique, the greedy block extended Kaczmarz method is proposed and described in detail in Algorithm 1.

Algorithm 1 (The greedy block extended Kaczmarz method)

In Algorithm 1, \(\tilde{\eta }_{k}\) and \(\eta _k\) are sparse residual vectors used to form linear combinations of the rows of \(A_{:,\mathcal {J}_{k}}^T\) and \(A_{\mathcal {I}_{k},:}\), respectively. These linear combinations serve as the projection directions, so the computation of Moore-Penrose inverses is not required.
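To make the projection steps concrete, the following is a minimal sketch of the GBEK iteration as reconstructed from steps 5 and 9 of Algorithm 1 (which appear explicitly in the proofs below), with \(\tilde{\eta }_{k}\) and \(\eta _k\) formed as just described; it is an illustrative reading of the algorithm, not the authors' reference implementation.

```python
# A minimal sketch of the GBEK iteration, reconstructed from steps 5 and 9
# of Algorithm 1 as they appear in the proofs of Theorems 1 and 2; the
# sparse residual vectors eta_t (~eta_k) and eta (eta_k) are built as
# described in the text. Illustrative only.
import numpy as np

def gbek(A, b, rho_x=0.5, rho_z=0.5, max_iter=1000):
    m, n = A.shape
    x, z = np.zeros(n), b.astype(float)           # x^(0) = 0, z^(0) = b
    col_norms2 = np.sum(A * A, axis=0)
    row_norms2 = np.sum(A * A, axis=1)
    for _ in range(max_iter):
        # Column part: greedy block J_k and averaged projection of z (step 5).
        s = A.T @ z
        Jk = s**2 >= rho_z * np.max(s**2 / col_norms2) * col_norms2
        eta_t = np.where(Jk, s, 0.0)              # sparse residual vector ~eta_k
        v = A @ eta_t
        if v @ v > 0:
            z = z - (eta_t @ s) / (v @ v) * v
        # Row part: greedy block I_k and averaged projection of x (step 9).
        r = b - z - A @ x
        Ik = r**2 >= rho_x * np.max(r**2 / row_norms2) * row_norms2
        eta = np.where(Ik, r, 0.0)                # sparse residual vector eta_k
        w = A.T @ eta
        if w @ w > 0:
            x = x + (eta @ r) / (w @ w) * w
    return x
```

On a small random test one can compare the output of `gbek(A, b)` with `np.linalg.lstsq(A, b, rcond=None)[0]` to check that the iterates approach \(x_{LS}\).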

Before discussing the convergence properties of the greedy block extended Kaczmarz method, the following useful lemma is introduced.

Lemma 1

([10]). Let \(A\in \mathbb {R}^{m\times n}\) with \({\text {rank}}(A) = r\), and let \(R(A)\) denote the column space of A. For any \(u\in R(A)\), it holds that

$$ \sigma _1^2(A)\Vert u\Vert _2^2 \ge \Vert A^Tu\Vert _2^2 \ge \sigma _r^2(A)\Vert u\Vert _2^2, $$

where \(\sigma _{1 }(A)\ge \sigma _{2 }(A)\ge \ldots \ge \sigma _{r }(A)>0\) denote all the nonzero singular values of A.
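A quick numerical sanity check of Lemma 1 (illustrative; the rank-deficient test matrix below is our own choice):

```python
# Numerical sanity check of Lemma 1 on a rank-deficient random matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 20))   # rank r = 8
u = A @ rng.standard_normal(20)                                   # u in R(A)
sv = np.linalg.svd(A, compute_uv=False)
sigma1, sigma_r = sv[0], sv[sv > 1e-10 * sv[0]][-1]               # largest / smallest nonzero
lhs = np.linalg.norm(A.T @ u) ** 2
assert sigma_r**2 * np.linalg.norm(u)**2 <= lhs <= sigma1**2 * np.linalg.norm(u)**2
```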

Denote \(R(A)^{\perp }\) as the orthogonal complement of the column space of A and \(b_{R(A)^{\perp }}\) as the orthogonal projection of b onto \(R(A)^{\perp }\). The convergence theory of the sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by Algorithm 1 is established as follows.

Theorem 1

The sequence \(\{z^{(k)}\}_{k=0}^{\infty }\) generated by the GBEK method converges to \(z^*=b_{R(A)^{\perp }}\). Moreover, it holds that

$$\begin{aligned} \Vert z^{(k+1)}-z^*\Vert _2^2\le \left( 1-\frac{\rho _z\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}\right) ^{k+1}\Vert z^{(0)}-z^*\Vert _2^2, \end{aligned}$$
(2.1)

where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\).

Proof

By subtracting \(z^*=b_{R(A)^{\perp }}\) from both sides of step 5 in Algorithm 1, we get

$$ z^{(k+1)}-z^*=z^{(k)}-z^*-\frac{\tilde{\eta }_{k}^TA^Tz^{(k)}}{\Vert A\tilde{\eta }_{k}\Vert _2^2}A\tilde{\eta }_{k}. $$

Let \(\tilde{P}_k:=\frac{A\tilde{\eta }_{k}\tilde{\eta }_{k}^TA^T}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\), which is an orthogonal projection onto the span of \(A\tilde{\eta }_{k}\). Since \(z^*\in R(A)^{\perp }\) and \(A\tilde{\eta }_{k}\in R(A)\), it holds that \(\tilde{P}_kz^*=0\) and hence

$$ z^{(k+1)}-z^*=z^{(k)}-z^*-\tilde{P}_k(z^{(k)}-z^*). $$

Let \(\tilde{e}^{(k)}=z^{(k)}-z^*\). Taking the squared norm of both sides of the above equality and using the orthogonality of \(\tilde{P}_k\) yields

$$\begin{aligned} \Vert \tilde{e}^{(k+1)}\Vert _2^2=\Vert \tilde{e}^{(k)}\Vert _2^2-\Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2. \end{aligned}$$
(2.2)

Note that \(\tilde{e}^{(0)} = z^{(0)}-z^* = b-b_{R(A)^{\perp }} = AA^{\dagger }b \in R(A)\) and \(\tilde{P}_k\tilde{e}^{(k)}\in R(A)\); it then follows by induction that \(\tilde{e}^{(k+1)}\in R(A)\) for all k. For the second term on the right side of (2.2), it holds that

$$\begin{aligned} \Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2=\frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\ge \frac{\Vert (A_{:,\mathcal {J}_k})^Tz^{(k)}\Vert _2^4}{\sigma _{1}^2(A_{:,\mathcal {J}_k})\Vert \tilde{\eta }_{k}\Vert _2^2} \ge \frac{\tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}, \end{aligned}$$
(2.3)

where the first inequality holds because \(\Vert A\tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^T\tilde{\eta }_{k}\Vert _2^2\le \sigma _{1}^2(A_{:,\mathcal {J}_k})\Vert \tilde{\eta }_{k}\Vert _2^2\) and the second inequality holds because \(\Vert \tilde{\eta }_{k}\Vert _2^2=\Vert (A_{:,\mathcal {J}_k})^Tz^{(k)}\Vert _2^2\ge \tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2\). Note that

$$ \tilde{\eta }_{k-1}^TA^Tz^{(k)}=\tilde{\eta }_{k-1}^TA^T\left( z^{(k-1)}-\frac{\tilde{\eta }_{k-1}^TA^Tz^{(k-1)}}{\Vert A\tilde{\eta }_{k-1}\Vert _2^2}A\tilde{\eta }_{k-1}\right) =0, $$

therefore, \(\Vert (A_{:,\mathcal {J}_{k-1}})^Tz^{(k)}\Vert _2^2=0\) and then

$$ \begin{aligned} \Vert A^Tz^{(k)}\Vert _2^2&=\sum \limits _{j\in [n]\backslash \mathcal {J}_{k-1}}\frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\Vert A_{(j)}\Vert _2^2 \\&\le \max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} (\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2). \end{aligned} $$

Thus, by Lemma 1 and the fact that \(A^Tz^{(k)}=A^T\tilde{e}^{(k)}\) (since \(z^*\in R(A)^{\perp }\)),

$$\begin{aligned} \tilde{\varepsilon }_k=\rho _z\max \limits _{j\in [n]}\left\{ \frac{|A_{(j)}^Tz^{(k)}|^2}{\Vert A_{(j)}\Vert _2^2}\right\} \ge \rho _z\frac{\Vert A^Tz^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}\ge \rho _z \frac{\sigma _{r}^2(A)\Vert \tilde{e}^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}. \end{aligned}$$
(2.4)

Substituting (2.3) and (2.4) into (2.2) yields

$$ \Vert \tilde{e}^{(k+1)}\Vert _2^2\le \Vert \tilde{e}^{(k)}\Vert _2^2-\frac{\tilde{\varepsilon }_k\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})} \le \left( 1-\rho _z\frac{\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2}\right) \Vert \tilde{e}^{(k)}\Vert _2^2. $$

From the facts \(\frac{\Vert A_{:,\mathcal {J}_k}\Vert _F^2}{\sigma _{1}^2(A_{:,\mathcal {J}_k})}\ge 1\) and \(\Vert A_{:,\mathcal {J}_{k-1}}\Vert _F^2\ge \tilde{\phi }_{\min }:= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\) (the block \(\mathcal {J}_{k-1}\) is non-empty), the recursive expression (2.1) is derived. \(\square \)

Remark 1

In the extended randomized multiple rows method, the expected decrease in mean squared error at the \((k+1)\)-th iteration is

$$ \mathbb {E}\Vert \tilde{P}_k\tilde{e}^{(k)}\Vert _2^2 = \frac{\Vert A^Tz^{(k)}\Vert _2^4}{\Vert AA^Tz^{(k)}\Vert _2^2}, $$

which is obtained by taking the expectation over \(\mathcal {J}_k\) for the first equality of (2.3). Here \(\mathbb {E}\) denotes the expected value conditional on the first k iterations. For the greedy block extended Kaczmarz method, the error reduction is \(\frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2}\). It is obvious that

$$ \frac{(\tilde{\eta }_{k}^TA^T\tilde{e}^{(k)})^2}{\Vert A\tilde{\eta }_{k}\Vert _2^2} = \frac{\sum \limits _{j\in \mathcal {J}_{k}}(A^Tz^{(k)})_j^4}{\sum \limits _{j\in \mathcal {J}_{k}}\Vert A_{(j)}\Vert _2^2(A^Tz^{(k)})_j^2}\ge \frac{\sum \limits _{j\in [n]}(A^Tz^{(k)})_j^4}{\sum \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2(A^Tz^{(k)})_j^2}= \frac{\Vert A^Tz^{(k)}\Vert _2^4}{\Vert AA^Tz^{(k)}\Vert _2^2}, $$

which indicates that the per-iteration error reduction of \(\{z^{(k)}\}_{k=0}^{\infty }\) in the greedy block extended Kaczmarz method is at least as large as the expected reduction in the extended randomized multiple rows method.

The convergence analysis of \(\{x^{(k)}\}_{k=0}^{\infty }\) generated by the greedy block extended Kaczmarz method relies on the following lemma.

Lemma 2

([5]). Let \(c_1, c_2\) be real numbers such that \(c_1\in [0,1)\), \(c_2\ge -1\) and \(c_2-c_1=c_1c_2\). Then

$$ (r_1+r_2)^2\ge c_1 r_1^2-c_2 r_2^2,\quad \forall \ r_1, r_2\in \mathbb {R}. $$
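For instance, choosing \(c_1=1/2\) forces \(c_2=c_1/(1-c_1)=1\); the following short check verifies the inequality numerically for this choice (illustrative only):

```python
# Quick check of Lemma 2: choose c1 in [0,1), set c2 = c1/(1-c1) so that
# c2 - c1 = c1*c2, and verify (r1+r2)^2 >= c1*r1^2 - c2*r2^2 on random data.
import numpy as np

rng = np.random.default_rng(1)
c1 = 0.5
c2 = c1 / (1.0 - c1)                                   # c2 = 1 here
r1, r2 = rng.standard_normal(2)
assert (r1 + r2)**2 >= c1 * r1**2 - c2 * r2**2 - 1e-12
```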

By Theorem 1 and Lemma 2, the convergence property of the greedy block extended Kaczmarz method is established as follows.

Theorem 2

Assume \({\text {rank}}(A) = r \). The sequence \(\{x^{(k)}\}_{k=0}^{\infty }\) with the initial guess \(x^{(0)}=0\) generated by the GBEK method converges to the least squares solution \(x_{LS}=A^{\dagger }b\). Moreover, the solution error satisfies

$$ \Vert x^{(k+1)}-x_{LS}\Vert _2^2\le \max \{\alpha _x,\alpha _z\}^{k+1}\left( 1+(k+1)\beta \sigma _{1}^2(A)\right) \Vert x^{(0)}-x_{LS}\Vert _2^2, $$

where

$$ \alpha _x:=1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}, \alpha _z:=1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}, \beta :=\frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}}, $$

with constants \(c_1, c_2 \) from Lemma 2, \( \phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\) and \( \tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2. \)

Proof

Subtracting \(x_{LS}\) from both sides of step 9 in Algorithm 1 leads to

$$ x^{(k+1)}-x_{LS}=x^{(k)}-x_{LS}+\frac{\eta _k^T(b-z^{(k+1)}-Ax^{(k)})}{\Vert A^T\eta _k\Vert _2^2}A^T\eta _k. $$

For simplicity, let \(e^{(k)}=x^{(k)}-x_{LS}\) and \(P_k=\frac{A^T\eta _k\eta _k^TA}{\Vert A^T\eta _k\Vert _2^2}\). Then we have

$$ e^{(k+1)}=e^{(k)}-P_ke^{(k)}-\frac{A^T\eta _k\eta _k^T\tilde{e}^{(k+1)}}{\Vert A^T\eta _k\Vert _2^2}. $$

Observe that \(e^{(k)}-P_ke^{(k)}\) is perpendicular to \(\frac{A^T\eta _k\eta _k^T\tilde{e}^{(k+1)}}{\Vert A^T\eta _k\Vert _2^2}\) and that \(P_k\) is an orthogonal projection. Taking the squared norm of both sides of the above equality and using the Pythagorean theorem yields

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2=\Vert e^{(k)}\Vert _2^2-\Vert P_ke^{(k)}\Vert _2^2+\frac{(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2}. \end{aligned}$$
(2.5)

Since \(e^{(0)} = x^{(0)}-x_{LS}=-A^{\dagger }b\in R(A^T)\) and \(\frac{\eta _k^T(b-z^{(k+1)}-Ax^{(k)})}{\Vert A^T\eta _k\Vert _2^2}A^T\eta _k\in R(A^T)\), it holds by induction that \(e^{(k+1)}\in R(A^T)\). It follows that

$$ \begin{aligned} \Vert P_ke^{(k)}\Vert _2^2-\frac{(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2}&=\frac{(\eta _k^TAe^{(k)})^2-(\eta _k^T\tilde{e}^{(k+1)})^2}{\Vert A^T\eta _k\Vert _2^2} \\&=\frac{\eta _k^T(Ae^{(k)}+\tilde{e}^{(k+1)})\eta _k^T(Ae^{(k)}-\tilde{e}^{(k+1)})}{\Vert A^T\eta _k\Vert _2^2} \\&\ge \frac{\eta _k^T(-Ae^{(k)}+\tilde{e}^{(k+1)})\Vert \eta _k\Vert _2^2}{\sigma _{1}^2(A_{\mathcal {I}_k,:})\Vert \eta _k\Vert _2^2} \\&=\frac{\sum \limits _{i\in \mathcal {I}_{k}}\left( -(A^{(i)})^Te^{(k)}-\tilde{e}^{(k+1)}_i\right) \left( -(A^{(i)})^Te^{(k)}+\tilde{e}^{(k+1)}_i\right) }{\sigma _{1}^2(A_{\mathcal {I}_k,:})} \\&=\frac{\Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2-\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2}{\sigma _{1}^2(A_{\mathcal {I}_k,:})}. \end{aligned} $$

where the relation \(b-z^{(k+1)}-Ax^{(k)}=-Ae^{(k)}-\tilde{e}^{(k+1)}\), and hence \(\eta _k^T(Ae^{(k)}+\tilde{e}^{(k+1)})=-\Vert \eta _k\Vert _2^2\), has been used, together with \(\Vert A^T\eta _k\Vert _2^2\le \sigma _{1}^2(A_{\mathcal {I}_k,:})\Vert \eta _k\Vert _2^2\). For the term \(\Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2\),

$$\begin{aligned} \Vert A_{\mathcal {I}_{k},:}e^{(k)}\Vert _2^2&=\sum \limits _{i\in \mathcal {I}_{k}}|(A^{(i)})^Te^{(k)}|^2=\sum \limits _{i\in \mathcal {I}_{k}}|(b_{R(A)})_i-(A^{(i)})^Tx^{(k)}|^2\nonumber \\&\ge \sum \limits _{i\in \mathcal {I}_{k}}\left( c_1\left( b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}\right) ^2-c_2\left( z^{(k+1)}_i-(b_{R(A)^{\perp }})_i\right) ^2 \right) \nonumber \\&\ge \sum \limits _{i\in \mathcal {I}_{k}}\left( c_1\varepsilon _k\Vert A^{(i)}\Vert _2^2-c_2|\tilde{e}^{(k+1)}_i|^2\right) \nonumber \\&=c_1\varepsilon _k\Vert A_{\mathcal {I}_{k},:}\Vert _F^2-c_2\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2, \end{aligned}$$
(2.6)

where the first inequality follows from Lemma 2 with \(r_1=b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}\) and \(r_2=z^{(k+1)}_i-(b_{R(A)^{\perp }})_i\), and the second inequality follows from the definition of \(\mathcal {I}_k\). Therefore, (2.5) becomes

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \Vert e^{(k)}\Vert _2^2-\frac{c_1\varepsilon _k\Vert A_{\mathcal {I}_{k},:}\Vert _F^2}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}+\frac{c_2+1}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2\nonumber \\&\le \Vert e^{(k)}\Vert _2^2-c_1\varepsilon _k+ \frac{c_2+1}{\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2}\Vert \tilde{e}^{(k+1)}_{\mathcal {I}_{k}}\Vert _2^2 , \end{aligned}$$
(2.7)

where the second inequality holds because \(\frac{\Vert A_{\mathcal {I}_{k},:}\Vert _F^2}{\sigma _{1}^2(A_{\mathcal {I}_{k},:})}\ge 1\) and \(\sigma _{1}^2(A_{\mathcal {I}_{k},:})\ge \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). For the lower bound of \(\varepsilon _k\), it holds that

$$ \begin{aligned} \varepsilon _k&=\rho _x\max \limits _{i\in [m]}\left\{ \frac{|b_i-z^{(k+1)}_i-(A^{(i)})^Tx^{(k)}|^2}{\Vert A^{(i)}\Vert _2^2}\right\} \ge \rho _x\frac{\Vert b-z^{(k+1)}-Ax^{(k)}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2} \\&\ge \rho _x\frac{c_1\Vert b_{R(A)}-Ax^{(k)}\Vert _2^2-c_2\Vert z^{(k+1)}-b_{R(A)^{\perp }}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2} \\&\ge \rho _x\frac{c_1\sigma _{r}^2(A)\Vert e^{(k)}\Vert _2^2-c_2\Vert \tilde{e}^{(k+1)}\Vert _2^2}{\Vert A\Vert _F^2-\min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2}. \end{aligned} $$

Let \(\phi _{{\min }}= \min \limits _{i\in [m]}\Vert A^{(i)}\Vert _2^2\). With the lower bound of \(\varepsilon _k\), the inequality (2.7) is reformulated as

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \left( 1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}\right) \Vert e^{(k)}\Vert _2^2\\&\quad +\left( \frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}} \right) \Vert \tilde{e}^{(k+1)}\Vert _2^2. \end{aligned}$$
(2.8)

According to Theorem 1,

$$ \Vert \tilde{e}^{(k+1)}\Vert _2^2\le \left( 1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}\right) \Vert \tilde{e}^{(k)}\Vert _2^2, $$

where \(\tilde{\phi }_{{\min }}= \min \limits _{j\in [n]}\Vert A_{(j)}\Vert _2^2\). For simplicity, let

$$ \alpha _x:=1-\rho _x c_1^2\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\phi _{{\min }}}, \alpha _z:=1-\rho _z\frac{\sigma _{r}^2(A)}{\Vert A\Vert _F^2-\tilde{\phi }_{{\min }}}, \beta :=\frac{\rho _x c_1c_2}{\Vert A\Vert _F^2-\phi _{{\min }}}+\frac{c_2+1}{\phi _{{\min }}}, $$

then the inequality (2.8) is rewritten as

$$\begin{aligned} \Vert e^{(k+1)}\Vert _2^2&\le \alpha _x\Vert e^{(k)}\Vert _2^2+\beta \Vert \tilde{e}^{(k+1)}\Vert _2^2\end{aligned}$$
(2.9)
$$\begin{aligned}&\le \alpha _x^{k+1}\Vert e^{(0)}\Vert _2^2+\beta \sum \limits _{l=0}^k\alpha _x^l\Vert \tilde{e}^{(k+1-l)}\Vert _2^2\nonumber \\&\le \alpha _x^{k+1}\Vert e^{(0)}\Vert _2^2+\beta \sum \limits _{l=0}^k\alpha _x^l\alpha _z^{k+1-l}\Vert \tilde{e}^{(0)}\Vert _2^2\nonumber \\&\le \max \{\alpha _x,\alpha _z\}^{k+1}\Vert e^{(0)}\Vert _2^2+(k+1)\beta \max \{\alpha _x,\alpha _z\}^{k+1}\Vert b_{R(A)}\Vert _2^2\nonumber \\&\le \max \{\alpha _x,\alpha _z\}^{k+1}\left( 1+(k+1)\beta \sigma _{1}^2(A)\right) \Vert e^{(0)}\Vert _2^2, \end{aligned}$$
(2.10)

where the third inequality holds because of \(\tilde{e}^{(0)}=z^{(0)}-b_{R(A)^{\perp }}=b_{R(A)}\), and the last inequality holds because of \(x^{(0)}=0\) and \(\Vert b_{R(A)}\Vert _2^2\le \sigma _{1}^2(A)\Vert x_{LS}\Vert _2^2\). This completes the proof. \(\square \)

3 Numerical Experiments

In this section, numerical examples are presented to show the efficiency of the greedy block extended Kaczmarz (GBEK) method compared with the randomized double block Kaczmarz (RDBK) method, the randomized extended average block Kaczmarz (REABK) method and the extended randomized multiple rows (ERMR) method.

The inconsistent system is \(Ax + \varepsilon = b\), where \(\varepsilon \) is a noise vector whose entries are drawn from a normal distribution and which is scaled so that \(\Vert \varepsilon \Vert _2 = 0.01\times \Vert Ax\Vert _2\). The number of iteration steps (denoted by “IT”) and the computational time in seconds (denoted by “CPU”) are used for evaluation. The row blocks \(\left\{ \mathcal {I}_i\right\} _{i=1}^s\) and the column blocks \(\left\{ \mathcal {J}_j\right\} _{j=1}^t\) of the RDBK, REABK and ERMR methods are partitioned as follows:

$$ \begin{aligned}&\mathcal {I}_i=\{(i-1) \tau _r+1,(i-1) \tau _r+2, \ldots , i \tau _r\}, \quad i=1, 2, \ldots , s-1, \\&\mathcal {I}_s=\{(s-1) \tau _r+1,(s-1) \tau _r+2, \ldots , m\}, \quad \left| \mathcal {I}_s\right| \le \tau _r, \end{aligned} $$

and

$$ \begin{aligned} \mathcal {J}_j&=\{(j-1) \tau _c+1,(j-1) \tau _c+2, \ldots , j \tau _c\}, \quad j=1, 2, \ldots , t-1, \\ \mathcal {J}_t&=\{(t-1) \tau _c+1,(t-1) \tau _c+2, \ldots , n\}, \quad \left| \mathcal {J}_t\right| \le \tau _c, \end{aligned} $$

where \(\tau _r\) and \(\tau _c\) are the block sizes for the row and column partitions, respectively. To ensure a fair comparison, the same block sizes are used for all four methods: the GBEK method is first applied to obtain the average sizes of its row and column blocks, and these sizes are then used to partition the rows and columns for the RDBK, REABK and ERMR methods.
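A minimal sketch of this uniform partitioning is given below (0-based indices; the dimensions and block sizes are hypothetical placeholders, since the actual sizes come from a preliminary GBEK run):

```python
# Uniform row/column partitioning used for the RDBK, REABK and ERMR
# baselines; indices are 0-based here, and the sizes below are placeholders.
def uniform_blocks(dim, tau):
    """Split {0, ..., dim-1} into consecutive blocks of size tau;
    the last block may contain fewer than tau indices."""
    return [list(range(start, min(start + tau, dim))) for start in range(0, dim, tau)]

m, n, tau_r, tau_c = 1000, 100, 30, 10            # hypothetical dimensions and block sizes
row_blocks = uniform_blocks(m, tau_r)             # {I_1, ..., I_s}
col_blocks = uniform_blocks(n, tau_c)             # {J_1, ..., J_t}
```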

All the methods are started from the initial vectors \(x^{(0)}=0\) and \( z^{(0)}=b\) and stopped if the relative solution error (RSE) satisfies

$$ {\text {RSE}}=\frac{\Vert x^{(k)}-x_{LS}\Vert _{2}^{2}}{\Vert x_{LS}\Vert _{2}^{2}}\le 10^{-6}, $$

or the number of iteration steps exceeds 50000. To compare the computational time of the proposed method with that of the other methods, the following speed-ups are defined:

$$\begin{aligned} {\text {speed-up}}_{1}=\frac{{\text {CPU of RDBK}}}{{\text {CPU of GBEK}}},\quad {\text {speed-up}}_2=\frac{{\text {CPU of REABK}}}{{\text {CPU of GBEK}}},\quad {\text {speed-up}}_3=\frac{{\text {CPU of ERMR}}}{{\text {CPU of GBEK}}}. \end{aligned}$$
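For reference, the stopping criterion and the speed-ups translate directly into code (illustrative; the CPU times below are hypothetical placeholders, not measured values):

```python
# Relative solution error (RSE) used as the stopping criterion, and the
# speed-ups; the CPU times are hypothetical placeholders.
import numpy as np

def rse(x, x_ls):
    return np.linalg.norm(x - x_ls)**2 / np.linalg.norm(x_ls)**2

def converged(x, x_ls, tol=1e-6):
    return rse(x, x_ls) <= tol                     # stop when RSE <= 10^{-6}

cpu_rdbk, cpu_reabk, cpu_ermr, cpu_gbek = 2.1, 1.3, 0.9, 0.2   # hypothetical CPU times
speed_up_1 = cpu_rdbk / cpu_gbek
speed_up_2 = cpu_reabk / cpu_gbek
speed_up_3 = cpu_ermr / cpu_gbek
```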

Example 1

Apply the GBEK method with different parameters \(\rho _x\) and \(\rho _z\) to solve the problem, where A is either a random Gaussian matrix or a sparse matrix from [8].

The influence of the parameters \(\rho _x\) and \(\rho _z\) on the efficiency of the GBEK method is first explored in Example 1.

In Fig. 1, the curves of the computational time versus \(\rho _z\) with fixed \(\rho _x\) for the GBEK method are presented for two different matrices. For \(A\in \mathbb {R}^{1000\times 100}\) with \({\text {cond}}(A)=50\), it is observed that the computing time first decreases and then increases as \(\rho _z\) increases with \(\rho _x\) fixed. A similar phenomenon is observed for \(A={\text {rel5}}\). For the remaining numerical examples, the parameters are set to \(\rho _x= \rho _z=0.5\) in the GBEK method.

Fig. 1 Curves of the computing time versus \(\rho _z\) with fixed \(\rho _x\) (left: \(A\in \mathbb {R}^{1000\times 100}\), \({\text {cond}}(A)=50\); right: \(A={\text {rel5}}\), \({\text {cond}}(A)={\text {Inf}}\))

Example 2

The coefficient matrix A is an overdetermined random Gaussian matrix.

Table 1 Numerical results for Example 2

In Table 1, the number of iteration steps, the computational time and the speed-ups of the RDBK, REABK, ERMR and GBEK methods for solving Example 2 are presented.

From Table 1, it is obvious that the GBEK method significantly reduces the number of iteration steps and the computing time compared with the RDBK, REABK and ERMR methods. In particular, the GBEK method shows a noticeable advantage over the ERMR method, with the maximum value of \({\text {speed-up}}_3\) reaching 12.9021. Since the GBEK method and the ERMR method employ the same iterative format, the likely reason for the superior performance of the GBEK method is the use of the greedy block criterion.

Example 3

The coefficient matrix A is an underdetermined random Gaussian matrix.

Table 2 Numerical results for Example 3

The numerical results of Example 3 are listed in Table 2. When the coefficient matrix A is underdetermined, the GBEK method again outperforms the RDBK, REABK and ERMR methods, requiring the fewest iteration steps and the shortest computational time to reach the desired accuracy.

Example 4

The matrix A is taken from the SuiteSparse Matrix Collection [8].

Table 3 Numerical results for Example 4

For Example 4, the numbers of iteration steps, the computational time and the speed-ups of the RDBK, REABK, ERMR and GBEK methods are provided in Table 3. All the coefficient matrices are sparse and rank-deficient, with different sizes, densities and condition numbers. Here the density of A is defined as

$$ {\text {density}} = \frac{{\text {number}}\hspace{5.0pt}\textrm{of}\hspace{5.0pt}{\text {nonzeros}}\hspace{5.0pt}\textrm{of}\hspace{5.0pt}A}{mn}, $$

which measures the sparsity of A.
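For a matrix stored in a SciPy sparse format, the density can be computed directly from the number of stored nonzeros (illustrative; the random test matrix below is a placeholder, not one of the matrices from [8]):

```python
# Density of a sparse test matrix, matching the definition above.
from scipy.sparse import random as sparse_random

A_sp = sparse_random(1000, 500, density=0.01, format="csr", random_state=0)
density = A_sp.nnz / (A_sp.shape[0] * A_sp.shape[1])   # number of nonzeros / (m*n)
```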

From Table 3, it is seen that the GBEK method outperforms the RDBK, REABK and ERMR methods in terms of the number of iteration steps and computational time. Furthermore, in Table 3, the maximum values of \({\text {speed-up}}_1\) and \({\text {speed-up}}_2\) are 55.3191 and 13.7952 respectively, which further confirms the superiority of the GBEK method for solving large sparse least squares problems.

Fig. 2 Convergence curves of the RDBK, REABK, ERMR and GBEK methods for different matrices (left: \(A\in \mathbb {R}^{10000\times 500}\), middle: \(A\in \mathbb {R}^{500\times 10000}\), right: \(A={\text {abtaha1}}\))

The curves of the relative solution error versus the iteration count for the RDBK, REABK, ERMR and GBEK methods for different matrices are shown in Fig. 2. It is clear that the relative solution error of the GBEK method decreases the fastest as the number of iteration steps increases for these three examples.

Example 5

Consider solving the X-ray computed tomography problem in AIR Tools II [14]. The size of the matrix A is set to be \(15300\times 3600\).

In Example 5, the effectiveness of the RDBK, REABK, ERMR and GBEK methods is evaluated by the Peak Signal-to-Noise Ratio (PSNR), a widely used metric in image processing that measures the similarity between two images; a higher PSNR value indicates better image quality. All methods were run with the same number of iteration steps.
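Since the paper does not spell out its exact PSNR normalization, the following sketch uses a common definition with the peak taken from the reference image (an assumption on our part):

```python
# A common definition of PSNR for comparing a reconstruction with the
# ground-truth image; the peak value used here is an assumption.
import numpy as np

def psnr(x_true, x_rec):
    mse = np.mean((np.asarray(x_true) - np.asarray(x_rec)) ** 2)
    peak = np.max(np.abs(x_true))                  # assumed peak intensity of the reference
    return 10.0 * np.log10(peak**2 / mse)
```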

The original image and the approximate images recovered by the four methods are given in Fig. 3. It is obvious that the image reconstructed by the GBEK method is the best and attains the highest PSNR value of 34.9177.

Fig. 3 Numerical results for Example 5

4 Conclusions

A greedy block extended Kaczmarz method is proposed for solving least squares problems. Its convergence theory is established and a linear convergence rate is derived. Numerical experiments show that the proposed method outperforms existing randomized block extended Kaczmarz methods in terms of both the number of iteration steps and the computational time.