1 Introduction

We consider the solution of the generalized saddle point linear system

$$\begin{aligned} \begin{bmatrix} A &{}\quad B^T\\ -B&{}\quad C\end{bmatrix} \begin{bmatrix} x\\ y\end{bmatrix}= \begin{bmatrix} f \\ -g\end{bmatrix},~~~~\mathrm {or}~~ {\mathcal {A}} u=b, \end{aligned}$$
(1)

where A is an \(n\times n\) symmetric positive definite matrix, B is an \(m\times n\) full-rank matrix, C is an \(m\times m\) symmetric positive semidefinite matrix and \(m\le n\). Under these assumptions, the existence and uniqueness of the solution of (1) are guaranteed. The block linear system (1) stems from many real-world applications, including constrained optimal control (Betts 2001), computational fluid dynamics (Saad 2003) and mixed finite element discretizations of elliptic PDEs (Benzi et al. 2005).

Recent years have witnessed vigorous developments in numerical algorithms for solving (1). Due to the large size and sparsity of the coefficient matrix \({\mathcal {A}}\), iterative methods are usually favored. In addition to those ad hoc ones, existing approaches include the stationary iteration schemes (Bai et al. 2005b; Zhang and Shang 2010; Yun 2013; Zhang et al. 2019), the matrix-splitting methods (Bai et al. 2008; Cao et al. 2016; Bai and Golub 2007; Li and Wu 2015; Liang and Zhang 2016; Shen 2014; Zeng and Ma 2016), the more involved Krylov subspace methods (Bai 2015; Saad 2003) and their hybrid variants (Cao and Yi 2016). Among these methods, the restarted GMRES method (Saad and Schultz 1986) has received much attention and been used extensively in view of its effectiveness. However, it is known that GMRES(k) often suffers from slow convergence or even stagnation. Therefore, a suitable preconditioner must be chosen to accelerate GMRES(k). To date, several kinds of preconditioners pertaining to Krylov subspace methods have been proposed, including the HSS-type preconditioners (Cao et al. 2016; Bai and Golub 2007; Huang et al. 2009; Liao and Zhang 2019), the matrix splitting preconditioners (Bai et al. 2005a; Ling and Liu 2017; Murphy et al. 2000; Shen and Shi 2016; Simoncini 2004; Zhang et al. 2014, 2017), the constraint preconditioners (Keller et al. 2000; Shen et al. 2019; Zhang et al. 2011) and the dimensional splitting preconditioners (Ke and Ma 2017; Cao et al. 2013); see (Benzi et al. 2005) for developments up to the year 2005 and more recent surveys (Bai 2015; Rozložník 2018; Wathen 2015).

The essential idea behind developing a good preconditioner is that the preconditioner should approximate the coefficient matrix in some sense, so that the preconditioned matrix has a clustered spectrum (away from the origin) or the preconditioning step is cheap to carry out. For this reason, we are interested in constructing efficient preconditioners that preserve the structure of the original coefficient matrix in (1) as much as possible. In the context of solving (generalized) saddle point systems, some efforts have been made towards this goal in recent years. For example, Murphy et al. (2000) propose a Schur-complement-based block-diagonal preconditioner which yields a preconditioned matrix with exactly three or exactly two distinct eigenvalues. Following this reasoning, de Sturler and Liesen (2005) derive block-diagonal preconditioners from the splitting of the (1,1)-block matrix given in Murphy et al. (2000), from which the solution of the original problem can be recovered by solving a smaller related linear system; see also (Beik et al. 2017). By incorporating the (1,2) and (2,1) matrix blocks, Keller et al. (2000) devise a constraint preconditioner that differs from the original system only in the (1,1)-block. Pan et al. (2006) propose a deteriorated positive-definite and skew-symmetric splitting (DPSS) preconditioner which has stimulated research on matrix-splitting-based preconditioners (Cao et al. 2015; Zhang et al. 2014).

In the literature, a common way of implementing the HSS iteration (Bai et al. 2003) is to alternate between the Hermitian and skew-Hermitian parts of the coefficient matrix. By switching the roles of the two splitting matrices in the Hermitian and skew-Hermitian splitting, Zhang (2018) proposes an efficient variant of the HSS preconditioner (EVHSS) for solving (1). It is noted there that although swapping the Hermitian and skew-Hermitian parts gives the same asymptotic convergence rate as the original HSS iteration, the contraction factors differ. The convergence result is proved in Zhang (2018) and further refined by Chen (2018). The numerical performance of EVHSS is very promising compared with some existing HSS-type preconditioners.

When implementing EVHSS, one often needs to strike a balance in choosing the relaxation parameter so that the EVHSS preconditioner preserves the structure of \({\mathcal {A}}\) as much as possible, even though a theoretical quasi-optimal parameter is given. Motivated by this observation, we propose a new iteration scheme for solving (1). The unconditional convergence of the new stationary method is proved. Induced by the new method, an HSS-type preconditioner is also devised to accelerate the convergence of GMRES. By construction, the new preconditioner resembles the coefficient matrix \({\mathcal {A}}\) more closely than EVHSS does. Theoretical analysis shows that the preconditioned matrix associated with the new preconditioner enjoys a well-clustered eigenvalue distribution, which is welcome for Krylov subspace methods. To make the new preconditioner practical, we also give a framework for developing inexact preconditioners tailored to large and sparse generalized saddle point problems.

The remainder of this paper is organized as follows. In Sect. 2, we first give a quick recap of the EVHSS iteration/preconditioner, and then introduce the new stationary iteration and analyze its unconditional convergence. In Sect. 3, we present the new preconditioner induced by the new iteration scheme in Sect. 2, investigate the spectral properties of the preconditioned matrix, and propose a framework of practical inexact preconditioners. In Sect. 4, we employ some numerical experiments to verify the effectiveness of the new preconditioner and its inexact variants. Finally, some conclusions are given in Sect. 5.

2 A new iteration scheme and its convergence analysis

The new iteration method in this work is motivated by the EVHSS iteration (Zhang 2018). Therefore, we first give a sketch of the EVHSS iteration method/preconditioner in this section. Aware of the possible problem of choosing the relaxation parameter in EVHSS, we then present the new iteration scheme for solving (1) and finally establish the unconditional convergence result.

2.1 The EVHSS iteration/preconditioner

In the language of the Hermitian and skew-Hermitian splitting (Bai et al. 2003), the generalized saddle point matrix \({\mathcal {A}}\) in (1) admits the following splitting:

$$\begin{aligned} {\mathcal {A}}=\begin{bmatrix} A &{}\quad 0\\ 0 &{}\quad C\end{bmatrix} +\begin{bmatrix}0 &{}\quad B^T\\ -B &{}\quad 0\end{bmatrix}\equiv {\mathcal {H}}+{\mathcal {S}}, \end{aligned}$$

where \({\mathcal {H}}\) and \({\mathcal {S}}\) are the Hermitian and skew-Hermitian parts of \({\mathcal {A}}\), respectively. The corresponding HSS preconditioner for (1) is then given by

$$\begin{aligned} {\mathcal {P}}_{HSS}=\frac{1}{2\alpha }(\alpha I+{\mathcal {H}})(\alpha I+{\mathcal {S}}), \end{aligned}$$
(2)

where \(\alpha >0\). In Zhang (2018), it is stated that the contraction factor can be different if the Hermitian and skew-Hermitian parts are interchanged.

Motivated by this finding, Zhang proposes an efficient variant of \({\mathcal {P}}_{HSS}\) by swapping \(\alpha I+{\mathcal {H}}\) and \(\alpha I+{\mathcal {S}}\) in (2). The resulting preconditioner EVHSS reads as

$$\begin{aligned} {\mathcal {P}}_{\mathrm{EVHSS}}=\frac{1}{\alpha }\begin{bmatrix} A&{}\quad B^T\\ -B &{}\quad \alpha I\end{bmatrix} \begin{bmatrix} \alpha I &{}\quad 0\\ 0 &{}\quad \alpha I+C\end{bmatrix}= \begin{bmatrix} A &{}\quad B^T(I+\frac{1}{\alpha }C) \\ -B &{}\quad \alpha I+C\end{bmatrix}. \end{aligned}$$
(3)

The difference between \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {A}}\) is given by

$$\begin{aligned} {\mathcal {R}}_{\mathrm{EVHSS}}={\mathcal {P}}_{\mathrm{EVHSS}}-{\mathcal {A}}=\begin{bmatrix} 0&{}\quad \frac{1}{\alpha }B^TC\\ 0&{}\quad \alpha I\end{bmatrix}. \end{aligned}$$
(4)

Based on the matrix splitting (4), Zhang constructs the EVHSS iteration by

$$\begin{aligned} x^{(k+1)}=Gx^{(k)}+{\tilde{c}}, \end{aligned}$$

where \(G={\mathcal {P}}_{\mathrm{EVHSS}}^{-1}{\mathcal {R}}_{\mathrm{EVHSS}}\) and \({\tilde{c}}={\mathcal {P}}_{\mathrm{EVHSS}}^{-1}b\).
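To fix ideas, the following small sketch assembles \({\mathcal {P}}_{\mathrm{EVHSS}}\) from the two factors in (3) and checks the difference matrix (4) numerically. It is written in Python/NumPy purely for illustration (the experiments later in the paper use MATLAB); the sizes, the value of \(\alpha \) and the random test matrices are assumptions, not data from the paper.

```python
import numpy as np

# Minimal check of (3)-(4) on a small random instance; sizes, alpha and the
# test matrices are illustrative assumptions only.
rng = np.random.default_rng(0)
n, m, alpha = 6, 3, 0.5
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)   # symmetric positive definite
B = rng.standard_normal((m, n))                                 # full row rank (generically)
N = rng.standard_normal((m, m)); C = N @ N.T                    # symmetric positive semidefinite

Amat = np.block([[A, B.T], [-B, C]])                            # coefficient matrix in (1)
factor1 = np.block([[A, B.T], [-B, alpha * np.eye(m)]])
factor2 = np.block([[alpha * np.eye(n), np.zeros((n, m))],
                    [np.zeros((m, n)), alpha * np.eye(m) + C]])
P_evhss = factor1 @ factor2 / alpha                             # preconditioner (3)

R_evhss = np.block([[np.zeros((n, n)), B.T @ C / alpha],
                    [np.zeros((m, n)), alpha * np.eye(m)]])     # difference matrix (4)
print(np.allclose(P_evhss - Amat, R_evhss))                     # expected: True
```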

2.2 A new iteration method

In Zhang (2018), the EVHSS preconditioner is shown to outperform some other existing HSS-based preconditioners for appropriate choices of \(\alpha \). In practice, however, one needs to strike a balance between the values of \(\alpha \) and \(1/\alpha \), as observed from (4): a sufficiently large \(\alpha \) drives the (1,2)-block toward the zero matrix but produces a (2,2)-block with large entries in (4), and vice versa. In this work, we sidestep this problem by simply abandoning the (2,2)-block \(\alpha I\) in (4), which yields the new preconditioner

$$\begin{aligned} {\mathcal {P}}_{\mathrm{new}}=\begin{bmatrix} A &{}\quad B^T(\frac{1}{\alpha }C+I)\\ -B &{}\quad C\end{bmatrix}, \end{aligned}$$
(5)

where \(\alpha >0\). We note that another variant can be obtained by letting the (1,2)-block in (4) be zero. Nevertheless, we only consider the variant in (5) since it is efficient and easier to analyze theoretically. The corresponding difference matrix between \({\mathcal {P}}_{\mathrm{new}}\) and \({\mathcal {A}}\) is then

$$\begin{aligned} {\mathcal {R}}_{\mathrm{new}}={\mathcal {P}}_{\mathrm{new}}-{\mathcal {A}}= \begin{bmatrix} 0&{}\quad \frac{1}{\alpha }B^TC \\ 0&{}\quad 0\end{bmatrix}. \end{aligned}$$
(6)

Hence, the stationary iteration scheme induced by the splitting (6) reads

$$\begin{aligned} \begin{bmatrix} A &{}\quad B^T(\frac{1}{\alpha }C+I)\\ -B &{}\quad C\end{bmatrix}\begin{bmatrix} x^{(k+1)}\\ y^{(k+1)}\end{bmatrix} =\begin{bmatrix} 0 &{}\quad \frac{1}{\alpha }B^TC \\ 0 &{}\quad 0\end{bmatrix}\begin{bmatrix} x^{(k)}\\ y^{(k)}\end{bmatrix}+b, \end{aligned}$$

or, in compact form,

$$\begin{aligned} x^{(k+1)}=\varGamma x^{(k)}+c, \end{aligned}$$
(7)

where \(\varGamma ={\mathcal {P}}^{-1}_{\mathrm{new}}{\mathcal {R}}_{\mathrm{new}}\), \(c={\mathcal {P}}_{\mathrm{new}}^{-1}b\) and \(k=0, 1, \ldots \).
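As a concrete illustration, the following minimal NumPy sketch runs the stationary iteration (7) on a small randomly generated instance of (1); all sizes, the value of \(\alpha \) and the test matrices are assumptions made for illustration only, not the test problems used later in Sect. 4.

```python
import numpy as np

# Minimal dense sketch of the stationary iteration (7) on a random instance of (1).
rng = np.random.default_rng(0)
n, m, alpha = 8, 4, 10.0
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)    # symmetric positive definite
B = rng.standard_normal((m, n))                                  # full row rank (generically)
N = rng.standard_normal((m, m)); C = N @ N.T                     # symmetric positive semidefinite

Amat = np.block([[A, B.T], [-B, C]])                             # coefficient matrix in (1)
Pnew = np.block([[A, B.T @ (C / alpha + np.eye(m))], [-B, C]])   # preconditioner (5)
Rnew = Pnew - Amat                                               # splitting matrix (6)

b = rng.standard_normal(n + m)
x = np.zeros(n + m)
for k in range(50):                                              # iteration (7)
    x = np.linalg.solve(Pnew, Rnew @ x + b)
print(np.linalg.norm(Amat @ x - b) / np.linalg.norm(b))          # relative residual
```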

It is known that the iteration (7) converges if the spectral radius of the iteration matrix \(\varGamma \) is less than 1, which is guaranteed by the following theorem.

Theorem 1

Suppose that the matrices A, B and C are defined in (1), and \(\alpha \) is a positive constant. Let \((\theta _i,v_i)\) be the ith eigenpair of the matrix \(\alpha ^{-1}S^{-1}BA^{-1}B^TC\), where \(S=C+BA^{-1}B^T(\alpha ^{-1}C+I)\). Then \(\rho (\varGamma )<1\), that is, the iterative scheme (7) converges to the unique solution of (1) unconditionally.

Proof

The new preconditioner \({\mathcal {P}}_{\mathrm{new}}\) in (5) can be factorized as

$$\begin{aligned} \begin{aligned} {\mathcal {P}}_{\mathrm{new}}&=\begin{bmatrix} A &{}\quad 0\\ 0 &{}\quad I\end{bmatrix} \begin{bmatrix} I &{}\quad A^{-1}B^T(\frac{1}{\alpha }C+I)\\ -B &{}\quad C\end{bmatrix} \\&=\begin{bmatrix} A &{} \quad 0\\ 0 &{}\quad I\end{bmatrix} \begin{bmatrix} I &{}\quad 0\\ -B &{}\quad I\end{bmatrix} \begin{bmatrix} I &{}\quad 0\\ 0 &{}\quad S\end{bmatrix} \begin{bmatrix} I &{}\quad A^{-1}B^T(\frac{1}{\alpha }C+I)\\ 0 &{}\quad I\end{bmatrix}, \end{aligned} \end{aligned}$$
(8)

where \(S=C+BA^{-1}B^T(\frac{1}{\alpha }C+I)\). As a result, we have

$$\begin{aligned} \begin{aligned} {\mathcal {P}}_{\mathrm{new}}^{-1}&=\begin{bmatrix} I &{}\quad -A^{-1}B^T(\frac{1}{\alpha }C+I)\\ 0 &{}\quad I\end{bmatrix} \begin{bmatrix} I &{}\quad 0\\ 0 &{}\quad S^{-1}\end{bmatrix} \begin{bmatrix} I &{}\quad 0\\ B &{}\quad I\end{bmatrix} \begin{bmatrix} A^{-1} &{}\quad 0\\ 0 &{}\quad I\end{bmatrix}. \end{aligned} \end{aligned}$$
(9)

Using (7) and (9), we rewrite the iteration matrix \(\varGamma \) as

$$\begin{aligned} \varGamma ={\mathcal {P}}^{-1}_{\mathrm{new}}{\mathcal {R}}_{\mathrm{new}}= \begin{bmatrix} 0&{}\quad T_1\\ 0&{}\quad T_2\end{bmatrix}, \end{aligned}$$
(10)

where \(T_1=\alpha ^{-1}A^{-1}B^TC-\alpha ^{-1}A^{-1}B^T(\alpha ^{-1}C+I)S^{-1}BA^{-1}B^TC\) and \(T_2=\alpha ^{-1}S^{-1}BA^{-1}B^TC\). From (10), it is clear that \(\varGamma \) has the eigenvalue 0 with multiplicity n while the remaining eigenvalues of \(\varGamma \) are given by those of \(T_2\). Then it suffices to look into the eigenvalues of \(T_2\). Let \((\theta _i,v_i)\) be the ith eigenpair of the matrix \(T_2\), i.e.,

$$\begin{aligned} BA^{-1}B^TCv_i=\alpha \theta _i Sv_i. \end{aligned}$$
(11)

Premultiplying both sides of (11) by \({v_i^*}/({v_i^*v_i})\) yields

$$\begin{aligned} \frac{v_i^*BA^{-1}B^TCv_i}{v_i^*v_i}=\alpha \theta _i \frac{v_i^*Sv_i}{v_i^*v_i}. \end{aligned}$$
(12)

By the definition of S, we obtain from (12) that

$$\begin{aligned} \frac{v_i^*BA^{-1}B^TCv_i}{v_i^*v_i}=\theta _i\left( \alpha \frac{v_i^*Cv_i}{v_i^*v_i}+\frac{v_i^*BA^{-1}B^TCv_i}{v_i^*v_i}+\alpha \frac{v_i^*BA^{-1}B^Tv_i}{v_i^*v_i}\right) . \end{aligned}$$
(13)

Denote

$$\begin{aligned} \mu _i=\frac{v_i^*Cv_i}{v_i^*v_i}, \varepsilon _i+\eta _i \iota =\frac{v_i^*BA^{-1}B^TCv_i}{v_i^*v_i}, \gamma _i=\frac{v_i^*BA^{-1}B^Tv_i}{v_i^*v_i}, \end{aligned}$$
(14)

where \(\iota \) is the imaginary unit. It follows from (13) and (14) that

$$\begin{aligned} \theta _i=\frac{\varepsilon _i+\eta _i \iota }{\varepsilon _i+\eta _i \iota +\alpha (\mu _i+\gamma _i)}. \end{aligned}$$

Therefore, the spectral radius of the iteration matrix \(\varGamma \) is given by

$$\begin{aligned} \rho (\varGamma )=\max _{1\le i\le m}|\theta _i| =\max _{1\le i\le m}\sqrt{\frac{\varepsilon _i^2+\eta _i^2}{(\varepsilon _i+\alpha (\mu _i+\gamma _i))^2+\eta _i^2}}. \end{aligned}$$

Since A is symmetric positive definite, B is of full rank, and C is symmetric positive semidefinite, we obtain from (14) that \(\gamma _i>0\) and \(\mu _i\ge 0\), which, coupled with \(\alpha >0\), shows that \( \rho (\varGamma )<1\). Therefore, the new iteration converges to the unique solution of (1) unconditionally. \(\square \)
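As a numerical sanity check of Theorem 1 and of the quantities in (14), the following self-contained sketch builds \(T_2\) for a random instance, reconstructs each eigenvalue \(\theta _i\) from \(\mu _i\), \(\varepsilon _i+\eta _i\iota \) and \(\gamma _i\), and confirms that \(\rho (\varGamma )<1\); the test matrices and sizes are illustrative assumptions.

```python
import numpy as np

# Numerical illustration of Theorem 1 on a small random instance of (1).
rng = np.random.default_rng(1)
n, m, alpha = 8, 4, 1.0
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)     # symmetric positive definite
B = rng.standard_normal((m, n))                                   # full row rank
N = rng.standard_normal((m, m)); C = N @ N.T                      # symmetric positive semidefinite

W  = B @ np.linalg.solve(A, B.T)                                  # B A^{-1} B^T
S  = C + W @ (C / alpha + np.eye(m))                              # S as defined in Theorem 1
T2 = np.linalg.solve(S, W @ C) / alpha                            # nonzero part of Gamma, see (10)
theta, V = np.linalg.eig(T2)

errs = []
for th, v in zip(theta, V.T):                                     # verify the formula for theta_i
    mu  = (v.conj() @ C @ v) / (v.conj() @ v)                     # mu_i in (14)
    eg  = (v.conj() @ W @ C @ v) / (v.conj() @ v)                 # epsilon_i + i*eta_i in (14)
    gam = (v.conj() @ W @ v) / (v.conj() @ v)                     # gamma_i in (14)
    errs.append(abs(th - eg / (eg + alpha * (mu + gam))))
print(max(errs))                                                  # tiny, up to eigensolver accuracy
print(max(abs(theta)) < 1.0)                                      # rho(Gamma) < 1, as asserted
```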

3 A new preconditioner and its properties

The convergence of stationary methods (including the classical Jacobi, Gauss–Seidel and SOR iterations) can be slow when compared with the more sophisticated Krylov subspace methods. For this reason, we will not elaborate on the implementation details of the new iteration scheme (7). Instead, we are interested in exploiting the matrix \({\mathcal {P}}_{\mathrm{new}}\) as a preconditioner to accelerate Krylov subspace methods, which is the main task of this section.

3.1 Spectrum of the preconditioned matrix

It is known that the convergence behavior of preconditioned iterative methods relies heavily on eigen-information of the preconditioned matrix. Therefore, we investigate the eigenvalue distribution of the preconditioned matrix \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) in this subsection.

We begin the discussion by considering the eigenvalue problem

$$\begin{aligned} \begin{bmatrix} A&{}\quad B^T\\ -B&{}\quad C\end{bmatrix}\begin{bmatrix} A &{}\quad B^T(\frac{1}{\alpha }C+I)\\ -B &{}\quad C\end{bmatrix}^{-1} \begin{bmatrix} p\\ q\end{bmatrix} =\lambda \begin{bmatrix} p\\ q\end{bmatrix}. \end{aligned}$$
(15)

The following result describes the eigenvalue distribution of the preconditioned matrix.

Theorem 2

The eigenvalues of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) are either 1 (with multiplicity at least n) or of the form

$$\begin{aligned} \lambda _i=\frac{\alpha (\mu _i+\gamma _i)}{\alpha (\mu _i+\gamma _i)+\varepsilon _i+\eta _i \iota }, \end{aligned}$$
(16)

where \(\mu _i\), \(\gamma _i\), \(\varepsilon _i\) and \(\eta _i\) are defined in (14).

Proof

Since the right-preconditioned matrix \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) is similar to its left-preconditioned counterpart \({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}\), it suffices to examine the eigenvalues of \({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}\). By using (10), we have

$$\begin{aligned} {\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}={\mathcal {P}}_{\mathrm{new}}^{-1}({\mathcal {P}}_{\mathrm{new}}-{\mathcal {R}}_{\mathrm{new}})=I-\varGamma =\begin{bmatrix} I &{}\quad -T_1\\ 0 &{}\quad I-T_2\end{bmatrix}, \end{aligned}$$
(17)

where \(T_1\), \(T_2\) are defined in (10). It follows from (11) and (14) that the eigenvalues of \(I-T_2\) are of the form

$$\begin{aligned} \lambda _i=\frac{\alpha (\mu _i+\gamma _i)}{\alpha (\mu _i+\gamma _i)+\varepsilon _i+\eta _i \iota }, \end{aligned}$$

which completes the proof.\(\square \)

Remark 1

As shown in (16), the nonunit eigenvalues \(\lambda _i\) approach 1 if \(\alpha \) is sufficiently large. Under this condition, all nonunit eigenvalues of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) cluster around the point (1, 0), which is appealing for the convergence of Krylov subspace methods; see the numerical examples in Sect. 4.
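A toy computation along the lines of Remark 1 (on a random instance rather than the Stokes problems of Sect. 4) shows the spread of the eigenvalues of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) around 1 shrinking as \(\alpha \) grows; the sizes and matrices below are illustrative assumptions.

```python
import numpy as np

# Toy illustration of Remark 1: the spread of eigenvalues of A*Pnew^{-1}
# around the point (1, 0) typically shrinks as alpha grows.
rng = np.random.default_rng(2)
n, m = 8, 4
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
N = rng.standard_normal((m, m)); C = N @ N.T
Amat = np.block([[A, B.T], [-B, C]])

for alpha in (0.01, 1.0, 100.0):
    Pnew = np.block([[A, B.T @ (C / alpha + np.eye(m))], [-B, C]])
    lam  = np.linalg.eigvals(Amat @ np.linalg.inv(Pnew))
    print(alpha, max(abs(lam - 1.0)))        # maximal distance of the spectrum from 1
```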

3.2 Right-preconditioned GMRES

In this subsection, we employ the new preconditioner to speed up the convergence of GMRES (Saad and Schultz 1986). The implementation details are given in Algorithm 1. With right preconditioning, the resulting Krylov subspace now becomes

$$\begin{aligned} {\mathcal {K}}_k({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1},b)= \mathrm {span}\{b, {\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}b,\ldots ,( {\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1})^{k-1}b\}. \end{aligned}$$
(18)
Algorithm 1
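For illustration, right preconditioning as in Algorithm 1 can also be realized with off-the-shelf software by handing GMRES the explicitly preconditioned operator \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) and undoing the preconditioning afterwards. The following Python/SciPy sketch does exactly that on a random instance; the sizes and matrices are assumptions for illustration, the dense solve stands in for Algorithm 3, and the paper's own experiments use MATLAB instead.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

# Right-preconditioned restarted GMRES: solve (A*Pnew^{-1}) t = b, then u = Pnew^{-1} t.
rng = np.random.default_rng(3)
n, m, alpha = 40, 15, 100.0
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
N = rng.standard_normal((m, m)); C = N @ N.T
Amat = np.block([[A, B.T], [-B, C]])
Pnew = np.block([[A, B.T @ (C / alpha + np.eye(m))], [-B, C]])
b = rng.standard_normal(n + m)

def apply_P_inv(r):
    # Stand-in for Algorithm 3: a dense solve with Pnew, for illustration only.
    return np.linalg.solve(Pnew, r)

op = LinearOperator((n + m, n + m), matvec=lambda t: Amat @ apply_P_inv(t))
t, info = gmres(op, b, restart=10, maxiter=200)     # info == 0 signals convergence
u = apply_P_inv(t)                                  # undo the right preconditioning
print(info, np.linalg.norm(Amat @ u - b) / np.linalg.norm(b))
```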

It is known that GMRES converges very fast if a good approximation to the exact solution can be found in a low-dimensional subspace. The next result reveals the intrinsic relation between the dimension of the Krylov subspace \({\mathcal {K}}({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1},b)\) and the size m of the (2,2)-block matrix in \({\mathcal {A}}\).

Theorem 3

The minimal polynomial of the preconditioned matrix \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) has a degree not exceeding \(m+1\). Thus, the dimension of the corresponding Krylov subspace \({\mathcal {K}}({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}, b)\) is at most \(m+1\).

Proof

From (17), we know that

$$\begin{aligned} {\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}= \begin{bmatrix} I &{}\quad -T_1\\ 0 &{}\quad I-T_2 \end{bmatrix}. \end{aligned}$$
(19)

Since \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}={\mathcal {P}}_{\mathrm{new}}({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}){\mathcal {P}}_{\mathrm{new}}^{-1}\), the matrices \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) and \({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}\) are similar. Similar matrices have the same minimal polynomial (Horn and Johnson 1990, Corollary 3.3.3). Therefore, it suffices to prove that the minimal polynomial of \({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}\) has degree at most \(m+1\). It follows from Theorem 2 that the characteristic polynomial of \({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}\) is given by

$$\begin{aligned} ({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-I)^n\cdot \prod \limits _{i=1}^m({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-\lambda _iI), \end{aligned}$$
(20)

where \(\lambda _i\) is the nonunit eigenvalue defined in (16). By expanding the polynomial \(({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-I)\cdot \prod \nolimits _{i=1}^m({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-\lambda _iI)\), we have

$$\begin{aligned} ({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-I)\cdot \prod ^m_{i=1}({\mathcal {P}}_{\mathrm{new}}^{-1}{\mathcal {A}}-\lambda _iI)= \begin{bmatrix} 0&{}\quad -T_1\prod ^m_{i=1}[(1-\lambda _i)I-T_2]\\ 0&{}\quad -T_2\prod ^m_{i=1}[(1-\lambda _i)I-T_2]\end{bmatrix}, \end{aligned}$$

where \(1-\lambda _i\) are the eigenvalues of \(T_2\) for \(i=1,\dots ,m\); see Theorem 2. By the Cayley–Hamilton theorem, we have

$$\begin{aligned} \prod ^m_{i=1}[(1-\lambda _i)I-T_2]=0. \end{aligned}$$

In other words, the degree of the minimal polynomial of \({\mathcal {P}}^{-1}_{\mathrm{new}}{\mathcal {A}}\) is at most \(m+1\). Consequently, the degree of the minimal polynomial of the right-preconditioned matrix \({\mathcal {A}}{\mathcal {P}}^{-1}_{\mathrm{new}}\) is at most \(m+1\). From (Saad 2003, Proposition 6.1), the dimension of the corresponding Krylov subspace \({\mathcal {K}}({\mathcal {A}}{\mathcal {P}}^{-1}_{\mathrm{new}}, b)\) cannot exceed the degree of the minimal polynomial, which completes the proof. \(\square \)
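The statement of Theorem 3 can be checked numerically on a small random instance: the Krylov vector \(({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1})^{m+1}b\) should already lie (up to rounding) in the span of the preceding \(m+1\) Krylov vectors. A minimal sketch, with assumed sizes and matrices, follows.

```python
import numpy as np

# Illustration of Theorem 3: the (m+2)-nd Krylov vector is (numerically) a linear
# combination of the first m+1 ones.  Random instance with assumed sizes.
rng = np.random.default_rng(4)
n, m, alpha = 30, 10, 1.0
Mt = rng.standard_normal((n, n)); A = Mt @ Mt.T + n * np.eye(n)
B = rng.standard_normal((m, n))
N = rng.standard_normal((m, m)); C = N @ N.T
Amat = np.block([[A, B.T], [-B, C]])
Pnew = np.block([[A, B.T @ (C / alpha + np.eye(m))], [-B, C]])

G = Amat @ np.linalg.inv(Pnew)                            # right-preconditioned matrix
b = rng.standard_normal(n + m)
K = np.column_stack([np.linalg.matrix_power(G, j) @ b for j in range(m + 1)])
w = np.linalg.matrix_power(G, m + 1) @ b
coef = np.linalg.lstsq(K, w, rcond=None)[0]               # best representation of w in span(K)
print(np.linalg.norm(K @ coef - w) / np.linalg.norm(w))   # close to zero
```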

Remark 2

As stressed in Remark 1, the nonunit eigenvalues \(\lambda _i\) approach 1 if the value of \(\alpha \) is sufficiently large. Therefore, a further reduction in the number of iteration steps (to far fewer than \(m+1\)) can be anticipated if some nonunit eigenvalues \(\lambda _i\) approximate 1 well. This is confirmed by the numerical examples in Sect. 4.

Now let us have a closer look at Algorithm 1. The major difference between Algorithm 1 and its unpreconditioned counterpart (Saad and Schultz 1986) is that an extra matrix-vector product involving \({\mathcal {P}}^{-1}_{\mathrm{new}}\) in line 3 (and thus line 12) needs to be computed. To this end, a naive approach is to compute \({\mathcal {P}}_{\mathrm{new}}^{-1}\) explicitly and then perform the resulting matrix-vector product. However, this is not advisable numerically and is even prohibitive in most situations. Instead, the product is usually obtained by solving an equivalent linear system. For example, whenever \({\mathcal {P}}_{\mathrm{new}}^{-1}r\) is needed, we solve z from the linear system \({\mathcal {P}}_{\mathrm{new}}z=r\), where \(r=(r_a^T,r_b^T)^T\) and \(z=(z_a^T,z_b^T)^T\). Using the decomposition (9), we transform the problem of computing \(z={\mathcal {P}}_{\mathrm{new}}^{-1}r\) into the following form:

$$\begin{aligned} \left[ \begin{array}{c} z_a\\ z_b\\ \end{array} \right] = \left[ \begin{array}{cc} I &{}\quad -A^{-1}B^T(\frac{1}{\alpha }C+I)\\ 0&{}\quad I\\ \end{array} \right] \left[ \begin{array}{cc} I &{}\quad 0\\ 0 &{}\quad S^{-1}\\ \end{array} \right] \left[ \begin{array}{cc} I &{}\quad 0\\ B &{}\quad I\\ \end{array} \right] \left[ \begin{array}{cc} A^{-1} &{}\quad 0\\ 0 &{}\quad I\\ \end{array} \right] \left[ \begin{array}{c} r_a\\ r_b\\ \end{array} \right] , \end{aligned}$$
(21)

whose implementation details are presented in Algorithm 2.

Algorithm 2

Algorithm 2 shows clearly how operations like \({\mathcal {P}}_{\mathrm{new}}^{-1}r\) are done. From a numerical point of view, however, the process is not optimized and can be improved to be more efficient; for instance, the intermediate vectors \(v^{(1)}\) (Step 1), \(u^{(2)}\), \(v^{(2)}\) (Step 2) and \(u^{(3)}\) (Step 4) need not be stored. In light of this, we give a practical version of Algorithm 2 which is employed in our numerical experiments; see Algorithm  3.

Algorithm 3
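To make the application of \({\mathcal {P}}_{\mathrm{new}}^{-1}\) concrete, the following dense sketch applies the four factors in (21) in sequence. It is illustrative only: in practice A and S are sparse, S (or a factorization of it) would be formed once and reused across iterations, and the inner solves would be carried out by sparse direct or iterative solvers rather than dense ones.

```python
import numpy as np

def apply_pnew_inv(A, B, C, alpha, r):
    """Return z = Pnew^{-1} r via the block factorization (9)/(21) (dense sketch)."""
    n, m = A.shape[0], C.shape[0]
    ra, rb = r[:n], r[n:]
    ua = np.linalg.solve(A, ra)                              # apply [A^{-1} 0; 0 I]
    vb = B @ ua + rb                                         # apply [I 0; B I]
    S  = C + B @ np.linalg.solve(A, B.T @ (C / alpha + np.eye(m)))   # recomputed here for self-containment
    zb = np.linalg.solve(S, vb)                              # apply [I 0; 0 S^{-1}]
    za = ua - np.linalg.solve(A, B.T @ ((C / alpha + np.eye(m)) @ zb))   # back-substitution
    return np.concatenate([za, zb])

# Quick consistency check on a random instance (illustrative sizes).
rng = np.random.default_rng(5)
n, m, alpha = 8, 4, 1.0
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
N = rng.standard_normal((m, m)); C = N @ N.T
Pnew = np.block([[A, B.T @ (C / alpha + np.eye(m))], [-B, C]])
r = rng.standard_normal(n + m)
print(np.linalg.norm(Pnew @ apply_pnew_inv(A, B, C, alpha, r) - r))   # ~ 0
```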

3.3 Inexact variants

In Algorithm 3, the main computational overhead consists of solving three linear systems associated with two different coefficient matrices. In particular, one needs to solve a sub-linear system with the Schur complement coefficient matrix \(S=C +BA^{-1}B^T(\alpha ^{-1}C+I)\). In the context of solving saddle point linear systems, this can be done either directly or iteratively; see, for instance, (Cao et al. 2015; Rozložník 2018). Alternatively, an efficient work-around is to replace A with an approximation \({\tilde{A}}\) that is easier to invert. Many candidates for approximating A are available; for instance, A can be approximated by its diagonal/tridiagonal part or by matrix factors derived from an incomplete LU factorization. Thus, the approximate preconditioner and the associated factorization (reminiscent of (8)) can be written as

$$\begin{aligned} \widetilde{{\mathcal {P}}}_{\mathrm{new}}= & {} \left[ \begin{array}{cc} {\tilde{A}} &{}\quad 0\\ 0 &{}\quad I\\ \end{array} \right] \left[ \begin{array}{cc} I &{}\quad 0\\ -B &{}\quad I\\ \end{array} \right] \left[ \begin{array}{cc} I &{}\quad 0\\ 0 &{}\quad {\tilde{S}}\\ \end{array} \right] \left[ \begin{array}{cc} I &{}\quad {\tilde{A}}^{-1}B^T(\frac{1}{\alpha }C+I)\\ 0 &{}\quad I\\ \end{array} \right] \nonumber \\= & {} \left[ \begin{array}{cc} {\tilde{A}} &{}\quad B^T(\frac{1}{\alpha }C+I)\\ -B &{}\quad C\\ \end{array} \right] , \end{aligned}$$
(22)

where \({\tilde{S}}=C +B{\tilde{A}}^{-1}B^T(\alpha ^{-1}C+I)\) with \({\tilde{A}}\) being the approximation to A. It is easy to check that computing \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}r\) proceeds essentially as in Algorithm 3, with A replaced by \({\tilde{A}}\).
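As a toy illustration of (22), the sketch below compares the eigenvalue spread of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) and \({\mathcal {A}}\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}\) for the simplest choice \({\tilde{A}}=\mathrm {diag}(A)\) (the choice used later in Example 3); the random matrices and sizes are again assumptions made for illustration.

```python
import numpy as np

# Toy comparison of the exact preconditioner (5) and the inexact variant (22)
# with Atilde = diag(A); random test matrices, illustrative sizes only.
rng = np.random.default_rng(6)
n, m, alpha = 8, 4, 100.0
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
N = rng.standard_normal((m, m)); C = N @ N.T
Amat = np.block([[A, B.T], [-B, C]])

Atil = np.diag(np.diag(A))                                   # cheap approximation of A
for A_use, name in ((A, "exact"), (Atil, "inexact")):
    P = np.block([[A_use, B.T @ (C / alpha + np.eye(m))], [-B, C]])
    lam = np.linalg.eigvals(Amat @ np.linalg.inv(P))
    print(name, max(abs(lam - 1.0)))                         # spread of eigenvalues around 1
```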

The underlying idea of the aforementioned two approaches is either to solve the Schur complement linear systems with direct/iterative methods or to obtain an inexpensive approximation of the matrix A. In fact, we can also make use of the properties/structures of the saddle point matrix and develop efficient approximations to the Schur complement matrix itself. Several related approaches point in this direction. In the context of solving PDE-constrained optimization problems, Pearson and Wathen (2012) take \({\tilde{S}}=(K+1/\sqrt{\beta }M)M^{-1}(K+1/\sqrt{\beta }M)\) as the approximation of the Schur complement S, where the saddle point problem is of the following form:

$$\begin{aligned} \left[ \begin{array}{ccc} M &{}\quad 0&{}\quad K\\ 0&{}\quad \beta M&{}\quad -M\\ K&{}\quad -M&{}\quad 0 \end{array} \right] \left[ \begin{array}{c} y\\ u\\ p \end{array} \right] =\left[ \begin{array}{c} b\\ 0\\ d \end{array} \right] . \end{aligned}$$
(23)

It is validated in Pearson and Wathen (2012) that the eigenvalues of \({\tilde{S}}^{-1}S\) are contained in the interval [0.5, 1]. In Axelsson (2019), Axelsson further tailors the approximation \({\tilde{S}}\) given in Pearson and Wathen (2012) to more standard types of problems. However, we cannot apply this approximation to our problem straightforwardly, since the block matrices in (23) fail to satisfy the restrictions required in (1). One may also employ the Schur complement method (Rozložník 2018, p. 34) to solve the Schur complement linear systems. Nevertheless, it should be noted that constructing an efficient approximation to the Schur complement is still a challenge for general saddle point problems and deserves special treatment (Cao et al. 2019). Moreover, the focus of this subsection is on introducing a framework of inexact preconditioners, and a thorough analysis of Schur complement approximations is beyond the scope of this work. Therefore, we will not dwell on this topic and regard it as interesting future work. For more on approximations to the Schur complement from different engineering fields, we refer to Deuring (2009), Kay et al. (2002), Loghin and Wathen (2002), Olshanskii and Vassilevski (2007), Pearson and Wathen (2018) and references therein. Numerical examples in Sect. 4 indicate that the inexact preconditioner can be very practical if provided with suitable choices of \({\tilde{A}}\).

Fig. 1 Spectrum of \({\mathcal {A}}\) and \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) for \(32\times 32\) uniform (left) and stretched (right) grids

4 Numerical experiments

In this section, we present some numerical experiments to show the effectiveness of the new preconditioners for the generalized saddle point problem (1) in terms of the number of iteration steps/restarts (Iter), CPU time in seconds (CPU) and the relative residual norm (Res). The unpreconditioned GMRES with restarting (GMRES(k)) (Saad and Schultz 1986) and EVHSS (Zhang 2018) are used for comparison with the new preconditioners. For completeness, we record both the number of restarts and the number of inner steps in the last restart; see Tables 1, 2, 3, 4, 5. In what follows, the restarting frequency k is set to 10. The initial guess is chosen to be the zero vector, and algorithms are terminated if \(||r_i||_2/||b||_2<10^{-8}\) within 1500 restarts, where \(r_i=b-{\mathcal {A}}x_i\) is the ith residual. All experiments are run on a PC using MATLAB 2014b under the Windows 10 operating system.

We consider the generalized saddle point problems arising from the discretization of the Stokes equation

$$\begin{aligned} \left\{ \begin{array}{l} -\varDelta u+\nabla p=f, \\ \nabla \cdot u=0, \end{array} \right. \end{aligned}$$
(24)

in a bounded domain \(\varOmega \subset {\mathbb {R}}^2\) with suitable boundary conditions, where u is the velocity, p is the pressure, \(\varDelta \) represents the vector Laplacian operator, \(\nabla \) stands for the gradient, and \(\nabla \cdot u\) is the divergence of u. As in Zhang (2018), the test problems generated here are “leaky” two-dimensional lid-driven cavity problems. The stabilized \(Q1-P0\) finite element is employed for discretization on both uniform and stretched grids. The IFISS software package (Elman et al. 2014) is used to generate the linear systems (1). We follow the practice in Zhang (2018) by deleting the first two rows of the original matrix B generated by IFISS (and thus the first two rows and columns of C) so that the resulting matrix B is of full rank.

Example 1

This example serves to illustrate the eigenvalue clustering effect of the new preconditioner. As stated in Theorem 2, the eigenvalues of the preconditioned matrix \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) are either 1 or of the form (16). If the value of \(\alpha \) is sufficiently large, then the nonunit eigenvalues will approach 1; see Remark 1. Figure 1 depicts the eigenvalue distribution of the original saddle point matrix \({\mathcal {A}}\) and its right-preconditioned counterpart \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) for \(32\times 32\) grids of different types. It should be stressed that the five ticks 1.0000 on the x-axis in the lower-left subplot (\(\alpha =100\), \(32 \times 32\) uniform grid) are different from the value 1. In fact, the real parts of all eigenvalues (corresponding to the x-axis values) therein lie in the interval [0.999973966055229, 1] and are rounded to 1.0000 for display. We are now in a position to see the clustering effect of the new preconditioner. Take the left subplots in Fig. 1 as an example. Without preconditioning, the eigenvalues of the original matrix \({\mathcal {A}}\) are well spread between the two points (0,0) and (4,0). However, the eigenvalues become clustered even with a small value of \(\alpha \); see the subplot with \(\alpha =0.01\) on the left of Fig. 1. This clustering effect becomes more pronounced when \(\alpha \) is chosen to be moderately large, say \(\alpha =100\). A well-clustered spectrum is desirable for Krylov subspace methods. Apart from that, the conditioning of the original saddle point matrix \({\mathcal {A}}\) is also improved via the new preconditioner. For instance, the condition number of the original saddle point matrix \({\mathcal {A}}\) in the left subplot of Fig. 1 is \(1.013\times 10^6\), which indicates that the associated linear system is difficult for standard Krylov subspace methods. With the proposed preconditioner, however, the condition number decreases to \(1.125\times 10^4\) (with \(\alpha =0.01\)) and further to 1.011 (with \(\alpha =100\)). As suggested in Benzi’s survey (Benzi 2002, p. 420), a good preconditioner should be chosen such that the preconditioned coefficient matrix will have a smaller spectral condition number, and/or eigenvalues clustered around 1. In light of this, we conclude that the new preconditioner is appealing, and we shall look into how it affects the convergence of Krylov subspace methods in the following examples.

Example 2

The purpose of this example is threefold: (i) showing the accelerating effect of the new preconditioner when applied to the Krylov subspace method; (ii) analyzing the choice of the experimentally optimal parameter \(\alpha \) for both \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\); (iii) comparing the numerical results when the inner sub-linear systems of equations in Algorithm 3 are solved either directly or iteratively.

Let us begin by examining the accelerating effect of the new preconditioner when applied to the Krylov subspace method. Two other candidates, namely (unpreconditioned) GMRES (Saad and Schultz 1986) and GMRES preconditioned by EVHSS (Zhang 2018), are used for comparison. For the moment, the inner sub-linear systems of equations in Algorithm 3 are solved by a direct method. The numerical results are tabulated in Tables 1 and 2. It should be noted that the unpreconditioned GMRES fails to reach the required accuracy within 1500 restarts for most cases here. Therefore, we only display the results of \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) in the two tables. As observed from Tables 1 and 2, both \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) are superior to unpreconditioned GMRES. When \(\alpha \) is relatively small, the new preconditioner is on par with EVHSS regarding CPU time; sometimes EVHSS is slightly better than the new preconditioner or vice versa. As \(\alpha \) grows, however, the new preconditioner outperforms EVHSS for different grid sizes; for instance, EVHSS even fails to reach the prescribed accuracy within 1500 restarts (denoted by “–”) in Table 2 for the \(32\times 32\) stretched grid. As pointed out earlier, the quantity “Iter” includes both the number of restarts and the number of inner steps in the last restart. For instance, the notation “3(1)” denotes that EVHSS-preconditioned GMRES uses three restarts and the number of inner iterations in the last restart is 1. To put it another way, the total number of (inner) steps for EVHSS-preconditioned GMRES is 21 (i.e., \(2\times 10+1\)). We also note that GMRES preconditioned by \({\mathcal {P}}_{\mathrm{new}}\) often takes only one restart due to the favorable eigenvalue clustering of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\), which is echoed by Fig. 1 and Remark 2. A similar conclusion can also be drawn from Table 2.

Next we give an in-depth analysis of the influence of the parameter \(\alpha \) upon EVHSS and the new preconditioner. In Zhang (2018, Table 1), Zhang presents the experimentally optimal parameters to illustrate the potential of EVHSS. For fairness, we compare EVHSS (using the experimentally optimal parameters) with the new preconditioner. As stated in Remark 1, large values of \(\alpha \) often bring about a tight spectrum. However, this does not imply that we have to use unduly large values of \(\alpha \). Instead, a moderately large value, say \(\alpha =100\), is sufficient for practical use; see Fig. 1. To further justify our statement, we carry out more detailed comparisons between \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) with varying values of \(\alpha \) for different uniform grids in Figs. 2, 3, 4. As given in Zhang (2018, Table 1), the experimentally optimal parameters of EVHSS for the \(16\times 16\), \(32\times 32\), and \(64\times 64\) grids lie in the interval (0, 0.01). Therefore, we plot the iteration step and CPU time curves on different scales, that is, the intervals in which \(\alpha \) varies in Figs. 2, 3, 4 are, respectively, (0, 0.01], (0.01, 1] and (1, 100]. Some remarks are in order. Owing to a clustered spectrum, the new preconditioner requires fewer iteration steps than EVHSS for \(\alpha \) ranging from 1 to 100. In Fig. 2, EVHSS achieves its optimal performance in terms of the CPU time and is slightly better than the new preconditioner when \(\alpha \) varies from 0 to 0.01. However, the advantage of the new preconditioner dominates when \(\alpha \) is greater than 0.01; see Figs. 3, 4. Besides, there are many oscillations in the iteration step and CPU time curves of EVHSS, which implies that EVHSS is sensitive to the choice of \(\alpha \) (especially when \(\alpha \) becomes large). Fortunately, this undesirable property does not carry over to the new preconditioner. In this sense, the new preconditioner is more practical than EVHSS. To figure out the practical choice of \(\alpha \) for the new preconditioner, we take the \(32\times 32\) case (bottom subplots) in Fig. 4 as an example. In that case, the improvement regarding total iteration steps and CPU time becomes negligible for \(\alpha \) greater than 20. Thus, each value between 20 and 100 can be a reasonable choice of \(\alpha \) for the new preconditioner. In accordance with Remark 2 and Fig. 1, we adopt a relatively large value, namely \(\alpha =100\), as the experimentally optimal value for the new preconditioner. The results are listed in Table 3. In Zhang (2018), the restarting frequency in GMRES(k) is \(k=30\) (instead of \(k=10\) as chosen here) and the stopping tolerance is \(10^{-6}\) (instead of \(10^{-8}\) as chosen here), which explains the difference in the number of iterations between Zhang (2018, Tables 2–3) and Table 3. For most tests, the new preconditioner reaches a higher accuracy in less CPU time than EVHSS; see Table 3. This verifies the effectiveness of the new preconditioner.

Finally, we touch upon the choice of inner linear system solvers in Algorithm 3. Previously, we used a direct method, namely the MATLAB backslash command, to solve the inner sub-linear systems; the corresponding numerical results are tabulated in Table 3. However, we can also use iterative methods to obtain approximate solutions of these sub-linear systems. This inner iteration, together with the outer GMRES iteration, yields the well-known inner–outer iteration (Simoncini and Szyld 2003). Here we use the restarted GMRES method to solve the sub-linear systems with an inner tolerance \(10^{-4}\) and a maximum of 100 restarts. For fair comparison, the inner linear systems in EVHSS are also solved in an iterative manner. Choices of \(\alpha \) are the same as in Table 3. The results are presented in Table 4. Several remarks follow from comparing Table 4 with Table 3. Loosely speaking, the preconditioners \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) with a direct inner solver usually end up with a higher accuracy than their counterparts with an iterative inner solver, though there exist occasional exceptions. Besides, preconditioners with a direct inner solver appear more suitable for the stretched grids, while preconditioners with an iterative inner solver seem more appropriate for the uniform grids. Moreover, choosing an iterative or a direct inner solver has less impact on \({\mathcal {P}}_{\mathrm{new}}\) than on \({\mathcal {P}}_{\mathrm{EVHSS}}\) in terms of iteration steps and CPU time; see, for instance, the stretched cases in Tables 3 and 4. With either the direct or the iterative approach, the new preconditioner surpasses EVHSS regarding iteration steps and CPU time.

Table 1 Numerical results for Stokes equation with different uniform grids
Table 2 Numerical results for Stokes equation with different stretched grids
Fig. 2 Total iteration steps and CPU time of GMRES preconditioned by \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) with \(\alpha \) ranging from 0 to 0.01 for \(16\times 16\) (top) and \(32\times 32\) (bottom) uniform grids, respectively

Fig. 3 Total iteration steps and CPU time of GMRES preconditioned by \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) with \(\alpha \) ranging from 0.01 to 1 for \(16\times 16\) (top) and \(32\times 32\) (bottom) uniform grids, respectively

Fig. 4 Total iteration steps and CPU time of GMRES preconditioned by \({\mathcal {P}}_{\mathrm{EVHSS}}\) and \({\mathcal {P}}_{\mathrm{new}}\) with \(\alpha \) ranging from 1 to 100 for \(16\times 16\) (top) and \(32\times 32\) (bottom) uniform grids, respectively

Table 3 Numerical results of EVHSS and \({\mathcal {P}}_{\mathrm{new}}\) with practical optimal values of \(\alpha \), where the inner linear systems are solved by a direct method
Table 4 Numerical results of EVHSS and \({\mathcal {P}}_{\mathrm{new}}\) with practical optimal values of \(\alpha \), where the inner linear systems are solved by an iterative method

Example 3

Example 2 shows that the new preconditioner is competitive with EVHSS in speeding up the convergence of GMRES. As reported in Sect. 3.3 and Algorithm 3, however, a sub-linear system with the coefficient matrix S needs to be solved per iteration. If A is large and relatively dense, then solving the linear systems involving S can be rather time-consuming; this can be monitored by invoking the MATLAB profiling tool \(\texttt {Profiler}\). As noted in (22), this poses no difficulty if an inexpensive approximation to the matrix A in S is available. Several candidates are at our disposal, including the diagonal or triangular parts of A, or its incomplete LU factors (Bai et al. 2005b; Benzi et al. 2005).

This example serves to demonstrate the potential of the inexact preconditioner \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) when compared with the exact preconditioner \({\mathcal {P}}_{\mathrm{new}}\). For the moment, we only consider the simplest case by replacing A with its diagonal part, i.e., \({\tilde{A}}=\texttt {diag}(\texttt {diag}(A))\) in (22). This does not imply that the diagonal part is an optimal replacement for A. Numerical practice with other inexact variants, as well as finding efficient approximations of the Schur complement matrix, needs further investigation and is thus left as important future work. The numerical results are given in Table 5, where the inner linear systems are solved by a direct method. For coarse grids, as indicated in Table 5, the exact preconditioner \({\mathcal {P}}_{\mathrm{new}}\) is sometimes better than its inexact counterpart \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) (in CPU time) by a small margin or vice versa; see, for instance, the \(16\times 16\) case in Table 5. As the grids get finer, however, the inexact preconditioner becomes much better than the exact one in terms of CPU time, albeit with more iteration steps. This can be exemplified by the \(128\times 128\) grid with \(\alpha =100\), where the CPU time ratio of the inexact preconditioner to its exact counterpart is merely \(10.3\%\).

Another interesting observation is that the CPU time consumed by \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) is less sensitive to the grid size than that of EVHSS and \({\mathcal {P}}_{\mathrm{new}}\); cf. Tables 1, 2, 3, 4, and 5. This suggests that the inexact preconditioner is more appropriate for solving large-scale linear systems (1). Apart from that, it appears that the value of \(\alpha \) has much less impact on the inexact preconditioner than it does on the exact prototype. Since the spectrum of the preconditioned matrix is closely related to GMRES convergence, it is natural to ask: how well do the eigenvalues cluster after using an inexact preconditioner, and how does this affect convergence? To answer this question, we plot the eigenvalue distribution of \({\mathcal {A}}\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}\) in Fig. 5. As shown in Fig. 5, the eigenvalues of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) become more clustered as \(\alpha \) goes up, which in turn leads to a faster convergence regarding iteration steps. Nevertheless, this does not apply to the spectrum of \({\mathcal {A}}\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}\). In fact, the spectrum of \({\mathcal {A}}\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}\) does not change much even when \(\alpha \) increases from 0.01 to 100; see the right subplots in Fig. 5. This explains why the acceleration is not so remarkable for \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) when compared with \({\mathcal {P}}_{\mathrm{new}}\) in terms of iteration steps. However, since the computational overhead involved in using the inexact preconditioner \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) is far less than that of the exact preconditioner \({\mathcal {P}}_{\mathrm{new}}\), the CPU time of the former can be expected to be much less than that of the latter. In most practical situations, it is the CPU time that we are really concerned with. With this insight in hand, we conclude that \(\widetilde{{\mathcal {P}}}_{\mathrm{new}}\) can be an efficient alternative to \({\mathcal {P}}_{\mathrm{new}}\) when the linear system (1) is large and sparse.

Table 5 Numerical results for exact and inexact preconditioners on uniform grids
Fig. 5 Eigenvalue distributions of \({\mathcal {A}}{\mathcal {P}}_{\mathrm{new}}^{-1}\) and \({\mathcal {A}}\widetilde{{\mathcal {P}}}_{\mathrm{new}}^{-1}\) for \(32\times 32\) uniform grid

5 Conclusion

We present a fast HSS-type iterative scheme for solving the generalized saddle point problem. Theoretical results on unconditional convergence, eigenvalue distribution of the preconditioned saddle point matrix and the speed-up effect on GMRES have been established. Based on the new preconditioner, a framework for developing practical inexact preconditioners is also proposed. Numerical experiments show that the new preconditioner and its practical inexact variants are superior to EVHSS for most cases in terms of the CPU time. Future work includes constructing efficient approximations to the Schur complement for large, sparse and structured saddle point linear systems from different applied fields.