1 Introduction

Stokes equations appear in numerous applications such as incompressible fluid dynamics and some structural mechanics problems. Their discretization leads to linear systems whose matrix has a \(2\times 2\) block structure, in which the lower diagonal block (the one related to the pressure unknowns) is either a zero block or a block containing very small matrix elements.

This complicates the application of multigrid methods to the solution of these linear systems, since standard smoothing iterations such as the damped Jacobi or Gauss–Seidel methods may be either undefined or fail to converge. Nevertheless, there is a long tradition of solving Stokes equations with multigrid, and interesting approaches have been presented during the last 35 years, each of them characterized by the use of a specific smoother. This includes the Vanka smoother [34], in which the unknowns associated with a grid cell, namely the pressure and the velocities, are updated simultaneously. Another approach is called “distributive smoothing”, in which one first transforms the discrete system in such a way that standard Gauss–Seidel smoothing performs well on the transformed system [5, 39]. Inexact Uzawa type procedures (e.g. [2, Sect. 8.1]) have also been considered as smoothing iterations for multigrid schemes; see [14, 18].

It is worth noting that these approaches have rarely been considered in combination with algebraic multigrid (AMG) schemes (see, however, [20, 36, 37]). There are indeed two obstacles. On the one hand, it is part of the philosophy of AMG methods to fix the smoother to a simple scheme such as damped Jacobi or Gauss–Seidel, and to address any peculiarity of the linear system via the design of the prolongation operator. On the other hand, the lack of a proper diagonal block for the pressure unknowns does not permit the use of the “unknown-based” coarsening [9, 32], in which the prolongation operator is set up by considering separately the different types of unknowns (in the present case, velocity components and pressure). Another possible approach is “point-based” AMG [9, Sect. 3.4], in which a general coarsening scheme is set up for the discretization grid, and then used separately on each type of unknown. A difficulty here is that the discretization of Stokes equations often uses different grids for the different types of unknowns: staggered grids in the finite difference case [38], or different orders of accuracy for finite element discretizations [12].

In the present paper we develop the foundation of a novel approach that naturally overcomes these difficulties. Like distributive smoothing, it is based on a transformation of the original linear system. However, distributive smoothing considers the transformation only as a means to obtain convergent smoothers for the original system, whereas here we propose a transformation designed to allow the straightforward application of multigrid schemes as a whole.

The transformation consists in pre- and post-multiplication with simple, algebraic, sparse block triangular matrices. It provides a form of pre-conditioning in the literal sense, the multigrid scheme being afterwards applied to the transformed system. The transformed matrix is “well adapted” to multigrid firstly because all the diagonal blocks are symmetric and positive definite (SPD), and resemble, or correspond to, a discrete Laplace operator. Hence, it is straightforward to apply the “unknown-based” coarsening strategy. Secondly, we show that the approach is theoretically well founded by proving a uniform bound on the two-grid convergence factor associated with damped Jacobi smoothing for the global system, under the main assumption that the two-grid schemes for the different diagonal blocks are themselves uniformly convergent, a requirement easy to meet given that these blocks are discrete Laplace-like matrices.

Our results are general in that they are compatible with virtually any type of algebraic or even geometric coarsening scheme. The proposed approach can thus be combined with one's favorite method: classical AMG [6, 32], smoothed aggregation AMG [35], or plain (unsmoothed) aggregation AMG [22, 25]. On the other hand, we focus on rigorous convergence proofs, which goes along with some limitations: only two-grid schemes with Galerkin coarse grid matrices and a single step of damped Jacobi smoothing are covered. Of course, to fully validate a method, some practically oriented questions should be further investigated, such as: which smoothing scheme performs best in practice, which multigrid cycle is recommended, etc. However, a proper answer to these questions would likely depend on the type of coarsening used and possibly also on the class of Stokes problems under consideration. To keep the present study general, these questions are therefore deliberately left for future research. The results presented here are thus to be seen as a proof of concept for a class of methods, rather than a complete study of one particular instance.

To conclude this introduction, let us stress that the above remarks on the use of AMG schemes to solve discrete Stokes problems concern their direct application to the system of coupled PDEs. It goes without saying that the mentioned difficulties do not affect the use of AMG techniques within the framework of block preconditioning methods. These (see, e.g. [2, 11, 12, 29]) require approximations of the inverses of certain matrix blocks, and are in fact most effective when multigrid is used for this purpose. Here, one may apply AMG without particular difficulty because these matrix blocks are related to scalar Poisson-like problems. Therefore, when geometric multigrid cannot be used, this combination of block preconditioning with AMG is often the most effective option. It is also widely used in practice. Note, however, that when geometric multigrid can be used, the comparison developed in [16] tends to favor multigrid schemes applied to the coupled system. This gives strong motivation for the development of AMG schemes that can similarly be applied directly to the coupled system. In Sect. 6 below, we give a limited comparison of the approach proposed here with the most popular of these block preconditioners.

The remainder of this paper is organized as follows. In Sect. 2, we present the class of Stokes problems that serves as our motivation, and give some properties of the associated linear systems. The proposed approach is then described in Sect. 3, whereas its analysis is developed in Sect. 4. In Sect. 5, we discuss two particular examples, providing also some numerical illustrations. Some further numerical results, including a comparison with block diagonal preconditioning, are provided in Sect. 6. Concluding remarks are given in Sect. 7.

1.1 Notation

For any (possibly complex) vector \(\mathbf{v}\), \(\mathbf{v}^T\) is its transpose, \(\mathbf{v}^*\) its conjugate transpose, \(\Vert \mathbf{v}\Vert \) is its 2-norm and, for an arbitrary SPD matrix \(G\), \(\Vert \mathbf{v}\Vert _G\) is the associated energy norm: \(\Vert \mathbf{v}\Vert _G=\sqrt{\mathbf{v}^*G\,\mathbf{v}}\). For any square matrix \(C\), \(\rho (C)\) is its spectral radius (i.e., the maximum of the moduli of its eigenvalues) and \(\sigma (C)\) is its spectrum. If \(C\) has only real eigenvalues (e.g., if it is similar to a symmetric matrix), \(\lambda _{\min }(C)\) and \(\lambda _{\max }(C)\) further stand for its smallest and largest eigenvalue, respectively. Finally, inequalities between square symmetric matrices are to be understood in the nonnegative definite sense: \(C_1\le C_2\) if and only if \(C_2-C_1\) is nonnegative definite.

2 Stokes equations and their discretization

We consider the following problem: find the velocity vector \({\mathbf u}\) and the pressure field \(p\) satisfying

$$\begin{aligned} \xi {\mathbf u}-\nu \Delta {\mathbf u} + \nabla p&= {\mathbf f}, \quad \hbox {in } \Omega , \nonumber \\ \nabla \cdot {\mathbf u}&= 0, \quad \hbox {in } \Omega , \end{aligned}$$
(2.1)

and appropriate boundary conditions on \(\partial \Omega \). In (2.1), \(\Omega \) is a bounded domain of \(\mathbb {R}^2\) or \(\mathbb {R}^3\), \({\mathbf f}\) represents a prescribed force, and the parameters \(\nu >0\) (viscosity) and \(\xi \ge 0\) are given. The latter is often a quantity proportional to the inverse of the time-step in an implicit time integration method applied to a nonstationary Stokes problem; \(\xi =0\) corresponds to the classical stationary Stokes problem.

With most schemes, the discretization of (2.1) leads to a linear system of the form

$$\begin{aligned} \mathcal{A}\begin{pmatrix}\mathbf{u}\\ \mathbf{p}\end{pmatrix}= \begin{pmatrix}\mathbf{b}_{\mathbf{u}} \\ \mathbf{b}_{\mathbf{p}}\end{pmatrix}\end{aligned}$$
(2.2)

where

$$\begin{aligned} \mathcal{A}= \begin{pmatrix}A &{}\quad B^T \\ B &{}\quad -C\end{pmatrix}. \end{aligned}$$
(2.3)

In this matrix, \(A\) is the discrete representation of the operator \(\xi -\nu \Delta \); more precisely, \(A\) is block diagonal with one diagonal block per spatial dimension, each block being the discrete operator acting on one of the velocity components. It follows that \(A\) is SPD. The matrix block \(B^T\) is the discrete gradient and \((-B)\) the discrete divergence; \(C\) is a stabilization term which is needed by some discretization schemes to avoid spurious solutions. Such spurious solutions arise when the discrete gradient admits more than the constant vector in its null space or near null space; i.e., when the discrete gradient is zero or near zero for some spurious pressure modes. The existence of such modes depends on which discretization scheme is used for velocities and pressure. We refer to, e.g. [12, 38] for more details on, respectively, finite element and finite difference discretizations. Note a required property of the stabilization term: if \(B\) is not of full rank, \(C\) has to be positive definite on the null space of \(B^T\), which further entails that the system matrix is nonsingular [2].

An important exception to this latter rule arises when the boundary conditions are such that the physical pressure is only determined up to a constant. In such cases some additional condition is needed to make the problem well-posed. Often, one imposes that the mean pressure is equal to zero. The discrete system is then singular but compatible, and, regarding iterative methods, one has to take care that the iteration is actually performed in a subspace in which the system is regular. Often this raises no particular difficulty, and this approach is in general preferred to the alternative that consists in fixing the discrete pressure at some point. Indeed, this latter approach transforms the singular system into a nearly singular one; i.e., it introduces a form of ill-conditioning, which may spoil the convergence of iterative methods.

However, a nice feature of multigrid methods lies in their ability to handle nearly singular systems efficiently, and the approach presented here is no exception. For the examples considered in Sect. 5, we even numerically checked that the eigenvalue distribution of the iteration matrix was nearly identical with both techniques mentioned in the previous paragraph. Hence we shall only discuss explicitly the second one (in which the problem is regularized). This further displays the robustness of our approach while keeping the overall presentation simpler, allowing us to develop the theory for regular matrices only. More precisely, the results to follow are for matrices of the form (2.3), assuming that \(A\) is SPD, that \(C\) is nonnegative definite, and that either \(B\) is of full rank or \(C\) is positive definite on the null space of \(B^T\) (which, as noted above, implies that \(\mathcal{A}\) is nonsingular).

3 The proposed approach

Let \(D_A=\text{ diag }(A)\) and let \(\alpha \) be a positive parameter such that \(\alpha \approx (\lambda _{\max }(D_A^{-1}A))^{-1}\); e.g., \(\alpha =(\Vert D_A^{-1}A\Vert _{\infty })^{-1}\). Defining

$$\begin{aligned} \mathcal{L}= \begin{pmatrix}I_n &{} \\ \alpha B D_A^{-1}&{} -I_m\end{pmatrix}~,\quad \mathcal{U}= \begin{pmatrix}I_n &{} -\alpha D_A^{-1}B^T\\ &{} I_m\end{pmatrix}, \end{aligned}$$
(3.1)

our approach is firstly based on a change of variable:

$$\begin{aligned} \begin{pmatrix}\mathbf{u}\\ \mathbf{p}\end{pmatrix}= \mathcal{U}\,\begin{pmatrix}\widehat{\mathbf{u}} \\ \widehat{\mathbf{p}} \end{pmatrix}. \end{aligned}$$

Then, multiplying both sides of (2.2) on the left by \(\mathcal{L}\), we obtain

$$\begin{aligned} \widehat{\mathcal{A}}\begin{pmatrix}\widehat{\mathbf{u}} \\ \widehat{\mathbf{p}} \end{pmatrix}= \mathcal{L}\,\begin{pmatrix}\mathbf{b}_{\mathbf{u}} \\ \mathbf{b}_{\mathbf{p}}\end{pmatrix}\end{aligned}$$
(3.2)

with \(\widehat{\mathcal{A}}=\mathcal{L}\,\mathcal{A}\,\mathcal{U}\); i.e.,

$$\begin{aligned} \widehat{\mathcal{A}}= \begin{pmatrix}\widehat{A}&{}\quad \widehat{B}^T \\ -\widehat{B}&{}\quad \widehat{C}\end{pmatrix}= \begin{pmatrix}A &{}\quad (I_n-\alpha A\,D_A^{-1})B^T \\ -B(I_n-\alpha D_A^{-1}\,A) &{}\quad C+B(2\alpha D_A^{-1}-\alpha ^2 D_A^{-1}A D_A^{-1})B^T\end{pmatrix}. \end{aligned}$$
(3.3)
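
For illustration, the transformation is straightforward to reproduce numerically. The following Python sketch assembles \(\mathcal{L}\), \(\mathcal{U}\) and \(\widehat{\mathcal{A}}=\mathcal{L}\,\mathcal{A}\,\mathcal{U}\) from (3.1) and (3.3); the blocks \(A\), \(B\), \(C\) are random placeholders that merely satisfy the assumptions stated in Sect. 2, not an actual discretization, and the final check anticipates the observation of Sect. 4.1 that the symmetric part of \(\widehat{\mathcal{A}}\) is block diagonal.

```python
# Sketch of the transformation (3.1)-(3.3) with dense NumPy arrays.
# The blocks A, B, C are random placeholders with the assumed structural
# properties (A SPD, C nonnegative definite), not an actual discretization.
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # SPD placeholder for the velocity block
B = rng.standard_normal((m, n))        # placeholder for the discrete divergence
C = np.zeros((m, m))                   # no stabilization (C = 0)

Dinv = np.diag(1.0 / np.diag(A))       # D_A^{-1}
# alpha = 1/||D_A^{-1} A||_inf, the practical choice suggested above:
alpha = 1.0 / np.abs(Dinv @ A).sum(axis=1).max()

In, Im, Onm = np.eye(n), np.eye(m), np.zeros((n, m))
L = np.block([[In, Onm], [alpha * B @ Dinv, -Im]])
U = np.block([[In, -alpha * Dinv @ B.T], [Onm.T, Im]])
Acal = np.block([[A, B.T], [B, -C]])   # original saddle point matrix (2.3)

Ahat = L @ Acal @ U                    # transformed matrix (3.3)
# Its symmetric part is block diagonal: the off-diagonal blocks cancel.
assert np.allclose((Ahat + Ahat.T)[:n, n:], 0.0, atol=1e-10)
```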

Our approach then consists in solving the transformed system (3.2) with multigrid. More precisely, we propose to use the “unknown-based” multigrid approach [9, 32], in which the coarsening for a system of discrete PDEs is obtained by considering separately the diagonal blocks associated with the different types of unknowns. This is consistent with the structure of the block diagonal part of \(\widehat{\mathcal{A}}\):

$$\begin{aligned} \text{ diag }_\mathrm{block}\left( \widehat{\mathcal{A}}\right) ~= ~ \begin{pmatrix}A &{} \\ &{} \widehat{C}\end{pmatrix}\end{aligned}$$
(3.4)

is SPD. Further, its diagonal blocks are “well adapted” to multigrid. Indeed, as noted in the preceding section, each diagonal block of \(A\) (one per spatial dimension) is a discrete representation of \(\xi -\nu \Delta \). Regarding \(\widehat{C}\), there is no similar general argument, but, in the two examples considered in Sect. 5, it turns out that \(\widehat{C}\) is a symmetric M-matrix with nonnegative row-sum, and further that it corresponds to a stencil which is a linear combination of several classical stencils for the Laplace operator. This is likely connected to the fact that, for the suggested value of \(\alpha \), the dominating term in \(\widehat{C}\) is \(2\alpha \,B\,D_A^{-1}B^T\), where \(B^T\) is the discrete gradient and \((-B)\) the discrete divergence, whereas \(D_A\) is essentially proportional to the viscosity \(\nu \); hence, \(B\,D_A^{-1}B^T\approx -c ~\nabla \cdot (\nu ^{-1}\,\nabla ) \) for some constant \(c\).

Thus we pursue the discussion presuming that both \(A\) and \(\widehat{C}\) are “well adapted” to multigrid, and hence that it is easy to set up relevant prolongation matrices \(P_A\) and \(P_{\widehat{C}}\). We further assume that these prolongations work well in combination with damped Jacobi smoothing. Note that this is a natural requirement for prolongations obtained by applying AMG methods to \(A\) and \(\widehat{C}\), since it is part of the philosophy of these methods that the prolongation be designed to work well with simple smoothers.

Equipped with these \(P_A\) and \(P_{\widehat{C}}\), we define the prolongation for the global system

$$\begin{aligned} \mathcal{P}= \begin{pmatrix}P_A &{}\quad \\ &{}\quad P_{\widehat{C}}\end{pmatrix}. \end{aligned}$$
(3.5)

And, in fact, nothing else is needed to set up a relevant two-grid scheme for \(\widehat{\mathcal{A}}\), combining this prolongation with damped Jacobi smoothing and the Galerkin coarse grid matrix

$$\begin{aligned} \widehat{\mathcal{A}}_c = \mathcal{P}^T\,\widehat{\mathcal{A}}\,\mathcal{P}. \end{aligned}$$

In particular, letting

$$\begin{aligned} \mathcal{D}= \text{ diag }\left( \widehat{\mathcal{A}}\right) = \begin{pmatrix}\text{ diag }(A) &{} \\ &{}\quad \text{ diag }\left( \widehat{C}\right) \end{pmatrix}, \end{aligned}$$
(3.6)

we can define the iteration matrix associated with a basic scheme using only a single step of damped Jacobi post-smoothing:

$$\begin{aligned} \mathcal{T}= \left( I-\omega \,\mathcal{D}^{-1}\widehat{\mathcal{A}}\right) \left( I-\mathcal{P}\widehat{\mathcal{A}}_c^{-1}\mathcal{P}^T\widehat{\mathcal{A}}\right) . \end{aligned}$$
(3.7)

In the following section, we prove a bound on the spectral radius of \(\mathcal{T}\) that requires only the convergence of similar two-grid schemes for the diagonal blocks \(A\) and \(\widehat{C}\) (considered separately).
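
The construction is summarized by the following Python sketch, which assembles \(\mathcal{T}\) literally from (3.3) and (3.5)–(3.7); the blocks are again random placeholders, and plain pairwise aggregations stand in for \(P_A\) and \(P_{\widehat{C}}\). It illustrates the assembly only, and is not meant as a convergence experiment.

```python
# Assembly of the two-grid iteration matrix (3.7) for the transformed system.
# All blocks are small random placeholders; the prolongations are plain
# pairwise aggregations, used here only to make the sketch self-contained.
import numpy as np

def pair_aggregation(size):
    """Plain aggregation prolongation: one unit entry per row."""
    return np.kron(np.eye(size // 2), np.ones((2, 1)))

rng = np.random.default_rng(1)
n, m, omega = 8, 4, 0.7
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
Dinv = np.diag(1.0 / np.diag(A))
alpha = 1.0 / np.abs(Dinv @ A).sum(axis=1).max()

# Transformed matrix (3.3), with C = 0:
Ahat = np.block([
    [A, (np.eye(n) - alpha * A @ Dinv) @ B.T],
    [-B @ (np.eye(n) - alpha * Dinv @ A),
     B @ (2 * alpha * Dinv - alpha**2 * Dinv @ A @ Dinv) @ B.T]])

P = np.block([[pair_aggregation(n), np.zeros((n, m // 2))],
              [np.zeros((m, n // 2)), pair_aggregation(m)]])     # (3.5)
D = np.diag(np.diag(Ahat))                                       # (3.6)
Ac = P.T @ Ahat @ P                                              # Galerkin
I = np.eye(n + m)
T = (I - omega * np.linalg.inv(D) @ Ahat) \
    @ (I - P @ np.linalg.solve(Ac, P.T @ Ahat))                  # (3.7)
print("rho(T) =", max(abs(np.linalg.eigvals(T))))
```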

3.1 Computational cost and implementation

Clearly, the transformed matrix (3.3) will be denser than the original matrix, with a potentially significant impact on the overall efficiency of the proposed approach. However, most of this impact can be eliminated with a careful implementation.

Indeed, at the fine grid level, instead of storing the transformed matrix explicitly, one can keep the original one (2.3) together with the triangular factors (3.1), and implement the matrix–vector product with \(\widehat{\mathcal{A}}\) as the product with, successively, \(\mathcal{U}\), \(\mathcal{A}\), and \(\mathcal{L}\). Because \(B\) is usually very sparse, the extra cost compared with a mere multiplication by \(\mathcal{A}\) will be moderate.

Moreover, looking closely at the operations involved in this strategy, it turns out that further savings are possible because the steps associated with \(B\) and \(B^T\) (both present in \(\mathcal{A}\) and in either \(\mathcal{U}\) or \(\mathcal{L}\)) can be combined so as to perform only one multiplication with each of these terms. The following algorithm computes

$$\begin{aligned} \begin{pmatrix}\mathbf{w}_{\mathbf{u}} \\ \mathbf{w}_{\mathbf{p}}\end{pmatrix}= \widehat{\mathcal{A}}\,\begin{pmatrix}\mathbf{v}_{\mathbf{u}} \\ \mathbf{v}_{\mathbf{p}}\end{pmatrix}\end{aligned}$$

according to this idea.

$$\begin{aligned}&1. &\hat{\mathbf{w}}_{\mathbf{u}}&=B^T\,\mathbf{v}_{\mathbf{p}}~ \\&2. &\tilde{\mathbf{w}}_{\mathbf{u}}&=\mathbf{v}_{\mathbf{u}}-\alpha \,D_A^{-1}\,\hat{\mathbf{w}}_{\mathbf{u}} \\&3. &{\mathbf{w}}_{\mathbf{u}}&=A\,\tilde{\mathbf{w}}_{\mathbf{u}}+\hat{\mathbf{w}}_{\mathbf{u}} \\&4. &\bar{\mathbf{w}}_{\mathbf{u}}&= \tilde{\mathbf{w}}_{\mathbf{u}}-\alpha \,D_A^{-1}\,{\mathbf{w}_{\mathbf{u}}} \\&5. &\mathbf{w}_{\mathbf{p}}&=-B\,\bar{\mathbf{w}}_{\mathbf{u}}+C\,\mathbf{v}_{\mathbf{p}}. \end{aligned}$$

The validity of this algorithm can be checked following the steps backward, which gives

$$\begin{aligned} \mathbf{w}_{\mathbf{p}}&= B\,\left( \alpha \,D_A^{-1}\,{\mathbf{w}_{\mathbf{u}}}-\tilde{\mathbf{w}}_{\mathbf{u}}\right) +C\,\mathbf{v}_{\mathbf{p}} \\&= B\,\left( \alpha \,D_A^{-1}\,\left( A\,\tilde{\mathbf{w}}_{\mathbf{u}}+\hat{\mathbf{w}}_{\mathbf{u}}\right) -\tilde{\mathbf{w}}_{\mathbf{u}}\right) +C\,\mathbf{v}_{\mathbf{p}} \\&= B\,\left( \left( \alpha \,D_A^{-1}A-I\right) \left( \mathbf{v}_{\mathbf{u}}-\alpha \,D_A^{-1}\,\hat{\mathbf{w}}_{\mathbf{u}}\right) + \alpha \,D_A^{-1}\,\hat{\mathbf{w}}_{\mathbf{u}} \right) +C\,\mathbf{v}_{\mathbf{p}} \\&= B\,\left( \alpha \,D_A^{-1}A-I\right) \mathbf{v}_{\mathbf{u}} +B\,\left( 2\,\alpha \,D_A^{-1}-\alpha ^2 D_A^{-1}A\,D_A^{-1}\right) B^T\,\mathbf{v}_{\mathbf{p}}+C\,\mathbf{v}_{\mathbf{p}} \\&= -\widehat{B}\,\mathbf{v}_{\mathbf{u}}+\widehat{C}\,\mathbf{v}_{\mathbf{p}}, \\ {\mathbf{w}}_{\mathbf{u}}&= A\,\left( \mathbf{v}_{\mathbf{u}}-\alpha \,D_A^{-1}\,\hat{\mathbf{w}}_{\mathbf{u}} \right) +\hat{\mathbf{w}}_{\mathbf{u}} \\&= A\,\mathbf{v}_{\mathbf{u}}+\left( I-\alpha \,A\,D_A^{-1}\right) B^T\,\mathbf{v}_{\mathbf{p}} \\&= \widehat{A}\,\mathbf{v}_{\mathbf{u}}+\widehat{B}^T \,\mathbf{v}_{\mathbf{p}}. \end{aligned}$$

Replacing steps 2 and 4 by \(\tilde{\mathbf{w}}_{\mathbf{u}}=\mathbf{v}_{\mathbf{u}}\) and \(\bar{\mathbf{w}}_{\mathbf{u}}=\tilde{\mathbf{w}}_{\mathbf{u}}\) (i.e., setting \(\alpha =0\)), the above algorithm reduces to a sensible implementation of the multiplication by the original matrix \(\mathcal{A}\) (up to a change of sign of \(\mathbf{w}_{\mathbf{p}}\)). Hence, steps 2 and 4 represent the only extra operations entailed by the transformation of \(\mathcal{A}\) into \(\widehat{\mathcal{A}}\). Storing the entries of the diagonal matrix \(\alpha \, D_A^{-1}\) in a vector, this amounts to only two multiplications and two additions per velocity unknown.
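
In code, the five steps translate directly as follows (a minimal NumPy sketch; the names are ours, and `adinv` holds the entries of \(\alpha \, D_A^{-1}\) stored as a vector, as suggested above). The final check verifies the product against the explicit form (3.3) on random placeholder blocks.

```python
# Direct transcription of the five-step product; adinv holds the entries of
# the diagonal matrix alpha * D_A^{-1}, stored as a vector.
import numpy as np

def apply_Ahat(A, B, C, adinv, v_u, v_p):
    w_u_hat = B.T @ v_p                     # step 1
    w_u_tilde = v_u - adinv * w_u_hat       # step 2
    w_u = A @ w_u_tilde + w_u_hat           # step 3
    w_u_bar = w_u_tilde - adinv * w_u       # step 4
    w_p = -B @ w_u_bar + C @ v_p            # step 5
    return w_u, w_p

# Consistency check against the explicit transformed matrix (3.3):
rng = np.random.default_rng(2)
n, m = 6, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
C = 0.1 * np.eye(m)
Dinv = np.diag(1.0 / np.diag(A))
alpha = 1.0 / np.abs(Dinv @ A).sum(axis=1).max()
Ahat = np.block([
    [A, (np.eye(n) - alpha * A @ Dinv) @ B.T],
    [-B @ (np.eye(n) - alpha * Dinv @ A),
     C + B @ (2 * alpha * Dinv - alpha**2 * Dinv @ A @ Dinv) @ B.T]])
v_u, v_p = rng.standard_normal(n), rng.standard_normal(m)
w_u, w_p = apply_Ahat(A, B, C, alpha / np.diag(A), v_u, v_p)
assert np.allclose(np.concatenate([w_u, w_p]),
                   Ahat @ np.concatenate([v_u, v_p]))
```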

4 Theoretical analysis

4.1 A preliminary motivating result

We first need to recall some results from the algebraic convergence theory of two-grid methods. We start with the definition of the approximation property constant \(K(G,P_G,M_G)\) associated with a triplet of matrices \(G\), \(P_G\) and \(M_G\), where \(G\) is the matrix to which the two-grid scheme is applied, \(P_G\) the prolongation matrix of the two-grid scheme, and \(M_G\) a matrix related to the smoother (see below). For an SPD matrix \(G\), the usage of \(K(G,P_G,\text{ diag }(G))\) traces back to [4]. Here the definition is extended to general (possibly nonsymmetric) matrices positive definite in \(\mathbb {R}^n\); that is, to matrices \(G\) such that

$$\begin{aligned} \mathbf{v}^TG\,\mathbf{v}> 0 \quad \forall \mathbf{v}\in \mathbb {R}^n\backslash \{\mathbf{0}\}, \end{aligned}$$

or, equivalently, such that

$$\begin{aligned} G_S = \tfrac{1}{2}(G+G^T) \end{aligned}$$
(4.1)

is SPD. For the sake of completeness, we give two formulations whose equivalence is well known (see [28] for an explicit proof).

Definition 4.1

Let \(G\) and \(M_G\) be \(n\times n\) matrices such that \(G\) is positive definite in \(\mathbb {R}^n\) and \(M_G\) is SPD. Let \(P_G\) be an \(n\times n_c\) matrix of rank \(n_c<n\). The associated approximation property constant is

$$\begin{aligned} K(G,P_G,M_G) = \max _{\mathbf{v}\in \mathbb {R}^n\backslash \{\mathbf{0}\}} \frac{\mathbf{v}^T\, M_G\big (I-P_G\left( P_G^TM_G\,P_G\right) ^{-1}P_G^T M_G\big ) \,\mathbf{v}}{\mathbf{v}^T\,G\,\mathbf{v}} . \end{aligned}$$
(4.2)

Equivalently, \(K(G,P_G,M_G)\) is the smallest constant \(K\) such that

$$\begin{aligned} \forall \mathbf{u}\in \mathbb {R}^n~\exists \mathbf{v}\in \mathbb {R}^{n_c}~\text { such that }~ \Vert \mathbf{u}-P_G\,\mathbf{v}\Vert ^2_{M_G} \le K~\Vert \mathbf{u}\Vert _{G_S}^2, \end{aligned}$$
(4.3)

where \(G_S\) is defined by (4.1).
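
In practice, \(K(G,P_G,M_G)\) can be evaluated numerically: the numerator matrix in (4.2) is symmetric nonnegative definite, so the constant is the largest eigenvalue of a symmetric generalized eigenvalue problem with right hand side \(G_S\). A minimal SciPy sketch, with a 1D Laplacian and a pairwise-aggregation prolongation as illustrative placeholders:

```python
# Numerical evaluation of Definition 4.1: K(G, P, M) is the largest
# eigenvalue of the pencil (M - M P (P^T M P)^{-1} P^T M, G_S).
import numpy as np
from scipy.linalg import eigh

def approximation_constant(G, P, M):
    GS = 0.5 * (G + G.T)                  # symmetric part (4.1), assumed SPD
    N = M - M @ P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
    return eigh(0.5 * (N + N.T), GS, eigvals_only=True)[-1]

# Illustrative placeholders: 1D Laplacian, pairwise aggregation, M = diag(G).
n = 16
G = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.kron(np.eye(n // 2), np.ones((2, 1)))
print("K =", approximation_constant(G, P, np.diag(np.diag(G))))
```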

In [4], a bound on the two-grid convergence rate is obtained for SPD \(G\), based on \(K(G,P_G,\text{ diag }(G))\) and a further smoothing property constant. The theory has later been improved and extended in, e.g. [7, 13, 26, 32] (see [28] for a review). In particular, Corollary 2.1 in [26] allows one to make a direct connection between the approximation property constant and the convergence of a mere two-grid method with a single post-smoothing step; namely, the two-grid method with iteration matrix

$$\begin{aligned} T_G= \left( I-M_G^{-1}G\right) \left( I-P_GG_c^{-1}P_G^T G\right) , \end{aligned}$$
(4.4)

where

$$\begin{aligned} G_c = P_G^T \,G\,P_G\end{aligned}$$
(4.5)

is the Galerkin coarse grid matrix. Furthermore, Corollary 2.2 of [26] extends this connection to nonsymmetric matrices positive definite in \(\mathbb {R}^n\). These results from [26] are recalled in the following lemma.

Lemma 4.1

Let \(G\) and \(M_G\) be \(n\times n\) matrices such that \(G\) is positive definite in \(\mathbb {R}^n\) and \(M_G\) is SPD. Let \(P_G\) be an \(n\times n_c\) matrix of rank \(n_c<n\).

Letting \(T_{G}\) be defined by (4.4), the eigenvalues of \(I-T_{G}\) that are not equal to 1 are the inverses of the nonzero eigenvalues of

$$\begin{aligned} G^{-1}M_G\bigg (I-P_G\left( P_G^TM_G\,P_G\right) ^{-1}P_G^T M_G\bigg ). \end{aligned}$$

Moreover, letting \(K(G,P_G,M_G)\) be as in Definition 4.1, the nonzero eigenvalues \(\lambda \) of \(T_{G}\) satisfy

$$\begin{aligned} \mathrm{Re}(\lambda ) \le 1-\frac{1}{K(G,P_G,M_G)}. \end{aligned}$$
(4.6)

If, in addition, \(G\) is symmetric (i.e., SPD), then the eigenvalues of \(T_{G}\) are real and

$$\begin{aligned} \max _{\lambda \in \sigma (T_G)\backslash \{0\}}\lambda&= 1- \frac{1}{K(G,P_G,M_G)} \end{aligned}$$
(4.7)
$$\begin{aligned} \min _{\lambda \in \sigma (T_G)\backslash \{0\}}\lambda&\ge 1- \lambda _{\max }(M_G^{-1}G). \end{aligned}$$
(4.8)

Applied to the SPD matrix \(A\) with \(P_A\) and \(M_A=\omega ^{-1}\text{ diag }(A)\), this lemma tells us that, for properly chosen \(\omega \) (i.e., such that \(1\le \omega \lambda _{\max }(D_A^{-1}A)\ll 2\)), the two-grid method for \(A\) with one single step of damped Jacobi smoothing converges fast if and only if \(K(A,P_A,D_A)\) is reasonably bounded. Similarly, the two-grid method for \(\widehat{C}\) with one single step of damped Jacobi smoothing converges fast if and only if \(K(\widehat{C},P_{\widehat{C}},D_{\widehat{C}})\) is reasonably bounded. Hence the requirement that the prolongations \(P_A\) and \(P_{\widehat{C}}\) work well in combination with damped Jacobi smoothing amounts to the requirement that \(K(A,P_A,D_A)\) and \(K(\widehat{C},P_{\widehat{C}},D_{\widehat{C}})\) are reasonably bounded.
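
The equality (4.7) is easily observed numerically. In the following sketch (1D Laplacian, pairwise aggregation and \(M_G=\omega ^{-1}\text{ diag }(G)\), all chosen for illustration only), the largest eigenvalue of \(T_G\) and \(1-1/K(G,P_G,M_G)\) agree up to roundoff:

```python
# Numerical illustration of (4.7) for an SPD matrix: the largest nonzero
# eigenvalue of T_G equals 1 - 1/K(G, P_G, M_G).
import numpy as np
from scipy.linalg import eigh

n, omega = 16, 0.7
G = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D Laplacian (SPD)
P = np.kron(np.eye(n // 2), np.ones((2, 1)))           # pairwise aggregation
M = np.diag(np.diag(G)) / omega                        # damped Jacobi smoother

N = M - M @ P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
K = eigh(0.5 * (N + N.T), G, eigvals_only=True)[-1]    # Definition 4.1

Gc = P.T @ G @ P                                       # Galerkin matrix (4.5)
T = (np.eye(n) - np.linalg.inv(M) @ G) \
    @ (np.eye(n) - P @ np.linalg.solve(Gc, P.T @ G))   # iteration matrix (4.4)
print(max(np.linalg.eigvals(T).real), "=", 1 - 1 / K)  # the two should agree
```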

Now, consider the transformed matrix (3.3). A first observation is that its symmetric part coincides with its block diagonal part:

$$\begin{aligned} \widehat{\mathcal{A}}_S = \tfrac{1}{2}\left( \widehat{\mathcal{A}}+\widehat{\mathcal{A}}^T\right) = \text{ diag }_\mathrm{block}\left( \widehat{\mathcal{A}}\right) = \begin{pmatrix}A &{} \\ &{}\quad \widehat{C}\end{pmatrix}. \end{aligned}$$

Hence \(\widehat{\mathcal{A}}\) is positive definite in \(\mathbb {R}^{n+m}\). Further, it is clear from Definition 4.1 that \(K(\widehat{\mathcal{A}},\mathcal{P},\mathcal{M})\) \(= K(\widehat{\mathcal{A}}_S,\mathcal{P},\mathcal{M})\) for any \(\mathcal{P},\mathcal{M}\). Therefore, for \(\mathcal{P}\) of the form (3.5) and \(\mathcal{D}\) as in (3.6):

$$\begin{aligned} K\big (\widehat{\mathcal{A}},\mathcal{P},\mathcal{D}\big ) = \max \big (K(A,P_A,D_A),\,K(\widehat{C},P_{\widehat{C}},D_{\widehat{C}})\big ). \end{aligned}$$
(4.9)

Thus, in view of the preceding paragraph, the basic requirement that \(P_A\) and \(P_{\widehat{C}}\) work well in combination with damped Jacobi smoothing suffices to guarantee that \(K\big (\widehat{\mathcal{A}},\mathcal{P},\mathcal{D}\big )\) is reasonably bounded; that is, see (4.6), to guarantee that the iteration matrix (3.7) associated with one single step of damped Jacobi smoothing for \(\widehat{\mathcal{A}}\) has no eigenvalue close to \(1\). This is a major step: what classically prevents the fast convergence of iterative methods is the presence of near singular modes for the system matrix, to which correspond eigenvalues of the iteration matrix that are very close to 1. With the above reasoning alone, we already know that the suggested approach handles such modes efficiently.

This is, however, not yet a full convergence proof: the inequality (4.6) says nothing about the imaginary part of the eigenvalues, nor does it exclude eigenvalues with a large negative real part. In the SPD case, the result (4.8) offers a useful complement to (4.7), but there is no such general result for nonsymmetric matrices, except possibly for M-matrices [26, 27]. Hence a specific analysis is needed, which is developed in the next two subsections: the first contains technical developments needed to prove the main results, which are stated in the second.

4.2 Technical lemmas

The first lemma extends the eigenvalue analysis developed in [3, Proposition 2.12] and [29, Theorem 4.1] for saddle point matrices of the form (4.10) (i.e., nonsymmetric but positive definite in \(\mathbb {R}^{n+m}\)) to a generalized eigenvalue problem of the form (4.12), in which the right hand side matrix is an orthogonal projector. Observe that the matrices (3.3) resulting from the transformation suggested in Sect. 3 satisfy the assumptions of the lemma.

Lemma 4.2

Let

$$\begin{aligned} \mathcal{A}= \begin{pmatrix}{A}&{}\quad {B}^T \\ -{B}&{} \quad {C}\end{pmatrix}\end{aligned}$$
(4.10)

be a matrix such that \({{A}}\) and \({C}\) are, respectively, \(n\times n\) and \(m\times m\) SPD matrices.

Let

$$\begin{aligned} \mathcal{Q}= \begin{pmatrix}Q_{{A}} &{} \quad \\ &{} \quad Q_{{C}}\end{pmatrix}~ \end{aligned}$$
(4.11)

be a matrix such that \(Q_{{A}} \) is \(n\times n_c\) and \(Q_{{C}}\) is \(m\times m_c\) with \(n_c\le n\), \(m_c\le m\) and \(Q_{{A}}^TQ_{{A}}= I_{n_c}\), \(Q_{{C}}^TQ_{{C}} =I_{m_c}\).

Letting

$$\begin{aligned} S_{{A}} = {{A}}+{B}^T {C}^{-1}{B}\quad \text { and } \quad S_{{C}} = {C}+{B}\,{{A}}^{-1}{B}^T, \end{aligned}$$

define

$$\begin{aligned} \kappa _{{A}}&= \lambda _{\max }(Q_{{A}}^T {{A}}^{-1}Q_{{A}}),\quad \gamma _{{A}}= \lambda _{\max }(S_{{A}}), \\ \kappa _{{C}}&= \lambda _{\max }(Q_{{C}}^T{C}^{-1}Q_{{C}}),\quad \gamma _{{C}}= \lambda _{\max }(S_{{C}}), \\ \overline{\kappa }&= \frac{2\,\kappa _{{A}}\,\kappa _{{C}}}{\kappa _{{A}}+\kappa _{{C}}},\quad \overline{\gamma }= \frac{2\,\gamma _{{A}}\,\gamma _{{C}}}{\gamma _{{A}}+\gamma _{{C}}}. \end{aligned}$$

The eigenvalues \(\lambda \) of the generalized eigenvalue problem

$$\begin{aligned} \mathcal{A}\, \mathbf{z}= \lambda \,\mathcal{Q}\,\mathcal{Q}^T \, \mathbf{z},\quad \mathcal{Q}^T\,\mathbf{z}\ne 0 \end{aligned}$$
(4.12)

are such that either

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) = 0 \\ \mathrm{Re}(\lambda )\ge \big ({\max \left( \kappa _{{A}},\,\kappa _{{C}}\right) }\big )^{-1}\\ \mathrm{Re}(\lambda )\le \max \left( \gamma _{{A}},\,\gamma _{{C}}\right) \end{array}\right. } \end{aligned}$$
(4.13)

or

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) \ne 0 \\ \mathrm{Re}(\lambda ) \ge \overline{\kappa }^{-1}\\ \left| \lambda -\tfrac{\overline{\gamma }}{2}\right| \le \tfrac{\overline{\gamma }}{2}. \end{array}\right. } \end{aligned}$$
(4.14)

Proof

The stated results (4.13), (4.14) are straightforward corollaries of Theorem 4.1 of [29] if we can show that the eigenvalues of (4.12) are also those of a matrix

$$\begin{aligned} \mathcal{F}= \begin{pmatrix}F_A &{} F_B^T \\ -F_B &{} F_C\end{pmatrix}\end{aligned}$$
(4.15)

with \(F_A\), \(F_C\) SPD and such that

$$\begin{aligned} \lambda _{\min }(F_A)&\ge \kappa _{{A}}^{-1}, \quad \lambda _{\max }\left( F_A+F_B^T F_C^{-1}F_B\right) \le \gamma _{{A}}, \end{aligned}$$
(4.16)
$$\begin{aligned} \lambda _{\min }(F_C)&\ge \kappa _{{C}}^{-1}, \quad \lambda _{\max }\left( F_C+F_B F_A^{-1}F_B^T\right) \le \gamma _{{C}}. \end{aligned}$$
(4.17)

Now, the eigenvalues of (4.12) are the inverse of the nonzero eigenvalues of \(\mathcal{A}^{-1}\mathcal{Q}\,\mathcal{Q}^T\), which, by virtue of Theorem 1.3.22 in [15] are also the eigenvalues of \(\mathcal{Q}^T\mathcal{A}^{-1}\mathcal{Q}\). Thus, it suffices to check the above relations for \(\mathcal{F}=\big (\mathcal{Q}^T\mathcal{A}^{-1}\mathcal{Q}\big )^{-1}\).

In this view, note first that, for any matrix of the form (4.15), its inverse is given by

$$\begin{aligned} \begin{pmatrix}F_A &{}\quad F_B^T \\ -F_B &{}\quad F_C\end{pmatrix}^{-1}&= \begin{pmatrix}S_{F_A}^{-1}&{}\quad -S_{F_A}^{-1}F_B^T F_C^{-1}\\ F_C^{-1}F_B\,S_{F_A}^{-1}&{}\quad S_{F_C}^{-1}\end{pmatrix}~ \nonumber \\&= \begin{pmatrix}S_{F_A}^{-1}&{}\quad -{F_A}^{-1}F_B^T S_{F_C}^{-1}\\ S_{F_C}^{-1}F_B\,F_A^{-1}&{}\quad S_{F_C}^{-1}\end{pmatrix}\end{aligned}$$
(4.18)

(see [1, p. 93]), where

$$\begin{aligned} S_{F_A}&= F_A+F_B^T F_C^{-1}F_B, \quad S_{F_C}= F_C+F_B F_A^{-1}F_B^T. \end{aligned}$$

Applying the above identity to obtain the inverse of \(\mathcal{A}\) then yields

$$\begin{aligned} \mathcal{F}^{-1}= \mathcal{Q}^T\mathcal{A}^{-1}\mathcal{Q}= \begin{pmatrix}Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}} &{}\quad -Q_{{A}}^T\,{{A}}^{-1}{B}^T S_{{C}}^{-1}\,Q_{{C}} \\ Q_{{C}}^T\, S_{{C}}^{-1}{B}\,{{A}}^{-1}\,Q_{{A}} &{}\quad Q_{{C}}^T\, S_{{C}}^{-1}\,Q_{{C}} \end{pmatrix}~ . \end{aligned}$$
(4.19)

Comparing with (4.18), we immediately obtain that

$$\begin{aligned} F_A+F_B^T F_C^{-1}F_B&= \left( Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}}\right) ^{-1},&F_C+F_B F_A^{-1}F_B^T&= \left( Q_{{C}}^T\, S_{{C}}^{-1}\,Q_{{C}}\right) ^{-1}. \end{aligned}$$

Therefore, the right inequalities (4.16), (4.17) amount to

$$\begin{aligned} \lambda _{\min }\left( Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}}\right)&\ge \gamma _{{A}}^{-1}, \quad \lambda _{\min }\left( Q_{{C}}^T\, S_{{C}}^{-1}\,Q_{{C}}\right) \ge \gamma _{{C}}^{-1}, \end{aligned}$$

which hold because (remembering the assumption \(Q_A^TQ_A=I_{n_c}\))

$$\begin{aligned} \lambda _{\min }\left( Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}}\right)&= \min _{\mathbf{v}\in \mathbb {R}^{n_c}\backslash \{\mathbf{0}\}} \frac{\mathbf{v}^T\,Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}}\,\mathbf{v}}{\mathbf{v}^T\,\mathbf{v}} \\&= \min _{\mathbf{v}\in \mathbb {R}^{n_c}\backslash \{\mathbf{0}\}}\frac{\mathbf{v}^T\,Q_{{A}}^T\, S_{{A}}^{-1}\,Q_{{A}}\,\mathbf{v}}{\mathbf{v}^T\,Q_{{A}}^T\,Q_{{A}}\,\mathbf{v}} \\&\ge \min _{\mathbf{v}\in \mathbb {R}^n\backslash \{\mathbf{0}\}}\frac{\mathbf{v}^T\, S_{{A}}^{-1}\,\mathbf{v}}{\mathbf{v}^T\,\mathbf{v}} = \lambda _{\min }\left( S_{{A}}^{-1}\right) = \gamma _{{A}}^{-1}, \end{aligned}$$

and, similarly, \(\lambda _{\min }(Q_{{C}}^T\,S_{{C}}^{-1}\,Q_{{C}})\ge \lambda _{\min }(S_{{C}}^{-1})=\gamma _{{C}}^{-1}\).

We are thus left with the proof of the left inequalities (4.16), (4.17). Here we consider the application of (4.18) to the inverse of \(\mathcal{F}\) expressed in (4.19). This shows that \(F_A\) and \(F_C\) are equal to the inverse of the Schur complements of the matrix in the right hand side of (4.19); that is,

$$\begin{aligned} F_A^{-1}= Q_{{A}}^T\Big (S_{{A}}^{-1}+ {{A}}^{-1}{B}^T S_{{C}}^{-1}\, Q_{{C}} \left( Q_{{C}}^T\, S_{{C}} \,Q_{{C}} \right) ^{-1}Q_{{C}}^T\, S_{{C}}^{-1}{B}\,{{A}}^{-1}\Big )Q_{{A}}, \end{aligned}$$

and, similarly (permuting the roles of \(A\) and \(C\)),

$$\begin{aligned} F_C^{-1}= Q_{{C}}^T\Big (S_{{C}}^{-1}+ {C}^{-1}{B}^T S_{{A}}^{-1}\, Q_{{A}} \left( Q_{{A}}^T\, S_{{A}} \,Q_{{A}}\right) ^{-1}Q_{{A}}^T\, S_{{A}}^{-1}{B}\,{C}^{-1}\Big )Q_{{C}}. \end{aligned}$$

Now, for any positive definite \(R_{{C}}\), \(R_{{C}}^{1/2} Q_{{C}} (Q_{{C}}^T R_{{C}} Q_{{C}})^{-1}Q_{{C}}^T R_{{C}}^{1/2}\) is an orthogonal projector, entailing \(R_{{C}}^{1/2} Q_{{C}} (Q_{{C}}^T R_{{C}} Q_{{C}})^{-1}Q_{{C}}^T R_{{C}}^{1/2}\le I\) and therefore \(Q_{{C}} (Q_{{C}}^T R_{{C}} Q_{{C}})^{-1}Q_{{C}}^T \le R_{{C}}^{-1}\). Hence,

$$\begin{aligned} F_A^{-1}\le Q_{{A}}^T\Big (S_{{A}}^{-1}+ {{A}}^{-1}{B}^T S_{{C}}^{-1}{B}\,{{A}}^{-1}\Big )Q_{{A}} = Q_{{A}}^T {{A}}^{-1}\,Q_{{A}}, \end{aligned}$$

the last equality following from the fact that \(S_{{A}}^{-1}+ {{A}}^{-1}{B}^T S_{{C}}^{-1}{B}\,{{A}}^{-1}\) is a Schur complement of \(\mathcal{A}^{-1}\), hence its inverse has to be equal to the corresponding principal submatrix in \(\mathcal{A}\) [see again (4.18)].

Similarly one finds

$$\begin{aligned} F_C^{-1}\le Q_{{C}}^T\Big (S_{{C}}^{-1}+ {C}^{-1}{B}^T S_{{A}}^{-1}{B}\,{C}^{-1}\Big )Q_{{C}} = Q_{{C}}^T {C}^{-1}\,Q_{{C}}. \end{aligned}$$

Therefore:

$$\begin{aligned} \lambda _{\min }(F_A) = \bigg (\lambda _{\max }\left( F_A^{-1}\right) \bigg )^{-1}\ge \bigg (\lambda _{\max }\left( Q_{{A}}^T {{A}}^{-1}\,Q_{{A}}\right) \bigg )^{-1}= \kappa _{{A}}^{-1}\end{aligned}$$

and

$$\begin{aligned} \lambda _{\min }(F_C) = \bigg (\lambda _{\max }\left( F_C^{-1}\right) \bigg )^{-1}\ge \bigg (\lambda _{\max }\left( Q_{{C}}^T {C}^{-1}\,Q_{{C}}\right) \bigg )^{-1}= \kappa _{{C}}^{-1}~; \end{aligned}$$

i.e., the left inequalities (4.16), (4.17) that remained to be proved. \(\square \)

The second lemma states useful algebraic properties of the Schur complements of the matrices (3.3) resulting from the transformation suggested in Sect. 3.

Lemma 4.3

Let

$$\begin{aligned} \mathcal{A}= \begin{pmatrix}A &{}\quad B^T \\ B &{} \quad - C\end{pmatrix}~ \end{aligned}$$

be a matrix such that \(A\) is an \(n\times n\) SPD matrix, and \(C\) an \(m\times m\) nonnegative definite matrix. Let \(Z_A\) be a nonsingular \(n\times n \) matrix such that \(Z_A+Z_A^T-A\) is SPD, and let

$$\begin{aligned} \mathcal{L}= \begin{pmatrix}I_n &{} \\ B\,Z_A^{-1}&{} -I_m\end{pmatrix}~,\quad \mathcal{U}= \begin{pmatrix}I_n &{} -Z_A^{-T} B^T\\ &{} I_m\end{pmatrix}. \end{aligned}$$

The matrix

$$\begin{aligned} \mathcal{L}\,\mathcal{A}\,\mathcal{U}&= \begin{pmatrix}A &{}\quad \left( I_n-A\,Z_A^{-T}\right) B^T \\ -B(I_n-Z_A^{-1}\,A) &{}\quad C+B(Z_A^{-1}+Z_A^{-T}-Z_A^{-1}\,A\,Z_A^{-T})B^T \end{pmatrix}\\&= \begin{pmatrix}\widehat{A}&{}\quad \widehat{B}^T \\ -\widehat{B}&{} \quad \widehat{C}\end{pmatrix}\end{aligned}$$

is such that its Schur complements satisfy

$$\begin{aligned} S_{\widehat{A}}&= \widehat{A}+\widehat{B}^T\,\widehat{C}^{-1}\,\widehat{B}\le Z_A\,( Z_A+Z_A^T-A)^{-1}\, Z_A^T, \end{aligned}$$
(4.20)
$$\begin{aligned} S_{\widehat{C}}&= \widehat{C}+\widehat{B}\,\widehat{A}^{-1}\,\widehat{B}^T = C+B\,A^{-1}B^T. \end{aligned}$$
(4.21)

Proof

The last result (4.21) can be checked by direct computation. To prove (4.20), note first that, for any positive definite \(R_C\), \(R_C^{1/2} B^T (C+B R_C B^T)^{-1}B R_C^{1/2}\) has the same nonzero eigenvalues as \(B R_C B^T (C + B R_C B^T)^{-1}\). Hence, since \(C\) is nonnegative definite,

$$\begin{aligned} \lambda _{\max }\left( R_C^{1/2} B^T (C+B R_C B^T)^{-1}B R_C^{1/2}\right) = \max _{\mathbf{v}\ne 0}\frac{\mathbf{v}^T\, B R_C B^T\,\mathbf{v}}{\mathbf{v}^T\,(C + B R_C B^T) \,\mathbf{v}} \le 1, \end{aligned}$$

showing that \(B^T (C+B R_C B^T)^{-1}B \le R_C^{-1}\). Then, letting \(E_A=Z_A+Z_A^T-A\),

$$\begin{aligned} S_{\widehat{A}}&= A\!+\!\left( I_n\!-\!A\,Z_A^{-T}\right) B^T \bigg (C\!+\!B\left( Z_A^{-1}\!+\!Z_A^{-T}\!-\!Z_A^{-1}\,A\,Z_A^{-T}\right) B^T\bigg )^{-1}B\left( I_n-Z_A^{-1}\,A\right) \\&\le A+\left( I_n-A\,Z_A^{-T}\right) \left( Z_A^{-1}+Z_A^{-T}-Z_A^{-1}\,A\,Z_A^{-T}\right) ^{-1}\left( I_n-Z_A^{-1}\,A\right) \\&= A+\left( Z_A^T-A\right) \left( Z_A+Z_A^T-A\right) ^{-1}(Z_A-A) \\&= A+(E_A-Z_A) E_A^{-1}\left( Z_A^T-E_A\right) \\&= Z_A E_A^{-1}Z_A^T. \end{aligned}$$

\(\square \)
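
As a sanity check, both statements of the lemma are easily verified numerically for the choice \(Z_A=\alpha ^{-1}D_A\) made in Sect. 3, using random placeholder blocks:

```python
# Numerical check of Lemma 4.3 for Z_A = alpha^{-1} D_A and random
# placeholder blocks: (4.21) is an identity, (4.20) a definite inequality.
import numpy as np

rng = np.random.default_rng(3)
n, m = 7, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
B = rng.standard_normal((m, n))
C = 0.1 * np.eye(m)
Dinv = np.diag(1.0 / np.diag(A))
alpha = 1.0 / np.abs(Dinv @ A).sum(axis=1).max()
Z = np.diag(np.diag(A)) / alpha          # Z_A = alpha^{-1} D_A
Zinv = np.linalg.inv(Z)
E = Z + Z.T - A                          # SPD for this choice of alpha

Bhat = B @ (np.eye(n) - Zinv @ A)
Chat = C + B @ (Zinv + Zinv.T - Zinv @ A @ Zinv.T) @ B.T
S_Ahat = A + Bhat.T @ np.linalg.solve(Chat, Bhat)
S_Chat = Chat + Bhat @ np.linalg.solve(A, Bhat.T)

assert np.allclose(S_Chat, C + B @ np.linalg.solve(A, B.T))     # (4.21)
gap = Z @ np.linalg.solve(E, Z.T) - S_Ahat                      # (4.20)
assert np.linalg.eigvalsh(0.5 * (gap + gap.T)).min() > -1e-8
```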

4.3 Main results

We first prove a general result for matrices of the form

$$\begin{aligned} \widehat{\mathcal{A}}= \begin{pmatrix}\widehat{A}&{}\quad \widehat{B}^T \\ -\widehat{B}&{} \quad \widehat{C}\end{pmatrix}\end{aligned}$$
(4.22)

with \(\widehat{A}\) and \(\widehat{C}\) SPD. Of course, more can be said for the specific case of matrices (3.3) resulting from the transformation of the discrete Stokes equations as suggested in this paper. This is considered in a subsequent corollary.

Theorem 4.4

Let \(\widehat{\mathcal{A}}\) be a matrix of the form (4.22) such that \({\widehat{A}}\) and \(\widehat{C}\) are, respectively, \(n\times n\) and \(m\times m\) SPD matrices.

Let \(M_{\widehat{A}}\), \(M_{\widehat{C}}\) be, respectively, \(n\times n\) and \(m\times m\) SPD matrices, and let \(P_{\widehat{A}}\) and \(P_{\widehat{C}}\) be, respectively, \(n\times n_c\) and \(m\times m_c\) matrices of rank \(n_c<n\) and \(m_c<m\). Set

$$\begin{aligned} \mathcal{M}&= \begin{pmatrix}M_{\widehat{A}} &{} \\ &{} M_{\widehat{C}}\end{pmatrix}, \quad \mathcal{P}= \begin{pmatrix}P_{\widehat{A}} &{} \\ &{} P_{\widehat{C}}\end{pmatrix}\end{aligned}$$

and

$$\begin{aligned} \mathcal{T}= \left( I-\mathcal{M}^{-1}\widehat{\mathcal{A}}\right) \left( I-\mathcal{P}\widehat{\mathcal{A}}_c^{-1}\mathcal{P}^T\widehat{\mathcal{A}}\right) , \end{aligned}$$

where

$$\begin{aligned} \widehat{\mathcal{A}}_c = \mathcal{P}^T\,\widehat{\mathcal{A}}\,\mathcal{P}. \end{aligned}$$

Letting

$$\begin{aligned} S_{\widehat{A}} = {\widehat{A}}+\widehat{B}^T \widehat{C}^{-1}\widehat{B}\quad \text { and } \quad S_{\widehat{C}} = \widehat{C}+\widehat{B}\,{\widehat{A}}^{-1}\widehat{B}^T, \end{aligned}$$

set, using Definition 4.1,

$$\begin{aligned} \kappa _{\widehat{A}}&= K\left( \widehat{A},P_{\widehat{A}},M_{\widehat{A}}\right) \!, \quad \gamma _{\widehat{A}}= \lambda _{\max }\left( M_{\widehat{A}}^{-1}S_{\widehat{A}}\right) \!, \\ \kappa _{\widehat{C}}&= K\left( \widehat{C},P_{\widehat{C}},M_{\widehat{C}}\right) \!, \quad \gamma _{\widehat{C}}= \lambda _{\max }\left( M_{\widehat{C}}^{-1}S_{\widehat{C}}\right) \!, \\ \overline{\kappa }&= \frac{2\,\kappa _{\widehat{A}}\,\kappa _{\widehat{C}}}{\kappa _{\widehat{A}}+\kappa _{\widehat{C}}}, \quad \overline{\gamma }= \frac{2\,\gamma _{\widehat{A}}\,\gamma _{\widehat{C}}}{\gamma _{\widehat{A}}+\gamma _{\widehat{C}}}. \end{aligned}$$

The nonzero eigenvalues \(\lambda \) of \(\mathcal{T}\) are such that either

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) = 0 \\ \mathrm{Re}(\lambda )\le 1- \big ({\max \left( \kappa _{\widehat{A}},\,\kappa _{\widehat{C}}\right) }\big )^{-1}\\ \mathrm{Re}(\lambda ) \ge 1-\max \left( \gamma _{\widehat{A}},\,\gamma _{\widehat{C}}\right) \end{array}\right. } \end{aligned}$$
(4.23)

or

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) \ne 0 \\ \mathrm{Re}(\lambda ) \le 1- {\overline{\kappa }}^{-1}\\ \left| \lambda -\left( 1-\tfrac{\overline{\gamma }}{2}\right) \right| \le \tfrac{\overline{\gamma }}{2}. \end{array}\right. } \end{aligned}$$
(4.24)

Moreover,

$$\begin{aligned} \rho (\mathcal{T}) \le \max \left( 1-\frac{1}{\kappa _{\widehat{A}}},\, 1-\frac{1}{\kappa _{\widehat{C}}},\, \gamma _{\widehat{A}}-1,\, \gamma _{\widehat{C}}-1,\, \sqrt{1-\frac{2-\overline{\gamma }}{\overline{\kappa }}}\right) . \end{aligned}$$
(4.25)

Proof

By virtue of Lemma 4.1, the eigenvalues of \(I-\mathcal{T}\) that are not equal to one are the inverses of the nonzero eigenvalues of

$$\begin{aligned} \widehat{\mathcal{A}}^{-1}\mathcal{M}\bigg (I-\mathcal{P}\left( \mathcal{P}^T\mathcal{M}\,\mathcal{P}\right) ^{-1}\mathcal{P}^T \mathcal{M}\bigg ). \end{aligned}$$

Equivalently, they are the solutions of the generalized eigenvalue problem:

$$\begin{aligned} \mathcal{M}^{-1/2}\widehat{\mathcal{A}}\, \mathcal{M}^{-1/2} \mathbf{z}= \lambda ~\big (I-\mathcal{M}^{1/2}\mathcal{P}(\mathcal{P}^T \mathcal{M}\mathcal{P})^{-1}\mathcal{P}^T \mathcal{M}^{1/2}\big ) \, \mathbf{z},\quad \mathbf{z}\not \in \mathcal{R}(\mathcal{M}^{1/2}\mathcal{P}). \end{aligned}$$

Because the matrix in the right hand side is an orthogonal projector, this eigenvalue problem can be formulated as a problem of the form (4.12), where \(\mathcal{Q}\) has the form (4.11) with \(Q_A\) such that

$$\begin{aligned} Q_AQ_A^T=I-M_{\widehat{A}}^{1/2}P_{\widehat{A}}\left( P_{\widehat{A}}^T M_{\widehat{A}} P_{\widehat{A}}\right) ^{-1}P_{\widehat{A}}^T M_{\widehat{A}}^{1/2} \end{aligned}$$

and \(Q_C\) such that

$$\begin{aligned} Q_CQ_C^T=I-M_{\widehat{C}}^{1/2}P_{\widehat{C}}\left( P_{\widehat{C}}^T M_{\widehat{C}} P_{\widehat{C}}\right) ^{-1}P_{\widehat{C}}^T M_{\widehat{C}}^{1/2}. \end{aligned}$$

Hence, we may apply Lemma 4.2 to \(\mathcal{Q}\) defined from the above relations and

$$\begin{aligned} \mathcal{A}= \mathcal{M}^{-1/2}\widehat{\mathcal{A}}\, \mathcal{M}^{-1/2} = \begin{pmatrix}M_{\widehat{A}}^{-1/2}\widehat{A}\, M_{\widehat{A}}^{-1/2}&{}\quad M_{\widehat{A}}^{-1/2}\widehat{B}^T M_{\widehat{C}}^{-1/2} \\ -M_{\widehat{C}}^{-1/2}\widehat{B}\, M_{\widehat{A}}^{-1/2}&{}\quad M_{\widehat{C}}^{-1/2}\widehat{C}\,M_{\widehat{C}}^{-1/2}\end{pmatrix}. \end{aligned}$$

This straightforwardly yields (4.23) and (4.24), provided that

$$\begin{aligned} \kappa _{\widehat{A}}&=\lambda _{\max }\left( Q_A^T M_{\widehat{A}}^{1/2}\widehat{A}^{-1}M_{\widehat{A}}^{1/2}Q_A\right) ,&\kappa _{\widehat{C}}&=\lambda _{\max }\left( Q_C^T M_{\widehat{C}}^{1/2}\widehat{C}^{-1}M_{\widehat{C}}^{1/2}Q_C\right) . \end{aligned}$$
(4.26)

Now, by virtue of Theorem 1.3.22 in [15], \(\big (Q_A^T M_{\widehat{A}}^{1/2}\widehat{A}^{-1/2}\big )\big (\widehat{A}^{-1/2} M_{\widehat{A}}^{1/2}Q_A\big )\) has the same nonzero eigenvalues as \(\widehat{A}^{-1/2}M_{\widehat{A}}^{1/2}Q_AQ_A^TM_{\widehat{A}}^{1/2}\widehat{A}^{-1/2}\), whereas

$$\begin{aligned}&\lambda _{\max }\bigg (\widehat{A}^{-1/2}M_{\widehat{A}}^{1/2}Q_AQ_A^TM_{\widehat{A}}^{1/2}\widehat{A}^{-1/2}\bigg )\\&\quad =\max _{\mathbf{v}\in \mathbb {R}^n\backslash \{\mathbf{0}\}} \frac{\mathbf{v}^T\,M_{\widehat{A}}^{1/2}Q_AQ_A^TM_{\widehat{A}}^{1/2}\,\mathbf{v}}{\mathbf{v}^T\,\widehat{A}\,\mathbf{v}} \\&\quad =\max _{\mathbf{v}\in \mathbb {R}^n\backslash \{\mathbf{0}\}} \frac{\mathbf{v}^T\, M_{\widehat{A}}\bigg (I-P_{\widehat{A}} \left( P_{\widehat{A}}^TM_{\widehat{A}}\,P_{\widehat{A}}\right) ^{-1}P_{\widehat{A}}^T M_{\widehat{A}}\bigg ) \,\mathbf{v}}{\mathbf{v}^T\,\widehat{A}\,\mathbf{v}}. \end{aligned}$$

The last right hand side coincides by definition with \(K(\widehat{A},P_{\widehat{A}},M_{\widehat{A}})\); i.e., is equal to \(\kappa _{\widehat{A}}\), proving the left equality (4.26). Reasoning similarly for the \(C\) block completes the proof of (4.26), and thus that of (4.23) and (4.24).

We now consider (4.25). It is clear from (4.23) that the real eigenvalues of \(\mathcal{T}\) satisfy

$$\begin{aligned} |\lambda | \le \max \left( 1-\frac{1}{\kappa _{\widehat{A}}},\, 1-\frac{1}{\kappa _{\widehat{C}}},\, \gamma _{\widehat{A}}-1,\, \gamma _{\widehat{C}}-1\right) . \end{aligned}$$

The proof of (4.25) is then complete if we can show that any eigenvalue \(\lambda =x+iy\) with nonzero imaginary part \(y\) satisfies

$$\begin{aligned} |\lambda |^2~=x^2+y^2 \le \max \left( (\overline{\gamma }-1)^2,~1-\frac{2-\overline{\gamma }}{\overline{\kappa }}\right) . \end{aligned}$$

(this is sufficient because \(\overline{\gamma }\le \max (\gamma _{\widehat{A}},\,\gamma _{\widehat{C}})\)). The condition \(\left| \lambda -\left( 1-\tfrac{\overline{\gamma }}{2}\right) \right| \le \tfrac{\overline{\gamma }}{2}\) implies

$$\begin{aligned} x^2+y^2 \le x(2-\overline{\gamma })-(1-\overline{\gamma }). \end{aligned}$$

For \(\overline{\gamma }>2\), the right hand side is largest for \(x\) as negative as possible; i.e., for \(x=1-\overline{\gamma }\), yielding \(x^2+y^2 \le (1-\overline{\gamma })^2\). On the other hand, for \(\overline{\gamma }<2\), \(x^2+y^2\) is maximal when \(x\) is as large as possible; i.e., when it reaches the limit imposed by the condition \(\mathrm{Re}(\lambda )=x\le 1-{\overline{\kappa }}^{-1}\). Then we have:

$$\begin{aligned} x^2+y^2 \le (1-{\overline{\kappa }}^{-1})(2-\overline{\gamma })-(1-\overline{\gamma }) = 1-\frac{2-\overline{\gamma }}{\overline{\kappa }}, \end{aligned}$$

concluding the proof of (4.25). \(\square \)

Corollary 4.5

Let

$$\begin{aligned} \mathcal{A}= \begin{pmatrix}A &{}\quad B^T \\ B &{}\quad -C\end{pmatrix}\end{aligned}$$

be a matrix such that \(A\) is an \(n\times n\) SPD matrix and \(C\) is an \(m\times m\) nonnegative definite matrix. Assume that \(B\) has rank \(m\) or that \(C\) is positive definite on the null space of \(B^T\).

Let \(D_A=\text{ diag }(A)\), let \(\alpha \) be a positive number such that \(\alpha <2(\lambda _{\max }(D_A^{-1}A))^{-1}\), let \(\mathcal{L}\), \(\mathcal{U}\) be defined by (3.1), and let \(\widehat{\mathcal{A}}=\mathcal{L}\,\mathcal{A}\,\mathcal{U}\) be given by (3.3).

Let \(P_A\) and \(P_{\widehat{C}}\) be, respectively, \(n\times n_c\) and \(m\times m_c\) matrices of rank \(n_c<n\) and \(m_c<m\), and set, using Definition 4.1,

$$\begin{aligned} \hat{\kappa }_{A}&= K(A,P_A,D_A), \quad \hat{\gamma }_{A}= \big (\alpha \,(2-\alpha \,\lambda _{\max }(D_A^{-1}A))\big )^{-1}~ , \\ \hat{\kappa }_{\widehat{C}}&= K(\widehat{C},P_{\widehat{C}},D_{\widehat{C}}), \quad \hat{\gamma }_{\widehat{C}}= \lambda _{\max }\big (D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\big ), \\ \widetilde{\kappa }&= \frac{2\,\hat{\kappa }_{A}\,\hat{\kappa }_{\widehat{C}}}{\hat{\kappa }_{A}+\hat{\kappa }_{\widehat{C}}}, \quad \widetilde{\gamma }= \frac{2\,\hat{\gamma }_{A}\,\hat{\gamma }_{\widehat{C}}}{\hat{\gamma }_{A}+\hat{\gamma }_{\widehat{C}}}, \end{aligned}$$

where \(\widehat{C}=C+B(2\alpha D_A^{-1}-\alpha ^2 D_A^{-1}A D_A^{-1})B^T\) and \(D_{\widehat{C}}=\text{ diag }(\widehat{C})\).

Let then the matrix \(\mathcal{T}\) be defined by (3.5), (3.6) and (3.7), where \(\omega \) is a positive parameter. Its nonzero eigenvalues \(\lambda \) are such that either

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) = 0 \\ \mathrm{Re}(\lambda ) \le 1- {\omega }\, \big ({\max \left( \hat{\kappa }_{A},\,\hat{\kappa }_{\widehat{C}}\right) }\big )^{-1}\\ \mathrm{Re}(\lambda ) \ge 1-\omega \,\max \left( \hat{\gamma }_{A},\,\hat{\gamma }_{\widehat{C}}\right) \end{array}\right. } \end{aligned}$$
(4.27)

or

$$\begin{aligned} {\left\{ \begin{array}{ll} \mathrm{Im}(\lambda ) \ne 0 \\ \mathrm{Re}(\lambda ) \le 1- {\omega }\,{\widetilde{\kappa }}^{-1}\\ \left| \lambda -\left( 1-\tfrac{\omega \,\widetilde{\gamma }}{2}\right) \right| \le \tfrac{\omega \,\widetilde{\gamma }}{2}. \end{array}\right. } \end{aligned}$$
(4.28)

Moreover,

$$\begin{aligned} \rho (\mathcal{T}) \le \max \left( 1-\frac{\omega }{\hat{\kappa }_{A}},\, 1-\frac{\omega }{\hat{\kappa }_{\widehat{C}}},\, \omega \,\hat{\gamma }_{A}-1,\, \omega \,\hat{\gamma }_{\widehat{C}}-1,\, \sqrt{1-\frac{\omega \,(2-\omega \,\widetilde{\gamma })}{\widetilde{\kappa }}}\right) . \end{aligned}$$
(4.29)

Proof

Firstly, observe that the condition \(0<\alpha <2(\lambda _{\max }(D_A^{-1}A))^{-1}\) implies the positive definiteness of \(2\alpha D_A^{-1}-\alpha ^2 D_A^{-1}A D_A^{-1}\), and hence that of \(\widehat{C}\), since either \(B\) has rank \(m\) or \(C\) is positive definite on the null space of \(B^T\). Thus we may apply Theorem 4.4 with \(M_{\widehat{A}}=\omega ^{-1}D_A\) and \(M_{\widehat{C}}=\omega ^{-1}D_{\widehat{C}}\). Moreover, we may use Lemma 4.3 with \(Z_A=\alpha ^{-1}D_A\) to characterize \(S_{\widehat{A}}\) and \(S_{\widehat{C}}\) [noting that the condition \(Z_A+Z_A^T-A\) SPD amounts to \(0<\alpha <2(\lambda _{\max }(D_A^{-1}A))^{-1}\)]. This yields

$$\begin{aligned} S_{\widehat{C}} = C+B\,A^{-1}B^T \end{aligned}$$

and

$$\begin{aligned} S_{\widehat{A}} \le \alpha ^{-2}D_A (2\,\alpha ^{-1}D_A-A)^{-1}D_A \le \big (\alpha \,(2-\alpha \,\lambda _{\max }(D_A^{-1}A))\big )^{-1}D_A. \end{aligned}$$

Then, for the quantities in Theorem 4.4, we obtain

$$\begin{aligned} \kappa _{\widehat{A}}&= K\left( \widehat{A},P_{\widehat{A}},M_{\widehat{A}}\right) = \omega ^{-1}\hat{\kappa }_{A}, \quad \gamma _{\widehat{A}}= \lambda _{\max }\left( M_{\widehat{A}}^{-1}S_{\widehat{A}}\right) \le \omega \,\hat{\gamma }_{A}, \\ \kappa _{\widehat{C}}&= K\left( \widehat{C},P_{\widehat{C}},M_{\widehat{C}}\right) = \omega ^{-1}\hat{\kappa }_{\widehat{C}}, \quad \gamma _{\widehat{C}}= \lambda _{\max }\left( M_{\widehat{C}}^{-1}S_{\widehat{C}}\right) = \omega \,\hat{\gamma }_{\widehat{C}}. \end{aligned}$$

The stated results straightforwardly follow. \(\square \)

Note that there is no explicit restriction on the positive parameter \(\omega \) in Corollary 4.5, but the bound on the spectral radius becomes useless when \(\omega \ge 2/\max \left( \hat{\gamma }_{A},\,\hat{\gamma }_{\widehat{C}}\right) \).
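
For illustration, the quantities of Corollary 4.5 and the bound (4.29) can be evaluated numerically and compared with the actual spectral radius of \(\mathcal{T}\). The following sketch does so for small random placeholder blocks with pairwise-aggregation prolongations; it only illustrates the statement, and is unrelated to the experiments of Sects. 5 and 6.

```python
# Evaluation of the quantities in Corollary 4.5 and of the bound (4.29),
# compared with the actual spectral radius; all blocks are small random
# placeholders and the prolongations plain pairwise aggregations.
import numpy as np
from scipy.linalg import eigh

def K(G, P, M):                            # Definition 4.1
    N = M - M @ P @ np.linalg.solve(P.T @ M @ P, P.T @ M)
    return eigh(0.5 * (N + N.T), 0.5 * (G + G.T), eigvals_only=True)[-1]

rng = np.random.default_rng(4)
n, m, omega = 8, 4, 0.7
Mat = rng.standard_normal((n, n))
A = Mat @ Mat.T + n * np.eye(n)
B = rng.standard_normal((m, n))            # rank m (almost surely)
C = np.zeros((m, m))
D_A = np.diag(np.diag(A))
Dinv = np.diag(1.0 / np.diag(A))
lam = eigh(A, D_A, eigvals_only=True)[-1]  # lambda_max(D_A^{-1} A)
alpha = 1.0 / lam                          # the suggested choice of alpha

Chat = C + B @ (2 * alpha * Dinv - alpha**2 * Dinv @ A @ Dinv) @ B.T
D_C = np.diag(np.diag(Chat))
P_A = np.kron(np.eye(n // 2), np.ones((2, 1)))
P_C = np.kron(np.eye(m // 2), np.ones((2, 1)))

kA, kC = K(A, P_A, D_A), K(Chat, P_C, D_C)
gA = 1.0 / (alpha * (2.0 - alpha * lam))
gC = eigh(C + B @ np.linalg.solve(A, B.T), D_C, eigvals_only=True)[-1]
kbar = 2 * kA * kC / (kA + kC)
gbar = 2 * gA * gC / (gA + gC)
bound = max(1 - omega / kA, 1 - omega / kC, omega * gA - 1, omega * gC - 1,
            np.sqrt(max(0.0, 1 - omega * (2 - omega * gbar) / kbar)))

Ahat = np.block([[A, (np.eye(n) - alpha * A @ Dinv) @ B.T],
                 [-B @ (np.eye(n) - alpha * Dinv @ A), Chat]])
P = np.block([[P_A, np.zeros((n, m // 2))], [np.zeros((m, n // 2)), P_C]])
D = np.diag(np.diag(Ahat))
I = np.eye(n + m)
T = (I - omega * np.linalg.inv(D) @ Ahat) \
    @ (I - P @ np.linalg.solve(P.T @ Ahat @ P, P.T @ Ahat))
print("rho(T) =", max(abs(np.linalg.eigvals(T))), "<= bound:", bound)
```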

That said, we can pursue the discussion initiated in Sect. 4.1. If \(P_A\) and \(P_{\widehat{C}}\) work well in combination with damped Jacobi smoothing for, respectively, \(A\) and \(\widehat{C}\), then \(\hat{\kappa }_{A}=K(A,P_A,D_A)\) and \(\hat{\kappa }_{\widehat{C}}=K(\widehat{C},P_{\widehat{C}},D_{\widehat{C}})\) will be reasonably bounded. The related bounds in (4.27) and (4.28) then guarantee that the iteration matrix will have no eigenvalue close to 1. Regarding this aspect, Corollary 4.5 in fact offers only a slight improvement over the result already discussed in Sect. 4.1, based on Lemma 4.1 applied to \(\widehat{\mathcal{A}}\) in combination with (4.9).

However, the further constraints provided by the last inequalities in (4.27) and (4.28) allow us to complete the analysis. In particular, we now have an effective bound on the spectral radius provided that, in addition to upper bounds on \(\hat{\kappa }_{A}\) and \(\hat{\kappa }_{\widehat{C}}\), one can show that \(\omega \,\hat{\gamma }_{A}\ll 2\) and \(\omega \,\hat{\gamma }_{\widehat{C}}\ll 2\).

Regarding these requirements, first note that we intend to select \(\omega \) as for damped Jacobi smoothing applied to a discrete Laplacian; that is, typically, such that \(0.5\le \omega \le 0.75\). This immediately ensures \(\omega \,\lambda _{\max }(D_A^{-1}A)\ll 2\), and hence

$$\begin{aligned} \omega \,\hat{\gamma }_{A}= \frac{\omega \,\lambda _{\max }\left( D_A^{-1}A\right) }{\big (\alpha \,\lambda _{\max }(D_A^{-1}A)\big ) \,\big (2-\alpha \,\lambda _{\max }(D_A^{-1}A)\big )} \end{aligned}$$

will also be away from 2 as soon as \(\alpha \) is chosen reasonably close to \((\lambda _{\max }(D_A^{-1}A))^{-1}\) (as recommended in Sect. 3).

Regarding \(\omega \,\hat{\gamma }_{\widehat{C}}\), we need some additional conditions. These, however, do not seem restrictive. At least, the results in the next section will confirm that they are satisfied in the typical examples considered there.

Firstly, recall that the approach is consistently defined only if the diagonal block \(\widehat{C}\) in (3.4) is “well adapted” to multigrid because, as will be illustrated in the next section, it resembles a discrete Laplacian. One will then have \(\omega \,\lambda _{\max }(D_{\widehat{C}}^{-1}\widehat{C})\ll 2\) when using \(\omega \) tailored for discrete Laplacians, and \(\omega \,\hat{\gamma }_{\widehat{C}}\ll 2\) will hold if

$$\begin{aligned} \frac{\lambda _{\max }\left( D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\right) }{\lambda _{\max }\left( D_{\widehat{C}}^{-1}\widehat{C}\right) } \approx 1. \end{aligned}$$
(4.30)

Note that, because

$$\begin{aligned} (C+B\,A^{-1}B^T)-\widehat{C}= B\, \left( I-\alpha D_A^{-1}A\right) A^{-1}\left( I-\alpha A\,D_A^{-1}\right) B^T \end{aligned}$$

is SPD, the above ratio cannot in fact be smaller than 1. Now,

$$\begin{aligned} \lambda _{\max }\left( D_{\widehat{C}}^{-1}\widehat{C}\right) = \max _{\mathbf{z}\in \mathbb {R}^m\backslash \{\mathbf{0}\}}\frac{\mathbf{z}^T\, \widehat{C}\,\mathbf{z}}{\mathbf{z}^T\, D_{\widehat{C}}\,\mathbf{z}}. \end{aligned}$$

Let then \(\mathbf{v}\) be any vector such that

$$\begin{aligned} \frac{\mathbf{v}^T\,\big (C\!+\!B\,A^{-1}B^T-\widehat{C}\big )\mathbf{v}}{\mathbf{v}^T\,(C\!+\!B\,A^{-1}B^T)\,\mathbf{v}} = \frac{\mathbf{v}^T\,B\, (I-\alpha D_A^{-1}A) A^{-1}(I-\alpha A\,D_A^{-1}) B^T\mathbf{v}}{\mathbf{v}^T\,(C+B\,A^{-1}B^T)\,\mathbf{v}} = \varepsilon ~\ll 1. \end{aligned}$$
(4.31)

There holds

$$\begin{aligned} \frac{\lambda _{\max }\left( D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\right) }{\lambda _{\max }\left( D_{\widehat{C}}^{-1}\widehat{C}\right) }&\le \frac{\lambda _{\max }\left( D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\right) }{\left( \tfrac{\mathbf{v}^T\, \widehat{C}\,\mathbf{v}}{\mathbf{v}^T\, D_{\widehat{C}}\,\mathbf{v}}\right) }\\&= \frac{\lambda _{\max }\left( D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\right) }{(1-\varepsilon )\,\left( \tfrac{\mathbf{v}^T\, (C+B\,A^{-1}B^T)\,\mathbf{v}}{\mathbf{v}^T\, D_{\widehat{C}}\,\mathbf{v}}\right) }. \end{aligned}$$

Hence (4.30) holds if there exists such a vector \(\mathbf{v}\) satisfying in addition

$$\begin{aligned}&\frac{\mathbf{v}^T\, (C+B\,A^{-1}B^T)\,\mathbf{v}}{\mathbf{v}^T\, D_{\widehat{C}}\,\mathbf{v}} ~\approx ~ \max _{\mathbf{z}\in \mathbb {R}^m\backslash \{\mathbf{0}\}}\frac{\mathbf{z}^T\, (C+B\,A^{-1}B^T)\,\mathbf{z}}{\mathbf{z}^T\, D_{\widehat{C}}\,\mathbf{z}}\nonumber \\&\quad = \lambda _{\max }\bigg (D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\bigg ). \end{aligned}$$
(4.32)

Thus, the approach is well founded if, on the one hand, \(\widehat{C}\) resembles a discrete Laplacian, and, on the other hand, there exists a vector \(\mathbf{v}\) satisfying simultaneously (4.31) and (4.32). Regarding this latter condition, we cannot supply a general algebraic argument, but, considering the Stokes problem (2.1) with constant viscosity \(\nu \), we can anticipate that there will be no difficulty at least in the two extreme cases \(\xi =0\) and \(\xi \rightarrow \infty \).

Indeed, if \(\xi =0\), it is a general property of stable discretizations that \(C+B\,A^{-1}B^T\) is well conditioned (see below for finite difference schemes and [12, Sect. 5.5] for finite element methods); and “well conditioned” means nothing but that the minimal and maximal values of the ratio \({\mathbf{z}^T\, (C+B\,A^{-1}B^T)\,\mathbf{z}}/{\mathbf{z}^T\, D_{\widehat{C}}\,\mathbf{z}}\) are close to each other. (Because the viscosity is assumed constant, \(D_{\widehat{C}}\) is close to a multiple of the identity.) Hence (4.32) holds for any vector, including those satisfying (4.31).

On the other hand, consider \(\xi \) large enough that \(A\) is strongly dominated by its diagonal. It follows that, for properly chosen \(\alpha \),

$$\begin{aligned} \left\| I-\alpha \,A^{1/2} D_A^{-1}A^{1/2}\right\| = \max \bigg (\alpha \,\lambda _{\max }\left( D_A^{-1}A\right) -1,\,1-\alpha \,\lambda _{\min }\left( D_A^{-1}A\right) \bigg ) \end{aligned}$$

is small. Then,

$$\begin{aligned}&\frac{\mathbf{v}^T\,B\, \left( I-\alpha D_A^{-1}A\right) A^{-1}\left( I-\alpha A\,D_A^{-1}\right) B^T\mathbf{v}}{\mathbf{v}^T\,(C+B\,A^{-1}B^T)\,\mathbf{v}}\\&\quad \le \left\| I-\alpha \,A^{1/2} D_A^{-1}A^{1/2}\right\| ^2 ~~\frac{\mathbf{v}^T\,B\,A^{-1}B^T\,\mathbf{v}}{\mathbf{v}^T\,(C+B\,A^{-1}B^T)\,\mathbf{v}} \\&\quad \le \left\| I-\alpha \,A^{1/2} D_A^{-1}A^{1/2}\right\| ^2~ \end{aligned}$$

will be small as well for any \(\mathbf{v}\). That is, (4.31) is satisfied by any vector, including the eigenvector associated with the largest eigenvalue of \(D_{\widehat{C}}^{-1}(C+B\,A^{-1}B^T)\) (which obviously satisfies (4.32)).

These reasonings will be further illustrated in the next section.

5 Examples

In this section we intend, among other things, to discuss \(\alpha \), in order to validate the rule \(\alpha =(\lambda _{\max }(D_A^{-1}A))^{-1}\) while investigating the sensitivity with respect to variations around this value (\(\lambda _{\max }(D_A^{-1}A)\) is in general only known approximately). Most often the results are expressed as a function of

$$\begin{aligned} \widetilde{\alpha }=\alpha \,\left\| D_A^{-1}A\right\| _{\infty }. \end{aligned}$$

That is, we use \(\Vert D_A^{-1}A\Vert _{\infty }\) as a cheap approximation to \(\lambda _{\max }(D_A^{-1}A)\), and \(\alpha =(\Vert D_A^{-1}A\Vert _{\infty })^{-1}\) as the practical implementation of the suggested rule \(\alpha \approx (\lambda _{\max }(D_A^{-1}A))^{-1}\). Expressing the results as a function of \(\widetilde{\alpha }\) then makes their interpretation easier, since the rule amounts to \(\widetilde{\alpha }=1\) independently of the problem at hand.
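As an illustration (not part of the original experiments), the rule can be coded in a few lines; the following Python sketch, whose function name is ours, computes \(\alpha =(\Vert D_A^{-1}A\Vert _{\infty })^{-1}\) for a general sparse matrix with positive diagonal:

import numpy as np
import scipy.sparse as sp

def practical_alpha(A):
    # alpha = 1 / ||D_A^{-1} A||_inf; since D_A^{-1} A has unit diagonal,
    # this infinity norm is never below lambda_max(D_A^{-1} A), so the
    # rule errs on the safe side of alpha <= 1/lambda_max(D_A^{-1} A)
    A = sp.csr_matrix(A)
    d = np.abs(A.diagonal())                  # positive for the SPD blocks here
    row_sums = np.asarray(np.abs(A).sum(axis=1)).ravel()
    return 1.0 / np.max(row_sums / d)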

5.1 Finite difference discretization on staggered grid

Here we consider the finite difference discretization of (2.1), in which the 2D domain \(\Omega \) is the unit square and the parameters \(\nu >0\) and \(\xi \ge 0\) are constant. We use a staggered grid, which ensures the natural stability of the discretization [38]. Hence the matrix (2.3) of the resulting linear system (2.2) is such that \(C=0\). We consider a uniform mesh size \(h\) in both directions, with \(h\) such that \(h^{-1}\) is an even integer.

For the prolongation, we select plain aggregation for both the pressure and the velocity components; that is, \(P_A\) and \(P_{\widehat{C}}\) have exactly one nonzero entry per row, which is further equal to one. The position of the nonzero is determined by the partitioning in aggregates: if the \(i\)th unknown belongs to the \(j\)th aggregate, the nonzero component in row \(i\) will be the one in column \(j\).

To keep the discussion of aggregation algorithms outside of this work, we consider a model geometric-based aggregation. More precisely, we assume that regularly aligned \(2\times 2\) box aggregates are always formed whenever possible. If the number of grid points in one of the directions is odd, this is completed with one line of aggregates of size 2. (For the boundary conditions considered below, there is always, for each type of unknown, at least one direction with an even number of grid points).
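To make this concrete, here is a minimal sketch (our own illustration, assuming lexicographic ordering with the \(x\) direction varying fastest and an even number of points in that direction) of the model aggregation and of the resulting plain aggregation prolongation:

import numpy as np
import scipy.sparse as sp

def box_aggregates(nx, ny):
    # agg[i] = index of the aggregate containing grid point i; nx is
    # assumed even (the text guarantees one even direction), and when
    # ny is odd the last grid line forms size-2 aggregates
    assert nx % 2 == 0
    agg = np.empty(nx * ny, dtype=int)
    ncx = nx // 2                        # aggregates per row of boxes
    for j in range(ny):
        for i in range(nx):
            agg[j * nx + i] = (i // 2) + ncx * (j // 2)
    return agg

def aggregation_prolongation(agg):
    # plain aggregation: P[i, agg[i]] = 1, exactly one nonzero per row
    n, nc = len(agg), agg.max() + 1
    return sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, nc))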

5.1.1 Periodic boundary conditions

We first discuss the case of periodic boundary conditions for all velocity components in all directions. The stencils for all matrices are then the same as for an infinite grid; that is, as under the classical assumptions of local mode or local Fourier analysis (LFA; see, e.g., [33, Chapter 4]). With periodic boundary conditions, however, we may apply our theoretical results without raising the question of their extension to infinite dimensions. As is often the case when using LFA, our motivation lies in a simplified setting (in particular, free from any boundary effect) that allows us to derive analytic estimates. Note, however, that this will be done without developing calculations in the Fourier basis.

Now, with these boundary conditions, the system matrix has one singular mode in general (the vector corresponding to constant pressure and zero velocity) and two additional singular modes when \(\xi =0\) (the vectors corresponding to constant velocity). However, it is well known that, in the LFA setting, these singular modes form an invariant subspace for all components of the iteration matrix, and what matters is the spectral radius associated with the set of other modes. Technically, it means that we may apply our results in Sect. 4.3 to the matrices expressed in the Fourier basis, after discarding the rows and columns corresponding to the singular mode(s). To avoid making this transformation explicit, we proceed as follows. We set \(C=\varepsilon \,I\) (where \(\varepsilon >0\)) and consider at first that \(\xi \) is positive. This allows us to apply our results (which are restricted to nonsingular matrices). Next, by a continuity argument on the eigenvalues, the limits of the bounds for \(\varepsilon \rightarrow 0\) and (possibly) for \(\xi \rightarrow 0\) provide valid bounds for the submatrices that one would obtain by explicitly discarding the singular mode(s).

We now characterize the different matrices, which will allow us to derive bounds for the quantities \(\hat{\kappa }_{A}\), \(\hat{\kappa }_{\widehat{C}}\), \(\hat{\gamma }_{A}\), and \(\hat{\gamma }_{\widehat{C}}\) involved in Corollary 4.5.

The matrix \(A\) has two diagonal blocks, and each of them corresponds to the following constant stencil acting on the periodic grid:

$$\begin{aligned} \frac{\nu }{h^2}\begin{bmatrix} & -1 & \\ -1 & 4\left( 1+\tfrac{h^2\,\xi }{4\,\nu }\right) & -1 \\ & -1 & \end{bmatrix}. \end{aligned}$$

Hence, letting

$$\begin{aligned} D_{A_0} = \text{ diag }\left( A\mid _{\xi =0}\right) = \tfrac{4\,\nu }{h^2}\,I \end{aligned}$$

and

$$\begin{aligned} \eta = \frac{\xi \,h^2}{4\,\nu }, \end{aligned}$$

one has \(D_A=\text{ diag }(A)=(1+\eta ) D_{A_0}\).
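For illustration, this block is easy to assemble; the following sketch (our own check, with names that are not from the paper) builds it via Kronecker products and verifies the relation \(D_A=(1+\eta )D_{A_0}\):

import numpy as np
import scipy.sparse as sp

def velocity_block(n, h, nu, xi):
    # one diagonal block of A on an n x n periodic grid: the 5-point
    # stencil above, assembled via Kronecker products
    e = np.ones(n)
    K = sp.diags([2 * e, -e[:-1], -e[:-1]], [0, -1, 1], format='lil')
    K[0, -1] = K[-1, 0] = -1              # periodic wrap-around
    I = sp.identity(n, format='csr')
    A0 = (nu / h**2) * (sp.kron(K.tocsr(), I) + sp.kron(I, K.tocsr()))
    return A0 + xi * sp.identity(n * n, format='csr')

n, h, nu, xi = 8, 1.0 / 8, 1.0, 3.0       # h^{-1} even, as in the text
eta = xi * h**2 / (4 * nu)
A = velocity_block(n, h, nu, xi)
assert np.allclose(A.diagonal(), 4 * nu / h**2 * (1 + eta))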

On the other hand, \(B\,D_{A_0}^{-1}B^T\) corresponds to the stencil

$$\begin{aligned} \frac{1}{4\,\nu }\begin{bmatrix} & -1 & \\ -1 & 4 & -1 \\ & -1 & \end{bmatrix}, \end{aligned}$$

whereas, for \(B\,D_{A_0}^{-1}(A-D_A)D_{A_0}^{-1}B^T\) we obtain the following stencil:

$$\begin{aligned} \frac{1}{16\,\nu } \begin{bmatrix} & & 1 & & \\ & 2 & -4 & 2 & \\ 1 & -4 & 4 & -4 & 1 \\ & 2 & -4 & 2 & \\ & & 1 & & \end{bmatrix}. \end{aligned}$$

Then, for the matrix

$$\begin{aligned} \widehat{C}&= C+B\left( 2\alpha D_A^{-1}-\alpha ^2 D_A^{-1}A D_A^{-1}\right) B^T \\&= C+(2\,\alpha -\alpha ^2)(1+\eta )^{-1}\, B\,D_{A_0}^{-1}B^T- \alpha ^2(1+\eta )^{-2}\,B\,D_{A_0}^{-1}(A-D_A)D_{A_0}^{-1}B^T, \end{aligned}$$

and using \( \widetilde{\alpha }=\alpha \,\Vert D_A^{-1}A\Vert _{\infty }=\alpha \,\lambda _{\max }(D_A^{-1}A)=\alpha \,(2+\eta )/(1+\eta ) \), we obtain the following stencil:

$$\begin{aligned} \frac{1}{16\,\nu \,(2+\eta )^2} \begin{bmatrix} & & -\widetilde{\alpha }^2 & & \\ & -2\,\widetilde{\alpha }^2 & -4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha }) & -2\,\widetilde{\alpha }^2 & \\ -\widetilde{\alpha }^2 & -4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha }) & s & -4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha }) & -\widetilde{\alpha }^2 \\ & -2\,\widetilde{\alpha }^2 & -4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha }) & -2\,\widetilde{\alpha }^2 & \\ & & -\widetilde{\alpha }^2 & & \end{bmatrix}, \end{aligned}$$

where \(s=4\big (4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha })+3\,\widetilde{\alpha }^2\big )+16\,\nu \,(2+\eta )^2\,\varepsilon \); i.e., the row-sum is equal to \(\varepsilon \).

This stencil is that of an M-matrix as long as the condition \(0<\widetilde{\alpha }<2\) holds, which is also the condition \(0<\alpha <2(\lambda _{\max }(D_A^{-1}A))^{-1}\) needed to apply Corollary 4.5. One may also observe that this stencil is in fact a linear combination of the application to three different grids of the classical five point stencil for the Laplace operator: the standard grid of mesh size \(h\), the grid of mesh size \(2\,h\), and the skew grid of mesh size \(\sqrt{2}\,h\).
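This decomposition is easy to verify numerically. The following sketch (our own check, with helper names that are not from the paper) assembles the stencil as such a combination and confirms the center value \(s\) and the zero row-sum for \(\varepsilon =0\):

import numpy as np

def five_point(S, c, step, w):
    # add w times the 5-point Laplacian on the grid of mesh step*h
    S[c, c] += 4 * w
    for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        S[c + step * di, c + step * dj] -= w

def skew_five_point(S, c, w):
    # add w times the 5-point Laplacian on the skew grid of mesh sqrt(2)*h
    S[c, c] += 4 * w
    for di, dj in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
        S[c + di, c + dj] -= w

alpha_t, eta, nu = 1.0, 0.5, 1.0
S = np.zeros((5, 5))
five_point(S, 2, 1, 4 * (2 + eta) * alpha_t * (2 - alpha_t))  # grid h
five_point(S, 2, 2, alpha_t**2)                               # grid 2h
skew_five_point(S, 2, 2 * alpha_t**2)                         # skew grid
S /= 16 * nu * (2 + eta)**2

s0 = 4 * (4 * (2 + eta) * alpha_t * (2 - alpha_t) + 3 * alpha_t**2)
assert np.isclose(S[2, 2], s0 / (16 * nu * (2 + eta)**2))
assert abs(S.sum()) < 1e-12               # zero row-sum for eps = 0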

For the parameters in Corollary 4.5, we first obtain

$$\begin{aligned} \hat{\gamma }_{A}= \frac{\left\| D_A^{-1}A\right\| _{\infty }}{\widetilde{\alpha }\,(2-\widetilde{\alpha })} = \frac{2+\eta }{(1+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha })} \le \frac{2}{\widetilde{\alpha }(2-\widetilde{\alpha })}. \end{aligned}$$
(5.1)

Next, it is well known that, letting \(A_0=A\mid _{\xi =0}\),

$$\begin{aligned} \lambda _{\max }\left( B\, A_0^{-1}B^T\right) = \nu ^{-1}, \end{aligned}$$

which further implies that, since \(A=A_0+\xi \,I\ge \left( 1+\frac{\xi }{\lambda _{\max }(A_0)}\right) A_0= \left( 1+\frac{\eta }{2}\right) A_0\),

$$\begin{aligned} \lambda _{\max }\left( B\, A^{-1}B^T\right) \le \frac{2\,\nu ^{-1}}{2+\eta }. \end{aligned}$$

Hence, since \(D_{\widehat{C}}= \frac{4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha })+3\,\widetilde{\alpha }^2+4\,\nu \,(2+\eta )^2\,\varepsilon }{4\,\nu \,(2+\eta )^2}\,I\):

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~\hat{\gamma }_{\widehat{C}}\le \frac{8\,(2+\eta )}{4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha })+3\,\widetilde{\alpha }^2} \le \frac{2}{\widetilde{\alpha }(2-\widetilde{\alpha })}. \end{aligned}$$
(5.2)

This result is in fact in agreement with the discussion at the end of Sect. 4.3. This is most easily seen for \(\widetilde{\alpha }=1\), because then the eigenvector of \(\widehat{C}\) associated with its largest eigenvalue is the vector that, using a red-black partitioning, takes the value \(+1\) at red nodes and the value \(-1\) at black nodes. The corresponding eigenvalue is \({2\,\nu ^{-1}}/({2+\eta })\); i.e., it is equal to the above upper bound on \(\lambda _{\max }(B\,A^{-1}B^T)\). This means that \(\lambda _{\max }(B\,A^{-1}B^T)\) and \(\lambda _{\max }(\widehat{C})\) are both equal to this value, since the former cannot be smaller than the latter. Thus (4.30) holds as an equality. Going into the details, we may associate this with the fact that \(B\,A^{-1}B^T\) amounts to \(\nu ^{-1}\) times the identity for \(\eta =0\). Hence, it suffices to have one vector \(\mathbf{v}\) satisfying \((I-\alpha \,A\, D_A^{-1})B^T\mathbf{v}=\mathbf{0}\) to enforce the equality between the largest eigenvalues of \(\widehat{C}\) and \(B\,A^{-1}B^T\). On the other hand, with \(\alpha =(\lambda _{\max }(D_A^{-1}A))^{-1}\),

$$\begin{aligned} \left\| \left( I-\alpha \, D_A^{-1}A\right) A^{-1}\left( I-\alpha \,A\, D_A^{-1}\right) \right\|&= \left( 1-\tfrac{\lambda _{\min }(D_A^{-1}A)}{\lambda _{\max }(D_A^{-1}A)}\right) ^2 \lambda _{\min }(A)^{-1}\\&\le \left( \tfrac{2}{2+\eta }\right) ^2~\tfrac{h^2}{4\,\nu \,\eta }, \end{aligned}$$

showing that \(\Vert B\,A^{-1}B^T-\widehat{C}\Vert \rightarrow 0\) for \(\eta \rightarrow \infty \).

Let us now return to (5.1) and (5.2). We deduce an upper bound on \(\widetilde{\gamma }\) using the left inequalities for greater accuracy:

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~\widetilde{\gamma }^{-1}= \frac{\hat{\gamma }_{A}^{-1}+\lim _{\varepsilon \rightarrow 0}~\hat{\gamma }_{\widehat{C}}^{-1}}{2} \ge \frac{\widetilde{\alpha }\,(32-13\,\widetilde{\alpha })+\eta \,\widetilde{\alpha }\,(24-12\,\widetilde{\alpha })}{32+16\,\eta }. \end{aligned}$$

A uniform bound w.r.t. \(\eta \) can then be derived by taking the worst of the values obtained for \(\eta =0\) and \(\eta =\infty \), which gives

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~\widetilde{\gamma }\le \max \left( \frac{32}{\widetilde{\alpha }\,(32-13\,\widetilde{\alpha })},\, \frac{4}{3\,\widetilde{\alpha }\,(2-\widetilde{\alpha })}\right) . \end{aligned}$$
(5.3)

Regarding \(\hat{\kappa }_{A}\) and \(\hat{\kappa }_{\widehat{C}}\), we may apply the results in [21] to obtain bounds on the approximation property constant for aggregation-based prolongations. We first consider \(\hat{\kappa }_{A}\). A key ingredient is a splitting \(A=A_b+A_r\) of the matrix \(A\) such that both \(A_b\) and \(A_r\) are nonnegative definite and \(A_b\) is block diagonal with respect to the partitioning in aggregates. Then, particularized to the present context, and using Definition 4.1, Theorem 3.2 of [21] essentially says that

$$\begin{aligned} K(A,P_A,D_A) = \max _{\text {over all aggregates }i} K\left( A_b^{(i)},\mathbf{1}_i,D_A^{(i)}\right) , \end{aligned}$$

where \(A_b^{(i)}\) is the diagonal block of \(A_b\) related to the unknowns in aggregate \(i\), where \(\mathbf{1}_i\) is the constant vector of size equal to the number of unknowns in aggregate \(i\), and where \(D_A^{(i)}\) is the diagonal of \(A\) restricted to aggregate \(i\).

For the model example under consideration, the application of this result is facilitated by the fact that the coefficients are constant and that all the aggregates are similarly made of \(2\times 2\) grid points. (With periodic boundary conditions and \(h^{-1}\) equal to an even integer, the number of grid points in the \(x\) and \(y\) directions is even for both velocity components.) For each aggregate \(i\), we take the corresponding diagonal block in \(A_b\) as

$$\begin{aligned} A_b^{(i)}~= \frac{\nu }{h^2}\begin{pmatrix} 2+4\,\eta & -1 & -1 & 0 \\ -1 & 2+4\,\eta & 0 & -1 \\ -1 & 0 & 2+4\,\eta & -1 \\ 0 & -1 & -1 & 2+4\,\eta \end{pmatrix}. \end{aligned}$$

One may check that \(A_r=A-A_b\) is then an M-matrix with zero row-sum, hence nonnegative definite. On the other hand, \(D_A^{(i)}= \frac{4\,\nu \,(1+\eta )}{h^2}\,I\). It follows that the constant vector is the eigenvector of \(\left. D_A^{(i)}\right. ^{-1}A_b^{(i)}\) associated with its smallest eigenvalue. Then, Theorem 3.4 of [21] further tells us that \(K(A_b^{(i)},\mathbf{1}_i,D_A^{(i)})\) is in fact equal to the inverse of the second smallest eigenvalue. The related eigenvectors being \((1~~1~-\!1~-\!1)^T\) and \((1~-\!1~~1~-\!1)^T\), this yields

$$\begin{aligned} K\left( A_b^{(i)},\mathbf{1}_i,D_A^{(i)}\right) = \frac{2\,(1+\eta )}{1+2\,\eta }. \end{aligned}$$

Thus:

$$\begin{aligned} \hat{\kappa }_{A}\le \frac{2\,(1+\eta )}{1+2\,\eta } \le 2. \end{aligned}$$
(5.4)
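As a sanity check (ours, not taken from [21]), this value is easily confirmed by a direct eigenvalue computation on a single aggregate:

import numpy as np

def K_Ab(eta, nu=1.0, h=1.0):
    # inverse of the second smallest eigenvalue of D_A^{-1} A_b
    # on a 2x2 aggregate
    d = 2 + 4 * eta
    Ab = (nu / h**2) * np.array([[ d, -1, -1,  0],
                                 [-1,  d,  0, -1],
                                 [-1,  0,  d, -1],
                                 [ 0, -1, -1,  d]])
    DA = 4 * nu * (1 + eta) / h**2
    mu = np.sort(np.linalg.eigvalsh(Ab / DA))
    return 1.0 / mu[1]

for eta in [0.0, 0.5, 10.0]:
    assert np.isclose(K_Ab(eta), 2 * (1 + eta) / (1 + 2 * eta))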

We proceed similarly for \(\hat{\kappa }_{\widehat{C}}\). Using

$$\begin{aligned}&\widehat{C}_b^{(i)} =\tfrac{\widetilde{\alpha }}{8\,\nu \,(2+\eta )^2} \\&\quad \times {\small \begin{pmatrix} 4(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha } & -2(2+\eta )(2-\widetilde{\alpha }) & -2(2+\eta )(2-\widetilde{\alpha }) & -\widetilde{\alpha } \\ -2(2+\eta )(2-\widetilde{\alpha }) & 4(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha } & -\widetilde{\alpha } & -2(2+\eta )(2-\widetilde{\alpha }) \\ -2(2+\eta )(2-\widetilde{\alpha }) & -\widetilde{\alpha } & 4(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha } & -2(2+\eta )(2-\widetilde{\alpha }) \\ -\widetilde{\alpha } & -2(2+\eta )(2-\widetilde{\alpha }) & -2(2+\eta )(2-\widetilde{\alpha }) & 4(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha } \end{pmatrix}} \end{aligned}$$

ensures that \(\widehat{C}_r=\widehat{C}-\widehat{C}_b\) is an M-matrix with positive row-sum (equal to \(\varepsilon \)). With

$$\begin{aligned} D_{\widehat{C}}^{(i)} = \tfrac{4\big (4(2+\eta )\,\widetilde{\alpha }(2-\widetilde{\alpha })+3\,\widetilde{\alpha }^2\big )+16\,\nu \,(2+\eta )^2\,\varepsilon }{16\,\nu \,(2+\eta )^2}~I, \end{aligned}$$

the constant vector is again the eigenvector of \(\left. D_{\widehat{C}}^{(i)}\right. ^{-1}\widehat{C}_b^{(i)}\) associated with the smallest eigenvalue. Further, the vectors \((1~~1~-\!1~-\!1)^T\) and \((1~-\!1~~1~-\!1)^T\) are also eigenvectors associated with the second smallest eigenvalue, and computing its inverse yields

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~K(\widehat{C}_b^{(i)},\mathbf{1}_i,D_{\widehat{C}}^{(i)}) = \frac{4(2+\eta )\,(2-\widetilde{\alpha })+3\,\widetilde{\alpha }}{2(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha }}, \end{aligned}$$

and, therefore,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~\hat{\kappa }_{\widehat{C}}\le \frac{4(2+\eta )\,(2-\widetilde{\alpha })+3\,\widetilde{\alpha }}{2(2+\eta )(2-\widetilde{\alpha })+\widetilde{\alpha }} \le \frac{16-5\,\widetilde{\alpha }}{8-3\,\widetilde{\alpha }}. \end{aligned}$$
(5.5)
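The same direct check applies here (in the limit \(\varepsilon \rightarrow 0\); the common scaling factor \(\widetilde{\alpha }/(8\,\nu \,(2+\eta )^2)\) cancels between \(\widehat{C}_b^{(i)}\) and \(D_{\widehat{C}}^{(i)}\), so it is omitted below):

import numpy as np

def K_Cb(eta, a):
    # a stands for alpha_tilde; eigenvalues of D^{-1} C_hat_b on one aggregate
    k = (2 + eta) * (2 - a)
    d = 4 * k + a
    Cb = np.array([[  d, -2*k, -2*k,   -a],
                   [-2*k,   d,   -a, -2*k],
                   [-2*k,   -a,   d, -2*k],
                   [  -a, -2*k, -2*k,   d]])
    D = 2 * (4 * k + 3 * a)          # D_C_hat^{(i)} in the same scaling
    mu = np.sort(np.linalg.eigvalsh(Cb / D))
    return 1.0 / mu[1]               # inverse of second smallest eigenvalue

for eta in [0.0, 1.0, 10.0]:
    for a in [0.5, 1.0, 1.5]:
        expected = (4*(2+eta)*(2-a) + 3*a) / (2*(2+eta)*(2-a) + a)
        assert np.isclose(K_Cb(eta, a), expected)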

Finally, combining the right inequalities (5.4) and (5.5), we get

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}~\widetilde{\kappa }^{-1}= \frac{\hat{\kappa }_{A}^{-1}+\hat{\kappa }_{\widehat{C}}^{-1}}{2} \ge \frac{32-11\,\widetilde{\alpha }}{4\,(16-5\,\widetilde{\alpha })}. \end{aligned}$$
(5.6)

In view of (5.1)–(5.6), the upper bound (4.29) of Corollary 4.5 thus implies, for any \(\xi \ge 0\) [using the right inequalities to get uniform bounds w.r.t. \(\eta \), and noting that the 2 in (5.4) cannot be larger than the right hand side of (5.5)],

$$\begin{aligned} \rho (\mathcal{T}) \le \max (\delta _1,\,\delta _2,\,\delta _3), \end{aligned}$$
(5.7)

where

$$\begin{aligned} \delta _1&= 1-\frac{\omega \,(8-3\,\widetilde{\alpha })}{16-5\,\widetilde{\alpha }}, \\ \delta _2&= \frac{2\,\omega }{\widetilde{\alpha }(2-\widetilde{\alpha })}-1, \\ \delta _3&= \left( 1-\frac{4\,\omega \,(16-5\,\widetilde{\alpha }) \left( 2-\omega ~ \max \left( \frac{32}{\widetilde{\alpha }\,(32-13\,\widetilde{\alpha })},\, \frac{4}{3\,\widetilde{\alpha }\,(2-\widetilde{\alpha })} \right) \right) }{32-11\,\widetilde{\alpha }}\right) ^{1/2}. \end{aligned}$$

This result is illustrated in Fig. 1. One sees that (5.7) guarantees that the spectral radius is below \(0.85\) when \(\widetilde{\alpha }\) is around 1 and \(\omega \) around \(0.6\). It is worth noting that this holds uniformly with respect to \(\nu \), \(h\) and \(\xi \) without using any form of parameter adjustment.

Fig. 1 Staggered grid: analytical uniform (w.r.t. \(\nu \), \(h\) and \(\xi \)) bound (5.7) on \(\rho (\mathcal{T})\) as a function of the user parameters \(\widetilde{\alpha }\), \(\omega \)

5.1.2 Dirichlet boundary conditions

We use Dirichlet boundary conditions on all boundaries for both velocity components. Then, away from the boundaries, the stencils of the different matrices are as with periodic boundary conditions. However, perturbations are introduced near the boundaries. In addition, the staggered arrangement of the unknowns on the grid implies that, for each velocity component, the number of grid points is odd in either the \(x\) or the \(y\) direction. Hence, box aggregation cannot be used everywhere. Finally, according to what is written at the end of Sect. 2, the problem is regularized by removing the last pressure component from the unknown set, fixing its value to zero.

Altogether, these departures from the ideal situation of the preceding case make the derivation of analytic bounds intractable. We therefore illustrate the application of Corollary 4.5 by resorting to numerical computation to assess the needed quantities \(\hat{\kappa }_{A}\), \(\hat{\kappa }_{\widehat{C}}\), \(\hat{\gamma }_{A}\), and \(\hat{\gamma }_{\widehat{C}}\). The values obtained are reported in Table 1, alongside the analytic bounds derived for the case of periodic boundary conditions (using in all cases the left inequalities to get the most accurate estimates). One sees that these analytic estimates remain realistic despite the boundary effects.

In Fig. 2, we plot the eigenvalue distribution of the iteration matrix against the inequalities provided in Corollary 4.5. One sees that the analysis reflects the actual distribution well, which is much more clustered than the sole analysis of the spectral radius indicates. This suggests that it might be more effective to use the two-grid method as a preconditioner for a Krylov subspace method (whose convergence is favored by a good clustering of the eigenvalues).

Table 1 Staggered grid: results for \(h=1/32\), \(\nu =1\), \(\widetilde{\alpha }=1\), and \(\omega =0.6\)
Fig. 2 Staggered grid. \(+\), eigenvalues of \(\mathcal{T}\) for \(h=1/32\), \(\nu =1\), \(\widetilde{\alpha }=1\), and \(\omega =0.6\); \(-\), limit of the region defined by the inequalities in Corollary 4.5 (horizontal lines close to the real axis indicate regions where in fact only real eigenvalues are permitted); - - -, bound (4.29) on the spectral radius

Finally, it is worth checking the relevance of the transformation proposed in Sect. 3 independently of the theoretical results. With this in view, we depict in Fig. 3, left, the actual value of the spectral radius as a function of \(\widetilde{\alpha }\). One sees that \(\widetilde{\alpha }\approx 1\) is indeed near optimal, as suggested by the analysis, whereas the results depend only weakly on \(\widetilde{\alpha }\) when it is chosen around this value. Thus, one may rely in practice on the rule \(\alpha =(\Vert D_A^{-1}A\Vert _{\infty })^{-1}\), a more accurate estimation of \(\lambda _{\max }(D_A^{-1}A)\) being unnecessary.

Fig. 3 \(\rho (\mathcal{T})\) as a function of \(\widetilde{\alpha }\) for \(h=1/32\) and \(\omega =0.6\)

5.2 Finite difference discretization on collocated grid

Here we consider again the finite difference discretization of (2.1) in which the 2D domain \(\Omega \) is the unit square. However, we now use a collocated grid. We restrict ourselves to \(\xi =0\) but consider two situations for the viscosity: constant viscosity, where \(\nu \) is uniformly equal to 1, and variable viscosity, where \(\nu \) is equal to \(10^3\) in the central part of the domain [i.e., in the square \((0.25,\,0.75) \times (0.25,\,0.75)\)] and \(\nu =1\) elsewhere. Here also, we use a uniform mesh size \(h\) in both directions, with \(h\) such that \(h^{-1}\) is an even integer. We consider only Dirichlet boundary conditions on all boundaries for all velocity components.

With collocated grids, all unknowns are located at the vertices of grid cells, which makes the discretization somewhat easier but induces the presence of spurious pressure modes with zero discrete divergence [38]. Hence a form of stabilization is required and, according to the discussion in [17], we take here \(C\) equal to the five point discretization of \((16\,\nu )^{-1}h^2 \Delta \) (with Neumann boundary conditions).

On the other hand, as in the preceding section, the pressure is only determined up to a constant, and we regularize the problem by removing the last pressure component from the unknown set, fixing its value to zero.

On collocated grids, the classical prolongation based on bilinear interpolation is well defined for all three types of unknowns. Hence we select this prolongation, noting that classical AMG schemes aim at reproducing it for simple discretizations of the Laplace operator.
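As an illustration of this choice (a standard construction sketched with dimensions of our choosing, not the paper's code), such a prolongation can be assembled as a tensor product of 1D linear interpolations with weights \(\tfrac{1}{2},1,\tfrac{1}{2}\):

import numpy as np
import scipy.sparse as sp

def linear_interp_1d(nc):
    # 1D prolongation from nc coarse to 2*nc - 1 fine points,
    # the coarse points sitting at every other fine point
    nf = 2 * nc - 1
    P = sp.lil_matrix((nf, nc))
    for j in range(nc):
        P[2 * j, j] = 1.0              # fine point coincides with coarse one
        if 2 * j + 1 < nf:
            P[2 * j + 1, j] = 0.5      # midpoints average their two
            P[2 * j + 1, j + 1] = 0.5  # coarse neighbours
    return P.tocsr()

P1 = linear_interp_1d(5)               # 9 fine points in 1D
P2d = sp.kron(P1, P1).tocsr()          # bilinear interpolation in 2D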

A first remark here is that a simple transformation of (2.2) into

$$\begin{aligned} \begin{pmatrix}A & B^T \\ -B & C\end{pmatrix}\begin{pmatrix}\mathbf{u}\\ \mathbf{p}\end{pmatrix}= \begin{pmatrix}\mathbf{b}_{\mathbf{u}} \\ -\mathbf{b}_{\mathbf{p}}\end{pmatrix}\end{aligned}$$
(5.8)

suffices to make the unknown-based coarsening well defined, since \(C\) is SPD (after regularization) and further corresponds to a discrete Laplace matrix. Observing that the above system corresponds to the limit case of our approach for \(\alpha \rightarrow 0\), it is thus worthwhile to start here with the experiment that concluded the preceding section, and to check the evolution of the spectral radius with respect to \(\widetilde{\alpha }=\alpha \,\Vert D_A^{-1}A\Vert _{\infty }\). This is done in Fig. 3, right. One sees that the method is indeed still convergent for \(\widetilde{\alpha }=0\), but significantly slower than for \(\widetilde{\alpha }\) between \(0.5\) and \(2\). Observe also that our approach performs equally well in the presence of jumping viscosity.

We then focus on \(\widetilde{\alpha }=1\), according to the recommendation in Sect. 3. Computations reveal that, away from internal and external boundaries, \(\widehat{C}\) corresponds to the stencil

$$\begin{aligned} \frac{\nu ^{-1}}{256} \begin{bmatrix} & & & -1 & & & \\ & & -1 & -12 & -1 & & \\ & -1 & 0 & -13 & 0 & -1 & \\ -1 & -12 & -13 & 112 & -13 & -12 & -1 \\ & -1 & 0 & -13 & 0 & -1 & \\ & & -1 & -12 & -1 & & \\ & & & -1 & & & \end{bmatrix}. \end{aligned}$$

Again, this stencil is that of an M-matrix with zero row-sum. Moreover, one sees that the strong couplings correspond to a linear combination of the application to two different grids of the classical five point stencil for the Laplace operator: the standard grid of mesh size \(h\), and the grid of mesh size \(2\,h\).

Hence, Corollary 4.5 can be consistently applied. The numerically computed quantities \(\hat{\kappa }_{A}\), \(\hat{\kappa }_{\widehat{C}}\), \(\hat{\gamma }_{A}\), and \(\hat{\gamma }_{\widehat{C}}\) are reported in Table 2, as well as the related bound (4.29) and the actual value of the spectral radius. Here again, one sees that the jump of the viscosity has nearly no influence.

Table 2 Collocated grid: results for \(h=1/32\), \(\widetilde{\alpha }=1\), and \(\omega =0.6\)

Finally, in Fig. 4, right, we plot the eigenvalue distribution of the iteration matrix against the inequalities provided in Corollary 4.5 with the parameters as in Table 2. This eigenvalue distribution can be compared with that plotted in Fig. 4, left, which corresponds to \(\widetilde{\alpha }=0\); that is, to the case where one applies the multigrid scheme directly to the system (5.8). This provides another illustration of the improvement brought by the transformation proposed in this paper.

Fig. 4 Collocated grid. \(+\), eigenvalues of \(\mathcal{T}\) for \(h=1/32\) and \(\omega =0.6\) (constant viscosity); \(-\), limit of the region defined by the inequalities in Corollary 4.5 (\(\widetilde{\alpha }>0\) only; horizontal lines close to the real axis indicate regions where in fact only real eigenvalues are permitted); - - -, bound (4.29) on the spectral radius

6 Comparison with block diagonal preconditioning

We mentioned in the introduction the family of block preconditioners for saddle point matrices in their original form (2.3). Among these, the most popular is perhaps the plain block diagonal preconditioner

$$\begin{aligned} \begin{pmatrix}\widetilde{A} & \\ & \widetilde{S} \end{pmatrix}, \end{aligned}$$

where \(\widetilde{A}\) is an approximation of \(A\), and \(\widetilde{S}\) is an approximation of the Schur complement \(C+B\,A^{-1}B^T\). Indeed, this preconditioner is symmetric and positive definite, whereas \(\mathcal{A}\) is symmetric; thus, the convergence can be accelerated with MINRES [31], which minimizes the residual norm with a short recurrence algorithm. Of course, relevant approximations \(\widetilde{A}\) can be set up by applying any sensible AMG method, which should raise no difficulty since \(A\) is made up of diagonal blocks that are discrete representations of \(\xi -\nu \Delta \).

Schur complement approximations require more care, but, in fact, a fairly complete convergence analysis is available for Stokes type problems (e.g. [12, 19, 30]). Considering the formulation (2.1), when \(\xi =0\), \(\widetilde{S}\) can be taken as \(\nu ^{-1}\) times (an approximation of) the pressure mass matrix, which further reduces to the identity in the finite difference case. In the case of positive \(\xi \), the Cahouet–Chabard preconditioner can be used [8], which approximates the Schur complement with a Poisson-like operator; this latter can itself be further replaced by, say, one application of an AMG preconditioner.
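For reference, denoting by \(M_p\) the pressure mass matrix and by \(L_p\) a discrete pressure Laplacian (notations that are ours, not used elsewhere in this paper), the Cahouet–Chabard preconditioner acts as

$$\begin{aligned} \widetilde{S}^{-1} = \nu \,M_p^{-1} + \xi \,L_p^{-1}, \end{aligned}$$

which reduces to the pressure mass matrix choice when \(\xi =0\).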

Now, a complete comparison of the approach suggested here with block diagonal preconditioning lies outside the scope of the present work. It would require a dedicated study like the one undertaken in [16] for geometric multigrid methods. More importantly, one would need to address practically oriented questions such as which smoothing scheme performs best in practice, which multigrid cycle is recommended, etc. As stated in the introduction, these are deliberately left for future research because their answers likely depend on the type of coarsening, and possibly also on the class of Stokes problems under consideration.

Nevertheless, we want to complete the proof of concept presented here by illustrating that the proposed approach can be competitive even when used in a straightforward (and perhaps naive) way. By “straightforward”, we mean just passing the transformed matrix \(\widehat{\mathcal{A}}\) (3.3) to an AMG code, without any form of tuning or adaptation, besides requiring “unknown-based” coarsening (i.e., the prolongation is set up separately for the different types of unknowns, based on the corresponding diagonal blocks in \(\widehat{\mathcal{A}}\)).
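To fix ideas, the following Python sketch assembles a transformed matrix with the block structure used throughout this paper; the triangular factors are reconstructed here from the expression of \(\widehat{C}\) in Sect. 5.1.1 and may differ in detail from the actual definition (3.3):

import scipy.sparse as sp

def transformed_matrix(A, B, C, alpha):
    # L * M * R with M = [[A, B^T], [-B, C]],
    # L = [[I, 0], [alpha*B*Dinv, I]], R = [[I, -alpha*Dinv*B^T], [0, I]];
    # the (2,2) block is then C + 2*alpha*B*Dinv*B^T
    #                           - alpha^2*B*Dinv*A*Dinv*B^T, i.e. C_hat
    Dinv = sp.diags(1.0 / A.diagonal())
    BDinv = B @ Dinv
    A12 = B.T - alpha * (A @ (Dinv @ B.T))    # (I - alpha*A*Dinv) B^T
    A21 = -(B - alpha * (BDinv @ A))          # -B (I - alpha*Dinv*A)
    A22 = C + 2 * alpha * (BDinv @ B.T) \
            - alpha**2 * (BDinv @ (A @ (Dinv @ B.T)))
    return sp.bmat([[A, A12], [A21, A22]], format='csr')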

We selected the aggregation-based algebraic multigrid method (AGMG) from [22, 25, 27]. Indeed, a black box code is available [24], in which “unknown-based” coarsening is offered as an option. The package, although written in FORTRAN, also has a MATLAB interface, which allowed us to use this environment to define the problem and to perform the matrix transformations described in Sect. 3. We set all parameters to their defaults, which, in particular, implies that the system is solved using the AMG method as a preconditioner for GCR [10] (restarted every 10 iterations).

Regarding the block diagonal preconditioner, the same AGMG code with default parameters was used to generate the needed approximation \(\widetilde{A}\); more precisely, the action of \(\widetilde{A}^{-1}\) was implemented as one call to the package, requesting one application of the (previously set up) AMG preconditioner. Note that here the main iteration was performed in MATLAB, using the provided implementation of MINRES.
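In SciPy terms, the combination can be sketched as follows (our own analogue of that MATLAB setup; solve_A stands for one application of the AMG preconditioner and solve_S for the action of \(\widetilde{S}^{-1}\), both hypothetical callables here):

import numpy as np
import scipy.sparse.linalg as spla

def solve_block_diag_minres(K, solve_A, solve_S, b, n, maxiter=500):
    # K is the (symmetric) saddle point matrix, n the number of velocity
    # unknowns; the preconditioner is diag(A_tilde, S_tilde)^{-1}
    def apply_prec(r):
        y = np.empty_like(r)
        y[:n] = solve_A(r[:n])    # velocity block: one AMG application
        y[n:] = solve_S(r[n:])    # pressure block: identity if S_tilde = I
        return y
    P = spla.LinearOperator(K.shape, matvec=apply_prec)
    return spla.minres(K, b, M=P, maxiter=maxiter)

# e.g., with S_tilde = I as in the experiments below:
# x, info = solve_block_diag_minres(K, amg_apply, lambda r: r, b, n)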

As test problems, we consider those of Sects. 5.1 and 5.2 (with Dirichlet boundary conditions) in their simplest form; i.e., with \(\xi =0\) and constant viscosity \(\nu =1\), which somewhat simplifies the discussion of the Schur complement approximation: \(\widetilde{S}=I\) is then a standard choice (for the staggered grid, it would give the exact Schur complement on an infinite grid). The right hand side was a vector with random velocity components and zero pressure components, the initial approximation was the zero vector, and the iterations were stopped when the relative residual was below \(10^{-6}\).

Table 3 Number of iterations and total elapsed time (including setup) in seconds needed to solve the linear system

The results are given in Table 3. As observed in [29], the number of iterations for the block diagonal preconditioner can be made smaller by using a more involved AMG method than the plain aggregation method of AGMG, but this is overall not necessarily cost effective. Regarding AMG for the transformed matrix, the number of iterations is roughly 50 % larger than that reported in [23] for the solution of pure Poisson problems with the same software. This extra cost can be considered moderate: typically, a geometric multigrid method for Stokes problems achieves a convergence rate similar to that for pure Poisson problems, but at the price of a much more costly smoothing procedure.

On the other hand, the timing results should be considered with care. The codes are not really comparable, and one may think that the block preconditioner would benefit from a pure FORTRAN implementation, although only matrix and vector operations are performed in the MATLAB environment and most of the time is spent in the calls to AGMG. At the same time, we mentioned above that the straightforward use of AGMG for \(\widehat{\mathcal{A}}\) was perhaps naive, and this is certainly the case with respect to implementation: proceeding in this way, we do not use the trick indicated at the end of Sect. 3; hence the timing results can certainly be improved significantly with a dedicated implementation.

7 Conclusions

We developed the foundations of a new approach to solve discrete Stokes equations with multigrid. It consists in pre-conditioning the original linear system in such a way that an “unknown-based” multigrid approach can be straightforwardly applied to the transformed system, allowing in particular the use of algebraic multigrid methods.

The approach has been validated via a detailed theoretical analysis of the iteration matrix associated with a single two-grid step using damped Jacobi smoothing. It turns out that a uniform bound on the spectral radius holds, under the main assumption that the “unknown-based” multigrid approach is applied in a sensible way. Besides, some technical conditions are to be checked; these seem, however, not restrictive for standard discretizations of Stokes equations. At any rate, two examples of such discretizations were successfully investigated.

The analysis of the examples further reveals that the approach can be applied indifferently to stationary and time-dependent Stokes problems, and can also be robust in the presence of variable viscosity. Finally, a numerical comparison shows that the new approach can indeed be a relevant alternative to the nowadays commonly used strategies based on combining AMG with block preconditioning methods.