
1 Introduction and Motivation

We consider the numerical solution of nonsymmetric linear systems of equations of the form

$$\displaystyle{ A\mathbf{u} = \mathbf{f}, }$$
(1)

that arise from the discretization of partial differential equations (PDEs). In practical problems the number of mesh points, and hence the number of unknowns in (1), is very large, and the resulting matrix is large and sparse. In these circumstances iterative methods are often used, due to their ability to deal more effectively with a high degree of sparsity. A popular iterative method is the Generalized Minimal Residual scheme, or GMRES [8, 10]. This method is based on minimizing at the kth iterate the residual within the affine Krylov subspace \(\mathbf{u}_{0} + \mathcal{K}^{k}(A,\mathbf{r}_{0})\), where \(\mathbf{u}_{0}\) is an initial vector, \(\mathbf{r}_{0} = \mathbf{f} - A\mathbf{u}_{0}\) is the initial residual, and

$$\displaystyle{\mathcal{K}^{k}(A,\mathbf{r}_{ 0}) =\mathrm{ span}(\mathbf{r}_{0},A\mathbf{r}_{0},\ldots,A^{k-1}\mathbf{r}_{ 0}).}$$

The performance of GMRES is often (though not exclusively) determined by the structure of the eigenvalues of the matrix A. Loosely speaking, if they are strongly clustered, then GMRES is expected to converge fast. To accomplish a clustering effect, a preconditioner M is typically used: instead of solving (1) we solve, say,

$$\displaystyle{\mathit{AM}\tilde{\mathbf{u}} = \mathbf{f},}$$

where M is constructed so that AM has a more favorable eigenstructure than A. Upon incorporating the preconditioner M, the Krylov subspace changes accordingly: the matrix associated with the subspace becomes AM, and the residual is minimized over this preconditioned subspace.
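To make the right-preconditioned formulation concrete, the following minimal Python/SciPy sketch (our illustration, not part of the original presentation) solves \(\mathit{AM}\tilde{\mathbf{u}} = \mathbf{f}\) and recovers \(\mathbf{u} = M\tilde{\mathbf{u}}\); here `M_apply` is an assumed callable implementing the action of M. Note that `scipy.sparse.linalg.gmres` applies its `M` argument as a left preconditioner, so we build the operator AM explicitly instead.

```python
# Minimal sketch: solve (A M) y = f with GMRES, then recover u = M y.
import scipy.sparse.linalg as spla

def solve_right_preconditioned(A, f, M_apply, tol=1e-8):
    n = A.shape[0]
    # GMRES sees only the preconditioned operator v -> A @ M(v).
    AM = spla.LinearOperator((n, n), matvec=lambda v: A @ M_apply(v))
    y, info = spla.gmres(AM, f, rtol=tol)  # keyword is `tol` in older SciPy
    return M_apply(y), info                # u = M y solves A u = f
```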

A common way of dealing with the large number of degrees of freedom in a fine mesh is to break the problem down into a number of more manageable sub-problems. This amounts to the technique of domain decomposition; see, e.g., [11]. We can then incorporate preconditioners that work on the subdomains into the general iterative framework.

The additive Schwarz preconditioner [11] and its restricted variant (RAS) [3] can be written in the form

$$\displaystyle{M =\sum _{ i=1}^{t}\tilde{R}_{ i}^{T}A_{ i}^{-1}R_{ i},}$$

where t is usually the number of subdomains, \(R_{i}\) is a restriction operator, \(\tilde{R}_{i}^{T}\) is a prolongation operator, and \(A_{i} = R_{i}AR_{i}^{T}\) is the restriction of A onto the ith subdomain.

A possible generalization would be to use a weighted additive or restricted additive Schwarz preconditioner, say of the form

$$\displaystyle{M^{(k)} =\sum _{ i=1}^{t}\alpha _{ i}^{(k)}\tilde{R}_{ i}^{T}A_{ i}^{-1}R_{ i},}$$

where the weights \(\alpha _{i}^{(k)}\) are chosen at the kth iteration of GMRES so as to minimize the preconditioned residual; cf. [1]. What we propose in this paper is to go a step further and implicitly find, at each iteration, both the current weights and all the weights of the previous iterations, so as to minimize the residual at the current step. A sketch of the action of such a weighted preconditioner is given below.
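The following short Python sketch (ours, not the paper's code) illustrates how a weighted additive Schwarz preconditioner acts on a residual vector, in the non-restricted case \(\tilde{R}_{i} = R_{i}\); the helper names `subdomain_solves` (actions of \(A_{i}^{-1}\)) and `index_sets` (node sets defining each \(R_{i}\)) are hypothetical.

```python
import numpy as np

def weighted_as_apply(r, subdomain_solves, index_sets, weights):
    # Apply M = sum_i alpha_i * Rt_i A_i^{-1} R_i to the vector r.
    # Restriction by R_i and prolongation by Rt_i reduce to index
    # selection and scatter-add over the nodes of subdomain i.
    z = np.zeros_like(r)
    for solve_i, idx, alpha in zip(subdomain_solves, index_sets, weights):
        z[idx] += alpha * solve_i(r[idx])  # restrict, solve, prolong, weight
    return z
```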

Incorporating weights that change from one iteration to the next is a significant departure: we can no longer speak of a standard iterative method with a single preconditioner. Instead, the proposed strategy fits into the MPGMRES paradigm the authors recently described in [5], where more than one preconditioner may be applied simultaneously. Our main goal in this paper is to show that this methodology is particularly effective in the domain decomposition setting, since we can associate each subdomain with a specific, unique preconditioner.

An outline of the remainder of this paper follows. In Sect. 2 we briefly describe additive and restricted additive Schwarz preconditioning. In Sect. 3 we describe the MPGMRES algorithm, address the question of its computational cost, and characterize the generalized Krylov subspace and its unique features in the domain decomposition setting. In Sect. 4 we provide some details on numerical experiments. Finally, in Sect. 5 we make some concluding remarks.

2 Additive Schwarz Preconditioning

Suppose we divide the domain Ω containing n nodes into t subdomains \(\varOmega _{1},\ldots,\varOmega _{t}\), which overlap by bands of width δ nodes. Suppose each subdomain consists of \(m_{i} \ll n\) nodes, whose indices we collect in the set \(I_{i}\). We can define a prolongation matrix \(R_{i,\delta }^{T} \in \mathbb{R}^{n\times m_{i}}\) which extends vectors \(\mathbf{u}^{(i)} \in \mathbb{R}^{m_{i}}\) to \(\mathbb{R}^{n}\) by

$$\displaystyle{(R_{i,\delta }^{T}\mathbf{u}^{(i)})_{ k} = \left \{\begin{array}{ll} \ (\mathbf{u}^{(i)})_{ k}\quad &\mathrm{if}\ k \in I_{i} \\ \ 0\quad &\mathrm{otherwise.} \end{array} \right.}$$

The transpose of this matrix defines a restriction operator \(R_{i,\delta }\) which restricts vectors in \(\mathbb{R}^{n}\) to the subdomain \(\varOmega _{i}\). The restriction of the discretized PDE, A, to the ith subdomain is given by \(A_{i} = R_{i,\delta }AR_{i,\delta }^{T}.\)

We can now define the additive Schwarz preconditioner as

$$\displaystyle{ M:=\sum _{ i=1}^{t}R_{ i,\delta }^{T}A_{ i}^{-1}R_{ i,\delta } =\sum _{ i=1}^{t}M_{ i}, }$$
(2)

where \(M_{i}:= R_{i,\delta }^{T}(R_{i,\delta }AR_{i,\delta }^{T})^{-1}R_{i,\delta }\). Note that, by the definition of \(R_{i,\delta }^{T}\), there exists some permutation \(\varPi _{i}\) such that, for all \(\mathbf{x}\),

$$\displaystyle{\varPi _{i}M_{i}\mathbf{x} = \left (\times \cdots \times 0\cdots \cdots 0\right )^{T},}$$

i.e., the vector resulting from multiplication by \(M_{i}\) (regardless of the permutation) will be sparse.
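The following Python/SciPy sketch assembles these ingredients: restriction by \(R_{i,\delta }\) is simple index selection, each \(A_{i}\) is factored once, and the returned callables apply \(M_{i}\). By construction their output is nonzero only on the nodes of \(I_{i}\), matching the sparsity observation above. The function and variable names are our own illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_subdomain_preconditioners(A, index_sets):
    """Return callables applying M_i = Rt_i (R_i A Rt_i)^{-1} R_i, where
    index_sets[i] lists the nodes of the (overlapping) subdomain i."""
    A = sp.csr_matrix(A)
    precs = []
    for idx in index_sets:
        Ai = sp.csc_matrix(A[idx, :][:, idx])  # A_i = R_i A Rt_i: row/col selection
        lu = spla.splu(Ai)                     # factor A_i once, reuse at every apply
        def Mi(x, idx=idx, lu=lu):             # bind loop variables
            z = np.zeros_like(x, dtype=float)
            z[idx] = lu.solve(x[idx])          # nonzero only on subdomain i's nodes
            return z
        precs.append(Mi)
    return precs
```

The additive Schwarz preconditioner (2) is then applied as \(M\mathbf{x} = \sum_{i} M_{i}\mathbf{x}\), summing over the returned callables.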

We can also define a restricted additive Schwarz (RAS) preconditioner [3] by considering the prolongation \(R_{i,0}^{T}\) instead of \(R_{i,\delta }^{T}\) in (2).

3 The MPGMRES Algorithm for Domain Decomposition Problems

MPGMRES [5] is a minimal residual algorithm for solving a linear system of equations which allows the user to apply more than one preconditioner simultaneously (see also [2] for a multipreconditioned version of the conjugate gradient method). At each step, new search directions are added to the search space, corresponding to \(AM_{i}\mathbf{v}\) for each \(i = 1,\ldots,t\) and for each basis vector v of the current search space. The multipreconditioned search directions are all combined into a generalized Krylov subspace, and the minimization procedure requires solving a linear least-squares problem. As opposed to standard GMRES, here the subspace grows quickly due to the presence of multiple search directions, and the projection can be expressed in terms of a block upper Hessenberg matrix; see Fig. 1. It has been shown in [5] that a so-called selective MPGMRES (sMPGMRES) algorithm, which chooses a subset of t search directions and hence keeps the size of the search space growing only linearly, can be an effective method. MPGMRES (in both complete and selective forms) is given as Algorithm 1.

Fig. 1 Schematic of Arnoldi decompositions in (a) complete and (b) selective MPGMRES

Algorithm 1 MPGMRES

      Choose \(\mathbf{u}_{0}\), \(\mathbf{r}_{0} = \mathbf{f} -\mathcal{A}\mathbf{u}_{0}\)

      \(\beta =\| \mathbf{r}_{0}\|\), \(\mathbf{v}_{1} = \mathbf{r}_{0}/\beta\)

      \(Z_{1} = [\mathcal{M}_{1}\mathbf{v}_{1}\cdots \mathcal{M}_{t}\mathbf{v}_{1}]\)

      for k = 1, 2, … until convergence do

          \(W = \mathcal{A}Z_{k}\)

          for j = 1, …, k do

              \(H_{j,k} = (V _{j})^{T}W\)

              \(W = W - V _{j}H_{j,k}\)

          end for

        \(W = V _{k+1}H_{k+1,k}\) (skinny QR factorization)

        \(\mathbf{y}_{k} =\mathrm{ argmin}\|\beta \mathbf{e}_{1} -\tilde{ H}_{k}\mathbf{y}\|_{2}\)

        \(\mathbf{u}_{k} = \mathbf{u}_{0} + [Z_{1}\cdots Z_{k}]\mathbf{y}_{k}\)

        \(Z_{k+1} = \left \{\begin{array}{ll} [\mathcal{M}_{1}V_{k+1}\cdots \mathcal{M}_{t}V_{k+1}] &\mathrm{for\ complete\ MPGMRES} \\ [\mathcal{M}_{1}V_{k+1}\mathbf{1}\cdots \mathcal{M}_{t}V_{k+1}\mathbf{1}]&\mathrm{for\ selective\ MPGMRES}\end{array} \right.\)

    end for
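To fix ideas, here is a compact Python/NumPy sketch of Algorithm 1 in its selective form, assuming \(\mathbf{u}_{0} = \mathbf{0}\), dense storage, and no deflation of linearly dependent directions (which a robust implementation must handle); the block Gram-Schmidt step is written as a single matrix product rather than the loop over j. This is our illustration, not the MATLAB or HSL code referred to later.

```python
import numpy as np

def smpgmres(A, f, precs, maxit=50, tol=1e-8):
    """Selective MPGMRES sketch: `precs` is a list of callables applying
    M_1, ..., M_t; returns an approximate solution of A u = f (u_0 = 0)."""
    t = len(precs)
    beta = np.linalg.norm(f)
    Q = f[:, None] / beta                      # orthonormal basis [V_1 V_2 ...]
    H = np.zeros((1, 0))                       # block upper Hessenberg matrix
    Zk = np.column_stack([M(f / beta) for M in precs])  # Z_1 = [M_1 v_1 ... M_t v_1]
    Zs = []
    for k in range(maxit):
        Zs.append(Zk)
        W = A @ Zk                             # t new candidate directions
        Hk = Q.T @ W                           # all inner products of this step
        W = W - Q @ Hk                         # block Gram-Schmidt step
        Vnew, Hnew = np.linalg.qr(W)           # skinny QR: W = V_{k+1} H_{k+1,k}
        H = np.block([[H, Hk],
                      [np.zeros((t, H.shape[1])), Hnew]])
        Q = np.column_stack([Q, Vnew])
        rhs = np.zeros(H.shape[0]); rhs[0] = beta      # beta * e_1
        y, *_ = np.linalg.lstsq(H, rhs, rcond=None)    # small least-squares solve
        if np.linalg.norm(H @ y - rhs) < tol * beta:   # residual norm test
            break
        v_sum = Vnew @ np.ones(t)              # selective: apply each M_i to V_{k+1} 1
        Zk = np.column_stack([M(v_sum) for M in precs])
    return np.column_stack(Zs) @ y             # u = [Z_1 ... Z_k] y
```

Combined with the `make_subdomain_preconditioners` sketch of Sect. 2, `smpgmres(A, f, precs)` realizes the additive Schwarz multipreconditioning discussed next.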

3.1 Computational Work

In the selective algorithm we need t matrix-vector products and t preconditioner solves per iteration, as opposed to one of each in the standard preconditioned GMRES algorithm. The other main source of work is the inner products. Note that every entry in the Hessenberg matrix \(\tilde{H}_{k}\) is the result of an inner product, and these are the only inner products in the algorithm. MPGMRES therefore needs \((2k - 1)\frac{t^{2}} {2} + \frac{3} {2}t\) inner products at the kth step [5, Table 4.1].
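For example, with \(t = 4\) subdomains the \(k = 3\) step requires \((2 \cdot 3 - 1)\frac{4^{2}}{2} + \frac{3}{2} \cdot 4 = 46\) inner products.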

Significantly, in the domain decomposition setting, due to the nature of the standard additive Schwarz preconditioner, the preconditioning step costs exactly the same whether one uses selective MPGMRES or standard preconditioned GMRES. Moreover, since the vectors obtained by applying the preconditioners are sparse, the cost of the matrix-vector products is also of the same order as in the standard GMRES algorithm; the only extra expense comes from the overlapping nodes. Indeed, if we use RAS, the cost of a matrix-vector product is identical here too. While we studied RAS in the context of MPGMRES in [5], in the rest of this paper we restrict our comments and experiments to additive Schwarz. The extra cost in the MPGMRES approach therefore lies entirely with the inner products. The vectors involved are, in general, dense, as the sparsity of W is lost in the modified Gram-Schmidt step (the inner loop of Algorithm 1).

3.2 The Subspace in Complete MPGMRES

Recall that (complete) MPGMRES minimizes over the multi-Krylov subspace

$$\displaystyle{\mathcal{K}_{M_{1},\ldots,M_{t}}^{k}(A,\mathbf{r}_{ 0}),}$$

where

$$\displaystyle\begin{array}{rcl} \mathcal{K}_{M_{1},\ldots,M_{t}}^{1}(A,\mathbf{r}_{ 0})& =& \mathrm{span}\{M_{1}\mathbf{r}_{0},\ldots,M_{t}\mathbf{r}_{0}\}, {}\\ \mathcal{K}_{M_{1},\ldots,M_{t}}^{2}(A,\mathbf{r}_{ 0})& =& \mathrm{span}\{M_{1}\mathbf{r}_{0},\ldots,M_{t}\mathbf{r}_{0},M_{1}AM_{1}\mathbf{r}_{0},\ldots,M_{1}AM_{t}\mathbf{r}_{0},\ldots {}\\ & & \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ldots,M_{t}AM_{1}\mathbf{r}_{0},\ldots,M_{t}AM_{t}\mathbf{r}_{0}\}, {}\\ \end{array}$$

etc. Usually the size of this space grows exponentially with each iteration. However, in an additive Schwarz context the situation is not quite so dire, as we see below.

First, note that each preconditioned matrix is a projection, since

$$\displaystyle\begin{array}{rcl} M_{i}AM_{i}& =& R_{i,\delta }^{T}(R_{ i,\delta }AR_{i,\delta }^{T})^{-1}R_{ i,\delta }AR_{i,\delta }^{T}(R_{ i,\delta }AR_{i,\delta }^{T})^{-1}R_{ i,\delta } = M_{i}. {}\\ \end{array}$$

Hence applying M i to AM i does nothing to enrich the space.
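This identity is easy to verify numerically; the following self-contained check (our illustration, using a random well-conditioned test matrix) confirms \(M_{i}AM_{i} = M_{i}\) for an arbitrary index set.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 12, 5
A = rng.standard_normal((n, n)) + n * np.eye(n)    # generic invertible test matrix
idx = rng.choice(n, size=m, replace=False)         # index set I_i of one subdomain

R = np.zeros((m, n)); R[np.arange(m), idx] = 1.0   # restriction R_{i,delta}
Mi = R.T @ np.linalg.solve(R @ A @ R.T, R)         # M_i = Rt (R A Rt)^{-1} R

assert np.allclose(Mi @ A @ Mi, Mi)                # M_i A M_i = M_i holds
```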

Next, note that

$$\displaystyle{M_{i}AM_{j} = R_{i,\delta }^{T}(R_{ i,\delta }AR_{i,\delta }^{T})^{-1}R_{ i,\delta }AR_{j,\delta }^{T}(R_{ j,\delta }AR_{j,\delta }^{T})^{-1}R_{ j,\delta }.}$$

In the middle of this expression is the cross-term \(R_{i,\delta }AR_{j,\delta }^{T}\). Now note that \(R_{i,\delta }AR_{j,\delta }^{T} = 0\) whenever \(I_{i} \cap I_{j} = \emptyset \). Provided the overlap δ is not so large that non-neighboring subdomains intersect, this implies that only the contributions from subdomains that touch each other add anything to the multi-Krylov subspace. For each subdomain, the number of touching neighbors is at most the number of edge and corner neighbors in 2D (a maximum of 8 for a tensor product-based grid), plus additionally the face neighbors in 3D (a maximum of 26 in total for a tensor product-based grid). Altogether, this means that

$$\displaystyle{\mathit{dim}(\mathcal{K}_{M_{1},\ldots,M_{t}}^{k}(A,\mathbf{r}_{ 0})) = (kc + 1)t,}$$

where c is a constant independent of k, t. Therefore, even in the complete MPGMRES case, we only have linear growth in the search space.
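For instance, with \(t = 256\) subdomains arranged in a regular \(16 \times 16\) array in 2D, we have \(c \leq 8\), so after k iterations the dimension of the search space is at most \((8k + 1) \cdot 256\): linear in k, in contrast to the exponential growth of the general case.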

4 Numerical Experiments

If we split the domain into a small number of subdomains, i.e., a high proportion of the subdomains share an edge, then there may not be much difference between the spaces over which the selective and complete algorithms minimize.

For example, consider the special case where we split the domain Ω into two subdomains, \(\varOmega _{1}\) and \(\varOmega _{2}\), such that \(\varOmega _{1} \cup \varOmega _{2} = \varOmega\). Then it can be shown [5, Sect. 5.2.1] that, provided the subdomain solves are exact, the spaces over which we minimize in selective and complete MPGMRES are identical.

Figure 2 shows the convergence curves for solving the advection-diffusion equation

$$\displaystyle\begin{array}{rcl} -\nabla ^{2}u +\omega \cdot \nabla u = f\qquad \mathrm{in}\ \varOmega & &{}\end{array}$$
(3)
$$\displaystyle\begin{array}{rcl} u = 0\qquad \mathrm{on}\ \partial \varOmega,& &{}\end{array}$$
(4)

where Ω denotes the unit square and \(\omega = 10\left (\cos ( \frac{\pi }{3}),\sin ( \frac{\pi }{3})\right )^{T}.\) This is discretized using finite differences with a uniform mesh size h, and the right-hand side is taken to be the vector of ones. Thus, in 2D, \(n = 1/h^{2}\) and, in 3D, \(n = 1/h^{3}\).
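For concreteness, one possible finite-difference discretization of (3)–(4) in 2D is sketched below; the paper does not specify the exact scheme, so the use of centered differences here is an assumption on our part.

```python
import numpy as np
import scipy.sparse as sp

def advection_diffusion_2d(h):
    """One possible discretization of (3)-(4) on the unit square: centered
    differences on a uniform mesh, homogeneous Dirichlet conditions."""
    m = int(round(1.0 / h)) - 1                    # interior points per direction
    w = 10 * np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])
    I = sp.identity(m)
    D2 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(m, m)) / h**2   # -d^2/dx^2
    D1 = sp.diags([-1, 1], [-1, 1], shape=(m, m)) / (2 * h)       # d/dx (centered)
    A = (sp.kron(I, D2) + sp.kron(D2, I)           # -Laplacian
         + w[0] * sp.kron(I, D1) + w[1] * sp.kron(D1, I))         # w . grad
    f = np.ones(A.shape[0])                        # right-hand side of ones
    return sp.csr_matrix(A), f
```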

Fig. 2 Convergence curves for solving the advection-diffusion equation (3)–(4) with two subdomains in 2D and 3D. The iteration number is plotted along the x-axis, and \(\|\mathbf{r}_{k}\|_{2}\) is plotted along the y-axis. (a) 2D, \(h =\{ 2^{-3},2^{-4},2^{-5},2^{-6},2^{-7},2^{-8}\}\). (b) 3D, \(h =\{ 2^{-2},2^{-3},2^{-4},2^{-5}\}\)

As we see in Fig. 2, the iteration counts are significantly better using a multipreconditioned approach. Even though we only have a serial MATLAB code, this also translates into significantly better timings, as seen in Table 1; we anticipate that the difference between the two approaches would be even more striking in a parallel implementation.

Table 1 Timings for sMPGMRES and GMRES with two subdomains in 2D (left) and 3D (right)

For a large number of subdomains, the work involved in the inner products and vector updates becomes significant, even though the work in actually applying the preconditioners is essentially the same as for the usual additive Schwarz method. Convergence curves for the problem (3)–(4) are given in Fig. 3.

Fig. 3 Convergence curves for multiple subdomains in 2D (\(h = 2^{-6}\)). The iteration number is plotted along the x-axis, and \(\|\mathbf{r}_{k}\|_{2}\) is plotted along the y-axis

Although the iteration counts are impressive for a large number of subdomains (e.g., 101 iterations for GMRES with an additive Schwarz preconditioner are reduced to 17 iterations with selective MPGMRES for 256 subdomains), the timings in this case are not yet competitive: for the case with 256 subdomains, GMRES converges in 2.5 s whereas sMPGMRES takes 9 s. This is because we are using a proof-of-concept (serial) MATLAB code. Recall that the only extra work between the methods is in calculating the inner products and the subsequent vector update in the Gram-Schmidt process. Due to the block nature of the proposed method, much of this extra work could be distributed across any available processors. We envisage that a state-of-the-art implementation would yield significant computational savings, manifested in a much reduced running time. This would be especially true for very large scale problems, where the cost of the subdomain solves would dominate the cost of each iteration. A Fortran 95 implementation of MPGMRES, HSL_MI29, will be included in the 2013 release of the HSL subroutine library.

Recall from Algorithm 1 that in the implementation of sMPGMRES reported here we apply each preconditioner to the sum of the columns of V k+1. This choice is by no means unique, and there are many other possible selection strategies [5, Sect. 2.3]. The approach employed here seems to perform well on a wide range of problems, but it is a somewhat arbitrary choice. There may be situations where another selection strategy would be superior; this is one avenue for future research.

5 Conclusions

We have presented an algorithm that applies additive Schwarz with variable weights. The approach is incorporated as a set of multiple preconditioners into MPGMRES. Domain decomposition has a few unique features that make our approach particularly attractive. First, the preconditioning step entails the same cost whether one uses selective MPGMRES or standard preconditioned GMRES, and the cost of the matrix-vector products is also of the same order as in the standard GMRES algorithm. Second, because there is a very low degree of overlap between nodes in the different subdomains, the growth of the search space for complete MPGMRES is only linear, i.e., very modest. This is in contrast to other situations, where the search space for complete MPGMRES grows exponentially and we settle for a selective algorithm. For these reasons we believe that the combination of domain decomposition preconditioners and the MPGMRES framework is an effective method for the numerical solution of linear systems arising from PDEs.