
The Multipreconditioned Conjugate Gradient (MPCG) algorithm was introduced by Bridson and Greif (2006). It is an iterative linear solver, adapted from the Preconditioned Conjugate Gradient (PCG) algorithm (Saad, 2003), which can be used in cases where several preconditioners are available or the usual preconditioner is a sum of operators. Bridson and Greif (2006) already pointed out that Domain Decomposition algorithms are ideal candidates to benefit from MPCG. This was further studied in Greif et al. (2014), which considers Additive Schwarz preconditioners in the Multipreconditioned GMRES (MPGMRES) setting (Greif et al., 2016). Already in 1997, Rixen had proposed in his thesis (Rixen, 1997) the Simultaneous FETI algorithm, which turns out to be MPCG applied to FETI. The algorithm is studied more extensively in Gosselet et al. (2015), where its interpretation as an MPCG solver is made explicit.

The idea behind MPCG is that if, at a given iteration, N preconditioners are applied to the residual, then the space spanned by all of these directions is a better minimization space than the one spanned by their sum. This can significantly reduce the number of iterations needed to achieve convergence, as we will observe in Sect. 3, but comes at the cost of losing the short recurrence property of CG. This means that at each iteration the new search directions must be orthogonalized against all previous ones. For this reason, it was proposed in Spillane (2016) to make MPCG into an Adaptive MPCG (AMPCG) algorithm where, at a given iteration, only the contributions that will accelerate convergence are kept, and all others are added into a global contribution (as they would be in classical PCG). This works very well for FETI and BDD, but the theory in that article does not apply to Additive Schwarz. Indeed, the assumption is made that the smallest eigenvalue of the (globally) preconditioned operator is known, and the test (called the τ-test) which chooses at each iteration the contributions to be kept relies heavily on it. More precisely, the quantity examined by the τ-test can be related to a Rayleigh quotient, and the vectors selected to form the next minimization space correspond to high frequencies of the (globally) preconditioned operator. These are exactly the ones that are known to slow down convergence of BDD and FETI. Moreover, they are generated within the first few iterations of PCG (van der Sluis and van der Vorst, 1986). These two reasons make BDD and FETI ideal for the AMPCG framework.

The question posed by the present work is whether an AMPCG algorithm can be developed for Additive Schwarz type preconditioners. The goal is to design an adaptive algorithm that is robust at a minimal cost. One great feature of Additive Schwarz is that it is algebraic (all the components in the preconditioner can be computed from the knowledge of the matrix A alone), and we will aim to preserve this property. The algorithms will be presented in an abstract framework. Since the short recurrence property is lost anyway in the MPCG setting, we will consider in our numerical experiments the more efficient (Efstathiou and Gander, 2003) Restricted Additive Schwarz (RAS) preconditioner (Cai and Sarkis, 1999) instead of its symmetric counterpart, the Additive Schwarz preconditioner [see Toselli and Widlund (2005)]. RAS is a non-symmetric preconditioner but, provided that the full recurrence is used, conjugate gradient based algorithms still apply and retain nice properties (in particular the global minimization property). We will detail this in the next section, where we briefly introduce the problem at hand, the Restricted Additive Schwarz preconditioner, and the MPCG solver. Then, in Sect. 2, we propose two ways to make MPCG adaptive. Finally, Sect. 3 presents some numerical experiments on matrices arising from the finite element discretization of two-dimensional elasticity problems. Three types of difficulties will be considered: heterogeneous coefficients, automatic (irregular) partitions into subdomains, and almost incompressible behaviour.

These are sources of notoriously hard problems that have been, and still are, at the heart of much effort in the domain decomposition community, in particular by means of choosing adequate coarse spaces (see Sarkis (2002), Nataf et al. (2011), Spillane et al. (2014), Efendiev et al. (2012), Brezina et al. (1999), Sousedík et al. (2013), Spillane and Rixen (2013), Haferssas et al. (2015), Klawonn et al. (2015), Cai et al. (2015), Dohrmann and Widlund (2010), Klawonn et al. (2016) and many more).

1 Preliminaries

Throughout this work, we consider the problem of solving the linear system

$$\displaystyle{\mathbf{A}\mathbf{x}_{{\ast}} = \mathbf{b},}$$

where \(\mathbf{A} \in \mathbb{R}^{n\times n}\) is a sparse symmetric positive definite matrix, \(\mathbf{b} \in \mathbb{R}^{n}\) is a given right-hand side, and \(\mathbf{x}_{{\ast}}\in \mathbb{R}^{n}\) is the unknown. We consider Conjugate Gradient type solvers preconditioned by the Restricted Additive Schwarz (RAS) preconditioner. To construct the RAS preconditioner, a non-overlapping partition of the degrees of freedom into N subsets, or subdomains, must first be chosen, and then overlap must be added to each subset to obtain an overlapping partition. Denoting, for each \(s = 1,\ldots,N\), by \(\widetilde{\mathbf{R}}^{s}\) and \(\mathbf{R}^{s}\) the restriction operators from \([\![1,n]\!]\) onto the s-th non-overlapping and overlapping subdomains, respectively, the preconditioner is defined as:

$$\displaystyle{\mathbf{H} =\sum \limits _{s=1}^{N}\mathbf{H}^{s}\quad \text{with}\quad \mathbf{H}^{s} =\widetilde{\mathbf{R}}^{s\top }(\mathbf{A}^{s})^{-1}\mathbf{R}^{s}\quad \text{and}\quad \mathbf{A}^{s} = \mathbf{R}^{s}\mathbf{A}\mathbf{R}^{s\top }.}$$
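
For concreteness, here is a minimal Python sketch of how the local solves \(\mathbf{H}^{s}\) can be set up and applied algebraically, i.e., from the matrix and two lists of index sets only. The function names and the data layout (`overlap_dofs`, `interior_dofs`, `pieces`) are illustrative choices, not part of the original formulation.

```python
import numpy as np
import scipy.sparse.linalg as spla

def make_ras_pieces(A, overlap_dofs, interior_dofs):
    """Factorize each A^s = R^s A R^s^T once, so that
    H^s r = (R~^s)^T (A^s)^{-1} R^s r is cheap to apply later.

    overlap_dofs[s]  : sorted indices of the overlapping subdomain s (rows of R^s)
    interior_dofs[s] : sorted indices of the non-overlapping subdomain s (rows of R~^s)
    """
    pieces = []
    for ov, it in zip(overlap_dofs, interior_dofs):
        As = A.tocsr()[ov, :][:, ov].tocsc()   # A^s = R^s A R^s^T
        solve = spla.factorized(As)            # sparse LU, reused at every application
        loc = np.searchsorted(ov, it)          # positions of interior dofs inside ov
        pieces.append((ov, it, loc, solve))
    return pieces

def apply_Hs(piece, r):
    """Return H^s r as a dense vector of the same length as r."""
    ov, it, loc, solve = piece
    y = solve(r[ov])                           # (A^s)^{-1} (R^s r)
    out = np.zeros(len(r))
    out[it] = y[loc]                           # (R~^s)^T keeps only the interior rows
    return out
```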

Algorithm 1 defines the MPCG iterations. Each contribution \(\mathbf{H}^{s}\) to \(\mathbf{H}\) is treated separately. This corresponds to the non-adaptive algorithm, i.e., the condition in line 8 is never satisfied and N search directions are added to the minimization space at each iteration (namely the columns of \(\mathbf{Z}_{i+1}\)). We have denoted by \(\boldsymbol{\varDelta }_{i}^{\dag }\) the pseudo-inverse of \(\boldsymbol{\varDelta }_{i}\) to account for the fact that some search directions may be linearly dependent [see Gosselet et al. (2015), Spillane (2016)].

Although RAS is a non-symmetric preconditioner, the following properties hold:

  • \(\|\mathbf{x}_{{\ast}}-\mathbf{x}_{i}\|_{\mathbf{A}} =\mathop{ \mathrm{min}}\nolimits \left \{\|\mathbf{x}_{{\ast}}-\mathbf{x}\|_{\mathbf{A}};\,\mathbf{x} \in \mathbf{x}_{0} +\sum _{ j=0}^{i-1}\mathop{ \mathrm{range}}\nolimits (\mathbf{P}_{j})\right \}\),

  • \(\mathbf{P}_{j}^{\top }\mathbf{A}\mathbf{P}_{i} = \mathbf{0}\) (\(i\neq j\)), \(\mathbf{r}_{i}^{\top }\mathbf{P}_{j} = \mathbf{0}\) (i > j), and \(\mathbf{r}_{i}^{\top }\mathbf{H}\mathbf{r}_{j} = 0\) (i > j).

This can be proved easily by following similar proofs in Spillane (2016) and the textbook Saad (2003). The difference from the symmetric case is that the last two properties only hold for i > j, and not for every pair \(i\neq j\).

Algorithm 1: Adaptive Multipreconditioned Conjugate Gradient Algorithm for \(\mathbf{A}\mathbf{x} = \mathbf{b}\). Preconditioners: \(\left \{\mathbf{H}^{s}\right \}_{s=1,\ldots,N}\). Initial guess: \(\mathbf{x}_{0}\).
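
Since the pseudo-code itself is not reproduced here, the following condensed Python sketch of the non-adaptive loop may help fix ideas. It reuses the illustrative helpers from the previous sketch; the stopping test on the residual norm is a simplification (the experiments below monitor the A-norm of the error).

```python
def mpcg(A, pieces, b, x0, maxit=200, rtol=1e-7):
    """Non-adaptive MPCG: all N candidates H^s r are kept at every iteration."""
    x = x0.copy()
    r = b - A @ x
    history = []                                # (P_j, A P_j, pinv(Delta_j))
    r0 = np.linalg.norm(r)
    for i in range(maxit):
        # N candidate directions: column s is H^s r
        Z = np.column_stack([apply_Hs(p, r) for p in pieces])
        # A-orthogonalize against ALL previous blocks: short recurrence is lost
        for P, AP, Dinv in history:
            Z -= P @ (Dinv @ (AP.T @ Z))
        AZ = A @ Z
        Dinv = np.linalg.pinv(Z.T @ AZ)         # Delta_i^dagger: columns may be dependent
        alpha = Dinv @ (Z.T @ r)
        x += Z @ alpha                          # minimizes the A-norm of the error
        r -= AZ @ alpha
        history.append((Z, AZ, Dinv))
        if np.linalg.norm(r) <= rtol * r0:
            break
    return x
```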

Multipreconditioning significantly improves convergence, as has already been observed (Bridson and Greif, 2006; Gosselet et al., 2015; Greif et al., 2014; Spillane, 2016) and as will be illustrated in the numerical results section. The drawback is that a dense matrix \(\boldsymbol{\varDelta }_{i} \in \mathbb{R}^{N\times N}\) must be factorized at each iteration and that N search directions per iteration need to be stored. In the next section, we will try to remove these limitations by reducing the number of search directions at every iteration. We aim to do this without having too strong a negative impact on the convergence.

2 An Adaptive Algorithm

There is definitely a balance to be found between the number of iterations, the cost of each iteration, and the memory required for storage. Here, we do not claim to have achieved a perfect balance, but we introduce some ways to influence it. More precisely, we propose two methods for reducing the number of columns in \(\mathbf{Z}_{i+1}\) (in other words, for filling in line 9 of Algorithm 1). In Sect. 2.1, we propose a τ-test that measures the relevance of every candidate \(\mathbf{H}^{s}\mathbf{r}_{i+1}\) and only keeps the most relevant contributions. In Sect. 2.2, we propose to form m coarser subdomains (which are agglomerates of the initial N subdomains) and aggregate the N candidates into only m search directions. Note that there is a definite connection with multigrid studies, from which we have borrowed some vocabulary [see Vassilevski (2008), Chartier et al. (2003), Brandt et al. (2011) and many references therein].

2.1 Select Search Directions with a τ-Test

The τ-test in the original AMPCG publication (Spillane, 2016) is based on the assumption that the smallest eigenvalue of the globally preconditioned operator \(\mathbf{H}\mathbf{A}\) is known (Toselli and Widlund, 2005). This allows for an error estimate inspired by those in Axelsson and Kaporin (2001), and the choice of the τ-test is a direct consequence of it. Here, it is the largest eigenvalue that is known, and it is the presence of small eigenvalues that is responsible for slow convergence. Unfortunately, we have failed to produce an estimate similar to that in Spillane (2016) in this case. Note that there is no such estimate in Axelsson and Kaporin (2001) either, and we believe that this is inherent to the properties of the conjugate gradient algorithm.

The approach that we propose here to select local contributions is different. It is well known by now [see, e.g., Saad (2003)] that, at each iteration, the approximate solution returned by the conjugate gradient algorithm is the A-orthogonal projection of the exact solution \(\mathbf{x}_{{\ast}}\) onto the minimization space. Here, the property satisfied by the update at iteration i + 1 is

$$\displaystyle{\|\mathbf{x}_{{\ast}}-\mathbf{x}_{i+1}\|_{\mathbf{A}} =\mathop{ \mathrm{min}}\nolimits \left \{\|\mathbf{x}_{{\ast}}-\mathbf{x}\|_{\mathbf{A}};\,\mathbf{x} \in \mathbf{x}_{i} +\mathop{ \mathrm{range}}\nolimits (\mathbf{P}_{i})\right \},}$$

where \(\mathbf{P}_{i}\) forms a basis of \(\mathop{\mathrm{range}}\nolimits (\mathbf{Z}_{i})\) after orthogonalization against the previous search spaces (line 12 in Algorithm 1).

For this reason, the τ-test that we propose aims at evaluating, for each \(s = 1,\ldots,N\), the ratio between the squared norm of the error projected onto the global vector \(\mathbf{H}\mathbf{r}_{i+1}\) and the squared norm of the error projected onto the local candidate \(\mathbf{H}^{s}\mathbf{r}_{i+1}\). More precisely, we compute (with \(\langle \cdot,\cdot \rangle\) denoting the \(\ell^{2}\) inner product)

$$\displaystyle{ t_{i}^{s} = \frac{\langle \mathbf{r}_{i+1},\mathbf{H}\mathbf{r}_{i+1}\rangle ^{2}} {\langle \mathbf{H}\mathbf{r}_{i+1},\mathbf{A}\mathbf{H}\mathbf{r}_{i+1}\rangle } \times \frac{\langle \mathbf{H}^{s}\mathbf{r}_{i+1},\mathbf{A}\mathbf{H}^{s}\mathbf{r}_{i+1}\rangle } {\langle \mathbf{r}_{i+1},\mathbf{H}^{s}\mathbf{r}_{i+1}\rangle ^{2}}. \qquad (1) }$$

This is indeed the announced quantity, since the square of the A-norm of the A-orthogonal projection of \(\mathbf{x}_{{\ast}}-\mathbf{x}_{i}\) onto any vector \(\mathbf{v}\) is

$$\displaystyle{\|\mathbf{v}(\mathbf{v}^{\top }\mathbf{A}\mathbf{v})^{-1}\mathbf{v}^{\top }\mathop{\underbrace{\mathbf{A}(\mathbf{x}_{ {\ast}}-\mathbf{x}_{i})}}\limits _{=\mathbf{r}_{i}}\|_{\mathbf{A}}^{2} = \frac{\langle \mathbf{r}_{i},\mathbf{v}\rangle ^{2}} {\langle \mathbf{v},\mathbf{A}\mathbf{v}\rangle }.}$$
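
For completeness, this identity follows by expanding the square and using that \(\mathbf{v}^{\top }\mathbf{A}\mathbf{v}\) is a scalar:

$$\displaystyle{\|\mathbf{v}(\mathbf{v}^{\top }\mathbf{A}\mathbf{v})^{-1}\mathbf{v}^{\top }\mathbf{r}_{i}\|_{\mathbf{A}}^{2} = \frac{(\mathbf{v}^{\top }\mathbf{r}_{i})\,(\mathbf{v}^{\top }\mathbf{A}\mathbf{v})\,(\mathbf{v}^{\top }\mathbf{r}_{i})} {(\mathbf{v}^{\top }\mathbf{A}\mathbf{v})^{2}} = \frac{\langle \mathbf{r}_{i},\mathbf{v}\rangle ^{2}} {\langle \mathbf{v},\mathbf{A}\mathbf{v}\rangle }.}$$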

Then, given a threshold τ, the number of columns in \(\mathbf{Z}_{i+1}\) is reduced by eliminating all those for which \(t_{i}^{s} >\tau\). In order for the global preconditioned residual \(\sum \limits _{s=1}^{N}\mathbf{H}^{s}\mathbf{r}_{i+1}\) to be included in the search space (as is always the case in PCG), we add it to \(\mathbf{Z}_{i+1}\) as a separate column. This way we obtain a minimization space \(\mathop{\mathrm{range}}\nolimits (\mathbf{P}_{i})\) of dimension anywhere between 1 and N.

An important question is of course how to choose τ. Considering that \(t_{i}^{s}\) measures the (inverse of the) impact of one of the N contributions compared to the impact of the sum of all N contributions, it is quite natural to choose τ ≈ N. In the next section, we illustrate the behaviour of the adaptive algorithm with the τ-test for values of τ ranging between N∕10 and 10N, with satisfactory results.
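
A minimal sketch of this selection step, reusing the illustrative helpers from Sect. 1 (the guard against zero denominators is an implementation detail, not part of the method):

```python
def tau_select(A, pieces, r, tau):
    """Keep H^s r only when t_i^s <= tau; always append the global sum H r."""
    cands = [apply_Hs(p, r) for p in pieces]
    Hr = np.sum(cands, axis=0)                 # global preconditioned residual
    glob = (r @ Hr) ** 2 / (Hr @ (A @ Hr))     # squared A-norm of the global projection
    kept = []
    for z in cands:
        num, den = (r @ z) ** 2, z @ (A @ z)
        if num == 0.0 or den == 0.0:           # degenerate candidate: discard
            continue
        if glob / (num / den) <= tau:          # this is t_i^s from (1)
            kept.append(z)
    kept.append(Hr)                            # the PCG direction is always included
    return np.column_stack(kept)               # the new Z_{i+1}
```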

In order to determine whether or not \(t_{i}^{s} \leq \tau\) (i.e., to perform the τ-test), it is necessary to compute \(t_{i}^{s}\). Here, we will not discuss how to do this at the smallest cost, but it is of course an important consideration (it was discussed for the AMPCG algorithm applied to BDD in Spillane (2016)). One noteworthy observation is that if \(\mathbf{H}\) were either the Additive Schwarz (AS) or the Additive Schwarz with Harmonic overlap [ASH, Cai and Sarkis (1999)] preconditioner (i.e., \(\mathbf{H} =\sum _{s=1}^{N}\mathbf{R}^{s\top }(\mathbf{A}^{s})^{-1}\mathbf{R}^{s}\) or \(\mathbf{H} =\sum _{s=1}^{N}\mathbf{R}^{s\top }(\mathbf{A}^{s})^{-1}\widetilde{\mathbf{R}}^{s}\), respectively), then all terms involving \(\mathbf{H}^{s}\mathbf{A}\mathbf{H}^{s}\) would simplify since \((\mathbf{A}^{s})^{-1}\mathbf{R}^{s}\mathbf{A}\mathbf{R}^{s\top }(\mathbf{A}^{s})^{-1} = (\mathbf{A}^{s})^{-1}\) by the definition \(\mathbf{A}^{s} = \mathbf{R}^{s}\mathbf{A}\mathbf{R}^{s\top }\).

Another option is to prescribe a number m of vectors to be selected at each iteration, instead of a threshold τ, and to keep the m vectors with the smallest values of \(t_{i}^{s}\). Then, only the second factor in (1) would be required, as in the sketch below. We leave a more in-depth study of these questions for future work.
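
A sketch of this variant under the same assumptions; since the global factor in (1) is common to all candidates, ranking by the second factor alone is enough:

```python
def keep_m_best(A, pieces, r, m):
    """Keep the m candidates with the smallest t_i^s."""
    cands = [apply_Hs(p, r) for p in pieces]
    # second factor of (1); the guard makes degenerate candidates rank last
    second = [z @ (A @ z) / max((r @ z) ** 2, 1e-300) for z in cands]
    order = np.argsort(second)
    return np.column_stack([cands[s] for s in order[:m]])
```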

2.2 Aggregate Search Directions

Here, we propose a completely different, and much simpler, way of reducing the number of vectors in \(\mathbf{Z}_{i+1}\): choose a prescribed number m, with m ≤ N, of search directions per iteration, and a partition of \([\![1,N]\!]\) into m subsets. Then, the columns of \(\mathbf{Z}_{i+1}\) that correspond to the same subset are simply replaced by their sum, leaving m vectors. We refer to this as aggregation, since it amounts to assembling coarse domains from the original subdomains and computing coarse search directions as sums of the corresponding \(\mathbf{H}^{s}\mathbf{r}_{i+1}\). The question of how to choose m is of course important: it can be a fraction of N, or the maximal size of the dense matrix that the user is prepared to factorize. In the next section, we consider values ranging from N∕20 to N. A sketch of the aggregation step follows.
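
Here `groups` is a hypothetical partition of range(N) into m subsets; any partition works, and the choice in the comment is only an example:

```python
def aggregate(cands, groups):
    """Replace the candidate columns of each subset by their sum."""
    return np.column_stack([np.sum([cands[s] for s in g], axis=0) for g in groups])

# For instance, m coarse domains made of consecutive subdomains:
# groups = np.array_split(np.arange(N), m)
```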

3 Numerical Results with FreeFem++ (Hecht, 2013) and GNU Octave (Eaton et al., 2009)

In this section, we consider the linear elasticity equations posed in \(\varOmega = [0,1]^{2}\) with mixed boundary conditions. We search for \(\mathbf{u} = (u_{1},u_{2}) \in H^{1}(\varOmega )^{2}\) such that

$$\displaystyle{\left \{\begin{array}{rl} -\mbox{ div}(\boldsymbol{\sigma }(\mathbf{u})) = (0,0)^{\top },\quad &\text{in }\varOmega, \\ \mathbf{u} = (1/2(y(1 - y)),0)^{\top },\quad &\text{on }\{(x,y) \in \partial \varOmega: x = 0\}, \\ \mathbf{u} = (-1/2(y(1 - y)),0)^{\top },\quad &\text{on }\{(x,y) \in \partial \varOmega: x = 1\}, \\ \sigma (\mathbf{u}) \cdot \mathbf{n} = 0,\quad &\text{on the rest of }\partial \varOmega (\mathbf{n}: \text{ outward normal}).\end{array} \right.}$$

The stress tensor \(\boldsymbol{\sigma }(\mathbf{u})\) is defined by \(\sigma _{ij}(\mathbf{u}) = 2\mu \varepsilon _{ij}(\mathbf{u}) +\lambda \delta _{ij}\mathop{\mathrm{div}}\nolimits (\mathbf{u})\) for i, j = 1, 2, where \(\varepsilon _{ij}(\mathbf{u}) = \frac{1} {2}\left (\frac{\partial u_{i}} {\partial x_{j}} + \frac{\partial u_{j}} {\partial x_{i}} \right )\), \(\delta _{ij}\) is the Kronecker symbol, and the Lamé coefficients are functions of Young's modulus E and Poisson's ratio ν: \(\mu = \frac{E} {2(1+\nu )},\ \lambda = \frac{E\nu } {(1+\nu )(1-2\nu )}\). In all test cases, ν is uniform and equal either to 0.4 (compressible test case) or to 0.49999 (almost incompressible test case), while E varies between \(10^{6}\) and \(10^{12}\) in a pattern presented in Fig. 1 (left). The geometries of the solutions are also presented in this figure.
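
For reference, a trivial helper (not part of the original experiments) for the Lamé coefficients:

```python
def lame(E, nu):
    """Lamé coefficients from Young's modulus E and Poisson's ratio nu."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam

# lame(1e6, 0.49999): lam/mu = 2*nu/(1 - 2*nu) is about 5e4, i.e. near incompressibility
```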

Fig. 1 Test case setup (all three configurations are drawn to scale). Left: Young's modulus, \(E = 10^{6}\) with square inclusions of larger E, up to \(10^{12}\). Middle: solution for ν = 0.4. Right: solution for ν = 0.49999

The computational domain is discretized into a uniform mesh with mesh size h = 1∕60 and partitioned into N = 100 subdomains by the automatic graph partitioner METIS (Karypis and Kumar, 1998). One layer of overlap is added to each subdomain. In the compressible case, the system is discretized by piecewise second-order polynomial (\(\mathbb{P}_{2}\)) Lagrange finite elements. In the almost incompressible setting, it is known that the locking phenomenon occurs, rendering the solution unreliable. To remedy this, the problem is rewritten in a mixed formulation with an additional unknown \(p =\mathop{ \mathrm{div}}\nolimits (\mathbf{u})\), and then discretized. Although the \(\mathbb{P}_{2} - \mathbb{P}_{0}\) mixed finite element does not satisfy the discrete inf-sup condition, it is often used in practice, and we choose it here. Finally, the pressure unknowns are eliminated by static condensation.

In both cases the problem has 28798 degrees of freedom (once the degrees of freedom corresponding to Dirichlet boundary conditions have been eliminated). As an initial guess, we first compute a random vector \(\mathbf{v}\) and then scale it to form \(\mathbf{x}_{0} = \frac{\mathbf{b}^{\top }\mathbf{v}} {\|\mathbf{v}\|_{\mathbf{A}}^{2}}\,\mathbf{v}\), according to what is proposed in Strakoš and Tichý (2005). This guarantees that \(\|\mathbf{x}_{{\ast}}-\mathbf{x}_{0}\|_{\mathbf{A}} \leq \|\mathbf{x}_{{\ast}}\|_{\mathbf{A}}\): the initial error is at most as large as it would be with a zero initial guess.
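
A minimal sketch of this scaling (assuming A, b and the problem size n are available; note that \(\|\mathbf{v}\|_{\mathbf{A}}^{2} = \mathbf{v}^{\top }\mathbf{A}\mathbf{v}\)):

```python
rng = np.random.default_rng()
v = rng.standard_normal(n)
x0 = (b @ v) / (v @ (A @ v)) * v   # A-orthogonal projection of x_* onto span(v)
```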

In Table 1, we report the number of iterations needed to reduce the initial error \(\|\mathbf{x}_{{\ast}}-\mathbf{x}_{0}\|_{\mathbf{A}}\) by a factor \(10^{-7}\), and the size of the minimization space that was constructed to do this, which is \(\sum \limits _{i}\mathop{ \mathrm{rank}}\nolimits (\mathbf{P}_{i})\). Note that, although they are presented in the same table, we cannot compare the compressible and incompressible test cases as they are simply not the same problem. Figures 2, 3, 4 and 5 show in more detail the convergence behaviour of each method.

Fig. 2 Compressible test case: reducing the number of directions with the τ-test; error norm versus iteration count for different values of τ

Fig. 3 Incompressible test case: reducing the number of directions with the τ-test; error norm versus iteration count for different values of τ

Fig. 4 Compressible test case: reducing the number of directions by aggregating them into m vectors; error norm versus iteration count for different values of m

Fig. 5 Incompressible test case: reducing the number of directions by aggregating them into m vectors; error norm versus iteration count for different values of m

Table 1 Summary of all numerical results presented

The first point to be made is that the MPCG algorithm does an excellent job of reducing the number of iterations. This can be observed by looking at the data for m = 100 = N directions per iteration in Figs. 4 and 5. Compared to the classical PCG iterations (m = 1 direction per iteration), the iteration counts are reduced from 889 to 60 and from more than 999 to 56, respectively. Secondly, the adaptive strategies that we introduced seem to do their job, since they ensure fast convergence with smaller minimization spaces. In particular, all of these adaptive methods converged in fewer than 512 iterations, even in the incompressible case (for which the usual PCG still has a relative error of \(8 \cdot 10^{-4}\) after 999 iterations).

With the τ-test, the number of iterations is always reduced by a factor of at least 8 compared to PCG, even with the smallest threshold τ = 10 = N∕10. With τ = 10N, the number of iterations is almost the same as with the full MPCG. For these test cases, the choice τ = N advocated in Sect. 2 seems to be a good compromise.

With the aggregation procedure, convergence is achieved even when the coarsening is quite aggressive (5 vectors per iteration means that 20 local contributions have been added together to form each search direction). As expected, keeping more vectors per iteration yields significantly better results in terms of iteration count.

Based on these results alone, it is not possible to rank the two approaches; future work will focus on an optimized implementation and on decreasing the CPU time.

4 Conclusions and Future Work

In this work, we have implemented the MPCG algorithm (Bridson and Greif, 2006; Greif et al., 2014) for Restricted Additive Schwarz. We have observed very good convergence on test cases with known difficulties (heterogeneities and almost incompressible behaviour). This confirms that multipreconditioning is a valuable tool for improving robustness. The main focus of this article has been to propose an adaptive version of the algorithm so that, when possible, the cost of each iteration and the cost of storage can be reduced while maintaining fast convergence. To this end, we have introduced two methods to reduce the number of search directions at each iteration: one is based on the so-called τ-test, and the other on adding some local components together. Numerical results have confirmed that both approaches behave as expected.

One important feature of the algorithms proposed is that they are completely algebraic in that they can be applied to any symmetric, positive definite matrix A without any extra knowledge.

An optimized parallel implementation is the subject of ongoing work, in order to compare MPCG and the two AMPCG algorithms in terms of CPU time. Scalability must also be measured. The author is quite confident that the best AMPCG algorithm should be a combination of the two adaptive approaches. Additionally, there is no reason why the components that are added together in the aggregation procedure should not first be weighted by some optimized coefficients, turning the algorithm into a multilevel one.