1 Introduction and Description of the Method

The GDSW preconditioner is a two-level overlapping Schwarz preconditioner introduced in Dohrmann et al. (2008a), with a proven condition number bound for scalar elliptic and linear elasticity model problems in the general case of John domains. It is algebraic in the sense that it can be constructed from the assembled system matrix. However, compared to FETI-DP (see Toselli and Widlund 2005) or BDDC methods, the standard GDSW coarse space is relatively large, especially in three dimensions. In Dohrmann and Widlund (2010), a related hybrid preconditioner with a reduced coarse problem for three-dimensional elasticity was introduced; there, the coarse degrees of freedom (d.o.f.) corresponding to the faces are modified.

The GDSW preconditioner is a two-level additive overlapping Schwarz preconditioner with exact local solvers; cf. Toselli and Widlund (2005). It can be written as

$$\displaystyle{ M_{\mathrm{GDSW}}^{-1} = \varPhi \left(\varPhi^{T} A \varPhi\right)^{-1} \varPhi^{T} + \sum_{i=1}^{N} R_{i}^{T} \tilde{A}_{i}^{-1} R_{i}, }$$
(1)

cf. Dohrmann et al. (2008b). The matrix Φ is the essential ingredient of the GDSW preconditioner. It is composed of coarse space functions which are discrete harmonic extensions from the interface of the nonoverlapping subdomains to their interior degrees of freedom. The values on the interface are restrictions of the null space of the operator to the interface.
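To make the structure of (1) and the construction of Φ concrete, the following minimal C++ sketch uses Eigen with dense matrices and assumes a scalar (Laplace-type) problem, so that the null space consists only of the constant function. It is an illustration only: the actual implementation described below works with distributed sparse matrices in Trilinos, and all function names and index sets here are hypothetical.

```cpp
// Minimal dense sketch of the GDSW preconditioner (1), assuming Eigen
// and a scalar problem (null space = constants). Illustrative only.
#include <Eigen/Dense>
#include <vector>

using Mat = Eigen::MatrixXd;
using Vec = Eigen::VectorXd;

// Build the coarse basis Phi: take the restriction of the null space to
// the interface and extend it as a discrete harmonic function into the
// interior d.o.f. 'interiorDofs' and 'interfaceDofs' partition the
// global indices of the nonoverlapping subdomains.
Mat buildPhi(const Mat& A,
             const std::vector<int>& interiorDofs,
             const std::vector<int>& interfaceDofs,
             const Mat& nullspaceOnInterface) {  // one column per coarse function
  const int nI = static_cast<int>(interiorDofs.size());
  const int nG = static_cast<int>(interfaceDofs.size());
  Mat AII(nI, nI), AIG(nI, nG);
  for (int i = 0; i < nI; ++i) {
    for (int j = 0; j < nI; ++j) AII(i, j) = A(interiorDofs[i], interiorDofs[j]);
    for (int j = 0; j < nG; ++j) AIG(i, j) = A(interiorDofs[i], interfaceDofs[j]);
  }
  // Discrete harmonic extension: Phi_I = -A_II^{-1} A_IGamma Phi_Gamma.
  Mat rhs  = -(AIG * nullspaceOnInterface);
  Mat PhiI = AII.ldlt().solve(rhs);

  Mat Phi = Mat::Zero(A.rows(), nullspaceOnInterface.cols());
  for (int i = 0; i < nI; ++i) Phi.row(interiorDofs[i])  = PhiI.row(i);
  for (int j = 0; j < nG; ++j) Phi.row(interfaceDofs[j]) = nullspaceOnInterface.row(j);
  return Phi;
}

// Apply M^{-1}_GDSW to a residual r according to (1): coarse correction
// plus additive corrections on the overlapping subdomains
// (R_i selects the rows/columns listed in overlap[i]).
Vec applyGDSW(const Mat& A, const Mat& Phi,
              const std::vector<std::vector<int>>& overlap, const Vec& r) {
  // Coarse level: Phi (Phi^T A Phi)^{-1} Phi^T r.
  Mat A0 = Phi.transpose() * A * Phi;
  Vec c  = A0.ldlt().solve(Phi.transpose() * r);
  Vec z  = Phi * c;

  // First level: exact solves on the overlapping subdomain matrices.
  for (const auto& dofs : overlap) {
    const int n = static_cast<int>(dofs.size());
    Mat Ai(n, n);
    Vec ri(n);
    for (int i = 0; i < n; ++i) {
      ri(i) = r(dofs[i]);
      for (int j = 0; j < n; ++j) Ai(i, j) = A(dofs[i], dofs[j]);
    }
    Vec zi = Ai.ldlt().solve(ri);
    for (int i = 0; i < n; ++i) z(dofs[i]) += zi(i);
  }
  return z;
}
```

For linear elasticity, only the input changes: the interface values are then the restrictions of the rigid body modes (translations and, in the full coarse space, rotations), so Φ has correspondingly more columns.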

For \(\varOmega \subset \mathbb{R}^{2}\) decomposed into John domains, the condition number of the GDSW preconditioner is bounded by

$$\displaystyle{ \kappa\left(M_{\mathrm{GDSW}}^{-1} A\right) \leq C \left(1 + \frac{H}{\delta}\right)\left(1 + \log\left(\frac{H}{h}\right)\right)^{2}, }$$
(2)

cf. Dohrmann et al. (2008a) and Dohrmann et al. (2008b). Here, H is the size of a subdomain, h is the size of a finite element, and δ is the overlap.
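As a rough illustration (not part of the original analysis, and interpreting log as the natural logarithm): for the values used in the weak scaling tests below, H∕h = 100 and δ = 2h, the bound (2) gives, up to the constant C,

$$\displaystyle{ \kappa \leq C\left(1 + \frac{H}{2h}\right)\left(1 + \log\left(\frac{H}{h}\right)\right)^{2} = C\,(1 + 50)\,(1 + \log 100)^{2} \approx 1.6 \cdot 10^{3}\,C, }$$

independent of the number of subdomains; this is consistent with the bounded iteration counts observed in the weak scaling experiments in Sect. 3.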

Implementation Our parallel implementation of the GDSW preconditioner is based on Trilinos version 12.0; cf. Heroux et al. (2005). For the mesh partitioning, we use ParMETIS, cf. Karypis et al. (2011). The problems corresponding to the local level are solved using UMFPACK (version 5.3.0), cf. Davis and Duff (1997), and the coarse level is solved using MUMPS (version 4.10.0), cf. Amestoy et al. (2001), in parallel mode. For the finite element implementation, we use the library LifeV (version 3.8.8); see Formaggia et al. (2016).
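To give a concrete, purely serial flavor of one such exact local solve, the following sketch factors a small hard-coded sparse matrix with UMFPACK's C interface and solves one right-hand side; this corresponds to a single application of an \(\tilde{A}_{i}^{-1}\) in (1). The toy matrix and the direct use of the C API are illustrative assumptions and not taken from the actual Trilinos-based implementation.

```cpp
// Minimal serial example: one exact subdomain solve with UMFPACK.
// The 3x3 matrix is stored in compressed sparse column (CSC) format.
#include <umfpack.h>
#include <cstdio>

int main() {
  const int n = 3;
  int    Ap[] = {0, 2, 5, 7};                             // column pointers
  int    Ai[] = {0, 1, 0, 1, 2, 1, 2};                    // row indices
  double Ax[] = {2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0};  // values
  double b[]  = {1.0, 1.0, 1.0};
  double x[n];

  void *Symbolic = nullptr, *Numeric = nullptr;
  umfpack_di_symbolic(n, n, Ap, Ai, Ax, &Symbolic, nullptr, nullptr);
  umfpack_di_numeric(Ap, Ai, Ax, Symbolic, &Numeric, nullptr, nullptr);
  umfpack_di_free_symbolic(&Symbolic);

  // Solve A x = b; in a preconditioner, the factorization is computed
  // once and reused in every application during the Krylov iteration.
  umfpack_di_solve(UMFPACK_A, Ap, Ai, Ax, x, b, Numeric, nullptr, nullptr);
  umfpack_di_free_numeric(&Numeric);

  for (int i = 0; i < n; ++i) std::printf("x[%d] = %g\n", i, x[i]);
  return 0;
}
```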

On the JUQUEEN BG/Q supercomputer, we use the clang compiler 4.7.2 and ESSL 5.1 when compiling Trilinos and the GDSW preconditioner implementation. On the Cray XT6m at Universität Duisburg-Essen, we use the Intel compiler 11.1 and the Cray Scientific Library (libsci) 10.4.4.

2 Model Problems

We consider model problems in two and three dimensions, i.e., \(\varOmega = [0,1]^{2}\) or \(\varOmega = [0,1]^{3}\). The domain is decomposed either in a structured way, i.e., into squares or cubes, or in an unstructured way, using ParMETIS.

Laplacian in 2D The first model problem is: find \(u \in H^{1}\left(\varOmega\right)\) such that

$$\displaystyle{ \begin{array}{rcl} -\varDelta u& =&1\qquad \text{in }\varOmega, \\ u& =&0\qquad \text{on }\partial \varOmega.\end{array} }$$
(3)
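The P2 discretization used in Sect. 3 is based on the standard weak formulation of (3): find \(u \in H_{0}^{1}(\varOmega)\) such that

$$\displaystyle{ \int_{\varOmega} \nabla u \cdot \nabla v\,dx = \int_{\varOmega} v\,dx \qquad \text{for all } v \in H_{0}^{1}(\varOmega). }$$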

Linear Elasticity in 2D and 3D The second model problem is: find \(\boldsymbol{u} \in (H^{1}\left(\varOmega\right))^{d}\), \(d = 2,3\), such that

$$\displaystyle{ \begin{array}{rcl} \mathop{\mathrm{div}}\boldsymbol{\sigma} & = & \boldsymbol{f}\qquad \text{in }\varOmega, \\ \boldsymbol{u} & = & 0\qquad \text{on }\partial\varOmega_{D} = \partial\varOmega \cap \left\{x = 0\right\}, \end{array} }$$
(4)

where \(\boldsymbol{\sigma} = 2\mu\boldsymbol{\varepsilon} + \lambda\,\mathrm{trace}(\boldsymbol{\varepsilon})I\) is the stress tensor and \(\boldsymbol{\varepsilon} = \frac{1}{2}(\boldsymbol{\nabla}\boldsymbol{u} + (\boldsymbol{\nabla}\boldsymbol{u})^{T})\) the strain tensor. The Lamé parameters are λ = 0.3/0.52 and μ = 1/2.6.
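Assuming that these values stem from the common choice of Young's modulus E = 1 and Poisson ratio ν = 0.3 (an assumption; E and ν are not stated in the text), they follow from the standard conversion formulas

$$\displaystyle{ \lambda = \frac{E\nu}{(1+\nu)(1-2\nu)} = \frac{0.3}{0.52}, \qquad \mu = \frac{E}{2(1+\nu)} = \frac{1}{2.6}. }$$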

3 Numerical Results

We first show parallel scalability results in two and three dimensions. Then, we show an application of the preconditioner within a block preconditioner in monolithic fluid-structure interaction. The model problems are discretized using piecewise quadratic (P2) finite elements. Our default Krylov method is GMRES, which we also use for the symmetric positive definite model problems. Our stopping criterion is the relative criterion \(\left\|r^{(k)}\right\|_{2}/\left\|r^{(0)}\right\|_{2} \leq 10^{-7}\), with \(r^{(0)}\) and \(r^{(k)}\) being the initial and the k-th residual, respectively. In our experiments, each subdomain is assigned to one processor core.
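As a small illustration, the stopping test amounts to the following check after every Krylov iteration (a hypothetical helper; in the actual code the criterion is simply passed to the GMRES solver as a relative tolerance):

```cpp
#include <Eigen/Dense>

// Relative residual criterion: stop as soon as ||r_k||_2 / ||r_0||_2 <= tol.
bool hasConverged(const Eigen::VectorXd& r0, const Eigen::VectorXd& rk,
                  double tol = 1e-7) {
  return rk.norm() <= tol * r0.norm();
}
```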

Weak Scalability in 2D We use five different meshes with H∕h = 100 and an increasing number of subdomains; see Tables 1 and 2. The results of weak scaling tests from 4 to 1024 processor cores for both model problems and an overlap of δ = 1h or δ = 2h are presented in Figs. 1 and 2. The GDSW preconditioner is numerically and parallel scalable, i.e., the number of iterations is bounded for both structured and unstructured decompositions, and the time to solution grows only slowly. The one-level preconditioner (OS1) does not scale numerically, and its number of iterations grows very fast. Indeed, for the unstructured decomposition, OS1 does not converge within 500 iterations for more than 256 subdomains for the scalar problem and for more than 16 subdomains for elasticity. This is, of course, also due to the comparably small overlap. As a result of the better constant in (2), we observe better convergence of the GDSW preconditioner for structured decompositions. Note that for the case of four subdomains the overlapping subdomains are significantly smaller.

Fig. 1

Weak scaling for the Laplacian model problem in 2D, cf. (3), using P2 finite elements: number of iterations (left), runtimes (right). For the structured and the unstructured decomposition (ParMETIS), we have approximately 40,000 d.o.f. per subdomain

Fig. 2

Weak scaling for the linear elastic model problem in 2D, cf. (4), using P2 finite elements: number of iterations (left), runtimes (right). For the structured and the unstructured decomposition (ParMETIS), we have approximately 80,000 d.o.f. per subdomain

Table 1 Number of degrees of freedom of the total mesh, coarse and local space dimensions of the GDSW preconditioner for the weak scaling tests in Fig. 1
Table 2 Number of degrees of freedom of the total mesh, coarse and local space dimensions of the GDSW preconditioner for the weak scaling tests in Figs. 2 and 3

A detailed analysis of different phases of the method is presented for linear elasticity in 2D in Fig. 3. We consider the standard full GDSW coarse space and the GDSW coarse space without rotations, i.e., the rotations are omitted from the coarse space. This latter case is not covered by the bound (2), but the results indicate numerical and parallel scalability.

Fig. 3

Weak parallel scalability using the GDSW preconditioner for the model problem of linear elasticity in 2D, cf. (4): structured (left) and unstructured decomposition (right); number of iterations (top), timings for overlap δ = 1h (middle), and timings for overlap δ = 2h (bottom). For the structured and the unstructured decomposition (ParMETIS), we use a subdomain size of roughly 40,000 degrees of freedom

Strong Scalability in 2D Results for strong parallel scaling tests are shown in Fig. 4 for linear elasticity in 2D. We observe very good strong scalability for structured and unstructured domain decompositions. Note that the number of d.o.f. per subdomain decreases as the number of processor cores increases; to a certain extent, we thus benefit from the increasing speed of the local sparse direct solvers.

Fig. 4

Strong parallel scalability using the GDSW preconditioner for the model problem of linear elasticity in 2D, cf. (4): structured decomposition (left), ParMETIS decomposition (right)

Weak Scalability for Linear Elasticity in 3D We present results of weak scalability runs for a linear elastic model problem in 3D from 8 to 4096 cores. We consider a structured decomposition of a cube and use the full GDSW coarse space in 3D. In Fig. 5, we present the number of iterations and the timings using P2 elements and an overlap δ of one or two elements.

Fig. 5

Weak parallel scalability using the GDSW preconditioner for the problem of linear elasticity in 3D: number of iterations (left), timings (right). We use a subdomain size of H∕h = 6 and P2 finite elements

The number of iterations seems to be bounded by a constant, whereas the solution time increases, i.e., the cost of the (parallel) sparse direct solver used for the coarse problem is noticeable in 3D.

Application in Fluid-Structure Interaction (FSI) We consider time-dependent monolithic FSI as in Balzani et al. (2015) but using a fully implicit scheme as in Deparis et al. (2015) and Heinlein et al. (2015). We apply a monolithic Dirichlet-Neumann preconditioner, using the GDSW preconditioner for the structural block; see Balzani et al. (2015) and Heinlein et al. (2015) and the references therein. We use a pressure wave inflow condition for a tube, using Mesh #1 from Heinlein et al. (2015). We consider a Neo-Hookean material for the tube; as opposed to Heinlein et al. (2015), we here use a fixed time step of 0.0005 s and show the runtimes during the simulation.

In Fig. 6, the runtimes of ten time steps using 128 cores of the Cray XT6m at Universität Duisburg-Essen are shown. We compare IFPACK, a one-level algebraic overlapping Schwarz preconditioner from Trilinos, our geometric one-level Schwarz preconditioner (OS1), the GDSW preconditioner without rotations (GDSW-nr), and the standard GDSW preconditioner for the structural block. We see that, although the computing times vary over the simulation time, the combination of the geometric overlap and a sufficiently large coarse space consistently reduces the runtime of the fully coupled monolithic FSI simulation by a factor of about two compared to the baseline given by IFPACK. Figure 7 shows the pressure and the deformation at t = 0.007 s, where we observe the largest computation time per time step; cf. Fig. 6.

Fig. 6

Runtimes for the monolithic FSI simulation. For clarity, the runtimes of two subsequent time steps of size Δt = 0.0005 s are combined. The monolithic system has approximately 1.2 million d.o.f. We use a Neo-Hookean material. "OS1" is the one-level Schwarz preconditioner, "GDSW-nr" is the GDSW preconditioner without rotations, and "GDSW" is the GDSW preconditioner with full coarse space

Fig. 7

Pressure and deformation at time t = 0.007 s. The deformation is magnified by a factor of 10