Abstract
Parallel results obtained with a new implementation of an overlapping Schwarz method using an energy minimizing coarse space are presented. We consider structured and unstructured domain decompositions for scalar elliptic and linear elasticity model problems in two dimensions. In particular, strong and weak parallel scalability studies for up to 1024 processor cores are presented for both types of problems. Additionally, weak scalability results for a three-dimensional linear elasticity model problem using up to 4096 processor cores are discussed. Finally, an application from fully-coupled fluid-structure interaction using a nonlinear hyperelastic material model for the structure is shown.
1 Introduction and Description of the Method
The GDSW preconditioner is a two-level overlapping Schwarz preconditioner introduced in Dohrmann et al. (2008a) with a proven condition number bound for the general case of John domains for scalar elliptic and linear elasticity model problems. It is algebraic in the sense that it can be constructed from the assembled system matrix. However, compared to FETI-DP (see Toselli and Widlund 2005) or BDDC methods, the standard GDSW coarse space is relatively large, especially in three dimensions. In Dohrmann and Widlund (2010), a related hybrid preconditioner with a reduced coarse problem for three-dimensional elasticity was introduced; there, the degrees of freedom (d.o.f.) corresponding to the faces are modified.
The GDSW preconditioner is a two-level additive overlapping Schwarz preconditioner with exact local solvers; cf. Toselli and Widlund (2005). It can be written as

\(M_{\mathrm{GDSW}}^{-1} =\varPhi K_{0}^{-1}\varPhi ^{T} +\sum _{i=1}^{N}R_{i}^{T}K_{i}^{-1}R_{i},\qquad\qquad (1)\)

where \(K_{0} =\varPhi ^{T}K\varPhi\) is the coarse matrix and \(K_{i} = R_{i}KR_{i}^{T}\), i = 1, …, N, are the local matrices corresponding to the overlapping subdomains; cf. Dohrmann et al. (2008b). The matrix \(\varPhi\) is the essential ingredient of the GDSW preconditioner. It is composed of coarse space functions which are discrete harmonic extensions from the interface into the interior degrees of freedom of the nonoverlapping subdomains. The values on the interface are restrictions of the nullspace of the operator to the interface.
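The construction of \(\varPhi\) and the additive two-level application of (1) can be sketched in a few lines; the following is a serial NumPy illustration under assumed notation (interior block K_II, interior-interface coupling K_IG, restriction matrices R_i), not the parallel Trilinos implementation:

```python
import numpy as np

def coarse_basis(K_II, K_IG, Phi_Gamma):
    """Energy-minimizing (discrete harmonic) extension of the interface values
    Phi_Gamma into the subdomain interior: K_II Phi_I = -K_IG Phi_Gamma."""
    Phi_I = np.linalg.solve(K_II, -K_IG @ Phi_Gamma)
    return np.vstack([Phi_I, Phi_Gamma])  # ordering: interior dofs, then interface

def two_level_apply(r, Phi, K, restrictions, local_solves):
    """z = Phi K0^{-1} Phi^T r + sum_i R_i^T K_i^{-1} R_i r, with K0 = Phi^T K Phi."""
    K0 = Phi.T @ K @ Phi
    z = Phi @ np.linalg.solve(K0, Phi.T @ r)          # coarse correction
    for R, solve in zip(restrictions, local_solves):  # local overlapping solves
        z += R.T @ solve(R @ r)
    return z

# Demo: a constant interface value extends to a constant in the interior,
# since the rows of the (Neumann-like) stiffness block sum to zero.
K_II = np.array([[2.0, -1.0], [-1.0, 2.0]])
K_IG = np.array([[-1.0, 0.0], [0.0, -1.0]])
print(coarse_basis(K_II, K_IG, np.ones((2, 1))).ravel())  # → [1. 1. 1. 1.]
```

The demo reflects the defining property of the coarse space: restrictions of the nullspace (here, constants) on the interface are extended energy-minimally into the interior.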
For \(\varOmega \subset \mathbb{R}^{2}\) being decomposed into John domains, the condition number of the GDSW preconditioner is bounded by

\(\kappa \left (M_{\mathrm{GDSW}}^{-1}K\right ) \leq C\left (1 + \frac{H}{\delta }\right )\left (1 +\log \left (\frac{H}{h}\right )\right )^{2},\qquad\qquad (2)\)

cf. Dohrmann et al. (2008a) and Dohrmann et al. (2008b). Here, H is the size of a subdomain, h is the size of a finite element, and δ is the width of the overlap.
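The qualitative behavior of the bound \(C(1 + H/\delta)(1 + \log(H/h))^{2}\) can be checked numerically; C is an unknown constant, set to 1 here for illustration, and H∕h = 100 matches the weak scaling setup below:

```python
import math

def gdsw_bound(H_over_h, H_over_delta, C=1.0):
    """Evaluate C (1 + H/delta) (1 + log(H/h))^2 for given mesh ratios."""
    return C * (1.0 + H_over_delta) * (1.0 + math.log(H_over_h)) ** 2

# For H/h = 100, an overlap of delta = 1h gives H/delta = 100, delta = 2h gives 50:
for overlap_in_elements in (1, 2):
    print(overlap_in_elements, gdsw_bound(100, 100 / overlap_in_elements))
```

Note that the bound is independent of the number of subdomains, which is exactly the numerical weak scalability observed in the experiments; doubling the overlap roughly halves the bound.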
Implementation Our parallel implementation of the GDSW preconditioner is based on Trilinos version 12.0; cf. Heroux et al. (2005). For the mesh partitioning, we use ParMETIS, cf. Karypis et al. (2011); the problems on the local level are solved using UMFPACK (version 5.3.0), cf. Davis and Duff (1997), and the coarse problem is solved using MUMPS (version 4.10.0), cf. Amestoy et al. (2001), in parallel mode. For the finite element implementation, we use the library LifeV (version 3.8.8); see Formaggia et al. (2016).
On the JUQUEEN BG/Q supercomputer, we use the clang compiler 4.7.2 and ESSL 5.1 when compiling Trilinos and the GDSW preconditioner implementation. On the Cray XT6m at Universität Duisburg-Essen, we use the Intel compiler 11.1 and the Cray Scientific Library (libsci) 10.4.4.
2 Model Problems
We consider model problems in two and three dimensions, i.e., \(\varOmega = [0,1]^{2}\) or \(\varOmega = [0,1]^{3}\). The domain is decomposed either in a structured way, i.e., into squares or cubes, or in an unstructured way, using ParMETIS.
Laplacian in 2D The first model problem is: find \(u \in H^{1}\left (\varOmega \right )\) with \(-\varDelta u = f\) in \(\varOmega\) and u = 0 on \(\partial \varOmega\).
Linear Elasticity in 2D and 3D The second model problem is: find \(\boldsymbol{u} \in (H^{1}\left (\varOmega \right ))^{d}\), d = 2, 3, with \(-\mathrm{div}\,\boldsymbol{\sigma }(\boldsymbol{u}) = \boldsymbol{f}\) in \(\varOmega\), where \(\boldsymbol{\sigma }= 2\mu \boldsymbol{\varepsilon } +\lambda \mathrm{ trace}(\boldsymbol{\varepsilon })I\) is the stress and \(\boldsymbol{\varepsilon }= \frac{1} {2}(\boldsymbol{\nabla }\boldsymbol{u} + (\boldsymbol{\nabla }\boldsymbol{u})^{T})\) the strain. The Lamé parameters are \(\mu = 1/2.6\) and \(\lambda = 0.3/0.52\), corresponding to E = 1 and ν = 0.3.
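The stress-strain relation above can be written out directly; the values of E and ν below are an assumption chosen so that the two quotients 1∕2.6 and 0.3∕0.52 quoted above appear as the Lamé parameters:

```python
import numpy as np

E, nu = 1.0, 0.3                                # assumed Young's modulus / Poisson ratio
mu = E / (2.0 * (1.0 + nu))                     # = 1/2.6
lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # = 0.3/0.52

def strain(grad_u):
    """Symmetric gradient: eps = (grad u + (grad u)^T) / 2."""
    return 0.5 * (grad_u + grad_u.T)

def stress(grad_u):
    """Linear elastic stress: sigma = 2 mu eps + lambda trace(eps) I."""
    eps = strain(grad_u)
    return 2.0 * mu * eps + lam * np.trace(eps) * np.eye(eps.shape[0])
```

For example, a uniform dilation \(\boldsymbol{\nabla}\boldsymbol{u} = I\) in 2D yields \(\boldsymbol{\sigma} = (2\mu + 2\lambda)I\), while a pure shear gradient has zero trace and only the \(2\mu\boldsymbol{\varepsilon}\) term contributes.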
3 Numerical Results
We first show parallel scalability results in two and three dimensions. Finally, we show an application of the preconditioner within a block preconditioner for monolithic fluid-structure interaction. The model problems are discretized using piecewise quadratic (P2) finite elements. Our default Krylov method is GMRES; it is also used for the symmetric positive definite model problems. Our stopping criterion is the relative criterion \(\left \|r^{(k)}\right \|_{2}/\left \|r^{(0)}\right \|_{2} \leq 10^{-7}\), with \(r^{(0)}\) and \(r^{(k)}\) being the initial and the k-th residual, respectively. In our experiments, each subdomain is assigned to one processor core.
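The stopping criterion can be made concrete with a minimal, unpreconditioned GMRES in NumPy (a didactic stand-in only; the actual runs use the parallel Krylov solvers from Trilinos):

```python
import numpy as np

def gmres_rel(A, b, rtol=1e-7, maxiter=200):
    """Minimal GMRES (Arnoldi + least squares) with the relative stopping
    criterion ||r_k||_2 / ||r_0||_2 <= rtol; the zero initial guess is used,
    so r_0 = b."""
    beta = np.linalg.norm(b)
    Q = np.zeros((len(b), maxiter + 1))
    H = np.zeros((maxiter + 1, maxiter))
    Q[:, 0] = b / beta
    for k in range(maxiter):
        w = A @ Q[:, k]
        for j in range(k + 1):               # modified Gram-Schmidt Arnoldi step
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] > 1e-14:
            Q[:, k + 1] = w / H[k + 1, k]
        # the small least-squares problem min ||beta e1 - H_k y|| yields ||r_k||
        e1 = np.zeros(k + 2); e1[0] = beta
        y = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)[0]
        rel = np.linalg.norm(e1 - H[:k + 2, :k + 1] @ y) / beta
        if rel <= rtol:
            return Q[:, :k + 1] @ y, k + 1
    return Q[:, :maxiter] @ y, maxiter

# 1D Laplacian as a simple SPD test matrix
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = gmres_rel(A, b)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # final relative residual
```

With the zero initial guess, \(r^{(0)} = b\), so the relative criterion coincides with a tolerance on \(\|r^{(k)}\|_{2}/\|b\|_{2}\).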
Weak Scalability in 2D We use five different meshes with H∕h = 100 and an increasing number of subdomains; see Tables 1 and 2. The results of weak scaling tests from 4 to 1024 processor cores for both model problems and an overlap of δ = 1h or δ = 2h are presented in Figs. 1 and 2. The GDSW preconditioner is numerically and parallel scalable, i.e., the number of iterations is bounded both for structured and unstructured decompositions, and the time to solution grows only slowly. The one-level preconditioner (OS1) does not scale numerically, and the number of iterations grows quickly. Indeed, for the unstructured decomposition, OS1 does not converge within 500 iterations for more than 256 subdomains for the scalar problem and for more than 16 subdomains for elasticity. This is, of course, also due to the comparably small overlap. As a result of the better constant in (2), we observe better convergence of the GDSW preconditioner for structured decompositions. Note that in the case of four subdomains the overlapping subdomains are significantly smaller.
A detailed analysis of different phases of the method is presented for linear elasticity in 2D in Fig. 3. We consider the standard full GDSW coarse space and the GDSW coarse space without rotations, i.e., the rotations are omitted from the coarse space. This latter case is not covered by the bound (2), but the results indicate numerical and parallel scalability.
Strong Scalability in 2D Results for strong parallel scaling tests are shown in Fig. 4 for linear elasticity in 2D. We observe very good strong scalability for structured and unstructured domain decompositions. Note that the number of d.o.f. per subdomain decreases when increasing the number of processor cores, and, to a certain extent, we thus benefit from an increasing speed of the local sparse direct solvers.
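Strong scalability is usually reported as speedup and parallel efficiency relative to the smallest run; a small helper makes the definitions explicit (the timings below are hypothetical, not taken from Fig. 4):

```python
def strong_scaling_efficiency(p_base, t_base, p, t):
    """Speedup and parallel efficiency of a run on p cores with runtime t,
    relative to a baseline run on p_base cores with runtime t_base."""
    speedup = t_base / t
    efficiency = speedup * p_base / p
    return speedup, efficiency

# Hypothetical timings illustrating ideal strong scaling from 64 to 256 cores:
print(strong_scaling_efficiency(64, 100.0, 256, 25.0))  # → (4.0, 1.0)
```

An efficiency above 1 can occur in practice when, as noted above, shrinking subdomains make the local sparse direct solves disproportionately faster.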
Weak Scalability for Linear Elasticity in 3D We present results of weak scalability runs for a linear elastic model problem in 3D from 8 to 4096 cores. We consider a structured decomposition of a cube and use the full GDSW coarse space in 3D. In Fig. 5, we present the number of iterations and the timings for P2 elements using an overlap δ of one or two elements.
The number of iterations seems to be bounded by a constant, whereas the solution time increases, i.e., the cost of the (parallel) sparse direct solver used for the coarse problem is noticeable in 3D.
Application in Fluid-Structure Interaction (FSI) We consider time-dependent monolithic FSI as in Balzani et al. (2015) but using a fully implicit scheme as in Deparis et al. (2015) and Heinlein et al. (2015). We apply a monolithic Dirichlet-Neumann preconditioner using the GDSW preconditioner for the structural block; see Balzani et al. (2015) and Heinlein et al. (2015) and the references therein. We use a pressure wave inflow condition for a tube using Mesh #1 from Heinlein et al. (2015). We consider a Neo-Hookean material for the tube; as opposed to Heinlein et al. (2015), we here use a fixed time step of 0.0005 s and show the runtimes during the simulation.
In Fig. 6, the runtimes of ten time steps using 128 cores of the Cray XT6m at Universität Duisburg-Essen are shown. We compare IFPACK, a one-level algebraic overlapping Schwarz preconditioner from Trilinos, our geometric one-level Schwarz preconditioner (OS1), the GDSW preconditioner without rotations (GDSW-nr), and the standard GDSW preconditioner for the structural block. We see that, although the computing times vary over the simulation time, the combination of the geometric overlap and a sufficiently large coarse space consistently reduces the runtime of the fully coupled monolithic FSI simulation by a factor of about two compared to the baseline given by IFPACK. Figure 7 shows the pressure and the deformation at t = 0.007 s, where we have the largest computation time per time step; cf. Fig. 6.
References
P.R. Amestoy, I.S. Duff, J.-Y. L’Excellent, J. Koster, A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23 (1), 15–41 (2001)
D. Balzani, S. Deparis, S. Fausten, D. Forti, A. Heinlein, A. Klawonn, A. Quarteroni, O. Rheinbach, J. Schröder, Numerical modeling of fluid-structure interaction in arteries with anisotropic polyconvex hyperelastic and anisotropic viscoelastic material models at finite strains. Int. J. Numer. Methods Biomed. Eng. (2015). ISSN 2040-7947. http://dx.doi.org/10.1002/cnm.2756.
T.A. Davis, I.S. Duff, An unsymmetric-pattern multifrontal method for sparse LU factorization. SIAM J. Matrix Anal. Appl. 18 (1), 140–158 (1997)
S. Deparis, D. Forti, G. Grandperrin, A. Quarteroni, FaCSI: a block parallel preconditioner for fluid-structure interaction in hemodynamics, Technical Report 13, MATHICSE, EPFL, Lausanne, 2015
C.R. Dohrmann, O.B. Widlund, Hybrid domain decomposition algorithms for compressible and almost incompressible elasticity. Int. J. Numer. Methods Eng. 82 (2), 157–183 (2010)
C.R. Dohrmann, A. Klawonn, O.B. Widlund, Domain decomposition for less regular subdomains: overlapping Schwarz in two dimensions. SIAM J. Numer. Anal. 46 (4), 2153–2168 (2008a). ISSN 0036-1429
C.R. Dohrmann, A. Klawonn, O.B. Widlund, A family of energy minimizing coarse spaces for overlapping Schwarz preconditioners, in Domain Decomposition Methods in Science and Engineering XVII. Lecture Notes in Computational Science and Engineering, vol. 60 (Springer, Berlin, 2008b), pp. 247–254
L. Formaggia, M. Fernandez, A. Gauthier, J.F. Gerbeau, C. Prud’homme, A. Veneziani, The LifeV Project. Web. http://www.lifev.org (2016)
A. Heinlein, A. Klawonn, O. Rheinbach, Parallel two-level overlapping Schwarz methods in fluid-structure interaction, in Proceedings of the European Conference on Numerical Mathematics and Advanced Applications (ENUMATH), Ankara, September, 2015. Springer Lecture Notes on Computational Science and Engineering, vol. 112 (2016), pp. 521–530. TUBAF Preprint 15/2015: http://tu-freiberg.de/fakult1/forschung/preprints
M.A. Heroux, R.A. Bartlett, V.E. Howle, R.J. Hoekstra, J.J. Hu, T.G. Kolda, R.B. Lehoucq, K.R. Long, R.P. Pawlowski, E.T. Phipps, A.G. Salinger, H.K. Thornquist, R.S. Tuminaro, J.M. Willenbring, A. Williams, K.S. Stanley, An overview of the Trilinos project. ACM Trans. Math. Softw. 31 (3), 397–423 (2005)
G. Karypis, K. Schloegel, V. Kumar, ParMETIS - Parallel graph partitioning and sparse matrix ordering. Version 3.2, Technical Report, University of Minnesota, Department of Computer Science and Engineering, April 2011
M. Stephan, J. Docter, JUQUEEN: IBM Blue Gene/Q Supercomputer System at the Jülich Supercomputing Centre. J. Large-Scale Res. Facil. 1, A1 (2015). ISSN 2364-091X. doi:10.17815/jlsrf-1-18
A. Toselli, O. Widlund, Domain Decomposition Methods—Algorithms and Theory. Springer Series in Computational Mathematics, vol. 34 (Springer, Berlin, 2005). ISBN 3-540-20696-5
Acknowledgements
The authors acknowledge the use of the JUQUEEN BG/Q supercomputer (Stephan and Docter, 2015) at JSC Jülich, the use of the Cray XT6m at Universität Duisburg-Essen and the financial support by the German Science Foundation (DFG), project no. KL2094/3 and RH122/4.
© 2017 Springer International Publishing AG
Heinlein, A., Klawonn, A., Rheinbach, O. (2017). Parallel Overlapping Schwarz with an Energy-Minimizing Coarse Space. In: Lee, CO., et al. Domain Decomposition Methods in Science and Engineering XXIII. Lecture Notes in Computational Science and Engineering, vol 116. Springer, Cham. https://doi.org/10.1007/978-3-319-52389-7_36