
1 Introduction

The development of algebraic solvers for discretizations of partial differential equations on massively parallel computers is an active research field. Multigrid methods [7] have been employed with great efficiency for elliptic PDEs on large supercomputers [1, 3, 4, 6, 9, 10, 13–15, 17]. In [13] we demonstrated that the geometric multigrid solver of the software package UG4 [16] has nearly optimal weak scaling properties for up to 262,144 processes and more than $10^{10}$ unknowns. The study indicates that, with the given approach, very good scalability should be achievable for even higher numbers of processes and unknowns once the required resources are available.

An issue that had not been fully addressed in previous studies is the difficulty of solving such equations on massively parallel computers in the presence of anisotropic coefficients or anisotropic elements in the underlying grid. Techniques exist to address these problems in general settings for lower process counts, e.g., involving the use of anisotropic refinement to construct a specialized grid hierarchy [2]. While parallelism is considered in [2], massively parallel systems such as today's supercomputers with hundreds of thousands of compute cores did not exist at that time, and no optimizations regarding massive scalability had thus been performed. In [10], massively parallel multigrid is described for the solution of elliptic PDEs in the special case of strong vertical anisotropies on structured grids.

Motivated by the real-world problem of drug diffusion through the human skin [11], we extended the methods described in [2] and constructed a geometric multigrid method for massively parallel computers that handles highly anisotropic elements using a combination of specialized refinement techniques and smoothers, resulting in a robust and highly scalable solver for anisotropic problems. The special grid layout of the model problem requires a solver which can handle anisotropies in all spatial directions on unstructured grids.

2 Problem Description

The motivating biological question for the construction of our solver is the numerical simulation of substance transport through the human skin. Simulations of such processes help to assess the risk of chemical exposures and at the same time reduce the need for in vitro and in vivo testing. However, the special structure of the human skin poses several numerical challenges due to the anisotropic geometry and physical coefficients varying by orders of magnitude, cf., e.g., [8, 11]. The uppermost part of the skin, called the stratum corneum (SC), consists of multiple layers of cells (corneocytes) which are connected by thin channels (lipid layers), cf. Fig. 1. Since the parameters affecting transport and diffusivity of a substance vary strongly between those subdomains, both have to be considered in a simulation. For benchmark purposes we solve a modified heat equation

$$\displaystyle{ \frac{\partial }{\partial t}(Ku) + \nabla \cdot (-\mathbf{D}K\nabla u) = 0 }$$

in a computational domain $\varOmega = \varOmega_{lip} \cup \varOmega_{cor}$. Here, u corresponds to the chemical activity, K = K(x) and D = D(x) are spatially dependent partition and diffusion coefficients, respectively. Concentrations are given by c = Ku and may undergo discontinuities which reflect domain-dependent variations in lipophilicity and hydrophilicity. For simplicity we used K = 1 and

$$\displaystyle{ D(x) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1 \quad &\mbox{ if }x \in \varOmega _{lip} \\ 10^{-3}\quad &\mbox{ if }x \in \varOmega _{cor}. \end{array} \right. }$$
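For illustration only, the piecewise coefficients can be written as small position-dependent callbacks of the kind typically handed to a finite element assembly routine. This is a minimal Python sketch, not the UG4 interface; the predicate in_lipid_layer is a hypothetical geometry test.

```python
def diffusion_coefficient(x, in_lipid_layer):
    """Piecewise diffusion coefficient D(x) of the benchmark problem:
    1 in the lipid channels, 1e-3 inside the corneocytes.
    `in_lipid_layer(x)` is a hypothetical geometry predicate."""
    return 1.0 if in_lipid_layer(x) else 1.0e-3

def partition_coefficient(x):
    """Partition coefficient K(x); K = 1 everywhere for this benchmark."""
    return 1.0
```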
Fig. 1 Idealized model of the stratum corneum: brick-and-mortar grid

For this study we focused on the idealized but still realistic brick-and-mortar domain. To this end, we used finite element grids consisting of highly anisotropic elements. This level of anisotropy is required to reduce the number of elements in the coarse grids, while still capturing the topology and morphology of the underlying domain.

For the considered geometry only hexahedral elements with varying degrees of anisotropy were used. The elements used to construct the coarse grid and an overview of the resulting FE-grid are depicted in Fig. 2. In the given example the highest aspect ratios are 1:15; however, the presented methods also work for much higher aspect ratios.

Fig. 2 Left: Individual hexahedral elements used for the brick-and-mortar FE-grid. Right: Anisotropic coarse grid of the 3d brick-and-mortar geometry (1280 elements)

The steady state solution of the considered problem setup is depicted in Fig. 3.

Fig. 3 Cut through the domain showing the steady state solution on level 5

3 Solver Setup

In [13] we employed a massively parallel geometric multigrid solver for the Poisson problem on grids with isotropic cells. While the rather simple Jacobi smoother used in those studies is fast and perfectly scalable, it is not suitable for anisotropic problems, since its smoothing properties deteriorate in this case. Iterative methods like ILU, on the other hand, are known to possess good smoothing properties also for highly anisotropic problems (cf. [2, 5, 18]). However, for most of these methods an efficient parallel implementation is not feasible. The typical strategy is then to employ them in a Block-Jacobi-type fashion, i.e., the more sophisticated smoother is executed locally on each process, while interprocess couplings are treated with a Jacobi method.
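To illustrate the Block-Jacobi-type parallelization described above, the following Python sketch emulates it serially with SciPy: each "process" owns a contiguous block of indices and applies an incomplete LU solve to its diagonal block, while the couplings to other blocks enter only through the global residual, i.e., they are treated in a Jacobi-like manner. Function and variable names are chosen for illustration and do not correspond to the UG4 implementation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_jacobi_ilu_smoother(A, b, x, local_slices, n_steps=1):
    """Block-Jacobi smoothing with local ILU: each 'process' owns a contiguous
    index range and solves with an ILU factorization of its diagonal block;
    couplings between blocks only enter through the global residual."""
    # Factor the local diagonal blocks once (smoother setup phase).
    local_ilu = [spla.spilu(sp.csc_matrix(A[s, s])) for s in local_slices]
    for _ in range(n_steps):
        r = b - A @ x                    # global residual (needs communication in parallel)
        for s, ilu in zip(local_slices, local_ilu):
            x[s] += ilu.solve(r[s])      # process-local ILU solve
    return x

# Usage sketch: a 1D Poisson-type matrix split into two "process" blocks.
n = 64
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = np.zeros(n)
blocks = [slice(0, n // 2), slice(n // 2, n)]
x = block_jacobi_ilu_smoother(A, b, x, blocks, n_steps=3)
```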

While this setup works nicely for smaller process counts, the iteration numbers typically increase with the number of processes involved. A second aspect is that load imbalances, which are typical for massively parallel simulations on unstructured grids, can have a severe impact on the runtime of the solver initialization. This holds in particular when setup times do not grow like O(n), as is the case, e.g., for a threshold-based ILUT.

In order to construct an optimal solver which can handle high anisotropies while still providing nearly optimal scalability, we combined the benefits of the Jacobi smoother for isotropic elements with the efficiency and robustness of the ILU smoothers for anisotropic problems. This combination is possible using a special refinement technique, which reduces the anisotropy of elements with each refinement until the resulting grid can be considered isotropic. The anisotropic and isotropic refinement rules used for the elements from Fig. 2 are depicted in Fig. 4.

Fig. 4 Anisotropic refinement schemes for different shapes. Black edges are introduced during refinement

In each refinement step only edges which are longer than a certain threshold are refined. Starting with a threshold of half of the length of the longest edge, the threshold is halved after each step so that shorter edges will be refined in the next iteration. For the shown brick-and-mortar geometry this technique leads to an isotropic grid after a certain number of refinements, depending on the highest aspect ratio.
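A minimal sketch of this edge-selection rule, assuming the edges of an element are given simply by their lengths (the actual marking and closure handling during refinement is more involved and not shown):

```python
def edges_to_refine(edge_lengths, max_edge_length, step):
    """Edges selected in anisotropic refinement step `step` (0, 1, 2, ...):
    only edges longer than the current threshold are refined; the threshold
    starts at half the longest edge length and is halved after each step."""
    threshold = max_edge_length / 2.0 ** (step + 1)
    return [i for i, length in enumerate(edge_lengths) if length > threshold]

# Example element with edge lengths 16, 4, 1 (aspect ratio 1:16); for brevity
# the lengths are kept fixed here, although refined edges would be halved.
lengths = [16.0, 4.0, 1.0]
print(edges_to_refine(lengths, max(lengths), step=0))  # [0]    threshold 8
print(edges_to_refine(lengths, max(lengths), step=1))  # [0]    threshold 4
print(edges_to_refine(lengths, max(lengths), step=2))  # [0, 1] threshold 2
```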

On the lower levels, in which anisotropic elements are present, we employ a multigrid method with robust ILU smoothing, whereas on higher levels, which contain nearly isotropic elements, a highly scalable multigrid method with Jacobi smoothing is used. This setup is further justified by the fact that the lower levels can be distributed to only a subset of the available processes, since they contain only a fraction of the elements of the higher levels. We can thus ensure that the more complex ILU smoothing is performed on a fixed number of processes only and does not interfere with the scalability on the higher levels. To this end, we use the hierarchical distribution approach described in [13]. Technically, this setup is realized in the software package UG4 [16] by employing a multigrid method with ILU smoothing on the lower levels as a base solver for the multigrid method with damped Jacobi smoothing (ω = 0.5) on the higher levels. The prescribed tolerance for this intermediate coarse grid solver was a relative defect reduction by three orders of magnitude. Table 1 shows the different properties of the involved solver components.
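The following Python fragment is a schematic reconstruction of this layered setup, not the actual UG4 scripting interface: the smoother is chosen per level, and the ILU-smoothed multigrid on the anisotropic lower levels acts as base solver of the Jacobi-smoothed multigrid on the isotropic upper levels, iterated until its defect is reduced by three orders of magnitude.

```python
def smoother_for_level(level, first_isotropic_level):
    """Level-dependent smoother choice mirroring the setup in the text:
    robust ILU smoothing on the (anisotropic) lower levels, cheap damped
    Jacobi smoothing (omega = 0.5) on the (isotropic) higher levels."""
    if level < first_isotropic_level:
        return {"type": "ilu"}
    return {"type": "jacobi", "damping": 0.5}

# Schematic solver layout (keys and structure are illustrative assumptions):
solver_config = {
    "upper_multigrid": {                    # levels with nearly isotropic elements
        "smoother": {"type": "jacobi", "damping": 0.5},
        "base_solver": "lower_multigrid",   # invoked on the coarsest isotropic level
    },
    "lower_multigrid": {                    # levels with anisotropic elements
        "smoother": {"type": "ilu"},
        "reduction": 1e-3,                  # relative defect reduction of the base solver
    },
}
```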

Table 1 Solver components: solvers on lower levels serve as base solvers for the preconditioners on higher levels

4 Parallelization

Parallelization is performed using the approach detailed in [13]. Hierarchical distribution plays an important role here, since we apply the multigrid method with ILU smoothing on a smaller subset of processes only. The base solver of this lower, ILU-smoothed multigrid method may ultimately run on a single process.

Since anisotropic refinement may lead to load-imbalances, redistributions of the grid hierarchy may be necessary to improve those balances. To this end, we are using a fast parallel bisection strategy for distributed multigrid hierarchies as described in [12].

All communication between different processes is realized through the Message Passing Interface (MPI).

5 Results

A weak scaling study was performed on the Cray XC40 supercomputer Hazel Hen at the HLRS Stuttgart, which features 7712 compute nodes, each with 128 GB of memory and 24 cores (48 logical cores through hyperthreading), and a peak performance of 7420 TFlops.

The study was performed by solving the aforementioned human skin brick-and-mortar model using the described solver setup. To allow for better comparability of the different runs, the number of outer CG iterations was fixed at 12, which resulted in a relative reduction of the defect by approximately $10^{-6}$ in all runs. The study starts on 1 process, and for each subsequent run we refine the grid once more using regular refinement, thus increasing the number of elements by a factor of 8. At the same time we also increase the number of involved processes by a factor of 8 to guarantee a constant workload per process for all runs. Since we executed the parallel base solver of the outer multigrid method on level 4, our study starts with level 5. Table 2 shows the number of unknowns and the run times of the different runs. The scaling behavior of assembly, solver initialization, and solving is also shown in Fig. 5.
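As a simple arithmetic illustration of this weak-scaling setup (values normalized to the first run; the concrete numbers are listed in Table 2):

```python
# Each run refines once more (regular refinement: elements x 8) and uses 8 times
# as many processes, so the work per process stays roughly constant.
for run in range(7):                 # number of runs chosen for illustration
    pes = 8 ** run
    print(f"run {run + 1}: level {5 + run}, {pes:>7} PEs, "
          f"{8 ** run}x the elements of run 1")
```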

Fig. 5 Scaling of the run times of assembly (ass), solver initialization (ini), and solving (sol)

Table 2 Each line corresponds to an individual run. Recorded are the number of processes (PEs), the number of levels (Levels), the number of unknowns (DoFs), and the run times of assembly ($T_{ass}$), solver initialization ($T_{ini}$), and solving ($T_{sol}$)

Table 3 gives an overview of the levelwise distribution qualities. The distribution quality $q_l$ of a level $l$ of the hierarchy is computed as

$$\displaystyle{ q_{l}:= \frac{n_{l}^{total} - n_{l}^{max}} {n_{l}^{max} \cdot (P_{l} - 1)}, }$$

where $P_l > 1$ is the number of processes of the given process hierarchy on level $l$, $n_l^p$ is the number of elements on level $l$ on process $p$, and

$$\displaystyle\begin{array}{rcl} n_{l}^{total}&:=& \sum _{ p=1}^{P_{l} }n_{l}^{p}, {}\\ n_{l}^{max}&:=& \max _{ p=1,\ldots,P_{l}}n_{l}^{p}. {}\\ \end{array}$$

For $P_l = 1$ numerator and denominator both vanish and we define $q_l := 1$. $q_l$ thus lies in the range [0, 1], where $q_l = 0$ means that all elements of level $l$ are contained on a single process and $q_l = 1$ reflects an equal share of elements amongst all processes.
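A small sketch of this quality measure, assuming the per-process element counts of one level are available as a list (the function name is chosen here for illustration):

```python
def distribution_quality(elements_per_process):
    """Distribution quality q_l of one grid level as defined above:
    1.0 for a perfectly equal share, 0.0 if one process holds all elements."""
    counts = list(elements_per_process)
    P = len(counts)
    if P == 1:                       # numerator and denominator both vanish
        return 1.0
    n_total = sum(counts)
    n_max = max(counts)
    return (n_total - n_max) / (n_max * (P - 1))

print(distribution_quality([100, 100, 100, 100]))  # 1.0   (perfect balance)
print(distribution_quality([400, 0, 0, 0]))        # 0.0   (all on one process)
print(distribution_quality([150, 100, 100, 50]))   # ~0.56
```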

Table 3 Distribution qualities for each level of the multigrid hierarchy for the different runs

Table 4 specifies the number of processes used on each level for the individual runs. They were chosen so that each level on each process contains at least 32 elements (given an ideal load balance). The first redistribution is performed for up to 256 processes. For each further redistribution the number of processes is multiplied by 64 and capped once the maximum number of processes is reached.
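This rule can be sketched as follows, assuming an ideal load balance; the helper names and the example process count are illustrative, and the concrete per-level values also depend on the hierarchical distribution of [13].

```python
def max_processes_for_level(num_elements, min_elements_per_process=32):
    """Upper bound on the process count of a level: with ideal load balance,
    each process should hold at least 32 elements."""
    return max(1, num_elements // min_elements_per_process)

def redistribution_caps(total_processes, first_cap=256, growth=64):
    """Process-count caps of the successive redistribution steps: the first
    redistribution uses at most 256 processes, each further one multiplies the
    cap by 64, and the cap never exceeds the total number of processes."""
    caps, cap = [], first_cap
    while cap < total_processes:
        caps.append(cap)
        cap *= growth
    caps.append(total_processes)
    return caps

print(redistribution_caps(262144))    # e.g. [256, 16384, 262144]
print(max_processes_for_level(1280))  # coarse grid of Fig. 2 -> at most 40 processes
```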

Table 4 Number of processes used on each level for the individual runs

Both matrix assembly and solver initialization are performed process-locally. The increase in the run times $T_{ass}$ and $T_{ini}$ from 8 to 64 processes is related to the slight load imbalance which can be observed for higher process counts (cf. Table 3). Nevertheless, the scaling behavior of both assembly and initialization is very good and perfectly suited for large scale parallel runs.

The solver scalability is satisfactory as well. The run times $T_{sol}$ increase slightly as more processes are involved, and two effects are at play here. First, the distribution quality of the grid hierarchy deteriorates with a growing number of processes, i.e., some processes have more work to do than others in each program section. This slight imbalance is to be expected for an unstructured grid in which no special properties can be exploited for partitioning. Second, and counteracting the first effect, for runs with up to 256 processes the increasing parallelization of the intermediate base solver on level 4 has a positive effect on the total solver run times.

As demonstrated in [13], the underlying multigrid implementation in UG4 has nearly optimal scaling properties for perfectly balanced grids. The slightly worse scaling properties in the study at hand are thus likely linked to the observed load imbalance. Nevertheless, given the complexity of the considered problem, we think that the achieved run times are still convincing. The results demonstrate the applicability of the presented approach to gain insight into complex biological processes through high-resolution numerical simulations on massively parallel computers.