1 Introduction

We consider the solution of large, parallel distributed stationary and dynamic discretized elasticity problems with a moderate Poisson ratio; i.e., we do not consider the nearly incompressible case. As the solver, we use the Generalized Minimal Residual (GMRES) method preconditioned by two-level overlapping Schwarz preconditioners with Generalized Dryja–Smith–Widlund (GDSW) [2, 3] and Reduced dimension GDSW (RGDSW) [5, 12] coarse spaces. Even though these preconditioners can be constructed from the fully assembled system matrix, a minimum of geometric information is also needed. In particular, the domain decomposition interface and the null space are used for their construction. Here, we focus on the construction of fully algebraic GDSW type coarse spaces if this information is not available. In particular, we consider the case when the system matrix is uniquely distributed, such that the interface cannot be identified.

Therefore, we will describe a method to approximate the nonoverlapping subdomains, resulting in an approximate interface; cf. [10]. Our parallel implementation is based on the FROSch framework [9], which is part of the ShyLU package in Trilinos [13]; see [10, 11] for more details on the implementation. To discuss the performance of the fully algebraic approach, we will compare it to the classical GDSW type coarse spaces using all necessary information.

2 Model Problems

The equilibrium equation for an elastic body covering the domain Ω = [0, 1]3 under the action of a body force f in the time interval [0, T] is

(1)

with the symmetric Cauchy stress tensor σ and the displacement u. We consider a Saint Venant-Kirchhoff material, a hyperelastic material with the linear material law

(2)

and the nonlinear strain tensor given by \(\mathbf {E} := \frac {1}{2} \left ( \mathbf {C} - I \right ),\) where C is the right Cauchy-Green tensor. Furthermore, we consider the boundary conditions

$$\displaystyle \begin{aligned} \mathbf{u} & = 0 \qquad \mbox{on } \partial\Omega_D := \{0\} \times [0,1]^2, \\ \boldsymbol{\sigma} \cdot \mathbf{n} & = 0 \qquad \mbox{on } \partial\Omega_N :=\partial\Omega \setminus \partial\Omega_D,\\ \end{aligned}$$

and the body force f = (−20, 0, 0)T, for t < 5 ⋅ 10−3, and f = 0, afterwards.

In addition to this, we also consider a stationary problem with tt u = 0, i.e.,

(3)

We transform the model problems to their respective variational formulations and discretize them using piecewise linear or quadratic finite elements; we denote the corresponding finite element spaces by \(V^h = V^h \left (\Omega \right )\). For the time-dependent problem, the resulting semi-discrete problem is further discretized with the Newmark-β method. In particular, we choose the parameters for the fully implicit constant average acceleration method, i.e., β = 1∕2 and γ = 1∕4.

The fully discrete nonlinear equations are linearized using Newton’s method. We solve the equation

$$\displaystyle \begin{aligned} J(u^{(k)}) \delta u^{(k+1)} = R(u^{(k)}), \end{aligned} $$
(4)

for the update δu (k+1). Here, J(u (k)) and R(u (k)) are the Jacobian and the nonlinear residual for the solution u (k), respectively.

2.1 GDSW and RGDSW Preconditioners

We consider the system of linear equations (4) as derived in the previous section. For simplicity, we use the notation Ax = b in this section.

Let Ω be decomposed into nonoverlapping subdomains \(\left \lbrace \Omega _i\right \rbrace _{i=1}^N\) with typical diameter H and corresponding overlapping subdomains \(\left \lbrace \Omega _i^{\prime }\right \rbrace _{i=1}^N\), resulting from extending the nonoverlapping subdomains by k layers of elements. We define \(R_i: V^h \rightarrow V_i^h\), i = 1, …, N, as the restriction from the global finite element space V h to the local finite element space \(V_i^h := V^h \left (\Omega _i^{\prime }\right )\) and corresponding prolongation operators \(R_i^T\). In addition to that, we can also define restricted and scaled prolongation operators \(\tilde R_i^T\); cf., e.g., [1, 4, 7].

Furthermore, let

$$\displaystyle \begin{aligned} \Gamma := \left\lbrace x \in ( \overline{\Omega}_i \cap \overline{\Omega}_j ) \setminus \partial\Omega_D : i \neq j, 1 \leq i,j \leq N \right\rbrace \end{aligned} $$

be the interface of the nonoverlapping domain decomposition.

The GDSW preconditioner, which was introduced by Dohrmann, Klawonn, and Widlund in [2, 3], is a two-level additive overlapping Schwarz preconditioner with energy minimizing coarse space and exact solvers. Thus, the preconditioner can be written in the form

$$\displaystyle \begin{aligned} M_{\mathrm{GDSW}}^{-1} = \Phi A_0^{-1} \Phi^T + \sum_{i=1}^{N} R_i^T A_i^{-1} R_i, \end{aligned} $$
(5)

where \(A_i = R_{i} A R_{i}^T\). In the second level, we solve the coarse problem matrix A 0 = ΦT A Φ. The columns of Φ are the basis functions of the coarse space. To construct the GDSW coarse basis functions, let R Γj be the restriction from Γ onto the interface component Γj. For the GDSW coarse space in three dimensions, the interface components are the vertices, edges, and faces. Then, the values of the GDSW basis functions on Γ read

$$\displaystyle \begin{aligned} \Phi_\Gamma = \left[ \begin{array}{ccc} R_{\Gamma_1}^T \Phi_{\Gamma_1} & \ldots & R_{\Gamma_M}^T \Phi_{\Gamma_M} \end{array} \right], \end{aligned}$$

where the columns of \(\Phi _{\Gamma _j}\) are the restrictions of the null space of subdomain Neumann matrices to the interface component Γj. For elasticity, the null space consists of the rigid body motions, i.e., the translations and rotations. After partitioning the degrees of freedom into interface (Γ) and interior (I) ones, the matrix A can be written as

$$\displaystyle \begin{aligned} A = \begin{bmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma \Gamma} \end{bmatrix} \end{aligned} $$

and the GDSW coarse basis functions are the discrete harmonic extensions of ΦΓ into the interior,

$$\displaystyle \begin{aligned} \Phi = \left[ \begin{array}{c} \Phi_{I}\\ \Phi_\Gamma\\ \end{array} \right] = \left[ \begin{array}{c} -A_{II}^{-1} A_{I \Gamma } \Phi_\Gamma \\ \Phi_\Gamma \\ \end{array} \right]. \end{aligned} $$
(6)

The RGDSW coarse space is constructed similarly. However, in general, we only obtain one basis function for each vertex, resulting in a much smaller dimension of the coarse space; cf. [5] and, for more details on the parallel implementation in FROSch, [7, 12]. The reduction of the coarse space dimension can also be seen in Table 1. There are several variants of RGDSW coarse spaces, which differ in a scaling of the interface degrees of freedom. Here, we will only consider the most algebraic variant, which is denoted as Option 1 in [5]; cf. [7] for a detailed description of our implementation of Option 1 of the RGDSW coarse space.

Table 1 Comparison of coarse matrix sizes for a structured domain decomposition and the approximated subdomain maps for a P1 (Hh = 21) and P2 (Hh = 9) discretization

In our numerical simulations, we will also employ the recycling strategies presented in [7]. We always reuse the symbolic factorizations from previous time or Newton iterations. Moreover, we reuse the coarse space from previous iterations and, for the dynamic problem, additionally the coarse matrix. Furthermore, as in [7], we always use a scaled first level operator with overlap δ = 1h.

3 Fully Algebraic Construction of GDSW and RGDSW Coarse Spaces

As previously described, the construction of GDSW and RGDSW coarse spaces for elasticity problems requires both the domain decomposition interface and the null space of the operator, i.e., the rigid body motions. Here, we describe how we construct the coarse space if this information is not available.

Algebraic Approximation of the Interface

If the distribution of the system matrix is unique, the interface cannot be recovered. Therefore, we will carry out the following process to approximate the nonoverlapping subdomains and hence the interface. Starting from the unique distribution, we first add one layer of elements to each subdomain. The overlap of the resulting domain decomposition now contains the interface but also other finite element nodes. In order to reduce the number of unnecessary nodes, we compare the subdomain ID of the original unique decomposition and the decomposition with one layer of overlap and remove nodes from the overlapping subdomains if the subdomain ID is lower compared to the original decomposition; this process is sketched in [10] and Fig. 1.

Fig. 1
figure 1

Sketch of the approximation of the nonoverlapping subdomains and the interface, respectively: uniquely distributed map (left); extension of the uniquely distributed map by one layer of elements resulting in an overlapping map, where the overlap contains the interface (middle); by selection, using the lower subdomain ID, the a map approximating to the nonoverlapping subdomains is constructed (right). Taken from [10]

Incomplete Null Space

The rigid body modes are the translations and rotations of the elastic body. The translations are constant functions which can be constructed without any geometric information. Since we are not able to compute the rotations from the fully assembled matrix and without coordinates of the finite element nodes, we just omit them in the fully algebraic coarse space; see also [11]. For the results in Sect. 4, only the number of iterations is negatively affected by omitting rotations from the coarse space but the time to solution actually benefits from the smaller coarse space. Note that, from theory, the rotational basis functions are necessary for numerical scalability. Therefore, we expect that there are problems for which the full coarse space performs better.

4 Numerical Results

In this section, we compare the GDSW and RDSW preconditioners with exact interface maps and full coarse space, GDSW and RGDSW preconditioners with exact interface map but without rotational basis functions, and the fully algebraic variant with approximated interface and without rotational basis functions; for the sake of brevity, we denote the three variants as “rotations”, “no rotations”, and “algebraic”, respectively. As discussed in Sect. 2, we consider a stationary elasticity problem with homogeneous shear modulus of μ = 5 ⋅ 103 and a dynamic elasticity problem with two material phases; cf. Fig. 2 (left) for a graphical representation of the coefficient distribution of the shear modulus. For both cases, we choose ν = 0.4. For the stationary homogeneous model problem, we use structured grids and structured decompositions into square subdomains, whereas for the dynamic problem, we use a fixed unstructured tetrahedral mesh with roughly 3.3 million elements and 588 k nodes. We use the inexact Newton method of Eisenstat and Walker [6] with a type 2 forcing term until a relative residual of 𝜖 nl = 10−8 is achieved. The initial forcing term is η init = 10−3 and the maximum and minimum forcing terms are η max = 10−2 and η min = 10−8, respectively. Therefore, we use a combination of the Trilinos packages Thyra and NOX. Furthermore, NOX is used for a backtracking globalization strategy. In particular, the step length is chosen as 0.5l with l = 0, 1, … until the Armijo condition is satisfied. All linearized problems are solved with right-preconditioned GMRES with the corresponding GDSW and RGDSW preconditioners and the tolerance for the relative residual error is the forcing term η. All computations were carried out on the supercomputer magnitUDE of the University Duisburg-Essen, Germany.

Fig. 2
figure 2

Left: Slice through elements with high coefficient (μ high = 103) displayed as a wireframe. Low coefficient is μ low = 1; cf. [8], for a detailed discussion of the foam geometry used for an heterogeneous Poisson problem. Right: Solution of dynamic problem at T = 10−2 for Δt = 10−3 with a warp filter and a 5 times scaling factor

In Tables 2 and 3, weak scaling results for the stationary model problem with piecewise linear and piecewise quadratic elements are depicted. Although, iteration counts are slightly higher for the RGDSW coarse spaces compared to the respective GDSW coarse spaces, the total computation time is much smaller for RGDSW due to the lower dimension of the coarse problem. This effect is even stronger for larger numbers of subdomains and cores; cf. Table 1. Furthermore, we observe competitive iteration counts and computing times when using the fully algebraic coarse spaces. In addition to that, the approximation strategy for the interface seems to perform better for piecewise linear than for piecewise quadratic elements.

Table 2 Stationary problem, discretization P1 (Hh = 21), iteration counts are averages over all Newton iterations. All problems were solved in 4 Newton iterations. The three timings are for the setup/solve/total time and are in seconds. All total times are highlighted
Table 3 Stationary problem, discretization P2 (Hh = 9), iteration counts are averages over all Newton iterations. All problems were solved in 4 Newton iterations. The three timings are for the setup/solve/total time and are in seconds. All total times are highlighted

In Fig. 3, we present strong scaling results from 48 to 720 cores for the dynamic model problem. The reported times are the total times for our preconditioners, i.e., the sum of the times needed for their construction and their applications in GMRES. We solve the problem with Δt = 10−3 up to a final time T = 2 ⋅ 10−2 using the RGDSW rotations coarse space and using the RGDSW algebraic coarse space both with matrix recycling; cf. [7]. Here, we observe very good strong scalability results for both variants even though the model problem has coefficient jumps. Again, the fully algebraic variant is competitive.

Fig. 3
figure 3

Strong scaling for dynamic problem up to time T = 2 ⋅ 10−2 for the foam geometry