1 Introduction

High-order finite element methods (FEM) enjoy an increasing interest in computational science and engineering. They include hp-FEM, spectral element methods (SEM) as well as discontinuous Galerkin methods [7, 16]. The motive that drives the development of high-order methods lies in their potential to deliver accuracy with lower cost in comparison to the first and second order methods used in common simulation tools [29]. However, realizing this advantage in practice is a formidable task. Along with curvilinear mesh generation, the provision of efficient solvers for the resulting algebraic equation systems remains the main challenge.

Projection methods for incompressible flow, or implicit discretization of diffusion terms lead to a sequence of linear elliptic problems which are related or equivalent to the Poisson equation or, more generally, the Helmholtz equation [10]. Fast solvers for such equations are therefore a crucial ingredient of competitive high-order methods and, hence, have been in focus of research for almost 30 years [1, 5, 8, 11,12,13, 15, 17,18,19, 21, 22, 24,25,26]. For Helmholtz or Poisson problems discretized on regular meshes, efficient multigrid (MG) techniques have been developed recently. [19] proposed additive Schwarz smoothers based on extended element domains, which attain residual reductions of approximately 0.2 within one sweep. They found that weighting the overlapping Schwarz updates by the inverse of the counting matrix, which corresponds to taking the arithmetic mean, plays a crucial role in obtaining multigrid-like iteration counts. A detailed analysis of the method was given in [18]. [13] presented a similar multigrid approach for the p-finite element method on locally refined Cartesian meshes. They used a multiplicative Schwarz smoother on element domains which possess only a minimal overlap confined to the element boundaries. [11] developed a p-multigrid method based on static condensation which, apart from pre- and post-processing, reaches linear complexity. The proposed block smoother can be classified as an additive Schwarz method using a monotonic increasing shape function for blending the overlapping updates. Using this smoother the multigrid method attained convergence rates of about 0.02 combined with a run-time efficiency that comes close to fast direct finite difference solvers. The success of this approach inspired us to extend the idea of nonuniform weighting to the full, “uncondensed” problem and thus led to the present work. The primary goal is to show how nonuniform weighting can be used to boost the performance of high-order spectral-element multigrid techniques. Further, we investigate the influence of the overlap width, smoothing strategies, additive versus multiplicative Schwarz methods and Krylov acceleration on robustness and efficiency. In addition to this, we consider the extension to diffusion problems with variable coefficients.

The remainder of the paper is organized as follows: Sect. 2 provides a brief description of the spectral element discretization. Section 3 presents the solution techniques, namely the weighted additive and multiplicative Schwarz methods, the p-multigrid method and the inexact multigrid-preconditioned conjugate gradient method. Section 4 proceeds with the discussion of numerical experiments for assessing the solution methods and application to variable diffusion. Finally, Sect. 5 concludes the paper.

2 Discretization

As the model problem we consider the Poisson equation

$$\begin{aligned} -\nabla ^2 u = f \end{aligned}$$
(1)

in the rectangular domain \(\varOmega =[0,{\ell }_x]\times [0,{\ell }_y]\) with periodic boundaries. For discretization \(\varOmega \) is decomposed into \(n_{\textsc {e}}=n_x \times n_y\) rectangular elements \(\varOmega ^{mn}\) with dimensions \(\Delta x = {\ell }_x/n_x\) and \(\Delta y = {\ell }_y/n_y\). In each element the solution is approximated as

$$\begin{aligned} u(x,y)|_{\varOmega ^{mn}} \simeq \sum _{i,j=0}^{p} u^{mn}_{ij} \, {\varphi }_i\big (\xi ^m(x)\big ) \, {\varphi }_j\big (\eta ^n(y)\big ) \end{aligned}$$
(2)

where \({\varphi }_i\) are the Lagrange polynomials to the Gauss-Lobatto-Lengendre (GLL) points \(\{\xi _i\}_{i=0}^{p}\) in the one-dimensional standard region \({\hat{\varOmega }} = [-1,1]\) and \(\xi ^m(x)\), \(\eta ^n(y)\) the mapping of coordinates from \(\varOmega ^{mn}\) to \(\hat{\varOmega }\) [7, 16]. Concatenation of the element coefficients \({{\varvec{u}}}^{mn} = [u^{mn}_{ij}]\) and enforcing continuity for shared vertices and edges yields the unique global coefficients \({{\varvec{u}}}\) [see, e.g. [7], pp. 191–194]. Application of the Galerkin spectral element method leads to the discrete equations

$$\begin{aligned} {{\varvec{A}}} {{\varvec{u}}} = {{\varvec{f}}} \,. \end{aligned}$$
(3)

As a consequence of the tensor product ansatz (2) and the Cartesian mesh, the global system matrix in Eq. (3) assumes the tensor product form

$$\begin{aligned} {{\varvec{A}}} = {{\varvec{M}}}_y \otimes {{\varvec{L}}}_x + {{\varvec{L}}}_y \otimes {{\varvec{M}}}_x, \end{aligned}$$
(4)

where \({{\varvec{M}}}_*\) and \({{\varvec{L}}}_*\) represent the one-dimensional mass and stiffness matrices for directions \(*=x,y\), respectively. The detailed structure of these operators and underlying spectral element techniques are well described in literature [7, 16] and therefore deliberately skipped here.

3 Solution Methods

For solving Eq. (3) we consider polynomial multigrid (MG) and multigrid-preconditioned conjugate gradients (MGCG). Both approaches rely on Schwarz methods for smoothing. We first present the Schwarz methods and then sketch MG and MGCG.

3.1 Schwarz Methods

Schwarz methods are iterative domain decomposition techniques which improve the approximate solution by parallel or sequential subdomain solves, leading to additive or multiplicative methods, respectively. Following [19] we use extended element regions as the subdomains. Figure 1 illustrates how the subdomain \(\varOmega _{s}\) results from the corresponding element domain \(\varOmega ^{mn}\) by attaching a rectangular strip matching the overlap width \(\delta _{\textsc {o}}\). As consequence, \(\varOmega _{s}\) adopts \(n_{\textsc {o}}\) layers of additional nodes from the neighbor elements. Note, however, that we exclude the outer layer of nodes located on \(\partial \varOmega _{s}\). For definiteness we define the overlap width in terms of \(n_{\textsc {o}}\) and the GLL points,

$$\begin{aligned} \delta _{\textsc {o}}= \xi _{n_{\textsc {o}}+ 1} + 1. \end{aligned}$$
(5)
Fig. 1
figure 1

Example of a subdomain used with the Schwarz method. The shaded area represents \(\varOmega _s\) and the dark region in its center the corresponding element. The circles are the GLL nodes for polynomial order \(p=8\). Filled circles indicate the nodes that are solved for and updated; \(\delta _{\textsc {o}}\) is the overlap width

To derive a local correction to some approximate solution \(\tilde{{\varvec{u}}}\) we first convert Eq. (3) into the equivalent residual form

$$\begin{aligned} {{\varvec{A}}} \Delta {{\varvec{u}}} = {{\varvec{f}}} - {{\varvec{A}}} {\tilde{{\varvec{u}}}} = {\tilde{{\varvec{r}}}}, \end{aligned}$$
(6)

where \({\Delta {{\varvec{u}}} = {{\varvec{u}}} - {\tilde{{\varvec{u}}}}}\). Further we introduce the restriction operator \({{\varvec{R}}}_s\) such that \({{\varvec{u}}}_s = {{\varvec{R}}}_s {{\varvec{u}}}\) gives the coefficients associated with \(\varOmega _{s}\). Conversely, the transposed restriction operator globalizes any local coefficients by adding zeros for exterior nodes. With these prerequisites the correction contributed by \(\varOmega _{s}\) is defined as the solution of the subproblem

$$\begin{aligned} {{\varvec{A}}}_{ss} \Delta {{\varvec{u}}}_s = {{\varvec{r}}}_s, \end{aligned}$$
(7)

where \({{\varvec{A}}}_{ss} = {{\varvec{R}}}_s {{\varvec{A}}} {{\varvec{R}}}^t_s\) represents the restricted system matrix and \({{\varvec{r}}}_s = {{\varvec{R}}}_s {\tilde{{\varvec{r}}}}\) the restricted residual. Due to the rectangular shape of the subdomain, \({{\varvec{A}}}_{ss}\) inherits the tensor product structure of \({{\varvec{A}}}\). Using the fast diagonalization technique developed by [20] and adopted for SEM e.g. in [6], the inverse subdomain operator can be expressed in the form

$$\begin{aligned} {{\varvec{A}}}_{ss}^{-1} = ({{\varvec{S}}}_y \otimes {{\varvec{S}}}_x) ({{\varvec{I}}} \otimes \varvec{\varLambda }_x + \varvec{\varLambda }_y \otimes {{\varvec{I}}})^{-1} ({{\varvec{S}}}_y^t \otimes {{\varvec{S}}}_x^t), \end{aligned}$$
(8)

where \({{\varvec{I}}}\) is the unity matrix, \({{\varvec{S}}}_{*}\) the matrix of eigenvectors to the generalized eigenproblem for the restricted one-dimensional stiffness and mass matrices, and \(\varvec{\varLambda }_{*}\) the diagonal matrix of eigenvalues for directions \(*=x,y\). With equidistant meshes, as in the present case, the operators are identical for all subdomains and, hence, the cost for their pre-computation becomes negligible. Exploiting the tensor-product structure of the inverse, the solution to a single subdomain, \(\Delta {{\varvec{u}}}_s = {{\varvec{A}}}_{ss}^{-1} {{\varvec{r}}}_s\), can be evaluated with just \(\varTheta \big (2(p+1+2n_{\textsc {o}})^3\big )\) operations.

There exist several options for combining the local solutions. We consider a weighted version of the additive Schwarz method and the multiplicative Schwarz method. The multiplicative Schwarz method solves the subproblems (7) consecutively while continually updating the residual. Note that, in general, one multiplicative Schwarz iteration corresponds to the application of a non-symmetric linear operator, albeit \({{\varvec{A}}}\) is symmetric. However, for an even number of steps, the method is symmetrized by reversing the order of subdomains in each step, which leads to Algorithm 1.

figure a
figure b

The weighted additive Schwarz method determines all local corrections independently and computes the global correction as a linear combination of these results, i.e.

$$\begin{aligned} \Delta {{\varvec{u}}} \simeq \sum _{s} {{\varvec{R}}}_{s}^t ({{{\varvec{W}}}} \Delta {{\varvec{u}}}_{s}), \end{aligned}$$
(9)

where \({{{\varvec{W}}}}\) is a diagonal local weight matrix. Application of Eq. (9) leads to Algorithm 2. Note that \({{{\varvec{W}}}} = {{\varvec{I}}}\) recovers the classical additive Schwarz method. The arithmetic mean employed in [19] is obtained by choosing \({{{\varvec{W}}}} = {{\varvec{R}}}_s {{\varvec{C}}}^{+} {{\varvec{R}}}^t_s\), where \({{\varvec{C}}}^{+}\) is the pseudoinverse of the counting matrix \({{\varvec{C}}} = \sum _s {{\varvec{R}}}^t_s {{\varvec{R}}}_s\). We propose a more flexible approach which elevates the weights gradually from zero at the border to one in the core zone. Due to the regular shape of \(\varOmega _s\) the weights can be cast in the tensor product form \({{{{\varvec{W}}} = {{\varvec{W}}}_y \otimes {{\varvec{W}}}_x}}\). The one-dimensional weight distributions \({{{\varvec{W}}}_{*}}\) are generated from the continuous weighting function

$$\begin{aligned} {w_{{\kappa }}(\xi ) = \frac{1}{2} \left[ \phi _{{\kappa }}\left( \frac{\xi + 1}{\delta _{\textsc {o}}}\right) - \phi _{{\kappa }}\left( \frac{\xi - 1}{\delta _{\textsc {o}}}\right) \right] ,} \end{aligned}$$
(10)

where \(\xi \) is the 1D standard coordinate extended beyond \(\hat{\varOmega }\) and \(\phi _{{\kappa }}\) is a weakly monotonic increasing shape function. In particular we consider the shape functions \(\phi _{{\kappa }}\) with \({{\kappa }}\in \{1,3,5,\dots \}\) defined as

$$\begin{aligned} \phi _{{\kappa }}(x) = \left\{ \begin{array}{l@{\qquad }l} \hat{\phi }_{{\kappa }}(x) &{} x \in {\hat{\varOmega }} \\ {\text {sgn}}(x) &{} \text {else} \end{array} \right. \end{aligned}$$
(11)

where \(\hat{\phi }_{{\kappa }}\) is a polynomial of degree i satisfying the conditions

$$\begin{aligned}&{\hat{\phi }}_{{\kappa }}(\pm 1) = \pm 1 \end{aligned}$$
(12a)
$$\begin{aligned}&\frac{{\text {d}}^k {\hat{\phi }}_{{\kappa }}}{{\text {d}}x^k}(\pm 1) = 0, \quad 0 < k \le ({{\kappa }}-1)/2. \end{aligned}$$
(12b)

The \({\hat{\phi }}_{{\kappa }}\) are strictly monotonic in \((-1,1)\) and generate a smooth transition of the weight function in the overlap zone, as exemplified in Fig. 2 for the quintic case. By increasing the polynomial degree the shape function converges toward the sign function, which translates into a top hat weighting function. We remark that omitting the shape function in Eq. (10) yields the arithmetic mean. For reference, Table 1 summarizes all weight functions used in the numerical experiments.

Fig. 2
figure 2

One-dimensional weight distribution for elements of order \(p=16\) with an overlap of \(n_{\textsc {o}}=2\) points using a quintic shape function. The core region and the overlap zone of the subdomain are shaded in dark and light blue, respectively. Filled circles indicate the node positions (Color figure online)

Table 1 Weight functions (WF) and related shape functions

3.2 Multigrid

For MG we define a series of polynomial levels \(\{p_l\}\) with \(p_l = 2^l\) increasing from 1 at \(l=0\) to p at top level L. Correspondingly, \({{\varvec{u}}}_l\) denotes the global coefficients and \({\varvec{A}}_l\) the system matrix on level l. On the top level we have \({\varvec{u}}_L = {{\varvec{u}}}\) and \({{\varvec{A}}}_L = {{\varvec{A}}}\), whereas on lower levels \({{\varvec{u}}}_l\) is the defect correction and \({{\varvec{A}}}_l\) the counterpart of \({{\varvec{A}}}\) obtained with elements of order \(p_l\). For transferring the correction from level \(l-1\) to level l we use the embedded interpolation operator \(\varvec{\mathcal {I}}_l\), and for restricting the residual its transpose. These ingredients allow to build a multigrid V-cycle, which is summarized in Algorithm 3. Both, the multiplicative and the weighted additive Schwarz method stated in Algorithm 1 and , respectively, can serve as the Smoother. The number of pre- and post-smoothing steps, \(n_{\textsc {s}1,l}\) and \(n_{\textsc {s}2,l}\), can differ from level to level to allow variable V cycles [4]. Line 8 of Algorithm 3 defines the coarse grid solution by means of the pseudoinverse \({{\varvec{A}}}_0^{+}\). In our implementation the coarse problem is solved using the conjugate gradient method. To achieve convergence in spite of singularity, the right side is projected to the null space of \({\varvec{A}}_0\), as proposed by [14].

figure c

3.3 Preconditioned Conjugate Gradients

For enhancing robustness and efficiency multigrid methods can be accelerated by Krylov subspace methods [27]. In the present case, with symmetric system matrices on all grid levels, one would favor preconditioned conjugate gradients. Unfortunately, weighted additive Schwarz and multiplicative Schwarz with uneven iteration count are both non-symmetric and hence affect the symmetry of MG as well. As a remedy, it is possible to symmetrize the weighting method or to use GMRES instead of CG for acceleration. According to [18], however, symmetrization can deteriorate the efficiency of the method. This detrimental behavior was confirmed in own tests and, hence, the symmetrized method is not considered here.

Recently, generalizations of the conjugate gradient method have been developed that allow for relaxing some restrictions of standard CG and, thus, promise a cheaper alternative to GMRES. The use of inaccurately solved and non-symmetric preconditioners in CG-like methods has been justified, e.g., in [2, 9, 23]. Moreover, [3] demonstrated the suitability of the so-called flexible PCG in conjunction with non-symmetric multigrid preconditioners. Following this approach, we use the MGCG method summarized in Algorithm 4. This method is equivalent to the flexible PCG of [23], but can be regarded also as a variant of the inexact PCG proposed by [9]. The main difference to standard PCG consists in the application of the Polak-Ribìere formula for \(\beta \), instead of the Fletcher-Reeves formula, on line 13 of the algorithm. We also note that, as before with the coarse problem, the right side \({{\varvec{f}}}\) must be in the null space of \({{\varvec{A}}}\) if the system is singular.

figure d

4 Results

Numerical tests were performed to assess the influence of weighting, overlap and cycling strategy on the computational efficiency and robustness of MG and MGCG. The methods were implemented in Fortran and compiled using the GNU compiler collection 6.0 with –O3.

All results are based on test cases with the source f evaluated analytically from the exact solution and starting from a random initial guess confined to the interval [0,1]. The convergence speed is evaluated using the number of cycles \(n_{10}\) needed to reduce the norm of the residual by a factor of \(10^{10}\) and the average logarithmic convergence rate according to [28]

$$\begin{aligned} \bar{r} = \frac{1}{n} \log _{10} \frac{||{{\varvec{r}}}^{(0)} ||}{||{{\varvec{r}}}^{(n)} ||}, \end{aligned}$$
(13)

where \({{\varvec{r}}}^{(n)}\) is the Euclidean norm of the residual vector after the nth cycle. Note that \(n_{10}\) is nearest integer greater than or equal to \(10/{\bar{r}}\).

As an efficiency measure we define the approximate number of operator applications required for reducing the residual by a factor of \(10^k\),

$$\begin{aligned} {\bar{\omega }}_k = \frac{k}{{\bar{r}}} \frac{W_{\text {cyc}}}{W_{\text {op}}}, \end{aligned}$$
(14)

where \(W_{\text {cyc}}\) is the cost for one cycle and \(W_{\text {op}}\) the cost for one application of the system matrix \({{\varvec{A}}}\). Exploiting sum factorization [7, 16], \(W_{\text {op}}\) can be estimated as \(2 n_{\textsc {p}}^3 n_{\textsc {e}}\), where \(n_{\textsc {p}}=p+1\) and \(n_{\textsc {e}}\) is the number of elements. According to Sect. 3.1, the cost of one Schwarz iteration is approximately \(2(n_{\textsc {p}}+ 2n_{\textsc {o}})^3n_{\textsc {e}}\). Assuming a maximum relative overlap of \(n_{\textsc {o}}/n_{\textsc {p}}\) this yields the estimate

$$\begin{aligned} W_{\text {cyc}} = \left[ 4 \Big (1 + 2\frac{n_{\textsc {o}}}{n_{\textsc {p}}}\Big )^3 c_{\textsc {s}}n_{\textsc {s}}+ 2 c_{\textsc {s}}+ c_{\textsc {cg}} \right] \, n_{\textsc {p}}^3 n_{\textsc {e}}, \end{aligned}$$
(15)

where \(n_{\textsc {s}}\) is the number of pre- and post-smoothing steps on the finest level, \(c_{\textsc {s}}= 4/3\) for the classical V-cycle and \(c_{\textsc {s}}= 2\) for a variable V-cycle doubling the number of smoothing steps when changing to the next lower level, and \(c_{\textsc {cg}} = 2\) is the extra cost for conjugate gradients with MGCG. Since the bracketed term is constant, the overall cost of one multigrid cycle scales approximately with pN, where \({N = p^2 n_{\textsc {e}}}\) denotes the number of unknowns. Among the multigrid components, the smoother is by far the most expensive part, accounting for about 80–90% of the total cost in typical applications.

4.1 Weighting and Overlap

We consider the Poisson problem (1) in the domain \({\varOmega =[0,2]^2}\), which is uniformly subdivided in \(8\times 8\) square elements with order p ranging from 4 to 32. The right hand side is chosen to match the exact solution \({u=\sin (\pi x)\sin (\pi y)}\). In the first test series we set the overlap to \(n_{\textsc {o}}=1\) on all levels \(l>0\). Table 2 shows the measured convergence rates for MG with one pre-smoothing. Column “\(w_a\)” corresponds to the weighted additive Schwarz method using the arithmetic mean in overlap areas. Compared to [19] our results agree well for \(p=4\), but show a faster convergence with higher polynomial orders. This could be attributed to using periodic instead of Dirichlet boundary conditions.

The remaining columns in Table 2 display the convergence rates for additive Schwarz smoothing with the gradual weighting introduced in Sect. 3.1 and multiplicative Schwarz. In comparison to the arithmetic mean, weighting using a smooth—cubic, quintic or 7th order—shape function roughly doubled the convergence rate for orders 4, 8 and 16, while linear and top hat weighting yielded a lower, but still remarkable improvement. As expected, the multiplicative Schwarz smoother attained the fastest convergence. At \(p=32\) all methods suffer a serious performance degradation, except for arithmetically weighted Schwarz, which nonetheless remains the slowest.

Table 2 Convergence rates \({\bar{r}}\) for MG with additive Schwarz smoothing using one pre- and no post-smoothing steps, \(n_{\textsc {e}}=8\times 8\) elements and a fixed overlap of \(n_{\textsc {o}}=1\)

Inspired by these observations, several tests were run with overlaps depending on the polynomial degree on each mesh level. Table 3 shows the convergence rates for the case \(n_{\textsc {o},l}=\lfloor p_l/8 \rfloor \). Note that this choice implies \(n_{\textsc {o}}=0\) for degrees less than 8, while reaching \(n_{\textsc {o}}=4\) with \(p=32\). As a consequence, the convergence rates for \(p=4\) are slightly lower than with \(n_{\textsc {o}}=1\), except for multiplicative Schwarz. For \(p \ge 16\) the increased overlap yields a considerable speedup. This improvement is most pronounced for cubic and quintic weighting, which come remarkably close to multiplicative Schwarz.

As a résumé of the first study we conclude that 1) gradual weighting with a smooth shape function yields a decisive improvement over arithmetic weighting, and 2) increasing the overlap with growing p is crucial for robustness.

Table 3 Convergence rates for a level-dependent overlap of \(n_{\textsc {o},l}=\lfloor p_l/8 \rfloor \). For caption see Table 2

4.2 Robustness and Efficiency

Next we investigate robustness with respect to the mesh size and aspect ratio. First, MG with one pre-smoothing is applied on uniform meshes consisting of \(4^2\) to \(1024^2\) elements with p ranging from 4 to 32 and up to four million unknowns. Table 4 compiles the results for quintically weighted and multiplicative Schwarz smoothers with overlap \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\). Except in coarse quadrangulations, where periodicity can induce interference effects, the convergence characteristics are virtually independent of the number of elements \(n_{\textsc {e}}\). The convergence rate \({\bar{r}}\) shows a moderate growth for increasing order p and is similar for both smoothers, with a slight advantage for the weighted additive Schwarz method. As a consequence, the equivalent number of operator applications required for reducing the residual by an order of magnitude drops almost to one third when increasing p from 4 to 32 and, thus, mitigates the higher operator cost per DOF.

In the second test we fixed the mesh to \(16\times 16\) elements of order \(p=16\), but increased the aspect ratio \(AR=\Delta x / \Delta y\) by enlarging the domain into the x direction, i.e., \({\varOmega =[0,2AR]\times [0,2]}\). Table 5 reports the results for MG and MGCG using additive weighted Schwarz with \(w_5\), \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\) and one pre-smoothing step. As expected, the stand-alone MG performs well for small aspect ratios, but degrades for \(AR>2\). MGCG is slightly less efficient than MG for \(AR\le 2\), but proves more robust at higher aspect ratios. At \(AR=8\) it converges approximately twice as fast as MG.

Table 4 Robustness with respect to problem size: MG using additive and multiplicative Schwarz smoothers with overlap \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\)
Table 5 Robustness with respect to aspect ratio: MG versus MGCG using additive Schwarz with \(w_5\) and overlap \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\)

While these observations hold almost uniformly for all orders p considered, it remains to investigate the impact of solver parameters such as smoothing steps and overlap. Figure 3 presents selected results of the corresponding study for \(p=16\) and aspect ratios \(AR=1\) to 16. In particular we considered several variants of MGCG(1,1), each applying one pre- and one post-smoothing. In one case, indicated by “var”, we employed a variable V-cycle in which the number of smoothing steps doubles with each coarser level, i.e. \(n_{\textsc {s}1,l}=n_{\textsc {s}2,l}=2^{L-l}\). The study included quintically weighted additive (“add, \(w_5\)”) as well as multiplicative (“mult”) Schwarz smoothers with a level-dependent overlap of \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\). Additionally we tested multiplicative Schwarz with \(n_{\textsc {o}}=0\) and \(n_{\textsc {s}1}=n_{\textsc {s}2}=2\), which corresponds to the method of [13], and the arithmetically averaged additive Schwarz smoother using a constant overlap of \(n_{\textsc {o}}=1\). Figure 3a depicts the achieved convergence rates. Compared to the case of only one smoothing, the additional post-smoothing raises \({\bar{r}}\) by a factor between 1.5 and 2, which is well in the expected range. Switching to multiplicative Schwarz yields an even higher gain for increasing aspect ratios. A similar effect is achieved using additive Schwarz with the variable V-cycle. MGCG(2,2) with zero overlap attains a convergence rate similar to MGCG(1,0) with level-dependent overlap. The arithmetically averaged Schwarz method with two smoothing steps falls about two thirds behind the quintically weighted method with only one smoothing for \(AR=1\), but gains a slight advantage over the latter for higher aspect ratios.

As the convergence rate does not account for the cost, it is of limited value when comparing methods of different computational complexity. A better measure is the equivalent number of operator applications required for reducing the residual by one order of magnitude, \({\bar{\omega }}_1\), which is shown in Fig. 3b. In this metric, the multiplicative MGCG(1,1) with level-dependent overlap performs best, especially for higher aspect ratios. It is followed by its additive counterpart with quintic weighting, which is at level for \(AR\le 2\), but needs ca 34 instead of 26 operator applications for \(AR=16\). The comparison also reveals that the benefit of the variable V-cycle is lost due to the higher computational complexity. Generally, the influence of smoothing and overlap parameters lessens with increasing aspect ratio (exempting the case of \(n_{\textsc {o}}=0\)), which indicates that the role of the conjugate gradient method gets more important.

Figure 3c depicts the runtimes measured on a 3.1 GHz Intel Core i7-5557U CPU. Note that MGCG(1,1) with quintic weighting attained the best performance despite its higher operation count in comparison to MGCG(1,1) with multiplicative Schwarz. This is because the additive Schwarz method evaluates the residual for all elements at once, yielding a single, highly efficient BLAS 3 operation. In contrast, multiplicative Schwarz requires a series of local residual updates, which is harder to optimize. Consistently, the multiplicative MGCG(2,2) with \({n_{\textsc {o}}=0}\) remains the least efficient method for all aspect ratios. Compared to MGCG(1,1) with \({n_{\textsc {o}}=1}\) and arithmetic weighting, the method with \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\) and quintic weighting succeeds twice as fast for \(AR=1\) and still gains 23% at \(AR=16\). Though other choices may yield even better performance, the study documents that the method is not too sensitive to parameter variations, such that only minor improvements can be expected.

Fig. 3
figure 3

Performance of MGCG for different aspect ratios. All cases use \(16\times 16\) elements of order \(p=16\) and \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\), if not specified otherwise. a Average logarithmic convergence rate. b Operator applications required for \(10^{1}\) residual reduction. c Solver runtime for \(10^{10}\) residual reduction

4.3 Variable Diffusion

Since many problems in physics involve variable coefficients, it is interesting to explore if the multigrid method is capable to retain its efficiency in such applications. As an example we consider the diffusion equation

$$\begin{aligned} -\nabla \cdot (\nu \nabla u) = f \end{aligned}$$
(16)

with variable diffusivity \(\nu \) in the periodic domain \({\varOmega =[0,AR]\times [0,1]}\). From a physical perspective it seems reasonable that the solution and the diffusivity vary on similar scales. Following this idea we set \({u = \sin (2\pi x) \sin (2\pi y)}\) and

$$\begin{aligned} \nu = 1 + {\hat{\nu }} \sin \big (2\pi (x - s)\big ) \sin \big (2\pi (y - s)\big ), \end{aligned}$$
(17)

where \({\hat{\nu }}\) is the amplitude and s the shift of the diffusivity fluctuation. According to preliminary studies, a non-zero shift poses an additional difficulty to the solver. Taking this into account, \({s = 0.2}\) is chosen in all tests reported below. The source is analytically computed from \({f = -(\nu \nabla ^2 u + \nabla \nu \cdot \nabla u)}\).

Discretization using rectangular spectral elements yields the linear system

$$\begin{aligned} {{\varvec{B}}} {{\varvec{u}}} = {{\varvec{f}}}, \end{aligned}$$
(18)

where \({{\varvec{B}}}(\nu )\) represents the discrete diffusion operator or, equivalently,

$$\begin{aligned} {{\varvec{B}}} \Delta {{\varvec{u}}} = {{\varvec{f}}} - {{\varvec{B}}} {\tilde{{\varvec{u}}}} = {\tilde{{\varvec{r}}}}, \end{aligned}$$
(19)

for the correction \(\Delta {{\varvec{u}}}\) to a given approximation \({\tilde{{\varvec{u}}}}\). Application of the Schwarz method described in Sect. 3.1 leads to the local correction equation

$$\begin{aligned} {{\varvec{B}}}_{ss} \Delta {{\varvec{u}}}_s = {{\varvec{r}}}_s, \end{aligned}$$
(20)

where \({{\varvec{B}}}_{ss}\) and \({{\varvec{r}}}_s\) are the diffusion operator and, respectively, the residual restricted to the subdomain \(\varOmega _s\). In comparison to the subdomain problem for Poisson case (7), Eq. (20) is more expensive to solve, because the fast diagonalization technique is no longer applicable. Yet, the smoothing property is more important for multigrid than accurate solution of the subproblems. This motivates the reintroduction of the discrete Laplacian by approximating the restricted diffusion operator on the left side of (20) by \({{{\varvec{B}}}_{ss} \approx {\bar{\nu }}_{s}} {{\varvec{A}}}_{ss}\), where the diffusivity \({\bar{\nu }}_{s}\) is assumed to be constant in \(\varOmega _s\). For simplicity, \({\bar{\nu }}_{s}\) is set to the average of \(\nu \) over the embedded element. The correction can then be approximated as

$$\begin{aligned} \Delta {{\varvec{u}}}_s \approx \frac{1}{{\bar{\nu }}_{s}}{{\varvec{A}}}_{ss}^{-1} {{\varvec{r}}}_s, \end{aligned}$$
(21)

where, again, \({{\varvec{A}}}_{ss}^{-1}\) stands for the application of the factored inverse obtained from fast diagonalization. As a result, the solution techniques developed in Sect. 3 can be utilized with no change except for the residual evaluation.

The performance of the scheme was studied in two test series. In the first series, the aspect ratio was fixed to \({AR=1}\) and the domain \({\varOmega = [0,1]^2}\) decomposed into \(8^2\) square elements of order \({p = 16}\). The diffusivity fluctuation amplitude \({\hat{\nu }}\) was gradually increased from 0 to 0.9, where the latter corresponds to variations of the magnitude up to 90%. Figure 4 shows the measured convergence rates for MG und MGCG using one pre- and post-smoothing based on additive Schwarz with a level-dependent overlap of \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\) and quintic weighting. The results indicate that MG retains its efficiency up to fluctuation amplitudes of about 30%, but then degrades with rising \({\hat{\nu }}\). As expected, Krylov acceleration improves the robustness, such that MGCG achieves \({{\bar{r}} = 0.91}\) for \({{\hat{\nu }} = 0.9}\), which is nearly twice the convergence rate obtained with MG. Compared to \({{\hat{\nu }} = 0}\), this corresponds to an increase of the cycle count and, hence, in runtime, by a factor of just 2.2. A similar behavior was observed for polynomial orders \({p=4}\), 8 and 32.

In the second test series, we increased the domain extension in the x-direction, while keeping the diffusivity fluctuation amplitude \({\hat{\nu }}\) at a constant level. The number of elements is fixed and identical in both directions, such that the element aspect ratio equals AR. For achieving a robustness similar to the Poisson case it proved necessary to increase the subdomain overlap with growing \({\hat{\nu }}\). Figure 5 shows the convergence rates \({\bar{r}}\) for MGCG using a variable V-cycle and additive Schwarz smoothing for an amplitude of 90%, which represents the most challenging test in the series. Comparing the results for \({p=8}\) and \({p=16}\) one observes that \({\bar{r}}\) strongly depends on the quadrangulation, but only marginally on the polynomial order. Using a finer mesh yields considerably higher convergence rates and better robustness. Orders 4 and 32 fit nicely into this picture, but are not shown for clarity. The congruence of different orders using the same mesh suggests, that the performance depends on how well the diffusivity fluctuation is resolved by the element mean values adopted for \({\bar{\nu }}_s\). This presents a possible limitation of the approach, which needs further consideration in subsequent work. Nevertheless, the study demonstrates the suitability of the proposed method for problems involving variable diffusivity, as long as the latter is sufficiently resolved.

Fig. 4
figure 4

MG and MGCG convergence rates for different diffusivity fluctuation amplitudes. Discretization is based on an isotropic mesh comprising \(8^2\) elements of order \({p=16}\). One pre- and post-smoothing with an overlap of \({n_{\textsc {o},l}=\lceil p_l/8 \rceil }\) and quintic weighting were applied in both cases

Fig. 5
figure 5

MGCG convergence rate for variable diffusivity on anisotropic meshes with increasing aspect ratio using a variable V-cycle with one pre- and post-smoothing, overlap \({n_{\textsc {o},l}=\lceil p_l/2 \rceil }\) and quintic weighting

5 Conclusions

We have developed a nonuniformly weighted additive Schwarz method acting as the smoother in multigrid solvers for the spectral element discretization of the Poisson equation in \({\mathbb {R}}^2\). The method generalizes the Schwarz/multigrid method proposed in [19] and was inspired from weighting techniques devised in [11]. In each step, it determines the solution for a subdomain corresponding to an extended element region. These local solutions are blended according to a polynomial shape function which features a smooth transition from zero at the border toward one in the core of the subdomain. As an alternative we considered a multiplicative Schwarz method with no weighting required. Both Schwarz methods were integrated in a polynomial multigrid method which, in turn, was embedded in a preconditioned CG method.

The performance of these methods was assessed in a series of numerical experiments with ansatz orders p ranging from 4 to 32 and up to \(n_{\textsc {e}}=256^2\) elements of aspect ratios AR from 1 to 16. For unit-aspect ratio elements the proposed weighting improved the logarithmic MG convergence rate and reduced the cost by a factor of 1.5–3 in comparison to the original method. The study indicates that for robustness the subdomain overlap has to be bounded, i.e., the number \(n_{\textsc {o}}\) of node layers adopted from neighbor elements must grow with increasing order. Thus, with MG, the number of layers varies from level to level. A reasonable choice is to use an overlap of \({\lceil p_l/8\rceil }\) layers, where \(p_l\) denotes the polynomial order on level l. The resulting multigrid method is robust with respect to the mesh size, i.e. p and \(n_{\textsc {e}}\), but degrades with increasing aspect ratio. This behavior can be mitigated by Krylov subspace acceleration: Using MG as a preconditioner for the inexact conjugate gradient method [9] improves the convergence rate for higher aspect ratios considerably.

Finally, it has been shown that the proposed multigrid method is easily adapted and well suited for solving diffusion problems with varying coefficients, provided the mesh is fine enough to approximate diffusivity fluctuations by element mean values. Improving the treatment of variable coefficients and extending the approach to three space dimensions are topics of ongoing and future work.