
1 Introduction

The efficient computation of wave propagation phenomena in three-dimensional heterogeneous media is of significant research interest in many environmental inverse problems [50, 58]. The core of these large-scale nonlinear optimization problems usually consists of the approximate solution of a linear system arising from the discretization of a Helmholtz scalar wave equation, typically written in the frequency domain. Hence, as discussed in this book, the design of efficient direct or iterative solvers for the resulting large-scale linear systems is of paramount importance. In particular, efficient domain decomposition methods [17, 18, 25, 35, 47, 48] or multigrid methods [6, 7, 15, 16, 19–23, 26, 33, 37, 38, 42, 54–56] have been proposed in the past few years in this context; we also refer the reader to, e.g., [24, 52, Sect. 11.5] and references therein for comprehensive surveys.

In this chapter, we focus on the parallel performance of a geometric multigrid preconditioner for the solution of wave propagation problems related to acoustic imaging in seismic exploration. For this purpose, we consider the geometric two-grid preconditioner proposed in [11] for the numerical solution of Helmholtz problems in three-dimensional heterogeneous media. This two-grid cycle is applied directly to the original Helmholtz operator and relies on an approximate coarse grid solution. A second multigrid method, applied to a complex shifted Laplace operator, is then used as a preconditioner when solving the coarse grid system to obtain an approximate coarse solution. In this chapter, we consider this preconditioner in combination with high order dispersion minimizing finite difference schemes to tackle propagation problems at relatively high frequencies. In particular, as main contributions, we investigate the strong scaling properties of the numerical method in a massively parallel setting and provide a complexity analysis related to a realistic test case in geophysics.

The chapter is organized as follows. We introduce both the continuous and discrete Helmholtz problems in Sect. 2. In Sect. 3, we describe the geometric multigrid preconditioner that is considered throughout the chapter. Numerical experiments performed in a massively parallel environment are reported in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Problem Setting

We specify the continuous and discrete versions of the heterogeneous Helmholtz problem that we consider throughout this chapter.

2.1 Mathematical Formulation at the Continuous Level

Given a three-dimensional physical domain Ωp of parallelepiped shape, the propagation of a wavefield in a heterogeneous medium can be modeled by the Helmholtz equation written in the frequency domain [50]

$$\displaystyle\begin{array}{rcl} -\sum _{i=1}^{3} \frac{\partial ^{2}u} {\partial x_{i}^{2}} -\frac{(2\pi f)^{2}} {c^{2}} u& =& \delta (\mathbf{x} -\mathbf{s}),\quad \mathbf{x} = (x_{1},x_{2},x_{3}) \in \varOmega _{p}.{}\end{array}$$
(1)

In Eq. (1), the unknown u represents the pressure wavefield in the frequency domain, c the acoustic-wave velocity in m s−1, which varies with position, and f the frequency in Hz. The source term δ(x − s) represents a harmonic point source located at s = (s1, s2, s3) ∈ Ωp. The wavelength λ is defined as λ = c/f and the wavenumber as 2πf/c. The Perfectly Matched Layer (PML) formulation [4, 5] is a popular approach used to obtain a satisfactory solution near the boundary, without significant artificial reflections. Artificial boundary layers are added around the physical domain to absorb outgoing waves at any incidence angle, as shown in [4]. We denote by ΩPML the surrounding domain created by these artificial layers. This formulation leads to the following set of coupled partial differential equations with homogeneous Dirichlet boundary conditions imposed on Γ, the boundary of the domain

$$\displaystyle\begin{array}{rcl} -\sum _{i=1}^{3} \frac{\partial ^{2}u} {\partial x_{i}^{2}} -\frac{(2\pi f)^{2}} {c^{2}} u& =& \delta (\mathbf{x} -\mathbf{s})\quad \mbox{ in}\quad \varOmega _{p},{}\end{array}$$
(2)
$$\displaystyle\begin{array}{rcl} -\sum _{i=1}^{3} \frac{1} {\xi _{x_{i}}(x_{i})} \frac{\partial } {\partial x_{i}}( \frac{1} {\xi _{x_{i}}(x_{i})} \frac{\partial u} {\partial x_{i}}) -\frac{(2\pi f)^{2}} {c^{2}} u& =& 0\quad \mbox{ in}\quad \varOmega _{PML}\setminus \varGamma,{}\end{array}$$
(3)
$$\displaystyle\begin{array}{rcl} u& =& 0\quad \mbox{ on}\quad \varGamma,{}\end{array}$$
(4)

where the one-dimensional function \(\xi _{x_{i}}\) represents the complex-valued damping function of the PML formulation in the i-th direction, selected as in [34]. The set of equations ((2)–(4)) defines the forward problem related to acoustic imaging in geophysics that is considered in this chapter. We note that the proposed numerical method can also be applied to other application fields where wave propagation phenomena arise.
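As an illustration, the following minimal sketch shows the typical shape of such a one-dimensional damping function. The quadratic ramp and the value of sigma_max are assumptions made for this example only; the profile actually used in the chapter follows [34] and may differ.

```python
import numpy as np

def pml_damping(x, x0, x1, f, sigma_max=50.0):
    """Illustrative 1D PML stretching function xi(x).

    Assumes xi(x) = 1 + i*sigma(x)/(2*pi*f) with a quadratic ramp of sigma
    inside the absorbing layer [x0, x1] and xi = 1 elsewhere; this is an
    assumed profile, not necessarily the one of [34].
    """
    x = np.asarray(x, dtype=float)
    ramp = np.zeros_like(x)
    inside = (x >= x0) & (x <= x1)
    ramp[inside] = ((x[inside] - x0) / (x1 - x0)) ** 2
    return 1.0 + 1j * sigma_max * ramp / (2.0 * np.pi * f)
```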

We also introduce the complex shifted Laplace operator defined as

$$\displaystyle\begin{array}{rcl} -\sum _{i=1}^{3} \frac{\partial ^{2}u} {\partial x_{i}^{2}} - (1 + i\beta )\frac{(2\pi f)^{2}} {c^{2}} u& =& \delta (\mathbf{x} -\mathbf{s})\quad \mbox{ in}\quad \varOmega _{p},{}\end{array}$$
(5)
$$\displaystyle\begin{array}{rcl} -\sum _{i=1}^{3} \frac{1} {\xi _{x_{i}}(x_{i})} \frac{\partial } {\partial x_{i}}( \frac{1} {\xi _{x_{i}}(x_{i})} \frac{\partial u} {\partial x_{i}}) - (1 + i\beta )\frac{(2\pi f)^{2}} {c^{2}} u& =& 0\quad \mbox{ in}\quad \varOmega _{PML}\setminus \varGamma,{}\end{array}$$
(6)
$$\displaystyle\begin{array}{rcl} u& =& 0\quad \mbox{ on}\quad \varGamma,{}\end{array}$$
(7)

where the parameter \(1 + i\beta \in \mathbb{C}\) is called the complex shift. This operator will play a significant role later in the definition of our multigrid preconditioner in Sect. 3.

2.2 Mathematical Formulation at the Discrete Level

2.2.1 Dispersion Minimizing Finite Difference Scheme

As frequently done in the geophysics community, we have considered a finite difference discretization of the Helmholtz problem ((2)–(4)) on a uniform equidistant Cartesian grid of size nx × ny × nz. We denote by h the corresponding mesh size, by Ωh the discrete computational domain and by nPML the number of points in each PML layer.

Since the standard second-order finite difference scheme is often found to be too dispersive [3, 13, 29, 47], we have considered dispersion minimizing finite difference schemes. These schemes are especially recommended when targeting the solution of heterogeneous Helmholtz problems at high frequency, since they provide a pollution-free solution [12, 34, 47, 51]. In the context of multilevel algorithms, these schemes are also relevant for the discretization of the coarse grid operator, in order to provide the same dispersion level on both the coarse and fine scales [47]. This feature has also been found beneficial by several authors; see, e.g., [12, 47, 54]. Hereafter, we have considered the compact finite difference scheme proposed by Harari and Turkel [29], based on Padé approximations, which leads to a finite difference discretization with a 27-point stencil. This scheme is formally third-order accurate on general Cartesian grids and fourth-order accurate on uniform grids. Following [3], given reference values for both the frequency fref and the step size href, and denoting by q the discretization order of the finite difference scheme, we have used the following relation to determine the step size h at a given frequency f,

$$\displaystyle\begin{array}{rcl} h^{q}\ f^{q+1} = h_{ ref}^{q}\ f_{ ref}^{q+1}.& &{}\end{array}$$
(8)
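The following small helper implements relation (8). The default values (fref = 10 Hz, href = 15, q = 4) mirror those of the strong scalability study in Sect. 4.2; the units of h follow those of href.

```python
def step_size(f, f_ref=10.0, h_ref=15.0, q=4):
    """Mesh size h at frequency f from relation (8):
    h**q * f**(q+1) = h_ref**q * f_ref**(q+1).
    The returned h is expressed in the units of h_ref.
    """
    return h_ref * (f_ref / f) ** ((q + 1) / q)

# Example: at f = 20 Hz with the default reference values, h is roughly 0.42 * h_ref.
```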

2.2.2 Properties of the Discrete Linear System

The discretization of the forward problem ((2)–(4)) with the dispersion minimizing finite difference scheme leads to a linear system \(A_{h}x_{h} = b_{h}\), where \(A_{h} \in \mathbb{C}^{n_{h}\times n_{h}}\) is a sparse complex matrix, non-Hermitian and non-symmetric due to the PML formulation [5, 36, 45], and where \(x_{h},b_{h} \in \mathbb{C}^{n_{h}}\) represent the discrete frequency-domain pressure field and source, respectively. In addition, the right-hand side is usually very sparse. Relation (8) requires solving very large systems of equations at the (usually high) frequencies of interest to geophysicists, a task that may be too memory-demanding for standard [45, 46] or advanced sparse direct methods exploiting hierarchically semi-separable structures [59, 60] on a reasonable number of cores of a parallel computer. Consequently, preconditioned Krylov subspace methods are most often considered, and efficient preconditioners must be developed for such indefinite problems. We next describe in detail a multigrid preconditioner that has been proposed in [11] for the solution of the forward problem related to acoustic imaging.
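To make the structure of these systems concrete, the following sketch assembles a one-dimensional second-order finite difference analogue of the (possibly shifted) Helmholtz operator with homogeneous Dirichlet conditions and without PML. It is only a structural illustration: the chapter's operators are three-dimensional, use the 27-point compact scheme and include the PML terms.

```python
import numpy as np
import scipy.sparse as sp

def helmholtz_1d(c, h, f, beta=0.0):
    """1D second-order finite difference (shifted) Helmholtz operator.

    c    : velocity sampled at the n interior grid points (array),
    h    : mesh size, f : frequency in Hz,
    beta : complex shift (beta = 0 gives A_h, beta > 0 gives S_h(beta)).
    Homogeneous Dirichlet conditions, no PML -- a structural sketch only.
    """
    c = np.asarray(c, dtype=float)
    k2 = (2.0 * np.pi * f / c) ** 2                 # squared wavenumber per node
    main = 2.0 / h**2 - (1.0 + 1j * beta) * k2      # -u'' term plus (shifted) mass term
    off = -np.ones(c.size - 1) / h**2
    return sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")
```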

3 A Geometric Multigrid Preconditioner

We describe the geometric two-grid preconditioner proposed in [11] and detail its salient properties. We first introduce notation related to multigrid methods to ease the description of the multilevel algorithm. We conclude this section by briefly commenting on the related parallel implementation.

3.1 Notation

The fine and coarse levels, denoted by h and H, are associated with discrete grids Ωh and ΩH, respectively. Since structured grids are routinely used in the targeted application in seismic exploration, a geometric construction of the coarse grid ΩH is adopted: the discrete coarse grid domain ΩH is deduced from the discrete fine grid domain Ωh by doubling the mesh size in each direction, as classically done in vertex-centered geometric multigrid [49]. In the following, we assume that AH represents a suitable approximation of the fine grid operator Ah on ΩH. We also introduce a restriction operator \(I_{h}^{H}: \mathcal{G}(\varOmega _{h}) \rightarrow \mathcal{G}(\varOmega _{H})\), where \(\mathcal{G}(\varOmega _{k})\) denotes the set of grid functions defined on Ωk. Similarly, \(I_{H}^{h}: \mathcal{G}(\varOmega _{H}) \rightarrow \mathcal{G}(\varOmega _{h})\) represents a given prolongation operator. More precisely, we select trilinear interpolation as the prolongation operator and its adjoint, often called the full weighting operator [28, 49], as the restriction. We refer the reader to [53, Sect. 2.9] for a complete description of these operators in three dimensions.
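For illustration, one-dimensional versions of these transfer operators are sketched below (a vertex-centred grid with homogeneous Dirichlet boundaries and 2·ncoarse + 1 interior fine points is assumed); the three-dimensional operators used in the chapter are the corresponding tensor products in the three directions.

```python
import numpy as np

def restrict_full_weighting_1d(u_fine):
    """1D full weighting (stencil 1/4 [1 2 1]) onto the coarse points,
    assuming n_fine = 2*n_coarse + 1 interior fine points."""
    return 0.25 * u_fine[:-2:2] + 0.5 * u_fine[1:-1:2] + 0.25 * u_fine[2::2]

def prolong_linear_1d(u_coarse):
    """1D linear interpolation back to the fine grid (adjoint of the
    full weighting operator up to a scaling factor)."""
    u_fine = np.zeros(2 * u_coarse.size + 1, dtype=u_coarse.dtype)
    u_fine[1::2] = u_coarse                                  # coincident points
    u_fine[2:-1:2] = 0.5 * (u_coarse[:-1] + u_coarse[1:])    # interior midpoints
    u_fine[0] = 0.5 * u_coarse[0]                            # neighbours of the
    u_fine[-1] = 0.5 * u_coarse[-1]                          # zero boundary values
    return u_fine
```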

3.2 Algorithm Overview

A two-grid preconditioner for the numerical solution of Helmholtz problems in three-dimensional heterogeneous media has been proposed in [11] in combination with second-order finite difference discretization schemes. This two-grid cycle is applied directly to the original Helmholtz operator and relies on an approximate coarse grid solution. As shown in [36], the main difficulty is to find efficient approximate solution methods for the coarse level system \(A_{H}z_{H} = v_{H}\). In this chapter, as in [11], we consider a preconditioning operator (the complex shifted Laplace operator) based on a different partial differential equation for which an efficient multilevel solution is possible. A second multigrid method applied to this complex shifted Laplace operator is then used as a preconditioner for the approximate solution of the coarse problem.

This combination of two cycles defined on two different hierarchies is detailed next. First, a two-grid cycle using Ωh and ΩH only (as fine and coarse levels, respectively) is applied to the original Helmholtz operator ((2)–(4)). A second sequence of grids Ωk (k = 1, ⋯ , l) is then introduced, with the finest grid defined as Ωl := ΩH. On this second hierarchy, a multigrid cycle applied to a complex shifted Laplace operator SH(β) := Sl(β) is used as a preconditioner when approximately solving the coarse level system \(A_{H}z_{H} = v_{H}\) of the two-grid cycle. We note that the complex shifted Laplace operator SH(β) is simply obtained by direct coarse grid discretization of Eqs. (5)–(7) on ΩH.

The resulting cycle is sketched in Algorithm 1. In the notation \(\mathcal{T}_{l,C}\) of Algorithm 1, the subscripts refer to the cycle applied to the complex shifted Laplace operator: l denotes the number of grids in the second hierarchy and C the cycling strategy, which can be of V, F or W type.

Algorithm 1: Cycle \(z_{h} = \mathcal{T}_{l,C}(v_{h})\) applied to \(A_{h}z_{h} = v_{h}\)
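A structural sketch of one application of this cycle is given below. The callables smooth, restrict, prolong and coarse_solve are placeholders of our own naming for the components described above (GMRES(2)/damped Jacobi smoothing, full weighting, trilinear interpolation, and FGMRES on AH preconditioned by the shifted-Laplace multigrid cycle, respectively); the exact ordering of the smoothing steps follows Algorithm 1 of [11], whereas this sketch simply uses one pre- and one post-smoothing step.

```python
import numpy as np

def two_grid_cycle(A_h, v_h, smooth, restrict, prolong, coarse_solve):
    """Structural sketch of z_h = T_{l,C}(v_h); all components are
    supplied as callables and stand in for those of Algorithm 1."""
    z_h = smooth(A_h, v_h, np.zeros_like(v_h))      # pre-smoothing on Omega_h
    r_H = restrict(v_h - A_h @ z_h)                 # restrict the fine residual
    z_h = z_h + prolong(coarse_solve(r_H))          # approximate coarse correction
    return smooth(A_h, v_h, z_h)                    # post-smoothing on Omega_h
```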

As an illustration, Fig. 1 depicts the simplest configuration \((\mathcal{T}_{2,V })\) based on a two-grid cycle applied to the complex shifted Laplace operator. This cycle will be considered later in Sect. 4.

Fig. 1: Multigrid cycle applied to \(A_{h}z_{h} = v_{h}\) as sketched in Algorithm 1 (case of \(\mathcal{T}_{2,V }\)). The two-grid cycle is applied to the Helmholtz operator (left part), whereas the second two-grid cycle, used as a preconditioner when solving the coarse grid problem \(A_{H}z_{H} = v_{H}\), is shown on the right part. This second multigrid cycle acts on the complex shifted Laplace operator SH(β) with β as a shift parameter

As explained in [11], this cycle leads to a variable nonlinear preconditioner which must be combined with an outer flexible Krylov subspace method [43, 44] and [57, Chap. 10]. In [11], the efficiency of the preconditioner in combination with FGMRES(5) [39] has been highlighted on both academic and realistic three-dimensional test problems. We investigate in the next section its performance when used in combination with a dispersion minimizing discretization scheme.

3.3 Parallel Implementation

The parallel implementation of the cycle proposed in Algorithm 1 is based on standard MPI (Message Passing Interface) [27]. We refer the reader to [53, Chap. 6] for details on the parallelization of geometric multigrid methods based on domain partitioning. In particular, the operations related to matrix-vector products, restriction and interpolation require local communications between neighbouring processes. As in [16, 56], the polynomial smoothing procedure is based on GMRES [41], which requires both local and global communications. Local and global communications also occur when solving the coarse grid system with the preconditioned FGMRES Krylov subspace method [39]. We refer the reader to [40, Chap. 11] for a discussion on parallel implementations of Krylov subspace methods. To take better advantage of current multicore-based computer architectures, a hybrid MPI/OpenMP approach would be relevant to consider; this is left as a future line of development.
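As an illustration of the local communications involved in the stencil-based matrix-vector products, the following mpi4py sketch exchanges one ghost layer between neighbouring processes of a one-dimensional domain partitioning. It is only an illustrative analogue of the communication pattern: the chapter's implementation uses Fortran 90 with MPI and a three-dimensional partitioning, and the function and variable names below are ours.

```python
import numpy as np
from mpi4py import MPI

def exchange_halos(u_local, comm):
    """Exchange one ghost layer with the left/right neighbours of a 1D
    process decomposition. u_local has shape (n_local + 2, ...) where the
    first and last slices act as ghost layers."""
    rank, size = comm.Get_rank(), comm.Get_size()
    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Send the first interior slice to the left neighbour and receive the
    # right neighbour's slice into the right ghost layer (and vice versa).
    comm.Sendrecv(sendbuf=u_local[1:2], dest=left, sendtag=0,
                  recvbuf=u_local[-1:], source=right, recvtag=0)
    comm.Sendrecv(sendbuf=u_local[-2:-1], dest=right, sendtag=1,
                  recvbuf=u_local[0:1], source=left, recvtag=1)
```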

We investigate in the next section the performance of the proposed geometric preconditioner, when a dispersion minimizing finite difference scheme is considered for the discretization of the Helmholtz problem.

4 Numerical Results on the SEG/EAGE Salt Dome Model

In this section, we illustrate the performance of the multigrid preconditioner used in combination with FGMRES(m) for the solution of the acoustic Helmholtz problem ((2)–(4)) on a realistic heterogeneous benchmark velocity model. The SEG/EAGE Salt dome model [2] is a velocity field containing a salt dome in a sedimentary embankment. It is defined in a parallelepiped domain of size 13.5 × 13.5 × 4.2 km³. The velocity ranges from a minimum of 1500 m s−1 to a maximum of 4481 m s−1. This test case is considered challenging due to both the geometrically complex structure (salt dome) and the truly large dimensions of the computational domain. We first analyse the strong scalability properties of the numerical method on this realistic application. Then we investigate numerically the complexity of the numerical method, i.e., the evolution of the memory requirements and computational times with respect to the number of unknowns. We begin by defining the settings and parameters of the multigrid preconditioner used in this study.

4.1 Settings and Parameters

In the two-grid cycle of Algorithm 1, we consider as a smoother one cycle of GMRES(2) preconditioned by two iterations of damped Jacobi (ϑ = 1, ms = 2 and ν = 2), a restarting parameter mc = 10 for the preconditioned GMRES method used on the coarse level, and a maximal number of coarse cycles ϑc = 10. In the complex shifted multigrid cycle used as an approximate coarse solver, we use a shift parameter β = 0.5 and two iterations of damped Jacobi as a smoother (νβ = 2). On the coarsest level, we consider as an approximate solver one cycle of GMRES(10) preconditioned by two iterations of damped Jacobi (ϑβ = 1, mβ = 10 and νβ = 2). Finally, the relaxation coefficients used in the Jacobi method are given by the following relation

$$\displaystyle{(\omega _{h},\omega _{2h},\omega _{4h}) = (0.8,0.8,0.2).}$$

The restarting parameter of the outer Krylov subspace method is set to m = 5, as in [10, 36]. The unit source is located at

$$\displaystyle{(s_{1},s_{2},s_{3}) = (h\ n_{x_{1}}/2,h\ n_{x_{2}}/2,h\ (n_{PML} + 1))}$$

where, e.g., \(n_{x_{1}}\) denotes the number of points in the first direction and nPML is set to 10. A zero initial guess \(x_{h}^{0}\) is selected, and the iterative method is stopped when the Euclidean norm of the residual normalized by the Euclidean norm of the right-hand side satisfies the following relation

$$\displaystyle{\frac{\vert \vert b_{h} - A_{h}x_{h}\vert \vert _{2}} {\vert \vert b_{h}\vert \vert _{2}} \leq 10^{-5}.}$$
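For reference, the settings of this section can be gathered as follows; the key names and the helper below are ours and merely restate the parameter values and the stopping criterion given above.

```python
import numpy as np

# Parameter values of Sect. 4.1 (key names are ours, not the chapter's).
settings = {
    "smoother":       {"theta": 1, "m_s": 2, "nu": 2},   # GMRES(2) + 2 damped Jacobi
    "coarse_solver":  {"m_c": 10, "theta_c": 10},        # preconditioned GMRES on Omega_H
    "shifted_cycle":  {"beta": 0.5, "nu_beta": 2,
                       "coarsest": {"theta_beta": 1, "m_beta": 10, "nu_beta": 2}},
    "jacobi_weights": {"omega_h": 0.8, "omega_2h": 0.8, "omega_4h": 0.2},
    "outer_krylov":   {"m": 5, "tol": 1e-5},
    "n_pml":          10,
}

def converged(A_h, x_h, b_h, tol=1e-5):
    """Stopping criterion: relative Euclidean residual norm below tol."""
    return np.linalg.norm(b_h - A_h @ x_h) <= tol * np.linalg.norm(b_h)
```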

This numerical study has been performed on Turing, an IBM BG/Q computer located at IDRIS (each node of Turing is equipped with 16 PowerPC A2 64-bit cores at 1.6 GHz), using a Fortran 90 implementation with MPI in complex single precision arithmetic. The physical memory on a given node (16 cores) of Turing is limited to 16 GB.

4.2 Strong Scalability Analysis

We are interested in the strong scalability properties of the numerical method. Hence, we consider the acoustic wave propagation problem ((2)–(4)) at a fixed frequency (20 Hz) on a growing number of cores. The step size h is determined by relation (8) with fref = 10 Hz, href = 15 and q = 4. Table 1 collects the number of preconditioner applications (Prec) and the computational time (Time) versus the number of cores. We note that the number of preconditioner applications (which corresponds to the number of outer Krylov subspace iterations) is found to be independent of the number of cores, which is a desirable property. We also define a scaled parallel efficiency as

$$\displaystyle{ \tau _{s} = \frac{T_{ref}} {T} / \frac{Cores} {Cores_{ref}}, }$$
(9)

where Tref and Coresref denote reference values related to the computational time and the number of cores, respectively. A perfect scaling corresponds to the value 1. In practice, we note that τs stays close to this value. Only the last numerical experiment, performed on 131,072 cores, leads to a moderate decrease in terms of scaled parallel efficiency. This is partly due to the increased number of communications, which leads to a significant decrease of the computation-to-communication ratio.
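The scaled parallel efficiency of Eq. (9) translates directly into a one-line helper (a trivial sketch, with variable names of our choosing):

```python
def scaled_efficiency(time, cores, time_ref, cores_ref):
    """Scaled parallel efficiency tau_s of Eq. (9); 1 means perfect scaling."""
    return (time_ref / time) / (cores / cores_ref)
```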

Table 1 Strong scalability analysis

4.3 Complexity Analysis

We next analyse the complexity of the numerical method with respect to the frequency or, equivalently, to the problem size. In this numerical experiment, the number of cores is kept fixed at 131,072, while the frequency grows from 15 Hz to 40 Hz. The case f = 40 Hz leads to a linear system with approximately 56.7 billion unknowns. The number of preconditioner applications (Prec), the computational times (T) and the memory requirements (M) are reported in Table 2. The number of preconditioner applications is rather moderate and is found to grow almost linearly with respect to the frequency. This linear dependency has also been observed for the complex shifted Laplace preconditioner in combination with other dispersion minimizing finite difference schemes [12], albeit on problems of smaller size. This behaviour is quite satisfactory, since huge linear systems can be solved in a reasonable amount of computational time on a parallel distributed memory machine.

Table 2 Complexity analysis

Figure 2 shows the evolution of the computational time (T) versus the problem size. If N denotes the total number of unknowns, the computational time T is found to behave asymptotically as \(N^{1.32}\). This is quite competitive with advanced sparse direct solution methods based on block low-rank [1] or hierarchical matrix compression techniques [35, 59, 60]. To complement this study, it would be interesting to perform the same complexity analysis when addressing linear systems with multiple right-hand sides. Efficient block Krylov subspace methods based on block size reduction at each restart [9, 10] or at each iteration [8] have been proposed in this setting. This is left to a future line of research.
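The observed asymptotic exponent can be estimated from measured (N, T) pairs by a least-squares fit in log-log scale, as sketched below (the data of Table 2, not reproduced here, would be passed as input):

```python
import numpy as np

def fit_complexity(N, T):
    """Fit T ~ C * N**alpha in log-log scale and return (alpha, C)."""
    alpha, logC = np.polyfit(np.log(np.asarray(N, dtype=float)),
                             np.log(np.asarray(T, dtype=float)), 1)
    return alpha, np.exp(logC)
```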

Fig. 2: Complexity analysis of the improved two-grid preconditioned Krylov subspace method. Evolution of the computational time versus the problem size. SEG/EAGE Salt dome. Results of Table 2

The evolution of the requested memory (M) versus the problem size is shown in Fig. 3. As expected, the memory requirements grow linearly with the number of unknowns, since no sparse factorization is involved at either the global or the local level in the preconditioner. We remark that the benefit of the proposed method has to be viewed in the light of future parallel architectures, where the most scalable machines have limited memory per core.

Fig. 3: Complexity analysis of the improved two-grid preconditioned Krylov subspace method. Evolution of the memory requirements versus the problem size. SEG/EAGE Salt dome. Results of Table 2

5 Summary and Outlook

In this chapter, we have focused on the performance of a geometric multigrid preconditioner for the solution of wave propagation problems related to acoustic imaging. We have proposed a two-grid preconditioner for the numerical solution of Helmholtz problems in three-dimensional heterogeneous media. This two-grid cycle is applied directly to the original Helmholtz operator and relies on an approximate coarse grid solution. A second multigrid method applied to a complex shifted Laplace operator is then used as a preconditioner to obtain the approximate coarse solution. We have highlighted the efficiency of the multigrid preconditioner, in combination with dispersion minimizing finite difference schemes, on a concrete application in geophysics requiring the solution of problems of huge dimension (namely, billions of unknowns). Numerical results have demonstrated the usefulness of the combined algorithm on a realistic three-dimensional application at high frequency. Finally, a detailed complexity analysis has been provided to close this chapter.

We would like to mention three recent contributions for the solution of heterogeneous Helmholtz problems exhibiting attractive complexities and almost frequency-independent convergence rates. Zepeda-Núñez and Demanet [61] have proposed an algorithm based on the combination of domain decomposition techniques and integral equations, with application to two-dimensional acoustic problems. Liu and Ying [31, 32] have proposed enhancements of the sweeping preconditioner leading to an O(N) complexity for both the setup phase and the application of the preconditioner. A detailed performance comparison with the proposed numerical method on the same benchmark problem would be interesting to perform.

In the context of seismic inverse problems, e.g., acoustic full waveform inversion, the solution of forward Helmholtz problems represents a major computational kernel, as outlined above. For that purpose, the geometric multigrid preconditioner used in combination with block Krylov subspace methods will play a key role in efficiently addressing the solution of linear systems with multiple right-hand sides; see [14] for a first attempt with the basic two-grid preconditioner developed in [36].

Advanced discretization methods based on Discontinuous Galerkin or high order finite element methods on unstructured grids are nowadays frequently used in geophysics for the solution of acoustic and/or elastic problems. Algebraic multigrid methods [53, Appendix A] could be used as well to extend the proposed geometric multigrid preconditioner and define an efficient numerical method in this setting; see, e.g., [6, 33] for related contributions.

Finally, we will have to reconsider the global algorithm to fully exploit the extreme core count of forthcoming parallel computers. Communication-avoiding or communication-minimizing Krylov subspace methods [30], together with asynchronous variants of the multigrid preconditioner, should be developed in the near future to tackle this exciting new challenge.