Abstract
In this paper we propose a new way to compute a rough approximation solution, to be later used as a warm starting point in a more refined optimization process, for a challenging global optimization problem related to earth imaging in geophysics. The warm start consists of a velocity model that approximately solves a full-waveform inverse problem at low frequency. Our motivation arises from the availability of massively parallel computing platforms and the natural parallelization of evolution strategies as global optimization methods for continuous variables. Our first contribution consists of developing a new and efficient parametrization of the velocity models to significantly reduce the dimension of the original optimization space. Our second contribution is to adapt a class of evolution strategies to the specificity of the physical problem at hands where the objective function evaluation is known to be the most expensive computational part. A third contribution is the development of a parallel evolution strategy solver, taking advantage of a recently proposed modification of these class of evolutionary methods that ensures convergence and promotes better performance under moderate budgets. The numerical results presented demonstrate the effectiveness of the algorithm on a realistic 3D full-waveform inverse problem in geophysics. The developed numerical approach allows us to successfully solve an acoustic full-waveform inversion problem at low frequencies on a reasonable number of cores of a distributed memory computer.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Vibrations generated by earthquakes, explosions or similar phenomena and propagated within the Earth or along its surface can yield information about the Earth and its subsurface structure. Such a knowledge, called Earth imaging, is of major interest for economy, environment, and science. Geologists have developed several methods for Earth imaging using seismic wave information. Acoustic full-waveform inversion (FWI) is one of such procedures and it attempts to derive high-resolution quantitative models of the subsurface using the full information of acoustic waves (Virieux and Operto 2009). Following (Tarantola 2005), a description of the problem can be given as follows. During the propagation, waves interfere with the environment and the total wavefield is recorded through a certain number of receivers (called hydrophones or geophones). Since the waves are affected by the physical properties of the subsurface, they are carrying information about the environment that can be retrieved by an inversion process. The propagation waves are generated by sources situated in the domain of consideration (see Fig. 1 for a simple illustration).
For many years acoustic full-waveform inversion has been almost exclusively employed by academic researchers. Only recently it has been adopted by practitioners in industry. Nevertheless, its computational cost is still large compared to other existing methods in seismic exploration. The attractiveness of the approach is the promise of deriving high-fidelity Earth models for seismic imaging. Since our ability to both understand and manage complex nonlinear inversions has improved and since the available computing power has grown at the same time, full-waveform inversion has become more and more practical.
It is known that the acoustic full-waveform inversion, formulated as a nonlinear optimization least-squares minimization problem, can be efficiently solved if the starting propagation velocity model is accurate enough [in the sense of explaining the data at a low frequency and still being a smooth version of the true velocity model, see Brossier (2009), Virieux and Operto (2009)]. Otherwise the inversion procedure suffers from stalled convergence to spurious local minima due to the oscillatory nature of the data (Mulder and Plessix 2008). Thus, a crucial step related to full-waveform inversion in seismic imaging consists of finding a good starting model (or point, in an optimization context) without the need of sophisticated a priori knowledge on the velocity model. In industry, first- arrival travel-time tomography (Nolet 1987) is the most popular method to generate an accurate initial propagation velocity model. More recent methods such as stereo tomography (Lambaré 2008) and inversion in the Laplace domain (Shin and Cha 2008) are being investigated in academia. It must be also mentioned here that the use of multiscale strategies can mitigate the nonlinearity and reduce the dependence on the starting velocity model for FWI (Pratt and Worthington 1990; Sirgue and Pratt 2004).
In this paper we propose a novel approach to find an initial smooth velocity model for the full-waveform inversion problem without any a priori physical knowledge. We are motivated by the recent availability of massively parallel computing platforms [see Rauber and Rünger (2013)]. First we introduce a new parametrization of the problem to reduce the number of parameters needed to describe a velocity model and therefore the objective function of our optimization problem. Then, we show how to adapt an evolution strategy (ES) to take advantage of such a model or space reduction. ES’s are a class of evolutionary algorithms designed for searching the global minimum of a function in a continuous space. We are motivated by the modifications in ES’s recently proposed in (Diouane et al. 2015) to ensure some form of rigorous convergence and a better computational performance under moderate budgets of function evaluations. Thirdly, based on one of the modified ES’s given in (Diouane et al. 2015), we propose a highly parallel ES adapted to the full-waveform inversion setting. By combining model reduction and ES’s in a parallel environment, we aim at solving realistic instances of the problem. In fact the numerical results obtained along this direction will show the appropriateness and promise of our approach.
We note that global optimization heuristics have already been already employed to solve related inverse problems. A first attempt to invert the ocean bottom properties (Collins and Kuperman 1992) has been made through simulated annealing. Later Gerstoft (1994) has applied genetic algorithms to invert seismoacoustic data. Training a neural network to compute a reliable estimate of a one-dimensional velocity model has been also proposed in (Röth 1994). In all these applications, the heuristics were either applied to problems where the objective function was cheap to evaluate or where a very simple parametrization of the velocity model was used.
The paper is organized as follows. We start by describing in Sect. 2 the large-scale Earth imaging optimization problem of interest to us. In Sect. 3, we detail our proposed methodology to reduce the number of unknowns, while representing the full search space as faithfully as possible. In Sect. 4 we describe a parallel ES adapted to the specificity of the given application. Numerical results for a realistic large-scale public domain inversion problem are then presented and discussed in Sect. 5. Finally, we draw some conclusions and describe future lines of research in Sect. 6.
2 Full-waveform inversion
Estimating the subsurface velocity from seismic recordings is the main goal of the full-waveform inversion procedure. One uses the recorded wavefields to guess the physical properties of the medium through which the wavefield has propagated. Two formulations (either time-domain or frequency-domain based) are traditionally used for finding the solution of this inverse problem. Relevant details on both approaches can be found in, e.g., (Brossier 2009; Pratt and Worthington 1990; Sirgue and Pratt 2004; Virieux and Operto 2009). Since the frequency-domain approach is regarded as more advantageous when solving the full-waveform inversion in the multiple frequency case (Virieux and Operto 2009), we will exclusively consider this approach in our paper. Below we briefly detail the full seismic wavefield problem (forward problem) and the associated problem used to recover the velocity model (inverse problem). At the end of the section, we will also introduce the three-dimensional public domain velocity model used in all our numerical illustrations and experiments.
2.1 The forward problem
Given the medium properties (e.g., the subsurface velocity), the forward problem consists of modeling the full seismic wavefield in a three-dimensional parallelipedic domain \(\Omega \subset {\mathbb {R}}^3\) at a given frequency. The wave propagation is usually controlled by a partial differential equation, whose formulation depends on the characteristics of the propagation model (Cohen 2002; Pinel 2010). In the frequency domain, the acoustic propagation of a pressure field u(x, y, z) at the position \((x,y,z) \in \Omega \) in a heterogeneous medium is governed by the Helmholtz equation defined as:
where \(k(x,y,z)= 2\pi f/ m(x,y,z)\) is the wavenumber, \(f \in {\mathbb {R}^{+}}\) is the frequency in Hz, and m(x, y, z) is the variable acoustic-wave velocity model in m/s. In Eq. (1), \(\Delta \) represents the standard Laplace operator and s(x, y, z) a Dirac source term. In a more realistic scenario, the source excitation is estimated by solving an inverse problem [see Sjogreen and Petersson (2014)]. The wavelength is given by m(x, y, z) / f. We consider absorbing boundary conditions: First, a popular approach, called perfectly matched layer formulation (PML) (Berenger 1994, 1996), is used to obtain a satisfactory near boundary solution, without many artificial reflections; Second, this artificial boundary layer is used to absorb outgoing waves at any incidence angle as shown in (Berenger 1994). The acoustic full-waveform inversion requires the solution of three-dimensional Helmholtz problems at various locations of the Dirac sources and thus leads to multiple right-hand side problems (Sourbier et al. 2009a, b).
In this paper, we consider a standard second-order accurate seven point finite-difference discretization of the Helmholtz equation (1) on an uniform equidistant Cartesian grid of size \(N_x \times N_y \times N_z\). For later use, we define \(N = N_x \times N_y \times N_z\), h the corresponding mesh grid size, and \(\Omega _h\) the discrete computational domain. After discretization, the acoustic full-wave inversion leads to the following linear system with p multiple right-hand sides:
where \(S \in {\mathbb {C}^{N \times p}}\) and \(A \in {\mathbb {C}^{N \times N}}\) is a sparse complex matrix (nonhermitian and nonsymmetric due to the PML formulation). We note that the matrix A embeds the properties of the subsurface and depends on the propagation velocity model m that we want to quantify. Since a stability condition has to be satisfied to correctly represent the wave propagation phenomena (Cohen 2002), we consider numerical discretization schemes with 10 points per wavelength. Consequently, at a given frequency f in Hz, we deduce the mesh grid size h in m as
In practice this last relation imposes the solution of very large systems of equations (see Sect. 5, where \(N \approx 10^6\)) at the frequencies of interest for the geophysicists. Such a task may be too computationally and memory expensive when solving linear systems by sparse direct methods. Due to their indefiniteness, these systems are known to be very challenging for iterative methods (Ernst and Gander 2012). Based on previous studies (Calandra et al. 2013, 2012; Lago 2013), we consider a recently proposed block flexible Krylov subspace method [BFGMRES-S (Calandra et al. 2013, Algorithm3)] for the solution of the linear system with multiple right-hand sides (2). In (Calandra et al. 2013) the authors have shown the relevance of the BFGMRES-S algorithm combined with a variable two-level preconditioner to address the solution of such large-scale acoustic forward problems in a distributed memory parallel environment. We refer the reader to (Calandra et al. 2012, Algorithm 5) for a complete description of the geometric two-grid preconditioner and to (Pinel 2010) for additional theoretical properties in relation with Krylov subspace methods. Applications to large-scale forward acoustic problems have been considered in detail in (Lago 2013, Chap. 4).
Figure 2 depicts a graphical representation of the SEGFootnote 1/EAGEFootnote 2 salt dome velocity model (described later in Sect. 2.3) and the real part of the wavefield at frequency \(12\,{{\mathrm{Hz}}}\) obtained after solving the Helmholtz equation in the case of a single source. The numerical solution has been obtained using BFGMRES-S. We note that the wave propagation is affected by the properties of the velocity model. The interference of the waves with the reflected layer generates reflection waves. The latter are recorded at different time steps using geophones to generate the so-called seismograms, i.e., the observed data for the associated inverse problem detailed next.
2.2 Full-waveform inversion as a least-squares global optimization problem
The standard formulation of acoustic full-waveform inversion (at a given frequency f) aims at minimizing the following least-squares misfit function (Tarantola 2005):
where \(\dagger \) denotes the adjoint operator (transpose conjugate). The weighting matrices \(W^i\) are in general used to include a priori data information. The misfit vector \(d^i(m) - d^i_{{{\mathrm{obs}}}} \in {\mathbb {R}}^n\), related to the i-th source, is computed as the difference at the receiver positions between the recorded seismic data \(d^i_{{{\mathrm{obs}}}}\) (i.e. seismograms) and the modeled seismic one \(d^i(m)\). The latter one corresponds to the modeled seismic wavefield \(u^i\) [computed as the i-th column of the U solution of (2)], projected using the operator \({\mathcal {P}}_{{{\mathrm{data}}}}\), which extracts the values of the wavefield at the receiver positions for each source, i.e., \(d^i(m) = {\mathcal {P}}_{{{\mathrm{data}}}}(u^i)\). The use of this projection operator makes the full-waveform inversion an ill-posed problem, meaning that an infinite number of velocity models matches the data, leading to the same objective function value. Therefore, an additional regularization term is classically added to the inversion problem to make it well posed (Tarantola 2005). In addition to the velocity model, the source excitation is generally unknown and must be included as an unknown of the problem (Pratt 1999). Provided that a good starting velocity model \(m_s\) is available (good in the sense that smoothly represents the structure of the true velocity model), the minimization of the objective function (4) is in practice solved using a Newton type method [see Virieux and Operto (2009) and references therein].
2.3 The SEG/EAGE salt dome velocity model
We have conducted our numerical studies on the acoustic inverse problem using a three-dimensional public domain velocity model known in the geophysics community as the SEG/EAGE salt dome velocity model.
This velocity model (depicted in Fig. 3) is based on a typical US Golf coast salt structure, and special care was taken to ensure that it is geologically feasible. Hence it is widely accepted as an adequate benchmark model for seismic imaging in the geophysics community. Next, we will introduce a parametrization procedure to compute an appropriate basis for the velocity models, which will then allows to compute an accurate and smooth representation of the SEG/EAGE salt dome velocity model using a reduced number of parameters.
3 Search space reduction
Evolution Strategies are heuristic methods designed for the solution of global optimization problems (with continuous variables) that have performed well in terms of the quality of the final point computed [see Auger et al. (2013, 2009), Hansen et al. (2010), Rios and Sahinidis (2013)]. However, like any other method for global optimization, ES’s suffer from the curse of dimensionality, meaning that their performance is satisfactory on low dimensional problems, but deteriorates as the dimensionality of the search space increases (Nabi and Xiaodong 2010). For realistic simulations of full-waveform inversion (Lago 2013; Pinel 2010), the typical size N of the velocity models exceeds in general \(10^6\), and thus trying to solve directly the problem using an ES is ruled out.
However, our purpose is not to solve the full-waveform inversion problem but rather to find a good starting velocity model \(m_s\) which can be later improved using a local gradient based method. The initial velocity model \(m_s\) is only required to represent the general structure of the true model, and such a representation is often smooth and can be expressed using only a few parameters (Virieux and Operto 2009). Given an appropriate and efficient procedure to represent a velocity model using a reduced number of parameters, the ES method will be applied to try to find the values of the parameters that lead to a smooth representation of the unknown velocity model to be inverted.
The problem of search space reduction has been investigated over the past years using subspace approaches (Kennett et al. 1988; Oldenburg et al. 1993; Skilling and Bryan 1984). In our context, a velocity model perturbation \(\tilde{m} \in {\mathbb {R}}^N\) can be restricted to lie in an n-dimensional subspace of \({\mathbb {R}}^N\), spanned by the vectors \(\{ v_i\}_{i=1,\ldots ,n}\), with \(n \le N\) (as with N, n will also later have a 3-D interpretation). The model perturbation can be then written as follows:
where \(g \in {\mathbb {R}}^n\) are the new parameters to invert, and \(V=[v_1,\ldots ,v_n] \in {\mathbb {R}}^{N \times n}\) is the so-called reduction basis. Subspace approaches lead to an important simplification of the problem (Kennett et al. 1988; Skilling and Bryan 1984), but are unfortunately very sensitive to the choice of the reduction basis. In fact, by restricting the search space to directions in a subspace, the neglected ones could be the vectors which are important in finding a local or global minimum of the objective function \({\mathcal {J}}\) given in (4). Often researchers use a sinusoidal basis as a reduction basis and try to find a vector g of parameters which produces an acceptable agreement to the observation (Oldenburg et al. 1993). The existing subspace methods, previously cited, use gradient information of the objective function, thus the reduction process is problem dependent. In our case, no information about the objective function will be used, thus our subspace technique can be applied regardless of the problem.
Inspired by the methodology used in image compressing, we propose in this work a new procedure to construct this basis using a combination of sinusoidal and rectangular basis functions, more specifically a discrete cosine transform (DCT) (Britanak et al. 2006) and a step function transform. In fact, the step function used to magnify the vector parameter \(g \in {\mathbb {R}}^n\), so that it fits the original space \({\mathbb {R}}^N\), usually leads to a pixelization effect. Thus a DCT is then applied to produce a smooth velocity model, reducing such pixelization. To simplify the exposition, we will first explain our approach in the one-dimensional case, and then give a generalization to cover the realistic 3D geometry of interest. We refer the reader to (Diouane 2014, Chap. 7) for a complete description of these procedures.
3.1 One-dimensional space reduction
There are three main procedures in our space reduction scheme: a reduction, a duplication, and a magnification.
In the reduction procedure, given a vector \(m \in {\mathbb {R}}^N\) discretizing a velocity vector, n subdivisions are first created. Then, it is taken the mean value the parameters included in each subdivision. In our context, the reduction operation will be only used to estimate how efficient is our magnification procedure.
The duplication procedure consists of building a vector \(m \in {\mathbb {R}}^N\) using a small-size vector \(g \in {\mathbb {R}}^n\) with \(n < N\). We first construct an empty vector m of size N, considering n subdivisions of indices \([x_i, x_{i+1}]\). Each subdivision contains around \(\delta =[N/n]\) parameters and thus \(\delta = x_{i+1} - x_i\) and \(x_i = (i-1)\delta +1\). The n parameters of the velocity vector g are distributed over the n subdivisions. The value associated to each subdivision is then duplicated over the \(\delta \) parameters of m assigned for that subdivision. The duplication procedure, as presented here, introduces a pixelization effect in the constructed vector m. A DCT can then be applied to improve the quality of the duplication and to remove the subdivision discontinuities.
A magnification procedure aims in general at removing noise or producing a less pixelated image. The most used smoothing algorithms are based on Gaussian smoothing (Aditya et al. 2012), bilateral filters (Tomasi and Manduchi 1998), and sinusoidal based approaches (Britanak et al. 2006). As a smoothing procedure, we choose to work with a sinusoidal basis since it is one of the most popular subspace approaches for FWI to generate a smooth approximation vector using few coefficients (Oldenburg et al. 1993). We will smooth the pixelization effect in the magnified vector using a DCT (Britanak et al. 2006).
As we have seen before, when duplicating a velocity vector from a vector of smaller size, the value in each subdivision is constant and thus it can be seen as a mean of all the subdivision values. Such a property will be imposed as well on the magnified velocity vector m in the sense that it corresponds to a model \(m(\cdot )\) such that
In turn, this velocity model \(m(\cdot )\) is expressed using a discrete cosine basis in the following way:
where \(a=(a_j)_{1 \le j \le n} \in {\mathbb {R}}^n \). By incorporating the expression (6) into the equations (5), we can obtain the vector a by solving a linear system of the form
where \(C \in {\mathbb {R}}^{n \times n}\) is a matrix with the following coefficients:
The matrix C is nonsingular (see the proof in the appendix). The one-dimensional smoothed vector m is then built by evaluating (8) for all \(i \in \{ 1,\ldots , N \}\), which amounts to
or, equivalently, to
where \(M=KC^{-1} \in {\mathbb {R}}^{N \times n}\) and \(K \in {\mathbb {R}}^{N \times n}\) is the matrix defined by \(K_{ij}=\cos (\frac{(j-1)(i-1)\pi }{N})\). Equation (8) shows that the magnification procedure corresponds to the application of a linear operator. The magnification cost is negligible compared to an objective function evaluation. In fact, the magnification is accomplished by first computing \(C^{-1}g \in {\mathbb {R}}^n\) and then multiplying it by \(K \in {\mathbb {R}}^{N\times n}\), resulting in \(K (C^{-1}g) \in {\mathbb {R}}^{N}\), while an evaluation of the objective function requires the solution of a larger linear system of size \(N\times N\) with p multiple right-hand sides [see (2)].
3.2 Three-dimensional space reduction
A multi-dimensional transform (known as fast direct multi-dimensional DCT) can be carried out by using a composition of the one-dimensional magnification procedure along each dimension (Britanak et al. 2006). Equation (8) can then be immediately extended to 2D or 3D velocity models. A detailed description of the extension of Eq. (8) to higher dimensions is given in (Britanak et al. 2006). In the case of three-dimensional data, given a 3D velocity model G of \(n=n_x \times n_y \times n_z\) parameters, we ought to build a magnified 3D velocity model m of size \(N=N_x \times N_y \times N_z \gg n\) parameters. The magnification procedure is obtained by applying (8) first to the x axis, then to y, and finally to z as follows (using Matlab notation):
where \(M_x \in {\mathbb {R}}^{N_x \times n_x}\), \(M_y \in {\mathbb {R}}^{N_y \times n_y}\), and \(M_z \in {\mathbb {R}}^{N_z \times n_z}\) are the one-dimensional smoothing matrices defined in (8) along the axes x, y, and z, respectively.
3.3 Application to the SEG/EAGE salt dome velocity model
To illustrate numerically the performance of the three-dimensional approximation procedure, we have used the SEG/EAGE salt dome velocity model introduced in Sect. 2.3.
Figure 4 outlines three one-dimensional SEG/EAGE salt dome velocity profiles (with respect to the x, y, and z axes, respectively). Reduction procedures were then applied to these velocity profiles to create vectors for duplication and magnification. Following the x axis (resp. y axis), the velocity profile has been selected at the position \(y=8.9\, {{\mathrm{km}}}\) and \(z=2.5\,{{\mathrm{km}}}\) (resp. \(x=8.9\, {{\mathrm{km}}}\) and \(z=2.5\,{{\mathrm{km}}}\)). In both cases the reduced velocity vector is built using \(n = 8\) parameters only, while \(N = 225\) parameters are required for the true velocity vector. Following the z axis, the velocity profile (selected at the position \(x=8.9\, {{\mathrm{km}}}\) and \(y=8.9\,{{\mathrm{km}}}\)) is reduced using \(n=5\) parameters compared to the true \(N=70\) ones. From the results obtained, we observe that the smoothing effect of the DCT transform improves the quality of the duplicated velocity profiles as a representation of the original ones, and that the space reduction approach can work relatively well with a reduced number of parameters.
For the three dimensional case, we have found that the true velocity model can be relatively well approximated using \(n=8 \times 8 \times 5=320\) parameters instead of the original \(N= 225 \times 225 \times 70=3543750\) ones, in the sense of still representing the main structure (i.e., the salt dome) of the true velocity model. Figure 5 outlines an illustration of the obtained results. As expected, the magnification procedure using DCT (see Fig. 5g–i) gives better results compared to the duplication procedure which is based on the step function transform (see Fig. 5d–f). Although we use only a few parameters to represent the velocity model, our smooth magnification procedure preserves the main specificity of the true model, in particular the salt dome.
4 A parallel ES for acoustic full-waveform inversion
In this section we start by briefly reviewing the existing methods to compute a satisfactory initial velocity model for seismic inversion. Then we explain how to apply ES’s for this purpose when using the space reduction introduced before. A parallel implementation of the resulting ES’s is also proposed.
4.1 Existing methods
The acoustic full-waveform inversion problem introduced in Sect. 2.2 is nonconvex, and thus its solution by optimization algorithms crucially depends on the starting velocity model \(m_s\). In fact, it is known that the inversion procedure converges to satisfactory results only if the starting velocity model is situated not far from a global minimizer (Virieux and Operto 2009). Hence, before applying the full-waveform inversion, a starting model is generally built. To do this, the most common techniques are first-arrival travel-time tomography (FATT) (Nolet 1987), stereotomography (Lambaré 2008) or, more recently, inversion in the Laplace domain (Shin and Cha 2008). For many years, FATT has proven to be stable in generating smooth velocity models of the subsurface. Some examples of application of FWI to real data using a starting model built by FATT are described in (Operto et al. 2006; Ravaut et al. 2004). Similarly, the stereotomography is regarded as one of the most promising methods for building a smooth velocity model. It exploits the arrival time of locally coherent events within an automatic procedure to select a seismogram collection (Lambaré 2008). Some applications to synthetic and real data sets are shown in (Billette et al. 2003; Billette and Lambaré 1998). Finally, the inversion in the Laplace domain can be viewed as a frequency domain inversion using a pure imaginary complex valued frequency that controls the time damping of the seismic wavefield. Applications of Laplace domain FWI to synthetic and real data are proposed in (Shin and Cha 2008; Shin and Ha 2008, 2012).
4.2 Evolution strategies and CMA–ES
Evolutions strategies are a class of evolutionary algorithms designed for the optimization of a possibly nonconvex function in a continuous domain without using derivatives. It has been originally developed in (Rechenberg 1973) for the unconstrained optimization of a function, \(\min _{v \in {\mathbb {R}}^n} f(v)\), and has been extensively investigated and tested [see, e.g., Beyer and Schwefel (2002), Hansen et al. (to appear)] and the references therein]. We are interested in a large class of ES’s denoted by \((\mu ,\lambda )\)–ES, with \(\mu \) and \(\lambda \) integers such that \(1 < \mu \le \lambda \), where a certain number \(\lambda \) of points are randomly generated in each iteration, among which \(\mu \) of them are selected as the best in terms of the objective function f.
CMA–ES (Hansen et al. 1995) (where CMA stands for covariance matrix adaptation) is regarded as one of the best in the class \((\mu ,\lambda )\)–ES in terms of numerical performance (Auger et al. 2013, 2009; Hansen et al. 2010; Rios and Sahinidis 2013). More precisely, CMA–ES belongs to the ES family denoted by \((\mu /\mu _W,\lambda )\)–ES, where the subscript ‘W’ indicates the use of ‘recombination’ via weights. Broadly speaking, at iteration k, a candidate minimizer \(\bar{x}_k\) is used to produce a generation of \(\lambda \) ‘offspring’, each consisting of adding to \(\bar{x}_k\) a random direction multiplied by a parameter controlling the length or size of the steps; the best \(\mu \) of these are retained as ‘parents’ in a ‘selection’, and \(\bar{x}_{k+1}\) is taken as a weighted combination (‘recombination’) of these parents. The CMA component considers a Gaussian distribution of mean zero for the random generation of the directions and provides a scheme for updating the corresponding covariance matrix as well as the step length.
4.3 A modified CMA–ES
CMA–ES has exhibited robust performance for difficult ill-conditioned, non-separable, and highly multi-modal problems (Auger et al. 2013, 2009; Hansen et al. 2010; Rios and Sahinidis 2013). Its main drawback is, however, that a large budget is required to provide outstanding results. Recently, the authors in (Diouane et al. 2015) have proposed modifications to the class of algorithms in \((\mu /\mu _W,\lambda )\)–ES to make them enjoying a favorable convergence property and performing better for smaller budgets. The modifications have been essentially the imposition of a sufficient decrease on the objective function values to accept new iterates and the reduction of the step size when such a condition is not satisfied. Under such modifications, these ES’s can converge globally (meaning independently of the starting point) for a first-order stationary point. Algorithm 1 shows an adaptation of the globally convergent ES’s proposed in (Diouane et al. 2015) to the context of full-waveform inversion.
The authors in (Diouane et al. 2015) have proposed three different globally convergent ES versions named mean/mean, max/max, and max/mean. On a large data set of problems, the mean/mean version has performed numerically the best. However, the incorporation of the mean/mean sufficient decrease condition requires an extra objective function evaluation \({\mathcal {J}}(m_{k+1}^{trial})\) at each iteration, where \(m_{k+1}^{trial}\) is the trial mean parent computed as the mean of the best \(\mu \) generated velocity models. The mean/mean version would therefore corrupt the parallel nature of ES’s. In fact, if one supposes that the evaluation of \({\mathcal {J}}\) at the offspring is performed at the same time using synchronized parallel clusters, the mean parent evaluation \({\mathcal {J}}(m_{k+1}^{trial})\) will force all these clusters to wait until the end of such an evaluation to be able to restart a new offspring generation. Alternatively, the max/max version has shown good performance (not as good as the mean/mean version) without the need of any extra objective function evaluation to impose the corresponding sufficient decrease condition. Consequently, we have adopted the max/max version in Algorithm 1.
We also note that the update of the weights to enforce the sufficient decrease condition, originally proposed in (Diouane et al. 2015), has not been activated in our setting since we aim at the least amount of changes in the original ES’s and since such an update did not seem to have a real impact on the results for the max/max version [see Diouane et al. (2015)].
Moreover, in the evaluation procedure of the objective function, one needs to satisfy the relation (3). Hence we have imposed a lower bound on the velocity equal to a known minimum value \(m_{min}=1500\,{\rm{m/s}}\) (the velocity model value of the water). A maximum value on the velocity model of \(m_{max}=4500\,{\rm{m/s}}\) has been also imposed to avoid propagation by meaningless velocity models. Both requirements were guaranteed by projecting the offspring models onto the feasible domain defined by the bounds, an approach that has been shown to be globally convergent in (Diouane et al. 2015). Since there are no other constraints rather than simple bounds on the variables one can use the simple orthogonal \(\ell _2\)-projection.
4.4 A parallel implementation
The proposed ES implementation consists of a synchronized parallel optimizer composed of \(\lambda \) clusters (typically, the population size). Each cluster is composed of a group of processors, which is assigned to evaluate the objective function (4). At a given iteration k, the clusters are synchronized and not activated until the new mean parent \(m_{k+1}\) is defined, which in turn depends on the iteration state (successful or not). Figure 6 sketches in detail a given iteration of our proposed parallel implementation of Algorithm 1.
The proposed parallel implementation can be described as follows. In addition, at each iteration, \(\lambda \) clusters, represented in Fig. 6 by the components [Generate \(m^i\)], are launched in a synchronous manner. Each of these clusters generates a reduced velocity model based on the ES parameters and strategies. Once the velocity model is generated, the related cluster evokes the component [Propagate \(m^i\)].
The wave propagation simulation related to each velocity model considers the p source terms (i.e., p right-hand sides) at once. The [Propagate \(m^i\)] component is based on MPI (Message Passing Interface) and is responsible for discretizing and generating the linear system related to the forward problem (see Sect. 2.1) and to provide the information needed to evaluate the objective function \({\mathcal {J}}\) given in (4). The last component will just return the value of the objective function to the unique master processor [Master]. Once the master processor has received results from the \(\lambda \) clusters, it will choose the best, decide whether the iteration is successful or not, and update the mean parent accordingly. The [Master] component updates also the ES parameters (e.g., the distribution and the step length) and repeats the loop until a convergence criterion is achieved.
The propagation in itself behaves as a black box process, hiding from the ES the complexity of the discretization and of the forward solver. Changing the discretization and/or forward solver parameters will not incur in any rewriting of the ES implementation. MPI-2 has been used with the MPI_COMM_SPAWN primitive which allows an MPI process to spawn a number of clusters. Each newly spawned cluster has a specific MPI_COMM_WORLD intracommunicator that allows us to launch easily the propagation simulations over a number of CPU cores. We note that the proposed ES implementation is portable and that the propagator itself can be a standalone client. When the available cluster number is less than \(\lambda \), one can launch many propagation simulations on the same cluster until we obtain the needed function evaluations.
5 Numerical experiments
In this section we first describe the validation scenario and detail the parameters of the global optimization algorithm. Then we analyze the numerical results obtained for the inversion procedure at two frequencies. The numerical simulations have been performed on CURIE Footnote 3, a parallel cluster located at TGCC, France (two eight-cores Intel Sandy Bridge EP (E5-2680) at 2.7 GHz and 64 GB RAM per computing node with InfiniBand QDR Full Fat Tree interconnect) using a Fortran 2003 implementation with MPI in single precision arithmetic. The code has been compiled by the Intel ifort compiler suite with standard compiling options and linked with the MKL library. The numerical experiments have been performed on a fixed number of cores (2048, i.e., 128 computing nodes) with a maximal allowed elapsed computing time of 24 hours.
5.1 Problem and algorithm specifications
We have used a simple scenario where the p source terms are supposed to be Dirac functions and where the observed data \(d^i_{{{\mathrm{obs}}}}\) [i.e. seismograms (Tarantola 2005)] are generated from the propagating velocity model that we are trying to invert (see Sect. 2.3). The p sources (with \(p=16\)) are uniformly distributed in a survey plan located at 500 meters of depth (\(10\%\) of the exploration depth).
In this scenario we consider the acoustic full waveform inversion procedure at two different frequencies (\(1\,{{\mathrm{Hz}}}\) and \(2\,{{\mathrm{Hz}}}\) respectively) with only 320 parameters for the initial velocity model (see Sect. 3). Table 1 reports the dimensions \(N_x \times N_y \times N_z\) of the forward problem in agreement with the stability condition (3), the number of clusters, as well as the population size \(\lambda \) versus the frequency. As the frequency increases, the number of cores dedicated to the objective function evaluation becomes larger. In fact, the forward problem gets more complicated to solve as the frequency f increases. The initial iterate \(\bar{x}_0\), for the parallel ES algorithm in the case of 1 Hz, is built using the magnification procedure based on two given velocity values (3000 m/s and 1500 m/s, distributed as follows \(2= 1 \times 1 \times 2\)). Both the nonlinearity and the ill-posedness of the FWI problem are in practice tackled in the frequency domain using a multi-scale approach where one starts the inversion procedure in a low frequency range to mitigate the nonlinearity of the inversion, and then incorporate progressively higher frequencies (Sirgue 2006; Virieux and Operto 2009). Hence the velocity model solution obtained in the case of \(1\,{{\mathrm{Hz}}}\) will serve as a warm-starting point for the \(2\,{{\mathrm{Hz}}}\) case.
In the context of CMA–ES, the choice \(\lambda = {\text{ floor }}(4+3\log (n))\) (where \({\text{ floor }}(\cdot )\) rounds to the nearest integer no larger than the number given) has been shown to provide a good compromise between quality of the solution found and effort in determining it [see Hansen (2011)]. However, given the strong need for global exploration at \(1\,{{\mathrm{Hz}}}\), we have chosen an even larger value, equal to \(\lambda =512\), to better take advantage of the number of cores and cores per cluster, as we explain next. Since doubling the frequency doubles the number of discretization points in each direction, the number of cores per cluster is multiplied by at least a factor of eight when considering the case of \(2\,{{\mathrm{Hz}}}\). A factor of 16 has been retained in these numerical experiments. Given that the total number of cores is fixed to 2048 in both experiments, we note that the number of clusters at \(2\,{{\mathrm{Hz}}}\) is reduced from \(2048/8=256\) to \(2048/128=16\). Since the population size \(\lambda \) is chosen as twice the number of available clusters, we had to run two evaluations on the same cluster to get the entire offspring population (for both frequencies).
The parameters of the optimization algorithm are chosen similarly to those of CMA–ES for unconstrained optimization [see Hansen (2011)]: \(\mu = {\text{ floor }}(\lambda /2)\) and \(\omega _0^i = a_i/(a_1+\cdots + a_\mu )\) with \(a_i = \log (\lambda /2+1/2)-\log (i)\), \(i=1,\ldots ,\mu \). The choices of the distribution \({\mathcal {C}}_k\) and of the update of \(\sigma _k^{{{\mathrm{ES}}}}\) also followed CMA–ES for unconstrained optimization [see Hansen (2011)]. The selected forcing function was \(\rho (\sigma )=10^{-4} \sigma ^2\). To reduce the step length in unsuccessful iterations we have used \(\sigma _{k+1} = 0.5 \sigma _k\) which corresponds to set \(\beta _1 = \beta _2 = 0.5\). Finally, the initial step size \(\sigma _0\) is set to half of the difference between the velocity value on the bottom and the one on the top, given (in SEG/EAGE salt dome velocity model) by 3000 and 1500 m/s, respectively.
5.2 Numerical results for full-waveform inversion
Figure 7 reports a graphical representation of the inverted velocity model at \(1\,{{\mathrm{Hz}}}\). These numerical results are found to be satisfactory since a smooth version of the true velocity model is obtained. We can indeed consider that we are able to invert the general structure of the regarded velocity model, since in particular the salt dome structure is recovered. We also remark that such a model can be considered as a good starting point when using gradient based methods (Virieux and Operto 2009).
Figure 9a depicts the values of the objective function at the best population point at each iteration for the case \(1\,{{\mathrm{Hz}}}\). The variation of the objective function is more significant only at the early stages of the inversion process. Such a behavior is due to the sufficient decrease condition which monitors the quality of the sampling procedure. We note that the objective function value decreases from 2042.113 to 575.8082, and that after 278 iterations the inversion procedure is stopped due to the maximal elapsed computational time.
Figure 8 shows the results obtained for \(f = 2\,{{\mathrm{Hz}}}\). Even though the optimization procedure is started from the inverted velocity model obtained at \(1\,{{\mathrm{Hz}}}\), we note that the inversion result is getting less accurate and rather far from being a good approximation of the targeted velocity model.
Figure 9b shows that the optimization procedure is converging to a fixed velocity model with an objective function value of 123.2841 after 1220 iterations. Unlike the \(1\,{{\mathrm{Hz}}}\) case, this plot at \(2\,{{\mathrm{Hz}}}\) indicates that the inversion process is getting stacked at a local minimum. The explanation of such results is that the objective function becomes more and more noisy and multi-modal as far as the frequency increases (Sirgue 2006). A possible way to overcome such a difficulty is to increase the population size of the ES in order to encourage the global exploration.
6 Conclusion
In this paper we have presented a numerical method based on evolution strategies for the solution of the acoustic full-waveform inversion problem arising in Earth imaging in geophysics. Considering that each parameter in the velocity model to be inverted is an additional variable to optimize, we have proposed a new parametrization of the problem, reducing the number of parameters needed to faithfully representing the velocity models. We have illustrated the efficiency of our parametrization on the public domain SEG/EAGE salt dome velocity model, where we were able to reconstruct a rather satisfactory approximation using very few parameters.
We have then adapted the chosen ES’s to a reduction of the original space, in particular to the full-waveform inversion setting when using reduced parameterized velocity models. The main purpose of the incorporation of ES’s in the inversion procedure is to find a good starting velocity model without the need of sophisticated a priori knowledge on the background velocity model.
Given the high cost of the problem function evaluations, a highly parallel scheme for such ES’s has been derived and validated on parallel distributed memory platforms. The parallel implementation has been tested in combination with the new model parametrization. The obtained numerical results have shown that a great improvement can be obtained in the automation of the inversion procedure.
The testing scenario used for the acoustic full-waveform inversion problem is a simple one where the source excitations are known and the observed data (i.e. seismograms) is generated using the propagating velocity model which we are trying to invert. We note that such a scenario is not truly realistic for the following two reasons: (a) the source excitation is generally unknown and must be rather included as an unknown of the problem, and (b) the observed data is generally given by geophones located at the surface of the exploration domain. Thus, a more realistic test case has to be investigated in the future to fully understand the potential of the proposed approach.
Finally we remark that our model reduction procedure can certainly be generalized to cover other fields of applications. Similarly, the developed parallel approach can also be applied to other optimization problems in geosciences (e.g. well placement in reservoir modeling or analysis of permeability of rocks in petrophysics).
Notes
The Society of Exploration Geophysicists.
European Association of Geoscientists and Engineers.
References
Aditya G, Akhilesh B, Kuntal C (2012) A comprehensive review of image smoothing techniques. Int J Adv Res Comput Eng Technol 1:315–319
Auger A, Brockhoff D, Hansen N (2013) Benchmarking the local metamodel CMA–ES on the noiseless BBOB’2013 test bed. In: Proceedings of the 15th annual conference companion on genetic and evolutionary computation, GECCO ’13 companion. ACM, New York, pp 1225–1232
Auger A, Hansen N, Perez Zerpa ZJ, Ros R, Schoenauer M (2009) Experimental comparisons of derivative free optimization algorithms. In: 8th international symposium on experimental algorithms, vol 5526 of Lecture Notes in Computer Science. Springer, pp 3–15
Berenger J-P (1994) A perfectly matched layer for absorption of electromagnetic waves. J Comput Phys 114:185–200
Berenger J-P (1996) Three-dimensional perfectly matched layer for absorption of electromagnetic waves. J Comput Phys 127:363–379
Beyer H-G, Schwefel H-P (2002) Evolution strategies: a comprehensive introduction. Nat Comput 1:3–52
Billette F, Le Bégat S, Podvin P, Lambaré G (2003) Practical aspects and applications of 2D stereotomography. Geophysics 68:1008–1021
Billette F, Lambaré G (1998) Velocity macro-model estimation from seismic reflection data by stereotomography. Geophys J Int 135:671–690
Britanak V, Yip PC, Rao KR (2006) Discrete cosine and sine transforms. Academic Press, Oxford
Brossier R (2009) Imagerie Sismique à Deux Dimensions des Milieux Visco-élastiques par Inversion des Formes d’Ondes : Développements Méthodologiques et Applications. PhD thesis, Université de Nice-Sophia Antipolis
Calandra H, Gratton S, Lago R, Vasseur X, Carvalho L (2013) A modified block flexible GMRES method with deflation at each iteration for the solution of non-hermitian linear systems with multiple right-hand sides. SIAM J Sci Comput 35:S345–S367
Calandra H, Gratton S, Langou J, Pinel X, Vasseur X (2012) Flexible variants of block restarted GMRES methods with application to geophysics. SIAM J Sci Comput 34:A714–A736
Cohen G (2002) Higher-order numerical methods for transient wave equations. Springer, Berlin
Collins MD, Kuperman WA (1992) Nonlinear inversion for ocean bottom properties. J Acoust Soc Am 92:2770–2782
Diouane Y (2014) Globally convergent evolution strategies with application to an earth imaging problem in geophysics. PhD thesis. Institut National Polytechnique de Toulouse
Diouane Y, Gratton S, Vicente LN (2015) Globally convergent evolution strategies. Math Program 152:467–490
Diouane Y, Gratton S, Vicente LN (2015) Globally convergent evolution strategies for constrained optimization. Comput Optim Appl 62:323–346
Ernst O, Gander MJ (2012) Why it is difficult to solve Helmholtz problems with classical iterative methods. In: Lakkis O, Graham I, Hou T, Scheichl R (eds) Numerical analysis of multiscale problems. Springer, New York, pp 325–363
Gerstoft P (1994) Nonlinear inversion for ocean bottom properties. J Acoust Soc Am 95:770–782
Hansen N (2011) The CMA evolution strategy: a tutorial. Available via URL: https://www.lri.fr/ hansen/cmatutorial.pdf
Hansen N, Arnold DV, Auger A (to appear) Evolution strategies. In Kacprzyk J, Pedrycz W (eds) Handbook of computational intelligence. Springer, Berlin
Hansen N, Auger A, Ros Raymond R, Finck S, Pošík P (2010) Comparing results of 31 algorithms from the black-box optimization benchmarking bbob-2009. In: Proceedings of the 12th annual conference companion on genetic and evolutionary computation, GECCO ’10. ACM, New York, pp 1689–1696
Hansen N, Ostermeier A, Gawelczyk A (1995) On the adaptation of arbitrary normal mutation distributions in evolution strategies: the generating set adaptation. In Eshelman L (ed) Proceedings of the sixth international conference on genetic algorithms. Pittsburgh, pp 57–64
Henderson A (2007) ParaView guide. A parallel visualization application. Kitware Inc, Clifton Park
Kennett BL, Sambridge MS, Williamson PR (1988) Subspace methods for large inverse problems with multiple parameter classes. Geophys J Int 94:237–247
Lago R (2013) A study on block flexible iterative solvers with application to earth imaging problem in geophysics. PhD thesis. Institut National Polytechnique de Toulouse
Lambaré G (2008) Stereotomography. Geophysics 73:VE25–VE34
Mulder WA, Plessix RE (2008) Exploring some issues in acoustic full waveform inversion. Geophys Prospect 56:827–841
Nabi OM, Xiaodong L (2010) A comparative study of CMA–ES on large scale global optimisation. In Jiuyong Li (ed) Australasian conference on artificial intelligence, vol 6464 of Lecture Notes in Computer Science. Springer, pp 303–312
Nolet G (ed) (1987) Seismic tomography: with applications in global seismology and exploration geophysics. D. Reidel publishing Company, Dordrecht
Oldenburg DW, McGillivray PR, Ellis RG (1993) Generalized subspace methods for large-scale inverse problems. Geophys J Int 114:12–20
Operto S, Virieux J, Dessa JX, Pascal G (2006) Crustal seismic imaging from multifold ocean bottom seismometer data by frequency domain full waveform tomography: application to the eastern Nankai Trough. J Geophys Res 159:1032–1056
Pinel X (2010) A perturbed two-level preconditioner for the solution of three-dimensional heterogeneous Helmholtz problems with applications to geophysics. PhD thesis. Institut National Polytechnique de Toulouse
Pratt GR, Worthington MH (1990) Inverse theory applied to multi-source cross-hole tomography. Part 1: acoustic wave-equation method. Geophys Prospect 38(3):287–310
Pratt RG (1999) Seismic waveform inversion in the frequency domain, Part 1: theory and verification in a physical scale model. Geophysics 64:888–901
Rauber T, Rünger G (2013) Parallel programming: for multicore and cluster systems, 2nd edn. Springer, Berlin
Ravaut C, Operto S, Improta L, Virieux J, Herrero A, Dell’Aversana P (2004) Multiscale imaging of complex structures from multifold wide-aperture seismic data by frequency-domain full-waveform tomography: application to a thrust belt. Geophys J Int 159:1032–1056
Rechenberg I (1973) Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution. Frommann-Holzboog
Rios LM, Sahinidis NV (2013) Derivative-free optimization: a review of algorithms and comparison of software implementations. J Glob Optim 56:1247–1293
Röth G (1994) Application of neural networks to seismic inverse problems. PhD thesis. Institut de Physique du Globe de Paris
Shin C, Cha YH (2008) Waveform inversion in the Laplace domain. Geophys J Int 173:922–931
Shin C, Ha W (2008) A comparison between the behavior of objective functions for waveform inversion in the frequency and Laplace domains. Geophysics 73:VE119–VE133
Shin C, Ha W (2012) Laplace-domain full-waveform inversion of seismic data lacking low-frequency information. Geophysics 77:R199–R206
Sirgue L (2006) The importance of low frequency and large offset in waveform inversion. In: Extended abstracts, vol A037 of 68th confererence & technical exhibition. EAGE
Sirgue L, Pratt RG (2004) Efficient waveform inversion and imaging: a strategy for selecting temporal frequencies. Geophysics 69:231–248
Sjogreen B, Petersson NA (2014) Source estimation by full wave form inversion. J Sci Comput 59:247–276
Skilling J, Bryan RK (1984) Maximum entropy image reconstruction: general algorithm. Mon Not R Astron Soc 211:111–124
Sourbier F, Operto S, Virieux J, Amestoy P, L’ JY (2009) Excellent. FWT2D: a massively parallel program for frequency-domain Full-Waveform Tomography of wide-aperture seismic data—part 1: algorithm. Comput Geosci 35:487–495
Sourbier F, Operto S, Virieux J, Amestoy P, L’ JY (2009) Excellent. FWT2D: a massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data: part 2: numerical examples and scalability analysis. Comput Geosci 35:496–514
Tarantola A (2005) Inverse problem theory and methods for model parameter estimation. SIAM, Philadelphia
Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. In: Proceedings of the sixth international conference on computer vision. IEEE Computer Society, Washington, DC, pp. 839–846
Virieux J, Operto S (2009) An overview of full-waveform inversion in exploration geophysics. Geophysics 74:WCC1–WCC26
Acknowledgments
The authors would like to acknowledge GENCI (Grand Equipement National de Calcul Intensif) for letting us used the CURIE computer at CCRT, Bruyères-le-Châtel, France. This work was granted access to the HPC resources of CCRT under allocation 2014065068 made by GENCI. Support for L. N. Vicente was provided by FCT under Grants PTDC/MAT/116736/2010 and PEst-C/MAT/UI0324/2011 and by the Réseau Thématique de Recherche Avancée, Fondation de Coopération Sciences et Technologies pour l’Aéronautique et l’Espace, under the grant ADTAO.
Author information
Authors and Affiliations
Corresponding author
Appendix
Appendix
Theorem 6.1
Given \(n \in {\mathbb {N}}, N \in {\mathbb {N}}\) with \(n < N\) and \(\delta =[N/n],\) the coefficient matrix \(C \in {\mathbb {R}}^{n \times n}\) defined as
is nonsingular. (Note that for \(j=1\) we have used the convention \(sin(0)/0=1\) , i.e., \(C_{i1} = 1\).)
Proof
A direct calculation shows that
where \(M \in {\mathbb {R}}^{n \times n}\) is defined as \(M_{ij}= \cos (\frac{\pi }{N}(j-1) (i-\frac{1}{2})\delta )\). The nonsingularity of M can be deduced from the properties of DCT-III (Britanak et al. 2006). \(\square \)
Rights and permissions
About this article
Cite this article
Diouane, Y., Gratton, S., Vasseur, X. et al. A parallel evolution strategy for an earth imaging problem in geophysics. Optim Eng 17, 3–26 (2016). https://doi.org/10.1007/s11081-015-9296-8
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11081-015-9296-8