
1 Introduction

Solving structural gravity and magnetic inverse problems is of great importance for studying the structure of the Earth’s crust [1,2,3].

This paper deals with the problem of finding the interface between two layers with different magnetizations from the known magnetization contrast, the known asymptotic depth of the interface, and the measured magnetic field [4, 5].

The problem is described by a nonlinear integral equation of the first kind and is therefore ill-posed, so iterative regularization methods must be used [6].

Real observations cover large areas. Finer grids are needed to increase accuracy and detail, which leads to large data sets. Modern computing technologies and parallel computations make it possible to reduce the computation time significantly.

An effective method to determine the structural boundary in the case of arbitrarily directed magnetization was constructed in [7, 8] on the basis of the linearized conjugate gradient method.

A time-efficient componentwise gradient method for solving gravity inverse problems was constructed in [9]. In the present paper, we apply this method to the magnetic inverse problem of finding a magnetization interface in the case of arbitrarily directed magnetization. We also modify the method to improve its performance: the modification consists of offsetting the component indices according to the direction of the magnetization vector.

Moreover, we construct a parallel algorithm based on the modified componentwise method and implement it on the Intel CPUs and NVIDIA Tesla GPUs of the Uran supercomputer, which is installed at the Institute of Mathematics and Mechanics of the Ural Branch of the Russian Academy of Sciences. We also investigate the efficiency and speedup of the parallel algorithm and compare it with a conjugate gradient-based algorithm in terms of the number of iterations and computation time.

Fig. 1. Two-layer medium for the magnetic problem

2 Problem Statement

Let us introduce a Cartesian coordinate system in which the xOy plane coincides with the Earth’s surface and the z axis is directed downwards, as shown in Fig. 1. Assume that the lower half-space consists of two layers with constant magnetizations \(J_1\) and \(J_2\), separated by the unknown surface, which is described by a bounded function \(\zeta =\zeta (x,y)\) with \(\lim \limits _{| x |+ | y | \rightarrow \infty } (h-\zeta (x,y))=0\) for some h, i.e., the surface approaches the asymptotic plane \(z=h\) at infinity.

The function \(\zeta \) must satisfy the following equation:

$$\begin{aligned} \varDelta Z(x',y',0)&= \frac{1}{4\pi } \int \limits _{-\infty }^{\infty } \int \limits _{-\infty }^{\infty } \Bigg [ \frac{\varDelta J_x(x-x')+ \varDelta J_y(y-y')-\varDelta J_z h}{\big ((x'-x)^2+(y'-y)^2+h^2\big )^{3/2}} {}\nonumber \\&- \frac{\varDelta J_x(x-x')+\varDelta J_y(y-y')-\varDelta J_z \zeta (x,y) }{\big ((x'-x)^2+(y'-y)^2+\zeta ^2(x,y)\big )^{3/2}} \Bigg ]\,dx\,dy, \end{aligned}$$
(1)

where \(\varDelta J_x,\varDelta J_y,\varDelta J_z\) are the components of the magnetization contrast \(\varDelta J=J_2-J_1\), and \(\varDelta Z(x,y,0)\) is the vertical component of the anomalous magnetic field measured at the Earth’s surface.

The anomalous field is extracted from the measured magnetic data in a preprocessing step, using the technique described and implemented in [10].

Equation (1) is a nonlinear two-dimensional integral equation of the first kind.

After the region \(\varPi =\{(x,y): a\leqslant x \leqslant b, c\leqslant y \leqslant d\}\) is discretized on an \(n=M\times N\) grid and the integral operator is approximated by quadrature rules, we obtain the right-hand-side vector F and the vector z of the approximate solution, both of dimension n. Equation (1) can thus be written as

$$\begin{aligned} \varDelta F_i&=\frac{\varDelta x \varDelta y}{4\pi } \sum \limits _{j=1}^{n} \Bigg [ \frac{\varDelta J_x(x_i-x_j)+ \varDelta J_y(y_i-y_j)-\varDelta J_z h}{\big ((x_i-x_j)^2+(y_i-y_j)^2+h^2\big )^{3/2}} {}\nonumber \\&- \frac{\varDelta J_x(x_i-x_j)+\varDelta J_y(y_i-y_j)-\varDelta J_z z_j }{\big ((x_i-x_j)^2+(y_i-y_j)^2+z_j^2\big )^{3/2}} \Bigg ] ,\quad i=1,\ldots ,n. \end{aligned}$$
(2)

We can rewrite the equation as

$$\begin{aligned} A(z)=F. \end{aligned}$$
(2a)
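The discretized operator (2) can be evaluated by plain double summation. The Python sketch below does this on a small grid to make the \(O(n^2)\) cost of one evaluation explicit; the function names and the row-major flattening \(k = i_x + M i_y\) are assumptions made for illustration, and the sign convention follows (2) as printed.

```python
import math
from typing import List, Tuple


def make_grid(M: int, N: int, dx: float, dy: float) -> Tuple[List[float], List[float]]:
    """Node coordinates of an M x N grid, flattened as k = ix + M * iy (hypothetical helper)."""
    xs = [ix * dx for iy in range(N) for ix in range(M)]
    ys = [iy * dy for iy in range(N) for ix in range(M)]
    return xs, ys


def forward_operator(z: List[float], xs: List[float], ys: List[float],
                     dJ: Tuple[float, float, float], h: float,
                     dx: float, dy: float) -> List[float]:
    """Evaluate A(z) of Eq. (2) by direct O(n^2) summation."""
    jx, jy, jz = dJ
    coef = dx * dy / (4.0 * math.pi)
    values = []
    for i in range(len(z)):            # observation node on the surface z = 0
        total = 0.0
        for j in range(len(z)):        # source node on the interface
            ddx = xs[i] - xs[j]
            ddy = ys[i] - ys[j]
            rho2 = ddx * ddx + ddy * ddy
            horiz = jx * ddx + jy * ddy
            plane = (horiz - jz * h) / (rho2 + h * h) ** 1.5          # asymptotic plane term
            surface = (horiz - jz * z[j]) / (rho2 + z[j] * z[j]) ** 1.5  # current surface term
            total += plane - surface
        values.append(coef * total)
    return values
```

For a flat interface \(z_j \equiv h\) the two terms cancel exactly and \(A(z)=0\); raising the interface at a single node under vertical magnetization produces a positive anomaly that peaks directly above that node.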

3 Numerical Methods for the Solution of the Problem

3.1 Linearized Conjugate Gradient Method

The linearized conjugate gradient method (LCGM) has the following form [11]:

$$\begin{aligned} z^{k+1}&= z^k - \psi \frac{\langle p^k, S(z^k)\rangle }{\left\| A'(z^k)p^k\right\| ^2}p^k,\nonumber \\ p^k&=S(z^k)+\beta ^k p^{k-1},\nonumber \\ p^0&=S(z^0),\\ \beta ^k&=\max \left\{ \frac{\bigl \langle S(z^k),\bigl (S(z^k)-S(z^{k-1})\bigr ) \bigr \rangle }{\left\| S(z^{k-1})\right\| ^2},0 \right\} ,\nonumber \\ S(z)&=A'(z)^T\bigl (A(z)-F\bigr ),\nonumber \end{aligned}$$
(3)

where \(z^k\) is the approximation of the solution in the kth iteration, \(k \in \mathbb {N}\), and \(\psi \) is a damping factor.
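Iteration (3) can be sketched in a few lines of Python. This is a minimal dense-matrix illustration, assuming hypothetical callables `A` and `jac` that return \(A(z)\) and the Jacobian \(A'(z)\) as nested lists; unlike the implementations discussed in this paper, it stores the Jacobian explicitly, so it only suits tiny problems. The stopping rule is the relative residual \(\Vert A(z)-F\Vert /\Vert F\Vert <\varepsilon \).

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def rmatvec(m: Matrix, v: Vector) -> Vector:
    """Compute m^T v without forming the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def lcgm(A: Callable[[Vector], Vector], jac: Callable[[Vector], Matrix],
         F: Vector, z0: Vector, psi: float = 1.0, eps: float = 1e-10,
         max_iter: int = 200) -> Tuple[Vector, int]:
    """Linearized conjugate gradient iteration (3) with dense Jacobians."""
    z = list(z0)
    norm_f = dot(F, F) ** 0.5
    s_prev: Vector = []
    p: Vector = []
    for k in range(max_iter):
        r = [a - f for a, f in zip(A(z), F)]
        if dot(r, r) ** 0.5 < eps * norm_f:        # relative-residual stopping rule
            return z, k
        J = jac(z)
        s = rmatvec(J, r)                           # S(z) = A'(z)^T (A(z) - F)
        if not p:
            p = s                                   # p^0 = S(z^0)
        else:
            sp2 = dot(s_prev, s_prev)
            diff = [a - b for a, b in zip(s, s_prev)]
            beta = max(dot(s, diff) / sp2, 0.0) if sp2 > 0.0 else 0.0
            p = [a + beta * b for a, b in zip(s, p)]
        jp = matvec(J, p)
        denom = dot(jp, jp)
        if denom == 0.0:                            # no further descent possible
            return z, k
        step = psi * dot(p, s) / denom
        z = [zi - step * pi for zi, pi in zip(z, p)]
        s_prev = s
    return z, max_iter
```

For a linear operator \(A(z)=Bz\), the step length is the exact line-search step and \(\beta ^k\) is the (nonnegative) Polak-Ribière coefficient, so the sketch behaves like a conjugate gradient method on the normal equations; for instance, \(B=\begin{pmatrix}4&1\\1&3\end{pmatrix}\), \(F=(1,2)\) is solved, up to rounding, in two iterations.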

A parallel algorithm based on this method was developed and implemented in [8] for NVIDIA GPUs using CUDA technology.

3.2 Componentwise Gradient Method

The componentwise gradient method (CWM) has the following form [9]:

$$\begin{aligned} z^{k+1}_i=z^{k}_i- \psi \frac{A_i(z^k)-F_i}{\left\| \nabla A_i(z^k) \right\| ^2} \left( \frac{\partial A_i(z^k)}{\partial z_i}\right) , \end{aligned}$$
(4)

where \(z_i\) is the ith component of the solution approximation, \(i=1,\ldots ,n\), \(k \in \mathbb {N}\), and \(\psi \) is a damping factor.

The main idea of this method is to reduce the residual \(A_i(z)-F_i\) at a grid node i by changing only the value \(z_i\) at this node. The idea relies on the fact that the gravity field, and the magnetic field in the case of vertically directed magnetization, depends on \(1/r^2\). Thus, \(z_i\) has the greatest influence on the field value \(F_i\) at the node directly above it. In the case of arbitrarily directed magnetization, the correlation between \(z_i\) and \(F_i\) is weaker, so the method is less effective than for vertical magnetization.
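The componentwise update (4) can be sketched in the same dense setting (all names are hypothetical). Every component is corrected from the same iterate \(z^k\), as (4) prescribes. The optional `target` argument, which maps node i to the node whose residual drives the update of \(z_i\), is an addition for illustration: with `target(i) = j` taken from (5), the same step implements the modified method.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def cwm_step(A: Callable[[Vector], Vector], jac: Callable[[Vector], Matrix],
             F: Vector, z: Vector, psi: float = 1.0,
             target: Optional[Callable[[int], int]] = None) -> Vector:
    """One sweep of update (4): every z_i is corrected from the same iterate."""
    a = A(z)
    J = jac(z)
    new_z = list(z)
    for i in range(len(z)):
        j = i if target is None else target(i)
        row = J[j]                                  # gradient of A_j with respect to z
        row_norm2 = sum(v * v for v in row)
        if row_norm2 > 0.0:
            new_z[i] = z[i] - psi * (a[j] - F[j]) / row_norm2 * row[i]
    return new_z


def cwm(A: Callable[[Vector], Vector], jac: Callable[[Vector], Matrix],
        F: Vector, z0: Vector, psi: float = 1.0, eps: float = 1e-10,
        max_iter: int = 500,
        target: Optional[Callable[[int], int]] = None) -> Tuple[Vector, int]:
    """Repeat cwm_step until ||A(z) - F|| / ||F|| < eps."""
    z = list(z0)
    norm_f = sum(f * f for f in F) ** 0.5
    for k in range(max_iter):
        r = [a - f for a, f in zip(A(z), F)]
        if sum(v * v for v in r) ** 0.5 < eps * norm_f:
            return z, k
        z = cwm_step(A, jac, F, z, psi, target)
    return z, max_iter
```

Each update needs only one row of the Jacobian, which is what makes the method cheap per iteration and easy to parallelize over nodes.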

3.3 Modified Componentwise Gradient Method

Let us find the approximate position of a node j at which \(F_j\) is most strongly influenced by \(z_i\) in the case of arbitrarily directed magnetization. This node is displaced from node i by the biases \(\bar{x}\) and \(\bar{y}\). To find \(\bar{x}\), we solve the following problem:

$$\begin{aligned} \bar{x} = \mathop {\mathrm{arg\,max}} \limits _{x} \bigg [ \frac{\varDelta J_x x+\varDelta J_z h }{\big (x^2+h^2\big )^{3/2}} \bigg ], \end{aligned}$$

where x is the offset of the observation point from node i along the x axis; the bracketed expression is the second term of the integrand in (1), including its minus sign, for a source at the origin, \(\zeta = h\), and zero offset along y.

The necessary condition for a maximum is

$$\begin{aligned} \frac{d}{dx} \bigg [ \frac{\varDelta J_x x+\varDelta J_z h }{\big (x^2+h^2\big )^{3/2}} \bigg ] = 0. \end{aligned}$$

Differentiating, we obtain

$$\begin{aligned} - \frac{\varDelta J_x(2x^2-h^2)+3 \varDelta J_z h x}{\big (x^2+h^2\big )^{5/2}} = 0. \end{aligned}$$

For nonvertical magnetization (\(\varDelta J_x\ne 0\)), \(x=0\) is not a root. Since the interface lies below the Earth’s surface, i.e., \(h>0\), the denominator is positive, so the numerator must vanish:

$$\begin{aligned} \varDelta J_x(2x^2-h^2)+3 \varDelta J_z h x = 0. \end{aligned}$$

The roots of this quadratic equation are

$$\begin{aligned} \bar{x}_{1,2}= \frac{\big (-3 \varDelta J_z \pm \sqrt{8 \varDelta J_x^2 +9 \varDelta J_z^2}\big )h}{4 \varDelta J_x}. \end{aligned}$$

Assume that \(\varDelta J_z>0\). Then the bias must have the same sign as the horizontal component, i.e., \(\mathop {\mathrm{sgn}}(\varDelta J_x)=\mathop {\mathrm{sgn}}(\bar{x})\). Only the first root (with the plus sign) satisfies this condition, because its numerator \(-3\varDelta J_z+\sqrt{8\varDelta J_x^2+9\varDelta J_z^2}\) is positive. For \(\varDelta J_z<0\), the second root (with the minus sign) is taken.

The \(\bar{y}\) bias can be found in the same way. We can now write the modified componentwise gradient method (MCWM) as follows:

$$\begin{aligned} z^{k+1}_i&=z^{k}_i- \psi \frac{A_j(z^k)-F_j}{\left\| \nabla A_j(z^k) \right\| ^2} \left( \frac{\partial A_j(z^k)}{\partial z_i}\right) , \nonumber \\ j&= i + M \frac{\big (-3 \varDelta J_z + \mathop {\mathrm{sgn}} (\varDelta J_z) \sqrt{8 \varDelta J_y^2 +9 \varDelta J_z^2}\big )h}{4 \varDelta J_y \varDelta y} \\&+\,\frac{\big (-3 \varDelta J_z + \mathop {\mathrm{sgn}} (\varDelta J_z) \sqrt{8 \varDelta J_x^2 +9 \varDelta J_z^2}\big )h}{4 \varDelta J_x \varDelta x},\nonumber \end{aligned}$$
(5)

where \(\varDelta x\) and \(\varDelta y\) are the grid element sizes.

We must also check whether the offset indices fall outside the grid; if they do, the boundary values are used.
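Because the offset in (5) depends only on the magnetization contrast, h, and the grid steps, it can be computed once before the iterations start. The Python sketch below (hypothetical names) evaluates the selected root in an algebraically equivalent form obtained by rationalizing the numerator, \(\mathop {\mathrm{sgn}}(\varDelta J_z)\,2\varDelta J_x h/\big (3|\varDelta J_z|+\sqrt{8\varDelta J_x^2+9\varDelta J_z^2}\big )\), which avoids cancellation and gives zero bias for vertical magnetization without dividing by zero. Row-major flattening \(k=i_x+Mi_y\), rounding the offsets to the nearest node, and clamping each coordinate to the grid are assumptions of this sketch.

```python
import math
from typing import Tuple


def bias(dJ_h: float, dJ_z: float, h: float) -> float:
    """Bias along one horizontal axis: the root selected in (5).

    Equal to (-3 dJ_z + sgn(dJ_z) sqrt(8 dJ_h^2 + 9 dJ_z^2)) h / (4 dJ_h),
    rewritten so that dJ_h = 0 (vertical magnetization) returns 0.
    """
    if dJ_z == 0.0:
        raise ValueError("the derivation assumes a nonzero vertical contrast")
    root = math.sqrt(8.0 * dJ_h ** 2 + 9.0 * dJ_z ** 2)
    return math.copysign(1.0, dJ_z) * 2.0 * dJ_h * h / (3.0 * abs(dJ_z) + root)


def target_index(i: int, M: int, N: int, dx: float, dy: float,
                 dJ: Tuple[float, float, float], h: float) -> int:
    """Index j of (5) for node i (k = ix + M * iy), clamped to the grid."""
    jx, jy, jz = dJ
    ix, iy = i % M, i // M
    # Rounding to the nearest node is an assumption; (5) leaves it implicit.
    tx = ix + round(bias(jx, jz, h) / dx)
    ty = iy + round(bias(jy, jz, h) / dy)
    tx = min(max(tx, 0), M - 1)      # offsets that leave the grid
    ty = min(max(ty, 0), N - 1)      # are moved to the boundary nodes
    return tx + M * ty
```

For \(\varDelta J=(0.71,0.71,1)\) and \(h=10\) with unit grid steps, both biases are about 2.15 grid steps, so node 0 of an \(8\times 8\) grid is mapped to node \(2+8\cdot 2=18\), while the corner node 63 stays clamped at 63.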

4 Parallel Implementation

The parallel algorithms based on the componentwise methods were implemented on a multicore CPU using OpenMP and on NVIDIA Tesla M2090 GPUs using CUDA.

Note that storing the full Jacobian matrix for a \(2^9 \times 2^9\) grid (\(n=2^{18}\) unknowns, hence \(n^2=2^{36}\) entries) takes more than 512 GB in double precision.
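The figure follows from simple arithmetic, assuming 8-byte double-precision entries. The helper name below is hypothetical.

```python
def dense_jacobian_bytes(M: int, N: int, bytes_per_entry: int = 8) -> int:
    """Memory needed to store the full n x n Jacobian of A(z), with n = M * N."""
    n = M * N
    return n * n * bytes_per_entry


size = dense_jacobian_bytes(2**9, 2**9)
print(f"{size} bytes = {size / 2**30:.0f} GiB = {size / 1e9:.1f} GB")
# prints: 549755813888 bytes = 512 GiB = 549.8 GB
```

That is, \(2^{39}\) bytes, which is exactly 512 GiB or about 550 GB, far beyond the memory of a single GPU.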

In the constructed algorithms, the elements of the Jacobian matrix are calculated on the fly: each element is computed when it is needed and is never stored in memory.

The most expensive operation is computing the values of the integral operator and its Jacobian matrix. This operation consists of four nested loops. In the OpenMP implementation, the outer loops are parallelized with ‘#pragma omp parallel’ directives, whereas the inner loops are vectorized with ‘#pragma simd’ directives. When multiple GPUs are used, the two outer loops are distributed among the GPUs, and the two inner loops are executed on each GPU. The CPU transfers the data between the host memory and the GPUs and then launches the kernel functions.
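The loop structure can be illustrated with a small Python sketch (all names hypothetical). In this sketch, the two outer loops run over observation nodes and are split into contiguous blocks of grid rows, one per worker, mirroring the distribution of the outer loops among GPUs; the two inner loops run over source nodes and compute each Jacobian entry \(\partial A_i/\partial z_j\) on the fly from the analytic derivative of the kernel in (2), so the Jacobian is never stored. Assigning observation nodes to the outer loops is an assumption, and in CPython the threads do not speed up this CPU-bound loop because of the global interpreter lock; the sketch shows only the partitioning, not the OpenMP/CUDA performance.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Contrast = Tuple[float, float, float]


def row_quantities(ix: int, iy: int, z: List[float], M: int, N: int,
                   dx: float, dy: float, dJ: Contrast, h: float) -> Tuple[float, float]:
    """Return A_i(z) of Eq. (2) and ||grad A_i(z)||^2 for observation node (ix, iy)."""
    jx, jy, jz = dJ
    coef = dx * dy / (4.0 * math.pi)
    a_i = 0.0
    grad2 = 0.0
    for sy in range(N):                  # two inner loops over source nodes
        for sx in range(M):
            zj = z[sx + M * sy]
            ddx = (ix - sx) * dx
            ddy = (iy - sy) * dy
            rho2 = ddx * ddx + ddy * ddy
            c = jx * ddx + jy * ddy
            r2 = rho2 + zj * zj
            a_i += (c - jz * h) / (rho2 + h * h) ** 1.5 - (c - jz * zj) / r2 ** 1.5
            # Jacobian entry dA_i/dz_j, computed on the fly and discarded
            d = coef * (jz / r2 ** 1.5 + 3.0 * zj * (c - jz * zj) / r2 ** 2.5)
            grad2 += d * d
    return coef * a_i, grad2


def all_rows(z: List[float], M: int, N: int, dx: float, dy: float,
             dJ: Contrast, h: float, workers: int = 2) -> List[Tuple[float, float]]:
    """Two outer loops over observation nodes, split by grid rows among workers."""
    bounds = [(w * N // workers, (w + 1) * N // workers) for w in range(workers)]

    def block(rows: Tuple[int, int]) -> List[Tuple[float, float]]:
        lo, hi = rows
        return [row_quantities(ix, iy, z, M, N, dx, dy, dJ, h)
                for iy in range(lo, hi) for ix in range(M)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(block, bounds))
    return [q for part in parts for q in part]    # ordered as k = ix + M * iy
```

Splitting by whole grid rows keeps each worker's block contiguous in the flattened index, and the result does not depend on the number of workers.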

Adjusting the kernel execution parameters to the grid size is an important problem. In [12], we proposed a method for adjusting these parameters automatically; it is based on rescaling the optimal parameters found for a reference grid size.

This method imposes the following constraints on the input data and the GPU configuration:

  • the grid size must be divisible by 128 (\(128, 256, 512, 1024, \ldots \));

  • the number of GPUs must be a power of 2 (\(1, 2, 4, 8,\ldots \)).
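Both constraints are easy to check before a run. In the sketch below (hypothetical name), applying the divisibility condition to both grid dimensions M and N is an assumption, since the text speaks of a single grid size.

```python
def check_configuration(M: int, N: int, num_gpus: int) -> None:
    """Raise ValueError if the grid or GPU count violates the constraints above."""
    for size in (M, N):
        if size <= 0 or size % 128 != 0:
            raise ValueError(f"grid dimension {size} is not a positive multiple of 128")
    # A positive integer is a power of 2 exactly when it has a single set bit.
    if num_gpus <= 0 or (num_gpus & (num_gpus - 1)) != 0:
        raise ValueError(f"{num_gpus} GPUs is not a power of 2")
```

For example, a \(512\times 512\) grid on 8 GPUs passes, whereas 3 GPUs or a grid dimension of 500 is rejected.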

5 Numerical Experiments

The model problems consisted of finding the interface between two layers. Figure 2 shows the model surface \(z^*\) used in all model problems.

Figures 3, 4, 5, 6 and 7 show the model magnetic fields \(\varDelta Z_i(x,y,0)\). These fields were obtained by solving the direct problem for the surface with the asymptotic plane \(h=10\) km and the following magnetization contrasts:

$$\begin{aligned} \varDelta J_1&=(0,0,1)\,\mathrm{A/m} , \\ \varDelta J_2&=(0.19,0.19,1)\,\mathrm{A/m} , \\ \varDelta J_3&=(0.41,0.41,1)\,\mathrm{A/m} , \\ \varDelta J_4&=(0.71,0.71,1)\,\mathrm{A/m} , \\ \varDelta J_5&=(1.23,1.23,1)\,\mathrm{A/m} . \end{aligned}$$

These contrasts correspond to angles of approximately \(0^{\circ }\), \(15^{\circ }\), \(30^{\circ }\), \(45^{\circ }\), and \(60^{\circ }\) between the magnetization contrast vector and the vertical.
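The listed contrasts can be checked against the stated angles. The sketch below (hypothetical name) assumes that the angle is measured between \(\varDelta J\) and the vertical and that \(\varDelta J_x=\varDelta J_y\), as in all five contrasts; the horizontal magnitude is then \(\sqrt{2}\,\varDelta J_x\), so \(\tan \theta =\sqrt{2}\,\varDelta J_x/\varDelta J_z\).

```python
import math
from typing import Tuple


def contrast_for_angle(angle_deg: float, dJ_z: float = 1.0) -> Tuple[float, float, float]:
    """Contrast (t, t, dJ_z) whose angle with the vertical equals angle_deg."""
    t = dJ_z * math.tan(math.radians(angle_deg)) / math.sqrt(2.0)
    return (t, t, dJ_z)


for angle in (0, 15, 30, 45, 60):
    t, _, _ = contrast_for_angle(angle)
    print(f"{angle:2d} deg: {t:.3f} A/m")
```

The printed horizontal components are 0.000, 0.189, 0.408, 0.707 and 1.225 A/m, which agree with the listed values to within about 0.01 A/m.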

The problems were solved on nodes of the Uran supercomputer (two eight-core Intel Xeon E5-2660 CPUs and eight NVIDIA Tesla M2090 GPUs) using the following three methods:

  • linearized conjugate gradient method (LCGM) (3);

  • componentwise gradient method (CWM) (4);

  • modified componentwise gradient method (MCWM) (5).

The reconstructed interfaces are shown in Fig. 8.

The condition \({\left\| A(z)-F \right\| }/{\left\| F\right\| }<\varepsilon \), \(\varepsilon = 0.011\), was used as the termination criterion for all methods. The damping factor \(\psi \) was set to 0.85 in the LCGM for \(60^{\circ }\) and in the CWM and MCWM for \(45^{\circ }\); in the CWM and MCWM for \(60^{\circ }\), it was set to 0.75. In all other cases, \(\psi = 1\).

For all solutions, the relative error \(\delta = \Vert z-z^*\Vert /\Vert z^*\Vert \) is less than 0.01.

Fig. 2. The original surface \(z^*\)

Fig. 3. Model magnetic field for an angle of \(0^{\circ }\)

Fig. 4. Model magnetic field for an angle of \(15^{\circ }\)

Fig. 5. Model magnetic field for an angle of \(30^{\circ }\)

Fig. 6. Model magnetic field for an angle of \(45^{\circ }\)

Fig. 7. Model magnetic field for an angle of \(60^{\circ }\)

Fig. 8. Reconstructed surfaces for various magnetization angles

Table 1 summarizes the numbers of iterations N and average execution times T for 10 runs on two eight-core Intel E5-2660 CPUs (16 cores) with a \(512 \times 512\) grid.

Speedup and efficiency coefficients are used to analyse the scaling of the parallel algorithms. The speedup is defined as \(S_m = T_1 / T_m\), where \(T_1\) is the execution time on one GPU and \(T_m\) is the execution time on m GPUs. The efficiency is defined as \(E_m = S_m / m\). The ideal values are \(S_m=m\) and \(E_m=1\); real values are usually lower because of parallelization overhead.
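Both coefficients follow directly from measured times. The sketch below (hypothetical name) uses made-up timings, not the values in Table 2, to show the computation.

```python
from typing import Dict, Tuple


def speedup_efficiency(times: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each GPU count m to (S_m, E_m), with S_m = T_1 / T_m and E_m = S_m / m."""
    t1 = times[1]
    return {m: (t1 / t, t1 / (t * m)) for m, t in sorted(times.items())}


# Hypothetical timings in minutes (not the values of Table 2).
example = speedup_efficiency({1: 80.0, 2: 41.0, 4: 21.0, 8: 9.5})
for m, (s, e) in example.items():
    print(f"m={m}: S={s:.2f}, E={e:.2f}")
```

In this made-up example, \(E_8\approx 1.05\): an efficiency above 1 appears whenever the single-GPU time \(T_1\) is relatively slow compared with the multi-GPU runs.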

Table 2 summarizes the average execution times for the CWM method on a \(512 \times 512\) grid for various numbers of GPUs.

Table 1. Comparison of methods
Table 2. Execution times (in minutes) of the parallel CWM algorithm on multiple GPUs

The experiments show that the constructed modified algorithms are effective. The componentwise methods require fewer operations and less time per iteration than the LCGM. For the model problems, the componentwise methods outperform the LCGM in both the number of iterations and the computation time. The parallel algorithms scale very well: for eight GPUs, the efficiency exceeds 100%. This is probably due to a nonoptimal automatic adjustment of the kernel execution parameters for some GPU configurations.

6 Conclusions

We constructed an original variant of the componentwise gradient method for the structural magnetic inverse problem of finding a contact surface in the case of arbitrarily directed magnetization.

We developed parallel algorithms based on the componentwise gradient method and its modified variant. The parallel algorithms were implemented on a multicore CPU, using OpenMP technology, and on multiple GPUs, using CUDA technology. Model problems with fine grids were solved. The parallel algorithms demonstrated an excellent scaling and nearly 100% efficiency.

The componentwise gradient methods (CWM and MCWM) are very effective for problems with a nearly vertical magnetization direction: in this case, they reduce the computation time by a factor of 2 to 4 compared with the LCGM. For larger magnetization angles, the modified componentwise gradient method (MCWM) shows better computation times than the unmodified componentwise method.