
1 Introduction

In the solution of many physical problems with neural networks (NN), it is necessary to reduce the dimension of the input data [1]. This usually yields a more accurate and robust solution while reducing computational complexity. In addition, such data preprocessing improves the generalization ability of the model.

Exploration geophysics uses methods based on the measurement of physical fields at the Earth's surface to determine the distribution of some physical quantity in the Earth's interior. Magnetotelluric sounding (MTS) is one such method; it reconstructs the electrical conductivity distribution from the properties of electromagnetic fields measured at the surface. However, this reconstruction is an inverse problem (IP), which is often ill-posed or ill-conditioned and has high dimensionality in both input and output.

Such problems may be solved efficiently using NNs. Starting from the more general pioneering studies of the early 1990s [2] and the first investigations devoted to the NN solution of the MTS IP [3], subsequent studies of the MTS IP solution within the approximation approach have featured increasingly complex parameterization schemes, growing dimensionality of the IP, and various improvements in the methods of NN approximation [4,5,6,7,8]. However, an increase in the number of input features hampers the efficiency of approximation. Therefore, reducing the input dimensionality of the problem is an important part of building its solution.

The MTS data is often characterized by multicollinearity. Therefore, a method used to select significant input features should take into account the correlation between them. There are several methods to detect multicollinearity, and various approaches are used to solve this problem [9, 10].

Feature selection (FS) is a general approach that chooses the subset of features most important for the target variable by removing irrelevant and redundant ones. For high-dimensional data, the methods most often used for FS are filter methods [11]. This article studies a special filter-type method: it iteratively selects features with the highest Pearson correlation with the target variable and discards features with high mutual correlation.

In this study, we compare the quality of the NN solution to the MTS IP on the full set of input features and on its subsets. These subsets are created using the considered selection method, as well as using traditional FS methods, such as cross-correlation based selection of significant input features.

The primary objective of this study is to test, on the MTS IP, the effectiveness of the novel feature selection method, whose main contribution is that it takes feature multicollinearity into account. In addition, we consider the determination of the optimal parameters of the algorithm, and we compare the results obtained using this method with those of cross-correlation based FS and with the results obtained when neural networks are trained on the full data set.

2 Problem Statement

2.1 Parameterization Scheme

The MTS IP model considered in the present study is an integral part of a general model designed for joint application of three physical methods: magnetometry, gravimetry, and magnetotelluric sounding. To make simultaneous use possible, the problem must be formulated similarly for all the physical methods considered [12]. In this case, such a formulation consists in determining the structural boundaries that separate geological layers with constant values of the corresponding parameter: magnetization in magnetometry, density in gravimetry, and electrical resistivity in MTS.

The considered parameterization scheme was a 4-layer 2D model corresponding to a section of the Norilsk region, relevant in the context of ore exploration. The first layer modeled basalts; the second and fourth, terrigenous-carbonate deposits of the Tunguska series; the third, gabbro-dolerites with massive copper-nickel-platinum ores. The medium parameterization scheme is shown in Fig. 1 and described in more detail in [13,14,15].

Fig. 1. Parameterization scheme (depth z versus horizontal coordinate y, in kilometers).

The resistivity values of the layers were fixed, i.e., the same for the entire data set. The parameters to be determined were the depths of the layer boundaries h(y) along the section; the thickness of each layer was greater than zero for each y.

The depth values for each pattern of the training sample were set randomly within the considered ranges of the layer boundaries. Then the direct problem was solved by the finite difference method. In each case, six components of the EM field were calculated: the real and imaginary parts of the impedance tensor components ZYX (H polarization) and ZXY (E polarization) and of the tipper W [16, 17]. The calculation was performed for 13 frequencies ranging from 0.001 to 100 Hz.

2.2 Data

The data array was obtained by repeatedly solving the direct problem, as stated above. For each pattern, the direct 2D problem was solved for a random distribution of the depths of the layer boundaries [18].

Thus, the input dimension of the problem was:

$$ 6\ \text{field components (accounting for the complex-valued data)} \times 13\ \text{frequencies} \times 31\ \text{pickets} = 2418\ \text{attributes (features).} $$

It should be noted that due to the geometry and physics of the problem, many features are correlated with each other, which is an additional argument in favor of FS.

The output dimension of the problem was: 3 layers × 15 depths = 45 parameters.

A total of 30,000 patterns were calculated.
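
To make the data layout concrete, here is a minimal sketch in Python of how one pattern could be stored and flattened. The logarithmic spacing of the frequency grid and all variable names are our assumptions; the text states only the ranges and counts.

    import numpy as np

    # 13 frequencies from 0.001 to 100 Hz; logarithmic spacing is assumed.
    frequencies = np.logspace(-3, 2, 13)

    # Six real-valued components per frequency and picket:
    # Re/Im of ZYX, Re/Im of ZXY, Re/Im of the tipper W.
    n_components, n_freqs, n_pickets = 6, 13, 31

    # One pattern stored as (component, frequency, picket) and flattened
    # into a feature vector.
    pattern = np.zeros((n_components, n_freqs, n_pickets))
    features = pattern.reshape(-1)
    assert features.size == 2418  # matches the input dimension above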

3 Methods of Solving the Inverse Problem

3.1 The Use of Neural Networks

In this study, to solve the IP, we use the type of NN called a multilayer perceptron (MLP), which is known to be a universal approximator [19,20,21].

Here we apply the approach of autonomous determination of parameters, in which a separate single-output NN determines each target parameter independently. The architecture used was an MLP with a single output and 32 neurons in a single hidden layer. To reduce the influence of weight initialization, three NNs were trained in each considered case, and the statistical indicators of the solution quality of the 3 NNs were averaged. To prevent overfitting, training was stopped after 500 epochs with no improvement of the solution quality on the validation set.

The initial data set was divided into training, validation, and test sets in the ratio 70:20:10; the sizes of the sets were 21,000, 6,000, and 3,000 patterns, respectively.
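
As an illustration, a single-output network of the described architecture could be set up as follows. This is a sketch using Keras; the library choice, the tanh activation, the Adam optimizer, and the epoch budget are our assumptions, while the 32-neuron hidden layer, the single output, and the 500-epoch early stopping patience follow the text.

    from tensorflow import keras

    def make_mlp(n_features):
        # MLP with one hidden layer of 32 neurons and a single linear output.
        model = keras.Sequential([
            keras.layers.Input(shape=(n_features,)),
            keras.layers.Dense(32, activation="tanh"),
            keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")
        return model

    # Early stopping after 500 epochs with no improvement on the validation set.
    stop = keras.callbacks.EarlyStopping(patience=500, restore_best_weights=True)

    # One network per target parameter (45 in total); X_train, X_val, y_train,
    # y_val denote the 70:20:10 split described above, k is the target index.
    # model = make_mlp(X_train.shape[1])
    # model.fit(X_train, y_train[:, k], validation_data=(X_val, y_val[:, k]),
    #           epochs=100_000, callbacks=[stop], verbose=0)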

3.2 Description of the Iterative Feature Selection Algorithm

Traditionally [22, 23], among the methods of selecting essential attributes based on supervised training, three groups are distinguished: filter methods, embedded methods, and wrappers. Filter methods are highly computationally efficient; however, the feature sets selected by them may not be optimal.

Here we consider a method based on iterative feature selection (IFS), which takes into account multicollinearity of the input features [24]. Hereafter, by “correlation” we mean the Pearson correlation.

At the first step, the algorithm selects the feature with the highest correlation with the target variable. At the second step, all features whose correlation with the feature chosen at the first step is higher than some threshold value are excluded from the set.

This process is repeated either until the features run out, or until there are no features left in the initial set whose correlation with the target variable is greater than a certain threshold (Fig. 2).

Fig. 2. Scheme of the iterative feature selection algorithm: the most relevant feature is selected, features highly correlated with it are removed, and the loop repeats until a stopping criterion is satisfied.

Thus, the described method has two parameters that need to be set (a sketch of the procedure is given after the list):

1. The maximum allowable value of correlation with other input features, Txx.
2. The minimum allowable value of correlation with the target variable, Txy.
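
A minimal sketch of the whole procedure, with Txx and Txy as arguments; the default threshold values and all names below are purely illustrative.

    import numpy as np

    def iterative_feature_selection(X, y, t_xx=0.9, t_xy=0.1):
        # X: (patterns, features); y: a single target parameter (1-D array).
        n = X.shape[1]
        # Absolute Pearson correlation of every feature with the target.
        r_xy = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])
        candidates, selected = set(range(n)), []
        while candidates:
            # Step 1: the remaining feature most correlated with the target.
            best = max(candidates, key=lambda j: r_xy[j])
            if r_xy[best] < t_xy:  # relevance below Txy: stop
                break
            selected.append(best)
            candidates.remove(best)
            # Step 2: discard features whose correlation with the chosen
            # one exceeds Txx.
            for j in list(candidates):
                if abs(np.corrcoef(X[:, j], X[:, best])[0, 1]) > t_xx:
                    candidates.remove(j)
        return selected

The loop terminates either when the candidate set is empty or when no remaining feature passes the Txy threshold, matching the stopping criteria above.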

It should be noted that the studied algorithm has already been tested by the authors on an inverse problem of spectroscopy in their previous study [24], where it proved its efficiency. A similar FS method is discussed in [25].

3.3 Application of the Iterative Feature Selection Algorithm

For an efficient implementation of the described IFS algorithm, the absolute value of the Pearson correlation coefficient between each pair of input features was precomputed. Many input features, especially neighboring ones, indeed have a high degree of mutual correlation. This indicates redundancy of some features, which the developed FS method excludes from the feature set.
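
Assuming the feature matrix X introduced above, this precomputation amounts to a single call (np.corrcoef with rowvar=False treats columns as variables):

    import numpy as np

    # Absolute pairwise Pearson correlations between all 2418 input features.
    C = np.abs(np.corrcoef(X, rowvar=False))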

As specified above, the method discussed in this article has two threshold values to be set (Txx and Txy). We chose threshold values at which the number of selected features is 600 (about 25% of all input features). Since this can be achieved in several ways, we tried 4 different pairs of threshold values, and then the best of them were selected for further consideration.
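
One possible way to find such threshold pairs is a simple grid search over (Txx, Txy) using the IFS sketch above; the grid values and the tolerance here are hypothetical, not the pairs actually used in the study.

    import itertools

    def find_threshold_pairs(X, y, target=600, tol=10):
        # Return (Txx, Txy) pairs for which the IFS sketch above selects
        # approximately `target` features.
        pairs = []
        for t_xx, t_xy in itertools.product((0.80, 0.85, 0.90, 0.95),
                                            (0.02, 0.05, 0.10)):
            n_sel = len(iterative_feature_selection(X, y, t_xx, t_xy))
            if abs(n_sel - target) <= tol:
                pairs.append((t_xx, t_xy))
        return pairs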

4 Results of Solving the Inverse Problem

The quality of the NN solution of the inverse problem on the full feature set was compared with the quality obtained using the IFS algorithm and with cross-correlation based FS (CC). The CC method calculates the correlation of each input feature with the target variable and then either takes the specified number of features with the largest correlation values or takes all features whose correlation with the target variable exceeds a pre-defined threshold. The comparison of the three approaches is presented in Fig. 3.
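
For reference, the CC baseline in its "fixed number of features" form reduces to a sketch like the following (the names are ours):

    import numpy as np

    def cc_feature_selection(X, y, k=600):
        # Keep the k features with the largest absolute Pearson
        # correlation with the target variable.
        r_xy = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                         for j in range(X.shape[1])])
        return np.argsort(r_xy)[::-1][:k]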

Fig. 3. The quality of the solution (root mean squared error) of the inverse problem on the test set: the full set of input features (No sel.), cross-correlation feature selection (CC), and the iterative feature selection algorithm (IFS), for 600 selected features. Grouped bars with error bars are shown for layers 1, 2, and 3.

The results are provided separately for the real part of the fields, for the imaginary part of the fields, and for both parts of the input data considered simultaneously. In each case, the number of selected features was set to 600 for both the IFS and CC methods. It should be noted that the fourfold reduction in the number of input features relative to the full feature set, with either of the two FS methods, also brings a significant reduction in the computational resources needed to train the NN. As for the FS procedure itself, both methods require computing the correlation values for all the initial features, followed by selection based on different types of comparison of the obtained values. This selection stage requires comparable numbers of comparison operations for the two methods, so the total amount of computational resources required by the two methods is nearly the same.

The presented results are for three blocks on the central vertical line, placed one above another and equally spaced from the edges of the section. For all methods, the quality of the solution decreases with increasing layer number (i.e., depth). This is due to the distortions introduced by the upper layers into the readings for the lower ones.

5 Conclusions

Solving the inverse problem of MTS using the IFS algorithm considered in this article gives better results than solving this IP on the full set of input features. Another popular FS method (cross-correlation) does not show equally good results because of the high multicollinearity of this problem. Thus, the IFS algorithm makes it possible to reduce the error of the NN solution of the MTS IP while decreasing the input dimensionality.