Keywords

1 Introduction

Kriging is a geostatistical interpolation technique that predicts the value of observations in unknown locations, based on previously collected data. The kriging error or interpolation error is minimized by studying and modeling the spatial distribution of points already obtained. This spatial distribution or spatial variation is expressed in the form of an experimental variogram [9].

The variogram is the basis for the application of the kriging method. Thus, the kriging process is defined in three main steps. First, the experimental variogram is calculated. Then, the theoretical variogram is modeled to represent the experimental variogram. Finally, the value of a given point is predicted using the built theoretical model.

Artificial Intelligence techniques have been used to improve the kriging process as shown in [2, 8, 10, 13, 15], however, it is still a challenge to determine what method is better suited for a given database. As stated in [8] and applied in [10], bio-inspired algorithms, in general, are suitable to help define the theoretical variogram parameters. Furthermore, these types of algorithms do not require a single initial seed value as input, but rather an interval.

Several researchers have implemented bio-inspired algorithms to optimize theoretical variogram parameters, such as Genetic Algorithms (GA) [2, 10, 15], and more recently Particle Swarm Optimization (PSO) [12]. However, which to the best of the authors knowledge, there is no systematic comparison in relation to accuracy and convergence of the kriging parameters estimation as well computational processing effort of these algorithms considering the same scenario of study.

This paper applies the cluster-based kriging method designed by [1]. In this method, the spatial data are divided into different subgroups by the K-means clustering method [14], where each data point is interpolated using only data from the same group. A limitation of the cluster-based method proposed by [1] is the task of defining a single theoretical variogram model for all groups, without considering that distinct regions of the analyzed database may present specific behaviors during the kriging process. Therefore, we propose the estimation and optimization of different parameters to each group. Some problems encountered in this kind of improvement, such as cluster overlapping and unknown point classification, are solved using the K-Nearest Neighbour (KNN) algorithm [14] and data preprocessing.

In this context, the purpose of this paper is to evaluate well known bio-inspired techniques such as Genetic Algorithms and Particle Swarm Optimization when applied in the estimation of the essential kriging parameters. As said, in order to perform tests, the cluster-based kriging method was applied and a GA and PSO model was built for each cluster. The results showed that both algorithms were statistically equal in optimizing the variogram parameters. However, PSO converged faster than GA. This conclusion is important because the adopted methodology (cluster-based kriging, interpolation cost function, and so forth) presents a high computational cost. Therefore, it is essential to reduce the number of iterations of optimization algorithms.

This paper is organized as follows. Section 2 consists of the theoretical background involving important concepts for the understanding of this paper. In Sect. 3, the steps of the proposed methodology are detailed. Section 4 presents the database used in this work besides the results of the experiments performed. Finally, Sect. 5 presents the final considerations.

2 Background

2.1 Kriging

Kriging is an interpolation technique widely used in geostatistics to predict spatial data. This method takes into account the characteristics of regional variables autocorrelation. These variables have some spatial continuity, which allows the data obtained by the sampling of specific points to be used to parameterize the prediction of points where the value of the variable is unknown [9].

Let Z be a set of observations of a target variable (response variable) denoted as \(\{z(s_1)\), \(z(s_2)\), ..., \(z(s_N)\}\), where \(s_i = (x_i, y_i)\) is a point in a geographical space; \(x_i\) and \(y_i\) are its coordinates (primary locations); and N is the number of observations.

Values of the target variable at some new location \(s_0\) can be derived using a spatial prediction model. The standard version of kriging is called ordinary kriging (OK), where the predictions are based on the model:

$$\begin{aligned} \hat{z}_{OK} (s_0) = \sum ^n_{i=1}w_i(s_0){.}z(s_i)=\lambda ^T_0{.}\mathbf {z} \end{aligned}$$
(1)

where \(\lambda _0\) is a vector of kriging weights \((w_i)\), and \(\mathbf {z}\) is the vector of N observations at primary locations.

So, in order to estimate the weights, we calculate the semivariances \(\gamma (h)\) based on the differences between the neighboring values:

$$\begin{aligned} \gamma (h) = \frac{1}{2} E[(z(s_i)-z(s_i+h))^2] \end{aligned}$$
(2)

where \(z(s_i)\) is the observation of the target variable at some point location, and \(z(s_i + h)\) is the observation of the neighbour at a distance \(s_i + h\).

Suppose that there are N point observations, this yields \(N \times (N-1)/2\) pairs for which a semivariance can be calculated. If we plot all semivariances versus their separation distances a variogram cloud is produced. For an easier visualization of this variogram cloud, the values are commonly averaged for a standard distance called “lag”. If we display such averaged data, then we get the standard experimental variogram, which can be seen in Fig. 1.

Fig. 1.
figure 1

Example of a final variogram model.

Once we calculate the experimental variogram, we can fit it using a theoretical model, such as linear, spherical, exponential, Gaussian, among others. The variograms are commonly fitted using a cost function (e.g. weighted least squares [3]). Hence, the main objective is to minimize this cost function. In this work, in order to simplify the experiments, we only use the exponential theoretical model, which is given by

$$\begin{aligned} \gamma (h) = C_0 + C_1 \left( 1-EXP\left( -3\left( \frac{h}{R}\right) ^2\right) \right) \end{aligned}$$
(3)

where R is the range of influence or simply range, which is the coordinate where the model starts to flatten out; \(C_0\) is the nugget effect, which can be attributed to measurements errors or spatial sources of variation at distances smaller than the sampling interval; and \(C_O+C_1\) is the sill, which is the value that the model attains at the range R. These parameters, also called coefficients, determine the theoretical variogram as illustrated in Fig. 1.

Once we have estimated the theoretical model, we can use it to derive semivariances at all locations and solve the kriging weights. The ordinary kriging (OK) weights are solved multiplying the covariances:

$$\begin{aligned} \lambda _0 = \mathbf {C^{-1}} . \mathbf {c_0}; \ \ \ \ \ \ C(|h|=0)=C_0+C_1 \end{aligned}$$
(4)

where \(\mathbf {C}\) is the covariance matrix derived for \(N \times N\) observations and \(\mathbf {c_0}\) is the vector of covariances at a new location. Note that the \(\mathbf {C}\) is in fact a \((N+1) \times (N+1)\) matrix if it is used to derive kriging weights, since one extra row and column are used to ensure that the sum of weights is equal to one:

$$\begin{aligned} \begin{bmatrix} C_{(s_1,s_1)} &{} \ldots &{} C_{(s_1,s_N)} &{} 1 \\ \vdots &{} \ddots &{} \vdots &{} \vdots \\ C_{(s_N,s_1)} &{} \ldots &{} C_{(s_N,s_N)} &{} 1 \\ 1 &{} \ldots &{} 1 &{} 0 \end{bmatrix}^{-1} . \begin{bmatrix} C_{(s_0,s_1)} \\ \vdots \\ C_{(s_0,s_N)} \\ 1 \end{bmatrix} = \begin{bmatrix} w_{1}(s_0) \\ \vdots \\ w_{N}(s_0) \\ \varphi \end{bmatrix} \end{aligned}$$
(5)

where \(\varphi \) is the Lagrange multiplier. After calculating the weights, the prediction is then given by Eq. 1.

When the experimental variogram is distinct for two or more directions, we have an anisotropic phenomenon [9]. The anisotropy is calculated considering a certain angle from 0 to 180\(^\circ \), and a factor given by

$$\begin{aligned} Anisotropy factor = \frac{a_2}{a_1} \end{aligned}$$
(6)

where \(a_1\) and \(a_2\) are the biggest and smallest radius of the ellipse (area of effect in the kriging process), respectively. This factor varies between 0 and 1, with 1 being an isotropic model. Therefore, in case of anisotropy, five parameters are used to estimate the theoretical variogram model: nugget, sill, range, angle, and the anisotropy factor.

2.2 Population Diversity Index

The standard population diversity (SPD) describes the level of variation in a certain population. Greater diversity implies greater variability of the population [16]. Considering a population with P individuals (\(G_{1}\), \(G_{2}\),...,\(G_{p}\)), given that each individual (or particle) has T parameters (or genes), we can denote \(G_{i}\) \(=\) (\(G_{i,1}\), \(G_{i,2}\),...,\(G_{i,T}\)). So, the general mean for each gene T is given by

$$\begin{aligned} G^{ave}_T = \frac{1}{P}\sum ^{P}_{i=1}G_{i,T} \end{aligned}$$
(7)

In the normalization step, the standard deviation for each gene T in relation to the population P is calculated by

$$\begin{aligned} \sigma (G^{ave}_{T})=\sqrt{\frac{1}{P}\sum ^{P}_{i=1}(G_{i,T}-G^{ave}_{T})^2} \end{aligned}$$
(8)

Finally, the variability of the population P in each generation of the bio-inspired algorithm is given by

$$\begin{aligned} SPD = \frac{1}{T} \sum ^{T}_{j=1} (\frac{\sigma {(G^{ave}_{j}})}{G^{ave}_{j}}) \end{aligned}$$
(9)

3 Methodology

The cluster-based kriging method [1] was adopted to evaluate GA and PSO algorithms. In this scenario, we first applied a preprocessing step using standardization algorithms and a statistical measure to remove outliers. After that, the K-means algorithm was used to find U groups of data. In this clustering step, the KNN algorithm was used to minimize the cluster overlapping problem, improving the clustering groups. For each group, the bio-inspired algorithms were used to find the kriging parameters, in other words, a model was built for each cluster u. Then, the unknown points (test data) were allocated via KNN to one of the previously built clusters and interpolated by the respective model.

The flow chart that describes the proposed methodology can be seen in Fig. 2. For each number of cluster (1 to 5), this process was repeated 10 times in the 10-fold cross-validation. Each iteration was performed using different training and test data sets.

Fig. 2.
figure 2

Flowchart of the proposed methodology.

3.1 Cost Function and Evaluation Metric

The fitness function used in GA and PSO algorithms was obtained by applying the kriging process at each data point (leave-one-out cross-validation) of the training data (90% of the database). Regarding the evaluation metric, the 10-fold cross-validation was applied. For both cases, fitness and evaluation, the interpolation cost function (Eq. 10) was employed [10]. More specifically, the normalized mean squared error (NMSE) index was used as figure of merit and calculated by

$$\begin{aligned} NMSE_u = \frac{1}{\sigma ^2{.}n} \sum _{i=1}^{n}[\hat{z}(s_i)- z(s_i)]^2 \end{aligned}$$
(10)

where \(\hat{z}(s_i)\) is the predicted value of the target variable obtained by the kriging method at the hidden point \(s_i\); \(z(s_i)\) is the real value of the target variable at the hidden point \(s_i\); n is the total number of points in the cluster u; and \(\sigma ^2\) is the variance of the target variable considering the cluster u data. A lower NMSE indicates a better prediction value. The NMSE index of the database is given by

$$\begin{aligned} NMSE = \sum _{u=1}^{U}NMSE_u \end{aligned}$$
(11)

It is important to point out that the leave-one-out cross-validation was used only to calculate the fitness function of each solution (GA and PSO). In order to measure the accuracy, we applied the 10-fold cross-validation. So, the studied database was randomly partitioned in 90% for training and 10% for test in each iteration. In all, 10 iterations with different partitions were performed for each number of clusters. The average of these 10 tests was calculated in the end.

3.2 Data Preprocessing

In the data preprocessing step, the spatial information x and y, and the target variable, piezometric wells in this work, were normalized between 0 and 1. This procedure is important to ensure that every variable has the same weight in the clustering process and to avoid cluster overlapping. In the sequel, we used the Z-score measure with 99% confidence level to remove outliers.

3.3 Data Clustering

K-means [14] is one of the simplest unsupervised learning algorithms that solve the clustering problem. This clustering algorithm partitions the database into U clusters, where the user provides the value of U. In this work, the K-means method was chosen based on the solution proposed by [1], however, it is important to observe that any other clustering method can be used in the proposed methodology. Remembering that the main objective is to evaluate the performance and behavior of GA and PSO bio-inspired optimization methods.

After the preprocess and partition tasks, the training data was split into U clusters using the spatial information x and y, and the target variable. As shown in [1], the clustering process often results into overlapped clusters. This problem impairs the correct allocation of unknown (new) data into the clusters. So, in order to minimize the overlapping, all data were previously normalized between 0 and 1, and the KNN algorithm was applied in order to enhance the data grouping by allocating the current point based on the k neighbors. For example, the black circle highlighted in Fig. 3(a) demonstrates data overlapping, which was reduced with the application of the proposed methodology, as can be seen in Fig. 3(b).

Fig. 3.
figure 3

Example of data clustering: (a) Original data clustering with overlapping and (b) Normalization + KNN data clustering.

3.4 Optimization Phase

For each cluster obtained by the K-means technique, we applied an optimization algorithm (GA and PSO) to find optimal kriging parameters: nugget, sill, range, factor, and angle. The accuracy of the interpolation are directly correlated to how good these parameters are. Each bio-inspired algorithm was evaluated based on the best set of parameters.

3.5 Classification

The KNN algorithm was applied in order to classify the test data points into one of the previously built clusters based only on the spatial coordinates. Then, the kriging process was carried out and the error was calculated.

4 Experiments and Results

4.1 Database

The studied database represents the mountainous region of Wolfcamp Aquifer in West Texas/New Mexico [1]. This study area was already employed in other works [1, 3] and classified as irregularly spaced with anisotropic data. This database contains 85 data points, including the spatial coordinates (x and y) and piezometric wells information (target variable).

4.2 Experiments

In order to evaluate and compare the results for the two studied bio-inspired algorithms, a manual tuning process was performed. The population size and the number of iterations were the same for both algorithms, and other specific parameters were manually tested. The final values are shown in Table 1. The GA code was designed on the R package GA [11] and the PSO code was also implemented using the R language programming.

Table 1. GA and PSO parameters.

For convenience, some parameters of the experimental variogram were fixed, such as the number of lags (= 1), the model type (= exponential), and the nugget effect (= 0). So, the chromosome (GA) and the particle (PSO) had the following variables: sill, range, angle, and factor. Their lower and upper bounds were defined, after the normalization step, as: range (= 0 to d); sill (= 0 to \(\sigma ^2\)); angle (= 0 to \(180^\circ \)); and factor (= 0 to 1); where d is the maximum distance between two points and \(\sigma ^2\) is the target variable variance.

GA and PSO fitness from one to five clusters in the optimization phase are described in Table 2. For more than one cluster settings, the results were obtained by summing the NMSE errors (Eq. 11). Because of the computational cost, 10 executions for each number of clusters were performed, each one using 90% of the database and the leave-one-out process. Note that the best results (or lowest errors) were obtained with five clusters for both algorithms and the average and standard deviation tend to decrease as the number of clusters increases. PSO always achieved better fitness than GA, but we can infer that GA is more stable, since PSO presented a higher standard deviation.

Table 2. Best fitness/NMSE, fitness average and fitness standard deviation.

Figure 4 presents GA and PSO convergence curves from one to five clusters in the optimization phase. The results showed that PSO converges faster than GA, in other words, PSO reached the best result found by GA before the tenth generation in most cases. Since it is a high computational cost process, this kind of information is relevant considering future works. This better convergence is probably explained by the fact that PSO presented populations with higher diversity level, as will be discussed in the sequel.

Fig. 4.
figure 4

Convergence charts for one to five clusters. Each line represents the average of 10 executions of GA or PSO.

In Fig. 5, we can see the standard population diversity (SPD) calculated for both algorithms. PSO had a much higher variation in the population than its counterpart GA. Obviously, a high diversity does not guarantee a better result, but it is a good indicator that the population is well spread out in the search space. Figure 5 shows the SPD values for the configuration with one cluster, just as an example, since the behavior was similar in the other settings.

Fig. 5.
figure 5

SPD for GA and PSO. Average of 10 executions for only one cluster.

Figure 6 presents the results of the classification step that were calculated using the 10-fold cross-validation process. More specifically, each iteration (= 10 for each number of clusters) used 10% of the database to test the kriging parameters previously estimated with the remaining 90%. The p-value obtained by the Friedman test [7] was 0.17, indicating that the error difference between GA and PSO was not statistically significant.

Fig. 6.
figure 6

Boxplot of NMSE considering the classification step for GA and PSO from one to five clusters. Average of 10 executions for each number of clusters.

5 Conclusions

The results obtained with the proposed methodology demonstrated that, based on the Friedman test, the evaluated algorithms (GA and PSO) are statistically equivalents when estimating the kriging parameters on the studied database. However, in the optimization phase, PSO converged faster than GA in all scenarios (1 to 5 clusters), which is an important conclusion.

Furthermore, exploring different parameters and customizing other operators could be interesting tasks to thoroughly assess the strengths of each method. Other topics that would add value to this research are: (i) reduce the computational processing time; (ii) test other techniques and databases in the proposed methodology; and (iii) discuss the impact of the clustering-based method on the stationary hypothesis. Stationary data is one whose statistical properties such as mean, variance, among others, are all constant over the spatial domain, which is suitable for the kriging process [9]. In [6], the author states that including the spatial coordinates in the clustering step, like the proposed methodology, does not guarantee the stationary hypothesis.