1 Introduction

Groundwater is one of the most important water sources, and managing it well is essential for its sustainable development. Management therefore requires reliable information about the spatio-temporal behavior of the water table in a region. However, monitoring groundwater levels is inherently costly and time consuming, especially during the installation stage, which requires drilling wells and piezometers. As a result, the number of monitoring sites available in a given region is relatively small and often does not capture the actual range of variation that may exist. Accurate spatial interpolation of groundwater levels at unsampled sites is therefore required for better groundwater management.

No interpolation method is clearly optimal in all cases, so results need to be compared for the specific situation [10]. This study uses GIS tools to generate the groundwater level surface for a sparsely monitored region from groundwater levels monitored at random locations. The Geostatistical Analyst tool in ArcGIS 10.8 is used to explore the spatial variability of groundwater levels in the Sagar district of Madhya Pradesh, India.

2 Study Area and Data Source

2.1 Sagar District

The Sagar district is located in the north central part of Madhya Pradesh, India, and occupies an area of 10,252 km². The district extends between latitudes 23° 10′ and 24° 27′ N and longitudes 78° 04′ and 79° 21′ E. Figure 1 shows the index map for the Sagar district.

Fig. 1 Index map of the study area: Madhya Pradesh within India, the Sagar district within Madhya Pradesh, and the observation wells within the district

2.2 Data Used

Seasonal groundwater level data for the pre-monsoon (April–June) and post-monsoon (October–December) periods of 2019 from 31 Central Ground Water Board (CGWB) monitoring wells are used. The locations and groundwater levels of the observation wells were collected directly from the India Water Resources Information System (WRIS). Traditionally, geostatistical studies are performed on at least 100 samples [11]; the sample size in this study is small by comparison.

3 Materials and Method

3.1 Exploratory Analysis

The datasets are first visualized in order to identify incorrect coordinate information and illogical data points. The screened datasets are then subjected to exploratory data analysis to identify outliers, which can be detrimental to spatial prediction. The variogram, in particular, can be very sensitive to outliers because it is based on the squared differences among data values [8]. The data values are described through basic summary statistics, including means, medians, variances, and skewness.

Geostatistical methods, mainly kriging, are considered the best linear unbiased predictor (BLUP) if the data meet the conditions of normality, homogeneity of variance, and stationarity [7]. However, spatial data, especially climate data, often violate these conditions. High asymmetry and outliers have undesired effects on the variogram shape and on kriging estimates [6]. For spatio-temporal data that follow a Gaussian distribution, the effects of extremes are reduced and more stable variograms are obtained, making it easier to model spatial variability [5]. A data transformation is therefore often required prior to kriging to normalize the data distribution, reduce the influence of outliers, and improve data stationarity [4]. In this study, the normality of the groundwater level data is assessed visually using histograms and boxplots, and numerically by comparing the mean and median and the symmetry (skewness) and flattening (kurtosis) coefficients with those of a normal distribution. It is also checked with the formal Shapiro–Wilk test.
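The numerical checks described above can be sketched in a few lines. This is a minimal illustration, not the workflow used in the study: the function names and the sample values are hypothetical, and the moment-based (population) definitions of skewness and excess kurtosis are an assumption, since the text does not say which estimators were used. The Shapiro–Wilk test itself is not reproduced here; `scipy.stats.shapiro` is one commonly used implementation.

```python
import statistics


def skewness(values):
    """Moment-based (population) skewness; 0 for a symmetric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution gives 0."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def summarize(values):
    """Collect the summary statistics named in the text."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance
        "skewness": skewness(values),
        "excess_kurtosis": excess_kurtosis(values),
    }


# Made-up groundwater elevations (m amsl), for illustration only.
levels = [412.0, 430.5, 447.2, 455.0, 461.8, 470.3, 488.9, 503.4]
for name, value in summarize(levels).items():
    print(f"{name}: {value:.3f}")
```

A mean close to the median and a skewness close to zero are the symmetry indicators the text relies on.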

3.2 Interpolation Methods

Inverse distance weighting (IDW) rests on the assumption that sampled points closer to the prediction location are weighted more heavily and that the weight decreases as a function of distance, hence the name. The formula for this interpolator is

$$z\left( {s_{0} } \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} \frac{{z\left( {s_{i} } \right)}}{{d_{i}^{p} }}}}{{\mathop \sum \nolimits_{i = 1}^{n} \frac{1}{{d_{i}^{p} }}}}$$
(1)

where \(z\left( {s_{0} } \right)\) is the predicted value at the unsampled point \(s_{0}\), \(z\left( {s_{i} } \right)\) is the observed value at the ith sampled point, n is the total number of sampling points, \(d_{i}\) is the separation distance between the unsampled point and the ith sampled point, and p denotes the weighting power.
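The IDW estimator can be written directly as a weighted average of the observed values with weights 1/d^p. The sketch below is a plain textbook version that uses all sampled points; it is not the ArcGIS implementation, which also restricts the search to a neighborhood. The function name and the sample data are hypothetical.

```python
import math


def idw_predict(target, points, values, power=2.0):
    """Inverse distance weighted estimate at `target` from all sampled points."""
    numerator = 0.0
    denominator = 0.0
    for point, value in zip(points, values):
        distance = math.dist(target, point)
        if distance == 0.0:
            # IDW is exact: at a sampled location return the observation.
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Made-up well coordinates (km) and groundwater levels (m amsl).
wells = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
levels = [450.0, 470.0, 430.0]
print(idw_predict((1.0, 1.0), wells, levels, power=2.0))
```

A larger power p concentrates the weight on the nearest wells; p = 2 is the value used later in this study.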

Radial basis functions (RBFs) are a set of exact interpolation methods; that is, the surface must pass through each measured sample value. RBFs are used to generate smooth surfaces from a large number of data points. There are five different basis functions: thin-plate spline (TPS), spline with tension (SPT) [2], completely regularized spline (CRS) [2], multiquadric function (MQ), and inverse multiquadric function (IMQ) [12]. Each basis function has a different form and produces a different interpolated surface.
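The exactness property can be illustrated with a small multiquadric interpolator. This is a sketch under stated assumptions: it uses the plain form with basis function sqrt(r² + c²), no polynomial or bias term, and a hand-written linear solver, so it may differ from the ArcGIS formulation. The shape parameter `c`, the function names, and the data are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiquadric(r, c):
    """Multiquadric basis function of distance r with shape parameter c."""
    return math.sqrt(r * r + c * c)


def rbf_fit(points, values, c=1.0):
    """Find coefficients so that the surface passes through every sample."""
    a = [[multiquadric(math.dist(p, q), c) for q in points] for p in points]
    return solve(a, values)


def rbf_predict(target, points, coefficients, c=1.0):
    """Evaluate the fitted surface at `target`."""
    return sum(w * multiquadric(math.dist(target, p), c)
               for w, p in zip(coefficients, points))


# Made-up well coordinates (km) and groundwater levels (m amsl).
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.3)]
levels = [450.0, 470.0, 430.0, 465.0, 455.0]
coefficients = rbf_fit(wells, levels)
print(rbf_predict((0.25, 0.75), wells, coefficients))
```

Evaluating the fitted surface at any of the sampled wells returns the observed level, which is what "exact" means in the paragraph above.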

Kriging belongs to a family of generalized least squares regression methods in geostatistics that estimate values at unsampled locations as a linear combination of the observations sampled within a given search neighborhood [5, 7]:

$$\widehat{Z}\left( {s_{0} } \right) = \mathop \sum \limits_{\alpha } \omega_{\alpha } Z\left( {s_{\alpha } } \right)$$
(2)

where \(\widehat{Z}\left( {s_{0} } \right)\) is the estimated value of the variable of interest (groundwater level) at the unsampled location \(s_{0}\), \(Z\left( {s_{\alpha } } \right)\) are the observed values at the sampled locations in the vicinity of \(s_{0}\), and \(\omega_{\alpha }\) are the weights assigned to those observations.

Ordinary kriging (OK) is the most commonly used form of kriging. The mean is considered unknown and is allowed to fluctuate locally, so stationarity needs to hold only within the local neighborhood.
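How ordinary kriging obtains the weights of Eq. (2) can be sketched as follows: the weights solve a linear system built from variogram values, with a Lagrange multiplier forcing them to sum to one. This is an illustrative sketch, not the ArcGIS procedure. It assumes a spherical variogram with zero nugget and uses all samples rather than a search neighborhood; the function names, variogram parameters, and data are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def spherical(h, sill, range_, nugget=0.0):
    """Spherical variogram model; `sill` is the total sill including the nugget."""
    if h == 0.0:
        return 0.0
    if h >= range_:
        return sill
    ratio = h / range_
    return nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio ** 3)


def ordinary_kriging(target, points, values, sill, range_, nugget=0.0):
    """Return the OK estimate at `target` and the kriging weights."""
    n = len(points)
    # Variogram matrix bordered by ones: the unbiasedness constraint.
    system = [[spherical(math.dist(p, q), sill, range_, nugget) for q in points] + [1.0]
              for p in points]
    system.append([1.0] * n + [0.0])
    rhs = [spherical(math.dist(target, p), sill, range_, nugget) for p in points] + [1.0]
    solution = solve(system, rhs)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, values))
    return estimate, weights


# Made-up wells on a unit square (km) with levels in m amsl.
wells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
levels = [10.0, 12.0, 14.0, 16.0]
estimate, weights = ordinary_kriging((0.5, 0.5), wells, levels, sill=1.0, range_=3.0)
print(estimate, weights)
```

Because the weights are constrained to sum to one, no value for the mean has to be supplied, which is the practical difference from simple kriging.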

Simple kriging (SK) is mathematically the simplest form of kriging but the least commonly used. It assumes that the expected value (mean) of the random field is known and relies on the covariance function. SK assumes second-order stationarity, that is, constant mean, variance, and covariance across the domain or region of interest [11].

Ordinary cokriging (OCK) is a modification of the OK method. Its main advantage is that it uses multiple variables in the estimation process. OCK improves the prediction of the primary variable by using an auxiliary variable, assuming that the primary and auxiliary variables are well correlated [7]. The method is especially suitable when observations of the primary attribute are sparse but relevant secondary information is abundant.

3.3 Variograms (Semivariogram)

The existence of spatial structure (spatial autocorrelation), in which nearby observations are more similar than distant observations, is a prerequisite for the application of geostatistics [5]. Experimental variograms measure the average dissimilarity between unsampled values and nearby data values [4] and can therefore represent autocorrelation at various distances. The kriging method requires a model of the function that characterizes spatial variability, the variogram, together with its key parameters: the nugget effect, the sill (threshold), and the range [5]. The experimental variogram is calculated with the following formula:

$$\hat{\gamma }\left( d \right) = \frac{1}{2N\left( d \right)}\mathop \sum \limits_{\alpha = 1}^{N\left( d \right)} \left( {Z\left( {s_{\alpha } + d} \right) - Z\left( {s_{\alpha } } \right)} \right)^{2}$$
(3)

with \(Z\left( {s_{\alpha } } \right)\) and \(Z\left( {s_{\alpha } + d} \right)\) being the values observed at the locations \(s_{\alpha }\) and \(s_{\alpha } + d\), separated by the distance d, and N(d) being the number of such pairs. If the values \(Z\left( {s_{\alpha } } \right)\) and \(Z\left( {s_{\alpha } + d} \right)\) are autocorrelated, the result of Eq. (3) will be small relative to that for uncorrelated pairs of points. A theoretical model (Gaussian, spherical, etc.) is then fitted to the experimental variogram, usually by weighted least squares, and its parameters (range, sill, and nugget) are used in kriging. The exponential, Gaussian, and spherical models are the most commonly used theoretical variogram models in hydrological kriging applications [1].
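Equation (3) can be computed by grouping all pairs of observations into distance classes (lags) and averaging half the squared differences within each class. The sketch below assumes isotropy, as this study does, and a fixed lag width. The function name, lag settings, and data are hypothetical, and no model fitting is attempted.

```python
import math
from collections import defaultdict


def experimental_variogram(points, values, lag_width, max_lag):
    """Return {lag index: (semivariance, number of pairs)} for an isotropic field.

    Lag index k collects the pairs whose separation distance d satisfies
    k * lag_width <= d < (k + 1) * lag_width.
    """
    squared_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            distance = math.dist(points[i], points[j])
            if distance > max_lag:
                continue
            k = int(distance // lag_width)
            squared_sums[k] += (values[i] - values[j]) ** 2
            pair_counts[k] += 1
    return {k: (squared_sums[k] / (2 * pair_counts[k]), pair_counts[k])
            for k in sorted(pair_counts)}


# Made-up transect: three wells 1 km apart with levels in m amsl.
wells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
levels = [1.0, 2.0, 4.0]
for lag, (gamma, pairs) in experimental_variogram(wells, levels, 1.0, 5.0).items():
    print(f"lag {lag}: semivariance {gamma:.2f} from {pairs} pair(s)")
```

With only 31 wells there are at most 465 pairs in total, so each lag class holds few pairs, which is one reason the fitted model parameters should be interpreted with care.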

3.4 Cross-Validation

The performance of the interpolation methods (IDW, RBF, SK, OK, and OCK) is evaluated through cross-validation. Cross-validation removes the observations one at a time from the dataset and re-estimates each removed value from the remaining sampled data using the selected model. When the sample is very small, as with the 31 observations here, the methods are compared by cross-validation [7]. This is a common way to verify the accuracy of an interpolation method [1].
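The leave-one-out procedure described above can be sketched generically: any interpolator that takes a target location and the remaining samples can be plugged in. IDW is used here only because it is short; the function names and data are hypothetical.

```python
import math


def idw(target, points, values, power=2.0):
    """Inverse distance weighted estimate (used here as the example predictor)."""
    numerator = 0.0
    denominator = 0.0
    for point, value in zip(points, values):
        distance = math.dist(target, point)
        if distance == 0.0:
            return value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def leave_one_out(points, values, predictor):
    """Predict each observation from all the others, one at a time."""
    predictions = []
    for i in range(len(points)):
        other_points = points[:i] + points[i + 1:]
        other_values = values[:i] + values[i + 1:]
        predictions.append(predictor(points[i], other_points, other_values))
    return predictions


# Made-up transect of three wells with levels in m amsl.
wells = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
levels = [10.0, 20.0, 30.0]
predicted = leave_one_out(wells, levels, idw)
errors = [p - o for p, o in zip(predicted, levels)]
print(predicted, errors)
```

The resulting prediction errors are the quantities summarized by the performance measures and boxplots in the evaluation.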

The overall performance of the interpolation methods for groundwater level estimation is assessed using correlation-based and error-based measures. The correlation-based measures include the coefficient of determination (R²), the Nash–Sutcliffe efficiency coefficient (E), and the Willmott agreement index (d), whereas the error measures include the mean relative error (MRE), the root mean square error (RMSE), and the mean error (ME).
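The six measures can be computed from paired observed and predicted values as sketched below. The text does not give the formulas, so the definitions here are the commonly used ones and should be read as assumptions: errors are taken as predicted minus observed, R² is the squared Pearson correlation, and MRE is the mean of the absolute errors relative to the observed values. The function name and data are hypothetical.

```python
import math


def performance(observed, predicted):
    """Correlation-based and error-based measures for paired values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    covariance = sum((o - mean_obs) * (p - mean_pred)
                     for o, p in zip(observed, predicted))
    willmott_denominator = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
                               for o, p in zip(observed, predicted))
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sse / n),
        "MRE": sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "R2": covariance ** 2 / (ss_obs * ss_pred),
        "E": 1.0 - sse / ss_obs,                 # Nash-Sutcliffe efficiency
        "d": 1.0 - sse / willmott_denominator,   # Willmott agreement index
    }


# Made-up values: every prediction is 1 unit too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
for name, value in performance(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

In this example R² equals 1 although every prediction is biased, while E and d drop below 1, which shows why the correlation-based and error-based measures are reported together.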

4 Results and Discussion

4.1 Exploratory Data Analysis

Seasonal groundwater data from the 31 observation wells are examined to understand the pattern in the data. Histograms (Fig. 2) and summary statistics (Table 1) are used to describe the data.

Fig. 2 Histograms of pre-monsoon and post-monsoon groundwater levels (amsl). The curve represents the fitted normal distribution

Table 1 Descriptive statistics for seasonal groundwater elevation level (m) data

The histograms are symmetric, indicating a normal distribution. The seasonal groundwater elevation data have mean and median values that are close to each other and skewness values close to zero, which suggests a symmetrical distribution. Finally, the Shapiro–Wilk test does not reject the normality of the original data for either season (p = 0.83 > 0.05, p = 0.71 > 0.05). Since the original data follow a normal distribution, the analysis is carried out on the original data without transformation.

4.2 Spatial Analysis of Groundwater Level Data

The spatial variation of seasonal groundwater elevation is treated as isotropic, ignoring the direction of separation, because the sample size (31) is too limited to detect anisotropy reliably [7].

Table 2 and Figs. 3 and 4 show the fitted variogram models and their parameters for the seasonal groundwater levels. The nugget-to-sill ratio is used to classify the spatial dependence of a variable [3]. A variable has strong spatial dependence when the nugget-to-sill ratio is less than 0.25 and moderate spatial dependence when the ratio is between 0.25 and 0.75 [8]; otherwise, it is weakly spatially dependent. In our case, the groundwater level was strongly spatially dependent (Table 2).
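The classification rule can be written as a small helper. The thresholds are those given in the text; the function name is hypothetical, and `sill` is assumed to be the total sill (nugget plus partial sill). The nugget value in the example is made up for illustration, since the fitted nuggets are reported only in Table 2.

```python
def spatial_dependence(nugget, sill):
    """Classify spatial dependence from the nugget-to-sill ratio."""
    if sill <= 0:
        raise ValueError("sill must be positive")
    ratio = nugget / sill
    if ratio < 0.25:
        label = "strong"
    elif ratio <= 0.75:
        label = "moderate"
    else:
        label = "weak"
    return ratio, label


# Hypothetical nugget of 400 m^2 with a sill of 4507.4 m^2.
ratio, label = spatial_dependence(400.0, 4507.4)
print(f"nugget-to-sill ratio {ratio:.3f}: {label} spatial dependence")
```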

Fig. 3 Experimental (points) and fitted theoretical (curve) variograms of pre-monsoon (left) and post-monsoon (right) groundwater levels for OK and SK

Fig. 4 Experimental (points) and fitted theoretical (curve) variograms of pre-monsoon (left) and post-monsoon (right) groundwater levels for OCK

Table 2 Summary of semivariogram parameters of best-fitted theoretical model to predict seasonal groundwater level (amsl)
Table 3 Summary of descriptive statistics for observed and predicted pre-monsoon groundwater level

Including elevation as auxiliary information decreased the semivariances. The sill is higher for OK and SK (4507.4 m² and 4295.7 m²) than for OCK (3713.8 m² and 3577 m²). This is expected because the covariate, elevation, which was considered for the OCK variogram but not for the OK and SK variograms, partly explains the variability of the groundwater level data.

For the IDW method, the optimal power value (p) and the number of nearest neighbors to include are determined, whereas for the RBF method, the choice of radial basis function, its kernel parameter, and the number of nearest neighbors to include are determined. In both cases this is done by minimizing the root mean square error (RMSE) obtained from cross-validation. In this study, the power p of the IDW weight function is taken as 2.0, and the multiquadric function is found to be optimal among the radial basis functions.

4.3 Groundwater Mapping

Tables 3 and 4 show descriptive statistics of the groundwater levels measured in the two seasons (pre- and post-monsoon) and of the levels predicted by the two deterministic and three geostatistical interpolation methods.

Table 4 Summary of descriptive statistics for observed and predicted post-monsoon groundwater level

One of the hallmarks of geostatistical methods is smoothing: predicted values are less variable than measured values. In other words, the minimum predicted value is greater than the minimum measured value, and the maximum predicted value is less than the maximum measured value [9]. This smoothing is least for OCK and most accentuated for SK, which has minimum values of 373.19 and 376.34 m and maximum values of 526.9 and 533.14 m, compared to measured minimum values of 348.93 and 368.43 m and measured maximum values of 586.18 and 592.98 m during the pre- and post-monsoon periods, respectively.

This smoothing is confirmed by the standard deviation and, in particular, by the decrease of the predicted variance relative to the variance of the measured data: 43 and 39.7% for SK, 40.6 and 39.2% for OK, and 18.4 and 15.6% for OCK during the pre- and post-monsoon periods, respectively. It is also confirmed by the coefficient of variation, which is lowest for SK (9.62, 9.42%), followed by OK (9.64, 9.47%) and OCK (11.31, 11.16%), compared to the measured values (12.53, 12.17%) for the pre- and post-monsoon periods.
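The two smoothing indicators used above can be computed as follows. This is a sketch under assumptions: the text does not state whether sample or population estimators were used, so sample variance and standard deviation are assumed here. The function names and data are hypothetical.

```python
import statistics


def variance_reduction_percent(measured, predicted):
    """Percent decrease of the predicted variance relative to the measured variance."""
    measured_variance = statistics.variance(measured)
    predicted_variance = statistics.variance(predicted)
    return 100.0 * (measured_variance - predicted_variance) / measured_variance


def coefficient_of_variation_percent(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


# Made-up measured levels and a smoother set of predictions (m amsl).
measured = [400.0, 430.0, 460.0, 490.0, 520.0]
predicted = [430.0, 445.0, 460.0, 475.0, 490.0]
print(variance_reduction_percent(measured, predicted))
print(coefficient_of_variation_percent(measured),
      coefficient_of_variation_percent(predicted))
```

A stronger smoother pulls predictions toward the mean, so it shows a larger variance reduction and a smaller coefficient of variation.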

Among the deterministic methods, smoothing is least for RBF and most accentuated for IDW, which has minimum values of 389.37 and 390.41 m and maximum values of 524.98 and 531.62 m, compared to measured minimum values of 348.93 and 368.43 m and measured maximum values of 586.18 and 592.98 m during the pre- and post-monsoon periods, respectively.

Among the geostatistical methods, the estimates of SK are higher than those of OK and OCK, with mean values of 456.34 and 462.51 m for SK, 456.16 and 462.21 m for OK, and 456.10 and 462.14 m for OCK for the pre- and post-monsoon periods, respectively. Similar results are seen in the minimum values and at different percentiles. A similar analysis of the deterministic interpolation methods shows that the IDW estimates are, overall, the highest among the five interpolation methods used in this study.

Figures 5 and 6 show the groundwater maps obtained from the five spatial interpolation methods (IDW, RBF, SK, OK, and OCK) for the pre- and post-monsoon periods of 2019.

Fig. 5 Maps of pre-monsoon groundwater level (m) amsl estimated by IDW, RBF, SK, OK, and OCK. Mapped values range from less than 370 to greater than 570

Fig. 6 Maps of post-monsoon groundwater level (m) amsl estimated by IDW, RBF, SK, OK, and OCK. Mapped values range from less than 370 to greater than 570

4.4 Performance Evaluation Study

To deepen the comparative study of spatial interpolation, the cross-validation performance indicators are shown in Fig. 7 and Table 5. The boxplots of groundwater level prediction errors (Fig. 7) show that interpolation generally predicts the groundwater level at the 31 observation wells well. A perfect match between the predicted and measured values is represented by 0 in Fig. 7. Comparing the five methods, the residuals (errors) between the measured and predicted groundwater levels are markedly smaller for OCK. This indicates that OCK is the best interpolator, although small underestimation and overestimation remain.

Fig. 7 Boxplots of pre-monsoon and post-monsoon groundwater level prediction errors using IDW, RBF, SK, OK, and OCK

Because the aim of this study was to compare various methods, the impact of the interpolation method on accuracy was considered first. The three kriging methods performed better than the deterministic approaches in estimating groundwater levels for each season. Performance measures of the interpolation methods are summarized in Table 5. High values of the coefficient of determination, Nash–Sutcliffe efficiency, and Willmott agreement index suggested a good fit between measured and predicted groundwater levels. Of the five interpolation methods, OCK had the best overall performance, with OK, SK, and RBF clearly superior to IDW. The error measures confirmed what the performance indicators showed. The low RMSE and ME of all interpolation methods showed their applicability to the prediction of groundwater level, and the superiority of OCK over all other methods was demonstrated by its minimum error values.

Table 5 Performance evaluation of interpolation methods to predict groundwater levels

The second approach to assessing the accuracy of a method uses the coefficients (intercept and slope) of the regression of predicted on measured values. The best model performance is represented by a small intercept and a slope close to one. Among the five interpolation methods, OCK, which considered elevation as an auxiliary variable for predicting groundwater levels, showed the best results.
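The regression check can be sketched with an ordinary least squares fit of predicted on measured values. The direction of the regression (predicted as the dependent variable) is an assumption, since the text does not state it. The function name and data are hypothetical.

```python
def regression_coefficients(measured, predicted):
    """Least squares slope and intercept of predicted regressed on measured."""
    n = len(measured)
    mean_measured = sum(measured) / n
    mean_predicted = sum(predicted) / n
    covariance = sum((m - mean_measured) * (p - mean_predicted)
                     for m, p in zip(measured, predicted))
    spread = sum((m - mean_measured) ** 2 for m in measured)
    slope = covariance / spread
    intercept = mean_predicted - slope * mean_measured
    return slope, intercept


# Made-up values showing a smoothed prediction: low values are raised
# and high values are lowered, so the slope is below one.
measured = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
slope, intercept = regression_coefficients(measured, predicted)
print(f"slope {slope:.2f}, intercept {intercept:.2f}")
```

A perfect interpolator would give a slope of one and an intercept of zero; smoothing lowers the slope and raises the intercept.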

5 Conclusions

The following conclusions are derived from the foregoing study:

  • Introducing elevation as auxiliary information through cokriging (OCK) improved the prediction performance in the sparsely monitored region.

  • Among the geostatistical methods, smoothing is least for OCK and most accentuated for SK, whereas among the deterministic methods, smoothing is least for RBF and most accentuated for IDW.

  • Among the geostatistical methods, the estimates of SK were higher than those of OCK and OK, whereas the IDW estimates were, overall, the highest among all five interpolation methods.

  • Geostatistical methods performed better than the deterministic methods for predicting groundwater levels.