Introduction

Groundwater plays a vital role in both our surroundings and our economies. In regions with limited surface water sources, groundwater is often the best choice of water supply. A principal advantage is that the quality of groundwater changes little throughout the year. Because groundwater responds slowly to changes in precipitation, it remains a feasible source in summer. In engineering, groundwater must be considered for tunnel and basement excavations and constructions (Khan et al. 2020). When the groundwater level is deep, pumping requires considerable energy, which increases expenditure; furthermore, deep groundwater may be affected by salinity, while shallow groundwater may be acidic. In construction practice, the groundwater table is conventionally determined by drilling boreholes, which is not economical. Groundwater table assessment is nevertheless essential for most civil engineering applications. Hence, determining the groundwater level is critical, and this article presents an economical approach to that end. Various researchers have applied machine learning techniques to estimate groundwater fluctuations and have developed empirical relations (Nayak et al. 2006; Jalalkamali et al. 2011; Chang et al. 2016; Takafuji et al. 2019; Adiat et al. 2020; Alotaibi 2020).

Soft computing can be understood as computational intelligence that reduces the complexity of conventional computing and yields results with less effort. Soft computing is inspired by the human mind: like the mind, it is tolerant of uncertainty and approximation and admits heuristic algorithms (Pratihar 2014). Fuzzy computing, neural networks, swarm intelligence, evolutionary computing, and related methods fall under the general category of soft computing techniques (Sharma and Chandra 2019). The many methods branching from these techniques have been adopted to solve numerous complex problems across a wide range of applications (Zhang et al. 2020; Nguyen et al. 2020; Asteris et al. 2021; Band et al. n.d.; Ray et al. 2021; Harirchian et al. 2021; Shahrour and Zhang 2021; Yu et al. 2021).

The support vector machine (SVM) is a refined prediction tool grounded in statistical learning theory (Vapnik 1998). The SVM method includes a training stage in which a series of input values and target output values is fed into the model. The trained algorithm is then used to evaluate a separate set of testing data (Suryanarayana et al. 2021). The application of SVM has grown because of its capability in solving difficult problems. Seismic liquefaction is one of the major effects of earthquakes, and SVM models have successfully predicted the liquefaction susceptibility of soil (Samui 2014; Khan et al. 2021; Li et al. 2021). Researchers have used support vector machines to predict, with tolerable errors, the occurrence times of events that could arise during a severe accident in nuclear power plants (Seung et al. 2015). Concrete, the most widely utilized building material, has its own uncertainty in one of its engineering properties, autogenous shrinkage; researchers employed a support vector machine model to estimate the autogenous shrinkage of concrete, demonstrating its strong potential (Jun et al. 2016). In addition, the compressive strength of lightweight foamed concrete has been predicted with encouraging accuracy; cement content and density matter greatly when designing foamed concrete mixes (Abbas and Suhad 2017; Alsufyani et al. 2021).

Backpropagation (BP) belongs to a family of artificial neural networks whose structure consists of interconnected layers trained by the steepest descent method (Buscema 1998). Simulated annealing-ANN (SA-ANN) has been used to extract an explicit formula for the peak ground acceleration (PGA) of the tectonic regions of Iran (Gandomi et al. 2016). The SA-ANN model handled the non-linearity and yielded an optimized equation for PGA; moreover, it outperformed ten other models with which it was compared. Using electrocardiogram data, investigators estimated blood pressure with high accuracy with the help of backpropagation (Xu et al. 2017). A critical and paramount parameter of reservoir evaluation is capillary pressure, and backpropagation predicts the capillary pressure curve for both identical and non-identical reservoirs with high precision and efficiency (Lijun et al. 2018). Analysts have also used BP models for wind speed forecasting in China with good performance (Wei and Yuwei 2018).

The adaptive neuro-fuzzy inference system (ANFIS) is a machine learning technique that handles uncertainty based on Zadeh’s fuzzy set theory. It has been applied in various fields with impressive results. Tunnel diameter convergence and convergence velocity are critical issues in tunnel construction, and they have been successfully estimated with ANFIS (Adoko and Li 2012; Jha et al. 2021). Teleconnection patterns were used as data for modeling the minimum temperature in northeastern Iran, and ANFIS was adopted to model it over short- and long-term periods (Hojatollah et al. 2015; Khalaf et al. 2021). Investigators have predicted the durability of recycled aggregate concrete with much better performance than multilinear regression (Faezehossadat et al. 2016). River water data from the Surma River of Bangladesh have been fed into ANFIS to forecast the biochemical oxygen demand (BOD); the model predicted with a high coefficient of correlation and demonstrated its potential for determining water quality (Ahmed and Shah 2017; Alotaibi et al. 2021). Beyond the applications above, various researchers have used machine learning techniques on a large scale to obtain the best solutions to numerous problems (Wan and Si 2017; Zhang et al. 2017; Miskony and Wang 2018; Mathew et al. 2018; Mallqui and Fernandes 2019; Tang et al. 2019; Rahul et al. 2021; Mosbeh et al. 2021; Guo et al. 2021).

This article compares the performance of the above-mentioned established techniques on an identical dataset. The latitude and longitude of Chennai are 13.08°N and 80.27°E, respectively. The dataset was collected from the Tamil Nadu Water Supply and Drainage Board (TWAD), Chennai. The compiled dataset was used to estimate the groundwater table depth (dw) by feeding the data into the adopted models mentioned above. Comparative studies between the developed models were carried out to identify the most effective model for accomplishing the target.

The paper is organized as follows: The “Related works” section conveys the related works. The “Methodology” section presents the detailed methodology and the adopted models. The “Results and discussions” section presents the output obtained by the adopted models, whose performance is compared statistically. Finally, the “Conclusion” section gives the conclusions and possible improvements.

Related works

Traditional models and field measurements are inadequate for arid regions because of the uncertain hydrological cycle. To overcome this, Yang et al. (2019) used the maximum tree height and volume as indices in a classical measurement error model for determining groundwater depth in parched regions; they generated a mathematical equation with a precision of R2 = 0.82. Planning mitigation measures at large scale requires intricate hydro-biogeochemical models, but these are limited by sparse data and spatial discretization. The spatial distribution of nitrate in groundwater has been calculated using a parsimonious GIS-based statistical approach (Knoll et al. 2019).

Multiple machine learning techniques such as boosted regression trees, random forests, classification and regression trees, and multiple linear regression have been adopted to predict nitrate concentrations at large scale. The groundwater quality index is the primary parameter for drinking purposes, and it was determined by different geostatistical methods for a location in Algeria (Lazhar et al. 2020); ordinary kriging and co-kriging were used to forecast the groundwater quality index from diverse hydrochemical parameters for 35 wells.

Predicting the groundwater level with high precision and reliability in reclaimed coastal regions is a strenuous task. Zhang et al. (2019) used soft computing techniques such as the nonlinear input-output network (NIO), the nonlinear autoregressive network with exogenous inputs (NARX), and wavelet-NARX (WA-NARX). High water demand has led to excessive exploitation of water resources in the Mediterranean region, which ultimately causes seawater intrusion; a NARX neural network was used to forecast the daily groundwater level for 76 wells (Fabio and Francesco 2020).

To ensure safe drinking water, knowledge of water quality is essential. Sudhakar et al. (2021) therefore used various soft computing techniques to determine the entropy weight–based groundwater quality index (EWQI) from different physicochemical input parameters. Groundwater salinization is a major environmental issue in coastal regions; Dang et al. (2021) developed several machine learning techniques to predict the salinity concentration of groundwater using a geodatabase of 216 records and 14 factors. Among the adopted regression models, the CatBoost model gave the most accurate prediction with the least error.

Similarly, Tao et al. (2021) used Gaussian processes and a kriging model to compute the groundwater salinity in regions of Australia. The evolution of hybrid models has provided further solutions for serious complex problems. For instance, Sami et al. (2021) used a hybrid model, the adaptive neuro-fuzzy inference system with evolutionary algorithms (ANFIS-EA), to determine the optimal groundwater exploitation of aquifers in Iran. Along with ANFIS, particle swarm optimization (PSO), gray wolf optimization (GWO), and Harris hawk optimization (HHO) models were used to increase accuracy.

Similarly, many researchers have used other intelligent techniques to monitor the quality and level of water (Moghaddam et al. 2019; Wei et al. 2019; Varouchakis et al. 2019; Sharafati et al. 2020; Panahi et al. 2020; Wei et al. 2020; Mohapatra et al. 2021). Beyond groundwater problems, soft computing techniques have been used to resolve many other engineering issues (Bharti et al. 2021; Panagiotis et al. 2021; Kardani et al. 2021; Deepak et al. 2021; Nhu et al. 2020).

Methodology

Soft computing is an emerging approach to computing that emulates the remarkable ability of the human mind to reason and learn in an environment of uncertainty and imprecision. Soft computing may also be seen as a foundation for the developing field of computational intelligence. It exploits tolerance for imprecision, uncertainty, partial truth, and approximation to achieve tractability, robustness, and low solution cost. As mentioned earlier, the compiled data were fed into soft computing models, namely the support vector machine, backpropagation, and the adaptive neuro-fuzzy inference system, to forecast the groundwater table of the Chennai region.

Details of support vector machine

Generally, the support vector machine is divided into two branches: support vector classification (SVC) and support vector regression (SVR). It is an elegant and well-developed prediction method (Xiang et al. 2012; Alotaibi 2021).

SVM adopts the structural risk minimization (SRM) principle, which is superior to the classical empirical risk minimization (ERM) principle. The SRM principle has been used in many other modeling techniques (Mukherjee et al. 1997; Gunn 1998). SRM minimizes an upper bound on the risk, which reduces the generalization error.

An alternative loss function, modified to incorporate a distance measure, is introduced, and a cost function is added to quantify the risk of a hypothesis, i.e., the regression error. Minimizing this error leads to generalized performance. Figure 1 shows the typical architecture of a support vector machine.

Fig. 1

Architecture of support vector machine

Figure 1 is a simplified representation of the SVM model: the adopted inputs are used to frame the scope and kernel functions, which in turn feed the selection of the Lagrange multipliers. The combination of Lagrange multipliers and kernel functions then produces the prediction.

Consider the following regression function (y):

$$y=f(x)=w.x+b$$
(1)

where “.” denotes the inner product, w and b are the weight vector and bias, respectively, and x is the input vector in normalized form. The empirical risk Remp is defined in Eq. (2), and the risk is quantified by the ε-insensitive loss function Lε(y, f(xi, w)) defined in Eq. (3) (Stitson et al. 1996).

$${R}_{emp}\left(w,b\right)=\frac{1}{n}\sum_{i=1}^n{L}_{\varepsilon}\left({y}_i,f\left({x}_i,w\right)\right)$$
(2)
$$L_\varepsilon\left(y_i,f\left(x_i,w\right)\right)=\left\{\begin{array}{ll}0,&\mathrm{if}\;\left|y_i-f\left(x_i,w\right)\right|\leq\varepsilon\\\left|y_i-f\left(x_i,w\right)\right|-\varepsilon,&\mathrm{otherwise}\end{array}\right.$$
(3)

Lε is the ε-insensitive loss function measuring the discrepancy between the observed value (yi) and the predicted value f(xi, w), and xi is the input pattern. The problem of finding w and b that minimize the ε-insensitive loss above is equivalent to the convex optimization problem over the margin term (w) and the slack variables (ξi, ξi*):

$$\begin{array}{l}\mathrm{Minimize}:\;\min_{w,b,\xi_i,\xi_i^\ast}\left[\frac12w.w+C\left(\sum_{i=1}^n\xi_i^\ast+\sum_{i=1}^n\xi_i\right)\right]\\\mathrm{Subject}\;\mathrm{to}\;\left\{\begin{array}{l}y_i-w.x_i-b\leq\varepsilon+\xi_i\\w.x_i+b-y_i\leq\varepsilon+\xi_i^\ast,\;i=1,\dots,n\\\xi_i,\xi_i^\ast\geq0\end{array}\right.\end{array}$$
(4)

where the first term \(\left(\frac{1}{2}w.w\right)\) controls the model complexity (the flatness of the regression function). To solve the optimization problem of Eq. (4), the Lagrange function of the constrained problem is constructed as follows:

$$\begin{array}{l}L\left(w.\xi^\ast,\xi,\alpha^\ast,\alpha,C,\gamma^\ast,\gamma\right)=\frac12w.w+C\left(\sum_{i=1}^n\xi_i^\ast+\sum_{i=1}^n\xi_i\right)-\sum_{i=1}^n\alpha_i\left[y_i-w.x_i-b+\varepsilon+\xi_i\right]\\-\sum_{i=1}^n\alpha_i^\ast\left[w.x_i+b-y_i+\varepsilon+\xi_i^\ast\right]-\sum_i^n\left(\gamma_i^\ast\xi_i^\ast+\gamma_i\xi_i\right)\end{array}$$
(5)

The Lagrange function is minimized with respect to the parameters \(w,b,{\xi}_i^{\ast }\), and \({\xi}_i\). Differentiating Eq. (5) with respect to these parameters yields the Karush-Kuhn-Tucker (KKT) conditions:

$$\frac{\partial L}{\partial w}=w+\sum_{i=1}^n{\alpha}_i{x}_i-\sum_{i=1}^n{\alpha}_i^{\ast }{x}_i=0\;\Rightarrow\;w=\sum_{i=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right){x}_i$$
(6)
$$\frac{\partial L}{\partial b}=\sum_{i=1}^n{\alpha}_i-\sum_{i=1}^n{\alpha}_i^{\ast }=0;\sum_{i=1}^n{\alpha}_i=\sum_{i=1}^n{\alpha}_i^{\ast }$$
(7)
$$\frac{\partial L}{\partial {\xi}^{\ast }}=C-\sum_{i=1}^n{\gamma}_i^{\ast }-\sum_{i=1}^n{\alpha}_i^{\ast }=0;\sum_{i=1}^n{\gamma}_i^{\ast }=\sum_{i=1}^n{\alpha}_i^{\ast }$$
(8)
$$\frac{\partial L}{\partial {\xi}}=C-\sum_{i=1}^n{\gamma}_i-\sum_{i=1}^n{\alpha}_i=0;\sum_{i=1}^n{\gamma}_i=\sum_{i=1}^n{\alpha}_i$$
(9)

where Eq. (6) recovers the weight vector of Eq. (1) and C is the capacity factor. Substituting Eqs. (6) to (9) into Eq. (5), the dual form of the Lagrange optimization, subject to \({\alpha}_i^{\ast}\ge 0,{\alpha}_i\ge 0,{\gamma}_i^{\ast}\ge 0,{\gamma}_i\ge 0,\) and C ≥ 0, becomes Eq. (10):

$${ \begin{array}{l}\underset{\alpha, {\alpha}^{\ast }}{\mathit{\max}}\left[w\left(\alpha, {\alpha}^{\ast}\right)\right]=\underset{\alpha, {\alpha}^{\ast }}{\mathit{\max}}\left[\sum_{i=1}^n{y}_i\left({\alpha}_i^{\ast }-{\alpha}_i\right)-\varepsilon \sum_{i=1}^n\left({\alpha}_i+{\alpha}_i^{\ast}\right)-\frac{1}{2}\sum_{i,j=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right)\left({\alpha}_j^{\ast }-{\alpha}_j\right)\left({x}_i.{x}_j\right)\right]\\ {}\mathrm{Subject}\ \mathrm{to}\ \left\{\begin{array}{l}\sum_{i=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right)=0,\\ {}0\le {\alpha}_i^{\ast },{\alpha}_i\le C\end{array}\kern0.5em i=1,\dots,n\right.\end{array}}$$
(10)

where \({\alpha}_i^{\ast }\) and αi are the Lagrange multipliers of the quadratic optimization and xi. xj is the inner product of two training patterns. Finally, substituting Eq. (6) into Eq. (1) gives the regression function that represents the dataset:

$$y=\sum_{i=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right)K\left({x}_i.x\right)+b$$
(11)

where the Lagrange multipliers satisfy \(0\le {\alpha}_i^{\ast },{\alpha}_i\le C\); the parameters w and b of the regression function are computed as

$$w=\sum_{i=1}^n\left({\alpha}_i^{\ast }-{\alpha}_i\right){x}_i;b=-\frac{1}{2}w.\left({x}_r+{x}_s\right)$$
(12)

where xr and xs are any support vectors.

Various kernel functions can be employed to handle the dataset in the prediction. This study uses the radial basis function:

$$K\left({x}_i,x\right)=\mathit{\exp}\left\{-\frac{{\left({x}_i-x\right)}^T\left({x}_i-x\right)}{2{\sigma}^2}\right\}$$
(13)

where xi represents the training pattern, x is the testing pattern, and σ is the width of the radial basis function.
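For illustration, a minimal Python sketch of the kernel in Eq. (13) follows; the function name and the example points are our own, not code from the study, which was implemented in MATLAB.

```python
import numpy as np

def rbf_kernel(x_i, x, sigma=0.3):
    """Radial basis function kernel of Eq. (13); sigma is the kernel width."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x, dtype=float)
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

# Similarity between two example input points (illustrative values)
print(rbf_kernel([0.20, 0.50], [0.25, 0.45]))
```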

The dataset consists of latitude (Lx), longitude (Ly), and dw at 27 different points of Chennai. Table 1 reveals the compiled database, and Table 2 conveys the summary of the collected dataset.

Table 1 Database collected for Chennai Region
Table 2 Summary of the collected dataset

To construct the support vector machine model, the dataset was branched into training and testing subsets. The training dataset develops the support vector machine model: 19 of the 27 datasets were employed for training, while the remaining 8 were reserved as the testing dataset used to evaluate the trained model. To avoid numerical difficulties in the support vector machine model, the values in the dataset were normalized to the range 0 to 1 through the following Eq. (14).

$${d}_{{normalized}}=\frac{\left(d-{d}_{\mathrm{min}}\right)}{\left({d}_{\mathrm{max}}-{d}_{\mathrm{min}}\right)}$$
(14)

where d is any data, dmin is the lower limit in the respective data, and dmax is the upper limit in the respective data.
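The normalization and the 19/8 split can be sketched in Python as follows; the file name is hypothetical, and the random split is an assumption, since the article does not state how the 19 training points were selected.

```python
import numpy as np

def min_max_normalize(d):
    """Scale each column of d to [0, 1] following Eq. (14)."""
    d = np.asarray(d, dtype=float)
    d_min, d_max = d.min(axis=0), d.max(axis=0)
    return (d - d_min) / (d_max - d_min)

# 27 rows of (Lx, Ly, dw) as in Table 1; "chennai_groundwater.csv" is hypothetical
data = min_max_normalize(np.loadtxt("chennai_groundwater.csv", delimiter=","))

rng = np.random.default_rng(0)                 # assumed random split
idx = rng.permutation(len(data))
train, test = data[idx[:19]], data[idx[19:]]   # 19 training, 8 testing
X_train, y_train = train[:, :2], train[:, 2]   # inputs (Lx, Ly), target dw
X_test, y_test = test[:, :2], test[:, 2]
```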

Details of backpropagation neural network

Backpropagation originated in the 1970s but was popularized in 1986. Backpropagation works quickly on problems that were formerly insoluble, and today BP is the workhorse of learning in neural networks. BP networks are feed-forward. To attain generalized performance, backpropagation propagates the errors backward and adjusts the weights, which improves performance (Haque and Sudhakar 2002; Alotaibi et al. 2021).

A basic BP design comprises an input layer at the beginning, one or more hidden layers in between, and finally an output layer. The simple architecture of backpropagation is shown in Fig. 2.

Fig. 2

Underlying architecture of backpropagation

The numbers of input, hidden, and output layers are problem-specific (Zhang et al. 1998). According to Bishop (1995), an architecture with a single hidden layer using a sigmoid activation function can approximate almost any continuous function, provided a sufficient number of hidden neurons. Backpropagation adopts the delta rule, which relies on the squared error minimization principle (Haykin 1998).

Let w and b be the weight and the bias. The ultimate goal of backpropagation is to calculate the partial derivatives \({\partial C}\left/ {\partial w}\right.\) and \({\partial C}\left/{\partial b}\right.\) of the cost function C with respect to w and b. Two main assumptions must hold for backpropagation to work. First, the cost function (C) can be written as an average

$$C=\frac{1}{n}{\sum}_x{C}_x$$
(15)

where x is any training example and Cx is the cost for that single example.

The rationale for this assumption is that BP computes the partial derivatives for a single training example, which can then be averaged over many examples. Second, the cost function C can be written as a function of the outputs:

$$C=C\left({a}^L\right)$$
(16)

where aL is the output activation.

Thus, the quadratic cost for a single training example can be written as

$${ \begin{array}{c}C=\frac{1}{2}{\left\Vert y-{a}^L\right\Vert}^2\\ {}=\frac{1}{2}\sum_j{\left({y}_j-{a}_j^L\right)}^2\end{array}}$$
(17)

where j indexes the output neurons and y is the desired output.

BP updates the weights in the direction that lowers the error. Mathematically, this can be expressed as

$${W}_{k+1}={W}_k+\eta {d}_k$$
(18)

where Wk is the weight vector at the kth iteration, η is the learning rate, and dk is the direction vector.

The positive constant η, chosen by the user, is called the learning rate. The direction vector is the negative gradient of the error function E, which is given below.

$${d}_k=-\nabla E\left({w}_k\right)$$
(19)

There are two varieties of learning in backpropagation. When the weights are updated immediately after each incoming input-output pair, the method is called online learning. Batch learning is the other variety: the network is updated after considering all input-output pairs as an array. The vector wk contains the weights computed at the kth iteration. The error function is evaluated from the gaps in the generated output (Kamarthi and Pittner 1999):

$$E\left({w}_k\right)=\left\{\begin{array}{l}{E}_p\left({w}_k\right)\left[ on-{line}\right]\\ {}{\sum}_p{E}_p\left({w}_k\right)\left[{batch}\right]\end{array}\right.$$
(20)

where p is the input pattern and Ep(wk) is the half sum-of-squares error function of the network output for pattern p.

One presentation of all training patterns in a cycle is referred to as an epoch. The sigmoid function is adopted in backpropagation for the weight modification (Widrow and Lehr 1990). It is given by the following equation:

$$f(x)={\left(1+\mathit{\exp}\left(-x\right)\right)}^{-1}$$
(21)

A further approach can be designed using a Taylor series expansion of the error as a function of the weight vector:

$$E\left(w+\Delta w\right)=E(w)+{g}^T\Delta w+\frac{1}{2}\Delta {w}^TH\Delta w+\dots$$
(22)

where \(g=\frac{\partial E}{\partial w}\) is the gradient vector and \(H=\frac{\partial^2E}{\partial {w}^2}\) is the Hessian matrix.

The same dataset used for the support vector machine was used to construct and assess the BP model, divided into a training dataset and a testing dataset. The 19 datasets adopted for constructing the model form the training dataset, whereas the remaining 8 datasets, used to evaluate the developed model, form the testing dataset.
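A compact sketch of such a network is shown below, using scikit-learn rather than the article’s MATLAB implementation; the hidden layer size, learning rate, and epoch budget are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Single hidden layer with sigmoid ("logistic") activation, trained by
# stochastic gradient descent on the squared error (Eqs. (17)-(21));
# reuses X_train/y_train from the normalization sketch above.
bp = MLPRegressor(hidden_layer_sizes=(10,), activation="logistic",
                  solver="sgd", learning_rate_init=0.1,
                  max_iter=5000, random_state=0)
bp.fit(X_train, y_train)
print("testing R:", np.corrcoef(y_test, bp.predict(X_test))[0, 1])
```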

Details of adaptive neuro-fuzzy inference system

The adaptive neuro-fuzzy inference system is a type of artificial neural network that relies on the Takagi–Sugeno fuzzy inference system (Jang 1991). It combines backpropagation and fuzzy principles. This neuro-adaptive approach lets the fuzzy system learn from the dataset: using the data, the adaptive neuro-fuzzy inference system develops a system whose membership function parameters are adjusted either by backpropagation alone or in combination with a least-squares method. This adjustment allows the fuzzy system to acquire information from the data. The underlying architecture of ANFIS is depicted in Fig. 3.

Fig. 3

Architecture of adaptive neuro-fuzzy inference system

Figure 3 shows that the adaptive neuro-fuzzy inference system is analogous to a neural network: it maps inputs to outputs through their respective membership functions and parameters, which can be used to interpret the input-output map.

The adaptive neuro-fuzzy inference system makes inferences by fuzzy logic and shapes the fuzzy membership functions using a neural network (Altrock 1995; Brown and Harris 1994). Fuzzy rule–based systems such as Mamdani and Sugeno provide various inference mechanisms (Brown and Harris 1994).

As mentioned earlier, Sugeno-type systems were implemented, in which a crisp function characterizes the outcome of each fuzzy rule: if (x1, x2, x3, …, xn) is (A1, A2, A3, …, An), then y = f(x), where A1, A2, A3, …, An are fuzzy sets and f is a crisp function. In this structure, the output of every rule is a crisp value, and the weighted average is used to combine the outcomes of all rules. The resulting output fFS can be expressed as follows:

$$f_{FS}=\frac{\sum_{i=1}^mw_i\prod_{j=1}^n\mu_{A_j^i}\left(x_j\right)}{\sum_{i=1}^m\prod_{j=1}^n\mu_{A_j^i}\left(x_j\right)}$$
(23)

where m is the number of rules, n is the number of inputs, and μA is the membership function of fuzzy set A.

The method admits various types of membership functions, whose parameters are tuned repeatedly to obtain better output. The Gaussian function was utilized in this study, and it has the following form:

$$f\left(x,\sigma,c\right)=e^\frac{-\left(x-c\right)^2}{2\sigma^2}$$
(24)

where c represents the mean of the collected data and σ is its standard deviation.
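A short sketch of Eqs. (23) and (24) follows; it uses zero-order (constant) rule consequents for brevity, whereas the model in this study used linear output functions, and all numerical values are illustrative.

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership function of Eq. (24)."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_output(x, centers, sigmas, w):
    """Weighted-average Sugeno output of Eq. (23) for m rules and n inputs."""
    # Firing strength of each rule: product of memberships over the n inputs
    firing = np.prod(gaussian_mf(x, centers, sigmas), axis=1)
    return np.sum(w * firing) / np.sum(firing)

# Toy example: m = 2 rules, n = 2 inputs (Lx, Ly), illustrative parameters
x = np.array([0.4, 0.6])
centers = np.array([[0.3, 0.5], [0.7, 0.8]])   # Gaussian centers per rule/input
sigmas = np.full((2, 2), 0.2)                  # Gaussian widths
w = np.array([0.25, 0.75])                     # crisp rule consequents
print(sugeno_output(x, centers, sigmas, w))
```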

The same dataset (the same inputs) used for the support vector machine was adopted for developing the adaptive neuro-fuzzy inference system model. As with the support vector machine and backpropagation, the ANFIS dataset is segregated into two subsets.

Training dataset

As mentioned earlier, this subset is compiled to frame the adaptive neuro-fuzzy inference system model. The same 19 of the 27 datasets are used as the training dataset.

Testing dataset

After the model is developed, the dataset used to verify it is the testing dataset. The remaining 8 datasets were utilized as the testing dataset.

Both subsets are scaled to the range 0 to 1 using Eq. (14).

MATLAB was the tool used to develop the support vector machine, backpropagation, and adaptive neuro-fuzzy inference system models mentioned above.

Results and discussions

In this study, the radial basis function (Eq. (13)) was chosen as the covariance function. Three design parameters, C, ε, and σ, must be estimated by trial and error; the chosen design values are C = 10000, ε = 0.007, and σ = 0.3. The support vectors are the training points with non-zero Lagrange multipliers; in this study, the number of support vectors is 19. The tuning parameters perform consistently and do not depend strongly on the C value. The following equation was developed from the support vector machine by substituting the kernel \(K\left({x}_i,x\right)=\mathit{\exp}\left\{-\frac{{\left({x}_i-x\right)}^T\left({x}_i-x\right)}{2{\sigma}^2}\right\}\) with the design values into Eq. (11):

$$y=\mathit{\operatorname{sign}}\left(\sum_{i=1}^{19}{\alpha}_i{y}_i\mathit\;{\exp}\left\{-\frac{{\left({x}_i-x\right)}^T\left({x}_i-x\right)}{0.18}\right\}\right)$$
(25)
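An equivalent model can be fitted with scikit-learn’s SVR as sketched below; note that scikit-learn parameterizes the RBF kernel as exp(−γ‖xi − x‖²), so the paper’s σ = 0.3 corresponds to γ = 1/(2σ²) ≈ 5.56. The article itself used MATLAB, so this is only an illustrative re-implementation.

```python
from sklearn.svm import SVR

# C and epsilon from the stated design values; gamma derived from sigma = 0.3
svm = SVR(kernel="rbf", C=10000, epsilon=0.007, gamma=1 / (2 * 0.3 ** 2))
svm.fit(X_train, y_train)                      # 19 normalized training points
print("support vectors:", len(svm.support_))   # the paper reports 19
dw_pred = svm.predict(X_test)                  # predicted dw for the 8 test points
```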

The coefficient of correlation (R) quantifies the competence of the constructed support vector machine model. The optimal value of R is one, which it cannot exceed.

$$R=\frac{\sum_{i=1}^n\left({d}_{ai}-{\overline{d}}_a\right)\left({d}_{pi}-{\overline{d}}_p\right)}{\sqrt{\sum_{i=1}^n{\left({d}_{ai}-{\overline{d}}_a\right)}^2}\sqrt{\sum_{i=1}^n{\left({d}_{pi}-{\overline{d}}_p\right)}^2}}$$
(26)

where dai and dpi are the actual and predicted values, respectively, and \({\overline{d}}_{\mathrm{a}}\) and \({\overline{d}}_{\mathrm{p}}\) are their mean values over the dataset.
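Eq. (26) translates directly into code; the sketch below (with function names of our choosing) evaluates R for the SVR predictions from the earlier snippet.

```python
import numpy as np

def coeff_correlation(d_a, d_p):
    """Coefficient of correlation R of Eq. (26)."""
    d_a, d_p = np.asarray(d_a, float), np.asarray(d_p, float)
    num = np.sum((d_a - d_a.mean()) * (d_p - d_p.mean()))
    den = (np.sqrt(np.sum((d_a - d_a.mean()) ** 2))
           * np.sqrt(np.sum((d_p - d_p.mean()) ** 2)))
    return num / den

print("testing R:", coeff_correlation(y_test, dw_pred))
```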

The built support vector machine has a value of R close to unity, so it can be concluded that the developed model is capable of forecasting the dw value of the Chennai region. Figure 4 conveys the behavior of the support vector machine model.

Fig. 4

Training and testing performance of support vector machine

Figure 4 compares the model performance against the measured values. Since the value of R is close to one, the model is judged to be well developed. Figure 5 depicts the effectiveness of the developed SVM model through the predicted spatial variability of water depth in Chennai, exposed in the form of maps in Fig. 5a and b.

Fig. 5

a Two-dimensional map of the predicted dw by support vector machine model. b Three-dimensional map of the predicted dw by support vector machine model

The above maps delivered by the support vector machine model can be used for determining the groundwater table for future purposes.

The coefficient of correlation (R) was also used to evaluate the potential of BP; as before, when the value of R is near one, the model is considered a good predictive model. Eq. (26) provides the formula for determining the R value. Figure 6 shows the training and testing performance of backpropagation.

Fig. 6

Training and testing performance of backpropagation model

It is clear from the above figures that the developed backpropagation is a good model; hence, its output is reliable. Figure 7a and b present the output of the backpropagation model.

Fig. 7

a Two-dimensional spatial variability map of dw by using backpropagation. b Three-dimensional spatial variability map of dw by using backpropagation

The output of the backpropagation model can likewise be used for future purposes. For the adaptive neuro-fuzzy inference system model, the initial number of membership functions for every input is 21. An appropriate design has to be chosen for the optimal workability and outcome of the network. The final configuration of the fuzzy inference system (FIS), obtained after training for 60 epochs, is detailed below.

The number of input and output membership functions is 21, and the number of fuzzy rules is the same. The number of inputs is 2 (Lx and Ly), the coordinates of the borewell locations. The membership functions used for the inputs and output were Gaussian and linear functions, respectively. The ANFIS model took 60 training epochs and yielded R = 0.827 for the training dataset and R = 0.785 for the testing dataset.

The capability of the adaptive neuro-fuzzy inference system model can be assessed by the coefficient of correlation (R); as noted, when R is close to 1, the developed model can be considered a good model. Eq. (26) provides the formula for computing the R value. Figure 8 shows the performance of the developed model.

Fig. 8

Performance of adaptive neuro-fuzzy inference system model

It is clear from the above figures that the developed adaptive neuro-fuzzy inference system is a good model; hence, its output is valuable. Figure 9a and b convey the results of the adaptive neuro-fuzzy inference system.

Fig. 9

a Two-dimensional spatial variability map of dw by using adaptive neuro-fuzzy inference system. b Three-dimensional spatial variability map of dw by using adaptive neuro-fuzzy inference system

The output of the adaptive neuro-fuzzy inference system model can likewise serve future purposes.

The developed models provide the best available information on the depth (dw) for the Chennai region. A comparison between the developed support vector machine, adaptive neuro-fuzzy inference system, and backpropagation models was carried out. Three-dimensional surface graphs of Lx, Ly, and dw are presented above, and the sensitivity of the predicted dw to Lx and Ly can be verified (Gandomi and Alavi 2013). Figure 10 illustrates the capability of the verified models with respect to the determined R value.

Fig. 10

Comparison of the developed models

Although all three developed models are effective in determining dw, a superior model has to be identified. The support vector machine model outweighs the backpropagation and adaptive neuro-fuzzy inference system models in computing dw for Chennai. The equation generated by the SVM model can predict the groundwater table level in the Chennai region for future purposes. Moreover, the maps generated by the models represent the predicted groundwater table of the Chennai region and will help in determining the groundwater table at any location without performing any experimental work. This in turn reduces the cost of infrastructure projects and groundwater utilization projects.

The sensitivity analysis investigates the contribution of the input parameters to the predicted output. The following equations were used to compute the percentage sensitivity (Se) of the output to each input parameter (Nash and Sutcliffe 1970).

$${D}_i={f}_{max}\left({d}_i\right)-{f}_{min}\left({d}_i\right)$$
(27)
$${S}_e=\frac{D_i}{\sum_{j=1}^n{D}_j}\times 100$$
(28)

where fmax(di) and fmin(di) are the upper and lower limits of the predicted output over the ith input domain, with the remaining inputs held at their mean values. The computed sensitivity values of the developed models are tabulated in Table 3.
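A sketch of Eqs. (27) and (28) is given below; sweeping each input over its observed range while fixing the others at their means is our reading of the procedure, and the grid size is an assumption.

```python
import numpy as np

def sensitivity_percent(model, X, grid=50):
    """Percentage sensitivity Se of Eqs. (27)-(28) for each input column."""
    X = np.asarray(X, dtype=float)
    D = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        probe = np.tile(X.mean(axis=0), (grid, 1))   # other inputs at mean values
        probe[:, i] = np.linspace(X[:, i].min(), X[:, i].max(), grid)
        y = model.predict(probe)
        D[i] = y.max() - y.min()                     # Eq. (27)
    return 100.0 * D / D.sum()                       # Eq. (28)

print(sensitivity_percent(svm, X_train))             # Se for Lx and Ly
```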

Table 3 Sensitivity analysis of the input parameters of the SVM, BP, and ANFIS models

From Table 3, the latitude (Lx) has the greatest effect on the predicted groundwater depth (dw). Figure 11 depicts the sensitivity of the inputs for the developed models.

Fig. 11

Sensitivity analysis of inputs for the developed model

The developed SVM, BP, and ANFIS models may require further verification to justify their capability; hence, a parametric study was carried out. The literature recommends several statistical measures for judging the veracity of predictive models, and the same statistical parameters were determined for the adopted SVM, BP, and ANFIS models. The root mean square error (RMSE) compares the measured and predicted values by taking the square root of the mean residual error; a lower RMSE value indicates a more capable model. The weighted mean absolute percentage error (WMAPE) examines the residual error of each data point relative to the measured or target data. The Nash-Sutcliffe efficiency (NS), also known as the coefficient of efficiency (E), is one minus the ratio of the residual error variance to the variance of the observed data (Nash and Sutcliffe 1970).

The variance account factor (VAF) expresses the proportion of the observed data variance explained by the model. The adjusted determination coefficient (Adj. R2) is used to compute the performance index (PI) when assessing the accuracy of the model (Yagiz et al. 2012). The normalized mean biased error (NMBE) measures the ability of the model to predict values away from the mean; a positive NMBE indicates overprediction, whereas a negative value indicates underprediction by the developed model (Srinivasulu and Jain 2006). The root mean square error to observation's standard deviation ratio (RSR) combines the benefits of error index statistics with a scaling/standardization factor, so that the resulting measurements can be applied to different constituents. A lower RSR indicates better model efficiency, the optimal value being zero. These indices were computed for each model using the following equations and are tabulated in Table 4 (Gandomi and Alavi 2013; Gokceoglu 2002; Gokceoglu and Zorlu 2004; Nayak et al. 2005; Moriasi et al. 2007; Wang et al. 2009; Chen et al. 2012; Nurichan 2014; Chandwani et al. 2015):

$${RMSE}=\sqrt{\frac{\sum_{t=1}^n{\left({d}_t-{y}_t\right)}^2}{n}}$$
(29)
$${WMAPE}=\frac{\sum_{t=1}^n\left|\frac{d_t-{y}_t}{d_t}\right|\times{d}_t}{\sum_1^n{d}_t}$$
(30)
$$E=1-\left[\frac{\sum_1^n{\left({d}_t-{y}_t\right)}^2}{\sum_1^n{\left({d}_t-{d}_{{mean}}\right)}^2}\right]$$
(31)
$$VAF=\left(1-\frac{\mathit{\operatorname{var}}\left({d}_t-{y}_t\right)}{\mathit{\operatorname{var}}\left({d}_t\right)}\right)\times100$$
(32)
$${R}^2=\frac{\sum_{t=1}^n{\left({d}_t-{d}_{{mean}}\right)}^2-\sum_{t=1}^n{\left({d}_t-{y}_t\right)}^2}{\sum_{t=1}^n{\left({d}_t-{d}_{{mean}}\right)}^2}$$
(33)
$$Adj\;R^2=1-\left(1-R^2\right)\frac{\left(n-1\right)}{\left(n-p-1\right)}$$
(34)
$$PI=Adj\;R^2+0.01\,VAF-{RMSE}$$
(35)
$$RSR=\frac{{RMSE}}{\sqrt{\sum_1^N\frac{{\left({d}_i-{d}_{{mean}}\right)}^2}{N}}}$$
(36)

where n is the number of training or testing data points, dt is the measured value, dmean is the mean of the measured values, and yt is the predicted value.
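For reference, Eqs. (29) to (36) can be computed together as sketched below; the function is our own packaging, with p the number of inputs (here 2) and positive targets assumed for the WMAPE term.

```python
import numpy as np

def performance_indices(d, y, p=2):
    """Statistical indices of Eqs. (29)-(36); d measured, y predicted."""
    d, y = np.asarray(d, float), np.asarray(y, float)
    n = len(d)
    rmse = np.sqrt(np.mean((d - y) ** 2))                          # Eq. (29)
    wmape = np.sum(np.abs((d - y) / d) * d) / np.sum(d)            # Eq. (30)
    ns = 1 - np.sum((d - y) ** 2) / np.sum((d - d.mean()) ** 2)    # Eq. (31)
    vaf = (1 - np.var(d - y) / np.var(d)) * 100                    # Eq. (32)
    r2 = 1 - np.sum((d - y) ** 2) / np.sum((d - d.mean()) ** 2)    # Eq. (33)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)                  # Eq. (34)
    pi = adj_r2 + 0.01 * vaf - rmse                                # Eq. (35)
    rsr = rmse / np.sqrt(np.mean((d - d.mean()) ** 2))             # Eq. (36)
    return {"RMSE": rmse, "WMAPE": wmape, "NS": ns, "VAF": vaf,
            "Adj_R2": adj_r2, "PI": pi, "RSR": rsr}

print(performance_indices(y_test, dw_pred))
```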

The above equations were used to compute the statistical parameters, and Table 4 presents their values for the developed models.

Table 4 Performance indices of the developed models

Figure 12 depicts the statistical performances in chart form, which corroborates the performance of the developed models.

Fig. 12

Statistical performances of the developed model

Table 4 presents the computed values of the statistical parameters. For all three models, the RMSE is low, indicating appreciable performance. Furthermore, the RSR values are in the vicinity of the optimal value of zero, confirming the accuracy of the developed models. The coefficient of determination is highest for the support vector machine, illustrating its superiority over the backpropagation and adaptive neuro-fuzzy inference system models. As researchers have noted, such performance indicators are used to evaluate the precision of a developed model, but no single indicator is definitive (Gokceoglu 2002).

The Taylor diagram is a pictorial summary of how closely a pattern of predictions matches the observations, assessing multiple features of competing models in a single plot (Taylor 2001). Models whose predictions agree with the measured values lie in the vicinity of the x-axis, which indicates high correlation and low errors.

Figure 13 shows the Taylor diagram of model performance; the SVM plot lies nearer the best correlation than the other two models, ANFIS and the backpropagation neural network (BP). Plots of residuals against fitted values reveal linearity, homoscedasticity, and the presence of outliers; moreover, they show that the average residual for every fitted value is close to zero. The Q-Q (quantile-quantile) plot of theoretical quantiles against standardized residuals shows the residuals falling on the dashed line. A horizontal band of evenly dispersed points indicates good homoscedasticity of the residual variance. The residuals-versus-leverage plot identifies influential observations in the regression models; a point falling outside the dashed line is considered influential. The following charts show the William's plots for the developed models.

Fig. 13

Taylor diagram for the developed soft computing models

In Fig. 14a, b, and c, most of the points lie on the linear line with few outliers and good homoscedasticity; the SVM model nevertheless performs better than the ANFIS and BPNN models. The marginal histogram shows the distribution of each measure along the boundaries of the scatter plot, displaying the predicted data at several aggregated levels in a single view (Dai et al. 2022).

Fig. 14

William’s plot for the developed model. a SVM. b ANFIS. c BPNN

Figure 15 depicts the marginal histograms of the forecasted groundwater depth for Chennai City. In this figure, the SVM model stays within range with fairly acceptable errors compared with the developed ANFIS and BP models.

Fig. 15

Marginal histograms based on the predicted results of SVM, ANFIS, and BP models

Several observations arise from the results of the adopted SVM, ANFIS, and BP models. 2D and 3D maps were obtained from all the models, but the SVM model is more precise than the others. Under the statistical tests, the SVM model shows appreciable values even though it was trained on a small dataset. The sensitivity of the input latitude (Lx) is greater than that of the other input in all the adopted models. Other regression models could be explored, but advanced models were applied here to this specific problem. Based on the Taylor diagram, the William's plot, and the marginal histogram, the SVM again outperforms the ANFIS and BP models.

Conclusion

This article examines the efficiency of regression procedures for foretelling groundwater depth. All the models provide a map as output, and the support vector machine additionally produces an equation for determining dw on new data. The support vector machine uses three tuning parameters, whereas the adaptive neuro-fuzzy inference system uses six. Although all the adopted models yield good output, the support vector machine gave the best results among the developed models. Various statistical calculations and graphical representations show that the SVM model has the greater potential for determining the groundwater table depth for Chennai City. The adopted model thus provides an equation for early prediction of the groundwater table of Chennai City that can be used for future drinking water and construction projects.