1 Introduction

1.1 Aim & Scope

Today, water scarcity is one of the most critical problems facing humanity, and water quality modeling is one of the main challenges in water resources management (Kheradpisheh et al. 2015; Qu et al. 2020). The use of water for agricultural, industrial, and drinking purposes is one of the reasons water quality management is so important. A central management challenge is predicting the future state of water resources, and recent research has studied and modeled water resources under different scenarios (Chang et al. 2021).

Water scarcity has occurred worldwide due to population growth, industrial development, and increasing water use, especially in arid and semi-arid regions (Sharafati et al. 2020). Recently, multiple and prolonged droughts have occurred in different parts of Iran. It should be noted that available water resources are unstable, so there is no guarantee that they will remain usable. Meanwhile, scientific advances have produced various models for studying climate change, water quality, air pollution, and related fields. One of these advances is AI, which contributes to many types of research; AI models have been more successful than other approaches (Cao et al. 2020; Lyu and Liu 2021).

1.2 Literature Review

Recently, much research has been done in water resources management. These studies have addressed qualitative and quantitative modeling, system optimization, and estimation of parametric changes in water resources. Scientific advances in engineering applications led to the development of AI, which makes it easier to analyze nonlinear and complex problems. These methods fall into several categories, such as Artificial Neural Networks (ANN), Machine Learning (ML), Metaheuristic Optimization Algorithms (MOA), Fuzzy Inference Systems (FIS), and combinations of ANNs and MOAs.

On the one hand, some researchers studied water resources management using ANNs, which not only increase the speed of evaluation but also improve the accuracy of groundwater level and other water parameter estimates (Yang et al. 2014; Shahid and Ehteshami 2015; Heddam et al. 2016). On the other hand, in many articles, authors modeled water variables using ML models (Elkiran et al. 2019; Majumder and Eldho 2020; Qu et al. 2020; Kadkhodazadeh and Farzin 2021). Moreover, some papers simulated parameters with conjunctions of ML models and ANNs (Zhu and Heddam 2019), while other works used ANNs and ML models separately for quantitative and qualitative assessment, monitoring, and estimation of water resources parameters (Azad et al. 2015; Shi et al. 2018; Ye et al. 2019; Patki et al. 2021; Zhang et al. 2021). Other studies used optimization algorithms to improve the modeling and simulation of water resources; MOAs such as the genetic algorithm, differential evolution, and particle swarm optimization were implemented to increase the accuracy and precision of water engineering models (Heddam et al. 2016; Guneshwor et al. 2018; Jeihouni et al. 2020). In further articles, combinations of MOAs, wavelet transforms, ANNs, and ML models were considered as solutions to improve modeling, optimal design, and accurate estimation and prediction of water resources (Jaddi and Abdullah 2017; Alizadeh et al. 2018; Zhang et al. 2019; Poursaeid et al. 2020; Noori et al. 2020). Also, fuzzy logic, fuzzy neural networks, and FISs were used for parametric modeling of water and water pollution (Tokachichu and Gaddam 2021; Sada and Ikpeseni 2021; Niu et al. 2021; Asgari et al. 2021).

1.3 Contribution

In this paper, the groundwater level (GWL) was predicted using ML techniques and mathematical methods. Although various studies have been done in this field, for the first time in the study area, the GWL was estimated with ELM, LSSVM, and ANFIS models and an MLR model as a comparative study to simultaneously estimate groundwater parameters. In quantitative and qualitative water resource management, the practical factors include Cl−, EC, TDS, SO42−, Ca2+, Mg2+, etc. Among these factors, the most widely used water quality parameters, TDS, EC, salinity, and time, were considered as the input vector for the abovementioned models.

The rest of the paper is organized as follows: The water quality and its parameters are explained in Sect. 2. In Sect. 3, materials and methods such as various AI models and MLR formulation are described. In Sect. 4, the study area and its steps are expressed. The results are presented and discussed in Sect. 5. Finally, the conclusion is summarized in Sect. 6.

2 Problem Description

In this section, water quality and its parameters are explained. Also, some international water quality standards are presented.

2.1 Water Quality (WQ)

WQ management plays a critical role in the quality management and sustainable use of water resources (Ahmadianfar et al. 2020). Several factors affect water quality, and they are divided into two categories: quantitative and qualitative. Quantitative factors include rainfall, temperature, and the like. Qualitative factors are far more numerous than quantitative ones. The following are some of the most widely used water quality parameters (Lukawska-Matuszewska and Urbański 2014).

2.1.1 Total Dissolved Solids (TDS)

Total Dissolved Solids (TDS) is one of the most reliable and valid WQ parameters; it denotes the total amount of all soluble solids in water (Jamei et al. 2020). On the one hand, TDS measures the amount of solutes remaining after evaporation of a measured volume of filtered water (Mokhatab et al. 2019). On the other hand, this parameter is widely used as a measure of water suitability for drinking and irrigation purposes and is measured in milligrams per liter (mg/l). TDS includes various mineral salts such as sodium (Na+), magnesium (Mg2+), calcium (Ca2+), potassium (K+), chloride (Cl−), sulfate (SO42−), nitrate (NO3−), soluble bicarbonate (HCO3−), and organic matter (Ahmadianfar et al. 2020).

2.1.2 Electrical Conductivity (EC)

Electrical conductivity (EC) is one of the essential WQ parameters and has been used in several articles to model WQ resources. EC is one of the most important factors in WQ analysis and is closely related to the concept of salinity (Serrano-Finetti et al. 2019). EC indicates the degree of electrical transmission in water, which depends closely on the amount of water-soluble salts. The salt concentration of water is a crucial factor in determining WQ's suitability for drinking and irrigation purposes. It should be noted that EC is measured in microsiemens per centimeter (μS/cm). As with TDS, EC depends on the amount of dissolved ionic solutes such as sodium (Na+), chloride (Cl−), magnesium (Mg2+), sulfate (SO42−), and calcium (Ca2+) in the water. A high amount of ionic salts in the water reduces its drinking quality (Ahmadianfar et al. 2020).

2.1.3 Salinity

Water salinity is a qualitative parameter and one of the criteria for WQ assessment. It is defined as the concentration of salt in the water. Water resources have a natural background salinity, but factors such as high evaporation rates or increased human consumption cause it to rise (Harris 2009). In other words, salinity can be defined as the concentration of soluble mineral salts in water and soil based on volume or weight per unit area (Sparks 2003).

2.1.4 Hardness

This parameter reflects the presence of certain water-soluble salts; it commonly means the amount of calcium and magnesium in water and, in ordinary usage, the concentration of calcium carbonate in water. These salts take different forms, the most common being fluorides, carbonates, and sulfates of calcium and magnesium (Mtaita 2003). Since water hardness is a criterion of WQ measurement, it is considered slightly different from salinity (Ansell 2005).

2.1.5 Dissolved Oxygen (DO)

Dissolved Oxygen (DO) is a WQ parameter that denotes the amount of dissolved oxygen present in water. This parameter plays a significant role in the science of water resource management (Yang et al. 2021). It also represents the health of the water and is a criterion for understanding the health status of a river (Tiyasha et al. 2021).

2.2 WQ Standards

There are several standards for WQ parameters, shown in Table 1. These standards include WHO, BIS, and SSMQO (Ahuja et al. 2019).

Table 1 Standards of WQ

3 Material and Methods

3.1 AI Models

In this section, AI models and the MLR method are expressed.

3.1.1 Least Square-Support Vector Machine (LSSVM)

The support vector machine (SVM) is based on Vapnik's theory (Sapankevych and Sankar 2009). This type of ML uses structural risk minimization, while some other AI methods use empirical risk minimization (Cristianini and Shawe-Taylor 2000; Dibike et al. 2001). The SVM can be used for classification and regression problems. In this theory, a quadratic programming problem yields an equation that determines the constant parameters of the model; the optimal values for the SVM constants can then be obtained using MOAs. SVMs were initially used for classification, but they can also be used for time-series prediction (Cristianini and Shawe-Taylor 2000; Campbell 2002; Schölkopf and Smola 2002; Suykens et al. 2002).

By mathematical definition, the least squares support vector machine (LSSVM) is formulated as follows: if xi and yi are the input and output data for the model, respectively, then the nonlinear regression function is defined as (Valyon and Horvath 2007):

$$f\left(x\right)=\sum_{i=1}^{k}{w}_{i}{\varphi }_{i}\left(x\right)+b={w}^{T}\cdot \varphi \left({x}_{i}\right)+b$$
(1)

where w is the weight vector, b is the bias, and φ are nonlinear functions for mapping data into large feature spaces:

$$\begin{array}{l}w=\left[\begin{array}{c}{w}_{1}\\ \vdots \\ {w}_{k}\end{array}\right]\qquad,\qquad\varphi =\left[\begin{array}{c}{\varphi }_{1}\\ \vdots \\ {\varphi }_{k}\end{array}\right]\\ or\\ w={\left[{w}_{1},\cdots ,{w}_{k}\right]}^{T}\qquad,\qquad\varphi ={\left[{\varphi }_{1},\cdots ,{\varphi }_{k}\right]}^{T}\end{array}$$
(2)

The nonlinear regression problem can be solved by minimizing the following quadratic programming problem:

$$\begin{array}{cc}{Min}_{w,e,b}& \left\{G\left(w,e\right)=\frac{1}{2}{w}^{T}w+\frac{1}{2}C\sum\limits_{i=1}^{n}{e}_{i}^{2}\right\}\end{array}$$
(3)

where C has the role of tradeoff variable between two terms of the equation. The result is defined as follows:

$${Y}_{i}={w}^{T}\cdot \varphi \left({x}_{i}\right)+b+{e}_{i},\qquad{Y}_{i}={y}_{i}+{\lambda }_{i}$$
(4)

λi is the system noise. Also, for each xi in LSSVM, the result is a weighted sum of n kernel functions, in which the central variable of the kernel functions is obtained using trained inputs. The Lagrangian form of the equation with these explanations is shown in Eq. (5).

$$\Psi \left(w,b,e,\alpha \right)=G\left(w,e\right)-\sum_{i=1}^{n}{\alpha }_{i}\left({w}^{T}\cdot \varphi \left({x}_{i}\right)+b+{e}_{i}-{Y}_{i}\right)$$
(5)

In Eq. (5), \({\alpha }_{i}\)'s are Lagrangian multipliers. Then, a constrained optimization problem can be solved. Optimization constraints will be defined as Eq. (6).

$$\begin{array}{l}\frac{\partial\Psi }{\partial w}=0\qquad\to w=\sum\limits_{i=1}^{n}{\alpha }_{i}\varphi \left({x}_{i}\right)\\ \frac{\partial\Psi }{\partial b}=0\qquad\to \sum\limits_{i=1}^{n}{\alpha }_{i}=0\\ \frac{\partial\Psi }{\partial {e}_{i}}=0\qquad\to {\alpha }_{i}=C{e}_{i}\;\;\:\quad\qquad\qquad\qquad\qquad,i=\mathrm{1,2},\cdots ,n\\ \frac{\partial\Psi }{\partial {\alpha }_{i}}=0\qquad\to {w}^{T}\varphi \left({x}_{i}\right)+b+{e}_{i}-{Y}_{i}=0\qquad,i=\mathrm{1,2},\cdots ,n\end{array}$$
(6)

At the end of the above steps, the final solution of the problem is as follows:

$$\left[\begin{array}{ccccc}0& 1& 1& \cdots & 1\\ 1& {\Phi }_{11}+\frac{1}{C}& {\Phi }_{12}& \cdots & {\Phi }_{1n}\\ 1& {\Phi }_{21}& {\Phi }_{22}+\frac{1}{C}& \cdots & {\Phi }_{2n}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1& {\Phi }_{\left(n-1\right)1}& {\Phi }_{\left(n-1\right)2}& \cdots & {\Phi }_{\left(n-1\right)n}\\ 1& {\Phi }_{n1}& {\Phi }_{n2}& \cdots & {\Phi }_{nn}+\frac{1}{C}\end{array}\right]\left[\begin{array}{c}b\\ {\alpha }_{1}\\ {\alpha }_{2}\\ \vdots \\ {\alpha }_{n}\end{array}\right]=\left[\begin{array}{c}0\\ {Y}_{1}\\ {Y}_{2}\\ \vdots \\ {Y}_{n}\end{array}\right]$$
(7)
$$Y=\left[{Y}_{1},{Y}_{2},\cdots ,{Y}_{n}\right],\quad \alpha =\left[{\alpha }_{1},{\alpha }_{2},\cdots ,{\alpha }_{n}\right],\quad 1=\left[1,1,\cdots ,1\right]$$
(8)

Furthermore, in Eq. (7), Φi,j is the kernel matrix and H(xi, xj) is the kernel function, written as follows:

$${\Phi }_{i,j}=H\left({x}_{i},{x}_{j}\right)\quad,\quad{\Phi }_{i,j}={\varphi }^{T}\left({x}_{i}\right)\cdot \varphi \left({x}_{j}\right)\quad,\quad i,j=1,\cdots ,n$$
(9)
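As a concrete illustration, the linear system of Eq. (7) can be assembled and solved directly. The sketch below is not code from the paper: it assumes an RBF kernel for H(xi, xj), and the values of C and the kernel width sigma are illustrative choices.

```python
import numpy as np

def lssvm_fit(X, y, C=10.0, sigma=1.0):
    """Fit an LSSVM regressor by solving the linear system of Eq. (7).

    An RBF kernel is assumed for H(x_i, x_j); C and sigma are
    illustrative hyperparameters, not values from the paper.
    """
    n = X.shape[0]
    # Kernel (Gram) matrix Phi_{ij} = H(x_i, x_j), Eq. (9)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))
    # Assemble [[0, 1^T], [1, Phi + I/C]] [b; alpha] = [0; Y], Eq. (7)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = Phi + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b, X, sigma

def lssvm_predict(model, Xnew):
    """Weighted sum of kernel evaluations against the training inputs."""
    alpha, b, Xtr, sigma = model
    d2 = ((Xnew[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    return K @ alpha + b
```

With a large C (weak regularization), the fit nearly interpolates the training data, reflecting the tradeoff role of C in Eq. (3).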

3.1.2 Extreme Learning Machine (ELM)

The theory of this algorithm was proposed in 2004 (Bin et al. 2004). This AI model is a learning machine based on a single-hidden-layer feed-forward neural network, and various studies have demonstrated its superiority over other AI methods (Bin et al. 2006, 2012). If there are n neurons in the hidden layer, the single-layer feed-forward network can be defined as Eq. (10) (Liang et al. 2006).

$${y}_{j}=\sum\limits_{i=1}^{n}{\beta }_{i}g\left({x}_{k};{c}_{i},{a}_{i}\right)\quad,j=\mathrm{1,2},3,\cdots ,k$$
(10)

where g is the transfer function, ci are the hidden-node biases, ai are the input weights, and βi are the output weights. The input weights connecting the input nodes to the hidden-layer nodes and the biases are initialized randomly. The above equation can be rewritten in the form of the following equations.

$$H\ \beta =Y$$
(11)
$$H={\left[\begin{array}{ccc}g\left({x}_{1};{c}_{1},{a}_{1}\right)& \cdots & g\left({x}_{1};{c}_{k},{a}_{k}\right)\\ \vdots & \ddots & \vdots \\ g\left({x}_{n};{c}_{1},{a}_{1}\right)& \cdots & g\left({x}_{n};{c}_{k},{a}_{k}\right)\end{array}\right]}_{n\times k}$$
(12)
$$\beta ={\left({\beta }_{1}^{T},{\beta }_{2}^{T},\cdots ,{\beta }_{h}^{T}\right)}_{h\times m}^{T}$$
(13)

Finally, the output weights of the learning machine can be calculated in the hidden layer using the Moore–Penrose generalized inverse matrix method:

$$\beta ={H}^{\dagger}Y$$
(14)
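The training procedure above can be sketched in a few lines. The sigmoid transfer function, the number of hidden nodes, and the random seed below are illustrative assumptions, not choices reported in the paper.

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    """Minimal ELM sketch: random input weights and biases, sigmoid
    transfer function, output weights from the Moore-Penrose
    pseudoinverse (Eq. 14)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(size=(n_in, n_hidden))   # random input weights (fixed)
    c = rng.normal(size=n_hidden)           # random hidden biases (fixed)
    H = 1.0 / (1.0 + np.exp(-(X @ W + c)))  # hidden-layer output matrix (Eq. 12)
    beta = np.linalg.pinv(H) @ Y            # beta = H^dagger Y (Eq. 14)
    return W, c, beta

def elm_predict(params, X):
    W, c, beta = params
    H = 1.0 / (1.0 + np.exp(-(X @ W + c)))
    return H @ beta                          # Eq. (11): H beta = Y
```

Because only beta is learned, in closed form, training requires no iterative optimization, which is what makes the ELM the fastest of the compared models in the testing-time comparison later in the paper.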

3.1.3 Adaptive Neuro-Fuzzy Inference System (ANFIS)

Adaptive Neuro-Fuzzy Inference System (ANFIS) is a feed-forward neural network that performs simulation based on fuzzy logic. Two types of Fuzzy Inference Systems (FIS) are built on fuzzy logic (Tokachichu and Gaddam 2021; Arora and Keshari 2021):

  • Fuzzy inference system-based network, called Mamdani, known as M-FIS for short.

  • Takagi–Sugeno fuzzy inference system-based network, known as TS-FIS for short.

In these networks, there are at least two inputs with fuzzy sets D1 and D2, which yield two if–then conditional rules, one for each output Oi, in a network based on the TS-FIS inference system. The conditional rules of these fuzzy networks are as follows:

  1. If x is D1 and y is O1, then:

    $${f}_{1}={a}_{1}x+{b}_{1}y+{c}_{1}$$

  2. If x is D2 and y is O2, then:

    $${f}_{2}={a}_{2}x+{b}_{2}y+{c}_{2}$$

Neuro-fuzzy networks are organized as an input layer plus five further layers, forming a multi-layered neural network.

  • Layer 0: Input layer with n Input Nodes

  • Layer 1: This layer fuzzifies each node's input, assigning it a membership value through a generalized Gaussian function.

    $${\mu }_{Di}\left(x\right)=\mathrm{exp}\left\{-{\left[{\left(\frac{x-{h}_{i}}{{z}_{i}}\right)}^{2}\right]}^{{t}_{i}}\right\}$$
    (15)

    where zi, ti, and hi are the parameters of the adaptive membership functions in the network.

  • Layer 2: All fuzzified data are passed into the product operators. The membership values μDi(x) and μOi(x) are the antecedent parameters of the rules, and the firing strength wi of each rule is:

    $${w}_{i}={\mu }_{Di}\left(x\right)\times {\mu }_{Oi}\left(x\right)$$
    (16)
  • Layer 3: The firing strength of every node is normalized as:

    $${\overline{w} }_{i}=\frac{{w}_{i}}{\sum\limits_{t=1}^{T}{w}_{t}}$$
    (17)

    where \({\overline{w} }_{i}\) is the firing strength of the ith rule normalized by the sum of the second-layer outputs.

  • Layer 4: In each node, the corresponding linear function is calculated, and the coefficient of the functions is calculated using the backpropagation neural network error.

    $${\overline{w} }_{i}{f}_{i}={\overline{w} }_{i}\left({a}_{0}{x}_{0}+{a}_{1}{x}_{1}+{a}_{2}\right)$$
    (18)

    where the ai are the consequent coefficients of rule i and \({\overline{w} }_{i}\) is the output of layer 3. This part of the model is trained using the least-squares approximation method.

  • Layer 5: This layer sums the outputs of the fourth-layer nodes, computed as below:

    $$\sum {\overline{w} }_{i}{f}_{i}=\frac{\sum {w}_{i}{f}_{i}}{\sum {w}_{i}}$$
    (19)
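A minimal forward pass through layers 1 to 5, for a two-rule first-order TS-FIS with two inputs, might look as follows. This is a simplified sketch; the parameter values passed in are illustrative, not trained ones.

```python
import numpy as np

def anfis_forward(x, y, prem, lin):
    """One forward pass of a two-input TS-FIS (Sugeno) sketch.

    prem: per-rule Gaussian membership parameters ((h, z, t), (h, z, t))
    for the two inputs, as in Eq. (15); lin: per-rule linear-consequent
    coefficients (a, b, c). All values here are illustrative.
    """
    def mu(v, h, z, t):
        # Layer 1: generalized Gaussian membership (Eq. 15)
        return np.exp(-(((v - h) / z) ** 2) ** t)

    # Layer 2: firing strength w_i = mu_Di(x) * mu_Oi(y) (Eq. 16)
    w = np.array([mu(x, *px) * mu(y, *py) for (px, py) in prem])
    # Layer 3: normalized firing strengths (Eq. 17)
    w_bar = w / w.sum()
    # Layer 4: rule consequents f_i = a_i x + b_i y + c_i
    f = np.array([a * x + b * y + c for (a, b, c) in lin])
    # Layer 5: weighted sum output (Eq. 19)
    return float((w_bar * f).sum())
```

In a trained ANFIS, the premise parameters (h, z, t) are tuned by backpropagation and the consequent coefficients by least squares, as described for layer 4 above.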

3.1.4 Multiple Linear Regression (MLR)

Multiple linear regression (MLR) is a statistical method used to examine and infer the relationship between a dependent variable and multiple independent (primary) variables. It is written as the following equation based on the mathematical relationship between the primary and secondary variables (predictors and responses):

$$f\left({x}_{i}\right)={a}_{0}+{a}_{1}{x}_{1}+{a}_{2}{x}_{2}+\cdots +{a}_{n}{x}_{n}+e$$
(20)

where f(xi) is the secondary (response) variable, the xi are the primary (predictor) variables, the ai are the regression coefficients, and e is a random error (Mustapha and Abdu 2012).
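Fitting Eq. (20) reduces to ordinary least squares; a minimal sketch:

```python
import numpy as np

def mlr_fit(X, y):
    """Least-squares estimate of the MLR coefficients in Eq. (20):
    f(x) = a0 + a1*x1 + ... + an*xn + e."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # [a0, a1, ..., an]

def mlr_predict(coeffs, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coeffs
```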

4 Case Study and Data Collection

4.1 Study Area

In this paper, the study area is the Mighan plain in Arak, located in Markazi province, Iran. According to the statistical results provided by synoptic stations in the region, rainfall varies from a maximum of 461 mm in the northeast to a minimum of 208 mm in the center of the Arak plain. Figure 1 shows the geographical location of the study area and the Vismeh well. In this work, the TDS, EC, water salinity, and time parameters were used as the models' dataset for GWL simulation.

Fig. 1
figure 1

Mighan Lake (Source: Wikimedia & Google Map)

4.2 Study Steps and Data Analysis

In this study, the time-series database was first collected through the database of the Regional Water Company of Markazi province, and then the dataset was categorized. The K-Fold cross-validation method was used to increase the simulation's reliability and accuracy by removing the data trend (detrending data) and data randomization (Poursaeid et al. 2021).

It should be noted that 173 months of sampling data were used in modeling. In most articles on AI, the train/test data split is 80/20 or 70/30 (Reynolds et al. 2019; Jang et al. 2019; Sada and Ikpeseni 2021; Hameed et al. 2021; Hou et al. 2021). Therefore, for better validation of the models, 70% of the data were used for training and the remaining 30% for the testing phase. The same training dataset was entered into all of the LSSVM, ELM, ANFIS, and MLR models. The observed data were TDS, salinity, t, and EC as primary variables and GWL as the response parameter. Finally, the performance and accuracy of the models were compared using the statistical indices of Eqs. (21) to (24).
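The 70/30 split with randomized sample order described above can be sketched as follows; the seed is an illustrative choice, not one reported in the paper.

```python
import numpy as np

def shuffled_split(X, y, train_frac=0.70, seed=0):
    """70/30 train/test split with randomized sample order, as described
    in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # randomize sample order
    cut = int(train_frac * len(X))         # 70% boundary
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]
```

The same shuffled indices can also be partitioned into K folds for the K-fold cross-validation mentioned above, rotating which fold serves as the test set.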

$${R}^{2}=\frac{{\left(n{\sum }_{i=1}^{n}{O}_{i}{I}_{i}-{\sum }_{i=1}^{n}{O}_{i}{\sum }_{i=1}^{n}{I}_{i}\right)}^{2}}{\left(n{\sum }_{i=1}^{n}{O}_{i}^{2}-{\left({\sum }_{i=1}^{n}{O}_{i}\right)}^{2}\right)\left(n{\sum }_{i=1}^{n}{I}_{i}^{2}-{\left({\sum }_{i=1}^{n}{I}_{i}\right)}^{2}\right)}$$
(21)
$$RMSE=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({O}_{i}-{I}_{i}\right)}^{2}}$$
(22)
$$MAPE=\frac{{\sum }_{i=1}^{n}\left|{O}_{i}-{I}_{i}\right|}{n}\times 100$$
(23)
$$SI=\frac{RMSE}{\overline{I} }$$
(24)

where Ii and Oi are the input (observed) values and the model output values, respectively. For all AI and MLR models, \(\overline{I }\) is the mean of the observed values, and n is the number of observations. In the following, the accuracy of the different models for estimating GWL values is investigated.
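The four indices of Eqs. (21) to (24) can be computed directly. The sketch below follows the text's notation, treating I as the observed series (so the scatter index divides RMSE by the observed mean) and computing MAPE exactly as printed in Eq. (23).

```python
import numpy as np

def scores(O, I):
    """Statistical indices of Eqs. (21)-(24); O are model outputs and
    I observed (input) values, per the text's notation."""
    O, I = np.asarray(O, float), np.asarray(I, float)
    n = len(O)
    num = (n * np.sum(O * I) - O.sum() * I.sum()) ** 2
    den = ((n * np.sum(O ** 2) - O.sum() ** 2)
           * (n * np.sum(I ** 2) - I.sum() ** 2))
    r2 = num / den                            # Eq. (21)
    rmse = np.sqrt(np.mean((O - I) ** 2))     # Eq. (22)
    mape = np.sum(np.abs(O - I)) / n * 100.0  # Eq. (23), as printed
    si = rmse / I.mean()                      # Eq. (24), scatter index
    return r2, rmse, mape, si
```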

5 Results and Discussions

First, the input vectors were applied to all models, and GWL was considered the output vector. Then, the performance of the models was evaluated according to the evaluation indices. In this research, seven approaches were used to assess the performance of the models in simulation (Figs. 2 and 3).

Fig. 2
figure 2

Research Flowchart

Fig. 3
figure 3

Graphical Abstract

5.1 Response and Correlation Plots

The response plot overlays the actual and predicted values for every sample. The correlation plot is a scatter diagram used to demonstrate the linear correlation between the actual values and their corresponding predictions. The response and correlation plots of all models are drawn in Figs. 4 and 5, respectively.

Fig. 4
figure 4

Models Prediction Performance: (a) LSSVM, (b) ELM, (c) MLR, (d) ANFIS

Fig. 5
figure 5

Observed data and Results Correlation: (a) LSSVM, (b) ELM, (c) MLR, (d) ANFIS

To determine the superior model, the simulation results are drawn as response plots. According to Fig. 4, the ELM model produced the most accurate GWL predictions. In Fig. 5, the ELM model had the best correlation between observed and predicted data and was determined to be the superior model. The lowest accuracy and performance belonged to the ANFIS model; as shown in Fig. 5, it had the lowest correlation between responses and observed GWL data.

5.2 Statistical Indicators

Table 2 compares the accuracy of all models based on the different performance indices. As can be seen, the statistical indices show the ELM model to be the most accurate: it has the lowest RMSE, MAPE, and SI values compared to the other methods and the R2 value closest to 1. The RMSE, MAPE, SI, and R2 values for ELM were 0.1562, 0.0067, 0.000094, and 0.988, respectively. The LSSVM model, with indices of 0.3952, 0.0165, 0.000238, and 0.927, was the second most accurate model. The MLR and ANFIS models ranked third and fourth, respectively.

Table 2 Statistical Indices

To better visualize Table 2, the various performance indices are compared in Fig. 6; again, the ELM model was determined to be the superior model.

Fig. 6
figure 6

Accuracy Compare: (a) RMSE, (b) SI, (c) R2, (d) MAPE

5.3 Uncertainty Analysis by Wilson Score Method (WSM)

Each of the four methods mentioned above has errors between the actual values and the predicted ones, which are evaluated here using uncertainty analysis by WSM (Bonakdari et al. 2020). This analysis can be calculated independently from the computational error in the simulation of each model. However, some uncertainty is related to data sampling errors, which cannot be investigated here due to the limited number of data and the monthly gathering of the datasets. The computational parameters in WSM analysis are the forecast error Eri, the average prediction error Avrg(Er), and the standard deviation of the error values Se, which are calculated as follows:

$${Er}_{i}={O}_{i}-{I}_{i}$$
(25)
$$Avrg\left(Er\right)=\frac{1}{n}{\sum }_{i=1}^{n}{Er}_{i}$$
(26)
$${S}_{e}=\sqrt{{\sum }_{i=1}^{n}{\left({Er}_{i}-Avrg\left(Er\right)\right)}^{2}/\left(n-1\right)}$$
(27)

where Ii and Oi are the input and output values, respectively, and n is the number of observation samples. The results of the WSM analysis are shown in Table 3, considering a Width of Uncertainty Band (WUB) of 95% and applying ±1.64 Se, which forms a confidence interval of approximately 95% (5% error), denoted 95% CI.
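The quantities of Eqs. (25) to (27), together with the ±1.64 Se band used for the 95% CI, can be computed as in this sketch:

```python
import numpy as np

def wsm_uncertainty(O, I):
    """Prediction-error statistics of Eqs. (25)-(27) and the approximate
    95% confidence band obtained by applying +/- 1.64 * Se."""
    er = np.asarray(O, float) - np.asarray(I, float)           # Eq. (25)
    mean_er = er.mean()                                        # Eq. (26)
    se = np.sqrt(np.sum((er - mean_er) ** 2) / (len(er) - 1))  # Eq. (27)
    band = (mean_er - 1.64 * se, mean_er + 1.64 * se)          # 95% CI
    return mean_er, se, band
```

A negative mean error indicates overestimation and a positive one underestimation (or vice versa, depending on which series is subtracted), which is how the under/overestimation tendencies in Table 3 are read.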

Table 3 Models Uncertainty Analysis

According to Table 3, ELM and LSSVM models have an underestimation performance, while MLR and ANFIS have an overestimation performance. The ELM model with an average prediction error equal to 0.02744 is considered the most accurate model.

5.4 Regression Receiver Operating Characteristic (RROC) Curve and Area over the RROC Curve (AOC)

The Receiver Operating Characteristic (ROC) curve is a two-dimensional (2D) curve used in classification problems; analyzing classification performance with it is known as ROC analysis (Fluss et al. 2012). The equivalent concept for regression problems is the RROC curve.

The RROC curve depicts estimation accuracy by plotting over-estimation against under-estimation (Hernández-Orallo 2013). To compare regression models, the Area Over the RROC Curve (AOC) can be used (Poursaeed and Namdari 2022); the smaller the AOC value, the higher the modeling accuracy.
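Under the shift-based construction of Hernández-Orallo (2013), the RROC points can be generated as sketched below. This is an illustrative implementation, not the paper's code: a constant shift s is added to the predictions, and the total over-estimation and under-estimation are recorded as s varies.

```python
import numpy as np

def rroc_points(y_true, y_pred, shifts):
    """RROC sketch: for each shift s, total over-estimation (OVER >= 0)
    and total under-estimation (UNDER <= 0) of the shifted predictions.
    Plotting OVER against UNDER traces the RROC curve; the AOC is the
    area over that curve, obtainable by numerical integration."""
    e = np.asarray(y_pred, float) - np.asarray(y_true, float)  # errors
    over = np.array([np.sum(np.maximum(e + s, 0.0)) for s in shifts])
    under = np.array([np.sum(np.minimum(e + s, 0.0)) for s in shifts])
    return over, under
```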

Based on the value ranges in the RROC curves shown in Fig. 7, it is concluded that the ELM model is the most accurate in predicting GWL; its range and domain on the 2D axes indicate that the ELM simulation was better than the other models. The LSSVM model earned second place in the ranking, while the MLR and ANFIS models were ranked next. Also, based on the AOC values in Table 4, the ELM model has the minimum AOC, equal to 28.7048, and so is known as the superior model. The LSSVM model, with an AOC of 178.2307, is in second place, while the MLR and ANFIS models, with AOC values of 468.59 and 270,000, ranked third and fourth, respectively.

Fig. 7
figure 7

Models RROC Curve: (a) LSSVM, (b) ELM, (c) MLR, (d) ANFIS

Table 4 AOC values of the Models

5.5 Discrepancy Ratio (DR)

According to the mathematical concept of the DR in Eq. (28), the closeness of its values to the horizontal line DR = 1 shows that the predicted values are close to the actual ones (Poursaeid et al. 2020). This diagram likewise shows the superiority of the ELM model over the other approaches, with the ANFIS model giving the worst result.

$$DR=\left[\frac{{GWL}_{predicted}}{{GWL}_{observed}}\right]$$
(28)

Based on the diagrams of Fig. 8, the ELM model had the most accurate prediction that can be detected according to the closeness of points on the line DR = 1. The LSSVM, MLR, and ANFIS were ranked next after the ELM.

Fig. 8
figure 8

Discrepancy Ratio: (a) LSSVM, (b) ELM, (c) MLR, (d) ANFIS

5.6 Error Distribution Plots

According to Eqs. (29) and (30), the error concept here is based on the absolute error, defined as the difference between the actual value and the predicted value. The differences obtained for each model are calculated and then expressed as percentages.

$$Error={GWL}_{predicted}-{GWL}_{observed}$$
(29)

and

$${Error}_{Percent}=\left|\frac{{GWL}_{predicted}-{GWL}_{observed}}{{GWL}_{observed}}\right|\times 100$$
(30)
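Counting the Eq. (30) percentages in the ranges discussed below (under 10%, 10–20%, over 20%) can be sketched as follows; the bin edges are the ones used in the text.

```python
import numpy as np

def error_percent_bins(y_obs, y_pred, edges=(10.0, 20.0)):
    """Absolute percentage error of Eq. (30), counted in the <10%,
    10-20%, and >20% ranges used for the error distribution plot."""
    y_obs = np.asarray(y_obs, float)
    y_pred = np.asarray(y_pred, float)
    pct = np.abs((y_pred - y_obs) / y_obs) * 100.0   # Eq. (30)
    lo, hi = edges
    return {"<10%": int((pct < lo).sum()),
            "10-20%": int(((pct >= lo) & (pct <= hi)).sum()),
            ">20%": int((pct > hi).sum())}
```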

Based on the prediction error distributions in Fig. 9, the ELM model has the largest share of errors in the range below 10% and the smallest share in the range above 20%, so it is determined to be superior to the other models.

Fig. 9
figure 9

Models Error Distribution

5.7 Comparison of Testing Times

The testing phase of all four models was computed on a machine with an Intel® Core™ i5-4510U CPU @ 2 GHz. The time consumed for testing the models is compared in Table 5. As shown there, the ELM model was the fastest, with a testing time of 1.119735. The LSSVM model was faster than the remaining models, with MLR and ANFIS ranked next.

Table 5 Models Simulation Time

6 Conclusions

This study used the qualitative and quantitative parameters of groundwater in the Arak plain, Markazi province, around the Mighan wetland, to simultaneously predict the GWL. The input parameters, sampling time, TDS, EC, and salinity, were used to estimate GWL by implementing four models: three AI models and one statistical model. The AI models are ELM, LSSVM, and ANFIS, while the statistical model is MLR. After analyzing the results, based on the statistical indices, the best results were recorded for the ELM and LSSVM models, with the lowest RMSE, MAPE, and SI values and R2 values closest to 1. Moreover, based on the response plot, the best performance was assigned to the ELM model, which mapped the predicted values most closely onto the target ones.

In the uncertainty analysis by WSM, the ELM model, with a confidence bound ranging from 0.02056 to 0.07544 and an average error of 0.02744, was the most accurate in the simulation of GWL, with an underestimation tendency. In the case of DR, the ELM model had the greatest concentration of output points near the DR = 1 line. Based on the error distribution method, the best prediction accuracy was also assigned to the ELM, which had the fewest simulation errors in the > 20% range and the most in the < 10% range.

Regarding the RROC analysis, the ELM model was considered superior due to having the smallest range of changes on the coordinate axes of Fig. 7. Based on the AOC values, the ELM model had the lowest AOC and was the most accurate model. Finally, the ELM was the fastest model in terms of time consumed in the testing phase.