1 Introduction

Water is necessary for life and is critical for irrigation, drinking water, and industry. Therefore, many techniques have been proposed to monitor water quality parameters, which is important for assessing the hydrological status of water and its management activities (Varotsos et al. 2020a; Varotsos et al. 2019; Krapivin et al. 2017).

Ground-water is an essential element of the natural water resources system and people have used it continuously since antiquity The progress in drilling and pumping knowledge led to easy use of groundwater and enables people to reach a store of extremely deep aquifers. Ground-water is inexpensive compared with treating surface water. Along with these advantages, it includes nutrients that are beneficial for health. The use of groundwater could be very helpful to address a water scarcity in regions where surface water is restricted. Also, it could be utilized to complement surface water resources.

Human activities influence the quality of ground water supplies because of the dangerous chemical substances generated via agricultural and manufacturing practices. Nitrate (NO3) is a common chemical contaminant that exists in various aquifers worldwide (Nadiri et al. 2019). NO3 has ability to penetrate into ground water therefore, the high concentrations of it NO3 > 10 mg per liter are dangerous and cause and harmful effect to humans (RadFard et al. 2019). Evaluating the groundwater pollution caused by NO3 contamination is a difficult matter due to the complexity of associated uncertainties and NO3 transport (Vadiati et al. 2016). Moreover, another common pollutant in groundwater is salinity, comprising various ions, including chloride, magnesium, sodium, nitrate, bicarbonate, calcium, and sulfate. The simulation experiments performed in (Varotsos and Krapivin 2018; Krapivin et al. 2021) showed that heavy metals, oil hydrocarbons, and radionuclides are considered as primary contaminants of water. Therefore, assessment of groundwater pollutants such as heavy metals is essential to human health. Toxic ions, for instance, bromide (Br-) boron (B), and iron (Fe) can accumulate at advanced level (Nasr and Zahran 2014). Additionally, some of the heavy metals consider as micronutrients could cause harmful health impacts when they have contents exceeding the allowable limits in drinking water (Prasanna et al. 2011; Prasad et al. 2014; Sobhanardakani 2016; Varotsos et al. 2021).

Pollution could limit and influence the development of groundwater for potable uses. Therefore, ensuring quality of water and its accessibility has a great effect on quality of life. Ground-water is a main water resource in semiarid regions; its quality and accessibility are major concerns for hydrogeologists and environmental managers (Jiang et al. 2009; Bain et al. 2014). Changes in GWQ could be produced via changing ground water flow characteristics that could leads to dissolution and transportation of various minerals within aquifers (Hem 1985; Swarna Latha and Nageswara Rao 2012; Knoll et al. 2017).

This study aims to provide a comprehensive review and discuss the application of AI models for assessing the GWQ. In this context, application of many AI models is assessed. Moreover, a comprehensive literature review and meta analysis are conducted to discuss the past trends and future opportunities of AI models for predicting the GWQ.

1.1 Groundwater Quality (GWQ) Parameters

To determine the GWQ, the chemical listed in Table 1 should be considered:

Table 1 Substances naturally discovered in some groundwaters that could impact GWQ

In addition, the assessment of GWQ must take into consideration the following drinking water parameters: (Dohare et al. 2014; Varotsos et al. 2020b)

  1. 1.

    pH: is the inverse logarithm of H2 ion concentration. The pH scale ranges from 0 to 14, and pH values 7-14 are alkaline, from 0-7 are acidic, and 7 is neutral. The pH of mostly drinking water is between 4.4–8.5.

  2. 2.

    Turbidity: refers to the interference of some suspension of particles in water with the passing of light. It is produced via a broad set of suspended particles. It is measured either via its impact on the transmission of light, known as turbiditymetry, or via it is impact on the scattering of light, called nephelometry. According to IS: 10500–2012, an appropriate and acceptable limit are one and five.

  3. 3.

    Total dissolved solids (TDS): A variation of TDS is utilized to determine the ability of filter solids via assistance of a filtrate. Also, it could be estimated from the conductivity measurement in water samples. The standard and permitted limits according to IS: 10500–2012 are 500 and 2000 mg/l respectively.

  4. 4.

    Electrical Conductivity (EC): is the capability of water to hold an electrical current and differs both with number and kinds of ions the solution comprises. On the other hand, the conductivity of distilled water is lower than 1μmhos per cm. Such conductivity relies on the existence of ions, their total concentrations, temperature of liquid flexibility, valency, and comparative concentration. Most inorganic acid solutions are good conductors such as salts and bases.

  5. 5.

    Total hardness (TH): According to IS: 10500–2012 the desirable and permissible limits for hardness are from 200 to 600 mg per liter, respectively. The impact of it size can be seen on utensils and hot water systems in boilers, etc. The amount of TH for drinking water is categorized in terms of corresponding CaCO3 concentrations as soft -0-60 mg/, moderate – 60-120 mg/, tough-120-180 mg/, extremely tough - >180 mg/.

  6. 6.

    Sulfate (SO4−2): Sulfate ions are present in natural water, and the majority of these ions are also soluble in water. Several sulfate ions are generated during the oxidation phase of their ores, and they are often found in industrial wastes. A UV spectrophotometer is used to calculate the amount of sulfate. According to IS: 10500–2012, the desirable limit for sulfate is 200 mg/ litter and the permissible limit is 400 mg/.

  7. 7.

    Nitrate (NO3): is found in raw water and is mostly a form of the N2 compound (of its oxidizing state). It is provided by chemical and fertilizer plants, animal waste, rotting crops, and domestic and industrial waste. A UV Spectrophotometer is used to determine the amount of nitrate. According to IS: 10500–2012, the desired limit for nitrate is max.45, and there is no relaxation of the allowable limit.

  8. 8.

    Total alkalinity (AIK): The number of the components in water that appear to raise the pH to the alkaline side of neutrality is referred to as alkalinity. It is usually expressed as milligrams per liter as calcium carbonate (m as CaCo 3) and is determined by titration with uniform acid to a pH of 4.5. CaCo 3, PO4, and hydroxides are common alkalinity-increasing materials found in water.

  9. 9.

    Chloride (Cl): can be found in all forms of natural and raw water. It is derived from agricultural operations, industrial activities, and Cl- stones. Because of human activity, its concentration is high. According to IS: 10500–2012, the desirable limit for Cl- is 250 and the permissible limit is 1000 mg/l.

  10. 10.

    Fluoride (F): is found in nature as fluorspar, rock phosphate, triphite, and phosphorite crystals, among other things. The temperature of the region, as well as the presence of accessory minerals in the rock minerals assemblage from which the groundwater circulates, are factors that regulate the concentration of F-. According to IS: 10500–2012, the desirable limit for F- is 1 mg/l and the permissible limit is 1.5 mg/l.

  11. 11.

    Boron: is found in nature as boric acid and boric acid salts. Weathering causes it to be expelled from rocks and soils and end up in water. It also enters the soil and groundwater through domestic landfills that are not properly sealed. It is a common indicator compound that signals the presence of other potentially dangerous compounds. According to IS: 10500–2012 the desirable limit for boron is 0.5 and 1 mg/l the permissible limit.

  12. 12.

    Phosphate (PO4): is a necessary plant nutrient that often regulates aquatic plant growth in freshwater. Because of the poor solubility of natural phosphate minerals and the ability of soils to absorb phosphate, groundwater usually absorbs only a minimal amount of phosphorus.

  13. 13.

    Chemical Oxygen Demand (COD): is a measurement of the amount of oxygen available for the chemical oxidation of organic matter using a powerful chemical oxidant. High COD levels can cause oxygen depletion due to microbe decomposition to levels harmful to aquatic life. COD determination has an advantage over BOD determination in that the outcome can be achieved in approximately 5 h as opposed to the 5 days needed by the BOD test.

  14. 14.

    Zinc (Zn): The earth’s crust contains approximately 0.05 g/kg of zinc. Its most common mineral is sphalerite (ZnS), which is usually combined with other sulfide elements. Zinc poisoning symptoms in humans include vomiting, dehydration, electrolyte deficiency, and stomach discomfort. Zinc chloride has been linked to acute renal failure.

1.2 Problem Statement

Alterations in ground water chemical composition are affected via a number of geochemical processes and human activities that may be complicated to identify (Almasri and Kaluarachchi 2005; Yesilnacar and Sahinkaya 2012). Therefore, it’s essential to comprehend the procedures that affect GWQ for efficient management of water resource and for management and protection of aquatic environments and such considerations could have the most important influence on sustainable development of countries (Batayneh et al. 2016; Fijani et al. 2017; Sheikhy Narany et al. 2014;Srinivas et al. 2015; Srinivasamoorthy et al. 2013; Krapivin et al. 2018). Conceptual or physical based models are conventionally major tools; nevertheless, they have several practical limitations, including the requirement for a large volume of data and input parameters. GWQ assessment required a large number of samples with chemical, physical, and biological variables (Belkhiri et al. 2018). However, in various zones, large scale sampling become unfeasible because of it is expensive and a lack of facilities; thus, lower cost and quick monitoring approaches are required. Conventional techniques of GWQ estimation are based on mathematical model that have lower accuracy because they typically consider the linear relationship between dependent and independent parameters. Because of limitations in GWQ simulation, new computational methods are necessary to enhance estimate accuracy (Maroufpoor et al. 2020). Additionally, in various instances, data have limitations, and achieving accurate prediction is more essential than comprehending basic mechanisms; in contrast, the artificial intelligence (AI) models could be an appropriate alternate.

1.3 Research Objectives

To build new and better artificial intelligence methods for GWQ modeling, it’s vital to examine what has been carried out with artificial intelligence approaches and recent research, and there is a necessity for scholars to understand what other researchers have performed in this respect. Consequently, the contributions of the current review are summarized as following:

  1. 1.

    This survey is summarizing 81 research studies based on artificial intelligence approaches and their advance application, growing over the past two decades (2001-2021), input-output data considerations, study areas, performance metrics, predicting period, and assessment.

  2. 2.

    The survey has emphasized the gaps in application of artificial intelligence approaches for predicting water quality in groundwater research and models that have not been investigated enough with hydro-chemical, and hydrogeological data of groundwater.

  3. 3.

    Lastly, the review shows the benefits and limitations of almost all models, the necessity for further research in different fields, concerns with data gathering, and recommendations for forthcoming improvements. Therefore, this survey study introduces a snapshot of historical evolution of AI methods over time. In depth explanation of AI models’ architectures and mathematical theories have not been described in this survey. However, proper references have been mentioned in background part of each AI method category for the readers who want additional detailed information.

The GWQ modeling employing AI approaches for any study includes the following steps: (1) data collection from remotely sensed/field, (2) dividing data into sets training, validation, and testing of AI methods, (3) selecting essential parameters that are important for training AI models, and (4) evaluating of attained output and comparing it with actual dataset then finding the precision of predictor, as in Fig.1. Two sorts of input parameters involving the hydro-chemical and aquifer characteristics parameters, or their patterns have been applied to predict the GWQ. The hydro-chemical parameters involve sodium (Na +), temperature (T), calcium (Ca2+), magnesium (Mg2+), bicarbonate (HCO3-), SO4−2, Cl, TH, TDS, dissolved solids (DS), EC, pH, F, potassium (K), PO4, and Fe. Such parameters have been gathered via measures filed. The hydrogeological variables refer to aquifer characteristics such as land use. AI methods used for predicting the water quality of groundwater could be categorized as: (1) artificial neural network (ANN) models, (2) fuzzy logic (FL) based models, (3) support vector machines (SVM) models, (4) hybrid models, and (5) machine learning models.

Fig. 1
figure 1

Procedure of applying AI models for GWQ modeling

Several papers (more than 80) have been published to predict the water quality in groundwater by AI models since 2001, as seen in Fig. 2. These studies have been published in international journals from well-known publishers such as Elsevier, Springer, IWA, Wiley, etc., between 2001 and 2021. The studies were obtained from searching the web utilizing the related keywords and were selected because they were published in famous international journals in the hydrology field, water resources field and AI sciences. It could be observed from Fig. 2 that AI models have had obvious growth recently, especially in 2017–2021 which gained the highest precent of published papers. Furthermore, many architecture variations have been noticed in addition to this exponential increase in applications of artificial intelligence approaches. The continued progress indicates the necessity for further exploration because every model has it is own limitation and advantages.

Fig. 2
figure 2

Number of published papers (in the current study) that used AI models in field GWQ modeling

Based on the search, Fig.3 delineates that the most frequently used models involve in ANN methods, with 52 papers followed by machine learning models with 21 papers. Also, it would be observed that the lowest frequency of papers included in hybrid models compared with single models with only nine papers. The RBF, MLP and back propagation neural network (BPNN) are the most common ANN methods applied in the groundwater field. Moreover, application of machine learning methods substantially in ceased from 2017 to 2021, whereas SVM model have been infrequently applied recently. Even though a varied rate is observed in all publications, there is a substantial growing trend in applying AI models to predict GWQ from 2017, as shown in Fig.3.

Fig. 3
figure 3

Number of times AI methods are applied among reviewed papers for predicting GWQ

In Table 2 many among the most widely used AI approaches for GWQ modeling are discussed. The methods include artificial neural network (ANN), adaptive neuro-fuzzy inference systems (ANFIS), support vector machines (SVM), hybrid models, and various machine learning (ML) techniques. Firstly, a brief overview for every AI method is introduced and then the related conducted studies are cited and reviewed. This is followed by assessment and evaluation, conclusion, and future trends in research.

Table 2 Summary of studies that reported the application of AI models and their learning methods for modeling water quality in groundwater

2 Artificial Intelligence Methods

2.1 Artificial Neural Network (ANN) for GWQ Modeling

Is a resilient method, using ANN inspired by brain processes to solve a complicated problems by making a relationship between input and output datasets. This relationship is produced via neurons. The connection between inputs and outputs are created in training during the iteration process that is terminated after the specified criterion are satisfied. Feed forward neural networks (FFNN), that is easiest kind of neural network, is classifying to single layer perceptron (SLP) and MLP. SLP is makes linear connection between input and output, is the improper model for complicated and nonlinear problem, whereas MLP provide a robust connection between input variable and output variable to modeling a nonlinear problem. Datasets are pass across the loop in order to get valid relation for input-output. The BPNN is well-known sort for MLP that is use a back-ward method of adjusting parameter of networks. RBFNN is other model of ANN based model that is used the RBF as an activation function. RBFNN model was primarily presented by (Lowe, Broomhead, and Royal Signals and Radar Establishment 1988). Figure 4 shows a simple structure of ANN model.

Fig. 4
figure 4

Schematic view of ANN approach in predictioning GWQ

2.1.1 Bibliographic Review

Current research in predictioning GWQ reported that ANN can present as a superior replacement for conventional techniques. Heidarzadeh (2017) used an ANN model to predict the GWQ of the Amol-Babol aquifer using a the dataset from 1987 to 2010. Na was studied as an output variable in the ANN approach because of its high concentration in irrigation. Based on pre modeling, the variables pH, EC, and TH were selected as the better inputs. The researchers in (Khudair et al. (2018) investigated applying an ANN model using data for Baghdad, Iraq, for water parameters: pH, Cl, SO4−2, and TDS. The findings gained using the ANN model showed high prediction efficiency and the parameters pH and Cl have substantially affected model prediction, and thus are critical factors. Klçaslan et al. (2014) proposed an ANN method to estimate GWQ and compared it to conventional water quality evaluation techniques; this paper proved the effectiveness and accuracy of the ANN approach. Kuo et al. (2004) was used-the BPNN to predict the variation of GWQ in Taiwan. Various kinds of back progression models were created in order to assess their performance. The findings showed that ANN was able to define the complex variation of GWQ and be applied to present accurate predicting. Furthermore, a number of hidden neurons do not greatly effect performance of the model in training or testing. Sunayana et al. (2020) was evaluated GWQ from 2016 to 2017 at 10 locations in India. The TH was forecasted employing ANN by applying various training algorithms, then the finest one was utilized in the ANN final model to predict GWQ. They quantified the impact for both space and time on TH of ground-water. The ANN used here was able to forecast the TH concentrations of each point. Choi et al. (2014) observed the foremost ions, for instance, Ca2+, Mg2+, Na +, K +, HCO3-, Clˉ, SO4−2, and NO3, in 2008 to evaluate GWQ in South Korea. The hydro-chemical dataset of ground-water samples was analyzed applying the SOM method, which is a type of ANN model. The results revealed that this method could be effectively employed for classifying and characterizing groundwaters in terms of hydro-chemistry and quality on the area size. To predict NO3 concentrations of ground water wells in the southern area of the Gaza Strip, Zaqoot et al. (2018) used to ANN models MLP-NN and RBF-NN. Both were trained and developed using seven input variables (pH, EC, TDS, Ca2+, Mg2+, hardness, and abstraction rate), and the results of MLP-NN showed better performance than RBF-NN. Wagh et al. (2017) applied the ANN approach by BP algorithm to predict GWQ in the Kadava River basin in the Nashik district using parameters EC, HCO3-, TDS, TH, Ca2+, Na +, Clˉ, CO3, Mg2+, F, and SO4−2. The mimic output tracks the measured and forecast NO3 value showed that the ANN is a superior model to manage groundwater resources in the a simpler manner. Wagh et al. (2016) applied ANN in order to predict values of %Na, MAR, SAR, KR, and RSC in the ground water of Nanded Tehsil. The 50 ground-water according to various samples were studied to various physicochemical variables, for example, pH, TDS, EC, Ca2+, Mg2+, HCO3-, Na, Clˉ, CO3, NO3, and SO4−2 during the pre monsoon season in 2012. These parameters were used as input variables in the ANN-based GWQ indices. The results confirm that the ANN model is successfully applied to predict ground-water suitability. Nasr and Zahran (2014) created an ANN with feedforward BP to forecast groundwater salinity in Egypt, which is expressed through TDS, using pH as input parameters. The network output was satisfactory, and simulation can be used for entering new inputs. Khan et al. (2021) forecast the amount of waterborne bacteria in GWQ via forecasting the presence of E.coli applying chemical, physical, and microbiological parameters of groundwater. SLA was proposed to examine the pattern of ANN based sensitivity analyses to be determine the significance of all GWQ parameters resulting in prediction of E.coli in ground-water. A higher correlation is seen between E.coli and a lower pH values, although the lower correlation is found between DO and pH values. The model with turbidity, pH, TDS, and EC as inputs performed better, according to the data. It could be inferred that Grover’s algorithm-based superposition models are more effective at predicting each pattern of E.coli count in ground-water. Fadipe et al. (2021) assessed the GWQ by predicting the water quality parameters in Boluwaduro community, Ofatedo in Osun State. An ANN model was employed using water samples collected from 18 randomly selected dug wells and subjected to physical, chemical, and microbiological analyses. The ANN design and were trained in some rounds until acceptable outputs was attained with R2 out of 0.97. The created model for TDS provided a good prediction with TH and Mg2+ respectively. Honget al. (2001) applied the KSOFM neural network model to analyze the impact of stormwater infiltration on GWQ. It was concluded that the KSOFM model provided useful analyzing and diagnosing tools to comprehend a dynamic in GWQ and to extract information included in the multidimensional dataset. It is considered as to have potential not only in GWQ monitoring and diagnosis, but also in further environmental fields. Kassem et al. (2021) used the ANN method with different input combinations of GWQ variables to identify the most important variables impacting of the Cl concentration prediction values. Seventeen ANN models were developed through altering inputs variables. Results revealed that ANN 5 model with combination (AVR, ICC, A, LT, GWL, RR, DSWS, W, AT, DSSL) delivered outstanding estimation of prediction values of final Cl concentrations. The ANN method illustrated how ANN developing method could be utilized to identifying key variables needed for the most important parameters influencing Cl concentrations. Nakagawa et al. (2017) examined the ground water chemistry traits at Shimabara via used of SOM, inputs to this model were the eight main ground water chemical elements (Cl−, NO3, SO4−2, HCO3-, Na +, K +, Ca2+ and Mg2+). Results indicated that based on chemistry, surface water and groundwater can be classified to five major clusters demonstrating distinctive patterns. Additionally, the five clusters can also be divided to two main water categories, i.e., nitrate-polluted and nonpolluted water. According to Stiff and Piper trilinear diagrams, nitrate polluted water was the Ca-(So4 + NO3) kind, whereas nonpolluted water was classified as Ca- HCO3 kind. Wang et al. (2006) developed a BPNN model to mimic spatial distribution of NO3-N concentration in groundwater in the North China Plain by land-use data and site specific hydro geological properties. The GIS tool was used to prepare and process input and output vectors information for BPNN. The outcome indicated that GIS-based BPNN mimic ground water NO3-N concentrations effectively and capture the overall trend of groundwaters NO3 pollution patterns. Gemitzi et al. (2010) applied ANN methods combined with GIS for spatial prediction of NO3pollution in groundwater. These involve hydraulic conductivity and depth to aquifer, land uses, fine to coarse grain ratio in the unsaturated area and soil permeability. The method was employed at the South Rhodope Thrace, Greece aquifer. The findings indicated that this method is able to illustrate a spatial pattern of NO3contamination. Alizamir and Sobhanardakani (2016) built the ANN model to predict As, Pb, and Zn concentration in ground water. The Levenberg-Marquardt (LM) algorithm and Bayesian regularization (BR) algorithm were used as the ANN training algorithms. The performance of these algorithms was assessed, and it was noticed that LM was a great choice for predicting heavy metals concentrations in the chosen monitoring wells. The outcomes showed that the LM algorithm took lesser time for training of network compared with the BR algorithm. Based on results, it was proved that the ANN showed acceptable range in accuracy in predicting heavy metals concentrations in groundwater resources. Sakizadeh (2015) utilized ANN through BR to predict water quality index (WQI), with regard to the concentrations of 16 GWQ variables gathered from 47 wells. The results indicated the effective predicting of WQI via ANNs via used BR algorithm. Also, the sensitivity analyses was applied to illustrate the significance of all parameters in predict WQI through ANN training and it revealed that parameters such as PO4 and Fe are the most significant parameters in predicting the water quality index. Moasheri and Tabatabaie (2013) measured the value of NO3 groundwater qualitative using the dataset of the Birjand plain, which were gathered from 35 wells and aqueducts every six months from 2008 until 2010. ANN was optimized through genetic algorithm (GA) and provided values between the lab result and actual result, which showed the reliability of this approach for predicting the NO3 values. Keskin et al. (2015) predicted groundwater pollution sources applying ANN in Turkey. BP and Bee Algorithm (BA) were employed in ANN modeling and 14 hydro chemical datasets were employed. The better ANN classification was achieved with 80% accuracy using the BA algorithm. Singh and Datta (2004) used the universal function approximation property of a multilayer, feedforward ANN to estimate temporal and spatial changing of unknown contamination sources, for instance, hydraulic conductivity, porosity, and dispersivities, and to deliver an accurate estimate for unknown flows and transport parameters. The model was training on patterns of simulated datasets used by the BP algorithm. The set of source fluxes and temporal changing simulating concentrations measurements constitute patterns of trained. A limiting performance evaluation display that introduced method performs well. Sahoo et al. (2006) used BPNN to predict pesticide concentration in ground-water monitored well. The results of the predictive model generated good agreement with actual data in terms of R and pesticide detection efficiency, also good correlation between actual and predict “classes” groups. The BPNN was applied in order to rank inputs variables with maximum potential to pollute ground water, comprising two original parameters which are depth to aquifer material and pesticide leaching class. Once those parameters were solitary input parameters to the model, they were not capable of predicting pollution. Though, if they were used with further parameters, the prediction performance effectiveness of the model in terms of R, E, ME, RMSE, and pesticide occurrence groups improved. El Tabach et al. (2007) proposed ANN metamodel was applied to evaluate a risk of pollution to attain the ground-water resource underneath the road axis of a high-way project in north of France. Barzegar and Moghaddam (2016) investigationed and comparationed the accuracy of various neural calculating methods, MLP, RBFNN, and GRNN, in predicting of ground-water salinity which is expresses by EC. The performance criteria of the built neural networks model revealed that RBF-NN model has the most excellent performance in predicting EC. Sirat (2012) employed an ANN model in the USA in Illinois, Iowa, and 12 other states for predicting pollution of groundwaters with pesticides. The findings of many trials have shown that this model has predicted the pollution well with-in every minor group within high accuracy. It proved that ANNs models are effective tools for predicting groundwater pollution. (Dar et al. 2012) ANN model was used to comprehend a correlation and sensitivity of every chemical parameter with regard to fluoride. The prediction results indicated that four parameters pH, Cl, SO4−2 and Ca2+ were able to better influence fluorides compared with the other eight parameters. Also, the Cl ions were noted to be the most sensitive parameter and most correlating to fluoride. Kadam et al. (2019) applied ANN and MLR methods for predicting the fitness of GWQ for drinking. The physicochemical parameters were taken in account to compute WQI with the view of producing reliable and accurate modeling to predict WQI-based GWQ. The LM with 3-layer BP algorithm was applied in the ANN structure. Additionally, the MLR model was used to test the effectiveness of the prediction model. The result verified that prediction of ANN is acceptable and confirmed regular satisfactory performance for each season. Moasheri et al. (2013) applied the fusion of ANN and geostatistical to assess more parameters of spatial distribution for GWQ is the TDS plain Birjand more precisely. The analyses of the geostatistical interpolation method and application of ANN for optimization results of the geostatistical method have been examined. Ehteshami et al. (2016) employed BPNN and RBFNN and the best architecture for the BP model was noted to 4-5-1 and RBF with a spread parameter = 0.5 and MSE =0.50. Result displayed insignificant variation between architectures. Both ANN models could be consistently predict nitrate contamination in ground-water with satisfactory precision.

2.1.2 Results of Artificial Neural Networks Models

The assessment of many papers on ANN modeling for water quality in groundwater showed the follows points:

  1. (1)

    The ANN methods could be expanded simply from univariate to multivariate instances compared to conceptual methods, and model complexity could easily vary through adjusting a transfer functions, learning algorithms, or networks architectures. Similar to any regression model, inputs would depend on experimental proofs or correlational analyses. Also, conclusions of review studies showed that ANN captured the complexity of nonlinear behaviors for GWQ reasonably more accurately than regular regression methods such as LR or MLR.

  2. (2)

    The most popular structure of ANN for GWQ modeling is using three or two layers with sigmoid transfer function in the hidden layer and linear transfer function in the output layer. This function is more stable, differentiable, and monotonically growing in its field and is an often used function in predicting GWQ. It should be noted that in most reviewed studies, the ANN architecture and number of hidden neurons were attained through the trial-and-error method.

  3. (3)

    The review show that the Levenberg-Marquardt(LM) and Bee Algorithm (BA) algorithms are good training algorithms used to train ANN for GWQ modeling compared with other ANN algorithms. Keskin et al. (2015) compared the performance of BP and BA utilized in ANN training. The best ANN classification was achieved by the BA. The LM algorithm is the alteration of the classical Newton algorithm utilized to find the optimal result to minimizing the problems. The LM algorithm is frequently described as continuous and efficient, and many investigators indicate that it is quicker and less simply trapping in local minimum than further learning algorithms. Alizamir and Sobhanardakani (2016) compared LM and Bayesian regularization (BR) algorithms in terms of how they perform in GWQ, and it was found that LM was a good choice for forecasting heavy metals concentration in the selected monitoring wells. The simulation results indicated that the LM algorithm took less time for training of the network compared to the BR algorithm.

2.2 Fuzzy Logic- Based Modeling for GWQ Modeling

The integration of the fuzzy inference system (FIS) and the adaptive neural networks (AN) produced model adaptive neuro fuzzy inference systems (ANFIS). These models have the ability for capturing the benefit of both approaches in a single structure. Jang (1993) established the structure and the training process of ANFIS which used the neural network training algorithms to create sets of fuzzy “if and then” rule with proper membership functions (MFs) from specifying input-output. FIS correspond to sets of fuzzy “if and then” rule that have training ability to estimate non-linear function. FIS have two methods called Sugeno and Mamdani. The disparities between such methods occur as the following section explains. Mamdani’s method used fuzzy MFs, while Sugeno’s method utilizes linear/continuous MFs.

ANFIS is the AI approach with adaptable statistical structure that can identify complicated non-linearity and uncertainties because of randomness and inaccuracy among variables, without trying to attain an insight as to the nature of trends. Such method is capable of estimating every actual stable function on the compact sets to every level for precision. Hence, in parameters prediction, wherever provided data in which a system links measurable system variables with the internal system parameters, the function mapped maybe built through ANFIS which approximates the processes for evaluation of an internal system parameters. More details about ANFIS it can be found in supplementary materials and in (Jang 1993). Figure 5 shows ANFIS structure.

Fig. 5
figure 5

ANFIS architecture

2.2.1 Bibliographic Review

Some recent investigations have tried to evaluate GWQ applying FIS. Dahiya et al. (2007) applied FSE to analyze GWQ. The model was optimized by testing 42 ground water-samples based on ten drinking water quality parameters. Khaki et al. (2015) assessed the ability for using ANFIS and ANN models to assess TDS and the level of EC for GWQ, using the amounts of further present water quality parameters NO3, SiO2, Na+, pH, Ca2+,, Mg2+, K, HCO3, Clˉ, Fe) in Malaysia. The results indicated that both methods have the ability to interpret behavior of water quality parameters in groundwater. (Aryafar et al. 2019) GP was employed to identify relationship between GWQ parameters, using the independent variables (K+, Na+, pH, Ca2+, Clˉ, CO3, HCO3-, Mg2+, So4−2 and NO3) in order to predict dependent variables (TH, TDS, EC) for 240 samples of groundwaters gathered from 12 wells in Iran. Also, ANN and ANFIS techniques were developed to further verify the estimation ability of GP through comparing the predicted and actual values for chemical parameters. Findings revealed the superiority of GP acceptable performance was delivered via each model in prediction of intended parameters of water quality. In India (Kumar et al. 2020) applied MLR, SEM, and ANFIS models, through mixing three datasets for various areas and time period in four different methods, various regression approaches designed used the TDS as the output parameter and the input parameters used were (K +, SO4−2, Ca2+, Mg2+, Clˉ, Na +, and NO3, TH). Similar regression coefficients were noticed for both SEM and MLR models, while the ANFIS model demonstrated a better performance than the MLR model. Vadiati et al. (2016) used the MFIS model through commonly accept GWQ indices. The method assessed for it is able to evaluate the drinking water quality for 49 samples gathered in every season from groundwater sources in Iran from 2013 to 2014. Input membership function was described as “desirable”, “acceptable” and “unacceptable” depend on experts’ knowledge and standards and allowable restrictions imposed via WHO. Tutmez et al. (2006) used the ANFIS model of ground-water that EC depend on concentrations for positive charged ions in water. The prediction results indicated that this model outperform over conventional approaches in prediction EC from TDS. Jebastina and Prince Arulraj (2018) investigated NO3 pollution in ground water of Coimbatore district of India used 71 observation wells years 2011–2012. The inputs for producing the ANFIS method is average for EC, Na+, Ca2+, Clˉ, TH, K and the output was the concentrations of NO3. Outcomes for the best ANFIS model were evaluated via deterministic, geo-statistical and kernel smooth techniques utilizing GIS. Five models developed using the, ANFIS method predicted NO3 concentrations. Selvaraj et al. (2020) gathered 30 ground water samples from bore well as dug well source and used chemical parameters to measure FWQI. Five FIS models with various linguistic parameters and five classifications of fuzzy model were established, for example “excellent”, “good”, “poor”, “very poor” and “not suitable”. Of the findings achieved from such method, six samples were classified to “excellent”, eight samples to “good”, 12 into “poor”, three into “very poor” and one into “not suitable”. Also the result of this method were compared with the output found from the deterministic technique. MFIS was employed for analyses of the GWQ in Nadiad district, India (Prajapati and Parekh 2016). Various groundwater variables such as EC, TDS, PH, Cl and Ca were utilized to analyze GWQ. Different membership functions, “desirable”, “acceptable” and “not acceptable” are used and fuzzy logic rules were defined. The results indicated that MFIS is very useful and effective tool in assessment of GWQ. Dixon (2005) effectively used FL and GIS, GPS, remote sensing to measure ground-water-vulnerability maps. Nadiri et al. (2017) used Larsen fuzzy logic (LFL) in conjunction with Sugeno fuzzy logic (SFL) and Mamdani fuzzy logic (MFL) to assess the ground waters vulnerability in the Varzeqan Plain, in north western Iran. Nikoo and Mahjouri (2013) used PSVM and FIS methods for probabilities water quality zoning which is important to GWQ. The needed data for training PSVM was produced by using FIS in the Monte Carlo analysis. PSVM rained based water quality index give probability of belonging to various water quality classes “very low”, “low”, “medium”, “high” and “very high”. The prediction result was compared with attained results from clustering methods which are FCT and SOM. These outcomes of GWQ zoning were illustrated in maps using using GIS. Nadiri et al. (2013) presented the supervised committee machine with AI method employing the SFL, MFL, ANN, and neuro fuzzy (NF) to predict fluorides concentrations. The findings demonstrated that all of those models have the same fitting to fluorides dataset in the study area, and do not predicting fine for samples in mixing zones.

2.2.2 Results of Fuzzy Logic Models

The assessment for many papers on fuzzy based prediction for water quality in groundwater displays the following issues:

  1. (1)

    Of the investigated studies, those using a fuzzy-based model ranked second, while ANFIS ranked highest because it provides the benefits of both ANN and FIS models. ANFIS could managing non-linearity, uncertainties and fuzziness relate to water quality of groundwater. ANFIS has some benefits in signal identifications, converse coding and noise cancelation. ANFIS has improved result when datasets are normalize (Ahmed et al. 2017).

  2. (2)

    ANFIS model has demonstrated superior performance in managing data of water quality in groundwater as it has two categories of parameters (linear and nonlinear) that could be adapted appropriately during training phase (Jang 1993). Beside member function which was illustrates the vital role and right selection lead to finest optimizing for parameters, thus result with highest precision (Jang 1996), an additional vital reason to design the excellent ANFIS model is identifications technique used to split inputs spaces. Common methods employed are subtractive cluster, grid partition and fuzzy c-mean cluster by different functions (Jang et al. 1997). Consequently, this model could do well through nonlinear data alongside with appropriate selection groundwater parameters such as (EC, TDS, PH, Cl, Ca, Mg, NO3 …, etc.) decided via the designer.

  3. (3)

    Some models such as MFIS and FWQI were presented for water quality in groundwater modeling. They performed well in managing GWQ. The fuzzy-based model was capable of delivering the highest accuracy of result, and simply incorporating previous knowledge to the model language naturally. This permits parameters to estimate the algorithms in order to give the practical primary values that reducess the opportunity to find needless local minimum. An Additional benefit is the capability to extrapolate that reveals new information revealed from data (Lindskog 1997). The fuzzy rule based model describes input and output relationships more pragmatically and it has physical interpretations that have allowed scholars in GWQ research to select FIS instead of ANN models (Dahiya et al. 2007; Esmaeilbeiki et al. 2020).

2.3 Support Vector Machine (SVM) for GWQ Modeling

The idea of not being able to separate dataset in support vector networks was established in 1995. This kind of machine training used non-linear inputs mapped to highest dimension space then ensure higher generalizing. This was model based on the named large margin component and developed according to structure risks minimizing theory. Support vector machine could produce good accuracy compared with KNN, linear and NN classifiers because it does not involve knowledge. The model includes regression model complexity, regularization, and kernels functions (Cortes and Vapnik 1995). Hence, it’s called the kernel model. Kernels function and parameters must be carefully choosen to decrease the bound on Vapnik dimensions (Hadian et al. 2021).

The mappings used by SVM schemes are designed to ensure that dot products may be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function selected to suit the problem. The extended support vector machine model can solve a function assessment problem via using loss function Vapnik epsilon (ε) insensitive and Huber loss function as displayed in Fig. 6 through overall structure for support vector machine (Chen and Pao-Shan 2007; Yaseen et al. 2019).

Fig. 6
figure 6

(a) Typical SVM architecture for groundwater quality modeling, (b) Nonlinear SVM Vapnik ε insensitivity loss functions, when predict values are with in tube loss is 0 and outside loss is scale for variation between predicted value and radius of ε tube

This model could be design via linear, spline, polynomial, and RBF networks (Suykens and Vandewalle 1999). SVM is commonly used in order to develop the classification models of predicting trends and also solve regression and time series challenges (Raghavendra et al. 2014). Numerous models have been built utilizing kernel architectures as shown in Table 2.

2.3.1 Bibliographic Review

In the field of water quality for groundwater prediction with SVM, (Jafari et al. 2019) TDS of the ground water aquifer at the Tabriz plain was assessed with groundwaters physicochemical parameters such as (Na +, Ca2+, HCO3-, Mg2+, SO4−2) using MLP, ANFIS, SVM, and GEP for a period of 10 years from 2002 to 2012. According to the outcomes, GEP, MLP, SVM, and ANFIS methods would be effectively used in predicting TDS alteration and the GEP provided better results compared to the other methods. Alagha et al. (2014) built SVM and ANN models to predict ground-water NO3 concentrations applying scant input datasets, the samples used data of 22 municipal wells in Palestine (2000–2010). The conclusions indicated that the developing AI approaches could be effectively used to evaluate impacts of future management scenario on ground water NO3 concentrations, leading to higher acceptable groundwaters sources management and decision making. Gholami et al. (2011) applied the SVM model to forecast concentrations for Ni and Fe, then it was compared with BPNN, and the results indicated that this model could be considered as a suitable algorithm to predict Fe and Ni concentrations. Actually the SVM technique has delivered the best forecast for toxic metals, resulting faster operating time comparing with the ANN model. Dixon (2009) evaluated the ANN and SVM integrated into GIS to identify contaminated wells using 14 GIS derived land use parameter and soil hydrogeologic parameters as preliminary input. Good water quality dataset NO3-N from 6917 wells in USA were applied as an outputs goal class. Findings showed good performance by NN when compared with support vector machine. Arabgol et al. (2016) used the SVM model prediction NO3 concentrations in groundwaters in Iran. SVM presented as a promising tool to predict NO3 concentrations utilizing the sets of simply measurable GWQ parameters as input (T, ground water depth, TDS, land use, DO, pH, EC and season in year). The mapping for NO3 concentrations in ground-water were arranged for all seasons applying training SVM with GIS interpolation scheme. The results indicated that this method can be utilized as a faster, reliable, and cost-effective technique to measure and predict the GWQ. Isazadeh et al. (2017) used SVM and ANN models with various input designs, and the findings confirmed that each model showed good performance when using input well location. Also, the results showed that SVM and ANN generated better accuracy in estimation of three qualitative parameters. Moreover, the significance of auxiliary input variable was determined using Gamma testing. Purkait et al. (2008) used ASVR, ANN and MLR to prediction As contaminations in ground water. Among those models the ANN model that used the BP technique demonstrated more accurate result in predicting As contamination. Also, it is important to note that the architecture of the ANN model used here with 4-layer feedforward BP can be utilized as a satisfactory prediction model for assessing As contamination in groundwater.

2.3.2 Results of Support Vector Machines

  1. (1)

    The SVM or SVR are effective ML approaches that have been built and employed for various classification and regression challenges throughout the past two decades. Even though a low number of published studies considered the prediction of GWQ by SVM compared with ANN models, it was applied to predict various time series for the myriad of practical applications in this realm.

  2. (2)

    In predicting SVM, properly selecting kernel functions and the values of parameters is very crucial. In many above-mentioned papers such as (Gholami et al. 2011;Dixon 2009; Arabgol et al. 2016) RBF has been used as the best kernel function. While Sigmoid and RBF were used as the kernel function in (Alagha et al. 2014). However, in some papers, the kernel function used was not revealed. Many researchers have chosen the RBF function as the kernel function for SVR over the years due to it is accuracy and reliability in performance.

  3. (3)

    The majority of papers used a trial-and-error method to determine the optimal parameters of the SVM model, with the exception of one analysis (Jafari et al. 2019), which did not mention a tool to determine the optimal parameters of the SVM model.

2.4 Hybrid AI Models for GWQ Modeling

Since it was discovered that artificial intelligence methods have some limitations by non-linear process and nonstationary process, many hybrid prediction techniques that comprise data pre-processing/combined various artificial intelligence models to increase abilities for artificial intelligence models. Combined various AI models of various phases of modeling and employing effective approaches for input data preprocessing made the developed of these models more effective. Owing to the ability of geostatistical tools in spatial assessment, hybrid artificial intelligence geostatistical methods were employed in several studies to utilize their capability in mimicking spatiotemporal for GWQ. GA was initially presented during 1962, then significantly enhanced as time progressed. It works on selecting, crosses over, and mutating operators (Goldberg and Holland 1988) of the binary string used for codes; therefore, it used actual variables vector as chromosome, actual variable as gene and actual number as allele. Vectors were decoded and underwent fitting assessment, that help decide if a vector should undergo mutation or crossover. GP models, which are combined GA and GP, it is depend on Darwinian concept for evolution theory introduced by (Koza et al. 1997). It’s an independent, organized and domain approach that instinctively solves problems and does require the designer to have in-depth information for solutions structures. The functions of GP are an optimizer and classifiers. Such technique included syntax trees with branch as function with argument and leave of the tree as variables and continual. Many algorithms are dependent on behaviors for the organism, such as particle swarm optimization (PSO) particle swarm optimization uses principle of bird flocks, fish teaching and swarming theory. It depends on the simple idea by least coded. Its benefits over GA consist of improved performance because of groups interactions and memory retention (Eberhat and Kennedy 1995). Figure 7 and 8 show the structure of the hybrid model.

Fig. 7
figure 7

Steps of hybrid model procedure (Kisi et al. 2019)

Fig. 8
figure 8

General structure of optimizers by ANN (hybrid-model) where X = inputs, Y = output, bias is justified via the training process, H = number of hidden layer and w = weight (Tiyasha, Tung, and Yaseen 2020)

2.4.1 Bibliographic Review

Some researchers attempted to integrated various types of AI models to overcome weaknesses and then enhance their precision. Jha et al. (2020) used a novel hybrid structure integrated with the FL and GIS based GQI to measure GWQ in India using FGQI models (with seven critical variables) for two seasons. The model inputs parameters done by Trapezoidal membership function based on water quality standards for drinking purposes and expert knowledge were classified (desirable, acceptable, and unacceptable). The outcomes were compared with other models and the FGQI model predicted GWQ better than the traditional methods and was more reliable and realistic for GWQ valuation and analyses at large scales. Maroufpoor et al. 2020 applied hybrid models based on neuro fuzzy systems combined with fuzzy c-mean data cluster (FCM) and grid partition (GP) models and ANN combined with PSO algorithm to forecast a spatial distribution for parameters EC, SAR and Cl of ground water. The findings of these methods were compared with the geostatistical method which include kriging, inverse distance weighted (IDW), and RBF. Outcomes indicated that the hybrid model NF-GP gave lower values of both RMSE, and MAE and a higher value of R was found; it was a most appropriate model for predicting the water quality parameters. Aguilera et al. (2013) showed that applying a hybrid model of BNs with MTEs provided a proper result for predicting GWQ. The outcomes attained allow the difference of three classes of sampling, representing three various groups of GWQ and the possibility that the sampling point belongs to all clusters permits the uncertainties in clusters to be measured, and a risk related in term of water quality management. Jalalkamali (2015) used a variety of hybrid approaches, including ANFIS-GA and ANFIS-PSO to model three GWQ parameters in the Kerman plain involving Cl concentration, EC, and PH, assuming different combinations of monthly variables of rainfall and ground water level, as well as three various quality parameters. The findings showed that both models performed well in the spatiotemporal simulation of GWQ. The paper also showed that groundwater variations around the aquifer, and runoff, play important roles in forecasting GWQ. Kisi et al. (2017) employed the ANN-PSO model and the ANN-DE model to predict parameters of ground water qualitative such as SAR and So4−2. The prediction results revealed that ANN-DE provided better accuracy comparing with ANN-PSO in predicting these groundwater parameters. Cho et al. (2011) investigated the estimation of prediction performance of various models such as MLR, ANN, PCR and grouping of principal components and ANN (PC-ANN) in predicting of As concentrations and provided evaluation tool for Thailand, Laos, and Cambodia. Prediction precision of PC-ANN shows better performance than other models. Alizamir and Sobhanardakani (2018) used the ANN-PSO approach to predict heavy metals such as (Cu, Zn, Pb, and As) contamination in ground water sources in Iran. The results showed that this hybrid model could be successfully used for predicting concentrations of heavy metals in ground water resources at the Toyserkan Plain. In (Kisi et al. 2019) the GWQ variables and quality parameters were modeled through simple ANFIS and ANFIS training via evolution algorithm including PSO, DE, CGA and ACOR, and the results of modeling shown that four proposed algorithms improved ANFIS performance in predicting the (TH and EC), and in algorithms (SAR, PSO, CGA) have the best performance compared with other algorithms. Fijani et al. (2013) coupled SFL, MFL, ANNs, and NF methods to estimate groundwater vulnerability in the Maragheh-Bonab basin of Iran. (Soltani Mohammadi et al. 2017) predicted GWQ parameters such as SAR, EC, and TDS using ANN and ANN- PSO models and compared their outcomes with observed data. The inputs to predict parameter TDS included of EC, SAR, pH, SO4−2, Na +, Ca2+, Mg2+, but for SAR, it included HCO3-, pH, Na+, and TDS, and as for EC, it involved So4−2, pH, Ca2+, SAR, and Mg2+. The results reveal that the highest predicting accuracy for SAR, EC, and TDS was related to model ANN-PSO by using the tangent sigmoid activation function. Bashi-Azghadi et al. (2010) studied a new method to estimate the location and total of leakage from unknown contamination source using data of GWQ. They developed probability prediction models, PSVMs and PNNs for identifying and evaluating core attributes of an unknown ground water contamination source, in term of location, timing and magnitude. In their proposed method the GWQ prediction model derived-out of MODFLOW and MT3D were coupled with a multi objective optimization model NSGA-II. The comparison of accuracy of both models found that the PSVM model was better than PNN. The results revealed in real time monitoring groundwater that the probabilistic mass functions of unknown contamination source location and relative errors in estimation values of leakage depending on the actual concentration of water quality indicators at monitoring wells were better evaluated via training probability in both PSVMs and PNNs models.

2.4.2 Results of Hybrid AI Models

  1. (1)

    Many papers have reported using PSO in resolving practical optimizing problems in the world such as (Liu et al. 2007;Melin et al. 2013;Selakov et al. 2014; Soltani Mohammadi et al. 2017). PSO could be more efficiently applied to forecast a spatial distribution of parameters of groundwaters due to it has small computational volume, simple operation, is not reliant on problems, high convergence rate, worldwide search ability, able to deal with complex problems and could escape local minimum (Maroufpoor et al. 2020;Jalalkamali 2015). However, in (Kisi et al. 2017) DE algorithm showed better results than PSO which is a population based stochastic searching method to solve continuing optimizing problems.

  2. (2)

    GA is a population based stochastic optimizing tool that could successfully deal with nonlinear and nondifferential water quality of groundwater data problems through attaining optimal solutions via the repetition procedure. The operators could rise to other spaces with look space to discover best new way out. It’s a not thorough method till required solution is attained (Jalalkamali 2015). These benefits let them to solving high complex and multiparameter challenges.

  3. (3)

    GP delivers stable and effective result (Maroufpoor et al. 2020); although, limited investigation have considered GP model. It was shown good performance once datasets are preprocessed. The explicit and linear representation of this model relies on ease, effectiveness, and genetic operator. Further investigation is required to best utilization of GP model consider they would be simply used, develop through training and solving the complex challenges in unpredicted manners.

  4. (4)

    Another combined model was integrated such as BNs with MTEs and Fuzzy Logic with the GIS-based GQI showed good outcomes for prediction GWQ. By merge ANN with PCA in (Cho et al. 2011) a new set of variables and the PC scoring are produced from perpendicular linear transformation of previous data, then these scores utilized in prediction as helpful variables.

  5. (5)

    The majority of research papers used one or combinations of multi-dimensional data analyses approach in ANFIS, i.e., GP, PSO, CGA, DE and ACOR cluster. Moreover, CGA and PSO are the superior tools compared with others in handled the classification of GWQ (Kisi et al. 2019).

  6. (6)

    Even though many studies have applied used hybrid models in hydrology, most studies conducted for modeling groundwater level (Fahimi et al. 2017; Tiyasha and Yaseen 2020; Rajaee et al. 2019; Tiyasha and Yaseen 2020; Moosavi et al. 2012)

2.5 Machine Learning Models for Modeling Water Quality of Groundwater

Machine learning (ML) methods for instance, XGBoost, M5P, MARS, GEP and RF are the effective models to deliver the non-linear relation between input-output parameters. XGBoost approach produces several shallow decision tree, combinations for whole trees generates the highest precision regulation through forecasting (Chen 2016). A decision tree was built via XGBoost algorithms and did not just proceeded to minimize the aim function through consideration of loss function but also protect trees from overfit during regularization procedure. Additionally, flexibility in tuning hyperparameter for XGBoost model make such model very common in various exploration disciplines (Bhagat et al. 2020). The M5P used a binary decision tree (DT) and multi-linear regression to forecast continuous mathematical characteristics. The initial stage involves dividing criteria to produce the decision tree. The second stage involves pruning technique to remove overfitting and building of the linear regression functions (Etemad-Shahidi and Bonakdar 2009). MARS is a regression model built in 1990 (Friedman 1991) which is the combination of regression and recursive partitioning. The method is categorized as a forward and backward subclass selecting process. This model involves product degree, dependent parameters, intercept, basic function and knots. The GEP model is based on the same idea as GP and GA, but it have characteristics of model (Danandeh Mehr et al. 2018). In particular, GEP uses fit testing, genetic operators and individual population (Esau et al. 2019). The RF another model using regression method with combinations of trees predictor. RF is random chosen segment for training data set of all trees (Breiman 2001). The prediction process of RF approach is displayed in Fig. 9. More details about applied other machine learning models can be found in supplementary material. The applications of machine learning models for prediction water quality in groundwater is demonstrated as following sub section.

Fig. 9
figure 9

Schematic view of the RF model

2.5.1 Bibliographic Review

Many investigations have been done in the field of water quality for groundwater using machine learning methods. El Bilali et al. (2021) applied Adaboost, ANN, RF, and SVR methods with 520 samples of dataset relate to 14 ground water quality variables in Berrechid aquifer, Morocco. They were applied EC, T, PH as inputs to prediction PS, SA, MAR, R, RSC, ESP, TDS as outputs. Their calculations showed that the general forecast performance of Adaboost and RF methods were highest than SVR and ANN. Moreover, generalization capability and sensitive to input analysis showing that ANN and SVR methods have better ability in generalizing and lower sensitivity to input parameters than RF and Adaboost. Lopez et al. (2021) predicted groundwaters uranium concentration via RF regression, using 23 environmental variables from statewide ground water geochemical data base and in publicly accessible maps of soil and aquifer physico-chemical properties in California. They noticed that ground water of concentrations Ca, NO3, and So4−2, soil pH, and clay contents (weighting average from zero- and two-meters depth) were the most significant predictors of groundwaters uranium concentration. Singha et al. (2021) introduced a DL based model for predicting GWQ and compared with other machine learning models, ANN, XGBoost, and RF. A total of 226 ground water samples in India were used. Outcomes demonstrated that the DL method provided better predicting result by give higher precision compared to other ML models, input variable importance calculated through forecast approaches highlighted that DL technique is the most pragmatic and precise model in predicting the GWQ. Bedi et al. (2020) examined performance of leading ML classifiers in order to forecast the NO3 and pesticide concentrations for ground water wells, used land use, hydrogeologic and water quality data. Each scenario was able to forecast both the NO3 and pesticide concentration with accuracy equivalent to earlier attempts. The XGB generated the best prediction among other methods. Therefore, this model is able to provide a good alternative to other water quality prediction methods. Rodriguez-Galiano et al. (2018) applied a comprehensive GIS data base of 20 variables involving hydrogeological and hydrological characteristics as input for models to predict NO3. ML algorithms for example CART, RF and SVM were applied as wrappers considered 4 various subsequent search methods and most features that gained significance from RF and CART models were utilized as the embedding method. RF provided the better performance with mmce = 0.12 and AUC = 0.92. Rodriguez-Galiano et al. (2014) implemented RF to predict the NO3 contamination. A GIS database including 24 variables associated with essential hydrogeologic proprieties remotely sensed driving forces parameters and physical chemical parameters measured and used as input to design various predictive models of NO3. Also, RF was compared with LR method used various efficacy measures to guarantee their generalization capability. The outcomes showed the capability of Random forest to develop accurate model with robust regression abilities. Vijay and Kamaraj (2019) assessed the water quality parameters such as TDS, EC, PH, So4−2, Cl, NO3, CO3-, HCO3-, metals ions, trace components. There are two main classifications high and low levels of water pollution noted in Vellore district. This work emphasizes on predicting water quality by applying ML classifiers algorithms C5.0, Naïve Bayes and RF as leaner for water quality forecast with good accuracy and efficiency. Rahmati et al. (2019) chose three state of the art machine learning methods including kNN, SVM, and RF were chosen to spatially model ground water NO3 concentration. The methods were calibrated with NO3 concentration from 80 wells which equal70 % of the dataset and later validating with NO3 concentration from 34 wells which equal 30% of dataset. These methods were compared with uncertainties methods and these results highlighted that kNN model was the better model. Singha et al. (2020) measured concentrations for several heavy metals (HMs) and compute six groundwaters contamination indices. This research work also studied the performance of DL based regression model by comparison research. The findings shown that the DL-PMI scoring lower error than DL based ground water HM contamination model. Comparing of both models DL model provided better performance. Barzegar et al. (2017) applied ELM, MLP, and SVM methods to forecast the level of F pollution in the ground-water. The samples were evaluated for EC, pH, main chemical ions, and F. To develop these approaches, dataset of (Na+, K+, Ca2 and HCO3-) concentration was utilized as input and F concentrations as outputs. The result presented that ELM models performed better than MLP and SVM models. Stackelberg et al. (2021) established a BRT model for predicting pH condition in the US. The prediction result revealed that Ca2 contents of soil and aquifer material robustly control pH once combined with long flow-paths was most alkaline condition. A novel aspect of such a model was the presence of mathematically based estimates of groundwater flow properties (age and flow-path length) as predictor parameters. Knierim et al. (2020) employed BRT models to predict SC and Cl, and TDS were computed from the correlation with SC. The descriptive parameters for BRT model comprised fit location and construction, surficial parameters, and parameters extracted from the ground water flow model, including predicted groundwater ages, and the BRT model was able to be capture zones of notarized higher salinity that exceeded the TDS minor extreme pollutant levels for drinking water of 500 mg per litter. Parameters that worked as surrogate for place along groundwaters flow-paths were essential predictors, indicating that much of the controls on dissolving solid is related to rock water collaboration as residence period increases. Khalil et al. (2005) studied various models to predict NO3 concentrations in ground water such as ANN, SVM, LWPR, and RVM. The prediction result showed the capability of machine learning to develop an accurate model with robust predictive ability, therefore constituting a useful means for saving efforts in groundwaters pollution modeling and developing model performance. Esmaeilbeiki et al. (2020) data-intelligent models used data collected from 367 wells in India to chose scenarios to predict EC and TH, which are GMDH, DENFIS, GEP, MARS, and M5 Tree. Outcomes revealed that GEP for input combinations involving TH, Na+, Cl-, HCO3- parameters and DENFIS using Ca, Mg, EC parameters provided a better performance in predicting TH and EC, respectively. Maiti et al. (2013) proposed an ANN based method set in the BNN structure and employed it to predict GWQ based on the geo-chemical and geo-physical datasets gathered from the western part of India. This study showed that the trained model could classify validation and test sets with 85% and 80% accuracy, respectively. Based on cross-correlation evaluation and Gibb’s diagram of geochemical characteristics, a groundwaters attributes of the study area were classifyed to sets: “very good”, “good”, and “unsuitable”. BNN model based result indicate that most values of GWQ fall in the range of “good” to “very good” excepting for a few places near the Arabian Sea. Bui et al. (2020) used a novel data mining algorithm that included the GP in the large GWQ data to predict NO3 pollutant and strontium possible future increase concentrations groundwaters. The prediction result was comparing with RF, M5P, and RT models, it was indicated that GP algorithm outperformed over other algorithms used water quality parameters (EC, T, HCO3−, pH, F−, Cl−, K +, Ca2+, Na +, Mg2+, and SO4−2) as the inputs to the models. Yang et al. (2017) measured the GWQ via MLP-ANN optimizing by BP algorithm and WNN model in the coastal aquifer, South China applying eight parameters, e.g., TDS, TH, COD, Cl, So4−2, NO3, and F. The findings attained from these approaches demonstrated that WNN has the highest accuracy comparing with ANN. Norouzi and Moghaddam (2020) applied the fuzzy groundwater quality index (GQI) to reduce the uncertainty via the fuzzification of the GQI technique. Moreover, RF algorithm, as the learning approach based on the ensemble of decision tree, was applied to assess the GWQ. For drinking purposes, validating and comparing the methods showed that the method fuzzy groundwater quality index has high accuracy as the more reliable technique in GWQ assessment. Additionally, the result demonstrated that the RF model could be used as a reliable model for ground water vulnerability, investigation and appropriately manage or monitor of any aquifer. Nolan et al. (2015) employed different machine learning models, BRT, ANN, and BN to predict NO3 concentrations in California. Prediction result revealed that BRT better than BN and ANN. Whereas BN ranked as second-best predictive model after BRT. Also, comparisons between these models with MLR and RF regression model were done. The results of BRT were compared to RF, MLR had low holdout with R2 = 0.07 and described less than half difference in training datasets. Spatial pattern of prediction via BRT model agreed acceptably well with earlier actual patterns of NO3 occurrence. Singh et al. (2014) built ensembles of classification and regression models and used the ground water hydro-chemistry database at India. SDT, DTF, and DTB methods were compared with the SVM model. The DTF and DTB methods showed more accurate performance than SVM in classification and regression. It should be mentioned that combination of bagging and stochastic gradient boosting algorithms in DTF and DTB methods, respectively resulted in their improved prediction capability. The ensemble models effectively presented the influence for seasonally differences and anthropogenic acts on groundwater hydro-chemistry.

2.5.2 Results of Machine Learning Algorithms

The approaches that fall into the above-mentioned groups are summarized in this section. Machine learning algorithms are common methods after traditional techniques.

  1. 1)

    MARS is a flexible model and it has the capability of managing high dimension problems in regression and classification, making it a superior model to testing for predicting water quality of groundwater. It could independently classify additive contribution (Friedman 1991). Also, the M5 Tree model could produce good generalization, best comprehend the parameters relationships, avoid overfitting, and attain good accuracy. Nonetheless they displayed inferior performance in comparison with DENFIS and GEP (Esmaeilbeiki et al. 2020). GEP encode specific population as linear string for fixed length and later express them as non-linear entities of various shapes and sizes.

  2. 2)

    Also, the accurate performance of kNN model in classification was shown, and it performed better than SVM and RF (Rahmati et al. 2019). On the other hand, kNN required more discovery along with research opportunity for hybrid k-NN models.

  3. 3)

    There are many variants of DT that are remain to be tested concerning water pollution and GWQ such as naïve bayes trees. Model solve problems (recursive dive and conquer) and generalization of complex issues, leads to increase of applying complex data of GWQ prediction. Additionally DT models are easy to understand (Vijay and Kamaraj 2019).

  4. 4)

    BRT model do not require data transformation, are able to be fitted for complex non-linear relationships, and automatically way combines interactions effect between predictors (Elith et al. 2008). The BRT model is, thus, fit suitable for prediction water quality condition complexity, heterogeneous setting for instance the glacial aquifer system (Erickson et al. 2018) and further aquifers setting (Ransom et al. 2017;Knoll et al. 2017). Also, in (Knierim et al. 2020) the BRT model supporting the hypotheses for each surficial or deep sources of salinity. Moreover, BRT proved it has good capability in predicting patterns of NO3 (Nolan et al. 2015).

3 General Research Assessment and Evaluation

Analysis of artificial intelligence models shown that these methods generated better result than statistical techniques for example linear and logistic regression as exhibited in Table 2, ANN methods were applied predominantly as a first methods for GWQ modeling. Additionally, use of new classification methods such as KSOM as a type of ANN model for recognizing polluting and nonpolluting wells must increase. The propagation of using the AI approaches began in the late 1990s, and many AI applications involved using various classical AI methods including ANN, ANFIS, SVM and ML. The latest trends include hybrid models, which are an important development of AI models that integrated different basic algorithms for estimation of water quality in groundwater. These combinations make standalone models more reliable and cost effective through decreasing the cost of computation and increasing the accuracy of produced outcomes. A limited number of papers were found to hybrid models in GWQ modeling in this work.

All former review and study papers have regularly emphasized only input variables, which are is considered as crucial for the prediction procedure, and many input optimization methods were applied in order to guarantee the most likely inputs used are qualified to relationships recognized with output variables. Alhough researchers have studied the input and output relationship for prediction, attention and fewer analyses have been considered in GWQ review papers regarding outputs variables. Because the basic method of these papers is to determine the relationships between the input-output. Also, because of it could be those inputs are utilized for training, optimization and dimensional decrease, and the relationship is determined depending on its sensitivity toward selected outputs; therefore, they are emphasized in this study. Additionally, the outputs are usually chosen based on the issue addressed via the researchers for instance, when the issue is salinity in groundwater, therefore, EC or TDS is carried as output not through using any predefined method. The main output variables, as highlighted as in Fig. 10 represent a number of studies which employed every output variable out of all the considered studies. Consequently, the present review focused on the evaluation of output and which output variable have frequently used to measure water quality in groundwater in most studies. Figure 10 shows that, out of 81 measured papers, most studies reviewed in this study used variable NO3 with 20 out of 81 studies as output to measure water quality of groundwater, which indicates how essential this variable is in predicting GWQ. NO3 was mostly considered in case for GWQ classification and regression where multiple, or single parameters were evaluated versus government standard as well as classifying in simplified classes. From Fig. 10 the other most popular GWQ variables are TDS, EC, and TH which are an important, a part of many water related reactions. For instance, TDS is crucial water quality and groundwater healthy indicator, and a high concentration of it maybe cause people to suffer from some diseases such as kidney and heart illnesses. Water that has a high-level of solids maybe caused laxative or constipation effects (Sasikaran et al. 2016). Also, the increase in ions concentration improves the EC of water. Usually, EC reveals mineralization and salinity in the water and the value of dissolved solids in water determine EC. The impact of SO4−2 reduction results play a major role in mediating redox conditions and biogeochemical processes for ground water system. Along with Sulfate, Cl which are essential inorganic anions, are found in differing concentrations in natural waters and constitute an indicator of contamination. Their presence in groundwater may reveal anthropogenic pollution because they exist in urine and maintenance products. Another output frequently used is WQI which is computed via a complex numerical formula using excessive time and might have incorrect values. This process included using multiple water quality parameters to calculate and compare them with standards values to improve comprehension for the nonscientific community, for example policy makers (Smith 1990). The essential GWQ variable is pH, which is the seventh most used as output variable, hydrogen ion’s ability is the most powerful initiators of each chemical and biological processes, including decomposition, toxicity, solubility, and degradation. Only some variables maybe have not directed and cooperative influence. For example, increasing Na + in irrigation water maybe result in secondary alkalization of the soil. Also, the COD variable, is used in the GWQ indicators which have visible microbial and chemical reactions that occur in presence of oxygen in water. COD signifies the need for care or the scope of rivers contamination. In addition there are many output variables used to measure the GWQ not displayed in Fig. 10 because they were only used once, such as PO4, Ni, source of pollution, E.coli, uranium,.., etc. According to the reported review, it’s obvious that the performing ability of artificial intelligence is reliant on water quality’s multidimensional nature. While constructing the regression/classification models, complexities depend on number of various input variables combined such as less is best while excessive input variables increase complexity of the model. Its important to highlight the ability of AI methods in many applications with various model versions as in Table 2, and it could expand dependent upon government policies, geological site, regional population behavior, climate changing and other exclusive scenarios. Therefore, there is no concern in applications of AI models in easy cases, nevertheless, everchanging of GWQ caused by dependent variables could be causing a significant influence on prediction outcome.

Fig. 10
figure 10

The output variables used for AI in predicting GWQ

Figure 11 demonstrations a number of review studies regarding every country where the study area is situated. Many study areas are situated in Iran and India, with 26 and 20 out of 81 papers respectively. This point may display the attention of Iranian and Indian academics in water quality in groundwater realm. However, it could be because of dryness or partial dryness of areas as Iran, for example the surface water resources are at a low level, and groundwaters is the greatest obtainable water resources, and consequently GWQ database are more accessible compared to data of surface water. The next category has the USA with 10 papers and Turkey with 4 papers and the remaining countries can be seen in Fig. 11. It’s known that the black box AI models are valuable in GWQ modeling, however, they aren’t developed utilizing comprehensions on physical process engaged. In this kind of prediction, information regarding fundamental mechanisms is not required and a major aim is attaining predictions accurately.

Fig. 11
figure 11

The sum of studied AI models in GWQ modeling with respect to the countries

Performance indicators are statistical measures that help for developers to evaluate and calibrate prediction performance on any platform. In addition, performance precision and effective is translated to comprehensible and expressible forms. Twenty evaluation criteria were used for predicting GWQ as represent in Fig. 12. The quantitative errors measure method is mainly used with AI techniques. The root mean squares errors (RMSE) is the most often applied squared errors measuring system with 46 papers, followed by Determination coefficient (R2) (29 papers, correlation coefficient (R) with 19 papers, the mean squared errors (MSE) and mean absolute errors (MAE) with 14 and 15 papers respectively, and Nash Sutcliffe efficiency (NSE) and mean absolute percentage errors (MAPE) with 7 papers. Some performance metrics are not illustrated in Fig. 12 because they not commonly used in many papers such as (%bias, CE, RSR, SI, WI, mmce, AUC, ROC, …, etc.).

Fig. 12
figure 12

Performance indicators that have been used for prediction models’ evaluation and assessment

RMSE is commonly used because it could reveal numerous deviations, its reliable and provides the highest weights to errors. Outliers must remove R2, which is the second most preferred metric, assessing forecasted values generated via such method with respect to observed values, that provide performance of models and range between zero to one. Other set involves the absolute error, which are named: MAE, MSE and MAPE. They’re an absolute variance between observed value and predict value. However, they do not model validity, they only provided information about range of errors and they have good sharpness. R and NSE measure a robust relationship among dataset and comparative movements for two data sets by difference units. Various papers have been reported classifying the GWQ to classes named, “excellent”, “good”, “poor” and “very poor” according to standards of GWQ guidelines of study areas. Among calculated performance indicators, though R2 and R were commonly applied for model’s estimation, they are over sensitive to the highest amounts and insensitively to addition and proportionate variations between models prediction and actual dataset (Moriasi et al. 2007). However, the MAE has been applied as a major measurement because the ultimate values in the forecast dataset are less sensitive than RMSE (Fox 1981). According to studies, NSE is the normalized metric that quantifies the residual variance’s relative magnitude in comparison to the calculated data variance (Nash and Sutcliffe 1970). According to Moriasi et al. (2007) the efficiency guidelines of the hydrological model assessment are very successful for the hydrological model, where NSE is more than 0.75. Therefore, this is another extraordinary measure to be examined for such modeling.

4 Conclusion and Future Trends

This study provides a review of the innovations of water resources engineering on applying artificial intelligence models, novelty of water quality in groundwater modeling with various challenging problems in many developing areas and aims to be attained in the forthcoming research, AI modeling approaches have been successfully applied for water quality in groundwater papers, as reviewed via 81 research studies from 2001 till 2021. The AI methods are integrated tools of biological sciences, statistics, and mathematics. These models could overwhelm the particular difficulties via behavior, thinking, and learning. Once integrated with learning they become machine learning. The AI methods could be helpful once it’s difficult to develop a sufficient information drive simulating model because of a shortage for ability to acceptably build the mathematic or physic models for the basic processes. AI methods have some important steps involving inputs dataset consideration, inputs dataset partition, control of model characteristics, test set, training set and so on. All these stages will be carefully designed, and the model performance is expected to be reasonable. Even so, it must be noticed that there was no generally accepted method for performing these stages; rather some various research studies performed each stage empirically and/or by trial and error, taking available data and existing conditions into account. The analysis of the study’s findings, which were divided into two distinct sections for example the assessment of each AI group and the overall assessment and evaluation, will serve as a valuable resource for researchers looking to conduct comparable work in a related area, create novel approaches, and increase the quality of modeling. To achieve this aim, the next points are recommended:

  1. 1.1.

    AI methods are tools for data and, if proper and competent control stations are developed that can deliver continuous and stable data worldwide, they can achieve the highest precision with consistent, regular and adequate amounts of data.

  2. 1.2.

    . Several fields of research include data collection challenges and some problems are impossible to solve. Therefore, more AI models should be developed that can address the lack of data and the lack of continuous information. With recent technological advances, such as GIS and remote sensing, the problem of lost data and data, especially unreachable stations, can be solved. These tools may also be used to control resources or to model the environment. For better spatial GWQ modeling, high resolution data from satellites is available.

  3. 1.3.

    As to the various GWQ simulation AI models, a specific kind of AI model for a particular problem is practically not to be recommended. It is obvious that a hybrid or coupled model performed better than single AI models. In the various stages of the GWQ modeling, various types of AI techniques can be evaluated to choose a better AI model in each stage and combine it to achieve an optimum modeling efficiency.

  4. 1.4.

    The AI models can be implemented in the evaluation of groundwater quality in three significant application areas. The fields of application have been divided as:

    1. 1.

      Concentration estimation of different groundwater quality parameters: For reliable concentration estimation in groundwater, AI models such as ANFIS, ANN, SVR, and tree-based models can be effective instruments in this field. According to the literature, the efficiency of these models is comparable and close to each other, but newer models such as ANFIS and tree-based models provide more reliable results. However, because ANFIS employs fuzzy logic, it outperforms it in scenario-based problems.

    2. 2.

      Classification of groundwater quality parameters for instance NO3 polluting areas: for this aim, the SOM model was used as a powerful model to classify polluting and unpolluting areas, this model has superior visual capabilities, and the SOM model can be an excellent alternative in this filed.

    3. 3.

      Optimization of the prediction of groundwater quality parameters: the main focus of this field of application is the determination of hydro economic mechanisms that optimized economic expenditures for environmental purposes and various scenarios. The AI models can be optimized using multi-objective particle swarm optimization (MOPSO), Non-dominated Sorting Genetic algorithm (NSGA-II), and multi-objective shuffled complex evolution metropolis (MOSCEM-UA) algorithms, which are the well-known meta-heuristic optimizing methods to multi objective optimization in water resources management therefore, it can be a great option to find the best results in this area.

  5. 1.5.

    Several AI approaches, such as ELM, SVM, MARS models, have been effectively employed for GWQ estimation but as standalone models, they are not ever combined with bioinspired optimization techniques. They may then be stuck in optimal local solutions and cannot estimate GWQ accurately. In order to address this problem in future studies, standalone models must be coupled with other algorithm for example evolutionary algorithms such as Ant colony optimization model, genetic algorithm etc., then performance of this novel or hybrid model must be examined in estimate the GWQ. The achieved outcome from hybrid model comparison with standalone model will enable us to determine hybridization effectiveness compared to standalone models and delivers insights into the strengths of hybrid models.

  6. 1.6.

    After reviewing all the research papers regarding GWQ, it could be conclude that the contaminations mixing processes of groundwater quality are complex and causing a synergetic reactions produces changes in water quality composition. During the experimental period, other variables such as irregular velocity bed configuration, sediment load, and the dead regions should therefore be addressed.

  7. 1.7.

    Because of the low number of studies in GWQ field modeling using hybrid–AI models, its recommended that additional research be conducted on this topic.