Introduction

Water is a basic requirement for plants and all living organisms on the earth’s surface. It is essential for maintaining the balance of ecology, atmosphere, and natural resources. Natural resources underpin all life systems, and safe water must not contain harmful chemicals or bacteria in concentrations that cause impairment (WHO 2017). Global growth and development have led to extensive pollution of rainwater outlets such as rivers (UNEP 2016). Many factors can affect the chemical, physical, and biotic characteristics of surface water, both natural (e.g. rainfall, watershed geography, weather, geology) and anthropogenic (e.g. industrial and domestic activities, agricultural run-off) (Mishra et al. 2017; Ewaid et al. 2018; Su et al. 2018). Irrigation water quality is determined by the type and concentration of dissolved salts and solids (Mirabbasi et al. 2008; Ramakrishnaiah et al. 2009; Shayannejad et al. 2020; Golian et al. 2020). Groundwater quality monitoring systems include plans and activities for assessing the quality of water resources and the performance of pollution reduction measures (Tavakol et al. 2017; Shayannejad et al. 2021). Agriculture is the most critical consumer of fresh water in semi-arid regions, as irrigation accounts for more than two-thirds of the world’s accessible freshwater resources (Aliyu et al. 2017). Climate change factors and manufacturing activities directly affect water quality parameters, with these activities having the greatest effect. Regular monitoring of drinking and irrigation water quality parameters is essential for sustainable development of groundwater and agriculture. Several approaches have been used to estimate the effect of irrigation water on plants and soils, and various researchers have used irrigation indices such as SAR, Na%, KI, and PI (Cieszynska et al. 2012; Fakhre 2014). The agricultural sector is also the major user of water, accounting for 80% of overall consumption, and is a source of water pollution. Sustainable agriculture requires strategic water planning with a reasonable cost for irrigation. The irrigation water quality index (IWQI) has been created using the parameters defined by FAO guidelines 29 (Ayers and Westcot 1999).

Machine learning (ML) models are significant and convenient complements to, and replacements for, conventional approaches to water index and water quality forecasting (Chang et al. 2017). In general, ML models are concerned with the mapping between a system’s inputs and outputs rather than with complex process mechanisms. Highly nonlinear relations can be computed reliably, with or without prior information about the studied system, by learning from a large amount of historical data that contains the dynamic evolution mechanism. In this regard, various machine learning techniques, such as the artificial neural network (ANN), have been successfully developed for algal prediction (Recknagel et al. 2002; Chang et al. 2017; Tian et al. 2017).

Statistical methods are mainly used to construct the water quality index (WQI). In Egypt, multivariate analysis was used to establish the IWQI for surface water (Jahin et al. 2020). The findings show that water quality can be assessed quickly and cheaply using principal component analysis (PCA) and factor analysis (FA). PCA and hierarchical cluster analysis (CA) of the Al-Gharaf River in Iraq were used to examine potential pollution sources. Decision-makers may use these findings to reduce the number of samples analyzed and prioritize measures to enhance the river’s efficiency (Ewaid and Abed 2017). While all these models were developed and are considered effective instruments for evaluating irrigation water quality (IWQ), their application requires many parameters, and the cost, time, and applicability of each study must be considered. Therefore, a prediction-based model can be used by farmers to optimize water quality assessments (Ewaid et al. 2019).

Long short-term memory (LSTM) is a widely used deep learning model. The LSTM is a recurrent neural network (RNN) that retains information from long sequences in its hidden memory for processing, representation, and storage, and updates this memory over time so that the stored information remains relevant. Kratzert et al. (2018) used LSTM to model rainfall-runoff in 241 watersheds. Deep learning (DL) approaches have also been applied specifically to forecast water quality factors. For example, Liu et al. (2019) predicted drinking-water quality in Yangzhou, China, using an LSTM deep neural network, and Zhang et al. (2018) used several neural network models, namely LSTM and a gated recurrent unit (GRU), to estimate the water level of an integrated sewage outflow architecture.

In addition, some models are currently being developed to predict the availability of groundwater for agricultural irrigation in dry areas. For example, Ewaid et al. (2018) used multiple linear regression (MLR) to build a model for quick prediction of irrigation water and water quality indices for agricultural, industrial, and drinking purposes on the Tigris River in Iraq. Machine learning models have been developed for five groundwater quality parameters. Regression analysis is a statistical technique that relates a dependent variable to one or more independent variables. Dependent and independent variables are also referred to as response and predictor variables, respectively. A regression model should adequately describe the dependent variable, predict it, and control it based on the independent variables. Several authors have investigated and applied regression analyses (Charulatha et al. 2017; Noori et al. 2015, 2017). The major aims of this paper are (1) to explore the capability of ML models, namely the ANN, LSTM, and MLR models, for predicting irrigation water quality parameters (sodium adsorption ratio (SAR), percentage of sodium (%Na), residual sodium carbonate (RSC), magnesium hazard (MH), Permeability Index (PI), Kelly ratio (KR)) in two scenarios; (2) to compare the performance of the LSTM models with ANN and MLR for short-range prediction of irrigation water quality factors; and (3) to develop two scenarios (for ANN, LSTM, and MLR) that show which model performs best in each scenario. To the best of the authors’ knowledge, this is the first study in which two scenarios have been used to predict irrigation water quality parameters. The outcome of this research is intended to help farmers, cropping systems, and groundwater development in semi-arid nations by predicting irrigation water quality quickly and at low cost.

Study area

The Akot basin is situated in the Akot Taluka of Akola district, Maharashtra, between 20°54′30″ and 21°14′35″ N latitude and between 76°48′ and 77°03′ E longitude, and covers 450 sq. km (Fig. 1). The minimum and maximum temperatures in the study area are 12.6°C and 42.4°C. The observed annual rainfall is 740 to 860 mm. Deep black soil is found in the southern part of the Akot basin. This soil has a deep, heavy colour with an angular block structure in the sub-surface horizon, is medium drained, and has low to moderate water support. The basin lies in the saline water zone, because most of the groundwater found within the basin area is highly saline. As a result, most farmers face many groundwater quality issues. Farmers who irrigate with poor-quality groundwater have experienced a direct decrease in crop production and soil fertility in the basin. The Purna River Alluvium, which covers Akot and Telhara talukas as well as the northern sections of Akola and Balapur talukas, is affected by inland salinity as well as drought and falling water levels. Exploratory drilling operations in hard rock sections of the Akola district encountered a wide range of problems, the most common of which were caving formations (red bole) and drilling medium loss (Khadri et al. 2013; Khadri and Pande 2015a, 2015b; Pande et al. 2019a).

Fig. 1 Location map of study area

Methodology

Machine learning models are currently used to estimate most groundwater quality variables precisely and have shown their effectiveness (Rahgoshay et al. 2018; Ho et al. 2019). The dataset for this prediction model consists of 140 groundwater samples collected from observation wells within the basin area. We used 70% of the data for training (98 samples) and 15% for ANN model validation. This research uses 15% and 30% of the data for the prediction (testing) step of the ANN, MLR, and LSTM models. Three machine learning models, ANN, LSTM, and MLR, were developed to predict irrigation water quality parameters and were applied in both prediction scenarios. The first scenario is based on all input variables, and the second on a reduced set of variables. The first-scenario models predict the irrigation water quality parameters of the Akot basin (saline tract) with very high accuracy compared with the second scenario. The ANN, LSTM, and MLR models were applied to the actual dataset in both scenarios. At this stage, the compiled dataset was divided into two parts: training and testing. The developed ANN, LSTM, and MLR models were validated against all the results and errors. After the models were built, the performance of each was calculated by comparing actual and predicted water quality data. The accuracy of each developed model was checked using the mean squared error (MSE), correlation coefficient (r), root mean square error (RMSE), and mean absolute error (MAE) (Fiyadh et al. 2019).
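The 70/15/15 partition of the 140 samples described above can be sketched as follows. This is only an illustration: the text does not say how samples were assigned to each subset, so the random shuffle, the fixed seed, and the function name `split_samples` are assumptions.

```python
import random

def split_samples(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test.

    Assumption: a seeded random shuffle; the source does not describe
    the assignment procedure.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains is the test set
    return train, val, test

train_idx, val_idx, test_idx = split_samples(140)
print(len(train_idx), len(val_idx), len(test_idx))
```

With 140 samples this gives 98 training samples, matching the count stated in the text, and splits the remainder evenly between validation and testing.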

Predictive modelling should be understood as modelling previously collected data; the ever-changing dataset is not corrected. At the same time, the data can be used for function fitting (finding the internal relationships of the components). Neural networks were created for the two ANN water quality prediction models. The developed models are defined by the number of hidden layers, the number of nodes in each layer, and the type of transfer function. The irrigation water quality variables KI, MH, PI, RSC, SAR, and SSP are widely used to classify irrigation water quality. PCA and correlation analysis were carried out using the SPSS software package to understand the status of irrigation water quality. Water quality data were collected from observation wells in the study area and divided into two scenarios according to the fractions of training and testing data used in model development.

Artificial neural network

In recent years, scientists, researchers, and decision-makers in the field of water quality management have increasingly applied neural network modelling to identify sources of pollution, to obtain a clear view of groundwater quality for specific purposes, and for watershed management (Ostad-Ali-Askari et al. 2017; Yıldız and Karakuş 2019; Vasanthi and Kumar 2019; El Baba et al. 2020; Ostad-Ali-Askari and Shayan 2021). The artificial neural network was first developed by McCulloch and Pitts (1943). In water quality prediction, the physical and chemical characteristics of groundwater are the model inputs, and the model predicts quality for future years. In previous studies, mathematical modelling was used to predict KR, MH, PI, RSC, SAR, and SSP for irrigation purposes in semi-arid regions. Among the various methods, ANN offers a more accurate and efficient way of predicting and analyzing large datasets. The basic form of an ANN consists of three layers: input, hidden, and output. The dataset is read in the input layer, which is allocated the appropriate number of receptors based on the independent variables (Fig. 2).

Fig. 2 Prediction methodology flowchart of irrigation water quality

In the hidden layer, the output is calculated by multiplying each input value by its corresponding weight (Othman et al. 2020). The output layer takes these values, multiplies them by the corresponding weights, and gives the calculated value of each variable in the ANN model (Fig. 5). Training is the primary and most significant step for obtaining accurate results from the ANN model (Kim et al. 2020). The feed-forward backpropagation algorithm was used for training. The present study assigns KR, MH, PI, RSC, SAR, and SSP as input variables. Based on these parameters, the irrigation water quality parameters for the study area were predicted. The quality of groundwater for irrigation purposes is the single output parameter. The data refer to the Akot basin and are used to estimate the feasibility of the prediction model and to assess the ability of the proposed model under various climatic conditions. Figure 2 displays the prediction methodology flowchart of the stages followed.
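The weighted-sum computation in the hidden and output layers can be illustrated with a minimal forward pass through a network with one hidden layer. This sketch does not reproduce the trained network of this study: the tanh activation, the linear output, the bias terms, and all numeric weights are assumptions chosen only for illustration, and the backpropagation training step is omitted.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through a single-hidden-layer feed-forward network.

    x        : list of input values
    w_hidden : one weight list per hidden neuron
    b_hidden : one bias per hidden neuron
    w_out    : one weight per hidden neuron for the single output
    b_out    : output bias
    Assumptions: tanh hidden activation and a linear output neuron.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical example: 3 inputs, 2 hidden neurons, 1 output.
x = [0.2, 0.5, 0.1]
w_hidden = [[0.4, -0.3, 0.8], [0.1, 0.6, -0.5]]
b_hidden = [0.0, 0.1]
w_out = [0.7, -0.2]
b_out = 0.05
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```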

Multi-linear regression

Multi-linear regression (MLR) analysis is one of the most basic mathematical models. It assumes a linear relationship between the inputs and the output; in other words, it derives a linear relation between the variables by including a regression constant in the formula. The MLR model is given by Eq. 1:

$$y={b}_0+{b}_1{x}_1+{b}_2{x}_2+\dots +{b}_i{x}_i$$
(1)

where:

y: the dependent (response) variable

b0: the regression constant, and b1 to bi: the regression coefficients

xi: the ith predictor
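As an illustration of Eq. 1, the coefficients b0 to bi can be estimated by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination using only the standard library. It assumes a well-conditioned problem with more samples than predictors; the function names `fit_mlr` and `predict_mlr` and the toy data are hypothetical, and the study itself may have used a different solver.

```python
def fit_mlr(X, y):
    """Least-squares estimate of [b0, b1, ..., bi] for y = b0 + sum(b_k * x_k)."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for b0
    p = len(rows[0])
    # Augmented normal equations [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(p):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                aug[k] = [a - factor * c for a, c in zip(aug[k], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def predict_mlr(coeffs, x):
    """Evaluate Eq. 1 for one sample x using fitted coefficients."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))

# Toy data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coeffs = fit_mlr(X, y)
print(coeffs, predict_mlr(coeffs, [2, 2]))
```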

LSTM model

The LSTM model is a sophisticated recurrent neural network specifically developed to avoid the exploding/vanishing gradient problems that are common when learning long-term dependencies, even when the time lags are very long. The LSTM method was introduced to overcome this problem (Ouma et al. 2012). The LSTM is better suited to simulating sequence-based problems with long-term dependencies (Chang et al. 2015). In this study, the LSTM model is compared with the ANN and MLR models. The memory blocks of the LSTM-RNN model include the input, forget, and output gates, which are used to reset the hidden units (Fig. 3). The gates are responsible for directing the network’s internal operations. Although other LSTM variants exist, a comparison study indicates that the conventional LSTM remains the most significant (Gref et al. 2017).

Fig. 3 LSTM model layers
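The role of the input, forget, and output gates can be illustrated with a single time step of a conventional LSTM cell. To keep the sketch short it uses a scalar input and a scalar hidden state; the parameter names (`w_*`, `u_*`, `b_*`) and their values are hypothetical and are not taken from the trained models of this study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (input size 1, hidden size 1)."""
    def pre_activation(name):
        # Weighted input + weighted previous hidden state + bias.
        return (params["w_" + name] * x
                + params["u_" + name] * h_prev
                + params["b_" + name])

    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    f = sigmoid(pre_activation("f"))    # forget gate: how much old memory to keep
    o = sigmoid(pre_activation("o"))    # output gate: how much memory to expose
    g = math.tanh(pre_activation("g"))  # candidate cell content
    c = f * c_prev + i * g              # updated cell (memory) state
    h = o * math.tanh(c)                # updated hidden state
    return h, c

# Hypothetical parameters, for illustration only.
params = {
    "w_i": 0.5, "u_i": 0.1, "b_i": 0.0,
    "w_f": 0.3, "u_f": 0.2, "b_f": 1.0,
    "w_o": 0.4, "u_o": 0.1, "b_o": 0.0,
    "w_g": 0.6, "u_g": 0.3, "b_g": 0.0,
}
h, c = 0.0, 0.0
for x in [0.1, 0.4, 0.9]:  # a short input sequence
    h, c = lstm_step(x, h, c, params)
print(h, c)
```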

Model’s performance criteria

To compare and evaluate the machine learning models, the following statistical performance criteria were used (Eqs. 2 to 4):

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^n{\left({Z}_i-{\hat{Z}}_i\right)}^2}$$
(2)
$$R=\sqrt{1-\frac{\sum_{i=1}^n{\left({Z}_i-{\hat{Z}}_i\right)}^2}{\sum_{i=1}^n{\left({Z}_i\right)}^2}}$$
(3)
$$\mathrm{MSE}=\frac{\sum_{i=1}^n{\left({Z}_i-{\hat{Z}}_i\right)}^2}{n}$$
(4)

where Z_i and Ẑ_i are the measured and estimated values, and n is the number of values used in the model. The models can be used for regression and classification, are trained by their respective methods, and are validated during training on unseen data. A comprehensive analysis of artificial neural network architectures is beyond the scope of this study. The performance of the models was assessed with standard statistical measures such as the coefficient of correlation (r) and the mean square error (MSE).
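The criteria in Eqs. 2 to 4 translate directly into code. The sketch below follows the equations as written, including the form of R in Eq. 3, whose denominator is the sum of squared measured values. The function names and the sample values are illustrative, and the clamp to zero inside `r_stat` is an added safeguard that is not part of Eq. 3.

```python
import math

def mse(observed, predicted):
    """Mean squared error (Eq. 4)."""
    return sum((z - zh) ** 2 for z, zh in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error (Eq. 2)."""
    return math.sqrt(mse(observed, predicted))

def r_stat(observed, predicted):
    """R as written in Eq. 3: sqrt(1 - SSE / sum of squared observations)."""
    sse = sum((z - zh) ** 2 for z, zh in zip(observed, predicted))
    ss_obs = sum(z ** 2 for z in observed)
    # Clamp at zero so a very poor fit does not raise a math domain error
    # (added safeguard, not part of the equation).
    return math.sqrt(max(0.0, 1.0 - sse / ss_obs))

# Illustrative values only, not data from the study.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(observed, predicted), rmse(observed, predicted), r_stat(observed, predicted))
```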

Agriculture water quality parameters

In this study, six parameters, KI, MH, PI, RSC, SAR, and SSP, were selected and calculated to determine the suitability of groundwater for agricultural use in the study area. The irrigation water quality was predicted using ANN modelling.

Kelly ratio

The Kelly ratio (KR, also written KI) is an important index for evaluating the suitability of groundwater for irrigation. Kelly (1940) developed the equation to assess the effect of calcium, magnesium, and sodium on groundwater quality for irrigation purposes. Groundwater with a KR value of 0 to 1 is classified as suitable, and groundwater with a value greater than one as unsuitable, for irrigation. The following formula was used to estimate the KR of groundwater.

$$KI=\frac{\mathrm{Na}}{Ca+ Mg}$$
(5)

Magnesium hazards

Magnesium hazard (MH) is another primary index for assessing groundwater quality for irrigation. The concentration of magnesium in groundwater plays a vital role in crop yield and growth. In general, calcium and magnesium maintain a state of equilibrium in groundwater. An excessive concentration of Mg2+ deteriorates soil structure, increases soil alkalinity, and reduces plant growth. Groundwater with an MH value less than 50 is classified as suitable, and groundwater with a value greater than 50 as unsuitable, for irrigation purposes. The following formula has been used to calculate the MH value of groundwater.

$$M\mathrm{H}=\frac{\mathrm{Mg}}{Ca+ Mg}\ast 100$$
(6)

Permeability Index

Soil permeability plays a vital role in crop yield and in water circulation in the field. Long-term exposure to groundwater with excessive concentrations of sodium, calcium, magnesium, and bicarbonate affects soil permeability. Based on PI, groundwater is classified as class I (greater than 75%), class II (25–75%), or class III (less than 25%). Doneen (1964a, 1964b) developed the formula for PI to estimate water movement in the soil layer (Ghazaryan and Chen 2016).

$$\mathrm{PI}=\frac{\mathrm{Na}+\sqrt{{\mathrm{HCO}}_3}}{Ca+ Mg+ Na}\ast 100$$
(7)

Residual sodium carbonate

Residual sodium carbonate (RSC) is a significant index to assess the groundwater quality for irrigation uses. The concentration of bicarbonate and carbonate highly influenced the groundwater chemistry and its quality for irrigation use. The quality of groundwater diminishes when the concentration of carbonate and bicarbonate exceeds the total concentration of calcium and magnesium. Eaton (1950) developed the formula to estimate the RSC value of groundwater (Eq. 8).

$$RSC=\left({HCO}_3^{-}+{CO}_3^{2-}\right)-\left({Ca}^{2+}+{Mg}^{2+}\right)$$
(8)

Sodium adsorption ratio

Sodium adsorption ratio (SAR) is a significant measure for evaluating the suitability of groundwater for irrigation purposes. The SAR value reflects the concentration of sodium relative to calcium and magnesium in groundwater. Excess sodium in groundwater affects soil quality and disturbs the groundwater equilibrium. SAR is the ratio of the sodium concentration to the square root of half the sum of the calcium and magnesium concentrations (Eq. 9).

$$SAR=\frac{\mathrm{Na}}{\sqrt{\left( Ca+ Mg\right)/2}}$$
(9)

Soluble sodium percentage (SSP)

The concentration of calcium, magnesium, and sodium plays a vital role in groundwater quality for irrigation uses. The soluble sodium percent (SSP) classification of groundwater less than 50% is suitable, and greater than 50% is unsuitable for irrigation uses.

$$SSP=\frac{\mathrm{Na}\ast 100}{Ca+ Mg+ Na}$$
(10)
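The six indices in Eqs. 5 to 10 can be computed together from the major-ion concentrations of a sample. The sketch below assumes all concentrations are expressed in meq/L, which is the usual convention for these indices but is not stated explicitly in the text; the function name and the sample values are hypothetical.

```python
import math

def irrigation_indices(ca, mg, na, hco3, co3):
    """Return KR, MH, PI, RSC, SAR and SSP for one water sample.

    Assumption: all ion concentrations are given in meq/L.
    """
    return {
        "KR": na / (ca + mg),                                    # Eq. 5
        "MH": mg / (ca + mg) * 100.0,                            # Eq. 6
        "PI": (na + math.sqrt(hco3)) / (ca + mg + na) * 100.0,   # Eq. 7
        "RSC": (hco3 + co3) - (ca + mg),                         # Eq. 8
        "SAR": na / math.sqrt((ca + mg) / 2.0),                  # Eq. 9
        "SSP": na * 100.0 / (ca + mg + na),                      # Eq. 10
    }

# Illustrative sample, not a measurement from the Akot basin.
sample = irrigation_indices(ca=4.0, mg=4.0, na=2.0, hco3=4.0, co3=1.0)
for name, value in sample.items():
    print(name, round(value, 3))
```

Using the thresholds given in the text, a sample such as this one would be rated by comparing KR with 1, MH and SSP with 50, and PI with the 25% and 75% class limits.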

Principal component analysis

PCA extracts correlation relationships and reduces the data to a set of components, each of which describes a percentage of the total variance among the chemical parameters. Varimax rotation was adopted to identify the highest loading factors, which are primarily related to the chemical composition of the groundwater. The factors with high loadings help identify the processes involved in the deterioration of groundwater quality in the study region. PCA visualizes the variables in two- or three-dimensional space to identify homogeneous groups of observations or, conversely, unusual groups of observations (Tables 1 and 2). In addition, the number of variables is reduced with minimal loss of information (Praus 2019; Nguyen et al. 2020). Here, PCA is used to determine the correlations between the chemical components and the irrigation water quality parameters, so that the components strongly correlated with the irrigation water quality parameters can be selected as inputs when constructing the models (training step). The PCA component and scree plots are presented in Fig. 4. The PCA was carried out in SPSS 25.0. The Pearson correlation coefficient matrices are given in Tables 3 and 4.

Table 1 Principal component analysis of groundwater during pre-monsoon
Table 2 Principal component analysis of groundwater during post-monsoon
Fig. 4 PCA component and scree plots of pre- and post-monsoon in semi-arid region

Table 3 Correlation matrix between input and output variables of pre-monsoon
Table 4 Correlation matrix between input and output variables of post-monsoon
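The idea that a principal component "describes a percentage of the total variance" can be illustrated with a small standard-library sketch. It standardizes the variables, builds their correlation matrix, and extracts only the first principal component by power iteration. This is not the SPSS procedure used in the study: varimax rotation and the remaining components are omitted, the function names and data are hypothetical, and the code assumes no variable is constant.

```python
import math

def correlation_matrix(data):
    """Correlation matrix of the columns of `data` (list of rows)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
        for j in range(p)
    ]
    z = [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in data]
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(p)] for a in range(p)]

def first_component(corr, iterations=500):
    """Dominant eigenvalue, its eigenvector, and the explained-variance share."""
    p = len(corr)
    # Non-uniform start vector so it is unlikely to be orthogonal
    # to the dominant eigenvector.
    v = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    # The trace of a correlation matrix is p, so the share is eigenvalue / p.
    return eigenvalue, v, eigenvalue / p

# Illustrative data: three variables, five samples.
data = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.7],
    [3.0, 6.2, 0.4],
    [4.0, 8.1, 0.9],
    [5.0, 9.8, 0.6],
]
eigenvalue, loadings, share = first_component(correlation_matrix(data))
print(eigenvalue, loadings, share)
```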

Correlation analysis

Correlation analysis is a statistical technique for measuring the strength of the linear relationship between two variables. The variables are not selected on the basis of their dependence or independence. In most research, correlation analysis is used to examine the linear relationship between two variables. The correlation matrix was developed by computing the correlation coefficient for each pair of parameters (Eq. 11). The significance of each correlation was tested by evaluating p values: a correlation is significant at the 0.05 or 0.01 level (p < 0.05 and p < 0.01) (Tables 3 and 4) and is not significant when p > 0.05. The significance level is measured between 0.01 and 0.05 (Malik and Hashmi 2017; Sar et al. 2017; Tiwary et al. 2018). Pearson correlation analysis was conducted between all variables (input/output) to investigate their relationships, and the results are given in Tables 3 and 4.

$$r=\frac{\sum_{i=1}^n{x}_i{y}_i-\frac{\sum_{i=1}^n{x}_i\sum_{i=1}^n{y}_i}{n}}{\sqrt{\sum_{i=1}^n{x}_i^2-\frac{{\left(\sum_{i=1}^n{x}_i\right)}^2}{n}}\sqrt{\sum_{i=1}^n{y}_i^2-\frac{{\left(\sum_{i=1}^n{y}_i\right)}^2}{n}}}$$
(11)
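Eq. 11 is the computational form of the Pearson correlation coefficient and can be evaluated directly from sums over the paired observations. The sketch below is a minimal illustration with a hypothetical function name and sample values; it does not compute the p values discussed above, and it assumes neither variable is constant.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, computed as in Eq. 11."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = (math.sqrt(sum_x2 - sum_x ** 2 / n)
                   * math.sqrt(sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

# Illustrative values only.
print(pearson_r([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.1]))
```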

Results

Evaluation of results

In recent years, machine learning models have been widely used in various fields, where they help predict future scenarios of conservation and natural processes. In this paper, we have studied LSTM, MLR, and artificial neural network models in two scenarios for assessing groundwater suitability for irrigation purposes, using easily measured input variables such as Mg2+, Ca2+, HCO3-, CO3, Na+, and K+. The LSTM, ANN, and MLR models were run based on the number of neurons in the hidden layers. The findings of this study show that machine learning models are very effective techniques for predicting water quality values (Elbeltagi et al. 2021). This paper describes the training and testing of the LSTM, MLR, and ANN models, as well as their validation and simplification outcomes for the predicted values. The performance of the ANN, LSTM, and MLR models is compared in Tables 5 and 6. The ANN and MLR models are the most accurate for predicting water quality values in the first and second scenarios, respectively. A detailed description is provided in the sub-sections below.

Table 5 Comparison between different models’ performance for scenario 1
Table 6 Comparison between different models’ performance for scenario 2

Comparison of training and testing datasets for scenario 1

In scenario 1, the datasets of all irrigation water quality variables were used to train and test the models. The training and testing results obtained by LSTM, MLR, and ANN are presented in Table 5. As shown in Table 5, the ANN models gave the highest R2 for RSC, with R2=1 in training and R2=0.99 in testing. For the ANN models, the R2 values of RSC, MH, SAR, PI, SSP, and KI are above 0.99, and the corresponding RMSE values are 0.00064, 0.002735, 0.023108, 0.006181, 0.31305, 0.006116 in training and 0.059414, 0.008781, 0.02963, 0.011446, 0.363731, 0.00634 in testing. The ANN model therefore performed better than the other models in scenario 1. Similarly, the MLR training and testing models developed for RSC, MH, SAR, PI, SSP, and KI have shown correlation coefficient, RMSE, and MSE values of (R2=1, 0.973859, 0.986722, 0.970681 and R2=0.974383 0.94756; 0.00064), (RMSE=2.32E−15, 0.016405, 0.049997, 0.01737, 1.158658, 0.0261 and RMSE=2.64E−15, 0.01934, 0.039245, 0.020062, 1.170854, 0.020295) and (MSE=5.39E−30, 0.000269, 0.0025, 0.000302, 1.342487, 0.000681 and MSE=6.98E−30, 0.000374, 0.00154, 0.000402, 1.370898, 0.000412), respectively.

Furthermore, the LSTM training and testing models of RSC and SSP have shown the lowest correlation coefficient and RMSE values, 0.92 and 0.066. Similarly, the training and testing results of the LSTM models of RSC, MH, SAR, PI, SSP, and KI have shown RMSE and MSE values of (RMSE=1.438569, 0.00164, 0.002883, 0.001555, 2.246052, 0.0037 and RMSE=1.457929, 0.043908, 0.251246, 0.018754, 13.82245, 0.049018) and (MSE=2.069482, 2.69E−06, 8.31E−06, 2.42E−06, 5.04475, 1.37E−05 and MSE=2.125558, 0.001928, 0.063124, 0.000352, 191.060, 0.002403), respectively (Table 5). As a result, it can be concluded that ANN predicted the irrigation water quality parameters most accurately (Figs. 5 to 7).

Fig. 5 Artificial neural network architecture

Fig. 6 Graphical plots of the observed values and ANN model predicted values using the training, validation, test, and all dataset

Fig. 7 Graphical plots of the observed values and ANN model predicted values using the training, validation, test, and all dataset

Comparison of training and testing datasets for scenario 2

The training and testing results obtained by LSTM, MLR, and ANN are presented in Table 6. In training and testing, the ANN and LSTM models have shown the maximum R2 for RSC, with R2=0.89, 0.872 and R2=0.99, 0.88, respectively. Furthermore, for the other irrigation water quality parameters (MH, SAR, PI, SSP, and KI), the training and testing models of ANN and LSTM have shown values of (R2=0.654481, 0.982081, 0.844561, 0.826281, 0.889249 and R2=0.235225, 0.906304, 0.783225, 0.7396, 0.777924), (RMSE=1.068082, 0.057446, 0.059161, 0.03873, 3.096224, 0.037417 and RMSE=1.196797, 0.069244, 0.092815, 0.04616, 3.410414, 0.046528) and (MSE=1.1408, 0.0033, 0.0035, 0.0015, 9.5866, 0.0014 and MSE=1.2612, 0.0074, 0.0115, 0.0026, 12.1634, 0.0038), and (R2=0.999981, 0.999968, 0.999896, 0.911968, 0.999801 and R2=0.228355, 0.882314, 0.621794, 0.141853, 0.637924), (RMSE=0.042632, 0.000356, 0.002488, 0.001843, 2.198714, 0.00165 and RMSE=1.876118, 0.144303, 0.136041, 0.070351, 8.174895, 0.066868) and (MSE=0.001818, 1.26E−07, 6.19E−06, 3.4E−06, 4.834343, 2.72E−06 and MSE=3.519817, 0.020823, 0.018507, 0.004949, 66.82891, 0.004471), respectively (Figs. 6 to 8). In training and testing, the MLR models have shown the maximum R2 for SAR, R2=0.95. Furthermore, the MLR training and testing models performed well compared with the ANN and LSTM models in scenario 2. The MLR training and testing models developed for the water quality parameters RSC, MH, SAR, PI, SSP, and KI have shown correlation coefficient, RMSE, and MSE values of (R2=0.867167, 0.514359, 0.954267, 0.784515, 0.778099, 0.836761 and R2=0.866449, 0.465225, 0.956796, 0.791098, 0.755534, 0.848287), (RMSE=1.196797, 0.069244, 0.092815, 0.04616, 3.410414, 0.046528 and RMSE=1.177207, 0.06877, 0.075667, 0.045977, 3.399641, 0.03783), and (MSE=1.432323, 0.004795, 0.008615, 0.002131, 11.63092, 0.002165 and MSE=1.385817, 0.004729, 0.005726, 0.002114, 11.55756, 0.001431), respectively. As a result, it can be concluded that, of all the machine learning models trained and tested in scenario 2, MLR gives better predictions of the irrigation water quality parameters than the other models.

Fig. 8 Graphical plots of the observed values and ANN model predicted values using the training, validation, test, and all dataset

Discussion

Artificial neural networks (ANN) are a robust method for modelling complex non-linear relations, particularly when the relationship between variables is unclear (Smith 1994). Each layer consists of one or more basic elements termed neurons or nodes. Each neuron represents an algebraic function that is assigned a parameter with limit values (Dryfus et al. 2002). In this investigation, ANN models were used to predict the irrigation water quality parameters RSC, MH, SAR, PI, Na%, and KI (Figs. 12 and 13). The number of hidden-layer neurons and the number of training iterations are essential settings in ANN modelling. No specific algorithm is available to determine the adequate number of neurons in the hidden layer, so these values were obtained by trial and error (Alizadeh and Kavianpour 2015). The complexity of the problem determines the number of hidden layers, and in most cases a single hidden layer suffices to model a problem (Rezvan et al. 2016).

The high prediction accuracy of artificial intelligence methods for irrigation water quality has been confirmed (Castrillo and García 2020; Ahmed et al. 2019; Liu et al. 2019; Lu and Ma 2020). Accurate prediction depends strongly on the number and influence of the input variables, but all data must be available and cost-effective to obtain. Few studies currently use parameters that can be measured in situ and in real time as input variables (Castrillo and García 2020). In contrast, numerous variables have significant effects on groundwater quality, such as the hydrologic regime, land use, geomorphologic and geologic conditions, and anthropogenic activities, and these need to be addressed before the models are widely used in areas different from those in which they were developed (Pande and Moharir 2018). These conditions play an essential role in choosing the combination of input parameters used for prediction. A hybrid deep learning model, long short-term memory (LSTM), was used to predict the irrigation water quality, namely total nitrogen, phosphorous, and organic carbon (Liu et al. 2019).

The size of the training dataset has a significant impact on LSTM network training. It is widely assumed that network training necessitates a large amount of training sample data. However, the dataset size is determined by the catchment characteristics and flows of concern, which determine the complexity of the input-output relationships represented by the LSTM (Kratzert et al. 2019). The MLR models of ANN 4-6-6-1 and 1.028, and 1.106 were 0.836 and 0.882 during the training and testing period. Their findings showed that ANN’s evaporation estimation was superior to MLR’s, matching the current investigation findings (Alizamir et al. 2020).

The irrigation water quality is most important from an agricultural perspective and sustainable crop production. Currently, climate change factors directly impact surface and groundwater water quality (Ostad-Ali-Askari et al. 2018; Ostad-Ali-Askari et al. 2019; Pande et al. 2019a; Derakhshannia et al. 2020; Fattahi Nafchi et al. 2021a, 2021b). In this context irrigation, groundwater quality prediction can be helpful to maintain the excellent quality of groundwater under various climate change factors. Groundwater is one of the significant sources during the absence of rainwater, while anthropogenic activities affect the groundwater quality parameters, particularly irrigation water quality parameters (Moharir et al. 2019; Javadinejad et al. 2019; Talebmorad et al. 2021). They are mainly determined by analyzing an important number of water quality parameters to quantify the dissolved substance. However, in developing countries, the measurement of all groundwater parameters has often been unsatisfactory and costly. Therefore, opinion reduction and asset maintenance for water quality assessment are significant challenges (Pande et al. 2019b; Salehi-Hafshejani et al. 2019). This research outcome can be beneficial to the established dams for farming purposes, where evaporation degrades the chemical quality of the water considerably, especially in summer times. This work will thus assist farmers in managing water quality at an efficient cost. Over a short time, since the water assessment depends on the type of soil cultivation according to the water quality class for irrigation purposes, the machine learning model classification, including decreases saline water, is proposed for future studies. The error histogram of ANN models of scenarios 1 and 2 is presented in Fig. 9. The outcomes of MSE vs epoch variation of the deviation for ANN training and testing models are shown in Figs. 10 and 11. 
Figures 12 and 13 show the number of hidden neurons vs MSE in ANN models 1 and 2. Figures 14 and 15 show the LSTM model loss plots in scenarios 1 and 2 (Table 7).

Fig. 9 Error histogram of both ANN models

Fig. 10 MSE vs epoch variation of the deviation for ANN models

Fig. 11 MSE vs epoch variation of the deviation for ANN models

Fig. 12 Number of hidden neurons vs MSE in ANN model 1

Fig. 13 Number of hidden neurons vs MSE in ANN model 2

Fig. 14 LSTM model loss plots during scenario 1

Fig. 15 LSTM model loss plots during scenario 2

Table 7 Sizes of training, test and validation sets

Conclusion

This study assessed the performance of machine learning models for forecasting irrigation water quality parameters in the Akot basin (India). The water quality parameters were predicted using MLR, LSTM, and ANN models. The input variables for forecasting were Mg2+, HCO3-, Ca2+, CO32-, K+, and Na+. The water quality data were collected from observation wells in the Akot basin, processed, and analyzed in a water laboratory, which is a significant concern and limitation of this study. The suggested models were trained and tested in two separate scenarios, scenario 1 and scenario 2, using different percentages of the well data. The ML models were assessed using statistical measures, including R2, RMSE, and MSE, and visually using scatter plots and line and bar diagrams. The results showed that the ANN and MLR models can predict all six irrigation water quality parameters: RSC, MH, SAR, PI, SSP, and KI. The ANN and MLR models achieved the highest accuracy in scenario 1 and scenario 2, respectively. The testing results also indicate that the models forecast water quality values precisely with both large and small training sample sets. The PCA yielded 17 principal components describing the data. PCA was used to obtain the correlations between the chemical components of the irrigation water quality parameters. However, it would also be helpful to conduct benchmarking studies of different prediction models. It is also suggested that the ANN and MLR models be applied under different climatic conditions and with different water quality parameters. In future work, we will apply these techniques in different areas. We shall also explore possible enhancements to the method, such as the imputation of missing values and the study of diverse global landscapes. Additionally, we shall extend these approaches to the joint prediction of multiple parameters.
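
The irrigation indices and evaluation statistics named in the conclusion can be illustrated with a short Python sketch. It is not the authors' code. The function names are hypothetical, the index formulas are the commonly used definitions of SAR, KI, MH, and RSC with all ion concentrations in meq/L, and R2 is computed here as the coefficient of determination, whereas the study may have used the squared correlation coefficient. SSP and PI are omitted because their definitions vary between sources.

```python
import math


def irrigation_indices(ca, mg, na, hco3, co3):
    """Return SAR, KI, MH and RSC from ion concentrations in meq/L.

    Assumption: commonly used definitions, not taken from the study itself.
    """
    ca_plus_mg = ca + mg
    return {
        # Sodium adsorption ratio
        "SAR": na / math.sqrt(ca_plus_mg / 2.0),
        # Kelly's index
        "KI": na / ca_plus_mg,
        # Magnesium hazard, in percent
        "MH": 100.0 * mg / ca_plus_mg,
        # Residual sodium carbonate
        "RSC": (co3 + hco3) - ca_plus_mg,
    }


def mse(observed, predicted):
    """Mean squared error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(mse(observed, predicted))


def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative values only; these are not measurements from the Akot basin.
    print(irrigation_indices(ca=4.0, mg=4.0, na=2.0, hco3=5.0, co3=1.0))
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 6.0]
    print(mse(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

In this sketch a model's forecasts for one index, for example SAR, would be passed as `predicted` and the laboratory-derived values as `observed`, once for the training wells and once for the testing wells of each scenario.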