Introduction

Groundwater is one of the most important sources of drinking water worldwide; however, its quality is highly sensitive to anthropogenic pressures and has become a critical problem in many countries. Water quality is directly coupled with human health. Health problems caused by the consumption of contaminated water are increasing globally; they raise morbidity and mortality rates, especially among children, and pose a serious threat to public health, including mass epidemics, in developing countries (Panaskar et al. 2016; Mukate et al. 2017; Wagh et al. 2018a, b). Moreover, approximately 250 million people are infected yearly, and 10–20 million of them die, mostly in developing nations (Dzwairo et al. 2006). Several researchers have reported that groundwater quality has conspicuously deteriorated in recent years in most countries (Jeong 2001; Moon et al. 2004; Adimalla 2018; Gaikwad et al. 2019). Nowadays, owing to inadequate fresh water resources, people use groundwater extensively for drinking, irrigation and industrial purposes. Groundwater is generally considered a safe and reliable source of drinking water because it occurs naturally below the surface and is less vulnerable to contamination than surface water. The quality of groundwater is an account of its biological, biochemical and physical characteristics with reference to drinking quality standards (Khalil et al. 2011; Gazzaz et al. 2012; Wagh et al. 2016a). Therefore, it is important to evaluate groundwater quality on the basis of its physical, chemical and biological properties, with reference to its natural quality, human health impacts and proposed uses; this quality depends on the amount of rain, groundwater recharge and the residence time of water on and within the surface (Logeshkumaran et al. 2014). Thus, monitoring of water quality is essential for better management of the available water resources and for developing remediation strategies in the respective regions.

A water quality index (WQI) is a mathematical tool and composite indicator that converts selected water quality variables into a single unitless value that can be communicated to end users (Brown and Matlock 2011; Horton 1965; Brown et al. 1970). WQIs have the advantage of determining water quality status without interpreting the parameters individually. More than 20 water quality indices had been developed and revived worldwide by 1970 (Bhargava 1983). In view of their ease of use and scientific basis, several researchers have used various WQIs for assessing water quality. Such analyses produce large volumes of water quality data, which need to be compiled to conclude the water quality status of a particular region. Groundwater quality indices have been developed to incorporate a set of parameters into a single index. Moreover, such an index is dimensionless and assigns an appropriate categorical value to the cumulative set of measured chemical parameters of groundwater (Pesce and Wunderlin 2000). Accordingly, the WQI can be defined as a single numeric score that expresses the groundwater quality status at a particular location over a specified period (Kaurish and Younos 2007). Banerjee et al. (2018) assessed water quality influenced by highway-broadening activities in the Eastern district of Sikkim. Yaseen et al. (2018) studied hybrid intelligence models based on the adaptive neuro-fuzzy inference system integrated with fuzzy C-means data clustering, grid partition and subtractive clustering to ascertain the quality of river water. Aboodi et al. (2018) assessed the water quality of the Shatt Al-Arab River and its suitability for different uses near the Hartha and Najibia power plants using the WQI method, the organic pollution index, etc., during the summer and monsoon seasons. Miladenovic Ranisavljevic and Zerajic (2018) compared WQI models for assessing the surface water quality of the Danube River in Serbia. Moreover, Shooshtarian et al. (2018) used a groundwater quality index based on fuzzy multi-criteria group decision-making models to monitor the impact of land-use changes on groundwater quality in Iran.

The artificial neural network (ANN) method is a form of artificial intelligence whose architecture attempts to simulate the biological structure of the human brain and nervous system. A neural network consists of simple synchronous processing elements, called neurons, that are inspired by the biological nervous system (Malinova and Guo 2004). A general ANN is composed of input, hidden (or middle) and output layers. In a multilayer perceptron, the error function is a function of the weighting coefficients. The ANN model requires high accuracy in its technical components for design, development and expansion. Therefore, the raw input data are standardized to prevent an excessive decline in the assigned weights. Normalized data increase the processing speed and accuracy of the ANN. The structure of an ANN is usually fixed by the designer, and the weights are trained automatically using an optimization algorithm such as back propagation or Levenberg–Marquardt (Huang and Huang 1990). Barzegar and Asghari Moghaddam (2016) compared the accuracy of three neural network techniques, viz. multilayer perceptron, radial basis function and generalized regression networks, in predicting the groundwater salinity of the Tabriz plain; the three models were then combined using a committee neural network to improve the prediction accuracy. Darbandi and Pourhosseini (2018) studied the input combinations of the models using the Gamma test, and used MLP–ANN and a hybrid multilayer perceptron (MLP–FFA) to forecast monthly river flow for a set of time intervals from observed data. Kiraz et al. (2018) used an ANN model to predict the efficiency of Pb(II) adsorption; the best results were obtained with ten neurons, and the authors showed that such a model could save time and cost. In addition, Emamgholizadeh et al. (2014) used artificial intelligence-based models to predict water quality parameters of the Karoon River, Iran. The efficiency of these models was evaluated using the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE).
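The three evaluation measures named above can be sketched in a few lines. This is a minimal illustration in Python (the study itself used R), assuming the common definition R2 = 1 − SSres/SStot; the observed and predicted values are hypothetical and are not taken from any of the cited studies.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted index values, for illustration only.
observed = [40.0, 55.0, 70.0, 90.0]
predicted = [42.0, 53.0, 71.0, 88.0]

print(mae(observed, predicted))  # 1.75
print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

Lower RMSE and MAE and an R2 closer to 1 indicate closer agreement between predicted and observed values.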

Yilma et al. (2018) developed an ANN model and demonstrated its appropriateness for predicting the CCME WQI in the Akaki region of Ethiopia. Salari et al. (2018) used an ANN model for the quality assessment and characterization of the physical and chemical parameters of potable water. The CCME WQI model was used to assess groundwater suitability for drinking and irrigation in the Kadava River basin of Maharashtra (Wagh et al. 2017a). Moreover, an ANN model was used to predict groundwater suitability for irrigation by considering irrigation indices in the Nanded tehsil of Maharashtra state (Wagh et al. 2016b). The Levenberg–Marquardt three-layer back propagation, traditional back propagation, and resilient back propagation with and without weight algorithms, etc., were used to develop ANN models for predicting the nitrate content in groundwater of the Kadava River basin of Nashik district, Maharashtra state (Wagh et al. 2017b, 2018c). Apart from this, many researchers have recently used ANN models in hydrological and soil-related studies in their respective regions (Hassan et al. 2018; Sihag 2018; Javdanian 2017; Sakizadeh 2016; Sharma et al. 2015).

The explicit use of ANN to develop a predictive model of WQI for the Shivganga watershed has yet to be investigated (Kadam et al. 2017, 2018). Therefore, the objectives of the present study are (i) to evaluate the groundwater quality for drinking by calculating WQI, (ii) to develop ANN and MLR models for the prediction of WQI, and (iii) to compare the ANN and MLR models in terms of the precision of the predicted WQI values.

Study area

The present study covers an area of 176 km2 located on the eastern slopes of the Western Ghat region of Maharashtra state, India (Fig. 1). It is included in Survey of India (SoI) toposheet numbers 47F/15 and 47F/16 on a 1:50,000 scale and lies between longitudes 73°44′1.131′′E and 73°56′17.941′′E and latitudes 18°13′36.059′′N and 18°24′7.466′′N. The area is drained by the River Shivganga, a fifth-order stream. The river flows towards the east for about 8 km and then abruptly changes its course to the south; after a total course of 27 km, it empties into the River Gunjawani. The Shivganga watershed has a tropical monsoon climate. The study area receives most of its rain from the southwest monsoon (June to September), and the average annual precipitation is about 750 mm. The annual temperature ranges from 10 °C in winter to 39 °C in summer. The study area includes 58 villages with a population of around 0.07 million. The area under investigation is bounded by high hill ranges made up of Deccan trap basaltic flows. The hill ranges include a number of peaks more than 1000 m in altitude. The highest peak (1316 m) is at the Singhgarh plateau, whereas the lowest point (610 m) is at the southernmost tip of the basin near the confluence of the river. About 30% of the area lies above 910 m and comprises hilly terrain. Around 16% of the area lies between 835 and 910 m and has a gently rolling topography, while about 54% of the area lies below 835 m and is largely occupied by a flat surface, i.e. a plain of considerable areal extent with negligible relief (Kadam et al. 2018).

Fig. 1 Shivganga watershed representing groundwater sample locations

The area under investigation is part of the Deccan Volcanic Province (DVP) and comprises poorly to moderately weathered basaltic flows of the Wai Subgroup. These basaltic flows host multiple aquifers separated by thin impermeable tuffaceous layers known as red boles. The Wai Subgroup consists of five formations, viz. Poladpur, Ambenali, Mahabaleshwar, Panhala and Desur, listed from base to top. In addition, these formations are separated by marker Giant Plagioclase Basalts (GPBs) (Beane et al. 1986). The hydrogeological map (Fig. 2) of the study area was prepared by mapping stream segments, road cuttings and Ghat divisions with the help of the GSI map (scale 1:250,000). Generally, groundwater in the area occurs within the weathered, amygdaloidal/vesicular and jointed/fractured compact basaltic aquifers. Groundwater occurrence and availability are influenced by the degree of weathering and jointing, the presence of interconnected vesicles, and the connection of vesicles with fissures and cracks. In the study area, the thickness of the weathered zone varies widely, up to 10 m bgl; the weathered and fractured trap in the topographic lows appears to be the potential aquifer. Generally, the shallower weathered zones, up to a depth of 18–22 m bgl, form the phreatic aquifer. Alluvial patches are observed along the river banks, in flood plain areas and along second- and third-order streams.

Fig. 2 Hydrogeological map of Shivganga River

Materials and methodology

To ascertain the seasonal variations in groundwater quality, a total of 34 dug well samples were collected from the Shivganga watershed and analyzed for major ions during the pre-monsoon (May 2015) and post-monsoon (November 2015) seasons (Fig. 1). A random sampling method was employed for the collection of groundwater samples, with due consideration given to representing the various land-use patterns, geomorphology and topography of the study area. The samples were collected in pre-cleaned polyethylene containers of 1-L capacity. pH and EC were recorded in situ with a handheld digital pH and EC meter. The sample location coordinates were recorded using a GPS (Garmin) and exported to GIS software for preparation of the base map of the study area. The collected groundwater samples were brought to the laboratory and kept in a refrigerator at 4 °C until analyzed by following the standard procedures of the American Public Health Association (APHA 2005). Total hardness (TH), calcium (Ca2+), magnesium (Mg2+) and chloride (silver nitrate method) were determined by titrimetric procedures. Sodium (Na+) and potassium (K+) contents were analyzed using a flame photometer (Systronic-130 model). Sulphate (SO42−) by the turbidimetric method, phosphate (PO43−) by the SnCl2 method and nitrate (NO3−) by the brucine sulphate method were analyzed using a UV–VIS double beam spectrophotometer. The groundwater quality for drinking was assessed by calculating WQI values for all groundwater samples with reference to the WHO (2011) drinking water standards. Thereafter, WQI was predicted using an artificial neural network (ANN) with training (70%) and testing (30%) data. To verify the precision and optimality of the ANN model, the least error criterion (\(\epsilon \cong 0\)) was selected. Here, the Levenberg–Marquardt three-layer back propagation algorithm was used for the prediction of WQI. Further, multiple regression analysis was performed to check the efficiency of the ANN model. R software (version 3.3.3) was used to analyze the data with various libraries, viz. nnet, quantmod, devtools, NeuralNetTools and Metrics. The methodology adopted for the development of the WQI, MLR and ANN models is shown in the flowchart (Fig. 3).
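The 70%/30% partition described above can be sketched as follows. This is a hedged illustration in Python rather than the R workflow used in the study; the function name, the seed, the rounding rule and the well labels DW1 to DW34 are assumptions made for the example, since the text does not state how the 34 samples were assigned to the two sets.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into train and test sets.

    The rounding rule (nearest integer) is an assumption; the source only
    states the 70%/30% proportions.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical labels for the 34 dug wells.
well_ids = [f"DW{i}" for i in range(1, 35)]
train, test = train_test_split(well_ids)

print(len(train), len(test))  # 24 10
```

With only 34 samples, the test set holds about ten wells, so the reported test error rests on a small number of observations.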

Fig. 3 Flowchart of architecture of ANN and MLR models for WQI prediction

Calculation of WQI

The water quality index is a helpful method for expressing the overall quality of water in composite form. In addition, it helps to characterize water quality, to flag potability issues and to facilitate protective measures. The WQI was computed by assigning a weight (wi) to each physicochemical parameter based on its relative significance in drinking water. In the present study, the physicochemical parameters pH, EC, TDS, TH, Ca, Mg, Na, K, Cl, HCO3, SO4, NO3 and PO4 were considered for the calculation of WQI. Initially, each physicochemical parameter was assigned a weight (wi) on a scale of 1–5 based on its importance for human health and drinking suitability. Thus, the maximum weight of 5 was assigned to TDS, SO4, NO3 and Cl owing to their considerable significance in drinking water quality. pH and EC were assigned a weight of 4; TH, Ca and Mg a weight of 3; Na and K a weight of 2; and HCO3 and PO4 the minimum weight of 1 owing to their lesser importance for drinking water fitness (Yidana et al. 2010; Varol and Davraz 2014; Wagh et al. 2017c; Vasant et al. 2016; Şener et al. 2017). The WQI ranges and the corresponding types of water are given in Table 1. The statistical summary of the physicochemical parameters is given in Table 2. The WHO (2011) drinking standards, assigned weight (wi) and relative weight (Wi) of each physicochemical parameter are given in Table 3.

Table 1 The range and type of water for WQI (Sahu and Sikdar 2008)
Table 2 Descriptive statistical summary of physicochemical parameter
Table 3 The assigned weights and relative weight of physicochemical parameters

The relative weight (Wi) is calculated using the following equation:

$$W_i=\frac{w_i}{\sum\nolimits_{i=1}^{n} w_i},$$
(1)

where Wi is the relative weight, wi is the weight assigned to each parameter and n is the number of parameters.
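As a worked illustration of Eq. (1), the assigned weights stated in the text can be normalized as follows. This is a minimal Python sketch (the study used R); the dictionary and function names are chosen for the example only.

```python
# Assigned weights (w_i) as stated in the text, on a scale of 1-5.
ASSIGNED_WEIGHTS = {
    "TDS": 5, "SO4": 5, "NO3": 5, "Cl": 5,
    "pH": 4, "EC": 4,
    "TH": 3, "Ca": 3, "Mg": 3,
    "Na": 2, "K": 2,
    "HCO3": 1, "PO4": 1,
}

def relative_weights(weights):
    """Eq. (1): divide each assigned weight by the sum of all weights."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

W = relative_weights(ASSIGNED_WEIGHTS)

print(sum(ASSIGNED_WEIGHTS.values()))  # 43
print(round(W["TDS"], 4))              # 0.1163
```

The thirteen assigned weights sum to 43, so a parameter with weight 5 has a relative weight of 5/43, and the relative weights sum to 1.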

A quality rating scale (qi) for each parameter is calculated based on the following equation:

$$q_i = (C_i / S_i) \times 100,$$
(2)

where qi is the quality rating, Ci is the concentration of each chemical parameter in each water sample (mg/L) and Si is the WHO drinking water quality standard for that parameter (mg/L).

The sub-index of the ith parameter (SIi) is then determined as

$$\text{SI}_i = W_i \times q_i.$$
(3)

The WQI is calculated as the sum of the sub-indices:

$$\text{WQI} = \sum_{i=1}^{n} \text{SI}_i.$$
(4)
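Equations (1)–(4) can be chained into a single calculation, sketched below in Python (the study used R). To keep the arithmetic easy to follow, the example uses only three of the thirteen parameters, so the relative weights differ from those of the study. The Ca standard of 75 mg/L is the limit quoted in this paper; the TDS and NO3 standards and all sample concentrations are assumptions for illustration, because Table 3 is not reproduced here.

```python
def wqi(concentrations, standards, weights):
    """Compute the water quality index from Eqs. (1)-(4)."""
    total_weight = sum(weights.values())
    index = 0.0
    for name, w in weights.items():
        relative_weight = w / total_weight                              # Eq. (1)
        quality_rating = concentrations[name] / standards[name] * 100   # Eq. (2)
        index += relative_weight * quality_rating                       # Eqs. (3) and (4)
    return index

# Three-parameter illustration; weights follow the text (TDS 5, Ca 3, NO3 5).
weights = {"TDS": 5, "Ca": 3, "NO3": 5}
# Ca limit is quoted in the paper; TDS and NO3 limits are assumed placeholders.
standards = {"TDS": 500.0, "Ca": 75.0, "NO3": 50.0}
# Hypothetical sample concentrations in mg/L.
sample = {"TDS": 250.0, "Ca": 37.5, "NO3": 10.0}

result = wqi(sample, standards, weights)
print(round(result, 2))  # 38.46
```

Here the quality ratings are 50, 50 and 20, and the weighted sum is 500/13, or about 38.46.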

Results and discussions

The analytical results show that pH ranges from 7.44 to 8.38 in pre-monsoon (PRM) and from 6.85 to 7.51 in post-monsoon (POM) samples, within the permissible limit of the WHO standards. The average EC values are 635.29 µS/cm in the PRM and 589.71 µS/cm in the POM season, and both exceed the WHO desirable limit of 500 µS/cm. TDS values range from 196.10 to 832.60 mg/L in PRM and from 163.5 to 590.60 mg/L in POM. It is observed that 23% of the PRM and 15% of the POM samples show TDS concentrations above the desirable limit. The high TDS content of the groundwater probably results from leaching and percolation of salts from the soil and from certain prevailing anthropogenic inputs. TH varies from 52 to 604 mg/L in PRM (average 238.78 mg/L) and from 96 to 384 mg/L in POM (average 247.89 mg/L); the averages suggest an increase in concentration towards POM owing to the dissolution of minerals.

The cationic order is Ca2+ > Na+ > Mg2+ > K+ in the PRM season, when rainfall was normal, and Ca2+ > Mg2+ > Na+ > K+ in the POM season, under low-rainfall conditions. The Ca2+ values vary widely, from 8.02 to 120.24 mg/L in PRM and from 12.83 to 79.78 mg/L in POM, with average values of 52.73 and 55.68 mg/L, respectively. These values vary with the monsoon: as rainfall decreases in PRM, the number of samples above the WHO limit increases. Thus, ~ 17% of the groundwater samples from the PRM period and only 6% of the samples from the POM period are above the WHO limit (75 mg/L). The Mg2+ concentration ranges from 7.80 to 74.07 mg/L in the PRM and from 2.92 to 54.58 mg/L in the POM season. The mean Na+ concentration is 24.94 and 17.91 mg/L in the PRM and POM periods, respectively. The average K+ concentration is 1.17 and 0.4 mg/L in the PRM and POM seasons, respectively.

The anionic order in the groundwater samples is HCO3− > SO42− > Cl− > NO3− > PO43− in the PRM and HCO3− > Cl− > SO42− > NO3− > PO43− in the POM season. The HCO3 concentration varies from 30 to 320 mg/L in the PRM and from 100 to 360 mg/L in the POM season. The elevated concentration of HCO3 in the groundwater samples is attributed to the host basaltic rock (Pawar et al. 2008). The average sulphate concentration is 42.17 and 17.9 mg/L in the PRM and POM seasons, respectively. The sources of SO42− in groundwater are the dissolution/oxidation of sulphate minerals and anthropogenic inputs. In addition, the chloride concentration is 19.1–248.5 mg/L in the PRM samples and 10.20–85.20 mg/L in the POM samples. The elevated Cl content is symptomatic of the discharge of sewage effluent, decomposition of organic material and runoff from agricultural areas (Panaskar et al. 2016). The NO3 concentration ranges between 6.39 and 18.65 mg/L during PRM (average 12.52 mg/L) and between 0.12 and 13.95 mg/L during POM (average 3.63 mg/L). The high nitrate concentrations found in the post-monsoon season are attributed to nitrogenous complex fertilizers and to the percolation of surface water into certain wells owing to improper sealing of the dug well walls (Wagh et al. 2019). The PO4 values range from 0.02 to 0.32 mg/L in PRM and from 0.07 to 2.90 mg/L in POM. The higher values in POM are mainly from agricultural return flow.

Assessment of the water quality using WQI

To ascertain the groundwater quality for drinking, WQI was calculated using Eqs. (1)–(4). The WQI values range from 25.75 to 129.07 in the pre-monsoon and from 37.54 to 91.38 in the post-monsoon season. Figure 4 shows that only DW5, with a WQI value of 129.07, indicates poor groundwater quality for drinking, mainly owing to inputs from domestic and/or agricultural discharge. The remaining groundwater samples from the PRM and POM seasons show excellent to good quality for drinking.
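Assigning a water type to a WQI value can be sketched as a simple threshold lookup. Table 1 is not reproduced in this text, so the class boundaries below (50, 100, 200 and 300) are an assumption based on a commonly used five-class scheme; they are consistent with the statements above that 129.07 is poor and that the other samples are excellent to good. The handling of values that fall exactly on a boundary is also an assumption.

```python
def classify_wqi(value):
    """Map a WQI value to a water type using assumed class boundaries."""
    if value < 50:
        return "excellent"
    if value < 100:
        return "good"
    if value < 200:
        return "poor"
    if value < 300:
        return "very poor"
    return "unsuitable for drinking"

# WQI extremes reported in the text for the two seasons.
for wqi_value in (25.75, 129.07, 37.54, 91.38):
    print(wqi_value, classify_wqi(wqi_value))
```

Under these assumed boundaries, the reported extremes 25.75 and 37.54 fall in the excellent class, 91.38 in the good class and 129.07 in the poor class.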

Fig. 4 Plot of WQI values for pre- and post-monsoon seasons of 2015

Artificial neural network model (ANN)

The applicability of ANN was investigated by forecasting WQI values for the 34 dug wells of the study area. The performance results of the model with the LM algorithm are provided in Table 4. The performance of the back propagation LM algorithm was evaluated by monitoring the error between the modelled output and the measured dataset. The number of hidden neurons was optimized by keeping all other parameters constant for the output variable, i.e. WQI. The error decreased with an appropriate number of hidden neurons. With six hidden neurons, found to be the optimum by simulation, the error did not change significantly in either the pre- or the post-monsoon season. The optimal ANN structure therefore contained 13 input variables, six hidden neurons and WQI as the output variable. Figures 5 and 6 show the optimal neural networks for the pre- and post-monsoon seasons. B1 and B2 are the biases in the model, and successive iterations help to reduce them. In the proposed approach, training was stopped at the 40th iteration (pre-monsoon) and the 60th iteration (post-monsoon), as the error no longer changed significantly. In the pre-monsoon season, the initial error (1.05) was high, but it decreased over the iterations, and the least error of 0.000086 was obtained at the 40th iteration. In the post-monsoon season, the initial error was also high (9.06), but it decreased to the least error of 0.000096 at the 60th iteration (Table 4). The results show that the applied model generates 91 weights in both the pre- and post-monsoon seasons (Figs. 5, 6). The remaining variations are very small; as a result, the model proves to be an effective tool in the assessment of WQI.
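The figure of 91 weights follows from the 13-6-1 structure if the network is fully connected and each of the hidden and output layers has one bias term (B1 and B2). The Python sketch below checks that arithmetic; the fully connected layout without skip-layer connections is an assumption, and the function name is chosen for the example.

```python
def count_parameters(n_inputs, n_hidden, n_outputs=1):
    """Count weights and biases in a fully connected single-hidden-layer network."""
    hidden_layer = (n_inputs + 1) * n_hidden   # input weights plus one bias per hidden neuron
    output_layer = (n_hidden + 1) * n_outputs  # hidden weights plus one bias per output
    return hidden_layer + output_layer

print(count_parameters(13, 6))  # 91
```

That is, 13 × 6 = 78 input-to-hidden weights, 6 hidden biases, 6 hidden-to-output weights and 1 output bias. With 91 parameters fitted to about 24 training samples, the very small training error reported above is not by itself evidence of predictive skill on unseen wells.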

Table 4 Iteration results of the proposed ANN model

Fig. 5 ANN structure for pre-monsoon season of 2015

Fig. 6 ANN structure for post-monsoon season of 2015

Multiple linear regression model (MLR)

The MLR model is useful for discovering the association between a dependent variable and several independent variables. The general form of the regression equations is given in flowchart 1. WQI is a vector of the WQI values at all sampling locations, and α0,...,α13 are fixed but unknown constants. pH, EC, TDS, TH, Ca, Mg, Na, K, HCO3, SO4, Cl, NO3 and PO4 are the independent variables used to predict WQI by constructing regression equations in R. The standardized regression coefficient (αj, j = 0, 1, 2, ..., 13; see flowchart 1) measures the change in the dependent variable, i.e. WQI, associated with a change in the corresponding independent variable. The standard error was used to evaluate the stability of the regression coefficients. The t test was used to confirm the significance of the means of the water quality parameters, and the variance was assessed using the F test (Sahoo and Zha 2013; Beaumont et al. 1984). The robustness of the MLR models was assessed by the coefficient of determination (R2). The adjusted R2 is interpreted in the same way as R2, except that it takes into account the number of degrees of freedom. The coefficients of the model and the significance of each parameter for the pre-monsoon season of 2015 are given in Table 5. The standard error is 3.545e−10 on 20 degrees of freedom (DF), with R2 = 1 and adjusted R2 = 1. The F value is 7.079e+17 on 13 and 20 DF, and the p value is less than 2.2e−16, which indicates the efficiency of the model. The coefficients of the model for the post-monsoon season of 2015 are given in Table 6. The standard error is 3.699e−10 on 20 degrees of freedom, with multiple R2 = 1 and adjusted R2 = 1. The F value is 1.453e+18 on 13 and 20 DF, and the p value is less than 2.2e−16, which again indicates the efficiency of the model.

Table 5 Coefficient of the MLR in pre-monsoon season 2015
Table 6 Coefficient of the MLR in post-monsoon season 2015
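The degrees of freedom and the adjusted R2 reported for the MLR models can be checked with the standard formulas, sketched here in Python (the study used R). The assumption is that all 34 samples and all 13 predictors plus an intercept entered each regression, which matches the reported 13 and 20 DF.

```python
def residual_df(n_samples, n_predictors):
    """Residual degrees of freedom for a regression with an intercept."""
    return n_samples - n_predictors - 1

def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R2, which penalizes R2 for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / residual_df(n_samples, n_predictors)

print(residual_df(34, 13))        # 20
print(adjusted_r2(1.0, 34, 13))   # 1.0
print(adjusted_r2(0.95, 34, 13))  # lower than 0.95 because of the penalty
```

Because Eqs. (1)–(4) define WQI as a weighted linear sum of the same measured parameters, a linear regression on those parameters can reproduce it almost exactly, which is consistent with the reported R2 of 1 and standard errors of the order of 1e−10.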

The WQI values predicted by the ANN model conform to the observed values, and the model is thus effective in predicting WQI. In this study, it is concluded that the ANN model is more appropriate than the MLR model, as it provides a convincingly good fit. The MLR modelling technique is based on the simple least squares method, whereas the ANN model imitates the functioning of human intelligence. The MLR technique has the significant practical advantage of being much simpler and less time consuming (Sahoo and Zha 2013).

Conclusions

The main focus of this paper is to determine WQI to ascertain the suitability of groundwater for drinking. The analytical results showed that EC, TDS, TH, Ca and Mg are beyond the desirable limits and that the rest of the parameters are within the safe limits of the WHO drinking standards. The WQI results indicate that all the groundwater samples fall in the good and excellent categories except DW5, which has a WQI value of 129.07. Hence, the overall water quality in the Shivganga watershed is suitable for drinking. In this study, ANN and MLR models were used to assess the accuracy of WQI for the future prediction of water quality, and the determination of WQI values was validated through both models. The MLR model showed that all variables except chloride content are significant in the regression model. A neural network architecture comprising 13 input neurons, six hidden neurons and one output variable was used for WQI prediction. The ANN model achieved an acceptably small error (\(\epsilon \cong 0\)) in the pre- and post-monsoon seasons of 2015. The comparison of the ANN and MLR models suggests that the precision level is higher in the ANN model. Hence, preference is given to the ANN model, whose iterative procedure gives more accurate results in both seasons, and the ANN model should become more beneficial for the prediction of water quality in future. The MLR model can nevertheless serve as an alternative and cost-effective tool for groundwater quality prediction in circumstances where the available expertise, time constraints and field data are favourable. The ANN model is preferred because it reduces the time and effort required for repetitive WQI determination, and the modelling approach can be employed fruitfully in similar studies. This study recommends that water resources be adequately protected to preserve their quality in its present state. Such protection will also help to avoid possible water contamination over the long term. The proposed ANN model can be further improved with more detailed meteorological and spatially distributed water quality data, to demonstrate the comprehensiveness of the proposed approach to research communities in the context of developing groundwater management or protection plans.