1 Introduction

In recent years, the water quality of major rivers, lakes and ponds in India has deteriorated alarmingly because rapid population growth has driven fast-paced urban development and industrialization. Increased anthropogenic activities, including the direct discharge of untreated industrial effluents, domestic sewage and agricultural waste, have severely degraded surface water bodies. In India, river clean-up strategies are often not optimally prioritized. Spatiotemporal monitoring of pollution levels is therefore essential for devising effective measures to reclaim degraded urban water bodies (Farhad et al., 2013; Abba et al., 2015). In situ measurement and monitoring of water quality at point locations is laborious and time-consuming (Song et al., 2012). Mathematical models integrated with geospatial techniques offer a reliable, time-saving means of controlling and sustainably managing surface water resources (Mondal and Satpaty, 2020). Geospatial techniques allow continuous monitoring of several water quality parameters (WQPs) over large water bodies at multiple spatial and temporal scales (Fulazzaky et al., 2010; Prabu et al., 2011).

Over the last two decades, water quality monitoring of urban water bodies has attracted researchers across the globe. River water quality is assessed in terms of physical, chemical and biological parameters, which involves analyzing a complex data matrix with a large number of water quality attributes. Many studies have evaluated pollution levels in terms of individual WQPs, namely electrical conductivity (EC), turbidity, dissolved oxygen (DO), total dissolved solids (TDS), biochemical oxygen demand (BOD), chemical oxygen demand (COD), alkalinity, total suspended sediment (TSS), chlorophyll-a (Chl-a), and heavy metals such as iron (Fe), magnesium (Mg), chromium (Cr) and lead (Pb). These studies used remote sensing data in a geographical information system (GIS) framework (Milanović Pešić et al., 2020; Nas et al., 2010; Sharma et al., 2018; Waxter, 2014; Yao et al., 2020). To reduce the number of WQPs in the analysis, considerable attention has been given to developing single numerical indicators that capture overall water quality trends relative to threshold limits. The water quality index (WQI) is a numerical indicator of the severity of water quality deterioration with respect to the prescribed limits for practical use. It is computed from several significant quality parameters (Bordalo et al., 2006; Dunca, 2018; Markogianni et al., 2014; Mohamed et al., 2019; Said & Hussain, 2019; Sharaf, 2017; Sharma et al., 2018; Syahreza et al., 2012; Zhu, 2013). To classify the degree of severity, WQI values are grouped into broad classes such as excellent, good, moderate and poor. Numerous water quality indices have been proposed for assessing the quality of water bodies.
The most commonly used WQIs include the weighted arithmetic index method (Brown et al., 1970), the National Sanitation Foundation water quality index (NSFWQI) (Hoseinzadeh et al., 2014) and the overall index of pollution (OIP) (Sargaonkar & Deshpande, 2003). The OIP provides an in-depth understanding of the water quality status of surface water sources, especially under Indian conditions (Sargaonkar & Deshpande, 2003). Remote sensing of water quality uses the visible and infrared portions of the electromagnetic spectrum and explores the sensitivity of spectral band combinations with advanced computing techniques. Several data-driven approaches have been implemented to quantify the relationship between actual and modeled WQPs. These approaches require input data, model parameters and other relevant information (Bordalo et al., 2006). Many studies employed statistical approaches such as multiple linear regression (MLR), logarithmic regression and exponential regression. Others concentrated on more efficient nonlinear analytical methods, viz. artificial neural networks (ANN), genetic programming (GP), the group method of data handling (GMDH) and gene expression programming (GEP), in conjunction with geospatial techniques (Akbal et al., 2011; Avdan et al., 2019; Boyacioglu, 2010; Chapagain et al., 2010; Hussain et al., 2008; Lotfinasabasl et al., 2018).

In recent years, ANN modeling has been widely used to quantify the severity of water quality issues because of its fast training process and its ability to solve linear and nonlinear complex problems (Bonansea et al., 2015; Nasri, 2010; Nathan et al., 2017). Many studies used the backpropagation neural network (BPNN) and the radial basis function (RBF) neural network to evaluate water quality. These networks gave favorable outcomes in modeling complex nonlinear response functions, such as the relation between spectral reflectance values and WQP estimates (Ekercin, 2007; Gürsoy & Atun, 2019; Marquez et al., 2018; Zhang et al., 2003; Zhao et al., 2014). In river management programs, ANNs have been used effectively to evaluate WQI levels and to simulate wetland processes (Reynolds & Maberly, 2002; Kuo et al., 2007; Li et al., 2009; Song et al., 2012; Wang et al., 2012). Chu et al. (2013) developed an ANN model that effectively predicted the quality of surface water bodies and introduced factor analysis to identify significant water quality parameters. Hafeez et al. (2018) compared four machine learning approaches, namely ANN, random forest (RF), cubist regression (CB) and support vector regression (SVR). They retrieved water quality indicators (i.e., Chl-a, SS and turbidity) over the coastal waters of Hong Kong from water reflectance values acquired by a hand-held spectroradiometer and from satellite data. ANN outperformed the other three approaches. More recently, Wang et al. (2019, 2020) found deep learning to be a promising tool for formulating environmental property prediction models for screening green solvents. Several studies have successfully applied GEP, along with GP, to a variety of water resources problems (Azamathulla & Ghani, 2011; Ghavidel & Montaseri, 2014; Liu & Wang, 2019; Zakaria et al., 2010).
Furthermore, these techniques are considered powerful tools for solving complex environmental and river engineering problems (Aras et al., 2007; Chen et al., 2008; Mohammadpour et al., 2015). Ni et al. (2012) evaluated water fluctuations in wetlands using the GP approach. Xu and Qin (2013) assessed agricultural water quality through the combined application of a genetic algorithm (GA) and fuzzy simulation. Martí et al. (2013) compared ANN, GEP and MLR for estimating outlet dissolved oxygen in micro-irrigation and found GEP to be the most effective approach. More recently, Li and Wang (2019) used MLR and GEP to develop models that predict reservoir turbidity from Landsat-8 satellite imagery. They found GEP to be more rational and accurate for turbidity simulation. Quantifying pollution levels in water bodies during the worldwide lockdowns is crucial for interpreting the short- and long-term effects of coronavirus disease 2019 (COVID-19) on river dynamics. A few recent studies have reported that pollution levels fell sharply during this period and that most water bodies recovered almost completely (Clifford, 2020; Häder et al., 2020; Stone, 2020).

The Kali River, a major source of irrigation water in western Uttar Pradesh, India, has completely deteriorated owing to the ever-increasing disposal of municipal and industrial waste from adjoining cities. Some earlier studies found the river water safe for irrigation. Later studies, however, revealed it to be severely polluted, with heavy metal concentrations far exceeding the permissible limits (Mishra et al., 2015; Maurya & Malik, 2016). In terms of BOD levels, the Kali River has been identified as the most critically contaminated river after the Markanda River in Haryana State (CPCB, 2012). Spatial monitoring of the Kali River's water quality with reliable data-driven approaches is a prerequisite for conserving the river and managing its restoration. The main objective of this study is therefore to evaluate and map WQI estimates along a 6-km-long stretch of the Kali River passing through the Aligarh district in Uttar Pradesh, India, using high-resolution IRS P6 LISS IV imagery. Eleven spectral reflectance band combinations were formulated to identify the combination most strongly associated with the observed WQI at the sampling locations. Three approaches, namely MLR, BPNN and GEP, were employed to model WQI as a function of the most significant band combination. The performance of the three approaches was assessed using quantitative indicators, namely the coefficient of determination (R2), root mean square error (RMSE) and mean absolute error (MAE). A one-way analysis of variance (ANOVA) was also performed to test for significant differences among the WQI estimates from the three approaches at a significance level of 0.05. Maps depicting the spatial variation of WQI levels along the river stretch were generated in a GIS framework. The present study provides a basis for policy makers and environmentalists to devise effective and sustainable strategies and policies for reclaiming completely degraded river ecosystems.

2 Materials and methods

2.1 Study area

The study area, illustrated in Fig. 1, covers a 6-km-long stretch of the Kali River (meaning “black” in the local language) that passes through the Aligarh district in Uttar Pradesh, India. The study area lies between latitudes 28.11°N and 28.15°N and longitudes 78.14°E and 78.18°E, at an elevation of 213 m above mean sea level. The river had been a major source of water for both domestic and irrigation needs over the past two decades. The Kali River originates in the village of Antwada in the Muzaffarnagar district, Uttar Pradesh. It passes through many important cities and joins the Ganges River at Kannauj in the Farrukhabad district, covering a total length of almost 300 km. Large cities along its course, including Meerut, Hapur and Bulandshahr, host numerous small- and large-scale industries on the river banks, such as sugar mills, paper mills, textile industries, slaughterhouses and distilleries. The current state of the river justifies its name. Excessive domestic sewage and untreated industrial effluents are discharged into the river, which thus conveys more than 60 per cent of the pollution load (CPCB, 2012). Over the years, the river has turned into a highly toxic flow of chemicals that is harmful to human health and of restricted use for irrigation or any other purpose. Nevertheless, its toxic water is widely used to meet the irrigation requirements of the surrounding areas. The present condition of the river is pitiable and demands immediate attention for its reclamation.

Fig. 1 Location map of the study area (map not to scale)

2.2 Data collection and analysis

River water samples were collected from midstream at a depth of 0.5 m on April 27, 2018, concurrent with the satellite overpass. Grab sampling was adopted for the analysis of the various WQPs, as recommended by the standard methods of analysis (APHA, 1998). The samples were analyzed in the Environmental Engineering laboratory of the Civil Engineering Department, AMU, Aligarh. The WQI for each sampling location was estimated from 11 physicochemical parameters and heavy metals, namely pH, EC, DO, TDS, BOD, COD, alkalinity, Fe, Mn, Cr and Pb. Heavy metal concentrations were measured following the American Society for Testing and Materials procedure (ASTM, 2000). This procedure digests the water samples with concentrated HNO₃ and analyzes them with an atomic absorption spectrophotometer (AAS).

WQI values for the 40 water samples were computed following a three-step procedure (Water programme, 2007). The first step assigns each WQP a weight (wi) from 1 to 5 according to its relative significance for the overall quality grading of water for irrigation. The relative significance of the WQPs was decided on the basis of collective expert opinion drawn from published studies (Ramakrishnaiah et al., 2009; Nabizadeh et al., 2013; Suneetha et al., 2015). The highest weight, 5, was assigned to the two heavy metals Pb and Cr because of their prominent role in rendering water quality severe. The lowest weight, 1, was assigned to pH, and a weight of 2 was assigned to COD and BOD. Weights of 3 and 4 were assigned to alkalinity, TDS, DO, EC, Fe and Mn according to their relative severity (Srinivasamoorthy et al., 2008). The second step computes the relative weight (Wi) using Eq. 1.

$$W_{i} = \frac{w_{i}}{\sum_{i=1}^{n} w_{i}},$$
(1)

where Wi is the relative weight, wi is the weight of the ith parameter, and n is the number of parameters. The third step computes a quality rating (qi) for each parameter. The concentration of the parameter in each water sample is divided by its standard concentration from the Bureau of Indian Standards (BIS, 1986), and the result is multiplied by 100 (Eq. 2).

$$q_{i} = \frac{C_{i}}{S_{i}} \times 100,$$
(2)

where qi is the quality rating in percent, Ci is the concentration of the ith chemical parameter in each water sample (mg/L), and Si is the irrigation water quality standard for that parameter (mg/L). Finally, the WQI for each sampling location was computed following Brown et al. (1970) using Eq. 3, where SLi = Wi × qi is the subindex of the ith parameter.

$${\text{WQI}} = \mathop \sum \limits_{i = 1}^{n} SL_{i}$$
(3)

The WQI values for the sampling locations were evaluated with the above procedure and rated according to the BIS (1986) specifications given in Table 1.
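The three-step procedure (Eqs. 1 to 3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The concentrations and standards in the example are hypothetical placeholders, not BIS values or measured data. Only pH, BOD and Pb, with the weights stated in the text, are used so that the arithmetic is easy to follow.

```python
from typing import Dict

def weighted_arithmetic_wqi(conc: Dict[str, float],
                            standard: Dict[str, float],
                            weight: Dict[str, float]) -> float:
    """WQI = sum_i W_i * q_i (Eq. 3), with W_i = w_i / sum(w) (Eq. 1)
    and q_i = C_i / S_i * 100 (Eq. 2)."""
    total_w = sum(weight.values())               # denominator of Eq. 1
    wqi = 0.0
    for p, w in weight.items():
        W = w / total_w                          # relative weight (Eq. 1)
        q = conc[p] / standard[p] * 100.0        # quality rating in percent (Eq. 2)
        wqi += W * q                             # subindex SL_i, summed (Eq. 3)
    return wqi

# Hypothetical three-parameter example (weights follow the text: pH 1, BOD 2, Pb 5).
weights = {"pH": 1, "BOD": 2, "Pb": 5}
standards = {"pH": 8.5, "BOD": 10.0, "Pb": 5.0}      # placeholder standards
sample = {"pH": 8.5, "BOD": 20.0, "Pb": 10.0}        # placeholder concentrations
wqi = weighted_arithmetic_wqi(sample, standards, weights)
```

Parameters that were not detected, such as Cr and Mn in this study, can be left out of the dictionaries. Eq. 1 then renormalizes the weights over the remaining parameters.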

Table 1 WQI and corresponding water quality rating as per the BIS (1986) specifications

2.3 Remote sensing data used

An IRS P6 Resourcesat-2 LISS IV image acquired on April 27, 2018, was used to evaluate and map the water quality of the Kali River in terms of WQI. The study area was delineated, and a subset image was created using the ERDAS Imagine software (Fig. 2). The IRS LISS IV sensor produces high-resolution multispectral images in three bands (green, red and near-infrared) with a spatial resolution of 5.8 m at nadir in multispectral mode. At the sampling locations, pixel values in digital numbers (DN) were extracted from the three spectral bands. They were converted into a physical quantity (spectral radiance) and then into spectral reflectance, taking into account terrain and atmospheric corrections. The conversion used the radiometric gain and offset from the image metadata, with Eqs. 4 and 5 for radiance and reflectance, respectively (Chander & Markham, 2003).

$$L_{\lambda } = {\text{ Gain}}_{\lambda } \times {\text{DN}}_{\lambda } + {\text{offset}}_{\lambda } ,$$
(4)

where λ denotes the spectral band of the image and Lλ is the spectral radiance for band λ at the sensor aperture (mW/cm²/µm/sr). Gainλ is the radiometric calibration gain for band λ (mW/cm²/µm/sr/DN) from the product metadata; values of 52, 47 and 31 were used for the G, R and NIR bands, respectively. DNλ is the digital number for band λ, and offsetλ is the radiometric calibration offset for band λ (mW/cm²/µm/sr) from the product metadata, which is zero for all three bands.

$$\rho_{\text{P}} = \frac{\pi \times L_{\lambda} \times d^{2}}{E_{\text{SUN},\lambda} \times \cos\theta_{\text{S}}},$$
(5)

where ρP is the dimensionless planetary reflectance. d is the Earth–Sun distance in astronomical units, \(d = 1 - 0.01674\cos\left[0.9856\,(\text{JD} - 4)\right]\), with the cosine argument in degrees and JD the Julian day. ESUNλ is the mean solar exo-atmospheric spectral irradiance (mW/cm²/µm) at an Earth–Sun distance of 1 astronomical unit (AU). θS is the solar zenith angle (~ 67.337461° from the product metadata), and Lλ is the spectral radiance for band λ at the sensor aperture (mW/cm²/µm/sr).
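Eqs. 4 and 5 can be sketched as follows. The sketch assumes that JD is the day of the year (117 for April 27, 2018) and that all angles are in degrees. The band gains and ESUN values must come from the product metadata, so the numbers in the example lines are hypothetical.

```python
import math

def earth_sun_distance(doy: int) -> float:
    """Earth-Sun distance in AU: d = 1 - 0.01674 cos(0.9856 (JD - 4)), argument in degrees."""
    return 1.0 - 0.01674 * math.cos(math.radians(0.9856 * (doy - 4)))

def dn_to_radiance(dn: float, gain: float, offset: float = 0.0) -> float:
    """Eq. 4: L = gain * DN + offset (offset is zero for the three LISS IV bands)."""
    return gain * dn + offset

def planetary_reflectance(radiance: float, esun: float, doy: int,
                          sun_zenith_deg: float) -> float:
    """Eq. 5: rho_P = pi * L * d^2 / (ESUN * cos(theta_S))."""
    d = earth_sun_distance(doy)
    return math.pi * radiance * d ** 2 / (esun * math.cos(math.radians(sun_zenith_deg)))

# Hypothetical DN, gain and ESUN values; the zenith angle is the one quoted in the text.
L_example = dn_to_radiance(dn=120, gain=0.05)
rho_example = planetary_reflectance(L_example, esun=180.0, doy=117,
                                    sun_zenith_deg=67.337461)
```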

Fig. 2 Subset image of the study area with sampling locations along the river stretch

2.4 Modeling approaches

2.4.1 Multiple linear regression (MLR)

MLR predicts an unknown (dependent) variable from two or more known variables, termed predictors; that is, it predicts Y for given values of X1, X2, …, Xk. The multiple regression equation of Y on X1, X2, …, Xk is given by

$$Y = b_{0} + \, b_{1} X_{1} + \, b_{2} X_{2} + \cdots + b_{k} X_{k} ,$$
(6)

where b0 is the intercept and b1, b2, …, bk are the regression coefficients, i.e., the slopes associated with the individual predictors. MLR was used to identify the formulated spectral reflectance band combination that yields WQI estimates with the highest R2 and the lowest RMSE and MAE values.
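A least-squares fit of Eq. 6, together with the three performance indicators, can be sketched with NumPy. The data below are synthetic and noise-free so that the fitted coefficients can be checked. The feature layout [G, R, NIR, G/R] mirrors band combination case no. 5 only for illustration. R2 is computed here as 1 − SSres/SStot. This is an assumption, since the section does not state whether R2 is that quantity or the squared correlation between observed and estimated values.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares estimate of [b0, b1, ..., bk] in Eq. 6."""
    A = np.column_stack([np.ones(len(X)), X])   # column of ones carries the intercept b0
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    return coef[0] + X @ coef[1:]

def r2_rmse_mae(obs, est):
    """Coefficient of determination, root mean square error and mean absolute error."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    resid = obs - est
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((obs - obs.mean()) ** 2)
    return float(r2), float(np.sqrt(np.mean(resid ** 2))), float(np.mean(np.abs(resid)))

# Synthetic reflectances for 32 calibration points (hypothetical, not field data).
rng = np.random.default_rng(0)
g, r, nir = rng.uniform(0.05, 0.20, size=(3, 32))
X = np.column_stack([g, r, nir, g / r])
true_coef = np.array([150.0, 300.0, -50.0, 80.0, 10.0])
y = true_coef[0] + X @ true_coef[1:]

coef = fit_mlr(X, y)
r2, rmse, mae = r2_rmse_mae(y, predict_mlr(coef, X))
```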

2.4.2 Artificial neural network (ANN)

The feed-forward backpropagation neural network (FF-BPNN) algorithm uses gradient descent to search weight space for the minimum of the error function. Learning starts from randomly initialized weights, which are adjusted iteratively to reduce the error. Each iteration has two phases. In the first phase, the input data are propagated forward through the network with the current weights to compute the output. In the second phase, the error between the actual and estimated targets is computed. If it exceeds a threshold value, it is propagated backward through the network and the weights are recalculated. The process continues until the minimum error is attained. During training, the errors for both the training and testing data decrease with the number of iterations until a constant minimum is reached. Training is stopped at the point where the difference between the training and testing errors is smallest, to avoid overtraining the network (Said et al., 2008). The most common neural network architecture consists of three layers, i.e., input, hidden and output layers, as illustrated in Fig. 3.

Fig. 3 Neural network architecture with bands/band combinations as input variables and WQI as the target variable

Every unit in a layer is connected to every unit in the adjoining layer by a unique weight. The input variables, multiplied by their connection weights, propagate to every unit of the hidden layer. Each unit sums its weighted inputs together with a threshold (bias), passes the sum through the sigmoid activation function, and forwards the result to the next layer. The output of the jth unit of layer m is

$$O_{j}^{m} = S\left(l_{j}^{m}\right),$$
(7)

where S is the sigmoid activation function (Rumelhart et al., 1986), which has the form

$$f\left( x \right) = \frac{1}{{1 + {\text{exp}}\left( { - x} \right)}}.$$
(8)

The function f(x) takes values between zero and one over the entire range of inputs. Here the input x is the net input \(l_{j}^{m}\) to the jth unit of layer m, given by

$$l_{j}^{m} = \sum_{i} O_{i}^{m - 1} w_{ij} + b_{j}^{m},$$
(9)

where \(b_{j}^{m}\) is the threshold (bias) of the jth unit of layer m. \(O_{i}^{m - 1}\) is the output of the ith unit of layer m − 1, and \(w_{ij}\) is the weight of the connection between the ith unit of layer m − 1 and the jth unit of layer m. The error function is expressed as

$$E = 0.5\mathop \sum \limits_{k} \left( {T_{k} - O_{k} } \right)^{2} ,$$
(10)

where Tk is the desired target value and Ok is the corresponding network output for the kth training sample.
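The forward and backward passes of Eqs. 7 to 10 can be sketched for a single 3-4-1 network in NumPy. This is a simplified illustration under stated assumptions, not the MATLAB TRAINGD setup used in the study. It uses plain batch gradient descent with a fixed learning rate of 0.1, sigmoid units in both the hidden and output layers, and no early stopping. The inputs and targets are synthetic and already scaled into the 0.1 to 0.9 range.

```python
import numpy as np

def sigmoid(x):
    """Eq. 8: f(x) = 1 / (1 + exp(-x)), bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(params, X):
    """Eqs. 7 and 9 applied layer by layer: O_j = S(sum_i O_i w_ij + b_j)."""
    W1, b1, W2, b2 = params
    H = sigmoid(X @ W1 + b1)   # hidden-layer outputs
    O = sigmoid(H @ W2 + b2)   # output-layer value (normalized WQI)
    return H, O

def train_bpnn(X, T, n_hidden=4, lr=0.1, epochs=2000, seed=0):
    """Batch gradient descent on E = 0.5 * sum_k (T_k - O_k)^2 (Eq. 10)."""
    rng = np.random.default_rng(seed)
    params = [rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.uniform(-0.5, 0.5, (n_hidden, 1)), np.zeros(1)]
    T = T.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        H, O = forward(params, X)
        err = O - T
        history.append(0.5 * float(np.sum(err ** 2)))
        # Backpropagate the error; the sigmoid derivative is f(x) * (1 - f(x)).
        d_out = err * O * (1.0 - O)
        d_hid = (d_out @ params[2].T) * H * (1.0 - H)
        params[2] -= lr * H.T @ d_out
        params[3] -= lr * d_out.sum(axis=0)
        params[0] -= lr * X.T @ d_hid
        params[1] -= lr * d_hid.sum(axis=0)
    return params, history

# Synthetic, already-normalized data: 32 samples, 3 inputs (G, R, NIR), 1 target.
rng = np.random.default_rng(1)
X = rng.uniform(0.1, 0.9, size=(32, 3))
T = 0.1 + 0.8 * X.mean(axis=1) ** 2
params, history = train_bpnn(X, T)
```

In this sketch the training error is only tracked. A held-out test set would be needed to reproduce the early-stopping rule described above.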

2.4.3 Gene expression programming (GEP)

GEP, proposed by Ferreira (2001), is an evolutionary technique based on the GP approach developed by Koza (1999), and it is well suited to solving complex nonlinear problems. GEP improves on GA and GP by overcoming premature convergence and by achieving an evolution rate 100 times higher. GEP evolves an initial, randomly generated population of chromosomes of predefined length, each containing one or more genes. Each gene comprises a head and a tail. The head contains both functions and terminals, whereas the tail contains only terminals. For a given problem, the head length h is selected, and the tail length t is then determined from h using Eq. 11:

$$t = h \times \left( {n - 1} \right) + 1,$$
(11)

where n is the maximum number of arguments taken by the functions in the function set. Ferreira (2001) represented the genetic information encoded in a gene as an expression tree (ET). With the unambiguous Karva language, the gene corresponding to a given ET can be read off by simple rules, traversing the tree from top to bottom and from left to right (Li and Wang, 2019). Fig. 4 shows an example of a gene as an ET, for which the equivalent mathematical expression is [(b × a) × (b + a)] + [(a/b) × (b − a)].
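The head/tail rule of Eq. 11 and the level-by-level (Karva) reading of a gene can be illustrated in Python. The K-expression "+***+/-babaabba" below is a hypothetical encoding of the example expression of Fig. 4. It assumes that only the four binary arithmetic functions and the terminals a and b are available. Its head length of 7 gives a tail of 7 × (2 − 1) + 1 = 8 symbols. This is a sketch for illustration, not the GEP software used in the study.

```python
import operator

# Binary functions in the example gene; terminals are looked up in `env`.
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def tail_length(h, n_max):
    """Eq. 11: t = h (n - 1) + 1, with n the largest function arity."""
    return h * (n_max - 1) + 1

def decode_karva(kexpr):
    """Build the expression tree level by level (top to bottom, left to right).
    Returns a list of [symbol, child_indices]; node 0 is the root."""
    nodes = [[kexpr[0], []]]
    nxt = 1                      # next unread symbol of the K-expression
    level = [0]
    while level:
        new_level = []
        for idx in level:
            arity = 2 if nodes[idx][0] in FUNCS else 0
            for _ in range(arity):
                nodes.append([kexpr[nxt], []])
                nodes[idx][1].append(len(nodes) - 1)
                new_level.append(len(nodes) - 1)
                nxt += 1
        level = new_level
    return nodes

def evaluate(nodes, env, idx=0):
    """Evaluate the expression tree recursively, starting from node `idx`."""
    symbol, children = nodes[idx]
    if symbol in FUNCS:
        left, right = (evaluate(nodes, env, c) for c in children)
        return FUNCS[symbol](left, right)
    return env[symbol]

gene = "+***+/-babaabba"          # head "+***+/-" (h = 7), tail "babaabba" (t = 8)
tree = decode_karva(gene)
value = evaluate(tree, {"a": 1.0, "b": 2.0})   # (2*1)*(2+1) + (1/2)*(2-1)
```

With the function set used in this study (maximum arity 2) and h = 14, Eq. 11 gives t = 15, i.e., 29 symbols per gene.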

Fig. 4 An example of a gene expression tree (ET)

The fitness of every chromosome i in the initial population is computed by utilizing the fitness function fi expressed as Eq. 12, proposed by Ferreira (2001).

$$f_{i} = \sum_{j=1}^{C_{t}} \left( M - \left| C_{i,j} - T_{j} \right| \right),$$
(12)

where M is the selection range, Ci,j is the value returned by the ith chromosome for the jth fitness case, Tj is the target value for the jth fitness case, and Ct is the number of fitness cases. For a perfect fit, Ci,j = Tj for every j, and fi = fmax = Ct × M. The fitness values govern the selection of chromosomes for the next generation. The selected chromosomes are then modified by genetic operators such as mutation, inversion, transposition and recombination.
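Eq. 12 can be sketched as below. The selection range M = 100, the targets and the chromosome outputs in the example are hypothetical. The function sums M − |Ci,j − Tj| over the Ct fitness cases, so a perfect chromosome scores Ct × M.

```python
def gep_fitness(predicted, targets, M):
    """Eq. 12: f_i = sum over the C_t fitness cases of (M - |C_ij - T_j|)."""
    if len(predicted) != len(targets):
        raise ValueError("one prediction is needed per fitness case")
    return sum(M - abs(c - t) for c, t in zip(predicted, targets))

targets = [210.0, 225.0, 240.0]             # hypothetical observed WQI values
population = {                              # hypothetical chromosome outputs
    "chromosome_1": [205.0, 225.0, 245.0],
    "chromosome_2": [210.0, 224.0, 240.0],
}
M = 100.0
scores = {name: gep_fitness(out, targets, M) for name, out in population.items()}
best = max(scores, key=scores.get)          # fittest chromosome is favored in selection
f_max = len(targets) * M                    # C_t x M, reached only by a perfect fit
```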

Mutation is the most effective genetic operator. Its rate is the probability that a function or terminal (symbol) is mutated in each generation. Any symbol in a gene head can be replaced by a function or a terminal. In the gene tail, however, terminals can be replaced only by terminals, because functions are not allowed in the tail. Inversion selects random start and end symbols within a gene and reverses the order of the symbols between them. Transposition moves a sequence of symbols from one position to another within a gene, or from one gene to another in the same chromosome. In recombination, two randomly selected chromosomes exchange genetic information to produce two new chromosomes. The process is analogous to the breeding of two biological parents, whose offspring share genetic material from both. Figure 5 presents a flowchart of the generalized GEP model-building process.
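The mutation, inversion, transposition and recombination operators described above can be sketched on single-gene K-expressions. The sketch makes simplifying assumptions that go beyond the text. It uses one gene per chromosome, binary functions only, and one-point recombination. Inversion and the insertion point of transposition are restricted to the head, as in Ferreira's original formulation, so that the tail keeps only terminals. The symbol sets and the example genes are hypothetical.

```python
import random

FUNCTIONS = "+-*/"   # binary functions allowed in the head (illustrative set)
TERMINALS = "ab"     # input variables, the only symbols allowed in the tail

def mutate(gene, h, rate, rng):
    """Point mutation respecting the head/tail rule."""
    out = []
    for pos, sym in enumerate(gene):
        if rng.random() < rate:
            pool = FUNCTIONS + TERMINALS if pos < h else TERMINALS
            sym = rng.choice(pool)
        out.append(sym)
    return "".join(out)

def invert(gene, h, rng):
    """Inversion: reverse a randomly chosen segment of the head."""
    i, j = sorted(rng.sample(range(h), 2))
    return gene[:i] + gene[i:j + 1][::-1] + gene[j + 1:]

def is_transpose(gene, h, rng, max_len=3):
    """IS transposition: copy a short segment into the head after the root;
    head symbols shifted past position h are discarded, the tail is unchanged."""
    length = rng.randint(1, max_len)
    start = rng.randrange(len(gene) - length + 1)
    target = rng.randrange(1, h)
    head = (gene[:target] + gene[start:start + length] + gene[target:h])[:h]
    return head + gene[h:]

def recombine(parent1, parent2, rng):
    """One-point recombination: the parents swap everything after a random cut."""
    cut = rng.randrange(1, len(parent1))
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]

rng = random.Random(42)
h = 7
p1, p2 = "+***+/-babaabba", "-+/*a*bababbaab"
child = mutate(p1, h, rate=0.2, rng=rng)
inverted = invert(p1, h, rng)
transposed = is_transpose(p1, h, rng)
o1, o2 = recombine(p1, p2, rng)
```

Because every operator leaves the tail made of terminals only, each offspring can still be decoded into a valid expression tree.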

Fig. 5 Flowchart illustrating the GEP model-building process

Table 2 lists the 11 spectral reflectance bands/band combinations (including the three single bands, i.e., green, red and near-infrared) formulated to identify the combination most closely related to the observed WQI. As described in the preceding sections, WQI was modeled as a function of the most sensitive band combination with the three approaches, and their performance was compared using R2, RMSE and MAE. Of the 40 samples, 80% (32 samples) were used for training and testing (calibration) and the remaining 20% (8 samples) for validation. Neural network architectures were developed with 2, 3, 4, 5 and 6 spectral bands/band combinations as input variables. The same band combinations were analyzed with MLR and GEP, keeping WQI as the target variable. For the BPNN and GEP analyses, the entire data set was normalized to the range 0.1 to 0.9 using Eq. 13 (Rajurkar et al., 2004). This kept all variables proportionally scaled within the 0 to 1 interval.

$$X_{{{\text{norm}}}} = 0.1 + 0.8 \times \left( {\frac{{X_{i} }}{{X_{{{\text{max}}}} }}} \right),$$
(13)

where \(X_{\text{norm}}\) is the normalized, unitless variable; Xi is the observed variable; and Xmax is the maximum value of the variable in the data set. The optimal number of neurons in the hidden layer was determined by trial and error, and the BPNN learning rate was varied gradually within the range 0.01 to 0.5. The final learning rates and optimal numbers of hidden neurons obtained by this trial process are provided in Table 2.
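Eq. 13 and the 80/20 partition can be sketched as follows. Because Eq. 13 divides by the maximum value, positive data are mapped into (0.1, 0.9], and the inverse transform returns normalized model outputs to WQI units. The random permutation and its seed are assumptions, since the section does not say how the 32 calibration and 8 validation samples were selected.

```python
import numpy as np

def normalize(x):
    """Eq. 13: X_norm = 0.1 + 0.8 * X_i / X_max."""
    x = np.asarray(x, dtype=float)
    return 0.1 + 0.8 * x / x.max()

def denormalize(x_norm, x_max):
    """Invert Eq. 13 to express normalized model outputs in original units."""
    return (np.asarray(x_norm, dtype=float) - 0.1) / 0.8 * x_max

def split_80_20(n_samples, seed=0):
    """Randomly partition sample indices into 80% calibration and 20% validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_cal = int(round(0.8 * n_samples))
    return idx[:n_cal], idx[n_cal:]

wqi = np.array([203.7, 231.5, 262.33])   # reported extremes; 231.5 is hypothetical
wqi_norm = normalize(wqi)
cal_idx, val_idx = split_80_20(40)
```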

Table 2 Formulated band combinations with details of BPNN architectures

For the optimal GEP model, after many trials, the number of chromosomes (population size) was set to 50, the gene head length to 14, and the number of genes per chromosome to 8. Seven function operators, i.e., +, −, ×, ÷, 1/a, −a and a², were adopted to keep the GEP model simple and to reduce both the number of iterations and the occurrence of nonconvergence. The sub-ETs were linked by an addition function. The parameters adopted for the optimal GEP model are listed in Table 3.

Table 3 Parameters adopted for the optimal GEP model

3 Results

The in situ water samples were analyzed for 11 WQPs in the laboratory, and their descriptive statistics are summarized in Table 4. The physicochemical and heavy metal concentrations far exceeded the permissible limits prescribed in the BIS specifications, although neither Cr nor Mn was detected in any of the samples. WQI values were computed from nine WQPs (excluding Cr and Mn) for the 40 water samples collected along the Kali River stretch. They ranged from 203.7 to 262.33 and fall in the “very poor” category of the BIS criteria given in Table 1. This WQI range indicates that use of the river water is restricted for almost all purposes, including irrigation. The WQI estimates from the three approaches, i.e., MLR, BPNN and GEP, are summarized in Table 5.

Table 4 Descriptive statistics of the measured WQPs
Table 5 Coefficient of determination (R2), RMSE and MAE between the observed and estimated WQIs from three approaches

The MLR results indicate that, of the 11 band combination cases analyzed, the combination of four bands, i.e., G, R, NIR and G/R (band combination case no. 5), correlated most strongly with the observed WQI. For the calibration data, it yielded R2 ~ 0.81 and low RMSE and MAE values (4.36 and 4.64, respectively). However, the same band combination yielded WQI estimates with R2 ~ 0.6 and relatively high RMSE and MAE values (6.3 and 4.64) for the validation data. The regression coefficients for this band combination are provided in Table 6, and the resulting regression equation is given as Eq. 14. The scatter plot of observed versus estimated WQI for the calibration and validation data (Fig. 7a) shows estimated WQI values within the ± 20% error lines. The regression equation for the most significant band combination was used to generate the spatially distributed WQI map of the river segment.

$${\text{WQI}} = - 183.98 + \left( {2309.744 \times {\text{GREEN}}} \right) + \left( {297.18 \times {\text{RED}}} \right) + \left( {200.93 \times {\text{NIR}}} \right) + \left( {35.84 \times \frac{{{\text{GREEN}}}}{{{\text{RED}}}}} \right)$$
(14)
Table 6 MLR coefficients for calibration data for the most appropriate band combination
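Because Eq. 14 is an explicit formula, applying it to every water pixel of the reflectance subset is a single vectorized operation, which is the kind of step involved in generating the WQI map. The sketch below assumes that the inputs are reflectance arrays on the same scale as the calibration data, with non-water pixels set to NaN beforehand. The 2 × 2 arrays are hypothetical.

```python
import numpy as np

def wqi_mlr(green, red, nir):
    """Eq. 14 evaluated element-wise on reflectance arrays of equal shape."""
    green, red, nir = (np.asarray(band, dtype=float) for band in (green, red, nir))
    return (-183.98 + 2309.744 * green + 297.18 * red
            + 200.93 * nir + 35.84 * green / red)

# Hypothetical 2 x 2 reflectance subset; NaN marks a masked (non-water) pixel.
green = np.array([[0.10, 0.11], [0.12, np.nan]])
red = np.array([[0.10, 0.09], [0.11, np.nan]])
nir = np.array([[0.10, 0.08], [0.07, np.nan]])
wqi_map = wqi_mlr(green, red, nir)
```

Masked pixels stay NaN in the output, so they can be left blank when the map is rendered in the GIS.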

Neural network architectures for all band combinations were trained using the TRAINGD function and the FF-BPNN algorithm. Optimal architectures were obtained during training by varying the number of hidden-layer neurons from 2 to 10 and the learning rate within the range 0.001 to 0.5. Architectures trained with 3, 4 and 6 hidden neurons yielded considerably better WQI estimates in terms of R2, RMSE and MAE (Table 6).

The results further reveal that the architecture trained with three input bands, i.e., G, R and NIR, and four hidden neurons (i.e., 3-4-1) produced the most accurate WQI estimates of all combinations. It yielded R2 ~ 0.95 and 0.87, RMSE of 2.36 and 4.48, and MAE of 2.15 and 3.61 for the calibration and validation data, respectively. The scatter plot of observed versus estimated WQI (Fig. 7b) shows WQI estimates within the ± 10% error lines. Almost all architectures with the different band combinations also yielded reasonably accurate WQI estimates for the calibration data, with R2 ranging from 0.79 to 0.92. Table 7 lists the final weight matrix of the optimal architecture (i.e., 3-4-1).

Table 7 Final weight matrix of the trained BPNN model with 3 input variables

The optimal GEP model was obtained through many trials (Table 6). It has a chromosomal architecture of 50 chromosomes, a head length of 14 and 8 genes, with four spectral inputs, viz. G, R, NIR and G/R (band combination case no. 5). The optimized GEP model produced highly accurate WQI estimates, yielding R2 ~ 0.94 and 0.91, RMSE of 2.49 and 4.45, and MAE of 2.16 and 3.53 for the calibration and validation data, respectively. On the validation data, the GEP model therefore performed better than the BPNN and MLR models, most clearly in terms of R2, which indicates that the optimized GEP model generalizes well. The optimal GEP model consists of four sub-expression trees (sub-ET1, sub-ET2, sub-ET3 and sub-ET4), developed according to the input variables and function operators selected during model building. The sub-ETs were linked by an addition function to form the final mathematical expression. This expression was further simplified into a more general form for estimating WQI (Eq. 15). The sub-ETs are shown in Fig. 6. The scatter plot of observed versus estimated WQI (Fig. 7c) shows estimated WQI values within the ± 10% error lines.

$$\begin{aligned} \text{WQI} = {} & \frac{1.86 \times 10^{-3}}{1.31 \times 10^{-1} - \text{GREEN}} && \text{(Sub-ET 1)} \\ & + \text{RED}\left(477.4 - \frac{\text{GREEN}}{\text{RED}}\right) && \text{(Sub-ET 2)} \\ & + \left(426.2\,\text{NIR} - \text{GREEN}\right) && \text{(Sub-ET 3)} \\ & + \left(130.5\,\frac{\text{GREEN}}{\text{RED}} - 27.78\right) && \text{(Sub-ET 4)} \end{aligned}$$
(15)
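Eq. 15 can be evaluated term by term, which also shows the contribution of each sub-ET. The sketch evaluates the expression exactly as printed. This section does not state whether its inputs are raw or normalized (Eq. 13) reflectances, so the example values are hypothetical. Sub-ET 1 has a pole at GREEN = 0.131, so inputs near that value produce very large terms.

```python
import numpy as np

def wqi_gep(green, red, nir, return_terms=False):
    """Eq. 15: sum of the four sub-ET terms of the optimal GEP model."""
    green, red, nir = (np.asarray(band, dtype=float) for band in (green, red, nir))
    terms = {
        "sub_et1": 1.86e-3 / (1.31e-1 - green),
        "sub_et2": red * (477.4 - green / red),
        "sub_et3": 426.2 * nir - green,
        "sub_et4": 130.5 * green / red - 27.78,
    }
    total = sum(terms.values())
    return (total, terms) if return_terms else total

total, terms = wqi_gep(0.10, 0.10, 0.10, return_terms=True)
```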
Fig. 6 Expression trees for the optimal GEP model with 4 spectral bands

Fig. 7 Scatter plots of observed versus estimated WQI: (a) MLR approach, band combination 5 (4 inputs); (b) BPNN approach, band combination 4 (3 inputs); (c) GEP approach, band combination 5 (4 inputs)

4 Discussion

The severe contamination of the Kali River stretch, assessed through laboratory analysis of 11 WQPs (physicochemical parameters and heavy metals) as well as WQI estimates, is mainly attributed to the unrestricted disposal of toxic waste from numerous small- and large-scale industries. Although several studies on Kali River water quality have reported similar outcomes (CPCB, 2012; Mishra et al., 2015; Singh et al., 2020; Sirohi et al., 2014), comprehensive monitoring of WQI levels using spectral band combinations has been lacking. Results from the three approaches further reveal that GEP outperforms the other two approaches in estimating WQI for the validation data (R² ~ 0.91, 0.87 and 0.60; RMSE ~ 4.45, 4.48 and 6.30 for GEP, ANN and MLR, respectively), suggesting that it has greater explanatory power. Moreover, the GEP approach is simple, produces reliable WQI estimates, and saves substantial time and effort by optimizing the computations to generate simplified prediction expressions. Several researchers have recommended this technique (Hashmi et al., 2011; Mohammadpour et al., 2016; Liu & Wang, 2019) for evaluating the water quality of wetlands and other surface water bodies. By contrast, the ANN approach is relatively time-consuming and does not furnish explicit governing equations for the optimized models, which is considered one of its major disadvantages. The WQI estimates from the MLR model were insufficiently accurate on the validation data because the approach fits a linear relationship by least squares and therefore cannot capture nonlinear relations between reflectance and WQI. However, MLR remains practical owing to its fast prediction. Figure 8 shows comparative line plots of the estimated WQI for the calibration and validation data, along with the observed WQI.
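The accuracy measures compared above, and the least-squares fit that limits MLR, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: R² is taken as the coefficient of determination (some studies report the squared Pearson correlation instead), the MLR model is an ordinary least-squares fit with an intercept, and the data are synthetic placeholders, not the study's 40 samples.

```python
import numpy as np


def r2(obs, est):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs, est = np.asarray(obs, dtype=float), np.asarray(est, dtype=float)
    ss_res = np.sum((obs - est) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(obs, est):
    """Root mean square error."""
    diff = np.asarray(obs, dtype=float) - np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(obs, est):
    """Mean absolute error."""
    diff = np.asarray(obs, dtype=float) - np.asarray(est, dtype=float)
    return float(np.mean(np.abs(diff)))


def fit_mlr(X, y):
    """Ordinary least squares for y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted MLR coefficients (intercept first) to new inputs."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Synthetic example with the four inputs of band combination 5 (G, R, NIR, G/R).
rng = np.random.default_rng(0)
g, r, nir = rng.uniform(0.05, 0.15, size=(3, 40))
X = np.column_stack([g, r, nir, g / r])
# Hypothetical WQI with a nonlinear term that a linear model cannot represent.
y = 230.0 + 40.0 * nir + 5.0 * np.sin(20.0 * g) + rng.normal(0.0, 2.0, 40)

coef = fit_mlr(X[:30], y[:30])    # hypothetical calibration split
pred = predict_mlr(coef, X[30:])  # hypothetical validation split
print(r2(y[30:], pred), rmse(y[30:], pred), mae(y[30:], pred))
```

Because `fit_mlr` is linear in its inputs, any nonlinear component of the target (here the sine term) remains in the residuals, which is the limitation of MLR noted above.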

Fig. 8 Comparative line plot of observed WQI and estimated WQI from the three employed approaches

The contamination levels were consistent throughout the Kali River stretch, which led to a similar spectral distribution of the remotely sensed signal above the water surface. Therefore, the WQI maps created in the GIS framework (Fig. 9) from the three approaches corroborate the actual severity of WQI levels, as exhibited by the darker spectral tones covering the entire length of the river segment. This severe contamination is mainly attributed to the inflow of industrial effluents, agricultural runoff, natural matter and nutrients into the water body (Jindal & Sharma, 2011).
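Producing a WQI map amounts to evaluating the calibrated model at every water pixel of the co-registered band rasters. The sketch below assumes the bands are already loaded as NumPy arrays of equal shape (reading the LISS IV scene itself is outside its scope) and that a boolean water mask delineates the river; `map_wqi` and the stand-in model are hypothetical, and any fitted model, such as Eq. 15, could be passed in their place.

```python
from typing import Callable

import numpy as np


def map_wqi(green, red, nir, water_mask, model: Callable) -> np.ndarray:
    """Apply a per-pixel WQI model to co-registered rasters.

    Non-water pixels are set to NaN so that they stay blank in the map.
    """
    green, red, nir = (np.asarray(b, dtype=float) for b in (green, red, nir))
    mask = np.asarray(water_mask, dtype=bool)
    if not (green.shape == red.shape == nir.shape == mask.shape):
        raise ValueError("band rasters and mask must share one shape")
    wqi = np.full(mask.shape, np.nan)
    wqi[mask] = model(green[mask], red[mask], nir[mask])  # evaluate water pixels only
    return wqi


def stand_in_model(g, r, n):
    """Hypothetical linear model used only to exercise map_wqi."""
    return 200.0 + 150.0 * n + 10.0 * g / r


# Tiny 3 x 3 hypothetical scene; the middle column is river water.
green = np.full((3, 3), 0.10)
red = np.full((3, 3), 0.08)
nir = np.linspace(0.1, 0.3, 9).reshape(3, 3)
mask = np.zeros((3, 3), dtype=bool)
mask[:, 1] = True

wqi_map = map_wqi(green, red, nir, mask, stand_in_model)
print(np.nanmin(wqi_map), np.nanmax(wqi_map))
```

Keeping non-water pixels as NaN lets the resulting array be written to a raster with a no-data value, so only the river stretch is shaded in the final map.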

Fig. 9 WQI maps of the river stretch generated from (a) MLR, (b) BPNN and (c) GEP analysis

A one-way analysis of variance (ANOVA) was applied to further compare the WQI estimates from the three approaches. The null hypothesis H0 stated that there is no significant difference between the mean WQI estimates from the three approaches, whereas the alternative hypothesis Ha stated that the mean WQI estimate of at least one approach differs significantly. The test yielded F = 0.01 with p = 0.994 (Table 8), a p-value far above the significance level α = 0.05, implying no significant differences among the mean WQI estimates. The null hypothesis therefore cannot be rejected, i.e., the differences between the WQI estimates from the three approaches are not statistically significant. The data may, however, still be consistent with differences of practical importance. Moreover, failing to reject the null hypothesis does not imply that no difference exists; rather, a larger sample size could reveal one, since larger samples increase the power of a hypothesis test to detect statistically significant effects. Further, the box plots of WQI estimates shown in Fig. 10 were analyzed to visually summarize and compare the results. The medians of the three box plots lie at nearly the same level (233.23 for GEP, 232.94 for ANN and 231.96 for MLR), suggesting no appreciable difference between the three groups of estimated WQI. The position of the median line within each box further indicates a symmetric distribution, with no right or left skewness in any of the three WQI groups. Upon comparing the interquartile ranges, the relatively longer box for MLR revealed slightly greater dispersion in its WQI estimates.
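The one-way ANOVA and box-plot summaries described above can be reproduced with the sketch below. It is written in pure Python so that it is self-contained; in practice `scipy.stats.f_oneway` returns the same F-statistic and p-value, and `scipy.stats.levene` can check the equal-variance assumption on which one-way ANOVA relies. The p-value is the upper tail of the F distribution, P(F > f) = I_x(d_w/2, d_b/2) with x = d_w/(d_w + d_b f), evaluated here through the continued-fraction form of the regularized incomplete beta function. The group values are hypothetical.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple

_TINY = 1e-300


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, float]:
    """Return (F, p) for H0: all group means are equal."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [statistics.fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_b, df_w = k - 1, n - k
    f_stat = (ss_between / df_b) / (ss_within / df_w)
    p_value = _reg_inc_beta(df_w / 2.0, df_b / 2.0, df_w / (df_w + df_b * f_stat))
    return f_stat, p_value


def box_summary(values: Sequence[float]) -> Dict[str, float]:
    """Median and interquartile range, as read from a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"median": median, "iqr": q3 - q1}


# Hypothetical WQI estimates from three approaches at the same sampling points.
gep = [231.0, 233.5, 235.2, 229.8, 236.1]
ann = [230.6, 233.9, 234.7, 230.2, 235.4]
mlr = [228.9, 234.8, 236.9, 227.5, 237.0]
f_stat, p_value = one_way_anova(gep, ann, mlr)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, reject H0 at 0.05: {p_value < 0.05}")
for name, vals in (("GEP", gep), ("ANN", ann), ("MLR", mlr)):
    print(name, box_summary(vals))
```

With only five hypothetical values per group the test has little power, which illustrates the sample-size caveat discussed above.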

Table 8 One-way ANOVA test for WQI estimates from the three employed approaches
Fig. 10 Boxplot of WQI estimates from the three employed approaches

Overall, the comparison of results indicates that GEP is superior to the MLR and ANN approaches. Furthermore, despite the limited spectral resolution of the IRS P6 LISS IV sensor (only three bands), a combination of 4 inputs (G, R, NIR and the G/R ratio) was identified as the most effective for modeling WQI levels with the GEP approach. The adopted methodology and the generated WQI maps can greatly assist decision makers in imposing corrective conservation measures to improve the water quality of the Kali River, so that the river may regain its historical importance. Moreover, the methodology can be applied to other contaminated surface water bodies to test and generalize the predictive ability of the GEP model.

5 Conclusions

The present study evaluates WQI levels along a 6-km-long segment of the Kali River using three approaches, namely MLR, BPNN and GEP, with spectral reflectance values from a high-resolution IRS P6 LISS IV image. Water samples were collected from 40 random locations along the river stretch and analyzed for seven physicochemical parameters and four heavy metals (i.e., 11 WQPs in total). All measured WQP concentrations exceeded the permissible limits of the BIS specifications, except for Cr and Mn, which were absent from the water samples. Further, the WQI values computed from nine WQPs ranged between 203.7 and 262.33, designating the river as unfit for all purposes. Eleven spectral reflectance band combinations (including the three inherent single bands) were examined to identify the band or band combination most sensitive to the observed WQI. The results revealed that the GEP approach outperformed both the BPNN and MLR approaches with considerably high WQI retrieval accuracy, yielding R² of ~0.94 and 0.91, RMSE of 2.49 and 4.45 and MAE of 2.16 and 3.53 for the calibration and validation data, respectively. Both the GEP and MLR approaches identified the combination of 4 spectral inputs (G, R, NIR and G/R) as the most significant for estimating WQI levels, whereas BPNN identified the 3-band combination (G, R, NIR) as the most significant. The results also suggest that machine learning approaches, viz. ANN and GEP, hold promising potential for water quality monitoring with spectral band combinations, with GEP proving superior. The ANOVA test revealed no statistically significant difference among the WQI estimates from the three approaches at a significance level of 0.05, which was attributed to the small number of river water samples.
The spatial distribution maps of WQI levels exhibited uniform spectral tones along the entire river stretch, signifying severe pollution throughout the river water. The study shows the river condition to be extremely critical, requiring the immediate attention of the decision makers responsible for its reclamation. Future research could focus on hyperspectral satellite data together with integrated approaches such as fuzzy optimal models, GP, support vector machines (SVM) and RBF, using a larger number of water samples.