Introduction

River water quality (RWQ) is a sensitive and fundamental issue in many nations. There is a great need to evaluate and describe the importance of RWQ for drinking, bathing, wildlife, fisheries, irrigation, and industrial usage. Such investigations use water quality (WQ) data to demonstrate the impact of natural factors on surface water and thereby characterize its quality (Ewaid et al. 2018; Sotomayor et al. 2018; Bhatti et al. 2019). River water is still used for domestic and industrial purposes (Fathi et al. 2018). Under normal conditions, the water quality of a river is affected by several variables, i.e., geography, topography, climate, population, and anthropogenic factors. Other human interferences include the construction of dams and reservoirs, channelization, industrialization, urban sprawl, and land use changes throughout the river basin (Wang et al. 2013; Zhang et al. 2019). Anthropogenic activities and natural processes degrade river water and impede its use for agriculture, drinking, recreation, and other purposes (Mukate et al. 2019; Verma et al. 2019).

Therefore, a WQ monitoring system is a fundamental requirement for water resources. The water quality index (WQI) approach has been applied worldwide for assessing the WQ of surface and ground water sources over the last few decades (Effendi 2016; Bhutiani et al. 2016; Bora and Goswami 2017; Ewaid et al. 2018; Verma et al. 2019). The principal motivation for developing a WQI is to transform a complex set of WQ data into clear and usable information by which a layperson can identify the status of the water source (Abbasi and Abbasi 2012; Şener et al. 2017; Ewaid et al. 2018; Verma et al. 2019). A WQI aims to condense vast datasets (Effendi 2016) and simplifies the interpretation of WQ for several purposes such as drinking, irrigation, and aquaculture (Abbasi and Abbasi 2012).

The WQI is still useful for describing the quality of a river basin for low-cost water quality management (Wu et al. 2018; Tian et al. 2019; Tripathi and Singal 2019; Banda and Kumarasamy 2020). Several indices and modelling approaches have been introduced in recent years to evaluate the status of RWQ (Sahoo and Jha 2013; Effendi et al. 2015; Effendi 2016; Bora and Goswami 2017; Şener et al. 2017; Ewaid et al. 2018; Kadam et al. 2019; Nayak et al. 2020). Effendi (2016) applied the pollution index and WQI to evaluate the WQ of the river Ciambulawung. Goher et al. (2014) used the weighted arithmetic WQI method to evaluate the WQ of the Ismailia Canal for drinking, irrigation, and aquatic life. Şener et al. (2017) assessed the WQ of the river Aksu, Turkey, using the WQI and GIS methods. Verma et al. (2019) developed simplified WQIs for the assessment of spatial and temporal variations in the WQ of the river Damodar, India. Chen et al. (2019) employed monthly river pollution index distributions on the highly urbanized Danshui River basin for sustainable and recreational management.

Owing to the difficulties in dealing with the complexities involved in the WQI approach, there is a strong need for a more straightforward and precise modelling procedure for predicting the RWQ (Li et al. 2016; Rajaee et al. 2018; Leong et al. 2019). Therefore, some researchers have used simple and direct statistical and soft computing approaches, e.g., regression models, to establish a strong relationship between the dependent and independent variables. In recent years, regression models have been effectively employed in the domain of water resources for modelling a wide range of hydrological processes, e.g., ground water level (Sahoo et al. 2015; Kommineni et al. 2020), water temperature (Rehana 2019), stream flows (Adnan et al. 2017, 2020), evapotranspiration (Tabari et al. 2012; Kundu et al. 2017), flood prediction (Mosavi et al. 2018; Bafitlhile and Li 2019), and rainfall-runoff (Granata et al. 2016; Sedighi et al. 2016).

Previous studies mainly focused on the development of the WQI for drinking purposes only (Şener et al. 2017; Barakat et al. 2018; Ewaid et al. 2018; Tang et al. 2019). These studies evaluated the suitability of the river water without considering the aptness of the river stretches for other beneficial purposes, i.e., OB, WF and IIW. Moreover, the river pollution indices proposed in previous studies (Wang et al. 2013; Hoseinzadeh et al. 2015; Alphayo and Sharma 2018) considered very few river water quality parameters (RWQPs), i.e., ammonia nitrogen (NH3-N), dissolved oxygen (DO), biochemical oxygen demand (BOD5), and suspended solids (SS). However, given the wide spectrum of river water utilization, there was a need to reconsider the RWQ classification in the light of other essential RWQPs and their potential contribution to river pollution. High concentrations of fluoride (F), nitrate (NO3), sulfate (SO42−), total coliform (TC), and heavy metals are harmful to both humans and wildlife (Tchounwou et al. 2012). Chloride (Cl), electrical conductivity (EC), total dissolved solids (TDS), sodium adsorption ratio (SAR), and pH are the critical parameters for irrigation and industrial usage (Zahedi 2017), and high values corrode metals and affect the taste of food products. For wildlife and fisheries, pH, EC, free ammonia (FA), and DO also play a very significant role. High values of pH, EC, and FA kill fish and decrease species diversity. However, a high concentration of DO is desirable for the healthy survival of aquatic life and indicates good health of the river (EPA 2012). In view of the importance of specific parameters and the limitations of the previous studies, the present study incorporated the above-mentioned critical RWQPs while deciding the suitability of RWQ for different purposes, i.e., DD, OB, DCD, WF, and IIW, because the criteria of the RWQPs vary for different uses (Leong et al. 2019).
Moreover, a strong need for a straightforward and precise modelling procedure was observed for predicting the RWQ (Li et al. 2016; Rajaee et al. 2018; Leong et al. 2019). Therefore, the multiple linear regression (MLR) and support vector regression (SVR) modelling techniques were also employed to simplify the complex calculations involved in the enhanced river pollution index (ERPI) models.

The purpose of this study was to develop cost-effective rapid models to evaluate RWQ by considering the specific RWQPs. The principal targets of this study were (i) to develop and evaluate the ERPI model to investigate the RWQ, (ii) to classify the RWQ for different usage, (iii) to develop MLR and SVR models against the ERPI models as reference for the RWQ modelling, and (iv) to compare the performance of the MLR and SVR models.

Methodology

Study area

The study area encompasses the river Damodar, situated in the Damodar river basin (DRB), India (Fig. 1). The river originates from the Khamerpet hill, flows through Jharkhand, and meets the river Hoogli in West Bengal. It is a shallow, wide, and flashy rain-fed river. The total length and the catchment area of this river are approximately 541 km and 23,170 sq. km, respectively. It traverses the steep slope of the pat region in its upper reaches and descends onto the gneissic flat plain of Chandwa, where the flow of the river becomes sluggish over the flat-topped surface. The mean discharge and annual runoff were observed as 296 m3/s and 486 mm/year at Rhondia station. The physiography of the upper catchment of the DRB is quite different from that of the lower part, as different rock types, i.e., igneous, sedimentary, and metamorphic rocks, are found across different geological time scales. The DRB is rich in coal resources. It falls within dry and subhumid climatic zones and usually experiences a very hot and dry summer. The average temperature is 30 °C, and it rises to 48 °C during May–July. Winter is cold, with temperatures as low as 2 °C. The average annual precipitation is around 1350 mm. More than 80% of the total rainfall occurs during the monsoon season between June and September.

Fig. 1 Geographical location of the study area

This river is not only a source of drinking water but also meets the water demand of irrigation and industrial activities in the region. The industrial activities consist of six steel power stations, four thermal power plants, and three hydroelectric power stations (Kumar et al. 2019; Verma et al. 2019). These industries influence the hydrological regime of the river by withdrawing large volumes of water for their associated activities. They also discharge a substantial amount of effluent containing pollutants, e.g., heavy metals, fly ash, coal dust, and suspended solids, directly into the river, which deteriorates the RWQ. Besides industrial activities, urbanization and heavy encroachment on the river banks affect both the quality and the quantity of the river water (Mukherjee et al. 2012; Haldar et al. 2014; Verma et al. 2019).

River water sampling

Sampling was performed over the period 2017–2019 during the premonsoon, monsoon, and postmonsoon seasons at the selected monitoring locations on the river stretch. Three samples were collected from each location three times per season, and their average was reported. Twenty monitoring locations along the river Damodar stretch were carefully selected following the guidelines for water quality monitoring given by the Central Pollution Control Board (CPCB), India (CPCB 2007). All water samples were stored in an insulated cool box together with cold packs and sent to the laboratory. At the laboratory, the water samples were immediately transferred to a refrigerator until further analysis.

Analytical methods

Fourteen RWQPs, i.e., pH, EC, BOD5, DO, TDS, Cl, TC, SAR, NO3, SO42−, F, FA, Fe, and Pb were analyzed by considering the standard methods prescribed in the guidelines, published by American Public Health Association (Baird et al. 2017). pH and EC were recorded in-situ using a pH meter (Hanna® HI98107) and conductivity meter (HACH® HQ40D multiparameter), respectively. BOD5 was assayed by using the 5-day BOD test. DO was determined using the Winkler method. Estimation of TDS was done by gravimetric analysis. Cl was measured using the argentometric method. TC was examined by the multiple tube fermentation method. Sodium and calcium were estimated using the flame photometer (Systronics Flame Photometer 128), while magnesium was evaluated by EDTA titrimetric method to calculate the SAR. NO3, SO42−, F, FA, Fe, and Pb were estimated by using an ultraviolet spectrophotometer (MOTRAS Scientific UV-Visible spectrophotometer).

Analytical quality assurance and quality control

As natural variability is a fundamental feature of a river and cannot be controlled, triplicate river water samples were collected during sampling to quantify this variability. The analytical data quality and accuracy were ensured through careful standardization, by preparing and analyzing a reference water sample to determine the presence of any interference. For precision of measurement, each river water sample was analyzed in triplicate, and the average was taken as the final value. The instrument was recalibrated when the relative percent difference (RPD) between two river water samples exceeded ±5%. Moreover, analytical grade chemical reagents were used throughout the analysis of the RWQPs. The representativeness of the samples was controlled by selecting appropriate locations and times for river water sampling.

Data processing

For the regression models, the dataset of dependent and independent variables was normalized to a fixed range between 0 and 1, so that all variables were on a uniform scale. The dataset was then split into a training set (70%) and a testing set (30%) (Bozorg-Haddad et al. 2017). The models were developed using the training set and then validated on the testing set. The performance of the models was evaluated using the statistical metrics RMSE, R2, and MAE. The program codes were written in the R language using RStudio Desktop version 1.3.
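The preprocessing described above can be illustrated with a minimal sketch. The study itself used R; Python is used here only for illustration, and the function names, the shuffling step, and the random seed are assumptions rather than details reported in the study.

```python
import random

def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows (assumed step) and split them 70/30 by default."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    print(min_max_normalize([2.0, 4.0, 6.0]))
    train, test = train_test_split(list(range(10)))
    print(len(train), len(test))
```

Each variable would be normalized column by column before the split; whether the scaling limits were taken from the full dataset or from the training set alone is not stated in the text.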

Enhanced river pollution index model

The enhanced river pollution index (ERPI) model was developed and evaluated for the monitoring and management of the RWQ for specific uses of the river water, i.e., DD, OB, DCD, WF, and IIW, as categorized and described by the CPCB, India (CPCB 1979, 2007; BIS 1982). The ERPI model comprised four essential steps. The first step was the selection of crucial RWQPs according to the particular use of the river water. The second step was to determine the relative weights for the selected RWQPs (Olasoji et al. 2019). The third step was to calculate the subindex for each selected RWQP. In the fourth step, all of the subindices were aggregated to obtain the final value of the ERPI model. The ERPI model is described in Eq. (1).

$$ \mathrm{ERPI}={\sum}_{j=1}^m{SI}_j/{\sum}_{j=1}^m{W}_j $$
(1)

where SIj is the subindex and Wj is the relative weight for the jth (j = 1, 2, 3, …, m) parameter of the river water. The calculations involved in the ERPI model are described in Eqs. (2–5).

$$ {SI}_j={Q}_j\times {W}_j $$
(2)
$$ {Q}_j=\left[\left(\left|{\mathrm{EV}}_j-{IV}_j\right|\right)/\left({\mathrm{SPV}}_j-{IV}_j\right)\right]\times 100 $$
(3)
$$ {W}_j=k/{\mathrm{SPV}}_j $$
(4)
$$ k=1/{\sum}_{j=1}^m\left(1/{\mathrm{SPV}}_{\mathrm{j}}\right) $$
(5)

where, Qj is the quality rating, EVj is the estimated value of parameter in river water sample, IVj is the ideal value of parameter in pure water, SPVj is the standard permissible value for RWQPs, and k is the constant of proportionality. The different categories for the values of ERPI model with respective RWQ were classified in Table 1 (Tyagi et al. 2013; Bora and Goswami 2017; Hussein and Ali 2017; Trikoilidou and Samiotis 2017; Ustaoğlu et al. 2020).
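Equations (1–5) can be traced step by step in a short sketch. The function name and the two-parameter input values are hypothetical and serve only to illustrate the arithmetic; they are not data from the study.

```python
def erpi(ev, iv, spv):
    """Compute the ERPI for one water sample following Eqs. (1)-(5).

    ev  : estimated values of the parameters in the sample
    iv  : ideal values of the parameters in pure water
    spv : standard permissible values of the parameters
    """
    k = 1.0 / sum(1.0 / s for s in spv)                      # Eq. (5)
    weights = [k / s for s in spv]                           # Eq. (4)
    ratings = [abs(e - i) / (s - i) * 100.0
               for e, i, s in zip(ev, iv, spv)]              # Eq. (3)
    subindices = [q * w for q, w in zip(ratings, weights)]   # Eq. (2)
    return sum(subindices) / sum(weights)                    # Eq. (1)

if __name__ == "__main__":
    # Two hypothetical parameters with ideal value 0 and limits 10 and 5.
    print(erpi(ev=[5.0, 5.0], iv=[0.0, 0.0], spv=[10.0, 5.0]))
```

Because the weights in Eq. (4) are scaled by k from Eq. (5), they sum to one, so the parameter with the smaller permissible value receives the larger weight.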

Table 1 Categories for the values of ERPI model

Multiple linear regression

Multiple linear regression (MLR) is a quantitative modelling tool that establishes a linear relationship between two or more independent variables and a dependent variable (Tabari et al. 2012; Kadam et al. 2019). It is expressed in the form of Eq. (6).

$$ y={\alpha}_0+{\alpha}_1{x}_1+{\alpha}_2{x}_2+\dots {\alpha}_m{x}_m+\varepsilon $$
(6)

where y is the dependent variable, α0 is the intercept, α1–αm are the regression coefficients, x1–xm are the independent variables, m is the number of independent variables, and ε is the random error. In this study, the outcomes of the ERPI models and their corresponding RWQPs were used as dependent and independent variables, respectively, to determine the RWQ by estimating the ERPI for different purposes.
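As an illustration of Eq. (6), the coefficients α0–αm can be estimated by ordinary least squares. The sketch below solves the normal equations with Gauss-Jordan elimination, which is a simple stand-in chosen for readability and not the routine used in the study; the function names and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Estimate [alpha_0, alpha_1, ..., alpha_m] by ordinary least squares.

    Solves the normal equations (X'X) a = X'y; an intercept column of
    ones is prepended to X.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = []
    for i in range(p):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(p)]
        rhs = sum(r[i] * t for r, t in zip(rows, y))
        aug.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("independent variables are collinear")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

def predict_mlr(coefs, x):
    """Evaluate Eq. (6) without the error term for one observation x."""
    return coefs[0] + sum(a * v for a, v in zip(coefs[1:], x))

if __name__ == "__main__":
    # Toy data generated from y = 2 + 3*x1 - x2 (hypothetical).
    X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
    y = [2 + 3 * a - b for a, b in X]
    print(fit_mlr(X, y))
```

In the setting of this study, each row of X would hold the normalized RWQPs of one sample and y the corresponding ERPI value.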

Support vector regression

Support vector machine (SVM) is a method derived from statistical learning theory and can be used for both classification and regression problems (Tabari et al. 2012; Ji et al. 2017). Support vector regression (SVR) fits a linear model to the sample dataset after mapping the input vectors through a nonlinear transformation. In SVR, a nonlinear function is learned by a linear learning machine in a kernel-induced feature space. The SVR model is trained on a dataset d = {xi, yi; i = 1, 2, …, n} with input vectors xi and associated targets yi. SVR aims to find a function f(x) that deviates from the target y by at most the error tolerance ε for all the training data (Liu and Lu 2014; Raghavendra and Deka 2014). To fulfil this aim, SVR considers the estimation function shown in Eq. (7).

$$ \mathrm{f}\left(\mathrm{x}\right)=\upomega\ \upvarphi \left(\mathrm{x}\right)+b $$
(7)

where ω is the weight vector, b is the bias, and φ(x) is the nonlinear mapping of the input x into a high-dimensional feature space. These coefficients can be estimated by minimizing the regularized risk function R(f) using Eqs. (8–9).

$$ R(f)=C{\sum}_{i=1}^n{L}_{\varepsilon}\left({y}_i,f\left({x}_i\right)\right)+{\left\Vert \omega \right\Vert}^2/2 $$
(8)
$$ {L}_{\varepsilon}\left({y}_i,f\left({x}_i\right)\right)=\left\{\begin{array}{c}\left|{y}_i-f\left({x}_i\right)\right|-\varepsilon \kern1.5em \left|{y}_i-f\left({x}_i\right)\right|\ge \varepsilon \\ {}0\kern6.25em \mathrm{otherwise}\end{array}\right. $$
(9)

where C is the cost parameter that weights the empirical risk, Lε(yi, f(xi)) is the ε-insensitive loss function, ‖ω‖²/2 is the regularization term based on the Euclidean norm of ω, ε is the tolerated deviation between the actual and predicted values, and n is the number of training samples. The regression problem can thus be defined as a convex optimization problem and solved using the Lagrange function (Raghavendra and Deka 2014). The resulting regression function is shown in Eq. (10).

$$ f(x)={\sum}_{i,j=1}^n\left({\delta}_i-{\delta}_i^{\ast}\right)\left({x}_i,{x}_j\right)+b $$
(10)

where δi and δi* are the Lagrange multipliers. A kernel function is introduced to solve nonlinear problems in the SVR models. This function maps the data into a higher-dimensional feature space. In the feature space, K(xi, xj) is used instead of (xi, xj), so that the SVR model can be expressed as Eq. (11).

$$ f(x)={\sum}_{i,j=1}^n\left({\delta}_i-{\delta}_i^{\ast}\right)K\left({x}_i,{x}_j\right)+b $$
(11)

where K(xi, xj) is the kernel function. In Eq. (11), only the data points with nonzero Lagrange multipliers (the support vectors) contribute to the final SVR model. Finally, the SVR model can be expressed as the regression function given in Eq. (12).

$$ f(x)={\sum}_{k=1}^m\left({\delta}_k-{\delta}_k^{\ast}\right)K\left({x}_k,x\right)+b $$
(12)

where xk is the kth support vector and m is the number of support vectors. The SVR model can be represented as a two-layer network architecture (Fig. 2) in which the weights are nonlinear in the first layer and linear in the second layer.

Fig. 2 Network architecture of SVR model
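The ε-insensitive loss of Eq. (9) and the regression function of Eq. (12) can be written compactly as a sketch. The radial basis kernel is shown in its common form exp(−γ‖u − v‖²); the value of γ, the support vectors, the coefficients (δk − δk*), and the bias below are hypothetical placeholders and not fitted values from the study.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def epsilon_insensitive_loss(y, f, eps):
    """Eq. (9): zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(y - f) - eps)

def svr_predict(x, support_vectors, dual_coefs, bias, kernel=rbf_kernel):
    """Eq. (12): weighted sum of kernel values over the support vectors.

    dual_coefs holds the differences (delta_k - delta_k*).
    """
    return sum(c * kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias

if __name__ == "__main__":
    # Hypothetical trained model with two support vectors.
    svs = [[0.0, 0.0], [1.0, 1.0]]
    coefs = [2.0, -1.0]
    print(svr_predict([0.5, 0.5], svs, coefs, bias=0.5))
    print(epsilon_insensitive_loss(1.0, 1.5, eps=0.1))
```

Replacing `rbf_kernel` with a linear, polynomial, or sigmoid kernel changes only the first layer of the architecture in Fig. 2, while the weighted sum in the second layer stays the same.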

Performance analysis

In this study, root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) were used to compare the performance of the developed models in the estimation of RWQ. These were calculated using Eqs. (13–15), respectively.

$$ \mathrm{RMSE}=\sqrt{\sum_{i=1}^n{\left({Y}_{p_i}-{Y}_{o_i}\right)}^2/n} $$
(13)
$$ {R}^2=1-\left[{\sum}_{i=1}^n{\left({Y}_{p_i}-{Y}_{o_i}\right)}^2/{\sum}_{i=1}^n{\left({Y}_{o_i}-{\overline{Y}}_o\right)}^2\right] $$
(14)
$$ \mathrm{MAE}=\left(1/n\right){\sum}_{i=1}^n\left|{Y}_{p_i}-{Y}_{o_i}\right| $$
(15)

where n is the number of observations, Yo is the observed value, Yp is the predicted value, Ȳo is the mean observed value, and Ȳp is the mean predicted value of the respective ERPI model.
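The three metrics of Eqs. (13–15) translate directly into code. The sketch below is illustrative only; the function names and the sample values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Eq. (13): root mean square error."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Eq. (14): coefficient of determination."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def mae(observed, predicted):
    """Eq. (15): mean absolute error."""
    n = len(observed)
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / n

if __name__ == "__main__":
    # Hypothetical observed (ERPI) and predicted (regression) values.
    obs = [1.0, 2.0, 3.0, 4.0]
    pred = [1.0, 2.0, 3.0, 6.0]
    print(rmse(obs, pred), r_squared(obs, pred), mae(obs, pred))
```

Lower RMSE and MAE and a higher R2 indicate closer agreement between the regression estimates and the ERPI reference values.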

Results and discussion

Characteristics of the river

The overall analytical results of the RWQPs are summarized in Table 2. The results indicated the alkaline nature of the river Damodar. The increase in pH was due to industrial effluent and agricultural runoff (Haldar et al. 2014; Verma et al. 2019). The DO concentration was sufficient for various physiological activities at all but a few locations, because the geological conditions promoted high aeration, which raised the DO level (Mukherjee et al. 2012). The TDS value was high at the sites near small and large industries. The locations where the river received urban waste had high concentrations of BOD5 and Cl. The higher BOD5 level indicated the presence of a greater amount of organic matter available to microorganisms, owing to wastewater discharge along the river stretch (Mukherjee et al. 2012; Tripathi and Singal 2019; Verma et al. 2019). The main contributors of sulfates were mine wastes, sewage treatment plants, industrial discharges, and runoff from agricultural lands (Verma et al. 2019). The maximum concentrations of Pb and Fe were found at the locations near the discharge points of the thermal power station and metal industries. The value of TC was higher than the prescribed limit (BIS 1982) at most of the sites.

Table 2 Statistical summary of the RWQPs

As most of the RWQPs were not normally distributed, Spearman's correlation matrix was used to analyze the correlation between the RWQPs, as shown in Table 3. The results in Table 3 showed a very strong positive correlation between EC and TDS, which indicated the presence of high levels of inorganic salts and organic substances in the river water that may be attributed to domestic, industrial, and agricultural pollution. Moreover, a negative correlation between DO and BOD5 was found, as a high concentration of BOD5 depleted the DO level of the river water. A positive correlation between BOD5 and TC indicated the presence of domestic sources of pollution. These correlations indicated that the overall RWQ was strongly affected by the domestic wastewater sources and the effluents of the coal industries, steel plants, and thermal power stations situated in the Damodar River basin (Mukherjee et al. 2012; Verma et al. 2019).

Table 3 Spearman's correlation matrix of RWQPs

Evaluation of the RWQ using ERPI model

In this study, WQ of the river Damodar was evaluated for DD, OB, DCD, WF, and IIW purposes. The analysis results of all twenty sampling locations were used for RWQ estimation. To assess RWQ for purposes as mentioned earlier, five ERPI models, i.e., ERPIDD, ERPIOB, ERPIDCD, ERPIWF, and ERPIIIW, were developed with different combinations (CPCB 1979; BIS 1982) of the RWQPs (Table 4).

Table 4 Description of categorical ERPI with respective parameters

To calculate the ERPI model values at each sampling location, the relative weights (Wj) for each RWQP were computed according to their relative importance in the overall RWQ for different purposes (Table 5).

Table 5 Relative weight of each input parameter for different ERPI models

The ERPIDD model included eleven RWQPs to estimate the suitability of the river water for DD. The values of the ERPIDD model ranged between 55.252 and 122.590. The ERPIDCD model also encompassed eleven RWQPs, and its values varied from 28.583 to 87.711. The ERPIOB model contained five RWQPs to evaluate the fitness of the river water for OB, and its values ranged between 51.814 and 111.047. The ERPIWF model comprised four RWQPs to evaluate the aptness of the RWQ for WF, and its values ranged between 17.038 and 82.014. The ERPIIIW model involved five RWQPs in estimating the suitability of the river water for IIW, and its values ranged between 10.010 and 68.956. Figure 3 shows the RWQ classification based on the evaluation of the ERPI models for each particular use of the river water.

Fig. 3 River classification based on ERPI model. (a) ERPIDD, (b) ERPIOB, (c) ERPIDCD, (d) ERPIWF, (e) ERPIIIW

Overall, the ERPI was a more sensitive model for different uses of the river water than the previously reported river pollution indices (Sahoo et al. 2015; Alphayo and Sharma 2018), which focused only on drinking purposes and considered fewer RWQPs. The ERPI method was able to overcome the limitations of the earlier approaches (e.g., parameter restriction and redundancy, lack of portability, and the inability to represent specific uses). The ERPI model incorporated only the necessary RWQPs, i.e., eleven for ERPIDD, five for ERPIOB, eleven for ERPIDCD, four for ERPIWF, and five for ERPIIIW. Selecting more specific and essential RWQPs reduced the time and cost involved in the analytical procedures. The developed models integrated the composite influence of different RWQPs for specific purposes, so as to communicate the overall RWQ information to the general public as well as to decision-makers. According to the results for the ERPI models shown in Fig. 3, ERPIDD revealed that no stretch of the river fell under the excellent or good categories (Mukherjee et al. 2012; Singh et al. 2019); the river lay only between the fair and unfit classes for DD. However, the classification improved after conventional treatment, according to ERPIDCD. ERPIOB indicated that most of the river stretch fell under the fair class for OB, while 33% of the river stretch lay between poor and unfit. With respect to WF, ERPIWF classified 16% of the river stretch in the poor class. This was due to the discharge of municipal wastes and effluent from coal washeries, steel plants, and thermal power stations situated near the river bank (Mukherjee et al. 2012; Singh et al. 2019; Verma et al. 2019). However, ERPIIIW categorized the whole stretch of the river within the excellent to good classes for IIW purposes.

Results of MLR model

In the MLR models for DD (MLRDD), OB (MLROB), DCD (MLRDCD), WF (MLRWF), and IIW (MLRIIW), the ERPIDD, ERPIOB, ERPIDCD, ERPIWF, and ERPIIIW variables were defined as the dependent variables. The corresponding RWQPs, as described in Table 4, were taken as the independent variables for the respective models. Independent variables that had little influence on the accuracy of the MLR models were removed to minimize the workload and the overlap of information. The best-suited model was derived by testing numerous combinations of independent variables against the respective dependent variable. The MLR models were developed using the training dataset and then validated on the testing dataset using the ‘lm’ function with the ‘qr’ method in R. The statistical summary of the derived MLR models is given in Table 6.

Table 6 Statistical summary of MLR models

As the results showed, the MLRDD, MLROB, and MLRDCD models had the best performance. In contrast, the MLRWF and MLRIIW models gave poor RWQ estimates, with low R2 and high RMSE and MAE. The RWQ for the respective water usage estimated by the MLR models, together with the RWQ computed using the ERPI models as the benchmark, is shown in Fig. 4. The comparative analysis between the ERPI and the respective MLR models suggested that multiple regression techniques can be an excellent way to predict the ERPI for RWQ. However, this approach requires long-term physicochemical data to determine the parameters of the regression model, which are site- and season-dependent.

Fig. 4 RWQ estimated by the ERPI models and the MLR models in the testing phase for (a) DD, (b) OB, (c) DCD, (d) WF, and (e) IIW

Results of SVR models

In the present study, four SVR models with different kernel functions, i.e., linear (LK), polynomial (PK), radial basis (RK), and sigmoid (SK), were developed for each usage of the river water. Overall, twenty SVR models were established: LK-SVRDD, PK-SVRDD, RK-SVRDD, and SK-SVRDD for DD; LK-SVROB, PK-SVROB, RK-SVROB, and SK-SVROB for OB; LK-SVRDCD, PK-SVRDCD, RK-SVRDCD, and SK-SVRDCD for DCD; LK-SVRWF, PK-SVRWF, RK-SVRWF, and SK-SVRWF for WF; and LK-SVRIIW, PK-SVRIIW, RK-SVRIIW, and SK-SVRIIW for IIW. The input combinations of variables and the respective outputs for the SVR models were the same as those used for the MLR models. Table 7 presents the results of the SVR models for RWQ estimation, showing the performance of the developed SVR models with different kernel functions.

Table 7 Performance of the SVR models for the testing phase

As shown, the SVR models were sensitive to the choice of kernel function (Tabari et al. 2012; Raghavendra and Deka 2014). An appropriate selection of the kernel function allowed the nonseparable RWQ data in the original input space to become separable in the new feature space (Raghavendra and Deka 2014). The comparison of the RWQ values calculated by the ERPI models and the RWQ values predicted by the four different SVR models is shown in Fig. 5. Figure 5 shows that the LK-SVRDD model closely followed the ERPIDD model values of RWQ. The same pattern was obtained by the LK-SVROB and LK-SVRDCD models for the ERPIOB and ERPIDCD values of RWQ, respectively. However, the RK-SVRWF and RK-SVRIIW models proved to be the best models for RWQ estimation for WF and IIW, respectively. Moreover, the PK and RK functions also performed well for the SVRDD, SVROB, and SVRDCD models. These observations were consistent with the RMSE and MAE statistics given in Table 7.

Fig. 5 RWQ values evaluated by the ERPI and SVR models with different kernel functions in the testing phase for (a) DD, (b) OB, (c) DCD, (d) WF, and (e) IIW

Comparison of the MLR and SVR models

Table 8 presents the comparison between the performance of the MLR and SVR models for the testing phase. The robust MLR and SVR models were selected and ranked according to their RMSE values for estimating the RWQ for the respective usage. The results showed that the MLRDD model, with an RMSE of 0.001, could be designated as the best model for RWQ evaluation for DD in the study area. The LK-SVRDD model ranked second with an RMSE of 0.012. The RK-SVRDD model, with an RMSE of 0.028, could be considered the next best model, followed by PK-SVRDD (RMSE = 0.058) and SK-SVRDD (RMSE = 1.031). The same pattern was found for OB (MLROB > LK-SVROB > RK-SVROB > PK-SVROB > SK-SVROB) and DCD (MLRDCD > LK-SVRDCD > RK-SVRDCD > PK-SVRDCD > SK-SVRDCD). The RK-SVRWF, MLRWF, LK-SVRWF, PK-SVRWF, and SK-SVRWF models ranked first to fifth, respectively, for RWQ estimation for WF. For IIW, the models followed the same ranking as for WF (RK-SVRIIW > MLRIIW > LK-SVRIIW > PK-SVRIIW > SK-SVRIIW).

Table 8 Performance comparison of MLR and SVR models for the testing phase

The MLR models were found useful for understanding the association between the dependent ERPI variables and the respective independent RWQPs. However, MLR models are not flexible enough to capture complex associations and perform poorly with nonlinear relationships (Tabari et al. 2012; Rajaee et al. 2018; Keshtegar et al. 2019). In contrast, SVR models have the adaptability and capacity to represent such relationships. Moreover, the training process of the SVR models consistently seeks a globally optimized solution, which avoids the problem of over-fitting. The SVR method selects the support vectors (key vectors) and automatically excludes the nonsupport vectors (nonkey vectors) from the models. This increases the flexibility of the model under noisy conditions. The SVR models achieve high accuracy because of the simultaneous minimization of prediction error and model complexity. The critical limitation of the SVR technique is that it is a black-box, data-driven technique without any physical basis (Bozorg-Haddad et al. 2017; Liu and Lu 2014; Raghavendra and Deka 2014). Moreover, SVR models can only be used when training datasets are available (Ji et al. 2017). Overall, the MLR and SVR models (except the SK-based SVR) showed good agreement with the ERPI models in evaluating the RWQ. The comprehensive ranking of the developed models is shown in Table 9.

Table 9 Summary of the ranking for the developed models

Conclusion

In this study, the ERPI model was proposed to determine the RWQ of different stretches of the river, not only for DD but also for OB, DCD, WF, and IIW. Assessment of the various monitoring locations showed that the river stretch was not excellent for DD, OB, and DCD. Less than 50% of the river stretch was classified in the excellent and good classes for WF. However, the whole river stretch was found suitable for IIW. The ERPIMLR and ERPISVR models developed here can be implemented for estimating the RWQ to simplify the interpretation of the ERPI models. The MLRDD, MLROB, MLRDCD, RK-SVRWF, and RK-SVRIIW models performed well in evaluating the RWQ for the respective usage. The findings of this case study offer basic guidance to water resource managers, irrigation engineers, aquaculturists, and the general public. Further research is needed to test the developed models with more RWQPs to evaluate their potential in RWQ determination.