Introduction

Water is vital in many aspects of human life, including drinking, personal hygiene, agriculture, manufacturing and industrial processes, biotransformation, power generation, and the dissolution and release of contaminants (Zhang et al. 2012; Wu et al. 2018). Unsustainable anthropogenic activities often pollute water bodies and place high stress on freshwater resources (Chau 2005). Because of the frequent episodes of water pollution in recent times, the prediction and assessment of water quality have gradually attracted the attention of environmental management departments in many countries (Gümrah et al. 2000; Page et al. 2017).

Iraq has experienced a remarkable increase in water shortage in the last two decades due to interventions in water flow upstream of the major rivers, changes in climate, and a gradual decline in rainfall (Kadhem 2013; Zolnikov 2013). Water quality is another major problem that Iraq has been facing for the past couple of decades (Zolnikov 2013). This is becoming a particular concern for the Euphrates River, where water quality has deteriorated drastically in recent years due to agricultural developments. The issue of river water quality has become critical for the country, as the water no longer meets the standards required for industrial, domestic, and agricultural purposes (Rahi and Halihan 2010).

The quality of river water is defined by various physical, chemical, and biological properties of water. Among all the water quality parameters, dissolved oxygen (DO) is considered the most important, as it is essential for the survival of all aquatic organisms. Biochemical oxygen demand (BOD), on the other hand, is a measure of the amount of DO consumed by bacteria in decomposing organic matter and thus indicates the amount of organic matter available for oxygen-consuming bacteria. Together, DO and BOD form a composite index that can be used to assess the favorable conditions for aquatic life and the overall quality of water. DO and BOD affect a large number of biological, chemical, and physical properties of water and are thus considered the most important indices of water quality. Stream pollution control and the management of river water quality and ecology hinge largely on the accurate determination of these two parameters. However, the analysis of these parameters is delicate and time-consuming compared to other water quality parameters. A great amount of cost, time, and energy can be saved if these parameters can be predicted with reasonable accuracy. This has inspired researchers to develop reliable models for the prediction of BOD and DO from other easily available water quality data.

Modeling and forecasting of river water quality parameters has long been a challenging issue. BOD and DO depend on many biotic and abiotic factors and their complex interactions. Many of these interactions are still not well understood, the data required for modeling such processes are difficult to acquire, and mathematical formulations of the processes are often very difficult. Therefore, the physically based models generally used for BOD and DO modeling simplify these complex physical processes and often fail to predict BOD and DO with reasonable accuracy.

BOD and DO in water bodies change with time and follow stochastic behavior, which has encouraged the development of stochastic prediction models. Regression models are the most widely used for modeling the stochastic behavior of BOD and DO. However, the highly stochastic behavior of BOD and DO makes reliable simulation of these parameters with conventional regression models a difficult task. Prediction models are expected to have high predictive ability in describing water quality. Therefore, simple statistical regression-based models are not well suited to the operational management of river water quality.

Soft computing models such as artificial intelligence (AI) provide an excellent and reliable technique for modeling surface and underground water quality (Gümrah et al. 2000; Shi et al. 2018). AI models have exhibited robust and reliable modeling strategies for multiple hydrological, climatological, and environmental applications (Wang et al. 2014; Olyaie et al. 2015; Chen and Chau 2016). The main advantage of AI models is their capability of handling highly complicated nonlinear inter-parameter relationships (Barzegar et al. 2016), in contrast to conventional statistical models, which are based on the assumption of a linear relationship. The applications of AI have been presented in several predictive model forms such as artificial neural network (Sudheer et al. 2006; Zou et al. 2007; May 2008; Palani et al. 2008; Singh et al. 2009; Song et al. 2010; Balabin et al. 2011; Khalil et al. 2011; Gazzaz et al. 2012; Klaslan et al. 2014; Wu et al. 2014), support vector machine (Xu et al. 2007; Bouamar and Ladjal 2008; Yunrong and Liangzhong 2009a; Jian et al. 2010; Singh et al. 2011; Liu and Lu 2014; Jadhav et al. 2015), adaptive neuro-fuzzy inference system model (Sahu et al. 2011; Emamgholizadeh et al. 2014; Najah et al. 2014; Ahmed and Shah 2015), and genetic programming (Muttil and Chau 2006; Sreekanth and Datta 2010; Orouji and Haddad 2013; Olyaie et al. 2017). In addition, hybrid intelligence models have shown good performance in modeling water quality parameters (Wang et al. 2010; Liu et al. 2013; Deng et al. 2015; Barzegar et al. 2016; Ravansalar et al. 2016). In spite of the extensive implementation of AI in water quality modeling, these models still have several downsides, such as difficulty in tuning internal parameters, time-consuming algorithms, the need for human interaction in modeling, and lack of generalization. Therefore, the exploration of new and robust mathematical models that offer high flexibility in solving complicated environmental phenomena is in progress (Behmel et al. 2016).

Recently, the response surface method (RSM) has been well recognized for its ability to solve complex regression problems effectively (Cho 2007; Kewlani and Iagnemma 2008; Kim and Choi 2008; Wei et al. 2008; Acherjee et al. 2009; Roussouly et al. 2012). The main advantage of RSM is its use of high-order polynomial functions (Keshtegar et al. 2016). The precision of an RSM model depends on its underlying basis function, because the form of the response surface function provides the flexibility needed to model the targeted application (i.e., the water quality variable). In the current research, a hybrid response surface method (HRSM) has been developed for the first time to predict water quality variables. The motivation for this study was the successful implementation of HRSM in the field of hydrology (Keshtegar et al. 2017).

The HRSM models have been developed in this study for the prediction of BOD and DO from other easily available water quality parameters, including water temperature (T), turbidity, power of hydrogen (pH), electrical conductivity (EC), alkalinity, calcium (Ca), chemical oxygen demand (COD), sulfate (SO4), total dissolved solids (TDS), and total suspended solids (TSS). The modeling results are verified against the support vector regression (SVR) model. Various AI models have been proposed for the simulation of river water quality parameters, as mentioned above. However, SVR has been reported in the literature as the most predominant AI model in the prediction of environmental phenomena (Fahimi et al. 2016; Fengxiang et al. 2010; Singh et al. 2011; Yunrong and Liangzhong 2009b). This study aims to develop a robust mathematical model for the prediction of BOD and DO in river water in order to aid river water quality management in a data-scarce region such as Iraq. This type of model is extremely important for a developing country like Iraq, where the budget assigned to environmental quality monitoring and assessment is very limited but water pollution is frequent and severe. Hence, the current research is significant in providing an intelligent system to monitor the water quality variables of this most vital river of Iraq. To the best of the authors' knowledge, no previous research has been conducted from this perspective; the novelty therefore lies in this application as well as in the proposed methodology.

Dataset and description of the study area

The water quality parameters of the Euphrates River measured at Ramadi City, Anbar, Iraq (latitude 33°26′15″N; longitude 43°16′52″E) were used in the study (Fig. 1). The water quality of the Euphrates River has become a serious issue in recent years. Return flows from agricultural land and the long-term dumping of untreated sewage into the river and its tributaries have caused gradual deterioration of the river water quality. Therefore, forecasting the water quality of the Euphrates River is very important for environmental quality monitoring and management. Water samples were collected at the intake of a large drinking water treatment plant in Ramadi City for laboratory measurement of water quality parameters. Sampling was carried out on a monthly scale over the period 2004–2013. The lack of long-term reliable water quality data is a major problem in Iraq. Water quality data were available only for those 10 years when the study was conducted, and the available data were fully utilized in the present study. The main sources of water contamination of the Euphrates River are agricultural and domestic wastes. The salinity of the river water is very high and increases along the course of the stream. In addition, the discharge of untreated sewage into the river and its tributaries adds a serious hazard associated with different types of water contaminants. The analysis covered twelve physical and chemical water properties: the ten predictors T, turbidity, pH, EC, alkalinity, Ca, COD, SO4, TDS, and TSS, and the two target variables DO and BOD. BOD and DO in a river water system are affected by these ten water quality parameters, which were therefore selected for the development of the prediction models.

Fig. 1 The case study location: Ramadi water plant station located on the Euphrates River in Iraq

Theoretical review of the predictive models

Hybrid response surface method

RSM can generally be described as a set of approximating polynomials, limited to quadratic order, for experimental calibration. A set of polynomial functions for modeling the water quality variables (BOD and DO) at a monthly time scale was derived in this study. The polynomial coefficients were obtained by regression on the data points. The general quadratic approximating polynomial of RSM, which has been widely applied in previous studies (e.g., Afan et al. 2017; Keshtegar et al. 2016; Yeniay 2014), is the following second-order function:

$$ Y={a}_0+\sum \limits_{i=1}^n{a}_i{x}_i+\sum \limits_{i=1}^n{a}_{ii}{x}_i^2+\sum \limits_{i=1}^n\sum \limits_{j=i+1}^n{a}_{ij}{x}_i{x}_j $$
(1)
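
As a worked illustration of Eq. (1), consider the case of two inputs (n = 2); labeling them as temperature \( {x}_1 \) and turbidity \( {x}_2 \) is only an example. The function then expands to six terms, one constant, two linear, two pure quadratic, and one interaction term:

$$ Y={a}_0+{a}_1{x}_1+{a}_2{x}_2+{a}_{11}{x}_1^2+{a}_{22}{x}_2^2+{a}_{12}{x}_1{x}_2 $$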

The concept of solving a multivariate regression problem with RSM hinges on high-order polynomial functions. Traditional RSM typically uses low-order polynomials to approximate highly nonlinear functions and thus suffers from limited predictive capability. Selecting appropriate polynomials is therefore essential to attain a high level of accuracy. Different approaches, such as multi-layer regression and Taguchi optimization, have been adopted to improve the prediction characteristics of RSM relative to the typical polynomial-based RSM. The main modification made to Eq. (1) in this study consists of combining the polynomial and exponential functions to obtain a hybrid function. It is expected that if the polynomial expression is able to approximate a highly nonlinear relationship, such as an exponential relationship, over an extended range, it will be able to provide better prediction.

For data with a small standard deviation, the data points usually lie within narrow bounds near the mean of a normal distribution. Hence, the input data are usually normalized via the following equation:

$$ N\left({X}_n\right)=\frac{x_n^i-{x}_n^{mean}}{2{\sigma}_{xn}} $$
(2)

where \( {x}_n^{mean} \) and \( {\sigma}_{xn} \) are the mean and standard deviation of the n-th input variable, assumed to be normally distributed; the denominator \( 2{\sigma}_{xn} \) can be replaced with \( {\sigma}_{xn} \) when the data lie within narrow bounds near the mean value. Using the following normalized exponential function (Eq. 3), the target values (BOD and DO in this case) can be calibrated against the normalized input water quality parameters as

$$ {Y}_n={a}_{1n}+{a}_{2n}\exp \left[N\left({X}_n\right)\right] $$
(3)
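
As a minimal sketch of Eqs. (2) and (3), the Python snippet below normalizes one input series and calibrates the exponential function against the target. The function names, the toy values, the use of the population standard deviation, and the choice of univariate least squares for estimating the two coefficients are illustrative assumptions; the text does not specify the calibration procedure at this level of detail.

```python
import math
from statistics import mean, pstdev

def normalize(x, width=2.0):
    """Eq. (2): (x_i - mean) / (width * sigma); width=2 follows the 2*sigma form."""
    mu = mean(x)
    sigma = pstdev(x)  # population standard deviation (an assumption)
    return [(xi - mu) / (width * sigma) for xi in x]

def calibrate_exponential(x, target):
    """Eq. (3): fit target ~ a1 + a2 * exp(N(x)) by univariate least squares."""
    z = [math.exp(v) for v in normalize(x)]
    z_bar, t_bar = mean(z), mean(target)
    covariance = sum((zi - z_bar) * (ti - t_bar) for zi, ti in zip(z, target))
    variance = sum((zi - z_bar) ** 2 for zi in z)
    a2 = covariance / variance
    a1 = t_bar - a2 * z_bar
    return a1, a2, [a1 + a2 * zi for zi in z]

# Toy values (hypothetical, not taken from the Euphrates dataset)
temperature = [12.0, 15.0, 19.0, 24.0, 28.0, 30.0]
bod = [1.1, 1.3, 1.6, 2.0, 2.4, 2.7]
a1, a2, y_n = calibrate_exponential(temperature, bod)
```

Setting `width=1.0` corresponds to the alternative in which the denominator is the standard deviation itself.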

To approximate the unknown coefficients \( {a}_{1n} \) and \( {a}_{2n} \), Eq. (3) can be incorporated into Eq. (1) to obtain the hybrid response surface (HRS) function:

$$ Y={a}_0+\sum \limits_{i=1}^n{a}_i{Y}_{ni}+\sum \limits_{i=1}^n{a}_{ii}{Y}_{ni}^2+\sum \limits_{i=1}^n\sum \limits_{j=i+1}^n{a}_{ij}{Y}_{ni}{Y}_{nj} $$
(4)

where \( {Y}_{ni} \) is as given in Eq. (3). The unknown coefficients in Eq. (4) are usually estimated by minimizing the following error function:

$$ e={\left[E-Y\right]}^T\left[E-Y\right] $$
(5)

where \( Y=P{\left({Y}_n\right)}^Ta \) is the vector of predicted values and E is the vector of observed target values (BOD and DO). The basic polynomial function that depends on Eq. (3) can be computed as

$$ P\left({Y}_n\right)=\left[1,{Y}_{n1},{Y}_{n2},\dots ,{Y}_{nn},{Y}_{n1}^2,{Y}_{n2}^2,\dots ,{Y}_{nn}^2,{Y}_{n1}{Y}_{n2},{Y}_{n1}{Y}_{n3},\dots ,{Y}_{n\left(n-1\right)}{Y}_{nn}\right] $$
(6)
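
The basis vector of Eq. (6) can be sketched in Python as follows. The function name `quadratic_basis` is hypothetical, and the input is assumed to be the list of calibrated values from Eq. (3) for one sample. For n inputs the vector has 1 + 2n + n(n − 1)/2 entries, that is, 6 entries for two inputs and 66 for ten.

```python
from itertools import combinations

def quadratic_basis(y):
    """Return [1, y_1..y_n, y_1^2..y_n^2, y_i*y_j for i<j], ordered as in Eq. (6)."""
    linear = list(y)
    squares = [v * v for v in y]
    cross = [a * b for a, b in combinations(y, 2)]  # pairs with i < j
    return [1.0] + linear + squares + cross

# Two calibrated inputs for a single sample (illustrative values)
row = quadratic_basis([2.0, 3.0])
```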

Following minimization of the error function given in Eq. (5), the unknown coefficients can be estimated as follows:

$$ a={\left[P{\left({Y}_n\right)}^TP\left({Y}_n\right)\right]}^{-1}\left[P{\left({Y}_n\right)}^TE\right] $$
(7)

By combining Eq. (7) with the expression for Y given above, the prediction of BOD and DO can be obtained as

$$ Y=P{\left({Y}_n\right)}^T{\left[P{\left({Y}_n\right)}^TP\left({Y}_n\right)\right]}^{-1}\left[P{\left({Y}_n\right)}^TE\right] $$
(8)
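
The least squares solution of Eqs. (7) and (8) can be sketched with the standard library alone. In this sketch the basis is stored as a matrix `P` with one row per sample, and the normal equations are solved by Gaussian elimination rather than by forming the inverse explicitly; both choices, the function names, and the toy data are assumptions made for illustration. In practice, a QR- or SVD-based least squares routine is usually preferred when the columns of the basis are nearly collinear.

```python
def solve_normal_equations(P, E):
    """Solve (P^T P) a = P^T E for a, as in Eq. (7), by Gaussian elimination."""
    m = len(P[0])
    # Augmented matrix [P^T P | P^T E].
    A = [[sum(row[i] * row[j] for row in P) for j in range(m)]
         + [sum(row[i] * e for row, e in zip(P, E))] for i in range(m)]
    for col in range(m):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("P^T P is singular; basis columns are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    a = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * a[j] for j in range(i + 1, m))
        a[i] = (A[i][m] - tail) / A[i][i]
    return a

def predict(P, a):
    """Eq. (8): one prediction per row of the basis matrix."""
    return [sum(p * c for p, c in zip(row, a)) for row in P]

# Toy basis rows [1, y, y^2] with targets generated from E = 1 + 2y + 3y^2
ys = [0.0, 1.0, 2.0, 3.0]
P = [[1.0, y, y * y] for y in ys]
E = [1.0 + 2.0 * y + 3.0 * y * y for y in ys]
a = solve_normal_equations(P, E)
Y = predict(P, a)
```

Because the toy targets are generated exactly from a quadratic, the recovered coefficients should match the generating values up to rounding error.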

For illustration, Fig. 2 shows the structure of the proposed hybrid response surface method. The figure displays the four layers of the HRSM model, which combines exponential functions with a hybrid polynomial for prediction.

Fig. 2 The structure of the proposed HRSM predictive model

The input data are presented in the first layer. The second layer normalizes the supplied data. The third layer calibrates the targeted BOD and DO variables against the normalized input attributes using Eq. (3). In the last layer, the regression problem is solved using the second-order polynomial function given in Eq. (4). More information on the development of HRSM can be found in other published studies (Keshtegar and Heddam 2017; Keshtegar et al. 2017).

Support vector regression

As a distinguished intelligent predictive model, SVR has been applied successfully in environmental studies (Singh et al. 2011; Fahimi et al. 2016). Therefore, it was selected in this study to verify the prediction proficiency of the HRSM model. Recently, SVR has been applied as a learning algorithm in several areas such as soft computing, environmental studies, and engineering. It has demonstrated better prediction and forecasting accuracy than other forecasting methods such as neural networks (Liu and Lu 2014). The theory and development of SVR are available in the literature (Vapnik 1995). SVR is based on statistical learning theory and structural risk minimization, which aims to minimize an upper bound on the generalization error rather than the local training error minimized by other machine learning methods. Compared to other soft computing learning algorithms, SVR uses kernel functions that implicitly map the data into a high-dimensional feature space, where a linear model can be fitted without an explicit nonlinear transformation and without assumptions about the functional form of that transformation. Besides, the solution is unique because the optimization problem is convex. Different algorithms have been proposed for the optimization of the internal parameters of SVR, including the bat algorithm, firefly algorithm, particle swarm optimization, and univariate marginal distribution algorithm. Among those, the nature-inspired firefly algorithm has been highly recommended in recent studies (Ch et al. 2014; Moghaddam et al. 2016; Shamshirband et al. 2016; Ebtehaj et al. 2017; Ghorbani et al. 2017a, b, c; Tao et al. 2018). Yang (2010) developed the firefly algorithm (FFA) as a biologically inspired metaheuristic optimization algorithm based on the characteristic flashing behavior of fireflies. Fireflies attract prey or mates through bioluminescence. Compared to other conventional metaheuristic algorithms, the FFA has shown promising, efficient, and robust potential in achieving global optimization.
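
The text does not detail how the FFA was coupled with SVR, so the following is only a minimal sketch of a basic firefly search in its commonly cited form: a dimmer firefly moves toward a brighter one with attractiveness beta0 * exp(-gamma * r^2) plus a small random step. The objective `toy_validation_error` is a hypothetical stand-in for the validation error of an SVR as a function of two log-scaled hyperparameters; all names, bounds, and settings are assumptions.

```python
import math
import random

def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=50,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Minimize `objective` inside the box `bounds` with a basic firefly search."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pop]
    best_cost = min(cost)
    best = list(pop[cost.index(best_cost)])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter (lower cost)
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    # Attraction toward j plus a random step, clipped to the box.
                    pop[i] = [
                        max(lo, min(hi, xi + beta * (xj - xi)
                                    + alpha * (rng.random() - 0.5) * (hi - lo)))
                        for xi, xj, (lo, hi) in zip(pop[i], pop[j], bounds)
                    ]
                    cost[i] = objective(pop[i])
                    if cost[i] < best_cost:
                        best_cost, best = cost[i], list(pop[i])
        alpha *= 0.97  # gradually shrink the random step
    return best, best_cost

def toy_validation_error(params):
    """Hypothetical error surface over (log10 C, log10 epsilon)."""
    log_c, log_eps = params
    return (log_c - 1.0) ** 2 + (log_eps + 2.0) ** 2

bounds = [(-2.0, 3.0), (-4.0, 0.0)]
best, best_cost = firefly_minimize(toy_validation_error, bounds)
```

In an actual SVR-FFA setup, the objective would train an SVR with the candidate hyperparameters and return its error on held-out data; scaling the random step by the width of each bound is one common variant rather than a requirement.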

Evaluation of model performance

The simulation was done by using a new set of input variables, and the results obtained using the HRSM and SVR models were compared with the observed BOD and DO using different performance indicators, including scatter index (SI), mean absolute percentage error (MAPE), root mean square error (RMSE), mean absolute error (MAE), root mean square relative error (RMSRE), mean relative error (MRE), BIAS, and squared correlation coefficient (R2), following common practice in machine learning research (Moeeni et al. 2017):

$$ \mathrm{MAE}=\frac{1}{n}\sum \limits_{i=1}^n\mid {x}_i-{y}_i\mid $$
(9)
$$ \mathrm{RMSE}=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({x}_i-{y}_i\right)}^2} $$
(10)
$$ \mathrm{MAPE}=\frac{1}{n}\sum \limits_{i=1}^n\mid \frac{x_i-{y}_i}{x_i}\mid \times 100 $$
(11)
$$ \mathrm{SI}=\frac{\mathrm{RMSE}}{\overline{x}} $$
(12)
$$ {R}^2=\frac{{\left(\sum \limits_{i=1}^n\left({x}_i-\overline{x}\right)\left({y}_i-\overline{y}\right)\right)}^2}{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2\sum \limits_{i=1}^n{\left({y}_i-\overline{y}\right)}^2}\times 100 $$
(13)
$$ \mathrm{RMSRE}=\sqrt{\frac{\sum_{i=1}^n{\left(\frac{x_i-{y}_i}{x_i}\right)}^2}{n}} $$
(14)
$$ \mathrm{BIAS}=\frac{\sum \limits_{i=1}^n\left({x}_i-{y}_i\right)}{n} $$
(15)
$$ \mathrm{MRE}=\frac{\sum \limits_{i=1}^n\left(\frac{x_i-{y}_i}{x_i}\right)}{n} $$
(16)

where \( {x}_i \), \( {y}_i \), \( \overline{x} \), and \( \overline{y} \) are the observed value, the predicted value, the mean of the observations, and the mean of the predictions, respectively, and n is the number of samples. The criteria measure performance in different ways; for instance, bias is a measure of the systematic tendency of a model to underestimate or overestimate the target values. A positive bias, for example, implies that the observed values of BOD and DO are, on average, higher than the predicted values, and vice versa. The eight statistical metrics mentioned above can be used to assess all forms of error in the model output as well as the association and similarity of the model output with the observed data. Therefore, the use of these eight criteria together is expected to help in selecting the best model in an unbiased way.
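
The eight indicators of Eqs. (9) to (16) can be computed together as in the sketch below. The function name `evaluate` and the two-point toy series are illustrative assumptions. R2 is computed as the squared Pearson correlation and multiplied by 100 as in Eq. (13), so dividing by 100 gives the fractional form; the relative indicators require nonzero observations.

```python
import math

def evaluate(observed, predicted):
    """Return the eight indicators of Eqs. (9)-(16) as a dictionary."""
    n = len(observed)
    err = [x - y for x, y in zip(observed, predicted)]
    rel = [e / x for e, x in zip(err, observed)]  # requires nonzero observations
    x_bar = sum(observed) / n
    y_bar = sum(predicted) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(observed, predicted))
    sxx = sum((x - x_bar) ** 2 for x in observed)
    syy = sum((y - y_bar) ** 2 for y in predicted)
    rmse = math.sqrt(sum(e * e for e in err) / n)
    return {
        "MAE": sum(abs(e) for e in err) / n,
        "RMSE": rmse,
        "MAPE": sum(abs(r) for r in rel) / n * 100,
        "SI": rmse / x_bar,
        "R2": sxy ** 2 / (sxx * syy) * 100,
        "RMSRE": math.sqrt(sum(r * r for r in rel) / n),
        "BIAS": sum(err) / n,
        "MRE": sum(rel) / n,
    }

# Toy example: observed [2, 4] against predicted [1, 5]
scores = evaluate([2.0, 4.0], [1.0, 5.0])
```

In this toy example the two errors are +1 and −1, so BIAS is zero although MAE and RMSE are both 1, which illustrates why the indicators are read together rather than individually.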

Application results

Deteriorating trends in water quality can be inspected via water quality prediction models. As described earlier, this study focuses on the prediction of two important chemical parameters (i.e., BOD and DO). Both parameters have been used for decades as indicators of water quality, and their accurate prediction is essential to support protective initiatives. In this work, a new predictive HRSM model was introduced, and its performance was compared with that of a well-known AI model (i.e., SVR). HRSM is a relatively new approach that can predict complex patterns using an approximating function. The superiority of the proposed model is checked by analyzing different forms of error in the model simulations.

An exploratory analysis of the Euphrates River water quality parameters is given in Table 1. For a better understanding of the influence of each predictor on the targeted variables, the correlation coefficients of each input variable with BOD and DO were computed (Table 2). It was found that the correlation coefficients of all water quality parameters except temperature were low and insignificant. A total of ten parameters were used to predict BOD and DO. In this respect, ten different models were constructed with different combinations of input parameters, labeled M1, M2, M3, …, M10. According to Table 3, model 1 (M1) consists of only one water quality parameter (i.e., temperature), M2 consists of two parameters, and so on up to M10, which feeds all ten parameters as input attributes to the predictive models. As the number of input parameters gradually increases from M1 to M10, the changes in model performance reveal the influence of each input parameter. Thus, the ten models provide information on the sensitivity of the HRSM and SVR predictions to the ten input parameters considered in this study.
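
The nested input combinations can be generated as in the short sketch below. Only M1 (temperature) and M2 (temperature and turbidity) are stated explicitly in the text; the order of the remaining parameters is an assumption that follows the order in which the inputs are listed, and the actual order is the one given in Table 3.

```python
# Parameter order beyond the first two entries is an assumption (see Table 3).
PARAMETERS = ["T", "turbidity", "pH", "EC", "alkalinity",
              "Ca", "COD", "SO4", "TDS", "TSS"]

def nested_combinations(parameters):
    """Build M1..Mk, where Mk uses the first k parameters."""
    return {f"M{k}": parameters[:k] for k in range(1, len(parameters) + 1)}

combos = nested_combinations(PARAMETERS)
```

Each model is then trained and tested on the columns named in its entry, so that the change in a performance indicator from M(k−1) to Mk reflects the contribution of the k-th parameter.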

Table 1 Basic statistics of the measured water quality variables
Table 2 The correlation statistic between each inspected water quality (predictor attribute) and the targeted water quality BOD and DO
Table 3 The investigated input combinations to predict BOD and DO water quality variables

The model performance of HRSM and SVR is tabulated in Tables 4 and 5. According to the presented values, HRSM was found to perform best in the prediction of both BOD and DO using the second input combination (M2) (temperature and turbidity). On the other hand, the benchmark model (SVR) attained its best results with the fifth input combination (M5) for the prediction of BOD and the fourth input combination (M4) for the prediction of DO. This can be explained by the fact that mathematical models behave differently from one case to another, depending on how explicitly they capture the internal mechanism between the predictors and the predictand. Figure 3 exhibits the model performance using scatter plots and time series plots over the testing phase. The illustrated BOD results correspond to the best HRSM model (M2) and the best SVR model (M5). The HRSM prediction was more accurate than that of the best SVR model. The highest R2 was obtained using HRSM, 0.92 (M2), whereas it was 0.85 for SVR (M5). Figure 3 also shows the results for the DO models. The highest R2 values for the HRSM and SVR DO models were 0.9 (M2) and 0.83 (M4), respectively.

Table 4 The statistical performance indicators for the testing period of BOD variable prediction using HRSM and SVR models
Table 5 The statistical performance indicators for the testing period of DO variable prediction using HRSM and SVR models
Fig. 3 Scatter plots and time series presentation for the actual and predictive models

For a clearer appraisal of the various performance indicators, the results of the best input combinations are illustrated in Fig. 4. The SI, RMSE, MAE, and RMSRE are compared using bar diagrams in the figure. The results showed that HRSM had considerably lower error than SVR in all cases. For instance, the scatter index (SI) of BOD prediction using HRSM and SVR was 0.035 and 0.119, respectively, and the SI of DO prediction was 0.023 and 0.031, respectively. A remarkable improvement was thus achieved using HRSM. Similarly, the RMSE, MAE, MAPE, and RMSRE indicators showed very promising results using HRSM for both targeted variables.

Fig. 4 a Comparing errors in BOD prediction for M2. b Comparing errors in DO prediction for M2

It was observed that HRSM performed better than SVR in predicting BOD and DO in most cases. This indicates the robustness of the proposed model in capturing the internal relationship between the predictors and the predictand of the water quality parameters. For DO, the R2 achieved using HRSM was 0.9 (M2), whereas SVR attained its best DO prediction with input combination M4 (R2 = 0.83). The details of the performance indicators of the HRSM and SVR models in predicting BOD and DO are given in Tables 4 and 5, respectively. The results of other performance indicators such as SI, RMSE, MAE, and RMSRE for M2 are given in Fig. 4b.

In both BOD and DO prediction, including more input variables did not always improve the prediction. The performance metrics indicated better prediction skill when fewer variables were used in constructing the predictive model. HRSM performed best in the prediction of both BOD and DO when trained with only two parameters (i.e., temperature and turbidity). This observation matches well with the correlation values presented in Table 2, in which temperature and turbidity were found to be the major attributes affecting the BOD and DO magnitudes. Indeed, the primary goal of a prediction model should be to achieve a closer approximation rather than to include more parameters in the process. This is important from the perspective of laboratory effort and is highly valuable for catchments that lack environmental information. The results indicate that it is important to focus only on the parameters that have a significant impact on the prediction process and internal relations. Involving more parameters may sometimes confuse the model and lead it astray, resulting in inaccurate predictions. Here, accuracy in the prediction of BOD and DO is prioritized over the number of parameters involved. In this case study, M2, which consists of only two parameters (i.e., temperature and turbidity), outperformed the other combinations in almost every case. Temperature alone was considered in the M1 model. Therefore, the results indicate that turbidity is the key parameter that provides the best prediction. It should be noted that turbidity has a high coefficient of variation (CV%) in the Euphrates River, along with TSS, which is directly related to turbidity. When more parameters were included (i.e., M3, M4, and so on), both models had to re-establish the relations among the parameters and became biased by parameters that have little or no influence on BOD and DO variations.

The Taylor diagram is another way to compare model performance by visualizing the errors (Taylor 2001). These diagrams graphically summarize model efficiency and are relatively new in the presentation of water quality model performance. The Taylor diagrams of HRSM and SVR for both BOD and DO are given in Fig. 5a–d. The position of each model on the diagram indicates the accuracy of the model in simulating BOD and DO compared to the observed data. Figure 5a shows the highest correlation for HRSM with the M2 combination. A simulated BOD that approximates the actual data well lies nearer to the point marked “actual” on the x-axis. The RMSE between the simulated and observed BOD is proportional to the distance from the observed point, and the standard deviation of the simulated BOD is proportional to the radial distance from the origin of the diagram. The standard deviation for all cases was found to vary between 0.3 and 0.5.
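
The geometry of the Taylor diagram rests on a law-of-cosines relation between the two standard deviations, the correlation, and the centered root mean square difference (the RMSE computed after removing the mean of each series), which is the distance plotted on the diagram (Taylor 2001). The sketch below computes these statistics and checks the relation numerically; the function name and the toy DO values are hypothetical.

```python
import math

def taylor_statistics(observed, simulated):
    """Return (sd_obs, sd_sim, correlation, centered RMS difference)."""
    n = len(observed)
    o_bar = sum(observed) / n
    s_bar = sum(simulated) / n
    sd_o = math.sqrt(sum((o - o_bar) ** 2 for o in observed) / n)
    sd_s = math.sqrt(sum((s - s_bar) ** 2 for s in simulated) / n)
    corr = sum((o - o_bar) * (s - s_bar)
               for o, s in zip(observed, simulated)) / (n * sd_o * sd_s)
    crmsd = math.sqrt(sum(((s - s_bar) - (o - o_bar)) ** 2
                          for o, s in zip(observed, simulated)) / n)
    return sd_o, sd_s, corr, crmsd

obs = [6.1, 7.4, 8.0, 7.2, 6.5]  # hypothetical observed DO values
sim = [6.3, 7.1, 7.8, 7.5, 6.4]  # hypothetical simulated DO values
sd_o, sd_s, corr, crmsd = taylor_statistics(obs, sim)

# Law-of-cosines identity underlying the diagram
identity_gap = crmsd ** 2 - (sd_o ** 2 + sd_s ** 2 - 2 * sd_o * sd_s * corr)
```

The centered difference equals the RMSE of Eq. (10) only when the mean bias is zero, so a model can sit close to the observed point on the diagram and still carry a systematic offset.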

Fig. 5 Taylor diagram graphical presentation for BOD variable over the testing phase using a HRSM and b SVR predictive models. Taylor diagram graphical presentation for DO variable over the testing phase using c HRSM and d SVR predictive models

On the other hand, the BOD predicted by the SVR model, as given in Fig. 5b, indicates that SVR with M5 is the closest to the observed data, as it has the largest correlation and the lowest RMSE. M2, the best HRSM model, showed a slightly higher correlation than the best SVR model. The models M3, M4, and M6 fell in the mid-range category. Note also that, although models M4 and M9 showed the same correlation, M4 approximated the amplitude of the variations (standard deviation) better than M9 and yielded a lower RMSE.

Figure 5c, d shows the Taylor diagrams for DO. As for BOD, M2 performed exceptionally well among all the input combinations, with the largest correlation of 0.9. It also achieved the lowest standard deviation and RMSE. M6, M10, and M5 correlated similarly with the observed values (0.4); yet, M5 showed a better standard deviation among the three models. Although models M1, M3, and M4 performed moderately, with correlation values between 0.5 and 0.6, M3 achieved the second position in approximating DO using the HRSM model. For SVR, M4 performed very well, as it showed the highest correlation (0.8) and the lowest RMSE. M2 and M3 performed moderately well (correlation coefficients of 0.62 and 0.65, respectively), with M3 having a slightly better standard deviation and RMSE than M2. Among all the models, M10 performed very poorly, as it showed the lowest correlation (0.3) and the largest RMSE.

From the above analysis and the description of the plotted Taylor diagrams, it can be argued that HRSM can simulate BOD and DO more accurately when trained with the temperature and turbidity data of the water. For this case study in particular, the results suggest that BOD and DO are regulated by the turbidity of the river. SVR came close in the prediction of BOD with M5 (correlation 0.85) but was outperformed by HRSM with M2 (correlation 0.92). In the case of DO, HRSM again correlated better with the observed data, with a correlation coefficient of 0.9 (M2), which is higher than that of SVR (0.83 for M4).

Research findings discussion

RSM provides a technique for mapping the multi-dimensional pattern of responses of an outcome variable to changes in the controlling variables that govern the physical processes of the system. The strength of the method lies in capturing accurate, smooth approximations of responses through the selection of a set of polynomial functions that are able to capture the nonlinearity in system behavior. The main advantage of RSM over AI techniques is its ability to employ high-order polynomial functions for accurate approximation of responses (Keshtegar et al. 2016). Therefore, it has more explanatory potential than AI-based regression analyses (Edwards 2007). The smooth nature of polynomial-based approximations eliminates numerical noise and allows efficient prediction of the response variable (Yeniay 2014). The major improvement in the capability of RSM in this study is obtained by combining the polynomial and exponential functions into a hybrid function, which increases the ability of RSM to approximate highly nonlinear relationships and thus its predictive capability. This made the HRSM model superior to the SVR model in simulating BOD and DO from other water quality parameters.

RSM optimizes the response surface using the best set of predictors, those highly correlated with the target variable, since less correlated variables can add uncertainty to the model prediction. Uncertainty is one of the major limitations of using AI model predictions for water quality management. AI models are usually data-driven, and therefore the major uncertainty in prediction arises from the uncertainty in inputs (Noori et al. 2013). The uncertainty of the model inputs propagates through the model to its outputs and acts as the major source of uncertainty in prediction (Beven 2006; Griensven and Meixner 2006). In order to reduce uncertainty, RSM uses the best set of predictors to find a suitable approximation of the functional relationship between the targeted variables and the independent variables. In the present case study, the input parameters were entered gradually into the HRSM to identify the most appropriate input variables and thereby reduce uncertainty in model prediction. The HRSM identified only two easily measurable water quality parameters (temperature and turbidity) as the most suitable for approximating the required functional relationship. Therefore, the HRSM models developed in this study with only two input variables can be used for the prediction of BOD and DO with less uncertainty.

The models developed in this study can be used for the prediction of BOD and DO at any location on the Euphrates River through simple measurement of river water temperature and turbidity. It is well known that DO in a water body has an inverse relation, while BOD has a direct relation, with temperature and turbidity: DO decreases and BOD increases as temperature or turbidity increases. Therefore, the identification of temperature and turbidity by HRSM as the most suitable inputs for the prediction of BOD and DO is justifiable.

River water temperature is related to air temperature, and turbidity is related to rainfall and different properties of catchment such as land use and soil. In the future, models can be developed to predict river water temperature from air temperature and turbidity from catchment rainfall-runoff model which can be integrated with the HRSM models developed in this study for prediction of BOD and DO from rainfall and temperature data. Such models can also be used for assessment of the impacts of climate and land use changes on BOD and DO in Euphrates River and planning management strategies for mitigation of the impacts of environmental changes on river water quality.

Conclusion

In this research, the capability of a new model, HRSM, in the prediction of environmental variables has been inspected. HRSM was developed in this study to predict BOD and DO in river water. The motivation for implementing this model was to provide a robust approach to determining water quality variables using historical laboratory observations. The models were developed using various physical and chemical water quality variables as input attributes. One decade (2004–2013) of laboratory data was used to construct the models. The HRSM model was validated against a well-known data-intelligence regression model, SVR. HRSM showed higher performance than the SVR model. In addition, the proposed model required fewer input attributes, which is extremely important for the prediction of BOD and DO in catchments with limited environmental or ecological information. Overall, the results revealed that HRSM can be used as a robust predictive model for Euphrates River water quality variables. Future research can improve the performance of prediction models by incorporating more informative input attributes such as microbiological, hydrological, or even climatological variables. In addition, the feasibility of nature-inspired algorithms for identifying the causal information between the predictors and the predictand, as reported by Cho and Hermsmeier (2002) and Muttil and Chau (2007), can be explored in order to select appropriate input variables. Furthermore, more recent water quality data can be used to provide more informative attributes to the predictive model.