
1 Introduction

A model is a simplified system that represents a complex real-life system and can be used as a substitute for it under specific conditions [17]. Generally, such models are based on formalized concepts of the real system. Surface water quality modelling was developed as a tool for better understanding the mechanisms of, and interactions between, anthropogenic residual inputs and the resulting water quality [7]. In the context of global climate change driven by anthropogenic greenhouse warming, water quality is expected to become more sensitive to changes in hydro-meteorological variables. The self-purification capacity of a river in response to pollutants and climate depends on various hydro-meteorological variables and water quality parameters. One water quality variable that is strongly influenced by both climate change and human interventions is River Water Temperature (RWT). Alterations in RWT are generally due to human activities; anthropogenic heat sources and modifications include water withdrawals and additions, channel changes, dam operation, alterations in riparian cover, industrial cooling water, outfalls from sewage treatment plants, net heat exchange with groundwater and discharges downstream of thermal plants. RWT is of particular significance because (i) the discharge of excess heat from industrial and municipal effluents can affect the aquatic ecosystem, (ii) temperature influences all biological and chemical reactions, and (iii) temperature variations affect the density of water and hence its transport [35]. It is also a vital physical property of rivers, directly affecting water quality through reaction rates and dissolved oxygen (DO) levels. An increase in RWT lowers DO levels, which can lead to anaerobic conditions in aquatic systems and thereby affect aquatic life in terms of food availability, reproduction and migration.
River water temperature is also a prominent variable in the context of climate change, as it is a function of climatic variables such as air temperature, humidity, solar radiation and wind speed. Reliable prediction of RWT and assessment of its sensitivity to climate change have become key issues in many environmental, hydrological and ecological applications. To this end, numerous methods have been developed in recent years to estimate river water temperature with mathematical models that represent the complex interplay of hydro-meteorological and climate data and water quality parameters.

Commonly used river water temperature models include heat advection-dispersion transport equations [30, 41], which incorporate the net heat transfer processes at the water surface using thermal equilibrium concepts [4, 5, 11, 19, 26]. Stochastic RWT models separate the RWT time series into a long-term annual component (annual cycle) and short-term components [6]. A few RWT models are based on a mathematical representation of the underlying physics of heat exchange between the river and its surrounding environment [20, 32, 30]. To incorporate the effect of climate-driven changes in watershed hydrology on RWT, physically based hydrologic models and stand-alone stream temperature models have been used effectively to simulate RWT (e.g. Soil and Water Assessment Tool (SWAT) [2, 13]; BasinTEMP [1]; QUAL2K [8]). Although a mechanistic temperature model can give very accurate results, it requires large amounts of detailed data and is computationally intensive. Such models typically require numerous inputs, including stream geometry, hydro-meteorology, vegetation cover and land use, along with in-depth knowledge of the field. Furthermore, they can be difficult to implement in practice when the spatial domain of interest is large.

For these reasons, regression-based models have become well accepted in the research community where detailed meteorological and hydrological river data are limited. Traditionally, river water temperature has been related to air temperature as a surrogate for net heat exchange and as an approximation of equilibrium temperature (e.g. [31]). A linear regression model relating air and water temperatures is the most commonly adopted model for predicting RWT (e.g. [5, 23, 25]). These models usually predict river water temperature at weekly, monthly and annual time steps, relying mainly on the relatively high correlation between air and water temperature at those timescales. Owing to their computational feasibility and ease of implementation, linear regression models have been widely used to relate air and water temperature (e.g. [12, 23, 24, 25, 33]). Neumann et al. [23] developed a linear regression method to model daily maximum stream temperature in terms of maximum air temperature for the Truckee River in California and Nevada. In linear regression models, air temperature (AT) and RWT are treated as the independent and dependent variables, respectively, and these models are reported to work more accurately at weekly to monthly scales than at the daily scale [5]. Webb et al. [40] noted that flow is another important variable that should be considered in water temperature prediction models, and that air and water temperatures are more strongly correlated when flows are below median levels. Several authors have related river water temperature to both streamflow and air temperature with linear regression models (e.g. [21, 25]). Streamflow is inversely related to water temperature: a larger flow carries a larger thermal mass, so the water temperature responds less to a given heat input.
Streamflow is of particular interest in RWT prediction models for snowmelt-fed rivers and rivers affected by hydropower production [36]. Such regression-based models are generally applied by training (calibrating) the model on a subset of the historical data and then validating (testing) it on a subset that is ideally independent of the one used for training. The trained and tested models can then be used for future prediction of RWT. Such linear models facilitate the study of the sensitivity of RWT to changes in AT under altered climate conditions [21, 25, 27, 28, 38]. Rehana and Mujumdar [25] used a linear regression model with daily data to assess the sensitivity of RWT to AT increases of 1 to 2 °C and streamflow reductions of 10 to 20% for the Tunga-Bhadra River, India. Further, Rehana et al. [27] reported RWT changes of about 2.76 °C under various air temperature and discharge changes, relative to observed conditions at the mean annual scale, for the Missouri River at Nebraska City, Nebraska, USA.

Morrill et al. [21] used both linear and nonlinear models at 43 river and stream sites in 13 countries and found that the air/water temperature relationship is better fitted by non-linear regression. Linear regression is less appropriate when the assumption of a linear relationship cannot be verified. These models are also sensitive to outliers and can suffer from overfitting, that is, the regression begins to model the random error (noise) in the data rather than just the relationship between the variables. Further, linear regression is unsuitable for modelling RWT extremes, for example at the highest air temperatures (where evaporative cooling increases) and at the lowest (where water temperature is bounded near freezing). Mohseni et al. [18] developed a four-parameter non-linear regression model at a weekly time step, which is widely accepted in the research community (e.g. [28, 38]). Van Vliet et al. [38] improved the non-linear regression model of Mohseni et al. [18] by including streamflow and applied it at the daily time scale.
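The four-parameter model of Mohseni et al. [18] is commonly written as a logistic (S-shaped) function of air temperature, \(T_{w} = \mu + (\alpha - \mu)/(1 + e^{\gamma(\beta - T_{air})})\), where \(\alpha\) and \(\mu\) are the upper and lower bounds of water temperature, \(\beta\) is the air temperature at the inflection point and \(\gamma\) controls the steepness. The short Python sketch below only evaluates that form; the parameter values are hypothetical and are not fitted to any station in this study.

```python
import math


def mohseni_water_temperature(t_air, alpha, beta, gamma, mu):
    """Four-parameter logistic air/water temperature relation (Mohseni et al. form).

    alpha: upper bound of water temperature (deg C)
    beta:  air temperature at the inflection point (deg C)
    gamma: steepness of the S-curve (1/deg C)
    mu:    lower bound of water temperature (deg C)
    """
    return mu + (alpha - mu) / (1.0 + math.exp(gamma * (beta - t_air)))


# Hypothetical parameters, for illustration only (not fitted to any station).
params = dict(alpha=30.0, beta=20.0, gamma=0.2, mu=2.0)

for t_air in (0.0, 20.0, 40.0):
    print(t_air, round(mohseni_water_temperature(t_air, **params), 2))
```

At the inflection point the curve returns the midpoint \((\alpha + \mu)/2\), and it flattens towards \(\mu\) and \(\alpha\) at the cold and hot ends, which is the behaviour at the extremes that a straight line cannot reproduce.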

Apart from regression-based models, another set of data-driven models that has become promising for RWT estimation, owing to advances in machine learning, includes Artificial Neural Networks, Support Vector Machines (SVM) and Boosted Regression Trees (BRT), particularly for data-scarce regions. Artificial Neural Networks (ANN) have recently been applied increasingly in river water quality prediction (e.g. [14, 29]). Modelling of RWT using ANN has gained much attention in the literature (e.g. [9, 29]) because of its ability to capture and represent complex non-linear relationships. DeWeber and Wagner [10] applied ANN to estimate daily mean RWT for individual stream reaches throughout the range of brook trout (Salvelinus fontinalis) in the eastern U.S., with different groups of predictor variables including climate, landform and land cover attributes. Temizyurek and Dadaser-Celik [34] used ANN to study the effect of meteorological parameters on RWT in the Kızılırmak River, Turkey. However, Support Vector Regression (SVR), which is based on structural risk minimization to avoid overfitting [37], has been preferred over ANN in several studies owing to the uniqueness and global optimality of its solution [39]. In this context, few studies have tested the predictability of RWT with SVR. The present work therefore adopts a well-accepted machine learning algorithm, SVR, to analyze the predictive performance for river water temperature, and compares it with a Multiple Linear Regression Model (MLRM) in which air temperature and streamflow are the predictors and daily RWT is the predictand.
The SVR model is applied with air temperature and streamflow as predictors to estimate RWT at the Shimoga river water quality checkpoint along the Tunga-Bhadra, a tributary of the Krishna River, India. To understand the possible variability of RWT under climate change, a statistical downscaling model based on Canonical Correlation Analysis (CCA) was adopted. Future RWT projections were analyzed using the trained and tested MLRM and SVR models driven by the downscaled projections of air temperature and streamflow.

2 Data and Methods

The Tungabhadra River is one of the most polluted rivers in India after the Yamuna River, owing to the rapid growth of industries along the river, with effluents from paper, pulp, rayon and steel industries such as the Mysore Paper Mill and Harihar Poly Fibre. The Tunga River (147 km long) and the Bhadra River (about 178 km long) originate in the Western Ghats and join at Kudli, about 14.5 km from Shimoga city, to form the Tungabhadra River (Fig. 1). The river location considered for the quantification of RWT is Shimoga, on the Tunga River, which receives the waste load of Shimoga city's municipal effluent. Daily streamflow and river water temperature data for 1988 to 2005 recorded at Shimoga station were obtained from the Central Water Commission (CWC), Karnataka, India.

Fig. 1

Location map of Tunga-Bhadra River and Shimoga station, India

To study the impact of climate change on RWT, downscaled streamflow and air temperature were obtained using large-scale climate predictor variables (air temperature, mean sea level pressure, specific humidity, U-wind, V-wind and geopotential height) selected on the basis of earlier studies [26]. The selected predictor variables for January 1948 to December 2005 were extracted at six National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) grid points covering the region 10–20°N, 70–80°E, at a spatial resolution of 2.5° × 2.5°. The daily streamflow and air temperature data and the predictor set for 1988–1998 were used to train the CCA downscaling model, and the data from 1999 to 2005 were used for testing. Future climate variables were obtained from simulations of the BCC-CSM1-1 model, developed by the Beijing Climate Center, China Meteorological Administration, for CMIP5 (Coupled Model Intercomparison Project Phase 5). The BCC-CSM1-1 model was selected based on the availability of CMIP5 projections of the predictor variables, to demonstrate the modelling of RWT using SVR and to analyze future projections. The IPCC AR5 models implemented a set of scenarios called Representative Concentration Pathways (RCPs), including RCP 2.6, RCP 4.5 and RCP 8.5, in which radiative forcing due to anthropogenic factors reaches 2.6, 4.5 and 8.5 W m−2, respectively, by 2100. RCP 8.5 was considered in the present study; it represents a high-emission pathway in which radiative forcing continues to rise throughout the 21st century. Daily GCM simulations for the historical and future (RCP 8.5) scenarios from CMIP5 were obtained from the World Data Center for Climate (http://cera-www.dkrz.de/maintenance.html).

3 Multiple Linear Regression Model (MLRM)

An MLRM was developed at the daily scale to predict RWT for the Tunga-Bhadra River with air temperature and streamflow as predictor variables. The MLRM fitted on the training data takes the form:

$$T_{w} = a + bT_{air} + cQ$$
(1)

where \(T_{w}\) is the daily river water temperature in °C, \(T_{air}\) is the daily air temperature in °C, Q is the daily discharge in m³/s, and a, b and c are parameters estimated by training (calibrating) the MLRM.
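Estimating a, b and c in Eq. (1) is ordinary least squares with two predictors. The NumPy sketch below fits them on synthetic data (the generating coefficients 8.0, 0.6 and −0.002 are made up, not the values estimated in this study). Because the model is linear, its climate sensitivity follows directly from the fitted coefficients: changes \(\Delta T_{air}\) and \(\Delta Q\) shift \(T_{w}\) by \(b\Delta T_{air} + c\Delta Q\).

```python
import numpy as np


def fit_mlrm(t_air, q, t_w):
    """Least-squares estimates of a, b, c in T_w = a + b*T_air + c*Q (Eq. 1)."""
    X = np.column_stack([np.ones_like(t_air), t_air, q])  # design matrix [1, T_air, Q]
    coef, *_ = np.linalg.lstsq(X, t_w, rcond=None)
    return coef  # array([a, b, c])


def predict_mlrm(coef, t_air, q):
    a, b, c = coef
    return a + b * t_air + c * q


# Synthetic daily series (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
t_air = rng.uniform(18.0, 35.0, size=2000)   # deg C
q = rng.uniform(10.0, 1500.0, size=2000)     # m^3/s
t_w = 8.0 + 0.6 * t_air - 0.002 * q + rng.normal(0.0, 0.5, size=2000)

coef = fit_mlrm(t_air, q, t_w)

# Sensitivity of the linear model: +2 deg C air temperature and a 20 % flow reduction.
baseline = predict_mlrm(coef, t_air, q)
scenario = predict_mlrm(coef, t_air + 2.0, 0.8 * q)
print("fitted a, b, c:", np.round(coef, 4))
print("mean change in T_w (deg C):", round(float(np.mean(scenario - baseline)), 2))
```

The mean change equals \(2b - 0.2\,c\,\bar{Q}\) exactly, so with a negative c a flow reduction adds to the warming caused by the air temperature increase.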

4 Support Vector Regression (SVR)

The Support Vector Machine (SVM) is a kernel-based learning machine that follows the structural risk minimization principle [37]. Given training data \(\{(x_{1}, y_{1}), \ldots, (x_{n}, y_{n})\}\) with n patterns, SVR identifies a function \(f(x)\) that approximates the observed target values \(y_{i}\) for all training data while tolerating small deviations from them [16]. The input vector x is mapped into a higher-dimensional feature space by a non-linear mapping function \(\Phi\):

$$f(x;w) = \langle w, \Phi(x) \rangle + b$$
(2)

where \(\langle \cdot , \cdot \rangle\) denotes the inner product, and w and b are the regression coefficients (weight vector and bias), estimated by minimizing the error between \(f(x)\) and the observed values y. SVR measures this error with the \(\varepsilon\)-insensitive error:

$$\left| f(x;w) - y \right|_{\varepsilon} = \begin{cases} 0, & \text{if } \left| f(x;w) - y \right| < \varepsilon \\ \left| f(x;w) - y \right| - \varepsilon, & \text{otherwise} \end{cases}$$
(3)
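Equation (3) is an absolute error with a dead zone of half-width \(\varepsilon\): deviations inside the tube cost nothing and larger ones are penalised linearly beyond \(\varepsilon\). The plain-Python sketch below implements it, together with the objective F of Eq. (4) for the simplest case of a linear f with the identity feature map; the function names, \(\varepsilon\) and the sample values are illustrative assumptions.

```python
def eps_insensitive(prediction, target, eps):
    """Epsilon-insensitive error |f(x; w) - y|_eps from Eq. (3)."""
    residual = abs(prediction - target)
    return 0.0 if residual < eps else residual - eps


def svr_objective(w, b, xs, ys, C, eps):
    """Objective F of Eq. (4) for a linear f(x; w) = <w, x> + b (identity feature map)."""
    n = len(xs)
    data_term = sum(
        eps_insensitive(sum(wj * xj for wj, xj in zip(w, x)) + b, y, eps)
        for x, y in zip(xs, ys)
    )
    return C / n * data_term + 0.5 * sum(wj * wj for wj in w)


# Errors of 0.05 and 0.30 deg C with a tube half-width of eps = 0.1 deg C.
print(eps_insensitive(25.05, 25.0, 0.1))
print(eps_insensitive(25.30, 25.0, 0.1))
```

Only the second error is penalised, and only by the 0.2 °C that lies outside the tube; the \(\tfrac{1}{2}\lVert w \rVert^{2}\) term in the objective then favours the flattest function that keeps these penalties small.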

Using the training data of \(\left( {x_{i} ,y_{i} } \right)\), the values of w and b are estimated by minimizing the objective function:

$$F = \frac{C}{n}\sum\nolimits_{i = 1}^{n} \left| f(x_{i};w) - y_{i} \right|_{\varepsilon} + \frac{1}{2}\left\| w \right\|^{2}$$
(4)

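where C and \(\varepsilon\) are hyper-parameters: C controls the trade-off between the flatness of f (small \(\left\| w \right\|\)) and the penalty on deviations larger than \(\varepsilon\), and \(\varepsilon\) sets the width of the insensitive zone. The objective function F is minimized using the method of Lagrange multipliers, and the final regression function with kernel function \(K(x, x')\) takes the form: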

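$$f(x) = \sum\nolimits_{i = 1}^{n} \left( \alpha_{i} - \alpha_{i}^{*} \right) K(x, x_{i}) + b$$
(5)

where \(\alpha_{i}\) and \(\alpha_{i}^{*}\) are the Lagrange multipliers; only the training points with non-zero \(\alpha_{i} - \alpha_{i}^{*}\) (the support vectors) contribute to the sum.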

Well-known kernel functions include the linear, polynomial, radial basis function (RBF) and sigmoid kernels. The present study tried the linear and Gaussian RBF kernels, and the Gaussian kernel was identified as the most suitable for RWT modelling in terms of the performance measures.
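Once the dual problem has been solved, Eq. (5) is a weighted sum of kernel evaluations against the support vectors. The sketch below evaluates it with a Gaussian kernel \(K(x, x') = \exp(-\gamma \lVert x - x' \rVert^{2})\); the support vectors, the coefficients \(\alpha_{i} - \alpha_{i}^{*}\), the bias and \(\gamma\) are hypothetical placeholders, not values trained in this study. In practice a library solves the dual problem; scikit-learn's `sklearn.svm.SVR` with `kernel="rbf"`, for example, exposes the fitted quantities as `support_vectors_`, `dual_coef_` and `intercept_`.

```python
import math


def gaussian_kernel(x, x_prime, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Eq. (5): f(x) = sum_i (alpha_i - alpha_i*) K(x, x_i) + b."""
    return bias + sum(
        coef * gaussian_kernel(x, sv, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical trained quantities; inputs are (standardised air temperature, standardised flow).
support_vectors = [(-1.0, 0.5), (0.0, 0.0), (1.2, -0.8)]
dual_coefs = [-1.5, 0.4, 2.0]   # alpha_i - alpha_i* for each support vector
bias = 26.0                     # deg C
gamma = 0.5

print(round(svr_predict((0.3, -0.2), support_vectors, dual_coefs, bias, gamma), 3))
```

Far from every support vector the kernel terms vanish and the prediction falls back to the bias b, which is one reason inputs are usually standardised before an RBF kernel is applied.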

5 Evaluation Criteria of Model Performance

Model performance was evaluated with the Nash-Sutcliffe coefficient (NSC) [22] (Eq. 6), which measures the efficiency of the model fit, and with the Root Mean Square Error (RMSE) [15] (Eq. 7).

$$NSC = 1 - \frac{\sum\nolimits_{i = 1}^{n} \left( T_{W_{Sim}} - T_{W_{Obs}} \right)^{2}}{\sum\nolimits_{i = 1}^{n} \left( T_{W_{Obs}} - T_{W_{Obs,Avg}} \right)^{2}}$$
(6)
$$RMSE = \sqrt {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {T_{{W_{Sim} }} - T_{{W_{Obs} }} } \right)^{2} } }}{n}}$$
(7)

where \(T_{{W_{Sim} }}\) is the simulated daily river water temperature at time step i in °C; \(T_{{W_{Obs} }}\) is the observed daily river water temperature at time step i in °C; \(T_{{W_{Obs,Avg} }}\) is the mean of the observed daily river water temperatures over the comparison period in °C; and n is the number of data pairs compared.
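Both criteria are straightforward to compute. The plain-Python sketch below implements Eqs. (6) and (7); the function names and the five sample values are made up for illustration. NSC equals 1 for a perfect simulation and 0 when the simulation is no better than the observed mean, and it becomes negative for worse fits.

```python
import math


def nash_sutcliffe(simulated, observed):
    """Nash-Sutcliffe coefficient, Eq. (6): 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(simulated, observed):
    """Root mean square error, Eq. (7), in the units of the data (deg C for RWT)."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / n)


observed = [24.1, 25.3, 26.8, 27.5, 26.0]    # hypothetical daily RWT, deg C
simulated = [24.5, 25.0, 26.1, 27.9, 26.4]

print(round(nash_sutcliffe(simulated, observed), 3), round(rmse(simulated, observed), 3))
```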

6 Statistical Downscaling Model

A statistical downscaling model can be adopted to predict changes in daily streamflow and air temperature from General Circulation Model (GCM) outputs. GCMs are climate models designed to simulate time series of climate variables globally, accounting for greenhouse gas concentrations in the atmosphere under current and future scenarios. Downscaling models are statistical techniques used to bridge the spatial and temporal resolution gap between GCMs and impact assessment studies. These methods generally derive empirical relationships between the large-scale climate variables simulated by a GCM (the predictors) and regional-scale hydrologic variables (the predictands). The present study used a multivariable statistical downscaling model based on Canonical Correlation Analysis (CCA), which relates the atmospheric climate variables linearly to the downscaled variables (streamflow and air temperature). The downscaling model involves data pre-processing by standardization and normalization to remove systematic bias in the climate model simulations, followed by data reduction of the large-scale predictors with Principal Component Analysis (PCA) [26]. The preprocessed predictors and predictands (streamflow and air temperature) were then given as input to the CCA model, which converts them into canonical variables (Eqs. 8 and 9). Canonical regression equations were developed separately for streamflow and air temperature using the NCEP/NCAR reanalysis data (X) and the observed data (Y) for the training period (1988–1998):

$$U_{q} = a_{q}^{T} X, \quad q = 1, \ldots, \min(N, M)$$
(8)
$$V_{q} = b_{q}^{T} Y, \quad q = 1, \ldots, \min(N, M)$$
(9)

where \(U_{q}\) and \(V_{q}\) are the predictor and predictand canonical variables, respectively, and \(a_{q} = [a_{q,1}, a_{q,2}, \ldots, a_{q,N}]^{T}\) and \(b_{q} = [b_{q,1}, b_{q,2}, \ldots, b_{q,M}]^{T}\) are the canonical loadings (weights), with N and M the numbers of predictor and predictand variables. The weights are chosen so that the canonical correlation \(\rho_{{c_{q} }}\) between \(U_{q}\) and \(V_{q}\) is maximized. The canonical coefficients estimated for the training period (1988–1998) and tested for 1999–2005 were used with the GCM-projected climate variables to obtain future projections of streamflow and air temperature.
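A compact NumPy sketch of the pre-processing and CCA steps follows, under simplifying assumptions: the helper names, the number of retained components `n_pcs` and the synthetic data are all hypothetical, and the study's own pre-processing [26] may differ in detail. Predictors are standardised, reduced with PCA through the singular value decomposition, and the canonical weights are obtained by whitening both blocks and taking the SVD of the whitened cross-covariance, whose singular values are the canonical correlations. The columns of `A` and `B` play the role of \(a_{q}\) and \(b_{q}\) in Eqs. (8) and (9).

```python
import numpy as np


def standardize(A):
    """Column-wise zero mean and unit variance."""
    return (A - A.mean(axis=0)) / A.std(axis=0)


def pca_scores(X, n_pcs):
    """Scores of standardised predictors on their first n_pcs principal components."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_pcs].T


def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def cca(X, Y):
    """Canonical weights (columns of A and B) and canonical correlations rho.

    With centred data, U_q = X a_q and V_q = Y b_q have unit variance and corr(U_q, V_q) = rho_q.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1)
    Cyy = Y.T @ Y / (n - 1)
    Cxy = X.T @ Y / (n - 1)
    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, rho, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    A = Wx @ U       # predictor weights a_q (one column per canonical pair)
    B = Wy @ Vt.T    # predictand weights b_q
    return A, B, rho  # rho has min(N, M) entries in decreasing order


# Synthetic stand-ins: 12 raw predictor series and 2 predictands (hypothetical).
rng = np.random.default_rng(1)
n_days = 1500
raw = rng.normal(size=(n_days, 12))
Z = pca_scores(standardize(raw), n_pcs=4)
Y = np.column_stack([
    Z[:, 0] + 0.3 * rng.normal(size=n_days),
    Z[:, 1] - Z[:, 2] + 0.3 * rng.normal(size=n_days),
])
A, B, rho = cca(Z, Y)
print("canonical correlations:", np.round(rho, 3))
```

Because both predictand series here are built from the retained components plus small noise, both canonical correlations come out close to 1; with real reanalysis predictors they would be lower.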

7 Results and Discussion

The statistical downscaling model based on CCA was used to predict changes in daily streamflow and air temperature projections from the BCC-CSM1-1 GCM for the period 2006 to 2099. Figure 2 shows the observed series and the series simulated with NCEP data and with GCM data for streamflow and air temperature over the training period 1988–1998. The performance of the statistical downscaling model was evaluated with RMSE and the Nash-Sutcliffe coefficient. For streamflow, the N-S coefficients were 0.73 and 0.21 for the training and testing periods, respectively; for air temperature, they were 1.00 and 0.56. The corresponding RMSE values were 403.61 and 419.77 m³/s for streamflow and 3.41 and 3.92 °C for air temperature.

Fig. 2

Observed series and series simulated from the NCEP and GCM (BCC-CSM1-1) data sets for the training period 1988–1998: a streamflow and b air temperature

Overall, a significant decrease in daily streamflow and an increase in air temperature were found for the Tunga River at Shimoga station for the current and projected scenarios [26]. The historical and projected streamflow and air temperature were used with the MLRM and SVR to study the impact of climate change on RWT. The MLRM and SVR models were compared for predicting daily RWT along the Tunga-Bhadra River. For both models, the training period was 1989–1999 and the testing period 2000–2005. Figure 3 shows the observed daily RWT and the RWT simulated with the MLRM and SVR for (a) the training period (1989–1999) and (b) the testing period (2000–2005). The MLRM predicted RWT with RMSE values of 1.19 and 1.85 °C and N-S coefficients of 0.79 and 0.53 for the training and testing periods, respectively. The SVR model improved the predictability of RWT, with RMSE values of 0.95 and 1.69 °C and N-S coefficients of 0.87 and 0.61 for the training and testing periods, respectively. Overall, the performance of both the MLRM and SVR in predicting daily RWT was satisfactory in terms of RMSE and N-S coefficients, with higher accuracy for the SVR model. The trained and tested MLRM and SVR models were then used to predict RWT for future scenarios with the projections obtained from the CCA downscaling model.
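The chronological split used for both models can be sketched in plain Python, assuming (hypothetically) that each daily record is a `(date, value)` pair. Filtering by calendar year keeps the testing years strictly after the training years, so no information from the testing period leaks into training through serially correlated daily data.

```python
from datetime import date, timedelta


def split_by_year(records, train_years, test_years):
    """Split (date, value) records into training and testing sets by calendar year.

    train_years and test_years are inclusive (first_year, last_year) tuples.
    """
    train = [(d, v) for d, v in records if train_years[0] <= d.year <= train_years[1]]
    test = [(d, v) for d, v in records if test_years[0] <= d.year <= test_years[1]]
    return train, test


# Hypothetical daily records spanning 1989-2005 (values are placeholders).
start = date(1989, 1, 1)
n_days = (date(2005, 12, 31) - start).days + 1
records = [(start + timedelta(days=k), 0.0) for k in range(n_days)]

train, test = split_by_year(records, (1989, 1999), (2000, 2005))
print(len(train), len(test))
```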

Fig. 3

Observed and simulated river water temperature for a training and b testing periods for the Tunga-Bhadra River at Shimoga, Karnataka, India

Table 1 shows the annual mean RWT at Shimoga, along the Tunga-Bhadra River, for the historical period 2000–2005 and for the future periods 2020–2040, 2041–2060, 2061–2080 and 2081–2100, for the MLRM and SVR models. Figure 4 shows the observed and projected annual RWT for these periods with the MLRM and SVR models. Table 1 and Fig. 4 show that the RWT increases projected by the regression model are more pronounced than those projected by the SVR model. Nevertheless, both models indicate a significant impact of climate change on RWT, with pronounced increases at the annual scale. The water quality of the Tunga-Bhadra River has already suffered, with a decrease in streamflow of about 3.1% at Shimoga over the historical period [25], and a 21% reduction is projected for 2070–2100 along the Tungabhadra River by the MIROC 3.2 GCM [26]. Furthermore, air temperature along the Tunga-Bhadra River is projected to increase by about 1.66 °C for 2070–2100 according to the MIROC 3.2 GCM [26], implying a significant increase in RWT extremes [28] and a consequent deterioration of water quality.
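The period values in Table 1, and the end-of-century increases quoted in the conclusions, rest on simple arithmetic: annual means averaged over multi-year windows, and differences between window averages. A sketch with a synthetic annual series follows; the linear 0.03 °C per year trend is hypothetical and unrelated to the projections in Table 1.

```python
def window_mean(annual, first_year, last_year):
    """Mean of annual values over an inclusive range of years."""
    values = [annual[y] for y in range(first_year, last_year + 1)]
    return sum(values) / len(values)


# Hypothetical projected annual-mean RWT rising linearly by 0.03 deg C per year.
annual_rwt = {year: 27.0 + 0.03 * (year - 2020) for year in range(2020, 2101)}

windows = [(2020, 2040), (2041, 2060), (2061, 2080), (2081, 2100)]
means = {w: window_mean(annual_rwt, *w) for w in windows}
for (y0, y1), m in means.items():
    print(f"{y0}-{y1}: {m:.2f} deg C")

change = means[(2081, 2100)] - means[(2020, 2040)]
print(f"change from 2020-2040 to 2081-2100: {change:.2f} deg C")
```

Note that the first window spans 21 years and the others 20, so with a trending series the change between windows depends slightly on how the window edges are chosen.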

Table 1 The annual mean of RWT for the observed and future time periods for MLRM and SVR
Fig. 4

Annual river water temperature for a the observed period 2000–2005 and b future projections for 2020–2040, 2041–2060, 2061–2080 and 2081–2100 with the MLRM and SVR models, Shimoga station, Tunga-Bhadra River, India

8 Conclusions and Future Directions

Modelling river water quality under climate change is important for understanding the projected risk of low water quality and for identifying the adaptation and management policies to be implemented. Such impact assessment models need to be integrated with climate change projection models. The present study integrated RWT prediction models with a statistical downscaling model to analyze climate change impacts on river temperatures. A multiple linear regression model and a support vector regression model were developed to predict daily RWT under climate change at Shimoga on the Tunga-Bhadra River, India. The SVR model showed better prediction performance than the linear regression model: it fitted daily RWT with N-S coefficients of 0.87 and 0.61, whereas the MLRM achieved 0.79 and 0.53, for the training and testing periods, respectively. The fitted SVR and MLRM models were then driven by the downscaled projections of streamflow and air temperature from the CCA downscaling model. The RWT projections based on the MLRM are more pronounced than those based on SVR: the increase in annual RWT from the near-future period (2020–2040) to 2081–2100 is estimated as 3.99 °C with the MLRM and 2.24 °C with SVR. The linear regression model thus predicts more intense RWT changes than the machine learning algorithm. The present study therefore suggests using both data-driven models to study RWT under climate change. Although such data-driven models are limited in predicting changes in RWT, because the relationship between air temperature and RWT can change over time (non-stationarity), their simplicity in predicting future RWT makes them attractive for use in management policies.
Further, data-driven models do not provide a physical justification, and projections made with them are always subject to uncertainties [3], as the models are validated only within the range of measured values [36]. Therefore, with the limitations and strengths of each existing model in mind, RWT prediction tools can be applied for the effective assessment of river water quality across spatial and temporal scales, in case studies ranging from global to local/regional scales.