1 Introduction

Highly reliable and accurate forecasts of river flows, both short-term and long-term, are of prime importance in water resources management for purposes such as optimizing water resources systems and planning for their future expansion or reduction. As such, river flow forecasting has long been of particular interest to hydrologists, and many models have been proposed to improve it.

In the last decade, the support vector machine (SVM), a relatively new classification and regression technique from the field of artificial intelligence, has gained attention in hydrology. It has been used to predict soil moisture (Gill et al. 2006), rainfall runoff (Dibike et al. 2001), stream flow or stage (Liong and Sivapragasam 2002; Asefa et al. 2006; Yu et al. 2006), and lake water level (Asefa et al. 2005; Khan and Coulibaly 2006).

Recently, the wavelet transform, which decomposes the original data into components that capture useful information and can thereby improve the performance of a forecasting model, has also gained attention in hydrology. For example, Smith et al. (1998) applied a discrete wavelet transform (DWT) to daily river discharge records to demonstrate its potential for quantifying stream flow variability. They suggested that stream flows could be effectively classified into distinct hydroclimatic categories using the DWT. Partal and Kucuk (2006) used wavelet analysis to determine possible trends in annual total precipitation series. Li et al. (2009) used discrete wavelet analysis to identify the relationship between sunspots and natural runoff in the Yellow River. Huaqi et al. (2011) conducted a long-term trend analysis of the runoff series in the Yulin area of Northwest China based on wavelet transforms. Shiau and Huang (2014) applied the continuous wavelet transform to detect hydrologic alteration at various scales caused by reservoir operation at the Feitsui Reservoir in northern Taiwan.

Several studies have coupled the wavelet transform or other data-preprocessing methods (e.g. singular spectrum analysis) with artificial intelligence approaches such as artificial neural networks (ANNs) and SVMs. For example, Partal and Cigizoglu (2008) used a combined wavelet and ANN method to estimate and predict the suspended sediment load in rivers and found that the model fits the observed data well in the testing period. Adamowski and Sun (2010) used coupled discrete wavelet transform and ANN models to forecast flow at lead times of 1 and 3 days in non-perennial rivers in semi-arid watersheds and found that these models provide more accurate flow forecasts than ANNs alone. Kisi and Cimen (2011) used conjunction models of the discrete wavelet transform and SVM for monthly stream flow forecasting and found that the conjunction models increased the forecast accuracy of the SVM. Adamowski and Chan (2011) used coupled discrete wavelet transform and ANN models for groundwater level forecasting and found that these models provide more accurate groundwater level forecasts than ANN and ARIMA models. Venkata Ramana et al. (2013) applied a wavelet transform and ANN conjunction model to predict monthly rainfall and found that the conjunction models are more effective than ANN models. Wang et al. (2014) applied SVM, genetic programming (GP), and seasonal autoregressive (SAR) models coupled with singular spectrum analysis to predict monthly inflow to the Three Gorges Reservoir; their results indicated that data preprocessing can significantly improve the prediction precision of the SVM and GP models. Sahay and Srivastava (2014) proposed a discrete wavelet transform-genetic algorithm-neural network model for forecasting 1-day-ahead monsoon river flows and found that it outperformed a genetic algorithm-optimized ANN model.

The quality of SVM models depends on a proper setting of the SVM model parameters, and the main issue for SVM regression practitioners is how to set these parameters in their applications (Cherkassky and Ma 2004). The parameters have usually been determined by trial and error, which is inefficient and does not guarantee a parameter set that yields good model performance (Li et al. 2010). Cherkassky and Ma (2004) proposed an analytic prescription for selecting the parameters directly from the training data. Wang et al. (2009) used the shuffled complex evolution algorithm to determine SVM model parameters in forecasting monthly discharge time series. Li et al. (2010) implemented a genetic algorithm (GA) to determine the optimal parameters of the SVM model in predicting the inflow to the Shihmen Reservoir in Taiwan. The GA is a heuristic global optimization technique that has been applied to several complex problems and has been shown to converge to near-optimal solutions (Winston and Venkataramanan 2003).

The objectives of this paper are to study the performance of a coupled wavelet and support vector machine model for monthly flow forecasting and to compare it with that of a single support vector machine model. In addition, a genetic algorithm (GA) is applied to select the optimum values of the SVM model parameters. Moreover, the use of all wavelet-decomposed sub-series as inputs to the SVM models is investigated, because selecting only the effective sub-series can be viewed as a reductive approach: all sub-series are equally important and contain important information about the original time series.

2 Methods

2.1 Support Vector Regression (SVR)

Support vector regression (SVR) is the term used for regression with SVMs.

Given a set of training data \( \{x_i, y_i\}_{i=1}^{N} \) with \( x_i \in \mathbb{R}^m \) and \( y_i \in \mathbb{R} \) (where \( x_i \) is the input vector with m components, \( y_i \) is the corresponding output value, and N is the total number of data patterns), SVR is formulated as follows:

$$ f\left(x,w\right)=\sum_{k}w_k\,\phi_k\left(x\right)+b=w\cdot\phi\left(x\right)+b $$
(1)

where ϕ(x) denotes a set of non-linear transfer functions that map the input vector into a high-dimensional feature space in which, theoretically, a simple linear regression can capture the complex non-linear relationship of the input space, and w and b denote the coefficients.

The coefficients w and b can be estimated by minimizing the following regularized risk function (Vapnik 1995, 1998):

$$ R(f)=C\frac{1}{N}\sum_{i=1}^{N}L_{\varepsilon}\left(y_i,f\left(x_i,w\right)\right)+\frac{1}{2}\left\Vert w\right\Vert^2 $$
(2)

where \( L_{\varepsilon}\left(y,f\left(x,w\right)\right)=\begin{cases}0 & \text{if } \left|y-f\left(x,w\right)\right|\le \varepsilon \\ \left|y-f\left(x,w\right)\right|-\varepsilon & \text{otherwise}\end{cases} \) is called the ε-insensitive loss function, and the constant C > 0 specifies the trade-off between the approximation error and the norm of the weight vector ‖w‖. The parameter ε is called the tube size and corresponds to the approximation accuracy required on the training data points. Both C and ε must be chosen beforehand by the user.
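The ε-insensitive loss and the regularized risk of Eq. (2) can be written down directly. The following is a minimal Python sketch for illustration only: the function names are hypothetical, and w is treated as an explicit, finite weight vector, which holds only for an explicit feature map (with a kernel, w is never formed).

```python
from typing import Sequence


def eps_insensitive_loss(y: float, f: float, eps: float) -> float:
    """Zero inside the eps-tube, otherwise the absolute error minus eps."""
    residual = abs(y - f)
    return 0.0 if residual <= eps else residual - eps


def regularized_risk(ys: Sequence[float], fs: Sequence[float],
                     w: Sequence[float], C: float, eps: float) -> float:
    """Eq. (2): C times the mean eps-insensitive loss plus 0.5 * ||w||^2."""
    n = len(ys)
    empirical = sum(eps_insensitive_loss(y, f, eps) for y, f in zip(ys, fs)) / n
    return C * empirical + 0.5 * sum(wk * wk for wk in w)


# Residuals of 0.1 and 0.5 with a tube of eps = 0.25:
print(eps_insensitive_loss(1.0, 1.1, 0.25))  # inside the tube -> 0.0
print(eps_insensitive_loss(1.0, 1.5, 0.25))  # outside the tube -> 0.25
```

Errors smaller than ε cost nothing, so only points on or outside the tube influence the fit; larger C penalizes those points more heavily relative to the ‖w‖ term.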

Introducing two non-negative slack variables \( \xi_i \) and \( \xi_i^{\ast} \), which represent the distance from the actual values to the corresponding boundary values of the ε-tube, transforms Eq. (2) into the following constrained form:

$$ \underset{w,\;b,\;\xi,\;{\xi}^{\ast }}{ \min}\kern1em \frac{1}{2}{\left\Vert w\right\Vert}^2+C{\displaystyle \sum_{i=1}^N\left({\xi}_i+{\xi}_i^{\ast}\right)} $$
(3)
$$ \text{subject to}\quad\begin{cases}w\cdot\phi\left(x_i\right)+b-y_i\le \varepsilon+\xi_i^{\ast}\\ y_i-w\cdot\phi\left(x_i\right)-b\le \varepsilon+\xi_i\\ \xi_i,\ \xi_i^{\ast}\ge 0,\quad i=1,2,\dots,N\end{cases} $$

Then, the dual form of the non-linear SVR can be expressed as

$$ \max \kern1em -\frac{1}{2}{\displaystyle \sum_{i=1}^N{\displaystyle \sum_{j=1}^N\left({\alpha}_i-{\alpha}_i^{\ast}\right)\left({\alpha}_j-{\alpha}_j^{\ast}\right)K\left({x}_i,{x}_j\right)-\varepsilon {\displaystyle \sum_{i=1}^N\left({\alpha}_i+{\alpha}_i^{\ast}\right)+{\displaystyle \sum_{i=1}^N{y}_i\left({\alpha}_i-{\alpha}_i^{\ast}\right)}}}} $$
(4)
$$ \mathrm{Subject}\ \mathrm{t}\mathrm{o}\left\{\begin{array}{l}{\displaystyle \sum_{i=1}^N\left({\alpha}_i-{\alpha}_i^{\ast}\right)=0}\\ {}0\le {\alpha}_i\le C,\kern1em i=1,\;2,\dots,\;N\\ {}0\le {\alpha}_i^{\ast}\le C,\kern1em i=1,\;2,\dots, N\end{array}\right. $$

where \( \alpha_i \) and \( \alpha_i^{\ast} \) are Lagrange multipliers, and \( K\left(x_i,x_j\right)=\phi\left(x_i\right)\cdot\phi\left(x_j\right) \) is a kernel function that yields the inner product of \( \phi(x_i) \) and \( \phi(x_j) \) in the feature space. In this study, the radial basis function (RBF) is used as the kernel function:

$$ K\left(x_i,x_j\right)=\exp\left(-\frac{\left\Vert x_i-x_j\right\Vert^2}{2\sigma^2}\right) $$
(5)

where σ denotes the width of the radial basis function.

Therefore, the non-linear regression function can be given as:

$$ f(x)=\sum_{i=1}^{N}\left(\alpha_i-\alpha_i^{\ast}\right)K\left(x_i,x\right)+b $$
(6)
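To make Eqs. (5) and (6) concrete, the sketch below evaluates the RBF kernel and the resulting regression function for given multipliers. It is a minimal illustration, not the study's code: the function names are hypothetical, and the multipliers are made-up values standing in for those a quadratic-programming solver would return from the dual problem of Eq. (4).

```python
import math
from typing import Sequence


def rbf_kernel(xi: Sequence[float], xj: Sequence[float], sigma: float) -> float:
    """Eq. (5): exp(-||xi - xj||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x: Sequence[float], train_x: Sequence[Sequence[float]],
                alpha: Sequence[float], alpha_star: Sequence[float],
                b: float, sigma: float) -> float:
    """Eq. (6): f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum(
        (a - a_s) * rbf_kernel(xi, x, sigma)
        for xi, a, a_s in zip(train_x, alpha, alpha_star)
    ) + b


# Hypothetical multipliers for two training points (normally obtained from Eq. (4)).
train_x = [[0.0], [1.0]]
print(svr_predict([0.0], train_x, alpha=[1.0, 0.0], alpha_star=[0.0, 0.5], b=0.1, sigma=1.0))
```

Only training points with \( \alpha_i \neq \alpha_i^{\ast} \) (the support vectors) contribute to the sum. Libraries such as scikit-learn parameterize the RBF kernel as \( \exp(-\gamma\|x_i-x_j\|^2) \), which corresponds to \( \gamma = 1/(2\sigma^2) \) in Eq. (5).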

2.2 Genetic Algorithm (GA)

The SVR model has three free parameters (C, ε, σ) that must be determined by the user. These parameters strongly affect the performance of the SVR model, yet they are often determined by trial and error. In this study, a genetic algorithm (GA) is applied to select the optimal parameters of the SVR model.

The GA is based on the principle of survival of the fittest and attempts to retain genetic information from generation to generation. Its major benefit is the ability to find optimal or near-optimal solutions with relatively modest computational requirements (Pai 2006).

Figure 1 depicts the framework of GA implementation for optimizing the SVR parameters, which is described as follows.

Fig. 1 The framework of GA implementation for optimizing the SVR parameters

a) Initialization: Randomly generate an initial population of chromosomes representing the values of the parameters C, ε, and σ of the SVR model. The ranges are [1, 100] for C, [0.0001, 0.01] for ε, and [0.1, 10] for σ. The population size is set to 20.

b) Evaluation of the fitness function: Calculate the fitness of each chromosome. In this study, the root mean square error (RMSE) is used as the fitness function.

c) Selection: Select the fittest chromosomes for reproduction.

d) Crossover and mutation: Create new offspring by applying these operations.

e) Stop condition: If the stop conditions are met, the GA stops and outputs the optimum parameters according to the best fitness value. Otherwise, steps b to d are repeated until the conditions are satisfied.
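Steps a) to e) can be sketched as a small real-coded GA. This is a minimal illustration, not the authors' implementation: tournament selection, arithmetic crossover, Gaussian mutation, elitism, the crossover and mutation rates, and a fixed number of generations as the stop condition are all assumptions, because the text does not specify these operators. The function `toy_fitness` is a hypothetical stand-in; in the study the fitness is the RMSE of an SVR trained with the candidate (C, ε, σ).

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_algorithm(fitness: Callable[[List[float]], float],
                      bounds: Sequence[Tuple[float, float]],
                      pop_size: int = 20, generations: int = 50,
                      crossover_rate: float = 0.8, mutation_rate: float = 0.1,
                      seed: int = 0) -> Tuple[List[float], float, List[float]]:
    """Minimize `fitness` over box-bounded real-valued chromosomes.

    Returns the best chromosome, its fitness, and the best fitness per generation.
    """
    rng = random.Random(seed)

    # a) Initialization: random chromosomes inside the parameter ranges.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        # b) Evaluation: lower fitness (e.g. RMSE) is better.
        scored = sorted(((fitness(ch), ch) for ch in pop), key=lambda t: t[0])
        history.append(scored[0][0])

        # c) Selection: binary tournament between two random chromosomes.
        def select() -> List[float]:
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] <= b[0] else b[1]

        new_pop = [list(scored[0][1])]  # elitism: carry over the best chromosome
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            # d) Crossover: arithmetic blend of the two parents.
            if rng.random() < crossover_rate:
                w = rng.random()
                child = [w * g1 + (1.0 - w) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            # d) Mutation: Gaussian perturbation scaled to each parameter range.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] += rng.gauss(0.0, 0.1 * (hi - lo))
            # Keep every gene inside its range.
            child = [min(max(g, lo), hi) for g, (lo, hi) in zip(child, bounds)]
            new_pop.append(child)
        pop = new_pop

    # e) Stop condition: a fixed generation budget; report the best chromosome.
    best_fit, best = min(((fitness(ch), ch) for ch in pop), key=lambda t: t[0])
    return best, best_fit, history


# Parameter ranges from the text: C in [1, 100], epsilon in [0.0001, 0.01], sigma in [0.1, 10].
SVR_BOUNDS = [(1.0, 100.0), (0.0001, 0.01), (0.1, 10.0)]


def toy_fitness(p: List[float]) -> float:
    """Hypothetical stand-in for the validation RMSE of an SVR trained with (C, eps, sigma)."""
    C, eps, sigma = p
    return ((C - 40.0) / 99.0) ** 2 + ((eps - 0.005) / 0.0099) ** 2 + ((sigma - 2.0) / 9.9) ** 2


best, best_fit, history = genetic_algorithm(toy_fitness, SVR_BOUNDS)
print(best, best_fit)
```

Because the best chromosome is carried over unchanged, the best fitness never worsens from one generation to the next. Since ε spans two orders of magnitude, searching over log10(ε) instead of ε is one possible alternative design.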

2.3 Wavelet Transform

Wavelets are mathematical functions that provide a time-scale representation of a time series and its relationships, which makes them suitable for analyzing time series that contain non-stationarities. The wavelet transform uses long time intervals for low-frequency information and shorter intervals for high-frequency information, and it can reveal aspects of the data such as the dominant modes of variability and how these modes vary in time (Adamowski and Sun 2010; Huaqi et al. 2011).

The continuous wavelet transform (CWT) of a signal x(t) is defined as follows:

$$ CW{T}_x^{\varPsi}\left(\tau,\;s\right)=\frac{1}{\sqrt{\left|s\right|}}{\displaystyle \underset{-\infty }{\overset{+\infty }{\int }}x(t){\varPsi}^{\ast}\left(\frac{t-\tau }{s}\right)\;}dt $$
(7)

where s is the scale parameter, τ is the translation parameter, and * denotes the complex conjugate. The mother wavelet Ψ(t) is the transforming function.

However, the CWT requires a significant amount of computation time and data. The discrete wavelet transform (DWT) was developed to overcome these drawbacks: it requires less computation time and is simpler to implement than the CWT (Adamowski and Chan 2011).

The wavelet basis function for the DWT can be derived as:

$$ \varPsi_{m,n}(t)=a^{-m/2}\,\varPsi\left(\frac{t-n\,\tau_0\,a^{m}}{a^{m}}\right) $$
(8)

where m and n are integers that control the scale and time, respectively; t is time; a is a specified fixed dilation step greater than 1; and \( \tau_0 \) is the location parameter, which must be greater than zero. The term \( a^{-m/2} \) in the above equation normalizes the functions.
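For illustration, a common choice (an assumption here, since the text does not state the values used) is the dyadic grid with \( a = 2 \) and \( \tau_0 = 1 \). Substituting these values into Eq. (8) gives the form below, in which the wavelet at scale m is dilated by \( 2^m \) and shifted in steps of \( 2^m \):

$$ \varPsi_{m,n}(t)=2^{-m/2}\,\varPsi\left(\frac{t-n\,2^{m}}{2^{m}}\right)=2^{-m/2}\,\varPsi\left(2^{-m}t-n\right) $$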

The DWT uses two sets of functions: high-pass and low-pass filters. The original time series is passed through the high-pass and low-pass filters, yielding detail coefficients and an approximation series (Adamowski and Sun 2010).
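The filter-bank idea can be illustrated with the Haar wavelet, the simplest mother wavelet. This is a sketch under assumptions: the text does not name the mother wavelet, the helper names are hypothetical, and boundary handling is avoided by requiring an even-length signal at every level. It performs a two-level decomposition into one approximation and two detail sets, matching the decomposition described for the wavelet GA-SVR inputs, but it returns downsampled coefficients rather than full-length reconstructed sub-series. General-purpose versions exist as `wavedec` in MATLAB's Wavelet Toolbox and `pywt.wavedec` in PyWavelets.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One DWT level: low-pass -> approximation, high-pass -> detail, both downsampled by 2."""
    if len(x) % 2:
        raise ValueError("this sketch requires an even-length signal")
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail


def haar_inverse_step(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Undo one level: rebuild the finer approximation from (approx, detail)."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def haar_wavedec(x: Sequence[float], level: int) -> List[List[float]]:
    """Multilevel decomposition; returns [A_level, D_level, ..., D_1]."""
    approx, details = list(x), []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.insert(0, detail)
    return [approx] + details


def haar_waverec(coeffs: Sequence[Sequence[float]]) -> List[float]:
    """Inverse of haar_wavedec."""
    approx = list(coeffs[0])
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


flows = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical monthly flows
A2, D2, D1 = haar_wavedec(flows, level=2)
print(A2, D2, D1)
```

The low-pass step averages neighbouring values and the high-pass step takes their differences, so the approximation carries the slowly varying part of the series and the details carry the fluctuations; the inverse step recovers the original series exactly.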

2.4 Model Performance Criteria

The results of the models applied in this study were evaluated by means of root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC), coefficient of efficiency (CE), and seasonally-adjusted coefficient of efficiency (SACE) between observed and forecasted monthly river flow.
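The five criteria can be computed as sketched below. CE is taken here as the Nash-Sutcliffe coefficient of efficiency. The text does not define SACE, so its form is an assumption: the benchmark is the mean observed flow of each calendar month instead of the overall mean, computed from the evaluated period itself. The function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def performance(obs: Sequence[float], pred: Sequence[float],
                months: Sequence[int]) -> Dict[str, float]:
    """RMSE, MAE, CC, CE and SACE between observed and forecasted flows."""
    n = len(obs)
    err = [o - p for o, p in zip(obs, pred)]
    sse = sum(e * e for e in err)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    ss_o = sum((o - mean_o) ** 2 for o in obs)
    ss_p = sum((p - mean_p) ** 2 for p in pred)

    # Benchmark for SACE (assumed form): the mean observed flow of each calendar month.
    by_month: Dict[int, List[float]] = defaultdict(list)
    for o, m in zip(obs, months):
        by_month[m].append(o)
    month_mean = {m: sum(v) / len(v) for m, v in by_month.items()}
    ss_seasonal = sum((o - month_mean[m]) ** 2 for o, m in zip(obs, months))

    return {
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in err) / n,
        "CC": cov / math.sqrt(ss_o * ss_p),
        "CE": 1.0 - sse / ss_o,  # Nash-Sutcliffe efficiency
        "SACE": 1.0 - sse / ss_seasonal,
    }


scores = performance(obs=[1.0, 2.0, 3.0, 4.0], pred=[1.5, 2.5, 2.5, 3.5], months=[1, 2, 1, 2])
print(scores)
```

By construction, a forecast equal to the observed mean scores CE = 0, so a negative CE means the forecasts are less accurate than that constant; SACE applies the same reading with the monthly means as the reference.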

3 Study Area and Data

In this study, monthly river flow data from the Kharjeguil station on the Nav River and the Ponel station on the Shafarud River in northern Iran were used. For both stations, the observed record covers 40 years (480 months), from October 1966 to September 2006. For both stations, the first 75 % of the data set is used for calibration/training and the remaining 25 % for testing/validation. The data are organized by water year, i.e. the first month of the year is October and the last is September.
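Because the data form a time series, the 75 %/25 % split is taken in time order rather than at random. For the 480-month record this gives 360 training months (30 water years, October 1966 to September 1996) and 120 testing months (October 1996 to September 2006). A minimal sketch; the function name is hypothetical and the series holds placeholder values.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_fraction: float = 0.75) -> Tuple[List[float], List[float]]:
    """Split a time series in time order: first part for training, the rest for testing."""
    n_train = round(len(series) * train_fraction)
    return list(series[:n_train]), list(series[n_train:])


# 480 monthly values, October 1966 to September 2006 (placeholder values).
record = [float(k) for k in range(480)]
train, test = chronological_split(record)
print(len(train), len(test))
```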

As the river flow in these watersheds is strongly seasonal, with a pronounced annual cycle, this information needs to be supplied to the model. We do so by including two time series among the inputs: one follows the oscillation of a sine curve and the other that of a cosine curve. The full annual cycle is represented by 12 pairs of values, one unique pair for each month of the year. These two inputs are referred to as time information.
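One way to build the two time-information inputs is sketched below. The text does not give the phase of the curves, so numbering the months 1 (October) to 12 (September) and using the angle 2πm/12 are assumptions, as is the function name. Because the pairs lie on a circle, September and the following October are as close as any two consecutive months, which a raw month number (12 versus 1) would not convey.

```python
import math
from typing import List, Tuple


def time_information(month: int) -> Tuple[float, float]:
    """Return (t1, t2) = (sin, cos) of the month's angle on the annual cycle.

    month: 1 = October, ..., 12 = September (water-year order, an assumed indexing).
    """
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


pairs: List[Tuple[float, float]] = [time_information(m) for m in range(1, 13)]
for m, (t1, t2) in enumerate(pairs, start=1):
    print(m, round(t1, 3), round(t2, 3))
```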

4 Application and Results

When developing an SVR model, the optimum parameter values are often chosen by trial and error. In this study, the genetic algorithm-support vector regression (GA-SVR) models combine two methods, a genetic algorithm (GA) and support vector regression (SVR), with the GA used to select the optimum parameters of the SVR. These models use as inputs the original river flow series and the two time-information inputs. The C, ε, and σ parameters of the optimum GA-SVR models are provided in Table 1.

Table 1 The optimum parameters of GA-SVR and wavelet GA-SVR models for both stations

For both stations, all GA-SVR models were first trained on the training set (the first 75 % of the data) to obtain the optimum parameter values and then validated on the remaining 25 % of the data. The GA-SVR models were then compared using the root mean square error (RMSE), mean absolute error (MAE), correlation coefficient (CC), coefficient of efficiency (CE), and seasonally-adjusted coefficient of efficiency (SACE). Table 2 shows these performance statistics of the GA-SVR models for both rivers, with preceding monthly flows and time information used to estimate the current flow value.

Table 2 The RMSE, MAE, CC, CE, and SACE statistics of GA-SVR and wavelet GA-SVR applications for both stations

For the Kharjeguil station, the GA-SVR model whose inputs are the flows of the two previous months along with time information has the best accuracy in the training period, but it does not generalize well; adding the flow of the second previous month appears to have caused the model to over-train. In contrast, the first GA-SVR model, whose inputs are the previous month's flow and time information, has the best accuracy in the testing period and could therefore be chosen as the best model for this station. Its forecast in the test period is shown in Fig. 2a. For the Ponel station, the statistics in Table 2 indicate that the first GA-SVR model has a higher generalization ability than the other model, so it could be chosen as the best model for this station. The best GA-SVR forecast for this station in the test period is shown in Fig. 2b.

Fig. 2 Forecasted and observed river flow in the test period for the best GA-SVR model and the best wavelet GA-SVR model: (a) Kharjeguil station, (b) Ponel station

In this study, the wavelet genetic algorithm-support vector regression (wavelet GA-SVR) models use as inputs the sub-series components (D and A, denoting the detail and approximation sub-series, respectively) obtained by applying the DWT to the original river flow data, together with the two time-information inputs (Fig. 3). Here, each previous-month flow series is decomposed into one approximation and two detail sub-series using functions available in MATLAB. Each sub-series component contains distinct information about the original river flow data. In this research, the coupled wavelet transform and GA-SVR models are referred to as wavelet GA-SVR models (with "wavelet" referring to the wavelet transform and "GA-SVR" to the coupled genetic algorithm and support vector regression). The C, ε, and σ parameters of the optimum wavelet GA-SVR models are provided in Table 1.

Fig. 3 The flow chart of the combined wavelet GA-SVR model

For both stations, the wavelet GA-SVR models were then compared using RMSE, MAE, CC, CE, and SACE. Table 2 shows the performance statistics of the wavelet GA-SVR models for both rivers, with sub-series components of preceding monthly river flows and the two time-information inputs used to estimate the current flow value. In this table, \( D_{t-1} \), \( D_{t-2} \), \( A_{t-1} \), \( A_{t-2} \), \( Q_{t-1} \), and \( Q_{t-2} \) denote the D sub-series at time t − 1, the D sub-series at time t − 2, the A sub-series at time t − 1, the A sub-series at time t − 2, the river flow at time t − 1, and the river flow at time t − 2, respectively. The variables \( t_1 \) and \( t_2 \) denote the two time-information inputs.
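The input vectors described above can be assembled as sketched below. The helper name and the toy numbers are hypothetical, and the sketch assumes each sub-series (D and A) is already available as a list aligned with the flow series Q. How the two detail sub-series relate to the single D input listed in Table 2 is not stated in the text, so the sketch does not address it. Passing `{"Q": Q}` as the only series, with `lags=(1,)`, gives the inputs of the first regular GA-SVR model instead.

```python
from typing import Dict, List, Sequence, Tuple


def build_lagged_inputs(subseries: Dict[str, Sequence[float]],
                        time_info: Dict[str, Sequence[float]],
                        target: Sequence[float],
                        lags: Sequence[int] = (1, 2)) -> Tuple[List[List[float]], List[float]]:
    """One row per month t: lagged sub-series values, then time information at t."""
    rows: List[List[float]] = []
    ys: List[float] = []
    for t in range(max(lags), len(target)):
        row = [subseries[name][t - lag] for name in subseries for lag in lags]
        row += [time_info[name][t] for name in time_info]
        rows.append(row)
        ys.append(target[t])
    return rows, ys


# Hypothetical toy values; the column order is D(t-1), D(t-2), A(t-1), A(t-2), t1, t2.
D = [10.0, 11.0, 12.0, 13.0]
A = [20.0, 21.0, 22.0, 23.0]
t1 = [0.5, 0.87, 1.0, 0.87]
t2 = [0.87, 0.5, 0.0, -0.5]
Q = [1.0, 2.0, 3.0, 4.0]
X, y = build_lagged_inputs({"D": D, "A": A}, {"t1": t1, "t2": t2}, Q)
print(X[0], y[0])
```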

For the Kharjeguil station, the wavelet GA-SVR model whose inputs are \( D_{t-1} \), \( D_{t-2} \), \( A_{t-1} \), \( A_{t-2} \), \( t_1 \), and \( t_2 \) has the best accuracy in both the training and validation periods. Using the sub-series obtained from the wavelet transform clearly improved the performance of the models significantly. At this station, the wavelet GA-SVR model with these six inputs could be chosen as the best model for forecasting river flow. Its forecast in the test period is shown in Fig. 2a, where the wavelet GA-SVR model performs much better than the regular GA-SVR model. For the Ponel station, the statistics in Table 2 indicate that the wavelet GA-SVR model with the same six inputs has higher accuracy and generalization ability than the other model, so it could be chosen as the best model for this station. The best wavelet GA-SVR forecast in the test period is shown in Fig. 2b; the hydrographs show that the wavelet GA-SVR model performs much better than the regular GA-SVR model.

Overall, for both stations the wavelet GA-SVR models provided more accurate forecasts than the regular GA-SVR models. For the Kharjeguil station, the best regular GA-SVR model had a validation RMSE of 2.15, MAE of 1.41, CC of 0.491, CE of −0.008, and SACE of −0.560; with the best wavelet GA-SVR model these improved to 1.42, 0.858, 0.754, 0.561, and 0.321, respectively. However, including the wavelet transform did not improve the estimation of the peak value in the validation set. For the Ponel station, the best regular GA-SVR model had a validation RMSE of 2.49, MAE of 1.71, CC of 0.430, CE of −0.005, and SACE of −0.451; with the best wavelet GA-SVR model these improved to 1.59, 0.928, 0.773, 0.589, and 0.407, respectively. For the first model, including the wavelet transform slightly improved the estimation of the peak value in the testing set.
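As a check on the size of these gains, the relative reductions in the validation errors follow directly from the values quoted above (simple arithmetic, rounded to three decimals):

$$ \text{Kharjeguil: }\ \frac{2.15-1.42}{2.15}\approx 0.340\ (\text{RMSE}),\qquad \frac{1.41-0.858}{1.41}\approx 0.391\ (\text{MAE}) $$

$$ \text{Ponel: }\ \frac{2.49-1.59}{2.49}\approx 0.361\ (\text{RMSE}),\qquad \frac{1.71-0.928}{1.71}\approx 0.457\ (\text{MAE}) $$

That is, the wavelet GA-SVR models reduced the validation RMSE by about 34 % to 36 % and the MAE by about 39 % to 46 %.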

It should be noted that although adding inputs to the regular GA-SVR models improved their performance in the training phase, it deteriorated their generalization ability. In contrast, adding inputs to the wavelet GA-SVR models improved their performance in both the training and testing periods. This indicates that the wavelet transform can supply the models with higher-quality data by decomposing the original series into several sub-series components that contain more useful information than the original time series.

5 Conclusions

In this study, the potential of wavelet genetic algorithm-support vector regression (wavelet GA-SVR) models was investigated for forecasting monthly river flow series for two rivers in northern Iran. The wavelet GA-SVR models were developed by combining two methods, namely the discrete wavelet transform and genetic algorithm-support vector regression. In the developed models, the genetic algorithm is applied to select the optimal parameters of the support vector regression (SVR) model. The models were tested using different input combinations of monthly river flow data. In addition, two input variables, referred to in this study as time information, were used in developing the models. The wavelet GA-SVR models were compared with regular GA-SVR models whose inputs are previous monthly river flows and time information. Forecasting monthly river flows with the wavelet GA-SVR models was found to improve significantly on the accuracy of the regular GA-SVR models. This improvement is attributed to the important information carried by the sub-series components used as inputs to the wavelet GA-SVR models.

With regard to the objectives of the study, the following was determined:

a) The wavelet GA-SVR model forecasts monthly river flow with higher accuracy than the regular GA-SVR models.

b) The search for the optimum values of the SVR model parameters was solved by combining the model with the genetic algorithm.

c) Using all wavelet-decomposed sub-series as inputs to the SVR models improved the accuracy of the forecasting models.

These results indicate that the wavelet GA-SVR models are a more promising method than the regular GA-SVR models for forecasting monthly river flow.