1 Introduction

Reliable streamflow forecasts are of great importance in the planning and management of water resources. In particular, operations related to reservoirs and hydroelectric power generation rely on long-term streamflow forecasts. Streamflow depends on many observed factors such as precipitation, evapotranspiration and ambient temperature, and on many unobserved factors such as catchment memory, soil characteristics and the degree of urbanization, making a streamflow time series highly nonlinear, time varying and stochastic in nature. Because of this strong nonlinearity, producing accurate streamflow forecasts has always been a challenge for hydrologists. In the past, traditional statistical models such as the autoregressive, moving average and autoregressive moving average models were widely applied to these nonlinear problems. Such models perform well when the data lie within the range of past observations, but they perform poorly in predicting extremes and values close to those limits. With the advent of artificial neural networks (ANNs), and inspired by their strong nonlinear mapping ability, many researchers applied them in diverse fields, and there are numerous applications of ANNs in water resources. ANNs give good forecasts for simpler problems; however, they are prone to local convergence and slow learning and therefore could not achieve satisfactory performance on complex hydrological processes. Recently, support vector machines (SVMs) have emerged as a successful computational technique that overcomes many of the drawbacks of ANNs and has found numerous applications in different areas of science and engineering [1–4]. In the field of hydrology, researchers have applied SVMs extensively: Sivapragasam et al. [5] applied SVM to rainfall–runoff forecasting, Yu et al. [6] employed SVM to predict the rainfall runoff of the Tryggevælde catchment, Khadam et al. [7] used SVM to describe the relative uncertainty of calibration data in hydrologic models, and Asefa et al. [8] used SVM for multiscale streamflow predictions.

Despite the increasing popularity of SVM, its use is still largely confined to 'expert' users, because the performance of an SVM model depends on the appropriate selection of its hyper-parameters. Thus, the main concern is to find proper parameter values for a given data set in SVM regression that ensure good generalization performance. While existing studies on SVM regression [9–11] provide some recommendations for setting the SVM parameters, there is hardly a general consensus, and many contradictory opinions exist regarding these settings. Hence, many applications use re-sampling as a possible method for determining the SVM parameters; however, re-sampling for tuning several SVM parameters is computationally very expensive. The basic objective of this study is therefore to determine a practical means of obtaining the optimal parameter values of SVM. In this study, particle swarm optimization (PSO) is adapted to find the SVM parameters. Being a heuristic technique, PSO explores the search space efficiently and finds the best set of parameters for the SVM.

This paper is organized as follows. Section 2 gives a brief account of support vector machines and the computation of their parameters, and Sect. 3 describes the adaptation of the PSO algorithm for estimating the optimal SVM parameters. In Sect. 4, the proposed model is applied to two stations and the results are compared with those of the traditional autoregressive moving average (ARMA) model and artificial neural networks (ANNs). Finally, the conclusions are presented in Sect. 5.

2 Support vector machines

The support vector machine (SVM) is a promising technique for data classification and regression. This section gives a brief account of SVM. Let \(\{(x_{1}, y_{1}),\ldots,(x_{\ell}, y_{\ell})\}\) be the given training data set, where each \(x_{i} \in R^{n}\) is an input vector with a corresponding target value \(y_{i} \in R\) for \(i=1,\ldots,\ell\), and \(\ell\) is the size of the training data. Support vector regression solves the optimization problem:

$$ \hbox{minimize}\quad \frac{1}{2} \left\| w\right\| ^{2}+C\sum _{i=1}^{\ell }(\xi_{i}+\xi_{i}^{*}) $$
(1)
$$ \hbox{subject to}\quad \left\{ \begin{array}{ll} y_{i}-\langle w, \phi(x_{i})\rangle-b\leq\epsilon+\xi_{i} \\ \langle w, \phi(x_{i})\rangle+b-y_{i}\leq\epsilon+\xi_{i}^{*} \\ \xi_{i},\xi_{i}^{*} \geq 0,\quad i=1,\ldots,\ell \end{array} \right. $$

where \(x_i\) is mapped to a higher dimensional space by the function \(\phi\), \(\xi_i\) is the upper training error (\(\xi_i^*\) the lower) with respect to the \(\epsilon\)-insensitive tube \(\left| y_i-\langle w, \phi(x_i)\rangle-b \right| \leq\epsilon\). The parameters which control the regression quality are the cost of error C, the width of the tube \(\epsilon\) and the mapping function \(\phi\). The constraints imply that we would like to put most data \(x_i\) inside the tube. If \(x_i\) is not in the tube, there is an error \(\xi_i\) or \(\xi_i^*\), which we would like to minimize in the objective function. SVM avoids underfitting and overfitting the training data by minimizing the training error \(C\sum_{i=1}^{\ell}(\xi_i+\xi_i^*)\) together with the regularization term \(\frac{1}{2} \left\| w\right\| ^{2}\). In traditional least squares regression, \(\epsilon\) is always zero and the data are not mapped into higher dimensional spaces; hence, SVM is a more general and flexible treatment of regression problems.
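To make Eq. 1 concrete, the following toy sketch evaluates the \(\epsilon\)-insensitive errors and the objective for a candidate linear model; the data values, w and b below are illustrative assumptions and not part of the original formulation.

```python
import numpy as np

# Toy evaluation of the objective in Eq. 1 for a candidate linear model
# f(x) = <w, x> + b (no kernel mapping in this sketch).
X = np.array([[0.1], [0.4], [0.7], [1.0]])   # assumed toy inputs
y = np.array([0.15, 0.35, 0.75, 0.95])       # assumed toy targets
w, b = np.array([1.0]), 0.0                  # candidate regression parameters
C, eps = 10.0, 0.1                           # cost of error and tube width

f = X @ w + b
xi_upper = np.maximum(0.0, y - f - eps)      # xi_i:  errors above the tube
xi_lower = np.maximum(0.0, f - y - eps)      # xi_i*: errors below the tube
objective = 0.5 * float(w @ w) + C * np.sum(xi_upper + xi_lower)
print(objective)
```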

2.1 Computation of SVM parameters

Several kernel functions are used in SVM. Dibike et al. [12] demonstrated that the radial basis function (RBF) outperforms other kernel functions after testing different kernels in SVM for rainfall–runoff modeling. Da wei et al. [13] also suggest that the regression process can be modeled more effectively using the RBF because of its centralized feature. In addition, many researchers report the use of SVM in hydrological modeling and forecasting and find that the radial basis function performs well [14–16]. The RBF is therefore adopted in this study as well and is expressed as

$$ k(x_{i} ,x)=\exp \left\{-\gamma \left|x-x_{i} \right|^{2} \right\} $$
(2)

where γ is the parameter of the RBF, which controls the width of the kernel. In general, γ varies from 0 to 1.
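As a small illustration (a sketch, not the authors' code), Eq. 2 can be written as a short function; the function name and test vectors are assumed for this example.

```python
import numpy as np

def rbf_kernel(x_i: np.ndarray, x: np.ndarray, gamma: float) -> float:
    """RBF kernel of Eq. 2: exp(-gamma * |x - x_i|^2)."""
    diff = x - x_i
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Similarity of two (toy) lagged-flow feature vectors.
print(rbf_kernel(np.array([0.2, 0.5]), np.array([0.25, 0.45]), gamma=0.5))
```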

The SVM model used herein has three mutually dependent parameters, namely \(C, \epsilon\) and \(\gamma\); the effect of changing one parameter depends on the values of the others. A simultaneous or global optimization scheme such as PSO can therefore be helpful [3] in determining the parameters of SVM. The general procedure of the SVM-PSO method is illustrated in the flow chart given in Fig. 1.

Fig. 1 Flow chart of PSO algorithm in SVM parameter selection

A common way to estimate the SVM parameters (\(C, \epsilon\) and γ) is to separate the data into two sets, namely a training data set and a validation data set. The prediction accuracy on the validation data set reflects the accuracy of the model, and the SVM parameters that give the minimum prediction error are considered to be the optimal parameters. However, this procedure carries a risk of overfitting. Overfitting occurs when the performance error of the model is very small during training, yet the model makes wild predictions when a new data set is used for simulation. In the present study, a k-fold cross-validation method is employed to check the generalization ability of the model and to select the appropriate input vector for the time series. In k-fold cross-validation, the data are segmented into k sub-samples (k = 5 for the present study [17]). Of the k sub-samples, a single sub-sample is retained as validation data for testing the model, and the remaining k − 1 sub-samples comprise the training data. The cross-validation process is then repeated k times, with each of the k sub-samples used exactly once as validation data.
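For illustration only, a minimal sketch of the five-fold procedure is given below using scikit-learn's KFold and SVR; the function name, the SVR backend and the NMSE scoring (formally defined in Eq. 3) are assumptions of this sketch rather than the authors' implementation.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVR

def cv_nmse(X, y, C, epsilon, gamma, k=5):
    """Average NMSE (Eq. 3) over k folds for one (C, epsilon, gamma) triple."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=k).split(X):
        model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        n = len(val_idx)
        nmse = ((n - 1) / n) * np.sum((y[val_idx] - pred) ** 2) \
            / np.sum((y[val_idx] - y[val_idx].mean()) ** 2)
        scores.append(nmse)
    return float(np.mean(scores))
```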

3 Particle swarm optimization technique in selecting the parameters of SVM

Swarm optimization, like genetic algorithms, is an optimization technique based on the metaphor of social behavior. Particle swarm optimization (PSO) is a member of a wide category of swarm intelligence-based methods used for solving global optimization problems [18–21]. PSO is based on the simulation of simplified social models, such as bird flocking, fish schooling and swarm theory. A flow diagram of the entire process is shown in Fig. 1. Initially, the upper and lower bounds of the three SVM parameters \(\gamma, \epsilon\) and C are specified. Values of the three SVM parameters are then generated randomly within these bounds for each particle and fed into the SVM model. Next, the fitness function is evaluated. In this study, the normalized mean square error (NMSE) serves as the fitness criterion for identifying suitable parameters for the SVM model. The NMSE value of each particle is determined using the fitness function (Eq. 3).

$$ \text{NMSE}= \frac{n-1}{n}\,\frac{\sum^n_{i=1} \left[(Q_m)_i-(Q_s)_i\right]^2}{\sum^n_{i=1} \left[(Q_m)_i-\tilde{Q}_m\right]^2} $$
(3)

where Q is the streamflow value, the subscripts 'm' and 's' represent the measured and simulated values, respectively, a tilde above a variable denotes its average value, and n is the total number of training records.
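A direct, minimal translation of Eq. 3 into code (the function name is assumed for illustration):

```python
import numpy as np

def nmse(q_measured: np.ndarray, q_simulated: np.ndarray) -> float:
    """Normalized mean square error of Eq. 3."""
    n = len(q_measured)
    num = np.sum((q_measured - q_simulated) ** 2)
    den = np.sum((q_measured - q_measured.mean()) ** 2)
    return float((n - 1) / n * num / den)
```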

The fitness evaluation of each particle is then compared with its pbest value. If the current value is better than pbest, the pbest value is set equal to the current value and the pbest location to the current location in the parameter space. The fitness evaluation is then compared with the overall previous best of the population; if the current value is better than gbest, gbest is reset to the current particle's value and location. The velocity and position of the particle are then updated according to

$$ v_{ij}^{t} = \left[\omega v_{ij}^{t-1} +c_{1} r_{1} \left(pbest_{ij}^{t-1} -x_{ij}^{t-1} \right)+c_{2} r_{2} \left(gbest_{j}^{t-1} -x_{ij}^{t-1} \right) \right] $$
(4)
$$ x_{ij}^{t} =x_{ij}^{t-1} +v_{ij}^{t} $$
(5)

where \(c_1\) and \(c_2\) are the acceleration constants and \(r_1\) and \(r_2\) are random real numbers between 0 and 1. Thus, the particle flies through the potential solution space toward pbest and gbest in a guided manner, while still exploring new areas through a stochastic mechanism to escape from local optima. \(\omega\) is the inertia weight, which controls the impact of the previous velocity on the current one. The NMSE value is determined repeatedly until a stopping criterion is met, usually a sufficiently good fitness or a maximum number of allowed iterations.
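The update rules of Eqs. 4 and 5 and the pbest/gbest bookkeeping can be sketched as follows; the bounds, swarm size, iteration count and placeholder fitness function are illustrative assumptions, and in the actual SVM-PSO model the fitness would be the cross-validated NMSE of Eq. 3.

```python
import numpy as np

def fitness(p):
    # Placeholder objective so this sketch runs stand-alone; in the SVM-PSO
    # model it would be the cross-validated NMSE of an SVR trained with
    # C = p[0], epsilon = p[1], gamma = p[2].
    return float(np.sum((p - np.array([100.0, 0.1, 0.5])) ** 2))

rng = np.random.default_rng(0)
lower = np.array([1.0, 0.001, 0.01])    # assumed lower bounds for (C, epsilon, gamma)
upper = np.array([1000.0, 1.0, 1.0])    # assumed upper bounds
n_particles, n_iter = 20, 50
c1 = c2 = 2.0                           # acceleration constants (see Sect. 3.1)

x = rng.uniform(lower, upper, size=(n_particles, 3))   # particle positions
v = np.zeros_like(x)                                    # particle velocities
pbest_x, pbest_f = x.copy(), np.array([fitness(p) for p in x])
gbest_x = pbest_x[pbest_f.argmin()].copy()

for t in range(n_iter):
    w = 1.0 - t / n_iter                                # inertia weight, 1 -> ~0
    r1 = rng.random((n_particles, 1))
    r2 = rng.random((n_particles, 1))
    v = w * v + c1 * r1 * (pbest_x - x) + c2 * r2 * (gbest_x - x)   # Eq. 4
    x = np.clip(x + v, lower, upper)                                # Eq. 5
    f = np.array([fitness(p) for p in x])
    better = f < pbest_f
    pbest_x[better], pbest_f[better] = x[better], f[better]
    gbest_x = pbest_x[pbest_f.argmin()].copy()

print(gbest_x)   # best (C, epsilon, gamma) found by the sketch
```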

3.1 Parameters of PSO

The important parameters of the PSO used in this model are given in Table 1. In this case, it has been assumed that the acceleration coefficients are equal, \(c_1 = c_2\). Figure 2 depicts the variation in the coefficient of correlation as the acceleration constant increases. The acceleration constant is varied from 0 to 10, and the performance of SVM-PSO is checked on a testing data set in terms of its R and NMSE values. It is found that as the acceleration coefficients increase, the R value increases until it reaches a maximum of 0.93 at an acceleration coefficient of 2; thereafter, the R value decreases again as the acceleration coefficient increases. Thus, the value of \(c_1\) and \(c_2\) is taken to be 2.0 in this case. Further, Fig. 2 shows that the NMSE value decreases with an increase in the acceleration coefficient. The inertia weight ω is set to vary linearly from 1 to near 0 during the course of an iteration run.

Fig. 2 Sensitivity analysis of the SVM-PSO model at different acceleration coefficients

4 Case study and simulation results

In this study, the monthly streamflow data of the Swan River near Bigfork and the St. Regis River near Clark Fork were used to evaluate the performance of the proposed SVM-PSO model. The locations of the Bigfork and Clark Fork stations are shown in Fig. 3.

Fig. 3 The study area

The St. Regis River near Clark Fork has a drainage area of 10,709 mile², whereas the Swan River basin near Bigfork has a total drainage area of 671 mile². The observed data are 80 years (960 months) long, with an observation period from 1930 to 2010 for both stations. A total of 75 % of the data are used for the fivefold cross-validation test, and the remaining 25 % are used for testing. The testing data remain unseen by the model until the entire training program is completed.

4.1 SVM model development

The SVM model aims to develop a relationship of the form

$$ Z^m=f(X^n) $$
(6)

where X n is an n-dimensional input vector comprising the variables \(x_1, x_2, x_3, \ldots, x_n\) and Z m is an m-dimensional output vector consisting of the resulting variables of interest \(y_1, y_2, y_3, \ldots, y_m\). In modeling the streamflows, the inputs \(x_i\) are streamflow values at different lags, and the output \(y_i\) is the streamflow of the next time step. However, the number of antecedent streamflow values to be included in the vector X n is not known a priori.

Therefore, in this study, the proper input data set is identified by developing various models with different combinations of the streamflow values at several time lags. The input vector is modified each time by successively adding a streamflow value at one more time lag, leading to the development of a new SVM model. The appropriate input vector is identified by comparing the coefficient of correlation, the efficiency and the NMSE. Three SVM models were developed with different sets of input variables, as follows (a sketch of the corresponding lagged-input construction is given after the list):

  • Model 1 Q(t) = f[Q(t − 1)]

  • Model 2 Q(t) = f[Q(t − 1), Q(t − 2)]

  • Model 3 Q(t) = f[Q(t − 1), Q(t − 2), Q(t − 3)]
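A minimal sketch of how the lagged input matrices for Models 1–3 can be built from a monthly streamflow series; the function name and the toy series are assumptions for illustration.

```python
import numpy as np

def lagged_inputs(q: np.ndarray, n_lags: int):
    """Build X = [Q(t-1), ..., Q(t-n_lags)] and target y = Q(t)."""
    X = np.column_stack([q[n_lags - k - 1: len(q) - k - 1] for k in range(n_lags)])
    y = q[n_lags:]
    return X, y

q = np.array([3.0, 2.5, 4.1, 5.0, 4.4, 3.8])   # toy monthly flows
X3, y3 = lagged_inputs(q, n_lags=3)             # inputs and targets for Model 3
```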

4.1.1 Determining parameters of SVM models

The SVM parameters were estimated for the different models using the PSO technique, as described in Sect. 2.1. Since the bounds of the SVM parameters are not known a priori, a coarse range search is made first to find the most promising region of the SVM parameters; performing a complete fine grid search over the whole space would be time-consuming. Once the coarse search has identified this region, a fine search is performed within it. The ranges of parameters used for the coarse and fine searches are given in Table 2.
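One possible way to realize this two-step (coarse, then fine) search is sketched below; the bounds, the 10 % narrowing factor and the pso_search placeholder are illustrative assumptions and are not the ranges of Table 2.

```python
import numpy as np

def pso_search(lower, upper):
    # Placeholder for the PSO loop sketched in Sect. 3; it would return the
    # best (C, epsilon, gamma) found inside the given bounds. Here it simply
    # returns the mid-point so the example runs stand-alone.
    return (lower + upper) / 2.0

# Step 1: coarse search over wide, assumed bounds for (C, epsilon, gamma).
coarse_lower = np.array([0.1, 1e-4, 1e-3])
coarse_upper = np.array([1e4, 1.0, 1.0])
best_coarse = pso_search(coarse_lower, coarse_upper)

# Step 2: fine search in a narrowed region around the coarse optimum
# (the 10 % narrowing factor is an arbitrary illustrative choice).
span = 0.1 * (coarse_upper - coarse_lower)
fine_lower = np.maximum(coarse_lower, best_coarse - span)
fine_upper = np.minimum(coarse_upper, best_coarse + span)
best_fine = pso_search(fine_lower, fine_upper)
```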

Table 1 Parameters used in the PSO model
Table 2 Coarse and fine range partitions of SVM parameters taken during two-step optimization

Three models were evaluated for the Swan River near Bigfork station to predict the current streamflow values, and PSO was used to find the parameters for all three cases. The optimal set of SVM parameters obtained from the PSO algorithm for the Swan River near Bigfork station is given in Table 3. The NMSE and R statistics of the SVM-PSO models in training and testing are given in Table 5. The table indicates that the SVM-PSO model whose inputs were the flows of the three previous months (Q(t) = f[Q(t − 1), Q(t − 2), Q(t − 3)]) had the best accuracy in the training, validation and testing periods.

Table 3 Optimal SVM parameters obtained from PSO for different models for Swan River near Bigfork station
Table 4 Optimal SVM parameters obtained from PSO for different models for St. Regis River near Clark Fork station
Table 5 The NMSE and R statistics of SVM-PSO application for Swan River near Bigfork station

Likewise, three SVM-PSO models were developed for the St. Regis River near Clark Fork station to estimate the current streamflow value. Table 4 lists the parameters of the SVM models obtained from the PSO technique after optimizing Eq. 3. The NMSE and R statistics were then computed and are presented in Table 6. The table indicates that the SVM-PSO model with three antecedent streamflows (Q(t) = f[Q(t − 1), Q(t − 2), Q(t − 3)]) shows better accuracy than the other models.

Table 6 The NMSE and R statistics of SVM-PSO application for St. Regis River near Clark Fork Station

4.2 Models for comparing forecast performance

The normalized mean square error (NMSE), as given in Eq. 3, is used as the measure of forecasting accuracy. In addition, the coefficient of correlation (R) is given by Eq. 7.

$$ R=\frac{\sum^n _{i=1}[(Q_m)_i-(\tilde{Q}_m)][(Q_s)_i-(\tilde{Q}_s)]}{\sqrt{\sum^n _{i=1}[(Q_m)_i-(\tilde{Q}_m)]^2 \ast{\sum^n _{i=1}[(Q_s)_i-(\tilde{Q}_s)]^2}}} $$
(7)

where Q is the streamflow value, the subscripts 'm' and 's' represent the measured and simulated values, respectively, a tilde above a variable denotes its average value, and n is the total number of training records.
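For completeness, Eq. 7 translates directly into code (the function name is assumed; np.corrcoef would give the same Pearson coefficient):

```python
import numpy as np

def correlation_r(q_measured: np.ndarray, q_simulated: np.ndarray) -> float:
    """Coefficient of correlation of Eq. 7."""
    dm = q_measured - q_measured.mean()
    ds = q_simulated - q_simulated.mean()
    return float(np.sum(dm * ds) / np.sqrt(np.sum(dm ** 2) * np.sum(ds ** 2)))
```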

The forecasting accuracy of the proposed SVM-PSO model is compared with that of the traditional ARMA model and of an ANN. The performance of the SVM-PSO and ANN models is compared in predicting the streamflow values with a one-month lead. For a direct comparison, the ANN model is trained using the same training data set (Q(t) = f[Q(t − 1), Q(t − 2), Q(t − 3)]) as the SVM-PSO for both stations.

SVM-PSO, ANN and ARMA were used to forecast the one-month-lead values for the Bigfork station. In the testing phase, SVM-PSO forecasts the streamflow values with an NMSE of 0.13 and a correlation coefficient of 0.86, whereas ANN predicts with an NMSE of 0.45 and an R value of 0.82; ARMA has an NMSE of 0.79 and an R value of 0.76 for the Swan River. Even though the correlation coefficient (R) is almost the same for all three models, there is a substantial difference in the NMSE values; hence, it is always advisable to examine the mean square error when comparing the performance of algorithms. The results show that the SVM-PSO model predicts the streamflow values with a lower NMSE and a higher correlation coefficient, thus performing better than ARMA and ANN. Figures 4, 5 and 6 compare the measured and predicted streamflow values on the testing data for the ARMA, ANN and SVM-PSO methods. In addition, the mean absolute peak prediction error was calculated. Table 7 lists the error involved in predicting the peak streamflows by each method and shows that the SVM-PSO model was able to predict the peaks more accurately than ANN and ARMA. The performance of the various models is summarized in Table 8.

Fig. 4 Comparison between simulated and actual streamflow values for testing data of Swan River near Bigfork station (ARMA model)

Table 7 The comparison of peak estimates predicted by different models for the testing period of Swan River
Table 8 Comparison of R and NMSE values for Swan River and St. Regis River
Table 9 The comparison of peak estimates predicted by different models for the testing period of St. Regis River

Similarly, the performance of the various models at the Clark Fork station is summarized in Table 8. SVM-PSO obtained the minimum NMSE of 0.193 compared to ANN and ARMA, showing that SVM-PSO captures the underlying dynamics of the streamflow values more closely than ANN and ARMA; this is confirmed by the correlation values. Figures 7, 8 and 9 depict the measured and predicted streamflow values obtained with the ARMA, ANN and SVM-PSO models on the testing data. The mean absolute peak prediction error was also calculated to determine which method predicts the peaks most closely. From Table 9, it is observed that SVM-PSO predicts the peaks with an error of 0.24, whereas the ARMA model predicts the peaks with an error of 0.355 and the ANN with an error of 0.266. The nonlinear mapping ability and the proper selection of parameters therefore make the SVM-PSO successful in streamflow forecasting.

Fig. 5 Comparison between simulated and actual streamflow values for testing data of Swan River near Bigfork station (ANN model)

Fig. 6 Comparison between simulated and actual streamflow values for testing data of Swan River near Bigfork station (SVM-PSO model)

Fig. 7 Comparison between simulated and actual streamflow values for testing data of St. Regis River near Clark Fork station (ARMA model)

Fig. 8 Comparison between simulated and actual streamflow values for testing data of St. Regis River near Clark Fork station (ANN model)

Fig. 9 Comparison between simulated and actual streamflow values for testing data of St. Regis River near Clark Fork station (SVM-PSO model)

5 Conclusions

A hybrid model based on the combination of SVM and PSO is proposed in this study to improve forecasting performance. SVM operates on the principle of structural risk minimization rather than the minimization of the training errors alone, and PSO was employed to select the appropriate SVM parameters and thereby enhance the forecasting accuracy. This combination enables the proposed SVM-PSO model to perform better than the other traditional models considered here. Furthermore, this study shows that the proposed SVM-PSO offers a valid alternative for applications in hydrology. In this study, only streamflow values were used for the analysis; in future work, other hydrological variables such as rainfall and temperature can be included in the prediction of streamflow values. The hybrid-model idea can also be applied in other areas, such as weather forecasting and rainfall–runoff forecasting, to check the general usability of the proposed model. Further, enhanced versions of the PSO technique [22] can be adopted to choose the SVM parameters.