Introduction

Stream-flow modeling is essential for the management of water resources (Yaseen et al. 2015b). Precise river flow forecasts are needed for the design of hydraulic structures (e.g., dams, flood gates, and irrigation schemes) that are inherently associated with stream-flow, and for the study of flood and drought impacts. Such forecasts give decision-makers important information for rural and urban development projects, environmental impact assessments, the design and management of irrigation schemes, and reservoir operations (Badrzadeh et al. 2013). Accurate and reliable predictive models that extract quantitative information from the antecedent patterns embedded in stream-flow data can help authorities administer water reservoirs optimally for water management, hydropower generation, agriculture, and domestic and industrial water planning (Kisi and Cimen 2011).

There are two general categories of river flow forecasting models: (1) process-driven (physically or hydrologically based) models and (2) data-driven (soft computing-based) models (Wang et al. 2006). The first category of models has been developed to describe and represent the physical processes in terms of relatively complex, deterministic mathematical equations, whereas the second category is purely of a black-box nature and does not require an understanding of the underlying physical processes that govern the phenomenon. Instead, data-driven models use machine learning algorithms to extract pertinent patterns and attributes from the data in order to forecast stream-flow values. The only requirement of data-driven models is a set of hydrological variables related to stream-flow (e.g., rainfall, evaporation, etc.) that can provide the features for predictive modeling. With a sufficient amount of data, empirical relationships are developed from the calibration dataset, which gives these models distinct advantages in terms of simplicity, low computational cost, and competitive performance relative to process-driven models.

Recently, the use of data-driven techniques to forecast stream-flow has received particular consideration from water resource specialists and researchers (Prairie et al. 2006; Toth and Brath 2007; Kashid et al. 2010; Kuo et al. 2010; Ni et al. 2010; Guimarães Santos and Silva 2014; Makwana and Tiwari 2014; Taormina and Chau 2015). These models, which were applied in a number of geographically diverse hydro-climatic zones, have shown good capability to generate reasonably accurate modeling results of stream-flow (Kumar et al. 2016). Among the primary predictive methods that have been considered lately, support vector machine (SVM) and artificial neural network (ANN) models have become prominent due to their broad application to diverse scientific domains (Yaseen et al. 2015a).

In general, the ANN models applied in hydrological modeling include (1) the generalized regression neural network (GRNN), (2) the radial basis function (RBF), and (3) the feed-forward back propagation (FFBP) models (Nourani et al. 2014). Over the last two decades, the application of FFBP, RBF, and GRNN models has increased, especially in the hydrological sciences (Fahimi et al. 2016). In comparison with many statistical modeling techniques, various forms of ANN models have shown significantly superior accuracy, particularly in applications such as stream-flow discharge forecasting, prediction of surface runoff and floods, and water level prediction (Kagoda et al. 2010; Shiri and Kisi 2010). In a recent study, Tayyab et al. (2016) investigated stream-flow discharge forecasting for the Jinsha River Basin in China by applying variants of the artificial neural network model, namely FFBPNN, RBFNN, and GRNN. The results showed that the ANN models outperformed the statistical autoregressive (AR) model. Across a number of test cases, the FFBPNN model was superior to all other models; however, in most cases, the GRNN model performed better than the RBFNN. In this paper, the GRNN model has been adopted for stream-flow forecasting.

Similar to an ANN model, an SVR model, which uses a different modeling framework based on kernel functions for feature extraction, has good potential to analyze unknown relationships between a set of input variables and the objective variable (Raghavendra and Deka 2014). Basically, the SVR model solves forecasting and prediction problems by means of pattern recognition techniques based on structural risk minimization and therefore helps avoid overfitting the dataset (Liu and Lu 2014). Consequently, the SVR model has been applied in hydrology and environmental applications in recent decades (Ch et al. 2013; Deo et al. 2016). Most recently, Wen et al. (2015) used limited climatic datasets to forecast daily reference evapotranspiration by developing an SVR model. Their results showed that the SVR model was superior to well-known empirical models that are commonly used (e.g., the Priestley–Taylor, Hargreaves, and Ritchie models). In temperate and arid climate zones, Kim et al. (2012) tested multilayer perceptron (MLP) neural networks, GRNN, and SVR to simulate evapotranspiration, while Gong et al. (2016) estimated monthly evapotranspiration using an SVR model compared with GRNN, multivariate adaptive regression splines (MARS), fuzzy genetic (FG), MLR, and adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP). In spite of numerous applications of these models in hydrology, to our knowledge, no studies have applied an ANN or SVR model for forecasting stream-flow in the upper Senegal River basin, which falls in an important ecological zone in West Africa.

The recurrent droughts of the 1970s in the Sahel region deeply affected crop production. This situation led global and local communities to look for sustainable solutions to mitigate drought effects in the region (Ndiaye 2010). For instance, in 1972, the riparian states of the Senegal River basin (Senegal, Mauritania, and Mali) joined their efforts to create a river basin management organization called “Organisation pour la Mise en Valeur du fleuve Senegal (OMVS).” The main goals of this organization are to better manage the water resources, to develop irrigated agriculture, and to generate substantial hydropower. OMVS led the construction of two dams, Manantali in the upstream part of the river and Diama in the delta, and more recently a third one (Felou) in Mali. The Diama and Manantali dams have made 375,000 ha of irrigable land available in the Senegal River basin and thus facilitated the development of irrigated agriculture in this zone (Varis and Lahtela 2002).

Notably, the Manantali dam is built on the Bafing tributary; its primary purposes are to generate electric power and to store water in the wet season to augment dry-season outflows for the benefit of agriculture and related irrigation practices. Its hydropower production is estimated at approximately 800 GWh/year, guaranteed 9 out of 10 years, for the three member countries of the OMVS (Mali, Mauritania, and Senegal). Therefore, any negative impact on water supply with respect to the Manantali inflows is likely to affect the energy production as well as the agriculture sectors of those countries, which are two key elements of the African economy. Thus, it is crucial to monitor, evaluate, and predict with a good level of accuracy the availability of water in the Senegal River basin, particularly in its upper region, which influences water management practices downstream in the river system.

Considering the importance of stream-flow knowledge for irrigation, water management, and hydraulic infrastructure design in the Senegal River basin, this paper aims to investigate the capability of the SVR (Vapnik 1995) and GRNN (Specht 1991) models for forecasting daily stream-flow in the upper Senegal River basin at Bafing Makana station. To optimize the forecasting models, predictor variables based on antecedent stream-flow, rainfall, and evapotranspiration (1961–2014) within the Senegal River basin are applied.

Case study and methodological background

Study area

This research is conducted in the upper Senegal River basin (bounded by latitudes 10°30′ and 12°30′N and longitudes 12°30′ and 9°30′W). The total area of the study zone (the upper Senegal River basin at Bafing Makana station) is 21,290 km2, covering parts of Guinea Conakry and Mali (Fig. 1). It has a dense hydrographic network (Kane and Diallo 2005; Bodian et al. 2016); regarding groundwater resources, however, the nature of the soil and the geological formations are not favorable to the existence of large aquifers. The area is characterized by the south-to-north movement of the Inter Tropical Convergence Zone, which directs the penetration of the West African monsoon driven by the thermal contrast between the Atlantic Ocean and the continent (Dione 1996). The climate of the basin is Guinean–Sudanese, with the majority of the rainfall falling from April to October. The average rainfall of the basin is 1490 mm/year (Bodian et al. 2016). The elevation varies from 215 to 1389 m, and the slope indices decrease from upstream to downstream, which points to the significance of the mountainous region of Fouta Djalon.

Fig. 1 Location map of the upper Senegal River basin

Data

Daily rainfall data from 12 rain gauges and daily temperature data from 5 meteorological stations are used to develop the present stream-flow forecasting models. The data were collected from the Mali and Guinea National Meteorological Agencies. Daily stream flows from 1961 to 2014 at the Bafing Makana station were obtained from the Senegal River Basin Organization (OMVS). The meteorological information included in this study consists of the rainfall and evapotranspiration datasets over the period 1961–2004. Note that the most recent data were not included in the modeling, mainly because data for the most recent period were lacking. Therefore, a total of 43 years of predictor data series, encompassing both wet and dry periods in this region, are considered in this paper.

Support vector regression (SVR)

Support vector regression (SVR) was introduced by Vapnik as a novel statistical learning tool for complex prediction problems (Vapnik 1995). The basic idea of this technique is to map the data X into a high-dimensional feature space via a nonlinear mapping function and to perform linear regression in this space (Wang et al. 2009). SVR relies on a computer algorithm that learns from examples to deduce the best function for the classifier/hyperplane in order to divide and analyze linearly and nonlinearly separable data in the input space (Ghorbani et al. 2016). Figure 2a gives an example of linearly separated data by means of support vectors within a hyperplane region. The strength of the SVR model, for both data classification and regression purposes, lies in the use of kernel functions to map input data to a higher dimensional space (Jain 2012).

Fig. 2 a The schematic concept of the SVM model (Ghorbani et al. 2016), b network architecture of the SVR model (Sujay and Deka 2015), c schematic diagram of the generalized regression neural network (GRNN) model

As the dataset used for stream-flow forecasting is numeric and exhibits statistical relationships between predictors, this paper applies the regression form of the SVM model, i.e., the support vector regression (SVR) algorithm. Further theoretical details of the SVR model can be found in the recent work of Raghavendra and Deka (2014).

Figure 2a, b illustrates the basic form of an SVR model. Consider a set of data pairs \((x_{i}, y_{i}) \in X \times Y\), where i varies from 1 to m (the total number of data patterns), \(x_{i} \in X = R^{n}\) is the predictor vector, and \(y_{i} \in Y = R\) is the matching output (here, stream-flow). The SVR model is then described as follows (Raghavendra and Deka 2014):

$$f\left( X \right) = W^{\text{T}} \cdot \Phi \left( X \right) + b$$
(1)

where \(W\), \(\Phi (X)\), and b are the weight vector, the nonlinear transfer function that maps the input vectors into a high-dimensional feature space, and the bias, respectively.

In order to forecast the objective variable (i.e., stream-flow), the magnitudes of weight vector and bias are derived by minimizing the performance error function (Vapnik 1995):

$$\frac{1}{2}W^{\text{T}} \cdot W + C\sum\limits_{i = 1}^{N} {\xi_{i} } + C\sum\limits_{i = 1}^{N} {\xi_{i}^{*} }$$
(2)

this is subject to:

$$\begin{aligned} & W^{\text{T}} \cdot\Phi \left( {X_{i} } \right) + b - y_{i} \le\, \varepsilon + \xi_{i}^{*} \\ & y_{i} - W^{\text{T}} \cdot \Phi \left( {X_{i} } \right) - b \le \varepsilon + \xi_{i} \\ & \xi_{i} ,\xi_{i}^{*} \ge 0,\quad i = 1, \ldots ,N \\ \end{aligned}$$
(3)

The parameter C (a positive constant) determines how strongly training errors are penalized. Φ is the nonlinear mapping function that underlies the kernel, N is the sample size, and \(\xi_{i}\) and \(\xi_{i}^{*}\) are slack variables specifying the upper and lower training errors subject to an error tolerance \(\varepsilon\). In the regression problem, most data samples are expected to lie within the \(\varepsilon\)-tube. If a data sample lies outside the tube, an error \(\xi_{i}\) or \(\xi_{i}^{*}\) exists. Subsequently, the coefficients \(\omega\) (the weight vector, written W above) and b are determined by minimizing the regularized risk function r(C) (Raghavendra and Deka 2014):

$$r\left( C \right) = C\frac{1}{N}\sum\limits_{i = 1}^{N} {L_{\varepsilon } \left( {f\left( {x_{i} } \right),y_{i} } \right)} + \frac{1}{2}\left\| \omega \right\|^{2}$$
(4)

The ε-insensitive loss function \(L_{\varepsilon } (f(x_{i} ),y_{i} )\) is defined as (Raghavendra and Deka 2014):

$$L_{\varepsilon } \left( {f\left( {x_{i} } \right),y_{i} } \right) = \begin{cases} \left| {f\left( {x_{i} } \right) - y_{i} } \right| - \varepsilon & {\text{if}}\,\,\left| {f\left( {x_{i} } \right) - y_{i} } \right| \ge \varepsilon \\ 0 & {\text{otherwise}} \end{cases}$$
(5)

C > 0 is the penalty parameter and \(\frac{1}{2}\left\| \omega \right\|^{2}\) represents the regularization term. The loss function \(L_{\varepsilon } (f(x_{i} ),y_{i} ) = 0\) if the absolute difference between the predicted value \(f(x_{i})\) and the measured value \(y_{i}\) is less than \(\varepsilon\). The nonlinear regression function that minimizes Eq. (4) is given by Eq. (6):

$$f\left( x \right) = \mathop \sum \limits_{i = 1}^{l} \left( {\alpha_{i} - \alpha_{i}^{*} } \right)k\left( {x_{i } ,x} \right) + b$$
(6)

\(\alpha_{i}\) and \(\alpha_{i}^{*}\) are the introduced Lagrange multipliers, and \(k(x_{i}, x)\) is the kernel function. The kernel function given by Eq. (7) describes the inner product in the D-dimensional feature space.

$$k\left( {x_{i} , x} \right) = \mathop \sum \limits_{j = 1}^{D} \varphi_{j} (x_{i} )\varphi_{j} \left( x \right)$$
(7)
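
The ε-insensitive loss of Eq. (5) and the regularized risk of Eq. (4) can be illustrated numerically. The following is a minimal sketch, not the implementation used in this study; the function names and the sample values are assumptions chosen only for illustration.

```python
def eps_insensitive_loss(pred, obs, eps):
    """Eq. (5): zero inside the epsilon-tube, linear outside it."""
    return max(abs(pred - obs) - eps, 0.0)


def regularized_risk(preds, obs, weights, C, eps):
    """Eq. (4): C times the mean epsilon-insensitive loss plus 0.5 * ||w||^2."""
    n = len(obs)
    mean_loss = sum(eps_insensitive_loss(p, y, eps) for p, y in zip(preds, obs)) / n
    return C * mean_loss + 0.5 * sum(w * w for w in weights)


# Illustrative values only (not data from the study).
obs = [100.0, 250.0, 400.0]      # measured stream-flow
preds = [105.0, 240.0, 460.0]    # model output f(x_i)
weights = [0.5, -1.0]            # hypothetical weight vector
risk = regularized_risk(preds, obs, weights, C=10.0, eps=10.0)
```

In this toy case, only the third sample falls outside the ε-tube, so it alone contributes to the loss term; the first two samples are penalized only through the regularization term.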

Based on the hydrological literature, the radial basis function (RBF) has been used broadly as the kernel function of the SVR model (Rubio et al. 2011), and it is therefore adopted in this research. The three parameters (penalty parameter C, error tolerance ε, and kernel parameter γ) were determined by a grid search algorithm implemented in Matlab, in line with the recent research of Yaseen et al. (2016).
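
To make the role of the RBF kernel and of the grid search concrete, the sketch below evaluates Eq. (6) with an RBF kernel and enumerates candidate (C, ε, γ) triples. It is written in Python rather than the Matlab used in this study. The kernel form exp(−γ‖x_i − x‖²) is the common RBF parameterization and is assumed here, and `toy_validation_error` is a hypothetical stand-in for "train an SVR with these parameters and return its validation error".

```python
import math
from itertools import product


def rbf_kernel(xi, x, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, gamma):
    """Eq. (6): sum_i (alpha_i - alpha_i*) * k(x_i, x) + b.

    dual_coefs holds the differences (alpha_i - alpha_i*).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


def grid_search(validation_error, C_grid, eps_grid, gamma_grid):
    """Return the (C, eps, gamma) triple with the lowest validation error."""
    return min(product(C_grid, eps_grid, gamma_grid),
               key=lambda params: validation_error(*params))


def toy_validation_error(C, eps, gamma):
    # Hypothetical error surface used only to exercise the search.
    return ((math.log10(C) - 1) ** 2 + (eps - 0.1) ** 2
            + (math.log10(gamma) + 2) ** 2)


best = grid_search(toy_validation_error,
                   C_grid=[1, 10, 100],
                   eps_grid=[0.01, 0.1, 1.0],
                   gamma_grid=[0.001, 0.01, 0.1])
```

An exhaustive grid is simple and reproducible, but its cost grows with the product of the three grid sizes, which is why logarithmically spaced candidates are a common choice for C and γ.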

Generalized regression neural network (GRNN)

The long-standing ANN model, of which the GRNN model is a special case, is a nonlinear modeling technique which is suitable for modeling over a range of variables (Babu and Reddy 2014). ANNs are able to identify a complex nonlinear relationship between the predictor (inputs) and output datasets. Their basic functioning units are composed of neurons. Each neuron receives and processes the input data before transforming it into output forms. There are two possibilities of input data: (1) pure collected data or (2) input results from other neurons, while the output data forms may be either the results of the final process or the input data of other neurons (Kim and Kim 2008).

In this research, we applied the GRNN developed initially by Specht (1991). GRNN is a variation of the radial basis neural network based on kernel regression and, unlike the back propagation ANN, requires no iterative training procedure (Hannan et al. 2010). Instead, the GRNN is capable of approximating any relation between the input and output vectors and estimates the function directly from the training dataset (Kisi 2006). Figure 2c shows a schematic view of the GRNN. It comprises four nodal layers (input, pattern, summation, and output), each connected to the adjacent ones by a set of weights between nodes. Because it does not need to be tuned iteratively as traditional ANN models do, the GRNN architecture is characterized by fast learning and convergence to the optimal regression surface (Kisi 2006). It is also important to mention that, unlike other neural network models, the GRNN is not affected by the local minima problem and does not generate ambiguous predictions (Tayyab et al. 2016). Therefore, the proposed GRNN model provides an alternative framework for fast and accurate stream-flow forecasting in this study.
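
The four-layer structure described above can be sketched in a few lines. The example assumes the standard GRNN estimator of Specht (1991), a Gaussian kernel-weighted average of the training targets with a single smoothing parameter σ. The data values and the name `grnn_predict` are illustrative assumptions, not the configuration used in this study.

```python
import math


def grnn_predict(x, train_X, train_y, sigma):
    """GRNN estimate: kernel-weighted average of the training targets."""
    # Pattern layer: one Gaussian unit per stored training pattern.
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(xi, x)) / (2.0 * sigma ** 2))
        for xi in train_X
    ]
    # Summation layer: weighted sum of targets and plain sum of weights.
    numerator = sum(w * y for w, y in zip(weights, train_y))
    denominator = sum(weights)
    # Output layer: ratio of the two sums.
    return numerator / denominator


# Illustrative data only: previous-day flow -> next-day flow.
train_X = [[100.0], [200.0], [300.0]]
train_y = [110.0, 190.0, 310.0]
at_pattern = grnn_predict([200.0], train_X, train_y, sigma=1.0)   # narrow kernel
smooth = grnn_predict([200.0], train_X, train_y, sigma=1e6)       # very wide kernel
```

With a small σ the estimate reproduces the nearest stored target, whereas a very large σ drives it toward the mean of all targets, so σ is the only quantity that needs tuning. A query far from every training pattern with a small σ can make the denominator underflow to zero, which a practical implementation would need to guard against.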

Forecasting model development

In this research, different predictive modeling scenarios based on combinations of input attributes have been considered. Scenario A denotes the univariate forecasting model development that considers the antecedent values of the stream-flow (Q) only, whereas Scenario B represents the multivariate prediction model that includes the stream-flow (Q), rainfall (R), and evapotranspiration (E) datasets as the predictor variables, carrying the climatological information required to model daily stream-flow. Table 1 shows the details of the model structures.

Table 1 Investigated modeling scenarios and their data set combinations
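
As an illustration of how such input combinations might be assembled, the sketch below builds lagged predictor rows for 1-day-ahead forecasting. The exact composition of each set is given in Table 1; here it is simply assumed that a set with k lags uses lags t − 1 to t − k of every variable. The function name and the data values are hypothetical.

```python
def lagged_inputs(series_by_name, n_lags):
    """Build (inputs, targets) for 1-day-ahead forecasting.

    series_by_name maps a variable name to its daily series; "Q" must be
    present and is the target. Each input row holds lags t-1 ... t-n_lags
    of every variable, with variables taken in alphabetical order.
    """
    q = series_by_name["Q"]
    names = sorted(series_by_name)
    inputs, targets = [], []
    for t in range(n_lags, len(q)):
        row = [series_by_name[name][t - lag]
               for name in names
               for lag in range(1, n_lags + 1)]
        inputs.append(row)
        targets.append(q[t])
    return inputs, targets


# Illustrative values only (not data from the study).
Q = [10.0, 12.0, 15.0, 14.0, 13.0]   # stream-flow
R = [0.0, 5.0, 2.0, 0.0, 1.0]        # rainfall
E = [4.0, 3.5, 3.8, 4.1, 4.0]        # evapotranspiration

X_a, y_a = lagged_inputs({"Q": Q}, n_lags=2)                   # Scenario A style
X_b, y_b = lagged_inputs({"Q": Q, "R": R, "E": E}, n_lags=1)   # Scenario B style
```

The first `n_lags` days have no complete history and are dropped, so the number of usable patterns shrinks slightly as more lags are included.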

In order to determine the number of antecedent observations that are able to provide effective inputs to the prescribed GRNN and SVR models, partial autocorrelation functions of the daily stream-flow series at Bafing Makana station are computed (Fig. 3). It is evident that at the confidence level of 95%, the lag 1, lag 2, and lag 3 are highly significant in terms of their association with the stream-flow variations. Therefore, for this paper, the three antecedent days are considered for stream-flow forecasting (Table 1).

Fig. 3 Partial autocorrelation function for daily stream-flow data at Bafing Makana station (with 5% significance limits indicated in red)
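
The lag selection can be reproduced in outline with a partial autocorrelation computation. The sketch below uses the Durbin-Levinson recursion on sample autocorrelations and the usual approximate 95% limits of ±1.96/√n; both are assumptions about the procedure, which the text does not specify. The series is synthetic (a seeded AR(1) process), not the Bafing Makana record.

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelations rho_0 ... rho_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    rho = autocorr(x, max_lag)
    phi_prev, out = [], []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)]
        phi_curr.append(phi_kk)
        out.append(phi_kk)
        phi_prev = phi_curr
    return out


# Synthetic AR(1) series used only to exercise the functions.
random.seed(0)
flow = [0.0]
for _ in range(1999):
    flow.append(0.8 * flow[-1] + random.gauss(0.0, 1.0))

values = pacf(flow, max_lag=3)
limit = 1.96 / math.sqrt(len(flow))   # approximate 95% significance limit
significant = [lag for lag, v in enumerate(values, start=1) if abs(v) > limit]
```

For this synthetic series only the first lag is expected to stand clearly outside the limits, whereas the observed stream-flow series in Fig. 3 shows significant values at lags 1 to 3.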

Model evaluation criteria

In theory, there exist several model evaluation criteria used to assess the forecasting accuracy; however, there is no consensus standard on the choice of one metric over the other as each metric is expected to reflect one or more characteristics of the forecasting method and the datasets used (Cheng et al. 2015). In this study, the prescribed GRNN and SVR are evaluated and compared with each other by using four common metrics: root mean square error (RMSE), mean absolute percentage error (MAPE), Willmott’s Index of agreement (WI), and the coefficient of determination (R2) expressed as:

$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {Q_{{\text{sim}},i} - Q_{{\text{obs}},i} } \right)^{2} } }$$
(8)
$${\text{MAPE}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {\frac{{\left| {Q_{{\text{obs}},i} - Q_{{\text{sim}},i} } \right|}}{{\left| {Q_{{\text{obs}},i} } \right|}}}$$
(9)
$${\text{WI}} = 1 - \frac{{\sum\nolimits_{i = 1}^{n} {\left( {Q_{{\text{sim}},i} - Q_{{\text{obs}},i} } \right)^{2} } }}{{\sum\nolimits_{i = 1}^{n} {\left( {\left| {Q_{{\text{sim}},i} - \overline{Q}_{\text{obs}} } \right| + \left| {Q_{{\text{obs}},i} - \overline{Q}_{\text{obs}} } \right|} \right)^{2} } }}$$
(10)
$$R^{2} = \left[ {\frac{{\sum\nolimits_{i = 1}^{n} {\left( {Q_{{\text{obs}},i} - \overline{Q}_{\text{obs}} } \right)\left( {Q_{{\text{sim}},i} - \overline{Q}_{\text{sim}} } \right)} }}{{\sqrt {\sum\nolimits_{i = 1}^{n} {\left( {Q_{{\text{obs}},i} - \overline{Q}_{\text{obs}} } \right)^{2} } \sum\nolimits_{i = 1}^{n} {\left( {Q_{{\text{sim}},i} - \overline{Q}_{\text{sim}} } \right)^{2} } } }}} \right]^{2}$$
(11)

In Eqs. (8)–(11), \(Q_{{\text{obs}},i}\) and \(Q_{{\text{sim}},i}\) are the observed and the forecasted stream-flow values, an overbar denotes a mean value, and n is the number of observations in the testing data set over which the errors are evaluated. The RMSE and MAPE measure the residual error, and R2 is applied to examine the level of statistical agreement between forecasted and observed stream-flow data in terms of the variance in the test set. In general, smaller values of the RMSE and MAPE and larger values of R2 indicate better performance of the forecasting models. In this paper, we also employed Willmott’s Index, WI, which provides an alternative way to assess the performance of the forecasting model, especially in terms of its ability to address the limitations faced by RMSE (Willmott 1981).
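
The four criteria can be computed directly from paired observed and forecasted values. The sketch below is illustrative only: the function names and the sample numbers are assumptions, MAPE is returned as a fraction (multiply by 100 for a percentage), and R2 is computed as the squared Pearson correlation, which is assumed to be the intended definition.

```python
import math


def rmse(obs, sim):
    """Root mean square error, Eq. (8)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error, Eq. (9), as a fraction.

    Zero observed flows would need special handling (division by zero).
    """
    return sum(abs(o - s) / abs(o) for o, s in zip(obs, sim)) / len(obs)


def willmott_index(obs, sim):
    """Willmott's index of agreement, Eq. (10)."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mean_obs) + abs(o - mean_obs)) ** 2
              for o, s in zip(obs, sim))
    return 1.0 - num / den


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated flows."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Illustrative values only (not data from the study).
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.5, 2.5, 2.5, 3.5]
scores = {
    "RMSE": rmse(obs, sim),
    "MAPE": mape(obs, sim),
    "WI": willmott_index(obs, sim),
    "R2": r_squared(obs, sim),
}
```

RMSE carries the units of the flow, while MAPE, WI, and R2 are dimensionless, which is worth keeping in mind when the metrics are reported side by side.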

The prediction modeling scenarios results

In this section, the accuracy of the GRNN and SVR models applied for 1-day-ahead stream-flow forecasting for the case of the upper Senegal River basin at Bafing Makana station is presented. Two modeling scenarios (A and B) are considered with three sets of input data for each scenario to judge the versatility of the prescribed models using statistical performance metrics and distribution of errors. Results from both scenarios are compared in light of the model’s performance and datasets used for forecasting stream-flow values.

Scenario A

For this scenario, three different input combinations are used: Set 1, Set 2, and Set 3. Table 2 lists the results of the GRNN and SVR model simulations for the three input combinations. The RMSE values obtained with the GRNN (SVR) model are about 88.4, 154.25, and 142.44 m3/s (81.90, 77.87, and 79.14 m3/s) for Set 1, Set 2, and Set 3, respectively. Thus, Set 1 yields the lowest RMSE for the GRNN model (88.4 m3/s) and Set 2 yields the lowest RMSE for the SVR model (77.87 m3/s). In terms of MAPE, the best performing GRNN and SVR models record values of 26.67 (Set 3) and 24.98 (Set 2), respectively. When the optimal GRNN model (Set 1) and the optimal SVR model (Set 2) are compared, it is evident that the SVR model performs better than the GRNN model, with a lower RMSE and larger values of WI and R2. Table 2 also shows that the SVR and GRNN models provide different levels of performance for the different data sets. Therefore, model accuracy depends on the input dataset, but differences between the two models also exist for any common dataset.

Table 2 Performance criteria values of Scenario A

For a direct comparison of the forecasted and the observed stream-flow, Fig. 4 shows the hydrographs as well as the scatter plots of the optimal GRNN and SVR models used for 1-day-ahead flow forecasting. The SVR model is able to forecast the mean and peak flow data more accurately than the GRNN model. Notwithstanding this performance difference, both models show generally good ability to predict daily stream-flow data as the time series of the forecasted and observed stream-flow are in reasonably good agreement.

Fig. 4 Scatter plots and hydrographs of 1-day-ahead stream-flow forecasting for the optimal GRNN and SVR models for Scenario A (cms: cubic meters per second)

As an additional measure of agreement between observed and simulated stream-flow data, the scatter plots of the optimal GRNN and SVR models are also presented in Fig. 4. Large deviations from the 45° line indicate lower prediction accuracy. According to Fig. 4, the scatter of the SVR model falls close to the 45° line, whereas that of the GRNN model shows pronounced deviations from it. The coefficient of determination and the best-fit line likewise indicate that the SVR model forecasts stream-flow more accurately than the GRNN model. Considering all sets in this study, the SVR model generates better forecasting results than the GRNN model.

Further evaluation of the models is undertaken by comparing the observed stream-flow with that forecasted by the optimal GRNN and SVR models using five descriptive statistics. Table 3 shows the minimum stream-flow, the first quartile (25th percentile, Q1), the median, the third quartile (75th percentile, Q3), and the maximum stream-flow for both models.

Table 3 Significant descriptive statistics for Scenario A for the optimal GRNN and SVR model compared with observed data
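
A five-number summary of this kind can be obtained as sketched below. The quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which interpolation rule was used, and the flow values are illustrative only.

```python
import statistics


def five_number_summary(flows):
    """Minimum, Q1, median, Q3, and maximum of a flow series."""
    q1, median, q3 = statistics.quantiles(flows, n=4, method="inclusive")
    return {"min": min(flows), "Q1": q1, "median": median,
            "Q3": q3, "max": max(flows)}


# Illustrative values only (not data from the study).
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
simulated = [2.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 8.5]

obs_summary = five_number_summary(observed)
sim_summary = five_number_summary(simulated)

# Ratio of simulated to observed minimum: values above 1 mean the model
# overestimates the lowest flow.
min_ratio = sim_summary["min"] / obs_summary["min"]
```

Comparing the two summaries statistic by statistic shows at a glance where in the flow distribution a model is biased, which a single aggregate metric such as RMSE cannot reveal.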

The GRNN and SVR models are seen to overestimate the minimum stream-flow by factors of about 12.73 and 5.87, respectively (Table 3), which indicates that neither of the two data-driven models is able to simulate the lowest values of stream-flow with a good degree of accuracy. This is not unexpected, since extremely low values in the historical stream-flow and climate inputs are rare compared with values around the mean stream-flow, and a lack of sufficient examples can lead to low accuracy for the forecasted minimum stream-flow values. The implications of the high inaccuracy of the minimum simulated stream-flow must be considered in the design of hydrological structures and in water management decisions. For example, if data-driven model results are adopted in a future period where current stream-flow and rainfall values are dramatically low, such results should be subjected to “a conservative application and interpretation for core decision-making” and appropriate precautions should be taken to ensure that the decisions do not undermine their practical usage. Further benchmarking of the predicted results should be performed with additional information before such forecasts are implemented.

It is also noticeable that the mean stream-flow of the SVR model is very close to the observed value (267.59 against 267.89), whereas the GRNN model underestimates the mean (248.79 against 267.89). Moreover, the descriptive statistics indicate that the GRNN model underestimates the median (50th percentile) and third quartile (75th percentile) flows more markedly than the SVR model, which only slightly underestimates the third quartile. However, the GRNN model predicts the maximum stream-flow closer to the observed value than the SVR model. These results also show that both models overestimate low stream-flows and underestimate high stream-flows. Overall, the distribution of the low stream-flow values indicates that the SVR model gives better predictions than the GRNN model.

Scenario B

For this scenario, stream-flow, rainfall, and evapotranspiration data were used as predictor (input) variables of the GRNN and SVR models. Three lag times (t − 1, t − 2, and t − 3) are considered for all inputs, and the lags (t − 1), (t − 2), and (t − 3) correspond to Set 1, Set 2, and Set 3, respectively (Table 1). Because this scenario adds data that can affect the hydrological cycle, an improvement in forecasting performance is expected. Table 4 presents the models, input sets (Set 1, Set 2, and Set 3), and the prediction skill for each set.

Table 4 Performance criteria values of Scenario B

In Set 1, the input layer consists of three values (stream-flow, rainfall, and evapotranspiration), all at time t − 1. For this set, SVR performed better than GRNN in terms of RMSE (55.26 against 58.57), MAPE (20.11 against 29.20), and R2 (0.974 against 0.971). This superiority of SVR is also confirmed for Set 2 and Set 3. For the GRNN models, Set 1 shows the best overall performance, with RMSE, MAPE, WI, and R2 of 58.57, 29.20, 0.99, and 0.971, respectively, and Set 3 shows the poorest results according to the same performance indicators (Table 4).

Concerning the SVR models, Set 3 performed best among the three sets, with RMSE = 54.14, MAPE = 19.2, WI = 0.99, and R2 = 0.975. Thus, according to Table 4, Set 1 gives the best GRNN model and Set 3 gives the best SVR model with respect to the RMSE, MAPE, WI, and R2 criteria.

Comparison of the optimal GRNN (Set 1) and SVR (Set 3) models indicates that the SVR model performs better than the GRNN model. Figure 5 shows the hydrographs as well as the scatter plots of the best GRNN and SVR models for daily flow prediction. The SVR model is able to predict the mean and peak stream-flow data more precisely than the GRNN model. The scatter of the SVR model falls close to the 45° line, whereas that of the GRNN model presents a pronounced deviation. The coefficient of determination (R2) and the fit line equations are also better for the SVR model, suggesting that its forecasts are better than those of the GRNN model. For both the GRNN and SVR models, the simulated hydrographs reproduce the observed hydrographs quite well. Considering all sets, the results indicate that the SVR model produced better stream-flow forecasts than the GRNN model.

Fig. 5 Scatter plots and hydrographs of 1-day-ahead stream-flow forecasting for the optimal GRNN and SVR models for Scenario B (cms: cubic meters per second)

Similar to Scenario A, the observed flows and optimal GRNN and SVR flows are evaluated by determining the minimum stream-flow, maximum stream-flow, median stream-flow, first quartile flow, and third quartile stream-flow values (Table 5).

Table 5 Significant statistics values for Scenario B for the optimal GRNN and SVR models compared with observed data

The GRNN and SVR models have difficulty simulating the low stream-flow data and give negative minimum values of − 31.43 and − 2.35, respectively (Table 5). The GRNN model is the poorer of the two in estimating the low values. The SVR and GRNN models exhibit mean flow values that are very close to the observed value, with both models slightly overestimating the mean by less than 1%. Moreover, the descriptive statistics indicate that both the GRNN and SVR models underestimate the 50th percentile, the 75th percentile, and the maximum flow, with the SVR values closer to the observed values than those of the GRNN model (Table 5). Overall, based on the evaluation criteria and the five descriptive statistics, the SVR model gives better predictions than the GRNN model.

Scenarios comparisons

The forecasting performances of the GRNN and SVR models for daily river flow are compared here. Tables 2 and 4 give the values of the performance measures for each scenario and input data set. The two optimal models for each scenario can be compared based on the performance indicators determined in the “Scenario A” and “Scenario B” sections. For both scenarios, this study shows that the SVR model is the best model, and the trials with Set 2 and Set 3 are the best for Scenarios A and B, respectively. The main difference between Scenarios A and B is the number of input variables. Scenario A considered only the antecedent (lagged) stream-flow data as an input, whereas Scenario B took into account the related rainfall and evapotranspiration data together with stream-flow. The results showed that when the models integrated the other input variables that are likely to affect the hydrological cycle (i.e., rainfall and evapotranspiration), their accuracy improved substantially. From these results, it can be seen that the best input combination is achieved when the SVR model incorporates antecedent stream-flow, rainfall, and evapotranspiration as the predictor variables.

Results analysis summary

Based on the obtained results, the GRNN and SVR models are, in general, valuable predictive tools for forecasting short-term stream-flow in data-scarce regions. Comparison of the two models showed that the SVR model outperformed the GRNN model for all scenarios and input data sets used in this study. The optimal SVR model integrated stream-flow, rainfall, and evapotranspiration information from lags t − 3, t − 2, and t − 1 (Scenario B, Set 3). Differences in the accuracy of the forecasts were also associated with the different scenarios tested: the accuracy of the forecasts increased as more inputs were combined. In particular, incorporating rainfall and evapotranspiration data as predictor variables improved the accuracy, which can be explained by the tight link between the hydrological cycle and changes in rainfall and evapotranspiration in the study region.

In addition, in light of the results obtained with the two proposed modeling methods and the input pattern scenarios, two major interpretations can be distinguished. Firstly, external climatological input variables are needed to build a reliable and accurate forecasting model for stream-flow. Apparently, the more hydrological input variables influencing the stream-flow are included in the model, the higher the forecasting accuracy that can be achieved. Even hydrological variables that have an insignificant direct influence on the stream-flow might have an indirect impact on the model accuracy. Furthermore, depending on the features of the study area (its location with respect to the global environmental and climatological zones), particular hydrological parameters could have a significant impact on stream-flow forecasting, while the same variables might have a trivial impact in another study area. Therefore, special attention should be given to the selection of input variables when developing a stream-flow forecasting model, so that it reflects not only the hydrological features and characteristics but also the climatological variables.

The second interpretation concerns the selection of the best modeling method. In fact, the existing evaluation metrics for forecasting models are generally inadequate: they do not provide enough information to evaluate the modeling method and hence to support a solid judgment on model performance. Instead, the evaluation should be based on the purpose of the model and on its user (water resources planner or decision-maker). Decision-makers usually pay great attention to extreme stream-flow patterns in order to avoid flood and/or drought periods; hence, for them it is preferable to achieve high forecasting accuracy for extreme events, with less concern for the medium–high, medium, and medium–low stream-flow categories. Water resources planners, in contrast, are usually keen to have homogeneously distributed errors, and thus a low RMSE for all stream-flow categories, which gives them the flexibility to devise a proper plan to avoid water deficits in any stream-flow category.
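The idea of judging a model by its error in each stream-flow category, rather than by a single global score, can be sketched as follows. This is a minimal illustration: the category thresholds are hypothetical and would need to be chosen for the river in question, and the function names are not taken from the study.

```python
import math
from bisect import bisect_right


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def rmse_by_category(observed, predicted, bounds):
    """RMSE computed separately for each flow category.

    bounds is an ascending list of thresholds on the OBSERVED flow;
    k thresholds define k + 1 categories, indexed 0 (lowest) to k (highest).
    Categories that contain no observations are omitted from the result.
    """
    groups = {}
    for o, p in zip(observed, predicted):
        category = bisect_right(bounds, o)
        groups.setdefault(category, []).append((o, p))
    return {
        category: rmse(*zip(*pairs)) for category, pairs in sorted(groups.items())
    }
```

A decision-maker focused on floods and droughts would inspect the lowest and highest categories, whereas a planner would look for similar RMSE values across all categories.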

In the context of the present results, it is important to highlight the practical significance of implementing data intelligence predictive models in the present study region, using the case of the upper Senegal River. One important consideration is that the electric power generated by the Manantali dam, which is built on the Bafing tributary, is shared by the Senegal River basin riparian states on a daily basis. Hence, a prediction of the energy production at a one-day-ahead horizon is likely to generate valuable information for the river basin management organization (OMVS). Such information can be used by OMVS in its daily river basin management decisions, since stream-flow is a primary input required to estimate the power production. Therefore, developing accurate and reliable models for predicting stream-flow at short forecast horizons (e.g., daily) can help decision-makers to better manage the water resources. In addition, the established models can provide crucial information for energy production and management to avoid potential conflicts between the different riparian states.

Conclusion

Accurate river flow forecasts are a vital component of power production and of sustainable water planning and management. In particular, the forecasting of stream-flow is crucial for hydropower production, flood management, and reservoir management decisions. Several data-driven techniques are currently available for hydrological forecasting purposes. This study investigated and compared the abilities of the GRNN and SVR models to forecast the daily stream-flow at Bafing Makana station in the upper Senegal River basin using input combinations of rainfall, evapotranspiration, and historical stream-flow at different lags (t − 1, t − 2, t − 3). Four standard statistical performance measures (i.e., RMSE, MAPE, d, and R2) were adopted to evaluate the performance of the data-driven models. The models with different input combinations were compared with each other in their ability to estimate daily stream-flow data. The results showed that the data-driven models with a larger number of combined inputs generally achieved better accuracy and that the SVR model outperformed the GRNN model in predicting daily stream-flow for all input combinations and modeling scenarios. The present study showed that the SVR model can be a valuable tool for stream-flow prediction and that it can assist the Senegal River basin organization in better managing the Senegal River water resources, especially in the upper part of the basin.
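For readers who wish to reproduce this kind of comparison, the four performance measures can be sketched as below. The definitions are the commonly used ones and are assumptions here: d is taken to be Willmott's index of agreement, and R2 is computed as the squared Pearson correlation between observed and predicted flows. The exact formulas used in the study should be checked against its methodology section.

```python
import math


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error, in percent.

    Undefined when an observed value is zero; such records need separate handling.
    """
    return 100.0 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))


def willmott_d(obs, pred):
    """Willmott's index of agreement (1 indicates perfect agreement)."""
    o_bar = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    o_bar = sum(obs) / len(obs)
    p_bar = sum(pred) / len(pred)
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(obs, pred))
    var_o = sum((o - o_bar) ** 2 for o in obs)
    var_p = sum((p - p_bar) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)
```

Note that a constant bias leaves this form of R2 at 1 while RMSE, MAPE, and d all degrade, which is one reason to report several measures together.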