1 Introduction

Understanding the complex behavior of streamflow plays a significant part in water resources management. More specifically, long-term streamflow forecasting (e.g., monthly river flow) is crucial for hydropower generation, reservoir operation, irrigation management decisions and several other hydrological applications. Over the past couple of decades, streamflow modeling has received considerable attention from researchers. This is because global climate change has influenced the hydrologic cycle and caused numerous flood and drought events. According to the literature, river flow forecasting has been undertaken using two main methodologies: physically based models and conceptual models (e.g., data-driven techniques). Physical models usually require more effort and a variety of hydrological variables to simulate the elemental physical processes of the watershed (Costabile et al. 2012). In contrast, data-driven soft computing approaches have shown the capability to capture the non-linear relationship between the predictors and the predicted variable without prior knowledge of the process and with fewer hydrological input parameters (Ahmed and Sarma 2007; Afan et al. 2014; Singh and Cui 2015; Tigkas et al. 2016).

Classically, black-box time series models have been applied to streamflow forecasting since the work of Box and Jenkins (1970). According to review studies, parametric linear models such as Moving Average (MA), Auto Regressive Integrated Moving Average (ARIMA), and Multiple Linear Regression (MLR) have been applied to almost all hydrological variables (Abrahart and See 2000; Maier and Dandy 2000; Abrahart et al. 2010; Abrahart et al. 2012; Yaseen et al. 2015). However, they perform poorly on highly non-stationary and non-linear real-world problems. Since 1990, artificial intelligence methods such as artificial neural network (ANN), support vector machine (SVM), adaptive neuro fuzzy inference system (ANFIS), genetic algorithm (GA), and gene expression programming (GEP) have been extensively utilized in a wide range of hydrological applications, and more specifically for streamflow forecasting (Nourani et al. 2014; Yaseen et al. 2015).

Most recently, three data-driven approaches have shown remarkable potential in handling complex nonlinear problems: least square support vector regression (LSSVR), multivariate adaptive regression splines (MARS) and the M5 model tree. These approaches have been broadly used in solving hydrologic problems. LSSVR is a modified version of support vector regression (SVR) that avoids the quadratic programming problem (Suykens and Vandewalle 1999). In addition, it avoids several shortcomings of other data-driven learning processes (e.g., local minima, time consumption and over-fitting) (Ji et al. 2014). LSSVR has been applied successfully in the engineering field; for instance, bearing raceway prediction (Tao et al. 2008), prediction of effluent parameters of a wastewater treatment plant (Huang et al. 2009), airframe wing-box estimation (Deng and Yeh 2010), power system stabilization (Pahasa and Ngamroo 2011), prediction of CO2 in reservoirs (Shokrollahi et al. 2013), oil recovery and economic analysis (Kamari et al. 2014), and oil reservoir viscosity determination (Hemmati-Sarapardeh et al. 2014). In the hydrological context, a few studies have been conducted using LSSVR; for example, evapotranspiration prediction (Guo et al. 2011; Kisi 2013), daily water demand estimation (Hwang et al. 2012), sediment transport modeling (Kisi 2012), reservoir inflow modeling (Okkan and Ali Serbes 2013), and water pollution prediction (Kisi and Parmar 2016). The authors of these studies concluded that LSSVR outperformed the other data-driven methods used in their research and recommended its application to other hydrological variables.

Multivariate adaptive regression splines is a relatively modern artificial intelligence approach first proposed by Friedman (1991). The main advantages of this method are its capacity to capture complicated data mappings in high-dimensional data, its speed and flexibility, and its ability to forecast continuous and binary output variables accurately. In addition, this nonparametric statistical method is a flexible procedure that organizes the relationship between the input and output variables with fewer variable interactions (Leathwick et al. 2006). Previous applications of the MARS algorithm in water resources include rainfall and temperature forecasting, sediment concentration estimation, water pollution prediction, freshwater distribution system modeling, and river flow simulation during drought events (Sarangi and Bhattacharya 2005; Leathwick et al. 2006; Sotomayor 2010; Adamowski et al. 2012; Shortridge et al. 2015). Thus, to the best of the authors' knowledge, the current research introduces the multivariate adaptive regression splines approach for forecasting and predicting monthly streamflow.

Another new data-driven technique is the M5 model tree. The M5 model tree is a data mining approach that splits the data into subspaces using a divide-and-conquer method, which makes it possible to partition the multi-dimensional parameter space and generate the model automatically based on an overall quality criterion (Quinlan 1992). Recently, scholars have investigated the utility of the M5 model tree in different hydrological applications such as water level optimization (Bhattacharya and Solomatine 2005), precipitation-river flow modeling (Solomatine and Dulal 2003), evapotranspiration prediction (Pal and Deswal 2009), flood event forecasting (Solomatine and Xue 2004), and sedimentation estimation (Sarangi and Bhattacharya 2005). These are a few of the studies that have applied the M5 model tree effectively in the water resources sector.

The major objectives of the current research are (i) to investigate three modern heuristic regression approaches (i.e., LSSVR, MARS and M5 model tree) for modeling long-term streamflow, (ii) to compare their performance with a classical method, MLR, and (iii) to demonstrate their effectiveness on four rivers in two different regions, namely the Batman and Garzan Rivers in Turkey and the Euphrates and Tigris Rivers in Iraq. In the first phase of the study, streamflow is forecasted from the flow data of the same river. In the second phase, streamflow at a specific stream is predicted from the data of a nearby stream. Furthermore, the influence of periodicity on the forecasting and prediction performance is examined.

2 Theoretical Overview

2.1 Least Square Support Vector Regression

LSSVR is an extended version of the support vector regression (SVR) model, proposed by Suykens and Vandewalle (1999). According to the literature, the major drawback of SVR is its computation time, which LSSVR overcomes by avoiding the quadratic programming problem. This enhancement avoids several limitations (e.g., local minima and over-fitting). In addition, it may produce a stable solution in place of the quadratic programming problem (Xie et al. 2013; Ji et al. 2014). The main principle of LSSVR is to find the optimal mapping function between the inputs x and the output y. This is done through a non-linear function that maps the inputs into a high-dimensional feature space, in which a regression model is fitted to capture the non-linear regression function. The regression function can be formulated as follows:

$$ \mathrm{y}\left(\mathrm{x}\right)={\mathrm{w}}^{\mathrm{T}}\upvarphi \left(\mathrm{x}\right)+\mathrm{b} $$
(1)

where y is the output value for the input x, w is the coefficient vector, φ is the mapping function, and b is the bias term; w and b are obtained by minimizing the upper bound of the generalization error. According to the principle of regularized risk minimization, the LSSVR problem (Suykens and Vandewalle 1999) can be defined as:

$$ \min J\left(w,\xi \right)=\frac{1}{2}{w}^Tw+\frac{1}{2}\gamma \sum_{i=1}^l{\xi}_i^2 $$
(2)

subject to the following constraints:

$$ {\mathrm{y}}_{\mathrm{i}}={\mathrm{w}}^{\mathrm{T}}\upvarphi \left({\mathrm{x}}_{\mathrm{i}}\right)+\mathrm{b}+{\upxi}_{\mathrm{i}}\quad \left(\mathrm{i}=1,2,\dots, \mathrm{l}\right) $$
(3)

where γ is the regularization parameter, which controls the trade-off between minimizing the forecasting or prediction error and the smoothness of the function, and ξi is the training error for the input xi.

At this point, Lagrange multipliers are used to derive the solution for w and ξ from formula (2). The constrained problem is converted into an unconstrained one, and the Lagrange function L is written as follows:

$$ \mathrm{L}\left(\mathrm{w},\mathrm{b},\upxi, \mathrm{a}\right)=\mathrm{J}\left(\mathrm{w},\upxi \right)-{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{l}}{\mathrm{a}}_{\mathrm{i}}\left\{{\mathrm{w}}^{\mathrm{T}}\upvarphi \left({\mathrm{x}}_{\mathrm{i}}\right)+\mathrm{b}+{\upxi}_{\mathrm{i}}-{\mathrm{y}}_{\mathrm{i}}\right\}} $$
(4)

where J(w, ξ) is the objective function in (2) and ai are the Lagrange multipliers.

Applying the Karush-Kuhn-Tucker (KKT) optimality conditions (Fletcher 1987) to the Lagrangian yields the following regression function:

$$ \mathrm{y}\left(\mathrm{x}\right)={\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{l}}{\mathrm{a}}_{\mathrm{i}}\kern0.5em \mathrm{K}\left(\mathrm{x},{\mathrm{x}}_{\mathrm{i}}\right)+\mathrm{b}} $$
(5)

K(x, xi) denotes a kernel function that satisfies Mercer’s conditions, K(x, xi) = φ(x) · φ(xi), so that the dot product in the feature space is evaluated without computing φ explicitly.
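
The step from the Lagrangian (4) to the regression function (5) can be made explicit. As a sketch under the standard LSSVR formulation of Suykens and Vandewalle (1999), with J(w, ξ) taken as the objective in (2) and Ω introduced here as an assumed shorthand for the kernel matrix with entries K(xi, xj), setting the partial derivatives of L to zero gives:

$$ \frac{\partial L}{\partial w}=0\Rightarrow w=\sum_{i=1}^{l}{a}_i\varphi \left({x}_i\right),\kern1em \frac{\partial L}{\partial b}=0\Rightarrow \sum_{i=1}^{l}{a}_i=0,\kern1em \frac{\partial L}{\partial {\xi}_i}=0\Rightarrow {a}_i=\gamma {\xi}_i $$

Substituting these relations into constraint (3) eliminates w and ξ and leaves a linear system in b and a = (a1, …, al):

$$ \left[\begin{array}{cc} 0 & {\mathbf{1}}^T \\ \mathbf{1} & \Omega +{\gamma}^{-1}I \end{array}\right]\left[\begin{array}{c} b \\ a \end{array}\right]=\left[\begin{array}{c} 0 \\ y \end{array}\right] $$

Solving this linear system, rather than a quadratic program, gives the coefficients ai and b that appear in (5).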

In the current research, the radial basis function (RBF) kernel was used in the regression solution. It is defined as:

$$ \mathrm{K}\left(\mathrm{x},{\mathrm{x}}_{\mathrm{i}}\right)={\mathrm{e}}^{-\frac{{\left\Vert \mathrm{x}-{\mathrm{x}}_{\mathrm{i}}\right\Vert}^2}{2{\upsigma}^2}} $$
(6)

Two parameters are tuned in the LSSVR model: γ and σ2 (Cao et al. 2008). The contribution of the current research is the use of LSSVR for streamflow forecasting and prediction. This choice relies on the robustness of the LSSVR model against chaotic disturbances, complex non-linearity and randomness. Furthermore, LSSVR reduces the computational effort compared with the classical approaches.
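
As an illustration of how the two tuning parameters enter the model, the following minimal Python sketch fits an LSSVR by solving the linear system that follows from the KKT conditions in the standard formulation of Suykens and Vandewalle (1999), and then predicts with Eq. (5) using the RBF kernel in its usual form with a negative exponent. It is not the implementation used in this study: the function names (rbf_kernel, solve_linear, lssvr_fit, lssvr_predict), the toy data and the values of γ and σ2 are assumptions chosen only for illustration.

```python
import math


def rbf_kernel(x, z, sigma2):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))


def solve_linear(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (M[i][n] - s) / M[i][i]
    return u


def lssvr_fit(X, y, gamma, sigma2):
    """Return (b, a): bias and Lagrange multipliers of the LSSVR."""
    l = len(X)
    # The first row and column enforce sum(a_i) = 0.
    A = [[0.0] + [1.0] * l]
    for i in range(l):
        row = [1.0] + [rbf_kernel(X[i], X[j], sigma2) for j in range(l)]
        row[i + 1] += 1.0 / gamma  # ridge term from a_i = gamma * xi_i
        A.append(row)
    sol = solve_linear(A, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvr_predict(x, X, a, b, sigma2):
    """Eq. (5): y(x) = sum_i a_i K(x, x_i) + b."""
    return sum(ai * rbf_kernel(x, xi, sigma2) for ai, xi in zip(a, X)) + b


# Toy example (hypothetical values): one input, four training points.
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 1.0, 4.0, 9.0]
b, a = lssvr_fit(X_train, y_train, gamma=100.0, sigma2=1.0)
print(lssvr_predict([1.5], X_train, a, b, sigma2=1.0))
```

A larger γ penalizes the training errors more strongly and pulls the fit closer to the data, while σ2 sets the width of the kernel. In practice a library linear solver would replace the hand-written elimination, which is kept here only so that the sketch has no dependencies.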

2.2 Multivariate Adaptive Regression Splines

MARS is a nonparametric regression model initially proposed by Friedman (1991) for forecasting continuous numeric outcomes. The main feature of the MARS algorithm is its forward and backward stepwise procedure, which controls and explains the complex nonlinear mapping between the input and output variables. The backward stepwise procedure removes the unnecessary input candidates from the previously selected set in order to enhance the forecasting accuracy. The model forecasts the output Y from the input variable X using either of two basis functions, each defined by a knot, i.e., a value of the variable that marks an inflection point along the input range (Sharda et al. 2006):

$$ Y=\mathit{\max}\left(0,X-c\right) $$
(7)
$$ Y=\mathit{\max}\left(0,c-X\right) $$
(8)

where the parameter c is the threshold (knot) value. Two adjacent splines intersect at a knot, which maintains the continuity of the basis functions. The forward and backward stepwise procedure is applied to each input variable to identify the locations of the knots where the function value changes. MARS is a data-driven method that has recently gained popularity in time series analysis, and its capability to enhance river flow forecasting models is worth exploring. For more comprehensive details of the MARS model, the reader is referred to Friedman (1991), Sharda et al. (2008) and Zhang and Goh (2014).
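
The pair of basis functions in Eqs. (7) and (8) can be illustrated with a short Python sketch. It evaluates a one-variable additive model built from a mirrored pair of hinge functions at a single knot. The function names, the intercept, the coefficients and the knot location are hypothetical values chosen for illustration, not fitted results from this study, and the forward and backward selection of knots is not shown.

```python
def hinge_right(x, c):
    """Eq. (7): max(0, x - c), non-zero to the right of the knot c."""
    return max(0.0, x - c)


def hinge_left(x, c):
    """Eq. (8): max(0, c - x), non-zero to the left of the knot c."""
    return max(0.0, c - x)


def mars_predict(x, intercept, terms):
    """Evaluate intercept + sum(coef * hinge(x, knot)).

    terms is a list of (coef, knot, side), with side "right" or "left".
    """
    total = intercept
    for coef, knot, side in terms:
        basis = hinge_right(x, knot) if side == "right" else hinge_left(x, knot)
        total += coef * basis
    return total


# Hypothetical model: different slopes on each side of a knot at 50.
terms = [(0.8, 50.0, "right"), (-0.3, 50.0, "left")]
for q in (20.0, 50.0, 90.0):
    print(q, mars_predict(q, 40.0, terms))
```

Because both hinge functions are zero at the knot, the two linear pieces meet there and the fitted function stays continuous while its slope changes.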

2.3 M5 Model Tree

Complex time series problems can be handled by splitting the data space into a number of sub-spaces and fitting a linear regression model to each one individually. The M5 model tree algorithm is a data mining method that divides the data space into smaller sub-spaces using a divide-and-conquer procedure (Quinlan 1992). The fundamental concept of this model is the binary decision tree. The partition procedure follows the idea of a decision tree whose leaves hold regression functions, which makes it able to forecast continuous numerical attributes. As shown in Fig. 1, the M5 model tree algorithm works in two stages. In the first stage, the data are divided into subsets in order to build the decision tree. The splitting criterion treats the standard deviation of the class values that reach a node as a measure of the error at that node, and computes the expected reduction in this error that results from testing each attribute at that node (Solomatine and Dulal 2003; Pal and Deswal 2009). The standard deviation reduction (SDR) can be expressed as:

$$ SDR=sd\left(\mathrm{K}\right)-{\displaystyle \sum_i\frac{\left|{K}_i\right|}{\left|\mathrm{K}\right|}sd\left({K}_i\right)} $$
(9)

Fig. 1 The two stages of M5 model tree

In the SDR formula, (i) sd is the standard deviation, (ii) K denotes the set of examples that reach the node, and (iii) Ki is the subset of examples that have the ith outcome of the potential test. As a result of the partition procedure, the child nodes have a lower standard deviation than the parent node. As the final step of the first stage, M5 selects the split that maximizes the expected error reduction. Nevertheless, this division usually produces a large tree structure that needs to be pruned, with subtrees replaced by linear regression functions; this is the second stage of M5 modeling.
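
The first-stage splitting rule can be sketched in a few lines of Python. The example computes the SDR of Eq. (9), with each subset weighted by its share of the examples, and searches all thresholds of a single attribute for the split with the largest reduction. The use of the population standard deviation, the midpoint thresholds, the function names and the toy data are assumptions made for illustration; pruning and the leaf regression models of the second stage are not shown.

```python
from statistics import pstdev


def sdr(parent, subsets):
    """Standard deviation reduction: sd(K) - sum(|K_i| / |K| * sd(K_i))."""
    n = len(parent)
    weighted = sum(len(s) / n * pstdev(s) for s in subsets)
    return pstdev(parent) - weighted


def best_split(xs, ys):
    """Try every threshold on one attribute; return (threshold, SDR)."""
    pairs = sorted(zip(xs, ys))
    targets = [y for _, y in pairs]
    best = (None, float("-inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal attribute values
        gain = sdr(targets, [targets[:k], targets[k:]])
        if gain > best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best = (threshold, gain)
    return best


# Toy data (hypothetical): the target jumps between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(best_split(xs, ys))
```

A full tree builder would apply best_split to every candidate attribute at a node, keep the attribute with the largest SDR, and recurse on the two resulting subsets.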

2.4 Multiple Linear Regression

Several engineering applications involve exploring the relationship between two or more variables. Regression analysis is one of the popular statistical approaches recommended for this kind of problem. Throughout the literature, streamflow forecasting has been undertaken using the MLR model, because this model can include many regressors to deal with time series data. The relationship between the dependent variable (Y), i.e., the one-step-ahead streamflow, and the independent variables (Xi), i.e., the preceding streamflow records, can be described as follows:

$$ \mathrm{Y}={\mathrm{P}}_0+{\mathrm{P}}_1{\mathrm{X}}_1+{\mathrm{P}}_2{\mathrm{X}}_2+\cdots +{\mathrm{P}}_{\mathrm{n}}{\mathrm{X}}_{\mathrm{n}} $$
(10)

where Y is the target output, Pi (i = 0, …, n) are the regression coefficients, and Xi (i = 1, …, n) are the input variables.
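
A minimal Python sketch of how the coefficients of Eq. (10) can be estimated is given below. It assumes ordinary least squares solved through the normal equations, which is one common choice; the source does not state which estimation procedure was used. The function names and the toy data are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Eq. (10); returns [P0, P1, ..., Pn]."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations (A^T A) P = A^T y, stored as an augmented matrix.
    M = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]


def predict_mlr(coefs, x):
    """Evaluate Y = P0 + P1 * X1 + ... + Pn * Xn."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Toy data (hypothetical) generated from Y = 2 + 3 * X1 - X2.
X_toy = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y_toy = [3.0, 7.0, 6.0, 11.0, 9.0]
coefs = fit_mlr(X_toy, y_toy)
print(coefs, predict_mlr(coefs, [6, 4]))
```

The normal equations are adequate for a handful of lagged inputs; for many or strongly correlated regressors a QR-based or SVD-based least squares routine from a numerical library is the more robust choice.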

2.5 Model Performance Indicators

Hydrological models are usually evaluated using quantitative indicators. Legates and McCabe (1999) recommended that predictive models in hydrology be assessed using a goodness-of-fit measure, for example the correlation coefficient (R), together with at least one absolute error measure (e.g., mean absolute error (MAE) or root mean square error (RMSE)). Thus, the proposed data-driven models were evaluated with respect to RMSE, MAE and R for each input combination. These statistics are formulated as follows:

$$ \mathrm{RMSE}=\sqrt{\frac{1}{\mathrm{N}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}{\left({\mathrm{Q}}_{\mathrm{o},\mathrm{i}}-{\mathrm{Q}}_{\mathrm{f},\mathrm{i}}\right)}^2}} $$
(11)
$$ \mathrm{MAE}=\frac{1}{\mathrm{N}}{\displaystyle \sum_{\mathrm{i}=1}^{\mathrm{N}}\left|{\mathrm{Q}}_{\mathrm{o},\mathrm{i}}-{\mathrm{Q}}_{\mathrm{f},\mathrm{i}}\right|} $$
(12)
$$ R=\frac{{\displaystyle \sum_{i=1}^N\left({Q}_{o,i}-\overline{Q_o}\right)\left({Q}_{f,i}-\overline{Q_f}\right)}}{\sqrt{{\displaystyle \sum_{i=1}^N{\left({Q}_{o,i}-\overline{Q_o}\right)}^2}{\displaystyle \sum_{i=1}^N{\left({Q}_{f,i}-\overline{Q_f}\right)}^2}}} $$
(13)

where N is the number of streamflow records, Qo and Qf are the observed flow and the model output, respectively, and an overbar denotes the mean value.
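
The three indicators can be computed as in the following Python sketch, which uses the standard definitions: RMSE as the square root of the mean squared error, MAE as the plain mean of the absolute errors (without a square root), and R as the Pearson correlation coefficient. The function names and the example values are hypothetical.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and modelled flows."""
    n = len(obs)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / n)


def mae(obs, sim):
    """Mean absolute error between observed and modelled flows."""
    n = len(obs)
    return sum(abs(o - s) for o, s in zip(obs, sim)) / n


def corr(obs, sim):
    """Pearson correlation coefficient R."""
    n = len(obs)
    mo = sum(obs) / n
    ms = sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


# Hypothetical observed and forecasted monthly flows (m3/s).
q_obs = [1.0, 2.0, 3.0, 4.0]
q_sim = [1.0, 2.0, 3.0, 6.0]
print(rmse(q_obs, q_sim), mae(q_obs, q_sim), corr(q_obs, q_sim))
```

RMSE weights large errors more heavily than MAE, so the two together indicate whether the error is dominated by a few large misses, such as flood peaks.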

3 Case Studies and Data Preparation

3.1 Turkey Region

Average monthly intermittent streamflow data from two stations in the East Anatolia region, located in southeast Turkey, were used. The locations of the stations are illustrated in Fig. 2a. The stations are the Besiri Station (Station No: 2603) on the Garzan Stream and the Malabadi Station (Station No: 2612) on the Batman Stream, in the Firat-Dicle Basin of Turkey. The drainage areas at these sites are 2450 km2 for Besiri and 4105 km2 for Malabadi. In Turkey, the largest basin is Firat (basin number 21), with a land area of approximately 127,000 km2. The Dicle Basin (basin number 26) is the third largest, with a land area of almost 57,000 km2. The Firat basin, the largest by land area, has a total yearly flow volume of approximately 32 billion m3, followed by the Dicle Basin with approximately 25 billion m3 (Kaygusuz 1999; Demirbas and Bakis 2003). Streamflow forecasting for this region is very important for activities such as flood mitigation, management of water reservoirs, distribution of drinking water, management of water infrastructure and dam planning. The observed data are 35 years (420 months) long, covering the period between 1964 and 1999 for both stations. The data were obtained from the report of the Turkish General Directorate of Electrical Power Resources Survey and Development Administration.

Fig. 2 a The large basins of Turkey and the Malabadi (2612) and Besiri (2603) stations, b Hit and Baghdad stations located in the Iraq region

3.2 Iraq Region

Two further stations were selected for this study: Hit station on the Euphrates River and Baghdad station on the Tigris River in Iraq, as shown in Fig. 2b. The Hit and Baghdad stations cover drainage areas of approximately 264,100 km2 and 134,000 km2, respectively. The Hit station is located at 33° 36' 23" N latitude and 42° 50' 14" E longitude, and the Baghdad station at 33° 24' 34" N latitude and 44° 20' 32" E longitude. The Euphrates and Tigris Rivers are the essential source of fresh water and underpin socioeconomic development and political stability in this region. Developing accurate river flow forecasting and prediction models, in particular for the long term (e.g., monthly streamflow), is important for providing economic benefit, improving the irrigation sector, and solving water shortage problems. Monthly streamflow records covering 38 years (456 months) were used: 1960-1997 for Hit station and 1968-2005 for Baghdad station. The hydrological data were obtained from the descriptive research conducted by Saleh (2010).

3.3 Data Time Series Preparation

For all stations, the streamflow time series were split into four training/testing divisions in order to achieve the most effective model formulation. For both applications (forecasting and predicting), three divisions of the data were used to train the models, while the fourth was used to test them. The testing division was changed in each application, so four different scenarios were investigated. Table 1 gives the statistical characteristics of each data set used in this study for all stations. These statistics include the overall mean (Xmean), standard deviation (Sx), minimum and maximum flow records (Xmin and Xmax), skewness (Csx), and the auto-correlation coefficients at the antecedent lags.
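
The four training/testing scenarios can be sketched in Python as follows. The sketch assumes that the four divisions are contiguous blocks of equal length taken in chronological order; the function name and the placeholder series are hypothetical, and no mapping between the block indices and the labels of the data sets used in the tables is implied.

```python
def four_block_scenarios(series, n_blocks=4):
    """Return a list of (test_block_index, train, test) tuples.

    The series is cut into contiguous blocks; each block is held out
    once for testing while the remaining blocks are used for training.
    """
    n = len(series)
    bounds = [k * n // n_blocks for k in range(n_blocks + 1)]
    blocks = [series[bounds[k]:bounds[k + 1]] for k in range(n_blocks)]
    scenarios = []
    for k in range(n_blocks):
        train = [v for j, blk in enumerate(blocks) if j != k for v in blk]
        scenarios.append((k, train, blocks[k]))
    return scenarios


# Placeholder series with the length of the Turkish records (420 months).
flows = list(range(420))
for k, train, test in four_block_scenarios(flows):
    print(k, len(train), len(test))
```

When lagged inputs are used, it is safer to build the input and target pairs on the full series first and then split the pairs, so that no sample spans the gap left by the held-out block.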

Table 1 The monthly statistical parameters of data set for Besiri, Malabadi, Hit, Baghdad stations

4 Application and Analysis

The effectiveness of the proposed artificial intelligence approaches was examined on actual streamflow data obtained from the official organizations authorized to monitor these rivers. In the first part of the study, the LSSVR, MARS and M5-Tree models were used to forecast one-month-ahead streamflow and the results were compared with the MLR model. In addition, the effect of the periodic component on the forecasting results was explored. In the second part of the study, the applicability of the data-driven models to predicting monthly streamflow from the flow time series of a nearby river was investigated. Different input combinations based on the present and antecedent streamflow were used for forecasting and prediction. With Qt denoting the streamflow at time t, the input combinations are (i) Qt; (ii) Qt and Qt-1; and (iii) Qt, Qt-1 and Qt-2. This section provides a detailed discussion and analysis of the proposed methods. It should be noted that the river flow data for all rivers are continuous, with no missing records during the examination period.
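
For the forecasting case, the three input combinations can be built from a single flow series as in the Python sketch below. It assumes that the target is the flow one month ahead, Qt+1, which is how one-month-ahead forecasting is read here; the function name and the short example series are hypothetical.

```python
def make_lagged(series, n_lags):
    """Build inputs (Q_t, Q_{t-1}, ..., Q_{t-n_lags+1}) and targets Q_{t+1}.

    n_lags = 1, 2, 3 corresponds to input combinations (i), (ii), (iii).
    """
    X, y = [], []
    for t in range(n_lags - 1, len(series) - 1):
        X.append([series[t - k] for k in range(n_lags)])
        y.append(series[t + 1])
    return X, y


# Hypothetical monthly flows (m3/s).
flows = [10.0, 20.0, 30.0, 40.0, 50.0]
X2, y2 = make_lagged(flows, n_lags=2)  # combination (ii)
for inputs, target in zip(X2, y2):
    print(inputs, "->", target)
```

Each additional lag removes one sample from the start of the record, so the three combinations are trained on slightly different numbers of samples.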

4.1 Streamflow Forecasting

As mentioned in the previous section, the first scenario was undertaken to forecast monthly streamflow. To assess how well the results generalize to an independent data set, each input combination was cross-validated by partitioning the time series into four sets. For the LSSVR model, different values of the regularization constant and the width of the radial basis function kernel were tried to obtain the minimum RMSE. Table 2 displays the optimal LSSVR parameters for each input combination in the testing phase. Tables 3, 4, 5 and 6 give the testing phase results of the LSSVR, MARS, M5 model tree and MLR models for all stations (Besiri, Malabadi, Hit and Baghdad). The mean values of the performance indicators (RMSE and MAE) differ remarkably between the two regions. The models of the Turkish rivers showed lower RMSE and MAE values than those of the Iraqi rivers. This is due to the magnitude of the flows: the Garzan and Batman Rivers have mean flows of 53.66 and 129.37 m3/s, respectively, while the Euphrates and Tigris Rivers have mean flows of 750.06 and 838.84 m3/s, respectively.

Table 2 Regularization constant and width of RBF kernel parameters of the optimal LSSVR models for each combination Besiri, Malabadi, Hit and Baghdad stations
Table 3 Comparison of LSSVR models
Table 4 Comparison of MARS models
Table 5 Comparison of M5-Tree models
Table 6 Comparison of MLR models

Based on the mean RMSE and MAE, Tables 3, 4 and 5 show M3 as the best data set for forecasting one month ahead at the Besiri and Malabadi stations. This might be because the M3 data set provides an informative flow pattern in the training and testing phases of the models. On the other hand, M1 was the worst data set for LSSVR, MARS and M5 model tree for all the investigated input combinations, which suggests that the models could not capture the nature of the streamflow in the M1 training and testing periods. LSSVR outperformed the MARS and M5-Tree models, and the best outcome was obtained for the M3 data set with input combination (iii). The optimal LSSVR model (M3 data set and input iii) improved on the RMSE of the optimal MARS and M5-Tree models by 3.9 and 31.2 % for the Besiri station and by 2.6 and 20.6 % for the Malabadi station, respectively. There is also a significant difference between MARS and M5-Tree for both stations. For the Euphrates and Tigris Rivers the picture was different, with the best performance fluctuating across data sets and input combinations. At Hit station, the best accuracy was obtained for M2 with one lag for the LSSVR and MARS models, while the M5 model tree gave its best results for M1, also with one lag (input combination (i)). At Baghdad station, the best results were obtained using the first data set (M1) with one antecedent flow value to forecast one month ahead. The variation in the best results here reflects the highly nonstationary character of the climate of Iraq, and each approach handled the data with different consistency. Here, the poorest performance appeared for the fourth data set (M4) for all input combinations. In general, LSSVR provided better streamflow forecasts than the other methods. The best LSSVR model improved on the RMSE of the best MARS and M5-Tree models by 10.1 and 36.7 % for the Hit station and by 3.3 and 17.8 % for the Baghdad station, respectively. As in the previous application, a considerable difference exists between the MARS and M5-Tree models.

MLR models were also fitted to the same data sets, and the best results in terms of RMSE and MAE were selected for comparison. The MLR results are presented in Table 6 for all stations. The best data sets and input vectors agree well with those of the LSSVR, MARS and M5 model tree methods. Notably, LSSVR, MARS and M5-Tree improve considerably on the MLR method. To quantify this improvement, the percentage increase in accuracy was calculated for each performance criterion. Compared with the MLR model, the LSSVR model improved the mean RMSE and MAE by 8.95 and 4.19 %, 12.8 and 8.08 %, -0.12 and 4.03 %, and 13.56 and 10.03 % for the Besiri, Malabadi, Hit and Baghdad stations, respectively.
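
The percentages quoted in this section can be read as relative reductions of an error measure. The Python sketch below shows that calculation under the assumption that the improvement is defined as the reduction in RMSE or MAE divided by the value of the reference model; the source does not state the formula explicitly, and the function name and the numbers used are hypothetical.

```python
def percent_improvement(error_ref, error_new):
    """Relative reduction of an error measure, in percent.

    A positive value means the new model has the lower error;
    a negative value means the reference model has the lower error.
    """
    return (error_ref - error_new) / error_ref * 100.0


# Hypothetical mean RMSE values (m3/s) for a reference and a new model.
rmse_reference = 50.0
rmse_new = 45.0
print(percent_improvement(rmse_reference, rmse_new))
```

Under this definition, a negative entry means that the reference model had the slightly lower error for that station and criterion.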

The periodicity component was also evaluated for the forecasting models. The idea behind including this periodic component, which has a period of one year, when forecasting one month ahead is to supply the model with an external flow pattern that might improve the accuracy of the results. Table 7 displays the testing phase results for the periodic LSSVR model. Adding the periodicity component increased the average LSSVR accuracy in terms of RMSE and MAE by 20 and 23.21 %, 28.73 and 33.82 %, 2.20 and 5.91 %, and 4.98 and 11.08 % for the Besiri, Malabadi, Hit and Baghdad stations, respectively. Comparing Table 7 with Table 3, the periodic LSSVR shows the same pattern as LSSVR for the Besiri and Malabadi stations, with M3 the best and M1 the worst data set. Likewise, Hit station gives the same result, with M2 the best and M4 the worst. For Baghdad station the outcome differs: the best testing data set was 1977-1986 (M3), while the worst testing data set, 1968-1976 (M4), was the same as in the previous application of the LSSVR.

Table 7 Comparison of the P-LSSVR models

To further assess the effectiveness of the data-driven models, the linear relationship between the observed and forecasted time series was investigated for the testing period. Scatter plots for the Besiri and Malabadi stations are shown in Fig. 3a, b, respectively. These figures show the best LSSVR, MARS, M5 model tree, periodic LSSVR (P-LSSVR) and MLR models for the M3 data set. P-LSSVR was found to lie closest to the fit line compared with the other models. Similarly, Fig. 3c, d show the fit lines for the Hit and Baghdad stations. At Hit station, the best R value was obtained by the LSSVR model with the M2 data set and input combination (i). However, Fig. 3c shows only a slight difference between the LSSVR and MLR models. Fig. 3d displays the fit lines of all the models for M1 and combination (i), except the MLR method with combination (ii), for Baghdad station.

Fig. 3 The observed and forecasted streamflow scatterplots for the LSSVR, MARS, M5-Tree, MLR and P-LSSVR models: a the M3 data set, Beşiri station, b the M3 data set, Malabadi station, c the M1 and M2 data sets, Hit station, and d the M1 data set, Baghdad station

Overall, LSSVR and MARS generally performed better than the M5-Tree and MLR models. The reason may be that the linear structure of the M5-Tree and MLR models prevents them from accurately modeling the highly nonlinear streamflow process. Wang et al. (2009) compared the ability of autoregressive moving-average (ARMA), ANN, ANFIS, genetic programming (GP) and SVM methods in forecasting monthly discharge time series and obtained R of 0.786, 0.786, 0.801, 0.815 and 0.823 for the ARMA, ANN, ANFIS, GP and SVM, respectively. Rezaeian-Zadeh et al. (2013) predicted monthly discharges in a semi-arid region using ANN with different training algorithms and found that the best ANN model, trained with the scaled conjugate gradient algorithm, provided a correlation of 0.78. Turan and Yurdusev (2014) used ANFIS and a genetic fuzzy system (GFS) in predicting monthly river flows of the Gediz Basin in Turkey and obtained R of 0.84 and 0.85 for the best ANFIS and GFS models. It is clear from the performance metrics in the presented tables that the LSSVR and MARS models provided accurate results in forecasting monthly streamflow from the R2 viewpoint.

4.2 Streamflow Predicting

In this section, streamflow prediction at a particular station is carried out using the LSSVR, MARS, M5 model tree, P-LSSVR and MLR models with streamflow data from a nearby station. This kind of modeling is significant in cases of missing river flow records or poor-quality discharge monitoring (e.g., at upstream or downstream stations), where a nearby station can be used to estimate the missing data. In this study, the prediction was undertaken for the Turkish streams, because the Garzan and Batman rivers have the same hydrological drainage features, so the prediction is carried out under homogeneous physical characteristics. Here also, the data were divided into four divisions and cross-validated. As in the previous subsection, Table 8 gives the optimal parameters of the LSSVR model. For predicting streamflow at Malabadi station (Batman River) using the river flow data of Besiri station (Garzan River), Tables 9 and 10 give the evaluation results of the LSSVR and MARS models and of the M5-Tree and MLR models, respectively. According to the mean RMSE and MAE, LSSVR and MARS performed best for the M3 data set with input combinations (iii) and (ii), respectively, while the M5-Tree model achieved its best accuracy for the M4 data set with two lagged values. All three models gave their lowest accuracy for the M1 data set. The best LSSVR model (M3 data set and input iii) improved on the RMSE of the best MARS (M3 data set and input ii) and M5-Tree (M4 data set and input iii) models by 5.3 and 11.9 %, respectively. Compared with the MLR model (Table 10), the best LSSVR model improved the mean RMSE and MAE of the prediction by 37.04 and 29.95 %, respectively.

Table 8 The optimal parameters of the LSSVR models in cross application
Table 9 Comparison of the LSSVR and MARS models in predicting monthly streamflows of the Malabadi Station by using the data of Besiri station
Table 10 Comparison of the M5-Tree and MLR models in predicting monthly streamflows of the Malabadi Station by using the data of Besiri station

The effect of embedding the periodicity component was also tested for the prediction phase. This was done for the most accurate model obtained in the foregoing applications, the least square support vector regression model. The optimal regularization constant and RBF kernel values are given in Table 11. The test results of P-LSSVR are shown in Table 12; the best average accuracy of P-LSSVR was obtained from the M3 data set, whereas the worst came from M1 and M2, with slight variation between them. Including the periodic component improved the mean RMSE and MAE of LSSVR by 22.50 and 24.17 %, respectively. Finally, the observed and predicted river flows for LSSVR, MARS, M5 model tree, MLR and P-LSSVR are illustrated in Fig. 4 for the best data set. The closest prediction model was found to be P-LSSVR, with an R value of 0.89.

Table 11 The optimal parameters of P-LSSVR models in cross application
Table 12 Comparison of the P-LSSVR models in predicting monthly streamflows of the Malabadi Station by using the data of Besiri station
Fig. 4 The streamflow prediction of the Malabadi Station by LSSVR, MARS, M5-Tree, MLR and P-LSSVR using the M3 data set of Beşiri Station

5 Conclusion

Streamflow modeling is a challenging task for hydrology researchers because of the chaotic disturbances, complex non-linear dynamics and randomness of this hydrological variable. In the current research, the potential of three heuristic regression models, namely LSSVR, MARS and M5 model tree, was investigated for forecasting and predicting long-term streamflow. The application and analysis were conducted on the flows of four rivers: the Batman and Garzan Rivers in Turkey and the Euphrates and Tigris Rivers in Iraq. The findings are as follows.

(i) LSSVR, MARS and M5 tree models outperformed the classical MLR method in both scenarios, forecasting and predicting.

(ii) In general, LSSVR gave better one-month-ahead forecasting and prediction accuracy than MARS and M5 model tree. This is attributed to the fact that least square support vector regression is a developed version of support vector regression that avoids the quadratic programming problem, in addition to its ability to capture the complicated non-linear relationship.

(iii) The periodic component was included in the input combinations of the models. The results showed that adding this component was remarkably helpful in describing the process of the forecasted and predicted monthly streamflow and improved the modeling accuracy for all the examined rivers.