Developing Novel Robust Models to Improve the Accuracy of Daily Streamflow Modeling

Mohammadi, Babak; Ahmadi, Farshad; Mehdizadeh, Saeid; Guan, Yiqing; Pham, Quoc Bao; Linh, Nguyen Thi Thuy; Tri, Doan Quang

doi:10.1007/s11269-020-02619-z

Published: 13 July 2020

Volume 34, pages 3387–3409, (2020)

Water Resources Management


Babak Mohammadi¹,
Farshad Ahmadi²,
Saeid Mehdizadeh³,
Yiqing Guan¹,
Quoc Bao Pham^4,5,
Nguyen Thi Thuy Linh⁶ &
…
Doan Quang Tri ORCID: orcid.org/0000-0003-3990-5838⁷


Abstract

Streamflow plays a major role in the optimal management and allocation of available water resources in each region. Reliable techniques therefore need to be developed for streamflow modeling. In the present study, the performance of streamflow modeling is improved by developing novel boosted models. The daily streamflows of four hydrometric stations are used: the Brantford and Galt stations located on the Grand River, Canada, and the Macon and Elkton stations located on the Ocmulgee and Umpqua rivers, United States, respectively. Three different types of boosted models are proposed and implemented by coupling the classical multi-layer perceptron (MLP) with two optimization algorithms, namely particle swarm optimization (PSO) and coupled particle swarm optimization-multi-verse optimizer (PSOMVO), and with a time series model, namely the bi-linear (BL) model. Thus, the boosted MLP-PSO, MLP-PSOMVO, and MLP-BL models are developed. The accuracy of all the boosted models is compared with that of the classical MLP and BL using statistical metrics. It is concluded that all the boosted models developed at the studied stations model the daily streamflows more accurately than the classical MLP; however, the boosted MLP-BL models generally outperformed the MLP-PSO and MLP-PSOMVO ones.


1 Introduction

Streamflow is an essential component of the water cycle. It provides significant information for designing water infrastructure and flood control systems, mitigating the impacts of droughts on available water resources systems, optimizing the management of irrigation and agriculture in a particular region, generating hydropower, etc. (Yaseen et al. 2016; Anghileri et al. 2016; Tikhamarine et al. 2019a; Fang et al. 2019). Therefore, knowledge of the streamflow time series is a necessity for hydrologists, water resources managers, and decision-makers. Streamflow exhibits non-linear, stochastic, and non-stationary behaviors that make it a complex phenomenon (Bayazit 2015). Given this, hydrologists need to use and develop robust techniques that capture these features.

In a general classification, streamflow modeling approaches can be categorized into two main groups: physically-based techniques and data-driven models (Peugeot et al. 2003; He et al. 2014; Di et al. 2014; Zhang et al. 2016). In physically-based models, the streamflow time series are simulated by modeling the potential interactions among various factors such as weather information, land surface characteristics, etc. (Wang et al. 2016; Fang et al. 2019). These models therefore tend to be complicated to apply in practice. Data-driven models, in contrast, can model the streamflow process from historical records of streamflow or other variables without any need to know the physical processes governing it (Di et al. 2014; Garcia et al. 2016; Zhang et al. 2016). Two well-known and commonly used types of data-driven models in streamflow forecasting are time series models and artificial intelligence (AI) techniques. AI techniques have received widespread interest, whereas time series models have been used less often than AI techniques in streamflow modeling.

Standalone classical models often fail to provide adequate performance when modeling hydrological time series such as streamflow. Therefore, major efforts have been made to improve the modeling accuracy of the standalone models. Recently, the implementation of boosted models has progressed remarkably. In this context, coupling standalone models such as AI-based approaches with wavelet analysis, time series models, optimization algorithms, etc. can be considered as an alternative to the standalone models with a reliable level of performance. The boosted models generated by integrating the AI and time series models can achieve higher accuracy because the outputs of both models are combined in a boosted AI-time series model. In fact, the standalone classical AI and time series models focus only on capturing the deterministic and stochastic segments, respectively, while the boosted models use both terms to improve the modeling performance. Furthermore, optimization algorithms are coupled with AI models to minimize the cost function of the AI model and to improve its training phase. Recently, metaheuristic optimization algorithms have shown considerable potential to alleviate the difficulties associated with the training and parameterization of AI models. These algorithms enable the AI models to learn automatically and improve the model performance (Pham et al. 2020; Mohammadi 2019a, 2019b; Moazenzadeh et al. 2018). Various bio-inspired meta-heuristic algorithms have been developed to cope with optimization problems by imitating natural phenomena. Some prevalent nature-inspired meta-heuristic algorithms include the shuffled frog leaping algorithm (Mohammadi et al. 2020), particle swarm optimization (Tikhamarine et al. 2019b; Aghelpour et al. 2020), bee colony algorithm (Choong et al. 2017), genetic algorithm (Jahani and Mohammadi 2019), krill-herd algorithm (Moazenzadeh and Mohammadi 2019; Mohammadi and Aghashariatmadari 2020), gray wolf optimizer algorithm (Maroufpoor et al. 2020; Tikhamarine et al. 2019a), firefly algorithm (Aghelpour et al. 2019), and whale optimization algorithm (Mohammadi and Mehdizadeh 2020; Vaheddoost et al. 2020).

In recent years, the application of AI-based models has received significant attention in modeling the streamflow time series on various time scales, including daily and monthly. Details of the previous works reviewed in this study are summarized in Table 1.

Table 1 Information of previous studies published in literature reviewed in the current study


As can be concluded from the literature review summarized in Table 1, time series models have received less attention in streamflow modeling. In addition, the boosted models have shown superior performance compared to the standalone models, which confirms the need to develop boosted models for precise modeling of streamflow time series.

The chief purpose of this research work is to enhance the modeling accuracy of daily streamflow time series at two hydrometric stations in Canada and two in the United States. To this end, an AI-based model, namely the multi-layer perceptron (MLP), is coupled with particle swarm optimization (PSO), particle swarm optimization coupled with the multi-verse optimizer (PSOMVO), and the bi-linear (BL) model. Besides developing these boosted models, the performance of the classical MLP and BL is also evaluated in modeling the daily streamflows and compared with that of the proposed boosted models. The innovative aspects of this study are the development of the boosted MLP-PSOMVO and MLP-BL models, as well as the application of the classical BL time series model. To the best of the authors’ knowledge, this study is the first attempt in the literature to model daily streamflow through the boosted MLP-BL and MLP-PSOMVO models. The main reason for selecting the non-linear BL model is its potential for capturing the stochastic term of streamflow, a non-linear phenomenon in the hydrologic cycle. Additionally, according to the literature, the MLP model performs well in predicting streamflow, and nature-inspired optimization algorithms such as PSO and MVO can improve the ability of the classical MLP. Hence, MLP-PSOMVO is proposed as a model for predicting streamflow with higher accuracy than the classical MLP model.

2 Materials and Methods

2.1 Study Area and Data Used Description

To carry out this research work, the daily streamflow data of two hydrometric stations in Canada and two in the United States are used. The Brantford and Galt stations, located on the Grand River in Canada, are considered. The Grand River, with a length of 280 km, is one of the large rivers in southwestern Ontario and lies completely within the boundaries of southern Ontario. Furthermore, the Macon station located on the Ocmulgee River and the station near Elkton located on the Umpqua River, United States, are selected. The Macon and Elkton stations are located in the state of Georgia in the southeastern United States and the state of Oregon in the northwestern United States, respectively. The geographical information of the studied hydrometric stations and the statistical properties of the observed daily streamflows, including the minimum (X_min), maximum (X_max), mean (X_mean), standard deviation (X_sd), and coefficient of variation (X_cv) for both the train and test phases, are presented in Table 2. The geographical locations of the studied hydrometric stations in Canada and the United States are depicted in Fig. 1.

Table 2 Statistical parameters of the observed daily streamflow data for the studied sites during train and test phases


The daily streamflow time series of the studied sites span the water years from 1 Oct. 1998 to 30 Sep. 2018 (i.e., 20 years). The daily streamflow data of the Canadian and American stations are acquired from (https://wateroffice.ec.gc.ca) and (http://water.usgs.gov/waterwatch/), respectively. In this study, all the data are split into the train (i.e., 75% of the data, between 1 Oct. 1998 and 30 Sep. 2013) and test (i.e., 25% of the data, between 1 Oct. 2013 and 30 Sep. 2018) datasets (Fig. 2). As can be seen in Table 2, the statistical parameters of the daily streamflows at the studied locations are almost the same for the train and test stages.

The following equation is used in this study to standardize the observed daily streamflows of the studied sites as:

$$ {Q}_s=\frac{Q_o-\overline{Q_o}}{\sigma_{Q_o}} $$

(1)

where Q_s is the standardized daily streamflow, Q_o is the observed daily streamflow, $ \overline{Q_o} $ represents the mean of the observed daily streamflows for each of the train and test periods, and $ {\sigma}_{Q_o} $ denotes the standard deviation of the observed daily streamflows for each of the train and test periods.
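As a minimal sketch of Eq. (1), assuming the observations are held in a NumPy array: the function name `standardize` and the sample values are illustrative and not taken from the study, and the use of the population standard deviation (`ddof=0`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def standardize(q_obs):
    """Apply Eq. (1): subtract the mean and divide by the standard deviation.

    Returns the standardized series together with the mean and standard
    deviation, which are needed later to map model outputs back to flow units.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    mean = q_obs.mean()
    std = q_obs.std()  # population standard deviation (assumption)
    return (q_obs - mean) / std, mean, std

# Made-up daily flows, for illustration only
q = np.array([12.0, 15.5, 9.8, 30.2, 22.1])
q_s, q_mean, q_sd = standardize(q)
```

Applied separately to the train and test periods, this yields series with zero mean and unit standard deviation in each period.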

2.2 Bi-Linear (BL) Time Series Model

The non-linear BL model was initially proposed by Granger and Andersen (1978). It is developed based on the ARMA models. The BL model is derived from the second-order extension of the Taylor series and is denoted BL(p, q, r, s) (Fan and Yao 2003). It can be formulated as follows:

$$ {Z}_t=\sum \limits_{i=1}^p{\varphi}_i{Z}_{t-i}-\sum \limits_{j=1}^q{\theta}_j{\varepsilon}_{t-j}+\sum \limits_{i=0}^r\sum \limits_{j=1}^s{\beta}_{ij}{Z}_{t-i-j}{\varepsilon}_{t-j}+{\varepsilon}_t $$

(2)

where Z_t is a standardized time series, p, q, r, and s are non-negative integers indicating the BL order, φ, θ, and β are the model coefficients, and ε_t is a standardized stochastic series.

The term $ \sum \limits_{i=0}^r\sum \limits_{j=1}^s{\beta}_{ij}{Z}_{t-i-j}{\varepsilon}_{t-j} $ (i.e., the product of Z_t and ε_t) in Eq. (2) makes the BL a non-linear model (Ainkaran 2004); hence, it can be used in modeling non-linear phenomena such as streamflow.

Interested readers can refer to Fan and Yao (2003) for more details regarding the required steps of fitting a non-linear BL.
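To make the recursion in Eq. (2) concrete, the sketch below computes the one-step residuals ε_t of a BL(p, q, r, s) model for given coefficients. It is an illustration only and not the fitting procedure used in the study: the function name `bl_residuals`, the coefficient values, and the choice to set the first residuals to zero are all assumptions, and coefficient estimation (e.g., by minimizing the residual variance) is not shown.

```python
import numpy as np

def bl_residuals(z, phi, theta, beta):
    """One-step residuals eps_t of Eq. (2) for fixed coefficients.

    phi   : AR coefficients, length p
    theta : MA coefficients, length q (may be empty)
    beta  : bilinear coefficients, shape (r + 1, s); beta[i, j - 1] = beta_ij
    Residuals before the first computable time step are set to zero (assumption).
    """
    z = np.asarray(z, dtype=float)
    phi = np.asarray(phi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    beta = np.atleast_2d(np.asarray(beta, dtype=float))
    p, q = len(phi), len(theta)
    r, s = beta.shape[0] - 1, beta.shape[1]
    start = max(p, q, r + s)  # earliest t with all lags available
    eps = np.zeros_like(z)
    for t in range(start, len(z)):
        ar = sum(phi[i - 1] * z[t - i] for i in range(1, p + 1))
        ma = sum(theta[j - 1] * eps[t - j] for j in range(1, q + 1))
        bil = sum(beta[i, j - 1] * z[t - i - j] * eps[t - j]
                  for i in range(r + 1) for j in range(1, s + 1))
        eps[t] = z[t] - (ar - ma + bil)
    return eps

# Synthetic BL(1, 0, 0, 1) series with made-up coefficients
rng = np.random.default_rng(0)
n = 200
eps_true = rng.normal(scale=0.1, size=n)
eps_true[0] = 0.0
z = np.zeros(n)
z[0] = 0.3
for t in range(1, n):
    z[t] = 0.5 * z[t - 1] + 0.2 * z[t - 1] * eps_true[t - 1] + eps_true[t]

eps_est = bl_residuals(z, phi=[0.5], theta=[], beta=[[0.2]])
```

With the true coefficients, the recursion recovers the noise series that generated the synthetic data, which is a convenient sanity check before applying it to real series.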

2.3 Multi-Layer Perceptron (MLP) Neural Network

MLP is known as one of the most common examples of feed-forward neural networks, the potential of which in modeling many engineering problems has been confirmed in recent years (Kisi et al. 2017; Jahani and Mohammadi 2019). In general, MLP consists of a sequential multi-layer network structure, including input layer, hidden/middle layer(s), and output layer. After obtaining information in the input layer, the information processing and model learning process begin in the hidden layer(s) with a number of neurons and activation functions. The type of activation function is selected with respect to the user experience, where sigmoid and linear functions are often used for the hidden and output layers, respectively.

Considering φ as the activation function of the jth neuron in the hidden layer, the output of the neuron can be expressed as follows:

$$ {y}_j=\varphi \left(\sum \limits_i{W}_{ji}{x}_i+{b}_j\right) $$

(3)

where x_i is the ith input, and W and b stand for the synaptic weights and bias, respectively. In this research, the Levenberg-Marquardt method is used to optimize the classic MLP structure (Kisi et al. 2016).
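The forward pass described above can be sketched as follows, assuming a single hidden layer with a sigmoid activation and a linear output neuron, as the text indicates is common. The function names and the weight values are hypothetical, and training (Levenberg-Marquardt in the study) is not shown.

```python
import numpy as np

def sigmoid(a):
    """Logistic activation used for the hidden layer."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, W_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer MLP: Eq. (3) in the hidden layer, linear output."""
    hidden = sigmoid(W_hidden @ x + b_hidden)  # one value per hidden neuron
    return float(w_out @ hidden + b_out)

# Three inputs (e.g., three lagged standardized flows) and two hidden neurons;
# all numbers are illustrative only.
x = np.array([0.2, -0.1, 0.4])
W_hidden = np.array([[0.5, -0.3, 0.8],
                     [0.1, 0.7, -0.2]])
b_hidden = np.array([0.0, 0.1])
w_out = np.array([1.2, -0.6])
b_out = 0.05
y_hat = mlp_forward(x, W_hidden, b_hidden, w_out, b_out)
```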

2.4 Particle Swarm Optimization (PSO)

The PSO meta-heuristic algorithm was first proposed by Eberhart and Kennedy (1995) for the optimization of complicated processes. It is a general-purpose minimization method for problems whose solution is a point or surface in an n-dimensional space. In such a space, candidate solutions (particles) are generated and each is assigned an initial velocity. These particles then move through the solution space, and the results are evaluated based on a “merit criterion” after each time interval (Assareh et al. 2010). Over time, the particles accelerate toward particles that have a higher merit criterion and belong to the same communication group. Although every optimization method works well only for a certain range of problems, PSO has shown great success in solving continuous optimization problems. The steps of the PSO algorithm are: (i) establish and evaluate the initial population; (ii) determine the best personal memory of each particle and the best collective memory of the swarm; (iii) update the velocities and positions; (iv) if the stopping conditions are not met, return to step (ii).
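Steps (i) to (iv) can be sketched with a common global-best PSO variant that uses an inertia weight. This is not necessarily the configuration used in the study: the function name `pso_minimize`, the parameter values (`w`, `c1`, `c2`, swarm size, bounds), and the toy cost function are assumptions chosen for illustration. In the MLP-PSO setting, the cost would be the MLP training error as a function of the weights and biases.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=30, n_iter=200, bounds=(-5.0, 5.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight (illustrative parameter values)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    # (i) establish and evaluate the initial population
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest_pos = pos.copy()
    pbest_cost = np.array([cost(p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]

    for _ in range(n_iter):  # (iv) repeat until the iteration budget is spent
        # (iii) update velocities and positions
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, lo, hi)
        # (ii) update personal and global bests
        costs = np.array([cost(p) for p in pos])
        improved = costs < pbest_cost
        pbest_pos[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = pbest_pos[g].copy(), pbest_cost[g]
    return gbest_pos, gbest_cost

# Toy problem: minimize the squared distance to the point (1, 1, 1)
best_x, best_f = pso_minimize(lambda x: float(np.sum((x - 1.0) ** 2)), dim=3)
```

A fixed iteration count is used here as the stopping condition; a tolerance on the improvement of the global best would be an equally reasonable choice.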

2.5 Multi-Verse Optimizer (MVO)

This section gives a brief overview of a recently developed nature-inspired algorithm named the multi-verse optimizer (MVO), which was first proposed by Mirjalili et al. (2016). The algorithm is inspired by three concepts in cosmology: the white hole, the black hole, and the wormhole. The mathematical models of these concepts are designed to perform exploration, exploitation, and local search, respectively (Mirjalili et al. 2016). The formulation involves two main coefficients: the wormhole existence probability and the travelling distance rate. Results on four challenging test problems indicated that MVO delivers highly competitive results and outperforms the compared algorithms from the literature in most tests (Faris et al. 2016). Interested readers can refer to Mirjalili et al. (2016) for more details about the MVO structure.

2.6 Models Development

As mentioned, the daily streamflow time series of the considered locations are modeled through the classical BL and MLP, as well as the boosted MLP-PSO, MLP-PSOMVO, and MLP-BL models. The following steps are taken to develop the classical and boosted models:

For the classical BL, various BL models of different orders are initially tested and fitted to the standardized daily streamflow data. Then, the optimal BL model is selected at each location based on the minimum value of the Akaike Information Criterion (AIC).

The classical MLP models at the study sites are developed using the antecedent daily streamflows. In this regard, one-day to three-day lagged streamflow data (i.e., Q_t-1, Q_t-2, Q_t-3) are used for modeling the streamflow of the present day (i.e., Q_t). The same input and output parameters are also employed when implementing the boosted models. Table 3 summarizes the input and output variables considered in the modeling procedure.
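Building the lagged input matrix can be sketched as below. The helper name `make_lagged` is hypothetical, and the mapping of one, two, and three lags to the models M1, M2, and M3 is an assumption inferred from the text rather than taken from Table 3.

```python
import numpy as np

def make_lagged(q, n_lags):
    """Return inputs [Q_{t-1}, ..., Q_{t-n_lags}] and target Q_t for each usable t."""
    q = np.asarray(q, dtype=float)
    X = np.column_stack([q[n_lags - k: len(q) - k] for k in range(1, n_lags + 1)])
    y = q[n_lags:]
    return X, y

# Toy series 1, 2, ..., 7 so the alignment is easy to check by eye
q = np.arange(1.0, 8.0)
X, y = make_lagged(q, n_lags=3)
# First sample: inputs (Q_{t-1}, Q_{t-2}, Q_{t-3}) = (3, 2, 1), target Q_t = 4
```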

Table 3 Input and output parameters used to develop the classical and boosted models


When the PSO is used as an updating tool for the weights of the MLP, the training phase of the MLP is improved, which explains the better training compared with the classical MLP. In the boosted MLP-PSO model, the PSO plays an optimization role, searching for the optimal biases and weights of the classical MLP model. In the boosted MLP-PSOMVO model, the search agents for the weights and biases of the MLP model are computed using the PSO and MVO algorithms, so that the MLP benefits from both algorithms at the same time. PSOMVO can thus help the MLP find the optimal weights and biases, and it can also find candidate solutions that the classical MLP and PSO fail to provide. These are the reasons for the improved accuracy of the MLP under the new boosted PSOMVO method. Figure 3 shows a schematic diagram of the inputs, outputs, and modeling process for the classical MLP and the boosted MLP-PSO and MLP-PSOMVO models.

Finally, the modeled outputs of the classical MLP are summed with the residuals or stochastic errors of the classical BL to develop the boosted MLP-BL models as follows:

$$ {Q}_{Hyb}={Q}_{MLP}+{Q}_{BL} $$

(4)

where Q_Hyb is the daily streamflow modeled by the boosted MLP-BL model, Q_MLP is the daily streamflow modeled by the classical MLP, and Q_BL is the error of the classical BL.

As already mentioned, the classical models are not able to capture observed hydrological time series such as streamflow with reliable precision. The classical time series (e.g., BL) and AI (e.g., MLP) models are capable of modeling the stochastic and deterministic terms, respectively. Powerful modeling approaches are therefore needed that consider both terms in the modeling of hydrological parameters. The MLP-BL models proposed in this study apply both terms when modeling the daily streamflow.

As noted, all the streamflow datasets are standardized using Eq. (1) before modeling. Consequently, the daily streamflows modeled by the classical and boosted models are in standardized form. They must therefore be de-standardized by multiplying the outputs of each modeling technique by the corresponding standard deviation and then adding the corresponding mean.
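Eq. (4) and the de-standardization step can be sketched together as follows. The function name `hybrid_mlp_bl` and all numerical values are illustrative assumptions; the MLP outputs and BL errors would come from the fitted models, which are not shown here.

```python
import numpy as np

def hybrid_mlp_bl(q_mlp_std, bl_error_std, mean, std):
    """Combine standardized MLP output and BL error (Eq. 4), then de-standardize."""
    q_mlp_std = np.asarray(q_mlp_std, dtype=float)
    bl_error_std = np.asarray(bl_error_std, dtype=float)
    if q_mlp_std.shape != bl_error_std.shape:
        raise ValueError("MLP output and BL error must cover the same days")
    q_hyb_std = q_mlp_std + bl_error_std  # Eq. (4)
    return q_hyb_std * std + mean         # inverse of Eq. (1)

# Made-up standardized values and made-up mean / standard deviation
q_hyb = hybrid_mlp_bl([0.10, -0.35, 0.80], [0.02, 0.05, -0.10], mean=20.0, std=10.0)
```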

2.7 Performance Evaluation Metrics

The root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) are used in the present study to evaluate the accuracy of classical and proposed models in daily streamflow modeling as below:

$$ RMSE=\sqrt{\frac{\sum \limits_{i=1}^N{\left({Q}_{o,i}-{Q}_{m,i}\right)}^2}{N}} $$

(5)

$$ MAE=\frac{\sum \limits_{i=1}^N\left|{Q}_{o,i}-{Q}_{m,i}\right|}{N} $$

(6)

$$ {R}^2={\left[\frac{\sum \limits_{i=1}^N\left({Q}_{o,i}-\overline{Q_o}\right)\cdot \left({Q}_{m,i}-\overline{Q_m}\right)}{\sqrt{\sum \limits_{i=1}^N{\left({Q}_{o,i}-\overline{Q_o}\right)}^2\cdot \sum \limits_{i=1}^N{\left({Q}_{m,i}-\overline{Q_m}\right)}^2}}\right]}^2 $$

(7)

where Q_{o, i}, Q_{m, i}, $ \overline{Q_o} $, and $ \overline{Q_m} $ indicate the ith observed daily streamflow, the ith modeled daily streamflow, the average of the observed daily streamflows, and the average of the modeled daily streamflows, respectively. Lower values of the RMSE and MAE and higher values of the R² indicate better performance of the applied modeling techniques in daily streamflow modeling.
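Eqs. (5) to (7) translate directly into a few lines of NumPy. The function names and the sample values below are illustrative only; note that Eq. (7) defines R² as the squared Pearson correlation between observed and modeled values.

```python
import numpy as np

def rmse(obs, mod):
    """Root mean square error, Eq. (5)."""
    obs, mod = np.asarray(obs, dtype=float), np.asarray(mod, dtype=float)
    return float(np.sqrt(np.mean((obs - mod) ** 2)))

def mae(obs, mod):
    """Mean absolute error, Eq. (6)."""
    obs, mod = np.asarray(obs, dtype=float), np.asarray(mod, dtype=float)
    return float(np.mean(np.abs(obs - mod)))

def r2(obs, mod):
    """Coefficient of determination as the squared correlation, Eq. (7)."""
    obs, mod = np.asarray(obs, dtype=float), np.asarray(mod, dtype=float)
    d_obs, d_mod = obs - obs.mean(), mod - mod.mean()
    r = np.sum(d_obs * d_mod) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_mod ** 2))
    return float(r ** 2)

# Made-up observed and modeled flows
obs = [1.0, 2.0, 3.0, 4.0]
mod = [1.5, 2.0, 2.5, 4.0]
scores = {"RMSE": rmse(obs, mod), "MAE": mae(obs, mod), "R2": r2(obs, mod)}
```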

3 Results

First, the BL time series models are fitted to the observed streamflow data at the studied locations in their standardized forms. It is found that BL(13,4,1,1), BL(10,6,1,1), BL(10,0,1,1), and BL(12,1,1,1) are the best-performing BL models at the Brantford, Galt, Macon, and Elkton stations, respectively, with the minimum values of the AIC. The values of the RMSE, MAE, and R² metrics for both the train and test phases are reported in Table 4. The highest and lowest accuracy levels of the developed BL models are observed at the Galt and Elkton stations, respectively. The poorer performance of the BL at the Elkton station (i.e., higher values of the RMSE and MAE, as well as a lower value of the R²) can be attributed to the higher streamflows at this location than at the other stations.

Table 4 Values of RMSE, MAE, and R² statistics obtained for the BL models at the studied sites during train and test phases


Scatter plots of the observed against the modeled streamflow data for the BL model at each location during the test phase are depicted in Fig. 4. The dotted red lines in this figure are the regression lines fitted to the observed and modeled streamflows. The equations given in the form y = ax + b are the equations of the regression lines, in which y is the modeled streamflow, x is the observed streamflow, a is the slope, and b is the intercept. A model performs better when the values of a and b are closer to 1 and 0, respectively. This is the case at the Brantford and Galt stations, but not at the Macon and Elkton stations, particularly Elkton, which indicates the lower accuracy of the BL models developed at these locations.
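The slope a and intercept b of such a regression line can be obtained with an ordinary least-squares fit, for example via `numpy.polyfit`. The helper name `fit_line` and the data below are illustrative assumptions, not values from Fig. 4.

```python
import numpy as np

def fit_line(obs, mod):
    """Least-squares line mod = a * obs + b; returns (slope a, intercept b)."""
    a, b = np.polyfit(np.asarray(obs, dtype=float), np.asarray(mod, dtype=float), deg=1)
    return float(a), float(b)

# Made-up example in which the model underestimates by 10% and adds a small offset
obs = np.array([1.0, 2.0, 3.0, 4.0])
mod = 0.9 * obs + 0.2
a, b = fit_line(obs, mod)
```

A slope below 1 combined with a positive intercept, as in this toy case, is the typical signature of a model that underestimates high flows and overestimates low flows.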

The classical MLP is then used for modeling the daily streamflows of the studied locations, considering the input and output parameters listed in Table 3. In this context, different numbers of hidden layer neurons are tested and the optimal number at each station is selected. It was found that the optimal numbers of neurons in the hidden layer for the classical MLP-based M1, M2, and M3 models are 2, 30, and 6 at Brantford, 14, 7, and 20 at Galt, 18, 14, and 20 at Macon, and 18, 3, and 7 at Elkton, respectively. The values of the RMSE, MAE, and R² statistics computed for the conventional MLP during both the train and test phases are listed in the first sections of Tables 5, 6, 7 and 8. The results confirm the applicability of antecedent daily streamflow values for modeling the streamflow of each day at any location. Moreover, assessing the accuracy of the MLP models developed under models M1-M3 indicates that the modeling accuracy of daily streamflow improves as the number of lags increases.

Table 5 Values of RMSE, MAE, and R² statistics obtained at the Brantford station during train and test phases


Table 6 Values of RMSE, MAE, and R² statistics obtained at the Galt station during train and test phases


Table 7 Values of RMSE, MAE, and R² statistics obtained at the Macon station during train and test phases


Table 8 Values of RMSE, MAE, and R² statistics obtained at the Elkton station during train and test phases


As previously mentioned, the main aim of the current study is to increase the performance of the classical MLP for modeling the daily streamflow. To that end, the boosted MLP-PSO, MLP-PSOMVO, and MLP-BL models are developed. The values of the statistical metrics obtained for these boosted models are tabulated in the second to fourth sections of Tables 5, 6, 7 and 8. As can be seen, the RMSE and MAE values are markedly lower, and the R² values higher, for the proposed models than for the classical MLP.

Here, the performance of all the classical and boosted models implemented in this study is evaluated for the daily streamflow modeling of the considered locations. The BL time series models developed at all stations perform worse than the classical MLP. A comparative assessment of the classical MLP and boosted MLP-based models shows that the boosted models, including the MLP-PSO, MLP-PSOMVO, and MLP-BL, outperform the classical MLP in daily streamflow modeling. In this context, the MLP-PSOMVO models perform better than the MLP-PSO ones at all the study locations; however, the boosted MLP-BL models are the best-performing boosted paradigms with the highest degree of precision. The only exception is observed at the Macon station, where the boosted MLP-PSOMVO models present the best results. The best-performing models are M3 (MLP-BL) at Brantford, M2 (MLP-BL) at Galt, M3 (MLP-PSOMVO) at Macon, and M2 (MLP-BL) at Elkton. As an example, compared with the corresponding classical MLP models, the RMSE values of these boosted models are reduced in the train and test phases by 73.719% and 72.366% for Brantford, 72.513% and 68.707% for Galt, 31.147% and 37.719% for Macon, and 73.823% and 72.719% for Elkton.

In order to graphically assess the efficiencies of the classical MLP and boosted MLP-based models, scatter and radar diagrams are prepared (Figs. 5, 6 and 7). To this end, the best-performing classical MLP and boosted MLP-PSO, MLP-PSOMVO, and MLP-BL models at each station during the test phase are selected to prepare the scatter plots. The error criteria of the superior models are highlighted in boldface in Tables 5, 6, 7 and 8. The lower dispersion of the data, higher slopes, and lower intercepts of the fitted regression lines for the boosted models compared with the classical MLP indicate the dependable performance of the boosted models for daily streamflow modeling. In addition, radar diagrams are provided to illustrate how the values of the RMSE metric change between the classical MLP and the proposed models (Fig. 7). For this purpose, the RMSE values of all the models under models M1-M3 during the test phase are used. The radar diagrams in this study are triangular since three types of models (i.e., M1, M2, M3) are considered when modeling the daily streamflow. As can be seen, the triangles for the classical MLP models at the studied locations are large, indicating the highest values of the RMSE and therefore the lowest accuracy of the classical MLP. However, these triangles are much more compact for the boosted models, especially for the boosted MLP-BL and MLP-PSOMVO models. This confirms the suitability of the boosted models compared to the classical MLP for modeling the daily streamflow time series with reliable precision.

4 Discussion

As concluded, the performance of the classical MLP at the studied stations was found to be better than that of the classical BL. This result could be due to the time scale of the data used (i.e., daily), so contradictory results may be obtained at other time scales such as monthly or seasonal. Superior performance of linear and non-linear time series models compared to AI techniques was reported by some researchers on a monthly time scale. In this regard, the linear AR model showed higher accuracy than the AI-based models (GEP, FFNN, RBFN, ANFIS) applied to monthly streamflow modeling (Terzi and Ergin 2014). Moreover, the capability of the linear FARIMA as well as the non-linear SETAR and GARCH time series-based models for monthly streamflow modeling was better than that of the MARS, GEP, ANN, and RF (Mehdizadeh et al. 2019b; Fathian et al. 2019).

The streamflow data modeled by any technique, including the MLP, can deviate from the observed data. Hence, the main purpose of this study was to improve the daily streamflow modeling by coupling the MLP with three techniques: the time series-based BL and the two optimization-based PSO and PSOMVO. The superior performance of the boosted MLP-PSO and MLP-PSOMVO models compared with the classical MLP can be explained by the fact that PSOMVO, with simultaneous global and local search capability, can train the MLP with minimum error. As a result, the proposed method can find the optimal values of the desired function and is a suitable learning algorithm for the classical MLP. The classical MLP learns better with PSOMVO, which leads to higher accuracy of streamflow modeling. On the other hand, considering both the stochastic and deterministic terms in the boosted MLP-BL is likely the most important reason for its higher accuracy compared to the classical MLP when modeling the daily streamflow.

One of the results for the boosted models (i.e., the higher accuracy of the boosted MLP-BL compared with the classical MLP) supports the outcomes of previous works. For example, MARS and GEP models were coupled with the ARCH time series model for reference evapotranspiration estimation (Mehdizadeh 2018). The linear AR, MA, and ARMA time series models were hybridized with the MARS and K-nearest neighbors (KNN) approaches for precipitation modeling (Mehdizadeh 2020). The performance of GEP and Bayesian network (BN) models was improved for streamflow modeling by coupling these AI models with the linear AR and ARMA (Mehdizadeh and Kozekalani Sales 2018). In other studies, hybrid models were proposed by combining various time series models, including the linear FARIMA as well as the non-linear ARCH, GARCH, and SETAR, with diverse types of AI models such as MARS, GEP, and ANN for precipitation and streamflow modeling (Mehdizadeh et al. 2017, 2018, 2019b). These studies reported improvements in the performance of AI techniques through the boosted AI-time series models.

Furthermore, the higher performance of boosted models developed by coupling AI and optimization-based models, compared to the standalone AI models, was confirmed and reported by some scholars. For example, a new boosted model based on coupling the shuffled frog leaping algorithm (SFLA) and ANFIS was implemented for predicting streamflow in two rivers in Vietnam (Mohammadi et al. 2020). A boosted technique was proposed by integrating the MARS and DE as an optimization approach (i.e., MARS-DE) for monthly streamflow estimation of the Tigris River, Iraq (Al-Sudani et al. 2019). The proposed boosted model achieved better estimates of the streamflow than the standalone MARS. In another study, boosted models were developed by coupling the GWO method with the AI-based SVM, ANN, and MLR for estimating the monthly streamflow of the Aswan High Dam, Egypt (Tikhamarine et al. 2019a). The results indicated the superior performance of the proposed models over the standalone ones. Other nature-inspired algorithms have also recently been applied by researchers in hydrological studies. For example, the SVR model was coupled with the Krill Herd algorithm (SVR-KHA) for daily solar radiation estimation in Iran (Mohammadi and Aghashariatmadari 2020). The ANFIS model was modified by the Grey Wolf Optimizer algorithm (ANFIS-GWO) for soil moisture simulation (Maroufpoor et al. 2020). The MLP model was extended by the Whale Optimization Algorithm (MLP-WOA) for predicting the field capacity and the permanent wilting point (Vaheddoost et al. 2020).

One of the main flaws of classical AI models such as the MLP applied in the current study is their weakness in modeling the extreme values of hydrological time series such as streamflow. The extreme values of a streamflow time series correspond to droughts and floods. Knowing the extreme streamflows (i.e., high and low values) of each river is very important and can support the design of water resources structures such as dam spillways and sluiceway operations (Kisi and Sanikhani 2012). Evaluating the performance of the classical MLP models in modeling the extreme values demonstrates that the classical MLP models developed at all the study stations over-estimate and under-estimate the low and peak streamflow data. However, the boosted models show superior performance in capturing the observed daily streamflow time series, specifically for the high values (i.e., floods). In this regard, one of the most important reasons for the low performance of the classical MLP models is their poorer ability to model the high values, while the boosted models show an acceptable capability for modeling the peak values of the streamflow time series. This is clearly observed in the scatter plots illustrated in Figs. 5 and 6.
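The comparison of model behavior at low and peak flows can also be quantified numerically. The following is a minimal Python sketch, not the procedure used in this study: the function name `extreme_flow_errors` and the 10th and 90th percentile thresholds are assumptions chosen for illustration. It splits the observed series into low, middle, and high flows and reports the RMSE and mean bias of the simulated values in each range.

```python
import numpy as np


def extreme_flow_errors(observed, simulated, low_q=0.10, high_q=0.90):
    """Return RMSE and mean bias for low, middle and high observed flows.

    The quantile thresholds (low_q, high_q) are illustrative assumptions,
    not values taken from the paper.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    if obs.shape != sim.shape:
        raise ValueError("observed and simulated must have the same shape")

    # Thresholds are derived from the observed series only.
    low_thr, high_thr = np.quantile(obs, [low_q, high_q])
    masks = {
        "low": obs <= low_thr,
        "mid": (obs > low_thr) & (obs < high_thr),
        "high": obs >= high_thr,
    }

    results = {}
    for name, mask in masks.items():
        n = int(mask.sum())
        if n == 0:
            # No observations fall in this range.
            results[name] = {"n": 0, "rmse": float("nan"), "bias": float("nan")}
            continue
        err = sim[mask] - obs[mask]
        results[name] = {
            "n": n,
            "rmse": float(np.sqrt(np.mean(err ** 2))),
            # bias > 0 means over-estimation, bias < 0 means under-estimation
            "bias": float(np.mean(err)),
        }
    return results
```

Applied to the test-period series of a classical and a boosted model at the same station, a negative bias in the "high" range would indicate under-estimated flood peaks, and a positive bias in the "low" range would indicate over-estimated low flows.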

5 Conclusions

In the present study, the daily streamflow time series of four hydrometric stations, consisting of Brantford and Galt in Canada as well as Macon and Elkton in the United States, were modeled. Two classical models and three boosted paradigms were used as modeling techniques. Assessing the accuracy of the classical models, namely the time series-based BL and the AI-based MLP, illustrated that the MLP performed better than the BL for the daily streamflow modeling of the studied sites. The boosted models developed and proposed for improving the streamflow modeling (i.e., MLP-PSO, MLP-PSOMVO, and MLP-BL) were found to provide better results compared to the classical MLP. In general, the boosted MLP-BL outperformed the other two boosted models at the studied locations, except for the Macon station, where the MLP-PSOMVO was the best-performing boosted model.

In conclusion, the proposed models showed more reliable performance than the classical models used in this study. More boosted models are suggested to be implemented in future research to improve the modeling accuracy of hydrological time series such as streamflow. In this context, coupling the AI-based models with wavelet analysis, other types of time series models, and other optimization algorithms could be of use. The accuracy of the proposed models in modeling the streamflow could also be examined on other time scales, such as monthly and seasonal. Additionally, it is suggested to test the performance of the proposed models for modeling other hydrological variables, including rainfall, evaporation, and evapotranspiration.

References

Abdollahi S, Raeisi J, Khalilianpour M, Ahmadi F, Kisi O (2017) Daily mean streamflow prediction in perennial and non-perennial rivers using four data driven techniques. Water Resour Manag 31(15):4855–4874
Abudu S, Cui C-l, King JP, Abudukadeer K (2010) Comparison of performance of statistical models in forecasting monthly streamflow of Kizil River, China. Water Sci Eng 3(3):269–281
Adamowski A, Chan HF, Prasher SO, Sharda VN (2012) Comparison of multivariate adaptive regression splines with coupled wavelet transform artificial neural networks for runoff forecasting in Himalayan micro-watersheds with limited data. J Hydroinf 14(3):731–744
Aghelpour P, Mohammadi B, Biazar SM (2019) Long-term monthly average temperature forecasting in some climate types of Iran, using the models SARIMA, SVR, and SVR-FA. Theor Appl Climatol 138(3–4):1471–1480
Aghelpour P, Bahrami-Pichaghchi H, Kisi O (2020) Comparison of three different bio-inspired algorithms to improve ability of neuro fuzzy approach in prediction of agricultural drought, based on three different indexes. Comput Electron Agric 170:105279
Ainkaran P (2004) Analysis of some linear and nonlinear time series models. A thesis submitted in fulfillment of the requirements for the degree of Master of Science, School of Mathematics and Statistics, University of Sydney
Al-Sudani ZA, Salih SQ, Sharafati A, Yaseen ZM (2019) Development of multivariate adaptive regression spline integrated with differential evolution model for streamflow simulation. J Hydrol 573:1–12
Anghileri D, Voisin N, Castelletti A, Pianosi F, Nijssen B, Lettenmaier DP (2016) Value of long-term streamflow forecasts to reservoir operations for water supply in snow-dominated river catchments. Water Resour Res 52(6):4209–4225
Assareh E, Behrang MA, Assari MR, Ghanbarzadeh A (2010) Application of PSO (particle swarm optimization) and GA (genetic algorithm) techniques on demand estimation of oil in Iran. Energy 35(12):5223–5229
Bayazit M (2015) Nonstationarity of hydrological records and recent trends in trend analysis: a state-of-the-art review. Environ Process 2(3):527–542
Choong SM, El-Shafie A, Mohtar WW (2017) Optimisation of multiple hydropower reservoir operation using artificial bee colony algorithm. Water Resour Manag 31(4):1397–1411
Di C, Yang X, Wang X (2014) A four-stage hybrid model for hydrological time series forecasting. PLoS One 9(8):e104663
Eberhart R, Kennedy J (1995) Particle swarm optimization. In Proceedings of the IEEE international conference on neural networks 4:1942–1948
Fan J, Yao Q (2003) Nonlinear time series, nonparametric and parametric methods. Springer-Verlag, New York
Fang W, Huang S, Ren K, Huang Q, Huang G, Cheng G, Li K (2019) Examining the applicability of different sampling techniques in the development of decomposition-based streamflow forecasting models. J Hydrol 568:534–550
Faris H, Aljarah I, Mirjalili S (2016) Training feedforward neural networks using multi-verse optimizer for binary classification problems. Appl Intell 45:322–332
Fathian F, Mehdizadeh S, Kozekalani Sales A, Safari MJS (2019) Hybrid models to improve the monthly river flow prediction: integrating artificial intelligence and non-linear time series model. J Hydrol 575:1200–1213
Garcia M, Portney K, Islam S (2016) A question driven socio-hydrological modeling process. Hydrol Earth Syst Sci 20(1):73–92
Granger CWJ, Andersen A (1978) Non-linear time series modelling. In: Applied time series analysis I. Academic Press, pp 25–38
Hadi SJ, Tombul M (2018) Monthly streamflow forecasting using continuous wavelet and multi-gene genetic programming combination. J Hydrol 561:674–687
He Z, Wen X, Liu H, Du J (2014) A comparative study of artificial neural network, adaptive neuro fuzzy inference system and support vector machine for forecasting river flow in the semiarid mountain region. J Hydrol 509:379–386
Jahani B, Mohammadi B (2019) A comparison between the application of empirical and ANN methods for estimation of daily global solar radiation in Iran. Theor Appl Climatol 137(1–2):1257–1269
Kisi O, Sanikhani H (2012) River flow estimation and forecasting by using two different adaptive neuro-fuzzy approaches. Water Resour Manag 26:1715–1729
Kisi O, Nia AM, Gosheh MG, Tajabadi MRJ, Ahmadi A (2012) Intermittent streamflow forecasting by using several data driven techniques. Water Resour Manag 26(2):457–474
Kisi O, Genc O, Dinc S, Zounemat-Kermani M (2016) Daily pan evaporation modeling using chi-squared automatic interaction detector, neural networks, classification and regression tree. Comput Electron Agric 122:112–117
Kisi O, Sanikhani H, Cobaner M (2017) Soil temperature modeling at different depths using neuro-fuzzy, neural network, and genetic programming techniques. Theor Appl Climatol 129(3–4):833–848
Liu Z, Zhou P, Chen G, Guo L (2014) Evaluating a coupled discrete wavelet transform and support vector regression for daily and monthly streamflow forecasting. J Hydrol 519(D):2822–2831
Maroufpoor S, Bozorg-Haddad O, Maroufpoor E (2020) Reference evapotranspiration estimating based on optimal input combination and hybrid artificial intelligent model: hybridization of artificial neural network with grey wolf optimizer algorithm. J Hydrol 588:125060. https://doi.org/10.1016/j.jhydrol.2020.125060
Mehdizadeh S (2018) Estimation of daily reference evapotranspiration (ET_o) using artificial intelligence methods: offering a new approach for lagged ET_o data-based modeling. J Hydrol 559:794–812
Mehdizadeh S (2020) Using AR, MA, and ARMA time series models to improve the performance of MARS and KNN approaches in monthly precipitation modeling under limited climatic data. Water Resour Manag 34:263–282. https://doi.org/10.1007/s11269-019-02442-1
Mehdizadeh S, Kozekalani Sales A (2018) A comparative study of autoregressive, autoregressive moving average, gene expression programming and Bayesian networks for estimating monthly streamflow. Water Resour Manag 32(9):3001–3022
Mehdizadeh S, Behmanesh J, Khalili K (2017) A comparison of monthly precipitation point estimates using integration of soft computing methods and GARCH time series model. J Hydrol 554:721–742
Mehdizadeh S, Behmanesh J, Khalili K (2018) New approaches for estimation of monthly rainfall based on GEP-ARCH and ANN-ARCH hybrid models. Water Resour Manag 32(2):527–545
Mehdizadeh S, Fathian F, Safari MJS, Adamowski JF (2019a) Comparative assessment of time series and artificial intelligence models to estimate monthly streamflow: a local and external data analysis approach. J Hydrol 579:124225
Mehdizadeh S, Fathian F, Adamowski JF (2019b) Hybrid artificial intelligence-time series models for monthly streamflow modeling. Appl Soft Comput 80:873–887
Mirjalili S, Mirjalili SM, Hatamlou A (2016) Multi-verse optimizer: a nature-inspired algorithm for global optimization. Neural Comput Appl 27(2):495–513
Moazenzadeh R, Mohammadi B (2019) Assessment of bio-inspired metaheuristic optimisation algorithms for estimating soil temperature. Geoderma 353:152–171
Moazenzadeh R, Mohammadi B, Shamshirband S, Chau KW (2018) Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Eng Appl Comput Fluid Mech 12(1):584–597
Mohammadi B (2019a) Predicting total phosphorus levels as indicators for shallow lake management. Ecol Indic 107:105664
Mohammadi B (2019b) “Prediction of effective climate change indicators using statistical downscaling approach and impact assessment on pearl millet (Pennisetum glaucum L.) yield through genetic algorithm in Punjab, Pakistan” by Asmat Ullah, Nasrin Salehnia, Sohrab Kolsoumi, Ashfaq Ahmad, Tasneem Khaliq. Ecol Indic 101:973–974
Mohammadi B, Aghashariatmadari Z (2020) Estimation of solar radiation using neighboring stations through hybrid support vector regression boosted by krill herd algorithm. Arab J Geosci 13:363. https://doi.org/10.1007/s12517-020-05355-1
Mohammadi B, Mehdizadeh S (2020) Modeling daily reference evapotranspiration via a novel approach based on support vector regression coupled with whale optimization algorithm. Agric Water Manag 237:106145. https://doi.org/10.1016/j.agwat.2020.106145
Mohammadi B, Linh NTT, Pham QB, Ahmed AN, Vojteková J, Guan Y, Abba SI, El-Shafie A (2020) Adaptive neuro-fuzzy inference system coupled with shuffled frog leaping algorithm for predicting river streamflow time series. Hydrol Sci J:1–14. https://doi.org/10.1080/02626667.2020.1758703
Peugeot C, Cappelaere B, Vieux BE, Séguis L, Maia A (2003) Hydrologic process simulation of a semiarid, endoreic catchment in Sahelian West Niger. 1. Model-aided data analysis and screening. J Hydrol 279(1–4):224–243
Pham QB, Afan HA, Mohammadi B, Ahmed AN, Linh NTT, Vo ND, ... El-Shafie A (2020) Hybrid model to improve the river streamflow forecasting utilizing multilayer perceptron-based intelligent water drop optimization algorithm. https://doi.org/10.1007/s00500-020-05058-5
Siqueira H, Boccato L, Luna I, Attux R, Lyra C (2018) Performance analysis of unorganized machines in streamflow forecasting of Brazilian plants. Appl Soft Comput 68:494–506
Terzi O, Ergin G (2014) Forecasting of monthly river flow with autoregressive modeling and data-driven techniques. Neural Comput Applic 25:179–188
Tikhamarine Y, Souag-Gamane D, Ahmed AN, Kisi O, El-Shafie A (2019a) Improving artificial intelligence models accuracy for monthly streamflow forecasting using grey wolf optimization (GWO) algorithm. J Hydrol 582:124435
Tikhamarine Y, Malik A, Kumar A, Souag-Gamane D, Kisi O (2019b) Estimation of monthly reference evapotranspiration using novel hybrid machine learning approaches. Hydrol Sci J 64(15):1824–1842
Tongal H, Booij MJ (2018) Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J Hydrol 564:266–282
Uysal G, Şorman AA, Şensoy A (2016) Streamflow forecasting using different neural network models with satellite data for a snow dominated region in Turkey. Procedia Eng 154:1185–1192
Vaheddoost B, Guan Y, Mohammadi B (2020) Application of hybrid ANN-whale optimization model in evaluation of the field capacity and the permanent wilting point of the soils. Sci Pollut Res 27:13131–13141. https://doi.org/10.1007/s11356-020-07868-4
Wang S, Huang G, Baetz B, Huang W (2016) Probabilistic inference coupled with possibilistic reasoning for robust estimation of hydrologic parameters and piecewise characterization of interactive uncertainties. J Hydrometeorol 17(4):1243–1260
Yaseen ZM, Kisi O, Demir V (2016) Enhancing long-term streamflow forecasting and predicting using periodicity data component: application of artificial intelligence. Water Resour Manag 30(12):4125–4151
Zhang H, Singh VP, Wang B, Yu Y (2016) CEREF: a hybrid data-driven model for forecasting annual streamflow from a socio-hydrological system. J Hydrol 540:246–256

Author information

Authors and Affiliations

College of Hydrology and Water Resources, Hohai University, Nanjing, 210098, China
Babak Mohammadi & Yiqing Guan
Department of Hydrology & Water Resources Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Farshad Ahmadi
Water Engineering Department, Urmia University, Urmia, Iran
Saeid Mehdizadeh
Institute of Research and Development, Duy Tan University, Danang, 550000, Vietnam
Quoc Bao Pham
Faculty of Environmental and Chemical Engineering, Duy Tan University, Danang, 550000, Vietnam
Quoc Bao Pham
Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam
Nguyen Thi Thuy Linh
Sustainable Management of Natural Resources and Environment Research Group, Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Doan Quang Tri

Corresponding author

Correspondence to Doan Quang Tri.

Ethics declarations

Conflict of Interest

The authors declare no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mohammadi, B., Ahmadi, F., Mehdizadeh, S. et al. Developing Novel Robust Models to Improve the Accuracy of Daily Streamflow Modeling. Water Resour Manage 34, 3387–3409 (2020). https://doi.org/10.1007/s11269-020-02619-z

Received: 02 February 2020
Accepted: 02 July 2020
Published: 13 July 2020
Issue Date: August 2020
DOI: https://doi.org/10.1007/s11269-020-02619-z

1 Introduction