1 Introduction

Flood control, drought relief, and the optimal utilization of water resources require accurate prediction of streamflow. However, the hydrological process is extremely complex and difficult to predict, especially in the medium and long term, because of human impact, changing climatic conditions, and the geographical environment (Vicente-Guillén et al. 2012). Many researchers have sought to understand the dynamics of the rainfall-runoff process (Bradford et al. 1991; Duan et al. 1992; Huang et al. 2014). In the past, the hydrological process was regarded as stochastic (Sivakumar et al. 2001). With the rapid development of nonlinear science, nonlinear time series analysis has substantially changed the available methods. The “science of chaos” has found applications in almost all the natural sciences, including hydrological sciences (Islam and Sivakumar 2002). Even simple deterministic systems can display complex or chaotic behavior. It is now believed that nonlinear chaotic models can better describe complex hydrological dynamics (Sivakumar 2000), and chaos theory has become increasingly common in the study of the dynamics of hydrological processes (Hu et al. 2013; Ouyang et al. 2016; Hong et al. 2016; Zhao et al. 2017).

Many studies have investigated the chaotic behavior of hydrological processes (Mohammad 2016) by analyzing streamflow series using the runoff coefficient (Sivakumar et al. 2001), the exponent method (Xu et al. 2009), the correlation dimension method (Labat et al. 2016), and several independent methods, techniques and tools (Kedra 2013). Some studies have also used nonlinear chaotic methods to predict streamflow as a univariate series (Porporato and Ridolfi 1997; Islam and Sivakumar 2002; Zhou et al. 2018) and as a multivariate series incorporating information from other time series (Han et al. 2017), with chaos theory integrated through various approaches including local autoregressive polynomial methods (Bordignon and Lisi 2000), local approximations (Islam and Sivakumar 2002), genetic programming (Ghorbani et al. 2018), and artificial neural networks (ANN) (Khan et al. 2005; Dhanya 2010). Neural networks are particularly useful for forecasting because they deal well with the nonlinearity and instability of hydrological time series when the input vectors are designed using the phase space reconstruction method (Peng et al. 2017). The ANN technique has been applied successfully in many studies. However, common ANNs depend heavily on the iterative tuning of model parameters and on the initial values of the weights and biases, which can easily make the forecasting results unstable. Therefore, heuristic search algorithms have become popular in the training process.

A new learning paradigm called the Extreme Learning Machine (ELM) has been proposed for training single hidden-layer feedforward neural networks (Huang et al. 2006). ELM is much faster and more adaptable than traditional ANNs (Huang et al. 2015; Taormina and Chau 2015). In ELM, the biases of the hidden layer and the weights between the input and hidden layers are randomly generated, and the weights between the hidden and output layers are determined directly using the Moore-Penrose generalized inverse. An intelligent optimization algorithm is commonly used to optimize the biases and weights, which reduces the influence of the random parameter selection and improves the prediction performance of the ELM model. Particle swarm optimization (PSO) has many computational advantages over other optimization search methods (Jiang et al. 2010). However, its tendency toward premature convergence degrades the performance of the algorithm and reduces the probability of finding global optima (Chu et al. 2010; Jiang et al. 2013). Using ideas drawn from population division and biological evolution, Jiang et al. (2015) proposed an improved particle swarm optimization (IPSO) to solve nonlinear optimization problems. In this paper, this method is applied to train an ELM and determine the optimal values of the biases and weights.

The objectives of this study are as follows: (1) to analyze the chaotic behavior of the monthly streamflow series of the Chaohe River Basin using a variety of techniques; and (2) to develop a hybrid model integrating chaos theory and an extreme learning machine with optimal parameters selected by an improved particle swarm optimization (ELM-IPSO) to analyze and predict monthly streamflow.

2 Study Area and Data Used

The Chaohe River basin is located between 40°20′ – 41°27′N and 116°87′ – 117°34′E. The river originates in Fengning County, flows through Luanping County in Hebei Province, China, then runs down to Miyun County and empties into the Miyun Reservoir. The length of the river is about 170 km, and the average annual streamflow volume is 18.04 × 10⁹ m³. Daiying hydrological station is the control station for the Chaohe River basin, and the catchment area upstream of the control section is 4701 km² (Fig. 1). Monthly streamflow data from Daiying hydrological station, provided by the Beijing Water Authority, were used to analyze the chaotic characteristics of the river flow process. Figure 2 shows the variation in monthly streamflow for the period between January 1956 and December 2010.

Fig. 1 Map of the Chaohe River Basin and the distribution of hydrological stations

Fig. 2 Time series plot for monthly streamflow data

3 Methodology

Chaos theory was developed at the end of the nineteenth century. It deals with complex and unpredictable nonlinear systems (Dhanya and Kumar 2010). The essence of chaos is the sensitivity of the system to changes in its initial conditions (Sivakumar 2004). Several studies have since applied ideas from chaos theory to understanding geophysical phenomena. The qualities that make a system chaotic are: (i) it is deterministic; (ii) it is sensitive to initial conditions; (iii) it is neither random nor disorderly.

3.1 Phase Space Reconstruction

Phase space reconstruction is a useful tool for characterizing dynamical systems by a phase space diagram, which is essentially a coordinate system that has all the variables of the system as its basis. Each trajectory in the phase space diagram describes the evolution of the system, and each point represents the state of the system at a given time (Sivakumar 2000). All trajectories from different initial conditions in phase space will eventually converge to a subset, which is called the attractor of the system.

Phase space reconstruction was first proposed by Takens, who proved theoretically and by numerical simulation that state space reconstruction can preserve the geometric invariance of nonlinear dynamic systems (Takens 1981). Existing methods for phase space reconstruction include the method of time delays, the differential coordinate method, and the principal component analysis method, among which the method of time delays is the most popular. For a single-variable time series \( {x}_1,{x}_2,\cdots, {x}_n \), the phase space reconstruction can be expressed as:

$$ {Y}_i=\left({x}_i,{x}_{i+\tau },\cdots, {x}_{i+\left(m-1\right)\tau}\right),\kern0.5em i=1,2,\cdots, n-\left(m-1\right)\tau, $$
(1)

where m is the embedding dimension, τ is the delay time and n is the length of the time series. Choosing the reconstruction parameters m and τ is the key step when using the delay coordinate method for phase space reconstruction.

Takens demonstrated that for an embedding dimension m ≥ 2d + 1, where d is the dimension of the dynamical system, regular trajectories (attractors) can be reconstructed. According to Takens' theorem, the reconstructed phase space maintains the basic properties of the original state space. Therefore, phase space reconstruction is an effective tool for exploring the characteristics of the dynamical system.
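The delay-coordinate construction of Eq. (1) can be illustrated with a short sketch. The function name `delay_embed` and the toy series are assumptions introduced here for illustration only; they are not part of the original study.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the delay vectors Y_i of Eq. (1) as the rows of a 2-D array."""
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau   # i runs from 1 to n - (m - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for this m and tau")
    # Column k holds x_{i + k*tau}; row i is the reconstructed state Y_i.
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(m)])

series = np.arange(1, 11)            # toy series x_1..x_10 = 1..10 (illustrative)
Y = delay_embed(series, m=3, tau=2)
print(Y.shape)                       # (6, 3)
print(Y[0])                          # [1. 3. 5.]
```

With n = 10, m = 3 and τ = 2 the reconstruction yields n − (m − 1)τ = 6 state vectors, the first of which is (x₁, x₃, x₅).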

3.2 Identification of Chaotic Characteristics

Various techniques have been proposed for the identification of chaos, including the Kolmogorov entropy method (Benettin et al. 1979), the correlation dimension method (Grassberger and Procaccia 1983), the Lyapunov exponent method (Wolf et al. 1985), the nonlinear prediction method (Farmer and Sidorowich 1987), the false nearest neighbor algorithm (Kennel et al. 1992), the method of redundancy (Paluš et al. 1995), and the surrogate data method (Schreiber and Schmitz 1996). Generally, several methods need to be applied together to distinguish reliably between a chaotic and a stochastic system. In this paper, the correlation dimension method, the Lyapunov exponent method and the nonlinear prediction method were used to analyze the chaotic characteristics of the streamflow series.

3.2.1 Lyapunov Exponent

Lyapunov exponents are used to determine the chaotic characteristics of the system according to whether the phase trajectory has the features of diffusion motion. When the largest Lyapunov exponent is greater than 0, the system is chaotic.

The main methods for calculating the maximum Lyapunov exponent include Wolf’s algorithm (Wolf et al. 1985), the Jacobi matrices method (Sano and Sawada 1985), and the small data set method (Rosenstein et al. 1993). Because it is reliable, fast and easy to apply, the small data set method was used to calculate the maximum Lyapunov exponent in this paper.

Let \( {Y}_j \) be a reference point and \( {Y}_{\hat{j}} \) its nearest neighbor on a different trajectory in state space; the initial distance between them is \( {d}_j(0)=\left\Vert {Y}_j-{Y}_{\hat{j}}\right\Vert \), and \( {d}_j(i)=\left\Vert {Y}_{j+i}-{Y}_{\hat{j}+i}\right\Vert \) is the distance after i discrete time steps. The divergence of trajectories with initial separation \( {d}_j(0) \) can be described by the exponential function \( {d}_j(i)\approx {d}_j(0){e}^{\lambda \left(i\cdot \Delta t\right)} \), where Δt is the sampling period of the time series and λ is the largest Lyapunov exponent. Taking logarithms gives \( \ln {d}_j(i)=\ln {d}_j(0)+\lambda \left(i\cdot \Delta t\right) \), from which λ can be calculated using a least-squares fit (Rosenstein et al. 1993).
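The divergence relation above suggests a simple numerical recipe: average ln d_j(i) over all reference points and fit a straight line to the early part of the curve. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The function names, the `min_sep` parameter (which excludes temporally adjacent points from the neighbor search) and the logistic-map test series are all introduced here for illustration.

```python
import numpy as np

def mean_log_divergence(Y, max_steps, min_sep):
    """Average ln d_j(i) over all reference points, for i = 0..max_steps."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y) - max_steps               # points that can be followed max_steps ahead
    idx = np.arange(n)
    dist = np.linalg.norm(Y[:n, None, :] - Y[None, :n, :], axis=2)
    # Exclude the point itself and temporally close points (assumed rule).
    dist[np.abs(idx[:, None] - idx[None, :]) <= min_sep] = np.inf
    nn = np.argmin(dist, axis=1)         # nearest neighbor of each reference point
    curve = np.empty(max_steps + 1)
    for i in range(max_steps + 1):
        d = np.linalg.norm(Y[idx + i] - Y[nn + i], axis=1)
        curve[i] = np.mean(np.log(d[d > 0]))   # skip exact coincidences
    return curve

def largest_lyapunov(curve, dt, fit_steps):
    """Least-squares slope of the divergence curve over the first fit_steps points."""
    t = np.arange(fit_steps) * dt
    slope, _ = np.polyfit(t, curve[:fit_steps], 1)
    return slope

# Illustrative chaotic test series: the logistic map x_{k+1} = 4 x_k (1 - x_k).
x = np.empty(1000)
x[0] = 0.4
for k in range(999):
    x[k + 1] = 4.0 * x[k] * (1.0 - x[k])

curve = mean_log_divergence(x[:, None], max_steps=10, min_sep=1)  # m = 1 embedding
lam = largest_lyapunov(curve, dt=1.0, fit_steps=5)
print(lam > 0)   # a positive slope is the signature of exponential divergence
```

The choice of fitting region (`fit_steps`) matters in practice, because the divergence curve flattens once the separation reaches the size of the attractor.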

3.2.2 Correlation Dimension Method

The main feature of chaos is the existence of strange attractors in phase space, which can be characterized by the correlation dimension computed from the correlation integral. At present, the most widely used method for calculating the correlation dimension of a time series is the Grassberger-Procaccia algorithm (Grassberger and Procaccia 1983), which was therefore chosen for this work.

Suppose r is the radius of the sphere centered on \( {Y}_i \) or \( {Y}_j \); then the correlation integral C(r) is given by:

$$ C(r)=\underset{n\to \infty }{\lim}\frac{2}{n\left(n-1\right)}\sum \limits_{1\le i<j\le n}\theta \left[r-\left\Vert {Y}_i-{Y}_j\right\Vert \right] $$
(2)

where θ(⋅) is the Heaviside function:

$$ \theta (u)=\left\{\begin{array}{l}0,\kern1em u\le 0\\ {}1,\kern1em u>0\end{array}\right. $$
(3)

When r → 0, the relationship between C(r) and r is \( \underset{r\to 0}{\lim }C(r)\propto {r}^D \), where D is the correlation dimension, which describes the self-similar structure of a strange attractor. It can be calculated by \( D=\log {C}_n(r)/\log r \), where \( {C}_n(r) \) is the correlation integral evaluated for a series of finite length n.

In the actual calculation, r is usually increased from a small value to a large one. For each embedding dimension, the least-squares method is used to fit a straight line to the plot of log C(r) versus log r. The slope of the line is the correlation exponent. If the correlation exponent saturates to a constant as the embedding dimension increases, the series is generally considered chaotic and the constant is the correlation dimension. If there is no saturation, the system is considered stochastic (Dhanya and Kumar 2010). Chaotic sequences can therefore be distinguished from stochastic sequences by whether the correlation exponent saturates.
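The calculation described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the code used in the study: the function names, the radius grid and the test data (points scattered along a straight segment, whose dimension is known to be 1) are all hypothetical. Each pair of distinct points is counted once, matching the 2/(n(n − 1)) normalization of Eq. (2).

```python
import numpy as np

def correlation_integral(Y, radii):
    """C(r) of Eq. (2): fraction of distinct point pairs closer than r."""
    Y = np.asarray(Y, dtype=float)
    dist = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2)
    pair_dist = dist[np.triu_indices(len(Y), k=1)]   # each pair i < j once
    # Heaviside step of Eq. (3): a pair counts when r - distance > 0.
    return np.array([np.mean(pair_dist < r) for r in radii])

def correlation_exponent(Y, radii):
    """Least-squares slope of log C(r) versus log r."""
    C = correlation_integral(Y, radii)
    keep = C > 0                                      # log is undefined for C = 0
    slope, _ = np.polyfit(np.log(radii[keep]), np.log(C[keep]), 1)
    return slope

# Illustrative data: 800 points on a unit-length segment in 3-D space.
rng = np.random.default_rng(0)
t = rng.random(800)
points = t[:, None] * np.array([1.0, 2.0, 2.0]) / 3.0
radii = np.logspace(-2.5, -1.0, 10)
nu = correlation_exponent(points, radii)
print(round(nu, 1))   # close to 1, the dimension of a line
```

To test for saturation, the same slope would be computed for reconstructions with increasing embedding dimension m and plotted against m.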

3.2.3 Chaos Identification Method Based on Prediction Accuracy

The most common methods used for distinguishing dynamical chaos from stochastic noise in hydrological processes are the Lyapunov exponent method and the correlation dimension method. However, the value of the Lyapunov exponent is impacted by the choice of fitting region. The value of the correlation dimension is also affected by the embedding dimension. To avoid these problems, an approach is presented for identifying chaos based on the accuracy of nonlinear forecasts.

For a time series, the prediction accuracy can be measured by the correlation coefficient between the actual sequence and predicted sequence. The higher the correlation coefficient, the higher the prediction accuracy. The correlation coefficient is calculated as follows:

$$ R=\frac{\sum \limits_{t=1}^n\left[\left({x}_t-\overline{x}\right)\left({\hat{x}}_t-\overline{\hat{x}}\right)\right]}{\sqrt{\sum \limits_{t=1}^n{\left({x}_t-\overline{x}\right)}^2}\sqrt{\sum \limits_{t=1}^n{\left({\hat{x}}_t-\overline{\hat{x}}\right)}^2}} $$
(4)

where \( {x}_t \) and \( {\hat{x}}_t \) are the observed and predicted values, respectively, \( \overline{x} \) and \( \overline{\hat{x}} \) are the mean values of \( {x}_t \) and \( {\hat{x}}_t \), respectively, and n is the length of the time series. The parameter R indicates the strength of the linear relationship between the observed and simulated streamflow series.
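Eq. (4) translates directly into a few lines of code. The sketch below is illustrative only; the function name and the sample values are assumptions introduced here.

```python
import numpy as np

def correlation_coefficient(observed, predicted):
    """Correlation coefficient R of Eq. (4)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    dev_obs = observed - observed.mean()
    dev_pred = predicted - predicted.mean()
    return np.sum(dev_obs * dev_pred) / np.sqrt(
        np.sum(dev_obs ** 2) * np.sum(dev_pred ** 2)
    )

# Hypothetical values: a prediction that is an exact linear function of the data.
print(correlation_coefficient([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
```

Because R measures only linear association, a prediction with a constant bias or scale error can still score R = 1, which is one reason R is used here alongside other criteria.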

Dynamic chaos and stochastic noise can be distinguished by comparing the predicted and actual trajectory (Sugihara and May 1990).

  1. (1)

    A fixed delay time is used to make single-step predictions for different embedding dimensions. For a chaotic time series, the forecast accuracy will be at a maximum initially, after which the accuracy decreases with increasing embedding dimension. For a stochastic time series, by contrast, the forecast accuracy does not change with the embedding dimension.

  2. (2)

    A fixed embedding dimension is used to make multi-step predictions for different delay times. For a chaotic time series, the forecast accuracy decreases with increasing prediction-time interval, whereas for stochastic noise, the forecast accuracy is independent of the prediction interval.

3.3 Chaotic Time Series Prediction

An m-dimensional vector X can be embedded into m-dimensional phase space using an m-dimensional map \( {f}_T \), which can be expressed as:

$$ Y={f}_T(X) $$
(5)

where Y is also an m-dimensional vector.

The input variables X and output variables Y can be described as:

$$ X=\left[\begin{array}{cccc}{x}_1& {x}_{1+\tau }& \cdots & {x}_{1+\left(m-1\right)\tau}\\ {x}_2& {x}_{2+\tau }& \cdots & {x}_{2+\left(m-1\right)\tau}\\ \vdots & \vdots & \ddots & \vdots \\ {x}_N& {x}_{N+\tau }& \cdots & {x}_{N+\left(m-1\right)\tau}\end{array}\right],\kern1em Y=\left[\begin{array}{c}{x}_{2+\left(m-1\right)\tau}\\ {x}_{3+\left(m-1\right)\tau}\\ \vdots \\ {x}_n\end{array}\right] $$
(6)

where N = n − 1 − (m − 1)τ is the number of sample points.
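As an illustration of Eq. (6), the input matrix X and the one-step-ahead target vector Y can be assembled from a series as sketched below. The function name `make_training_pairs` and the toy series are assumptions made for this example.

```python
import numpy as np

def make_training_pairs(x, m, tau):
    """Build X (N x m) and the one-step-ahead targets Y (length N) of Eq. (6)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    N = n - 1 - (m - 1) * tau            # number of sample points
    if N < 1:
        raise ValueError("series too short for this m and tau")
    # Row i of X is (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    X = np.column_stack([x[k * tau : k * tau + N] for k in range(m)])
    # The target for row i is the value following its last coordinate.
    Y = x[(m - 1) * tau + 1 :]
    return X, Y

series = np.arange(1, 11)                # toy series x_1..x_10 = 1..10 (illustrative)
X, Y = make_training_pairs(series, m=3, tau=2)
print(X[0], Y[0])                        # [1. 3. 5.] 6.0
print(X[-1], Y[-1])                      # [5. 7. 9.] 10.0
```

For n = 10, m = 3 and τ = 2 this gives N = 5 samples, and the last target is x_n, as in Eq. (6).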

The phase space reconstruction method is usually used to find a proper form of the map \( {f}_T \) in Eq. (5). The local-region forecasting method based on the embedding theory of Takens is a simple and effective method for finding \( {f}_T \) (Sivakumar 2000). Neural networks are also widely used to approximate \( {f}_T \) in many fields. The extreme learning machine (ELM) is a kind of feedforward neural network in which the input weights and hidden biases are randomly generated and do not need to be adjusted. Compared with traditional neural networks, it has excellent generalization performance and fast learning ability. In this study, the ELM method based on phase space reconstruction was used to predict the monthly streamflow.

3.3.1 Extreme Learning Machine

Let \( \left\{\left({x}_t,{y}_t\right)\mid {x}_t\in {R}^n,{y}_t\in {R}^m,t=1,2,\cdots, N\right\} \) be a set of N training samples, where \( {x}_t={\left[{x}_{t1},{x}_{t2},\cdots, {x}_{tn}\right]}^T \) is the input sample and \( {y}_t={\left[{y}_{t1},{y}_{t2},\cdots, {y}_{tm}\right]}^T \) is the output sample. The ELM model with L hidden nodes can be expressed as:

$$ {f}_L=\sum \limits_{i=1}^L{\beta}_ig\left({w}_i{x}_t+{b}_i\right)={o}_t,\kern0.5em t=1,2,\cdots, N $$
(7)

where \( {w}_i=\left[{w}_{1i},{w}_{2i},\cdots, {w}_{ni}\right] \) and \( {b}_i \) are the input weights and hidden bias of the i-th hidden node, respectively; \( {\beta}_i={\left[{\beta}_{i1},{\beta}_{i2},\cdots, {\beta}_{im}\right]}^T \) is the output weight between the hidden layer and the output layer; g is an activation function; and \( {o}_t={\left[{o}_{t1},{o}_{t2},\cdots, {o}_{tm}\right]}^T \) is the output value.

The training objective of the extreme learning machine network is to find the optimal \( {w}_i \), \( {\beta}_i \) and \( {b}_i \) such that \( \sum \limits_{t=1}^N\left\Vert {o}_t-{y}_t\right\Vert =0 \). Then

$$ {f}_L=\sum \limits_{i=1}^L{\beta}_ig\left({w}_i{x}_t+{b}_i\right)={y}_t,\kern0.5em t=1,2,\cdots, N $$
(8)

The above formula can be written compactly as \( H\beta =Y \), where

$$ H=\left[\begin{array}{c}h\left({x}_1\right)\\ \vdots \\ h\left({x}_N\right)\end{array}\right]={\left[\begin{array}{ccc}g\left({w}_1{x}_1+{b}_1\right)& \cdots & g\left({w}_L{x}_1+{b}_L\right)\\ \vdots & \ddots & \vdots \\ g\left({w}_1{x}_N+{b}_1\right)& \cdots & g\left({w}_L{x}_N+{b}_L\right)\end{array}\right]}_{N\times L},\kern0.5em \beta ={\left[\begin{array}{c}{\beta}_1^T\\ \vdots \\ {\beta}_L^T\end{array}\right]}_{L\times m},\kern0.5em Y={\left[\begin{array}{c}{y}_1^T\\ \vdots \\ {y}_N^T\end{array}\right]}_{N\times m} $$
(9)

Given randomly generated hidden node parameters \( \left({\hat{w}}_i,{\hat{b}}_i\right) \), the output matrix H of the hidden layer can be computed. The smallest-norm least-squares solution is then \( \hat{\beta}={H}^{\dagger }Y \), where \( {H}^{\dagger } \) is the Moore–Penrose generalized inverse of H.
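The training procedure of Eqs. (7)–(9) reduces to one random draw and one pseudoinverse. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the function names, the uniform range [−1, 1] for the random parameters, the number of hidden nodes and the toy regression target are all hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_fit(X, Y, n_hidden, rng):
    """Draw random (w_i, b_i), build H, and solve beta with the pseudoinverse."""
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights (assumed range)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden biases (assumed range)
    H = sigmoid(X @ W + b)                                   # N x L hidden-layer output
    beta = np.linalg.pinv(H) @ Y                             # smallest-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

# Toy regression problem (illustrative): learn y = sin(2x) on [-1, 1].
rng = np.random.default_rng(0)
X = np.linspace(-1.0, 1.0, 200)[:, None]
Y = np.sin(2.0 * X)
W, b, beta = elm_fit(X, Y, n_hidden=20, rng=rng)
rmse = np.sqrt(np.mean((elm_predict(X, W, b, beta) - Y) ** 2))
print(rmse < 0.05)
```

No iterative weight updates are involved, which is why ELM training is fast; the price is that the fit depends on the particular random draw of W and b.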

3.3.2 Parameter Calibration

Because the parameters \( \left({\hat{w}}_i,{\hat{b}}_i\right) \) of the ELM model are generated randomly, an effective method is needed to optimize them. The genetic algorithm (GA) (Wang 1997) and particle swarm optimization (PSO) (Jiang et al. 2010) are the most common methods used to improve the prediction performance. Both GA and PSO are parallel intelligent optimization algorithms, but PSO converges faster than traditional GA. However, as with GA, the possibility of premature convergence reduces its usefulness for global searches (Wang et al. 2012). To address this drawback, Jiang et al. (2015) improved the traditional PSO by introducing the idea of population hybrid evolution; the resulting algorithm, named IPSO, is designed to avoid premature convergence. In this paper, IPSO is used to enhance the learning performance of the extreme learning machine model, and a hybrid model is proposed that integrates chaos theory and an extreme learning machine with optimal parameters selected by improved particle swarm optimization (ELM-IPSO) for monthly streamflow analysis and prediction.

The procedure for monthly streamflow forecasting using ELM-IPSO based on phase space reconstruction is as follows (Fig. 3).

  1. Step 1:

    The input-output series \( \left({x}_1,{y}_1\right),\cdots, \left({x}_N,{y}_N\right) \) for phase space reconstruction are determined using Eq. (6).

  2. Step 2:

    The extreme learning machine model is constructed with g chosen to be the sigmoid function, \( g(x)=1/\left(1+{e}^{-x}\right) \).

  3. Step 3:

    The IPSO method is used to optimize the input weights and hidden biases of the ELM model.

  4. Step 4:

    The total error is calculated: \( E=\frac{1}{2}\sum \limits_{j=1}^m\sum \limits_{t=1}^N{\left({y}_{j,t}-{\hat{y}}_{j,t}\right)}^2 \).

  5. Step 5:

    If E is less than ε or the maximum number of generations is reached, the network training is complete.

  6. Step 6:

    The trained ELM model is used to predict streamflow.

Fig. 3 Flow chart of the model construction procedure
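The six steps above can be sketched in code. Because the details of IPSO (Jiang et al. 2015) are not given in this paper, the sketch below substitutes a plain global-best PSO for it; it should be read as an illustration of the overall loop, not as the ELM-IPSO algorithm itself. All names, the inertia and acceleration constants (0.7, 1.5, 1.5), the swarm size and the toy series are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_error(params, X, y, n_hidden):
    """Decode (W, b) from a particle, solve beta by pseudoinverse, return E (Step 4)."""
    n_in = X.shape[1]
    W = params[: n_in * n_hidden].reshape(n_in, n_hidden)
    b = params[n_in * n_hidden :]
    H = sigmoid(X @ W + b)
    beta = np.linalg.pinv(H) @ y
    return 0.5 * np.sum((y - H @ beta) ** 2)

def pso_train_elm(X, y, n_hidden=5, n_particles=15, n_iter=30, eps=1e-6, seed=0):
    """Plain global-best PSO over the ELM input weights and biases (stand-in for IPSO)."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + n_hidden
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    cost = np.array([elm_error(p, X, y, n_hidden) for p in pos])
    pbest, pbest_cost = pos.copy(), cost.copy()
    g = np.argmin(cost)
    gbest, gbest_cost = pos[g].copy(), cost[g]
    history = [gbest_cost]
    for _ in range(n_iter):                       # Step 5: stop on eps or max generations
        if gbest_cost < eps:
            break
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, -1.0, 1.0)
        cost = np.array([elm_error(p, X, y, n_hidden) for p in pos])
        improved = cost < pbest_cost
        pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
        g = np.argmin(pbest_cost)
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), pbest_cost[g]
        history.append(gbest_cost)
    return gbest, gbest_cost, history

# Step 1 on a toy noisy sine series (illustrative), with m = 3 and tau = 1.
rng = np.random.default_rng(1)
x = np.sin(0.3 * np.arange(120)) + 0.05 * rng.standard_normal(120)
X = np.column_stack([x[0:117], x[1:118], x[2:119]])
y = x[3:120]
best, best_cost, history = pso_train_elm(X, y)
print(best_cost <= history[0])   # True: the best error never increases
```

The recorded `history` plays the role of a fitness-versus-iteration curve; by construction it is non-increasing.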

3.4 Assessment Criteria

The prediction accuracy is evaluated by mean absolute error (MAE), root mean square error (RMSE), water balance relative error (RE), and Nash-Sutcliffe efficiency coefficient (NSE). They are defined as:

$$ MAE=\frac{1}{n}\sum \limits_{i=1}^n\left|{Q}_{obs,i}-{Q}_{sim,i}\right|, $$
(10)
$$ RMSE=\sqrt{\frac{1}{n}\sum \limits_{i=1}^n{\left({Q}_{obs,i}-{Q}_{sim,i}\right)}^2}, $$
(11)
$$ RE=\frac{\left|{\overline{Q}}_{obs}-{\overline{Q}}_{sim}\right|}{{\overline{Q}}_{obs}}\times 100\%, $$
(12)
$$ NSE=1-\frac{\sum \limits_{i=1}^n{\left({Q}_{obs,i}-{Q}_{sim,i}\right)}^2}{\sum \limits_{i=1}^n{\left({Q}_{obs,i}-{\overline{Q}}_{obs}\right)}^2}, $$
(13)

where \( {Q}_{obs,i} \) and \( {Q}_{sim,i} \) are the observed and predicted streamflow, respectively, \( {\overline{Q}}_{obs} \) and \( {\overline{Q}}_{sim} \) are their mean values, and n is the length of the streamflow series.

The MAE measures how close the predictions are to the observations. The RMSE quantifies the difference between the predicted and observed values and penalizes large errors more heavily. The RE is the systematic relative error. The closer the values of MAE, RMSE and RE are to zero, the better the simulation. The NSE also measures the agreement between the observed and simulated sequences; the closer the NSE is to unity, the better the prediction.
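Eqs. (10)–(13) can be computed together as in the following sketch. The function name and the small sample arrays are assumptions used only to make the arithmetic checkable by hand.

```python
import numpy as np

def assess(q_obs, q_sim):
    """Return MAE, RMSE, RE (in percent) and NSE as defined in Eqs. (10)-(13)."""
    q_obs = np.asarray(q_obs, dtype=float)
    q_sim = np.asarray(q_sim, dtype=float)
    err = q_obs - q_sim
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    re = abs(q_obs.mean() - q_sim.mean()) / q_obs.mean() * 100.0
    nse = 1.0 - np.sum(err ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)
    return mae, rmse, re, nse

# Hypothetical values: only the last prediction is wrong, by 2 units.
mae, rmse, re, nse = assess([1, 2, 3, 4], [1, 2, 3, 6])
print(mae, rmse, re, nse)   # MAE = 0.5, RMSE = 1.0, RE = 20 %, NSE = 0.2 (up to rounding)
```

In this example the single error of 2 gives MAE = 2/4 = 0.5 and RMSE = √(4/4) = 1; the mean shifts from 2.5 to 3, so RE = 20%; and NSE = 1 − 4/5 = 0.2, since the observations have a sum of squared deviations of 5.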

4 Results and Discussion

The monthly streamflow data from 1956 to 2000 were analyzed for the existence of chaos and to determine the initial embedding dimension and delay time to reconstruct the phase space. The data from 2001 to 2010 were used for prediction.

4.1 Identification of Chaotic Characteristics

4.1.1 Determination of Delay Time

The delay time τ for phase space reconstruction was calculated using the mutual information method. Figure 4 shows the mutual information for various lag times. Because the mutual information function reaches its first minimum at a lag of 6, this value was selected as the delay time. However, the mutual information function is neither a necessary nor a sufficient tool for deciding whether a process is stochastic or chaotic. Therefore, other methods are needed to determine whether the streamflow sequence is chaotic.
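The first-minimum criterion can be illustrated with a simple histogram estimate of the mutual information. This is a sketch under stated assumptions and not necessarily the estimator used in the study: the function names, the number of bins and the synthetic noisy sine series are introduced here for illustration.

```python
import numpy as np

def mutual_information(x, lag, bins=16):
    """Histogram estimate (in nats) of the mutual information between x_t and x_{t+lag}."""
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x_t
    py = pxy.sum(axis=0, keepdims=True)      # marginal of x_{t+lag}
    nz = pxy > 0                             # empty cells contribute nothing
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def first_minimum_lag(x, max_lag, bins=16):
    """Return the first lag at which the mutual information has a local minimum."""
    mi = [mutual_information(x, lag, bins) for lag in range(1, max_lag + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] < mi[k + 1]:
            return k + 1, mi                 # mi[k] belongs to lag k + 1
    return None, mi                          # no interior minimum found

# Illustrative series: a noisy sine with a period of 24 samples.
rng = np.random.default_rng(1)
t = np.arange(2000)
x = np.sin(2.0 * np.pi * t / 24.0) + 0.1 * rng.standard_normal(2000)
lag, mi = first_minimum_lag(x, max_lag=12)
print(lag)
```

Histogram estimates are sensitive to the number of bins and to the series length, so the located minimum should be checked for robustness before it is adopted as τ.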

Fig. 4 Mutual information with delay time for monthly streamflow data

4.1.2 Correlation Dimension Method

The correlation integral C(r) was calculated using the Grassberger–Procaccia algorithm. Figure 5a shows a plot of C(r) versus r on a logarithmic scale for embedding dimensions, m, from 1 to 20. The slope of the plot determines the correlation exponent. Figure 5b shows the relationship between the correlation exponent and the embedding dimension. The correlation exponent increases with the embedding dimension and tends to saturate for embedding dimensions m ≥ 9. The saturation value of the correlation exponent is about 4.135. This indicates that the streamflow series exhibits low-dimensional chaotic behavior, and that streamflow prediction may be feasible using chaotic prediction methods.

Fig. 5 (a) log C(r) vs log r for monthly streamflow data; (b) relationship between correlation exponent and embedding dimension

4.1.3 Lyapunov Exponent

The largest Lyapunov exponent was calculated using the small data set method. Figure 6 shows a plot of y(i) versus i, where i is the discrete time step and y(i) is the average logarithmic distance of all neighbors after i discrete time steps. In Fig. 6, the curve is approximately a straight line before i = 5, and the slope of the dotted line is equal to the theoretical value of the largest Lyapunov exponent. The positive largest Lyapunov exponent confirms that trajectories diverge exponentially and hence that the monthly streamflow is chaotic.

Fig. 6 y(i) vs i using the small data set method for monthly streamflow data

4.2 Phase Space Reconstruction Parameter Optimization

The phase space was reconstructed with delay times from 1 to 10 and embedding dimensions from 1 to 10. For each delay time, the ELM-IPSO method was used to obtain the corresponding optimal embedding dimension and prediction accuracy. Each combination of delay time and optimal embedding dimension (Table 1) results in a different prediction accuracy. The delay time and embedding dimension giving the highest prediction accuracy were selected as the adjusted reconstruction parameters. The best prediction accuracy, ρ = 0.871, is achieved for embedding dimension 5 and delay time 1 (Table 1). Therefore, τ = 1 and m = 5 were chosen as the adjusted optimal phase space reconstruction parameters for real prediction.

Table 1 Optimal embedding dimensions and maximum correlation coefficients for different delay times

4.3 Prediction Accuracy

4.3.1 Parameter Settings

To test the performance of ELM-IPSO for monthly streamflow prediction, it was compared with an auto-regression method (AR), a three-layer feedforward artificial neural network (ANN), an extreme learning machine with a genetic algorithm (ELM-GA) and an extreme learning machine with the PSO algorithm (ELM-PSO). For all benchmark models, the phase space was first reconstructed to design the input vectors, and the training and validation data sets are the same for all models. The only difference between the five methods is the forecasting technique: ELM-IPSO, AR, ANN, ELM-GA or ELM-PSO. This allows us to evaluate which model is the most accurate.

In the ANN, the tansig and logsig functions are chosen as the activation functions of the hidden layer and output layer, respectively, and the traingdx function is used to train the network. The number of hidden layer nodes is set to 13 by trial and error. The maximum number of iterations, acceptable error and learning rate are set to 5000, 0.01 and 0.1, respectively. In the ELM model, the sigmoid function is chosen as the transfer function and the number of hidden layer nodes is set to 20 by trial and error. The relevant experimental parameters for the GA, PSO and IPSO algorithms are shown in Table 2. The algorithms terminate when the maximum number of iterations is reached. To reduce the influence of randomness, each algorithm is run 10 times to obtain the optimal solution.

Table 2 The parameters of GA, PSO and IPSO

4.3.2 Results and Analysis

Performance measures for the various prediction methods are shown in Table 3. These results indicate the following: 1) Although RE reached 23.01% for AR in the forecasting period, RE for the other methods is within the allowable range (no more than 20%) in both the training and forecasting periods. 2) In both the training and forecasting periods, ELM-IPSO has the lowest MAE, RMSE and RE and the highest NSE and R. By these performance measures, the ELM-IPSO method is the most effective for streamflow prediction. 3) Comparing the AR, ANN and ELM-based methods, the forecasting results show that the AR method and the ANN model do not perform satisfactorily. Therefore, the extreme learning machine method may be more suitable for streamflow forecasting. 4) Relative to ELM-GA, ELM-IPSO decreased MAE and RMSE by 7.16% and 1.67%, respectively, in the training period, and by 3.57% and 2.01% in the forecasting period; NSE and R improved by 4.17% and 2.44%, respectively, in the training period, and by 2.63% and 1.13% in the forecasting period. Similarly, relative to ELM-PSO, ELM-IPSO decreased MAE and RMSE by 4.04% and 0.91%, respectively, in the training period, but increased them by 7.41% and 2.06% in the forecasting period; NSE and R improved by 1.41% and 1.16%, respectively, in the training period and were unchanged in the forecasting period. These results show that the monthly streamflow forecasting accuracy can be improved by using the IPSO algorithm to train the ELM. Figure 7 shows how the fitness values of GA, PSO and IPSO evolve over the iterations when solving the ELM model. Compared with ELM-GA and ELM-PSO, ELM-IPSO converges faster and finds the optimal solution more quickly.

Table 3 Forecasting performance of the different methods

Fig. 7 The relationship between the fitness value and the number of iterations

Figure 8 shows the streamflow simulation results for the training and real prediction periods. There is excellent agreement between the observed and forecast streamflow. Figure 9 shows scatter plots used to evaluate the capability of the model to simulate the dynamics of streamflow, in which a linear regression equation was used to analyze the correlation between the simulated and measured streamflow. The coefficients of determination (R²) are 0.7587 and 0.7926 for the training and prediction periods, respectively, both significant at the 0.01 level, which indicates that the predictions of the ELM-IPSO method correlate well with the observed data. These results further indicate the effectiveness of the ELM-IPSO method.

Fig. 8 Comparison of observed and predicted monthly streamflow during training and prediction periods

Fig. 9 Scatter plots of the training and prediction periods

5 Conclusions

The purpose of this study was to analyze the chaotic properties of streamflow series using various techniques and to propose a hybrid model integrating chaos theory and extreme learning machines to predict streamflow. Monthly streamflow data from Daiying hydrological station in the Chaohe River basin in northern China were used for the study. The behavior of the streamflow dynamics was investigated by calculating the correlation dimension using the Grassberger–Procaccia algorithm and the maximal Lyapunov exponent using the small data set method, together with nonlinear prediction. Then, based on phase space reconstruction, an extreme learning machine with parameters selected using an improved particle swarm optimization (ELM-IPSO) was developed to improve the streamflow prediction. Monthly streamflow data from 1956 to 2000 were used to determine the initial embedding dimension and delay time to reconstruct the phase space. The data from 2001 to 2010 were used for prediction. The accuracy of the streamflow prediction (linear correlation coefficient of about 0.89 and efficiency coefficient of about 0.78) indicates the validity of the proposed ELM-IPSO method for predicting streamflow. Compared with the AR, ANN, ELM-GA and ELM-PSO methods, ELM-IPSO has the lowest MAE, RMSE and RE values and the highest NSE and R values during both the training and prediction stages. These results demonstrate that ELM-IPSO is an effective technique for improving the forecasting accuracy of monthly streamflow.