1 Introduction

Flooding is one of the most frequently occurring natural hazards worldwide and often causes major damage to society (Li et al. 2014). Hydrologists and water resources engineers are therefore frequently asked to assess and forecast flood discharge so that management practices can be developed to avoid or reduce flood disasters. For this reason, flood forecasting has received considerable attention from researchers and remains an issue of particular interest in operational hydrology (Cloke and Pappenberger 2009). In the past decades, many studies of flood forecasting at short time scales (daily or hourly) have been carried out in different parts of the world (Xu et al. 2013). However, characterizing the flood threat for many problems concerning reservoirs and flood protection works involves not only short time scale (daily or hourly) forecasting but also forecasting of the annual maximum flood peak discharge, which poses an enormous threat to life, the environment, the economy and society. The development of a precise technique for extreme flood forecasting is therefore an important task for disaster warning systems and is expected to be useful for disaster prevention and mitigation.

Extreme floods are influenced by many variables, for example extreme rainfall, climate change, forest cover, urban development and land development (Tripathi et al. 2014). The nonlinear and complex interactions between these factors and the extreme flood peak make it difficult to construct a physically based mathematical model. Furthermore, successful forecasting depends on achieving a reliable model fit, which requires a sufficiently long and high-quality data record. Unfortunately, records of some influencing factors are unavailable in the vast majority of catchments, or the catchment has undergone significant land-use or climate changes and has no good historical record (Li et al. 2014). Possibly for this reason, only a few studies have considered this problem in the context of extreme flood forecasting (Wu et al. 2014). Historical observations of the annual extreme flood peak discharge are therefore an important tool for obtaining a clearer understanding of what the future will hold (Blöschl and Montanari 2010). In this data-driven approach, a thorough understanding of the physical laws is not required and the data requirements are less extensive than for a process model (Danandeh Mehr et al. 2013). The traditional technique is multiple linear regression (MLR), which seeks a relationship between inputs and output. A great deal of work has applied MLR to hydrological modelling. MLR models are used as a benchmark for comparison with other techniques in reservoir flood forecasting (Chau et al. 2005) and in modelling the annual rainfall-runoff relationship (Wang et al. 2013). Magar and Jothiprakash (2011) used the MLR approach to construct an intermittent reservoir daily inflow forecasting system. Latt and Wittenberg (2014) compared stepwise MLR and artificial neural networks (ANNs) for improving flood forecasting in a developing country and concluded that linear regression models are quite applicable to forecasting; however, they require a prior assumption about the type and consistency of the relation between dependent and independent variables.

In recent years, ANNs and the adaptive neuro-fuzzy inference system (ANFIS), artificial intelligence techniques that neither presuppose a detailed understanding of a river’s physical characteristics nor require extensive data pre-processing (Dawson et al. 2002), have been increasingly used in hydrologic forecasting, such as rainfall-runoff modeling (Dawson and Wilby 1998), stream flow forecasting (Chen and Chau 2016; Wang et al. 2009, 2014, 2015) and flood forecasting (Chau et al. 2005). However, if peak streamflow data are insufficient, neural-network-based models are usually unable to yield satisfactory estimates of extreme values. A reasonable explanation is that too few high-value instances are available for proper training (Wu et al. 2014). Therefore, the development of a better forecasting technique has always been recognized as a key task in operational hydrology.

Projection pursuit regression (PPR) is a statistical technique and a powerful tool for seeking interesting projections from high-dimensional spaces into lower-dimensional ones by means of linear projections (Friedman and Stuetzl 1981). The aim of this paper is to develop a novel model that provides precise forecasts of the annual maximum flood peak discharge. To this end, projection pursuit regression using Hermite polynomials (Hermite-PPR) is proposed, for the first time, for constructing forecasting models. The method uses the statistical properties of the data series to identify an appropriate input vector for the model, and the parameters of Hermite-PPR are optimized with the social spider optimization (SSO) algorithm (Cuevas et al. 2013) and the least squares (LS) method. The development and performance of the MLR, ANN, ANFIS and Hermite-PPR models are demonstrated with the annual maximum flood peak discharge at Yichang hydrological station, followed by a discussion of the results and concluding remarks.

The rest of the paper is organized as follows. Section 2 introduces the study area and data. Section 3 gives a brief introduction to the basic theory of the MLR, ANN, ANFIS and SSO algorithms, and also presents the modelling technique of the proposed approach. Section 4 describes the five statistical indices used for performance evaluation. Section 5 presents the application results, including comparison and analysis. Section 6 states the conclusions.

2 The Study Area and Data Information

The study area is the Yangtze basin, located in the subtropical monsoon region. The Yangtze River is about 6300 km long and has a drainage basin area of 1.80 million km² (see Fig. 1); it is the longest river in China and the third in the world in terms of length and discharge (Yu et al. 2009). The mean annual precipitation in the basin ranges from 270–500 mm in the western region to 1600–1900 mm in the southeastern region (Gemmer et al. 2008). Under the monsoonal climate, floods occur annually in the summer, especially during June and July, when slowly drifting cold fronts meet the moist and stable subtropical air mass and generate excess rainfall in the Yangtze catchment (Zhang et al. 2006). Historically, the Yangtze River catchment has been known for frequent huge floods that have, to a large degree, held back the social advancement of the basin (Yu et al. 2009).

Fig. 1 The study area and location of the hydrological station

In this paper, the annual maximum flood peak discharges are taken from Yichang hydrological station (controlling a drainage area of 1,005,501 km²) (Zhang et al. 2006). The annual maximum flood peak discharge data from 1882 to 2004 are studied; the data from 1882 to 1994 are used for training the models and those from 1995 to 2004 for validating model performance (Fig. 2). The interannual variability of the annual maximum flood peak discharge at Yichang station is large. During the study period, the minimum and maximum flood peak discharges are 29,800 m³/s and 71,100 m³/s, respectively, and the average is 51,380 m³/s.

Fig. 2 The annual maximum flood peak discharge at Yichang station

3 Description of Methodology

3.1 Multiple Linear Regression (MLR) Model

MLR is a generalized linear modeling technique that is widely used to build prediction equations (Latt and Wittenberg 2014). In this study, the regression coefficients were determined using the least-squares method.

3.2 Artificial Neural Networks (ANNs)

A three-layer feed-forward ANN model trained with the scaled conjugate gradient algorithm (Wang et al. 2009) is used in this study. All data series were normalized using their minimum and maximum values so that the variables range from 0 to 1. The tan-sigmoid transfer function is adopted in the hidden layer and the linear transfer function in the output layer. The number of training epochs is set to 1000.

3.3 Adaptive Neuro-Fuzzy Inference System (ANFIS)

Details of the ANFIS adopted for another benchmark comparison can be found in Jang (1993).

3.4 Projection Pursuit Regression Using Hermite Polynomial (Hermite-PPR)

The projection pursuit is a technique for the exploratory analysis of multivariate data sets. In the regression problem, one is given a p-dimensional random vector x, the components of which are called predictor variables, and a predicted variable y, which is called the response, and the basic model is given as follows (Friedman and Stuetzl 1981):

$$ y={a}_0+{\displaystyle \sum_{m=1}^M{c}_mg\left({a}^Tx\right)} $$
(1)

where x is the vector of p explanatory variables, observed for n samples (the rows of the data matrix), and y is the response variable to be predicted. \( a_0 \) is the ridge coefficient, that is, the usual intercept term of a regression function. a denotes the unknown projection direction vector, g is an appropriate set of ridge functions and \( c_m \) is the corresponding coefficient. M is an unknown integer that denotes the number of ridge functions g. If M = 1, \( c_m = 1 \) and g is the identity function, then Eq. (1) reduces to the conventional multiple regression model.

The orthonormal Hermite polynomial functions are chosen for their parametric orthonormal property and the ease with which their values can be computed recursively (Jeng-Neng et al. 1994). The Hermite projection pursuit regression can therefore be written as follows:

$$ y={\displaystyle \sum_{m=1}^M{\displaystyle \sum_{j=1}^r{c}_{mj}{h}_{ij}\left({z}_i\right)}},\kern0.3em i=1,2,\dots, n $$
(2)

where n is the number of samples and \( z_i \) is the projection of the ith input sample onto the projection direction a, obtained from Eq. (3).

$$ {z}_i={\displaystyle \sum_{k=1}^p{a}_k{x}_{ik},i=1,2,\dots, n};k=1,2,\dots, p $$
(3)

Here, r is the rank (order) of the Hermite polynomial; a denotes the projection direction vector, subject to \( a^T a = 1 \); c denotes the unknown coefficients of the Hermite polynomials; and h denotes the orthonormal Hermite functions, defined as follows:

$$ {h}_r(z)={\left(r!\right)}^{-\frac{1}{2}}{\pi}^{\frac{1}{4}}{2}^{-\frac{r-1}{2}}{H}_r(z)\phi (z),-\infty <z<\infty $$
(4)

where r! denotes r factorial, \( \phi (z)=\frac{1}{\sqrt{2\pi }}{e}^{-\frac{z^2}{2}} \), and \( H_r(z) \) is the Hermite polynomial, which can be constructed recursively:

$$ \left\{\begin{array}{l}{H}_0(z)=1\\ {}{H}_1(z)=2z\\ {}\cdots \\ {}{H}_r(z)=2\left(z{H}_{r-1}(z)-\left(r-1\right){H}_{r-2}(z)\right)\end{array}\right. $$
(5)
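As an illustration of Eqs. (4) and (5), the recursion can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `hermite_polynomials` and `orthonormal_hermite` are hypothetical, and returning the orders 0 to r is an assumption made here for convenience (Eq. (2) sums over j = 1, …, r).

```python
import math


def hermite_polynomials(z, r_max):
    """Return [H_0(z), ..., H_r_max(z)] using the recursion of Eq. (5)."""
    values = [1.0, 2.0 * z]
    for r in range(2, r_max + 1):
        values.append(2.0 * (z * values[r - 1] - (r - 1) * values[r - 2]))
    return values[: r_max + 1]


def orthonormal_hermite(z, r_max):
    """Return [h_0(z), ..., h_r_max(z)] following Eq. (4)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    H = hermite_polynomials(z, r_max)
    return [
        math.factorial(r) ** -0.5
        * math.pi ** 0.25
        * 2.0 ** (-(r - 1) / 2.0)
        * H[r]
        * phi
        for r in range(r_max + 1)
    ]
```

With this normalization, the functions integrate to one when squared and to zero when two different orders are multiplied, which is the orthonormal property referred to above.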

Hence, the values of the parameters a and c can be determined by solving the following optimization problem:

$$ \min f\left(a,c\right)=\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({Y}_i-{y}_i\right)}^2} $$
(6)

where \( Y_i \) is the observed value and \( y_i \) is the value computed from Eq. (2), subject to \( a^T a = 1 \).

3.5 Social Spider Optimization (SSO) Algorithm

The SSO algorithm developed by Cuevas et al. (2013) is a novel swarm intelligence technique based on simulating the cooperative behavior of social spiders. In the SSO algorithm, each individual of the population is modeled as one of two genders (male or female). Depending on gender, each individual is governed by a different set of evolutionary operators that mimic the cooperative behaviors typical of a colony (Cuevas et al. 2013). This mechanism not only emulates the cooperative behavior of the colony more realistically, but also incorporates computational mechanisms that avoid critical flaws such as premature convergence and an incorrect exploration-exploitation balance. For more details regarding the SSO algorithm, see Cuevas et al. (2013). A brief description of the algorithm is given as follows:

3.5.1 Defining the Number of Females and Males

The SSO algorithm defines the number of female and male spiders that act as individuals in the search space. With N the total number of colony members, the number of females \( N_f \) is randomly selected within the range of 65%–90% of N. \( N_f \) and the number of males \( N_m \) are therefore calculated as

$$ {N}_f= floor\left[\left(0.9- rand\cdot 0.25\right)\cdot N\right] $$
(7)
$$ {N}_m=N-{N}_f $$
(8)

where rand is a random number in [0,1] and floor maps a real number to the largest integer not exceeding it.
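Eqs. (7) and (8) can be sketched as follows. The function name `split_population` and the optional `rng` argument are hypothetical and introduced only for this illustration.

```python
import math
import random


def split_population(n_total, rng=random):
    """Split a colony of n_total spiders into females and males, Eqs. (7)-(8)."""
    # The female share lies between 65% and 90% of the colony.
    n_female = math.floor((0.9 - rng.random() * 0.25) * n_total)
    n_male = n_total - n_female
    return n_female, n_male
```

For a colony of N = 50, the value used later in this paper, the sketch yields between 32 and 45 females.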

Therefore, the complete population S of N elements is composed of two sub-groups F and M. The group F gathers the female individuals (\( F=\left\{{f}_1,{f}_2,\dots, {f}_{N_f}\right\} \)) whereas the group M assembles the male members (\( M=\left\{{m}_1,{m}_2,\dots, {m}_{N_m}\right\} \)), where S = F ∪ M and \( S=\left\{{s}_1,{s}_2,\dots, {s}_N\right\} \), so that

$$ S=\left\{{s}_1={f}_1,{s}_2={f}_2,\dots, {s}_{N_f}={f}_{N_f},{s}_{N_f+1}={m}_1,{s}_{N_f+2}={m}_2,\dots, {s}_N={m}_{N_m}\right\} $$
(9)

3.5.2 Fitness Assignation

In the SSO algorithm, every spider receives a weight \( w_i \) that represents the quality of the solution corresponding to spider i (irrespective of gender) in the population S. The weight \( w_i \) is calculated as

$$ {w}_i=\frac{J\left({s}_i\right)- wors{t}_s}{bes{t}_s- wors{t}_s} $$
(10)

where \( J(s_i) \) is the fitness value obtained by evaluating the spider position \( s_i \) with the objective function. The values \( best_s \) and \( worst_s \) are calculated as (considering a maximization problem)

$$ bes{t}_s=\underset{k\in \left\{1,2,\dots, N\right\}}{ \max}\left(J\left({s}_k\right)\right)\kern0.4em and\kern0.5em wors{t}_s=\underset{k\in \left\{1,2,\dots, N\right\}}{ \min}\left(J\left({s}_k\right)\right) $$
(11)
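A short sketch of the weight assignment in Eqs. (10) and (11) is given below. The name `spider_weights` is hypothetical, and the handling of the degenerate case in which all fitness values are equal is an assumption added here to avoid division by zero; it is not taken from the source.

```python
def spider_weights(fitness):
    """Scale fitness values to weights in [0, 1], Eqs. (10)-(11) (maximization)."""
    best, worst = max(fitness), min(fitness)
    if best == worst:
        # Assumption: identical fitness values give every spider the same weight.
        return [1.0] * len(fitness)
    return [(j - worst) / (best - worst) for j in fitness]
```

Because Eq. (11) is stated for a maximization problem, an error measure that is to be minimized would need to be converted first, for example by passing its negative; that conversion is likewise an assumption of this sketch.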

3.5.3 Modeling of the Vibrations Through the Communal Web

The communal web is used to transmit information among the colony members. In the SSO algorithm, this information is encoded as small vibrations perceived by individual i as a result of the information transmitted by member j. This process can be modeled by the following equation:

$$ Vi{b}_{i,j}={w}_j\cdot {e}^{-{d}_{i,j}^2} $$
(12)

where \( {d}_{i,j}=\left\Vert {s}_i-{s}_j\right\Vert \) is the Euclidean distance between spiders i and j.

Furthermore, there are three special relationships considered within the SSO algorithm:

  1. If the individual c (\( s_c \)) is the nearest member to individual i (\( s_i \)) and possesses a higher weight than i (\( w_c > w_i \)), then the vibrations \( Vibc_i \) are defined as

    $$ Vib{c}_i={w}_c\cdot {e}^{-{d}_{i,c}^2} $$
    (13)
  2. If the individual b (\( s_b \)) holds the best weight (best fitness value) of the complete population S, that is to say \( {w}_b=\underset{k\in \left\{1,2,\dots, N\right\}}{ \max}\left({w}_k\right) \), then the vibrations \( Vibb_i \) are defined as

    $$ Vib{b}_i={w}_b\cdot {e}^{-{d}_{i,b}^2} $$
    (14)
  3. If the individual f (\( s_f \)) is the nearest female individual to i, then the vibrations \( Vibf_i \) are defined as

    $$ Vib{f}_i={w}_f\cdot {e}^{-{d}_{i,f}^2} $$
    (15)
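The vibration model of Eqs. (12)–(15) can be sketched as follows. All function names are hypothetical; positions are assumed to be plain Python lists of coordinates.

```python
import math


def sq_dist(s_i, s_j):
    """Squared Euclidean distance between two spider positions."""
    return sum((a - b) ** 2 for a, b in zip(s_i, s_j))


def vibration(w_j, s_i, s_j):
    """Vibration perceived by spider i from spider j, Eq. (12)."""
    return w_j * math.exp(-sq_dist(s_i, s_j))


def nearest_better(i, positions, weights):
    """Index c of the nearest spider heavier than spider i (Eq. 13), or None."""
    candidates = [k for k in range(len(positions)) if weights[k] > weights[i]]
    if not candidates:
        return None
    return min(candidates, key=lambda k: sq_dist(positions[i], positions[k]))
```

The same `vibration` call covers Eqs. (13)–(15): only the choice of the emitting spider (nearest heavier member, global best, or nearest female) differs.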

3.5.4 Initializing the Population

In the SSO algorithm, the entire population (female and male) is generated randomly by initializing the set S of N spider positions. Each spider position, \( f_i \) or \( m_k \), is an n-dimensional vector that represents the parameter values to be optimized. \( f_i \) and \( m_k \) are given as follows:

$$ \begin{array}{l}{f}_{i,j}^0={b}_j^{low}+ rand\left(0,1\right)\cdot \left({b}_j^{high}-{b}_j^{low}\right)\\ {}\kern2.4em i=1,2,\dots {N}_f;j=1,2,..,n\end{array} $$
(16)
$$ \begin{array}{l}{m}_{k,j}^0={b}_j^{low}+ rand\left(0,1\right)\cdot \left({b}_j^{high}-{b}_j^{low}\right)\\ {}\kern2.4em k=1,2,\dots {N}_m;j=1,2,..,n\end{array} $$
(17)

where i and k are individual indexes and j is the parameter index; the superscript 0 denotes the initial population; rand(0,1) is a random number between 0 and 1; and \( b_j^{low} \) and \( b_j^{high} \) are the lower and upper initial bounds of parameter j, respectively.
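A minimal sketch of the random initialization in Eqs. (16) and (17) follows. The function name `init_population` is hypothetical; the same routine would be called once for the females and once for the males.

```python
import random


def init_population(n_spiders, lower, upper, rng=random):
    """Draw n_spiders positions uniformly inside [lower_j, upper_j], Eqs. (16)-(17)."""
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_spiders)
    ]
```

For example, `init_population(5, [-1.0] * 9, [1.0] * 9)` draws five candidate vectors with nine components each, all within the bounds −1 and 1.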

3.5.5 Cooperative Operators

In the SSO algorithm, the cooperative operators comprise a female cooperative operator and a male cooperative operator. The female cooperative operator can be modeled as follows:

$$ {f}_i^{k+1}=\left\{\begin{array}{l}{f}_i^k+\alpha \cdot Vib{c}_i\cdot \left({s}_c-{f}_i^k\right)+\beta \cdot Vib{b}_i\cdot \left({s}_b-{f}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right)\kern0.1em ,\left({r}_m<PF\right)\\ {}{f}_i^k-\alpha \cdot Vib{c}_i\cdot \left({s}_c-{f}_i^k\right)-\beta \cdot Vib{b}_i\cdot \left({s}_b-{f}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right),\left({r}_m\ge PF\right)\end{array}\right. $$
(18)

where α, β, δ and rand are random numbers in [0,1] and k denotes the iteration number. The individuals \( s_c \) and \( s_b \) denote, respectively, the nearest member to i that holds a higher weight and the best individual of the entire population S. \( r_m \) is a uniform random number generated in [0,1], whereas PF is a threshold that is often set to 0.7 (Cuevas et al. 2013).
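One female update according to Eq. (18) might look like the sketch below. The function name `move_female` is hypothetical. Two details are assumptions of this sketch rather than statements of the source: α, β and δ are drawn once per update, and rand is drawn separately for each coordinate.

```python
import random


def move_female(f, s_c, s_b, vib_c, vib_b, pf=0.7, rng=random):
    """Return the new position of a female spider following Eq. (18)."""
    alpha, beta, delta = rng.random(), rng.random(), rng.random()
    # Attraction (+) if r_m < PF, repulsion (-) otherwise.
    sign = 1.0 if rng.random() < pf else -1.0
    new_position = []
    for fj, cj, bj in zip(f, s_c, s_b):
        pull = alpha * vib_c * (cj - fj) + beta * vib_b * (bj - fj)
        new_position.append(fj + sign * pull + delta * (rng.random() - 0.5))
    return new_position
```

When both vibrations are zero, only the random term remains, so each coordinate moves by at most 0.5.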

The male cooperative operator can be modeled as follows:

$$ {m}_i^{k+1}=\left\{\begin{array}{l}{m}_i^k+\alpha \cdot \left(\frac{{\displaystyle {\sum}_{h=1}^{N_m}{m}_h^k\cdot {w}_{N_f+h}}}{{\displaystyle {\sum}_{h=1}^{N_m}{w}_{N_f+h}}}-{m}_i^k\right),\kern0.4em \mathrm{if}\kern0.3em {w}_{N_f+i}\le {w}_{N_f+m}\\ {}{m}_i^k+\alpha \cdot Vib{f}_i\cdot \left({s}_f-{m}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right),\kern0.3em \mathrm{if}\kern0.3em {w}_{N_f+i}>{w}_{N_f+m}\end{array}\right. $$
(19)

where \( s_f \) denotes the nearest female individual to the male individual i, \( w_{N_f+m} \) is the weight of the median male member (Cuevas et al. 2013), and \( \left({\displaystyle {\sum}_{h=1}^{N_m}{m}_h^k\cdot {w}_{N_f+h}}/{\displaystyle {\sum}_{h=1}^{N_m}{w}_{N_f+h}}\right) \) corresponds to the weighted mean of the male population M.

3.5.6 Mating Operator

In the SSO algorithm, a new brood \( s_{new} \) is generated by the mating operator when a dominant male spider locates a set of female members within a specific range r. The radius r depends on the size of the search space and is computed by the following equation:

$$ r=\frac{{\displaystyle \sum_{j=1}^n\left({b}_j^{high}-{b}_j^{low}\right)}}{2\cdot n} $$
(20)

In the mating process, the influence probability \( Ps_i \) of each individual is assigned by the roulette method, which is given as follows:

$$ P{s}_i=\frac{w_i}{{\displaystyle \sum_{j=1}^k{w}_j}} $$
(21)

When the new spider is formed, it will be compared with the worst spider of the colony. If the new spider is better than the worst spider, it will replace the worst spider. Otherwise, it will be discarded.
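The three ingredients of the mating step, Eqs. (20) and (21) and the replacement rule, can be sketched as below. The function names are hypothetical, the weights are assumed to have a positive sum, and fitness is treated as a quantity to be maximized, in line with Eq. (11). How the coordinates of the new spider are assembled from its parents is not reproduced here; see Cuevas et al. (2013).

```python
def mating_radius(lower, upper):
    """Mating radius r from the parameter bounds, Eq. (20)."""
    n = len(lower)
    return sum(hi - lo for lo, hi in zip(lower, upper)) / (2.0 * n)


def influence_probabilities(weights):
    """Roulette probabilities of the spiders involved in mating, Eq. (21)."""
    total = sum(weights)  # assumed to be positive
    return [w / total for w in weights]


def replace_worst(population, fitness, new_spider, new_fitness):
    """Replace the worst spider if the new one is better; return True if replaced."""
    worst = min(range(len(fitness)), key=fitness.__getitem__)
    if new_fitness > fitness[worst]:
        population[worst] = new_spider
        fitness[worst] = new_fitness
        return True
    return False
```

With nine parameters bounded by −1 and 1, `mating_radius` returns 1.0.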

3.6 The Modelling Approach of Proposed Technique

When Hermite-PPR is used to forecast the annual maximum flood peak discharge from historical observed records, two key problems arise: how to choose the input variables and how to set the best parameters of Hermite-PPR. Both are very important for obtaining satisfactory prediction accuracy.

3.6.1 Selection of Predictor Variables and Data Processing

In data-driven modeling, statistical procedures such as the autocorrelation function (ACF) and the partial autocorrelation function (PACF) have been suggested for identifying an appropriate input vector for a model (Wang et al. 2009). In this paper, the significant lags of the series that potentially influence the output are identified by analyzing the autocorrelation coefficient. The lag-k autocorrelation coefficient of the time series \( q_i \) is calculated as

$$ {R}_k=\frac{{\displaystyle \sum_{i=k+1}^n\left({q}_i-\overline{q}\right)\left({q}_{i-k}-\overline{q}\right)}}{{\displaystyle \sum_{i=1}^n{\left({q}_i-\overline{q}\right)}^2}}\begin{array}{cc}\hfill \hfill & \hfill \left(k=1,2,\cdots m\right)\hfill \end{array} $$
(22)
$$ \overline{q}=\frac{1}{n}{\displaystyle \sum_{i=1}^n{q}_i} $$
(23)

where n is the number of samples and m is the largest integer less than n/4. According to the sampling distribution of \( R_k \), at the 1 − α confidence level, if \( R_k \) satisfies

$$ {R}_k\notin \left[\frac{-1-{\mu}_{\alpha /2}{\left(n-k-1\right)}^{1/2}}{n-k},\frac{-1+{\mu}_{\alpha /2}{\left(n-k-1\right)}^{1/2}}{n-k}\right] $$
(24)

then the time series \( q_i \) shows significant dependence at lag k, and \( q_{i-k} \) can be selected as a predictor variable. \( \mu_{\alpha /2} \) can be found from the standard normal distribution table. In this study, the confidence level is 80%.
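The lag selection of Eqs. (22)–(24) can be sketched as follows. The function names are hypothetical, and the normal quantile is taken from Python's `statistics.NormalDist` instead of a printed table.

```python
import math
from statistics import NormalDist


def autocorrelation(q, k):
    """Lag-k autocorrelation coefficient R_k, Eqs. (22)-(23)."""
    n = len(q)
    mean = sum(q) / n
    num = sum((q[i] - mean) * (q[i - k] - mean) for i in range(k, n))
    den = sum((v - mean) ** 2 for v in q)
    return num / den


def significance_bounds(n, k, confidence=0.80):
    """Lower and upper bounds of the interval in Eq. (24)."""
    alpha = 1.0 - confidence
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = u * math.sqrt(n - k - 1)
    return (-1.0 - half_width) / (n - k), (-1.0 + half_width) / (n - k)


def significant_lags(q, confidence=0.80):
    """Lags k < n/4 whose R_k falls outside the interval of Eq. (24)."""
    n = len(q)
    max_lag = math.ceil(n / 4) - 1  # largest integer strictly below n/4
    lags = []
    for k in range(1, max_lag + 1):
        low, high = significance_bounds(n, k, confidence)
        r_k = autocorrelation(q, k)
        if r_k < low or r_k > high:
            lags.append(k)
    return lags
```

For a series of n = 123 values and k = 1, the bounds at the 80% level are roughly −0.124 and 0.107, so even fairly weak correlations are flagged as significant.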

To prevent attributes with greater numeric ranges from dominating those with smaller numeric ranges, and to avoid numerical difficulties during the calculation, the original data must be normalized. This is done with the following equation:

$$ {x}_i=\frac{\left({q}_i-\overline{q}\right)}{\sqrt{\frac{1}{n-1}{\displaystyle \sum_{i=1}^n{\left({q}_i-\overline{q}\right)}^2}}}\begin{array}{cc}\hfill \hfill & \hfill \left(i=1,2,\cdots n\right)\hfill \end{array} $$
(25)

where \( x_i \) is the normalized discharge sequence, \( q_i \) is the annual maximum flood peak discharge series, \( \overline{q} \) is the mean value of \( q_i \) and n is the number of samples.
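Eq. (25) is the usual standardization with the sample standard deviation (denominator n − 1). A minimal sketch, with the hypothetical name `standardize`:

```python
import math


def standardize(q):
    """Normalize a series to zero mean and unit sample standard deviation, Eq. (25)."""
    n = len(q)
    mean = sum(q) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in q) / (n - 1))
    return [(v - mean) / std for v in q]
```

For instance, `standardize([1.0, 2.0, 3.0])` gives `[-1.0, 0.0, 1.0]`.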

3.6.2 Optimizing the Hermite-PPR Parameters With SSO and LS Method

The predictive quality of Hermite-PPR depends on several parameters: the projection direction vector a, the Hermite polynomial coefficients c and the number of ridge functions M. Equation (6) is a nonlinear constrained optimization problem. In this paper, the social spider optimization (SSO) algorithm (Cuevas et al. 2013) is employed to determine the optimal projection direction vector a, and the Hermite polynomial coefficients c are estimated with the least squares (LS) method. The value of M is found through a forward stage-wise strategy that stops when the model fit can no longer be significantly improved. The Hermite-PPR algorithm with the SSO and LS methods can be summarized in the following steps:

3.6.3 Setting Parameters

Parameters such as the number of ridge functions M, the total number of colony spiders N and the maximum number of generations \( G_{max} \) of the SSO algorithm need to be set.

3.6.4 Generating Projection Direction Vector a

In this study, the projection direction vector a is optimized by the SSO algorithm. Each spider position is an n-dimensional vector that represents a candidate projection direction vector a. First, with N the total number of n-dimensional colony members, define the numbers of female spiders \( N_f \) and male spiders \( N_m \) according to Eqs. (7) and (8). Then, randomly initialize the female individuals (\( F=\left\{{f}_1,{f}_2,\dots, {f}_{N_f}\right\} \)) and male members (\( M=\left\{{m}_1,{m}_2,\dots, {m}_{N_m}\right\} \)) and calculate the mating radius using Eq. (20).

3.6.5 Calculating the Coefficient of Hermite Polynomial c

Given the generated projection direction vector, the projection value z is calculated by Eq. (3) and the Hermite functions \( h_r(z) \) are obtained from Eq. (4). The coefficients c are then obtained with the LS method by solving

$$ {\displaystyle \underset{c_j}{ \min }}{\left\Vert {Y}_i-{\displaystyle \sum_{j=1}^r{c}_j{h}_{ij}\left({z}_i\right)}\right\Vert}^2 $$
(26)
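For a fixed projection direction, Eq. (26) is an ordinary linear least-squares problem. The sketch below solves it through the normal equations with a small Gaussian elimination routine, using only the standard library. The function names are hypothetical, the use of orders 1 to r follows the summation limits of Eq. (2), and the normal-equation route is a choice made for this illustration, not necessarily the solver used by the authors.

```python
import math


def hermite_basis(z, r):
    """Orthonormal Hermite functions h_1(z), ..., h_r(z) from Eqs. (4)-(5)."""
    H = [1.0, 2.0 * z]
    for k in range(2, r + 1):
        H.append(2.0 * (z * H[k - 1] - (k - 1) * H[k - 2]))
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return [
        math.factorial(k) ** -0.5 * math.pi ** 0.25 * 2.0 ** (-(k - 1) / 2.0) * H[k] * phi
        for k in range(1, r + 1)
    ]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda p: abs(M[p][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def fit_ridge_coefficients(X, Y, a, r):
    """Return (c, mse): LS coefficients of Eq. (26) and the fit error of Eq. (6)."""
    z = [sum(ak * xk for ak, xk in zip(a, row)) for row in X]  # Eq. (3)
    B = [hermite_basis(zi, r) for zi in z]
    # Normal equations (B^T B) c = B^T Y.
    BtB = [[sum(Bi[p] * Bi[q] for Bi in B) for q in range(r)] for p in range(r)]
    BtY = [sum(Bi[p] * yi for Bi, yi in zip(B, Y)) for p in range(r)]
    c = solve(BtB, BtY)
    fitted = [sum(cj * hj for cj, hj in zip(c, Bi)) for Bi in B]
    mse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted)) / len(Y)
    return c, mse
```

The returned mean squared error is the quantity that the outer SSO search would evaluate for each candidate direction a.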

3.6.6 Executing Iterative Process of SSO Algorithm

First, from the generated projection direction vector a and the obtained coefficients c, the regression value is calculated by Eq. (2). The fitness value is obtained by evaluating the spider position \( s_i \) with the objective function as follows

$$ Fitval=\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({Y}_i-{y}_i\right)}^2} $$
(27)

Then, calculate the weight of every spider by Eq. (10); move the female and male spiders according to the female and male cooperative operators, Eqs. (18) and (19), respectively; and perform the mating operation to form new spiders, using Eq. (21). Finally, if the stopping criteria are met, the process ends and the optimal projection direction a and the optimal coefficients c are obtained, which completes the optimization of the first ridge function. Only a brief description of the iterative process of the SSO algorithm is given here; the detailed description can be found in Cuevas et al. (2013).

3.6.7 Termination of the Model Optimization and Output of Results

From the optimal projection direction a and the optimal coefficients c, the simulated residual \( \varepsilon_i \) and the termination value are calculated. If the termination value satisfies the termination condition, the results are output. Otherwise, \( y_i \) is replaced by \( \varepsilon_i \) and the next ridge function is optimized.

4 The Evaluation of Forecasting Performance

The performance of the MLR, ANN, ANFIS and Hermite-PPR models for annual maximum flood peak discharge forecasting was evaluated using five statistical indices, which are described as follows:

Firstly, the root mean squared error (RMSE) is selected as the criterion for level prediction. It records, in real units, the level of overall agreement between the observed and modelled datasets (Dawson et al. 2007) and is a good measure of model performance for high flows (Karunanithi et al. 1994). The RMSE can be calculated as

$$ RMSE=\sqrt{\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({y}_f(i)-{y}_o(i)\right)}^2}} $$
(28)

Secondly, the mean absolute relative error (MARE) is a relative criterion which is sensitive to the forecasting errors that occur in the low(er) magnitudes of each dataset, and is less sensitive to the larger errors that usually occur at higher magnitudes, because the errors are not squared. MARE is an unbiased statistic for computing the predictive capability of a model, and can be calculated as:

$$ MARE\left(\%\right)=\frac{1}{n}{\displaystyle \sum_{i=1}^n\left|\frac{y_f(i)-{y}_o(i)}{y_o(i)}\right|}\times 100 $$
(29)

Thirdly, the coefficient of correlation (CC) is selected as the degree of collinearity criterion of level prediction. CC is generally used to compare alternative models, though it is oversensitive to high extreme values (outliers) and insensitive to additive and proportional differences between model predictions and measured data (Legates and McCabe 1999). CC can be calculated as:

$$ CC=\frac{\frac{1}{n}{\displaystyle \sum_{i=1}^n\left({y}_o(i)-{\overline{y}}_o\right)\left({y}_f(i)-{\overline{y}}_f\right)}}{\sqrt{\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({y}_o(i)-{\overline{y}}_o\right)}^2}}\cdot \sqrt{\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({y}_f(i)-{\overline{y}}_f\right)}^2}}} $$
(30)

Fourthly, the Nash-Sutcliffe efficiency coefficient (NSEC) is a very popular index for assessing the predictive power of hydrological models (Nash and Sutcliffe 1970). It can be calculated as

$$ NSEC=1-\frac{{\displaystyle \sum_{i=1}^n{\left({y}_o(i)-{y}_f(i)\right)}^2}}{{\displaystyle \sum_{i=1}^n{\left({y}_o(i)-{\overline{y}}_o\right)}^2}} $$
(31)

where \( y_o(i) \) and \( y_f(i) \) are, respectively, the observed and forecasted runoff, \( {\overline{y}}_o \) and \( {\overline{y}}_f \) denote their mean values, and n is the number of samples considered.
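The four indices of Eqs. (28)–(31) translate directly into code. The sketch below uses hypothetical function names and takes the observed series first and the forecasted series second.

```python
import math


def rmse(obs, fc):
    """Root mean squared error, Eq. (28)."""
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(obs, fc)) / len(obs))


def mare(obs, fc):
    """Mean absolute relative error in percent, Eq. (29)."""
    return 100.0 * sum(abs((f - o) / o) for o, f in zip(obs, fc)) / len(obs)


def cc(obs, fc):
    """Coefficient of correlation, Eq. (30); the 1/n factors cancel."""
    mo, mf = sum(obs) / len(obs), sum(fc) / len(fc)
    cov = sum((o - mo) * (f - mf) for o, f in zip(obs, fc))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_f = sum((f - mf) ** 2 for f in fc)
    return cov / (math.sqrt(var_o) * math.sqrt(var_f))


def nsec(obs, fc):
    """Nash-Sutcliffe efficiency coefficient, Eq. (31)."""
    mo = sum(obs) / len(obs)
    sse = sum((o - f) ** 2 for o, f in zip(obs, fc))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)
```

A negative NSEC means that the sum of squared forecast errors exceeds the spread of the observations around their mean, that is, the observed mean would have been the better predictor.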

Finally, the qualified rate (QR) is defined by the Chinese national standard for hydrological information and hydrological forecasting (GB/T 22482–2008) (Quality of nation of P. R. china and Standardization administration of P. R.china 2008). According to GB/T 22482–2008, a forecast of peak flood discharge is qualified if its absolute relative error (ARE) is less than 20% of the observed value. ARE is calculated as:

$$ ARE\left(\%\right)=\left|\frac{y_f-{y}_o}{y_o}\right|\times 100 $$
(32)

QR is defined by

$$ QR=\frac{n_q}{n}\times 100 $$
(33)

where \( n_q \) is the number of qualified forecasts and n is the total number of forecasts.

If QR is more than 85%, the forecasting performance satisfies the first level standard. If QR is more than 70% and less than 85%, it satisfies the second level standard. If QR is more than 60% and less than 70%, it satisfies the third level standard. Otherwise, the model is not feasible for peak flood discharge forecasting. QR is therefore an overall evaluation of the level of prediction precision.
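The qualified rate of Eqs. (32) and (33) and the grading rule above can be sketched as follows. The function names are hypothetical. The text does not say how a QR of exactly 85%, 70% or 60% is graded; assigning such boundary values to the lower level is an assumption of this sketch.

```python
def qualified_rate(obs, fc, threshold=20.0):
    """QR in percent: share of forecasts with ARE below the threshold, Eqs. (32)-(33)."""
    ares = [abs((f - o) / o) * 100.0 for o, f in zip(obs, fc)]
    n_qualified = sum(1 for are in ares if are < threshold)
    return 100.0 * n_qualified / len(obs)


def forecast_level(qr):
    """Grade a QR value according to the three levels described in the text."""
    if qr > 85.0:
        return "first"
    if qr > 70.0:
        return "second"
    if qr > 60.0:
        return "third"
    return "not feasible"
```

For example, a QR of 94% is graded as first level and a QR of 78.3% as second level.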

5 Models Application and Discussion

Following the methodology introduced above, the proposed model is developed and evaluated using the annual maximum flood peak discharge data from 1882 to 2004 at Yichang station; the data from 1882 to 1994 are used for training the models and those from 1995 to 2004 for validating model performance. To select appropriate predictor variables, the lag-k autocorrelation coefficient \( R_k \) of the annual maximum flood peak discharge series \( q_i \), together with its upper bound \( R_{1k} \) and lower limit \( R_{2k} \), is calculated by Eqs. (22)–(24) at the 80% confidence level. The results are given in Table 1. As can be seen from Table 1, the dependence is significant at the 80% confidence level for lags of 1, 2, 3, 20, 21, 22, 25, 28 and 30 steps. Therefore, \( q_{i-1} \), \( q_{i-2} \), \( q_{i-3} \), \( q_{i-20} \), \( q_{i-21} \), \( q_{i-22} \), \( q_{i-25} \), \( q_{i-28} \) and \( q_{i-30} \) can be selected as predictor variables. In the modeling process, all data series were normalized using Eq. (25); the range of the projection direction parameters a is set between −1 and 1, the rank of the Hermite polynomial r is 6, the population size N of the SSO algorithm is 50, the maximum number of generations \( G_{max} \) of the SSO algorithm is 500 and the number of ridge functions M is 2. The prediction model of the annual maximum peak discharge of the Yangtze River at Yichang station is thus obtained as follows:

Table 1 The autocorrelation coefficient of lag k step \( R_k \), its upper bound \( R_{1k} \) and lower limit \( R_{2k} \) of \( q_i \) based on 80% confidence level
$$ \begin{array}{l}f\left({x}_i\right)={\displaystyle \sum_{m=1}^2{\displaystyle \sum_{j=1}^6{c}_{mj}{h}_{ij}\left({\displaystyle \sum_{k=1}^9{a}_{mk}{x}_{ik}}\right)}}\\ {}{a}_{mk}=\left[\begin{array}{ccccccccc}\hfill -0.5026\hfill & \hfill -0.2533\hfill & \hfill 0.6548\hfill & \hfill 0.3622\hfill & \hfill 0.0643\hfill & \hfill 0.1909\hfill & \hfill -0.0611\hfill & \hfill -0.1827\hfill & \hfill -0.2135\hfill \\ {}\hfill 0.1644\hfill & \hfill -0.4742\hfill & \hfill -0.4068\hfill & \hfill -0.1840\hfill & \hfill 0.1287\hfill & \hfill -0.0123\hfill & \hfill 0.3641\hfill & \hfill 0.5149\hfill & \hfill 0.3665\hfill \end{array}\right]\\ {}{c}_{mj}=\left[\begin{array}{cccccc}\hfill -0.8767\hfill & \hfill -0.3722\hfill & \hfill -2.3517\hfill & \hfill 6.5304\hfill & \hfill -1.4695\hfill & \hfill 6.5744\hfill \\ {}\hfill 3480.4190\hfill & \hfill 700.6110\hfill & \hfill 201.9909\hfill & \hfill -23940.9609\hfill & \hfill -13034.4499\hfill & \hfill -32053.4996\hfill \end{array}\right]\end{array} $$
(34)

For a fair comparison, the same training and testing sets are used for the MLR and three-layer feed-forward ANN models described above. Using the least-squares method, the MLR model is obtained as follows:

$$ \begin{array}{l}{y}_i=41330.98+0.1572{q}_{i-1}+0.1510{q}_{i-2}-0.2395{q}_{i-3}-0.1689{q}_{i-20}-0.0605{q}_{i-21}\\ {}\kern2.1em -0.0539{q}_{i-22}+0.0946{q}_{i-25}+0.1555{q}_{i-28}+0.1571{q}_{i-30}\end{array} $$
(35)

The four quantitative statistical performance measures RMSE, MARE, CC and NSEC are employed to evaluate the performance of the models developed above, and the results for the different models are summarized in Table 2. Table 2 shows that the Hermite-PPR model with hybrid SSO and LS optimization produces good and close forecasts compared with the MLR, ANFIS and three-layer feed-forward ANN models. In the training phase, the Hermite-PPR model improved on the ANFIS, ANN and MLR models with reductions of about 28.67%, 31.63% and 53.46% in RMSE and 32.76%, 34.94% and 49.94% in MARE, respectively. The improvements in the CC value were approximately 56.37%, 84.71% and 123.65%, respectively. The Hermite-PPR model obtained the best NSEC value, an increase of 150.41% and 244.13% compared with the ANFIS model and the ANN model, respectively. In the validation phase, the Hermite-PPR model improved on the ANFIS, ANN and MLR models with reductions of about 6.03%, 9.83% and 42.28% in RMSE and 1.88%, 13.34% and 39.68% in MARE, respectively. The improvements in the CC value were approximately 7.46%, 17.57% and 10.42%, respectively. The Hermite-PPR model obtained the best NSEC value, an increase of 33.86% and 81.82% compared with the ANFIS model and the ANN model, respectively. In both training and validation, however, the MLR method obtained a negative NSEC value, which indicates that the observed mean is a better predictor than the MLR model. It can therefore be concluded that the MLR method is not feasible for forecasting the annual maximum flood peak discharge at Yichang station.

Table 2 Forecasting performance indices of different models
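The four measures reported in Table 2 can be computed as in the following minimal sketch. It assumes the commonly used textbook definitions (root mean square error, mean absolute relative error in percent, Pearson correlation coefficient and Nash-Sutcliffe efficiency coefficient), which may differ in detail from the definitions given earlier in the paper. The helper `relative_reduction` is a hypothetical illustration of how a percentage improvement over a benchmark could be expressed; the source does not state the exact formula used.

```python
import math


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mare(obs, sim):
    """Mean absolute relative error in percent (observations must be non-zero)."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def cc(obs, sim):
    """Pearson correlation coefficient between observations and forecasts."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nsec(obs, sim):
    """Nash-Sutcliffe efficiency coefficient.

    A value of 0 means the forecast is as good as the observed mean;
    a negative value means the observed mean is the better predictor.
    """
    mo = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)


def relative_reduction(benchmark, proposed):
    """Percentage reduction of an error measure relative to a benchmark (assumed form)."""
    return 100.0 * (benchmark - proposed) / benchmark
```

Under these definitions, a negative NSEC such as the one obtained by the MLR method means that its sum of squared errors exceeds the sum of squared deviations of the observations from their mean.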

To further analyze model performance, QR, the index specified for flood peak discharge forecasting in the Chinese standard for hydrological information and hydrological forecasting (GB/T 22482–2008), is also used to evaluate the models developed above. In addition, the percentages of forecasts whose absolute relative error falls into the other intervals, the maximum absolute relative error (%) and the minimum absolute relative error (%) are analysed, and Table 3 presents these statistics. Table 3 shows that the QR of the MLR, ANN, ANFIS and hermite-PPR models is 61.4%, 77.1%, 78.3% and 94%, respectively, in the training phase, and 40.0%, 90%, 80% and 90%, respectively, in the validation phase. The QR of the hermite-PPR model therefore exceeds 85% in both phases, so its forecasts satisfy the first-level standard. The QR of the ANFIS model is below 85% in both phases, so its forecasts satisfy the second-level standard. The QR of the ANN model exceeds 85% in the validation phase, which satisfies the first-level standard, but is below 85% in the training phase, which satisfies the second-level standard. The QR of the MLR method only just exceeds 60% in the training phase, which satisfies the third-level standard, and is below 60% in the validation phase, which indicates that the MLR method is not suitable for flood peak discharge forecasting. Furthermore, Table 3 shows that, for the hermite-PPR model, the percentages of absolute relative errors falling into the other intervals, the maximum absolute relative error (%) and the minimum absolute relative error (%) are all better than those of the ANFIS, ANN and MLR models. Figure 3 illustrates the annual maximum flood peak discharge forecasts at Yichang station obtained with the different models. Fig. 3 shows that the hermite-PPR model with the SSO and LS algorithm matches the flood peak discharge better than the ANFIS, ANN and MLR models. These results indicate that the hermite-PPR model gives the best result on every evaluation measure considered. This illustrates that the hermite-PPR model is suitable for nonlinear and nonstationary extreme flood peak analysis, that the idea of ‘projection pursuit’ is feasible, and that the hermite-PPR model can overcome the drawbacks of traditional models, generating a synergetic effect in forecasting and improving prediction performance.

Table 3 The performance analysis based on different models
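The QR-based grading discussed above can be sketched as follows. Two values here are assumptions rather than statements from this section: the 20% relative-error tolerance that defines a qualified forecast, and the 70% boundary between the second and third levels. The 85% and 60% thresholds follow the text. The function names are hypothetical.

```python
def qualified_rate(obs, sim, tolerance=0.20):
    """Percentage of forecasts whose absolute relative error is within tolerance.

    Assumption: a forecast counts as qualified when its absolute error does
    not exceed 20% of the observed value.
    """
    hits = sum(1 for o, s in zip(obs, sim) if abs(s - o) <= tolerance * abs(o))
    return 100.0 * hits / len(obs)


def forecast_grade(qr):
    """Map a QR value (%) to a grade.

    The 85% and 60% thresholds follow the text; the 70% threshold between
    the second and third levels is an assumption.
    """
    if qr >= 85.0:
        return "first"
    if qr >= 70.0:
        return "second"
    if qr >= 60.0:
        return "third"
    return "not qualified"


if __name__ == "__main__":
    # Training-phase QR values reported in the text for each model.
    for name, qr in [("MLR", 61.4), ("ANN", 77.1), ("ANFIS", 78.3), ("hermite-PPR", 94.0)]:
        print(name, forecast_grade(qr))
```

With these thresholds, the grades returned for the reported QR values agree with the levels assigned in the discussion of Table 3.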
Fig. 3 Annual maximum flood peak discharge forecasted by Hermite-PPR, ANFIS, ANN and MLR models at Yichang station in Yangtze River

6 Conclusions

For annual maximum flood peak discharge forecasting, it is very difficult to describe the relationship between the potential predictors x and the expected value y with a prescribed function. This paper presents a new hermite-PPR model, trained with an SSO and LS algorithm, for annual maximum flood peak discharge forecasting. The method uses the statistical properties of the data series to identify an appropriate input vector for the model, and trains the model with a novel SSO and LS algorithm. The proposed model was tested on real data from a large catchment of the Yangtze River in China. A typical three-layer feed-forward ANN model, ANFIS and MLR were used as benchmarks to illustrate the forecasting capability of the proposed model, and five statistical performance measures were used to evaluate the models. The results indicate a promising role for the new hermite-PPR model with the SSO and LS algorithm in annual maximum flood peak discharge forecasting, and show that it outperforms the ANFIS, ANN and MLR models on the Yangtze River. In addition, the methodology lends itself to automating parts of model development. Because the proposed model relies on the information contained in the data series itself and uses clear statistical properties as decision rules, the approach is more explicit and can be adopted in practical applications.