The Annual Maximum Flood Peak Discharge Forecasting Using Hermite Projection Pursuit Regression with SSO and LS Method

Wang, Wen-chuan; Chau, Kwok-wing; Xu, Dong-mei; Qiu, Lin; Liu, Can-can

doi:10.1007/s11269-016-1538-9


Published: 11 November 2016

Volume 31, pages 461–477, (2017)

Water Resources Management




Abstract

Accurate prediction of extreme flood peak discharge is essential for developing best management practices to avoid and reduce flood disasters. In recent years, many computational techniques have been put forward to model a wide range of hydrological processes. Nevertheless, more efficient techniques are still needed in terms of accuracy and applicability. In this study, a novel Hermite-PPR model with the SSO and LS algorithms is proposed for forecasting annual maximum flood peak discharge at Yichang station on the Yangtze River in China. The statistical properties of the data series are used to identify an appropriate input vector for the model, and the performance of the proposed model is then compared with adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN) and multiple linear regression (MLR) methods in terms of root mean squared error (RMSE), mean absolute relative error (MARE), coefficient of correlation (CC), Nash-Sutcliffe efficiency coefficient (NSEC) and qualified rate (QR). The results indicate that the proposed methodology yields a significant improvement in forecasting accuracy across the different evaluation criteria during the training and validation phases.


1 Introduction

Flooding is one of the most frequently occurring natural hazards worldwide and often causes major damage to society (Li et al. 2014). Hydrologists and water resources engineers are therefore frequently required to assess and forecast flood discharge in order to develop best management practices that avoid and reduce flood disasters. For this reason, flood forecasting has always received considerable attention from researchers and has been an issue of particular interest in operational hydrology (Cloke and Pappenberger 2009). In the past decades, many researchers have studied flood forecasting at smaller time scales (daily or hourly) in different parts of the world (Xu et al. 2013). However, assessing the flood threat associated with reservoirs and flood protection works involves not only flood forecasting at these smaller time scales but also forecasting of the annual maximum flood peak discharge, which poses an enormous threat to life, the environment, the economy and society. Therefore, developing a precise technique for extreme flood forecasting is an important task in disaster warning systems and is expected to be useful for future disaster prevention and mitigation.

The factors influencing extreme floods include many variables, for example extreme rainfall, climate change, forest cover, urban development and land development (Tripathi et al. 2014). The nonlinear and complex relationships between these influencing factors and the extreme flood peak make it difficult to construct a physically based mathematical model. Furthermore, the success of forecasting depends on achieving a reliable fit for the model, which requires a sufficiently long and high-quality data record. Unfortunately, in the vast majority of catchments, data on some influencing factors are not available, or the catchment has undergone significant land-use or climate changes in the past and has no good historical record (Li et al. 2014). Possibly for this reason, only a few studies have considered this problem in the context of extreme flood forecasting (Wu et al. 2014). Therefore, historical observations of the annual extreme flood peak discharge are an important tool for obtaining a clearer understanding of what the future will hold (Blöschl and Montanari 2010). In this data-driven approach, a thorough understanding of the physical laws is not required and the data requirements are not as extensive as for a process model (Danandeh Mehr et al. 2013). The traditional technique is to use multiple linear regression (MLR) to seek a relationship between input and output, and a great deal of work has been done in applying MLR to hydrological modelling. MLR models are considered a benchmark for comparison with other techniques in reservoir flood forecasting (Chau et al. 2005) and in modelling the annual rainfall-runoff relationship (Wang et al. 2013). Magar and Jothiprakash (2011) used the MLR approach to construct an intermittent reservoir daily inflow forecasting system. Latt and Wittenberg (2014) compared stepwise MLR and artificial neural networks (ANN) for improving flood forecasting in a developing country and concluded that linear regression models are quite applicable to forecasting; however, they require a prior assumption about the type and consistency of the relation between the dependent and independent variables.

In recent years, ANNs and ANFIS, artificial intelligence techniques that neither presuppose a detailed understanding of a river’s physical characteristics nor require extensive data pre-processing (Dawson et al. 2002), have been increasingly used in hydrologic forecasting, such as rainfall-runoff modeling (Dawson and Wilby 1998), stream flow forecasting (Chen and Chau 2016; Wang et al. 2009, 2014, 2015) and flood forecasting (Chau et al. 2005). However, if peak streamflow data are insufficient, NN-based models are usually unable to yield satisfactory predictions of extreme values. A reasonable explanation may be that not enough training data are available for proper training on high-value instances (Wu et al. 2014). Therefore, the development of a better forecasting technique has always been recognized as a key task in operational hydrology.

Projection Pursuit Regression (PPR) is a statistical technique and a powerful tool for seeking interesting projections from high-dimensional spaces into lower-dimensional ones by means of linear projections (Friedman and Stuetzl 1981). The aim of this paper is to develop a novel model that provides precise forecasts of annual maximum flood peak discharge. To this end, projection pursuit regression using Hermite polynomials (Hermite-PPR) is proposed, for the first time, to construct the forecasting models. The method uses the statistical properties of the data series to identify an appropriate input vector for the model, and the parameters of Hermite-PPR are optimized with the social spider optimization (SSO) algorithm (Cuevas et al. 2013) and the least squares (LS) method. The development and performance of the MLR, ANN, ANFIS and Hermite-PPR models are demonstrated with the annual maximum flood peak discharge at Yichang hydrological station before discussing the results and making concluding remarks.

The rest of the paper is organized as follows. Section 2 introduces the study area and data. Section 3 gives a brief introduction to the basic theory of the MLR, ANN, ANFIS and SSO algorithms, and also presents the modelling technique of the proposed approach. The five statistical indices used to evaluate model performance are described in Section 4. Section 5 presents the application results, including comparison and analysis. Section 6 states the conclusions.

2 The Study Area and Data Information

The study area is the Yangtze basin located in the subtropical monsoon region. The Yangtze River is about 6300 km long and has a drainage basin area of 1.80 million km² (see Fig. 1), and is the longest river in China and the third in the world in terms of length and discharge (Yu et al. 2009). The mean annual precipitation in the basin ranges between 270 and 500 mm in the western region and 1600–1900 mm in the southeastern region (Gemmer et al. 2008). Under the monsoonal climate, floods occur annually in the summer, especially during June and July, when slowly drifting cold fronts meet the moist and stable subtropical air-mass and generate excess rainfall in the Yangtze catchment (Zhang et al. 2006). Historically, the Yangtze River catchment has been known for its frequent huge floods that have halted to a large degree the social advancement of the basin (Yu et al. 2009).

In this paper, the annual maximum flood peak discharges are taken from Yichang hydrological station (controlling 1,005,501 km²) (Zhang et al. 2006). The annual maximum flood peak discharge data from 1882 to 2004 are studied at Yichang station, and the data set from 1882 to 1994 is used for training models whilst that from 1995 to 2004 is used for validating performances of models (Fig. 2). The interannual variability of annual maximum flood peak discharge at Yichang station is large. During the studied period, the minimum and maximum flood peak discharges are 29,800 m³/s and 71,100 m³/s respectively whilst the average annual runoff is 51,380 m³/s.

3 Description of Methodology

3.1 Multiple Linear Regression (MLR) Model

MLR is a generalized linear modeling technique that is widely used to build prediction equations (Latt and Wittenberg 2014). In this study, the regression coefficients were determined using the least-squares method.
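As an illustration of the least-squares fit described above, the following minimal sketch estimates an intercept and regression coefficients with NumPy. The function names `fit_mlr` and `predict_mlr` and the synthetic data are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def fit_mlr(X, y):
    """Fit y ~ b0 + X @ b by ordinary least squares; returns (b0, b)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])    # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def predict_mlr(X, b0, b):
    """Evaluate the fitted linear prediction equation."""
    return b0 + np.asarray(X, dtype=float) @ b

# Synthetic, noise-free demonstration data (hypothetical, not the Yichang series)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
y_demo = 2.0 + X_demo @ np.array([0.5, -1.0, 0.25])
b0_demo, b_demo = fit_mlr(X_demo, y_demo)
```

With lagged discharges as the columns of `X`, the same call would produce coefficients of the kind reported later for the MLR model.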

3.2 Artificial Neural Networks (ANNs)

A three-layer feed-forward ANN model trained with the scaled conjugate gradient algorithm (Wang et al. 2009) is used in this study. All the data series were normalized using their minimum and maximum values so that the values of each variable range from 0 to 1. The tan-sigmoid transfer function is adopted at the hidden layer whilst the linear transfer function is used at the output layer. The number of training epochs is set to 1000.

3.3 Adaptive Neuro-Fuzzy Inference System (ANFIS)

Details of the ANFIS adopted for another benchmark comparison can be found in Jang (1993).

3.4 Projection Pursuit Regression Using Hermite Polynomial (Hermite-PPR)

The projection pursuit is a technique for the exploratory analysis of multivariate data sets. In the regression problem, one is given a p-dimensional random vector x, the components of which are called predictor variables, and a predicted variable y, which is called the response, and the basic model is given as follows (Friedman and Stuetzl 1981):

$$ y={a}_0+{\displaystyle \sum_{m=1}^M{c}_mg\left({a}^Tx\right)} $$

(1)

where x is a column vector containing the p explanatory variables of one observation, and there are n observations in total. y is the response variable to be predicted. $a_0$ is the usual intercept term of a regression function. a is a vector of unknown parameters that denotes the projection direction. g is an appropriate ridge function and $c_m$ is its corresponding coefficient. M is an unknown integer that denotes the number of ridge functions g. If M = 1, $c_m$ = 1 and g is the identity function, Eq. (1) reduces to the conventional multiple regression model.

The orthonormal Hermite polynomial functions are chosen for their parametric orthonormal property and the ease with which their values can be calculated recursively (Jeng-Neng et al. 1994). The Hermite projection pursuit regression can therefore be written as follows:

$$ y={\displaystyle \sum_{m=1}^M{\displaystyle \sum_{j=1}^r{c}_{mj}{h}_{ij}\left({z}_i\right)}},\kern0.3em i=1,2,\dots, n $$

(2)

where n is the number of samples and $z_i$ is the projection of the ith input sample onto the projection direction a, obtained by Eq. (3):

$$ {z}_i=\sum_{k=1}^p{a}_k{x}_{ik},\kern0.5em i=1,2,\dots, n $$

(3)

r is the rank (order) of the Hermite polynomial; a denotes the projection direction vector and is subject to $a^Ta=1$; c denotes the unknown coefficients of the Hermite polynomials; and h denotes the orthonormal Hermite functions, which are defined as follows:

$$ {h}_r(z)={\left(r!\right)}^{-\frac{1}{2}}{\pi}^{\frac{1}{4}}{2}^{-\frac{r-1}{2}}{H}_r(z)\phi (z),-\infty <z<\infty $$

(4)

where r! denotes r factorial, $ \phi (z)=\frac{1}{\sqrt{2\pi }}{e}^{-\frac{z^2}{2}} $, and $H_r(z)$ is the Hermite polynomial, which can be constructed recursively:

$$ \left\{\begin{array}{l}{H}_0(z)=1\\ {}{H}_1(z)=2z\\ {}\cdots \\ {}{H}_r(z)=2\left(z{H}_{r-1}(z)-\left(r-1\right){H}_{r-2}(z)\right)\end{array}\right. $$

(5)
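The recursion in Eq. (5) and the scaling in Eq. (4) can be sketched as follows. This is only an illustration under the assumption that NumPy is available; the function names `hermite_poly` and `hermite_fn` are hypothetical.

```python
import math
import numpy as np

def hermite_poly(r, z):
    """Hermite polynomial H_r(z) built with the recursion of Eq. (5)."""
    z = np.asarray(z, dtype=float)
    h_prev, h_curr = np.ones_like(z), 2.0 * z          # H_0 and H_1
    if r == 0:
        return h_prev
    for n in range(2, r + 1):
        # H_n = 2 * (z * H_{n-1} - (n - 1) * H_{n-2})
        h_prev, h_curr = h_curr, 2.0 * (z * h_curr - (n - 1) * h_prev)
    return h_curr

def hermite_fn(r, z):
    """Orthonormal Hermite function h_r(z) of Eq. (4)."""
    z = np.asarray(z, dtype=float)
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    scale = math.factorial(r) ** -0.5 * math.pi ** 0.25 * 2.0 ** (-(r - 1) / 2.0)
    return scale * hermite_poly(r, z) * phi
```

With this scaling, the integral of $h_i(z)h_j(z)$ over the real line is 1 for i = j and 0 otherwise, which can be checked numerically on a fine grid.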

Hence, the values of the parameters a and c can be determined by solving the following optimization problem:

$$ \min f\left(a,c\right)=\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({Y}_i-{y}_i\right)}^2} $$

(6)

where $Y_i$ is the observed value, subject to $a^Ta=1$.

3.5 Social Spider Optimization (SSO) Algorithm

The SSO algorithm developed by Cuevas et al. (2013) is a novel swarm intelligence technique based on simulating the cooperative behavior of social spiders. In the SSO algorithm, each individual of the population is modeled as one of two genders (male or female). Depending on gender, each individual is guided by a different set of evolutionary operators that mimic the cooperative behaviors typical of a colony (Cuevas et al. 2013). This mechanism not only emulates the cooperative behavior of the colony more realistically, but also incorporates computational mechanisms that avoid critical flaws such as premature convergence and an incorrect exploration-exploitation balance. For more details regarding the SSO algorithm, please see Cuevas et al. (2013). A brief description of the SSO algorithm is given as follows:

3.5.1 Defining the Number of Females and Males

The SSO algorithm defines the number of female and male spiders that will be characterized as individuals in the search space. With N denoting the total number of colony members, the number of females $N_f$ is randomly selected within the range of 65% to 90% of N. Therefore, $N_f$ and the number of males $N_m$ can be calculated as

$$ {N}_f= floor\left[\left(0.9- rand\cdot 0.25\right)\cdot N\right] $$

(7)

$$ {N}_m=N-{N}_f $$

(8)

where rand is a random number in [0,1] and floor maps a real number to the largest integer that does not exceed it.
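Eqs. (7) and (8) translate almost directly into code. The sketch below uses only the Python standard library; the helper name `split_population` is an assumption made for illustration.

```python
import math
import random

def split_population(n_total, rng=random):
    """Return (N_f, N_m) following Eqs. (7) and (8)."""
    # rng.random() lies in [0, 1), so the female share lies in (0.65, 0.90]
    n_female = math.floor((0.9 - rng.random() * 0.25) * n_total)
    return n_female, n_total - n_female
```

For a colony of N = 50 this gives between 32 and 45 females, with the remainder being males.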

Therefore, the complete population S of N elements is composed of two sub-groups, F and M. Group F gathers the female individuals ($ F=\left\{{f}_1,{f}_2,\dots, {f}_{N_f}\right\} $) whereas group M gathers the male members ($ M=\left\{{m}_1,{m}_2,\dots, {m}_{N_m}\right\} $), where $ S=F\cup M $ and $ S=\left\{{s}_1,{s}_2,\dots, {s}_N\right\} $, so that

$$ S=\left\{{s}_1={f}_1,{s}_2={f}_2,\dots, {s}_{N_f}={f}_{N_f},{s}_{N_f+1}={m}_1,{s}_{N_f+2}={m}_2,\dots, {s}_N={m}_{N_m}\right\} $$

(9)

3.5.2 Fitness Assignation

In the SSO algorithm, every spider receives a weight $w_i$ that denotes the quality of the solution corresponding to spider i (irrespective of gender) of the population S. The weight $w_i$ is calculated as

$$ {w}_i=\frac{J\left({s}_i\right)- wors{t}_s}{bes{t}_s- wors{t}_s} $$

(10)

where $J(s_i)$ is the fitness value obtained by evaluating the spider position $s_i$ with the objective function. The values $best_s$ and $worst_s$ are calculated as (considering a maximization problem)

$$ bes{t}_s=\underset{k\in \left\{1,2,\dots, N\right\}}{ \max}\left(J\left({s}_k\right)\right)\kern0.4em and\kern0.5em wors{t}_s=\underset{k\in \left\{1,2,\dots, N\right\}}{ \min}\left(J\left({s}_k\right)\right) $$

(11)
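A minimal sketch of the weight assignment in Eqs. (10) and (11) is given below. The function name `spider_weights` is hypothetical, and the handling of a colony whose members all share the same fitness is an assumption, since that case is not covered by the equations.

```python
import numpy as np

def spider_weights(fitness):
    """Map fitness values J(s_i) to weights in [0, 1] (Eqs. 10-11, maximization)."""
    fitness = np.asarray(fitness, dtype=float)
    best, worst = fitness.max(), fitness.min()
    if best == worst:
        # Assumption: identical fitness values receive identical weights
        return np.ones_like(fitness)
    return (fitness - worst) / (best - worst)

w_demo = spider_weights([3.0, 1.0, 2.0])
```

Because the equations assume maximization, a minimized objective such as a mean squared error could be supplied with its sign reversed; this is one possible convention, not necessarily the one used in the study.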

3.5.3 Modeling of the Vibrations Through the Communal web

The communal web is used as a medium to transmit information among the colony members. In the SSO algorithm, this information is encoded as small vibrations, which are perceived by individual i as a result of the information transmitted by member j. This process can be modeled by the following equation:

$$ Vi{b}_{i,j}={w}_j\cdot {e}^{-{d}_{i,j}^2} $$

(12)

where $ {d}_{i,j}=\left\Vert {s}_i-{s}_j\right\Vert $ is the Euclidean distance between spiders i and j.

Furthermore, there are three special relationships considered within the SSO algorithm:

1) If the individual c ($s_c$) is the nearest member to individual i ($s_i$) that possesses a higher weight than i ($w_c > w_i$), then the vibration $Vibc_i$ is defined as

$$ Vib{c}_i={w}_c\cdot {e}^{-{d}_{i,c}^2} $$

(13)

2) If the individual b ($s_b$) holds the best weight (best fitness value) of the complete population S, that is to say $ {w}_b=\underset{k\in \left\{1,2,\dots, N\right\}}{ \max}\left({w}_k\right) $, then the vibration $Vibb_i$ is defined as

$$ Vib{b}_i={w}_b\cdot {e}^{-{d}_{i,b}^2} $$

(14)

3) If the individual f ($s_f$) is the nearest female individual to i, then the vibration $Vibf_i$ is defined as

$$ Vib{f}_i={w}_f\cdot {e}^{-{d}_{i,f}^2} $$

(15)
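The vibration model of Eq. (12) and the first of the three special relationships can be sketched as follows. The function names are hypothetical, and `nearest_better` reflects one reading of relationship 1, namely the closest spider among those with a higher weight.

```python
import numpy as np

def vibration(w_j, s_i, s_j):
    """Vib_{i,j} = w_j * exp(-d_{i,j}^2), with d the Euclidean distance (Eq. 12)."""
    diff = np.asarray(s_i, dtype=float) - np.asarray(s_j, dtype=float)
    return float(w_j * np.exp(-np.sum(diff ** 2)))

def nearest_better(i, positions, weights):
    """Index c of the closest spider whose weight exceeds that of spider i, or None."""
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    better = np.flatnonzero(weights > weights[i])
    if better.size == 0:
        return None                      # spider i already holds the best weight
    dist = np.linalg.norm(positions[better] - positions[i], axis=1)
    return int(better[np.argmin(dist)])
```

`Vibc_i` in Eq. (13) is then `vibration(weights[c], positions[i], positions[c])` with `c = nearest_better(i, positions, weights)`; Eqs. (14) and (15) follow the same pattern with the best spider and the nearest female.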

3.5.4 Initializing the Population

In the SSO algorithm, the entire population (female and male) is generated randomly by initializing the set S of N spider positions. Each spider position, $f_i$ or $m_i$, is an n-dimensional vector that represents the parameter values to be optimized. $f_i$ and $m_i$ are given as follows:

$$ \begin{array}{l}{f}_{i,j}^0={b}_j^{low}+ rand\left(0,1\right)\cdot \left({b}_j^{high}-{b}_j^{low}\right)\\ {}\kern2.4em i=1,2,\dots {N}_f;j=1,2,..,n\end{array} $$

(16)

$$ \begin{array}{l}{m}_{k,j}^0={b}_j^{low}+ rand\left(0,1\right)\cdot \left({b}_j^{high}-{b}_j^{low}\right)\\ {}\kern2.4em k=1,2,\dots {N}_m;j=1,2,..,n\end{array} $$

(17)

where i and k are individual indexes and j is the parameter index; the superscript zero denotes the initial population; rand(0,1) is a random number between 0 and 1; and $b_j^{low}$ and $b_j^{high}$ are the lower and upper initial bounds of parameter j, respectively.
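The uniform initialization in Eqs. (16) and (17) can be sketched with a NumPy random generator. The function name `init_population` and its argument layout are assumptions made for this example.

```python
import numpy as np

def init_population(n_female, n_male, low, high, rng):
    """Draw initial female and male positions uniformly within [low, high] (Eqs. 16-17)."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    females = low + rng.random((n_female, low.size)) * (high - low)
    males = low + rng.random((n_male, low.size)) * (high - low)
    return females, males

rng_demo = np.random.default_rng(0)
f_demo, m_demo = init_population(35, 15, [-1.0] * 9, [1.0] * 9, rng_demo)
```

Each row is one spider position; in the proposed model a row would represent a candidate projection direction vector a.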

3.5.5 Cooperative Operators

In SSO algorithm, the cooperative operators contain female cooperative operator and male cooperative operator. The female cooperative operator can be modeled as follows:

$$ {f}_i^{k+1}=\left\{\begin{array}{l}{f}_i^k+\alpha \cdot Vib{c}_i\cdot \left({s}_c-{f}_i^k\right)+\beta \cdot Vib{b}_i\cdot \left({s}_b-{f}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right)\kern0.1em ,\left({r}_m<PF\right)\\ {}{f}_i^k-\alpha \cdot Vib{c}_i\cdot \left({s}_c-{f}_i^k\right)-\beta \cdot Vib{b}_i\cdot \left({s}_b-{f}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right),\left({r}_m\ge PF\right)\end{array}\right. $$

(18)

where α, β, δ and rand are random numbers in [0,1] and k denotes the iteration number. The individuals $s_c$ and $s_b$ denote the nearest member to i that holds a higher weight and the best individual of the entire population S, respectively. $r_m$ is a uniform random number generated in [0,1], whereas PF is a threshold that is often set to 0.7 (Cuevas et al. 2013).
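A single application of the female cooperative operator in Eq. (18) might look like the sketch below. The function name `move_female` is hypothetical, and drawing α, β, δ and rand as scalars shared by all dimensions is an assumption; Eq. (18) does not state whether the draws are made per dimension.

```python
import numpy as np

def move_female(f, s_c, s_b, vib_c, vib_b, rng, pf=0.7):
    """Move one female spider by attraction (r_m < PF) or repulsion (Eq. 18)."""
    f, s_c, s_b = (np.asarray(v, dtype=float) for v in (f, s_c, s_b))
    # Assumption: one scalar draw per random quantity, shared by all dimensions
    alpha, beta, delta, rand, r_m = rng.random(5)
    step = alpha * vib_c * (s_c - f) + beta * vib_b * (s_b - f)
    noise = delta * (rand - 0.5)
    return f + step + noise if r_m < pf else f - step + noise

out_demo = move_female(np.zeros(3), np.ones(3), np.ones(3), 0.0, 0.0,
                       np.random.default_rng(1))
```

When both vibrations are zero, only the random term remains, so every coordinate shifts by the same amount of at most 0.5 in absolute value.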

The male cooperative operator can be modeled as follows:

$$ {m}_i^{k+1}=\left\{\begin{array}{l}{m}_i^k+\alpha \cdot \left(\frac{\sum_{h=1}^{N_m}{m}_h^k\cdot {w}_{N_f+h}}{\sum_{h=1}^{N_m}{w}_{N_f+h}}-{m}_i^k\right),\kern0.4em \mathrm{if}\kern0.3em {w}_{N_f+i}\le {w}_{N_f+m}\\ {}{m}_i^k+\alpha \cdot Vib{f}_i\cdot \left({s}_f-{m}_i^k\right)+\delta \cdot \left( rand-\frac{1}{2}\right),\kern0.4em \mathrm{if}\kern0.3em {w}_{N_f+i}>{w}_{N_f+m}\end{array}\right. $$

(19)

where the individual $s_f$ denotes the nearest female individual to the male individual i, $w_{N_f+m}$ is the weight of the median male member (Cuevas et al. 2013), and $ \left({\displaystyle {\sum}_{h=1}^{N_m}{m}_h^k\cdot {w}_{N_f+h}}/{\displaystyle {\sum}_{h=1}^{N_m}{w}_{N_f+h}}\right) $ corresponds to the weighted mean of the male population M.

3.5.6 Mating Operator

In the SSO algorithm, a new brood $s_{new}$ is generated by the mating operator when a dominant male spider locates a number of female members within a specific range r. The range r is defined as a radius that depends on the size of the search space and is computed by the following equation:

$$ r=\frac{{\displaystyle \sum_{j=1}^n\left({b}_j^{high}-{b}_j^{low}\right)}}{2\cdot n} $$

(20)

In the mating process, the influence probability $Ps_i$ of each individual is calculated by the roulette method as follows:

$$ P{s}_i=\frac{w_i}{{\displaystyle \sum_{j=1}^k{w}_j}} $$

(21)

When the new spider is formed, it will be compared with the worst spider of the colony. If the new spider is better than the worst spider, it will replace the worst spider. Otherwise, it will be discarded.
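The two numerical ingredients of the mating operator, the radius of Eq. (20) and the roulette probabilities of Eq. (21), can be sketched as follows. The function names are hypothetical, and the weights passed to `influence_probabilities` are assumed to have a positive sum.

```python
import numpy as np

def mating_radius(low, high):
    """Mating radius r of Eq. (20) from the lower and upper parameter bounds."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    return float(np.sum(high - low) / (2.0 * low.size))

def influence_probabilities(weights):
    """Ps_i = w_i / sum_j w_j over the spiders taking part in mating (Eq. 21)."""
    weights = np.asarray(weights, dtype=float)
    return weights / weights.sum()       # assumes the weights do not all vanish

def roulette_pick(weights, rng):
    """Draw the index of one participant with probability Ps_i."""
    return int(rng.choice(len(weights), p=influence_probabilities(weights)))
```

For nine parameters bounded between −1 and 1, `mating_radius` returns 1.0. How the selected parents are combined into the new spider is described in Cuevas et al. (2013) and is not reproduced here.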

3.6 The Modelling Approach of Proposed Technique

When Hermite-PPR is used to forecast annual maximum flood peak discharge from historical observed records, two key problems arise: how to choose the input variables, and how to set the best parameters of Hermite-PPR. Both are very important for obtaining satisfactory prediction accuracy.

3.6.1 Selection of Predictor Variables and Data Processing

In data-driven modeling, statistical procedures such as the autocorrelation function (ACF) and the partial autocorrelation function (PACF) have been suggested for identifying an appropriate input vector for a model (Wang et al. 2009). In this paper, the significant lags of the series that potentially influence the output are identified by analyzing the autocorrelation coefficient. The autocorrelation coefficient at lag k of the time series $q_i$ can be calculated as

$$ {R}_k=\frac{\sum_{i=k+1}^n\left({q}_i-\overline{q}\right)\left({q}_{i-k}-\overline{q}\right)}{\sum_{i=1}^n{\left({q}_i-\overline{q}\right)}^2},\kern1em k=1,2,\dots, m $$

(22)

$$ \overline{q}=\frac{1}{n}\sum_{i=1}^n{q}_i $$

(23)

where n is the number of samples and m is the largest integer less than n/4. According to the sampling distribution of $R_k$, at the 1 − α confidence level, if $R_k$ satisfies

$$ {R}_k\notin \left[\frac{-1-{\mu}_{\alpha /2}{\left(n-k-1\right)}^{1/2}}{n-k},\frac{-1+{\mu}_{\alpha /2}{\left(n-k-1\right)}^{1/2}}{n-k}\right] $$

(24)

then a significant dependence at lag k can be inferred for the time series $q_i$, and $q_{i-k}$ can be selected as a predictor variable. $ {\mu}_{\alpha /2} $ can be found from the standard normal distribution table. In this study, the confidence level is 80%.
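The lag selection rule of Eqs. (22) to (24) can be sketched as below. The function names are hypothetical, and `statistics.NormalDist` from the Python standard library is used here in place of a normal distribution table to obtain $ {\mu}_{\alpha /2} $.

```python
import numpy as np
from statistics import NormalDist

def autocorr(q, k):
    """Lag-k autocorrelation coefficient R_k of Eq. (22)."""
    q = np.asarray(q, dtype=float)
    d = q - q.mean()
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d ** 2))

def acf_bounds(n, k, confidence=0.80):
    """Lower and upper limits of the interval in Eq. (24)."""
    u = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)   # mu_{alpha/2}
    half = u * (n - k - 1) ** 0.5
    return (-1.0 - half) / (n - k), (-1.0 + half) / (n - k)

def significant_lags(q, confidence=0.80):
    """Lags k = 1..m (m the largest integer below n/4) whose R_k leaves the interval."""
    n = len(q)
    m = (n - 1) // 4
    lags = []
    for k in range(1, m + 1):
        lo, hi = acf_bounds(n, k, confidence)
        r_k = autocorr(q, k)
        if r_k < lo or r_k > hi:
            lags.append(k)
    return lags
```

Applied to a discharge series, `significant_lags` returns the lags whose lagged values are candidate predictor variables.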

To prevent attributes with larger numeric ranges from dominating those with smaller ranges, and to avoid numerical difficulties during the calculation, the original data must be normalized. This is done with the following equation:

$$ {x}_i=\frac{{q}_i-\overline{q}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n{\left({q}_i-\overline{q}\right)}^2}},\kern1em i=1,2,\dots, n $$

(25)

where $x_i$ represents the normalized discharge sequence, $q_i$ represents the annual maximum flood peak discharge series, $ \overline{q} $ represents the mean value of $q_i$, and n is the number of samples.
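Eq. (25) is a standardization with the sample standard deviation (divisor n − 1). A minimal NumPy sketch, with the hypothetical function name `standardize`, is:

```python
import numpy as np

def standardize(q):
    """Normalize a series to zero mean and unit sample standard deviation (Eq. 25)."""
    q = np.asarray(q, dtype=float)
    return (q - q.mean()) / q.std(ddof=1)   # ddof=1 gives the n - 1 divisor

x_demo = standardize([1.0, 2.0, 3.0, 4.0])
```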

3.6.2 Optimizing the Hermite-PPR Parameters With SSO and LS Method

The predictive quality of Hermite-PPR depends on several parameters: the projection direction vector a, the Hermite polynomial coefficients c and the number of ridge functions M. Equation (6) is a nonlinear constrained optimization problem. In this paper, the social spider optimization (SSO) algorithm (Cuevas et al. 2013) is employed to determine the optimal projection direction vector a, and the Hermite polynomial coefficients c are estimated using the least squares (LS) method. The value of M is found through a forward stage-wise strategy that stops when the model fit can no longer be significantly improved. The Hermite-PPR algorithm with the SSO and LS methods proceeds as follows:

3.6.3 Setting Parameters

Parameters such as the number of ridge functions M, the total number of colony spiders N and the maximum number of generations $G_{max}$ of the SSO algorithm need to be set.

3.6.4 Generating Projection Direction Vector a

In this study, the projection direction vector a is optimized by the SSO algorithm. Each spider position is an n-dimensional vector that represents a possible solution for the projection direction vector a. First, with N as the total number of n-dimensional colony members, the numbers of female ($N_f$) and male ($N_m$) spiders are defined according to Eqs. (7) and (8). Then, the female individuals ($ F=\left\{{f}_1,{f}_2,\dots, {f}_{N_f}\right\} $) and male members ($ M=\left\{{m}_1,{m}_2,\dots, {m}_{N_m}\right\} $) are initialized randomly and the mating radius is calculated using Eq. (20).

3.6.5 Calculating the Coefficient of Hermite Polynomial c

Given the generated projection direction vector, the projection value z is calculated by Eq. (3) and the Hermite functions $h_r(z)$ are obtained by Eq. (4). The coefficients c are then obtained using the LS method by solving

$$ {\displaystyle \underset{c_j}{ \min }}{\left\Vert {Y}_i-{\displaystyle \sum_{j=1}^r{c}_j{h}_{ij}\left({z}_i\right)}\right\Vert}^2 $$

(26)
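For a fixed projection direction, Eq. (26) is a linear least-squares problem in c. The sketch below illustrates this under stated assumptions: the function names `hermite_design` and `fit_ridge_coefficients` are hypothetical, the Hermite polynomials are evaluated with NumPy's `hermval` (which follows the same convention as Eq. (5), with $H_1(z)=2z$), and the direction is rescaled so that $a^Ta=1$.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_design(z, r):
    """Matrix whose columns are h_1(z), ..., h_r(z) from Eq. (4)."""
    z = np.asarray(z, dtype=float)
    phi = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cols = []
    for j in range(1, r + 1):
        scale = math.factorial(j) ** -0.5 * math.pi ** 0.25 * 2.0 ** (-(j - 1) / 2.0)
        cols.append(scale * hermval(z, [0.0] * j + [1.0]) * phi)   # H_j(z)
    return np.column_stack(cols)

def fit_ridge_coefficients(X, y, a, r=6):
    """Solve Eq. (26) for c given direction a; also return the fit error of Eq. (27)."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)                 # enforce a^T a = 1
    z = np.asarray(X, dtype=float) @ a        # projections, Eq. (3)
    H = hermite_design(z, r)
    c, *_ = np.linalg.lstsq(H, np.asarray(y, dtype=float), rcond=None)
    mse = float(np.mean((y - H @ c) ** 2))
    return c, mse
```

In an optimization loop, the returned error would serve as the fitness of the spider whose position is `a`, so that only the direction has to be searched by the swarm.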

3.6.6 Executing Iterative Process of SSO Algorithm

First, from the generated projection direction vector a and the obtained coefficients c, the regression value is calculated by Eq. (2). The fitness value is obtained by evaluating the spider position $s_i$ with the following objective function:

$$ Fitval=\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({Y}_i-{y}_i\right)}^2} $$

(27)

Then, the weight of every spider is calculated by Eq. (10); the female and male spiders are moved according to the female and male cooperative operators in Eqs. (18) and (19), respectively; and the mating operation is performed to form a new spider using Eq. (21). Finally, if the stopping criteria are met, the process ends and the optimal projection direction a and the optimal coefficients c are obtained, which completes the optimization of the first ridge function. The iterative process of the SSO algorithm is described only briefly here; a detailed description can be found in Cuevas et al. (2013).

3.6.7 Termination of Model Optimization and Output of Results

From the optimal projection direction a and the optimal coefficients c, the simulated residual $\varepsilon_i$ and the termination value are calculated. If the termination value satisfies the termination condition, the results are output. Otherwise, $y_i$ is replaced by $\varepsilon_i$ and the next ridge function is optimized.

4 The Evaluation of Forecasting Performance

The performance of MLR, ANN and SSO-HPPR for the annual maximum peak flood discharge forecasting was evaluated using five different statistical indices, which are described as follows:

Firstly, the root mean squared error (RMSE) is selected as the performance criterion of level prediction. It records, in real units, the level of overall agreement between the observed and modelled datasets (Dawson et al. 2007) and is a good measure of model performance for high flows (Karunanithi et al. 1994). The RMSE can be calculated as

$$ RMSE=\sqrt{\frac{1}{n}{\displaystyle \sum_{i=1}^n{\left({y}_f(i)-{y}_o(i)\right)}^2}} $$

(28)

Secondly, the mean absolute relative error (MARE) is a relative criterion which is sensitive to the forecasting errors that occur in the low(er) magnitudes of each dataset, and is less sensitive to the larger errors that usually occur at higher magnitudes, because the errors are not squared. MARE is an unbiased statistic for computing the predictive capability of a model, and can be calculated as:

$$ MARE\left(\%\right)=\frac{1}{n}{\displaystyle \sum_{i=1}^n\left|\frac{y_f(i)-{y}_o(i)}{y_o(i)}\right|}\times 100 $$

(29)

Thirdly, the coefficient of correlation (CC) is selected as the degree of collinearity criterion of level prediction. CC is generally used to compare alternative models, though it is oversensitive to high extreme values (outliers) and insensitive to additive and proportional differences between model predictions and measured data (Legates and McCabe 1999). CC can be calculated as:

$$ CC=\frac{\frac{1}{n}\sum_{i=1}^n\left({y}_o(i)-{\overline{y}}_o\right)\left({y}_f(i)-{\overline{y}}_f\right)}{\sqrt{\frac{1}{n}\sum_{i=1}^n{\left({y}_o(i)-{\overline{y}}_o\right)}^2}\cdot \sqrt{\frac{1}{n}\sum_{i=1}^n{\left({y}_f(i)-{\overline{y}}_f\right)}^2}} $$

(30)

Fourthly, the Nash-Sutcliffe efficiency coefficient (NSEC) is a very popular index for assessing the predictive power of hydrological models (Nash and Sutcliffe 1970). It can be calculated as

$$ NSEC=1-\frac{{\displaystyle \sum_{i=1}^n{\left({y}_o(i)-{y}_f(i)\right)}^2}}{{\displaystyle \sum_{i=1}^n{\left({y}_o(i)-{\overline{y}}_o\right)}^2}} $$

(31)

where $y_o(i)$ and $y_f(i)$ are the observed and forecasted runoff, respectively, $ {\overline{y}}_o $ and $ {\overline{y}}_f $ denote their mean values, and n is the number of samples considered.

Finally, the qualified rate (QR) was defined by Standard for hydrological information and hydrological forecasting (GB/T 22482–2008) (Quality of nation of P. R. china and Standardization administration of P. R.china 2008). According to the related specifications of GB/T 22482–2008, if the absolute relative error (ARE) of a forecasted peak flood discharge is less than 20% of observed value, it is a qualified forecast. ARE is calculated as:

$$ ARE\left(\%\right)=\left|\frac{y_f-{y}_o}{y_o}\right|\times 100 $$

(32)

QR is defined by

$$ QR=\frac{n_q}{n}\times 100 $$

(33)

where $n_q$ is the number of qualified forecasts and n is the total number of forecasts.

If QR is more than 85%, the performances of forecasting satisfy the first level standard. If QR is more than 70% and less than 85%, the performances of forecasting satisfy the second level standard. If QR is more than 60% and less than 70%, the performances of forecasting satisfy the third level standard. Otherwise, the results of the performances of model are not feasible for peak flood discharge forecasting. Therefore, QR is an overall evaluation of prediction precision level.
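The five indices of Eqs. (28) to (33) can be computed together, as in the following sketch. The function name `evaluate` and the dictionary layout are assumptions made for illustration; the 20% limit is the qualification threshold quoted above.

```python
import numpy as np

def evaluate(obs, pred, are_limit=20.0):
    """Return RMSE, MARE (%), CC, NSEC and QR (%) for observed and forecasted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = pred - obs
    are = np.abs(err / obs) * 100.0                          # Eq. (32)
    return {
        "RMSE": float(np.sqrt(np.mean(err ** 2))),           # Eq. (28)
        "MARE": float(np.mean(are)),                         # Eq. (29)
        "CC": float(np.corrcoef(obs, pred)[0, 1]),           # Eq. (30)
        "NSEC": float(1.0 - np.sum(err ** 2)
                      / np.sum((obs - obs.mean()) ** 2)),    # Eq. (31)
        "QR": float(np.mean(are < are_limit) * 100.0),       # Eq. (33)
    }

m_demo = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

The example values are synthetic and serve only to show the call; they are not results from the study.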

5 Models Application and Discussion

Based on the methodology introduced above, the proposed model is developed and evaluated using the annual maximum flood peak discharge data from 1882 to 2004 at Yichang station; the data from 1882 to 1994 are used for training the models whilst those from 1995 to 2004 are used for validating model performance. To select appropriate predictor variables, the lag-k autocorrelation coefficient $R_k$ of the annual maximum flood peak discharge time series $q_i$, together with its upper bound $R_{1k}$ and lower bound $R_{2k}$, is calculated by Eqs. (22) to (24) at the 80% confidence level. The results for $R_k$, $R_{1k}$ and $R_{2k}$ are given in Table 1. As can be seen from Table 1, the dependence is significant at the 80% confidence level when $q_i$ is lagged by 1, 2, 3, 20, 21, 22, 25, 28 and 30 steps. Therefore, $q_{i-1}$, $q_{i-2}$, $q_{i-3}$, $q_{i-20}$, $q_{i-21}$, $q_{i-22}$, $q_{i-25}$, $q_{i-28}$ and $q_{i-30}$ are selected as predictor variables. In the modeling process, all the data series were normalized using Eq. (25); the range of the projection direction parameters a is set between −1 and 1, the rank of the Hermite polynomial r is 6, the population size N of the SSO algorithm is 50, the maximum number of generations $G_{max}$ of the SSO algorithm is 500 and the number of ridge functions M is 2. The prediction model of annual maximum peak discharge of the Yangtze River at Yichang station is thus obtained as follows:

Table 1 The autocorrelation coefficient of lag k step R_k, its upper bound R_1k and lower limit R_2k of q_i based on 80% confidence level

$$
\begin{aligned}
f\left(x_i\right) &= \sum_{m=1}^{2}\sum_{j=1}^{6} c_{mj}\, h_{ij}\left(\sum_{k=1}^{9} a_{mk}\, x_{ik}\right) \\
a_{mk} &= \begin{bmatrix}
-0.5026 & -0.2533 & 0.6548 & 0.3622 & 0.0643 & 0.1909 & -0.0611 & -0.1827 & -0.2135 \\
0.1644 & -0.4742 & -0.4068 & -0.1840 & 0.1287 & -0.0123 & 0.3641 & 0.5149 & 0.3665
\end{bmatrix} \\
c_{mj} &= \begin{bmatrix}
-0.8767 & -0.3722 & -2.3517 & 6.5304 & -1.4695 & 6.5744 \\
3480.4190 & 700.6110 & 201.9909 & -23940.9609 & -13034.4499 & -32053.4996
\end{bmatrix}
\end{aligned}
$$

(34)
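To make the structure of Eq. (34) concrete, the sketch below evaluates the double sum for one input vector: each of the two ridge functions projects the nine predictors onto a direction a_m and then combines six basis terms with the coefficients c_mj. Several points are assumptions rather than statements of the authors' implementation. The basis is taken to be the orthonormal Hermite functions, with term j using degree j; the definition given in Sect. 3.4 takes precedence if it differs. The input is assumed to be already normalized by Eq. (25), and no back-transformation of the output is attempted. The function names and the example vector are hypothetical.

```python
import math

# Projection directions a_mk (2 ridge functions x 9 predictors), from Eq. (34).
A = [
    [-0.5026, -0.2533, 0.6548, 0.3622, 0.0643, 0.1909, -0.0611, -0.1827, -0.2135],
    [0.1644, -0.4742, -0.4068, -0.1840, 0.1287, -0.0123, 0.3641, 0.5149, 0.3665],
]
# Hermite coefficients c_mj (2 ridge functions x 6 terms), from Eq. (34).
C = [
    [-0.8767, -0.3722, -2.3517, 6.5304, -1.4695, 6.5744],
    [3480.4190, 700.6110, 201.9909, -23940.9609, -13034.4499, -32053.4996],
]


def hermite_functions(z: float, max_degree: int) -> list[float]:
    """Orthonormal Hermite functions psi_0..psi_max_degree at z.

    Uses the three-term recurrence
    psi_{n+1} = sqrt(2/(n+1)) * z * psi_n - sqrt(n/(n+1)) * psi_{n-1}.
    """
    psi = [math.pi ** -0.25 * math.exp(-0.5 * z * z)]
    if max_degree >= 1:
        psi.append(math.sqrt(2.0) * z * psi[0])
    for n in range(1, max_degree):
        psi.append(
            math.sqrt(2.0 / (n + 1)) * z * psi[n]
            - math.sqrt(n / (n + 1)) * psi[n - 1]
        )
    return psi


def hermite_ppr(x: list[float], a=A, c=C) -> float:
    """Sum over ridge functions m of sum_j c[m][j] * h_j(a[m] . x)."""
    total = 0.0
    for a_m, c_m in zip(a, c):
        z = sum(a_mk * x_k for a_mk, x_k in zip(a_m, x))  # projection a_m . x
        # ASSUMPTION: term j = 1..6 uses the Hermite function of degree j.
        basis = hermite_functions(z, len(c_m))[1:]
        total += sum(c_mj * h_j for c_mj, h_j in zip(c_m, basis))
    return total


# Hypothetical normalized input vector (nine lagged discharges), illustration only.
x_example = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0, 0.2, -0.05, 0.15]
print(hermite_ppr(x_example))
```

Because the basis is passed through a single helper, swapping in a different Hermite definition only requires changing `hermite_functions`.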

To provide the same basis of comparison, the same training and testing sets are used for the MLR and three-layer feed-forward ANN models. Using the least-squares method, the MLR model is obtained as follows:

$$
\begin{aligned}
y_i ={}& 41330.98 + 0.1572\,q_{i-1} + 0.1510\,q_{i-2} - 0.2395\,q_{i-3} - 0.1689\,q_{i-20} - 0.0605\,q_{i-21} \\
& - 0.0539\,q_{i-22} + 0.0946\,q_{i-25} + 0.1555\,q_{i-28} + 0.1571\,q_{i-30}
\end{aligned}
$$

(35)
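Applying Eq. (35) amounts to looking up nine lagged values and forming a weighted sum. The sketch below is illustrative only. It assumes that the coefficients apply to discharge in its original units, which the size of the intercept suggests but the text does not state, and the function name and example history are hypothetical. Since the longest lag is 30, a series starting in 1882 gives its first forecast for 1912.

```python
# Intercept and lag coefficients taken from Eq. (35).
INTERCEPT = 41330.98
COEFFS = {
    1: 0.1572, 2: 0.1510, 3: -0.2395, 20: -0.1689, 21: -0.0605,
    22: -0.0539, 25: 0.0946, 28: 0.1555, 30: 0.1571,
}


def mlr_forecast(history: list[float]) -> float:
    """One-step forecast y_i from Eq. (35).

    `history` holds past annual peaks in chronological order, so that
    history[-k] is q_{i-k}. At least 30 values are required.
    """
    max_lag = max(COEFFS)
    if len(history) < max_lag:
        raise ValueError(f"need at least {max_lag} past values")
    return INTERCEPT + sum(b * history[-k] for k, b in COEFFS.items())


# Hypothetical constant history of 30 annual peaks, for illustration only.
print(mlr_forecast([50000.0] * 30))
```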

Four standard statistical performance measures, RMSE, MARE, CC and NSEC, are employed to evaluate the models developed, and the results are summarized in Table 2. Table 2 shows that the hermite-PPR model with hybrid SSO and LS optimization produces forecasts closer to the observations than the MLR, ANFIS and three-layer feed-forward ANN models. In the training phase, compared with the ANFIS model, ANN model and MLR method, the hermite-PPR model reduced RMSE by about 28.67%, 31.63% and 53.46%, respectively, and MARE by about 32.76%, 34.94% and 49.94%, respectively. The CC value improved by approximately 56.37%, 84.71% and 123.65%, respectively. The hermite-PPR model also obtained the best NSEC, an increase of 150.41% and 244.13% over the ANFIS model and the ANN model, respectively. In the validation phase, compared with the ANFIS model, ANN model and MLR method, the hermite-PPR model reduced RMSE by about 6.03%, 9.83% and 42.28%, respectively, and MARE by about 1.88%, 13.34% and 39.68%, respectively. The CC value improved by approximately 7.46%, 17.57% and 10.42%, respectively, and NSEC increased by 33.86% and 81.82% over the ANFIS model and the ANN model, respectively. In both training and validation, however, the MLR method yielded a negative NSEC, which indicates that the observed mean value is a better predictor than the MLR forecasts. It can therefore be concluded that the MLR method is not feasible for forecasting the annual maximum flood peak discharge at Yichang station.
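The remark about negative NSEC follows directly from its definition, as the short sketch below illustrates. It uses the standard textbook forms of RMSE, MARE and NSEC, which are assumed to match the definitions in Sect. 4. The percentage helper reflects one plausible way of computing the quoted reductions and is likewise an assumption, and the data are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mare(obs, pred):
    """Mean absolute relative error, as a fraction of the observed value."""
    return sum(abs(o - p) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def nsec(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squares about the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def reduction_percent(benchmark, model):
    """ASSUMED form of the quoted reductions: relative drop in an error measure."""
    return 100.0 * (benchmark - model) / benchmark


# Hypothetical peaks in 10^3 m^3/s, for illustration only.
obs = [50.0, 60.0, 40.0, 55.0]
mean_forecast = [sum(obs) / len(obs)] * len(obs)  # always predict the observed mean
poor_forecast = [40.0, 45.0, 60.0, 40.0]

print(nsec(obs, mean_forecast))  # predicting the mean gives NSEC = 0
print(nsec(obs, poor_forecast))  # worse than the mean, so NSEC is negative
```

A forecast whose squared error exceeds the spread of the observations about their mean yields NSEC below zero, which is the situation described for the MLR method.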

Table 2 Forecasting performance indices of different models

In order to further analyze model performance, QR, which is specified for flood peak discharge forecasting in the Chinese national standard for hydrological information and hydrological forecasting (GB/T 22482–2008), is also employed to evaluate the models developed. In addition, the percentage of forecasts whose absolute relative error falls into the other intervals, the maximum absolute relative error (%) and the minimum absolute relative error (%) are analysed, and Table 3 presents the results. As can be seen from Table 3, the QR of the MLR, ANN, ANFIS and hermite-PPR models is 61.4%, 77.1%, 78.3% and 94%, respectively, in the training phase, and 40.0%, 90%, 80% and 90%, respectively, in the validation phase. The QR of the hermite-PPR model is therefore more than 85% in both phases, which satisfies the first level standard. The QR of the ANFIS model is less than 85% in both phases, which satisfies the second level standard. The QR of the ANN model is more than 85% in the validation phase, which satisfies the first level standard, but less than 85% in the training phase, which satisfies the second level standard. The QR of the MLR method is just above 60% in the training phase, which satisfies the third level standard, but below 60% in the validation phase, which indicates that the MLR model is not feasible for flood peak discharge forecasting. Furthermore, Table 3 shows that the hermite-PPR model is also better than the ANFIS, ANN and MLR models in terms of the percentage of forecasts whose absolute relative error falls into the other intervals, the maximum absolute relative error (%) and the minimum absolute relative error (%). Figure 3 illustrates the annual maximum flood peak discharge forecasting results at Yichang station using the different models. It can be seen from Fig. 3 that the hermite-PPR model with SSO and LS algorithm matches the flood peak discharge better than the ANFIS, ANN and MLR models. The results of this analysis indicate that the hermite-PPR model obtains the best result in terms of the different evaluation measures. This illustrates that the hermite-PPR model is suitable for non-linear and nonstationary extreme flood peak analysis, that the idea of ‘projection pursuit’ is feasible, and that the hermite-PPR model can overcome the drawbacks of traditional models to generate a synergetic effect in forecasting and improve the prediction performance.
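The statistics of the kind reported in Table 3 can be derived from paired observations and forecasts, as sketched below. This is a hedged illustration: the 20% tolerance used to count a forecast as qualified is an assumption, since the threshold is not stated in this section, and the function name and sample values are hypothetical.

```python
def relative_error_summary(obs, pred, tolerance=0.20):
    """Summarise absolute relative errors |pred - obs| / |obs|.

    ASSUMPTION: a forecast is 'qualified' when its absolute relative
    error does not exceed `tolerance` (20% by default).
    """
    are = [abs(p - o) / abs(o) for o, p in zip(obs, pred)]
    n_q = sum(1 for e in are if e <= tolerance)
    return {
        "QR_percent": 100.0 * n_q / len(are),
        "max_ARE_percent": 100.0 * max(are),
        "min_ARE_percent": 100.0 * min(are),
    }


# Hypothetical observed and forecast peaks, for illustration only.
observed = [100.0, 100.0, 100.0, 100.0]
forecast = [110.0, 125.0, 95.0, 70.0]
print(relative_error_summary(observed, forecast))
```

Exposing the tolerance as a parameter makes it easy to tabulate the share of forecasts falling inside other error intervals as well.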

Table 3 The performance analysis based on different models

6 Conclusions

It is very difficult to describe the relationship between the potential predictors x and the expected value y with a predetermined function in annual maximum peak flood discharge forecasting. In this paper, a new hermite-PPR model with SSO and LS algorithm is presented for annual maximum flood peak discharge forecasting. The method utilizes the statistical properties of the data series to identify an appropriate input vector for the model, and trains the model with a novel SSO and LS algorithm. The proposed model was tested using real data sets from a large catchment of the Yangtze River in China. A typical three-layer feed-forward ANN model, ANFIS and MLR were employed as benchmarks to illustrate the forecasting capability of the proposed model. Five statistical performance evaluation measures were employed to evaluate the models. The results indicate a promising role for the new hermite-PPR model with SSO and LS algorithm in annual maximum flood peak discharge forecasting, and show that it outperforms the ANFIS, ANN and MLR models for the Yangtze River. In addition, implementing the methodology presented can lead to a degree of automation in model development. Since the proposed model is based on the information contained in the data series itself and uses clear statistical properties as decision rules, the approach is more explicit and can be adopted for practical application.

References

Blöschl G, Montanari A (2010) Climate change impacts—throwing the dice? Hydrol Process 24(3):374–381
Chau KW, Wu CL, Li YS (2005) Comparison of several flood forecasting models in Yangtze River. J Hydrol Eng 10(6):485–491
Chen XY, Chau KW (2016) A hybrid double feedforward neural network for suspended sediment load estimation. Water Resour Manag 30(7):2179–2194
Cloke HL, Pappenberger F (2009) Ensemble flood forecasting: a review. J Hydrol 375(3–4):613–626
Cuevas E, Cienfuegos M, Zaldivar D, Perez-Cisneros M (2013) A swarm optimization algorithm inspired in the behavior of the social-spider. Expert Syst Appl 40(16):6374–6384
Danandeh Mehr A, Kahya E, Olyaie E (2013) Streamflow prediction using linear genetic programming in comparison with a neuro-wavelet technique. J Hydrol 505:240–249
Dawson CW, Wilby R (1998) An artificial neural network approach to rainfall-runoff modelling. Hydrol Sci J-J Des Sciences Hydrol 43(1):47–66
Dawson CW, Harpham C, Wilby RL, Chen Y (2002) Evaluation of artificial neural network techniques for flow forecasting in the River Yangtze, China. Hydrol Earth Syst Sci 6(4):619–626
Dawson CW, Abrahart RJ, See LM (2007) HydroTest: a web-based toolbox of evaluation metrics for the standardised assessment of hydrological forecasts. Environ Model Softw 22(7):1034–1052
Friedman JH, Stuetzle W (1981) Projection pursuit regression. J Am Stat Assoc 76(376):817–823
Gemmer M, Jiang T, Su B, Kundzewicz ZW (2008) Seasonal precipitation changes in the wet season and their influence on flood/drought hazards in the Yangtze River Basin, China. Quat Int 186(1):12–21
Jang J-SR (1993) ANFIS: Adaptive-network-based fuzzy inference system. IEEE Trans Syst, Man, and Cybernet 23(3):665–685
Jeng-Neng H, Lay SR, Maechler M, Martin RD, Schimert J (1994) Regression modeling in back-propagation and projection pursuit learning. Neural Netw, IEEE Trans 5(3):342–353
Karunanithi N, Grenney WJ, Whitley D, Bovee K (1994) Neural networks for river flow prediction. J Comput Civil Eng - ASCE 8(2):201–220
Latt Z, Wittenberg H (2014) Improving flood forecasting in a developing country: a comparative study of stepwise multiple linear regression and artificial neural network. Water Resour Manag 28(8):2109–2128
Legates DR, McCabe GJ (1999) Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation. Water Resour Res 35(1):233–241
Li J, Thyer M, Lambert M, Kuczera G, Metcalfe A (2014) An efficient causative event-based approach for deriving the annual flood frequency distribution. J Hydrol 510:412–423
Magar RB, Jothiprakash V (2011) Intermittent reservoir daily-inflow prediction using lumped and distributed data multi-linear regression models. J Earth Syst Sci 120(6):1067–1084
Nash JE, Sutcliffe JV (1970) River flow forecasting through conceptual models part I — a discussion of principles. J Hydrol 10(3):282–290
Quality of nation of P. R. China and Standardization administration of P. R. China (2008) National Standard of the People’s Republic of China, Standard for hydrological information and hydrological forecasting (GB/T 22482–2008). China Water & Power Press, Beijing, pp 5–6
Tripathi R, Sengupta SK, Patra A, Chang H, Jung IW (2014) Climate change, urban development, and community perception of an extreme flood: a case study of Vernonia, Oregon, USA. Appl Geogr 46:137–146
Wang WC, Chau KW, Cheng CT, Qiu L (2009) A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J Hydrol 374(3–4):294–306
Wang WC, Xu DM, Chau KW, Chen SY (2013) Improved annual rainfall-runoff forecasting using PSO-SVM model based on EEMD. J Hydroinf 15(4):1377–1390
Wang WC, Xu DM, Chau KW, Lei GJ (2014) Assessment of river water quality based on theory of variable fuzzy sets and fuzzy binary comparison method. Water Resour Manag 28(12):4183–4200
Wang WC, Chau KW, Xu DM, Chen XY (2015) Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition. Water Resour Manag 29(8):2655–2675
Wu MC, Lin GF, Lin HY (2014) Improving the forecasts of extreme streamflow by support vector regression with the data extracted by self-organizing map. Hydrol Process 28(2):386–397
Xu DM, Wang WC, Chau KW, Cheng CT, Chen SY (2013) Comparison of three global optimization algorithms for calibration of the Xinanjiang model parameters. J Hydroinf 15(1):174–193
Yu F, Chen Z, Ren X, Yang G (2009) Analysis of historical floods on the Yangtze River, China: characteristics and explanations. Geomorphology 113(3–4):210–216
Zhang Q, Liu C, Xu CY, Xu Y, Jiang T (2006) Observed trends of annual maximum water level and streamflow during past 130 years in the Yangtze River basin, China. J Hydrol 324(1–4):255–265

Acknowledgments

This research was supported by Public welfare project fund of Ministry of water resources Central Research (201501008), Grant of Hong Kong Polytechnic University (4-ZZAD), National Natural Science Foundation of China (NO:51509088), Program for Science & Technology Innovation Talents in Universities of Henan Province (13HASTIT034), and Science and technology innovation team in Colleges and universities in Henan Province (14IRTSTHN028).

Author information

Authors and Affiliations

School of Water Conservancy, North China University of Water Resources and Electric Power, Zhengzhou, 450045, People’s Republic of China
Wen-chuan Wang, Dong-mei Xu & Lin Qiu
Department of Civil and Environmental Engineering, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, People’s Republic of China
Kwok-wing Chau
Department of Civil, Environmental and Geomatic Engineering, University College London, Gower Street, London, WC1E 6BT, UK
Can-can Liu

Corresponding author

Correspondence to Kwok-wing Chau.

About this article

Cite this article

Wang, Wc., Chau, Kw., Xu, Dm. et al. The Annual Maximum Flood Peak Discharge Forecasting Using Hermite Projection Pursuit Regression with SSO and LS Method. Water Resour Manage 31, 461–477 (2017). https://doi.org/10.1007/s11269-016-1538-9

Received: 07 January 2015
Accepted: 30 October 2016
Published: 11 November 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s11269-016-1538-9
