
1 Introduction

Hydrological systems and their underlying processes are complicated. In the real world, these systems are approximated by hydrological models, and a model is considered adequate if the difference between model prediction and measurement is small enough to satisfy a predetermined criterion.

Since the pioneering development of the rational method in the middle of the nineteenth century (Mulvaney 1850), the development of hydrological models has gone through several stages: input-output models (black box), lumped conceptual models (grey box), and physically based distributed models (white box) (Fig. 1). As the simplest models, black-box hydrological models are based on input-output relations and do not describe the underlying hydrologic processes. In many cases, however, black-box models are adequate and serve as a first step for modelers to conceptualize and simplify hydrologic systems. In fact, the other types of hydrological models, i.e., gray-box and white-box models, more or less owe their origin to black-box models as well.

Fig. 1. Black-box, gray-box, and white-box models

Compared to other types of models, black-box models require the least input data. Precipitation and temperature are frequently used as input; temperature is significant for hydrological modeling especially in climate regimes with snow. Other hydrometeorological variables can also be employed in black-box models, and some models can even be run with output data only, such as runoff records. Spatial characteristics of catchments are seldom considered, but they are useful for estimating model parameters and can aid model interpretation and analysis.

Further, black-box models are undemanding in terms of computer resources. Most of them do not require high computational capability, so a personal computer or laptop is sufficient in most cases, and many software packages and computer languages, such as Matlab or R, make programming black-box models straightforward.

This chapter describes different types of black-box hydrological models that are based on input-output relationships, including graphical antecedent precipitation index (API) models, regression models, time series models, artificial neural network (ANN) models, fuzzy logic models, and frequency analysis models. In the following sections, the main types of black-box models are discussed, with the focus placed on the definition of a model, the mathematical description and schematic representation of the elements and structures of basic model forms, and examples of applications in hydrology and water resources engineering. It should be noted that a complete discussion of the theory of hydrological systems or a complete coverage of the studies published in the literature is not a major concern of this chapter. For each type of black-box hydrological model, the basic forms are presented and a general discussion of strengths and weaknesses is also provided.

2 Antecedent Precipitation Index (API) Models

The antecedent precipitation, i.e., the amount of precipitation that has occurred prior to a single storm event, plays an important role in calculating the runoff response to rainfall, especially in catchments where runoff generation is dominated by soil water or groundwater storage, i.e., runoff generation follows the principle of the “variable source area” theory (Hewlett and Hibbert 1967). The antecedent precipitation index (API) is generally defined as the weighted summation of daily precipitation amounts that is used as an index of soil moisture, and is expressed by the following equation (Kohler and Linsley 1951):

$$ \mathrm{API}=\sum \limits_{t=-1}^{-i}{P}_t{k}^{-t} $$
(1)

where Pt is the amount of precipitation on the tth day prior to the occurrence of the storm, and k is normally a constant.

The above exponential model is based on the assumption that the greater the time lapse between a rainfall event and a given day, the less influence the rain has on the soil moisture content of that day (Saxton et al. 1967). The value of i is usually taken as 5, 7, or 14 days (Viessman and Lewis 1996; Ali et al. 2010).

2.1 Calculations of API

The decrease of API is usually assumed to follow an exponential decay. Thus, during periods of no precipitation:

$$ {\mathrm{API}}_i=k\times {\mathrm{API}}_{i-1} $$
(2)

For periods with precipitation:

$$ {\mathrm{API}}_i=k\times \left({\mathrm{API}}_{i-1}+{P}_t\right) $$
(3)

This means that if any rain occurs, it should be added to the index (Fig. 2). In areas of snowfall, precipitation is applied to the model on the days when the snow melts rather than on the days when it falls. The value of k varies with basin physical characteristics and meteorological conditions; a range of 0.85–0.90 was suggested for most of the east-central United States (Viessman and Lewis 1996), which can be used as a reference for other regions.

Fig. 2. API relation (modified from NWSRFS User Manual Documentation at http://www.nws.noaa.gov/ohd/hrl/nwsrfs/users_manual/htm/xrfsdocpdf.php)
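As a minimal illustration, the recursion in Eqs. (2) and (3) can be written in a few lines of Python; the daily precipitation record, decay constant, and initial API value below are hypothetical examples.

```python
# Minimal sketch of the API recursion (Eqs. 2 and 3); all values are hypothetical.
def api_series(precip, k=0.9, api0=0.0):
    """Return the daily API series for a daily precipitation record.

    precip : list of daily precipitation depths (mm)
    k      : decay constant (e.g., 0.85-0.90)
    api0   : API value on the day before the record starts
    """
    api = api0
    out = []
    for p in precip:
        # Eq. (2) on dry days; Eq. (3) adds the day's rain before applying the decay
        api = k * (api + p) if p > 0 else k * api
        out.append(api)
    return out

if __name__ == "__main__":
    rain = [0, 12.0, 5.0, 0, 0, 0, 8.0, 0, 0, 0]   # hypothetical record (mm)
    print([round(v, 2) for v in api_series(rain, k=0.9)])
```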

To overcome the problem that API often remains a subjectively determined and arbitrarily implemented parameter in rainfall-runoff modeling, Heggen (2001) proposed the use of a normalized antecedent precipitation index (NAPI) in place of API (Eq. (4)). NAPI is defined as the ratio of the API on the day of the storm to the product of the average daily precipitation and the weighted sum of the decay coefficients of the respective antecedent days.

$$ \mathrm{NAPI}=\frac{\sum \limits_{t=-1}^{-i}{P}_t{k}^{-t}}{\overline{P}\sum \limits_{t=-1}^{-i}{k}^{-t}} $$
(4)

where \( \overline{P} \) is the average rainfall over the antecedent days, and the other terms are as defined before. The soil moisture condition is assumed to be “dry” if NAPI < 0.33, “wet” if NAPI > 3, and “fair” in the intermediate range 0.33–3 (Hong et al. 2007).
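A corresponding sketch of Eq. (4), again with hypothetical antecedent precipitation values and decay constant:

```python
# Minimal sketch of the normalized antecedent precipitation index (Eq. 4).
# The i-day window, k, and the precipitation values are hypothetical examples.
def napi(prior_precip, k=0.9):
    """prior_precip[0] is the rain 1 day before the storm, prior_precip[1]
    the rain 2 days before, etc. (i = len(prior_precip) antecedent days)."""
    weights = [k ** d for d in range(1, len(prior_precip) + 1)]   # k^1, k^2, ..., k^i
    p_bar = sum(prior_precip) / len(prior_precip)                 # average daily precipitation
    if p_bar == 0:
        return 0.0
    return sum(p * w for p, w in zip(prior_precip, weights)) / (p_bar * sum(weights))

if __name__ == "__main__":
    value = napi([2.0, 0.0, 15.0, 4.0, 0.0], k=0.9)   # 5 antecedent days
    state = "dry" if value < 0.33 else "wet" if value > 3 else "fair"
    print(round(value, 2), state)
```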

2.2 Graphical API Models

During the 1950s–1960s, scientists were seeking techniques which would (1) simplify the relationships between rainfall and runoff, (2) require less time for calculation and forecasting, especially when computers were not yet widely available, and (3) not require information on soil and surface characteristics, vegetation differences, and land use, which was usually unavailable.

Because of the importance of the antecedent soil moisture condition to runoff generation, many indices have been used to estimate the moisture condition, such as (1) days since last rain, (2) discharge at the beginning of the storm, and (3) antecedent precipitation. API, a rough representation of the initial soil-moisture condition, generally provides better results than the other two indices and can also be easily determined. Based on API, Kohler and Linsley (1951) developed a relationship between storm runoff and precipitation by a graphical method of coaxial relations (Fig. 3). The graphical method consists of three three-variable relations, relating storm runoff as the dependent variable to the antecedent precipitation index (API), date (week number), rainfall amount, and rainfall duration as independent variables.

Fig. 3. Coaxial relation – antecedent precipitation index (modified from Sittner et al., 1969)

2.3 Summary

Antecedent soil moisture condition is important for watershed modeling that ultimately provides information on flood forecasting, water resources management, hydroelectric power generation, and irrigation management. Because the observed soil moisture data at a larger scale are usually not available, antecedent precipitation index, API, a rough representation of the initial soil-moisture condition, has been widely used in different hydrological modeling studies since the 1950s and the studies have generally shown that the use of API has potential to provide satisfactory results.

One typical example of using API as an important part of hydrological modeling is the model developed by the U.S. Soil Conservation Service (SCS) (1972), which incorporates initial losses or abstractions into a coefficient as a function of what is referred to as the curve number, CN. The curve number is a function of land use, soil type, the hydrologic condition of the basin, and the antecedent moisture condition (AMC), which is generally equivalent to the concept of the antecedent precipitation index (API).

Other applications of API can be found in more complex rainfall-runoff models for storm runoff simulation (Saxton et al. 1967; Sittner et al. 1969; Fedora and Beschta 1989; Ali et al. 2010; Rajurkar et al. 2004; Dawson and Abrahart 2007), and in other models for landslide studies (Glade et al. 2000; Ma et al. 2014), global runoff simulation (Hong et al. 2007), and the calculation of Forest Fire Danger Index (FFDI) (Liu et al. 2003), etc.

3 Regression Models

Regression analysis (the term “regression” was first used by Pearson 1908) is commonly used to describe quantitative relationships between a response variable and one or more explanatory variables. In hydrology, regression models are useful tools for detecting relations between runoff and precipitation for the same watershed, between runoff (or precipitation) in different watersheds, between crop growth and precipitation, and so on.

An analytical problem to be solved by regression analysis involves (Riggs 1985): (1) selection of factors which are expected to influence the dependent variable; (2) describing these factors quantitatively; (3) selection of the regression model; (4) computing the regression equation, the standard error of estimate, and the significance of the regression coefficients; and (5) evaluation of results.

3.1 Simple Linear Regression

Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable (dependent variable) can be predicted from others (independent variables).

The simplest statistical model is the simple linear regression model:

$$ {Y}_i=a+{bX}_i+{e}_i\qquad i=1,\dots, n $$
(5)
$$ {\widehat{Y}}_i=a+{bX}_i $$
(6)

where:

  • ei is the error term or residual of the regression line; e1, …, en are unobservable random variables, usually assumed to be independent and normally distributed with mean zero and an unknown constant standard deviation, σ;

  • Xi and Yi are the observed independent and dependent variables, respectively;

  • \( {\widehat{Y}}_i \) are the values estimated from the regression line;

  • a and b are regression coefficients, where b is called the slope of the line and a is the y-intercept. The slope measures the amount Y increases/decreases when X increases/decreases by one unit. The y-intercept is the value of Y when X = 0.

3.1.1 Parameter Estimation

The goal is to find the equation of the straight line

$$ {\widehat{Y}}_i=a+{bX}_i $$

which would provide a “best” fit for the data points. That is to say, simple linear regression fits a straight line through the set of n points in such a way that makes the sum of squared residuals of the model as small as possible. In statistics, simple linear regression is the least squares estimator of a linear regression model. In other words, a (the y-intercept) and b (the slope) are solved by the following minimization problem:

$$ \underset{a,b}{\min}\sum \limits_{i=1}^n{\widehat{e}}_i^2=\sum \limits_{i=1}^n{\left({Y}_i-a-{bX}_i\right)}^2 $$
(7)

It can be shown that the values of a and b that minimize the objective function (7) are

$$ b=\frac{\sum \limits_{i=1}^n{X}_i{Y}_i-n\overline{X}\ \overline{Y}}{\sum \limits_{i=1}^n{X}_i^2-n{\overline{X}}^2}=\frac{\frac{1}{n-1}\sum \limits_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\frac{1}{n-1}\sum \limits_{i=1}^n{\left({X}_i-\overline{X}\right)}^2}=\frac{\operatorname{cov}\left(X,Y\right)}{\operatorname{var}(X)} $$
(8)
$$ a=\overline{Y}-b\overline{X} $$
(9)
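For illustration, Eqs. (8) and (9) can be evaluated directly with NumPy; the rainfall-runoff pairs below are hypothetical.

```python
# Least-squares estimates of the slope and intercept (Eqs. 8 and 9),
# sketched with NumPy; the rainfall-runoff pairs are hypothetical.
import numpy as np

x = np.array([10., 25., 40., 55., 70., 90.])   # e.g., storm rainfall (mm)
y = np.array([2., 8., 15., 21., 30., 41.])     # e.g., storm runoff (mm)

b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # b = cov(X, Y) / var(X)
a = y.mean() - b * x.mean()                           # a = Ybar - b * Xbar
y_hat = a + b * x                                     # fitted regression line

print(f"a = {a:.3f}, b = {b:.3f}")
```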

3.1.2 Model Evaluation

Coefficient of determination: After fitting a line to the data-points, we want to know how much of the variability in the dependent variable (Y) is explained by the regression. For this, the coefficient of determination (R2) is often used and is expressed as:

$$ {R}^2=\frac{\mathrm{Explained}\ \mathrm{variance}}{\mathrm{Total}\ \mathrm{variance}} $$

The variability in the dependent variable Y is quantified as a sum of squares:

  • \( \sum {\left({Y}_i-\overline{Y}\right)}^2 \) = total sum of squares corrected for the mean = total variance

  • \( \sum {\left({\widehat{Y}}_i-\overline{Y}\right)}^2 \) = the sum of squared deviations of the predicted values from the mean = variance explained by the regression line

  • \( \sum {\left({Y}_i-{\widehat{Y}}_i\right)}^2 \) = the sum of squares of deviation from the regression = unexplained variance

The most general definition of the coefficient of determination is (Haan, 2002),

$$ {R}^2=\frac{\sum {\left({\widehat{Y}}_i-\overline{Y}\right)}^2}{\sum {\left({Y}_i-\overline{Y}\right)}^2} $$
(10)

R2 ranges from 0 to 1, and it is normally expressed as a percentage.

Standard error of estimate (S): The standard error of estimate S measures the variability, or dispersion, of the observations around the regression line, i.e., how much the observed values of the dependent variable deviate from the values predicted by the regression:

$$ S=\sqrt{\frac{\mathrm{SSE}}{n-2}}=\sqrt{\frac{\sum \limits_{i=1}^n{\left({y}_i-{\widehat{y}}_i\right)}^2}{n-2}} $$
(11)

where SSE is the residual sum of squares or the sum of squares due to error. This S allows us to generate the confidence interval on the regression line as well as on regression coefficients.
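A small sketch of Eqs. (10) and (11), assuming NumPy arrays of observed and fitted values such as those produced by the previous sketch:

```python
# Coefficient of determination (Eq. 10) and standard error of estimate (Eq. 11);
# y and y_hat are NumPy arrays of observed and fitted values.
import numpy as np

def r_squared(y, y_hat):
    # explained variance / total variance
    return np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)

def std_error_of_estimate(y, y_hat):
    sse = np.sum((y - y_hat) ** 2)       # residual sum of squares (SSE)
    return np.sqrt(sse / (len(y) - 2))   # two parameters (a and b) were estimated
```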

Standard error (deviation) for parameters a and b: In regression analysis, the standard errors of the least squares estimators of a (Sa) and b (Sb) are estimated by

$$ {S}_a=S\sqrt{\frac{1}{n}+\frac{{\overline{x}}^2}{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2}} $$
(12)
$$ {S}_b=S\sqrt{\frac{1}{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2}} $$
(13)

3.1.3 Confidence Intervals

Confidence intervals are important for testing the statistical significance of the regression coefficient as well as the regression line.

A 100(1-α)% confidence interval for a and b is:

$$ \left(\widehat{a}-{t}_{\alpha /2}{S}_a,\, \widehat{a}+{t}_{\alpha /2}{S}_a\right) $$
$$ \left(\widehat{b}-{t}_{\alpha /2}{S}_b,\, \widehat{b}+{t}_{\alpha /2}{S}_b\right) $$

where α is the significance level, and tα/2 is the critical value of the t-distribution with n − 2 degrees of freedom (d.f.).

A 100(1-α)% confidence interval on the regression line is:

$$ \left({\widehat{y}}_k\pm {t}_{\alpha /2}\,S\sqrt{\frac{1}{n}+\frac{{\left({x}_k-\overline{x}\right)}^2}{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2}}\right) $$

where \( {\widehat{y}}_k \) is the predicted mean value of y at xk; the confidence intervals are narrowest at \( {x}_k=\overline{x} \) and widen as xk deviates from \( \overline{x} \), as can be seen from Fig. 4.

Fig. 4. A typical plot of simple regression with the 95% confidence intervals and 95% prediction intervals (modified from Haan, 2002)

A 100(1-α)% confidence interval on the individual points is:

$$ \left({\widehat{y}}_k\pm {t}_{\alpha /2}\,S\sqrt{1+\frac{1}{n}+\frac{{\left({x}_k-\overline{x}\right)}^2}{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2}}\right) $$

where the symbols are the same as above.

3.1.4 Significance of Coefficients

The statistical significance of the parameters a and b being equal to or larger/smaller than a given value (including zero) can be tested with the t-distribution against a one- or two-sided alternative, depending on the nature of the relation that is anticipated. For example, if we want to test whether a is significantly different from zero, the null hypothesis is H0: a = 0, and the test statistic is \( t=\frac{a-0}{s_a} \). H0 is rejected if \( \left|t\right|\ge {t}_{1-\alpha /2,\,n-2} \).

Similarly, to test whether b is significantly different from zero, the null hypothesis is H0: b = 0, and the test statistic is \( t=\frac{b-0}{s_b} \). H0 is rejected if \( \left|t\right|\ge {t}_{1-\alpha /2,\,n-2} \).
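A sketch of this t-test for the slope, using SciPy's t-distribution for the critical value; the data and significance level passed to the function are whatever the analyst supplies.

```python
# t-test for the slope (H0: b = 0), using Eqs. (11) and (13) and scipy's t-distribution.
import numpy as np
from scipy import stats

def slope_t_test(x, y, alpha=0.05):
    n = len(x)
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)       # Eq. (8)
    a = y.mean() - b * x.mean()                               # Eq. (9)
    resid = y - (a + b * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))                 # Eq. (11)
    s_b = s / np.sqrt(np.sum((x - x.mean()) ** 2))            # Eq. (13)
    t = b / s_b
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t, t_crit, abs(t) >= t_crit                        # True means reject H0
```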

3.2 Multiple Linear Regression Analysis

The general purpose of multiple linear regression is to learn more about the relationship of a dependent or criterion variable to several independent or predictor variables.

In general, a multiple regression procedure estimates a linear equation of the form:

$$ {Y}_i=a+{b}_1{x}_{i1}+{b}_2{x}_{i2}+\cdots +{b}_p{x}_{ip}+{e}_i\qquad i=1,\dots, n $$

where xi1 ,  …  , xip are the values of the input variables for the ith experimental run and Yi is the corresponding response. The error terms ei are usually assumed to be independent and normally distributed with mean zero and constant variance σ2. The unknown parameters are a, the bi, and σ2 (or σ).

3.2.1 Parameter Estimation

As in the simple linear regression, the first step in the multiple regression analysis is to obtain the least squares estimates of parameters a and bi that minimize

$$ \sum {\left({y}_i-a-{x}_{i1}{b}_1-\cdots -{x}_{ip}{b}_p\right)}^2 $$

In practice, n observations are available on the dependent variable y and the independent variables x1 to xp. The p + 1 unknown regression parameters (a and b1, …, bp) are estimated from these n observations. Thus, n must be at least p + 1, and in practice, n should be at least 3 or 4 times as large as p.

Most of least squares analyses of multiple linear regression models are carried out with the aid of a computer.

3.2.2 Evaluation of Multiple Regression Model

Similar to simple linear regression, R2, the coefficient of multiple determination or multiple coefficient of determination is computed and used to evaluate how good the multiple regression is. The multiple coefficient of determination is defined as

$$ {R}^2=\frac{\mathrm{Sum}\ \mathrm{of}\ \mathrm{squares}\ \mathrm{due}\ \mathrm{to}\ \mathrm{regression}}{\mathrm{Sum}\ \mathrm{of}\ \mathrm{squares}\ \mathrm{corrected}\ \mathrm{for}\ \mathrm{the}\ \mathrm{mean}}=\frac{\mathrm{Regression}\ \mathrm{SS}}{\mathrm{Total}\ \mathrm{SS}}=\frac{\sum \limits_{i=1}^n{\left({\widehat{y}}_i-\overline{y}\right)}^2}{\sum \limits_{i=1}^n{\left({y}_i-\overline{y}\right)}^2} $$
(14)

where yi are the observed values of the dependent variable, \( \overline{y} \) is their mean, and \( {\widehat{y}}_i \) are the fitted values.

Two factors cause R2 to overestimate the variance accounted for, compared to an estimate that would be obtained from the population: a large number of predictors and a small sample size. Therefore, the calculated R2 values usually need to be adjusted using Eq. (15). With a large sample and few predictors, the adjusted R2 is very similar to the unadjusted R2.

$$ {R}_{\mathrm{adjusted}}^2=1-\left(1-{R}^2\right)\left(\frac{n-1}{n-k-1}\right) $$
(15)

where n is the number of data, and k is the number of independent variables used in the regression.
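Equation (15) translates directly into a one-line helper, for example:

```python
# Adjusted coefficient of determination (Eq. 15).
def adjusted_r2(r2, n, k):
    """r2: multiple R-squared; n: number of observations; k: number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)
```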

3.2.3 Stepwise Multiple Regression

In order to minimize the problem of over-parameterization, the independent variables need to be carefully chosen. The principle is that, on one hand, all variables that are relevant for theoretical or other reasons should be included and, on the other hand, as few independent variables as possible should be included (principle of parsimony), because the more variables there are, the greater the uncertainty, the larger the type II error, and the fewer the degrees of freedom.

In statistics, stepwise regression includes regression models in which the choice of predictive variables is carried out by an automatic procedure. The decision as to which variable should be included is determined by performing a sequence of F-tests or t-tests, but other techniques are also possible.

The main approaches are as follows:

Forward selection: In the first stage, the independent variable best correlated with the dependent variable is included in the equation. In the second stage, the remaining independent variable with the highest partial correlation with the dependent variable, after the effect of the first variable has been removed, is entered. The improvement of the R2 value at each step is checked by the F-test (Haan 2002).

$$ {F}_c=\frac{\left(1-{R}_{k-1}^2\right)\cdot \left(n-k-1\right)}{\left(1-{R}_k^2\right)\cdot \left(n-k-2\right)} $$
(16)

where \( {R}_{k-1}^2\ \mathrm{and}\ {R}_k^2 \) are the coefficients of determination with k − 1 and k independent variables, respectively; n is the number of data; and k is the number of independent variables. If \( {F}_c>{F}_{1-\alpha,\,n-k-1,\,n-k-2} \), the addition of the new variable is significant.

We continue until no variables "significantly" explain the residual variation. At each step, the increment of R2 is tested by the F-test.

Backward elimination: This approach starts with all candidate variables and eliminates independent variables one at a time until removing a further variable would cause a significant decrease in R2.

Multiple linear regression models are probably one of the most commonly used methods for hydrologic forecasting. However, various other types of statistically based regression models, e.g., nonlinear regression, principal component regression, partial least squares regression, are also used in hydrological studies (Eldaw et al. 2003; Sharda et al. 2008; Adamowski et al. 2012; Yasar et al. 2012).

3.3 Summary

Simple and multiple regression techniques are widely used in hydrology. Many applications of regression models can be found in the following categories: (1) hydrological forecasting, including streamflow forecasting (Yu and Liong 2007), rainfall prediction (Makarau and Jury 1997; Francis and Renwick 1998), water demand forecasting (Billings and Agthe 1998; Polebitski and Palmer 2009), etc.; (2) transferring information on hydrological behavior to ungauged catchments (Vogel and Kroll 1990; Pandey and Nguyen 1999); (3) infilling missing values of hydrologic variables, such as runoff, precipitation, temperature, soil moisture, etc. (Beauchamp 1989; Eischeid et al. 2000; Dumedah and Coulibaly 2011); and (4) statistical downscaling, where regression-based schemes, one of the general categories of downscaling methods, are used to establish relationships between large-scale and local-scale climatic fields (Hewitson and Crane 1996; Wilby et al. 1999; Charles et al. 2007).

In regression models, it should be noted that the regression equation does not imply a cause-and-effect relationship between the dependent and independent variables; both may be influenced by other factors that are not readily measured. However, there should be some physical tie between the variables if the results are to be considered meaningful. Thus, there should be a physically plausible argument for selecting the explanatory variables used to estimate the dependent variable.

Like many other statistical procedures, the regression analysis method described above is built under the assumption that the data are normally distributed, but the types of data used in hydrology commonly are not normally distributed, and some have no probability distribution at all. Hydrologists must select procedures most nearly suitable to the characteristics of data and must interpret the results accordingly.

4 Time Series Models

This section deals with basic time series models, which have become a major tool in hydrology in the era of information technology. In hydrology, time series models are usually used for building mathematical models to generate synthetic hydrologic records, to forecast hydrologic events, to detect trends and shifts in hydrologic records, and to fill in missing data and extend records (Salas et al. 1980, Salas 1993; Haan 2002).

A time series is a series of observations of a variable in the course of time, where time is discretized to a series of time points or moments. A complete observed time series, y(t), can be decomposed into a number of components as expressed by:

$$ y(t)={y}_1(t)+{y}_2(t)+{y}_3(t)+{y}_4(t) $$
(17)

where y1(t) is the trend component, y2(t) is the periodic component, y3(t) is the catastrophic event, and y4(t) is the random stochastic component. The first two terms are deterministic and can be identified and quantified fairly easily; the last two are stochastic and cannot easily be identified and quantified.

4.1 Types of Hydrologic Time Series

Stationary and nonstationary series: If the statistics of the sample (mean, variance, covariance, autocorrelation, etc.) do not change with time or length of the sample, then the time series is said to be stationary to the second-order moment, weakly stationary, or stationary in a broad sense. Otherwise, it is a nonstationary series, i.e., if a definite trend is discernible in the series or there is periodicity in a series, then the time series is nonstationary.

Generally, annual hydrologic time series are considered to be stationary, although this assumption may not be strictly correct due to large-scale climate changes and human activities. On the other hand, hydrologic time series defined at time scales smaller than a year, such as monthly and daily series, are typically nonstationary.

White noise series: For a stationary time series, if the process is purely random and stochastically independent, the time series is called a white noise series. It is the simplest example of a stochastic process. Such processes contain no memory by construction, that is, for every t, element Xt is independent of every other element in the process.

Gaussian time series: A Gaussian random process is a process (not necessarily stationary) of which all random variables are normally distributed, and of which all simultaneous distributions of random variables of the process are normal. When a Gaussian random process is weakly stationary, it is also strictly stationary, since the normal distribution is completely characterized by its first- and second-order moments.

4.2 Time Series Models for Stationary Data

A time series model is an empirical model for stochastically simulating and forecasting the behavior of uncertain hydrologic systems. Time series models include stochastic models for a purely random time series with known distribution, for stationary time series, e.g., autoregressive (AR) models, moving average (MA) models, autoregressive moving average (ARMA) models, as well as for nonstationary time series, e.g., autoregressive integrated moving average (ARIMA) models, Thomas-Fiering model, etc.

4.2.1 Time Series Models for Purely Random Series with Known Probability Distribution

Possibly, the simplest stochastic process to model is where the events can be assumed to occur at discrete times with the time between events constant, the events at any time are independent of the events at any other time, and the probability distribution of the event is known. Stochastic generation from a model of this type merely amounts to generating a sample of random observations from a univariate probability distribution.

Example: If Xt is a white noise series and normally distributed, i.e., \( {X}_t\sim N\left({\mu}_x,{\sigma}_x^2\right) \), then the model can be

$$ {X}_t={\mu}_x+{\sigma}_xZ $$
(18)

where μx and σx are the mean and standard deviation of the Xt series, and Z ~ N(0, 1) is a random series having standard normal distribution, which can be generated by Monte Carlo simulation.
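A minimal sketch of Eq. (18) with NumPy's random number generator; the mean, standard deviation, and series length are hypothetical.

```python
# Generating a purely random (white noise) series from Eq. (18) with NumPy.
import numpy as np

rng = np.random.default_rng(seed=42)
mu_x, sigma_x, n = 50.0, 12.0, 100      # hypothetical series statistics and length
z = rng.standard_normal(n)              # Z ~ N(0, 1)
x = mu_x + sigma_x * z                  # Eq. (18)
```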

4.2.2 Autoregressive Models

The autoregressive models are used to model stationary time series when persistence (memory) is present. The general form of a pth-order autoregressive model, also called Markov type model, AR(p), is

$$ {\displaystyle \begin{array}{ll}{y}_t& =\mu +{\beta}_1\left({y}_{t-1}-\mu \right)+{\beta}_2\left({y}_{t-2}-\mu \right)+\cdots +{\beta}_p\left({y}_{t-p}-\mu \right)+{\varepsilon}_t\\ {}& =\mu +\sum \limits_{i=1}^p{\beta}_i\left({y}_{t-i}-\mu \right)+{\varepsilon}_t\end{array}} $$
(19)

where μ is the mean value of the series, p is the order of the AR model, written as AR(p), βi are the regression coefficients, and εt is the noise or prediction error, normally assumed to be \( N\left(0,{\sigma}_{\varepsilon}^2\right) \). There are p+2 parameters to be estimated: β1 , β2 ,  …  , βp, μ, and \( {\sigma}_{\varepsilon}^2 \), the variance of the residuals.

The most frequently encountered AR processes are of first or second order; the AR(0) process is white noise.

The equation for the first-order autoregressive model is:

$$ {y}_t=\mu +{\beta}_1\left({y}_{t-1}-\mu \right)+{\varepsilon}_t $$
(20)

Parameters β1, μ, and σε of the model are estimated using the Yule-Walker equations. The relation between regression coefficient β and autocorrelation coefficient ρ is written as:

$$ {\rho}_k=\sum \limits_{j=1}^p{\beta}_j{\rho}_{k-j} $$
(21)

where ρk is the autocorrelation coefficient. The parameters are estimated as:

$$ {\widehat{\beta}}_1={\rho}_1;{\widehat{\sigma}}_{\varepsilon}^2={\sigma}_y^2\left(1-{\beta}_1^2\right);\widehat{\mu}=\overline{y}=\frac{1}{n}\sum \limits_{i=1}^n{y}_i $$
(22)

The procedure for generating a series of values for yt using AR(1) model is:

  • Estimate μy, σy, and β1 by \( \overline{y} \), sy, and r1, respectively, and take \( {\sigma}_{\varepsilon}^2={\sigma}_y^2\left(1-{\beta}_1^2\right) \)

  • Select a zt at random from an N(0, 1) distribution

  • Select an initial value for yt − 1

  • Calculate yt based on \( \overline{y} \), sy, and β1 , and yt − 1 by

$$ {y}_t={\mu}_y+{\beta}_1\left({y}_{t-1}-{\mu}_y\right)+{z}_t{\sigma}_y\sqrt{\left(1-{\beta}_1^2\right)} $$
(23)
  • Delete the first 50 values to get rid of the influence of the initial values
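The procedure above can be sketched as follows; the mean, standard deviation, and lag-1 autocorrelation are hypothetical, and the first 50 generated values are discarded as recommended.

```python
# Synthetic generation with the AR(1) model (Eq. 23); parameter values are hypothetical.
import numpy as np

def generate_ar1(mu_y, sigma_y, beta1, n, warmup=50, seed=0):
    rng = np.random.default_rng(seed)
    y_prev = mu_y                                    # initial value
    series = []
    for _ in range(n + warmup):
        z = rng.standard_normal()                    # z_t ~ N(0, 1)
        y = mu_y + beta1 * (y_prev - mu_y) + z * sigma_y * np.sqrt(1 - beta1 ** 2)
        series.append(y)
        y_prev = y
    return np.array(series[warmup:])                 # discard the first 50 values

annual_flow = generate_ar1(mu_y=100.0, sigma_y=30.0, beta1=0.4, n=200)
```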

4.2.3 Moving-Average Models

The moving-average model of order q process, denoted by MA(q), is formulated as follows:

$$ {y}_t={\mu}_y+{\varepsilon}_t+{\theta}_1{\varepsilon}_{t-1}+{\theta}_2{\varepsilon}_{t-2}+\cdots +{\theta}_q{\varepsilon}_{t-q} $$
(24)

where εt is a white noise with \( {\varepsilon}_t\sim N\left(0,{\sigma}_{\varepsilon}^2\right) \); θi are parameters of order q, i.e., parameter θk = 0 for k > q.

In the above equation, εt is a white noise (purely random) series assumed to be normally distributed with zero mean and constant standard deviation. That is, a moving-average model is conceptually a linear regression of the current value of the series against the previous (unobserved) white noise error terms or random shocks.

4.2.4 ARMA Models

Autoregressive moving-average (ARMA) models, sometimes called Box-Jenkins models (Box and Jenkins 1976), consist of two parts, an autoregressive (AR) part and a moving-average (MA) part. The model is usually then referred to as the ARMA(p, q) model, where p is the order of the autoregressive part and q is the order of the moving-average part.

In this case, xt is a mixed process where the output is a function of past outputs and current/past inputs

$$ {x}_t=c+\sum \limits_{i=1}^p{\beta}_i{x}_{t-i}+{\varepsilon}_t+\sum \limits_{j=1}^q{\theta}_j{\varepsilon}_{t-j} $$
(25)

All notations have the same meaning as before. The error terms εt are generally assumed to be independent identically distributed random variables (i.i.d.) sampled from a normal distribution with zero mean: \( {\varepsilon}_t\sim N\left(0,{\sigma}_{\varepsilon}^2\right) \), where \( {\sigma}_{\varepsilon}^2 \) is the variance of the error.

There are p+q+2 parameters (βi, i = 1,  … , p; θi, i = 1,  … , q; c; σε). Some formulations transform the series by subtracting the mean of the series from each data point. This yields a series with a mean of zero. Whether one needs to do this or not is dependent on the software one uses to estimate the model.

In practice, the ARMA(1,1) model is often used:

$$ {x}_t={\beta}_1{x}_{t-1}+{\varepsilon}_t+{\theta}_1{\varepsilon}_{t-1} $$
(26)

Parameters \( {\beta}_1,{\theta}_1,\mathrm{and}\ {\sigma}_{\varepsilon}^2 \) can be estimated by solving the following equations:

$$ {\rho}_1=\frac{\left({\beta}_1-{\theta}_1\right)\left(1-{\theta}_1{\beta}_1\right)}{1+{\theta}_1^2-2{\beta}_1{\theta}_1} $$
(27)
$$ {\rho}_2={\beta}_1{\rho}_1 $$
(28)
$$ {\sigma}_{\varepsilon}^2=\frac{\left(1-{\beta}_1^2\right){\sigma}_x^2}{1-2{\beta}_1{\theta}_1+{\theta}_1^2} $$
(29)
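For illustration, the ARMA(1,1) process of Eq. (26) can be simulated once its parameters are known (or assumed); the parameter values below are hypothetical, and the series is generated about a zero mean.

```python
# Simulating the ARMA(1,1) process of Eq. (26) with NumPy; parameters are hypothetical.
import numpy as np

def simulate_arma11(beta1, theta1, sigma_e, n, warmup=50, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma_e, n + warmup)
    x = np.zeros(n + warmup)
    for t in range(1, n + warmup):
        x[t] = beta1 * x[t - 1] + eps[t] + theta1 * eps[t - 1]   # Eq. (26)
    return x[warmup:]                                            # drop spin-up values

x = simulate_arma11(beta1=0.5, theta1=0.3, sigma_e=1.0, n=500)
```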

4.3 Nonstationary Time Series Models

4.3.1 ARIMA Models

The acronym ARIMA stands for “autoregressive integrated moving average.” The ARMA models are suitable for data with two basic characteristics: (1) no apparent deviation from the stationarity assumption and (2) a rapidly decreasing autocorrelation function. If these conditions are not met by a time series, a proper transformation should be performed to generate a time series that satisfies them. This is usually achieved by differencing, which is the essence of ARIMA models (Karamouz et al. 2012). It should be noted that the ARIMA models are nonstationary and cannot be used for synthetic generation of stationary time series, but they can be used for forecasting (Salas et al. 1980; Weeks and Boughton 1987).

4.3.2 First-Order Markov Process with Periodicity: Thomas-Fiering Model

The first-order Markov model of the previous section assumes that the process is stationary in its first three moments. It is possible to generalize the model so that the periodicity in hydrologic data is accounted for to some extent. The main application of this generalization has been in generating monthly streamflow where pronounced seasonality in monthly flows exists. In its simplest form, the method consists of the use of 12 linear regression equations. If, say, 30 years of records are available, the thirty January flows and the thirty December flows are abstracted and the January flow is regressed upon the December flow; similarly, the February flow is regressed upon the January flow, and so on for each month of the year.

$$ {\displaystyle \begin{array}{l}{q}_{\mathrm{Jan}}={\overline{q}}_{\mathrm{Jan}}+{b}_{\mathrm{Jan}}\left({q}_{\mathrm{Dec}}-{\overline{q}}_{\mathrm{Dec}}\right)+{\varepsilon}_{\mathrm{Jan}}\\ {}{q}_{\mathrm{Feb}}={\overline{q}}_{\mathrm{Feb}}+{b}_{\mathrm{Feb}}\left({q}_{\mathrm{Jan}}-{\overline{q}}_{\mathrm{Jan}}\right)+{\varepsilon}_{\mathrm{Feb}}\\ {}\dots \dots \end{array}} $$

Fig. 5 shows a regression analysis of qj + 1 on qj, pairs of successive monthly flows for the months (j+1) and j over the years of record, where j = 1, 2, 3, …, 12 (January, February, …, December) and when j = 12, j+1= January of next year (there would be 12 such regressions). If the regression coefficient of month j+1 on j is bj, then the regression line values of a monthly flow, \( {\widehat{q}}_{j+1} \), can be determined from the previous month’s flow qj, by the equation:

$$ {\widehat{q}}_{j+1}={\overline{q}}_{j+1}+{b}_j\left({q}_j-{\overline{q}}_j\right) $$
Fig. 5. Regression analysis of qj + 1 on qj

To account for the scatter of the plotted points about the regression line, which reflects the variance of the measured data, a further random component is added:

$$ {\varepsilon}_j=Z\cdot {S}_{j+1}\sqrt{\left(1-{r}_j^2\right)} $$

where sj + 1 is the standard deviation of flows in month j+1, rj is the correlation coefficient between flows in months j+1 and j throughout the record, and Z ~ N(0, 1) is a normally distributed random deviate with zero mean and unit standard deviation. The general form may be written as

$$ {\widehat{q}}_{j+1,i}={\overline{q}}_{j+1}+{b}_j\left({q}_{j,i-1}-{\overline{q}}_j\right)+{Z}_{j+1,i}\cdot {S}_{j+1}\sqrt{\left(1-{r}_j^2\right)} $$
(30)

where bj = rj × sj + 1/sj. There are 36 parameters for the monthly model (\( \overline{q} \), r, and s for each month). Subscript j refers to a month. For monthly synthesis, j varies from 1 to 12 throughout the year. Subscript i is a serial designation from year 1 to year n. Other symbols are the same as mentioned earlier.

The procedure for using the model is as follows:

  1. For each month, j = 1, 2, …, 12, calculate

    (a) Mean flow \( {\overline{q}}_j=\frac{1}{n}\sum \limits_i{q}_{j,i};\left(\ i=j,12+j,24+j,\dots \right) \)

    (b) Standard deviation \( {S}_j=\sqrt{\frac{\sum \limits_i{\left({q}_{j,i}-{\overline{q}}_j\right)}^2}{n-1}} \)

    (c) The correlation coefficient with flow in the preceding month,

$$ {r}_j=\frac{\sum \limits_i\left({q}_{j,i}-{\overline{q}}_j\right)\left({q}_{j+1,i}-{\overline{q}}_{j+1}\right)}{\sqrt{\sum \limits_i{\left({q}_{j,i}-{\overline{q}}_j\right)}^2\sum \limits_i{\left({q}_{j+1,i}-{\overline{q}}_{j+1}\right)}^2}} $$

    (d) The slope of the regression equation relating the month’s flow to flow in the preceding month:

$$ {b}_j={r}_j\frac{S_{j+1}}{S_j} $$

  2. The model is then the set of 12 regression equations of the form of Eq. (30)

$$ {\widehat{q}}_{j+1,i}={\overline{q}}_{j+1}+{b}_j\left({q}_{j,i}-{\overline{q}}_j\right)+{Z}_{j+1,i}\cdot {s}_{j+1}\sqrt{\left(1-{r}_j^2\right)} $$

where Z is a random normal deviate N(0, 1)

  3. To generate a synthetic flow sequence, generate a random number sequence {Z1, Z2, …} and substitute it into the model
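A compact sketch of this procedure is given below; the monthly means, standard deviations, and lag-1 correlations are hypothetical and would in practice be estimated from the historical record (step 1).

```python
# Sketch of the Thomas-Fiering monthly generation scheme (Eq. 30); statistics are hypothetical.
import numpy as np

def thomas_fiering(q_bar, s, r, n_years, q_start=None, seed=0):
    """q_bar, s, r: arrays of 12 monthly means, standard deviations, and
    correlations between months j and j+1 (r[j] links month j to month j+1)."""
    rng = np.random.default_rng(seed)
    q = q_start if q_start is not None else q_bar[-1]   # flow of the preceding December
    flows = []
    j = 11                                              # index of the preceding month
    for _ in range(12 * n_years):
        nxt = (j + 1) % 12
        b = r[j] * s[nxt] / s[j]                        # b_j = r_j * s_{j+1} / s_j
        z = rng.standard_normal()
        q = q_bar[nxt] + b * (q - q_bar[j]) + z * s[nxt] * np.sqrt(1 - r[j] ** 2)
        flows.append(q)
        j = nxt
    return np.array(flows)

q_bar = np.array([30, 35, 50, 80, 120, 90, 60, 45, 40, 35, 32, 30], float)
s     = np.array([8, 9, 12, 20, 30, 25, 15, 10, 9, 8, 8, 7], float)
r     = np.full(12, 0.6)                                # hypothetical lag-1 correlations
synthetic = thomas_fiering(q_bar, s, r, n_years=20)
```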

4.3.3 ARMAX Model

Autoregressive moving-average models with exogenous inputs are denoted by ARMAX(p, q, b), a model with p autoregressive terms (AR(p)), q moving-average terms (MA(q)), and b exogenous input terms given by a linear combination of the last b terms of a known external time series dt (Bailite 1980). The model formulation is as follows:

$$ {y}_t={\varepsilon}_t+\sum \limits_{i=1}^p{\beta}_i{y}_{t-i}+\sum \limits_{i=1}^q{\theta}_i{\varepsilon}_{t-i}+\sum \limits_{i=1}^b{\eta}_i{d}_{t-i} $$
(31)

where parameters η1 ,  …  , ηb are related to the selected exogenous input. These models can be successfully utilized in cases where the historical data cannot completely cover the variations and behavior of the studied variables.

4.3.4 ARCH Model

Volatility clustering (i.e., clustering of time-varying variance), in which large changes tend to follow large changes and small changes tend to follow small changes, has been well recognized in time series analysis. This phenomenon is called conditional heteroscedasticity and can be modeled by ARCH-type (autoregressive conditional heteroscedasticity) models, including the ARCH model introduced by Engle (1982) and its generalized extension, GARCH, proposed by Bollerslev (1986). In these models, the key concept is the conditional variance, i.e., the variance conditional on the past.

4.3.5 Disaggregation Model

Disaggregation models are used to decompose time series into several subseries that are temporal or spatial fractions of the key time series. Valencia and Schaake (1973) introduced the basic disaggregation model for temporal disaggregation of annual time series into seasonal time series, which was later extended by Mejia and Rousselle (1976). Disaggregation models of hydrologic time series are efficient techniques for cases where the preservation of the statistical characteristics at both annual and seasonal scales is essential for the project under study. Most applications of disaggregation have been in the temporal domain, although some investigators have applied the same principle in the spatial domain.

4.4 Summary

Time series models are now a major tool in planning, operation, and decision making in hydrology and water resources. On one hand, time series models possess many appealing features. First, they can be used to model a time series without considering its physical nature. Second, they can be used to extrapolate past patterns of behavior into the future. They allow a researcher, who has data only in past years, to forecast future events without having to search for other related time series data. Third, the time series approach also allows for the use of one time series to explain the behavior of another series, if the other time series data are correlated with a variable of interest and if there appears to be some cause for this correlation. On the other hand, some time series models, like ARIMA, are complex techniques, and require a great deal of experience and data. Although they often produce satisfactory results, those results depend on the researcher’s level of expertise.

Machiwal and Jha (2006) reviewed both theoretical and applied research of time series models in the hydrological science. In hydrologic studies, time series models have been widely applied for detecting climatic changes (e.g., Kite 1989), investigating the long-term hydroclimatological trends (Lachtermacher and Fuller 1994), exploring the possible impact of climate change on hydrologic variables or water resources (Westmacott and Burn 1997), modeling precipitation (Janos et al. 1988), evapotranspiration (Mohan and Arumugam 1995), streamflow (Moatmari et al. 1999; Pekarova and Pekar 2006; Shao et al. 2009), groundwater (Houston 1983; Van Geer and Zuur 1997), drought (Mishra and Desai 2005), water quality (Ahmad et al. 2001), and water demand and consumption (Bougadis et al. 2005; Jorge 2007), etc.

5 Artificial Neural Network (ANN) Models

An artificial neural network (ANN) is a biologically inspired, parallel, distributed computing system with performance characteristics resembling the biological neural networks of the human brain, and it differs from conventional computers in the way it processes information (Haykin 1994). It has a distributed processing structure (Alp and Cigizoglu 2007) and consists of processing elements and connections between them, with coefficients (weights) bound to the connections. Mathematically, ANNs may be treated as universal approximators. They are able to extract the relation between the inputs and outputs of a process without the physics being explicitly provided to them and to generalize the structure hidden within the whole dataset. ANN models are able to simulate nonlinear relationships through an automatic “training process” (Hsu et al. 1997). ANN models have no limitations in the form of fixed assumptions or formal constraints; compared with conventional simulation methods, they are fast, robust in noisy environments, flexible across many problems, and highly adaptive to newer environments (Jain et al. 1999). There are many standard ANN software packages that can be used to pursue intricate multipurpose nonlinear solutions.

5.1 Structure of ANN

ANNs are computational models developed as generalizations of mathematical models of human cognition or neural biology. An ANN is based on the following rules:

  • Information processing occurs at many simple elements called nodes, also referred to as units, cells, or neurons

  • Signals are passed between nodes through connection links

  • Each connection link has an associated weight that represents its connection strength

  • Each node typically applies a nonlinear transformation called an activation function to its net input to determine its output signal

According to the absence or presence of feedback connections in a network, two types of architecture are distinguished: feedforward architecture and feedback architecture. A typical feedforward multilayer artificial neural network with a single hidden layer is illustrated in Fig. 6 (Friedman and Kandel 1999; Xiong et al. 2004).

Fig. 6. A feedforward multilayer neural network with a single hidden layer (modified from Friedman and Kandel, 1999)

This kind of ANN can solve a wide variety of problems, such as classifying patterns, storing and recalling data, performing general mappings from an input pattern (space) to an output pattern (space), grouping similar patterns, or finding solutions to constrained optimization problems. It consists of input nodes \( {\left\{{\mathrm{X}}_{\mathrm{i}}\left(\mathrm{p}\right)\right\}}_{i=1}^n \) (plus one input, called a bias, which has a constant value of 1 and is usually represented as a separate input), hidden nodes \( {\left\{{Z}_j(p)\right\}}_{j=1}^l \) (and a bias), and output nodes \( {\left\{{Y}_k(p)\right\}}_{k=1}^m \), where X, Z, and Y represent the input, hidden, and output layers, respectively; n, l, and m are the numbers of nodes in each layer; and p denotes the training pattern. The weights associated with the connections between input and hidden nodes are denoted by vij, 0 ≤ i ≤ n, 1 ≤ j ≤ l. Those between the hidden and the output nodes are denoted by wjk, 0 ≤ j ≤ l, 1 ≤ k ≤ m.

For node Zj in the hidden layer (Fig. 6), its effective aggregated input signal, denoted by z_inj, is calculated as:

$$ {z}_{\_}{in}_j={v}_{0j}+\sum \limits_{i=1}^n{v}_{ij}{x}_i,1\le j\le l $$
(32)

where xi, 1 ≤ i ≤ n represents the input to each node in the input layer.

For node Zj, its corresponding output signal, denoted by zj, is obtained by using an activation function f(x)

$$ {z}_j=f\left({z}_{\_}{in}_j\right),\quad 1\le j\le l $$
(33)

The most widely used activation function is the sigmoid function (Friedman and Kandel 1999). The sigmoid function is a bounded, monotonic, nondecreasing function that provides a graded and nonlinear response. Among several different sigmoid functions, the one most often used for ANNs is the logistic function

$$ {z}_j=f\left({z}_{\_}{in}_j\right)=\frac{1}{1+{e}^{-\sigma \cdot {z}_{\_}{in}_j}} $$
(34)

where σ is an adjustable parameter of the activation function f(x). This function enables the network to approximate nonlinear mappings.
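A minimal sketch of the forward pass defined by Eqs. (32)-(34) for the network of Fig. 6; the weights, biases, and input values below are random placeholders.

```python
# Forward pass of a single-hidden-layer feedforward network (Eqs. 32-34), sketched with NumPy.
import numpy as np

def sigmoid(u, sigma=1.0):
    return 1.0 / (1.0 + np.exp(-sigma * u))             # logistic function, Eq. (34)

def forward(x, V, v0, W, w0):
    """x: inputs (n,), V: input-to-hidden weights (n, l), v0: hidden biases (l,),
    W: hidden-to-output weights (l, m), w0: output biases (m,)."""
    z_in = v0 + x @ V                                    # aggregated hidden input, Eq. (32)
    z = sigmoid(z_in)                                    # hidden output signal, Eq. (33)
    y_in = w0 + z @ W                                    # aggregated output-layer input
    return sigmoid(y_in)                                 # network output

rng = np.random.default_rng(1)
n, l, m = 3, 4, 1                                        # input, hidden, output node counts
y = forward(rng.random(n), rng.normal(size=(n, l)), rng.normal(size=l),
            rng.normal(size=(l, m)), rng.normal(size=m))
```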

5.2 Network Training

In order for an ANN to generate an output vector Y = (y1, y2,  … , yP) that is as close as possible to the target vector T = (t1, t2,  … , tp), a training process, also called learning, is employed to find optimal weight matrices vij and wjk, which minimize a predetermined error function that usually has the form:

$$ E=\sum \limits_p\sum \limits_{i=1}^l{\left({y}_i-{t}_i\right)}^2 $$
(35)

Here, ti is a component of the desired output T; yi is the corresponding ANN output; l is the number of output nodes; and p is the number of training patterns. Training is a process by which the connection weights of an ANN are adapted through a continuous process of stimulation by the environment in which the network is embedded. The learning ability of a neural network is achieved by applying a learning (training) algorithm.

Training algorithms are mainly classified into three groups (Kasabov 1996):

  1. Supervised. The training examples comprise input vectors x and the desired output vectors y. Training is performed until the neural network “learns” to associate each input vector x with its corresponding and desired output vector y; for example, a neural network can learn to approximate a function y = f(x) represented by a set of training examples (x, y). It encodes the examples in its internal structure.

  2. Unsupervised. Only input vectors x are supplied; the neural network learns some internal features of the whole set of input vectors presented to it.

  3. Reinforcement learning, sometimes called reward-penalty learning, is a combination of the above two paradigms; it is based on presenting input vector x to a neural network and looking at the output vector calculated by the network. If the output is considered “good,” then a “reward” is given to the network in the sense that the existing connection weights are increased; otherwise, the network is “punished,” and the connection weights, being considered as “not appropriately set,” are decreased. Thus, reinforcement learning is learning with a critic, as opposed to learning with a teacher.

Learning is not an individual ability of a single neuron. It is a collective process of the whole neural network and a result of the training procedure. The connection weight matrix W has its meaning as a global pattern; it represents “knowledge” in its entirety. We do not know exactly how learning is achieved in the human brain, but learning (supervised or unsupervised) can be achieved in an artificial neural network, and some generic laws of learning have been discovered and implemented.

After training has been accomplished, it is hoped that the ANN will then be capable of generating reasonable results given new inputs. In contrast, an unsupervised training algorithm does not involve a teacher. During training, only an input data set is provided to the ANN that automatically adapts its connection weights to cluster those input patterns into classes that have similar properties. There are occasions when a combination of these two training strategies leads to reinforcement learning. A score or grade is used to rate the network performance over a series of training patterns. Most hydrologic applications have utilized supervised training. The manner in which the nodes of an ANN are structured is closely related to the algorithm that is used to train it.
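As a simple illustration of supervised training, the sketch below fits a one-hidden-layer feedforward network with scikit-learn's MLPRegressor to hypothetical rainfall/API-runoff data; the library, network size, and data are illustrative choices, not a prescription from the text.

```python
# Supervised training of a one-hidden-layer feedforward network, sketched with scikit-learn;
# the rainfall/API inputs and runoff targets are hypothetical (synthetic) data.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 2)) * [100.0, 50.0]                        # columns: storm rainfall, API (mm)
y = 0.4 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 2.0, 200)     # synthetic runoff (mm)

ann = MLPRegressor(hidden_layer_sizes=(5,), activation="logistic",
                   solver="adam", max_iter=5000, random_state=0)
ann.fit(X[:150], y[:150])                                       # training patterns
print("test R^2:", round(ann.score(X[150:], y[150:]), 3))       # held-out validation patterns
```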

5.3 Summary

ANNs have been utilized in many hydrologic problems, although whether all the strengths of ANNs have been effectively exploited in these applications remains open to evaluation. The applications include streamflow simulation (Shamseldin 1997; Hsu et al. 1995; Kişi, 2007; Wu and Chau 2011; He et al. 2014; Abrahart and See 2000; Aziz et al. 2014), water quality modeling (Rogers and Dowla 1994; Abyaneh 2014), groundwater modeling (Aziz et al. 1992; Daliakopoulos et al. 2005), reservoir operation (Raman and Chandramouli 1996), water resources allocation and management (Raman and Sunilkumar 1995), evaporation estimation (Kumar et al. 2002; Shiri et al. 2014), hydrograph generation from hydrometeorological parameters (Ahmad and Simonovic 2005), the impact of climatic variations on flow discharge and dissolved organic carbon and nitrogen contents (Clair and Ehrman 1998), etc.

Zealand et al. (1999) claim that ANNs have the following beneficial model characteristics: (1) They infer solutions from data without prior knowledge of the regularities in the data. (2) ANNs are able to adapt to solutions over time to compensate for changing circumstances. (3) ANNs can generalize from previous examples to new ones, which is useful because real-world data are noisy, distorted, and often incomplete. (4) ANNs are also good at the abstraction of essential characteristics from inputs containing irrelevant data. (5) They are nonlinear, i.e., they can solve some complex problems more accurately than do linear techniques. (6) ANNs are highly parallel, containing many identical, independent operations that can be executed simultaneously, often making them faster than alternative methods.

However, ANNs also have several drawbacks for some applications. (1) Most of the ANN applications have been unable to explain the basic process in a comprehensibly meaningful way by which ANNs arrive at a decision. (2) When there is no learnable function or the data set is insufficient in size, they may fail to produce a satisfactory solution. (3) The optimum network geometry as well as the optimum internal network parameters are problem dependent and generally have to be found using a trial-and-error process. (4) The performance of an ANN deteriorates rapidly when the input vectors are far from the space of inputs used for training. (5) ANNs cannot cope with major changes in the system, because they are trained on historical data sets.

6 Fuzzy Logic Models

Fuzzy logic models are based on fuzzy logic systems (Kruse et al. 1994; Klir and Yuan 1995; Kasabov 1996; Zimmermann 2001). Fuzzy logic systems, or fuzzy systems, are knowledge-based or rule-based systems. A fuzzy system is constructed from a set of fuzzy IF-THEN rules. A fuzzy IF-THEN rule is an IF-THEN statement in which some words are characterized by continuous membership functions (Wang 1997). For example, the following is an IF-THEN rule:

$$ \mathrm{IF}\ x\ \mathrm{is}\ A,\mathrm{THEN}\ y\ \mathrm{is}\ B $$

The functioning of fuzzy systems is based on fuzzy set theory (Zadeh 1965). Fuzzy set theory, as an extension of classical set theory, is generally used to describe imprecision or vagueness. By translation into fuzzy IF-THEN rules, subjective knowledge can be incorporated in fuzzy logic systems in a natural and transparent way. Furthermore, a major strength of fuzzy logic systems resides in their ability to infer the behavior of complex systems purely from data (data-driven), while still providing some insight into their internal operation. Finally, fuzzy systems are flexible modeling tools, as their architecture and inference mechanisms can be adapted to the given modeling problem.

Fuzzy logic models consist of three steps: taking inputs, applying fuzzy rules, and producing outputs. Inputs to a fuzzy system can be either exact, crisp values or fuzzy values. Output values from a fuzzy system can be fuzzy or exact (crisp). The process of transforming a single crisp value into a fuzzy value is called fuzzification, while the process of transforming a fuzzy value into a single crisp value is called defuzzification.

6.1 Basic Concepts of Fuzzy Systems

6.1.1 Fuzzy Sets and Membership Functions

Fuzzy sets, which may be generally used to describe imprecision or vagueness, were first introduced by Zadeh (1965). Fuzzy sets are sets of objects without clear boundaries; in contrast with ordinary sets, where for each object it can be decided whether it belongs to the set or not, partial membership in a fuzzy set is possible. The traditional way of representing the membership of elements x in a set A is through the characteristic function:

  • μA(x) = 1, if x is an element of set A, and

  • μA(x) = 0, if x is not an element of set A,

that is, an object either belongs or does not belong to a given set.

In fuzzy sets, an object can belong to a set partially. The degree of membership is defined through a generalized characteristic function called the membership function:

$$ {\mu}_A(x):U\to \left[0,1\right] $$

where U is called the universe and A is a fuzzy subset of U.

The values of the membership function are real numbers in the interval [0, 1], where 0 means that the object is not a member of the set and 1 means that it belongs entirely to the set. Each value of the function is called a membership degree. One way of defining a membership function is through an analog function.

Fig. 7 (Kasabov 1996) shows three membership functions representing three fuzzy sets labeled as “short,” “medium,” and “tall,” all of them being fuzzy values of a variable “height.” As we can see, the value 174 cm belongs to the fuzzy set “medium” to a degree of 0.6 and at the same time to the set “tall” to a degree of 0.4.

Fig. 7. Membership functions of three representative fuzzy sets for the variable “height” (modified from Kasabov, 1996)

To take another familiar example (Bárdossy et al. 1995), the set of young persons is fuzzy, as there is no generally accepted boundary between young and not young. The membership function of this set A may be defined as

$$ {\mu}_A(x)=\left\{\begin{array}{ll}1& \mathrm{if}\ x\le 25\\ {}\frac{40-x}{15}& \mathrm{if}\ 25<x\le 40\\ {}0& \mathrm{if}\ x>40\end{array}\right. $$

Fuzzy set theory can be considered as an extension of ordinary set theory; compared to the classical set theory, fuzzy set theory is very flexible in describing the features of objects; it has advantages in expressing vague, uncertain, and imprecise information, which appears frequently in scientific and engineering fields (Zadeh 1965; Zimmermann 2001).
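The membership function of the fuzzy set “young” defined above can be written directly as a small function, for example:

```python
# Membership function of the fuzzy set "young" from the example above;
# the breakpoints (25 and 40 years) come directly from that definition.
def mu_young(x):
    if x <= 25:
        return 1.0
    elif x <= 40:
        return (40.0 - x) / 15.0
    else:
        return 0.0

print(mu_young(20), mu_young(30), mu_young(45))   # 1.0, 0.666..., 0.0
```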

6.1.2 Fuzzy Rules

A fuzzy rule consists of a set of arguments in the form of fuzzy sets with membership functions and a response also in the form of a fuzzy set. For a general input vector, the rule is applied as:

$$ \mathrm{If}\ {a}_1\ \mathrm{is}\ {A}_{i,1}\odot {a}_2\ \mathrm{is}\ {A}_{i,2}\odot \dots \odot {a}_k\ \mathrm{is}\ {A}_{i,k},\ \mathrm{then}\ {B}_i $$

where ⊙ is any logical operator, specified according to the application. Usually rules are formulated using AND/OR operators. For example, in modeling moisture movement in an unsaturated zone (Dou et al. 1999), a rule may consist of two premises (i.e., k = 2): Ai,1 may correspond to a class of upper-layer moisture content (e.g., low, medium, high), Ai,2 to a class of lower-layer moisture content (e.g., very low, medium, high, saturated), and the response \( {B}_i \) may be the actual quantity of water flux between the two layers.

6.2 Operations with Fuzzy Sets

Fuzzy set theory can be considered as an extension of ordinary set theory, i.e., the classical sets are a special case of fuzzy sets, when two membership degrees only, 0 and 1, are used, and crisp borders between the sets are defined. The following operations over two fuzzy sets A and B defined over the same universe U are the most common in fuzzy theory (Zadeh 1965; Zimmermann 2001).

Containment, A ⊂ B

A is contained in B (or, equivalently, A is a subset of B, or A is smaller than or equal to B) if and only if μA ≤ μB. In symbols

$$ A\subset B\iff {\mu}_A\le {\mu}_B $$

Intersection, A ∩ B

The intersection of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, written as C = A ∩ B, whose membership function is related to those of A and B by

$$ {\mu}_C(x)=\min \left[{\mu}_A(x),{\mu}_B(x)\right],\quad x\in U $$

or in abbreviated form

$$ {\mu}_C(x)={\mu}_A(x)\wedge {\mu}_B(x),\quad x\in U $$

Union, A ∪ B

The union of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, written as C = A ∪ B, whose membership function is related to those of A and B by

$$ {\mu}_C(x)=\max \left[{\mu}_A(x),{\mu}_B(x)\right],\quad x\in U $$

or in abbreviated form

$$ {\mu}_C(x)={\mu}_A(x)\vee {\mu}_B(x),\quad x\in U $$

Equality, A = B

Two fuzzy sets A and B are equal, written as A = B, if and only if

$$ {\mu}_A(x)={\mu}_B(x),\quad x\in U $$

Complement

The complement of a fuzzy set A is denoted by \( {A}^{\prime } \) and is defined by

$$ {\mu}_{A^{\prime }}(x)=1-{\mu}_A(x),\quad x\in U $$

Concentration, CON(A)

$$ {\mu}_{\mathrm{CON}(A)}(x)={\left({\mu}_A(x)\right)}^2,\quad x\in U $$

This operation is used as the linguistic modifier “very.”

Dilation, DIL(A)

$$ {\mu}_{\mathrm{DIL}(A)}(x)={\left({\mu}_A(x)\right)}^{0.5},\quad x\in U $$

This operation is used as the linguistic modifier “more or less.”

Algebraic product, A ⋅ B

The algebraic product of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, written as C = A ⋅ B, whose membership function is related to those of A and B by:

$$ {\mu}_C(x)={\mu}_A(x)\cdot {\mu}_B(x),\quad x\in U $$

Algebraic sum, A + B

The algebraic sum of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, written as C = A + B, whose membership function is related to those of A and B by

$$ {\mu}_C(x)={\mu}_A(x)+{\mu}_B(x),\quad x\in U $$

The De Morgan laws are valid for the algebraic sum and difference.

Bounded product

The bounded product of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, whose membership function is related to those of A and B by

$$ {\mu}_C(x)=\max \left[0,{\mu}_A(x)+{\mu}_B(x)-1\right],\quad x\in U $$

Bounded sum

The bounded sum of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, whose membership function is related to those of A and B by

$$ {\mu}_C(x)=\min \left[1,{\mu}_A(x)+{\mu}_B(x)\right],\quad x\in U $$

Bounded difference, A/ − /B

The bounded difference of two fuzzy sets A and B with respective membership functions μA(x) and μB(x) is a fuzzy set C, written as C = A/ − /B, whose membership function is related to those of A and B by:

$$ {\mu}_C(x)=\max \left[0,{\mu}_A(x)-{\mu}_B(x)\right],\quad x\in U $$

Normalization, NORM(A)

$$ {\mu}_{\mathrm{NORM}(A)}(x)={\mu}_A(x)/\max \left\{{\mu}_A(x)\right\},\quad x\in U $$

The operations over fuzzy sets have some properties, for example, they are associative, commutative, and distributive, that is,

Associative: (a ∗ b) ∗ c = a ∗ (b ∗ c)

Commutative: a ∗ b = b ∗ a (not valid for the bounded difference)

Distributive: a ∗ (b ⋆ c) = (a ⋆ b) ∗ (a ⋆ c)

where ∗ and ⋆ denote any operations from those listed above.
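
To make these operations concrete, they can be evaluated element-wise on membership values sampled over a discretized universe. The following minimal NumPy sketch (the universe and the two membership functions are purely illustrative) implements several of the operations defined above:

```python
import numpy as np

# Sampled universe U and two illustrative fuzzy sets A and B defined on it
x = np.linspace(0.0, 100.0, 101)
mu_A = np.clip((x - 20.0) / 30.0, 0.0, 1.0)            # membership function of A
mu_B = np.exp(-((x - 50.0) ** 2) / (2.0 * 15.0 ** 2))  # membership function of B

intersection    = np.minimum(mu_A, mu_B)               # min[mu_A(x), mu_B(x)]
union           = np.maximum(mu_A, mu_B)               # max[mu_A(x), mu_B(x)]
complement_A    = 1.0 - mu_A                           # 1 - mu_A(x)
concentration_A = mu_A ** 2                            # CON(A), modifier "very"
dilation_A      = mu_A ** 0.5                          # DIL(A), modifier "more or less"
bounded_product = np.maximum(0.0, mu_A + mu_B - 1.0)
bounded_sum     = np.minimum(1.0, mu_A + mu_B)
normalized_A    = mu_A / mu_A.max()                    # NORM(A)
```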

6.3 Types of Fuzzy Systems

To construct a fuzzy system, first we need to obtain a collection of fuzzy IF-THEN rules from human experts or based on domain knowledge. The next step is to combine these rules into a single system. Different fuzzy systems use different principles for this combination. There are three types of fuzzy systems that are commonly found in the literature: (i) pure fuzzy systems, (ii) Takagi-Sugeno-Kang (TSK) fuzzy systems, and (iii) fuzzy systems with fuzzifier and defuzzifier.

6.3.1 Pure Fuzzy Systems

The main feature of a pure fuzzy system is that its inputs and outputs are fuzzy sets, whereas in engineering systems the inputs and outputs are real-valued variables (Fig. 8) (Wang 1997).

Fig. 8
figure 8

Basic configuration of pure fuzzy systems (modified from Wang, 1997)

Each single IF-THEN rule of pure fuzzy systems has the following general form:

Rule m: IF \( \left({x}_1\ \mathrm{is}\ {A}_{1,m}\right) \) AND \( \left({x}_2\ \mathrm{is}\ {A}_{2,m}\right) \) AND … AND \( \left({x}_k\ \mathrm{is}\ {A}_{k,m}\right) \) THEN y is …, expressing the relation between the k input variables \( {x}_1,{x}_2,\dots, {x}_k \) and the output y. The terms \( {A}_{k,m} \) in the antecedents of the rules (i.e., the IF part of the rules) represent fuzzy sets (Zadeh 1965) used to partition the input space into overlapping regions.

6.3.2 Takagi-Sugeno-Kang (TSK) Fuzzy Systems

In contrast to pure fuzzy systems, Takagi and Sugeno (1985) and Sugeno and Kang (1988) proposed fuzzy systems whose inputs and outputs are real-valued variables, named Takagi-Sugeno-Kang (TSK) fuzzy systems. A TSK fuzzy system has the following general structure:

IF \( \left({x}_1\ \mathrm{is}\ {A}_{1,m}\right) \) AND \( \left({x}_2\ \mathrm{is}\ {A}_{2,m}\right) \) AND … AND \( \left({x}_k\ \mathrm{is}\ {A}_{k,m}\right) \) THEN \( y={f}_m\left({x}_1,{x}_2,\dots, {x}_k\right) \)

Each fuzzy rule in a TSK fuzzy inference system can be regarded as a local model of the system under consideration. The functions fm are usually first-order polynomials, given by

$$ {f}_m\left({x}_1,{x}_2,\dots, {x}_k\right)={b}_{0,m}+{b}_{1,m}\cdot {x}_1+{b}_{2,m}\cdot {x}_2+\cdots +{b}_{k,m}\cdot {x}_k $$

Fig. 9 shows a schematic diagram of the functioning of a typical multiple-input single-output TSK fuzzy system (Jacquin and Shamseldin 2006). The first stage in the inference process of a TSK fuzzy model is the calculation of the degree of fulfilment (DOF) of each rule. The output of each rule is obtained by the evaluation of the corresponding function \( {f}_m \). Finally, the overall fuzzy model response is obtained as the weighted average of the individual rule responses.

Fig. 9
figure 9

Functioning of a multiple-input single-output Takagi-Sugeno-Kang fuzzy inference system (modified from Jacquin and Shamseldin 2006)

The degree of fulfilment of a rule evaluates the compatibility of a given input vector with the antecedent of the rule (i.e., the IF part). The degree of fulfilment is normally evaluated using a T-norm, such as the algebraic product:

$$ {\mathrm{DOF}}_m\left({x}_1,{x}_2,\dots, {x}_k\right)={\mu}_{A_{1,m}}\left({x}_1\right)\cdot {\mu}_{A_{2,m}}\left({x}_2\right)\cdot \, \cdots \, \cdot {\mu}_{A_{k,m}}\left({x}_k\right) $$

Several types of membership functions can be used for the fuzzy sets in the antecedents of the rules (Zimmermann 2001; Piegat 2001). Gaussian membership functions, which have the following analytical expression:

$$ {\mu}_{k,m}\left({x}_k\right)=\exp \left[-\frac{{\left({x}_k-{c}_{k,m}\right)}^2}{2{\sigma}_{k,m}^2}\right] $$

are a common choice (Chang et al. 2001; Gautam and Holz 2001; Xiong et al. 2001). In this case, each membership function has two parameters, namely the center \( {c}_{k,m} \) and the spread \( {\sigma}_{k,m} \).
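
A minimal Python sketch of this inference chain (Gaussian membership functions, product T-norm for the DOF, and a weighted average of first-order rule outputs) is shown below; the rule centers, spreads, and coefficients are purely illustrative:

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership value for input x with center c and spread sigma."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def tsk_predict(x, centers, spreads, coeffs):
    """First-order TSK inference for one input vector x of length k.

    centers, spreads: (M, k) arrays of membership parameters, one row per rule.
    coeffs: (M, k + 1) array of consequent coefficients [b0, b1, ..., bk] per rule.
    """
    x = np.asarray(x, dtype=float)
    dof = np.prod(gaussian_mf(x, centers, spreads), axis=1)  # degree of fulfilment of each rule
    rule_out = coeffs[:, 0] + coeffs[:, 1:] @ x              # f_m(x1, ..., xk)
    return np.sum(dof * rule_out) / np.sum(dof)              # weighted average of rule responses

# Two rules, two inputs (illustrative parameters only)
centers = np.array([[0.2, 0.3], [0.8, 0.7]])
spreads = np.array([[0.3, 0.3], [0.3, 0.3]])
coeffs  = np.array([[0.1, 0.5, 0.2], [1.0, -0.3, 0.6]])
print(tsk_predict([0.4, 0.5], centers, spreads, coeffs))
```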

Some problems with the Takagi-Sugeno-Kang fuzzy system are listed as follows: (i) its THEN part is a mathematical formula and therefore may not provide a natural framework to represent human knowledge, and (ii) there is not much freedom left to apply different principles in fuzzy logic, so that the versatility of fuzzy systems is not well represented in this framework.

6.3.3 Fuzzy Systems with Fuzzifier and Defuzzifier

In order to use pure fuzzy systems for simulating engineering systems, a simple method is to add a fuzzifier, which transforms a real-valued variable into a fuzzy set, to the input, and a defuzzifier, which transforms a fuzzy set into a real-valued variable, to the output. Thus, we get a fuzzy system with fuzzifier and defuzzifier, as shown in Fig. 10 (Wang 1997).

Fig. 10
figure 10

Basic configuration of fuzzy systems with fuzzifier and defuzzifier (modified from Wang, 1997)

6.4 Adaptive Neuro-Fuzzy Inference System (ANFIS)

During the past four decades, significant progress has been made in two artificial intelligence techniques, namely fuzzy inference systems (FIS) and artificial neural networks (ANNs). A judicious integration of FIS and ANN can produce a functional neural fuzzy system capable of learning, high-level thinking, and reasoning (Jang et al. 1997; Loukas 2001). It provides an effective approach for dealing with large imprecisely defined complex systems. An ANFIS works by applying neural learning rules to identify and tune the parameters and structure of an FIS.

A typical architecture of an ANFIS, in which a circle indicates a fixed node, whereas a square indicates an adaptive node, is shown in Fig. 11 (Jang et al. 1997). For simplicity, we assume that the examined FIS has two inputs and one output.

Fig. 11
figure 11

Architecture of the ANFIS (modified from Jang et al., 1997)

In Fig. 11, x and y are the two crisp inputs and Ai and Bi are the linguistic labels associated with the node function. For rainfall-runoff modeling in hydrology, the input and output nodes represent rainfall process and discharge observations, respectively.

The attractive features of an ANFIS include ease of implementation, fast and accurate learning, strong generalization ability, excellent explanation facilities through fuzzy rules, and easy incorporation of both linguistic and numeric knowledge for problem solving (Jang et al. 1997). Owing to these features, the ANFIS is widely used in hydrological science.

6.5 Summary

An important contribution of fuzzy system theory is that it provides a systematic procedure for transforming a knowledge base into a nonlinear mapping. On one hand, fuzzy systems are multi-input-single-output mappings from a real-valued vector to a real-valued scalar (a multi-output mapping can be decomposed into a collection of single-output mappings), and the precise mathematical formulas of these mappings can be obtained; on the other hand, fuzzy systems are knowledge-based systems constructed from human knowledge in the form of fuzzy IF-THEN rules.

The fields of hydrology and water resources commonly involve a system of concepts, principles, and methods for dealing with modes of reasoning that are approximate rather than exact. The capability of dealing with imprecision gives fuzzy logic great potential for hydrological analysis and water resources decision making.

In hydrology, the concept of fuzzy theory and its application have found many applications in a number of research areas, such as groundwater flow in the unsaturated zone (Bárdossy and Disse 1993; Bárdossy et al. 1995; Dou et al. 1995; Schulz and Huwe 1997; Dou et al. 1999; Hong et al. 2002; Afshar et al. 2007), the interdependence between global circulation and rainfall (Pongracz et al. 2001), reconstruction of missing precipitation events (Abebe et al. 2000; Coulibaly and Evora 2007), rainfall-runoff modeling (Yu and Yang 2000; Hundecha et al. 2001; Huang et al. 2010), flood forecasting (See and Openshaw 1999; Xiong et al. 2001), flood frequency analysis (Shu and Burn 2004), reservoir operation (Russell and Campbell 1996), water resources allocation and management (Yurdusev and Firat 2009), drought prediction (Pesti et al. 1996), and evaporation estimation (Cobaner 2011; Shiri et al. 2013, 2014).

The advantages of a fuzzy logic model include the following: (1) The model is not sensitive to parameter changes and can be easily programmed; the codes remain simple and short and require little computer time. (2) The model is transparent and easy to understand due to its rule-based structure, which imitates the human way of thinking. (3) The fuzzy rule-based model can encode the expert’s knowledge. (4) The most distinguishing property of fuzzy logic is that it deals with fuzzy propositions, that is, propositions which contain fuzzy variables and fuzzy values; thus, fuzzy systems are especially good at dealing with nonlinear relationships. However, it should be remembered that fuzzy logic models, just like other types of black-box models, can only describe the input-output relationships without explicit consideration of the internal hydrologic processes that lead to this transformation.

7 Frequency Analysis Models

Flood frequency estimation has been fundamental in engineering hydrology since Fuller (1914) approached the temporal variability of flood flows of extremely high return periods. The primary objective of frequency analysis is to relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions (Chow et al. 1988). The purpose of frequency analysis is to analyze past records of hydrologic variables so as to estimate future occurrence probabilities of extreme events.

The hydrologic data analyzed in frequency analysis are assumed to be independent and identically distributed (i.e., the i.i.d. assumption), if the hydrologic system producing them (e.g., a storm rain system) is considered to be stochastic, space-independent, and time-independent. The hydrologic data employed in frequency analysis should be carefully selected so that the assumptions of independence and identical distribution are satisfied. The data used in the analysis must also be evaluated in terms of the objectives, length of records available, and completeness of records. A frequency analysis can be performed using single-site data, regional data, or both. It can also include historical information and reflect physical constraints.

7.1 Graphical Method

Plotting position refers to the probability value assigned to each value in a random sample. It is used to calculate and graphically display the empirical frequency curve by plotting each ranked value against a probability scale. Numerous methods have been proposed for the determination of plotting positions, most of which are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, the exceedance probability of the mth largest value, \( {x}_m \), can be written in the general form:

$$ P\left(X\ge {x}_m\right)=\frac{m-b}{n+1-2b} $$
(36)

where b is a parameter that commonly varies between 0 and 0.5 among the various formulae; for example, b = 0.5 for Hazen’s formula and b = 0.3 for Chegodayev’s formula. The most popular choice is the Weibull plotting position, obtained with b = 0, which gives an unbiased estimate of the exceedance probability of the mth largest observation for all distributions.

The procedure of the graphical method includes the following steps (a numerical sketch is given after the list):

  • Select a Qmax value from each year

  • Arrange the data in decreasing order, i.e., \( {Q}_1\ge {Q}_2\ge {Q}_3\ge \cdots \)

  • Assign a frequency/probability of exceedance to each Qi. The most common method is the Weibull formula:

$$ P\left(Q>{Q}_m\right)=\frac{m}{n+1} $$
(37)

where n is the total number of data and m is the order of Q. This means that m(Qmax) = 1 and m(Qmin) = n.

  • Plot Q versus P(Q>Qm) or plot Q versus T = 1/P(Q>Qm), where T is the return period

  • On millimeter paper (without distribution assumption) – fit a curve

  • On probability paper (with a distribution assumption) – fit a straight line if the data follow the probability distribution that the probability paper represents

  • Knowing the probability P(Q>QT) or the return period T, the design flow QT can be read from the plot; conversely, knowing the magnitude of QT, P(Q>QT) or T can be read from the plot
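
A minimal Python sketch of this graphical procedure, using the Weibull formula of Eq. (37) (the annual maximum values are purely illustrative), is:

```python
import numpy as np

# Illustrative annual maximum discharges (one value per year)
Q = np.array([310., 250., 420., 180., 530., 290., 360., 220., 475., 205.])

Q_sorted = np.sort(Q)[::-1]                # decreasing order, rank m = 1 for the largest value
m = np.arange(1, len(Q_sorted) + 1)
P_exceed = m / (len(Q_sorted) + 1.0)       # Weibull plotting position, Eq. (37)
T = 1.0 / P_exceed                         # return period

for q, p, t in zip(Q_sorted, P_exceed, T):
    print(f"Q = {q:6.1f}   P(Q > Qm) = {p:.3f}   T = {t:5.2f} years")
```

The resulting (Q, P) or (Q, T) pairs are then plotted on millimeter or probability paper as described above.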

7.2 Analytical Method – Frequency Factor Method

A random variable X can be decomposed into two components and written as:

$$ X=\overline{x}+\varDelta X $$
(38)

where ΔX is the deviation from the mean, \( \overline{x} \). A new quantity can be defined as: K = ΔX/s, where s is the standard deviation of the data, and the former equation can be rewritten as:

$$ X=\overline{x}+ sK $$
(39)

For a design value XT with return period of T, Eq. (39) can be written as

$$ {X}_T=\overline{x}+{K}_Ts\ \mathrm{or}\ {X}_T=\overline{x}\left(1+{C}_v{K}_T\right) $$
(40)

where Cv is the coefficient of variation, and KT is the frequency factor depending on the probability distribution being used and on the return period, T.

Equation (40) is the working equation for frequency analysis, which can be used to calculate the design value XT for given design level T and probability distribution. It can also be used to estimate the return period of a given X value.

Examples of design flow calculation for different probability distributions are presented below.

7.2.1 Example for Normal Distribution

If X is normally distributed, i.e., \( X\sim N\left(\overline{x},{\mathrm{s}}^2\right) \), from \( {X}_T=\overline{x}+{K}_Ts \) we get

$$ {K}_T=\frac{X_T-\overline{x}}{s} $$

That means KT is the standardized normal variate Z \( \left(Z=\frac{X-\mu }{\sigma}\right) \), i.e., KT = Z ~ N(0, 1). KT can then be read from the standard normal distribution table for a given T or calculated from the related equation by numerical methods.
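
A minimal Python sketch of the normal-distribution case (the sample data are illustrative; SciPy's norm.ppf replaces the standard normal table):

```python
import numpy as np
from scipy.stats import norm

x = np.array([310., 250., 420., 180., 530., 290., 360., 220., 475., 205.])  # illustrative data
x_mean, s = x.mean(), x.std(ddof=1)

T = 100.0                         # design return period
K_T = norm.ppf(1.0 - 1.0 / T)     # K_T = z for exceedance probability 1/T
X_T = x_mean + K_T * s            # Eq. (40)
print(K_T, X_T)
```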

7.2.2 Example for Lognormal Distribution

X is said to be lognormally distributed if Y = ln(X) is normally distributed with mean μY and variance \( {\sigma}_Y^2 \).

The procedure for calculating XT from the lognormal distribution is as follows (a numerical sketch is given at the end of this subsection):

  • Let yi = ln xi for all xi.

  • Calculate \( \overline{y}=\frac{1}{n}\sum {y}_i \) and \( {s}_y=\sqrt{\frac{\sum {\left({y}_i-\overline{y}\right)}^2}{n-1}} \)

  • Read KT from normal distribution table for a given T

  • Calculate \( {Y}_T=\overline{y}+{K}_T{s}_y \)

  • Calculate \( {X}_T={e}^{Y_T} \)

It should be noted that if only the mean and standard deviation \( \left(\overline{x},{s}_x\right) \) of a lognormally distributed variable X are available, then the mean and standard deviation \( \left(\overline{y},{s}_y\right) \) of the associated normally distributed variable Y=ln(X) are calculated as:

$$ \overline{y}=\ln \left(\frac{{\overline{x}}^2}{\sqrt{s_x^2+{\overline{x}}^2}}\right),{s}_y=\sqrt{\ln \left(\frac{s_x^2}{{\overline{x}}^2}+1\right)} $$
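
A minimal Python sketch of the lognormal procedure, including the moment conversion above (the data are illustrative):

```python
import numpy as np
from scipy.stats import norm

x = np.array([310., 250., 420., 180., 530., 290., 360., 220., 475., 205.])  # illustrative data
y = np.log(x)
y_mean, s_y = y.mean(), y.std(ddof=1)

T = 100.0
K_T = norm.ppf(1.0 - 1.0 / T)     # from the standard normal distribution
Y_T = y_mean + K_T * s_y
X_T = np.exp(Y_T)
print(X_T)

# Alternative when only (x_mean, s_x) of X are available:
x_mean, s_x = x.mean(), x.std(ddof=1)
y_mean_alt = np.log(x_mean**2 / np.sqrt(s_x**2 + x_mean**2))
s_y_alt = np.sqrt(np.log(s_x**2 / x_mean**2 + 1.0))
```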

7.2.3 Extreme Value Type I Distribution

The KT value can either be calculated by using the equation below or read from the extreme value type I distribution table.

$$ {K}_T=-\frac{\sqrt{6}}{\pi}\left\{0.5772+\ln \left[\ln \left(\frac{T}{T-1}\right)\right]\right\},\quad T=\frac{1}{1-\exp \left\{-\exp \left[-\left(0.5772+\frac{\pi {K}_T}{\sqrt{6}}\right)\right]\right\}} $$
(41)

The design flow XT can then be calculated using Eq. (40).
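
A minimal Python sketch of Eq. (41) combined with Eq. (40) (the sample statistics are illustrative):

```python
import numpy as np

def gumbel_KT(T):
    """Frequency factor of the extreme value type I (Gumbel) distribution, Eq. (41)."""
    return -(np.sqrt(6.0) / np.pi) * (0.5772 + np.log(np.log(T / (T - 1.0))))

x_mean, s = 305.0, 115.0          # illustrative sample mean and standard deviation
T = 50.0
X_T = x_mean + gumbel_KT(T) * s   # Eq. (40)
print(gumbel_KT(T), X_T)
```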

7.2.4 Pearson Type III Distribution

The Pearson type III distribution has three parameters, λ, β, and ε, which can be estimated through the calculation of mean, standard deviation, and coefficient of skewness.

The procedure for the Pearson type III distribution is described as follows (a numerical sketch is given after the list):

  • Compute the mean, \( \overline{x}=\frac{1}{n}\sum \limits_{i=1}^n{x}_i\to \lambda \)

  • Compute the standard deviation, \( s=\sqrt{\frac{\sum \limits_{i=1}^n{\left({x}_i-\overline{x}\right)}^2}{n-1}}\to \beta \)

  • Compute the coefficient of skewness, \( {C}_s=\frac{n\sum {\left({x}_i-\overline{x}\right)}^3}{\left(n-1\right)\left(n-2\right){s}^3}\to \varepsilon \)

  • Compute KT by Eq. (42) or read from the Table of KT values for P-III distribution

$$ {K}_T=z+\left({z}^2-1\right)k+\frac{1}{3}\left({z}^3-6z\right){k}^2-\left({z}^2-1\right){k}^3+{zk}^4+\frac{1}{3}{k}^5 $$
(42)

where k = Cs/6

$$ z=w-\frac{2.516+0.8029w+0.01033{w}^2}{1+1.4328w+0.1893{w}^2+0.00131{w}^3} $$
$$ w={\left[\ln \left(\frac{1}{p^2}\right)\right]}^{1/2} $$
$$ p=\frac{1}{T} $$
  • Compute \( {x}_T=\overline{x}+{K}_Ts \)
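
A minimal Python sketch of this procedure, using Eq. (42) with k = Cs/6 and the z approximation given above (the data are illustrative; the approximation assumes p = 1/T ≤ 0.5):

```python
import numpy as np

def pearson3_KT(T, Cs):
    """Frequency factor of the Pearson type III distribution, Eq. (42)."""
    p = 1.0 / T
    w = np.sqrt(np.log(1.0 / p**2))
    z = w - (2.516 + 0.8029 * w + 0.01033 * w**2) / (
        1.0 + 1.4328 * w + 0.1893 * w**2 + 0.00131 * w**3)
    k = Cs / 6.0
    return (z + (z**2 - 1.0) * k + (z**3 - 6.0 * z) * k**2 / 3.0
            - (z**2 - 1.0) * k**3 + z * k**4 + k**5 / 3.0)

x = np.array([310., 250., 420., 180., 530., 290., 360., 220., 475., 205.])  # illustrative data
n = len(x)
x_mean, s = x.mean(), x.std(ddof=1)
Cs = n * np.sum((x - x_mean)**3) / ((n - 1) * (n - 2) * s**3)

T = 100.0
x_T = x_mean + pearson3_KT(T, Cs) * s
print(Cs, x_T)
```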

7.2.5 Log-Pearson Type III Distribution

Similar to the lognormal distribution, the procedure for log-Pearson type-III distribution is as follows:

  • Transform X to Y=log(X) or ln(X)

  • Compute the mean, \( \overline{y} \)

  • Compute the standard deviation, \( {s}_y \)

  • Compute the coefficient of skewness, Cs

  • Compute KT by Eq. (42) or read from the Table of KT values for P-III distribution

  • Compute \( {y}_T=\overline{y}+{K}_T{s}_y \)

  • Compute \( {x}_T={10}^{y_T} \) or \( {x}_T={e}^{y_T} \)

7.3 Data Sampling Methods

Data for frequency analysis studies can be compiled in several ways. In general, there are three data sampling methods: Annual Maximum Series, Partial Duration Series, and Annual Exceedance Series.

Annual Maximum Series: The Annual Maximum Series (AMS) data consist of the largest event in each year, regardless of whether the second largest event in a year exceeds the largest events of other years. An objection to using the AMS is that, in many cases, the second largest event in a year exceeds the largest event of other years.

Partial Duration Series: A Partial Duration Series (PDS) is a series of data which are selected so that their magnitude is greater than a predefined base value (Chow et al. 1988). Partial duration series or peaks-over-threshold (POT) pick all peaks above a threshold.

Annual Exceedance Series: If the base value of the PDS is selected so that the number of values in the series is equal to the number of years of record, the series is called an Annual Exceedance Series (AES) (Chow et al. 1988). An AES may be regarded as a special case of the PDS. Although AES is useful for some purposes, it may be difficult to verify that all the observations are independent.

The return period TE of event magnitudes developed from an AES is related to the corresponding return period T for magnitudes derived from an AMS by (Chow 1964)

$$ {T}_E={\left[\ln \left(\frac{T}{T-1}\right)\right]}^{-1} $$
(43)
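
For example, Eq. (43) can be evaluated directly; a minimal sketch:

```python
import numpy as np

def aes_return_period(T):
    """Return period on the annual exceedance series equivalent to an AMS return period T, Eq. (43)."""
    return 1.0 / np.log(T / (T - 1.0))

for T in (2, 5, 10, 100):
    print(T, round(aes_return_period(T), 3))
# e.g., an AMS return period of T = 2 years corresponds to T_E ≈ 1.44 years on the AES
```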

7.4 Outliers and Zeros

Outliers are data points that depart significantly from the trend of the remaining data. The retention or deletion of these outliers can significantly affect the magnitude of statistical parameters computed from the data. The U.S. Water Resources Council (1982) Bulletin 17B suggests that outliers can be identified from

$$ {X}_H=\overline{X}+{K}_n{S}_X $$
$$ {X}_L=\overline{X}-{K}_n{S}_X $$

where XH and XL are the threshold values for high and low outliers, respectively, and Kn can be approximated from

$$ {K}_n\approx 1.055+0.981\, {\log}_{10}n $$

where n is the number of observations.

The detailed description of the treatment of outliers is contained in the U.S. Water Resources Council (1982) Bulletin 17B.
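
A minimal Python sketch of this outlier screening, applied here to log10-transformed annual peaks and using the Kn approximation quoted above (the data are illustrative; Bulletin 17B tabulates Kn as a function of sample size):

```python
import numpy as np

q = np.array([310., 250., 420., 180., 530., 290., 360., 220., 475., 205.])  # illustrative annual peaks
y = np.log10(q)                          # work with log-transformed flows
n = len(y)

K_n = 1.055 + 0.981 * np.log10(n)        # approximation quoted in the text
y_high = y.mean() + K_n * y.std(ddof=1)  # high-outlier threshold (log space)
y_low  = y.mean() - K_n * y.std(ddof=1)  # low-outlier threshold (log space)

high_outliers = q[y > y_high]
low_outliers  = q[y < y_low]
print(10**y_high, 10**y_low, high_outliers, low_outliers)
```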

Treatment of zeros: Most hydrologic variables are bounded on the left by zero. Zero values should not simply be ignored, nor do they necessarily reflect inaccurate measurements of the minimum flow in a channel. A zero in a set of data that is being logarithmically transformed requires special handling. One solution is to add a small constant to all of the observations. Another method is to analyze the nonzero values and then adjust the relation to the full period of record; this method biases the results, as the zero values are essentially ignored. A third and theoretically more sound method is to use the theorem of total probability (for details, see Haan 2002, pp. 168–169).

7.5 Regionalization

Two broad categories of regionalization procedures have been widely used in the field of frequency analysis: the index-flood approach (Dalrymple 1960) and the multiple-regression approach (Benson 1962).

7.5.1 Index-Flood Method

The key assumption of an index-flood procedure is that the stations form a homogeneous region, meaning that the frequency distributions of the N stations are identical apart from a site-specific scaling factor, the index flood. We may then write

$$ {Q}_i(F)={\mu}_iq(F),\quad i=1,2,\dots, N $$
(44)

where Qi(F), 0 < F < 1, is the quantile function of the frequency distribution at site i; μi is the index flood (Hosking and Wallis 1997); N is the number of sites; q(F) is the regional growth curve, a dimensionless quantile function common to every site.

The index flood is estimated by \( {\widehat{\mu}}_i={\overline{Q}}_i \), the sample mean of the data at site i, and the dimensionless rescaled data are computed by \( {q}_{ij}={Q}_{ij}/{\widehat{\mu}}_i \), where Qij is the observed data at site i, j = 1 , 2 ,  …  , ni, and ni is the sample size at site i.

Hosking and Wallis (1997) suggested an index-flood method where parameters are estimated separately at each site. They considered the use of a weighted average of the at-site estimates:

$$ {\widehat{\theta}}_k^R=\sum \limits_{i=1}^N{n}_i{\widehat{\theta}}_k^{(i)}/\sum \limits_{i=1}^N{n}_i $$
(45)

where \( {\widehat{\theta}}_k^{(i)} \) stands for the at-site estimate of the L-moment of interest. The estimated regional growth curve \( \widehat{q}(F)=q\left(F;{\widehat{\theta}}_1^R,\dots, {\widehat{\theta}}_P^R\right) \) is obtained by substituting the regional estimates \( {\widehat{\theta}}_k^R \) into q(F) (Hosking and Wallis 1993). The quantile estimates at site i can then be obtained from the estimates of μi and q(F):

$$ {\widehat{Q}}_i(F)={\widehat{\mu}}_i\widehat{q}(F) $$
(46)

This index-flood procedure makes the following assumptions: (i) observations at any given site are identically distributed, and independent both serially and spatially; (ii) frequency distributions at different sites are identical apart from a scale factor; and (iii) the mathematical form of the regional growth curve is correctly specified.
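
A minimal Python sketch of the index-flood scaling in Eqs. (44) and (46) is given below. For brevity, the regional growth curve is fitted here as a Gumbel curve by ordinary moments to the pooled rescaled data, rather than by the L-moment weighted average of Eq. (45) used by Hosking and Wallis (1997); the site records are illustrative:

```python
import numpy as np

def gumbel_quantile(F, loc, scale):
    """Quantile function of the Gumbel (EV1) distribution."""
    return loc - scale * np.log(-np.log(F))

# Illustrative annual-maximum records at three sites (unequal record lengths)
sites = {
    "A": np.array([120., 95., 160., 140., 110., 175., 130.]),
    "B": np.array([410., 520., 380., 450., 600., 365., 490., 430.]),
    "C": np.array([55., 70., 48., 90., 62., 80.]),
}

index_flood = {s: q.mean() for s, q in sites.items()}                      # mu_i = at-site sample mean
rescaled = np.concatenate([q / index_flood[s] for s, q in sites.items()])  # q_ij = Q_ij / mu_i

# Regional growth curve q(F): Gumbel curve fitted by moments to the pooled rescaled data
scale = np.sqrt(6.0) / np.pi * rescaled.std(ddof=1)
loc = rescaled.mean() - 0.5772 * scale

F = 0.99                                                                    # 100-year event
q_F = gumbel_quantile(F, loc, scale)
quantiles = {s: mu * q_F for s, mu in index_flood.items()}                  # Eq. (46)
print(quantiles)
```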

7.5.2 Regional Regression

Regional regression models have long been used to predict flood quantiles at ungauged sites, and in a nationwide test, this method did as well as or better than more complex rainfall-runoff modeling in predicting flood quantiles (Newton and Herrin 1982).

Consider the traditional log-linear model for a statistic yi which is to be estimated by using watershed characteristics such as drainage area and slope:

$$ {y}_i=\alpha +{\beta}_1\log \left(\mathrm{area}\right)+{\beta}_2\log \left(\mathrm{slope}\right)+\cdots +\varepsilon $$
(47)

A challenge in analyzing this model and estimating its parameters with available records is that one only obtains sample estimates, denoted \( {\widehat{y}}_i \), of the hydrologic statistics yi. Thus, the observed error ε is a combination of (1) the time-sampling error in sample estimators of yi (these errors at different sites can be cross-correlated if the records are concurrent) and (2) underlying model error (lack of fit) due to the failure of the model to exactly predict the true value of the yi’s at every site. Often these problems have been ignored and standard ordinary least squares (OLS) regression has been employed (Thomas and Benson 1970).
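
A minimal ordinary least squares sketch of the log-linear model in Eq. (47), which deliberately ignores the sampling-error and cross-correlation issues discussed above (all values and catchment descriptors are illustrative):

```python
import numpy as np

# Illustrative at-site estimates of a flood statistic and catchment characteristics
y_hat = np.log10(np.array([230., 540., 120., 860., 310.]))    # e.g., log10 of estimated 100-year floods
area  = np.array([45., 210., 18., 390., 75.])                 # drainage area, km^2
slope = np.array([0.012, 0.006, 0.020, 0.004, 0.009])         # channel slope

X = np.column_stack([np.ones(len(y_hat)), np.log10(area), np.log10(slope)])
beta, residuals, rank, sv = np.linalg.lstsq(X, y_hat, rcond=None)   # OLS estimates of alpha, beta1, beta2
print(beta)

# Prediction at an ungauged site (illustrative descriptors)
x_new = np.array([1.0, np.log10(120.0), np.log10(0.008)])
print(10 ** (x_new @ beta))
```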

7.6 Summary

Frequency analysis has been one of the earliest and most frequent uses of statistical methods in hydrology. In earlier years (before the 1960s), frequency analysis was mainly used for flood flow estimation; nowadays, it is applied to almost every hydrological extreme variable, such as floods (Vogel et al. 1993; Vogel and Wilson 1996), low flows (Nathan and McMahon 1990; Lawal and Watt 1996; Durrans and Tomic 2001), rainfall events of various kinds (Pilgrim 1998; Öztekin 2007; Stedinger et al. 1993), and droughts and dry spells (Lana and Burgueno 1998; Lana et al. 2008; Hallack-Alegria and Watkins 2007). Hydrological frequency analysis (HFA) has played an essential role in the planning, design, and management of projects for flood control and water use.

As to regional frequency analysis, recent advances mainly refer to the use of L-moments together with the index-flood method, as reported by Hosking and Wallis (1997). This methodology has been applied in modeling floods, rainfall extremes, and low flows (Hosking et al. 1985; Vogel and Wilson 1996; Kjeldsen et al. 2001; Kumar et al. 2003; Yue and Wang 2004; Lim and Voeller 2009; Saf 2009).

The basic assumption of traditional HFA methods (both for an individual site and for a region) is that the hydrological data used are stationary, independent, and identically distributed over time. In recent decades, however, this stationarity assumption has been severely challenged, because global climate change and/or large-scale human activities have altered the statistical characteristics of hydrological processes. Nonstationary frequency analysis is a relatively new modeling approach, and the number of studies is continuously increasing (Khaliq et al. 2006).

Studies of flood frequency analysis under nonstationary conditions have mostly assumed trends in time (Strupczewski et al. 2001; Renard et al. 2006; Leclerc and Ouarda 2007). In the last decade, some researchers have also explored the possibility of incorporating climate indices as external forcing into models for flood frequency analysis, assuming linear and nonlinear dependences (Sankarasubramanian and Lall 2003; El Aldouni et al. 2007; Ouarda and El-Aldouni 2011). Results have shown the feasibility of incorporating climate indices as covariates in the models, thus enabling the models to better describe changes in flood regimes over time.