
3.1 Introduction

Flood frequency analysis is a constant concern in hydrological practice. The sizing of bridges, culverts and other facilities, the design capacities of levees, spillways and other control structures, and reservoir operation or management depend upon the estimated magnitude of various design flood values (ASCE 1996). Nowadays, the general methodology, based on a univariate distribution, is to derive a fitted distribution representing the probability of an annual maximum flood being exceeded (USWRC 1981; MWR 2006).

As the duration of gauged records rarely exceeds 50 years, estimates corresponding to high return periods obtained from the systematic data alone are subject to large sampling errors. Furthermore, the existence of a cyclic variation over periods longer than the duration of the records might well introduce further bias (Leese 1973; Stedinger and Cohn 1986; Guo and Cunnane 1991). Therefore, to overcome the problem of relatively short data series for frequency analysis, the need to augment the flow record with historical information is widely acknowledged in the hydrological community. Several methods for incorporating historical information into flood frequency studies have been suggested, including historically weighted moments, maximum likelihood, probability weighted moments and L-moments (USWRC 1982; Guo and Cunnane 1991; Hosking 1995).

The hydrologic extreme values and critical thresholds derived from complex hydrological events for engineering design are usually obtained from single-site characteristics (e.g., annual maximum peak discharge). Consequently, conventional hydrological frequency analysis has mainly focused on one characteristic value and univariate distributions, which cannot provide a complete description of hydrologic events with multiple characteristics. Many hydrological frequency problems, such as the design flood hydrograph that involves both flood peak and flood volumes, should be solved with multivariate distributions (Dupuis 2007; Xiao et al. 2008, 2009).

In this chapter, multivariate frequency analysis is carried out. One of the main difficulties in multivariate quantile estimation is how to choose the proper combination of design values of the concerned random variables for a given multivariate return period in hydrologic structure design. Take the bivariate case (peak discharge Q and flood volume W) as an example. The combinations can differ greatly in their values: moving along the multivariate quantile curve towards an asymptote, one of the two variables approaches its marginal value, while the other tends to increase indefinitely (for unbounded random variables). Chebana and Ouarda (2011) proposed the decomposition of the level curve into a naive part (tail) and a proper part (central); they assumed that the naive part was composed of two segments starting at the end of each extremity of the proper part. Salvadori et al. (2011) introduced two basic design realizations, i.e., the component-wise excess design realization and the most-likely design realization. Li et al. (2016) used the conditional expectation combination method to derive the quantiles of flood peak and 7-day volume under different JRPs, and they found that the bivariate design values have a smaller flood volume and larger flood peak than the bivariate equivalent frequency combination results.

3.2 Annual Maximum Flood Frequency Analysis Based on Copula

Annual maximum (AM) flood series can be characterized by flood occurrence dates and flood magnitudes. The marginal distributions of flood occurrence dates, peak discharges, and flood volumes are established.

3.2.1 Margin Distribution of AM Flood Occurrence Dates

The AM flood occurrence dates can be described by the directional statistics (DS) method. Each date is first converted to an angle on the circle by

$$\alpha_{i} = D_{i} \frac{2\pi }{L}\quad 0\le \alpha_{i} \le 2\pi$$
(3.1)

where L is the length of flood season; Di is the flood occurrence date.

The x and y coordinates of a flood date described by its angle, and their sample means, are determined by

$$(a_{i} ,b_{i} ) = (\cos \,\alpha_{i} ,\sin \,\alpha_{i} )$$
(3.2)
$$\bar{a} = \sum\limits_{i = 1}^{n} {\cos \alpha_{i} /n}$$
(3.3)
$$\bar{b} = \sum\limits_{i = 1}^{n} {\sin \alpha_{i} /n}$$
(3.4)

where n is the sample size.

The mean direction of the circular data (denoted by \(\bar{\theta }\)) is estimated by

$$\overline{\theta } = \left\{ {\begin{array}{*{20}l} {\arctan (\overline{b} /\overline{a} )} \hfill & {\bar{a} > 0,\quad \bar{b} > 0} \hfill \\ {2\pi + \arctan (\overline{b} /\overline{a} )} \hfill & {\bar{a} > 0,\quad \bar{b} < 0} \hfill \\ {\pi + \arctan (\overline{b} /\overline{a} )} \hfill & {\bar{a} < 0} \hfill \\ {\pi /2} \hfill & {\bar{a} = 0,\quad \bar{b} > 0} \hfill \\ {3\pi /2} \hfill & {\bar{a} = 0,\quad \bar{b} < 0} \hfill \\ {\text{undetermined}} \hfill & {\bar{a} = 0,\quad \bar{b} = 0} \hfill \\ \end{array} } \right.$$
(3.5)

A measure of the variability of the flood occurrences about the mean date is determined by defining the mean resultant vector as:

$$\overline{r} = \sqrt {\overline{a}^{2} + \overline{b}^{2} } \quad 0 \le \bar{r} \le 1$$
(3.6)

where \(\bar{r}\) describes the dispersion measure (Black and Werritty 1997).

Since the distribution of dates is on a circle, rather than along a line, the use of the normal distribution is no longer appropriate. Therefore, the von Mises distribution is introduced and used to describe seasonal data with a single peak.

Fisher (1993) termed the von Mises distribution as the “natural” analog of the normal distribution for seasonal data with a single peak. It is the most commonly used and has some similar characteristics to the normal distribution (Mardia 1972). The probability density function of von Mises distribution is given by:

$$f(x) = \frac{1}{{2\pi I_{0} (\kappa )}}\exp [\kappa \,\cos (x - \mu )]\;0 \le x \le 2\pi ,\;0 \le \mu \le 2\pi ,\;\kappa \ge 0$$
(3.7)

It is symmetric and unimodal, with mean direction μ and dispersion given by the concentration parameter \(\kappa = A^{ - 1} (\bar{r})\), where \(A^{ - 1} ( \cdot )\) is the inverse of the function \(A(\kappa ) = I_{1} (\kappa )/I_{0} (\kappa )\), and \(I_{0} (\kappa )\) is the modified Bessel function of the first kind and order zero. For large values of κ, the distribution is concentrated around the mean direction. When κ = 0, the density reduces to the uniform distribution on [0, 2π].
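As a minimal illustrative sketch, the directional statistics of Eqs. 3.1-3.6 and the von Mises concentration parameter can be computed as follows; the occurrence dates are hypothetical, and κ is obtained here from the moment relation r̄ = I1(κ)/I0(κ) rather than from L-moments.

```python
import numpy as np
from scipy.special import i0e, i1e
from scipy.optimize import brentq

L = 153                                      # length of the flood season (days)
D = np.array([25, 40, 47, 60, 75, 90])       # hypothetical AM flood occurrence dates

alpha = D * 2.0 * np.pi / L                  # Eq. 3.1: dates converted to angles
a_bar = np.cos(alpha).mean()                 # Eq. 3.3
b_bar = np.sin(alpha).mean()                 # Eq. 3.4
theta_bar = np.arctan2(b_bar, a_bar) % (2 * np.pi)   # Eq. 3.5, all cases at once
r_bar = np.hypot(a_bar, b_bar)               # Eq. 3.6: mean resultant length

# kappa = A^{-1}(r_bar) with A(k) = I1(k)/I0(k); exponentially scaled Bessel
# functions are used so the ratio stays finite for large k
A = lambda k: i1e(k) / i0e(k)
kappa = brentq(lambda k: A(k) - r_bar, 1e-6, 500.0)

print(f"mean date = {theta_bar * L / (2*np.pi):.1f} d, r = {r_bar:.3f}, kappa = {kappa:.3f}")
```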

3.2.2 Margin Distribution of AM Flood Peaks and Volumes

For the AM flood series, the Pearson type III (P-III) distribution has been recommended by MWR (2006) as a uniform procedure for flood frequency analysis in China. The PDF of the P-III distribution is given in Table 1.1 of Chap. 1.

3.2.3 Bivariate Distribution of AM Flood Occurrence Dates and Magnitudes

For estimating the design flood, the bivariate joint distributions of AM flood occurrence dates and magnitudes (or flood peaks and volumes) need to be built. Every joint distribution can be written in terms of a copula and its univariate marginal distributions. The copula is a function that links univariate marginal distribution functions to construct a multivariate distribution function. The definition and construction of copulas are given in Chap. 2. The Gumbel copula is used to establish the joint distribution in this section.

3.2.4 Case Study

As an illustrative example, the Geheyan reservoir is selected as a case study. The Geheyan reservoir is a key control and multi-purpose water resources engineering project in the Qingjiang Basin, one of the main tributaries of the Yangtze River in China. The basin encompasses an area of 17,000 km2 with an annual average rainfall of 1500 mm. The annual average discharge and runoff at the dam site are 393 m3/s and 124 × 108 m3 (from 1951 to 2005), respectively. The flood season lasts for five months, from 1 May to 30 September (153 days).

3.2.4.1 Computation of Empirical Probability

The empirical probabilities can be computed by the Gringorten plotting–position formula

$$P(j) = \frac{j - 0.44}{n + 0.12}$$
(3.8)

where P(j) is the cumulative frequency, indicating the probability that a given value is less than the jth smallest observation in the data set of n observations.

Observed joint probabilities are computed based on the same principle as in the case of a single variable. A two-dimensional table is constructed first in which the variables X and Y are arranged in ascending order. The joint cumulative frequency (non-exceedance joint probability) is then given by (Yue et al. 1999):

$$F(t_{k} ,q_{j} ) = P(X \le t_{k} ,Y \le q_{j} ) = \frac{{\sum\limits_{m = 1}^{k} {\sum\limits_{l = 1}^{j} {n_{m,\,l} - 0.44} } }}{n + 0.12}$$
(3.9)
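A minimal sketch of the Gringorten plotting position (Eq. 3.8) and the observed joint non-exceedance probability of Eq. 3.9 is given below; the paired flood peaks and volumes are hypothetical, and the joint double sum is evaluated by directly counting jointly non-exceeding pairs.

```python
import numpy as np

q = np.array([5200., 8100., 6300., 9800., 7400.])    # hypothetical flood peaks (m^3/s)
w = np.array([21.0, 30.5, 24.2, 35.1, 28.7])          # hypothetical flood volumes (10^8 m^3)
n = len(q)

# Univariate: P(j) = (j - 0.44)/(n + 0.12) for the j-th smallest value (Eq. 3.8)
ranks = np.argsort(np.argsort(q)) + 1                 # rank of each observation
P_uni = (ranks - 0.44) / (n + 0.12)

# Bivariate: count pairs jointly not exceeding each observed pair (Eq. 3.9)
joint_counts = np.array([np.sum((q <= qi) & (w <= wi)) for qi, wi in zip(q, w)])
P_joint = (joint_counts - 0.44) / (n + 0.12)

for i in range(n):
    print(f"q={q[i]:7.0f}  w={w[i]:5.1f}  F_emp={P_uni[i]:.3f}  F_joint={P_joint[i]:.3f}")
```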

3.2.4.2 Evaluation Criteria

A Chi-Square goodness-of-fit test (\(\chi^{2}\)), the relative bias (Rbias), and the relative root mean square error (RRMSE) are selected to assess how well the flood frequency curves describe the data; they are calculated by

$$\chi^{2} = \sum\limits_{i = 1}^{n} {\left( {P_{the} (i) - P_{emp} (i)} \right)^{2} /} P_{emp} (i)$$
(3.10)
$$Rbias = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\hat{Q}(i) - Q(i)} \right)} /Q(i)$$
(3.11)
$$RRMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {\frac{{\hat{Q}(i) - Q(i)}}{Q(i)}} \right)^{2} } }$$
(3.12)

where \(P_{the}\) and \(P_{emp}\) are the theoretical and empirical frequencies; and \(\hat{Q}(i)\) and \(Q(i)\) are the estimated and observed values, respectively.
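The Rbias and RRMSE criteria of Eqs. 3.11 and 3.12 can be evaluated directly, as in the following sketch with hypothetical estimated and observed quantiles.

```python
import numpy as np

Q_obs = np.array([6100., 7800., 9200., 11000.])   # hypothetical observed values
Q_est = np.array([6000., 7900., 9000., 11500.])   # hypothetical estimated values

rel_err = (Q_est - Q_obs) / Q_obs
Rbias = rel_err.mean()                            # Eq. 3.11
RRMSE = np.sqrt((rel_err ** 2).mean())            # Eq. 3.12
print(f"Rbias = {Rbias:.4f}, RRMSE = {RRMSE:.4f}")
```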

3.2.4.3 Conditional Probability

The parameters of the von Mises and P-III distributions are estimated by the L-moments method from the given AM series of flood occurrence dates, peak discharges, and volumes, respectively. A Chi-Square goodness-of-fit test is performed to test the assumption, H0, that the flood occurrence dates and magnitudes follow the von Mises and P-III distributions, respectively. Table 3.1 shows that the assumption cannot be rejected at the 0.5% significance level. The values of Rbias and RRMSE are very small, which means that the marginal distributions fit the data set very well.

Table 3.1 The goodness of fit and \(\chi^{2}\) test statistics

Table 3.2 lists the conditional probability P(X > xp|Y > y1%) for given xp. Under the condition that the annual maximum flood magnitude Y > y1%, the probability that the corresponding occurrence date falls after May 27 is 98.45%, the probability that the annual maximum flood occurs during May 27 to 29 is (98.45 − 29.86)% = 68.59%, and during July 18 to 29 it is (81.16 − 75.29)% = 5.87%.

Table 3.2 Conditional probability of X given Y > y1%

3.2.4.4 Fitting Marginal Distributions

The marginal distribution frequency curves of flood peaks and 7-day flood volumes are shown in Fig. 3.1, in which the lines represent the theoretical distributions and the crosses represent the empirical probabilities. Figure 3.1 indicates that these theoretical distributions fit the observed data reasonably well.

Fig. 3.1
figure 1

Probability curves of flood peak and 7-day flood volume

The Gumbel copula is used to model the dependence between the annual maximum flood peaks and the 7-day flood volumes in this study. The probability plot of the joint distribution is shown in Fig. 3.2, which shows that the Gumbel copula fits the empirical bivariate distribution very well.

Fig. 3.2
figure 2

Comparison of observed and theoretical bivariate probability distribution

3.3 Copula-Based Flood Frequency Considering Historical Information

Flood events consist of flood peaks and flood volumes that are mutually correlated and need to be described by multivariate analysis methods, of which copula functions are the most desirable. Until now, multivariate flood frequency analysis methods based on copulas have not considered historical flood information. This may underestimate or overestimate the flood quantiles or conditional probabilities corresponding to high return periods, especially when the gauged record is relatively short.

3.3.1 Maximum Likelihood Estimation for Censored Samples

In certain sampling situations, the exact values of a proportion of the sample are unknown, although their range may be specified. Usually, the range consists of all points above or below a threshold level. Under these circumstances, the sample is said to be censored. Censored samples occur, for example, when instruments are not calibrated for measurements above or below a certain level. Both historical data and recent flood data (i.e., systematic record) may give rise to censored samples, but because the censoring is generally above a threshold in the former and below in the latter, they must be treated separately (Leese 1973).

Censored-sample maximum likelihood estimators were initially developed by Hald (1949) and Cohen (1976) for the normal and lognormal distributions. They were subsequently adapted by Leese (1973), Condie and Lee (1982), and Stedinger and Cohn (1986) for the common case in hydrology where one has both a censored-sample historical flood record and a systematic gauged record. The maximum likelihood estimation method for type-I censoring is described as follows.

In the annual maximum flood series of Fig. 3.3, there is a total of g known floods. Of these, k are known to be the k largest in the period of n years. The n-year period contains within it a systematic record (recently gauged data) of s years (s ≤ n). Of the k largest floods, c occurred during the systematic record (c ≤ k and c < s, and also g = s + k − c). Assume a fixed threshold X0 exceeded by the k largest floods and not exceeded by any of the remaining n − k floods, recorded or not (i.e., the k values which exceed X0 form a type I censored sample). It is also noted that the m (m = k − c) floods in the pre-gauging period h (h = n − s) are known, as they are included in the k values which exceed X0, and it is assumed that no other floods exceeded the threshold during that period.

Fig. 3.3
figure 3

Sketch of the annual maximum flood series when historical floods are available. Notations: s—the length of the systematic record; h—the length of the pre-gauging period; y1, y2, y3—historical flood events; X0—perception threshold

Let fX and FX denote the probability density function (PDF) and the cumulative distribution function (CDF) of variable X, respectively. The resulting likelihood function for the whole sample of s+m known and hm unknown values is given by (Leese 1973; Condie 1986; Stedinger and Cohn 1986; Guo and Cunnane 1991)

$$l({\varvec{\upalpha}}) = \prod\limits_{i = 1}^{s + m} {f_{X} (x_{i} )} [\int\limits_{ - \infty }^{{X_{0} }} {f_{X} (x)} dx]^{h - m}$$
(3.13)

where α is the parameter vector of fX and FX.

Since c flood events exceeding the perception threshold X0 occur among the systematic data (analogously to the sketch in Fig. 3.3), the c events are virtually removed from the period s and are treated as historical data (Bayliss and Reed 2001). Then, Eq. 3.13 can be expressed as

$$l({\varvec{\upalpha}}) = \prod\limits_{i = 1}^{s - c} {f_{X} (x_{i} )} \prod\limits_{j = 1}^{k} {f_{X} (y_{j} )} [\int\limits_{ - \infty }^{{X_{0} }} {f_{X} (x)} dx]^{h - m}$$
(3.14)

where \(x_{i}\) (i = 1, 2, …, s − c) denotes the systematic data less than the threshold X0 and \(y_{j}\) (j = 1, 2, …, k) denotes the k (k = m + c) largest floods exceeding the threshold X0; \(\prod\limits_{i = 1}^{s - c} {f_{X} (x_{i} )}\) and \(\prod\limits_{j = 1}^{k} {f_{X} (y_{j} )}\) are the likelihood functions of the s − c systematic records and the k largest floods, respectively; and \([\int_{ - \infty }^{{X_{0} }} {f_{X} (x)} dx]^{h - m}\) represents the likelihood function for the h − m unknown values, which has been defined and applied by Leese (1973), Condie (1986), Stedinger and Cohn (1986), and Guo and Cunnane (1991).

The log-likelihood function for the univariate distribution can be expressed as

$$L({\varvec{\upalpha}}) = \sum\limits_{i = 1}^{s - c} {\log f_{X} (x_{i} )} + \sum\limits_{j = 1}^{k} {\log f_{X} (y_{j} )} + (h - m)\log F_{X} (X_{0} )$$
(3.15)

The maximum likelihood estimates are those values of α that maximize Eq. 3.15.
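A minimal sketch of maximizing the censored-sample log-likelihood of Eq. 3.15 is given below; the P-III distribution is represented here by a three-parameter (shifted) gamma from scipy, and the flood values, threshold X0 and period lengths are hypothetical.

```python
import numpy as np
from scipy import stats, optimize

x_sys = np.array([4200., 5100., 6800., 5900., 7300., 4800., 6100.])   # systematic floods < X0
y_hist = np.array([12500., 14800., 11900.])                           # k largest floods > X0
X0 = 10000.0          # perception threshold
h, m, c = 300, 3, 0   # pre-gauging length, historical exceedances, systematic exceedances

def neg_loglik(params):               # negative of Eq. 3.15
    a, loc, scale = params
    if a <= 0 or scale <= 0:
        return np.inf
    dist = stats.gamma(a, loc=loc, scale=scale)
    ll = (dist.logpdf(x_sys).sum()          # s - c systematic values below X0
          + dist.logpdf(y_hist).sum()       # k largest (historical) floods
          + (h - m) * dist.logcdf(X0))      # h - m unknown values below X0
    return -ll if np.isfinite(ll) else np.inf

res = optimize.minimize(neg_loglik, x0=[2.0, 0.0, 3000.0], method="Nelder-Mead")
print("Censored-sample ML estimates (shape, location, scale):", res.x)
```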

3.3.2 Bivariate Flood Frequency Analysis with Historical Information

Conventional flood frequency analysis incorporating historical information is based on univariate distributions. To overcome the shortcomings of univariate frequency analysis, a multivariate copula-based flood frequency analysis model that considers historical information was proposed and discussed by Li et al. (2013). As the historical flood events occurred hundreds of years ago, their durations are difficult to measure or investigate. There is no publication or gauged record related to the duration samples of historical floods. Besides, the perception threshold of flood duration is also difficult to fix for maximum likelihood estimation. Thus, only the distribution of flood peak and volume with historical information is studied.

3.3.3 Inference Function for Margins Method

In classical statistics, the inference function for margins (IFM) method was first formally defined by McLeish and Small (1988). Compared with other estimation methods, the IFM method is the preferred fully parametric method for multidimensional parameter estimation because it is close to maximum likelihood (ML) in approach and is easier to implement (Joe and Xu 1996; Joe 1997). Comparisons of various types have been made in Xu (1996) for some multivariate models, which suggest that the IFM method is highly efficient compared to maximum likelihood. Similar comparisons have also been made by Joe (1997, 2005), and the derived conclusions are: (1) ML estimation is much more time-consuming than the IFM method; (2) the IFM method allows one to do inference and modelling starting with univariate and lower-dimensional margins; (3) there is some robustness against misspecification of the dependence structure and there should also be more robustness against outliers or perturbations of the data, compared with the ML method; and (4) the IFM method, rather than the ML method, avoids the sparseness problem to a certain degree, especially if parameters can all be estimated from univariate and bivariate likelihoods. Therefore, the IFM method is selected and described briefly as follows.

Under the assumption that the marginal distributions are continuous with probability density functions \(f_{X} (x;{\varvec{\upalpha}}_{{\mathbf{1}}} )\) and \(f_{Y} (y;{\varvec{\upalpha}}_{{\mathbf{2}}} )\), the joint PDF then becomes

$$f_{X,Y} (x,y;{\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ,\theta ) = c_{\theta } [F_{X} (x;{\varvec{\upalpha}}_{{\mathbf{1}}} ),F_{Y} (y;{\varvec{\upalpha}}_{{\mathbf{2}}} )]f_{X} (x;{\varvec{\upalpha}}_{{\mathbf{1}}} )f_{Y} (y;{\varvec{\upalpha}}_{{\mathbf{2}}} )$$
(3.16)

where FX and FY are univariate CDFs with respective parameter vectors α1, α2, and \(c_{\theta }\) is the density of \(C_{\theta }\) parametrized by a parameter θ, defined as

$$c_{\theta } (u,v) = \frac{{\partial^{2} C_{\theta } (u,v)}}{\partial u\partial v}$$
(3.17)

For the observed bivariate series (x1, y1),…, (xs, ys) with a sample size s, we can consider the two log-likelihood functions for the univariate marginal distribution, i.e.

$$L_{1} ({\varvec{\upalpha}}_{{\mathbf{1}}} ) = \sum\limits_{i = 1}^{s} {\log f_{X} (x_{i} ;{\varvec{\upalpha}}_{{\mathbf{1}}} )}$$
(3.18a)
$$L_{2} ({\varvec{\upalpha}}_{2} ) = \sum\limits_{i = 1}^{s} {\log f_{Y} (y_{i} ;{\varvec{\upalpha}}_{2} )}$$
(3.18b)

and the log-likelihood function for the joint distribution,

$$L(\theta ,{\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ) = \sum\limits_{i = 1}^{s} {\log f_{X,Y} (x_{i} ,y_{i} ;{\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ,\theta )}$$
(3.19)

The IFM method consists of two separate optimizations of univariate likelihoods, followed by an optimization of multivariate likelihood as a function of the dependence parameter vector. More specifically,

  1. (a)

    The log-likelihoods \(L_{1} ({\varvec{\upalpha}}_{{\mathbf{1}}} )\) and \(L_{2} ({\varvec{\upalpha}}_{{\mathbf{2}}} )\) of the two univariate marginal distributions are separately maximized by Eqs. 3.18a and 3.18b to get estimates \({\hat{\varvec{\upalpha}}}_{{\mathbf{1}}}\) and \({\hat{\varvec{\upalpha}}}_{{\mathbf{2}}}\);

  2. (b)

    The function \(L(\theta ,{\hat{\varvec{\upalpha}}}_{{\mathbf{1}}} ,{\hat{\varvec{\upalpha}}}_{{\mathbf{2}}} )\) is maximized over θ to get \(\hat{\theta }\) in Eq. 3.19.

That is, under regularity conditions, \(({\hat{\varvec{\upalpha}}}_{ 1} ,{\hat{\varvec{\upalpha}}}_{ 2} ,\hat{\theta })\) is the solution of

$$(\partial L_{1} /\partial {\varvec{\upalpha}}_{{\mathbf{1}}} ,\partial L_{2} /\partial {\varvec{\upalpha}}_{{\mathbf{2}}} ,\partial L/\partial \theta ) = 0$$
(3.20)

This procedure is computationally simpler than that of estimating all parameters \({\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ,\theta\) simultaneously in Eq. 3.19.
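The two-stage IFM procedure of Eqs. 3.18a-3.20 can be sketched as follows, with synthetic correlated data, gamma margins standing in for the P-III distribution, and a Gumbel copula assumed for the dependence structure.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
q = stats.gamma(3.0, scale=2000.0).rvs(80, random_state=rng)            # hypothetical peaks
w = 0.004 * q + stats.gamma(2.0, scale=5.0).rvs(80, random_state=rng)   # correlated volumes

# Stage 1: maximize the two marginal log-likelihoods separately (Eqs. 3.18a-b)
a1, loc1, sc1 = stats.gamma.fit(q)
a2, loc2, sc2 = stats.gamma.fit(w)
u = np.clip(stats.gamma.cdf(q, a1, loc1, sc1), 1e-9, 1 - 1e-9)
v = np.clip(stats.gamma.cdf(w, a2, loc2, sc2), 1e-9, 1 - 1e-9)

def gumbel_logpdf(u, v, theta):          # log density of the Gumbel copula
    x, y = -np.log(u), -np.log(v)
    A = x**theta + y**theta
    s = A ** (1.0 / theta)
    return (-s + (theta - 1.0) * (np.log(x) + np.log(y)) + x + y
            + (1.0 / theta - 2.0) * np.log(A) + np.log(s + theta - 1.0))

# Stage 2: maximize the copula likelihood (Eq. 3.19) over theta with margins fixed
res = optimize.minimize_scalar(lambda t: -gumbel_logpdf(u, v, t).sum(),
                               bounds=(1.001, 20.0), method="bounded")
print("IFM estimate of the dependence parameter theta:", round(res.x, 3))
```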

3.3.4 Modified IFM Method with Incorporation of Historical Information

Since the current IFM method can only be used for systematic data series, a modified IFM (MIFM) method that incorporates historical and paleoflood information is proposed and described as follows.

Let xi and yi (i = 1,…, s − c) respectively denote the systematic data of the marginal variables (flood peak and volume); gj and pj (j = 1,…, k) respectively denote the k largest floods of the marginal variables (flood peak and volume) with the same years of occurrence. Of the k largest floods, c occurred during the systematic record and m occurred during the pre-gauging period h (k = m + c and h = n − s); X0 (or Y0) is the fixed threshold of the margin exceeded by the k largest flood peaks (or volumes) and not exceeded by any of the remaining n − k flood peaks (or volumes). Furthermore, let fX and fY denote the univariate marginal PDFs, and FX and FY the univariate marginal CDFs of variables X and Y, respectively; fXY denotes the joint PDF.

Referring to Eq. 3.14, the likelihood function with historical floods for joint distributions can be described as

$$\begin{aligned} l(\theta ,{\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ) & = \prod\limits_{i = 1}^{s - c} {f_{XY} (x_{i} ,y_{i} )} \prod\limits_{j = 1}^{k} {f_{XY} (g_{j} ,p_{j} )} [\int\limits_{ - \infty }^{{X_{0} }} {\int\limits_{ - \infty }^{{Y_{0} }} {f_{XY} (x,y)dxdy} } ]^{h - m} \\ & = \prod\limits_{i = 1}^{s - c} {f_{XY} (x_{i} ,y_{i} )} \prod\limits_{j = 1}^{k} {f_{XY} (g_{j} ,p_{j} )} \{ C_{\theta } [F_{X} (X_{0} ),F_{Y} (Y_{0} )]\}^{h - m} \\ \end{aligned}$$
(3.21)

Then, the log-likelihood function for joint distribution can be expressed as:

$$\begin{aligned} L(\theta ,{\varvec{\upalpha}}_{{\mathbf{1}}} ,{\varvec{\upalpha}}_{{\mathbf{2}}} ) & = \sum\limits_{i = 1}^{s - c} {\log c_{\theta } [F_{X} (x_{i} ),F_{Y} (y_{i} )]} + \sum\limits_{j = 1}^{k} {\log c_{\theta } [F_{X} (g_{j} ),F_{Y} (p_{j} )]} \\ & \quad + (h - m)\log C_{\theta } [F_{X} (X_{0} ),F_{Y} (Y_{0} )] + \sum\limits_{i = 1}^{s - c} {\log f_{X} (x_{i} )} + \sum\limits_{j = 1}^{k} {\log f_{X} (g_{j} )} \\ & \quad + \sum\limits_{i = 1}^{s - c} {\log f_{Y} (y_{i} )} + \sum\limits_{j = 1}^{k} {\log f_{Y} (p_{j} )} \\ \end{aligned}$$
(3.22)

In which, the two log-likelihood functions for the univariate marginal distribution are

$$L_{1} ({\varvec{\upalpha}}_{{\mathbf{1}}} ) = \sum\limits_{i = 1}^{s - c} {\log f_{X} (x_{i} )} + \sum\limits_{j = 1}^{k} {\log f_{X} (g_{j} )}$$
(3.23)
$$L_{2} ({\varvec{\upalpha}}_{{\mathbf{2}}} ) = \sum\limits_{i = 1}^{s - c} {\log f_{Y} (y_{i} )} + \sum\limits_{j = 1}^{k} {\log f_{Y} (p_{j} )}$$
(3.24)

Similar to the IFM method, the MIFM method also consists of two separate procedures:

  1. (a)

    The log-likelihoods \(L_{1} ({\varvec{\upalpha}}_{{\mathbf{1}}} )\) and \(L_{2} ({\varvec{\upalpha}}_{{\mathbf{2}}} )\) are separately maximized by Eqs. 3.23 and 3.24 to get estimates \({\hat{\varvec{\upalpha}}}_{ 1}\) and \({\hat{\varvec{\upalpha}}}_{ 2}\);

  2. (b)

    The function \(L(\theta ,{\hat{\varvec{\upalpha}}}_{ 1} ,{\hat{\varvec{\upalpha}}}_{ 2} )\) is maximized by Eq. 3.22 over θ to get \(\hat{\theta }\).

As a consequence, the precious historical information is used to estimate not only the parameters of the marginal distributions but also the dependence parameter of the joint distribution, which is based on the correlation between the marginal variables. The more additional marginal information is provided, the more precise the estimated dependence structure will be.
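A minimal sketch of the second stage of the MIFM method is given below: with the marginal parameters fixed, the Gumbel-copula dependence parameter is estimated from the copula terms of Eq. 3.22, including the censoring term Cθ[FX(X0), FY(Y0)]^(h−m). All probability values and period lengths are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical marginal non-exceedance probabilities (margins already fitted in stage 1)
u_sys = np.array([0.12, 0.35, 0.58, 0.71, 0.44, 0.83, 0.27])   # F_X(x_i), systematic pairs
v_sys = np.array([0.35, 0.20, 0.66, 0.52, 0.48, 0.90, 0.15])   # F_Y(y_i), systematic pairs
u_his = np.array([0.990, 0.996])                               # F_X(g_j), k largest floods
v_his = np.array([0.988, 0.997])                               # F_Y(p_j), k largest floods
u0, v0 = 0.985, 0.985                                          # F_X(X0), F_Y(Y0)
h, m = 300, 2                                                  # pre-gauging length, exceedances

def gumbel_logcdf(u, v, theta):
    return -(((-np.log(u))**theta + (-np.log(v))**theta) ** (1.0 / theta))

def gumbel_logpdf(u, v, theta):
    x, y = -np.log(u), -np.log(v)
    A = x**theta + y**theta
    s = A ** (1.0 / theta)
    return (-s + (theta - 1.0) * (np.log(x) + np.log(y)) + x + y
            + (1.0 / theta - 2.0) * np.log(A) + np.log(s + theta - 1.0))

def neg_loglik(theta):                   # copula part of Eq. 3.22
    return -(gumbel_logpdf(u_sys, v_sys, theta).sum()
             + gumbel_logpdf(u_his, v_his, theta).sum()
             + (h - m) * gumbel_logcdf(u0, v0, theta))

res = minimize_scalar(neg_loglik, bounds=(1.001, 20.0), method="bounded")
print("MIFM estimate of theta:", round(res.x, 3))
```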

3.3.5 Case Study

The Three Gorges reservoir (TGR) in China is selected as an illustrative example. The basin area of the TGR is one million km2, and the annual average discharge and runoff volume at the dam site are 14,300 m3/s and 4510 × 108 m3, respectively. Located on the middle reaches of the Yangtze River, the TGR is the largest water conservancy project in the world, with a normal pool level at an elevation of 175 m. The total storage capacity of the TGR is 393 × 108 m3, of which 221.5 × 108 m3 is flood control storage and 165 × 108 m3 is the conservation regulating storage volume. With 26 hydro-generators installed, the mean annual electricity output of the TGR reaches up to 847 × 108 kW·h. The TGR also plays a key role in the flood prevention of the Yangtze River basin, which is the richest area in China (Li et al. 2010).

3.3.5.1 Systematic Record and Historical Floods

The annual maximum peak discharge (Q), 3-day flood volume (W3), and 15-day flood volume (W15) are available with a systematic record of 128 years (1882–2009, i.e., no systematic data were formally gauged before 1882). Besides the systematic observations, numerous historical flood events were investigated by CWRC (Changjiang Water Resources Commission) in the last century for the design of the Three Gorges Project. The information gathered from gauging authority records, historical documents, archives, flood marks, and stone inscriptions documented the positions of the recorded high water stages. As a result, the eight largest historical floods since 1153 were quantitatively evaluated by CWRC and other relevant units (CWRC 1996).

Using the notation defined previously, the length of the systematic observations is unequivocally given: s = 128 years; since no extraordinary flood occurred during the systematic record, c = 0 and k = m; for the joint distribution of flood peak (Q) and 3-day flood volume (W3), k = m = 8; for the joint distribution of flood peak and 15-day flood volume (W15), k = m = 3; the perception thresholds of peak discharge, 3-day flood volume and 15-day flood volume are X0Q = 80,000 m3/s, X0w3 = 200 × 108 m3 and X0w15 = 780 × 108 m3, respectively; and the pre-gauging period is h = 730 years (i.e., from 1153 to 1882). These data settings are also listed in Table 3.3.

Table 3.3 Data settings for the modified IFM method

3.3.5.2 Parameter Estimation for Marginal Distributions

The empirical probabilities of univariate discontinuous series can be computed by the Weibull formula recommended by MWR (2006)

$$P_{i} = P(x \ge x_{i} ) = \left\{ {\begin{array}{*{20}l} {P_{h} (i) = \frac{i}{n + 1}} \hfill & {i = 1, \cdots ,k} \hfill \\ {P_{s} (i) = P_{h} (k) + (1 - P_{h} (k)) \times \frac{i}{s - c + 1}} \hfill & {i = 1, \cdots ,s - c} \hfill \\ \end{array} } \right.$$
(3.25)

where Pi represents the exceedance probability; Ph(i) is the empirical probabilities of historical floods for i = 1,…, k; Ps(i) is the empirical probabilities of systematic data for i = 1,…, sc; and the meanings of n, k, s, c are the same as those defined in Fig. 3.3.
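For illustration, Eq. 3.25 can be evaluated as follows, using period lengths similar to those of the case study (s = 128, h = 730) and k = 3 historical exceedances as an example.

```python
import numpy as np

s, h, k, c = 128, 730, 3, 0
n = s + h                                                 # total length of the analysed period
P_hist = np.arange(1, k + 1) / (n + 1.0)                  # Eq. 3.25, historical floods
P_sys = P_hist[-1] + (1 - P_hist[-1]) * np.arange(1, s - c + 1) / (s - c + 1.0)
print(P_hist, P_sys[:3])                                  # exceedance probabilities
```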

The parameters of the P-III marginal distributions estimated by the first stage of the MIFM method in Eqs. 3.23 and 3.24 are listed in Table 3.4. A Chi-Square goodness-of-fit test is performed to test the assumption, H0, that the flood magnitudes follow the P-III distribution. Table 3.5 shows that the assumption cannot be rejected at the 5% significance level. The marginal distribution frequency curves of flood peak and flood volumes are drawn in Fig. 3.4, in which the lines represent the theoretical distributions, and the crosses and circles represent the systematic record and the historical flood data, respectively. Figure 3.4 indicates that all the theoretical distributions fit the observed data reasonably well.

Table 3.4 Estimated parameters of P-III marginal distributions for flood peak and volumes by MIFM
Table 3.5 Hypothesis test results of P-III marginal distributions for flood peak and volumes
Fig. 3.4
figure 4

P-III distributions fitted to flood peak and volumes with historical information

3.3.5.3 Empirical Joint Probabilities of Dependence Flood Variables

Empirical (observed) joint probabilities of flood peak (Q) and volume (W) are computed in a manner analogous to that for a single variable. A two-dimensional table is constructed in which the variables X and Y are arranged in descending order. The joint (exceedance) probabilities of the k historical floods and the s − c systematic data are empirically computed separately, and are expressed as

$$\begin{array}{*{20}l} {F(x_{i} ,y_{i} ) = } \hfill \\ {P(X \ge x_{i} ,Y \ge y_{i} ) = \, } \hfill \\ \end{array} \left\{ {\begin{array}{*{20}l} {P_{h} (i) = \frac{{\sum\limits_{l = 1}^{i} {\sum\limits_{p = 1}^{i} {N_{lp} } } }}{n + 1}} \hfill & {i = 1, \ldots ,k} \hfill \\ {P_{s} (i) = P_{h} (k) + (1 - P_{h} (k)) \times \frac{{\sum\limits_{l = 1}^{i} {\sum\limits_{p = 1}^{i} {M_{lp} } } }}{s - c + 1}} \hfill & {i = 1, \ldots ,s - c} \hfill \\ \end{array} } \right.$$
(3.26)

where F(xi, yi) is obtained by arranging the pairs (xi, yi) by either xi or yi; Ph(i) is the empirical joint probability of the historical floods and Nlp is the number of pairs (xj, yj) with xj ≥ xi and yj ≥ yi, i = 1,…, k, 1 ≤ j ≤ i; Ps(i) is the empirical joint probability of the systematic data and Mlp is the number of pairs (xj, yj) with xj ≥ xi and yj ≥ yi, i = 1,…, s − c, 1 ≤ j ≤ i; and n is the total length of the analyzed time period (n = s + h).

3.3.5.4 Identification of Copula

The parameters of marginal distributions are estimated in the first stage of MIFM method. The dependence parameter θ is obtained by maximizing the log-likelihood function of the joint distribution. For Gumbel copula, the estimation results are θ = 16.2524 for the joint distribution of flood peak and 3-day flood volume, and θ = 3.2977 for that of flood peak and 15-day flood volume. For Student copula, the estimation results are (θ = 0.9947, \(\nu\) = 6) for the joint distribution of flood peak and 3-day flood volume, and (θ = 0.8598, \(\nu\) = 5) for that of flood peak and 15-day flood volume. The root mean square errors (RMSE) of Gumbel and Student copulas are listed in Table 3.6. The comparison results show that the Gumbel copula represents the bivariate distribution of correlated flood peak and volumes better than that of Student copula.

Table 3.6 RMSE of Gumbel and student’s copulas and upper TDC estimated by parametric and nonparametric methods

The upper tail dependence coefficients (TDC) of the Gumbel copula (\(\lambda_{U} = 2 - 2^{1/\theta }\)) and the Student t copula (\(\lambda_{U} = 2t_{\nu + 1} \left( { - \sqrt {(\nu + 1)(1 - \theta )/(1 + \theta )} } \right)\)) are computed from the estimated parameters and listed in Table 3.6. The upper TDC can also be estimated nonparametrically, which is much more general since no assumption is made about the copula or the marginal distributions (Poulin et al. 2007). The Log, Sec and CFG estimators of the upper TDC (Coles et al. 1999; Joe et al. 1997; Poulin et al. 2007; Frahm et al. 2005) are respectively determined as follows.

$$\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\lambda }_{U}^{LOG} = 2 - \frac{{\log C_{n} \left( {{{(n - k)} \mathord{\left/ {\vphantom {{(n - k)} n}} \right. \kern-0pt} n},{{(n - k)} \mathord{\left/ {\vphantom {{(n - k)} n}} \right. \kern-0pt} n}} \right)}}{{\log \left( {{{(n - k)} \mathord{\left/ {\vphantom {{(n - k)} n}} \right. \kern-0pt} n}} \right)}},\quad 0 < k < n$$
(3.27)
$$\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\lambda }_{U}^{SEC} = 2 - \frac{{1 - C_{n} \left( {{{(n - k)} \mathord{\left/ {\vphantom {{(n - k)} n}} \right. \kern-0pt} n},{{(n - k)} \mathord{\left/ {\vphantom {{(n - k)} n}} \right. \kern-0pt} n}} \right)}}{{{{1 - (n - k)} \mathord{\left/ {\vphantom {{1 - (n - k)} n}} \right. \kern-0pt} n}}},\quad 0 < k < n$$
(3.28)
$$\overset{\lower0.5em\hbox{$\smash{\scriptscriptstyle\frown}$}}{\lambda }_{U}^{CFG} = 2 - 2\exp \left[ {\frac{1}{n}\sum\limits_{i = 1}^{n} {\log \left( {\sqrt {\log \frac{1}{{U_{i} }}\log \frac{1}{{V_{i} }}} /\log \frac{1}{{\hbox{max} (U_{i} ,V_{i} )^{2} }}} \right)} } \right]$$
(3.29)

in which

$$C_{n} (u,v) = \frac{1}{n}\sum\limits_{i = 1}^{n} {{\mathbf{I}}(\frac{{R_{i} }}{n + 1} \le u,\frac{{S_{i} }}{n + 1} \le v)}$$
(3.30)

where \(C_{n} (u,v)\) is the empirical copula, I denotes the indicator function, and Ri and Si are the ranks of the block maxima xi and yi, respectively. \(\left\{ {(U_{1} ,V_{1} )} \right., \ldots ,\left. {(U_{n} ,V_{n} )} \right\}\) denotes a random sample obtained from the copula C.
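The nonparametric estimators of Eqs. 3.27-3.29, built on the empirical copula of Eq. 3.30, can be sketched as follows; the paired sample is synthetic, and the threshold parameter k is set to 10% of the sample size purely for illustration.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
x = rng.gamma(3.0, 2000.0, 200)                   # hypothetical flood peaks
y = 0.004 * x + rng.gamma(2.0, 5.0, 200)          # hypothetical, correlated volumes
n = len(x)

U = rankdata(x) / (n + 1.0)                       # pseudo-observations R_i/(n+1)
V = rankdata(y) / (n + 1.0)                       # pseudo-observations S_i/(n+1)

def C_emp(u, v):                                  # empirical copula, Eq. 3.30
    return np.mean((U <= u) & (V <= v))

k = int(0.1 * n)                                  # threshold parameter, 0 < k < n
t = (n - k) / n
lam_log = 2.0 - np.log(C_emp(t, t)) / np.log(t)   # Eq. 3.27
lam_sec = 2.0 - (1.0 - C_emp(t, t)) / (1.0 - t)   # Eq. 3.28
lam_cfg = 2.0 - 2.0 * np.exp(np.mean(             # Eq. 3.29
    np.log(np.sqrt(np.log(1/U) * np.log(1/V)) / np.log(1/np.maximum(U, V)**2))))
print(f"LOG = {lam_log:.3f}, SEC = {lam_sec:.3f}, CFG = {lam_cfg:.3f}")
```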

The nonparametric estimates of the upper TDC are calculated and also listed in Table 3.6. The comparison results of Table 3.7 show that the upper TDC of the Gumbel copula is much closer to the nonparametric estimates than that of the Student t copula. This indicates that the Gumbel copula better reproduces the observed tail dependence coefficient, and that the extreme behavior of the Gumbel copula is more similar to that of the sample. Therefore, the Gumbel copula is used to model the dependence between the annual maximum flood peak and volumes in this study.

Table 3.7 Parameters of marginal distributions and copula estimated by different data and methods

3.3.5.5 Copula-Based Conditional Distributions

The conditional flood distributions with historical flood data can be easily derived once the copula-based bivariate flood distribution is constructed. For instance, the conditional distributions of flood volume given that the peak discharge exceeds a certain threshold qX0 can be expressed as

$$\begin{aligned} P(W \le w\left| {Q > q_{X0} } \right.) & = \frac{{P(W \le w,Q > q_{X0} )}}{{P(Q > q_{X0} )}} \\ & = \frac{{F_{Y} (w) - C_{\theta } [F_{X} (q_{X0} ),F_{Y} (w)]}}{{1 - F_{X} (q_{X0} )}} \\ \end{aligned}$$
(3.31a)
$$\begin{aligned} P(W > w\left| {Q > q_{X0} } \right.) & = \frac{{P(W > w,Q > q_{X0} )}}{{P(Q > q_{X0} )}} \\ & = \frac{{1 - F_{X} (q_{X0} ) - F_{Y} (w) + C_{\theta } [F_{X} (q_{X0} ),F_{Y} (w)]}}{{1 - F_{X} (q_{X0} )}} \\ \end{aligned}$$
(3.31b)

where Fx and FY represent the marginal distributions, and θ represents the dependence parameter of the bivariate distribution.

Likewise, the conditional distribution functions of peak discharge given that the flood volume exceeds a certain threshold wY0 can be expressed as

$$\begin{aligned} P(Q \le q\left| {W > w_{Y0} } \right.) & = \frac{{P(Q \le q,W > w_{Y0} )}}{{P(W > w_{Y0} )}} \\ & = \frac{{F_{X} (q) - C_{\theta } [F_{X} (q),F_{Y} (w_{Y0} )]}}{{1 - F_{Y} (w_{Y0} )}} \\ \end{aligned}$$
(3.32a)
$$\begin{aligned} P(Q > q\left| {W > w_{Y0} } \right.) & = \frac{{P(Q > q,W > w_{Y0} )}}{{P(W > w_{Y0} )}} \\ & = \frac{{1 - F_{X} (q) - F_{Y} (w_{Y0} ) + C_{\theta } [F_{X} (q),F_{Y} (w_{Y0} )]}}{{1 - F_{Y} (w_{Y0} )}} \\ \end{aligned}$$
(3.32b)

The historical floods, which usually occurred as extraordinary events, may help characterize the correlation between the variables at high return periods. As a consequence, the incorporation of historical information into bivariate frequency analysis can provide better insight into the dependence structure of the variables. The conditional probabilities accounting for historical floods can provide more comprehensive and adequate information, which is useful in evaluating the flood prevention capability.
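A minimal sketch of evaluating the conditional probabilities of Eqs. 3.31a and 3.31b with a Gumbel copula is given below; the marginal probability values and the dependence parameter are hypothetical placeholders (Eqs. 3.32a-3.32b follow the same pattern with the roles of the margins exchanged).

```python
import numpy as np

def gumbel_cdf(u, v, theta):
    return np.exp(-(((-np.log(u))**theta + (-np.log(v))**theta) ** (1.0 / theta)))

theta = 3.3    # hypothetical dependence parameter
Fq0 = 0.99     # F_X(q_X0): threshold peak taken as the 100-year event
Fw = 0.95      # F_Y(w): flood volume of interest

C = gumbel_cdf(Fq0, Fw, theta)
P_le = (Fw - C) / (1.0 - Fq0)                 # P(W <= w | Q > q_X0), Eq. 3.31a
P_gt = (1.0 - Fq0 - Fw + C) / (1.0 - Fq0)     # P(W >  w | Q > q_X0), Eq. 3.31b
print(P_le, P_gt, P_le + P_gt)                # the two probabilities sum to 1
```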

3.3.5.6 Comparative Study and Discussions

The comparative study and discussions of MIFM and IFM methods are conducted in this section. First, the parameters of marginal distributions (Q, W3, and W15) and copulas are estimated by IFM and MIFM methods, respectively. Table 3.7 shows that the different data and methods lead to different parameter estimation results of both marginal distributions and copula. Second, the quantiles of flood peak (Q), 3-day flood volume (W3) and 15-day flood volume (W15) are estimated by univariate distribution (Chinese design flood guidelines), MIFM and IFM methods, respectively.

The Relative Errors (RE) of T-year quantile estimator are calculated by

$${\text{RE}} = \frac{{\hat{X}_{T} - X_{T} }}{{X_{T} }} \times 100\%$$
(3.33)

where XT is the univariate quantile estimated by univariate distribution (Chinese design flood guidelines) with an incorporation of historical information; \(\hat{X}_{T}\) represents the bivariate quantiles estimated by MIFM method with an incorporation of historical information or by IFM method using systematic records alone.

The relative errors (RE) of flood peak, 3-day flood volume, and 15-day flood volume are calculated and listed in Tables 3.8, 3.9, and 3.10, respectively. The results in these tables indicate that the bivariate quantiles estimated by the MIFM approach are much closer to the univariate quantiles than those estimated by the IFM method. The quantiles estimated by the IFM method are much smaller than those of the Chinese design flood guidelines. The mean relative errors are equal to −5.70, −3.24, and −1.88% for flood peak, 3-day flood volume, and 15-day flood volume, respectively.

Table 3.8 Comparison of quantile Q estimated by univariate and bivariate distributions
Table 3.9 Comparison of quantile W3 estimated by univariate and bivariate distributions
Table 3.10 Comparison of quantile W15 estimated by univariate and bivariate distributions

3.4 Bivariate Design Flood Quantile Selection Using Copulas

To derive the feasible range, a boundary identification method is suggested, which is inspired by the ideas of Chebana and Ouarda (2011) and Volpi and Fiori (2012). Li et al. (2016) estimated the bivariate feasible ranges of flood peak and flood volume suitable for combination on the critical level curve. Two combination methods for estimating unique bivariate flood quantiles, i.e., the equivalent frequency combination (EFC) method and the conditional expectation combination (CEC) method, are proposed based on assumed relationships between u and v (or q and w).

3.4.1 Bivariate Return Period

In conventional univariate analysis, flood events of interest are often defined by return periods. In the bivariate domain, however, it is still debated which method is most suitable for transforming the joint exceedance probability into a bivariate joint return period (JRP). Different JRPs estimated by copula functions have been developed for the case of bivariate flood frequency analysis. Eight types of possible joint events were presented by Salvadori and De Michele (2004) using "OR" and "AND" operators, of which two cases are of the greatest interest in hydrological applications (Shiau et al. 2006; Salvadori and De Michele 2004):

  1. (1)

    (OR case) either Q > q or W > w, i.e.,

    $$E_{or} = \left\{ {Q > q\,{\text{or}}\,W > w} \right\}$$
    (3.34)
  2. (2)

    (AND case) both Q > q and W > w, i.e.,

    $$E_{and} = \left\{ {Q > q\,{\text{and}}\,W > w} \right\}$$
    (3.35)

In simple words: for Eor to happen it is sufficient that either peak discharge Q or flood volume W (or both) exceed the given thresholds; instead, for Eand to happen it is necessary that both Q and W are larger than the prescribed values. Thus, two different JRPs can be defined accordingly (De Michele et al. 2005):

$$T_{or} = \frac{\mu }{{P\left[ {Q > q\,or\,W > w} \right]}} = \frac{\mu }{1 - F(q,w)}$$
(3.36)
$$T_{and} = \frac{\mu }{{P\left[ {Q > q\,and{\kern 1pt} \,W > w} \right]}} = \frac{\mu }{{1 - F_{Q} (q) - F_{W} (w) + F(q,w)}}$$
(3.37)

where μ is the mean inter-arrival time between two consecutive events (in the case of annual maxima μ = 1 year), and F(q, w) = P(Q ≤ q, W ≤ w).

The Kendall JRP was introduced by Salvadori and De Michele (2004) to identify the univariate critical threshold in a multivariate context, which is given by:

$$T_{Ken} = \frac{\mu }{{1 - K_{C} (t)}}$$
(3.38)

where KC is the Kendall’s distribution function associated with the joint cumulative distribution function of the copula’s level curves: KC(t) = P[C(u, v) ≤ t]. It allows for the calculation of the probability that a random point (u, v) in the unit square has a smaller (or larger) copula value than a given critical probability level t. In other words, it is related to the probability of occurrence of an event in the area over the copula level curve of value t.

Different definitions of the multivariate return period are available in the literature, based on regression analysis, bivariate conditional distributions, the survival Kendall distribution function, and structure performance functions. For instance, some studies have focused on a structure-based return period for the design and/or risk assessment of hydrological structures in a bivariate environment (Volpi and Fiori 2014). A comprehensive review of the JRP estimation methods was given by Volpi and Fiori (2014).

The OR return period given in Eq. 3.36 has been extensively applied in multivariate hydrological frequency analysis (e.g., Shiau et al. 2006; Salvadori and De Michele 2004; Chebana and Ouarda 2011; Volpi and Fiori 2012; Li et al. 2013). In this study, we focus on the OR case for quantile estimation in a bivariate context.

3.4.2 Feasible Range Identification for Bivariate Quantile Curve

The critical level curve, as shown in Fig. 3.5, was defined as a bivariate quantile curve by Chebana and Ouarda (2011). As previously stated, for the case of the OR return period, the function that describes the level curve for any given return period T or critical probability level p has two asymptotes, q = qp and w = wp, where \(q_{p} = F_{Q}^{ - 1} (p)\) and \(w_{p} = F_{W}^{ - 1} (p)\) are the quantiles of the marginal distributions for the given probability level p. According to Eq. 3.36, in the bivariate case the choice of an appropriate return period T or critical probability level p for hydraulic structure design leads to infinitely many combinations of flood peak and volume. However, the bivariate flood events with the same value of T or p along the level curve differ greatly not only in their quantile values, but also in their probability of occurrence, which is measured by the joint probability density function (PDF), i.e., f(q, w), evaluated along the critical level curve (Volpi and Fiori 2012). Meanwhile, different combinations of Q and W are generally not equivalent from a practical point of view, although they all satisfy the flood prevention standard. Boundaries (see points B and C in Fig. 3.5) for the selection of design flood peak and volume are therefore necessary, because flood combinations outside these boundaries have unrealistically low occurrence probabilities.

Fig. 3.5
figure 5

Bivariate quantile curve with a critical probability level p

Chebana and Ouarda (2011) proposed a method to decompose the quantile curve in Fig. 3.5 into a proper part (the subset BC) and a naive part (outside the subset BC). They assumed that the naive part is composed of two segments starting at each extremity of the proper part. They also suggested selecting these boundary points according to the empirical version of the curve or as close as possible to the asymptotes (the naive part). Volpi and Fiori (2012) defined the distance of each point along the quantile curve in Fig. 3.5 from its vertex as a random variable s and derived its PDF. The boundary points of the quantile curve are identified so as to exclude a chosen percentage of the probability of the events. They thus also proposed a decomposition of the quantile curve into a naive part and a proper part. However, the procedure presented by Volpi and Fiori (2012) is difficult to apply, because it requires the curvilinear coordinate system [s(x, y), n(x, y)] and the derivation of the distribution of the random variable s. To overcome these limitations, an approach to identify the boundary points (i.e., B and C) of the quantile curve is developed here. A new density function φ(q), defined in terms of a non-curvilinear variable, is used to measure the relative likelihood of flood events.

To derive the new density function with a chosen probability level for decomposing the quantile curve, a joint distribution of annual maximum flood peak (Q) and flood volume (W) should first be built with copula functions. The joint distribution function F(q, w) can be expressed in terms of its marginal distribution functions FQ(q) and FW(w) by using an associated dependence function C, i.e., F(q, w) = C[FQ(q), FW(w)].

It is found that flood peak and volumes are usually upper-tail dependent variables and that the Gumbel copula best reproduces the observed tail dependence coefficient (e.g., Poulin et al. 2007). Therefore, the Gumbel copula is taken as an example to illustrate the developed boundary identification method because of its simple expression and wide application (Li et al. 2013).

For the Gumbel copula function, the relationship of joint distribution Cθ(u, v) and bivariate return period T can be expressed as (μ = 1 for annual maxima flood series):

$$C_{\theta } (u,v) = \exp \{ - [( - \ln \,u)^{\theta } + ( - \ln \,v)^{\theta } ]^{1/\theta } \} = 1 - \frac{1}{T}$$
(3.39)

where θ is the dependence parameter of the Gumbel copula, u = FQ(q), v = FW(w).

Thus, the relationship between u and v with the given bivariate return period T can be derived as:

$$v = \exp \left\{ { - [( - \ln (1 - \frac{1}{T}))^{\theta } - ( - \ln \,u)^{\theta } ]^{1/\theta } } \right\}$$
(3.40)

Substituting u = FQ(q) and v = FW(w) into the above equation yields:

$$F_{W} (w) = \exp \left\{ { - [( - \ln (1 - \frac{1}{T}))^{\theta } - ( - \ln \,F_{Q} (q))^{\theta } ]^{1/\theta } } \right\} = \eta (F_{Q} (q))$$
(3.41)

in which \(\eta (x) = \exp \left\{ { - [( - \ln (1 - \frac{1}{T}))^{\theta } - ( - \ln x)^{\theta } ]^{1/\theta } } \right\}\).

Thus, the relationship between Q and W with the fixed bivariate return period T can be derived as:

$$w = F_{W}^{ - 1} (v) = F_{W}^{ - 1} \left( {\eta (F_{Q} (q))} \right) = \varsigma (q)$$
(3.42)

where \(F_{W}^{ - 1} \left( v \right)\) is the inverse CDF of flood volume W. The above equation reveals that W can be derived by Q if the bivariate return period T is fixed.

It should be noted that other copulas with more complicated formulas may sometimes be needed. For the Frank copula, the Clayton copula and several two-parameter copulas, implicit expressions describing the relationship between Q and W analogous to Eqs. 3.39-3.42 can be derived. For copulas with more complicated expressions, a numerical method should be applied; for example, the unique value of w corresponding to a given q can be obtained by a trial and error method.
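A minimal sketch of Eqs. 3.40-3.42 for the Gumbel copula is given below: for a fixed OR return period T, the flood volume paired with a chosen peak on the level curve follows in closed form; for copulas without such a closed form, v can instead be obtained by root finding on Eq. 3.39. The marginal distributions and the dependence parameter are hypothetical.

```python
import numpy as np
from scipy import stats

theta, T = 3.3, 1000.0
p = 1.0 - 1.0 / T
FQ = stats.gamma(3.0, scale=2000.0)      # hypothetical marginal of peak Q
FW = stats.gamma(2.5, scale=12.0)        # hypothetical marginal of volume W

def eta(u):
    # v on the level curve C_theta(u, v) = 1 - 1/T for a given u (u >= p)
    return np.exp(-(((-np.log(p))**theta - (-np.log(u))**theta) ** (1.0 / theta)))

q_p = FQ.ppf(p)                          # vertical asymptote of the quantile curve
q = 1.1 * q_p                            # a candidate design peak beyond q_p
u = FQ.cdf(q)
v = eta(u)
w = FW.ppf(v)                            # Eq. 3.42: w = F_W^{-1}(eta(F_Q(q)))

C_uv = np.exp(-(((-np.log(u))**theta + (-np.log(v))**theta) ** (1.0 / theta)))
print(f"q = {q:.0f}, w = {w:.2f}, C(u, v) = {C_uv:.6f} (target {p})")
```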

After obtaining the corresponding relationship of the values of w and q for the flood events along the critical level curve, the bivariate joint PDF of w and q can be expressed according to Sklar’s theory as (Nelsen 2006):

$$f(q,w) = c_{\theta } (F_{Q} (q),F_{W} (w)) \cdot \,f_{Q} (q) \cdot \,f_{W} (w)$$
(3.43)

where fQ(q) and fW(w) are univariate PDFs of flood peak and volume, respectively, and \(c_{\theta } (u,v)\) is the density of \(C_{\theta } (u,v)\) and defined as:

$$c_{\theta } = \frac{{\partial^{2} C_{\theta } (u,v)}}{\partial u\partial v}$$
(3.44)

Referring to Eqs. 3.41 and 3.42, the bivariate joint PDF of flood peak and volume can be finally described as the function of the single random variable of flood peak Q for the fixed bivariate return period T, i.e.,

$$f(q,w) = c_{\theta } (F_{Q} (q),\eta (F_{Q} (q))) \cdot \,f_{Q} (q) \cdot \,f_{W} (\varsigma (q))$$
(3.45)

According to Eq. 3.45, there is a curve that describes the relationship between the joint PDF f(q, w) and the flood peak Q for a given bivariate return period T or critical probability level p. Assume that the area between the curve of f(q, w) and the horizontal axis of flood peak Q is A, i.e.,

$$A = \int\limits_{{q_{p} }}^{ + \infty } {f(q,w)dq} = \int\limits_{{q_{p} }}^{ + \infty } {c_{\theta } (F_{Q} (q),\eta (F_{Q} (q))) \cdot \,f_{Q} (q) \cdot \,f_{W} (\varsigma (q))\,dq}$$
(3.46)

where qp represents univariate design value of flood peak, i.e., \(q_{p} = F_{Q}^{ - 1} (p)\), which is chosen as the lower bound of flood peak in the estimation of the bivariate design flood values.

As f(q, w) is a joint density function of q and w, the area A does not equal 1 when only q is taken as the variable of integration (i.e., A ≠ 1). A new density function φ(q), normalized by the area A so that it has the proper characteristics of a density, is constructed and expressed as follows:

$$\varphi (q) = \frac{f(q,w)}{A} = \frac{f(q,w)}{{\int_{{q_{p} }}^{ + \infty } {f(q,w)dq} }}$$
(3.47)

Obviously, there is a one-to-one correspondence between the density function φ(q) and the bivariate PDF f(q, w). The density function φ(q) varies along the horizontal axis and satisfies \(\int_{{q_{p} }}^{ + \infty } {\varphi (q)dq} = 1\).

As previously stated, the bivariate design flood combinations near the upper and lower bounds of the quantile curve have a lower occurrence probability than those near the middle of the quantile curve. As a consequence, the bivariate PDF f(q, w) of a bivariate design flood combination near the upper and lower bounds of the quantile curve is smaller than that near the middle of the quantile curve. The density function φ(q) has the same property as the bivariate PDF f(q, w). As the design flood peak (or flood volume) varies from the lower bound qp towards infinity, the density function φ(q) increases to its maximum value and then decreases gradually, as shown in Fig. 3.6. The vertex of the density function φ(q), describing the full dependence (Chebana and Ouarda 2011; Volpi and Fiori 2012) between peak and volume, has the highest density. In other words, this is the most likely bivariate design flood event.

Fig. 3.6
figure 6

Relationship between density function φ(q) and flood peak Q

Once the density function φ(q) along Q is defined by Eq. 3.47, we can evaluate the lower and upper bounds that contain a probability of 1 − ε, for a given probability level ε. The quantiles of the lower and upper bounds (qB and qC) are specified respectively by (Volpi and Fiori 2012):

$$\int\limits_{{q_{p} }}^{{q_{B} }} {\varphi (q)dq = \alpha_{1} }$$
(3.48)
$$\int\limits_{{q_{p} }}^{{q_{C} }} {\varphi (q)dq = 1- \alpha_{ 2} }$$
(3.49)

where α1 + α2 = ε. The lower and upper bounds qB and qC identify a feasible range on the quantile curve, bounded by the points of coordinates (qB, ζ(qB)) and (qC, ζ(qC)), that excludes the ε percentage in the probability of the critical events. The probability levels α1 and α2 can be arbitrarily chosen, taking account of the specific problem under investigation (Volpi and Fiori 2012).
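The boundary identification of Eqs. 3.45-3.49 can be sketched numerically as follows: the joint density along the level curve is normalized into φ(q) and the bounds qB and qC are located by cumulative integration with α1 = α2 = ε/2. The marginal distributions, dependence parameter and return period are hypothetical.

```python
import numpy as np
from scipy import stats

theta, T, eps = 3.3, 1000.0, 0.05
p = 1.0 - 1.0 / T
FQ = stats.gamma(3.0, scale=2000.0)      # hypothetical marginal of peak Q
FW = stats.gamma(2.5, scale=12.0)        # hypothetical marginal of volume W

def gumbel_pdf(u, v):                    # Gumbel copula density c_theta(u, v)
    x, y = -np.log(u), -np.log(v)
    A = x**theta + y**theta
    s = A ** (1.0 / theta)
    return np.exp(-s) * (x * y)**(theta - 1) / (u * v) * A**(1/theta - 2) * (s + theta - 1)

def eta(u):                              # level curve C_theta(u, v) = p
    return np.exp(-(((-np.log(p))**theta - (-np.log(u))**theta) ** (1.0 / theta)))

q_p = FQ.ppf(p)                          # vertical asymptote of the quantile curve
q = np.linspace(q_p * 1.0001, 3.0 * q_p, 4000)
dq = q[1] - q[0]
u = FQ.cdf(q)
v = eta(u)
w = FW.ppf(v)
f_qw = gumbel_pdf(u, v) * FQ.pdf(q) * FW.pdf(w)     # Eq. 3.45 along the curve

A_area = (f_qw * dq).sum()               # Eq. 3.46 (Riemann sum)
phi = f_qw / A_area                      # Eq. 3.47
cum = np.cumsum(phi) * dq
q_B = q[np.searchsorted(cum, eps / 2.0)]         # Eq. 3.48 with alpha1 = eps/2
q_C = q[np.searchsorted(cum, 1.0 - eps / 2.0)]   # Eq. 3.49 with alpha2 = eps/2
print(f"feasible range of the design peak: [{q_B:.0f}, {q_C:.0f}] m^3/s")
```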

3.4.3 Bivariate Flood Quantile Selection

For a given bivariate return period T, there are countless combinations of u and v that satisfy Eq. 3.39. To derive the design values of flood peak q and flood volume w, the unique combination of u and v (or q and w) should be determined. Hence besides Eq. 3.39, one more equation that can establish the relationship between u and v (or q and w) is necessary. Two combination methods were proposed to derive the quantiles of flood peak and flood volume for given multivariate return periods, and they are now outlined.

3.4.3.1 Equivalent Frequency Combination Method

With a given bivariate return period T, we assume that the flood peak and flood volume have the same probability of occurrence, i.e., u = v (or FQ(q) = FW(w)). This assumption is usually adopted as a uniform procedure for the derivation of design flood values and design flood hydrographs in China (MWR 2006; Xiao et al. 2008, 2009; Chen et al. 2010). Then, the design frequency of the bivariate equivalent frequency combination can be obtained by jointly solving the equation u = v and Eq. 3.39.

Taking the Gumbel copula for example, the relationship between u and v with the given bivariate return period T is described in Eq. 3.39. Based on the assumption that u = v, the probabilities of occurrence of flood peak and volume (i.e., u and v) can be estimated by the solution of the following equation.

$$u = v = (1 - \frac{1}{T})^{\varsigma }$$
(3.50)

where \(\varsigma { = 2}^{{ - \frac{1}{\theta }}}\), and θ is the dependence parameter of the Gumbel copula.

Consequently, the design value of bivariate equivalent frequency combination can be derived by the inverse function of marginal distributions:

$$q = F_{Q}^{( - 1)} (u)$$
(3.51a)
$$w = F_{W}^{( - 1)} (v)$$
(3.51b)

3.4.3.2 Conditional Expectation Combination Method

Since the flood peak Q and flood volume W are dependent variables, one may wish to predict the value of W based on an observed value of Q. Let g(Q) be a predictor, with \(g \in N = \{\)all Borel functions g with \(E[g(Q)]^{2} < \infty \}\). Each predictor is assessed by the "mean squared prediction error" E[W − g(Q)]2. The conditional expectation E(W|Q) is the best predictor of W in the sense that

$$E\left[ {W - E(W|Q)} \right]^{2} = \mathop {\hbox{min} }\limits_{g \in N} E\left[ {W - g(Q)} \right]^{2}$$
(3.52)

Herein, during a flood event in which the flood peak Q = q occurs, the conditional expectation \(E(w|q)\) is used to estimate the flood volume, which can be derived by

$$E(w|q) = \int\limits_{ - \infty }^{ + \infty } {wf_{W|Q} (w)dw}$$
(3.53)

where fW|Q(w) is the density function of the conditional CDF FW|Q(w) and defined as (Zhang and Singh 2006).

$$f_{W|Q} (w) = \frac{f(q,w)}{{f_{Q} (q)}} = \frac{{c_{\theta } (u,v)f_{Q} (q)f_{W} (w)}}{{f_{Q} (q)}} = c_{\theta } (u,v)f_{W} (w)$$
(3.54)

Hence, Eq. 3.53 can be expressed by

$$E(w|q) = \int\limits_{ - \infty }^{ + \infty } {wf_{W|Q} (w)} dw = \int_{ - \infty }^{ + \infty } {wc_{\theta } (u,v)f_{W} (w)dw = \int_{0}^{1} {F_{W}^{ - 1} (v)} } c_{\theta } (u,v)dv$$
(3.55)

where \(F_{W}^{ - 1} ( \cdot )\) is the inverse CDF of W.

Then, the flood peak q and E(w|q) will be the conditional expectation combination if the following equations are satisfied

$$\left\{ {\begin{array}{*{20}l} {u = F_{Q} (q)} \hfill \\ {v = F_{W} [E(w|q)]} \hfill \\ {\frac{1}{{1 - C_{\theta } (u,v)}} = T} \hfill \\ \end{array} } \right.$$
(3.56)

The above equations can be solved by a trial and error method with different values of q.
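A minimal sketch of the CEC computation is given below: E(w|q) is evaluated by numerical integration of Eq. 3.55 over v, and q is varied until the pair meets the target return period in Eq. 3.56. The marginal distributions and the dependence parameter are hypothetical, and a simple grid search stands in for the trial-and-error procedure.

```python
import numpy as np
from scipy import stats

theta, T = 3.3, 1000.0
FQ = stats.gamma(3.0, scale=2000.0)      # hypothetical marginal of peak Q
FW = stats.gamma(2.5, scale=12.0)        # hypothetical marginal of volume W

def c_theta(u, v):                       # Gumbel copula density
    x, y = -np.log(u), -np.log(v)
    A = x**theta + y**theta
    s = A ** (1.0 / theta)
    return np.exp(-s) * (x * y)**(theta - 1) / (u * v) * A**(1/theta - 2) * (s + theta - 1)

def C_theta(u, v):                       # Gumbel copula CDF
    return np.exp(-(((-np.log(u))**theta + (-np.log(v))**theta) ** (1.0 / theta)))

v_grid = np.linspace(1e-6, 1.0 - 1e-6, 20001)
dv = v_grid[1] - v_grid[0]
w_grid = FW.ppf(v_grid)                  # F_W^{-1}(v) on the integration grid

def E_w_given_q(q):                      # Eq. 3.55 by numerical integration over v
    u = FQ.cdf(q)
    return (w_grid * c_theta(u, v_grid) * dv).sum()

q_p = FQ.ppf(1.0 - 1.0 / T)
q_grid = np.linspace(q_p * 1.0001, 1.5 * q_p, 200)        # trial values of q
jrp = np.array([1.0 / (1.0 - C_theta(FQ.cdf(q), FW.cdf(E_w_given_q(q)))) for q in q_grid])
q_cec = q_grid[np.argmin(np.abs(jrp - T))]                # Eq. 3.56 by trial and error
print(f"CEC design values: q = {q_cec:.0f} m^3/s, w = {E_w_given_q(q_cec):.1f}")
```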

3.4.4 Case Study

3.4.4.1 Bivariate Quantile Curve and Feasible Range Identification

The design flood return period of the Geheyan reservoir, i.e., T = 1000 years, is selected as the bivariate return period, and T = 200 years is also chosen for comparison. The bivariate quantile curves of the two return periods are shown in Fig. 3.7. Even though the Gumbel copula model is symmetric, the probability density function φ(q) is not symmetrical because of the difference between the marginal distributions.

Fig. 3.7
figure 7

Bivariate quantile curve of joint distribution of flood peak and 7-day flood volume

The upper and lower bounds on the level curve are estimated numerically by solving Eqs. 3.48 and 3.49, and assuming for simplicity (although other assumptions are possible) α1 = α2 = ε/2, with ε = 0.05. The upper and lower bounds are denoted as B1 and C1, respectively, in Fig. 3.7. It is found that the bounds are close to the horizontal asymptote (i.e., w7 = 61.49 × 108 m3 for T = 1000 and w7 = 50.23 × 108 m3 for T = 200) and vertical asymptote (i.e., qp = 22,800 m3/s for T = 1000 and qp = 19,300 m3/s for T = 200) due to the small value assumed for the probability level ε. The upper and lower bounds are also calculated by the boundary identification method proposed by Volpi and Fiori (2012). The results are also presented in Table 3.11, and the derived bounds are denoted as B2 and C2, as shown in Fig. 3.7. It is shown that the bounds estimated by the proposed method and that proposed by Volpi and Fiori (2012) are very similar.

Table 3.11 Comparison of the lower and upper bounds of the quantile curve

3.4.4.2 Estimation of Bivariate Flood Quantiles

The bivariate EFC and CEC methods are used to estimate the flood peak and 7-day flood volume quantiles with return periods of T = 1000 and T = 200 years, respectively. For comparison, the univariate flood quantiles (called marginal quantiles by Chebana and Ouarda 2011) are estimated from the marginal distributions, assuming that the univariate return periods (TQ and TW) are equal to the bivariate return period (i.e., TQ = TW = T). The univariate flood quantiles can be obtained from the equations \(q = F_{Q}^{ - 1} (p) = F_{Q}^{ - 1} (1 - \frac{1}{T})\) and \(w = F_{W}^{ - 1} (p) = F_{W}^{ - 1} (1 - \frac{1}{T})\). The results of the component-wise excess realization and the most likely realization proposed by Salvadori et al. (2011) are also estimated. The estimation results of the bivariate and univariate quantiles are listed in Table 3.12. It is shown that the design values of the bivariate quantiles are larger than those of the univariate quantiles. The quantiles estimated by the four bivariate event selection methods are also shown in Fig. 3.7: the estimates of the EFC method are denoted as point E, the quantiles estimated by the CEC method as point F, the events of the component-wise excess realization as point W, and the events of the most likely realization as point L. From Fig. 3.7, we find that the joint design values estimated by the four event-selection methods all lie within the feasible region. Consequently, the two proposed methods and the selection approaches of Salvadori et al. (2011) can all be used to derive unique flood quantiles; they respect the inherent behavior of hydrologic events and have a statistical basis to some degree. It can be seen from Table 3.12 and Fig. 3.7 that the estimated events of the EFC method and of the most likely realization are similar. The bivariate EFC results have a larger flood volume and smaller flood peak than the bivariate CEC results, while the results estimated by the component-wise excess realization have a larger flood peak and smaller flood volume than the other three methods.

Table 3.12 Design flood values and corresponding highest water levels estimated by bivariate quantile combinations and univariate distribution

3.4.4.3 Design Flood Hydrograph Based on Joint Distribution

The two combination methods are applied to derive the design flood hydrograph (DFH), and the resulting highest reservoir water level is selected as an index to evaluate the effects of different hydrological loads on the structure. The DFH for a dam is the flood of suitable probability and magnitudes adopted to ensure safety of the dam in accordance with appropriate design standards. The annual maximum flood hydrograph of 1997, which has a high peak and large volume with a posterior-peak shape, is selected as a typical flood hydrograph (TFH). The DFH with bivariate combinations is amplified from a TFH by the following method (Xiao et al. 2008):

$$DFH(t) = (TFH(t) - Q_{TFH} ) \times (w/DT - q)/(W_{TFH} /DT - Q_{TFH} ) + q$$
(3.57)

where DFH(t) and TFH(t) are the flood discharges of the DFH and TFH at time t, respectively; QTFH is the flood peak discharge of the TFH; WTFH is the 7-day flood volume of the TFH over the flood duration DT; and q and w are the flood peak and 7-day flood volume of the bivariate design flood combination, respectively. Nevertheless, other DFH generation methods based on flood peak and volume are also available and can be applied with the bivariate design value combinations.
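A minimal sketch of the amplification in Eq. 3.57, using a hypothetical daily TFH and hypothetical bivariate design values q and w, is as follows; the amplified hydrograph reproduces the design peak and the design 7-day volume exactly.

```python
import numpy as np

TFH = np.array([5000., 9000., 22000., 18000., 12000., 8000., 6000.])  # m^3/s, daily means
DT = 7 * 86400.0                        # flood duration (s)
Q_TFH = TFH.max()                       # peak of the TFH (m^3/s)
W_TFH = TFH.sum() * 86400.0             # 7-day volume of the TFH (m^3)

q, w = 24000.0, 62.0e8                  # hypothetical bivariate design peak and volume

DFH = (TFH - Q_TFH) * (w / DT - q) / (W_TFH / DT - Q_TFH) + q          # Eq. 3.57
print(DFH.round(0), DFH.max(), DFH.sum() * 86400 / 1e8)                # check peak and volume
```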

The DFHs of the 1000-year and 200-year return periods are constructed with the bivariate EFC method and the bivariate CEC method, respectively, as shown in Fig. 3.8. It is found in Fig. 3.8 that only minor differences exist between the DFHs estimated by the EFC and CEC methods. This is because the differences between the bivariate design values lie within a small range. Volpi and Fiori (2012) found that the feasible range on a p-level curve strongly depends on the correlation coefficient of Q and W. In the limiting case of full dependence, the level curve reduces to its vertex and the width of the feasible range tends to 0 (Volpi and Fiori 2012). Since the Kendall correlation coefficient between flood peak and 7-day volume for the Geheyan reservoir equals 0.66, the differences between the quantiles estimated by the EFC and CEC methods are relatively small in this case study.

Fig. 3.8
figure 8

DFHs derived by EFC method and CEC method

The DFHs rescaled by the univariate distribution design values and by the two realizations proposed by Salvadori et al. (2011) are also derived from the TFH by Eq. 3.57. These DFHs are routed through the Geheyan reservoir with the initial water level set to the flood control limiting water level (192.2 m). The corresponding highest reservoir water levels (Zmax) are calculated and listed in Table 3.12.

It is shown in Table 3.12 that the design values of flood peak and 7-day flood volume obtained by the univariate distribution method are both smaller than those obtained by the four bivariate methods. The resulting Zmax of the univariate method is relatively lower than those of the bivariate approaches. Since flood events are naturally multivariate phenomena and flood peak and flood volume are mutually correlated, the quantiles estimated by the bivariate distribution are more rational than those estimated by the univariate distribution (Chebana and Ouarda 2011).

The comparison results listed in Table 3.12 also show that Zmax obtained by bivariate EFC method is larger than that obtained by the other three bivariate methods, while the component-wise excess method reaches the lowest Zmax. The results of Zmax calculated by most-likely realization are a little lower than those of the EFC method, and the CEC method obtains a slightly higher Zmax than the component-wise excess method. Comparing the results of 200-year and 1000-year return period, it is found that the differences among the four bivariate methods decrease as the return period increases. The water level reaches 202.97 m by the EFC method and is slightly higher than other methods for the 1000-year return period. Since the Geheyan reservoir has a large amount of flood control storage with annual regulation ability, the design flood volume is relatively more important than peak discharge for flood prevention safety. As a consequence, the bivariate EFC method with slightly larger 7-day flood volume is safer for reservoir design than other methods.

3.5 Conclusion

Based on the bivariate joint distributions of annual maximum flood occurrence dates and magnitudes, and of flood peaks and volumes, a flood frequency analysis model incorporating historical floods is established based on the Gumbel-Hougaard (GH) copula. The modified inference function for margins (MIFM) method and the quantile curve boundary identification method are developed. The following conclusions are drawn from this chapter:

  1. (1)

    The Von Mises and Pearson Type III distributions can fit observed data series very well. The goodness-of-fit tests indicate a good agreement between observed and theoretical probabilities for both marginal and joint distributions.

  2. (2)

    The proposed MIFM method may reduce the uncertainties of parameter estimation in flood frequency analysis, since the historical floods have been taken into account.

  3. (3)

    The quantile combination methods provide a simple but effective way for bivariate quantile estimation with a given bivariate return period. The results illustrate that the joint design values estimated by the two proposed combination methods are within the feasible region, and the equivalent frequency combination method performs satisfactorily.