
1 Introduction

Detailed information on random yet mutually correlated flood parameters such as flood duration, flood volume, and peak flow is crucial for the design, management, and planning of hydrological structures. A number of flood frequency analysis [1] methodologies have been developed to summarize flood characteristics and find their correlations in order to estimate the severity of flood events using univariate [2–4] and multivariate techniques [5–8]. Both analyses require many restrictive assumptions [5]. However, multivariate analysis can present the crucial flood characteristics as a joint cumulative distribution function (CDF) and probability density function (PDF). Therefore, multivariate analyses have become increasingly popular in recent years for flood frequency analysis [6–8].

The best-fitting marginal distributions for the aforementioned flood parameters do not necessarily come from the same family of probability distributions. This has encouraged the introduction of the Copula concept [9, 10] into flood frequency analysis [11–13] to model the dependence among the flood parameters independently of the type of marginal distributions. Such a joint distribution model is therefore not as restricted as traditional flood frequency analyses. A Copula connects the univariate marginals to their full multivariate distribution and gives more freedom than traditional bivariate models by accommodating various marginal distributions. Therefore, Copula-based flood frequency analysis has emerged as a better option than conventional ones, and its empirical joint distribution has been reported to be superior to the standard joint parametric distribution [14, 15]. Copula models have been successfully applied in many fields including survival analysis [16–18], actuarial science [19, 20], and finance [21, 22], though the number of applications is still rather limited at this stage.

Generally, Copulas can be parameterized by one or two parameters. The Gaussian Copula, a member of the Copula family, can be computed and simulated easily and can also be readily extended to arbitrary dimensions. Moreover, it is uniquely defined by a correlation matrix and therefore only requires calculating pairwise correlations. The Gaussian Copula was used in this study to estimate the joint CDF and bivariate joint return period of three flood parameters of the Johor River in Malaysia.

2 Materials and Methods

2.1 Case Study

The Johor River (Fig. 1) is located in the south of Peninsular Malaysia, covering an area of 2700 km2. The topography of the catchment is undulating, but quite steep in the upstream reaches. It has a tropical climate with a mean annual rainfall of 2470 mm, mean air temperature of 28.5 °C, and mean relative humidity of 85 %. This river was selected because flooding is a recurrent phenomenon in the basin. This study utilized 45 years of hourly stream flow data of the Johor River measured at the Rantau Panjang gauging station (01° 46′ 50″N and 103° 44′ 45″E) by the Department of Irrigation and Drainage (DID), Malaysia.

Fig. 1 Location of Johor River in the south of Peninsular Malaysia

2.2 Determination of Flood Characteristics

The aforementioned hourly river discharge data were utilized to determine the annual flood peaks and their corresponding volumes and durations. The start and end of each flood event were marked using the method of [23, 24]. The peak flow \((Q_{p} )\) was determined using the flood duration frequency approach [25], where the peak flow corresponds to the maximum discharge in each water year. The flood volume (V) is approximately the total water volume, and the flood duration (D) is the time elapsed during the flood event.
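To make the three quantities concrete, the following minimal sketch computes them for a single event from an hourly discharge series. It is an illustration under stated assumptions, not the procedure of [23, 24]: the event window (`start`, `end`) is taken as given, no base flow is separated, the volume is converted to a depth in mm over a catchment area, and the function name, its parameters, and the synthetic hydrograph are all hypothetical. The 2700 km2 figure is used only as a placeholder area.

```python
import numpy as np

def flood_characteristics(discharge_m3s, start, end, area_km2):
    """Peak flow, volume (as depth in mm), and duration of one flood event.

    discharge_m3s: hourly discharge series (m3/s)
    start, end:    hypothetical indices of the first and last hour of the event
    area_km2:      catchment area used to express the volume as a depth
    """
    event = np.asarray(discharge_m3s[start:end + 1], dtype=float)
    peak = event.max()                       # Q_p in m3/s
    duration_h = len(event)                  # D in hours (one value per hour)
    volume_m3 = event.sum() * 3600.0         # each hourly value lasts 3600 s
    volume_mm = volume_m3 / (area_km2 * 1e6) * 1000.0  # depth over the catchment
    return peak, volume_mm, duration_h

# Synthetic triangular hydrograph, for illustration only (not observed data)
q = np.concatenate([np.linspace(50, 500, 100), np.linspace(500, 50, 100)])
peak, vol, dur = flood_characteristics(q, 0, len(q) - 1, area_km2=2700.0)
print(f"Q_p = {peak:.0f} m3/s, V = {vol:.1f} mm, D = {dur} h")
```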

2.3 Modeling Peak Flow, Flood Duration, and Flood Volume

The distribution of the flood variables was modeled using Generalized Pareto, Pearson, Exponential, Beta, and Generalized Extreme Value distributions. A goodness-of-fit test measures how well a random sample fits a theoretical PDF. In this study, the Kolmogorov–Smirnov (K-S) goodness-of-fit test was conducted at a 5 % significance level.
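The fitting and testing step can be sketched as follows. This is a hedged illustration in Python with SciPy, not the code used in the study: the sample is synthetic, only three of the candidate distributions are shown, and the parameters are fitted by maximum likelihood. Note that K-S p-values are only approximate when the parameters are estimated from the same sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for 45 annual peak flows (m3/s); not the Johor River data
peaks = stats.genpareto.rvs(c=0.1, loc=70.0, scale=180.0, size=45, random_state=rng)

candidates = {
    "Generalized Pareto": stats.genpareto,
    "Generalized Extreme Value": stats.genextreme,
    "Exponential": stats.expon,
}

results = {}
for name, dist in candidates.items():
    params = dist.fit(peaks)                               # maximum-likelihood fit
    stat, p_value = stats.kstest(peaks, dist.cdf, args=params)
    results[name] = (stat, p_value)

# Smaller K-S statistic D means a closer fit; reject at 5 % if p <= 0.05
for name, (stat, p_value) in sorted(results.items(), key=lambda kv: kv[1][0]):
    verdict = "not rejected" if p_value > 0.05 else "rejected"
    print(f"{name}: D = {stat:.3f}, p = {p_value:.3f} ({verdict} at 5 %)")
```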

2.4 Copula Function

A Copula is a joint distribution function of uniform random variables [26], which can be expressed as:

$$C:[0,1]^{n} \to [0,1]$$

Within the unit hypercube, the probability assigned to every n-dimensional hyperrectangle has to be non-negative. Sklar’s theorem [9] links Copulas to multivariate distributions: each multivariate distribution \(F(t_{1} , \ldots ,t_{n} )\) can be represented through a Copula as:

$$F(t_{1} , \ldots ,t_{n} ) = C(F_{{t_{1} }} (t_{1} ), \ldots ,F_{{t_{n} }} (t_{n} ))$$
(1)

where \(F_{{t_{i} }} (t_{i} )\) is the ith one-dimensional margin of the multivariate distribution. The Copula C is unique when the marginal distributions are continuous. Nelsen [10] constructed Copulas from distribution functions as:

$$C(u) = C(u_{1} , \ldots ,u_{n} ) = F\left( {F_{{t_{1} }}^{ - 1} (u_{1} ), \ldots ,F_{{t_{n} }}^{ - 1} (u_{n} )} \right)$$
(2)

If the corresponding dependence of a bivariate Copula is symmetrical, it can be expressed as:

$$C(u,v) = C(1 - u,1 - v) - 1 + u + v$$
(3)

When the Copula density is symmetrical with the secondary diagonal of the unit square, i.e., u = 1 − v, the Copula density, c, fulfills the following condition:

$$c(u,v) = c(1 - u,1 - v)$$
(4)

2.5 Gaussian (Normal) Copula

The normal Copula [27, 28] takes the form of:

$$C(u,v;\theta ) =\Phi _{G} (\Phi ^{ - 1} (u),\Phi ^{ - 1} (v);\theta )$$
(5)
$$= \int\limits_{{ - {\infty }}}^{{\Phi ^{ - 1} (u)}} {\int\limits_{{ - {\infty }}}^{{\Phi ^{ - 1} (v)}} {\frac{1}{{2\pi (1 - \theta^{2} )^{1/2} }}} } \exp \left\{ { - \frac{{x^{2} - 2\theta xy + y^{2} }}{{2(1 - \theta^{2} )}}} \right\}{\text{d}}x{\text{d}}y$$
(6)

where \(\Phi ^{ - 1} ( \cdot )\) is the inverse of the standard normal CDF \(\Phi ( \cdot )\), \(\Phi _{G}\) is the bivariate standard normal CDF with correlation \(\theta\), and \(\theta\) is the linear correlation coefficient between \(\Phi ^{ - 1} (u)\) and \(\Phi ^{ - 1} (v)\), restricted to the interval (−1, 1). Its estimation is explained below.
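Equations 5 and 6 can be evaluated numerically without writing the double integral by hand, because the integral is the bivariate standard normal CDF. The sketch below assumes Python with SciPy; the function name `gaussian_copula_cdf` is hypothetical, and inputs are assumed to lie strictly between 0 and 1.

```python
from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u, v, theta):
    """C(u, v; theta) for the bivariate Gaussian copula, with -1 < theta < 1."""
    x, y = norm.ppf(u), norm.ppf(v)            # map uniforms to normal scores
    cov = [[1.0, theta], [theta, 1.0]]         # correlation matrix of the scores
    return float(multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([x, y]))

# With theta = 0 the variables are independent, so C(u, v) is close to u * v
print(gaussian_copula_cdf(0.5, 0.5, 0.0))
# Positive dependence raises the joint non-exceedance probability
print(gaussian_copula_cdf(0.5, 0.5, 0.5))
```

The numerical integration inside SciPy is approximate, so results should be compared with a tolerance rather than for exact equality.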

2.6 Parameter Estimation of Copulas

The inference functions for margins (IFM) method, implemented in MATLAB, was used in this research to determine the Copula parameter \((\theta )\), which measures the degree of association between two univariate CDFs. The method has two steps:

  1. Using the log-likelihood functions of the two margins, estimate \(\alpha\) and \(\beta\) for the PDFs \(f_{x} (x;\alpha )\) and \(f_{y} (y;\beta )\), respectively. The two parameters may be vectors, \(\alpha_{1} ,\alpha_{2} , \ldots ,\alpha_{i} , \ldots ,\alpha_{m} ,\;i \in [1,m]\) and \(\beta_{1} ,\beta_{2} , \ldots ,\beta_{j} , \ldots ,\beta_{n} ,\;j \in [1,n]\), respectively.

  2. Use the estimated \(\alpha\) and \(\beta\) to maximize the full log-likelihood function with respect to θ through Eq. 7:

$$\begin{aligned} \ln \,L[f_{x,y} (x,y;\alpha ,\beta ,\theta )] & = \sum\limits_{k = 1}^{N} {\ln \,c_{\theta } \left( {F_{x} (x_{k} ;\alpha ),F_{y} (y_{k} ;\beta )} \right)} \\ & \quad + \sum\limits_{k = 1}^{N} {\left[ {\ln f_{x} (x_{k} ;\alpha ) + \ln f_{y} (y_{k} ;\beta )} \right]} \\ \end{aligned}$$
(7)

where N is the number of observed flood events and \(c_{\theta }\) is the density of the Copula \(C_{\theta }\). The accepted range of the Gaussian Copula parameter is \(- 1 < \theta < 1\).
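The two IFM steps can be sketched in Python as follows. This is an illustration only and not the MATLAB code used in the study: the data are synthetic, Gumbel margins are a stand-in chosen because they fit robustly, and the function names are hypothetical. Because the margins are held fixed in step 2, only the Copula term of the log-likelihood depends on θ.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def gaussian_copula_loglik(theta, u, v):
    """Sum of log Gaussian-copula densities at the points (u, v)."""
    x, y = stats.norm.ppf(u), stats.norm.ppf(v)
    r2 = 1.0 - theta**2
    log_c = (-0.5 * np.log(r2)
             - (theta**2 * (x**2 + y**2) - 2.0 * theta * x * y) / (2.0 * r2))
    return log_c.sum()

def ifm_fit(x, y, dist_x, dist_y):
    # Step 1: fit each margin separately by maximum likelihood
    alpha = dist_x.fit(x)
    beta = dist_y.fit(y)
    # Step 2: hold the margins fixed and maximize the copula term over theta
    u = np.clip(dist_x.cdf(x, *alpha), 1e-10, 1 - 1e-10)
    v = np.clip(dist_y.cdf(y, *beta), 1e-10, 1 - 1e-10)
    res = minimize_scalar(lambda t: -gaussian_copula_loglik(t, u, v),
                          bounds=(-0.999, 0.999), method="bounded")
    return alpha, beta, res.x

# Synthetic sample with known dependence (theta = 0.6); not observed flood data
rng = np.random.default_rng(42)
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
uv = stats.norm.cdf(z)
x = stats.gumbel_r.ppf(uv[:, 0], loc=250.0, scale=100.0)
y = stats.gumbel_r.ppf(uv[:, 1], loc=100.0, scale=40.0)

alpha, beta, theta_hat = ifm_fit(x, y, stats.gumbel_r, stats.gumbel_r)
print(f"estimated theta = {theta_hat:.3f}")
```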

2.7 Bivariate Joint Return Periods

For the bivariate case, the joint return period can be characterized in two ways: (i) the return period for X ≥ x AND Y ≥ y, denoted by \(T_{x,y}\); and (ii) the return period for X ≥ x OR Y ≥ y, denoted by \(T_{x,y}^{\prime }\). The AND and OR joint return periods of Copula-based flood events can be expressed as follows [13, 29]:

$$\begin{aligned} T_{x,y} & = \frac{1}{{P(X \ge x\,{\text{AND}}\,Y \ge y)}} = \frac{1}{{1 - F_{x} (x) - F_{y} (y) + F_{x,y} (x,y)}} \\ & = \frac{1}{{1 - F_{x} (x) - F_{y} (y) + C\left[ {F_{x} (x),F_{y} (y)} \right]}} \\ \end{aligned}$$
(8)
$$T_{x,y}^{\prime } = \frac{1}{{P(X \ge x\;{\text{OR}}\;Y \ge y)}} = \frac{1}{{1 - F_{x,y} (x,y)}} = \frac{1}{{1 - C\left[ {F_{x} (x),F_{y} (y)} \right]}}$$
(9)

Based on the above equations, \(T_{x,y}\) is the joint return period for variable X being equal to or greater than a certain value and variable Y being equal to or greater than another certain value. On the other hand, \(T_{x,y}^{\prime }\) is the joint return period for variable X being equal to or greater than a certain value or variable Y being equal to or greater than another certain value [30].
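As a worked illustration of Eqs. 8 and 9, the sketch below computes both joint return periods from the two marginal non-exceedance probabilities and a Gaussian Copula parameter. It assumes Python with SciPy and an annual series (one event per year, so that return periods are in years); the function name and the input values are hypothetical and are not taken from the study.

```python
from scipy.stats import norm, multivariate_normal

def joint_return_periods(fx, fy, theta):
    """Return (T_and, T_or) in years from marginal non-exceedance probabilities."""
    cov = [[1.0, theta], [theta, 1.0]]
    # Gaussian copula value C[F_x(x), F_y(y)]
    c = float(multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(
        [norm.ppf(fx), norm.ppf(fy)]))
    t_and = 1.0 / (1.0 - fx - fy + c)   # Eq. 8: X >= x AND Y >= y
    t_or = 1.0 / (1.0 - c)              # Eq. 9: X >= x OR Y >= y
    return t_and, t_or

# Two variables that each have a 10-year univariate return period (F = 0.9)
t_and, t_or = joint_return_periods(0.9, 0.9, theta=0.5)
print(f"T (AND) = {t_and:.1f} years, T' (OR) = {t_or:.1f} years")
```

The AND return period is never shorter than the OR return period, because the joint exceedance of both variables is a subset of the exceedance of at least one.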

3 Results and Discussion

3.1 Statistical Analysis

Summary statistics for the flood parameters are given in Table 1. The observed averages of peak flow, flood duration, and flood volume at the study site were 248 m3/s, 349 h, and 105 mm, respectively.

Table 1 The summary statistics of flood parameters

Table 2 presents the shape parameter (\(\kappa\)), continuous scale parameter (\(\sigma\)), and continuous location parameter (\(\mu\)) of the various distributions fitted to the flood variable data. Results of the K-S test showed that the Generalized Pareto distribution fits the peak flow data best, while the Generalized Extreme Value distribution is the best fit for both flood duration and flood volume. Kendall’s rank correlations are tabulated in Table 3. The positive values of Kendall’s rank correlation show that the flood variables are dependent and satisfy the first condition of Copula [31].

Table 2 Fitting result parameters for various distributions of flood variables
Table 3 Kendall’s rank correlations and Copula linear correlation parameter among flood variables
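Kendall's rank correlation can be computed as sketched below (Python with SciPy, synthetic data). The sketch also applies the standard Gaussian Copula relation θ = sin(πτ/2) as a quick cross-check on a fitted parameter. That relation is not the estimation method of this study, which uses IFM, so the two estimates are not expected to coincide exactly.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
# Synthetic correlated pairs standing in for two flood variables
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=2000)
tau, p_value = kendalltau(z[:, 0], z[:, 1])

# For a Gaussian copula, tau and theta are linked by theta = sin(pi * tau / 2)
theta_from_tau = np.sin(np.pi * tau / 2.0)
print(f"tau = {tau:.3f}, p = {p_value:.3g}, implied theta = {theta_from_tau:.3f}")
```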

Figures 2, 3, and 4 present the joint CDFs of peak flow and duration, peak flow and volume, and duration and volume, respectively, based on the Gaussian Copula. Cutting the joint cumulative distribution horizontally gives a set of contour lines, which are also shown in the figures. Note that for a given joint probability, more than one combination of flood variables may exist. The contour lines of the joint cumulative distribution of peak flow and duration are depicted in Fig. 2. The joint cumulative distribution gives the probability that a specified value of one variable and a specified value of a second variable are both not exceeded. The joint probability graph in Fig. 2 therefore refers to the chance that the two conditions on peak flow and flood duration hold at the same time. Joint cumulative contours are lines of equal probability: every point on a contour represents a pair of values with the same joint probability. The CDF value, P(x), is the probability that the flood variables are less than or equal to the specified values, and 1 − P(x) is the probability that at least one of them exceeds its specified value.

Fig. 2 The joint CDF f(x, y) of peak flow (Q) and duration (D), and the contours of f(x, y), based on the Gaussian Copula

Fig. 3 The joint CDF g(x, y) of peak flow (Q) and volume (V), and the contours of g(x, y), based on the Gaussian Copula

Fig. 4 The joint CDF h(x, y) of duration (D) and volume (V), and the contours of h(x, y), based on the Gaussian Copula

Figures 2, 3, and 4 show how the joint distribution of two flood variables can be determined simultaneously, which is more meaningful for solving many hydrological design problems than treating each variable separately. For example, for a given flood peak, it is possible to obtain the probability of non-occurrence of various combinations of flood duration and flood volume, and vice versa. In short, the figures give the probability of a flood volume occurring under the condition that the flood peak or flood duration is less than or equal to a given value, and vice versa.

The Copula-based joint CDF of peak flow and flood duration for Johor River based on the Gaussian Copula is shown in Eq. (10):

$$\begin{aligned} F(x,y) & = \int\limits_{ - \infty }^{{\phi^{ - 1} \left( {1 - \left( {1 - 0.04\frac{x - 70.684}{184.48}} \right)^{{\frac{1}{0.04}}} } \right)}} {\int\limits_{ - \infty }^{{\phi^{ - 1} \left( {\exp \left( { - \left( {1 - 0.2\left( {\frac{y - 299.35}{122.45}} \right)} \right)^{{\frac{1}{0.24}}} } \right)} \right)}} {0.1592} } \\ & \quad \quad \times \exp \left\{ { - \frac{{x^{2} - 0.033\,xy + y^{2} }}{1.9995}} \right\}{\text{d}}x{\text{d}}y \\ \end{aligned}$$
(10)

where \(\phi\) is the CDF of the standard normal distribution and \(\phi^{ - 1}\) is its inverse.

Figure 2 shows that the probabilities based on the Gaussian Copula range between 0.2 and 1.0. Furthermore, the probability of occurrence can be derived for each pair of given values of peak flow and duration. For example, for peak flow less than or equal to 113 m3/s and duration less than or equal to 229 h, the probability of occurrence is 0.2. Conversely, the probability that peak flow exceeds 113 m3/s or duration exceeds 228 h is 0.8 (1 − 0.2).

The probabilities of occurrence of joint peak flow and duration based on the Gaussian Copula are 0.6, 0.4, 0.2, and 0 when the peak flows are greater than 168, 241, 339, and 447 m3/s and the durations are greater than 296, 363, 433, and 492 h, respectively. Specifically, the joint probability of occurrence is close to 0 when the peak flow is greater than 447 m3/s and the duration is greater than 492 h. Also, Fig. 2 shows that when the peak flow is equal to 113 m3/s, the probability of not being exceeded is always 0.2 for all values of flood duration that fall on the 0.2 contour line.

Meanwhile, the Gaussian Copula-based joint CDF of peak flow and flood volume can be calculated as in Eq. 11 below:

$$\begin{aligned} G(x,y) & = \int\limits_{ - \infty }^{{\phi^{ - 1} \left( {1 - \left( {1 - 0.03\frac{x - 70.684}{184.48}} \right)^{{\frac{1}{0.04}}} } \right)}} {\int\limits_{ - \infty }^{{\phi^{ - 1} \left( {\exp \left( { - \left( {1 - 0.07399\left( {\frac{y - 83.017}{42.822}} \right)} \right)^{{\frac{1}{0.07}}} } \right)} \right)}} {0.2084} } \\ & \quad \times \exp \left\{ { - \frac{{x^{2} - 1.291{\mkern 1mu} xy + y^{2} }}{1.1667}} \right\}{\text{d}}x{\text{d}}y \\ \end{aligned}$$
(11)

Figure 3 depicts the contour lines generated from Eq. 11. For hydraulic design and hydraulic infrastructure operation, a combined occurrence of these two flood characteristics is often important. The joint probability graph presented in Fig. 3 refers to the chance of peak flow and flood volume occurring simultaneously. In Fig. 3a, b, the probabilities are shown in intervals of 0.2 from 0 to 1.0.

The probabilities of joint peak flow and flood volume occurrence based on the Gaussian Copula were 0.8, 0.6, 0.4, 0.2, and 0 when the peak flow was greater than 81, 139, 202, 308, and 428 m3/s and the flood volume was greater than 19, 66, 113, 159, and 183 mm, respectively. For example, the joint probability is 0.2 when the peak flow exceeds 308 m3/s and the flood volume is greater than 159 mm. The joint probability is almost zero when the peak flow exceeds 428 m3/s and the flood volume is more than 183 mm. On the other hand, when the peak flow is greater than 81 m3/s, the probability of occurrence is expected to remain 0.8 across all flood volumes.

The Copula-based joint CDF of flood duration-flood volume for Johor River based on the Gaussian Copula is calculated as in Eq. 12:

$$\begin{aligned} H(x,y) & = \int\limits_{ - \infty }^{{\phi^{ - 1} \left( {\exp \left( { - \left( {1 - 0.2\left( {\frac{x - 299.35}{122.45}} \right)} \right)^{{\frac{1}{0.2}}} } \right)} \right)}} {\int\limits_{ - \infty }^{{\phi^{ - 1} \left( {\exp \left( { - \left( {1 - 0.07\left( {\frac{y - 83.017}{42.822}} \right)} \right)^{{\frac{1}{0.09}}} } \right)} \right)}} {0.1772} } \\ & \quad \times \exp \left\{ { - \frac{{x^{2} - 0.8794{\mkern 1mu} xy + y^{2} }}{1.6133}} \right\}{\text{d}}x{\text{d}}y \\ \end{aligned}$$
(12)

Similar to Figs. 2 and 3, Fig. 4 shows the contour lines of the joint CDF for flood duration and flood volume based on Eq. 12. It likewise illustrates the chance of given values of flood duration and flood volume occurring at the same time. As shown in Fig. 4, the probabilities of occurrence based on the Gaussian Copula were 0.8, 0.6, 0.4, 0.2, and 0 when the flood duration was greater than or equal to 221, 307, 368, 420, and 496 h and the flood volume was greater than or equal to 66, 96, 129, 159, and 194 mm, respectively. These are interpreted as follows: if the flood duration is greater than or equal to 420 h and the flood volume is not smaller than 159 mm, the joint probability of occurrence is 0.2. If flood duration and flood volume are, respectively, smaller than 420 h and 159 mm, the joint probability becomes 0.8. The probability is at its minimum (i.e., 0) when flood duration exceeds 496 h and flood volume exceeds 194 mm. Also, the probability is consistently 0.2 if the flood volume is less than 66 mm.

Figures 2, 3, and 4 show that the joint distribution of two flood variables can be determined simultaneously, which helps in solving many hydrological design problems. The graphs allow one to obtain the occurrence probabilities when two variables are recorded at certain values.

The contour lines for specific joint return periods in which both peak flow and duration are exceeded (\(T_{QD}\)), peak flow and volume are exceeded (\(T_{QV}\)), and flood volume and duration are exceeded (\(T_{VD}\)) have inward bounds, as shown in Figs. 5a, 6a, and 7a, respectively. In contrast, the contour lines for specific joint return periods in which either peak flow or duration is exceeded (\(T_{QD}^{\prime }\)), peak flow or volume is exceeded (\(T_{QV}^{\prime }\)), and flood volume or duration is exceeded (\(T_{VD}^{\prime }\)) have outward bounds, as shown in Figs. 5b, 6b, and 7b, respectively.

Fig. 5 Joint return periods of peak flow and duration based on the Gaussian Copula: a both duration and peak flow are exceeded, \(T_{QD}\) (years); b either duration or peak flow is exceeded, \(T_{QD}^{\prime }\) (years)

Fig. 6 Joint return periods of peak flow and volume based on the Gaussian Copula: a both volume and peak flow are exceeded, \(T_{QV}\) (years); b either volume or peak flow is exceeded, \(T_{QV}^{\prime }\) (years)

Fig. 7 Joint return periods of flood duration and volume based on the Gaussian Copula: a both flood duration and volume are exceeded, \(T_{VD}\) (years); b either duration or volume is exceeded, \(T_{VD}^{\prime }\) (years)

Figure 5a is derived from the Gaussian Copula and shows that for the bivariate joint return periods of 2, 5, 10, 20, 50, and 100 years, both peak flow and flood duration are, respectively, greater than or equal to 230 m3/s and 367 h; 391 m3/s and 447 h; 510 m3/s and 528 h; 623 m3/s and 580 h; 770 m3/s and 630 h; and 874 m3/s and 668 h. For example, the joint return period of peak flow and flood duration is 50 years when peak flow is greater than 770 m3/s and flood duration is greater than 630 h. In addition, Fig. 5b shows the joint return periods of 2, 5, 10, 20, 50, and 100 years for the case in which peak flow exceeds a specific value or flood duration exceeds another specific value.

Historical floods were analyzed using the graphs in Fig. 5a, b. The worst flood in the basin during the study period occurred in the year 1995–1996, with a peak flow of 725 m3/s and a duration of 192 h. The joint return periods derived from the Gaussian Copula for this flood event, \(T_{QD}\) estimated using Eq. 8 and \(T_{QD}^{\prime }\) estimated using Eq. 9, are 38.64 and 0.94 years, respectively. This means that the joint return period of peak flow and flood duration greater than or equal to 725 m3/s and 192 h, respectively, is 38.64 years, and the return period of either peak flow greater than 725 m3/s or flood duration greater than 192 h is only 0.94 year. The summary of bivariate joint return periods of flood variables based on the Gaussian Copula from Figs. 5, 6, and 7 is as follows:

From Figs. 5, 6, and 7, it can be seen that for the same values of peak flow and duration, peak flow and volume, and volume and duration, the joint return period T is much greater than T′. For example, in the year 1995–1996, the annual peak flow was 725 m3/s and the corresponding flood duration was 192 h; the joint return periods for this flood event are \(T_{QD}\) = 38.64 years and \(T_{QD}^{\prime }\) = 0.94 year. Similar results are observed for the joint return periods of volume and duration (i.e., \(T_{VD}\) is greater than \(T_{VD}^{\prime }\)) and of peak flow and volume (i.e., \(T_{QV}\) is greater than \(T_{QV}^{\prime }\)).
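The ordering of the two return periods follows directly from Eqs. 8 and 9 and does not depend on the choice of Copula. A short derivation, added here as a reading aid, is that the event in which both thresholds are exceeded is contained in the event in which at least one is exceeded, so that:

$$P(X \ge x\,{\text{AND}}\,Y \ge y) \le P(X \ge x\;{\text{OR}}\;Y \ge y)\quad \Rightarrow \quad T_{x,y} = \frac{1}{{P(X \ge x\,{\text{AND}}\,Y \ge y)}} \ge \frac{1}{{P(X \ge x\;{\text{OR}}\;Y \ge y)}} = T_{x,y}^{\prime }$$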

4 Conclusion

The concept of Copula has been used in this paper to derive bivariate joint distributions for the flood characteristics of the Johor River Basin. The flood variables were modeled using the Generalized Pareto, Pearson, Exponential, Beta, and GEV distributions, with their goodness of fit measured using the K-S test. The best-fitting distribution functions were used to develop the joint CDFs of peak flow and volume, volume and duration, and peak flow and duration using the Gaussian Copula. Moderate correlations were found between peak flow and flood volume (Kendall’s τ = 0.472) as well as flood duration and flood volume (Kendall’s τ = 0.333). However, the peak flow and flood duration combination had a weak correlation (Kendall’s τ = 0.015). Statistically significant correlations between the flood variables are a prerequisite for Gaussian Copula bivariate flood frequency analysis. The Copula parameter \(\theta\), estimated with the IFM method, was applied to model the joint distribution of each pair of correlated flood variables. Results showed that all calculated \(\theta\) values were within the acceptable range and could be applied to compute the bivariate joint distributions of flood variables with the Gaussian Copula. By horizontally cutting the joint CDF, a set of contour lines was obtained for each pair of variables, representing the occurrence probabilities for the joint variables at 0.2, 0.4, 0.6, 0.8, and 1.0. The joint return periods for each pair of flood variables were also calculated. For the Gaussian Copula, the bivariate joint return periods of 2, 5, 10, 20, 50, and 100 years correspond to peak flow and flood duration equal to or greater than (230 m3/s, 367 h), (391 m3/s, 474 h), (510 m3/s, 528 h), (623 m3/s, 580 h), (770 m3/s, 630 h), and (874 m3/s, 668 h), respectively. For hydraulic design and hydraulic infrastructure operation, the combined occurrence of two flood characteristics is often important. Therefore, it is expected that the results of bivariate Copula frequency analysis could provide a better alternative for water resources management and flood risk assessment.

Development of trivariate Copula functions for flood frequency analysis is recommended for future work. For simplicity, this can be achieved by performing the bivariate analysis in two stages: first by determining the bivariate distribution of a pair of variables and then by pairing that bivariate distribution with the third variable. Examples of combinations include the bivariate distribution of peak flow and flood volume with flood duration; peak flow and flood duration with flood volume; and flood volume and flood duration with peak flow.