1 Introduction

Floods occur for a number of reasons; however, the primary cause of most floods is heavy rainfall over a long period of time (McBride and Nicholls 1983; Nicholls and Wong 1990). During the past several years, the state of Victoria, Australia, has experienced severe floods, especially at higher altitudes in the western and north-eastern parts of the state. Heavy rainfall across north-eastern Australia contributed to this natural disaster by causing flooding in the upper reaches of many of Victoria’s major rivers. These major floods caused massive destruction to existing infrastructure and resulted in hundreds of evacuations in affected areas. Regions that experienced these extreme climatic events also suffered severe damage to properties and farmland, with substantial loss of income and consequent economic hardship. Owing to the desire to understand, predict and control these recurring extreme flood events, research into the occurrence and severity of rainfall has increased significantly (e.g. Pui et al. 2011; Mehrotra and Sharma 2011; Chappell et al. 2013; Zhao et al. 2013).

The practice of using parametric distributions such as the Lognormal, Weibull, Gamma, Pearson, Gumbel and other extreme value distributions to analyze rainfall and drought data has become very common among researchers in climatology and environmental studies. For example, Abdul Rauf and Zeephongsekul (2011, 2014) used parametric distributions to investigate rainfall severity and duration patterns in the state of Victoria, Australia, and Shiau (2006) and Shiau and Modarres (2009) estimated the Standardized Precipitation Index by fitting rainfall intensity with the Gamma distribution. As would be expected, this parametric approach does not work well for every precipitation data set and can fit poorly near the tails of the distribution (Haghighat jou et al. 2013). To alleviate this problem, nonparametric kernel density estimation has been applied to precipitation data. Examples include Haghighat jou et al. (2013), who used nonparametric kernel density estimation to model annual precipitation series in Iran, and Sharma (2000), who presented a nonparametric long-term probabilistic forecast model based on kernel estimation of the conditional probability distribution of rainfall.

Since rainfall characteristics such as intensity, severity and duration are important variables in hydrological research, deriving their joint distribution in order to study their statistical behavior is crucial. In the traditional approach, the marginal distributions of these characteristics are assumed to belong to the same parametric family. For example, Yue (2000) modeled annual maximum storm peaks and amounts using a bivariate normal distribution. In Shiau (2003), a bivariate extreme value distribution with Gumbel marginal distributions is used to model extreme flood events characterized by flood volumes and flood peaks. However, postulating the same parametric marginal distribution for all characteristics is often unrealistic, since this assumption is usually not justified in practice. In 1959, Sklar introduced the concept of a copula, a function that joins different marginal distributions into a multivariate distribution (Joe 1997; Nelsen 2006; Genest and Favre 2007). Since their introduction, copulas have been widely applied in many disciplines, especially by economists, climate scientists, hydrologists and actuarial scientists. Copulas were introduced into rainfall studies (Serinaldi et al. 2009; Kao and Govindaraju 2007, 2008, 2010) because they provide a more flexible approach that allows different types of marginal distributions to coexist and be joined by a copula into a multivariate distribution. Another advantage of copulas is that they relax the independence assumption, which is inappropriate when modeling hydrological variables (De Michele and Salvadori 2003; Genest and Favre 2007; Zhang and Singh 2007).

The study of rainfall characteristics is a key research area for researchers working in water resources management. Two pertinent rainfall characteristics are the severity and duration of rainfall in a geographic region. Before any viable water plan is developed, determining the joint distribution of these two characteristics is very important, and this is an objective of this study. The first nonparametric approach to copula analysis was due to Deheuvels (1979), who used the empirical copula with empirical marginal distributions. Genest et al. (1995) used a semiparametric method, postulating that the copula belongs to a parametric family whose parameter is estimated by maximum likelihood. Scaillet and Fermanian (2002) applied a similar method to estimate copulas under temporal dependence, a situation prevalent in financial time series. In another study, Chen and Huang (2007) proposed a bivariate kernel copula estimator that helps alleviate the problem of boundary bias.

The novelty of this paper is to introduce nonparametric and semiparametric approaches to the analysis of a bivariate model for two rainfall characteristics, namely severity and duration. This involves using both parametric and nonparametric copulas and marginal distributions. To the best of our knowledge, nonparametric and semiparametric approaches have not been exploited to any great extent, and certainly not in the context of analyzing Australian rainfall data. Three approaches are used here. The first approach combines parametric marginal distributions with a nonparametric copula. The roles are reversed in the second approach, where the marginal distributions are nonparametric and the copula is parametric. The third approach uses both nonparametric marginal distributions and a nonparametric copula. These approaches supplement our earlier work (Abdul Rauf and Zeephongsekul 2014), where a parametric approach was used to estimate both the marginal distributions and the copula.

This paper is organized as follows. After this Introduction, Section 2 briefly describes the study area and the selected rainfall stations. Section 3 presents the theoretical framework for the proposed models, applies them to the data and derives return periods for rainfall severity and duration. Section 4 concludes the paper with some suggestions for future work.

2 Study Area and Data

The focus of this study is the state of Victoria, Australia. Victoria is located in the south-eastern part of Australia and is geographically the smallest mainland state (refer to Fig. 1). Melbourne, the capital of Victoria, is Australia’s second largest city. Melbourne has four seasons, although the climate is highly variable between seasons. Summer occurs from December to February, autumn from March to May, winter from June to August and spring from September to November. Melbourne’s highest temperatures occur in the summer months of January and February, when the climate is hot with dry spells. Although winter has the coldest temperatures, October tends to be the wettest month. Rainfall varies across the state; for example, in 2010 maximum rainfall (single point estimate) in Victoria ranged from 425 millimetres to 1,250 millimetres.

Fig. 1 Map of Victoria, Australia, and the coordinates of the six rainfall stations

Given that long time series of observations are required to construct reliable bivariate statistical models of the joint distribution of rainfall duration and severity, a 61-year record of data covering 1950 to 2010 was obtained from the Bureau of Meteorology (BOM), Australia, for analysis. Table 1 gives the geographical coordinates, annual mean rainfall and percentage of missing observations for the six selected rainfall stations.

Table 1 Geographic locations of the six (6) selected stations in Victoria

3 Models and Data Analysis

3.1 Kernel Density Estimation

Kernel density estimation is one of the most popular nonparametric methods for estimating the probability density function (pdf) of a random variable. It is easy to apply and can often uncover structural features in a data set that a parametric approach might not reveal. Nonparametric methods have recently been applied extensively in rainfall studies; for example, Sharma and Lall (1999), Sharma (2000), Haghighat jou et al. (2013) and Kim et al. (2006) used kernel density estimation to estimate rainfall probability density functions. In this paper, we estimate the marginal distribution functions using both parametric and nonparametric approaches. For the parametric approach, we fit the Gamma, Weibull, Lognormal and Exponential distributions to the data; for the nonparametric approach, kernel density estimation is used to estimate the marginal distributions of both rainfall characteristics, i.e. severity and duration.

Given a random sample \(X_{1}, \ldots, X_{n}\) from a population with a continuous, univariate probability density function (pdf) \(f(\cdot)\) and cumulative distribution function (cdf) \(F(\cdot)\), the kernel density estimator of \(f(\cdot)\) is defined as

$$ \hat{f}(x,h)=\frac{1}{nh}\sum\limits_{i=1}^{n} K\left(\frac{x-X_{i}}{h}\right) $$
(1)

where K(⋅) is the kernel function and h > 0 is the bandwidth. Under mild conditions, in particular h → 0 and nh → ∞ as n → ∞ (cf. Wand and Jones 1995), the kernel density estimator is a consistent estimator of the true pdf f(⋅). Kernel functions are typically symmetric, unimodal densities about the origin, so that \({\int }_{- \infty }^{\infty } K(x) dx = 1\), and they satisfy the conditions \( \lim _{|x|\rightarrow \infty } |x| K(x) = 0\) and \({\int }_{- \infty }^{\infty }\, x^{2} K(x) dx < \infty \). Standard kernel functions include the Gaussian, Triangular, Biweight and Epanechnikov kernels (cf. Wand and Jones 1995). The nonparametric kernel estimator of F(x) is obtained by integrating (1), giving

$$ \hat{F}(x,h)=\frac{1}{nh}\sum\limits_{i=1}^{n}{\int}_{-\infty}^{x} K\left(\frac{u-X_{i}}{h}\right)du$$
(2)
$$ =\frac{1}{n}\sum\limits_{i=1}^{n} \textbf{K}\left(\frac{x-X_{i}}{h}\right) $$
(3)

where \(\textbf {K}(x)={\int }_{-\infty }^{x} K(u) du\).

It is well known that the choice of kernel function does not significantly affect the quality of the approximation; in this paper we use the Gaussian kernel, defined by

$$ K(u)=\frac{1}{\sqrt{2\pi}} \text{exp} (-\frac{1}{2}u^{2}). $$
(4)
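The estimators in Eqs. (1), (3) and (4) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `kde_pdf` and `kde_cdf` are hypothetical, the bandwidth `h` is passed in directly, and only the standard library is used. Because the integrated Gaussian kernel is the standard normal cdf, Eq. (3) reduces to an average of normal cdf values.

```python
from statistics import NormalDist
from typing import Sequence

# The standard normal pdf is the Gaussian kernel K of Eq. (4); its cdf is the
# integrated kernel (bold K) that appears in Eq. (3).
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def kde_pdf(x: float, sample: Sequence[float], h: float) -> float:
    """Kernel density estimate f_hat(x, h) of Eq. (1) with a Gaussian kernel."""
    n = len(sample)
    return sum(STD_NORMAL.pdf((x - xi) / h) for xi in sample) / (n * h)


def kde_cdf(x: float, sample: Sequence[float], h: float) -> float:
    """Kernel cdf estimate F_hat(x, h) of Eq. (3) with a Gaussian kernel."""
    n = len(sample)
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in sample) / n
```

For example, `kde_cdf(2.5, severities, 0.6)` would estimate \(\hat{F}(2.5, 0.6)\) for a list `severities` of observed values.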

The choice of the bandwidth h, on the other hand, does affect the quality of the approximation. We adopt the rule-of-thumb bandwidth suggested by Silverman (1986), which is

$$ h=0.9An^{-1/5} $$
(5)

where

$$ A= \min\left\{\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right\}, $$

in which \(\hat{\sigma}\) is the sample standard deviation and IQR is the sample interquartile range.

Silverman (1986) showed that this bandwidth works well with the Gaussian kernel, with A serving as a robust measure of the spread of the underlying distribution. Using simulated data, he also showed that this bandwidth gives reasonable values of the Mean Integrated Squared Error (MISE), where

$$ \text{MISE}(\hat{f})=E\left(\int{\lbrack\hat{f}(x)-f(x)\rbrack}^{2} dx\right) $$
(6)

while still revealing bimodality and skewness in the underlying distribution.
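A minimal sketch of the bandwidth rule in Eq. (5), with the 1.34 scaling applied to the interquartile range only. The quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) and the use of the n - 1 sample standard deviation are assumptions of this sketch; the function name is hypothetical.

```python
import statistics
from typing import Sequence


def silverman_bandwidth(sample: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth of Eq. (5): h = 0.9 * A * n**(-1/5).

    A = min(sample standard deviation, interquartile range / 1.34).
    """
    n = len(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")  # quartiles
    a = min(sd, (q3 - q1) / 1.34)
    return 0.9 * a * n ** (-1 / 5)
```

In this setting the bandwidth would be computed separately for the severity sample and the duration sample. With a single large outlier the standard deviation inflates while the IQR does not, which is why the minimum of the two is used.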

3.2 Standardized Precipitation Index

The Standardized Precipitation Index (SPI) was introduced by McKee et al. (1993) for monitoring drought and is used to identify extreme drought events and evaluate their severity. The SPI measures precipitation deviations relative to long-term precipitation data for a given period. One of its most significant advantages is that it is easy to calculate and can be interpreted using standard probabilistic analysis (Guttman 1998). For a precipitation value y, the SPI is the value satisfying

$$ {\Phi}(SPI)=F(y) $$
(7)

or, equivalently

$$ SPI={\Phi}^{-1}(F(y)) $$
(8)

where F(⋅) is the cdf of the precipitation variable and Φ(⋅) is the standard normal cumulative distribution function. The standard procedure for computing the SPI is first to fit a Gamma distribution to the precipitation data by Maximum Likelihood Estimation (MLE) (Yusof et al. 2013) and then to apply Eq. 8 to obtain the SPI values. Details of the SPI categories and of how they are used to monitor rainfall characteristics (severity and duration) can be found in Yusof et al. (2013) and Abdul Rauf and Zeephongsekul (2014).

One key disadvantage of this approach is that, for many precipitation time series, the Gamma distribution does not fit the rainfall data well. This calls for an alternative: when it is uncertain which parametric distribution to use, a nonparametric approach is preferable because it imposes fewer restrictions on the underlying distribution. A nonparametric approach was introduced by Cancelliere et al. (2007) in drought forecasting, where they proposed two methodologies for forecasting seasonal SPI. Kim et al. (2006) used a nonparametric local polynomial estimator, based on a kernel smoother, to build an explicit model for the equiprobable transformation of the cumulative distribution functions of rainfall. In this study, instead of the Gamma distribution, we use kernel density estimation to fit the precipitation data.

The input data for this study consist of monthly SPI values computed on a 3-month time scale for the period from June 1950 to June 2010. For example, the 3-month SPI for November 1950 is calculated from the total precipitation for September, October and November 1950. The indices used in this study were prepared following the procedure of Kim et al. (2006). Figure 2 presents the nonparametric SPI for Archerton Station, Victoria.
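The steps above can be sketched as follows: form running 3-month totals, estimate their cdf with the Gaussian kernel estimator of Eq. (3) and the bandwidth of Eq. (5), and map each total through Eq. (8). This is an illustrative sketch, not the procedure of Kim et al. (2006): for brevity it pools all 3-month totals into one kernel estimate, whereas SPI practice usually fits a separate distribution for each calendar month, and the clipping constant `1e-10` is an arbitrary assumption that keeps \(\Phi^{-1}\) finite. Function names are hypothetical.

```python
import statistics
from statistics import NormalDist
from typing import List, Sequence

STD_NORMAL = NormalDist()


def three_month_totals(monthly: Sequence[float]) -> List[float]:
    """Running 3-month totals; each entry covers months t-2, t-1 and t (e.g. Sep+Oct+Nov)."""
    return [monthly[t - 2] + monthly[t - 1] + monthly[t] for t in range(2, len(monthly))]


def silverman_bandwidth(sample: Sequence[float]) -> float:
    """Eq. (5) with A = min(sd, IQR / 1.34)."""
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    a = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * a * len(sample) ** (-1 / 5)


def nonparametric_spi(totals: Sequence[float]) -> List[float]:
    """SPI = Phi^{-1}(F_hat(y)) of Eq. (8), with F_hat the Gaussian-kernel cdf of Eq. (3)."""
    h = silverman_bandwidth(totals)
    n = len(totals)
    spi = []
    for y in totals:
        f_hat = sum(STD_NORMAL.cdf((y - yi) / h) for yi in totals) / n
        f_hat = min(max(f_hat, 1e-10), 1.0 - 1e-10)  # keep Phi^{-1} finite
        spi.append(STD_NORMAL.inv_cdf(f_hat))
    return spi
```

Because both \(\hat{F}\) and \(\Phi^{-1}\) are non-decreasing, larger 3-month totals always map to larger SPI values, as in the parametric version.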

Fig. 2 Nonparametric SPI indices for Archerton Station, Victoria (1950-2010)

3.3 Copulas and Semiparametric models

The traditional approach to building multivariate distributions in hydrology has several limitations, because hydrological variables such as rainfall severity and duration are highly dependent and their marginal distributions can differ significantly from each other. These restrictions can be overcome by introducing copulas into the modeling process. Copulas, introduced in 1959 by A. Sklar, model the dependence between variables while leaving the marginal distributions of the individual variables free to be chosen. This is loosely analogous to the multivariate normal distribution, in which the mean vector and covariance matrix jointly determine the multivariate distribution, except that there the marginal distributions are necessarily normal. With copulas, the marginal distributions and the copula function together determine the joint distribution, and the copula is unique when the marginal distributions are continuous. The name copula derives from the Latin word copulare, meaning to couple or join, which emphasizes a copula’s role in linking univariate marginal distribution functions to form a joint distribution function. For two variables, Sklar’s theorem (Nelsen 2006) states that if \(F_{X,Y}(x,y)\) is the joint distribution function of a bivariate random vector (X, Y) with marginal distributions \(F_{X}(x)\) and \(F_{Y}(y)\), then there exists a copula function C(⋅,⋅) such that

$$ F_{X,Y}(x,y)=C(F_{X}(x),F_{Y}(y)). $$
(9)

If both \(F_{X}(x)\) and \(F_{Y}(y)\) are continuous distributions, then this copula is unique for the particular joint distribution. Differentiating (9) with respect to x and y yields the joint probability density function given by

$$ f_{X,Y}(x,y)= c(F_{X}(x), F_{Y}(y)) \cdot f_{X}(x) \cdot f_{Y}(y) $$
(10)

where c is the copula density function defined by

$$ c(u,v)=\frac{\partial^{2} C(u,v)}{\partial u \partial v}. $$
(11)
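To make Eqs. (9) to (11) concrete, the sketch below uses the Clayton copula (one of the Archimedean copulas used later in this section) and checks numerically that its closed-form density equals the mixed partial derivative in Eq. (11). The exponential marginals in `joint_pdf` are hypothetical and chosen only to show how Eq. (10) combines a copula density with marginal densities; all function names are illustrative.

```python
import math
from typing import Callable


def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_density(u: float, v: float, theta: float) -> float:
    """Closed-form copula density c(u, v) = d^2 C / (du dv) for the Clayton copula."""
    s = u ** -theta + v ** -theta - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


def mixed_partial(cdf: Callable[[float, float], float], u: float, v: float,
                  eps: float = 1e-4) -> float:
    """Central finite-difference approximation of d^2 C / (du dv), i.e. Eq. (11)."""
    return (cdf(u + eps, v + eps) - cdf(u + eps, v - eps)
            - cdf(u - eps, v + eps) + cdf(u - eps, v - eps)) / (4.0 * eps * eps)


def joint_pdf(x: float, y: float, theta: float, rate_x: float, rate_y: float) -> float:
    """Eq. (10) with hypothetical exponential marginals and a Clayton copula."""
    fx, fy = rate_x * math.exp(-rate_x * x), rate_y * math.exp(-rate_y * y)
    big_fx, big_fy = 1.0 - math.exp(-rate_x * x), 1.0 - math.exp(-rate_y * y)
    return clayton_density(big_fx, big_fy, theta) * fx * fy
```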

Two-dimensional copulas have been applied to model hydrological and drought phenomena by a number of researchers, including Shiau (2006), Zhang and Singh (2007), Serinaldi et al. (2009) and Mirabbasi et al. (2012).

This paper focuses on three approaches to the analysis of the bivariate rainfall variables severity and duration. These approaches, summarized in Table 2, comprise one nonparametric and two semiparametric approaches. In the table, P refers to parametric and N to nonparametric, with the first letter describing the marginal distributions and the second the copula. The parametric distributions used to fit the marginal distributions are the Gamma, Lognormal, Weibull and Exponential distributions, as these were found in an earlier paper (Abdul Rauf and Zeephongsekul 2014) to fit the data well. In the nonparametric case, the kernel density estimator discussed in Section 3.1 is used to estimate the marginal distributions. In a related paper, Reddy and Ganguli (2012) applied both parametric and nonparametric approaches to fitting marginal distributions.

Table 2 Proposed Semiparametric Approaches

3.3.1 PN Model

In this model, the marginal distributions are parametric and the copula is nonparametric. The copula is based on the Beta kernel developed by Brown and Chen (1999), Harrell and Davis (1982) and Chen (1999, 2000). An attractive feature of this kernel, which motivated its use here, is its ability to alleviate the severe boundary bias common to many standard kernel estimators. The univariate Beta kernel density estimator based on a sample \(U_{1}, U_{2}, \ldots, U_{n}\) of variables with support in [0,1] is defined by

$$ b(u)=\frac{1}{n}\sum\limits_{i=1}^{n} K \left(U_{i},\frac{u}{h} + 1, \frac{1-u}{h} + 1 \right) $$
(12)

where K(⋅,α, β) denotes the Beta density function with parameters α and β given by

$$ K(x,\alpha,\beta)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha){\Gamma}(\beta)} x^{\alpha-1} (1-x)^{\beta-1} $$
(13)

and h is the bandwidth. For this model, we adopt the Beta kernel copula introduced by Charpentier et al. (2006), whose copula density estimator is built from products of Beta kernels evaluated at the pairs \((U_{i}, V_{i})\):

$$\begin{array}{@{}rcl@{}} \hat{c}_{h}(u,v) &=& \frac{1}{n}\sum\limits_{i=1}^{n} K\left(U_{i}, \frac{u}{h}+1,\frac{1-u}{h}+1\right) \\ &&\times K\left(V_{i}, \frac{v}{h}+1,\frac{1-v}{h}+1\right). \end{array} $$
(14)
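A minimal sketch of Eqs. (13) and (14) in standard-library Python. The uniform scores `us` and `vs` are assumed to lie strictly inside (0, 1); in the PN model they would come from the fitted parametric marginal cdfs, and in the NN model from the kernel cdf estimate (3). The bandwidth `h` is a free input, since its value is not specified here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta density K(x, a, b) = Gamma(a+b)/(Gamma(a)Gamma(b)) * x**(a-1) * (1-x)**(b-1), 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def beta_kernel_copula_density(u: float, v: float, us: Sequence[float],
                               vs: Sequence[float], h: float) -> float:
    """Beta kernel copula density estimate c_hat_h(u, v) of Eq. (14).

    us, vs: uniform scores U_i, V_i strictly inside (0, 1); h: bandwidth.
    """
    au, bu = u / h + 1.0, (1.0 - u) / h + 1.0  # Beta parameters for the u direction
    av, bv = v / h + 1.0, (1.0 - v) / h + 1.0  # Beta parameters for the v direction
    total = sum(beta_pdf(ui, au, bu) * beta_pdf(vi, av, bv) for ui, vi in zip(us, vs))
    return total / len(us)
```

Because the Beta kernel's shape changes with the evaluation point and never places mass outside [0,1], the estimate does not leak probability across the boundaries of the unit square, which is the boundary-bias advantage noted above.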

3.3.2 NP Model

In this approach, we combine nonparametric marginal distributions, based on the kernel estimators (1) and (3), with a parametric copula. We consider the following three parametric copulas from the Archimedean family:

  1. Clayton

    $$\begin{array}{@{}rcl@{}} C_{\theta}(u,v) &= & (u^{-\theta}+v^{-\theta}-1)^{-1/\theta}, \\ & & \hspace{2cm} 0<\theta<\infty. \end{array} $$
    (15)
  2. Frank

    $$\begin{array}{@{}rcl@{}} C_{\theta}(u,v) &=& -\theta^{-1}\log \left(1+\frac{(e^{-\theta u}-1)(e^{-\theta v}-1)}{e^{-\theta}-1}\right), \\ & & \hspace{2cm} -\infty<\theta<\infty,\ \theta\neq 0. \end{array} $$
    (16)
  3. Gumbel-Hougaard

    $$\begin{array}{@{}rcl@{}} C_{\theta}(u,v) &=& \exp\left[-\left((-\log u)^{\theta} + (-\log v)^{\theta}\right)^{1/\theta}\right], \\ & & \hspace{2cm} 1\leq\theta<\infty. \end{array} $$
    (17)

The copula parameter θ is usually estimated by Maximum Likelihood Estimation (MLE).
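The maximum likelihood step can be sketched for the Clayton copula as follows. The sketch maximizes \(\sum_i \log c(U_i, V_i; \theta)\) by golden-section search, which assumes the log-likelihood is unimodal on the search interval; the interval [0.001, 20], the tolerance and the simulator `sample_clayton` (used only to generate test data by conditional inversion) are illustrative assumptions, and all names are hypothetical. In the NP model, `us` and `vs` would be the kernel cdf values of the observed severities and durations. Frank and Gumbel-Hougaard fits would follow the same pattern with their own log-densities.

```python
import math
import random
from typing import List, Sequence, Tuple


def clayton_log_density(u: float, v: float, theta: float) -> float:
    """log c(u, v) for the Clayton copula of Eq. (15), theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def fit_clayton(us: Sequence[float], vs: Sequence[float],
                lo: float = 1e-3, hi: float = 20.0, tol: float = 1e-6) -> float:
    """Maximize sum_i log c(U_i, V_i; theta) over [lo, hi] by golden-section search."""
    def loglik(theta: float) -> float:
        return sum(clayton_log_density(u, v, theta) for u, v in zip(us, vs))

    g = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = loglik(c), loglik(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = loglik(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = loglik(d)
    return 0.5 * (a + b)


def sample_clayton(n: int, theta: float, rng: random.Random) -> Tuple[List[float], List[float]]:
    """Draw n pairs from the Clayton copula by inverting the conditional cdf of V given U."""
    us, vs = [], []
    for _ in range(n):
        u, w = 1.0 - rng.random(), 1.0 - rng.random()  # values in (0, 1]
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        us.append(u)
        vs.append(v)
    return us, vs
```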

3.3.3 NN Model

The third approach combines nonparametric marginal distributions with a nonparametric copula. Kernel density estimation is used to estimate the marginal distributions of rainfall severity and rainfall duration, and the Beta kernel copula of Eq. 14 is employed to obtain the joint distribution of the rainfall characteristics.

3.4 Application of Models to Data

In this section, we build copula-based models to estimate the joint distribution of the two rainfall characteristics. To recapitulate, data from six rainfall stations in areas generally considered flood prone in the state of Victoria, Australia, are used in this study. Archerton, Woods Point and Mount Buffalo Chalet are located in the north-eastern region of Victoria, and Tanybryn, Weaaproinah and Wyelangta are located in the south-western region.

Figure 2 shows the monthly SPI for Archerton from 1950 to 2010. The graph suggests that Archerton experienced a very wet event roughly once every four years between 1950 and 1981. From 1981 to 1997, however, the region had a relatively dry period, with SPI not exceeding 2.0. After this period, very wet events recurred roughly once every 5 to 6 years, with SPI exceeding 2.0 on two occasions.

For the first approach (PN model), four parametric distributions, namely the Gamma, Weibull, Lognormal and Exponential distributions, were fitted to rainfall severity and rainfall duration for the six stations. All parameters of each marginal distribution were estimated from the data by MLE. Table 3 presents the estimated parameters. Because a nonparametric approach was used to compute the SPI values, these results differ from those produced by the parametric approach in Abdul Rauf and Zeephongsekul (2014). The goodness-of-fit criteria used were the Schwarz Information Criterion (SIC), also known as the Bayesian Information Criterion (BIC), and the Akaike Information Criterion (AIC), here in its small-sample corrected form, defined below:

$$ BIC= -2 \ln (L_{max}) + k\,\ln(n) $$
(18)
$$ AIC= -2\ln (L_{max}) + (\frac{2nk}{n-k-1}). $$
(19)

Here n is the number of observations, k the number of parameters to be estimated and \(L_{max}\) the maximized value of the likelihood function for the estimated model. For each station, the best-fitting distribution for each rainfall characteristic is selected using the AIC and BIC values, which are displayed in Table 4; the best-fitting parametric model gives the lowest values of these criteria. The table shows that the Lognormal distribution best fits both rainfall severity and duration. Taking Wyelangta station as an example, we estimated its best-fitting cumulative distributions for rainfall severity and rainfall duration to be

$$ F_{S}(s)={{\int}_{0}^{s}} \frac{1}{0.47 \sqrt{2\pi}\,t}\, {e^{-\frac{(\ln t - 0.69)^{2}}{0.44}}}dt,\,\, s>0 $$
(20)

and

$$\begin{array}{@{}rcl@{}} F_{D}(d)={{\int}_{0}^{d}} \frac{1}{0.39 \sqrt{2\pi}\,t}\, e^{-\frac{(\ln t - 0.32)^{2}}{0.30}}dt, \,\,d>0 \end{array} $$
(21)

respectively. The Beta kernel copula density was computed using Eq. 14, with the uniform variables \(U_{i}\) and \(V_{i}\) obtained by transforming the observed severities and durations with the Lognormal distributions (20) and (21), respectively. The resulting copula density is plotted in Fig. 3. Note that the Beta kernel corrects for bias at the extreme corners: high rainfall severity is associated with long rainfall duration with higher probability, and likewise low severity with short duration.
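The marginal fitting and model selection steps of the PN model can be sketched as follows for two of the four candidate distributions, the Lognormal and Exponential, whose MLEs have closed forms; Gamma and Weibull fits need numerical optimization and are omitted. `aic_corrected` implements Eq. (19) as printed, which equals \(-2\ln L_{max} + 2k + 2k(k+1)/(n-k-1)\). `to_uniform_scores` shows how the \(U_{i}\) and \(V_{i}\) for Eq. (14) can be obtained from a fitted Lognormal cdf, whose closed form \(\Phi((\ln x-\mu)/\sigma)\) is equivalent to the integrals in Eqs. (20) and (21), with \(\mu = 0.69, \sigma = 0.47\) for severity and \(\mu = 0.32, \sigma = 0.39\) for duration. All function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def lognormal_fit(x: Sequence[float]) -> Tuple[float, float, float]:
    """Closed-form MLE (mu, sigma) of the Lognormal distribution and the maximized log-likelihood."""
    logs = [math.log(xi) for xi in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)  # MLE uses the 1/n denominator
    loglik = -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * n - sum(logs)
    return mu, sigma, loglik


def exponential_fit(x: Sequence[float]) -> Tuple[float, float]:
    """MLE rate of the Exponential distribution and the maximized log-likelihood."""
    rate = len(x) / sum(x)
    return rate, len(x) * math.log(rate) - len(x)


def bic(loglik: float, k: int, n: int) -> float:
    return -2.0 * loglik + k * math.log(n)  # Eq. (18); loglik = ln(L_max)


def aic_corrected(loglik: float, k: int, n: int) -> float:
    return -2.0 * loglik + 2.0 * n * k / (n - k - 1)  # Eq. (19)


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """Closed form of Eqs. (20)/(21): F(x) = Phi((ln x - mu) / sigma)."""
    return NormalDist().cdf((math.log(x) - mu) / sigma)


def to_uniform_scores(x: Sequence[float], mu: float, sigma: float) -> List[float]:
    """U_i = F(X_i): the uniform variables fed to the Beta kernel copula in the PN model."""
    return [lognormal_cdf(xi, mu, sigma) for xi in x]
```

The candidate with the smallest BIC (or AIC) would then be selected, as in Table 4; the Lognormal has k = 2 parameters and the Exponential k = 1.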

Table 3 Parameter estimates for marginal distributions
Table 4 Goodness of fit for marginal distributions: AIC and BIC
Fig. 3 Bivariate beta copula density function of rainfall severity and rainfall duration: PN, Wyelangta

We proceed with the second approach (NP model), which uses a parametric copula with nonparametric marginal distributions. For this purpose, the marginal distributions were estimated using kernel density estimators, and the three parametric Archimedean copulas given by Eqs. 15, 16 and 17 were chosen. For each copula, we estimated the parameter θ by MLE; the estimates are displayed in Table 5. Scatter plots of data simulated from the three fitted Archimedean copulas were generated for all six stations. Owing to space limitations, we display the plots only for the first three stations (Fig. 4). The graphs show dependence between the two variables at all stations: the Frank copula has a symmetric dependence structure, while the Clayton and Gumbel-Hougaard copulas have stronger dependence in the tails, namely in the lower (left) tail for the asymmetric Clayton copula and in the upper (right) tail for the Gumbel-Hougaard copula. From the plots, we conclude that the two rainfall characteristics are strongly dependent and that this dependence is captured by the fitted copulas.

Table 5 Parameter estimates for fitted copulas
Fig. 4 Parametric copula based joint distribution for 3 stations (Archerton, Woods Point and Mount Buffalo Chalet)

To determine which of the three copulas is best suited to representing the joint distribution, we use a goodness-of-fit test based on a variant of the Cramér-von Mises statistic \(S_{n}\) introduced by Genest et al. (2009) for families of copulas. Here \(S_{n}\) is defined by:

$$ S_{n}=\sum\limits_{i=1}^{n} \left(C_{Ni}-C_{\theta i}\right)^{2} $$
(22)

where \(C_{Ni}\) is the empirical copula (Genest et al. 2009) and \(C_{\theta i}\) the fitted parametric copula, both evaluated at the i-th sample point. The copula with the smallest value of \(S_{n}\) and the largest p-value is chosen to represent the joint distribution of rainfall severity and duration. The values of \(S_{n}\) and the p-values for all three copulas are presented in Table 6. For all six stations, the Clayton copula provides the best fit, with the largest p-value. The bivariate Clayton copula density for Wyelangta station is displayed in Fig. 5; it shows that the Clayton copula concentrates more probability in the lower (left) tail.
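A sketch of the statistic in Eq. (22), assuming the pseudo-observations are the rescaled ranks \(R_{i}/(n+1)\) as in Genest et al. (2009). Computing the p-value additionally requires a parametric bootstrap (refitting θ on samples simulated from the fitted copula), which is not shown. The function names are hypothetical, and ties in the data are not handled.

```python
from typing import Callable, List, Sequence


def rank_scores(x: Sequence[float]) -> List[float]:
    """Pseudo-observations R_i / (n + 1), where R_i is the rank of x_i (ties not handled)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = rank / (n + 1)
    return scores


def empirical_copula(u: float, v: float, us: Sequence[float], vs: Sequence[float]) -> float:
    """C_N(u, v) = (1/n) * #{j : U_j <= u and V_j <= v}."""
    return sum(1 for uj, vj in zip(us, vs) if uj <= u and vj <= v) / len(us)


def cvm_statistic(us: Sequence[float], vs: Sequence[float],
                  copula_cdf: Callable[[float, float], float]) -> float:
    """S_n of Eq. (22): sum of squared gaps between C_N and the fitted copula at each (U_i, V_i)."""
    return sum((empirical_copula(ui, vi, us, vs) - copula_cdf(ui, vi)) ** 2
               for ui, vi in zip(us, vs))
```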

Table 6 Goodness-of-fit test for fitted copulas using \(S_{n}\)
Fig. 5 Bivariate Clayton copula density function of rainfall severity and rainfall duration: NP, Wyelangta

As shown by Charpentier et al. (2006), a parametric copula tends to underestimate the true joint distribution. In the third approach (NN), the Beta kernel is essential for reducing the bias at the boundaries. Fig. 6 shows two peaks near the lower tails and a higher peak in the upper tail of the distribution. There are some obvious disparities between this figure and Figs. 3 (PN model) and 5 (NP model) for the same station. Figure 5 shows low probability density near the tails of the surface, which may not reflect the true nature of the joint distribution.

Fig. 6 Bivariate beta copula density function of rainfall severity and rainfall duration: NN, Wyelangta

To check goodness of fit, we compute the Mean Absolute Error (MAE) between the fitted copula values from each of the three approaches and the corresponding empirical copula values. The MAE is defined similarly to \(S_{n}\), but uses absolute rather than squared deviations, averaged over the sample:

$$ \textrm{MAE}=\frac{1}{n}\,\sum\limits_{i=1}^{n} |C_{Pi} - C_{Ni}| $$
(23)

where \(C_{Pi}\) are the fitted copula values calculated from each of the three approaches and \(C_{Ni}\) are the empirical copula values (Deheuvels 1979). Table 7 shows that the PN and NN approaches generally give the smallest MAE values, with NN giving the smallest for most of the selected stations. This again suggests that the nonparametric approach has considerable merit over the parametric approach when fitting joint distributions to long historical hydrological records. Whether this holds in general remains to be seen with new field work and additional data collection.
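Eq. (23) can be computed with the same empirical copula. In this sketch `fitted_cdf` is any callable returning \(C_{P}(u, v)\); for the Beta kernel copula this would require numerically integrating the density estimate (14), a step assumed rather than shown here. The function names are hypothetical.

```python
from typing import Callable, Sequence


def empirical_copula(u: float, v: float, us: Sequence[float], vs: Sequence[float]) -> float:
    """C_N(u, v) = (1/n) * #{j : U_j <= u and V_j <= v}."""
    return sum(1 for uj, vj in zip(us, vs) if uj <= u and vj <= v) / len(us)


def copula_mae(us: Sequence[float], vs: Sequence[float],
               fitted_cdf: Callable[[float, float], float]) -> float:
    """MAE of Eq. (23): mean |C_P - C_N| over the sample points (U_i, V_i)."""
    n = len(us)
    return sum(abs(fitted_cdf(ui, vi) - empirical_copula(ui, vi, us, vs))
               for ui, vi in zip(us, vs)) / n
```

Comparing the three approaches then amounts to calling `copula_mae` once per fitted copula on the same \((U_{i}, V_{i})\).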

Table 7 Comparison of three approaches using MAE

3.5 Return Periods

A return period is the average time interval between consecutive occurrences of an event such as a severe drought, flood or extreme rainfall. Estimating the return periods of such events at various levels of severity is an essential part of any hydrological or water planning project. Most hydrological characteristics, such as rainfall severity and duration, are highly dependent, so separate analysis of each characteristic is not sufficient for assessing flood risk or performing flood analysis. A multivariate approach is therefore essential for analyzing these events. For example, Kim (2003) obtained the joint distribution of drought duration and drought intensity using a bivariate kernel estimator in order to estimate joint return periods for the arid regions of the Conchos River Basin, Mexico. However, as mentioned, the traditional approach of modeling the joint distribution of rainfall characteristics with standard bivariate distributions has limitations, and these can be circumvented by using copulas. In this section, the expected return periods based on single or joint characteristics, which Shiau and Shen (2001) formulated for drought events of given severity and duration, are applied to the rainfall data from Victoria. This incorporates copulas and the proposed semiparametric approaches outlined in Table 2.

Let L be the interarrival time of extreme rainfall events. The expected return period (Shiau and Shen 2001) for floods with severity S greater than or equal to a certain value s is then given by:

$$ T_{s}=\frac{E(L)}{1-F_{S}(s)} $$
(24)

where \(F_{S}(s)\) is the cumulative distribution function of S. Similarly, the expected return period for floods with rainfall duration D greater than or equal to d is given by

$$ T_{d}=\frac{E(L)}{1-F_{D}(d)} $$
(25)

where \(F_{D}(d)\) is the cumulative distribution function of the rainfall duration D.

In this study, we used two approaches to estimate the cumulative distribution functions of rainfall severity and duration. In the first (parametric) approach, we used the Lognormal distribution for both \(F_{S}(s)\) and \(F_{D}(d)\), since this distribution best fits both rainfall characteristics (Table 4). In the second (nonparametric) approach, we used a kernel density estimator to estimate the distributions of the two rainfall characteristics.

The expected interarrival times of extreme rainfall events, E(L), for the six selected stations in Table 1 were estimated to be 10.0, 10.8, 9.0, 10.6, 8.9 and 9.2 months, respectively. The expected return periods for severity and duration exceeding given values were then calculated using Eqs. 24 and 25, respectively. Return periods of up to 100 years are plotted against rainfall severity for the Lognormal distribution and the kernel density estimate in Figs. 7 and 8, respectively. For example, under the Lognormal distribution a severity exceeding 8.60 and a duration exceeding 5.2 each correspond to a return period of 100 years; under the kernel density estimator the corresponding values are 10.7 and 5.9. As expected, the return periods rise sharply with increasing severity and duration, but much more smoothly in the parametric case than in the nonparametric case. The results also suggest that areas with higher mean annual rainfall and shorter interarrival times are more exposed to extreme rainfall events that can cause floods.
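Eqs. (24) and (25), and their inversion for a target return period, can be sketched as follows under a Lognormal marginal. T is returned in the same unit as E(L), so a 100-year return period corresponds to `period=1200.0` when E(L) is in months. Pairing, for example, E(L) = 9.2 months with the Wyelangta severity parameters of Eq. (20) is purely illustrative, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """Lognormal cdf F(x) = Phi((ln x - mu) / sigma)."""
    return STD_NORMAL.cdf((math.log(x) - mu) / sigma)


def return_period(cdf_value: float, mean_interarrival: float) -> float:
    """Eqs. (24)/(25): T = E(L) / (1 - F(x)), in the same time unit as E(L)."""
    return mean_interarrival / (1.0 - cdf_value)


def lognormal_level(period: float, mu: float, sigma: float, mean_interarrival: float) -> float:
    """Invert Eq. (24): the level x whose exceedance has expected return period `period`."""
    p = 1.0 - mean_interarrival / period  # required value of F(x)
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(p))
```

For instance, `lognormal_level(1200.0, 0.69, 0.47, 9.2)` gives the severity whose 100-year return period follows from these assumed inputs; with a kernel estimate, `lognormal_cdf` would be replaced by the kernel cdf (3) and the inversion done numerically.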

Fig. 7 Return period for rainfall severity using Lognormal

Fig. 8 Return period for rainfall severity using kernel density estimate

Next, we consider joint and conditional return periods, which explicitly involve the dependence between the two rainfall characteristics. The joint probability can be expressed in terms of copulas, which leave the marginal distributions free to assume any appropriate parametric or nonparametric form. The following four expected return periods were introduced by Shiau (2003) and are readily interpretable in terms of joint and conditional probabilities:

$$\begin{array}{@{}rcl@{}} T_{DS}&=& \frac{E(L)}{P(D \geq d \quad \textit{and} \quad S\geq s)} \\ &=&\frac{E(L)}{1-F_{D}(d)-F_{S}(s)+F_{DS}(d,s)} \\ &=&\frac{E(L)}{1-F_{D}(d)-F_{S}(s)+C(F_{D}(d), F_{S}(s))}. \end{array} $$
(26)
$$\begin{array}{@{}rcl@{}} T^{\prime}_{DS}&=& \frac{E(L)}{P(D \geq d \quad \textit{or} \quad S\geq s)}=\frac{E(L)}{1-F_{DS}(d,s)} \\ &=&\frac{E(L)}{1-C(F_{D}(d), F_{S}(s))} \end{array} $$
(27)
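
The sketch below evaluates Eqs. (26) and (27) for one (d, s) pair, assuming a Clayton copula (the best-fitting Archimedean copula reported in the conclusions). The function names, the marginal CDF values and the copula parameter theta are hypothetical placeholders; in practice F_D(d) and F_S(s) would come from whichever parametric or nonparametric marginals are being used.

```python
def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0.0:
        raise ValueError("this sketch only handles theta > 0")
    if u <= 0.0 or v <= 0.0:
        return 0.0  # boundary condition C(u, 0) = C(0, v) = 0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def joint_return_periods(f_d, f_s, expected_interarrival, theta):
    """Eqs. (26) and (27): return (T_DS, T'_DS) in the units of E(L).
    f_d = F_D(d) and f_s = F_S(s) are marginal CDF values."""
    c = clayton_copula(f_d, f_s, theta)
    p_and = 1.0 - f_d - f_s + c  # P(D >= d and S >= s)
    p_or = 1.0 - c               # P(D >= d or S >= s)
    t_and = expected_interarrival / p_and if p_and > 0.0 else float("inf")
    t_or = expected_interarrival / p_or if p_or > 0.0 else float("inf")
    return t_and, t_or


if __name__ == "__main__":
    # Hypothetical marginal CDF values and copula parameter, for illustration only.
    t_and, t_or = joint_return_periods(f_d=0.9, f_s=0.9,
                                       expected_interarrival=10.0, theta=2.0)
    print(f"T_DS = {t_and / 12:.1f} years, T'_DS = {t_or / 12:.1f} years")
```

Because P(D ≥ d and S ≥ s) can never exceed P(D ≥ d or S ≥ s), the "and" return period is always at least as long as the "or" return period.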

\(T_{DS}\) (\(T^{\prime}_{DS}\)) is the joint return period of events with duration \(D \geq d\) and (or) severity \(S \geq s\). The following conditional return periods are obtained by further conditioning (26) on \(S \geq s\) and \(D \geq d\), respectively:

$$\begin{array}{@{}rcl@{}} T_{D|S\geq s} &=& \frac{T_{DS}}{1-F_{S}(s)} \\ &=& \frac{T_{s}}{1-F_{D}(d)-F_{S}(s)+C(F_{D}(d), F_{S}(s))} \end{array} $$
(28)

and

$$\begin{array}{@{}rcl@{}} T_{S|D\geq d} &=& \frac{T_{DS}}{1-F_{D}(d)} \\ &=& \frac{T_{d}}{1-F_{D}(d)-F_{S}(s)+C(F_{D}(d), F_{S}(s))} \end{array} $$
(29)
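
A minimal sketch of Eqs. (28) and (29) follows, again assuming a Clayton copula with a hypothetical parameter. It further assumes that the univariate return periods of Eqs. 24 and 25 are T_s = E(L) / (1 - F_S(s)) and T_d = E(L) / (1 - F_D(d)), which makes the two forms given in each of Eqs. (28) and (29) agree. Function names are illustrative.

```python
def clayton_copula(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    if theta <= 0.0:
        raise ValueError("this sketch only handles theta > 0")
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def conditional_return_periods(f_d, f_s, expected_interarrival, theta):
    """Eqs. (28) and (29): return (T_{D|S>=s}, T_{S|D>=d}).
    Requires 0 < F_D(d) < 1 and 0 < F_S(s) < 1."""
    if not (0.0 < f_d < 1.0 and 0.0 < f_s < 1.0):
        raise ValueError("marginal CDF values must lie strictly between 0 and 1")
    c = clayton_copula(f_d, f_s, theta)
    p_and = 1.0 - f_d - f_s + c                 # P(D >= d and S >= s), as in Eq. (26)
    t_s = expected_interarrival / (1.0 - f_s)   # assumed form of Eq. 24
    t_d = expected_interarrival / (1.0 - f_d)   # assumed form of Eq. 25
    return t_s / p_and, t_d / p_and             # Eq. (28), Eq. (29)


if __name__ == "__main__":
    # Hypothetical inputs: F_D(d) = 0.8, F_S(s) = 0.95, E(L) = 10 months, theta = 2.
    t_d_given_s, t_s_given_d = conditional_return_periods(0.8, 0.95, 10.0, 2.0)
    print(f"T_D|S>=s = {t_d_given_s / 12:.1f} years, "
          f"T_S|D>=d = {t_s_given_d / 12:.1f} years")
```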

Figs. 9 and 10 show \(T_{D|S\geq s}\) and \(T_{S|D\geq d}\) for the Archerton station under all three approaches, for rainfall severity and rainfall duration exceeding the values listed on the respective abscissae. The figures show a similar trend in all cases. However, the PN and NP cases tend to give higher return periods for high severity at a given duration, while the PN and NN cases tend to give higher return periods for long durations at a given severity.
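
To illustrate Eq. (28) in a fully nonparametric (NN-style) setting, the sketch below pairs empirical marginal CDFs with the empirical copula built from rank-based pseudo-observations. This is only one simple nonparametric copula; the estimator used in this study may differ (for example, a kernel-smoothed copula), and all function names are hypothetical.

```python
import math


def empirical_cdf(data, x):
    """Nonparametric marginal F_n(x) = #{x_i <= x} / n."""
    return sum(1 for xi in data if xi <= x) / len(data)


def pseudo_observations(data):
    """Rank-based pseudo-observations U_i = rank(x_i) / n (ties broken by position)."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / n
    return u


def empirical_copula(u_obs, v_obs, u, v):
    """Empirical copula C_n(u, v) = #{i : U_i <= u and V_i <= v} / n."""
    return sum(1 for a, b in zip(u_obs, v_obs) if a <= u and b <= v) / len(u_obs)


def nn_return_period_d_given_s(durations, severities, d, s, expected_interarrival):
    """T_{D|S>=s} from Eq. (28) with empirical marginals and copula.
    With step-function CDFs, 1 - F_n(x) counts strict exceedances (> x)."""
    f_d = empirical_cdf(durations, d)
    f_s = empirical_cdf(severities, s)
    c = empirical_copula(pseudo_observations(durations),
                         pseudo_observations(severities), f_d, f_s)
    p_and = 1.0 - f_d - f_s + c
    if f_s >= 1.0 or p_and <= 1e-12:  # small tolerance guards against rounding
        return math.inf  # no observed event exceeds (d, s): not estimable
    return expected_interarrival / ((1.0 - f_s) * p_and)
```

Evaluating the empirical copula at (F_n(d), F_n(s)) reproduces the joint empirical CDF of the observed pairs, so the copula step adds no smoothing here; a kernel copula would be the natural refinement.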

Fig. 9

Conditional return period of rainfall duration when rainfall severity exceeds a certain value: PN (top right), NP (top left) and NN (bottom)

Fig. 10

Conditional return period of rainfall severity when rainfall duration exceeds a certain value: PN (top right), NP (top left) and NN (bottom)

4 Conclusions

Severe floods are a worldwide phenomenon and are becoming more frequent owing to erratic weather and climate change. Since rainfall is the main cause of flood events, and its duration and severity influence how long floods persist, these two rainfall characteristics are important variables in hydrological studies.

In this paper, we used rainfall data from six selected stations in north-eastern and south-western Victoria, Australia, to analyze rainfall severity and duration in that state. A copula methodology was used to derive the joint distribution of these variables. A novelty of the paper is the use of both a nonparametric and a semiparametric approach: in the first, both the copula and the marginal distributions take nonparametric forms, while in the second, the copula and the marginal distributions take turns assuming parametric and nonparametric forms. This contrasts sharply with the standard approach adopted in many papers, in which both the marginal distributions and the copula are parametric. Estimating the parameters of such purely parametric models with standard techniques is often time-consuming. Furthermore, their accuracy is open to legitimate concern when the assumptions underlying these models are violated or when data sets are small. The nonparametric approach ameliorates these problems and can give better results without assuming a particular form for either the marginal or the copula distributions.

We began by quantifying severity through the Standard Precipitation Index (SPI). For SPI estimation, we presented an alternative approach that uses nonparametric kernel density estimation with a Gaussian kernel to estimate the probability density function of rainfall intensity. We then applied several copula-based approaches, each combining parametric or nonparametric marginal distributions with a parametric or nonparametric copula, to model the two rainfall characteristics. Using goodness-of-fit tests, we found that the Lognormal distribution provides the best fit among the four parametric marginal distributions considered, and the Clayton copula the best fit among the three Archimedean copulas chosen. Further, Table 7 indicates that the purely nonparametric approach (NN) generally fits the data better than the two mixed approaches. Finally, we used the three approaches to derive several return periods of severe rainfall events for the selected stations. Estimating return periods is crucial in water management planning, and the results obtained are in line with expectations. The parametric and nonparametric approaches gave similar trends, with the parametric approach yielding slightly higher return periods than the nonparametric approach.
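
The SPI step summarized above can be sketched as follows: estimate the CDF of rainfall intensity with a Gaussian kernel density estimate, then map each cumulative probability to a standard normal quantile. The rule-of-thumb bandwidth (1.06 times the sample standard deviation times n to the power -1/5), the clamping constant and the function names are assumptions for illustration; the bandwidth selector used in this study is not stated in this section.

```python
from statistics import NormalDist, stdev

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth h = 1.06 * sd * n^(-1/5) for a Gaussian kernel
    (an assumed choice of bandwidth selector)."""
    return 1.06 * stdev(data) * len(data) ** (-0.2)


def kde_cdf(x, data, h):
    """CDF of the Gaussian kernel density estimate: mean of N(x_i, h^2) CDFs."""
    return sum(STD_NORMAL.cdf((x - xi) / h) for xi in data) / len(data)


def spi_kde(x, data, h=None, eps=1e-10):
    """SPI-style index: SPI = Phi^{-1}(F_hat(x)), with F_hat the KDE-based CDF."""
    if h is None:
        h = silverman_bandwidth(data)
    p = min(max(kde_cdf(x, data, h), eps), 1.0 - eps)  # keep the quantile finite
    return STD_NORMAL.inv_cdf(p)


if __name__ == "__main__":
    intensities = [0.5, 1.1, 1.7, 2.3, 3.8, 5.0]  # hypothetical rainfall intensities
    for x in (1.0, 2.0, 4.0):
        print(x, round(spi_kde(x, intensities), 3))
```

A Gaussian kernel places some probability mass below zero for a positive variable such as rainfall intensity; this sketch does not correct for that boundary effect.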

There is much scope for applying the nonparametric approaches adopted in this paper to flood or even drought events in other regions of the world. It would also be interesting to see whether the results obtained in this paper can be replicated elsewhere. Finally, the methods can be extended without much difficulty to more than two rainfall characteristics, making them applicable under a wider range of flood conditions.