Introduction

Drought is an extreme event in the hydrologic cycle, and its occurrence has both deterministic and random components. With the increasing impact of climate change and anthropogenic activities, drought is occurring in more areas and with higher frequency, and it has become one of the major factors affecting sustainable economic and social development. Drought occurs frequently not only in northern China, where water resources are scarce, but also in southern China, where water resources are relatively abundant. In recent years, several extreme drought events have occurred in southwest China and in the middle and lower reaches of the Yangtze River. From 1950 to 2010, the annual average affected area (the area where crop yields fell by more than 10 % relative to normal annual yields) and damaged area (the area where crop yields fell by more than 30 % relative to normal annual yields) of drought disasters were nearly 0.21 × 10⁸ km² and 0.10 × 10⁸ km², respectively, which were 2.19 and 1.77 times the corresponding impacts of flood disasters (State Flood Control and Drought Relief Headquarters 2010). It is therefore important to study the probability and impact of drought duration and severity in order to cope with drought.

Drought has typical probabilistic characteristics (Sen 1980a; Loaiciga and Leipnik 1996; Chung and Salas 2000; Mishra et al. 2009) and is a hydrological extreme described by multiple correlated variables, such as duration and severity. The theory of runs supports the probabilistic estimation of drought variables in the time domain (Downer et al. 1967; Llamas and Siddiqui 1969; Sen 1976, 1980b; Dracup et al. 1980a, b; Frick et al. 1990; Fernández and Salas 1999a, b). At present, the probabilistic analysis of drought falls into three groups: (1) univariate analysis (Sen 1980a, b; Güven 1983; Zelenhasic and Salvai 1987; Mathier et al. 1992; Sharma 1995), (2) bivariate analysis (Shiau and Shen 2001; Bonaccorso et al. 2003; Kim et al. 2003; González and Valdés 2003; Salas et al. 2005; Mishra et al. 2009; Song and Singh 2010a, b), and (3) multivariate analysis based on the Copula function (Favre et al. 2004; Salvadori and De Michele 2004; Genest and Favre 2007; Poulin et al. 2007; Zhang and Singh 2006, 2007; Bárdossy and Li 2008; Chowdhary and Singh 2010; Shiau 2006; Shiau et al. 2007). Because drought variables are significantly correlated and follow different distributions, univariate analysis and bivariate analysis that assumes the same type of marginal distribution cannot properly describe the joint distribution of drought variables. The Copula function, in contrast, allows each marginal distribution to take any form and can represent various kinds of dependence among variables, which makes it flexible and adaptable (Xie and Huang 2008).

In this paper, we use the theory of runs to identify drought duration and severity, kernel density estimation to obtain their marginal distribution functions, and the Copula function to obtain their joint distribution function. We then calculate the return period and the drought risk in China based on monthly Palmer Drought Severity Index (PDSI) data at 188 stations from 1901 to 2010.

Methodology

Theory of runs

The drought index used here is the self-calibrated PDSI with Penman-Monteith PE (sc_PDSI_pm), based on the IPCC AR4 (IPCC 2007) 22-model ensemble mean climate under the twentieth-century forcing and the A1B scenario (Dai et al. 2004; Dai 2011a, b). Details of the data are given by Dai (2011a). This study collected monthly sc_PDSI_pm data at 188 stations in China from 1901 to 2010 (Fig. 1).

Fig. 1 Stations for drought risk analysis

The drought duration \( D \) is the length of time during which a drought parameter stays continuously below the critical level, that is, the period between the initiation and termination of a drought event (the positive run length). The drought severity \( S \) is the cumulative deficiency of the drought parameter below the critical level. \( {X}_0 \), \( {X}_1 \), and \( {X}_2 \) are thresholds of the PDSI (Table 1). In Fig. 2, “g” is a drought event because X is more than \( {X}_1 \). “h” is not a drought event because D is only one unit and X is less than \( {X}_2 \), although it is more than \( {X}_1 \). “p” is a drought event because X is more than \( {X}_1 \); although one unit between \( {D}_1 \) and \( {D}_2 \) is below \( {X}_1 \), the two runs are pooled into a single event with \( D={D}_1+{D}_2+1 \) and \( S={S}_1+{S}_2 \). More details can be found in Lu et al. (2010).

Table 1 Thresholds of the PDSI parameters

Fig. 2 Identification of the drought duration and severity
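
The identification rules described above can be sketched in a few lines of Python. This is only an illustrative reading of the rules, not the procedure of Lu et al. (2010): the function name `identify_droughts`, the threshold ordering x0 > x1 > x2, the convention that lower index values are drier, and the sample values are all assumptions, since the thresholds of Table 1 are not reproduced here.

```python
def identify_droughts(x, x0, x1, x2):
    """Return a list of (start, duration, severity) tuples.

    Assumed conventions (not taken from Table 1): lower index values are
    drier, x0 > x1 > x2, a month is "dry" when x[t] < x1, and severity is
    the cumulative deficit sum(x1 - x[t]) over the dry months.
    """
    # Step 1: collect raw runs of consecutive dry months.
    runs = []
    start = None
    for t, value in enumerate(list(x) + [float("inf")]):  # sentinel closes the last run
        if value < x1:
            if start is None:
                start = t
        elif start is not None:
            runs.append((start, t - 1))
            start = None

    # Step 2: pool two runs separated by exactly one month that stays below x0,
    # so that D = D1 + D2 + 1 and S = S1 + S2.
    pooled = []
    for first, last in runs:
        if pooled and first - pooled[-1][1] == 2 and x[first - 1] < x0:
            pooled[-1] = (pooled[-1][0], last)
        else:
            pooled.append((first, last))

    # Step 3: drop one-month runs that never fall below x2.
    events = []
    for first, last in pooled:
        duration = last - first + 1
        if duration == 1 and x[first] >= x2:
            continue
        severity = sum(x1 - x[t] for t in range(first, last + 1) if x[t] < x1)
        events.append((first, duration, severity))
    return events


# Hypothetical monthly index values and thresholds, for illustration only.
series = [0.5, -1.5, -2.0, -0.8, -1.2, 0.3, -1.1, 0.4, -3.5, 1.0]
events = identify_droughts(series, x0=-0.5, x1=-1.0, x2=-3.0)
```

In this toy series the two runs around month 3 are pooled into one four-month event, the isolated mild month 6 is discarded, and the isolated but intense month 8 is kept.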

Kernel density estimation

Kernel density estimation is a common non-parametric estimation method (Guo et al. 1996). The kernel probability density function for a single variable is

$$ {f}_X(x)=\frac{1}{nh}{\displaystyle \sum_{i=1}^nK\left(\frac{x-{x}_i}{h}\right)} $$
(1)

Here, n is the number of observed values \( {x}_i \); K(·) is the kernel density estimation function (KDEF); and h is the bandwidth, which determines the variance of the KDEF. The Uniform, Triangle, Epanechnikov, and Gaussian functions are common KDEFs (Silverman 1998; Tae-Woong et al. 2003; Lall et al. 1996).
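
As a minimal sketch, Eq. (1) can be evaluated directly with the Python standard library. The function names and the sample of drought durations below are hypothetical and serve only to show how the bandwidth h and the KDEF enter the estimate.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the Gaussian KDEF."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def epanechnikov_kernel(u):
    """Epanechnikov KDEF, supported on |u| <= 1."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_density(x, sample, h, kernel=gaussian_kernel):
    """Evaluate Eq. (1): f_X(x) = 1/(n h) * sum_i K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)


# Hypothetical drought durations in months (illustrative only, not study data).
durations = [2, 3, 3, 4, 5, 7, 9]
density_at_3 = kernel_density(3.0, durations, h=0.5)
```

Because each kernel integrates to one, the estimate is itself a density for any positive h; only its smoothness changes with the bandwidth.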

Copula function

A Copula function connects a joint distribution function with its marginal distribution functions (Wei and Zhang 2008). A bivariate Copula function C(·, ·) has the following properties: (1) its domain is I², that is, [0, 1]²; (2) it is grounded, that is, C(u, 0) = C(0, v) = 0, and it is two-dimensionally increasing; and (3) C(u, 1) = u and C(1, v) = v for any u and v in [0, 1]. Assume that F(x) and G(y) are continuous one-dimensional distribution functions, and let u = F(x) and v = G(y). Then u and v are both uniformly distributed on [0, 1]. Therefore, C(u, v) is a two-dimensional distribution function whose marginal distributions are uniform on [0, 1], and 0 ≤ C(u, v) ≤ 1 for any (u, v) in its domain.

Let H(·, ·) be a joint distribution function with marginal distribution functions F(·) and G(·). According to Sklar's theorem (Sklar 1959), there exists a Copula function C(·, ·) such that

$$ H\left(x,y\right)=C\left(F(x),G(y)\right) $$
(2)

If F(·) and G(·) are continuous, C(·, ·) is uniquely determined. Conversely, if F(·) and G(·) are one-dimensional distribution functions and C(·, ·) is a Copula function, then H(·, ·) defined by Eq. (2) is a joint distribution function with marginal distribution functions F(·) and G(·).

The normal Copula (Nelsen 2006), the t-Copula (Bouyé et al. 2000; Cherubini et al. 2004), and the Archimedean Copulas (Genest and Mackay 1986) are common bivariate Copula functions; the Gumbel Copula (Frees and Valdez 1998; Patton 2002), the Clayton Copula, and the Frank Copula are common bivariate Archimedean Copulas.
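
For reference, the three Archimedean Copulas named above have simple closed forms. The sketch below uses their standard textbook parameterizations, which are assumed here because the text does not write them out; the function names are illustrative.

```python
import math


def gumbel_copula(u, v, theta):
    """Gumbel copula, theta >= 1 (theta = 1 gives independence)."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    a = (-math.log(u)) ** theta
    b = (-math.log(v)) ** theta
    return math.exp(-((a + b) ** (1.0 / theta)))


def clayton_copula(u, v, theta):
    """Clayton copula, theta > 0."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def frank_copula(u, v, theta):
    """Frank copula, theta != 0."""
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_cdf(f_x, g_y, copula, theta):
    """Eq. (2): H(x, y) = C(F(x), G(y)), given the marginal probabilities."""
    return copula(f_x, g_y, theta)
```

Each function satisfies the boundary property C(u, 1) = u and C(1, v) = v listed above, which gives a quick sanity check for any implementation.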

Return period

The return periods of the marginal distributions of drought duration and severity can be calculated from Eq. (3), which was derived by Shiau and Shen (2001).

$$ \begin{array}{c}\hfill {T}_D=\frac{E(L)}{1-{F}_D(d)}\hfill \\ {}\hfill {T}_S=\frac{E(L)}{1-{F}_S(s)}\hfill \end{array} $$
(3)

Here, \( {T}_D \) and \( {T}_S \) are the return periods of the marginal distributions of drought duration and severity, respectively; \( {F}_D(d) \) and \( {F}_S(s) \) are the marginal distribution functions of drought duration and severity; and E(L) is the expected value of the drought interval.

The return period of the joint distribution of drought duration and severity covers two cases: \( {T}_o \) (D > d or S > s) and \( {T}_a \) (D > d and S > s). Both are return periods of a bivariate joint distribution based on the Copula function and can be calculated from Eq. (4), proposed by Shiau (2003).

$$ \begin{array}{c}\hfill {T}_o\left(d,s\right)=\frac{E(L)}{1-F\left(d,s\right)}\kern6em \hfill \\ {}\hfill {T}_a\left(d,s\right)=\frac{E(L)}{1-{F}_D(d)-{F}_S(s)+F\left(d,s\right)}\hfill \end{array} $$
(4)

Here, F(d, s) is the joint distribution function of the drought duration and severity.
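
Eqs. (3) and (4) translate directly into code. The sketch below is illustrative only: the function name and the input probabilities are hypothetical, and the result is in the same time unit as E(L).

```python
def return_periods(f_d, f_s, f_ds, expected_interval):
    """Return (T_D, T_S, T_o, T_a) from Eqs. (3) and (4).

    f_d, f_s : marginal probabilities F_D(d) and F_S(s)
    f_ds     : joint probability F(d, s), e.g. from a copula
    expected_interval : E(L), the expected drought interval
    """
    t_d = expected_interval / (1.0 - f_d)
    t_s = expected_interval / (1.0 - f_s)
    t_or = expected_interval / (1.0 - f_ds)               # D > d or S > s
    t_and = expected_interval / (1.0 - f_d - f_s + f_ds)  # D > d and S > s
    return t_d, t_s, t_or, t_and


# Hypothetical probabilities and a mean interval of 5 months, for illustration.
t_d, t_s, t_or, t_and = return_periods(f_d=0.90, f_s=0.92, f_ds=0.88,
                                       expected_interval=5.0)
```

Since F(d, s) can never exceed either marginal probability, the "or" return period is never longer than the shorter marginal one, and the "and" return period is never shorter than the longer marginal one.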

The associated drought risk R, that is, the probability that an extreme drought with a T-year return period occurs at least once in N years, can be calculated as follows (Chow et al. 1988).

$$ R=1-{\left(1-\frac{1}{T}\right)}^N $$
(5)
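
A short numerical sketch of Eq. (5) follows. The function name is illustrative, and the conversion of a return period from months to years by dividing by 12 is an assumption made here so that T and N share the same unit.

```python
def drought_risk(return_period_years, horizon_years):
    """Eq. (5): probability of at least one T-year event within N years."""
    if return_period_years < 1.0:
        raise ValueError("Eq. (5) needs a return period of at least one year")
    return 1.0 - (1.0 - 1.0 / return_period_years) ** horizon_years


# Hypothetical example: a return period of 58 months, converted to years.
t_years = 58.0 / 12.0
risk_by_horizon = {n: drought_risk(t_years, n) for n in (5, 10, 20, 50)}
```

For a fixed return period the risk grows monotonically with the horizon N and approaches 1, which is why longer forecast periods always show higher risk.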

Results and discussion

Marginal distribution functions of the drought duration and severity

The two main factors affecting kernel density estimation are the bandwidth and the KDEF, so the optimal bandwidth and the optimal KDEF for drought duration and severity must both be determined. First, the Gaussian function is fixed as the KDEF and the influence of different bandwidths is examined, taking Mainland China as an example (excluding Taiwan, Hong Kong, and Macao; the same below) (Fig. 3). The values and curve shapes of the kernel density estimates differ considerably with bandwidth. With a small bandwidth (h = 0.3 for drought duration and h = 0.5 for drought severity), the curves are more wiggly and less smooth, and they reflect many details. With a large bandwidth (h = 2.0 for drought duration and h = 3.0 for drought severity), the curves are much smoother, and many details are hidden. According to the Silverman empirical rule (Silverman 1998), the bandwidth is 0.5 for drought duration and 1.0 for drought severity.

Second, the bandwidth is fixed at these values (h = 0.5 for drought duration and h = 1.0 for drought severity), and the influence of the Gaussian, Uniform, Triangle, and Epanechnikov kernel functions is examined (Fig. 4). The choice of kernel function has little influence on the kernel density estimate, but the smooth Gaussian kernel performs better than the Uniform kernel. Following the Silverman empirical rule, the Gaussian kernel function is therefore adopted.
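
One widely used form of Silverman's rule of thumb for a Gaussian kernel is h = 0.9 min(σ, IQR/1.34) n^(-1/5). The text does not state which variant was applied, so the sketch below is an assumption about the rule, and the function name is illustrative.

```python
import statistics


def silverman_bandwidth(sample):
    """Rule-of-thumb bandwidth h = 0.9 * min(sigma, IQR / 1.34) * n**(-1/5)."""
    n = len(sample)
    sigma = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    # Fall back to sigma alone when ties make the interquartile range zero.
    spread = min(sigma, iqr / 1.34) if iqr > 0 else sigma
    return 0.9 * spread * n ** (-0.2)


# Hypothetical sample, for illustration only.
h = silverman_bandwidth([1, 2, 3, 4, 5, 6, 7, 8])
```

The rule shrinks the bandwidth slowly as the sample grows, so long records yield estimates that resolve more detail.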

Fig. 3 Maps of the kernel density estimation with different bandwidths. The unit of the drought duration is months; the unit of the drought severity and the density function is 1 (the same as in Fig. 4)

Fig. 4 Maps of the kernel density estimation with different KDEFs

To compare the fit of parametric and non-parametric methods, three frequently used distributions (normal, exponential, and gamma) are fitted to the cumulative distributions of drought duration and drought severity and compared with the kernel estimate (h = 0.5 for drought duration, h = 1.0 for drought severity, Gaussian kernel in both cases). The empirical distribution is estimated with the Kaplan-Meier method. The non-parametric kernel estimate represents the probability density features of non-unimodal data better than the parametric methods (Fig. 5). Table 2 shows the optimal bandwidths of the kernel density estimation and the basic statistics of drought duration and severity in the eight river basins; the kernel functions are all Gaussian.
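
The comparison can be illustrated with a Gaussian-kernel estimate of the cumulative distribution next to one parametric alternative. This is a sketch under assumptions: the kernel CDF is taken as the average of normal CDFs centred on the observations, the exponential rate is fitted as the reciprocal of the sample mean, and the bimodal sample is hypothetical.

```python
import math


def kernel_cdf(x, sample, h):
    """Gaussian-kernel estimate of F(x): the mean of Phi((x - x_i) / h)."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sum(phi((x - xi) / h) for xi in sample) / len(sample)


def exponential_cdf(x, sample):
    """Exponential CDF with the rate fitted as 1 / sample mean."""
    rate = len(sample) / sum(sample)
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Hypothetical bimodal durations in months (illustrative only).
durations = [2, 2, 3, 3, 3, 9, 10, 10, 11]
gap_kernel = kernel_cdf(6.0, durations, h=0.5)
gap_exponential = exponential_cdf(6.0, durations)
```

Between the two clusters the kernel estimate stays flat at the share of short events, whereas the single-parameter exponential curve cannot follow that plateau.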

Fig. 5 Maps of cumulative distribution estimation for parametric and non-parametric methods

Table 2 Basic statistics of regional drought events

Joint distribution function of drought duration and severity

Taking Mainland China as an example, the frequency histogram of the marginal distributions of drought duration and severity has asymmetric tails (Fig. 6): the upper tail is high and the lower tail is low. This indicates that the variables are more strongly correlated in the upper tail and asymptotically independent in the lower tail. Therefore, the bivariate Gumbel Copula function is a suitable candidate for describing the dependence between drought duration and severity.

Fig. 6 Frequency histogram of marginal distribution of drought duration and severity

Maximum likelihood is used for parameter estimation, and the goodness of fit of five bivariate Copula functions (Gumbel, normal, t, Clayton, and Frank) is compared. For each, the squared Euclidean distance to the empirical Copula function is calculated. The distance is 0.40 for the normal Copula (linear correlation parameter 0.95), 0.22 for the t-Copula (linear correlation parameter 0.95, 18 degrees of freedom), 8.91 for the Clayton Copula (parameter 4.12), 0.40 for the Frank Copula (parameter 18.50), and 0.15 for the Gumbel Copula (parameter 4.98). By the squared Euclidean distance criterion, the bivariate Gumbel Copula function therefore best fits the dependence structure of drought duration and severity. This result is consistent with the one suggested by the frequency histogram of the marginal distributions of drought duration and severity.
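
The distance criterion can be sketched as follows. The exact formula is not written out in the text, so this example assumes the common definition: the empirical Copula is evaluated at each observation, and the squared differences from the candidate Copula at the same points are summed. All names and data are illustrative.

```python
def empirical_copula_points(xs, ys):
    """Return (u_i, v_i, C_n(u_i, v_i)) for each observation.

    u_i and v_i are the empirical marginal probabilities, and C_n is the
    fraction of observations not exceeding (x_i, y_i) in both coordinates.
    """
    n = len(xs)
    points = []
    for xi, yi in zip(xs, ys):
        u = sum(1 for x in xs if x <= xi) / n
        v = sum(1 for y in ys if y <= yi) / n
        c = sum(1 for x, y in zip(xs, ys) if x <= xi and y <= yi) / n
        points.append((u, v, c))
    return points


def squared_distance(points, copula):
    """Sum of squared differences between the empirical and candidate copulas."""
    return sum((c - copula(u, v)) ** 2 for u, v, c in points)


# Hypothetical, perfectly dependent duration and severity pairs.
durations = [1, 2, 3, 4, 5, 6]
severities = [2.0, 3.5, 5.0, 8.0, 9.5, 12.0]
points = empirical_copula_points(durations, severities)
d_independent = squared_distance(points, lambda u, v: u * v)
d_comonotone = squared_distance(points, lambda u, v: min(u, v))
```

A candidate is passed as any two-argument function of (u, v), so a fitted Gumbel, Clayton, or Frank Copula can be compared by wrapping it with its estimated parameter; the smallest distance marks the best fit.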

The Gumbel Copula function of drought duration and severity has the smallest squared Euclidean distance in all basins except the Yangtze and Songliao River basins, where the t-Copula function has the smallest distance (Fig. 7). However, the difference between the t-Copula and the Gumbel Copula in these two basins is small. Therefore, for computational convenience, the Gumbel Copula function is adopted as the joint distribution function of drought duration and severity in all eight basins. The relevant parameters are shown in Table 3.

Fig. 7 Density and distribution function chart of the bivariate Gumbel Copula (α = 4.98)

Table 3 Parameters and squared Euclidean distance of Copula functions

Return period of drought duration and severity

The univariate return periods of the marginal distributions of drought duration and severity lie between \( {T}_o \) and \( {T}_a \), so the two joint return periods can be regarded as the two extreme cases of the marginal return periods (Fig. 8). Take Mainland China as an example: for a drought duration of 5 months and a drought severity of 10 mm, the marginal return periods of drought duration and severity are 51 and 52 months, and the joint return periods \( {T}_o \) and \( {T}_a \) are 46 and 58 months, respectively. Under the same conditions, the joint return period is largest in the Yellow River and Songliao River basins, where \( {T}_a \) is 69 months, and smallest in the southwestern region, where \( {T}_a \) is 42 months (Table 4).

Fig. 8 Various return periods for drought events given a duration of 5 months. Red symbols denote the return period of the marginal distribution of drought severity; blue symbols denote the return period \( {T}_o \) of the joint distribution of drought duration and severity; green symbols denote the return period \( {T}_a \) of the joint distribution

Table 4 Return periods of the marginal distribution corresponding to the joint distribution of 5 months

For a given drought severity, the joint return period \( {T}_a \) increases continuously as the drought duration increases, and its end part gradually approaches the return period of the marginal distribution of drought severity (Fig. 9).

Fig. 9 Joint return period for drought events given various durations. Black, red, blue, purple, and green symbols denote the joint return periods for drought durations of 3, 5, 7, 10, and 12 months, respectively; the abscissa is the drought severity and the ordinate is the return period

Drought risk analysis

A drought risk map at the basin scale reflects the spatial variation of drought risk better than one at the scale of Mainland China (Fig. 10). To compare the spatial characteristics of drought risk, this paper builds a separate drought risk identification model for each sub-basin. The results are more precise than those for the whole basin, which indicates that this scale is more suitable for drought risk assessment. Figure 11 shows the drought risk maps at the basin scale. Under the extreme drought situation, the basins with a high risk of drought in the next 50 years are the Haihe, Huaihe, and Songliao basins and the rivers in northwest China. In each river basin, the drought risk increases as the forecast period lengthens.

Fig. 10 Probability that drought duration will exceed 5 months and drought severity will exceed 10 mm at least once during the next 5 years, calculated for a China and b basins

Fig. 11 Drought risk maps at the extreme drought situation in the next a 5, b 10, c 20, and d 50 years

Conclusions

In this paper, we calculate the return period and the drought risk in China based on monthly PDSI data at 188 stations from 1901 to 2010. We use the theory of runs to identify drought duration and severity, kernel density estimation to obtain their marginal distribution functions, and the Gumbel Copula function to obtain their joint distribution function. To determine the marginal distribution functions, we compare the influence of the bandwidth and of the kernel function on the kernel estimate, as well as the fit of kernel and parametric estimation. To determine the joint distribution function, we compare the fit of the bivariate Gumbel, normal, t, Clayton, and Frank Copula functions, which are frequently used in hydrology and meteorology. In calculating the joint return period of drought duration and severity, we compare the return periods of the univariate marginal distributions with those of the bivariate joint distribution. Finally, we compare the spatial differences in drought risk calculated for Mainland China as a whole and for individual river basins.

For the eight major river basins of China, the bandwidths for drought duration are between 0.66 and 1.09, and the bandwidths for drought severity are between 1.45 and 2.43. The kernel functions are all Gaussian. The joint distribution functions of the eight major river basins are all Gumbel Copulas, with parameters between 4.79 and 6.09. The two return periods of the bivariate joint distribution can be treated as the two extreme cases of the return periods of the univariate marginal distributions. For a given drought severity, the joint return period \( {T}_a \) increases continuously as the drought duration increases, and its end part gradually approaches the return period of the marginal distribution of drought severity. A drought risk map at the river basin scale reflects the spatial variation of drought risk better than one at the scale of Mainland China. Under the extreme drought situation, the basins with a high risk of drought in the next 50 years are the Haihe, Huaihe, and Songliao basins and the rivers in northwest China. In each river basin, the drought risk increases as the forecast period lengthens. These results can provide basic support for coping with drought.