1 Introduction

Climate change is continuously changing the atmosphere with the increasing number of projected drought risks and floods in different parts of the world [1]. Natural hazards caused more than 1.6 million human losses and about 260–310$ billion in economic losses worldwide since 1990 out of which 50% of economic loss is due to droughts and floods [2]. Climate change also affects agriculture production worldwide and needs urgent strategies according to changing environment [3, 4]. High temperature increases drought conditions where dry regions become drier and wet become wetter due to evaporation, drying surface, and groundwater reduction which need climate resistance crops for increased grain production [5]. Drought diminishes agricultural production which is one of the main reasons for hunger and malnutrition that leads to food insecurity [6, 7]. Droughts do not always have devastating effects on agriculture unless it matches the growing period of crops [8].

Drought is an extreme hydrological event, and its prediction is crucial to get rid of potential future losses. Drought prediction provides a basis for more reliable risk assessment, planning, and water resource engineering [9]. It is a natural but temporarily happening phenomenon caused due to imbalance between supply and demand of water for continuously observing less than average delivery of rainfall [10]. The accurate prediction of less or no rainfall is necessary for arrangements to save lives and crops as much as possible. Drought is linked to observed rainfall and high temperature where its variability can cause climatic extremes like droughts and floods which will affect agriculture production, environmental issues, and human losses [11, 12]. Additionally, it will also affect the drought duration and severity in different regions.

Droughts are events with uncertain frequency, severity, and duration which result in decreased water resource accessibility as well as decreased carrying ability of the ecology. The drought structures have important applications in water resource planning and supply, particularly where drought occurs frequently with economic and social fatalities [13]. Significant work has been done in the field of drought assessment and prediction using probability distributions worldwide [1, 14, 15]. However, most of the studies used univariate drought analysis [16,17,18,19,20]. Drought is a multivariate relationship of several correlated variables like drought duration and severity which explain maximum drought variability [21]. Usually, the engineering structures do not fail due to the exceedance or non-exceedance of a single environmental variable but fail to the collective effect of various related environmental variables from a certain threshold level [22,23,24]. Multivariate modeling of drought analysis gives a more detailed description using copula functions [25]. In literature, probability distributions and copula functions are used for multivariate drought analysis and joint projections using drought characteristics, that is, the drought duration and drought severity [23, 25,26,27,28].

Drought is considered a multivariate statistical phenomenon with several drought characteristics like frequency, duration, and severity where these characteristics are increased in Asian countries [29]. Pakistan has strong vulnerabilities to climate change that impacts the water resource system and agriculture [30, 31]. The commonly occurring extreme environmental events like floods and droughts have adversely affected the economic growth of the country. There is increasing concern in Pakistan, about the enlarging drought frequency, duration, and severity [8]. A substantial increase in the occurrence of heatwaves is a sign of forthcoming drought and its growing severity in Pakistan [32].

The drought condition is rapidly becoming the worst disaster due to the shortage of rainfall and increased temperature in Balochistan, Pakistan. Frequently occurring drought is a major challenge to the people and government of Balochistan province. To cope with the food insecurity condition in the region, the farmers intensify their struggle for water resources other than rainfall to secure agriculture [8]. Ullah et al. [18, 33] investigated drought conditions using the reconnaissance drought index (RDI) and standardized precipitation index (SPI), respectively, which showed that Balochistan has the maximum drought threats in Pakistan. The climate history of Pakistan has some serious droughts with large durations and severe effects. For example, the drought of 1998–2002 is one of the most severe droughts in Pakistan [34]. This drought brought huge tragedies and casualties in the country specifically in the province of Balochistan. In developing countries like Pakistan, poor manufacturers and consumers are harshly affected by extreme climatic events and droughts [10]. In literature, several studies are conducted to study drought risk in Pakistan [18, 20, 33, 35,36,37,38]. However, there is no detailed study to perform bivariate drought projections using drought duration and severity in Balochistan province. Therefore, this study is conducted to work on some of the main objectives in Balochistan, Pakistan. Drought and water risk assessment is statistically measured by the numerical relationship of climate data through drought indices. Hence, the SPI is to be utilized at a 12-month time scale to extract drought characteristics of duration and severity for the selected meteorological stations in Balochistan. Bivariate homogenous regions are constructed using statistical techniques for the duration and severity variables. The copula function is used to combine both variables for joint return periods at selected years. The outcomes of this study will support water resource planning and drought risk assessment in Balochistan.

2 Study Materials and Methods

2.1 Study Area and Drought Index

Balochistan is situated in the southwestern part of Pakistan, with vast deserts and some highlands. The study area lies between the latitudes 25° to 32°N and longitudes 61° to 71°E with an area of around 347,190 km2 (Fig. 1). Balochistan is categorized by its mixed climate which varies from semiarid to hyper-arid. There are several metrological sites in Balochistan mostly having low elevations. But some of the sites have fewer records of climate data. The SPI gives more reliable results for the site if it contains at least 30 years of precipitation data [39]. Therefore, 13 sites were selected that fulfill the criterion and precipitation data were taken from the Pakistan Meteorological Department (PMD) [40]. Missing values were the major irregularities of data which were filled in by using multiple regression techniques as follows:

$${\widehat{y}}_{uv}={\beta }_{0}+{\beta }_{1}{x}_{u1}+ {\beta }_{2}{x}_{u2}+ \dots +{\beta }_{m}{x}_{um}$$
(1)

where \({\widehat{y}}_{uv}\) denotes the month of missing values while \({x}_{u1}\), \({x}_{u2}\), \({x}_{u3}\), …, \({x}_{um}\) are the rest of the months (u = 1, 2, …, n; v = 1, 2, …, m) and β0, β1, β2, …, βm are the regression coefficients to show per unit share of each month for the missing value.

Fig. 1
figure 1

Map of the included meteorological stations in Balochistan, Pakistan

According to Zargar et al. [41], more than 150 drought indices are developed and many more are developing continuously, e.g., the standardized precipitation temperature index (SPTI) [42], combined drought index (CDI) using copula functions [43], and copula-based joint drought index [44]. All the indices have some merits and demerits due to their application and the data required. The Palmer drought severity index (PDSI) [45] requires more climate data which are not easily available. The standardized precipitation evapotranspiration index (SPEI) [46] is based on precipitation and potential evapotranspiration (PET) data, whereas the PET is calculated using the Thornthwaite equation. However, PET is underestimated in arid and semiarid areas while overestimated in humid and semi-humid areas using the Thornthwaite equation [47, 48]. Similarly, the SPI [49] is the simple and easily applicable probabilistic drought index that requires only precipitation data. The index can be used to find drought characteristics at 1-, 3-, 6-, 9-, and 12-month time scales, and so on. It is expected to explain less variability due to only precipitation data,however, it is recognized by the World Meteorological Organization (WMO. Therefore, in this study, the SPI is used for a 12-month period which is more suitable for drought risk assessment and water resource planning in the country.

Yevjevich [50] defined the method of run theory to extract drought characteristics using threshold levels. In literature, different threshold levels are used, e.g., −0.5 by Liu et al. [51], −0.8 by Ganguli and Reddy [29], −0.85 by Santos et al. [52], and −1 by Goyal and Sharma [53]. Knowledge and experience are needed for the threshold level, as a small level gives a weak approximation which causes bias in the estimated returns while a large level increases variance in estimated parameters due to fewer observations [, 33, 54]. Drought starts when the SPI severity level reaches −0.85 [55]. Therefore, a −0.85 threshold level is selected to identify the drought characteristics from the SPI series.

The drought characteristics of duration and severity for 13 sites are extracted using run theory. A run is the graph of a time series of drought values and is called positive if the values of a drought index are above the truncation level of \({X}_{p}\) and negative if it is below \({X}_{p}\). Drought duration (D) is a portion of the number of consecutive months (M) of SPI time series from the start of drought to its end. It can be written as follows:

$$D=\sum\nolimits_{i=1}^MI\left({SPI}_i\leq-0.85\right),\;for\;M=1,2,\dots,T$$
(2)

where T stands for the full length of monthly data of a metrological station and \(I({SPI}_{i} \le -0.85)\) is an indicator function denoting value 1 if \({(SPI}_{i} \le -0.85)\), otherwise zero. It is to be noted that the SPI series may give several portions of consecutive months of drought durations with a minimum of 1 month that is M = 1 and a maximum of M = T for a metrological station. Similarly, drought severity (S) is the sum of drought values within each of the above portions of drought durations:

$$S=-\sum\nolimits_{i=1}^D{SPI}_i,\;for\;D=1,2,\dots,M$$
(3)

2.2 Homogeneity Measures

Discordancy Measure (\({D}_{k}\))

It calculates the value for each site of the study. It is a statistical test for the initial screening of the site’s data and finds the discordant site(s) within a region. The \({D}_{k}\) value for the ith site (i = 1, 2, …, N) in a region is obtained as follows:

$${D}_{i}=\frac{1}{3}N{\left({w}_{i}-\overline{w }\right)}^{T}{S}^{-1}({w}_{i}-\overline{w })$$
(4)

where the entries are defined as \({w}_{i}={[{t}_{2}^{(i)} {t}_{3}^{\left(i\right)} {t}_{4}^{\left(i\right)}]}^{T}\), \(\overline{w }={N}^{-1} \sum_{i=1}^{N}{w}_{i}\), and \(S=\sum_{i=1}^{N}{({w}_{i}-\overline{w })({w}_{i}-\overline{w })}^{T}\). The ith site is discordant when the \({D}_{i}\) value is greater than the level of significance proposed by Hosking and Wallis [56].

Heterogeneity Measure

Another statistical tool is the heterogeneity measure which is applied to find the homogeneity value for a region. The test uses the sample estimates of L-moment ratios found from observed data and its expected L-moment ratios, where the expected L-moment ratios are found from the Monte Carlo simulation by generating \({N}_{sim}\) equivalent homogenous regions from Kappa probability distribution with 4 parameters. The variation between the observed and expected L-moment ratios among the sites within a region is calculated in the shape of standard deviation (\({S}_{r}\), r = 1, 2, 3) for the site sample L-moment ratios weighted proportionally to its sample size. The three forms of standard deviations are calculated as follows:

$${S}_{1}={[\frac{\sum_{\mathrm{i}=1}^{N}{N}_{\mathrm{i}}{\left({t}_{2}^{i}-{t}_{2}^{R}\right)}^{2}}{\sum_{\mathrm{i}=1}^{N}{N}_{\mathrm{i}}}]}^\frac{1}{2}$$
(5)
$${S}_{2}=\frac{{[\sum_{\mathrm{i}=1}^{N}{N}_{\mathrm{i}}{({t}_{2}^{i}-{t}_{2}^{R})}^{2}+ {({\mathrm{t}}_{3}^{\mathrm{i}}- {\mathrm{t}}_{3}^{\mathrm{R}})}^{2}]}^\frac{1}{2}}{\sum_{\mathrm{i}=1}^{N}{N}_{\mathrm{i}}}$$
(6)
$${S}_{3}=\frac{{[\sum_{\mathrm{i}=1}^{\mathrm{N}}{N}_{\mathrm{i}}{({\mathrm{t}}_{3}^{\mathrm{i}} - {\mathrm{t}}_{3}^{\mathrm{R}})}^{2}+ {({\mathrm{t}}_{4}^{\mathrm{i}}- {\mathrm{t}}_{4}^{\mathrm{R}})}^{2}]}^\frac{1}{2}}{\sum_{\mathrm{i}=1}^{N}{N}_{\mathrm{i}}}$$
(7)

where N is the number of sites in the ith region, \({t}_{2}^{\left(\mathrm{i}\right)}\) is the ratios of the \({r}^{th}\) site, and \({t}_{2}^{R}\) is the regional average of ratios of all the sites of a region. The Monte Carlo simulated regions are considered to have analogous information to observed data and the heterogeneity measure is calculated as:

$$H_r=\frac{(S_r-\mu_s)}{\sigma_s},\;for\;r=1,2,3$$
(8)

where \({\mu }_{s}\) and \({\sigma }_{s}\) are the mean and standard deviation from Monte Carlo simulated regions. The \({H}_{r}\) measure changes with the change of standard deviation (\({S}_{r}\)). Hosking and Wallis [56] give three stages of \({H}_{r}\) based on its value. The region is acceptably homogenous when \({H}_{r}\)< 1, possibly heterogeneous when \(1\le {H}_{r}\)< 2, and heterogeneous when \({H}_{r}\ge 2\)

2.3 Copula Modeling

Dependence Measures

Copula models have better applications in correlated variables. The correlation between the variables can be quantitatively found using Spearman’s rho, Pearson’s correlation coefficient, and the non-parametric Kendall’s tau correlation coefficient. The Kendall tau correlation coefficient is considered more suitable in drought variables [57]. The correlation is tested under the hypothesis of no relationship. The no relationship is rejected (accepted) by comparing the p-value with the 5% level of significance. The estimate of Kendall’s tau correlation coefficient can be found for a bivariate random sample (\({d}_{1}, {s}_{1})\), \(\left({d}_{2}, {s}_{2}\right),\) …, \(({d}_{n}, {s}_{n})\) of n observations of drought durations and severities as follows:

$$\tau ={\left(\genfrac{}{}{0pt}{}{n}{2}\right)}^{-1}\sum_{1\le i<j\le n}sign\left\{\left({d}_{i}-{d}_{j}\right)\right.\left.\left({s}_{i}-{s}_{j}\right)\right\}$$
(9)

where the two pairs (\({d}_{i}, {s}_{i})\) and \(\left({d}_{j}, {s}_{j}\right)\) are called concordant when \(\left({d}_{i}-{d}_{j}\right)\left({s}_{i}-{s}_{j}\right)>0\) and discordant when \(\left({d}_{i}-{d}_{j}\right)\left({s}_{i}-{s}_{j}\right)<0\). However, the quantitative measures do not give the physical structure of the dependence between the variables. Therefore, some graphical statistical tools are discussed for the joint dependence structure of variables. The scatter plot is one of the most used tools. It shows the nature and direction of the relationship between the variables. The chi-plot is a rank-based scatter plot for detecting the dependence structure [58], 2001). It is a plot of the chi-square test of independence \({(\chi }_{i})\) and the measure of paired distances \({(\lambda }_{i})\) for \({\lambda }_{i} \epsilon (-\mathrm{1,1})\) between the observations. There will be positive dependence between the variables if the scatteredness of values \({(\chi }_{i}\), \({\lambda }_{i})\) is above the confidence interval of the chi-plot and vice versa [59]. Another measure for dependence structure is the K-plot related to Kendall tau statistics [60]. It is a plot of the relative frequency under independence \({(H}_{i})\) and the expectation of the ith value of order statistics \({(W}_{i:n})\). The variables will be independent if the values are near the line of 45° and have high positive dependence when the value of pairs \({(H}_{i}, {W}_{i:n})\) are scattered in the form of a curve over the line 45°.

Copula modeling has mainly two steps that are the probability distributions and copula functions. In the first step, the best-fitted marginal probability distributions have a key role in the estimation of reliable projections of drought duration and severity. The one- and two-parameter distributions do not capture the tail effect of extreme events properly [56, 61]. Hence, several studies used three-parameter probability distributions for the drought duration and severity [25, 26, 62]. However, the exponential, gamma, and other two-parameter distributions are efficiently used for drought duration and severity [21, 27, 63, 64]. Wang et al. [28] used one-, two-, and three-parameter probability distributions for the analysis of drought characteristics. Therefore, in this study, two-parameter gamma, logistic, and Weibull distributions and three-parameter generalized extreme value and generalized Pareto distributions are used to select the most suitable probability distributions for the variables given in Table 3.

In statistics, graphical and quantitative methods are used simultaneously for the selection of the most suitable probability distributions. A graphical method of plotting the multiple probability distributions over the histogram of data is used. A distribution is considered better if it covers the histogram properly. However, graphical methods do not always give suitable selection and should not be the only criterion for the selection of probability distributions [56]. Therefore, chi-square goodness-of-fit test is used under the statistical hypothesis testing, whether the candidate probability distribution is acceptable or not, using the following relation:

$${\chi }_{dist}^{2}=\frac{1}{N}\sum_{i=1}^{N}\left(\frac{{\left({O}_{i}-{E}_{i}\right)}^{2}}{{E}_{i}}\right)$$
(10)

where “dist” is used for the candidate probability distribution, \({O}_{i}\) is for the observed values, while \({E}_{i}\) is for the expected frequency calculated as \({E}_{i}=N*P({x}_{i})\), where N is the number of observations and \(P({x}_{i})\) is the corresponding probability from a candidate probability distribution. The selection criterion for distribution is based on comparing the chi-square test statistics value and its p-value at a 5% level of significance. A distribution is considered most suitable if it has a minimum chi-square test value with a maximum p-value.

In the second step, a copula is a relative modem phenomenon which is a type of multivariate statistical function. It was first described by Sklar [65], which diminishes the modeling of k-dimensional distribution function, F\((.)\). It is a multivariate function for modeling the dependence between sets of data without involving some complex assumptions about the marginal and joint nature of the variables [57]. Multi-dimensional climatological phenomenon is usually described by more than two dependent variables using pairwise coupling of variables through vine copula [23, 66, 67]. However, in the case of two-dimensional phenomena, single-parameter copula models are used. In this study, we use the copula function for modeling and applications of drought duration (D) and severity (S) with marginal distribution functions \({F}_{\mathrm{D}}(d)\) and \({F}_{\mathrm{S}}(s)\), respectively. Then, the bivariate Sklar’s theorem with uniform marginal distributions that is \(U\sim (\mathrm{0,1})\) will be expressed as follows:

$${F}_{DS}\left(d,s\right)=C\left\{{F}_{D}\left(d\right), {F}_{S}\left(s\right);\theta \right\},\to d,s\in \mathcal{R}$$
(11)

where \(\theta\) is the parameter of the selected copula model. The marginal distributions \({F}_{D}(d)\) and \({F}_{S}(s)\) are used to convert the D and S variables through cumulative probabilities into uniformly distributed values within the interval [0, 1]. Furthermore, if marginal distributions are continuous, then C(.) is uniquely defined [57]. C(.) is the cumulative distribution function of any family-like elliptical, Archimedean, and extreme value copulas, to measure the joint dependence between drought duration and severity variables.

The families of Archimedean and extreme value copulas mostly used copula functions for modeling joint dependence between the variables in hydrology and metrology [9, 28, 64, 68]. These families of copulas are suitable for asymmetric distributions of sets of data [69]. Several bivariate copula models of a single parameter are checked but a limited number of functions are possible to be considered in a study. Therefore, Clayton, Gumbel-Hougaard (G-H), and Galambos copulas are selected to be tested for the selection of the best-fit copula function (Table 1). The Clayton copula is related to the Archimedean family, the Galambos copula is from the extreme value family while the G-H copula is the only function that is related to both families. The inversion of Kendall’s τ method will be used to estimate the copula parameters for the selected copula models that use the association of Kendall’s tau and copula parameter \((\theta )\) [57, 70].

Table 1 Selected copula models with its range and upper tail dependence measure

In the next step, the best-fit copula model is selected using statistical measures. The scatter plot of the observed drought duration and severity along with its simulated pairs from the copula models is used to compare the dependence structure. For further analysis, the pairs of cumulative probabilities from the observed data with simulated pairs from the specific copula models are plotted to check the dependence pattern of the variables. A joint contour plot of the empirical and theoretical copula models is presented to assess the suitability of the copula models for the two variables. The empirical copula can be expressed as follows:

$${C}_{\mathrm{n}}\left({d}_{\mathrm{i}} {s}_{\mathrm{i}}\right)=\frac{1}{\mathrm{n}}\sum_{\mathrm{i}=1}^{\mathrm{n}}\mathrm{I}\left(\frac{{D}_{\mathrm{i}}}{n+1}\le {d}_{i} \frac{{S}_{\mathrm{i}}}{n+1}\le {s}_{i}\right)$$
(12)

where I(A) is the indicator function for the set A where A has 0 value if it is false and 1 value if it is true, \({d}_{i}\) and \({s}_{i}\) are uniformly distributed over [0, 1], n is the total number of observations, while \({D}_{\mathrm{i}}\) and \({S}_{\mathrm{i}}\) are the ranked ordered observations of drought durations and severities. Better graphical matching of the copulas will suggest the selection of theoretical copula models.

The numerical measurements are always considered more robust as goodness-of-fit criteria for the selection of statistical models. Statistical criteria of the Akaike information criterion (AIC) [71] and Bayesian information criterion (BIC) [72] are used to select the copula models for the regions [28, 64]. A model is considered best fitted if it has maximum absolute values for the criteria.

2.4 Tail Dependence of Copula

The measure of tail dependence of fitted copula models has an important role in hydrology. It is the asymptotic dependence of the distributions in the fitted copula model expressed as the probability of extreme events that jointly occur on any tail (upper or lower) or both. In case the data of two variables are correlated between extreme events, then copula models with strong tail dependence have particular significance [73]. A copula that fails to model tail dependence is expected to give misleading projections of extreme events and return periods [74]. In the case of drought analysis, if a fitted copula model does not capture the tail dependence within drought characteristics, it might give greater uncertainty in the estimates of drought hazard [75]. The upper tail dependence is the probability of extreme events jointly occurring in the upper right corner of the scatter plots of the ranked data, defined as follows:

$${\lambda }_{U}=\underset{u\to 1}{\mathrm{lim}}\left[\frac{1-2u+C\left(u, u\right)}{1-u}\right]$$
(13)

where u is any constant value of the standard uniform variable. Tail behavior is determined entirely by the form of copula used, not by the marginal distribution chosen.

2.5 Probability and Return Periods of Extreme Drought Events

The joint probability distribution of drought characteristics is an efficient method for predicting and managing droughts as well as water management in such circumstances [61, 76]. The bivariate cumulative probability distributions can be calculated through copula to measure the joint occurrence of drought duration and severity. The joint occurrence non-exceedance probability of the two drought characteristics would be as follows.

$$P\left(D\le {d}_{i},S\le {s}_{i}\right)={F}_{12}\left({d}_{i},{s}_{i}\right)=C({F}_{1}\left({d}_{i}\right), {F}_{2}\left({s}_{i}\right))$$
(14)

Another type is the joint occurrence exceedance probability that is calculated while exceeding the specific threshold levels using the relation:

$$P\left(D\ge {d}_{i},S\ge {s}_{i}\right)=1-{F}_{1}\left({d}_{i}\right)-{F}_{2}\left({s}_{i}\right)+C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))$$
(15)

The copula function can also be used to find conditional probabilities for specific drought duration (\({d}^{^{\prime}}\)) and severity (\({s}^{^{\prime}}\)) as follows:

$$P\left(S\le {s}_{i}|D\ge {d}^{^{\prime}}\right)=\frac{{F}_{2}\left({s}_{i}\right)-C({F}_{1}\left({d}^{^{\prime}}\right), {F}_{2}\left({s}_{i}\right))}{1-{F}_{1}\left({d}^{^{\prime}}\right)}$$
(16)
$$P\left(D\le {d}_{i}|S\ge {s}^{^{\prime}}\right)=\frac{{F}_{1}\left({d}_{i}\right)-C({F}_{1}\left({d}_{i}\right), {F}_{2}\left({s}^{^{\prime}}\right))}{1-{F}_{2}\left({s}^{^{\prime}}\right)}$$
(17)

The joint probabilities can provide some valuable information for drought management in the region. The joint exceedance probabilities over specific values for drought duration and severity can help in the water supply and demand system as drought mitigation plans [77].

The next purpose of this study is to find the risk of extreme drought events at various return periods in the future. Frequency analysis is an important approach for hydrologists and water resource engineers based on the return period using hydrologic extreme events [78]. The drought return periods are particularly important due to suitable water usage planning in drought conditions [79]. The estimation of joint return periods has an important role in drought planning and water resources management. The primary and secondary return periods are the two types of joint return periods estimated for extreme drought events. The primary joint return periods contain the \({T}_{\mathrm{OR}}\) and the \({T}_{\mathrm{AND}}\) return periods. The \({T}_{\mathrm{OR}}\) is under the condition D\({d}_{i}\) or S\({s}_{i}\) which is either drought severity or duration will exceed the specific values, while \({T}_{\mathrm{AND}}\) is calculated when D\({d}_{i}\) and S\({s}_{i}\) that is both drought severity and duration exceed the specific values [61]:

$${T}_{\mathrm{OR}}=\frac{\mathrm{E}(\mathrm{IAT})}{\mathrm{P}(\mathrm{D}\ge {d}_{i}\mathrm{ or S}\ge {s}_{i})}=\frac{\mathrm{E}(\mathrm{IAT})}{1-C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))}$$
(18)
$$\begin{aligned}{T}_{\mathrm{AND}}&=\frac{\mathrm{E}(\mathrm{IAT})}{\mathrm{P}(\mathrm{D}\ge {d}_{i}\mathrm{\, and\, S}\ge {s}_{i})}\\&=\frac{\mathrm{E}(\mathrm{IAT})}{1-{F}_{1}\left({d}_{i}\right)-{F}_{2}\left({s}_{i}\right)+C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))}\end{aligned}$$
(19)

where E(IAT) is the average interarrival time of drought events and is calculated by the ratio of the total number of years to the total number of droughts [26].

The secondary or Kendall’s tau return periods are defined by Salvadori et al. [80]. The Kendall measure \({K}_{c}\left(q\right)\) is related to the joint distribution of copula function to describe the risk level at which the joint probability for the random variables is at least q-value, at a given probability of \(q\epsilon (\mathrm{0,1})\). This type of return period is well-defined, and each group of variables corresponds to a distinct risk area within a given return period. The secondary return periods are computed as follows:

$${T}_{\mathrm{KEN}}=\frac{\mathrm{E}(\mathrm{IAT})}{P(C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))\ge q)}=\frac{\mathrm{E}(\mathrm{IAT})}{1-{K}_{c}(q)}$$
(20)

where \({K}_{c}(q)\) represents Kendall’s distribution function for the selected theoretical copula function at a \({q}^{th}\) probability value and is defined as:

$${K}_{c}\left(q\right)= P(C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))\le q)$$
(21)

However, if the selected two-dimensional copula function is the G-H copula with parameter \(\theta\), then the \({K}_{c}\left(q\right)\) is calculated as follows:

$${K}_{c}\left(q\right)=q-\frac{q*\mathrm{log}(q)}{\theta }$$
(22)

According to Chen and Guo [61], the conditional return period of droughts will be calculated from copula functions using the following relations:

$${T}_{S\left|D\ge {d}^{^{\prime}}\right.}=\frac{E(IAT)}{1-{F}_{1}({d}_{i})}*\frac{1}{1-{F}_{1}\left({d}_{i}\right)-{F}_{2}\left({s}_{i}\right)+C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))}$$
(23)
$${T}_{D\left|S\ge {s}^{^{\prime}}\right.}=\frac{E(IAT)}{1-{F}_{2}({s}_{i})}*\frac{1}{1-{F}_{1}\left({d}_{i}\right)-{F}_{2}\left({s}_{i}\right)+C({F}_{1}\left({d}_{i}\right),{F}_{2}\left({s}_{i}\right))}$$
(24)

where \({T}_{S\left|D\ge {d}^{^{\prime}}\right.}\) represents conditional return periods of drought severity S at a given duration \(D\ge {d}^{^{\prime}}\) and vice versa. The failure of water resources risk requires to study of drought events, at a given threshold level of drought duration \({d}^{^{\prime}}\) (or severity \({s}^{^{\prime}}\)).

3 Results and Discussion

3.1 Bivariate Analysis of Drought Characteristics

Droughts are usually considered a lack of precipitation in an area. The observed precipitation record shows that there is only 175 mm of average yearly precipitation in Balochistan. Due to the very less record of precipitation, Balochistan is considered an arid region [36]. Drought is effectively measured using its characteristics like drought duration and severity. Therefore, the method of run theory was used to extract the values of drought duration and severity for the included sites in the study. All the sites have very high average and maximum values of drought duration and severity which show that there is a severe condition of drought in the study region. The sites along the coastal line have minimum values of rainfall with maximum drought durations and severities.

The homogeneity of meteorological sites is necessary for a more reliable projection of droughts. Yoo et al. [81] used the method of index-flood method for the construction of homogenous regions separately for both drought duration and drought severity variables. The discordancy and heterogeneity statistical measures were calculated to check the homogeneity of 13 meteorological sites in Balochistan province. The discordancy measure was computed to check the climate data for errors and to find any discordant site(s) in the region. The \({D}_{i}\) value for the Barkhan site was 3.10 which exceeds the significance level of 2.869 for 13 sites by Hosking and Wallis [56]. Therefore, the Barkhan site was dropped from further analysis of the study. After the removal of Barkhan site, the measure was again checked which satisfied the significance level of the measure w.r.t drought duration and severity (Table 2). Finally, for the ultimate checking of homogeneity, the heterogeneity measure was used with three possible results. The results show that all three values are less than one and hence the region is homogenous for both drought duration and severity (Table 2). Thus, the region can be used for a reliable statistical assessment of the study.

Table 2 The values of discordancy and heterogeneity measures for the selected sites

The quantitative strength between drought characteristics was measured using Kendall’s tau (0.886), Spearman’s rank (0.969), and Pearson’s correlations (0.947). The high positive values of correlations mean that drought characteristics have strong and direct relations in between. The Pearson correlation coefficient is better to measure linear dependence and might not give good results when there are outliers in the data [74]. In the case of outliers, Kendall’s tau correlation coefficient will be a good choice to describe more variations [82]. The qualitative methods of scatter plot, chi-plot, and K-plot were used to show the dependence structure between drought duration and severity (Fig. 2). The scatter plot of drought values uniformly distributed between (0, 1) is shown (Fig. 2a) which shows a high positive relationship between the variables. In the chi-plot, all the values are above the specified interval of chi-square, while in K-plot, the values form a curve above the line of 45° angles (Fig. 2b, c). All these graphs show a high positive correlation between the drought characteristics of duration and severity. The high positive value of the variable shows that the copula function is more suitable for their joint relationship [83]. These quantitative and qualitative methods show a high correlation between the drought characteristics,therefore, it is more suitable to use the bivariate copula function for further statistical analysis.

Fig. 2
figure 2

Graphs for measuring the dependence between the values of drought duration and severity. a Scatter plot of the CDFs of duration and severity, b chi-plot between \(\lambda\) and \(\chi\) of duration and severity values, and c K-plot between \({W}_{i,n}\) and H of the drought duration and severity, respectively

3.2 Bivariate Copula Modeling

The selection of best-fitted probability distributions is necessary for the copula function. The best-fitted probability distributions have a significant role in the projections of hydrological events even if it does not satisfy some of the statistical assumptions [84]. The five selected probability distributions were plotted over the histograms of drought duration and severity data (Fig. 3). The histograms of both sets of data have extremely positively skewed tails and suggest that positively skewed distributions will be suitable to represent the data. The logistic distribution has about symmetric nature while GEV has a very peaked graph. The gamma, Weibull, and GPA probability distributions have a better fitting over the histograms.

Fig. 3
figure 3

Comparison of selected probability distributions for the selection of best-fit distribution of a drought severity and b drought duration, respectively

The chi-square goodness-of-fit test was used as a numerical measure under the statistical hypothesis testing, whether the candidate probability distribution is acceptable or not. The five probability distributions were estimated, and the chi-square test values were computed (Table 3). According to the selection criterion, the GPA distribution has minimum chi-square values with maximum p-values. Therefore, using graphical and numerical measurements, the GPA distribution was selected as the best-fitted probability distribution for drought risk assessment using duration and severity variables.

Table 3 Probability density functions, estimated parameters, and chi-square goodness-of-fit test values of the selected distributions for drought duration and severity

The Clayton, Gumbel, and Galambos copula models were used to show the joint relationship between drought duration and severity. For fitting a copula model, it is required to transform these two variables into uniformly distributed over the interval (0, 1). The previously selected GPA distribution was used to calculate the cumulative probabilities at each point of drought duration and severity. Copula parameters were estimated using the maximum likelihood method (Table 4). The graphical methods of scatter plots and contour plots were used to select the best-fitted copula model. The scatter plots of observed and cumulative probabilities of drought duration and severity (red dots) and their corresponding simulated values (light blue dots) were graphed along with contour plots of empirical and theoretical copulas for the three copula models (Fig. 4). According to the scatter plots, the G-H and Galambos copulas have a sharper upper tail which shows that they can capture long-term durations and extreme severities more accurately compared to Clayton copula which is more scattered. The contour plots of theoretical and empirical copulas show more suitable matching for the G-H and Galambos copulas. Hence, the G-H and Galambos copulas are more suitable choices.

Table 4 The values of an estimated parameter of copula models along with criteria of AIC, BIC, and UTD in the region
Fig. 4
figure 4

Graphical tools for selection of best-fit copula function where represents scatter plots of observed drought durations and severities, b represents scatter plots of transformed values of drought durations and severities both with their simulated values, and represents the contour plots of selected theoretical and empirical copulas

The AIC, BIC, and tail dependence were calculated as numerical criteria for the selection of the best-fitted copula model (Table 4). The AIC and BIC criteria suggest that the Clayton copula is more suitable, followed by the G-H copula, for modeling the relationship between drought duration and severity. The climate and water conditions become more severe for humans as well as ecological systems when drought duration and severity become high enough [64, 85]. It is better to use copula models with upper tail dependence for drought analysis [15, 86]. The upper tail dependence was calculated for all the modeled copulas. Clayton copula has a lack of measuring the upper tail dependence whereas G-H and Galambos copulas can measure upper tail dependence between drought duration and severity. It is of prime interest that in drought investigation the copula models with upper tail dependence have greater significance [25]. The G-H copula is appreciated which gives more drought risk information in bivariate frequency analysis [87]. Because of this restriction, the Clayton copula was dropped due to lack of upper tail dependence, and the next most suitable is that the G-H copula model was selected as best-fitted for the study area given as follows:

$$C(d,s)=\mathrm{exp}[-{\{{\left(-\mathrm{ln}\left(d\right)\right)}^{6.98}+ {\left(-\mathrm{ln}\left(s\right)\right)}^{6.98}\}}^{\nicefrac{1}{6.98}}]$$
(25)

where d and s are the CDF values using GPA distribution for both drought duration and severity, respectively.

3.3 Bivariate Drought Probabilities and Return Periods

The drought probabilities are important for planning droughts and water resources management in the regions [77]. Let us consider a drought of 12-month duration with a severity level of 5 for the region. The regional best-fit distributions were used to find the univariate probabilities as \({F}_{1}(d\le 12)\)= 0.852 and \({F}_{2}\left(s\le 5\right)=0.500\), respectively. The joint and conditional probabilities of drought duration and drought severity are determined as the most significant criteria to plan short- as well as long-term decisions for drought and water resources management [61]. The Gumbel copula was used to find the joint probabilities of drought conditions for not exceeding 12 months and the severity of 5 is \({F}_{12}\left(d\le 12,s\le 5\right)\) = 0.4999. Finally, the probability of drought duration exceeding 12 months with a severity of 5 is calculated as \({F}_{12}\left(d\ge 12,s\ge 5\right)\) = 0.148. Such joint probabilities of drought for exceeding a specific threshold level can give important information to water resource engineers for drought mitigation schemes [21].

The return period is the expected interarrival time between drought events of a specific or less magnitude [79, 88]. In highly correlated variables, the univariate frequency analysis may over- or underestimate the risk of drought events [89]. In extreme drought analysis, the drought events with longer duration and high severity are more significant due to their heavy effect on water resource projections and high risk to an ecological and agricultural system [25]. Suppose a water supply company makes plans to provide sufficient water for a drought condition exceeding 12 months and severity of 3 in the region, then the joint return period for a similar situation is 19.20 years. It means that a similar drought is expected to occur in the next 20 years approximately.

Let one drought event is considered at Kalat station with a duration of 27 months (May 1978–July 1980) and cumulative severity of 40.51. According to the given formulas, the drought event has a return period of 122.03 years for drought duration. The estimated return period for the severity is 87.55 years. The pair has larger differences in return periods of both drought characteristics. These show the significance of the used drought variable for the univariate frequency analysis of similar nature with high differences in their frequencies. Therefore, it is better to estimate drought risk in the form of its joint behavior.

The primary return periods contain two types of joint return periods (\({T}_{OR}\) and \({T}_{AND}\)) which were estimated for the drought index to study the joint behavior of drought duration and severity. To continue the above example, the estimated \({T}_{OR}\) return period for the drought event is 86.37 years, while for \({T}_{AND}\), it is 124.41 years. These joint return periods cover the univariate return periods. It means that drought with similar duration and severity is expected to occur again according to the calculated two types of return periods (years). The value of the interarrival period (IAP) for the set of sites is 3.07 years using the observed data. Consequently, the bivariate frequency analysis is a better choice to simplify the joint behavior using both types of joint return periods for 1.25, 2, 5, 10, 20, 25, 50, and 100 years (Table 5). The secondary or Kendall return periods are considered more reliable and practical to show the high-danger areas in the region. It is calculated using Eq. (24) for the non-exceedance probabilities of above selected return periods as q = 0.20, 0.50, 0.80, 0.90, 0.95, 0.96, 0.98, and 0.99. The secondary return periods provide a correct risk assessment [25, 90]. Secondary return periods always occur between \({T}_{OR}\) and \({T}_{AND}\) joint return periods (\({T}_{OR}\le {T}_{KEN}\le {T}_{AND}\)) given in Table 5. In comparison to the primary return period, the secondary return period neither overestimates nor underestimates, which may raise the cost or risk of failure.

Table 5 Calculated quantiles of drought severity and duration, joint and conditional return periods for the selected years in the Balochistan region

In bivariate frequency analysis, there may be multiple pairs of drought duration and severity for a certain return period. There will be several possible combinations of drought durations and severities which may not be simplified using univariate return periods. Therefore, contour lines would graphically best explain these joint return periods. The contour lines for the primary joint return periods of both types are constructed for 1.25, 2, 5, 10, 20, 25, 50, and 100 years (Fig. 5). The \({T}_{OR}\) return periods are more practical and called standard return periods [21, 25]. The \({T}_{OR}\) drought return periods have no bounds that can exceed any drought variable (duration or severity). The \({T}_{OR}\) return periods are always less than \({T}_{AND}\) return periods as the probability of occurrence for both variables simultaneously is less compared to occurring only one out of two variables. The joint return periods help plan water usage in drought conditions [91].

Fig. 5
figure 5

Contour plots to show the selected return periods using Gumbel copula: a for joint AND return periods and b for joint OR return periods of drought durations and severities in Balochistan province

The conditional probabilities of drought severity for the given drought durations of 15, 25, 40, 50, 75, and 90 percentiles using the observed data were graphed (Fig. 6a). The graph of conditional probabilities of severity decreases with the increase in given durations. Similarly, the graph of conditional probabilities of drought durations at given severities of the above-said percentiles shows that distribution decreases with the increase in given severities (Fig. 6b). The probabilities have a significant variation in higher percentile values of drought characteristics. These graphs will be helpful to determine the probability of one drought characteristic at a given value of other characteristics.

Fig. 6
figure 6

The graph of conditional copula probabilities at various percentile values in Balochistan province

The conditional return periods have significant importance for water resource engineers in constructing hydraulic design criteria and estimating risk [91, 92]. The conditional return periods can be found in two scenarios. First is the conditional return periods for drought severity at given threshold levels of drought duration using 15, 25, 40, 50, 75, and 90 percentiles (Fig. 7a). Higher percentile values have higher return periods. Second is the conditional return periods for drought durations at given threshold levels of drought severities using the said percentiles (Fig. 7b). The graph shows the same pattern as above. The numerical return periods of drought severity given duration and duration given severity were calculated for 1.25, 2, 5, 10, 20, 25, 50, and 100 years at only 15th percentile values to save time and space (Table 5). The drought duration given severity has larger values compared to the drought severity given duration. The conditional results are particularly important for planning drought or water resources at a specific duration or severity.

Fig. 7
figure 7

The graphs of conditional copula return periods at various percentile values in Balochistan province

The scatter plot of observed pairs of drought durations and severities was plotted in the 1st graph out of three in Fig. 4 with the simulated pairs for the copula models. The pairs of G-H copula show a positively increasing trend along the main diagonal with a bunch of dots at the lower as well as with some extreme dots. Most of the observed dots occur in the lower which means that there are drought events with smaller duration and severity and have smaller interarrival times. Similarly, the extreme dots show the chances of drought events with a larger duration, severity, and interarrival time. These quantitative and qualitative results were calculated for the Balochistan province.

Risk assessment based on severity-duration frequency curves is a standard approach used by water resource engineers for the optimum use of water and to construct infrastructure and mitigation schemes [93, 94]. Rainfall is an important variable used in the SPI and is considered the main source of water depict or drought. The prolonged drought with maximum variability and recurrently happening strong rainfall would be the key characteristics of site climatic sensitivity, whereas consistent rainfall and small droughts are the characteristics of suitable climatic settings [63].

4 Conclusion

The drought condition is rapidly escalating to the worst among other disasters in Balochistan which is suffered by nearly all droughts that occurred in Pakistan. Droughts can bitterly be tackled with proper water resource engineering. Risk assessment based on drought severity and duration frequency curves is an authentic tool used by water resource engineers for optimum and reliable planning of water and drought mitigation in the world. Therefore, this study was conducted to evaluate and find drought projections for water resources management through severity-duration frequency curves and return periods in Balochistan. The SPI was applied at a higher scale for extreme drought events in the form of duration and severity. For a more reliable drought risk assessment and water planning in the region, the homogeneity of drought duration and severity was performed through discordancy and heterogeneity measures.

The best-fitted regional distributions were selected in the form of GPA distribution for drought duration and severity events. The bivariate regional frequency analysis was performed for a more robust quantile estimate. The scatter plots, chi-plots, and K-plots along with the three types of statistical correlation coefficients were used to show that the variables are highly positively correlated. The copula function gives good results when there is a high correlation. Therefore, the G-H copula function was selected as the best-fitted copula compared to Clayton and Galambos copulas based on several qualitative and quantitative tools. Potential drought risks were estimated through drought probabilities and return periods. Three types of joint and two types of conditional return periods were estimated for the selected years. The Kendall return periods are preferred which lie between the traditional joint return periods that is \({T}_{OR}\) and \({T}_{AND}\) return periods. Most of the drought events lie along the lower return periods and have smaller interarrival times while there are several extreme drought events with larger interarrival times in the region. The conditional return periods have high return periods because drought with such a high severity (duration) and fixed duration (severity) takes a long time to occur. Contour lines were drawn to show the variability between the return periods for different combinations of drought duration and severity. The numerical values for the selected return periods were also calculated which show that drought duration and severity increase with an increase in return periods. The results illustrate the likelihood of droughts with maximum duration and severity in Balochistan. These results will be helpful for drought mitigation and water planning in the Balochistan province of Pakistan.