1 Introduction

Drought analysis is essential for a better assessment of drought impacts on socioeconomic and environmental systems, as well as for an adequate planning and implementation of effective mitigation measures.

At-site drought analysis, which is based on hydrometeorological time series observed at a given location, comprises various tools spanning from computation of drought indices for monitoring purposes (Bonaccorso et al. 2012 and references therein) to stochastic characterization of droughts aiming at deriving the marginal and joint probability distributions of drought characteristics (Zelenhasic and Salvai 1987; Kendall and Dracup 1992; Shiau and Shen 2001; Bonaccorso et al. 2003b, 2013; Cancelliere and Salas 2004, 2010; Mohan and Sahoo 2008). While at-site analysis can provide useful information on drought occurrences in a limited area, regional analysis enables to identify droughts that affect significantly a large region by considering, besides drought duration and severity, also drought areal extent.

One of the simplest regional drought analysis consists in spatially interpolating drought-related meteorological variables or drought indices computed at-site, in order to describe spatial variability of drought events (e.g. the rainfall depth at the time interval i, the deviation of the total rainfall from the corresponding long term mean, the Standardizad Precipitation Index, the Palmer Drought Index, etc.). In this case, the areal extent of an historical drought can be drawn by plotting a drought descriptor versus the corresponding percentage areal extent.

A slightly more sophisticated analysis may consist in deriving the relationship between a drought descriptor of selected probability of occurrence (or return period) and the corresponding percentage areal extent. Among the latter category, drought severity-area-frequency (SAF) curves have been proposed for drought assessment in a region.

Once that droughts have been locally identified, the approach generally adopted to derive SAF curves consists of the following steps (Kim et al. 2002; Loukas and Vasiliades 2004; Mishra and Desai 2005; Mishra and Singh 2008): 1) estimate a measure of drought severity (e.g. sum of negative precipitation runs in a dry spell, sum of negative SPI values in a dry spell, etc.) associated with different areal extents (in terms of percentage area) by considering different areal thresholds; 2) determine the probability distribution that best fits drought severity series for different areal extents; 3) perform frequency analysis in order to associate drought severity with different return periods; 4) draw SAF curves for the region under consideration. Alternatively, the probability distribution of areal extent for a fixed threshold of the index, measuring drought severity, can also be derived to characterize SAF curves.

Depending on the available observational records and the applied drought identification method, there might be few relevant drought events for a robust regional drought frequency analysis oriented to SAF curves derivation. To this end, some authors have proposed Monte Carlo simulation methods to synthetically extend the historical records (Henriques and Santos 1999; Hisdal and Tallaksen 2003; Burke and Brown 2010).

As an alternative to the traditional inferential approach, probabilistic characterization of regional droughts can be performed by analytical derivation or approximation of the probability distributions of regional drought characteristics. To this end, Santos (1983) analyzed the probabilistic features of regional drought characteristics, assuming precipitation series time independent and distributed according to a multivariate normal. In particular, based on the statistics of the underlying precipitation, the author derived the moments and approximate pdf’s of regional drought areal coverage, duration and intensity, assumed normally distributed as well.

Cancelliere (2011) used the moments derived by Santos (1983) to compute the parameters of probability distributions of regional drought characteristics, properly chosen in order to take into account that some variables are too skewed and/or bounded to be assumed as normally distributed.

Principal Component Analysis (PCA) can be also applied to study spatial variability of drought events over a region (Lloyd-Hughes and Saunders 2002; Bonaccorso et al. 2003a; Vicente-Serrano 2006; Bordi et al. 2009; Santos et al. 2010; Raziei et al. 2011, 2013).

In this study an analytical approach to probabilistically characterize the relationship between drought severity and areal extent, namely drought SAF curves, is proposed, following the results from Santos (1983) and Cancelliere (2011). The developed procedure enables to characterize a given drought event in a region, by comparing the corresponding drought severity-areal extent relationship with theoretical SAF curves corresponding to different non-exceedance probabilities.

Hereinafter, drought severity is expressed in terms of Standardized Precipitation Index (SPI) (McKee et al. 1993), and drought areal extent is the proportion of the total area of the region under investigation where the SPI values are below a fixed threshold. In practice, different thresholds are selected ranging between the SPI classification values commonly applied to categorize, besides dry conditions, also wet conditions (see Table 1).

Table 1 Weather classification based on the Standardized Precipitation Index (Hayes et al. 1999)

The proposed methodology is applied to investigate spatio-temporal features of meteorological drought occurrences over Sicily, Italy, for the period 1921–2005, and the validity of the approach is verified by comparing the observed areal extent for given SPI thresholds, with the corresponding quantiles computed by means of the derived analytical approximations.

2 Probabilistic Characterization of Drought Areal Extent

In order to probabilistically characterize drought areal extent in a region, we will consider that in the area of interest there are m precipitation stations. Each station k can be characterized by the corresponding area of influence s k , computed for instance by the Thiessen’s polygons method, which can also be expressed as a fraction a k of the total area. It may be worthwhile to note that the assumption of m stations can be easily extended to the case when precipitation information is available at gridded locations.

Let X k,t be the monthly precipitation observed at station k and at time t. For each station k, the SPI series can be computed based on X k,t , t = 1, 2, … n by means of an equi-probability transformation of aggregated monthly precipitation values into standard normal values, with the aggregation time scale generally fixed according to the purpose of the analysis. In practice, computation of the SPI index requires: i) fitting a probability distribution to aggregated monthly precipitation series (e.g. 3-month, 6-month, 12-month, … aggregation time scale), ii) computing for each value the non-exceedance probability and iii) determining the corresponding standard normal quantile, which is the SPI value. As a consequence of the index computation procedure, the resulting SPI series will be distributed as a standard normal random variable. Negative values of the index will broadly define drought conditions, while positive values will identify wet ones. Also, different threshold values can be introduced, thus characterizing in some details drought or wet conditions. For further details on SPI computation, the readers may, for instance, refer to Edwards and McKee (1997).

With reference to each station k, let Z k,t be the SPI value at time t and z 0 be a SPI threshold value.

Then the following indicator variable can be defined:

$$ \begin{array}{l}{I}_{k,t}=0\ \mathrm{if}\ {Z}_{k,t}>{z}_0\\ {}{I}_{k,t}=1\ \mathrm{if}\ {Z}_{k,t}\le {z}_0\end{array} $$
(1)

where I k,t is an indicator variable with probability p k  = P[Z k,t  ≤ z 0 ] of being 1 and 1– p k of being 0.

Thus, I k,t =1 indicates that at time t at site k, the SPI value Z k,t is below a fixed threshold z 0 .

Let Ad t be the proportion of the total area of the region under investigation where Z k,t  ≤ z 0 in a given interval t, defined as:

$$ A{d}_t\left({z}_0\right)={\displaystyle \sum_{k=1}^m{a}_k}\cdot {I}_{k,t} $$
(2)

Following a similar line of reasoning adopted by Santos (1983), the first two moments of Ad t can be expressed as:

$$ \varepsilon \left[A{d}_t\right]=\varepsilon \left[{\displaystyle \sum_{k=1}^m{a}_k}\cdot {I}_{k,t}\right]={\displaystyle \sum_{k=1}^m{a}_k}\cdot {p}_k $$
(3)

and

$$ \mathrm{Var}\left[A{d}_t\right]={\displaystyle \sum_{k=1}^m{a}_k^2}\cdot {p}_k\cdot \left(1-{p}_k\right)+2{\displaystyle \sum_{k=1}^m{\displaystyle \sum_{j=k+1}^m{a}_k}}\cdot {a}_j\cdot \left({p}_{k,j}-{p}_k{p}_j\right) $$
(4)

where p k,j  = P[z k,t  ≤ z 0, z j,t  ≤ z 0].

Note that, since by definition SPI is distributed according to a standard normal, it is p k  = Φ(z o ), where Φ(.) is the standard normal cumulative density function. Then the above equations can be rewritten as:

$$ \varepsilon \left[A{d}_t\right]=\varepsilon \left[{\displaystyle \sum_{k=1}^m{a}_k{I}_{k,t}}\right]={\displaystyle \sum_{k=1}^m{a}_k{p}_k}={\displaystyle \sum_{k=1}^m{a}_k\cdot \varPhi}\left({z}_0\right)=\varPhi \left({z}_0\right) $$
(5)
$$ \mathrm{Var}\left[A{d}_t\right]={\displaystyle \sum_{k=1}^m{a}_k^2}\varPhi \left({z}_0\right)\left[1-\varPhi \left({z}_0\right)\right]+2{\displaystyle \sum_{k=1}^m{\displaystyle \sum_{j=k+1}^m{a}_k{a}_j\left[{p}_{k,j}-{\varPhi}^2\left({z}_0\right)\right]}} $$
(6)

Inspection of Eqs. (3), (4) and (5), (6) enables to draw some general remarks about the features of Ad t in relation to the characteristics of the underlying SPI field and in particular of the spatial correlation between SPI at different stations. More specifically, Eq. (5) reveals that the expected value of Ad t is a function of z 0 only, and therefore clearly it does not depend on the spatial correlation between stations. On the other hand the variance of Ad t is function of p k,j , which in turn will depend on the spatial correlation. In particular, the term [p k,j – Φ2(z0)] in Eq. (6) will be zero in the case of independence between precipitation observations in stations k and j, while it will be greater than zero in the case of positive dependence in the precipitation (or SPI) field.

As an example, if z 0  = 0, the Sheppard’s solution (Sheppard 1899) can be applied for p k,j , namely:

$$ {p}_{k,j}=P\left[{Z}_{k,t}\le 0,{Z}_{j,t}\le 0\right]=\frac{1}{4}+\frac{1}{2\pi } si{n}^{-1}\left({r}_{k,j}\right) $$
(7)

As it can be observed from Fig. 1, where the term [p k,j – Φ2 (z0)] is plotted versus the spatial correlation coefficient r k,j for the case z 0  = 0, the former is always greater than or equal to zero.

Fig. 1
figure 1

[p k,j – Φ2(z0)] versus the spatial correlation rk,j for z0 = 0 (see Eq. (7))

As a consequence, it can be concluded that the stronger the spatial dependence of precipitation (and hence of SPI) in a region, the larger the variance of Ad t .

The joint probability p k,j can be computed once that the joint distribution of (Z k,t ,Z j,t ) is known. Given that the Z k,t are by definition marginally distributed according to a standard normal, as a first approximation it is reasonable to assume Z k,t and Z j,t bivariate normal with zero vector mean \( \widehat{\boldsymbol{\upmu}} \) and variance-covariance matrix Σ:

$$ \boldsymbol{\Sigma} =\left(\begin{array}{cc}\hfill 1\hfill & \hfill {\gamma}_{k,j}\hfill \\ {}\hfill {\gamma}_{j,k}\hfill & \hfill 1\hfill \end{array}\right) $$
(8)

where γ k,j  = Cov(Z k,t , Z j,t ).

With the exception of a few simple cases, finding the exact distribution of Ad t is a difficult task, as it entails derivation of the multivariate distribution of dependent Bernoulli random variables rescaled with respect to the corresponding influence areas a k (Santos 1983; Cancelliere 2011). For instance, assuming all the influence areas equal (a k  = a j for every k, j) and neglecting spatial correlation between precipitation at the different stations, it can be shown that the random variable m · Ad t is distributed according to a binomial distribution with parameters p k and m (Mood et al. 1974).

For the more general case, since Ad t is bounded between 0 and 1, a beta distribution can be adopted as an approximation (Johnson et al. 1994; Sheffield et al. 2004; Cancelliere 2011):

$$ {f}_{A{d}_t}(a)=\frac{1}{B\left(\delta, \xi \right)}\cdot {a}^{\delta -1}{\left(1-a\right)}^{\xi -1}\left(0\le a\le 1\right) $$
(9)

where B (δ,ξ) is the complete beta function ∫  10 w δ − 1(1 − w)ξ − 1dw.

The parameters δ, ξ can be estimated as a function of the first two moments μ A  = ε[Ad t] and σ 2 A  = Var[Ad t ] given by Eqs. (3) and (4) as (Johnson et al. 1994):

$$ \delta ={\mu}_A^2\cdot \left(\frac{1-{\mu}_A}{\sigma_A^2}\right)-{\mu}_A $$
(10)
$$ \xi =\frac{\left(1-{\mu}_A\right)\cdot \delta }{\mu_A} $$
(11)

Therefore, once that the first two moments of areal extent are known, the beta probability distribution is defined as well.

Finally, SAF curves for a fixed probability and different threshold values z 0, can be derived by using the inverse of the cumulative distribution function of the beta distribution (namely the integral of Eq. (9)). In practice, for fixed non-exceedance probability q, the corresponding areal extent Ad(q), will be given by the solution of the following equation:

$$ q={\displaystyle \underset{0}{\overset{ Ad(q)}{\int }}\frac{1}{B\left(\delta, \xi \right)}{a}^{\delta -1}{\left(1-a\right)}^{\xi -1}\mathrm{d}a} $$
(12)

where the parameters δ, ξ are implicitly function of z 0 through Eqs. (5), (6), (10) and (11). Note that from a computational point of view, the solution of Eq. (12) can be easily found by means of standard numerical methods.

3 Probabilistic Characterization of Drought in Sicily

3.1 Case Study

The developed methodology has been applied to characterize probabilistically drought areal extent in Sicily island, Italy. Sicily is one of the largest islands in the Mediterranean sea with a surface of approx. 25,000 km2. The climate of the island is semiarid, with mean annual precipitation around 700 mm and high intra-annual variability from year to year.

Available data include a dataset of monthly precipitation collected by the Water Observatory of Sicily Region at 105 rainfall stations over the period 1921–2005 (85 years), where occasionally missing-values have been estimated by regression techniques. Such data are submitted to a manual inspection for internal consistency and temporal coherency by the Water Observatory, before publication (Sciuto et al. 2009). Corresponding monthly SPI series have been computed for each station considering a 12-month aggregation time scale (SPI-12), with view to a possible application of the proposed methodology to other drought indices, such as the Palmer index, and to a comparison of related results. Indeed, Lloyd-Hughes and Saunders (2002) have demonstrated that SPI-12 exhibits a close correspondence to the Palmer Drought Severity Index in studying drought climatology for Europe. Also, a good correspondence has been observed between SPI-12 and the Palmer Hydrological Drought Index (PHDI) in Sicily by Rossi et al. (2009). Besides, a different SPI aggregation time scale than the one here considered, can be chosen according to the purpose of the study.

Percentages of influence area a k of each station with respect to the whole regional area were determined by means of Thiessen polygons. Figure 2 shows the location of the investigated stations, as well as the related influence polygons.

Fig. 2
figure 2

Location of investigated precipitation stations in Sicily island and related Thiessen polygons

Then for each month, observed Severity-Area curves have been computed by considering the percentage of the total area Ad t (z 0 ) where observed SPI is below a given z 0 value, namely by applying Eq. (2).

As an example, with reference to an aggregation time scale of 12 months, Fig. 3 shows three of such curves, corresponding to three different months and years. Inspection of the plot reveals the different conditions that occurred in these 3 months with respect to drought. In particular, the curve related to April 2002 indicates that a high percentage of the total area was affected by dry conditions (z 0  ≤ 0) and in particular that almost 80 % of the total area was affected by SPI values below −1.5. On the other hand, the curve related to December 1976 clearly indicates the relatively wet conditions observed over the island, with no areas with negative SPI values and only 15 % of the area with SPI less than 1. The third curve is related to April 1966 and reveals a generally normal condition with respect to drought, where most of the area (approx. 90 %) exhibit SPI values between −1 and 1.

Fig. 3
figure 3

Severity-Area curves related to three relevant months

In order to characterize probabilistically the above mentioned SAF curves, the procedure outlined in Section 2 has been applied. More specifically, capitalizing on the fact that the SPI series are by definition zero mean, unit variance, the sample cross-covariances \( {\widehat{\gamma}}_{k,j} \) of the observed SPI values at the 105 stations have been computed as:

$$ {\widehat{\gamma}}_{k,j}={\displaystyle \sum_{t=1}^m{z}_{k,t}{z}_{j,t}} $$
(13)

By fixing the z 0 value, the expected value and variance of the percentage of the area Ad t (z 0 ), where Z t,k  ≤ z 0 , have been computed by means of Eqs. (5) and (6), assuming the underlying SPI values distributed according to a multivariate normal.

Then the parameters δ, ξ of the distribution of Ad t (z 0 ) (assumed as beta) have been computed by means of Eqs. (10) and (11). In Fig. 4 the probability plots of the derived distributions of Ad t (z 0 ), are shown for different z 0 values. The plots reveal a generally good agreement between observed quantiles and those computed by means of the derived distributions, which is particularly remarkable considering that no distribution fitting has been carried out since the parameters of the distribution have been determined analytically as a function of the characteristics of the underlying SPI field.

Fig. 4
figure 4

Probability plots of the distribution of areal extent as a function of the threshold z0. The plots compare the observed areal extent with the corresponding quantiles computed by means of the derived analytical approximations

Once the distribution of Ad t (z 0 ) is known, quantiles Ad(q) corresponding to non-exceedance probabilities q can be computed by inverting the beta distribution. Such quantiles represent the percentage of the area with non-exceedence probability q where the SPI values are below z 0 , and therefore, the plot of Ad(q) vs. z 0 will enable to characterize probabilistically the Severity-Area curves.

In Fig. 5 the values of q in the plane [Ad(q), z 0 ] have been interpolated and represented according to a colour scale. On the same plot, the three observed SAF curves already illustrated in Fig. 3 are also shown. As the x-axis covers both negative and positive values of z 0 , the figure accounts for dry and wet conditions at the same time, thus returning an overall picture of the regional climatic behaviour and related occurrence probability for any given month and year. Clearly, if the focus is on regional drought analysis only, one should look at the half plane corresponding to negative z 0 .

Fig. 5
figure 5

Severity-Area-Frequency curves and observed Severity-Area curves for three relevant months

The plot confirms the severe drought conditions that occurred on April 2002, whose corresponding SAF curve exhibits non-exceedance probabilities in the order of 99 %. On the other hand, the curve related to December 1976 exhibits non-exceedance probabilities in the order of 1 %, thus confirming the significant wet conditions experienced over the island in that month. Similar considerations can be drawn for April 1966, thus confirming the relatively normal conditions of the island in that month.

4 Conclusions

Probabilistic characterization of areal extent of droughts occurring over a region is an important step toward a robust water resources planning. Drought assessment on a regional and multi-monthly spatial and temporal scale respectively, can also be useful to evaluate the appropriateness of inter-basin water transfers as a drought mitigation measure.

In this study, an analytical methodology to probabilistically characterize the relationship between meteorological drought severity (computed in terms of Standardized Precipitation Index, SPI) and areal extent, expressed as Drought severity-Area-Frequency (SAF) curves, has been proposed. In particular, approximate analytical expressions of SAF curves, describing the proportion of the total area of the study region where SPI values are below a fixed threshold as a function of non-exceedance probabilities, have been derived and validated with reference to drought occurrences over Sicily, Italy, for the period 1921–2005.

The derived analytical expressions enable to draw some general conclusions about the effect of spatial dependence of SPI fields on the moments of areal extent. In particular, it has been shown that the stronger the spatial dependence, the higher the variance of areal extent of SPI values below a given threshold will be. Clearly, the possibility to take into account the spatial dependence of the considered drought severity variable represents a significant advancement of the proposed methodology with respect to the inferential procedure traditionally applied to derive SAF curves.

Validation of the procedure with reference to observed droughts and wet periods in Sicily, has confirmed the validity of the proposed analytical approach, that enables to derive the distributional properties of areal extent of drought, on the basis of the stochastic structure of the underlying SPI series.

Finally it may be worthwhile to highlight the fact that although the methodology has been developed with reference to SPI, it can be easily extended to other drought indices such as PDSI (Palmer 1965) or RDI (Tsakiris et al. 2007), in order to analyze and monitor the spatial features of these indices.