1 Introduction

Prolonged drought leading to vegetation destruction and loss in biological and economic productivity of the dry land is cited by WMO (2006) as one of the causes of desertification. Given the ever-increasing threats of this phenomenon in many areas of the world as a result of climate change, scientists have invested in the quantification and categorization of drought for forecasting purposes. Despite these efforts on the part of the scientific community, one of the remaining challenges is to develop and improve efficient drought tools.

There are several indices developed for the quantification of drought, for instance the Palmer drought severity index (PDSI) (Palmer 1965), the standardized precipitation index (SPI) (McKee et al. 1993), the rainfall anomaly index (RAI), Bhalme and Mooley drought index (BMDI) and the Palmer drought index (PDI) (Oladipio 1985), the Z-score or standardized rainfall anomalies (Jones and Hulme 1996), the standardized precipitation evapotranspiration index (SPEI) (Sergio et al. 2008) and the precipitation anomaly index (PAI) (Zhang et al. 2013). Many other indices can be found in Amin et al. (2011). Several studies comparing results of these indices exist in the scientific literature (Oladipio 1985; Guttman 1998; Szalai and Szinell 2000; Lloyd-Hughes and Saunders 2002; Majid et al. 2017; Gustavo et al. 2018). It appears that these indices are efficient in quantifying drought, but with performances often changing slightly from one index to another (Alley 1984; Oladipio 1985), SPI being often more robust (Okpara and Tarhule 2015) and the PDSI sometimes problematic (Oladipio 1985; Alley 1984). Of all these indices, SPI, a probability index formulated by McKee et al. (1993) presents many advantages (Hayes et al. 1999): it is the most common and applicable tool, mainly in areas where access to data other than rainfall is difficult or even impossible (e.g., many African countries). Furthermore, It is not affected by topography and is suitable for quantifying most types of droughts (Lana et al. 2001; Szalai and Szinell 2000).

SPI which is the index studied in this paper for quantifying drought, needs first an investigation on appropriate statistical function fitting precipitation distribution. The gamma function has been found to fit the precipitation distribution quite well (Sharma and Singh 2010; Alvarez et al. 2011); it provides the best model for describing monthly precipitation over most of Europe (Lloyd-Hughes and Saunders 2002). In the region of West-Africa, particularly in the Niger River Basin, Okpara and Tarhule (2016) fitted to monthly rainfall series, five different distribution functions (lognormal, exponential, log-logistic, Weibull and gamma) and the results of the case study showed the gamma-type two distributions to be the best fit over the upper Niger sub-basin. Other studies using SPI as drought indicator used the gamma distribution as the best fit only because it is commonly used (Dutra et al. 2013). Likewise, Okpara et al. (2017) also used this distribution (Gamma probability distribution type 2) to fit gauged-based monthly rainfall and derived SPI was used to investigate the potentials of SPI as standard measure for meteorological drought in West Africa. However, SPI derived from gamma distribution shows some good results. In Zambia, it successfully categorized extremely dry years (1992 and 2015), severely dry year (1995), moderately dry years (1972, 1980, 1987, 1999 and 2005) and 26 near normal years Libanda et al. (2019). Naumann et al. (2014) investigated the capability of different data sets (the ECMWF ERA-Interim reanalysis, the Tropical Rainfall Measuring Mission satellite monthly rainfall product 3B-43, the Global Precipitation Climatology Centre gridded precipitation data set, the Global Precipitation Climatology Project Global Monthly Merged Precipitation Analyses, and the Climate Prediction Center Merged Analysis of Precipitation) and drought indicators (the Standardized Precipitation Index, the Standardized Precipitation-Evaporation Index, and Soil Moisture Anomalies) to improve drought monitoring in Africa. The gamma function was chosen as distribution to fit precipitation and derive SPI. They found that for the areas affected by drought, all the drought indicators agree on the time of drought onset and recovery, and show higher uncertainties in regions with limited rain gauge data.

In many studies, the precipitation record is directly fitted to a gamma distribution before deriving SPI (Gidey et al. 2018; Mustafa and Rahman 2018; Spinoni et al. 2014; Amin et al. 2011; Lloyd-Hughes and Saunders 2002). Nevertheless, this function is not always the best-fit distribution in some areas (Guenang and Mkankam 2014) and very few studies consider this preliminary step that skipping may generate wrong SPI values.

This paper aims for providing appropriate distribution functions that better fit multi-scale aggregated precipitation at each grid point in Central Africa and investigate on the derived SPIs if using non appropriate distribution functions; it also investigates the possibility to reduce the number of time scales in a multiscalar drought study to guide users on the choice of a minimum and efficient time scales for its activity sector. Study domain, data and methodology are described in the next section. Sections 3 and 4 present the results and discussion, respectively, and the work ends with conclusions.

2 Study Domain, Data and Methodology

2.1 Study Domain

The domain of this study is delimited by the coordinates \(15^{\circ }\hbox {S}\)\(15^{\circ }\hbox {N}\) and \(5^{\circ }\hbox {E}\)\(35^{\circ }\hbox {E}\) (Fig. 1) comprising southern part of Niger, Nigeria, Chad, Sudan, Cameroon, Central African Republic (CAR), Guinea, Gabon, Congo, The Democratic Republic of Congo (DRC), Angola, Zambia, Malawi, Tanzania, Burundi, Rwanda, and Uganda. Figure 1 show that the topography is diversified with highlands, basins, plateaus and plains. Its hydrological network is also important. The northern part of the domain (\(>7^{\circ }\)N of latitude and including northern Nigeria, Chad, north Cameroon and Sudan) is subject to desert conditions and the southern part to tropical rain-forest. The intermediate zone is dominated by an equatorial climate. More information about the description of the region can be found in Fotso-Nguemo et al. (2016). This area is a center of interest for the study of drought as its plays an important role for the global climate change perspective. Hence the need to develop and improve drought quantification tools for control purposes and alerts in case of risk.

Fig. 1
figure 1

Study area and topography

This domain is located between the tropics where solar radiations are constantly high during the year (Qiang 2003). The southern part of the domain is the Congo basin known as the second largest rainforest in the world, a significant carbon sink in global climate perspective. The northern part (Sahel region) is defined as an arid region. The western part is bordered by Atlantic Ocean. The relief of the area is diversified and characterized by many high lands.

2.2 Data Used

Two datasets ranging from 1951 to 2016 were used in this study:

  • Gridded monthly precipitation data version 4.01, from the Climatic Research Unit (CRU) released the 20 September 2017 (Harris et al. 2014). They have \(0.5^{\circ } \times 0.5^{\circ }\) spatial resolution and cover the time period ranging from 1951 to 2010. These data (CRU TS 4.01) are downloadable free of charge from the CRU website. Version 3 of these data was used in a part of this region especially in Cameroon to compute SPI and results were satisfactory similar to those from observation stations (Guenang and Mkankam 2014). CRU data have also been used successfully for computing total water storage change and establishing the link with onset and retreat dates of the rainy season in the same area (Guenang et al. 2016).

  • Full monthly precipitation dataset version 8, from the Global Precipitation Climatology Centre (GPCP) which is a German contribution to the World Climate Research Program (WCRP) and to the Global Climate Observing System (GCOS) (Schneider et al. 2016, 2017, 2018). These data based on quality-controlled data from all stations in GPCC’s data base are on a regular grid with a spatial resolution of \(0.5^{\circ } \times 0.5^{\circ }\) latitude by longitude. They are optimized for best spatial coverage and used for water budget studies and available through the GPCC Product Access Page.

2.3 Methodology

First, gridded monthly precipitation time series (CRU TS 4.01) were aggregated on 1-, 3-, 6-, 9-, 12-, ..., 48-month time scales. Second, four probability distribution functions (gamma, weibull, exponential and lognormal) estimated by the method of the maximum-likelihood (ML) (Thom 1958) were tested on time series of each aggregated precipitation and the Kolmogorov–Smirnov goodness-of-fit test statistic (Frank and Massey 1951) was applied.

The Kolmogorov–Smirnov test (KS-test) is a non-parametric test used to decide if a sample comes from a population following a specific distribution. The KS-test statistic, D, for a given cumulative distribution function F and empirical cumulative distribution function \(F_n\) is defined as the maximum absolute difference between F(x) and \(F_{n}(x)\) for the same x.

$$\begin{aligned} D = \sup _{-\infty< x < \infty } |F_n(x) - F(x)|, \end{aligned}$$
(1)

where n is the number of independent and identically distributed observations x of a continuous variable, and

$$\begin{aligned} F_n(x) = \frac{1}{n}\sum _{i=1}^n I_{(-\infty ,x]}(X_i) \end{aligned}$$
(2)

According to Glivenko–Cantelli theorem (Howard 1959), D almost converges to zero for \(n \rightarrow \infty\), if empirical sample data follow the distribution F. At each grid point, the best-fit distribution function among the four used was that having the lowest KS-test statistic. Next, the best-fit function was used to calculate the cumulative distribution of data, which was finally transformed into standardized normal variate (Abramowitz and Stegun 1965) of zero mean and unity variance. This Standardized Precipitation Index quantifies precipitation deficits and value less than 0 (zero) indicates a drought condition. The drought severity is categorized into four classes shown in Table 1. More details on the method, theory and the main mathematical tools used can be found in Guttman (1999), Lloyd-Hughes and Saunders (2002) and Guenang and Mkankam (2014).

Table 1 Drought classification according to SPI ranges

Spatial SPIs using each of the four distribution functions at all grid points were computed and compared to those derived from appropriate distribution functions at each grid point in order to evaluate the impact of the choice of distribution on SPI values. A classification of drought events according to SPI ranges (Lloyd-Hughes and Saunders 2002) was adopted (Table 1). The interpretation of SPI is done on the basis of the following illustration: the 3-month SPI provides a comparison of the precipitation over a specific 3-month period with the precipitation totals from the same 3-month period for all the years included in the historical record. Similar interpretation was done for n-month SPI value, where n is the number of months used as time scale.

After, calculation of cross-correlations between multiscalar SPIs were undertaken to objectively reduce the number of time scales without losing important information. So, the total surface average of SPIs for each time scale was calculated and used for computing cross-correlation.

3 Results

3.1 Determination of Statistical Distribution Functions Fitting Various Time Scales Aggregated Precipitation

Figures 2 and 3 show the spatial distribution of statistical functions that better fit the time series of precipitation over the period 1951–2016, for several time scales ranging from 1- to 48-month, and for CRU and GPCC precipitation data, respectively.

Fig. 2
figure 2

Distribution functions fitting several time scales aggregated precipitation from CRU and for the time period 1951–2016. The time scales concerned are 1-, 3-, 6-, ..., 48-month. Four distribution functions were tested and the best fit at each grid point was represented. So four different colors were used for the four distribution functions

Fig. 3
figure 3

Distribution functions fitting several time scales aggregated precipitation from GPCC and for the time period 1951–2016. The time scales concerned are 1-, 3-, 6-, ..., 48-month. Four distribution functions were tested and the best fit at each grid point was represented. So four different colors were used for the four distribution functions

For short time scales ranging from 1- to 6-month (Fig. 2a, b, c), the four distribution functions tested are spatially well represented, but with a strong preponderance of gamma and weibull functions at 1- and 3-month, and only weibull largely predominates at 6-month, the lognormal function being poorly represented. At 9-month time scale and above, areas where precipitation time series are described by these functions decrease in the benefit of the lognormal function which below 9-month time scale was very weakly represented. For 12-month time scale and more, the lognormal function predominates except at 18- and 21-month where both gamma and weibull outperform, respectively (Fig. 2g, h). We note that between 12- and 48-month, the three functions, namely gamma, weibull and exponential are all represented at competing portions, but spatially scattered. The exponential function has the lowest spatial representation at 1-, 3- and 6-month, but none of the gridded time series precipitation follows this distribution at longer time scales. In many cases, the Republic Democratic of Congo (DRC) area globally shows preference to weibull function. GPCC precipitation data show similar results (Fig. 3), but with distribution functions somewhat more dispersed spatially from 12-month time scale.

Overall, from 1- to 6-month, the four distribution functions are well grouped by zone and the weibull function is the most extended spatially. The gamma function for 1-month time scale is preferentially located in the belt \(10^{\circ }\hbox {N}\)\(15^{\circ }\hbox {N}\) of latitude, and in the southern part of the domain. In the band \(7^{\circ }\hbox {N}\)\(10^{\circ }\hbox {N}\), and in the loop crossing the southern parts of Gabon and the DRC, a thin layer of the exponential function is revealed, surrounding a space largely dominated by the weibull function.

3.2 Study of the links between SPIs at different time scales

Tables 2 and 3 show cross-correlations between SPI values at different time scales (1-, 3-, 6-, 9-, 12-, ..., 48-month) and computed using appropriate distribution functions at each grid point. Results give a symmetric matrix and only the part upper the diagonal is shown. Correlations between two SPIs at the same time scale are obviously perfect (\(r = 1\)). Correlations between SPI from 1- to 6-month time scales and SPI at other time scales are generally too weak (\(|r| < 0.1\)) except few cases. This shows less similarity between SPI at short time scales. However, from 9 to 36 months, cross-correlations between SPIs are higher and ranged from 0.1 to 0.9. Using CRU data (Table 2), \(\hbox {SPI}_1\) and \(\hbox {SPI}_6\) are significantly correlated and one can account for about 76% of the variance of the other. \(\hbox {SPI}_6\) is also significantly correlated with \(\hbox {SPI}_{18}\) at 85%. By reasoning in a similar way for other time scales, we establish a correspondence table (Table 4) to better understand the links between SPIs at different scales. In this table, SPI ran at 9-month time scale (\(\hbox {SPI}_{9}\) in column 1) can significantly explain 81% of the variance of \(\hbox {SPI}_{21}\) (column 2), 64% for \(\hbox {SPI}_{33}\) (column 2) and 51% for \(\hbox {SPI}_{45}\) (column 2); \(\hbox {SPI}_{18}\) and \(\hbox {SPI}_{30}\) can be skipped in the benefit of \(\hbox {SPI}_{6}\)* which is already selected, because \(\hbox {SPI}_{18}\) (column 1) can be significantly explained by 85% of \(\hbox {SPI}_{6}\)* (column 2). \(\hbox {SPI}_{42}\) (column 1) can be explained at 92% by \(\hbox {SPI}_{30}\) (column 2) which can in turn be explained by 73% of \(\hbox {SPI}_{6}\)* (column 3). In the same way as previously, we can skip \(\hbox {SPI}_{39}\), \(\hbox {SPI}_{42}\), \(\hbox {SPI}_{45}\) and \(\hbox {SPI}_{48}\) in the benefit of \(\hbox {SPI}_{15}\)*, \(\hbox {SPI}_{6}\)*, \(\hbox {SPI}_{9}\)* and \(\hbox {SPI}_{12}\)*, respectively. In addition, spatial patterns of SPIs from 18- to 48-month show resemblances to at least one from 1- to 15-month time scale (Fig. 5: (i), (m) and (q) are similar to (e), (j), (k), (l), (n), (o) and (p) are similar to (c) but with a slight overestimation of SPI absolute value). Precipitation data from GPCC gives similar results to those from CRU, but with correlation values all slightly larger, increasing the number of significant correlations (> 0.5) between time scales (from 32 to 37 significant values). The links that are added are those between 24- and 33-month, 1- and 42-month, 12- and 48-month, 24- and 45-month, and 33- and 48-month. It remains true that SPIs at these scales are represented between 1  and 15 months. The correspondence table (Table 4) established for CRU data remains valid for GPCC. Finally, SPIs from 1- to 15-month time scales can significantly explain the variances of SPIs at other time scales. Therefore, the rest of the study will be limited to time scales ranging from 1  to 15 month.

Table 2 Cross-correlation between SPIs computed with CRU precipitation data for several time scales and for total area
Table 3 Cross-correlation between SPIs computed with GPCC precipitation data for several time scales and for total area
Table 4 Correspondence between SPIs at different time scales and for CRU precipitation data

3.3 Sensitivity of SPI to statistical distribution functions

Figures 4 and 5 present multiscalar SPI means (sub-figures (a1), ..., (a6)) using appropriate distribution functions at each grid point and SPI biases ((b1), ..., (b6); (c1), ..., (c6); (d1), ..., (d6); (e1), ..., (e6)) if using one of the four distribution functions (gamma, exponential, lognormal and weibull, respectively) at all grid points. 1- to 15-month time scales aggregated monthly precipitation data from CRU and GPCC were used. The time period 01/1993–12/1993 was chosen within the decade 1990s known as one of the driest in Central Africa.

Fig. 4
figure 4

Multiscalar SPI mean from January to December 1993 ((a1), ..., (a6)) using appropriate distribution functions at each grid point and SPI biases ((b1), ..., (b6); (c1), ..., (c6); (d1), ..., (d6); (e1), ..., (e6); ) if using one of the four distribution functions (gamma, exponential, lognormal and weibull, respectively) at all grid points. Aggregated CRU monthly precipitation data from 1- to 15-month time scales were used

Fig. 5
figure 5

Multiscalar SPI mean from January to December 1993 ((a1), ..., (a6)) using appropriate distribution functions at each grid point and SPI biases ((b1), ..., (b6); (c1), ..., (c6); (d1), ..., (d6); (e1), ..., (e6); ) if using one of the four distribution functions (gamma, exponential, lognormal and weibull, respectively) at all grid points. Aggregated GPCC monthly precipitation data from 1- to 15-month time scales were used

Figure 4 ((a1), ..., (a6)) indicates in Congo Basin an area where SPIs are negative and of the order of − 1.2, showing the Moderate drought type, that widens from 1- to 6-month time scale. From 9-month, it decreases in mild drought (SPI \(> -1\)) while another similar drought pole appears north of the basin and intensifies at higher timescales. The study of SPIs trends indicates that the Congo Basin in general has recorded a significant drying that deserves attention in further investigation. The other parts of the domain generally have much lower and positive SPIs, indicating an increase in precipitation. Spatial patterns of SPIs with GPCC data are similar to that of CRU (Fig. 5 (a1), ..., (a6)) but with lower values and highest dispersion from 12-month time scale. So, GPCC underestimates droughts and spatially more disperses them as compared to CRU.

If the same distribution function is applied at all grid points, instead of the appropriate distribution functions at each grid point, the results of SPIs obtained become biased. These are the cases shown in Figs. 4 and 5 ((bi), (ci), (di), (ei), i=1, ..., 6) where the gamma, exponential, lognormal and weibull distribution functions were used, respectively. The bias is weaker at 1-month time scale. It is generally observed that the intensity and the spatial extent of biases increase as the time scale increases and is much more localized in the center of the domain, particularly where grid points have inappropriate distribution functions. The exponential function, the least suitable distribution in the domain, presents the highest biases (c1, ..., c6), whereas the most representative weibull function (d1, ..., d6) spatially has the weakest bias. These biases sometimes going up to 2 can unfortunately give rise to a type of drought rather than another.

In all, SPI is sensitive to distribution function used to fit precipitation. However, weibull followed by gamma shows best performances for computing SPI, while the exponential most of the time shows significant difference. From 12-month time scale and above, the best fit distribution function at each grid point will be benefited to expect good SPI computation and then better drought description.

4 Discussion

Several studies show that the choice of methods to calculate drought characteristics can introduce uncertainties in drought projections in the future periods (Okpara and Tarhule 2011; Mo 2008; Sheffield and Wood 2007). One of the sources of error in SPI results could come from the fact that in many computations (Lloyd-Hughes and Saunders 2002; Amin et al. 2011; Spinoni et al. 2014; Gidey et al. 2018; Mustafa and Rahman 2018), the precipitation record is directly fitted to a gamma distribution before deriving SPI, yet there are other distribution functions. In a part of the current study, we tested four distribution functions (gamma, exponential, lognormal and weibull) and chose the suitable fit at each grid point for better computing SPI. We found that the functions weibull and gamma are concurrently the best fits in most area over Central Africa. These results corroborate those previously obtained in Cameroon by Guenang and Mkankam (2014), where using both station and CRU datasets, they found that distribution function fitting precipitation changes with space and depends on the aggregated precipitation time scales. Thus the gamma distribution is not only the best-fit function in Africa contrary to Europe (Sharma and Singh 2010; Alvarez et al. 2011). Danielle et al. (2015) also fitted and tested three distributions functions, including lognormal, gamma and generalized extreme value to identify suitable parameters for the 3-month, 6-month and 12-month SPI. They found that the gamma distribution was suitable for precipitation in most area over the globe. Unfortunately afterward, the SPI calculation was done only with the best distribution statistically at all grid points, which could sometimes skew the results in areas where a different distribution function was suitable. Other investigations (Sienz et al. 2012) have noted limitations with the gamma distribution, particularly under extreme climates, and the three-parameter distributions such as the Pearson Type three have been proposed as an alternative (Guttman 1999).

There is a challenge to reduce the number of time scale in multi-scalar drought study or to find optimal time scale to be used if studying many drought indices. In the first case, considering different time scales, we found that some SPIs are highly correlated and show similar patterns. Thus the choice of time scales is important to avoid redundant drought interpretation. In the second case, Javier et al. (2018) found that 6-month time scale best reproduces observed events across all the drought indices used.

The current study has also shown that CRU TS 4.01 and GPCC datasets corroborate, but GPCC’s SPIs are wetter and slightly more dispersed at longer time scales. The version of CRU precipitation data used have been improved comparatively to version 3.10.01 or older whose a wet bias (due to a sharp decline in its number of rain-gauges for the recent decades) was underlined by Trenberth et al. (2014). This bias is due to a sharp decline in its number of rain-gauges for the recent decades. Furthermore, CRU data show high correlation with the GRACE data in Central Africa, especially south of the equator (Aiguo and Tianbao 2016). Otherwise, some investigations showed reliability of GPCC data and recommended to use them where there is poor data coverage since the 1990s (Aiguo and Tianbao 2016).

This study also noticed a trend to drying in Congo basin in particular, reinforcing the idea that the central African rain-forests have experienced a long-term drying (Wenjian et al. 2016; Spinoni et al. 2014). Such climatic condition is very alarming in Jordan according to Mustafa and Rahman (2018) who detected an increase in drought magnitude by applying both the Mann–Kendall trend test to precipitation recorded at several stations of the country and the regression test to the SPI values. Northern Ethiopia is not spared by this hazards in the sense that Gidey et al. (2018) detected a spatial and temporal increase in drought with incidences high during the past three decades. Over Africa in general, many different measures of drought all found widespread drying from 1950 to 2012 (Aiguo and Tianbao 2016). Dai (2013) tries to explain that the increase of drying over many land areas in the globe in general is due to the warming since the 1980s. It would be necessary to go deeper into this study in further investigations.

5 Conclusions

In a context where the climate is changing, it is of utmost importance to follow the dynamic of indices that can inform on the state of the climate in various regions of the world. Therefore, this study was undertaken to examine SPI calculation method and to show the sensitivity of results to the choice of distribution function used to fit precipitation data in one of the stages of SPI computation. Study domain was Central Africa and two monthly gridded precipitation datasets, both from the CRU and GPCC, were used on the time period 1951–2016. Four statistical distribution functions (gamma, weibull, exponential and lognormal) were tested to determine the best fit for multi-scale precipitation accumulation ranging from 1-, 3-, 6-, ... to 48-month. Next the multi-scale SPIs were computed and the sensitivity of results to distribution functions was examined. The study also attempted to establish relationship between SPIs computed at various timescales.

The results let appear that for short time scales (not more than 9-month), most gridded precipitation data preferably follow the weibull distribution function. Many other grid points show interest for gamma, few for lognormal and very few for exponential. Observed spatial pattern at 1-month time scale progressively shifts northward along longer time scales. Spatial distribution of functions scattered at 12-month, and no grid point follows the exponential function. Results of cross-correlations showed links between SPIs at different time scales and led to the necessity to reduce the study from 1- to 15-month. Overall, SPI values are sensitive to distribution function used to fit precipitation. The functions weibull and gamma showed lower bias in computing SPI if used at all grid points while the exponential most of the time shows significant difference. For long time scale (12-month and more), the best fit distribution function chosen at each grid point will be benefit to expect good SPI computation and then better drought description.

The SPI was recommended in 2009 as the best meteorological index (Hayes et al. 2011). One of the advantages of SPI is its ability to compute drought levels for different time scales ranging from 3, 6, 12, 24, and 48-month periods (Amin et al. 2011), but the pattern of increase in the duration and magnitude of droughts resulting from higher temperatures can not be identified by SPI, so SPEI was developed to overcome this shortcoming (Vicente-Serrano et al. 2010).