Introduction

Drought, like other natural phenomena closely linked to climate change, is increasingly affecting all parts of the globe (Bhaga et al. 2020). It is one of the costliest natural disasters in the world, affecting more people than other forms of disaster (Zarei et al. 2021). It is a natural hazard that begins slowly, so it is often described as a slowly evolving phenomenon (Sylla et al. 2016). As a result, its manifestations may take longer to make themselves felt (Han et al. 2019). The impacts of drought vary according to regions, needs, and disciplinary perspectives (Liu et al. 2012; Dai et al. 2020). They depend on the socio-economic environment in which the drought occurs, since each region has its own climatic characteristics (Maia et al. 2015; Gebremeskel Haile et al. 2019). According to climatologists and meteorologists, drought is the state of an environment facing a significantly long and severe lack of water, below normal levels, with negative impacts on flora, fauna and societies (Quenum et al. 2019). Although occasional droughts have always been part of the earth’s natural phenomena, higher temperatures, greater water evaporation and less vegetation cover all contribute to exacerbating the phenomenon (Ojha et al. 2021). Since drought can be analyzed and interpreted from different angles and different perceptions (Liu et al. 2018), there is no single definition of drought accepted worldwide (Wilhite and Glantz 1985). In general, it is defined according to the situation experienced from one area to another (Qin et al. 2015). Depending on the manifestations observed, droughts are classified into four types: meteorological, agricultural, hydrological and socio-economic (Wang et al. 2016). The first three types refer to deficits in precipitation, in soil moisture and in streamflow, respectively (Dai et al. 2020), while socio-economic drought refers to the insufficiency of water resources systems to meet the water demand (Zhao et al. 2019).
The current study focuses on meteorological drought which, according to Wang et al. (2016), is the starting phase for the other types of drought.

Since each region has its own climatic characteristics, drought monitoring involves many different methods because the amount of precipitation, the seasonal cycle and the nature of the precipitation vary from region to region (Wilhite et al. 2007; Park et al. 2017; Bhaga et al. 2020). This complexity in accurately describing the phenomenon has prompted researchers to define drought indices ranging from the simplest to the most complex. These indices make it possible to characterize the drought by intensity, duration, spatial extent, probability of recurrence (Spinoni et al. 2014; Huang et al. 2017), and its detection at different stages of its evolution (location, time of appearance and end) (He et al. 2018; Santé et al. 2019; Zhang and Li 2020). There are several drought indicators, the choice of which depends on the type of impact to be taken into account as part of the mechanism for monitoring and understanding changes in the vulnerability of the phenomenon (Huang et al. 2019; Bae et al. 2019). Of these different indices, we can cite the Palmer Drought Index (PDSI: Palmer (1965)), the Standardized Precipitation and Evapotranspiration Index (SPEI: Vicente-Serrano et al. (2010)), the Standardized precipitation index (SPI: McKee et al. (1993)), the rainfall anomaly index (RAI: Van Rooy (1965)) and the Reconnaissance Drought Index (RDI: Tsakiris and Vangelis (2005)). The SPI is recommended by the World Meteorological Organization as a standard for characterizing meteorological droughts (Hayes et al. 2011) because of the particular advantages: it offers good flexibility of use for multiple TSs (Hayes et al. 1999; Gidey et al. 2018; Guenang et al. 2019), it applies to all climatic regimes and has good spatial consistency which allows comparisons between different areas subject to different climates (Hayes et al. 1999; Pieper et al. 2020), its probabilistic nature places it in a historical context which is well suited to decision-making (Gebremichael et al. 2022). 
Owing to these advantages, the index has proved effective in detecting various historical drought events in many regions of the world (Dogan et al. 2012; Ndayiragije et al. 2022). Motivated by these strengths of the SPI and by the fact that it depends only on precipitation, for which data were available, we used it as the drought indicator in this study.

SPI proponents have suggested using the gamma distribution to fit cumulative precipitation in the calculation of this index, but many studies show the limitations of this distribution (Stagge et al. 2015; Touma et al. 2015; Blain et al. 2018), and researchers have indicated that the applicability of theoretical distributions to describe cumulative precipitation is inconsistent between different regions and climates (Raziei 2021). Some findings indicate that the gamma and weibull distributions are best suited to long periods (longer than 3 months) and short periods (shorter than 3 months), respectively (Stagge et al. 2015). Other studies around the world, and in particular in Africa, have found some distribution functions to be more appropriate than the gamma function, which is used by default in most cases (Angelidis et al. 2012; Guenang and Mkankam Kamga 2014; Okpara and Tarhule 2015; Pieper et al. 2020; Zhang and Li 2020). However, all these studies are limited to a small number of distribution functions, and quantifying the errors made by using inappropriate distributions remains a real challenge.

The objective of this study is to find appropriate statistical rainfall distribution models for the computation of SPI and to quantify the errors made on the SPI values if inappropriate statistical models of precipitation distribution were used beforehand for its computation. Given that the study area has a high rainfall variability, it was necessary to go further by increasing the number of distribution functions to be tested in order to increase the probability of finding the most appropriate leading to a more accurate SPI. Subsequently, the error made by using the default gamma function as is most often the case was quantified. The next section (“Materials and methods”) shows materials and methods. “Results” shows the results and “Discussion and conclusion” provides the discussion and conclusion.

Materials and methods

Study area

Located in the heart of Central Africa, between 1.40\(^{\circ }\)N and 13\(^{\circ }\)N of latitude and 8.30\(^{\circ }\)E–16.10\(^{\circ }\)E of longitude, at the bottom of the Gulf of Guinea, Cameroon covers an area of \(475,650 km^{2}\), including \(466,050 km^{2}\) of continental area and \(9,600 km^{2}\) of maritime area with a 402 km long maritime facade. The country belongs to the junction area between equatorial Africa in the south and tropical Africa to the north (Net 2019). Cameroon is characterized by an extraordinarily contrasting relief where high and low lands alternate. It is traditionally divided into five AEZs which are defined on the basis of their ecological, edaphic and climatic characteristics (Nfornkah et al. 2021). Details on geographical and climatic characteristics of these AEZs are presented in Table 1.

Table 1 Geographical and climatic characteristics of the AEZs of Cameroon (Vondou et al. 2021)
Fig. 1 Study area with the geographical location of the 24 precipitation stations (indicated by numbers) in Cameroon

Data used

Monthly precipitation data covering the period from 1951 to 2005 were obtained from the database of the National Meteorological Service of Cameroon. They come from 24 meteorological stations and have been successfully used in many studies (Penlap et al. 2004; Guenang and Mkankam Kamga 2014). The geographical positions of these stations and the topography of the domain are shown in Fig. 1. The stations on which the study was focused are distinguished from the others by a different color. The representative station of each zone was selected on the basis of the fewest missing values.

Computation of the SPI

The SPI is computed by fitting an appropriate probability density function to the frequency distribution of precipitation summed over a given TS (1, 3, 6, 9, 12, 15, 18, 21 and 24 months); the fitted distribution is then transformed into a standard normal distribution, so that the average SPI is equal to zero (Raziei 2021). The ML estimation method was used to find the optimal parameters of the distribution functions to be tested, and the K-S test was then performed to choose the best-fit distribution from the following ten functions: gamma, weibull, exponential, lognormal, gumbel, cauchy, logistic, chi-square, burr and pareto. The lowest K-S statistic determines the best-fit distribution. This distribution was afterwards used to calculate the cumulative distribution function (CDF) values of the data, which were transformed into standard normal variables and hence into SPI values. The same procedure is applied for each station and all TSs.
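The chain described above (aggregation over a TS, fitted CDF, transformation to a standard normal variable) can be sketched as follows. This is a minimal illustration and not the code used in the study: the helper names `rolling_sum`, `fit_lognormal_cdf` and `spi_from_cdf` are hypothetical, the lognormal law stands in for whichever distribution the K-S test would select, and all aggregated totals are pooled into a single fit, which is a simplifying assumption.

```python
import math
from statistics import NormalDist


def rolling_sum(monthly, k):
    """Precipitation totals over a moving window of k months."""
    return [sum(monthly[i - k + 1:i + 1]) for i in range(k - 1, len(monthly))]


def fit_lognormal_cdf(values):
    """Return the CDF of a lognormal law fitted by ML (closed-form estimates)."""
    logs = [math.log(v) for v in values]  # requires strictly positive totals
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((lg - mu) ** 2 for lg in logs) / len(logs))
    normal = NormalDist(mu, sigma)
    return lambda x: normal.cdf(math.log(x))


def spi_from_cdf(values, cdf, eps=1e-6):
    """Map CDF values to standard normal quantiles, which are the SPI values."""
    std_normal = NormalDist()
    spi = []
    for v in values:
        # Keep probabilities strictly inside (0, 1) so inv_cdf is defined.
        p = min(max(cdf(v), eps), 1.0 - eps)
        spi.append(std_normal.inv_cdf(p))
    return spi
```

With a list `monthly` of monthly totals, `spi_from_cdf(totals, fit_lognormal_cdf(totals))` with `totals = rolling_sum(monthly, 3)` would give a 3-month series whose mean is zero by construction.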

Table 2 Classification of drought according to SPI values (Awchi and Kalyana 2017)

The period covered by the SPI varies according to the type of drought under analysis and the applications envisaged (Gebremichael et al. 2022). The SPI expresses anomalies, that is, deviations of the total precipitation observed over any period from its average. High positive SPI values correspond to very wet periods, while high negative SPI values correspond to periods of extreme drought. McKee et al. (1993) use the SPI classification indicated in Table 2 to define the different drought categories.
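As an illustration of how SPI values map to drought categories, the sketch below uses the thresholds commonly attributed to McKee et al. (1993). Table 2 is not reproduced in this text, so these thresholds are an assumption and may differ in detail from the table; the function name `classify_spi` and the label for non-negative values are hypothetical.

```python
def classify_spi(spi):
    """Return a drought category for one SPI value (assumed McKee-type thresholds)."""
    if spi <= -2.0:
        return "extreme drought"
    if spi <= -1.5:
        return "severe drought"
    if spi <= -1.0:
        return "moderate drought"
    if spi < 0.0:
        return "mild drought"
    return "no drought"
```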

The ML method

The ML method makes it possible to estimate the parameters of a regression model, under the assumption that the true law of distribution of said parameters is known (Streit and Luginbuhl 1994). It consists, for a given sample, in maximizing the likelihood function (joint density function) with respect to the parameters. It seeks to find the parameter capable (with a high probability) of reproducing the true values of the sample (those actually observed), i.e. to find the most likely value of the parameter of a population starting from a given sample (Horváth 1993). Applied to a set of data, it provides values of the distribution parameter which maximize the likelihood function (Meng et al. 2014).

Let \( X_1, X_2, X_3,..,X_n \) be a random sample from a distribution F(x; \( \theta _1,\theta _2,....\theta _p \)). When they exist, the estimators obtained by the ML method are the solutions \( \hat{\theta _1},\hat{\theta _2},....\hat{\theta _p}\) of the system of p equations:

$$\begin{aligned} \frac{\partial {L(\theta _1,\theta _2,....\theta _p )}}{\partial {\theta _r}}= 0 \end{aligned}$$
(1)

with r=1,2,..,p; where the likelihood function is defined by:

$$\begin{aligned} L(\theta _1,\theta _2,....\theta _p )= \prod _{i=1}^n f(X_i,\theta _1,\theta _2,....\theta _p) \end{aligned}$$
(2)

It is often easier to maximize the logarithm of the likelihood function than the likelihood itself. Either method leads to the same maximum because the logarithmic function is a monotonically increasing function.

$$\begin{aligned} \ln { L(\theta _1,\theta _2,....\theta _p )}= \sum _{i=1}^n \ln {f(X_i,\theta _1,\theta _2,....\theta _p)} \end{aligned}$$
(3)

The ML method is considered to be a very efficient estimation method because it produces estimates with low variance. Moreover, for long series (n > 100) the results are even more satisfactory. It has the desirable properties of a good estimator: it is consistent (it tends in probability towards the true value \(\theta \)), asymptotically unbiased (the mathematical expectation of the estimator \(\hat{\theta } \) tends to the true value of the parameter \(\theta \) as n increases) and asymptotically efficient (Horváth 1993).

Law of statistical distributions used to fit data

The gamma law

Several studies have addressed the gamma law; in particular, Choi and Wette (1969) treat it in detail. The random variable X follows a gamma distribution if its probability density function (PDF) is:

$$\begin{aligned} f(x)= \frac{1}{\beta ^{\alpha }\Gamma {(\alpha )}} x^{\alpha -1}\exp {\left( -\frac{x}{\beta }\right) } \end{aligned}$$
(4)

To obtain the gamma cumulative function, we proceed as follows:

$$\begin{aligned} F(x)= \int _0^{x} f(t)\,dt=\frac{1}{\beta ^{\alpha }\Gamma {(\alpha )}} \int _0^{x} t^{\alpha -1}\exp {\left( -\frac{t}{\beta }\right) }dt \end{aligned}$$
(5)

with

$$\begin{aligned} \left\{ \begin{array}{ll} \alpha>0 &{} \text {is the shape parameter} \\ \beta >0 &{} \text {is the scale parameter} \\ \Gamma &{} \text {is the mathematical}\,\, gamma\,\,\text {function} \\ \end{array} \right. \end{aligned}$$

\(\alpha \) and \(\beta \) are obtained from the following approximation to the ML solution:

$$\begin{aligned} \left\{ \begin{array}{l} \hat{\alpha }= \frac{1}{4A}\left( 1+\sqrt{1+\frac{4A}{3}}\right) \\ \hat{\beta }= \frac{\bar{x}}{\hat{\alpha }} \\ A=\ln {(\bar{x})} - \frac{\sum {\ln {(x)}}}{n} \\ \end{array} \right. \end{aligned}$$
(6)

with \(\bar{x}\) the mean of the non-zero precipitation values and n the number of observation years. We also note that this function is not defined for \(x=0\), so the modified cumulative function takes the form:

$$\begin{aligned} H(x)= q+(1-q)F(x) \end{aligned}$$
(7)

With q the probability at each station of having zero precipitation over the entire considered period.
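Equations (5) to (7) can be sketched in a few lines of code. The version below is only an illustration under stated assumptions: the function names are hypothetical, the gamma CDF is evaluated with a plain series expansion of the regularized lower incomplete gamma function (adequate for moderate values of \(x/\beta \), not a production-grade routine), and the sample is assumed to contain at least two distinct positive values so that \(A>0\).

```python
import math


def thom_gamma_estimates(positive):
    """Shape and scale estimates of Eq. (6) from strictly positive values."""
    n = len(positive)
    mean = sum(positive) / n
    a_stat = math.log(mean) - sum(math.log(v) for v in positive) / n
    alpha = (1.0 + math.sqrt(1.0 + 4.0 * a_stat / 3.0)) / (4.0 * a_stat)
    beta = mean / alpha
    return alpha, beta


def gamma_cdf(x, alpha, beta, max_terms=10000):
    """Gamma CDF of Eq. (5) via a series for the incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    z = x / beta
    term = 1.0 / alpha
    total = term
    for k in range(1, max_terms):
        term *= z / (alpha + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(alpha * math.log(z) - z - math.lgamma(alpha))


def fit_mixed_gamma(values):
    """Return q (share of zeros) and the gamma parameters of the non-zero part."""
    positive = [v for v in values if v > 0.0]
    q = 1.0 - len(positive) / len(values)
    alpha, beta = thom_gamma_estimates(positive)
    return q, alpha, beta


def mixed_cdf(x, q, alpha, beta):
    """Modified CDF H(x) = q + (1 - q) F(x) of Eq. (7)."""
    return q + (1.0 - q) * gamma_cdf(x, alpha, beta)
```

Note that the product \(\hat{\alpha }\hat{\beta }\) returned by `thom_gamma_estimates` equals the sample mean by construction, which is a quick sanity check on any implementation of Eq. (6).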

The weibull law

The PDF of a random positive variable X distributed according to the Weibull law (Panahi and Asadi 2011) is:

$$\begin{aligned} f(x,\alpha ,\beta )=\alpha \beta x^{\alpha -1} \exp {(-\beta x^{\alpha })} \end{aligned}$$
(8)

Where \(\alpha \) and \(\beta \) are respectively the shape and scale parameters, which are obtained by the ML method seen above and are presented in detail by Wu (2002). There are no closed-form expressions for the estimates of \(\alpha \) and \(\beta \); they are therefore obtained by numerically maximizing the log-likelihood (Panahi and Asadi 2011). The complementary cumulative distribution function is a stretched exponential function, so the cumulative distribution function consistent with Eq. (8) is given explicitly by:

$$\begin{aligned} F(x)=1-\exp {(-\beta x^{\alpha })} \end{aligned}$$
(9)
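Because the Weibull estimates have no closed form, a numerical search is needed. The sketch below is one possible approach, not necessarily the one used in the study: for the parameterization of Eq. (8), setting the derivative of the log-likelihood with respect to \(\beta \) to zero gives \(\beta = n/\sum x_i^{\alpha }\), and the remaining equation in \(\alpha \) is solved by bisection. The function names are hypothetical and the sample is assumed to be strictly positive.

```python
import math
import random


def weibull_loglik(sample, alpha, beta):
    """Log-likelihood for f(x) = alpha*beta*x**(alpha-1)*exp(-beta*x**alpha)."""
    n = len(sample)
    return (n * math.log(alpha) + n * math.log(beta)
            + (alpha - 1.0) * sum(math.log(v) for v in sample)
            - beta * sum(v ** alpha for v in sample))


def weibull_mle(sample, tol=1e-10):
    """ML estimates (alpha, beta) obtained by bisection on the profile score."""
    n = len(sample)
    logs = [math.log(v) for v in sample]
    mean_log, max_log = sum(logs) / n, max(logs)
    if max_log == min(logs):
        raise ValueError("sample values must not all be equal")

    def score(alpha):
        # Weights are rescaled by the largest value to avoid overflow.
        weights = [math.exp(alpha * (lg - max_log)) for lg in logs]
        weighted = sum(w * lg for w, lg in zip(weights, logs)) / sum(weights)
        return 1.0 / alpha + mean_log - weighted

    lo, hi = 1e-6, 1.0
    while score(hi) > 0.0:  # widen the bracket until the score changes sign
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    beta = n / sum(v ** alpha for v in sample)
    return alpha, beta
```

As a usage check, a synthetic sample drawn with `random.weibullvariate(scale, shape)` should give \(\hat{\alpha }\) close to `shape` and \(\hat{\beta }\) close to `scale ** -shape`, since the standard library uses the scale-shape form of the law.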

The exponential law

A random variable X is distributed according to an exponential law if its PDF is given by:

$$\begin{aligned} f(x)=\frac{1}{\beta } \exp {\left[ -\frac{(x-\mu )}{\beta }\right] } \end{aligned}$$
(10)

with \( x\ge \mu \) and \(\beta >0 \), where \(\mu \) is the location parameter and \(\beta \) the scale parameter (Rahman and Pearson 2007). The inverse of the scale parameter is often denoted \(\lambda =\frac{1}{\beta }\) and is called the constant failure rate. The PDF of the exponential law can therefore be written:

$$\begin{aligned} f(x)=\lambda \exp {[-\lambda (x-\mu )]} \end{aligned}$$
(11)

Its distribution function is of the form:

$$\begin{aligned} F(x)=1-\exp {[-\lambda (x-\mu )]} \end{aligned}$$
(12)

The parameters \(\mu \) and \(\lambda \) are estimated from a random and independent sample. For \(\mu = 0\), the ML estimator is determined by setting the derivative of the logarithm of the likelihood function of the exponential law to zero, which leads to:

$$\begin{aligned} \hat{\lambda }=\frac{1}{\bar{x}} \end{aligned}$$
(13)

with \(\bar{x}=\frac{1}{n}\sum _{i=1}^n x_i \)
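The closed-form estimate of Eq. (13) can be checked against the log-likelihood of Eq. (3). The sketch below assumes a location parameter \(\mu = 0\), in which case \(\ln L(\lambda ) = n\ln \lambda - \lambda \sum x_i\); the function names are hypothetical.

```python
import math


def exp_loglik(sample, lam):
    """Log-likelihood of the exponential law with mu = 0 (assumption)."""
    return len(sample) * math.log(lam) - lam * sum(sample)


def exp_rate_mle(sample):
    """Closed-form ML estimate of Eq. (13): the inverse of the sample mean."""
    return len(sample) / sum(sample)
```

Evaluating `exp_loglik` at the estimate and at nearby rates shows that the estimate sits at the maximum, which is the property that Eq. (1) expresses.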

The lognormal law

A positive random variable x follows a lognormal distribution if the logarithm of the random variable is normally distributed. The PDF of a lognormal distribution is defined as (Mage and Ott 1984):

$$\begin{aligned} f(x)=\frac{1}{x\sigma \sqrt{2\pi }}\exp {[\frac{-(\ln {x}-\mu )^{2}}{2\sigma ^2}]} \end{aligned}$$
(14)

with \(x>0\), \(\sigma >0\) and \( - \infty< \mu < +\infty \)

The term \(\mu \) is the scale parameter that stretches or shrinks the distribution, and \(\sigma ^2\) is the shape parameter that affects the shape of the distribution. They can be determined by the ML estimator method as follows:

$$\begin{aligned} \left\{ \begin{array}{l} \hat{\mu }= \frac{1}{n}\sum _{i=1}^n \ln {x_i} \\ \hat{\sigma ^2}= \frac{1}{n}\sum _{i=1}^n (\ln {x_i}-\hat{\mu })^2\\ \end{array} \right. \end{aligned}$$
(15)

The gumbel law

Also called doubly exponential law or law of extreme values, a random variable X is distributed according to a Gumbel law (Cooray 2010) if its PDF is given by :

$$\begin{aligned} f(x)=\frac{1}{\beta }\exp {[-\exp {(-\frac{x-\mu }{\beta })}]}\exp {(-\frac{x-\mu }{\beta })} \end{aligned}$$
(16)

with

$$\begin{aligned} \left\{ \begin{array}{ll} - \infty< \mu < + \infty &{} \text {is the position or mode parameter} \\ \beta >0 &{} \text {is the scale parameter} \\ - \infty< x < + \infty &{} \\ \end{array} \right. \end{aligned}$$

The terms \(\mu \) and \(\beta \) are estimated using the ML method. Its cumulative distribution function is of the form:

$$\begin{aligned} F(x)=\exp {[-\exp {(-\frac{x-\mu }{\beta })}]} \end{aligned}$$
(17)

Gumbel’s law is commonly used to model the distribution of the maximum (or, with a change of sign, the minimum) of a number of samples drawn from distributions such as the normal or exponential distributions.

The cauchy law

A random variable X follows a Cauchy law, also called a Lorentz law, if its PDF depending on the two parameters \(-\infty< \mu < +\infty \) and \(\beta > 0\) (Schuster 2012) is defined by:

$$\begin{aligned} f(x)=\frac{1}{\pi }[\frac{\beta }{(x-\mu )^2+\beta ^2}] \end{aligned}$$
(18)

with \(-\infty< x < +\infty \). The particularity of this law is that it has neither expectation nor variance. The term \(\mu \) is the position parameter and the term \(\beta \) the scale parameter, that is the spread parameter. Likewise the term \(\mu \) represents both the mode and the median. These two parameters are estimated by the ML method. Its cumulative distribution function is of the form:

$$\begin{aligned} F(x)=\frac{1}{\pi }\tan ^{-1}{\left( \frac{x-\mu }{\beta }\right) }+\frac{1}{2} \end{aligned}$$
(19)

The logistic law

A random variable X follows a logistic law if its PDF is given by Pérez-Sánchez and Senent-Aparicio (2018):

$$\begin{aligned} f(x)=\frac{\exp {\left( \frac{-(x-\alpha )}{\beta }\right) }}{\beta \left( 1+\exp {\left( \frac{-(x-\alpha )}{\beta }\right) }\right) ^2} \end{aligned}$$
(20)

\(-\infty<x<+\infty \), with \(\alpha \) the location parameter and \(\beta \) the positive scale parameter. Its cumulative distribution function is given by:

$$\begin{aligned} F(x)=\frac{1}{1+\exp {(\frac{-(x-\alpha )}{\beta })}} \end{aligned}$$
(21)

The parameters \(\alpha \) and \(\beta \) are estimated by the ML method, with \(\alpha = 0\) and \(\beta = 1\) taken as starting values for the program.

The chi-square law

It is a continuous distribution with k degrees of freedom, used to describe the distribution of a sum of squares of k independent standard normal random variables (Robertson 1969). Its importance also comes from its usefulness in testing the goodness of fit of a data distribution for independent data sets (Canal 2005). A random variable X follows a chi-square distribution if its PDF is given by:

$$\begin{aligned} f(x)=\frac{1}{2^{\frac{k}{2}}\Gamma {(\frac{k}{2})}} x^{(\frac{k}{2}-1)} \exp {(-\frac{1}{2}x)} \end{aligned}$$
(22)

with \(x\ge 0\). Its cumulative distribution function is :

$$\begin{aligned} F(x)=\frac{\gamma {(\frac{k}{2},\frac{x}{2})}}{\Gamma {(\frac{k}{2})}} \end{aligned}$$
(23)

with \(\gamma {(\frac{k}{2},\frac{x}{2})}\) the lower incomplete gamma function

The burr law

The burr type XII distribution is a continuous and widely known distribution because it includes the characteristics of various well-known distributions such as for example the weibull and gamma distributions (Pérez-Sánchez and Senent-Aparicio 2018). A random variable X follows a burr or burr type XII distribution if its PDF is:

$$\begin{aligned} f(x)=\frac{\alpha \gamma }{\lambda }(\frac{x}{\lambda })^{\alpha -1} (1+(\frac{x}{\lambda })^{\alpha })^{-\gamma -1} \end{aligned}$$
(24)

with

$$\begin{aligned} \left\{ \begin{array}{ll} \alpha>0 &{} \text {is the shape parameter} \\ \gamma>0 &{} \text {is the shape parameter} \\ \lambda>0 &{} \text {is the scale parameter} \\ x >0 &{} \\ \end{array} \right. \end{aligned}$$

The estimation of these parameters with the ML method is the most common (Ghitany and Al-Awadhi 2002). Its cumulative distribution function is:

$$\begin{aligned} F(x)=1-(1+(\frac{x}{\lambda })^{\alpha })^{-\gamma } \end{aligned}$$
(25)

The pareto law

A random variable X follows a Pareto law if its PDF is defined by the relation (Pérez-Sánchez and Senent-Aparicio 2018):

$$\begin{aligned} f(x)=\frac{\alpha \beta ^{\alpha }}{x^{\alpha +1}} \end{aligned}$$
(26)

with \(\beta \le x < \infty \), \(\alpha > 0\) and \(\beta > 0\). The terms \(\alpha \) and \(\beta \) are respectively the shape and scale parameters, which are estimated by the ML method as follows:

$$\begin{aligned} \left\{ \begin{array}{l} \hat{\beta }= \min _{1\le i \le n} {x_i} \\ \hat{\alpha }= \frac{n}{\sum _{i=1}^n {\ln {x_i}}-n\ln {\hat{\beta }}}\\ \end{array} \right. \end{aligned}$$
(27)

Its cumulative distribution function is :

$$\begin{aligned} F(x)=1-(\frac{\beta }{x})^{\alpha } \end{aligned}$$
(28)
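Since the Pareto estimators of Eq. (27) are in closed form, they translate directly into code. The following is a minimal sketch with hypothetical function names; it assumes a strictly positive sample whose values are not all equal, so that the denominator in Eq. (27) is non-zero.

```python
import math


def pareto_mle(sample):
    """Closed-form ML estimates (alpha, beta) of Eq. (27)."""
    n = len(sample)
    beta = min(sample)
    alpha = n / (sum(math.log(v) for v in sample) - n * math.log(beta))
    return alpha, beta


def pareto_cdf(x, alpha, beta):
    """Pareto CDF of Eq. (28); zero below the scale parameter beta."""
    if x < beta:
        return 0.0
    return 1.0 - (beta / x) ** alpha
```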

The K-S fit test

As mentioned by Stephens (1970), this test is inspired by the statistic proposed by Kolmogorov (1933) for fitting to a distribution. It determines to what extent the data \(X_i\) (i=1, ..., n) follow a specific distribution law F(x). The K-S test is a nonparametric test that can be used to compare a sample with a reference probability distribution or to compare two samples (Mitchell 1971). The idea is to calculate the maximum difference, in absolute value, between the empirical cumulative distribution and the theoretical cumulative distribution under the null hypothesis for the running sum of the chosen TS. Under the H0 hypothesis, this difference is small and the observations are well described by the given distribution (Berger and Zhou 2014). The statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. For a specific data set, the better a law fits the data, the smaller its K-S statistic, so the best law is the one whose K-S statistic is smaller than those of the others. In other words, the smaller the D statistic, the closer the theoretical distribution is to the empirical distribution (Massey Jr 1951; Ramachandran and Tsokos 2015). For a given cumulative distribution function F(x), Stephens (1970) defined the K-S statistic by:

$$\begin{aligned} D_{n}= \sup _{x} \vert F_{n}(x)-F(x)\vert \end{aligned}$$
(29)

with \(- \infty< x < + \infty \), where the empirical distribution function \(F_{n}\), which converges uniformly to F for a sample drawn from F by the Glivenko-Cantelli theorem (DeHardt 1971), is

$$\begin{aligned} F_{n} (x)=\frac{1}{n}\sum _{i=1}^n I_{(-\infty ,x]} (X_{i}) \end{aligned}$$
(30)

where

$$\begin{aligned} \left\{ \begin{array}{ll} n &{} \text {is the number of observations in the sample} \\ F_{n} (x) &{} \text {is the empirical cumulative distribution function} \\ F(x) &{} \text {is the theoretical cumulative distribution function} \\ I_{(-\infty ,x]} (X_{i}) &{} \text {is the indicator function, equal to 1 if } X_{i}\le x \text { and 0 otherwise} \end{array}\right. \end{aligned}$$
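Equations (29) and (30) and the selection of the best fit can be sketched as follows. Because the empirical distribution function is a step function, the supremum is reached at a data point, just before or just after the jump. The function names are hypothetical, and the candidate CDFs are assumed to have been fitted beforehand and passed in as a dictionary.

```python
def ks_statistic(sample, cdf):
    """K-S distance between the empirical CDF of the sample and a given CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare F with the empirical CDF just after and just before the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def best_fit(sample, candidates):
    """Return the candidate name with the lowest K-S statistic and all scores."""
    scores = {name: ks_statistic(sample, cdf) for name, cdf in candidates.items()}
    return min(scores, key=scores.get), scores
```

For example, with the three points 0.25, 0.5 and 0.75 and the uniform CDF on the unit interval, the largest gap occurs at the first and last points and equals 0.25.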
Fig. 2 Cumulative distribution functions for 3-month aggregated precipitation, showing the empirical cumulative distribution function and the gamma, weibull, exponential, lognormal, gumbel, cauchy, logistic, chi-square, burr and pareto distributions fitted to station data from a) Poli (AEZ1), b) Ngaoundéré (AEZ2), c) Koundja (AEZ3), d) Bafia (AEZ5), e) Douala (AEZ4), f) Nkongsamba (AEZ4)

Fig. 3 Same as in Fig. 2, but for 12-month aggregated precipitation

Fig. 4 Same as in Fig. 2, but for 24-month aggregated precipitation

Results

Determination of the adequate distribution functions

Figures 2, 3 and 4 show comparative results of the CDF for historical precipitation data and for each of the ten trial distribution functions. The results are shown for six target stations of the AEZs (Poli, Ngaounderé, Koundja, Bafia, Douala and Nkongsamba) and for 3, 12 and 24 months TS. The following abbreviations were adopted for the functions: gamma (g), weibull (w), exponential (e), lognormal (ln), gumbel (gu), cauchy (c), logistic (lo), \(chi-square\) (ch), burr (bu) and pareto (p). The K-S test was applied and the results are presented in Tables 3, 4 and 5 for 3, 12 and 24 months TSs respectively.

The results show that at 3-month TS (Table 3), the logistic distribution is the best fit at four stations, namely Poli, Ngaoundéré, Bafia and Koundja. For the two other stations, Douala and Nkongsamba, burr and weibull are the best fits respectively. At 12-month TS (Table 4), data from the stations of Poli, Nkongsamba and Douala fit better with the burr distribution; Koundja shows gamma as the best fit, while data from Ngaoundéré and Bafia are better suited to the logistic distribution. At 24-month TS (Table 5), the burr distribution is the best choice at Poli, Nkongsamba and Bafia, while Ngaoundéré, Koundja and Douala show logistic, gamma and gumbel as the best fits respectively.

The results for all 24 stations and for eight TSs (3, 6, 9, 12, 15, 18, 21 and 24 months) are presented in Table 6. In general, the logistic and burr distributions are the most suitable for most stations, except at 9-month TS where weibull followed by burr outperform the others. Table 6 shows that the logistic and burr distributions are the most appropriate for short (3-month) and long (> 6-month) TSs respectively. From the statistical point of view and over all TSs, the burr function is the most representative, followed by logistic and then gamma. Table 7 summarizes the best fits for all AEZs. In a few cases, several functions fit the data best at an equal number of stations.

Analysis of computed SPIs with adequate and default gamma distributions

Time series of SPIs computed with adequate distributions

SPI time series were calculated using the best-fit distribution at each station, and the results are shown in Figs. 5, 6 and 7. The SPIs on a 3-month TS (Fig. 5) show, for each station, a high frequency of drought events ranging from mild to extreme categories. For 12-month SPI (Fig. 6), each station shows at least one extremely dry episode. Throughout the study period, the six stations differ markedly in the frequency of extreme drought periods (4 in Bafia and Poli, 3 in Nkongsamba, 2 in Ngaoundéré and Koundja, and none in Douala). For 24-month SPI (Fig. 7), the Ngaoundéré, Douala and Nkongsamba stations recorded very few droughts during the first 30 years, but drought events became more frequent in the following years. The dramatic drought events of the 1970s and 1980s are evident at each station, and the magnitude and duration of drought increased, especially from the mid-1970s.

Evaluating the shift in SPI values due to the use of gamma function instead of the appropriate functions

Figures 5, 6 and 7 show SPIs computed using both the gamma and the best-fit distributions. At 3-month TS (Fig. 5), the values of SPIs obtained with the gamma distribution (\(SPI_g\)) are in general smaller than those obtained with the best distribution (\(SPI_{bd}\)), which means that gamma leads to an underestimation of extreme humidity and an overestimation of severe and extreme drought events. For the stations of the AEZ4 (Douala and Nkongsamba) in the Littoral area, the SPIs are less sensitive to the choice of the distribution function. Similar patterns are observed at 12-month TS (Fig. 6), but the differences between the two SPIs (\(SPI_g\) and \(SPI_{bd}\)) are larger and more pronounced. As for 24-month TS (Fig. 7), results are similar to those obtained at 3 and 12 months in the AEZs 1 and 2 (Poli and Ngaoundéré respectively), while the reverse situation is observed in the AEZ4 (Douala). However, in the AEZs 4 and 5 (Nkongsamba and Bafia respectively), the gamma distribution leads to an overestimation of the intensity of both extremely humid and drought events.

Figure 8 shows the root mean square error (RMSE) between the SPI values computed from the appropriate distribution function and those computed from the gamma function for 3, 6, 9, 12, 15, 18, 21 and 24 months TSs at each station. Results are shown for the four drought categories. In general, the RMSE increases with the severity of drought (from mild to extreme drought) for each TS and in any AEZ. It also increases with TS for the AEZs 1, 2 and 5 (Poli, Ngaoundéré and Bafia respectively). For the other areas (AEZs 3 and 4), no consistent increase with TS is observed and the RMSEs are the lowest: they are sometimes equal to zero, because the most suitable function is gamma itself and thus matches the default function, and sometimes low, because the appropriate distribution function has a flexibility similar to that of the gamma function.
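The RMSE per drought category can be sketched as below. Two points are assumptions rather than details given in the text: each pair of values is assigned to a category according to the SPI from the best-fit distribution, and the category thresholds follow the classification commonly attributed to McKee et al. (1993). The function names are hypothetical.

```python
import math


def drought_category(spi):
    """Drought category of an SPI value, or None when SPI is not negative."""
    if spi <= -2.0:
        return "extreme"
    if spi <= -1.5:
        return "severe"
    if spi <= -1.0:
        return "moderate"
    if spi < 0.0:
        return "mild"
    return None


def rmse_by_category(spi_best, spi_gamma):
    """RMSE between two SPI series, grouped by the category of spi_best."""
    squared = {}
    for best, gam in zip(spi_best, spi_gamma):
        category = drought_category(best)
        if category is not None:
            squared.setdefault(category, []).append((best - gam) ** 2)
    return {cat: math.sqrt(sum(sq) / len(sq)) for cat, sq in squared.items()}
```

Categories with no event in the series are simply absent from the returned dictionary, so a zero RMSE always means that the two series agree exactly for that category.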

Discussion and conclusion

Discussion

In most studies on SPI, gamma is chosen by default as the best fit without any comparison with other distributions. It has been shown that the SPI with the commonly used gamma distribution leads to shortcomings in evaluating ensemble simulations (Pieper et al. 2020). For West Africa, Okpara and Tarhule (2015) verified that the type two gamma distribution was a better model for fitting precipitation over the Niger basin. It is therefore clear that several functions can replace the gamma distribution as the best fit at many stations and provide better SPI values. In the current study, which covers a larger number of distribution functions, we found that new functions (burr and logistic) fit the data better at some stations than in the study of Guenang and Mkankam Kamga (2014), where only four functions (gamma, weibull, lognormal and exponential) were tested. The choice of the appropriate distribution function therefore depends on the geographical location of the station and on the TS considered. This is confirmed by the results of the current study, which show different distribution functions for different areas. The results also corroborate those of Cindric and Pasaric (2012), who suggested that it is not possible to recommend a single, optimal distribution because the ratio of the skewness to the coefficient of variation of the precipitation data could be the indicator for the choice of the most appropriate distribution for a particular region. Moreover, Angelidis et al. (2012) and Stagge et al. (2015) suggested that the suitable probability distribution is related to the TS of the precipitation data to be fitted.

Stagge et al. (2015) compared seven probability distributions and concluded that the gamma distribution produces the best fit for precipitation accumulated over long periods (>6 months), while weibull is consistently the best for short periods (1 to 3 months). In the present study, the logistic distribution produces the best fit for precipitation with short accumulation (3 months TS), while for long periods (\(>6\) months TS) the burr distribution performs the best. Pieper et al. (2020) and Zhang and Li (2020) considered that the appropriate probability distribution is related to the number of parameters of the PDF to be fitted to rainfall data. They demonstrated that three-parameter distributions, such as the exponentiated weibull and the log-logistic respectively, perform better than the corresponding two-parameter distributions. This is in agreement with the results obtained in this paper, where the three-parameter burr distribution gave the best results in most cases. Some distribution functions such as chi-square, cauchy, exponential and pareto generally show a poorer fit in the study area. Guenang and Mkankam Kamga (2014) also found that the exponential distribution is the least suitable for the domain.

Table 3 Values of the K-S fit test for the ten distribution functions and for 3-month TS
Table 4 Values of the K-S fit test for the ten distribution functions and for 12-month TS
Table 5 Values of the K-S fit test for the ten distribution functions and for 24-month TS
Table 6 Distribution functions that best fit station precipitation data at different TSs
Table 7 Best distribution functions by AEZs at different TSs

Considering the effects of different probability distributions on SPI characteristics in comparison with the default gamma distribution, SPIs for mild and moderate droughts are less sensitive to the distribution function used than those for severe and extreme droughts. This agrees with Angelidis et al. (2012), who concluded that the consistency of the SPI calculated with different distributions is good for normal periods but becomes poor for very dry or very wet periods. We also found that different probability distributions lead to large differences for severe and extreme droughts. In fact, the SPI time series obtained with gamma underestimate extreme humidity and overestimate severe and extreme drought events as compared with those obtained with the best distribution. In a context of climate change, particularly due to global warming, strong fluctuations in average precipitation have a severe effect on the occurrence of drought. The current study also found that the magnitude and duration of drought increased with time for both short and long TSs. This may be the consequence of reduced precipitation resulting from climate change, as suggested by Vicente-Serrano et al. (2010) and Tirivarombo et al. (2018); temperature is also an important factor that can influence the availability of water, as it controls the rates of evapotranspiration.
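The sensitivity of the SPI to the fitted distribution can be illustrated with a minimal sketch. It assumes the standard SPI construction (the fitted CDF of each accumulated total is mapped to a standard-normal quantile) and compares two fits by RMSE, separately for near-normal values and for the tails. The helper names `spi_from_fit` and `rmse`, the clipping bounds, the class thresholds and the synthetic sample are assumptions for the example, not the study's implementation.

```python
import numpy as np
from scipy import stats


def spi_from_fit(totals, dist, **fit_kwargs):
    """Fit `dist` by ML and map each total to a standard-normal quantile."""
    totals = np.asarray(totals, dtype=float)
    params = dist.fit(totals, **fit_kwargs)
    prob = dist.cdf(totals, *params)
    # Keep probabilities away from 0 and 1 so the normal quantile stays finite.
    prob = np.clip(prob, 1e-6, 1.0 - 1e-6)
    return stats.norm.ppf(prob)


def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic accumulated totals (not observed data): 55 years x 12 months.
rng = np.random.default_rng(0)
totals = rng.gamma(shape=2.0, scale=50.0, size=660)

spi_gamma = spi_from_fit(totals, stats.gamma, floc=0)  # default choice
spi_logistic = spi_from_fit(totals, stats.logistic)    # alternative family

near_normal = np.abs(spi_gamma) < 1.0   # illustrative threshold
tails = np.abs(spi_gamma) >= 1.5        # illustrative threshold
print("RMSE, near-normal months:", rmse(spi_gamma[near_normal], spi_logistic[near_normal]))
print("RMSE, tail months:       ", rmse(spi_gamma[tails], spi_logistic[tails]))
```

Splitting the comparison by SPI class is one simple way to check whether two fits disagree mainly in the severe and extreme categories.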

Fig. 5 SPIs at 3-month TS. The SPI computed using the appropriate distribution function is overlaid on that computed using the default gamma distribution function at the stations of a) Poli (AEZ1), b) Ngaoundere (AEZ2), c) Koundja (AEZ3), d) Bafia (AEZ5), e) Douala (AEZ4), f) Nkongsamba (AEZ4)

Fig. 6 Same as in Fig. 5, but for 12-month TS

Fig. 7 Same as in Fig. 5, but for 24-month TS

Fig. 8 RMSE between the SPI values of the appropriate distribution function and the gamma distribution function for each station and for 3-, 6-, 9-, 12-, 15-, 18-, 21- and 24-month TSs

Conclusion

This study was undertaken to contribute to the improvement of mathematical tools for modeling drought, a dangerous phenomenon to which adaptation is difficult in developing countries such as Cameroon. The SPI, used as the drought indicator, was studied by examining the relevance of using probability distribution functions different from those commonly used to fit and describe observed precipitation data, as a preliminary step for SPI computation. Ten statistical distribution functions were tested to find the best fit at each of the 24 observation stations belonging to the five AEZs of Cameroon over the period 1951–2005 and for different TSs (3, 6, 12, 15, 18, 21 and 24 months). The ML method was used to estimate the parameters of the distribution functions. The K-S statistic was used to select the distribution functions that best fit the station data, which were then used to calculate the SPI. The results were used to study drought occurrence and to quantify the errors made when inappropriate distribution functions are used.

The appropriate distribution function for precipitation data was found to depend on the location of the station and the number of months in the TS. The gamma distribution usually used as the default is not always the best fit for SPI computation when many functions are tested. The logistic probability distribution is the best choice in most cases for the 3-month TS, while for longer TSs burr shows the best fit. For 12 months and more, gamma, burr and logistic are found to be the best fits for many different stations. In all cases, a significant difference was found between the SPIs calculated with the best-fit function and those calculated with the default gamma distribution; the differences are more pronounced from the 12-month TS onward, with higher RMSE values.

This study highlights the importance and necessity of a preliminary study to find the distribution functions that best fit the data and to use them for the calculation of the SPI, in order to reduce errors and increase the accuracy of the results.