1 Introduction

Drought is an insidious natural hazard that results from a deficiency of precipitation from expected or “normal” such that when it is extended over a season or longer period of time, the amount of precipitation is insufficient to meet the demands of human activities and the environment. Drought is a temporary aberration, unlike aridity, which is a permanent feature of the climate. Seasonal aridity (i.e., a well-defined dry season) also needs to be distinguished from drought. These terms are often confused or used interchangeably. The differences need to be understood and properly incorporated in drought monitoring and early warning systems and preparedness plans. (Wilhite 2009).

Drought is a natural part of the climate that occurs in both high and low rainfall areas and virtually all climate regimes (Wilhite 2009; Wilhite and Buchanan 2005). The Mediterranean is one of the regions where the impacts of drought events showed an exponential increase during the last 20 years. Like other Mediterranean countries, Algeria has witnessed during the last 20 years an intense and persistent drought. This drought, which is characterized by an important rainfall deficit has affected the whole of Algeria particularly, the north-west part. Similar droughts in amplitude and intensity have yet been noticed at the beginning of the century, between 1910 and 1940 (Medejerab and Henia 2011).

An objective evaluation of drought condition in a particular area is the first step for planning water resources in order to prevent and mitigate the negative impacts of future occurrences. For this purpose, along the years, several indices have been developed to evaluate the water supply deficit in relation to the time duration of the precipitation shortage. Among them, we mention the Percent of Normal, the Standardized Precipitation Index (SPI) (McKee et al. 1993), Palmer Drought Severity Index (PDSI) (Palmer 1965), Deciles, Crop Moisture Index (CMI) (Bordi and Sutera 2007), Reconnaissance Drought Index (Tsakiris et al. 2007a, b), Streamflow Drought Index (SDI) (Nalbantis and Tsakiris 2009), Surface Water Supply Index (SWSI) (Shafer and Dezman 1982) and many other indices which we don’t mention in the present study.

Forecasting future dry events in a region is very important for finding sustainable solutions to water management and risk assessment of drought occurrence (Bordi and Sutera 2007). Furthermore, drought forecasting plays an important role in the mitigation of impacts of drought on water resources systems (Kim and Valdes 2003).

In recent years, there have been considerable methods for modeling various aspects of droughts using drought indices in different regions. The PDSI and the SPI are more commonly used indices (Mishra et al. 2007). In this study, the SPI drought index was chosen to forecast drought because it is a powerful, flexible index that is simple to calculate, also because its ability to represent drought on multiple time scales. These time scales reflect the impact of drought on the availability of the different water resources. Soil moisture conditions respond to precipitation anomalies on a relatively short scale. Groundwater, streamflow and reservoir storage reflect the longer-term precipitation anomalies (WMO 2012).

The models used for drought forecasting can be ranged from simplistic approaches to more complex models. First, time series models, often known as stochastic models, have been used: Paulo and Pereira (2006, 2007 and 2008) presented details of application of Markov chain modeling to predict drought characteristics of drought events for several sites in South Portugal. Cancelliere et al. (2007) presented stochastic methodologies to compute drought transition probabilities based on SPI index and to forecast future SPI values on the basis of past precipitation. Nalbantis and Tsakiris (2009) utilized too Markov chain procedures to predict streamflow drought occurrence. Recently, Avilès et al. (2015) applied the Markov chain First and second orders to predict the frequency of monthly droughts and checked their performance using two skill scores. Mishra and Desai (2005) used Autoregressive Integrated Moving Average (ARIMA) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models to forecast droughts using SPI series. They found that the models can be used to forecast droughts up to 2 months with reasonable accuracy. Modarres (2007) used a multiplicative seasonal autoregressive integrated moving average (SARIMA) model to the monthly streamflow forecasting for drought analysis, this study also demonstrates the usefulness of SARIMA models in forecasting, water resources planning and management. Han et al. (2010) used ARIMA models with the VTCI index, based on remote sensing data, to the drought forecasting. Fernández et al. (2009) proposed to use a multiplicative seasonal autoregressive integrated moving average model, due to the stochastic behavior of droughts, to forecast monthly streamflow with a 12 month lead time. Forecasting results were used to derive drought indices. However, because of the strength and flexibility associated with time series models, it depends on one’s capability in terms of identification, estimation and diagnostic checks to select the best model from amongst candidate models. Moreover, time series models are basically linear models assuming that data are stationary, and have a limited ability to capture nonstationarities and nonlinearities in data (Kim and Valdes 2003; Mishra and Desai 2006).

To overcome these limitations, it was necessary for hydrologists to consider alternative models to forecast non-stationary and non-linear data. In recent decades, artificial neural networks (ANNs) have shown great ability in modelling and forecasting nonlinear and non-stationary time series in hydrology and water resource engineering due to their innate nonlinear property and flexibility for modelling (ASCE 2000).

Some of the applications of ANN models in drought forecasting. Mishra and Desai (2006) compared linear stochastic models (ARIMA/SARIMA), recursive multistep neural network (RMSNN) and direct multi-step neural network (DMSNN) for drought forecasting The results obtained from the models show that recursive multi-step approach is best suited for 1 month ahead prediction. When longer lead time of 4 months is considered direct multi-step approach outperforms recursive multi-step models. Morid et al. (2007) used Artificial Neural Network (ANN) and predicts quantitative values of drought indices (the Effective Drought Index (EDI) and the Standard Precipitation Index (SPI)) using previous rain and two climate indices as explicative variables. They suggested that final forecasting models can be utilized by drought early warning systems. Marj and Meijerink (2011) proposed a feedforward multiple neural network for drought forecasting; the inputs of the model used are the climatic indices Southern Oscillation Index (SOI) and North Atlantic Oscillation (NAO) to forecast normalized difference vegetation index (NDVI).

However, the ability of artificial intelligence techniques to forecast non stationary data is limited (Belayneh et al. 2014). Indeed, they have not the ability to handle non-stationary data if pre-processing of the input data is not done. Wavelet transforms have been found to be very effective in hydrologic forecasting. Wavelet transforms provide a useful decomposition of the original time series, so that wavelet-transformed data improves the ability of a forecasting model by capturing useful information on various resolution levels (Kim and Valdes 2003). Over the last decade, several studies have investigated the ability of hybrid wavelet transforms and ANN models (WA-ANN): Kim and Valdes (2003) developed a conjunction model based on dyadic wavelet transforms and neural networks, using PDSI as a drought index to forecast droughts. The results indicated that the conjunction model significantly improves the ability of neural networks for drought forecasting. Özger et al. (2012) made a comparison between a coupled wavelet fuzzy logic (WFL) model with an artificial neural network (ANN) model and a coupled wavelet and ANN (WANN) model, they showed that WFL was more accurate for drought forecasting. Belayneh and Adamowski (2012) compared the effectiveness of three data-driven models for forecasting drought conditions using artificial neural networks (ANNs), support vector regression, and wavelet neural networks. The forecast results indicate that the coupled wavelet neural network models were the best models for forecasting SPI values over multiple lead times. Belayneh et al. (2014) compared the effectiveness of five data-driven models for forecasting long term droughts; the forecast results indicate that the coupled wavelet neural network models were better than the others. Sang (2013) proposed an improved wavelet modeling framework for hydrologic time series forecasting. Recently, a comprehensive review on application of hybrid wavelet-AI models in hydrology has been provided by Nourani et al. (2014). The authors have also emphasized the importance of hybrid models for drought forecasting.

The main objective of the present study is to investigate artificial intelligence techniques for forecasting droughts in a semi arid region (in this case the Algerois catchment in Algeria) and to compare their performance with a traditional time series modelling technique ARMA. The techniques employed in this study include artificial neural networks (ANN) and wavelet neural networks (WANN) to forecast drought, using SPI as drought quantifying parameter. SPI-3, SPI-6 and SPI-12, indicators of short and long terms drought conditions, were forecast for different lead times. The standard statistical performance measures are employed to evaluate the developed models.

In previous works, drought forecasting using WANN models was carried out only with the Haar (db1) wavelet, the analysis of the effect of different vanishing moments of Daubechies wavelets, on the performance of different lead times drought forecasting, have never been explored to date; so this will be the principal contribution of this work. In addition, it should be noted that forecasting droughts has never been explored to date in the study area.

2 Materials and Methods

2.1 The Standardized Precipitation Index

McKee et al. (1993) developed the Standardized Precipitation Index (SPI) for the purpose of defining and monitoring drought. The SPI drought index was chosen to forecast drought in this study due to the following advantages:

The first advantage is its simplicity. Unlike other drought indices, the SPI is based only on precipitation amount. Second, the SPI is versatile: it can be calculated on any time scale, which makes it possible to deal with many of the drought types. The third advantage of the SPI is that, because of its normal distribution, the frequencies of the extreme and severe drought classifications for any location and any time scale are consistent. In addition, since it provides a dimensionless index, the SPI values can be compared easily across space and time. Finally, because it is based only on precipitation and not on soil moisture conditions as is the PDSI, the SPI is just as effective during winter months and is also not adversely affected by topography (Hayes et al. 1999). Details on SPI computation are given in the work of Edwards and McKee (1997).

2.2 Stochastic Models

The stochastic models which are often known as time series models (ARIMA) have been used in scientific, economic and engineering applications for the analysis of time series. Time series modelling techniques have been shown to provide a systematic empirical method for simulating and forecasting the behavior of uncertain hydrologic systems and for quantifying the expected accuracy of the forecasts which can be found in lot of research literatures (Mishra and Desai 2006). There are two classes of stochastic models, which are described below.

2.2.1 Nonseasonal Model

The autoregressive (AR) and moving average (MA) models are in fact subsets of the general autoregressive moving average ARMA family of models which can be used when the data are stationary. However, in general, time series that reflect hydrologic aspects may be nonstationary over any time interval being considered (Hipel and Mcleod 1994). Models which describe such nonstationary behavior can be obtained by supposing some suitable difference of the process to be stationary. This important class of models for which the dth difference is a stationary mixed auto-regressive moving average process. These models are called auto-regressive integrated moving average (ARIMA) processes (Box and Jenkins 1976). The general non-seasonal ARIMA (p,d,q) model is given by:

$$ {\phi}_p(B){\nabla}^d{x}_t={\theta}_q(B){a}_t $$
(1)

2.2.2 Seasonal Model

The seasonal autoregressive integrated moving average (SARIMA) models are useful for modelling seasonal time series in which the mean and other statistics for a given season are not stationary across the years (Hipel and McLeod 1994).

The general form of seasonal model SARIMA(p,d,q)(P,D,Q)s, where (p,d,q) is the non seasonal part of the model and (P,D,Q) is the seasonal part of the model, which is given by:

$$ {\phi}_p(B){\varPhi}_P\left({B}^s\right){\nabla}^d{\nabla}_s^D{x}_t={\theta}_q(B){\varTheta}_Q\left({B}^s\right){a}_t $$
(2)

where, {xt} is the nonstationary time series;{a t } is the usual Gaussian white noise process;ϕ p(B) and θq(B) are nonseasonal AR and MA operators of orders p and q, respectively; ΦP(B s) and ΘQ(B s) are seasonal AR and MA operators in B of orders P and Q, respectively; p is the order of nonseasonal autoregression; d is the number of nonseasonal differences; q is the order of nonseasonal moving average; P is the order of seasonal autoregression; D is the number of seasonal differences; Q is the order of seasonal moving average; S is the length of season; ∇d and ∇ D s are nonseasonal and seasonal difference operators; B is the backshift operator. The expressions are shown as follows:

$$ {\phi}_p(B)=\left(1-{\phi}_1B-{\phi}_2{B}^2-\dots -{\phi}_p{B}^p\right) $$
(3)
$$ {\varPhi}_P\left({B}^s\right)=\left(1-{\varPhi}_s{B}^s-{\varPhi}_{2s}{B}^{2s}-\dots -{\varPhi}_{Ps}{B}^{Ps}\right) $$
(4)
$$ {\theta}_q(B)=\left(1-{\theta}_1B-{\theta}_2{B}^2-\dots -{\theta}_q{B}^q\right) $$
(5)
$$ {\varTheta}_Q\left({B}^s\right)=\left(1-{\varTheta}_s{B}^s-{\varTheta}_{2s}{B}^{2s}-\dots -{\varTheta}_{Qs}{B}^{Qs}\right) $$
(6)
$$ {B}^k{x}_t={x}_{t-k} $$
(7)
$$ {\nabla}^d={\left(1-B\right)}^d $$
(8)
$$ {\nabla}_s^D={\left(1-{B}^s\right)}^D $$
(9)

2.3 Artificial Neural Networks (ANN)

Neural networks are a class of flexible nonlinear models that can discover patterns adaptively from the data. Theoretically, it has been shown that given an appropriate number of nonlinear processing units, neural networks can learn from experience and estimate any complex functional relationship with high accuracy. Although many types of neural network models have been proposed, the most popular one for time series forecasting is the feed-forward model. A feed-forward ANN comprises a system of units, analogous to neurons that are arranged in layers (Imrie et al. 2000). A multilayer perceptron (MLP) is the most popular neural network architecture. MLPs have often been used in hydrology for their simplicity. MLP is typically composed of several layers of nodes (neurons). An input layer, which is the first layer, receives external information. The problem solution is obtained in an output layer which is the last layer. One or more intermediate layers, which are called hidden layers, separate the input and output layers. The nodes in adjacent layers are usually fully connected by acyclic arcs, which are called synapses, from the input layer to the output layer (Zhang et al. 1998). The proposed model structure is presented in the next section.

2.4 Wavelet Artificial Neural Networks (WANN)

WANN denotes the conjunction of wavelet decomposition and ANN. The wavelet decomposition is employed to decompose an input time series into approximations and details components. The decomposed time series are used as inputs to ANN model.

2.4.1 Wavelet Decomposition

Wavelets are mathematical functions that give a time-scale representation of the time series and their relationships to analyse time series that contain nonstationarities. Wavelet analysis allows the use of long time intervals for low frequency information and shorter intervals for high frequency information. Wavelet analysis is capable of revealing aspects of data like trends, breakdown points, and discontinuities that other signal analysis techniques might miss. Furthermore, it can often compress or de-noise a signal (Kim and Valdes 2003, Adamowski and Sun 2010)

The continuous wavelet transform (CWT) at time t for a time series f(t) is defined as follows (Shoaib et al. 2014):

$$ {W}_{a,b}(t)={\displaystyle {\int}_{-\infty}^{+\infty }f(t)}\ \frac{1}{\sqrt{a}}\ {\psi}^{*}\left(\frac{t-b}{a}\right)dt $$
(10)

where * refers to the complex conjugate of the function. The entire range of the signal is analysed by the wavelet function by using two parameters, namely, ‘a’ and ‘b’. The parameters ‘a’ and ‘b’ are known as the dilation (scale) parameter and translation (position) parameter, respectively.

A disadvantage of these non-orthogonal wavelets is that the CWT of a given signal is characterized by redundancy of information among the wavelet coefficients. This redundancy, on account of correlation between coefficients, is intrinsic to the wavelet-kernel and not a characteristic of the analysed signal. As an alternative, for practical applications (as in the study of noise reduction models for communication systems and image and signal compression), Discrete Wavelet Transform (DWT) is usually preferred (Maheswaran and Khosa 2012). The DWT scale and position are based on the power of two (dyadic scales and positions) and can be defined as (Shoaib et al. 2014):

$$ W{\left(a,b\right)}_D={2}^{\raisebox{1ex}{$-j$}\!\left/ \!\raisebox{-1ex}{$2$}\right.}{\displaystyle {\int}_{j=1}^{j=J}{\psi}^{*}}\left({2}^{\raisebox{1ex}{$-j$}\!\left/ \!\raisebox{-1ex}{$2$}\right.}-k\right)f(t)dt $$
(11)

where the real numbers j and k are the integers that control the wavelet dilation and translation, respectively. In the DWT, a signal S passes through the low pass filter Lo and the high pass filter Ho followed by down sampling (keeping only one data point out of two) with both filters. At decomposition level n, the high pass filter followed by down sampling produces the detail information D(n) of the signal while the low pass filter associated with the scaling function produces the approximation A(n). This decomposition (filtering and down sampling) process continues until the desired level is achieved.

The DWT is generally a non-redundant transform and, accordingly, only a minimally required number of wavelet coefficients are preserved at each level of decomposition which, as a further consequence, enables reconstruction of the original signal from a reduced number of wavelet coefficients. While this property is useful in applications such as data and image compression, this type of non-redundant DWT, however, is prone to shift sensitivity and is therefore an undesirable feature when applied to problems related to singularity detection, forecasting and nonparametric regression (Maheswaran and Khosa 2012).

The above-mentioned disadvantages of DWT have been discussed at length in Aussem et al. (1998), Renaud et al. (2005), Adamowski and Sun (2010), Maheswaran and Khosa (2012) and Belayneh et al. (2014). To overcome this problem, these authors have proposed the use of the a’ trous algorithm. Corresponding to the original series x(t), smoother versions of x(t) are defined at different scales as given by Eqs. (12) and (13)

$$ {c}_0(t)=x(t) $$
(12)
$$ {c}_j(t)={\displaystyle \sum_{l=-\infty}^{\infty }}h(l){c}_j\left(t+{2}^{j-1}l\right) $$
(13)

in the preceding Eq. (13), j takes values from 1, 2, 3,...., J where J is the level of decomposition and ‘h’ is a low pass filter with compact support.

The detail component of x(t) at level i is defined as

$$ {d}_j(t)={c}_{j-1}(t)-{c}_j(t) $$
(14)

In Eq. (14), the set {d 1, d 2, …, d p , c p } represents the additive wavelet decompositions of the data up to resolution level p and cp is the residual component or the approximation.

2.5 Performance Measures

The forecast performance of all developed models was evaluated using the following measures of goodness of fit:

Nash-Sutcliffe Efficiency coefficient (NSE)

the Nash-Sutcliffe model efficiency coefficient is better suited to evaluate the goodness-of-fit than the coefficient of determination. NSE is sensitive to additive and proportional differences between predictions and observations. However, NSE is overly sensitive to extreme values because it squares the values of paired differences (Legates and McCabe 1999).

$$ NSE=1-\frac{{\displaystyle {\sum}_{i=1}^N}{\left(SP{I}_i-\widehat{SP{I}_i}\right)}^2\ }{{\displaystyle {\sum}_{i=1}^N}{\left(SP{I}_i-\overline{SPI}\right)}^2} $$
(15)

Root Mean Squared Error and Mean Absolute Error

(RMSE) and (MAE) are well accepted absolute goodness-of-fit indicators for continuous variables, that describe the difference in observed and predicted values in the appropriate units (Legates and McCabe 1999). They are calculated as shown (Eqs (16) and (17)).

$$ RMSE=\sqrt{\frac{{\displaystyle {\sum}_{i=1}^N}{\left(SP{I}_i-\widehat{SP{I}_i}\right)}^2}{N}} $$
(16)
$$ MAE=\frac{{\displaystyle {\sum}_{i=1}^N}\left|SP{I}_i-\widehat{SP{I}_i}\right|}{N} $$
(17)

In general, high values of NSE (up to 100 %) and small values for RMSE and MAE indicate a good model. These measures can be used to evaluate performance of hydrological models satisfactorily (Legates and McCabe 1999).

2.6 Study Area and Database

The case study area comprises the western part of the Algerois catchment (Fig. 1), it has an area of 5225.3 km2, a Mediterranean climate with a mean annual rainfall between 600 and 800 mm in the coastal regions and between 500 and 1000 mm in the interior regions.

Fig. 1
figure 1

Location of rain gauge stations used in the study

In this study, Thiessen polygon method was used to estimate mean areal rainfall using historical monthly rainfall data from year 1936 to 2008 available at 17 rainfall gauging stations.

The observations number and rain gauge density seem reasonably correct in the present study. Indeed, the minimum number of 50 observations is needed to build reasonable ARIMA model (Wei 1990). In addition, the minimum rain gauge density for flat regions of Mediterranean and tropical zones, according to World Meteorological Organization, is one per 600–900 km2.

The statistical tests, K-S and chi-square tests are employed and showed that rainfall in the basin follows a Gamma distribution.

The regional SPI time series were then calculated based on average areal precipitation estimated, as stated earlier, using Thiessen polygon method. The SPI series of different time scales (SPI-3, SPI-6 and SPI-12) are shown in Fig. 2.

Fig. 2
figure 2

SPIs series over different time scales based on average rainfall

3 Results and Discussions

3.1 Models Development Results

3.1.1 Stochastic Models

The time series model development consists of three stages, i.e. identification, estimation and diagnostic check (Box and Jenkins 1976).

For all developed SPI time scales series, data set from 1936 to 1994 is used to estimate the model parameters and the data from 1995 to 2008 is used to check the forecast accuracy. For illustration, one example was briefly described (SPI-12).

At the identification stage, the more appropriate models to fit to the data can be tentatively selected by examining various types of graphs: The ACF and the PACF of the original time series of SPI-12, indicate a nonstationary behavior and suggests a Differencing. So first difference was used to remove the trend and difference at lag 12 was used to remove the seasonal variation. The plot of ACF and PACF after Differencing is shown in Fig. 3. In the differenced series of SPI-12, it is observed that the ACF curve decay fast with significant spikes at lag 12, 24, indicating a stationary behavior. In the PACF there are significant spikes at lag 12, 24, 36, 48. So the SARIMA (0, 1, 0)(P, 1,Q)12 model could be fitted to the differenced SPI-12 series.

Fig. 3
figure 3

ACF and PACF plots used for the selection of candidate models for SPI-12 series

The model that gives the minimum Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC) was selected as best fit model (Mishra and Desai 2005).

The mathematical formulation for the AIC (Akaike 1974) is defined as:

$$ AIC=-2 log\ L+2m $$
(18)

where m = (p + q + P + Q) is the number of terms estimated in the model and L denotes the likelihood function of the ARIMA models and it is a monotonically decreasing function of the sum of squared residuals. The mathematical formulation for the SBC (Schwarz 1978) is defined as:

$$ SBC=-2 log\ L+m\ ln(n) $$
(19)

where n denotes the number of observations.

The identification of the best model for different SPI series based on minimum AIC and SBC criteria is given in Table 1. After the identification of model, estimation of parameters was done using least square method.

Table 1 Comparison of AIC and SBC for selected candidate models

For a good forecasting model, the residuals left over after fitting the model should be white noise (Mishra and Desai 2005). Portmanteau test is also employed to check the independence of residuals (Table 2). The Q(r) statistics values are compared with χ 2 distribution with respective degree of freedom at a 5 % significance level. From Table 2 it can be seen that the residuals from the best models are white noise.

Table 2 Portmanteau test for SPI series

3.1.2 ANN Models

Figure 4 shows the proposed three-layer feed-forward model. The input nodes SPIt, SPIt-1,.., SPIt-n are the previous lagged observations while the output SPIt+L provides the forecast for the future value where L is the lead time. In this study L was varied from 1 to 6 months lead time.

Fig. 4
figure 4

The proposed Feed-forward Neural network

ANN application basically consists of three steps: network architecture definition, network training, and network testing.

One of the most commonly-used algorithms for ANN training is the Levenberg Marquardt (LM) Back propagation algorithm, which is applied in the present study. Only one hidden layer was considered, with a tangent sigmoid activation function between the input and hidden layers, and a linear activation function between hidden and output layers.

For both ANN and WANN, available data from 1936 to 2008 was divided into three subsets. First 60 % of the samples are assigned to the training set, the next 20 % to the validation set, and the remaining 20 % of the data to the testing set.

For ANN model development, optimal network architecture for different SPI series was found by trial-and-error approach. Performance criteria (NSE, RMSE, MAE), for each combination taking different number of input and hidden neurons, between observed and forecasted data were calculated. Input neurons ranging from 1 to 20 were tested. For each input layer dimension, the number of hidden nodes was progressively increased from 1 to 20. The combination having the best performance measures was chosen as optimal network. To prevent overfitting, the early stopping technique was used in this study.

Nash-Sutcliffe Efficiency for different combination of input neurons and hidden neurons (hn) for SPI-12, based on ANN model is shown in Fig. 5. Results indicate that the RMSE and the MAE exhibit the same evolution as indicated by the Nash-Sutcliffe efficiency. Based on performance criteria, it’s observed that Beyond 13 input neurons, there is no improvement. So ANN with 13 input neurons and 4 hidden neurons is chosen to forecast drought. In a similar way the optimal architecture for the other SPI series are calculated and shown in Table 3. For example, the optimal structure for SPI-3 and for 1-month ahead forecast was determined as 4-10-1. Here the numbers indicate, respectively, a neural network model with 4 input, 10 hidden and 1 output nodes

Fig. 5
figure 5

Evolution of Nash-Sutcliffe efficiency for different input number

Table 3 The best ARIMA models forecasts

3.1.3 WANN Models

In forecasting task, the selections of an efficient mother wavelet and decomposition level are two important issues (Nourani et al. 2014). Detailed analyses regarding the performance of different mother wavelets in hydrological simulations have led to the conclusion that, to determine the ideal mother wavelet for a given problem, a variety of mother wavelets should be tested through a trial-and-error process (Maheswaran and Khosa 2012; Nalley et al. 2012; Nourani et al. 2011; Sang 2012).

The extremal-phase wavelets including Haar and dbN wavelets, where N refers to the number of vanishing moments, are also called Daublets or Daubechies wavelets. Daubechies wavelets are one of the most widely used wavelets. They represent a collection of orthogonal mother wavelets with compact support, characterized by a maximal number of vanishing moments for some given length of the support.

Maheswaran and Khosa (2012) in their study concluded that wavelets with wider support and higher vanishing moments, such as the daubchies-2 (db2) and spline-B3 wavelets, are recommended for time series having long term memory and nonlinear feature, so this is why all daubchies wavelets were used in this study.

Furthermore, the optimum decomposition level was usually determined through a trial-and-error process, but afterwards a formula which relates the minimum level of decomposition, L, to the number of data points within the time series Ns, was introduced in the literature (Aussem et al. 1998; Nourani et al. 2009b; Wang and Ding 2003):

$$ L=int\left[ log{N}_s\right] $$
(20)

Later, Nourani et al. (2011) criticized the outcome of this formula, stating that having been derived for fully autoregressive (AR) signals, it only considers time series length, without paying any attention to seasonal effects.

In this study, The ANN models are built such that the DWs of the original SPI time series are the inputs to the ANN and the original un-decomposed SPI time series are the outputs of the ANN.

This study aims at examining the effects of different mother wavelets on the efficiency of developed models. For this purpose, the performance of ANN models was investigated for different mother wavelets, including Haar wavelet (db1) and daubechies wavelets (dbn, with n varying between 2 and 17).

Furthermore, for all forecast lead times, each SPI time series was decomposed between 1 and 6 levels to determine the optimum decomposition level.

For 1-month lead time forecast, the results of SP-3, illustrated by Figs. 6 and 7a, show that db 15 is the best mother wavelet with an optimum decomposition level of 4. Figure 7b gives more details for optimal level choice.

Fig. 6
figure 6

Effect of mother wavelet and decomposition level on NSE and RMSE (lead time: 1 month)

Fig. 7
figure 7

Effect of decomposition level on NSE for SPI-3 time series (lead time: 1 month)

On the basis of performance measures, it can be concluded that the hybrid WANN model with the db15 wavelet function performed best among all the 17 wavelets being tested in this paper. The db15 wavelet function with fifteen vanishing moments best describes the temporal and the spectral information in the SPI-3 data for the WANN hybrid models as suggested by the results of the present study. This may be due to the fact that by decomposing a signal up to the optimum level, the information contained in the data can be extracted more efficiently by a complex wavelet function such as db15.

Figure 7b shows that the NSE values increases with the increase in the decomposition level with maximum value being obtained at level four.

For the other lead times the db15 was fixed based on the aforementioned results, but the level decomposition was varied between 1 and 6 and the optimal model structures are presented in Table 4.

Table 4 The best ANN models forecasts

In a similar manner, for SPI-6 and SPI-12 results depict that the best mother wavelet was db13 and the optimum decomposition level lies between 2 and 4. Table 4 shows the different WANN optimal structures for all lead times.

Once SPI time series were successfully decomposed, these coefficients of details and approximations were used as input to ANN component of the hybrid model to obtain predicted output. The output signals were kept as original series without decomposition.

3.2 Evaluation of Developed Models in Drought Forecasting

Using the optimal structures of SARIMA, ANN and WANN models, the SPI time series for the three time scales were forecasted for 1 to 6-months lead times. The performance criteria obtained are shown in Tables 3, 4 and 5.

Table 5 The best WANN models forecasts

Analyzing the obtained results, it is possible to see that the SARIMA and ANN models present the same precision for all SPI time scales and for all lead time forecasts. An example of forecasts of SPI-12 for 1 month ahead is shown in Fig. 8, where the proposed SARIMA and ANN models have respectively NSE of 85.76, 86.08 % and RMSE of 0.3209, 0.3252.

Fig. 8
figure 8

A comparison between observed data and forecasts over 1-month lead time using SARIMA and ANN for SPI-12

It may be seen that, for all developed models, the forecast effectiveness is lower in SPI-3, the best result is obtained in the forecast of SPI-12 (Example for 1 month lead time ANN forecasts NSE is equal to 44.99 and 86.08 % respectively for SPI-3 and SPI-12) (Tables 3, 4 and 5). This is due to the high temporal variability in precipitation in SPI-3, while for the other scales this variability is attenuated because more monthly data could be summed. Thus, while the SPI time scale increases, the SPI forecast is improved.

For SARIMA and ANN models, it is possible to see that, when the lead time forecast is increased a significant deterioration of models accuracy is observed which is very common in hydrological forecasting (NSE from 85.76 to 19.71 % and 86.08 to 20.95 % for SARIMA and ANN forecasts of SPI-12 respectively). For 2 and 3 months lead times, only SPI-12 presents consistent results. Over 3 months lead forecasts, neither SARIMA nor ANN are able to represent the drought tendency as it can be seen in Tables 3 and 4. Whereas, the WANN models have very good accuracy even for 6-months lead time forecasts with NSE equal to 96.07 %.

Figure 8 provides ANN and SARIMA model results for SPI-12 at 1 month lead time. Figure 8 indicates that the models have good generalization ability for SPI-12 and very little time shift error as the extreme events for the predicted values correspond to the extreme values of the observed values.

Wavelets transforms pre-processing leading to the WANN models depicts a great increasing in model’s efficiency for all SPI time scales. The results of all the proposed WANN models with lead time forecasts varying from 1 to 6 months can be found in Table 5. The use of wavelets as a pre-processing tool resulted in more accurate and precise models, according to the performance measures used, as seen in Table 5. The model results presented a higher level of the NSE criteria, RMSE and MAE values are lower compared to ANN models. Indeed, it can be seen that the forecasting results of SPI-3 for all the lead times used have been significantly improved (Example: for 6 months ahead forecast of SPI-3 the NSE passed from −3.43 % for the ANN model to 69.20 % for the WANN (with db15wavelet)).

Moreover, number of hidden nodes decreases compared to the ANN models; this is due to fact that wavelet decomposition reduces SPI series complexity. This result makes models more parsimonious. Forecasts of SPI-12 (for 1 and 6 months lead times) results are shown in Figs. 9 and 10. The Fig. 9a indicates that the WANN forecasts closely mirror the observed results and do not underestimate extreme events of drought. Figure 9b is a scatter plot of the observed and predicted SPI values from the WANN model. The figure shows that the points are closer to the trend line and there are no points of significant overestimation or underestimation. Based on the findings related to this study, WANNs have the strong ability to capture the variations of all SPIs time series used (SPI-3, SPI-6 and SPI12).

Fig. 9
figure 9

a A comparison between observed data and forecasts over 1-month lead time using WANN for SPI-12; b Scatter plot from WANN forecasts of SPI-12 (1-month lead time)

Fig. 10
figure 10

a A comparison between observed data and forecasts over 6-month lead time using WANN for SPI-12; b Scatter plot from WANN forecasts of SPI-12 (6-month lead time)

4 Conclusions

Early indication of possible drought can help to set out drought mitigation strategies and measures in advance. Therefore, drought forecasting plays an important role in the planning and management of water resource systems by reducing drought-related impact significantly.

This paper investigates the accuracy of different models for forecasting droughts using SPI as drought indicator in the algerois basin, North Algeria. The SPI is used, as mentioned earlier, due to its lot of advantages over other drought indices and because it can be used to quantify most types of drought events. This why SPI-3, SPI-6 and SPI-12 were used. The specific objectives are to develop and evaluate stochastic models and artificial neural networks for improving SPI (at different time scales) forecasting for different lead times. Comparing and evaluating of forecasting models based on various performance measures, including the Nash-Sutcliffe model efficiency coefficient, the root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). The following conclusions can be drawn from the study:

  • Overall, for SPI-12 all models used founds to provide good forecasts at 1 month lead time compared to SPI-3 and SP-6;

  • Despite linear property of SARIMA models they yielded satisfactory performances, with respect to various model efficiencies, for SPI-12 forecasts at 1 month lead time. Whereas for the other SPI time scales (SPI-3 and SPI-6) they cannot give accurate predictions.

  • Forecast results deteriorated as the forecast lead time increased for all the models;

  • The wavelet- pre-processing for SPI time series aids in improving the model performance by capturing helpful information on various resolution levels. Indeed, WANN models have the strong ability to capture the variations of SPI on different time scales, especially, for SPI-3, so the NSE increased, for example for 1 month lead time, from 0.4479 for ANN model to 0.9745 for WANN model, an increase of 54 % which in not negligible. Moreover, good performances were obtained up to 6 months lead time.

  • Wavelet analysis also reduces the phenomenon complexity and therefore it reduces the number of hidden nodes.

The present study focused on the forecasting of SPI at three time scales for short lead times, recommendations for future studies include: exploring lead times 1 year for SPI-3 and SPI-6 and 2 years for SPI-12; exploring the use of other mother wavelets to improve the SPI-3 forecasts accuracy; and comparing the use of other input variables and other artificial intelligence models in conjunction with wavelet pre-processing.

The developed models should be applied to forecast droughts for all the other Algerian basins, having the same climatological and hydrological conditions as the algerois basin, for the development of a drought preparedness plan in the region so as to ensure sustainable water resource planning and management. For other regions including the arid and semi-arid regions, the same approach could be investigated in future studies.